To prompt this spec correctly, I had to first understand exactly how ILM built it in 1992.
This is a shot-by-shot study of the Nike Godzilla commercial — using it as a cinematography reference to learn how to direct AI video tools.
The workflow: Weavy for generation.
ILM shot everything in slow motion and selectively sped it up in the cut to sell the weight of massive scale.
Some issues I encountered:
Character consistency across shots. Godzilla looked different in almost every generation. Different dorsal fin shape, different skin texture, different body proportions.
Expression control. The ILM Godzilla has a very specific read. I added negative prompts like "aggressive" and "fighting" to help balance out the emotion.
Camera lock. Getting the camera locked off took a few tries. The model often did not follow the instruction, even with negative prompts.
Scale consistency. Keeping Godzilla and the player at the same relative scale to each other and to the miniature buildings across different shots was challenging.
The miniature set feeling. This is the hardest one to articulate. The ILM footage has a specific tactile quality — you can feel the physicality of the model buildings, the practical smoke, the real debris. Getting AI generation to feel handmade rather than digital is an ongoing problem that no prompt fully solves yet.
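One way to approach the consistency issues above is to keep the character, camera, and set descriptions in one place and reuse them word for word in every shot prompt. Below is a minimal sketch of that idea, assuming a plain-text prompt workflow. Every name in it (Shot, build_prompt, the description strings, the "camera movement" negative term, the shot actions) is hypothetical and illustrative. It is not the prompt set used in this study and not a Weavy API.

```python
from dataclasses import dataclass

# Hypothetical shared blocks, reused verbatim in every shot so the model
# sees the same description of the character, camera, and set each time.
CHARACTER = "Godzilla, same dorsal fin shape, same skin texture, same body proportions"
CAMERA = "locked-off camera, static shot"
LOOK = "handmade miniature city set, practical smoke, real debris"

# "aggressive" and "fighting" come from the notes above;
# "camera movement" is an assumed extra term for the camera-lock issue.
NEGATIVE = ["aggressive", "fighting", "camera movement"]


@dataclass
class Shot:
    name: str
    action: str  # the only part that changes from shot to shot


def build_prompt(shot: Shot) -> dict:
    """Combine the shot-specific action with the shared blocks."""
    positive = ", ".join([CHARACTER, shot.action, CAMERA, LOOK])
    return {
        "shot": shot.name,
        "prompt": positive,
        "negative_prompt": ", ".join(NEGATIVE),
    }


if __name__ == "__main__":
    # Illustrative shot list, not the actual shot breakdown.
    shots = [
        Shot("01_wide", "Godzilla and the player face each other between the buildings"),
        Shot("02_low_angle", "Godzilla steps forward past a model building"),
    ]
    for shot in shots:
        print(build_prompt(shot))
```

Keeping the shared text identical across shots will not guarantee a matching character, but it removes one source of drift and makes it easier to see which change in wording caused a change in the output.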
Please note that the main goal of this case was to study the craft, not to replicate it.