Video Prompt Structure That Works Across Runway, Kling, and Veo

Tomas

Prompts that work for image generation often fail badly for video. Motion, camera behavior, and pacing need to be specified explicitly, and the structure that works is different from what most image prompters expect.

Why image prompt habits break in video

Image generation is stateless. Video generation is temporal. The model has to commit to what happens over time, maintain consistency frame to frame, and handle motion physics. A prompt that produces a great single frame often produces incoherent or boring video because it says nothing about movement.

The 6-block prompt structure

Write one block at a time, then merge into a single clean prompt. This separation forces you to think about each dimension explicitly.

—

Block 1: Subject and action

What is in the scene and what it is doing. Be specific about motion verbs.

Weak: a woman in a cafe
Strong: a woman sitting at a cafe window, slowly stirring her coffee, glancing toward the street

The action words (“slowly stirring”, “glancing”) give the model motion direction.

—

Block 2: Camera and shot

Type of shot and camera position.

extreme close-up on hands
medium shot, slightly low angle
wide establishing shot
over-the-shoulder perspective

Without this, models tend to default to whatever is compositionally safe, often a static medium shot.

—

Block 3: Motion and pacing

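Camera movement and scene energy. Both are separate from the subject motion in Block 1: the subject can move while the camera stays locked off, and the camera can move around a still subject.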

Camera: slow dolly forward, static locked-off shot, gentle handheld drift, crane shot rising slowly

Pacing: unhurried, frenetic, calm and meditative, building tension

—

Block 4: Lighting

Be specific. “Good lighting” means nothing. Name the source, the direction, or the quality of the light.

golden hour backlighting
cool blue neon ambient, rain-wet reflections
harsh overhead fluorescent
soft window light from the left

—

Block 5: Lens and focal length

This is the most underused block and one of the highest-leverage ones.

85mm portrait lens, shallow depth of field, background bokeh
24mm wide, everything in focus
telephoto compression, 200mm
fisheye lens distortion

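Lens terms dramatically shift the feel of the output, because a focal length implies a field of view, a degree of background compression, and a depth of field all at once.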

—

Block 6: Style and negative constraints

Visual style and what to exclude.

Style: cinematic film look, subtle grain, documentary style, naturalistic, hyper-real commercial aesthetic

Negatives: no text, no watermarks, no jump cuts, no camera shake (add whatever the specific tool responds to)

—

Assembled example

A woman sitting at a cafe window, slowly stirring her coffee, glancing toward the street. Medium shot, slightly low angle. Static locked-off camera, calm and unhurried pacing. Golden hour backlighting from the window. 85mm portrait lens, shallow depth of field, background bokeh. Cinematic film look, subtle grain. No text, no watermarks.
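
If you build prompts in a script rather than by hand, the "write one block at a time, then merge" step can be sketched roughly as below. This is a minimal illustration, not any tool's API: the `VideoPrompt` class, its field names, and the `assemble` method are all hypothetical names made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class VideoPrompt:
    """One field per block. Names are illustrative, not part of any tool's API."""
    subject_action: str
    camera_shot: str = ""
    motion_pacing: str = ""
    lighting: str = ""
    lens: str = ""
    style: str = ""
    negatives: list[str] = field(default_factory=list)

    def assemble(self) -> str:
        """Merge the non-empty blocks into one prompt, one sentence per block."""
        blocks = [
            self.subject_action,
            self.camera_shot,
            self.motion_pacing,
            self.lighting,
            self.lens,
            self.style,
        ]
        # Trim whitespace and trailing periods, then drop blocks left empty.
        cleaned = (b.strip().rstrip(".") for b in blocks)
        sentences = [s for s in cleaned if s]
        if self.negatives:
            # Negatives become a single "no X, no Y" sentence at the end.
            sentences.append(", ".join(f"no {n}" for n in self.negatives))
        # Capitalize only the first character so terms like "85mm" stay intact.
        return " ".join(s[0].upper() + s[1:] + "." for s in sentences)


prompt = VideoPrompt(
    subject_action="a woman sitting at a cafe window, slowly stirring her coffee, "
                   "glancing toward the street",
    camera_shot="medium shot, slightly low angle",
    motion_pacing="static locked-off camera, calm and unhurried pacing",
    lighting="golden hour backlighting from the window",
    lens="85mm portrait lens, shallow depth of field, background bokeh",
    style="cinematic film look, subtle grain",
    negatives=["text", "watermarks"],
)
print(prompt.assemble())
```

Keeping each block in its own field makes it easy to swap one dimension (say, the lens) and regenerate while everything else stays fixed.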

—

Tips per tool

Runway: responds well to camera instruction terms, generous with style keywords
Kling: rewards shorter prompts, benefits most from explicit negative constraints
Veo: documentary and cinematic reference styles trigger its best output
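
One way to act on these tips in a script is to keep the blocks in a dictionary and pick a per-tool selection and order. The sketch below is only a guess at how the tips might translate: the `PROFILES` table, the `build_for` function, and the specific choices (trimming lighting, lens, and style for Kling; leading with style for Veo) are assumptions for illustration, not documented behavior of any of these tools.

```python
# Blocks keyed by name; wording is shortened from the cafe example.
BLOCKS = {
    "subject": "a woman at a cafe window, slowly stirring her coffee",
    "camera": "medium shot, slightly low angle",
    "motion": "static locked-off camera, calm pacing",
    "lighting": "golden hour backlighting",
    "lens": "85mm portrait lens, shallow depth of field",
    "style": "cinematic film look, subtle grain",
    "negatives": "no text, no watermarks",
}

# Hypothetical profiles: which blocks to keep, and in what order, per tool.
# These encode the tips loosely and are assumptions, not vendor guidance.
PROFILES = {
    # Keep everything, including camera terms and style keywords.
    "runway": ["subject", "camera", "motion", "lighting", "lens", "style", "negatives"],
    # Shorter prompt, but always keep the explicit negatives.
    "kling": ["subject", "camera", "motion", "negatives"],
    # Lead with the cinematic/documentary style reference.
    "veo": ["style", "subject", "camera", "motion", "lighting", "lens", "negatives"],
}


def build_for(tool: str, blocks: dict[str, str]) -> str:
    """Join the blocks selected for `tool` into one prompt, one sentence each."""
    parts = [blocks[name] for name in PROFILES[tool] if blocks.get(name)]
    # Capitalize only the first character of each block.
    return " ".join(p[0].upper() + p[1:] + "." for p in parts)


for tool in PROFILES:
    print(f"{tool}: {build_for(tool, BLOCKS)}")
```

Treat the profiles as a starting point to tune against your own test generations, since what each tool rewards changes between model versions.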

What structures are working for you? Drop your go-to video prompt formats below.

Curated by Selendia AI 🎥