# Sora 2
Sora 2 is OpenAI’s video generation model. Compared to first-generation video models, Sora 2 produces noticeably stronger physical realism — cause-and-effect plays out plausibly, missed shots ricochet rather than teleport, and characters behave consistently across cuts. The Pro variant generates a synchronized audio track alongside the video.
## Variants

Sora 2 ships in two variants on Prodia:
| Variant | Resolution | Audio | Best for |
|---|---|---|---|
| Sora 2 | Fixed (720p-class) | No | Rapid iteration, lower cost |
| Sora 2 Pro | 720p or 1080p | Yes (synchronized) | Final output, content with dialogue or sound design |
Both variants accept text-to-video and image-to-video inputs and produce 4, 8, or 12 second clips.
## What sets Sora 2 apart

**Physical plausibility:** Sora 2’s defining property is that it gets physics roughly right — basketballs miss the rim and bounce off the backboard, water settles, momentum carries through a gesture. Earlier text-to-video models tend to bend the world to satisfy the prompt, deleting or warping objects to force the requested outcome. Sora 2 is more willing to let an action fail, which produces more usable footage for narrative content.

**Synchronized audio (Pro):** Sora 2 Pro generates the audio track jointly with the video, so dialogue, footsteps, and ambient sound line up with what’s happening on screen. There’s no separate foley pass. Ambient cues described in the prompt (for example “rain on a metal roof”, “crowd chatter”) are produced as part of the same generation.

**Steerable via prompt:** The model responds well to detailed cinematographic direction — shot type, lens, lighting, camera motion — and to multi-shot prompts that specify a sequence of scenes within a single clip.

**Image-to-video animation:** The img2vid job types accept a still image and a motion prompt describing how the scene should evolve. Useful for animating product shots, character portraits, or storyboard frames.
## When to use Sora 2

- Narrative content with dialogue or sound — Sora 2 Pro is the right choice when you need an audio track baked in
- Action sequences — sports, stunts, and physics-driven scenes benefit from Sora 2’s grounded behavior
- Storyboard animation — animate a still image into a short clip without a separate foley step
- Vertical and landscape social — both 16:9 and 9:16 are supported natively
For longer clips with audio control or audio-driven generation, see Wan 2.7. For the fastest video generation on Prodia (~22s), use Wan 2.2 Lightning. For precise camera choreography (dolly, pan, zoom presets) or motion masking, see Kling. For another joint audio-visual model with last-frame transition control, see Veo.
## Job types

| Job type | Description | Audio | Resolution |
|---|---|---|---|
| `inference.sora-2.txt2vid.v1` | Sora 2 text-to-video | No | Fixed |
| `inference.sora-2.img2vid.v1` | Sora 2 image-to-video | No | Fixed |
| `inference.sora-2.pro.txt2vid.v1` | Sora 2 Pro text-to-video | Yes | 720p or 1080p |
| `inference.sora-2.pro.img2vid.v1` | Sora 2 Pro image-to-video | Yes | 720p or 1080p |
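The four job types differ only along two axes, variant and input mode, so the identifier can be assembled mechanically. A minimal sketch; the helper name is illustrative and not part of any Prodia SDK:

```python
def sora2_job_type(pro: bool = False, image_input: bool = False) -> str:
    """Return the Sora 2 job type identifier for a variant/input combination.

    pro         -- True for Sora 2 Pro (synchronized audio, 720p/1080p)
    image_input -- True for image-to-video, False for text-to-video
    """
    variant = "sora-2.pro" if pro else "sora-2"
    mode = "img2vid" if image_input else "txt2vid"
    return f"inference.{variant}.{mode}.v1"


# The four combinations map to the four rows of the table above:
print(sora2_job_type())                            # inference.sora-2.txt2vid.v1
print(sora2_job_type(pro=True, image_input=True))  # inference.sora-2.pro.img2vid.v1
```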
## Parameters

Common to all Sora 2 job types:

- `prompt` (required) — text description, 3 to 4,096 characters
- `aspect_ratio` — `16:9` (default) or `9:16`
- `duration` — `4` (default), `8`, or `12` seconds
- `seed` — integer 1 to 2,147,483,647 for reproducible results

Pro variants only:

- `resolution` — `720p` (default) or `1080p`

Image-to-video only:

- `image` — input image filename to animate. The image is referenced from the multipart upload.
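These ranges can be checked client-side before a job is submitted. A hedged sketch: the validation rules mirror the parameter list in this section, but the helper itself is illustrative, not part of any Prodia SDK:

```python
def build_sora2_config(prompt, aspect_ratio="16:9", duration=4,
                       seed=None, resolution=None, image=None):
    """Build a Sora 2 config dict, enforcing the documented parameter ranges."""
    if not (3 <= len(prompt) <= 4096):
        raise ValueError("prompt must be 3 to 4,096 characters")
    if aspect_ratio not in ("16:9", "9:16"):
        raise ValueError("aspect_ratio must be 16:9 or 9:16")
    if duration not in (4, 8, 12):
        raise ValueError("duration must be 4, 8, or 12 seconds")
    if seed is not None and not (1 <= seed <= 2_147_483_647):
        raise ValueError("seed must be 1 to 2,147,483,647")
    if resolution is not None and resolution not in ("720p", "1080p"):
        raise ValueError("resolution (Pro only) must be 720p or 1080p")

    config = {"prompt": prompt, "aspect_ratio": aspect_ratio, "duration": duration}
    if seed is not None:
        config["seed"] = seed
    if resolution is not None:
        config["resolution"] = resolution  # Pro variants only
    if image is not None:
        config["image"] = image  # img2vid only; filename of the multipart upload
    return config
```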
## Prompting tips

- Lead with action, not subject: “A cyclist sprints up a steep hill, pedals out of the saddle” produces better motion than “a cyclist on a hill”
- Describe the soundscape (Pro): mention diegetic sound — “tires on gravel”, “wind through pines”, “low ambient room tone” — when using Sora 2 Pro
- Cinematographic direction works: terms like “handheld”, “tracking shot”, “rack focus”, “shallow depth of field”, “golden hour” are interpreted as expected
- Animation prompts (img2vid): describe how the existing scene should change rather than re-describing it — “the camera dollies in slowly as the subject turns to face it”
- Use seeds for iteration: hold the seed constant while you tweak the prompt to see how each phrase changes the output
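The last tip, holding the seed constant while varying the prompt, is easy to script as a batch of job bodies. A sketch with illustrative values:

```python
# Fix the seed so each generated clip differs only in its prompt wording.
SEED = 123456  # arbitrary value in the documented 1..2,147,483,647 range

prompt_variants = [
    "A cyclist sprints up a steep hill",
    "A cyclist sprints up a steep hill, pedals out of the saddle",
    "A cyclist sprints up a steep hill, handheld tracking shot, golden hour",
]

jobs = [
    {
        "type": "inference.sora-2.txt2vid.v1",
        "config": {"prompt": p, "aspect_ratio": "16:9", "duration": 4, "seed": SEED},
    }
    for p in prompt_variants
]
# Every job shares the seed, so output differences are attributable to the prompt.
```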
## Examples

Text-to-video (Sora 2 standard):

```json
{
  "type": "inference.sora-2.txt2vid.v1",
  "config": {
    "prompt": "A close-up cinematic shot of a golden retriever puppy bounding through a field of wildflowers at sunrise, soft warm light, slow motion",
    "aspect_ratio": "16:9",
    "duration": 4
  }
}
```

Text-to-video with audio (Sora 2 Pro at 1080p):

```json
{
  "type": "inference.sora-2.pro.txt2vid.v1",
  "config": {
    "prompt": "A barista in an empty cafe pulls an espresso shot at golden hour. The grinder hums, steam hisses, the milk pitcher clinks against the bar. Shallow depth of field, warm window light.",
    "resolution": "1080p",
    "aspect_ratio": "16:9",
    "duration": 8
  }
}
```

Image-to-video animation:

```json
{
  "type": "inference.sora-2.img2vid.v1",
  "config": {
    "image": "product-shot.jpg",
    "prompt": "Slow turntable rotation, soft studio lighting, the product turns to reveal the back face",
    "aspect_ratio": "16:9",
    "duration": 4
  }
}
```
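A job body like the ones above is submitted over HTTP. The sketch below only constructs the request and does not send it; the endpoint URL and token are placeholders, since this page does not specify them — check the Prodia API reference for the real URL, auth header, and multipart layout for image uploads:

```python
import json
import urllib.request

# Placeholder values -- substitute the real endpoint and token from the API reference.
ENDPOINT = "https://example.com/v2/job"
API_TOKEN = "YOUR_API_TOKEN"

job = {
    "type": "inference.sora-2.txt2vid.v1",
    "config": {
        "prompt": "A close-up cinematic shot of a golden retriever puppy "
                  "bounding through a field of wildflowers at sunrise",
        "aspect_ratio": "16:9",
        "duration": 4,
    },
}

request = urllib.request.Request(
    ENDPOINT,
    data=json.dumps(job).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_TOKEN}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(request) would submit the job; omitted here.
```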