Seedance
Seedance is ByteDance’s video generation model family, developed by the team behind Doubao and the Seedream image models. It is built for fast, prompt-faithful video generation: a single short prompt produces a coherent 5- or 10-second clip at up to 1080p, with strong adherence to camera-direction language (“dolly in”, “tracking shot”, “static”) and good motion stability across frames.
Architecture
Seedance is a Diffusion Transformer (DiT) trained jointly on text-to-video and image-to-video objectives. Two design choices stand out for API users:
- Decoupled spatial and temporal layers — the model alternates spatial-attention blocks (per-frame composition) with temporal-attention blocks (cross-frame motion). This split lets it learn fine-grained appearance and physics independently, which produces fewer flicker artifacts on textures like fur, fabric, and water than single-stream architectures.
- Multi-shot training — Seedance is trained on multi-shot sequences, so a single prompt that describes a brief sequence of actions (“the dog runs to the camera, then leaps over the puddle”) tends to produce a continuous shot that follows the described beats rather than a single static motion.
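To make the spatial/temporal split concrete, here is a toy sketch of which tokens attend to each other in each kind of block. It illustrates factorized attention in general and is not Seedance's actual implementation; the function names and the frame-major token layout are assumptions made for the example.

```python
def spatial_groups(num_frames: int, tokens_per_frame: int) -> list[list[int]]:
    """Spatial block: tokens attend only within their own frame (one group per frame)."""
    return [
        [f * tokens_per_frame + p for p in range(tokens_per_frame)]
        for f in range(num_frames)
    ]


def temporal_groups(num_frames: int, tokens_per_frame: int) -> list[list[int]]:
    """Temporal block: tokens attend across frames at one spatial position."""
    return [
        [f * tokens_per_frame + p for f in range(num_frames)]
        for p in range(tokens_per_frame)
    ]


# A toy clip: 3 frames, 4 patch tokens per frame, flattened frame-major.
print(spatial_groups(3, 4))   # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
print(temporal_groups(3, 4))  # [[0, 4, 8], [1, 5, 9], [2, 6, 10], [3, 7, 11]]
```

Each spatial group covers one frame's composition, while each temporal group follows a single patch position through time, which is where cross-frame motion is modeled.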
Variant comparison
The Seedance family on Prodia ships in two variants. Pro Turbo is the default starting point for almost all use cases; Pro exists for cases where you need maximum quality and are willing to pay more for it.
| Feature | Pro Turbo | Pro |
|---|---|---|
| Job type prefix | inference.seedance.proturbo.* | inference.seedance.pro.* |
| Resolutions | 480p, 1080p | 480p, 1080p |
| Aspect ratios (txt2vid) | 16:9, 9:16, 1:1, 4:3, 3:4, 21:9 | 16:9, 9:16, 1:1, 4:3, 3:4, 21:9 |
| Duration | 5s or 10s | 5s or 10s |
| Generation time | ~45s | ~60s |
| Cost (1080p, 5s) | lower (around half of Pro) | higher |
Pro Turbo — the recommended default. A distilled/optimized tier with the same API surface as Pro and roughly 25% shorter generation time (~45s versus ~60s). The quality difference for typical prompts is small enough that Pro Turbo is the right choice unless you have a specific reason to pay more.
Pro — full-quality variant. Useful when you have a finished prompt that didn’t quite render the way you wanted on Pro Turbo and you want one more pass with the higher-quality model before iterating on the prompt.
When to use Seedance
| Use case | Job type | Why Seedance |
|---|---|---|
| Short cinematic clip from a prompt | proturbo.txt2vid | ~45s generation at 1080p, strong camera-direction prompting |
| Animating a hero image / product shot | proturbo.img2vid | Faithful starting frame + natural motion in 5s |
| Loopable social content | proturbo.txt2vid with aspect_ratio: 9:16 | Native portrait support for TikTok/Reels |
| Final-pass quality on a locked prompt | pro.txt2vid / pro.img2vid | Slightly higher fidelity than Pro Turbo at higher cost |
For audio-driven video (lip-sync from a portrait + audio file) or first-and-last-frame interpolation, Wan 2.7 is the better fit — Seedance does not accept audio or a last_frame. For very low-latency 720p-only generation, Wan 2.2 Lightning generates in ~22s. For the most precise camera choreography (per-axis pan, tilt, roll, zoom), Kling exposes those controls explicitly.
Job types
| Job type | Description | ETA |
|---|---|---|
| inference.seedance.proturbo.txt2vid.v1 | Text-to-video, Pro Turbo | ~45s |
| inference.seedance.proturbo.img2vid.v1 | Image-to-video from a starting frame, Pro Turbo | ~45s |
| inference.seedance.pro.txt2vid.v1 | Text-to-video, Pro | ~60s |
| inference.seedance.pro.img2vid.v1 | Image-to-video from a starting frame, Pro | ~60s |
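The four job types share one naming pattern, so client code can assemble the string from a variant and a mode instead of hard-coding each one. The helper below is a minimal sketch; `seedance_job_type` and its constants are hypothetical names, not part of any Prodia SDK.

```python
VARIANTS = ("proturbo", "pro")
MODES = ("txt2vid", "img2vid")


def seedance_job_type(variant: str = "proturbo", mode: str = "txt2vid") -> str:
    """Build a Seedance job type string, rejecting unknown variants or modes."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}, expected one of {VARIANTS}")
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}, expected one of {MODES}")
    return f"inference.seedance.{variant}.{mode}.v1"


print(seedance_job_type())                  # inference.seedance.proturbo.txt2vid.v1
print(seedance_job_type("pro", "img2vid"))  # inference.seedance.pro.img2vid.v1
```

Defaulting to `proturbo` mirrors the recommendation to start with Pro Turbo and switch to Pro only for a final pass.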
Parameters
Common to all job types:
- `prompt` (required) — text description of the desired clip, 3–4,096 characters
- `resolution` — `"480p"` or `"1080p"` (default: `"1080p"`). Exact width/height varies with aspect ratio.
- `duration` — `5` or `10` seconds (default: `5`)
- `camera_fixed` — `true` to discourage camera motion (default: `false`). A fully static camera is not guaranteed.
- `watermark` — `true` to add an “AI-generated” watermark (default: `false`)
- `seed` — integer for reproducible results
Text-to-video only:
- `aspect_ratio` — `"16:9"` (default), `"9:16"`, `"1:1"`, `"4:3"`, `"3:4"`, or `"21:9"`
Image-to-video only:
- `first_frame` — filename of the starting frame, supplied as a multipart form file. Aspect ratio of the output follows the input image.
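The constraints above can be checked on the client before a job is submitted. The following is a sketch only: `build_config` is a hypothetical helper, and rejecting `aspect_ratio` on image-to-video jobs (or `first_frame` on text-to-video jobs) is an assumption about how strict a client should be, not documented server behavior.

```python
from typing import Any, Optional

RESOLUTIONS = ("480p", "1080p")
DURATIONS = (5, 10)
ASPECT_RATIOS = ("16:9", "9:16", "1:1", "4:3", "3:4", "21:9")


def build_config(
    prompt: str,
    *,
    mode: str = "txt2vid",
    resolution: str = "1080p",
    duration: int = 5,
    camera_fixed: bool = False,
    watermark: bool = False,
    seed: Optional[int] = None,
    aspect_ratio: Optional[str] = None,
    first_frame: Optional[str] = None,
) -> dict[str, Any]:
    """Validate parameters against the documented ranges and return a config dict."""
    if not 3 <= len(prompt) <= 4096:
        raise ValueError("prompt must be 3 to 4,096 characters")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"resolution must be one of {RESOLUTIONS}")
    if duration not in DURATIONS:
        raise ValueError(f"duration must be one of {DURATIONS}")

    config: dict[str, Any] = {
        "prompt": prompt,
        "resolution": resolution,
        "duration": duration,
    }
    # Optional flags are only sent when they differ from the documented defaults.
    if camera_fixed:
        config["camera_fixed"] = True
    if watermark:
        config["watermark"] = True
    if seed is not None:
        config["seed"] = seed

    if mode == "txt2vid":
        if first_frame is not None:
            raise ValueError("first_frame is image-to-video only")
        ratio = aspect_ratio or "16:9"
        if ratio not in ASPECT_RATIOS:
            raise ValueError(f"aspect_ratio must be one of {ASPECT_RATIOS}")
        config["aspect_ratio"] = ratio
    elif mode == "img2vid":
        if aspect_ratio is not None:
            raise ValueError("aspect_ratio is text-to-video only")
        if not first_frame:
            raise ValueError("img2vid requires first_frame")
        config["first_frame"] = first_frame
    else:
        raise ValueError("mode must be 'txt2vid' or 'img2vid'")
    return config


print(build_config("a barista pouring latte art", aspect_ratio="9:16", camera_fixed=True))
```

Failing early on a bad `duration` or an over-long prompt avoids spending a generation slot on a request that cannot succeed.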
Prompting tips
- Describe the motion, not just the scene — Seedance is trained to follow camera and subject-motion language. A prompt like `"slow dolly in on the puppy, shallow depth of field"` produces noticeably more intentional motion than `"a puppy"`.
- Use camera_fixed for product shots — when you want the subject to move but the camera held still (e.g. a rotating product on a tabletop), set `camera_fixed: true` and describe the subject motion in the prompt.
- Match aspect ratio to platform — `9:16` for TikTok/Reels/Shorts, `16:9` for YouTube/landing pages, `1:1` for Instagram feed, `21:9` for cinematic banners.
- 480p is genuinely useful for iteration — 480p costs roughly an order of magnitude less than 1080p and finishes in the same wall time. Iterate on prompts at 480p, then switch the same prompt to 1080p for the final render.
- Short, declarative prompts beat long ones — Seedance handles 1–2 sentence prompts well. Very long prompts tend to get partially ignored; if you need to describe a sequence of actions, separate beats with commas rather than long subordinate clauses.
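The platform mapping and the 480p-then-1080p workflow from the tips above can be combined into two small helpers. This is a sketch under assumptions: the platform keys and function names are invented for the example, and the source does not say whether the same seed yields the same composition at both resolutions, so the seed is simply carried over.

```python
from typing import Any

# Platform names are illustrative keys; the ratios come from the tip above.
PLATFORM_ASPECT = {
    "tiktok": "9:16",
    "reels": "9:16",
    "shorts": "9:16",
    "youtube": "16:9",
    "landing_page": "16:9",
    "instagram_feed": "1:1",
    "cinematic_banner": "21:9",
}


def draft_config(prompt: str, platform: str, seed: int) -> dict[str, Any]:
    """Cheap 480p config for iterating on a prompt."""
    return {
        "prompt": prompt,
        "resolution": "480p",
        "aspect_ratio": PLATFORM_ASPECT[platform],
        "duration": 5,
        "seed": seed,
    }


def final_config(draft: dict[str, Any]) -> dict[str, Any]:
    """Same prompt and settings as the draft, rendered at 1080p."""
    return {**draft, "resolution": "1080p"}


draft = draft_config("slow dolly in on the puppy, shallow depth of field", "reels", seed=42)
final = final_config(draft)
print(draft["resolution"], final["resolution"])  # 480p 1080p
```

Because `final_config` copies the draft rather than mutating it, the 480p config stays available for further iteration.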
Examples
Text-to-video at 1080p, 16:9, 5 seconds:
```json
{
  "type": "inference.seedance.proturbo.txt2vid.v1",
  "config": {
    "prompt": "a golden retriever puppy running through a field of wildflowers in slow motion, cinematic lighting, hyperrealistic",
    "resolution": "1080p",
    "aspect_ratio": "16:9",
    "duration": 5
  }
}
```
Sample frame from the generated clip (480p preview shown for size):

Vertical 9:16 clip for TikTok/Reels with a fixed camera:
```json
{
  "type": "inference.seedance.proturbo.txt2vid.v1",
  "config": {
    "prompt": "a barista pouring latte art into a cup, steam rising, warm morning light",
    "resolution": "1080p",
    "aspect_ratio": "9:16",
    "duration": 5,
    "camera_fixed": true
  }
}
```
Image-to-video — animate a still frame with a described camera move. The `first_frame` field names the file, which is uploaded alongside the job as multipart form-data:
```json
{
  "type": "inference.seedance.proturbo.img2vid.v1",
  "config": {
    "first_frame": "cyberpunk-input.jpg",
    "prompt": "the camera slowly pushes forward through the cyberpunk market, neon signs flicker, people walk by",
    "resolution": "1080p",
    "duration": 5
  }
}
```
Input frame:

Sample frame from the generated clip — the camera has pushed forward through the same scene:
