Seedance

Seedance is ByteDance’s video generation model family, developed by the team behind Doubao and the Seedream image models. It is built for fast, prompt-faithful video generation: a single short prompt produces a coherent 5- or 10-second clip at up to 1080p, with strong adherence to camera-direction language (“dolly in”, “tracking shot”, “static”) and good motion stability across frames.

Architecture

Seedance is a Diffusion Transformer (DiT) trained jointly on text-to-video and image-to-video objectives. Two design choices stand out for API users:

Decoupled spatial and temporal layers — the model alternates spatial-attention blocks (per-frame composition) with temporal-attention blocks (cross-frame motion). This split lets it learn per-frame appearance and cross-frame motion separately, which produces fewer flicker artifacts on textures like fur, fabric, and water than single-stream architectures.
Multi-shot training — Seedance is trained on multi-shot sequences, so a single prompt that describes a brief sequence of actions (“the dog runs to the camera, then leaps over the puddle”) tends to produce a continuous shot that follows the described beats rather than a single static motion.

Variant comparison

The Seedance family on Prodia ships in two variants. Pro Turbo is the default starting point for almost all use cases; Pro exists for cases where you need maximum quality and are willing to pay more for it.

Feature	Pro Turbo	Pro
Job type prefix	`inference.seedance.proturbo.*`	`inference.seedance.pro.*`
Resolutions	480p, 1080p	480p, 1080p
Aspect ratios (txt2vid)	16:9, 9:16, 1:1, 4:3, 3:4, 21:9	16:9, 9:16, 1:1, 4:3, 3:4, 21:9
Duration	5s or 10s	5s or 10s
Generation time	~45s	~60s
Cost (1080p, 5s)	lower (around half of Pro)	higher

Pro Turbo — the recommended default. An optimized tier (distillation or similar) with the same API surface as Pro and ~25% faster generation (~45s versus ~60s). The quality difference for typical prompts is small enough that Pro Turbo is the right choice unless you have a specific reason to pay more.

Pro — full-quality variant. Useful when you have a finished prompt that didn’t quite render the way you wanted on Pro Turbo and you want one more pass with the higher-quality model before iterating on the prompt.

When to use Seedance

Use case	Job type	Why Seedance
Short cinematic clip from a prompt	`proturbo.txt2vid`	~45s generation at 1080p, strong camera-direction prompting
Animating a hero image / product shot	`proturbo.img2vid`	Faithful starting frame + natural motion in 5s
Loopable social content	`proturbo.txt2vid` with `aspect_ratio: 9:16`	Native portrait support for TikTok/Reels
Final-pass quality on a locked prompt	`pro.txt2vid` / `pro.img2vid`	Slightly higher fidelity than Pro Turbo at higher cost

For audio-driven video (lip-sync from a portrait + audio file) or first-and-last-frame interpolation, Wan 2.7 is the better fit — Seedance does not accept audio or a last_frame. For very low-latency 720p-only generation, Wan 2.2 Lightning generates in ~22s. For the most precise camera choreography (per-axis pan, tilt, roll, zoom), Kling exposes those controls explicitly.
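
The routing guidance above can be sketched as a small decision helper. This is only an illustration: the function name, the flag names, and the order in which the checks are applied are assumptions, not part of any API.

```python
def pick_model(
    *,
    needs_audio: bool = False,
    needs_last_frame: bool = False,
    needs_per_axis_camera: bool = False,
    latency_critical_720p: bool = False,
) -> str:
    """Return the model family suggested by the guidance above.

    Hypothetical helper; the priority order of the checks is an assumption.
    """
    # Seedance accepts neither audio nor a last_frame.
    if needs_audio or needs_last_frame:
        return "Wan 2.7"
    # Explicit pan / tilt / roll / zoom controls.
    if needs_per_axis_camera:
        return "Kling"
    # 720p-only generation in roughly 22 seconds.
    if latency_critical_720p:
        return "Wan 2.2 Lightning"
    return "Seedance"
```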

Job types

Job type	Description	ETA
`inference.seedance.proturbo.txt2vid.v1`	Text-to-video, Pro Turbo	~45s
`inference.seedance.proturbo.img2vid.v1`	Image-to-video from a starting frame, Pro Turbo	~45s
`inference.seedance.pro.txt2vid.v1`	Text-to-video, Pro	~60s
`inference.seedance.pro.img2vid.v1`	Image-to-video from a starting frame, Pro	~60s
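
The four job types follow one naming pattern, so they can be composed rather than hard-coded. A minimal sketch, assuming a hypothetical helper named job_type; the ETA values are the approximate figures from the table above.

```python
VARIANTS = ("proturbo", "pro")
MODES = ("txt2vid", "img2vid")

# Approximate generation times from the table above, in seconds.
ETA_SECONDS = {"proturbo": 45, "pro": 60}


def job_type(variant: str = "proturbo", mode: str = "txt2vid", version: str = "v1") -> str:
    """Compose a job type such as inference.seedance.proturbo.txt2vid.v1."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {VARIANTS}")
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {MODES}")
    return f"inference.seedance.{variant}.{mode}.{version}"
```

Defaulting to proturbo mirrors the recommendation to start with Pro Turbo and switch to Pro only for a final pass.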

Parameters

Common to all job types:

prompt (required) — text description of the desired clip, 3–4,096 characters
resolution — "480p" or "1080p" (default: "1080p"). Exact width and height vary with aspect ratio.
duration — 5 or 10 seconds (default: 5)
camera_fixed — true to discourage camera motion; this is a hint, so a static camera is not guaranteed (default: false)
watermark — true to add an “AI-generated” watermark (default: false)
seed — integer for reproducible results

Text-to-video only:

aspect_ratio — "16:9" (default), "9:16", "1:1", "4:3", "3:4", or "21:9"

Image-to-video only:

first_frame — filename of the starting frame, supplied as a multipart form file. Aspect ratio of the output follows the input image.
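
The constraints listed above can be checked client-side before a job is submitted. The following is a sketch, not the API's own validation: build_config is a hypothetical helper, and rejecting aspect_ratio on image-to-video jobs is an assumption based on the parameter being listed as text-to-video only.

```python
from typing import Any, Dict, Optional

RESOLUTIONS = ("480p", "1080p")
DURATIONS = (5, 10)
ASPECT_RATIOS = ("16:9", "9:16", "1:1", "4:3", "3:4", "21:9")


def build_config(
    prompt: str,
    *,
    mode: str = "txt2vid",
    resolution: str = "1080p",
    duration: int = 5,
    camera_fixed: bool = False,
    watermark: bool = False,
    seed: Optional[int] = None,
    aspect_ratio: Optional[str] = None,
    first_frame: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a config dict using the documented defaults and limits."""
    if not 3 <= len(prompt) <= 4096:
        raise ValueError("prompt must be 3 to 4,096 characters")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"resolution must be one of {RESOLUTIONS}")
    if duration not in DURATIONS:
        raise ValueError(f"duration must be one of {DURATIONS}")

    config: Dict[str, Any] = {
        "prompt": prompt,
        "resolution": resolution,
        "duration": duration,
    }
    # Only send optional flags when they differ from the documented defaults.
    if camera_fixed:
        config["camera_fixed"] = True
    if watermark:
        config["watermark"] = True
    if seed is not None:
        config["seed"] = seed

    if mode == "txt2vid":
        if first_frame is not None:
            raise ValueError("first_frame is image-to-video only")
        ratio = aspect_ratio or "16:9"
        if ratio not in ASPECT_RATIOS:
            raise ValueError(f"aspect_ratio must be one of {ASPECT_RATIOS}")
        config["aspect_ratio"] = ratio
    elif mode == "img2vid":
        if aspect_ratio is not None:
            raise ValueError("aspect_ratio is text-to-video only")
        if not first_frame:
            raise ValueError("first_frame is required for image-to-video")
        config["first_frame"] = first_frame
    else:
        raise ValueError("mode must be 'txt2vid' or 'img2vid'")
    return config
```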

Prompting tips

Describe the motion, not just the scene — Seedance is trained to follow camera and subject-motion language. A prompt like "slow dolly in on the puppy, shallow depth of field" produces noticeably more intentional motion than "a puppy".
Use camera_fixed for product shots — when you want the subject to move but the camera held still (e.g. a rotating product on a tabletop), set camera_fixed: true and describe the subject motion in the prompt.
Match aspect ratio to platform — 9:16 for TikTok/Reels/Shorts, 16:9 for YouTube/landing pages, 1:1 for Instagram feed, 21:9 for cinematic banners.
480p is genuinely useful for iteration — 480p costs roughly an order of magnitude less than 1080p and finishes in about the same wall-clock time. Iterate on prompts at 480p, then switch the same prompt to 1080p for the final render.
Short, declarative prompts beat long ones — Seedance handles 1–2 sentence prompts well. Very long prompts tend to get partially ignored; if you need to describe a sequence of actions, separate beats with commas rather than long subordinate clauses.
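
The iterate-at-480p tip can be expressed as a pair of job payloads that differ only in resolution. A minimal sketch: draft_and_final is a hypothetical helper, and pinning a seed is a suggestion to keep successive drafts comparable. Whether a seed yields the same composition at a different resolution is not stated here.

```python
import copy
from typing import Any, Dict, Tuple


def draft_and_final(job: Dict[str, Any], seed: int = 42) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (draft, final) copies of a job: 480p for iteration, 1080p for the last render."""
    draft = copy.deepcopy(job)  # leave the caller's payload untouched
    draft["config"]["resolution"] = "480p"
    draft["config"]["seed"] = seed

    final = copy.deepcopy(draft)
    final["config"]["resolution"] = "1080p"
    return draft, final
```

Submit the draft while tuning the prompt, then copy the settled prompt into the final payload and submit it once.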

Examples

Text-to-video at 1080p, 16:9, 5 seconds:

{
  "type": "inference.seedance.proturbo.txt2vid.v1",
  "config": {
    "prompt": "a golden retriever puppy running through a field of wildflowers in slow motion, cinematic lighting, hyperrealistic",
    "resolution": "1080p",
    "aspect_ratio": "16:9",
    "duration": 5
  }
}

Sample frame from the generated clip (480p preview shown for size):

[Image: Seedance Pro Turbo text-to-video output, golden retriever puppy]

Vertical 9:16 clip for TikTok/Reels with a fixed camera:

{
  "type": "inference.seedance.proturbo.txt2vid.v1",
  "config": {
    "prompt": "a barista pouring latte art into a cup, steam rising, warm morning light",
    "resolution": "1080p",
    "aspect_ratio": "9:16",
    "duration": 5,
    "camera_fixed": true
  }
}

Image-to-video — animate a still frame with a described camera move. The first_frame field names the file, which is uploaded alongside the job as multipart form-data:

{
  "type": "inference.seedance.proturbo.img2vid.v1",
  "config": {
    "first_frame": "cyberpunk-input.jpg",
    "prompt": "the camera slowly pushes forward through the cyberpunk market, neon signs flicker, people walk by",
    "resolution": "1080p",
    "duration": 5
  }
}
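
To make the multipart upload concrete, the sketch below encodes the job JSON and the image into one multipart/form-data body using only the standard library. The part names "job" and "input" are assumptions and should be checked against the Async API Reference. The only detail taken from this page is that the uploaded file's name matches first_frame.

```python
import json
import mimetypes
import uuid
from typing import Any, Dict, Tuple


def encode_multipart(job: Dict[str, Any], filename: str, file_bytes: bytes) -> Tuple[bytes, str]:
    """Return (body, content_type) for a multipart/form-data request."""
    if job["config"].get("first_frame") != filename:
        raise ValueError("config.first_frame must name the uploaded file")

    boundary = uuid.uuid4().hex
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"

    # Part 1: the job description as JSON (part name "job" is an assumption).
    job_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="job"\r\n'
        "Content-Type: application/json\r\n\r\n"
    ).encode() + json.dumps(job).encode() + b"\r\n"

    # Part 2: the starting frame (part name "input" is an assumption).
    file_part = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="input"; filename="{filename}"\r\n'
        f"Content-Type: {file_type}\r\n\r\n"
    ).encode() + file_bytes + b"\r\n"

    closing = f"--{boundary}--\r\n".encode()
    body = job_part + file_part + closing
    return body, f"multipart/form-data; boundary={boundary}"
```

The returned content type goes in the request's Content-Type header. Any HTTP client can then POST the body to the job endpoint described in the Async API Reference.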

Input frame:

[Image: Seedance img2vid input, cyberpunk street]

Sample frame from the generated clip — the camera has pushed forward through the same scene:

[Image: Seedance Pro Turbo img2vid output frame]

Guides

Generating Videos: Step-by-step tutorial covering text-to-video and image-to-video against the v2 inference API.

Polling Async Jobs: The submit → poll → download flow used by all Seedance job types.

Async API Reference: Endpoint reference for /v2/job/async.