
Wan 2.2 Lightning

Wan 2.2 Lightning is Alibaba’s fast video generation model from the Wan family. It generates a video in around 22 seconds, making it practical for interactive applications and high-volume pipelines.

Wan 2.2 Lightning uses 14B active parameters in a Diffusion Transformer (DiT) architecture with three key components:

  • T5 text encoder — encodes multilingual text input with cross-attention in each transformer block
  • Spatiotemporal 3D VAE — compresses video frames simultaneously across space and time, dramatically reducing compute requirements
  • DiT backbone — processes the compressed latent space with shared MLP modules across transformer blocks
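To see why the VAE compression matters for compute, here is a toy NumPy sketch of the text-encoding and compression stages. The shapes, the 4× temporal and 8×8 spatial factors, and the block-average "encoder" are illustrative stand-ins only; the real components are learned networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def t5_encode(text):
    # Stand-in for the T5 text encoder: one embedding per token,
    # consumed via cross-attention in each transformer block.
    return rng.standard_normal((len(text.split()), 16))

def vae_encode(frames):
    # Stand-in for the spatiotemporal 3D VAE: block-averages frames
    # with illustrative 4x temporal and 8x8 spatial factors.
    t, h, w, c = frames.shape
    blocks = frames.reshape(t // 4, 4, h // 8, 8, w // 8, 8, c)
    return blocks.mean(axis=(1, 3, 5))

text_emb = t5_encode("a cat walking through a garden")
frames = rng.standard_normal((16, 64, 64, 3))   # (T, H, W, C) toy video
latent = vae_encode(frames)                     # (4, 8, 8, 3): 256x fewer values
```

Because the DiT backbone only ever sees the compressed latent, every denoising step operates on 256× fewer values than raw frames in this toy setup.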

Wan 2.2 Lightning generates video in just 4 diffusion steps without requiring classifier-free guidance (CFG), enabling fast generation while maintaining strong visual quality.
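As a rough illustration of few-step, CFG-free sampling, the loop below runs a fixed number of denoising steps with a single conditional forward pass per step. `toy_denoiser` is a stand-in for the DiT backbone, not the real model; the update rule is invented for the sketch.

```python
import numpy as np

def toy_denoiser(latent, t, cond):
    # Stand-in for the DiT backbone: nudges the latent toward the
    # conditioning. The real model predicts noise, not this update.
    return 0.5 * (cond - latent)

def sample(cond, steps=4, seed=0):
    rng = np.random.default_rng(seed)
    latent = rng.standard_normal(cond.shape)      # start from pure noise
    for t in np.linspace(1.0, 0.0, steps, endpoint=False):
        # One forward pass per step; no second unconditional pass,
        # so no classifier-free guidance mixing is needed.
        latent = latent + toy_denoiser(latent, t, cond)
    return latent

cond = np.ones(8)              # stand-in text conditioning
video_latent = sample(cond)    # 4 denoising steps total
```

Skipping the unconditional pass that CFG requires halves the forward passes per step, which is part of what makes a 4-step schedule so fast.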

Common use cases:

  • Social media content — fast turnaround for short-form video at 720p
  • Rapid prototyping — quickly test video concepts before committing to longer, higher-quality generation
  • High-volume pipelines — the ~22s generation time makes batch processing practical
  • Image animation — bring product shots, illustrations, or photos to life with the img2vid mode

For higher resolution (1080p), longer duration (up to 15s), or features like audio-driven lip sync and video continuation, consider Wan 2.7 instead.

Job type                               Description
inference.wan2-2.lightning.txt2vid.v0  Generate a video from a text prompt
inference.wan2-2.lightning.img2vid.v0  Generate a video from an input image and prompt

Parameters:

  • prompt (required) — text description of the video to generate, up to 2,500 characters
  • resolution — output resolution: 720p (default, 1280x720) or 480p (832x480)
  • seed — integer seed for reproducible results
  • image (img2vid only) — filename of the input image to animate
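A small helper can assemble a job payload and enforce the documented limits before submission. The `build_txt2vid_job` function and its validation rules are illustrative, not part of any official SDK:

```python
import json

MAX_PROMPT_CHARS = 2500
VALID_RESOLUTIONS = {"720p", "480p"}

def build_txt2vid_job(prompt, resolution="720p", seed=None):
    # Illustrative helper, not an official API: builds the job
    # payload and checks the documented parameter constraints.
    if not prompt:
        raise ValueError("prompt is required")
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt exceeds {MAX_PROMPT_CHARS} characters")
    if resolution not in VALID_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(VALID_RESOLUTIONS)}")
    config = {"prompt": prompt, "resolution": resolution}
    if seed is not None:
        config["seed"] = int(seed)   # fixed seed for reproducible results
    return {"type": "inference.wan2-2.lightning.txt2vid.v0", "config": config}

job = build_txt2vid_job("A cat walking slowly through a garden", seed=42)
payload = json.dumps(job)
```

Validating locally catches over-long prompts or unsupported resolutions before a job is queued.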

Wan 2.2 Lightning responds well to specific, action-oriented prompts. Include details about movement, camera angle, and visual style:

  • Be specific about motion: “A cat walking slowly through a garden” works better than “a cat in a garden”
  • Include visual style cues: “cinematic lighting”, “slow motion”, “4k” help guide quality
  • Describe camera movement: “tracking shot”, “pan left”, “aerial view” improve spatial coherence
  • Keep it concise: the model performs best with focused, clear prompts rather than long descriptions
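One way to keep prompts focused is to assemble them from the elements above: subject, motion, camera, and style. The helper below is purely illustrative:

```python
def compose_prompt(subject, motion, camera=None, style=None):
    # Illustrative helper: joins the prompt elements recommended above,
    # keeping the result short and action-oriented.
    parts = [f"{subject} {motion}"]
    if camera:
        parts.append(camera)
    if style:
        parts.append(style)
    return ", ".join(parts)

prompt = compose_prompt("A cat", "walking slowly through a garden",
                        camera="tracking shot", style="cinematic lighting")
# -> "A cat walking slowly through a garden, tracking shot, cinematic lighting"
```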

Text-to-video:

{
  "type": "inference.wan2-2.lightning.txt2vid.v0",
  "config": {
    "prompt": "Two anthropomorphic cats boxing on a spotlighted stage, cinematic lighting, dynamic camera angles",
    "resolution": "720p"
  }
}

Image-to-video (animate a still image):

{
  "type": "inference.wan2-2.lightning.img2vid.v0",
  "config": {
    "prompt": "The person slowly turns their head and smiles, natural movement",
    "image": "portrait.jpg",
    "resolution": "720p"
  }
}