# Wan 2.2 Lightning
Wan 2.2 Lightning is Alibaba's fast video generation model from the Wan family. It generates a video in around 22 seconds, making it practical for interactive applications and high-volume pipelines.
## Architecture

Wan 2.2 Lightning uses 14B active parameters in a Diffusion Transformer (DiT) architecture with three key components:
- T5 text encoder — encodes multilingual text input with cross-attention in each transformer block
- Spatiotemporal 3D VAE — compresses video frames simultaneously across space and time, dramatically reducing compute requirements
- DiT backbone — processes the compressed latent space with shared MLP modules across transformer blocks
Wan 2.2 Lightning generates video in just 4 diffusion steps without requiring classifier-free guidance (CFG), enabling fast generation while maintaining strong visual quality.
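The step count matters because each diffusion step is a full forward pass through the DiT, and skipping CFG additionally avoids a second, unconditional pass per step. A minimal sketch of the resulting sampling loop follows; the `model` interface and timestep values here are illustrative assumptions, not the actual Wan API:

```python
def sample_video_latent(model, noisy_latent, text_embedding,
                        timesteps=(1000, 750, 500, 250)):
    """Toy 4-step denoising loop without classifier-free guidance.

    With CFG, each step would need two model calls (conditional and
    unconditional) plus a weighted blend; without it, one call suffices.
    """
    latent = noisy_latent
    for t in timesteps:  # exactly 4 steps, matching the text above
        latent = model(latent, t, text_embedding)  # one forward pass per step
    return latent

# Demo with a stand-in "model" that simply halves the latent each step.
toy_model = lambda latent, t, cond: latent * 0.5
print(sample_video_latent(toy_model, 16.0, None))  # → 1.0
```

By contrast, a CFG-based sampler at a typical 30-50 steps would run 60-100 model evaluations, which is the gap that makes ~22-second generation possible.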
## When to use Wan 2.2 Lightning

- Social media content — fast turnaround for short-form video at 720p
- Rapid prototyping — quickly test video concepts before committing to longer, higher-quality generation
- High-volume pipelines — the ~22s generation time makes batch processing practical
- Image animation — bring product shots, illustrations, or photos to life with the img2vid mode
For higher resolution (1080p), longer duration (up to 15s), or features like audio-driven lip sync and video continuation, consider Wan 2.7 instead.
## Job types

| Job type | Description |
|---|---|
| `inference.wan2-2.lightning.txt2vid.v0` | Generate a video from a text prompt |
| `inference.wan2-2.lightning.img2vid.v0` | Generate a video from an input image and a prompt |
## Parameters

- `prompt` (required) — text description of the video to generate, up to 2,500 characters
- `resolution` — output resolution: `720p` (default, 1280x720) or `480p` (832x480)
- `seed` — integer for reproducible results
- `image` (img2vid only) — input image filename to animate
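The documented constraints can be checked client-side before submitting a job. This is a hedged sketch: the function name and error messages are invented for illustration, and only the limits themselves come from the parameter list above:

```python
VALID_RESOLUTIONS = {"720p", "480p"}  # 1280x720 (default) and 832x480

def validate_config(config, img2vid=False):
    """Check a job config dict against the documented parameter rules."""
    prompt = config.get("prompt")
    if not prompt:
        raise ValueError("prompt is required")
    if len(prompt) > 2500:
        raise ValueError("prompt exceeds 2,500 characters")
    if config.get("resolution", "720p") not in VALID_RESOLUTIONS:
        raise ValueError("resolution must be '720p' or '480p'")
    if "seed" in config and not isinstance(config["seed"], int):
        raise ValueError("seed must be an integer")
    if img2vid and "image" not in config:
        raise ValueError("img2vid jobs require an input image")
    return config

validate_config({"prompt": "A cat walking through a garden",
                 "resolution": "480p", "seed": 42})  # passes
```

Catching these errors locally avoids wasting a ~22-second generation round trip on a request the service would reject.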
## Prompting tips

Wan 2.2 Lightning responds well to specific, action-oriented prompts. Include details about movement, camera angle, and visual style:
- Be specific about motion: “A cat walking slowly through a garden” works better than “a cat in a garden”
- Include visual style cues: “cinematic lighting”, “slow motion”, “4k” help guide quality
- Describe camera movement: “tracking shot”, “pan left”, “aerial view” improve spatial coherence
- Keep it concise: the model performs best with focused, clear prompts rather than long descriptions
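These tips can be applied mechanically: lead with the subject and its motion, then append optional style and camera cues. A small illustrative helper, not part of any official SDK:

```python
def build_prompt(subject, motion, style=None, camera=None):
    """Compose a focused prompt per the tips above: subject plus a
    specific motion first, then optional style and camera cues."""
    parts = [f"{subject} {motion}"]
    if style:
        parts.append(style)   # e.g. "cinematic lighting", "slow motion"
    if camera:
        parts.append(camera)  # e.g. "tracking shot", "aerial view"
    return ", ".join(parts)

print(build_prompt("A cat", "walking slowly through a garden",
                   style="cinematic lighting", camera="tracking shot"))
# → A cat walking slowly through a garden, cinematic lighting, tracking shot
```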
## Examples

Text-to-video:

```json
{
  "type": "inference.wan2-2.lightning.txt2vid.v0",
  "config": {
    "prompt": "Two anthropomorphic cats boxing on a spotlighted stage, cinematic lighting, dynamic camera angles",
    "resolution": "720p"
  }
}
```

Image-to-video (animate a still image):

```json
{
  "type": "inference.wan2-2.lightning.img2vid.v0",
  "config": {
    "prompt": "The person slowly turns their head and smiles, natural movement",
    "image": "portrait.jpg",
    "resolution": "720p"
  }
}
```