Wan 2.7
Video generation and editing with multi-reference control and temporal transfer
About model
Wan 2.7 is Wan AI's video generation and editing model, introducing instruction-based and reference-based video editing alongside temporal feature transfer. It supports text-to-video, image-to-video, and reference-to-video workflows with up to 5 simultaneous reference inputs for multi-subject compositions. The model accepts joint image, video, and audio references for synchronized subject+voice control, supports real human inputs as references or first frames, and generates native 1080p video from 2 to 15 seconds.
- 5 references: Multi-subject with mixed image/video/audio inputs
- 1080p: Across all generation and editing modes
- 2-15s: T2V and I2V, with 2-10s for R2V
- Video Editing: Instruction-based and reference-based editing to modify subjects or scenes globally via text prompts or reference media
- Temporal Feature Transfer: Clone motion, camera moves, effects, and style directly from a reference video into new generations
- Multi-Reference Control: Up to 5 simultaneous references with joint subject+voice referencing via combined image, video, and audio inputs
- Flexible Generation: T2V, I2V, and R2V modes with first/last frame control, 3x3 grid-to-video, real human inputs, and native 1080p up to 15s
API usage
Endpoint:
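As a rough illustration of what this card describes elsewhere (the model name Wan-AI/Wan2.7 and authentication with a Together AI API key in the request headers), the sketch below assembles headers and a JSON body for a text-to-video request. The card does not give the HTTP path or the request schema, so the body field names `prompt` and `seconds`, the helper `build_request`, and the `TOGETHER_API_KEY` environment variable are assumptions for illustration only. The sketch stops short of sending anything over the network.

```python
import json
import os
from typing import Dict, Optional, Tuple

MODEL = "Wan-AI/Wan2.7"  # model name as stated in this card


def build_request(
    prompt: str, seconds: int = 5, api_key: Optional[str] = None
) -> Tuple[Dict[str, str], str]:
    """Return (headers, json_body) for a hypothetical text-to-video request."""
    # Assumption: the key is read from TOGETHER_API_KEY when not passed in.
    key = api_key or os.environ.get("TOGETHER_API_KEY", "")
    headers = {
        "Authorization": f"Bearer {key}",  # API key travels in the request headers
        "Content-Type": "application/json",
    }
    # Assumption: "prompt" and "seconds" are illustrative field names,
    # not a confirmed request schema.
    body = {"model": MODEL, "prompt": prompt, "seconds": seconds}
    return headers, json.dumps(body)


if __name__ == "__main__":
    headers, body = build_request("A kite drifting over a windy beach", seconds=5)
    print(body)
```

The returned headers and body can be handed to any HTTP client once the actual path and schema have been confirmed against the provider's API reference.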
Model card
Architecture Overview:
• Unified video generation and editing model supporting text-to-video (T2V), image-to-video (I2V), and reference-to-video (R2V) workflows
• Instruction-based and reference-based video editing: modify subjects or scenes globally via text prompts or reference media
• Temporal feature transfer: clone motion, camera moves, effects, and style directly from a reference video
• I2V supports first/last frame control and 3x3 grid-to-video generation
• Multi-reference support with up to 5 references for multi-subject, image+voice, and mixed image/video modes
• Joint subject+voice referencing via combined image, video, and audio inputs
• Real human image and video inputs supported as references or first frames
• Native 1080p output across all generation modes
Training Methodology:
• Built on the Wan model family with expanded capabilities for video editing and temporal transfer
• Trained for consistent subject identity preservation across multi-reference and multi-subject scenes
• Optimized for real human inputs maintaining natural appearance and motion
Performance Characteristics:
• T2V and I2V support 2-15 second generation with flexible duration control
• R2V supports 2-10 second generation
• Up to 5 simultaneous reference inputs for complex multi-subject compositions
• Temporal feature transfer preserves motion dynamics, camera work, and visual effects from source video
Prompting
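The limits listed under Performance Characteristics (2-15 seconds for T2V and I2V, 2-10 seconds for R2V, at most 5 references) are easy to check on the client before a job is submitted. The following is a minimal sketch of such a check. The `VideoJob` type, the mode strings `t2v`/`i2v`/`r2v`, and the `validate` helper are hypothetical names introduced here, not part of any documented API.

```python
from dataclasses import dataclass, field
from typing import List

# Limits as stated in this model card: (min_seconds, max_seconds) per mode.
DURATION_LIMITS = {"t2v": (2, 15), "i2v": (2, 15), "r2v": (2, 10)}
MAX_REFERENCES = 5


@dataclass
class VideoJob:
    """Hypothetical container for the parameters of one generation job."""
    mode: str
    seconds: int
    references: List[str] = field(default_factory=list)


def validate(job: VideoJob) -> List[str]:
    """Return a list of human-readable problems; an empty list means the job fits the limits."""
    if job.mode not in DURATION_LIMITS:
        return [f"unknown mode: {job.mode!r}"]
    problems = []
    lo, hi = DURATION_LIMITS[job.mode]
    if not lo <= job.seconds <= hi:
        problems.append(
            f"{job.mode} duration must be {lo}-{hi}s, got {job.seconds}s"
        )
    if len(job.references) > MAX_REFERENCES:
        problems.append(
            f"at most {MAX_REFERENCES} references allowed, got {len(job.references)}"
        )
    return problems


if __name__ == "__main__":
    # A 12-second R2V job exceeds the 10-second R2V limit.
    for problem in validate(VideoJob("r2v", 12, ["subject.png", "voice.wav"])):
        print(problem)
```

Returning a list of problems rather than raising on the first one lets a caller report every violation at once.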
Together AI API Access:
• Access Wan 2.7 via Together AI APIs using the endpoint Wan-AI/Wan2.7
• Authenticate using your Together AI API key in request headers
• Supports text-to-video, image-to-video, and reference-to-video generation modes
• Reference inputs accept image, video, and audio for joint subject+voice control
• Available on Together AI serverless infrastructure
Applications & use cases
Video Editing & Post-Production:
• Instruction-based editing: modify subjects, scenes, and visual elements via text prompts
• Reference-based editing: transfer style, motion, and camera work from source videos
• Temporal feature transfer for replicating specific motion dynamics and effects
Marketing & Brand Content:
• Campaign video production with consistent character identity across multiple assets
• Product videos and social media content at native 1080p
• Brand mascot and spokesperson videos using real human reference inputs
Creative Production:
• Multi-subject scene composition with up to 5 reference inputs
• First/last frame control for precise narrative sequencing
• 3x3 grid-to-video generation for storyboard-to-video workflows
• Subject+voice referencing for synchronized character performances
- Model provider: Alibaba
- Type: Video
- Resolution/Duration: 1080p / 2-15s
- Deployment: Serverless
- Endpoint
- Input modalities: Text, Image, Video, Audio
- Output modalities: Video
- Category: Video