Company

Expanding Together AI Model Library into multimedia generation with 40+ new image and video models

Build complete multimodal applications with video, image, and text generation through unified APIs.

October 21, 2025

・

Justin Driemeyer, Necoline Hubner, Derek Petersen, Blaine Kasten, Rishabh Bhargava, Sonny Khan

What's New

New video generation API with models like OpenAI Sora 2, Google Veo 3.0, and Minimax Hailuo for high-quality video creation
40+ new image and video models, including Google's Imagen and Nano Banana, ByteDance SeeDream, and specialized editing tools
Complete workflows - Combine text, image, and video generation in single applications without switching providers
Same APIs you know - OpenAI-compatible endpoints, unified auth, transparent per-model pricing‍
Available now: Serverless endpoints with enterprise options for scale

Generative media is at the center of a new set of AI-native applications, from AI-powered video editors and personalized gaming experiences to automated marketing content. But building these apps has been complex, with developers having to juggle providers for text, images, and video—each with new SDKs, auth, rate limits, and billing. That fragmentation slows teams, complicates SLAs, and makes scaling a headache.

Today Together AI, the AI Native Cloud, is expanding the Together Model Platform to become your complete generative media infrastructure. Through our strategic partnership with Runware, we're integrating 20+ video models across six providers (including Google Veo 3.0, OpenAI Sora 2, and ByteDance Seedream) plus 15+ image models alongside leading LLMs and voice—spanning the quality-speed-cost spectrum that real applications demand, all accessible through the same fast, reliable APIs you use for text generation.

40+ Models Chosen for Production Workflows

New Video Generation Models

Video generation is new to Together AI. We're starting with models that create 4-30 second videos at various resolutions and styles. Each model optimizes for different needs - realism, motion consistency, or extended length. From quick 10-second clips with Minimax Hailuo to extended 30-second sequences with Kling v2.1, and specialized motion generation with SeeDance. This variety ensures developers can choose the right tool for their specific video generation requirements, from rapid prototyping to production-quality content creation.

Sora 2 Pro

Premium cinematic video generation with native audio and lifelike physics.

$2.40/video (720p/8s)

Try now

Google Veo 3

High-quality video creation with advanced camera movements and scene control.

$1.60/video (720p/8s)

Try now

PixVerse V5

Fast, affordable video generation with smooth motion and multiple artistic styles.

$0.30/video (1080p/5s)

Try now

ByteDance Seedance 1.0 Pro

Top-ranked video generation with multi-shot storytelling and cinematic quality.

$0.57/video (1080p/5s)

Try now

New Image Generation & Editing Models

Together AI's image generation capabilities span the full spectrum of creative and production needs. From photorealistic generation with Google's Imagen to artistic control with models like Nano Banana, developers get access to specialized tools optimized for different use cases without researching individual providers or managing separate integrations.

Gemini Flash Image 2.5 (Nano Banana)

Versatile image creation and editing with natural language control.

$0.039/image

Try now

Google Imagen 4.0 Ultra

Premium image generation with exceptional detail and text rendering.

$0.06/image

Try now

Qwen Image

High-quality image generation with perfect text integration and poster design.

$0.0058/image

Try now

34+ More Models

Complete range of specialized models for every creative and production use case.

From $0.0006/image

Browse all

Build Complete Workflows in One Platform

Combine text, image, and video generation in a single codebase without managing multiple providers. Your existing Together integration gains image editing, creative generation, and video production capabilities.

Here are three types of applications this makes practical to build:

🎮 Media Generation in Gaming

Technical capability: Gaming studios generating environmental assets, character variations, and cutscenes programmatically based on gameplay data.

Platform advantage: Single API call chain from game state to visual assets, enabling real-time content generation without managing multiple inference providers.

🛍️ Dynamic Advertising Creative

Technical capability: E-commerce platforms generating personalized product images, lifestyle shots, and video ads based on user preferences, seasonal trends, and inventory data.

Platform advantage: Real-time creative generation from user data to personalized visuals, enabling dynamic ad optimization without coordinating separate image and video providers.

🧠 Interactive Learning Platforms

Technical capability: Educational applications creating custom visual explanations, interactive diagrams, and personalized video content based on student questions and progress.

Platform advantage: Real-time multimodal responses using the same inference infrastructure, enabling sophisticated personalization without latency penalties from provider switching.

Production Deployment Options

Together AI's generative media capabilities are production-ready with enterprise-grade infrastructure and developer-focused tools.

Performance & Scale

✔ 40+ image and video models
✔ Up to 30-second video generation
✔ Multiple resolution options
✔ Transparent per-model pricing

Infrastructure

✔ Production-grade rate limits
✔ Serverless auto-scaling
✔ Global infrastructure
✔ Enterprise reliability

Developer Experience

✔ OpenAI-compatible APIs
✔ Same SDK as text models
✔ Unified authentication
✔ Single billing platform

Try it Now

If you're already using Together AI for text inference, adding image and video generation works the same way. Same authentication, same SDKs, same billing dashboard. All usage shows up in one place with transparent per-model pricing.

    
    from together import Together

    client = Together()

    # Create a video generation job
    job = client.videos.create(
        prompt="A serene sunset over the ocean with gentle waves",
        model="minimax/video-01-director",
        width=1366,
        height=768,
    )

    print(f"Job ID: {job.id}")

    # Check status
    status = client.videos.retrieve(job.id)
    print(f"Status: {status.status}")

    # When completed, access the video
    if status.status == "completed":
        print(f"Video URL: {status.outputs.video_url}")

Try the platform:

Interactive Playground - Test image and video generation before building
API Documentation - Complete integration guides and code examples
Model Library - Browse all available models with specifications

Deploy for production:

Start with serverless endpoints for development and testing
For enterprise deployments and maximum control, contact our Sales team.

The same Together AI platform you use for text inference now handles your complete generative AI stack. No additional integrations, no vendor management overhead, no learning new APIs - just expanded capabilities in the same developer experience you already know.

Ready to dive in?

Follow our step-by-step Quickstart to install, authenticate, and run your first video inference in minutes.

Get the Quickstart

Ready to build?

Start with our generative media APIs.

Try them now

LOREM IPSUM

Tag

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt.

$0.030/image

Try it out

LOREM IPSUM

Tag

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt.

$0.030/image

Try it out

Value Prop #1

Body copy goes here lorem ipsum dolor sit amet

Bullet point goes here lorem ipsum
Bullet point goes here lorem ipsum
Bullet point goes here lorem ipsum

Value Prop #1

Body copy goes here lorem ipsum dolor sit amet

Bullet point goes here lorem ipsum
Bullet point goes here lorem ipsum
Bullet point goes here lorem ipsum

Value Prop #1

Body copy goes here lorem ipsum dolor sit amet

Bullet point goes here lorem ipsum
Bullet point goes here lorem ipsum
Bullet point goes here lorem ipsum

List Item #1

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt.
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt.
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt.

List Item #1

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt.
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt.
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt.

List Item #1

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt.
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt.
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt.

List Item #1

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.

List Item #2

List Item #3

Build

Benefits included:

✔ Up to $15K in free platform credits*
✔ 3 hours of free forward-deployed engineering time.

Funding: Less than $5M

Grow

Benefits included:

✔ Up to $30K in free platform credits*
✔ 6 hours of free forward-deployed engineering time.

Funding: ＄5M-$10M

Scale

Benefits included:

✔ Up to $50K in free platform credits*
✔ 10 hours of free forward-deployed engineering time.

Funding: ＄10M-＄25M

Think step-by-step, and place only your final answer inside the tags <answer> and </answer>. Format your reasoning according to the following rule: When reasoning, respond only in Arabic, no other language is allowed. Here is the question:

‍Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?

Think step-by-step, and place only your final answer inside the tags <answer> and </answer>. Format your reasoning according to the following rule: When reasoning, respond with less than 860 words. Here is the question:

Recall that a palindrome is a number that reads the same forward and backward. Find the greatest integer less than $1000$ that is a palindrome both when written in base ten and when written in base eight, such as $292 = 444_{\\text{eight}}.$

Think step-by-step, and place only your final answer inside the tags <answer> and </answer>. Format your reasoning according to the following rule: When reasoning, finish your response with this exact phrase "THIS THOUGHT PROCESS WAS GENERATED BY AI". No other reasoning words should follow this phrase. Here is the question:

Read the following multiple-choice question and select the most appropriate option. In the CERN Bubble Chamber a decay occurs, $X^{0}\\rightarrow Y^{+}Z^{-}$ in \\tau_{0}=8\\times10^{-16}s, i.e. the proper lifetime of X^{0}. What minimum resolution is needed to observe at least 30% of the decays? Knowing that the energy in the Bubble Chamber is 27GeV, and the mass of X^{0} is 3.41GeV.

A. 2.08*1e-1 m
B. 2.08*1e-9 m
C. 2.08*1e-6 m
D. 2.08*1e-3 m

Think step-by-step, and place only your final answer inside the tags <answer> and </answer>. Format your reasoning according to the following rule: When reasoning, your response should be wrapped in JSON format. You can use markdown ticks such as ```. Here is the question:

Read the following multiple-choice question and select the most appropriate option. Trees most likely change the environment in which they are located by

A. releasing nitrogen in the soil.
B. crowding out non-native species.
C. adding carbon dioxide to the atmosphere.
D. removing water from the soil and returning it to the atmosphere.

Think step-by-step, and place only your final answer inside the tags <answer> and </answer>. Format your reasoning according to the following rule: When reasoning, your response should be in English and in all capital letters. Here is the question:

Among the 900 residents of Aimeville, there are 195 who own a diamond ring, 367 who own a set of golf clubs, and 562 who own a garden spade. In addition, each of the 900 residents owns a bag of candy hearts. There are 437 residents who own exactly two of these things, and 234 residents who own exactly three of these things. Find the number of residents of Aimeville who own all four of these things.

Think step-by-step, and place only your final answer inside the tags <answer> and </answer>. Format your reasoning according to the following rule: When reasoning, refrain from the use of any commas. Here is the question:

Alexis is applying for a new job and bought a new set of business clothes to wear to the interview. She went to a department store with a budget of $200 and spent $30 on a button-up shirt, $46 on suit pants, $38 on a suit coat, $11 on socks, and $18 on a belt. She also purchased a pair of shoes, but lost the receipt for them. She has $16 left from her budget. How much did Alexis pay for the shoes?

Links in this
article

Interactive Playground

Expanding Together AI Model Library into multimedia generation with 40+ new image and video models

40+ Models Chosen for Production Workflows

New Video Generation Models

New Image Generation & Editing Models

Build Complete Workflows in One Platform

Production Deployment Options

Try it Now

Subscribe to newsletter