
Qwen Image 2.0

Fast unified image generation and editing for rapid iteration

About model

Qwen Image 2.0 is Alibaba's unified image generation and editing model, currently ranked #1 on AI Arena for both text-to-image and image editing. It pairs an 8B Qwen3-VL encoder with a 7B-parameter diffusion decoder. It delivers fast generation with strong prompt fidelity at native 2K resolution (2048x2048), a balance suited to rapid iteration and prototyping. The model accepts prompts of up to 1,000 tokens and renders professional-quality text in English and Chinese. It handles both generation and editing in a single architecture.

AI Arena: #1 in blind human evaluation for text-to-image and image editing
Native resolution: 2048x2048 output from a 7B model
DPG-Bench: 88.32 (prompt adherence and spatial reasoning)

Key capabilities

Fast Iteration: Balanced generation speed with strong prompt fidelity for rapid prototyping and iterative design workflows
Unified Generation & Editing: A single 7B model handles text-to-image creation and reference-based image editing, including style transfer, object manipulation, and in-image text editing
Professional Text Rendering: High-fidelity typography in English and Chinese with accurate layout and surface-adaptive rendering across infographics, posters, and slides
Flexible Output Control: Native 2K resolution with multiple aspect ratios, seed reproducibility, negative prompts, and multiple outputs per request
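The output controls listed above can be combined in one request. The following is a minimal sketch that only builds the JSON body and does not call the API. The helper name `build_generation_payload` is hypothetical. The field names `seed` and `negative_prompt` are assumptions based on Together's images API and should be checked against the API reference. `model`, `width`, `height` and `n` match the cURL example below.

```python
import json

# Hypothetical helper: builds a request body for the images endpoint.
# "seed" and "negative_prompt" are assumed field names; verify them.
def build_generation_payload(prompt, *, width=1024, height=768, seed=None,
                             negative_prompt=None, n=1):
    """Return a request body dict for a text-to-image call."""
    if n < 1:
        raise ValueError("n must be at least 1")
    payload = {
        "model": "Qwen/Qwen-Image-2.0",
        "prompt": prompt,
        "width": width,
        "height": height,
        "n": n,  # number of images to return
    }
    # Only include the optional controls when they are set.
    if seed is not None:
        payload["seed"] = seed
    if negative_prompt:
        payload["negative_prompt"] = negative_prompt
    return payload


body = build_generation_payload(
    "A minimalist poster that reads 'SPRING SALE' in bold sans-serif type",
    seed=42,
    negative_prompt="blurry, distorted text",
    n=4,
)
print(json.dumps(body, indent=2))
```

Fixing the seed while requesting several outputs gives a repeatable batch. You can regenerate the batch later by sending the same body.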

API usage


Endpoint: Qwen/Qwen-Image-2.0

cURL:

```shell
curl -X POST "https://api.together.xyz/v1/images/generations" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen-Image-2.0",
    "prompt": "Draw an anime style version of this image.",
    "width": 1024,
    "height": 768,
    "steps": 28,
    "n": 1,
    "response_format": "url",
    "image_url": "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/yosemite.png"
  }'
```

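Python:

```python
from together import Together

# The client reads the API key from the TOGETHER_API_KEY environment variable.
client = Together()

# Edit a reference image: image_url points at the source image.
response = client.images.generate(
    model="Qwen/Qwen-Image-2.0",
    prompt="Draw an anime style version of this image.",
    image_url="https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/yosemite.png",
    width=1024,
    height=768,
    steps=28,
)

# Print the URL of the first generated image.
print(response.data[0].url)
```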

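TypeScript:

```typescript
import Together from "together-ai";

// The client reads the API key from the TOGETHER_API_KEY environment variable.
const together = new Together();

async function main() {
  // Same editing request as the cURL and Python examples.
  const response = await together.images.create({
    model: "Qwen/Qwen-Image-2.0",
    prompt: "Draw an anime style version of this image.",
    image_url: "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/yosemite.png",
    width: 1024,
    height: 768,
    steps: 28,
  });

  // Print the URL of the first generated image.
  console.log(response.data[0].url);
}

main().catch(console.error);
```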
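Without the Together SDK, the cURL call can be mirrored with Python's standard library. The following is a minimal sketch: it builds the same POST request with `urllib.request` but does not send it. The helper name `make_request` is hypothetical. The response shape (`data[0]["url"]`) is assumed from the SDK examples above.

```python
import json
import os
import urllib.request

API_URL = "https://api.together.xyz/v1/images/generations"


def make_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the cURL example."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


payload = {
    "model": "Qwen/Qwen-Image-2.0",
    "prompt": "Draw an anime style version of this image.",
    "width": 1024,
    "height": 768,
    "steps": 28,
    "n": 1,
    "response_format": "url",
    "image_url": "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/yosemite.png",
}

req = make_request(payload, os.environ.get("TOGETHER_API_KEY", "YOUR_KEY"))

# To actually send it (requires network access and a valid key):
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
#     print(result["data"][0]["url"])
```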

Model card
Architecture Overview:
• 7B parameter diffusion decoder paired with an 8B Qwen3-VL vision-language encoder
• Unified architecture handling both text-to-image generation and instruction-based image editing in a single model
• Native 2K resolution output (2048x2048) with flexible aspect ratios (1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3)
• Balanced for fast generation with strong prompt fidelity across rapid iteration cycles
• Prompts up to 1,000 tokens for complex scene descriptions with detailed typography instructions
• Reference image input for editing workflows including style transfer, object manipulation, and text editing within images
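To target one of the listed aspect ratios, you need concrete `width` and `height` values. The following is a minimal sketch. The hypothetical helper `dimensions_for_ratio` fixes the long side at 2048 and rounds the short side down to a multiple of 16. The multiple-of-16 rounding is an assumption, common for diffusion models, and is not stated in this model card.

```python
def dimensions_for_ratio(ratio: str, long_side: int = 2048, multiple: int = 16):
    """Return (width, height) for an aspect ratio string such as '16:9'.

    The long side is fixed; the short side is rounded down to a multiple
    of `multiple` (an assumed constraint).
    """
    w_part, h_part = (int(x) for x in ratio.split(":"))
    if w_part >= h_part:  # landscape or square
        width = long_side
        height = (long_side * h_part // w_part) // multiple * multiple
    else:  # portrait
        height = long_side
        width = (long_side * w_part // h_part) // multiple * multiple
    return width, height


for r in ["1:1", "16:9", "9:16", "4:3", "3:4", "3:2", "2:3"]:
    w, h = dimensions_for_ratio(r)
    print(f"{r:>5}: {w}x{h} ({w * h:,} px)")
```

Ratios such as 3:2 do not divide 2048 evenly, so the short side lands slightly under the exact ratio (2048x1360 rather than 2048x1365.33).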

Training Methodology:
• Trained for high-fidelity text rendering across alphabetic and logographic scripts (English and Chinese)
• Optimized for typographic accuracy, layout coherence, and contextual integration of text within images
• Text adapts to different surfaces (glass, fabric, paper, signage) with correct perspective and material properties
• Supports diverse artistic styles from photorealism to anime, impressionist, and minimalist design

Performance Characteristics:
• #1 on AI Arena blind human evaluation for both text-to-image generation and image editing
• 88.32 on DPG-Bench for prompt adherence, spatial reasoning, and attribute binding
• Fast generation speed balanced for iterative design workflows
• Multiple output generation per request with seed-based reproducibility
Prompting
Together AI API Access:
• Access Qwen Image 2.0 via Together AI APIs using the endpoint Qwen/Qwen-Image-2.0
• Authenticate using your Together AI API key in request headers
• Control output dimensions with the width and height parameters; the total pixel count (width times height) must be between 262,144 (512x512) and 4,194,304 (2048x2048)
• Use reference_images array for image editing workflows
• Supports a seed for reproducibility, negative prompts, and multiple outputs per request
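The pixel range above can be checked on the client side before sending a request. The following is a minimal sketch: `check_dimensions` is a hypothetical helper, and it checks only the total pixel count stated here. Any per-side limits the API may enforce are not covered.

```python
MIN_PIXELS = 262_144    # 512 x 512
MAX_PIXELS = 4_194_304  # 2048 x 2048


def check_dimensions(width: int, height: int) -> None:
    """Raise ValueError if width * height falls outside the documented range."""
    total = width * height
    if not MIN_PIXELS <= total <= MAX_PIXELS:
        raise ValueError(
            f"{width}x{height} = {total:,} px; expected "
            f"{MIN_PIXELS:,} to {MAX_PIXELS:,} px"
        )


check_dimensions(1024, 768)    # 786,432 px: accepted
check_dimensions(2048, 1152)   # 2,359,296 px: accepted
try:
    check_dimensions(256, 256)  # 65,536 px: below the minimum
except ValueError as err:
    print(err)
```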
Applications & use cases
Rapid Prototyping & Iteration:
• Fast concept exploration and visual ideation with strong prompt fidelity
• Quick-turnaround mockups for marketing, social media, and presentations
• Iterative design cycles with seed reproducibility for controlled variation
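One way to use seed reproducibility for controlled variation is a seed sweep. In the first round, you hold the prompt fixed and try several seeds. In the next round, you keep a favourite seed and vary only the prompt. The following is a minimal sketch that only builds request bodies. The helper `seed_sweep`, the `seed` field name and the example prompts are assumptions for illustration.

```python
from typing import Dict, List


def seed_sweep(prompt: str, seeds: List[int],
               width: int = 1024, height: int = 768) -> List[Dict]:
    """Return one request body per seed; everything except the seed is fixed."""
    base = {
        "model": "Qwen/Qwen-Image-2.0",
        "prompt": prompt,
        "width": width,
        "height": height,
    }
    return [{**base, "seed": s} for s in seeds]


# Round 1: explore five seeds with an unchanged prompt.
round1 = seed_sweep("Isometric illustration of a tiny coffee shop",
                    seeds=list(range(100, 105)))

# Round 2: keep the preferred seed and change only the prompt.
favourite = 103
round2 = seed_sweep(
    "Isometric illustration of a tiny coffee shop at night, neon sign",
    seeds=[favourite],
)
```

Re-sending the same body should reproduce the same image. Whether a fixed seed also preserves composition across prompt changes is not guaranteed.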

Design & Creative:
• Infographics, posters, and slides with accurate text rendering in English and Chinese
• Comic and storyboard generation with dialogue text and multi-panel layouts
• Product photography with accurate labels and packaging text

Image Editing Workflows:
• Style transfer and object manipulation via reference images
• Text editing within existing images with font and style preservation
• Detail enhancement and visual refinement
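An editing request pairs an instruction with one or more reference images. The following is a minimal sketch that builds such a body without sending it. It uses the `reference_images` array named in the Prompting notes. The API usage examples pass a single `image_url` instead, so confirm which field the endpoint expects. The helper `build_edit_payload` and the example.com URL are hypothetical placeholders.

```python
from typing import Dict, List


def build_edit_payload(instruction: str, reference_urls: List[str], **extra) -> Dict:
    """Hypothetical helper: request body for an instruction-based edit."""
    if not reference_urls:
        raise ValueError("an edit needs at least one reference image")
    payload = {
        "model": "Qwen/Qwen-Image-2.0",
        "prompt": instruction,
        # Field name from the Prompting notes; the API usage examples
        # use a single "image_url" string instead.
        "reference_images": list(reference_urls),
    }
    payload.update(extra)  # e.g. width, height, steps
    return payload


edit = build_edit_payload(
    "Change the sign text to 'OPEN 24 HOURS' and keep the original font and color",
    ["https://example.com/storefront.png"],  # placeholder URL
    width=1024,
    height=768,
)
```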


Model specifications

Model provider: Qwen
Type: Image
Main use cases: Image Generation
Resolution: 512x512 to 2048x2048
Deployment: Serverless
Endpoint: Qwen/Qwen-Image-2.0
Parameters: 7B
Price: $0.04 / image
Input modalities: Text, Image
Output modalities: Image
Released: February 9, 2026
Category: Image
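At a flat per-image price, the cost of an iteration cycle is a simple multiplication. The following is a minimal sketch that uses `Decimal` to avoid floating-point rounding in currency. It assumes that every output image is billed at $0.04, regardless of resolution or step count. The specifications above list only the flat price. `estimate_cost` is a hypothetical helper.

```python
from decimal import Decimal

PRICE_PER_IMAGE = Decimal("0.04")  # USD per image, from the specifications


def estimate_cost(requests: int, images_per_request: int = 1) -> Decimal:
    """Flat per-image estimate; assumes every output image is billed."""
    if requests < 0 or images_per_request < 1:
        raise ValueError("requests must be >= 0 and images_per_request >= 1")
    return PRICE_PER_IMAGE * requests * images_per_request


# 25 iterations with 4 outputs each = 100 images
print(estimate_cost(25, 4))  # prints 4.00
```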
