GPT Image 2
Flagship image generation and editing model with built-in reasoning and layout control
About model
GPT Image 2 is OpenAI's flagship image generation and editing model, released April 21, 2026 as the successor to GPT Image 1. It is the first OpenAI image model with built-in reasoning capabilities, and it accepts text prompts plus up to 16 reference images per call for reference-guided generation and in-context editing. Its headline improvements are strong prompt adherence, photorealistic rendering, and significantly improved text legibility, including readable embedded type in signs, labels, UI elements, and structured visual layouts. According to OpenAI, multilingual text rendering covers Latin, Chinese, Japanese, Korean, Hindi, Bengali, and Arabic scripts at above 95% accuracy. The model supports text-to-image generation and targeted image editing at resolutions from 1K to 4K and across a wide range of aspect ratios.
95%+ text rendering accuracy across Latin, Chinese, Japanese, Korean, Hindi, Bengali, and Arabic scripts
16 reference images per call, with multi-modal context support up to 100 MB per reference image
4K native output, spanning 1K, 2K, and 4K quality tiers
- Text in Images: OpenAI reports above 95% text rendering accuracy across Latin, Chinese, Japanese, Korean, Hindi, Bengali, and Arabic scripts — a meaningful improvement over GPT Image 1, where embedded text was a persistent failure mode. Readable signs, labels, UI elements, and multi-word strings are first-class outputs.
- Reference-Guided Generation: Accepts up to 16 reference images per call (up to 100 MB each), enabling style transfer, product comp consistency, and iterative editing workflows without fine-tuning. References are passed directly as API input rather than through a separate workflow step.
- Photorealism and Style Range: OpenAI describes improvements to reflections, materials, lighting, and photographic fidelity, alongside strong coverage of non-photographic styles including illustration, manga, pixel art, and structured layout-sensitive outputs such as posters, packaging, and product comps.
- Structured Visual Outputs: Designed to produce layout-sensitive deliverables — posters, packaging, diagrams, infographics, magazine spreads, and product renderings — where spatial composition and text placement need to be precise, not approximate.
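The reference-image limits listed above (16 images per call, up to 100 MB each) can be checked client-side before a request is sent. The following is a minimal sketch, not part of any official SDK: `ReferenceImage` and `validate_references` are hypothetical names, and reading "100 MB" as 100,000,000 bytes is an assumption, since the page does not say whether decimal or binary megabytes are meant.

```python
from dataclasses import dataclass

MAX_REFERENCE_IMAGES = 16                 # per-call limit stated on this page
MAX_REFERENCE_BYTES = 100 * 1000 * 1000   # assumption: "100 MB" read as decimal megabytes


@dataclass(frozen=True)
class ReferenceImage:
    """Hypothetical container for a reference image's name and size."""
    name: str
    size_bytes: int


def validate_references(images):
    """Return a list of problems; an empty list means the batch fits the stated limits."""
    problems = []
    if len(images) > MAX_REFERENCE_IMAGES:
        problems.append(
            f"{len(images)} reference images supplied; limit is {MAX_REFERENCE_IMAGES}"
        )
    for img in images:
        if img.size_bytes > MAX_REFERENCE_BYTES:
            problems.append(
                f"{img.name} is {img.size_bytes} bytes; limit is {MAX_REFERENCE_BYTES}"
            )
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every oversized file in a single pass.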
API usage
Endpoint:
Model card
Architecture Overview:
• Proprietary vision-generation transformer architecture with built-in semantic reasoning
• Supports multi-modal text and image inputs (up to 16 reference images simultaneously)
• Native multi-resolution generation engine supporting outputs up to 4K
• Flexible aspect ratio support, including 1:1, 3:2, 2:3, 4:3, 3:4, 4:5, 5:4, 9:16, and 16:9
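When a target canvas does not match one of the listed aspect ratios exactly, a caller may want to snap it to the closest one. The sketch below is one way to do that, assuming the nine ratios named above are the ones of interest (the page says "including", so the list may not be exhaustive). `nearest_supported_ratio` is a hypothetical helper, and comparing ratios on a logarithmic scale is a design choice made here, not something the model requires.

```python
import math

# Ratios named on this page; the actual supported set may be larger.
LISTED_RATIOS = ["1:1", "3:2", "2:3", "4:3", "3:4", "4:5", "5:4", "9:16", "16:9"]


def _ratio_value(label):
    """Convert a label such as '16:9' into a width/height float."""
    width, height = label.split(":")
    return int(width) / int(height)


def nearest_supported_ratio(width, height):
    """Return the listed ratio label closest to width/height on a log scale."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target = math.log(width / height)
    return min(
        LISTED_RATIOS,
        key=lambda label: abs(math.log(_ratio_value(label)) - target),
    )
```

The log scale treats landscape and portrait symmetrically, so a canvas that is slightly too wide is penalized the same as one that is slightly too tall by the same factor.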
Features & Interface:
• In-context canvas control allowing direct image-to-image editing, background replacement, and asset variations
• Native text layer compositing enabling precise localized letter styling
• Multi-reference formatting handles style weights, structure masks, and layout baselines
Performance Characteristics:
• Drastic reductions in text fragmentation errors compared to legacy image models
• High spatial awareness for text alignment, boundary limits, and multi-line formatting grids
• Robust multilingual rendering, including scripts with complex letterforms such as Hindi and Arabic
Prompting
Together AI API Access:
• Access GPT Image 2 through the Together AI API using the endpoint openai/gpt-image-2
• Authenticate by sending your Together AI API key in the request headers
• Pass a text prompt, optionally with an array of up to 16 reference image URLs
• Priced at $0.053 per image on serverless infrastructure
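The access steps above can be sketched as a request builder using only the Python standard library. This is an illustration under assumptions rather than the documented integration: the URL is Together AI's general image generation route, the `reference_images` field name is an assumption that should be checked against the current API reference, and `build_image_request` is a hypothetical helper. The function only constructs the request and does not send it.

```python
import json
import urllib.request

TOGETHER_IMAGES_URL = "https://api.together.xyz/v1/images/generations"
MODEL_ID = "openai/gpt-image-2"   # endpoint name given on this page
MAX_REFERENCE_IMAGES = 16         # per-call limit given on this page


def build_image_request(prompt, api_key, reference_urls=()):
    """Build (but do not send) a POST request for one image generation call."""
    refs = list(reference_urls)
    if len(refs) > MAX_REFERENCE_IMAGES:
        raise ValueError(f"at most {MAX_REFERENCE_IMAGES} reference images per call")

    payload = {"model": MODEL_ID, "prompt": prompt}
    if refs:
        # Assumed field name; verify against the API reference before use.
        payload["reference_images"] = refs

    return urllib.request.Request(
        TOGETHER_IMAGES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would submit it. Keeping construction separate from sending makes the payload easy to inspect and test without network access or a real key.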
Applications & use cases
Marketing & Graphic Design:
• Generate posters, ad banners, and promotional cards with legible, accurately typeset copy
• Localize promotional assets into multiple scripts using the model's multilingual text rendering
E-commerce & Branding:
• Keep product presentation uniform by passing existing templates as reference images
• Build detailed mockups, complex packaging visuals, and catalog assets that match brand style guidelines
Editorial Production:
• Produce multi-element compositions, clean diagrams, charts, and text-embedded infographics
• Create stylistically consistent illustrations for storybooks, websites, and app backgrounds
- Type: Image, Reasoning
- Main use cases: Image Generation
- Resolution/Duration: 1K; 2K; 4K
- Deployment: Serverless
- Endpoint: openai/gpt-image-2
- Price: $0.053 / image
- Input modalities: Text, Image
- Output modalities: Image
- Released: April 20, 2026
- Category: Image
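Because pricing is per image, a batch cost estimate is a single multiplication. The sketch below uses the $0.053 figure from this page and `Decimal` to avoid floating-point rounding in currency math. `estimate_cost` is a hypothetical helper, and it assumes every generated image is billed at the same flat rate regardless of resolution tier, which the page does not confirm.

```python
from decimal import Decimal

PRICE_PER_IMAGE_USD = Decimal("0.053")  # per-image price listed on this page


def estimate_cost(image_count):
    """Estimate the USD cost of generating image_count images at the flat rate."""
    if image_count < 0:
        raise ValueError("image_count must be non-negative")
    return PRICE_PER_IMAGE_USD * image_count
```

For example, `estimate_cost(1000)` evaluates to 53 US dollars under this flat-rate assumption.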