
GPT Image 2

Flagship image generation and editing model with built-in reasoning and layout control

About model

GPT Image 2 is OpenAI's flagship image generation and editing model, released April 21, 2026 as the successor to GPT Image 1. It is the first OpenAI image model with built-in reasoning, and it accepts text prompts plus up to 16 reference images per call for reference-guided generation and in-context editing. Its headline improvements are stronger prompt adherence, more photorealistic rendering, and markedly better text legibility, including readable embedded type in signs, labels, UI elements, and structured visual layouts. According to OpenAI, multilingual text rendering exceeds 95% accuracy across Latin, Chinese, Japanese, Korean, Hindi, Bengali, and Arabic scripts. The model supports text-to-image generation and targeted image editing at resolutions from 1K to 4K and across a wide range of aspect ratios.

Text Rendering Accuracy

95%+

Across Latin, Chinese, Japanese, Korean, Hindi, Bengali, and Arabic scripts

Max Reference Images

16

Multi-modal context support up to 100 MB per reference image

Output Resolution

4K

Native output at 1K, 2K, and 4K resolution tiers

Model key capabilities
  • Text in Images: OpenAI reports above 95% text rendering accuracy across Latin, Chinese, Japanese, Korean, Hindi, Bengali, and Arabic scripts, a meaningful improvement over GPT Image 1, where embedded text was a persistent failure mode. Readable signs, labels, UI elements, and multi-word strings are first-class outputs.
  • Reference-Guided Generation: Accepts up to 16 reference images per call (up to 100 MB each) as a direct API input, enabling style transfer, consistent product comps, and iterative editing without fine-tuning or a separate workflow step.
  • Photorealism and Style Range: OpenAI describes improvements to reflections, materials, lighting, and photographic fidelity, alongside strong coverage of non-photographic styles such as illustration, manga, and pixel art, and of layout-sensitive outputs such as posters, packaging, and product comps.
  • Structured Visual Outputs: Designed to produce layout-sensitive deliverables (posters, packaging, diagrams, infographics, magazine spreads, and product renderings) where spatial composition and text placement must be precise rather than approximate.
  • API usage


    Endpoint:

    openai/gpt-image-2

    cURL:

    curl -X POST "https://api.together.xyz/v1/images/generations" \
      -H "Authorization: Bearer $TOGETHER_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "openai/gpt-image-2",
        "prompt": "Draw an anime style version of this image.",
        "width": 1024,
        "height": 768,
        "steps": 28,
        "n": 1,
        "response_format": "url",
        "image_url": "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/yosemite.png"
      }'
    
    Python:

    from together import Together

    # Reads the API key from the TOGETHER_API_KEY environment variable.
    client = Together()

    response = client.images.generate(
        model="openai/gpt-image-2",
        prompt="Draw an anime style version of this image.",
        width=1024,
        height=768,
        steps=28,
        image_url="https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/yosemite.png",
    )

    # URL of the first generated image.
    print(response.data[0].url)
    
    
    
    TypeScript:

    import Together from "together-ai";

    // Reads the API key from the TOGETHER_API_KEY environment variable.
    const together = new Together();

    async function main() {
      const response = await together.images.create({
        model: "openai/gpt-image-2",
        prompt: "Draw an anime style version of this image.",
        width: 1024,
        height: 768,
        steps: 28,
        image_url: "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/yosemite.png",
      });

      // URL of the first generated image.
      console.log(response.data[0].url);
    }

    main();
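
The samples above only print the URL of the generated image. A minimal sketch of saving that image to disk follows, using only the Python standard library. The helper name `save_image` and the output path are hypothetical, and the sketch assumes the returned URL can be fetched without extra authentication, which this page does not state.

```python
import shutil
import urllib.request
from pathlib import Path


def save_image(url: str, dest: str, timeout: float = 30.0) -> Path:
    """Stream the resource at `url` into the file `dest` and return its path.

    Hypothetical helper: assumes the URL is fetchable without extra headers.
    """
    target = Path(dest)
    # Create the output directory if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    # Copy the response body to disk in chunks instead of loading it into memory.
    with urllib.request.urlopen(url, timeout=timeout) as resp, target.open("wb") as fh:
        shutil.copyfileobj(resp, fh)
    return target
```

With the Python sample above, this could be called as `save_image(response.data[0].url, "out/anime.png")`.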
    
    
  • Model card

    Architecture Overview:
    • Proprietary vision-generation transformer architecture with built-in semantic reasoning
    • Supports multi-modal text and image inputs (up to 16 reference images simultaneously)
    • Native multi-resolution generation engine producing outputs up to 4K
    • Flexible aspect ratio support, including 1:1, 3:2, 2:3, 4:3, 3:4, 4:5, 5:4, 9:16, and 16:9
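
As a rough illustration of how the listed aspect ratios map to the `width` and `height` request parameters, the sketch below derives pixel dimensions from a ratio string. The function name `dimensions_for`, the default long edge of 1024 pixels, and the rounding to a multiple of 16 are all assumptions made for this example; the page does not specify the exact pixel sizes the model accepts.

```python
SUPPORTED_RATIOS = ("1:1", "3:2", "2:3", "4:3", "3:4", "4:5", "5:4", "9:16", "16:9")


def dimensions_for(ratio: str, long_edge: int = 1024, multiple: int = 16) -> tuple:
    """Return (width, height) for a supported ratio.

    ASSUMPTIONS: `long_edge` sets the longer side in pixels, and both sides
    are rounded to the nearest `multiple`. Neither is documented on this page.
    """
    if ratio not in SUPPORTED_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {ratio!r}")
    w, h = (int(part) for part in ratio.split(":"))
    if w >= h:
        # Landscape or square: the width is the long edge.
        width, height = long_edge, long_edge * h / w
    else:
        # Portrait: the height is the long edge.
        width, height = long_edge * w / h, long_edge

    def snap(value: float) -> int:
        # Round to the nearest multiple, never below one multiple.
        return max(multiple, round(value / multiple) * multiple)

    return snap(width), snap(height)
```

Under these assumptions, `dimensions_for("4:3")` gives `(1024, 768)`, the size used in the API usage samples.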

    Features & Interface:
    • In-context canvas control allowing direct image-to-image editing, background replacement, and asset variations
    • Native text layer compositing enabling precise localized letter styling
    • Multi-reference formatting handles style weights, structure masks, and layout baselines

    Performance Characteristics:
    • Drastic reductions in text fragmentation errors compared to legacy image models
    • High spatial awareness for text alignment, boundary limits, and multi-line formatting grids
    • Robust multilingual rendering, including scripts with complex letterforms such as Hindi and Arabic

  • Prompting

    Together AI API Access:
    • Access GPT Image 2 via Together AI APIs using the endpoint openai/gpt-image-2
    • Authenticate using your Together AI API key in request headers
    • Pass a text prompt, optionally with up to 16 reference image URLs
    • Priced at $0.053 per generated image on serverless infrastructure
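
The limits above can be checked client-side before a request is sent. The following is a minimal sketch, not the Together AI SDK: `build_payload` and `estimate_cost_usd` are hypothetical helpers, and the `reference_images` field used for multiple references is an assumption, since this page only shows the single-image `image_url` parameter.

```python
from decimal import Decimal

MODEL_ID = "openai/gpt-image-2"
MAX_REFERENCE_IMAGES = 16                # limit stated on this page
PRICE_PER_IMAGE_USD = Decimal("0.053")   # serverless price stated on this page


def build_payload(prompt: str, reference_urls=(), n: int = 1) -> dict:
    """Validate the inputs and return a JSON-serializable request body."""
    refs = list(reference_urls)
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if len(refs) > MAX_REFERENCE_IMAGES:
        raise ValueError(f"at most {MAX_REFERENCE_IMAGES} reference images are allowed")
    if n < 1:
        raise ValueError("n must be at least 1")
    payload = {"model": MODEL_ID, "prompt": prompt, "n": n}
    if len(refs) == 1:
        # Single reference: same field as the API usage samples on this page.
        payload["image_url"] = refs[0]
    elif refs:
        # ASSUMPTION: hypothetical field name for multiple references.
        payload["reference_images"] = refs
    return payload


def estimate_cost_usd(image_count: int) -> Decimal:
    """Estimated cost for `image_count` generated images at the listed price."""
    if image_count < 0:
        raise ValueError("image_count must not be negative")
    return PRICE_PER_IMAGE_USD * image_count
```

For instance, `estimate_cost_usd(100)` evaluates to `Decimal("5.300")`, or $5.30 for 100 images. Decimal arithmetic is used so that per-image prices add up without floating-point drift.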

  • Applications & use cases

    Marketing & Graphic Design:
    • Generate posters, ad banners, and promotional cards with accurately typeset copy
    • Localize promotional assets into multiple scripts automatically via native multilingual font rendering

    E-commerce & Branding:
    • Keep product presentation uniform by passing existing templates as reference images
    • Construct detailed mockups, complex packaging visuals, and catalog assets matching target brand style guidelines

    Editorial Production:
    • Develop multi-element compositions, clean diagrams, charts, and text-embedded infographics
    • Create stylistically consistent illustrations for storybooks, websites, and app backgrounds

Related models
  • Model provider
    OpenAI
  • Type
    Image
    Reasoning
  • Main use cases
    Image Generation
  • Resolution/Duration
    1K; 2K; 4K
  • Deployment
    Serverless
  • Price

    $0.053 per image

  • Input modalities
    Text
    Image
  • Output modalities
    Image
  • Released
    April 20, 2026
  • Category
    Image