Models / OpenAI
Chat

gpt-oss-20B

Efficient open reasoning model for scalable AI deployment

About model

Scalable Open Reasoning:
gpt-oss-20B provides powerful chain-of-thought reasoning in an efficient 20B-parameter model. Designed for single-GPU deployment while maintaining sophisticated reasoning capabilities, this Apache 2.0-licensed model balances performance and resource efficiency across diverse applications.

  • API usage

    • cURL
    • Python
    • TypeScript

    Endpoint:

    OpenAI/gpt-oss-20B

    curl -X POST https://api.together.xyz/v1/chat/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $TOGETHER_API_KEY" \
      -d '{
        "model": "OpenAI/gpt-oss-20B",
        "messages": [{
          "role": "user",
          "content": "Given two binary strings `a` and `b`, return their sum as a binary string"
        }]
      }'
    
    from together import Together
    
    client = Together()
    response = client.chat.completions.create(
        model="OpenAI/gpt-oss-20B",
        messages=[
            {
                "role": "user",
                "content": "Given two binary strings `a` and `b`, return their sum as a binary string"
            }
        ],
    )
    
    print(response.choices[0].message.content)
    
    
    import Together from "together-ai";
    
    const together = new Together();
    
    async function main() {
      const response = await together.chat.completions.create({
        model: "OpenAI/gpt-oss-20B",
        messages: [{
          role: "user",
          content: "Given two binary strings `a` and `b`, return their sum as a binary string"
        }]
      });
      
      console.log(response.choices[0]?.message?.content);
    }
    
    main();
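For reference, the prompt used in these examples is the classic add-binary exercise; a concise Python solution of the kind the model is expected to produce:

```python
def add_binary(a: str, b: str) -> str:
    """Add two binary strings and return the sum as a binary string."""
    result = []
    carry = 0
    i, j = len(a) - 1, len(b) - 1
    while i >= 0 or j >= 0 or carry:
        total = carry
        if i >= 0:
            total += int(a[i])
            i -= 1
        if j >= 0:
            total += int(b[j])
            j -= 1
        result.append(str(total % 2))
        carry = total // 2
    return "".join(reversed(result))

print(add_binary("1010", "1011"))  # → 10101
```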
    
    
  • Model card

    Architecture Overview:
    • Compact Mixture-of-Experts (MoE) design with SwiGLU activations
    • Token-choice MoE optimized for single-GPU efficiency
    • Alternating attention mechanism with full and sliding window contexts
    • Learned attention sink architecture for memory optimization
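The token-choice routing described above can be sketched in a few lines. This is an illustrative toy (expert count, dimensions, and the linear "experts" are made-up values, not the model's actual configuration): each token scores all experts, keeps its top-k, and mixes those experts' outputs with renormalized gate weights.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def moe_forward(tokens, gate_w, experts, top_k=2):
    """Token-choice MoE: each token picks its top-k experts by router
    score and mixes their outputs by the renormalized gate weights."""
    logits = tokens @ gate_w                       # (n_tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]  # top-k expert indices per token
    out = np.zeros_like(tokens)
    for t in range(tokens.shape[0]):
        scores = softmax(logits[t, top[t]])        # renormalize over chosen experts
        for w, e in zip(scores, top[t]):
            out[t] += w * experts[e](tokens[t])
    return out

rng = np.random.default_rng(0)
d, n_experts = 8, 4
tokens = rng.normal(size=(3, d))
gate_w = rng.normal(size=(d, n_experts))
# toy "experts": each is a simple linear map
weights = [rng.normal(size=(d, d)) for _ in range(n_experts)]
experts = [lambda x, W=W: x @ W for W in weights]
y = moe_forward(tokens, gate_w, experts)
print(y.shape)  # (3, 8)
```

Only the selected experts run per token, which is what lets a 20B-parameter MoE activate far fewer parameters per forward pass.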

    Training Methodology:
    • Comprehensive safety evaluation and testing protocols
    • Global community feedback integration
    • Malicious fine-tuning resistance verification
    • o200k_harmony tokenizer (a superset of the GPT-4o tokenizer) with Harmony response format

    Performance Characteristics:
    • Native MXFP4 quantization of MoE weights for optimal inference speed
    • Single B200 GPU deployment capability
    • 128K context window with efficient memory usage
    • Adjustable reasoning effort levels for task-specific optimization
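To see why 4-bit weights cut memory so sharply, here is a rough sketch of block-wise 4-bit quantization. This is a simplified symmetric int4 scheme for illustration only, not the model's actual MXFP4 format: each block stores one floating-point scale plus 4-bit integers.

```python
import numpy as np

def quantize_4bit(block):
    """Simplified symmetric 4-bit quantization of one weight block:
    one fp scale plus integers in [-7, 7] (4 bits each)."""
    scale = float(np.abs(block).max()) / 7.0
    if scale == 0.0:
        scale = 1.0
    q = np.clip(np.round(block / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize_4bit(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(1).normal(size=32).astype(np.float32)
q, s = quantize_4bit(w)
w_hat = dequantize_4bit(q, s)
print(float(np.abs(w - w_hat).max()))  # rounding error, bounded by scale / 2
```

Storing 4 bits plus a shared scale per block (versus 16 bits per weight) is roughly a 4x reduction in weight memory, which is what enables single-GPU deployment.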

  • Applications & use cases

    Development Applications:
    • Rapid prototyping and development support
    • Code generation and optimization
    • API design and documentation
    • System integration and testing

    Business Solutions:
    • Customer support automation
    • Content generation and editing
    • Process automation and workflow optimization
    • Market research and analysis

    Educational Use Cases:
    • Interactive tutoring and learning assistance
    • Curriculum development support
    • Research methodology guidance
    • Academic writing and editing

    Deployment Advantages:
    • Cost-effective single-GPU operation
    • Reduced infrastructure requirements
    • Scalable deployment across multiple instances
    • Edge computing and distributed processing capabilities

Model details
  • Model provider
    OpenAI
  • Type
    Chat
  • Main use cases
    Chat
    Small & Fast
    Function Calling
  • Features
    Function Calling
    JSON Mode
  • Fine tuning
    Supported
  • Speed
    High
  • Intelligence
    High
  • Deployment
    Serverless
    On-Demand Dedicated
    Monthly Reserved
  • Parameters
    20B
  • Context length
    128K
  • Input price

    $0.05 / 1M tokens

  • Output price

    $0.20 / 1M tokens

  • Input modalities
    Text
  • Output modalities
    Text
  • Released
    August 4, 2025
  • Last updated
    August 4, 2025
  • Category
    Code
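Given the input and output prices listed above, per-request cost is simple arithmetic:

```python
INPUT_PRICE = 0.05 / 1_000_000   # $ per input token
OUTPUT_PRICE = 0.20 / 1_000_000  # $ per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in dollars for one chat completion."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# e.g. a 2,000-token prompt with a 500-token completion
print(f"${request_cost(2000, 500):.6f}")  # $0.000200
```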