Models / OpenAI
Chat

gpt-oss-20B

Efficient open reasoning model for scalable AI deployment

About model

Scalable Open Reasoning:
gpt-oss-20B provides powerful chain-of-thought reasoning in an efficient 20B-parameter model. Designed for single-GPU deployment while maintaining sophisticated reasoning capabilities, this Apache 2.0-licensed model balances performance and resource efficiency across diverse applications.

  • API usage

    • cURL
    • Python
    • TypeScript

    Endpoint:

    OpenAI/gpt-oss-20B

    curl -X POST https://api.together.xyz/v1/chat/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $TOGETHER_API_KEY" \
      -d '{
        "model": "OpenAI/gpt-oss-20B",
        "messages": [{
          "role": "user",
          "content": "Given two binary strings `a` and `b`, return their sum as a binary string"
        }]
      }'
    
    from together import Together
    
    client = Together()
    response = client.chat.completions.create(
        model="OpenAI/gpt-oss-20B",
        messages=[
            {
                "role": "user",
                "content": "Given two binary strings `a` and `b`, return their sum as a binary string"
            }
        ],
    )
    
    print(response.choices[0].message.content)
    
    
    import Together from "together-ai";
    
    const together = new Together();
    
    async function main() {
      const response = await together.chat.completions.create({
        model: "OpenAI/gpt-oss-20B",
        messages: [{
          role: "user",
          content: "Given two binary strings `a` and `b`, return their sum as a binary string"
        }]
      });
      
      console.log(response.choices[0]?.message?.content);
    }
    
    main();
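For reference, the prompt used in these examples is the classic add-binary exercise; a concise Python solution of the kind the model is expected to produce:

```python
def add_binary(a: str, b: str) -> str:
    """Add two binary strings and return the sum as a binary string."""
    result = []
    carry = 0
    i, j = len(a) - 1, len(b) - 1
    while i >= 0 or j >= 0 or carry:
        total = carry
        if i >= 0:
            total += int(a[i])
            i -= 1
        if j >= 0:
            total += int(b[j])
            j -= 1
        result.append(str(total % 2))
        carry = total // 2
    return "".join(reversed(result))

print(add_binary("1010", "1011"))  # → 10101
```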
    
    
  • Model card

    Architecture Overview:
    • Compact Mixture-of-Experts (MoE) design with SwiGLU activations
    • Token-choice MoE optimized for single-GPU efficiency
    • Alternating attention mechanism with full and sliding window contexts
    • Learned attention sink architecture for memory optimization
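The token-choice routing described above can be sketched in a few lines. This is an illustrative toy (expert count, dimensions, and the linear "experts" are made-up values, not the model's actual configuration): each token scores all experts, keeps its top-k, and mixes those experts' outputs with renormalized gate weights.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def moe_forward(tokens, gate_w, experts, top_k=2):
    """Token-choice MoE: each token picks its top-k experts by router
    score and mixes their outputs by the renormalized gate weights."""
    logits = tokens @ gate_w                       # (n_tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]  # top-k expert indices per token
    out = np.zeros_like(tokens)
    for t in range(tokens.shape[0]):
        scores = softmax(logits[t, top[t]])        # renormalize over chosen experts
        for w, e in zip(scores, top[t]):
            out[t] += w * experts[e](tokens[t])
    return out

rng = np.random.default_rng(0)
d, n_experts = 8, 4
tokens = rng.normal(size=(3, d))
gate_w = rng.normal(size=(d, n_experts))
# toy "experts": each is a simple linear map
weights = [rng.normal(size=(d, d)) for _ in range(n_experts)]
experts = [lambda x, W=W: x @ W for W in weights]
y = moe_forward(tokens, gate_w, experts)
print(y.shape)  # (3, 8)
```

Only the selected experts run per token, which is what lets a 20B-parameter MoE activate far fewer parameters per forward pass.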

    Training Methodology:
    • Comprehensive safety evaluation and testing protocols
    • Global community feedback integration
    • Malicious fine-tuning resistance verification
    • o200k_harmony tokenizer (a superset of the GPT-4o tokenizer) with Harmony response format

    Performance Characteristics:
    • Native MXFP4 quantization of MoE weights for optimal inference speed
    • Single B200 GPU deployment capability
    • 128K context window with efficient memory usage
    • Adjustable reasoning effort levels for task-specific optimization
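To see why 4-bit weights cut memory so sharply, here is a rough sketch of block-wise 4-bit quantization. This is a simplified symmetric int4 scheme for illustration only, not the model's actual MXFP4 format: each block stores one floating-point scale plus 4-bit integers.

```python
import numpy as np

def quantize_4bit(block):
    """Simplified symmetric 4-bit quantization of one weight block:
    one fp scale plus integers in [-7, 7] (4 bits each)."""
    scale = float(np.abs(block).max()) / 7.0
    if scale == 0.0:
        scale = 1.0
    q = np.clip(np.round(block / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize_4bit(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(1).normal(size=32).astype(np.float32)
q, s = quantize_4bit(w)
w_hat = dequantize_4bit(q, s)
print(float(np.abs(w - w_hat).max()))  # rounding error, bounded by scale / 2
```

Storing 4 bits plus a shared scale per block (versus 16 bits per weight) is roughly a 4x reduction in weight memory, which is what enables single-GPU deployment.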

  • Applications & use cases

    Development Applications:
    • Rapid prototyping and development support
    • Code generation and optimization
    • API design and documentation
    • System integration and testing

    Business Solutions:
    • Customer support automation
    • Content generation and editing
    • Process automation and workflow optimization
    • Market research and analysis

    Educational Use Cases:
    • Interactive tutoring and learning assistance
    • Curriculum development support
    • Research methodology guidance
    • Academic writing and editing

    Deployment Advantages:
    • Cost-effective single-GPU operation
    • Reduced infrastructure requirements
    • Scalable deployment across multiple instances
    • Edge computing and distributed processing capabilities

Model details
  • Model provider
    OpenAI
  • Type
    Chat
  • Main use cases
    Chat
    Small & Fast
    Function Calling
  • Features
    Function Calling
    JSON Mode
  • Fine tuning
    Supported
  • Speed
    High
  • Intelligence
    High
  • Deployment
    Serverless
    On-Demand Dedicated
    Monthly Reserved
  • Parameters
    20B
  • Context length
    128K
  • Input price

    $0.05 / 1M tokens

  • Output price

    $0.20 / 1M tokens

  • Input modalities
    Text
  • Output modalities
    Text
  • Released
    August 4, 2025
  • Last updated
    August 4, 2025
  • Category
    Code
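Given the input and output prices listed above, per-request cost is simple arithmetic:

```python
INPUT_PRICE = 0.05 / 1_000_000   # $ per input token
OUTPUT_PRICE = 0.20 / 1_000_000  # $ per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in dollars for one chat completion."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# e.g. a 2,000-token prompt with a 500-token completion
print(f"${request_cost(2000, 500):.6f}")  # $0.000200
```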