Models / Deep Cogito
LLM
Chat

Cogito v2.1 671B

Advanced hybrid reasoning model with self-improving capabilities

About model

Cogito v2.1 671B is Deep Cogito's flagship open-source hybrid reasoning model, trained with Iterated Distillation and Amplification (IDA), a self-improvement method that teaches the model to think better over time. It outperforms all US open models and rivals Claude 4 Opus and O3, reaching frontier-level performance while using reasoning chains roughly 60% shorter than competitors': an average of 4,894 tokens per response, the lowest among frontier models, at just $1.25 per million tokens.

  • AIME 2025 (Competition Math): 89.47%, elite mathematical reasoning that outperforms models 10x larger
  • More efficient reasoning: 60% shorter chains than DeepSeek R1 with equal accuracy
  • Average tokens per response: 4,894, the lowest among all frontier models, for massive cost savings
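The headline efficiency numbers reduce to simple per-request arithmetic. A minimal sketch, using the 4,894 average-tokens-per-response and $1.25-per-million-token figures quoted above (input-token cost is ignored for illustration):

```python
# Rough per-response cost at $1.25 per 1M tokens, using the
# 4,894 average-tokens-per-response figure quoted above.
PRICE_PER_TOKEN = 1.25 / 1_000_000  # USD
AVG_TOKENS = 4_894

cost_per_response = AVG_TOKENS * PRICE_PER_TOKEN
cost_per_10k_responses = cost_per_response * 10_000

print(f"${cost_per_response:.5f} per response")
print(f"${cost_per_10k_responses:.2f} per 10k responses")
```

At these rates, ten thousand average-length responses cost on the order of sixty dollars, which is where the at-scale savings claims later in this page come from.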

Model key capabilities
  • Hybrid Reasoning Modes: Seamlessly switch between fast standard responses and deep step-by-step reasoning
  • Self-Improving Intelligence: IDA methodology distills reasoning discoveries back into parameters, compounding over time
  • State-of-the-Art Benchmarks: 98.57% MATH-500, 77.72% GPQA Diamond, 84.69% MMLU Pro
  • Production-Ready Efficiency: 128K context window, OpenAI-compatible API, native tool calling support
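The hybrid-mode switch is exposed differently by different serving stacks; earlier Cogito releases toggled deep reasoning via a system prompt. The sketch below builds the two OpenAI-compatible request payloads on that assumption (the "Enable deep thinking subroutine." string is carried over from prior Cogito releases and should be verified against the current model card before use):

```python
import json

MODEL = "deepcogito/cogito-v2-1-671b"
# Assumption: reasoning mode is toggled via a system prompt, as in
# earlier Cogito releases. Check the model card for the current switch.
THINKING_PROMPT = "Enable deep thinking subroutine."

def build_payload(question: str, reasoning: bool = False) -> dict:
    """Build an OpenAI-compatible chat-completions payload,
    optionally enabling the deep reasoning mode."""
    messages = []
    if reasoning:
        messages.append({"role": "system", "content": THINKING_PROMPT})
    messages.append({"role": "user", "content": question})
    return {"model": MODEL, "messages": messages}

fast = build_payload("What is 17 * 24?")
deep = build_payload("What is 17 * 24?", reasoning=True)
print(json.dumps(deep, indent=2))
```

Either payload can be POSTed to the chat-completions endpoint shown in the API usage section; only the presence of the system message differs between the two modes.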
Performance benchmarks

Closed-source competitor models are shown for comparison. Cogito's own scores are those reported elsewhere on this page; cells marked "–" were not reported.

| Model | AIME 2025 | GPQA Diamond | HLE | LiveCodeBench | MATH500 | SWE-bench Verified |
| --- | --- | --- | --- | --- | --- | --- |
| Cogito v2.1 671B | 89.47% | 77.72% | – | 76.0% | 98.57% | 42.00% |
| Claude Opus 4.6 | – | 90.5% | 34.2% | – | – | 78.7% |
| OpenAI o3 | – | 83.3% | 24.9% | – | 99.2% | 62.3% |
| OpenAI o1 | – | 76.8% | – | – | 96.4% | 48.9% |
| GPT-4o | – | 49.2% | 2.7% | 32.3% | 89.3% | 31.0% |

  • API usage

    Endpoint: deepcogito/cogito-v2-1-671b

    cURL:

    curl -X POST "https://api.together.xyz/v1/chat/completions" \
      -H "Authorization: Bearer $TOGETHER_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "deepcogito/cogito-v2-1-671b",
        "messages": [
          {
            "role": "user",
            "content": "What are some fun things to do in New York?"
          }
        ]
      }'

    Python:

    from together import Together

    client = Together()

    response = client.chat.completions.create(
      model="deepcogito/cogito-v2-1-671b",
      messages=[
        {
          "role": "user",
          "content": "What are some fun things to do in New York?"
        }
      ]
    )
    print(response.choices[0].message.content)

    TypeScript:

    import Together from 'together-ai';

    const together = new Together();

    const completion = await together.chat.completions.create({
      model: 'deepcogito/cogito-v2-1-671b',
      messages: [
        {
          role: 'user',
          content: 'What are some fun things to do in New York?'
        }
      ],
    });

    console.log(completion.choices[0].message.content);
  • Model card

    Architecture Overview:
    • Cogito v2.1 671B uses a Mixture-of-Experts (MoE) architecture with 671 billion total parameters; sparse routing activates only a small set of specialized expert subnetworks per token, enabling massive scale without proportional compute cost
    • Features a 128K token context window optimized for long-form reasoning, technical documentation, and multi-turn conversations
    • Implements a hybrid inference system supporting both standard mode (direct answers using internalized "intuition") and reasoning mode (step-by-step self-reflection with visible thought chains)
    • Optimized for efficient serverless deployment on Together AI's infrastructure
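    The sparse-routing idea in the first bullet can be illustrated with a toy gating function: a router scores every expert for each token, but only the top-k scores are kept, so only those experts execute. This is a schematic, not Cogito's actual router; real MoE layers use learned linear gates and far more experts:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(router_logits, k=2):
    """Pick the top-k experts for one token and renormalize their weights.

    Only the chosen experts' subnetworks would run, which is how a
    671B-total-parameter MoE model avoids 671B of compute per token.
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    probs = softmax([router_logits[i] for i in top])
    return list(zip(top, probs))

# 8 toy experts, 2 active per token
logits = [0.1, 2.3, -1.0, 0.7, 1.9, -0.4, 0.0, 0.5]
print(route(logits))  # experts 1 and 4 carry this token
```

    The token's output would then be the weighted sum of just those two experts' outputs, with the other six experts contributing no compute at all.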

    Training Methodology - Iterated Distillation & Amplification (IDA):
    • Revolutionary self-improvement approach where the model runs reasoning chains during training, then is trained on its own intermediate thoughts to develop stronger "machine intuition"
    • Unlike traditional models that rely on extended inference-time reasoning, Cogito distills successful reasoning patterns directly into model parameters
    • Training process explicitly rewards shorter, more efficient reasoning paths while discouraging unnecessary computational detours
    • Trained on multilingual datasets spanning 30+ languages with emphasis on coding, STEM, instruction following, and general helpfulness
    • The entire Cogito family (3B to 671B) was trained for under $3.5 million in total, an unusually low cost for models at this scale
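    The amplify-then-distill loop above can be made concrete with a toy model: "amplification" spends extra inference-time compute (here, majority voting over many samples), and "distillation" folds the amplified success rate back into the base policy, so each round starts from a stronger prior. That is the compounding effect the bullets describe. Everything below (the Bernoulli policy, vote count, trial count) is an illustrative stand-in, not Deep Cogito's training code:

```python
import random

random.seed(0)

def base_policy(p):
    # Toy policy: answers a question correctly with probability p.
    return random.random() < p

def amplify(p, votes=25):
    # Amplification: extra inference-time compute via majority vote.
    correct = sum(base_policy(p) for _ in range(votes))
    return correct > votes / 2

def distill(p, trials=500):
    # Distillation: fold the amplified success rate back into the
    # policy, so the next round starts from a stronger base.
    return sum(amplify(p) for _ in range(trials)) / trials

p = 0.6
history = [p]
for _ in range(3):  # three amplify-distill iterations
    p = distill(p)
    history.append(p)
print(history)  # base accuracy compounds upward each round
```

    The same dynamic, applied to reasoning chains instead of coin flips, is why IDA can keep improving without simply emitting longer and longer chains at inference time.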

    Performance Characteristics:
    • AIME 2025 (Competition Mathematics): 89.47% - outperforming models 10x larger
    • MATH-500 benchmark: 98.57% accuracy
    • GPQA Diamond (Scientific Reasoning): 77.72%
    • SWE-Bench Verified (Coding): 42.00% solve rate
    • MMLU Pro (Reasoning & Knowledge): 84.69%
    • Multilingual MMLU: 86.24% across 30+ languages
    • Average token efficiency: 4,894 tokens per response (lowest among frontier models)
    • Competitive with DeepSeek v3, matching or exceeding the latest (0528) release while using 60% shorter reasoning chains
    • Approaches capabilities of closed models like Claude 4 Opus, O3, and GPT-5 across diverse benchmarks
    • Demonstrates emergent multimodal reasoning capabilities, able to reason about images despite not being explicitly trained for visual tasks

  • Applications & use cases

    High-Performance Use Cases:
    • Advanced Mathematical Problem Solving: Superior performance on competition mathematics (AIME 2025: 89.47%), calculus, optimization problems, and quantitative analysis
    • Software Engineering & Code Generation: 42% solve rate on SWE-Bench demonstrates strong debugging, code review, and system design capabilities
    • Scientific Research & STEM: 77.72% on GPQA Diamond showcases expertise in physics, chemistry, biology, and interdisciplinary scientific reasoning
    • Multilingual Applications: 86.24% on Multilingual MMLU enables global deployment across 30+ languages with native-level comprehension
    • Legal & Policy Analysis: Reasoning mode excels at applying precedents, analyzing case law, and providing nuanced legal interpretations

    Enterprise Applications:
    • Intelligent Document Processing: 128K context window handles entire technical documents, contracts, research papers in single context
    • Customer Support Automation: Hybrid mode allows fast responses for simple queries, deep reasoning for complex troubleshooting
    • Financial Analysis & Risk Assessment: Strong quantitative reasoning combined with efficient token usage for cost-effective at-scale deployment
    • Educational Technology: Step-by-step reasoning mode ideal for tutoring, homework help, and adaptive learning systems
    • Research Assistance: Frontier performance at $1.25/1M tokens makes large-scale research analysis economically viable
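    For the document-processing use case, a quick pre-flight check against the 128K-token window is often worthwhile before sending a request. The 4-characters-per-token ratio below is a common rough heuristic for English text, not this model's actual tokenizer, so treat the result as an estimate only:

```python
CONTEXT_WINDOW = 128_000   # tokens, per the model card
CHARS_PER_TOKEN = 4        # rough heuristic for English text, not exact

def fits_in_context(text: str, reserve_for_output: int = 4_894) -> bool:
    """Estimate whether a document fits in context with room for a reply.

    reserve_for_output defaults to the model's average response length.
    """
    est_tokens = len(text) / CHARS_PER_TOKEN
    return est_tokens + reserve_for_output <= CONTEXT_WINDOW

contract = "lorem ipsum " * 20_000   # ~240k characters, ~60k tokens
print(fits_in_context(contract))
```

    Documents that fail the check can be chunked or summarized in stages before the final single-context pass.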

    Developer & Research Applications:
    • Rapid Prototyping: Together AI's serverless platform enables instant deployment without infrastructure setup
    • Model Experimentation: Compare standard vs reasoning modes in real-time via playground interface
    • Benchmark Development: Performance approaching closed frontier models enables reproducible research
    • Scalable Research: Serverless infrastructure scales automatically for large-scale experiments

    Cost-Sensitive Deployments:
    • High-Volume Production: Lowest token usage (4,894 avg) among frontier models translates to 20-40% cost savings vs alternatives
    • Serverless Efficiency: Pay-per-use pricing on Together AI eliminates infrastructure costs and management overhead
    • Startup & SMB Applications: Frontier capabilities at accessible pricing ($1.25/1M tokens) democratizes advanced AI
    • Auto-scaling: Together AI's serverless infrastructure automatically handles traffic spikes without manual intervention
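    The savings figure above follows mechanically from token counts: at comparable per-token prices, a model that emits fewer tokens per response costs proportionally less. The sketch below compares Cogito's quoted averages against a hypothetical longer-chain competitor; the competitor's 8,000-token average and $1.25 price are illustrative placeholders, not a real model's numbers:

```python
def monthly_cost(requests, avg_tokens, usd_per_mtok):
    # Output-token cost only, for illustration.
    return requests * avg_tokens * usd_per_mtok / 1_000_000

REQUESTS = 1_000_000  # requests per month

cogito = monthly_cost(REQUESTS, 4_894, 1.25)
# Hypothetical longer-chain competitor at the same price point.
competitor = monthly_cost(REQUESTS, 8_000, 1.25)

savings = 1 - cogito / competitor
print(f"Cogito: ${cogito:,.0f}  competitor: ${competitor:,.0f}  "
      f"savings: {savings:.0%}")
```

    With these placeholder numbers the savings land near 39%, inside the 20-40% range claimed above; the actual figure depends entirely on the competitor's chain length and pricing.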

    Unique Capabilities:
    • Emergent Image Reasoning: Despite no explicit visual training, demonstrates ability to reason about images when presented in context
    • Efficiency-First Design: 60% shorter reasoning chains mean faster responses and lower costs without sacrificing accuracy
    • Hybrid Intelligence: Seamlessly switch between fast intuition and deep deliberation based on query complexity

Model details
  • Model provider
    Deep Cogito
  • Type
    LLM
    Chat
  • Main use cases
    Chat
    Small & Fast
  • Deployment
    Serverless
  • Parameters
    671B (MoE)
  • Context length
    128K
  • Input price

    $1.25 / 1M tokens

  • Output price

    $1.25 / 1M tokens

  • Input modalities
    Text
  • Output modalities
    Text
  • Quantization level
    FP8
  • Category
    Chat