
Arcee AI Trinity Mini

Advanced sparse MoE model for efficient inference on Together AI.

About model

Trinity Mini brings frontier-level language understanding to your applications without frontier costs. This 26B sparse mixture-of-experts model activates just 3B parameters per token, delivering exceptional reasoning, tool use, and multi-turn conversation capabilities across a 128K context window. Whether you're building conversational AI, agentic workflows, or production systems requiring long-context understanding, Trinity Mini offers the efficiency and performance to scale from prototype to production seamlessly.

Performance benchmarks

Trinity Mini is benchmarked on AIME 2025, GPQA Diamond, HLE, LiveCodeBench, MATH500, and SWE-bench Verified against related open-source models and competitor closed-source models, including Claude Opus 4.6, OpenAI o3, OpenAI o1, and GPT-4o.

  • API usage

    • cURL
    • Python
    • TypeScript

    Endpoint:

    arcee-ai/trinity-mini

    curl -X POST "https://api.together.xyz/v1/chat/completions" \
      -H "Authorization: Bearer $TOGETHER_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "arcee-ai/trinity-mini",
        "messages": [
          {
            "role": "user",
            "content": "What are some fun things to do in New York?"
          }
        ]
      }'
    
    from together import Together
    
    client = Together()
    
    response = client.chat.completions.create(
      model="arcee-ai/trinity-mini",
      messages=[
        {
          "role": "user",
          "content": "What are some fun things to do in New York?"
        }
      ]
    )
    print(response.choices[0].message.content)
    
    import Together from 'together-ai';
    const together = new Together();
    
    const completion = await together.chat.completions.create({
      model: 'arcee-ai/trinity-mini',
      messages: [
        {
          role: 'user',
          content: 'What are some fun things to do in New York?'
         }
      ],
    });
    
    console.log(completion.choices[0].message.content);
    
  • Model card

    Architecture Overview:
    • Sparse mixture-of-experts (MoE) architecture with 26B total parameters and 3B activated per token
    • Efficient attention mechanism that reduces memory and compute requirements while preserving long-context coherence
    • 128K-token context window supporting extended document processing and multi-turn interactions
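
The sparse-activation idea above can be sketched as a toy top-k router: a gating layer scores every expert for each token, and only the k best experts actually run. All sizes below are illustrative placeholders, not Trinity Mini's real configuration.

```python
# Toy sketch of sparse MoE routing: only TOP_K of NUM_EXPERTS expert
# networks run per token, which is how a 26B-parameter model can activate
# only ~3B parameters per token. Sizes here are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 8   # hypothetical expert count
TOP_K = 2         # experts activated per token
D_MODEL = 16      # toy hidden size

# Router: a linear layer scoring each expert for a given token.
router_w = rng.normal(size=(D_MODEL, NUM_EXPERTS))
# Each "expert" is reduced to a single weight matrix in this sketch.
experts = [rng.normal(size=(D_MODEL, D_MODEL)) for _ in range(NUM_EXPERTS)]

def moe_forward(x):
    """Route a token vector to its top-k experts and mix their outputs."""
    logits = x @ router_w
    top = np.argsort(logits)[-TOP_K:]      # indices of the k highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()               # softmax over only the chosen experts
    # Only TOP_K expert matrices are ever touched: sparse activation.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.normal(size=D_MODEL)
out = moe_forward(token)
print(out.shape)  # (16,)
```

The compute saving comes from the loop touching only the routed experts; the other expert weights sit idle for that token.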

    Training Methodology:
    • Trained with continuous reinforcement learning techniques for ongoing capability improvements
    • Built by Arcee AI's collaborative research team focused on delivering best-in-class per-parameter performance
    • Optimized for multi-turn conversations, tool use, and structured outputs

    Performance Characteristics:
    • Strong context utilization that fully leverages long inputs for coherent multi-turn reasoning
    • Reliable function and tool calling capabilities for agent workflows
    • High inference efficiency generating tokens rapidly while minimizing compute
    • Outstanding price-to-performance ratio compared to dense models of similar capability
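
Tool calling works through the OpenAI-style `tools` / `tool_choice` fields of the chat completions request. The sketch below only builds the request payload (the `get_weather` tool is hypothetical); in practice you would pass `tools=tools` and `tool_choice="auto"` to `client.chat.completions.create`.

```python
# Construct an OpenAI-style tool definition and chat request payload.
# The get_weather tool is a hypothetical example, not part of any API.
import json

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}]

request = {
    "model": "arcee-ai/trinity-mini",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": tools,
    "tool_choice": "auto",  # let the model decide whether to call the tool
}
print(json.dumps(request, indent=2)[:60])
```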

  • Applications & use cases

    Conversational AI Applications:
    • Multi-turn customer support chatbots with long conversation history
    • Virtual assistants with tool integration and function calling
    • Interactive documentation and knowledge base systems
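
The multi-turn pattern behind these use cases is simple: keep the full message history and resend it on every call so the model sees prior context. The assistant replies below are placeholders; in practice each would come back from the API.

```python
# Minimal multi-turn chat pattern: accumulate the conversation and pass
# the whole list as `messages=` on each API call. Replies are placeholders.
history = [{"role": "system", "content": "You are a helpful support agent."}]

def add_turn(history, user_msg, assistant_msg):
    """Append one user/assistant exchange to the running history."""
    history.append({"role": "user", "content": user_msg})
    history.append({"role": "assistant", "content": assistant_msg})
    return history

add_turn(history, "My order hasn't arrived.",
         "Sorry to hear that! What's the order number?")
add_turn(history, "It's #12345.", "Thanks, looking it up now.")

# The whole history would be sent as `messages=` on the next request.
print(len(history))  # 5 messages: 1 system + 2 user + 2 assistant
```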

    Agentic Workflows:
    • Multi-step agent systems requiring tool use and reasoning
    • Workflow automation with structured output generation
    • RAG systems with extended context understanding
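
For RAG over a large context window, retrieved chunks still need to fit a token budget. A rough sketch, assuming a crude 4-characters-per-token heuristic (a real pipeline would use the model's tokenizer for accurate counts):

```python
# Hedged sketch of packing retrieved chunks into a long-context budget.
# The chars_per_token ratio is a rough heuristic, not a real tokenizer.
def pack_context(chunks, max_tokens=100_000, chars_per_token=4):
    """Greedily add retrieved chunks until the rough token budget is hit."""
    packed, used = [], 0
    for chunk in chunks:
        est = len(chunk) // chars_per_token + 1  # crude token estimate
        if used + est > max_tokens:
            break
        packed.append(chunk)
        used += est
    return "\n\n".join(packed), used

chunks = ["alpha " * 100, "beta " * 100, "gamma " * 100]
context, used_tokens = pack_context(chunks, max_tokens=200)
print(used_tokens)  # 151: only the first chunk fits the 200-token budget
```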

    Enterprise Integration:
    • Cost-efficient production deployments via Together AI APIs
    • Internal tooling with natural language interfaces
    • Document analysis and processing pipelines with 128K context support

Model details
  • Model provider
    Arcee AI
  • Type
    Chat
  • Main use cases
    Chat
    Small & Fast
  • Deployment
    Serverless
    Monthly Reserved
  • Parameters
    26B
  • Activated parameters
    3B
  • Context length
    128K
  • Input price

    $0.05 / 1M tokens

  • Output price

    $0.15 / 1M tokens

  • Input modalities
    Text
  • Output modalities
    Text
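
The listed serverless prices make per-request costs easy to estimate: $0.05 per 1M input tokens and $0.15 per 1M output tokens.

```python
# Cost arithmetic from the listed serverless prices for Trinity Mini.
INPUT_PRICE = 0.05 / 1_000_000   # dollars per input token
OUTPUT_PRICE = 0.15 / 1_000_000  # dollars per output token

def request_cost(input_tokens, output_tokens):
    """Dollar cost of one request at the listed prices."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# e.g. a long-context request: 100K tokens in, 2K tokens out
print(round(request_cost(100_000, 2_000), 4))  # 0.0053
```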