Models / Arcee AI
LLM

Arcee AI AFM-4.5B-Preview

A 4.5B-parameter foundation model trained on 6.58T curated tokens. It delivers 200+ tokens/sec on CPU with 4-bit quantization, meets Western compliance standards, and outperforms Qwen3-4B-Base and Gemma3-4B-IT on most benchmarks.

About model

Arcee AI AFM-4.5B-Preview is an instruction-tuned model built for enterprise-grade performance across diverse deployment environments, with particular strength in mathematical reasoning and code generation. It suits enterprise users who need robust, consistent model performance.

  • API usage

    • cURL
    • Python
    • TypeScript

    Endpoint:

    arcee-ai/AFM-4.5B-Preview

    curl -X POST "https://api.together.xyz/v1/chat/completions" \
      -H "Authorization: Bearer $TOGETHER_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "arcee-ai/AFM-4.5B-Preview",
        "messages": [
          {
            "role": "user",
            "content": "What are some fun things to do in New York?"
          }
        ]
    }'
    
    from together import Together
    
    client = Together()
    
    response = client.chat.completions.create(
      model="arcee-ai/AFM-4.5B-Preview",
      messages=[
        {
          "role": "user",
          "content": "What are some fun things to do in New York?"
        }
      ]
    )
    print(response.choices[0].message.content)
    
    import Together from 'together-ai';
    const together = new Together();
    
    const completion = await together.chat.completions.create({
      model: 'arcee-ai/AFM-4.5B-Preview',
      messages: [
        {
          role: 'user',
          content: 'What are some fun things to do in New York?'
        }
      ],
    });
    
    console.log(completion.choices[0].message.content);
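    For interactive use, the same Python call can stream the reply token-by-token instead of waiting for the full message. This is a hedged sketch assuming the Together SDK's OpenAI-compatible `stream=True` flag and delta-based chunks; `build_request` and `stream_reply` are illustrative helper names, not part of the SDK.

    ```python
    import os

    MODEL = "arcee-ai/AFM-4.5B-Preview"

    def build_request(prompt: str) -> dict:
        """Assemble the same chat-completions payload as the examples above."""
        return {
            "model": MODEL,
            "messages": [{"role": "user", "content": prompt}],
            "stream": True,  # ask for incremental chunks instead of one response
        }

    def stream_reply(prompt: str) -> None:
        # Imported lazily so the payload helper works without the SDK installed.
        from together import Together

        client = Together()  # reads TOGETHER_API_KEY from the environment
        stream = client.chat.completions.create(**build_request(prompt))
        for chunk in stream:
            # Each chunk carries an incremental delta, not the full message.
            print(chunk.choices[0].delta.content or "", end="", flush=True)

    if os.environ.get("TOGETHER_API_KEY"):
        stream_reply("What are some fun things to do in New York?")
    ```

    Streaming matters most for a small, fast model like this one: first tokens typically arrive well before the full completion finishes.
    
    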
    
  • Model card

    AFM-4.5B-Preview is Arcee AI's first foundation model, trained on 6.58T curated tokens in partnership with DatologyAI. It is optimized for enterprise deployment, with Western compliance standards and efficient CPU/GPU inference.

    Key Improvements:

    • 200+ tokens/sec CPU performance on 4-bit quantization
    • Enterprise-grade compliance for regulated industries
    • Multi-stage post-training with RL and KTO alignment

    Benchmark Performance

    | Benchmark     | AFM-4.5B-Preview | Qwen3-4B-Base | Gemma3-4B-IT |
    |---------------|------------------|---------------|--------------|
    | MMLU          | 65.3             | 69.9          | 57.7         |
    | PIQA          | 81.5             | 74.6          | 77.3         |
    | Winogrande    | 70.4             | 66.4          | 69.7         |
    | ARC-Challenge | 61.9             | 54.2          | 57.2         |
    | HellaSwag     | 79.6             | 68.4          | 74.2         |
  • Model provider
    Arcee AI
  • Type
    LLM
  • Main use cases
    Chat
    Small & Fast
  • Deployment
    Serverless
  • Parameters
    4.6B
  • Context length
    65k
  • Input modalities
    Text
  • Output modalities
    Text