LFM2 24B A2B API

This model isn’t available on Together’s Serverless API.
Deploy this model on an on-demand Dedicated Endpoint or pick a supported alternative from the Model Library.
LFM2-24B-A2B is a hybrid Mixture-of-Experts (MoE) model with 24B total parameters, of which only 2.3B are activated per token, optimized as the fast inner-loop model for high-volume multi-agent pipelines. Its hybrid architecture combines 30 double-gated LIV convolution blocks with 10 grouped-query attention (GQA) blocks, delivering cost-effective inference that lets many agents run concurrently on the same infrastructure. With native function calling, web search, and structured outputs, LFM2-24B-A2B serves as the generation backbone in high-throughput RAG pipelines, supports 9 languages, and handles a 32,768-token context on Together AI's production infrastructure.
How to use LFM2 24B A2B
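A minimal sketch of calling the model through Together's OpenAI-compatible chat completions API using only the Python standard library. The endpoint URL and model identifier below are assumptions; confirm both in your Together dashboard once the dedicated endpoint is deployed.

```python
# Hedged sketch: single-turn chat completion against Together's
# OpenAI-compatible endpoint. URL and model ID are assumptions.
import json
import os
import urllib.request

API_URL = "https://api.together.xyz/v1/chat/completions"  # assumed endpoint
MODEL_ID = "LiquidAI/LFM2-24B-A2B"                        # assumed model ID

def build_request(prompt: str, max_tokens: int = 256) -> dict:
    """Assemble the JSON body for a single-turn chat completion."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def call_model(prompt: str) -> str:
    """POST the request; expects TOGETHER_API_KEY in the environment."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_request(prompt)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

`call_model("Summarize MoE routing in two sentences.")` would return the assistant's reply; any OpenAI-compatible client library can be substituted for the raw `urllib` call.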
Model details
Architecture Overview:
• Hybrid MoE model with 24B total parameters, 2.3B activated per token
• 40-layer architecture: 30 double-gated LIV convolution blocks + 10 GQA blocks
• 64 experts per MoE block with top-4 routing, first 2 layers dense
• Hidden dimension: 2,048 with expert intermediate size: 1,536
• 32,768 token context length for extended workflows
• 65,536 vocabulary size for efficient tokenization
• Minimal active parameters enabling massive agent concurrency
• Designed as fast inner-loop model in multi-step agent pipelines
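The figures above can be sanity-checked with back-of-the-envelope arithmetic. This sketch assumes SwiGLU-style experts with three weight matrices (gate, up, down); the exact expert layout is an assumption, so treat the result as a rough estimate of the routed-expert share of the active budget.

```python
# Rough estimate of active expert parameters per token, assuming
# three-matrix (gate/up/down) experts -- an assumption, not the spec.
HIDDEN = 2048          # hidden dimension (from the list above)
EXPERT_FF = 1536       # expert intermediate size
TOP_K = 4              # experts activated per token
MOE_LAYERS = 40 - 2    # 40 layers total, first 2 dense

params_per_expert = 3 * HIDDEN * EXPERT_FF        # gate + up + down matrices
active_ffn_per_layer = TOP_K * params_per_expert  # only routed experts run
active_ffn_total = MOE_LAYERS * active_ffn_per_layer

print(f"{active_ffn_total / 1e9:.2f}B active expert parameters")
```

Under these assumptions the routed experts contribute roughly 1.4B active parameters; attention, convolution, dense layers, and embeddings account for the remainder of the ~2.3B active budget.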
Training Methodology:
• Trained on 17T tokens (pre-training ongoing)
• General-purpose instruct model without reasoning traces
• Optimized for fast inference in high-volume multi-agent systems
• 9-language support: English, Arabic, Chinese, French, German, Japanese, Korean, Spanish, Portuguese
Performance Characteristics:
• Cost-effective efficiency: 24B MoE with only 2.3B active parameters per token
• Native function calling for tool orchestration in agent workflows
• Web search integration for retrieval-augmented generation
• Structured outputs for reliable data extraction and formatting
• Fast inner-loop performance optimized for multi-step pipelines
• High-throughput inference enabling massive concurrent workloads
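The structured-outputs capability can be exercised by attaching a JSON schema to the request. The exact `response_format` shape varies between OpenAI-compatible providers, so the field layout below is an assumption; check Together's structured-outputs documentation for the authoritative format.

```python
# Hedged sketch of a structured-output request. The "response_format"
# layout is an assumption about Together's OpenAI-compatible API.
EXTRACTION_SCHEMA = {
    "type": "object",
    "properties": {
        "company": {"type": "string"},
        "amount_usd": {"type": "number"},
    },
    "required": ["company", "amount_usd"],
}

def structured_request(model: str, text: str) -> dict:
    """Chat-completion body asking for schema-conforming JSON output."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "Extract the fields as JSON only."},
            {"role": "user", "content": text},
        ],
        "response_format": {"type": "json_object", "schema": EXTRACTION_SCHEMA},
    }
```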
Prompting LFM2 24B A2B
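When calling through the chat completions API, the server applies the model's chat template for you. For raw-completion setups, a ChatML-style template can be rendered by hand; the special tokens below are an assumption about LFM2's template, so verify against the model's tokenizer configuration before relying on them.

```python
# Minimal ChatML-style prompt renderer. The <|im_start|>/<|im_end|>
# tokens are assumed; verify against LFM2's actual chat template.
def render_chatml(messages: list[dict]) -> str:
    """Flatten role-tagged messages into a single prompt string."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
             for m in messages]
    parts.append("<|im_start|>assistant\n")  # cue the model to respond
    return "".join(parts)
```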
Applications & Use Cases
High-Volume Multi-Agent Pipelines:
• Optimized as fast inner-loop model for multi-step agent workflows at scale
• Native function calling for tool orchestration and API integration
• Structured outputs for reliable data extraction between agent steps
• Minimal active parameters (2.3B) enabling massive concurrent agent execution
• 32K context supporting extended multi-turn agent conversations
• Cost-effective inference for high-throughput production deployments
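The concurrency pattern these points describe, with many inner-loop agent calls dispatched in parallel, can be sketched with `asyncio`. The model call is stubbed out here; in production each task would hit the dedicated endpoint over HTTP.

```python
# Sketch of fan-out concurrency for inner-loop agent steps.
# agent_step is a stub standing in for a real model call.
import asyncio

async def agent_step(task_id: int, prompt: str) -> str:
    """Placeholder for one inner-loop model call (no network)."""
    await asyncio.sleep(0)  # yield control, as a real HTTP call would
    return f"agent-{task_id}: processed {prompt!r}"

async def run_agents(prompts: list[str]) -> list[str]:
    """Fan out one agent step per prompt and gather all results."""
    return await asyncio.gather(
        *(agent_step(i, p) for i, p in enumerate(prompts))
    )

results = asyncio.run(run_agents(["classify", "summarize", "route"]))
```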
High-Throughput RAG Pipelines:
• Generation backbone optimized for production-scale retrieval-augmented setups
• Web search integration for real-time information retrieval
• Structured outputs for consistent formatting of retrieved data
• Efficient tokenization with 65,536 vocabulary size
• Fast inference enabling low-latency, high-volume RAG responses
• Cost-effective scaling for enterprise RAG deployments
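A RAG inner loop like the one described above has to pack retrieved passages into the 32,768-token window. This sketch uses a crude ~4-characters-per-token estimate; the real tokenizer will differ, so treat the budget math as an approximation.

```python
# Sketch of RAG context packing under the 32,768-token limit.
# estimate_tokens is a rough heuristic, not the real tokenizer.
CONTEXT_TOKENS = 32768
RESERVED_FOR_ANSWER = 1024

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # ~4 chars/token approximation

def build_rag_prompt(question: str, docs: list[str]) -> str:
    """Pack retrieved passages into the prompt until the budget runs out."""
    budget = CONTEXT_TOKENS - RESERVED_FOR_ANSWER - estimate_tokens(question)
    kept = []
    for doc in docs:
        cost = estimate_tokens(doc)
        if cost > budget:
            break
        kept.append(doc)
        budget -= cost
    context = "\n\n".join(kept)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```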
Production Agentic Tool Use:
• Native function calling for seamless tool integration at scale
• Web search capabilities for autonomous information gathering
• Structured outputs ensuring reliable tool response parsing
• Fast inner-loop performance for high-throughput agent operations
• Multi-language support (9 languages) for global deployment
• Minimal active parameters reducing inference costs
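Native function calling is exposed through the OpenAI-compatible `tools` format. The `get_weather` tool below is a hypothetical example for illustration, not part of the model card.

```python
# Hedged sketch of a function-calling request body. The get_weather
# tool is hypothetical; only the request shape is being illustrated.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def tool_request(model: str, user_text: str) -> dict:
    """Chat-completion body letting the model decide whether to call a tool."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "tools": [WEATHER_TOOL],
        "tool_choice": "auto",
    }
```

When the model elects to call the tool, the response carries a `tool_calls` entry whose arguments should be parsed and validated before executing the tool.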
Cost-Effective Inference at Scale:
• 24B total parameters with only 2.3B active, lowering inference cost per token
• Run more concurrent agents on the same infrastructure
• Hybrid architecture optimized for production efficiency
• Lower per-token compute and memory bandwidth via sparse MoE activation
• High-volume deployment without proportional cost increases
Multilingual Production Applications:
• 9-language support: English, Arabic, Chinese, French, German, Japanese, Korean, Spanish, Portuguese
• Cross-lingual agent workflows and tool calling
• Multilingual RAG pipelines with consistent performance
• Global deployment with regional language support
• Cost-effective scaling across international markets
