How Scaled Cognition Trains APT-1 on Together AI GPU Clusters
~3–4 months cumulative research time recovered
~50% cost savings vs. other providers
Executive Summary
Scaled Cognition needed reliable bare-metal GPU infrastructure to build Large Action Models (LAMs) using custom training stacks that managed platforms couldn't support.
They adopted Together AI's GPU Clusters with direct access to H200 and B200 GPUs, enabling their custom Slurm orchestration and multi-node model training workflows.
As a result, they recovered an estimated 3–4 months of cumulative research time previously lost to infrastructure failures and achieved ~50% cost savings vs. other providers, helping to power, among other deployments, the Genesys Cloud™ platform used by thousands of enterprises to deliver next-level customer experience.
About Scaled Cognition
Scaled Cognition is the creator of APT-1, the only frontier model for CX that eliminates hallucinations, enabling next-generation customer support for regulated enterprises and notably powering the Genesys Cloud™ CX platform.
Scaled Cognition was founded by UC Berkeley AI Professor Dan Klein and serial entrepreneur Dan Roth, the founders of Microsoft Semantic Machines, and has been joined by some of the leading researchers and engineers in the field, including early collaborators such as Percy Liang, now co-founder of Together AI. The company builds Large Action Models engineered for deterministic behavior, provenance, and policy-aligned execution across banking, healthcare, and other compliance-critical domains.
Their breakthrough is APT-1®, the industry-leading frontier Large Action Model (LAM) that optimizes for actions rather than tokens, eliminating hallucinations in structured workflows such as function calling and delivering far stricter policy adherence. LAMs track the provenance of every piece of information so that, for example, a credit card number can be traced back to an API response rather than generated from model weights. This enables AI agents that can guarantee compliance with regulations in industries where a single error can expose an organization to audits, regulatory scrutiny, and severe business and reputational risk.
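The provenance idea described above can be illustrated with a minimal sketch. This is not Scaled Cognition's implementation; the `TrackedValue`, `Source`, and `validate_action_args` names are hypothetical, assumed for illustration. The point is that every value carries a record of where it came from, so a guardrail can reject actions whose sensitive arguments were generated by the model rather than observed from an API response:

```python
from dataclasses import dataclass
from enum import Enum, auto

class Source(Enum):
    API_RESPONSE = auto()     # value observed in a tool/API call result
    USER_INPUT = auto()       # value supplied directly by the end user
    MODEL_GENERATED = auto()  # value produced by the model itself

@dataclass(frozen=True)
class TrackedValue:
    value: str
    source: Source
    origin_id: str  # e.g. the API call or message that produced it

def validate_action_args(args: dict, sensitive_fields: set) -> None:
    """Reject any action whose sensitive arguments are not grounded
    in an API response or user input."""
    for name, tv in args.items():
        if name in sensitive_fields and tv.source is Source.MODEL_GENERATED:
            raise ValueError(f"{name} must be grounded, not model-generated")

# A card number traced back to an API call passes the check;
# one invented by the model would raise ValueError.
card = TrackedValue("4242...", Source.API_RESPONSE, "get_payment_methods#1")
validate_action_args({"card_number": card}, {"card_number"})
```

In a real system the provenance record would flow through every transformation of the value; the sketch only shows the final policy check at action time.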
The Challenge
Building production-grade action models with guaranteed compliance required overcoming critical infrastructure and operational challenges:
INFRASTRUCTURE RELIABILITY FAILURES
While working with multiple data center providers and compute vendors, the team saw training runs lasting more than a few hours fail frequently due to networking issues: failing network controllers, dropped GPU communication, or jobs stalling without clear diagnostics. Each incident required slow back-and-forth with providers to track down problems at the networking or GPU level, forcing researchers to debug hardware instead of developing models. The cumulative impact: months of research capacity lost to infrastructure issues rather than model development.
UNSUSTAINABLE ECONOMICS
AWS pricing made sustained experimentation prohibitively expensive—especially when training runs take multiple days and require dozens of experiments to validate architectures and hyperparameters.
PLATFORM CONSTRAINTS
The team's methodology required bare-metal access with direct SSH to configure Slurm for distributed job scheduling and run their custom training stack. Their approach combines custom loss functions for action optimization, multi-node training using custom architectures, and tree-structured conversation training. Managed ML platforms introduced layers of complexity that prevented implementing these workflows.
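One element of that stack, tree-structured conversation training, can be sketched in a few lines. This is an illustrative assumption, not the team's code: a conversation is a tree in which branches represent alternative continuations of a shared prefix, and training consumes every root-to-leaf path so the common prefix is reused across branches:

```python
# Hypothetical sketch: flatten a branching conversation tree into
# linear root-to-leaf sequences suitable for sequence training.
def conversation_paths(node: dict) -> list:
    """Enumerate every root-to-leaf path through a conversation tree."""
    children = node.get("children", [])
    if not children:
        return [[node["turn"]]]
    return [[node["turn"]] + path
            for child in children
            for path in conversation_paths(child)]

tree = {
    "turn": "user: I lost my card",
    "children": [
        {"turn": "agent: freeze_card()", "children": [
            {"turn": "user: thanks"}]},
        {"turn": "agent: verify_identity()"},
    ],
}

# Two training sequences share the same opening turn.
paths = conversation_paths(tree)
```

The custom loss functions mentioned above would then score the action turns in each path; that part is omitted here since its details are proprietary.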
The Solution
Together AI's GPU Clusters provided the infrastructure foundation and partnership Scaled Cognition needed to build production-grade Large Action Models:
BARE METAL WITH HIGH-PERFORMANCE NETWORKING
Together AI provided a B200 cluster with NVIDIA Quantum-2 InfiniBand interconnect and direct SSH access, enabling Scaled Cognition to implement their training stack. The bare-metal access meant they could set up Slurm for distributed job orchestration, run custom CUDA kernels, and implement multi-node workflows without platform limitations. Together AI's cluster deployment proved transformational: after repeated failures elsewhere, multi-day training runs completed reliably without hardware-related interruptions—freeing months of research time previously lost to infrastructure debugging.
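The kind of multi-node job this bare-metal setup enables looks roughly like the following sbatch script. This is a generic sketch, not Scaled Cognition's actual configuration: `train.py`, the node and GPU counts, and the use of `torchrun` as the launcher are all assumptions for illustration.

```shell
#!/bin/bash
#SBATCH --job-name=lam-train       # multi-node LAM training run
#SBATCH --nodes=4                  # number of bare-metal nodes
#SBATCH --gpus-per-node=8          # GPUs per node
#SBATCH --ntasks-per-node=1        # one launcher process per node
#SBATCH --time=72:00:00            # multi-day runs are the norm here

# The first node in the allocation acts as the rendezvous host.
head_node=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)

# train.py is a stand-in for the team's custom training entry point.
srun torchrun \
  --nnodes="$SLURM_NNODES" \
  --nproc_per_node=8 \
  --rdzv_backend=c10d \
  --rdzv_endpoint="${head_node}:29500" \
  train.py
```

With direct SSH and root access on bare metal, this whole layer (Slurm daemons, NCCL settings, custom CUDA kernels) stays under the team's control rather than behind a managed platform's abstraction.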
HANDS-ON TECHNICAL PARTNERSHIP
Together's team helped with the initial Slurm configuration, migrating Scaled Cognition from their previous custom orchestration system. The ongoing support proved equally valuable: a shared Slack channel provided fast responses to any issues, and contract negotiations for cluster upgrades, moving to the latest hardware and increasing GPU counts, have proceeded smoothly as their needs scaled. An initial personal connection opened the door; the team stayed because of reliability and support quality.
COST-EFFECTIVE SCALABILITY
Together's pricing enabled sustainable iteration at scale at approximately 50% of AWS base rates for comparable GPU instances. When renewal time came, the team felt confident keeping their research workloads on Together's infrastructure based on the combination of reliability, support quality, and economics.
Results
Together AI's GPU Clusters enabled Scaled Cognition to recover months of lost research time and achieve the training velocity required to build production-grade Large Action Models:
Estimated 3-4 months of cumulative research time recovered
~50% cost savings vs. other compute providers
Production Deployment at Scale
APT-1 now powers the Genesys Cloud™ CX platform, processing customer support workflows with guaranteed policy compliance and provenance tracking in banking, healthcare, and other regulated industries. The models enable structured function calls and multi-step workflows that adhere strictly to policies where compliance errors carry serious consequences.
"We spent 3-4 months blocked by infrastructure failures with previous providers—debugging networking issues that had nothing to do with our models. Together AI's cluster has had zero training-blocking issues since we switched. That reliability, combined with significant cost savings compared to other providers, completely changed our development velocity. But honestly, the responsive support is what keeps us here: shared Slack channel, problems resolved within hours, smooth scaling of our training cluster as our needs have been expanding. Together AI lets us focus on building breakthrough models instead of fighting infrastructure." — Anthony Platanios, VP of Research, Scaled Cognition
Use case details
Products used
Highlights
- 3-4 months research time recovered
- ~50% cost savings vs. other providers
- Zero training-blocking issues
- Bare-metal H200/B200 access
- Reliable multi-node training runs
Use case
Training production-grade Large Action Models (LAMs) for regulated customer support workflows.
Company segment
Frontier Action Model Company