Together AI’s Instant Clusters Enable Latent Health to Build Clinical AI That Outperforms GPT-4

Summary

Latent Health needed affordable, flexible training infrastructure to build high-accuracy clinical AI for major health systems without slowing iteration.

They adopted Together AI’s Instant GPU Clusters (bare-metal NVIDIA H100s with direct SSH access, fast provisioning, and InfiniBand interconnect), enabling multi-node reinforcement learning (RL) and long-context training.

As a result, they achieved 7× lower training cost ($14/hour vs. $98/hour), 97% accuracy on internal clinical QA benchmarks (vs. 75–80% for GPT-4 and o3), and 2–3× faster experimentation, powering review cycles of minutes rather than hours across partner hospitals.

About Latent Health

Latent Health automates critical healthcare workflows for 25 major health systems, including UCSF, Northwestern, Yale, and Vanderbilt University Medical Center. Their AI platform analyzes patient charts and surfaces the clinical information pharmacists need for medication approvals, reducing review cycles from hours to minutes.

Founded by experts in ML research and healthcare operations with a mission to create "a provider for every patient," the 35-person company processes workflows for hundreds of pharmacists across major health networks. To compete against commercial foundation models on healthcare-specific tasks, Latent Health needed infrastructure that could support rapid, cost-effective model training.

The Challenge

Building highly accurate clinical question-answering models meant overcoming three critical infrastructure challenges:

UNSUSTAINABLE COSTS

AWS H100 instances at $98/hour made the large-scale, iterative training required for healthcare AI financially prohibitive for a growing startup.

VENDOR LIMITATIONS

Traditional cloud ML services introduced layers of complexity that prevented Latent Health's team from implementing reinforcement learning workflows and multi-node orchestration for their custom training approaches.

HEALTHCARE-SPECIFIC DEMANDS

Training on de-identified clinical data demanded flexible infrastructure supporting multi-modal datasets and rapid experimentation across models ranging from 7B to 70B parameters.

The Solution

Latent Health chose Together Instant Clusters for clinical AI training, unlocking the performance and flexibility needed to outcompete commercial foundation models.

BARE METAL FLEXIBILITY

Direct SSH access to 8x NVIDIA H100 GPU clusters through Together's Instant GPU Clusters removed the debugging overhead and platform constraints common on other cloud providers. This let Latent Health's ML team implement multi-node reinforcement learning and evolutionary algorithms without hitting vendor limitations.

RAPID, SCALABLE TRAINING

Together Instant Clusters supported Latent Health's growth from single-GPU experiments to training Qwen and Llama language models and Stella embedding models on large clinical datasets, with clusters provisioned in minutes and nodes interconnected via NVIDIA Quantum-2 InfiniBand.

HANDS-ON PARTNERSHIP

Together AI's team provided custom infrastructure setups, trial credits for testing, and hands-on engineering support, including late-night fixes ahead of critical training deadlines.

Results

Together AI's infrastructure enabled Latent Health to outperform leading commercial models on clinical QA while keeping training costs low:

7x cost reduction

NVIDIA H100 clusters at $14/hour vs. $98/hour for H100 instances on AWS

97% accuracy

On internal Latent Clinical QA benchmarks, vs. 75–80% for GPT-4 and o3

2–3x faster experimentation

Daily training runs enabled rapid iteration

Real-World Clinical Impact

These technical improvements translate directly into measurable outcomes across Latent Health's health system partners. MetroHealth achieved an 80% reduction in review time (from 25 minutes to just 5 minutes per prior authorization) alongside a 45% increase in submission capacity. Ochsner Health saw 75% faster review times and a 96% increase in monthly throughput per pharmacist, impacting over 20,000 specialty patients.

"Together Instant Clusters enabled us to build the intelligence backbone that's core to our business. Our partners will accept nothing less than state of the art so it is critical that we distill clinical nuance and the latest compendia into our models. Together AI's infrastructure gives us both the performance and experimentation freedom to make that happen." — Allan Bishop, Head of Engineering, Latent Health