When Standard Inference Frameworks Failed, Together AI Enabled a 5x Performance Breakthrough

92% accuracy on ScreenSpot v1 vs OpenAI's 18.3%
30% cost savings vs hyperscalers
Under 24-hour model deployment

Executive Summary

When Vercept needed to deploy custom computer vision models that existing inference frameworks couldn't support, only Together AI's research-grade infrastructure made their breakthrough possible.

The result: roughly 5x OpenAI's accuracy on UI element identification (92% vs 18.3% on ScreenSpot v1), and a new category of AI applications that major cloud providers couldn't enable.

About Vercept

Vercept builds AI that sees and acts on computers the way humans do. Founded by AI researchers from the Allen Institute for AI, the company created Vy, a Mac application that lets users control their computers through natural-language commands.

The nine-person team has attracted enterprise customers across finance, software development, and content creation industries. Their technology serves users ranging from individuals with disabilities using speech-to-text systems to businesses automating complex multi-step workflows across different applications.

The Infrastructure Challenge

Computer automation creates unique infrastructure demands that traditional cloud providers cannot meet:

Extreme Latency Requirements

Vy tasks can require 200+ individual screen interactions. "If there's a 0.005 second delay, because I'm doing 100 of those for doing a single task, it's not just slower trajectory; it also results in mistakes," explains CEO Kiana Ehsani. Millisecond-scale delays compound into workflow failures.
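The compounding Ehsani describes starts as simple arithmetic. The minimal sketch below assumes interactions run strictly one after another and that each pays the same fixed delay; the function name `added_latency_seconds` is hypothetical and only illustrates the calculation.

```python
# Back-of-the-envelope sketch: how a fixed per-interaction delay adds up
# over a multi-step task. Assumes interactions run sequentially (no overlap)
# and that every interaction pays the same delay.

def added_latency_seconds(per_step_delay_s: float, num_steps: int) -> float:
    """Total extra wall-clock time contributed by a fixed per-step delay."""
    return per_step_delay_s * num_steps


if __name__ == "__main__":
    # 5 ms per interaction, for the 100- and 200-interaction cases mentioned above.
    for steps in (100, 200):
        total = added_latency_seconds(0.005, steps)
        print(f"{steps} interactions x 5 ms = {total:.2f} s of added latency")
```

A 5 ms delay therefore adds about half a second to a 100-step task and about a full second to a 200-step one. The sketch models only the added time, not the mistakes Ehsani says the delays also cause.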

Near-Perfect Accuracy Standards

Computer automation demands 99.999% accuracy because a single clicking mistake cascades through the entire workflow. This precision level exceeds what existing computer vision models could deliver.
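Why per-step accuracy has to be so high becomes clearer with a simple probability model. This is an illustrative sketch, not Vercept's evaluation method: it assumes every step succeeds or fails independently and that a single miss fails the whole task (no error recovery). The function name `task_success_probability` is hypothetical.

```python
# Illustrative model: probability that an N-step task completes with no
# mistakes. Assumes steps fail independently and that one miss fails the
# whole task (no error recovery).

def task_success_probability(per_step_accuracy: float, num_steps: int) -> float:
    """Probability that all num_steps steps succeed."""
    return per_step_accuracy ** num_steps


if __name__ == "__main__":
    for acc in (0.99, 0.999, 0.99999):
        p = task_success_probability(acc, 200)
        print(f"per-step {acc:.5f} -> 200-step task success {p:.3f}")
```

Under these assumptions, a model that is right 99% of the time per click completes a 200-step task only about 13% of the time. At 99.9% the figure rises to about 82%, and at 99.999% it reaches about 99.8%.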

Unpredictable Scaling Demands

Usage can triple overnight without warning, requiring infrastructure that scales instantly without manual intervention or advance capacity planning.

Rapid Deployment Cycles

Continuous model experimentation requires deployment cycles measured in hours, not days, to maintain a competitive advantage.

Custom Model Support

Traditional inference frameworks cannot support the specialized computer vision architectures required for real-time screen interaction. Unlike language models, which predict the next token, computer automation models must understand the outcome of each interaction, recover from errors, and adapt to a changing on-screen environment.

The Together AI Solution

Vercept evaluated multiple cloud providers but chose Together AI for three critical differentiators:

Technical Maturity: Together's inference platform supports custom computer vision models that standard frameworks cannot handle. Together's engineering team held multiple technical sessions with Vercept to adapt its infrastructure to Vercept's specialized requirements.

Operational Flexibility: Together provides auto-scaling that monitors response latency and queue depth, enabling instant scaling without pre-provisioning. Multi-region deployment across the US ensures consistent performance for distributed users.
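To make latency- and queue-driven auto-scaling concrete, here is a minimal sketch of how such a policy could pick a replica count. It is not Together's implementation, and the class names, thresholds, and replica limits are all hypothetical. The scaling rule borrows the ratio formula used by the Kubernetes Horizontal Pod Autoscaler (desired = ceil(current x observed / target)), taking the larger of the two signals.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingSignals:
    """Hypothetical metrics an autoscaler might observe."""
    p95_latency_ms: float  # tail response latency across current replicas
    queue_depth: int       # requests waiting for a free replica


@dataclass
class ScalingPolicy:
    """Hypothetical policy; all targets and limits are illustrative."""
    target_latency_ms: float = 200.0
    target_queue_per_replica: int = 4
    min_replicas: int = 1
    max_replicas: int = 64

    def desired_replicas(self, current: int, s: ScalingSignals) -> int:
        # Ratio of each observed signal to its target (>1 means overloaded).
        latency_ratio = s.p95_latency_ms / self.target_latency_ms
        queue_ratio = s.queue_depth / (self.target_queue_per_replica * max(current, 1))
        # Follow whichever signal demands more capacity.
        ratio = max(latency_ratio, queue_ratio)
        desired = math.ceil(current * ratio)
        # Clamp to the configured bounds.
        return max(self.min_replicas, min(self.max_replicas, desired))


if __name__ == "__main__":
    policy = ScalingPolicy()
    # Quiet period: latency under target, short queue -> scale down.
    print(policy.desired_replicas(8, ScalingSignals(p95_latency_ms=150.0, queue_depth=10)))
    # Sudden surge: queue is 3x its per-replica budget -> triple capacity.
    print(policy.desired_replicas(8, ScalingSignals(p95_latency_ms=450.0, queue_depth=96)))
```

In this example, 8 replicas shrink to 6 when traffic is light and grow to 24 when the queue triples. Production autoscalers typically add stabilization windows or cooldowns on top of a rule like this so that replica counts don't oscillate.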

Cost Efficiency: Together delivered 30% cost savings compared to hyperscaler alternatives while providing superior technical support and customization capabilities.

"Together has helped us deploy VyUI, our state-of-the-art computer AI model," explains co-founder Luca Weihs. "We had multiple in-depth meetings where we brainstormed how we could satisfy our model's custom technical requirements while still leveraging Together's infrastructure for efficient, load-balanced inference."

The deployment solution enables Vercept to push model updates to production in under 24 hours, which is critical for the rapid experimentation cycles that drive its performance advantage.

Business Impact

Performance Results: VyUI achieved 92% accuracy on ScreenSpot v1 (UI element identification), compared to OpenAI's 18.3%. This advantage translates directly into operational reliability: fewer failed automations, less error-recovery overhead, and higher user productivity. VyUI also leads on ScreenSpot v2 (94.7% vs 87.9%) and GroundUI Web (84.8% vs 82.3%), though by smaller margins.
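The headline 5x figure comes from ScreenSpot v1. The short calculation below uses only the scores quoted above and shows each benchmark's absolute gap in percentage points alongside the relative ratio. The `compare` helper is a hypothetical name used for illustration.

```python
# Scores quoted above, in percent accuracy: (VyUI, OpenAI).
BENCHMARKS = {
    "ScreenSpot v1": (92.0, 18.3),
    "ScreenSpot v2": (94.7, 87.9),
    "GroundUI Web": (84.8, 82.3),
}


def compare(vy: float, openai: float) -> tuple[float, float]:
    """Return (absolute gap in percentage points, relative ratio vy/openai)."""
    return vy - openai, vy / openai


if __name__ == "__main__":
    for name, (vy, oa) in BENCHMARKS.items():
        gap, ratio = compare(vy, oa)
        print(f"{name}: +{gap:.1f} pts, {ratio:.2f}x")
```

ScreenSpot v1 yields a ratio of about 5.03x (a 73.7-point gap). The ScreenSpot v2 and GroundUI Web ratios are about 1.08x and 1.03x (6.8 and 2.5 points).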

Scaling Validation: Together's infrastructure proved critical during an unexpected usage surge that tripled Vercept's user base overnight. The platform scaled automatically, with no advance notice, and maintained performance throughout the demand spike.

Operational Efficiency: The sub-24-hour deployment cycle let the team run frequent benchmark experiments that surfaced incremental improvements of 1-3%. As Kiana Ehsani explains, "Because we could benchmark faster, we could iterate on different training paradigms easier. And because of that, we could make more informed decisions."

Enterprise Applications: Software developers automate coding tasks and environment configuration. Finance teams streamline the reconciliation of invoices from platforms such as Amazon and Facebook Ads. Content creators accelerate video production workflows. On the accessibility side, individuals with disabilities use speech-to-text integration to control their computers.

"Together has been very competitive on price and creative in their willingness to meet our term-length needs. Compared to hyper-scalers and other large cloud providers, there is no comparison on price: we easily save ~30% on costs for similar terms." (Luca Weihs, Co-founder, Vercept)
