
Summary
Yutori is an AI-native company building products that put AI agents to work in the browser, for everyone from prosumers to enterprises. Their product suite spans Scouts (an always-on web monitoring agent), Delegate (an AI chief of staff that manages tasks end-to-end across your apps), and a developer API for web monitoring, deep research, and natural language browser automation.
This is all supercharged by Navigator, Yutori’s proprietary browser-use model, trained with reinforcement learning on real websites and capable of anything a human knowledge worker can do in a browser.
About Yutori
Yutori is a true AI-native company: founded by AI researchers, run by AI agents internally, and building products that put AI agents to work for everyone else. The product suite spans two surfaces:
- For prosumers: Scouts, AI agents that monitor the web 24/7 and surface new information on anything users care about, and Delegate, an AI chief of staff that takes on tasks end-to-end — navigating websites, drafting replies, researching the web, and managing workflows across your apps (email, calendar, Slack, Linear, Notion, Granola) — so users can delegate tasks and move on with their day.
- For developers and enterprises: APIs for monitoring the web, deep research, and natural language browser automation (filling out forms, logging into portals to complete routine knowledge work, and extracting structured information from dynamic webpages).
All of the above are powered by Navigator, Yutori's proprietary browser-use model that predicts keyboard and mouse actions to interact with a browser, unlocking anything that a human knowledge worker can do. Navigator is trained using SFT and reinforcement learning on synthetic and real websites.
For AI-native companies like Yutori, inference isn’t just a cost center — it’s the product. Browser-use agents place unusual demands on inference infrastructure: short outputs, long and growing inputs, multimodal vision-language processing, and two very different latency profiles depending on whether a human is waiting or not. Together AI’s inference platform is purpose-built for exactly this kind of workload, handling both ends of that spectrum and giving Yutori the elastic scaling to grow without renegotiating capacity every time demand spikes.
The company operates on two levels: an AI research lab building the world’s best computer-use models for browser navigation and an AI agent product company. Scouts functions like Google Alerts for the AI era — agents monitor the web continuously and report back only when something new and relevant surfaces. Delegate goes further: it's a fully autonomous agent that users hand tasks off to entirely, from researching and drafting to navigating websites and managing workflows, without needing to stay in the loop at every step. The Navigator model powering both products is also available as a public API for developers building their own browser automation pipelines.
The challenge
For AI-native companies, inference infrastructure isn't a supporting concern — it's directly load-bearing. Yutori's agent architecture creates an inference profile that standard cloud platforms are not built to serve:
- Browser agents run in tight, continuous loops: Every step of a Navigator agent involves taking a browser screenshot, appending it to the full conversation history, and calling the model for the next action. Context grows with every turn. A single agent task can require dozens of model calls over 10–15 minutes, and the loop cannot tolerate unpredictable latency spikes; if a response stalls, the browser session times out.
- Two workloads with opposite performance requirements: Yutori serves two distinct use cases that pull infrastructure in different directions. Background agents like Scouts are asynchronous: no human is waiting, so throughput and cost efficiency matter most. Real-time agent feedback, where a user is sitting in front of a browser and needs to be unblocked quickly, is latency-sensitive. Optimizing for one without the other means leaving users underserved at one end of the product.
- Vision-language at scale requires purpose-built infrastructure: Navigator is a multimodal model that processes browser screenshots alongside growing text histories. Input lengths vary from low thousands to 100k tokens, while outputs are typically short tool calls representing UI actions and JavaScript code. That workload — long inputs, short outputs, high cache-reuse potential — is a fundamentally different serving problem than typical LLM deployments.
- Elastic scaling is a hard requirement, not a nice-to-have: As Yutori grows, traffic can spike suddenly and significantly. An inference provider that requires long lead times to provision capacity creates a ceiling on growth. Yutori needed the ability to scale up on a short horizon; without it, newly acquired users can't be served.
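The agentic loop described above can be sketched in a few lines. This is an illustrative Python sketch, not Yutori's actual code: `capture_screenshot` and `call_navigator` are hypothetical stand-ins (stubbed here) for the real browser and Navigator model APIs. The structure is the point — a screenshot is appended each turn, the model is called once per step, and the context only ever grows.

```python
import base64

# Hypothetical stand-in for the real browser API.
def capture_screenshot(step: int) -> str:
    """Stub: return a base64 'screenshot' for this step."""
    return base64.b64encode(f"screenshot-{step}".encode()).decode()

# Hypothetical stand-in for the Navigator model API.
def call_navigator(messages: list) -> dict:
    """Stub: the real model would predict the next keyboard/mouse action."""
    return {"action": "click", "selector": f"#step-{len(messages)}"}

def run_agent_task(task: str, max_steps: int = 5) -> list:
    # The conversation history grows monotonically: prior turns are
    # never rewritten, so every call shares its prefix with the last.
    messages = [{"role": "user", "content": task}]
    actions = []
    for step in range(max_steps):
        # Append the current browser state as an image turn.
        messages.append({
            "role": "user",
            "content": [{"type": "image", "data": capture_screenshot(step)}],
        })
        action = call_navigator(messages)  # one model call per step
        messages.append({"role": "assistant", "content": str(action)})
        actions.append(action)
    return actions

actions = run_agent_task("find the cheapest flight to SFO")
```

A real task runs this loop dozens of times over 10–15 minutes, which is why a single slow model call anywhere in the sequence can time out the whole browser session.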
The solution
Yutori runs on Together AI, the AI Native Cloud, purpose-built for the inference demands of companies that are AI native from the ground up. Together AI's inference platform serves as the backbone for Scouts, Delegate, and the Navigator public API:
- Optimized inference for a high-cache, long-input workload: Together AI's inference platform uses vLLM-based serving with prefix caching, well-matched to Yutori's agentic loop architecture where only the last few turns of context change between calls and prior history is untouched. That cache-hit efficiency translates directly into lower latency and cost per agent step, critical when a single user task can involve dozens of model calls.
- Inference tuned for both ends of the latency spectrum: Together AI worked with Yutori to optimize the inference stack for both workload types: maximizing throughput for background agents running asynchronously, and meeting the tighter latency bounds required when agents are operating in the foreground. The result is an inference platform that performs well across both product surfaces rather than forcing a trade-off.
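Why this loop architecture is so cache-friendly can be shown with a toy calculation. The token counts and turn sizes below are illustrative assumptions, not measurements: each step's prompt is simply the previous prompt plus one new turn, so nearly the entire prefix has already been processed and can be served from cache.

```python
def shared_prefix_len(a: list, b: list) -> int:
    """Number of leading tokens two prompts have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

# Toy agent run: each step's prompt extends the previous one by
# ~40 "tokens" (one screenshot turn plus one action turn).
prompts = []
tokens = list(range(200))  # initial task context
for step in range(4):
    tokens = tokens + [f"turn-{step}-{i}" for i in range(40)]
    prompts.append(list(tokens))

# With prefix caching, only the uncached suffix is recomputed.
for prev, cur in zip(prompts, prompts[1:]):
    cached = shared_prefix_len(prev, cur)
    print(f"prompt len {len(cur)}: cached prefix {cached}, "
          f"recomputed {len(cur) - cached}")
```

In this sketch the cached prefix covers all but the newest 40 tokens of each call, so per-step compute stays roughly flat even as the prompt grows — the property prefix caching exploits for Yutori's workload.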
"Working with Together AI has allowed Yutori to scale without failures. That's important." — Dhruv Batra, Co-founder and Chief Scientist, Yutori
- Elastic capacity on demand: When Yutori needs to double GPU capacity on short notice to handle a traffic surge, Together delivers it. That ability to scale without renegotiation means Yutori can serve new users immediately as they arrive, converting growth directly into revenue.
- Inference infrastructure that scales with the product surface: Together AI's platform supports the full Yutori production suite, serving two very different classes of inference demand. Where Scouts runs agents asynchronously in the background, Delegate handles open-ended, multi-step tasks that require sustained model calls across longer time horizons. Together's elastic capacity and reliable uptime mean Yutori can expand what their agents can do — and who they can do it for — without outgrowing their inference layer.
"Working in collaboration with Together AI, we've been able to bring Yutori's products not just to consumers using background agents, but also to developers who are hooking up to their workflows." — Dhruv Batra, Co-founder and Chief Scientist, Yutori
Results
Together AI's inference platform enables Yutori to run production browser-use agents at the performance levels their product requires. For an AI-native company where inference is the product, that reliability is foundational across both consumer and developer workloads.
- Frontier model performance at a fraction of the inference cost: Navigator outperforms the world's best frontier models, including Anthropic's Opus and OpenAI's GPT models, on browser-use tasks while running 2x faster per step and at 4–5x lower inference cost. Together AI's optimized inference serving helps Yutori deliver those advantages to developers and pass the cost savings on to users.
- Throughput-optimized serving for background agents: For Scouts' asynchronous monitoring agents, Together maximizes throughput within Yutori's latency bounds, serving as many parallel requests as possible without failures. Higher throughput directly reduces per-task inference cost, improving product economics as usage scales.
- Latency performance for real-time agent interactions: For real-time interactions where a user is waiting on an agent response, Together's infrastructure meets the tighter latency requirements without sacrificing reliability. An agent that misses its response window causes a browser timeout; Together's serving keeps that from happening.
- Elastic scaling without service disruption: Yutori can scale GPU capacity on a short horizon when demand spikes — without long provisioning lead times or capacity negotiations. That elasticity means traffic growth converts to served users rather than dropped requests.
- A foundation for an expanding production suite: Together AI's inference platform supports Scouts, Delegate, and the Navigator public API, a growing surface of agent products with distinct inference characteristics. As Yutori's products take on more complex, longer-horizon tasks, Together scales alongside them without requiring infrastructure renegotiation at every growth stage.
"One thing that is a bit underappreciated in this area is the degree to which performance optimization matters. There is a core expertise in serving the model — making sure it scales with variable requests, making sure infrastructure stays up. What was surprising to me was the degree of reliability and the performance impact of the various knobs that could be tuned." — Dhruv Batra, Co-founder and Chief Scientist, Yutori
