
The Frontier is Open

June 9, 2025

By Charles Zedlewski

Peter Thiel once said that good startups are built around secrets: ideas that only a few people see at first but that become obvious later. It’s no secret that AI models are the foundation of our next application platform; it’s arguably the most seismic platform shift since the internet roughly 30 years ago. What is still a secret to most in 2025 is that this new AI application platform is quickly becoming a ubiquitous open source commodity.

Openness and open source have been the inevitable direction of nearly every new platform dating back to the internet browser. From browsers to operating systems, databases to containers, users have converged on open source as the “default” path for usage and innovation. With AI, this shift from proprietary to open source is progressing even faster. AI models are built on open research and trained on public data; it’s odd that many of the best known AI models today are proprietary derivations of such open ingredients.

Open source platforms are known for greater flexibility and lower cost. These qualities matter more for AI applications than they did for past platforms.

The cost advantages of open source in AI are considerable for two reasons:

  1. AI models expose similar interfaces but are subtly incompatible in their respective strengths and biases - with open source you can stay invested in a model while retaining portability and accountability from infrastructure providers
  2. The compute that powers AI is still quite expensive, and open source has led the way in innovating on the efficient use of this precious resource

The flexibility of open weights and open source licenses means you can:

  1. Adapt models to your use case in ways only open weights allow: for example, quantize them to fit your quality/performance requirements
  2. Own the final artifact so you can run your AI where it needs to be run whether that be a cloud, a device, a robot or an airgapped datacenter
  3. Distill within and across model families without fear of violating proprietary license terms
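The quantization point above can be made concrete with a toy sketch: map open float32 weights to int8 with a per-tensor scale, then dequantize. Production toolchains use finer-grained, calibrated schemes, so the matrix size, scale choice, and error bound here are illustrative assumptions, not a real model's recipe.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: one scale for the whole tensor."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 weights from the int8 codes."""
    return q.astype(np.float32) * scale

# Stand-in for one layer's weight matrix (shape is arbitrary, not any real model's).
rng = np.random.default_rng(0)
w = rng.normal(0, 0.02, size=(4096, 4096)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print(f"storage: {w.nbytes // 2**20} MiB -> {q.nbytes // 2**20} MiB")  # 4x smaller
print(f"max abs error: {np.abs(w - w_hat).max():.6f}")                 # bounded by scale/2
```

Because you hold the weights, the quality/size trade-off (8-bit here, 4-bit or mixed precision elsewhere) is yours to tune per deployment target.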

Not only is open source AI more flexible and less costly than proprietary alternatives, it’s also increasingly where we find the frontier for AI.

Because AI is a relatively new and immature technology, we’re understandably compelled to run towards the frontier to reduce our risk of failure. Proprietary AI developers want to believe the frontier can only progress from closed labs working around a magic cauldron sitting on a pyre of GPUs. But building a future on proprietary AI is betting that success is best secured by the sum of the smart people working at just one lab. As Bill Joy once said: “no matter who you are, most of the smartest people work for someone else.”

Today open research is producing open models that occupy different points on the current AI frontier. Open source models are claiming more “firsts”:

  • Llama 3 was a breakthrough in AI affordability. It was the first source of quality, low cost intelligence and brought down industry API pricing by 80%.
  • DeepSeek R1 was a breakthrough in the use of a mixture-of-experts architecture to make very large gains in pretraining and inference efficiency.
  • Qwen3 was the first foundation model designed for “hybrid reasoning”.
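The mixture-of-experts idea behind that efficiency gain fits in a few lines: a learned router activates only the top-k experts per token, so most parameters sit idle on any forward pass. The sketch below uses toy shapes, expert count, and k chosen for illustration, not any real model's configuration.

```python
import numpy as np

rng = np.random.default_rng(42)

d_model, n_experts, top_k = 64, 8, 2
# Each "expert" is just a weight matrix here; real experts are full FFN blocks.
experts = [rng.normal(0, 0.02, size=(d_model, d_model)) for _ in range(n_experts)]
router_w = rng.normal(0, 0.02, size=(d_model, n_experts))

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route one token vector through its top-k experts only."""
    logits = x @ router_w                 # router score per expert, shape (n_experts,)
    top = np.argsort(logits)[-top_k:]     # indices of the k highest-scoring experts
    gates = np.exp(logits[top])
    gates = gates / gates.sum()           # softmax over the chosen experts only
    # Only top_k of n_experts matmuls execute: ~top_k/n_experts of the dense FLOPs.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

x = rng.normal(size=d_model)
y = moe_forward(x)
print(y.shape)
```

The parameter count scales with `n_experts` while per-token compute scales with `top_k`, which is the lever behind the pretraining and inference efficiency gains.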

And open source models already hold a number of the superlatives in the ecosystem of open and closed models. Today:

  • The largest range of model sizes is in open source, enabling developers to find the best fit for their use case
  • The highest quality instruct model is open source, meaning the best go-to model for fast, consistent intelligence for agent building is open source
  • The most parameter efficient SOTA quality coding model is open source - making it possible to have powerful AI coding assistance local to a laptop
  • The best source for new model distillation is a mixture of open source models

Proprietary models continue to occupy the frontier in several dimensions such as reasoning and multimodality, but these gaps have been shrinking over time. It would be naive to think open source won’t someday have SOTA frontier achievements in these areas as well.

Open source is the way forward for AI as a platform — but for developers to benefit from this trend, open source AI needs a new kind of cloud. This is why Together AI was founded by Vipul, Ce, Tri, Percy and Chris:

  1. Open out-innovates thanks to community diversity, but diverse also means variable. There are nearly 2 million open source models on Hugging Face, but each has unique configurations, dependencies and infrastructure needs. Together AI is a cloud platform that normalizes this variability so open source AI can run in production.
  2. Open source models are more adaptable to your use case, but the skill required for that adaptation can be excessive. Together AI is a cloud platform that provides powerful, usable abstractions so engineers—not just researchers—can fit AI models to their use case.
  3. Open source models are architected assuming AI-native infrastructure which is evolving almost as fast as the models. Together AI is a cloud platform that’s built to quickly adopt the newest AI native compute, networking & storage architectures to absorb the next wave of open source innovation. 

Together AI was built to serve as the bridge between developers, researchers and the open source AI frontier. The signs of the founders’ ambition are everywhere. Together AI has the best roster of researchers of any open AI platform company today because it takes researchers to translate frontier research into a production platform whose results can match or exceed the benchmarks in papers. Together AI has built the most comprehensive range of pretraining & posttraining services so users can start at any place in the AI development lifecycle and grow within one coherent system. Only 3 years old, the company operates an extensive fleet of AI native infrastructure in more than a dozen datacenters because a large global compute footprint is table stakes to power open source AI.

Developers and customers are starting to take note. Open source AI has attracted a fast-growing and increasingly diverse array of companies to Together AI’s cloud: foundation model labs like Cohere, Cartesia and Nous Research who build on open research; AI native agent companies like Cognition and SmarterDx; SaaS leaders like Salesforce and Zoom; and enterprises like SK Telecom who build on open models. As Together AI keeps improving the platform’s functionality, performance and predictability, the move to open source AI will only accelerate. Adding new and better services to develop against the platform will compound this trend.

As an AI cloud for developers, Together AI was the perfect home for me; here I could apply the lessons I was fortunate to learn building the open source category leader for big data (Cloudera) and the open source category leader for durable execution (Temporal). I’m glad to do this with founders like Vipul and Ce who have an expansive and compelling vision for where Together AI can go but also have the humility and curiosity to leave room for others to add to that vision. And of course, I’m hiring. My DMs are open for interested product managers, doc writers, program managers, developer advocates and more: Together needs you. It’s never the easiest path, but building on the frontier is almost always the most fun.


Q: Should I use the RedPajama-V2 Dataset out of the box?

RedPajama-V2 is conceived as a pool of data that serves as a foundation for creating high quality datasets. It is therefore not intended to be used out of the box; depending on the application, data should be filtered using the quality signals that accompany it. With this dataset, we take the view that the optimal filtering of data depends on the intended use. Our goal is to provide all the signals and tooling that enable this.
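That workflow can be sketched as a simple filter: stream documents together with their quality signals and keep only those passing your thresholds. The signal names below (`rps_doc_word_count`, `ccnet_perplexity`) and the cutoff values are assumptions chosen for illustration; consult the RedPajama-V2 dataset card for the actual signal set and pick thresholds suited to your application.

```python
def keep(doc: dict) -> bool:
    """Illustrative quality filter; signal names and thresholds are assumptions."""
    sig = doc["quality_signals"]
    return (
        sig["rps_doc_word_count"] >= 50       # drop very short documents
        and sig["ccnet_perplexity"] <= 500.0  # drop high-perplexity (noisy) text
    )

# Tiny hand-made stand-ins for streamed documents with attached signals.
docs = [
    {"text": "too short", "quality_signals": {"rps_doc_word_count": 3, "ccnet_perplexity": 90.0}},
    {"text": "a long, clean document ...", "quality_signals": {"rps_doc_word_count": 800, "ccnet_perplexity": 120.0}},
    {"text": "boilerplate junk ...", "quality_signals": {"rps_doc_word_count": 200, "ccnet_perplexity": 2400.0}},
]

filtered = [d for d in docs if keep(d)]
print(len(filtered))  # only the long, clean document survives
```

Because the signals ship alongside the raw text, tightening or loosening these cutoffs requires no reprocessing of the corpus, only a cheaper second filtering pass.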
