
Today, we’re excited to announce our $150M Series D, led by BOND, with Jay Simons joining our Board. We’re also thrilled to welcome Conviction and CapitalG to the round, alongside support from 01A, IVP, Spark, Greylock, Scribble Ventures, BoxGroup, and Premji Invest.
Our team started obsessing over artificial intelligence over 15 years ago (though those of us who were working on it called it ML). At the time, and in the years that followed, software engineers focused on what they saw as more exciting challenges in ecommerce, mobile, and crypto. Instead, we worked non-stop for weeks on factorization and ensemble models for the Netflix Challenge. I competed in way too many random Kaggle competitions. We were all over OpenAI's first product release in 2016, OpenAI Gym, building models that made imaginary carts drive up mountains.
We searched for high-impact applications for AI that we could build a company around: everything from predicting disease progression to fraud detection to generative art. We had an AI hammer, and everything looked like a nail if you squinted, but the reality was we had technology in search of a problem.
We searched for teams that wanted to deploy AI in production, at scale, but there simply weren't that many. From 2019 to 2022, very few teams were doing any large-scale model serving (what we now call inference), especially for generative models. Teams were focused on feature stores, interpretability, and experimentation. They didn't want inference. But we just couldn't shake our belief that one day, products would be built on large, powerful models. We had a hunch that one day, production-grade AI systems would be ubiquitous and require three things, and we aggressively built for that world:
- Fast models
- Interchangeable compute
- Flexible, open, and Pythonic runtimes
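To make "flexible, open, and Pythonic runtimes" concrete, here is a minimal sketch of the kind of serving contract we had in mind. Everything in it is hypothetical and illustrative: the `Model` class, the `load`/`predict` hooks, and the weights path are not any particular product's API.

```python
from typing import Any, Callable


class Model:
    """Hypothetical serving contract: load once, predict many times."""

    def __init__(self, weights_path: str) -> None:
        self.weights_path = weights_path  # hypothetical weights location
        self._model: Callable[[str], str] | None = None

    def load(self) -> None:
        # Heavy one-time work lives here. In a real runtime this would pull
        # weights onto whatever GPU or CPU the scheduler assigned, which is
        # what makes the compute interchangeable. Stubbed for illustration.
        self._model = lambda prompt: f"completion for: {prompt}"

    def predict(self, request: dict[str, Any]) -> dict[str, Any]:
        # Fast path: one request in, one response out.
        assert self._model is not None, "call load() before predict()"
        return {"output": self._model(request["prompt"])}


model = Model("s3://weights/my-model")  # hypothetical path
model.load()
print(model.predict({"prompt": "hello"}))
```

The point of this shape is that anything exposing a load step and a predict step can be hosted on any machine with the right drivers, which is how fast models and interchangeable compute stop being in tension.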
In 2022, the market showed up
OpenAI launched ChatGPT on November 30, 2022. It was like the world's AI lightbulb turned on all at once. Suddenly, the new standard for consumer expectations was set: the fastest models, great interfaces, and strong developer APIs were now table stakes for any future-facing products. All of a sudden, AI was all San Francisco engineers wanted to talk about and work on.
At the same time, Stable Diffusion and Whisper showed that open-source models could be powerful and in high demand. And in the last 18 months, open-source models have quickly approached the quality of closed-source ones. Labs claimed that anything non-frontier was a commodity, but excitement around launches like DeepSeek, Qwen, and Flux showed that open-source models are now a necessary part of the ecosystem. Developers want transparency and trust, control over runtimes, and direct control over the costs of running expensive workloads.
When developer enthusiasm combines with abundant capital and open-source resources, progress happens fast. More and more applications were built on models, and those models had to run somewhere.
Seemingly overnight, inference went from niche to mainstream.
Users don’t care what models you use
Today, the best AI app companies use a mix of open and closed models to deliver top consumer and enterprise experiences. There's little dogma about which type to use, only a focus on the best model for the problem you're solving for your users. Scaling AI products is hard, though!
To serve more users, developers will need to serve them cost-efficiently. To chase security-first enterprise contracts, developers will need flexibility in where models can be served from. And to get four or five nines of reliability, infrastructure will need to be isolated. The only way this is possible is by taking control: training your own models, or using post-trained or fine-tuned variants of the most powerful open-source models.
But running these models well is non-negotiable. When the models slow down, the agents built for support, programming, and design slow down with them, and real productivity is lost. When the models are down, healthcare workers can't use the AI superpowers they have come to rely on. Inference both powers and blocks end-user experiences, and our users come to us to run their models as fast and reliably as possible; there are few tradeoffs allowed.
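As an illustration of what "few tradeoffs allowed" means in practice, here is a minimal sketch of the fallback pattern many teams reach for when a model is slow or down. The backend functions are stand-ins, not any particular provider's API.

```python
from typing import Callable


# Stand-in backends. In practice these would call a closed-model provider's
# API and a self-hosted open-source deployment, respectively.
def closed_model(prompt: str) -> str:
    return f"[closed-model reply] {prompt}"


def open_model(prompt: str) -> str:
    return f"[open-model reply] {prompt}"


def generate(prompt: str, backends: list[Callable[[str], str]]) -> str:
    """Return the first successful reply, falling through on failure.

    Real systems would also enforce a per-backend latency budget (request
    timeouts) so a slow model degrades to a fallback instead of stalling
    the support agent, coding assistant, or clinician waiting on it.
    """
    last_err: Exception | None = None
    for backend in backends:
        try:
            return backend(prompt)
        except Exception as err:  # outage, rate limit, capacity error
            last_err = err
    raise RuntimeError("all model backends failed") from last_err


print(generate("summarize this ticket", [closed_model, open_model]))
```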