

Scorecard, an AI agent evaluation platform, announced today that it has raised $3.75 million in seed funding from Kindred Ventures, Neo, Inception Studio, and Tekton Ventures, along with angel investors from OpenAI, Apple, Waymo, Uber, Perplexity, Meta, and others. Built by a team with deep experience in autonomous vehicle simulation at Waymo and Uber, Scorecard enables developers to run tens of thousands of evaluation tests daily in virtual environments, dramatically accelerating AI agent development and deployment. The company has already run millions of tests for customers including Thomson Reuters, which uses Scorecard to test and deploy CoCounsel, its suite of legal AI agents.
The Testing Crisis in AI Agent Development
In the coming years, millions of AI agents will be deployed across legaltech, fintech, healthtech, and insurtech, but testing remains slow and error-prone. Manual evaluation often requires writing custom scripts, curating datasets, and exporting results, processes that can take days or weeks while introducing human error. This sluggish approach delays feature rollouts, obscures blind spots in AI behavior, and creates risks to compliance, security, and user trust. Without fast, repeatable validation, teams struggle to ship innovations confidently or respond quickly to production issues.
Scorecard’s Solution: A Programmable Platform for Rapid Iteration
Scorecard provides a fully managed evaluation engine that lets teams define test suites in minutes using a no-code UI or Python/TypeScript SDKs. Users can script end-to-end scenarios, from conversational prompts and compliance checks to performance benchmarks, and execute tens of thousands of tests per day against live or staged AI agents. All results feed into an interactive dashboard with real-time metrics, failure reports, and trend analysis, making it easy to spot regressions, diagnose edge-case errors, and measure improvements over time.
By integrating directly into existing CI/CD pipelines, Scorecard automates continuous validation. Every code commit, model update, or configuration change triggers a fresh round of rigorous testing without human intervention, reducing months of manual testing to seconds.
The platform has already found strong traction with enterprise customers. “At Thomson Reuters, the reliability and effectiveness of CoCounsel Core, our professional-grade legal AI assistant, are paramount,” said Tyler Alexander, Director of AI Reliability at Thomson Reuters. “Scorecard enables us to scale our continuous monitoring efforts and make them vastly more efficient.”
“At Waymo, we saw firsthand how millions of simulations can make the difference between a working prototype and a category-defining product that changes the world,” said Darius Emrani, founder and CEO of Scorecard. “With Scorecard, we’re giving companies the tools to iterate on AI agents at unprecedented velocity.”
The Scorecard founding team brings deep expertise from building testing and simulation systems at scale. CEO and founder Darius Emrani spent many years at Waymo leading the product teams that built its simulation and evaluation technology. There he helped grow the simulation and evaluation team from a few dozen engineers to over 200, partnering with engineering leadership to design and scale infrastructure that ran millions of scenario evaluations daily. This work directly powered the world’s first commercial autonomous ride-hail service, which operates across multiple states and has completed tens of millions of rides. Before Waymo, Emrani led simulation at Uber’s Advanced Technologies Group, where he built the evaluation frameworks used to stress-test self-driving systems at scale.
“With millions of AI agents set to be deployed over the coming years in regulated industries like legal, finance, and healthcare, trust in AI isn’t optional; it’s mission-critical,” explained Steve Jang, General Partner at Kindred Ventures. “The Scorecard team’s experience testing and deploying self-driving car systems at Waymo and Uber uniquely equips them to build robust and reliable evals for agents in virtual or physical environments.”