HAI Seminar with Sanmi Koyejo: Beyond Benchmarks – Building a Science of AI Measurement
Apr 8, 2025
The widespread deployment of AI systems in critical domains demands more rigorous approaches to evaluating their capabilities and safety. Current evaluation practices rely on static benchmarks, but these methods face fundamental challenges in efficiency, reliability, and real-world relevance. This talk presented a path toward a measurement framework that bridges established psychometric principles with modern AI evaluation needs. The speaker demonstrated how techniques from Item Response Theory, amortized computation, and predictability analysis can substantially improve the rigor and efficiency of AI evaluation, and showed through case studies in safety assessment and capability measurement how this approach can enable more reliable, scalable, and meaningful evaluation of AI systems. This work points toward a broader vision: evolving AI evaluation from a collection of benchmarks into a rigorous measurement science that can effectively guide research, deployment, and policy decisions. This seminar was recorded on March 19, 2025 at Stanford University.


Stanford HAI