The widespread deployment of AI systems in critical domains demands more rigorous approaches to evaluating their capabilities and safety.
Current evaluation practices rely largely on static benchmarks, which face fundamental challenges in efficiency, reliability, and real-world relevance. This talk presented a path toward a measurement framework that bridges established psychometric principles with modern AI evaluation needs. The speakers demonstrated how techniques from Item Response Theory, amortized computation, and predictability analysis can substantially improve the rigor and efficiency of AI evaluation. Through case studies in safety assessment and capability measurement, they showed how this approach enables more reliable, scalable, and meaningful evaluation of AI systems. This work points toward a broader vision: evolving AI evaluation from a collection of benchmarks into a rigorous measurement science that can effectively guide research, deployment, and policy decisions.
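As a rough illustration of the Item Response Theory techniques mentioned above (this sketch is not from the talk; the two-parameter logistic model shown here is a standard IRT formulation, and the item parameters and ability values are hypothetical), the following minimal Python example shows how item difficulty and discrimination relate a model's latent ability to its probability of answering a benchmark item correctly.

import numpy as np

def p_correct(theta, a, b):
    # 2PL item response function: probability that a model with latent
    # ability `theta` answers an item with discrimination `a` and
    # difficulty `b` correctly.
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# Hypothetical benchmark items as (discrimination, difficulty) pairs.
items = [(1.2, -0.5), (0.8, 0.0), (1.5, 1.0)]

# Hypothetical latent abilities for two models under comparison.
for theta in (-0.2, 0.9):
    probs = [round(p_correct(theta, a, b), 2) for a, b in items]
    print(f"ability={theta:+.1f}  item success probabilities: {probs}")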
This seminar was recorded on March 19, 2025, at Stanford University.