Measurement to Meaning: A Validity-Centered Framework for AI Evaluation

Olawale Salaudeen, Anka Reuel, Ahmed Ahmed, Suhana Bedi, Zachary Robertson, Sudharsan Sundar, Ben Domingue, Angelina Wang, Sanmi Koyejo

In review. Preliminary version accepted at the NeurIPS 2025 Workshop on Evaluating the Evolving LLM Lifecycle: Benchmarks, Emergent Abilities, and Scaling.