Papers
* denotes equal contribution
α-β denotes alphabetical order
External Links: Google Scholar. arXiv. GitHub.
On the 2025–26 academic job market, seeking tenure-track positions beginning Fall 2026.
I am broadly interested in reliable AI. My work develops the science of measuring AI capabilities and intervening on AI behavior, bridging theory, algorithms, and real-world impact. I develop principles and practices for reliable AI evaluation, including the external validity of key deep learning benchmarks (e.g., ImageNet), the internal validity of benchmarks for out-of-distribution generalization, and frameworks for valid evaluation of latent AI capabilities and traits. I also develop methods to understand and intervene on the mechanisms that determine AI behavior, such as causal versus spurious pathways. This work enables AI systems to generalize and adapt to environments that differ from their training data, keeping them reliable and safe in dynamic, real-world settings. Application areas include health and medicine, algorithmic fairness, and AI policy.
The Science of AI Measurement -- Validity and Reliability
Measurement to Meaning: A Validity-Centered Framework for AI Evaluation
Olawale Salaudeen*, Anka Reuel*, Ahmed Ahmed, Suhana Bedi, Zachary Robertson, Sudharsan Sundar, Ben Domingue, Angelina Wang, Sanmi Koyejo
Working Paper
[arXiv] [webpage]
ImageNot: A Contrast with ImageNet Preserves Model Rankings
Olawale Salaudeen, Moritz Hardt
In review, 2025
[arXiv] [code] [webpage]
Stop Evaluating AI with Human Tests, Develop Principled, AI-specific Tests instead
Tom Sühr, Florian E. Dorner, Olawale Salaudeen, Augustin Kelava, Samira Samadi
In review, 2025
[arXiv]
Understanding challenges to the interpretation of disaggregated evaluations of algorithmic fairness
Stephen R. Pfohl, Natalie Harris, Chirag Nagpal, David Madras, Vishwali Mhasawade, Olawale Salaudeen, Awa Dieng, Shannon Sequeira, Santiago Arciniegas, Lillian Sung, Nnamdi Ezeanochie, Heather Cole-Lewis, Katherine Heller, Sanmi Koyejo, Alexander D'Amour
NeurIPS 2025 (to appear)
[arXiv]
Toward an Evaluation Science for Generative AI Systems
Laura Weidinger, Inioluwa Deborah Raji, Hanna Wallach, Margaret Mitchell, Angelina Wang, Olawale Salaudeen, Rishi Bommasani, Deep Ganguli, Sanmi Koyejo, William Isaac
The Bridge 2025, National Academy of Engineering
[arXiv]
Understanding Subgroup Performance Differences of Fair Predictors using Causal Models
Stephen Robert Pfohl, Natalie Harris, Chirag Nagpal, David Madras, Vishwali Mhasawade, Olawale Salaudeen, Katherine A Heller, Sanmi Koyejo, Alexander Nicholas D'Amour
In NeurIPS 2023 Workshop on Distribution Shifts: New Frontiers with Foundation Models
[paper]
Addressing Observational Biases in Algorithmic Fairness Assessments
Chirag Nagpal, Olawale Salaudeen, Sanmi Koyejo, Stephen Pfohl
NeurIPS 2022 AFCP Workshop (extended abstract)
[poster]
The Science of Intervening on AI Behavior Mechanisms -- Generalization and Adaptation to New Environments
Aggregation Hides OOD Generalization Failures from Spurious Correlations
Olawale Salaudeen, Haoran Zhang, Kumail Alhamoud, Sara Beery, Marzyeh Ghassemi
NeurIPS 2025 (to appear); Spotlight
Are Domain Generalization Benchmarks with Accuracy on the Line Misspecified?
Olawale Salaudeen, Nicole Chiou, Shiny Weng, Sanmi Koyejo
In TMLR 2025
[arXiv] [code] [webpage] [news]
On Domain Generalization Datasets as Proxy Benchmarks for Causal Representation Learning
Olawale Salaudeen, Nicole Chiou, Sanmi Koyejo
In NeurIPS 2024 Causal Representation Learning Workshop (Oral)
[paper]
Causally Inspired Regularization Enables Domain General Representations
Olawale Salaudeen, Oluwasanmi Koyejo
In AISTATS 2024
[arXiv] [code] [webpage]
Proxy Methods for Domain Generalization
Katherine Tsai, Stephen R. Pfohl, Olawale Salaudeen, Nicole Chiou, Matt J. Kusner, Alexander D'Amour, Sanmi Koyejo, Arthur Gretton
In AISTATS 2024
[arXiv] [code]
Adapting to Latent Subgroup Shifts via Concepts and Proxies
α–β. Ibrahim Alabdulmohsin*, Nicole Chiou*, Alexander D'Amour*, Arthur Gretton*, Sanmi Koyejo*, Matt J. Kusner*, Stephen R. Pfohl*, Olawale Salaudeen*, Jessica Schrouff*, Katherine Tsai*
In AISTATS 2023
[arXiv] [code] [webpage]
Adapting to Shifts in Latent Confounders using Observed Concepts and Proxies
Matt J. Kusner, Ibrahim Alabdulmohsin, Stephen Pfohl, Olawale Salaudeen, Arthur Gretton, Sanmi Koyejo, Jessica Schrouff, Alexander D’Amour
In ICML 2022 PODS Workshop
[paper]
Exploiting Causal Chains for Domain Generalization
Olawale Salaudeen, Oluwasanmi Koyejo
In NeurIPS 2021 DistShift Workshop
[paper]
AI for Heterogeneous Populations
On Group Sufficiency Under Label Bias
Haoran Zhang, Olawale Salaudeen, Marzyeh Ghassemi
NeurIPS 2025 (to appear)
What’s in a Query: Polarity-Aware Distribution-Based Fair Ranking
Aparna Balagopalan, Kai Wang, Olawale Salaudeen, Asia Biega, Marzyeh Ghassemi
In WWW 2025
[arXiv] [code]
Applications in Neuroscience and Neuroimaging
Enhancing fMRI Motion Denoising with ICA-AROMA and Causal Discovery
Olawale Salaudeen, Paul Camacho, Aron Barbey, Brad Sutton, Sanmi Koyejo
In review
[code]
Ultra-fast 3D fMRI to Explore Cardiac-Induced Fluctuations in BOLD-Based Functional Imaging
Brad Sutton, Aaron Anderson, Benjamin Zimmerman, Paul Camacho, Riwei Jin, Charles Marchini, Olawale Salaudeen, Natalie Ramsy, Davide Boido, Serge Charpak, Andrew Webb, Luisa Ciobanu
International Society for Magnetic Resonance in Medicine (ISMRM) 2022 (abstract)
[link]