Papers
* denotes equal contribution
α-β denotes alphabetical order
External Links: Google Scholar. arXiv. GitHub.
On the 2025–26 academic job market, seeking tenure-track positions beginning Fall 2026.
I am broadly interested in reliable AI. My work develops the science of measuring AI capabilities and intervening on AI behavior, bridging theory, algorithms, and real-world impact. I develop principles and practices for reliable AI evaluation, including the external validity of key deep learning benchmarks (e.g., ImageNet), the internal validity of benchmarks for out-of-distribution generalization, and frameworks for valid evaluation of latent AI capabilities and traits. I also develop methods to understand and intervene on the mechanisms that determine AI behavior, such as causal versus spurious pathways. This work enables AI systems to generalize and adapt to environments that differ from their training data, keeping them reliable and safe in dynamic, real-world settings. Application areas include health and medicine, algorithmic fairness, and AI policy.
The Science of AI Measurement -- Validity and Reliability
Measurement to Meaning: A Validity-Centered Framework for AI Evaluation
Olawale Salaudeen*, Anka Reuel*, Ahmed Ahmed, Suhana Bedi, Zachary Robertson, Sudharsan Sundar, Ben Domingue, Angelina Wang, Sanmi Koyejo
Working Paper
[arXiv] [webpage]
ImageNot: A Contrast with ImageNet Preserves Model Rankings
Olawale Salaudeen, Moritz Hardt
In review, 2025
[arXiv] [code] [webpage]
Stop Evaluating AI with Human Tests, Develop Principled, AI-specific Tests instead
Tom Sühr, Florian E. Dorner, Olawale Salaudeen, Augustin Kelava, Samira Samadi
In review, 2025
[arXiv]
Understanding challenges to the interpretation of disaggregated evaluations of algorithmic fairness
Stephen R. Pfohl, Natalie Harris, Chirag Nagpal, David Madras, Vishwali Mhasawade, Olawale Salaudeen, Awa Dieng, Shannon Sequeira, Santiago Arciniegas, Lillian Sung, Nnamdi Ezeanochie, Heather Cole-Lewis, Katherine Heller, Sanmi Koyejo, Alexander D'Amour
NeurIPS 2025 (to appear)
[arXiv]
Toward an Evaluation Science for Generative AI Systems
Laura Weidinger, Inioluwa Deborah Raji, Hanna Wallach, Margaret Mitchell, Angelina Wang, Olawale Salaudeen, Rishi Bommasani, Deep Ganguli, Sanmi Koyejo, William Isaac
The Bridge 2025, National Academy of Engineering
[arXiv]
Understanding Subgroup Performance Differences of Fair Predictors using Causal Models
Stephen Robert Pfohl, Natalie Harris, Chirag Nagpal, David Madras, Vishwali Mhasawade, Olawale Salaudeen, Katherine A Heller, Sanmi Koyejo, Alexander Nicholas D'Amour
In NeurIPS 2023 Workshop on Distribution Shifts: New Frontiers with Foundation Models
[paper]
Addressing Observational Biases in Algorithmic Fairness Assessments
Chirag Nagpal, Olawale Salaudeen, Sanmi Koyejo, Stephen Pfohl
NeurIPS 2022 AFCP Workshop (extended abstract)
[poster]
The Science of Intervening on AI Behavior Mechanisms -- Generalization and Adaptation to New Environments
Aggregation Hides OOD Generalization Failures from Spurious Correlations
Olawale Salaudeen, Haoran Zhang, Kumail Alhamoud, Sara Beery, Marzyeh Ghassemi
NeurIPS 2025 (to appear); Spotlight
Are Domain Generalization Benchmarks with Accuracy on the Line Misspecified?
Olawale Salaudeen, Nicole Chiou, Shiny Weng, Sanmi Koyejo
In TMLR 2025
[arXiv] [code] [webpage] [news]
On Domain Generalization Datasets as Proxy Benchmarks for Causal Representation Learning
Olawale Salaudeen, Nicole Chiou, Sanmi Koyejo
In NeurIPS 2024 Causal Representation Learning Workshop (Oral)
[paper]
Causally Inspired Regularization Enables Domain General Representations
Olawale Salaudeen, Oluwasanmi Koyejo
In AISTATS 2024
[arXiv] [code] [webpage]
Proxy Methods for Domain Generalization
Katherine Tsai, Stephen R. Pfohl, Olawale Salaudeen, Nicole Chiou, Matt J. Kusner, Alexander D'Amour, Sanmi Koyejo, Arthur Gretton
In AISTATS 2024
[arXiv] [code]
Adapting to Latent Subgroup Shifts via Concepts and Proxies
α–β. Ibrahim Alabdulmohsin*, Nicole Chiou*, Alexander D'Amour*, Arthur Gretton*, Sanmi Koyejo*, Matt J. Kusner*, Stephen R. Pfohl*, Olawale Salaudeen*, Jessica Schrouff*, Katherine Tsai*
In AISTATS 2023
[arXiv] [code] [webpage]
Adapting to Shifts in Latent Confounders using Observed Concepts and Proxies
Matt J. Kusner, Ibrahim Alabdulmohsin, Stephen Pfohl, Olawale Salaudeen, Arthur Gretton, Sanmi Koyejo, Jessica Schrouff, Alexander D’Amour
In ICML 2022 PODS Workshop
[paper]
Exploiting Causal Chains for Domain Generalization
Olawale Salaudeen, Oluwasanmi Koyejo
In NeurIPS 2021 DistShift Workshop
[paper]
AI for Heterogeneous Populations
On Group Sufficiency Under Label Bias
Haoran Zhang, Olawale Salaudeen, Marzyeh Ghassemi
NeurIPS 2025 (to appear)
What’s in a Query: Polarity-Aware Distribution-Based Fair Ranking
Aparna Balagopalan, Kai Wang, Olawale Salaudeen, Asia Biega, Marzyeh Ghassemi
In WWW 2025
[arXiv] [code]
Applications in Neuroscience and Neuroimaging
Enhancing fMRI Motion Denoising with ICA-AROMA and Causal Discovery
Olawale Salaudeen, Paul Camacho, Aron Barbey, Brad Sutton, Sanmi Koyejo
In review
[code]
Ultra-fast 3D fMRI to Explore Cardiac-Induced Fluctuations in BOLD-Based Functional Imaging
Brad Sutton, Aaron Anderson, Benjamin Zimmerman, Paul Camacho, Riwei Jin, Charles Marchini, Olawale Salaudeen, Natalie Ramsy, Davide Boido, Serge Charpak, Andrew Webb, Luisa Ciobanu
International Society for Magnetic Resonance in Medicine (ISMRM) 2022 (abstract)
[link]