Location: NYC
olawale [at] mit [dot] edu
On the 2025–26 academic job market, seeking tenure-track positions beginning Fall 2026.
I work on AI for society through the science of validly measuring and predicting AI capabilities and risks, and through interventions that steer AI behavior. Bridging theory, empirical analysis, and algorithm design, I develop robust evaluations of AI systems, uncover the spurious and causal mechanisms behind their behavior, and build adaptation methods that steer behavior safely in changing environments.
I am an AI Institute Fellow in Residence at Schmidt Sciences, a Postdoctoral Affiliate at the Massachusetts Institute of Technology (w/ Prof. Marzyeh Ghassemi), and a Postdoctoral Scholar at the Eric and Wendy Schmidt Center at the Broad Institute of MIT and Harvard. I received my Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign in 2024 (w/ Prof. Sanmi Koyejo), where I was also a Visiting Ph.D. Student at Stanford University (2022–2024). I earned my B.S. in Mechanical Engineering with minors in Mathematics and Computer Science from Texas A&M University in 2019.
2025. Best Paper Award, NeurIPS Workshop on Evaluating the Evolving LLM Lifecycle: Benchmarks, Emergent Abilities, and Scaling
2025. Top Area Chair, NeurIPS 2025 (Main Track)
2025-Present. AI Institute Fellow in Residence, Schmidt Sciences
2025-Present. Scholar, Eric and Wendy Schmidt Center at the Broad Institute of MIT and Harvard
2025. Fellow, New York University Tandon Faculty First-Look
2025. Fellow, Georgia Institute of Technology FOCUS Program
2021-24. Graduate Fellow, UIUC Beckman Institute
2021-24. Associate Fellow, The National GEM Consortium
2019-24. Sloan Scholar, Alfred P. Sloan Foundation
2021-23. Research Trainee, NSF Miniature Brain Machinery, UIUC
2023-24. Research Intern, Max Planck Institute for Intelligent Systems (w/ Dr. Moritz Hardt)
2023. Machine Learning Intern, Cruise LLC
2022. Student Researcher, Google Brain (now Google DeepMind) (w/ Dr. Alex D’Amour)
2017-22. R&D Intern, Sandia National Laboratories (w/ Dr. Eric Goodman)
See Publications (and related Blog Posts) for more. Google Scholar has a (potentially) more updated list.
* denotes equal contribution. α-β denotes alphabetical order.
I am also very happy to discuss new research directions and collaborations; please reach out if there is shared interest!
AI systems exhibit jagged intelligence—they excel at some tasks but fail at others that share a common human capability. My recent work develops measurements of AI-specific latent traits, capabilities, and risks to enable less jagged, more predictable behaviors and propensities across real-world settings.
Selected Papers. Preprint 2025 (Policy Brief), NAE 2025, NeurIPS Workshop on LLM Eval 2025 (Best Paper), NeurIPS 2025, Preprint 2024.
AI models often rely on spurious correlations, latching onto easy but unreliable cues for decision-making. I design methods that help models focus on the stable, causal patterns instead, so they behave more reliably when conditions change.
Selected Papers. TMLR 2025 (J2C Certificate; news; Oral @ NeurIPS Workshop on Causal Representation Learning), NeurIPS 2025 (Spotlight; news), AISTATS 2024, NeurIPS 2025, WWW 2025.
AI systems often become unreliable when they encounter new environments, but they can adapt if provided with the right cues. My work develops methods that use context available at inference time to adjust model behavior on the fly, keeping systems reliable and safe when conditions change.
Selected Papers. AISTATS 2023, AISTATS 2024, TMLR 2025.
Winter 2025. Our NeurIPS spotlight paper, titled Aggregation Hides OOD Generalization Failures from Spurious Correlations, was featured in MIT News.
Winter 2025. Our work On Evaluating Methods vs. Evaluating Models received a Best Paper Award at the NeurIPS Workshop on Evaluating the Evolving LLM Lifecycle: Benchmarks, Emergent Abilities, and Scaling.
Fall 2025. Our [policy brief] on validating claims about AI is now available!
Fall 2025. Three [papers] are accepted to NeurIPS 2025 (main track), including one spotlight selection! (i) Aggregation Hides OOD Generalization Failures from Spurious Correlations (spotlight), (ii) On Group Sufficiency Under Label Bias, and (iii) Understanding challenges to the interpretation of disaggregated evaluations of algorithmic fairness.
Fall 2025. Two [papers] are accepted at the NeurIPS Workshop on Evaluating the Evolving LLM Lifecycle: Benchmarks, Emergent Abilities, and Scaling, including one oral selection! (i) On Evaluating Methods vs. Evaluating Models (oral) and (ii) Measurement to Meaning: A Validity-Centered Framework for AI Evaluation.
Fall 2025. [service]. I am co-organizing the [workshop] on The Science of Benchmarking and Evaluating AI at EurIPS 2025 in Copenhagen, Denmark with Yatong Chen, Moritz Hardt, and Joaquin Vanschoren!
Fall 2025. Our [paper] on single-round active learning – Improving Single-round Active Adaptation: A Prediction Variability Perspective – is accepted at TMLR!
Summer 2025. Our [paper] on the limitations of domain generalization benchmarks and solutions – Are Domain Generalization Benchmarks with Accuracy on the Line Misspecified? – is accepted at TMLR!
Summer 2025. Our [preprint] on the limitations of evaluating AI systems with tests carefully designed for human populations – Stop Evaluating AI with Human Tests, Develop Principled, AI-specific Tests instead – is now available on arXiv!
Summer 2025. Our [preprint] on interpreting disaggregated evaluations of algorithm fairness – Understanding challenges to the interpretation of disaggregated evaluations of algorithmic fairness – is now available on arXiv!
Summer 2025. [service]. I am serving as a program chair for the Machine Learning for Health (ML4H) conference in San Diego, CA, in December. Please reach out if you are interested in sponsoring this great conference!
Summer 2025. [honors/appointment]. I will spend the next year at Schmidt Sciences in NYC as a Visiting Scientist (previously titled AI Institute Fellow) starting this summer! Please reach out if you are in NYC!
Spring 2025. [honors/appointment]. I joined the Eric and Wendy Schmidt Center, led by Prof. Caroline Uhler at the Broad Institute of MIT and Harvard, as a postdoctoral scholar.
Spring 2025. Our [paper] Toward an Evaluation Science for Generative AI Systems appeared in the latest edition of The Bridge (National Academy of Engineering) on "AI Promises & Risks."
Spring 2025. I gave a [talk] on addressing distribution shifts with varying levels of deployment distribution information at the MIT LIDS Postdoc NEXUS meeting!
Spring 2025. Our [preprint] on domain generalization benchmarks – Are Domain Generalization Benchmarks with Accuracy on the Line Misspecified? – is now available on arXiv!
Winter 2025. [service]. I am co-organizing the new AI for Society seminar at MIT.
Winter 2025. Our [paper] titled What’s in a Query: Examining Distribution-based Amortized Fair Ranking will appear at the International World Wide Web Conference (WWW), 2025.
Winter 2025. I was selected as an NYU Tandon Faculty First-Look Fellow; I look forward to visiting and giving a [honors/talk] on our work on distribution shifts at NYU in February; news!
Winter 2025. [service]. I am co-organizing the 30th Annual Sanjoy K. Mitter LIDS Student Conference at MIT.
Winter 2025. I was selected as a Georgia Tech FOCUS Fellow; I look forward to visiting and giving a [honors/talk] on our work on distribution shifts at Georgia Tech in January!
Fall 2024. Our [paper] titled On Domain Generalization Datasets as Proxy Benchmarks for Causal Representation Learning will appear at the NeurIPS 2024 Workshop on Causal Representation Learning as an oral presentation.
Fall 2024. [appointment]. I joined the Healthy ML Lab, led by Prof. Marzyeh Ghassemi, at MIT as a postdoctoral associate!
Summer 2024. I gave a talk on our work on distribution shift at Texas State's Computer Science seminar.
Summer 2024. I gave a [talk] on our work on distribution shift at UT Austin's Institute for Foundations of Machine Learning (IFML).
Summer 2024. I successfully defended my PhD dissertation titled “Towards Externally Valid Machine Learning: A Spurious Correlations Perspective”!
Spring 2024. I gave a [talk] on AI for critical systems at the MobiliT.AI forum (May 28-29)!
Spring 2024. I gave a [talk] at UIUC Machine Learning Seminar on our work on the external validity of ImageNet; artifacts here!
Spring 2024. Our [preprint] demonstrating the external validity of ImageNet model/architecture rankings – ImageNot: A contrast with ImageNet preserves model ranking – is now available on arXiv!
Winter 2024. Two [papers] on machine learning under distribution shift will appear at AISTATS 2024 (see Publications)!
Winter 2024. I have returned to Stanford from MPI!
Fall 2023. I will join the Social Foundations of Computation department at the Max Planck Institute for Intelligent Systems in Tübingen, Germany this fall as a Research Intern working with Dr. Moritz Hardt!
Spring 2023. I passed my PhD Preliminary Exam!
Spring 2023. I will join Cruise LLC's Autonomous Vehicles Behaviors team in San Francisco, CA this summer as a Machine Learning Intern!
Fall 2022. I have moved to Stanford University as a "student of new faculty (SNF)" with Professor Sanmi Koyejo!
Summer 2022. I am honored to be selected as a top reviewer (10%) of ICML 2022!
Summer 2022. I will join Google Brain (now Google DeepMind) in Cambridge, MA this summer as a Research Intern!
Fall 2021. Our [paper] titled Exploiting Causal Chains for Domain Generalization was accepted at the 2021 NeurIPS Workshop on Distribution Shift!
Fall 2021. I was selected as a Miniature Brain Machinery (MBM) NSF Research Trainee!
Summer 2021. I was selected to receive an Illinois GEM Associate Fellowship!
Spring 2021. I gave a [talk] on leveraging causal discovery for fMRI denoising at the Beckman Institute Graduate Student Seminar!
Spring 2021. I passed my Ph.D. qualifying exam!
Spring 2020. I was selected to receive a 2020 Beckman Institute Graduate Fellowship!
I am happy to mentor students with overlapping research interests. Particularly for undergrads at MIT, programs like UROP are a great mechanism for mentorship.
More generally, I am happy to give advice and feedback on applying to and navigating undergraduate and graduate programs in computer science and related disciplines – especially for those to whom this kind of guidance would otherwise be unavailable.