I am broadly interested in reliable AI. My research develops the science of measuring AI capabilities and intervening on AI behavior, bridging theory, algorithms, and real-world impact. I develop principles and practices for reliable AI evaluation, including the external validity of key deep-learning benchmarks such as ImageNet, the internal validity of benchmarks for out-of-distribution generalization, and frameworks for valid evaluation of latent AI capabilities and traits. I also develop methods to understand and intervene on the mechanisms, causal or spurious, that determine AI behavior. This work enables AI systems to generalize and adapt to environments that differ from their training data, helping ensure they remain reliable and safe in dynamic, real-world settings. Application areas of my work include health and medicine, algorithmic fairness, and AI policy.
My research has been supported by a Sloan Scholarship, a Beckman Graduate Research Fellowship, a GEM Associate Fellowship, and an NSF Miniature Brain Machinery Traineeship. I have also interned at Sandia National Laboratories (with Dr. Eric Goodman), Google Brain (now Google DeepMind) (with Dr. Alex D'Amour), Cruise LLC, and the Max Planck Institute for Intelligent Systems (with Dr. Moritz Hardt).