I am a researcher at Google DeepMind working on AI safety and alignment. My research aims to develop safe, interpretable, and trustworthy AI systems that learn from human feedback.
My research focuses on what I believe are key ingredients for building highly capable AI safely: (1) learning from human feedback safely and sample-efficiently; (2) providing robust supervision as AI becomes more capable (e.g., via scalable oversight); (3) evaluating AI for dangerous capabilities and propensities; and (4) monitoring and red-teaming AI systems. My research interests tend to shift between these areas; to learn more about my current work, see my recent publications.
I received my PhD from ETH Zurich, where I was part of the Learning & Adaptive Systems Group, supervised by Prof. Andreas Krause and Dr. Katja Hofmann. My dissertation, “Algorithmic Foundations for Safe and Efficient Reinforcement Learning from Human Feedback”, focused on developing such foundations for safe and sample-efficient RLHF.