I am a researcher at Google DeepMind working on AI alignment. My research aims to develop safe, interpretable, and trustworthy AI systems that learn from human feedback.
I received my PhD from ETH Zurich, where I was part of the Learning & Adaptive Systems Group supervised by Prof. Andreas Krause and Dr. Katja Hofmann. My dissertation was titled “Algorithmic Foundations for Safe and Efficient Reinforcement Learning from Human Feedback”.
My research focuses on what I believe are key ingredients for building safe and trustworthy AI systems: (1) learning from human feedback safely and sample-efficiently; (2) providing robust supervision for very capable AI systems (e.g., via scalable oversight); (3) evaluating AI systems for dangerous capabilities; and (4) monitoring and red-teaming AI systems. My interests tend to shift among these areas; to learn more about my current work, see my recent publications.