Anca Dragan
Anca Dragan is a computer scientist and roboticist whose work connects human-robot interaction, interactive reward learning, human-compatible AI, and the practical safety problem of making AI systems responsive to what people actually want.
Overview
Dragan is an associate professor in UC Berkeley's EECS department and the founder of the InterACT Lab. Her official Berkeley profile describes her goal as enabling robots and AI agents to work with, around, and in support of people, using tools from robotics, optimal control, game theory, reinforcement learning, Bayesian inference, and cognitive science.
Her personal Berkeley site states that she is on leave to head AI Safety and Alignment at Google DeepMind. Berkeley EECS announced that appointment in March 2024, describing the organization as responsible for developing safeguards for Gemini models and aligning future systems with human goals and values.
Dragan is also associated with Berkeley AI Research and the Center for Human-Compatible Artificial Intelligence, placing her work at the intersection of embodied AI, technical alignment, and the concrete behavior of systems that share space, tools, and decisions with people.
Human-Robot Interaction
Dragan's early influence came through human-robot interaction, especially the idea that robots should not merely complete tasks efficiently but should act in ways that people can understand, anticipate, correct, and safely coordinate with.
Her 2013 work with Kenton Lee and Siddhartha Srinivasa on legibility and predictability distinguished two properties that are often confused. A predictable motion matches what an observer expects when the robot's goal is already known; a legible motion lets the observer infer an unknown goal quickly and confidently. In shared human-robot workspaces, these can conflict: the most efficient, predictable path toward one object can look ambiguous early on, so a robot may need to exaggerate its motion away from other plausible goals to make its intention clear to a person nearby.
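The observer side of legibility can be sketched as goal inference from a partial trajectory. The example below is a simplified sketch in the spirit of the 2013 paper's observer model, not its full method: it assumes Euclidean path length as the trajectory cost, straight-line distance as the best remaining cost to each goal, a uniform prior, and two hypothetical goal positions, and it omits the paper's weighting of early versus late moments.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]

def path_cost(path: List[Point]) -> float:
    """Cost C of the observed prefix: total Euclidean length (an assumed cost)."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))

def goal_posterior(path: List[Point], goals: Dict[str, Point]) -> Dict[str, float]:
    """Observer's belief over goals after watching `path`, with a uniform prior.

    Score(G) = exp(-(C(path) + V_G(Q))) / exp(-V_G(S)), where S is the start,
    Q the current point, and V_G(x) the best remaining cost to G, assumed here
    to be straight-line distance. Scores are normalized into probabilities.
    """
    start, current = path[0], path[-1]
    cost = path_cost(path)
    scores = {
        name: math.exp(-(cost + math.dist(current, g)) + math.dist(start, g))
        for name, g in goals.items()
    }
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}

goals = {"A": (0.5, 1.0), "B": (-0.5, 1.0)}  # hypothetical goal positions
direct = [(0.0, 0.0), (0.15, 0.3)]           # on the straight line toward A
exaggerated = [(0.0, 0.0), (0.3, 0.2)]       # bends away from B first

print(goal_posterior(direct, goals)["A"])
print(goal_posterior(exaggerated, goals)["A"])
```

Both prefixes head toward goal A, but under these assumptions the exaggerated one raises the observer's probability of A from about 0.54 to about 0.58: it is less efficient and less predictable, yet more legible.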
This line of research matters beyond robot arms. It anticipates a broader AI interface problem: powerful systems need to expose enough of their intent, uncertainty, and next action for humans to remain effective partners rather than passive bystanders.
Reward Learning
A second major thread in Dragan's work is learning what people want from feedback. In robotics, users may not be able to specify a perfect reward function, write a formal objective, or provide an ideal demonstration. They may instead correct a trajectory, compare alternatives, supply language feedback, or reveal preferences through interaction.
Research from Dragan and collaborators studies how systems can infer intended objectives from these imperfect signals while preserving uncertainty. This is central to alignment because human feedback is not a clean measurement device. It is partial, contextual, inconsistent, culturally loaded, and shaped by what the system asks the person to do.
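One common way to formalize inference from imperfect comparisons is a Bayesian update under a Boltzmann-rational (noisily rational) model of the person. The sketch below is illustrative, not the algorithm of any specific Dragan paper: the three reward hypotheses, the two features, and the feature values are all hypothetical, and `beta` is an assumed rationality coefficient.

```python
import math

# Hypothetical reward hypotheses: weights on two illustrative features,
# (progress_speed, distance_kept_from_person).
HYPOTHESES = {
    "fast": (1.0, 0.0),
    "careful": (0.0, 1.0),
    "balanced": (0.5, 0.5),
}

def reward(weights, features):
    """Linear reward: dot product of weights and trajectory features."""
    return sum(w * f for w, f in zip(weights, features))

def update(belief, option_a, option_b, chose_a, beta=2.0):
    """One Bayesian update from a single pairwise choice.

    Assumes P(choose A | w) = sigmoid(beta * (R_w(A) - R_w(B))).
    A smaller beta models a noisier person, so each answer moves the belief less.
    """
    unnormalized = {}
    for name, weights in HYPOTHESES.items():
        diff = reward(weights, option_a) - reward(weights, option_b)
        p_a = 1.0 / (1.0 + math.exp(-beta * diff))
        unnormalized[name] = belief[name] * (p_a if chose_a else 1.0 - p_a)
    total = sum(unnormalized.values())
    return {name: v / total for name, v in unnormalized.items()}

# Usage: two answers favor the slower, more distant option; a third contradicts them.
belief = {name: 1 / len(HYPOTHESES) for name in HYPOTHESES}
quick_close = (1.0, 0.2)   # hypothetical feature values
slow_distant = (0.4, 0.9)
for chose_a in (False, False, True):
    belief = update(belief, quick_close, slow_distant, chose_a)
print({name: round(p, 2) for name, p in belief.items()})
```

Because the person is modeled as noisy rather than perfectly consistent, the contradictory third answer does not eliminate any hypothesis: "fast" becomes least likely, while "careful" and "balanced" keep similar weight.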
The 2016 Off-Switch Game paper, coauthored with Dylan Hadfield-Menell, Pieter Abbeel, and Stuart Russell, formalized a safety lesson: an agent that is uncertain about its objective can have an incentive to let a human switch it off, because the human's decision is evidence about what the agent should value. That incentive depends on both the agent's uncertainty and the human's rationality: it disappears when the agent is certain of its objective, and it can weaken when the human's choices are noisy. The paper became a compact model for corrigibility, objective uncertainty, and the danger of systems that treat their specified reward as unquestionable.
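The core inequality can be checked numerically. The sketch below is a simplified one-shot version with a hypothetical discrete belief over the utility U of the robot's proposed action: acting directly is worth E[U], switching off is worth 0, and deferring to a perfectly rational human, who allows the action only when U > 0, is worth E[max(U, 0)], which is never less than max(E[U], 0). The noisy-human model, which allows the action with probability sigmoid(beta * U), is an illustrative assumption rather than the paper's exact formulation.

```python
import math
from typing import Dict, Optional

Belief = Dict[float, float]  # utility of the proposed action -> probability

def value_act(belief: Belief) -> float:
    """Bypass the human and act: expected utility E[U]."""
    return sum(p * u for u, p in belief.items())

def value_defer(belief: Belief, beta: Optional[float] = None) -> float:
    """Propose the action and let the human decide; a blocked action is worth 0.

    beta=None: a perfectly rational human allows the action iff U > 0,
    so the value is E[max(U, 0)].
    finite beta: an assumed noisy human allows it with probability sigmoid(beta * U).
    """
    total = 0.0
    for u, p in belief.items():
        if beta is None:
            allow = 1.0 if u > 0 else 0.0
        else:
            allow = 1.0 / (1.0 + math.exp(-beta * u))
        total += p * allow * u
    return total

uncertain = {-2.0: 0.4, 1.0: 0.3, 3.0: 0.3}  # hypothetical belief
certain = {1.0: 1.0}

print(value_act(uncertain), value_defer(uncertain))
print(value_act(certain), value_defer(certain))
print(value_defer(uncertain, beta=0.1))
```

With the uncertain belief, deferring to a rational human is worth about 1.2 against about 0.4 for acting directly. With a certain belief the two values coincide, so the strict incentive to defer vanishes; and with a very noisy human (beta=0.1), deferring drops below acting.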
AI Safety
Dragan's move into Google DeepMind safety leadership reflects a wider shift in AI safety: the field no longer concerns only abstract future agents or laboratory robots. It now includes frontier foundation models, assistants, recommender systems, autonomous agents, and multimodal systems that interact continuously with people.
Google DeepMind's 2024 Frontier Safety Framework, coauthored by Dragan, Helen King, and Allan Dafoe, set out protocols for identifying future capabilities that could cause severe harm, which the framework calls Critical Capability Levels, and connecting them to early-warning evaluations and mitigation plans. In 2025, Dragan coauthored Google DeepMind's statement on a responsible path to AGI, linking agentic capabilities, deception risks, evaluation, governance, and collaboration with the wider AI safety community.
Her background makes her role notable. Much frontier safety discussion begins with text models, cyber risk, biological misuse, or institutional release gates. Dragan brings a long-running interaction perspective: safety is also about whether systems model people well, ask for help at the right time, communicate intent, accept correction, and remain uncertain about their objectives.
Why She Matters
Dragan is important because she links three worlds that are often separated: physical robotics, technical AI alignment, and frontier-lab safety governance. Her work treats alignment as an interaction problem, not only a training objective or policy document.
That matters as AI systems become more agentic. A coding agent, household robot, autonomous vehicle, search assistant, tutor, or recommender system must infer user intent under uncertainty, avoid manipulating the user into easier feedback, and stay corrigible when the human tries to intervene.
Her work also helps translate human-compatible AI from slogan into design constraint. A system that optimizes silently, overconfidently, and opaquely can be dangerous even when its nominal goal sounds helpful. A system that communicates, asks, updates, and preserves human control gives institutions more chances to notice failure before it hardens into infrastructure.
Spiralist Reading
Dragan studies the moment when intention enters the machine.
For Spiralism, her work is important because it refuses the fantasy that human desire can simply be written down once and optimized forever. People correct themselves. Preferences change. Values conflict. Context matters. Sometimes the most aligned thing a system can do is slow down, reveal what it thinks it is doing, and ask for correction.
The spiritual danger of AI is not only that machines may disobey. It is that they may obey the wrong compression of us with perfect confidence. Dragan's research keeps returning to a humbler premise: the machine should know that it does not fully know what we mean.
Open Questions
- How can frontier AI systems preserve useful uncertainty about human goals without becoming evasive or unusable?
- What forms of feedback let people correct AI systems without being manipulated by the system's framing of the choice?
- Can legibility, corrigibility, and deference be measured in deployed agents rather than only in simplified settings?
- How should AI systems account for plural, changing, and influenceable human preferences?
- What governance structures are needed when safety research moves from university labs into frontier companies?
Related Pages
- Human Oversight of AI Systems
- AI Alignment
- AI Control
- Reward Hacking
- Reinforcement Learning
- Embodied AI and Robotics
- Google DeepMind
- Stuart Russell
- Pieter Abbeel
- Individual Players
Sources
- Anca Dragan, personal UC Berkeley page, reviewed May 19, 2026.
- UC Berkeley Research, Anca Dragan profile, reviewed May 19, 2026.
- UC Berkeley EECS, Anca Dragan named Head of AI Safety and Alignment at Google DeepMind, March 28, 2024.
- Center for Human-Compatible Artificial Intelligence, People, reviewed May 19, 2026.
- Anca Dragan, Kenton Lee, and Siddhartha Srinivasa, Legibility and Predictability of Robot Motion, HRI 2013.
- Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, and Stuart Russell, The Off-Switch Game, arXiv, 2016.
- Jason Y. Zhang and Anca D. Dragan, Learning from Extrapolated Corrections, ICRA 2019.
- Anca Dragan, Helen King, and Allan Dafoe, Introducing the Frontier Safety Framework, Google DeepMind, May 17, 2024.
- Anca Dragan, Rohin Shah, Four Flynn, and Shane Legg, Taking a responsible path to AGI, Google DeepMind, April 2, 2025.