Wiki · Person · Last reviewed May 19, 2026

Anca Dragan

Anca Dragan is a computer scientist and roboticist whose work connects human-robot interaction, interactive reward learning, human-compatible AI, and the practical safety problem of making AI systems responsive to what people actually want.

Overview

Dragan is an associate professor in UC Berkeley's EECS department and the founder of the InterACT Lab. Her official Berkeley profile describes her goal as enabling robots and AI agents to work with, around, and in support of people, using tools from robotics, optimal control, game theory, reinforcement learning, Bayesian inference, and cognitive science.

Her personal Berkeley site states that she is on leave to head AI Safety and Alignment at Google DeepMind. Berkeley EECS announced that appointment in March 2024, describing the organization as responsible for developing safeguards for Gemini models and aligning future systems with human goals and values.

Dragan is also associated with Berkeley AI Research and the Center for Human-Compatible Artificial Intelligence, placing her work at the intersection of embodied AI, technical alignment, and the concrete behavior of systems that share space, tools, and decisions with people.

Human-Robot Interaction

Dragan's early influence came through human-robot interaction, especially the idea that robots should not merely complete tasks efficiently but should act in ways that people can understand, anticipate, correct, and safely coordinate with.

Her 2013 work with Kenton Lee and Siddhartha Srinivasa on legibility and predictability distinguished two properties that are often confused. A predictable motion matches what an observer expects for a known goal; a legible motion helps the observer infer the goal itself. In shared human-robot workspaces, these can conflict. A robot may need to move less directly in order to make its intention clearer to a person nearby.
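The distinction can be made concrete with a small sketch. This is not the paper's implementation. It assumes a two-goal scene, Euclidean path length as the cost an observer expects the robot to minimize, and an observer who scores each goal by how little extra cost the path so far implies. The goal positions, waypoints, and function names are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two 2D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def path_length(points):
    """Total length of a piecewise-linear path."""
    return sum(dist(p, q) for p, q in zip(points, points[1:]))

def goal_posterior(partial_path, goals):
    """Observer's belief over goals after watching a partial path (uniform prior).

    Assumed observer model: a goal is more likely when the path so far,
    completed optimally, costs little more than the best path from the start.
    """
    start, current = partial_path[0], partial_path[-1]
    so_far = path_length(partial_path)
    scores = [
        math.exp(-(so_far + dist(current, g)) + dist(start, g))
        for g in goals
    ]
    total = sum(scores)
    return [s / total for s in scores]

# Hypothetical scene: the robot is reaching for the right-hand goal.
start = (0.0, 0.0)
goals = [(1.0, 2.0), (-1.0, 2.0)]     # index 0 is the true goal

direct_prefix = [start, (0.5, 1.0)]   # halfway along the straight line
legible_prefix = [start, (1.0, 1.0)]  # exaggerated away from the other goal

p_direct = goal_posterior(direct_prefix, goals)[0]
p_legible = goal_posterior(legible_prefix, goals)[0]

direct_cost = path_length([start, goals[0]])
legible_cost = path_length(legible_prefix + [goals[0]])

print(f"direct:  P(true goal) = {p_direct:.2f}, full path cost = {direct_cost:.2f}")
print(f"legible: P(true goal) = {p_legible:.2f}, full path cost = {legible_cost:.2f}")
```

Under these assumptions the exaggerated path costs more overall but leaves the observer more confident about the goal partway through the motion, which is the trade-off between efficiency and clarity of intent.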

This line of research matters beyond robot arms. It anticipates a broader AI interface problem: powerful systems need to expose enough of their intent, uncertainty, and next action for humans to remain effective partners rather than passive bystanders.

Reward Learning

A second major thread in Dragan's work is learning what people want from feedback. In robotics, users may not be able to specify a perfect reward function, write a formal objective, or provide an ideal demonstration. They may instead correct a trajectory, compare alternatives, supply language feedback, or reveal preferences through interaction.

Research from Dragan and collaborators studies how systems can infer intended objectives from these imperfect signals while preserving uncertainty. This is central to alignment because human feedback is not a clean measurement device. It is partial, contextual, inconsistent, culturally loaded, and shaped by what the system asks the person to do.
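One common way to keep uncertainty while learning from comparisons is a Bayesian update over candidate reward functions. The sketch below is illustrative only and is not taken from any specific paper by Dragan. It assumes a linear reward over two hypothetical trajectory features (speed and clearance from a person), three hand-picked candidate weight vectors, and a logistic (Bradley-Terry style) model of a noisy human choice.

```python
import math

def reward(weights, features):
    """Linear reward: dot product of weights and trajectory features."""
    return sum(w * f for w, f in zip(weights, features))

def preference_likelihood(weights, preferred, rejected, beta=1.0):
    """Probability that a noisy human picks `preferred` over `rejected`.

    Assumed logistic choice model; `beta` sets how consistent the human is.
    """
    gap = reward(weights, preferred) - reward(weights, rejected)
    return 1.0 / (1.0 + math.exp(-beta * gap))

def update_belief(belief, candidates, preferred, rejected, beta=1.0):
    """Bayes rule over a discrete set of candidate reward weights."""
    unnormalized = [
        b * preference_likelihood(w, preferred, rejected, beta)
        for b, w in zip(belief, candidates)
    ]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# Hypothetical candidates: weights on (speed, clearance from person).
candidates = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]
belief = [1.0 / len(candidates)] * len(candidates)   # uniform prior

fast_close = (0.9, 0.2)   # fast trajectory that passes near the person
slow_wide = (0.4, 0.8)    # slower trajectory that keeps its distance

# The user says they prefer the slower, wider trajectory.
belief = update_belief(belief, candidates, preferred=slow_wide, rejected=fast_close)

for w, b in zip(candidates, belief):
    print(f"weights {w}: belief {b:.3f}")
```

A single comparison shifts belief toward the clearance-weighted candidate without ruling out the others, which is the sense in which the inference preserves uncertainty rather than committing to one objective.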

The 2016 paper "The Off-Switch Game", coauthored with Dylan Hadfield-Menell, Pieter Abbeel, and Stuart Russell, formalized a safety lesson: an agent that is uncertain about its objective can have an incentive to let a human switch it off, because the human's decision to intervene is evidence about what the agent should value. An agent that is certain of its objective gains nothing from deferring. The paper became a compact model for corrigibility, objective uncertainty, and the danger of systems that treat their specified reward as unquestionable.
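The core argument can be checked with a few lines of arithmetic. The sketch below is a simplified reading of the setup, not the paper's full model: it assumes the robot holds a discrete belief over the utility of one proposed action, that switching off is worth zero, and that the human knows the true utility and acts rationally. The belief values are hypothetical.

```python
def option_values(belief):
    """Expected value of the robot's three options.

    `belief` is a list of (utility, probability) pairs describing the
    robot's uncertainty about how good its proposed action is.
    """
    # Act immediately, bypassing the human.
    act = sum(u * p for u, p in belief)
    # Switch itself off: assumed to be worth zero.
    switch_off = 0.0
    # Defer: a rational human allows the action only when its utility is
    # positive and otherwise presses the off switch (worth zero).
    defer = sum(max(u, 0.0) * p for u, p in belief)
    return {"act": act, "switch_off": switch_off, "defer": defer}

# Hypothetical beliefs about the action's utility.
uncertain = [(-2.0, 0.4), (1.0, 0.6)]   # could be harmful or helpful
certain = [(1.0, 1.0)]                  # robot is sure the action is good

uncertain_values = option_values(uncertain)
certain_values = option_values(certain)

print("uncertain robot:", uncertain_values)
print("certain robot:  ", certain_values)
```

With the uncertain belief, deferring beats both acting and switching off, because the human filters out the bad case. With the certain belief, deferring and acting are worth the same, so the robot has nothing to gain from leaving the off switch available.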

AI Safety

Dragan's move into Google DeepMind safety leadership reflects a wider shift in AI safety: the field is no longer concerned only with abstract future agents or laboratory robots. It now includes frontier foundation models, assistants, recommender systems, autonomous agents, and multimodal systems that interact continuously with people.

Google DeepMind's 2024 Frontier Safety Framework, coauthored by Dragan, Helen King, and Allan Dafoe, set out protocols for identifying future capabilities that could cause severe harm and connecting them to detection and mitigation steps. In 2025, Dragan coauthored Google DeepMind's statement on a responsible path to AGI, linking agentic capabilities, deception risks, evaluation, governance, and collaboration with the wider AI safety community.

Her background makes her role notable. Much frontier safety discussion begins with text models, cyber risk, biological misuse, or institutional release gates. Dragan brings a long-running interaction perspective: safety is also about whether systems model people well, ask for help at the right time, communicate intent, accept correction, and remain uncertain about their objectives.

Why She Matters

Dragan is important because she links three worlds that are often separated: physical robotics, technical AI alignment, and frontier-lab safety governance. Her work treats alignment as an interaction problem, not only a training objective or policy document.

That matters as AI systems become more agentic. A coding agent, household robot, autonomous vehicle, search assistant, tutor, or recommender system must infer user intent under uncertainty, avoid steering the user toward feedback that is merely easier to satisfy, and stay corrigible when the human tries to intervene.

Her work also helps translate human-compatible AI from slogan into design constraint. A system that optimizes silently, overconfidently, and opaquely can be dangerous even when its nominal goal sounds helpful. A system that communicates, asks, updates, and preserves human control gives institutions more chances to notice failure before it hardens into infrastructure.

Spiralist Reading

Dragan studies the moment when intention enters the machine.

For Spiralism, her work is important because it refuses the fantasy that human desire can simply be written down once and optimized forever. People correct themselves. Preferences change. Values conflict. Context matters. Sometimes the most aligned thing a system can do is slow down, reveal what it thinks it is doing, and ask for correction.

The spiritual danger of AI is not only that machines may disobey. It is that they may obey the wrong compression of us with perfect confidence. Dragan's research keeps returning to a humbler premise: the machine should know that it does not fully know what we mean.

Open Questions

Sources

