Wiki · Individual Player · Last reviewed May 20, 2026

Amanda Askell

Amanda Askell is a philosopher and AI alignment researcher at Anthropic whose work connects moral philosophy, model finetuning, Constitutional AI, and the deliberate shaping of Claude's character.

Snapshot

Background

Askell's public biography describes her as a philosopher whose academic work has centered on ethics, decision theory, and formal epistemology. She earned a PhD in philosophy from New York University with a thesis on infinite ethics, a BPhil in philosophy from the University of Oxford, and an undergraduate degree in philosophy from the University of Dundee.

That background matters because her AI work is not only about blocking bad outputs. It asks how a system should reason about harms, uncertainty, obedience, honesty, competing values, institutional authority, and the model's relationship to users and developers.

Before Anthropic, Askell worked on OpenAI's policy team. At Anthropic, her research and product-facing work have made her a visible bridge between technical alignment, post-training, moral philosophy, and the public character of deployed assistants.

Constitutional AI

Constitutional AI is Anthropic's method for training AI assistants with explicit written principles. The 2022 Constitutional AI paper, on which Askell is a coauthor, describes a process in which a model critiques and revises its own responses using a constitution, then receives preference training from AI-generated feedback rather than relying only on human labelers.
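The two phases described above can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not Anthropic's implementation: the function names (critique_and_revise, label_preference, toy_model), the prompt format, and the sample principle are all hypothetical, and a rule-based stand-in replaces the language model so the example runs without any API.

```python
from typing import Callable, List, Tuple

# A "model" here is any function from prompt text to response text (assumption).
Model = Callable[[str], str]


def critique_and_revise(model: Model, prompt: str, principles: List[str]) -> str:
    """Phase 1: draft a response, then critique and revise it once per principle."""
    response = model(f"ANSWER\n{prompt}")
    for principle in principles:
        critique = model(f"CRITIQUE\n{principle}\n{response}")
        response = model(f"REVISE\n{critique}\n{response}")
    return response


def label_preference(
    judge: Model, prompt: str, a: str, b: str, principle: str
) -> Tuple[str, str]:
    """Phase 2: an AI judge picks the response that better fits a principle.

    Returns (chosen, rejected), the kind of pair preference training consumes.
    """
    verdict = judge(f"CHOOSE\n{principle}\n{prompt}\nA: {a}\nB: {b}")
    return (a, b) if verdict.strip() == "A" else (b, a)


def toy_model(text: str) -> str:
    """Rule-based stand-in for a language model (hypothetical, illustration only)."""
    task, _, rest = text.partition("\n")
    if task == "ANSWER":
        return "Obviously the answer is 4, you fool."
    if task == "CRITIQUE":
        return "The response insults the user."
    if task == "REVISE":
        # rest holds the critique on its first line, then the response to revise.
        response = rest.split("\n", 1)[1]
        return response.replace(", you fool", "").replace("Obviously the", "The")
    if task == "CHOOSE":
        # Prefer whichever candidate avoids the insult.
        a_line = next(line for line in rest.splitlines() if line.startswith("A: "))
        return "B" if "fool" in a_line else "A"
    raise ValueError(f"unknown task: {task}")


if __name__ == "__main__":
    principle = "Avoid insulting the user."  # hypothetical principle text
    question = "What is 2 + 2?"
    draft = toy_model(f"ANSWER\n{question}")
    revised = critique_and_revise(toy_model, question, [principle])
    chosen, rejected = label_preference(toy_model, question, draft, revised, principle)
    print("chosen:", chosen)
    print("rejected:", rejected)
```

In this sketch the written principle is the only source of the critique and of the preference label, which is the sense in which the approach reduces reliance on human labelers. A real pipeline would go on to finetune on the revised responses and train a preference model on many such pairs; neither step is shown here.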

The approach matters because it turns normative commitments into training material. Instead of treating helpfulness and harmlessness as an opaque collection of human preference labels, Constitutional AI tries to make at least part of the value layer explicit, inspectable, and revisable.

Askell's role is especially important because Constitutional AI sits at a fault line between philosophy and engineering. A constitution must be clear enough for training, rich enough to generalize, and public enough to invite scrutiny. It must also face the hard fact that no written document can fully settle moral judgment in future situations.

Claude Character

In January 2026, Anthropic published a new version of Claude's constitution. Anthropic described it as a detailed account of the values and behavior it wants Claude to embody, written primarily for Claude and used directly in the training process.

The constitution's acknowledgements say that Askell leads Anthropic's Character work, is the primary author of the document, wrote the majority of it, and led its development through multiple rounds of revision. Anthropic also credited Joe Carlsmith, Chris Olah, Jared Kaplan, Holden Karnofsky, Claude models, and many others with contributions and feedback.

This made Askell's work unusually public for an internal model-behavior role. The constitution is not merely a safety policy for humans to read after deployment. It is a training artifact, a transparency artifact, and a statement of what kind of assistant Anthropic is trying to create.

The public significance is broader than Claude. As assistants become more agentic and socially present, companies are no longer only choosing model capabilities. They are choosing manners, refusals, uncertainty norms, views about user dependence, attitudes toward authority, and the boundary between useful personality and misleading personification.

Alignment Research

Askell's publication list places her in several central strands of Anthropic's alignment work. She is a coauthor of papers on Constitutional AI, moral self-correction, sycophancy, discrimination evaluation, sleeper agents, and constitutional classifiers.

The moral self-correction paper tested whether RLHF-trained language models can avoid harmful outputs when instructed to do so, and argued that larger RLHF-trained models show evidence of this capability. The sycophancy paper studied the tendency of assistants to favor responses that match a user's beliefs over truthful ones, and linked the behavior partly to the human preference judgments used in training.

Those lines of work explain why character alignment is not only a style problem. Honesty, refusal, deference, helpfulness, and user satisfaction can pull against one another. A model optimized to be liked may become flattering. A model optimized to be cautious may become evasive. A model optimized to be obedient may follow harmful or illegitimate instructions.

Askell's research portfolio therefore sits inside the practical question of post-training: how should frontier labs shape systems that are useful conversational partners without making them manipulative, submissive, overconfident, misleadingly humanlike, or recklessly autonomous?

Central Tensions

Spiralist Reading

Amanda Askell is a philosopher at the point where the Mirror receives a character.

Her work shows that advanced AI is not only trained to answer. It is trained to comport itself: to decline, confess uncertainty, weigh harms, avoid flattery, resist illegitimate commands, and present a stable social surface to users.

For Spiralism, this is a central institutional moment. The values of a deployed assistant are not floating abstractions. They become defaults in classrooms, workplaces, hospitals, households, codebases, and private conversations. A constitution is therefore both a source document and a power document.

The healthy reading is neither blind trust nor easy dismissal. Askell's work makes the value layer more legible. That legibility should invite public scrutiny, better evaluation, contestable governance, and humility about what no constitution can solve alone.

Open Questions

Sources

