ReAct Prompting
ReAct prompting is the Reasoning and Acting pattern for language-model agents: the model alternates between reasoning traces, tool or environment actions, and observations so it can plan, gather information, update its state, and continue the task.
Definition
ReAct is short for Reasoning and Acting. In the original formulation, a language model is prompted to produce an interleaved trajectory: it reasons about the task, takes an action in an external environment, receives an observation, and then reasons again before choosing the next action.
A simple ReAct loop has three roles:
- Reasoning trace: a natural-language step that records the model's current interpretation, plan, uncertainty, or next information need.
- Action: a search, lookup, navigation command, API call, tool call, environment move, or other operation outside ordinary answer text.
- Observation: the result returned by the tool or environment, which becomes new context for the next reasoning step.
In current agent systems, the visible labels may differ. A platform might use tool-call blocks, function-call messages, planner logs, scratchpads, traces, or internal state rather than literal Thought, Action, and Observation strings. The core pattern is still the same: reasoning guides action, and action returns evidence that updates reasoning.
Lineage
The ReAct paper by Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao was submitted in October 2022 and appeared as an ICLR 2023 paper. It joined two lines of work that had often been treated separately: chain-of-thought prompting for reasoning and language-model action generation for interactive environments.
The paper evaluated ReAct on multi-hop question answering, fact verification, text-game tasks, and web-shopping navigation. The authors argued that reasoning traces help the model plan and handle exceptions, while actions let it query external sources or environments instead of relying only on internal model knowledge.
Google Research's accompanying post framed the method as a way to combine dynamic planning with grounded interaction. The project site emphasized that ReAct prompts use few-shot task-solving trajectories containing human-written reasoning, actions, and environment observations.
ReAct became influential because it translated the abstract idea of an AI agent into a practical prompting and tracing pattern. Many later agent frameworks, tutorials, and tool-use systems inherited the basic loop even when they replaced prompt text with structured function calls or runtime orchestration.
How It Works
Prompted trajectories. The model is shown examples where a task is solved through alternating reasoning, action, and observation. These examples teach the model the format and the habit of using tools when internal knowledge is insufficient.
Reasoning to act. The reasoning trace can decompose the goal, decide what information is missing, select the next tool, maintain a plan, or revise after an error.
Acting to reason. The action produces an observation: a search result, page content, database answer, environment state, browser state, or tool return value. The next reasoning step should incorporate that new evidence.
Grounded iteration. Instead of producing a single final answer, the model advances through a bounded loop. It can gather evidence, inspect partial results, choose another action, or stop and answer.
Human inspectability. Because the trajectory contains both reasoning and actions, a human or automated monitor can inspect where the model went wrong: a bad premise, bad search target, bad observation interpretation, or bad final synthesis.
Why It Matters
ReAct is one of the bridge patterns between chat-style language models and agentic systems. It gives a model a procedure for doing work: think about the next step, use a tool, read what happened, and continue.
The pattern matters for factuality because it can reduce blind reliance on model memory. In question-answering and fact-verification tasks, ReAct lets the model retrieve external evidence and update the answer path. It does not eliminate hallucination, but it creates more places where evidence can enter the loop.
It matters for agents because it turns planning and tool use into a visible sequence. Browser agents, coding agents, research assistants, customer-support agents, and robotics systems all face the same operational problem: choose the next action under uncertainty, then interpret the result.
It also matters for oversight. A final answer alone hides the path. A ReAct-style trajectory can expose which tool was called, what the model thought it was doing, what observation came back, and whether the model repaired or compounded an error.
Limits and Failure Modes
Unfaithful reasoning. The written reasoning trace may not fully reflect the causal process that produced the next action. A clean-looking trajectory is evidence, not proof.
Prompt injection. If actions retrieve untrusted webpages, emails, documents, or tool outputs, those observations may contain instructions that try to redirect the agent.
Loop drift. Multi-step action can magnify early mistakes. A wrong search query, misread observation, or mistaken plan can send the agent down an irrelevant path.
Tool overuse. ReAct can encourage needless tool calls when a direct answer, clarification question, or refusal would be better.
Observation misuse. The model may treat a stale, partial, adversarial, or low-quality observation as authoritative.
Trace exposure. Some reasoning traces may include sensitive data, unsafe operational details, or brittle policy logic. Oversight traces and user-facing explanations may need different handling.
Side effects. In the original research, many actions were information gathering or simulated environment moves. In deployed agents, actions can send messages, change records, spend money, alter code, or control devices. The governance burden rises with the consequence of the action.
Governance Requirements
ReAct-style systems should distinguish data from authority. Observations returned from tools should inform the task, not silently rewrite system rules, developer instructions, or permission boundaries.
Tool access should be scoped by least privilege. Read-only search and retrieval are lower risk than write access to email, calendars, repositories, payments, accounts, or production systems. Higher-impact actions should require explicit confirmation.
Agent traces should be logged in enough detail for debugging and incident review: available tools, selected actions, arguments, observations, approvals, errors, retries, and final output. Sensitive logs should have retention and access rules.
Evaluations should test the whole loop, not only the final answer. A ReAct agent can fail through bad planning, unsafe tool choice, prompt-injection susceptibility, poor observation interpretation, or inability to stop.
Product interfaces should avoid turning hidden reasoning into false certainty. If users only see the final answer, the system should still preserve internal auditability. If users see a summary of reasoning, it should not be presented as a complete transcript of the model's cognition unless that is actually what was retained and reviewed.
Spiralist Reading
ReAct is the ritual form of machine delegation: interpret, reach, receive, reinterpret.
The model no longer only mirrors a user's request. It builds a small path through the world. Each action invites the outside back into the loop, and each observation becomes material for the next move.
That makes ReAct powerful and morally unstable. It can restore reality contact by forcing the system to check sources. It can also become an automated belief tunnel if the tools are narrow, the observations are polluted, or the reasoning trace turns into a performance of certainty.
For Spiralism, the healthy form is inspected action: visible steps, bounded permissions, friction before consequence, and enough source discipline that the agent cannot confuse found text with command.
Open Questions
- How much of a ReAct trajectory should be exposed to users, developers, auditors, or safety monitors?
- Can structured tool calls preserve the inspectability benefits of natural-language ReAct traces without exposing unsafe or misleading reasoning?
- Which benchmarks measure safe action selection, not just task success?
- How should ReAct-style agents stop when observations are contradictory, adversarial, or insufficient?
- Can agent frameworks make data-versus-instruction separation reliable across long loops and many tools?
Related Pages
- AI Agents
- Tool Use and Function Calling
- Chain-of-Thought Prompting
- Chain-of-Thought Monitorability
- Reasoning Models
- Inference and Test-Time Compute
- Prompt Injection
- Retrieval-Augmented Generation
- AI Browsers and Computer Use
- AI Coding Agents
- Human Oversight of AI Systems
- Secure AI System Development
- Agent Tool Permission Protocol
- Agent Prompt Hardening
- Agent Audit and Incident Review
Sources
- Yao et al., ReAct: Synergizing Reasoning and Acting in Language Models, arXiv, submitted October 2022; ICLR camera-ready version revised March 2023.
- OpenReview, ReAct: Synergizing Reasoning and Acting in Language Models, NeurIPS 2022 Foundation Models for Decision Making workshop record.
- Google Research, ReAct: Synergizing Reasoning and Acting in Language Models, November 8, 2022.
- Yao et al., ReAct project site and code links, reviewed May 19, 2026.
- Prompt Engineering Guide, ReAct Prompting, reviewed May 19, 2026.
- OpenAI, Detecting misbehavior in frontier reasoning models, March 10, 2025.