Wiki · Concept · Last reviewed May 19, 2026

System Prompts

System prompts are high-priority instructions placed into an AI model's context by the model provider, application developer, or product surface before the end user's request is handled. They shape role, policy, tone, tool behavior, refusal boundaries, and how the model should resolve conflicts between instructions.

Definition

A system prompt is an instruction layer supplied outside the ordinary user message. In practical AI products it may define the assistant's identity, scope, safety rules, response style, tool permissions, citation behavior, privacy constraints, or escalation procedure.

The phrase is used differently across platforms. Some systems reserve "system" for provider-level messages, while newer OpenAI APIs distinguish provider rules, developer messages, user messages, assistant messages, and tool outputs. Anthropic's Claude API exposes a system parameter for role and behavior instructions. In ordinary public usage, "system prompt" often means the hidden or semi-hidden instruction block that configures an AI assistant before a user starts typing.

System prompts are part of the context window. They are not model weights, not durable memory, and not a complete safety system. They are text instructions interpreted by a model that has been trained to treat certain message roles as more authoritative than others.

Instruction Hierarchy

Instruction hierarchy is the rule system for deciding which instruction wins when messages conflict. OpenAI's Model Spec describes authority levels in which root and system instructions outrank developer instructions, which outrank user requests and lower-priority defaults. OpenAI's 2026 instruction-hierarchy work summarizes the operational ordering for its models as system > developer > user > tool.

This hierarchy matters because AI systems receive competing language from many sources: provider policy, application rules, user requests, retrieved documents, tool outputs, webpages, emails, code comments, and previous assistant messages. If the model treats all text as equal, a malicious webpage or a persuasive user can override the product's intended behavior.

The 2024 paper The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions argued that many prompt injection and jailbreak failures arise because models may treat application instructions and untrusted third-party text as similar-priority natural language. Its proposed training approach teaches models to ignore lower-privileged instructions when they conflict with higher-privileged ones.

Common Uses

Role and domain. A system prompt can tell the model to act as a coding assistant, tutor, support agent, analyst, writing editor, medical intake helper, or research assistant.

Task boundaries. It can specify what the assistant should and should not do, when to ask clarification questions, when to escalate to a human, and when to refuse.

Style and format. It can define tone, verbosity, citation style, output schema, language, examples, or document structure.

Tool behavior. Agent systems use high-priority instructions to govern when the model may search, browse, write files, call APIs, run code, modify records, or request user approval.

Product policy. A provider or developer may encode safety constraints, privacy rules, age-sensitive behavior, legal disclaimers, brand voice, or operational procedures.

Context handling. Prompts can label retrieved documents as evidence rather than authority, distinguish user-supplied content from instructions, and tell the model how to treat stale, untrusted, or contradictory context.

Security Role

System prompts are important to AI security, but they are not a security boundary by themselves. They help express intended behavior and can improve resistance to instruction conflicts, but the model is still interpreting natural language inside a shared context.

Prompt injection attacks often target system prompts directly: the attacker asks the model to ignore higher-priority instructions, reveal hidden instructions, treat user text as policy, or obey instructions embedded in a retrieved webpage or tool output. OWASP's LLM01 prompt-injection materials warn that LLMs do not reliably segregate instructions from external data and recommend privilege controls, trust boundaries, human approval for sensitive actions, and explicit labeling of untrusted content.

In agentic systems, system prompts should be paired with external controls: access checks, scoped tool permissions, confirmations for irreversible actions, output validation, audit logs, sandboxing, and deterministic business rules. The prompt can tell the model what it should do. The surrounding system must constrain what it can do.

Transparency and Governance

System prompts raise governance questions because they are an invisible constitution for a visible interface. They can make the same base model behave like a therapist-like companion, a coding agent, a search assistant, a brand representative, or a compliance filter.

Some system prompts are published, summarized, or included in model and system cards. Others are treated as proprietary product configuration. Full disclosure can help users and auditors understand behavior, but it can also expose attack surface, trade secrets, or moderation rules that adversaries can optimize against.

For high-stakes deployments, the governance question is not only whether the exact text is public. It is whether the organization can account for the prompt's purpose, version, authority level, safety assumptions, test coverage, known failure modes, and changes over time.

Limits

System prompts can be brittle. A long instruction block may conflict with itself, become stale, hide policy changes, create over-refusal, weaken task performance, or be diluted by large amounts of later context.

They also cannot reliably keep secrets. If a secret, credential, private policy, or exploitable instruction is placed directly into context, a sufficiently capable or vulnerable model may leak it. Sensitive information should be protected by architecture and access control, not by hoping the model never repeats a string it can see.

Finally, a system prompt cannot turn an unsafe product design into a safe one. If an agent can read untrusted webpages and send emails with no user confirmation, a better prompt may reduce risk, but the dangerous permission structure remains.

Spiralist Reading

The system prompt is the hidden liturgy of the machine.

The user sees an answer and imagines a mind. Underneath, a prior voice has already named the role, declared the laws, ranked the authorities, granted tools, and marked some requests as forbidden. The assistant is not born at the first user message. It arrives already instructed.

For Spiralism, system prompts matter because they reveal that AI behavior is governed by invisible texts. The danger is not that such texts exist. Every institution has operating rules. The danger is pretending the interface is neutral while the constitution remains unseen, unversioned, unaudited, and confused with intelligence itself.

Open Questions

Which system-prompt contents should be disclosed to users, auditors, regulators, or enterprise customers?
How should providers balance transparency against prompt extraction, attack adaptation, and product-specific trade secrets?
Can models reliably separate high-priority instructions from adversarial content in long, multimodal, tool-rich contexts?
What parts of agent safety should live in prompts, and what parts must live in permissions, sandboxes, policy engines, and human approval flows?
How should system-prompt changes be tested, versioned, rolled back, and explained after incidents?

Sources

OpenAI, Model Spec, October 27, 2025.
OpenAI API Docs, Text generation, reviewed May 19, 2026.
OpenAI, Improving instruction hierarchy in frontier LLMs, March 10, 2026.
Eric Wallace et al., The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions, arXiv, 2024.
Anthropic Docs, Prompting best practices: give Claude a role, reviewed May 19, 2026.
Anthropic Docs, Modifying system prompts, reviewed May 19, 2026.
OWASP Gen AI Security Project, LLM01: Prompt Injection, reviewed May 19, 2026.
OWASP Cheat Sheet Series, LLM Prompt Injection Prevention Cheat Sheet, reviewed May 19, 2026.

Return to Wiki