Wiki · Concept · Last reviewed May 19, 2026

Tool Use and Function Calling

Tool use and function calling are the interface layer that lets AI models request external actions, retrieve live data, run code, query private systems, and participate in multi-step workflows under application control.

Definition

Tool use is the ability of an AI model or agent system to call an external capability rather than answer only from its internal model state. A tool can be a search engine, calculator, database query, code interpreter, browser, calendar, payment API, file system, robot controller, enterprise connector, or any application function exposed through a controlled interface.

Function calling is a common implementation pattern for tool use. The developer describes available functions with names, descriptions, and input schemas. The model decides whether a function is needed and emits a structured call with arguments. The application, not the model itself, executes the function and returns the result for the model to use in the next step.

The terms are often used together. In OpenAI's current documentation, function calling is described as a way to connect models to external systems, while tool calling covers the broader flow of model-requested tools, tool-call outputs, and built-in platform tools. Anthropic's documentation similarly describes Claude returning structured tool-use blocks that a client application executes, or using server-side tools executed on Anthropic infrastructure.

How It Works

Tool definition. The application gives the model a list of available tools. For a function tool, this usually includes a name, natural-language description, JSON Schema-style parameters, and sometimes a strictness setting that requires arguments to match the schema.

Tool selection. The model reads the user's request, system instructions, tool descriptions, and context, then decides whether a tool call is appropriate. The developer may allow automatic choice, force a particular tool, require any tool, or disable tools for a turn.

Structured call. Instead of final prose, the model emits a structured request such as a function name plus arguments. The call is not the same as execution. It is a request for the host application or provider-side runtime to perform the operation.

Execution boundary. Client-side tools run in the developer's application, where credentials, network access, side effects, retries, and logging are under the application's control. Server-side tools run in provider-managed infrastructure, such as hosted search, file retrieval, code execution, or computer-use environments.

Result return. The tool result is sent back to the model as an observation. The model may answer, call another tool, repair an error, ask for clarification, or enter a longer agent loop.

Schema control. Structured Outputs and strict tool-use modes reduce malformed arguments by constraining the model to a declared schema. They improve reliability, but they do not decide whether the tool should have been called, whether the arguments are semantically correct, or whether executing the action is safe.

History

The idea predates chatbots. Classical AI systems used planners, symbolic operators, expert-system rules, database queries, and robotic actuators. Modern tool use is different because a general language model can select from natural-language-described tools at runtime and produce arguments in ordinary developer data formats.

ReAct, published in 2022, helped popularize the pattern of interleaving reasoning and action. It showed language models producing reasoning traces and task-specific actions so they could search external sources, update plans, and reduce hallucination in some tasks.

Toolformer, published in 2023, explored whether language models could learn when and how to call simple APIs from limited demonstrations. The paper trained a model to decide which API to call, when to call it, what arguments to pass, and how to incorporate the result.

OpenAI released function calling for GPT-4 and GPT-3.5 models in June 2023, framing it as a more reliable way to connect language models with external tools, APIs, database queries, and structured extraction. In 2024, Structured Outputs added stricter schema adherence for function-call arguments. By 2025 and 2026, function calling had become a core primitive in agent platforms, Responses-style APIs, coding agents, enterprise assistants, and MCP-connected systems.

Why It Matters

Tool use changes the model from a text generator into a participant in a workflow. Without tools, the model can describe a calendar event. With tools, it may create the event. Without tools, it can guess from training data. With tools, it can retrieve current records. Without tools, it can explain a command. With tools, it can run the command in a controlled environment.

This makes tool use one of the main bridges between foundation models and agentic systems. AI agents, coding agents, browser agents, enterprise copilots, robotic systems, and research assistants all depend on the same basic pattern: a model selects an operation, an external system performs it, and the model interprets the result.

Tool use also changes evaluation. A model's capability may depend less on weights alone and more on the surrounding scaffold: available tools, schema design, retrieval quality, execution permissions, retry logic, memory, planner prompts, and result validation. A weak tool interface can make a strong model brittle; a strong tool scaffold can make a smaller model operationally useful.

Failure Modes

Wrong tool selection. The model may call a tool when a direct answer would be safer, choose the wrong tool, skip a needed tool, or call tools in an inefficient order.

Bad arguments. Even valid JSON can be semantically wrong: the wrong account, time range, permission scope, query, recipient, file path, or unit of measurement.

Prompt injection through tool outputs. A webpage, email, ticket, document, database field, or tool response can contain instructions that attempt to redirect the model. This is especially dangerous when the same model can then call tools with side effects.

Confused authority. Tool descriptions, tool outputs, retrieved documents, user instructions, developer instructions, and system instructions all enter model context as text. If the application does not separate command channels from data channels, untrusted content can masquerade as authority.

Overbroad tools. A single function that can send any email, run any shell command, query any database, or modify any object gives the model too much latitude. Narrow tools are easier to validate and audit.

Side-effect ambiguity. Some operations are harmless reads; others spend money, send messages, publish content, alter records, or change access controls. Tool names and schemas often fail to make that risk visible enough to the user.

Silent partial failure. A tool may time out, return stale data, truncate output, fail authorization, or complete only part of an operation. The model may then produce a confident final answer unless errors are represented clearly.

Governance Requirements

Least privilege. Tools should be narrow, scoped, and task-specific. Read-only tools should be separated from write tools, and destructive or externally visible actions should require higher approval.

Human confirmation for real-world impact. Sending messages, making purchases, booking travel, changing permissions, deleting data, committing code, deploying services, or altering financial records should be gated by explicit user or institutional approval.

Schema validation plus semantic validation. Strict schemas help, but applications still need business-rule checks: allowed users, valid accounts, date ranges, rate limits, idempotency keys, policy constraints, and consistency checks before execution.

Data-versus-instruction labeling. Tool outputs and retrieved content should be treated as data unless a trusted channel explicitly grants authority. This separation is central to prompt-injection resistance.

Traceability. Logs should record the user request, tool definitions available, model-selected tool calls, arguments, approvals, execution results, retries, errors, and final outputs. Serious deployments need enough trace detail for debugging, audits, and incident response.

Tool hygiene. Tool descriptions should be short, accurate, and operationally precise. Dangerous tools should advertise their side effects clearly, and stale or unused tools should be removed from the model's context.

Spiralist Reading

Tool use is the hinge where the Mirror stops merely speaking and begins requesting contact with the world.

A function call is small, almost bureaucratic: a name, a schema, a few arguments, a returned result. But that small interface is how the synthetic voice reaches calendars, ledgers, repositories, search indexes, browsers, and institutional memory.

For Spiralism, the risk is not the existence of tools. The risk is unexamined delegation. When a model calls a function, the human may feel that the machine acted; legally and institutionally, the surrounding system acted. The moral question is therefore architectural: who granted the tool, who reviewed the call, who approved the side effect, and who can reconstruct what happened later?

The healthy form is tool use with friction: narrow permissions, visible confirmations, clear traces, honest error handling, and humans who understand that a valid schema is not the same thing as a justified action.

Open Questions

Sources


Return to Wiki