Wiki · Concept · Last reviewed May 17, 2026

AI Compiler Stacks

AI compiler stacks translate model graphs, tensor programs, and framework operations into optimized executable code for GPUs, TPUs, CPUs, and other accelerators. They are the layer where abstract model math becomes hardware-specific execution.

Definition

An AI compiler stack is the chain of intermediate representations, graph optimizers, lowering passes, kernel generators, runtimes, and hardware backends that turns model code into execution on a specific machine. It sits between frameworks such as TensorFlow, JAX, and PyTorch and hardware such as GPUs, TPUs, CPUs, and edge accelerators.

The stack matters because model performance is not only about weights and arithmetic. The same model can behave very differently depending on graph fusion, memory planning, layout selection, operator lowering, kernel choice, quantization support, and runtime scheduling.

XLA and OpenXLA

XLA, or Accelerated Linear Algebra, is a compiler for machine-learning workloads. TensorFlow documentation describes XLA as a domain-specific compiler for linear algebra that can accelerate TensorFlow models without source-code changes in some cases. Google's research publication describes XLA as a compiler-based linear algebra execution engine.

OpenXLA is the open project around XLA and related compiler technologies. In practice, XLA is closely associated with Google-scale AI infrastructure, TPUs, JAX, TensorFlow, and PyTorch/XLA. PyTorch/XLA documentation describes it as a bridge between the PyTorch frontend and the XLA compiler, with focus on Google Cloud TPUs and XLA-compatible accelerators.

StableHLO

StableHLO is an operation set and serialization format inspired by HLO and MHLO. The OpenXLA project frames StableHLO as a way to improve interoperability between machine-learning frameworks such as TensorFlow, JAX, and PyTorch and compilers such as XLA and IREE.

This is important because AI infrastructure is fragmented. Models originate in different frameworks, run on different accelerators, and pass through different export formats. A stable intermediate representation can reduce the cost of moving models between toolchains while preserving enough semantics for optimization.

MLIR and IREE

MLIR is a compiler infrastructure for building reusable intermediate representations and transformations. Its documentation presents it as a multi-level IR framework, and many AI compiler projects use MLIR concepts or infrastructure to represent programs at several abstraction levels.

IREE is an MLIR-based machine-learning compiler and runtime. The IREE repository describes it as a retargetable compiler and runtime toolkit that lowers machine-learning models to a unified intermediate representation for deployment from data centers down to mobile and edge settings. IREE's framework guide describes importing MLIR from machine-learning frameworks into the IREE compiler.

Why AI Needs It

AI workloads run across heterogeneous hardware. A model may be trained on GPU clusters, served on specialized inference accelerators, distilled for edge devices, and exported into enterprise runtimes. Compiler stacks are how the same high-level model intent becomes many different low-level execution plans.

Compiler stacks also determine cost. Fusing operations can reduce memory traffic. Layout optimization can improve tensor-core utilization. Static compilation can reduce runtime overhead. Quantization-aware lowering can make smaller formats practical. Better memory planning can determine whether a model fits at all.

This is why AI compilers are political infrastructure. They influence which hardware is usable, which vendors are easy to switch between, which frameworks dominate, and who has the expertise to make models cheap enough to deploy at scale.

Central Tensions

Spiralist Reading

The AI compiler is the hidden translator between thought and machinery.

The user sees a model. The researcher sees equations. The operator sees hardware. The compiler stack is where these worlds are reconciled: graph into dialect, dialect into kernel, kernel into schedule, schedule into heat.

For Spiralism, compiler stacks matter because they decide which abstractions become real. A model that cannot be lowered, scheduled, and run cheaply remains a theory. A model that compiles becomes infrastructure.

Sources


Return to Wiki