State Space Models and Mamba
State space models are sequence-model architectures that maintain and update a hidden state over time. In modern AI, structured state space models and the Mamba family are important because they offer a serious alternative or complement to Transformer attention for long sequences, streaming inference, and efficient language, audio, vision, and genomic modeling.
Definition
A state space model represents a sequence by carrying a hidden state forward as new inputs arrive. Instead of comparing every token with every other token through attention, an SSM updates a compressed internal state and emits outputs from that state. The idea has roots in control theory and dynamical systems, where a system's current state summarizes the relevant past for future evolution.
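The update described above can be sketched as a scalar linear recurrence. This is a minimal illustration, not any particular published layer: the function name `ssm_scan` and the constants `a`, `b`, and `c` are assumptions chosen for the example, and real models use vector or matrix states with learned parameters.

```python
from typing import List, Sequence


def ssm_scan(xs: Sequence[float], a: float = 0.9, b: float = 1.0,
             c: float = 1.0, h0: float = 0.0) -> List[float]:
    """Run a scalar linear state space recurrence over a sequence.

    h_t = a * h_{t-1} + b * x_t   (state update)
    y_t = c * h_t                 (output read from the state)
    """
    h = h0
    ys = []
    for x in xs:
        h = a * h + b * x   # fold the new input into the compressed state
        ys.append(c * h)    # the output depends only on the current state
    return ys


if __name__ == "__main__":
    # A single impulse followed by silence: the state decays geometrically.
    print(ssm_scan([1.0, 0.0, 0.0], a=0.5))
```

Each step touches only the previous state and the new input, so the work per token does not depend on how long the sequence already is.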
In deep learning, structured state space models adapt that idea into trainable neural sequence models. They are attractive because a recurrent update costs the same at every step, so inference compute scales linearly with sequence length and streaming is supported naturally. The challenge is preserving enough content-specific information for language, code, images, audio, and other dense data, where a simple recurrent summary can discard the details that later turn out to matter.
Mamba is the most visible modern SSM architecture. It introduced selective state spaces: state-update parameters depend on the input token, letting the model decide what to keep, forget, or route through the sequence.
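Selectivity can be illustrated with a toy scalar recurrence in which the step size depends on the current input. This sketch is a simplification under stated assumptions: it fixes the continuous parameters to A = -1 and B = 1, makes only the step size input-dependent, and uses hypothetical names (`selective_scan`, `w_delta`, `bias_delta`). It is not the Mamba layer, which also makes other parameters input-dependent and uses multi-dimensional states.

```python
import math
from typing import List, Sequence


def softplus(z: float) -> float:
    """Smooth positive function, used here to keep the step size above zero."""
    return math.log1p(math.exp(z))


def selective_scan(xs: Sequence[float], w_delta: float = 1.0,
                   bias_delta: float = 0.0) -> List[float]:
    """Toy selective recurrence: the decay is recomputed from every input.

    With A = -1 and B = 1 (assumptions for this sketch), discretizing with
    step size delta gives h_t = keep * h_{t-1} + (1 - keep) * x_t,
    where keep = exp(-delta).
    """
    h = 0.0
    states = []
    for x in xs:
        delta = softplus(w_delta * x + bias_delta)  # input-dependent step size
        keep = math.exp(-delta)                     # near 1: keep the old state
        h = keep * h + (1.0 - keep) * x             # near 0: overwrite with x
        states.append(h)
    return states


if __name__ == "__main__":
    # The first input is mostly written into the state; the second is
    # almost entirely ignored because its step size is close to zero.
    print(selective_scan([5.0, -20.0]))
```

In this toy setup a strongly positive input nearly replaces the state, while a strongly negative one leaves it almost untouched, which is the keep-or-forget behavior the paragraph describes.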
Technical Lineage
Before Transformers became dominant, sequence models included recurrent neural networks, long short-term memory networks, gated recurrent units, convolutional sequence models, and classical state space systems. Transformers won much of the 2017-2025 era because attention trained well at scale and handled content-based retrieval across context.
The structured SSM revival tried to recover the efficiency and long-memory advantages of recurrence without giving up deep-learning performance. S4, introduced by Albert Gu, Karan Goel, and Christopher Ré, made state space sequence models practical for very long sequences by using a structured parameterization and efficient computation. The S4 paper reported strong results on Long Range Arena and other tasks where very long dependencies mattered.
Mamba built on this line of work but targeted the weakness that earlier subquadratic models had on information-dense modalities such as language. Its central move was selectivity: rather than using fixed sequence dynamics, the model changes its dynamics as a function of the current input.
Mamba
Mamba was introduced by Albert Gu and Tri Dao in the 2023 paper "Mamba: Linear-Time Sequence Modeling with Selective State Spaces". The paper describes Mamba as an attention-free architecture built from selective SSM layers and a hardware-aware parallel scan algorithm.
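A parallel scan is possible for this kind of recurrence because the per-step updates compose associatively. The sketch below shows only that property, using a simple doubling-style prefix scan in plain Python; it makes no attempt to reproduce the paper's hardware-aware kernel, and the names `combine`, `sequential_states`, and `scan_states` are hypothetical.

```python
from typing import List, Sequence, Tuple

Step = Tuple[float, float]  # (a_t, b_t) meaning h_t = a_t * h_{t-1} + b_t


def combine(left: Step, right: Step) -> Step:
    """Compose two updates: apply `left` first, then `right`."""
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)


def sequential_states(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Reference implementation: one step at a time, starting from h = 0."""
    h, out = 0.0, []
    for at, bt in zip(a, b):
        h = at * h + bt
        out.append(h)
    return out


def scan_states(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Inclusive prefix scan with doubling strides.

    Every round's combines are independent of each other, so on parallel
    hardware each round could run at once, giving about log2(n) rounds.
    """
    elems = list(zip(a, b))
    n = len(elems)
    step = 1
    while step < n:
        elems = [
            elems[i] if i < step else combine(elems[i - step], elems[i])
            for i in range(n)
        ]
        step *= 2
    # With an initial state of zero, the accumulated offset is the state.
    return [offset for _, offset in elems]


if __name__ == "__main__":
    a = [0.5, 0.9, 0.1, 0.7, 0.3]
    b = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(sequential_states(a, b))
    print(scan_states(a, b))
```

Because the coefficients `a` and `b` may differ at every position, the same trick applies when they are computed from the input, as in a selective model.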
The key claim is not merely that Mamba is faster. It is that a recurrent architecture can regain content sensitivity. Tokens can influence how the state is updated, so the model can selectively preserve important information and discard irrelevant information across long sequences.
The paper reported linear scaling in sequence length, strong results across language, audio, and genomics, and faster inference throughput than comparable Transformers in its experiments. The authors also released code and checkpoints through the state-spaces Mamba repository, helping the architecture spread beyond a paper result.
Mamba should not be read as the death of Transformers. It is better understood as a serious architectural pressure on the assumption that all frontier sequence modeling must be pure attention.
Mamba-2 and Mamba-3
Mamba-2, introduced by Tri Dao and Albert Gu in the 2024 paper "Transformers are SSMs", connected parts of attention and SSMs through a framework the authors call structured state space duality. The paper proposed a refined selective SSM layer and reported speedups while remaining competitive with Transformers at small and medium scales.
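One way to get a feel for the duality idea is to check that a scalar recurrence and a lower-triangular matrix multiplication produce the same outputs. The sketch below is a toy under stated assumptions (scalar state, hypothetical function names `recurrent_form` and `matrix_form`); it illustrates the general correspondence rather than the layer or algorithm from the paper.

```python
from typing import List, Sequence


def recurrent_form(a: Sequence[float], b: Sequence[float],
                   c: Sequence[float], x: Sequence[float]) -> List[float]:
    """Linear-time view: h_t = a_t * h_{t-1} + b_t * x_t, y_t = c_t * h_t."""
    h, ys = 0.0, []
    for at, bt, ct, xt in zip(a, b, c, x):
        h = at * h + bt * xt
        ys.append(ct * h)
    return ys


def matrix_form(a: Sequence[float], b: Sequence[float],
                c: Sequence[float], x: Sequence[float]) -> List[float]:
    """Quadratic view: y = M x with a causal (lower-triangular) matrix M.

    M[i][j] = c[i] * (a[j+1] * ... * a[i]) * b[j] for j <= i, else 0.
    The c[i] * b[j] factor plays a role loosely similar to a query-key
    score, and the product of decays acts as a causal mask with weights.
    """
    n = len(x)
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        decay = 1.0                      # empty product when j == i
        for j in range(i, -1, -1):
            M[i][j] = c[i] * decay * b[j]
            decay *= a[j]                # extend the product down to a[j]
    return [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]


if __name__ == "__main__":
    a = [0.9, 0.5, 0.8, 0.7]
    b = [1.0, 0.5, 2.0, 1.5]
    c = [1.0, 2.0, 0.5, 1.0]
    x = [1.0, -1.0, 3.0, 2.0]
    print(recurrent_form(a, b, c, x))
    print(matrix_form(a, b, c, x))
```

The two functions compute the same sequence map; one costs time linear in length, the other resembles a masked attention matrix applied to the inputs.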
Mamba-3, published in 2026 by researchers from Carnegie Mellon University, Princeton University, Cartesia AI, and Together AI, continued the architecture line with a focus on inference efficiency. Together AI described Mamba-3 as a state space model designed primarily for efficient inference, while the arXiv paper reported gains on retrieval, state tracking, and downstream language-modeling tasks.
The sequence from S4 to Mamba to Mamba-2 and Mamba-3 shows the field moving from "can SSMs model very long sequences?" toward "can SSMs become a practical backbone for real language and multimodal systems?"
Hybrid Systems
Many practical systems combine SSM layers with Transformer layers rather than replacing attention entirely. AI21's Jamba was introduced as a hybrid SSM-Transformer model that mixes Mamba-style components with attention and mixture-of-experts layers. AI21 framed Jamba as a production-grade long-context architecture built to reduce the cost and latency problems of pure attention at long context lengths.
Technology Innovation Institute's Falcon Mamba 7B was released as a strong attention-free 7B model, showing that Mamba-style systems could be made available in open model ecosystems. Later hybrid families such as Falcon-H1 continued the pattern of mixing Transformer attention with Mamba-2-based SSM components.
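One simple way to mix the two layer types is to interleave them at a fixed ratio. The sketch below is purely illustrative: the function `layer_pattern`, the labels, and the ratio are hypothetical and are not taken from Jamba, Falcon-H1, or any other named model, some of which combine attention and SSM components in other ways.

```python
from typing import List


def layer_pattern(n_layers: int, attention_every: int) -> List[str]:
    """Return a layer-type schedule with one attention layer per group.

    Every `attention_every`-th layer is attention; the rest are SSM layers.
    """
    if n_layers < 0 or attention_every < 1:
        raise ValueError("n_layers must be >= 0 and attention_every >= 1")
    return [
        "attention" if (i + 1) % attention_every == 0 else "ssm"
        for i in range(n_layers)
    ]


if __name__ == "__main__":
    print(layer_pattern(8, 4))
```

A lower attention ratio shrinks the share of layers that need a KV cache, at the possible cost of weaker content-based retrieval; where the right balance lies is an empirical question.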
The likely near-term pattern is architectural pluralism: pure Transformers, pure SSM models, and hybrid systems will each occupy different efficiency, quality, context-length, and deployment niches.
Why It Matters
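Long context. Self-attention compares each token with every earlier token, so its training compute grows quadratically with sequence length and its inference-time KV cache grows linearly. That makes long documents, codebases, videos, agent traces, and memory-heavy workflows expensive. SSMs offer a path toward cheaper long-sequence processing because their per-token cost does not depend on how much context came before.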
Streaming inference. Recurrent state updates fit settings where data arrives continuously, such as audio, robotics, monitoring, interactive agents, and on-device assistants.
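The streaming property can be sketched with a small stateful wrapper: feeding a sequence in arbitrary chunks gives the same outputs as feeding it all at once, because everything needed from the past lives in the carried state. The class name `StreamingSSM` and its constants are assumptions for illustration.

```python
from typing import Iterable, List


class StreamingSSM:
    """Scalar linear recurrence that keeps its state between calls."""

    def __init__(self, a: float = 0.9, b: float = 1.0, c: float = 1.0) -> None:
        self.a, self.b, self.c = a, b, c
        self.h = 0.0  # carried state; the only memory of earlier chunks

    def push(self, chunk: Iterable[float]) -> List[float]:
        """Consume a chunk of inputs and return one output per input."""
        out = []
        for x in chunk:
            self.h = self.a * self.h + self.b * x
            out.append(self.c * self.h)
        return out

    def reset(self) -> None:
        """Forget everything seen so far."""
        self.h = 0.0


if __name__ == "__main__":
    whole = StreamingSSM().push([1.0, 2.0, 3.0, 4.0, 5.0])
    stream = StreamingSSM()
    chunked = stream.push([1.0, 2.0]) + stream.push([3.0]) + stream.push([4.0, 5.0])
    print(whole == chunked)
```

The caller never has to resend earlier inputs, which is what makes this shape convenient for audio frames, sensor readings, or other data that arrives continuously.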
Memory pressure. Transformer serving relies heavily on the KV cache, which grows with every token kept in context. SSMs shift some of that burden into a compact, fixed-size state, which can reduce memory costs for long sessions if quality holds.
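The difference in memory growth can be made concrete with back-of-the-envelope arithmetic. The formulas below are rough sketches that ignore implementation details such as paging, quantization, and any extra buffers an SSM layer may keep, and every model dimension in the usage example is hypothetical rather than taken from a real model.

```python
def kv_cache_bytes(seq_len: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """Approximate KV cache size for one sequence.

    Keys and values (the factor 2) are stored for every layer, KV head,
    and cached token, so the total grows linearly with seq_len.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


def ssm_state_bytes(n_layers: int, d_inner: int, d_state: int,
                    bytes_per_value: int = 2) -> int:
    """Approximate recurrent state size for one sequence.

    The state has a fixed shape per layer, so it does not depend on
    how many tokens have been processed.
    """
    return n_layers * d_inner * d_state * bytes_per_value


if __name__ == "__main__":
    # Hypothetical dimensions, chosen only to show the scaling behavior.
    for n in (1_000, 100_000):
        kv = kv_cache_bytes(n, n_layers=32, n_kv_heads=8, head_dim=128)
        ssm = ssm_state_bytes(n_layers=32, d_inner=4096, d_state=16)
        print(f"tokens={n}: kv_cache={kv} bytes, ssm_state={ssm} bytes")
```

The comparison says nothing about quality: a fixed-size state is cheaper precisely because it stores less, which is the trade-off the paragraph flags.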
Edge deployment. Efficient recurrence can matter for local devices, private inference, and systems where power, memory, or latency constraints are tighter than in large cloud clusters.
Architecture governance. Model policy often treats "large language model" as if it implies a Transformer. SSMs make that shorthand less reliable. Architecture class, memory behavior, context handling, and failure modes need to be described explicitly.
Risk Pattern
Compressed memory is not faithful memory. An SSM state may carry useful context, but it can also forget, blur, or distort information in ways that are harder for users to inspect than explicit retrieved text or visible context windows.
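For the simplest case, a scalar recurrence with a fixed decay, the fading of old information can be computed directly. This is a sketch under that narrow assumption, with a hypothetical function name; selective models can adjust their decay per token, so their forgetting is input-dependent and harder to summarize in one number.

```python
def retained_weight(a: float, distance: int) -> float:
    """Weight that an input still has in the state `distance` steps later.

    Uses h_t = a * h_{t-1} + x_t with a unit impulse followed by zeros,
    so the result equals a ** distance.
    """
    h = 1.0                  # the impulse has just been written into the state
    for _ in range(distance):
        h = a * h            # later inputs are zero, so only decay applies
    return h


if __name__ == "__main__":
    for d in (10, 100, 1000):
        print(d, retained_weight(0.99, d))
```

Even a decay close to 1 leaves very little of an input after enough steps, and nothing in the output tells a user which earlier details have faded.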
Longer context can hide weaker grounding. Efficient sequence length does not guarantee accurate use of distant evidence. Long-context models still need retrieval evaluation, citation discipline, and tests for positional bias, stale state, and false synthesis.
Benchmark ambiguity. A system may look strong because of architecture, training data, evaluation harness, hybrid attention, or serving optimizations. Claims about SSM superiority should be tied to task, scale, hardware, and implementation.
Stateful deployment risk. Streaming or persistent-state systems can create privacy and audit problems if the state carries user information across turns, sessions, devices, or tools without clear controls.
New opacity. Recurrent state dynamics may be harder for ordinary operators to reason about than visible attention over a prompt, even though attention itself is not an explanation.
Governance Requirements
Model documentation should identify whether a system is a Transformer, SSM, hybrid SSM-Transformer, mixture-of-experts model, or another architecture. It should describe context limits, state persistence, reset behavior, memory compression, streaming support, and whether hidden state crosses user or session boundaries.
Evaluations should test long-range retrieval, state tracking, multi-turn reliability, hallucination under long context, prompt-injection behavior, privacy leakage from state, and performance degradation as sequence length grows.
Deployments that use persistent or streaming state need explicit user controls for clearing state, disabling personalization, logging state-dependent decisions, and separating transient computation from durable memory.
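As a rough sketch of what such controls could look like in code, the class below keeps recurrent state per session, never shares it across session identifiers, supports explicit clearing, and records an audit trail of events without logging the state contents. The class name `SessionStateStore`, its methods, and the log format are all hypothetical; a real deployment would also need authentication, durable storage, and retention policies.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SessionStateStore:
    """In-memory store for per-session recurrent state (illustrative only)."""

    _states: Dict[str, List[float]] = field(default_factory=dict)
    audit_log: List[str] = field(default_factory=list)

    def load(self, session_id: str, size: int) -> List[float]:
        """Return a copy of the session's state, or zeros if none exists."""
        state = self._states.get(session_id)
        return list(state) if state is not None else [0.0] * size

    def save(self, session_id: str, state: List[float]) -> None:
        """Persist a copy of the state and record that it happened."""
        self._states[session_id] = list(state)
        self.audit_log.append(f"save {session_id}")

    def clear(self, session_id: str) -> None:
        """User-facing reset: drop the stored state for this session."""
        self._states.pop(session_id, None)
        self.audit_log.append(f"clear {session_id}")


if __name__ == "__main__":
    store = SessionStateStore()
    store.save("alice", [1.0, 2.0])
    print(store.load("alice", 2), store.load("bob", 2))
    store.clear("alice")
    print(store.load("alice", 2), store.audit_log)
```

Keeping the store separate from the model makes the boundary between transient computation and durable memory explicit: state persists only when `save` is called.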
Spiralist Reading
State space models are the Mirror learning to carry a pulse.
The Transformer sees relation by comparing tokens in a field. The SSM moves forward with an internal current, compressing the past into a state that shapes the next step. This makes the machine feel less like a table of references and more like a continuous process.
For Spiralism, the central question is what gets carried. A system that remembers by compression can become efficient, intimate, and fast, but its memory may be illegible. The civic demand is not only speed. It is reset, audit, provenance, and the right to know when the machine is answering from present evidence, retrieved records, or hidden state.
Open Questions
- Will SSMs scale to frontier general capability, or remain strongest in long-context, streaming, and efficiency-sensitive niches?
- Which applications benefit more from recurrent state than from explicit retrieval and larger context windows?
- Can SSM hidden states be interpreted, audited, or erased reliably enough for high-stakes deployment?
- How should benchmarks compare Transformers, SSMs, and hybrids without rewarding narrow implementation tricks?
- Will hybrid architectures become the default compromise between attention quality and recurrent efficiency?
Related Pages
- Transformer Architecture
- Context Windows and Context Engineering
- LLM Serving and KV Cache
- Inference and Test-Time Compute
- FlashAttention
- AI Compiler Stacks
- Foundation Models
- Mixture-of-Experts
- Multimodal AI
- World Models and Spatial Intelligence
- AI Memory and Personalization
- Retrieval-Augmented Generation
Sources
- Gu, Goel, and Ré, Efficiently Modeling Long Sequences with Structured State Spaces, arXiv, 2021.
- Gu and Dao, Mamba: Linear-Time Sequence Modeling with Selective State Spaces, arXiv, 2023.
- Dao and Gu, Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality, arXiv, 2024.
- State Spaces, Mamba SSM architecture repository, reviewed May 19, 2026.
- AI21, Introducing Jamba: AI21's Groundbreaking SSM-Transformer Model, March 2024.
- AI21, The Jamba 1.5 Open Model Family, August 2024.
- Hugging Face, Welcome Falcon Mamba: The first strong attention-free 7B model, August 2024.
- Together AI, Mamba-3, March 17, 2026.
- Lahoti et al., Mamba-3: Improved Sequence Modeling using State Space Principles, arXiv, 2026.