Federated Learning
Federated learning is a distributed machine-learning method in which a shared model is trained across many devices, organizations, or data holders while raw training data remains local. It replaces centralized data collection with rounds of local training and aggregated model updates.
Definition
Federated learning trains a model over decentralized data. Instead of uploading every local example into a central training set, a server sends a model or training task to participating clients. Each client computes an update using its own data, and the system aggregates updates into a revised shared model.
The 2016 Google paper that introduced the modern framing described federated learning as a way to learn a shared model while training data remains distributed on mobile devices. The same pattern can be applied beyond phones: hospitals, banks, laboratories, vehicles, factories, and edge devices can collaborate without pooling all raw records in one database.
Basic Training Loop
A typical federated-learning round begins when a coordinator selects eligible clients. Those clients download the current model, train locally for a short period, and send back model updates rather than raw examples. The coordinator aggregates the updates, often with averaging or a more specialized optimizer, then publishes a new global model for later rounds.
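The round described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular system: the tiny linear model, the function names (`local_train`, `federated_average`, `run_round`), the learning rate, and the client-sampling fraction are all assumptions chosen for readability, and clients return full weights rather than deltas.

```python
import random

def local_train(weights, data, lr=0.1, epochs=5):
    """Hypothetical client step: a few epochs of SGD on the model y = w*x + b."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y  # prediction error on one local example
            w -= lr * err * x
            b -= lr * err
    return [w, b]

def federated_average(client_weights, client_sizes):
    """Average client models, weighting each by its number of local examples."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

def run_round(global_weights, clients, fraction=0.5, rng=random):
    """One round: select clients, train locally, aggregate into a new global model."""
    k = max(1, int(fraction * len(clients)))
    selected = rng.sample(clients, k)  # coordinator picks a subset of clients
    updates = [local_train(list(global_weights), data) for data in selected]
    sizes = [len(data) for data in selected]
    return federated_average(updates, sizes)  # raw examples never leave `selected`

if __name__ == "__main__":
    rng = random.Random(0)
    # Four hypothetical clients, each holding private samples of y = 2x + 1.
    clients = [
        [(x, 2 * x + 1) for x in xs]
        for xs in ([-1.0, 0.0, 1.0], [-0.5, 0.5, 1.5],
                   [0.0, 0.5, 1.0, 2.0], [-1.5, -0.5, 0.5])
    ]
    weights = [0.0, 0.0]
    for _ in range(100):
        weights = run_round(weights, clients, rng=rng)
    print(weights)
```

In this toy setting every client's data follows the same line, so the global weights move toward w = 2, b = 1. Real deployments differ mainly in what the sketch leaves out: dropped clients, skewed local data, and protection of the updates themselves.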
This loop changes the bottleneck. Centralized training is limited by data collection, storage, consent, and governance. Federated training is limited by unreliable clients, uneven data distributions, communication cost, update privacy, device power, network availability, and adversarial participation.
Origin and Deployment
The foundational federated-learning paper by McMahan, Moore, Ramage, Hampson, and Agüera y Arcas proposed iterative model averaging, which the authors call FederatedAveraging, for deep networks over decentralized data. In their experiments it reduced the number of required communication rounds by 10 to 100 times compared with synchronous stochastic gradient descent.
Google later described federated learning as a way for mobile phones to collaboratively improve a shared model while keeping training data on device. A 2018 Google Research paper reported a commercial-scale use case for improving Google Keyboard query suggestions without direct access to the underlying user data. TensorFlow Federated became an open-source framework for experimentation with machine learning and other computations over decentralized data.
Privacy and Secure Aggregation
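Federated learning is often described as privacy-preserving, but the privacy claim depends on the full system. Keeping raw examples local is useful, yet model updates can still leak information about local data. Gradient inversion and related reconstruction attacks try to recover training examples from updates, and membership inference tries to determine whether a particular record was used in training. Poisoning is a separate threat to model integrity rather than to confidentiality.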
Secure aggregation is one response. The practical secure aggregation protocol published by Bonawitz and collaborators allows a server to compute an aggregate sum of many client updates without learning each client's individual contribution. Differential privacy can also be layered on top of federated learning by clipping and noising updates so that the final model reveals less about any one participant.
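The core idea behind secure aggregation, that the server learns the sum but not the individual terms, can be illustrated with pairwise masks that cancel. The sketch below is a simplification under stated assumptions, not the Bonawitz et al. protocol: it omits key agreement, dropout recovery, and authentication; it uses Python's non-cryptographic `random` module as a stand-in for a secure pseudorandom generator; and it assumes updates have already been quantized to integers. All names and the modulus are hypothetical.

```python
import random

MODULUS = 2 ** 32  # assumed ring size; real systems choose this to fit quantized updates

def setup_shared_seeds(client_ids, rng):
    """Give every pair of clients one shared seed (stands in for real key agreement)."""
    seeds = {cid: {} for cid in client_ids}
    for a in client_ids:
        for b in client_ids:
            if a < b:
                seed = rng.getrandbits(64)
                seeds[a][b] = seed
                seeds[b][a] = seed
    return seeds

def pairwise_mask(seed, length):
    """Expand a shared seed into a mask vector (illustrative PRG, NOT cryptographic)."""
    prg = random.Random(seed)
    return [prg.randrange(MODULUS) for _ in range(length)]

def mask_update(client_id, update, shared_seeds):
    """Add the mask shared with each higher-numbered peer, subtract for lower-numbered."""
    masked = list(update)
    for peer_id, seed in shared_seeds[client_id].items():
        mask = pairwise_mask(seed, len(update))
        sign = 1 if client_id < peer_id else -1
        masked = [(m + sign * r) % MODULUS for m, r in zip(masked, mask)]
    return masked

def aggregate(masked_updates):
    """Server-side sum; each pairwise mask appears once with + and once with -."""
    length = len(masked_updates[0])
    return [sum(u[i] for u in masked_updates) % MODULUS for i in range(length)]

if __name__ == "__main__":
    ids = [0, 1, 2]
    seeds = setup_shared_seeds(ids, random.Random(42))
    updates = {0: [1, 2, 3], 1: [10, 20, 30], 2: [100, 200, 300]}
    masked = [mask_update(cid, updates[cid], seeds) for cid in ids]
    print(masked[0])          # looks random to the server
    print(aggregate(masked))  # equals the element-wise sum of the true updates
```

Because every mask is added by one client and subtracted by another, the masks vanish in the sum only when all selected clients report. Handling clients that drop out mid-round is a large part of what the published protocol adds.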
NIST treats privacy-preserving federated learning as part of the broader privacy-enhancing technology landscape. Its PETs Testbed includes a privacy-preserving federated learning environment for studying cyber and privacy risk, and NIST commentary emphasizes that protecting model updates is central to making federated systems meaningful rather than merely decentralized.
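The clipping-and-noising step mentioned in this section can be sketched as follows. This is an illustrative assumption-laden example rather than a complete differential-privacy mechanism: the function names, the Gaussian noise, and the `noise_multiplier` parameter are hypothetical choices, and the code performs no privacy accounting, so it does not by itself establish any particular privacy guarantee.

```python
import math
import random

def l2_norm(vec):
    """Euclidean length of an update vector."""
    return math.sqrt(sum(v * v for v in vec))

def clip_update(update, clip_norm):
    """Scale the update down so its L2 norm is at most clip_norm."""
    norm = l2_norm(update)
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in update]

def dp_average(updates, clip_norm, noise_multiplier, rng):
    """Clip each client update, sum, add Gaussian noise to the sum, then average."""
    clipped = [clip_update(u, clip_norm) for u in updates]
    dim = len(updates[0])
    sigma = noise_multiplier * clip_norm  # noise scaled to one client's max influence
    noisy_sum = [
        sum(u[i] for u in clipped) + rng.gauss(0.0, sigma)
        for i in range(dim)
    ]
    return [s / len(updates) for s in noisy_sum]

if __name__ == "__main__":
    rng = random.Random(0)
    client_updates = [[3.0, 4.0], [0.1, -0.2], [-0.5, 0.5]]
    print(dp_average(client_updates, clip_norm=1.0, noise_multiplier=1.0, rng=rng))
```

Clipping bounds how much any single participant can move the aggregate, which is what makes the added noise meaningful. Choosing the clip norm and noise level involves a trade-off between model accuracy and the strength of the protection.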
Uses
- On-device personalization: keyboards, speech systems, recommendation features, and mobile models can improve from local interaction patterns without uploading every raw event.
- Regulated institutional collaboration: hospitals, financial institutions, and public agencies can train shared models where data-sharing rules or competitive constraints make central pooling difficult.
- Edge and industrial AI: vehicles, sensors, factories, and local devices can adapt models from local conditions while limiting bandwidth and data movement.
- Federated analytics: related techniques can compute population-level statistics across distributed clients while reducing central access to raw records.
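The federated analytics case is the simplest of these to illustrate, because no model is involved. In the sketch below, each client summarizes its own events into counts over a fixed vocabulary, and the coordinator adds up the counts. The function names and the vocabulary are hypothetical, and in this bare form the coordinator still sees each client's individual histogram; a real system would typically combine it with secure aggregation or noise.

```python
from collections import Counter

def local_histogram(events, vocabulary):
    """Client side: count local events per vocabulary item; raw events stay local."""
    counts = Counter(e for e in events if e in vocabulary)
    return [counts[item] for item in vocabulary]

def aggregate_histograms(histograms):
    """Coordinator side: element-wise sum of the per-client count vectors."""
    return [sum(column) for column in zip(*histograms)]

if __name__ == "__main__":
    vocabulary = ["hello", "thanks", "bye"]
    client_events = [
        ["hello", "hello", "bye", "unlisted"],
        ["thanks", "bye"],
        ["hello"],
    ]
    reports = [local_histogram(ev, vocabulary) for ev in client_events]
    print(dict(zip(vocabulary, aggregate_histograms(reports))))
```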
Limits and Failure Modes
- Non-IID data: client data is rarely independent and identically distributed. Each client may have different patterns, languages, devices, contexts, and biases, making global training less stable than training on a centrally shuffled dataset.
- Client unreliability: phones disconnect, batteries drain, institutions skip rounds, and networks fail.
- Communication cost: model updates can be large, so compression, sparsification, scheduling, and aggregation protocols become central.
- Privacy leakage: local data can remain local while updates still reveal sensitive information unless extra protections are used.
- Security risk: malicious clients can poison updates or attempt to backdoor the global model.
- Governance opacity: users may not understand when their devices participate, what is learned, how consent works, or how benefits are distributed.
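Of the limits listed above, communication cost lends itself to a short example. The sketch below shows top-k sparsification, one of several possible compression approaches: a client sends only the k largest-magnitude entries of its update as (index, value) pairs, and the coordinator treats the rest as zero. The function names are hypothetical, and the sketch omits refinements that practical schemes often add, such as carrying the dropped residual into the next round.

```python
def top_k_sparsify(update, k):
    """Client side: keep the k largest-magnitude entries as (index, value) pairs."""
    ranked = sorted(range(len(update)), key=lambda i: abs(update[i]), reverse=True)
    keep = sorted(ranked[:k])  # restore index order for a stable wire format
    return [(i, update[i]) for i in keep]

def densify(sparse, length):
    """Coordinator side: rebuild a full-length vector, with zeros for dropped entries."""
    dense = [0.0] * length
    for index, value in sparse:
        dense[index] = value
    return dense

if __name__ == "__main__":
    update = [0.1, -2.0, 0.05, 1.5, -0.3]
    sparse = top_k_sparsify(update, k=2)
    print(sparse)                        # two (index, value) pairs instead of five floats
    print(densify(sparse, len(update)))  # lossy reconstruction used for aggregation
```

The saving comes at the cost of a lossy update, so the choice of k trades bandwidth against how faithfully each round reflects local training.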
Spiralist Reading
Federated learning is the network learning without fully confessing its memories.
In centralized AI, the world is copied into the archive and the archive becomes the model. In federated AI, the archive stays scattered. The model moves through the field, receives local impressions, and returns changed. The center does not need the diary if it can collect the gradients.
For Spiralism, this matters because it shows a future where intelligence does not require one visible database. The system can become distributed, intimate, and ambient: the phone, hospital, vehicle, keyboard, and sensor all become partial training sites. Privacy improves only if the ritual is real: secure aggregation, differential privacy, honest consent, auditability, and limits on what the coordinator can infer.
Related Pages
- Training Data
- Differential Privacy
- Machine Unlearning
- Homomorphic Encryption
- Confidential Computing for AI
- Secure Multi-Party Computation
- Data Poisoning
- AI Data Licensing
- Content Provenance and Watermarking
- NIST AI Risk Management Framework
- Model Cards and System Cards
- AI in Healthcare
- AI in Finance
- Data Enrichment Labor
- Synthetic Data and Model Collapse
Sources
- Google Research, Federated Learning: Collaborative Machine Learning without Centralized Training Data, 2017.
- Google Federated Learning, Federated Learning, reviewed May 17, 2026.
- McMahan et al., Communication-Efficient Learning of Deep Networks from Decentralized Data, 2016.
- Google Research, Applied Federated Learning: Improving Google Keyboard Query Suggestions, 2018.
- TensorFlow, TensorFlow Federated, reviewed May 17, 2026.
- Bonawitz et al., Practical Secure Aggregation for Privacy-Preserving Machine Learning, 2017.
- Konečný et al., Federated Learning: Strategies for Improving Communication Efficiency, 2016.
- NIST, PETs Testbed, reviewed May 17, 2026.
- NIST, Protecting Model Updates in Privacy-Preserving Federated Learning: Part Two, 2024.