Differential Privacy

Differential privacy (DP) is a mathematical framework for guaranteeing that the output of a computation reveals approximately the same information whether or not any individual record was included in the input. Introduced by Cynthia Dwork and colleagues (2006), DP gives quantifiable privacy guarantees rather than ad-hoc anonymization, and is the canonical defensive primitive against model inversion, membership inference, and certain extraction attacks on machine-learning systems.

Definition

A randomized algorithm M satisfies ε-differential privacy if, for any two datasets D and D' that differ in a single record, and for any set of outputs S:

Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D') ∈ S]

The parameter ε (epsilon) is the privacy budget: smaller ε means stronger privacy. (ε, δ)-DP is a relaxation that admits a small probability δ of the guarantee failing — Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D') ∈ S] + δ — and is common in practice because pure ε-DP is hard to achieve at useful utility.
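
To make the definition concrete, here is a minimal sketch (our illustration, not from the source) of randomized response, the classic mechanism that satisfies ε-DP with ε = ln 3 when the truthful-report probability is 3/4:

    import math
    import random

    def randomized_response(bit: bool) -> bool:
        # Report the true bit with probability 3/4; flip it otherwise.
        return bit if random.random() < 0.75 else not bit

    # Pr[output = s | input = b] is always 0.75 or 0.25, so across any two
    # neighboring inputs the ratio is at most 3 = e^ε, i.e. ε = ln 3.
    eps = math.log(0.75 / 0.25)  # ≈ 1.0986

Each respondent can plausibly deny their true bit, yet aggregate frequencies remain estimable after debiasing.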

Core mechanisms

  • Laplace mechanism — add Laplace-distributed noise to numerical outputs. Provides ε-DP for queries with bounded sensitivity (sketched after this list).
  • Gaussian mechanism — add Gaussian noise. Provides (ε, δ)-DP; tighter composition properties than Laplace for repeated queries.
  • Exponential mechanism — for discrete outputs (e.g., model selection), sample from a distribution weighted by a utility function.
  • DP-SGD (Differentially Private Stochastic Gradient Descent) — clip per-example gradients, then add Gaussian noise during training (a schematic step is sketched after this list). The standard DP-training algorithm; implemented in TensorFlow Privacy, Opacus (PyTorch), and JAX-based libraries. Privacy budget accumulates across training steps via composition theorems.
  • Local differential privacy (LDP) — noise added at the data source before centralization. Used by Apple (telemetry) and Google (Chrome’s RAPPOR, federated analytics). Trades off privacy budget against statistical utility more aggressively than central DP.
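
As a concrete sketch of the Laplace mechanism above (our illustration; the function name is ours), noise is drawn with scale b = sensitivity / ε:

    import numpy as np

    def laplace_mechanism(true_value: float, sensitivity: float, eps: float,
                          rng: np.random.Generator) -> float:
        # ε-DP release of a bounded-sensitivity numeric query:
        # Laplace noise with scale b = sensitivity / eps.
        return true_value + rng.laplace(loc=0.0, scale=sensitivity / eps)

    rng = np.random.default_rng(0)
    # A counting query ("how many records match?") has sensitivity 1.
    noisy_count = laplace_mechanism(true_value=42.0, sensitivity=1.0, eps=0.5, rng=rng)

And a schematic single DP-SGD step, assuming per-example gradients are already materialized as a (batch_size, num_params) matrix; production training should rely on an audited implementation such as Opacus or TensorFlow Privacy rather than hand-rolled noise:

    import numpy as np

    def dp_sgd_step(per_example_grads: np.ndarray, clip_norm: float,
                    noise_multiplier: float, rng: np.random.Generator) -> np.ndarray:
        # 1. Clip each example's gradient to L2 norm <= clip_norm.
        norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
        clipped = per_example_grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
        # 2. Add Gaussian noise calibrated to the clipping bound
        #    (sigma = noise_multiplier * clip_norm).
        noise = rng.normal(0.0, noise_multiplier * clip_norm, size=clipped.shape[1])
        # 3. Average; the optimizer applies this noisy mean gradient.
        return (clipped.sum(axis=0) + noise) / len(per_example_grads)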

Application to agentic AI

Surface | DP application | Why it matters
Training / fine-tuning | DP-SGD on model training; release a model that does not memorize individual training records | Defends against model inversion and membership inference attacks recovering proprietary or sensitive training data
RAG / retrieved-context responses | Noise the answers an agent returns about sensitive documents, or apply DP at retrieval-aggregation time | Limits how much an attacker can extract about a single document by querying repeatedly
Federated learning across agents | Local DP on each agent's gradient updates before aggregation | Prevents the central aggregator (or any single peer) from recovering an individual agent's training data
Agent-to-agent telemetry | Local DP on behavior signals shared between agents (e.g., trust scores) | Prevents downstream agents from inferring individual upstream-agent activity patterns
Inference-time output randomization | Small Gaussian noise on logits before sampling (sketched below) | Slows model-extraction attacks, which rely on stable query-response pairs
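
A minimal sketch of the last row above (our illustration; the noise scale sigma is an assumed tuning parameter, not a recommendation):

    import numpy as np

    def noisy_sample(logits: np.ndarray, sigma: float, rng: np.random.Generator) -> int:
        # Perturb logits so repeated identical queries return unstable
        # query-response pairs, slowing model-extraction attacks.
        noisy = logits + rng.normal(0.0, sigma, size=logits.shape)
        probs = np.exp(noisy - noisy.max())  # numerically stable softmax
        probs /= probs.sum()
        return int(rng.choice(len(probs), p=probs))

Note that this randomization is an extraction-slowing measure, not a formal DP guarantee over training data.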

Standards and tooling

  • NIST SP 800-188 — De-Identifying Government Datasets: practitioner guidance on data sanitization and de-identification.
  • NIST IR 8053 — De-Identification of Personal Information.
  • OpenDP — open-source DP library (Harvard / Microsoft / OpenDP Initiative). Production-grade implementation of standard mechanisms.
  • TensorFlow Privacy — Google’s reference DP-SGD implementation.
  • Opacus — Meta’s PyTorch DP library; widely used for DP-SGD training.
  • Google’s Differential Privacy library — Java / Go / C++ / Python; production-tested at Google.

Limitations

  • Privacy-utility trade-off. Small ε buys strong privacy but degrades model accuracy. Real deployments routinely use ε ∈ [1, 10]; ε = 1 is considered strong by academic convention, while ε = 10 offers only a directional guarantee. Apple's iOS telemetry uses single-digit ε per event, but spend accumulates across days.
  • Privacy-budget management. Composition theorems govern how much privacy is spent across queries. Long-running agents that re-query the same DP-protected interface accumulate privacy cost; once the budget is exhausted, the interface must refuse further queries or rotate the underlying data. Naïve deployments ignore composition and effectively provide much weaker DP than advertised (see the budget-tracking sketch after this list).
  • Doesn’t defend against all model attacks. DP at training defends against inversion and membership inference but does not defend against model extraction (which can occur even with DP-trained models, since the attacker is recovering the function, not training records). Inference-time output randomization is needed to slow extraction.
  • Hard to compose with RAG. Differential privacy on retrieved-context responses is an active research area; production-grade DP for RAG isn’t yet standard.
  • Not a substitute for access control. DP protects against information leakage given a query; it does not control who can query in the first place. Pair with NHI / authorization controls.
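
A minimal budget-tracking sketch under basic sequential composition, where total spend is the sum of per-query ε (our illustration; production systems should use an accountant from a DP library, which gives tighter bounds):

    class PrivacyBudget:
        """Track cumulative ε under basic sequential composition."""

        def __init__(self, total_eps: float):
            self.total_eps = total_eps
            self.spent = 0.0

        def charge(self, eps: float) -> bool:
            # Refuse any query that would push cumulative spend over budget.
            if self.spent + eps > self.total_eps:
                return False
            self.spent += eps
            return True

    budget = PrivacyBudget(total_eps=4.0)
    answered = sum(budget.charge(0.5) for _ in range(10))  # 8 succeed, then refusals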

Relation to wiki

  • CMM D6 (Data, Memory & RAG) — DP-SGD belongs as an L4/L5 control for training-data privacy. Inference-time DP on RAG responses is an L5+ research area.
  • CMM D4 (Runtime & Guardrails) — output randomization for extraction-attack mitigation belongs as an L4 control.
  • CMM D9 (Operations & Human Factors) — privacy-budget tracking belongs as an operational discipline; agents that exhaust budgets must be detected and quotas refreshed/rotated.
  • MAAIS Layer 2 (Data Security) — explicitly names differential privacy as a control. The wiki adopts the same positioning.
  • Model-Layer Attacks — DP is the primary defense against inversion and membership inference; partial defense against extraction (when paired with rate limits and output noise).

Provenance

The framework was introduced by Dwork, McSherry, Nissim, and Smith (2006), Calibrating Noise to Sensitivity in Private Data Analysis. DP-SGD is from Abadi et al. (2016), Deep Learning with Differential Privacy. The wiki references DP via MAAIS Layer 2 which names it as a Data Security control for agentic AI.