Lakera Guard

Lakera Guard is a commercial AI security API from Lakera (Swiss company) that provides real-time detection of prompt injection attacks, jailbreaks, sensitive data leakage, and unsafe content in LLM inputs and outputs. It is deployed as a middleware layer that intercepts requests and responses between the application and the LLM.

Core detection capabilities

Detection categoryDescription
Prompt injectionDirect and indirect prompt injection attempts in user inputs and retrieved content
JailbreaksAttempts to bypass safety instructions or system prompts
PII / sensitive dataCredit card numbers, SSNs, API keys, passwords in inputs or outputs
Hate speech / harmful contentUnsafe output classification
Competitor mentionsConfigurable topic filtering

The detection models are continuously updated based on Lakera’s attack intelligence database, which aggregates novel injection and jailbreak patterns in near-real time from the Gandalf game — a public red-teaming platform that has collected millions of adversarial prompts.

Integration pattern

Lakera Guard operates as a sidecar API — the application routes prompts through Lakera before sending them to the LLM, and optionally routes responses through Lakera before returning them to users:

User input → Lakera Guard (input scan) → LLM → Lakera Guard (output scan) → User

Integration is via REST API; SDKs available for Python, TypeScript, LangChain, and major agent frameworks. Latency is typically 10–50ms per call.

Comparison with LlamaFirewall

DimensionLlamaFirewallLakera Guard
LicensingOpen source (Meta)Commercial SaaS
DeploymentSelf-hostedCloud API (on-premises available)
Prompt injection recall97.5% (PromptGuard 2 benchmark)Not publicly benchmarked at equivalent specificity
CoT auditingYes (AlignmentCheck)No
Code scanningYes (CodeShield)No
Data freshnessStatic model releasesContinuous updates from Gandalf intelligence
Best forFOSS/self-hosted deployments; coding agentsSaaS deployments; organizations needing managed service

Role in the RA Runtime plane

In the RA Runtime plane, Lakera Guard appears in the Topic / content safety row alongside NVIDIA NeMo Content Safety NIM, Lasso, and Microsoft Prompt Shields. The content safety row covers a broader capability than the LlamaFirewall row (input filtering / prompt injection detection) — Lakera Guard addresses both, making it a broader-scope alternative to LlamaFirewall for organizations that prefer a managed SaaS approach.

The enterprise recommended stack uses LlamaFirewall (OSS, no license cost) + NVIDIA NeMo NIMs (commercial inference); Lakera Guard is the alternative for teams that want a fully managed, continuously-updated detection service without self-hosting.