Plan-validate-execute pattern

Source: Securing Workspace GenAI at Google (Unprompted 2026, Nicolas Lidzborski). Transcript share.

A structural pattern for handling high-stakes, irreversible actions in agentic systems. Instead of letting the agent execute autonomously, the action sequence splits into three stages with a deterministic gate between planning and execution. Nicolas Lidzborski presented Google Workspace’s implementation at Unprompted in March 2026.

The pattern operationalizes human-in-the-loop (HITL) review for irreversible writes (moves, deletes, sends, sharing changes), where post-hoc detection arrives too late.

The three stages

1. Plan

The agent enumerates what it intends to do before any execution begins. The plan is a structured artifact: action class, target resource, parameters, source of intent. It is not free-form natural language. It is the input to the validation gate.

The structured-plan requirement matters. Free-form intent (“I’ll email John about the contract”) is not validatable. A structured plan (“send_email; recipient=user_id_42; subject=…; attached_doc_id=…; intent_source=user_query_at_t1”) is validatable against policy and against what the user actually asked for.
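A minimal sketch of what such a structured plan could look like as a typed record. The field names follow the four elements listed above; the class name `PlannedAction`, the `ALLOWED_ACTIONS` set, and the placeholder values are assumptions for illustration, not part of the Workspace implementation described in the talk.

```python
from dataclasses import dataclass
from typing import Mapping

# Hypothetical closed set of action classes the gate knows how to evaluate.
ALLOWED_ACTIONS = frozenset(
    {"send_email", "move_file", "delete_file", "change_sharing"}
)


@dataclass(frozen=True)
class PlannedAction:
    """One planned step, expressed as fields a policy engine can check."""

    action_class: str
    target_resource: str
    parameters: Mapping[str, str]
    intent_source: str

    def __post_init__(self) -> None:
        # Reject anything that is not a known, enumerable action class.
        if self.action_class not in ALLOWED_ACTIONS:
            raise ValueError(f"unknown action class: {self.action_class!r}")
        if not self.target_resource or not self.intent_source:
            raise ValueError("target_resource and intent_source are required")


# Placeholder values only; a real plan would carry resolved identifiers.
plan = PlannedAction(
    action_class="send_email",
    target_resource="user_id_42",
    parameters={"subject": "Contract", "attached_doc_id": "doc_7"},
    intent_source="user_query_at_t1",
)
```

Because the record is frozen and the action class comes from a closed set, free-form text such as “email John about the contract” cannot be constructed as a plan at all; it fails before reaching the gate.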

2. Validate

A gatekeeper evaluates the planned action against:

Dynamically generated policy: derived from session state, accumulated risk level, data sensitivity classifications, and user history
User intent: does the planned action match what the user asked for? Cross-reference against agency-gap failure modes (wrong recipient, wrong target document, wrong scope)
Pre-defined deny rules: actions unconditionally forbidden in the current context (for example, an outbound web request after accessing high-sensitivity data)

Three outcomes:

Outcome	Action
Pass	Proceed to Execute
Confirm	Block until a human approves; surface the structured plan and reasoning to the user
Block	Refuse the action; log the attempt; surface the reason to the user

The Block path applies when a contextual security framework detects systemic failure, such as rapid-fire sensitive actions or scope-escalation patterns.
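The three-outcome gate can be sketched as a small deterministic function. Everything named here (`Plan`, `SessionState`, `RAPID_FIRE_LIMIT`, the rule order, and the choice to route an intent mismatch to Confirm rather than Block) is an assumption made for illustration; the source describes the checks but not their concrete encoding.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    PASS = "pass"
    CONFIRM = "confirm"
    BLOCK = "block"


@dataclass(frozen=True)
class Plan:
    action_class: str
    target: str            # what the agent intends to act on
    requested_target: str  # what the user's request resolved to


@dataclass(frozen=True)
class SessionState:
    touched_high_sensitivity: bool
    recent_sensitive_actions: int


# Hypothetical policy constants.
IRREVERSIBLE = frozenset({"send_email", "delete_file", "move_file", "change_sharing"})
RAPID_FIRE_LIMIT = 5


def validate(plan: Plan, session: SessionState) -> tuple[Outcome, str]:
    """Evaluate a plan with plain rules; no model call is involved."""
    # Pre-defined deny rule: no outbound request after sensitive data access.
    if plan.action_class == "outbound_web_request" and session.touched_high_sensitivity:
        return Outcome.BLOCK, "outbound request after high-sensitivity access"
    # Systemic-failure signal: too many sensitive actions in a short window.
    if session.recent_sensitive_actions >= RAPID_FIRE_LIMIT:
        return Outcome.BLOCK, "rapid-fire sensitive actions"
    # Intent check: the planned target must match what the user asked for.
    if plan.target != plan.requested_target:
        return Outcome.CONFIRM, "target differs from user request"
    # Irreversible writes always go to a human in this sketch.
    if plan.action_class in IRREVERSIBLE:
        return Outcome.CONFIRM, "irreversible action"
    return Outcome.PASS, "no rule matched"
```

The function returns a reason string alongside the outcome so that the Confirm and Block paths have something concrete to surface to the user.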

3. Execute

Only after the gate returns Pass, or the user approves a Confirm, does the agent invoke the underlying tool. Execution is mechanical: the policy decision is already made, and the tool call is constrained to the parameters in the validated plan.
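One way to make “constrained to the parameters in the validated plan” mechanical is to bind the approval to a digest of the plan and have the executor refuse any call whose digest differs. This is a sketch of that idea using Python's standard `hmac` module, not a description of how Workspace does it; `RUNTIME_KEY`, `approve`, `execute`, and the `TOOLS` table are hypothetical names.

```python
import hashlib
import hmac
import json

# Assumption: a secret held by the runtime and never exposed to the model.
RUNTIME_KEY = b"demo-key-held-by-runtime-only"


def plan_digest(plan: dict) -> str:
    """Keyed digest over a canonical JSON encoding of the plan."""
    canonical = json.dumps(plan, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(RUNTIME_KEY, canonical, hashlib.sha256).hexdigest()


def approve(plan: dict) -> str:
    """Called by the gate once the plan has passed or been confirmed."""
    return plan_digest(plan)


def execute(tool_call: dict, approval: str, tools: dict):
    """Run the tool only if the call is byte-for-byte the approved plan."""
    if not hmac.compare_digest(plan_digest(tool_call), approval):
        raise PermissionError("tool call does not match the validated plan")
    handler = tools[tool_call["action_class"]]
    return handler(**tool_call["parameters"])


# Stand-in tool table; a real broker would dispatch to actual integrations.
TOOLS = {"send_email": lambda recipient, subject: f"sent to {recipient}"}

validated = {
    "action_class": "send_email",
    "parameters": {"recipient": "user_id_42", "subject": "Contract"},
}
token = approve(validated)
result = execute(validated, token, TOOLS)
```

If the model later swaps the recipient, the digest no longer matches and `execute` raises `PermissionError` before any tool runs.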

Structural distinction from “ask before doing”

Naive HITL implementations interleave the model’s reasoning with confirmation prompts. The model decides what to do, asks the user, then proceeds on the reply. Plan-validate-execute differs in three load-bearing ways:

Validation is deterministic, not LLM-based. A non-LLM policy engine evaluates the plan. This breaks the recursive-injection failure mode, where an LLM-based reviewer is itself vulnerable to the same injection.
The plan is visible before execution. When prompted, the user sees the concrete action (recipient, subject, attached resources) rather than a model summary that may not match what the agent will do.
The gate is enforced by the runtime, not the model. The model can propose; only the runtime can act. This is the platform-layer-not-prompt-layer principle in concrete form.

Review fatigue

Lidzborski is explicit: “There’s still a lot of UX research to deal with review fatigue and rubber stamping. Like, first people will keep verifying things, then they’ll just become approval bots.”

Plan-validate-execute does not solve review fatigue; it creates the surface where the problem becomes addressable. Mitigations under research:

Adaptive confirmation frequency: apply the confirm tier the first time an action class runs in a session, then apply the auto tier to identical actions within the same context
Risk-proportional UX: heavyweight modals for high-impact actions, inline notifications for low-impact ones
Structured-plan summaries: surface the delta from expected behavior rather than the full plan, so anomalies stand out
Post-hoc anomaly correlation: even when users rubber-stamp, drift detection catches sustained anomaly patterns
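As an illustration of the first mitigation, adaptive confirmation frequency could be tracked with a small in-memory record of what a human has already approved. This is a sketch of a research-stage idea under stated assumptions: the class name `ConfirmationTracker`, the notion of a `context_key`, and the exact-match rule are all hypothetical, and a real system would also need expiry and risk weighting.

```python
class ConfirmationTracker:
    """Remembers which (session, action class, context) triples were approved."""

    def __init__(self) -> None:
        self._approved = set()

    def needs_confirmation(self, session_id: str, action_class: str, context_key: str) -> bool:
        # First occurrence in this session and context goes to the human.
        return (session_id, action_class, context_key) not in self._approved

    def record_approval(self, session_id: str, action_class: str, context_key: str) -> None:
        # Later identical actions in the same context can skip the prompt.
        self._approved.add((session_id, action_class, context_key))


tracker = ConfirmationTracker()
first = tracker.needs_confirmation("s1", "move_file", "folder:reports")
tracker.record_approval("s1", "move_file", "folder:reports")
repeat = tracker.needs_confirmation("s1", "move_file", "folder:reports")
other_context = tracker.needs_confirmation("s1", "move_file", "folder:legal")
```

Keying on the context as well as the action class means approval for one folder does not silently extend to another, which limits how far a single rubber-stamped prompt can reach.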

None of these has a settled best practice as of 2026. The pattern is sound; the UX layer remains open.

Implementation surface across the reference architecture

Plan-validate-execute concentrates in the Control plane of the reference architecture but touches multiple planes:

Plane	Role
Control	Gatekeeper and policy engine; least-agency tier evaluation
Identity	Validates the agent identity issuing the plan; binds the plan to a human principal
Runtime	Lifecycle hook intercepts tool calls before execution
Egress	Tool-call broker enforces the validated plan parameters
Observability	Logs the plan, validation decision, and execution outcome for audit
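The Observability row implies one audit record per action that ties the plan, the validation decision, and the execution outcome together. A minimal sketch of such a record as a JSON line follows; the function name `audit_record` and the field layout are assumptions, not a documented schema.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def audit_record(
    plan: dict,
    decision: str,
    reason: str,
    outcome: Optional[str],
    now: Optional[datetime] = None,
) -> str:
    """Serialize the plan, validation, and execution triple as one JSON line."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return json.dumps(
        {
            "timestamp": timestamp,
            "plan": plan,
            "validation": {"decision": decision, "reason": reason},
            # None when the action was blocked and never executed.
            "execution": outcome,
        },
        sort_keys=True,
    )


line = audit_record(
    plan={"action_class": "delete_file", "target_resource": "doc_7"},
    decision="block",
    reason="rapid-fire sensitive actions",
    outcome=None,
    now=datetime(2026, 3, 1, tzinfo=timezone.utc),
)
```

Blocked attempts are logged with an empty execution field rather than dropped, so refused actions remain visible to later drift analysis.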

A reference implementation pairs:

Cedar or OPA for the policy engine
HITL confirmation UX for the user-facing gate
Tenuo Warrants (where deployed) for the capability-token layer that constrains what the agent can request
AgentGateway for tool-call brokering

CMM positioning

In the CMM:

D3 (Runtime Guardrails) L3+: the pattern that takes a deployment from “guardrails exist” to “guardrails are structurally enforced”
D5 (Human Oversight Architecture) L3: operationalizes HITL for irreversible actions
D7 (Observability & Audit) L3: the plan, validation, and execution triple is high-value audit data

Cross-references

HITL: plan-validate-execute is the HITL implementation for the confirm tier
Bullen’s Stripe talk: Stripe’s three-ring containment uses analogous gate-before-action mechanics on egress and tool policy
Agency gap: the validation step catches non-adversarial agency-gap failures such as “wrong John” errors
Orchestration hijacking: the deterministic policy engine in the validation step resists the prompt-injection attacks that compromise LLM-based reviewers
