Guardians Synergies¶
Status¶
Design todo. Tracks possible synergies between clauz3 and
metareflection/guardians, the
open-source implementation of Meijer's Guardians of the Agents: Formal
verification of AI workflows
(ACM Queue 23(4), 2025;
doi:10.1145/3777544).
See background § Guardians for the related-work framing. This document is the actionable companion: what the two systems could borrow from each other, where each track would attach in the current code, and the open questions.
The shape of the overlap¶
Guardians and clauz3 both verify before any side effect runs and both
treat a checkable artifact — not agent prose — as what authorizes execution.
The two designs differ on three axes that the tracks below try to bridge:
| Axis | Guardians | clauz3 |
|---|---|---|
| Authored surface | structured workflow AST with SymRef placeholders |
ordinary Python, symbolically executed |
| Policy authorship | operator-defined Policy |
agent-abduced contract, user-consented |
| What it can express | taint source→sink, security automata over ordered tool calls, Z3 pre/post/frame | relational contracts over an unordered set of guarded effect facts |
The effect-fact layer is the join point. Each trusted call lowers to a
FactInfo (name, markers, args, cond, quantifiers — see
effect-ir), and spec.py's
relation algebra (all, empty, distinct, count, sum, shares_value)
reasons only over that set. Every track here is, at bottom, a question about
what to add to those facts and to that algebra.
Track 1: Taint and provenance in the fact layer¶
Goal. Express source-to-sink rules such as "a value returned from
fetch_mail must not reach send_email.body unless sanitized" — the
prompt-injection class Guardians targets with TaintRule.
Where it attaches. FactInfo.args currently holds the symbolic value of
each argument but no record of where that value came from. The
email-from-db example already carries a partial
provenance anchor: a value bound from a trusted query's returned column is
matched structurally against UserRow.email (see _compare_column_ref in
spec.py). Generalizing that — every value flowing out of a trusted read
carries a source label, and the label propagates through assignments — is the
provenance lattice the background
"provenance facts" note asks for.
Open design choice (from the issue). Whether a source→sink rule is:
- a new relation primitive in
clauz3.spec(e.g.Email.all(lambda e: not tainted(e.body, source="fetch_mail"))), - a separate policy layer that runs over the fact set after the relational contracts, or
- a trusted-domain helper authored by the TLE alongside
only/unique_recipients.
The effect-IR direction argues for making provenance an explicit part of the fact grammar so all three can be expressed as analyses over one artifact, rather than bolted onto the Python evaluator.
Track 2: Security automata over effect traces¶
Goal. Policies that are about order: "an approval/check effect must precede
any transfer", "after a privilege escalation, only these effects may occur".
Guardians models these as a SecurityAutomaton (states, transitions keyed by
tool name, error states).
Where it attaches. Today the facts are an unordered set; each carries a path
condition (cond) but no position in a trace. Per-effect relational constraints
cannot express "X before Y" cleanly. This track needs either (a) an ordering
relation over facts, or (b) an explicit automaton checker that walks the facts
in execution order and rejects on reaching an error state. Bounded loops
(FactInfo.quantifiers) complicate ordering and need a story — likely the
automaton reasons over loop bodies as a unit first.
This is the track with the least existing scaffolding in clauz3 and the
clearest "Guardians is ahead" gap.
Track 3: Workflow front-end / effect-IR bridge¶
Goal. Let a Guardians-style workflow and a clauz3 Python program be two
front-ends onto the same prover, rather than competing systems.
Where it attaches. effect-ir
already lands on the right posture for this: a non-Python front-end is safe
only if it is executable — lowered to the same trusted calls — not merely a
descriptive plan. A Guardians workflow AST whose nodes bind to the same trusted
effects could lower into the effect IR and reuse the identical prover, checker,
and executor. Conversely, clauz3 trusted roots could emit Guardians
ToolSpecs and a starter Policy, since the @deal.pre/@deal.has/signature
data already encodes most of a ToolSpec.
The payoff is to get both properties: Guardians' workflow-before-data
prompt-injection resistance and clauz3's Python-level proof and
consent artifact.
Track 4: Approval UX / literate workflows¶
Goal. Combine Guardians' explainable structured workflow with clauz3's
proved-contract-as-consent surface.
Where it attaches. The approval-dialog sketch already wants a deterministic natural-language summary of the proved contract; effect-ir names that summary as a consumer of the IR. Guardians' literate workflow is the supporting detail: show the proved contract first (the thing the user consents to), with a structured trace/workflow explanation demoted alongside it, like a cover letter on a signed document (see background § DocuSign analogy).
Track 5: Shared examples / benchmark¶
Goal. Make the overlap and the real differences concrete, and give a basis for collaboration.
Where it attaches. The background
"benchmarks" note already flags the absence of an adversarial corpus. Recreate
Meijer's exact scenario in examples/: a malicious inbox message from
it@othercorp.com instructs the agent to silently forward a summary of
Michelle's mailbox to that external address, where the only sanctioned internal
domain is valleysharks.com. Then implement it two ways:
- Today's
clauz3. A program that forwards toit@othercorp.comcannot be proved underemails.only(["michelle@valleysharks.com", ...])— the existing allowlist contract already rejects the destination of the exfiltration, which mirrors Meijer's automaton conditionto notIn ["*.valleysharks.com"]. This shows how far the current relational model gets with no new machinery. - A taint/provenance extension (Track 1). Catch the source→sink version
directly, as Meijer's taint policy does: the email body carrying data read
from
fetch_mailis rejected even when the destination is allowed. This is the case the current model cannot express.
A useful third panel is the failure mode each design avoids: a pure runtime monitor that blocks the forward only after the inbox has been read and a first effect committed.
Open questions¶
- Is workflow-before-data compatible with the coding-agent style
clauz3targets, or should it be an optional front-end? (The effect-IR recommendation is "peer front-end, not replacement.") - Can
clauz3's relation model express source/sink taint naturally, or does it require a separate provenance lattice? (Track 1.) - Should operator-defined policies stay distinct from agent-proposed guarantees,
or should the
required/recommendedcoverage policies inpolicy.pygrow into a fuller Guardians-like policy layer? (See background § Operator Policy and coverage policies.) - What is the minimum bridge demo worth building first? Track 5's email exfiltration case is the current candidate.
Proposed next step¶
Build the email-exfiltration demo (Track 5) against the existing email trusted layer, showing the destination-level rejection with today's contracts. Use it to scope the smallest taint/provenance extension (Track 1) that adds the source→sink rejection, and to decide which of the three attachment points above that extension should take.