The Four Pillars of Governed AI in Finance
Policy gate, audit chain, HITL envelopes, and the AIGF coverage gate — the four primitives every agentic-AI system in a regulated workflow needs.
If you are designing an AI agent that will touch a regulated financial workflow in 2026, every meaningful decision you make collapses into four primitives. Get them right and the rest of the system is craftsmanship. Get them wrong and no amount of orchestration will rescue you in audit.
This article walks through the four pillars BondFoundry implements: the policy gate, the audit chain, the HITL approval envelope, and the AIGF coverage gate. None of these are novel ideas individually. The argument is about how they combine — and what fails when one of them is missing.
Pillar 1 — The policy gate
Every governed action passes through a single pure function:
def decide(action: Action, context: Context) -> Decision:
    """Pure. No I/O. No clock. Same inputs → same decision."""
    tier = tier_for(action)
    rule = applicable_rule(action, context, tier)
    return Decision(
        verdict=rule.evaluate(action, context),
        rule_id=rule.id,
        verbatim_text=rule.text,
        tier=tier,
        framework_ref=rule.framework_ref,
    )
Three properties matter:
It is pure. No database calls, no network, no clock inside the gate. The agent proposes, the gate decides, the caller does the I/O. Why this matters: you can fuzz it across millions of synthetic actions in seconds, you can prove invariants from the function signature (“no T3 action ever returns allow without a dual-HITL envelope”), and you can replay any historical decision from the audit row’s context payload. “Show me the decision logic on June 1” is answered with git show, not a config archive.
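Because the gate is pure, the fuzzing claim is cheap to demonstrate. The sketch below is a toy stand-in, not BondFoundry's gate: the action kinds, the `TIERS` table, and the `approvals` counter on the context are all hypothetical, and T2 materiality is ignored for brevity. It replays random actions through `decide` and counts violations of the invariant "no T3 action is allowed without two approvals".

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str
    notional: int


@dataclass(frozen=True)
class Context:
    approvals: int  # hypothetical: number of valid HITL envelopes attached


# Hypothetical action kinds and their tiers, for illustration only.
TIERS = {"read_position": "T0", "stage_order": "T1",
         "amend_order": "T2", "settle_trade": "T3"}
REQUIRED_APPROVALS = {"T0": 0, "T1": 0, "T2": 1, "T3": 2}


def decide(action: Action, context: Context) -> tuple:
    """Toy pure gate: no I/O, no clock, output depends only on the inputs."""
    tier = TIERS[action.kind]
    verdict = "allow" if context.approvals >= REQUIRED_APPROVALS[tier] else "block"
    return tier, verdict


def fuzz(n: int, seed: int = 0) -> int:
    """Run n random actions through the gate and count invariant violations."""
    rng = random.Random(seed)
    violations = 0
    for _ in range(n):
        action = Action(rng.choice(list(TIERS)), rng.randrange(1, 10**9))
        context = Context(approvals=rng.randrange(0, 4))
        tier, verdict = decide(action, context)
        # Determinism: a second call with the same inputs must agree.
        assert decide(action, context) == (tier, verdict)
        if tier == "T3" and verdict == "allow" and context.approvals < 2:
            violations += 1
    return violations
```

Calling `fuzz(1_000_000)` on a gate like this needs no database, network, or fixtures, which is the practical payoff of keeping I/O out of the decision function.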
It returns verbatim rule text. Not paraphrased, not model-generated. The CI test suite asserts character-for-character match against the policy bundle. When an auditor asks why an action was blocked, the answer is a string the policy committee approved.
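A character-for-character assertion of this kind can be very small. In the sketch below, the bundle contents, the rule ID `R-017`, and the helper name are invented for illustration; the point is only that the comparison is plain string equality, with no normalization or fuzzy matching.

```python
# Hypothetical policy bundle: rule id -> text approved by the policy committee.
POLICY_BUNDLE = {
    "R-017": "Irreversible settlement instructions require two distinct human approvers.",
}


def cited_text_is_verbatim(rule_id: str, cited_text: str) -> bool:
    """True only when the cited text equals the bundle text exactly."""
    return POLICY_BUNDLE.get(rule_id) == cited_text
```

A paraphrase, a trailing space, or an unknown rule ID all fail the check, which is the behavior a CI assertion would rely on.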
It is tier-routed by reversibility. T0 (read-only), T1 (reversible writes), T2 (single-HITL above materiality), T3 (dual-HITL irreversible). The auditor cares about what can go wrong, not what category the action sits in. Reversibility is the right axis.
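One way to read the tier definitions above is as a short decision ladder. This is a sketch under assumptions: the `writes` and `reversible` fields, the `MATERIALITY_THRESHOLD` value, and the reading of T2 as "reversible write above materiality" are illustrative choices, not BondFoundry's actual routing rules.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    T0 = 0  # read-only
    T1 = 1  # reversible write
    T2 = 2  # single-HITL above materiality
    T3 = 3  # dual-HITL, irreversible


@dataclass(frozen=True)
class Action:
    writes: bool       # hypothetical field: does the action change state?
    reversible: bool   # hypothetical field: can the change be undone?
    notional: int


MATERIALITY_THRESHOLD = 1_000_000  # hypothetical value


def tier_for(action: Action) -> Tier:
    """Route by what can go wrong: reads, then reversibility, then size."""
    if not action.writes:
        return Tier.T0
    if not action.reversible:
        return Tier.T3
    if action.notional > MATERIALITY_THRESHOLD:
        return Tier.T2
    return Tier.T1
```

Note that irreversibility is checked before notional, so a small irreversible action still lands in T3.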
Pillar 2 — The hash-chained audit log
Every gate decision is written to an append-only audit log. Two things have to be true: the log cannot be mutated, and any insert between two existing rows must be detectable.
Immutability lives in the database. Postgres triggers reject UPDATE and DELETE on the audit table. The application layer doesn’t get a vote.
CREATE OR REPLACE FUNCTION reject_mutation()
RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    RAISE EXCEPTION 'audit_log is append-only (tg_op=%, row=%)',
        TG_OP, OLD.id;
END;
$$;
CREATE TRIGGER audit_immutable
    BEFORE UPDATE OR DELETE ON audit_log
    FOR EACH ROW EXECUTE FUNCTION reject_mutation();
Tamper-evidence is a chain, not a row signature. Each row carries the sha256 of (previous_hash || serialized_payload || sequence_number). The sequence number is part of the hashed input so that rows with identical timestamps still have an unambiguous, verifiable order. Identical timestamps happen in real systems, and they create order-of-events questions you do not want to answer in court.
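The chain construction can be sketched in a few lines. The row layout, the all-zero genesis hash, and canonical JSON as the payload serialization are assumptions made for this example; only the hashed input (previous hash, payload, sequence number) follows the description above.

```python
import hashlib
import json

GENESIS = b"\x00" * 32  # assumed previous_hash for the first row


def row_hash(previous_hash: bytes, payload: dict, sequence_number: int) -> bytes:
    """sha256(previous_hash || serialized_payload || sequence_number)."""
    # Canonical JSON so the same payload always serializes to the same bytes.
    serialized = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    h = hashlib.sha256()
    h.update(previous_hash)                       # fixed 32 bytes
    h.update(serialized)
    h.update(sequence_number.to_bytes(8, "big"))  # fixed 8 bytes
    return h.digest()


def append(chain: list, payload: dict) -> None:
    """Append a row whose hash commits to the previous row's hash."""
    prev = chain[-1]["hash"] if chain else GENESIS
    seq = len(chain)
    chain.append({"seq": seq, "payload": payload,
                  "hash": row_hash(prev, payload, seq)})


def first_broken_row(chain: list):
    """Return the position of the first row that fails verification, or None."""
    prev = GENESIS
    for position, row in enumerate(chain):
        expected = row_hash(prev, row["payload"], row["seq"])
        if row["seq"] != position or row["hash"] != expected:
            return position
        prev = row["hash"]
    return None
```

A row inserted between two existing rows either reuses a sequence number or breaks the hash link of the row after it, so `first_broken_row` reports it. An edited payload fails at the edited row itself.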
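Every row carries a framework_ref column that is NOT NULL and CHECK-constrained against the live AIGF taxonomy. (A Postgres CHECK constraint cannot query another table, so the allowed IDs are either enumerated in the constraint and regenerated whenever the taxonomy changes, or enforced through a foreign key to a taxonomy table.) A mis-mapped tool call fails the write rather than silently passing and becoming invisible to the coverage report.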
The coverage report and the audit chain read the same data. The auditor sees what the CI sees.
Pillar 3 — HMAC-signed HITL approval envelopes
When the policy gate routes an action to T2 or T3, the agent requests a HITL envelope. Most implementations make this a boolean the agent flips. That is not approval. That is the agent marking its own homework.
A real envelope:
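from dataclasses import dataclass
from datetime import datetime
from typing import Literal


@dataclass(frozen=True)
class ApprovalEnvelope:
    envelope_id: str
    action: str
    isin: str
    notional: int
    side: Literal["buy", "sell"]
    scope_hash: bytes  # sha256 over the approved action's fields
    approver_id: str
    issued_at: datetime
    expires_at: datetime
    hmac: bytes  # signature computed server-side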
The envelope is generated server-side, HMAC-signed with a key the agent never sees, scoped via scope_hash = sha256(action || isin || notional || side), and expires 90 seconds after issuance. Re-using the envelope for a different ISIN fails verification.
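Issuing and verifying such an envelope might look like the sketch below. It uses a plain dict instead of the dataclass to stay short, and several details are assumptions rather than BondFoundry's wire format: the `SERVER_KEY` value, the placeholder ISINs, the choice of which fields the HMAC covers, and the length-prefixed encoding in `_pack`. The length prefix is there because naive concatenation would let ("ab", "c") and ("a", "bc") hash to the same value.

```python
import hashlib
import hmac
from datetime import datetime, timedelta, timezone

SERVER_KEY = b"demo-key-held-only-by-the-approval-service"  # hypothetical
TTL = timedelta(seconds=90)


def _pack(*fields: bytes) -> bytes:
    """Length-prefix each field so field boundaries are unambiguous."""
    return b"".join(len(f).to_bytes(4, "big") + f for f in fields)


def scope_hash(action: str, isin: str, notional: int, side: str) -> bytes:
    fields = (action.encode(), isin.encode(), str(notional).encode(), side.encode())
    return hashlib.sha256(_pack(*fields)).digest()


def _tag(scope: bytes, approver_id: str, expires_at: datetime) -> bytes:
    message = _pack(scope, approver_id.encode(), expires_at.isoformat().encode())
    return hmac.new(SERVER_KEY, message, hashlib.sha256).digest()


def issue(action, isin, notional, side, approver_id, now: datetime) -> dict:
    """Server side: bind the approval to one exact action and a 90-second window."""
    scope = scope_hash(action, isin, notional, side)
    expires_at = now + TTL
    return {"scope_hash": scope, "approver_id": approver_id,
            "expires_at": expires_at, "hmac": _tag(scope, approver_id, expires_at)}


def verify(envelope: dict, action, isin, notional, side, now: datetime) -> bool:
    """Recompute the scope from the action being executed, then check tag and expiry."""
    scope = scope_hash(action, isin, notional, side)
    expected = _tag(scope, envelope["approver_id"], envelope["expires_at"])
    return hmac.compare_digest(expected, envelope["hmac"]) and now < envelope["expires_at"]
```

Because `verify` recomputes the scope from the action actually being executed, an envelope issued for one ISIN produces a different tag for any other ISIN, notional, or side. Extending `expires_at` by hand fails for the same reason: the expiry is part of the signed message.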
Segregation of duties is enforced at the API, not the UI: the approver identity must be distinct from the agent caller. T3 requires two distinct approvers, both distinct from the caller, with manager role on at least one. A self-approval attempt produces a 403 and writes an audit row that counts toward AIR-OP-18.
SoD enforced at the UI catches the careless. SoD enforced at the API catches the determined. Auditors care about the second case.
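The segregation-of-duties rules above reduce to a small check an API handler could run before accepting approvals. This is an illustrative sketch: the function name, the `(approver_id, role)` pair shape, and the reason strings are hypothetical, and writing the audit row is left to the caller.

```python
def authorize(caller_id: str, approvers: list, tier: str) -> tuple:
    """Return (http_status, reason). approvers is a list of (approver_id, role)."""
    ids = [approver_id for approver_id, _ in approvers]
    # No approver may be the agent caller, at any tier.
    if caller_id in ids:
        return 403, "self-approval"
    # T2 needs one approver; T3 needs two distinct approvers.
    required = {"T2": 1, "T3": 2}.get(tier, 0)
    if len(set(ids)) < required:
        return 403, "insufficient distinct approvers"
    # T3 additionally needs the manager role on at least one approver.
    if tier == "T3" and not any(role == "manager" for _, role in approvers):
        return 403, "no manager approver"
    return 200, "ok"
```

Running the check on the server means a client that skips the UI entirely still hits the same rules.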
Pillar 4 — The AIGF coverage gate
Governance that lives in a quarterly review is not governance. The fourth pillar is continuous monitoring as CI.
Every BondFoundry pull request runs the four-dimension eval harness:
- Accuracy — pricing and risk parity against QuantLib references
- Policy — adversarial prompts that try to bypass tier routing or HITL
- Robustness — prompt injection, jailbreak, manifest tampering
- Latency — per-tier SLO commitments
…and the coverage check:
bondfoundry-finos coverage --threshold 0.85
If AIGF v2.0 coverage drops below 85% or any AIGF risk has zero passing cases, the build fails. The control is enforced on the engineering pull-request graph — the place where regressions actually happen.
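The pass/fail logic of such a gate is simple to sketch. The article does not define how coverage is computed, so this example assumes it is the fraction of taxonomy mitigations with at least one passing eval case. The function name, input shapes, and the `MIT-*` and `RISK-*` IDs are placeholders, not real AIGF identifiers or the CLI's internals.

```python
def coverage_gate(mitigation_covered: dict, passing_cases_per_risk: dict,
                  threshold: float = 0.85) -> tuple:
    """Return (ok, coverage, risks_with_zero_passing_cases)."""
    if not mitigation_covered:
        # An empty taxonomy mapping is treated as a failure, not as 100%.
        return False, 0.0, sorted(passing_cases_per_risk)
    coverage = sum(1 for hit in mitigation_covered.values() if hit) / len(mitigation_covered)
    uncovered = sorted(r for r, n in passing_cases_per_risk.items() if n == 0)
    ok = coverage >= threshold and not uncovered
    return ok, coverage, uncovered
```

The two conditions are independent: a build can sit well above the threshold and still fail because a single risk has no passing case. In CI, the caller would exit non-zero whenever `ok` is false.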
Each AIGF mitigation in BondFoundry carries cross-framework references to NIST AI RMF, NIST 800-53r5, EU AI Act, ISO 42001, SR 11-7, FFIEC, OWASP LLM Top-10 2025, and MAS. One adoption decision, multiple regulatory regimes satisfied.
Nine questions
If you are evaluating an “agentic AI” vendor for a financial workflow, these are the questions you should send before the second meeting. Most pitches fail at question 2.
1. Can you produce, on demand, the verbatim rule text the agent cited when it blocked an action?
2. Show me an audit row. Now show me you cannot edit it.
3. What is your tier model? T0 read-only and T3 irreversible dual-HITL need different controls.
4. Can the agent forge a HITL approval? Is it an HMAC-signed envelope, or a boolean the agent flips itself?
5. Where is SoD enforced: UI, API, or database?
6. What is your evidence pack for a 30-day window, and how long does it take to generate?
7. Do you have a framework mapping with stable IDs, and a CI gate that fails the build when coverage drops?
8. Is tamper-evidence a hash chain across rows, or a single row signature? Only the chain catches inserts.
9. Is there a self-host option, or does every audit row live in someone else’s tenant?
How the pillars combine
The four pillars are not independent.
- The policy gate generates the rows the audit chain stores.
- The audit chain holds the framework_refs the coverage gate validates.
- The coverage gate runs the eval cases that prove the policy gate’s tier routing is correct.
- The HITL envelope is the unforgeable artifact the gate uses to enforce T2 / T3 decisions, and the chain stores them.
Drop any one pillar and the system stops being end-to-end provable. Drop the gate, you cannot cite a rule. Drop the chain, you cannot prove what was decided. Drop the envelope, the agent self-approves. Drop the coverage gate, regressions ship.
This is why governance frameworks like AIGF v2.0 specify a family of controls. The mistake is treating the family as a checklist. The right read is: these pieces work together, and the shape of the system has to reflect that.
Where to start, in order
1. Make the policy gate a pure function. Same inputs → same decision.
2. Move audit immutability into the database.
3. Add the sha256 chain.
4. Make framework_ref NOT NULL, CHECK-constrained against the live taxonomy.
5. Build the HMAC envelope. Generate server-side. Bind via scope_hash.
6. Enforce SoD at the API.
7. Add the coverage gate to CI before the first regression.
If you do these in order, by the time you talk to an auditor the answers are pre-baked.