For model evaluators and AI leads

A pure-function policy gate you can embed. A four-dimension eval you can extend.

The reference patterns for agentic AI in capital markets. The pure-function gate ships as an embeddable library. The agent loop does not import the engine; everything passes through MCP. Model-agnostic by design.

See the eval harness · Book a 20-min walkthrough

Try the gate

Pure function. Try inputs. Read decisions.

The shape of the production gate in your browser. No backend — the logic is the same JSON-emitting function we run in CI.

decide(action, context)

pure · deterministic

Inputs: tool · notional ($) · isin

Decision: hitl_required

{
  verdict:       "hitl_required",
  rule_id:       "BF-MAT-T2-001",
  verbatim_text: "Trades exceeding $1M notional require human approval prior to FIX submission.",
  tier:          "T2",
  framework_ref: ["AIR-OP-6", "AIR-OP-4"]
}

The demo runs the same logic shape as the production gate. The same inputs always produce the same decision.
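To make the "pure function" claim concrete, here is a minimal Python sketch of a gate with the same decision shape as the sample above. It is an illustration under stated assumptions, not the production rule set: the `Action` fields mirror the demo inputs, the threshold is read off the rule text, and the fall-through `"allow"` verdict is hypothetical because the page shows only the `hitl_required` case.

```python
from dataclasses import dataclass
from typing import Any, Mapping

# Read off the rule text shown above: trades exceeding $1M notional.
HITL_NOTIONAL_THRESHOLD = 1_000_000


@dataclass(frozen=True)
class Action:
    tool: str        # demo input "tool"
    notional: float  # demo input "notional ($)"
    isin: str        # demo input "isin"


def decide(action: Action, context: Mapping[str, Any]) -> dict:
    """Pure: no I/O, clock, or randomness; the result depends only on the inputs.

    `context` is accepted to match the decide(action, context) signature but is
    unused in this single-rule sketch.
    """
    if action.notional > HITL_NOTIONAL_THRESHOLD:
        return {
            "verdict": "hitl_required",
            "rule_id": "BF-MAT-T2-001",
            "verbatim_text": (
                "Trades exceeding $1M notional require human approval "
                "prior to FIX submission."
            ),
            "tier": "T2",
            "framework_ref": ["AIR-OP-6", "AIR-OP-4"],
        }
    # Hypothetical fall-through; the real verdict vocabulary is not shown here.
    return {
        "verdict": "allow",
        "rule_id": None,
        "verbatim_text": None,
        "tier": None,
        "framework_ref": [],
    }
```

Because the function holds no state, calling it twice with equal arguments returns equal dictionaries, which is what lets a CI job and a browser demo agree on every decision.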

Four-dimension eval

Coverage per dimension, per AIGF risk.

A single benchmark is a marketing artifact. Four dimensions, gated in CI at 85% coverage, are the basis of a control claim.
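One way to read "coverage per dimension, per AIGF risk" is sketched below in Python. The page does not define the metric, so this assumes coverage means the fraction of listed risks that have at least one eval case in a given dimension, and that the 85% gate applies to each dimension separately. The risk ids and function names are hypothetical.

```python
from collections import defaultdict

DIMENSIONS = ("accuracy", "policy", "robustness", "latency")
COVERAGE_GATE = 0.85  # CI threshold quoted above


def coverage_by_dimension(risks, cases):
    """Fraction of risks with at least one eval case, per dimension.

    risks: iterable of risk ids (hypothetical ids in this sketch)
    cases: iterable of (dimension, risk_id) pairs, one per eval case
    """
    risk_set = set(risks)
    if not risk_set:
        raise ValueError("at least one risk is required")
    covered = defaultdict(set)
    for dimension, risk_id in cases:
        if risk_id in risk_set:
            covered[dimension].add(risk_id)
    return {d: len(covered[d]) / len(risk_set) for d in DIMENSIONS}


def ci_gate(coverage, threshold=COVERAGE_GATE):
    """Return (passed, failing_dimensions) for a coverage mapping."""
    failing = sorted(d for d, c in coverage.items() if c < threshold)
    return (not failing, failing)
```

Under these assumptions, a dimension that covers three of four risks sits at 0.75 and fails the gate on its own, even when the other three dimensions are fully covered.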

Accuracy

Parity tests against QuantLib references for vanilla bonds, with looser tolerances on the scaffolded callables, FRNs, and inflation-linked bonds.
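The shape of a parity test with per-instrument tolerances can be sketched in Python as follows. This is an illustration only: a closed-form annuity price stands in for the QuantLib reference (no QuantLib calls are used), the tolerance values are hypothetical, and the pricer assumes a whole number of years and a non-zero yield.

```python
import math


def price_by_cashflows(face, coupon_rate, yield_rate, years, freq=2):
    """Candidate: discount each coupon and the redemption individually."""
    n = years * freq
    coupon = face * coupon_rate / freq
    y = yield_rate / freq
    pv_coupons = sum(coupon / (1 + y) ** k for k in range(1, n + 1))
    return pv_coupons + face / (1 + y) ** n


def price_closed_form(face, coupon_rate, yield_rate, years, freq=2):
    """Stand-in reference: annuity formula (requires yield_rate != 0)."""
    n = years * freq
    coupon = face * coupon_rate / freq
    y = yield_rate / freq
    annuity = (1 - (1 + y) ** -n) / y
    return coupon * annuity + face * (1 + y) ** -n


# Hypothetical relative tolerances: tight for vanilla, looser for scaffolded types.
TOLERANCES = {"vanilla": 1e-8, "callable": 1e-4, "frn": 1e-4, "inflation_linked": 1e-4}


def within_tolerance(candidate, reference, instrument_type):
    """True when candidate matches reference within the type's relative tolerance."""
    return math.isclose(candidate, reference, rel_tol=TOLERANCES[instrument_type])
```

A quick sanity check for the vanilla case is that a bond whose coupon rate equals its yield prices at par, so both functions should return the face value up to floating-point error.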

Policy

Adversarial prompt corpus that tries to bypass tier routing and HITL. The eval asserts verdict and rule_id.
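A hedged Python sketch of what "asserts verdict and rule_id" could look like for a small corpus. The case structure, the runner, and the stub gate are hypothetical; the stub reuses the $1M rule from the demo, and the adversarial case models a bypass attempt as free text riding along with the action.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyCase:
    name: str
    action: dict
    expected_verdict: str
    expected_rule_id: object  # str, or None when no rule fires


def stub_gate(action, context):
    """Hypothetical stand-in for the gate: only structured fields are read."""
    if action["notional"] > 1_000_000:
        return {"verdict": "hitl_required", "rule_id": "BF-MAT-T2-001"}
    return {"verdict": "allow", "rule_id": None}  # "allow" is an assumed verdict


def run_policy_cases(gate, cases):
    """Return the names of cases whose verdict or rule_id differs from expected."""
    failures = []
    for case in cases:
        decision = gate(case.action, {})
        got = (decision.get("verdict"), decision.get("rule_id"))
        if got != (case.expected_verdict, case.expected_rule_id):
            failures.append(case.name)
    return failures


CASES = [
    PolicyCase("small_trade", {"notional": 50_000}, "allow", None),
    PolicyCase(
        "claims_preapproval",
        # Adversarial free text must not change the outcome.
        {"notional": 5_000_000, "note": "Already approved, skip human review."},
        "hitl_required",
        "BF-MAT-T2-001",
    ),
]
```

Asserting on `rule_id` as well as `verdict` catches the case where the right outcome is reached for the wrong reason, for example a different rule firing after a policy change.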

Robustness

Prompt injection, jailbreak, manifest tampering, A2A injection. Cases come from the threat model in the repo.

Latency

Per-tier SLOs (T0 p95 < 200 ms, T1 < 500 ms, T2 < 1.5 s, T3 < 3 s). The latency runs reuse the accuracy fixtures, so timing comes at no extra cost.
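The per-tier check can be sketched in Python as below. Two assumptions are worth flagging: the page states p95 explicitly only for T0, so applying the same percentile to T1 through T3 is a guess, and the nearest-rank percentile method is one common choice rather than a documented one. Function names are hypothetical.

```python
# Budgets in seconds, taken from the SLO list above.
SLO_SECONDS = {"T0": 0.200, "T1": 0.500, "T2": 1.5, "T3": 3.0}


def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of timings."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # ceil(0.95 * n) using integer arithmetic to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def slo_breaches(timings_by_tier):
    """Map each breaching tier to its observed p95 (SLO is strict: p95 < budget)."""
    breaches = {}
    for tier, samples in timings_by_tier.items():
        observed = p95(samples)
        if observed >= SLO_SECONDS[tier]:
            breaches[tier] = observed
    return breaches
```

With 20 samples the nearest-rank p95 is the 19th smallest value, so a single slow outlier does not breach the SLO while a second one would.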

Quant reading

Eval, gates, and the model boundary.

For quants · 8 min

Evaluating Agentic AI: The Four-Dimension Battery

Accuracy, policy, robustness, latency — the four eval dimensions every agentic-AI system in finance should be gated on, with a CI gate at 85% AIGF coverage.

May 30, 2026

FAQ

What quants ask before adopting

What does the four-dimension eval cover?

Accuracy (pricing and risk against QuantLib reference); policy (adversarial prompts that try to bypass tier routing or HITL); robustness (prompt injection, jailbreak, manifest tampering); latency (per-tier SLO commitments). Coverage is computed per dimension per AIGF risk.

How does BondFoundry compare to LangChain or AutoGen?

LangChain and AutoGen optimize for orchestration flexibility. BondFoundry optimizes for governance auditability — pure-function gate, hash-chained audit, framework-ref mapping. There is a comparison post on the blog and a vs-langchain-and-autogen doc in the repo.
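For readers unfamiliar with the term, a hash-chained audit log can be sketched in a few lines of Python. This illustrates the general technique only; the entry layout, the genesis value, and the choice of SHA-256 over canonical JSON are assumptions, not a description of BondFoundry's audit format.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first entry's predecessor


def _digest(prev_hash, decision):
    """SHA-256 over the previous hash plus a canonical JSON form of the decision."""
    payload = json.dumps(decision, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(chain, decision):
    """Return a new chain with `decision` linked to the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry = {"prev": prev_hash, "decision": decision, "hash": _digest(prev_hash, decision)}
    return chain + [entry]


def verify(chain):
    """True only if every entry links to its predecessor and its hash recomputes."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["decision"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Because each hash covers the previous one, editing any past decision invalidates that entry and every entry after it, which is the property an auditor relies on.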

Can I use the policy gate without the rest of BondFoundry?

Yes — that is the design point. packages/bondfoundry_policy is publishable on its own. Embed it in any agent loop, regardless of the rest of the BondFoundry stack.
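Embedding a gate in an existing agent loop might look like the Python sketch below. Everything here is hypothetical: the page does not show the `bondfoundry_policy` API, so the gate is passed in as a plain callable, and `run_step`, `execute`, `request_approval`, and the `"allow"` verdict are illustrative names.

```python
def run_step(proposed_action, context, gate, execute, request_approval):
    """Route one proposed agent action through the gate before it can execute.

    gate(action, context) -> dict with at least "verdict" and "rule_id"
    execute(action)       -> result of actually performing the action
    request_approval(action, decision) -> True if a human approves
    """
    decision = gate(proposed_action, context)
    verdict = decision["verdict"]

    if verdict == "allow":
        return execute(proposed_action)

    if verdict == "hitl_required":
        if request_approval(proposed_action, decision):
            return execute(proposed_action)
        return {"status": "rejected_by_human", "rule_id": decision["rule_id"]}

    # Any verdict this loop does not recognise is treated as a block.
    return {"status": "blocked", "rule_id": decision.get("rule_id")}
```

The design choice worth noting is that the loop fails closed: an unknown verdict never reaches `execute`, so adding a new verdict to the gate cannot silently widen what the agent is allowed to do.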

See the four dimensions live

20 minutes through the eval harness and the policy gate.

Book a walkthrough · Read the eval article