Verifiable Trust
A new employee joins your team with impressive credentials. A graduate degree from a top program. Strong recommendations. A polished interview.
You believe they can do the job. But you do not hand them unrestricted access on day one.
You start with defined responsibilities. You watch how they perform. When they consistently meet expectations, you expand their scope. When they miss deadlines, you slow down. When they handle pressure well, you speed up. Trust grows with evidence. Credentials open the door. Behavior determines how far someone goes inside.
When organizations deploy AI systems, they often abandon this logic entirely. A system performs well in testing. Leadership approves deployment. The system is granted broad permissions immediately, operating with authority it has never earned through observed behavior.
The Capability Trap
Capability and consistency are different qualities. A system can be highly capable on average and still be unreliable when conditions change. It can perform brilliantly in demonstrations and behave unpredictably in production.
The history of AI deployment reflects this pattern. Systems that excelled in testing environments produced harmful outputs when users probed them with unexpected prompts. Agents that handled routine cases flawlessly made baffling decisions when context shifted slightly. The issue was misplaced trust in capability rather than verified performance.
Claimed capability sounds like this: "This system achieved 98 percent accuracy on a benchmark. It can handle customer service inquiries."
Demonstrated consistency sounds like this: "This system has processed 47,000 customer inquiries over six months. It maintained policy compliance in 99.2 percent of cases. All exceptions fell within defined parameters. Trust level is high for routine inquiries. Escalation is required for refunds over $500."
The difference is how knowledge is formed. Claimed capability relies on inference. Demonstrated consistency relies on evidence.
How Trust Calibration Works
Trust is not a setting applied at deployment. It is a variable that changes in response to evidence.
Trust Profiles
Every agent has a TrustProfile tracking its earned trust:
```python
profile = runtime.trust_calibrator.get_profile("my-agent")
print(f"Overall trust: {profile.overall_trust:.2f}")  # 0.0–1.0
print(f"Successful actions: {profile.successful_actions}")
print(f"Violations: {profile.violation_count}")
print(f"Violation rate: {profile.violation_rate:.1%}")
```
Trust starts at baseline (0.5) and moves based on observed behavior.
Asymmetric Calibration
Building trust is deliberately harder than losing it:
| Event | Trust delta | Recovery requirement |
| --- | --- | --- |
| Action allowed | +0.01 | — |
| Action completed successfully | +0.005 | — |
| Action denied (violation) | −0.05 | 5 successful actions |
| Action interrupted | −0.03 | 3 successful actions |
One violation costs as much trust as five successes earn. This prevents trust farming — rapidly executing many low-risk actions to build trust for a single high-risk action.
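The asymmetric update can be sketched as follows. This is a minimal illustration, not the actual API: the class shape and event names are assumptions, while the deltas and the 0.5 baseline come from the table and text above.

```python
# Illustrative sketch of asymmetric trust calibration.
# Deltas mirror the table above; class and method names are hypothetical.
BASELINE = 0.5

DELTAS = {
    "allowed": +0.01,
    "completed": +0.005,
    "denied": -0.05,       # one violation costs as much as five successes earn
    "interrupted": -0.03,
}

class TrustProfile:
    def __init__(self) -> None:
        self.overall_trust = BASELINE
        self.successful_actions = 0
        self.violation_count = 0

    def record(self, event: str) -> float:
        """Apply the delta for an observed event, clamped to [0.0, 1.0]."""
        self.overall_trust = min(1.0, max(0.0, self.overall_trust + DELTAS[event]))
        if event in ("allowed", "completed"):
            self.successful_actions += 1
        elif event == "denied":
            self.violation_count += 1
        return self.overall_trust
```

Because the penalty for a denial is five times the reward for an allowed action, an agent cannot "buy back" a violation with a burst of trivial successes.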
Per-Dimension Trust
Trust is tracked per dimension, not just overall. An agent that consistently triggers concerns on one dimension (e.g., cascading impact) will have its trust lowered specifically for that dimension, even if other dimensions are fine.
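A sketch of what per-dimension tracking implies, assuming a simple mapping from dimension name to score (the container, dimension names, and penalty size are all illustrative):

```python
from collections import defaultdict

# Hypothetical per-dimension trust store: each dimension starts at baseline
# and is penalized independently of the others.
class DimensionalTrust:
    def __init__(self, baseline: float = 0.5) -> None:
        self.dimensions = defaultdict(lambda: baseline)

    def penalize(self, dimension: str, delta: float = 0.05) -> float:
        """Lower trust for one dimension only, clamped at zero."""
        self.dimensions[dimension] = max(0.0, self.dimensions[dimension] - delta)
        return self.dimensions[dimension]
```

An agent repeatedly penalized on "cascading_impact" would see only that dimension drop, leaving its trust on unrelated dimensions at baseline.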
Time Decay
Trust that isn't actively reinforced drifts back toward baseline. A once-trusted agent that hasn't been active gradually returns to neutral. This prevents stale high-trust profiles from persisting after an agent has been idle.
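One way to picture the decay, as an exponential drift toward baseline. The decay rate here is an assumed value for illustration; only the baseline of 0.5 and the drift-toward-neutral behavior come from the text:

```python
BASELINE = 0.5
DECAY_RATE = 0.05  # fraction of the gap to baseline closed per idle period (assumed)

def decay_trust(trust: float, idle_periods: int) -> float:
    """Move trust toward baseline exponentially with idle time."""
    for _ in range(idle_periods):
        trust = BASELINE + (trust - BASELINE) * (1 - DECAY_RATE)
    return trust
```

A high-trust agent (say 0.9) that goes idle drifts steadily back toward 0.5, while an agent already at baseline stays there.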
Drift Integration
Behavioral drift directly erodes trust, scaled by drift severity and confidence:
| Drift score | Trust penalty |
| --- | --- |
| < 0.10 | None (normal variance) |
| 0.10 – 0.20 | −0.002 |
| 0.20 – 0.40 | −0.008 |
| 0.40 – 0.60 | −0.02 |
| ≥ 0.60 | −0.04 |
When drift recovers (drops below 0.15 from above), trust receives a small recovery bonus (+0.003).
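The mapping from drift score to penalty, plus the recovery bonus, can be written out directly. The band edges, penalties, 0.15 recovery threshold, and +0.003 bonus are the values above; the function names are illustrative:

```python
# Sketch of drift-driven trust erosion; values come from the table and
# text above, function names are hypothetical.
def drift_trust_penalty(drift_score: float) -> float:
    """Base trust penalty for a given drift score."""
    if drift_score < 0.10:
        return 0.0          # normal variance, no penalty
    if drift_score < 0.20:
        return -0.002
    if drift_score < 0.40:
        return -0.008
    if drift_score < 0.60:
        return -0.02
    return -0.04

def recovery_bonus(previous_drift: float, current_drift: float) -> float:
    """Small trust recovery when drift drops back below 0.15."""
    return 0.003 if previous_drift >= 0.15 and current_drift < 0.15 else 0.0
```

In the actual system the base penalty is further scaled by drift severity and confidence; this sketch shows only the banded mapping.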
How Trust Affects Governance
Trust feeds back into every governance evaluation:
UCS modulation: Higher trust shifts UCS scores upward. Lower trust shifts them down. At default influence (0.2), trust can move the score by ±10%.
Tier 3 decisions: High trust (> 0.7) tips borderline UCS scores to ALLOW. Low trust (< 0.4) triggers ESCALATE.
Human override: Agents with trust below 0.3 require human approval for all actions.
Interrupt sensitivity: Lower trust increases the likelihood of monitoring and interruption during execution.
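The UCS modulation can be sketched with a simple linear formula. This assumes the modulation is linear in the distance from neutral trust (0.5); under that assumption, an influence of 0.2 produces exactly the ±10% swing described above. The formula and names are illustrative, not the actual implementation:

```python
TRUST_INFLUENCE = 0.2  # default trust influence from the text

def modulated_ucs(ucs: float, trust: float, influence: float = TRUST_INFLUENCE) -> float:
    """Shift a UCS score up or down based on earned trust.

    trust is in [0, 1], so (trust - 0.5) spans [-0.5, +0.5] and the
    multiplier spans 1 ± influence/2, i.e. ±10% at influence 0.2.
    """
    return ucs * (1 + influence * (trust - 0.5))
```

A borderline score of 0.7 becomes 0.77 for a fully trusted agent and 0.63 for a fully distrusted one, while neutral trust (0.5) leaves the score unchanged.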
Configuration
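The calibration knobs discussed on this page might be exposed along these lines. This is a hypothetical configuration sketch: the field names are assumptions, and the defaults are simply the values quoted elsewhere on this page.

```python
from dataclasses import dataclass

# Hypothetical configuration surface; field names are assumed,
# defaults reflect the values quoted in this page.
@dataclass
class TrustCalibrationConfig:
    baseline_trust: float = 0.5            # starting trust for new agents
    allow_delta: float = 0.01              # per allowed action
    success_delta: float = 0.005           # per completed action
    violation_delta: float = -0.05         # per denied action
    interrupt_delta: float = -0.03         # per interrupted action
    trust_influence: float = 0.2           # weight of trust in UCS modulation
    human_override_threshold: float = 0.3  # below this, all actions need approval

config = TrustCalibrationConfig()
```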
Cost-Aware Audit Trails
When cost-sensitive governance is enabled, every verdict's BehavioralProvenance records the active CostProfile and derived thresholds alongside the trust and drift data. This creates a complete decision-theoretic audit trail.
Each provenance snapshot includes:
- cost_profile_active: Whether cost-sensitive thresholds were in effect
- cost_false_allow / cost_false_deny: The asymmetric error costs that shaped the decision boundary
- derived_allow_threshold / derived_deny_threshold: The actual thresholds used, derived from signal detection theory
- threshold_source: Whether thresholds came from a cost_profile, were smoothed from a previous cycle, or used static_default values
- threshold_shift_from_default: How far the cost profile moved the allow threshold from the static default (0.7)
This means auditors can reconstruct not just what was decided, but why the decision boundary was where it was. A DENY verdict with threshold_source="cost_profile" and cost_false_allow=1.0 tells a clear story: the system denied this action because the cost of a false allow was catastrophic, and the derived thresholds reflected that asymmetry.
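A snapshot with these fields might look like the following. The dict shape and the specific threshold values are illustrative assumptions; only the field names, the 0.7 static default, and the "catastrophic false allow" scenario come from the text:

```python
# Illustrative provenance snapshot for the DENY scenario described above.
# Field names follow the list above; the numeric values are assumed.
snapshot = {
    "cost_profile_active": True,
    "cost_false_allow": 1.0,               # false allow judged catastrophic
    "cost_false_deny": 0.1,
    "derived_allow_threshold": 0.85,       # raised well above the 0.7 default
    "derived_deny_threshold": 0.35,
    "threshold_source": "cost_profile",    # vs "smoothed" or "static_default"
    "threshold_shift_from_default": 0.15,  # 0.7 default + 0.15 = 0.85
}

# An auditor can confirm the shift is internally consistent:
STATIC_DEFAULT = 0.7
assert abs(
    STATIC_DEFAULT + snapshot["threshold_shift_from_default"]
    - snapshot["derived_allow_threshold"]
) < 1e-9
```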
The Principle
Deployment is the beginning of evaluation, not the end. Authority is adjusted continuously, not granted permanently. Trust is earned, not assumed.
Organizations that verify with evidence can exercise authority appropriately. Those that rely on assumptions risk being unprepared when failures emerge.