Dynamic Trust
Trust is not a setting applied at deployment. It is a variable that changes in response to evidence.
Every agent starts at baseline trust (0.5) and earns or loses trust through observed behavior. The asymmetry is intentional: building trust is harder than losing it. One violation costs as much as five successes earn. This prevents trust farming — rapidly executing low-risk actions to build trust for a single high-risk action.
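Concretely, with the default +0.01 / -0.05 values, five successes are exactly cancelled by one violation. A quick sketch of that arithmetic:

```python
SUCCESS_GAIN = 0.01
VIOLATION_COST = -0.05

trust = 0.5  # baseline
# Five successful low-risk actions...
for _ in range(5):
    trust += SUCCESS_GAIN
# ...are wiped out by a single violation.
trust += VIOLATION_COST
print(round(trust, 2))  # 0.5 — farming five successes buys exactly one violation
```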
The Calibration Loop
Every governance evaluation produces an outcome. Every outcome is evidence. Evidence updates trust. Updated trust influences the next evaluation. This is the closed loop that makes governance adaptive rather than static.
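The closed loop can be sketched in a few lines. Everything here is illustrative — the function names and the risk-versus-trust comparison are assumptions, not the runtime's actual evaluation logic; only the +0.01 / -0.05 deltas come from this page:

```python
def evaluate(action_risk: float, trust: float) -> str:
    # Illustrative stand-in for a governance evaluation: higher trust
    # tolerates higher-risk actions.
    return "ALLOW" if action_risk <= trust else "DENY"

def calibrate(trust: float, outcome: str) -> float:
    # Outcomes are evidence: successes earn +0.01, violations cost -0.05.
    delta = 0.01 if outcome == "ALLOW" else -0.05
    return min(1.0, max(0.0, trust + delta))

trust = 0.5
for risk in [0.2, 0.3, 0.9, 0.4]:
    decision = evaluate(risk, trust)
    trust = calibrate(trust, decision)  # updated trust feeds the next evaluation
```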
Trust Profiles
Every agent has a TrustProfile tracking its earned trust:
```python
profile = runtime.trust_calibrator.get_profile("my-agent")

print(f"Overall trust: {profile.overall_trust:.2f}")        # 0.0–1.0
print(f"Successful actions: {profile.successful_actions}")
print(f"Violations: {profile.violation_count}")
print(f"Violation rate: {profile.violation_rate:.1%}")
```

Events That Calibrate Trust
| Event | Adjustment | Rationale |
|---|---|---|
| Successful action completion | +0.01 | Positive evidence of reliable behavior |
| Governance violation (DENY) | -0.05 | Direct evidence of a boundary problem |
| Execution interrupted | -0.03 | Governance had to intervene mid-action |
| Drift alert (medium) | -0.02 | Behavioral baseline shifting |
| Drift alert (high/critical) | -0.05 | Significant departure from established behavior |
| Override: revoke an approval | -0.05 | Human confirmed governance should have denied |
| Override: approve a denial | 0.00 | Human accepted the risk; governance correctly flagged it |
| Time decay (per idle hour) | toward baseline | Trust earned must be actively maintained |
The 5:1 asymmetry between violation cost and success gain reflects the actual asymmetry of risk: one serious violation can cause more harm than fifty successful actions prevent.
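Applied as code, the event table above becomes a simple lookup. The event names here are illustrative, not the runtime's actual identifiers:

```python
# Calibration deltas mirroring the events table; keys are illustrative.
EVENT_DELTAS = {
    "success": +0.01,
    "violation": -0.05,
    "interrupted": -0.03,
    "drift_medium": -0.02,
    "drift_high": -0.05,
    "override_revoke": -0.05,
    "override_approve_denial": 0.00,
}

def apply_event(trust: float, event: str) -> float:
    # Clamp to the valid 0.0–1.0 trust range.
    return min(1.0, max(0.0, trust + EVENT_DELTAS[event]))

trust = 0.5
for event in ["success", "success", "interrupted"]:
    trust = apply_event(trust, event)
print(round(trust, 2))  # 0.49
```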
Per-Dimension Trust
Trust is tracked per dimension, not just overall. An agent that consistently triggers concerns on one dimension (e.g., cascading impact) will have its trust lowered specifically for that dimension, even if other dimensions are fine.
When an action is denied, dimensions that vetoed or scored below 0.3 have their per-dimension trust decreased. When an action is allowed and a dimension scores above 0.7, that dimension's trust is gradually restored.
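A minimal sketch of the per-dimension rule, assuming the 0.3 / 0.7 score thresholds quoted above; the adjustment sizes and function name are assumptions:

```python
def update_dimension_trust(dim_trust: dict, scores: dict, decision: str) -> dict:
    """Adjust per-dimension trust based on how each dimension scored."""
    updated = dict(dim_trust)
    for dim, score in scores.items():
        if decision == "DENY" and score < 0.3:
            # Dimensions that vetoed or scored low lose trust.
            updated[dim] = max(0.0, updated[dim] - 0.05)
        elif decision == "ALLOW" and score > 0.7:
            # High-scoring dimensions gradually recover trust.
            updated[dim] = min(1.0, updated[dim] + 0.01)
    return updated

dims = {"cascading_impact": 0.5, "reversibility": 0.5}
dims = update_dimension_trust(
    dims, {"cascading_impact": 0.1, "reversibility": 0.8}, "DENY"
)
print(round(dims["cascading_impact"], 2))  # 0.45 — only the low scorer is penalized
```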
Trust Trajectory
The calibrator tracks not just the current trust score but its trajectory — whether trust is trending upward, stable, or declining over recent history.
| Trajectory | Governance effect |
|---|---|
| Rising | Tier 3 tips ambiguous decisions toward ALLOW |
| Stable | Standard Tier 3 evaluation |
| Declining | Tier 3 tips ambiguous decisions toward ESCALATE |
| Declining (steep) | Increased scrutiny across all tiers |
A declining trajectory increases scrutiny even when the absolute trust score is still above the action minimum. An agent with a trust score of 0.6 that has been declining for 50 actions is treated with more caution than an agent at 0.55 whose trust is stable.
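One way to classify a trajectory is a least-squares slope over recent scores. The window size and slope thresholds below are assumptions for illustration, not documented values:

```python
def trajectory(history: list, steep: float = 0.004, flat: float = 0.001) -> str:
    """Classify a trust trajectory from a window of recent scores."""
    n = len(history)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(history) / n
    # Least-squares slope of trust over time.
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, history)) / \
            sum((x - x_mean) ** 2 for x in xs)
    if slope <= -steep:
        return "declining_steep"
    if slope < -flat:
        return "declining"
    if slope > flat:
        return "rising"
    return "stable"

print(trajectory([0.60, 0.59, 0.58, 0.57, 0.56]))  # declining_steep
```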
How Trust Affects Governance
Trust feeds back into every governance evaluation at two points:
UCS modulation — The trust score shifts how the UCS is calculated. At default influence (0.2), trust can move the score by ±10%. Higher trust shifts UCS scores upward. Lower trust shifts them down.
Tier 3 deliberation — For actions in the ambiguity zone that Tier 2 cannot resolve, Tier 3 uses trust as the deciding input:
High trust (> 0.7) + borderline UCS (> 0.5) → ALLOW
Low trust (< 0.4) → ESCALATE to human review

Agents with trust below 0.3 require human approval for all actions (via the Human Override dimension).
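Both feedback points can be sketched together. The modulation formula is an assumption chosen to match the stated numbers — at influence 0.2, trust a full 0.5 above baseline shifts the UCS by +0.10, the quoted ±10% — and the function names are illustrative:

```python
BASELINE = 0.5

def modulate_ucs(ucs: float, trust: float, influence: float = 0.2) -> float:
    # Trust above baseline shifts the UCS up, below baseline shifts it down.
    # At influence 0.2 the shift is at most ±0.10 (trust spans 0.0–1.0).
    return min(1.0, max(0.0, ucs + influence * (trust - BASELINE)))

def tier3_decide(ucs: float, trust: float) -> str:
    if trust > 0.7 and ucs > 0.5:
        return "ALLOW"
    if trust < 0.4:
        return "ESCALATE"
    return "STANDARD_EVALUATION"

print(tier3_decide(modulate_ucs(0.48, 0.8), 0.8))  # ALLOW
```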
Drift-Driven Calibration
Behavioral drift feeds directly into trust calibration. When the drift monitor detects that an agent's behavior has shifted from its established baseline, it applies a trust penalty proportional to drift severity:
| Drift score | Trust penalty |
|---|---|
| < 0.10 | None (normal variance) |
| 0.10 – 0.20 | -0.002 |
| 0.20 – 0.40 | -0.008 |
| 0.40 – 0.60 | -0.02 |
| ≥ 0.60 | -0.04 |
When drift recovers (drops below 0.15 from above), trust receives a small recovery bonus (+0.003). All adjustments are scaled by the drift score's confidence.
This means an agent does not need to trigger a governance violation to have its trust reduced. Behavioral drift alone — detected before any boundary is crossed — is sufficient evidence for the calibrator to act.
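The severity bands translate directly into a lookup. The confidence scaling follows the text; the function name is illustrative:

```python
def drift_penalty(drift: float, confidence: float) -> float:
    """Map a drift score to a trust adjustment, scaled by confidence."""
    if drift < 0.10:
        delta = 0.0          # normal variance, no penalty
    elif drift < 0.20:
        delta = -0.002
    elif drift < 0.40:
        delta = -0.008
    elif drift < 0.60:
        delta = -0.02
    else:
        delta = -0.04
    return delta * confidence  # all adjustments scale with confidence

print(drift_penalty(0.45, 0.5))  # -0.01
```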
Override-Driven Calibration
Human override decisions are evidence too. When a human revokes an approval — confirming that governance should have denied an action it allowed — the calibrator applies a penalty equivalent to a direct violation (-0.05).
When a human approves a denial, no trust change occurs. Governance correctly identified a risk; the human chose to accept it. This is not evidence that governance erred.
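A sketch of the two override cases, using the -0.05 penalty quoted above; the case labels are illustrative:

```python
def apply_override(trust: float, override: str) -> float:
    if override == "revoke_approval":
        # Human confirmed governance should have denied: treat as a violation.
        return max(0.0, trust - 0.05)
    if override == "approve_denial":
        # Governance flagged a real risk; the human accepted it. No change.
        return trust
    raise ValueError(f"unknown override: {override}")

print(round(apply_override(0.6, "revoke_approval"), 2))  # 0.55
print(apply_override(0.6, "approve_denial"))             # 0.6
```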
Time Decay
Trust that isn't actively reinforced drifts back toward baseline. A once-trusted agent that hasn't been active gradually returns to neutral. This prevents stale high-trust profiles from persisting after an agent has been idle.
The decay rate is configurable. At the default rate (0.01 per hour), an agent at trust 0.8 that goes idle will decay back to 0.5 over approximately 30 hours.
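Assuming linear decay toward baseline, the quoted numbers check out: trust 0.8 sits 0.3 above baseline, and 0.3 / 0.01 per hour = 30 hours. A sketch under that assumption:

```python
BASELINE = 0.5
DECAY_PER_HOUR = 0.01  # default rate quoted above

def decay(trust: float, idle_hours: float) -> float:
    """Move trust toward baseline, never overshooting it."""
    shift = min(abs(trust - BASELINE), DECAY_PER_HOUR * idle_hours)
    return trust - shift if trust > BASELINE else trust + shift

print(round(decay(0.8, 30), 2))  # 0.5 — back to baseline after ~30 idle hours
print(round(decay(0.8, 10), 2))  # 0.7
```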
Configuration
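No concrete schema is documented on this page, so the dictionary below is purely hypothetical: every key name is invented for illustration, while the values mirror the defaults quoted elsewhere on this page:

```python
# Hypothetical configuration mirroring the defaults quoted on this page.
# Key names are illustrative, not the runtime's actual schema.
TRUST_CALIBRATION_DEFAULTS = {
    "baseline_trust": 0.5,
    "success_gain": 0.01,
    "violation_penalty": 0.05,
    "interruption_penalty": 0.03,
    "decay_per_idle_hour": 0.01,
    "ucs_trust_influence": 0.2,
    "escalation_trust_floor": 0.4,
    "human_approval_trust_floor": 0.3,
}
```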
Manual Adjustment
Trust can be adjusted manually by a human authority. Manual adjustments are recorded in the audit trail with the actor, timestamp, and reason.
:::warning Manual trust adjustments bypass the calibration loop and take effect immediately. They are appropriate for responding to external findings or one-time events. For ongoing governance, rely on calibration rather than manual adjustment. :::
CLI