# Dynamic Trust

Trust is not a setting applied at deployment. It is a variable that changes in response to evidence.

Every agent starts at baseline trust (0.5) and earns or loses trust through observed behavior. The asymmetry is intentional: building trust is harder than losing it. One violation costs as much as five successes earn. This prevents trust farming — rapidly executing low-risk actions to build trust for a single high-risk action.

## The Calibration Loop

Every governance evaluation produces an outcome. Every outcome is evidence. Evidence updates trust. Updated trust influences the next evaluation. This is the closed loop that makes governance adaptive rather than static.
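A minimal sketch of this loop, using illustrative names (`Calibrator`, `Outcome`, `record`) that are not the runtime's actual API, with the default deltas of +0.01 per success and -0.05 per violation:

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    success: bool


class Calibrator:
    """Illustrative calibrator: evidence in, updated trust out."""

    SUCCESS_DELTA = 0.01     # default gain per successful action
    VIOLATION_DELTA = -0.05  # default cost per violation (5:1 asymmetry)

    def __init__(self, baseline: float = 0.5):
        self.trust = baseline  # every agent starts at the 0.5 baseline

    def record(self, outcome: Outcome) -> None:
        delta = self.SUCCESS_DELTA if outcome.success else self.VIOLATION_DELTA
        self.trust = min(1.0, max(0.0, self.trust + delta))  # clamp to [0, 1]


cal = Calibrator()
for _ in range(5):
    cal.record(Outcome(success=True))   # five successes earn +0.05
cal.record(Outcome(success=False))      # one violation erases all of it
print(round(cal.trust, 2))  # 0.5
```

Each `record` call is one pass around the loop: the outcome becomes evidence, and the updated `trust` value is what the next evaluation reads.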

## Trust Profiles

Every agent has a TrustProfile tracking its earned trust:

```python
profile = runtime.trust_calibrator.get_profile("my-agent")

print(f"Overall trust: {profile.overall_trust:.2f}")     # 0.0–1.0
print(f"Successful actions: {profile.successful_actions}")
print(f"Violations: {profile.violation_count}")
print(f"Violation rate: {profile.violation_rate:.1%}")
```

## Events That Calibrate Trust

| Event | Default Delta | Reasoning |
| --- | --- | --- |
| Successful action completion | +0.01 | Positive evidence of reliable behavior |
| Governance violation (DENY) | -0.05 | Direct evidence of boundary problem |
| Execution interrupted | -0.03 | Governance had to intervene mid-action |
| Drift alert (medium) | -0.02 | Behavioral baseline shifting |
| Drift alert (high/critical) | -0.05 | Significant departure from established behavior |
| Override: revoke an approval | -0.05 | Human confirmed governance should have denied |
| Override: approve a denial | 0.00 | Human accepted risk; governance correctly flagged |
| Time decay (per hour idle) | toward baseline | Trust earned must be actively maintained |

The 5:1 asymmetry between violation cost and success gain reflects the actual asymmetry of risk: one serious violation can cause more harm than fifty successful actions prevent.

## Per-Dimension Trust

Trust is tracked per dimension, not just overall. An agent that consistently triggers concerns on one dimension (e.g., cascading impact) will have its trust lowered specifically for that dimension, even if other dimensions are fine.

When an action is denied, dimensions that vetoed or scored below 0.3 have their per-dimension trust decreased. When an action is allowed and a dimension scores above 0.7, that dimension's trust is gradually restored.
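These rules can be sketched as follows. The 0.3 and 0.7 thresholds come from the text; the step sizes and the function shape are illustrative assumptions:

```python
LOW, HIGH = 0.3, 0.7        # documented score thresholds
PENALTY, RECOVERY = 0.05, 0.01  # assumed step sizes


def update_dimension_trust(trust: dict, scores: dict,
                           vetoed: set, allowed: bool) -> dict:
    updated = dict(trust)
    for dim, score in scores.items():
        if not allowed and (dim in vetoed or score < LOW):
            # Dimension contributed to the denial: lower its trust
            updated[dim] = max(0.0, updated[dim] - PENALTY)
        elif allowed and score > HIGH:
            # Clean pass on this dimension: gradually restore trust
            updated[dim] = min(1.0, updated[dim] + RECOVERY)
    return updated


trust = {"cascading_impact": 0.5, "reversibility": 0.5}
trust = update_dimension_trust(
    trust,
    scores={"cascading_impact": 0.2, "reversibility": 0.8},
    vetoed=set(),
    allowed=False,
)
# cascading_impact drops to 0.45; reversibility is untouched on a denial
```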

## Trust Trajectory

The calibrator tracks not just the current trust score but its trajectory — whether trust is trending upward, stable, or declining over recent history.

| Trajectory | Governance Effect |
| --- | --- |
| Rising | Tier 3 tips ambiguous decisions toward ALLOW |
| Stable | Standard Tier 3 evaluation |
| Declining | Tier 3 tips ambiguous decisions toward ESCALATE |
| Declining (steep) | Increased scrutiny across all tiers |

A declining trajectory increases scrutiny even when the absolute trust score is still above the action minimum. An agent with a trust score of 0.6 that has been declining for 50 actions is treated with more caution than an agent at 0.55 whose trust is stable.
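One way to detect a trajectory is a slope over a sliding window of recent scores. The 50-action window echoes the example above; the slope thresholds here are invented for illustration, not documented values:

```python
def trajectory(history: list, window: int = 50) -> str:
    """Classify recent trust movement as rising, stable, or declining."""
    recent = history[-window:]
    if len(recent) < 2:
        return "stable"
    slope = (recent[-1] - recent[0]) / len(recent)
    if slope > 0.0005:
        return "rising"
    if slope < -0.002:
        return "declining (steep)"
    if slope < -0.0005:
        return "declining"
    return "stable"


# An agent at 0.6 that has been falling steadily reads as declining,
# even though its absolute score is still above a typical action minimum.
falling = [0.7 - 0.002 * i for i in range(50)]
print(trajectory(falling))  # declining
```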

## How Trust Affects Governance

Trust feeds back into every governance evaluation at two points:

UCS modulation — The trust score shifts how the UCS is calculated. At default influence (0.2), trust can move the score by ±10%. Higher trust shifts UCS scores upward. Lower trust shifts them down.
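As a sketch of this modulation: the multiplicative form below is an assumption, and only the ±10% range at influence 0.2 is documented:

```python
def modulate_ucs(ucs: float, trust: float, influence: float = 0.2) -> float:
    """Shift the UCS by a trust-derived factor centred on the 0.5 baseline."""
    factor = 1.0 + influence * (trust - 0.5)  # ranges 0.9–1.1 at influence 0.2
    return min(1.0, max(0.0, ucs * factor))


print(round(modulate_ucs(0.60, trust=1.0), 2))  # 0.66: full trust lifts the score 10%
print(round(modulate_ucs(0.60, trust=0.0), 2))  # 0.54: zero trust drags it down 10%
```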

Tier 3 deliberation — For actions in the ambiguity zone that Tier 2 cannot resolve, Tier 3 uses trust as the deciding input:

- High trust (> 0.7) + borderline UCS (> 0.5) → ALLOW
- Low trust (< 0.4) → ESCALATE to human review
- Agents with trust below 0.3 require human approval for all actions (via the Human Override dimension)
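A sketch of this tie-break. The thresholds are documented; treating the unspecified middle band as "defer to standard deliberation" is an assumption:

```python
def tier3_decide(trust: float, ucs: float) -> str:
    """Trust-based tie-break for actions in the ambiguity zone."""
    if trust < 0.4:
        # Below 0.3 the Human Override dimension already forces approval
        # for every action; below 0.4 Tier 3 escalates ambiguous ones.
        return "ESCALATE"
    if trust > 0.7 and ucs > 0.5:
        return "ALLOW"
    # The text does not specify the middle band; falling through to
    # standard deliberation here is an assumption.
    return "DELIBERATE"


print(tier3_decide(trust=0.80, ucs=0.55))  # ALLOW
print(tier3_decide(trust=0.35, ucs=0.55))  # ESCALATE
```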

## Drift-Driven Calibration

Behavioral drift feeds directly into trust calibration. When the drift monitor detects that an agent's behavior has shifted from its established baseline, it applies a trust penalty proportional to drift severity:

| Drift Score | Trust Erosion per Check |
| --- | --- |
| < 0.10 | None (normal variance) |
| 0.10 – 0.20 | -0.002 |
| 0.20 – 0.40 | -0.008 |
| 0.40 – 0.60 | -0.02 |
| ≥ 0.60 | -0.04 |

When drift recovers (drops below 0.15 from above), trust receives a small recovery bonus (+0.003). All adjustments are scaled by the drift score's confidence.
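The drift-to-erosion mapping can be sketched as below. The bands, deltas, recovery bonus, and confidence scaling are documented; the half-open band boundaries and function shape are assumptions:

```python
def drift_adjustment(drift: float, confidence: float,
                     recovered: bool = False) -> float:
    """Trust delta for one drift check, scaled by detection confidence."""
    if recovered:  # drift dropped back below 0.15 from above
        return 0.003 * confidence
    if drift < 0.10:
        base = 0.0      # normal variance, no erosion
    elif drift < 0.20:
        base = -0.002
    elif drift < 0.40:
        base = -0.008
    elif drift < 0.60:
        base = -0.02
    else:
        base = -0.04
    return base * confidence


print(drift_adjustment(0.45, confidence=0.5))  # -0.01
```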

This means an agent does not need to trigger a governance violation to have its trust reduced. Behavioral drift alone — detected before any boundary is crossed — is sufficient evidence for the calibrator to act.

## Override-Driven Calibration

Human override decisions are evidence too. When a human revokes an approval — confirming that governance should have denied an action it allowed — the calibrator applies a penalty equivalent to a direct violation (-0.05).

When a human approves a denial, no trust change occurs. Governance correctly identified a risk; the human chose to accept it. This is not evidence that governance erred.

## Time Decay

Trust that isn't actively reinforced drifts back toward baseline. A once-trusted agent that hasn't been active gradually returns to neutral. This prevents stale high-trust profiles from persisting after an agent has been idle.

The decay rate is configurable. At the default rate (0.01 per hour), an agent at trust 0.8 that goes idle will decay back to 0.5 over approximately 30 hours.
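A sketch of linear decay toward baseline at the default rate, clamped so an idle agent stops at 0.5 rather than overshooting; the function itself is illustrative:

```python
def decay(trust: float, idle_hours: float,
          baseline: float = 0.5, rate: float = 0.01) -> float:
    """Move trust toward baseline by `rate` per idle hour, stopping there."""
    step = rate * idle_hours
    if trust > baseline:
        return max(baseline, trust - step)
    return min(baseline, trust + step)


print(round(decay(0.8, idle_hours=10), 2))  # 0.7
print(round(decay(0.8, idle_hours=30), 2))  # 0.5: back at baseline after ~30 hours
```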

## Configuration

### Manual Adjustment

Trust can be adjusted manually by a human authority. Manual adjustments are recorded in the audit trail with the actor, timestamp, and reason.

:::warning
Manual trust adjustments bypass the calibration loop and take effect immediately. They are appropriate for responding to external findings or one-time events. For ongoing governance, rely on calibration rather than manual adjustment.
:::

### CLI
