Verifiable Trust
A new employee joins your team with impressive credentials. A graduate degree from a top program. Strong recommendations. A polished interview.
You believe they can do the job. But you do not hand them unrestricted access on day one.
You start with defined responsibilities. You watch how they perform. When they consistently meet expectations, you expand their scope. When they miss deadlines, you slow down. When they handle pressure well, you speed up. Trust grows with evidence. Credentials open the door. Behavior determines how far someone goes inside.
When organizations deploy AI systems, they often abandon this logic entirely. A system performs well in testing. Leadership approves deployment. The system is granted broad permissions immediately, operating with authority it has never earned through observed behavior.
The Capability Trap
Capability and consistency are different qualities. A system can be highly capable on average and still be unreliable when conditions change. It can perform brilliantly in demonstrations and behave unpredictably in production.
The history of AI deployment reflects this pattern. Systems that excelled in testing environments produced harmful outputs when users probed them with unexpected prompts. Agents that handled routine cases flawlessly made baffling decisions when context shifted slightly. The issue was misplaced trust in capability rather than verified performance.
Claimed capability sounds like this: "This system achieved 98 percent accuracy on a benchmark. It can handle customer service inquiries."
Demonstrated consistency sounds like this: "This system has processed 47,000 customer inquiries over six months. It maintained policy compliance in 99.2 percent of cases. All exceptions fell within defined parameters. Trust level is high for routine inquiries. Escalation is required for refunds over $500."
The difference is how knowledge is formed. Claimed capability relies on inference. Demonstrated consistency relies on evidence.
How Trust Calibration Works
Trust is not a setting applied at deployment. It is a variable that changes in response to evidence.
Trust Profiles
Every agent has a TrustProfile tracking its earned trust:
```python
profile = runtime.trust_calibrator.get_profile("my-agent")
print(f"Overall trust: {profile.overall_trust:.2f}")  # 0.0–1.0
print(f"Successful actions: {profile.successful_actions}")
print(f"Violations: {profile.violation_count}")
print(f"Violation rate: {profile.violation_rate:.1%}")
```
Trust starts at baseline (0.5) and moves based on observed behavior.
Asymmetric Calibration
Building trust is deliberately harder than losing it:
| Event | Trust delta | Recovery requirement |
| --- | --- | --- |
| Action allowed | +0.01 | — |
| Action completed successfully | +0.005 | — |
| Action denied (violation) | −0.05 | 5 successful actions |
| Action interrupted | −0.03 | 3 successful actions |
One violation costs as much trust as five successes earn. This prevents trust farming — rapidly executing many low-risk actions to build trust for a single high-risk action.
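The asymmetric update can be sketched as follows. This is a minimal illustration, not the actual API: the class shape and event names are assumptions, while the deltas and the 0.5 baseline come from the table and text above.

```python
# Illustrative sketch of asymmetric trust calibration.
# Deltas mirror the table above; class and method names are hypothetical.
BASELINE = 0.5

DELTAS = {
    "allowed": +0.01,
    "completed": +0.005,
    "denied": -0.05,       # one violation costs as much as five successes earn
    "interrupted": -0.03,
}

class TrustProfile:
    def __init__(self) -> None:
        self.overall_trust = BASELINE
        self.successful_actions = 0
        self.violation_count = 0

    def record(self, event: str) -> float:
        """Apply the delta for an observed event, clamped to [0.0, 1.0]."""
        self.overall_trust = min(1.0, max(0.0, self.overall_trust + DELTAS[event]))
        if event in ("allowed", "completed"):
            self.successful_actions += 1
        elif event == "denied":
            self.violation_count += 1
        return self.overall_trust
```

Because the penalty for a denial is five times the reward for an allowed action, an agent cannot "buy back" a violation with a burst of trivial successes.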
Per-Dimension Trust
Trust is tracked per dimension, not just overall. An agent that consistently triggers concerns on one dimension (e.g., cascading impact) will have its trust lowered specifically for that dimension, even if other dimensions are fine.
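A sketch of what per-dimension tracking implies, assuming a simple mapping from dimension name to score (the container, dimension names, and penalty size are all illustrative):

```python
from collections import defaultdict

# Hypothetical per-dimension trust store: each dimension starts at baseline
# and is penalized independently of the others.
class DimensionalTrust:
    def __init__(self, baseline: float = 0.5) -> None:
        self.dimensions = defaultdict(lambda: baseline)

    def penalize(self, dimension: str, delta: float = 0.05) -> float:
        """Lower trust for one dimension only, clamped at zero."""
        self.dimensions[dimension] = max(0.0, self.dimensions[dimension] - delta)
        return self.dimensions[dimension]
```

An agent repeatedly penalized on "cascading_impact" would see only that dimension drop, leaving its trust on unrelated dimensions at baseline.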
Time Decay
Trust that isn't actively reinforced drifts back toward baseline. A once-trusted agent that hasn't been active gradually returns to neutral. This prevents stale high-trust profiles from persisting after an agent has been idle.
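One way to picture the decay, as an exponential drift toward baseline. The decay rate here is an assumed value for illustration; only the baseline of 0.5 and the drift-toward-neutral behavior come from the text:

```python
BASELINE = 0.5
DECAY_RATE = 0.05  # fraction of the gap to baseline closed per idle period (assumed)

def decay_trust(trust: float, idle_periods: int) -> float:
    """Move trust toward baseline exponentially with idle time."""
    for _ in range(idle_periods):
        trust = BASELINE + (trust - BASELINE) * (1 - DECAY_RATE)
    return trust
```

A high-trust agent (say 0.9) that goes idle drifts steadily back toward 0.5, while an agent already at baseline stays there.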
Drift Integration
Behavioral drift directly erodes trust, scaled by drift severity and confidence:
| Drift score | Trust penalty |
| --- | --- |
| < 0.10 | None (normal variance) |
| 0.10 – 0.20 | −0.002 |
| 0.20 – 0.40 | −0.008 |
| 0.40 – 0.60 | −0.02 |
| ≥ 0.60 | −0.04 |
When drift recovers (drops below 0.15 from above), trust receives a small recovery bonus (+0.003).
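The mapping from drift score to penalty, plus the recovery bonus, can be written out directly. The band edges, penalties, 0.15 recovery threshold, and +0.003 bonus are the values above; the function names are illustrative:

```python
# Sketch of drift-driven trust erosion; values come from the table and
# text above, function names are hypothetical.
def drift_trust_penalty(drift_score: float) -> float:
    """Base trust penalty for a given drift score."""
    if drift_score < 0.10:
        return 0.0          # normal variance, no penalty
    if drift_score < 0.20:
        return -0.002
    if drift_score < 0.40:
        return -0.008
    if drift_score < 0.60:
        return -0.02
    return -0.04

def recovery_bonus(previous_drift: float, current_drift: float) -> float:
    """Small trust recovery when drift drops back below 0.15."""
    return 0.003 if previous_drift >= 0.15 and current_drift < 0.15 else 0.0
```

In the actual system the base penalty is further scaled by drift severity and confidence; this sketch shows only the banded mapping.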
How Trust Affects Governance
Trust feeds back into every governance evaluation:
UCS modulation: Higher trust shifts UCS scores upward. Lower trust shifts them down. At default influence (0.2), trust can move the score by ±10%.
Tier 3 decisions: High trust (> 0.7) tips borderline UCS scores to ALLOW. Low trust (< 0.4) triggers ESCALATE.
Human override: Agents with trust below 0.3 require human approval for all actions.
Interrupt sensitivity: Lower trust increases the likelihood of monitoring and interruption during execution.
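The UCS modulation can be sketched with a simple linear formula. This assumes the modulation is linear in the distance from neutral trust (0.5); under that assumption, an influence of 0.2 produces exactly the ±10% swing described above. The formula and names are illustrative, not the actual implementation:

```python
TRUST_INFLUENCE = 0.2  # default trust influence from the text

def modulated_ucs(ucs: float, trust: float, influence: float = TRUST_INFLUENCE) -> float:
    """Shift a UCS score up or down based on earned trust.

    trust is in [0, 1], so (trust - 0.5) spans [-0.5, +0.5] and the
    multiplier spans 1 ± influence/2, i.e. ±10% at influence 0.2.
    """
    return ucs * (1 + influence * (trust - 0.5))
```

A borderline score of 0.7 becomes 0.77 for a fully trusted agent and 0.63 for a fully distrusted one, while neutral trust (0.5) leaves the score unchanged.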
Configuration
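The calibration knobs discussed on this page might be exposed along these lines. This is a hypothetical configuration sketch: the field names are assumptions, and the defaults are simply the values quoted elsewhere on this page.

```python
from dataclasses import dataclass

# Hypothetical configuration surface; field names are assumed,
# defaults reflect the values quoted in this page.
@dataclass
class TrustCalibrationConfig:
    baseline_trust: float = 0.5            # starting trust for new agents
    allow_delta: float = 0.01              # per allowed action
    success_delta: float = 0.005           # per completed action
    violation_delta: float = -0.05         # per denied action
    interrupt_delta: float = -0.03         # per interrupted action
    trust_influence: float = 0.2           # weight of trust in UCS modulation
    human_override_threshold: float = 0.3  # below this, all actions need approval

config = TrustCalibrationConfig()
```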
Cost-Aware Audit Trails
When cost-sensitive governance is enabled, every verdict's BehavioralProvenance records the active CostProfile and derived thresholds alongside the trust and drift data. This creates a complete decision-theoretic audit trail.
Each provenance snapshot includes:
- cost_profile_active: Whether cost-sensitive thresholds were in effect
- cost_false_allow / cost_false_deny: The asymmetric error costs that shaped the decision boundary
- derived_allow_threshold / derived_deny_threshold: The actual thresholds used, derived from signal detection theory
- threshold_source: Whether thresholds came from a cost_profile, were smoothed from a previous cycle, or used static_default values
- threshold_shift_from_default: How far the cost profile moved the allow threshold from the static default (0.7)
This means auditors can reconstruct not just what was decided, but why the decision boundary was where it was. A DENY verdict with threshold_source="cost_profile" and cost_false_allow=1.0 tells a clear story: the system denied this action because the cost of a false allow was catastrophic, and the derived thresholds reflected that asymmetry.
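A snapshot with these fields might look like the following. The dict shape and the specific threshold values are illustrative assumptions; only the field names, the 0.7 static default, and the "catastrophic false allow" scenario come from the text:

```python
# Illustrative provenance snapshot for the DENY scenario described above.
# Field names follow the list above; the numeric values are assumed.
snapshot = {
    "cost_profile_active": True,
    "cost_false_allow": 1.0,               # false allow judged catastrophic
    "cost_false_deny": 0.1,
    "derived_allow_threshold": 0.85,       # raised well above the 0.7 default
    "derived_deny_threshold": 0.35,
    "threshold_source": "cost_profile",    # vs "smoothed" or "static_default"
    "threshold_shift_from_default": 0.15,  # 0.7 default + 0.15 = 0.85
}

# An auditor can confirm the shift is internally consistent:
STATIC_DEFAULT = 0.7
assert abs(
    STATIC_DEFAULT + snapshot["threshold_shift_from_default"]
    - snapshot["derived_allow_threshold"]
) < 1e-9
```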
The Principle
Deployment is the beginning of evaluation, not the end. Authority is adjusted continuously, not granted permanently. Trust is earned, not assumed.
Organizations that verify with evidence can exercise authority appropriately. Those that rely on assumptions risk being unprepared when failures emerge.