Counterfactual Behavioral Replay

Counterfactual Behavioral Replay transforms Nomotic's audit trails from static records into an interactive behavioral forensics tool. Take any agent's actual behavioral history and replay it against a different contract version, a different set of invariants, or a different governance configuration, then observe the projected outcomes.

Why Counterfactual Replay

Compliance teams, auditors, and incident responders constantly ask questions that nobody can answer without manual reconstruction:

  • "What if we had applied Contract v2.1 instead of v2.0 during last Tuesday's incident?"

  • "How many additional denials would the new invariant set have produced over the last 30 days?"

  • "If we tighten the semantic anchor threshold from 0.85 to 0.92, which agents would have been flagged and when?"

Counterfactual Behavioral Replay answers these questions automatically. It turns audit conversations from "what happened" into "what would have happened if" — the question every regulator, board member, and risk officer actually wants answered.

How It Works

Behavioral Replay Engine

The BehavioralReplayEngine loads historical action sequences from the audit trail, applies alternative governance configurations, and produces a ReplayReport comparing actual verdicts with counterfactual verdicts.

```python
from nomotic.replay import BehavioralReplayEngine, ReplayConfig
from nomotic.audit_store import AuditStore

engine = BehavioralReplayEngine(audit_store=AuditStore(base_dir))

# "What if we had used stricter thresholds?"
report = engine.replay(
    agent_id="claims-processor",
    replay_config=ReplayConfig(
        allow_threshold=0.9,
        description="Proposed Q2 stricter thresholds",
    ),
)

print(report.generate_summary())
# Replayed 847 actions for claims-processor. The alternative config would
# have produced 23 different verdicts: 18 stricter (more denials), 5 looser.
# Net effect: 2.1% of actions would have been blocked that were previously allowed.
```

Replay Configuration

The ReplayConfig specifies alternative governance parameters. Any field left as None uses the original configuration:
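The full field list is not enumerated in this excerpt, but the merge semantics can be sketched. In the sketch below, allow_threshold and description come from the example earlier on this page; semantic_anchor_threshold is an assumption based on the threshold question in the introduction:

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class ReplayConfig:
    # Illustrative fields: only allow_threshold and description appear in
    # this page's examples; semantic_anchor_threshold is an assumption.
    allow_threshold: Optional[float] = None
    semantic_anchor_threshold: Optional[float] = None
    description: str = ""

def merge(original: dict, override: ReplayConfig) -> dict:
    """Any override field left as None falls back to the original value."""
    merged = dict(original)
    for f in fields(override):
        value = getattr(override, f.name)
        if f.name != "description" and value is not None:
            merged[f.name] = value
    return merged

# Only the overridden threshold changes; the untouched field is preserved.
merged = merge(
    {"allow_threshold": 0.8, "semantic_anchor_threshold": 0.85},
    ReplayConfig(allow_threshold=0.9),
)
```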

Verdict Comparison

The report highlights divergence points: actions where the alternative configuration would have produced a different verdict.

The direction field classifies each divergence using a strictness ordering:

| Direction | Meaning | Example |
| --- | --- | --- |
| stricter | Counterfactual would have been more restrictive | ALLOW → DENY |
| looser | Counterfactual would have been more permissive | DENY → ALLOW |
| lateral | Different verdict at the same strictness level | ESCALATE → MODIFY |
| same | No change | ALLOW → ALLOW |
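The ordering above can be sketched as a rank over verdicts. The rank values here are assumptions consistent with the table (DENY strictest, ALLOW least strict, ESCALATE and MODIFY at the same level):

```python
# Assumed strictness ranks: higher means more restrictive. ESCALATE and
# MODIFY share a rank, so a change between them is "lateral".
STRICTNESS = {"ALLOW": 0, "MODIFY": 1, "ESCALATE": 1, "DENY": 2}

def classify(actual: str, counterfactual: str) -> str:
    """Classify a divergence by comparing verdict strictness ranks."""
    if actual == counterfactual:
        return "same"
    a, c = STRICTNESS[actual], STRICTNESS[counterfactual]
    if c > a:
        return "stricter"
    if c < a:
        return "looser"
    return "lateral"
```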

Fleet Replay

Replay the same configuration change across multiple agents to understand fleet-wide impact:
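A dedicated fleet API is not shown in this excerpt; one plausible pattern is to call engine.replay once per agent and aggregate the per-agent ReplayReports. The sketch below mocks the reports as plain dicts using the field names documented later on this page:

```python
def fleet_impact(reports):
    """Aggregate per-agent ReplayReports into a fleet-wide summary.

    Each report is treated as a mapping with the documented fields
    (actions_replayed, total_divergences, stricter_count, looser_count).
    """
    totals = {"actions_replayed": 0, "total_divergences": 0,
              "stricter_count": 0, "looser_count": 0}
    for report in reports:
        for key in totals:
            totals[key] += report[key]
    totals["divergence_rate"] = (
        totals["total_divergences"] / totals["actions_replayed"]
        if totals["actions_replayed"] else 0.0
    )
    return totals

# In practice each entry would come from engine.replay(agent_id=..., ...);
# here the reports are mocked.
reports = [
    {"actions_replayed": 847, "total_divergences": 23,
     "stricter_count": 18, "looser_count": 5},
    {"actions_replayed": 153, "total_divergences": 2,
     "stricter_count": 2, "looser_count": 0},
]
summary = fleet_impact(reports)  # 25 divergences over 1000 actions
```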

Configuration Comparison

Compare two alternative configurations against the same behavioral history:
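The comparison API itself is not shown in this excerpt; conceptually it runs two replays over the same recorded history and pairs the resulting reports. A self-contained sketch, assuming for illustration a simple rule that allows any action whose recorded score meets the threshold:

```python
def replay_verdicts(history, allow_threshold):
    """Re-evaluate recorded action scores against an allow threshold.

    The score-vs-threshold rule is an illustrative assumption, not the
    actual Nomotic verdict logic.
    """
    return ["ALLOW" if score >= allow_threshold else "DENY"
            for score in history]

def compare_configs(history, threshold_a, threshold_b):
    """Replay the same history under two configs; count differing verdicts."""
    a = replay_verdicts(history, threshold_a)
    b = replay_verdicts(history, threshold_b)
    return sum(1 for va, vb in zip(a, b) if va != vb)

history = [0.91, 0.88, 0.95, 0.83]                # recorded action scores
diverging = compare_configs(history, 0.85, 0.90)  # 0.88 flips: 1 divergence
```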

CLI Usage

Replay with threshold overrides
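The exact command is not shown in this excerpt; a plausible invocation, assuming a nomotic replay subcommand with flag names mirroring the ReplayConfig fields:

```shell
# Hypothetical subcommand and flags; names mirror the ReplayConfig fields.
nomotic replay --agent-id claims-processor \
  --allow-threshold 0.9 \
  --description "Proposed Q2 stricter thresholds"
```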

Replay with a config file
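A plausible invocation, assuming a --config flag that loads the replay configuration from a file:

```shell
# Hypothetical flag; the subcommand and flag names are assumptions.
nomotic replay --agent-id claims-processor --config replay-config.json
```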

Where replay-config.json contains:
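A plausible shape, assuming the keys mirror the ReplayConfig fields used in the Python example above:

```json
{
  "allow_threshold": 0.9,
  "description": "Proposed Q2 stricter thresholds"
}
```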

Time-scoped replay
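The time-scoping flags are not shown in this excerpt; a plausible invocation, assuming hypothetical --since/--until options that bound the replayed window:

```shell
# Hypothetical --since/--until flags scoping which audit records to replay.
nomotic replay --agent-id claims-processor \
  --allow-threshold 0.9 \
  --since 2025-01-01T00:00:00Z \
  --until 2025-01-31T23:59:59Z
```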

HTTP API

Replay

Request body:
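The endpoint path is not given in this excerpt; a plausible request body, with keys mirroring the Python API above:

```json
{
  "agent_id": "claims-processor",
  "replay_config": {
    "allow_threshold": 0.9,
    "description": "Proposed Q2 stricter thresholds"
  }
}
```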

Response: A ReplayReport JSON object with divergence points, counts, and summary.

Compare Configurations

Request body:
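A plausible request body, using the config_a/config_b keys that appear in the documented response and illustrative threshold values:

```json
{
  "agent_id": "claims-processor",
  "config_a": { "allow_threshold": 0.85 },
  "config_b": { "allow_threshold": 0.92 }
}
```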

Response: {"config_a": ReplayReport, "config_b": ReplayReport}

Integration with Drift Taxonomy

Replay integrates with the drift taxonomy: operators can replay a coordinated drift event under alternative governance configurations to determine whether different contract parameters, invariant thresholds, or semantic anchor tolerances would have caught the propagation chain earlier.

Replay Report

The ReplayReport provides a complete summary:

| Field | Description |
| --- | --- |
| report_id | Unique identifier for this replay |
| agent_id | Agent whose history was replayed |
| actions_replayed | Total actions processed |
| total_divergences | Actions with different verdicts |
| stricter_count | Counterfactual was more restrictive |
| looser_count | Counterfactual was more permissive |
| divergence_points | List of VerdictComparison objects |
| summary | Human-readable summary text |
