Sequential Governance Optimization

Why Point-in-Time Governance Is Suboptimal

Traditional governance systems — including Nomotic's own Behavioral Trajectory Engine prior to v0.8.0 — make intervention decisions at a single point in time. When drift exceeds a threshold, the system triggers a fixed intervention. This approach has three structural weaknesses:

  1. No lookahead: The system cannot reason about the cost of intervening now versus later. An advisory today might be cheap enough to prevent an expensive escalation tomorrow.

  2. No strategy comparison: Threshold-based systems consider exactly one intervention at a time. They cannot compare "throttle now" against "wait 10 steps and escalate" against "graduated advisory → throttle → escalate."

  3. No cost awareness: The triggered intervention depends only on the violation probability, not on the cost of the intervention itself. Once intervention fatigue has accumulated, a throttle may prevent fewer violations per unit of cost than a cheaper advisory.

The Monte Carlo Policy Optimizer addresses all three by simulating multiple intervention strategies forward and selecting the one with the lowest expected total cost.

Monte Carlo Approach

The optimizer evaluates each candidate strategy by running N Monte Carlo rollouts (default: 200) per strategy:

For each strategy S in candidate_strategies:
    For rollout = 1..N:
        1. Initialize: drift = current_drift, velocity = current_velocity
        2. For each step in [0..horizon]:
            a. If strategy S triggers an intervention at this step:
               - Sample intervention cost from InterventionCostModel
               - Project drift with intervention effect via DriftDynamicsModel
            b. Else:
               - Project drift naturally via DriftDynamicsModel
            c. If drift > violation_threshold:
               - Add per-step violation cost (from CostProfile.false_allow_cost)
        3. total_cost = sum(intervention_costs) + sum(violation_costs)
    Compute mean_cost(S) and variance(S) across N rollouts
Select S* = argmin(mean_cost)
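The loop above can be sketched in Python. This is a minimal stand-in: the real DriftDynamicsModel and InterventionCostModel include archetype parameters, decay/rebound, and causal modifiers, so the `schedule` mapping, cost sampling, and drift update here are simplifying assumptions.

```python
import random

def evaluate_strategy(schedule, drift, velocity, horizon=100, n_rollouts=200,
                      violation_threshold=0.5, step_violation_cost=0.01):
    """Mean cost and variance for one strategy across N rollouts.

    `schedule` maps step -> (base_cost, drift_reduction); both numbers are
    hypothetical stand-ins for InterventionCostModel / DriftDynamicsModel.
    """
    costs = []
    for _ in range(n_rollouts):
        d, total = drift, 0.0
        for step in range(horizon + 1):
            if step in schedule:                          # strategy fires here
                base_cost, reduction = schedule[step]
                total += base_cost * random.uniform(0.8, 1.2)  # sampled cost
                d = max(0.0, d - reduction)               # intervention effect
            else:
                d += velocity + random.gauss(0.0, 0.002)  # natural projection
            if d > violation_threshold:
                total += step_violation_cost              # per-step violation cost
        costs.append(total)
    mean = sum(costs) / n_rollouts
    var = sum((c - mean) ** 2 for c in costs) / n_rollouts
    return mean, var

random.seed(7)
strategies = {"no_op": {}, "early_advisory": {0: (0.02, 0.10)}}
scored = {name: evaluate_strategy(s, drift=0.15, velocity=0.005)
          for name, s in strategies.items()}
best = min(scored, key=lambda n: scored[n][0])  # argmin of mean cost
```

With drift already trending upward, the advisory's small upfront cost prevents most of the violation steps, so `best` resolves to early_advisory here.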

The optimizer leverages the full DriftDynamicsModel including archetype-specific parameters, intervention decay/rebound dynamics, and causal modifiers. This means the Monte Carlo projections capture self-correction, intervention fatigue, and archetype-specific intervention effectiveness.

Predefined Strategies

Five intervention strategies are predefined. Operators can override or extend these via OptimizerConfig.strategies.

| Strategy | Schedule | When Used |
| --- | --- | --- |
| no_op | empty | Drift is negligible; intervention would waste resources |
| early_advisory | advisory @ step 0 | Low drift detected; soft nudge may suffice |
| wait_and_throttle | throttle @ step 25 | Moderate drift; wait to confirm before acting |
| immediate_escalate | escalate @ step 0 | High drift; immediate human review needed |
| graduated | advisory @ 0, throttle @ 25, escalate @ 50 | Proportional response across the horizon |
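For instance, extending the predefined set might look like this. The mapping schema (strategy name to step-indexed action) is an illustrative assumption about OptimizerConfig.strategies, not its documented schema.

```python
# Hypothetical schema: strategy name -> {step: intervention_type}.
# OptimizerConfig.strategies is the documented extension point; the
# schedule format used here is an illustrative assumption.
custom_strategies = {
    "no_op": {},
    "early_advisory": {0: "advisory"},
    "double_advisory": {0: "advisory", 50: "advisory"},  # custom addition
    "graduated": {0: "advisory", 25: "throttle", 50: "escalate"},
}

def interventions_at(strategies, step):
    """Return the (strategy, action) pairs that fire at a given step."""
    return [(name, sched[step]) for name, sched in strategies.items()
            if step in sched]
```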

Heuristic Pruning

To avoid wasting compute on irrelevant strategies, the optimizer prunes the candidate set based on current drift:

  • drift < 0.10: Only no_op and early_advisory are considered. Aggressive strategies would be pure cost with no benefit.

  • drift > 0.40: no_op and early_advisory are excluded. Passive strategies are too risky at this drift level.

  • 0.10 ≤ drift ≤ 0.40: All strategies are evaluated.

This pruning typically reduces compute by 40-60% while preserving optimality for the relevant drift range.
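A sketch of the pruning rule, using the documented thresholds and the predefined strategy names:

```python
ALL_STRATEGIES = ["no_op", "early_advisory", "wait_and_throttle",
                  "immediate_escalate", "graduated"]

def prune_candidates(drift, low=0.10, high=0.40):
    """Drop strategies that cannot win at the current drift level."""
    if drift < low:
        # Aggressive strategies would be pure cost with no benefit.
        return ["no_op", "early_advisory"]
    if drift > high:
        # Passive strategies are too risky at this drift level.
        return [s for s in ALL_STRATEGIES
                if s not in ("no_op", "early_advisory")]
    return list(ALL_STRATEGIES)  # mid-range: evaluate everything
```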

Policy Output Format

The optimizer produces an InterventionPolicy: a frozen, serializable, human-readable intervention plan.
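A plausible shape for such a policy object, sketched as a frozen dataclass. It is frozen and serializable as documented, but the field names here are assumptions, not the engine's actual schema.

```python
import json
from dataclasses import dataclass, asdict
from typing import Tuple

@dataclass(frozen=True)
class InterventionPolicy:
    """Illustrative shape only: immutable and serializable, as documented,
    but these field names are assumptions."""
    agent_id: str
    strategy: str
    schedule: Tuple[Tuple[int, str], ...]  # (step, action) pairs
    expected_cost: float
    confidence: float

    def to_json(self) -> str:
        return json.dumps(asdict(self))

policy = InterventionPolicy(
    agent_id="sales-bot-7",
    strategy="early_advisory",
    schedule=((0, "advisory"),),
    expected_cost=0.11,
    confidence=0.43,
)
```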

The to_readable() method renders the policy in a decision-tree format suitable for operator dashboards.
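A hypothetical renderer in that spirit; the engine's real to_readable() layout may differ.

```python
def to_readable(strategy, schedule, expected_cost, confidence):
    """Hypothetical dashboard rendering; the actual to_readable()
    format may differ."""
    lines = [f"policy: {strategy}",
             f"expected_cost: {expected_cost:.2f}  confidence: {confidence:.2f}"]
    for step, action in schedule:
        lines.append(f"  step {step:>3}: {action}")
    return "\n".join(lines)

text = to_readable("graduated",
                   [(0, "advisory"), (25, "throttle"), (50, "escalate")],
                   0.12, 0.45)
```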

Policy Stability

The optimizer includes a hysteresis mechanism to prevent policy thrashing. When re-optimizing for an agent that already has a policy, the new policy is only adopted if it improves expected cost by more than policy_switch_threshold (default: 10%).

This prevents the optimizer from oscillating between similarly-scored strategies on each evaluation cycle.
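The hysteresis check reduces to a single comparison; a sketch assuming relative improvement is measured against the incumbent policy's expected cost:

```python
def should_switch(current_cost, candidate_cost, threshold=0.10):
    """Adopt a new policy only when it beats the incumbent by more than
    policy_switch_threshold (default 10%)."""
    if current_cost <= 0:
        return candidate_cost < current_cost
    return (current_cost - candidate_cost) / current_cost > threshold
```

A candidate at 0.11 against an incumbent at 0.115 is only a ~4% improvement, so the incumbent is kept.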

Compute Guardrails

The optimizer is designed to operate within the runtime's evaluation loop, so compute is bounded:

| Parameter | Default | Purpose |
| --- | --- | --- |
| n_rollouts | 200 | Simulations per strategy |
| timeout_seconds | 5.0 | Hard wall-clock limit per agent |
| horizon | 100 | Forward projection steps |

If the timeout is reached mid-simulation, the optimizer returns the best result computed so far. This ensures the governance pipeline never blocks indefinitely.
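The best-so-far behavior can be sketched as a deadline check between strategy evaluations (assuming per-strategy granularity; the real optimizer may also check mid-rollout):

```python
import time

def optimize_with_timeout(strategies, evaluate, timeout_seconds=5.0):
    """Return the best (strategy, cost) found before the wall-clock limit.

    Checks the deadline between strategy evaluations, so a partial result
    is returned rather than blocking the governance pipeline.
    """
    deadline = time.monotonic() + timeout_seconds
    best_name, best_cost = None, float("inf")
    for name, schedule in strategies.items():
        if time.monotonic() >= deadline:
            break  # timeout hit: return best-so-far
        cost = evaluate(schedule)
        if cost < best_cost:
            best_name, best_cost = name, cost
    return best_name, best_cost
```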

How to Enable

The optimizer is backward-compatible. When enable_policy_optimizer=False (default), the trajectory engine behaves exactly as in v0.7.0 with simple threshold-based interventions.
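Enabling it might look like the following. The flag name enable_policy_optimizer comes from this document, but the engine class and method shown are illustrative assumptions.

```python
# Illustrative sketch: enable_policy_optimizer is the documented flag,
# but the engine class and method names here are assumptions.
class TrajectoryEngine:
    def __init__(self, enable_policy_optimizer=False):
        self.enable_policy_optimizer = enable_policy_optimizer

    def select_intervention(self, drift, threshold=0.3):
        if not self.enable_policy_optimizer:
            # v0.7.0 behavior: simple threshold-based trigger.
            return "advisory" if drift > threshold else None
        return self._optimize_policy(drift)  # Monte Carlo path

    def _optimize_policy(self, drift):
        return "optimized"  # placeholder for the optimizer call
```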

Configuration
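A configuration sketch whose defaults mirror the values documented above; the field layout of OptimizerConfig itself is an assumption.

```python
from dataclasses import dataclass, field

@dataclass
class OptimizerConfig:
    """Defaults mirror the documented values; the field layout of
    OptimizerConfig itself is an illustrative assumption."""
    n_rollouts: int = 200                  # simulations per strategy
    timeout_seconds: float = 5.0           # hard wall-clock limit per agent
    horizon: int = 100                     # forward projection steps
    policy_switch_threshold: float = 0.10  # hysteresis margin (10%)
    strategies: dict = field(default_factory=dict)  # optional overrides

cfg = OptimizerConfig(n_rollouts=500)
```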

Worked Example

Scenario: Agent sales-bot-7 has been drifting at 0.15 with velocity 0.005/step for the past 50 observations. Archetype is sales-agent with a BALANCED cost profile.

  1. Pruning: drift=0.15 is in [0.10, 0.40] → all 5 strategies evaluated.

  2. Simulation results (200 rollouts each, horizon=100):

    | Strategy | Mean Cost | Variance |
    | --- | --- | --- |
    | no_op | 0.19 | 0.008 |
    | early_advisory | 0.11 | 0.004 |
    | wait_and_throttle | 0.14 | 0.006 |
    | immediate_escalate | 0.18 | 0.003 |
    | graduated | 0.12 | 0.005 |

  3. Selection: early_advisory wins with mean cost 0.11.

  4. Confidence: CV = sqrt(0.004)/0.11 ≈ 0.57 → confidence = 1 − CV ≈ 0.43. Moderate: advisory outcomes are somewhat variable for sales agents.

  5. Cost vs naive: (0.18 - 0.11) / 0.18 ≈ 39% savings compared to immediate escalation.

  6. Policy output:
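An illustrative rendering of the selected policy using the worked-example figures; the engine's actual output format may differ.

```python
# Illustrative rendering using the worked-example figures; the engine's
# actual policy output format may differ.
policy = {
    "agent_id": "sales-bot-7",
    "strategy": "early_advisory",
    "schedule": [(0, "advisory")],
    "expected_cost": 0.11,
    "confidence": 0.43,
    "savings_vs_immediate_escalate": 0.39,
}
summary = (f"{policy['agent_id']}: {policy['strategy']} "
           f"(cost {policy['expected_cost']:.2f}, "
           f"confidence {policy['confidence']:.2f})")
```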

The operator sees exactly what the system plans to do and why. If drift continues to rise despite the advisory, the next optimization cycle will select a more aggressive strategy.

Multi-Agent Policy Coordination

When the FleetBehavioralMonitor detects coordinated or correlated drift across multiple agents, the optimizer can evaluate cross-agent intervention strategies that are cheaper than intervening on each agent independently.

Multi-agent coordination is flag-gated: it activates only when a FleetDriftAlert with scope "coordinated" or "correlated" is detected. Single-agent optimization is unaffected.

Chain Optimization (optimize_chain)

For coordinated drift propagating through an interaction chain (A → B → C), the optimizer evaluates intervening at each point in the chain.

Intervening at the root of a propagation chain can reduce drift for all downstream agents, making a single strong intervention cheaper than three independent ones. The attenuation factor controls how much of the drift reduction propagates to each subsequent hop.

Example: In a 3-agent chain with drifts [0.45, 0.35, 0.25], intervening at the root with an escalation that reduces its drift by 0.30 also reduces agent B's drift by 0.18 (0.30 × 0.6) and agent C's by 0.11 (0.18 × 0.6). This single intervention may cost less than three independent graduated responses.
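The attenuation arithmetic in this example can be checked directly; a sketch assuming a fixed attenuation factor of 0.6 per hop:

```python
def chain_reduction(root_reduction, n_agents, attenuation=0.6):
    """Drift reduction at each hop when intervening at the chain root.
    Rounded to two decimals to match the figures in the example above."""
    reductions, r = [], root_reduction
    for _ in range(n_agents):
        reductions.append(round(r, 2))
        r *= attenuation
    return reductions
```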

Correlated Upstream Fix (optimize_correlated)

For correlated drift (multiple agents drifting in the same direction due to a shared upstream cause), the optimizer compares:

  1. Independent policies: Optimize each agent separately, sum costs.

  2. Upstream fix: A single fix to the shared cause (e.g., reverting a model update, fixing a data pipeline) at a fixed cost.

If the upstream fix cost is less than the sum of individual intervention costs, all agents receive an "upstream_fix" policy indicating the shared remediation.

When upstream_fix_cost is not provided, it is estimated as 2× the median individual intervention cost.
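A sketch of the comparison, including the documented 2× median fallback for the upstream fix cost:

```python
from statistics import median

def choose_remediation(individual_costs, upstream_fix_cost=None):
    """Pick the cheaper of independent per-agent policies vs. one shared
    upstream fix, using the documented 2x-median fallback estimate."""
    if upstream_fix_cost is None:
        upstream_fix_cost = 2 * median(individual_costs)
    independent_total = sum(individual_costs)
    if upstream_fix_cost < independent_total:
        return "upstream_fix", upstream_fix_cost
    return "independent", independent_total
```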

Flag Gating

Multi-agent coordination only activates when the FleetBehavioralMonitor fires a coordinated or correlated drift alert. The integration point is FleetBehavioralMonitor.on_alert_with_policy_coordination(), which dispatches to optimize_chain or optimize_correlated based on the alert scope.

Fleet-scope alerts (aggregate distribution shift) do not trigger policy coordination — they indicate a fleet-wide trend, not agent-to-agent interaction.
