Sequential Governance Optimization

Why Point-in-Time Governance Is Suboptimal

Traditional governance systems — including Nomotic's own Behavioral Trajectory Engine prior to v0.8.0 — make intervention decisions at a single point in time. When drift exceeds a threshold, the system triggers a fixed intervention. This approach has three structural weaknesses:

  1. No lookahead: The system cannot reason about the cost of intervening now versus later. An advisory today might be cheap enough to prevent an expensive escalation tomorrow.

  2. No strategy comparison: Threshold-based systems consider exactly one intervention at a time. They cannot compare "throttle now" against "wait 10 steps and escalate" against "graduated advisory → throttle → escalate."

  3. No cost awareness: The triggered intervention depends only on the violation probability, not on the cost of the intervention itself. Once intervention fatigue has accumulated, a throttle may prevent fewer violations per unit of cost than a cheaper advisory.

The Monte Carlo Policy Optimizer addresses all three by simulating multiple intervention strategies forward and selecting the one with the lowest expected total cost.

Monte Carlo Approach

The optimizer evaluates each candidate strategy by running N Monte Carlo rollouts (default: 200) per strategy:

For each strategy S in candidate_strategies:
    For rollout = 1..N:
        1. Initialize: drift = current_drift, velocity = current_velocity
        2. For each step in [0..horizon]:
            a. If strategy S triggers an intervention at this step:
               - Sample intervention cost from InterventionCostModel
               - Project drift with intervention effect via DriftDynamicsModel
            b. Else:
               - Project drift naturally via DriftDynamicsModel
            c. If drift > violation_threshold:
               - Add per-step violation cost (from CostProfile.false_allow_cost)
        3. total_cost = sum(intervention_costs) + sum(violation_costs)
    Compute mean_cost(S) and variance(S) across N rollouts
Select S* = argmin(mean_cost)
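The loop above can be sketched in Python. This is a minimal stand-in: the real DriftDynamicsModel and InterventionCostModel include archetype parameters, decay/rebound, and causal modifiers, so the `schedule` mapping, cost sampling, and drift update here are simplifying assumptions.

```python
import random

def evaluate_strategy(schedule, drift, velocity, horizon=100, n_rollouts=200,
                      violation_threshold=0.5, step_violation_cost=0.01):
    """Mean cost and variance for one strategy across N rollouts.

    `schedule` maps step -> (base_cost, drift_reduction); both numbers are
    hypothetical stand-ins for InterventionCostModel / DriftDynamicsModel.
    """
    costs = []
    for _ in range(n_rollouts):
        d, total = drift, 0.0
        for step in range(horizon + 1):
            if step in schedule:                          # strategy fires here
                base_cost, reduction = schedule[step]
                total += base_cost * random.uniform(0.8, 1.2)  # sampled cost
                d = max(0.0, d - reduction)               # intervention effect
            else:
                d += velocity + random.gauss(0.0, 0.002)  # natural projection
            if d > violation_threshold:
                total += step_violation_cost              # per-step violation cost
        costs.append(total)
    mean = sum(costs) / n_rollouts
    var = sum((c - mean) ** 2 for c in costs) / n_rollouts
    return mean, var

random.seed(7)
strategies = {"no_op": {}, "early_advisory": {0: (0.02, 0.10)}}
scored = {name: evaluate_strategy(s, drift=0.15, velocity=0.005)
          for name, s in strategies.items()}
best = min(scored, key=lambda n: scored[n][0])  # argmin of mean cost
```

With drift already trending upward, the advisory's small upfront cost prevents most of the violation steps, so `best` resolves to early_advisory here.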

The optimizer leverages the full DriftDynamicsModel including archetype-specific parameters, intervention decay/rebound dynamics, and causal modifiers. This means the Monte Carlo projections capture self-correction, intervention fatigue, and archetype-specific intervention effectiveness.

Predefined Strategies

Five intervention strategies are predefined. Operators can override or extend these via OptimizerConfig.strategies.

| Strategy | Schedule | When Used |
| --- | --- | --- |
| no_op | empty | Drift is negligible; intervention would waste resources |
| early_advisory | advisory @ step 0 | Low drift detected; soft nudge may suffice |
| wait_and_throttle | throttle @ step 25 | Moderate drift; wait to confirm before acting |
| immediate_escalate | escalate @ step 0 | High drift; immediate human review needed |
| graduated | advisory @ 0, throttle @ 25, escalate @ 50 | Proportional response across the horizon |
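For instance, extending the predefined set might look like this. The mapping schema (strategy name to step-indexed action) is an illustrative assumption about OptimizerConfig.strategies, not its documented schema.

```python
# Hypothetical schema: strategy name -> {step: intervention_type}.
# OptimizerConfig.strategies is the documented extension point; the
# schedule format used here is an illustrative assumption.
custom_strategies = {
    "no_op": {},
    "early_advisory": {0: "advisory"},
    "double_advisory": {0: "advisory", 50: "advisory"},  # custom addition
    "graduated": {0: "advisory", 25: "throttle", 50: "escalate"},
}

def interventions_at(strategies, step):
    """Return the (strategy, action) pairs that fire at a given step."""
    return [(name, sched[step]) for name, sched in strategies.items()
            if step in sched]
```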

Heuristic Pruning

To avoid wasting compute on irrelevant strategies, the optimizer prunes the candidate set based on current drift:

  • drift < 0.10: Only no_op and early_advisory are considered. Aggressive strategies would be pure cost with no benefit.

  • drift > 0.40: no_op and early_advisory are excluded. Passive strategies are too risky at this drift level.

  • 0.10 ≤ drift ≤ 0.40: All strategies are evaluated.

This pruning typically reduces compute by 40-60% while preserving optimality for the relevant drift range.
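A sketch of the pruning rule, using the documented thresholds and the predefined strategy names:

```python
ALL_STRATEGIES = ["no_op", "early_advisory", "wait_and_throttle",
                  "immediate_escalate", "graduated"]

def prune_candidates(drift, low=0.10, high=0.40):
    """Drop strategies that cannot win at the current drift level."""
    if drift < low:
        # Aggressive strategies would be pure cost with no benefit.
        return ["no_op", "early_advisory"]
    if drift > high:
        # Passive strategies are too risky at this drift level.
        return [s for s in ALL_STRATEGIES
                if s not in ("no_op", "early_advisory")]
    return list(ALL_STRATEGIES)  # mid-range: evaluate everything
```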

Policy Output Format

The optimizer produces an InterventionPolicy: a frozen, serializable, human-readable intervention plan.
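A plausible shape for such a policy object, sketched as a frozen dataclass. It is frozen and serializable as documented, but the field names here are assumptions, not the engine's actual schema.

```python
import json
from dataclasses import dataclass, asdict
from typing import Tuple

@dataclass(frozen=True)
class InterventionPolicy:
    """Illustrative shape only: immutable and serializable, as documented,
    but these field names are assumptions."""
    agent_id: str
    strategy: str
    schedule: Tuple[Tuple[int, str], ...]  # (step, action) pairs
    expected_cost: float
    confidence: float

    def to_json(self) -> str:
        return json.dumps(asdict(self))

policy = InterventionPolicy(
    agent_id="sales-bot-7",
    strategy="early_advisory",
    schedule=((0, "advisory"),),
    expected_cost=0.11,
    confidence=0.43,
)
```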

The to_readable() method renders the policy in a decision-tree format suitable for operator dashboards.
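A hypothetical renderer in that spirit; the engine's real to_readable() layout may differ.

```python
def to_readable(strategy, schedule, expected_cost, confidence):
    """Hypothetical dashboard rendering; the actual to_readable()
    format may differ."""
    lines = [f"policy: {strategy}",
             f"expected_cost: {expected_cost:.2f}  confidence: {confidence:.2f}"]
    for step, action in schedule:
        lines.append(f"  step {step:>3}: {action}")
    return "\n".join(lines)

text = to_readable("graduated",
                   [(0, "advisory"), (25, "throttle"), (50, "escalate")],
                   0.12, 0.45)
```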

Policy Stability

The optimizer includes a hysteresis mechanism to prevent policy thrashing. When re-optimizing for an agent that already has a policy, the new policy is only adopted if it improves expected cost by more than policy_switch_threshold (default: 10%).

This prevents the optimizer from oscillating between similarly-scored strategies on each evaluation cycle.
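The hysteresis check reduces to a single comparison; a sketch assuming relative improvement is measured against the incumbent policy's expected cost:

```python
def should_switch(current_cost, candidate_cost, threshold=0.10):
    """Adopt a new policy only when it beats the incumbent by more than
    policy_switch_threshold (default 10%)."""
    if current_cost <= 0:
        return candidate_cost < current_cost
    return (current_cost - candidate_cost) / current_cost > threshold
```

A candidate at 0.11 against an incumbent at 0.115 is only a ~4% improvement, so the incumbent is kept.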

Compute Guardrails

The optimizer is designed to operate within the runtime's evaluation loop, so compute is bounded:

| Parameter | Default | Purpose |
| --- | --- | --- |
| n_rollouts | 200 | Simulations per strategy |
| timeout_seconds | 5.0 | Hard wall-clock limit per agent |
| horizon | 100 | Forward projection steps |

If the timeout is reached mid-simulation, the optimizer returns the best result computed so far. This ensures the governance pipeline never blocks indefinitely.
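The best-so-far behavior can be sketched as a deadline check between strategy evaluations (assuming per-strategy granularity; the real optimizer may also check mid-rollout):

```python
import time

def optimize_with_timeout(strategies, evaluate, timeout_seconds=5.0):
    """Return the best (strategy, cost) found before the wall-clock limit.

    Checks the deadline between strategy evaluations, so a partial result
    is returned rather than blocking the governance pipeline.
    """
    deadline = time.monotonic() + timeout_seconds
    best_name, best_cost = None, float("inf")
    for name, schedule in strategies.items():
        if time.monotonic() >= deadline:
            break  # timeout hit: return best-so-far
        cost = evaluate(schedule)
        if cost < best_cost:
            best_name, best_cost = name, cost
    return best_name, best_cost
```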

How to Enable

The optimizer is backward-compatible. When enable_policy_optimizer=False (default), the trajectory engine behaves exactly as in v0.7.0 with simple threshold-based interventions.
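Enabling it might look like the following. The flag name enable_policy_optimizer comes from this document, but the engine class and method shown are illustrative assumptions.

```python
# Illustrative sketch: enable_policy_optimizer is the documented flag,
# but the engine class and method names here are assumptions.
class TrajectoryEngine:
    def __init__(self, enable_policy_optimizer=False):
        self.enable_policy_optimizer = enable_policy_optimizer

    def select_intervention(self, drift, threshold=0.3):
        if not self.enable_policy_optimizer:
            # v0.7.0 behavior: simple threshold-based trigger.
            return "advisory" if drift > threshold else None
        return self._optimize_policy(drift)  # Monte Carlo path

    def _optimize_policy(self, drift):
        return "optimized"  # placeholder for the optimizer call
```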

Configuration
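A configuration sketch whose defaults mirror the values documented above; the field layout of OptimizerConfig itself is an assumption.

```python
from dataclasses import dataclass, field

@dataclass
class OptimizerConfig:
    """Defaults mirror the documented values; the field layout of
    OptimizerConfig itself is an illustrative assumption."""
    n_rollouts: int = 200                  # simulations per strategy
    timeout_seconds: float = 5.0           # hard wall-clock limit per agent
    horizon: int = 100                     # forward projection steps
    policy_switch_threshold: float = 0.10  # hysteresis margin (10%)
    strategies: dict = field(default_factory=dict)  # optional overrides

cfg = OptimizerConfig(n_rollouts=500)
```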

Worked Example

Scenario: Agent sales-bot-7 has been drifting at 0.15 with velocity 0.005/step for the past 50 observations. Archetype is sales-agent with a BALANCED cost profile.

  1. Pruning: drift=0.15 is in [0.10, 0.40] → all 5 strategies evaluated.

  2. Simulation results (200 rollouts each, horizon=100):

    | Strategy | Mean Cost | Variance |
    | --- | --- | --- |
    | no_op | 0.19 | 0.008 |
    | early_advisory | 0.11 | 0.004 |
    | wait_and_throttle | 0.14 | 0.006 |
    | immediate_escalate | 0.18 | 0.003 |
    | graduated | 0.12 | 0.005 |

  3. Selection: early_advisory wins with mean cost 0.11.

  4. Confidence: CV = sqrt(0.004)/0.11 ≈ 0.57 → confidence = 1 − CV ≈ 0.43. Moderate: advisory outcomes are somewhat variable for sales agents.

  5. Cost vs naive: (0.18 - 0.11) / 0.18 ≈ 39% savings compared to immediate escalation.

  6. Policy output:
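An illustrative rendering of the selected policy using the worked-example figures; the engine's actual output format may differ.

```python
# Illustrative rendering using the worked-example figures; the engine's
# actual policy output format may differ.
policy = {
    "agent_id": "sales-bot-7",
    "strategy": "early_advisory",
    "schedule": [(0, "advisory")],
    "expected_cost": 0.11,
    "confidence": 0.43,
    "savings_vs_immediate_escalate": 0.39,
}
summary = (f"{policy['agent_id']}: {policy['strategy']} "
           f"(cost {policy['expected_cost']:.2f}, "
           f"confidence {policy['confidence']:.2f})")
```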

The operator sees exactly what the system plans to do and why. If drift continues to rise despite the advisory, the next optimization cycle will select a more aggressive strategy.

Multi-Agent Policy Coordination

When the FleetBehavioralMonitor detects coordinated or correlated drift across multiple agents, the optimizer can evaluate cross-agent intervention strategies that are cheaper than intervening on each agent independently.

Multi-agent coordination is flag-gated: it activates only when a FleetDriftAlert with scope "coordinated" or "correlated" is detected. Single-agent optimization is unaffected.

Chain Optimization (optimize_chain)

For coordinated drift propagating through an interaction chain (A → B → C), the optimizer evaluates intervening at each point in the chain.

Intervening at the root of a propagation chain can reduce drift for all downstream agents, making a single strong intervention cheaper than three independent ones. The attenuation factor controls how much of the drift reduction propagates to each subsequent hop.

Example: In a 3-agent chain with drifts [0.45, 0.35, 0.25], intervening at the root with an escalation that reduces its drift by 0.30 also reduces agent B's drift by 0.18 (0.30 × 0.6) and agent C's by 0.11 (0.18 × 0.6). This single intervention may cost less than three independent graduated responses.
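The attenuation arithmetic in this example can be checked directly; a sketch assuming a fixed attenuation factor of 0.6 per hop:

```python
def chain_reduction(root_reduction, n_agents, attenuation=0.6):
    """Drift reduction at each hop when intervening at the chain root.
    Rounded to two decimals to match the figures in the example above."""
    reductions, r = [], root_reduction
    for _ in range(n_agents):
        reductions.append(round(r, 2))
        r *= attenuation
    return reductions
```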

Correlated Upstream Fix (optimize_correlated)

For correlated drift (multiple agents drifting in the same direction due to a shared upstream cause), the optimizer compares:

  1. Independent policies: Optimize each agent separately, sum costs.

  2. Upstream fix: A single fix to the shared cause (e.g., reverting a model update, fixing a data pipeline) at a fixed cost.

If the upstream fix cost is less than the sum of individual intervention costs, all agents receive an "upstream_fix" policy indicating the shared remediation.

When upstream_fix_cost is not provided, it is estimated as 2× the median individual intervention cost.
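A sketch of the comparison, including the documented 2× median fallback for the upstream fix cost:

```python
from statistics import median

def choose_remediation(individual_costs, upstream_fix_cost=None):
    """Pick the cheaper of independent per-agent policies vs. one shared
    upstream fix, using the documented 2x-median fallback estimate."""
    if upstream_fix_cost is None:
        upstream_fix_cost = 2 * median(individual_costs)
    independent_total = sum(individual_costs)
    if upstream_fix_cost < independent_total:
        return "upstream_fix", upstream_fix_cost
    return "independent", independent_total
```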

Flag Gating

Multi-agent coordination only activates when the FleetBehavioralMonitor fires a coordinated or correlated drift alert. The integration point is FleetBehavioralMonitor.on_alert_with_policy_coordination(), which dispatches to optimize_chain or optimize_correlated based on the alert scope.

Fleet-scope alerts (aggregate distribution shift) do not trigger policy coordination — they indicate a fleet-wide trend, not agent-to-agent interaction.
