Semantic Drift Detection

Semantic drift is the fifth distribution in Nomotic's behavioral fingerprint. While the existing four distributions — action, target, temporal, and outcome — track the structural shape of agent behavior, semantic drift tracks the meaning-level mapping between an agent's instructions and its actions.

The Problem

An agent instructed to "research flights" that begins "booking flights" while maintaining identical action type and target distributions has drifted semantically. The structural fingerprint sees no change — the agent still performs reads and writes against the same APIs. But the operational meaning of its instructions has shifted: "research" no longer means passive information gathering; it now means transactional execution.

Structural drift detection cannot catch this. Semantic drift detection can.

Core Concepts

Semantic Anchors

A SemanticAnchor defines what an instruction term should mean operationally:

from nomotic.semantic import SemanticAnchor

anchor = SemanticAnchor(
    term="research",
    expected_action_distribution={"read": 0.9, "query": 0.1},
    expected_target_distribution={"search_api": 0.7, "reviews": 0.3},
    tolerance=0.15,
)

Anchors are set per-agent, typically from a BehavioralContract or archetype defaults. They encode the expected behavioral signature of each instruction term.

Semantic Action Map

A SemanticActionMap tracks the observed mapping between instruction terms and action patterns for a specific agent. As the agent operates, each action is tagged with the instruction term it was executed under, building per-term action and target distributions.
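As a rough illustration of the tagging described above, the sketch below keeps per-term counters and normalizes them into distributions. `TermActionMap` and its method names are hypothetical stand-ins written for this example; they are not Nomotic's actual SemanticActionMap API.

```python
from collections import Counter, defaultdict


def _normalize(counts):
    """Turn raw counts into a probability distribution (empty dict if no data)."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()} if total else {}


class TermActionMap:
    """Hypothetical stand-in: per-term action and target counts for one agent."""

    def __init__(self):
        self._actions = defaultdict(Counter)
        self._targets = defaultdict(Counter)

    def record(self, term, action_type, target):
        # Tag the action with the instruction term it was executed under.
        self._actions[term][action_type] += 1
        self._targets[term][target] += 1

    def action_distribution(self, term):
        return _normalize(self._actions.get(term, Counter()))

    def target_distribution(self, term):
        return _normalize(self._targets.get(term, Counter()))

    def observation_count(self, term):
        return sum(self._actions.get(term, Counter()).values())
```

Recording three "read" actions and one "write" action under the term "research" would give an observed action distribution of 0.75 read and 0.25 write for that term.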

Semantic Drift Score

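A SemanticDriftScore compares the observed distributions against the anchored expectations using Jensen-Shannon divergence (the same metric used for structural drift). Per-term drift is computed as a weighted sum of the two divergences:

    term_drift = 0.6 * JSD(observed_actions, expected_actions) + 0.4 * JSD(observed_targets, expected_targets)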

Action mapping is weighted more heavily (0.6) than target mapping (0.4) because a shift in what the agent does under an instruction is more significant than a shift in where it does it.

The overall semantic drift is the observation-count-weighted average across all anchored terms.
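The scoring described above can be sketched in a few lines of self-contained Python. The 0.6 and 0.4 weights and the observation-count weighting come from the text. The function names and the use of base-2 logarithms (which bound the divergence to the range 0 to 1) are assumptions made for illustration, not necessarily how Nomotic implements them.

```python
import math

ACTION_WEIGHT = 0.6  # from the text: action mapping is weighted more heavily
TARGET_WEIGHT = 0.4  # from the text: target mapping


def jsd(p, q):
    """Jensen-Shannon divergence between two discrete distributions (dicts).

    Assumption: base-2 logarithm, so the result lies in [0, 1].
    """
    total = 0.0
    for key in set(p) | set(q):
        pk = p.get(key, 0.0)
        qk = q.get(key, 0.0)
        mk = 0.5 * (pk + qk)  # mixture distribution
        if pk > 0:
            total += 0.5 * pk * math.log2(pk / mk)
        if qk > 0:
            total += 0.5 * qk * math.log2(qk / mk)
    return total


def term_drift(obs_actions, exp_actions, obs_targets, exp_targets):
    """Weighted per-term drift: 0.6 * action JSD + 0.4 * target JSD."""
    return (ACTION_WEIGHT * jsd(obs_actions, exp_actions)
            + TARGET_WEIGHT * jsd(obs_targets, exp_targets))


def overall_semantic_drift(per_term):
    """Observation-count-weighted average over (drift, count) pairs."""
    total = sum(count for _, count in per_term)
    if total == 0:
        return 0.0
    return sum(drift * count for drift, count in per_term) / total
```

Under these assumptions, an agent whose actions under "research" are entirely disjoint from the anchor but whose targets match exactly would score a per-term drift of 0.6.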

Severity Thresholds

Semantic drift uses the same severity thresholds as structural drift:

Severity    Overall Score
none        < 0.05
low         0.05 - 0.15
moderate    0.15 - 0.35
high        0.35 - 0.60
critical    >= 0.60
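The thresholds above translate directly into a small classifier. This is a sketch only: `classify_severity` is a hypothetical helper name, and treating each lower bound as inclusive is an assumption, since the table does not say which band a score of exactly 0.15 or 0.35 belongs to.

```python
def classify_severity(score: float) -> str:
    """Map an overall drift score to a severity label.

    Assumption: lower bounds are inclusive, matching "< 0.05" and ">= 0.60".
    """
    if score < 0.05:
        return "none"
    if score < 0.15:
        return "low"
    if score < 0.35:
        return "moderate"
    if score < 0.60:
        return "high"
    return "critical"
```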

Architecture

Semantic drift is tracked externally to the main BehavioralFingerprint because it requires instruction context that the fingerprint doesn't have access to. The architecture is:

  • SemanticObserver — sits alongside FingerprintObserver in the observation layer. Maintains per-agent SemanticActionMap instances and SemanticAnchor registrations.

  • BehavioralFingerprint.semantic_map_ref — a reference field linking the fingerprint to its semantic map.

  • DriftCalculator.compare() — accepts an optional semantic_drift_score parameter. When provided, semantic drift is included in the weighted overall score. When absent, it is excluded for backward compatibility.

  • DriftScore.semantic_drift — new field (defaults to 0.0) storing the semantic component.

  • DriftMonitor — accepts an optional semantic_observer and automatically fetches semantic drift before computing overall drift.
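To make the optional-parameter behavior concrete, here is one way the weighted overall score could be combined. The text does not give the exact formula, so the weighted average below is an assumption, and `combine_drift` is a hypothetical function, not the real DriftCalculator.compare() signature.

```python
def combine_drift(components, weights, semantic_drift=None):
    """Weighted average of per-distribution drift scores.

    components: scores for the four structural distributions.
    weights: per-distribution weights (missing entries default to 1.0).
    semantic_drift: included only when provided, so callers that never
    pass it get the same result as before the fifth distribution existed.
    """
    scores = dict(components)
    if semantic_drift is not None:
        scores["semantic"] = semantic_drift
    total_weight = sum(weights.get(name, 1.0) for name in scores)
    if total_weight == 0:
        return 0.0
    weighted = sum(scores[name] * weights.get(name, 1.0) for name in scores)
    return weighted / total_weight


structural = {"action": 0.1, "target": 0.1, "temporal": 0.1, "outcome": 0.1}
weights = {"action": 1.0, "target": 1.0, "temporal": 1.0, "outcome": 1.0,
           "semantic": 1.5}

without_semantic = combine_drift(structural, weights)
with_semantic = combine_drift(structural, weights, semantic_drift=0.6)
```

In this sketch a high semantic score raises the overall score even though all four structural components are low, which is the case structural detection alone would miss.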

Usage

Setting Anchors

Providing Instruction Context

Instruction context flows through action parameters.

The SemanticObserver extracts instruction context from (in order of priority):

  1. The explicit instruction_context parameter passed to observe()

  2. action.parameters["instruction_context"]

  3. action.parameters["task_description"]

  4. action.parameters["goal"]

  5. Falls back to "__untagged__" if no context is available
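The priority order above can be sketched as a simple lookup chain. `resolve_instruction_context` is a hypothetical helper written for this example. The parameter keys and the "__untagged__" fallback come from the list, while treating empty strings as missing is an assumption.

```python
from typing import Any, Mapping, Optional

UNTAGGED = "__untagged__"
# Parameter keys checked in priority order, after the explicit argument.
PARAMETER_KEYS = ("instruction_context", "task_description", "goal")


def resolve_instruction_context(
    parameters: Mapping[str, Any],
    explicit: Optional[str] = None,
) -> str:
    """Return the first available instruction context, else the fallback."""
    if explicit:  # 1. explicit argument wins
        return explicit
    for key in PARAMETER_KEYS:  # 2-4. action parameters, in order
        value = parameters.get(key)
        if value:  # assumption: empty values count as missing
            return value
    return UNTAGGED  # 5. nothing available
```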

Querying Semantic Drift

Direct SemanticObserver Usage

Archetype Integration

All 10 built-in archetype priors now include "semantic" in their drift_weights. Archetypes where semantic meaning is especially important have elevated weights:

Archetype            Semantic Weight    Rationale
financial-analyst    1.5                Financial terms must retain exact meaning
security-monitor     1.3                Security terminology is safety-critical
All others           1.0                Standard semantic drift sensitivity

The ArchetypePrior dataclass also gains a semantic_anchors field for defining default anchors per archetype.

Drift Taxonomy Placement

Semantic drift is a distribution (what is drifting), not a scope (who is drifting). In the Nomotic Drift Taxonomy:

Five Drift Distributions:

  1. Action drift — change in what the agent does

  2. Target drift — change in where the agent operates

  3. Temporal drift — change in when the agent acts

  4. Outcome drift — change in governance evaluation outcomes

  5. Semantic drift — change in the meaning mapping between instructions and actions

Five Drift Scopes:

  1. Agent drift — individual agent behavioral deviation

  2. Human drift — human reviewer oversight degradation

  3. Fleet drift — aggregate drift across agent populations

  4. Correlated drift — multiple agents drifting in the same direction

  5. Coordinated drift — agents drifting in complementary ways

Semantic drift can appear at any scope: an individual agent can drift semantically, an entire fleet can exhibit correlated semantic drift, or agents can show coordinated semantic drift where one agent's "research" drift complements another's "execute" drift.

Serialization

All semantic types support to_dict() / from_dict() roundtrip serialization.
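A roundtrip of this kind might look like the following sketch. `AnchorSketch` is a hypothetical dataclass that mirrors the SemanticAnchor fields shown earlier; the real class's serialized layout may differ.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class AnchorSketch:
    """Hypothetical stand-in with the same fields as the SemanticAnchor example."""
    term: str
    expected_action_distribution: dict
    expected_target_distribution: dict
    tolerance: float = 0.15

    def to_dict(self) -> dict:
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict) -> "AnchorSketch":
        return cls(**data)


anchor = AnchorSketch(
    term="research",
    expected_action_distribution={"read": 0.9, "query": 0.1},
    expected_target_distribution={"search_api": 0.7, "reviews": 0.3},
)

# Roundtrip through JSON to confirm the dict form is plain, serializable data.
restored = AnchorSketch.from_dict(json.loads(json.dumps(anchor.to_dict())))
```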

Thread Safety

All mutable state in SemanticActionMap and SemanticObserver is protected by threading.Lock, matching the thread-safety guarantees of the existing fingerprint system.
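The locking pattern described here is the standard one of guarding every read and write of shared state with the same lock. The sketch below shows that pattern on a hypothetical `LockedTermCounter`; it is not the actual SemanticActionMap implementation.

```python
import threading
from collections import Counter


class LockedTermCounter:
    """Hypothetical stand-in: a counter whose state is guarded by one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = Counter()

    def record(self, term, action_type):
        with self._lock:  # serialize concurrent updates
            self._counts[(term, action_type)] += 1

    def snapshot(self):
        with self._lock:  # return a copy so callers never see partial state
            return dict(self._counts)


def hammer(counter, n):
    for _ in range(n):
        counter.record("research", "read")


counter = LockedTermCounter()
threads = [threading.Thread(target=hammer, args=(counter, 1000)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Returning a copy from `snapshot()` rather than the internal counter is a deliberate choice in this sketch: it keeps readers from iterating over state that another thread is mutating.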
