Technical Control Specifications

This section specifies the enforcement points, failure modes, and evaluation criteria for each of the six ACR control pillars. These specifications define the testable requirements for conformance assessment.

Every agent must have a unique identity, a declared purpose, and constraints tied to that purpose. The system must maintain an authoritative agent record verifiable through cryptographic proof.

Enforcement Points

  • Agent identity validation at every protected action via cryptographic proof (signed tokens, certificates, workload identity, or attestation)
  • Purpose scope check against authoritative manifest containing: agent_id, owner, purpose, risk tier, allowed tools, forbidden tools, approved data access scope, operational boundaries
  • Session binding validation with documented lifetime, binding rules, and revalidation conditions
  • Any change to purpose, capability scope, or high-risk boundary requires authorized control path, versioning, logging, and audit trail
  • Revocation of agent identity or execution authority prevents all further protected action execution

Failure Modes

  • Identity validation failure: deny execution, invalidate unverified request, log failure reason
  • Purpose mismatch: deny execution, log attempted out-of-scope action
  • Revoked identity: prevent all future protected action execution
  • Missing manifest fields: deny action until authoritative record is complete
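
The checks and failure modes above can be sketched as a single authorization function. This is a minimal illustration, not a reference implementation: the field names in `REQUIRED_FIELDS`, the `authorize` function, and the reason strings are hypothetical, and cryptographic verification of the identity proof is assumed to have happened upstream and is passed in as a boolean.

```python
# Manifest fields taken from the purpose scope check above; the exact
# key spellings are an assumption made for this sketch.
REQUIRED_FIELDS = (
    "agent_id", "owner", "purpose", "risk_tier", "allowed_tools",
    "forbidden_tools", "data_access_scope", "operational_boundaries",
)


def authorize(manifest: dict, tool: str, identity_verified: bool,
              revoked_ids: set) -> tuple[bool, str]:
    """Return (allowed, reason) for one protected action.

    identity_verified is assumed to be the result of an upstream
    cryptographic check (signed token, certificate, attestation).
    """
    if not identity_verified:
        return False, "identity_validation_failed"

    # An incomplete authoritative record denies the action outright.
    missing = [f for f in REQUIRED_FIELDS if manifest.get(f) is None]
    if missing:
        return False, "manifest_incomplete: " + ",".join(missing)

    if manifest["agent_id"] in revoked_ids:
        return False, "identity_revoked"

    # Forbidden tools win over the allowlist; anything unlisted is out of scope.
    if tool in manifest["forbidden_tools"] or tool not in manifest["allowed_tools"]:
        return False, "purpose_mismatch"

    return True, "allowed"


example_manifest = {
    "agent_id": "agent-001", "owner": "ops", "purpose": "ticket triage",
    "risk_tier": "low", "allowed_tools": ["search", "summarize"],
    "forbidden_tools": ["delete_record"], "data_access_scope": ["tickets"],
    "operational_boundaries": ["business-hours"],
}
```

In a real deployment each denial would also be written to the audit log with its reason, as the failure modes require.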

Evaluation Criteria

  • Every sampled protected action is attributable to a single registered agent_id
  • Authoritative agent record contains every required field
  • Out-of-scope actions are denied
  • Changes to purpose or capability scope are versioned and auditable
  • Revoked identities cannot execute protected actions

The control plane enforces policy at the Input, Execution, and Output Boundaries. Policy must be machine-enforceable and evaluated at runtime; documentation-only policies do not satisfy this requirement.

Enforcement Points

  • Input Boundary: schema validation, prompt/instruction sanitization, injection and jailbreak detection, length/format/content limits, source trust evaluation, rejection or transformation of disallowed inputs
  • Execution Boundary: tool allowlisting, destination restriction, parameter validation, spend and rate limits, data access authorization, action-tier classification, sequence or context-aware control, approval gating
  • Output Boundary: PII/PHI/sensitive-data redaction, output filtering, transformation or truncation, destination-aware release restrictions, blocking or escalation for disallowed content
  • Versioned policy definitions maintained with decision evidence: policy identifier, effective version, decision outcome, justification or triggered rule, approval or override context
  • Policy engine unavailability or indeterminate result: deny execution by default

Failure Modes

  • Policy engine unavailable: treat as DENY, prevent all protected execution, record failure condition
  • Indeterminate result from policy evaluation: treat as DENY, log failure
  • Disallowed input: denied, modified, or escalated as required by policy
  • Disallowed action: denied at execution boundary, logged with policy basis
  • Disallowed output: blocked, redacted, or escalated per output boundary rules
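
The deny-by-default behavior for an unavailable or indeterminate policy engine can be illustrated with a small wrapper. This is a sketch under stated assumptions: the `Decision` record, the `evaluate_fail_closed` function, and the engine interface (a callable that returns "ALLOW" or "DENY") are all hypothetical and do not describe any particular policy engine.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Decision:
    """Decision evidence: policy identifier, version, outcome, and reason."""
    policy_id: str
    policy_version: str
    outcome: str  # "ALLOW" or "DENY"
    reason: str


def evaluate_fail_closed(engine: Callable[[dict], Optional[str]], request: dict,
                         policy_id: str, policy_version: str) -> Decision:
    """Call the engine and convert any failure or unclear answer into DENY."""
    try:
        result = engine(request)
    except Exception as exc:
        # Engine unavailable: treat as DENY and record the failure condition.
        return Decision(policy_id, policy_version, "DENY",
                        "engine_unavailable: " + type(exc).__name__)
    if result not in ("ALLOW", "DENY"):
        # Anything other than an explicit verdict is indeterminate.
        return Decision(policy_id, policy_version, "DENY", "indeterminate_result")
    return Decision(policy_id, policy_version, result, "policy_evaluated")


def unreachable_engine(request: dict) -> str:
    """Stand-in for an engine that cannot be reached."""
    raise ConnectionError("policy engine unreachable")
```

The key design choice is that only an explicit "ALLOW" permits execution; every other path, including exceptions, ends in a logged DENY.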

Evaluation Criteria

  • Policy evaluation occurs before protected execution at all applicable boundaries
  • Each sampled boundary control produces a logged and auditable outcome
  • Disallowed inputs, actions, and outputs are denied, modified, or escalated as required
  • Policy versions can be traced from decision records to approved policy definitions
  • Policy engine failure results in prevented execution

The system detects deviations from intended role, expected patterns, or approved boundaries. A conformant drift-detection capability requires a defined behavioral baseline, detection signals, thresholded response criteria, and evidence of review or calibration.

Enforcement Points

  • Behavioral baseline established before unrestricted autonomous execution: minimum 30 days of representative activity, or documented temporary baseline approved by human authority for new agents
  • Continuous monitoring of eight signal categories: tool usage patterns, data access patterns, action frequency and burst behavior, repeated policy denials, escalation pressure or repeated reformulations, novel action sequences, off-hours or boundary-inconsistent activity, suspicious input/output patterns related to manipulation attempts
  • Normalized drift score computed in range 0.0 to 1.0, or documented severity classification mapping deterministically to response tiers
  • Threshold crossings trigger documented response tier automatically without ad hoc operator interpretation
  • Response tiers: throttle, restrict, isolate, kill

Failure Modes

  • Baseline unavailable for new agent: documented temporary baseline required before unrestricted execution
  • Drift detection failure: restrict agent to safe-state pending resolution
  • Drift threshold exceeded: trigger documented response tier immediately
  • False positive identified: trigger calibration review, adjust thresholds where warranted, retain evidence of review
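
A deterministic mapping from a normalized drift score to a response tier might look like the sketch below. The threshold values are hypothetical placeholders (the specification does not fix them), and mapping a detector failure to the "restrict" tier is an assumption made here to stand in for the documented safe-state.

```python
import bisect
from typing import Optional

# Hypothetical thresholds; real values would come from calibration review.
THRESHOLDS = [0.25, 0.50, 0.75, 0.90]
# One more tier than thresholds: scores below the first threshold need no response.
TIERS = ["none", "throttle", "restrict", "isolate", "kill"]


def response_tier(score: Optional[float]) -> str:
    """Map a drift score in [0.0, 1.0] to a response tier.

    A score of None models a drift-detection failure and is mapped to
    "restrict" (an assumed stand-in for the documented safe-state).
    """
    if score is None:
        return "restrict"
    if not 0.0 <= score <= 1.0:
        # Also rejects NaN, since comparisons with NaN are False.
        raise ValueError("drift score must be within [0.0, 1.0]")
    # bisect_right counts how many thresholds the score has reached or passed.
    return TIERS[bisect.bisect_right(THRESHOLDS, score)]
```

Because the mapping is a pure function of the score and a fixed threshold table, the same input always yields the same tier, which removes ad hoc operator interpretation from the response path.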

Evaluation Criteria

  • A baseline exists for each protected agent or protected action class
  • Drift output is deterministic for a given input dataset and detector state
  • Threshold crossings trigger the documented response tier
  • Response tiers are logged
  • Retained drift records are sufficient to reconstruct why a response was triggered

The system logs all actions and all decisions. Observability records enable reconstruction of who acted, what was proposed, what controls applied, what disposition was produced, what approvals or overrides occurred, what downstream execution occurred, and when each event occurred.

Enforcement Points

  • Conformant telemetry records include at minimum: schema version, event identifier, timestamp, event type, agent identifier, purpose, request or correlation identifier, decision outcome, relevant policy results, execution metadata, approval metadata for escalation events
  • Correlation identifier or equivalent linkage data recorded across the full action chain
  • High-risk events stored in tamper-evident or append-only form
  • Retention: all ACR events retained for at least 90 days; high-risk agent events retained for at least 13 months
  • Audit-ready export without requiring manual reconstruction from unstructured logs
  • Privacy-preserving treatment applied to sensitive fields: each field category documented as plaintext, redacted, hashed, tokenized, or omitted
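
The minimum telemetry fields and retention periods above can be checked mechanically. The following is a rough sketch: the key names, the `missing_fields` and `earliest_deletion` helpers, and the use of 397 days as a conservative day count for 13 months (no span of 13 consecutive calendar months is longer) are assumptions of this example.

```python
from datetime import datetime, timedelta, timezone

# Key spellings are assumed; the field list follows the enforcement point above.
REQUIRED_TELEMETRY_FIELDS = (
    "schema_version", "event_id", "timestamp", "event_type", "agent_id",
    "purpose", "correlation_id", "decision_outcome", "policy_results",
    "execution_metadata",
)

STANDARD_RETENTION = timedelta(days=90)
# 13 calendar months never exceed 397 days (366 + 31), so this is a safe floor.
HIGH_RISK_RETENTION = timedelta(days=397)


def missing_fields(event: dict) -> list[str]:
    """List the required fields absent from a telemetry event."""
    required = list(REQUIRED_TELEMETRY_FIELDS)
    if event.get("event_type") == "escalation":
        # Approval metadata is required only for escalation events.
        required.append("approval_metadata")
    return [name for name in required if name not in event]


def earliest_deletion(event_time: datetime, high_risk: bool) -> datetime:
    """Earliest moment the event may be deleted without a retention violation."""
    return event_time + (HIGH_RISK_RETENTION if high_risk else STANDARD_RETENTION)


example_time = datetime(2024, 1, 1, tzinfo=timezone.utc)
```

A check like `missing_fields` could run at ingestion so that incomplete events are flagged before they reach durable storage.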

Failure Modes

  • Logging pipeline failure: prevent execution of high-risk protected actions
  • Primary log unavailable: record failure condition through alternate durable mechanism if one exists
  • Evidence preservation failure: transition affected protected action classes to documented safe-state
  • Retention violation: alert and remediate before minimum periods expire

Evaluation Criteria

  • Sampled events contain all required telemetry fields
  • Correlation identifiers connect the full action chain for sampled events
  • Retention controls enforce the required minimum retention periods
  • High-risk logs are append-only or tamper-evident
  • Exported evidence can reconstruct sampled incidents without reliance on unstructured logs
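
One common way to make a log tamper-evident is a hash chain, in which each entry commits to the hash of the previous one. The specification does not prescribe this technique; the sketch below is only an illustration, and `append_event` and `verify_chain` are hypothetical helper names.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder predecessor for the first entry


def _entry_hash(prev_hash: str, event: dict) -> str:
    """Hash the previous hash together with a canonical encoding of the event."""
    payload = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(chain: list, event: dict) -> None:
    """Append an event, linking it to the hash of the preceding entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    chain.append({
        "event": event,
        "prev_hash": prev_hash,
        "hash": _entry_hash(prev_hash, event),
    })


def verify_chain(chain: list) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

A chain on its own does not detect truncation of the tail or a wholesale rewrite, so in practice the latest hash would need to be anchored somewhere the log writer cannot modify.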

The system maintains containment capability independent of the agent runtime. Containment controls include an externally operable kill capability, isolation modes, documented safe-states for each production agent, and logging of all containment actions.

Enforcement Points

  • Kill capability: operates outside the agent runtime, invocable by authorized humans, invocable by automation only if the automation's identity and triggering conditions are documented, completes within 30 seconds from activation to enforcement, revokes or invalidates execution authority, prevents further protected execution until restoration is authorized
  • Graduated containment tiers: throttling, restriction of tools/permissions/destinations, isolation of agent or execution path, kill or full shutdown
  • Safe-state per production agent defines: what actions remain possible, what actions are disabled, how human escalation occurs, how evidence is preserved during containment
  • If a protected action class can change persistent state: document whether rollback is supported; if not, compensating controls must be documented
  • Kill-switch and safe-state procedures tested at least quarterly with test evidence retained

Failure Modes

  • Agent runtime unavailable or compromised: containment remains independently operable
  • Kill activation: revokes execution authority, prevents all further protected execution
  • Rollback not supported: compensating controls must be documented and available
  • Safe-state undefined: production agent must not operate without documented safe-state
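
The kill behavior described above can be sketched as a controller that holds execution authority outside the agent runtime. Everything here is hypothetical: the `ContainmentController` class, its method names, and the in-memory revocation store are illustrative, and a real system would measure enforcement latency across whatever propagation path it actually uses.

```python
import time

KILL_DEADLINE_SECONDS = 30.0  # activation-to-enforcement limit from the specification


class ContainmentController:
    """Holds execution authority independently of the agent runtime."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._killed = {}  # agent_id -> activation time
        self.log = []      # every containment action is recorded

    def kill(self, agent_id: str, actor: str) -> None:
        """Revoke execution authority and log how long enforcement took."""
        activated = self._clock()
        self._killed[agent_id] = activated
        latency = self._clock() - activated
        self.log.append({
            "action": "kill", "agent_id": agent_id, "actor": actor,
            "latency_s": latency,
            "within_deadline": latency <= KILL_DEADLINE_SECONDS,
        })

    def may_execute(self, agent_id: str) -> bool:
        """Protected execution is allowed only while authority is intact."""
        return agent_id not in self._killed

    def restore(self, agent_id: str, authorized_by: str) -> None:
        """Restoration requires a named authorizer and is also logged."""
        if not authorized_by:
            raise PermissionError("restoration must be authorized")
        self._killed.pop(agent_id, None)
        self.log.append({"action": "restore", "agent_id": agent_id,
                         "actor": authorized_by})
```

Because the agent must ask `may_execute` before each protected action, a compromised or unavailable runtime cannot re-grant itself authority.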

Evaluation Criteria

  • Kill capability functions outside the agent runtime
  • Activation prevents further protected execution within 30 seconds
  • Each containment tier exists and produces the documented effect
  • Safe-state behavior is tested and documented
  • Quarterly containment test records are available

Human authority remains the final governance layer for actions classified above the autonomous tier. The system classifies actions into risk tiers and maintains an escalation authority matrix defining roles, delegation limits, backup approvers, and timeout behavior.

Enforcement Points

  • Risk-tiered action classification: actions permitted autonomously, actions requiring review or gating, actions requiring explicit human approval before execution
  • New or unclassified high-impact action types default to the highest approval class until explicitly classified
  • Escalation authority matrix defines: which roles may approve which action categories, delegation limits, backup approvers, timeout behavior
  • For escalated actions: a durable approval record is created, and the reviewer is presented with the agent identity, declared purpose, requested action, parameters, policy basis for escalation, current drift state, and timeout deadline
  • Break-glass capability (if provided): approved in advance by a defined authority, scoped and time-limited, logged in tamper-evident form with approver identity, subject to mandatory post-use review
  • Human authority controls remain operable even if agent runtime or model infrastructure is unavailable

Failure Modes

  • Approval timeout: execute documented timeout behavior per escalation matrix
  • Human authority unavailable: agent enters safe-state for all actions above autonomous tier
  • Break-glass misuse: mandatory post-use review triggered, tamper-evident log preserved
  • No rationale supplied for approval: record must explicitly state that no rationale was supplied
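
Tiering and approval gating can be illustrated as follows. This is a simplified sketch: the action names, the `ApprovalClass` enum, and the helper functions are hypothetical; every unclassified action (not only high-impact ones) defaults to the highest class, which is stricter than the specification requires; and the review tier is treated as needing an approval flag.

```python
from enum import IntEnum


class ApprovalClass(IntEnum):
    AUTONOMOUS = 0      # permitted without human involvement
    REVIEW = 1          # requires review or gating
    HUMAN_APPROVAL = 2  # requires explicit human approval before execution


# Hypothetical classification table for illustration only.
ACTION_TIERS = {
    "read_docs": ApprovalClass.AUTONOMOUS,
    "send_email": ApprovalClass.REVIEW,
    "wire_transfer": ApprovalClass.HUMAN_APPROVAL,
}


def classify(action_type: str) -> ApprovalClass:
    """Unclassified action types fall into the highest approval class."""
    return ACTION_TIERS.get(action_type, max(ApprovalClass))


def may_execute(action_type: str, approved: bool,
                human_authority_available: bool) -> bool:
    """Escalated actions run only after approval; no approver means safe-state."""
    if classify(action_type) is ApprovalClass.AUTONOMOUS:
        return True
    if not human_authority_available:
        return False
    return approved


def approval_record(approver: str, rationale: str = "") -> dict:
    """Record the approval, stating explicitly when no rationale was given."""
    return {"approver": approver,
            "rationale": rationale or "NO RATIONALE SUPPLIED"}
```

Timeout handling is omitted here; per the escalation matrix, an expired approval request would follow the documented timeout behavior rather than this function's default.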

Evaluation Criteria

  • Action tiering exists and is applied deterministically
  • Unclassified high-impact actions default to the highest approval class
  • Escalated actions do not execute before approval
  • Approval records contain all required review context
  • Break-glass events (if enabled) meet scope, duration, logging, and review requirements