Technical Control Specifications

Detailed enforcement points, failure modes, and evaluation criteria for each of the six ACR control pillars. These specifications define the testable requirements for conformance assessment.

Every agent must have a unique identity, a declared purpose, and constraints tied to that purpose. The system must maintain an authoritative agent record verifiable through cryptographic proof.

Enforcement Points

Agent identity validation at every protected action via cryptographic proof (signed tokens, certificates, workload identity, or attestation)
Purpose scope check against authoritative manifest containing: agent_id, owner, purpose, risk tier, allowed tools, forbidden tools, approved data access scope, operational boundaries
Session binding validation with documented lifetime, binding rules, and revalidation conditions
Any change to purpose, capability scope, or a high-risk boundary requires an authorized control path, versioning, logging, and an audit trail
Revocation of agent identity or execution authority prevents all future action execution
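
The purpose scope check and revocation rule above can be sketched as a single authorization function. This is a minimal illustration, not a reference implementation: the snake_case manifest keys, the `Decision` type, and the `authorize` function are hypothetical names, and the sketch assumes cryptographic identity validation has already succeeded upstream.

```python
from typing import NamedTuple

# Hypothetical snake_case keys for the manifest fields listed above.
REQUIRED_FIELDS = (
    "agent_id", "owner", "purpose", "risk_tier", "allowed_tools",
    "forbidden_tools", "data_access_scope", "operational_boundaries",
)


class Decision(NamedTuple):
    allowed: bool
    reason: str  # logged alongside the decision


def authorize(manifest: dict, tool: str, revoked_ids: set) -> Decision:
    """Deny unless the manifest is complete, the identity is live, and the tool is in scope."""
    # Missing manifest fields: deny until the authoritative record is complete.
    missing = [f for f in REQUIRED_FIELDS if manifest.get(f) is None]
    if missing:
        return Decision(False, "incomplete manifest: " + ", ".join(missing))
    # Revoked identity: no protected action may execute.
    if manifest["agent_id"] in revoked_ids:
        return Decision(False, "revoked identity")
    # Purpose mismatch: forbidden tools, or tools absent from the allowlist.
    if tool in manifest["forbidden_tools"] or tool not in manifest["allowed_tools"]:
        return Decision(False, "out of scope: " + tool)
    return Decision(True, "in scope")
```

An empty `forbidden_tools` list counts as present; only an absent or `None` field makes the record incomplete.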

Failure Modes

Identity validation failure: deny execution, invalidate unverified request, log failure reason
Purpose mismatch: deny execution, log attempted out-of-scope action
Revoked identity: prevent all future protected action execution
Missing manifest fields: deny action until authoritative record is complete

Evaluation Criteria

Every sampled protected action is attributable to a single registered agent_id
Authoritative agent record contains every required field
Out-of-scope actions are denied
Changes to purpose or capability scope are versioned and auditable
Revoked identities cannot execute protected actions

The control plane enforces policy at Input, Execution, and Output Boundaries. Policy enforcement is machine-enforceable and runtime-executed. Documentation-only policies do not satisfy this requirement.

Enforcement Points

Input Boundary: schema validation, prompt/instruction sanitization, injection and jailbreak detection, length/format/content limits, source trust evaluation, rejection or transformation of disallowed inputs
Execution Boundary: tool allowlisting, destination restriction, parameter validation, spend and rate limits, data access authorization, action-tier classification, sequence or context-aware control, approval gating
Output Boundary: PII/PHI/sensitive-data redaction, output filtering, transformation or truncation, destination-aware release restrictions, blocking or escalation for disallowed content
Versioned policy definitions maintained with decision evidence: policy identifier, effective version, decision outcome, justification or triggered rule, approval or override context
Policy engine unavailability or indeterminate result: deny execution by default
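
The deny-by-default rule can be illustrated with a small wrapper around a policy engine call. This is a sketch under stated assumptions: `decide`, the `engine` callable, the `"ALLOW"`/`"DENY"` verdict strings, and the record keys are all hypothetical, chosen to mirror the decision evidence fields listed above.

```python
from typing import Any, Callable, Dict


def decide(engine: Callable[[dict], Any], request: dict,
           policy_id: str, policy_version: str) -> Dict[str, str]:
    """Evaluate a request and return a decision record; any failure maps to DENY."""
    record = {"policy_id": policy_id, "policy_version": policy_version}
    try:
        verdict = engine(request)
    except Exception as exc:
        # Engine unavailable: treat as DENY and record the failure condition.
        record.update(outcome="DENY", triggered_rule="engine_unavailable",
                      detail=type(exc).__name__)
        return record
    if verdict not in ("ALLOW", "DENY"):
        # Indeterminate result: treat as DENY and keep what the engine returned.
        record.update(outcome="DENY", triggered_rule="indeterminate_result",
                      detail=repr(verdict))
        return record
    record.update(outcome=verdict, triggered_rule="engine_verdict", detail="")
    return record
```

The caller proceeds only when `outcome` is `"ALLOW"`; every other path, including exceptions, yields a loggable DENY record.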

Failure Modes

Policy engine unavailable: treat as DENY, prevent all protected execution, record failure condition
Indeterminate result from policy evaluation: treat as DENY, log failure
Disallowed input: denied, modified, or escalated as required by policy
Disallowed action: denied at execution boundary, logged with policy basis
Disallowed output: blocked, redacted, or escalated per output boundary rules

Evaluation Criteria

Policy evaluation occurs before protected execution at all applicable boundaries
Each sampled boundary control produces a logged and auditable outcome
Disallowed inputs, actions, and outputs are denied, modified, or escalated as required
Policy versions can be traced from decision records to approved policy definitions
Policy engine failure results in prevented execution

The system detects deviations from intended role, expected patterns, or approved boundaries. A conformant drift-detection capability requires a defined behavioral baseline, detection signals, thresholded response criteria, and evidence of review or calibration.

Enforcement Points

Behavioral baseline established before unrestricted autonomous execution: minimum 30 days of representative activity, or documented temporary baseline approved by human authority for new agents
Continuous monitoring of 8 signal categories: tool usage patterns, data access patterns, action frequency and burst behavior, repeated policy denials, escalation pressure or repeated reformulations, novel action sequences, off-hours or boundary-inconsistent activity, suspicious input/output patterns related to manipulation attempts
Normalized drift score computed in range 0.0 to 1.0, or documented severity classification mapping deterministically to response tiers
Threshold crossings trigger documented response tier automatically without ad hoc operator interpretation
Response tiers: throttle, restrict, isolate, kill
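
A deterministic mapping from a normalized drift score to a response tier might look like the sketch below. The threshold values are illustrative assumptions only; the specification requires that thresholds be documented but does not fix their values. The `"none"` tier (no response) and the `response_tier` function are likewise hypothetical.

```python
from bisect import bisect_right

# Assumed example thresholds; a real deployment documents its own values.
THRESHOLDS = (0.25, 0.50, 0.75, 0.90)
TIERS = ("none", "throttle", "restrict", "isolate", "kill")


def response_tier(score: float) -> str:
    """Map a drift score in [0.0, 1.0] to a response tier with no operator judgment."""
    if not 0.0 <= score <= 1.0:
        # Also rejects NaN, since comparisons with NaN are always False.
        raise ValueError("drift score must be within 0.0 to 1.0")
    # bisect_right counts how many thresholds the score has reached or passed.
    return TIERS[bisect_right(THRESHOLDS, score)]
```

Because the mapping is a pure function of the score and a fixed table, the same score always produces the same tier, which is what makes threshold crossings auditable.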

Failure Modes

Baseline unavailable for new agent: documented temporary baseline required before unrestricted execution
Drift detection failure: restrict agent to safe-state pending resolution
Drift threshold exceeded: trigger documented response tier immediately
False positive identified: trigger calibration review, adjust thresholds where warranted, retain evidence of the review

Evaluation Criteria

A baseline exists for each protected agent or protected action class
Drift output is deterministic for a given input dataset and detector state
Threshold crossings trigger the documented response tier
Response tiers are logged
Retained drift records are sufficient to reconstruct why a response was triggered

The system logs all actions and all decisions. Observability records enable reconstruction of who acted, what was proposed, what controls applied, what disposition was produced, what approvals or overrides occurred, what downstream execution occurred, and when each event occurred.

Enforcement Points

Conformant telemetry records include at minimum: schema version, event identifier, timestamp, event type, agent identifier, purpose, request or correlation identifier, decision outcome, relevant policy results, execution metadata, approval metadata for escalation events
Correlation identifier or equivalent linkage data recorded across the full action chain
High-risk events stored in tamper-evident or append-only form
Retention: all ACR events retained for at least 90 days; high-risk agent events retained for at least 13 months
Audit-ready export without requiring manual reconstruction from unstructured logs
Privacy-preserving treatment applied to sensitive fields: each field category documented as plaintext, redacted, hashed, tokenized, or omitted
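
A conformance check for the minimum telemetry fields could be sketched as follows. The snake_case field names and the `"escalation"` event type value are assumptions made for illustration; the specification names the fields but does not prescribe a schema.

```python
# Hypothetical key names for the minimum telemetry fields listed above.
REQUIRED_FIELDS = (
    "schema_version", "event_id", "timestamp", "event_type", "agent_id",
    "purpose", "correlation_id", "decision_outcome", "policy_results",
    "execution_metadata",
)


def missing_fields(event: dict) -> list:
    """Return the required telemetry fields that are absent from an event."""
    required = list(REQUIRED_FIELDS)
    # Approval metadata is only required for escalation events.
    if event.get("event_type") == "escalation":
        required.append("approval_metadata")
    return [name for name in required if name not in event]
```

An empty result means the event carries every required field; a non-empty result can be used to reject the event at ingestion or to flag it during sampling.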

Failure Modes

Logging pipeline failure: prevent execution of high-risk protected actions
Primary log unavailable: record failure condition through alternate durable mechanism if one exists
Evidence preservation failure: transition affected protected action classes to documented safe-state
Retention violation: alert and remediate before minimum periods expire

Evaluation Criteria

Sampled events contain all required telemetry fields
Correlation identifiers connect the full action chain for sampled events
Retention controls enforce the required minimum retention periods
High-risk logs are append-only or tamper-evident
Exported evidence can reconstruct sampled incidents without reliance on unstructured logs
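
One common way to make a log tamper-evident is a hash chain, in which each entry commits to the hash of the previous one. The sketch below is one possible technique, not a method the specification mandates; `append_event`, `verify_chain`, and the entry layout are hypothetical. It also assumes the latest hash is anchored somewhere the writer cannot alter, since a chain alone does not stop wholesale rewriting.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder previous-hash for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    # Canonical JSON so the same event always hashes to the same value.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(chain: list, event: dict) -> None:
    """Append an event, linking it to the hash of the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"event": event, "prev_hash": prev_hash,
                  "hash": _digest(prev_hash, event)})


def verify_chain(chain: list) -> bool:
    """Recompute every link; any edited, removed, or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

An assessor sampling high-risk events could run `verify_chain` over the exported entries and compare the final hash against the externally anchored value.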

The system maintains containment capability independent of the agent runtime. Containment controls include an externally operable kill capability, isolation modes, documented safe-states for each production agent, and logging of all containment actions.

Enforcement Points

Kill capability: operates outside the agent runtime, invocable by authorized humans, invocable by automation only if identity and triggering conditions are documented, completes within 30 seconds from activation to enforcement, revokes or invalidates execution authority, prevents further protected execution until restoration is authorized
Graduated containment tiers: throttling, restriction of tools/permissions/destinations, isolation of agent or execution path, kill or full shutdown
Safe-state per production agent defines: what actions remain possible, what actions are disabled, how human escalation occurs, how evidence is preserved during containment
If a protected action class can change persistent state: document whether rollback is supported; if it is not, document compensating controls
Kill-switch and safe-state procedures tested at least quarterly with test evidence retained
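
The kill requirements above can be sketched as a small registry that lives outside the agent runtime and is consulted before every protected action. Everything here is a hypothetical illustration: the `KillSwitch` class, its method names, and the in-memory storage stand in for whatever external control plane a deployment actually uses. Timestamps are passed in as seconds so the 30-second requirement can be checked against test evidence.

```python
KILL_DEADLINE_SECONDS = 30.0  # activation-to-enforcement limit from the list above


class KillSwitch:
    """Holds execution authority state outside the agent runtime (in-memory sketch)."""

    def __init__(self) -> None:
        self._killed = {}  # agent_id -> activation time in seconds
        self.log = []      # every containment action is recorded

    def activate(self, agent_id: str, actor: str, activated_at: float) -> None:
        self._killed[agent_id] = activated_at
        self.log.append(("kill", agent_id, actor, activated_at))

    def restore(self, agent_id: str, actor: str, restored_at: float) -> None:
        # Restoration must be an explicit, authorized, logged action.
        self._killed.pop(agent_id, None)
        self.log.append(("restore", agent_id, actor, restored_at))

    def may_execute(self, agent_id: str) -> bool:
        return agent_id not in self._killed


def met_deadline(activated_at: float, enforced_at: float) -> bool:
    """True if enforcement was observed within 30 seconds of activation."""
    return 0.0 <= enforced_at - activated_at <= KILL_DEADLINE_SECONDS
```

In a quarterly test, `activated_at` would be the moment the kill was invoked and `enforced_at` the moment the first protected action was observed to be refused.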

Failure Modes

Agent runtime unavailable or compromised: containment remains independently operable
Kill activation: revokes execution authority, prevents all further protected execution
Rollback not supported: compensating controls must be documented and available
Safe-state undefined: production agent must not operate without documented safe-state

Evaluation Criteria

Kill capability functions outside the agent runtime
Activation prevents further protected execution within 30 seconds
Each containment tier exists and produces the documented effect
Safe-state behavior is tested and documented
Quarterly containment test records are available

Human authority remains the final governance layer for actions classified above the autonomous tier. The system classifies actions into risk tiers and maintains an escalation authority matrix defining roles, delegation limits, backup approvers, and timeout behavior.

Enforcement Points

Risk-tiered action classification: actions permitted autonomously, actions requiring review or gating, actions requiring explicit human approval before execution
New or unclassified high-impact action types default to the highest approval class until explicitly classified
Escalation authority matrix defines: which roles may approve which action categories, delegation limits, backup approvers, timeout behavior
For escalated actions: durable approval record created, reviewer presented with agent identity, declared purpose, requested action, parameters, policy basis for escalation, current drift state, and timeout deadline
Break-glass capability (if provided): scoped and time-limited, logged in tamper-evident form with approver identity, subject to mandatory post-use review, approved in advance by defined authority
Human authority controls remain operable even if agent runtime or model infrastructure is unavailable
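
Risk-tiered classification with a default to the highest approval class can be sketched as below. The tier names, the example actions, and the `classify` and `may_execute` functions are hypothetical. As a conservative simplification, this sketch sends every unclassified action to the highest class, whereas the requirement above applies to new or unclassified high-impact action types.

```python
from enum import IntEnum


class Tier(IntEnum):
    AUTONOMOUS = 0      # permitted without human involvement
    REVIEW = 1          # requires review or gating
    HUMAN_APPROVAL = 2  # requires explicit human approval before execution


# Hypothetical classification table; a real one is maintained under change control.
CLASSIFIED = {
    "read_ticket": Tier.AUTONOMOUS,
    "send_email": Tier.REVIEW,
    "wire_transfer": Tier.HUMAN_APPROVAL,
}


def classify(action: str) -> Tier:
    """Unknown actions fall into the highest approval class until explicitly classified."""
    return CLASSIFIED.get(action, Tier.HUMAN_APPROVAL)


def may_execute(action: str, approved: bool) -> bool:
    """Autonomous actions run directly; everything else waits for a recorded approval."""
    return classify(action) == Tier.AUTONOMOUS or approved
```

Because classification is a table lookup with a fixed default, the same action always lands in the same tier.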

Failure Modes

Approval timeout: execute documented timeout behavior per escalation matrix
Human authority unavailable: agent enters safe-state for all actions above autonomous tier
Break-glass misuse: mandatory post-use review triggered, tamper-evident log preserved
No rationale supplied for approval: record must explicitly state that no rationale was supplied
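
The timeout and missing-rationale rules can be illustrated with a function that resolves an escalated request into a durable record. This is a sketch only: `ApprovalRecord`, `resolve_escalation`, the `"deny"` default timeout behavior, and the marker text are assumptions, since the actual timeout behavior comes from the escalation authority matrix. It does not handle a decision that arrives after the deadline.

```python
from dataclasses import dataclass
from typing import Optional

NO_RATIONALE = "NO RATIONALE SUPPLIED"  # explicit marker, never an empty field


@dataclass(frozen=True)
class ApprovalRecord:
    request_id: str
    approver: Optional[str]
    outcome: str
    rationale: str


def resolve_escalation(request_id: str, now: float, deadline: float,
                       decision: Optional[str] = None,
                       approver: Optional[str] = None,
                       rationale: Optional[str] = None,
                       timeout_behavior: str = "deny") -> Optional[ApprovalRecord]:
    """Return a record once the request is decided or timed out, else None (still pending)."""
    if decision is None:
        if now < deadline:
            return None  # pending: the action must not execute yet
        # Deadline reached with no decision: apply the documented timeout behavior.
        return ApprovalRecord(request_id, None, timeout_behavior, "approval timeout")
    text = rationale.strip() if rationale else ""
    return ApprovalRecord(request_id, approver, decision, text or NO_RATIONALE)
```

A `None` result means the escalated action stays blocked; only a returned record with an approving outcome releases it.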

Evaluation Criteria

Action tiering exists and is applied deterministically
Unclassified high-impact actions default to the highest approval class
Escalated actions do not execute before approval
Approval records contain all required review context
Break-glass events (if enabled) meet scope, duration, logging, and review requirements
