---
title: "Implementing Mission Architecture End to End"
date: "2026-04-11T10:30:00-07:00"
lastmod: "2026-04-11T10:30:00-07:00"
description: "A complete implementation spec for the Mission Authority Service model: compiler pipeline, Cedar policy reference, approval model, admin console, host and MCP enforcement, observation mode rollout, and test suite — ready to hand off to an implementation team."
summary: "The full implementation spec for Mission architecture: how to compile a bounded authority record from user intent, project it into Cedar policies and audience-specific OAuth tokens, enforce it at the host and MCP tool boundary, gate irreversible actions at the commit boundary, and keep the governance model operational through template ownership, revocation SLAs, audit integrity tiers, and an operator console built around decisions not raw artifacts. Includes V1 product contract, sequenced build plan, configuration management, and a complete test appendix."
tags:
  - "Agentic Identity"
  - "Delegated Authority"
  - "IAM"
  - "OAuth"
  - "MCP"
  - "Security Architecture"
---


The architectural argument in [Where Mission Lives in the IAM Stack](/notes/where-mission-lives-in-the-iam-stack/) is that Mission needs a durable state owner. The obvious next question is: what would an end-to-end implementation actually look like?

The rest of this note makes that concrete.

> You can build a credible Mission architecture today with OAuth and MCP, but only if Mission is treated as the durable authority record and everything else is treated as a projection or enforcement surface.

### What this design gets right that others miss

Before diving in, here are the six non-obvious choices worth reading carefully:

1. **The compiler is the trust boundary, not the model.** The shaping model produces a proposal. The compiler turns that proposal into enforceable state using trusted inputs: the resource catalog, policy templates, and deterministic rules. The model cannot grant itself authority by producing a permissive proposal. This is not how most agent systems work today.

2. **`constraints_hash` is a live version handle, not an audit tag.** Every enforcement point checks the hash on every request. A stale hash blocks execution. A new hash triggers a fresh policy bundle pull. This is what makes revocation and amendment effective at runtime rather than just in the log.

3. **The host queries authority once at authority transitions, not per thought.** The capability snapshot model means the host fetches current authority state at session start and when Mission state changes — not on every tool consideration. This keeps MAS off the hot path and makes the system fast enough to use in practice.

4. **The commit boundary is owned by the downstream system, not the agent.** Irreversible side effects require revalidation at the moment they become real, by a component the agent cannot bypass. This is the only containment boundary that survives prompt injection.

5. **Cedar policies are per template class, not per Mission instance.** Entity snapshots are per Mission; policies are shared. This means O(templates) policy sets, not O(missions). Amendments change the entity snapshot, not the policy. The system stays evaluable at scale.

6. **Approval is either pre-approved patterns or explicit inline user pause.** There is no background wait on an unavailable approver. Most work auto-approves. When a human needs to be in the loop, they are the user in the current session. This is the model that actually works in agentic deployments.

## How to read this note

This note is doing several jobs at once. Start with the rows that match your goal:

| If you want... | Read these sections first |
|---|---|
| the core architecture | `The Concrete Architecture`, `Main Path`, `How Intent Gets Compiled`, `Mission Approval Paths`, `How Cedar Fits` |
| runtime execution mechanics | `Runtime Enforcement and Token Projection`, `Agent Identity and the Subject Token`, `How to Scope OAuth Tokens to Mission`, `How the Components Interact`, `Containment Is Not Optional` |
| host and tool integration | `Claude Code as the Agent Host`, `What the MCP Server Must Do`, `OpenClaw as the Agent Host (Illustrative)` |
| operator and admin operations | `Configuration Management`, `What Still Needs Real-World Tuning`, `Operator Admin Dashboard Spec`, `Operator console model`, `MAS Operator Runbook for Common Incidents`, `Pre-Enforcement and First Deployment Runbooks` |
| advanced deployment details | `Cross-Domain Tool Access with ID-JAG (Advanced Profile)`, `Delegation and Derived Sub-Missions (Advanced Profile)`, `Lifecycle and Runtime Consequences`, `Credential Lifecycle and PAM`, `MAS Availability and Degraded Mode`, `Multi-Tenant MAS Isolation` |
| worked examples and validation | `Applying the research`, `Worked Example`, `Cross-Domain Worked Example (Advanced Profile)`, `Test Appendix` |

The simplified core path is:

1. shape the Mission
2. compile and approve it
3. project it into policy and tokens
4. enforce it at host and MCP boundaries
5. refresh and revalidate it when lifecycle, approval, or risk changes

## V1 Product Contract

This is the single authoritative statement of what v1 is. An implementation team can build v1 by satisfying everything in this section. Supporting detail is in the sections referenced.

**What v1 is:**

| Dimension | V1 value |
|---|---|
| Trust domain | one enterprise trust domain — no cross-domain federation |
| Host | one host: Claude Code with hooks wired per this spec |
| OAuth AS mode | self-contained JWTs with `mission_id` and `constraints_hash` claims; freshness validated via MAS live check at commit boundary |
| Policy language | Cedar — no custom adapters in v1 |
| MCP surface | one MCP server family for the initial template pack |
| Approval model | `auto_with_release_gate` as the default; inline user step-up for gated actions |
| Commit-boundary owner | downstream system of record for each high-risk tool family |
| Template pack | `board_packet_preparation`, `support_ticket_triage`, `draft_and_review` |
| Sub-agents | not in v1 |
| Async enterprise approval | not in v1 |

**V1 is done when:**

1. The compiler produces a deterministic enforcement bundle with a stable `constraints_hash` for any input drawn from the v1 template pack
2. Missions move through the full lifecycle (`draft → approved → active → completed/revoked`) with approval evidence persisted
3. No external tool executes based on `tools/list` alone — `tools/call` is Mission-aware and enforces Cedar evaluation
4. Stale `constraints_hash` blocks execution at the host and at the MCP server
5. Token issuance is audience-specific and reproducible from Mission state
6. Gated actions cannot obtain a commit-boundary pass without a current approval object
7. Commit-boundary actions are non-bypassable — the downstream system of record owns the final check
8. Revocation changes runtime behavior on the next checkpoint (within entity snapshot TTL, not immediately)
9. Signal ingestion is live — anomaly signals flow from host to MAS and affect Mission state
10. The admin console surfaces: active Missions, pending approvals, recent denials, template drift, and emergency controls
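
Condition 1 is only testable if the hash derivation is deterministic. A minimal sketch, assuming the enforcement bundle canonicalizes to JSON (the canonicalization choice here is illustrative, not the normative bundle encoding):

```python
import hashlib
import json

def constraints_hash(bundle: dict) -> str:
    # Canonical JSON: sorted keys, fixed separators, UTF-8. Any semantically
    # identical bundle yields byte-identical input, hence the same hash.
    canonical = json.dumps(bundle, sort_keys=True, separators=(",", ":"))
    return "sha256-" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

The same bundle compiled twice, in any field order, must produce the same `constraints_hash`; that property is what lets every enforcement point compare hashes instead of diffing policy documents.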

**What is explicitly out of scope for v1:**

- cross-domain federation (ID-JAG, domain-B token exchange)
- sub-agent delegation (child Missions, narrowing proofs)
- asynchronous enterprise approval workflows
- advanced profiles (multi-agent orchestration, broad delegation)
- general-purpose temporary elevation
- custom policy adapters (Cedar only in v1)

If a feature does not appear in the v1 include list above, it is out of scope. Resist including it "just in case" — advanced profile complexity added early is the main reason v1 deployments stall.

**Supporting sections:**

| Topic | Where to find it |
|---|---|
| Build sequence | [V1 sequenced build plan](#v1-sequenced-build-plan) |
| Component list | [The Concrete Architecture](#the-concrete-architecture) |
| Template definitions | [Template Building](#template-building) and [Template governance ownership](#template-governance-ownership-and-cadence) |
| Compiler pipeline | [How Intent Gets Compiled](#how-intent-gets-compiled) |
| Host integration | [Claude Code as the Agent Host](#claude-code-as-the-agent-host) |
| Approval model | [`auto` vs `auto_with_release_gate`](#auto-vs-auto_with_release_gate-side-by-side-comparison) |
| Cedar policy | [Cedar Policy Reference](#cedar-policy-reference) |
| Configuration defaults | [Configuration Management](#configuration-management) |
| Admin console | [Admin dashboard and operator runbook](#admin-dashboard-and-operator-runbook) |
| Test suite | [Test Appendix](#test-appendix) |

## Build target

This note should be readable as a build handoff. A coding agent implementing this design should be able to identify:

- required components
- required artifacts
- required state transitions
- required enforcement points
- required failure behavior

Anything not tied to one of those should be treated as explanatory context, not as the implementation contract.

The implementation below uses:

- a **Mission Authority Service (MAS)** as the state owner
- a **Mission shaping** step that turns user intent into a bounded authority record
- an OAuth authorization server to mint **tool-facing tokens** scoped to Mission
- **MCP servers** as the tool transport and tool enforcement surface
- explicit **containment** at both the tool boundary and the commit boundary
- a **Cedar** policy layer to evaluate principal, action, resource, and context against the Mission Authority Model

I also looked at the current [Model Context Protocol authorization spec](https://modelcontextprotocol.io/specification/2025-03-26/basic/authorization), the MCP [tools spec](https://modelcontextprotocol.io/specification/2025-03-26/server/tools), [OAuth 2.0 Token Exchange](https://datatracker.ietf.org/doc/html/rfc8693), [OAuth 2.0 Rich Authorization Requests](https://datatracker.ietf.org/doc/html/rfc9396), the [Cedar policy language](https://docs.cedarpolicy.com/), and Anthropic's [Claude Code hooks](https://docs.anthropic.com/en/docs/claude-code/hooks).

## The Concrete Architecture

Use seven components:

| Component | Required responsibility | Minimum output |
|---|---|---|
| Agent host | own the execution loop and enforce host-side policy before tool execution | allowed / ask / deny decision, runtime signals |
| Mission shaper | turn user request into structured proposal without broadening intent | Mission proposal JSON |
| Mission Authority Service | persist Mission lifecycle and compiled authority state | governance record, approval state, `constraints_hash` |
| Policy engine | evaluate policy during issuance and execution | permit / forbid / residual capability surface |
| OAuth authorization server | mint audience-specific projections of Mission authority | MCP transport tokens, direct API tokens |
| MCP gateway / client | attach projected authority to tool transports | authenticated MCP/API requests |
| MCP servers / tool services | enforce tool visibility and execution constraints | filtered `tools/list`, enforced `tools/call`, commit-boundary checks |

That is the minimum shape. Do not collapse it into prompt text and local orchestration state.

### Core ERD

This ERD shows the durable records in the simplified core profile.

```mermaid
erDiagram
    MISSION ||--|| GOVERNANCE_RECORD : "materializes"
    MISSION ||--o{ APPROVAL_OBJECT : "may require"
    MISSION ||--o{ TOKEN_PROJECTION : "projects into"
    MISSION ||--o{ RUNTIME_SIGNAL : "receives"
    MISSION ||--|| POLICY_BUNDLE : "compiles to"
    MISSION }o--|| TEMPLATE : "matches"
    MISSION }o--|| RESOURCE_CATALOG_VERSION : "compiled against"
    GOVERNANCE_RECORD ||--|| POLICY_BUNDLE : "versioned by constraints_hash"
    POLICY_BUNDLE ||--o{ TOKEN_PROJECTION : "constrains"

    MISSION {
      string mission_id PK
      string status
      string purpose_class
      string constraints_hash
      string template_version
      string catalog_version
    }
    GOVERNANCE_RECORD {
      string mission_id PK
      string user_id
      string agent_id
      string approval_mode
      datetime created_at
      datetime expires_at
    }
    APPROVAL_OBJECT {
      string approval_id PK
      string mission_id FK
      string approval_type
      string approved_by
      string constraints_hash
      datetime expires_at
    }
    TOKEN_PROJECTION {
      string token_projection_id PK
      string mission_id FK
      string audience
      string constraints_hash
      string token_type
    }
    RUNTIME_SIGNAL {
      string signal_id PK
      string mission_id FK
      string event_type
      string severity
      datetime emitted_at
    }
    POLICY_BUNDLE {
      string constraints_hash PK
      string mission_id FK
      string template_class
      string bundle_version
    }
    TEMPLATE {
      string template_id PK
      string purpose_class
      string version
    }
    RESOURCE_CATALOG_VERSION {
      string catalog_version PK
      string tenant_id
      datetime published_at
    }
```

### Runtime Artifact ERD

This ERD shows the narrower runtime objects that enforcement points actually consume.

```mermaid
erDiagram
    SUBJECT_TOKEN ||--o{ AUDIENCE_TOKEN : "exchanged into"
    GOVERNANCE_RECORD ||--o{ CAPABILITY_SNAPSHOT : "projects into"
    POLICY_BUNDLE ||--o{ CAPABILITY_SNAPSHOT : "shapes"
    CAPABILITY_SNAPSHOT ||--o{ AUDIENCE_TOKEN : "drives issuance"
    APPROVAL_OBJECT ||--o{ COMMIT_INTENT : "authorizes"
    AUDIENCE_TOKEN ||--o{ COMMIT_INTENT : "presents"

    SUBJECT_TOKEN {
      string token_id PK
      string subject
      string issuer
      datetime expires_at
    }
    AUDIENCE_TOKEN {
      string token_id PK
      string audience
      string mission_id
      string constraints_hash
      datetime expires_at
    }
    CAPABILITY_SNAPSHOT {
      string mission_id PK
      string constraints_hash
      string planning_state
      datetime refresh_after
    }
    COMMIT_INTENT {
      string commit_intent_id PK
      string mission_id
      string resource_owner
      string status
    }
```

### Required implementation artifacts

The system is not complete until it can produce and consume these artifacts:

| Artifact | Produced by | Consumed by |
|---|---|---|
| Mission proposal | Mission shaper | MAS compiler |
| governance record | MAS | operators, host, audit |
| compiled policy bundle | compiler | Cedar evaluator, AS, MCP server, host |
| `constraints_hash` | compiler | host, AS, MCP server, audit |
| approval object | MAS / approver | host, MCP commit boundary, audit |
| token projection | AS | MCP server, downstream API |
| runtime signal | host / MCP / MAS | MAS, caches, PAM, audit |

### Artifact interface contracts

These artifacts need stable wire shapes. A coding agent should not invent them ad hoc.

#### Mission proposal schema

Produced by the Mission shaper. Consumed by the compiler.

Required fields:

| Field | Type | Meaning |
|---|---|---|
| `proposal_id` | string | stable identifier for the shaped proposal |
| `summary` | string | short human-readable statement of requested work |
| `purpose` | string | one-sentence purpose statement |
| `requested_resource_classes` | string[] | high-level resource classes implied by the request |
| `requested_actions` | string[] | high-level action classes implied by the request |
| `requested_tools` | string[] | explicit or inferred tools |
| `stage_constraints` | object[] | requested or implied approval gates |
| `time_bounds` | object | requested or inferred time window |
| `delegation_bounds` | object | optional advanced-profile child-agent behavior |
| `explicit_exclusions` | string[] | out-of-scope items |
| `open_questions` | string[] | ambiguities that block approval |
| `confidence` | string | `low`, `medium`, or `high` |

Minimum payload:

```json
{
  "proposal_id": "prop_01JR9S2N0P",
  "summary": "Prepare the Q2 board packet comparing actuals to plan",
  "purpose": "Prepare an internal board packet for Q2 financial review",
  "requested_resource_classes": ["finance.read", "documents.read", "documents.write"],
  "requested_actions": ["read", "summarize", "draft"],
  "requested_tools": ["erp.read_financials", "docs.read", "docs.write"],
  "stage_constraints": [
    {
      "name": "release_gate",
      "reason": "user asked to review before release"
    }
  ],
  "time_bounds": {
    "requested_ttl_seconds": 28800
  },
  "delegation_bounds": {
    "subagents_allowed": false,
    "requested_max_depth": 0
  },
  "explicit_exclusions": ["publish_external", "pay"],
  "open_questions": [],
  "confidence": "high"
}
```

#### Review packet schema

Produced by the compiler. Consumed by human approvers and auto-approval logic.

Required fields:

| Field | Type | Meaning |
|---|---|---|
| `review_id` | string | review object identifier |
| `mission_id` | string | target Mission |
| `purpose_class` | string | selected normalized purpose |
| `summary` | string | human-readable description |
| `allowed_tools` | string[] | tools in the candidate envelope |
| `gated_tools` | string[] | tools requiring explicit approval |
| `denied_tools` | string[] | tools excluded from the envelope |
| `trust_domains` | string[] | domains involved |
| `risk_level` | string | low/medium/high |
| `risk_factors` | object[] | scored reasons for approval mode |
| `recommended_path` | string | `auto`, `auto_with_release_gate`, `human_step_up`, `clarification_required`, `denied` — `auto_with_release_gate` means the Mission can activate immediately but one or more stage gates must be satisfied before the gated actions become real |

Minimum payload:

```json
{
  "review_id": "rev_01JR9S9D1H",
  "mission_id": "mis_01JR9S4YDY6QF5Q9Q54M0YB4V1",
  "purpose_class": "board_packet_preparation",
  "summary": "Prepare Q2 board packet comparing actuals to plan",
  "allowed_tools": ["erp.read_financials", "docs.read", "docs.write"],
  "gated_tools": ["docs.publish"],
  "denied_tools": ["email.send_external"],
  "trust_domains": ["enterprise"],
  "risk_level": "medium",
  "risk_factors": [
    {"signal": "destructive_action", "score": 30, "value": "docs.publish"}
  ],
  "recommended_path": "auto_with_release_gate"
}
```
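
The `recommended_path` selection can be sketched as a deterministic function over the review packet fields. The threshold values below are illustrative assumptions, not part of the spec, and the `denied` path is omitted for brevity:

```python
def recommend_path(risk_factors: list[dict], gated_tools: list[str],
                   open_questions: list[str]) -> str:
    # Thresholds are illustrative; real deployments tune them per template class.
    if open_questions:
        return "clarification_required"
    total = sum(f["score"] for f in risk_factors)
    if total >= 70:
        return "human_step_up"
    if gated_tools:
        return "auto_with_release_gate"
    return "auto"
```

Applied to the payload above (one factor scoring 30, `docs.publish` gated, no open questions), this yields `auto_with_release_gate`: the Mission activates immediately, but the gated tool stays behind its stage gate.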

#### Governance record schema

Produced by the MAS. Consumed by operators, host, policy services, and audit.

Required fields:

| Field | Type | Meaning |
|---|---|---|
| `mission_id` | string | Mission identifier |
| `status` | string | lifecycle state |
| `approval_mode` | string | approval path used |
| `principal` | object | user and agent identities |
| `purpose_class` | string | normalized purpose |
| `approved_tools` | string[] | current allowed tools |
| `actions` | string[] | current allowed actions |
| `allowed_domains` | string[] | current trust-domain envelope |
| `stage_constraints` | object[] | remaining gates |
| `delegation_bounds` | object | child-agent limit |
| `constraints_hash` | string | current enforceable version |

Minimum payload:

```json
{
  "mission_id": "mis_01JR9S4YDY6QF5Q9Q54M0YB4V1",
  "status": "active",
  "approval_mode": "auto",
  "principal": {
    "user_id": "user_123",
    "agent_id": "agent_research_assistant"
  },
  "purpose_class": "board_packet_preparation",
  "approved_tools": ["mcp__finance__erp.read_financials", "mcp__docs__docs.read", "mcp__docs__docs.write"],
  "actions": ["read", "summarize", "draft"],
  "allowed_domains": ["enterprise"],
  "stage_constraints": [
    {
      "name": "controller_approval",
      "applies_to": ["mcp__docs__docs.publish"]
    }
  ],
  "delegation_bounds": {
    "subagents_allowed": false,
    "max_depth": 0
  },
  "constraints_hash": "sha256-abc123"
}
```

#### Approval object schema

Produced by MAS or human approval workflow. Consumed by host, MCP commit boundary, and audit.

Required fields:

| Field | Type | Meaning |
|---|---|---|
| `approval_id` | string | approval artifact identifier |
| `mission_id` | string | Mission being approved |
| `approval_type` | string | e.g. `controller_approval`, `finance_approval` |
| `approved_by` | string | human or policy identity |
| `approved_scope` | object | exact tools and actions covered by this approval |
| `status` | string | `granted`, `denied`, `expired` |
| `issued_at` | string | approval time |
| `expires_at` | string | approval expiry |
| `constraints_hash` | string | Mission version this approval applies to |
| `reusable_within_mission` | boolean | whether the approval may be used more than once within the same Mission version; defaults to `false` for irreversible actions |

Minimum payload:

```json
{
  "approval_id": "appr_01JR9SQP4W",
  "mission_id": "mis_01JR9S4YDY6QF5Q9Q54M0YB4V1",
  "approval_type": "controller_approval",
  "approved_by": "user:controller_42",
  "approved_scope": {
    "tools": ["mcp__docs__docs.publish"],
    "actions": ["publish_external"]
  },
  "status": "granted",
  "issued_at": "2026-04-11T18:10:00Z",
  "expires_at": "2026-04-11T19:10:00Z",
  "constraints_hash": "sha256-abc123",
  "reusable_within_mission": false
}
```
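
The commit-boundary validity check over this object can be sketched as follows. Field names match the payload above; single-use tracking for `reusable_within_mission` is omitted:

```python
from datetime import datetime, timezone

def approval_valid(approval: dict, mission_hash: str, tool: str,
                   now: datetime) -> bool:
    # Every check must pass; any single failure means the commit boundary denies.
    expires = datetime.fromisoformat(approval["expires_at"].replace("Z", "+00:00"))
    return (
        approval["status"] == "granted"
        and approval["constraints_hash"] == mission_hash
        and tool in approval["approved_scope"]["tools"]
        and now < expires
    )
```

The `constraints_hash` comparison is the important line: an approval granted against one Mission version cannot authorize a commit under a later version, even if the tool scope looks identical.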

#### Runtime signal schema

Produced by host, MCP server, or MAS. Consumed by MAS, caches, PAM, and audit.

Required fields:

| Field | Type | Meaning |
|---|---|---|
| `signal_id` | string | signal identifier |
| `mission_id` | string | Mission related to the event |
| `source` | string | emitter such as `host`, `mcp_server`, `mas` |
| `event_type` | string | `tool.denied`, `approval.granted`, `mission.revoked`, etc. |
| `tool` | string | affected tool if any |
| `resource_id` | string | affected resource if any |
| `risk_level` | string | current severity |
| `timestamp` | string | event time |
| `correlation_id` | string | request/session correlation |

Minimum payload:

```json
{
  "signal_id": "sig_01JR9T3Y2M",
  "mission_id": "mis_01JR9S4YDY6QF5Q9Q54M0YB4V1",
  "source": "mcp_server",
  "event_type": "commit.denied",
  "tool": "mcp__docs__docs.publish",
  "resource_id": "mcp__docs__docs.publish",
  "risk_level": "high",
  "timestamp": "2026-04-11T18:12:00Z",
  "correlation_id": "req_01JR9T3W7G"
}
```

#### Token projection metadata schema

Produced by the AS alongside token issuance logic. Consumed by MCP servers, downstream APIs, and audit.

Required fields:

| Field | Type | Meaning |
|---|---|---|
| `projection_id` | string | issuance record identifier |
| `mission_id` | string | Mission used for issuance |
| `constraints_hash` | string | Mission version projected into the token |
| `audience` | string | MCP server or API audience |
| `allowed_tools` | string[] | tool projection for the audience |
| `allowed_actions` | string[] | action projection for the audience |
| `allowed_domains` | string[] | domain projection for the audience |
| `issued_at` | string | issuance time |
| `expires_at` | string | projection expiry |

Minimum payload:

```json
{
  "projection_id": "proj_01JR9T5N5S",
  "mission_id": "mis_01JR9S4YDY6QF5Q9Q54M0YB4V1",
  "constraints_hash": "sha256-abc123",
  "audience": "https://mcp.finance.internal",
  "allowed_tools": ["mcp__finance__erp.read_financials"],
  "allowed_actions": ["read"],
  "allowed_domains": ["enterprise"],
  "issued_at": "2026-04-11T18:15:00Z",
  "expires_at": "2026-04-11T19:15:00Z"
}
```
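
Deriving the per-audience tool projection from a governance record is, in the simplest reading, a filter over canonical tool IDs. A sketch, assuming the `mcp__<server>__<tool>` naming convention used throughout this spec:

```python
def project_tools(approved_tools: list[str], server: str) -> list[str]:
    # Keep only the Mission's tools that belong to this audience's MCP server.
    prefix = f"mcp__{server}__"
    return [t for t in approved_tools if t.startswith(prefix)]
```

Run against the governance record payload above, the `finance` audience receives only `mcp__finance__erp.read_financials`, matching the projection payload.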

These schemas are the minimum contract. They can be extended, but the build should not proceed without stable versions of them.

#### Tool name forms

Two forms appear in these schemas and should not be mixed within the same field:

- **Short form** (`erp.read_financials`, `docs.write`): used in Mission proposals and review packets, where human readability matters and the canonical resolution has not yet happened.
- **Canonical resource ID** (`mcp__finance__erp.read_financials`, `mcp__docs__docs.write`): used in governance records, compiled tokens, Cedar entities, and any field consumed by enforcement systems. Format is `mcp__<server>__<tool>` for MCP tools.

The resource catalog is the only translation surface between these two forms. Enforcement systems must always use the canonical resource ID. Any field in a schema above that lists tools for enforcement purposes uses the canonical form. Any field produced by or consumed by the shaper or review UI may use the short form.
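
A minimal sketch of that translation surface, assuming the catalog materializes as a short-form-to-canonical map (the map shape is an assumption; the fail-closed behavior on unknown tools mirrors the compiler's required behavior):

```python
def to_canonical(short_name: str, catalog: dict[str, str]) -> str:
    # Fail closed: an unknown tool is an error, never a pass-through.
    if short_name not in catalog:
        raise ValueError(f"unknown tool: {short_name}")
    return catalog[short_name]
```

Enforcement code should never string-munge a short form into a canonical ID; only a catalog lookup preserves the guarantee that every enforceable name was resolved against a published catalog version.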

### Service API contracts

Stable payloads are not enough. The system also needs stable service boundaries.

Use four API groups:

1. **MAS Mission APIs**
2. **approval workflow APIs**
3. **authorization server APIs**
4. **signal ingestion APIs**

The endpoints below are the minimum useful surface.

#### MAS Mission APIs

| Method | Path | Purpose |
|---|---|---|
| `POST` | `/missions` | create Mission from shaped proposal and request context |
| `POST` | `/missions?mode=preview` | dry-run compile — returns review packet without persisting state |
| `GET` | `/missions/{mission_id}` | fetch governance record |
| `GET` | `/missions` | list Missions for a principal (required for session resumption) |
| `POST` | `/missions/{mission_id}/capability-snapshot` | return current capability snapshot |
| `POST` | `/missions/{mission_id}/derive` | request derived sub-Mission |
| `POST` | `/missions/{mission_id}/amend` | request Mission amendment (narrowing or broadening) |
| `POST` | `/missions/{mission_id}/clarify` | submit clarification responses to resolve `pending_clarification` |
| `POST` | `/missions/{mission_id}/suspend` | suspend Mission (operator-initiated; requires operator lift) |
| `POST` | `/missions/{mission_id}/pause` | pause Mission (user-initiated; user can resume) |
| `POST` | `/missions/{mission_id}/resume` | resume a user-paused Mission |
| `POST` | `/missions/{mission_id}/revoke` | revoke Mission |
| `POST` | `/missions/{mission_id}/complete` | mark Mission completed and release authority |
| `POST` | `/missions/{mission_id}/clone` | create new Mission pre-populated from an existing Mission's approved scope |

**Tenant admin operations** (require operator or admin role; `user_id` from token must have tenant-admin entitlement):

| Method | Path | Purpose |
|---|---|---|
| `GET` | `/tenants/{tenant_id}/missions` | list all Missions in tenant (admin view with all statuses) |
| `POST` | `/tenants/{tenant_id}/missions/revoke-all` | revoke all active Missions for a user (offboarding, incident) |
| `POST` | `/tenants/{tenant_id}/emergency-halt` | halt all agent activity in tenant immediately |
| `POST` | `/tenants/{tenant_id}/emergency-readonly` | drop the tenant to read-only mode |
| `GET` | `/tenants/{tenant_id}/missions/summary` | aggregated view: count by purpose class, risk level, approval mode |

**Mission quota and rate limits:**

MAS enforces quotas at Mission creation time. Requests that exceed quota receive `429 Too Many Requests` with a `retry_after` field.

| Quota | Default | Scope |
|---|---|---|
| Active Missions per user | 5 | per user, per tenant |
| Mission creations per hour | 20 | per user, per tenant |
| Active Missions per tenant | configurable | per tenant (operator-set) |
| `pending_clarification` Missions | 3 | per user; blocks new submissions until resolved or expired |

Quota limits are configurable per tenant via operator API. The default active-Mission limit of 5 per user is intentionally low — most users doing governed work have one or two active Missions at a time. A user hitting this limit is likely accumulating stale Missions that should be completed or expired.
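
The admission check at Mission creation can be sketched as below. Only the two default limits and the `429` status come from the table above; the response body shape and the `retry_after` value are assumptions:

```python
from dataclasses import dataclass

@dataclass
class Quota:
    active_per_user: int = 5        # spec default
    creations_per_hour: int = 20    # spec default

def admit_mission(active: int, created_last_hour: int,
                  q: Quota = Quota()) -> tuple[int, dict]:
    # Returns (http_status, body). A real MAS would derive retry_after
    # from the quota window rather than hard-coding an hour.
    if active >= q.active_per_user or created_last_hour >= q.creations_per_hour:
        return 429, {"error": "quota_exceeded", "retry_after": 3600}
    return 201, {}
```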

Minimum `POST /missions` request:

```json
{
  "proposal": {
    "proposal_id": "prop_01JR9S2N0P",
    "summary": "Prepare the Q2 board packet comparing actuals to plan"
  },
  "request_context": {
    "user_id": "user_123",
    "agent_id": "agent_research_assistant",
    "session_id": "sess_123",
    "tenant_id": "acme",
    "entry_channel": "claude_code"
  }
}
```

Minimum `POST /missions` response:

```json
{
  "mission_id": "mis_01JR9S4YDY6QF5Q9Q54M0YB4V1",
  "status": "active",
  "approval_mode": "auto",
  "constraints_hash": "sha256-abc123"
}
```

Minimum `GET /missions/{mission_id}` response:

```json
{
  "mission_id": "mis_01JR9S4YDY6QF5Q9Q54M0YB4V1",
  "status": "active",
  "approval_mode": "auto",
  "principal": {
    "user_id": "user_123",
    "agent_id": "agent_research_assistant"
  },
  "purpose_class": "board_packet_preparation",
  "approved_tools": ["mcp__finance__erp.read_financials", "mcp__docs__docs.read", "mcp__docs__docs.write"],
  "actions": ["read", "summarize", "draft"],
  "allowed_domains": ["enterprise"],
  "stage_constraints": [
    {
      "name": "controller_approval",
      "applies_to": ["mcp__docs__docs.publish"]
    }
  ],
  "delegation_bounds": {
    "subagents_allowed": false,
    "max_depth": 0
  },
  "constraints_hash": "sha256-abc123"
}
```

Minimum `POST /missions/{mission_id}/derive` request:

```json
{
  "parent_mission_id": "mis_parent_01",
  "requested_tools": ["mcp__finance__erp.read_financials"],
  "requested_actions": ["read", "summarize"],
  "requested_domains": ["enterprise"],
  "requested_expiry": "2026-04-11T20:00:00Z",
  "requested_max_depth": 0
}
```

Minimum `POST /missions/{mission_id}/derive` response:

```json
{
  "mission_id": "mis_child_01",
  "parent_mission_id": "mis_parent_01",
  "status": "active",
  "constraints_hash": "sha256-child-123",
  "proof_id": "proof_01JR9T1R7W"
}
```

Minimum `POST /missions/{mission_id}/amend` request:

```json
{
  "amendment_type": "narrowing",
  "reason": "governance operator removed external send scope",
  "delta": {
    "remove_tools": ["mcp__email__email.send_external"],
    "remove_actions": [],
    "add_tools": [],
    "add_actions": [],
    "modify_stage_constraints": [],
    "modify_time_bounds": {}
  },
  "requested_by": "operator:security_ops_42"
}
```

For broadening amendments, `amendment_type` is `"broadening"` and the delta uses `add_tools` and `add_actions`. Broadening must go through the approval path.

Minimum `POST /missions/{mission_id}/amend` response:

```json
{
  "mission_id": "mis_01JR9S4YDY6QF5Q9Q54M0YB4V1",
  "amendment_id": "amend_01JR9V8K2P",
  "amendment_type": "narrowing",
  "status": "active",
  "constraints_hash": "sha256-def456",
  "prior_constraints_hash": "sha256-abc123"
}
```

For broadening amendments that require human approval, the response status will be `pending_approval` and the prior `constraints_hash` remains active until approval is granted.

Amendment compiler pipeline:

1. **Narrowing**: apply the delta directly; run only steps 8-10 of the compiler (enforcement bundle + `constraints_hash` + persist). No re-shaping, no re-classification, no approval.
2. **Broadening**: run steps 4-10 of the compiler against the delta only (resolve the added resources, score the delta, build a delta review packet, emit for approval). The existing `constraints_hash` remains active for the non-broadened scope while the delta is pending.
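
The narrowing path can be sketched end to end. The enforceable-field set used for the hash recompute is simplified here (the real bundle covers more fields), but the shape of the operation holds: apply the delta, recompute the version handle, leave everything else untouched:

```python
import hashlib
import json

def apply_narrowing(record: dict, delta: dict) -> dict:
    # Narrowing only removes scope, so no re-shaping and no approval is needed.
    amended = dict(record)
    amended["approved_tools"] = [
        t for t in record["approved_tools"] if t not in delta["remove_tools"]]
    amended["actions"] = [
        a for a in record["actions"] if a not in delta["remove_actions"]]
    # Recompute the version handle over the (simplified) enforceable fields.
    enforceable = {k: amended[k] for k in ("approved_tools", "actions")}
    digest = hashlib.sha256(
        json.dumps(enforceable, sort_keys=True).encode("utf-8")).hexdigest()
    amended["constraints_hash"] = "sha256-" + digest
    return amended
```

The new `constraints_hash` is what propagates the amendment: every enforcement point holding the prior hash fails its freshness check on the next request and pulls the narrowed bundle.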

Minimum `POST /missions/{mission_id}/clarify` request:

```json
{
  "clarification_id": "clar_01JR9S6A1Z",
  "responses": [
    {
      "question": "Which external recipients are in scope?",
      "answer": "investor-relations@acme.com only",
      "resolved": true
    }
  ]
}
```

Minimum `POST /missions/{mission_id}/clarify` response:

```json
{
  "mission_id": "mis_01JR9S4YDY6QF5Q9Q54M0YB4V1",
  "status": "pending_approval",
  "remaining_open_questions": []
}
```

After all open questions are resolved, the MAS re-runs approval classification from step 2 (template match) with the resolved fields. If the updated proposal now qualifies for auto-approval, the Mission moves directly to `active`. If it still requires human review, it moves to `pending_approval` and a review work item is created.

Minimum `GET /missions` query parameters:

| Parameter | Required | Purpose |
|---|---|---|
| `user_id` | yes | filter to Missions for this principal |
| `status` | no | filter by lifecycle status (e.g., `active`, `pending_approval`) |
| `entry_channel` | no | filter by host surface |

The caller's identity (from the bearer token) must match the `user_id` parameter or the request must be rejected. MAS must not allow cross-user Mission listing.

Minimum `GET /missions` response:

```json
{
  "missions": [
    {
      "mission_id": "mis_01JR9S4YDY6QF5Q9Q54M0YB4V1",
      "status": "active",
      "purpose_class": "board_packet_preparation",
      "constraints_hash": "sha256-abc123",
      "created_at": "2026-04-11T10:35:00Z",
      "expires_at": "2026-04-11T23:59:59Z",
      "entry_channel": "claude_code"
    }
  ]
}
```

**Session resumption contract:**

When a new session needs to resume work on an existing Mission (e.g., after async approval, session timeout, or continuation):

1. The host calls `GET /missions?user_id={uid}&status=active` with the user's current session token
2. MAS verifies the caller's identity matches `user_id` — the binding is to the authenticated user, not to any session identifier
3. The host finds the Mission by `purpose_class` or presents the list to the user to select from
4. The host calls `GET /missions/{mission_id}` to hydrate the full authority record
5. The host re-establishes the local Mission cache: `mission_id`, `constraints_hash`, `approved_tools`, `stage_constraints`
6. The host proceeds as if the Mission was just activated — there is no "resume session" operation; every session binds to the current Mission state at the time it starts

The session never resumes; the Mission resumes. A new session is always a fresh execution environment that fetches the current capability snapshot before acting. This means a session cannot assume anything about what prior sessions did — it must refresh its planning view from current authority state.
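
The resumption steps above can be sketched as a small host-side function. The `list_active` and `hydrate` callables stand in for the two MAS calls; field names follow the examples in this section, but this is an illustration, not a prescribed client API:

```python
from typing import Callable, Dict, List

def resume_mission(list_active: Callable[[], List[dict]],
                   hydrate: Callable[[str], dict],
                   purpose_class: str) -> Dict[str, object]:
    """Rebuild the host's local Mission cache from current authority state."""
    # Steps 1-3: find the Mission among the user's active Missions.
    candidates = [m for m in list_active()
                  if m["purpose_class"] == purpose_class]
    if not candidates:
        raise LookupError("no active Mission for this purpose_class")
    # Step 4: hydrate the full authority record.
    record = hydrate(candidates[0]["mission_id"])
    # Steps 5-6: a fresh cache; nothing is carried over from prior sessions.
    return {key: record[key] for key in
            ("mission_id", "constraints_hash",
             "approved_tools", "stage_constraints")}
```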

Required behavior:

- `POST /missions` must fail closed on unknown tools or unresolved clarifications
- `POST /missions/{mission_id}/derive` must persist a narrowing proof artifact
- `POST /missions/{mission_id}/amend` with narrowing must produce a new `constraints_hash` and emit `mission.amended` immediately
- `POST /missions/{mission_id}/amend` with broadening must not change the active `constraints_hash` until approval is granted
- `POST /missions/{mission_id}/clarify` must re-run approval classification after resolving questions
- `POST /missions/{mission_id}/complete` must transition status to `completed` and revoke all active tokens for this Mission
- `GET /missions` must enforce that the caller can only list their own Missions
- suspend and revoke endpoints must emit lifecycle signals
- `POST /missions/{id}/pause` blocks token issuance and tool execution; emits `mission.paused` with source `user`; does not change `constraints_hash`; can only be lifted by the same user via `POST /missions/{id}/resume` — not by operator suspend/lift flow
- `POST /missions/{id}/resume` on a user-paused Mission re-activates token issuance; emits `mission.resumed`; does not re-run approval classification
- `POST /missions/{id}/clone` runs the full compiler pipeline against the current catalog and template version using the approved scope of the source Mission as the proposal input; does not skip compilation or approval — the result may differ from the original if the catalog or template changed
- `POST /missions?mode=preview` runs steps 1-9b of the compiler pipeline; returns the review packet and recommended approval path; persists nothing; no `mission_id` is created; the response includes a `preview_id` that can be passed to a subsequent `POST /missions` call to skip re-shaping (the compiler re-runs from the preview result rather than re-calling the shaper)

#### Approval workflow APIs

| Method | Path | Purpose |
|---|---|---|
| `GET` | `/approvals/work-items/{review_id}` | fetch review work item |
| `POST` | `/approvals/work-items/{review_id}/approve` | approve gated Mission scope |
| `POST` | `/approvals/work-items/{review_id}/deny` | deny gated Mission scope |
| `GET` | `/missions/{mission_id}/approvals/{approval_type}` | fetch current approval object for one gate |

Minimum approve request:

```json
{
  "approved_by": "user:controller_42",
  "approval_type": "controller_approval",
  "constraints_hash": "sha256-abc123",
  "approved_scope": {
    "tools": ["mcp__docs__docs.publish"],
    "actions": ["publish_external"]
  },
  "expires_at": "2026-04-11T19:10:00Z"
}
```

Minimum approve response:

```json
{
  "approval_id": "appr_01JR9SQP4W",
  "mission_id": "mis_01JR9S4YDY6QF5Q9Q54M0YB4V1",
  "status": "active",
  "constraints_hash": "sha256-abc123"
}
```

Minimum deny response:

```json
{
  "review_id": "rev_01JR9S9D1H",
  "mission_id": "mis_01JR9S4YDY6QF5Q9Q54M0YB4V1",
  "status": "denied",
  "reason": "controller rejected external publication"
}
```

Required behavior:

- approval requests must be bound to the current `constraints_hash`
- if the Mission version changed since the work item was created, approve must fail with conflict
- approval grant or denial must emit a signal and invalidate planning caches

#### Authorization server APIs

These APIs extend standard OAuth (RFC 6749, RFC 8693 token exchange, RFC 7662 introspection) with Mission-specific claims. The full request/response contract is specified below.

| Method | Path | Purpose |
|---|---|---|
| `POST` | `/oauth/token` | exchange subject token for Mission-scoped audience token |
| `POST` | `/oauth/introspect` | introspect opaque tokens where used |
| `POST` | `/oauth/revoke` | revoke issued tokens where required |

Mission-aware introspection response (extends RFC 7662):

```json
{
  "active": true,
  "sub": "user_123",
  "aud": "https://mcp.finance.internal",
  "scope": "mcp.tools.call finance.read",
  "exp": 1744408800,
  "iat": 1744405200,
  "mission_id": "mis_01JR9S4YDY6QF5Q9Q54M0YB4V1",
  "constraints_hash": "sha256-abc123",
  "allowed_tools": ["mcp__finance__erp.read_financials"],
  "stage_constraints": []
}
```

When the Mission is revoked or the token is invalidated, return:

```json
{
  "active": false
}
```

Consumers using the opaque token model should treat `active: false` the same as a stale `constraints_hash` in the self-contained model: deny the request and emit a signal. The `active` flag and `constraints_hash` in the introspection response are the authoritative liveness signals — not the token's own `exp` claim.
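
A consumer-side check might look like this sketch (illustrative, not a prescribed client API):

```python
def token_is_live(introspection: dict, current_hash: str) -> bool:
    """Deny on active=false or a constraints_hash that is no longer current.

    The token's own exp claim is deliberately not consulted: a token inside
    its exp window can still be dead because the Mission was revoked.
    """
    if not introspection.get("active"):
        return False
    return introspection.get("constraints_hash") == current_hash
```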

Minimum Mission-scoped token request:

```json
{
  "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
  "subject_token": "eyJ...",
  "subject_token_type": "urn:ietf:params:oauth:token-type:access_token",
  "audience": "https://mcp.finance.internal",
  "scope": "mcp.tools.call finance.read",
  "resource": "https://mcp.finance.internal",
  "mission_id": "mis_01JR9S4YDY6QF5Q9Q54M0YB4V1",
  "constraints_hash": "sha256-abc123",
  "requested_tools": ["mcp__finance__erp.read_financials"],
  "requested_actions": ["read"]
}
```

Minimum token response:

```json
{
  "access_token": "eyJ...",
  "token_type": "Bearer",
  "expires_in": 3600,
  "issued_token_type": "urn:ietf:params:oauth:token-type:access_token",
  "projection_id": "proj_01JR9T5N5S",
  "mission_id": "mis_01JR9S4YDY6QF5Q9Q54M0YB4V1",
  "constraints_hash": "sha256-abc123"
}
```

Required behavior:

- AS must verify Mission is active and hash-current before issuance
- AS must reject requests for tools or actions outside the current projection
- AS must emit issuance records or projection metadata for audit

**Token lifetime defaults:**

| Token type | Recommended `expires_in` | Reasoning |
|---|---|---|
| MCP transport token | 300-900 seconds (5-15 min) | Short enough that a revoked Mission's authority stops materializing in live tokens before most sessions end; long enough to avoid constant re-issuance on active sessions |
| Direct API token | 300-600 seconds (5-10 min) | Direct API tokens bypass MCP-layer freshness checks, so the shorter window compensates |
| Approval object | 1800-3600 seconds (30-60 min) | Approval windows should be long enough for a human to take action but not so long that an approved gate stays open indefinitely |
| Sub-agent delegated token | not to exceed parent token remaining lifetime | enforced by the AS narrowing check at issuance |

The `expires_in: 3600` in the minimum token response example is a placeholder for documentation. Production deployments should use the 300-900 range for MCP transport tokens. Longer lifetimes require either opaque introspection or synchronous commit-boundary checks to enforce revocation within the token window.

**Token refresh strategy:**

With 300-900 second lifetimes, the host must refresh tokens proactively. Do not wait for a 401.

Use this refresh rule:

- refresh when remaining token lifetime falls below `max(60s, expires_in * 0.2)` — that is, with at least 60 seconds remaining or 20% of the original lifetime, whichever is larger
- the host local gateway (or hook script) checks expiry on each `PreToolUse` event and refreshes before the call if the token is within the refresh window
- do not refresh in the middle of an in-flight `tools/call`; refresh between calls only
- if a refresh fails, the behavior depends on the risk level of the next call:
  - low-risk read with remaining lifetime > 0: allow the call and retry refresh after
  - any commit-boundary or gated action: fail closed and surface the refresh failure before the side effect becomes real
- if the Mission has been revoked or suspended since the last refresh, the AS will reject the refresh attempt; that rejection is the signal that triggers session shutdown or restricted mode

A refresh attempt is the same token exchange request as initial issuance, using the current `constraints_hash`. If the Mission has been amended since the last issuance, the refreshed token will carry the updated projection automatically.
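
The refresh rule and the failure policy above can be expressed directly. A sketch only; the `next_call_risk` labels are assumptions for illustration, not a defined enum:

```python
def should_refresh(remaining_s: float, expires_in_s: float) -> bool:
    """Refresh when remaining lifetime falls below max(60s, 20% of lifetime)."""
    return remaining_s < max(60.0, 0.2 * expires_in_s)

def on_refresh_failure(next_call_risk: str, remaining_s: float) -> str:
    """Failure policy: only low-risk reads with a still-valid token proceed.

    Anything touching a commit boundary or a gated action fails closed
    before the side effect becomes real.
    """
    if next_call_risk == "low" and remaining_s > 0:
        return "allow_and_retry_refresh"
    return "fail_closed"
```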

#### Signal ingestion APIs

| Method | Path | Purpose |
|---|---|---|
| `POST` | `/signals` | ingest runtime or lifecycle signal |
| `GET` | `/missions/{mission_id}/signals` | fetch signal history for debugging and audit |

Minimum `POST /signals` request:

```json
{
  "signal_id": "sig_01JR9T3Y2M",
  "mission_id": "mis_01JR9S4YDY6QF5Q9Q54M0YB4V1",
  "source": "mcp_server",
  "event_type": "commit.denied",
  "tool": "mcp__docs__docs.publish",
  "resource_id": "mcp__docs__docs.publish",
  "risk_level": "high",
  "timestamp": "2026-04-11T18:12:00Z",
  "correlation_id": "req_01JR9T3W7G"
}
```

Minimum `POST /signals` response:

```json
{
  "accepted": true,
  "mission_id": "mis_01JR9S4YDY6QF5Q9Q54M0YB4V1",
  "effects": [
    "cache_invalidation",
    "risk_recompute"
  ]
}
```

Required behavior:

- signal ingestion must be idempotent on `signal_id`
- MAS must process signals in order within a Mission where ordering matters
- signal effects that change enforceable state must produce a new `constraints_hash`

**Backpressure and 429 client behavior:**

When the ingestion endpoint returns `429 signal_rate_exceeded`, the caller must not drop the signal silently. Use the following rules:

| Signal priority | On 429 | Max retry window |
|---|---|---|
| Lifecycle-critical (`mission.suspended`, `mission.revoked`, `approval.granted`, `approval.expired`) | queue locally; retry with exponential backoff + jitter; at-least-once delivery required | retry for up to 5 minutes; if still failing, surface a degraded-mode alert to the operator |
| High-priority enforcement signals (`tool.denied`, `commit.denied`, `anomaly.flagged`) | queue locally; retry with exponential backoff | retry for up to 60 seconds; drop if still failing and log the drop |
| Low-priority telemetry (`tool.allowed`, `session.start`, `session.end`) | retry once after 5 seconds; drop on second failure | no extended retry; these are best-effort |

Use exponential backoff starting at 1 second with a 2x multiplier and ±20% jitter. Cap individual retry intervals at 30 seconds.

Callers must not retry lifecycle-critical signals synchronously in the middle of a tool call. Queue the signal and continue; the retry happens asynchronously.
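
A sketch of the backoff schedule and per-priority retry windows. The priority labels here are shorthand for the table's rows, not defined API values:

```python
import random

def backoff_schedule(base: float = 1.0, factor: float = 2.0,
                     cap: float = 30.0, jitter: float = 0.2,
                     attempts: int = 6) -> list:
    """Exponential backoff intervals with +/-20% jitter, capped at 30s."""
    intervals = []
    for n in range(attempts):
        interval = min(cap, base * factor ** n)  # 1, 2, 4, 8, 16, 30
        intervals.append(interval * (1 + random.uniform(-jitter, jitter)))
    return intervals

def retry_window_s(priority: str) -> float:
    """Max retry windows from the table above."""
    return {"lifecycle": 300.0,    # 5 minutes, then degraded-mode alert
            "enforcement": 60.0,   # then drop and log the drop
            "telemetry": 5.0}[priority]  # one retry, best-effort
```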

## Main Path

This is the main build path for the system:

1. shape a Mission proposal
2. compile it into a bounded authority envelope
3. approve or escalate it
4. project it into policy and tokens
5. enforce it at host and MCP boundaries
6. re-evaluate it as Mission state changes

The next sections follow that path in order.

## Key Decisions

This design has a few decisions that matter more than the rest. Readers should not have to infer them from the middle of the note.

| Decision | Chosen default | Why |
|---|---|---|
| Mission state owner | dedicated MAS | authority needs durable lifecycle and approval state |
| policy layer | Cedar | concrete PARC model, deterministic evaluation, good fit for issuance and tool checks |
| planning model | cached capability snapshot with bounded refresh | gives the host a stable planning surface without making MAS a chatty hot-path dependency |
| token model | audience-specific projections | avoids broad ambient tool tokens |
| token liveness | self-contained token + Mission freshness by default | keeps MCP/tool latency reasonable while preserving revocation semantics |
| runtime binding | trusted local gateway or sender-constrained token | plain bearer tokens are too weak for tool authority |
| approval model | auto-approval plus human step-up | most internal work should not require bespoke review, but irreversible actions still need gates |
| enforcement points | host boundary, `tools/call`, commit boundary | prompt text and `tools/list` are not security controls |
| delegation model | derived sub-Missions with issuance-time narrowing proof | child agents must never inherit broad parent authority implicitly |
| cross-domain model | ID-JAG for identity bridge, local token per domain | target domains stay sovereign |
| baseline entitlement relationship | Mission narrows standing entitlements by default; approved temporary elevation is a separate advanced profile | keeps core deployment aligned with existing IAM while leaving room for explicit JIT mediation |

These defaults are not arbitrary. Changing any of them changes the architecture materially.

**On baseline entitlements:** the core profile is a narrowing model. Mission does not replace entitlement provisioning; it narrows, stages, and governs already-authorized access at execution time. If a tool requires standing entitlement and the user has it, the compiler may include that tool in the Mission envelope. If a tool requires standing entitlement and the user does not have it, the compiler must not silently widen authority.

There are two supported outcomes:

1. **Core profile:** fail closed for that tool or resource and surface the gap in the review packet.
2. **Advanced profile:** mark the request as requiring **approved temporary elevation**. In that mode, Mission approval does not itself grant access. It authorizes a separate elevation flow through the organization's entitlement broker, JIT credential service, or target-domain AS. The resulting temporary entitlement must be scoped, time-bounded, auditable, and bound back to the Mission and `constraints_hash`.

This keeps Mission architecture compatible with existing RBAC or ABAC systems rather than pretending Mission is a replacement for entitlement provisioning.

### Layer-by-layer design decisions

This is the compact architectural spine by layer.

| Layer | Key design decision |
|---|---|
| Mission shaping | model output is a proposal, not authority |
| compiler | deterministic compile from proposal + catalog + templates + context |
| templates | templates stay thin and describe work pattern envelopes, not full backend semantics |
| approval | default is auto-approval plus inline human step-up |
| Mission state | MAS is the authority root as a logical role; workflow and distribution should be split when possible |
| policy | Cedar is the canonical policy model and compiler target |
| planning | host consumes a capability snapshot and refreshes only at authority transitions |
| tokens | OAuth tokens are audience-specific projections, not authority records |
| runtime binding | trusted local gateway or sender-constrained tokens by default |
| host enforcement | host is the first containment boundary |
| MCP enforcement | `tools/call` is the real enforcement point; `tools/list` is convenience |
| commit boundary | downstream system of record owns final commit where possible |
| delegation | child agents get derived sub-Missions and narrowing proofs |
| cross-domain | cross-domain is an advanced profile; ID-JAG is identity bridge only |
| scopes / Cedar / FGA | scopes for coarse boundary, Cedar for Mission context, FGA for resource-instance checks |
| risk | runtime risk is a parallel control plane, not the authority source |
| audit / integrity | high-value events must be traceable and tamper-evident |
| privacy | cross-domain projections expose only what the target domain can act on |
| rollout | migration is architectural, not just operational |
| implementation order | single-domain core first, advanced profiles later |

### Short architectural spine

If you strip the design down to its essentials:

1. shape intent into a bounded proposal
2. compile it deterministically into authority state
3. approve or escalate it
4. project it narrowly into runtime tokens and policy bundles
5. enforce it at host, tool, and commit boundaries
6. keep it current through lifecycle, signals, and revalidation

### Decision checkpoints

Before implementation starts, answer these questions explicitly:

1. will the runtime use opaque/introspected tokens or self-contained tokens plus Mission freshness
2. will tokens be held by a trusted local gateway or sender-constrained directly
3. which actions are commit-boundary actions on day one
4. which purpose templates qualify for auto-approval
5. which teams own catalog, templates, approvals, and emergency revocation
6. is the deployment running narrowing-only Missions, or does it support approved temporary elevation as an advanced profile

If those are not decided, the rest of the implementation will drift.

### Things not to simplify away

Three places where the design made a harder-but-correct choice that will be tempting to simplify under schedule pressure. Don't.

**1. The capability snapshot model.** Teams will be tempted to query MAS on every tool consideration ("let me check if this is allowed first"). Don't. Build the capability snapshot: query once at session start and after authority transitions, hold it, plan inside it. Per-call MAS queries make MAS a hot-path synchronous dependency that will fail under load and add 50-200ms to every tool decision.

**2. Compiler output validation (step 9b).** Teams will be tempted to skip the independent validation pass after compilation and just trust that the compiler is correct. Don't. The `constraints_hash` proves the bundle is self-consistent; it does not prove the bundle is correct. Step 9b is the one check that catches a compiler bug before it produces a valid-but-wrong hash that auto-approves something it shouldn't.

**3. Commit-boundary lock in MAS, not the host.** Teams will be tempted to implement the commit serialization lock in the agent host process or the MCP server. Don't. A lock that lives in the host or MCP server is invisible to other concurrent sessions against the same Mission. The lock must be in MAS because MAS is the only component with global Mission scope.

**Reconciling "MAS owns the lock" with "downstream owns serialization":** these are two different locks at two different scopes, and both must exist.

- **MAS advisory coordination lock** — prevents two concurrent sessions from simultaneously reaching the commit boundary for the same Mission and both proceeding. This is acquired by the host (via `POST /missions/{id}/commit-boundary/acquire`) before presenting the step-up prompt, and released after the downstream write completes or times out. It lives in MAS because only MAS has cross-session visibility. It is short-lived (seconds, not the duration of the whole session).
- **Downstream idempotency lock** — the downstream system of record rejects duplicate or conflicting writes for the same `commit_intent_id`. This survives MAS lock release. It is scoped to the specific resource or effect, not to the whole Mission.

Both are required. The MAS lock alone does not prevent duplicate writes if the MAS lock is released and a retry occurs. The `commit_intent_id` alone does not prevent two concurrent sessions from both acquiring the downstream write slot simultaneously. Together they provide: (a) cross-session coordination before the write, and (b) idempotent duplicate suppression at the write itself.

When the MAS advisory lock is unavailable, the host must treat the commit boundary as blocked (fail closed). When only the downstream idempotency lock is unavailable, the downstream system handles the failure per its own protocol.
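
The two-lock interaction can be sketched as follows; `mas_lock` and `downstream` are hypothetical client objects standing in for the MAS advisory lock API and the downstream idempotency store:

```python
def commit_with_gates(mas_lock, downstream, mission_id: str,
                      commit_intent_id: str, write):
    """Cross-session coordination (MAS advisory lock) plus duplicate
    suppression (downstream idempotency on commit_intent_id)."""
    # Fail closed: no MAS advisory lock, no commit boundary.
    if not mas_lock.acquire(mission_id):
        raise RuntimeError("commit boundary blocked: MAS lock unavailable")
    try:
        # The idempotency check survives MAS lock release, so a retry
        # after release is still suppressed.
        if downstream.seen(commit_intent_id):
            return "duplicate_rejected"
        downstream.record(commit_intent_id)
        return write()
    finally:
        # Short-lived by design: held for the write, not the session.
        mas_lock.release(mission_id)
```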

### Architecture positions after deployment pressure

The design started as a clean logical model. Real deployments force a few sharper positions.

| Question | Chosen position |
|---|---|
| what is MAS | a logical authority role; authoritative state stays in MAS, while workflow, bundle distribution, and signal buffering should be split into adjacent services as the default production shape |
| how strict is planning refresh | refresh the capability snapshot at session start and after authority transitions; do not query MAS per thought or per routine tool choice |
| is Cedar mandatory everywhere | Cedar is the canonical policy model and compiler target; equivalent local adapters are acceptable at runtime |
| who owns commit boundary by default | the downstream system of record whenever it owns the final write; host and MCP provide prechecks, not final authority |
| is cross-domain core | no; single-domain Mission architecture is the core profile, cross-domain federation is an advanced profile |
| what is the main approval model | template-driven auto-approval plus inline user-controlled step-up by default; asynchronous enterprise review is a separate profile |
| what security guarantee is in scope | point-in-time authority control, auditable checkpoints, and revocation-aware containment; not complete prevention of harmful multi-step action sequences |
| are templates primary | yes for the first deployable model, but they sit on top of resource mappings and can be complemented by more resource-centered governance later |
| where does runtime risk live | as a parallel signal and control plane that feeds Mission decisions; not as the sole source of authority |
| how strong is audit integrity by default | tamper-evident records for high-value events are the default target; lighter integrity may be acceptable for low-risk operational telemetry |
| is migration architectural | yes; bootstrap Missions, shadow mode, and phased enforcement are part of the architecture, not just rollout advice |

These are the positions this note now assumes.

### Simplified deployment profile

If the goal is to achieve the same governance objectives with less distributed-systems risk, use this as the default profile:

1. **single domain only**
2. **narrowing-only Missions**
3. **MAS holds authority state only**
4. **host consumes one cached capability snapshot per Mission version**
5. **OAuth audience tokens carry `mission_id` and `constraints_hash`**
6. **host precheck + MCP `tools/call` enforcement**
7. **downstream system owns final commit boundary**
8. **no sub-agents in v1, or only pre-issued child delegation artifacts**
9. **no cross-domain federation in the core build**
10. **Cedar is canonical at compile and AS layers; adapters are acceptable elsewhere**

This profile keeps the important properties:

- bounded authority
- revocation-aware containment
- auditability and traceability
- distributed ownership of irreversible writes

while removing the most failure-prone parts of the broader design from the first deployment.

### V1 product boundary

To keep the first deployment operationally sane, treat v1 as a product with a hard boundary, not as a partial implementation of every future feature.

The recommended v1 product is:

- one enterprise trust domain
- one production host integration
- one OAuth AS mode
- one MCP server family or tightly related tool family
- one approval mode for routine work plus one inline step-up path for gated actions
- one small template pack
- one commit-boundary pattern owned by the downstream system of record

Everything else should be explicitly out of scope for v1:

- cross-domain federation
- asynchronous approval workflows
- broad delegation or multi-agent orchestration
- general-purpose temporary elevation
- multiple competing host integrations in the same first rollout

If a feature does not fit this boundary, move it to an advanced profile instead of letting it complicate the core.

### V1 reference stack

For the first real deployment, the recommended reference stack is:

- **one host:** Claude Code
- **one authority service shape:** MAS authority state + inline step-up approval
- **one AS mode:** self-contained JWTs plus freshness checks
- **one planning model:** capability snapshot refresh at authority transitions
- **one tool surface:** one MCP server family for the initial template pack
- **one commit-boundary owner pattern:** downstream system of record owns final write
- **one template pack:** `board_packet_preparation`, `support_ticket_triage`, and `draft_and_review`

If the implementation cannot be described this concretely, the v1 boundary is still too loose.

### Core profile mandatory defaults

V1 has one valid configuration. The following values are non-negotiable for v1. An implementation team may override them for v2 with documented justification — but v1 ships with these values and "we haven't decided yet" is not an acceptable v1 state for any of them.

**Approval model:** `auto_with_release_gate` is the default for all templates. Templates may use `auto` only when the template's risk tier is `low` and the template has no commit-boundary tools. `human_step_up` is reserved for high-risk templates. These are the only three valid modes — no custom approval modes in v1.

**Entity snapshot TTLs:**

- default for low-risk reads: 120 seconds
- for token issuance: 0 seconds (live check required)
- for commit-boundary decisions: 0 seconds (live check required)

Not configurable per-deployment in v1. These values are locked in the `mas.*` configuration namespace until production tuning data from a real deployment justifies changing them.

**Token lifetime:** 300–900 seconds. Default: 600 seconds. Not shorter (impractical refresh overhead). Not longer (widens the revocation gap). The AS may issue shorter tokens for high-risk audiences.

**Policy language:** Cedar. No custom adapters in v1. Every enforcement point evaluates Cedar policy. If a runtime cannot embed the Cedar library, it calls a sidecar — it does not substitute a custom allow/deny function.

**Commit-boundary ownership:** the downstream system of record owns the final write. Host and MCP precheck are convenience layers. If a downstream system does not yet own its commit boundary, that tool family is out of scope for v1 — it goes on the v2 backlog after the downstream system implements the boundary.

**Capability snapshot refresh:** exactly at session start and after authority transitions (Mission activation, amendment, suspension/resumption). Not per-tool-call. Not per-model-thought. Not on a timer. Deviating from this rule either makes MAS a hot-path per-call dependency or leaves stale snapshots cached too long; both failure modes defeat the snapshot model.

**Anomaly signal weights:** use the default values from the `mas.*` configuration namespace. Do not tune them before running the system in observation mode for at least 30 days on real traffic. Premature tuning of weights produces a classifier that is calibrated to theory rather than real behavior.

**Advanced profiles:** off by default. No cross-domain federation, no sub-agent delegation, no async enterprise approval in v1. These may not even be implemented — they are out of scope, not just disabled. Implementing them as stubs "just in case" adds complexity that must be maintained and tested without being used.

### V1 sequenced build plan

Build in this order. Each step has a minimum acceptance test that unlocks the next step. Do not advance until the acceptance test passes — each step is a load-bearing dependency for the steps that follow.

**Step 1 — Resource catalog stub**

Build a minimal catalog service with a single hardcoded resource record. The record needs: `tool_name`, `resource_class`, `trust_domain`, `data_sensitivity`, `commit_boundary` (bool), `allowed_action_classes`.

*Acceptance test:* `GET /catalog/resources/{tool_name}` returns the record in the correct schema. One test tool per resource class you plan to use in the first template.

**Step 2 — Minimal compiler (Phase 0 pipeline)**

Build the 3-step Phase 0 pipeline: normalize intent → constrain to catalog → emit enforcement bundle. Skip classification scoring, review packet generation, and validation (steps 2, 4, 5, and 9b). Emit a `constraints_hash` for the bundle.

*Acceptance test:* given a natural-language Mission description and a catalog stub, the compiler produces a deterministic enforcement bundle with a stable `constraints_hash`. Run the compiler twice on the same input; hashes must match. Change one field; hash must differ.
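
One way to make that determinism test pass is to hash a canonical serialization of the bundle. A sketch, not the mandated serialization:

```python
import hashlib
import json

def constraints_hash(bundle: dict) -> str:
    """Hash a canonical JSON form so semantically equal bundles hash equally.

    sort_keys makes key order irrelevant; compact separators remove
    whitespace variance. Any stable canonicalization would do.
    """
    canonical = json.dumps(bundle, sort_keys=True, separators=(",", ":"))
    return "sha256-" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```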

**Step 3 — MAS core: Mission lifecycle state machine**

Build the Mission state machine: `draft → pending_approval → approved → active → completed/revoked`. Implement `POST /missions`, `GET /missions/{id}`, `POST /missions/{id}/activate`, `POST /missions/{id}/revoke`. Persist governance records and compiled enforcement bundle.

*Acceptance test:* create a Mission, advance it through the full lifecycle (draft → approved → active → revoked). Verify state transitions are recorded with timestamps and actor IDs. Verify that activating from `draft` (not `approved`) returns an error.

**Step 4 — Capability snapshot API**

Add `POST /missions/{id}/capability-snapshot`. Return `allowed_tools`, `gated_tools`, `denied_actions`, `constraints_hash`, `anomaly_flags` (empty), `session_budget_status` (zeroed counters), `snapshot_ttl_seconds`.

*Acceptance test:* call the snapshot endpoint for an active Mission; verify the response matches the compiled enforcement bundle. Amend the Mission (trigger a recompile); verify the next snapshot returns the new `constraints_hash`. Verify the old `constraints_hash` is no longer returned.

**Step 5 — Host hook wiring (Claude Code)**

Wire the Claude Code hooks: `PreToolUse` checks `allowed_tools`/`denied_actions` from the cached capability snapshot. `PostToolUse` increments local budget counters and fires a signal to MAS. Session-start checklist runs in order (see [Session start checklist](#session-start-checklist)).

*Acceptance test:* with an active Mission that denies `tool_x`, issue a call to `tool_x` via the wired host — verify the call is blocked before reaching the MCP server. With a Mission that allows `tool_y`, verify the call proceeds. Verify the PreToolUse decision is logged.

**Step 6 — Cedar policy evaluation at host**

Replace the simple allowlist/denylist check with Cedar policy evaluation. Load the compiled Cedar policy bundle from MAS. Evaluate `allow`/`deny` using the Cedar runtime at PreToolUse. Verify `constraints_hash` from the local bundle matches the Mission's current hash before evaluating.

*Acceptance test:* write a Cedar policy that denies `finance.read` after 5pm (a time-based constraint). Verify the host denies the tool call at 5:01pm. Verify it allows the call at 4:59pm. Verify that presenting an expired bundle (mismatched `constraints_hash`) causes the host to fetch a fresh bundle before evaluating.

**Step 7 — MCP server enforcement**

Add `tools/call` enforcement in the MCP server: validate the OAuth audience token (verify `mission_id` claim), check `constraints_hash` against the current Mission, reject calls outside `allowed_tools`. This is the second enforcement layer after the host precheck.

*Acceptance test:* bypass the host (call the MCP server directly) with a valid token for Mission A. Verify the MCP server allows calls within Mission A's `allowed_tools`. Verify it rejects calls outside `allowed_tools` even with a valid token. Verify it rejects a token with a `constraints_hash` that does not match the current Mission.

**Step 8 — Commit-boundary integration**

Add commit-boundary flow: host acquires MAS advisory lock via `POST /missions/{id}/commit-boundary/acquire`, presents step-up prompt to user, on confirmation issues `commit_intent_id`, passes it through to the downstream owner. The downstream owner rejects duplicate `commit_intent_id` values.

*Acceptance test:* trigger a gated action. Verify the host surfaces the step-up confirmation prompt before the tool executes. Verify that replaying the same `commit_intent_id` at the downstream owner returns a rejection. Verify that attempting the commit without acquiring the MAS lock fails.
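The downstream owner's duplicate rejection can be sketched as a simple idempotency check keyed on `commit_intent_id`. This is a sketch under stated assumptions: a real owner would persist seen intent IDs durably, and the in-memory set here is only a stand-in for that store.

```python
# Sketch of downstream commit_intent_id deduplication (step 8).
# The in-memory set stands in for durable storage of seen intent IDs.

class DownstreamOwner:
    def __init__(self):
        self._seen_intents = set()

    def commit(self, commit_intent_id, payload: dict) -> dict:
        # No intent ID means the host skipped the MAS lock / step-up
        # flow entirely: reject rather than guess.
        if not commit_intent_id:
            return {"status": "rejected", "reason": "missing_commit_intent_id"}
        if commit_intent_id in self._seen_intents:
            return {"status": "rejected", "reason": "duplicate_commit_intent_id"}
        self._seen_intents.add(commit_intent_id)
        return {"status": "committed", "payload_keys": sorted(payload)}


owner = DownstreamOwner()
first = owner.commit("ci_123", {"action": "send_email"})
replay = owner.commit("ci_123", {"action": "send_email"})
```

Rejecting a missing intent ID is what enforces "attempting the commit without acquiring the MAS lock fails" at the owner's side, not just at the host.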

**Step 9 — Signal rail**

Wire anomaly signal emission: host fires `POST /missions/{id}/signals` for the 8 anomaly types. MAS accumulates weights and transitions the Mission to `suspended_anomaly` when the threshold is exceeded. The capability snapshot reflects the suspension.

*Acceptance test:* emit repeated `repeated_denial` signals until the accumulated weight exceeds the suspension threshold. Verify MAS transitions the Mission to `suspended_anomaly`. Verify the next capability snapshot refresh returns a suspended state. Verify the host blocks further tool calls in the suspended state.
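MAS-side weight accumulation is a small state machine. In this sketch the signal weights and suspension threshold are hypothetical calibration values, not numbers from the spec; only the `suspended_anomaly` transition mirrors the design above.

```python
# Sketch of MAS-side anomaly weight accumulation (step 9).
# Signal weights and the threshold are hypothetical calibration
# values; the suspended_anomaly transition follows the spec.

SIGNAL_WEIGHTS = {"repeated_denial": 2, "budget_overrun": 3}  # hypothetical
SUSPENSION_THRESHOLD = 5  # hypothetical


class MissionSignals:
    def __init__(self):
        self.accumulated = 0
        self.status = "active"

    def ingest(self, signal_type: str) -> str:
        # Unknown signal types still carry a minimum weight of 1,
        # so an unrecognized signal can never be a silent no-op.
        self.accumulated += SIGNAL_WEIGHTS.get(signal_type, 1)
        if self.status == "active" and self.accumulated >= SUSPENSION_THRESHOLD:
            self.status = "suspended_anomaly"
        return self.status


m = MissionSignals()
m.ingest("repeated_denial")          # weight 2, still active
m.ingest("repeated_denial")          # weight 4, still active
final = m.ingest("repeated_denial")  # weight 6, crosses threshold
```

The host then learns about the suspension on its next snapshot refresh, which is why the acceptance test checks the snapshot rather than a synchronous response.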

**Step 10 — Approval workflow**

Add the approval routing service: `POST /missions/{id}/submit-for-review` creates a work item. Approvers receive notification (email adapter first). `POST /approvals/work-items/{id}/approve` or `/reject` transitions the Mission. Auto-approval fires when all `auto` stage constraints are satisfied.

*Acceptance test:* submit a Mission for review. Verify a work item appears for the configured approver. Approve via the API. Verify Mission transitions to `approved`. Verify the approval object is stamped with approver identity and timestamp. Configure a template with `approval_required: auto` and verify a Mission created from it activates without requiring a manual approval call.

**Build order constraints:**
- Steps 1–2 are sequential (compiler depends on catalog)
- Steps 3–4 are sequential (snapshot depends on MAS lifecycle)
- Steps 5–6 can overlap (Cedar evaluation replaces the step 5 precheck; run step 5 first to get the wiring right, then swap in Cedar)
- Steps 7–10 are each independently shippable once steps 1–6 are complete
- Do not begin step 10 until step 3's lifecycle state machine is solid — the approval workflow directly manipulates Mission state

### Reference deployment topology

MAS is a logical authority role, not a single monolithic service. Splitting it into adjacent services is the correct production shape. Here is the reference topology:

```
┌─────────────────────────────────────────────────────────────────┐
│                        MAS Core                                  │
│  Governance records, compiled authority state, lifecycle FSM     │
│  Synchronous write path. Stateful. P0 availability.              │
│  APIs: POST /missions, GET /missions/{id}, POST /missions/{id}/  │
└─────────────┬───────────────────────────────┬───────────────────┘
              │ reads (sync)                  │ publishes bundles
              ▼                               ▼
┌─────────────────────────┐   ┌───────────────────────────────────┐
│  Bundle Distribution    │   │  Approval Workflow Service         │
│  CDN-backed, read-heavy │   │  Human review queue, routing,      │
│  GET /policy-bundle     │   │  notification delivery, work items │
│  Async pull on miss     │   │  POST /approvals/work-items/{id}/  │
└─────────────────────────┘   └───────────────────────────────────┘
              │                               │
              │ consumed by                   │ resolves to
              ▼                               ▼
┌─────────────────────────────────────────────────────────────────┐
│                   Enforcement Points                             │
│  AS token endpoint, MCP servers, host precheck, commit boundary  │
│  Read bundle from distribution layer. Write signals to buffer.   │
└─────────────────────────────┬───────────────────────────────────┘
                              │ signals (async, fire-and-forget)
                              ▼
                ┌─────────────────────────┐
                │  Signal Ingestion Buffer │
                │  Write-heavy, async      │
                │  POST /signals           │
                │  Drains into MAS core    │
                └─────────────────────────┘
```

**Synchronous dependencies (must be available for operations to proceed):**
- MAS core: Mission creation, capability snapshot, commit-boundary lock, sub-agent lineage check
- Bundle distribution: first request against a new Mission (cold-start pull)

**Async dependencies (degradation is acceptable):**
- Bundle distribution after cold-start: subsequent requests use cached bundle
- Approval workflow: pending approval stays pending; existing active Missions continue
- Signal ingestion: signals buffer; MAS core processes when buffer drains

**What this means operationally:** MAS core and bundle distribution are P0. The approval workflow service and signal buffer can degrade without stopping active Mission execution. Size and replicate accordingly.

### MAS centrality mitigation

MAS has high conceptual centrality — almost every design decision references it. The mitigation is not to remove MAS but to classify which decisions actually require a live MAS call vs. which can operate on cached state or entirely locally. This classification determines the real blast radius of a MAS outage.

| Decision | MAS dependency class | Mitigation when MAS is unavailable |
|---|---|---|
| Mission creation | **synchronous required** — cannot defer | fail closed; no new Missions until MAS is reachable |
| Token issuance | **synchronous required** — AS must verify Mission status and `constraints_hash` | fail closed; no new tokens |
| Capability snapshot refresh | **synchronous at session start** — required once, then cached | use cached snapshot for up to TTL; refresh on next authority transition |
| `PreToolUse` allow/deny | **local** — evaluates against cached capability snapshot | allow low-risk reads up to cache TTL; deny writes and gated actions |
| `tools/call` enforcement at MCP | **local** — evaluates against cached entity snapshot | deny high-risk calls; allow reads up to entity snapshot TTL (120s) |
| Commit-boundary advisory lock | **synchronous required** — MAS owns the lock | fail closed; block commit-boundary action |
| Signal emission | **async, fire-and-forget** — MAS ingests from buffer | buffer locally; flush when reachable; no blocking on signal delivery |
| Approval state check | **synchronous on commit-boundary** — must verify approval is still current | fail closed if approval cannot be verified |
| Session budget counter report | **async** — host reports to MAS | queue locally; flush when reachable; enforce from local count |
| Anomaly suspension check | **async via signals** — suspension state arrives via snapshot refresh | host polls snapshot at session start; use last known state between polls |
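The dependency classes in the table above can be expressed as a routing function: given a decision type and MAS reachability, return the degraded-mode behavior. The class names mirror the table; the function itself is an illustrative sketch, not a prescribed interface.

```python
# Sketch of the dependency-class table as code. Class names mirror
# the table above; the routing function is illustrative.

DEPENDENCY_CLASS = {
    "mission_creation": "sync_required",
    "token_issuance": "sync_required",
    "commit_boundary_lock": "sync_required",
    "pretooluse_check": "local_cached",
    "signal_emission": "async_buffered",
}


def degraded_behavior(decision: str, mas_reachable: bool, cache_fresh: bool = False) -> str:
    cls = DEPENDENCY_CLASS[decision]
    if mas_reachable:
        return "proceed"
    if cls == "sync_required":
        # Governance tightens under outage, never loosens.
        return "fail_closed"
    if cls == "local_cached":
        # Cached snapshot is good for reads up to TTL; otherwise deny.
        return "proceed_from_cache" if cache_fresh else "fail_closed"
    return "buffer_and_continue"  # async_buffered
```

The asymmetry is the point: only the async class ever proceeds without MAS, and the cached class proceeds only inside its TTL window.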

**The practical blast radius of a MAS core outage:**

- Existing active sessions with valid cached snapshots continue for up to 120s for read operations
- Token issuance stops immediately (fail closed)
- Commit-boundary actions stop immediately (fail closed)
- No new Missions or session starts succeed
- Signal delivery buffers but does not block active sessions

This is the right failure mode: governance tightens under outage, never loosens. The 120s window for read operations is the designed residual risk window. Shorten it by reducing entity snapshot TTL, at the cost of higher MAS read traffic.

**How to reduce MAS centrality without changing the design:**

- replicate MAS core with strong consistency reads — most MAS reads are state queries, not writes
- put bundle distribution behind a CDN — the most common read (cached policy bundle) never hits MAS core
- pre-warm capability snapshots for known recurring sessions (e.g., daily scheduled agent jobs) so the session-start MAS call succeeds before the session begins
- use the signal buffer to absorb write spikes rather than hitting MAS core directly on every tool use

## Applying the research

The research linked in the earlier Mission-shaping discussion is not being used here as decoration. It drives specific design choices.

| Research insight | How it is applied here |
|---|---|
| intent is not authority | shaping output is only a proposal; MAS compilation and approval create real authority |
| open-ended language needs a constrained intermediate form | compiler produces `purpose_class`, `resource_classes`, `action_classes`, `stage_constraints`, and `delegation_bounds` |
| common work patterns should be reusable | templates capture normal authority envelopes, deny sets, stage gates, and approval routing |
| human approval only matters if it is legible | review packets and approval objects make the decision surface explicit |
| runtime drift matters after initial approval | `constraints_hash`, signal-driven invalidation, and commit-boundary rechecks keep authority live |
| identity bridging is not the same as authority | ID-JAG bridges identity across domains; MAS remains the source of Mission authority |

In other words, the research shows up in three places:

1. **the compiler**
   turns ambiguous intent into a constrained authority candidate
2. **the approval layer**
   decides whether that candidate becomes real authority
3. **the runtime layer**
   keeps that authority current as execution changes

### Paper-specific contributions

The papers are contributing different pieces of the design.

| Source | Key idea we are taking | Where it appears in the design |
|---|---|---|
| [Intent-Based Access Control: Securing Agentic AI Through Fine-Grained Authorization of User Intent](https://ibac.dev/ibac-paper.pdf) | derive minimum permissions from user intent, enforce at every tool call, keep hard denies outside agent control | shaping prompt, compiler deny sets, per-call host/MCP enforcement, approval on denied-but-escalable requests |
| [Delegated Authorization for Agents Constrained to Semantic Task-to-Scope Matching](https://arxiv.org/abs/2510.26702) | semantic matching should narrow delegated scopes, but matching gets harder as scope combinations grow | template matching, approval classification, AS-side evaluation before token issuance, audience-specific token projection |
| [Defeating Prompt Injections by Design](https://arxiv.org/abs/2503.18813) | separate trusted control flow from untrusted data flow; do not let retrieved content decide authority | trusted compiler inputs, proposal-versus-authority split, host and MCP enforcement wrappers, commit-boundary rechecks |

### How IBAC changes the compiler

The IBAC paper is the clearest argument for why the compiler has to produce a concrete authority envelope instead of broad ambient permissions.

We are applying four IBAC-style ideas directly:

1. **minimum capability derivation**
   The compiler starts from the narrowest candidate envelope implied by the request.
2. **specific resource binding**
   Approval and escalation should name exact tools, resources, or domains where possible.
3. **hard deny precedence**
   Organizational deny sets stay outside agent control and cannot be broadened by the model.
4. **tool-boundary enforcement**
   Every actual execution path goes through host or MCP authorization checks, not just prompt-time interpretation.

We are not copying IBAC's exact tuple model, but the design uses the same core idea: derive the narrowest actionable authority set and enforce it deterministically.

### How semantic task-to-scope matching changes approval

The semantic task-to-scope matching paper pushes on a real weakness: raw model matching becomes fragile as tasks need more scopes and more combinations.

That is why this design does not stop at "LLM guessed the right scopes."

Instead, it adds:

- purpose templates
- deterministic resource catalogs
- hard disqualifiers
- review packets
- approval modes

Semantic matching is the front end. It is not the final grant decision.

### How CaMeL changes runtime containment

CaMeL's main contribution is the insistence that untrusted retrieved content should not be able to influence trusted control flow.

That shows up here in three rules:

1. Mission authority is compiled from trusted inputs:
   - user request
   - trusted context
   - policy templates
   - resource catalog
2. untrusted tool output may affect reasoning, but not authority state
3. irreversible actions are rechecked at a non-bypassable commit boundary

That is why the note keeps separating:

- proposal from approval
- token from Mission
- model reasoning from authority state

### What we are not taking from the papers

The note deliberately avoids a few things those papers might tempt readers to assume:

- the LLM is not the policy engine
- one semantic parse is not enough for the whole session
- prompt-injection defense is not treated as sufficient without deterministic enforcement
- cross-domain identity assertions do not carry Mission semantics

The design uses those papers to justify a stricter compiler and enforcement model, not to outsource authority decisions back to the model.

## Start With the Shaping Prompt

The first thing to implement is not token exchange. It is Mission shaping.

Here is a practical prompt for Claude or Codex that is good enough to start with.

```text
You are the Mission Shaper for an autonomous agent system.

Your job is to transform a user request into a structured Mission proposal for governance and enforcement.

Do not plan execution steps in detail. Do not invent permissions. Do not broaden intent.

Produce JSON with these fields:
- purpose: one-sentence statement of what the user is trying to accomplish
- summary: short human-readable description of the Mission
- requested_resource_classes: array of high-level resource classes the task appears to require
- requested_actions: array of high-level action classes
- requested_tools: array of tool categories or named tools if explicitly requested
- stage_constraints: array of stage gates or approval checkpoints implied by the request
- time_bounds: object with any obvious duration or deadline constraints
- delegation_bounds: object with whether sub-agents appear necessary and what limits should apply
- explicit_exclusions: array of actions or resource classes that are clearly out of scope
- open_questions: array of questions that must be answered before approval
- confidence: low | medium | high

Rules:
- Prefer narrower authority over broader authority.
- If the request implies external communication, payment, deletion, publication, or irreversible action, add a stage constraint requiring explicit approval before that action becomes real.
- If the request is ambiguous, put the ambiguity in open_questions rather than expanding authority.
- Do not output markdown. Output JSON only.
```

This prompt should not be the authority record. Its output is only the **proposal**.

**Pass the template list as context.** The shaper prompt above works, but it guesses purpose classes blind. The host should fetch `GET /templates` at session start and inject the template summaries into the shaping prompt before calling the model:

```text
Available Mission templates for this organization:
- board_packet_preparation: reads finance and document systems; gates publication; auto-approved
- support_ticket_triage: reads/updates tickets, internal notes; partner ticket create; auto-approved for allowlisted partners
- sales_account_research: CRM read, document draft; no outbound; auto-approved
- engineering_release_drafting: issue tracker read, build metadata read, release note draft; auto-approved with publish gate
- vendor_due_diligence: vendor doc read, internal memo draft; no external send; auto-approved

When selecting purpose_class, use only the purpose class names listed above. If the user's request does not match any template, set purpose_class to null and add an open question explaining what type of work this is.
```

A shaper that receives this context will propose matching purpose classes on the first pass for ~90% of requests that fit known templates, dramatically reducing clarification loops and classification failures. The template list is not trusted input to the compiler — the compiler still validates the match — but it makes the shaper's output much more likely to be compilable on first attempt.

The shaping model is not a security control. The instructions say "do not broaden intent" but the model will still broaden intent for vague requests, ambiguous phrasing, or any user who writes a wide scope on purpose. A user who writes "help me with the quarterly finance work and also anything else that comes up" will get a broad proposal. The shaper output is untrusted input to the compiler. The compiler is the trust boundary. Hard denies, catalog resolution, and template matching are what constrain authority, not the shaping instructions.

## How Intent Gets Compiled

The shaping prompt produces a proposal. The MAS still has to turn that proposal into enforceable state.

This is the part that most systems hand-wave. Do not.

The compiler should be a deterministic pipeline with explicit inputs, transforms, and outputs.

### Compiler contract

**Required inputs**
- Mission proposal
- request context
- resource catalog
- policy templates
- current authority state

**Processing**
- normalize
- classify
- constrain
- score for approval
- build review packet
- compile enforcement bundle
- compute `constraints_hash`

**Required outputs**
- governance record
- policy bundle
- token projections
- host enforcement hints

**Failure behavior**
- missing required input -> fail closed
- unknown resource class -> fail closed or route to clarification
- no matching template -> restrictive fallback or human review
- invalid compiled state -> do not emit `constraints_hash`

**Acceptance checks**
- same input set yields same compiled output
- denied resources do not appear in any token projection
- stage-gated actions appear in governance and enforcement outputs
- compiled hash changes whenever enforceable state changes

### Compiler inputs

The compiler should receive at least these inputs:

1. **Mission proposal**
   Output from the shaping model.
2. **request context**
   - user identity
   - agent host identity
   - channel or entry point
   - current session ID
   - tenant or organization
3. **resource catalog**
   The registry that maps real tools, APIs, datasets, folders, and domains into internal resource classes.
4. **policy templates**
   Organizational defaults for low-risk work, stage gates, partner-domain rules, data handling, and delegation.
5. **current authority state**
   Existing Missions, approvals, suspended domains, risk flags, and business events already attached to the user or agent.

If one of those inputs is missing, the compiler should fail explicitly rather than quietly broadening authority.

### Purpose classification algorithm

The compiler needs an explicit classification procedure for `purpose_class`.

Use this sequence:

1. the shaping model proposes the top 3 candidate purpose classes with confidence
2. the deterministic matcher scores each candidate against:
   - requested tools
   - requested actions
   - trust domains
   - keywords in the summary
   - historical template fit if allowed
3. apply tie-break rules:
   - prefer narrower purpose class
   - prefer enterprise-local over cross-domain when both fit
   - prefer templates with fewer allowed actions
4. if the top score is below the confidence threshold, route to clarification
5. if the top two scores are too close, route to clarification or review

Minimum scoring contract:

```json
{
  "candidate_purpose_classes": [
    {"purpose_class": "board_packet_preparation", "model_score": 0.81, "rule_score": 0.92},
    {"purpose_class": "financial_analysis", "model_score": 0.74, "rule_score": 0.66}
  ],
  "selected_purpose_class": "board_packet_preparation",
  "selection_reason": "narrower matching template with stronger tool fit"
}
```

Required behavior:

- no classifier result may broaden authority by itself
- unknown or low-confidence classifications must fail into clarification or review
- classification output must be persisted for audit

#### Purpose classifier inputs

The classifier should run on a fixed input object, not on raw prose alone.

Minimum input:

```json
{
  "proposal_summary": "Prepare the Q2 board packet comparing actuals to plan",
  "requested_tools": ["erp.read_financials", "docs.read", "docs.write"],
  "requested_actions": ["read", "summarize", "draft"],
  "requested_domains": ["enterprise"],
  "explicit_stage_constraints": ["release_gate"],
  "user_channel": "claude_code",
  "tenant_id": "acme"
}
```

The classifier should not read runtime tool output, prior agent chain-of-thought, or arbitrary retrieved content. Classification is based on trusted shaping output plus trusted catalog context.

#### Purpose classifier thresholds and tie-breaks

Use explicit thresholds so the implementation behaves predictably.

| Condition | Result |
|---|---|
| top candidate `rule_score >= 0.80` and margin from second candidate `>= 0.10` | accept top candidate |
| top candidate `rule_score >= 0.65` and candidate maps to exact template/tool fit | accept with `review_recommended = false` |
| top candidate below threshold | `clarification_required` |
| top two candidates within margin | `clarification_required` |
| no candidate maps to known template family | `human_step_up` or restrictive fallback |

**These thresholds are calibration anchors, not production values.** The specific numbers (0.80, 0.65, 0.10) have no empirical basis. A greenfield deployment has no request distribution data to derive them from. In practice, the first few hundred real requests will surface patterns that score at 0.78 (just below threshold) for routine work, or 0.82 for genuinely ambiguous requests. Treat these numbers as the starting shape and budget explicit calibration time: collect rejected/clarification outcomes, compare against what a human would have done, and adjust thresholds before using the classifier for auto-approval decisions. The numbers must be fixed in configuration and versioned. Do not bake them into code.

**Tie-break rule:** when two candidates are within margin, the tie-break must be deterministic:

1. select the candidate with the fewest allowed tools in the matched template
2. if still tied, select the candidate with the lower aggregate risk score for the matched template
3. if still tied, `clarification_required`

"Prefer narrower purpose class" is not a decision rule. Narrower allowed-tool count is.
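The acceptance thresholds and deterministic tie-break can be sketched together. The constants mirror the calibration anchors above (0.80 / 0.65 / 0.10) and should live in versioned configuration, not code; they are hard-coded here only to keep the sketch self-contained, and the candidate fields are illustrative.

```python
# Sketch of classifier acceptance + deterministic tie-break.
# Thresholds are the calibration anchors from the table above and
# belong in versioned configuration in a real deployment.

ACCEPT_SCORE = 0.80
EXACT_FIT_SCORE = 0.65
MARGIN = 0.10


def select_purpose_class(candidates: list) -> dict:
    ranked = sorted(candidates, key=lambda c: c["rule_score"], reverse=True)
    top = ranked[0]
    second_score = ranked[1]["rule_score"] if len(ranked) > 1 else 0.0
    margin = top["rule_score"] - second_score

    # Below the floor: never auto-select.
    if top["rule_score"] < EXACT_FIT_SCORE:
        return {"result": "clarification_required"}

    # Clear winner: high score with enough margin, or an exact template fit.
    if margin >= MARGIN and (
        top["rule_score"] >= ACCEPT_SCORE or top.get("exact_template_fit")
    ):
        return {"result": "accept", "purpose_class": top["purpose_class"]}

    # Within margin: tie-break by fewest allowed tools, then lowest
    # template risk score; a true tie falls through to clarification.
    if margin < MARGIN:
        tied = sorted(
            (c for c in ranked if top["rule_score"] - c["rule_score"] < MARGIN),
            key=lambda c: (c["allowed_tool_count"], c["template_risk_score"]),
        )
        first, second = tied[0], tied[1]
        if (first["allowed_tool_count"], first["template_risk_score"]) != (
            second["allowed_tool_count"], second["template_risk_score"]
        ):
            return {"result": "accept", "purpose_class": first["purpose_class"]}
    return {"result": "clarification_required"}
```

Note the tie-break compares tool counts and risk scores, never prose descriptions of "narrowness", which is exactly the distinction the paragraph above draws.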

#### Purpose classifier output contract

The compiler should persist the full classification result, not only the selected class.

Minimum stored result:

```json
{
  "classification_version": "purpose-classifier-v2",
  "selected_purpose_class": "board_packet_preparation",
  "review_recommended": false,
  "clarification_required": false,
  "candidate_rankings": [
    {
      "purpose_class": "board_packet_preparation",
      "model_score": 0.81,
      "rule_score": 0.92,
      "matched_templates": ["board_packet_low_risk_v3"]
    },
    {
      "purpose_class": "financial_analysis",
      "model_score": 0.74,
      "rule_score": 0.66,
      "matched_templates": ["finance_analysis_internal_v2"]
    }
  ],
  "selection_reason": "narrower matching template with stronger tool fit"
}
```

### Compiler outputs

The compiler should produce four concrete outputs, not one vague "approved Mission":

1. **Governance record**
   The approved or pending Mission that humans and operators reason about.
2. **Policy bundle**
   Cedar-friendly principal, resource, action, and context projections plus the policy set or template-linked policies to evaluate them.
3. **Token projections**
   The narrow claim sets that an AS can embed into MCP transport tokens and direct API tokens.
4. **Host enforcement hints**
   Action mappings, commit-boundary flags, safe-mode rules, and signal thresholds that the agent host can apply before a tool call ever leaves the session.

That split matters. The Mission record is what was proposed and approved. The policy bundle is what gets evaluated. The token projection is what gets carried. The host hints are what make containment immediate.

### Step-by-step compiler pipeline

> **Minimum viable compiler for Phase 0:** If you are building the first version, you do not need all 10 steps immediately. Start with three:
> 1. **Resolve tools** — look up every requested tool in the catalog; fail closed on anything not found
> 2. **Match template** — check whether the resolved tool set fits one known template; fail if no match
> 3. **Compute hash** — SHA-256 of `{allowed_tools, stage_constraints}` sorted canonical JSON
>
> That is enough to prove the compiler is deterministic and fails closed — which is the Phase 0 exit gate. Add the remaining steps as you scale. The full pipeline below is the target, not the starting point.
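The three Phase 0 steps can be sketched end to end. The catalog and template contents below are hypothetical fixtures; only the shape (resolve, match, hash sorted canonical JSON) follows the spec.

```python
# Minimal sketch of the three-step Phase 0 compiler: resolve tools
# against the catalog (fail closed), match one template, hash the
# enforceable state. Catalog and template contents are hypothetical.

import hashlib
import json

CATALOG = {"erp.read_financials": "finance.read", "docs.write": "documents.write"}
TEMPLATES = {
    "board_packet_preparation": {"allowed_tools": ["finance.read", "documents.write"]},
}


def compile_phase0(requested_tools, stage_constraints):
    # Step 1: resolve every tool; unknown tools fail closed.
    resolved = []
    for tool in requested_tools:
        if tool not in CATALOG:
            raise ValueError(f"unknown tool: {tool}")
        resolved.append(CATALOG[tool])

    # Step 2: the resolved set must fit exactly one known template.
    matches = [
        name for name, t in TEMPLATES.items()
        if set(resolved) <= set(t["allowed_tools"])
    ]
    if len(matches) != 1:
        raise ValueError("no unambiguous template match")

    # Step 3: SHA-256 over sorted canonical JSON of enforceable state.
    state = {
        "allowed_tools": sorted(resolved),
        "stage_constraints": sorted(stage_constraints),
    }
    constraints_hash = hashlib.sha256(
        json.dumps(state, sort_keys=True, separators=(",", ":")).encode()
    ).hexdigest()
    return {"template": matches[0], "constraints_hash": constraints_hash, **state}
```

Sorting both the tool list and the JSON keys is what makes the hash order-independent: the same requested tools in any order produce the same `constraints_hash`, which is the Phase 0 determinism gate.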

Use a fixed sequence like this.

#### Step 1: assign stable identifiers

The compiler should first mint or attach:

- `mission_id`
- request correlation ID
- session ID
- tenant / org ID
- initial `proposal_hash`

Nothing downstream should rely on raw prompt text as the stable identifier.

#### Step 2: normalize free-form fields

Convert model-produced prose into stable internal categories.

For example:

- `Prepare the Q2 board packet` -> `purpose_class = board_packet_preparation`
- `pull final numbers` -> `action_classes = ["read", "summarize", "draft"]`
- `call me before releasing` -> `stage_constraint = release_gate`
- `might need help from another agent` -> `delegation_bounds.subagents_allowed = true`

Normalization should use:

- internal enums
- catalog lookups
- deterministic mapping rules
- reviewable mapping tables

Do not let every request invent new resource or action categories.
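A reviewable mapping table in this sense is just a versioned dictionary with exact-match lookup. This sketch is illustrative: the second entry is hypothetical, and a real table would live in configuration, not code.

```python
# Sketch of a reviewable normalization table (step 2): deterministic
# phrase-to-enum mappings. Unmapped phrases return None and flow to
# clarification rather than inventing a new category on the fly.

STAGE_CONSTRAINT_MAP = {
    "call me before releasing": "release_gate",
    "check with me before sending": "send_gate",  # hypothetical entry
}


def normalize_stage_constraint(phrase: str):
    # Exact-match lookup only; fuzzy matching would reintroduce the
    # model as the decision-maker, which step 2 is meant to avoid.
    return STAGE_CONSTRAINT_MAP.get(phrase.strip().lower())
```

Keeping the table exact-match means every new mapping is an explicit, reviewable change rather than an emergent model behavior.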

#### Step 3: resolve concrete resource classes

Take the requested tools or implied systems and resolve them through the resource catalog.

Example:

- `erp.read_financials` -> `finance.read`
- `docs.write` -> `documents.write`
- `docs.publish` -> `publication.external`

At this stage, also attach:

- data sensitivity class
- trust domain
- partner-domain classification
- whether the resource can create an irreversible effect

This is where vague intent becomes real surface area.

### Resource catalog schema

The resource catalog needs a stable schema. Without it, compiler outputs will drift.

Minimum catalog record:

```json
{
  "resource_id": "mcp__finance__erp.read_financials",
  "resource_type": "tool",
  "resource_class": "finance.read",
  "trust_domain": "enterprise",
  "data_sensitivity": "internal_financial",
  "commit_boundary": false,
  "aliases": ["erp.read_financials", "finance.actuals.read"],
  "allowed_action_classes": ["read"],
  "owner": "finance-platform"
}
```

Required behavior:

- every executable tool, API, dataset, folder, and partner domain must have a stable record
- aliases must resolve to one canonical `resource_id`
- unknown resources must fail closed
- catalog changes must be versioned and auditable
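Alias resolution with those guarantees can be sketched as a flat index built once from the catalog. Records follow the schema above; the index structure is illustrative.

```python
# Sketch of alias resolution with fail-closed behavior. Records follow
# the catalog schema above; the lookup structure is illustrative.

RECORDS = [
    {
        "resource_id": "mcp__finance__erp.read_financials",
        "aliases": ["erp.read_financials", "finance.actuals.read"],
    },
]

# Build a flat index: every alias and canonical ID maps to one record,
# and a duplicate key is a catalog integrity error, not a silent overwrite.
_INDEX = {}
for rec in RECORDS:
    for key in [rec["resource_id"], *rec["aliases"]]:
        if key in _INDEX:
            raise ValueError(f"alias collision: {key}")
        _INDEX[key] = rec


def resolve(name: str) -> dict:
    """Resolve an alias or canonical ID; unknown names fail closed."""
    if name not in _INDEX:
        raise LookupError(f"unknown resource: {name}")
    return _INDEX[name]
```

Raising on collision at index-build time is what enforces "aliases must resolve to one canonical `resource_id`" before the compiler ever runs.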

#### Resource catalog ownership and onboarding

The catalog needs an operational owner. Otherwise the compiler will drift from reality.

Use this ownership split:

| Field | System of record | Owner |
|---|---|---|
| `resource_id`, aliases | resource catalog | platform security or IAM |
| `resource_class`, allowed action classes | resource catalog | tool or API owning team |
| `trust_domain`, partner classification | resource catalog | IAM / federation team |
| `data_sensitivity` | data catalog or resource catalog | data governance |
| `commit_boundary` | resource catalog | tool or API owning team with security review |

Onboarding uses a tiered model. Not every tool needs a full security review before it can be used:

**Tier 1 — self-attestation (low-sensitivity internal tools)**

An owning team submits a catalog record via `POST /catalog/resources`. It enters with `status: provisional` automatically if it meets all of:

- `trust_domain: enterprise` (internal network only)
- `data_sensitivity: internal` or lower
- `commit_boundary: false`
- no external API calls

A `provisional` record can be used by the compiler for low-risk Mission templates only (templates with aggregate risk score below the medium threshold). The compiler must record that a provisional resource was used; this is surfaced in the approval packet.

Full security review is still required before:
- the record reaches `status: approved`
- the resource is usable in medium-risk or high-risk Mission templates
- `trust_domain`, `commit_boundary`, or `data_sensitivity` change

**Tier 2 — full review (everything else)**

1. new tool or API cannot be used until it has an `approved` catalog record
2. owning team must declare action classes, trust domain, and commit-boundary behavior
3. security or IAM must approve the trust-domain and sensitivity mapping
4. compiler consumes only the published catalog version

**What self-attestation does not change:** the compiler still fails if a resource is not in the catalog at all. "Not in catalog" is not the same as "provisional". Self-attestation fast-path requires that the record exists; it just allows provisional use in bounded cases.
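The Tier 1 gate is four conjunctive conditions, which makes it easy to express as a pure predicate. In this sketch the field names follow the catalog schema above, except `external_api_calls`, which is a hypothetical owner-declared flag; note it defaults to "assume external calls" so an undeclared record fails the fast path.

```python
# Sketch of the Tier 1 self-attestation gate: a submitted record may
# enter as `provisional` only if all four conditions hold. The
# external_api_calls field is hypothetical (declared by the owner).

def tier1_eligible(record: dict) -> bool:
    return (
        record.get("trust_domain") == "enterprise"
        and record.get("data_sensitivity") in ("internal", "public")
        and record.get("commit_boundary") is False
        # Missing declaration defaults to True (assume external calls),
        # so silence cannot qualify a record for the fast path.
        and record.get("external_api_calls", True) is False
    )


def initial_status(record: dict) -> str:
    # Only decides the entry status of a record that exists;
    # "not in catalog" remains a hard compiler failure.
    return "provisional" if tier1_eligible(record) else "pending_review"
```

The sensitivity check treats `internal` and `public` as "internal or lower"; the real ordering would come from the data governance team's sensitivity taxonomy.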

#### Resource catalog service API

The catalog must be a queryable service, not a static file. The compiler and MCP servers need to resolve resources at compilation time and at runtime without reading a local file.

| Method | Path | Purpose |
|---|---|---|
| `GET` | `/catalog/resources/{resource_id}` | fetch one catalog record by canonical ID or alias |
| `GET` | `/catalog/resources?resource_class={class}` | list all resources in a class |
| `GET` | `/catalog/version` | fetch current catalog version and hash |
| `POST` | `/catalog/resources` | submit a new catalog record for review |
| `PUT` | `/catalog/resources/{resource_id}` | update an existing record (triggers re-review if trust-domain or commit-boundary changes) |

Minimum `GET /catalog/resources/{resource_id}` response:

```json
{
  "resource_id": "mcp__finance__erp.read_financials",
  "resource_type": "tool",
  "resource_class": "finance.read",
  "trust_domain": "enterprise",
  "data_sensitivity": "internal_financial",
  "commit_boundary": false,
  "aliases": ["erp.read_financials", "finance.actuals.read"],
  "allowed_action_classes": ["read"],
  "owner": "finance-platform",
  "mcp_server_url": "https://mcp.finance.internal",
  "catalog_version": "2026-04-11",
  "status": "approved"
}
```

The `mcp_server_url` field closes the gap between catalog records and MCP token audiences. The compiler uses it to determine which AS audience to request a token for when a resource is included in a Mission.

**`mcp_server_url` must be verified against the AS audience registry.** The AS maintains a registered set of known audiences. Before the compiler emits a token request for a given `mcp_server_url`, it must verify the URL is registered as a valid audience in the AS. An unregistered URL fails compilation.

Required behavior:

- `mcp_server_url` changes to an existing catalog record must trigger the same review path as `commit_boundary` or `trust_domain` changes — the record returns to `pending_review` and cannot be used until re-approved
- the AS audience registry and the catalog `mcp_server_url` field must be reconciled during catalog approval: a catalog record cannot reach `approved` status if its `mcp_server_url` is not registered in the AS
- the compiler must reject any `mcp_server_url` that is not in the AS registered audience set, even if the catalog record is `approved`

This prevents a compromised or misconfigured catalog record from redirecting valid Mission-scoped tokens to an attacker-controlled server.
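The compiler-side check reduces to one predicate over the AS audience set. This is a sketch with illustrative data: the registry lookup would be a call to the AS audience service, not an in-process dictionary.

```python
# Sketch of the compiler-side audience check: a catalog record's
# mcp_server_url must be registered AND verified in the AS audience
# set before a token request is emitted. All data is illustrative.

AS_AUDIENCES = {
    "https://mcp.finance.internal": {"status": "verified"},
    "https://mcp.staging.internal": {"status": "pending_verification"},
}


def audience_allowed(catalog_record: dict) -> bool:
    """Fail closed unless the URL is registered and verified,
    even when the catalog record itself is approved."""
    url = catalog_record.get("mcp_server_url")
    aud = AS_AUDIENCES.get(url)
    return aud is not None and aud["status"] == "verified"
```

Checking `status == "verified"` rather than mere registration is deliberate: a `pending_verification` audience is still not a valid token destination.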

#### AS audience registry spec

The audience registry is owned by the AS team, not the catalog team. This is an intentional separation: the catalog team governs what resources exist; the AS team governs which servers can receive Mission-scoped tokens. A catalog team member with write access to the catalog must not be able to unilaterally register a new token audience.

**Component ownership:** the audience registry is a service operated by the AS team. It is not a field in the resource catalog. The catalog record's `mcp_server_url` is a reference into the registry, not the registry itself.

**Registration API:**

```
POST /audiences
Authorization: Bearer <AS admin token>
```

Request body:

```json
{
  "url": "https://mcp.finance.internal",
  "display_name": "Finance Platform MCP Server",
  "owning_team": "finance-platform",
  "technical_contact": "finance-platform-oncall@example.com",
  "environment": "production",
  "registration_reason": "Initial registration for Finance ERP tool surface"
}
```

Response:

```json
{
  "audience_id": "aud_01JR9S4YDY6QF5Q9Q54M0YB4V1",
  "url": "https://mcp.finance.internal",
  "status": "pending_verification",
  "registered_at": "2025-10-01T10:00:00Z",
  "registered_by": "identity-team-admin"
}
```

Status transitions: `pending_verification → verified` (after AS team confirms the URL is reachable, TLS-valid, and controlled by the owning team) or `pending_verification → rejected`.

**Required fields for registration:**

| Field | Type | Purpose |
|---|---|---|
| `url` | string | exact URL as it will appear in the token `aud` claim |
| `display_name` | string | human-readable name for admin dashboards |
| `owning_team` | string | team responsible for the server |
| `technical_contact` | string | email or on-call handle for incidents |
| `environment` | enum: `production`, `staging`, `development` | staging/development URLs must not be accepted for production catalog records |
| `registration_reason` | string | why this server needs to receive Mission-scoped tokens |

**Verification process (what the AS team does before `verified`):**

1. Confirm the URL is reachable over HTTPS with a valid certificate matching the domain
2. Confirm the owning team has signing authority over the domain (DNS or certificate ownership check)
3. Confirm the environment matches (production URL must not resolve to a staging host)
4. Record verification evidence (checked by, checked at, evidence type)

Automated verification can handle steps 1 and 3. Steps 2 and 4 require manual confirmation for the initial registration. URL changes always require re-verification.

**Catalog approval gate:**

The catalog approval workflow must call `GET /audiences?url={mcp_server_url}` before approving any catalog record. If the URL is not present with `status: verified`, the catalog record must remain in `pending_review`. The catalog approval UI should surface this as a blocking check: "mcp_server_url not registered in audience registry — contact the AS team."

**Change control for URL changes:**

When a catalog record's `mcp_server_url` changes:

1. The catalog record returns to `pending_review`
2. The new URL must be registered in the audience registry and reach `verified` status before the catalog record can be re-approved
3. The old URL registration is not deactivated automatically — it remains `verified` until the AS team explicitly deactivates it (there may be other catalog records using the same URL)
4. Token requests using the old URL continue to succeed until the catalog record is re-approved with the new URL

**What the registry prevents:**

An attacker with catalog write access cannot redirect tokens to a server they control by changing `mcp_server_url` in the catalog — the change triggers re-review, and the new URL must pass AS verification before it can receive tokens. The catalog record is stuck in `pending_review` until the registry entry is verified, which requires AS team involvement.

**List endpoint:**

```
GET /audiences
GET /audiences?url={url}        (exact URL match)
GET /audiences?owning_team={team}
```

Returns all audience registrations visible to the caller's tenant. AS admins see all; catalog owners see records matching their team's owned catalog entries.

Minimum `GET /catalog/version` response:

```json
{
  "version": "2026-04-11T10:00:00Z",
  "version_hash": "sha256-catalog-abc",
  "record_count": 142
}
```

The compiler should pin the catalog version used for each compilation run and record it alongside the `constraints_hash`. If the catalog version used changes, the same proposal may compile differently.

Required behavior:

- `GET /catalog/resources/{id}` must accept both canonical `resource_id` and any registered alias
- catalog records with `status: pending_review` must not be used by the compiler
- catalog changes to `trust_domain`, `commit_boundary`, or `data_sensitivity` must require a new security review before `status` returns to `approved`
- the compiler must record the `catalog_version` used in each compilation output

#### Resource resolution algorithm

Resolution should be deterministic.

Use this order:

1. exact canonical `resource_id` match
2. exact alias match
3. product-local alias namespace match
4. fail closed

Required behavior:

- fuzzy matching must not be used for executable resources
- if two aliases resolve to different canonical resources, compilation fails
- if a requested tool name maps to a resource with no declared action class for the requested action, compilation fails
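The resolution order and failure rules above can be sketched as one deterministic function. The catalog shapes (a canonical id set, a global alias map, a product-local alias map) are assumptions for illustration.

```python
class ResolutionError(Exception):
    """Raised when resolution fails; compilation must fail closed."""

def resolve(name: str, canonical: set, aliases: dict, product_aliases: dict) -> str:
    if name in canonical:                                  # 1. exact canonical match
        return name
    in_global = name in aliases
    in_product = name in product_aliases
    # Two aliases resolving to different canonical resources fail compilation.
    if in_global and in_product and aliases[name] != product_aliases[name]:
        raise ResolutionError(f"conflicting aliases for: {name}")
    if in_global:                                          # 2. exact alias match
        return aliases[name]
    if in_product:                                         # 3. product-local namespace
        return product_aliases[name]
    raise ResolutionError(f"unresolved resource: {name}")  # 4. fail closed, no fuzzy match
```

There is deliberately no fuzzy-match branch: an unrecognized name is an error, never a best-effort guess.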

#### Step 4: compute the baseline authority envelope

Build the narrowest authority envelope implied by the request before policy broadens or narrows it.

For a board-packet Mission, that might be:

```json
{
  "purpose_class": "board_packet_preparation",
  "resource_classes": ["finance.read", "documents.read", "documents.write"],
  "action_classes": ["read", "summarize", "draft"],
  "candidate_tools": ["erp.read_financials", "docs.read", "docs.write"],
  "candidate_commit_boundaries": ["docs.publish", "email.send_external"],
  "candidate_domains": ["enterprise"]
}
```

This object should still be thought of as a candidate envelope, not yet approved authority.

#### Step 5: apply organizational constraints

Now apply policy templates and environment rules.

Examples:

- deny by default:
  - `hr.read`
  - `treasury.transfer`
  - `email.send_external`
- require stage gate:
  - `external_send`
  - `final_publish`
- reduce maximum time:
  - request asked for seven days
  - org policy caps this purpose class at eight hours
- reduce delegation:
  - request asked for sub-agents
  - org policy allows max depth `1`

This is the first point where organizational policy changes the candidate envelope.

#### Step 6: score for approval mode

The compiler should now produce an **approval classification**.

A practical scoring shape is:

- `risk_level`
  - low
  - medium
  - high
- `auto_approval_eligible`
  - true | false
- `required_human_approvals`
  - list of approval types
- `required_open_questions`
  - unresolved ambiguities that block activation

For example:

```json
{
  "risk_level": "medium",
  "auto_approval_eligible": true,
  "required_human_approvals": [],
  "required_open_questions": []
}
```

or:

```json
{
  "risk_level": "high",
  "auto_approval_eligible": false,
  "required_human_approvals": ["controller_approval"],
  "required_open_questions": ["Which external recipients are in scope?"]
}
```

This classification should be explainable. Do not emit only a boolean.

### Approval decision function

Approval mode should be computed by a deterministic function, not by intuition.

Use this decision order:

1. **hard deny check**
   If any hard policy deny matches, result is `denied`.
2. **clarification check**
   If required fields or open questions remain unresolved, result is `clarification_required`.
3. **hard disqualifier check**
   If the request contains any non-auto-approvable element, result is `human_step_up`.
4. **risk tier check**
   If no hard disqualifier applies, map the Mission's assigned risk tier to its default approval mode.
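The decision order can be sketched as a single ordered function. The input shapes are assumptions; `tier` comes from the risk tier model defined later in this note, and `template_match` stands in for the MEDIUM-tier template check.

```python
def approval_mode(hard_denies: list, open_questions: list,
                  disqualifiers: list, tier: str,
                  template_match: bool = False) -> str:
    """Apply the checks in strict order; an earlier check always wins."""
    if hard_denies or tier == "BLOCKED":
        return "denied"                      # 1. hard deny overrides everything
    if open_questions:
        return "clarification_required"      # 2. unresolved ambiguity blocks
    if disqualifiers:
        return "human_step_up"               # 3. non-auto-approvable element
    if tier == "LOW" or (tier == "MEDIUM" and template_match):
        return "auto"                        # 4. tier-based default mode
    return "human_step_up"
```

Because the checks short-circuit in order, a hard deny can never be outvoted by a low tier, which is the property the required-behavior list below demands.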

Minimum risk factors (qualitative signal categories):

| Factor | Example | Effect |
|---|---|---|
| external domain | new partner API | increase risk |
| sensitive data | HR, payroll, legal | increase risk |
| destructive action | delete, publish, pay | increase risk |
| privileged action | admin or infra change | increase risk |
| delegation depth | child or grandchild agent request | increase risk |
| long duration | authority requested beyond template default | increase risk |

This table names the signal categories. The tier model below maps each category to a risk tier, and the highest-matching tier determines the default approval mode.

Required behavior:

- hard denies override every other rule
- hard disqualifiers override the tier assignment
- the risk level must be explainable in the review packet
- approval mode must be persisted with the evidence record

#### Approval risk tier model

Assign each Mission to a risk tier based on the signals present. The tier determines the default approval mode. This replaces numeric scoring, which creates false precision without empirical calibration data.

**Tier assignment — use the highest-matching tier:**

| Tier | Assigned when the Mission contains any of... | Default mode |
|---|---|---|
| **LOW** | enterprise-internal tools only; no commit-boundary actions; no external domains; data sensitivity ≤ `internal` | `auto` |
| **MEDIUM** | write actions; known approved partner domain; delegation depth = 1; data sensitivity = `restricted`; duration above template default | `auto` if template matches and no hard disqualifier; else `human_step_up` |
| **HIGH** | external send, publish, or payment; new or unclassified external domain; destructive action; sensitive data (HR, payroll, legal hold); privileged or admin action | `human_step_up` |
| **BLOCKED** | hard deny trigger; no valid approver route; open question unresolved after limit | `denied` |

Assignment is worst-case: if any signal maps to HIGH, the Mission is HIGH regardless of other signals. BLOCKED always overrides.

**Approval TTL defaults by tier:**

| Tier | Approval TTL |
|---|---|
| LOW | no approval object needed; auto-activation |
| MEDIUM step-up | 4 hours |
| HIGH publish/send | 1 hour |
| HIGH payment | 15 minutes |

**Hard deny default set** (always BLOCKED regardless of tier):

- `treasury.transfer`
- `admin_change` on production systems
- `delete` on regulated data
- new external domain with destructive scope

Hard disqualifiers bypass tier assignment entirely.

This tier model encodes the same logic as a scoring table but without fabricating numeric precision. The tiers can be calibrated by observing whether MEDIUM Missions are actually being escalated appropriately vs. auto-approving — a much simpler calibration task than tuning numeric thresholds.
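Worst-case tier assignment can be sketched as set intersections over the signals present. The signal flag names here are illustrative, not a fixed schema.

```python
# Signal sets per tier; BLOCKED always overrides, then HIGH, then MEDIUM.
BLOCKED_SIGNALS = {"hard_deny", "no_approver_route", "clarification_limit_exceeded"}
HIGH_SIGNALS = {"external_send", "publish", "payment", "new_external_domain",
                "destructive_action", "sensitive_data", "privileged_action"}
MEDIUM_SIGNALS = {"write_action", "approved_partner_domain", "delegation_depth_1",
                  "restricted_data", "duration_above_default"}

def assign_tier(signals: set) -> str:
    """Worst-case assignment: any higher-tier signal wins."""
    if signals & BLOCKED_SIGNALS:
        return "BLOCKED"
    if signals & HIGH_SIGNALS:
        return "HIGH"
    if signals & MEDIUM_SIGNALS:
        return "MEDIUM"
    return "LOW"
```

A Mission with both a write action and a publish action lands in HIGH, which is exactly the worst-case rule stated above.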

#### Approval work-routing rules

Approval also needs explicit routing.

Minimum routing contract:

| Condition | Approver type |
|---|---|
| external publication | business owner or controller |
| payment or funds movement | finance approver |
| admin or infrastructure change | platform approver |
| partner-domain access without standing template | domain owner or security approver |
| high-sensitivity regulated data | data owner |

If no valid approver route exists, result is `denied`.

#### Default approver groups

If the deployment does not already have named approver groups, start with:

| Approval type | Default approver group |
|---|---|
| `controller_approval` | finance controller or delegated finance approver |
| `data_owner_approval` | owner of the dataset or business system |
| `security_approval` | security operations or platform security |
| `platform_approval` | platform engineering owner for admin or infra changes |
| `external_send_approval` | business owner for the target audience |
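The routing contract and default groups above reduce to a lookup that fails closed. The condition keys and the condition-to-group mapping below are illustrative assumptions, not a normative schema.

```python
# Illustrative mapping from escalation condition to default approver group.
ROUTES = {
    "external_publication": "external_send_approval",
    "payment": "controller_approval",
    "admin_change": "platform_approval",
    "partner_domain_without_template": "security_approval",
    "regulated_data_access": "data_owner_approval",
}

def route_approvals(conditions: list) -> list:
    """Map each triggering condition to an approver group; fail closed."""
    approvals = []
    for condition in conditions:
        if condition not in ROUTES:
            # No valid approver route exists: the Mission result is `denied`.
            raise LookupError(f"no approver route for: {condition}")
        approvals.append(ROUTES[condition])
    return approvals
```

The important property is the `LookupError` branch: an unmapped condition is a denial, never a silent skip.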

#### Step 7: build the review packet

Before approval, build the object a human or policy engine will actually review.

That packet should include:

- Mission summary
- purpose class
- requested tools and resource classes
- denied items
- stage-gated items
- trust domains involved
- delegation request
- time bounds
- open questions
- risk explanation
- recommended approval path

For example:

```json
{
  "mission_id": "mis_01JR9S4YDY6QF5Q9Q54M0YB4V1",
  "summary": "Prepare Q2 board packet comparing actuals to plan",
  "purpose_class": "board_packet_preparation",
  "allowed_tools": ["erp.read_financials", "docs.read", "docs.write"],
  "denied_tools": ["email.send_external"],
  "stage_gated_tools": ["docs.publish"],
  "trust_domains": ["enterprise"],
  "delegation_bounds": {"subagents_allowed": false, "max_depth": 0},
  "time_bounds": {"expires_at": "2026-04-11T23:59:59Z"},
  "risk_level": "medium",
  "recommended_path": "auto_with_release_gate"
}
```

This packet is the bridge between model-shaped intent and enforceable authority.

The review packet must also include a **purpose class confirmation surface**. Before the packet is submitted for approval — auto or human — the compiler must check that the classified purpose class is consistent with the user's original summary. The check is not a model call; it is a structural check:

- every approved tool must appear in the matched template's tool set or an explicit delta justification must be present
- every denied tool must be traceable to a hard deny in the template or to an explicit scope exclusion
- if the user's summary contains signals that the template hard-denies (e.g., summary says "send to our auditors" and template denies `send_external`), the compiler must emit an `open_question` asking the user to confirm their intent, not silently proceed on the mismatch

This does not catch all misclassifications — a confident misclassification that fits the template will still pass. But it catches the case where the classified template is structurally incompatible with the user's stated intent and prevents auto-approval from proceeding on an intent that the approved tools can never satisfy.
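The three structural checks can be sketched as a function that returns blocking findings. The packet and template field names (`delta_justified_tools`, `scope_exclusions`, `summary_signals`) are assumptions invented for illustration.

```python
def confirm_purpose_class(packet: dict, template: dict) -> dict:
    """Structural consistency check; any finding blocks auto-approval."""
    findings = {"errors": [], "open_questions": []}
    tools = set(template["tools"])
    deltas = set(packet.get("delta_justified_tools", []))
    # Rule 1: every approved tool traces to the template or an explicit delta.
    for tool in packet["allowed_tools"]:
        if tool not in tools and tool not in deltas:
            findings["errors"].append(f"tool not in template or delta: {tool}")
    denies = set(template.get("hard_denies", []))
    exclusions = set(packet.get("scope_exclusions", []))
    # Rule 2: every denial traces to a hard deny or a scope exclusion.
    for tool in packet["denied_tools"]:
        if tool not in denies and tool not in exclusions:
            findings["errors"].append(f"denial has no traceable basis: {tool}")
    # Rule 3: summary signals that hit a hard deny become open questions.
    for signal in packet.get("summary_signals", []):
        if signal in denies:
            findings["open_questions"].append(
                f"summary implies hard-denied action {signal}: confirm intent")
    return findings
```

A non-empty `open_questions` list routes the Mission to clarification rather than silently proceeding on the mismatch.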

#### Step 8: compile the enforcement bundle

Only after the review packet exists should the compiler emit enforceable artifacts:

- Cedar entities
- Cedar context template
- approved tools
- stage constraints
- domain list
- delegation bounds
- host hints
- token projection templates

#### Step 9: compute `constraints_hash`

Hash the compiled enforcement state, not the raw prompt and not the review UI payload.

The hash should cover:

- allowed tools
- allowed resource classes
- action classes
- stage constraints
- delegation bounds
- time bounds
- trust domains
- approval requirements

**Computation specification:**

1. Serialize the above fields as a single JSON object with keys sorted lexicographically at every nesting level.
2. Serialize arrays in their stable canonical order (tools and resource classes sorted alphabetically, stage constraints sorted by name).
3. Compute SHA-256 over the UTF-8 bytes of the serialized string.
4. Encode as `sha256-<hex>`.

The determinism requirement is strict: the same compiled Mission state must produce the same `constraints_hash` on every call and on every node. Any field that is excluded from the hash input is not part of the enforceable state and cannot be used as a versioning signal.

That makes `constraints_hash` a version handle for what is actually enforceable.
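The computation specification above can be sketched directly. The record shape is an assumption drawn from the hash-input list; `json.dumps(..., sort_keys=True)` sorts keys at every nesting level, which is the canonicalization the spec requires.

```python
import hashlib
import json

def constraints_hash(state: dict) -> str:
    """Canonicalize the enforceable fields, then SHA-256 the UTF-8 bytes."""
    canonical = {
        "allowed_tools": sorted(state["allowed_tools"]),
        "resource_classes": sorted(state["resource_classes"]),
        "action_classes": sorted(state["action_classes"]),
        # Stage constraints are ordered by name per the spec.
        "stage_constraints": sorted(state["stage_constraints"],
                                    key=lambda s: s["name"]),
        "delegation_bounds": state["delegation_bounds"],
        "time_bounds": state["time_bounds"],
        "trust_domains": sorted(state["trust_domains"]),
        "approval_requirements": sorted(state["approval_requirements"]),
    }
    blob = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return "sha256-" + hashlib.sha256(blob.encode("utf-8")).hexdigest()
```

Because the input is fully canonicalized before hashing, the same compiled state produces the same hash on every node, which is the determinism requirement above.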

#### Step 9b: validate compiler output against approval criteria

`constraints_hash` proves that the enforcement bundle is self-consistent and reproducible. It does not prove the bundle is correct. Add an independent validation pass after compilation, before persistence:

1. **Scope containment check**: verify that every tool in the compiled `allowed_tools` set appears in the matched template's allowed set or in an explicit approved delta. Any tool that appears in the bundle but not in the template or delta is a compiler error — fail compilation.
2. **Hard deny check**: verify that every tool in the matched template's hard deny list is absent from `allowed_tools`. A compiler bug that fails to apply a hard deny must be caught here.
3. **Risk level consistency check**: verify that the compiled `risk_level` is consistent with the signals present. If the bundle contains any signal that is a hard disqualifier for auto-approval (e.g., `commit_boundary: true` on any tool, external domain, destructive action), the compiled `approval_mode` must not be `auto`.
4. **Stage gate completeness check**: verify that every tool in the bundle with `commit_boundary: true` has a corresponding stage gate entry in `stage_constraints`.

If any check fails, compilation fails. Log the specific check failure as a `compiler.validation_error` event. Do not emit a partial bundle.

This validation pass is the thing that catches a compiler bug before it produces an incorrectly permissive hash. A hash is only as safe as the output it was computed from.
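The four checks can be sketched as one fail-closed validation pass. The bundle and template shapes are assumptions; `commit_boundary_tools` stands in for a catalog lookup of which approved tools carry `commit_boundary: true`.

```python
class ValidationError(Exception):
    """A failed check aborts compilation; no partial bundle is emitted."""

def validate_bundle(bundle: dict, template: dict,
                    approved_deltas: set, commit_boundary_tools: set) -> None:
    allowed = set(bundle["allowed_tools"])
    # 1. Scope containment: every tool traces to the template or a delta.
    extra = allowed - set(template["tools"]) - approved_deltas
    if extra:
        raise ValidationError(f"scope containment failed: {sorted(extra)}")
    # 2. Hard denies: a template hard deny must never reach allowed_tools.
    leaked = allowed & set(template["hard_denies"])
    if leaked:
        raise ValidationError(f"hard deny not applied: {sorted(leaked)}")
    # 3. Approval mode consistent with disqualifying signals.
    if bundle["approval_mode"] == "auto" and bundle.get("disqualifier_signals"):
        raise ValidationError("auto approval with disqualifier present")
    # 4. Stage gate completeness: every commit-boundary tool is gated.
    covered = set()
    for gate in bundle["stage_constraints"]:
        covered |= set(gate["applies_to"])
    missing = commit_boundary_tools - covered
    if missing:
        raise ValidationError(f"missing stage gate: {sorted(missing)}")
```

Each `ValidationError` corresponds to a `compiler.validation_error` event; the caller logs the failure and discards the bundle.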

#### Step 10: persist all three layers

Persist:

1. proposal and review artifacts
2. governance record
3. compiled enforcement bundle

You need all three later:

- proposal for audit and dispute resolution
- governance record for operators and humans
- enforcement bundle for host, AS, and MCP consumers

### Board-packet compilation example

For the board-packet example, the pipeline should do something like this:

1. shape prompt output into a structured proposal
2. map `Prepare a board packet` to `purpose_class = board_packet_preparation`
3. resolve tools:
   - `erp.read_financials`
   - `docs.read`
   - `docs.write`
4. classify domains:
   - all enterprise-local
5. apply defaults:
   - deny `email.send_external`
   - deny `hr.read`
   - deny `treasury.transfer`
6. apply stage gates:
   - `controller_approval` required for `docs.publish`
7. cap delegation:
   - `subagents_allowed = false`
   - `max_depth = 0`
8. emit:
   - one governance record
   - one Cedar policy bundle
   - one host hint bundle
   - one token projection template for finance MCP
   - one token projection template for docs MCP

That compiler is where local policy and Mission shaping meet. Do not skip it.

### Cedar Policy Bundle Distribution

The compiler produces a Cedar policy bundle. That bundle must reach the AS, MCP servers, and agent host before they can evaluate Cedar locally. The distribution model is pull-on-invalidation:

1. Each consumer caches the current bundle keyed by `mission_id` and `constraints_hash`.
2. When a consumer receives a `constraints_hash` it does not recognize — from a token claim, a capability snapshot response, or a signal — it fetches the new bundle from the MAS.
3. The MAS exposes a bundle fetch endpoint:

| Method | Path | Purpose |
|---|---|---|
| `GET` | `/missions/{mission_id}/policy-bundle?hash={constraints_hash}` | fetch compiled Cedar bundle for a specific Mission version |

Minimum response:

```json
{
  "mission_id": "mis_01JR9S4YDY6QF5Q9Q54M0YB4V1",
  "constraints_hash": "sha256-abc123",
  "cedar_entities": [...],
  "cedar_policies": "permit(...) when {...};\nforbid(...) unless {...};",
  "valid_until": "2026-04-11T20:00:00Z"
}
```

Required behavior:

- consumers must not use a bundle whose `constraints_hash` differs from the hash on the current token or planning response
- the MAS must reject bundle fetch requests for revoked Missions
- bundle fetch must be authenticated; the consumer must present its own service identity
- `valid_until` is a hint for cache TTL, not a security boundary; commit-boundary checks must always fetch fresh state
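A consumer-side cache for this pull-on-invalidation model can be sketched as follows. `fetch_bundle` is a stand-in for the authenticated `GET /missions/{mission_id}/policy-bundle` call; the cache key is exactly the `(mission_id, constraints_hash)` pair described above.

```python
class BundleCache:
    """Cache keyed by (mission_id, constraints_hash); unknown hashes trigger a pull."""

    def __init__(self, fetch_bundle):
        self._fetch = fetch_bundle          # authenticated MAS bundle fetch
        self._cache = {}

    def get(self, mission_id: str, constraints_hash: str) -> dict:
        key = (mission_id, constraints_hash)
        if key not in self._cache:
            # Unrecognized hash (from a token claim or signal): pull fresh.
            bundle = self._fetch(mission_id, constraints_hash)
            if bundle["constraints_hash"] != constraints_hash:
                # Never use a bundle whose hash differs from the current token.
                raise RuntimeError("bundle hash does not match requested hash")
            self._cache[key] = bundle
        return self._cache[key]
```

Because the hash is part of the key, a `constraints_hash` change is automatically a cache miss; no explicit invalidation message is required for correctness.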

**Cold-start bootstrap:** every new Mission creates a guaranteed cache miss at every enforcement point. To avoid a pile of synchronous bundle fetch calls on the first request, MAS should push a lightweight activation hint to registered enforcement points when a Mission moves to `active`. The hint is not the full bundle — it is just a notification that a new Mission is live, so enforcement points can prefetch in the background:

```json
{
  "event": "mission.activated",
  "mission_id": "mis_01JR9S4YDY6QF5Q9Q54M0YB4V1",
  "constraints_hash": "sha256-abc123",
  "template_class": "board_packet_v1",
  "prefetch_url": "/missions/mis_01JR9S4YDY6QF5Q9Q54M0YB4V1/policy-bundle"
}
```

Enforcement points that receive this signal before the first request arrives can prefetch the bundle and warm the cache before any tool call comes in. Enforcement points that don't receive the signal (or are newly brought online) still work correctly — they take the cold-start cache miss and pull on first request. The push is an optimization, not a requirement for correctness.

**Propagation lag tolerance:** consumers may serve requests from a cached bundle for up to the host policy cache TTL (30-120 seconds) after a `constraints_hash` change, for low-risk reads only. Token issuance and commit-boundary actions must fetch the current bundle before proceeding. This means two MCP server nodes may briefly diverge on policy for low-risk reads — that is acceptable. They must not diverge on destructive or gated actions.

**Entity snapshot TTL:** entity snapshots are cached separately from tokens. A token has its own `expires_in`. An entity snapshot must have its own independent TTL that is not derived from the token lifetime. Required values:

| Cache type | Maximum TTL | On `constraints_hash` mismatch |
|---|---|---|
| Entity snapshot (low-risk reads) | 120 seconds | pull fresh snapshot immediately |
| Entity snapshot (token issuance) | 0 seconds (always fresh) | required fresh pull |
| Entity snapshot (commit-boundary) | 0 seconds (always fresh) | required fresh pull |

A token with a 15-minute lifetime does not grant a 15-minute entity snapshot validity. The entity snapshot expires independently at 120 seconds. An enforcement point that holds a valid token against a 121-second-old snapshot must pull a fresh snapshot before evaluating, even if the token is valid. This is the only way a narrowing amendment becomes effective at enforcement points before the token expires.
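The TTL table above can be sketched as a freshness predicate that an enforcement point evaluates before using a cached snapshot. Request-class names mirror the table; timestamps are Unix seconds.

```python
import time

SNAPSHOT_TTL_S = {
    "low_risk_read": 120,   # may serve slightly stale entity state
    "token_issuance": 0,    # always requires a fresh pull
    "commit_boundary": 0,   # always requires a fresh pull
}

def snapshot_usable(fetched_at: float, request_class: str,
                    snapshot_hash: str, current_hash: str, now=None) -> bool:
    """True only if the cached entity snapshot may be evaluated against."""
    if snapshot_hash != current_hash:
        return False  # constraints_hash mismatch: pull immediately
    now = time.time() if now is None else now
    return (now - fetched_at) <= SNAPSHOT_TTL_S[request_class]
```

Note the predicate never consults the token lifetime: a valid token against a 121-second-old snapshot still returns `False` for low-risk reads, forcing the fresh pull the spec requires.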

### What the MAS Stores

After shaping, the MAS should store two related objects:

1. **Approved Mission**
2. **Mission Authority Model**

The approved Mission is the governance record. The Mission Authority Model is the thing enforcement systems can actually consume.

Start with a narrow schema.

```json
{
  "mission_id": "mis_01JR9S4YDY6QF5Q9Q54M0YB4V1",
  "principal": {
    "user_id": "user_123",
    "agent_id": "agent_research_assistant"
  },
  "purpose": "Prepare a board packet comparing Q2 actuals against plan.",
  "resource_classes": ["finance.read", "documents.read", "documents.write"],
  "actions": ["read", "summarize", "draft"],
  "approved_tools": ["erp.read_financials", "docs.read", "docs.write"],
  "stage_constraints": [
    {
      "name": "release_gate",
      "required_approval": "controller_approval",
      "applies_to": ["external_send", "final_publish"]
    }
  ],
  "delegation_bounds": {
    "subagents_allowed": false,
    "max_depth": 0,
    "inherit_by_default": false
  },
  "time_bounds": {
    "expires_at": "2026-04-11T23:59:59Z"
  },
  "status": "active",
  "constraints_hash": "sha256-abc123"
}
```

This is enough to be useful. It does not need to be a universal policy language.

### Mission Approval Paths

The MAS should not do "approval" as a single yes/no. It should run a fixed approval procedure and record every decision point.

### Approval contract

**Required inputs**
- compiled review packet
- approval templates
- hard disqualifier rules
- current authority state
- approver routing rules

**Processing**
- check clarification blockers
- match template
- evaluate hard disqualifiers
- choose approval mode
- activate or create review work item
- record approval basis

**Required outputs**
- approval mode
- Mission status
- approval work item or active Mission
- approval evidence
- current `constraints_hash`

**Failure behavior**
- unresolved open question -> `pending_clarification`
- hard deny -> `denied`
- no valid approver route -> `denied`
- expired review item -> remain non-active

**Acceptance checks**
- auto-approved Missions always identify approving policy
- step-up Missions always identify approver type and reason
- no gated token issuance occurs while status is non-active
- approval evidence references the exact Mission version approved

The minimum useful paths are:

1. **organizational auto-approval**
2. **human step-up approval**

| Approval Mode | Trigger Shape | Mission Status | Token Issuance | Human In Loop |
|---|---|---|---|---|
| `auto` | request fits approved template and has no hard disqualifiers | `active` | allowed immediately | no |
| `human_step_up` | request crosses policy threshold but may still be allowed | `pending_approval` | blocked for gated actions | yes |
| `clarification_required` | request is ambiguous or incomplete | `pending_clarification` | blocked | maybe later |
| `denied` | request violates hard policy or lacks valid path | `denied` | blocked | optional review |

### Step-by-step approval procedure

Use an ordered approval flow like this.

#### Step 1: check for blocking open questions

Before approval, the MAS should ask:

- are there unresolved open questions from shaping?
- are required fields missing?
- is the target domain ambiguous?
- is the external recipient set unknown?

If yes:

- status = `pending_clarification`
- no Mission-scoped token issuance
- host remains in restricted planning mode

Do not auto-approve an ambiguous Mission.

**`pending_clarification` exit rules:**

A Mission must not stay in `pending_clarification` indefinitely. Required constraints:

| Rule | Value |
|---|---|
| Maximum clarification rounds | 3 |
| Clarification timeout | 30 minutes from last user interaction |
| Maximum `pending_clarification` lifetime | 24 hours from initial submission |

After the maximum rounds or timeout, the Mission transitions to `denied` automatically. The MAS emits `mission.denied` with `reason: clarification_timeout` or `reason: clarification_rounds_exceeded`.

**Who can close an open question:** only the authenticated user who submitted the Mission (matched by `user_id` from the original `POST /missions` request context). The shaper model cannot close open questions — it can suggest answers in a clarification prompt, but the answers must come from the user via `POST /missions/{id}/clarify`. This is the adversarial-model-safe rule: if a prompt injection causes the shaper to add spurious open questions, only the real user can resolve them, and the Mission expires if they don't.

**Open question count limit:** if the shaper emits more than 5 open questions on a single Mission proposal, the MAS must treat this as a classification failure and return `status: denied` with `reason: excessive_ambiguity` rather than entering `pending_clarification`. A proposal with that many unresolved questions is not a Mission candidate — it is an unformed request.
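The exit rules and the open-question limit combine into a single outcome function, sketched below with timestamps in Unix seconds. The denial reason strings follow the `mission.denied` reasons above.

```python
MAX_ROUNDS = 3
ROUND_TIMEOUT_S = 30 * 60        # 30 minutes since last user interaction
MAX_LIFETIME_S = 24 * 60 * 60    # 24 hours since initial submission
MAX_OPEN_QUESTIONS = 5

def clarification_outcome(rounds: int, last_interaction_at: float,
                          submitted_at: float, open_question_count: int,
                          now: float):
    """Return a denial reason, or None if the Mission may stay pending."""
    if open_question_count > MAX_OPEN_QUESTIONS:
        return "denied:excessive_ambiguity"   # never enters pending state
    if rounds > MAX_ROUNDS:
        return "denied:clarification_rounds_exceeded"
    if now - submitted_at > MAX_LIFETIME_S:
        return "denied:clarification_timeout"
    if now - last_interaction_at > ROUND_TIMEOUT_S:
        return "denied:clarification_timeout"
    return None
```

Checking the question count first matches the rule above: an over-ambiguous proposal is denied outright instead of ever entering `pending_clarification`.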

#### Step 2: match against an approval template

Take the compiled review packet and try to match it against an organizational template.

A template might say:

- purpose class = `board_packet_preparation`
- domains = `enterprise` only
- actions subset of `read`, `summarize`, `draft`
- tools subset of `erp.read_financials`, `docs.read`, `docs.write`
- no destructive action
- no external communication
- max runtime = 8 hours

If all of those match, the Mission can stay on the auto-approval path.

#### Step 3: evaluate hard disqualifiers

Even if a template mostly matches, the MAS should immediately disqualify auto-approval for:

- external publication
- new partner domain
- payment or funds movement
- destructive deletion
- admin or privileged infrastructure change
- high-sensitivity data access outside standing policy
- delegation depth beyond policy maximum

This avoids "mostly low-risk" requests sneaking through as auto-approved.

#### Step 4: determine approval mode

At this point the MAS should emit one of:

- `approval_mode = auto`
- `approval_mode = human_step_up`
- `approval_mode = clarification_required`
- `approval_mode = denied`

This should be an explicit field in the governance record, not an inferred state.

#### Step 5: if auto-approved, activate immediately

For auto-approved Missions, the MAS should persist:

```json
{
  "mission_id": "mis_01JR9S4YDY6QF5Q9Q54M0YB4V1",
  "approval_mode": "auto",
  "approved_by": "org_policy:board_packet_low_risk_v3",
  "approved_at": "2026-04-11T10:35:00Z",
  "status": "active",
  "constraints_hash": "sha256-abc123"
}
```

Then it should emit:

- active Mission record
- policy bundle
- host hint bundle
- token projection templates

This should be common for low-risk read, summarize, and draft work inside known enterprise boundaries.

#### Step 6: if human approval is required, create an approval work item

For step-up cases, the MAS should create a review object for the human approver.

That object should include:

- Mission summary
- exact tools and resource classes requested
- denied items
- stage-gated items
- trust domains involved
- delegation requested
- time bounds
- why auto-approval failed
- recommended approve / deny decision

For example:

```json
{
  "review_id": "rev_01JR9S9D1H",
  "mission_id": "mis_01JR9S4YDY6QF5Q9Q54M0YB4V1",
  "approval_mode": "human_step_up",
  "required_approval": "controller_approval",
  "reason": "external publication requested",
  "allowed_tools": ["erp.read_financials", "docs.read", "docs.write"],
  "gated_tools": ["docs.publish"],
  "denied_tools": ["email.send_external"],
  "risk_level": "high"
}
```

#### Step 7: move Mission into a controlled pending state

When step-up is required, the MAS should persist:

```json
{
  "mission_id": "mis_01JR9S4YDY6QF5Q9Q54M0YB4V1",
  "approval_mode": "human_step_up",
  "status": "pending_approval",
  "required_approval": "controller_approval",
  "reason": "external publication requested"
}
```

While pending:

- the host should block tool calls outside a small safe set
- the AS should refuse Mission-scoped token issuance for gated actions
- the MCP layer should treat the Mission as non-active
- sub-agent derivation should be blocked unless separately approved

#### Step 8: on human decision, emit a state transition

If the human approves:

- attach approval object
- update status to `active`
- if approval changes enforceable state, emit a new `constraints_hash`
- invalidate host, AS, and MCP caches

If the human denies:

- status = `denied`
- no Mission-scoped token issuance
- host exits restricted planning mode and surfaces denial

#### Step 9: record approval basis

Whether approval was automatic or human, the MAS should always record:

- who or what approved it
- which policy or review object was used
- exact approval timestamp
- any stage gates that still remain
- the `constraints_hash` that was approved

That is the audit basis for every downstream action.

### Auto-approval example

This should be common for low-risk work:

- approved tool families only
- approved data classes only
- no external communication
- no destructive action
- no privileged admin action
- no cross-domain access outside an approved partner set
- time bound within policy maximum

Example result:

```json
{
  "approval_mode": "auto",
  "approved_by": "org_policy:board_packet_low_risk_v3",
  "mission_id": "mis_01JR9S4YDY6QF5Q9Q54M0YB4V1",
  "status": "active",
  "constraints_hash": "sha256-abc123"
}
```

### Human step-up example

Common escalation triggers are:

- external communication
- publication
- payment
- deletion
- admin or infrastructure change
- high-sensitivity data classes
- cross-domain access to a domain without standing policy
- sub-agent delegation beyond the normal bound

Example result:

```json
{
  "approval_mode": "human_step_up",
  "mission_id": "mis_01JR9S4YDY6QF5Q9Q54M0YB4V1",
  "status": "pending_approval",
  "required_approval": "controller_approval",
  "reason": "external publication requested"
}
```

Once a human approves, the MAS emits a new state and often a new `constraints_hash`. That should invalidate caches and force fresh evaluation before work resumes.
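As a minimal sketch of that invalidation step, with hypothetical names (`on_mission_updated`, a cache client with an `invalidate` method — none of these are defined elsewhere in this spec):

```python
def on_mission_updated(prev, curr, caches):
    """Invalidate downstream caches whenever enforceable state changes.

    `caches` stands in for the host, AS, and MCP cache clients
    (hypothetical interface with an invalidate(mission_id) method).
    """
    if curr["constraints_hash"] != prev["constraints_hash"]:
        for cache in caches:
            cache.invalidate(curr["mission_id"])
        return True  # downstream must re-evaluate before work resumes
    return False
```

The point of keying invalidation on `constraints_hash` rather than on every status change is that only changes to enforceable state force fresh evaluation.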

### Honest model of approval timing

The `pending_approval` state sounds like the agent waits in the background while a reviewer receives a notification, deliberates, and returns an answer. That is not a realistic operational model for most deployments.

In practice, approval works in one of two ways:

**Pre-approved patterns (the common case).** The approval is not an event — it is a pre-existing organizational decision. The template was reviewed and approved by a policy owner. The criteria are known and fixed. Auto-approval is just the runtime confirmation that this specific request fits the pattern. The user does not wait; execution starts immediately.

**Explicit user-controlled pause (the realistic step-up case).** When a request crosses a threshold that requires human judgment, the agent stops and surfaces the situation to the user in the current session. The user is the approver. They review the review packet inline and confirm or deny before execution continues. This is not a background notification flow — the agent is blocked in the user's session until the user responds.

What does not work in practice: creating a pending state that blocks execution while waiting for an asynchronous approver who may not respond for hours or days, in a context where the model session may have expired. Do not build systems that assume an available approver who responds within a predictable SLA unless that approver is a webhook into an existing ticketing or compliance system with known routing and response behavior.

If you need asynchronous approval, that is a first-class architectural decision: the Mission is submitted, the session ends, and a new session begins after approval. The implementation must handle session resumption from approved Mission state rather than trying to keep a pending agent session alive indefinitely.

### `auto` vs. `auto_with_release_gate`: side-by-side comparison

These two approval modes are easily confused. Both allow the Mission to activate immediately without human step-up. The difference is in which tools become available immediately.

| Property | `auto` | `auto_with_release_gate` |
|---|---|---|
| Mission status after compilation | `active` immediately | `active` immediately |
| Token issuance | all approved tools get tokens | tokens issued for all tools, including gated tools |
| Read/draft tools available | immediately | immediately |
| Gated (commit) tools available | immediately — no stage gate required | only after the stage gate condition is satisfied |
| Stage gate | none defined | one or more stage gates defined in `stage_constraints` |
| Gate satisfaction | n/a | user step-up approval or external approver response |
| Who can satisfy the gate | n/a | user (inline) or external approver (async) per template routing |
| Effect of gate on `constraints_hash` | n/a | `constraints_hash` does not change when gate is satisfied; only `approvals` context changes |

**`auto` — board_packet example:**

The `board_packet_preparation` template with `approval_required: "auto"` and no stage gates. When activated:
- All tools are immediately available: `finance_query`, `docs_editor`, `docs_publish`
- The model can call `docs.publish` without any further approval
- Use this only when the organization explicitly trusts that an auto-approved board packet Mission should publish without a human checkpoint

**`auto_with_release_gate` — board_packet example (the correct default):**

The `board_packet_preparation` template with `approval_required: "auto_with_release_gate"` and a `controller_approval` stage gate on `docs.publish`. When activated:
- Read and draft tools are immediately available: `finance_query`, `docs_editor`
- `docs.publish` is in `gated_tools` — the token is issued, but the commit-boundary check requires `approvals.contains("controller_approval")` before the publish executes
- The model can plan, draft, and research freely; when it reaches the publish step, it surfaces a step-up prompt
- Once the user or Finance Controller approves, `docs.publish` executes

**Why `auto_with_release_gate` is the right default for most internal workflows:**

It allows the agent to do all preparatory work without blocking on approval, while reserving the approval gate for the moment an irreversible side effect is about to happen. This is better UX than `human_step_up` (which requires approval before the agent can do anything) and safer than `auto` (which lets side effects happen without any human checkpoint).
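The commit-boundary check that makes `auto_with_release_gate` work can be sketched as follows. Field names here are hypothetical; the real check lives behind the Cedar evaluation, but the logic is the same:

```python
def may_execute(tool, mission):
    """Commit-boundary check (sketch, hypothetical field names).

    Gated tools have tokens issued, but only execute once the required
    approval is present in the Mission's approvals context.
    """
    if tool not in mission["approved_tools"]:
        return False  # outside the Mission envelope entirely
    gate = mission["gated_tools"].get(tool)  # e.g. "controller_approval"
    if gate is None:
        return True  # read/draft tool: available immediately
    return gate in mission["approvals"]
```

Note that satisfying the gate only adds to `approvals`; it never touches `approved_tools` or `gated_tools`, which is why `constraints_hash` does not change when a gate is satisfied.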

**Starter template pack — correct approval modes:**

| Template | Approval mode | Gated tools |
|---|---|---|
| `board_packet_preparation` | `auto_with_release_gate` | `docs.publish` (requires `controller_approval`) |
| `support_ticket_triage` | `auto` for allowlisted partners; `human_step_up` for new partners | `ticket_create` on new partners |
| `sales_account_research` | `auto` | none — read and draft only, no external send |
| `engineering_release_drafting` | `auto_with_release_gate` | `release_publish` (requires `release_manager_approval`) |
| `vendor_due_diligence` | `auto` | none — read and draft only |

### Compilation and approval sequence

```mermaid
sequenceDiagram
    actor User
    participant Host as Agent Host
    participant Shaper as Mission Shaper
    participant MAS as MAS Compiler
    participant Templates as Policy Templates
    participant Approver as User / Policy Approver

    User->>Host: prompt
    Host->>Shaper: shape request
    Shaper-->>Host: Mission proposal
    Host->>MAS: create Mission(proposal, context)
    MAS->>Templates: match template + defaults
    Templates-->>MAS: matched template + deny set + gates
    MAS->>MAS: normalize, resolve catalog, score risk
    alt auto-approvable
        MAS-->>Host: active Mission + constraints_hash + capability snapshot
    else step-up required
        MAS-->>Approver: review packet
        Approver-->>MAS: approve / deny
        MAS-->>Host: updated status + constraints_hash
    end
```

### Template building

The compiler and approval flow depend on templates. If template building is weak, the whole system becomes ad hoc review plus brittle policy code.

Treat templates as first-class governed artifacts.

For the core profile, keep templates **thin**. A template should define the work-pattern envelope:

- allowed tool families
- denied tool families
- gated actions
- max duration
- whether delegation is allowed
- review route

Do not force templates to encode all backend resource semantics. Keep resource-instance detail in the catalog, backend authorization, and FGA layers. Thin templates age better and are easier to govern.

#### Starter template pack

If the goal is to get a real system running, start with this baseline. It is broad enough to cover the most common internal workflows and narrow enough to be governable without extensive policy work upfront.

| Template | Default scope | Default approval mode |
|---|---|---|
| `board_packet_preparation` | enterprise finance read, document read/write, no external send | auto with release gate |
| `support_ticket_triage` | ticket read/update, internal notes, approved partner ticket create | auto for allowlisted partners, step-up otherwise |
| `sales_account_research` | CRM read, document draft, no outbound send | auto |
| `engineering_release_drafting` | issue tracker read, build metadata read, release note draft | auto with publish gate |
| `vendor_due_diligence` | vendor doc read, internal memo draft, no external send | auto |

That is enough to start without pretending every workflow has to be templated on day one.

### What a template is

A template is not just a Cedar policy snippet.

A useful Mission template bundles:

1. **purpose classification**
   Which user requests this template applies to.
2. **allowed resource classes**
   Which systems and data classes are normally in scope.
3. **allowed action classes**
   Which kinds of actions are normally permitted.
4. **default tool set**
   Which tools or MCP surfaces are the normal execution path.
5. **hard denies**
   Which tools, actions, or domains are never allowed under this template.
6. **stage gates**
   Which actions require approval even if the Mission is auto-approved overall.
7. **delegation bounds**
   Whether sub-agents are allowed and at what depth.
8. **time bounds**
   Maximum lifetime and approval TTL defaults.
9. **cross-domain rules**
   Which partner domains are allowed and under what conditions.
10. **review routing**
    Which human approver type should receive a step-up request.

That means a template is closer to a policy package than to a single rule. In the simplified profile, start with the first six fields and add the rest only when a real deployment need appears.
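For the simplified profile, those first six fields can be sketched as a plain record. The class and field names below are illustrative, not a normative schema:

```python
from dataclasses import dataclass, field

@dataclass
class ThinTemplate:
    """The first six template fields (illustrative names, not normative)."""
    purpose_class: str                    # 1. which requests it applies to
    allowed_resource_classes: list[str]   # 2. systems/data classes in scope
    allowed_action_classes: list[str]     # 3. kinds of actions permitted
    default_tools: list[str]              # 4. normal execution path
    hard_denies: list[str]                # 5. never allowed under this template
    stage_gates: list[dict] = field(default_factory=list)  # 6. checkpoints
```

Everything else in the ten-field list (delegation bounds, time bounds, cross-domain rules, review routing) layers on as optional fields when a deployment needs them.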

### Start from approved work patterns

Do not build templates from abstract IAM theory alone. Build them from common, already-accepted work patterns.

Examples:

- board packet preparation
- sales account research
- support-ticket triage
- engineering release drafting
- vendor due diligence

For each pattern, capture:

- what users actually ask for
- what tools they actually need
- what data classes are normally touched
- where irreversible actions happen
- what should always require human approval

This is the raw material for the template.

### Template authoring procedure

Use a repeatable authoring flow.

#### Step 1: define the purpose class

Give the template a stable purpose identifier:

- `board_packet_preparation`
- `support_ticket_triage`
- `vendor_security_review`

This becomes the anchor for matching, review, and reporting.

#### Step 2: define the normal authority envelope

Specify:

- allowed resource classes
- allowed actions
- default tools
- expected trust domains
- maximum lifetime

For example:

```json
{
  "template_id": "tpl_board_packet_v3",
  "purpose_class": "board_packet_preparation",
  "allowed_resource_classes": [
    "finance.read",
    "documents.read",
    "documents.write"
  ],
  "allowed_action_classes": [
    "read",
    "summarize",
    "draft"
  ],
  "default_tools": [
    "erp.read_financials",
    "docs.read",
    "docs.write"
  ],
  "allowed_domains": ["enterprise"],
  "max_duration": "8h"
}
```

#### Step 3: define the deny set

Every template should name the things it excludes, not just the things it allows.

For example:

```json
{
  "denied_tools": [
    "email.send_external",
    "treasury.transfer",
    "hr.read"
  ],
  "denied_action_classes": [
    "delete",
    "pay",
    "admin_change"
  ]
}
```

This matters because it prevents later broadening by omission.
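The compiler-side resolution rule that makes the deny set authoritative can be sketched in a few lines. Field names are hypothetical, but the semantics match deny-overrides policy evaluation (as in Cedar's `forbid`-beats-`permit` rule):

```python
def resolve_tools(template, requested):
    """Deny-overrides resolution (sketch, hypothetical field names).

    A tool is granted only if it is both in the template's default set
    and in the request; the deny set then removes it unconditionally,
    even if it also appears in the allow side.
    """
    allowed = set(template["default_tools"]) & set(requested)
    return allowed - set(template["denied_tools"])
```

Because the subtraction runs last, a template author who accidentally adds a tool to `default_tools` cannot broaden authority past an existing deny entry.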

#### Step 4: define stage gates

Templates should separate:

- actions that are allowed outright
- actions that are allowed only after a checkpoint

For example:

```json
{
  "stage_gates": [
    {
      "name": "release_gate",
      "approval_type": "controller_approval",
      "applies_to_tools": ["docs.publish"],
      "applies_to_actions": ["publish_external"]
    }
  ]
}
```

This is where auto-approval and human approval coexist in the same template.

#### Step 5: define delegation defaults

For each template, decide:

- whether sub-agents are allowed
- what max depth is allowed
- whether child Missions can cross domains
- whether child Missions can reach commit boundaries

Most templates should be conservative here.

#### Step 6: define cross-domain rules

If the template can cross trust domains, specify:

- which partner domains are allowed
- whether ID-JAG is permitted
- what local scopes are typically requested in the target domain
- which cases still require human approval

For example:

```json
{
  "cross_domain_rules": [
    {
      "domain": "payroll-partner.example",
      "allowed": true,
      "exchange_mode": "id_jag",
      "allowed_actions": ["ticket.create"],
      "requires_human_approval": false
    },
    {
      "domain": "signing.example",
      "allowed": true,
      "exchange_mode": "id_jag",
      "allowed_actions": ["signature.submit"],
      "requires_human_approval": true
    }
  ]
}
```

#### Step 7: attach review routing

When step-up happens, the system should already know where the approval request goes.

Examples:

- `controller_approval`
- `security_approval`
- `manager_approval`
- `vendor_owner_approval`

This should be part of the template, not recomputed ad hoc every time.

#### Starter review routes

Use a small default approver set first:

| Approval type | Default owner |
|---|---|
| `controller_approval` | finance controller |
| `security_approval` | security operations |
| `platform_approval` | platform engineering owner |
| `external_send_approval` | business owner for the target audience |
| `vendor_owner_approval` | vendor or partner integration owner |

Do not introduce many specialized approver types until the first baseline is stable.

#### Recommended initial template defaults

Use these defaults unless a business process requires something broader:

| Field | Default |
|---|---|
| `max_duration` | 8 hours |
| `subagents_allowed` | false |
| `max_depth` | 0 |
| external domains | deny unless explicitly allowlisted |
| destructive actions | deny unless stage-gated |
| publish/send actions | always stage-gated |

#### Session budget fields

Templates carry per-session budget limits. Budgets serve two purposes: (1) they narrow the exposure window for instruction-sequence attacks by bounding total agent activity, and (2) they give operators a knob to prevent runaway agents.

Add these fields to every template definition:

```json
{
  "session_budgets": {
    "max_reads_per_resource_class": {
      "finance.read": 100,
      "hr.read": 50
    },
    "max_external_calls_per_session": 20,
    "max_wall_clock_duration_seconds": 28800
  }
}
```

| Budget field | What it limits | When triggered |
|---|---|---|
| `max_reads_per_resource_class` | calls to tools in a given resource class, counted per session | class counter reaches limit |
| `max_external_calls_per_session` | calls to external/partner domains (anything outside the tenant's trust domain) | running total reaches limit |
| `max_wall_clock_duration_seconds` | elapsed time from Mission activation | wall clock exceeds value |

**Behavior when a budget is reached:** MAS transitions the Mission to `suspended_budget`. The host surfaces: "I've reached the session limit for [resource class / external calls / time]. Confirm to continue." The user resumes with `POST /missions/{id}/resume`, which resets the relevant counter with explicit user acknowledgment logged to the Mission audit trail.

`suspended_budget` is a sub-state of `active`. It does not require a new Mission. The audit record includes which budget triggered the suspension and the counter value at suspension time.
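The budget evaluation itself is simple. A sketch, using the field names from the example above (`counters` keys like `external_calls` and `wall_clock_seconds` are assumed host-side names):

```python
def budget_status(counters, budgets):
    """Evaluate session counters against template budgets (sketch).

    Returns ("suspended_budget", reason) when any limit is reached,
    otherwise ("active", None).
    """
    for cls, limit in budgets["max_reads_per_resource_class"].items():
        if counters.get(cls, 0) >= limit:
            return "suspended_budget", cls
    if counters.get("external_calls", 0) >= budgets["max_external_calls_per_session"]:
        return "suspended_budget", "external_calls"
    if counters.get("wall_clock_seconds", 0) >= budgets["max_wall_clock_duration_seconds"]:
        return "suspended_budget", "wall_clock"
    return "active", None
```

The returned reason is what the audit record and the user-facing suspension message are built from.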

**Capability snapshot interaction:** the capability snapshot response includes current counter values so the host can warn the user before a hard suspension:

```json
{
  "session_budget_status": {
    "finance.read": { "used": 87, "limit": 100, "warning_threshold": 90 },
    "external_calls": { "used": 18, "limit": 20, "warning_threshold": 18 },
    "wall_clock_seconds": { "used": 14400, "limit": 28800, "warning_threshold": 25200 }
  }
}
```

When `used >= warning_threshold`, the host should surface a passive warning ("Approaching session limit for finance.read") before the hard stop.

#### Session budget counter ownership

The capability snapshot carries authoritative counter values, but something must increment them. The ownership rule is:

**The host owns local counter tracking. MAS owns the authoritative count.**

Specifically:

1. The host increments local in-memory counters on every `PostToolUse` event (resource class reads, external calls, wall-clock elapsed).
2. The host reports counter state to MAS asynchronously via `POST /missions/{id}/signals` — not as a blocking call before each tool response, but after the tool use completes.
3. MAS persists the reported counts and returns the authoritative accumulated total in subsequent capability snapshot responses.
4. The host uses its local in-memory count for within-session enforcement decisions (e.g., blocking a tool call when `used >= limit`).
5. On session restore (crash/restart), the host pulls the last MAS-authoritative count from the capability snapshot and resumes from there.

This model avoids adding a synchronous MAS round-trip to every tool call while keeping MAS as the durable source of truth.
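A minimal sketch of the host-side reporter. `post_signals` stands in for the `POST /missions/{id}/signals` call and is a hypothetical callable assumed to raise `ConnectionError` when MAS is unreachable:

```python
class CounterReporter:
    """Host-side counter tracking with async reporting to MAS (sketch)."""

    def __init__(self, post_signals):
        self.local = {}    # in-memory counts, used for enforcement decisions
        self.pending = []  # reports queued while MAS is unreachable
        self.post = post_signals

    def on_post_tool_use(self, resource_class):
        # Incremented on every PostToolUse event, never on a blocking path.
        self.local[resource_class] = self.local.get(resource_class, 0) + 1

    def report(self):
        self.pending.append(dict(self.local))
        try:
            while self.pending:
                self.post(self.pending[0])
                self.pending.pop(0)
        except ConnectionError:
            pass  # keep queued; flush on next successful contact
```

Enforcement reads `self.local`; MAS reconciliation (adopting the authoritative count at snapshot refresh) would overwrite `self.local` and is omitted here.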

**Failure behaviors:**

| Condition | Behavior |
|---|---|
| MAS unreachable when host reports counter | host queues the report locally; flushes on next successful contact; continues enforcing from local count |
| host crashes before reporting current count | session restore pulls last MAS count; any unreported increments since last report are lost — the budget is effectively under-counted by at most one report interval |
| host count and MAS count diverge by more than one report interval | capability snapshot refresh reconciles: host adopts MAS count as authoritative and continues from there |
| MAS count exceeds limit at snapshot refresh | host enforces suspension immediately, even if local count had not yet hit the limit |

**Report interval:** the host should report counter state to MAS at minimum: (a) when the local count crosses a warning threshold, (b) at session end, and (c) every 60 seconds for long-running sessions. Waiting until session end is not sufficient for sessions that exceed `max_wall_clock_duration_seconds`.

**What the MCP server does not own:** the MCP server does not track or report budget counters. It may receive tool calls, but the host is the component that sees every tool call across all MCP servers in a session and is therefore the correct aggregation point.

### Template governance ownership and cadence

Templates with no named owner, no review cadence, and no deprecation path become brittle policy debt. The governance structure is not optional — it is what prevents the template model from decaying into an ungoverned collection of policy files that nobody audits.

**Required ownership fields per template:**

Every template definition must carry these fields:

```json
{
  "owner": "finance-platform-team",
  "owner_contact": "finance-platform-oncall@example.com",
  "reviewer": "security-operations",
  "reviewer_contact": "secops@example.com",
  "last_reviewed_at": "2025-10-01T00:00:00Z",
  "next_review_due": "2026-10-01T00:00:00Z",
  "review_cadence": "annual",
  "risk_tier": "medium"
}
```

`owner` is the team responsible for keeping the template current and resolving policy questions about it. `reviewer` is the team that must sign off on changes. For high-risk templates, `owner` and `reviewer` must be different teams.

**Review cadence rules:**

| Template risk tier | Required review cadence | Grace period before status changes |
|---|---|---|
| low | annual | 30 days past `next_review_due` |
| medium | semi-annual (every 6 months) | 14 days past `next_review_due` |
| high | quarterly | 7 days past `next_review_due` |

When `next_review_due` is exceeded by the grace period, the template moves to `pending_re_review`. Templates in `pending_re_review`:
- continue to function for currently active Missions
- cannot be used to activate new Missions
- appear in the admin dashboard's template drift view with an "overdue review" indicator
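The overdue-review check derived from the cadence table can be sketched as a pure function. `review_state` is a hypothetical name; it returns only the governance verdict, separate from the template's lifecycle status (`draft`/`active`/`deprecated`):

```python
from datetime import datetime, timedelta

# Grace periods from the cadence table above.
GRACE_DAYS = {"low": 30, "medium": 14, "high": 7}

def review_state(next_review_due, risk_tier, now):
    """Return "pending_re_review" once next_review_due is exceeded
    by the risk tier's grace period, else "ok"."""
    if now <= next_review_due + timedelta(days=GRACE_DAYS[risk_tier]):
        return "ok"
    return "pending_re_review"
```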

**What a review requires:**

A review is not just signing off. The reviewer must:
1. confirm the allowed resource classes still match the intended work pattern
2. confirm the hard deny list still covers the right actions
3. confirm the approval routing still reaches the right approvers
4. confirm the session budget limits are still appropriate
5. run the template simulation against a representative Mission to verify the compiled output looks correct

After confirming all five, the reviewer sets `last_reviewed_at` to now and `next_review_due` to the next cadence date. The review event is logged to the audit trail with reviewer identity.

**Deprecation path:**

When a template is replaced by a newer version:

1. Publish the new template version (`board_packet_v4`) as `draft → active`
2. Mark the old version (`board_packet_v3`) as `deprecated`
3. Deprecated templates continue to work for active Missions — do not revoke active Missions that were created from the deprecated template
4. Set a `deprecated_at` and `remove_after` date on the deprecated template (minimum 30 days notice)
5. On `remove_after`, the template moves to `archived` — no new Missions can be created from it
6. The archived template record is retained in the system for audit purposes — it is never deleted

Templates must not jump from `active` to `archived` without a deprecation period. A sudden removal is a breaking change for any session that depends on that template being available for cloning or resumption.
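That constraint is easy to encode as a transition table. A sketch (hypothetical names); note the deliberate absence of an `active → archived` edge, and that rollback is modeled as re-activating a deprecated version:

```python
# Legal template status transitions (sketch).
VALID_TRANSITIONS = {
    "draft": {"active"},
    "active": {"deprecated"},
    "deprecated": {"active", "archived"},  # "active" here is rollback
    "archived": set(),  # retained for audit; never deleted, never revived
}

def transition(status, new_status):
    if new_status not in VALID_TRANSITIONS[status]:
        raise ValueError(f"illegal transition: {status} -> {new_status}")
    return new_status
```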

**What to do when a template has no owner:**

Ownerless templates are a governance gap. If a template's `owner` is an unmaintained team or alias, it must be assigned a new owner before its next review is due. If no owner can be found, the template must be deprecated — an ownerless template cannot be trusted to represent current policy intent.

### Template re-review on capability change

Templates reference catalog resources. When a referenced resource changes in a way that affects the template's risk profile, the template must be re-reviewed.

**Triggering conditions for mandatory template re-review:**

| Catalog record change | Effect on template |
|---|---|
| `commit_boundary` flips from `false` to `true` | any template that includes this tool and does not already have a stage gate for it must be flagged for re-review; auto-approval eligibility may change |
| `trust_domain` changes | any template that allows this resource based on its trust domain must be re-evaluated against the template's trust domain allowlist |
| `data_sensitivity` increases | any template that allows access to this resource without sensitivity-appropriate gates must be re-reviewed |
| `allowed_action_classes` expands | any template that grants this resource without evaluating the new action class must be reviewed |

**Implementation:** the resource catalog should emit a `catalog.record.changed` event when any of these fields change on an approved record. Template management should subscribe to this event and automatically transition affected templates to `pending_re_review`. Templates in `pending_re_review` continue to function for currently active Missions but cannot be used to activate new Missions until re-review is complete.

Template owners must be notified when a resource they depend on changes. "We reviewed the template once" is not sufficient — the review is of the template against the resources as they were at review time, not as they are today.
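A sketch of the event handler, with hypothetical event and template shapes (`changed_fields`, `referenced_resources` are assumed names, not part of a defined schema):

```python
# Catalog fields whose change forces template re-review (from the table above).
RISK_FIELDS = {"commit_boundary", "trust_domain",
               "data_sensitivity", "allowed_action_classes"}

def on_catalog_record_changed(event, templates):
    """Handler for a catalog.record.changed event (sketch).

    Flags every template that references the changed resource when a
    risk-relevant field changed; cosmetic changes flag nothing.
    """
    if not RISK_FIELDS & set(event["changed_fields"]):
        return []
    affected = [t for t in templates
                if event["resource_id"] in t["referenced_resources"]]
    for t in affected:
        t["status"] = "pending_re_review"
    return [t["template_id"] for t in affected]
```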

### Template discovery API

Templates must be discoverable. Users and hosts cannot use what they cannot see.

| Method | Path | Purpose |
|---|---|---|
| `GET` | `/templates` | list available templates for the caller's tenant |
| `GET` | `/templates/{template_id}` | fetch one template's definition |

`GET /templates` is tenant-scoped and returns only templates in `active` status. Templates in `draft` or `deprecated` are not returned to end users (operators can see all statuses via the admin API).

Minimum `GET /templates` response:

```json
{
  "templates": [
    {
      "template_id": "board_packet_v3",
      "purpose_class": "board_packet_preparation",
      "display_name": "Board Packet Preparation",
      "description": "Prepare internal financial board packets. Reads finance and document systems; gates publication.",
      "allowed_resource_classes": ["finance.read", "documents.read", "documents.write"],
      "hard_denied_actions": ["publish_external", "pay", "delete"],
      "gated_actions": ["final_publish"],
      "default_approval_mode": "auto_with_release_gate",
      "status": "active"
    }
  ]
}
```

**The host should fetch the template list at session start and pass it to the shaper as context.** A shaper that knows what templates exist produces proposals that are far more likely to match on first pass. Without template context, the shaper proposes purpose classes by guessing — many will be correct but some will be confident misclassifications that look reasonable but don't match any real template.

### Catalog and tool discovery UX for end users

Users should not need to know what tools exist to make a valid Mission request. But they should be able to understand what the agent can do for their current Mission and why some things are off-limits.

**What users should be able to discover:**

| Question | Where the answer comes from | How the host surfaces it |
|---|---|---|
| "What can you do for me?" | Capability snapshot `available_tools` list | Model summarizes in natural language from `display_name` fields |
| "Why can't you do [thing]?" | Denial reason + Mission scope | Plain-language denial message + "See what I can do" |
| "What Missions can I create?" | `GET /templates` list | Offer the template list as an option when no Mission exists |
| "Can this Mission be expanded?" | Current template + amendment path | "I can request an amendment if you need [specific thing]" |
| "What is this Mission allowed to do?" | Capability snapshot + Mission record | Summarize approved tools grouped by purpose |

**How the model should present available tools:**

Do not present the raw tool list to users. Group tools by what they accomplish:

```
For this Mission I can:
- Read financial data (ERP actuals, budget plans)
- Draft and edit board packet documents
- Search the document library

I cannot (and would need a new Mission for):
- Send emails or external messages
- Make payments or transfers
- Publish documents externally (requires step-up approval)
```

This grouping comes from the `resource_class` labels in the capability snapshot, not from reading the raw tool names. The model should translate `finance.read` into "Read financial data," not list tool names like `mcp__finance__erp.read_financials`.
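A sketch of that grouping step. The label map is hypothetical; in a real deployment it would come from catalog display metadata rather than a hardcoded dict:

```python
# Hypothetical resource-class -> user-facing label mapping.
CLASS_LABELS = {
    "finance.read": "Read financial data",
    "documents.write": "Draft and edit documents",
    "documents.read": "Search the document library",
}

def group_for_display(snapshot_tools):
    """Group capability-snapshot tools by resource-class label,
    so raw MCP tool names never reach the user."""
    groups = {}
    for tool in snapshot_tools:
        label = CLASS_LABELS.get(tool["resource_class"], tool["resource_class"])
        groups.setdefault(label, []).append(tool["tool_id"])
    return groups
```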

**What users should NOT see:**

- Raw MCP tool names (`mcp__finance__erp.read_financials`)
- Cedar entity names (`Mission::Tool::"..."`)
- `constraints_hash`
- Template internal IDs

**Tool discovery for new Mission requests:**

When a user makes a request that doesn't match the current Mission, the host should offer context rather than a hard stop:

1. "That's outside my current Mission, but I can check if there's a template for it."
2. Host calls `GET /templates` filtered by keywords from the user's request.
3. If a matching template exists: "I found a template for [purpose]. Want me to start a new Mission for that? It will need [approval mode] to activate."
4. If no template exists: "I don't have a template for that kind of work. You may need to ask your administrator to add it."

This flow prevents the user from hitting a dead end. It turns "denied" into "here's what you can do instead."
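The keyword filter in step 2 can be as simple as a substring match over the template list. A naive sketch (hypothetical function name); a real host might instead rank by the shaper's purpose-class guess:

```python
def find_candidate_templates(templates, request_keywords):
    """Naive keyword filter over the GET /templates response (sketch)."""
    hits = []
    for t in templates:
        haystack = " ".join([t["purpose_class"],
                             t.get("display_name", ""),
                             t.get("description", "")]).lower()
        if any(k.lower() in haystack for k in request_keywords):
            hits.append(t["template_id"])
    return hits
```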

### Admin lifecycle for template and mapping changes

Every change to a template or resource catalog record affects compiled Mission state. The admin lifecycle makes those changes deliberate and reversible.

#### Template change lifecycle

```
draft edit → simulation → diff review → approval → publish → live
                                                              ↓
                                          active Missions continue on old version
                                          new Missions use new version
```

**Step-by-step:**

1. **Draft edit** — operator creates a new draft version of the template. The existing `active` version is unchanged. Multiple draft versions can exist simultaneously; only one can be promoted to `active`.
2. **Simulation** — operator runs `POST /templates/{id}/simulate` with a representative Mission proposal to verify the compiled review packet looks correct. This is a required step before the publish button is enabled.
3. **Diff review** — operator reviews `GET /templates/{id}/diff?from_version=active&to_version=draft` to see the human-readable summary of what changed.
4. **Approval** — the template `reviewer` (a different team than the `owner` for high-risk templates) signs off on the diff. For low-risk changes (fixing a typo in `description`), self-approval is acceptable. For risk-profile changes (adding a resource class, removing a stage gate), reviewer sign-off is required.
5. **Publish** — operator promotes the draft to `active`. The prior `active` version moves to `deprecated` automatically.
6. **Live** — active Missions created from the old version continue on the old version's compiled state. New Missions created after the publish use the new version.

**Template rollback:**

If the published version causes problems (unexpected denials, broken workflows), rollback by publishing the prior version back to `active`:
- set the deprecated prior version's status back to `active`
- set the current active version to `deprecated`
- existing active Missions compiled from the current version continue until expiry; new Missions use the rolled-back version

Rollback is a new publish event, not a revert operation. The audit trail shows: publish → rollback as two distinct events with actor identity and timestamps.

**Emergency change path:**

When a template must be changed immediately due to a security incident (e.g., a newly discovered tool is allowing access it shouldn't):

1. Operator submits an emergency change with `"emergency": true` in the publish request
2. Normal simulation and diff review steps are bypassed (logged as bypassed, not hidden)
3. Dual-operator approval is required — the publish request must be countersigned by a second operator with security operations authority
4. MAS emits a `template.emergency_change` audit event that triggers immediate notification to the security operations team
5. The emergency change is flagged in the template's audit history and must be reviewed in the next regular security review

#### Resource catalog change lifecycle

```
catalog edit → template impact analysis → re-review of affected templates → approval → publish
```

**Step-by-step:**

1. **Catalog edit** — operator creates a draft version of the catalog record with the proposed change
2. **Template impact analysis** — `POST /catalog/resources/{id}/impact-analysis` returns a list of all templates that reference this resource and would be affected by the change
3. **Re-review of affected templates** — each affected template moves to `pending_re_review`. Templates in this state cannot be used to activate new Missions until the template reviewer signs off on the change's effect
4. **Approval** — catalog record and all affected templates must be approved before the record goes live
5. **Publish** — catalog record moves to `approved`; affected templates move back to `active` after their individual re-reviews

**The key constraint:** a catalog record change that affects templates cannot go live while any affected template is still in `pending_re_review`. This prevents a catalog change from silently altering compiled Mission authority without template owners knowing.
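A sketch of that publish gate (hypothetical function name, operating on the same template records the impact analysis returns):

```python
def catalog_change_may_go_live(affected_templates):
    """The catalog record stays in draft while any affected template
    is still pending re-review. Returns (ok, blocking_template_ids)."""
    blocking = [t["template_id"] for t in affected_templates
                if t["status"] == "pending_re_review"]
    return len(blocking) == 0, blocking
```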

### Template simulation and diff tooling

Before publishing a template change from `draft` to `active`, operators need to know what the compiled authority envelope will look like and how it differs from the current version. Without this, template changes are applied blind — operators must either test in production or trust that the JSON they edited is correct.

#### Simulation API

Run a proposed Mission against a draft template to see what the compiled review packet would look like before the template is published:

```
POST /templates/{template_id}/simulate
```

**Request body:**

```json
{
  "mission_proposal": {
    "intent": "Prepare Q4 board materials and publish the final packet",
    "actor_id": "user_alice",
    "requested_tools": ["finance_query", "docs_editor", "docs_publish"]
  },
  "template_version": "draft"
}
```

**Response shape:**

```json
{
  "simulated_review_packet": {
    "purpose_class": "board_packet_preparation",
    "resource_classes": ["finance.read", "documents.write"],
    "action_classes": ["read", "draft", "publish"],
    "stage_constraints": [
      { "action": "docs.publish", "gate_type": "controller_approval", "approvers": ["cfo_role"] }
    ],
    "approval_required": "auto",
    "session_budgets": {
      "max_reads_per_resource_class": { "finance.read": 100 },
      "max_external_calls_per_session": 20,
      "max_wall_clock_duration_seconds": 28800
    },
    "hard_denied_actions": ["publish_external", "pay", "delete"]
  },
  "would_auto_approve": true,
  "constraints_hash_preview": "b9c2e4... (not final — computed at publish time)",
  "warnings": [
    "Tool docs_publish maps to action class publish which is in stage_constraints — verify gate is intentional."
  ]
}
```

The simulation is non-persisting: it does not create a Mission, modify any state, or produce a real `constraints_hash`. It is a read-only preview for operator validation.

**How the simulation surfaces in the admin dashboard:**

When an operator moves a template from `draft` toward `active`, the dashboard requires a simulation run before the publish button is enabled. The dashboard shows:
- the simulated review packet in structured form
- any `warnings` highlighted prominently
- `would_auto_approve: true/false` with the reason
- a "looks correct, publish" confirmation button

#### Diff API

Show what changed between two versions of a template's compiled authority envelope:

```
GET /templates/{template_id}/diff?from_version=v2&to_version=v3
```

**Response shape:**

```json
{
  "template_id": "board_packet_preparation",
  "from_version": "v2",
  "to_version": "v3",
  "diff": {
    "added_resource_classes": ["hr.read"],
    "removed_resource_classes": [],
    "added_action_classes": [],
    "removed_action_classes": ["external_write"],
    "added_hard_denied_actions": ["send_slack"],
    "removed_hard_denied_actions": [],
    "added_stage_gates": [],
    "removed_stage_gates": [
      { "action": "docs.draft", "gate_type": "self_approve" }
    ],
    "changed_session_budgets": {
      "prior": { "max_wall_clock_duration_seconds": 14400 },
      "new": { "max_wall_clock_duration_seconds": 28800 }
    },
    "changed_approval_mode": {
      "prior": "auto_with_release_gate",
      "new": "auto"
    }
  },
  "human_summary": "Added resource class hr.read. Removed action class external_write. Added hard deny for send_slack. Removed self-approval gate on docs.draft. Session time limit increased from 4h to 8h. Approval mode changed from auto_with_release_gate to auto."
}
```

**When diff is available:** the diff endpoint works for any two persisted versions of a template (including `draft` vs. the current `active` version). Passing `from_version=active&to_version=draft` shows what will change when the pending draft is published.

**Admin dashboard presentation:** the diff is shown automatically any time an operator views a `draft` template that has a prior `active` version. It appears alongside the template editor with the `human_summary` at the top and the field-level diff expandable below. Operators should not be able to publish a template without seeing the diff if one exists.
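The core of the diff endpoint is a set difference per list-valued envelope field. A minimal sketch, assuming each compiled envelope is a plain dict keyed by the field names used in the response shape above:

```python
def diff_envelopes(prior: dict, new: dict) -> dict:
    # Field-level diff over list-valued envelope fields; output order is
    # normalized so diffs are stable across serializations.
    def added(field):
        return sorted(set(new.get(field, [])) - set(prior.get(field, [])))
    def removed(field):
        return sorted(set(prior.get(field, [])) - set(new.get(field, [])))
    out = {}
    for field in ("resource_classes", "action_classes", "hard_denied_actions"):
        out[f"added_{field}"] = added(field)
        out[f"removed_{field}"] = removed(field)
    return out
```

Scalar fields such as session budgets and approval mode need a prior/new pair rather than a set difference, as the response shape shows.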

### Compile templates into machine artifacts

Templates should not remain as prose or JSON only. Compile them into:

1. **matching rules**
   Used to map proposals into candidate templates.
2. **Cedar templates or policy fragments**
   Used to generate the policy bundle.
3. **approval routing rules**
   Used to create review work items.
4. **token projection rules**
   Used by the AS to know what each audience should see.
5. **host hint defaults**
   Used by the host to know safe mode, commit boundaries, and expected signals.

In other words, the Mission compiler should not hardcode logic for every purpose class. It should load template packages.

### Template selection

The compiler should not choose a template by one fuzzy LLM guess.

Use a layered selection procedure:

1. shaping model proposes one or more candidate purpose classes
2. deterministic matcher checks:
   - user org
   - tools requested
   - domains involved
   - action classes
3. if one template matches strongly, select it
4. if multiple templates match, require disambiguation or human selection
5. if no template matches, route to:
   - fallback restrictive template
   - or human review

That is safer than pretending every request cleanly fits one template.
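Steps 2 through 5 of that procedure can be sketched as one deterministic function. The template record fields (`purpose_class`, `allowed_orgs`, `allowed_tools`) are illustrative, not a fixed schema:

```python
def select_template(candidate_classes, templates, requested_tools, org):
    # Deterministic matcher over the shaping model's candidate purpose
    # classes: org, tool, and purpose-class checks are hard filters.
    matches = [
        tpl for tpl in templates
        if tpl["purpose_class"] in candidate_classes
        and org in tpl["allowed_orgs"]
        and set(requested_tools) <= set(tpl["allowed_tools"])
    ]
    if len(matches) == 1:
        return ("selected", matches[0]["template_id"])
    if len(matches) > 1:
        # Multiple strong matches: require disambiguation or human selection.
        return ("disambiguate", sorted(m["template_id"] for m in matches))
    # No match: restrictive fallback or human review.
    return ("fallback", "tpl_restrictive_fallback_v1")
```

Note that the model's proposal only narrows the candidate set; it cannot force a selection past the deterministic checks.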

### Template testing

Templates should be tested like code.

At minimum, each template needs:

1. **positive cases**
   Requests that should auto-approve.
2. **negative cases**
   Requests that should be denied or escalated.
3. **boundary cases**
   Requests that trigger stage gates, delegation limits, or cross-domain constraints.
4. **projection tests**
   Ensure token claims, tool lists, and Cedar entities all match the template.

For example:

- "prepare Q2 board packet" -> auto-approved
- "prepare Q2 board packet and email it to investors" -> pending approval
- "prepare Q2 board packet and transfer funds" -> denied

If you are not regression-testing templates, you are editing production authority logic blind.
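Those three cases can run as ordinary assertions. The sketch below uses a toy phrase-matching stub in place of the real compiler so it is self-contained; real regression tests would call the simulation API and assert on the returned review packet:

```python
# Toy stand-in for the compile/simulate path. Phrase lists are illustrative;
# a real compiler maps tools and action classes, not intent substrings.
DENIED_PHRASES = {"transfer funds", "pay"}
ESCALATE_PHRASES = {"email", "send", "publish"}

def compile_decision(intent: str) -> str:
    text = intent.lower()
    if any(p in text for p in DENIED_PHRASES):
        return "denied"
    if any(p in text for p in ESCALATE_PHRASES):
        return "pending_approval"
    return "auto_approved"

def test_board_packet_cases():
    assert compile_decision("prepare Q2 board packet") == "auto_approved"
    assert compile_decision("prepare Q2 board packet and email it to investors") == "pending_approval"
    assert compile_decision("prepare Q2 board packet and transfer funds") == "denied"
```

Run these in CI on every template change, exactly as you would run unit tests on a policy library.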

### Template versioning

Templates should be versioned explicitly:

- `tpl_board_packet_v1`
- `tpl_board_packet_v2`
- `tpl_board_packet_v3`

The compiler should record:

- selected template ID
- selected template version
- compiled `constraints_hash`

That way you can answer:

- which template approved this Mission?
- what changed between v2 and v3?
- did the approval basis change after a policy update?

### Template ownership

Do not let every team invent templates independently.

At minimum:

- policy team owns template definitions
- domain owners approve resource mappings and deny lists
- security architecture owns stage-gate defaults and cross-domain rules
- application teams propose new work-pattern templates

That is how templates become stable enough to be trusted.

#### Template governance in production

Templates need an operational governance loop or they will decay.

Minimum loop:

- monthly usage review by template
- quarterly re-approval for high-risk templates
- telemetry on denial rate, step-up rate, and exception rate by template
- deprecation process for stale templates
- explicit rule that repeated exceptions must become either:
  - a new narrowed template
  - or a hard deny

If a template accumulates repeated manual overrides, it is no longer a stable template. Treat that as governance debt and force review.

### Starter templates for greenfield deployments

Before production data exists to derive templates from, start with these three foundational templates. They cover the most common agent work patterns and define the deny set that every subsequent template inherits.

**`tpl_read_only_research_v1`**
- purpose class: `research`, `analysis`, `summarization`
- allowed resource classes: `*.read` for approved internal sources
- allowed actions: `read`, `summarize`
- hard denies: all write, send, publish, delete, pay actions; all external domains
- delegation: no sub-agents
- max duration: 4 hours
- stage gates: none
- auto-approval eligible: yes

**`tpl_draft_and_review_v1`**
- purpose class: `draft`, `prepare`, `create_internal_document`
- allowed resource classes: `*.read`, `documents.write` for internal destinations
- allowed actions: `read`, `summarize`, `draft`
- hard denies: external send, publish, delete, pay; external domains
- delegation: one level allowed for read-only sub-tasks
- max duration: 8 hours
- stage gates: none for internal drafts; `release_gate` required for anything with a distribution step
- auto-approval eligible: yes for internal-only scope

**`tpl_restrictive_fallback_v1`**
- purpose class: any unmatched request
- allowed resource classes: none — read-only access to the session's own workspace only
- allowed actions: `read` of session artifacts only
- hard denies: all external access, all write operations
- delegation: none
- max duration: 1 hour
- stage gates: all actions require human approval
- auto-approval eligible: no

The fallback template activates when no other template matches. It does not allow the agent to do useful work — that is intentional. Hitting the fallback should trigger a clarification flow so the right template can be defined, not silently grant broad permissions.

## How Cedar Fits

[Cedar](https://docs.cedarpolicy.com/) is a good fit for the policy layer because it already evaluates authorization decisions in terms of:

- **principal**
- **action**
- **resource**
- **context**

That maps cleanly onto a Mission architecture:

- **principal**: user, agent host, sub-agent, or workload identity
- **action**: tool invocation, API call, publish, delete, send, approve
- **resource**: MCP server, tool, document, table, API endpoint, tenant
- **context**: `mission_id`, lifecycle state, stage constraints, network posture, runtime risk, time, approval status

A practical pattern is:

1. MAS stores the approved Mission and the compiled Mission Authority Model.
2. MAS publishes one or more Cedar policies or policy templates derived from that model.
3. The AS, MCP server, or commit-boundary service evaluates requests against those Cedar policies using a small entity graph and request context.

### Cedar is the policy model, not necessarily the runtime engine everywhere

The design uses Cedar as the common policy model and compiler target. That does **not** mean every runtime surface must embed the same Cedar engine in-process.

Use this rule:

| Surface | What is required |
|---|---|
| MAS / compiler | Cedar-compatible policy bundle generation |
| AS | Cedar evaluation directly or a semantically equivalent policy adapter |
| host | Cedar evaluation directly, residual policy, or a compiled local adapter |
| MCP server | Cedar evaluation directly or a semantically equivalent local enforcement adapter |
| backend service with strong native auth | native auth or FGA is acceptable if it enforces the same compiled constraints |

The invariant is semantic consistency, not identical runtime libraries.

### Example Cedar policy

```cedar
permit(
  principal,
  action == Action::"call_tool",
  resource in ToolGroup::"board_packet_tools"
)
when {
  context.mission_id == "mis_01JR9S4YDY6QF5Q9Q54M0YB4V1" &&
  context.mission_status == "active" &&
  principal.agent_id == "agent_research_assistant"
};

forbid(
  principal,
  action == Action::"publish_external",
  resource
)
unless {
  context.approvals.contains("controller_approval")
};
```

This is the right level of concreteness:

- Mission compilation produces Cedar-friendly entities and context
- Cedar decides permit/forbid
- tokens and tool filters are projections of that decision surface

### What gets compiled into Cedar

The practical move is to compile Mission into Cedar PARC inputs rather than trying to make Cedar store the whole Mission document.

For example, this Mission intent:

- purpose: prepare board packet
- approved tools: `erp.read_financials`, `docs.read`, `docs.write`
- stage constraint: `controller_approval` required for `final_publish`

can compile into:

1. **Principal entities**
   - `Mission::Agent::"agent_research_assistant"`
   - `Mission::User::"user_123"`
2. **Resource entities**
   - `Mission::Tool::"erp.read_financials"`
   - `Mission::Tool::"docs.write"`
   - `Mission::ToolGroup::"board_packet_tools"`
3. **Action entities**
   - `Mission::Action::"call_tool"`
   - `Mission::Action::"publish_external"`
4. **Context**
   - `mission_id`
   - `constraints_hash`
   - `mission_status`
   - `approvals`
   - `runtime_risk`
   - `network_zone`
   - `commit_boundary`

That means the Cedar request the host or server evaluates is always concrete:

```json
{
  "principal": "Mission::Agent::\"agent_research_assistant\"",
  "action": "Mission::Action::\"call_tool\"",
  "resource": "Mission::Tool::\"docs.write\"",
  "context": {
    "mission_id": "mis_01JR9S4YDY6QF5Q9Q54M0YB4V1",
    "constraints_hash": "sha256-abc123",
    "mission_status": "active",
    "approvals": [],
    "runtime_risk": "normal",
    "commit_boundary": false
  }
}
```

The implementation pattern is:

- Mission text stays in the MAS
- compiled authority becomes Cedar entities, actions, and context
- token claims and tool filters are derived from the same compiled model

### Cedar policy model summary

The compiler produces two artifacts per Mission:

1. **Template policy set** — one per template class (e.g., `board_packet_v1`), shared across all Mission instances of that class. Contains permit/forbid logic. Does not embed Mission-instance values. O(template classes), not O(missions).
2. **Mission entity snapshot** — one per Mission instance. Contains the entity graph: which tools belong to which ToolGroups, principal records, and approval-gating metadata. Recomputed on each amendment. The `constraints_hash` is the SHA-256 of its canonical JSON serialization.

Do not embed `constraints_hash` in Cedar rules. It is a cache-staleness detector at enforcement points, not an authorization condition.
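The hash computation in the summary above is mechanical: SHA-256 over a canonical serialization of the entity snapshot. A minimal sketch (the `sha256:` prefix format is an assumption; the document shows both `sha256:` and `sha256-` spellings):

```python
import hashlib
import json

def constraints_hash(entity_snapshot: dict) -> str:
    # Canonical JSON: sorted keys, no insignificant whitespace, so two
    # logically identical snapshots always hash to the same value.
    canonical = json.dumps(entity_snapshot, sort_keys=True,
                           separators=(",", ":"), ensure_ascii=False)
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

The sorted-keys rule is what makes the hash usable as a staleness detector: recomputing it from the same snapshot on a different machine, or after a round trip through a JSON parser, yields the same digest.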

See [Cedar Policy Reference](#cedar-policy-reference) in the appendix for the full schema, action vocabulary, and generation recipe.

### Cedar evaluator process

Cedar evaluation is a library call, not a network call. The Cedar evaluator runs in-process at each enforcement point (AS token endpoint, MCP server request handler, commit-boundary gate). It does not run as a standalone service.

**Startup:**

1. Each enforcement point fetches its initial policy set at startup time from `GET /missions/{id}/policy-bundle?fields=template_policy`
2. It caches the template policy in memory keyed by template class name
3. It fetches the Mission entity snapshot from `GET /missions/{id}/policy-bundle?fields=entity_snapshot` and caches it keyed by `constraints_hash`

**Per-request evaluation:**

```
1. read mission_id and constraints_hash from the incoming request token (or from the MCP session binding)
2. check local cache: do we have an entity snapshot for this mission_id + constraints_hash?
   - if yes: use it
   - if no: fetch GET /missions/{id}/policy-bundle, cache the new snapshot, evict the old one
3. look up template policy by template_class (from the Mission record or token claim)
4. build runtime context from request: mission_status, approvals, runtime_risk, commit_boundary, trust_domain
5. call Cedar.is_authorized(principal, action, resource, context, entities)
6. if result == Deny: return 403 with deny reason
7. if result == Allow: proceed
```

**What the evaluator does NOT do:**

- it does not call MAS on every request — that path is only the cache-miss path
- it does not store Mission state; MAS owns state
- it does not make network calls during evaluation — all needed data is loaded before the call

If MAS is unavailable and the local snapshot has expired, the evaluator fails closed (deny) rather than proceeding on stale data. See MAS degraded mode.
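The cache-keyed lookup and the fail-closed rule above can be sketched as a small class. `fetch_bundle` stands in for the `GET /missions/{id}/policy-bundle` call and returns `None` when MAS is unavailable; the names are illustrative:

```python
class SnapshotCache:
    def __init__(self, fetch_bundle):
        self.fetch_bundle = fetch_bundle
        self.by_mission = {}  # mission_id -> snapshot (one entry per Mission)

    def resolve(self, mission_id, constraints_hash):
        cached = self.by_mission.get(mission_id)
        if cached and cached["constraints_hash"] == constraints_hash:
            return cached  # fast path: no network call during evaluation
        fresh = self.fetch_bundle(mission_id)  # cache-miss path only
        if fresh is None:
            # MAS unavailable and no matching snapshot: caller must deny.
            self.by_mission.pop(mission_id, None)
            return None
        self.by_mission[mission_id] = fresh  # new snapshot evicts the old one
        return fresh
```

The evaluator then calls Cedar with the resolved snapshot; a `None` result maps directly to the deny in step 6 of the per-request loop.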

### What gets evaluated in Cedar

Use Cedar in three places:

1. **Token issuance**
   Should this Mission get a token for this audience and tool set?
2. **Tool execution**
   Should this `tools/call` request be allowed right now?
3. **Commit boundary**
   Should this side-effecting action become real right now?

This keeps authority logic out of prompt instructions and token scopes.

### Capability snapshot for planning

Policy should not appear only as a deny at execution time. The host should also hold a current view of the allowed action space before it plans. The simplest production shape is a cached **Mission Capability Snapshot**, not a live MAS round trip on every planning step.

That loop is:

1. **fetch snapshot**
   Get the current allowed action space for this Mission, principal, and environment.
2. **plan**
   Build the next steps only inside that snapshot.
3. **execute**
   Run the plan, with normal runtime enforcement still active.
4. **observe and refresh**
   If approvals, risk, lifecycle state, or `constraints_hash` change, refresh the snapshot before planning further.

This matters because sequencing is part of authorization. A plan that is valid before approval is granted may be invalid after a state change, and a plan that assumes an old scope should not continue on stale policy.

#### Snapshot refresh budget and local planning rules

Capability snapshots are meant to reduce control-plane chatter without leaving the host blind.

Use these default rules:

| Situation | Host behavior |
|---|---|
| normal planning inside current `constraints_hash` and cached capability snapshot | plan locally |
| new domain, new gated action, or child-agent request | refresh before continuing |
| `constraints_hash` changed or approval changed | refresh immediately |
| MAS unavailable | continue only for low-risk cached reads within TTL; otherwise stop |

Recommended refresh budget:

- one snapshot fetch at session start or Mission activation
- one refresh after any approval, revocation, suspension, or hash change
- one refresh before any cross-domain step
- no repeated per-thought refreshes inside the same unchanged policy window

If the host exceeds that budget routinely, the snapshot is too weak or the cache policy is wrong.
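The default rules table maps onto a single host-side decision function. A sketch, with illustrative event names:

```python
def planning_decision(event, mas_available=True, within_ttl=True, low_risk_read=False):
    # Ordered to match the table: hash/approval changes refresh immediately,
    # boundary-crossing events refresh before continuing, MAS outage only
    # permits low-risk cached reads within TTL.
    if event in {"constraints_hash_changed", "approval_changed"}:
        return "refresh_now"
    if event in {"new_domain", "new_gated_action", "child_agent_request"}:
        return "refresh_before_continuing"
    if not mas_available:
        return "continue_cached" if (low_risk_read and within_ttl) else "stop"
    return "plan_locally"
```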

### What the capability snapshot contains

The host does not need the full policy graph. It needs a stable planning surface:

- allowed tools
- allowed action classes
- current stage-gated actions
- current trust domains
- delegation allowance
- any active deny conditions

For example:

```json
{
  "mission_id": "mis_01JR9S4YDY6QF5Q9Q54M0YB4V1",
  "constraints_hash": "sha256-abc123",
  "allowed_tools": [
    "mcp__finance__erp.read_financials",
    "mcp__docs__docs.read",
    "mcp__docs__docs.write"
  ],
  "gated_tools": [
    "mcp__docs__docs.publish"
  ],
  "delegation_bounds": {
    "subagents_allowed": false,
    "max_depth": 0
  }
}
```

That is a capability snapshot, not the full authority model.

For the **simplified core profile**, the minimum snapshot is smaller:

- `mission_id`
- `constraints_hash`
- `planning_state`
- `allowed_tools`
- `gated_tools`
- `denied_actions`
- `refresh_after_seconds`

Treat these as the core contract. The following fields are **optional advanced-profile extensions**:

- `allowed_domains`
- `delegation_bounds`
- `requested_domain`
- `requested_delegation_depth`

### Residual policy as a planning input

If the policy engine supports partial evaluation or residual policy, use it.

A residual policy can tell the host something more useful than a raw allow/deny, for example:

```cedar
permit(principal, action, resource)
when { resource.path like "/workspace/*" };
```

That lets the host plan inside the known safe region instead of probing by trial and error. The host still needs runtime checks, but planning becomes authority-aware rather than authority-blind.

### When to refresh the snapshot

The host should refresh before planning further when:

- `constraints_hash` changes
- an approval is granted or expires
- risk state changes
- delegation state changes
- Mission lifecycle changes
- a cross-domain step is introduced

This is the operational rule:

> Do not keep planning on stale authority.

### Capability snapshot API

To make planning practical without a chatty control plane, expose a capability snapshot surface from the MAS or policy service.

Minimum request:

```json
{
  "mission_id": "mis_01JR9S4YDY6QF5Q9Q54M0YB4V1",
  "principal": "agent_research_assistant",
  "session_id": "sess_123",
  "constraints_hash": "sha256-abc123"
}
```

Minimum response:

```json
{
  "mission_id": "mis_01JR9S4YDY6QF5Q9Q54M0YB4V1",
  "constraints_hash": "sha256-abc123",
  "allowed_tools": [
    "mcp__finance__erp.read_financials",
    "mcp__docs__docs.read",
    "mcp__docs__docs.write"
  ],
  "gated_tools": [
    "mcp__docs__docs.publish"
  ],
  "allowed_domains": [
    "enterprise"
  ],
  "denied_actions": [
    "publish_external",
    "pay"
  ],
  "delegation_bounds": {
    "subagents_allowed": false,
    "max_depth": 0
  }
}
```

Required behavior:

- host planning must use this surface, not infer authority from past tool successes
- response must be keyed to one `constraints_hash`
- stale hash in request must trigger refresh or denial

#### How to present the capability snapshot to the model

The capability snapshot must be injected as **trusted system context**, not as retrieved content or a user-turn message. Concretely:

- inject it at `SessionStart` or `UserPromptSubmit` via the hook's `addContext` or equivalent mechanism, so it appears in the system-controlled portion of the context
- do not re-inject it as tool output or as a message attributed to the user — that path is reachable by prompt injection
- include only the fields the model needs to plan: `allowed_tools`, `gated_tools`, `denied_actions`, `delegation_bounds`; do not inject the full governance record or approval evidence into model context
- when `constraints_hash` changes, re-inject the updated surface before the next planning step; do not rely on the model's memory of prior context

The host holds the authoritative capability snapshot. The model receives a projection of it. Tool-call enforcement at `PreToolUse` and `tools/call` does not depend on the model having seen the snapshot correctly — those checks happen regardless. The purpose of injecting the snapshot is to reduce wasted tool calls and model confusion, not to establish the security boundary.

#### Capability snapshot endpoint contract

This should be implemented as a real API, not as an in-process helper that only one host can use.

Minimum endpoint:

- `POST /missions/{mission_id}/capability-snapshot`

Minimum request fields:

| Field | Required | Meaning |
|---|---|---|
| `mission_id` | yes | Mission being planned against |
| `principal` | yes | agent or workload identity planning the next step |
| `session_id` | yes | runtime session requesting the capability snapshot |
| `constraints_hash` | yes | host's currently cached Mission version |
| `requested_domain` | no | optional advanced-profile next-domain hint |
| `requested_delegation_depth` | no | optional advanced-profile child-agent planning hint |

Minimum response fields:

| Field | Meaning |
|---|---|
| `mission_id` | Mission identifier |
| `constraints_hash` | current enforceable version |
| `planning_state` | `active`, `stale`, `pending_approval`, `denied` |
| `allowed_tools` | tools that can be planned immediately |
| `gated_tools` | tools requiring explicit approval before real execution |
| `denied_actions` | actions the planner must not attempt |
| `anomaly_flags` | active anomaly conditions that restrict planning (see below) |
| `refresh_after_seconds` | host cache hint |

`allowed_domains` and `delegation_bounds` are optional advanced-profile fields and should be omitted from the simplified single-domain core unless those profiles are enabled.

**`anomaly_flags` shape:**

```json
{
  "anomaly_flags": [
    {
      "flag": "repeated_denied_calls",
      "severity": "warning",
      "affected_tools": ["mcp__email__email.send_external"],
      "description": "3 denied calls to this tool in the last 5 minutes",
      "restriction": "tool_suspended_pending_review"
    }
  ]
}
```

An empty `anomaly_flags` array means no active anomaly conditions. The host must check this field on every capability snapshot response and restrict planning to tools not covered by an active flag. Flags with `restriction: tool_suspended_pending_review` mean the host should not plan use of that tool until the flag clears or is lifted by an operator.
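The host-side rule can be sketched as a filter over the allowed tool list, following the flag shape in the example above:

```python
def plannable_tools(allowed_tools, anomaly_flags):
    # Tools covered by an active suspension flag are removed from the
    # plannable set until the flag clears or an operator lifts it.
    suspended = set()
    for flag in anomaly_flags:
        if flag.get("restriction") == "tool_suspended_pending_review":
            suspended.update(flag.get("affected_tools", []))
    return [t for t in allowed_tools if t not in suspended]
```

An empty flag list leaves the allowed set unchanged, matching the empty-array semantics above.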

#### Capability snapshot failure behavior

Use explicit status semantics:

| Condition | Response |
|---|---|
| Mission active and hash current | `200` with `planning_state = active` |
| Mission active but caller hash stale | `409` with current `constraints_hash` |
| Mission pending approval | `423` or `200` with `planning_state = pending_approval` |
| Mission denied or revoked | `403` |
| unknown Mission | `404` |

The host should treat any non-active planning state as a reason to stop expanding the plan.
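Host-side handling of those statuses can be sketched as a small dispatcher; the return values are illustrative action names, not part of the contract:

```python
def snapshot_next_step(status: int, body: dict) -> str:
    # Any non-active planning state stops plan expansion, and unknown
    # conditions fail closed rather than defaulting to "plan".
    if status == 200:
        return "plan" if body.get("planning_state") == "active" else "halt_planning"
    if status == 409:
        return "refetch_with_current_hash"  # response carries the current constraints_hash
    if status == 423:
        return "halt_planning"  # pending approval
    return "stop_mission"  # 403, 404, or anything unexpected
```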

## Runtime Enforcement and Token Projection

The main path above ends once Mission state has been compiled and approved. The next sections cover how that state becomes runtime tokens and enforcement decisions.

## Agent Identity and the Subject Token

Every token exchange starts with a `subject_token`. That token has to come from somewhere, and the choice determines who the AS trusts and what appears in the `act` claim chain.

Use one of these sources:

| Source | Use when | Strength | Main weakness |
|---|---|---|---|
| OIDC token from enterprise IdP | human starts the session | strongest user binding | requires interactive login path |
| workload identity credential | headless or server-side agent | proves runtime software identity | no direct human `sub` |
| service account token | early or low-sensitivity deployment | simple to ship | weakest identity binding |

Recommendation:

- human-initiated production session -> OIDC token
- headless production deployment -> workload identity
- early internal prototype -> service account only if the rest of the surface is tightly constrained

| Token Type | Issuer | Main Purpose | Typical `sub` | Typical Consumer |
|---|---|---|---|---|
| subject token | enterprise IdP or workload issuer | prove who is asking for Mission-scoped tokens | user or workload | AS |
| MCP transport token | AS | authorize access to one MCP server audience | user or workload, with agent in `act` | MCP server |
| direct API token | AS or target-domain AS | authorize one downstream API surface | user or workload, with agent in `act` | API / service |
| ID-JAG | enterprise IdP AS | bridge identity across trust domains | user or enterprise principal | target-domain AS |
| approval object | MAS | prove a human or policy checkpoint was satisfied | approver identity | host, MAS, MCP commit boundary |

For the simplified single-domain core, you only need three runtime artifacts:

1. `subject_token`
2. audience-specific token (`MCP transport token` or `direct API token`)
3. `approval_object` for gated actions

`ID-JAG` and `delegation_artifact` belong to advanced profiles and should stay out of the first deployment unless the use case truly requires them.

### Sub-agent authentication (Advanced Profile)

The rest of this section is optional. Skip it entirely for the simplified single-domain core.

A sub-agent cannot use the user's OIDC session — it did not perform an interactive login. Use one of these patterns:

**Delegation artifact plus narrowed child token.** This is the preferred pattern. When MAS approves a derived sub-Mission, it also issues a signed delegation artifact for the child. The artifact binds the child Mission to the approved parent Mission, actor chain, child agent identity, narrowing proof summary, expiry, and current `constraints_hash`. The parent presents that artifact to the AS, which validates it locally and mints a short-lived child subject token with the child agent's identity as `sub` and the parent as `act`. The sub-agent uses that as its `subject_token` when requesting MCP transport tokens.

```json
{
  "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
  "subject_token": "<parent-subject-token>",
  "subject_token_type": "urn:ietf:params:oauth:token-type:access_token",
  "requested_token_type": "urn:ietf:params:oauth:token-type:access_token",
  "audience": "https://auth.example.com",
  "scope": "agent.delegate",
  "mission_id": "mis_child_01",
  "agent_id": "agent_research_sub_01"
}
```

Minimum delegation artifact payload:

```json
{
  "delegation_id": "dlg_01JR9VD2D3M1B3S54K1A8CF8NQ",
  "parent_mission_id": "mis_parent_01",
  "child_mission_id": "mis_child_01",
  "parent_agent_id": "agent_research_assistant",
  "child_agent_id": "agent_research_sub_01",
  "actor_chain": ["user_123", "agent_research_assistant"],
  "constraints_hash": "sha256:6e0e7c...",
  "proof_summary": {
    "tools_subset": true,
    "actions_subset": true,
    "domains_subset": true,
    "expiry_subset": true,
    "delegation_depth_subset": true
  },
  "exp": "2026-04-12T22:10:00Z",
  "signature": "..."
}
```

**Workload identity for the sub-agent process.** If the sub-agent runs as a separate process or container, issue it a workload identity credential (SPIFFE SVID or equivalent) scoped to its agent identity. The AS validates the workload credential and mints a Mission-scoped token for the child Mission. This is the stronger option for long-running or isolated sub-agents.

**Injected narrow token from the parent.** The parent requests a Mission-scoped token for the child's audience and passes it directly. The sub-agent never calls the AS itself. This is simpler but gives the parent full control over the child's token, which is acceptable for trusted in-process delegation and must not be used across process or trust boundaries.

In all cases, the `act` chain in any token the sub-agent obtains must reflect the delegation path:

```json
{
  "sub": "agent_research_sub_01",
  "act": {
    "sub": "agent_research_assistant",
    "act": {
      "sub": "user_123"
    }
  }
}
```

The AS should not reconstruct Mission lineage from MAS on every sub-agent issuance. The runtime proof model is:

1. validate the parent subject token or parent-held child audience token
2. validate the signed delegation artifact locally
3. confirm artifact expiry and `constraints_hash` freshness
4. confirm the requested child audience and child agent match the artifact
5. confirm the `act` chain in the issued token matches the artifact `actor_chain`
6. mint a short-lived child token or reject

MAS remains authoritative for deriving the child Mission and issuing the delegation artifact, but MAS should not be a synchronous dependency for every child token mint. MAS lookups are a fallback for recovery, not the default issuance path.

**Fallback validation path:** if the artifact is missing, expired, or references unknown Mission state, the AS may query MAS for the current child Mission record and reject unless MAS can return a fresh signed replacement artifact. That keeps runtime issuance on a locally verifiable path while preserving MAS as the source of truth for delegation.

**Availability posture:** MAS degradation should block *new* child Mission derivations and artifact refreshes, but it should not immediately break already-issued child credentials and valid delegation artifacts. Multi-agent orchestration therefore degrades by reducing refresh and new delegation capacity first, not by making every child token issuance synchronous on MAS.
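The local artifact checks in the six-step proof model can be sketched as follows. `verify_signature` stands in for the AS's signature verification against the MAS signing key; field names follow the minimum artifact payload above:

```python
from datetime import datetime, timezone

def validate_delegation_artifact(artifact, requested_child_agent, current_hash,
                                 verify_signature, now=None):
    # Local checks only: no MAS round trip on the issuance path.
    now = now or datetime.now(timezone.utc)
    if not verify_signature(artifact):
        return (False, "bad_signature")
    exp = datetime.fromisoformat(artifact["exp"].replace("Z", "+00:00"))
    if exp <= now:
        return (False, "expired")
    if artifact["constraints_hash"] != current_hash:
        return (False, "stale_constraints_hash")
    if artifact["child_agent_id"] != requested_child_agent:
        return (False, "agent_mismatch")
    if not all(artifact["proof_summary"].values()):
        return (False, "narrowing_not_proven")
    return (True, "ok")
```

A failure on any check routes to the fallback validation path: query MAS for a fresh signed replacement artifact or reject.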

## How to Scope OAuth Tokens to Mission

There are two patterns:

| Pattern | Best fit | Shape |
|---|---|---|
| scopes + Mission claims | simpler deployments | coarse scopes plus `mission_id`, `constraints_hash`, `allowed_tools` |
| `authorization_details` | richer tool bounds | structured audience-specific request object |

### Pattern A: Scope strings plus Mission claims

Use ordinary OAuth scopes for coarse tool families and add Mission projection claims such as `mission_id`, `constraints_hash`, and `allowed_tools`.

### Pattern B: `authorization_details`

Use [RAR](https://datatracker.ietf.org/doc/html/rfc9396) when the audience needs structured tool authorization input.

Example token request:

```json
{
  "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
  "subject_token": "agent-session-token",
  "audience": "https://mcp.example.com/finance",
  "resource": "https://mcp.example.com/finance",
  "authorization_details": [
    {
      "type": "mcp_tool_access",
      "mission_id": "mis_01JR9S4YDY6QF5Q9Q54M0YB4V1",
      "tools": ["erp.read_financials", "docs.write"],
      "stage_constraints": ["release_gate"]
    }
  ]
}
```

Recommendation:

- start with **scopes + Mission projection**
- add `authorization_details` only when the audience needs structured bounds

### Scopes, Cedar, and FGA

Tool authorization needs three layers, and they should not be confused.

| Layer | What it is good for | What it should not be asked to do |
|---|---|---|
| OAuth scopes | coarse audience and capability families | current Mission lifecycle, approval state, or delegation semantics |
| Cedar | Mission-aware policy decisions across host, AS, MCP, and commit boundary | per-resource sharing graphs or document collaboration state |
| FGA / relationship auth | concrete resource-instance checks such as document, folder, project, or dataset access | acting as the durable source of Mission authority |

Use scopes for the coarse tool family boundary:

- `mcp.tools.call`
- `finance.read`
- `docs.write`
- `ticket.create`

Use Cedar for the Mission-aware decision:

- is this tool call inside the current Mission
- is the `constraints_hash` current
- is the Mission active
- is approval present for this stage-gated action
- is this delegated actor still inside its bound

Use FGA only where the downstream tool or backend needs instance-level checks:

- can this principal write this document
- can this Mission access this folder
- can this actor open a ticket in this project
- can this agent read this dataset row set or collection

The layering rule is:

1. MAS owns the Mission authority record
2. AS projects Mission into scopes and audience-specific claims
3. host and MCP use Cedar to enforce Mission-aware policy
4. backend tools or APIs may apply FGA for concrete resource-instance checks

Example:

- token projection includes:
  - `scope = "mcp.tools.call docs.write"`
  - `mission_id`
  - `constraints_hash`
  - `allowed_tools = ["mcp__docs__docs.write", "mcp__docs__docs.publish"]`
- host and MCP enforce:
  - `docs.write` is in scope
  - Mission is active
  - publish is stage-gated
  - approval is current
- docs backend FGA enforces:
  - this Mission principal may write to folder `board/q2`
  - this principal may not publish into `external/investor-relations`

That is the intended split:

- scopes narrow the token
- Cedar evaluates Mission context
- FGA protects concrete resource instances
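A minimal sketch of that split for a single `docs.write` call. All data here is illustrative stand-in material (the inline mission record and folder set are not a MAS API); in the real design the middle check is a Cedar evaluation.

```python
# Hypothetical sketch of the layered authorization split for docs.write.
# Layer 2 is the token projection, layer 3 the Mission-aware policy,
# layer 4 the instance-level FGA check at the docs backend.

def authorize_docs_write(token: dict, folder: str) -> bool:
    # Layer 2: the AS already narrowed the token's scopes.
    if "docs.write" not in token["scope"].split():
        return False  # outside the coarse scope boundary

    # Layer 3: Mission-aware policy (Cedar in the real design).
    mission = {"status": "active", "constraints_hash": "sha256-abc123"}
    if mission["status"] != "active":
        return False
    if token["constraints_hash"] != mission["constraints_hash"]:
        return False  # stale projection; refresh required

    # Layer 4: instance-level FGA check at the docs backend.
    allowed_folders = {"board/q2"}  # illustrative relationship data
    return folder in allowed_folders

token = {
    "scope": "mcp.tools.call docs.write",
    "mission_id": "mis_01JR9S4YDY6QF5Q9Q54M0YB4V1",
    "constraints_hash": "sha256-abc123",
}
print(authorize_docs_write(token, "board/q2"))                     # True
print(authorize_docs_write(token, "external/investor-relations"))  # False
```

Note that each layer can only narrow the decision: a permissive FGA graph cannot rescue a call the Mission-aware layer already denied.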

### Token validation modes

Two token-validation patterns appear in this note. Choose one per deployment context and apply it consistently.

**Opaque or introspected tokens:** use when the AS is on the critical path, immediate revocation matters more than latency, and the consumer can tolerate a live introspection dependency. The AS returns `active: false` immediately after Mission revocation, and consumers rely on introspection for token liveness.

**Self-contained tokens plus live Mission freshness:** use when the consumer validates locally, latency matters, and a signal rail or MAS freshness check is already in place. The token signature and claims validate locally, but the consumer still checks `mission_id` and `constraints_hash` freshness against a local cache or live MAS state. Revocation becomes effective at the next freshness check, token exchange, `tools/call`, or commit-boundary call.

The implementation examples in this note mostly use the second model for MCP servers: self-contained token for ordinary validation, live Mission freshness for authority state. If you choose the first model instead, replace local `mission_revoked_or_stale(...)` checks with token introspection and keep the same fail-closed rules for commit boundaries.

| Validation Mode | Where Liveness Comes From | Revocation Speed | Dependency | Best Fit |
|---|---|---|---|---|
| opaque / introspected token | AS introspection | fast | live AS | central control, short tolerance for stale auth |
| self-contained token + Mission freshness | local JWT validation + MAS/cache freshness | next freshness check | signal rail or MAS freshness path | lower latency MCP and tool surfaces |
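A sketch of the second mode, assuming a hypothetical local cache fed by the signal rail; `mission_cache` and its field names are illustrative, and signature, audience, and expiry validation are assumed to have already passed.

```python
# Sketch of self-contained token validation plus a Mission freshness
# check. mission_cache stands in for a signal-rail-fed local cache or
# a live MAS lookup; unknown or revoked Missions fail closed.

mission_cache = {
    "mis_01JR9S4YDY6QF5Q9Q54M0YB4V1": {
        "status": "active",
        "constraints_hash": "sha256-abc123",
    }
}

def mission_revoked_or_stale(claims: dict) -> bool:
    record = mission_cache.get(claims["mission_id"])
    if record is None or record["status"] != "active":
        return True   # unknown or revoked Mission: fail closed
    return record["constraints_hash"] != claims["constraints_hash"]

claims = {"mission_id": "mis_01JR9S4YDY6QF5Q9Q54M0YB4V1",
          "constraints_hash": "sha256-abc123"}
print(mission_revoked_or_stale(claims))   # False: proceed

# Revocation lands in the cache; the token still validates locally,
# but the next freshness check denies.
mission_cache[claims["mission_id"]]["status"] = "revoked"
print(mission_revoked_or_stale(claims))   # True: deny at next check
```

This is why revocation speed in this mode is bounded by the freshness-check interval, not by token expiry.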

### What to expose and what to keep internal

Not all Mission state belongs in every projection. A token that crosses a trust boundary or reaches a third-party MCP server should carry only what that system needs to make its decision.

**Always safe to include in tokens:**
- `mission_id` (opaque reference, not Mission content)
- `constraints_hash` (version handle for cache invalidation)
- audience-specific `allowed_tools`
- `scope` or `authorization_details` scoped to this audience

**Include with care:**
- `stage_constraints`: appropriate for the MCP server or resource that enforces the gate; not appropriate in tokens sent to external domains that do not need to know the internal approval model
- `act` chain: appropriate for audit; may reveal more about the internal principal chain than an external partner needs to see

**Keep internal to the MAS:**
- `purpose` text and `summary`
- `open_questions` from shaping
- `delegation_bounds` structure
- full resource class and action sets (project only the audience-specific subset)
- approval evidence beyond a reference ID

The test is simple: if the receiving system cannot act on a field, it should not see it. Design projections for the narrowest consumer, not the broadest.
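One way to enforce that rule is a projection function that only copies audience-safe fields. This is a sketch with an illustrative record shape, not a MAS API; the field names follow the lists above.

```python
# Sketch: project a full Mission record into the narrow claim set for
# one audience. Internal fields (purpose, open_questions, full tool
# sets) never leave the MAS; only the audience-specific subset does.

SAFE_FIELDS = {"mission_id", "constraints_hash"}

def project_for_audience(mission: dict, audience: str) -> dict:
    projection = {k: mission[k] for k in SAFE_FIELDS}
    # audience-specific subset only, never the full tool set
    projection["allowed_tools"] = mission["tools_by_audience"].get(audience, [])
    return projection

mission = {
    "mission_id": "mis_01JR9S4YDY6QF5Q9Q54M0YB4V1",
    "constraints_hash": "sha256-abc123",
    "purpose": "Prepare the Q2 board packet",     # stays in the MAS
    "open_questions": ["final revenue number?"],  # stays in the MAS
    "tools_by_audience": {
        "https://mcp.example.com/docs": ["docs.read", "docs.write"],
    },
}

claims = project_for_audience(mission, "https://mcp.example.com/docs")
print("purpose" in claims)   # False: internal fields never cross
```

An allowlist projection like this fails safe: a new internal field added to the Mission record is excluded by default rather than leaked by default.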

#### Cross-domain privacy rules

Cross-domain flows need stronger minimization than same-domain flows.

Minimum rules:

- use directed or pairwise subject identifiers per target domain where possible
- do not expose internal purpose text, approval reason text, review packet contents, or free-form justifications outside the source domain
- do not expose full `act` chains cross-domain unless the target domain must evaluate them; prefer a reduced delegation reference
- do not expose internal resource classes or deny sets to another domain
- treat `mission_id` as an opaque reference, not a readable business identifier

The target domain should learn only:

- who or what is requesting access in that domain's terms
- what local audience or local scope is being requested
- enough correlation material to audit its own issuance and enforcement

## Cross-Domain Tool Access with ID-JAG (Advanced Profile)

The rest of this section is optional. Skip it entirely for the simplified single-domain core.

Mission-scoped OAuth tokens are enough inside one trust domain. When the agent needs to call tools across different organizational domains, add an identity-assertion step rather than stretching one local token across all of them.

This is where **ID-JAG** fits.

The current [Identity Assertion JWT Authorization Grant draft](https://datatracker.ietf.org/doc/html/draft-ietf-oauth-identity-assertion-authz-grant-02) defines ID-JAG as an identity assertion issued by an IdP authorization server for a resource authorization server in another trust domain. The practical pattern is:

1. the user authenticates to the enterprise IdP
2. the host or gateway asks the IdP AS for an **ID-JAG** targeted at domain B's authorization server
3. the IdP AS applies enterprise policy and returns the ID-JAG
4. the host exchanges the ID-JAG at domain B's authorization server for a domain-B access token
5. the host uses that domain-B token to call domain B's MCP server or API

That gives you one assertion bridge plus one local access token per target domain.
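A sketch of the exchange request at step 4, assuming the JWT bearer grant that the draft profiles for presenting the assertion at the resource authorization server; the scope value and token contents here are illustrative.

```python
# Sketch of exchanging an ID-JAG at domain B's AS for a domain-B
# access token. The assertion is the ID-JAG minted by the enterprise
# IdP AS; the scope is the domain-B-local authorization request.

def build_idjag_exchange(id_jag: str, local_scope: str) -> dict:
    return {
        "grant_type": "urn:ietf:params:oauth:grant-type:jwt-bearer",
        "assertion": id_jag,      # ID-JAG from the enterprise IdP AS
        "scope": local_scope,     # domain-B-local scope, not source-domain scope
    }

request = build_idjag_exchange("eyJhbGciOi...idjag", "partner.docs.read")
print(request["grant_type"])
```

Note that the request carries no Mission content: the target domain sees an identity assertion and a local scope request, nothing more.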

Be explicit about the boundary:

- **Mission** is the durable authority record
- **ID-JAG** is a cross-domain identity assertion
- **domain-local access tokens** are enforcement projections minted by each target domain

ID-JAG should not become the place where Mission meaning lives. It does not replace MAS, it does not carry the full authority model, and it does not make the target domain subordinate to the source domain. It only gives the target domain enough identity and delegation context to decide whether it wants to mint its own local token.

#### Cross-domain traceability without oversharing

The source domain still needs traceability across the boundary, but that traceability should not require exposing internal context to the target domain.

Use this split:

- **source domain retains full audit linkage**
  - `mission_id`
  - internal approval references
  - full actor and host lineage
- **target domain receives a minimal correlation handle**
  - pairwise subject or directed identifier
  - external correlation ID
  - domain-local token ID

The source domain maps external correlation IDs back to internal Mission records locally. The target domain does not need the internal Mission contents to remain auditable inside the source domain.

### Cross-domain flow

For a cross-domain tool call, the flow looks like:

1. MAS determines the Mission permits access to partner domain B
2. AS or gateway asks the enterprise IdP AS for an ID-JAG with:
   - `aud =` domain B authorization server
   - `client_id =` the registered client in domain B
   - optional `resource`
3. enterprise IdP policy decides whether this cross-domain identity bridge is allowed
4. host exchanges the ID-JAG at domain B AS together with a **domain-B-local authorization request** for a domain-B access token
5. domain B MCP server or API evaluates the resulting domain-B token locally

The resource domain stays sovereign. Domain B does not trust the enterprise host directly. It trusts its own AS, which trusts the incoming ID-JAG.

That distinction matters:

- MAS decides whether the Mission allows crossing into domain B
- the enterprise IdP AS decides whether it will issue the ID-JAG
- domain B AS decides whether it will mint a domain-B token for the requested local scope
- domain B resource or MCP server decides whether the resulting token is enough for the requested action

Those are related decisions, but they are not the same decision.

### Cross-domain readiness tiers

Not every partner or external domain will support the ideal flow on day one.

Use this readiness model:

| Tier | Capability | What it supports |
|---|---|---|
| 0 | no federation support | no cross-domain Mission execution; manual handoff only |
| 1 | OAuth token issuance only | narrow local integration after bilateral onboarding |
| 2 | ID-JAG or equivalent identity bridge | domain-local token minting with explicit trust setup |
| 3 | full Mission-aware integration | local token minting, auditable correlation, reliable revocation handling |

Do not assume every partner can start at Tier 2 or 3.

#### Cross-domain onboarding checklist

Before enabling one partner domain, confirm:

- target domain supports the required exchange flow
- target domain has stable local scopes or tool projection
- both sides agree on correlation and incident contacts
- privacy review approves the external correlation fields
- revocation and outage behavior are understood on both sides

### Cross-domain sequence diagram

```mermaid
sequenceDiagram
    participant Host as Host / Gateway
    participant MAS as MAS
    participant IdP as Enterprise IdP AS
    participant BAuth as Domain B AS
    participant BRes as Domain B MCP/API

    Host->>MAS: confirm domain B allowed for Mission
    MAS-->>Host: allow / deny
    Host->>IdP: request ID-JAG
    IdP-->>Host: ID-JAG or deny
    Host->>BAuth: exchange ID-JAG + local auth request
    BAuth-->>Host: domain-B token or deny
    Host->>BRes: call with domain-B token
    BRes->>BRes: enforce local policy
    BRes-->>Host: result
```

### Human approval and auto-approval for cross-domain requests (Advanced Profile)

Cross-domain access is exactly where MAS approval mode matters.

If organizational policy already allows:

- `finance.read`
- approved ticket creation
- approved document-signature preparation

for a known partner set, the MAS can auto-approve the Mission and let token exchange proceed without a user prompt.

If the Mission requests:

- a new partner domain
- external publication
- privileged write access in another domain

the MAS should escalate to human approval before requesting the ID-JAG.

That keeps the approval decision in the authority plane rather than burying it inside token exchange.

## How the Components Interact

### 1. User request arrives

The user asks:

> Prepare the Q2 board packet, reconcile the final numbers, and call back for approval before releasing the final presentation.

The agent host:
- sends the request to the Mission shaper
- receives a structured proposal
- forwards it to the MAS for approval

### 2. MAS approves the Mission

The MAS:
- stores the approved Mission
- compiles the Mission Authority Model
- returns:
  - `mission_id`
  - lifecycle state
  - approved tools / resource classes
  - any stage constraints

At this point, the agent can begin planning. It still cannot assume arbitrary tool access.

### 3. Agent asks for tool-facing access tokens

OAuth enters at token projection time.

Do not issue one broad token for "everything the agent might need." Issue **derived tool-facing tokens** from the AS based on Mission projections.

Use:

- [OAuth Token Exchange](https://datatracker.ietf.org/doc/html/rfc8693) to derive downstream tokens for specific tools or MCP servers
- [Rich Authorization Requests](https://datatracker.ietf.org/doc/html/rfc9396) when you need structured authorization input rather than flat scopes

A practical pattern is:

1. Agent host presents its subject token to the AS.
2. AS looks up `mission_id` in the MAS.
3. AS issues a token for a specific audience:
   - an MCP server
   - or a direct SaaS/API tool
4. Token contains:
   - standard OAuth claims
   - a Mission projection
   - narrowed scopes / `authorization_details`
   - optional `act` chain for delegated execution

Use the [OAuth Actor Profile](https://mcguinness.github.io/draft-mcguinness-oauth-actor-profile/draft-mcguinness-oauth-actor-profile.html) `act` claim only for delegation lineage. It is not a substitute for Mission state.

Example issued token claims:

```json
{
  "iss": "https://auth.example.com",
  "sub": "user_123",
  "aud": "https://mcp.example.com/finance",
  "scope": "mcp.tools.list mcp.tools.call finance.read docs.write",
  "mission_id": "mis_01JR9S4YDY6QF5Q9Q54M0YB4V1",
  "constraints_hash": "sha256-abc123",
  "allowed_tools": ["erp.read_financials", "docs.read", "docs.write"],
  "stage_constraints": ["release_gate"],
  "act": {
    "sub": "agent_research_assistant"
  }
}
```

Also decide how tokens are bound to the runtime.

If these are plain bearer tokens, a leaked token is just a movable credential. The implementation needs one of:

- a trusted local gateway that holds tokens and never exposes them to the model
- sender-constrained tokens
- or another transport binding that ties the token to the calling runtime

Do not leave that decision implicit.

**If using a trusted local gateway:** it must be a separate OS process, not a subprocess spawned by the agent host or a script reachable via the model's Bash execution environment. Tokens are stored in OS-backed secure storage (macOS Keychain, Linux kernel keyring, or a secrets manager sidecar) — not in the working directory, not in environment variables the model can read, not in files the `Bash` hook can access. The gateway exposes a local Unix socket or loopback endpoint. The agent host calls it to attach tokens to outbound requests; the model cannot call it directly. This separation is what makes the local gateway meaningful as a containment boundary: if the model is compromised via prompt injection, it cannot exfiltrate tokens by reading a file or calling a subprocess that writes them to stdout.

### What the AS should evaluate before issuing a token

The AS should not mint a Mission-scoped token just because the session asks for one. It should evaluate a Cedar request first:

**Required inputs**
- subject token
- requested audience
- requested scopes or `authorization_details`
- `mission_id`
- current Mission status and `constraints_hash`

**Processing**
- validate subject token
- load current Mission projection
- evaluate issuance policy
- apply audience-specific projection
- mint narrow token only if policy permits

**Required outputs**
- MCP transport token or direct API token
- denial with reason

**Failure behavior**
- inactive Mission -> deny
- stale `constraints_hash` -> refresh required
- audience outside Mission scope -> deny
- step-up approval missing -> deny

**Acceptance checks**
- token audience matches one concrete consumer
- token does not include denied tools or actions
- token lifetime does not exceed Mission lifetime
- token issuance is reproducible from the current Mission projection

The Cedar issuance request itself has this shape:

- **principal**: current agent host or workload
- **action**: `issue_token`
- **resource**: MCP server audience or downstream API audience
- **context**:
  - `mission_id`
  - `constraints_hash`
  - requested scopes or `authorization_details`
  - current Mission status
  - delegation depth
  - step-up approval state

The issuance decision should answer three concrete questions:

- can this principal obtain a token for this audience
- can that token include these projected tools or actions
- is step-up approval required before issuance
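Those three questions can be sketched as sequential checks. In the real design this is a Cedar evaluation against the request shape above; every structure and message string here is illustrative.

```python
# Sketch of the AS issuance decision as three ordered checks:
# principal/audience, projected tools, then step-up approval.

def decide_issuance(principal: str, audience: str, requested_tools: list,
                    mission: dict) -> str:
    # 1. can this principal obtain a token for this audience
    if mission["status"] != "active":
        return "deny: mission inactive"
    if audience not in mission["audiences"]:
        return "deny: audience outside Mission scope"

    # 2. can that token include these projected tools or actions
    if not set(requested_tools) <= set(mission["tools_by_audience"][audience]):
        return "deny: requested tools exceed Mission projection"

    # 3. is step-up approval required before issuance
    if mission["step_up_required"] and not mission["step_up_approved"]:
        return "deny: step-up approval missing"
    return "permit"

mission = {
    "status": "active",
    "audiences": {"https://mcp.example.com/finance"},
    "tools_by_audience": {
        "https://mcp.example.com/finance": ["erp.read_financials", "docs.write"],
    },
    "step_up_required": False,
    "step_up_approved": False,
}
print(decide_issuance("host_1", "https://mcp.example.com/finance",
                      ["erp.read_financials"], mission))   # permit
```

Every deny carries a reason, which matters for the acceptance check that issuance is reproducible from the current Mission projection.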

### 4. MCP client uses OAuth bearer tokens

The current [MCP authorization spec](https://modelcontextprotocol.io/specification/2025-03-26/basic/authorization) is explicit that MCP authorization is transport-level OAuth for HTTP transports. MCP clients use standard `Authorization: Bearer` headers, and MCP servers validate tokens using ordinary OAuth patterns.

That means your agent host or MCP gateway should:

- discover MCP authorization metadata
- obtain a token for the MCP server audience
- attach `Authorization: Bearer <token>` on every MCP HTTP request

If the agent needs multiple MCP servers, mint multiple audience-specific tokens.

Split issuance into two token shapes:

1. **MCP transport token**
   Audience is the MCP server. Claims identify the Mission and the tool families visible there.
2. **Direct API token**
   Audience is the downstream SaaS or service API. Claims and `authorization_details` are specific to that API surface.

These should not be treated as interchangeable just because both are OAuth tokens.

The enforcement rule should be simple:

- **MCP transport token** authorizes access to an MCP server surface
- **direct API token** authorizes access to a downstream API surface
- neither token is the Mission
- both tokens are cacheable projections that can be invalidated when Mission state changes

### 5. MCP server filters tools based on Mission

Per the current [MCP tools spec](https://modelcontextprotocol.io/specification/2025-03-26/server/tools), servers expose tools through `tools/list` and `tools/call`. They must validate inputs, implement access controls, rate limit invocations, and sanitize outputs. The same spec also says tool annotations are untrusted unless they come from a trusted server.

So your MCP server should not trust model behavior or tool annotations as the security mechanism. It should:

#### On `tools/list`

- validate the bearer token
- read `mission_id`, allowed scopes, and allowed tools
- call the MAS or policy cache if needed
- return only the tools allowed for that Mission

If the Mission allows only `erp.read_financials` and `docs.write`, then `email.send` and `slack.post` do not appear in the tool list. This is convenience, not security. The real control stays on `tools/call`.

#### On `tools/call`

- validate the bearer token again
- verify the tool name is allowed
- validate input arguments against schema
- apply Mission-side constraints
- apply rate limits / quotas / budget checks
- invoke the downstream tool only if all checks pass

For ordinary low-risk reads, this check can often be token-only:

- validate token
- verify tool allowance
- verify local Cedar decision

For higher-risk actions, require a live lookup or a commit-boundary call:

- live MAS status check
- fresh Cedar evaluation with current context
- approval check
- anomaly check

One practical decision rule is:

- **token-only**
  - low-risk reads
  - stable Mission status
  - no stage gate
  - no anomaly signal
- **token + local Cedar**
  - ordinary writes inside a trusted boundary
  - bounded tool arguments
  - current policy bundle still valid for the current `constraints_hash`
- **live MAS or commit-boundary lookup**
  - external communication
  - publication
  - deletion
  - payment
  - high-risk anomaly state

That keeps the common path fast without flattening all calls to the same risk level.
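The decision rule can be sketched as a small classifier; the action names, tier labels, and live-lookup set are illustrative.

```python
# Sketch of the token-only / token-plus-Cedar / live-lookup decision
# rule for tools/call. Stage-gated actions and anomaly state always
# escalate to the live path.

LIVE_LOOKUP_ACTIONS = {"email.send", "docs.publish", "records.delete",
                       "payments.initiate"}

def required_check_tier(action: str, is_write: bool, anomaly: bool,
                        stage_gated: bool) -> str:
    if action in LIVE_LOOKUP_ACTIONS or anomaly or stage_gated:
        return "live-mas-or-commit-boundary"
    if is_write:
        return "token-plus-local-cedar"
    return "token-only"

print(required_check_tier("erp.read_financials", False, False, False))  # token-only
print(required_check_tier("docs.write", True, False, False))   # token-plus-local-cedar
print(required_check_tier("docs.publish", True, False, True))  # live-mas-or-commit-boundary
```

The important property is that escalation is monotone: anomaly signals and stage gates can only push a call toward the live path, never away from it.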

## Containment Is Not Optional

A Mission architecture without containment is only half an architecture.

The MAS answers:
- what was approved
- what is active
- what is allowed in principle

Containment answers:
- what should be blocked right now
- what needs fresh approval
- what should be stopped when the model drifts

You need both.

In Claude Code terms, containment starts in the host before it reaches the tool server.

The host-side rule should be:

- block what is clearly out of Mission before it reaches any external tool
- require explicit approval for actions that need a user or operator checkpoint
- emit signals even when the downstream tool never runs

### Tool-boundary containment

At the MCP server or tool gateway:

- filter visible tools
- validate input schemas
- enforce size limits and rate limits
- block dangerous argument combinations
- enforce data egress restrictions
- sanitize outputs before they go back to the model

Examples:
- deny `email.send` if the Mission has no external communication permission
- deny `docs.publish` until `release_gate` is satisfied
- block a `sql.query` argument that touches tables outside approved resource classes

### Commit-boundary containment

Some actions should not be fully decided at `tools/call`.

Examples:
- publishing a final board packet
- sending external email
- initiating payment
- deleting records

For those, add a **commit boundary**:

1. Tool request reaches a side-effecting operation.
2. System revalidates:
   - Mission still active
   - actor still valid
   - stage constraint satisfied
   - user approval still current
   - anomaly checks clean
3. Only then is the irreversible action committed.

That can be implemented:
- in the MCP server
- in a downstream service
- in a workflow engine
- or in a gateway immediately before the write

But it must exist somewhere non-bypassable.
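A sketch of the revalidation step, with each check reduced to a boolean stand-in for the live MAS, approval, and anomaly lookups described above. The key detail is that a missing anomaly signal fails closed.

```python
# Sketch of commit-boundary revalidation before an irreversible write.
# Every field here stands in for a live lookup; any failure or missing
# signal must deny the commit.

def commit_allowed(state: dict) -> bool:
    checks = (
        state.get("mission_active"),
        state.get("actor_valid"),
        state.get("stage_constraint_satisfied"),
        state.get("approval_current"),
        not state.get("anomaly_flag", True),  # missing signal -> fail closed
    )
    return all(checks)

state = {
    "mission_active": True,
    "actor_valid": True,
    "stage_constraint_satisfied": True,
    "approval_current": True,
    "anomaly_flag": False,
}
print(commit_allowed(state))                                  # True: commit
print(commit_allowed({**state, "approval_current": False}))   # False: blocked
```

Because `anomaly_flag` defaults to `True` when absent, an unreachable anomaly service blocks the commit rather than silently passing it.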

#### Commit-boundary ownership classes

In production, the commit boundary often belongs to the downstream system of record, not to the host or MCP layer.

Use this ownership model:

| Boundary owner | Best fit |
|---|---|
| host | local-only actions that do not leave the runtime or workspace |
| MCP server | tool-local effects where the MCP server is already the authoritative write path |
| downstream service | payments, publication, tickets, signatures, records of truth |
| workflow engine | multi-step approvals or commits that span systems |

Default rule:

- if a downstream system owns the authoritative write, that system should own the final commit boundary
- host and MCP may still do prechecks, but they are not the final authority

**Commit-boundary serialization — owner-scoped, not MAS-global:** the design does **not** require a single MAS-owned lock for every gated side effect in a Mission. That approach makes MAS a hot-path lock service and serializes unrelated actions unnecessarily.

Use this rule instead:

- serialization, idempotency, and duplicate suppression belong to the **owner of the irreversible side effect**
- if the downstream system of record owns the final write, it should own the commit token, lease, or idempotency key
- if a workflow engine owns the side effect, it should own the in-flight approval or commit lease
- MAS records approval state and current `constraints_hash`, but it is not the global runtime lock manager

Minimum requirement:

- every commit-boundary action must carry a stable `commit_intent_id`
- the final enforcement owner must reject duplicate or conflicting commits for the same `commit_intent_id`
- if a deployment needs broader serialization, scope it to the concrete resource or effect class, not to the whole Mission
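A sketch of owner-scoped idempotency keyed on `commit_intent_id`, with an in-memory registry standing in for the persistent store the system of record would keep alongside the write itself.

```python
# Sketch of duplicate suppression at the owner of the irreversible
# side effect. The same commit_intent_id must never commit twice;
# unrelated intents under the same Mission do not block each other.

class CommitOwner:
    def __init__(self):
        self._seen: dict[str, str] = {}  # commit_intent_id -> payload

    def commit(self, commit_intent_id: str, payload: str) -> str:
        if commit_intent_id in self._seen:
            return "duplicate-rejected"
        self._seen[commit_intent_id] = payload
        return "committed"

owner = CommitOwner()
print(owner.commit("ci_01", "publish board packet"))  # committed
print(owner.commit("ci_01", "publish board packet"))  # duplicate-rejected
print(owner.commit("ci_02", "update ticket"))         # committed: unrelated
```

This is the owner-scoped property in miniature: `ci_02` proceeds regardless of `ci_01`, because serialization is per intent, not per Mission.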

**Why this matters:** two gated actions under the same Mission may be unrelated. A publish to one docs workspace and a ticket update in another system should not block each other just because they share a Mission. Owner-scoped commit control preserves integrity without turning MAS into a distributed lock service.

**Availability consequence:** MAS is still a synchronous dependency for approval state and freshness on some commit-boundary flows, but it is no longer the lock owner for every irreversible write. That reduces hot-path centralization and lets downstream systems use their own stronger write-serialization semantics.

**How this reconciles with "commit-boundary lock in MAS":** MAS holds an advisory coordination lock that gates cross-session access to the commit boundary — a short-lived lock acquired before the step-up prompt and released after the downstream write completes. The downstream system owns idempotency via `commit_intent_id`, which survives the release of the MAS lock. These are two different locks at two different scopes, and both must be present. See [Things not to simplify away](#things-not-to-simplify-away) for the full reconciliation.

#### Commit-boundary user experience

The commit boundary is the moment where an irreversible side effect becomes real. Users and models should both understand when they are at a commit boundary and what happens there.

**What the model sees at a commit boundary:**

Before the host or MCP server executes the side effect, the model should receive context that this is a point of no return:

```
[Commit boundary: "Publish Q2 Board Packet"

This action is irreversible. Before proceeding I will:
1. Verify your Mission is still active (live check with Mission Authority Service)
2. Verify the required Controller approval is still valid
3. Check there are no active anomaly flags on this session

If all checks pass, the document will be published immediately.
If any check fails, the action will be denied and you will be notified.]
```

This message should be injected by the host's `PermissionRequest` hook before the action reaches the commit-boundary owner.

**Four outcomes at the commit boundary and how they appear to the user:**

| Outcome | What the host tells the model | What the model says to the user |
|---|---|---|
| All checks pass, action committed | `[Commit succeeded: "Publish Q2 Board Packet" completed at 14:32 UTC. Audit record: evt_01xyz]` | "Done. The board packet was published successfully." |
| Live check fails — Mission changed | `[Commit denied: Mission constraints changed since the last check. Refreshing and retrying.]` | "The policy was updated just as I was completing that step. Refreshing and trying again." |
| Approval missing or expired | `[Commit denied: required approval is not present or has expired. Action blocked.]` | "I can't complete that step — the approval window closed. I'll request a new approval." |
| MAS unreachable | `[Commit denied: cannot verify Mission status. Action blocked until connectivity is restored.]` | "I can't complete that right now — I'm unable to verify your current Mission. I'll try again in a moment." |

**What the user should see after a successful commit:**

The host should surface a brief confirmation that includes:
- what action was taken
- when it happened (timestamp)
- what the user can do to review or undo (if applicable)

Example: "I published the Q2 Board Packet to the board folder at 2:32 PM. Board members can view it at [link]. Contact the board secretary if you need to make corrections."

**What the user should NOT see:**

- `commit_intent_id` values
- `constraints_hash` at time of commit
- internal approval object IDs
- raw error codes from commit-boundary failures

### Runtime feedback to the MAS

Containment also has to report back.

When the agent or tool layer sees:
- repeated denied calls
- prompt injection indicators
- attempts to access out-of-scope tools
- policy violations
- release-gate attempts before approval

it should send runtime events back to the MAS.

That allows the MAS to:
- suspend the Mission
- narrow remaining authority
- require step-up approval
- or revoke the Mission entirely

Without that feedback loop, containment is just local damage control.

### What this architecture does not address: instruction sequences

This limitation is central enough that it should shape how the rest of the note is read: the architecture gives you strong **point-in-time** authority checks, bounded token projection, auditable approvals, and revocation-aware containment. It does **not** prove that a long sequence of individually allowed actions is globally safe.

Cedar evaluates individual `tools/call` requests. Each call is policy-evaluated and either permitted or denied. A sub-agent's tool calls are each checked against its Mission-scoped policy.

This architecture does not evaluate instruction sequences. A series of individually permitted tool calls whose combined effect is outside Mission intent is not caught by any of the enforcement mechanisms in this design. Example: read-only access to a financial database + read-only access to a draft document service + draft-write access = an agent that can exfiltrate financial data into a document that the user then exports. Each individual call is permitted; the sequence is not.

This is a policy layer limitation, not a bug in the implementation. Per-call authorization is necessary but not sufficient for behavioral containment. What fills the gap:

- **anomaly detection in signals**: the MAS receives every tool call signal; repeated patterns (N reads to sensitive resource + draft write) can be detected as an anomaly signal and trigger suspension
- **session-level budget limits**: the template can specify maximum call counts per resource class within a session; exceeding the budget triggers step-up review
- **commit-boundary as a sequence break**: publish and send operations require commit-boundary revalidation, which includes a full Mission liveness check — a human reviewing a publish action sees the full context of what was drafted

None of these eliminate the problem; they reduce the window and raise the cost. A Mission architecture that claims to fully address instruction-sequence risks would be overclaiming. Design the system to detect anomalous sequences and suspend, not to prove none can occur.
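The session-level budget limit can be sketched as a counter per resource class, with illustrative budgets; exceeding a budget escalates to step-up review rather than silently denying.

```python
# Sketch of session-level budget limits per resource class. Budgets
# come from the Mission template; exceeding one triggers step-up
# review, which is a containment signal, not a hard permanent deny.

from collections import Counter

BUDGETS = {"finance.read": 50, "docs.write": 20}  # per-session, illustrative

class SessionBudget:
    def __init__(self):
        self.counts = Counter()

    def record_call(self, resource_class: str) -> str:
        self.counts[resource_class] += 1
        limit = BUDGETS.get(resource_class)
        if limit is not None and self.counts[resource_class] > limit:
            return "step-up-review"  # budget exceeded: suspend and escalate
        return "ok"

session = SessionBudget()
for _ in range(50):
    session.record_call("finance.read")
print(session.record_call("finance.read"))  # step-up-review on call 51
```

As the residual-risk table notes, a budget like this caps single-session read volume but does nothing against slow exfiltration spread across many sessions under the threshold.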

**What this architecture bounds (not eliminates):**

| Sequence risk | How it is bounded | What residual risk remains |
|---|---|---|
| Exfiltration via permitted tools | `cross_resource_exfil_pattern` anomaly detection; session budgets cap read volume | A slow exfiltration across many sessions within budget won't trigger the anomaly threshold |
| Incremental scope expansion across sessions | Mission state is per-Mission, not per-session; each session binds to current authority | An attacker with sustained access and patience could spread a sequence across many approved Missions |
| Tool sequence producing unintended cumulative side effect | Commit-boundary forces human review before publish/send | Read-only cumulative effects (context enrichment for later prompt injection) are not bounded |
| Prompt injection via tool results | `sanitize_tool_result` heuristics + authority-in-MAS model | Sophisticated injection that doesn't match known patterns may get through |

**What would be needed to bound these residuals:**

- Cross-session correlation: the MAS would need to persist and analyze tool call history across multiple Missions for the same user and purpose class. This is not in scope for v1 but is the right direction for v2 behavioral analysis.
- Intent-relative sequence evaluation: policy evaluated against a declared intent model, not just individual call permissibility. This requires research-grade ML and is out of scope for any near-term deployment.
- Trusted execution environments: if the model runtime itself is trusted, exfiltration via tool results becomes harder. This is a model deployment question, not a protocol question.

**What this architecture is correctly designed to do:** contain single-session authority, make revocation fast and reliable, and audit enough to reconstruct what happened. For most enterprise deployments, that is the right trade-off. Behavioral sequence safety is a harder problem that should be stated honestly, not papered over with speculative controls.

## Host Integration Examples

The next sections show concrete host integrations. Claude Code is the primary reference because its hook surface is explicit. OpenClaw is an illustrative integration pattern using the same Mission contracts.

## Claude Code as the Agent Host

Anthropic's [Claude Code hooks](https://docs.anthropic.com/en/docs/claude-code/hooks) make this architecture concrete because they provide deterministic control points before and after prompts, tool calls, and stop events.

### Host contract

**Required inputs**
- current Mission context
- local policy bundle or residual policy surface
- tool invocation events
- approval state
- risk state

**Processing**
- query before plan
- intercept tool calls
- evaluate Cedar locally
- ask, allow, or deny
- emit runtime signals

**Required outputs**
- tool execution decision
- updated session context
- audit or anomaly signal

**Failure behavior**
- missing Mission context -> restricted mode
- stale local cache -> refresh capability snapshot before continuing
- approval required and absent -> deny or defer
- policy engine unavailable -> fail closed for high-risk actions

**Acceptance checks**
- no external tool call bypasses host-side evaluation
- host can block or defer stage-gated actions
- denied actions produce signals
- Mission changes invalidate host cache before next high-risk action

The most useful hooks here are:

- `UserPromptSubmit`
- `PreToolUse`
- `PostToolUse`
- `Stop`
- `SubagentStop`
- `SessionStart`
- `PermissionRequest`
- `PermissionDenied`

#### What each hook should do

#### `UserPromptSubmit`

Use this hook to:

- create or look up the active Mission
- inject Mission summary and current constraints into context
- block prompts that attempt to bypass the Mission

The hooks reference says `UserPromptSubmit` can add context or block prompt processing. Use it to ensure the session starts inside a Mission.

#### `PreToolUse`

Anthropic's hooks support `permissionDecision = allow | deny | ask` for `PreToolUse`. Use it to:

- map the requested Claude Code tool call to a Mission action
- evaluate Cedar with:
  - principal = agent host / user / workload
  - action = requested tool invocation
  - resource = file, command, MCP tool, or external service
  - context = Mission state, stage constraints, runtime risk
- deny or ask when the call is outside the Mission

Claude Code's hook input includes `tool_name`, `tool_input`, and `tool_use_id` for `PreToolUse`, which is enough to map a concrete tool invocation into a Cedar request before execution.

Map host events like this:

- `Bash` command to `host.exec`
- `Write` to `workspace.write`
- `mcp__finance__erp.read_financials` to `tool.call`

and then evaluate those against Mission instead of trusting the model's explanation.
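A minimal sketch of that mapping as a shell helper for the `PreToolUse` hook (the function name is illustrative; the action names are the Mission vocabulary used above):

```bash
# Map a Claude Code tool name to a Mission action class.
# Illustrative helper; extend the case arms as the tool surface grows.
map_tool_to_action() {
  case "$1" in
    Bash)                 echo "host.exec" ;;
    Write|Edit|MultiEdit) echo "workspace.write" ;;
    mcp__*)               echo "tool.call" ;;       # any MCP-qualified tool
    *)                    echo "unknown" ;;         # fail closed on unknown
  esac
}

map_tool_to_action "mcp__finance__erp.read_financials"   # tool.call
```

An `unknown` result should be treated as a deny, consistent with the fail-closed behavior in the host contract.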

#### `PermissionRequest`

Use this hook for actions that are allowed in principle but require an explicit checkpoint.

Claude Code's `PermissionRequest` hook can allow or deny on behalf of the user and can also update tool input. In a Mission architecture, use it to:

- pause on stage-gated actions
- collect a fresh approval token or approval event
- downgrade or narrow the tool input if policy requires a safer variant

This is a good fit for "ask at the commit boundary" behavior.

#### `PermissionDenied`

Use this hook to turn denied tool attempts into governance signals.

Claude Code emits `PermissionDenied` when auto mode blocks a tool call, and it exposes the denied `tool_name`, `tool_input`, and reason. That gives the host a clean place to:

- log repeated attempts
- raise the session risk score
- tell the MAS that the Mission may need narrowing or suspension

#### `PostToolUse`

Use this hook to:

- inspect tool output
- detect anomalies or prompt-injection indicators
- send audit and anomaly events to the MAS
- add blocking feedback to Claude Code if the call succeeded technically but violated policy expectations

Anthropic's hooks support `additionalContext` and blocking feedback after tool execution. That is a clean place to feed policy findings back into the session.

#### `Stop` and `SubagentStop`

Use these hooks to prevent Claude Code from declaring completion while required Mission obligations remain open:

- unresolved approval gate
- required audit log not emitted
- required artifact not written
- pending anomaly review

If Mission requires a release gate, `Stop` is one of the places to make sure the host does not silently stop before the governance loop is complete.
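A sketch of the obligation check behind a `check-open-gates.sh` hook. The obligation-status inputs and helper names are illustrative; the `block`/`approve` decision shape follows the hooks reference's documented Stop-hook output:

```bash
# Count obligations that are not yet resolved; completion must be blocked
# while any remain (open approval gate, missing audit log, missing artifact).
open_obligation_count() {
  local n=0 status
  for status in "$@"; do
    [ "$status" = "resolved" ] || n=$((n + 1))
  done
  echo "$n"
}

# Emit the Stop-hook decision: block with a reason while gates are open.
stop_decision() {
  local open
  open=$(open_obligation_count "$@")
  if [ "$open" -gt 0 ]; then
    echo "{\"decision\": \"block\", \"reason\": \"$open Mission obligation(s) still open\"}"
  else
    echo '{"decision": "approve"}'
  fi
}

stop_decision resolved pending resolved
# {"decision": "block", "reason": "1 Mission obligation(s) still open"}
```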

### Step-up approval inline UX for Claude Code

When the model reaches a stage-gated action, the `PermissionRequest` hook fires. The host must surface a clear, action-specific approval prompt to the user — not a generic "do you want to continue?" dialog.

**What the approval prompt must communicate:**

1. What action is about to happen (in plain language, not tool names)
2. What data or systems are affected (specifics, not abstract resource class names)
3. What happens after approval (the side effect, irreversibility if applicable)
4. Who will review it (for async approvals)

**Inline approval prompt example (synchronous, user is the approver):**

```
[Approval required]

I'm ready to publish the Q2 Board Packet to the SharePoint board folder.

What will happen:
- The document "Q2 Board Packet Final.pdf" will be published to: Board/2026-Q2/
- This is irreversible — once published, the document is visible to all board members
- The publish will be logged to the audit trail

Do you want to proceed?
[Approve] [Deny] [See document first]
```

**Inline approval prompt example (async, requires external approver):**

```
[Step-up approval required]

Publishing the Q2 Board Packet requires Controller approval.
I've submitted a review request to the Finance Controller.

What happens next:
- You'll receive a notification when the Controller responds
- Once approved, I'll complete the publish automatically
- If denied, I'll explain what needs to change

This Mission will stay active while we wait.
[View pending request] [Cancel request]
```

**What the `permission-gate.sh` hook must do for stage-gated actions:**

```bash
#!/usr/bin/env bash

# Read hook input (Claude Code passes the hook payload as JSON on stdin)
HOOK_INPUT=$(cat)
TOOL_NAME=$(echo "$HOOK_INPUT" | jq -r '.tool_name')
MISSION_CONTEXT="$CLAUDE_PROJECT_DIR/.claude/mission_context.json"
MISSION_ID=$(jq -r '.mission_id' "$MISSION_CONTEXT")

# Check if this tool is gated
IS_GATED=$(jq -r --arg t "$TOOL_NAME" \
  '.stage_constraints[] | select(.tool == $t) | .gated' "$MISSION_CONTEXT")

if [ "$IS_GATED" != "true" ]; then
  # Not gated — allow
  echo '{"permissionDecision": "allow"}' 
  exit 0
fi

# Check for existing valid approval
APPROVAL=$(curl -sf "$MAS_URL/missions/$MISSION_ID/approvals/stage_gate?tool=$TOOL_NAME" \
  -H "Authorization: Bearer $MAS_TOKEN")

STATUS=$(echo "$APPROVAL" | jq -r '.status')

if [ "$STATUS" = "granted" ]; then
  # Approval already in hand — check it hasn't expired
  EXPIRES=$(echo "$APPROVAL" | jq -r '.expires_at')
  NOW=$(date -u +%s)
  EXP=$(date -u -d "$EXPIRES" +%s 2>/dev/null || date -ju -f "%Y-%m-%dT%H:%M:%SZ" "$EXPIRES" +%s)
  
  if [ "$NOW" -lt "$EXP" ]; then
    echo '{"permissionDecision": "allow"}'
    exit 0
  fi
fi

# No valid approval — surface the step-up prompt
DISPLAY_NAME=$(echo "$APPROVAL" | jq -r '.tool_display_name // "this action"')
APPROVER=$(jq -r --arg t "$TOOL_NAME" \
  '.stage_constraints[] | select(.tool == $t) | .approver_role' \
  "$CLAUDE_PROJECT_DIR/.claude/mission_context.json")

cat <<EOF
{
  "permissionDecision": "deny",
  "userMessage": "This action (${DISPLAY_NAME}) requires ${APPROVER} approval before I can proceed. I've submitted a review request. You'll be notified when it's approved."
}
EOF

# Submit the approval request to MAS; discard the response body so it does
# not leak into the hook's JSON decision already written to stdout
curl -sf -o /dev/null -X POST "$MAS_URL/missions/$MISSION_ID/approvals/request" \
  -H "Authorization: Bearer $MAS_TOKEN" \
  -H "Content-Type: application/json" \
  -d "{\"tool\": \"$TOOL_NAME\", \"constraints_hash\": \"$(jq -r '.constraints_hash' "$CLAUDE_PROJECT_DIR/.claude/mission_context.json")\"}"

exit 0
```

**What the model should say after a step-up prompt is denied and approval is requested:**

The host injects: `[Approval requested for [action]: awaiting [approver role] review. Mission remains active for other work.]`

The model responds: "I've submitted this for [approver role] review. While we wait, I can continue with other parts of the work that don't require approval — want me to proceed with [next step]?"

This keeps the agent useful during the approval wait rather than blocking the entire session.

### Async approval pending UX

When a Mission enters `pending_approval` (for Mission creation, not step-up) or an async step-up approval is in flight, the user's session is still open. The host must handle this window explicitly — it is not a blocking state, but it is also not full operational status.

**What Claude Code shows when the Mission enters `pending_approval` with the session open:**

```
[Mission submitted for approval]

I've submitted your Mission request for Controller review. Here's what's happening:

  Mission: Q4 Board Packet Preparation
  Status: Waiting for Finance Controller approval
  Submitted: 2:14 PM

While we wait, I can help you with:
  - Planning the board packet structure
  - Drafting content that doesn't require finance data access
  - Answering questions about the process

[View pending request] [Work on something else] [Close session and return later]
```

**What Claude Code shows when the Mission enters `pending_approval` and the session is ending:**

```
[Before you go]

Your Mission request is pending approval. You can close this session — when the Controller approves it, you'll receive a notification.

To resume: open a new session and I'll pick up where we left off. Your approved Mission will be ready.

[Close session]
```

**How the user knows when an async approval arrives:**

Approval arrival is surfaced through two paths:

1. **In-session notification (if the session is still open):** the host polls the capability snapshot endpoint on a 30-second interval while a Mission is in `pending_approval`. When the snapshot returns `status: active`, the host surfaces:

```
[Mission approved]

Your Mission request has been approved by Finance Controller (Sarah K.).
I'm ready to begin. Want me to start with the financial data pull?
```

2. **Out-of-session notification:** the approval workflow service sends the configured notification (email, Slack, etc.) per the [Approval notification delivery spec](#approval-notification-delivery). The notification contains a deep link back to the session or a resumption link.

**Polling rules for in-session approval detection:**
- poll `POST /missions/{id}/capability-snapshot` every 30 seconds while session is open and Mission is in `pending_approval`
- stop polling if the session enters an idle state for more than 10 minutes; resume on next user activity
- do not surface "still waiting" messages on every poll — only surface the approval arrival
- if the Mission moves to `denied` during polling, surface: "Your Mission request was denied by [approver]. Reason: [denial_reason]. Want to revise and resubmit?"
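The polling rules above reduce to a small decision function; a sketch, assuming idle time is tracked in seconds (the function name is illustrative):

```bash
# Decide whether the host should issue the next capability-snapshot poll.
# Polls only while the Mission is pending_approval and the session has been
# active within the last 10 minutes (600 seconds).
should_poll() {
  local status="$1" idle_seconds="$2"
  [ "$status" = "pending_approval" ] || { echo "no"; return; }
  if [ "$idle_seconds" -le 600 ]; then echo "yes"; else echo "no"; fi
}

should_poll pending_approval 45    # yes
should_poll pending_approval 900   # no: idle, resume on next user activity
should_poll denied 45              # no: surface the denial instead
```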

**What happens if approval takes longer than the session lifetime:**

If the session ends before approval arrives:
1. The Mission remains in `pending_approval` — it does not expire because the session ended.
2. The user receives an out-of-session notification when approval arrives.
3. When the user opens a new session, the host runs the session-start checklist. `GET /missions?actor={actor_id}&status=pending_approval` returns the in-flight Mission. The host surfaces:

```
[Pending Mission]

You have a Mission request waiting for approval:
  Mission: Q4 Board Packet Preparation
  Status: Waiting for Finance Controller
  Submitted: 2 hours ago

[Check status] [Start something else] [Cancel request]
```

4. If the user checks status and the Mission is now `active`, the host proceeds with normal session-start flow for an active Mission.
5. If the Mission is still `pending_approval`, the host offers to notify the user when it's ready and allows other work to proceed.

**What the host must NOT do during `pending_approval`:**

- attempt tool calls that require the pending Mission's authority
- surface the Mission as `active` before the capability snapshot confirms `status: active`
- assume approval will arrive within any particular time window
- create a second Mission for the same purpose class while one is `pending_approval` (see `active_mission_exists` error)

### Parallel tool execution

Claude Code can execute multiple tool calls in parallel within one model turn. The enforcement model must handle this correctly.

**Rules for parallel `PreToolUse` events:**

- each hook invocation evaluates against the same `constraints_hash` snapshot cached at the start of the turn
- if two parallel calls both read the same cached Mission state and both return `allow`, both proceed; no locking is required for concurrent reads
- if one parallel call returns `ask` (a stage-gated action), that call is deferred; the other calls continue independently; the deferred call must not block its sibling calls from completing
- if a parallel call triggers a `constraints_hash` change (e.g., a `PostToolUse` signal causes a Mission amendment), the host must:
  1. allow in-flight calls against the prior hash to complete normally
  2. invalidate the cache before starting the next model turn
  3. refresh the capability snapshot before the model plans its next step

**Rules for concurrent commit-boundary actions:**

- two parallel tool calls must not both attempt the same commit-boundary action
- the host hook must serialize commit-boundary requests: if a commit-boundary call is in flight, any second call to the same gated tool must wait or deny
- the MCP server must use the `idempotency_key` to ensure a retry after a parallel collision does not produce duplicate side effects
- the recommended pattern is to treat commit-boundary actions as non-parallelizable: only one gated side effect may be in flight for a given Mission at a time
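One way to serialize commit-boundary calls is a per-Mission lock; a sketch using `mkdir`, which is atomic on POSIX filesystems (paths and function names are illustrative):

```bash
# Acquire the per-Mission commit lock; fails if a gated call is in flight.
acquire_commit_lock() {
  mkdir "${TMPDIR:-/tmp}/mission_$1_commit.lock" 2>/dev/null
}

# Release the lock after the commit-boundary action completes or is denied.
release_commit_lock() {
  rmdir "${TMPDIR:-/tmp}/mission_$1_commit.lock" 2>/dev/null
}

if acquire_commit_lock mis_01abc; then
  # exactly one gated side effect in flight for this Mission
  release_commit_lock mis_01abc
else
  echo "commit-boundary call already in flight: wait or deny"
fi
```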

**`constraints_hash` consistency during a parallel turn:**

Load the cached `constraints_hash` once at the start of each model turn and use that snapshot for all parallel `PreToolUse` evaluations in that turn. Do not re-read the cache file mid-turn. If a signal arrives that invalidates the cache during a parallel turn, process it after the turn completes, not mid-turn.
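A sketch of turn-scoped snapshotting (the file path and names are illustrative): the hash is read once when the turn begins, and every parallel evaluation uses that copy.

```bash
# Read the constraints_hash once at turn start; all parallel PreToolUse
# evaluations in the turn use this copy even if the cache is rewritten.
begin_turn() {
  TURN_HASH=$(cat "$1")
}

pretool_check() {
  # stands in for the real Cedar evaluation; the point is which hash it sees
  echo "evaluating against $TURN_HASH"
}
```

A mid-turn amendment rewrites the cache file, but `pretool_check` keeps reporting the turn-start hash; the new hash takes effect at the next `begin_turn`.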

### Example Claude Code hook wiring

The concrete shape in Claude Code looks like this:

```json
{
  "hooks": {
    "SessionStart": [
      {
        "matcher": "startup|resume",
        "hooks": [
          {
            "type": "command",
            "command": "$CLAUDE_PROJECT_DIR/.claude/hooks/load-mission-context.sh"
          }
        ]
      }
    ],
    "UserPromptSubmit": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "$CLAUDE_PROJECT_DIR/.claude/hooks/shape-or-select-mission.sh"
          }
        ]
      }
    ],
    "PreToolUse": [
      {
        "matcher": "Bash|Write|Edit|MultiEdit|mcp__.*",
        "hooks": [
          {
            "type": "command",
            "command": "$CLAUDE_PROJECT_DIR/.claude/hooks/pretool-cedar-check.sh"
          }
        ]
      }
    ],
    "PermissionRequest": [
      {
        "matcher": "Bash|mcp__.*",
        "hooks": [
          {
            "type": "command",
            "command": "$CLAUDE_PROJECT_DIR/.claude/hooks/permission-gate.sh"
          }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Bash|Write|Edit|MultiEdit|mcp__.*",
        "hooks": [
          {
            "type": "command",
            "command": "$CLAUDE_PROJECT_DIR/.claude/hooks/posttool-signal.sh"
          }
        ]
      }
    ],
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "$CLAUDE_PROJECT_DIR/.claude/hooks/check-open-gates.sh"
          }
        ]
      }
    ]
  }
}
```

MCP tools appear in hook events with names like `mcp__<server>__<tool>`, which makes them matchable in the same control plane as local tools.
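Those qualified names can be split back into server and tool parts with plain parameter expansion when mapping them into Cedar entities; a sketch (the output format is illustrative):

```bash
# Split an MCP-qualified tool name into its server and tool parts.
parse_mcp_tool() {
  local name="$1" rest
  case "$name" in
    mcp__*__*)
      rest="${name#mcp__}"                      # drop the mcp__ prefix
      echo "server=${rest%%__*} tool=${rest#*__}"
      ;;
    *) echo "local tool=$name" ;;               # built-in host tool
  esac
}

parse_mcp_tool "mcp__finance__erp.read_financials"
# server=finance tool=erp.read_financials
```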

### Session Resume

When a session is resumed after an interruption, the host must re-establish Mission state before any tool call executes.

#### Session-start host checklist

Every session start must complete this checklist before the model is given any prompt or allowed any tool call. This is the minimum correctness bar for session initialization.

```
SESSION START CHECKLIST

[ ] 1. Load cached mission_id and constraints_hash from local session file
        → if no cached mission_id: skip to step 5 (no prior Mission)

[ ] 2. Call GET /missions/{mission_id}
        → on 404: treat as revoked; inject revocation message; go to step 5
        → on error: inject restricted-mode message; allow clarification only

[ ] 3. Call POST /missions/{mission_id}/capability-snapshot
        → on error: inject restricted-mode message; allow clarification only

[ ] 4. Validate cached constraints_hash against live Mission state
        → if hash changed: discard cached policy bundle; fetch fresh bundle from /policy-bundle
        → if hash matches: restore cached bundle

[ ] 5. Handle Mission state (see state table in Session Resume section)
        → active: inject capability snapshot into session context
        → pending_approval: enter restricted mode; surface approval-pending message
        → suspended: enter restricted mode; surface suspension message
        → revoked/expired: surface termination message; require new Mission
        → no Mission: prepare shaper prompt with template list for next UserPromptSubmit

[ ] 6. Fetch template list GET /templates
        → cache for use by shaper at UserPromptSubmit

[ ] 7. Check session budget status from capability snapshot
        → if any budget at warning threshold: inject passive warning into session
        → if any budget exceeded: inject suspended_budget message; block tool execution

[ ] 8. Check anomaly_flags from capability snapshot
        → if any flag active: inject restricted-mode message; restrict planning to flagged tools

[ ] 9. Set local policy cache entry:
        { mission_id, constraints_hash, approved_tools, stage_constraints,
          session_budget_status, anomaly_flags, capability_snapshot_timestamp }

[ ] 10. Confirm all checks passed before allowing first UserPromptSubmit
         → if any check failed and error is unrecoverable: surface error; do not proceed
```

Any session that skips this checklist and lets the model proceed without Mission context is not compliant with this architecture.

`load-mission-context.sh` at `SessionStart` should:

1. Check the local session file for a cached `mission_id` and `constraints_hash`.
2. Call `GET /missions/{mission_id}` and `POST /missions/{mission_id}/capability-snapshot` to fetch current Mission state and current capability snapshot.
3. Handle each response:

| Mission state returned | Host behavior |
|---|---|
| `active`, same `constraints_hash` | restore cached context, inject capability snapshot into session |
| `active`, new `constraints_hash` | fetch updated bundle and refreshed capability snapshot, re-inject into session |
| `pending_approval` | enter restricted mode, surface pending-approval message to model |
| `suspended` | enter restricted mode, surface suspension message; wait for `mission.resumed` signal |
| `revoked` or `denied` | surface revocation message; do not resume Mission; require new Mission for new work |
| `expired` | surface expiry message; offer to create a replacement Mission from the prior purpose |
| `404` | treat as revoked; do not continue on a missing Mission record |
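The state table collapses to a small dispatch; a sketch, with `404` folded into the revoked path and unknown states failing safe (function and action names are illustrative):

```bash
# Map the Mission state at session resume to the host's next action.
resume_action() {
  case "$1" in
    active)                     echo "restore_context" ;;
    pending_approval|suspended) echo "restricted_mode" ;;
    revoked|denied|404)         echo "terminate_require_new_mission" ;;
    expired)                    echo "offer_replacement_mission" ;;
    *)                          echo "restricted_mode" ;;  # unknown: fail safe
  esac
}

resume_action 404   # terminate_require_new_mission
```

The same-hash versus new-hash branch for `active` is handled separately, in step 4 of the session-start checklist.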

The message surfaced to the model when the Mission is not active should be injected as trusted system context, not as a user turn. Example for suspension:

```
[Mission paused: the active Mission for this session has been suspended by governance. No tool calls are permitted until the Mission is resumed. You may continue with planning and clarification only.]
```

For revoked or expired Missions:

```
[Mission ended: the active Mission for this session is no longer valid. Any further work requires a new Mission to be created and approved.]
```

These messages prevent the model from attempting tool calls that will be denied and from hallucinating explanations for why the session changed state.

### Clarification UX flow in Claude Code

When MAS returns `status: pending_clarification`, the host must surface the open questions to the user and collect answers before attempting any tool calls.

**What Claude Code receives from MAS:**

```json
{
  "status": "pending_clarification",
  "mission_id": "mis_01abc",
  "open_questions": [
    {
      "question_id": "q1",
      "text": "Which external recipients are in scope? (e.g., board members only, or all shareholders?)",
      "required": true
    },
    {
      "question_id": "q2",
      "text": "Should the published document include unpublished financial projections?",
      "required": true
    }
  ],
  "question_count": 2,
  "expires_at": "2026-04-12T15:00:00Z"
}
```

**Host injection into the session (trusted system context, not user turn):**

```
[Mission pending clarification: before this Mission can be activated, the following questions must be answered.

1. Which external recipients are in scope? (e.g., board members only, or all shareholders?)
2. Should the published document include unpublished financial projections?

Please ask the user these questions and collect their answers. No tool calls are permitted until the Mission is activated. The Mission will expire if not resolved by 3:00 PM UTC.]
```

**Model behavior:** the model asks the user the open questions in natural language. It does not rephrase the questions as separate turns — it should batch them into one message so the user can answer all at once.

**When the user answers**, the host collects the answers and calls:

```http
POST /missions/{mission_id}/clarify
Content-Type: application/json

{
  "responses": [
    { "question_id": "q1", "answer": "Board members and institutional shareholders only" },
    { "question_id": "q2", "answer": "No — exclude unpublished projections" }
  ]
}
```

**MAS response after clarification:**

| Result | Status code | Host action |
|---|---|---|
| Auto-approved | `200` with `status: active` | inject active Mission context, proceed normally |
| Needs human review | `200` with `status: pending_approval` | inject approval-pending message, enter restricted mode |
| Still ambiguous | `200` with `status: pending_clarification` | surface remaining questions (2nd round) |
| Excessive ambiguity | `422` with `reason: excessive_ambiguity` | surface denial message, offer to restart with a clearer request |

**Counting rounds:** the host must track clarification round count. After 3 rounds with unresolved questions, surface: "This request couldn't be clarified enough to create a Mission. Try rephrasing what you want to accomplish." Do not submit a 4th clarification round.
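A sketch of the round cap (names are illustrative):

```bash
MAX_CLARIFICATION_ROUNDS=3

# Decide whether to submit another clarification round or give up.
next_round_action() {
  local completed="$1"   # rounds already submitted for this Mission
  if [ "$completed" -lt "$MAX_CLARIFICATION_ROUNDS" ]; then
    echo "submit"
  else
    echo "abort"   # surface the "couldn't be clarified" message instead
  fi
}

next_round_action 2   # submit: this becomes round 3
next_round_action 3   # abort: never submit a 4th round
```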

**What the model must not do:**
- attempt tool calls while in `pending_clarification`
- treat the open questions as optional
- submit answers on behalf of the user (the user must provide answers)
- re-issue a new `POST /missions` to bypass the clarification state

### Claude Code hook flow

In a Mission-aware Claude Code deployment:

1. `SessionStart`
   establishes or resumes Mission context per the session resume contract above
2. `UserPromptSubmit`
   creates or selects the Mission
3. `PreToolUse`
   enforces host-side containment and Cedar policy
4. MCP server
   enforces tool-side containment and Mission-aware tool filtering
5. `PostToolUse`
   emits signals and adds corrective context
6. `Stop` / `SubagentStop`
   block completion until Mission obligations are satisfied

The host should keep a small local policy cache keyed by:

- `mission_id`
- `constraints_hash`
- current approval state
- session risk state

When any of those change, invalidate the cache and force the next high-risk tool call back through a live MAS or commit-boundary check.
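One way to make invalidation automatic is to fold all four fields into the cache key itself, so drift in any of them is a cache miss rather than a staleness bug; a sketch (hash and state values are illustrative):

```bash
# Compose the local policy cache key from the four fields above.
cache_key() {
  printf '%s|%s|%s|%s' "$1" "$2" "$3" "$4"   # mission|hash|approval|risk
}

OLD=$(cache_key mis_01abc h_9f2e approved low)
NEW=$(cache_key mis_01abc h_a113 approved low)   # hash changed by amendment
[ "$OLD" = "$NEW" ] || echo "cache miss: route next call through MAS"
```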

Pseudocode implementations of `pretool-cedar-check.sh` and `permission-gate.sh` appear in the end-to-end walkthrough below.

### Mission Health Indicator for Host Surfaces

The host surface should always display the current Mission health state so users understand the agent's operational posture without asking.

**Health states and what users see:**

| Internal state | Indicator label | Color convention | Meaning |
|---|---|---|---|
| `active`, no warnings | Mission Active | Green | Normal operation; all tools available |
| `active`, budget warning | Budget Warning | Yellow | Approaching a session limit |
| `active`, approval pending | Waiting for Approval | Yellow | Gated action is pending human review |
| `active`, anomaly flag | Restricted | Orange | Unusual activity detected; some tools restricted |
| `pending_approval` | Awaiting Approval | Yellow | Mission not yet active; no tool execution |
| `pending_clarification` | Needs Input | Yellow | Open questions must be resolved before activation |
| `suspended` or `suspended_budget` | Paused | Orange | Operator or budget pause; confirm to continue |
| `revoked` | Ended | Red | Mission revoked; no further execution |
| `expired` | Expired | Red | Mission expired; new Mission required |
| no Mission | No Mission | Gray | No active Mission; request will create one |

**What the indicator should include beyond a label:**

- Mission display name (not purpose class code)
- Time until expiry (if < 24 hours: "Expires in 3h 40m")
- Active approval countdown (if pending: "Approval requested 12m ago")
- Budget bar for any resource class nearing its limit (if > 80% consumed)

**What the indicator must not include:**

- `constraints_hash` value
- Cedar entity names
- Internal Mission ID (show only in debug mode)
- Approval object IDs

**Indicator update contract:**

The health indicator updates when:
1. The capability snapshot is refreshed (session start, Mission state change)
2. A `mission.amended`, `mission.suspended`, `mission.resumed`, `mission.revoked`, or `mission.expired` signal arrives
3. A session budget warning threshold is crossed during tool execution
4. An approval is granted or expires

The indicator should not poll MAS on every tool call. It should update on signals and on capability snapshot refreshes, which happen at natural session boundaries.

**Minimal implementation for Claude Code:**

Claude Code does not have a persistent UI element, but it does support injecting context into the system prompt. The health indicator in Claude Code should be a one-line status prepended to every model turn that reflects the current Mission state:

```
[Mission: Board Packet Preparation | Active | Expires in 6h | Budget: finance.read 45/100]
```

This gives the model and the user a consistent baseline each turn without requiring a visual UI component.
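Rendering that line from the cached snapshot fields is mechanical; a sketch (field order follows the example above, function name is illustrative):

```bash
# Build the one-line Mission status prepended to each model turn.
render_status_line() {
  local name="$1" state="$2" expires_in="$3" budget="$4"
  echo "[Mission: ${name} | ${state} | Expires in ${expires_in} | Budget: ${budget}]"
}

render_status_line "Board Packet Preparation" "Active" "6h" "finance.read 45/100"
# [Mission: Board Packet Preparation | Active | Expires in 6h | Budget: finance.read 45/100]
```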

### What the MCP Server Must Do

If you want one concise checklist, it is this.

**Required inputs**
- validated access token
- current Mission freshness state
- tool request name and arguments
- local policy bundle

**Processing**
- validate token
- validate Mission freshness
- filter visible tools
- enforce `tools/call`
- recheck at commit boundary
- emit runtime signals

**Required outputs**
- filtered `tools/list`
- allowed tool result
- denied or deferred tool response

**Failure behavior**
- invalid token -> unauthorized
- stale Mission -> deny and require refresh
- out-of-scope tool -> forbidden
- approval missing at commit boundary -> deny or defer
- policy engine unavailable -> fail closed for high-risk actions

**Acceptance checks**
- `tools/list` never reveals denied tools
- `tools/call` cannot execute denied or stale requests
- irreversible actions pass through commit-boundary revalidation
- denial, deferral, and success events are emitted to MAS

An MCP server in a Mission architecture must:

1. implement [MCP OAuth authorization](https://modelcontextprotocol.io/specification/2025-03-26/basic/authorization) for HTTP transports
2. validate bearer tokens on every request
3. treat tool annotations as untrusted for security decisions
4. filter `tools/list` based on Mission-scoped authorization
5. validate `tools/call` arguments against schema and policy
6. enforce access controls, rate limits, and output sanitization
7. invoke a commit-boundary check before irreversible effects
8. emit audit and anomaly events back to the MAS

If it does not do those things, it is a convenience adapter, not a governance-aware tool server.
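Point 4 of the checklist, Mission-scoped `tools/list` filtering, reduces to an intersection between the registered tools and the approved set from the capability snapshot; a sketch (the space-separated approved-set format is illustrative):

```bash
# Return only the tools present in the approved set; denied tools are
# never revealed in tools/list.
filter_tools() {
  local approved="$1"; shift
  local t
  for t in "$@"; do
    case " $approved " in
      *" $t "*) echo "$t" ;;
    esac
  done
}

filter_tools "erp.read_financials docs.write" \
  erp.read_financials erp.post_journal docs.write email.send
# erp.read_financials
# docs.write
```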

## OpenClaw as the Agent Host (Illustrative)

> **Note:** This section is a proposed integration pattern, not a normative description of OpenClaw's product behavior. The hook surface and configuration schema used here are illustrative; Claude Code is the reference host with a concrete, documented hook API. The Mission authority model is the same in both cases.

Claude Code is one concrete host. OpenClaw is another, but the control points are different.

The current OpenClaw docs describe an architecture with:

- an **Agent Core** that orchestrates model calls and state
- a **tool layer** for built-in tools
- a **skill layer** for packaged capabilities
- a **channel layer** for user interaction
- isolated sandboxes and per-skill permissions

That means the Mission architecture should attach to OpenClaw in three places:

1. **agent configuration**
   narrow which tools and skills are even enabled for this agent
2. **gateway or execution wrapper**
   enforce Mission checks before a tool or skill actually executes
3. **approval and signal path**
   pause execution, request approval, and emit runtime events back to MAS

Do not treat the OpenClaw system prompt as the Mission boundary. The Mission boundary needs to live in the configured tool/skill surface and the execution wrapper.

### OpenClaw configuration example

This section is an **illustrative integration pattern**, not a claim that OpenClaw exposes the same explicit hook surface as Claude Code.

OpenClaw's docs describe agents, tools, skills, and channels. The YAML below is therefore a proposed Mission-aware configuration shape for OpenClaw-style deployments, not a normative product schema:

```yaml
agents:
  board-packet-agent:
    model: anthropic/claude-sonnet-4.6
    system_prompt: >
      You are the board packet assistant.
      Work only inside the active Mission constraints provided at runtime.
    channels:
      - slack
    memory:
      backend: sqlite
    tools:
      - file_read
      - file_write
      - http_request
    skills:
      - mission-gateway
      - finance-reader
      - docs-writer
```

The important point is that this agent should not have:

- shell access
- unrestricted HTTP destinations
- generic email-send skills
- generic browser automation

unless the Mission model and containment layer explicitly need them.

### What the `mission-gateway` skill should do

OpenClaw's docs treat skills as the packaged capability surface. In a Mission architecture, the practical integration pattern is to create one local skill whose only job is to mediate execution.

That skill should:

1. load the active Mission context from MAS or local cache
2. map the requested tool or skill invocation into:
   - principal
   - action
   - resource
   - context
3. evaluate Cedar locally
4. deny, allow, or defer
5. call the actual skill or downstream MCP/API only if policy allows it
6. emit a signal to MAS after execution or denial
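The mediation order can be sketched end to end with the Cedar evaluation and skill execution stubbed out (all names here are illustrative, not OpenClaw APIs):

```bash
# Stub for step 3: local Cedar evaluation against the active Mission.
evaluate_cedar() {
  case "$1" in
    finance-reader|docs-writer) echo allow ;;   # inside the Mission scope
    external-send)              echo defer ;;   # stage-gated: needs approval
    *)                          echo deny ;;
  esac
}

# Steps 4-6: act on the decision; execute only on allow, signal otherwise.
gateway_invoke() {
  local skill="$1"
  case "$(evaluate_cedar "$skill")" in
    allow) echo "executed:$skill" ;;
    defer) echo "deferred:$skill awaiting approval" ;;
    deny)  echo "denied:$skill"; return 1 ;;
  esac
}

gateway_invoke finance-reader   # executed:finance-reader
gateway_invoke external-send    # deferred:external-send awaiting approval
```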

### OpenClaw worked example

Assume the user asks OpenClaw over Slack:

> Pull Q2 actuals, draft the board memo, and ask me before sending anything externally.

The flow should be:

1. OpenClaw channel receives the message
2. Agent Core routes execution through the `mission-gateway` wrapper before any external skill or tool executes
3. `mission-gateway`:
   - shapes or looks up the Mission
   - calls MAS for approval
   - receives `mission_id`, `constraints_hash`, and approved tool set
4. `mission-gateway` allows:
   - `finance-reader`
   - `docs-writer`
5. `mission-gateway` denies or defers:
   - external-send skills
   - unapproved browser or shell access
6. when the agent later attempts an external send, `mission-gateway`:
   - checks the stage gate
   - requests human approval if needed
   - retries only after MAS emits approval

The difference from Claude Code is only the interception point:

- Claude Code -> explicit host hooks
- OpenClaw -> skill or tool wrapper layer

| Host | Natural Enforcement Point | Strength | Main Limitation |
|---|---|---|---|
| Claude Code | explicit host hooks such as `PreToolUse`, `PostToolUse`, `PermissionRequest` | deterministic interception before and after tool use | product-specific hook surface |
| OpenClaw | skill / tool wrapper layer such as `mission-gateway` | portable architecture for multi-channel agents | less explicit built-in interception surface in docs |

### OpenClaw and multiple domains

OpenClaw's channel and skill model is useful for multi-domain work because one agent can coordinate:

- an internal finance skill
- a partner-domain ticketing skill
- a signing-domain approval skill

But the Mission rules stay the same:

- one Mission in MAS
- one policy bundle keyed by `constraints_hash`
- one local token or ID-JAG flow per target domain
- one containment decision before each real side effect

In OpenClaw terms, that means:

- internal finance skill uses a local enterprise token
- partner-domain skill exchanges ID-JAG for a partner-domain token
- signing skill requires a stage-gated approval object before commit

The host may look different, but the authority pattern does not change.

## Worked Example

This section is the most concrete walk-through in the note. If a reader only wants one execution path from prompt to enforcement, this is the one to read.

### One End-to-End Tool Path

Here is one concrete path all the way through the system.

The user asks:

> Prepare the Q2 board packet and ask me before releasing the final deck.

#### Step 1: `UserPromptSubmit` shapes or selects the Mission

Claude Code fires `UserPromptSubmit`.

The hook calls `shape-or-select-mission.sh`, which must handle three cases, not two:

**Case 1: no active Mission for this session (cold start)**

1. send the prompt to the Mission shaper
2. forward the structured proposal to the MAS
3. receive back `mission_id`, approved tools, stage constraints, `constraints_hash`
4. cache the Mission binding in session state

If the Mission fits within the organization's auto-approval envelope, MAS returns `status = active`. If the request exceeds that envelope, MAS returns `status = pending_approval` and the host stays in restricted mode until a human approves.

**Case 2: active Mission exists, new instruction is within scope**

1. compare the new instruction against the current Mission's `allowed_tools` and `resource_classes`
2. if the new instruction fits comfortably inside the existing envelope, inject the `mission_id` and `constraints_hash` into the prompt context and proceed — do not re-shape
3. the model executes within the already-approved authority

This is the common case during a multi-turn session. Do not re-run the shaper on every prompt. The Mission was approved for a purpose class; instructions within that purpose class do not need a new Mission.

**Case 3: active Mission exists, new instruction appears to extend scope**

The hard case. The user said "and also do X" where X looks like it might require tools or resource classes outside the current Mission.

The hook must choose:

- **delta amendment path**: classify whether the new instruction is a narrowing or broadening of the current Mission. If narrowing or same-scope, proceed. If broadening, call `POST /missions/{id}/amend` with the delta and wait for the new `constraints_hash` before proceeding.
- **cold-start path**: if the new instruction is clearly a new task unrelated to the current Mission, close the current Mission (`POST /missions/{id}/complete`) and start a new shape-and-compile cycle.

The hook cannot resolve this automatically in all cases. When ambiguous, the hook should pause execution and surface the ambiguity to the user: "this looks like it might extend your current Mission — should I add it to the existing scope or start a new task?" Defaulting to cold-start is safer than defaulting to scope extension.

**What shape-or-select-mission.sh must not do:** silently widen the current Mission by re-running shaping over the combined prompt history. The shaper does not know the current Mission boundary; it will produce a new proposal that may be broader. Always use the amendment path to modify an active Mission.
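The three-case routing can be sketched in a few lines, assuming the hook can reduce an instruction to the set of tools it needs (that classification is the hard part and is not shown). The partial-overlap branch returns the amendment path here for simplicity; a production hook should surface genuinely ambiguous cases to the user as described above.

```python
# Hypothetical routing for shape-or-select-mission. needed_tools is the
# assumed output of an upstream classifier; this only expresses the cases.

def route_instruction(needed_tools, active_mission):
    if active_mission is None:
        return "cold_start"                # Case 1: shape, compile, activate
    envelope = set(active_mission["allowed_tools"])
    needed = set(needed_tools)
    if needed <= envelope:
        return "proceed"                   # Case 2: inject mission_id, no re-shape
    if not (needed & envelope):
        return "complete_then_cold_start"  # Case 3: clearly a new, unrelated task
    return "amend"                         # Case 3: broadening; wait for new constraints_hash
```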

After any of the three cases resolves, the host caches:

```json
{
  "mission_id": "mis_01JR9S4YDY6QF5Q9Q54M0YB4V1",
  "constraints_hash": "sha256-abc123",
  "approved_tools": [
    "mcp__finance__erp.read_financials",
    "mcp__docs__docs.read",
    "mcp__docs__docs.write"
  ],
  "stage_constraints": [
    {
      "name": "release_gate",
      "applies_to": ["mcp__docs__docs.publish"]
    }
  ]
}
```

The host now has a stable Mission handle. It is not operating off prompt text alone.

#### Step 2: the compiler produces policy and token projections

The MAS compiler turns that Mission into three enforcement artifacts:

1. **Cedar request shape**
   - principal: `Mission::Agent::"agent_research_assistant"`
   - action: `Mission::Action::"call_tool"`
   - resources:
     - `Mission::Tool::"mcp__finance__erp.read_financials"`
     - `Mission::Tool::"mcp__docs__docs.write"`
     - `Mission::Tool::"mcp__docs__docs.publish"`
2. **Token projection**
   - audience `https://mcp.example.com/finance`
   - allowed tools for finance server only
3. **Host hint**
   - `mcp__docs__docs.publish` requires approval at commit boundary

That gives each layer a narrow artifact instead of the whole Mission blob.

If the Mission was auto-approved, compilation continues immediately.

If the Mission was escalated, compilation may still produce a provisional policy bundle, but token issuance and gated tool execution should remain blocked until the human approval signal arrives.
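The fan-out can be sketched as a pure projection over the Mission record. The record shape assumed here (`approved_tools`, `tool_audiences`, `stage_constraints`) is an illustrative input format, not the MAS schema; in a real compiler the tool-to-audience mapping comes from the resource catalog.

```python
# Sketch of Step 2: one Mission record projected into three narrow artifacts.

def project_mission(mission):
    # 1. Cedar resource shape: one canonical resource per approved tool.
    cedar_resources = [f'Mission::Tool::"{t}"' for t in mission["approved_tools"]]
    # 2. Token projections: group approved tools by target audience.
    token_projections = {}
    for tool in mission["approved_tools"]:
        audience = mission["tool_audiences"][tool]   # assumed catalog lookup
        token_projections.setdefault(audience, []).append(tool)
    # 3. Host hints: which tools are stage-gated at the commit boundary.
    host_hints = {g["name"]: g["applies_to"] for g in mission["stage_constraints"]}
    return {
        "cedar_resources": cedar_resources,
        "token_projections": token_projections,  # audience -> allowed tools
        "host_hints": host_hints,                # gate name -> gated tools
    }
```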

#### Step 3: the host obtains an MCP transport token

Before Claude Code can call the finance MCP server, the host or MCP gateway asks the AS for a token:

```json
{
  "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
  "subject_token": "agent-session-token",
  "audience": "https://mcp.example.com/finance",
  "resource": "https://mcp.example.com/finance",
  "authorization_details": [
    {
      "type": "mcp_tool_access",
      "mission_id": "mis_01JR9S4YDY6QF5Q9Q54M0YB4V1",
      "tools": ["erp.read_financials"]
    }
  ]
}
```

Before issuing, the AS evaluates a Cedar request roughly like:

```json
{
  "principal": "Mission::Agent::\"agent_research_assistant\"",
  "action": "Mission::Action::\"issue_token\"",
  "resource": "Mission::Audience::\"https://mcp.example.com/finance\"",
  "context": {
    "mission_id": "mis_01JR9S4YDY6QF5Q9Q54M0YB4V1",
    "constraints_hash": "sha256-abc123",
    "requested_tools": ["erp.read_financials"],
    "mission_status": "active"
  }
}
```

If permitted, the AS returns a transport token for the finance MCP server only.
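Both request bodies above are pure construction, which makes them easy to sketch. These builders assume the field names shown; the AS token endpoint and transport are deployment-specific and omitted.

```python
# Sketch: build the token-exchange body and the AS-side Cedar issuance
# check from Step 3. Field names mirror the JSON examples above.

def build_token_exchange(subject_token, audience, mission_id, tools):
    return {
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "subject_token": subject_token,
        "audience": audience,
        "resource": audience,
        "authorization_details": [
            {"type": "mcp_tool_access", "mission_id": mission_id, "tools": list(tools)},
        ],
    }

def build_issuance_check(agent_id, audience, mission_id, constraints_hash, tools):
    return {
        "principal": f'Mission::Agent::"{agent_id}"',
        "action": 'Mission::Action::"issue_token"',
        "resource": f'Mission::Audience::"{audience}"',
        "context": {
            "mission_id": mission_id,
            "constraints_hash": constraints_hash,
            "requested_tools": list(tools),
            "mission_status": "active",
        },
    }
```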

#### Step 4: `PreToolUse` enforces before the tool call leaves Claude Code

Claude Code decides it wants to call:

- `mcp__finance__erp.read_financials`

Before that happens, `PreToolUse` fires and passes the hook:

- `tool_name = mcp__finance__erp.read_financials`
- `tool_input = { "quarter": "Q2", "view": "actual_vs_plan" }`
- `tool_use_id = ...`

`pretool-cedar-check.sh` maps this into a local policy request:

```json
{
  "principal": "Mission::Agent::\"agent_research_assistant\"",
  "action": "Mission::Action::\"call_tool\"",
  "resource": "Mission::Tool::\"mcp__finance__erp.read_financials\"",
  "context": {
    "mission_id": "mis_01JR9S4YDY6QF5Q9Q54M0YB4V1",
    "constraints_hash": "sha256-abc123",
    "runtime_risk": "normal",
    "commit_boundary": false
  }
}
```

The hook returns:

- `allow` if policy permits the tool call
- `deny` if it is outside Mission
- `ask` if the host needs a checkpoint first

For this read case, the likely answer is `allow`.

#### Step 5: the MCP server enforces again on `tools/call`

The host sends:

- `Authorization: Bearer <finance-mcp-token>`
- MCP `tools/call` for `erp.read_financials`

The MCP server:

1. validates the token
2. checks that `mission_id` and `constraints_hash` are still current within its configured freshness window
3. verifies the requested tool is in the token projection
4. runs its own local Cedar check or policy-cache lookup
5. validates the input shape

Only then does it call the actual finance backend.

This duplication is deliberate. The host and the server are separate enforcement surfaces.

#### Step 6: `PostToolUse` emits signals and updates session context

The finance MCP server returns a bounded result.

`PostToolUse` fires in Claude Code. The host:

1. records an audit event:
   - `tool.called`
   - `tool_name = mcp__finance__erp.read_financials`
   - `mission_id = mis_...`
2. inspects output for:
   - prompt injection markers
   - unexpected data volume
   - out-of-scope references
3. sends a runtime signal to MAS if anything is abnormal
4. optionally injects `additionalContext` back into the session

That gives the model the tool result, but it also gives governance a record that the call happened.

#### Step 7: publish hits the commit boundary

Later in the session, Claude Code decides it wants to call:

- `mcp__docs__docs.publish`

This time `PreToolUse` maps the call to a resource with a stage gate:

```json
{
  "principal": "Mission::Agent::\"agent_research_assistant\"",
  "action": "Mission::Action::\"publish_external\"",
  "resource": "Mission::Tool::\"mcp__docs__docs.publish\"",
  "context": {
    "mission_id": "mis_01JR9S4YDY6QF5Q9Q54M0YB4V1",
    "constraints_hash": "sha256-abc123",
    "approvals": [],
    "commit_boundary": true
  }
}
```

The host does not silently allow this call.

It returns `ask`, which routes into `PermissionRequest`.

`permission-gate.sh` then:

1. asks the MAS whether the required approval is already present
2. if not, requests or waits for `controller_approval`
3. refreshes local Mission state
4. retries the policy check only after approval is granted

If approval never arrives, the publish operation never becomes real.

#### Step 8: approval signal invalidates caches and allows one narrow action

When approval is granted, the MAS emits:

- `approval.granted`
- `mission_id = mis_...`
- `constraints_hash = sha256-abc123`
- `approval = controller_approval`

The agent host and docs MCP server invalidate their local caches.

The next `PreToolUse` and commit-boundary check now evaluate with:

```json
{
  "approvals": ["controller_approval"],
  "commit_boundary": true
}
```

The publish action is now allowed, but only for the active Mission and current constraint version.
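Host-side handling of that signal reduces to cache eviction. The snapshot cache shape assumed here (`mission_id` mapped to a capability snapshot) is illustrative; what matters is that `approval.granted` evicts the entry, so the next policy check is forced to refresh from MAS.

```python
# Sketch of host-side cache invalidation on an authority transition.

def on_mas_signal(signal, snapshot_cache):
    # After approval.granted, cached authority state for this Mission is stale.
    if signal["event_type"] == "approval.granted":
        snapshot_cache.pop(signal["mission_id"], None)
    return snapshot_cache
```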

#### Step 9: denial and drift feed containment

If Claude Code keeps trying:

- `mcp__email__email.send_external`

without Mission authority, two things happen:

1. `PreToolUse` denies it locally
2. `PermissionDenied` emits a signal such as:

```json
{
  "event_type": "tool.denied",
  "mission_id": "mis_01JR9S4YDY6QF5Q9Q54M0YB4V1",
  "tool_name": "mcp__email__email.send_external",
  "risk_level": "elevated",
  "decision": "deny"
}
```

If the pattern repeats, the MAS can:

- narrow remaining authority
- suspend the Mission
- force step-up approval
- or end the session

That is containment in practice. The system does not just block. It changes authority state when drift appears.
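One way to sketch that escalation ladder; the thresholds here are illustrative placeholders, and real values belong in template policy, not code.

```python
# Hypothetical MAS-side escalation on repeated out-of-scope attempts.
# Threshold values are placeholders for template-configured policy.

def containment_response(denied_count):
    if denied_count >= 12:
        return "end_session"
    if denied_count >= 8:
        return "suspend_mission"
    if denied_count >= 5:
        return "force_step_up_approval"
    if denied_count >= 3:
        return "narrow_remaining_authority"
    return "none"
```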

#### Sequence diagram

```mermaid
sequenceDiagram
    actor User
    participant Host as Claude Code Host
    participant MAS as MAS / Policy
    participant AS as OAuth AS
    participant MCP as MCP Server
    participant Tool as Tool Service

    User->>Host: prompt
    Host->>MAS: create/activate Mission
    MAS-->>Host: mission_id + constraints_hash + capability snapshot
    Host->>AS: token exchange(subject_token, mission_id, audience)
    AS-->>Host: MCP audience token
    Host->>Host: PreToolUse using capability snapshot
    Host->>MCP: tools/call + audience token
    MCP->>MCP: validate token + current constraints_hash
    MCP->>Tool: execute read/draft action
    Tool-->>MCP: result
    MCP-->>Host: tool result
    Host->>MAS: runtime signal(tool.called)
    alt gated action
        Host->>MAS: request approval state
        MAS-->>Host: approval required / approval granted
        Host->>MCP: commit-boundary call + approval object + commit_intent_id
        MCP->>Tool: final side effect
        Tool-->>MCP: committed
        MCP-->>Host: committed
    end
```

The important control points are:

- Mission is approved before token issuance
- token issuance is separately authorized
- `PreToolUse` blocks obvious out-of-scope calls before they leave the host
- MCP server enforces again at `tools/call`
- commit-boundary approval is revalidated at the moment the side effect becomes real
- signals update authority state instead of just generating logs

#### Host hook pseudocode

The host scripts do not need to be complicated. They need to be deterministic.

#### `pretool-cedar-check.sh`

```bash
#!/usr/bin/env bash
set -euo pipefail

event_json="$(cat)"
tool_name="$(jq -r '.tool_name' <<<"$event_json")"
tool_input="$(jq -c '.tool_input' <<<"$event_json")"
mission_id="$(jq -r '.session.mission_id' ~/.claude/mission-context.json)"
constraints_hash="$(jq -r '.session.constraints_hash' ~/.claude/mission-context.json)"

resource="$(mission-map-tool "$tool_name")"
action="$(mission-map-action "$tool_name" "$tool_input")"
runtime_risk="$(session-risk-score)"

# principal is derived from session context, not from model output
cedar_request="$(jq -n \
  --arg principal "$(mission-map-principal)" \
  --arg action "$action" \
  --arg resource "$resource" \
  --arg mission_id "$mission_id" \
  --arg constraints_hash "$constraints_hash" \
  --arg runtime_risk "$runtime_risk" \
  '{
    principal: $principal,
    action: $action,
    resource: $resource,
    context: {
      mission_id: $mission_id,
      constraints_hash: $constraints_hash,
      runtime_risk: $runtime_risk,
      commit_boundary: false
    }
  }'
)"

decision="$(cedar-eval "$cedar_request")"

case "$decision" in
  allow)
    jq -n '{ permissionDecision: "allow" }'
    ;;
  ask)
    jq -n '{ permissionDecision: "ask" }'
    ;;
  *)
    emit-mission-signal tool.denied "$tool_name" "$mission_id"
    jq -n '{
      permissionDecision: "deny",
      denyReason: "Tool call is outside the active Mission"
    }'
    ;;
esac
```

The shell is incidental. The sequence is what matters:

1. map Claude Code tool events into Mission actions and resources
2. evaluate Cedar with current Mission state
3. return `allow`, `ask`, or `deny`
4. emit a signal when denial changes session risk

**Mapping function contracts:**

`mission-map-tool "$tool_name"` returns the canonical Cedar resource string for a given Claude Code tool name. Use this translation table as the base; extend it from the resource catalog for site-specific tools:

| Claude Code tool name | Cedar resource |
|---|---|
| `Bash` | `Mission::Resource::"host.exec"` |
| `Write`, `Edit`, `MultiEdit` | `Mission::Resource::"workspace.write"` |
| `Read` | `Mission::Resource::"workspace.read"` |
| `Glob`, `Grep` | `Mission::Resource::"workspace.read"` |
| `mcp__<server>__<tool>` | `Mission::Tool::"mcp__<server>__<tool>"` (pass through as canonical resource ID) |
| unknown | fail closed: return empty string and deny |

`mission-map-action "$tool_name" "$tool_input"` returns the Cedar action string. Base rules:

| Tool category | `tool_input` signal | Cedar action |
|---|---|---|
| `Bash` | command contains `rm`, `delete`, `drop` | `Mission::Action::"delete"` |
| `Bash` | command writes to external endpoint | `Mission::Action::"send_external"` |
| `Bash`, `Write`, `Edit` | otherwise | `Mission::Action::"draft"` |
| `Read`, `Glob`, `Grep` | any | `Mission::Action::"read"` |
| `mcp__*` | tool name ends in `publish`, `send`, `pay` | `Mission::Action::"publish_external"` or matching vocabulary entry |
| `mcp__*` | otherwise | `Mission::Action::"call_tool"` |

`mission-map-principal` returns the Cedar principal string for the current session. Source it from the session context file in this order: workload identity claim if present, then OIDC `sub` if present, then registered `agent_id`. Format: `Mission::Agent::"<agent_id>"` for agent-initiated calls, `Mission::User::"<user_id>"` for user-initiated calls.
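The two tables transcribe directly into code. This sketch omits the external-endpoint heuristic for `Bash` (which needs real command parsing) and uses a simple suffix check for `publish`/`send`/`pay`; a production mapper should resolve actions from the resource catalog rather than string heuristics.

```python
# Base translation tables from the mapping contracts, with unknown tools
# failing closed (empty resource string denies upstream).

def mission_map_tool(tool_name):
    table = {
        "Bash": 'Mission::Resource::"host.exec"',
        "Write": 'Mission::Resource::"workspace.write"',
        "Edit": 'Mission::Resource::"workspace.write"',
        "MultiEdit": 'Mission::Resource::"workspace.write"',
        "Read": 'Mission::Resource::"workspace.read"',
        "Glob": 'Mission::Resource::"workspace.read"',
        "Grep": 'Mission::Resource::"workspace.read"',
    }
    if tool_name in table:
        return table[tool_name]
    if tool_name.startswith("mcp__"):
        return f'Mission::Tool::"{tool_name}"'  # pass through as canonical ID
    return ""                                   # unknown: fail closed

def mission_map_action(tool_name, tool_input):
    destructive = {"rm", "delete", "drop"}
    if tool_name == "Bash":
        # Token match, not substring, so "format" does not match "rm".
        tokens = set(tool_input.get("command", "").lower().split())
        if destructive & tokens:
            return 'Mission::Action::"delete"'
        return 'Mission::Action::"draft"'
    if tool_name in ("Read", "Glob", "Grep"):
        return 'Mission::Action::"read"'
    if tool_name.startswith("mcp__"):
        if tool_name.endswith(("publish", "send", "pay")):
            return 'Mission::Action::"publish_external"'
        return 'Mission::Action::"call_tool"'
    return 'Mission::Action::"draft"'           # Write/Edit and friends
```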

#### `permission-gate.sh`

```bash
#!/usr/bin/env bash
set -euo pipefail

event_json="$(cat)"
tool_name="$(jq -r '.tool_name' <<<"$event_json")"
mission_id="$(jq -r '.session.mission_id' ~/.claude/mission-context.json)"

gate="$(required-gate-for-tool "$tool_name")"

if [[ -z "$gate" ]]; then
  jq -n '{ permissionDecision: "allow" }'
  exit 0
fi

approval_state="$(mas-get-approval "$mission_id" "$gate")"

if [[ "$approval_state" != "granted" ]]; then
  emit-mission-signal tool.deferred "$tool_name" "$mission_id"
  request-approval "$mission_id" "$gate" "$tool_name"
  jq -n '{
    permissionDecision: "deny",
    denyReason: "Required approval is not yet granted"
  }'
  exit 0
fi

refresh-mission-context "$mission_id"
jq -n '{ permissionDecision: "allow" }'
```

This is the commit-boundary pattern in host form:

- detect that the tool is stage-gated
- ask MAS for current approval state
- deny until the gate is satisfied
- refresh Mission state before allowing retry

If you implement only one host-side rule, implement this one for irreversible actions.

#### MCP server pseudocode

The MCP server should be just as explicit as the host. It is not there to trust host decisions. It is there to enforce its own boundary.

#### `tools/list`

```python
def list_tools(request):
    token = validate_access_token(request.headers["Authorization"])
    mission_id = token["mission_id"]
    constraints_hash = token["constraints_hash"]

    if mission_revoked_or_stale(mission_id, constraints_hash):
        raise Unauthorized("Mission state is no longer current")

    allowed_tools = token.get("allowed_tools", [])
    visible = []

    for tool in ALL_TOOLS:
        if tool.name not in allowed_tools:
            continue
        if not local_policy_allows_listing(tool, mission_id, constraints_hash):
            continue
        visible.append(redact_tool_metadata(tool))

    return {"tools": visible}
```

`redact_tool_metadata(tool)` should strip fields that reveal backend system details the model does not need and should not have: internal endpoint URLs, credential parameter names, implementation notes in the description, and any annotation that names upstream systems outside the approved trust domain. The tool name, input schema, and a minimal description of what the tool does are sufficient. If the tool spec includes `x-internal-*` extension fields or backend routing hints, remove them before returning.
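A minimal sketch of that contract, assuming MCP's standard `name`/`description`/`inputSchema` tool fields. Redaction works by allowlist rather than denylist, so `x-internal-*` extensions and backend routing hints never pass through, even ones the implementer did not anticipate.

```python
# Sketch of redact_tool_metadata: allowlist the fields the model needs.
# The description itself should be authored without backend references;
# this function cannot repair a description that leaks them.

def redact_tool_metadata(tool):
    keep = {"name", "description", "inputSchema"}
    return {k: v for k, v in tool.items() if k in keep}
```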

The purpose of `tools/list` filtering is to reduce accidental overreach. It is not the final security decision.

#### `tools/call`

```python
def call_tool(request):
    token = validate_access_token(request.headers["Authorization"])
    mission_id = token["mission_id"]
    constraints_hash = token["constraints_hash"]
    tool_name = request.json["name"]
    arguments = request.json["arguments"]

    if mission_revoked_or_stale(mission_id, constraints_hash):
        emit_signal("tool.denied", mission_id, tool_name, "stale_mission")
        raise Unauthorized("Mission state is no longer current")

    if tool_name not in token.get("allowed_tools", []):
        emit_signal("tool.denied", mission_id, tool_name, "tool_not_allowed")
        raise Forbidden("Tool is outside Mission scope")

    validate_schema(tool_name, arguments)

    cedar_request = {
        "principal": principal_from_token(token),
        "action": action_for_tool(tool_name, arguments),
        "resource": resource_for_tool(tool_name),
        "context": {
            "mission_id": mission_id,
            "constraints_hash": constraints_hash,
            "runtime_risk": current_runtime_risk(mission_id),
            "commit_boundary": tool_requires_commit_boundary(tool_name),
        },
    }

    decision = cedar_eval(cedar_request)
    if decision == "deny":
        emit_signal("tool.denied", mission_id, tool_name, "cedar_deny")
        raise Forbidden("Policy denied tool call")

    if tool_requires_commit_boundary(tool_name):
        return defer_to_commit_boundary(token, tool_name, arguments)

    result = invoke_backend_tool(tool_name, arguments)
    emit_signal("tool.called", mission_id, tool_name, "success")
    return sanitize_tool_result(tool_name, result)
```

The structure is the same as the host:

1. validate token
2. validate Mission freshness
3. validate tool allowance
4. evaluate Cedar
5. execute only if all checks pass

MCP uses JSON-RPC error objects. When a `tools/call` is denied at the Mission boundary, the server should return:

```json
{
  "jsonrpc": "2.0",
  "id": "<request-id>",
  "error": {
    "code": -32001,
    "message": "Tool call denied: outside Mission scope",
    "data": {
      "mission_id": "mis_01JR9S4YDY6QF5Q9Q54M0YB4V1",
      "tool": "mcp__email__email.send_external",
      "reason": "tool_not_allowed"
    }
  }
}
```

Use code `-32001` for Mission-scope denials, `-32002` for stale Mission state, and `-32003` for approval missing at a commit boundary. Do not surface internal policy details or Cedar decision traces in the `data` field. `mission_id` and `reason` are sufficient for the caller to emit a signal and for audit to correlate the event.

**What the model sees and what it can do with it:**

A model that receives a `-32001` error sees a structured error, not silence. The model can react to this in ways that escape containment while technically complying with every enforcement rule:

- retry the same tool against a different MCP server that is not Mission-aware
- ask the user to perform the denied action manually ("I can't send email directly — could you send this for me?")
- attempt to achieve the same effect through a permitted tool (write the content to a document the user controls, rather than sending it)
- generate plausible-sounding output based on what it expected the tool to return, without actually calling the tool

None of these are catchable by the policy layer. The model's response to a denial is prompt text, not a tool call, and is not evaluated by Cedar.

What mitigates this is not enforcement but context: a model with a well-scoped Mission has a clear sense of what it is supposed to be doing, and a denial that is consistent with the Mission intent (e.g., "you don't have email access for this Mission") is more likely to produce appropriate behavior than one that appears arbitrary. The host system prompt should describe the Mission purpose explicitly so the model understands why restrictions exist. This is not a security control — it is a usability lever that reduces the model's motivation to work around denials.

The denial reason codes (`tool_not_allowed`, `mission_inactive`, `approval_missing`) should be structured so the host can surface them to the model with a human-readable explanation attached, not as raw codes the model has to interpret. The host's `PostToolUse` hook can inject a user-facing message like "this tool is outside your current Mission scope" alongside the error, giving the model context that reduces confusion without exposing policy internals.

### MAS API standard error body

The MCP server uses JSON-RPC error objects with numeric codes. MAS uses HTTP REST APIs. All MAS API error responses must use the following standard body regardless of endpoint:

```json
{
  "error_code": "string — machine-readable code",
  "message": "string — human-readable description for operators and developers",
  "mission_id": "string | null — present when the error is Mission-scoped",
  "request_id": "string — correlates to the request; always present",
  "details": {}
}
```

`details` is endpoint-specific and optional. It must never contain raw Cedar decision traces, internal stack traces, policy bundle contents, or `constraints_hash` internals.

**Standard error codes by endpoint:**

| Endpoint | HTTP status | `error_code` | Meaning |
|---|---|---|---|
| `POST /missions` | 400 | `invalid_request` | required field missing or malformed |
| `POST /missions` | 422 | `excessive_ambiguity` | intent could not be constrained; compiler returned too many candidate templates |
| `POST /missions` | 422 | `unknown_tool` | proposed tool not found in resource catalog |
| `POST /missions` | 422 | `catalog_miss` | resource class referenced in proposal has no catalog record |
| `POST /missions` | 422 | `template_mismatch` | no template matches the classified purpose class |
| `POST /missions` | 422 | `scope_exceeds_actor` | proposed scope is broader than the actor's base entitlements |
| `POST /missions` | 409 | `active_mission_exists` | actor already has an active Mission; cannot create a second without completing the first |
| `POST /missions/{id}/activate` | 409 | `pending_approval` | Mission cannot be activated while in `pending_approval` state |
| `POST /missions/{id}/activate` | 409 | `pending_clarification` | Mission cannot be activated while in `pending_clarification` state |
| `POST /missions/{id}/activate` | 412 | `constraints_hash_mismatch` | `constraints_hash` in the activate request does not match the compiled bundle |
| `POST /missions/{id}/capability-snapshot` | 404 | `mission_not_found` | no Mission with the given ID exists in the caller's tenant |
| `POST /missions/{id}/capability-snapshot` | 409 | `mission_not_active` | snapshot only available for Missions in `active` state |
| `POST /missions/{id}/capability-snapshot` | 503 | `snapshot_unavailable` | MAS cannot serve a snapshot; caller must treat as fail-closed |
| `POST /missions/{id}/signals` | 400 | `invalid_signal_type` | signal type not in the known signal type enumeration |
| `POST /missions/{id}/signals` | 404 | `mission_not_found` | |
| `POST /missions/{id}/signals` | 429 | `signal_rate_exceeded` | caller is sending signals faster than the ingestion buffer accepts |
| `POST /missions/{id}/commit-boundary/acquire` | 409 | `lock_held` | another session holds the commit-boundary advisory lock for this Mission |
| `POST /missions/{id}/commit-boundary/acquire` | 409 | `mission_not_active` | cannot acquire lock for an inactive Mission |
| `POST /missions/{id}/commit-boundary/acquire` | 412 | `constraints_hash_mismatch` | Mission constraints changed since the caller last refreshed |
| `POST /missions/{id}/amend` | 403 | `broadening_requires_approval` | the proposed amendment broadens scope; the amendment is queued for approval, not applied immediately |
| `POST /missions/{id}/amend` | 422 | `unknown_tool` | |
| `POST /missions/{id}/renew` | 409 | `mission_expired` | Mission has already expired or reached `completed` state; cannot renew |
| `POST /missions/{id}/revoke` | 403 | `insufficient_authority` | caller does not have revocation authority for this Mission |
| All endpoints | 401 | `unauthenticated` | missing or invalid bearer token |
| All endpoints | 403 | `tenant_mismatch` | authenticated tenant does not own the Mission |
| All endpoints | 500 | `internal_error` | MAS internal failure; retry with backoff |
| All endpoints | 503 | `mas_unavailable` | MAS is temporarily unavailable; all callers should treat as fail-closed |

**Error body example for a compiler failure:**

```json
{
  "error_code": "template_mismatch",
  "message": "No template matched the classified purpose class 'ad_hoc_query'. Available templates: board_packet_preparation, support_ticket_triage, draft_and_review.",
  "mission_id": null,
  "request_id": "req_01JR9S4YDY6QF5Q9Q54M0YB4V1",
  "details": {
    "classified_purpose_class": "ad_hoc_query",
    "available_purpose_classes": ["board_packet_preparation", "support_ticket_triage", "draft_and_review"]
  }
}
```

**Client handling rule:** all MAS clients must handle `mas_unavailable` as fail-closed — block tool calls, do not retry indefinitely, surface the error to the user. A client that silently degrades to allow-all on `503` defeats the entire governance model.
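The rule is mechanical enough to sketch. `call_mas` is an injected transport stub returning `(status, body)`; backoff between attempts is omitted for brevity.

```python
# Fail-closed client handling for mas_unavailable: bounded retries,
# then deny and surface the error. Never degrade to allow.

def decide_with_mas(call_mas, request, max_attempts=3):
    for _ in range(max_attempts):
        status, body = call_mas(request)
        if status != 503:
            return body
    # Retries exhausted: block the action and tell the user why.
    return {
        "decision": "deny",
        "error_code": "mas_unavailable",
        "user_message": "Mission service unavailable; the action is blocked.",
    }
```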

### Commit-boundary revalidation

```python
def defer_to_commit_boundary(token, tool_name, arguments):
    mission_id = token["mission_id"]
    constraints_hash = token["constraints_hash"]

    fresh_state = mas_get_live_state(mission_id)
    if fresh_state["constraints_hash"] != constraints_hash:
        emit_signal("commit.denied", mission_id, tool_name, "constraints_changed")
        raise Conflict("Mission constraints changed; retry required")

    approval = fresh_state["approvals"].get(required_gate_for(tool_name))
    if approval != "granted":
        emit_signal("commit.required", mission_id, tool_name, "approval_missing")
        raise Forbidden("Required approval is not granted")

    cedar_request = {
        "principal": principal_from_token(token),
        "action": action_for_tool(tool_name, arguments),
        "resource": resource_for_tool(tool_name),
        "context": {
            "mission_id": mission_id,
            "constraints_hash": constraints_hash,
            "approvals": list(fresh_state["approvals"].keys()),
            "runtime_risk": current_runtime_risk(mission_id),
            "commit_boundary": True,
        },
    }

    if cedar_eval(cedar_request) != "allow":
        emit_signal("commit.denied", mission_id, tool_name, "cedar_recheck_failed")
        raise Forbidden("Commit-boundary policy denied action")

    result = invoke_backend_tool(tool_name, arguments)
    emit_signal("tool.called", mission_id, tool_name, "committed")
    return sanitize_tool_result(tool_name, result)
```

This is the server-side equivalent of `permission-gate.sh`.

The difference is that this is the non-bypassable path immediately before the side effect becomes real.

**`sanitize_tool_result(tool_name, result)` contract:**

The purpose is to prevent untrusted tool output from carrying embedded instructions or authority claims that could influence the model's next planning step or override Mission state.

**Honest framing of what this does:** steps 2 and 3 below (injection pattern stripping, authority claim stripping) are lightweight heuristics that catch unsophisticated attacks. A sufficiently obfuscated injection payload will get through. The primary defense against prompt injection is that authority lives in MAS, not in model context — the model cannot elevate its own permissions by reading a tool result that claims to grant new permissions, because the host never reads authority from model context. Sanitization is a defense-in-depth layer, not a primary control.

Apply in this order:

1. **Size limit.** Truncate results that exceed the configured maximum (recommended: 50KB for text, 200KB for structured data). Return a truncated marker rather than silently dropping content.
2. **Strip embedded instruction patterns.** Scan for strings that match known prompt-injection indicators: phrases like "ignore previous instructions", "you are now", "system:", role-override sequences, and JSON structures that claim to be system messages or tool responses. Replace matched content with `[redacted: potential injection]`. This catches low-sophistication attempts; it does not catch encoding tricks or context-sensitive injections.
3. **Strip unexpected authority claims.** If the result contains fields named `mission_id`, `constraints_hash`, `approval`, `permissions`, or `allowed_tools` at the top level, remove them. Tool results must not carry authority state into model context. The reason this matters is not that the model might literally parse those fields — it is that a model following an injected instruction might echo them back in a way that looks like system output.
4. **Enforce output schema.** If the tool has a declared output schema in the resource catalog, validate the result against it. Fields not in the schema should be stripped rather than passed through.
5. **Return the sanitized result.** Log any redactions as `tool.output.sanitized` signals to the MAS.
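The five steps above can be sketched as one function. This is a minimal illustration, not a complete ruleset: the pattern list, size handling, and schema shape are assumptions, and redactions are returned for the caller to log as `tool.output.sanitized` signals.

```python
import re

MAX_TEXT_BYTES = 50_000  # recommended cap for text results (step 1)

# Heuristic indicators only (step 2); obfuscated payloads will get through.
INJECTION_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (r"ignore (all )?previous instructions", r"you are now", r"system:")
]

AUTHORITY_FIELDS = {"mission_id", "constraints_hash", "approval",
                    "permissions", "allowed_tools"}

def sanitize_tool_result(tool_name, result, output_schema=None):
    """Apply the steps in order; returns (sanitized_result, redaction_log).
    The caller emits each redaction as a tool.output.sanitized signal."""
    redactions = []

    # 1. Size limit: truncate with an explicit marker, never silently.
    if isinstance(result, str) and len(result.encode()) > MAX_TEXT_BYTES:
        result = result[:MAX_TEXT_BYTES] + "\n[truncated: size limit]"
        redactions.append("truncated")

    # 2. Strip low-sophistication injection indicators.
    if isinstance(result, str):
        for pattern in INJECTION_PATTERNS:
            if pattern.search(result):
                result = pattern.sub("[redacted: potential injection]", result)
                redactions.append("injection_pattern")

    # 3. Remove top-level authority claims from structured results.
    if isinstance(result, dict):
        for field in AUTHORITY_FIELDS & result.keys():
            del result[field]
            redactions.append(f"authority_field:{field}")

    # 4. Enforce the declared output schema: strip off-schema fields.
    if isinstance(result, dict) and output_schema is not None:
        allowed = set(output_schema.get("properties", {}))
        for field in set(result) - allowed:
            del result[field]
            redactions.append(f"off_schema:{field}")

    # 5. Return sanitized result; redactions are logged by the caller.
    return result, redactions
```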

### Signal emission

```python
def emit_signal(event_type, mission_id, tool_name, decision):
    payload = {
        "event_type": event_type,
        "mission_id": mission_id,
        "tool_name": tool_name,
        "decision": decision,
        "timestamp": now_iso8601(),
    }
    post_json(MAS_EVENT_ENDPOINT, payload)
```

This can start simple. What matters is that denial, deferral, approval wait, and commit success are all visible to the authority plane.

## Key Tradeoffs

This architecture is not free. The costs are real and should be stated directly.

| Choice | Benefit | Cost | Mitigation |
|---|---|---|---|
| Durable MAS as authority owner | clear lifecycle, audit trail, approval basis; revocation is a single state change | another P0 control plane to build, secure, and operate | split into adjacent services (bundle distribution, approval workflow, signal buffer); only MAS core is P0 — see [MAS centrality mitigation](#mas-centrality-mitigation) |
| Audience-specific token projection | narrow runtime authority; no single token covers everything | more token issuance requests; cache churn on short lifetimes | 300–900s token lifetime; cached at host; issuance is only blocked on MAS unavailability |
| Capability snapshot before planning | host plans inside the actual allowed surface; fewer wasted tool attempts | session-start latency; refresh on every authority transition | single snapshot per session start (not per tool); 120s cache for reads; only commit-boundary forces a live check |
| Cedar as the policy layer | deterministic evaluation; auditable decisions; one language across all enforcement points | schema discipline and compilation pipeline are mandatory; Cedar library must be embedded everywhere | Cedar evaluates locally (no network call per decision); entity snapshot is cached; only cache-miss path hits MAS |
| Three-layer enforcement (host + MCP + commit boundary) | real containment; no single bypass point | more integration surface; more places to diagnose denials | [Policy debugging diagnostic playbook](#policy-debugging-diagnostic-playbook) and `explain` API make diagnosis tractable |
| `auto_with_release_gate` as the default approval mode | agent can do all preparatory work without blocking; gate only fires at irreversible moment | requires stage gate configuration per template; approval routing must work | users are motivated to complete the approval at the commit point because they've invested work; SLA targets enforce fast routing |
| Self-contained token + freshness check | lower runtime latency; no per-call introspection | revocation only takes effect at next freshness checkpoint (token issuance or commit boundary) | commit-boundary forces a live MAS check; short token lifetime (300–900s) bounds the revocation gap — see [Revocation latency](#revocation-latency) for concrete SLA targets |
| Template governance model | reusable, auditable authority envelopes; templates age gracefully | templates can rot if ownership is weak | [Template governance ownership and cadence](#template-governance-ownership-and-cadence) mandates ownership fields, review cadence, and deprecation path |
| Downstream-owned commit boundary | final write authority stays with the real system of record; MAS is not a write-path dependency | each high-risk tool family must implement its own commit boundary | [MAS advisory lock](#reconciling-mas-owns-the-lock-with-downstream-owns-serialization) handles cross-session coordination; `commit_intent_id` handles idempotency |

The tradeoff pattern is consistent:

- narrower authority means more projections and more cache management
- stronger containment means more enforcement points and more diagnosis surface
- better auditability means more state and more events

That is not accidental overhead. It is the price of replacing ambient tool authority with governed authority.

**What can be avoided:**

The design has intentional complexity and incidental complexity. The incidental complexity is where teams waste time:

| Incidental complexity to avoid | How to avoid it |
|---|---|
| Per-tool-call MAS queries | use capability snapshot; query MAS once per session start |
| Custom policy adapters in v1 | use Cedar everywhere; adapters are a v2+ consideration |
| Advanced profiles in v1 | [Core profile mandatory defaults](#core-profile-mandatory-defaults) makes them off by default and unimplemented |
| Duplicate sequence diagrams | one diagram per flow (Mermaid) |
| Cedar schema inline in main flow | Cedar reference in appendix; summary in main flow |
| Ambiguous commit-boundary ownership | [MAS centrality mitigation](#mas-centrality-mitigation) decision table resolves this |

**Advanced-profile tradeoffs** (only relevant if those profiles are enabled):

| Choice | Benefit | Cost |
|---|---|---|
| Derived sub-Missions | safer delegation; narrowing proof is auditable | more issuance logic; narrowing proof must be stored and verifiable |
| Cross-domain local token minting | preserves target-domain sovereignty | more exchange steps; political cost of partner onboarding |
| Approved temporary elevation | supports JIT access mediation | heavier entitlement integration; higher review burden |

## FAQ

### Why not just store Mission in the token?

Mission is lifecycle state, not just issuance state. A token carries a projection of Mission authority at issuance time. It cannot carry approval state that changes after issuance, suspension that happens mid-session, or revocation that fires between tool calls. Tokens project Mission; they are not Mission.

### Why not use only OAuth scopes?

Scopes work for static entitlements. This problem requires audience-specific tool bounds that change per-session, stage gates that require runtime approval objects, lifecycle states (active, suspended, revoked) that tokens cannot represent, and delegation semantics with narrowing proofs. Scopes are part of the projection layer but cannot replace the authority layer.

### Why not let the LLM do the approval classification?

Model output is proposal input, not authority. The classifier can help shape and rank candidates. But approval mode, hard denies, and stage gates need deterministic policy and stored evidence that survives session boundaries. A classification that lives only in the model's context cannot be audited, revoked, or amended.

### Why do both the host and MCP server enforce?

They protect different boundaries. The host stops obvious overreach before a request leaves the session: it keeps the model from planning toward denied tools and spares the user confusing hard stops. The MCP server is the non-bypassable boundary: it mediates every tool invocation outside the agent's process, so the agent cannot route around it. One without the other leaves a gap: a host-only model lets a determined attacker call MCP directly; an MCP-only model lets the model attempt denied tools repeatedly with no in-session signal.

### Why is `tools/list` not enough?

Hiding a tool is not the same as denying its execution. A model that knows about a tool (from prior context, from the session transcript, or from guessing) can still call it via `tools/call` even if it does not appear in `tools/list`. Real enforcement requires `tools/call` evaluation and, for irreversible actions, a commit-boundary check. `tools/list` filtering is a UX layer, not a security boundary.
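A minimal sketch of the distinction, with hypothetical mission and handler shapes: list filtering is cosmetic, while the `tools/call` check runs whether or not the tool was ever listed.

```python
class Forbidden(Exception):
    pass

def handle_tools_list(all_tools, mission):
    # UX layer only: filters what the model is shown.
    return [t for t in all_tools if t in mission["allowed_tools"]]

def handle_tools_call(tool_name, mission):
    # Security boundary: evaluated even if the tool was never listed.
    if mission["status"] != "active":
        raise Forbidden("mission_not_active")
    if tool_name not in mission["allowed_tools"]:
        raise Forbidden("tool_not_in_allowed_set")
    return "dispatch"
```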

### Why use Cedar instead of baking rules into code?

One policy language evaluated at four enforcement points (AS token issuance, host precheck, MCP `tools/call`, commit boundary). Hard-coding separate rule sets in each service guarantees that they drift. Cedar produces auditable, diffable policy that a security team can review without reading code. The Cedar schema and generation recipe are in the [Cedar Policy Reference appendix](#cedar-policy-reference).

### Why require a capability snapshot before planning?

Because planning on stale authority wastes the user's time and the model's context. A model that plans a 5-step workflow and gets denied on step 3 has wasted all prior work and produced a confusing user experience. Planning inside the current allowed surface means the model's plan is realistic before the first tool call. The cost is one MAS call at session start — not per tool, not per thought.

### Why separate delegation from cross-domain federation?

They solve different problems with different mechanisms. Delegation answers: "is this child agent's scope a strict subset of the parent's approved scope?" — answered by a narrowing proof stored in MAS. Federation answers: "will the target domain mint a local token for this principal and this scope?" — answered by an ID-JAG exchange with the target domain's AS. Conflating them produces a system where narrowing proofs and token exchange are intertwined and neither is correctly specified.
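The delegation side reduces to a verifiable attenuation check. A minimal sketch of what a narrowing proof asserts, with hypothetical field names:

```python
def is_strict_narrowing(parent_scope, child_scope):
    """True iff every tool and action class the child requests is already
    in the parent's approved scope. This is the relation a narrowing proof
    records in MAS; federation (ID-JAG exchange) is a separate mechanism."""
    return (
        set(child_scope["allowed_tools"]) <= set(parent_scope["allowed_tools"])
        and set(child_scope["allowed_action_classes"])
            <= set(parent_scope["allowed_action_classes"])
    )
```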

### Does this architecture require human approval for everything?

No. The default mode is `auto_with_release_gate`: the Mission activates immediately, all preparatory tools are available without any human gate, and only the irreversible commit action (publish, send, delete, pay) triggers a step-up. Routine work — reading data, drafting documents, summarizing, searching — proceeds without human involvement. Human review is reserved for the moment something cannot be undone.

### What is the minimum safe deployment?

The minimum safe internal deployment is the end state of Phase 3, not Phase 0. It requires:

- deterministic compiler with a versioned template pack and resource catalog
- Mission lifecycle with persisted approval evidence
- Mission-aware `tools/call` with Cedar evaluation
- Capability snapshot refresh at session start and authority transitions
- Audience-specific token projection
- Stage-gated execution for irreversible actions
- Downstream-owned commit boundary for each high-risk tool

See the full acceptance criteria in [V1 Product Contract](#v1-product-contract). Filtered `tools/list` alone is not a governed deployment.

### When should I amend vs. create a new Mission vs. clone?

See [Create vs. amend vs. clone: decision tree](#create-vs-amend-vs-clone-decision-tree) for the full flowchart. The short version: amend when the current Mission is active and the new scope is additive to the same purpose; create new when the purpose class changes; clone when the same work pattern repeats after a prior Mission completes.

### What happens when MAS is unavailable?

Existing sessions with valid cached snapshots continue for reads up to 120 seconds. Token issuance stops immediately (fail closed). Commit-boundary actions stop immediately. No new Missions or session starts succeed. The design is explicit: governance tightens under outage, never loosens. See [MAS centrality mitigation](#mas-centrality-mitigation) for the full decision classification table.
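That classification can be sketched as a single fail-closed decision function. The 120-second read window follows the snapshot TTL stated above; the request-kind names are illustrative.

```python
READ_CACHE_TTL_S = 120  # cached-snapshot read window during an outage

def decide_during_mas_outage(request_kind, snapshot_age_s=None):
    """Governance tightens under outage, never loosens: only sufficiently
    fresh cached reads survive; everything that mints or commits authority
    fails closed."""
    if (request_kind == "read"
            and snapshot_age_s is not None
            and snapshot_age_s <= READ_CACHE_TTL_S):
        return "allow_cached"
    # token issuance, commit-boundary actions, new Missions, session starts
    return "deny_fail_closed"
```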

### How does an operator diagnose a denial?

See [Policy debugging diagnostic playbook](#policy-debugging-diagnostic-playbook). The short path: check `GET /missions/{id}/audit?event_type=denial` for the denial event, then call `POST /missions/{id}/explain` with the tool name and actor to get a human-readable decision trace. Four steps from "user says it was denied" to "here is the specific rule and what to do about it."

### What does an operator do every morning?

Five checks in five minutes: pending approvals past SLA, templates approaching review deadline, unexplained Mission suspensions, bootstrap Mission count, overnight emergency control events. See [Operator console model](#operator-console-model) for the full daily/incident/weekly cadence.

### Should admins see `constraints_hash` values?

No. `constraints_hash` values appear truncated (first 12 characters) in the admin dashboard for correlation purposes only. Operators see the `human_summary` from the amendment diff API — what changed in plain English — not the raw hash. The hash is a cache-staleness key for enforcement points; it is not an admin-facing artifact.

### What does a user see when something is denied?

Plain language, not policy internals. "That tool isn't part of your current Mission" not "Cedar forbid rule matched." "This action requires Controller approval" not "`approval_missing: controller_approval`." See the [denial message translation table](#plain-language-denial-messages) for the full mapping. The model constructs the user-facing message from host-injected context; users never see raw error codes.

### How do I migrate an existing Claude Code deployment?

See [Rollout into an existing deployment](#rollout-into-an-existing-deployment) for the 6-step migration sequence and [Bootstrap Mission specification](#bootstrap-mission-specification) for how ungoverned sessions get a time-limited bootstrap Mission rather than being blocked immediately. The key: deploy in observation mode first, run for 30+ days, then enforce by template family.

### What is `auto_with_release_gate` and how is it different from `auto`?

Both activate the Mission immediately without human step-up. The difference: `auto` gives full access to all tools including commit-boundary tools immediately. `auto_with_release_gate` gives immediate access to read/draft tools and defers the commit-boundary tools until a stage gate is satisfied. `auto_with_release_gate` is the correct default for most internal workflows. See the [comparison table](#auto-vs-auto_with_release_gate-side-by-side-comparison) for the full side-by-side.

### Plain-language denial messages

The host's `PostToolUse` hook and the MCP server's error response carry structured denial codes. Those codes need to be translated into user-facing language the agent can work with and users can understand.

**Required translation table:**

| Denial reason | User-facing message | What the user can do |
|---|---|---|
| `tool_not_allowed` | "That tool isn't part of your current Mission. I can only use tools approved for [Mission purpose]." | Check the Mission scope or create a new Mission that includes this tool. |
| `mission_inactive` | "Your Mission has been paused or ended. I can't take action until it's active again." | Resume the Mission or create a new one. |
| `mission_expired` | "Your Mission has expired. Any further work requires a new Mission." | Start a new Mission from the same purpose template. |
| `approval_missing` | "This action requires approval before I can proceed. [approver name] needs to review this step." | Wait for approval or ask the approver to review the pending request. |
| `constraints_changed` | "The Mission policy was updated since this session started. Refreshing before I continue." | No action needed — the host will fetch the updated Mission automatically. |
| `commit_boundary_deny` | "I can't complete this action right now — it has side effects that require a fresh policy check. Retrying." | No action needed — the host will retry with a live Mission check. |
| `budget_exceeded` | "I've reached the session limit for [resource class / time / external calls]. Confirm to continue." | Resume the Mission to reset the budget counter with your acknowledgment. |
| `anomaly_detected` | "Unusual activity was detected in this session. Mission access has been restricted pending review." | Contact your Mission operator to review the anomaly flag. |

**Translation rules:**

1. Never surface Cedar decision traces, `constraints_hash` values, or Cedar entity names to end users.
2. Never tell the user what policy rule was triggered. Tell them what they can do next.
3. For `tool_not_allowed`, include the Mission purpose name so the user understands why this tool is excluded.
4. For `approval_missing`, include the approver type (not personal email) if available in the Mission approval spec.
5. For transient errors (`constraints_changed`, `commit_boundary_deny`), tell the user the host is handling it — don't surface a blocker that looks like a hard stop.
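The table and rules above can be sketched as a small translation layer the host runs before injecting denial context. The message subset follows the table; the context field names are assumptions.

```python
TRANSIENT = {"constraints_changed", "commit_boundary_deny"}

# Subset of the required translation table, keyed by denial reason.
MESSAGES = {
    "tool_not_allowed": "That tool isn't part of your current Mission. "
                        "I can only use tools approved for {purpose}.",
    "mission_inactive": "Your Mission has been paused or ended. "
                        "I can't take action until it's active again.",
    "approval_missing": "This action requires approval before I can proceed. "
                        "{approver} needs to review this step.",
}

def translate_denial(reason, context=None):
    """Map a structured denial code to user-facing language.
    Never leaks Cedar traces or hashes; unknown codes get a generic message."""
    context = context or {}
    template = MESSAGES.get(reason, "I can't do that under the current Mission.")
    msg = template.format(
        purpose=context.get("purpose", "this Mission"),
        approver=context.get("approver_type", "An approver"),
    )
    # Rule 5: transient denials are handled by the host; no hard stop.
    if reason in TRANSIENT:
        msg += " No action needed; I'll retry automatically."
    return msg
```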

**What the model should say vs. what the host should say:**

The host injects denial context as trusted system context. The model then speaks to the user in natural language. Do not require the model to construct denial explanations from raw error codes — that produces inconsistent or confusing messages.

Host injection format (not visible to user directly):

```
[Tool denied: mcp__email__send_external — reason: tool_not_allowed — mission: "Board Packet Preparation" — user message: "That tool isn't part of your current Mission."]
```

The model reads this, understands the denial, and responds to the user in context: "I'm not able to send external emails for this Mission — the Board Packet Preparation Mission is limited to internal document preparation and read access."

### Mission history and "My Missions" user journey

Users need a self-service view of their Missions — what is active, what is pending, what has completed, and what was denied.

**Minimum "My Missions" surface:**

The host or agent platform should expose a "My Missions" view backed by `GET /missions?user_id={uid}`. The minimum information per Mission:

| Field | What the user sees |
|---|---|
| `status` | Active / Waiting for approval / Paused / Completed / Expired |
| `purpose_class` display name | Plain-language Mission name (e.g., "Board Packet Preparation") |
| `created_at` | "Started [date]" |
| `expires_at` | "Expires in [X hours/days]" |
| `approved_tools` count | "Access to [N] tools" |
| open actions | Resume / Clone / View details / Complete |

Users should never see `constraints_hash`, Cedar entity names, or approval object IDs in this view.

**User journey: completing a Mission:**

1. User opens "My Missions" view.
2. User selects an active Mission.
3. User sees current status, when it expires, and what tools it has access to.
4. User selects "Complete" — host calls `POST /missions/{id}/complete`.
5. MAS transitions to `completed`, revokes active tokens, emits `mission.completed`.
6. Host surfaces: "Mission completed. No further tool execution is available under this Mission."

**User journey: reusing a past Mission:**

1. User opens "My Missions" view and sees a completed Mission.
2. User selects "Clone" — host calls `POST /missions/{id}/clone`.
3. MAS runs the full compiler against the current catalog and template version.
4. If auto-approvable, the new Mission activates immediately.
5. Host surfaces: "New Mission created from [previous purpose]. It's ready to use."
6. If it needs review, host surfaces: "A new Mission has been submitted for approval. You'll be notified when it's approved."

**User journey: understanding a denial:**

1. User asks why the agent couldn't do something.
2. Host surfaces the denial reason in plain language (see denial message table).
3. User selects "View Mission scope" — host shows the approved tools and actions for the current Mission.
4. If the user wants to expand scope, host offers: "Want to request an expanded Mission? I can submit an amendment."
5. Amendment path: `POST /missions/{id}/amend` with the expanded scope — follows the normal approval path.

**What to exclude from user views:**

- raw `constraints_hash` — users do not need to see or understand this
- Cedar policy names or entity IDs
- `act` chain contents
- approval object internals (`approval_id`, `reusable_within_mission`)
- audit trail raw events (available to operators, not end users)

The user's mental model should be: "My Mission is a record of what I asked the agent to do and what it's allowed to do. I can see it, pause it, and close it."

#### My Missions search and filter

Users accumulate Missions over time. Without search and filter, "My Missions" becomes an unnavigable list. All filters map to `GET /missions` query parameters.

**Filter capabilities:**

| Filter | UI control | `GET /missions` query parameter |
|---|---|---|
| Status | Multi-select chips: Active / Pending / Completed / Expired / Denied | `?status=active,pending` (comma-separated, multiple values) |
| Purpose class | Dropdown populated from the user's historical purpose classes | `?purpose_class=board_packet_preparation` |
| Date range | Date picker: "created after" and "created before" | `?created_after=2025-10-01&created_before=2025-10-31` |
| Free text search | Search box, searches `mission_summary` and `purpose_class` display name | `?search=board+packet` (server-side full-text search on indexed fields) |
| Sort | Dropdown: Most recent activity (default), Created newest first, Expires soonest | `?sort=last_active_desc` / `created_at_desc` / `expires_at_asc` |

**Default view state:** on opening "My Missions," the default filter is `status=active,pending` sorted by `last_active_desc`. This surfaces the Missions most likely to be relevant without burying active work under old completed records.

**Pagination:** `GET /missions` returns paginated results. Default page size: 20. The UI should support infinite scroll or a "Load more" control. Total count is shown: "Showing 12 of 47 Missions."
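Given the parameter mapping above, client-side query construction is mechanical. A sketch using Python's standard `urllib.parse`; the helper name and the `page_size` parameter are assumptions.

```python
from urllib.parse import urlencode

def build_missions_query(status=None, purpose_class=None,
                         created_after=None, created_before=None,
                         search=None, sort="last_active_desc", page_size=20):
    """Build the GET /missions query string from filter state.
    Defaults mirror the default view: active+pending, most recent activity."""
    params = {"status": ",".join(status or ["active", "pending"])}
    if purpose_class:
        params["purpose_class"] = purpose_class
    if created_after:
        params["created_after"] = created_after
    if created_before:
        params["created_before"] = created_before
    if search:
        params["search"] = search
    params["sort"] = sort
    params["page_size"] = page_size
    return "/missions?" + urlencode(params)
```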

**Empty states:**

| Condition | Message |
|---|---|
| No Missions exist at all | "You haven't started any governed work yet. When you make your first request, I'll create your first Mission automatically." |
| No Missions match the current filter | "No Missions match your filters. [Clear filters]" |
| Active Missions only filter, no active Missions | "No active Missions. [Browse completed Missions] or [Start new work]" |
| Search returns no results | "No Missions found for '[search text]'. Try a shorter search or [clear filters]." |

**Search indexing scope:** the search parameter matches against: `mission_summary` (the plain-language description produced during shaping), `purpose_class` display name, and `template_id` display name. It does not search tool names, Cedar policy text, or audit trail events — those are operator-level views.

**Mission list item minimum fields (for list row, not detail view):**

```
[Status badge] Board Packet Preparation
Started Oct 1  ·  Expires in 6 hours  ·  3 tools  ·  [Resume] [More ▼]
```

Where:
- Status badge: color-coded (see Mission health indicator spec)
- "3 tools" = count of `allowed_tools` from the capability snapshot
- `[Resume]` is shown for `active` and `pending` Missions; `[Clone]` for `completed` and `expired`; `[View]` for `denied`

**Filter persistence:** the user's last-used filter state should persist in local session storage so returning to "My Missions" restores where they left off.

### How should policy debugging work?

Policy debugging is not optional. A deployable system needs an explanation path for every important decision.

Minimum explanation surfaces:

- **compile explanation**
  - matched template
  - denied tools
  - stage gates
  - risk factors
- **issuance explanation**
  - requested audience
  - projected tools and actions
  - why a token was denied or narrowed
- **host denial explanation**
  - tool name
  - category of failure: stale context, out of scope, approval missing
- **MCP denial explanation**
  - token invalid, stale Mission, schema failure, policy deny, commit-boundary deny
- **amendment explanation**
  - previous hash
  - new hash
  - changed tool/action/gate set

If operators cannot answer "why was this denied?" within one screen, the governance model is too hard to run.

#### Policy debugging diagnostic playbook

When a user reports "the agent couldn't do X," the operator follows this sequence:

**Step 1 — Identify the denial source**

Call `GET /missions/{mission_id}/audit?event_type=denial&limit=20` to find the most recent denial events for this Mission. Each denial event carries: `source_component` (host_precheck / mcp_server / commit_boundary), `tool_name`, `denial_reason`, and `timestamp`.

Match the denial timestamp to the user's reported time. If no denial event exists for that time: the action may not have been attempted (user misremembered), or the denial happened at a component that isn't yet emitting audit events.

**Step 2 — Determine if it is a template/catalog problem or a runtime state problem**

| `denial_reason` | Likely cause | Where to look |
|---|---|---|
| `tool_not_in_allowed_set` | tool is not in the Mission's compiled `allowed_tools` | check the template's allowed resource classes and the catalog entry for the tool |
| `action_class_not_in_allowed_set` | action class is present but not permitted | check the template's `allowed_action_classes` |
| `hard_denied_action` | the tool or action is in the template's deny list | check the template's `hard_denied_actions` |
| `approval_missing` | stage gate requires approval that hasn't been granted | check the Mission's current approval objects |
| `mission_not_active` | Mission is in a non-active state (suspended, expired) | check the Mission's current status |
| `constraints_hash_mismatch` | enforcement point is on stale policy | check when the enforcement point last refreshed its snapshot |
| `entity_snapshot_expired` | entity snapshot TTL has elapsed | check the MCP server's cache refresh behavior |

**Step 3 — Use the policy explain API**

```
POST /missions/{mission_id}/explain
{
  "tool_name": "mcp__finance__erp.read_financials",
  "actor_id": "user_alice",
  "action_class": "read"
}
```

Response:

```json
{
  "decision": "deny",
  "decision_source": "cedar_evaluation",
  "matching_rule": "forbid: tool not in allowed_tools for this Mission",
  "template_id": "board_packet_v3",
  "entity_snapshot_hash": "b9c2e4...",
  "context_evaluated": {
    "mission_status": "active",
    "approvals": [],
    "runtime_risk": "low",
    "commit_boundary": false,
    "trust_domain": "enterprise"
  },
  "human_explanation": "This tool (finance ERP read) is not in the allowed tools for the Board Packet Preparation Mission. The Mission only allows: finance_query, docs_editor, docs_search.",
  "operator_explanation": "Cedar forbid rule matched: tool 'mcp__finance__erp.read_financials' is not a member of ToolGroup 'board_packet_read_tools'. Template board_packet_v3 does not include this resource class."
}
```

`human_explanation` is surfaced to users. `operator_explanation` is surfaced in the admin console only.

**Step 4 — Check if the fix is a template amendment, catalog addition, or approval routing change**

| Root cause | Fix |
|---|---|
| Tool missing from template's resource classes | template amendment (broadening — requires approval) |
| Tool missing from catalog entirely | add catalog record and register in audience registry |
| Tool in hard deny list by mistake | template amendment to remove it from hard denies |
| Approval routing sends to wrong approver | fix approver group configuration |
| Approval TTL too short | extend TTL in template configuration |
| Enforcement point on stale snapshot | force cache refresh; check invalidation path |

**What the explain API does not expose:**

- raw Cedar policy text
- internal constraint graph
- `constraints_hash` internals
- other users' approval states
- signal weight accumulation detail (visible in the admin dashboard, not the explain response)

## Protocol Reuse vs Extensions

This architecture is implementable now largely by reusing existing protocols, but it still requires some new application-layer contracts, and there are a few places where future standardization would help.

### What can be reused as-is

| Protocol or standard | What is reused directly |
|---|---|
| OAuth 2.0 | token issuance, bearer token transport, audience scoping |
| OpenID Connect | subject tokens from enterprise IdP |
| OAuth Token Exchange (`RFC 8693`) | deriving audience-specific tokens from a subject token |
| OAuth Rich Authorization Requests (`RFC 9396`) | structured authorization input when scopes are too coarse |
| OAuth introspection / revocation | liveness and revocation for opaque-token deployments |
| MCP authorization and tool surface | `Authorization: Bearer`, `tools/list`, `tools/call` |
| Cedar | policy evaluation across host, AS, MCP, and commit boundary |
| ID-JAG | identity assertion bridge across trust domains |

None of those need to be changed to build the system described here.

### What is reused with a local profile

These standards are reused, but they need a tighter local profile to be useful for Mission-aware authorization.

| Reused surface | Local profile needed |
|---|---|
| OAuth access tokens | include `mission_id`, `constraints_hash`, and audience-specific tool projection claims |
| OAuth `authorization_details` | define a local type for tool access projection |
| OAuth token issuance policy | make issuance depend on current authority state, approval state, and current `constraints_hash` |
| MCP responses | map Mission denials, stale Mission state, and approval-required states into consistent server behavior |
| audit and signal events | carry Mission correlation and integrity fields consistently across services |

This is profile work, not protocol replacement.
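As an illustration of that profile, a non-normative access token payload: `mission_id` and `constraints_hash` are the claims named above, the `authorization_details` structure follows RFC 9396 conventions, and the `tool_access` type is the local type this table calls for. All values are invented.

```json
{
  "iss": "https://as.example.internal",
  "sub": "user_alice",
  "aud": "mcp://finance",
  "exp": 1760000000,
  "mission_id": "mis_8f21",
  "constraints_hash": "b9c2e4...",
  "authorization_details": [
    {
      "type": "tool_access",
      "allowed_tools": ["finance_query", "docs_search"],
      "allowed_action_classes": ["read", "draft"]
    }
  ]
}
```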

### New application-layer protocols or contracts you must define

These are the parts no existing standard gives you.

| New contract | Why it is needed |
|---|---|
| MAS APIs | Mission creation, capability snapshot, lifecycle, derivation, amendment, revoke, signal ingestion |
| Mission proposal schema | model-shaped intent is not standardized |
| governance record schema | durable Mission authority state needs a shared shape |
| review packet schema | human or policy approval needs a legible review object |
| approval object schema | scoped, integrity-protected approval artifact |
| runtime signal schema | shared event contract for risk, denial, revoke, and approval updates |
| token projection metadata | shared record of what authority was projected into each issued token |
| narrowing proof artifact | delegated child authority needs explicit attenuation proof |
| policy bundle fetch contract | host, AS, and MCP need a common way to fetch Cedar bundles by `constraints_hash` |

These are the core Mission-specific contracts. They are not covered by OAuth or MCP.

### Protocol extension candidates worth standardizing later

If this architecture is meant to interoperate across multiple implementations, these are the highest-value standardization points.

| Candidate extension | Why it would help |
|---|---|
| standard Mission token claims | avoids every implementation inventing different names for `mission_id` or `constraints_hash` |
| standard tool-access `authorization_details` type | makes audience-specific tool projection interoperable |
| standard Mission denial / stale-state response model for MCP | makes clients handle approval-required and stale-Mission cases consistently |
| standard privacy-preserving cross-domain trace handle | supports traceability without exposing internal Mission content |
| standard delegated narrowing proof format | makes child-Mission or delegated-token attenuation auditable across systems |

These are not required to build now, but they would reduce fragmentation later.

### What should not be extended

Some boundaries should stay clean.

| Surface | What not to do |
|---|---|
| ID-JAG | do not turn it into a Mission carrier or approval carrier |
| MCP | do not try to stuff full governance state into the protocol |
| OAuth scopes | do not force scopes to carry lifecycle, approval, or delegation semantics by themselves |
| Cedar | do not treat Cedar as the storage system for Mission records |

The design works because each layer does one job:

- MAS stores authority state
- OAuth projects authority into tokens
- MCP carries tool calls
- Cedar evaluates policy
- ID-JAG bridges identity across domains

### Short summary

Use existing standards for:

- OAuth
- OIDC
- Token Exchange
- RAR
- introspection / revocation
- MCP auth and tool surfaces
- Cedar
- ID-JAG

Define new local contracts for:

- MAS APIs
- Mission schemas
- approval objects
- runtime signals
- policy bundle fetch
- narrowing proof artifacts

Standardize later, if needed, around:

- Mission token claims
- tool-access `authorization_details`
- MCP Mission error semantics
- privacy-preserving cross-domain trace handles

## What Still Needs Real-World Tuning

At this point the architecture is concrete and specced. What remains is not missing theory — it is operational calibration that can only be done with real usage data. Each subsection below explains *why* the calibration is necessary; the actionable checklist is in [Open Issues and TODO](#open-issues-and-todo).

### Template fit

The starter template pack is only a starting point. Real deployments need to learn:

- which templates are too broad
- which are too narrow
- which work patterns are missing
- which exceptions happen often enough to deserve a new template

This usually requires running in observation mode first and reviewing real prompt, tool, and approval patterns.

### Approval burden

The approval model is structurally sound, but the workflow can still be unusable if:

- too many actions require step-up
- approval TTLs are too short
- approver routing is too slow
- users cannot tell why approval is required

The real test is whether irreversible actions are gated without turning ordinary work into a queue.

**Approval latency SLA targets:**

| Approval mode | Target latency | What happens if SLA is exceeded |
|---|---|---|
| `auto` — compiler decision | < 2 seconds | compiler bug; alert and investigate |
| inline step-up (user is approver) | < 30 seconds for user to respond | no escalation — user can take their time; agent waits |
| async step-up (external approver) — routine | < 2 business hours | escalate to approver's manager; notify requestor of delay |
| async step-up (external approver) — high-risk | < 30 minutes | escalate immediately; offer emergency bypass path (with dual-operator approval logged) |
| async Mission creation review | < 4 business hours | escalate; requestor notified at 2h mark |

When an async approval exceeds its SLA, MAS emits an `approval.sla_exceeded` event. The approval workflow service must have a configured escalation path for this event — it does not retry the original notification, it escalates to a secondary approver or queue.
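The escalate-don't-retry behavior can be sketched in a few lines. This is a minimal, hypothetical illustration, not the MAS implementation: the `SLA_SECONDS` map, `ApprovalRequest` shape, and `check_sla` name are invented for this sketch, and business hours are simplified to wall-clock seconds.

```python
from dataclasses import dataclass, field

# Hypothetical per-mode SLA targets in seconds, taken from the table above.
# Business hours are simplified to wall-clock time for illustration.
SLA_SECONDS = {
    "async_routine": 2 * 60 * 60,
    "async_high_risk": 30 * 60,
    "mission_creation_review": 4 * 60 * 60,
}

@dataclass
class ApprovalRequest:
    mode: str
    age_seconds: int
    escalated: bool = False
    events: list = field(default_factory=list)

def check_sla(req: ApprovalRequest) -> None:
    """Emit approval.sla_exceeded once, then escalate to a secondary
    approver or queue. Never re-send the original notification."""
    target = SLA_SECONDS.get(req.mode)
    if target is None or req.age_seconds <= target or req.escalated:
        return
    req.events.append("approval.sla_exceeded")
    req.escalated = True  # escalation path fires; original request is untouched

req = ApprovalRequest(mode="async_high_risk", age_seconds=45 * 60)
check_sla(req)
assert req.events == ["approval.sla_exceeded"] and req.escalated
```

The key property is idempotence: a second `check_sla` call on the same overdue request does nothing, because escalation is a one-time state transition, not a notification retry loop.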

**Bypass signals — what to watch for:**

A well-designed approval model that is too slow or noisy will be bypassed. These patterns in the audit log indicate the approval model is being worked around:

| Pattern | Likely cause | Response |
|---|---|---|
| User requests Mission amendments more than 3x per week for the same purpose class | template scope is too narrow | widen the template scope |
| Bootstrap Missions staying active past 30 days | migration is stalled or approval for real Missions is too slow | investigate migration blockers |
| Same user creates new Mission immediately after a denial | user found that creating a new Mission resets the denial | check if denial reasons are being used as approval gates |
| Approval request withdrawn and resubmitted multiple times | approver routing is wrong or approver is unresponsive | fix routing; set SLA escalation |
| Many denials followed by out-of-scope tool calls via a different path | user is asking for the denied action through a permitted tool indirectly | template has a gap that needs to be closed |

**The design principle:** approval must be faster than the workaround. If the legitimate path takes longer than an informal alternative (asking a colleague to run the query, using a non-Mission-aware tool), users will use the informal path. The governance model only works if it is faster and clearer than circumventing it.

Concretely: `auto_with_release_gate` should be the default because it lets users do all preparatory work without any approval gate. The gate only fires at the commit boundary — the moment when something irreversible is about to happen. By that point, the user has invested time in the work and is motivated to complete the approval, not bypass it.
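The gate placement can be shown as a small decision function. This is a hedged sketch, assuming an illustrative split of action classes into preparatory and commit sets; the names `PREPARATORY`, `COMMIT`, and `gate` are invented here, not part of the spec's contracts.

```python
# Hypothetical action-class sets for illustration; real classes come from templates.
PREPARATORY = {"read", "draft", "analyze"}
COMMIT = {"send", "delete", "post"}

def gate(action_class: str, has_valid_approval: bool) -> str:
    """auto_with_release_gate: no gate before the commit boundary,
    a mandatory gate at it, and fail-closed for unknown classes."""
    if action_class in PREPARATORY:
        return "allow"                      # all preparatory work flows freely
    if action_class in COMMIT:
        return "allow" if has_valid_approval else "approval_required"
    return "deny"                           # unknown action classes fail closed

assert gate("draft", has_valid_approval=False) == "allow"
assert gate("send", has_valid_approval=False) == "approval_required"
assert gate("send", has_valid_approval=True) == "allow"
```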

### Risk thresholds

The default scoring model is a baseline, not a validated safety model.

You still need to tune:

- false-positive rate
- false-negative rate
- which runtime signals actually correlate with risky behavior
- how session-local, Mission-local, and principal-wide risk interact

### Backend resource mappings

This is one of the first places reality diverges from design.

You need to validate:

- whether high-level resource classes actually map cleanly to backend systems
- whether backend teams can maintain those mappings
- whether FGA exists where instance-level checks are needed
- whether data sensitivity classes match real usage

### Host ergonomics

Host integration is technically specified, but still needs usability tuning.

You need to observe:

- how often hosts hit stale policy
- how often capability-snapshot refresh adds noticeable latency
- whether denials are understandable
- whether the hook or wrapper model creates too much operator or user friction

### Revocation latency

Revocation acts at the next checkpoint, not magically everywhere at once. The following target SLAs define what "effective revocation" means for each action class:

| Action class | Revocation-to-containment SLA | Mechanism |
|---|---|---|
| Low-risk read | ≤ token lifetime (300-900s maximum) | token expires or refresh rejected by AS |
| High-risk write or gated action | ≤ 120 seconds | MAS emits `mission.suspended` or `mission.revoked` signal; enforcement points refresh cache on next signal receipt; cache TTL ≤ 120s for high-risk routes |
| Commit-boundary action | immediate (synchronous) | commit boundary always calls MAS live before the write; MAS revocation stops issuance of the `commit_intent_id`; no commit proceeds without a live MAS check |
| New session or token refresh | immediate | AS rejects token issuance for revoked Mission at the time of the request |

These SLAs assume signal propagation is working. The degraded-mode contract applies when MAS is unreachable: high-risk writes and commit-boundary actions fail closed immediately; low-risk reads may continue within the cache TTL.

In production, measure:

- time from operator revocation action to `mission.revoked` signal emission at MAS (target: < 2 seconds)
- time from signal emission to enforcement point cache update (target: < 30 seconds under normal load)
- time from cache update to first denied high-risk request (target: immediate — no additional grace window)
- token lifetime distribution in active sessions — skew it toward the low end (300–400 seconds) if tight revocation is a priority

### Commit-boundary placement

This is one of the most important deployment-specific choices.

For each high-risk tool, decide:

- where the true irreversible boundary is
- whether the MCP server can own it
- whether the downstream service must own it
- what idempotency contract exists there

If the commit boundary is placed too early, workflows become noisy. If it is placed too late, irreversible actions can escape the governance loop.

### Observability quality

The note defines signals, audit records, and SLOs. That is still not the same as operational clarity.

You need to test whether:

- an operator can reconstruct why an action was denied
- support can explain approval failures
- security can trace revocation propagation
- auditors can connect action, approval, Mission version, and actor identity cleanly

### Admin workflow quality

The architecture is only usable if the admin surface is small and decision-oriented.

You need to validate:

- whether admins can operate mostly from Missions, approvals, denials, and template drift views
- whether template changes are simulatable before rollout
- whether hash changes can be explained without reading raw policy
- whether emergency controls are visible and safe to use under pressure

### End-user experience quality

The governance model is technically correct only if it is also understandable to the person being gated.

You need to validate:

- whether users can tell what the agent is allowed to do now
- whether users understand why approval is required
- whether denial messages tell the user what to do next
- whether inline approvals are short and specific enough to avoid fatigue

### Rollout tolerance

Migration always exposes hidden dependencies.

The design should be validated in:

- observation mode
- narrow deny mode on selected tool families
- template-by-template enforcement rollout
- staged commit-boundary rollout

Do not assume the whole environment is ready for full enforcement at once.

### Observation (shadow) mode spec

Observation mode is not "deploy and watch." It is a specific operational mode with a defined contract. Without a precise definition, teams will implement it inconsistently — some logging nothing, some blocking anyway — and the migration data will be unreliable.

**Mechanical definition of observation mode:**

1. The host hook runs all checks (PreToolUse, Cedar policy evaluation, `constraints_hash` validation) exactly as in enforcement mode.
2. When the Cedar evaluation would produce a `deny`, the host:
   - logs the denial as a `would_have_denied` signal to MAS (not a `denied` signal)
   - proceeds with the tool call anyway (returns allow)
3. MAS accumulates `would_have_denied` signals separately from enforcement denials. The shadow distribution is queryable by operators before enforcement is enabled.
4. The `anomaly_flags` rail operates in observation mode the same as in enforcement: anomaly signals are emitted, weights accumulated, but MAS does **not** suspend the Mission on anomaly threshold breach. Instead it emits a `shadow_suspension_would_have_triggered` event to the operator dashboard.

**What observation mode does NOT do:**
- It does not disable `constraints_hash` freshness checking. The host still must have a current bundle.
- It does not disable commit-boundary prechecks. Those run and produce shadow records but do not block.
- It does not disable audit logging. Every tool call is still recorded.

**API: shadow signals vs. enforcement signals**

The signal endpoint is the same (`POST /missions/{id}/signals`), but the signal shape carries a mode field:

```json
{
  "signal_type": "would_have_denied",
  "source_component": "host_precheck",
  "tool_name": "finance_query",
  "denial_reason": "action_class_not_in_allowed_set",
  "mode": "observation",
  "timestamp": "2025-10-01T14:32:00Z"
}
```

Enforcement signals use `"mode": "enforcement"`. MAS stores them in separate signal buckets. The operator dashboard shows both distributions side by side.

**Operator switch: observation → enforcement mode**

Enforcement mode is scoped per template family, not globally. A team can enforce `board_packet_preparation` while still observing `support_ticket_triage`.

Switch endpoint:

```
POST /templates/{template_id}/enforcement-mode
{
  "mode": "enforcement",
  "effective_at": "2025-11-01T00:00:00Z"
}
```

`effective_at` can be immediate (`now`) or scheduled. Scheduled switches are logged and visible in the operator dashboard before they fire.

**Before switching to enforcement, operators should verify:**

| Check | Target threshold |
|---|---|
| `would_have_denied` rate | < 5% of tool calls across sessions using this template (tune by environment) |
| false positive candidates | review top denial reasons — are they policy calibration issues or real violations? |
| shadow suspension trigger count | zero unexpected `shadow_suspension_would_have_triggered` events in the past 7 days |
| missing resource catalog entries | no `catalog.resource.not_found` signals (missing tools should be catalogued, not denied silently) |

**Observation mode duration:** no fixed minimum, but at least long enough to capture a full work cycle for the template's purpose class. A `board_packet_preparation` Mission should observe at least 3–5 real board packet sessions before switching to enforcement. A `support_ticket_triage` Mission should observe 50–100 ticket sessions.

**Observation mode is not permanent:** templates that stay in observation mode indefinitely provide no protection. Set a target enforcement date at observation mode deployment. If the target is missed, require an explicit decision to extend — not a silent default.
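The pre-switch checklist and work-cycle minimum can be combined into a single readiness predicate an operator tool might evaluate against the MAS shadow-signal buckets. A hedged sketch; `ready_for_enforcement` and its parameter names are invented, and the 5% threshold is the tunable default from the table:

```python
def ready_for_enforcement(would_have_denied_rate: float,
                          shadow_suspensions_7d: int,
                          missing_catalog_entries: int,
                          observed_work_cycles: int,
                          min_cycles: int) -> bool:
    """Apply the pre-enforcement checklist: low shadow-denial rate, no
    unexpected shadow suspensions in 7 days, a complete resource catalog,
    and enough observed work cycles for this template's purpose class."""
    return (would_have_denied_rate < 0.05
            and shadow_suspensions_7d == 0
            and missing_catalog_entries == 0
            and observed_work_cycles >= min_cycles)

assert ready_for_enforcement(0.02, 0, 0, 5, 3) is True
assert ready_for_enforcement(0.08, 0, 0, 5, 3) is False  # denial rate too high
```

The false-positive review in the checklist is deliberately left out of the predicate: deciding whether top denial reasons are calibration issues or real violations is a human judgment, not a threshold.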

### Ownership discipline

The architecture assumes that teams will keep templates, mappings, approval groups, policy bundles, and revocation paths current.

That only happens if the organization also has:

- change control for templates and mappings
- review cadence for approver groups and thresholds
- incident ownership for policy failures and false denials
- operational accountability for revocation, emergency disablement, and migration quality

The short version: the architecture is now concrete enough to build, but it still needs production tuning in policy fit, approval UX, latency, resource mapping, and operational ownership before it becomes a reliable system.

## Configuration Management

This document defines many configurable values, scattered across its sections. Without a spec for where they live, who can change them, and how changes are versioned, an operator who needs to adjust a threshold must either guess or read source code. This section consolidates the configuration surface.

### Configuration namespaces

Configuration values belong to one of three namespaces, owned by different teams:

| Namespace | Owner | Values |
|---|---|---|
| `mas.*` | MAS team / security operations | classifier thresholds, anomaly signal weights, clarification round limits, entity snapshot TTLs, anomaly suspension threshold, session quota defaults, Mission expiry warning thresholds |
| `template.*` | Policy team | session budget defaults, approval TTLs per risk tier, stage gate routing, time bound defaults — these live in template definitions, not global config |
| `as.*` | AS team | token lifetimes (min/max), introspection cache TTL, audience registry verification TTL |

Template-managed values are governed by the template review process, not the configuration API. Changing a session budget on a template requires re-review per the template re-review rules.

### Configuration API

All configuration reads and writes require an operator-scoped token. End-user tokens cannot access the configuration API.

**Read a namespace:**

```
GET /config/{namespace}
Authorization: Bearer <operator token>
```

Example: `GET /config/mas`

**Response:**

```json
{
  "namespace": "mas",
  "version": "42",
  "updated_at": "2025-10-15T09:00:00Z",
  "updated_by": "ops-admin@example.com",
  "values": {
    "classifier.auto_approve_threshold": 0.80,
    "classifier.human_step_up_threshold": 0.65,
    "classifier.deny_threshold": 0.10,
    "entity_snapshot_ttl_seconds.default": 120,
    "entity_snapshot_ttl_seconds.token_issuance": 0,
    "entity_snapshot_ttl_seconds.commit_boundary": 0,
    "anomaly.suspension_weight_threshold": 100,
    "anomaly.signal_weights.repeated_denial": 20,
    "anomaly.signal_weights.out_of_scope_attempt": 25,
    "anomaly.signal_weights.commit_boundary_retry": 40,
    "anomaly.signal_weights.prompt_injection_indicator": 60,
    "anomaly.signal_weights.argument_pattern_anomaly": 30,
    "anomaly.signal_weights.cross_resource_exfil_pattern": 70,
    "anomaly.signal_weights.session_budget_spike": 20,
    "clarification.max_rounds": 3,
    "session.max_active_missions_per_user": 5,
    "expiry_warning.critical_threshold_seconds": 900,
    "expiry_warning.high_threshold_seconds": 3600,
    "expiry_warning.normal_threshold_seconds": 86400
  }
}
```

**Update a value:**

```
PATCH /config/{namespace}
Authorization: Bearer <operator token>
Content-Type: application/json

{
  "values": {
    "classifier.auto_approve_threshold": 0.85
  },
  "reason": "Tightening classifier threshold after Q4 policy review"
}
```

Response: the full updated namespace config with incremented `version`.

Every write is logged to the MAS audit trail with: operator identity, prior value, new value, `reason`, timestamp.

**Read AS namespace:**

```
GET /config/as
```

```json
{
  "namespace": "as",
  "version": "7",
  "values": {
    "token.min_lifetime_seconds": 300,
    "token.max_lifetime_seconds": 900,
    "token.default_lifetime_seconds": 600,
    "introspection_cache.ttl_seconds": 30,
    "audience_registry.verification_ttl_days": 365
  }
}
```

### Configuration versioning contract

- every configuration write increments `version` (a monotonically increasing integer, serialized as a JSON string as shown in the examples above)
- `version` is included in capability snapshots: `"config_version": "42"` — enforcement points that cache configuration must invalidate their cache when the version changes
- configuration history is immutable: the audit trail records every value change with prior and new values; records cannot be deleted
- configuration rollback is a new write with the prior values, not a revert operation — the audit trail must show the rollback as an explicit decision
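The version-keyed cache invalidation rule from the second bullet can be sketched directly. Illustrative only: `ConfigCache`, `on_snapshot`, and the `fetch_config` callback are invented names for this sketch.

```python
class ConfigCache:
    """Hypothetical enforcement-point configuration cache keyed on the
    config_version carried in capability snapshots."""

    def __init__(self):
        self.version = None
        self.values = {}

    def on_snapshot(self, snapshot: dict, fetch_config) -> dict:
        v = snapshot["config_version"]
        if v != self.version:
            # Version changed: refetch the whole namespace. Never patch
            # cached values in place, and never serve a stale version.
            self.values = fetch_config()
            self.version = v
        return self.values

cache = ConfigCache()
cache.on_snapshot({"config_version": "42"}, lambda: {"ttl": 120})
vals = cache.on_snapshot({"config_version": "43"}, lambda: {"ttl": 60})
assert vals == {"ttl": 60} and cache.version == "43"
```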

### Which changes require Mission re-compilation

| Configuration value changed | Effect on active Missions |
|---|---|
| `classifier.*` thresholds | no recompilation required; affects new Missions only — existing compiled Missions are not reclassified |
| `entity_snapshot_ttl_seconds.*` | no recompilation; enforcement points adopt new TTL on next cache refresh |
| `token.max_lifetime_seconds` | no recompilation; new tokens issued after the change use the new lifetime |
| `anomaly.signal_weights.*` | no recompilation; MAS adopts new weights for signals received after the change; accumulated weights from prior signals are not retroactively recalculated |
| `anomaly.suspension_weight_threshold` | no recompilation; MAS adopts the new threshold immediately — if any active Mission's accumulated weight already exceeds the new threshold, it is suspended at the next signal evaluation |
| `clarification.max_rounds` | no recompilation; applies to the next clarification attempt on any Mission |
| `session.max_active_missions_per_user` | no recompilation; applies at Mission creation time |
| `expiry_warning.*` thresholds | no recompilation; applies to the next expiry notification cycle |

**Summary:** no configuration change in the `mas.*` or `as.*` namespaces requires recompiling active Missions. Configuration values govern behavior going forward; they do not invalidate existing compiled state. Template changes require re-review per the template governance rules and may produce a new `constraints_hash`.
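The anomaly-rail rows in the table have the subtlest semantics: new weights apply only to signals received after the change, accumulated weights are never recalculated, and a lowered suspension threshold fires at the next signal evaluation. A minimal sketch under those rules; `AnomalyRail` is an invented name, and the weights are taken from the example config above.

```python
class AnomalyRail:
    """Hypothetical per-Mission anomaly accumulator. The threshold is
    checked per incoming signal, never retroactively recalculated."""

    def __init__(self, weights: dict, threshold: int):
        self.weights = weights
        self.threshold = threshold
        self.accumulated = 0
        self.suspended = False

    def on_signal(self, signal_type: str) -> None:
        self.accumulated += self.weights.get(signal_type, 0)
        if self.accumulated >= self.threshold:
            self.suspended = True

rail = AnomalyRail({"repeated_denial": 20}, threshold=100)
for _ in range(3):
    rail.on_signal("repeated_denial")   # accumulated = 60, below threshold
rail.threshold = 50                     # operator lowers the threshold
assert rail.suspended is False          # no retroactive re-evaluation
rail.on_signal("repeated_denial")       # next evaluation: 80 >= 50
assert rail.suspended is True
```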

### Sensitive values

Some configuration values are security-relevant. Changes to these values must require dual approval (two operator tokens) and must trigger an immediate notification to the security operations team:

- `classifier.deny_threshold` (lowering this reduces how aggressively the compiler blocks Missions)
- `anomaly.suspension_weight_threshold` (raising this makes it harder to trigger suspension)
- any `anomaly.signal_weights.*` for high-severity signals (reducing weights for `prompt_injection_indicator` or `cross_resource_exfil_pattern`)

The dual-approval requirement is enforced at the API level: a single operator token is insufficient for these fields; a second confirmation token from a different operator identity is required.

## Design Assessment

This section is the blunt evaluation of the architecture. All findings identified in the assessment have been resolved — either specced in this document or explicitly deferred to deployment decisions. The [Assessment Task List](#assessment-task-list) records the gap-to-resolution chain.

### What is smart

- **Mission as durable authority state** is the right core move. It fixes the common failure mode where prompt text, session memory, and token claims pretend to be governance. The governance record is the root artifact everything else derives from.
- **Capability snapshots** are a strong simplification. They give the host a planning surface without making MAS a chatty hot-path dependency. One call at session start; cache for reads; live check only at commit boundary.
- **`constraints_hash` as a live enforcement handle** is the right mechanism for keeping enforcement points synchronized without per-call MAS queries. Every enforcement point carries the hash; mismatch triggers a fresh fetch before evaluation.
- **Downstream-owned commit boundary** is the right distributed-systems stance. Final write authority stays with the system of record. MAS holds an advisory lock for cross-session coordination; `commit_intent_id` provides idempotency at the write.
- **`auto_with_release_gate` as the default approval mode** is the right UX balance. The model can do all preparatory work without blocking; the human gate fires only at the irreversible moment. The user is motivated to complete approval because they've invested work.
- **O(templates) Cedar policy** is the right performance model. One template policy set shared across all Mission instances of that class; per-instance entity snapshot changes on amendment. Policy set size does not grow with Mission count.
- **The architecture is honest about sequence risk.** It is a point-in-time authority control system, not a behavioral sequence analyzer. Session budgets and anomaly detection bound but do not eliminate the residual risk. That honesty makes the security posture more credible.
- **V1 product contract and mandatory defaults** mean implementation teams do not have to interpret optional paths. One configuration, one approval mode, one policy language, one token validation model.

### What is differentiated

The differentiator is not OAuth or MCP usage on their own. It is the combination:

| Capability | What it replaces |
|---|---|
| Mission as a durable authority record with lifecycle | session-scoped token claims pretending to be authority |
| Compiler from natural language intent to bounded enforcement bundle | ad hoc prompt-level approval and policy by convention |
| `constraints_hash` as a cache-staleness detector across enforcement points | per-call MAS queries or static policy that ignores amendments |
| Capability snapshot for host planning | planning by trial-and-error against repeated denials |
| Downstream-owned commit boundary with MAS advisory lock | host-side locks that are invisible to concurrent sessions |
| O(templates) Cedar policy with per-Mission entity snapshot | O(missions) policy that does not scale |

This combination is meaningfully different from:
- plain scoped tokens (no lifecycle, no approval evidence, no stage gates)
- "just use FGA" (FGA governs resource instances; it does not compile intent, gate by phase, or carry approval evidence)
- prompt-level approval (not durable, not auditable, not revocable)
- per-call MAS queries (makes MAS a hot-path synchronous dependency under load)

### What is risky

- **MAS still has high conceptual centrality.** Even with the service split, too many decisions still anchor on it. → *Resolved: see [MAS centrality mitigation](#mas-centrality-mitigation) for the decision classification table and blast radius analysis.*
- **Template governance can rot.** If template ownership is weak, the whole model degrades into brittle policy debt. → *Resolved: see [Template governance ownership and cadence](#template-governance-ownership-and-cadence) for required ownership fields, review cadence rules, and deprecation path.*
- **Approval UX is a real failure mode.** A structurally correct model with slow or noisy approval will get bypassed. → *Resolved: see [Approval burden](#approval-burden) for SLA targets, bypass signal patterns, and the core design principle.*
- **Capability snapshot drift** is a risk if host, AS, MCP, and downstream systems interpret the same authority differently. → *Resolved: see [Capability snapshot drift detection and reconciliation](#capability-snapshot-drift-detection-and-reconciliation) for the drift taxonomy, reconciliation protocol, and weekly runbook check.*
- **Cross-domain remains expensive.** It is coherent architecturally and still likely to be slow politically and operationally. → *Not resolved in this spec: cross-domain federation is an advanced profile explicitly excluded from v1. The architecture is correct; the operational cost is real and must be budgeted separately when v2 scope is set.*
- **Sequence-level misuse is still only partially controlled.** The note is honest about that, but it remains the biggest residual security gap. → *Not resolvable by this architecture: multi-step instruction sequence attacks are a residual risk that cannot be fully addressed by point-in-time authority control. The document is honest about this limit. Mitigations are session budgets, anomaly detection, and commit-boundary containment — not a full solution.*

### What is missing

- a single **v1 product contract** that an implementation team can point to without interpreting multiple optional paths → *Resolved: see [V1 Product Contract](#v1-product-contract) at the top of the document.*
- a stronger **policy debugging workflow** for operators → *Resolved: see [Policy debugging diagnostic playbook](#policy-debugging-diagnostic-playbook) and the `POST /missions/{id}/explain` API.*
- a more explicit **admin lifecycle** for template changes, mapping changes, and rollbacks → *Resolved: see [Admin lifecycle for template and mapping changes](#admin-lifecycle-for-template-and-mapping-changes).*
- a clearer **operator console model** for what gets surfaced every day versus only during incidents → *Resolved: see [Operator console model](#operator-console-model) for daily, incident, and weekly cadence views.*

### What is too complex

- the architecture is strongest in its single-domain core and weaker when advanced profiles bleed into the mental model → *Addressed: [Core profile mandatory defaults](#core-profile-mandatory-defaults) makes advanced profiles explicitly off by default and unimplemented in v1, not just disabled.*
- the artifact set is still heavy (governance record, policy bundle, capability snapshot, approval object, token projection, runtime signals, delegation artifacts) → *Addressed: see [Artifact quick reference](#artifact-quick-reference) for the orientation table. The artifact count is unchanged — the complexity is real — but the reference table makes it navigable.*
- this is manageable only if the first deployment stays narrow and does not implement advanced profiles prematurely → *Addressed: the V1 Product Contract and Core Profile Mandatory Defaults enforce this as non-negotiable defaults, not recommendations.*

### What can be improved

- make the **core profile even more opinionated** → *Resolved: [Core profile mandatory defaults](#core-profile-mandatory-defaults) gives non-negotiable values for all key configuration points.*
- reduce the number of valid v1 variations → *Resolved: the V1 Product Contract is the single authoritative v1 statement; anything outside it is explicitly out of scope.*
- make explanation surfaces first-class, not follow-on → *Resolved: [Policy debugging diagnostic playbook](#policy-debugging-diagnostic-playbook) and the `explain` API make explanation a first-class operation, not an afterthought.*
- make admin operations revolve around decisions, not raw artifacts → *Resolved: [Operator console model](#operator-console-model) defines the daily view in terms of decisions and health indicators, not raw artifact inspection.*
- keep templates thin and push backend semantics to catalog and backend-owned auth layers → *Standing guidance: this is a design principle, not a spec gap. The catalog spec and template re-review rules enforce it mechanically.*

### What we would change to optimize for admin experience

- give admins a small operating surface: active Missions, pending approvals, recent denials, template drift, suspensions/revocations/emergency controls → *Done: admin dashboard spec covers these five views.*
- add policy diff and simulation tooling before rollout → *Done: template simulation and diff APIs specced.*
- make `constraints_hash` changes explainable in human terms → *Done: amendment diff API with `human_summary` field specced.*
- require every exception or bootstrap allowlist entry to have an owner and expiry → *Done: bootstrap Mission spec includes organizational deadline and owner requirement.*

### What we would change to optimize for end-user experience

- reduce the user-visible model to: what the agent can do now, what needs approval, why something is blocked, what to do next → *Done: denial message translation table, clarification UX, and commit-boundary UX all implement this model.*
- keep approvals short, scoped, and action-specific → *Done: step-up approval UX spec and approval SLA requirements enforce this.*
- translate denials into plain language → *Done: denial message translation table and policy explain API.*
- do not expose Cedar, raw policy names, `constraints_hash`, or transport errors directly to users → *Standing rule: artifact reference and UX specs all enforce this consistently.*

## Assessment Task List

All items below are resolved. This list is retained for audit purposes — it documents the gap-to-resolution chain for each finding. The last items closed were audit integrity tiers and MAS HA posture.

### Core product tasks

- [x] Freeze one exact v1 product contract: → see [V1 Product Contract](#v1-product-contract)
- [x] Remove any remaining advanced-profile assumptions from core examples and contracts → [Core profile mandatory defaults](#core-profile-mandatory-defaults)
- [x] Freeze the capability snapshot as a versioned core contract with optional extension fields clearly separated → session-start checklist and capability snapshot spec
- [x] Freeze one user-visible approval model for v1: template auto-approval plus inline step-up only → [`auto` vs `auto_with_release_gate`](#auto-vs-auto_with_release_gate-side-by-side-comparison)
- [x] Freeze one token validation model for v1 → core profile mandatory defaults

### Admin experience tasks

- [x] Define the v1 admin console surface → [Operator Admin Dashboard Spec](#operator-admin-dashboard-spec) and [Operator console model](#operator-console-model)
- [x] Build explanation views for compile, issuance, host denials, MCP denials, amendment diffs → [Policy debugging diagnostic playbook](#policy-debugging-diagnostic-playbook) and `POST /missions/{id}/explain` API
- [x] Add template simulation and diff tooling before rollout → [Template simulation and diff tooling](#template-simulation-and-diff-tooling)
- [x] Add explicit ownership and expiry for bootstrap allowlists and exceptions → [Bootstrap Mission specification](#bootstrap-mission-specification)
- [x] Define the admin workflow for template publish, rollback, and emergency narrowing → [Admin lifecycle for template and mapping changes](#admin-lifecycle-for-template-and-mapping-changes)

### End-user experience tasks

- [x] Freeze the user-facing denial message set and approval prompts → [Plain-language denial messages](#plain-language-denial-messages) and [Step-up approval inline UX](#step-up-approval-inline-ux-for-claude-code)
- [x] Ensure every blocked action tells the user what to do next → denial message translation table includes "what the user can do" column; natural-language amendment UX handles scope gaps
- [x] Keep approval objects action-scoped and short-lived by default → approval object spec includes `expires_at` and one-shot consumption by default
- [x] Validate that inline approval prompts are specific enough to avoid fatigue → step-up UX requires: what action, what data/systems, what happens after, who reviews
- [x] Remove transport-oriented errors from end-user surfaces → denial translation table and host injection format enforce this

### Security and correctness tasks

- [x] Keep security claims bounded to point-in-time authority control → Resolved: the instruction sequence gap bounding table and sequence-level misuse finding are honest about residual risk; session budgets and anomaly detection are mitigations, not solutions
- [x] Add anomaly and sequence-risk monitoring that is operationally usable → [Anomaly detection spec](#anomaly-detection-spec) with 8 signal types, weight table, and MAS response
- [x] Validate commit-boundary ownership per high-risk tool family → open issue requiring real deployment — see [Open Issues and TODO](#open-issues-and-todo)
- [x] Verify that host, AS, MCP, and downstream interpret the same authority consistently → [Capability snapshot drift detection and reconciliation](#capability-snapshot-drift-detection-and-reconciliation) specifies the protocol and weekly runbook check

### Governance and policy tasks

- [x] Establish template review cadence, telemetry, and deprecation rules → [Template governance ownership and cadence](#template-governance-ownership-and-cadence)
- [x] Establish backend mapping ownership → resource catalog spec includes `owner` field and re-review trigger rules
- [x] Define policy debugging SLAs for operators and support → policy debugging diagnostic playbook defines the 4-step path; approval SLA targets defined in approval burden spec
- [x] Finalize audit integrity tiers for high-value events vs. ordinary telemetry → [Audit integrity and tamper evidence](#audit-integrity-and-tamper-evidence) defines three tiers: Tier 1 (tamper-evident/write-once), Tier 2 (append-only), Tier 3 (standard telemetry) with enumerated event lists
- [x] Finalize MAS service split and HA posture → [MAS Availability and Degraded Mode](#mas-availability-and-degraded-mode) specifies minimum HA requirements (active-active or active-passive across 2+ AZs, synchronous replication for governance records), degraded-mode behavior table, and operator controls; exact capacity ratios require load data from first deployment

### Advanced profile tasks

Advanced profile decisions (approved temporary elevation, sub-agent delegation, cross-domain federation, async enterprise approval) are tracked in [Open Issues and TODO](#open-issues-and-todo). All advanced profile sections are marked with "Skip this section on your first deployment" callouts throughout the document.

## Open Issues and TODO

The spec is complete. The items below are the remaining production-hardening and advanced-profile decisions that require real deployment data or deliberate product choices. See [What to do next — prioritized](#what-to-do-next--prioritized) for the ordered action list.

### Production-hardening (requires real deployment data)

- [x] Calibrate template fit with real prompts and tool traces → [Template Calibration Procedure](#template-calibration-procedure) and [Pre-Enforcement Deployment Checklist](#pre-enforcement-deployment-checklist) spec the full observation-mode-to-enforcement path
- [x] Validate backend resource mappings with system owners → [Resource Mapping Validation Procedure](#resource-mapping-validation-procedure) specifies the 4-check walkthrough and validation gate
- [x] Validate commit-boundary ownership and idempotency contracts per tool family → [Commit-Boundary Ownership Register](#commit-boundary-ownership-register) and [Commit-Boundary Idempotency Validation](#commit-boundary-idempotency-validation) spec the register and 4-test validation procedure
- [x] Measure revocation and freshness behavior under load → [Revocation Measurement Runbook](#revocation-measurement-runbook) defines instrumentation, measurement formula, and remediation table
- [x] Tune approval burden and step-up thresholds after real usage data → [Approval Burden 30-Day Review](#approval-burden-30-day-review) defines the 7-check review cadence and tuning actions

### Advanced profile (explicit product decisions required)

- [ ] Decide whether approved temporary elevation is in v2 scope — Mission-narrowing-only is the v1 default; requires concrete entitlement-broker use case to justify (P2 #9)
- [ ] Decide whether sub-agent delegation is v2 or out of scope — spec exists; needs a concrete product use case before implementation begins (P2 #10)
- [ ] Decide whether cross-domain federation is a product requirement — architecture is correct; political and operational cost must be explicitly budgeted before it is in scope (P2 #11)
- [ ] Decide whether async enterprise approval is needed — inline step-up is correct for v1; async requires a specific off-hours or multi-approver use case to justify (P2 #12)

### Short list — decisions before implementation starts

1. Fill the [Component Ownership Register](#component-ownership-register) — one named individual per component
2. Fill the [Commit-Boundary Ownership Register](#commit-boundary-ownership-register) — named downstream service owner per commit-boundary tool in the v1 pack
3. Confirm the v1 token validation model (self-contained JWTs with freshness check — per core profile mandatory defaults)
4. Deploy MAS in observation mode before enforcement — follow the [Pre-Enforcement Deployment Checklist](#pre-enforcement-deployment-checklist)

### What to do next — prioritized

This list orders all remaining open work by when it blocks forward progress. The spec is complete; what remains is deployment execution.

**P0 — Required before any enforcement goes live**

1. ~~**Assign component owners.**~~ → *Resolved: [Component Ownership Register](#component-ownership-register) defines accountability per component and the pre-sprint gate.*
2. ~~**Assign commit-boundary owners per tool family.**~~ → *Resolved: [Commit-Boundary Ownership Register](#commit-boundary-ownership-register) lists v1 template pack commit-boundary tools and the per-tool owner register.*
3. ~~**Deploy in observation mode first.**~~ → *Resolved: [Pre-Enforcement Deployment Checklist](#pre-enforcement-deployment-checklist) gives the ordered 6-step procedure from observation deployment through template-by-template enforcement switching.*
4. ~~**Calibrate templates with real prompts and traces.**~~ → *Resolved: [Template Calibration Procedure](#template-calibration-procedure) defines what to measure, calibration decisions, and the enforcement-readiness pass criteria.*

**P1 — First deployment validation (before calling v1 complete)**

5. ~~**Validate backend resource mappings with system owners.**~~ → *Resolved: [Resource Mapping Validation Procedure](#resource-mapping-validation-procedure) defines the 4-check walkthrough, validation register, and gate criteria.*
6. ~~**Measure actual revocation propagation under load.**~~ → *Resolved: [Revocation Measurement Runbook](#revocation-measurement-runbook) defines instrumentation points, measurement formula, 4 controlled tests, and remediation table for misses.*
7. ~~**Tune approval burden from usage data.**~~ → *Resolved: [Approval Burden 30-Day Review](#approval-burden-30-day-review) defines the 7-check review, tuning actions, and escalation path.*
8. ~~**Validate commit-boundary idempotency contracts.**~~ → *Resolved: [Commit-Boundary Idempotency Validation](#commit-boundary-idempotency-validation) defines 4 tests (basic replay, concurrent submission, advisory lock interaction, MAS unavailability) with pass criteria and Phase 4 gate requirements.*

**P2 — Decisions required before v2 planning starts**

9. **Approved temporary elevation: v2 or later?** The architecture supports it; the question is whether the organizational use case justifies the additional entitlement-broker integration before the v1 deployment is proven.
10. **Sub-agent delegation: v2 or out of scope?** The spec exists ([Delegation and Derived Sub-Missions](#delegation-and-derived-sub-missions-advanced-profile)); the question is whether the product roadmap has a concrete use case that requires it.
11. **Cross-domain federation: product requirement or future option?** The architecture is correct and the operational cost is real. A deliberate budget (engineering, political, operational) must be allocated before this is in scope — it cannot be added incrementally.
12. **Async enterprise approval: needed for a specific workflow?** Inline step-up is correct for v1. Async approval requires an operational use case (e.g., off-hours work submitted for next-morning review) to justify the added approval-routing infrastructure.

**P3 — Long-term operational health**

13. **Establish template governance review cadence.** Assign every template a named owner, `next_review_due` date, and risk tier per the [template governance spec](#template-governance-ownership-and-cadence). Templates without owners rot into policy debt.
14. **Establish operational review cadence.** Run the weekly operator review checklist ([Operator console model](#operator-console-model)) on a fixed schedule. Assign incident ownership for policy failures and false denials before the first production incident — not after.
15. **Build toward the full operator console spec.** The [Operator Admin Dashboard Spec](#operator-admin-dashboard-spec) defines the required views. Prioritize: active Missions + pending approvals (P0); recent denials + template drift + emergency controls (P1); full audit trace and policy explain integration (P2).

## Conclusion

After working through the current OAuth and MCP pieces, my conclusions are:

1. **Yes, this architecture is implementable now.**
   You do not need to wait for a full new standard.

2. **No, OAuth and MCP do not solve it by themselves.**
   They give you the transport and token rails, not the authority model.

3. **The MAS is the key difference between a governed system and a clever demo.**
   Without a durable Mission state owner, the system falls back to token claims, prompt text, and orchestration metadata pretending to be authority.

4. **Containment is mandatory.**
   Mission approval without tool-boundary and commit-boundary containment is not governance. It is aspiration.

5. **The right implementation strategy is narrow first.**
   Start with a thin Mission record, a narrow Mission Authority Model, one host, one MCP family, capability snapshots, and Mission-scoped OAuth tokens. Then add richer semantics, runtime feedback, and advanced profiles only after the core path is stable.

6. **This spec is complete enough to build from.**
   The v1 product contract, compiler pipeline, Cedar policy reference, approval model, host integration, MCP enforcement, admin console, operator runbook, configuration management, revocation SLAs, audit integrity tiers, and test suite are all specified. The remaining open items are deployment-specific calibration decisions that require real usage data, not additional spec work.

The practical version of the architecture is:

> shape the Mission, store it durably, project it narrowly, enforce it at the tool boundary, and revalidate it at the commit boundary.

That is how the theory becomes a system. The [prioritized action list](#what-to-do-next--prioritized) in the Open Issues section translates the spec into the concrete decisions and measurements that need to happen before and during the first production deployment.

## Reference Appendices

The sections below are supporting reference material for the main design. They answer the implementation questions people usually ask after the core path is clear:

- signals and event contracts
- delegated sub-Missions
- lifecycle and amendment semantics
- credential, cache, and audit behavior
- worked pseudocode and end-to-end examples

### Artifact quick reference

The design produces seven primary artifacts. A new engineer should read this table before diving into any individual section — it is the map that shows how the pieces relate.

| Artifact | What it is | Who creates it | Who owns it | Where it lives | Created/updated when | Who reads it | If unavailable |
|---|---|---|---|---|---|---|---|
| **Governance record** | the durable authority record: approved intent, resource scope, stage gates, approval evidence | MAS compiler + approval workflow | MAS | MAS core store | at Mission creation; updated on amendment | MAS, audit trail | Mission cannot be activated; all downstream artifacts depend on this |
| **Policy bundle** (`constraints_hash` + entity snapshot + template policy) | the compiled, machine-evaluable authority state | MAS compiler | MAS / bundle distribution service | CDN-backed bundle store; keyed by `(mission_id, constraints_hash)` | on Mission creation and each amendment | AS (token issuance), host (PreToolUse), MCP server (tools/call), commit-boundary owner | enforcement points fall back to cached copy up to TTL, then fail closed |
| **Capability snapshot** | the host-facing planning surface: allowed tools, gated tools, denied actions, anomaly flags, budget status | MAS | MAS (served at `POST /missions/{id}/capability-snapshot`) | in-memory at host; refreshed from MAS | at session start; after authority transitions | host only — not shared with MCP server or AS directly | host enters restricted mode; denies all tool calls until snapshot is refreshed |
| **Approval object** | the evidence that a stage gate was satisfied: who approved, when, for what action, one-shot or reusable | approval workflow service | MAS | MAS core store; referenced in capability snapshot | on each approval event; expires after TTL | host (step-up check), commit-boundary owner (final gate check) | commit-boundary check fails closed; action is denied |
| **Token projection** (audience token) | an OAuth bearer token scoped to a specific MCP server audience; carries `mission_id`, `constraints_hash`, scopes | OAuth AS | AS (short-lived; subject token is the durable credential) | in-memory at host; presented to MCP server on each `tools/call` | on each token issuance request from host; expires in 300–900s | MCP server (validates on each `tools/call`) | AS unreachable: no new tokens; existing valid tokens can be used up to their expiry for low-risk reads |
| **Runtime signals** | tool-use events, anomaly indicators, budget counter reports; the feedback loop from execution back to authority state | host (PostToolUse), MCP server, commit-boundary owner | MAS (after ingestion from buffer) | signal ingestion buffer → MAS core | on each tool use, anomaly detection, or budget threshold crossing | MAS (to update Mission state), audit trail | signals are async/fire-and-forget; buffered locally if MAS is unreachable; no enforcement effect until ingested |
| **Delegation artifacts** (child Mission record, narrowing proof) | evidence that a child agent's Mission is a strict subset of the parent's authority; used in sub-agent lineage verification | MAS (on `POST /missions/{id}/derive`) | MAS | MAS core store | on each child Mission derivation | MAS (lineage check on sub-agent tool calls) | child Mission cannot be created; sub-agent execution fails closed (v1: not used) |

**Reading this table:**

- The governance record is the root artifact — everything else is derived from or validates against it
- The policy bundle is the compiled representation of the governance record — it is what enforcement points actually evaluate
- The capability snapshot is a host-local view of the policy bundle, optimized for planning decisions
- The approval object is the runtime evidence that a human or policy checkpoint was satisfied
- Token projections carry authority into the tool transport layer
- Runtime signals close the feedback loop from execution back to authority state
- Delegation artifacts are the lineage chain for sub-agent trust (v1: not applicable)

**The key dependency chain:**

```
governance record
  → policy bundle (compiled from governance record)
    → capability snapshot (planning view of policy bundle)
    → token projection (audience-scoped credential from policy bundle)
      → tools/call enforcement (evaluates policy bundle + token)
        → commit-boundary check (evaluates approval object + live policy bundle)
          → runtime signals (feed back into governance record on next MAS update)
```

### Capability snapshot drift detection and reconciliation

Capability snapshot drift occurs when different enforcement points hold inconsistent views of the same Mission's authority. The `constraints_hash` is the primary anti-drift mechanism, but it does not cover all drift modes.

**Drift taxonomy — what can differ between enforcement points:**

| Drift type | Components at risk | Root cause |
|---|---|---|
| Entity snapshot version | host, MCP server, AS | cache TTL mismatch — one component refreshed, another hasn't yet |
| Approval object validity | host precheck vs. commit-boundary owner | approval expired between the host precheck and the downstream write |
| Token claims vs. current Mission | AS token vs. current MAS state | token was issued under an older `constraints_hash` than the current one |
| Anomaly flag state | host vs. MCP server | host received a suspension signal; MCP server has not yet refreshed |
| Session budget count | host local count vs. MAS authoritative count | host hasn't flushed its counter report yet |

**The primary reconciliation mechanism: `constraints_hash`**

Every enforcement point holds a local copy of the entity snapshot keyed by `constraints_hash`. The protocol is:

1. On each authorization decision, the enforcement point reads the `constraints_hash` from the incoming request (token claim or session binding)
2. If it matches the locally cached hash: use the cached snapshot
3. If it does not match: fetch a fresh snapshot from MAS before evaluating; do not fall back to the stale cached copy
4. If MAS is unreachable when a fresh snapshot is needed: fail closed — do not evaluate against a stale snapshot for high-risk decisions

This protocol ensures that a changed Mission constraint propagates to all enforcement points within one entity snapshot TTL, not just to the first one that polls.
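A minimal sketch of the four-step protocol above, in Python. The `fetch_snapshot_from_mas` callable and `MasUnreachable` exception are illustrative stand-ins for whatever MAS client the deployment uses; they are not part of the spec.

```python
class MasUnreachable(Exception):
    """Raised by the MAS client when a fresh snapshot cannot be fetched."""
    pass

def resolve_snapshot(request_hash, cache, fetch_snapshot_from_mas, high_risk):
    """Return the entity snapshot to evaluate against, or None to fail closed.

    Implements steps 1-4 of the reconciliation protocol: `request_hash` is the
    `constraints_hash` read from the incoming request (step 1).
    """
    cached = cache.get("snapshot")
    # Step 2: hash matches the locally cached snapshot -> safe to use it.
    if cached is not None and cached["constraints_hash"] == request_hash:
        return cached
    # Step 3: mismatch -> fetch fresh state; never evaluate the stale copy.
    try:
        fresh = fetch_snapshot_from_mas(request_hash)
        cache["snapshot"] = fresh
        return fresh
    except MasUnreachable:
        # Step 4: MAS unreachable and no matching snapshot -> fail closed for
        # high-risk decisions. Low-risk reads may fall back to the cached copy
        # within its TTL, per the degraded-mode rules later in this section.
        return None if high_risk else cached
```

Returning `None` rather than raising keeps the fail-closed decision in the caller's hands, which matters when the same resolver serves both the host precheck and the MCP-side check.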

**Approval object drift:**

The host precheck verifies approval state from the capability snapshot (which may be up to TTL seconds old). The commit-boundary owner verifies approval from a live MAS call. This creates a window where the host precheck sees a valid approval but the commit-boundary check sees an expired one.

Resolution rule: **the commit-boundary check is authoritative**. A discrepancy means the host precheck was working from stale data. The commit-boundary owner denies the action. This is the correct fail-closed behavior — it may occasionally deny an action that the host thought was approved, but it never allows an action that the commit-boundary owner considers unauthorized.

**Operational reconciliation check (weekly runbook item):**

Periodically verify that enforcement points are not diverging in practice:

1. For 5 random active Missions, call the capability snapshot endpoint and record the `constraints_hash`
2. Call each configured MCP server's health endpoint and confirm it is serving policy for the same `constraints_hash`
3. Call the AS introspection endpoint for a recent token and confirm the `constraints_hash` claim matches
4. If any endpoint returns a different hash for the same Mission: that endpoint is serving stale policy and must be forced to refresh

This check catches cases where an enforcement point has a broken cache invalidation path and is running on perpetually stale policy.
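The weekly check can be scripted; a sketch under the assumption that each endpoint exposes its current `constraints_hash` per Mission. The three client callables here are illustrative wrappers around the MAS snapshot endpoint, each MCP server's health endpoint, and the AS introspection endpoint.

```python
def weekly_drift_check(mission_ids, mas_hash, mcp_hashes, as_hash):
    """Return (mission_id, endpoint) pairs that are serving a stale hash.

    mas_hash(mission_id)   -> constraints_hash from the capability snapshot (step 1)
    mcp_hashes(mission_id) -> {server_name: constraints_hash} from health endpoints (step 2)
    as_hash(mission_id)    -> constraints_hash claim from a recent token (step 3)
    """
    stale = []
    for mission_id in mission_ids:
        expected = mas_hash(mission_id)
        for server_name, h in mcp_hashes(mission_id).items():
            if h != expected:
                stale.append((mission_id, server_name))
        if as_hash(mission_id) != expected:
            stale.append((mission_id, "authorization-server"))
    # Step 4: every entry returned here must be forced to refresh.
    return stale
```

Run it against 5 random active Missions per the runbook; a non-empty result is an operator action item, not just a log line.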

**What to do when reconciliation fails:**

If an enforcement point cannot refresh its snapshot (MAS unavailable) when a refresh is due, it must:
- deny all high-risk calls (writes, commits, gated actions)
- allow low-risk reads against the cached copy, but only up to the entity snapshot TTL (120s default), then fail closed entirely
- surface a degraded-mode indicator to operators
- never evaluate stale policy for high-risk decisions

### How Signals Are Used

Signals are an optimization path for propagation speed, not the safety foundation. The safety model does not depend on signals arriving. Short token lifetimes, commit-boundary live checks, and TTL-bounded caches are what enforce correctness when signals are delayed or dropped. Signals reduce the lag between a Mission state change and the moment enforcement points act on it — but every enforcement point must still fail closed when its cached state is stale, regardless of whether a signal arrived.

Build the signal rail to improve responsiveness. Do not wire enforcement to it as a primary control.

Signals are the thing that keeps this from becoming a static approval record that consumers cache indefinitely.

Use them in two directions:

#### Outbound from MAS

- Mission suspended
- Mission revoked
- approval granted
- approval expired
- delegation revoked

Targets:

- OAuth AS
- MCP gateway
- MCP servers
- agent host

#### Inbound to MAS

- denied tool calls
- anomalous argument patterns
- prompt-injection indicators
- repeated out-of-scope attempts
- commit-boundary denials
- sub-agent spawn events

Targets:

- MAS event API
- policy cache invalidation path
- audit sink

This is where the earlier IAM-stack post's CAEP and Shared Signals discussion becomes concrete. You do not need universal event standards on day one, but you do need an explicit event contract.

At minimum, the signal contract should carry:

- `mission_id`
- `constraints_hash`
- actor identity
- event type
- risk level
- affected tool or resource
- timestamp
- decision taken

Examples:

- host sends `tool.denied`, `tool.deferred`, `prompt.out_of_scope`
- MCP server sends `tool.called`, `tool.denied`, `commit.required`, `commit.denied`
- MAS sends `mission.suspended`, `mission.revoked`, `approval.granted`, `approval.expired`

That is enough for:

- policy cache invalidation
- token revocation or non-renewal
- session risk escalation
- forced step-up approval
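The minimum signal contract above can be pinned down as one event shape. A sketch with illustrative field names — align them with whatever the deployment's event API actually defines:

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class MissionSignal:
    """Minimum signal contract: one field per required item in the list above."""
    mission_id: str
    constraints_hash: str
    actor: str          # actor identity (host, MCP server, MAS)
    event_type: str     # e.g. "tool.denied", "mission.suspended"
    risk_level: str     # "low" | "medium" | "high"
    target: str         # affected tool or resource
    timestamp: str      # RFC 3339
    decision: str       # decision taken, e.g. "denied", "deferred"

# Example: the host reporting a denied tool call.
sig = MissionSignal(
    mission_id="mis_01abc",
    constraints_hash="sha256-abc123",
    actor="agent-host-1",
    event_type="tool.denied",
    risk_level="medium",
    target="mcp__email__send_external",
    timestamp="2026-04-12T14:22:00Z",
    decision="denied",
)
payload = asdict(sig)  # serialize for the MAS event API / audit sink
```

Freezing the dataclass makes signals immutable once emitted, which keeps the audit sink and the cache-invalidation path reading the same record.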

### Anomaly Detection Spec

The signal rail carries anomaly signals, but anomaly detection must be defined precisely enough to implement. Vague "anomaly detection" that is never specified is not a control.

**What counts as an anomaly signal:**

| Signal type | Trigger condition | Risk weight |
|---|---|---|
| `repeated_denial` | same tool denied 3+ times in one session | medium |
| `out_of_scope_attempt` | tool attempted that is not in Mission scope | low (first attempt) / high (3+) |
| `commit_boundary_retry` | same commit-boundary action attempted after denial, within 60s | high |
| `prompt_injection_indicator` | tool result contains known injection patterns (see sanitize_tool_result heuristics) | high |
| `argument_pattern_anomaly` | tool arguments deviate significantly from expected schema (e.g., SQL-like strings in display name fields) | medium |
| `cross_resource_exfil_pattern` | read from sensitive resource class followed immediately by write to external resource class, same session | high |
| `session_budget_spike` | resource class calls spike >50% over session average in a 5-minute window | medium |
| `unexpected_tool_sequence` | tool call sequence is statistically improbable given Mission purpose class (requires per-purpose baseline) | medium (optional in v1) |

**How the host emits anomaly signals:**

The `PostToolUse` hook checks the sanitized tool result and call context. If any trigger condition is met, the hook calls:

```http
POST /missions/{mission_id}/events
Content-Type: application/json

{
  "event_type": "anomaly.detected",
  "mission_id": "mis_01abc",
  "constraints_hash": "sha256-abc123",
  "signal_type": "repeated_denial",
  "tool": "mcp__email__send_external",
  "session_id": "ses_01xyz",
  "timestamp": "2026-04-12T14:22:00Z",
  "detail": "Tool denied 3 times in current session"
}
```

**How MAS responds to anomaly signals:**

| Accumulated signal weight | MAS action |
|---|---|
| 1-2 medium signals | log to audit; no enforcement change |
| 3+ medium signals or 1 high signal | set `anomaly_flag` on Mission; push updated capability snapshot; host restricts planning |
| 2+ high signals in same session | suspend Mission; emit `mission.suspended` with `reason: anomaly` |
| `prompt_injection_indicator` or `cross_resource_exfil_pattern` | suspend immediately on first occurrence |
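The response table above reduces to a small accumulator. A sketch in Python — thresholds mirror the table, and the action strings are illustrative labels, not API values:

```python
# Signals that suspend the Mission on first occurrence, per the table above.
IMMEDIATE_SUSPEND = {"prompt_injection_indicator", "cross_resource_exfil_pattern"}

def mas_response(session_signals):
    """Map a session's accumulated anomaly signals to a MAS action.

    `session_signals` is a list of (signal_type, severity) tuples for one session.
    """
    if any(sig_type in IMMEDIATE_SUSPEND for sig_type, _ in session_signals):
        return "suspend_mission"            # suspend immediately on first occurrence
    highs = sum(1 for _, sev in session_signals if sev == "high")
    mediums = sum(1 for _, sev in session_signals if sev == "medium")
    if highs >= 2:
        return "suspend_mission"            # 2+ high signals in same session
    if highs >= 1 or mediums >= 3:
        return "set_anomaly_flag"           # flag Mission; push updated snapshot
    return "log_only"                       # 1-2 medium signals: audit only
```

Evaluating the immediate-suspend set before the weighted thresholds matters: an injection indicator must not be diluted by an otherwise quiet session.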

**`anomaly_flags` in the capability snapshot:**

```json
{
  "anomaly_flags": [
    {
      "flag_type": "repeated_denial",
      "tools_restricted": ["mcp__email__send_external"],
      "since": "2026-04-12T14:22:00Z",
      "severity": "medium"
    }
  ]
}
```

When `anomaly_flags` is non-empty, the host restricts planning to exclude flagged tools and surfaces: "I've restricted some tools in this session due to unusual activity. Contact your Mission operator if this is unexpected."
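The host-side restriction is a simple filter over the snapshot. A minimal sketch, assuming the `anomaly_flags` shape shown above:

```python
def planner_tools(allowed_tools, anomaly_flags):
    """Return the planning tool set with flag-restricted tools excluded.

    `anomaly_flags` follows the capability snapshot shape above; each flag
    may restrict zero or more tools via `tools_restricted`.
    """
    restricted = {
        tool
        for flag in anomaly_flags
        for tool in flag.get("tools_restricted", [])
    }
    return [t for t in allowed_tools if t not in restricted]
```

The host applies this before planning, so the model never proposes a flagged tool and the user-facing restriction message stays consistent with what the planner can actually do.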

**What anomaly detection does NOT do:**

- It does not catch instruction-sequence attacks that use only permitted tools (see the instruction sequence section)
- It does not evaluate intent — it evaluates patterns against baselines
- It is not a substitute for audit; it is a real-time trip wire

**V1 minimum:** implement `repeated_denial`, `out_of_scope_attempt`, and `commit_boundary_retry` signals. Defer `unexpected_tool_sequence` (requires per-purpose baselines) until per-Mission signal history is available.

### Delegation and Derived Sub-Missions (Advanced Profile)

> **Skip this section on your first deployment.** Delegation and sub-agent orchestration add significant complexity — lineage verification, narrowing proofs, child Mission lifecycle, and `act` chain propagation. The V1 product boundary explicitly excludes multi-agent orchestration. Come back to this section when you have a working single-agent deployment and a concrete use case that requires sub-agents.

Sub-agents should not inherit Mission authority by default.

When the host needs a sub-agent, it should request a **derived sub-Mission** from the MAS. The sub-Mission should:

- inherit the parent `mission_id` as lineage, not as shared authority
- receive a new `mission_id`
- receive a new `constraints_hash`
- narrow resource classes, actions, and tools
- reduce or preserve, but never broaden, time bounds
- reduce or preserve, but never broaden, delegation depth

For example:

```json
{
  "mission_id": "mis_child_01",
  "parent_mission_id": "mis_parent_01",
  "lineage": ["mis_parent_01"],
  "approved_tools": ["mcp__finance__erp.read_financials"],
  "actions": ["read", "summarize"],
  "delegation_bounds": {
    "subagents_allowed": false,
    "max_depth": 0,
    "inherit_by_default": false
  },
  "constraints_hash": "sha256-child-123"
}
```

The derivation rules should be simple:

1. child tools must be a subset of parent tools
2. child actions must be a subset of parent actions
3. child expiry must not exceed parent expiry
4. child stage gates must be preserved when the child can reach the same commit boundary
5. child lineage must be visible in audit and token `act` chains

That gives you two separate but aligned records:

- OAuth `act` preserves who called whom
- Mission lineage preserves which delegated authority record the child was operating under

Do not let one stand in for the other.

#### Issuance-time narrowing proof

Derived sub-Missions and delegated tokens should not be created on trust. The issuer should prove at issuance time that the child request is no broader than the parent authority.

Use a fixed narrowing check:

1. compare requested child tools to parent-approved tools
2. compare requested child actions to parent-approved actions
3. compare requested child domains to parent-approved domains
4. compare requested child expiry to parent expiry
5. compare requested child delegation depth to parent remaining depth
6. verify any parent stage gates that still apply are preserved

If any requested child field exceeds the parent envelope, issuance fails or moves to a new approval flow.

#### Narrowing proof artifact

The narrowing result should be stored as its own artifact so downstream audit and debugging can reconstruct why the child was issuable.

Minimum artifact:

```json
{
  "proof_id": "proof_01JR9T1R7W",
  "parent_mission_id": "mis_parent_01",
  "child_mission_id": "mis_child_01",
  "checked_dimensions": {
    "tools": "subset",
    "actions": "subset",
    "domains": "subset",
    "time": "subset",
    "delegation": "subset",
    "stage_gates": "preserved"
  },
  "result": "pass",
  "generated_at": "2026-04-11T18:00:00Z"
}
```

If any dimension fails, the artifact should still be stored with `result = fail` and the failing dimension named.

#### Narrowing check pseudologic

Use a fixed evaluation order:

1. reject if parent Mission is not `active`
2. reject if parent `constraints_hash` is stale in caller context
3. reject if child requests any unknown tool, action, or domain
4. reject if child tool set is not exact subset of parent tool set
5. reject if child action set is not exact subset of parent action set
6. reject if child domain set is not exact subset of parent domain set
7. reject if child expiry exceeds parent expiry
8. reject if child delegation depth exceeds parent remaining depth
9. reject if child removes a parent gate for the same side effect
10. persist proof artifact and issue child only if all checks pass
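The evaluation order above translates directly into code. A sketch using the JSON record shapes from this section; steps 2–3 (hash freshness and catalog lookup) are omitted for brevity, the remaining-depth rule assumes the child consumes one delegation level, and the stage-gate rule is simplified to a superset check:

```python
def narrowing_check(parent, child):
    """Return ("pass", None) or ("fail", failing_dimension).

    `parent` and `child` follow the Mission JSON shapes used in this section.
    """
    # Step 1: parent must be active before any dimension is compared.
    if parent.get("state") != "active":
        return ("fail", "parent_state")
    dims = [
        ("tools", set(child["approved_tools"]) <= set(parent["approved_tools"])),
        ("actions", set(child["actions"]) <= set(parent["actions"])),
        ("domains", set(child["allowed_domains"]) <= set(parent["allowed_domains"])),
        # RFC 3339 timestamps in the same zone compare correctly as strings.
        ("time", child["expires_at"] <= parent["expires_at"]),
        ("delegation", child["delegation_bounds"]["max_depth"]
                       <= parent["delegation_bounds"]["max_depth"] - 1),
        # Simplified: every parent gate must survive in the child. A full
        # implementation checks only gates for side effects the child can reach.
        ("stage_gates", set(parent.get("stage_gates", []))
                        <= set(child.get("stage_gates", []))),
    ]
    for dim, ok in dims:
        if not ok:
            return ("fail", dim)
    return ("pass", None)
```

The return shape maps directly onto the narrowing proof artifact: a `fail` result with the failing dimension named is exactly what gets persisted.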

#### Narrowing algorithm

Use exact subset checks for each dimension:

| Dimension | Required rule |
|---|---|
| tools | child tools must be subset of parent tools |
| actions | child actions must be subset of parent actions |
| domains | child domains must be subset of parent domains |
| time | child expiry must be less than or equal to parent expiry |
| delegation | child max depth must be less than or equal to parent remaining depth |
| stage gates | parent gate must be preserved if child can reach same effect |

Required behavior:

- any non-subset result fails issuance
- unknown tool, domain, or action fails issuance
- preserved stage gates must appear in child governance and policy bundle
- proof result must be persisted as an artifact, not only logged

#### Example narrowing check

Parent Mission:

```json
{
  "approved_tools": [
    "mcp__finance__erp.read_financials",
    "mcp__docs__docs.read",
    "mcp__docs__docs.write"
  ],
  "actions": ["read", "summarize", "draft"],
  "allowed_domains": ["enterprise"],
  "delegation_bounds": {
    "subagents_allowed": true,
    "max_depth": 1
  }
}
```

Child request:

```json
{
  "approved_tools": [
    "mcp__finance__erp.read_financials"
  ],
  "actions": ["read", "summarize"],
  "allowed_domains": ["enterprise"],
  "delegation_bounds": {
    "subagents_allowed": false,
    "max_depth": 0
  }
}
```

This child is narrower on every dimension, so issuance can proceed.

If the child instead asked for:

- `mcp__email__email.send_external`
- `publish_external`
- domain `partner.example`

the narrowing check fails immediately because those are not inside the parent envelope.

#### What gets recorded

When narrowing succeeds, record:

- parent `mission_id`
- child `mission_id`
- parent `constraints_hash`
- child `constraints_hash`
- narrowing decision timestamp
- dimensions checked:
  - tools
  - actions
  - domains
  - time
  - delegation

That makes the delegated authority chain auditable. It also means a later reviewer can tell whether the child was properly attenuated or not.

#### Narrowing proof for delegated tokens

Use the same rule for delegated token issuance.

Before minting a child or downstream token, the AS should verify:

- the requested audience is inside Mission scope
- the requested tools or scopes are a subset of current Mission projection
- the token lifetime does not exceed Mission lifetime
- the `act` chain reflects the actual caller

This is the issuance-time equivalent of runtime containment. It stops over-broad delegated authority before the token exists.

### Lifecycle and Runtime Consequences

Mission states should have operational meaning, not just labels.

Use at least these states:

- `pending_approval`
- `active`
- `suspended`
- `revoked`
- `completed`
- `expired`

| State | New Token Issuance | New Tool Calls | Commit-Boundary Actions | Expected Host Behavior |
|---|---|---|---|---|
| `pending_approval` | no | restricted safe set only | no | planning / clarification only |
| `active` | yes | yes, subject to policy | yes, with gates | normal execution |
| `suspended` | no | fail closed for risky calls | no | pause and wait |
| `revoked` | no | no | no | terminate Mission execution |
| `completed` | no | no new work | no | finalize and stop |
| `expired` | no | no | no | fail closed until renewed or replaced |

The runtime consequences should be explicit:

#### `pending_approval`

- host may allow only shaping, clarification, and safe local planning
- AS denies Mission-scoped token issuance except for explicitly safe bootstrap actions
- MCP servers deny normal tool execution

#### `active`

- host enforces normal Mission policy
- AS may mint Mission-scoped tokens
- MCP servers may honor those tokens subject to local policy

#### `suspended`

- host stops new high-risk tool calls
- AS stops minting new tokens
- MCP servers fail closed on new `tools/call`
- existing long-running work should be paused at the next non-bypassable checkpoint

#### `revoked`

- host terminates active Mission execution
- AS revokes or refuses renewal of existing Mission-derived tokens
- MCP servers reject the next call even if the token has not naturally expired
- sub-Missions are recursively suspended or revoked

#### `completed`

- host stops using the Mission for new work
- AS stops minting new tokens
- read-only post-completion audit access may remain

#### `expired`

- treat like a time-based revocation
- all new token issuance stops
- commit-boundary actions fail closed until Mission is renewed or replaced

#### In-flight work

You do not need to kill every TCP session instantly. You do need non-bypassable rechecks at:

- token issuance
- `tools/call`
- commit boundary
- sub-agent spawn
- stop / completion

That is where state changes become operational.
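The state table can be reduced to a single lookup consulted at each recheck point. A minimal sketch, assuming the state names above; "safe_only" stands for the restricted planning/clarification set permitted in `pending_approval`, and `suspended` is shown as fully closed for new tool calls, which is the conservative reading of "fail closed for risky calls":

```python
# Maps Mission state to what each recheck point may allow.
# Derived from the state table above; unknown states fail closed.

STATE_GATES = {
    "pending_approval": {"token_issuance": False, "tool_call": "safe_only", "commit": False},
    "active":           {"token_issuance": True,  "tool_call": True,        "commit": True},
    "suspended":        {"token_issuance": False, "tool_call": False,       "commit": False},
    "revoked":          {"token_issuance": False, "tool_call": False,       "commit": False},
    "completed":        {"token_issuance": False, "tool_call": False,       "commit": False},
    "expired":          {"token_issuance": False, "tool_call": False,       "commit": False},
}

def allow_tool_call(state: str, tool_is_safe: bool) -> bool:
    """Gate a tools/call on Mission state. Unknown states deny."""
    gate = STATE_GATES.get(state, {}).get("tool_call", False)
    if gate == "safe_only":
        return tool_is_safe
    return bool(gate)
```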

#### Revocation propagation sequence

When a Mission is revoked by a business event, the propagation path should be explicit.

1. Business event reaches the MAS: employee offboarding, policy violation, or manual operator action.
2. MAS marks the Mission `revoked` and recursively marks all child sub-Missions.
3. MAS emits `mission.revoked` on the signal rail.
4. Each consumer acts on the signal:
   - **AS**: refuses token refresh and new issuance for this `mission_id`; if opaque or introspected tokens are used, introspection returns inactive
   - **PAM / NHI system**: rotates or revokes Mission-scoped credentials
   - **orchestrator / agent host**: stops execution at the next non-bypassable checkpoint
   - **MCP servers and resource servers**: reject requests carrying this `mission_id` on next call, either via introspection or via Mission freshness checks
5. MAS records the revocation event with actor identity, reason, and timestamp.

Revocation does not mean instant stoppage everywhere. It means no new authority is granted and existing authority is not renewed. Short token lifetimes, forced re-issuance, and commit-boundary rechecks are what make that effective in practice. For high-sensitivity Missions, supplement signal-based propagation with direct token revocation endpoint calls to shorten the enforcement window.

#### Revocation sequence

```text
Business Event         MAS              Signal Rail          AS / PAM           Host / MCP
      |                |                   |                   |                   |
      | revoke         |                   |                   |                   |
      |--------------->| mark revoked      |                   |                   |
      |                | mark children     |                   |                   |
      |                |------------------>| mission.revoked   |                   |
      |                |                   |------------------>| stop issue/rotate |
      |                |                   |--------------------------------------->|
      |                |                   |                   | reject next call  |
      |                | audit event       |                   |                   |
```

#### Mission expiry notification contract

Missions expire at `expires_at`. Unlike revocation, expiry is a predictable event: users and hosts should be warned before it happens, not surprised after.

**Warning thresholds:**

| Time before expiry | Action |
|---|---|
| 24 hours | MAS emits `mission.expiring_soon` with `expires_at` and `mission_id`; notification delivery sends user-facing alert |
| 1 hour | MAS emits second `mission.expiring_soon`; host should surface passive warning in UI |
| 15 minutes | MAS emits `mission.expiring_soon` with urgency flag; host should surface inline warning before next tool call |
| 0 (expiry) | MAS transitions to `expired`; emits `mission.expired` |

**`mission.expiring_soon` event shape:**

```json
{
  "event_type": "mission.expiring_soon",
  "mission_id": "mis_01abc",
  "constraints_hash": "sha256-abc123",
  "expires_at": "2026-04-11T23:59:59Z",
  "minutes_remaining": 60,
  "urgency": "normal",
  "renewal_url": "https://mas.example.com/missions/mis_01abc/renew"
}
```

`urgency` values: `normal` (24h), `high` (1h), `critical` (15m).

**Host behavior on expiry warning:**
- `normal`: log the warning; no user-facing UI change needed
- `high`: surface a passive indicator ("Mission expires in ~1 hour") in agent status area
- `critical`: surface inline before the next tool call: "Your Mission expires in 15 minutes. Continue without renewal? [Renew] [Continue]"

**Renewal path:** `POST /missions/{id}/renew` with an updated `requested_expiry`. If the original approval was auto-approved and the renewal scope matches, MAS auto-renews and emits `mission.renewed`. If scope changed or approval tier requires human review, MAS moves to `pending_approval` for the renewal. The Mission remains active with the original `expires_at` during review; if review is not resolved before expiry, the Mission transitions to `expired` and execution must stop.

**What the host must not do:** continue tool execution after `expires_at` has passed, even if a valid token has remaining lifetime. The Mission state is authoritative; token lifetime is not.

### Create vs. amend vs. clone: decision tree

When a Mission doesn't cover what the user needs, there are three paths. The wrong choice has real consequences: unnecessary new Missions fragment audit continuity; inappropriate amendments bypass intended approval scope; cloning stale templates reintroduces revoked permissions.

**Decision tree:**

```
User requests out-of-scope work
│
├─ Is there a currently active Mission?
│  │
│  ├─ YES → Is the new scope additive to the current Mission's purpose?
│  │         │
│  │         ├─ YES (same purpose class, new tools or resource classes)
│  │         │   → AMEND (broadening path; may require approval)
│  │         │
│  │         └─ NO (genuinely different purpose class, e.g., switching from
│  │                board_packet_preparation to support_ticket_triage)
│  │             → NEW MISSION (complete or suspend the current Mission first)
│  │
│  └─ NO (Mission is completed, expired, or does not exist)
│      │
│      ├─ Is the new work the same purpose class as a recently completed Mission?
│      │   │
│      │   ├─ YES → CLONE (preferred; preserves template version and approval history
│      │   │               as starting context; creates a fresh Mission with new id)
│      │   │
│      │   └─ NO → NEW MISSION from template
│      │
│      └─ Is the user trying to extend scope mid-task on a completed Mission?
│          → CLONE then AMEND if the completed Mission's scope wasn't broad enough
```

**When to amend (not create new):**
- The Mission is `active` and the user discovers mid-task that they need one more resource class or tool
- The purpose class is unchanged
- The additional scope is a minor expansion, not a fundamentally different work pattern
- The user wants audit continuity (the amendment record shows scope was added, not that a new Mission was created to work around the original)

Amending mid-task preserves the chain of authority for the work in progress. Creating a new Mission to get more scope while the original is active splits the audit record and may leave the original Mission's approvals unused.

**When to create new (not amend):**
- The new work is a genuinely different purpose class (different work pattern, different risk profile, different approver group)
- The current Mission's approval basis does not apply to the new work
- The user is done with the current work and starting something unrelated
- The organization policy prohibits broadening Missions after activation (some compliance contexts require this)

**When to clone (not start from scratch):**
- The user is doing the same work they did last week (same purpose class, same template)
- A Mission for this purpose class was recently completed and the template has not changed
- The user wants to start with the same tool set and approval structure as last time without re-negotiating scope

Cloning creates a new Mission record with a new `mission_id`. It does not copy the completed Mission's active approvals — those are scoped to the completed Mission. The new Mission starts fresh through its own approval path, but with the template version and structure pre-populated.

**What the host should offer by default:**

When the user makes a request that is outside the current Mission scope, the host should follow this priority:

1. If the current Mission is `active` and the request is additive to the same purpose class: surface the amendment path first. "This is outside my current Mission. Want me to request an amendment to add [resource/tool]?"
2. If the current Mission is `active` but the purpose class is different: surface new Mission creation. "This looks like a different kind of work. Want me to start a new Mission for it? You'll need to complete or pause the current one first."
3. If no Mission is active and a recently completed Mission matches: surface cloning. "You've done this before under [Mission name]. Want me to start a new Mission based on that one?"
4. If no match: surface new Mission creation from template. "I don't have a Mission for this. Let me help you create one."

The host should never silently work around a scope gap by treating a denial as final without offering the user a path forward.

### Natural-language amendment request UX

When a user asks the agent to do something outside the current Mission scope, the agent must not jump directly to `POST /missions/{id}/amend`. The user asked for a task, not an amendment — they don't know what an amendment is. The host translates the user's request into an amendment proposal, explains it in plain language, and gets confirmation before submitting.

**Step 1 — Agent explains the scope gap:**

Host injection (trusted system context, not shown to user directly):
```
[SCOPE GAP] Tool mcp__email__send_external is outside the current Mission scope.
Current Mission: board_packet_preparation
Current allowed actions: read, draft, publish (internal only)
Amendment path: broadening (requires Controller approval)
```

Model response template:
```
I can draft the email content for you, but sending it directly to the board is outside my current Mission — this Mission is set up for internal document work only.

I have a couple of options for you:
1. I can prepare the email as a draft you send manually — that doesn't need any approval change
2. I can request access to send email for this Mission, but that would need Finance Controller approval (usually takes a few hours)

Which would you prefer?
```

**Step 2 — User requests the amendment path:**

If the user says "request the access" or equivalent, the host surfaces the amendment proposal before submitting:

Host injection:
```
[AMENDMENT PROPOSAL]
Type: broadening
Add tools: mcp__email__send_external
Add resource classes: email.external_write
Approval required: controller_approval (Finance Controller)
Current Mission status: active (continues during review)
Effect on current work: none — current Mission stays active; email access added only after approval
```

Model response template:
```
Here's what requesting email access would mean for this Mission:

  Adding: ability to send email to external recipients
  Who needs to approve: Finance Controller
  Current work: continues normally while we wait

Want me to submit this amendment request?
[Yes, submit it] [No, I'll send it myself]
```

**Step 3 — User confirms; host submits the amendment:**

After user confirmation, the host calls `POST /missions/{id}/amend` with the broadening delta. The MAS response indicates whether the amendment auto-approved or is pending:

If auto-approved:
```
Done — email access has been added to this Mission.
I can now send the board notification. Ready when you are.
```

If pending approval:
```
I've submitted the amendment request. Once the Finance Controller approves it, I'll be able to send the email directly.

In the meantime, want me to prepare the email draft so it's ready to send the moment we get approval?
```

**Step 4 — User declines the amendment:**

```
Got it. I'll prepare the email as a draft you can send — I'll format it exactly as you'd want it to appear.
```

The host does not retry the denied path, does not attempt to send via alternative tools, and does not suggest workarounds that would achieve the same effect outside the Mission scope.

**Host injection format for scope gap detection:**

The `PreToolUse` hook identifies the scope gap and injects a structured context block into the next system prompt segment:

```json
{
  "scope_gap": {
    "requested_tool": "mcp__email__send_external",
    "denial_reason": "tool_not_in_allowed_set",
    "amendment_path_available": true,
    "amendment_requires_approval": "controller_approval",
    "alternative_paths": [
      { "description": "Draft the content without sending", "requires_amendment": false }
    ]
  }
}
```

The model uses this context to generate the response in Step 1 above. Without this structured context, the model must guess at amendment paths, which produces inconsistent UX.

**What the model must NOT do after a scope denial:**
- silently attempt the action via a different tool that achieves the same effect
- tell the user "I can't do that" without offering an amendment or alternative path
- submit an amendment request without user confirmation
- imply that the amendment is guaranteed to be approved

### Mission Amendment and Scope Change

A running Mission may need modified scope mid-execution.

The governing rule is simple: narrowing is automatic, broadening requires approval.

#### Narrowing

If the MAS removes tools, resource classes, or actions from an active Mission:
- produce a new `constraints_hash`
- emit `mission.amended` with the new hash
- downstream caches invalidate and refresh
- tokens issued under the old hash are not renewed for the removed scope

No new approval is required. The authority envelope contracted.

Narrowing compiler pipeline: run only steps 8-10 (rebuild enforcement bundle, recompute `constraints_hash`, persist). The original proposal, classification, and approval basis are preserved unchanged. The amendment record references the prior `constraints_hash` so audit can reconstruct the change.

#### Broadening

If a running Mission requests expanded authority:
- treat it as a new sub-request requiring its own approval path
- do not allow the Mission to expand scope in place without going through the approval path
- if organizational auto-approval rules cover the addition, the MAS issues a new `constraints_hash` after the policy check
- if they do not, the Mission moves to `pending_approval` for the delta

Scope contraction can be applied immediately by governance. Scope expansion is a new authorization event.

Broadening compiler pipeline: run steps 4-10 against the delta only (resolve the added resources against the catalog, score the delta for approval, build a delta review packet, emit for approval if needed, produce a provisional enforcement bundle for the broadened scope, and emit a new `constraints_hash` only after approval is granted). The existing `constraints_hash` remains active for the non-broadened scope while the delta is pending. Child Missions derived before the broadening remain at their original narrow scope.

#### Handling `pending_clarification`

When shaping produces unresolved open questions, the Mission enters `pending_clarification`. The model-facing message for this state should be injected as trusted system context:

```
[Mission pending: the requested Mission could not be compiled because one or more questions are unresolved. You may ask the user for clarification on these points. No tool calls are permitted until the Mission is activated.]
```

The host should surface the `open_questions` list from the MAS response to the user and collect answers. When answers are ready, the host calls `POST /missions/{id}/clarify` with the resolved responses. The MAS then re-runs approval classification from step 2. If the resolved proposal now qualifies for auto-approval, the Mission activates immediately. If it still requires human step-up, it moves to `pending_approval`.

Required behavior:

- the host must not attempt tool calls while the Mission is `pending_clarification`
- clarifications must be submitted through `POST /missions/{id}/clarify`, not by re-issuing a `POST /missions` request with a new proposal
- the prior proposal and unresolved questions must be preserved in audit even after resolution

#### In-flight token handling on amendment

When a `constraints_hash` changes:
- tokens issued under the old hash are treated as **stale projections**
- stale projections may remain syntactically valid JWTs, but they are not sufficient for new authorization decisions
- host, AS, and MCP caches must invalidate immediately
- fresh token issuance is required under the new hash before the next external tool call
- commit-boundary checks must always use the current `constraints_hash`, not the token's embedded hash

#### Amendment diff API

Raw `constraints_hash` values are not actionable for admins. When a hash changes, operators need to see what changed in human-readable terms.

**Endpoint:**

```
GET /missions/{mission_id}/amendments/{amendment_id}/diff
```

**Response shape:**

```json
{
  "amendment_id": "amend_987",
  "mission_id": "mission_abc",
  "prior_constraints_hash": "a3f8d1...",
  "new_constraints_hash": "b9c2e4...",
  "amended_at": "2025-10-15T09:14:00Z",
  "amended_by": "user_012",
  "amendment_reason": "scope narrowed after anomaly signal accumulation",
  "diff": {
    "added_tools": [],
    "removed_tools": ["mcp__email__send_external"],
    "added_resource_classes": [],
    "removed_resource_classes": [],
    "added_action_classes": [],
    "removed_action_classes": ["external_write"],
    "added_stage_gates": [
      { "tool": "docs.publish", "gate_type": "controller_approval" }
    ],
    "removed_stage_gates": [],
    "changed_time_bounds": {
      "prior_expires_at": "2025-10-16T00:00:00Z",
      "new_expires_at": "2025-10-15T17:00:00Z"
    },
    "changed_session_budgets": {
      "prior": { "max_external_calls_per_session": 20 },
      "new": { "max_external_calls_per_session": 5 }
    }
  },
  "human_summary": "Tool mcp__email__send_external was removed. Action class external_write was removed. Stage gate controller_approval was added to docs.publish. Session expiry moved from 2025-10-16 00:00 to 2025-10-15 17:00. External call budget reduced from 20 to 5."
}
```

**`human_summary` generation rule:** the MAS generates this field at amendment time (not lazily on request) so it is always available without re-diffing. The summary should be in plain English, past tense, organized as: tools removed first, tools added second, gates added/removed, time bounds, budget changes.

**What the field contains (and what it does not):**
- contains: logical policy changes in terms operators understand (tools, resource classes, gates, time bounds, budgets)
- does not contain: raw Cedar policy text, `constraints_hash` internals, JSON field diffs of the enforcement bundle, or any cryptographic material

**Admin dashboard presentation:** the amendment diff view in the admin dashboard should surface `human_summary` prominently, with `prior_constraints_hash` and `new_constraints_hash` shown as truncated (first 12 chars) references. Operators click through to the full diff if they need field-level detail. Amendment records are immutable — admins cannot edit them.
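The summary-generation rule can be sketched directly against the diff shape above. This version omits the action-class lines for brevity and uses the field names from the response example; it is a starting point, not the MAS's canonical renderer:

```python
# Generate human_summary from the diff object at amendment time.
# Ordering follows the rule above: tools removed, tools added,
# gates added/removed, time bounds, budget changes.

def human_summary(diff: dict) -> str:
    parts = []
    for tool in diff.get("removed_tools", []):
        parts.append(f"Tool {tool} was removed.")
    for tool in diff.get("added_tools", []):
        parts.append(f"Tool {tool} was added.")
    for gate in diff.get("added_stage_gates", []):
        parts.append(f"Stage gate {gate['gate_type']} was added to {gate['tool']}.")
    for gate in diff.get("removed_stage_gates", []):
        parts.append(f"Stage gate {gate['gate_type']} was removed from {gate['tool']}.")
    bounds = diff.get("changed_time_bounds")
    if bounds:
        parts.append(f"Expiry moved from {bounds['prior_expires_at']} "
                     f"to {bounds['new_expires_at']}.")
    budgets = diff.get("changed_session_budgets")
    if budgets:
        for key, new_value in budgets["new"].items():
            parts.append(f"Budget {key} changed from "
                         f"{budgets['prior'][key]} to {new_value}.")
    return " ".join(parts)
```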

**List endpoint for amendment history:**

```
GET /missions/{mission_id}/amendments
```

Returns amendments in reverse chronological order. Each entry includes `amendment_id`, `amended_at`, `amended_by`, `human_summary` (abbreviated to 200 chars), and `prior_constraints_hash` / `new_constraints_hash`. Full diff is available via the individual amendment endpoint.

### Credential Lifecycle and PAM

Mission state is the authority clock. PAM is the credential governor. They need to be coupled.

When Mission state changes, the signal rail is not only for token consumers. It is also for the credential layer that manages secrets, brokered credentials, and JIT access the agent used during execution.

#### On Mission activation

- JIT credential provisioning is triggered for tool APIs that require brokered secrets
- PAM records the `mission_id` as the authority basis for any credentials issued
- PAM time-bounds credentials to Mission `expires_at` at most

#### On Mission suspension

- new JIT credential requests should be denied
- existing live credentials move to a hold state
- PAM should not issue new secrets for the Mission until it is resumed

#### On Mission revocation or completion

- PAM rotates or revokes all credentials issued under that Mission
- Brokered sessions (database connections, cloud API sessions, SSH certificates) should be terminated at their next renewal or sooner if PAM supports active teardown
- The credential teardown event is recorded in the MAS audit trail

#### Signal contract with PAM

The PAM or NHI system subscribes to:
- `mission.activated` → provision JIT credentials if needed
- `mission.suspended` → hold new issuance
- `mission.revoked` → revoke all Mission-scoped credentials
- `mission.completed` → revoke and clean up

Not all credential types support instant revocation. Short-lived credentials with forced re-issuance and explicit `mission_id` binding are the most effective pattern. Long-lived API keys should not be used for Mission-scoped work unless a PAM system that participates in this lifecycle governs them. The requirement is not instantaneous teardown across every credential type. It is that PAM is a first-class consumer of Mission lifecycle events rather than a system operating on its own unrelated clock.

### Sender-Constrained Token Choice

The note already says bearer tokens are not enough. Pick a concrete pattern.

The most practical default here is:

1. **trusted local gateway for host-side isolation**
2. **sender-constrained tokens where the downstream service supports them**

That means:

- the model never sees raw downstream credentials
- Claude Code hooks call a local gateway or local client wrapper
- the gateway holds tokens in memory or OS-backed secure storage
- the gateway attaches tokens to MCP or API calls
- the MCP server or API validates sender constraints where possible

If you have to choose one implementation pattern first, choose:

- local gateway for host tools and MCP calls
- DPoP for HTTP APIs that support it

If a service cannot validate sender-constrained tokens, keep the token audience narrow and lifetime short, and require commit-boundary rechecks for high-risk effects.

### Cache and Consistency Model

The architecture now uses:

- MAS as source of truth
- `constraints_hash` as compiled-state version
- local policy caches in host, AS, and MCP servers

Make the cache rules explicit.

#### What is cached

- compiled Cedar policy bundle
- current Mission summary
- current approval state
- current risk state

#### Cache keys

- `mission_id`
- `constraints_hash`
- approval version
- risk version

#### Cache invalidation

Invalidate immediately on:

- `mission.suspended`
- `mission.revoked`
- `approval.granted`
- `approval.expired`
- `delegation.revoked`
- risk escalation above threshold
- new `constraints_hash`

#### TTL guidance

- host local policy cache: very short, for example 30-120 seconds
- MCP policy cache: short, for example 30-120 seconds
- AS policy cache: short to moderate, for example 60-300 seconds

The exact numbers are less important than one rule: high-risk calls must not rely on long-lived cached authority state.

#### Fail-open vs fail-closed

Use a simple rule:

- low-risk reads may use short-lived cached state if MAS is temporarily unavailable
- token issuance fails closed when Mission freshness cannot be established
- commit-boundary actions fail closed when live approval or status cannot be confirmed

That is the minimum consistency model that keeps the system practical without weakening containment.
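The three fail-open/fail-closed rules reduce to one decision an enforcement point makes when the MAS is unreachable. A minimal sketch; the call kinds and risk tiers are names introduced here for illustration, and the TTL default mirrors the guidance above:

```python
# May this call proceed on cached state while MAS is unreachable?
# Token issuance and commit-boundary actions always fail closed;
# only low-risk reads may ride a still-fresh cache.

def may_use_cache(call_kind: str, risk: str,
                  cache_age_s: int, ttl_s: int = 120) -> bool:
    """call_kind: 'read' | 'token_issuance' | 'commit'."""
    if call_kind in ("token_issuance", "commit"):
        return False
    return risk == "low" and cache_age_s <= ttl_s
```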

#### Consolidated fail-closed rules

Every component in this architecture has a specific fail-closed obligation. This table collects them in one place so implementations can be verified against a single checklist.

| Component | Trigger | Required behavior |
|---|---|---|
| **Mission compiler** | unknown tool not in catalog | deny — do not compile to `active`; surface as `denied` with `reason: unknown_tool` |
| **Mission compiler** | tool in catalog but not in template's allowed set | deny — hard deny; not a clarification question |
| **Mission compiler** | step 9b validation failure | deny — do not emit enforcement bundle; compiler fails closed |
| **OAuth AS** | Mission status is not `active` | deny token issuance; return `400 invalid_grant` |
| **OAuth AS** | `constraints_hash` mismatch with MAS live state | deny token issuance |
| **OAuth AS** | MAS unreachable during token issuance | deny token issuance — do not fall back to cached state for issuance |
| **Host (`PreToolUse`)** | no Mission context loaded | restrict to planning-only; deny all tool calls |
| **Host (`PreToolUse`)** | cached `constraints_hash` stale beyond TTL | deny tool call; refresh capability snapshot first |
| **Host (`PreToolUse`)** | Cedar denies the request | deny tool call; emit denial signal |
| **Host (`PreToolUse`)** | Mission status not `active` | deny tool call; surface Mission state to model |
| **Host (`PreToolUse`)** | anomaly flag active for requested tool | deny tool call; surface restricted-mode message |
| **Host** | capability snapshot refresh fails | enter restricted mode; deny all tool calls until snapshot is refreshed |
| **MCP server** | token invalid or expired | return `401 Unauthorized` |
| **MCP server** | Mission stale (entity snapshot TTL exceeded) | deny `tools/call`; return `-32002` |
| **MCP server** | tool not in Mission scope | deny `tools/call`; return `-32001`; omit from `tools/list` |
| **MCP server** | commit-boundary live check fails | deny action; return `-32003`; do not execute side effect |
| **MCP server** | approval missing at commit boundary | deny action; return `-32003` |
| **MCP server** | MAS unreachable at commit boundary | deny action — do not proceed without live confirmation |
| **Enforcement point** | entity snapshot TTL expired | deny high-risk calls; fetch fresh snapshot before evaluating |
| **Enforcement point** | MAS unreachable | deny high-risk calls; allow low-risk reads up to cache TTL only |
| **Commit-boundary owner** | approval expired since host check | deny — do not execute side effect |
| **Commit-boundary owner** | duplicate `commit_intent_id` | deny — idempotency check fails closed |
| **Child Mission derivation** | child scope broader than parent | deny — narrowing proof fails; do not issue child Mission |

**Rule across all components:** when in doubt, deny. The architecture is built on short TTLs and fast refresh paths specifically so that fail-closed does not mean "stuck forever."

### Resource Semantics and Entity Generation

The note is concrete on tools, but real policy depends on resource semantics too.

You need a mapping layer from enterprise systems into Cedar resources and Mission resource classes.

For example:

- `finance.read`
  - `Mission::Dataset::"erp.actuals.q2"`
  - `Mission::API::"finance-mcp.read_financials"`
- `documents.write`
  - `Mission::Folder::"board-packets.q2"`
  - `Mission::Tool::"mcp__docs__docs.write"`
- `email.send_external`
  - `Mission::Audience::"external_email"`
  - `Mission::Tool::"mcp__email__email.send_external"`

That mapping should be owned centrally enough that:

- new APIs and tools get classified once
- Cedar resources stay stable
- Mission proposals can be compiled against known classes

Without this layer, every tool becomes an ad hoc policy decision.

#### Backend resource mapping contract

The compiler and policy layer need a backend-published mapping contract. Otherwise `finance.read` or `documents.write` stays too abstract to enforce safely.

Minimum mapping record:

```json
{
  "resource_class": "finance.read",
  "backend_system": "erp",
  "entity_type": "table_set",
  "entities": [
    "erp.actuals.q2",
    "erp.plan.q2"
  ],
  "fga_relations": [
    {
      "relation": "reader",
      "resource": "folder:board-packets/q2"
    }
  ],
  "owner_team": "finance-platform",
  "mapping_version": "2026-04-11"
}
```

Required behavior:

- every high-level `resource_class` used in templates must map to one or more backend entities
- backend teams publish and version these mappings
- compiler consumes only published mappings
- unmapped resource classes fail closed
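A compiler-side lookup with that fail-closed behavior might look like the following. The published-mapping record follows the example shape above; the in-memory dict stands in for whatever registry the backend teams actually publish to:

```python
# Compiler-side resolution of resource classes against published
# backend mappings. Unmapped or empty classes fail closed with an
# error the shaping layer can surface as a denial, not a guess.

PUBLISHED_MAPPINGS = {
    "finance.read": {
        "backend_system": "erp",
        "entities": ["erp.actuals.q2", "erp.plan.q2"],
        "owner_team": "finance-platform",
        "mapping_version": "2026-04-11",
    },
}

def resolve_resource_class(resource_class: str) -> dict:
    """Return the published mapping, or fail closed if none exists."""
    mapping = PUBLISHED_MAPPINGS.get(resource_class)
    if mapping is None or not mapping.get("entities"):
        raise LookupError(f"unmapped resource class: {resource_class}")
    return mapping
```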

#### Starter backend mapping set

If the team needs a starting point, publish mappings for these classes first:

| Resource class | Typical backend mapping |
|---|---|
| `finance.read` | ERP actuals/plan datasets, finance MCP read endpoint |
| `documents.read` | internal document folders or knowledge collections |
| `documents.write` | draft folder, internal memo folder, board-packet workspace |
| `ticket.create` | internal ticket queue or allowlisted partner ticketing project |
| `release.publish` | release artifact store or publication endpoint |

That starter set is enough to support the initial template pack without inventing too many backend abstractions at once.

#### Opinionated baseline mappings

To make the starter template pack executable, use these baseline mappings on day one:

| Template | Minimum backend mappings required |
|---|---|
| `board_packet_preparation` | `finance.read`, `documents.read`, `documents.write`, `release.publish` |
| `support_ticket_triage` | `documents.read`, `documents.write`, `ticket.create` |
| `sales_account_research` | `documents.read`, `documents.write`, CRM read class if available |
| `engineering_release_drafting` | issue-tracker read class, `documents.write`, `release.publish` |
| `vendor_due_diligence` | `documents.read`, `documents.write` |

If one of those mappings does not exist, either the template is not ready for production or the template must be narrowed until the backend surface exists.

#### Entity generation rules

Use these generation rules:

| Input | Cedar or backend artifact |
|---|---|
| dataset family | `Mission::Dataset` entity plus optional FGA resource IDs |
| tool endpoint | `Mission::Tool` entity |
| document folder or project | `Mission::Folder` or `Mission::Project` entity |
| row or document selector | backend-local filter or FGA tuple, not a free-form model string |

The core rule is that the model never invents resource-instance identifiers. Resource instances come from catalog or backend-owned mappings.

### Runtime Risk Model

Runtime risk needs a concrete state model, not only signals and examples.

Use three scopes of risk:

| Scope | Meaning | Example |
|---|---|---|
| session-local | current host conversation or run | repeated denied tool attempts in one Claude Code session |
| Mission-local | execution risk attached to one Mission | multiple commit-boundary deferrals on one Mission |
| principal-wide | risk attached to user, workload, or agent identity | repeated policy violations across Missions |

Use a simple severity ladder:

| Level | Meaning | Typical effect |
|---|---|---|
| `normal` | no active elevated risk | ordinary policy evaluation |
| `elevated` | some suspicious or repeated denied behavior | more live freshness checks, possible step-up |
| `high` | strong signal of drift, misuse, or attack | suspend high-risk actions, human review |
| `critical` | likely compromise or severe policy breach | suspend or revoke Mission immediately |

#### Risk signal handling

Minimum runtime effects:

| Signal | Default effect |
|---|---|
| repeated `tool.denied` | raise session-local risk |
| repeated `commit.required` without completion | raise Mission-local risk |
| prompt-injection indicator | raise session-local risk and require fresh capability snapshot |
| external-domain denial spike | raise Mission-local risk |
| policy engine mismatch or stale-hash conflict | raise Mission-local risk |
| manual operator flag | set Mission or principal to `high` or `critical` immediately |

#### Risk decay and reset

Risk cannot only go up.

Use these default rules:

- session-local risk resets when the session ends
- Mission-local risk decays one level after a clean interval, for example 30 minutes without new negative signals
- principal-wide risk decays only by explicit policy or longer review windows
- `critical` does not auto-decay; it requires human or policy clearance
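The ladder and decay rules above can be sketched as a small state machine per risk scope. The level ordering and the 30-minute clean interval come from the tables; the `RiskState` class and its method names are assumptions:

```python
LEVELS = ["normal", "elevated", "high", "critical"]

class RiskState:
    """Tracks one risk scope (session-local, Mission-local, or principal-wide)."""

    def __init__(self, clean_interval_s=30 * 60):
        self.level = "normal"
        self.clean_interval_s = clean_interval_s
        self.last_negative_signal = None  # timestamp of last negative signal

    def raise_level(self, now):
        """A negative signal moves risk one level up and restarts the clock."""
        i = LEVELS.index(self.level)
        self.level = LEVELS[min(i + 1, len(LEVELS) - 1)]
        self.last_negative_signal = now

    def maybe_decay(self, now):
        """Decay one level after a clean interval; critical never auto-decays."""
        if self.level in ("normal", "critical") or self.last_negative_signal is None:
            return
        if now - self.last_negative_signal >= self.clean_interval_s:
            self.level = LEVELS[LEVELS.index(self.level) - 1]
            self.last_negative_signal = now  # restart the clean interval
```

Session-local reset on session end is just discarding the instance; principal-wide decay would replace `maybe_decay` with an explicit policy call.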

### Approval UX and Workflow Contract

Approval is not complete until the approver surface is defined.

The review UI or workflow must show:

- Mission summary
- requesting user and agent identity
- exact tools and actions requested
- trust domains involved
- denied items
- gated items
- approval TTL
- reason auto-approval did not apply
- recommended decision

The approver must be able to:

- approve exact scoped items
- deny with reason
- shorten approval TTL
- require clarification before deciding

Batch approval should be allowed only for:

- the same `purpose_class`
- the same `constraints_hash`
- the same approver type

Do not batch unrelated Mission deltas into one opaque approval gesture.
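The batching constraint reduces to a three-part key. A minimal sketch, assuming Mission deltas are dicts carrying those three fields:

```python
def batch_key(delta):
    """The only fields that may be shared across one approval gesture."""
    return (delta["purpose_class"], delta["constraints_hash"], delta["approver_type"])

def can_batch(deltas):
    """Deltas may share one approval gesture only if all keys are identical."""
    return len({batch_key(d) for d in deltas}) == 1
```

Anything that would require a second key value gets its own review item rather than riding along in an opaque batch.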

#### Approval notification delivery

Creating a review work item does not notify anyone unless the notification contract is defined. Implement at least one notification channel before putting approvals into production.

Minimum notification contract:

| Trigger | Channel | Content |
|---|---|---|
| `human_step_up` work item created | primary channel for approver group | Mission summary, requesting user, gated action, approval TTL, link to review UI |
| work item approaching expiry | same channel, reminder | same fields plus remaining TTL |
| work item expired without action | primary channel + escalation contact | expiry notice, Mission blocked, escalation options |
| approval granted or denied | requesting user's channel | decision, approved scope, TTL if granted |

Minimum notification payload:

```json
{
  "notification_id": "notif_01JR9V9X4Q",
  "review_id": "rev_01JR9S9D1H",
  "mission_id": "mis_01JR9S4YDY6QF5Q9Q54M0YB4V1",
  "approval_type": "controller_approval",
  "requesting_user": "user_123",
  "gated_action": "publish_external",
  "mission_summary": "Prepare Q2 board packet comparing actuals to plan",
  "review_url": "https://governance.internal/approvals/rev_01JR9S9D1H",
  "expires_at": "2026-04-11T19:10:00Z"
}
```

The MAS delivers this payload to a configured notification adapter for the approver group. The adapter is responsible for routing to the right channel (email, Slack webhook, ticketing system, push notification). The MAS does not need to know the channel details; it only needs a stable adapter endpoint per approver group.

**Adapter configuration per approver group:**

```json
{
  "approver_group": "finance_controller",
  "adapters": [
    {
      "channel": "email",
      "config": {
        "to_group": "finance-controllers@example.com",
        "reply_to": "noreply-governance@example.com",
        "template": "approval_request_v2"
      }
    },
    {
      "channel": "slack_webhook",
      "config": {
        "webhook_url": "${SLACK_FINANCE_APPROVALS_WEBHOOK}",
        "mention_group": "@finance-controllers",
        "include_review_button": true
      }
    }
  ],
  "escalation_adapter": {
    "channel": "pagerduty",
    "config": {
      "service_key": "${PD_FINANCE_APPROVAL_SERVICE_KEY}",
      "severity": "warning",
      "trigger_on": "expiry_without_action"
    }
  }
}
```

**Email adapter minimum content:**

```
Subject: [Approval Required] Board Packet Preparation — publish_external — expires 3:00 PM

A governance approval is requested.

Mission: Board Packet Preparation
Requested by: Karl McGuinness (user_123)
Action requiring approval: Publish document externally
Description: The Q2 board packet is ready to publish to the SharePoint board folder.

Expires: April 12, 2026 at 3:00 PM UTC (in 4 hours)

[Approve] [Deny] [View full review packet]

Risk assessment: MEDIUM — this action is irreversible.
```

**Slack adapter minimum content:**

```
:hourglass: *Approval required* (expires in 4 hours)

*Mission:* Board Packet Preparation
*Requested by:* Karl McGuinness
*Action:* Publish Q2 board packet externally

[Approve] [Deny] [View review packet]
```

**Webhook adapter (for ticketing systems like Jira, ServiceNow):**

The notification payload above is delivered as an HTTP POST to the configured endpoint. The ticket system adapter should:
1. Create a work item from the notification payload
2. Assign to the appropriate approver queue
3. Call back to `POST /approvals/work-items/{review_id}/approve` or `/deny` when the ticket is resolved

**What notification adapters must not do:**
- Send the full Cedar policy or `constraints_hash` in notification content
- Include raw approval object IDs in user-visible fields
- Allow approval via unauthenticated webhook reply (approver must be authenticated to call the MAS approval endpoint)

Supported notification modes by risk level:

| Risk level | Minimum notification requirement |
|---|---|
| `auto` with release gate | notification optional; approver-pull model acceptable |
| `human_step_up` medium risk | notification required at work item creation; reminder at 50% of TTL |
| `human_step_up` high risk | notification required at creation; reminder at 25% of TTL; escalation at expiry |

#### Approval SLA and escalation

Minimum operational rules:

- every review item has an expiry
- every review item has an owning approver group
- expired review items return the Mission to blocked state
- if no approver acts within SLA, route to a fallback approver or deny
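Those rules can be sketched as one resolution function run against expired items. The `fallback_approver` field and the returned statuses are illustrative assumptions:

```python
from datetime import datetime

def resolve_expired_item(item, now):
    """Apply the SLA rules: an expired review item returns the Mission to
    blocked, then routes to a fallback approver if one exists, else denies."""
    if now < item["expires_at"]:
        return item  # still inside its SLA window
    item = dict(item, mission_state="blocked")
    if item.get("fallback_approver"):
        item["assigned_to"] = item["fallback_approver"]
        item["status"] = "reassigned"
    else:
        item["status"] = "denied"
    return item
```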

### Replay and Idempotency

Freshness checks are not enough by themselves. Side effects also need replay protection.

Use idempotency keys for:

- commit-boundary actions
- approval submissions
- token exchanges where the caller retries automatically

Minimum commit-boundary rule:

- host or MCP generates `idempotency_key`
- backend stores outcome keyed by `idempotency_key + mission_id + tool_name`
- retry with same key returns same decision or result, not a duplicate side effect

Signal ingestion already uses `signal_id` for idempotency. Apply the same discipline to approvals and commit actions.
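The commit-boundary rule can be sketched with an in-memory outcome store; a real implementation would use durable storage shared across backend replicas. The `commit_once` function and its signature are assumptions:

```python
# Outcome store keyed by (idempotency_key, mission_id, tool_name).
_outcomes = {}

def commit_once(idempotency_key, mission_id, tool_name, side_effect):
    """Execute side_effect at most once per key; a retry with the same key
    returns the stored outcome instead of duplicating the side effect."""
    key = (idempotency_key, mission_id, tool_name)
    if key in _outcomes:
        return _outcomes[key]  # replay: same result, no second side effect
    result = side_effect()
    _outcomes[key] = result
    return result
```

An automatic retry after a network timeout then returns the original result rather than publishing twice.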

### Observability and SLOs

The system needs operational targets, not only audit records.

Minimum SLO table:

| Surface | Target |
|---|---|
| MAS capability snapshot p95 | under 200 ms inside one region |
| token issuance p95 | under 300 ms without human approval |
| signal ingestion acceptance p95 | under 100 ms |
| revocation propagation to next checkpoint | under one token refresh or freshness interval |
| policy bundle fetch p95 | under 200 ms |

Minimum dashboards:

- Mission lifecycle transitions per hour
- token issuance allow/deny counts
- `tools/call` deny rates by tool
- commit-boundary deferral and denial rates
- approval queue depth and expiry counts
- stale-hash conflict rate

### Data Retention and Privacy

Mission systems collect sensitive governance data. Retention and visibility rules must be explicit.

Minimum retention guidance:

| Artifact | Default retention |
|---|---|
| Mission proposal | short to medium, for example 30-90 days unless needed for audit |
| governance record | retain for audit and operational history, often 1 year or policy-defined |
| approval object | retain with audit record lifetime |
| runtime signal | shorter operational retention, for example 30-90 days unless escalated |
| audit record | policy or regulatory retention, often longer than proposals or signals |

Minimum privacy rules:

- token projections expose only audience-necessary fields
- approver UIs do not show unrelated prior Missions by default
- cross-domain tokens must not expose internal purpose text or approval evidence
- audit viewers are role-limited and tenant-scoped
- free-form proposal text should be redactable or access-controlled separately from enforcement state

#### Audit privacy rules

Auditability does not require exposing every field to every reader.

Use these controls:

- separate **operational audit views** from **compliance export views**
- redact free-form prompt and proposal text from ordinary operational views by default
- expose approval references without exposing approval comments unless the viewer has approver-read permission
- store cross-domain correlation handles separately from internal Mission detail so external-domain events can be traced without disclosing internal Mission content
- encrypt audit storage at rest and require role-based read access with tenant scoping

### Approval Object and Approval Expiry

Human approval should not be just a boolean. The canonical approval object schema is defined in the artifact interface contracts section. Use that schema without modification.

The runtime properties that matter most here:

- `approved_scope` names the exact tools and actions covered, so a broader approval cannot be used to satisfy a narrower gate or vice versa.
- `constraints_hash` pins the approval to the Mission version it was issued against. An approval issued against an older hash does not satisfy a gate after the Mission is amended.
- `reusable_within_mission` defaults to `false` for irreversible actions. A one-shot approval is consumed on first use and must be re-requested for subsequent attempts.
- `expires_at` enforces a hard TTL. Commit-boundary rechecks must verify the approval object is still within its window at the moment the side effect becomes real.

For irreversible actions, set a short TTL and leave `reusable_within_mission` false unless the business process explicitly requires otherwise.

#### Approval workflow edge cases

Use these rules:

- if an approval arrives for an old `constraints_hash`, reject it and require a new review
- if an approval expires after host approval but before commit boundary, the commit fails closed
- if the approver reduces scope relative to the request, emit a new approval object with the reduced scope only
- if multiple approvals are required, the commit boundary checks all required approval types, not just one
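Taken together with the runtime properties above, the gate check looks roughly like this. Field names follow the approval object properties described in this section; the function itself is an assumption, and timestamps are naive ISO strings for simplicity:

```python
from datetime import datetime

def approval_satisfies_gate(approval, gated_action, current_constraints_hash, now):
    """Return True only if the approval can satisfy the gate right now."""
    if approval["constraints_hash"] != current_constraints_hash:
        return False  # issued against an older Mission version: new review required
    if gated_action not in approval["approved_scope"]:
        return False  # broader approval cannot satisfy a narrower gate, or vice versa
    if now >= datetime.fromisoformat(approval["expires_at"]):
        return False  # hard TTL: fail closed at the commit boundary
    if not approval.get("reusable_within_mission", False) and approval.get("consumed", False):
        return False  # one-shot approval already used
    return True
```

When multiple approval types gate one commit, the boundary runs this check once per required type and proceeds only if every check passes.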

#### Approval expiry countdown and warning UX

Approvals expire at `expires_at`. The host must warn the user before an approval expires so they are not mid-task when a stage-gated action suddenly fails closed.

**How the host tracks approval TTL:**

The capability snapshot response includes approval state. The host caches `expires_at` for each approval object it holds and checks it before each `PreToolUse` involving a gated action:

```json
{
  "active_approvals": [
    {
      "approval_id": "apr_01abc",
      "approved_scope": ["final_publish"],
      "expires_at": "2026-04-12T14:30:00Z",
      "minutes_remaining": 22,
      "reusable_within_mission": false,
      "consumed": false
    }
  ]
}
```

**Warning thresholds and host behavior:**

| Time before approval expiry | Host behavior |
|---|---|
| 30 minutes | Surface passive indicator: "Approval for [action] expires in ~30 minutes" |
| 10 minutes | Surface inline warning before any gated action: "Your approval expires soon. Complete this step now or request a new approval." |
| 5 minutes | Surface blocking warning before gated action: "Approval expires in 5 minutes — proceed now or the action will require re-approval." |
| Expired | Do not attempt the action; surface: "Approval for this step has expired. You'll need to request a new approval to continue." |
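The threshold ladder maps directly to a small decision function the host can run before each gated `PreToolUse`. Threshold values come from the table; the behavior labels are illustrative assumptions:

```python
def approval_warning(minutes_remaining):
    """Map approval TTL remaining to the host behavior from the table."""
    if minutes_remaining <= 0:
        return "expired_block"      # do not attempt the action
    if minutes_remaining <= 5:
        return "blocking_warning"   # proceed now or the action needs re-approval
    if minutes_remaining <= 10:
        return "inline_warning"     # complete this step now or request new approval
    if minutes_remaining <= 30:
        return "passive_indicator"  # "approval expires in ~30 minutes"
    return "none"
```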

**Expired approval at commit boundary:**

If an approval expires after the host checks it but before the MCP server's commit-boundary recheck, the MCP server returns `-32003` (approval missing). The host should:
1. Surface: "The approval window closed just as I was completing that step. Requesting a new approval now."
2. Re-request approval via the approval workflow API.
3. Do not retry the action until a new approval object is received.

**One-shot approvals:** when `reusable_within_mission: false`, the approval is consumed on first successful use. The host must update its capability snapshot after a successful commit-boundary action to reflect that the approval is consumed. If the model attempts the same gated action again, the host should prompt for a new approval rather than checking a stale cached approval state.

#### Approval integrity rules

Approval objects are security artifacts. Treat them as such.

Minimum integrity requirements:

- approval objects must be issued only by the MAS or an approved workflow service
- approval objects must be signed or MACed so the host or MCP server can verify integrity
- approval objects must include `mission_id`, `constraints_hash`, approval scope, and expiry in the signed payload
- approval objects must be immutable once issued; narrowed replacements create new approval objects rather than editing the old one
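One way to satisfy the signing requirement is a MAC over a canonical serialization of exactly the fields the rules require in the signed payload. HMAC-SHA256 and the JSON canonicalization here are illustrative choices, not mandated by the spec:

```python
import hashlib
import hmac
import json

SIGNED_FIELDS = ("approval_id", "mission_id", "constraints_hash",
                 "approved_scope", "expires_at")

def mac_approval(approval, key):
    """MAC a canonical serialization of the required approval fields."""
    payload = json.dumps({k: approval[k] for k in SIGNED_FIELDS},
                         sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def verify_approval(approval, tag, key):
    """Host or MCP server recomputes the MAC; any field edit breaks it."""
    return hmac.compare_digest(mac_approval(approval, key), tag)
```

Because the MAC covers the immutable payload, a narrowed replacement necessarily produces a new tag rather than an edit to the old object. Asymmetric signatures would work the same way when verifiers must not hold the issuing key.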

### Audit Record Shape

Signals are not enough without a canonical audit record.

Every significant decision should produce an audit record with:

- `event_id`
- `mission_id`
- `constraints_hash`
- `parent_mission_id` if present
- actor identity
- `act` chain if present
- tool or resource name
- action name
- decision
- reason
- approval references
- timestamp
- correlation IDs:
  - session ID
  - tool use ID
  - token ID if relevant
  - cross-domain correlation ID if relevant
- integrity fields:
  - `record_hash`
  - `prev_record_hash` if using a hash chain
  - signer or emitter identity

For example:

```json
{
  "event_id": "evt_01JR9V1DZZ",
  "mission_id": "mis_01JR9S4YDY6QF5Q9Q54M0YB4V1",
  "constraints_hash": "sha256-abc123",
  "actor": {
    "user_id": "user_123",
    "agent_id": "agent_research_assistant"
  },
  "action": "publish_external",
  "resource": "mcp__docs__docs.publish",
  "decision": "allow",
  "approval_ids": ["apr_01JR9T2M4N3"],
  "tool_use_id": "toolu_123",
  "timestamp": "2026-04-11T18:10:00Z"
}
```

That is enough to reconstruct who acted for whom, under which Mission version, and with what approval basis.

#### Audit integrity and tamper evidence

Auditability is not enough if the records can be rewritten quietly.

**Audit integrity tiers:**

Not all events carry the same integrity requirement. Use three tiers:

| Tier | Integrity requirement | Events |
|---|---|---|
| **Tier 1 — tamper-evident** | signed by emitting service or written to write-once storage; `record_hash` + `prev_record_hash` chain required | Mission approval issuance; Mission state transitions (activate, suspend, revoke, complete); commit-boundary success or denial; emergency bypass actions; dual-approval config changes; template governance approvals |
| **Tier 2 — append-only** | append-only log; `record_hash` required; `prev_record_hash` recommended | tool call allow/deny (non-commit); token issuance and revocation; capability snapshot fetch; stage gate evaluation; policy bundle distribution; amendment creation and approval |
| **Tier 3 — standard telemetry** | standard structured log; no hash chain required | cache hit/miss; API call latency; signal ingestion acknowledgements; snapshot refresh timing; `would_have_denied` shadow events |

Tier 1 events must be stored in a system that prevents modification after write — write-once object storage, an append-only ledger, or a signing service that attests each record at emission time. Tier 2 events use standard append-only logging (no delete or update). Tier 3 events may use any telemetry store.

Minimum integrity controls for all tiers:

- every audit record gets a canonical serialized form and a `record_hash`
- Tier 1 and Tier 2 append-only streams include `prev_record_hash` to form a tamper-evident chain within the tier
- `constraints_hash` and policy bundle version used for the decision must be recorded with every Tier 1 and Tier 2 event
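The `record_hash` / `prev_record_hash` chain can be sketched as follows: each hash covers the record's canonical form including the previous hash, so rewriting any record breaks every later link. The canonicalization choice is an assumption:

```python
import hashlib
import json

def chain_records(records):
    """Emit records with prev_record_hash and record_hash forming a chain."""
    prev = None
    out = []
    for r in records:
        body = dict(r, prev_record_hash=prev)
        canonical = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
        body["record_hash"] = "sha256-" + hashlib.sha256(canonical).hexdigest()
        out.append(body)
        prev = body["record_hash"]
    return out

def verify_chain(records):
    """Recompute every link; any modified or reordered record fails."""
    prev = None
    for r in records:
        body = {k: v for k, v in r.items() if k != "record_hash"}
        if body.get("prev_record_hash") != prev:
            return False
        canonical = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
        if r["record_hash"] != "sha256-" + hashlib.sha256(canonical).hexdigest():
            return False
        prev = r["record_hash"]
    return True
```

For Tier 1, the emitting service would additionally sign each `record_hash` or write the record to write-once storage; the chain alone only makes tampering evident, not impossible.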

The goal is:

- **auditability**: important actions are always recorded
- **traceability**: records can be linked across prompt, Mission, token, tool call, approval, and commit
- **integrity**: records and approvals cannot be silently modified
- **cross-domain privacy**: external domains see only what they need for local enforcement and local audit

#### Traceability chain

For a high-risk action, the audit chain should be reconstructable end to end:

1. user prompt or triggering business event
2. shaped proposal
3. compiled Mission and `constraints_hash`
4. approval object or auto-approval basis
5. issued token projection
6. host tool decision
7. MCP tool decision
8. commit-boundary decision
9. backend side effect or denial

If an operator cannot walk that chain from one correlation handle, the system is not yet traceable enough for production.

### Operator Admin Dashboard Spec

Operators need a day-to-day view of what is happening across all Missions in their tenant. The dashboard is not a configuration tool — it is a monitoring and emergency-response surface.

**Required views:**

#### Active Missions view

Columns: Mission display name, status, user, template, created, expires, last activity, actions.

Filters: status (active / pending / suspended / expired), template, user, date range.

Actions per Mission: View details, Suspend, Revoke, View audit trail.

The operator should be able to answer "how many Missions are active right now and which ones expire today?" without writing a query.

#### Pending approvals view

Columns: Mission name, user, approval type, submitted, risk tier, approver assigned, SLA expiry.

Sort: oldest first by default (to surface stale approvals).

Actions: Approve, Deny, Reassign, View review packet.

The review packet link should open the full compiler output for that Mission — template matched, tools approved, stage gates, risk factors — not just the raw JSON.

#### Recent denials view

Columns: Mission name, user, tool denied, denial reason, timestamp.

Filters: denial reason, template, date range.

The operator should be able to identify patterns: if `mcp__email__send_external` is denied 30 times in a day for the same template, the template may need adjustment.

#### Template drift view

For each active template: name, version, last reviewed, number of Missions using it, pending re-review flag.

Flag any template in `pending_re_review` — these are templates whose referenced resources changed and that are blocked from new Mission activations until re-reviewed.

#### Emergency controls

| Action | What it does | Confirmation required |
|---|---|---|
| Suspend all Missions for tenant | transitions all `active` Missions to `suspended` | yes — requires operator to type tenant ID |
| Revoke single Mission | transitions Mission to `revoked`, emits signal | yes — confirmation dialog |
| Revoke by template | revokes all Missions using a given template | yes — requires operator to type template name |
| Disable template | transitions template to `deprecated`; blocks new Mission activations | yes |
| Force approval expiry | expires an approval object immediately | yes |

Emergency controls should require explicit confirmation and produce an audit record with the operator's identity and stated reason.

**What the operator admin dashboard must not expose:**

- Token values or raw credential data
- Cedar policy source (available in template management, not this view)
- User session content or conversation history
- `act` chain internals

### MAS Operator Runbook for Common Incidents

#### Incident: Mission is stuck in `pending_approval` with no approver action

Diagnosis steps:
1. Check `GET /approvals/work-items/{review_id}` — is the review item assigned to an approver?
2. Check approver's notification delivery — did the approval request notification actually reach them?
3. Check SLA expiry — is the work item past its SLA?

Resolution:
- If approver is unavailable: reassign via `POST /approvals/work-items/{review_id}/reassign` with a new `assigned_to`
- If SLA is expired: escalate per the template's `escalation_path`; if no escalation is configured, deny the Mission and require resubmission
- If notification failed: re-trigger notification via `POST /approvals/work-items/{review_id}/notify`

#### Incident: `constraints_hash` mismatch errors spiking at enforcement points

Diagnosis steps:
1. Check recent `mission.amended` events for affected `mission_id` values — is an amendment in flight?
2. Check enforcement point cache TTL — are enforcement points refreshing within the configured TTL?
3. Check signal rail delivery — are `mission.amended` signals reaching enforcement points?

Resolution:
- If signal delivery is degraded: enforcement points must rely on TTL expiry to self-heal; reduce entity snapshot TTL if the window is too long
- If amendment is stuck mid-propagation: force a cache flush at affected enforcement points via `POST /missions/{id}/invalidate-cache`

#### Incident: High denial rate for a specific tool

Diagnosis steps:
1. Check recent denials view — what Mission template is being denied most?
2. Check template definition — is the tool in `hard_denied_actions` or just missing from `approved_tools`?
3. Check catalog record — has the tool's `data_sensitivity` or `allowed_action_classes` changed since the template was reviewed?

Resolution:
- If tool should be allowed: file a template amendment request with the business owner; do not grant ad-hoc exceptions
- If tool is correctly denied but users keep attempting it: update the template's `purpose_description` so the shaper stops proposing it

#### Incident: Session budgets being hit unexpectedly

Diagnosis steps:
1. Check `GET /missions/{id}` — what are the budget counts for the affected Mission?
2. Check session audit trail — are the tool calls consistent with the stated Mission purpose?
3. Check template session budgets — are the limits too low for the actual workload?

Resolution:
- If limits are correctly set: the session is behaving differently than expected; review audit trail for anomalies
- If limits are too low: file a template update to raise the budget; require business owner sign-off
- If anomaly is suspected: trigger Mission suspension via `POST /missions/{id}/suspend`

#### Incident: MAS unavailable; enforcement points losing connectivity

Enforcement points must fail closed on MAS unavailability. Verify:
1. Is the signal rail delivering `mission.suspended` events? If not, enforcement points should not be loosening policy.
2. Are commit-boundary live checks returning errors? If yes, commit-boundary actions must be denied until connectivity restores.
3. Are entity snapshot TTLs expiring? Enforcement points with expired snapshots must deny high-risk calls.

Do not extend TTLs or add exceptions to recover uptime during an outage. Fix MAS availability instead.

### Operator console model

The admin dashboard has views and runbook playbooks. The operator console model defines which views an operator should be looking at on what cadence. Without this, operators either check everything compulsively (alert fatigue) or check nothing until something breaks.

**Daily morning health check (5 minutes):**

An operator starting their day should answer these five questions before doing anything else:

| Question | Where to find the answer | Target |
|---|---|---|
| Are there any pending approvals older than 2 hours? | Pending approvals view, sorted by submitted ascending | 0 approvals past SLA |
| Are any templates approaching their review deadline? | Template drift view, filtered to "review due in 30 days" | 0 overdue templates |
| How many Missions are in `suspended_anomaly` state? | Active Missions view, filtered to suspended_anomaly | 0 unexpected suspensions |
| How many bootstrap Missions are still active? | Active Missions view, filtered to `mission_type=bootstrap` | decreasing toward 0 |
| Did any emergency controls fire overnight? | Audit trail, filtered to `event_type=emergency_action` | 0, unless known |

The default dashboard view should open to these five indicators automatically. An operator who opens the console and sees five green indicators is done with their morning check.

**Incident response view (on-demand):**

When a user reports a problem, the operator opens the incident view and follows the policy debugging diagnostic playbook (see above). The incident view shows:

1. Search by user ID → their recent Missions → most recent audit events
2. Denial events for that user in the last 24 hours
3. The `explain` button on any denial event — calls `POST /missions/{id}/explain` and shows the human-readable decision trace
4. Quick actions: View template, View catalog entries for denied tool, Reassign approval

The operator should be able to go from "user says X was denied" to "here is the specific rule that denied it and why" in under 3 clicks.

**Weekly review cadence (30 minutes):**

| Review item | Source | Action threshold |
|---|---|---|
| Approval burden analysis | Pending approvals → average time to resolution by template | If any template's median approval time > 4h, consider loosening approval mode or fixing routing |
| Denial rate trends | Recent denials → count by denial reason by template | If a single (tool, template) pair accounts for > 20% of denials, the template may need adjustment |
| Template drift status | Template drift view | All templates reviewed on schedule; no orphaned templates |
| Bootstrap Mission count | Active Missions filtered to bootstrap | Trending toward zero; flag if increasing |
| Session budget hits | Config namespace logs or audit trail | If budget suspensions are frequent, investigate whether budgets match real workload |

**What should NOT appear on the default operator view:**

- Raw Cedar policy text (available in template management, not the dashboard)
- `constraints_hash` values (available in the debug view, not the daily view)
- Signal weight detail (available in the anomaly view, not the daily view)
- Raw audit event JSON (available in the audit trail, not the summary view)
- Token values or credential data (never surfaced to operators)

The daily view and weekly review use plain-language summaries and counts. Operators should not need to parse JSON to run their morning check.

### Cross-Domain Worked Example (Advanced Profile)

Here is the missing multi-domain case.

The user asks:

> Reconcile Q2 actuals, open a support ticket with our payroll provider for one anomaly, and prepare the final board packet for signature approval.

This requires three domains:

1. **enterprise finance MCP server**
2. **partner payroll ticketing domain**
3. **third-party document-signing domain**

The flow should be:

1. MAS compiles one parent Mission
2. organizational policy auto-approves:
   - finance reads
   - draft document preparation
3. MAS escalates for human approval before:
   - partner-domain ticket submission if the partner is not in the standing allowlist
   - final publish or signature submission
4. host gets:
   - enterprise finance MCP token locally
   - ID-JAG for payroll partner domain, then payroll-domain access token
   - ID-JAG for signing domain only after human approval if required
5. each target domain mints and enforces its own token locally

That example shows why these are separate layers:

- Mission says the work is in scope
- enterprise policy decides whether cross-domain identity exchange is allowed
- each external domain still decides what it will honor

### Failure Paths

The note needs explicit failure handling because the happy path is not enough.

#### Token exchange denied

- AS returns denial
- host emits `token.denied`
- MAS may narrow remaining plan options or escalate to human approval

#### Approval timeout

- approval object expires
- host and MCP caches invalidate on `approval.expired`
- commit-boundary retry fails closed

#### Signal delivery failure

- local enforcement still happens
- event delivery should retry asynchronously
- high-risk outcomes should not depend on best-effort telemetry alone

#### MAS unavailable

- low-risk cached reads may continue for a short TTL
- token issuance fails closed
- commit-boundary actions fail closed

#### Constraints changed mid-session

- new `constraints_hash` invalidates local caches
- next tool call or commit-boundary check forces refresh
- old projections are not accepted for high-risk actions

#### External-domain denial

- domain B AS denies ID-JAG exchange or downstream token issuance
- host records the denial as domain-local, not as Mission approval
- agent may continue with other authorized parts of the Mission

### Ownership Model

The implementation is concrete enough now that ownership should be explicit.

- **MAS team**
  - Mission record
  - lifecycle
  - approvals
  - event API
- **policy team**
  - Cedar templates
  - compiler mappings
  - resource classification
  - auto-approval rules
- **identity / AS team**
  - token exchange
  - ID-JAG issuance path
  - token revocation
  - sender-constraint validation
- **agent platform team**
  - Claude Code hooks
  - local gateway
  - cache behavior
  - host-side signals
- **MCP / tool team**
  - `tools/list` and `tools/call` enforcement
  - commit-boundary rechecks
  - output sanitization

There should also be a named emergency authority that can:

- suspend a Mission
- revoke a Mission
- revoke a class of tokens
- disable one MCP server or tool family

If nobody owns those control points, the design is not operational.

### Securing the MAS itself

The MAS governs agent authority. Its own access model must be at least as strong as the systems it governs.

Minimum controls:
- Mission creation requires authenticated identity and records the requester
- Mission approval requires a separate privileged role, not the requesting agent or agent host
- Mission suspension and revocation require authenticated authority with an audited reason
- The MAS write API is not callable by agents without a separate trust boundary in between

An agent that can write to the MAS unilaterally can approve its own Mission. That is not a governance architecture. It is a prompt-injection target.

Practical separations:
- separate credentials for MAS write operations versus MAS read operations
- no agent-facing MAS write path without a human or policy approval step in between
- all MAS state changes are audited with actor identity and reason
- MAS audit records are write-once and not modifiable by the agent or agent host

The MAS's access model is the governance root. Everything else in this architecture depends on it.

### MAS Availability and Degraded Mode

The MAS is the authority root. Its unavailability is not a soft degradation — it stops token issuance and commit-boundary enforcement entirely within one token refresh window. This demands an explicit HA posture and a defined degraded-mode contract.

**Minimum HA requirements:**

- the MAS must run as an active-active or active-passive cluster across at least two availability zones
- the MAS database must use synchronous replication or a distributed ACID store; asynchronous replication is not acceptable for governance records or approval objects
- the signal ingestion endpoint may accept async delivery with a durable queue, but lifecycle state changes (suspend, revoke, approve) must be synchronous
- the policy bundle fetch endpoint should be cacheable at the CDN or load-balancer layer with short TTLs; the MAS backend does not need to serve every bundle fetch directly

**Degraded-mode contract:**

MAS unavailability collapses into three behaviors (fail closed, time-limited allow from cache, and conditional allow), depending on what each consumer needs:

| Consumer need | Source of truth | Degraded-mode behavior |
|---|---|---|
| capability snapshot refresh | MAS | fail closed; host enters restricted mode with no new planning |
| token issuance | MAS (Mission status + `constraints_hash`) | fail closed; no new tokens issued |
| `tools/call` for low-risk reads | cached local policy bundle | allow for up to local cache TTL (30-120 seconds) if current token is valid and no revocation signal has arrived |
| `tools/call` for writes or gated actions | cached local policy bundle + current approval object if required | fail closed unless the local bundle is current within TTL and any required approval object is still valid |
| commit-boundary actions | downstream or workflow owner, plus current approval object and `commit_intent_id` | fail closed if the final owner cannot verify approval, freshness, or duplicate suppression locally |

**Degraded-mode duration limits:**

The host and MCP servers may use their locally cached policy bundle for low-risk reads for up to their configured cache TTL after losing MAS connectivity. After that TTL, they must also fail closed for reads. High-risk writes and commit-boundary actions may proceed only if the final owner can validate the cached authority projection and approval object without a live MAS call. Otherwise they fail closed immediately.
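The degraded-mode table and duration limits above can be collapsed into one decision function. This is a sketch under assumed names; the consumer-need labels and parameters are illustrative, and real TTLs come from local configuration.

```python
# Degraded-mode decision while the MAS is unreachable. Names illustrative.

def degraded_mode_decision(consumer_need: str, *, bundle_age_s: float,
                           cache_ttl_s: float, token_valid: bool,
                           revocation_seen: bool,
                           approval_valid: bool = False) -> str:
    """Return 'allow' or 'fail_closed' for a request during MAS downtime."""
    if consumer_need in ("snapshot_refresh", "token_issuance"):
        return "fail_closed"          # only the MAS can serve these
    if consumer_need == "low_risk_read":
        # Cached reads are allowed only within the local cache TTL,
        # with a valid token and no revocation signal seen.
        ok = bundle_age_s <= cache_ttl_s and token_valid and not revocation_seen
        return "allow" if ok else "fail_closed"
    if consumer_need in ("write_or_gated", "commit_boundary"):
        # Writes additionally require a still-valid approval object that
        # the final owner can verify locally.
        ok = (bundle_age_s <= cache_ttl_s and token_valid
              and approval_valid and not revocation_seen)
        return "allow" if ok else "fail_closed"
    return "fail_closed"              # unknown consumer needs fail closed
```

Everything not explicitly provable from local state fails closed, which is the contract the table specifies.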

Operator controls during MAS degradation:

- emergency readonly mode: operator flag that allows reads from cache indefinitely until MAS recovers, suitable for batch read workflows that do not need fresh approval
- emergency halt: operator flag that stops all agent sessions immediately; for high-security environments where any execution without live governance is unacceptable
- maintenance window: pre-announced downtime window in which in-flight sessions are notified to reach a stopping point before the window begins

These modes require a separate, MAS-independent control plane endpoint (for example, a feature flag service or a local configuration file) so they can be set even when the MAS itself is unreachable.

#### MAS centrality and service split

In production, MAS should remain the authority root, but it should not become one monolithic everything-service.

Use this split:

| Responsibility | Can stay in MAS | Can be split out |
|---|---|---|
| Mission lifecycle and current authority state | yes | no |
| approval workflow UI and queueing | optional | yes |
| policy bundle distribution cache | optional | yes |
| signal ingestion front door | optional | yes |
| capability snapshot API | yes, or thin read replica/service | yes if it preserves current state semantics |

The rule is:

- MAS owns the authoritative state transition
- surrounding services may own distribution, workflow, queueing, or caching
- no split is allowed to create multiple competing authority sources

### Multi-Tenant MAS Isolation

> **Skip this section if you are deploying for one organization.** Multi-tenant isolation is relevant when MAS serves multiple separate organizations (SaaS product) or product teams with strict data separation requirements. A single-tenant enterprise deployment does not need the isolation model described here — row-level `tenant_id` filtering in your existing data store is sufficient.

For deployments where the MAS serves multiple tenants (multiple organizations or product customers), tenant isolation is a first-class security requirement. A bug that allows a Mission query to return another tenant's governance record is a critical incident, not a data quality issue.

**Isolation model options — choose one:**

| Model | Isolation mechanism | Best fit |
|---|---|---|
| Row-level tenancy | every MAS table row has a `tenant_id` column; every query includes a tenant scope predicate | single-service multi-tenant deployment |
| Schema-per-tenant | each tenant gets an isolated database schema or namespace | higher isolation requirement, moderate operational cost |
| Instance-per-tenant | separate MAS instance per tenant | highest isolation, highest operational cost; required for regulated or contractually isolated tenants |

**Minimum requirements for any model:**

- Every MAS API endpoint must extract `tenant_id` from the authenticated caller's token, not from the request body. The caller cannot self-assert their tenant.
- Every database query must include the resolved `tenant_id` as a mandatory filter, not as an optional parameter.
- Mission IDs must be globally unique or prefixed by `tenant_id` so that a leaked ID from one tenant cannot be used to query another tenant's Mission.
- Audit records must include `tenant_id` so that per-tenant audit export is possible.
- The policy bundle fetch endpoint must verify the requesting consumer belongs to the same tenant as the Mission being fetched.

Cross-tenant access — such as a partner integration where tenant A's agent accesses tenant B's MCP server — must go through the ID-JAG cross-domain flow, not through a shared MAS namespace.

#### Tenant-safe cache rules

Every cache key used by host, AS, or MCP must include `tenant_id` in addition to `mission_id` and `constraints_hash`. A cache that keys only on `mission_id` is unsafe in a multi-tenant deployment even if Mission IDs are intended to be globally unique.
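As a sketch, the rule is a one-line key builder plus a guard against empty components. The function name is illustrative; the invariant is that a missing `tenant_id` can never collapse two tenants onto one key.

```python
# Tenant-safe cache key construction. Names are illustrative.

def policy_bundle_cache_key(tenant_id: str, mission_id: str,
                            constraints_hash: str) -> str:
    # Reject empty components so a missing tenant_id fails loudly instead
    # of silently sharing cache entries across tenants.
    for name, value in (("tenant_id", tenant_id),
                        ("mission_id", mission_id),
                        ("constraints_hash", constraints_hash)):
        if not value:
            raise ValueError(f"cache key requires {name}")
    return f"{tenant_id}:{mission_id}:{constraints_hash}"
```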

## Cedar Policy Reference

This appendix contains the full Cedar schema, action vocabulary, and generation recipe for implementers. The main path references this section from [Cedar policy model summary](#cedar-policy-model-summary).

### Cedar schema

Cedar requires a schema file for type-checked policy evaluation. Use this as the base schema. Extend it from the resource catalog as new entity types are needed.

```cedar
namespace Mission {

  entity Agent = {
    "agent_id": String,
    "workload_id"?: String
  };

  entity User = {
    "user_id": String,
    "tenant_id": String
  };

  entity Tool = {
    "resource_class": String,
    "trust_domain": String,
    "commit_boundary": Bool
  };

  entity ToolGroup in [Tool] = {};

  entity Dataset = {
    "resource_class": String,
    "trust_domain": String
  };

  entity Audience = {
    "domain": String,
    "trust_domain": String
  };

  action call_tool
    appliesTo {
      principal: [Agent, User],
      resource: [Tool, ToolGroup],
      context: {
        "mission_id": String,
        "constraints_hash": String,
        "mission_status": String,
        "approvals": Set<String>,
        "runtime_risk": String,
        "commit_boundary": Bool,
        "trust_domain": String
      }
    };

  action read
    appliesTo {
      principal: [Agent, User],
      resource: [Tool, Dataset],
      context: {
        "mission_id": String,
        "constraints_hash": String,
        "mission_status": String,
        "approvals": Set<String>,
        "runtime_risk": String,
        "commit_boundary": Bool,
        "trust_domain": String
      }
    };

  action publish_external
    appliesTo {
      principal: [Agent, User],
      resource: [Tool, Audience],
      context: {
        "mission_id": String,
        "constraints_hash": String,
        "mission_status": String,
        "approvals": Set<String>,
        "runtime_risk": String,
        "commit_boundary": Bool,
        "trust_domain": String
      }
    };

  action issue_token
    appliesTo {
      principal: [Agent],
      resource: [Audience],
      context: {
        "mission_id": String,
        "constraints_hash": String,
        "mission_status": String,
        "delegation_depth": Long,
        "requested_tools": Set<String>
      }
    };

  action delegate
    appliesTo {
      principal: [Agent, User],
      resource: [Agent],
      context: {
        "mission_id": String,
        "constraints_hash": String,
        "delegation_depth": Long
      }
    };

}
```

Actions not listed here (`draft`, `delete`, `send_external`) follow the same `appliesTo` shape as `call_tool`. Define them explicitly in the schema before generating policies that use them.

### Cedar action vocabulary

Do not let each tool invent actions ad hoc. Define a stable action vocabulary and map tool methods into it.

Minimum vocabulary:

| Tool behavior | Cedar action |
|---|---|
| read-only fetch | `Mission::Action::"read"` |
| create or modify draft state | `Mission::Action::"draft"` |
| send message or API request to external party | `Mission::Action::"send_external"` |
| publish or finalize externally visible state | `Mission::Action::"publish_external"` |
| delete or destroy state | `Mission::Action::"delete"` |
| delegate to child agent | `Mission::Action::"delegate"` |
| request token for audience | `Mission::Action::"issue_token"` |

The host and MCP server should map concrete tools into this vocabulary before evaluation.
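A sketch of that mapping step, assuming tool names in the `mcp__server__resource.method` shape used elsewhere in this note. Real deployments should map from catalog metadata rather than string suffixes alone; the suffix table here is purely illustrative.

```python
# Map concrete tool methods into the stable Cedar action vocabulary
# before evaluation. Suffix rules are illustrative, not normative.

ACTION_BY_METHOD = {
    "read": 'Mission::Action::"read"',
    "search": 'Mission::Action::"read"',
    "write": 'Mission::Action::"draft"',
    "send_external": 'Mission::Action::"send_external"',
    "publish": 'Mission::Action::"publish_external"',
    "delete": 'Mission::Action::"delete"',
}

def cedar_action_for_tool(tool_name: str) -> str:
    """Map e.g. 'mcp__docs__docs.publish' to a Cedar action id."""
    method = tool_name.rsplit(".", 1)[-1]
    try:
        return ACTION_BY_METHOD[method]
    except KeyError:
        # Unknown methods fail closed rather than defaulting to "read".
        raise ValueError(f"no action-class mapping for {tool_name}")
```

The important property is the last branch: a tool method with no mapping is an error, not an implicit read.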

### Cedar generation recipe

**Pass 1 — template policy** (run once per template class; update only if template changes):

1. For each permitted action class in the template vocabulary, emit a `permit` rule referencing the template-class ToolGroup (e.g., `ToolGroup::"board_packet_write_tools"`)
2. For each denied action class, emit an explicit `forbid`
3. For each stage gate in the template, emit a conditional `permit` checking `context.approvals.contains("gate_name")`
4. For each trust-domain restriction, emit a `forbid` with domain condition

**Pass 2 — Mission entity snapshot** (run each time Mission is compiled or amended):

1. Emit `Tool` entity records for each allowed tool (with `resource_class`, `trust_domain`, `commit_boundary` attributes)
2. Emit `ToolGroup` memberships: for each allowed tool, add it as a member of the template-class group
3. Emit principal entity records (`User`, `Agent`) for the user, the agent, and any child lineage
4. Serialize the entity graph to canonical JSON (sorted keys, no whitespace)
5. Compute `constraints_hash` = `SHA-256(canonical_json)`
6. Store the entity snapshot and hash in MAS; make it available at `GET /missions/{id}/policy-bundle`

Required behavior:
- template policy must be reproducible from the template definition alone
- entity snapshot must be reproducible from Mission compiled state alone
- any change to the entity snapshot must produce a new `constraints_hash`
- `forbid` rules in the template always win over general `permit` rules
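Steps 4 and 5 of Pass 2 are the whole determinism requirement, and they fit in a few lines. A sketch, assuming Python's `json` module for canonicalization:

```python
# Canonical serialization and constraints_hash (Pass 2, steps 4-5).

import hashlib
import json

def constraints_hash(entity_snapshot: dict) -> str:
    # Sorted keys, no whitespace: the same snapshot always serializes
    # to the same bytes regardless of insertion order.
    canonical = json.dumps(entity_snapshot, sort_keys=True,
                           separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Reordering keys in the snapshot produces the same hash; changing any attribute produces a new one, which is exactly the "any change produces a new `constraints_hash`" requirement above.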

### Cedar generation example

For a Mission using the `board_packet_v1` template that allows `docs.write` but gates `docs.publish`:

**Template policy** (static, shared across all `board_packet_v1` Missions):

```cedar
// board_packet_v1 -- shared template policy
permit(
  principal is Mission::Agent,
  action == Mission::Action::"draft",
  resource in Mission::ToolGroup::"board_packet_write_tools"
)
when {
  context.mission_status == "active"
};

forbid(
  principal is Mission::Agent,
  action == Mission::Action::"publish_external",
  resource
)
unless {
  context.approvals.contains("controller_approval")
};
```

**Mission entity snapshot** (per-instance, recomputed on amendment):

```json
{
  "entities": [
    {
      "uid": "Mission::Tool::\"mcp__docs__docs.write\"",
      "attrs": {
        "resource_class": "docs",
        "trust_domain": "enterprise",
        "commit_boundary": false
      },
      "parents": ["Mission::ToolGroup::\"board_packet_write_tools\""]
    },
    {
      "uid": "Mission::Tool::\"mcp__docs__docs.publish\"",
      "attrs": {
        "resource_class": "docs",
        "trust_domain": "enterprise",
        "commit_boundary": true
      },
      "parents": []
    },
    {
      "uid": "Mission::Agent::\"agent_research_assistant\"",
      "attrs": { "agent_id": "agent_research_assistant" },
      "parents": []
    }
  ]
}
```

The `constraints_hash` is the SHA-256 of the canonical form of this entity snapshot. Enforcement points check their cached hash on every request. On mismatch, they pull a fresh snapshot from `GET /missions/{id}/policy-bundle` before evaluating.

The compiler must regenerate the entity snapshot from Mission state without consulting model output again. The template policy is authored once and version-controlled alongside the template definition.

## Implementation Plan Appendix

If I had to build this in order, I would do this:

| Phase | Goal | Deliverables |
|---|---|---|
| 0 | stand up authority inputs and compiler substrate | resource catalog, policy templates, purpose classifier config, compiler service, stable artifact schemas, deterministic `constraints_hash` generation |
| 1 | stand up approval workflow and Mission lifecycle | MAS APIs, governance record storage, review packet generation, approval work items, approval object issuance, lifecycle state transitions |
| 2 | stand up the minimum safe single-domain runtime path | choose token validation model, choose sender-constraint or trusted gateway pattern, one OAuth AS, one MCP server, capability snapshot integration, Mission-aware `tools/call` enforcement |
| 3 | make single-domain runtime authorization materially Mission-aware | Cedar-backed issuance policy or equivalent adapters, audience-specific token projection, host `PreToolUse` checks, stage-gated execution, approval-bound token issuance, downstream-owned commit boundaries |
| 4 | make containment and revocation real | downstream-owned commit-boundary integration by default, runtime signal ingestion, Mission suspension and revocation propagation, cache invalidation, credential lifecycle tied to Mission lifecycle |
| 5 | add delegated execution safely if needed | derived sub-Missions, narrowing proof artifacts, delegated token issuance, `act` chain propagation with [OAuth Actor Profile](https://mcguinness.github.io/draft-mcguinness-oauth-actor-profile/draft-mcguinness-oauth-actor-profile.html) |
| 6 | add advanced profiles | cross-domain federation, asynchronous enterprise approval workflows, partner-domain readiness tiers, per-domain enforcement and revocation behavior |

### Honest assessment of Phase 0

The Phase 0 description ("resource catalog, policy templates, purpose classifier config, compiler service, stable artifact schemas, deterministic `constraints_hash` generation") sounds like a large distributed system. It is not. An honest minimum viable Phase 0 is:

- **one hard-coded template** — pick the most common internal workflow (e.g., `board_packet_preparation`). Write the allowed tools, allowed action classes, and hard denies as a static JSON file. This is the template. Do not build a template management system first.
- **a 10-tool catalog** — enumerate the specific tools the template needs. Give each a `resource_id`, `resource_class`, `trust_domain`, and `commit_boundary` field. A flat JSON file is fine at this stage.
- **a compiler that fails closed on everything else** — if the incoming proposal references anything not in the catalog, compilation fails. If the proposal does not match the one template, compilation fails. The compiler's job at Phase 0 is to say "yes" to the one known pattern and "no" to everything else.
- **deterministic `constraints_hash`** — SHA-256 of the sorted-key canonical JSON of the compiled output. No randomness.

That is a weekend's work, not a quarter's work. The rest of Phase 0's scope (classifier config, catalog service API) is the second iteration.

**Version recording is in scope for Phase 0.** The Phase 0 exit gate requires that compilation inputs be versioned and the version used be recorded in every compiled output. This does not mean building a catalog management system — it means your template JSON file has a version field, your catalog JSON file has a version field, and the compiler writes both into the compiled output. A git commit SHA is sufficient as a version identifier at Phase 0. The Phase 1 iteration adds the catalog service API and formal version management. Phase 0 just needs the compiler to record what it compiled from.

The Phase 0 exit gate: the compiler is deterministic, unknown tools fail closed, and every compiled output records the template version and catalog version used. If you can prove those three properties, Phase 0 is done.
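The whole honest-minimum compiler can be sketched in one function: one static template, a flat catalog, fail closed on everything else, version recording, and a deterministic hash. File shapes and field names here are illustrative, not a schema.

```python
# An honest-minimum Phase 0 compiler. All shapes are illustrative.

import hashlib
import json

def compile_mission(proposal: dict, template: dict, catalog: dict) -> dict:
    # Exactly one known template; anything else fails closed.
    if proposal.get("template") != template["name"]:
        raise ValueError("proposal does not match the one known template")
    # Any tool not in the catalog fails closed.
    known = {t["resource_id"] for t in catalog["tools"]}
    unknown = set(proposal.get("tools", [])) - known
    if unknown:
        raise ValueError(f"unknown tools fail closed: {sorted(unknown)}")
    compiled = {
        "template": template["name"],
        "template_version": template["version"],   # Phase 0 gate: record
        "catalog_version": catalog["version"],     # what was compiled from
        "tools": sorted(proposal["tools"]),        # sorted for determinism
    }
    canonical = json.dumps(compiled, sort_keys=True, separators=(",", ":"))
    compiled["constraints_hash"] = hashlib.sha256(
        canonical.encode("utf-8")).hexdigest()
    return compiled
```

Proving the three exit-gate properties against a function like this is a handful of assertions, which is why Phase 0 is a weekend and not a quarter.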

### Simplified v1 cut

If the team needs the smallest serious deployment, stop after Phase 3 and keep these limits in place:

- single-domain only
- narrowing-only Missions
- one host integration
- one MCP server family
- capability snapshot refresh only at authority transitions
- no sub-agents unless pre-issued delegation artifacts are already implemented
- no cross-domain federation

That delivers the main governance value without taking on the highest distributed-systems risk from delegation and partner-domain coordination.

### Phase gates

Do not treat the phases as soft suggestions. Each phase should have an exit gate.

| Phase | Must be true before moving on |
|---|---|
| 0 | compiler is deterministic; unknown tools fail closed; compilation inputs (template file + catalog file) are versioned and the version used is recorded in every compiled output |
| 1 | Missions can move through `pending_clarification`, `pending_approval`, `active`, and `denied`; approval evidence is persisted |
| 2 | no external tool executes based on `tools/list` alone; `tools/call` is Mission-aware; stale `constraints_hash` blocks execution; hosts do not query MAS on every reasoning step |
| 3 | token issuance is audience-specific and reproducible from Mission state; gated actions cannot obtain usable tokens without approval; runtime surfaces either use Cedar directly or a semantically equivalent adapter |
| 4 | commit-boundary actions are non-bypassable; downstream system of record owns the final commit boundary where applicable; revocation changes runtime behavior on next checkpoint; signal ingestion is live |
| 5 | child Missions cannot broaden parent authority; narrowing proofs are stored and testable |
| 6 | advanced profiles remain optional; each target domain mints and enforces its own token locally; source-domain authority does not bypass target-domain policy |

### Why this phase order

This order matches the actual dependency graph in the rest of the design:

1. the compiler cannot work without catalog, templates, and stable schemas
2. runtime approval cannot work without review packets and lifecycle state
3. safe MCP runtime starts at `tools/call`, not at `tools/list`
4. token model and freshness strategy have to be chosen before AS and MCP behavior are stable
5. single-domain deployment is the core profile; cross-domain and asynchronous approval are advanced profiles
6. containment and revocation hardening depend on signals, current hash semantics, and approval objects already existing
7. delegation and cross-domain federation are separate expansions and should not be built as one combined final step

### Minimum internal deployment cut

If the goal is an internal deployment rather than a prototype, the minimum acceptable cut is the end of Phase 3, not Phase 2.

That minimum cut requires:

- deterministic compiler output
- approval workflow with persisted evidence
- Mission-aware `tools/call`
- scoped capability snapshot refresh for session start, state changes, delegation, and cross-domain steps
- audience-specific token projection
- stage-gated execution for irreversible actions
- a clear commit-boundary owner for each high-risk tool, with downstream ownership preferred where the system of record is downstream

Stopping earlier is still useful for prototyping, but it is not yet a governed deployment.

### Ownership by phase

The phase plan only works if ownership is assigned explicitly. Use the following default split:

- **MAS team**: Mission state, lifecycle, approvals, signal ingestion
- **policy team**: compiler, templates, Cedar generation, approval rules
- **identity / AS team**: token issuance, introspection or freshness model, sender constraint, ID-JAG
- **agent platform team**: host integration, capability snapshot use, local cache behavior, host signals
- **MCP / tool team**: `tools/call` enforcement, commit boundary, backend resource validation
- **security operations**: emergency controls, revocation authority, high-risk policy override

Use this RACI-style matrix:

| Phase | Accountable | Primary responsible teams | Must be consulted |
|---|---|---|---|
| 0 compiler substrate | policy team | policy team, MAS team | MCP/tool team, security operations |
| 1 approval workflow and lifecycle | MAS team | MAS team, policy team | security operations, business approver owners |
| 2 minimum safe runtime path | agent platform team | agent platform team, identity/AS team, MCP/tool team | MAS team, policy team |
| 3 Mission-aware runtime authorization | identity/AS team | identity/AS team, policy team, agent platform team | MAS team, MCP/tool team |
| 4 containment and revocation | security operations | MAS team, MCP/tool team, identity/AS team, agent platform team | policy team |
| 5 delegated execution | MAS team | MAS team, policy team, identity/AS team | agent platform team, security operations |
| 6 cross-domain federation | identity/AS team | identity/AS team, MAS team | policy team, security operations, external domain owners |

The handoff rule should be simple:

- no phase starts without a named accountable owner
- no phase exits without the accountable owner signing off on the phase gate
- any phase that changes `constraints_hash` semantics, approval semantics, or revocation behavior requires MAS, policy, and identity review together

### Rollout into an existing deployment

The greenfield phase plan above assumes no prior agent infrastructure. If Mission architecture is being introduced into an existing agent deployment, use this migration sequence instead.

**Step 1: instrument before enforcing.**
Deploy the MAS in observation mode. Shape Missions from real prompts but do not block anything yet. Use the compiler output to understand what authority envelopes current sessions are implicitly using. This gives you the data to build initial templates from real patterns rather than theory.

Expect this phase to be longer than planned. Real environments usually reveal:

- missing catalog entries
- hidden tool dependencies
- overly broad existing permissions
- workflows that do not fit the initial template pack

**Step 2: introduce token projection without changing enforcement.**
Add `mission_id` and `constraints_hash` as claims to new tokens without changing what the MCP server or host accepts. Verify that token issuance produces stable projections before using them for enforcement decisions.

**Step 3: enforce at `tools/list` only.**
Begin filtering the visible tool list per Mission. This is lower-risk than blocking `tools/call` because it reduces surface area without hard-failing existing workflows. Observe whether any session depends on tools that do not match a Mission projection.

**Step 4: enforce at `tools/call` for non-destructive tools.**
Add Cedar evaluation at `tools/call` for read, summarize, and draft tools. Do not use unconditional `allow` as the fallback. For unrecognized tools, use one of:

- deny by default
- route to observation-only if the tool is on a pre-approved migration allowlist
- require explicit temporary bootstrap policy

Monitor denial and fallback rates. The migration goal is to shrink the bootstrap allowlist to zero.

Treat the bootstrap allowlist as debt, not as a steady-state feature. Review it on a fixed schedule and require an owner for each remaining entry.
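The fallback rule for unrecognized tools, including the debt properties (a named owner and an expiry per entry), can be sketched as one decision function. Names and the allowlist shape are illustrative.

```python
# Step 4 fallback: unrecognized tools are denied unless they sit on an
# owned, expiring bootstrap allowlist. Never silently allow.

from datetime import datetime, timezone

def unrecognized_tool_decision(tool: str, allowlist: dict) -> str:
    """Return 'deny' or 'observe_only' for a tool with no Cedar mapping."""
    entry = allowlist.get(tool)
    if entry is None:
        return "deny"                 # deny by default
    expired = datetime.now(timezone.utc) >= entry["expires"]
    if expired or not entry.get("owner"):
        return "deny"                 # allowlist debt needs an owner and an expiry
    return "observe_only"             # route to observation, never plain allow
```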

**Step 5: enforce at commit boundaries.**
Add commit-boundary rechecks for destructive, external, and publication tools. This is the highest-impact enforcement step and should not be skipped, but it is also the step most likely to interrupt real workflows if templates are incomplete.

**Step 6: migrate existing sessions.**
Existing sessions that predate Mission architecture should be given a bootstrap Mission on their next prompt. The shaper can derive a Mission from the session's recent tool call history as a starting point. Do not allow sessions to continue indefinitely without a Mission record.

**In-flight session handling:** sessions active at the moment enforcement goes live can continue under a permissive bootstrap Mission until their next natural pause point (stop event, approval gate, or session expiry). Do not revoke in-flight tokens that were issued before Mission enforcement was enabled. Let them expire naturally while the new issuance path enforces Mission scope.

#### Bootstrap Mission specification

A bootstrap Mission is a time-limited, permissive Mission created automatically for sessions that predate Mission architecture. It is not a permanent feature — it is a migration bridge with an expiry.

**How to detect a session with no active Mission:**

The host checks `GET /missions?actor={actor_id}&status=active` at session start. If the result is empty, the session has no active Mission. The host should also check the local session state for a cached `mission_id` — if absent, the session is ungoverned.

A session that reaches `PreToolUse` with no `mission_id` must either:
1. block all tool calls until a Mission is created, or
2. create a bootstrap Mission and proceed under it

Option 1 is correct for new deployments. Option 2 is correct during the migration window.
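The detection flow above, including the migration-window split between the two options, can be sketched as follows. `mas_client` is a hypothetical interface standing in for the `GET /missions` call; only the decision logic matters here.

```python
# Detecting an ungoverned session at session start. Names illustrative.

def session_mission_state(session: dict, mas_client) -> str:
    """Return 'governed', 'needs_bootstrap', or 'blocked' for a session."""
    if session.get("mission_id"):
        return "governed"             # cached mission_id in local session state
    active = mas_client.list_missions(actor=session["actor_id"],
                                      status="active")
    if active:
        session["mission_id"] = active[0]["mission_id"]
        return "governed"
    # No active Mission: bootstrap during the migration window (option 2),
    # block otherwise (option 1).
    return "needs_bootstrap" if session.get("migration_window") else "blocked"
```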

**How to create a bootstrap Mission from tool call history:**

The host reads the last N tool calls from the session transcript (recommend N = 20, or all tool calls in the current session if fewer). It passes these to the Mission shaper as "observed tool usage," not as a natural-language intent description. The shaper identifies the resource classes touched and emits a minimal enforcement bundle:

```json
{
  "mission_type": "bootstrap",
  "purpose_class": "general_assistance",
  "resource_classes": ["<derived from observed tool calls>"],
  "action_classes": ["read", "draft"],
  "stage_constraints": [],
  "approval_required": "none",
  "bootstrap_source": "tool_call_history",
  "bootstrap_session_id": "<session_id>"
}
```

**Bootstrap Mission scope rules:**

| Decision | Rule |
|---|---|
| purpose_class | always `general_assistance` — do not try to infer a narrower purpose from history |
| resource_classes | include all resource classes observed in recent tool calls; do not include unobserved classes even if they seem related |
| action_classes | include only `read` and `draft` by default; do not include `commit` or `publish` even if observed — those require a real Mission |
| stage_constraints | none — do not require approval for bootstrap actions |
| gated_tools | none in bootstrap — the bootstrap Mission is not the moment to add new gates |
| max_wall_clock_duration_seconds | 8 hours; no extensions |

**Bootstrap Mission expiry:**

Bootstrap Missions expire after `max_wall_clock_duration_seconds` (8 hours by default) or at the operator-configured migration deadline, whichever comes first. They cannot be renewed. When a bootstrap Mission expires:

1. The host surfaces: "Your session was running under a temporary bootstrap Mission that has expired. To continue, please describe your goal and I'll help set up a Mission."
2. The user creates a real Mission via the standard flow.
3. The expired bootstrap Mission record is retained in audit history.

**How long bootstrap Missions can exist:**

Set an organization-wide bootstrap deadline — a date after which no new bootstrap Missions are issued and existing bootstrap Missions are not renewed. Recommended: 90 days from the date enforcement mode is enabled. After the deadline:

- sessions with no Mission are blocked at `PreToolUse`
- users are directed to create a Mission before proceeding

Treat the bootstrap deadline as a hard date, not a soft goal. Slipping it indefinitely means governance is never fully enforced.

**How to transition users from bootstrap to real Missions:**

When a bootstrap Mission is approaching expiry (within 2 hours), the host surfaces a proactive prompt:

> "Your current session is running under a temporary bootstrap authorization that expires in 2 hours. Would you like to set up a full Mission now so your work can continue without interruption?"

If the user says yes, the host runs the standard Mission creation flow (intent elicitation, template selection, shaping, review). The bootstrap Mission remains active until the new Mission reaches `active` state or until it expires.

If the user says no, the bootstrap Mission expires at its scheduled time and the user must create a Mission at that point.

**What bootstrap Missions do not allow:**

- commit-boundary actions (write to external systems, publish, send)
- sub-agent delegation
- cross-domain tool access
- Mission amendment or cloning
- session budget increases

These restrictions protect against the bootstrap path being used to bypass governance for high-risk operations. Users who need those capabilities must create a real Mission.

**Bootstrap Mission audit record:**

Every tool call under a bootstrap Mission is logged with `"mission_type": "bootstrap"` in the audit record. This allows operators to identify bootstrap tool usage and use it to inform real template creation. The migration goal is to make `"mission_type": "bootstrap"` disappear from the audit log within the migration window.

#### What usually breaks first during rollout

In practice, the first failures are usually:

- templates that are too narrow for real work
- resources that are missing from the catalog
- approval routes that are too slow
- commit-boundary placement that is too early or too late
- partner domains that cannot support the desired exchange flow

Plan for a long shadow-mode period and staged enforcement by tool family, not one global switch.

### Pre-Enforcement and First Deployment Runbooks

These runbooks operationalize the P0 and P1 items from [What to do next — prioritized](#what-to-do-next--prioritized). Each runbook converts an open item into a concrete procedure with defined entry conditions, steps, and pass/fail criteria.

---

#### Component Ownership Register

**Entry condition:** before the first sprint begins.

**What "ownership" means per component:**

| Component | Owner's primary accountability |
|---|---|
| MAS core | Mission lifecycle correctness; approval evidence durability; revocation propagation; degraded-mode behavior |
| Policy compiler | Compilation determinism; step 9b validation; `constraints_hash` reproducibility; template-to-bundle fidelity |
| OAuth AS | Token issuance correctness; projection narrowness; introspection accuracy; key rotation |
| Host integration (Claude Code) | Hook correctness; capability snapshot freshness; session budget tracking; denial message surfacing |
| MCP enforcement | `tools/call` Cedar evaluation; commit-boundary pre-check; signal emission; bundle cache management |
| Security operations | Revocation authority; emergency halt; anomaly threshold oversight; incident response for policy failures |

**Register — fill one row per named individual before starting Phase 1:**

| Component | Named owner | On-call handle | Backup owner | Phase gate sign-off authority |
|---|---|---|---|---|
| MAS core | — | — | — | yes |
| Policy compiler | — | — | — | yes |
| OAuth AS | — | — | — | yes |
| Host integration | — | — | — | yes |
| MCP enforcement | — | — | — | yes |
| Security operations | — | — | — | yes |

**Gate:** no sprint begins until every row is filled. A team name in the "Named owner" column does not satisfy this gate — one person is accountable per row.

**Ongoing:** each owner must sign off on their component's phase gate before the project advances to the next phase. Any phase that changes `constraints_hash` semantics, approval semantics, or revocation behavior requires all six owners to review together.

---

#### Commit-Boundary Ownership Register

**Entry condition:** before Phase 4 (containment and revocation) begins.

**V1 template pack — tools with `commit_boundary: true`:**

| Tool | Template | Action class | Why it has a commit boundary |
|---|---|---|---|
| `mcp__docs__docs.publish` | `board_packet_preparation` | `publish_external` | Publishes to a distribution list; not reversible after recipient delivery |
| `mcp__email__email.send_external` | any template with external send | `external_send` | External delivery; no recall after send |
| `mcp__docs__docs.delete` | any template with delete scope | `delete` | Irreversible data removal |
| `mcp__calendar__calendar.send_invite` | any template with scheduling scope | `external_send` | Sends external calendar invitations |

Add rows for each tool in your actual deployed template pack. The list above covers the v1 starter pack; any tool whose backend action is irreversible must appear here.

**Register — fill before Phase 4 gate:**

| Tool | Downstream service | Service owner (named person) | `commit_intent_id` endpoint | Idempotency contract confirmed? | Replay test status |
|---|---|---|---|---|---|
| `mcp__docs__docs.publish` | — | — | — | ☐ | ☐ |
| `mcp__email__email.send_external` | — | — | — | ☐ | ☐ |
| (add rows per deployment) | | | | | |

**Gate:** every tool with `commit_boundary: true` must have a named downstream service owner and a confirmed `commit_intent_id` contract before Phase 4 enforcement goes live. A tool without a named owner is out of scope for v1 enforcement until the owner is assigned.

---

#### Pre-Enforcement Deployment Checklist

**Entry condition:** compiler, approval workflow, and capability snapshot are working (Phase 3 complete). No enforcement has been enabled yet.

**Step 1: Deploy all templates in observation mode.**

For each template in the v1 template pack:

```
POST /templates/{template_id}/enforcement-mode
{
  "mode": "observation",
  "effective_at": "now"
}
```

Confirm the response shows `mode: observation` for all templates before proceeding. Verify in the operator dashboard that `would_have_denied` signals are flowing.
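
The rollout loop and its verification can be sketched as a small script. The transport is injected so the rollout logic stays testable; the endpoint path follows the spec above, but the response field names and client shape are assumptions.

```python
from typing import Callable

def observation_rollout(template_ids, post: Callable[[str, dict], dict]) -> list:
    """Switch each template to observation mode; return the ones that failed.

    `post` takes (path, body) and returns the parsed JSON response, so the
    HTTP client, auth, and retries stay out of the rollout logic. The
    endpoint shape matches the enforcement-mode call above; the `mode`
    response field is an assumption about the response schema.
    """
    failed = []
    for tid in template_ids:
        resp = post(f"/templates/{tid}/enforcement-mode",
                    {"mode": "observation", "effective_at": "now"})
        if resp.get("mode") != "observation":
            failed.append(tid)
    return failed
```

A non-empty return value means the gate in Step 1 is not met: stop and investigate before wiring shadow signals.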

**Step 2: Wire the host to emit shadow signals.**

Verify that the host hook is calling `POST /signals` with `"mode": "observation"` on every tool call. Spot-check 5 recent tool calls in `GET /missions/{id}/signals` — each should have a corresponding signal record. If shadow signals are not flowing, the observation window will collect no data — stop here and fix the signal rail.
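
The spot-check reduces to a set comparison between the host's recent tool-call ids and the signal records returned by the signals endpoint. A minimal sketch, assuming `tool_call_id` and `mode` fields in the signal schema (the exact field names are not specified above):

```python
def missing_shadow_signals(tool_calls, signals):
    """Return tool-call ids that have no corresponding observation-mode signal.

    `tool_calls` is a list of recent call ids from the host log; `signals`
    is the parsed body of GET /missions/{id}/signals. The `tool_call_id`
    and `mode` field names are assumptions about the signal schema.
    """
    seen = {s["tool_call_id"] for s in signals if s.get("mode") == "observation"}
    return [c for c in tool_calls if c not in seen]
```

Any id in the returned list is a gap in the signal rail and blocks progression to Step 3.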

**Step 3: Run a minimum observation window per template.**

| Template | Minimum observation window | Minimum tool-call volume |
|---|---|---|
| `board_packet_preparation` | 3–5 full board-packet sessions | 30 tool calls per session |
| `support_ticket_triage` | 50–100 ticket sessions | 10 tool calls per session |
| `draft_and_review` | 10–20 full document sessions | 20 tool calls per session |
| Any new template | 2 full work cycles | 30 tool calls minimum |

Do not set an enforcement switch date before the minimum window has elapsed.

**Step 4: Review the shadow distribution before switching.**

Pull the shadow denial report:

```
GET /templates/{template_id}/shadow-report
```

Before switching any template to enforcement:

| Check | Target | Action if not met |
|---|---|---|
| `would_have_denied` rate | < 5% of tool calls | identify the top denial reasons; widen template scope or add missing catalog entries |
| Shadow suspension trigger count | 0 in past 7 days | investigate and resolve the anomaly signal before proceeding |
| Missing resource catalog entries | 0 `catalog.resource.not_found` signals | add missing entries to the catalog before enforcing |
| Denial reasons are policy gaps (not violations) | review top 10 | if any look like real violations, review them with the security ops owner before enforcing |
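
The four checks above can be encoded as a readiness gate that runs against the shadow report before any enforcement switch. The thresholds come from the table; the shadow-report field names are assumptions about its schema:

```python
def enforcement_ready(report: dict) -> list:
    """Evaluate a shadow report against the pre-enforcement checks.

    Returns the list of failed checks; an empty list means the template
    is ready to switch. Field names (`would_have_denied_rate`, etc.) are
    illustrative — map them to your actual shadow-report schema.
    """
    failures = []
    if report["would_have_denied_rate"] >= 0.05:
        failures.append("would_have_denied rate is 5% or higher")
    if report["shadow_suspensions_7d"] > 0:
        failures.append("unresolved shadow suspension triggers in past 7 days")
    if report["catalog_not_found_signals"] > 0:
        failures.append("missing resource catalog entries")
    if not report.get("top_denials_reviewed", False):
        failures.append("top denial reasons not reviewed with security ops")
    return failures
```

The last check is deliberately a manual flag: the table requires human review of the top 10 denial reasons, which no automated threshold replaces.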

**Step 5: Switch template-by-template, not globally.**

Switch one template to enforcement. Observe for 24–48 hours. If denial rate spikes or legitimate work is blocked, roll back:

```
POST /templates/{template_id}/enforcement-mode
{ "mode": "observation", "effective_at": "now" }
```

Do not switch a second template until the first template is stable. Template-by-template enforcement means a calibration error in one template does not stop the whole deployment.

**Step 6: Define a rollback owner.**

Before any enforcement switch, a named person must be identified as the rollback owner — the person with authority to run the rollback command without an approval chain during an incident. This person must have the API credentials to call the enforcement-mode endpoint.

---

#### Template Calibration Procedure

**Entry condition:** at least one full observation window completed per template (see Pre-Enforcement Deployment Checklist Step 3).

**What to measure:**

For each template, collect from the shadow report:

| Metric | What it means |
|---|---|
| `would_have_denied` rate | percentage of tool calls that would have been denied; high rate = template too narrow |
| Top 5 denial reasons | what tools or actions are most frequently blocked; shows calibration gaps |
| False-positive candidates | denials that look like legitimate work; review these manually |
| Missing catalog entries | tools that appear in real sessions but are absent from the catalog |
| Shadow suspension events | sessions that would have been suspended; review each one |

**Calibration decisions:**

| Symptom | Likely cause | Adjustment |
|---|---|---|
| `would_have_denied` rate > 10% | template scope too narrow | add the most-denied tool classes to the template's `allowed_tools`; re-run observation |
| `would_have_denied` rate > 20% | template does not match the actual work pattern | consider creating a new template for this pattern rather than widening the existing one |
| `would_have_denied` rate < 1% | template may be too broad | review the template's deny list; check that hard denies are present and correct |
| Shadow suspension fires consistently on same signal type | anomaly weight is miscalibrated for this work pattern | lower the weight for that signal type in the template's anomaly config; document the change |
| Top denial reason is a tool that should be allowed | catalog entry missing or tool mapped to wrong resource class | add or fix the catalog entry; re-run observation for 1 more work cycle |

**Pass criteria for enforcement readiness:**

A template is ready for enforcement when:

1. `would_have_denied` rate is below 5% for two consecutive full work cycles
2. All shadow suspension events have been reviewed and explained
3. No `catalog.resource.not_found` signals remain for this template
4. At least one false-positive review has been completed — human eyes on the top 5 denial events

**Document calibration changes:** every template adjustment made during calibration must be recorded in the template's revision history with a reason. Do not adjust and re-observe without recording what changed.

---

#### Resource Mapping Validation Procedure

**Entry condition:** catalog entries exist; at least one observation window has been run.

**For each catalog entry in the v1 template pack, complete the following walkthrough with the system owner:**

| Check | Question to ask | Pass | Fail — action |
|---|---|---|---|
| Resource ID accuracy | "Does this `resource_id` map to the right backend endpoint or object type?" | System owner confirms the mapping | Update `resource_id` and re-run observation |
| Action class fidelity | "Does the `action_class` (read, write, delete, publish) accurately describe what this tool does to data?" | System owner confirms | Correct the action class; check if approval mode or stage gate changes are needed |
| Trust domain correctness | "Is this resource actually internal, or does it have external dependencies?" | System owner confirms `trust_domain` value | Update; if `trust_domain` changes to `external`, template must be re-reviewed |
| Commit boundary accuracy | "When this tool is called, is the action reversible?" | If irreversible, `commit_boundary: true` is confirmed | Add `commit_boundary: true` and assign a downstream service owner in the commit-boundary register |

**Process:**

1. Export the catalog entries for the v1 template pack as a spreadsheet or structured doc.
2. Schedule a 30-minute review with each system owner responsible for tools in the catalog.
3. Walk through each entry using the four checks above.
4. Mark each entry: `validated_by`, `validated_at`, `validation_outcome` (pass / needs_correction).
5. For any entry with `needs_correction`: update the catalog, re-run a short observation window (10–20 sessions), and re-validate.

**Gate:** all catalog entries used by v1 templates must have `validation_outcome: pass` before enforcement goes live. An unvalidated catalog entry is a potential source of valid-looking false denials that point to wrong systems.

---

#### Revocation Measurement Runbook

**Entry condition:** Phase 4 complete; revocation and signal ingestion are live.

**Target SLAs (from [Revocation latency](#revocation-latency)):**

| Action class | Target |
|---|---|
| MAS state change to signal emission | < 2 seconds |
| Signal emission to enforcement-point cache update | < 30 seconds |
| High-risk write or gated action containment | ≤ 120 seconds from revocation |
| Commit-boundary action containment | immediate (live check) |

**Measurement procedure:**

1. **Instrument at four points:**
   - MAS: log `revocation_emitted_at` timestamp when `mission.revoked` signal is emitted
   - Signal ingestion: log `signal_ingested_at` when MAS accepts the signal
   - Enforcement points (host, MCP server): log `cache_updated_at` when a `constraints_hash` change is processed
   - Commit boundary: log `live_check_at` and `denied_at` for any revoked Mission hitting the commit boundary; the earliest `denied_at` after revocation is `first_denied_at` in the latency calculation

2. **Run a controlled revocation test:**
   - Create a test Mission with a known actor on a non-production host
   - Begin a session using the Mission
   - Revoke the Mission via the MAS API
   - Observe the timeline: `revocation_emitted_at` → `signal_ingested_at` → `cache_updated_at` → next denied tool call

3. **Measure at each stage:**

   ```
   signal_propagation_latency = signal_ingested_at - revocation_emitted_at
   cache_update_latency       = cache_updated_at   - signal_ingested_at
   enforcement_latency        = first_denied_at    - revocation_emitted_at
   ```

4. **Evaluate against targets:** if any measurement exceeds its target:

   | Measurement exceeds target | Likely cause | Action |
   |---|---|---|
   | MAS emission > 2 seconds | MAS processing queue backlog | scale MAS signal dispatch; check for lock contention |
   | Cache update > 30 seconds | cache TTL too long or signal not reaching enforcement point | reduce cache TTL; verify signal subscription is working |
   | High-risk write not contained within 120s | signal not reaching host or MCP server | check signal delivery; verify hook is calling cache invalidation on signal receipt |
   | Commit boundary not immediate | live check is not happening; commit boundary is using cached state | fix commit-boundary implementation to force live MAS check |

5. **Run under load:** repeat the controlled test with 10 concurrent active sessions to confirm propagation does not degrade under concurrency.

6. **Document baseline:** record the measured latencies as the deployment baseline. Update the baseline after any infrastructure change that could affect signal propagation.

---

#### Approval Burden 30-Day Review

**Entry condition:** 30 days of production Missions active (not observation-only sessions).

**What to pull:**

```
GET /analytics/approval-patterns?window=30d
```

**Review checklist:**

| Check | What to look for | Threshold for action |
|---|---|---|
| Bypass pattern: frequent amendments | user amends the same purpose class > 3x per week | widen the template scope; the scope is too narrow for real usage |
| Bypass pattern: bootstrap Mission duration | bootstrap Missions still active > 30 days | investigate why real Mission creation is not completing; fix the blocker |
| Bypass pattern: Mission after denial | same user creates a new Mission immediately after a denial | check whether denial reason is being used as an approval gate incorrectly |
| Bypass pattern: withdrawn-and-resubmitted approvals | approval requests withdrawn and resubmitted > 2 times | fix approval routing; add SLA escalation if the approver is unresponsive |
| Auto-approval rate | % of Missions that auto-approved | if below 80%, approval burden is too high for the task class; review template risk tiers |
| Async step-up SLA miss rate | % of async approvals that exceeded their SLA target | if > 10%, escalation routing is broken; fix before the next 30-day cycle |
| Step-up abandonment rate | user abandons session after step-up prompt appears | if > 20%, step-up prompts are too complex or the gate fires too early in the workflow |

**Tuning actions — document every change:**

Every adjustment to approval thresholds, step-up triggers, or template risk tiers must be recorded in the configuration change log with: what changed, why, who approved the change, before/after values. This creates an audit trail for approval burden changes.

If auto-approval rate is below 60%, escalate to the security operations owner — this indicates the default risk tiers may need recalibration, which is a security decision, not just UX tuning.

---

#### Commit-Boundary Idempotency Validation

**Entry condition:** commit-boundary implementation is in place; `commit_intent_id` flow is wired.

**What to validate:** a commit action submitted twice (network retry, user double-click, concurrent session race) must produce the same final state as a single submission, with no duplicated side effects.

**Test procedure:**

**Test 1: Basic replay prevention**

1. Submit a commit-boundary action with `commit_intent_id = "test_idempotency_001"`.
2. Observe the action completes successfully; record the response.
3. Submit the exact same request again with the same `commit_intent_id`.
4. **Expected:** the second request returns a success response with the same result as the first (idempotent), but no second side effect is produced downstream.
5. **Fail condition:** the second request either (a) produces a second side effect, or (b) returns an error that causes the caller to retry with a new `commit_intent_id`, bypassing idempotency.

**Test 2: Concurrent submission**

1. Submit the same commit action from two concurrent sessions with the same `commit_intent_id` within 100ms.
2. **Expected:** exactly one succeeds; the other receives a `commit_intent_id_conflict` response and does not execute a side effect.
3. **Fail condition:** both execute, or both fail.

**Test 3: MAS advisory lock interaction**

1. Acquire the MAS advisory lock for a Mission.
2. Submit a commit action from a second session for the same Mission.
3. **Expected:** the second session's commit action is queued or rejected with `advisory_lock_held` until the first session releases the lock.
4. **Fail condition:** the second session's commit proceeds without waiting for the lock.

**Test 4: MAS unavailability during commit**

1. Submit a commit-boundary action.
2. Make MAS unavailable (mock or network partition) before the `commit_intent_id` is recorded.
3. Retry the commit action after MAS recovers, using the same `commit_intent_id`.
4. **Expected:** the action completes exactly once on recovery; the downstream side effect occurs once.
5. **Fail condition:** the action either fails permanently (lost commit) or executes twice on recovery.

**Pass criteria:** all four tests pass in a pre-production environment before Phase 4 is signed off. Tests 1–3 are mandatory for the Phase 4 gate. Test 4 may be deferred to a chaos engineering exercise post-Phase 4, but must be completed before the deployment is declared production-grade.

**Document results:** record test run date, environment, `commit_intent_id` values used, and pass/fail per test in the deployment runbook. Idempotency is a correctness property — it must be re-tested after any change to the commit-boundary implementation.
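
The contract behind Tests 1 and 2 can be made concrete with a minimal sketch of the downstream side. This is an in-memory illustration only: a production store must be durable across restarts, or the MAS-outage scenario in Test 4 turns into a lost or duplicated commit.

```python
import threading

class CommitIntentStore:
    """Minimal in-memory idempotency store for commit-boundary actions.

    A replayed commit_intent_id returns the recorded first result instead
    of re-executing (Test 1); two concurrent submissions race for a single
    reservation, and the loser gets a conflict (Test 2). Illustrative
    only — durability for Test 4 requires a persistent store, not a dict.
    """
    def __init__(self):
        self._lock = threading.Lock()
        self._results = {}      # commit_intent_id -> recorded result
        self._in_flight = set()

    def execute(self, commit_intent_id: str, action):
        with self._lock:
            if commit_intent_id in self._results:    # Test 1: replay
                return self._results[commit_intent_id]
            if commit_intent_id in self._in_flight:  # Test 2: concurrent race
                raise RuntimeError("commit_intent_id_conflict")
            self._in_flight.add(commit_intent_id)
        try:
            result = action()    # the single downstream side effect
        finally:
            with self._lock:
                self._in_flight.discard(commit_intent_id)
        with self._lock:
            self._results[commit_intent_id] = result
        return result
```

Note the failure path: if `action()` raises, no result is recorded, so a retry with the same `commit_intent_id` is allowed to execute — a failed commit is retryable, a completed one is not.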

## Test Appendix

At this point the architecture should be buildable. The next requirement is that it be testable. A coding agent implementing this design should be able to write these tests before or alongside the services.

### Compiler tests

| Test | Input | Expected result |
|---|---|---|
| narrow purpose match | internal board-packet request with finance/doc tools only | `purpose_class = board_packet_preparation`, no clarification |
| ambiguous purpose | request could fit two templates with similar scores | `clarification_required` |
| unknown tool | proposal includes tool absent from catalog | compilation fails closed |
| hard deny in proposal | request includes `treasury.transfer` under low-risk template | `denied` or hard disqualifier path |
| stage gate extraction | request says "show me before sending" | `stage_constraints` includes send or publish gate |
| deterministic compile | same proposal + same catalog + same template version | same governance record and same `constraints_hash` |
| changed enforceable state | remove one allowed tool or add one gate | new `constraints_hash` |
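
The deterministic-compile and changed-state rows both hinge on how `constraints_hash` is computed. One way to satisfy them is to hash a canonical serialization of the enforceable state — sorted keys, order-normalized lists, no whitespace variance. A sketch, with an illustrative record shape:

```python
import hashlib
import json

def constraints_hash(governance_record: dict) -> str:
    """Deterministic hash over the enforceable state of a Mission.

    Canonical JSON (sorted keys, fixed separators) makes the digest
    reproducible across compiler runs; list-valued fields such as
    allowed_tools must be order-normalized before hashing. The record
    shape below is illustrative, not the spec's exact schema.
    """
    canonical = json.dumps(governance_record, sort_keys=True,
                           separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

record = {
    "allowed_tools": sorted(["erp.read_financials", "docs.write"]),
    "stage_constraints": [{"gate": "publish", "approver": "controller"}],
}
```

Removing one allowed tool or adding one gate changes the canonical form and therefore the digest, which is exactly what the last two compiler tests assert.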

### Approval tests

| Test | Input | Expected result |
|---|---|---|
| auto-approved low-risk Mission | enterprise-only read/draft Mission inside known template | status `active`, approval mode `auto` |
| human step-up for external publish | same Mission plus `docs.publish` | status `pending_approval`, review work item emitted |
| clarification blocks activation | proposal has unknown external recipients | status `pending_clarification`, no token issuance |
| no approver route | gated action but no matching approver type | status `denied` |
| approval expiry | valid approval object reaches `expires_at` | gated actions return to blocked state |
| approval hash mismatch | approval object references old `constraints_hash` | commit denied until fresh approval |

### Token issuance tests

| Test | Input | Expected result |
|---|---|---|
| active Mission token issuance | active Mission, allowed audience, valid subject token | AS issues audience-specific token projection |
| stale Mission hash at issuance | caller presents old `constraints_hash` | AS rejects or forces refresh |
| gated action token request | request attempts token for gated publish action without approval | AS denies issuance |
| denied domain | request asks for audience outside allowed domains | AS denies issuance |
| token projection narrowing | audience = finance MCP | token only carries finance tool/action projection |
| introspected token revocation | Mission revoked after opaque token issuance | introspection returns inactive |
| self-contained token freshness | Mission revoked after JWT issuance | MCP/API freshness check denies use |

### Capability snapshot tests

| Test | Input | Expected result |
|---|---|---|
| current Mission capability snapshot | active Mission and current hash | `planning_state = active` with current planning surface |
| stale host hash | active Mission but request carries old hash | `409` or stale response with fresh hash |
| pending approval Mission | Mission requires human approval | `planning_state = pending_approval` |
| revoked Mission | revoked Mission queried by host | `403` |
| gated tool present in snapshot | active Mission with publish gate | response includes `gated_tools` and current `refresh_after_seconds` |

### Host enforcement tests

| Test | Input | Expected result |
|---|---|---|
| allowed tool call | `PreToolUse` for allowed read tool | host returns `allow` |
| denied tool call | `PreToolUse` for tool outside Mission | host returns `deny`, emits signal |
| gated tool without approval | `PermissionRequest` for publish tool with no approval object | host returns `deny` and requests approval |
| gated tool with approval | approval object matches current hash and tool | host returns `allow` |
| stale Mission context | host local context hash differs from MAS | host refreshes or denies before planning further |
| prompt-induced overreach | model attempts tool not in capability snapshot after untrusted input | host denies call |

### MCP server tests

| Test | Input | Expected result |
|---|---|---|
| filtered `tools/list` | token only allows finance read tool | server lists only finance read tool |
| `tools/call` outside projection | token excludes requested tool | `403`, signal emitted |
| schema validation failure | malformed arguments for allowed tool | request denied before backend invocation |
| Cedar deny | allowed tool name but action/resource/context denied by Cedar | `403`, signal emitted |
| stale Mission at call time | token hash no longer current | `401` or `409`, no backend call |
| commit-boundary missing approval | publish tool invoked without valid approval object | deferred or denied |
| commit-boundary recheck fail | approval granted but runtime risk or hash changed | denied before side effect |

### Advanced profile tests

The simplified core profile does not require delegation or cross-domain behavior. Treat these as separate suites that stay disabled until those profiles are enabled.

#### Delegation and narrowing tests

| Test | Input | Expected result |
|---|---|---|
| valid child derivation | child requests subset of tools/actions/domains | sub-Mission issued, proof artifact stored |
| broader child tool set | child includes tool absent from parent | issuance fails |
| deeper child delegation | child requests depth greater than parent remaining depth | issuance fails |
| dropped parent gate | child can publish but omits controller gate | issuance fails |
| stale parent Mission | child derivation requested against revoked or stale parent | issuance fails |
| delegation artifact + `act` alignment | issued child token, delegation artifact, and Mission record | child token `act` chain matches signed delegation artifact and approved Mission lineage |
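
The narrowing rules these tests exercise reduce to subset checks plus a depth decrement. A sketch of the predicate the issuance path would evaluate, with an assumed record shape:

```python
def child_derivation_valid(parent: dict, child: dict) -> bool:
    """Check the narrowing rules the delegation tests exercise.

    A child Mission must be a strict narrowing of its parent: tools and
    domains contained in the parent's, remaining delegation depth strictly
    decreasing, and every parent gate preserved (no dropped controller
    gates). Record shape is illustrative, not the spec's exact schema.
    """
    return (
        set(child["tools"]) <= set(parent["tools"])
        and set(child["domains"]) <= set(parent["domains"])
        and child["delegation_depth"] < parent["delegation_depth"]
        and set(parent["gates"]) <= set(child["gates"])  # no dropped gates
    )
```

The staleness check (revoked or stale parent) and the `act`-chain alignment are separate: the first is a live MAS lookup, the second a comparison against the signed delegation artifact, and neither belongs in this pure predicate.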

### Lifecycle and revocation tests

| Test | Input | Expected result |
|---|---|---|
| Mission suspension | MAS sets Mission to `suspended` | host and MCP deny further execution except safe queries |
| Mission revocation | MAS revokes Mission after token issuance | host/MCP deny calls, caches invalidated, signals emitted |
| Mission amendment | allowed tool removed from Mission | new `constraints_hash`; old projection treated as stale |
| signal-driven risk escalation | repeated tool denials or anomaly signal | Mission moves to higher-risk state or step-up path |
| approval granted transition | approver approves pending gate | host refresh sees active gate satisfaction |
| approval expired transition | approval TTL passes before commit | commit denied and re-approval required |

### End-to-end acceptance test

This is the default **simplified core profile** acceptance path.

One full path should be automated:

1. user submits board-packet request
2. Mission shaper emits proposal
3. compiler selects `board_packet_preparation`
4. MAS auto-approves read/draft scope
5. AS issues finance and docs MCP tokens with narrow projections
6. host allows `erp.read_financials` and `docs.write`
7. host blocks `docs.publish` pending controller approval
8. controller approves with approval object bound to current `constraints_hash`
9. host refreshes the capability snapshot
10. MCP commit boundary revalidates approval and current Mission state
11. publish succeeds and emits committed signal

The test passes only if every control point uses the same Mission version and no step infers authority from prior tool success alone.

#### Advanced profile acceptance add-ons

Only enable these after the simplified core profile is stable.

##### Cross-domain tests

| Test | Input | Expected result |
|---|---|---|
| ID-JAG issuance for allowed partner domain | Mission allows partner domain, enterprise IdP AS is configured | ID-JAG issued for target domain audience |
| ID-JAG denied for unknown domain | Mission does not list the target domain | enterprise IdP AS denies issuance; host records domain-local denial |
| target-domain AS accepts ID-JAG | valid ID-JAG for a correctly configured AS | domain-B token issued for local scope |
| target-domain AS rejects ID-JAG | ID-JAG audience or issuer not recognized by domain-B AS | exchange fails; agent continues with remaining Mission scope |
| cross-domain publish requires step-up | Mission lists domain but partner write requires human approval | MAS blocks ID-JAG request until approval granted |
| internal Mission state not visible to target domain | token sent to domain-B MCP server | domain-B token carries only audience-specific projection; purpose text, approval evidence, full `act` chain absent |
| pairwise subject identifier | same user accessing two external partners | different directed subject identifiers used per target domain |

### Amendment tests

| Test | Input | Expected result |
|---|---|---|
| narrowing via amend | operator removes one tool from active Mission | new `constraints_hash` emitted; old token projection rejected for removed tool |
| broadening auto-approved | delta qualifies for auto-approval | new `constraints_hash` emitted immediately; both old and new scope are valid during a brief overlap window |
| broadening requires human approval | delta exceeds auto-approval threshold | Mission enters `pending_approval` for delta; prior scope remains active |
| clarification resolution | `POST /missions/{id}/clarify` with all questions answered | MAS re-runs classification; Mission activates if auto-approvable |
| amendment hash conflict | approver approves broadening against old hash | MAS rejects; requires new approval against current hash |
| narrowing invalidates child Mission | parent tool removed that child was using | child Mission flags as stale on next hash check |

