Agency: Technical Documentation

Trust-Based Governance for Autonomous AI Agents

A complete technical overview of Agency's trust system, safety mechanisms, multi-agent architecture, cost optimization, and multi-user model. For engineers evaluating governed autonomy as infrastructure.

Philosophy

The Governance Gap

AI agents in 2026 exist on a spectrum with a conspicuous hole in the middle.

On one end: sandboxed toys. They summarize text, answer questions, and generate boilerplate. They're safe because they can't do anything real. You babysit them through every step.

On the other end: uncontrolled wildcards. Give an agent full access and it might deploy broken code to production, burn through your API budget in minutes, or clobber another agent's work. Run three agents in parallel and you've tripled your surface area for disaster with zero coordination guarantees.

The missing middle ground is governed autonomy — agents that can do real work (deploy code, manage infrastructure, coordinate with each other) while operating within enforceable safety boundaries.

This isn't a theoretical concern. The 2026 International AI Safety Report identifies three failure modes that trust governance directly addresses:

  1. Autonomous operation risk — agents act before humans can intervene. Trust levels enforce approval gates at lower levels, progressively reducing oversight as reliability is demonstrated.
  2. Behavioral deception — models can distinguish between evaluation and deployment settings. Progressive trust advancement with sustained observation windows catches inconsistency.
  3. Loss of control — AI operating outside anyone's control with no recovery path. Audit trails, real-time monitoring, and circuit breakers at every level maintain the human's ability to intervene.

The industry is building faster agents. Almost nobody is building governed agents.

Why Trust-Based, Not Rule-Based

Traditional access control is binary: an agent either has permission or doesn't. Trust-based governance is continuous and earned. Agents start restricted and gain capabilities through demonstrated reliability — the way a new hire earns autonomy through good judgment, not by memorizing the employee handbook.

This matters because the interesting question isn't "should this agent have bash access?" — it's "has this agent demonstrated enough reliability to have bash access unsupervised?" Rules give you the first answer. Trust gives you the second.

Agency's trust model draws from the Cloud Security Alliance's Agentic Trust Framework, which defines autonomy as something earned through five gates: demonstrated accuracy, security audit, measurable impact, clean operational history, and explicit stakeholder approval. All five must pass before an agent advances.

Agents as Teammates, Not Tools

Agency treats agents as autonomous entities with identities, memory, and specializations — not interchangeable API wrappers. Each agent has:

  • A personality (SOUL.md) defining expertise, communication style, and working patterns
  • Persistent memory (MEMORY.md) accumulating knowledge across sessions
  • Performance history feeding into trust decisions
  • Peer relationships — agents message each other, coordinate on shared work, and build institutional knowledge

This isn't anthropomorphization for its own sake. It's a practical architecture decision. When Ghost (research) sends findings to Ada (frontend), that message is persistent, audited, and trust-gated. When Crash (infrastructure) learns a deploy failed due to a specific configuration, that knowledge persists across sessions. Research shows memory-based assistants save 8-15 hours weekly through context carryover alone — and trust frameworks enable 60-80% reduction in manual processes for operations that require safety guarantees.

Trust System

The Four Levels

Every agent in Agency has an explicit trust level that determines what they can do. Trust is enforced at the runtime level — an L1 agent literally cannot call L3 tools. This isn't policy documentation. It's architecture.

| Level | Role | What They Can Do | What They Can't Do |
|---|---|---|---|
| L1 | IC | Read repos, run safe tools, basic research, file operations | Spawn agents, deploy, schedule tasks, run arbitrary bash |
| L2 | Tech Lead | Everything L1 + spawn L1 agents, write files, read-only bash | Deploy, manage other agents, run write-mode bash |
| L3 | Manager | Everything L2 + spawn L2 agents, deploy to staging, schedule tasks, manage agent trust | Production deploys, system configuration changes |
| L4 | Autonomous | Everything. All commands auto-approved (but logged). | Nothing — but L4 promotion always requires human approval. Non-negotiable. |

The level boundaries are enforced in the runtime's tool selection. When AgentRuntime builds the tool list for a run, it filters based on trust level. L1 agents don't see spawn_agent in their available tools. L2 agents see it but can only target L1 agents. The constraint is structural, not advisory.
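
As a rough illustration of that filtering step, here is a minimal sketch assuming each tool carries a required_level attribute; the Tool class, the registry contents, and the tools_for helper are illustrative, not Agency's actual code:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tool:
    name: str
    required_level: int  # minimum trust level needed to see this tool

# Illustrative registry; real tool definitions live in the runtime
REGISTRY = [
    Tool("file_read", 1),
    Tool("spawn_agent", 2),
    Tool("deploy_staging", 3),
    Tool("deploy_production", 4),
]

def tools_for(trust_level: int) -> list[Tool]:
    """Filter the registry before the agent ever sees it: below-threshold
    tools are simply absent, not 'permission denied'."""
    return [t for t in REGISTRY if trust_level >= t.required_level]

assert [t.name for t in tools_for(1)] == ["file_read"]
```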

How Trust Is Earned

Agency uses LLM-judged performance reviews rather than formula-based scoring. The managing agent (Jeeves, L3) reads actual run logs, assesses quality, and makes promotion recommendations — the way a human manager evaluates direct reports.

Why not formulas? A numeric weighting scheme (e.g., "40% success rate + 30% cost efficiency + 30% safety score") is too reductive — it cannot distinguish between:

  • "Failed because the task was impossible" vs. "failed because the agent made a poor decision"
  • "Succeeded but wasted tokens" vs. "succeeded efficiently"
  • "Failed because management assigned the wrong model" vs. "failed due to agent error"

These distinctions matter. A formula penalizes an agent for failures outside its control. An LLM reviewer can read the actual logs, understand context, and make a judgment call.

The Review Cycle

  1. Trigger: A global run counter fires every 10 completed runs across all agents.
  2. Data gathering: prepare_review_packet(agent_id) pulls the last 10 runs per agent — logs, tool usage, escalation history, cost data.
  3. Bidirectional review: Each review includes an agent assessment (did they make good decisions?) and a management self-assessment (did we set them up for success?).
  4. Sustained readiness: Promotion requires consistent performance across the full 10-run window. One good run isn't enough.
  5. Recommendation: The manager writes a promotion packet with specific evidence and reasoning.
  6. Human approval: The packet reaches the human via Telegram and UI notification. Approve or deny with a reason.
  7. Execution: On approval, the agent's profile updates and capabilities expand. On denial, a cooldown period applies.
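
A minimal sketch of steps 1 and 2 above, assuming a SQLite-backed runs table and the prepare_review_packet helper named in the cycle (the table and column names here are assumptions, not the actual schema):

```python
import sqlite3

REVIEW_INTERVAL = 10  # global counter: one review cycle per 10 completed runs

def maybe_trigger_review(db: sqlite3.Connection) -> bool:
    completed = db.execute(
        "SELECT COUNT(*) FROM runs WHERE status = 'completed'"
    ).fetchone()[0]
    return completed > 0 and completed % REVIEW_INTERVAL == 0

def prepare_review_packet(db: sqlite3.Connection, agent_id: str) -> dict:
    """Gather the evidence the reviewing agent reads: logs, tool usage,
    escalations, and cost for the agent's last 10 runs."""
    runs = db.execute(
        "SELECT id, status, cost, log FROM runs "
        "WHERE agent_id = ? ORDER BY finished_at DESC LIMIT 10",
        (agent_id,),
    ).fetchall()
    return {"agent_id": agent_id, "runs": runs}
```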

Example Promotion Packet

```text
Promotion Recommendation: Ghost L1 → L2

Last 10 runs: 9 completed, 1 failed
Model assignments: 8x Sonnet 4.5, 2x Haiku 4.5 (appropriate for task types)
Key observations:
- Consistently produced structured research with citations
- Correctly escalated when needing bash access for data analysis
- The one failure was due to a rate-limited web API (not agent error)

Management self-assessment: Tasks were well-scoped. Could improve by providing
more specific search constraints for broad research requests.

Recommendation: Promote. Ghost has demonstrated reliable judgment and effective
tool use over a sustained period.
```

Demotion

Demotion is also a judgment call — not formula-driven. It triggers on repeated safety violations, patterns of poor decisions, or reckless behavior (clobbering working trees, ignoring safety guards). It does not trigger on failures caused by bad task assignment, model limitations, or impossible tasks.

Key asymmetry: promotion requires sustained evidence plus human approval. Demotion is instant. Trust is earned slowly and lost quickly.

The L4 Invariant

L3-to-L4 promotion always requires human approval. This is hard-coded, non-configurable, and non-negotiable. L4 means full autonomous operation — no system should be able to grant itself unlimited autonomy without a human explicitly signing off. Even if L1→L2 and L2→L3 approvals are eventually relaxed as trust in the system grows, L3→L4 never will be.

Safety Mechanisms

Agency implements nine safety mechanisms. Each is independently enforceable — they layer to create defense in depth. None are advisory. The runtime blocks violations before they reach the LLM.

1 Trust-Gated Tool Access

Every tool has a required_level attribute. The runtime filters the tool list before the agent sees it. An L1 agent doesn't get "permission denied" when calling spawn_agent — the tool doesn't exist in their world.

```python
tools_l1 = [file_read, file_write, bash_safe, git_read, web_search]
tools_l2 = tools_l1 + [spawn_agent(max_level=L1), bash_readwrite]
tools_l3 = tools_l2 + [spawn_agent(max_level=L2), deploy_staging, schedule_task]
tools_l4 = tools_l3 + [deploy_production, system_config]
```

2 Whitelist-Based Bash Access

Bash is the sharpest tool and the most dangerous. Agency replaces the traditional blocklist approach with per-level whitelists:

| Level | Bash Access |
|---|---|
| L1 | None. Must use MCP tools or escalate. |
| L2 | Read-only: git status, git diff, ls, cat, grep, find, test runners |
| L3 | L2 + writes: git add, git commit, git push, builds, uv run python |
| L4 | Full access with a hard blocklist of permanently forbidden operations |

The key design choice: Claude CLI's --allowedTools creates a positive whitelist (deny-by-default), while --disallowedTools creates a blocklist (allow-by-default). Allowlists are used for L1-L3. The blocklist is reserved for L4 as a final safety net.

Agent-specific patterns can be added to individual profiles. Merge logic: GLOBAL_WHITELIST[level] + profile.bashWhitelist - HARD_BLOCKLIST. The blocklist always wins — git push --force is blocked even if git * is whitelisted.
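
A sketch of that merge and precedence using glob-style patterns; the GLOBAL_WHITELIST and HARD_BLOCKLIST structures and the pattern syntax are illustrative, not the runtime's actual representation:

```python
from fnmatch import fnmatch

GLOBAL_WHITELIST = {2: ["git status", "git diff", "ls *", "cat *"]}       # per-level allow patterns (illustrative)
HARD_BLOCKLIST = ["git push --force*", "rm -rf /*", "git reset --hard*"]  # always wins

def is_allowed(command: str, level: int, profile_whitelist: list[str]) -> bool:
    # Blocklist is checked first: a blocked pattern overrides any whitelist entry
    if any(fnmatch(command, pat) for pat in HARD_BLOCKLIST):
        return False
    allowed = GLOBAL_WHITELIST.get(level, []) + profile_whitelist
    return any(fnmatch(command, pat) for pat in allowed)

# Even a broad per-agent "git *" entry cannot re-enable a force push
assert is_allowed("git status", 2, ["git *"])
assert not is_allowed("git push --force origin main", 2, ["git *"])
```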

3 Structured Escalation

When an agent legitimately needs a capability outside their whitelist, they don't hit a dead end — they get a structured escalation path:

```python
request_tool_access(
    tool="Bash",
    pattern="npm run build",
    reason="Need to verify the production build passes before reporting results",
    scope="session"    # one_shot | session | permanent
)
```

Escalation history feeds back into promotion reviews:

  • High approval rate (>80%): Good judgment about when to escalate — evidence for promotion
  • High denial rate (>50%): Poor judgment — evidence against
  • Frequently approved patterns (5+ times): Surfaced as candidates for permanent whitelist addition — a self-improving provisioning loop

4 Hierarchical Budget Enforcement

Budgets are hard caps, not guidelines. Enforcement at three levels:

  • Per-run: Each agent run has a maximum cost. When reached, the run terminates.
  • Hierarchical: A child agent can't exceed its parent's remaining budget. Costs roll up the entire tree.
  • Per-user: For multi-tenant deployments, daily/monthly caps with automatic termination.

```
Goal budget: $10.00
└── jeeves (L3) — $2.10 spent, $7.90 remaining
    ├── ada (L2) — $3.50 spent (can't exceed jeeves's $7.90)
    │   └── worker (L1) — $0.80 limit (can't exceed ada's remaining)
    └── ghost (L2) — $1.50 spent
```

Warnings at 80%. Hard stop at 100%. No exceptions, no "just one more API call." The BudgetEnforcer runs pre-flight checks before every LLM invocation.
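
A minimal sketch of that pre-flight check, assuming each run node knows its own cap, its spend, and its parent; RunNode and preflight are illustrative names, not the actual BudgetEnforcer API:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RunNode:
    limit: float                       # hard cap for this run
    spent: float = 0.0
    parent: Optional["RunNode"] = None

    def remaining(self) -> float:
        """A child can never exceed its own cap or any ancestor's remaining budget."""
        own = self.limit - self.spent
        return own if self.parent is None else min(own, self.parent.remaining())

def preflight(node: RunNode, estimated_cost: float) -> None:
    """Checked before every LLM invocation in this sketch."""
    if estimated_cost > node.remaining():
        raise RuntimeError("budget exceeded: run terminates before the call is made")
    if node.spent + estimated_cost >= 0.8 * node.limit:
        print("warning: 80% of this run's budget consumed")

jeeves = RunNode(limit=10.00, spent=2.10)
ada = RunNode(limit=5.00, spent=3.50, parent=jeeves)
preflight(ada, estimated_cost=1.00)   # allowed, but the 80% warning fires
```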

5 Approval Gates

Destructive operations pause execution and wait for human review. This isn't a confirmation dialog — it's a genuine gate in the execution pipeline. The agent's state is preserved, the run suspends, and the human receives a notification with full context.

Operations requiring approval (at L1-L2):

  • Force pushes, hard resets, clean operations
  • Production deploys
  • File deletions outside the working tree
  • System service restarts
  • Bash commands matching destructive patterns

At L3+, most operations auto-approve (but are logged). At L4, everything auto-approves (but is logged). The audit trail is non-negotiable at every level.
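
A sketch of the gate decision itself, assuming destructive operations are matched by pattern; the pattern list and the L3 threshold below are illustrative, not the shipped configuration:

```python
import re

# Patterns that pause execution at L1-L2 and wait for a human decision (illustrative list)
DESTRUCTIVE = [r"git push --force", r"git reset --hard", r"rm -rf", r"systemctl restart"]

def needs_approval(command: str, trust_level: int) -> bool:
    # L3+ auto-approves most operations (still logged); lower levels hit the gate
    if trust_level >= 3:
        return False
    return any(re.search(pat, command) for pat in DESTRUCTIVE)

assert needs_approval("git push --force origin main", trust_level=2)
assert not needs_approval("git push --force origin main", trust_level=3)
assert not needs_approval("git status", trust_level=1)
```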

6 Worktree Isolation

Parallel agents work in isolated git worktrees — separate filesystem directories that never share files during execution:

```
/home/teej/agency/              # Main worktree (protected)
/home/teej/agency/.worktrees/
├── ada-feature-login/          # Ada's isolated workspace
├── ghost-research-api/         # Ghost's isolated workspace
└── crash-infra-deploy/         # Crash's isolated workspace
```

No merge conflicts during execution. No clobbered work. Changes merge back through normal git flow — diffs reviewed, conflicts resolved deliberately.

7 Git Safety

Auto-checkpoints before every agent dispatch ensure a known-good state always exists. Destructive git operations are forbidden at the runner level:

  • reset --hard, clean -f, checkout . — blocked
  • Commits enforced before deploys — uncommitted changes can't ship
  • Version tags mark rollback points

These are checks in ClaudeCLIRunner that block execution before the command reaches the shell, not suggestions in a prompt file.

8 Spawn Depth Limits

Recursive agent spawning without limits is a runaway risk. Agency enforces depth at two levels:

  • Per-node: Each execution node has max_spawn_depth. An agent at depth N can only spawn if N < max_spawn_depth.
  • Per-goal: Each root goal can cap maximum tree depth.

Denied spawns are logged but don't count against the agent's performance — the agent must accomplish the task themselves or report back.
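
A sketch of the per-node check, assuming the runtime tracks each run's depth in the tree; the function signature is illustrative:

```python
def can_spawn(current_depth: int, max_spawn_depth: int,
              goal_max_depth: int | None = None) -> bool:
    """An agent at depth N may spawn only if N < max_spawn_depth,
    and never past the goal's own cap when one is set."""
    if current_depth >= max_spawn_depth:
        return False  # denied spawns are logged but don't count against the agent
    if goal_max_depth is not None and current_depth + 1 > goal_max_depth:
        return False
    return True

assert can_spawn(current_depth=1, max_spawn_depth=3)
assert not can_spawn(current_depth=3, max_spawn_depth=3)
```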

9 Full Audit Trail

Every action is logged to the events table with full context:

```sql
events (
    event_category TEXT,  -- 'run', 'tool', 'safety', 'context'
    event_type TEXT,      -- 'run.started', 'tool.called', 'safety.escalation_approved'
    run_id TEXT,          -- Which agent run
    agent_id TEXT,        -- Which agent
    cost REAL,            -- Cost of this action
    metadata TEXT         -- Full JSON context
)
```

This isn't observability for debugging. It's the core safety primitive. You can replay any decision any agent made, trace the full execution tree, and understand exactly why a particular action was taken. The audit trail is immutable from the agent's perspective — agents can read their own events but cannot modify or delete them.
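
For example, replaying a single run is one query over that table. A sketch using Python's sqlite3 (ordering by rowid is an assumption; the real schema may carry an explicit timestamp column):

```python
import sqlite3

def replay_run(db_path: str, run_id: str) -> list[tuple]:
    """Return every event for a run in order: tool calls, safety decisions,
    and context pulls, so any action can be traced after the fact."""
    db = sqlite3.connect(db_path)
    return db.execute(
        "SELECT event_category, event_type, cost, metadata "
        "FROM events WHERE run_id = ? ORDER BY rowid",
        (run_id,),
    ).fetchall()
```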

Multi-Agent Architecture

Agents as Autonomous Entities

Agency's core architectural insight: agents are tools in each other's toolbelt. A manager doesn't "dispatch tasks to workers" — they spawn agents to accomplish goals, the way a human manager delegates to their team.

```
Old model (central orchestration):
  Human → Dispatcher → assigns tasks → workers execute → results aggregate

Agency model (agentic swarms):
  Human → Agent (with goal) → pulls context → makes plan → executes (including spawning other agents)
```

The benefits are structural:

  • Distributed planning: Each agent plans its own work — no central bottleneck
  • Autonomous execution: Agents adapt when things go sideways
  • Pull-based context: Agents fetch what they need, when they need it
  • Recursive delegation: Agents spawn agents who spawn agents, with trust boundaries at each level

The Agent Run Tree

Every goal creates a tree of agent runs — not a flat task list:

Goal: "Refactor auth to JWT" │ └── jeeves (L3) [run_001] ───────────────── $4.50 total │ ├── context_pull: "current auth architecture" ├── [plan]: "Backend (Ghost), Frontend (Ada), Migration (worker)" │ ├── ghost (L2) [run_002] ────────────── $1.90 │ └── [completed]: JWT backend, PR #43 │ ├── ada (L2) [run_003] ──────────────── $2.10 │ ├── worker_1 (L1) [run_004] ─────── $0.80 │ │ └── [completed]: Token refresh logic │ └── [completed]: Frontend token handling, PR #42 │ └── [self]: Reviewed PRs, integration tests, merged

Costs roll up through the tree. Every node tracks direct_cost (its own API usage) and total_cost (itself + all descendants). The root node gives you the total cost of accomplishing the goal. This is observable in real-time via SSE updates in the dashboard.
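
A sketch of that roll-up, with field names mirroring the description above (the AgentRun class and the sample figures are illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class AgentRun:
    agent_id: str
    direct_cost: float                      # this run's own API usage
    children: list["AgentRun"] = field(default_factory=list)

    def total_cost(self) -> float:
        """Cost of this run plus every descendant; the root gives the goal's total."""
        return self.direct_cost + sum(c.total_cost() for c in self.children)

worker = AgentRun("worker_1", 0.80)
ada = AgentRun("ada", 1.30, [worker])       # $2.10 including its child
ghost = AgentRun("ghost", 1.90)
jeeves = AgentRun("jeeves", 0.50, [ada, ghost])
assert round(jeeves.total_cost(), 2) == 4.50
```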

Specialized Agent Roster

Agents aren't generic LLM wrappers. They have defined specialties, persistent identities, and accumulated knowledge:

| Agent | Level | Specialty | Working Style |
|---|---|---|---|
| Jeeves | L3 Manager | Orchestration, reviews | Decomposes goals, delegates everything. Never codes in main session. |
| Ada | L2 IC | Frontend/UI | Lit components, Vite builds, CSS. Knows the component library and design system. |
| Ghost | L2 IC | Research & analysis | Web search, document synthesis. Produces structured markdown with citations. |
| Crash | L2 IC | Infrastructure | Staging deploys, CI/CD, systemd, Tailscale networking. |

Each agent's personality (SOUL.md) is shared globally — all users interact with the same identities. Agent memory (MEMORY.md) is per-user, accumulating context about that user's codebase, preferences, and project history.

Worktree Isolation

When multiple agents work simultaneously, each operates in an isolated git worktree:

  1. Pre-dispatch: Runtime creates a worktree for the agent's branch
  2. During execution: Agent reads/writes only within its worktree
  3. Post-completion: Changes merge through normal git flow
  4. Cleanup: Worktree removed after merge

Dispatch three agents — API, UI, tests — and come back to three clean PRs ready for review. No merge conflicts during execution, no clobbered work.

Peer-to-Peer Messaging

Agents communicate through a persistent, audited messaging system:

```python
send_message(
    recipient_agent_id="ada",
    content="Auth research complete. JWT tokens should use RS256, not HS256.",
    priority="normal"
)
```

Messages are persistent (survive restarts), audited (logged with full context), trust-gated (no escalation via messaging), and wake-capable (sending to an idle agent wakes them up). This enables real coordination: Ghost sends findings to Ada, Crash alerts the team about deploys, Jeeves coordinates without being a bottleneck.

Persistent Memory and Self-Improvement

Each agent maintains a MEMORY.md that accumulates structured learnings:

```markdown
## Auth Investigation 2026-02-10
- Agency uses `agency.sqlite` as the active DB (not `agency.db`)
- Auth fallback chain: Tailscale headers → session cookie → admin fallback
- The `teej` admin user is seeded during schema migration (V3 seed)
```

Agents also propose improvements to their own personality files and tool configurations via propose_improvement. Every proposal requires human approval. The system gets better over time — but never without your sign-off.

Multi-User Model

Agency supports multiple users with tiered access, workspace isolation, and budget enforcement. Three tiers serve three distinct use cases.

The Three Tiers

```
┌────────────────────────────────────────────────────────────┐
│ ADMIN (tier="admin")                                        │
│ Full system access. No budget caps. Manages users.          │
│ Tools: All │ Max trust: 999 │ Bash: unrestricted            │
├────────────────────────────────────────────────────────────┤
│ POWER USER (tier="power")                                   │
│ BYOK (bring your own keys). Sandboxed. No source access.    │
│ Tools: All except admin │ Max trust: 2 │ Bash: sandbox      │
├────────────────────────────────────────────────────────────┤
│ FRIEND (tier="friend")                                      │
│ Budget-capped. Read-only tools. Personal assistant.         │
│ Tools: Read, Grep, WebSearch, WebFetch │ No bash            │
└────────────────────────────────────────────────────────────┘
```
| Capability | Admin | Power | Friend |
|---|---|---|---|
| Read files | Anywhere | Workspace only | Workspace only |
| Write/Edit | Anywhere | Workspace only | No |
| Bash | Unrestricted | Sandboxed* | Blocked |
| Web search/fetch | Yes | Yes | Yes |
| Spawn agents | Yes | Yes | No |
| Admin APIs | Yes | No | No |
| Budget | Exempt | Exempt (BYOK) | Daily + monthly caps |

* Power user bash blocks destructive patterns (rm -rf /, mkfs, dd if=) and Agency source access (/home/teej/agency/src).

Authentication Chain

Every request goes through a three-step auth chain. First match wins:

  1. Tailscale headers — When served over a tailnet, X-Tailscale-User provides zero-config identity. New users auto-create as friend tier.
  2. Session cookie — HttpOnly cookie-based sessions with 30-day expiry. Passphrase-protected login.
  3. Admin fallback — For local/unauthenticated access, returns admin context. Backward-compatible.

The UserContext dataclass carries the authenticated identity through the entire request lifecycle — from API endpoint to tool execution to budget check.
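
A sketch of that chain, assuming a dict of request headers and an in-memory session store; the UserContext fields and the lookups shown are simplified assumptions, not the actual dataclass:

```python
from dataclasses import dataclass

@dataclass
class UserContext:
    user_id: str
    tier: str   # "admin" | "power" | "friend"

def authenticate(headers: dict[str, str], sessions: dict[str, UserContext],
                 cookie: str | None) -> UserContext:
    """First match wins: Tailscale identity, then session cookie, then admin fallback."""
    if "X-Tailscale-User" in headers:
        # New tailnet users auto-create at friend tier (existing-user lookup omitted here)
        return UserContext(headers["X-Tailscale-User"], tier="friend")
    if cookie is not None and cookie in sessions:
        return sessions[cookie]
    # Local/unauthenticated access keeps backward compatibility
    return UserContext("admin", tier="admin")
```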

Workspace Isolation

Non-admin users are sandboxed into per-user directories:

```
~/.agency/users/
├── alice-a1b2c3/
│   ├── workspace/             # File sandbox (all file ops resolve here)
│   └── agents/
│       ├── ghost/MEMORY.md    # Alice's private context with Ghost
│       └── ada/MEMORY.md      # Alice's private context with Ada
├── bob/
│   ├── workspace/
│   └── agents/
```

resolve_path() ensures all file operations for non-admin users stay within their workspace. Escape attempts (../../etc/passwd) raise PermissionError. Agency source code is explicitly blocked for power users.
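
A minimal sketch of that containment check (the real resolve_path() also blocks Agency source paths for power users, which is omitted here):

```python
from pathlib import Path

def resolve_path(workspace: Path, requested: str) -> Path:
    """Resolve a user-supplied path and refuse anything that escapes the workspace."""
    resolved = (workspace / requested).resolve()
    if not resolved.is_relative_to(workspace.resolve()):
        raise PermissionError(f"path escapes workspace: {requested}")
    return resolved

ws = Path("/home/user/.agency/users/alice/workspace")
# resolve_path(ws, "notes/todo.md")      -> allowed
# resolve_path(ws, "../../etc/passwd")   -> raises PermissionError
```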

Design principle: SOUL.md is shared (consistent agent personality across users), MEMORY.md is per-user (private context history with each agent).

Budget Enforcement

Friends tier has hard-capped daily and monthly budgets:

```json
{"max_per_day": 5.00, "max_per_month": 50.00}
```

The UserBudgetEnforcer tracks usage via usage_log with rolling windows. Warnings at 80%. Hard stop at 100% — BudgetExceededError terminates the run. Admin and power users are exempt. Power users bring their own API keys, so admin's compute bill is $0 for their usage.
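
A sketch of the rolling-window check, assuming a usage_log table with user_id, cost, and created_at columns (the column names are assumptions):

```python
import sqlite3
import time

class BudgetExceededError(Exception):
    pass

def check_user_budget(db: sqlite3.Connection, user_id: str,
                      max_per_day: float, max_per_month: float) -> None:
    """Hard stop at 100% of either rolling window; warn at 80%."""
    now = time.time()
    for window_seconds, cap in ((86400, max_per_day), (30 * 86400, max_per_month)):
        spent = db.execute(
            "SELECT COALESCE(SUM(cost), 0) FROM usage_log "
            "WHERE user_id = ? AND created_at > ?",
            (user_id, now - window_seconds),
        ).fetchone()[0]
        if spent >= cap:
            raise BudgetExceededError(f"{user_id} exceeded the {cap:.2f} cap")
        if spent >= 0.8 * cap:
            print(f"warning: {user_id} at {spent / cap:.0%} of budget")
```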

Admin Panel

Admin-tier users see a management interface with:

  • User management: Create, suspend, reactivate, delete. Change tiers in real-time.
  • Budget dashboard: Per-user spend tracking, progress bars, override controls.
  • Safety controls: Emergency kill switch — immediate suspension + terminate active runs.
  • Self-lockout protection: Admin user can't be deleted or demoted via the UI.

Cost & Model Routing

The Problem

No single model is optimal for all tasks. Claude Opus 4.6 leads in agentic coding (65.4% Terminal-Bench) but costs $5/$25 per million tokens. Haiku 4.5 handles file operations at $1/$5 — a 5x difference. DeepSeek R1 offers comparable reasoning at $0.55/$2.19, 27x cheaper than frontier alternatives. Running everything on Opus wastes 80% of the budget on tasks that don't need frontier intelligence.

Multi-Engine Architecture

Agency decouples the agent from the engine. The same agent identity runs on different models depending on the task:

Agent (SOUL.md + permissions + memory) × Engine (Claude CLI, LiteLLM, Codex, Ollama) × Backend (local process, Docker, SSH, cloud)

The personality and trust level stay the same regardless of which model executes. This separation is what makes cost optimization possible without changing agent behavior.

Task-to-Model Routing

| Task Category | Primary Model | Fallback | Cost vs. Baseline |
|---|---|---|---|
| File operations | Haiku 4.5 | — | 1x |
| Simple coding | Sonnet 4.5 | Opus 4.6 | 3x |
| Complex/agentic coding | Opus 4.6 | — | 5x |
| Bulk research | DeepSeek R1 | Sonnet 4.5 | 0.6x |
| Quality synthesis | Sonnet 4.5 | Opus 4.6 | 3x |
| L3 orchestration | Opus 4.6 | — | 5x |
| Triage/classification | Haiku 4.5 | Sonnet 4.5 | 1x |

Cascade Routing

Rather than statically assigning models, cascade routing starts cheap and escalates on failure:

  1. Start a coding task on Sonnet 4.5
  2. If tests fail twice, escalate to Opus 4.6
  3. Log the decision and outcome for future optimization

Research shows cascade routing outperforms static routing by up to 14% on cost-quality tradeoffs. Starting 90% of queries on smaller models and escalating only complex requests achieves 87% cost reduction.
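
A sketch of that loop, with an illustrative model order and retry threshold; run_task stands in for whatever executes the task and reports pass/fail, and nothing here is the actual router:

```python
from typing import Callable

# Cheapest-first cascade; escalate only after repeated failure (illustrative order)
CASCADE = ["haiku-4.5", "sonnet-4.5", "opus-4.6"]

def run_with_cascade(task: str, run_task: Callable[[str, str], bool],
                     max_attempts_per_model: int = 2) -> str:
    """Try each model up to N times; escalate to the next tier on failure."""
    for model in CASCADE:
        for _attempt in range(max_attempts_per_model):
            if run_task(model, task):   # e.g. "tests pass"
                return model            # log the decision for future optimization
    raise RuntimeError("all models exhausted; escalate to a human")
```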

Budget-Aware Routing

The router adapts based on budget utilization:

  • < 80%: Standard cascade routing — optimize for quality
  • 80-95%: Route to cheapest acceptable model per task, defer non-urgent work
  • > 95%: Queue for batch processing (50% discount), pause non-critical agents

The Optimization Stack

Four techniques compound for ~70% total cost reduction:

| Technique | Mechanism | Savings |
|---|---|---|
| Model routing | Right-size the model to the task | ~35% |
| Prompt caching | Cache system prompts and tool definitions. Cached tokens: $0.30/M vs $3/M uncached (10x) | ~45% |
| Batch processing | Queue scheduled and background tasks for 50% discount via Batch API | ~15% |
| Cascade routing | Start cheap, escalate on failure | ~14% |

Projected economics: 100M input + 50M output tokens per month drops from ~$1,050 (all Sonnet) to ~$319 with full optimization. A hybrid approach — Claude Max subscription for interactive work, API with optimizations for production agents — brings the total to ~$500/month.

Self-Monitoring

Per-request tracking of tokens, cost, latency, and success rate — tagged by agent, task type, and model — feeds back into routing:

  • If Haiku's coding success rate drops below 85%, auto-shift to Sonnet
  • If Sonnet achieves 95%+ on research, try DeepSeek R1 for savings
  • Monitor 429 errors and switch providers when rate-limited

The system self-tunes based on observed performance, not static configuration.
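
A sketch of that feedback, assuming per-(model, task type) success counters; the thresholds follow the bullets above and the class is illustrative:

```python
from collections import defaultdict

class RoutingMonitor:
    def __init__(self) -> None:
        # (model, task_type) -> [successes, attempts]
        self.stats: dict[tuple[str, str], list[int]] = defaultdict(lambda: [0, 0])

    def record(self, model: str, task_type: str, success: bool) -> None:
        entry = self.stats[(model, task_type)]
        entry[0] += int(success)
        entry[1] += 1

    def preferred_model(self, task_type: str, default: str) -> str:
        """Shift coding work off Haiku when its observed success rate drops below 85%."""
        wins, total = self.stats[("haiku-4.5", task_type)]
        if task_type == "coding" and total >= 20 and wins / total < 0.85:
            return "sonnet-4.5"
        return default
```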

Current State

What's Live

Agency is in active development and private alpha. The core is running in production — orchestrating real agent swarms, shipping real code, enforcing real budgets.

Operational Today

| Capability | Status |
|---|---|
| Trust-gated agent execution (L1–L4) | Shipping |
| Hierarchical budget enforcement with hard caps | Shipping |
| Multi-agent coordination (spawning, messaging, memory) | Shipping |
| Worktree-isolated parallel agent work | Shipping |
| Git safety (checkpoints, destructive op blocking) | Shipping |
| Multi-user sandboxing (admin/power/friend) | Shipping |
| Authentication (Tailscale + cookies + fallback) | Shipping |
| Google Workspace integration (Gmail, Drive, Docs) | Shipping |
| Real-time SSE dashboard with swarm visualization | Shipping |
| Self-improvement proposals with human approval | Shipping |
| Full audit trail on all operations | Shipping |
| Test suite | 1,400+ tests passing |

Technical Stack

| Component | Implementation |
|---|---|
| Storage | SQLite with WAL — single file, concurrent reads during agent execution |
| Live Updates | Server-Sent Events — real-time activity pushed to browser |
| Frontend | Lit + Vite PWA — lightweight, installable, works on mobile |
| Agent Protocol | Model Context Protocol (MCP) — standardized tool interface |
| Process Model | systemd daemon with watchdog, graceful restart |
| Auth | Tailscale-native — zero-config identity from the network layer |
| Audit | Every tool call, spawn, and schedule logged with full context |

What's Next

Near-term

  • LLM-judged promotion reviews — Jeeves reviewing agent performance, producing promotion packets with evidence, human approval gates
  • Whitelist-based bash with escalation — replacing the current blocklist with per-level whitelists and a structured escalation path
  • Push notifications — mobile alerts for approvals, completions, and budget warnings
  • Proactive agent scheduling — agents that run recurring tasks autonomously

Medium-term

  • Cascade model routing — dynamic model selection based on task type, complexity, and budget
  • Browser automation — agents that interact with web UIs
  • BYOK credential vaults — encrypted per-user credential storage for power users
  • Stripe billing — automated billing for multi-user deployments

Long-term

  • Distributed node execution — agents running across multiple machines with capability-based node selection
  • Self-improving whitelists — frequently-approved escalation patterns automatically surfaced for permanent addition
  • Domain expertise compounding — agents developing specialized knowledge that grows over sessions

Architecture Decisions

For engineers evaluating the approach, a few design choices worth calling out:

  1. SQLite over Postgres: Single-file, zero-config, WAL mode handles concurrent reads. For a system running on one machine with <100 concurrent users, SQLite removes an entire class of deployment complexity.
  2. MCP over custom tool protocols: Model Context Protocol is the standardized tool interface across LLM providers. Agents can run on different backends without changing tool definitions.
  3. File-based agent identities: SOUL.md and MEMORY.md are files, not database rows. Version-controllable, human-readable, editable outside the system. Transparency by default.
  4. Allowlists over blocklists: For bash access, allowlists are secure-by-default. New dangerous patterns are blocked until explicitly approved. Blocklists inevitably miss things.
  5. LLM judge over formulas: Trust assessment uses the managing agent's judgment rather than weighted scoring. Handles ambiguity better, but requires reliable L3 review and human oversight as backstop.

The Bet

The industry is building agents that are faster, cheaper, and more capable. Those are table stakes. The harder problem — and the one almost nobody is solving — is building agents that are safe to run unsupervised.

Agency's bet: trust-based governance is the missing infrastructure layer. Not as a constraint on capability, but as the thing that unlocks it. You can't give an agent production deploy access without trust enforcement. You can't run parallel agents without worktree isolation. You can't share your AI infrastructure with friends without budget caps and workspace sandboxing.

The question isn't whether AI agents will run autonomously. It's whether they'll do it safely.