Agency

The Problem

AI agents exist in two modes: sandbox toys or uncontrolled wildcards.

🔒 Sandboxed Agents

Safe but useless for real work. They can summarize text and answer questions, but they can't deploy code, manage infrastructure, or operate autonomously. You babysit them through every step.

⚡ Autonomous Agents

Powerful but terrifying. Give an agent access to your codebase and it might deploy broken code to production. Give it API keys and it might burn through your budget in minutes. Run multiple agents and they'll overwrite each other's work.

The industry is building faster agents. Almost nobody is building governed agents.

How Agency Works

Agency doesn't make agents smarter.
It makes them safe to run unsupervised.

Level	Capabilities	Comparable to
L1	Read repos, run safe tools, basic research	New agent, first day
L2	Write files, spawn sub-agents, web access	Proven individual contributor
L3	Deploy to staging, manage agents, schedule tasks	Team lead
L4	Production deploys, system configuration	Battle-tested, full autonomy

Promotion is based on a composite trust score, weighted 40% success rate, 30% cost efficiency, and 30% safety record, and calculated over a rolling 30-day window. The runtime enforces these levels: an L1 agent literally cannot call L3 tools. This isn't policy. It's architecture.
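The weighting above is simple enough to sketch. The following is a minimal Python illustration, not Agency's actual implementation: the 40/30/30 weights and the 30-day window come from the text, while `RunRecord`, the normalized `cost_efficiency` field, and treating the safety record as the share of runs without an incident are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Weights and window from the text; everything else here is hypothetical.
WEIGHTS = {"success": 0.40, "cost": 0.30, "safety": 0.30}
WINDOW = timedelta(days=30)


@dataclass(frozen=True)
class RunRecord:
    finished_at: datetime
    succeeded: bool
    cost_efficiency: float  # assumed normalized: 0.0 (worst) to 1.0 (best)
    safety_incident: bool


def trust_score(runs: list[RunRecord], now: datetime) -> float:
    """Composite trust score in [0, 1] over the rolling 30-day window."""
    recent = [r for r in runs if timedelta(0) <= now - r.finished_at <= WINDOW]
    if not recent:
        return 0.0  # no recent history means no earned trust
    n = len(recent)
    success_rate = sum(r.succeeded for r in recent) / n
    cost = sum(r.cost_efficiency for r in recent) / n
    safety = 1.0 - sum(r.safety_incident for r in recent) / n
    return (
        WEIGHTS["success"] * success_rate
        + WEIGHTS["cost"] * cost
        + WEIGHTS["safety"] * safety
    )
```

How the score maps to promotion thresholds is not stated, so the sketch stops at the score itself. Runs older than 30 days simply drop out, which is what makes the window "rolling".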

Nine Safety Mechanisms

🛡️ Trust Levels

Every agent starts restricted. Trust is earned through performance, not granted by default. L1→L4 progression enforced at the runtime level.
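One way to make "an L1 agent cannot call L3 tools" a property of the runtime rather than an instruction in a prompt is to put the check inside the tool dispatcher itself. A minimal sketch under that assumption; `ToolRegistry`, `TrustViolation`, and the tool names used in the example are hypothetical.

```python
from enum import IntEnum
from typing import Callable


class TrustLevel(IntEnum):
    L1 = 1
    L2 = 2
    L3 = 3
    L4 = 4


class TrustViolation(PermissionError):
    """Raised when an agent calls a tool above its trust level."""


class ToolRegistry:
    """Hypothetical dispatcher: every tool call goes through `call`."""

    def __init__(self) -> None:
        self._tools: dict[str, tuple[TrustLevel, Callable[..., object]]] = {}

    def register(self, name: str, required: TrustLevel, fn: Callable[..., object]) -> None:
        self._tools[name] = (required, fn)

    def visible_to(self, level: TrustLevel) -> list[str]:
        """Tools an agent at `level` is allowed to see at all."""
        return sorted(n for n, (req, _) in self._tools.items() if req <= level)

    def call(self, level: TrustLevel, name: str, *args: object) -> object:
        required, fn = self._tools[name]
        # The check lives in the dispatcher, not in the prompt, so a tool
        # the agent is not cleared for is never executed.
        if level < required:
            raise TrustViolation(f"{name} requires {required.name}, agent is {level.name}")
        return fn(*args)
```

Because `TrustLevel` is an `IntEnum`, "at least L3" is a plain integer comparison, and promotion is just a change to the level the runtime passes in.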

💰 Budget Enforcement

Hard caps per agent, per user, and per day. Budgets are hierarchical: a child agent's cap can never exceed its parent's. Warnings fire at 80% of a cap; the agent is terminated at 100%.
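The hierarchy and the two thresholds can be sketched as a chain of budgets where every charge is checked against all ancestors before anything is recorded. This is an illustrative Python sketch, not the product's code: `Budget`, `BudgetExceeded`, and the currency unit are assumptions, and the daily reset is left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field

WARN_AT = 0.80  # warn at 80% of a cap; refuse anything past 100%


class BudgetExceeded(RuntimeError):
    """Signals the runtime to stop the agent that made the charge."""


@dataclass
class Budget:
    name: str
    cap: float                      # hypothetical unit, e.g. dollars per day
    parent: Budget | None = None
    spent: float = 0.0
    warnings: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.cap <= 0:
            raise ValueError("cap must be positive")
        # Hierarchy rule: a child's cap may never exceed its parent's cap.
        if self.parent is not None and self.cap > self.parent.cap:
            raise ValueError(f"{self.name} cap exceeds parent {self.parent.name}")

    def _chain(self) -> list[Budget]:
        node, chain = self, []
        while node is not None:
            chain.append(node)
            node = node.parent
        return chain

    def charge(self, amount: float) -> None:
        chain = self._chain()
        # Check every level first, so a refused charge changes nothing.
        for node in chain:
            if node.spent + amount > node.cap:
                raise BudgetExceeded(f"{node.name}: cap {node.cap} reached")
        for node in chain:
            before = node.spent / node.cap
            node.spent += amount
            after = node.spent / node.cap
            if before < WARN_AT <= after:
                node.warnings.append(f"{node.name} at {after:.0%} of cap")
```

Checking the whole chain before mutating it means a sub-agent cannot partially spend a parent's budget and then fail halfway.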

🚧 Approval Gates

Destructive operations pause and wait for human review. Force pushes, production deploys, and file deletions do not proceed without sign-off.
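A minimal sketch of the gate, assuming destructive operations are identified by tool name and held in a queue until a reviewer decides. `ApprovalGate`, `PendingAction`, and the tool names are hypothetical, and the in-memory dict stands in for storage that would need to survive restarts.

```python
import uuid
from dataclasses import dataclass
from typing import Callable

# Hypothetical tool names for the destructive operations named in the text.
DESTRUCTIVE = {"git_force_push", "deploy_production", "delete_file"}


@dataclass
class PendingAction:
    id: str
    tool: str
    run: Callable[[], object]


class ApprovalGate:
    def __init__(self) -> None:
        self.pending: dict[str, PendingAction] = {}

    def submit(self, tool: str, run: Callable[[], object]) -> tuple[str, object]:
        """Run safe tools immediately; park destructive ones for human review."""
        if tool not in DESTRUCTIVE:
            return ("done", run())
        action = PendingAction(id=uuid.uuid4().hex, tool=tool, run=run)
        self.pending[action.id] = action
        return ("pending", action.id)

    def decide(self, action_id: str, approved: bool) -> object:
        """Called from the review UI. Rejected actions are dropped unexecuted."""
        action = self.pending.pop(action_id)
        return action.run() if approved else None
```

The agent receives only an action id for anything destructive; the work itself runs later, and only if a human calls `decide(..., approved=True)`.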

🌿 Worktree Isolation

Parallel agents work in isolated git worktrees. No merge conflicts during execution. No clobbered work. Changes merge through normal git flow.
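Git's own worktree feature gives each agent a separate checkout on its own branch while sharing one repository. The sketch below wraps the standard `git worktree add -b <branch> <path> <commit-ish>` command; the `agent/<name>/<task>` branch naming and the sibling-directory layout are assumptions for illustration, not Agency's conventions.

```python
import subprocess
from pathlib import Path


def worktree_command(repo: Path, agent: str, task: str, base: str = "main") -> list[str]:
    """Build the git command that gives one agent its own branch and checkout."""
    repo = repo.resolve()  # absolute, so the worktree path is unambiguous under -C
    branch = f"agent/{agent}/{task}"                       # hypothetical naming scheme
    path = repo.parent / f"{repo.name}-wt-{agent}-{task}"  # sibling dir, outside the main checkout
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path), base]


def create_worktree(repo: Path, agent: str, task: str, base: str = "main") -> Path:
    cmd = worktree_command(repo, agent, task, base)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if git fails
    return Path(cmd[-2])
```

Three agents on one feature would get three worktrees and three branches. Once a branch is merged, `git worktree remove <path>` deletes its checkout.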

🔐 Git Safety

Auto-checkpoints before every dispatch. Destructive git ops forbidden at runtime. Version tags mark rollback points. Always a safe state to return to.
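"Destructive git ops forbidden at runtime" implies a check on commands before they reach git. Here is a simplified Python sketch of such a guard; the denylist is a hypothetical example, not the product's rule set, and a production guard would need to be stricter (it does not, for instance, catch force pushes written as `+refspec`).

```python
import shlex

# Hypothetical denylist: subcommand -> flags that make it destructive.
FORBIDDEN = {
    "push": {"--force", "-f", "--force-with-lease"},
    "reset": {"--hard"},
    "clean": {"-f", "-fd", "-fdx", "--force"},
    "branch": {"-D"},
}


def _subcommand_index(argv: list[str]) -> int:
    """Skip git's global options (e.g. `-C <path>`) to find the subcommand."""
    i = 1
    while i < len(argv) and argv[i].startswith("-"):
        # -C and -c take a separate value argument.
        i += 2 if argv[i] in ("-C", "-c") else 1
    return i


def is_forbidden(command: str) -> bool:
    argv = shlex.split(command)
    if not argv or argv[0] != "git":
        return False
    i = _subcommand_index(argv)
    if i >= len(argv):
        return False
    bad_flags = FORBIDDEN.get(argv[i], set())
    return not bad_flags.isdisjoint(argv[i + 1:])
```

Parsing with `shlex.split` rather than substring matching keeps a commit message that merely mentions "--force" from tripping the guard.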

🧠 Multi-Engine

Claude, GPT, Gemini, and Codex are all supported, so agents can use the right model for each job, optimized per task for cost, speed, or capability.
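Per-task optimization can be as simple as filtering engines by a minimum capability and then picking the best one for the chosen objective. A minimal sketch; the engine names, cost, latency, and the 1-to-5 capability scale are placeholders invented for illustration, not measurements of any real model.

```python
from dataclasses import dataclass
from typing import Literal, Sequence

Objective = Literal["cost", "speed", "capability"]


@dataclass(frozen=True)
class Engine:
    name: str
    cost_per_mtok: float  # illustrative numbers only, not real pricing
    latency_s: float
    capability: int       # 1 (basic) to 5 (strongest), an assumed scale


def route(engines: Sequence[Engine], objective: Objective, min_capability: int = 1) -> Engine:
    """Pick the engine that best serves `objective` among those capable enough."""
    eligible = [e for e in engines if e.capability >= min_capability]
    if not eligible:
        raise ValueError(f"no engine reaches capability {min_capability}")
    if objective == "cost":
        return min(eligible, key=lambda e: e.cost_per_mtok)
    if objective == "speed":
        return min(eligible, key=lambda e: e.latency_s)
    return max(eligible, key=lambda e: e.capability)
```

The capability floor matters: "cheapest" alone would always pick the weakest engine, so each task states how capable a model it needs before cost or speed decides.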

👥 Specialized Agents

Ada (UI), Ghost (research), and Crash (infra) each have persistent memory, domain expertise, and a personality. They remember your codebase.

📡 Peer-to-Peer Messaging

Agents communicate directly. Messages are persistent, audited, and trust-gated. No back-channel privilege escalation.
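The text doesn't spell out how trust-gating blocks back-channel escalation. One plausible rule, assumed here rather than taken from the product, is that work requested via a message runs at the lower of the sender's and recipient's trust levels, so an L1 agent cannot get an L4 agent to act on its behalf at L4. A minimal sketch with hypothetical names:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Message:
    sender: str
    recipient: str
    body: str
    sent_at: datetime


@dataclass
class MessageBus:
    levels: dict[str, int]  # agent name -> trust level, 1 to 4
    log: list[Message] = field(default_factory=list)  # stands in for durable, audited storage

    def send(self, sender: str, recipient: str, body: str) -> Message:
        if sender not in self.levels or recipient not in self.levels:
            raise PermissionError("only registered agents may exchange messages")
        msg = Message(sender, recipient, body, datetime.now(timezone.utc))
        self.log.append(msg)  # recorded before delivery, so nothing is off the record
        return msg

    def effective_level(self, msg: Message) -> int:
        """Work done on behalf of a message runs at the lower of the two trust levels."""
        return min(self.levels[msg.sender], self.levels[msg.recipient])
```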

🔄 Self-Improvement

Agents propose updates to their own personality, tools, and patterns. Every change requires human approval. Better over time, never unsupervised.

Use Cases

Built for teams. Designed for individuals.

🏗️ For Development Teams

Dispatch a feature to three agents. They work in parallel on isolated branches — one on the API, one on the UI, one on tests. Each agent operates within its trust level, stays within budget, and checkpoints its work automatically.

When they finish, you review the diffs. Approve, merge, ship.

What this replaces

Manual task splitting, sequential AI pair programming, copy-pasting between chat windows, hoping nobody's changes break anybody else's work.

What this enables

A three-agent sprint that runs while you're in a meeting. You come back to three PRs ready for review, not three chat windows waiting for input.

🏠 For Personal Use

Your own AI staff, running on your own infrastructure.

Research assistant that searches the web and produces structured reports with citations
Email manager that triages your inbox, drafts responses, and flags what needs attention
Calendar organizer that coordinates scheduling across your accounts

Tier	Access
Admin	Full access, unrestricted
Power User	Bring your own API key, sandboxed
Friend	Budget-capped, read-only, isolated

What this replaces

Six different AI subscriptions that don't talk to each other and forget everything between sessions.

What this enables

A persistent AI team that knows your preferences, operates within your rules, and gets better over time.

Under the Hood

Infrastructure software, not a cloud service

Your agents, your data, your infrastructure. Nothing phones home. The audit trail is the product.

Component	Implementation
Storage	SQLite with WAL — single file, zero config, concurrent reads during agent execution
Live Updates	Server-Sent Events — real-time agent activity, approvals, and status pushed to the browser
Frontend	Lit + Vite PWA — lightweight, installable, works on mobile. No framework overhead
Agent Protocol	Model Context Protocol (MCP) — standardized tool interface across LLM providers
Process Model	systemd daemon with watchdog, graceful restart, structured logging
Auth	Tailscale-native — zero-config identity from the network layer. No passwords, no OAuth
Audit	Every tool call, every spawn, every schedule logged with full context
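The storage, live-update, and audit rows fit together naturally: each event is written to a single SQLite file in WAL mode and then pushed to the browser as a Server-Sent Events frame. The sketch below uses Python's standard `sqlite3` module and the standard `PRAGMA journal_mode=WAL`; the `audit_log` schema and the function names are hypothetical.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per tool call, spawn, or schedule event.
SCHEMA = """
CREATE TABLE IF NOT EXISTS audit_log (
    id      INTEGER PRIMARY KEY AUTOINCREMENT,
    ts      TEXT NOT NULL,
    agent   TEXT NOT NULL,
    kind    TEXT NOT NULL,
    context TEXT NOT NULL
)
"""


def open_audit_db(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # WAL mode lets readers (such as the dashboard) run while an agent writes.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode != "wal":  # e.g. ":memory:" databases cannot use WAL
        raise RuntimeError(f"expected WAL journal mode, got {mode!r}")
    conn.execute(SCHEMA)
    return conn


def record(conn: sqlite3.Connection, agent: str, kind: str, context: dict) -> int:
    """Append one audit event and return its id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO audit_log (ts, agent, kind, context) VALUES (?, ?, ?, ?)",
            (datetime.now(timezone.utc).isoformat(), agent, kind,
             json.dumps(context, sort_keys=True)),
        )
    return cur.lastrowid


def sse_frame(event_id: int, kind: str, context: dict) -> str:
    """Format one Server-Sent Events message; json.dumps output has no raw newlines."""
    return f"id: {event_id}\nevent: {kind}\ndata: {json.dumps(context, sort_keys=True)}\n\n"
```

Reusing the audit row id as the SSE `id:` field means a reconnecting browser's `Last-Event-ID` header points directly at the row to resume from.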

Current State

Private alpha. Running in production.

The core is live — orchestrating real agent swarms, shipping real code, enforcing real budgets.

Working Today

✓ Trust-gated agent execution (L1–L4)
✓ Hierarchical budget enforcement with hard caps
✓ Worktree-isolated parallel agent work
✓ Peer-to-peer agent messaging
✓ Persistent agent memory and self-improvement
✓ Multi-user sandboxing with tier-based access
✓ Google Workspace integration (Gmail, Drive, Docs)
✓ Real-time SSE dashboard with swarm visualization
✓ 1,400+ tests passing

Coming Next

○ Push notifications for approvals and alerts
○ Proactive agent scheduling — agents that work while you sleep
○ Browser automation tools
○ Expanded model routing for cost optimization
○ Stripe billing for multi-user deployments

One request. Five agents. Zero babysitting.

The question isn't whether AI agents will run autonomously.
