Autonomous by design. Accountable by default.
Safe but useless for real work. They can summarize text and answer questions, but they can't deploy code, manage infrastructure, or operate autonomously. You babysit them through every step.
Powerful but terrifying. Give an agent access to your codebase and it might deploy broken code to production. Give it API keys and it might burn through your budget in minutes. Run multiple agents and they'll overwrite each other's work.
The industry is building faster agents. Almost nobody is building governed agents.
| Level | Capabilities | Example |
|---|---|---|
| L1 | Read repos, run safe tools, basic research | New agent, first day |
| L2 | Write files, spawn sub-agents, web access | Proven individual contributor |
| L3 | Deploy to staging, manage agents, schedule tasks | Team lead |
| L4 | Production deploys, system configuration | Battle-tested, full autonomy |
Promotion requires a composite trust score — success rate (40%), cost efficiency (30%), safety record (30%) — calculated over a rolling 30-day window. The runtime enforces these levels. An L1 agent literally cannot call L3 tools. This isn't policy. It's architecture.
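A minimal sketch of how that composite score could be computed. The weights come from the text above; the `WindowMetrics` type, its field names, and the assumption that each metric is already normalized to the range 0 to 1 over the 30-day window are illustrative, not Agency's actual API.

```python
from dataclasses import dataclass

# Weights from the text: success rate 40%, cost efficiency 30%, safety record 30%.
WEIGHTS = {"success_rate": 0.40, "cost_efficiency": 0.30, "safety_record": 0.30}


@dataclass(frozen=True)
class WindowMetrics:
    """Hypothetical per-agent metrics for the rolling 30-day window, each in [0, 1]."""
    success_rate: float
    cost_efficiency: float
    safety_record: float


def composite_trust_score(metrics: WindowMetrics) -> float:
    """Weighted sum of the three metrics; the result is also in [0, 1]."""
    for name in WEIGHTS:
        value = getattr(metrics, name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return sum(weight * getattr(metrics, name) for name, weight in WEIGHTS.items())


score = composite_trust_score(WindowMetrics(0.9, 0.8, 1.0))
print(f"{score:.2f}")
```

The score a promotion actually requires is not stated here, so the sketch stops at the score itself.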
Every agent starts restricted. Trust is earned through performance, not granted by default. L1→L4 progression enforced at the runtime level.
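One way to picture runtime enforcement is a check that runs before every tool call and denies by default. In this sketch the tool names and the `authorize` function are hypothetical; only the L1 to L4 ordering and the capability groupings are taken from the table above.

```python
from enum import IntEnum


class TrustLevel(IntEnum):
    L1 = 1
    L2 = 2
    L3 = 3
    L4 = 4


# Hypothetical tool names; minimum levels follow the capability table.
TOOL_MIN_LEVEL = {
    "read_repo": TrustLevel.L1,
    "write_file": TrustLevel.L2,
    "spawn_subagent": TrustLevel.L2,
    "deploy_staging": TrustLevel.L3,
    "schedule_task": TrustLevel.L3,
    "deploy_production": TrustLevel.L4,
}


class ToolNotPermitted(Exception):
    """Raised when an agent requests a tool above its trust level."""


def authorize(agent_level: TrustLevel, tool: str) -> None:
    """Return silently if the call is allowed; raise otherwise."""
    required = TOOL_MIN_LEVEL.get(tool)
    if required is None:
        # Unknown tools are denied rather than allowed.
        raise ToolNotPermitted(f"unknown tool: {tool}")
    if agent_level < required:
        raise ToolNotPermitted(
            f"{tool} requires {required.name}, agent is {agent_level.name}"
        )
```

Because the check sits in the dispatch path rather than in a prompt, an L1 agent asking for `deploy_staging` gets an exception, not a warning.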
Hard caps per agent, per user, per day. Hierarchical — children can't exceed parents. Warnings at 80%, termination at 100%.
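The hierarchy can be sketched as a chain of budgets where every charge is also booked against each ancestor. The `Budget` class and its fields are assumptions made for illustration; the 80% warning and 100% termination thresholds come from the text. Amounts are integer cents to avoid floating-point drift, and the daily reset is left out.

```python
from dataclasses import dataclass
from typing import Optional


class BudgetExceeded(Exception):
    """Raised at 100% of a cap; the caller is expected to terminate the agent."""


@dataclass
class Budget:
    cap_cents: int
    parent: Optional["Budget"] = None
    spent_cents: int = 0

    def __post_init__(self) -> None:
        # Children can't exceed parents: reject an oversized child cap up front.
        if self.parent is not None and self.cap_cents > self.parent.cap_cents:
            raise ValueError("child cap cannot exceed parent cap")

    def charge(self, cents: int) -> str:
        """Book spend on this budget and all ancestors; return 'ok' or 'warn'."""
        status = "ok"
        node = self
        while node is not None:
            node.spent_cents += cents
            if node.spent_cents >= node.cap_cents:
                status = "terminate"
            elif status == "ok" and node.spent_cents * 100 >= node.cap_cents * 80:
                status = "warn"
            node = node.parent
        if status == "terminate":
            raise BudgetExceeded("budget cap reached")
        return status
```

A child with a 500-cent cap under a 1000-cent parent returns `"warn"` once it has spent 400 cents and raises `BudgetExceeded` at 500, even though the parent still has headroom.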
Destructive operations pause and wait for human review. Force pushes, production deploys, file deletions — nothing proceeds without sign-off.
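A rough sketch of such a gate: safe operations run, destructive ones are parked until a human decides. The operation identifiers, the `Status` values, and the `submit` and `review` functions are hypothetical names; the three destructive categories are the ones listed above.

```python
from dataclasses import dataclass
from enum import Enum

# Categories named in the text; the identifiers themselves are hypothetical.
DESTRUCTIVE_KINDS = frozenset({"force_push", "production_deploy", "file_delete"})


class Status(Enum):
    EXECUTED = "executed"
    PENDING_REVIEW = "pending_review"
    REJECTED = "rejected"


@dataclass
class Operation:
    kind: str
    target: str
    status: Status = Status.PENDING_REVIEW


def submit(op: Operation) -> Status:
    """Run safe operations immediately; park destructive ones for review."""
    if op.kind in DESTRUCTIVE_KINDS:
        op.status = Status.PENDING_REVIEW
    else:
        op.status = Status.EXECUTED  # real execution would happen here
    return op.status


def review(op: Operation, approved: bool) -> Status:
    """Record the human decision for an operation that is awaiting review."""
    if op.status is not Status.PENDING_REVIEW:
        raise ValueError("operation is not awaiting review")
    op.status = Status.EXECUTED if approved else Status.REJECTED
    return op.status
```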
Parallel agents work in isolated git worktrees. No merge conflicts during execution. No clobbered work. Changes merge through normal git flow.
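For readers unfamiliar with worktrees, the isolation comes from `git worktree add`, which checks out a new branch into its own directory while sharing one repository. The helper below only builds that command; the branch naming scheme and directory layout are assumptions for illustration, not Agency's actual conventions.

```python
from pathlib import Path


def worktree_add_command(repo_root: Path, agent: str, task: str) -> list[str]:
    """Build a `git worktree add` command giving one agent its own checkout.

    The branch name and directory layout are illustrative choices.
    """
    branch = f"agent/{agent}/{task}"
    path = repo_root.parent / f"{repo_root.name}-worktrees" / agent / task
    # `-C` runs git in the main repository; `-b` creates the new branch.
    return ["git", "-C", str(repo_root), "worktree", "add", "-b", branch, str(path)]


for agent in ("ada", "ghost", "crash"):
    print(" ".join(worktree_add_command(Path("/srv/app"), agent, "login-form")))
```

Passing the returned list to `subprocess.run(cmd, check=True)` would create the worktree. Each agent then commits on its own branch, and merging goes through the usual review flow.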
Auto-checkpoints before every dispatch. Destructive git ops forbidden at runtime. Version tags mark rollback points. Always a safe state to return to.
Claude, GPT, Gemini, Codex — agents use the right model for the job. Optimize per-task for cost, speed, or capability.
Ada (UI), Ghost (Research), Crash (Infra) — persistent memory, domain expertise, personality. They remember your codebase.
Agents communicate directly. Messages are persistent, audited, and trust-gated. No back-channel privilege escalation.
Agents propose updates to their own personality, tools, and patterns. Every change requires human approval. Better over time, never unsupervised.
Watch Agency orchestrate a real multi-agent sprint — from natural language to shipped feature.
Dispatch a feature to three agents. They work in parallel on isolated branches — one on the API, one on the UI, one on tests. Each agent operates within its trust level, stays within budget, and checkpoints its work automatically.
When they finish, you review the diffs. Approve, merge, ship.
Manual task splitting, sequential AI pair programming, copy-pasting between chat windows, hoping nobody's changes break anybody else's work.
A three-agent sprint that runs while you're in a meeting. You come back to three PRs ready for review, not three chat windows waiting for input.
Your own AI staff, running on your own infrastructure.
Six different AI subscriptions that don't talk to each other and forget everything between sessions.
A persistent AI team that knows your preferences, operates within your rules, and gets better over time.
Your agents, your data, your infrastructure. Nothing phones home. The audit trail is the product.
| Component | Implementation |
|---|---|
| Storage | SQLite with WAL — single file, zero config, concurrent reads during agent execution |
| Live Updates | Server-Sent Events — real-time agent activity, approvals, and status pushed to the browser |
| Frontend | Lit + Vite PWA — lightweight, installable, works on mobile. No framework overhead |
| Agent Protocol | Model Context Protocol (MCP) — standardized tool interface across LLM providers |
| Process Model | systemd daemon with watchdog, graceful restart, structured logging |
| Auth | Tailscale-native — zero-config identity from the network layer. No passwords, no OAuth |
| Audit | Every tool call, every spawn, every schedule logged with full context |
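To make the Storage and Audit rows concrete, here is a small sketch of an append-only audit log in SQLite with write-ahead logging enabled. The table name, columns, and helper functions are assumptions for illustration; `PRAGMA journal_mode=WAL` is the standard SQLite setting that lets readers proceed while a writer is active.

```python
import json
import sqlite3
import time


def open_audit_db(path: str) -> sqlite3.Connection:
    """Open (or create) a single-file database in WAL mode."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    # Hypothetical schema: one row per tool call, spawn, or schedule.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS audit_log (
            id      INTEGER PRIMARY KEY AUTOINCREMENT,
            ts      REAL NOT NULL,
            agent   TEXT NOT NULL,
            event   TEXT NOT NULL,
            context TEXT NOT NULL
        )
        """
    )
    conn.commit()
    return conn


def log_event(conn: sqlite3.Connection, agent: str, event: str, context: dict) -> int:
    """Append one audit record and return its row id."""
    cur = conn.execute(
        "INSERT INTO audit_log (ts, agent, event, context) VALUES (?, ?, ?, ?)",
        (time.time(), agent, event, json.dumps(context, sort_keys=True)),
    )
    conn.commit()
    return cur.lastrowid
```

WAL mode needs a file on disk, so an in-memory database will not demonstrate it.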
The core is live — orchestrating real agent swarms, shipping real code, enforcing real budgets.
It's whether they'll do it safely.