Agency

Building what you dream, while you dream.

Autonomous by design. Accountable by default.

AI agents today come in one of two modes:
sandboxed toys or uncontrolled wildcards.

🔒 Sandboxed Agents

Safe but useless for real work. They can summarize text and answer questions, but they can't deploy code, manage infrastructure, or operate autonomously. You babysit them through every step.

⚡ Autonomous Agents

Powerful but terrifying. Give an agent access to your codebase and it might deploy broken code to production. Give it API keys and it might burn through your budget in minutes. Run multiple agents and they'll overwrite each other's work.

The industry is building faster agents. Almost nobody is building governed agents.

Agency doesn't make agents smarter.
It makes them safe to run unsupervised.

Level  Capabilities                                      Example
L1     Read repos, run safe tools, basic research        New agent, first day
L2     Write files, spawn sub-agents, web access         Proven IC contributor
L3     Deploy to staging, manage agents, schedule tasks  Team lead
L4     Production deploys, system configuration          Battle-tested, full autonomy

Promotion requires a composite trust score — success rate (40%), cost efficiency (30%), safety record (30%) — calculated over a rolling 30-day window. The runtime enforces these levels. An L1 agent literally cannot call L3 tools. This isn't policy. It's architecture.
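As a sketch, the composite score could be computed like this. The weights and the 30-day window come from the description above; the class and field names are illustrative, not Agency's actual API:

```python
# Hypothetical sketch of the composite trust score. Weights are from the
# text above; metric names and the TrustMetrics class are illustrative.
from dataclasses import dataclass

WEIGHTS = {"success_rate": 0.40, "cost_efficiency": 0.30, "safety_record": 0.30}

@dataclass
class TrustMetrics:
    """Metrics aggregated over a rolling 30-day window, each in 0..1."""
    success_rate: float      # fraction of tasks completed successfully
    cost_efficiency: float   # how far under budget the agent stays
    safety_record: float     # 1.0 minus the rate of safety incidents

def trust_score(m: TrustMetrics) -> float:
    return (WEIGHTS["success_rate"] * m.success_rate
            + WEIGHTS["cost_efficiency"] * m.cost_efficiency
            + WEIGHTS["safety_record"] * m.safety_record)

# Example: reliable but occasionally costly agent -> 0.89 composite.
score = trust_score(TrustMetrics(success_rate=0.95,
                                 cost_efficiency=0.70,
                                 safety_record=1.0))
```

Promotion thresholds per level would then be a simple comparison against this score at review time.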

Nine Safety Mechanisms

🛡️

Trust Levels

Every agent starts restricted. Trust is earned through performance, not granted by default. L1→L4 progression enforced at the runtime level.

💰

Budget Enforcement

Hard caps per agent, per user, per day. Hierarchical — children can't exceed parents. Warnings at 80%, termination at 100%.
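A minimal sketch of that hierarchy, assuming a simple parent/child budget tree. Every spend is charged up the chain, so a sub-agent can never exceed what its parent has left; the class and method names are hypothetical:

```python
# Illustrative hierarchical budget caps: children draw from the parent's
# remaining budget, warn at 80% of their own cap, refuse past 100%.

class BudgetExceeded(Exception):
    pass

class Budget:
    def __init__(self, cap_usd: float, parent: "Budget | None" = None):
        self.cap, self.spent, self.parent = cap_usd, 0.0, parent

    def charge(self, amount: float) -> str:
        # First verify the charge fits at every level of the hierarchy...
        node = self
        while node:
            if node.spent + amount > node.cap:
                raise BudgetExceeded(f"cap ${node.cap:.2f} reached")
            node = node.parent
        # ...then record it against this budget and all ancestors.
        node = self
        while node:
            node.spent += amount
            node = node.parent
        return "warning" if self.spent >= 0.8 * self.cap else "ok"

team = Budget(cap_usd=10.00)
agent = Budget(cap_usd=6.00, parent=team)
agent.charge(4.00)              # "ok"
status = agent.charge(1.00)     # $5.00 of $6.00 -> "warning"
```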

🚧

Approval Gates

Destructive operations pause and wait for human review. Force pushes, production deploys, file deletions — nothing proceeds without sign-off.
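A gate like that can be sketched as a queue that parks destructive calls until a human signs off. The tool names and class below are illustrative, not Agency's real tool registry:

```python
# Minimal approval-gate sketch: safe tools run immediately, destructive
# ones wait in a pending queue until a human approves them.

DESTRUCTIVE = {"git.force_push", "deploy.production", "fs.delete"}

class ApprovalGate:
    def __init__(self):
        self.pending: dict[int, tuple[str, dict]] = {}
        self._next_id = 0

    def request(self, tool: str, args: dict):
        if tool not in DESTRUCTIVE:
            return ("executed", None)
        self._next_id += 1
        self.pending[self._next_id] = (tool, args)
        return ("awaiting_approval", self._next_id)

    def approve(self, request_id: int):
        tool, args = self.pending.pop(request_id)
        return ("executed", tool)

gate = ApprovalGate()
state, rid = gate.request("deploy.production", {"ref": "v1.2.0"})
# Nothing runs until a human calls gate.approve(rid).
```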

🌿

Worktree Isolation

Parallel agents work in isolated git worktrees. No merge conflicts during execution. No clobbered work. Changes merge through normal git flow.
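Worktree isolation needs nothing beyond stock git. A sketch of giving each agent its own checkout on its own branch, with illustrative paths and branch names:

```python
# Sketch of per-agent isolation using `git worktree`: each agent gets a
# separate working directory and branch, so parallel edits never collide.
import subprocess

def create_agent_worktree(repo: str, agent: str, base: str = "main") -> str:
    path = f"{repo}-{agent}"          # sibling directory for this agent
    branch = f"agent/{agent}"         # the branch the agent commits to
    subprocess.run(
        ["git", "-C", repo, "worktree", "add", "-b", branch, path, base],
        check=True,
    )
    return path  # the agent edits here; work merges later via normal git flow

# Usage (assuming a repo at ./myrepo with a `main` branch):
# api_dir = create_agent_worktree("myrepo", "api-agent")
```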

🔐

Git Safety

Auto-checkpoints before every dispatch. Destructive git ops forbidden at runtime. Version tags mark rollback points. Always a safe state to return to.
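The checkpoint step reduces to a commit plus a tag before each dispatch. A sketch, again with hypothetical names:

```python
# Sketch of auto-checkpointing: snapshot the repo and tag it so there is
# always a known-good rollback point before an agent starts working.
import subprocess, time

def checkpoint(repo: str, agent: str) -> str:
    tag = f"checkpoint/{agent}/{int(time.time())}"
    subprocess.run(["git", "-C", repo, "add", "-A"], check=True)
    # --allow-empty: record a rollback point even when nothing changed
    subprocess.run(["git", "-C", repo, "commit", "--allow-empty",
                    "-m", f"checkpoint before {agent} dispatch"], check=True)
    subprocess.run(["git", "-C", repo, "tag", tag], check=True)
    return tag  # roll back later with: git reset --hard <tag>
```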

🧠

Multi-Engine

Claude, GPT, Gemini, Codex — agents use the right model for the job. Optimize per-task for cost, speed, or capability.
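Per-task routing can be as simple as a lookup table. The task types and model names below are placeholders, not Agency's actual routing config:

```python
# Hypothetical per-task model routing: optimize each task type for cost,
# speed, or capability. Model identifiers here are illustrative.

ROUTES = {
    "code_review": {"model": "claude",       "optimized_for": "capability"},
    "summarize":   {"model": "gemini-flash", "optimized_for": "speed"},
    "bulk_triage": {"model": "gpt-mini",     "optimized_for": "cost"},
}

def pick_model(task_type: str, default: str = "claude") -> str:
    return ROUTES.get(task_type, {}).get("model", default)
```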

👥

Specialized Agents

Ada (UI), Ghost (Research), Crash (Infra) — persistent memory, domain expertise, personality. They remember your codebase.

📡

Peer-to-Peer Messaging

Agents communicate directly. Messages are persistent, audited, and trust-gated. No back-channel privilege escalation.
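The trust gate on messaging can be sketched as a delivery check plus an append-only log. All names here are illustrative:

```python
# Sketch of trust-gated peer messaging: a message is delivered only if the
# sender's trust level meets the level the recipient requires, and every
# attempt, delivered or not, lands in the audit log.

audit_log: list[dict] = []

def send(sender: str, sender_level: int,
         recipient: str, required_level: int, body: str) -> bool:
    delivered = sender_level >= required_level
    audit_log.append({"from": sender, "to": recipient,
                      "body": body, "delivered": delivered})
    return delivered  # an L1 agent cannot message its way into L3 privileges

ok = send("ghost", 2, "crash", 3, "please deploy")  # blocked, but audited
```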

🔄

Self-Improvement

Agents propose updates to their own personality, tools, and patterns. Every change requires human approval. Better over time, never unsupervised.

One request. Five agents. Zero babysitting.

Watch Agency orchestrate a real multi-agent sprint — from natural language to shipped feature.

[Demo: Jeeves, an L3 manager agent, coordinating the sprint with live cost and test counters]

Built for teams. Designed for individuals.

🏗️ For Development Teams

Dispatch a feature to three agents. They work in parallel on isolated branches — one on the API, one on the UI, one on tests. Each agent operates within its trust level, stays within budget, and checkpoints its work automatically.

When they finish, you review the diffs. Approve, merge, ship.

What this replaces

Manual task splitting, sequential AI pair programming, copy-pasting between chat windows, and hoping no one's changes break anyone else's work.

What this enables

A three-agent sprint that runs while you're in a meeting. You come back to three PRs ready for review, not three chat windows waiting for input.

🏠 For Personal Use

Your own AI staff, running on your own infrastructure.

  • Research assistant that searches the web and produces structured reports with citations
  • Email manager that triages your inbox, drafts responses, and flags what needs attention
  • Calendar organizer that coordinates scheduling across your accounts

Multi-user access is tier-based:

Tier        Access
Admin       Full access, unrestricted
Power User  Bring your own API key, sandboxed
Friend      Budget-capped, read-only, isolated

What this replaces

Six different AI subscriptions that don't talk to each other and forget everything between sessions.

What this enables

A persistent AI team that knows your preferences, operates within your rules, and gets better over time.

Infrastructure software, not a cloud service

Your agents, your data, your infrastructure. Nothing phones home. The audit trail is the product.

Component       Implementation
Storage         SQLite with WAL — single file, zero config, concurrent reads during agent execution
Live Updates    Server-Sent Events — real-time agent activity, approvals, and status pushed to the browser
Frontend        Lit + Vite PWA — lightweight, installable, works on mobile. No framework overhead
Agent Protocol  Model Context Protocol (MCP) — standardized tool interface across LLM providers
Process Model   systemd daemon with watchdog, graceful restart, structured logging
Auth            Tailscale-native — zero-config identity from the network layer. No passwords, no OAuth
Audit           Every tool call, every spawn, every schedule logged with full context
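The storage choice above is two pragmas away from stock SQLite. A sketch of the setup, with an illustrative database path:

```python
# Sketch of the SQLite-with-WAL setup: a single file, with write-ahead
# logging so readers are not blocked while an agent run is writing.
import os
import sqlite3
import tempfile

def open_db(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")    # concurrent reads during writes
    conn.execute("PRAGMA synchronous=NORMAL")  # common pairing with WAL
    return conn

# Illustrative path; a real deployment would use one persistent file.
db_path = os.path.join(tempfile.mkdtemp(), "agency.db")
conn = open_db(db_path)
mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
```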

Private alpha. Running in production.

The core is live — orchestrating real agent swarms, shipping real code, enforcing real budgets.

Working Today
  • Trust-gated agent execution (L1–L4)
  • Hierarchical budget enforcement with hard caps
  • Worktree-isolated parallel agent work
  • Peer-to-peer agent messaging
  • Persistent agent memory and self-improvement
  • Multi-user sandboxing with tier-based access
  • Google Workspace integration (Gmail, Drive, Docs)
  • Real-time SSE dashboard with swarm visualization
  • 1,400+ tests passing
Coming Next
  • Push notifications for approvals and alerts
  • Proactive agent scheduling — agents that work while you sleep
  • Browser automation tools
  • Expanded model routing for cost optimization
  • Stripe billing for multi-user deployments

The question isn't whether AI agents will run autonomously.

It's whether they'll do it safely.

Request Early Access →