docs: Rewrite README as proper project documentation

Replace the original design document / brainstorm with a focused project
README covering what Codewalkers does, getting started, architecture,
supported providers, CLI reference, workflow, development, testing, and
links to detailed docs.
Lukas May
2026-03-06 16:09:56 +01:00
parent c069049c35
commit e4b750ceb9

481
README.md

@@ -1,311 +1,236 @@
# Codewalkers
# Project concept
Multi-agent workspace for orchestrating multiple AI coding agents working in parallel on a shared codebase.
Codewalkers is a multi-agent workspace inspired by Gastown. It differs from Gastown in the following ways:
* Subagents (e.g. Workers) that handle tasks run in `-p` (print) mode and respond using a clear JSON schema
* A single `cw` (codewalk) web server runs and also manages the agents
* There shall be a clear post-worktree setup hook that, by default, copies files (e.g. `.env` files) prepared inside a dedicated folder in the Project
* It shall support multiple Claude Code accounts (see ccswitch) and switch between them as they run into usage limits
* It shall have a web dashboard at some point in the project
* The project shall start with a file-based UI: a folder structure representing the project's data, refreshed on save (fs events) and updated when DB data changes (using events as triggers). The fsui shall be started with `cw fsui`, which instantiates a bidirectional watcher that subscribes to the events via a websocket
* It shall support integration branches that Workers base their work on and merge their branches back into
* It shall base all its larger development work on initiatives, which describe a larger body of work. The user's concept must go through a formal planning process in which the work is verified for integration into the existing codebase and a sophisticated technical concept is created. An initiative is only started once approved by a developer. Analysis work is performed by Architects.
* The project shall use a global SQLite DB, which also manages tasks
* It shall have a CLI called "cw". The same binary is also the server application: it only works as a CLI when not run with `--server`
* Communication from and between agents shall happen through a stdio-based MCP server that is also implemented in the main binary (e.g. `cw mcp`)
Codewalkers coordinates agents from different providers (Claude, Codex, Gemini, Cursor, AMP, Auggie, OpenCode) through a unified CLI and web dashboard. It manages the full lifecycle: planning initiatives, decomposing work into phases and tasks, dispatching agents to isolated git worktrees, merging results, and reviewing changes — all from a single `cw` command.
## Key Features
---
- **Multi-provider agents** — Data-driven provider configs. Add new providers without code changes.
- **Initiative workflow** — Three-level hierarchy (Initiative > Phase > Task) with dependency DAGs and topological ordering.
- **Architect agents** — AI-driven planning: discuss, plan, detail, and refine modes produce structured proposals you accept or dismiss.
- **Git worktree isolation** — Each agent works in its own worktree. No conflicts during parallel execution.
- **Account rotation** — Register multiple provider accounts. On usage-limit exhaustion, automatically fails over to the next available account (LRU scheduling).
- **Two execution modes** — *YOLO* auto-merges everything. *Review-per-phase* adds manual approval gates with inline code review, threaded comments, and diff viewing.
- **Docker preview deployments** — Zero-config preview environments via `.cw-preview.yml`, `docker-compose.yml`, or bare `Dockerfile`. Caddy reverse proxy with health checks.
- **Inter-agent communication** — Agents ask each other questions via a conversation system. Idle agents auto-resume when a question arrives.
- **Chat sessions** — Persistent iterative refinement loops where you send messages and agents apply changes with revertable changesets.
- **Real-time dashboard** — React web UI with live agent output streaming, pipeline visualization, Tiptap page editor, and command palette.
- **Event-driven architecture** — 58+ typed domain events drive all coordination. No polling loops.
- **Cassette testing** — Record/replay agent interactions for full-pipeline E2E tests at zero API cost.
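The account-rotation feature describes least-recently-used failover when an account hits its usage limit. Below is a minimal sketch of that scheduling idea, not Codewalkers' implementation. The `Account` shape, the `exhaustedUntil` cooldown field, and the `pickAccount`/`failOver` helpers are hypothetical names.

```typescript
// Hypothetical account record; field names are illustrative only.
interface Account {
  id: string;
  provider: string;
  lastUsedAt: number;     // epoch ms of last assignment (0 = never used)
  exhaustedUntil: number; // epoch ms until which the account is rate-limited (0 = available)
}

// Pick the least-recently-used account for a provider that is not currently exhausted.
function pickAccount(accounts: Account[], provider: string, now: number): Account | undefined {
  const candidates = accounts.filter(
    (a) => a.provider === provider && a.exhaustedUntil <= now,
  );
  // LRU: the account used longest ago comes first.
  candidates.sort((a, b) => a.lastUsedAt - b.lastUsedAt);
  const chosen = candidates[0];
  if (chosen) chosen.lastUsedAt = now;
  return chosen;
}

// On a usage-limit error, put the current account on cooldown and fail over to the next one.
function failOver(accounts: Account[], current: Account, cooldownMs: number, now: number): Account | undefined {
  current.exhaustedUntil = now + cooldownMs;
  return pickAccount(accounts, current.provider, now);
}
```

Sorting by `lastUsedAt` spreads load evenly across accounts. A persistent version would store these timestamps in the database rather than in memory.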
# Implementation considerations
## Getting Started
* TypeScript as the programming language
* tRPC as the API layer
* React with shadcn/ui and TanStack Router for the frontend, running on Vite; Tiptap for Markdown editor UIs
* Simple deployment: in production, one web server serves both frontend and backend; in development, the frontend may use a dev server for hot reloads. The app shall be startable just by installing the CLI and running it with `--server`, with no further setup. The local frontend dev server shall be proxied through the backend under the same path the compiled frontend is served from in production
* SQLite as the database
* Future support for multi-user management (for now, implement a single-user stub)
* Hexagonal architecture
* Built as a modular monolith with clear separation between modules, including an event bus (which can be process-internal, behind a swappable adapter for the future)
### Prerequisites
---
- Node.js 20+
- Git 2.38+ (for `merge-tree --write-tree`)
- Docker (optional, for preview deployments)
- At least one supported AI coding CLI installed (e.g., `claude`, `codex`, `gemini`)
### Install
# Modules
```sh
git clone <repo-url> && cd codewalk-district
npm install
npm run build
npm link
```
## Tasks
This makes the `cw` CLI available globally.
Beads-inspired task management for agent coordination. Centralized SQLite storage (not Git-distributed like beads).
### Initialize a workspace
Key features:
* **Status workflow**: `open` → `in_progress` → `blocked` | `closed`
* **Priority system**: P0 (critical) through P3 (low)
* **Dependency graph**: Tasks block other tasks; `ready` query finds actionable work
* **Assignment tracking**: Prevents multiple agents claiming same task
* **Audit history**: All state changes logged for debugging
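The `ready` query combines the status workflow, the priority system, and the dependency graph. Below is a minimal in-memory sketch of that query. The `Task` shape is hypothetical: `blockedBy` is assumed to be a list of task IDs, and `priority` is assumed to run from 0 (P0) to 3 (P3). The real schema is stored in SQLite and documented in docs/tasks.md.

```typescript
type TaskStatus = "open" | "in_progress" | "blocked" | "closed";

// Hypothetical in-memory task shape for illustration.
interface Task {
  id: string;
  status: TaskStatus;
  priority: 0 | 1 | 2 | 3; // P0 (critical) .. P3 (low)
  assignee?: string;       // set when an agent claims the task
  blockedBy: string[];     // IDs of tasks that must be closed first
}

// A task is ready when it is open, unassigned, and every blocker is closed.
// Results are ordered by priority, most critical first.
function readyTasks(tasks: Task[]): Task[] {
  const byId = new Map(tasks.map((t) => [t.id, t] as const));
  return tasks
    .filter((t) => t.status === "open" && t.assignee === undefined)
    .filter((t) => t.blockedBy.every((dep) => byId.get(dep)?.status === "closed"))
    .sort((a, b) => a.priority - b.priority);
}

// Claiming fails if another agent already holds the task.
function claim(task: Task, agentId: string): boolean {
  if (task.assignee !== undefined) return false;
  task.assignee = agentId;
  task.status = "in_progress";
  return true;
}
```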
```sh
cd your-project
cw init
```
CLI mirrors beads: `cw task ready`, `cw task create`, `cw task close`, etc.
Creates a `.cwrc` config file marking the workspace root.
See [docs/tasks.md](docs/tasks.md) for schema and CLI reference.
### Register a project
## Initiatives
```sh
cw project register --name my-app --url /path/to/repo
```
Notion-like document hierarchy for planning larger features. SQLite-backed with parent-child relationships for structured queries (e.g., "all subpages of initiative X", "inventory of all documents").
### Add a provider account
Key features:
* **Lifecycle**: `draft` → `review` → `approved` → `in_progress` → `completed`
* **Nested pages**: User journeys, business rules, technical concepts, architectural changes
* **Phased work plans**: Approved initiatives generate tasks grouped into phases
* **Rolling approval**: User approves phase plans one-by-one; agents execute approved phases while subsequent phases are reviewed
Workflow: User drafts → Architect iterates (GSD-style questioning) → Approval or draft extension and further iterations with the Architect → Tasks created with `initiative_id` + `phase` → Execute
```sh
# Auto-extract from current Claude login
cw account add

# Or register with a setup token
cw account add --token <token> --email user@example.com
```
See [docs/initiatives.md](docs/initiatives.md) for schema and workflow details.
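The initiative lifecycle can be read as a small state machine. Below is a minimal sketch that uses only the five states listed above. It assumes that `review` can send a draft back to `draft` for further iterations with the Architect, since the workflow mentions draft extension. The transition table is an illustration, not the project's actual rules.

```typescript
type InitiativeStatus = "draft" | "review" | "approved" | "in_progress" | "completed";

// Allowed transitions (assumed): review may return to draft for another Architect iteration.
const transitions: Record<InitiativeStatus, InitiativeStatus[]> = {
  draft: ["review"],
  review: ["approved", "draft"],
  approved: ["in_progress"],
  in_progress: ["completed"],
  completed: [],
};

// Returns the new status, or throws if the transition is not allowed.
function advance(from: InitiativeStatus, to: InitiativeStatus): InitiativeStatus {
  if (!transitions[from].includes(to)) {
    throw new Error(`illegal transition ${from} -> ${to}`);
  }
  return to;
}
```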
### Start the server
## Domain Layer
```sh
cw --server
```
DDD-based documentation of the **as-is state** for agent and human consumption. Initiatives reference and modify domain concepts; completed initiatives update the domain layer to reflect the new state.
Starts the coordination server on `localhost:3847`. The web dashboard is served at the same address.
**Scope**: Per-project domains or cross-project domains (features spanning multiple projects).
### Create an initiative and start working
**Core concepts tracked:**
* **Bounded Contexts** — scope boundaries defining where a domain model applies
* **Aggregates** — consistency boundaries, what changes together
* **Domain Events** — events exposed by the project that trigger workflows or side effects
* **Business Rules & Invariants** — constraints that must always hold; agents must preserve these
* **Ubiquitous Language** — glossary of domain terms to prevent agent misinterpretation
* **Context Maps** — relationships between bounded contexts (especially for cross-project domains)
* **External Integrations** — systems the domain interacts with but doesn't own
```sh
cw initiative create "Add user authentication"
cw architect discuss <initiative-id>
```
**Codebase mapping**: Each concept links to folder/module paths. Auto-maintained by agents after implementation work.
From the web dashboard, accept the architect's proposals, approve phases, and dispatch execution.
**Storage**: Dual adapter support — SQLite tables (structured queries) or Markdown with YAML frontmatter (human-readable, version-controllable).
## Architecture
## Orchestrator
Main orchestrator loop handling coordination across agents. Can be split per project or initiative for load balancing in the future.
```
CLI (cw)
+-- CoordinationServer
|-- HTTP + tRPC API (70+ procedures)
|-- EventBus (58 typed events)
|-- MultiProviderAgentManager
| |-- ProcessManager (detached child processes)
| |-- WorktreeManager (git worktrees per agent)
| |-- OutputHandler (JSONL stream parsing)
| +-- LifecycleController (retry, signal recovery)
|-- DispatchManager (task queue, dependency DAG)
|-- PhaseDispatchManager (phase queue, topological sort)
|-- ExecutionOrchestrator (end-to-end coordination)
|-- PreviewManager (Docker compose, Caddy proxy)
+-- 14 Repository ports (SQLite/Drizzle adapters)
Web UI (React 19)
+-- TanStack Router + tRPC React Query
|-- Initiative management & page editor (Tiptap)
|-- Pipeline visualization (phase DAG)
|-- Execution tab (task dispatch, live agent output)
+-- Review tab (diffs, inline comments, approval)
```
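`PhaseDispatchManager` queues phases in topological order of their dependency DAG; dependencies are added with `cw phase add-dependency`. Below is a minimal sketch of that ordering using Kahn's algorithm. It assumes dependencies arrive as a map from each phase ID to the IDs it depends on. `topoSort` is a hypothetical helper, not the server's code.

```typescript
// Kahn's algorithm: returns phase IDs so that every phase comes after its dependencies.
// Throws if the dependency graph contains a cycle.
function topoSort(deps: Map<string, string[]>): string[] {
  const inDegree = new Map<string, number>();   // unmet dependencies per phase
  const dependents = new Map<string, string[]>(); // reverse edges: phase -> phases waiting on it
  for (const [node, reqs] of deps) {
    if (!inDegree.has(node)) inDegree.set(node, 0);
    for (const req of reqs) {
      if (!inDegree.has(req)) inDegree.set(req, 0);
      inDegree.set(node, inDegree.get(node)! + 1);
      dependents.set(req, [...(dependents.get(req) ?? []), node]);
    }
  }
  // Start with phases that have no dependencies.
  const queue = [...inDegree].filter(([, d]) => d === 0).map(([n]) => n);
  const order: string[] = [];
  while (queue.length > 0) {
    const n = queue.shift()!;
    order.push(n);
    for (const m of dependents.get(n) ?? []) {
      const d = inDegree.get(m)! - 1;
      inDegree.set(m, d);
      if (d === 0) queue.push(m); // all of m's dependencies are now scheduled
    }
  }
  if (order.length !== inDegree.size) throw new Error("cycle in phase dependencies");
  return order;
}
```

Kahn's algorithm is a natural fit for this job because it also detects cycles. If any phase is never released from the queue, the graph cannot be dispatched.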
## Session State
**Monorepo layout:**
Tracks execution state across agent restarts. Unlike the Domain Layer (codebase state), session state tracks position, decisions, and blockers.
| Path | Description |
|------|-------------|
| `apps/server/` | CLI, coordination server, agent management, all backend modules |
| `apps/web/` | React dashboard (Vite + Tailwind + shadcn/ui) |
| `packages/shared/` | Shared TypeScript types between server and web |
**STATE.md** maintains:
* Current position (phase, plan, task, wave)
* Decisions made (locked choices with reasoning)
* Active blockers (what's waiting, workarounds)
* Session history (who worked on what, when)
**Hexagonal architecture:** Repository ports define data access interfaces. Drizzle/SQLite adapters implement them. Swappable without touching business logic.
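Below is a minimal sketch of the port/adapter split, for illustration only. `TaskRepository`, `InMemoryTaskRepository`, and `completeTask` are hypothetical names. In Codewalkers the adapters are Drizzle/SQLite; an in-memory adapter like this one is the kind a unit test might swap in.

```typescript
// Port: the interface business logic depends on.
interface TaskRecord {
  id: string;
  title: string;
  status: "open" | "closed";
}

interface TaskRepository {
  findById(id: string): Promise<TaskRecord | undefined>;
  save(task: TaskRecord): Promise<void>;
}

// Adapter: an in-memory implementation (a Drizzle/SQLite adapter would implement the same port).
class InMemoryTaskRepository implements TaskRepository {
  private readonly rows = new Map<string, TaskRecord>();

  async findById(id: string): Promise<TaskRecord | undefined> {
    return this.rows.get(id);
  }

  async save(task: TaskRecord): Promise<void> {
    this.rows.set(task.id, { ...task }); // store a copy so callers cannot mutate stored rows
  }
}

// Business logic only sees the port, so adapters can be swapped freely.
async function completeTask(repo: TaskRepository, id: string): Promise<void> {
  const task = await repo.findById(id);
  if (!task) throw new Error(`task ${id} not found`);
  await repo.save({ ...task, status: "closed" });
}
```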
See [docs/session-state.md](docs/session-state.md) for session state management.
## Supported Providers
---
| Provider | CLI | Resume | Structured Output |
|----------|-----|--------|-------------------|
| Claude | `claude` | `--resume` | Prompt-based |
| Codex | `codex` | `codex resume` | `--output-schema` |
| Gemini | `gemini` | `--resume` | `--output-format` |
| Cursor | `cursor-agent` | — | `--output-format` |
| AMP | `amp` | `--thread` | `--json` |
| Auggie | `aug` | — | — |
| OpenCode | `opencode` | — | `--format` |
# Model Profiles
Different agent roles have different needs. Model selection balances quality, cost, and latency.
| Profile | Use Case | Cost | Quality |
|---------|----------|------|---------|
| **quality** | Critical decisions, architecture | Highest | Best |
| **balanced** | Default for most work | Medium | Good |
| **budget** | High-volume, low-risk tasks | Lowest | Acceptable |
| Agent | Quality | Balanced (Default) | Budget |
|-------|---------|-------------------|--------|
| Architect | Opus | Opus | Sonnet |
| Worker | Opus | Sonnet | Sonnet |
| Verifier | Sonnet | Sonnet | Haiku |
| Orchestrator | Sonnet | Sonnet | Haiku |
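The role-by-profile table above maps directly onto a lookup. Below is a minimal sketch that assumes the table's values are model families rather than concrete provider model IDs, which differ. As the table states, `balanced` is the default.

```typescript
type Profile = "quality" | "balanced" | "budget";
type Role = "architect" | "worker" | "verifier" | "orchestrator";
type ModelFamily = "opus" | "sonnet" | "haiku";

// Mirrors the table above; values are model families, not concrete model IDs.
const MODEL_TABLE: Record<Role, Record<Profile, ModelFamily>> = {
  architect: { quality: "opus", balanced: "opus", budget: "sonnet" },
  worker: { quality: "opus", balanced: "sonnet", budget: "sonnet" },
  verifier: { quality: "sonnet", balanced: "sonnet", budget: "haiku" },
  orchestrator: { quality: "sonnet", balanced: "sonnet", budget: "haiku" },
};

// Falls back to the balanced profile when none is given.
function modelFor(role: Role, profile: Profile = "balanced"): ModelFamily {
  return MODEL_TABLE[role][profile];
}
```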
See [docs/model-profiles.md](docs/model-profiles.md) for model selection strategy.
---
# Notes
The "reference" folder contains the implementation of Gastown, get-shit-done and ccswitch (a cli tool to use multiple claude code accounts).
---
# Core Principles
## Task Decomposition
Breaking large goals into detailed instructions for agents. Supported by Tasks, Jobs, Workflows, and Pipelines. Ensures work is decomposed into trackable, atomic units that agents can execute autonomously.
See [docs/task-granularity.md](docs/task-granularity.md) for task specification standards.
## Pull Model
"If there is work in your Queue, YOU MUST RUN IT." This principle ensures agents autonomously proceed with available work without waiting for external input. The heartbeat of autonomous operation.
## Eventual Completion
The overarching goal ensuring useful outcomes through orchestration of potentially unreliable processes. Persistent Tasks and oversight agents (Monitor, Supervisor) guarantee eventual workflow completion even when individual operations may fail or produce varying results.
## Context Engineering
Agent output quality degrades predictably as context fills. This is a first-class concern:
* **0-30% context**: Peak quality (thorough, comprehensive)
* **30-50% context**: Good quality (solid work)
* **50-70% context**: Degrading (shortcuts appear)
* **70%+ context**: Poor quality (rushed, minimal)
**Rule: Stay UNDER 50% context.** Plans sized to fit ~50%. Workers get fresh context per task. Orchestrator stays at 30-40% with heavy work in subagent contexts.
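The quality bands and the 50% rule can be expressed as a small classifier. This sketch assumes context usage is measured as tokens used divided by the model's context window. The function names, and the choice that each band includes its lower bound, are assumptions.

```typescript
type ContextBand = "peak" | "good" | "degrading" | "poor";

// Map a context fill ratio to the quality bands listed above.
function contextBand(usedTokens: number, windowTokens: number): ContextBand {
  const ratio = usedTokens / windowTokens;
  if (ratio < 0.3) return "peak";
  if (ratio < 0.5) return "good";
  if (ratio < 0.7) return "degrading";
  return "poor";
}

// The rule: a plan or task should fit in under half the window.
function fitsBudget(estimatedTokens: number, windowTokens: number): boolean {
  return estimatedTokens < windowTokens * 0.5;
}
```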
See [docs/context-engineering.md](docs/context-engineering.md) for context management rules.
## Goal-Backward Verification
Task completion ≠ Goal achievement. Verification confirms observable outcomes, not checkbox completion. Each phase ends with goal-backward verification checking observable truths, required artifacts, and required wiring.
See [docs/verification.md](docs/verification.md) for verification patterns.
## Deviation Rules
Workers encounter unexpected issues during execution. Four rules govern autonomous action:
* **Rule 1**: Auto-fix bugs (no permission needed)
* **Rule 2**: Auto-add missing critical functionality (no permission needed)
* **Rule 3**: Auto-fix blocking issues (no permission needed)
* **Rule 4**: ASK about architectural changes (permission required)
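The four rules reduce to a single decision: only architectural changes need permission. This sketch uses a hypothetical `DeviationKind` type. How a Worker classifies a deviation in practice is described in docs/deviation-rules.md and is not modelled here.

```typescript
// Hypothetical categories matching Rules 1-4 above.
type DeviationKind = "bug" | "missing_critical" | "blocking_issue" | "architectural";

interface Decision {
  rule: 1 | 2 | 3 | 4;
  needsPermission: boolean;
}

function decide(kind: DeviationKind): Decision {
  switch (kind) {
    case "bug":
      return { rule: 1, needsPermission: false };
    case "missing_critical":
      return { rule: 2, needsPermission: false };
    case "blocking_issue":
      return { rule: 3, needsPermission: false };
    case "architectural":
      return { rule: 4, needsPermission: true }; // stop and ask before proceeding
    default: {
      // Exhaustiveness check: fails to compile if a new kind is added without a rule.
      const unreachable: never = kind;
      throw new Error(`unknown deviation kind: ${String(unreachable)}`);
    }
  }
}
```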
See [docs/deviation-rules.md](docs/deviation-rules.md) for detailed guidance.
---
# Environments
## Workspace
The shared environment where all users operate. The Workspace coordinates all agents across multiple Projects and houses workspace-level agents like Orchestrator and Supervisor. It defines the boundaries, infrastructure, and rules of interaction between agents, projects, and resources.
## Project
A self-contained repository under Workspace management. Each Project has its own Workers, Integrator, Monitor, and Team members. Projects define goals, constraints, and context for users working on a specific problem or domain. This is where actual development work happens.
---
# Workspace-Level Roles
## Codewalker
A human operator. Users are the primary inhabitants of the Workspace. They control the system and make final decisions.
## Orchestrator
The coordinating authority of the Workspace. Responsible for initiating Jobs, coordinating work distribution, and notifying users of important events. The Orchestrator operates from the workspace level and has visibility across all Projects.
## Supervisor
A daemon process running continuous health-check cycles. The Supervisor ensures agent activity, monitors system health, and triggers recovery when agents become unresponsive.
## Helpers
The Supervisor's pool of maintenance agents handling background tasks like cleanup, health checks, and system maintenance.
## Watchdog
A special Helper that checks the Supervisor periodically, ensuring the monitor itself is still running. Creates a chain of accountability.
---
# Project-Level Roles
## Worker
An ephemeral agent optimized for execution. Workers are spawned for specific tasks, perform focused work such as coding, analysis, or integration. They work in isolated git worktrees to avoid conflicts, produce Merge Requests, and are cleaned up after completion.
Workers follow deviation rules and create atomic commits per task. See [docs/agents/worker.md](docs/agents/worker.md) for the full agent prompt.
## Integrator
Manages the Merge Queue for a Project. The Integrator handles merging changes from Workers, resolving conflicts, and ensuring code quality before changes reach the main branch.
## Monitor
Observes execution and lifecycle events within a Project. Monitors detect failures, enforce limits, oversee Workers and the Integrator, and ensure system health. Can trigger recovery actions when needed.
## Team
Long-lived, named agents for persistent collaboration. Unlike ephemeral Workers, Team members maintain context across sessions and are ideal for ongoing work relationships and complex multi-session tasks.
## Architect
Analysis agent for initiative planning. Architects iterate on initiative drafts with the user through structured questioning. They validate integration with existing codebase, refine technical concepts, and produce work plans broken into phases. Architects don't execute—they plan.
See [docs/agents/architect.md](docs/agents/architect.md) for the full agent prompt and workflow.
## Verifier
Validation agent that confirms goals are achieved, not just tasks completed. Verifiers run goal-backward verification after phase execution, checking observable truths, required artifacts, and required wiring. They identify gaps and create remediation tasks when needed.
Key responsibilities:
* **Goal-backward verification** — Check outcomes, not activities
* **Three-level checks** — Existence, substance, wiring
* **Anti-pattern scanning** — TODOs, stubs, empty returns
* **User acceptance testing** — Walk users through deliverables
* **Remediation** — Create targeted fix tasks when gaps found
See [docs/agents/verifier.md](docs/agents/verifier.md) for the full agent prompt and verification patterns.
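The anti-pattern scan is the easiest verifier check to automate. Below is a rough sketch that assumes a line-based regex pass over source text is good enough as a first filter. The patterns and the `scanForStubs` name are illustrative. A real verifier would also run the existence, substance, and wiring checks.

```typescript
interface Finding {
  line: number; // 1-based line number
  kind: "todo" | "stub" | "empty_return";
  text: string;
}

// Illustrative patterns only; tune per language and codebase.
const PATTERNS: Array<[Finding["kind"], RegExp]> = [
  ["todo", /\b(TODO|FIXME)\b/],
  ["stub", /throw new Error\(["'`]not implemented/i],
  ["empty_return", /return\s*(null|undefined|\{\}|\[\])?\s*;/],
];

// Report every line that matches one of the anti-pattern regexes.
function scanForStubs(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, i) => {
    for (const [kind, re] of PATTERNS) {
      if (re.test(text)) findings.push({ line: i + 1, kind, text: text.trim() });
    }
  });
  return findings;
}
```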
---
# Work Units
## Task
The atomic unit of work. SQLite-backed work item with dependency tracking. Tasks link actions, state changes, and artifacts across the Workspace with precision and traceability. They can represent issues, tickets, jobs, or any trackable work item.
## Template
A reusable workflow definition. TOML-based source file describing how tasks are structured, sequenced, and executed across agents. Templates define patterns for common operations like health checks, code review, or deployment.
## Schema
A template class for instantiating Pipelines. Schemas define the structure and steps of a workflow without being tied to specific work items.
## Pipeline
Durable chained Task workflows. Pipelines represent multi-step processes where each step is tracked as a Task. They survive agent restarts and ensure complex workflows complete.
## Ephemeral
Temporary Tasks destroyed after runs. Ephemerals are lightweight work items used for transient operations that don't need permanent tracking.
## Queue
A pinned Task list for each agent. The Queue is an agent's primary work source: when work appears in your Queue, the Pull Model dictates you must run it.
---
# Workflow Commands
## Job
A coordinated group of tasks executed together. The primary work-order wrapping related Tasks. Jobs allow related work to be dispatched, tracked, and completed as a single operational unit.
## Assign
The act of putting work on an agent's Queue. Assign translates intent into action, sending Workers or Team members into motion.
## Notify
Real-time messaging between agents. Allows immediate communication without going through formal channels. Quick pings and status updates.
## Handoff
Agent session refresh. When context gets full or an agent needs a fresh start, Handoff transfers work state to a new session while preserving critical context.
## Replay
Querying previous sessions for context. Replay allows agents to access their predecessors' decisions and context from earlier work.
## Poll
Ephemeral loop maintaining system heartbeat. Poll cycles (Supervisor, Monitor) continuously run health checks and trigger actions as needed.
---
# Storage & Memory
## Context Store
A persistent store of memory, context, and knowledge. Preserves state across executions, enabling agents to remember decisions, history, and learned insights.
## Audit Log
The authoritative record of system state and history. Ensures reproducibility, auditing, and continuity across operations.
## Sandbox
A personal workspace for an agent. Contains tools, local context, and temporary state used during active reasoning and execution.
## Config
The configuration and rule set governing a Project or the Workspace. Defines behavior, permissions, and operational constraints.
---
# Documentation Index
## Modules
* [docs/tasks.md](docs/tasks.md) — Task schema, CLI, and workflows
* [docs/initiatives.md](docs/initiatives.md) — Initiative lifecycle and phase management
## Operational Concepts
* [docs/context-engineering.md](docs/context-engineering.md) — Context budget rules and quality curve
* [docs/verification.md](docs/verification.md) — Goal-backward verification patterns
* [docs/deviation-rules.md](docs/deviation-rules.md) — How agents handle unexpected work
* [docs/task-granularity.md](docs/task-granularity.md) — Task specification standards
* [docs/session-state.md](docs/session-state.md) — Session continuity and handoffs
* [docs/execution-artifacts.md](docs/execution-artifacts.md) — PLAN, SUMMARY, VERIFICATION files
* [docs/model-profiles.md](docs/model-profiles.md) — Model selection by role
## Agent Prompts
* [docs/agents/architect.md](docs/agents/architect.md) — Planning and decomposition
* [docs/agents/worker.md](docs/agents/worker.md) — Task execution
* [docs/agents/verifier.md](docs/agents/verifier.md) — Goal-backward verification
Providers are configured as data in `apps/server/agent/providers/presets.ts`. Adding a new provider means adding an entry to the presets object.
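Below is a sketch of what a data-driven preset entry might look like. The `ProviderPreset` fields and the `resumeArgs` helper are hypothetical, and the actual shape in `presets.ts` may differ. Only the CLI names and resume mechanisms come from the Supported Providers table. Placing the session ID after the resume flag or subcommand is an assumption.

```typescript
// Hypothetical preset shape; field names are illustrative.
interface ProviderPreset {
  command: string;
  // How to resume a session: via a flag, via a subcommand, or not at all (undefined).
  resume?: { kind: "flag"; flag: string } | { kind: "subcommand"; name: string };
}

const presets: Record<string, ProviderPreset> = {
  claude: { command: "claude", resume: { kind: "flag", flag: "--resume" } },
  codex: { command: "codex", resume: { kind: "subcommand", name: "resume" } },
  auggie: { command: "aug" }, // no resume support per the table
};

// Build the argv for resuming a session, or undefined if the provider cannot resume.
function resumeArgs(name: string, sessionId: string): string[] | undefined {
  const preset = presets[name];
  if (preset === undefined || preset.resume === undefined) return undefined;
  const r = preset.resume;
  return r.kind === "flag"
    ? [preset.command, r.flag, sessionId]
    : [preset.command, r.name, sessionId];
}
```

Because the entries are plain data, adding a provider means adding one object. No new branching logic is needed.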
## CLI Reference
```
cw --server [-p port] Start coordination server
cw init Initialize workspace (.cwrc)
cw status Server health check
cw id [-n count] Generate nanoid(s) offline
cw agent spawn <prompt> --task <id> [--provider <name>]
cw agent stop|delete|list|get|resume|result <name>
cw initiative create|list|get|phases <name|id>
cw architect discuss|plan|detail|refine <id>
cw phase add-dependency --phase <id> --depends-on <id>
cw phase queue|dispatch|queue-status|dependencies <id>
cw task list|get|status <id>
cw dispatch queue|next|status|complete <id>
cw project register --name <n> --url <u>
cw project list|delete|sync|status [name|id]
cw account add|list|remove|refresh|extract [id]
cw preview start|stop|list|status|setup [id]
cw listen --agent-id <id>
cw ask <question> --from <id> --agent-id <target>
cw answer <response> --conversation-id <id>
```
## Workflow Overview
```
1. Create initiative cw initiative create "Feature X"
2. Plan with architect cw architect discuss <id> --> plan --> detail
3. Accept proposals (web UI: review & accept phase/task proposals)
4. Approve phases (web UI: approve phases for execution)
5. Dispatch (web UI: queue phases, auto-dispatch tasks to agents)
6. Agents execute (parallel, isolated worktrees, auto-retry on crash)
7. Review (web UI: diff viewer, inline comments, approve/request changes)
8. Merge (auto or manual per execution mode)
9. Complete (push branch or merge into default branch)
```
**Execution modes:**
- **YOLO** — Phases auto-merge on completion, next phase auto-dispatches. No gates.
- **Review per phase** (default) — Each completed phase pauses for human review. Approve to merge and continue.
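The two execution modes differ only in what happens when a phase completes. Below is a minimal sketch with hypothetical action names. In the real system this coordination runs through the event bus and the dispatch managers shown in the Architecture section.

```typescript
type ExecutionMode = "yolo" | "review_per_phase";

// Hypothetical next-step actions after a phase finishes executing.
type NextStep =
  | { action: "merge_and_dispatch_next"; phaseId: string }
  | { action: "await_review"; phaseId: string };

function onPhaseCompleted(mode: ExecutionMode, phaseId: string): NextStep {
  return mode === "yolo"
    ? { action: "merge_and_dispatch_next", phaseId } // no gate
    : { action: "await_review", phaseId };           // pause until a human approves
}

// In review mode, approval triggers the merge and the next dispatch.
function onReviewApproved(phaseId: string): NextStep {
  return { action: "merge_and_dispatch_next", phaseId };
}
```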
## Development
```sh
npm run dev # Watch mode (server)
npm run dev:web # Vite dev server (frontend)
npm run build # TypeScript compilation
npm link # Link CLI globally after build
```
After any change to server code (`apps/server/**`), run `npm run build && npm link`.
## Testing
```sh
npm test # Unit + E2E (no API cost)
npm test -- <file> # Run specific test file
# Record cassettes (one-time API cost)
CW_CASSETTE_RECORD=1 npm test -- <test-file>
# Real provider integration tests (~$0.50)
REAL_CLAUDE_TESTS=1 npm test -- apps/server/test/integration/real-providers/ --test-timeout=300000
```
The **cassette system** records real agent subprocess interactions and replays them deterministically. Full-pipeline E2E tests run at zero API cost after initial recording. See [docs/testing.md](docs/testing.md).
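The record/replay idea behind cassettes fits in a few lines. This is a sketch under stated assumptions, not the project's cassette format:

- Interactions are keyed by a SHA-256 hash of the prompt.
- They are held in an in-memory map.
- Recording is a constructor flag, which a test harness might derive from `CW_CASSETTE_RECORD`.

`Cassette`, `AgentRunner`, and `fromRecording` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

type AgentRunner = (prompt: string) => Promise<string>;

// Stores recorded agent outputs keyed by a hash of the input prompt.
class Cassette {
  private readonly entries = new Map<string, string>();
  private readonly record: boolean;

  constructor(record: boolean) {
    this.record = record;
  }

  private key(prompt: string): string {
    return createHash("sha256").update(prompt).digest("hex");
  }

  // Record mode: call the real runner and save its output.
  // Replay mode: return the saved output without touching the real runner.
  async run(prompt: string, realRunner: AgentRunner): Promise<string> {
    const k = this.key(prompt);
    if (this.record) {
      const output = await realRunner(prompt);
      this.entries.set(k, output);
      return output;
    }
    const saved = this.entries.get(k);
    if (saved === undefined) throw new Error("no recording for this prompt; re-run in record mode");
    return saved;
  }

  // Persistence is out of scope here; a real cassette would write these entries to disk.
  export(): Record<string, string> {
    return Object.fromEntries(this.entries);
  }

  static fromRecording(data: Record<string, string>): Cassette {
    const c = new Cassette(false);
    for (const [k, v] of Object.entries(data)) c.entries.set(k, v);
    return c;
  }
}
```

Keying on the input makes replay deterministic. Any change to the prompt misses the cassette and fails loudly instead of returning stale output.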
## Documentation
| Topic | Link |
|-------|------|
| Architecture | [docs/architecture.md](docs/architecture.md) |
| Agent lifecycle, providers, accounts | [docs/agent.md](docs/agent.md) |
| Database schema & repositories | [docs/database.md](docs/database.md) |
| Server & tRPC API (70+ procedures) | [docs/server-api.md](docs/server-api.md) |
| Frontend & components | [docs/frontend.md](docs/frontend.md) |
| CLI commands & configuration | [docs/cli-config.md](docs/cli-config.md) |
| Dispatch & events (58 event types) | [docs/dispatch-events.md](docs/dispatch-events.md) |
| Git, process management, logging | [docs/git-process-logging.md](docs/git-process-logging.md) |
| Docker preview deployments | [docs/preview.md](docs/preview.md) |
| Testing & cassette system | [docs/testing.md](docs/testing.md) |
| Database migrations | [docs/database-migrations.md](docs/database-migrations.md) |
| Logging guide | [docs/logging.md](docs/logging.md) |
## Tech Stack
- **Runtime:** Node.js (ESM), TypeScript
- **Database:** SQLite via better-sqlite3 + Drizzle ORM
- **API:** tRPC v11 with SSE subscriptions
- **Frontend:** React 19, TanStack Router, Tailwind CSS, shadcn/ui, Tiptap
- **Process:** execa (detached child processes)
- **Git:** simple-git (worktrees, branches, merges)
- **Logging:** pino (structured JSON)
- **Testing:** vitest