docs: Rewrite README as proper project documentation

Replace the original design document / brainstorm with a focused project README covering what Codewalkers does, getting started, architecture, supported providers, CLI reference, workflow, development, testing, and links to detailed docs.
2026-03-06 16:09:56 +01:00
parent c069049c35
commit e4b750ceb9
1 changed file with 203 additions and 278 deletions
--- a/README.md
+++ b/README.md
@@ -1,311 +1,236 @@
 # Codewalkers

-# Project concept
+Multi-agent workspace for orchestrating multiple AI coding agents working in parallel on a shared codebase.

-Codewalkers is a multi-agent workspace inspired by gastown. It works differently in the following ways:
-* Subagents (e.g. Workers) that handle tasks run in -p mode and respond with a clear json schema
-* One cw (codewalk) web server is running that is also managing the agents
-* There shall be a clear post worktree setup hook that by default copies files (e.g. .env files) prepared inside a dedicated folder in the Project
-* It shall support multiple claude code accounts (see ccswitch) and switch them as they run into usage limits
-* It shall have a web dashboard at some point in the project
-* The project shall start with a file based UI. That is a folder structure representing the data of the project refreshed when saving (fs events) and updated when db data changes (use events as trigger). The fsui shall be started with `cw fsui` which instantiate a bidirectional watcher that subscribes to the events via a websocket
-* It shall support integration branches that Workers clone their work from and integrate branches into
-* It shall base all it's larger development work on initiatives. Initiatives describe a larger amount of work. The concept from the user must follow a formal planning process where the work is verified for integration into the existing codebase, a sophisticated technical concept is created. An initiative is only started once approved by a developer. Analysis work is performed by Architects.
-* The project shall use a global SQlite DB which also manages tasks
-* It shall have a cli (the cli shall also be the server application only that it only works as a cli when not run with --server). The cli shall be called "cw"
-* The communication from and between agents shall happen using an STDIO based mcp that is also implemented in the main binary. e.g. cw mcp
+Codewalkers coordinates agents from different providers (Claude, Codex, Gemini, Cursor, AMP, Auggie, OpenCode) through a unified CLI and web dashboard. It manages the full lifecycle: planning initiatives, decomposing work into phases and tasks, dispatching agents to isolated git worktrees, merging results, and reviewing changes — all from a single `cw` command.

+## Key Features

---
+- **Multi-provider agents** — Data-driven provider configs. Add new providers without code changes.
+- **Initiative workflow** — Three-level hierarchy (Initiative > Phase > Task) with dependency DAGs and topological ordering.
+- **Architect agents** — AI-driven planning: discuss, plan, detail, and refine modes produce structured proposals you accept or dismiss.
+- **Git worktree isolation** — Each agent works in its own worktree. No conflicts during parallel execution.
+- **Account rotation** — Register multiple provider accounts. On usage-limit exhaustion, automatically fails over to the next available account (LRU scheduling).
+- **Two execution modes** — *YOLO* auto-merges everything. *Review-per-phase* adds manual approval gates with inline code review, threaded comments, and diff viewing.
+- **Docker preview deployments** — Zero-config preview environments via `.cw-preview.yml`, `docker-compose.yml`, or bare `Dockerfile`. Caddy reverse proxy with health checks.
+- **Inter-agent communication** — Agents ask each other questions via a conversation system. Idle agents auto-resume when a question arrives.
+- **Chat sessions** — Persistent iterative refinement loops where you send messages and agents apply changes with revertable changesets.
+- **Real-time dashboard** — React web UI with live agent output streaming, pipeline visualization, Tiptap page editor, and command palette.
+- **Event-driven architecture** — 58+ typed domain events drive all coordination. No polling loops.
+- **Cassette testing** — Record/replay agent interactions for full-pipeline E2E tests at zero API cost.

-# Implementation considerations
+## Getting Started

-* Typescript as a programming language
-* Trpc as an API layer
-* React with shadcn & tanstack router for the frontend running with vite. Tiptap for markdown editor UIs
-* Simple deployment (one web server serving front and backend in deployed mode - in dev the frontend may use a dev server for hot reloads). The app shall just be startable by installing the cli and then running it with --server. No more setup needed. The local frontend dev server shall be proxied through the backend in the same path as the compiled frontend would be served in production mode
-* SQLite as a database
-* Future support for multi user management (for now only one user implement a stub)
-* Hexagonal architecture
-* Built as a modular monolith with clear separation between modules incl. event bus (can be process internal with swappable adapter for the future)
+### Prerequisites

---
+- Node.js 20+
+- Git 2.38+ (for `merge-tree --write-tree`)
+- Docker (optional, for preview deployments)
+- At least one supported AI coding CLI installed (e.g., `claude`, `codex`, `gemini`)
+
+### Install

-# Modules
+```sh
+git clone <repo-url> && cd codewalk-district
+npm install
+npm run build
+npm link
+```

-## Tasks
+This makes the `cw` CLI available globally.

-Beads-inspired task management for agent coordination. Centralized SQLite storage (not Git-distributed like beads).
+### Initialize a workspace

-Key features:
-* **Status workflow**: `open` → `in_progress` → `blocked` | `closed`
-* **Priority system**: P0 (critical) through P3 (low)
-* **Dependency graph**: Tasks block other tasks; `ready` query finds actionable work
-* **Assignment tracking**: Prevents multiple agents claiming same task
-* **Audit history**: All state changes logged for debugging
+```sh
+cd your-project
+cw init
+```

-CLI mirrors beads: `cw task ready`, `cw task create`, `cw task close`, etc.
+This creates a `.cwrc` config file that marks the workspace root.

-See [docs/tasks.md](docs/tasks.md) for schema and CLI reference.
+### Register a project

-## Initiatives
+```sh
+cw project register --name my-app --url /path/to/repo
+```

-Notion-like document hierarchy for planning larger features. SQLite-backed with parent-child relationships for structured queries (e.g., "all subpages of initiative X", "inventory of all documents").
+### Add a provider account

-Key features:
-* **Lifecycle**: `draft` → `review` → `approved` → `in_progress` → `completed`
-* **Nested pages**: User journeys, business rules, technical concepts, architectural changes
-* **Phased work plans**: Approved initiatives generate tasks grouped into phases
-* **Rolling approval**: User approves phase plans one-by-one; agents execute approved phases while subsequent phases are reviewed
+```sh
+# Auto-extract from current Claude login
+cw account add

-Workflow: User drafts → Architect iterates (GSD-style questioning) → Approval or draft extension and further iterations with the Architect → Tasks created with `initiative_id` + `phase` → Execute
+# Or register with a setup token
+cw account add --token <token> --email user@example.com
+```

-See [docs/initiatives.md](docs/initiatives.md) for schema and workflow details.
+### Start the server

-## Domain Layer
+```sh
+cw --server
+```

-DDD-based documentation of the **as-is state** for agent and human consumption. Initiatives reference and modify domain concepts; completed initiatives update the domain layer to reflect the new state.
+Starts the coordination server on `localhost:3847`. The web dashboard is served at the same address.

-**Scope**: Per-project domains or cross-project domains (features spanning multiple projects).
+### Create an initiative and start working

-**Core concepts tracked:**
-* **Bounded Contexts** — scope boundaries defining where a domain model applies
-* **Aggregates** — consistency boundaries, what changes together
-* **Domain Events** — events exposed by the project that trigger workflows or side effects
-* **Business Rules & Invariants** — constraints that must always hold; agents must preserve these
-* **Ubiquitous Language** — glossary of domain terms to prevent agent misinterpretation
-* **Context Maps** — relationships between bounded contexts (especially for cross-project domains)
-* **External Integrations** — systems the domain interacts with but doesn't own
+```sh
+cw initiative create "Add user authentication"
+cw architect discuss <initiative-id>
+```

-**Codebase mapping**: Each concept links to folder/module paths. Auto-maintained by agents after implementation work.
+From the web dashboard, accept the architect's proposals, approve phases, and dispatch execution.

-**Storage**: Dual adapter support — SQLite tables (structured queries) or Markdown with YAML frontmatter (human-readable, version-controllable).
+## Architecture

-## Orchestrator
+```
+CLI (cw)
+  +-- CoordinationServer
+        |-- HTTP + tRPC API (70+ procedures)
+        |-- EventBus (58 typed events)
+        |-- MultiProviderAgentManager
+        |     |-- ProcessManager (detached child processes)
+        |     |-- WorktreeManager (git worktrees per agent)
+        |     |-- OutputHandler (JSONL stream parsing)
+        |     +-- LifecycleController (retry, signal recovery)
+        |-- DispatchManager (task queue, dependency DAG)
+        |-- PhaseDispatchManager (phase queue, topological sort)
+        |-- ExecutionOrchestrator (end-to-end coordination)
+        |-- PreviewManager (Docker compose, Caddy proxy)
+        +-- 14 Repository ports (SQLite/Drizzle adapters)

-Main orchestrator loop handling coordination across agents. Can be split per project or initiative for load balancing in the future.
+Web UI (React 19)
+  +-- TanStack Router + tRPC React Query
+        |-- Initiative management & page editor (Tiptap)
+        |-- Pipeline visualization (phase DAG)
+        |-- Execution tab (task dispatch, live agent output)
+        +-- Review tab (diffs, inline comments, approval)
+```

-## Session State
+**Monorepo layout:**

-Tracks execution state across agent restarts. Unlike Domain Layer (codebase state), session state tracks position, decisions, and blockers.
+| Path | Description |
+|------|-------------|
+| `apps/server/` | CLI, coordination server, agent management, all backend modules |
+| `apps/web/` | React dashboard (Vite + Tailwind + shadcn/ui) |
+| `packages/shared/` | Shared TypeScript types between server and web |

-**STATE.md** maintains:
-* Current position (phase, plan, task, wave)
-* Decisions made (locked choices with reasoning)
-* Active blockers (what's waiting, workarounds)
-* Session history (who worked on what, when)
+**Hexagonal architecture:** Repository ports define data access interfaces, and Drizzle/SQLite adapters implement them. Adapters can be swapped without touching business logic.

-See [docs/session-state.md](docs/session-state.md) for session state management.
+## Supported Providers

---
+| Provider | CLI | Resume | Structured Output |
+|----------|-----|--------|-------------------|
+| Claude | `claude` | `--resume` | Prompt-based |
+| Codex | `codex` | `codex resume` | `--output-schema` |
+| Gemini | `gemini` | `--resume` | `--output-format` |
+| Cursor | `cursor-agent` | — | `--output-format` |
+| AMP | `amp` | `--thread` | `--json` |
+| Auggie | `aug` | — | — |
+| OpenCode | `opencode` | — | `--format` |

-# Model Profiles
-
-Different agent roles have different needs. Model selection balances quality, cost, and latency.
-
-| Profile | Use Case | Cost | Quality |
-|---------|----------|------|---------|
-| **quality** | Critical decisions, architecture | Highest | Best |
-| **balanced** | Default for most work | Medium | Good |
-| **budget** | High-volume, low-risk tasks | Lowest | Acceptable |
-
-| Agent | Quality | Balanced (Default) | Budget |
-|-------|---------|-------------------|--------|
-| Architect | Opus | Opus | Sonnet |
-| Worker | Opus | Sonnet | Sonnet |
-| Verifier | Sonnet | Sonnet | Haiku |
-| Orchestrator | Sonnet | Sonnet | Haiku |
-
-See [docs/model-profiles.md](docs/model-profiles.md) for model selection strategy.
-
---
-
-# Notes
-
-The "reference" folder contains the implementation of Gastown, get-shit-done and ccswitch (a cli tool to use multiple claude code accounts).
-
---
-
-# Core Principles
-
-## Task Decomposition
-Breaking large goals into detailed instructions for agents. Supported by Tasks, Jobs, Workflows, and Pipelines. Ensures work is decomposed into trackable, atomic units that agents can execute autonomously.
-
-See [docs/task-granularity.md](docs/task-granularity.md) for task specification standards.
-
-## Pull Model
-"If there is work in your Queue, YOU MUST RUN IT." This principle ensures agents autonomously proceed with available work without waiting for external input. The heartbeat of autonomous operation.
-
-## Eventual Completion
-The overarching goal ensuring useful outcomes through orchestration of potentially unreliable processes. Persistent Tasks and oversight agents (Monitor, Supervisor) guarantee eventual workflow completion even when individual operations may fail or produce varying results.
-
-## Context Engineering
-Agent output quality degrades predictably as context fills. This is a first-class concern:
-* **0-30% context**: Peak quality (thorough, comprehensive)
-* **30-50% context**: Good quality (solid work)
-* **50-70% context**: Degrading (shortcuts appear)
-* **70%+ context**: Poor quality (rushed, minimal)
-
-**Rule: Stay UNDER 50% context.** Plans sized to fit ~50%. Workers get fresh context per task. Orchestrator stays at 30-40% with heavy work in subagent contexts.
-
-See [docs/context-engineering.md](docs/context-engineering.md) for context management rules.
-
-## Goal-Backward Verification
-Task completion ≠ Goal achievement. Verification confirms observable outcomes, not checkbox completion. Each phase ends with goal-backward verification checking observable truths, required artifacts, and required wiring.
-
-See [docs/verification.md](docs/verification.md) for verification patterns.
-
-## Deviation Rules
-Workers encounter unexpected issues during execution. Four rules govern autonomous action:
-* **Rule 1**: Auto-fix bugs (no permission needed)
-* **Rule 2**: Auto-add missing critical functionality (no permission needed)
-* **Rule 3**: Auto-fix blocking issues (no permission needed)
-* **Rule 4**: ASK about architectural changes (permission required)
-
-See [docs/deviation-rules.md](docs/deviation-rules.md) for detailed guidance.
-
---
-
-# Environments
-
-## Workspace
-The shared environment where all users operate. The Workspace coordinates all agents across multiple Projects and houses workspace-level agents like Orchestrator and Supervisor. It defines the boundaries, infrastructure, and rules of interaction between agents, projects, and resources.
-
-## Project
-A self-contained repository under Workspace management. Each Project has its own Workers, Integrator, Monitor, and Team members. Projects define goals, constraints, and context for users working on a specific problem or domain. This is where actual development work happens.
-
---
-
-# Workspace-Level Roles
-
-## Codewalker
-A human operator. Users are the primary inhabitants of the Workspace. They control the system and make final decisions.
-
-## Orchestrator
-The coordinating authority of the Workspace. Responsible for initiating Jobs, coordinating work distribution, and notifying users of important events. The Orchestrator operates from the workspace level and has visibility across all Projects.
-
-## Supervisor
-Daemon process running continuous health check cycles. The Supervisor ensures agent activity, monitors system health, and triggers recovery when agents become unresponsive.
-
-## Helpers
-The Supervisor's pool of maintenance agents handling background tasks like cleanup, health checks, and system maintenance.
-
-## Watchdog
-A special Helper that checks the Supervisor periodically, ensuring the monitor itself is still running. Creates a chain of accountability.
-
---
-
-# Project-Level Roles
-
-## Worker
-An ephemeral agent optimized for execution. Workers are spawned for specific tasks, perform focused work such as coding, analysis, or integration. They work in isolated git worktrees to avoid conflicts, produce Merge Requests, and are cleaned up after completion.
-
-Workers follow deviation rules and create atomic commits per task. See [docs/agents/worker.md](docs/agents/worker.md) for the full agent prompt.
-
-## Integrator
-Manages the Merge Queue for a Project. The Integrator handles merging changes from Workers, resolving conflicts, and ensuring code quality before changes reach the main branch.
-
-## Monitor
-Observes execution and lifecycle events within a Project. Monitors detect failures, enforce limits, oversee Workers and the Integrator, and ensure system health. Can trigger recovery actions when needed.
-
-## Team
-Long-lived, named agents for persistent collaboration. Unlike ephemeral Workers, Team members maintain context across sessions and are ideal for ongoing work relationships and complex multi-session tasks.
-
-## Architect
-Analysis agent for initiative planning. Architects iterate on initiative drafts with the user through structured questioning. They validate integration with existing codebase, refine technical concepts, and produce work plans broken into phases. Architects don't execute—they plan.
-
-See [docs/agents/architect.md](docs/agents/architect.md) for the full agent prompt and workflow.
-
-## Verifier
-Validation agent that confirms goals are achieved, not just tasks completed. Verifiers run goal-backward verification after phase execution, checking observable truths, required artifacts, and required wiring. They identify gaps and create remediation tasks when needed.
-
-Key responsibilities:
-* **Goal-backward verification** — Check outcomes, not activities
-* **Three-level checks** — Existence, substance, wiring
-* **Anti-pattern scanning** — TODOs, stubs, empty returns
-* **User acceptance testing** — Walk users through deliverables
-* **Remediation** — Create targeted fix tasks when gaps found
-
-See [docs/agents/verifier.md](docs/agents/verifier.md) for the full agent prompt and verification patterns.
-
---
-
-# Work Units
-
-## Task
-The atomic unit of work. SQLite-backed work item with dependency tracking. Tasks link actions, state changes, and artifacts across the Workspace with precision and traceability. They can represent issues, tickets, jobs, or any trackable work item.
-
-## Template
-A reusable workflow definition. TOML-based source file describing how tasks are structured, sequenced, and executed across agents. Templates define patterns for common operations like health checks, code review, or deployment.
-
-## Schema
-A template class for instantiating Pipelines. Schemas define the structure and steps of a workflow without being tied to specific work items.
-
-## Pipeline
-Durable chained Task workflows. Pipelines represent multi-step processes where each step is tracked as a Task. They survive agent restarts and ensure complex workflows complete.
-
-## Ephemeral
-Temporary Tasks destroyed after runs. Ephemerals are lightweight work items used for transient operations that don't need permanent tracking.
-
-## Queue
-A pinned Task list for each agent. The Queue is an agent's primary work source - when work appears in your Queue, the Pull Model dictates you must run it.
-
---
-
-# Workflow Commands
-
-## Job
-A coordinated group of tasks executed together. The primary work-order wrapping related Tasks. Jobs allow related work to be dispatched, tracked, and completed as a single operational unit.
-
-## Assign
-The act of putting work on an agent's Queue. Assign translates intent into action, sending Workers or Team members into motion.
-
-## Notify
-Real-time messaging between agents. Allows immediate communication without going through formal channels. Quick pings and status updates.
-
-## Handoff
-Agent session refresh. When context gets full or an agent needs a fresh start, Handoff transfers work state to a new session while preserving critical context.
-
-## Replay
-Querying previous sessions for context. Replay allows agents to access their predecessors' decisions and context from earlier work.
-
-## Poll
-Ephemeral loop maintaining system heartbeat. Poll cycles (Supervisor, Monitor) continuously run health checks and trigger actions as needed.
-
---
-
-# Storage & Memory
-
-## Context Store
-A persistent store of memory, context, and knowledge. Preserves state across executions, enabling agents to remember decisions, history, and learned insights.
-
-## Audit Log
-The authoritative record of system state and history. Ensures reproducibility, auditing, and continuity across operations.
-
-## Sandbox
-A personal workspace for an agent. Contains tools, local context, and temporary state used during active reasoning and execution.
-
-## Config
-The configuration and rule set governing a Project or the Workspace. Defines behavior, permissions, and operational constraints.
-
---
-
-# Documentation Index
-
-## Modules
-* [docs/tasks.md](docs/tasks.md) — Task schema, CLI, and workflows
-* [docs/initiatives.md](docs/initiatives.md) — Initiative lifecycle and phase management
-
-## Operational Concepts
-* [docs/context-engineering.md](docs/context-engineering.md) — Context budget rules and quality curve
-* [docs/verification.md](docs/verification.md) — Goal-backward verification patterns
-* [docs/deviation-rules.md](docs/deviation-rules.md) — How agents handle unexpected work
-* [docs/task-granularity.md](docs/task-granularity.md) — Task specification standards
-* [docs/session-state.md](docs/session-state.md) — Session continuity and handoffs
-* [docs/execution-artifacts.md](docs/execution-artifacts.md) — PLAN, SUMMARY, VERIFICATION files
-* [docs/model-profiles.md](docs/model-profiles.md) — Model selection by role
-
-## Agent Prompts
-* [docs/agents/architect.md](docs/agents/architect.md) — Planning and decomposition
-* [docs/agents/worker.md](docs/agents/worker.md) — Task execution
-* [docs/agents/verifier.md](docs/agents/verifier.md) — Goal-backward verification
+Providers are configured as data in `apps/server/agent/providers/presets.ts`. Adding a new provider means adding an entry to the presets object.
+
+## CLI Reference
+
+```
+cw --server [-p port]          Start coordination server
+cw init                        Initialize workspace (.cwrc)
+cw status                      Server health check
+cw id [-n count]               Generate nanoid(s) offline
+
+cw agent spawn <prompt> --task <id> [--provider <name>]
+cw agent stop|delete|list|get|resume|result <name>
+
+cw initiative create|list|get|phases <name|id>
+cw architect discuss|plan|detail|refine <id>
+
+cw phase add-dependency --phase <id> --depends-on <id>
+cw phase queue|dispatch|queue-status|dependencies <id>
+
+cw task list|get|status <id>
+cw dispatch queue|next|status|complete <id>
+
+cw project register --name <n> --url <u>
+cw project list|delete|sync|status [name|id]
+
+cw account add|list|remove|refresh|extract [id]
+
+cw preview start|stop|list|status|setup [id]
+
+cw listen --agent-id <id>
+cw ask <question> --from <id> --agent-id <target>
+cw answer <response> --conversation-id <id>
+```
+
+## Workflow Overview
+
+```
+1. Create initiative        cw initiative create "Feature X"
+2. Plan with architect      cw architect discuss <id>  -->  plan  -->  detail
+3. Accept proposals         (web UI: review & accept phase/task proposals)
+4. Approve phases           (web UI: approve phases for execution)
+5. Dispatch                 (web UI: queue phases, auto-dispatch tasks to agents)
+6. Agents execute           (parallel, isolated worktrees, auto-retry on crash)
+7. Review                   (web UI: diff viewer, inline comments, approve/request changes)
+8. Merge                    (auto or manual per execution mode)
+9. Complete                 (push branch or merge into default branch)
+```
+
+**Execution modes:**
+- **YOLO** — Phases auto-merge on completion, next phase auto-dispatches. No gates.
+- **Review per phase** (default) — Each completed phase pauses for human review. Approve to merge and continue.
+
+## Development
+
+```sh
+npm run dev          # Watch mode (server)
+npm run dev:web      # Vite dev server (frontend)
+npm run build        # TypeScript compilation
+npm link             # Link CLI globally after build
+```
+
+After any change to server code (`apps/server/**`), run `npm run build && npm link`.
+
+## Testing
+
+```sh
+npm test                                    # Unit + E2E (no API cost)
+npm test -- <file>                          # Run specific test file
+
+# Record cassettes (one-time API cost)
+CW_CASSETTE_RECORD=1 npm test -- <test-file>
+
+# Real provider integration tests (~$0.50)
+REAL_CLAUDE_TESTS=1 npm test -- apps/server/test/integration/real-providers/ --test-timeout=300000
+```
+
+The **cassette system** records real agent subprocess interactions and replays them deterministically. Full-pipeline E2E tests run at zero API cost after initial recording. See [docs/testing.md](docs/testing.md).
+
+## Documentation
+
+| Topic | Link |
+|-------|------|
+| Architecture | [docs/architecture.md](docs/architecture.md) |
+| Agent lifecycle, providers, accounts | [docs/agent.md](docs/agent.md) |
+| Database schema & repositories | [docs/database.md](docs/database.md) |
+| Server & tRPC API (70+ procedures) | [docs/server-api.md](docs/server-api.md) |
+| Frontend & components | [docs/frontend.md](docs/frontend.md) |
+| CLI commands & configuration | [docs/cli-config.md](docs/cli-config.md) |
+| Dispatch & events (58 event types) | [docs/dispatch-events.md](docs/dispatch-events.md) |
+| Git, process management, logging | [docs/git-process-logging.md](docs/git-process-logging.md) |
+| Docker preview deployments | [docs/preview.md](docs/preview.md) |
+| Testing & cassette system | [docs/testing.md](docs/testing.md) |
+| Database migrations | [docs/database-migrations.md](docs/database-migrations.md) |
+| Logging guide | [docs/logging.md](docs/logging.md) |
+
+## Tech Stack
+
+- **Runtime:** Node.js (ESM), TypeScript
+- **Database:** SQLite via better-sqlite3 + Drizzle ORM
+- **API:** tRPC v11 with SSE subscriptions
+- **Frontend:** React 19, TanStack Router, Tailwind CSS, shadcn/ui, Tiptap
+- **Process:** execa (detached child processes)
+- **Git:** simple-git (worktrees, branches, merges)
+- **Logging:** pino (structured JSON)
+- **Testing:** vitest
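
Note on the added README: the "dependency DAGs and topological ordering" it lists under Key Features could look roughly like the sketch below. This is a minimal illustration using Kahn's algorithm, not the project's actual implementation. The `Phase` interface, the `dependsOn` field, and the `topoSortPhases` function are hypothetical names introduced here and do not come from the Codewalkers source.

```typescript
// Hypothetical shape; the real Codewalkers phase model may differ.
interface Phase {
  id: string;
  dependsOn: string[]; // ids of phases that must finish first
}

// Kahn's algorithm: returns phase ids in an order where every phase
// appears after all of its dependencies. Throws on cycles or unknown ids.
function topoSortPhases(phases: Phase[]): string[] {
  const inDegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();

  for (const p of phases) {
    inDegree.set(p.id, p.dependsOn.length);
    dependents.set(p.id, []);
  }
  for (const p of phases) {
    for (const dep of p.dependsOn) {
      const list = dependents.get(dep);
      if (!list) throw new Error(`Unknown dependency: ${dep}`);
      list.push(p.id);
    }
  }

  // Start with phases that have no unmet dependencies.
  const ready = phases.filter((p) => p.dependsOn.length === 0).map((p) => p.id);
  const order: string[] = [];

  while (ready.length > 0) {
    const id = ready.shift()!;
    order.push(id);
    for (const next of dependents.get(id) ?? []) {
      const remaining = inDegree.get(next)! - 1;
      inDegree.set(next, remaining);
      if (remaining === 0) ready.push(next);
    }
  }

  // Any phase left unprocessed is part of a cycle.
  if (order.length !== phases.length) throw new Error("Dependency cycle detected");
  return order;
}
```

Given phases `c` (depends on `b`), `a` (no dependencies), and `b` (depends on `a`), `topoSortPhases` returns `["a", "b", "c"]`. Phases that become ready at the same time have no ordering constraint between them, so a dispatcher could hand them to agents in parallel.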