Lukas May 269a2d2616 feat: Extend AgentInfo with exitCode + add getAgentInputFiles/getAgentPrompt tRPC procedures
Adds exitCode to AgentInfo type and propagates it through all toAgentInfo()
implementations. Enhances getAgent to also return taskName and initiativeName
from their respective repositories. Adds two new filesystem-reading tRPC
procedures for the Agent Details tab: getAgentInputFiles (reads .cw/input/
files with binary detection, 500 KB cap, sorted) and getAgentPrompt (reads
.cw/agent-logs/<name>/PROMPT.md with 1 MB cap and structured ENOENT handling).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05 21:39:29 +01:00

Codewalkers

Project concept

Codewalkers is a multi-agent workspace inspired by gastown. It works differently in the following ways:

  • Subagents (e.g. Workers) that handle tasks run in -p mode and respond using a clearly defined JSON schema
  • A single cw (codewalk) web server runs and also manages the agents
  • There shall be a clear post-worktree-setup hook that by default copies files (e.g. .env files) prepared inside a dedicated folder in the Project
  • It shall support multiple Claude Code accounts (see ccswitch) and switch between them as they run into usage limits
  • It shall have a web dashboard at some point in the project
  • The project shall start with a file-based UI: a folder structure representing the project's data, refreshed on save (fs events) and updated when DB data changes (using events as the trigger). The fsui shall be started with cw fsui, which instantiates a bidirectional watcher that subscribes to the events via a websocket
  • It shall support integration branches that Workers clone their work from and integrate branches into
  • It shall base all its larger development work on initiatives. An initiative describes a larger amount of work. The concept from the user must follow a formal planning process in which the work is verified for integration into the existing codebase and a sophisticated technical concept is created. An initiative is only started once approved by a developer. Analysis work is performed by Architects.
  • The project shall use a global SQLite DB which also manages tasks
  • It shall have a CLI called "cw". The same binary is also the server application: it acts as a CLI unless it is run with --server
  • Communication from and between agents shall happen over a stdio-based MCP server that is also implemented in the main binary, e.g. cw mcp

Implementation considerations

  • TypeScript as the programming language
  • tRPC as the API layer
  • React with shadcn & TanStack Router for the frontend, running with Vite. Tiptap for markdown editor UIs
  • Simple deployment (one web server serving front and backend in deployed mode - in dev the frontend may use a dev server for hot reloads). The app shall just be startable by installing the cli and then running it with --server. No more setup needed. The local frontend dev server shall be proxied through the backend in the same path as the compiled frontend would be served in production mode
  • SQLite as a database
  • Future support for multi-user management (for now there is only one user; implement a stub)
  • Hexagonal architecture
  • Built as a modular monolith with clear separation between modules incl. event bus (can be process internal with swappable adapter for the future)
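
As an illustration of the last point, here is a minimal sketch of a process-internal event bus behind a swappable interface. The names EventBus, InProcessEventBus and the task.closed event are hypothetical and not taken from the codebase; the real event shapes are not specified here.

```typescript
// Hypothetical event shape; real event names and payloads are assumptions.
type DomainEvent = { type: string; payload: unknown };
type Handler = (event: DomainEvent) => void;

// The port: modules depend on this interface only, never on a concrete adapter.
interface EventBus {
  publish(event: DomainEvent): void;
  subscribe(type: string, handler: Handler): () => void;
}

// In-process adapter. A future adapter (for example one backed by a message
// broker) could implement the same interface without touching the modules.
class InProcessEventBus implements EventBus {
  private handlers = new Map<string, Set<Handler>>();

  publish(event: DomainEvent): void {
    const set = this.handlers.get(event.type);
    if (!set) return; // nobody is listening for this event type
    set.forEach((handler) => handler(event));
  }

  subscribe(type: string, handler: Handler): () => void {
    const set = this.handlers.get(type) ?? new Set<Handler>();
    set.add(handler);
    this.handlers.set(type, set);
    // Return an unsubscribe function so callers can clean up.
    return () => {
      set.delete(handler);
    };
  }
}
```

Because callers only see EventBus, swapping the adapter later is a wiring change rather than a change inside each module.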

Modules

Tasks

Beads-inspired task management for agent coordination. Centralized SQLite storage (not Git-distributed like beads).

Key features:

  • Status workflow: open → in_progress → blocked | closed
  • Priority system: P0 (critical) through P3 (low)
  • Dependency graph: Tasks block other tasks; ready query finds actionable work
  • Assignment tracking: Prevents multiple agents claiming same task
  • Audit history: All state changes logged for debugging
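
The ready query can be sketched as a filter over the dependency graph. This is a minimal in-memory illustration, not the actual implementation: the Task fields (assignee, blockedBy) are hypothetical, and the real schema lives in docs/tasks.md.

```typescript
type TaskStatus = "open" | "in_progress" | "blocked" | "closed";

// Hypothetical task shape for illustration only.
interface Task {
  id: string;
  status: TaskStatus;
  priority: 0 | 1 | 2 | 3; // P0 (critical) through P3 (low)
  assignee: string | null; // set once an agent has claimed the task
  blockedBy: string[]; // ids of tasks that must be closed first
}

// A task is treated as "ready" when it is open, unclaimed, and every
// blocker is closed. Blockers with unknown ids count as not closed.
function readyTasks(tasks: Task[]): Task[] {
  const closed = new Set(
    tasks.filter((t) => t.status === "closed").map((t) => t.id),
  );
  return tasks
    .filter((t) => t.status === "open" && t.assignee === null)
    .filter((t) => t.blockedBy.every((id) => closed.has(id)))
    .sort((a, b) => a.priority - b.priority); // most critical first
}
```

In a SQLite-backed version the same condition would presumably be expressed as a query over the task and dependency tables.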

CLI mirrors beads: cw task ready, cw task create, cw task close, etc.

See docs/tasks.md for schema and CLI reference.

Initiatives

Notion-like document hierarchy for planning larger features. SQLite-backed with parent-child relationships for structured queries (e.g., "all subpages of initiative X", "inventory of all documents").

Key features:

  • Lifecycle: draft → review → approved → in_progress → completed
  • Nested pages: User journeys, business rules, technical concepts, architectural changes
  • Phased work plans: Approved initiatives generate tasks grouped into phases
  • Rolling approval: User approves phase plans one-by-one; agents execute approved phases while subsequent phases are reviewed

Workflow: User drafts → Architect iterates (GSD-style questioning) → Approval or draft extension and further iterations with the Architect → Tasks created with initiative_id + phase → Execute
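
The lifecycle and workflow above can be read as a small state machine. The sketch below is one possible encoding under stated assumptions: the edge from review back to draft models "draft extension and further iterations", and all identifiers are hypothetical.

```typescript
type InitiativeStatus =
  | "draft"
  | "review"
  | "approved"
  | "in_progress"
  | "completed";

// Forward edges follow the documented lifecycle. The review -> draft edge is
// an assumption covering the case where the draft is extended, not approved.
const TRANSITIONS: Record<InitiativeStatus, InitiativeStatus[]> = {
  draft: ["review"],
  review: ["approved", "draft"],
  approved: ["in_progress"],
  in_progress: ["completed"],
  completed: [],
};

// Returns true when moving from one status to another is allowed.
function canTransition(from: InitiativeStatus, to: InitiativeStatus): boolean {
  return TRANSITIONS[from].some((next) => next === to);
}
```

Keeping the allowed edges in one table makes it straightforward to reject, for example, a jump from draft straight to approved.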

See docs/initiatives.md for schema and workflow details.

Domain Layer

DDD-based documentation of the as-is state for agent and human consumption. Initiatives reference and modify domain concepts; completed initiatives update the domain layer to reflect the new state.

Scope: Per-project domains or cross-project domains (features spanning multiple projects).

Core concepts tracked:

  • Bounded Contexts — scope boundaries defining where a domain model applies
  • Aggregates — consistency boundaries, what changes together
  • Domain Events — events exposed by the project that trigger workflows or side effects
  • Business Rules & Invariants — constraints that must always hold; agents must preserve these
  • Ubiquitous Language — glossary of domain terms to prevent agent misinterpretation
  • Context Maps — relationships between bounded contexts (especially for cross-project domains)
  • External Integrations — systems the domain interacts with but doesn't own

Codebase mapping: Each concept links to folder/module paths. Auto-maintained by agents after implementation work.

Storage: Dual adapter support — SQLite tables (structured queries) or Markdown with YAML frontmatter (human-readable, version-controllable).

Orchestrator

Main orchestrator loop handling coordination across agents. Can be split per project or initiative for load balancing in the future.

Session State

Tracks execution state across agent restarts. Unlike Domain Layer (codebase state), session state tracks position, decisions, and blockers.

STATE.md maintains:

  • Current position (phase, plan, task, wave)
  • Decisions made (locked choices with reasoning)
  • Active blockers (what's waiting, workarounds)
  • Session history (who worked on what, when)

See docs/session-state.md for session state management.


Model Profiles

Different agent roles have different needs. Model selection balances quality, cost, and latency.

| Profile | Use Case | Cost | Quality |
| --- | --- | --- | --- |
| quality | Critical decisions, architecture | Highest | Best |
| balanced | Default for most work | Medium | Good |
| budget | High-volume, low-risk tasks | Lowest | Acceptable |

| Agent | Quality | Balanced (Default) | Budget |
| --- | --- | --- | --- |
| Architect | Opus | Opus | Sonnet |
| Worker | Opus | Sonnet | Sonnet |
| Verifier | Sonnet | Sonnet | Haiku |
| Orchestrator | Sonnet | Sonnet | Haiku |
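
The role-to-model mapping above could be held as a plain lookup table. This is a sketch only: the type names and the resolveModel helper are hypothetical, and the model identifiers are shortened to the family names used in the table.

```typescript
type Profile = "quality" | "balanced" | "budget";
type Role = "architect" | "worker" | "verifier" | "orchestrator";
type Model = "opus" | "sonnet" | "haiku";

// Values copied from the agent/profile table in this section.
const MODEL_BY_ROLE: Record<Role, Record<Profile, Model>> = {
  architect: { quality: "opus", balanced: "opus", budget: "sonnet" },
  worker: { quality: "opus", balanced: "sonnet", budget: "sonnet" },
  verifier: { quality: "sonnet", balanced: "sonnet", budget: "haiku" },
  orchestrator: { quality: "sonnet", balanced: "sonnet", budget: "haiku" },
};

// "balanced" is the documented default profile.
function resolveModel(role: Role, profile: Profile = "balanced"): Model {
  return MODEL_BY_ROLE[role][profile];
}
```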

See docs/model-profiles.md for model selection strategy.


Notes

The "reference" folder contains the implementations of Gastown, get-shit-done, and ccswitch (a CLI tool for using multiple Claude Code accounts).


Core Principles

Task Decomposition

Breaking large goals into detailed instructions for agents. Supported by Tasks, Jobs, Workflows, and Pipelines. Ensures work is decomposed into trackable, atomic units that agents can execute autonomously.

See docs/task-granularity.md for task specification standards.

Pull Model

"If there is work in your Queue, YOU MUST RUN IT." This principle ensures agents autonomously proceed with available work without waiting for external input. The heartbeat of autonomous operation.

Eventual Completion

The overarching goal ensuring useful outcomes through orchestration of potentially unreliable processes. Persistent Tasks and oversight agents (Monitor, Supervisor) guarantee eventual workflow completion even when individual operations may fail or produce varying results.

Context Engineering

Agent output quality degrades predictably as context fills. This is a first-class concern:

  • 0-30% context: Peak quality (thorough, comprehensive)
  • 30-50% context: Good quality (solid work)
  • 50-70% context: Degrading (shortcuts appear)
  • 70%+ context: Poor quality (rushed, minimal)

Rule: Stay UNDER 50% context. Plans sized to fit ~50%. Workers get fresh context per task. Orchestrator stays at 30-40% with heavy work in subagent contexts.
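
The bands and the 50% rule can be expressed as a small classifier. This is a sketch with assumptions: token counts are used as the measure of context usage, band boundaries are treated as lower-inclusive (exactly 30% counts as "good"), and the function names are hypothetical.

```typescript
type ContextQuality = "peak" | "good" | "degrading" | "poor";

// Classify context usage into the four bands listed above.
// Boundary handling (lower bound inclusive) is an assumption.
function contextQuality(usedTokens: number, windowTokens: number): ContextQuality {
  const pct = (usedTokens * 100) / windowTokens;
  if (pct < 30) return "peak";
  if (pct < 50) return "good";
  if (pct < 70) return "degrading";
  return "poor";
}

// The rule "stay UNDER 50% context": true once usage reaches or passes 50%.
function exceedsContextBudget(usedTokens: number, windowTokens: number): boolean {
  return usedTokens * 2 >= windowTokens;
}
```

An orchestrator could call exceedsContextBudget before dispatching more work into a session and prefer a fresh context once it returns true.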

See docs/context-engineering.md for context management rules.

Goal-Backward Verification

Task completion ≠ Goal achievement. Verification confirms observable outcomes, not checkbox completion. Each phase ends with goal-backward verification checking observable truths, required artifacts, and required wiring.

See docs/verification.md for verification patterns.

Deviation Rules

Workers encounter unexpected issues during execution. Four rules govern autonomous action:

  • Rule 1: Auto-fix bugs (no permission needed)
  • Rule 2: Auto-add missing critical functionality (no permission needed)
  • Rule 3: Auto-fix blocking issues (no permission needed)
  • Rule 4: ASK about architectural changes (permission required)
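
The four rules amount to a lookup from the kind of deviation to an action. The sketch below is illustrative only; the Deviation labels and the decide helper are hypothetical, and docs/deviation-rules.md remains the authoritative guidance.

```typescript
// Hypothetical labels for the four situations the rules describe.
type Deviation =
  | "bug"
  | "missing_critical_functionality"
  | "blocking_issue"
  | "architectural_change";

interface Decision {
  rule: 1 | 2 | 3 | 4;
  action: "auto_fix" | "auto_add" | "ask";
  needsPermission: boolean;
}

const DEVIATION_RULES: Record<Deviation, Decision> = {
  bug: { rule: 1, action: "auto_fix", needsPermission: false },
  missing_critical_functionality: { rule: 2, action: "auto_add", needsPermission: false },
  blocking_issue: { rule: 3, action: "auto_fix", needsPermission: false },
  architectural_change: { rule: 4, action: "ask", needsPermission: true },
};

// Look up what a Worker may do on its own for a given deviation.
function decide(deviation: Deviation): Decision {
  return DEVIATION_RULES[deviation];
}
```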

See docs/deviation-rules.md for detailed guidance.


Environments

Workspace

The shared environment where all users operate. The Workspace coordinates all agents across multiple Projects and houses workspace-level agents like Orchestrator and Supervisor. It defines the boundaries, infrastructure, and rules of interaction between agents, projects, and resources.

Project

A self-contained repository under Workspace management. Each Project has its own Workers, Integrator, Monitor, and Team members. Projects define goals, constraints, and context for users working on a specific problem or domain. This is where actual development work happens.


Workspace-Level Roles

Codewalker

A human operator. Users are the primary inhabitants of the Workspace. They control the system and make final decisions.

Orchestrator

The coordinating authority of the Workspace. Responsible for initiating Jobs, coordinating work distribution, and notifying users of important events. The Orchestrator operates from the workspace level and has visibility across all Projects.

Supervisor

Daemon process running continuous health check cycles. The Supervisor ensures agent activity, monitors system health, and triggers recovery when agents become unresponsive.

Helpers

The Supervisor's pool of maintenance agents handling background tasks like cleanup, health checks, and system maintenance.

Watchdog

A special Helper that checks the Supervisor periodically, ensuring the monitor itself is still running. Creates a chain of accountability.


Project-Level Roles

Worker

An ephemeral agent optimized for execution. Workers are spawned for specific tasks and perform focused work such as coding, analysis, or integration. They work in isolated git worktrees to avoid conflicts, produce Merge Requests, and are cleaned up after completion.

Workers follow deviation rules and create atomic commits per task. See docs/agents/worker.md for the full agent prompt.

Integrator

Manages the Merge Queue for a Project. The Integrator handles merging changes from Workers, resolving conflicts, and ensuring code quality before changes reach the main branch.

Monitor

Observes execution and lifecycle events within a Project. Monitors detect failures, enforce limits, oversee Workers and the Integrator, and ensure system health. Can trigger recovery actions when needed.

Team

Long-lived, named agents for persistent collaboration. Unlike ephemeral Workers, Team members maintain context across sessions and are ideal for ongoing work relationships and complex multi-session tasks.

Architect

Analysis agent for initiative planning. Architects iterate on initiative drafts with the user through structured questioning. They validate integration with existing codebase, refine technical concepts, and produce work plans broken into phases. Architects don't execute—they plan.

See docs/agents/architect.md for the full agent prompt and workflow.

Verifier

Validation agent that confirms goals are achieved, not just tasks completed. Verifiers run goal-backward verification after phase execution, checking observable truths, required artifacts, and required wiring. They identify gaps and create remediation tasks when needed.

Key responsibilities:

  • Goal-backward verification — Check outcomes, not activities
  • Three-level checks — Existence, substance, wiring
  • Anti-pattern scanning — TODOs, stubs, empty returns
  • User acceptance testing — Walk users through deliverables
  • Remediation — Create targeted fix tasks when gaps found
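
Anti-pattern scanning could be approximated with a line-based scan like the one below. This is a rough sketch, not the Verifier's actual logic: the three regular expressions are assumptions about what counts as a TODO, a stub, or an empty return, and a real check would likely need language-aware parsing.

```typescript
interface Finding {
  line: number; // 1-based line number
  pattern: string; // name of the matched anti-pattern
  text: string; // trimmed source line
}

// Assumed patterns; none carry the "g" flag, so test() stays stateless.
const ANTI_PATTERNS: { name: string; regex: RegExp }[] = [
  { name: "todo", regex: /\b(TODO|FIXME)\b/ },
  { name: "stub", regex: /not implemented/i },
  { name: "empty-return", regex: /^\s*return\s*;?\s*$/ },
];

// Report every line that matches one of the assumed anti-patterns.
function scanForAntiPatterns(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, index) => {
    for (const { name, regex } of ANTI_PATTERNS) {
      if (regex.test(text)) {
        findings.push({ line: index + 1, pattern: name, text: text.trim() });
      }
    }
  });
  return findings;
}
```

Each finding could then be turned into a targeted remediation task, in line with the last responsibility listed above.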

See docs/agents/verifier.md for the full agent prompt and verification patterns.


Work Units

Task

The atomic unit of work. SQLite-backed work item with dependency tracking. Tasks link actions, state changes, and artifacts across the Workspace with precision and traceability. They can represent issues, tickets, jobs, or any trackable work item.

Template

A reusable workflow definition. TOML-based source file describing how tasks are structured, sequenced, and executed across agents. Templates define patterns for common operations like health checks, code review, or deployment.

Schema

A template class for instantiating Pipelines. Schemas define the structure and steps of a workflow without being tied to specific work items.

Pipeline

Durable chained Task workflows. Pipelines represent multi-step processes where each step is tracked as a Task. They survive agent restarts and ensure complex workflows complete.

Ephemeral

Temporary Tasks destroyed after runs. Ephemerals are lightweight work items used for transient operations that don't need permanent tracking.

Queue

A pinned Task list for each agent. The Queue is an agent's primary work source: when work appears in your Queue, the Pull Model dictates you must run it.


Workflow Commands

Job

A coordinated group of tasks executed together. The primary work-order wrapping related Tasks. Jobs allow related work to be dispatched, tracked, and completed as a single operational unit.

Assign

The act of putting work on an agent's Queue. Assign translates intent into action, sending Workers or Team members into motion.

Notify

Real-time messaging between agents. Allows immediate communication without going through formal channels. Quick pings and status updates.

Handoff

Agent session refresh. When context gets full or an agent needs a fresh start, Handoff transfers work state to a new session while preserving critical context.

Replay

Querying previous sessions for context. Replay allows agents to access their predecessors' decisions and context from earlier work.

Poll

Ephemeral loop maintaining system heartbeat. Poll cycles (Supervisor, Monitor) continuously run health checks and trigger actions as needed.


Storage & Memory

Context Store

A persistent store of memory, context, and knowledge. Preserves state across executions, enabling agents to remember decisions, history, and learned insights.

Audit Log

The authoritative record of system state and history. Ensures reproducibility, auditing, and continuity across operations.

Sandbox

A personal workspace for an agent. Contains tools, local context, and temporary state used during active reasoning and execution.

Config

The configuration and rule set governing a Project or the Workspace. Defines behavior, permissions, and operational constraints.


Documentation Index

Modules

Operational Concepts

Agent Prompts
