Codewalkers/.planning/research/SUMMARY.md
Lukas May 0ff65b0b02 feat: Rename application from "Codewalk District" to "Codewalkers"
Update all user-facing strings (HTML title, manifest, header logo,
browser title updater), code comments, and documentation references.
Folder name retained as-is.
2026-03-05 12:05:08 +01:00


Project Research Summary

Project: Codewalkers
Domain: Multi-agent orchestration / Developer tooling
Researched: 2026-01-30
Confidence: HIGH

Executive Summary

Codewalkers enters a space where the basic problem (running multiple Claude Code agents in parallel) is already solved by tools like Claude Squad, par, and Claude Flow. Parallel execution is now table stakes, so it is not where Codewalkers can differentiate. The gap is in coordination quality: preventing conflicts before they happen, making review manageable, and keeping the developer in control without drowning in context switches.

The recommended approach is a TypeScript CLI with an embedded tRPC server, SQLite persistence, and a hexagonal architecture. Skip Redis/BullMQ: SQLite in WAL mode handles 10-15k tasks/second locally, far more than a handful of local agents will generate. Use Commander.js (not oclif), Drizzle ORM with better-sqlite3, and execa for process spawning. A modular monolith with clear port/adapter boundaries leaves room for the system to evolve without rewrites.

The biggest risks are process management failures (zombie/orphan agents), SQLite WAL corruption during backups, and race conditions between concurrent agents. The first phases address these with explicit process tree tracking, a deliberate WAL checkpoint and backup strategy, and atomic transactions using BEGIN IMMEDIATE.

Key Findings

Recommended stack: Commander.js + tRPC + Drizzle/better-sqlite3 + execa. Every dependency is rated HIGH confidence, based on official documentation and ecosystem maturity.

Core technologies:

  • Commander.js 14.x: CLI framework (lighter than oclif, built-in TypeScript types)
  • tRPC v11: API layer (type-safe, standalone server adapter, SSE subscriptions)
  • Drizzle ORM 0.44+ / better-sqlite3 11.x: Database (a thin, type-safe query layer over better-sqlite3; prepared statements keep its overhead close to raw driver calls)
  • Zod 3.24+: Validation (shared contracts between CLI and server)
  • execa 9.6+: Process spawning (auto-cleanup, cross-platform, graceful shutdown)

Expected Features

Must have (table stakes):

  • Git worktree isolation per agent
  • tmux session management
  • Background/yolo execution mode
  • Start/stop/list agents CLI
  • Session persistence across restarts

Should have (differentiators):

  • File-level coordination (which agent touches which files)
  • Cost/token tracking per agent
  • Conflict prediction before merge

Defer (v2+):

  • AI task decomposition
  • Cross-agent communication
  • Web dashboard

Architecture Approach

Hexagonal architecture within a modular monolith. Ports are interfaces in the domain layer; adapters are their implementations in the infrastructure layer. An event bus (Node.js EventEmitter) decouples modules. The SQLite task queue claims tasks atomically inside transactions.
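The port/adapter split and the event bus take only a few lines to sketch. This is a minimal illustration, not the project's code. The names `TaskQueuePort`, `InMemoryTaskQueue`, `EventBus`, and the event names are hypothetical, and an in-memory adapter stands in for the SQLite one.

```typescript
import { EventEmitter } from "node:events";

// Domain layer: plain types and a port interface, no infrastructure imports.
export interface Task {
  id: string;
  priority: number; // higher runs first
  status: "pending" | "running";
  claimedBy?: string;
}

// Port (hypothetical): what the domain needs from any task store.
export interface TaskQueuePort {
  enqueue(task: { id: string; priority: number }): void;
  claimNext(agentId: string): Task | undefined;
}

// Event names and payloads carried on the in-process bus (hypothetical).
export interface DomainEvents {
  "task:enqueued": { taskId: string };
  "task:claimed": { taskId: string; agentId: string };
}

// Typed wrapper over Node's EventEmitter so modules share only event contracts.
export class EventBus {
  private readonly emitter = new EventEmitter();

  emit<K extends keyof DomainEvents>(event: K, payload: DomainEvents[K]): void {
    this.emitter.emit(event, payload);
  }

  on<K extends keyof DomainEvents>(
    event: K,
    listener: (payload: DomainEvents[K]) => void,
  ): void {
    this.emitter.on(event, listener);
  }
}

// Infrastructure adapter: an in-memory stand-in. A SQLite adapter would
// implement the same port and perform the claim inside a write transaction.
export class InMemoryTaskQueue implements TaskQueuePort {
  private readonly tasks: Task[] = [];
  private readonly bus: EventBus;

  constructor(bus: EventBus) {
    this.bus = bus;
  }

  enqueue(task: { id: string; priority: number }): void {
    this.tasks.push({ ...task, status: "pending" });
    this.bus.emit("task:enqueued", { taskId: task.id });
  }

  claimNext(agentId: string): Task | undefined {
    // Highest-priority pending task; ties keep insertion order.
    let best: Task | undefined;
    for (const t of this.tasks) {
      if (t.status === "pending" && (best === undefined || t.priority > best.priority)) {
        best = t;
      }
    }
    if (best === undefined) return undefined;
    best.status = "running";
    best.claimedBy = agentId;
    this.bus.emit("task:claimed", { taskId: best.id, agentId });
    return best;
  }
}
```

The domain code depends only on `TaskQueuePort` and `DomainEvents`, so swapping the in-memory adapter for a SQLite one does not touch callers.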

Major components:

  1. Task Scheduler — Job queue with priority, retry, delay; SQLite-backed
  2. Agent Pool — Piscina-style pool with bounded concurrency (min: 1, max: CPU cores)
  3. Process Supervisor — Child process lifecycle with automatic restart
  4. MCP Transport — STDIO-based JSON-RPC 2.0 for agent communication
  5. Event Bus — In-memory pub/sub for module decoupling
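The MCP Transport component relies on stdio framing. MCP's stdio transport sends each JSON-RPC 2.0 message as one line of JSON. The sketch below assumes that newline-delimited format. `encodeMessage` and `LineDecoder` are hypothetical names, and the message types are simplified rather than taken from the official SDK.

```typescript
// Simplified JSON-RPC 2.0 shapes (not the official MCP SDK types).
type JsonRpcId = string | number;
interface JsonRpcRequest { jsonrpc: "2.0"; id: JsonRpcId; method: string; params?: unknown }
interface JsonRpcNotification { jsonrpc: "2.0"; method: string; params?: unknown }
interface JsonRpcResponse {
  jsonrpc: "2.0";
  id: JsonRpcId;
  result?: unknown;
  error?: { code: number; message: string; data?: unknown };
}
type JsonRpcMessage = JsonRpcRequest | JsonRpcNotification | JsonRpcResponse;

// One message per line. JSON.stringify without indentation never emits a raw
// newline (newlines inside strings are escaped), so the framing stays intact.
export function encodeMessage(msg: JsonRpcMessage): string {
  return JSON.stringify(msg) + "\n";
}

// Stream chunks can split a message or carry several, so buffer until "\n".
export class LineDecoder {
  private buffer = "";

  push(chunk: string): JsonRpcMessage[] {
    this.buffer += chunk;
    const messages: JsonRpcMessage[] = [];
    let newline = this.buffer.indexOf("\n");
    while (newline !== -1) {
      const line = this.buffer.slice(0, newline).trim(); // trim also drops "\r"
      this.buffer = this.buffer.slice(newline + 1);
      if (line.length > 0) {
        const parsed: unknown = JSON.parse(line);
        if (
          typeof parsed !== "object" ||
          parsed === null ||
          (parsed as { jsonrpc?: unknown }).jsonrpc !== "2.0"
        ) {
          throw new Error(`not a JSON-RPC 2.0 message: ${line}`);
        }
        messages.push(parsed as JsonRpcMessage);
      }
      newline = this.buffer.indexOf("\n");
    }
    return messages;
  }
}
```

In use, call `child.stdout.setEncoding("utf8")` on the agent process so multi-byte characters are never split across chunks. Then feed each `data` chunk to `push()`. Whether Claude Code behaves this way over stdio is still one of the open questions under Gaps to Address.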

Critical Pitfalls

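  1. Zombie/orphan processes: child.kill() signals only the direct child, not the processes it spawned. Use the terminate package to kill the whole process tree, and track every PID.
  2. SQLite WAL corruption: Never fs.copyFile() a database that is in use. Committed pages can still sit in the -wal file, and a copy taken mid-write can be inconsistent. Use the SQLite backup API, or checkpoint before copying.
  3. SQLITE_BUSY despite busy_timeout: Use BEGIN IMMEDIATE for any transaction that will write. busy_timeout does not help when a deferred transaction tries to upgrade from reading to writing partway through.
  4. Graceful shutdown failures: By default, Node.js exits on SIGTERM without running any cleanup. Listen for SIGTERM (and SIGINT), and set process.exitCode instead of calling process.exit(), which can cut off pending I/O.
  5. Git worktree orphaning: Always use git worktree remove, never rm -rf, which leaves stale metadata under .git/worktrees. Run git worktree prune periodically.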
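Pitfall 3 is easiest to enforce with a single helper that every write path goes through. The sketch below assumes a minimal synchronous driver interface, `SqlExec`, which is hypothetical. better-sqlite3's `Database#exec(sql)` has this shape, and the interface lets the transaction logic be tested without a database.

```typescript
// Hypothetical minimal port over a synchronous SQLite driver; better-sqlite3's
// Database#exec(sql) fits this shape.
export interface SqlExec {
  exec(sql: string): void;
}

// Runs `fn` inside BEGIN IMMEDIATE. This takes the write lock at BEGIN and
// waits there on busy_timeout if another writer holds it. A plain (DEFERRED)
// BEGIN postpones the lock until the first write, where SQLite may return
// SQLITE_BUSY without waiting (e.g. when a WAL read snapshot is already stale).
export function withImmediateTransaction<T>(db: SqlExec, fn: () => T): T {
  db.exec("BEGIN IMMEDIATE");
  try {
    const result = fn();
    db.exec("COMMIT");
    return result;
  } catch (err) {
    // A COMMIT that fails with SQLITE_BUSY leaves the transaction open, so roll
    // back here as well. A production version should also tolerate ROLLBACK
    // itself failing.
    db.exec("ROLLBACK");
    throw err;
  }
}
```

With better-sqlite3 specifically, the library's own wrapper already covers this case. `db.transaction(fn).immediate(...args)` runs `fn` under BEGIN IMMEDIATE and rolls back if it throws.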

Implications for Roadmap

Based on research, suggested phase structure:

Phase 1: Core Architecture

Rationale: Getting the foundation right prevents cascading failures. Process lifecycle, signal handling, and cross-platform paths must be right from day one.
Delivers: CLI binary, server mode, graceful shutdown, process tree tracking
Addresses: Zombie/orphan processes, graceful shutdown failures, cross-platform paths
Avoids: Process cleanup technical debt
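The Phase 1 shutdown path can be sketched with Node's own process APIs. This is a minimal illustration under stated assumptions. `createShutdown`, `installSignalHandlers`, and `signalProcessGroup` are hypothetical names. Process-group signalling is a POSIX-only substitute for the terminate package, not what the research recommends.

```typescript
// Cleanup steps registered in startup order, e.g. stop agents, close the
// server, close the database (illustrative).
export type Cleanup = () => void | Promise<void>;

// Returns an idempotent shutdown function. The first call runs the cleanups in
// reverse registration order; later calls return the same promise.
export function createShutdown(cleanups: Cleanup[]): (reason: string) => Promise<void> {
  let running: Promise<void> | undefined;
  return (reason: string) => {
    if (running === undefined) {
      running = (async () => {
        for (const cleanup of [...cleanups].reverse()) {
          try {
            await cleanup();
          } catch (err) {
            console.error(`cleanup failed during ${reason}:`, err);
            // Set exitCode instead of calling process.exit(). Node exits on its
            // own once the cleanups have closed every handle, without cutting off I/O.
            process.exitCode = 1;
          }
        }
      })();
    }
    return running;
  };
}

// Installing a listener removes Node's default exit-on-signal behavior, so the
// cleanups must close servers, timers, and child processes for Node to exit.
export function installSignalHandlers(shutdown: (reason: string) => Promise<void>): void {
  for (const signal of ["SIGINT", "SIGTERM"] as const) {
    process.on(signal, () => {
      void shutdown(signal);
    });
  }
}

// POSIX only: a child spawned with { detached: true } leads its own process
// group, so signalling -pid also reaches the agent's descendants.
export function signalProcessGroup(pid: number, signal: NodeJS.Signals = "SIGTERM"): boolean {
  try {
    process.kill(-pid, signal);
    return true;
  } catch {
    return false; // group already gone (ESRCH) or not permitted
  }
}
```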

Phase 2: Data Layer

Rationale: SQLite patterns affect everything downstream. Get WAL, transactions, and checkpointing right before building features.
Delivers: SQLite database, task queue, state persistence
Uses: Drizzle ORM, better-sqlite3, WAL mode
Addresses: SQLite corruption, SQLITE_BUSY errors, checkpoint starvation
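The task queue delivered in this phase needs the priority, retry, and delay behavior listed for the Task Scheduler. The sketch below expresses those rules as pure functions, so they can be tested before being encoded in the SQLite claim query. The field names, the 1-second base, and the 60-second cap are assumptions, not values from the research.

```typescript
// Hypothetical row shape for the SQLite-backed queue.
export interface QueuedTask {
  id: string;
  priority: number; // higher runs first
  runAt: number; // epoch ms; a future value delays the task
  attempts: number;
  maxAttempts: number;
  status: "pending" | "running" | "done" | "failed";
}

// Capped exponential backoff: 1s, 2s, 4s, ... never more than maxMs.
export function backoffMs(attempts: number, baseMs = 1_000, maxMs = 60_000): number {
  return Math.min(maxMs, baseMs * 2 ** Math.max(0, attempts - 1));
}

// Same ordering a claim query would use, e.g. (hypothetical schema):
//   SELECT id FROM tasks WHERE status = 'pending' AND run_at <= :now
//   ORDER BY priority DESC, run_at ASC LIMIT 1
// In SQLite, the select and the status update belong in one BEGIN IMMEDIATE
// transaction so that two agents cannot claim the same row.
export function pickNext(tasks: readonly QueuedTask[], now: number): QueuedTask | undefined {
  let best: QueuedTask | undefined;
  for (const t of tasks) {
    if (t.status !== "pending" || t.runAt > now) continue;
    if (
      best === undefined ||
      t.priority > best.priority ||
      (t.priority === best.priority && t.runAt < best.runAt)
    ) {
      best = t;
    }
  }
  return best;
}

// After a failed run: reschedule with backoff, or give up once attempts run out.
export function afterFailure(task: QueuedTask, now: number): QueuedTask {
  const attempts = task.attempts + 1;
  if (attempts >= task.maxAttempts) {
    return { ...task, attempts, status: "failed" };
  }
  return { ...task, attempts, status: "pending", runAt: now + backoffMs(attempts) };
}
```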

Phase 3: Git Integration

Rationale: Worktree management is core to the value proposition. File isolation is table stakes.
Delivers: Worktree creation/cleanup, branch isolation
Addresses: Worktree orphaning
Avoids: Manual cleanup chaos
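The worktree lifecycle from this phase can be sketched by shelling out to the git CLI, which is the kind of fallback the Gaps section anticipates for simple-git. The `agent/<id>` branch naming and the helper names are assumptions. The `git worktree add -b`, `remove`, and `prune` subcommands are standard git.

```typescript
import { execFileSync } from "node:child_process";

// Pure argument builders, testable without a repository.
export function worktreeAddArgs(path: string, branch: string, base = "HEAD"): string[] {
  // Creates `branch` at `base` and checks it out in a new worktree at `path`.
  return ["worktree", "add", "-b", branch, path, base];
}

export function worktreeRemoveArgs(path: string, force = false): string[] {
  // git refuses to remove a worktree with local changes unless --force is given.
  return force ? ["worktree", "remove", "--force", path] : ["worktree", "remove", path];
}

export function worktreePruneArgs(): string[] {
  // Clears metadata for worktrees whose directories were deleted by other means.
  return ["worktree", "prune"];
}

// execFileSync passes arguments without a shell, so paths need no quoting.
function git(repo: string, args: string[]): string {
  return execFileSync("git", ["-C", repo, ...args], { encoding: "utf8" });
}

// Hypothetical lifecycle helpers: one branch and one worktree per agent.
export function createAgentWorktree(repo: string, agentId: string, path: string): void {
  git(repo, worktreeAddArgs(path, `agent/${agentId}`));
}

export function removeAgentWorktree(repo: string, path: string, force = false): void {
  git(repo, worktreeRemoveArgs(path, force)); // never rm -rf the directory
  git(repo, worktreePruneArgs());
}
```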

Phase 4: Agent Orchestration

Rationale: Now that processes, data, and git work, add the agent pool and task dispatch.
Delivers: Agent spawn/stop, task dispatch, agent pool
Uses: execa, process supervisor, MCP transport
Addresses: Multi-agent race conditions, state ownership
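The agent pool's bounded concurrency (min 1, max CPU cores) amounts to a semaphore around agent jobs. The sketch below is minimal. `BoundedPool` is a hypothetical name, each job stands in for spawning and awaiting one agent, and this is not Piscina's API.

```typescript
import { cpus } from "node:os";

// Default upper bound from the component list: one slot per CPU core, at least one.
export const defaultPoolSize = Math.max(1, cpus().length);

// Semaphore-style pool: at most `max` jobs run at once; the rest wait FIFO.
export class BoundedPool {
  private active = 0;
  private readonly waiting: Array<() => void> = [];
  private readonly max: number;

  constructor(max: number = defaultPoolSize) {
    if (!Number.isInteger(max) || max < 1) {
      throw new RangeError("pool size must be a positive integer");
    }
    this.max = max;
  }

  async run<T>(job: () => Promise<T>): Promise<T> {
    if (this.active < this.max) {
      this.active++;
    } else {
      // Wait until a finishing job hands its slot over directly.
      await new Promise<void>((resolve) => this.waiting.push(resolve));
    }
    try {
      return await job();
    } finally {
      const next = this.waiting.shift();
      if (next) {
        next(); // the slot passes to the next waiter; the active count is unchanged
      } else {
        this.active--;
      }
    }
  }
}
```

In the orchestrator, each job would spawn one agent (for example with execa) and resolve when it exits, so no more than `max` agents are alive at once. The finishing job hands its slot straight to the next waiter instead of decrementing and re-checking. Otherwise a newly arriving job could slip in ahead of a woken waiter and exceed the limit.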

Phase 5: File System UI

Rationale: Build bidirectional sync once core features are stable.
Delivers: File watcher, SQLite ↔ filesystem sync
Addresses: File watcher race conditions
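The SQLite ↔ filesystem sync must not react to its own writes. One common technique is to remember a content hash per file and ignore watcher events whose content matches the last synced version. The sketch below illustrates that technique; the research does not specify it. `SyncLedger` is a hypothetical name, and bursts of watcher events would still need debouncing on top.

```typescript
import { createHash } from "node:crypto";

// Tracks, per path, the hash of the content last known to match on disk and in SQLite.
export class SyncLedger {
  private readonly lastSynced = new Map<string, string>();

  static hash(content: string): string {
    return createHash("sha256").update(content).digest("hex");
  }

  // Call after writing a file from the database, or after importing a file into it.
  record(path: string, content: string): void {
    this.lastSynced.set(path, SyncLedger.hash(content));
  }

  // Call from the watcher callback with the file's current content. Returns
  // false for echoes of our own writes, including duplicate events for one write.
  needsImport(path: string, content: string): boolean {
    return this.lastSynced.get(path) !== SyncLedger.hash(content);
  }

  // Call when a file is deleted on either side.
  forget(path: string): void {
    this.lastSynced.delete(path);
  }
}
```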

Phase 6: CLI Polish

Rationale: Make UX improvements once the features work.
Delivers: Rich terminal UI, token tracking, error messages
Addresses: UX pitfalls

Phase Ordering Rationale

  • Core → Data → Git → Agent: Each phase depends on the previous. Can't spawn agents without process management. Can't persist agent state without database. Can't isolate agents without worktrees.
  • File System UI late: It's an interface, not core logic. Build when you know what state needs syncing.
  • CLI Polish last: Optimize what works. Don't polish what might change.

Research Flags

Phases likely needing deeper research during planning:

  • Phase 4 (Agent Orchestration): MCP protocol integration needs validation with actual Claude Code behavior
  • Phase 5 (File System UI): Bidirectional sync patterns may need prototyping

Phases with standard patterns (skip research-phase):

  • Phase 1 (Core Architecture): Well-documented Node.js patterns
  • Phase 2 (Data Layer): SQLite + Drizzle has extensive docs
  • Phase 3 (Git Integration): Git worktree is well-documented

Confidence Assessment

| Area | Confidence | Notes |
|---|---|---|
| Stack | HIGH | Commander, tRPC, and Drizzle all have official docs and a large user base |
| Features | HIGH | Competitor analysis is clear; Claude Squad establishes the baseline |
| Architecture | HIGH | Hexagonal architecture with a SQLite task queue is a well-documented pattern |
| Pitfalls | HIGH | Based on post-mortems, official SQLite docs, and Node.js issues |

Overall confidence: HIGH

Gaps to Address

  • MCP protocol with Claude Code: Need to validate STDIO transport works as expected. May need to spawn agents differently.
  • simple-git worktree support: Library covers basics but may need CLI fallback for some operations.
  • Token tracking: Need to find how Claude Code exposes usage metrics (if at all).

Sources

Tertiary (LOW confidence)

  • Token tracking approach — no verified source for Claude Code usage API

Research completed: 2026-01-30
Ready for roadmap: yes