Delete 3 files: - completion-detection.test.ts (private method tests, covered by crash-race-condition) - completion-race-condition.test.ts (covered by mutex-completion + crash-race-condition) - real-e2e-crash.test.ts (dead: expect(true).toBe(true), hardcoded paths) Remove individual tests: - crash-race-condition.test.ts #4 (weaker duplicate of #2) - mock-manager.test.ts duplicate "(second test)" for detail_complete - process-manager.test.ts 2 "logs comprehensive" tests with empty assertions - edge-cases.test.ts 2 Q&A tests redundant with recovery-scenarios Update test-inventory.md to reflect removals.
23 KiB
Test Inventory
Comprehensive catalog of every test in the suite, what it verifies, and where coverage is weak or redundant.
41 test files | 33 unit | 7 E2E | 3 integration (mocked) | 5 integration (real providers)
Last audited: 2026-03-02
Coverage Matrix
Subjective 0–10 scores. 0 = no coverage, 5 = happy paths only, 10 = comprehensive with edge cases.
| Module | Happy Path | Error/Edge | Integration | Race/Concurrency | Notes |
|---|---|---|---|---|---|
| Agent Manager | 8 | 6 | 6 | 9 | Missing: getResult with actual result, multi-provider spawn |
| Agent OutputHandler | 6 | 4 | 5 | 8 | Missing: processAgentOutput with error status, no-result-line case |
| Agent FileIO | 8 | 7 | 8 | — | Missing: round-trip write→read, writeInputFiles content verification |
| Agent ProcessManager | 6 | 4 | 3 | 0 | Heavy mocking; missing: killProcess, cleanupWorktree, error paths |
| Agent Lifecycle | 8 | 7 | 3 | 0 | Missing: controller integration, analyzeError→shouldRetry chain |
| DB Repositories | 9 | 7 | 9 | — | task.test.ts thin: missing findByParentTaskId, approval, category queries |
| DB Cascades | 10 | — | 10 | — | Excellent: full graph verified |
| Dispatch Manager | 7 | 5 | 8 | 0 | "dependency scenario" test is misleading (deps not implemented) |
| Phase Dispatch | 9 | 7 | 8 | 0 | Best dependency tests (diamond pattern, blocking propagation) |
| Coordination/Merge | 7 | 5 | 7 | 0 | Overlap with ConflictResolutionService |
| Git Manager | 9 | 7 | 9 | 0 | Real git repos, no mocks. Excellent. |
| Git BranchManager | 6 | 5 | 6 | — | Only remoteBranchExists tested. Other methods untested. |
| Process Manager/Registry | 8 | 6 | — | 0 | Registry is pure and clean. Manager has mock fragility. |
| Logging | 8 | 6 | 8 | 0 | Real filesystem. Good. |
| Events Bus | 8 | 3 | — | 0 | Missing: handler-throws, handler ordering |
| Server/HTTP | 8 | 6 | 9 | 0 | Real HTTP + real PID files |
| Shutdown | 5 | 2 | — | 0 | Missing: signal→handler→shutdown chain, error resilience |
| tRPC Router | 3 | 1 | 3 | 0 | Only health and status tested. All domain procedures untested. |
| Preview | 7 | 5 | 4 | 0 | Missing: discoverConfig, docker-client, health-checker unit tests |
| E2E Workflows | 8 | 6 | 8 | 0 | Good lifecycle coverage. Output parsing never tested E2E. |
Test Files — Detailed Inventory
Agent Module (src/agent/)
manager.test.ts — MultiProviderAgentManager
| Test | Verifies |
|---|---|
| spawn / creates agent record with provided name | repository.create called with name |
| spawn / rejects duplicate agent names | Unique name constraint enforced |
| spawn / emits AgentSpawned event with name | Event has agent name in payload |
| spawn / writes diagnostic files for workdir verification | spawn-diagnostic.json written with metadata |
| spawn / uses custom cwd if provided | child_process.spawn gets custom cwd |
| stop / stops running agent and updates status | DB updated to stopped |
| stop / kills detached process if running | Status update after spawn+stop (does NOT verify kill) |
| stop / throws if agent not found | Error on missing ID |
| stop / emits AgentStopped event | Event has user_requested reason |
| list / returns all agents with names | findAll returns correct data |
| get / finds agent by id | findById returns agent |
| get / returns null if not found | findById returns null |
| getByName / finds agent by name | findByName returns agent |
| getByName / returns null if not found | findByName returns null |
| resume / resumes agent waiting for input | Spawn called with --resume + session args |
| resume / rejects if not waiting for input | Error on wrong status |
| resume / rejects if no session | Error on null sessionId |
| resume / emits AgentResumed event | Event has sessionId |
| getResult / returns null when no result | Agent result is null |
| delete / deletes agent and clears active state | repository.delete called |
| delete / emits agent:deleted event | Event has agent name |
| delete / throws if not found | Error on missing ID |
| delete / handles missing workdir gracefully | No error when workdir absent |
Mocking: 5 vi.mock() calls (child_process, git/manager, node:fs, node:fs/promises, file-tailer). Brittle.
Gaps: No test for getResult with actual result. No multi-provider tests. findByTaskId/findBySessionId mocked but never exercised.
output-handler.test.ts — OutputHandler
| Test | Verifies |
|---|---|
| processAgentOutput / parse questions from Claude output | waiting_for_input status + questions stored |
| processAgentOutput / handle malformed questions | Falls to crashed status |
| processAgentOutput / handle "done" status | No waiting_for_input set |
| getPendingQuestions / retrieve stored questions | JSON round-trip from DB |
| getPendingQuestions / return null when none | Null for no questions |
| formatAnswersAsPrompt / format normal answers | String formatting correct |
| formatAnswersAsPrompt / instruction-enhanced answers | __instruction__ prepended |
| formatAnswersAsPrompt / instruction with whitespace | Trimmed correctly |
| formatAnswersAsPrompt / only instruction, no answers | Instruction + header only |
| formatAnswersAsPrompt / empty answers | Header with no entries |
Gaps: No test for processAgentOutput with "error" status. No test for output with no result line. Uses real provider registry (fragile).
mutex-completion.test.ts — handleCompletion mutex
| Test | Verifies |
|---|---|
| prevent concurrent completion | Same agentId blocked |
| allow sequential completion | Same agentId allowed after first finishes |
| clean up on exception | Lock released on error |
| per-agent mutex isolation | Different agentIds run concurrently |
Excellent focused design. 4 complementary scenarios. Uses real timers (50ms delays).
file-io.test.ts — File I/O utilities
| Test | Verifies |
|---|---|
| generateId / non-empty string | Returns truthy |
| generateId / unique values | 100 unique IDs |
| writeInputFiles / initiative.md | File created |
| writeInputFiles / phase.md | File created |
| writeInputFiles / task.md | File created |
| writeInputFiles / pages | Files in pages/ subdir |
| writeInputFiles / empty options | No crash |
| readSummary / with frontmatter | Parsed correctly |
| readSummary / missing file | Returns null |
| readSummary / no frontmatter | Body only |
| readSummary / empty files_modified | Empty array |
| readPhaseFiles / from phases/ dir | Parsed with frontmatter |
| readPhaseFiles / missing dir | Returns [] |
| readPhaseFiles / no dependencies | Defaults to [] |
| readTaskFiles / from tasks/ dir | Parsed with frontmatter |
| readTaskFiles / defaults category/type | execute/auto defaults |
| readTaskFiles / missing dir | Returns [] |
| readDecisionFiles / from decisions/ dir | Parsed |
| readDecisionFiles / missing dir | Returns [] |
| readPageFiles / from pages/ dir | Parsed with pageId |
| readPageFiles / missing dir | Returns [] |
| readPageFiles / ignores non-.md | Filter works |
Real filesystem, no mocks. Good. Gaps: writeInputFiles only checks existsSync, not file content. No write→read round-trip test.
process-manager.test.ts — ProcessManager
| Test | Verifies |
|---|---|
| getAgentWorkdir | Path computation |
| createProjectWorktrees | Worktree creation for initiative projects |
| createProjectWorktrees / throws on failure | Error propagation |
| createStandaloneWorktree | Path returned |
| createStandaloneWorktree / throws on failure | Error propagation |
| spawnDetached / validates cwd exists | existsSync called |
| spawnDetached / throws if cwd missing | Error thrown |
| spawnDetached / passes correct cwd | spawn args verified |
| spawnDetached / writes prompt file | PROMPT.md written |
| buildSpawnCommand / native prompt mode | Command+args assembled |
| buildSpawnCommand / flag prompt mode | Command+args assembled |
| buildResumeCommand / flag style | Resume args assembled |
| buildResumeCommand / no resume support | Error thrown |
6 vi.mock() calls. Most brittle test file. Missing: killProcess, cleanupWorktree.
mock-manager.test.ts — MockAgentManager (testing the test double)
43 tests covering: default scenario, configured delay, error/questions scenarios, resume, stop, list, get/getByName, scenario overrides, event emission order, name uniqueness, constructor options, agent modes (execute/discuss/plan/detail), structured questions.
Thorough.
Agent Lifecycle (src/agent/lifecycle/)
error-analyzer.test.ts — 20 tests. Auth failure (4), usage limit (4), timeout (2), missing signal (3), process crash (4), unknown (2), context preservation (2). Pure behavioral classification tests.
instructions.test.ts — 6 tests. Constant content verification + addInstructionToPrompt pure function tests. Appropriate for 28-line module.
retry-policy.test.ts — 12 tests. Config (2), shouldRetry per error type (8), backoff delays (2). Clean decision-function tests. Missing: AgentExhaustedError/AgentFailureError class tests.
signal-manager.test.ts — 18 tests. clearSignal (2), checkSignalExists (2), readSignal (7), waitForSignal (3), validateSignalFile (4). Real filesystem, no mocks. waitForSignal has real-timer flakiness risk.
DB Repositories (src/db/repositories/drizzle/)
All use real in-memory SQLite via createTestDatabase(). No mocks. Integration tests against real SQL.
agent.test.ts — 20 tests. Full CRUD + findByStatus, findByTaskId, findBySessionId. Gap: never tests initiativeId field.
cascade.test.ts — 4 tests. High value. Validates entire FK cascade graph: initiative→phases→tasks→pages→junction rows. Verifies SET NULL on agents/conversations. Before/after assertions prove setup correctness.
initiative.test.ts — 12 tests. Standard CRUD + findByStatus. Simplest repo, clean.
message.test.ts — 22 tests. Most comprehensive repo test. Covers sender/recipient polymorphism, threading (parentMessageId), status lifecycle (pending→read→responded), filtering (findPendingForUser, findRequiringResponse, findReplies).
phase.test.ts — 21 tests. CRUD + 12 dependency tests (createDependency, getDependencies, getDependents, findDependenciesByInitiativeId). Direct-only (not transitive) verified. Cross-initiative scoping verified.
task.test.ts — 12 tests. Thinnest relative to feature surface. Missing: findByParentTaskId, findByInitiativeId, findPendingApproval, category filtering, requiresApproval behavior.
Shared pattern: 8 tests use setTimeout(resolve, 10) for timestamp differentiation (~80ms total delay).
Dispatch (src/dispatch/)
manager.test.ts — 13 tests. Queue, getNextDispatchable (priority, queuedAt ordering), completeTask, blockTask, dispatchNext (no tasks, no agents, success), getQueueState, dependency scenario. Real DB + mock event bus + mock agent manager. Gap: "dependency scenario" test admits dependencies aren't implemented — just retests priority ordering.
phase-manager.test.ts — 17 tests. queuePhase (status validation, deps), getNextDispatchablePhase (dependency blocking), dispatchNextPhase, completePhase, blockPhase, diamond dependency scenario. Best dependency tests: full diamond pattern (A→B, A→C, B→D, C→D) with step-by-step verification. Gap: mock DispatchManager never asserted (task-level dispatch not verified).
Coordination (src/coordination/)
manager.test.ts — 14 tests. queueMerge, getNextMergeable (priority ordering), processMerges (success path, conflict detection, resolution task creation, message creation), getQueueState, handleConflict, error handling (undefined repos).
conflict-resolution-service.test.ts — 10 tests. handleConflict (resolution task properties, original task blocking, agent message, event emission, optional deps, error cases, multiple conflicts, parentTaskId preservation).
Overlap: 5+ tests in manager.test.ts duplicate conflict-handling logic from conflict-resolution-service.test.ts.
Git (src/git/)
manager.test.ts — 22 tests. create, remove, list, get, diff (clean/added/modified/deleted), merge (success/conflict/abort/cleanup), edge cases. Real git repos on filesystem. No mocks. Excellent. Most confident test file in the suite.
simple-git-branch-manager.test.ts — 4 tests. remoteBranchExists only. Real git with bare repo + clones. Gap: only one method tested; other methods untested.
Process (src/process/)
manager.test.ts — 18 tests. spawn, stop, stopAll, restart, isRunning (live/dead/stopped), event emission. Uses mocked execa. Good separation of event bus concern.
registry.test.ts — 14 tests. Pure unit tests, no mocks. register, get, getAll, updateStatus, unregister, getByPid, clear, size. Exemplary.
Logging (src/logging/)
manager.test.ts — 13 tests. getBaseDir, ensureLogDir, ensureProcessDir, getProcessDir, getLogPath, listLogs (filtering), cleanOldLogs (retainDays). Real filesystem.
writer.test.ts — 17 tests. open, writeStdout/writeStderr (string/Buffer, timestamps, multi-line), close (flush, idempotent), write-after-close, append mode, event emission, backwards compatibility. Real filesystem + real EventBus.
Events (src/events/)
bus.test.ts — 10 tests. emit/on, event isolation, once, off, multiple handlers, typed events, timestamp preservation, factory. Pure behavioral tests. Gap: no handler-throws test, no handler ordering test.
Server (src/server/)
index.test.ts — 16 tests. Lifecycle (start/stop/double-start/PID conflict/idempotent stop/isRunning), HTTP endpoints (health/status/404/405), PID file management (create/remove/stale cleanup), event emission. Real HTTP server + real fetch. Gap: no port-already-in-use test.
shutdown.test.ts — 6 tests. shutdown() call order (server.stop before processManager.stopAll), signal handler installation (SIGTERM/SIGINT/SIGHUP). Gap: doesn't verify signal→handler→shutdown chain. Doesn't test error resilience (what if server.stop() throws).
tRPC Router (src/trpc/)
router.test.ts — 14 tests. health procedure (shape, Zod, uptime calc, null startedAt, processCount), status procedure (shape, Zod, startedAt, pid, uptime), Zod schema validation.
Major gap: Only health and status are tested. All domain procedures untested: initiatives, tasks, phases, agents, pages, proposals, accounts, conversations, previews.
Preview (src/preview/)
compose-generator.test.ts — 10 tests. Pure function tests. YAML generation, build config handling, Caddy deps, Caddyfile routing (single/multi/internal/specificity), labels. No mocks. Strong.
config-reader.test.ts — 6 tests. parseCwPreviewConfig only. Minimal/multi-service, validation errors, internal services, build normalization. Gap: discoverConfig (filesystem discovery) is untested — always mocked in manager.test.ts.
manager.test.ts — 20 tests. start (full lifecycle, phaseId, Docker unavailable, project not found, compose-up failure, health-check failure, no healthcheck), stop (cleanup, missing labels), list (all/filtered/skip non-preview/skip incomplete), getStatus (running/failed/stopped/building/null), stopAll (all/error resilience/empty). 6 vi.mock() calls.
port-allocator.test.ts — 4 tests. Base port, skip used ports, gap filling, real TCP port contention. One real TCP server test. Gap: no upper-bound/exhaustion test.
E2E Tests (src/test/e2e/)
All use createTestHarness() with in-memory DB, MockAgentManager, MockWorktreeManager, real dispatch/coordination managers.
happy-path.test.ts — 6 tests. Single task flow, sequential deps, parallel dispatch, full merge flow, complex dependency flow (diamond), fixture dependency DB verification. Gap: merge flow requires manual agent+worktree bridging (harness not fully integrated).
architect-workflow.test.ts — 8 tests. Discuss mode (complete, pause/resume), plan mode (complete, persist phases), plan conflict detection (reject dup, auto-dismiss stale, different initiatives), full discuss→plan→phases workflow. Gap: architect output is never parsed — mock discards structured data.
decompose-workflow.test.ts — 10 tests. Detail mode (complete, pause/resume, multiple questions), detail conflict detection (3 tests), task persistence (create, all types, dependencies), full workflow. Gap: filename says "decompose" but describe says "Detail". Task dependency rows never verified in DB.
phase-dispatch.test.ts — 7 tests. Cleanest E2E file. Independent parallel dispatch, sequential deps, diamond dependency (A→B,C→D), approval gate (pending/in_progress rejection), blocked phase (no dispatch, downstream blocking). No fake timers, no agent involvement.
recovery-scenarios.test.ts — 9 tests. Queue state in DB, in-progress task recovery after crash, blocked state persistence, merge queue recovery, agent Q&A (question/resume, structured questions, status transitions, multiple questions).
edge-cases.test.ts — 12 tests. Agent crash (events, task status, error message), agent waiting (1 test), task blocking (4 tests), merge conflict handling (4 tests).
extended-scenarios.test.ts — 6 tests. Conflict hand-back round-trip (full lifecycle, context preservation, sequential conflicts), multi-agent parallel work (parallel completion, merge ordering, mixed outcomes).
Integration Tests (src/test/integration/)
agent-workdir-verification.test.ts — 2 tests. Skipped by default (REAL_WORKDIR_TESTS). Spawns real Claude agent, verifies working directory path via diagnostic files and agent-created files.
crash-race-condition.test.ts — 3 tests. Always runs. Tests signal.json-based completion detection vs crash marking. Covers questions, no-signal crash, and slim-wildebeest scenario. Accesses private methods via as any.
real-claude.test.ts — 3 tests. Skipped by default. Direct Claude CLI contract tests: done/questions/error status outputs validated against Zod schema. ~$0.10.
Real Provider Tests (src/test/integration/real-providers/)
All skipped by default. ~$0.50 total per full run.
claude-manager.test.ts — 7 tests. Output parsing (text_delta, init/session, result), questions flow, session resume (2 tests), error handling. Resume test accepts crashed as valid (too lenient).
codex-manager.test.ts — 3 tests. Spawn, output parsing, provider config. All accept crashed as valid — pass vacuously when Codex isn't installed.
conversation.test.ts — 1 test. Most ambitious: two real Claude agents doing actual coding while communicating via cw ask/listen/answer. Verifies 2 conversations, file creation, content, and interleaved coding+conversation. ~$0.30, 5min timeout.
crash-recovery.test.ts — 4 tests. Server restart simulation (resume streaming, mark dead agent, process output for dead agent), event idempotency (no duplicate events on restart).
schema-retry.test.ts — 9 tests. Valid output (done, questions, multiple questions), retry logic (bad→good, code blocks, surrounding text), mode-specific schemas (discuss, plan, detail). Tests 5-6 have conditional assertions (silently pass on crash).
Remaining Redundancy
Tests covering the same scenario multiple times.
| Scenario | Tests |
|---|---|
| Task blocking (DB status) | edge-cases.test.ts, recovery-scenarios.test.ts |
| Merge conflict → resolution task | edge-cases.test.ts, extended-scenarios.test.ts (full round-trip) |
| Conflict resolution task creation | coordination/manager.test.ts (5 tests), conflict-resolution-service.test.ts (10 tests) |
Coverage Gaps (Priority Order)
Critical
-
tRPC domain procedures entirely untested — initiatives, tasks, phases, agents, pages, proposals, accounts, conversations, previews. Only
health/statushave tests. This is the single largest gap. -
Agent output parsing never tested E2E — Architect helpers (
setArchitectPlanComplete,setArchitectDetailComplete) discard structured output. The pipeline from agent output → DB persistence is exercised nowhere. -
task.test.tsthin relative to feature surface — Missing:findByParentTaskId,findByInitiativeId,findPendingApproval,approveTask, category filtering,requiresApprovalfield behavior. These features exist in production but have zero direct unit test coverage.
Important
-
discoverConfig(preview config discovery) untested — The filesystem walk logic (.cw-preview.yml → docker-compose.yml → Dockerfile) is always mocked away. -
docker-client.jsandhealth-checker.jshave no unit tests — Always mocked. No verification of the Docker shell-out layer. -
Signal handler → shutdown chain untested —
shutdown.test.tsverifies listeners are installed andshutdown()works when called directly, but never tests signal → handler → shutdown. -
No concurrent dispatch/merge tests — For a multi-agent system, racing
dispatchNext()or concurrent merges are never exercised. -
processAgentOutputmissing error status test — "done" and "questions" tested, "error" not tested.
Minor
-
EventBus handler-throws behavior untested — Does emit swallow or propagate handler errors?
-
Port allocator upper-bound/exhaustion untested — What happens when all ports 9100-9200 are taken?
-
error-analyzer→retry-policyintegration untested — Each tested independently but the chain (analyzeErroroutput fed toshouldRetry) is never verified end-to-end. -
Git
BranchManagerincomplete — OnlyremoteBranchExiststested. Other methods (if any) untested. -
ProcessManager.killProcessandcleanupWorktreeuntested — No unit test for these methods.
Fragility Assessment
High Fragility (refactoring will break tests)
agent/process-manager.test.ts— 6vi.mock()calls. Any import restructuring breaks tests.agent/manager.test.ts— 5vi.mock()calls. Same concern.preview/manager.test.ts— 6vi.mock()calls. Same concern.crash-race-condition.test.ts— Accesses privatehandleCompletionviaas any.
Low Fragility (behavioral tests)
- DB repository tests — Real in-memory SQLite, test public API only.
git/manager.test.ts— Real git repos, no mocks.file-io.test.ts— Real filesystem, pure function tests.process/registry.test.ts— Pure unit tests, no mocks.preview/compose-generator.test.ts— Pure function tests.events/bus.test.ts— Pure behavioral tests.- E2E tests — Test through public harness API.
Mock Style Inconsistencies
The test suite uses 4 different mocking patterns with no shared convention:
| Pattern | Files | Concern |
|---|---|---|
vi.mock() module-level |
manager.test.ts, process-manager.test.ts, preview/manager.test.ts | Import-structure coupled |
Factory function (createMockX()) |
output-handler.test.ts, coordination/.test.ts, server/.test.ts | Cleaner but copy-pasted |
| Inline const object | mutex-completion.test.ts | Simple but mutated per-test |
| Hand-rolled class | harness.ts (MockWorktreeManager, CapturingEventBus) | Best for shared infrastructure |
createMockEventBus() is independently defined in 5+ files. Should be extracted to shared test utility.