Files

Lukas May 1540039c52 test: Remove redundant and dead tests (-743 lines)

Delete 3 files:
- completion-detection.test.ts (private method tests, covered by crash-race-condition)
- completion-race-condition.test.ts (covered by mutex-completion + crash-race-condition)
- real-e2e-crash.test.ts (dead: expect(true).toBe(true), hardcoded paths)

Remove individual tests:
- crash-race-condition.test.ts #4 (weaker duplicate of #2)
- mock-manager.test.ts duplicate "(second test)" for detail_complete
- process-manager.test.ts 2 "logs comprehensive" tests with empty assertions
- edge-cases.test.ts 2 Q&A tests redundant with recovery-scenarios

Update test-inventory.md to reflect removals.

2026-03-02 12:57:27 +09:00

23 KiB

Raw Blame History

Test Inventory

Comprehensive catalog of every test in the suite, what it verifies, and where coverage is weak or redundant.

41 test files | 33 unit | 7 E2E | 3 integration (mocked) | 5 integration (real providers)

Last audited: 2026-03-02

Coverage Matrix

Subjective 0–10 scores. 0 = no coverage, 5 = happy paths only, 10 = comprehensive with edge cases.

Module	Happy Path	Error/Edge	Integration	Race/Concurrency	Notes
Agent Manager	8	6	6	9	Missing: `getResult` with actual result, multi-provider spawn
Agent OutputHandler	6	4	5	8	Missing: `processAgentOutput` with error status, no-result-line case
Agent FileIO	8	7	8	—	Missing: round-trip write→read, `writeInputFiles` content verification
Agent ProcessManager	6	4	3	0	Heavy mocking; missing: `killProcess`, `cleanupWorktree`, error paths
Agent Lifecycle	8	7	3	0	Missing: controller integration, `analyzeError→shouldRetry` chain
DB Repositories	9	7	9	—	task.test.ts thin: missing `findByParentTaskId`, approval, category queries
DB Cascades	10	—	10	—	Excellent: full graph verified
Dispatch Manager	7	5	8	0	"dependency scenario" test is misleading (deps not implemented)
Phase Dispatch	9	7	8	0	Best dependency tests (diamond pattern, blocking propagation)
Coordination/Merge	7	5	7	0	Overlap with ConflictResolutionService
Git Manager	9	7	9	0	Real git repos, no mocks. Excellent.
Git BranchManager	6	5	6	—	Only `remoteBranchExists` tested. Other methods untested.
Process Manager/Registry	8	6	—	0	Registry is pure and clean. Manager has mock fragility.
Logging	8	6	8	0	Real filesystem. Good.
Events Bus	8	3	—	0	Missing: handler-throws, handler ordering
Server/HTTP	8	6	9	0	Real HTTP + real PID files
Shutdown	5	2	—	0	Missing: signal→handler→shutdown chain, error resilience
tRPC Router	3	1	3	0	Only `health` and `status` tested. All domain procedures untested.
Preview	7	5	4	0	Missing: `discoverConfig`, `docker-client`, `health-checker` unit tests
E2E Workflows	8	6	8	0	Good lifecycle coverage. Output parsing never tested E2E.

Test Files — Detailed Inventory

Agent Module (`src/agent/`)

`manager.test.ts` — MultiProviderAgentManager

Test	Verifies
spawn / creates agent record with provided name	`repository.create` called with name
spawn / rejects duplicate agent names	Unique name constraint enforced
spawn / emits AgentSpawned event with name	Event has agent name in payload
spawn / writes diagnostic files for workdir verification	`spawn-diagnostic.json` written with metadata
spawn / uses custom cwd if provided	`child_process.spawn` gets custom cwd
stop / stops running agent and updates status	DB updated to `stopped`
stop / kills detached process if running	Status update after spawn+stop (does NOT verify kill)
stop / throws if agent not found	Error on missing ID
stop / emits AgentStopped event	Event has `user_requested` reason
list / returns all agents with names	`findAll` returns correct data
get / finds agent by id	`findById` returns agent
get / returns null if not found	`findById` returns null
getByName / finds agent by name	`findByName` returns agent
getByName / returns null if not found	`findByName` returns null
resume / resumes agent waiting for input	Spawn called with `--resume` + session args
resume / rejects if not waiting for input	Error on wrong status
resume / rejects if no session	Error on null sessionId
resume / emits AgentResumed event	Event has sessionId
getResult / returns null when no result	Agent result is null
delete / deletes agent and clears active state	`repository.delete` called
delete / emits agent:deleted event	Event has agent name
delete / throws if not found	Error on missing ID
delete / handles missing workdir gracefully	No error when workdir absent

Mocking: 5 vi.mock() calls (child_process, git/manager, node:fs, node:fs/promises, file-tailer). Brittle. Gaps: No test for getResult with actual result. No multi-provider tests. findByTaskId/findBySessionId mocked but never exercised.

`output-handler.test.ts` — OutputHandler

Test	Verifies
processAgentOutput / parse questions from Claude output	`waiting_for_input` status + questions stored
processAgentOutput / handle malformed questions	Falls to `crashed` status
processAgentOutput / handle "done" status	No `waiting_for_input` set
getPendingQuestions / retrieve stored questions	JSON round-trip from DB
getPendingQuestions / return null when none	Null for no questions
formatAnswersAsPrompt / format normal answers	String formatting correct
formatAnswersAsPrompt / instruction-enhanced answers	`__instruction__` prepended
formatAnswersAsPrompt / instruction with whitespace	Trimmed correctly
formatAnswersAsPrompt / only instruction, no answers	Instruction + header only
formatAnswersAsPrompt / empty answers	Header with no entries

Gaps: No test for processAgentOutput with "error" status. No test for output with no result line. Uses real provider registry (fragile).

`mutex-completion.test.ts` — handleCompletion mutex

Test	Verifies
prevent concurrent completion	Same agentId blocked
allow sequential completion	Same agentId allowed after first finishes
clean up on exception	Lock released on error
per-agent mutex isolation	Different agentIds run concurrently

Excellent focused design. 4 complementary scenarios. Uses real timers (50ms delays).

`file-io.test.ts` — File I/O utilities

Test	Verifies
generateId / non-empty string	Returns truthy
generateId / unique values	100 unique IDs
writeInputFiles / initiative.md	File created
writeInputFiles / phase.md	File created
writeInputFiles / task.md	File created
writeInputFiles / pages	Files in pages/ subdir
writeInputFiles / empty options	No crash
readSummary / with frontmatter	Parsed correctly
readSummary / missing file	Returns null
readSummary / no frontmatter	Body only
readSummary / empty files_modified	Empty array
readPhaseFiles / from phases/ dir	Parsed with frontmatter
readPhaseFiles / missing dir	Returns []
readPhaseFiles / no dependencies	Defaults to []
readTaskFiles / from tasks/ dir	Parsed with frontmatter
readTaskFiles / defaults category/type	`execute`/`auto` defaults
readTaskFiles / missing dir	Returns []
readDecisionFiles / from decisions/ dir	Parsed
readDecisionFiles / missing dir	Returns []
readPageFiles / from pages/ dir	Parsed with pageId
readPageFiles / missing dir	Returns []
readPageFiles / ignores non-.md	Filter works

Real filesystem, no mocks. Good. Gaps: writeInputFiles only checks existsSync, not file content. No write→read round-trip test.

`process-manager.test.ts` — ProcessManager

Test	Verifies
getAgentWorkdir	Path computation
createProjectWorktrees	Worktree creation for initiative projects
createProjectWorktrees / throws on failure	Error propagation
createStandaloneWorktree	Path returned
createStandaloneWorktree / throws on failure	Error propagation
spawnDetached / validates cwd exists	existsSync called
spawnDetached / throws if cwd missing	Error thrown
spawnDetached / passes correct cwd	spawn args verified
spawnDetached / writes prompt file	PROMPT.md written
buildSpawnCommand / native prompt mode	Command+args assembled
buildSpawnCommand / flag prompt mode	Command+args assembled
buildResumeCommand / flag style	Resume args assembled
buildResumeCommand / no resume support	Error thrown

6 vi.mock() calls. Most brittle test file. Missing: killProcess, cleanupWorktree.

`mock-manager.test.ts` — MockAgentManager (testing the test double)

43 tests covering: default scenario, configured delay, error/questions scenarios, resume, stop, list, get/getByName, scenario overrides, event emission order, name uniqueness, constructor options, agent modes (execute/discuss/plan/detail), structured questions.

Thorough.

Agent Lifecycle (`src/agent/lifecycle/`)

error-analyzer.test.ts — 20 tests. Auth failure (4), usage limit (4), timeout (2), missing signal (3), process crash (4), unknown (2), context preservation (2). Pure behavioral classification tests.

instructions.test.ts — 6 tests. Constant content verification + addInstructionToPrompt pure function tests. Appropriate for 28-line module.

retry-policy.test.ts — 12 tests. Config (2), shouldRetry per error type (8), backoff delays (2). Clean decision-function tests. Missing: AgentExhaustedError/AgentFailureError class tests.

signal-manager.test.ts — 18 tests. clearSignal (2), checkSignalExists (2), readSignal (7), waitForSignal (3), validateSignalFile (4). Real filesystem, no mocks. waitForSignal has real-timer flakiness risk.

DB Repositories (`src/db/repositories/drizzle/`)

All use real in-memory SQLite via createTestDatabase(). No mocks. Integration tests against real SQL.

agent.test.ts — 20 tests. Full CRUD + findByStatus, findByTaskId, findBySessionId. Gap: never tests initiativeId field.

cascade.test.ts — 4 tests. High value. Validates entire FK cascade graph: initiative→phases→tasks→pages→junction rows. Verifies SET NULL on agents/conversations. Before/after assertions prove setup correctness.

initiative.test.ts — 12 tests. Standard CRUD + findByStatus. Simplest repo, clean.

message.test.ts — 22 tests. Most comprehensive repo test. Covers sender/recipient polymorphism, threading (parentMessageId), status lifecycle (pending→read→responded), filtering (findPendingForUser, findRequiringResponse, findReplies).

phase.test.ts — 21 tests. CRUD + 12 dependency tests (createDependency, getDependencies, getDependents, findDependenciesByInitiativeId). Direct-only (not transitive) verified. Cross-initiative scoping verified.

task.test.ts — 12 tests. Thinnest relative to feature surface. Missing: findByParentTaskId, findByInitiativeId, findPendingApproval, category filtering, requiresApproval behavior.

Shared pattern: 8 tests use setTimeout(resolve, 10) for timestamp differentiation (~80ms total delay).

Dispatch (`src/dispatch/`)

manager.test.ts — 13 tests. Queue, getNextDispatchable (priority, queuedAt ordering), completeTask, blockTask, dispatchNext (no tasks, no agents, success), getQueueState, dependency scenario. Real DB + mock event bus + mock agent manager. Gap: "dependency scenario" test admits dependencies aren't implemented — just retests priority ordering.

phase-manager.test.ts — 17 tests. queuePhase (status validation, deps), getNextDispatchablePhase (dependency blocking), dispatchNextPhase, completePhase, blockPhase, diamond dependency scenario. Best dependency tests: full diamond pattern (A→B, A→C, B→D, C→D) with step-by-step verification. Gap: mock DispatchManager never asserted (task-level dispatch not verified).

Coordination (`src/coordination/`)

manager.test.ts — 14 tests. queueMerge, getNextMergeable (priority ordering), processMerges (success path, conflict detection, resolution task creation, message creation), getQueueState, handleConflict, error handling (undefined repos).

conflict-resolution-service.test.ts — 10 tests. handleConflict (resolution task properties, original task blocking, agent message, event emission, optional deps, error cases, multiple conflicts, parentTaskId preservation).

Overlap: 5+ tests in manager.test.ts duplicate conflict-handling logic from conflict-resolution-service.test.ts.

Git (`src/git/`)

manager.test.ts — 22 tests. create, remove, list, get, diff (clean/added/modified/deleted), merge (success/conflict/abort/cleanup), edge cases. Real git repos on filesystem. No mocks. Excellent. Most confident test file in the suite.

simple-git-branch-manager.test.ts — 4 tests. remoteBranchExists only. Real git with bare repo + clones. Gap: only one method tested; other methods untested.

Process (`src/process/`)

manager.test.ts — 18 tests. spawn, stop, stopAll, restart, isRunning (live/dead/stopped), event emission. Uses mocked execa. Good separation of event bus concern.

registry.test.ts — 14 tests. Pure unit tests, no mocks. register, get, getAll, updateStatus, unregister, getByPid, clear, size. Exemplary.

Logging (`src/logging/`)

manager.test.ts — 13 tests. getBaseDir, ensureLogDir, ensureProcessDir, getProcessDir, getLogPath, listLogs (filtering), cleanOldLogs (retainDays). Real filesystem.

writer.test.ts — 17 tests. open, writeStdout/writeStderr (string/Buffer, timestamps, multi-line), close (flush, idempotent), write-after-close, append mode, event emission, backwards compatibility. Real filesystem + real EventBus.

Events (`src/events/`)

bus.test.ts — 10 tests. emit/on, event isolation, once, off, multiple handlers, typed events, timestamp preservation, factory. Pure behavioral tests. Gap: no handler-throws test, no handler ordering test.

Server (`src/server/`)

index.test.ts — 16 tests. Lifecycle (start/stop/double-start/PID conflict/idempotent stop/isRunning), HTTP endpoints (health/status/404/405), PID file management (create/remove/stale cleanup), event emission. Real HTTP server + real fetch. Gap: no port-already-in-use test.

shutdown.test.ts — 6 tests. shutdown() call order (server.stop before processManager.stopAll), signal handler installation (SIGTERM/SIGINT/SIGHUP). Gap: doesn't verify signal→handler→shutdown chain. Doesn't test error resilience (what if server.stop() throws).

tRPC Router (`src/trpc/`)

router.test.ts — 14 tests. health procedure (shape, Zod, uptime calc, null startedAt, processCount), status procedure (shape, Zod, startedAt, pid, uptime), Zod schema validation.

Major gap: Only health and status are tested. All domain procedures untested: initiatives, tasks, phases, agents, pages, proposals, accounts, conversations, previews.

Preview (`src/preview/`)

compose-generator.test.ts — 10 tests. Pure function tests. YAML generation, build config handling, Caddy deps, Caddyfile routing (single/multi/internal/specificity), labels. No mocks. Strong.

config-reader.test.ts — 6 tests. parseCwPreviewConfig only. Minimal/multi-service, validation errors, internal services, build normalization. Gap: discoverConfig (filesystem discovery) is untested — always mocked in manager.test.ts.

manager.test.ts — 20 tests. start (full lifecycle, phaseId, Docker unavailable, project not found, compose-up failure, health-check failure, no healthcheck), stop (cleanup, missing labels), list (all/filtered/skip non-preview/skip incomplete), getStatus (running/failed/stopped/building/null), stopAll (all/error resilience/empty). 6 vi.mock() calls.

port-allocator.test.ts — 4 tests. Base port, skip used ports, gap filling, real TCP port contention. One real TCP server test. Gap: no upper-bound/exhaustion test.

E2E Tests (`src/test/e2e/`)

All use createTestHarness() with in-memory DB, MockAgentManager, MockWorktreeManager, real dispatch/coordination managers.

happy-path.test.ts — 6 tests. Single task flow, sequential deps, parallel dispatch, full merge flow, complex dependency flow (diamond), fixture dependency DB verification. Gap: merge flow requires manual agent+worktree bridging (harness not fully integrated).

architect-workflow.test.ts — 8 tests. Discuss mode (complete, pause/resume), plan mode (complete, persist phases), plan conflict detection (reject dup, auto-dismiss stale, different initiatives), full discuss→plan→phases workflow. Gap: architect output is never parsed — mock discards structured data.

decompose-workflow.test.ts — 10 tests. Detail mode (complete, pause/resume, multiple questions), detail conflict detection (3 tests), task persistence (create, all types, dependencies), full workflow. Gap: filename says "decompose" but describe says "Detail". Task dependency rows never verified in DB.

phase-dispatch.test.ts — 7 tests. Cleanest E2E file. Independent parallel dispatch, sequential deps, diamond dependency (A→B,C→D), approval gate (pending/in_progress rejection), blocked phase (no dispatch, downstream blocking). No fake timers, no agent involvement.

recovery-scenarios.test.ts — 9 tests. Queue state in DB, in-progress task recovery after crash, blocked state persistence, merge queue recovery, agent Q&A (question/resume, structured questions, status transitions, multiple questions).

edge-cases.test.ts — 12 tests. Agent crash (events, task status, error message), agent waiting (1 test), task blocking (4 tests), merge conflict handling (4 tests).

extended-scenarios.test.ts — 6 tests. Conflict hand-back round-trip (full lifecycle, context preservation, sequential conflicts), multi-agent parallel work (parallel completion, merge ordering, mixed outcomes).

Integration Tests (`src/test/integration/`)

agent-workdir-verification.test.ts — 2 tests. Skipped by default (REAL_WORKDIR_TESTS). Spawns real Claude agent, verifies working directory path via diagnostic files and agent-created files.

crash-race-condition.test.ts — 3 tests. Always runs. Tests signal.json-based completion detection vs crash marking. Covers questions, no-signal crash, and slim-wildebeest scenario. Accesses private methods via as any.

real-claude.test.ts — 3 tests. Skipped by default. Direct Claude CLI contract tests: done/questions/error status outputs validated against Zod schema. ~$0.10.

Real Provider Tests (`src/test/integration/real-providers/`)

All skipped by default. ~$0.50 total per full run.

claude-manager.test.ts — 7 tests. Output parsing (text_delta, init/session, result), questions flow, session resume (2 tests), error handling. Resume test accepts crashed as valid (too lenient).

codex-manager.test.ts — 3 tests. Spawn, output parsing, provider config. All accept crashed as valid — pass vacuously when Codex isn't installed.

conversation.test.ts — 1 test. Most ambitious: two real Claude agents doing actual coding while communicating via cw ask/listen/answer. Verifies 2 conversations, file creation, content, and interleaved coding+conversation. ~$0.30, 5min timeout.

crash-recovery.test.ts — 4 tests. Server restart simulation (resume streaming, mark dead agent, process output for dead agent), event idempotency (no duplicate events on restart).

schema-retry.test.ts — 9 tests. Valid output (done, questions, multiple questions), retry logic (bad→good, code blocks, surrounding text), mode-specific schemas (discuss, plan, detail). Tests 5-6 have conditional assertions (silently pass on crash).

Remaining Redundancy

Tests covering the same scenario multiple times.

Scenario	Tests
Task blocking (DB status)	`edge-cases.test.ts`, `recovery-scenarios.test.ts`
Merge conflict → resolution task	`edge-cases.test.ts`, `extended-scenarios.test.ts` (full round-trip)
Conflict resolution task creation	`coordination/manager.test.ts` (5 tests), `conflict-resolution-service.test.ts` (10 tests)

Coverage Gaps (Priority Order)

Critical

tRPC domain procedures entirely untested — initiatives, tasks, phases, agents, pages, proposals, accounts, conversations, previews. Only health/status have tests. This is the single largest gap.
Agent output parsing never tested E2E — Architect helpers (setArchitectPlanComplete, setArchitectDetailComplete) discard structured output. The pipeline from agent output → DB persistence is exercised nowhere.
task.test.ts thin relative to feature surface — Missing: findByParentTaskId, findByInitiativeId, findPendingApproval, approveTask, category filtering, requiresApproval field behavior. These features exist in production but have zero direct unit test coverage.

Important

discoverConfig (preview config discovery) untested — The filesystem walk logic (.cw-preview.yml → docker-compose.yml → Dockerfile) is always mocked away.
docker-client.js and health-checker.js have no unit tests — Always mocked. No verification of the Docker shell-out layer.
Signal handler → shutdown chain untested — shutdown.test.ts verifies listeners are installed and shutdown() works when called directly, but never tests signal → handler → shutdown.
No concurrent dispatch/merge tests — For a multi-agent system, racing dispatchNext() or concurrent merges are never exercised.
processAgentOutput missing error status test — "done" and "questions" tested, "error" not tested.

Minor

EventBus handler-throws behavior untested — Does emit swallow or propagate handler errors?
Port allocator upper-bound/exhaustion untested — What happens when all ports 9100-9200 are taken?
error-analyzer → retry-policy integration untested — Each tested independently but the chain (analyzeError output fed to shouldRetry) is never verified end-to-end.
Git BranchManager incomplete — Only remoteBranchExists tested. Other methods (if any) untested.
ProcessManager.killProcess and cleanupWorktree untested — No unit test for these methods.

Fragility Assessment

High Fragility (refactoring will break tests)

agent/process-manager.test.ts — 6 vi.mock() calls. Any import restructuring breaks tests.
agent/manager.test.ts — 5 vi.mock() calls. Same concern.
preview/manager.test.ts — 6 vi.mock() calls. Same concern.
crash-race-condition.test.ts — Accesses private handleCompletion via as any.

Low Fragility (behavioral tests)

DB repository tests — Real in-memory SQLite, test public API only.
git/manager.test.ts — Real git repos, no mocks.
file-io.test.ts — Real filesystem, pure function tests.
process/registry.test.ts — Pure unit tests, no mocks.
preview/compose-generator.test.ts — Pure function tests.
events/bus.test.ts — Pure behavioral tests.
E2E tests — Test through public harness API.

Mock Style Inconsistencies

The test suite uses 4 different mocking patterns with no shared convention:

Pattern	Files	Concern
`vi.mock()` module-level	manager.test.ts, process-manager.test.ts, preview/manager.test.ts	Import-structure coupled
Factory function (`createMockX()`)	output-handler.test.ts, coordination/.test.ts, server/.test.ts	Cleaner but copy-pasted
Inline const object	mutex-completion.test.ts	Simple but mutated per-test
Hand-rolled class	harness.ts (MockWorktreeManager, CapturingEventBus)	Best for shared infrastructure

createMockEventBus() is independently defined in 5+ files. Should be extracted to shared test utility.

23 KiB Raw Blame History Unescape Escape

Test Inventory

Coverage Matrix

Test Files — Detailed Inventory

Agent Module (src/agent/)

manager.test.ts — MultiProviderAgentManager

output-handler.test.ts — OutputHandler

mutex-completion.test.ts — handleCompletion mutex

file-io.test.ts — File I/O utilities

process-manager.test.ts — ProcessManager

mock-manager.test.ts — MockAgentManager (testing the test double)

Agent Lifecycle (src/agent/lifecycle/)

DB Repositories (src/db/repositories/drizzle/)

Dispatch (src/dispatch/)

Coordination (src/coordination/)

Git (src/git/)

Process (src/process/)

Logging (src/logging/)

Events (src/events/)

Server (src/server/)

tRPC Router (src/trpc/)

Preview (src/preview/)

E2E Tests (src/test/e2e/)

Integration Tests (src/test/integration/)

Real Provider Tests (src/test/integration/real-providers/)