Files
Codewalkers/docs/test-inventory.md
Lukas May 1540039c52 test: Remove redundant and dead tests (-743 lines)
Delete 3 files:
- completion-detection.test.ts (private method tests, covered by crash-race-condition)
- completion-race-condition.test.ts (covered by mutex-completion + crash-race-condition)
- real-e2e-crash.test.ts (dead: expect(true).toBe(true), hardcoded paths)

Remove individual tests:
- crash-race-condition.test.ts #4 (weaker duplicate of #2)
- mock-manager.test.ts duplicate "(second test)" for detail_complete
- process-manager.test.ts 2 "logs comprehensive" tests with empty assertions
- edge-cases.test.ts 2 Q&A tests redundant with recovery-scenarios

Update test-inventory.md to reflect removals.
2026-03-02 12:57:27 +09:00

405 lines
23 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Test Inventory
Comprehensive catalog of every test in the suite, what it verifies, and where coverage is weak or redundant.
**41 test files** | **33 unit** | **7 E2E** | **3 integration (mocked)** | **5 integration (real providers)**
Last audited: 2026-03-02
---
## Coverage Matrix
Subjective 010 scores. 0 = no coverage, 5 = happy paths only, 10 = comprehensive with edge cases.
| Module | Happy Path | Error/Edge | Integration | Race/Concurrency | Notes |
|--------|:---------:|:----------:|:-----------:|:----------------:|-------|
| Agent Manager | 8 | 6 | 6 | 9 | Missing: `getResult` with actual result, multi-provider spawn |
| Agent OutputHandler | 6 | 4 | 5 | 8 | Missing: `processAgentOutput` with error status, no-result-line case |
| Agent FileIO | 8 | 7 | 8 | — | Missing: round-trip write→read, `writeInputFiles` content verification |
| Agent ProcessManager | 6 | 4 | 3 | 0 | Heavy mocking; missing: `killProcess`, `cleanupWorktree`, error paths |
| Agent Lifecycle | 8 | 7 | 3 | 0 | Missing: controller integration, `analyzeError→shouldRetry` chain |
| DB Repositories | 9 | 7 | 9 | — | task.test.ts thin: missing `findByParentTaskId`, approval, category queries |
| DB Cascades | 10 | — | 10 | — | Excellent: full graph verified |
| Dispatch Manager | 7 | 5 | 8 | 0 | "dependency scenario" test is misleading (deps not implemented) |
| Phase Dispatch | 9 | 7 | 8 | 0 | Best dependency tests (diamond pattern, blocking propagation) |
| Coordination/Merge | 7 | 5 | 7 | 0 | Overlap with ConflictResolutionService |
| Git Manager | 9 | 7 | 9 | 0 | Real git repos, no mocks. Excellent. |
| Git BranchManager | 6 | 5 | 6 | — | Only `remoteBranchExists` tested. Other methods untested. |
| Process Manager/Registry | 8 | 6 | — | 0 | Registry is pure and clean. Manager has mock fragility. |
| Logging | 8 | 6 | 8 | 0 | Real filesystem. Good. |
| Events Bus | 8 | 3 | — | 0 | Missing: handler-throws, handler ordering |
| Server/HTTP | 8 | 6 | 9 | 0 | Real HTTP + real PID files |
| Shutdown | 5 | 2 | — | 0 | Missing: signal→handler→shutdown chain, error resilience |
| tRPC Router | 3 | 1 | 3 | 0 | **Only `health` and `status` tested. All domain procedures untested.** |
| Preview | 7 | 5 | 4 | 0 | Missing: `discoverConfig`, `docker-client`, `health-checker` unit tests |
| E2E Workflows | 8 | 6 | 8 | 0 | Good lifecycle coverage. Output parsing never tested E2E. |
---
## Test Files — Detailed Inventory
### Agent Module (`src/agent/`)
#### `manager.test.ts` — MultiProviderAgentManager
| Test | Verifies |
|------|----------|
| spawn / creates agent record with provided name | `repository.create` called with name |
| spawn / rejects duplicate agent names | Unique name constraint enforced |
| spawn / emits AgentSpawned event with name | Event has agent name in payload |
| spawn / writes diagnostic files for workdir verification | `spawn-diagnostic.json` written with metadata |
| spawn / uses custom cwd if provided | `child_process.spawn` gets custom cwd |
| stop / stops running agent and updates status | DB updated to `stopped` |
| stop / kills detached process if running | Status update after spawn+stop (does NOT verify kill) |
| stop / throws if agent not found | Error on missing ID |
| stop / emits AgentStopped event | Event has `user_requested` reason |
| list / returns all agents with names | `findAll` returns correct data |
| get / finds agent by id | `findById` returns agent |
| get / returns null if not found | `findById` returns null |
| getByName / finds agent by name | `findByName` returns agent |
| getByName / returns null if not found | `findByName` returns null |
| resume / resumes agent waiting for input | Spawn called with `--resume` + session args |
| resume / rejects if not waiting for input | Error on wrong status |
| resume / rejects if no session | Error on null sessionId |
| resume / emits AgentResumed event | Event has sessionId |
| getResult / returns null when no result | Agent result is null |
| delete / deletes agent and clears active state | `repository.delete` called |
| delete / emits agent:deleted event | Event has agent name |
| delete / throws if not found | Error on missing ID |
| delete / handles missing workdir gracefully | No error when workdir absent |
**Mocking**: 5 `vi.mock()` calls (child_process, git/manager, node:fs, node:fs/promises, file-tailer). Brittle.
**Gaps**: No test for `getResult` with actual result. No multi-provider tests. `findByTaskId`/`findBySessionId` mocked but never exercised.
---
#### `output-handler.test.ts` — OutputHandler
| Test | Verifies |
|------|----------|
| processAgentOutput / parse questions from Claude output | `waiting_for_input` status + questions stored |
| processAgentOutput / handle malformed questions | Falls to `crashed` status |
| processAgentOutput / handle "done" status | No `waiting_for_input` set |
| getPendingQuestions / retrieve stored questions | JSON round-trip from DB |
| getPendingQuestions / return null when none | Null for no questions |
| formatAnswersAsPrompt / format normal answers | String formatting correct |
| formatAnswersAsPrompt / instruction-enhanced answers | `__instruction__` prepended |
| formatAnswersAsPrompt / instruction with whitespace | Trimmed correctly |
| formatAnswersAsPrompt / only instruction, no answers | Instruction + header only |
| formatAnswersAsPrompt / empty answers | Header with no entries |
**Gaps**: No test for `processAgentOutput` with `"error"` status. No test for output with no result line. Uses real provider registry (fragile).
---
#### `mutex-completion.test.ts` — handleCompletion mutex
| Test | Verifies |
|------|----------|
| prevent concurrent completion | Same agentId blocked |
| allow sequential completion | Same agentId allowed after first finishes |
| clean up on exception | Lock released on error |
| per-agent mutex isolation | Different agentIds run concurrently |
**Excellent focused design.** 4 complementary scenarios. Uses real timers (50ms delays).
---
#### `file-io.test.ts` — File I/O utilities
| Test | Verifies |
|------|----------|
| generateId / non-empty string | Returns truthy |
| generateId / unique values | 100 unique IDs |
| writeInputFiles / initiative.md | File created |
| writeInputFiles / phase.md | File created |
| writeInputFiles / task.md | File created |
| writeInputFiles / pages | Files in pages/ subdir |
| writeInputFiles / empty options | No crash |
| readSummary / with frontmatter | Parsed correctly |
| readSummary / missing file | Returns null |
| readSummary / no frontmatter | Body only |
| readSummary / empty files_modified | Empty array |
| readPhaseFiles / from phases/ dir | Parsed with frontmatter |
| readPhaseFiles / missing dir | Returns [] |
| readPhaseFiles / no dependencies | Defaults to [] |
| readTaskFiles / from tasks/ dir | Parsed with frontmatter |
| readTaskFiles / defaults category/type | `execute`/`auto` defaults |
| readTaskFiles / missing dir | Returns [] |
| readDecisionFiles / from decisions/ dir | Parsed |
| readDecisionFiles / missing dir | Returns [] |
| readPageFiles / from pages/ dir | Parsed with pageId |
| readPageFiles / missing dir | Returns [] |
| readPageFiles / ignores non-.md | Filter works |
**Real filesystem, no mocks. Good.** Gaps: `writeInputFiles` only checks `existsSync`, not file content. No write→read round-trip test.
---
#### `process-manager.test.ts` — ProcessManager
| Test | Verifies |
|------|----------|
| getAgentWorkdir | Path computation |
| createProjectWorktrees | Worktree creation for initiative projects |
| createProjectWorktrees / throws on failure | Error propagation |
| createStandaloneWorktree | Path returned |
| createStandaloneWorktree / throws on failure | Error propagation |
| spawnDetached / validates cwd exists | existsSync called |
| spawnDetached / throws if cwd missing | Error thrown |
| spawnDetached / passes correct cwd | spawn args verified |
| spawnDetached / writes prompt file | PROMPT.md written |
| buildSpawnCommand / native prompt mode | Command+args assembled |
| buildSpawnCommand / flag prompt mode | Command+args assembled |
| buildResumeCommand / flag style | Resume args assembled |
| buildResumeCommand / no resume support | Error thrown |
**6 `vi.mock()` calls.** Most brittle test file. Missing: `killProcess`, `cleanupWorktree`.
---
#### `mock-manager.test.ts` — MockAgentManager (testing the test double)
43 tests covering: default scenario, configured delay, error/questions scenarios, resume, stop, list, get/getByName, scenario overrides, event emission order, name uniqueness, constructor options, agent modes (execute/discuss/plan/detail), structured questions.
**Thorough.**
---
#### Agent Lifecycle (`src/agent/lifecycle/`)
**`error-analyzer.test.ts`** — 20 tests. Auth failure (4), usage limit (4), timeout (2), missing signal (3), process crash (4), unknown (2), context preservation (2). Pure behavioral classification tests.
**`instructions.test.ts`** — 6 tests. Constant content verification + `addInstructionToPrompt` pure function tests. Appropriate for 28-line module.
**`retry-policy.test.ts`** — 12 tests. Config (2), shouldRetry per error type (8), backoff delays (2). Clean decision-function tests. Missing: `AgentExhaustedError`/`AgentFailureError` class tests.
**`signal-manager.test.ts`** — 18 tests. clearSignal (2), checkSignalExists (2), readSignal (7), waitForSignal (3), validateSignalFile (4). Real filesystem, no mocks. `waitForSignal` has real-timer flakiness risk.
---
### DB Repositories (`src/db/repositories/drizzle/`)
All use real in-memory SQLite via `createTestDatabase()`. No mocks. Integration tests against real SQL.
**`agent.test.ts`** — 20 tests. Full CRUD + findByStatus, findByTaskId, findBySessionId. Gap: never tests `initiativeId` field.
**`cascade.test.ts`** — 4 tests. **High value.** Validates entire FK cascade graph: initiative→phases→tasks→pages→junction rows. Verifies SET NULL on agents/conversations. Before/after assertions prove setup correctness.
**`initiative.test.ts`** — 12 tests. Standard CRUD + findByStatus. Simplest repo, clean.
**`message.test.ts`** — 22 tests. **Most comprehensive repo test.** Covers sender/recipient polymorphism, threading (parentMessageId), status lifecycle (pending→read→responded), filtering (findPendingForUser, findRequiringResponse, findReplies).
**`phase.test.ts`** — 21 tests. CRUD + 12 dependency tests (createDependency, getDependencies, getDependents, findDependenciesByInitiativeId). Direct-only (not transitive) verified. Cross-initiative scoping verified.
**`task.test.ts`** — 12 tests. **Thinnest relative to feature surface.** Missing: `findByParentTaskId`, `findByInitiativeId`, `findPendingApproval`, category filtering, `requiresApproval` behavior.
**Shared pattern**: 8 tests use `setTimeout(resolve, 10)` for timestamp differentiation (~80ms total delay).
---
### Dispatch (`src/dispatch/`)
**`manager.test.ts`** — 13 tests. Queue, getNextDispatchable (priority, queuedAt ordering), completeTask, blockTask, dispatchNext (no tasks, no agents, success), getQueueState, dependency scenario. Real DB + mock event bus + mock agent manager. Gap: "dependency scenario" test admits dependencies aren't implemented — just retests priority ordering.
**`phase-manager.test.ts`** — 17 tests. queuePhase (status validation, deps), getNextDispatchablePhase (dependency blocking), dispatchNextPhase, completePhase, blockPhase, diamond dependency scenario. **Best dependency tests**: full diamond pattern (A→B, A→C, B→D, C→D) with step-by-step verification. Gap: mock DispatchManager never asserted (task-level dispatch not verified).
---
### Coordination (`src/coordination/`)
**`manager.test.ts`** — 14 tests. queueMerge, getNextMergeable (priority ordering), processMerges (success path, conflict detection, resolution task creation, message creation), getQueueState, handleConflict, error handling (undefined repos).
**`conflict-resolution-service.test.ts`** — 10 tests. handleConflict (resolution task properties, original task blocking, agent message, event emission, optional deps, error cases, multiple conflicts, parentTaskId preservation).
**Overlap**: 5+ tests in `manager.test.ts` duplicate conflict-handling logic from `conflict-resolution-service.test.ts`.
---
### Git (`src/git/`)
**`manager.test.ts`** — 22 tests. create, remove, list, get, diff (clean/added/modified/deleted), merge (success/conflict/abort/cleanup), edge cases. **Real git repos on filesystem. No mocks. Excellent.** Most confident test file in the suite.
**`simple-git-branch-manager.test.ts`** — 4 tests. `remoteBranchExists` only. Real git with bare repo + clones. Gap: only one method tested; other methods untested.
---
### Process (`src/process/`)
**`manager.test.ts`** — 18 tests. spawn, stop, stopAll, restart, isRunning (live/dead/stopped), event emission. Uses mocked `execa`. Good separation of event bus concern.
**`registry.test.ts`** — 14 tests. **Pure unit tests, no mocks.** register, get, getAll, updateStatus, unregister, getByPid, clear, size. Exemplary.
---
### Logging (`src/logging/`)
**`manager.test.ts`** — 13 tests. getBaseDir, ensureLogDir, ensureProcessDir, getProcessDir, getLogPath, listLogs (filtering), cleanOldLogs (retainDays). Real filesystem.
**`writer.test.ts`** — 17 tests. open, writeStdout/writeStderr (string/Buffer, timestamps, multi-line), close (flush, idempotent), write-after-close, append mode, event emission, backwards compatibility. Real filesystem + real EventBus.
---
### Events (`src/events/`)
**`bus.test.ts`** — 10 tests. emit/on, event isolation, once, off, multiple handlers, typed events, timestamp preservation, factory. Pure behavioral tests. Gap: no handler-throws test, no handler ordering test.
---
### Server (`src/server/`)
**`index.test.ts`** — 16 tests. Lifecycle (start/stop/double-start/PID conflict/idempotent stop/isRunning), HTTP endpoints (health/status/404/405), PID file management (create/remove/stale cleanup), event emission. Real HTTP server + real fetch. Gap: no port-already-in-use test.
**`shutdown.test.ts`** — 6 tests. shutdown() call order (server.stop before processManager.stopAll), signal handler installation (SIGTERM/SIGINT/SIGHUP). Gap: **doesn't verify signal→handler→shutdown chain**. Doesn't test error resilience (what if server.stop() throws).
---
### tRPC Router (`src/trpc/`)
**`router.test.ts`** — 14 tests. health procedure (shape, Zod, uptime calc, null startedAt, processCount), status procedure (shape, Zod, startedAt, pid, uptime), Zod schema validation.
**Major gap**: **Only `health` and `status` are tested.** All domain procedures untested: initiatives, tasks, phases, agents, pages, proposals, accounts, conversations, previews.
---
### Preview (`src/preview/`)
**`compose-generator.test.ts`** — 10 tests. Pure function tests. YAML generation, build config handling, Caddy deps, Caddyfile routing (single/multi/internal/specificity), labels. No mocks. Strong.
**`config-reader.test.ts`** — 6 tests. `parseCwPreviewConfig` only. Minimal/multi-service, validation errors, internal services, build normalization. Gap: `discoverConfig` (filesystem discovery) is untested — always mocked in manager.test.ts.
**`manager.test.ts`** — 20 tests. start (full lifecycle, phaseId, Docker unavailable, project not found, compose-up failure, health-check failure, no healthcheck), stop (cleanup, missing labels), list (all/filtered/skip non-preview/skip incomplete), getStatus (running/failed/stopped/building/null), stopAll (all/error resilience/empty). 6 `vi.mock()` calls.
**`port-allocator.test.ts`** — 4 tests. Base port, skip used ports, gap filling, real TCP port contention. One real TCP server test. Gap: no upper-bound/exhaustion test.
---
### E2E Tests (`src/test/e2e/`)
All use `createTestHarness()` with in-memory DB, MockAgentManager, MockWorktreeManager, real dispatch/coordination managers.
**`happy-path.test.ts`** — 6 tests. Single task flow, sequential deps, parallel dispatch, full merge flow, complex dependency flow (diamond), fixture dependency DB verification. Gap: merge flow requires manual agent+worktree bridging (harness not fully integrated).
**`architect-workflow.test.ts`** — 8 tests. Discuss mode (complete, pause/resume), plan mode (complete, persist phases), plan conflict detection (reject dup, auto-dismiss stale, different initiatives), full discuss→plan→phases workflow. Gap: architect output is never parsed — mock discards structured data.
**`decompose-workflow.test.ts`** — 10 tests. Detail mode (complete, pause/resume, multiple questions), detail conflict detection (3 tests), task persistence (create, all types, dependencies), full workflow. Gap: filename says "decompose" but describe says "Detail". Task dependency rows never verified in DB.
**`phase-dispatch.test.ts`** — 7 tests. **Cleanest E2E file.** Independent parallel dispatch, sequential deps, diamond dependency (A→B,C→D), approval gate (pending/in_progress rejection), blocked phase (no dispatch, downstream blocking). No fake timers, no agent involvement.
**`recovery-scenarios.test.ts`** — 9 tests. Queue state in DB, in-progress task recovery after crash, blocked state persistence, merge queue recovery, agent Q&A (question/resume, structured questions, status transitions, multiple questions).
**`edge-cases.test.ts`** — 12 tests. Agent crash (events, task status, error message), agent waiting (1 test), task blocking (4 tests), merge conflict handling (4 tests).
**`extended-scenarios.test.ts`** — 6 tests. Conflict hand-back round-trip (full lifecycle, context preservation, sequential conflicts), multi-agent parallel work (parallel completion, merge ordering, mixed outcomes).
---
### Integration Tests (`src/test/integration/`)
**`agent-workdir-verification.test.ts`** — 2 tests. Skipped by default (`REAL_WORKDIR_TESTS`). Spawns real Claude agent, verifies working directory path via diagnostic files and agent-created files.
**`crash-race-condition.test.ts`** — 3 tests. Always runs. Tests signal.json-based completion detection vs crash marking. Covers questions, no-signal crash, and slim-wildebeest scenario. Accesses private methods via `as any`.
**`real-claude.test.ts`** — 3 tests. Skipped by default. Direct Claude CLI contract tests: done/questions/error status outputs validated against Zod schema. ~$0.10.
---
### Real Provider Tests (`src/test/integration/real-providers/`)
All skipped by default. ~$0.50 total per full run.
**`claude-manager.test.ts`** — 7 tests. Output parsing (text_delta, init/session, result), questions flow, session resume (2 tests), error handling. Resume test accepts `crashed` as valid (too lenient).
**`codex-manager.test.ts`** — 3 tests. Spawn, output parsing, provider config. **All accept `crashed` as valid** — pass vacuously when Codex isn't installed.
**`conversation.test.ts`** — 1 test. **Most ambitious**: two real Claude agents doing actual coding while communicating via `cw ask`/`listen`/`answer`. Verifies 2 conversations, file creation, content, and interleaved coding+conversation. ~$0.30, 5min timeout.
**`crash-recovery.test.ts`** — 4 tests. Server restart simulation (resume streaming, mark dead agent, process output for dead agent), event idempotency (no duplicate events on restart).
**`schema-retry.test.ts`** — 9 tests. Valid output (done, questions, multiple questions), retry logic (bad→good, code blocks, surrounding text), mode-specific schemas (discuss, plan, detail). Tests 5-6 have conditional assertions (silently pass on crash).
---
## Remaining Redundancy
Tests covering the same scenario multiple times.
| Scenario | Tests |
|----------|-------|
| Task blocking (DB status) | `edge-cases.test.ts`, `recovery-scenarios.test.ts` |
| Merge conflict → resolution task | `edge-cases.test.ts`, `extended-scenarios.test.ts` (full round-trip) |
| Conflict resolution task creation | `coordination/manager.test.ts` (5 tests), `conflict-resolution-service.test.ts` (10 tests) |
---
## Coverage Gaps (Priority Order)
### Critical
1. **tRPC domain procedures entirely untested** — initiatives, tasks, phases, agents, pages, proposals, accounts, conversations, previews. Only `health`/`status` have tests. This is the single largest gap.
2. **Agent output parsing never tested E2E** — Architect helpers (`setArchitectPlanComplete`, `setArchitectDetailComplete`) discard structured output. The pipeline from agent output → DB persistence is exercised nowhere.
3. **`task.test.ts` thin relative to feature surface** — Missing: `findByParentTaskId`, `findByInitiativeId`, `findPendingApproval`, `approveTask`, category filtering, `requiresApproval` field behavior. These features exist in production but have zero direct unit test coverage.
### Important
4. **`discoverConfig` (preview config discovery) untested** — The filesystem walk logic (.cw-preview.yml → docker-compose.yml → Dockerfile) is always mocked away.
5. **`docker-client.js` and `health-checker.js` have no unit tests** — Always mocked. No verification of the Docker shell-out layer.
6. **Signal handler → shutdown chain untested**`shutdown.test.ts` verifies listeners are installed and `shutdown()` works when called directly, but never tests signal → handler → shutdown.
7. **No concurrent dispatch/merge tests** — For a multi-agent system, racing `dispatchNext()` or concurrent merges are never exercised.
8. **`processAgentOutput` missing error status test** — "done" and "questions" tested, "error" not tested.
### Minor
9. **EventBus handler-throws behavior untested** — Does emit swallow or propagate handler errors?
10. **Port allocator upper-bound/exhaustion untested** — What happens when all ports 9100-9200 are taken?
11. **`error-analyzer``retry-policy` integration untested** — Each tested independently but the chain (`analyzeError` output fed to `shouldRetry`) is never verified end-to-end.
12. **Git `BranchManager` incomplete** — Only `remoteBranchExists` tested. Other methods (if any) untested.
13. **`ProcessManager.killProcess` and `cleanupWorktree` untested** — No unit test for these methods.
---
## Fragility Assessment
### High Fragility (refactoring will break tests)
- **`agent/process-manager.test.ts`** — 6 `vi.mock()` calls. Any import restructuring breaks tests.
- **`agent/manager.test.ts`** — 5 `vi.mock()` calls. Same concern.
- **`preview/manager.test.ts`** — 6 `vi.mock()` calls. Same concern.
- **`crash-race-condition.test.ts`** — Accesses private `handleCompletion` via `as any`.
### Low Fragility (behavioral tests)
- **DB repository tests** — Real in-memory SQLite, test public API only.
- **`git/manager.test.ts`** — Real git repos, no mocks.
- **`file-io.test.ts`** — Real filesystem, pure function tests.
- **`process/registry.test.ts`** — Pure unit tests, no mocks.
- **`preview/compose-generator.test.ts`** — Pure function tests.
- **`events/bus.test.ts`** — Pure behavioral tests.
- **E2E tests** — Test through public harness API.
---
## Mock Style Inconsistencies
The test suite uses 4 different mocking patterns with no shared convention:
| Pattern | Files | Concern |
|---------|-------|---------|
| `vi.mock()` module-level | manager.test.ts, process-manager.test.ts, preview/manager.test.ts | Import-structure coupled |
| Factory function (`createMockX()`) | output-handler.test.ts, coordination/*.test.ts, server/*.test.ts | Cleaner but copy-pasted |
| Inline const object | mutex-completion.test.ts | Simple but mutated per-test |
| Hand-rolled class | harness.ts (MockWorktreeManager, CapturingEventBus) | Best for shared infrastructure |
`createMockEventBus()` is independently defined in 5+ files. Should be extracted to shared test utility.