Testing

src/test/ — Test infrastructure, fixtures, and test suites.

Framework

vitest (Vite-native test runner)

Test Categories

Unit Tests

Located alongside source files (*.test.ts):

  • src/agent/*.test.ts — Manager, output handler, completion detection, file I/O, process manager
  • src/db/repositories/drizzle/*.test.ts — Repository adapters
  • src/dispatch/*.test.ts — Dispatch manager
  • src/git/manager.test.ts — Worktree operations
  • src/process/*.test.ts — Process registry and manager
  • src/logging/*.test.ts — Log manager and writer

E2E Tests (Mocked Agents)

src/test/e2e/:

| File | Scenarios |
| --- | --- |
| happy-path.test.ts | Single task, parallel, complex flows |
| architect-workflow.test.ts | Discussion + plan agent workflows |
| detail-workflow.test.ts | Task detail with child tasks |
| phase-dispatch.test.ts | Phase-level dispatch with dependencies |
| recovery-scenarios.test.ts | Crash recovery, agent resume |
| edge-cases.test.ts | Boundary conditions |
| extended-scenarios.test.ts | Advanced multi-phase workflows |

These tests use MockAgentManager, which bypasses the real subprocess pipeline, so they cover dispatch and coordination logic only.

Cassette Tests (Pipeline Integration, Zero API Cost)

src/test/cassette/ — Tests the full agent execution pipeline using pre-recorded cassettes.

Unlike E2E tests, cassette tests exercise the real ProcessManager → FileTailer → OutputHandler → SignalManager path. Unlike real provider tests, they cost nothing to run in CI.

See Cassette System below for full documentation.

Integration Tests (Real Providers)

src/test/integration/real-providers/ (skipped by default because they cost real money):

| File | Provider | Cost |
| --- | --- | --- |
| claude-manager.test.ts | Claude CLI | ~$0.10 |
| codex-manager.test.ts | Codex | varies |
| schema-retry.test.ts | Claude CLI | ~$0.10 |
| crash-recovery.test.ts | Claude CLI | ~$0.10 |

Enable with env vars: REAL_CLAUDE_TESTS=1, REAL_CODEX_TESTS=1

Test Infrastructure

TestHarness (src/test/harness.ts)

Central test utility providing:

  • In-memory SQLite database with schema applied
  • All 10 repository instances
  • MockAgentManager — simulates agent behavior (done, questions, error)
  • MockWorktreeManager — in-memory worktree simulator
  • CapturingEventBus — captures events for assertions
  • DefaultDispatchManager and DefaultPhaseDispatchManager
  • 25+ helper methods for test scenarios
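
To illustrate what "captures events for assertions" can mean in practice, here is a minimal sketch of a capturing event bus. It is not the project's CapturingEventBus: the `DomainEvent` shape, the class name, and every method other than `getEventsByType` (which appears in the cassette test example later in this document) are assumptions made for illustration.

```typescript
// Hypothetical event shape; the project's real event types are not shown in this document.
interface DomainEvent {
  type: string;
  payload?: unknown;
}

type Handler = (event: DomainEvent) => void;

class CapturingEventBusSketch {
  private readonly captured: DomainEvent[] = [];
  private readonly handlers = new Map<string, Handler[]>();

  // Register a handler for one event type.
  on(type: string, handler: Handler): void {
    const list = this.handlers.get(type) ?? [];
    list.push(handler);
    this.handlers.set(type, list);
  }

  // Record the event first, then dispatch it, so assertions see every emitted event.
  emit(event: DomainEvent): void {
    this.captured.push(event);
    for (const handler of this.handlers.get(event.type) ?? []) {
      handler(event);
    }
  }

  // Return all captured events of one type, in emission order.
  getEventsByType(type: string): DomainEvent[] {
    return this.captured.filter((e) => e.type === type);
  }

  // Forget captured events between test cases.
  clear(): void {
    this.captured.length = 0;
  }
}
```

A test can then emit through the bus and assert on `getEventsByType('agent:stopped')` without wiring up a listener of its own.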

Fixtures (src/test/fixtures.ts)

Pre-built task hierarchies for testing:

| Fixture | Structure |
| --- | --- |
| SIMPLE_FIXTURE | 1 initiative → 1 phase → 1 group → 3 tasks (A→B, A→C deps) |
| PARALLEL_FIXTURE | 1 initiative → 1 phase → 2 groups → 4 independent tasks |
| COMPLEX_FIXTURE | 1 initiative → 2 phases → 4 groups → cross-phase dependencies |
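
As a rough illustration of the dependency shape in SIMPLE_FIXTURE, the sketch below models three tasks and computes which are ready to dispatch. It reads "A→B" as "B depends on A", which is an assumption, and the `TaskSketch` type and `readyTasks` helper are hypothetical, not the fixture's real types.

```typescript
// Hypothetical task shape; the real fixture types live in src/test/fixtures.ts.
interface TaskSketch {
  id: string;
  dependsOn: string[];
}

// Assumed reading of "A→B, A→C": B and C each depend on A.
const simpleTasks: TaskSketch[] = [
  { id: 'A', dependsOn: [] },
  { id: 'B', dependsOn: ['A'] },
  { id: 'C', dependsOn: ['A'] },
];

// A task is ready when it is not done and all of its dependencies are done.
function readyTasks(tasks: TaskSketch[], done: ReadonlySet<string>): string[] {
  return tasks
    .filter((t) => !done.has(t.id) && t.dependsOn.every((d) => done.has(d)))
    .map((t) => t.id);
}
```

Under that reading, only A is ready at the start, and B and C become ready together once A completes, which is the kind of ordering a dispatch test would assert on.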

Real Provider Harness (src/test/integration/real-providers/harness.ts)

  • Creates real database, real agent manager with real CLI tools
  • Provides describeRealClaude() / describeRealCodex() that skip when env var not set
  • MINIMAL_PROMPTS — cheap prompts for testing output parsing
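
The env-var gating behind describeRealClaude() / describeRealCodex() could look roughly like the sketch below. It is an assumption about the approach, not the harness source: `gatedDescribe` and `DescribeLike` are hypothetical names, and the check against the literal value `'1'` is inferred from the `REAL_CLAUDE_TESTS=1` convention above.

```typescript
type SuiteFn = (name: string, body: () => void) => void;

// Minimal stand-in for a test runner's `describe`, which also exposes `.skip`.
interface DescribeLike {
  (name: string, body: () => void): void;
  skip: SuiteFn;
}

// Returns the real `describe` when the env var is set to '1', otherwise `describe.skip`.
function gatedDescribe(
  envVar: string,
  describe: DescribeLike,
  env: Record<string, string | undefined>,
): SuiteFn {
  return env[envVar] === '1' ? describe : describe.skip;
}

// Possible wiring inside a test file (commented out so the sketch runs without a test runner):
// const describeRealClaude = gatedDescribe('REAL_CLAUDE_TESTS', describe, process.env);
// describeRealClaude('claude manager', () => { /* expensive tests */ });
```

Passing `describe` and `env` in as parameters keeps the gate itself trivial to unit test.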

Test Inventory

See test-inventory.md for a complete catalog of every test, what it verifies, coverage gaps, redundancy map, and fragility assessment.

Running Tests

# Unit + E2E tests (no API cost)
npm test

# Specific test file
npm test -- src/agent/manager.test.ts

# Cassette tests — replay pre-recorded cassettes (no API cost)
npm test -- src/test/cassette/

# Record new cassettes locally (requires real Claude CLI)
CW_CASSETTE_RECORD=1 npm test -- src/test/integration/real-providers/claude-manager.test.ts

# Real provider tests (costs money!)
REAL_CLAUDE_TESTS=1 npm test -- src/test/integration/real-providers/ --test-timeout=300000

Cassette System

src/test/cassette/ — VCR-style recording and replay for the agent subprocess pipeline.

Why it exists

The MockAgentManager used in E2E tests skips from "spawn called" directly to "agent:stopped emitted". It never exercises ProcessManager, FileTailer, OutputHandler, or SignalManager. Bugs in those layers (signal.json race conditions, JSONL parsing failures, crash detection) are invisible to E2E tests.

Real provider tests do exercise those layers, but they are slow, expensive, and can't run in CI without credentials.

Cassette tests bridge this gap: they run the real MultiProviderAgentManager pipeline but replace the live Claude/Codex subprocess with a replay worker that writes pre-recorded output.

Coverage the cassette layer adds

  • FileTailer — fs.watch + poll cycle, incremental JSONL reading
  • OutputHandler — stream event parsing, signal detection, result capture
  • SignalManager — signal.json read/write/timing
  • LifecycleController — retry logic, missing signal recovery
  • ProcessManager — subprocess PID tracking, poll-for-completion
  • Prompt normalization drift detection — key mismatch = re-record = visible diff
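
To make "incremental JSONL reading" concrete, here is a small sketch of the buffering problem a tailer has to solve: a chunk read from the output file can end in the middle of a line. The `JsonlBuffer` class is hypothetical and is not the project's FileTailer; it only shows the technique of holding back a partial line until its newline arrives.

```typescript
class JsonlBuffer {
  // Text received so far that does not yet end in a newline.
  private pending = '';

  // Feed one chunk; returns the parsed value of every line completed by this chunk.
  push(chunk: string): unknown[] {
    this.pending += chunk;
    const pieces = this.pending.split('\n');
    // The last piece is either an incomplete line or '' (when the chunk ended in '\n').
    this.pending = pieces.pop() ?? '';

    const parsed: unknown[] = [];
    for (const piece of pieces) {
      const line = piece.trim();
      if (line === '') continue; // tolerate blank lines
      parsed.push(JSON.parse(line)); // throws on malformed JSON; a real tailer may prefer to log and continue
    }
    return parsed;
  }
}
```

Feeding `'{"type":"sys'` and then `'tem"}\n'` yields nothing from the first call and one parsed object from the second, which is the case a mocked agent manager never exercises.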

Key generation

Each cassette is identified by a SHA256 hash of four components:

| Component | What it captures |
| --- | --- |
| normalizedPrompt | Prompt with UUIDs, temp paths, timestamps, session numbers replaced with placeholders |
| providerName | e.g. claude, codex |
| modelArgs | Provider CLI args with the prompt value stripped (sorted for stability) |
| worktreeHash | SHA256 of all non-hidden files in the agent worktree at spawn time |

The worktreeHash is what detects content drift for execute-mode agents: if the worktree changes, the key misses and the cassette is re-recorded.
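
One way to compute such a hash is sketched below. This is not the project's hashWorktreeFiles(): the function names, the choice to hash relative paths together with file contents, and the `'empty'` sentinel (borrowed from the sample cassette in this document) are all assumptions.

```typescript
import { createHash } from 'node:crypto';
import { mkdtempSync, readdirSync, readFileSync, rmSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Recursively list non-hidden files as paths relative to `root`.
function listVisibleFiles(root: string, rel = ''): string[] {
  const out: string[] = [];
  for (const entry of readdirSync(join(root, rel), { withFileTypes: true })) {
    if (entry.name.startsWith('.')) continue; // skip hidden files and directories such as .git
    const relPath = rel === '' ? entry.name : `${rel}/${entry.name}`;
    if (entry.isDirectory()) out.push(...listVisibleFiles(root, relPath));
    else if (entry.isFile()) out.push(relPath);
  }
  return out;
}

// Hash paths and contents in sorted order so the result does not depend on directory listing order.
function hashWorktreeSketch(root: string): string {
  const files = listVisibleFiles(root).sort();
  if (files.length === 0) return 'empty'; // assumed sentinel for a worktree with no visible files
  const hash = createHash('sha256');
  for (const rel of files) {
    hash.update(rel);
    hash.update('\0');
    hash.update(readFileSync(join(root, rel)));
    hash.update('\0');
  }
  return hash.digest('hex');
}

// Usage: the hash ignores hidden files but changes when a visible file changes.
const demoRoot = mkdtempSync(join(tmpdir(), 'worktree-'));
const emptyHash = hashWorktreeSketch(demoRoot);
writeFileSync(join(demoRoot, 'a.txt'), 'one');
const firstHash = hashWorktreeSketch(demoRoot);
writeFileSync(join(demoRoot, '.hidden'), 'ignored');
const withHiddenHash = hashWorktreeSketch(demoRoot);
writeFileSync(join(demoRoot, 'a.txt'), 'two');
const changedHash = hashWorktreeSketch(demoRoot);
rmSync(demoRoot, { recursive: true, force: true });
```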

Normalization (src/test/cassette/normalizer.ts) strips dynamic content that varies between runs but doesn't affect agent behavior:

  • UUIDs → __UUID__
  • Workspace root path → __WORKSPACE__
  • ISO 8601 timestamps → __TIMESTAMP__
  • Unix epoch milliseconds → __EPOCH__
  • Session numbers → session__N__
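
The replacements listed above can be sketched with a handful of regular expressions. This is an illustrative approximation, not the contents of normalizer.ts: the exact patterns, the replacement order, and the assumption that session numbers appear as `session` followed by digits are guesses.

```typescript
const UUID_RE = /\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b/gi;
const ISO_TIMESTAMP_RE = /\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})?/g;
const EPOCH_MS_RE = /\b\d{13}\b/g; // 13 digits covers millisecond epochs for current dates
const SESSION_RE = /\bsession\d+\b/g; // assumed input form, e.g. "session7"

function normalizePromptSketch(prompt: string, workspaceRoot: string): string {
  let text = prompt;
  // Replace the workspace path first, in case the path itself contains a UUID or digits.
  if (workspaceRoot !== '') {
    text = text.split(workspaceRoot).join('__WORKSPACE__');
  }
  return text
    .replace(UUID_RE, '__UUID__')
    .replace(ISO_TIMESTAMP_RE, '__TIMESTAMP__')
    .replace(EPOCH_MS_RE, '__EPOCH__')
    .replace(SESSION_RE, 'session__N__');
}
```

Two prompts that differ only in these dynamic values normalize to the same string, so they map to the same cassette.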

If a prompt template changes (e.g. someone edits buildExecutePrompt()), the normalized hash changes → cassette miss → test fails in CI → developer must re-record → the diff shows the new agent response in the PR. This makes prompt drift auditable.
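
The hashing step behind this behaviour can be sketched as follows. The field names match the cassette file format in this document, but everything else is an assumption: the helper names, the JSON-array canonical form, sorting the args at hash time, and truncating the digest to 32 hex characters (inferred from the `<32-char-hash>` file naming).

```typescript
import { createHash } from 'node:crypto';

interface CassetteKeySketch {
  normalizedPrompt: string;
  providerName: string;
  modelArgs: string[];
  worktreeHash: string;
}

// Drop the prompt value from the CLI args so it is not hashed twice (assumed behaviour).
function stripPromptFromArgsSketch(args: string[], prompt: string): string[] {
  return args.filter((arg) => arg !== prompt);
}

// Hash a canonical serialization of the four components.
function hashKeySketch(key: CassetteKeySketch): string {
  const canonical = JSON.stringify([
    key.normalizedPrompt,
    key.providerName,
    [...key.modelArgs].sort(), // sorted so argument order does not change the key
    key.worktreeHash,
  ]);
  return createHash('sha256').update(canonical).digest('hex').slice(0, 32);
}

// Usage: editing the prompt template yields a different key, so replay finds no cassette.
const baseKey: CassetteKeySketch = {
  normalizedPrompt: 'You are a Worker agent...',
  providerName: 'claude',
  modelArgs: ['--verbose', '--output-format', 'stream-json'],
  worktreeHash: 'empty',
};
const originalHash = hashKeySketch(baseKey);
const editedHash = hashKeySketch({ ...baseKey, normalizedPrompt: 'You are a careful Worker agent...' });
```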

Cassette file format

Cassettes live in src/test/cassettes/<32-char-hash>.json and are committed to git.

{
  "version": 1,
  "key": {
    "normalizedPrompt": "You are a Worker agent...",
    "providerName": "claude",
    "modelArgs": ["--dangerously-skip-permissions", "--verbose", "--output-format", "stream-json"],
    "worktreeHash": "empty"
  },
  "recording": {
    "jsonlLines": [
      "{\"type\":\"system\",\"session_id\":\"abc\"}",
      "{\"type\":\"result\",\"subtype\":\"success\",\"result\":\"ok\"}"
    ],
    "signalJson": { "status": "done", "message": "Task complete" },
    "exitCode": 0,
    "recordedAt": "2026-03-02T12:00:00.000Z"
  }
}
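
The JSON above maps naturally onto TypeScript interfaces. The sketch below derives the field names and types from the sample only; the real definitions in types.ts may differ (for example in which fields are optional), and the `isCassetteEntrySketch` guard is a hypothetical helper for validating a file after `JSON.parse`.

```typescript
interface CassetteKeyShape {
  normalizedPrompt: string;
  providerName: string;
  modelArgs: string[];
  worktreeHash: string;
}

interface CassetteRecordingShape {
  jsonlLines: string[];
  signalJson: Record<string, unknown>; // assumed always present; the sample shows status and message
  exitCode: number;
  recordedAt: string; // ISO 8601 timestamp
}

interface CassetteEntryShape {
  version: number;
  key: CassetteKeyShape;
  recording: CassetteRecordingShape;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === 'string');
}

// Structural check for a parsed cassette file; only version 1 is accepted.
function isCassetteEntrySketch(value: unknown): value is CassetteEntryShape {
  if (!isRecord(value) || value.version !== 1) return false;
  const { key, recording } = value;
  if (!isRecord(key) || !isRecord(recording)) return false;
  return (
    typeof key.normalizedPrompt === 'string' &&
    typeof key.providerName === 'string' &&
    isStringArray(key.modelArgs) &&
    typeof key.worktreeHash === 'string' &&
    isStringArray(recording.jsonlLines) &&
    isRecord(recording.signalJson) &&
    typeof recording.exitCode === 'number' &&
    typeof recording.recordedAt === 'string'
  );
}
```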

How replay works

CassetteProcessManager (extends ProcessManager) overrides two methods:

  1. spawnDetached() — on a cache hit, spawns replay-worker.mjs instead of the real CLI. The worker writes the recorded JSONL lines to stdout (which spawnDetached redirects to the output file via fd) and writes signal.json relative to its cwd. Everything above — FileTailer, OutputHandler, poll loop — runs unmodified.

  2. pollForCompletion() — on a cache miss (record mode), wraps the onComplete callback to read the output file and signal.json after the process exits, then saves the cassette before handing off to OutputHandler.

MultiProviderAgentManager accepts an optional processManagerOverride constructor parameter so CassetteProcessManager can be injected without changing production callers.
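
The replay side of step 1 can be approximated in a few lines. This sketch is written in TypeScript for consistency with the rest of this document and is not the contents of replay-worker.mjs: `replayRecording`, the injected `writeStdout` callback, and the default `signal.json` location directly under the cwd are assumptions.

```typescript
import { mkdtempSync, readFileSync, rmSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical subset of a cassette's "recording" object.
interface RecordingSketch {
  jsonlLines: string[];
  signalJson: unknown;
  exitCode: number;
}

// Emit the recorded output and signal file, then report the recorded exit code.
function replayRecording(
  rec: RecordingSketch,
  writeStdout: (chunk: string) => void,
  cwd: string,
  signalRelPath = 'signal.json', // assumed location; the real path is relative to the worker's cwd
): number {
  for (const line of rec.jsonlLines) {
    writeStdout(`${line}\n`); // one JSONL record per line, as the tailer expects
  }
  if (rec.signalJson !== null && rec.signalJson !== undefined) {
    writeFileSync(join(cwd, signalRelPath), JSON.stringify(rec.signalJson));
  }
  return rec.exitCode;
}

// Usage: a real worker would pass process.stdout.write and call process.exit with the result.
const demoCwd = mkdtempSync(join(tmpdir(), 'replay-'));
let demoOutput = '';
const demoExit = replayRecording(
  {
    jsonlLines: ['{"type":"system","session_id":"abc"}', '{"type":"result","subtype":"success","result":"ok"}'],
    signalJson: { status: 'done', message: 'Task complete' },
    exitCode: 0,
  },
  (chunk) => { demoOutput += chunk; },
  demoCwd,
);
const demoSignal: unknown = JSON.parse(readFileSync(join(demoCwd, 'signal.json'), 'utf8'));
rmSync(demoCwd, { recursive: true, force: true });
```

Because the worker only writes to stdout and to a file, the layers that consume that output cannot tell it apart from a live CLI process.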

Mode control

| Env var | Mode | Behaviour |
| --- | --- | --- |
| (none) | replay | Cassette must exist; throws if missing. Safe for CI. |
| CW_CASSETTE_RECORD=1 | auto | Replays if cassette exists, runs real agent and records if missing. |
| CW_CASSETTE_FORCE_RECORD=1 | record | Always runs real agent; overwrites existing cassette. Use when prompt changed intentionally. |
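
The three modes reduce to a small decision function. The sketch below follows the table above; the function names are hypothetical, and giving CW_CASSETTE_FORCE_RECORD precedence when both variables are set is an assumption.

```typescript
type CassetteMode = 'replay' | 'auto' | 'record';
type CassetteAction = 'replay' | 'record';

// Map environment variables to a mode.
function resolveCassetteMode(env: Record<string, string | undefined>): CassetteMode {
  if (env.CW_CASSETTE_FORCE_RECORD === '1') return 'record'; // assumed to win over CW_CASSETTE_RECORD
  if (env.CW_CASSETTE_RECORD === '1') return 'auto';
  return 'replay';
}

// Decide what to do for one spawn, given the mode and whether a cassette matches the key.
function decideCassetteAction(mode: CassetteMode, cassetteExists: boolean): CassetteAction {
  if (mode === 'record') return 'record'; // always run the real agent and overwrite
  if (cassetteExists) return 'replay';
  if (mode === 'auto') return 'record'; // cache miss in auto mode: record a new cassette
  throw new Error('No cassette found for this key in replay mode; re-record locally.');
}
```

Throwing on a miss in replay mode is what keeps CI from silently falling back to a real, billable agent run.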

Writing cassette tests

import { afterAll, beforeAll, describe, expect, it } from 'vitest';
import { createCassetteHarness } from '../cassette/index.js';
import { MINIMAL_PROMPTS } from '../integration/real-providers/prompts.js';
import type { RealProviderHarness } from '../integration/real-providers/harness.js';

describe('agent pipeline (cassette)', () => {
  let harness: RealProviderHarness;

  beforeAll(async () => {
    harness = await createCassetteHarness({ provider: 'claude' });
  });

  afterAll(() => harness.cleanup());

  it('completes a task and emits agent:stopped', async () => {
    const agent = await harness.agentManager.spawn({
      taskId: null,
      prompt: MINIMAL_PROMPTS.done,
      mode: 'execute',
      provider: 'claude',
    });

    const result = await harness.waitForAgentCompletion(agent.id);
    expect(result?.success).toBe(true);

    const stopped = harness.getEventsByType('agent:stopped');
    expect(stopped).toHaveLength(1);
  });
});

createCassetteHarness() returns a RealProviderHarness, so tests written for real providers work unchanged.

Cassette directory

src/test/cassettes/
  <hash>.json     ← committed to git; one file per recorded scenario
  .gitkeep

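Cassettes are committed so CI can run without any AI API credentials. When a cassette needs updating (prompt changed, provider output format changed), re-record locally and commit the updated file. CW_CASSETTE_RECORD=1 records only when no cassette matches the key, so a changed prompt is picked up automatically; when the key is unchanged but the recorded output is stale, use CW_CASSETTE_FORCE_RECORD=1 to overwrite the existing cassette.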

Files

| File | Purpose |
| --- | --- |
| types.ts | CassetteKey, CassetteRecording, CassetteEntry interfaces |
| normalizer.ts | normalizePrompt(), stripPromptFromArgs() |
| key.ts | hashWorktreeFiles(), buildCassetteKey() |
| store.ts | CassetteStore: find/save cassette JSON files |
| replay-worker.mjs | Subprocess that replays a cassette (plain JS ESM, no build step) |
| process-manager.ts | CassetteProcessManager: overrides spawnDetached and pollForCompletion |
| harness.ts | createCassetteHarness(): factory returning RealProviderHarness |
| index.ts | Barrel exports |
| cassette.test.ts | Unit tests for normalizer, key generation, and store |