Testing
apps/server/test/ — Test infrastructure, fixtures, and test suites.
Framework
vitest (Vite-native test runner)
Test Categories
Unit Tests
Located alongside source files (*.test.ts):
- apps/server/agent/*.test.ts: Manager, output handler, completion detection, file I/O, process manager
- apps/server/db/repositories/drizzle/*.test.ts: Repository adapters
- apps/server/dispatch/*.test.ts: Dispatch manager
- apps/server/git/manager.test.ts: Worktree operations
- apps/server/process/*.test.ts: Process registry and manager
- apps/server/logging/*.test.ts: Log manager and writer
E2E Tests (Mocked Agents)
apps/server/test/e2e/:
| File | Scenarios |
|---|---|
| happy-path.test.ts | Single task, parallel, complex flows |
| architect-workflow.test.ts | Discussion + plan agent workflows |
| detail-workflow.test.ts | Task detail with child tasks |
| phase-dispatch.test.ts | Phase-level dispatch with dependencies |
| recovery-scenarios.test.ts | Crash recovery, agent resume |
| edge-cases.test.ts | Boundary conditions |
| extended-scenarios.test.ts | Advanced multi-phase workflows |
These use MockAgentManager which bypasses the real subprocess pipeline. They test dispatch/coordination logic only.
Cassette Tests (Pipeline Integration, Zero API Cost)
apps/server/test/cassette/ — Tests the full agent execution pipeline using pre-recorded cassettes.
Unlike E2E tests, cassette tests exercise the real ProcessManager → FileTailer → OutputHandler → SignalManager path. Unlike real provider tests, they cost nothing to run in CI.
See Cassette System below for full documentation.
Integration Tests (Real Providers)
apps/server/test/integration/real-providers/ — skipped by default (cost real money):
| File | Provider | Cost |
|---|---|---|
| claude-manager.test.ts | Claude CLI | ~$0.10 |
| codex-manager.test.ts | Codex | varies |
| schema-retry.test.ts | Claude CLI | ~$0.10 |
| crash-recovery.test.ts | Claude CLI | ~$0.10 |
Enable with env vars: REAL_CLAUDE_TESTS=1, REAL_CODEX_TESTS=1
Test Infrastructure
TestHarness (apps/server/test/harness.ts)
Central test utility providing:
- In-memory SQLite database with schema applied
- All 10 repository instances
- MockAgentManager: simulates agent behavior (done, questions, error)
- MockWorktreeManager: in-memory worktree simulator
- CapturingEventBus: captures events for assertions
- DefaultDispatchManager and DefaultPhaseDispatchManager
- 25+ helper methods for test scenarios
Fixtures (apps/server/test/fixtures.ts)
Pre-built task hierarchies for testing:
| Fixture | Structure |
|---|---|
| SIMPLE_FIXTURE | 1 initiative → 1 phase → 1 group → 3 tasks (A→B, A→C deps) |
| PARALLEL_FIXTURE | 1 initiative → 1 phase → 2 groups → 4 independent tasks |
| COMPLEX_FIXTURE | 1 initiative → 2 phases → 4 groups → cross-phase dependencies |
Real Provider Harness (apps/server/test/integration/real-providers/harness.ts)
- Creates real database, real agent manager with real CLI tools
- Provides describeRealClaude() / describeRealCodex() that skip when env var not set
- MINIMAL_PROMPTS: cheap prompts for testing output parsing
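The env-var gating behind describeRealClaude() / describeRealCodex() could be sketched roughly as below. This is an illustration, not the harness's actual code: describeWhenEnabled and the DescribeLike shape are hypothetical names, and the sketch assumes the flag is considered set only when its value is exactly "1".

```typescript
// Minimal shape of a test runner's `describe`, so the sketch does not depend on vitest itself.
type SuiteFn = (name: string, body: () => void) => void;

interface DescribeLike {
  (name: string, body: () => void): void;
  skip: SuiteFn;
}

// Hypothetical helper: returns `describe` when the env var equals "1",
// otherwise returns `describe.skip` so the suite is reported as skipped.
function describeWhenEnabled(
  describe: DescribeLike,
  envVar: string,
  env: Record<string, string | undefined> = process.env,
): SuiteFn {
  return env[envVar] === "1" ? describe : describe.skip;
}
```

With vitest, a wrapper in this style would be called as describeWhenEnabled(describe, "REAL_CLAUDE_TESTS")("suite name", () => { ... }), so suites stay visible in the report even when they do not run.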
Test Inventory
See test-inventory.md for a complete catalog of every test, what it verifies, coverage gaps, redundancy map, and fragility assessment.
Running Tests
# Unit + E2E tests (no API cost)
npm test
# Specific test file
npm test -- apps/server/agent/manager.test.ts
# Cassette tests — replay pre-recorded cassettes (no API cost)
npm test -- apps/server/test/cassette/
# Record new cassettes locally (requires real Claude CLI)
CW_CASSETTE_RECORD=1 npm test -- apps/server/test/integration/real-providers/claude-manager.test.ts
# Real provider tests (costs money!)
REAL_CLAUDE_TESTS=1 npm test -- apps/server/test/integration/real-providers/ --test-timeout=300000
Cassette System
apps/server/test/cassette/ — VCR-style recording and replay for the agent subprocess pipeline.
Why it exists
The MockAgentManager used in E2E tests skips from "spawn called" directly to "agent:stopped emitted". It never exercises ProcessManager, FileTailer, OutputHandler, or SignalManager. Bugs in those layers (signal.json race conditions, JSONL parsing failures, crash detection) are invisible to E2E tests.
Real provider tests do exercise those layers, but they are slow, expensive, and can't run in CI without credentials.
Cassette tests bridge this gap: they run the real MultiProviderAgentManager pipeline but replace the live Claude/Codex subprocess with a replay worker that writes pre-recorded output.
Coverage the cassette layer adds
- FileTailer: fs.watch + poll cycle, incremental JSONL reading
- OutputHandler: stream event parsing, signal detection, result capture
- SignalManager: signal.json read/write/timing
- LifecycleController: retry logic, missing signal recovery
- ProcessManager: subprocess PID tracking, poll-for-completion
- Prompt normalization drift detection: key mismatch = re-record = visible diff
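To make "incremental JSONL reading" concrete, here is a minimal sketch of the kind of buffering a tailer needs when a file is read in arbitrary chunks. JsonlLineBuffer is a hypothetical class written for illustration; it is not the project's FileTailer, and it assumes every complete line is valid JSON.

```typescript
// Hypothetical helper: buffers appended text and emits only complete,
// newline-terminated JSONL records. A trailing partial line is kept for later.
class JsonlLineBuffer {
  private pending = "";

  // Feed the next chunk read from the file; returns the records it completed.
  push(chunk: string): unknown[] {
    this.pending += chunk;
    const lines = this.pending.split("\n");
    // The last element is "" if the chunk ended on a newline, else a partial line.
    this.pending = lines.pop() ?? "";

    const records: unknown[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed === "") continue; // ignore blank lines
      records.push(JSON.parse(trimmed)); // throws on malformed JSON
    }
    return records;
  }
}
```

A bug in this layer, such as parsing a half-written line, is the kind of failure the mocked E2E tests cannot see, because they never read an output file.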
Key generation
Each cassette is identified by a SHA256 hash of four components:
| Component | What it captures |
|---|---|
| normalizedPrompt | Prompt with UUIDs, temp paths, timestamps, session numbers replaced with placeholders |
| providerName | e.g. claude, codex |
| modelArgs | Provider CLI args with the prompt value stripped (sorted for stability) |
| worktreeHash | SHA256 of all non-hidden files in the agent worktree at spawn time |
The worktreeHash is what detects content drift for execute-mode agents: if the worktree changes, the key misses and the cassette is re-recorded.
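The key derivation described above might look roughly like the following sketch. It is not the code in key.ts: the canonical serialization, the truncation to 32 hex characters (inferred from the cassette file name format), the "empty" sentinel (inferred from the example cassette), and the name hashCassetteKey are all assumptions.

```typescript
import { createHash } from "node:crypto";
import { readdirSync, readFileSync } from "node:fs";
import { join, relative } from "node:path";

interface CassetteKey {
  normalizedPrompt: string;
  providerName: string;
  modelArgs: string[];
  worktreeHash: string;
}

// Hash every non-hidden file under `root` in sorted order, mixing in relative paths
// so that renames also change the digest. "empty" for a file-less tree is an assumption.
function hashWorktreeFiles(root: string): string {
  const files: string[] = [];
  const walk = (dir: string): void => {
    for (const entry of readdirSync(dir, { withFileTypes: true })) {
      if (entry.name.startsWith(".")) continue; // skip hidden files and directories
      const full = join(dir, entry.name);
      if (entry.isDirectory()) walk(full);
      else if (entry.isFile()) files.push(full);
    }
  };
  walk(root);
  if (files.length === 0) return "empty";

  const hash = createHash("sha256");
  for (const file of files.sort()) {
    hash.update(relative(root, file));
    hash.update("\0");
    hash.update(readFileSync(file));
    hash.update("\0");
  }
  return hash.digest("hex");
}

// Serialize the four components in a fixed order so the digest is stable across runs.
function hashCassetteKey(key: CassetteKey): string {
  const canonical = JSON.stringify([
    key.normalizedPrompt,
    key.providerName,
    key.modelArgs,
    key.worktreeHash,
  ]);
  return createHash("sha256").update(canonical).digest("hex").slice(0, 32);
}
```

Because the worktree digest is one of the hashed components, editing any tracked file in the worktree yields a different file name, which is what turns content drift into a cassette miss.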
Normalization (apps/server/test/cassette/normalizer.ts) strips dynamic content that varies between runs but doesn't affect agent behavior:
- UUIDs → __UUID__
- Workspace root path → __WORKSPACE__
- ISO 8601 timestamps → __TIMESTAMP__
- Unix epoch milliseconds → __EPOCH__
- Session numbers → session__N__
If a prompt template changes (e.g. someone edits buildExecutePrompt()), the normalized hash changes → cassette miss → test fails in CI → developer must re-record → the diff shows the new agent response in the PR. This makes prompt drift auditable.
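A minimal sketch of the replacement rules listed above follows. The regular expressions are assumptions chosen for illustration (for instance, that a session number appears as "session" followed directly by digits, and that epoch milliseconds are 13-digit numbers); the real normalizer.ts may match different forms.

```typescript
// Hypothetical patterns; the real normalizer may match more or different forms.
const UUID_RE = /[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi;
const ISO_TIMESTAMP_RE = /\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})?/g;
const EPOCH_MS_RE = /\b\d{13}\b/g;
const SESSION_RE = /\bsession\d+\b/g;

function normalizePrompt(prompt: string, workspaceRoot: string): string {
  // Replace the workspace path first: temp paths often embed UUIDs or timestamps.
  // split/join does a literal replacement, so the path needs no regex escaping.
  const withoutRoot =
    workspaceRoot === "" ? prompt : prompt.split(workspaceRoot).join("__WORKSPACE__");

  return withoutRoot
    .replace(UUID_RE, "__UUID__")
    .replace(ISO_TIMESTAMP_RE, "__TIMESTAMP__")
    .replace(EPOCH_MS_RE, "__EPOCH__")
    .replace(SESSION_RE, "session__N__");
}
```

Two runs of the same template with different IDs, paths, and clocks then normalize to the same string, so only a genuine template edit changes the cassette key.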
Cassette file format
Cassettes live in apps/server/test/cassettes/<32-char-hash>.json and are committed to git.
{
  "version": 1,
  "key": {
    "normalizedPrompt": "You are a Worker agent...",
    "providerName": "claude",
    "modelArgs": ["--dangerously-skip-permissions", "--verbose", "--output-format", "stream-json"],
    "worktreeHash": "empty"
  },
  "recording": {
    "jsonlLines": [
      "{\"type\":\"system\",\"session_id\":\"abc\"}",
      "{\"type\":\"result\",\"subtype\":\"success\",\"result\":\"ok\"}"
    ],
    "signalJson": { "status": "done", "message": "Task complete" },
    "exitCode": 0,
    "recordedAt": "2026-03-02T12:00:00.000Z"
  }
}
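The file format above maps naturally onto a few TypeScript interfaces. The sketch below infers field types from the example alone, so treat them as assumptions; the interface names match those listed for types.ts, but parseCassette is a hypothetical helper, not part of the cassette module.

```typescript
interface CassetteKey {
  normalizedPrompt: string;
  providerName: string;
  modelArgs: string[];
  worktreeHash: string;
}

interface CassetteRecording {
  jsonlLines: string[]; // one raw JSONL line per element, as the CLI emitted it
  signalJson: Record<string, unknown>;
  exitCode: number;
  recordedAt: string; // ISO 8601 timestamp
}

interface CassetteEntry {
  version: number;
  key: CassetteKey;
  recording: CassetteRecording;
}

// Hypothetical loader: parses cassette JSON and rejects obviously malformed files.
function parseCassette(text: string): CassetteEntry {
  const data = JSON.parse(text) as Partial<CassetteEntry>;
  if (data.version !== 1) {
    throw new Error(`unsupported cassette version: ${String(data.version)}`);
  }
  if (!data.key || !data.recording) {
    throw new Error("cassette is missing key or recording");
  }
  if (!Array.isArray(data.recording.jsonlLines)) {
    throw new Error("recording.jsonlLines must be an array");
  }
  return data as CassetteEntry;
}
```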
How replay works
CassetteProcessManager (extends ProcessManager) overrides two methods:
- spawnDetached(): on a cache hit, spawns replay-worker.mjs instead of the real CLI. The worker writes the recorded JSONL lines to stdout (which spawnDetached redirects to the output file via fd) and writes signal.json relative to its cwd. Everything above (FileTailer, OutputHandler, poll loop) runs unmodified.
- pollForCompletion(): on a cache miss (record mode), wraps the onComplete callback to read the output file and signal.json after the process exits, then saves the cassette before handing off to OutputHandler.
MultiProviderAgentManager accepts an optional processManagerOverride constructor parameter so CassetteProcessManager can be injected without changing production callers.
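The replay half of this design can be sketched as a small pure function. This is not the contents of replay-worker.mjs: replayRecording and ReplaySinks are hypothetical names, and the two sinks stand in for process.stdout.write and for writing signal.json under the worker's cwd.

```typescript
interface ReplayableRecording {
  jsonlLines: string[];
  signalJson: unknown;
  exitCode: number;
}

// Injected outputs keep the sketch free of real I/O.
interface ReplaySinks {
  writeOutput(chunk: string): void; // stands in for process.stdout.write
  writeSignal(json: string): void; // stands in for writing signal.json
}

// Emit the recorded lines in their original order, then the signal,
// and return the exit code the worker process should finish with.
function replayRecording(recording: ReplayableRecording, sinks: ReplaySinks): number {
  for (const line of recording.jsonlLines) {
    sinks.writeOutput(line + "\n");
  }
  // Writing the signal after the output mirrors an agent that signals once it is done.
  // Skipping it when absent is an assumption about how a missing signal is replayed.
  if (recording.signalJson !== null && recording.signalJson !== undefined) {
    sinks.writeSignal(JSON.stringify(recording.signalJson));
  }
  return recording.exitCode;
}
```

Since the worker only reproduces bytes on stdout and a signal file, nothing downstream needs to know whether it is reading from a live CLI or from a recording.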
Mode control
| Env var | Mode | Behaviour |
|---|---|---|
| (none) | replay | Cassette must exist; throws if missing. Safe for CI. |
| CW_CASSETTE_RECORD=1 | auto | Replays if cassette exists, runs real agent and records if missing. |
| CW_CASSETTE_FORCE_RECORD=1 | record | Always runs real agent; overwrites existing cassette. Use when prompt changed intentionally. |
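The mode table can be read as two small decisions: which mode the environment selects, and what to do for a given cassette lookup. The sketch below encodes that reading; the function names are hypothetical, and giving CW_CASSETTE_FORCE_RECORD precedence when both variables are set is an assumption.

```typescript
type CassetteMode = "replay" | "auto" | "record";
type CassetteAction = "replay" | "record";

// Map environment variables to a mode. Force-record winning over record is an assumption.
function resolveCassetteMode(
  env: Record<string, string | undefined> = process.env,
): CassetteMode {
  if (env["CW_CASSETTE_FORCE_RECORD"] === "1") return "record";
  if (env["CW_CASSETTE_RECORD"] === "1") return "auto";
  return "replay";
}

// Decide what to do for one spawn, given whether a matching cassette exists.
function chooseCassetteAction(mode: CassetteMode, cassetteExists: boolean): CassetteAction {
  if (mode === "record") return "record"; // always run the real agent and overwrite
  if (cassetteExists) return "replay";
  if (mode === "auto") return "record"; // cache miss: run the real agent and save
  throw new Error("cassette missing in replay mode; re-record with CW_CASSETTE_RECORD=1");
}
```

Throwing on a miss in the default mode is what keeps CI from silently calling a paid provider.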
Writing cassette tests
import { afterAll, beforeAll, describe, expect, it } from 'vitest';
import { createCassetteHarness } from '../cassette/index.js';
import { MINIMAL_PROMPTS } from '../integration/real-providers/prompts.js';
import type { RealProviderHarness } from '../integration/real-providers/harness.js';

describe('agent pipeline (cassette)', () => {
  let harness: RealProviderHarness;

  // One harness is shared by every test in this suite.
  beforeAll(async () => {
    harness = await createCassetteHarness({ provider: 'claude' });
  });

  afterAll(() => harness.cleanup());

  it('completes a task and emits agent:stopped', async () => {
    const agent = await harness.agentManager.spawn({
      taskId: null,
      prompt: MINIMAL_PROMPTS.done,
      mode: 'execute',
      provider: 'claude',
    });

    // Wait for the agent to finish, then check its result.
    const result = await harness.waitForAgentCompletion(agent.id);
    expect(result?.success).toBe(true);

    // Exactly one agent:stopped event should have been captured.
    const stopped = harness.getEventsByType('agent:stopped');
    expect(stopped).toHaveLength(1);
  });
});
createCassetteHarness() returns a RealProviderHarness, so tests written for real providers work unchanged.
Cassette directory
apps/server/test/cassettes/
<hash>.json ← committed to git; one file per recorded scenario
.gitkeep
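Cassettes are committed so CI can run without any AI API credentials. When a cassette needs updating, re-record locally and commit the updated file. CW_CASSETTE_RECORD=1 records a new cassette when a prompt change produces a new key; CW_CASSETTE_FORCE_RECORD=1 overwrites an existing cassette when the key is unchanged (for example, after a provider output format change).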
Files
| File | Purpose |
|---|---|
| types.ts | CassetteKey, CassetteRecording, CassetteEntry interfaces |
| normalizer.ts | normalizePrompt(), stripPromptFromArgs() |
| key.ts | hashWorktreeFiles(), buildCassetteKey() |
| store.ts | CassetteStore: find/save cassette JSON files |
| replay-worker.mjs | Subprocess that replays a cassette (plain JS ESM, no build step) |
| process-manager.ts | CassetteProcessManager: overrides spawnDetached and pollForCompletion |
| harness.ts | createCassetteHarness(): factory returning RealProviderHarness |
| index.ts | Barrel exports |
| cassette.test.ts | Unit tests for normalizer, key generation, and store |