# Testing

`apps/server/test/` — Test infrastructure, fixtures, and test suites.

## Framework

**vitest** (Vite-native test runner)

## Test Categories

### Unit Tests

Located alongside source files (`*.test.ts`):

- `apps/server/agent/*.test.ts` — Manager, output handler, completion detection, file I/O, process manager
- `apps/server/db/repositories/drizzle/*.test.ts` — Repository adapters
- `apps/server/dispatch/*.test.ts` — Dispatch manager
- `apps/server/git/manager.test.ts` — Worktree operations
- `apps/server/process/*.test.ts` — Process registry and manager
- `apps/server/logging/*.test.ts` — Log manager and writer

### E2E Tests (Mocked Agents)

`apps/server/test/e2e/`:

| File | Scenarios |
|------|-----------|
| `happy-path.test.ts` | Single task, parallel, complex flows |
| `architect-workflow.test.ts` | Discussion + plan agent workflows |
| `detail-workflow.test.ts` | Task detail with child tasks |
| `phase-dispatch.test.ts` | Phase-level dispatch with dependencies |
| `recovery-scenarios.test.ts` | Crash recovery, agent resume |
| `edge-cases.test.ts` | Boundary conditions |
| `extended-scenarios.test.ts` | Advanced multi-phase workflows |

These use `MockAgentManager` which bypasses the real subprocess pipeline. They test dispatch/coordination logic only.

### Cassette Tests (Pipeline Integration, Zero API Cost)

`apps/server/test/cassette/` — Tests the full agent execution pipeline using pre-recorded cassettes.

Unlike E2E tests, cassette tests exercise the real `ProcessManager → FileTailer → OutputHandler → SignalManager` path. Unlike real provider tests, they cost nothing to run in CI.

See **[Cassette System](#cassette-system)** below for full documentation.

### Integration Tests (Real Providers)

`apps/server/test/integration/real-providers/` — **skipped by default** (cost real money):

| File | Provider | Cost |
|------|----------|------|
| `claude-manager.test.ts` | Claude CLI | ~$0.10 |
| `codex-manager.test.ts` | Codex | varies |
| `schema-retry.test.ts` | Claude CLI | ~$0.10 |
| `crash-recovery.test.ts` | Claude CLI | ~$0.10 |

Enable with env vars: `REAL_CLAUDE_TESTS=1`, `REAL_CODEX_TESTS=1`

## Test Infrastructure

### TestHarness (`apps/server/test/harness.ts`)

Central test utility providing:

- In-memory SQLite database with schema applied
- All 10 repository instances
- `MockAgentManager` — simulates agent behavior (done, questions, error)
- `MockWorktreeManager` — in-memory worktree simulator
- `CapturingEventBus` — captures events for assertions
- `DefaultDispatchManager` and `DefaultPhaseDispatchManager`
- 25+ helper methods for test scenarios

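A capturing event bus like the one listed above can be little more than an array of recorded events with a type filter. The sketch below is hypothetical. `CapturingEventBusSketch`, the `DomainEvent` shape, and the `on`/`emit`/`clear` methods are assumptions for illustration, not the harness's actual API. The only lookup this document shows is `getEventsByType`, and it is called on the harness.

```ts
// Hypothetical event shape; the harness's real event types are richer.
interface DomainEvent {
  type: string;
  payload?: Record<string, unknown>;
}

type Listener = (event: DomainEvent) => void;

// Records every emitted event so assertions can inspect them after the fact.
class CapturingEventBusSketch {
  private readonly events: DomainEvent[] = [];
  private readonly listeners = new Map<string, Listener[]>();

  on(type: string, listener: Listener): void {
    const list = this.listeners.get(type) ?? [];
    list.push(listener);
    this.listeners.set(type, list);
  }

  emit(event: DomainEvent): void {
    // Capture before dispatch, so a throwing listener cannot hide the event from assertions.
    this.events.push(event);
    for (const listener of this.listeners.get(event.type) ?? []) listener(event);
  }

  getEventsByType(type: string): DomainEvent[] {
    return this.events.filter((e) => e.type === type);
  }

  clear(): void {
    this.events.length = 0;
  }
}
```
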
### Fixtures (`apps/server/test/fixtures.ts`)

Pre-built task hierarchies for testing:

| Fixture | Structure |
|---------|-----------|
| `SIMPLE_FIXTURE` | 1 initiative → 1 phase → 1 group → 3 tasks (A→B, A→C deps) |
| `PARALLEL_FIXTURE` | 1 initiative → 1 phase → 2 groups → 4 independent tasks |
| `COMPLEX_FIXTURE` | 1 initiative → 2 phases → 4 groups → cross-phase dependencies |

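The table gives only the shape of each fixture. The sketch below is a hypothetical illustration of how a dependency-ordered fixture drives dispatch assertions. It models `SIMPLE_FIXTURE`'s hierarchy and computes which tasks are ready to run. It assumes "A→B" means "B depends on A". The `FixtureSketch` and `TaskSpec` types and the `readyTasks` helper are invented for this example and do not mirror `fixtures.ts` or the dispatch managers.

```ts
// Hypothetical fixture shape; field names are illustrative only.
interface TaskSpec {
  id: string;
  dependsOn: string[];
}

interface FixtureSketch {
  initiative: string;
  phases: { name: string; groups: { name: string; tasks: TaskSpec[] }[] }[];
}

// SIMPLE_FIXTURE-like hierarchy, assuming "A→B" means "B depends on A".
const simpleFixtureSketch: FixtureSketch = {
  initiative: "initiative-1",
  phases: [
    {
      name: "phase-1",
      groups: [
        {
          name: "group-1",
          tasks: [
            { id: "A", dependsOn: [] },
            { id: "B", dependsOn: ["A"] },
            { id: "C", dependsOn: ["A"] },
          ],
        },
      ],
    },
  ],
};

// Tasks that are not yet completed and whose dependencies are all completed.
function readyTasks(fixture: FixtureSketch, completed: Set<string>): string[] {
  const tasks = fixture.phases.flatMap((p) => p.groups.flatMap((g) => g.tasks));
  return tasks
    .filter((t) => !completed.has(t.id) && t.dependsOn.every((d) => completed.has(d)))
    .map((t) => t.id);
}
```
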
### Real Provider Harness (`apps/server/test/integration/real-providers/harness.ts`)

- Creates a real database and a real agent manager backed by the real CLI tools
- Provides `describeRealClaude()` / `describeRealCodex()`, which skip their suites when the corresponding env var is not set
- `MINIMAL_PROMPTS` — cheap prompts for testing output parsing

## Test Inventory

See **[test-inventory.md](test-inventory.md)** for a complete catalog of every test, what it verifies, coverage gaps, redundancy map, and fragility assessment.

## Running Tests

```sh
# Unit + E2E tests (no API cost)
npm test

# Specific test file
npm test -- apps/server/agent/manager.test.ts

# Cassette tests — replay pre-recorded cassettes (no API cost)
npm test -- apps/server/test/cassette/

# Record new cassettes locally (requires real Claude CLI)
CW_CASSETTE_RECORD=1 npm test -- apps/server/test/integration/real-providers/claude-manager.test.ts

# Real provider tests (costs money!)
REAL_CLAUDE_TESTS=1 npm test -- apps/server/test/integration/real-providers/ --test-timeout=300000
```

---

## Cassette System

`apps/server/test/cassette/` — VCR-style recording and replay for the agent subprocess pipeline.

### Why it exists

The `MockAgentManager` used in E2E tests skips from "spawn called" directly to "agent:stopped emitted". It never exercises `ProcessManager`, `FileTailer`, `OutputHandler`, or `SignalManager`. Bugs in those layers (signal.json race conditions, JSONL parsing failures, crash detection) are invisible to E2E tests.

Real provider tests do exercise those layers, but they are slow, expensive, and can't run in CI without credentials.

Cassette tests bridge this gap: they run the **real** `MultiProviderAgentManager` pipeline but replace the live Claude/Codex subprocess with a replay worker that writes pre-recorded output.

### Coverage the cassette layer adds

- `FileTailer` — fs.watch + poll cycle, incremental JSONL reading
- `OutputHandler` — stream event parsing, signal detection, result capture
- `SignalManager` — signal.json read/write/timing
- `LifecycleController` — retry logic, missing signal recovery
- `ProcessManager` — subprocess PID tracking, poll-for-completion
- Prompt normalization drift detection — key mismatch = re-record = visible diff

### Key generation

Each cassette is identified by a SHA256 hash of four components:

| Component | What it captures |
|-----------|------------------|
| `normalizedPrompt` | Prompt with UUIDs, temp paths, timestamps, session numbers replaced with placeholders |
| `providerName` | e.g. `claude`, `codex` |
| `modelArgs` | Provider CLI args with the prompt value stripped (sorted for stability) |
| `worktreeHash` | SHA256 of all non-hidden files in the agent worktree at spawn time |

The `worktreeHash` is what detects content drift for execute-mode agents: if the worktree contents change, the key no longer matches, so the lookup misses and the cassette must be re-recorded.

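The key scheme above can be sketched as follows, using only Node's built-in `node:crypto`, `node:fs`, and `node:path` modules. Several details are assumptions, and the real `key.ts` may canonicalize differently:

- All names ending in `Sketch`, and the `FileEntry` type, are invented for this example.
- The null-byte separators are a choice made here.
- The `"empty"` sentinel is taken from the sample cassette.
- The truncation to 32 hex characters is chosen to match the `<32-char-hash>.json` file names.

```ts
import { createHash } from "node:crypto";
import { readdirSync, readFileSync } from "node:fs";
import { join, relative } from "node:path";

// Hypothetical mirror of the four key components in the table above.
interface CassetteKeySketch {
  normalizedPrompt: string;
  providerName: string;
  modelArgs: string[];
  worktreeHash: string;
}

interface FileEntry {
  path: string; // relative to the worktree root
  content: string | Buffer;
}

// Collect every non-hidden file under `root`, skipping dot-files and dot-directories such as .git.
function collectWorktreeFilesSketch(root: string, dir: string = root): FileEntry[] {
  const out: FileEntry[] = [];
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    if (entry.name.startsWith(".")) continue;
    const full = join(dir, entry.name);
    if (entry.isDirectory()) out.push(...collectWorktreeFilesSketch(root, full));
    else if (entry.isFile()) out.push({ path: relative(root, full), content: readFileSync(full) });
  }
  return out;
}

// Hash paths and contents in sorted path order, so directory iteration order doesn't matter.
function hashWorktreeEntriesSketch(entries: FileEntry[]): string {
  if (entries.length === 0) return "empty"; // assumed sentinel, as in the sample cassette
  const hash = createHash("sha256");
  const sorted = [...entries].sort((a, b) => (a.path < b.path ? -1 : a.path > b.path ? 1 : 0));
  for (const { path, content } of sorted) {
    hash.update(path).update("\0").update(content).update("\0");
  }
  return hash.digest("hex");
}

// Combine the four components into one stable identifier.
function buildCassetteKeyHashSketch(key: CassetteKeySketch): string {
  const canonical = JSON.stringify([
    key.normalizedPrompt,
    key.providerName,
    [...key.modelArgs].sort(), // sorted for stability, as the table describes
    key.worktreeHash,
  ]);
  return createHash("sha256").update(canonical).digest("hex").slice(0, 32);
}
```
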

**Normalization** (`apps/server/test/cassette/normalizer.ts`) strips dynamic content that varies between runs but doesn't affect agent behavior:

- UUIDs → `__UUID__`
- Workspace root path → `__WORKSPACE__`
- ISO 8601 timestamps → `__TIMESTAMP__`
- Unix epoch milliseconds → `__EPOCH__`
- Session numbers → `session__N__`

If a prompt *template* changes (e.g. someone edits `buildExecutePrompt()`), the normalized hash changes → cassette miss → test fails in CI → developer must re-record → the diff shows the new agent response in the PR. This makes prompt drift auditable.

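The following minimal sketch applies the substitutions listed above in the same order. Several details are assumptions, and `normalizer.ts` may use different patterns:

- The regular expressions are invented for this example.
- Session numbers are assumed to appear as `session<digits>`.
- The name `normalizePromptSketch` is hypothetical.

```ts
// Patterns are illustrative assumptions, not the ones in normalizer.ts.
const UUID_RE = /[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi;
const ISO_TIMESTAMP_RE = /\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})?/g;
const EPOCH_MS_RE = /\b\d{13}\b/g; // 13-digit millisecond timestamps (late 2001 through 2286)
const SESSION_RE = /session\d+/g; // assumes session numbers appear as e.g. "session3"

function normalizePromptSketch(prompt: string, workspaceRoot: string): string {
  // Replace the literal workspace root first: it may itself contain a UUID or digits
  // that the later patterns would otherwise rewrite piecemeal.
  const withoutRoot = workspaceRoot ? prompt.split(workspaceRoot).join("__WORKSPACE__") : prompt;
  return withoutRoot
    .replace(UUID_RE, "__UUID__")
    .replace(ISO_TIMESTAMP_RE, "__TIMESTAMP__")
    .replace(EPOCH_MS_RE, "__EPOCH__")
    .replace(SESSION_RE, "session__N__");
}
```
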
### Cassette file format

Cassettes live in `apps/server/test/cassettes/<32-char-hash>.json` and are committed to git.

```json
{
  "version": 1,
  "key": {
    "normalizedPrompt": "You are a Worker agent...",
    "providerName": "claude",
    "modelArgs": ["--dangerously-skip-permissions", "--verbose", "--output-format", "stream-json"],
    "worktreeHash": "empty"
  },
  "recording": {
    "jsonlLines": [
      "{\"type\":\"system\",\"session_id\":\"abc\"}",
      "{\"type\":\"result\",\"subtype\":\"success\",\"result\":\"ok\"}"
    ],
    "signalJson": { "status": "done", "message": "Task complete" },
    "exitCode": 0,
    "recordedAt": "2026-03-02T12:00:00.000Z"
  }
}
```

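The file format above maps naturally onto a few TypeScript interfaces and a directory-backed store. This is a sketch under assumptions. The field types are inferred from the sample cassette, and allowing `null` for a missing signal or exit code is a guess. `CassetteStoreSketch`, with `find`/`save` keyed by hash, is an illustrative stand-in for `CassetteStore`, not its real signature.

```ts
import { existsSync, mkdirSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Shapes inferred from the sample cassette; the real types.ts may differ.
interface CassetteKey {
  normalizedPrompt: string;
  providerName: string;
  modelArgs: string[];
  worktreeHash: string;
}

interface CassetteRecording {
  jsonlLines: string[];
  signalJson: Record<string, unknown> | null; // null for "no signal written" is an assumption
  exitCode: number | null;
  recordedAt: string; // ISO 8601
}

interface CassetteEntry {
  version: 1;
  key: CassetteKey;
  recording: CassetteRecording;
}

// One <hash>.json file per cassette in a single directory.
class CassetteStoreSketch {
  private readonly dir: string;

  constructor(dir: string) {
    this.dir = dir;
  }

  private pathFor(hash: string): string {
    return join(this.dir, `${hash}.json`);
  }

  find(hash: string): CassetteEntry | null {
    const file = this.pathFor(hash);
    return existsSync(file) ? (JSON.parse(readFileSync(file, "utf8")) as CassetteEntry) : null;
  }

  save(hash: string, entry: CassetteEntry): void {
    mkdirSync(this.dir, { recursive: true });
    // Pretty-printed with a trailing newline so committed cassettes diff cleanly in review.
    writeFileSync(this.pathFor(hash), JSON.stringify(entry, null, 2) + "\n");
  }
}

// Usage: round-trip one entry through a throwaway directory.
const store = new CassetteStoreSketch(mkdtempSync(join(tmpdir(), "cassettes-")));
const hash = "0123456789abcdef0123456789abcdef";
store.save(hash, {
  version: 1,
  key: { normalizedPrompt: "You are a Worker agent...", providerName: "claude", modelArgs: [], worktreeHash: "empty" },
  recording: {
    jsonlLines: ['{"type":"result","subtype":"success","result":"ok"}'],
    signalJson: { status: "done" },
    exitCode: 0,
    recordedAt: new Date().toISOString(),
  },
});
```
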
### How replay works

`CassetteProcessManager` (extends `ProcessManager`) overrides two methods:

1. **`spawnDetached()`** — on a cache hit, spawns `replay-worker.mjs` instead of the real CLI. The worker writes the recorded JSONL lines to stdout (which `spawnDetached` redirects to the output file via fd) and writes `signal.json` relative to its cwd. Everything above the subprocess (`FileTailer`, `OutputHandler`, the poll loop) runs unmodified.

2. **`pollForCompletion()`** — when recording (a cache miss in `auto` mode, or any run in `record` mode), wraps the `onComplete` callback to read the output file and `signal.json` after the process exits, then saves the cassette before handing off to `OutputHandler`.

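Both halves of the override can be sketched as plain functions. Everything here is hypothetical:

- The real `replay-worker.mjs` is plain JavaScript, and this document doesn't describe how it receives its cassette.
- `replayRecordingSketch` therefore takes the recording and a `signalRelPath` directly. It writes lines through an injectable callback (stdout by default) and returns the exit code instead of calling `process.exit`.
- The `onComplete(exitCode)` signature used by `wrapOnCompleteForRecording` is an assumption.
- Writing the JSONL stream before `signal.json` is also an assumption.

```ts
import { existsSync, mkdirSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

interface RecordingSketch {
  jsonlLines: string[];
  signalJson: Record<string, unknown> | null;
  exitCode: number | null;
}

// Replay side: what a replay worker does once it has a recording in hand.
function replayRecordingSketch(
  recording: RecordingSketch,
  cwd: string,
  signalRelPath: string, // hypothetical: the sub-path under cwd isn't specified in this doc
  writeLine: (line: string) => void = (line) => {
    process.stdout.write(line + "\n"); // in the pipeline, stdout is already redirected to the output file
  },
): number {
  for (const line of recording.jsonlLines) writeLine(line);
  if (recording.signalJson !== null) {
    const signalPath = join(cwd, signalRelPath);
    mkdirSync(dirname(signalPath), { recursive: true });
    writeFileSync(signalPath, JSON.stringify(recording.signalJson));
  }
  return recording.exitCode ?? 1; // treating a missing exit code as failure is an assumption
}

// Record side: save the cassette before normal completion handling (OutputHandler) runs.
function wrapOnCompleteForRecording(
  onComplete: (exitCode: number | null) => void,
  outputFile: string,
  signalFile: string,
  save: (recording: RecordingSketch) => void,
): (exitCode: number | null) => void {
  return (exitCode) => {
    const jsonlLines = readFileSync(outputFile, "utf8")
      .split("\n")
      .filter((line) => line.trim() !== "");
    const signalJson = existsSync(signalFile)
      ? (JSON.parse(readFileSync(signalFile, "utf8")) as Record<string, unknown>)
      : null;
    save({ jsonlLines, signalJson, exitCode });
    onComplete(exitCode);
  };
}

// Usage: replay into a throwaway directory, capturing the JSONL lines in memory.
const workDir = mkdtempSync(join(tmpdir(), "replay-"));
const captured: string[] = [];
const replayExit = replayRecordingSketch(
  { jsonlLines: ['{"type":"system"}', '{"type":"result","subtype":"success"}'], signalJson: { status: "done" }, exitCode: 0 },
  workDir,
  "signal.json",
  (line) => captured.push(line),
);
```
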

`MultiProviderAgentManager` accepts an optional `processManagerOverride` constructor parameter so `CassetteProcessManager` can be injected without changing production callers.

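The optional-override pattern amounts to constructor injection with a default. The sketch below is simplified, and the `...Sketch` classes and their single `spawnDetached(command)` method are hypothetical stand-ins. The real classes take more parameters and return richer process handles.

```ts
// Hypothetical stand-ins: the real classes take more parameters and return richer handles.
class ProcessManagerSketch {
  spawnDetached(command: string): string {
    return `real:${command}`;
  }
}

class CassetteProcessManagerSketch extends ProcessManagerSketch {
  private readonly cassetteHit: boolean;

  constructor(cassetteHit: boolean) {
    super();
    this.cassetteHit = cassetteHit;
  }

  spawnDetached(command: string): string {
    // On a hit, launch the replay worker; otherwise fall back to the real CLI path.
    return this.cassetteHit ? "replay:replay-worker.mjs" : super.spawnDetached(command);
  }
}

class AgentManagerSketch {
  private readonly processManager: ProcessManagerSketch;

  // Production callers omit the argument and get the default implementation.
  constructor(processManagerOverride?: ProcessManagerSketch) {
    this.processManager = processManagerOverride ?? new ProcessManagerSketch();
  }

  spawn(command: string): string {
    return this.processManager.spawnDetached(command);
  }
}
```

Because the parameter is optional, existing constructor calls compile and behave as before. Only the test harness passes a replacement.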
### Mode control

| Env var | Mode | Behaviour |
|---------|------|-----------|
| *(none)* | `replay` | Cassette must exist; throws if missing. Safe for CI. |
| `CW_CASSETTE_RECORD=1` | `auto` | Replays if cassette exists, runs real agent and records if missing. |
| `CW_CASSETTE_FORCE_RECORD=1` | `record` | Always runs real agent; overwrites existing cassette. Use when prompt changed intentionally. |

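The table can be read as two small decisions: which mode the environment selects, and what that mode does with a lookup result. The sketch below rests on two assumptions, neither of which the table specifies. It compares the variables against the literal string `"1"`, and it lets `CW_CASSETTE_FORCE_RECORD` win if both variables are set. The function names are hypothetical.

```ts
type CassetteMode = "replay" | "auto" | "record";

// Maps the environment to a mode; FORCE_RECORD taking precedence is an assumption.
function resolveCassetteModeSketch(env: Record<string, string | undefined>): CassetteMode {
  if (env.CW_CASSETTE_FORCE_RECORD === "1") return "record";
  if (env.CW_CASSETTE_RECORD === "1") return "auto";
  return "replay";
}

// What each mode does once the store lookup has run.
function decideSketch(mode: CassetteMode, cassetteExists: boolean): "replay" | "run-and-record" {
  if (mode === "record") return "run-and-record"; // always re-run and overwrite
  if (cassetteExists) return "replay";
  if (mode === "auto") return "run-and-record";
  throw new Error("Cassette missing in replay mode; re-record locally with CW_CASSETTE_RECORD=1");
}

// Usage: decideSketch(resolveCassetteModeSketch(process.env), store.find(hash) !== null)
```
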
### Writing cassette tests

```ts
import { describe, it, expect, beforeAll, afterAll } from 'vitest';
import { createCassetteHarness } from '../cassette/index.js';
import { MINIMAL_PROMPTS } from '../integration/real-providers/prompts.js';
import type { RealProviderHarness } from '../integration/real-providers/harness.js';

describe('agent pipeline (cassette)', () => {
  let harness: RealProviderHarness;

  beforeAll(async () => {
    harness = await createCassetteHarness({ provider: 'claude' });
  });

  afterAll(() => harness.cleanup());

  it('completes a task and emits agent:stopped', async () => {
    const agent = await harness.agentManager.spawn({
      taskId: null,
      prompt: MINIMAL_PROMPTS.done,
      mode: 'execute',
      provider: 'claude',
    });

    const result = await harness.waitForAgentCompletion(agent.id);
    expect(result?.success).toBe(true);

    const stopped = harness.getEventsByType('agent:stopped');
    expect(stopped).toHaveLength(1);
  });
});
```

`createCassetteHarness()` returns a `RealProviderHarness`, so tests written for real providers work unchanged.

### Cassette directory

```
apps/server/test/cassettes/
  <hash>.json   ← committed to git; one file per recorded scenario
  .gitkeep
```

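Cassettes are committed so CI can run without any AI API credentials. When a prompt changes, its key changes too, so re-recording locally with `CW_CASSETTE_RECORD=1` records the missing cassette. When the key is unchanged but the recording is stale (for example, the provider's output format changed), `CW_CASSETTE_RECORD=1` would just replay the existing file; use `CW_CASSETTE_FORCE_RECORD=1` to overwrite it. Commit the new or updated file either way.

### Files
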
| File | Purpose |
|------|---------|
| `types.ts` | `CassetteKey`, `CassetteRecording`, `CassetteEntry` interfaces |
| `normalizer.ts` | `normalizePrompt()`, `stripPromptFromArgs()` |
| `key.ts` | `hashWorktreeFiles()`, `buildCassetteKey()` |
| `store.ts` | `CassetteStore` — find/save cassette JSON files |
| `replay-worker.mjs` | Subprocess that replays a cassette (plain JS ESM, no build step) |
| `process-manager.ts` | `CassetteProcessManager` — overrides `spawnDetached` and `pollForCompletion` |
| `harness.ts` | `createCassetteHarness()` — factory returning `RealProviderHarness` |
| `index.ts` | Barrel exports |
| `cassette.test.ts` | Unit tests for normalizer, key generation, and store |