Audited all 44 test files one by one. Documents what each test verifies
and identifies 12 redundant test pairs, 13 prioritized coverage gaps, and
mock style inconsistencies, and includes a fragility assessment.
Implements cassette recording/replay to test the full agent execution
pipeline (ProcessManager → FileTailer → OutputHandler → SignalManager)
without real AI API calls.
Key components:
- `CassetteProcessManager`: extends ProcessManager, intercepts spawnDetached
to replay cassettes or record real runs on completion
- `replay-worker.mjs`: standalone node script that replays JSONL + signal.json
as a subprocess, exercising the complete file-based output pipeline
- `CassetteStore`: reads/writes cassette JSON files keyed by SHA256 hash
- `normalizer.ts`: strips dynamic content (UUIDs, temp paths, timestamps,
session numbers) from prompts for stable cassette keys
- `key.ts`: hashes normalized prompt + provider args + worktree file content
(worktree hash detects content drift for execute-mode agents)
- `createCassetteHarness()`: wraps RealProviderHarness with cassette support,
same interface so existing real-provider tests work unchanged
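A minimal sketch of how the normalizer and key derivation might fit together. The regexes and the names `normalizePrompt` and `cassetteKey` are hypothetical; the real `normalizer.ts` and `key.ts` may strip more (for example session numbers) and combine their inputs differently.

```typescript
import { createHash } from "node:crypto";

// Hypothetical patterns; the real normalizer may cover more cases.
const UUID_RE = /[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi;
const ISO_TS_RE = /\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?/g;
const TMP_PATH_RE = /\/tmp\/[^\s"']+/g;

// Replace run-specific values with fixed placeholders so that two runs of
// the same logical prompt normalize to the same string.
export function normalizePrompt(prompt: string): string {
  return prompt
    .replace(UUID_RE, "<UUID>")
    .replace(ISO_TS_RE, "<TIMESTAMP>")
    .replace(TMP_PATH_RE, "<TMP_PATH>");
}

// Hash the normalized prompt together with the provider args and a
// precomputed worktree content hash (assumed to be supplied by the caller).
export function cassetteKey(
  prompt: string,
  providerArgs: string[],
  worktreeHash: string,
): string {
  const payload = JSON.stringify({
    prompt: normalizePrompt(prompt),
    args: providerArgs,
    worktreeHash,
  });
  return createHash("sha256").update(payload).digest("hex");
}
```

Under these assumptions, prompts that differ only in UUIDs, temp paths, or timestamps map to the same key, while a changed worktree hash yields a different one.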
Mode control via env vars:
  (default)                   → replay: cassette must exist (safe for CI)
  CW_CASSETTE_RECORD=1        → auto: replay if exists, record if missing
  CW_CASSETTE_FORCE_RECORD=1  → record: always run real agent, overwrite cassette
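The mode selection above could be resolved roughly as follows. The function name `resolveCassetteMode` is hypothetical, and letting `CW_CASSETTE_FORCE_RECORD` win when both variables are set is an assumption, not something the description states.

```typescript
export type CassetteMode = "replay" | "auto" | "record";

// Takes the environment as a parameter so the logic is easy to test;
// callers would typically pass process.env.
export function resolveCassetteMode(
  env: Record<string, string | undefined>,
): CassetteMode {
  // Assumption: force-record takes precedence if both variables are set.
  if (env.CW_CASSETTE_FORCE_RECORD === "1") return "record";
  if (env.CW_CASSETTE_RECORD === "1") return "auto";
  // Default: replay only, so CI fails fast on a missing cassette.
  return "replay";
}
```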
MultiProviderAgentManager gains an optional `processManagerOverride` constructor
parameter for clean dependency injection without changing existing callers.
Cassette files live in src/test/cassettes/ and are intended to be committed
to git so CI runs without API access.
- Add withFakeTimers(fn) helper to TestHarness for scoped timer control
- Replace all vi.runAllTimersAsync() with harness.advanceTimers() in E2E
and harness tests (37 call sites across 5 files)
- Keep vi.useFakeTimers() per-test activation pattern (intentional)
- Add @vitest/coverage-v8 dep so `npm run test:coverage` actually works
- Add exclude patterns to vitest config (node_modules, dist, packages)
- Replace dynamic import('vitest') in advanceTimers with direct vi import
Nulls out agents.initiativeId before deleting the initiative row,
ensuring the delete succeeds even on databases where migration 0025
(which adds ON DELETE SET NULL to the FK) hasn't been applied.
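The ordering can be illustrated with a small in-memory sketch. The `Db` shape and the `deleteInitiative` name are hypothetical stand-ins for the real repository code; only the order of the two steps comes from the description above.

```typescript
interface AgentRow {
  id: string;
  initiativeId: string | null;
}

interface InitiativeRow {
  id: string;
}

// Hypothetical in-memory stand-in for the two tables involved.
interface Db {
  agents: AgentRow[];
  initiatives: InitiativeRow[];
}

export function deleteInitiative(db: Db, initiativeId: string): void {
  // Step 1: detach agents so no row still references the initiative.
  for (const agent of db.agents) {
    if (agent.initiativeId === initiativeId) agent.initiativeId = null;
  }
  // Step 2: delete the initiative. With no remaining references, this works
  // whether or not the FK was declared with ON DELETE SET NULL.
  db.initiatives = db.initiatives.filter((i) => i.id !== initiativeId);
}
```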
Dispatch manager now wraps task descriptions with buildExecutePrompt()
so agents receive the full execution protocol. Update test to match
prompt wrapping. Add worktree isolation note to workspace layout.
Drop redundant Specificity Test section (covered by examples and checklist),
remove Task Design Rules (implied by entire prompt), flatten frontmatter
docs, trim good example, tighten sizing/checkpoint/context sections.
Remove CODEBASE_VERIFICATION references, document new shared constants
(TEST_INTEGRITY, SESSION_STARTUP, PROGRESS_TRACKING), update mode prompt
descriptions with TDD protocol, Definition of Done checklists, and
mandatory test specifications.
Replace the weak 7-step execution protocol with an explicit red-green-refactor
cycle that requires agents to write failing tests before implementing. Move
anti-patterns and scope rules above deviation/git sections so critical
constraints get more attention. Add session startup verification, progress
tracking, and a mandatory definition-of-done checklist that must pass before
signaling completion. Remove dead CODEBASE_VERIFICATION import.
Detail: Replace vague "how to verify" requirement with mandatory test specification
(file path, scenarios, run command) for execute-category tasks. Update good-task
example to demonstrate the new format. Add Definition of Done checklist.
Plan: Add Testing Strategy section requiring tests within each implementation phase
instead of trailing test phases. Add Definition of Done checklist.
Anchor on ~150 lines changed as the sweet spot, based on SWE-bench Pro
data (107 lines across 4.1 files, 46% success for the best agents). The old
rules used file count as the primary proxy, which correlates poorly with
task difficulty compared to lines changed.
Add CONTEXT_MANAGEMENT shared block to plan and detail mode prompts so
architect agents also benefit from compaction awareness and parallel
execution hints. Update index.ts re-exports and agent docs.
- Add CONTEXT_MANAGEMENT constant: tells agents to keep working through
context compaction and parallelize reads
- Add "why" reasoning to each GIT_WORKFLOW rule so agents understand the
purpose, not just the rule
- Slim buildInterAgentCommunication: replace verbose bash code blocks with
a brief usage pattern paragraph, condense CLI docs to bullet list
Standalone agents (no initiative or 0 linked projects) run in a
workspace/ subdirectory, but signal.json lookups used the parent
directory. This caused all standalone agents to be marked "crashed"
despite successful completion.
Track the actual agent cwd at spawn time via ActiveAgent.agentCwd
and probe for the workspace/ subdirectory during reconciliation and
crash detection paths.
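A minimal sketch of the probing step, assuming signal.json sits directly in the agent's working directory. The helper name `resolveSignalPath` and the injectable `exists` parameter are hypothetical; the real lookup may use a different relative path.

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";

// Returns the first signal.json that exists, checking the workspace/
// subdirectory (used by standalone agents) before the agent directory itself.
export function resolveSignalPath(
  agentDir: string,
  exists: (path: string) => boolean = existsSync,
): string | null {
  const candidates = [
    join(agentDir, "workspace", "signal.json"),
    join(agentDir, "signal.json"),
  ];
  return candidates.find((candidate) => exists(candidate)) ?? null;
}
```

Returning `null` rather than throwing lets the caller decide whether a missing signal file really means the agent crashed.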
- Add deleteTask tRPC mutation (repo already had delete method)
- Add X button to TaskRow, hidden until hover, with confirmation dialog
- Shift+click bypasses confirmation for fast bulk deletion
- Invalidates listInitiativeTasks on success
- Document shift+click pattern in CLAUDE.md as standard for destructive actions
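The shift+click pattern boils down to one decision, sketched here without any UI framework. The name `shouldProceedWithDelete` is hypothetical; `confirm` stands in for whatever opens the confirmation dialog.

```typescript
// Shift+click skips the confirmation; a plain click asks first.
export function shouldProceedWithDelete(
  event: { shiftKey: boolean },
  confirm: () => boolean,
): boolean {
  // Short-circuit: confirm() is never invoked when shift is held.
  return event.shiftKey || confirm();
}
```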
- New onConversationAnswer subscription: listens for conversation:answered
events matching a specific conversation ID, yields the answer text
- cw ask now subscribes via SSE instead of polling getConversation
- Removed --poll-interval and --timeout flags from cw ask
- Updated prompt to reflect SSE-based cw ask (no polling options)
- INTER_AGENT_COMMUNICATION constant → buildInterAgentCommunication(agentId) function
- Manager injects actual agent ID into prompt after DB record creation
- Agent ID hardcoded in cw listen/ask commands — no manifest.json indirection
- cw listen now uses onPendingConversation SSE subscription instead of polling
- CLI trpc-client upgraded with splitLink for subscription support
- All CLI flags (--agent-id, --from, --timeout, --poll-interval) documented in prompt
- conversation:created/answered added to ALL_EVENT_TYPES
- Execution mode badge toggles between YOLO/REVIEW on click
- Branch badge opens inline editor (input + save/cancel)
- Branch editing locked once any task has left pending status
- Server-side guard rejects branch changes after work has started
- getInitiative returns branchLocked flag
- updateInitiativeConfig now accepts optional branch field
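The lock rule can be expressed as a single predicate shared by the `branchLocked` flag and the server-side guard. This is a sketch only: the `BranchTask` type, both function names, and the error message are hypothetical, and the only status value taken from the description is "pending".

```typescript
interface BranchTask {
  status: string;
}

// The branch is locked as soon as any task has left the "pending" status.
export function isBranchLocked(tasks: BranchTask[]): boolean {
  return tasks.some((task) => task.status !== "pending");
}

// Server-side guard: reject branch changes once work has started.
export function assertBranchEditable(tasks: BranchTask[]): void {
  if (isBranchLocked(tasks)) {
    throw new Error("Branch cannot be changed after work has started");
  }
}
```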
Body: height 100vh + overflow hidden instead of min-height 100vh,
so the browser never shows a scrollbar on html/body.
AppLayout: h-screen flex column with shrink-0 header and flex-1
min-h-0 overflow-auto main. Pages like initiatives scroll within
main; agents page uses h-full with internal panel scrollers.
Left agent list gets min-h-0 for proper overflow containment in grid.
Right output panel gets overflow-hidden so AgentOutputViewer stays
within the available grid cell height.
Replace full CoordinationServer with a lightweight mock that serves only
conversation tRPC procedures backed by an in-memory repository. Agents
now have real coding tasks (write spec, ask questions, create summary)
and the two-question flow proves the listen→answer→re-listen cycle works.
Two-session test: Agent A listens for questions and answers them, while
Agent B asks a question and captures the response. Also fixes the missing
conversationRepository passthrough in the tRPC adapter.
Three bugs fixed in auto-cleanup commit retry flow:
1. resumeForCommit now calls getDirtyWorktreePaths() to include specific
project subdirectory names in the prompt, so the agent knows which
dirs to cd into and commit (instead of running git from the non-repo
parent dir).
2. Removed finally block in tryAutoCleanup that reset the retry counter
after every call, making MAX_COMMIT_RETRIES ineffective. Counter is
now only cleaned up on success, max retries, or error.
3. resumeForCommit returns false early if no worktrees are actually
dirty, preventing unnecessary commit retries for clean agents.
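The corrected counter handling from fixes 2 and 3 might look roughly like this. It is a simplified, synchronous sketch: the value of `MAX_COMMIT_RETRIES`, the callback parameters, and the result labels are assumptions, and the real functions are presumably async.

```typescript
export const MAX_COMMIT_RETRIES = 2; // assumed value, for illustration only

const commitRetries = new Map<string, number>();

export type CleanupResult = "clean" | "retrying" | "gave_up";

export function tryAutoCleanup(
  agentId: string,
  getDirtyPaths: () => string[],
  resumeForCommit: (dirtyPaths: string[]) => void,
): CleanupResult {
  try {
    const dirty = getDirtyPaths();
    if (dirty.length === 0) {
      // Nothing to commit: treat as success and clear the counter.
      commitRetries.delete(agentId);
      return "clean";
    }
    const attempts = commitRetries.get(agentId) ?? 0;
    if (attempts >= MAX_COMMIT_RETRIES) {
      // Max retries reached: clear the counter and stop retrying.
      commitRetries.delete(agentId);
      return "gave_up";
    }
    commitRetries.set(agentId, attempts + 1);
    resumeForCommit(dirty); // prompt names the specific dirty directories
    return "retrying";
  } catch (err) {
    // Error: clear the counter, then propagate.
    commitRetries.delete(agentId);
    throw err;
  }
  // Deliberately no `finally`: resetting the counter on every call is what
  // made the retry limit ineffective.
}
```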
Enables parallel agents to communicate through a CLI-based conversation
mechanism coordinated via tRPC. Agents can ask peers questions and
receive answers, with target resolution by agent ID, task ID, or phase ID.
Preview deployments let reviewers spin up the app at a specific branch
in local Docker containers, accessible through a single Caddy reverse
proxy port. Docker is the source of truth — no database table needed.
New module: src/preview/ with config discovery (.cw-preview.yml →
compose → Dockerfile fallback), compose generation, Docker CLI wrapper,
health checking, and port allocation (9100-9200 range).
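Port allocation within the 9100-9200 range could be as simple as a first-free scan. This sketch assumes both bounds are inclusive and that the caller already knows which ports are in use (for example by asking Docker); the name `allocatePort` is hypothetical.

```typescript
export const PREVIEW_PORT_MIN = 9100;
export const PREVIEW_PORT_MAX = 9200; // assumed inclusive

// Returns the lowest port in the preview range that is not already in use.
export function allocatePort(usedPorts: Iterable<number>): number {
  const taken = new Set(usedPorts);
  for (let port = PREVIEW_PORT_MIN; port <= PREVIEW_PORT_MAX; port++) {
    if (!taken.has(port)) return port;
  }
  throw new Error(
    `No free preview port in ${PREVIEW_PORT_MIN}-${PREVIEW_PORT_MAX}`,
  );
}
```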
Setup tokens from `claude setup-token` can't query the usage API,
resulting in a useless "Usage API request failed" message. Now shows
the actual HTTP status and guides users to complete OAuth setup.
Also distinguishes warning state (yellow) from error state (red)
in the AccountCard UI.
DB log chunk insertion is now the sole trigger for agent:output events.
Eliminates triple emission (FileTailer, handleStreamEvent, output buffer)
in favor of: FileTailer.onRawContent → DB insert → EventBus emit.
- createLogChunkCallback emits agent:output after successful DB insert
- spawnInternal now wires onRawContent callback (fixes session 1 gap)
- Remove eventBus from FileTailer (no longer touches EventBus)
- Remove eventBus from ProcessManager constructor (dead parameter)
- Remove agent:output emission from handleStreamEvent text_delta
- Remove outputBuffers map and all buffer helpers from manager/handler
- Remove getOutputBuffer from AgentManager interface and implementations
- getAgentOutput tRPC: DB-only, no file fallback
- onAgentOutput subscription: no initial buffer yield, events only
- AgentOutputViewer: accumulates raw JSONL chunks, parses uniformly
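The single emission path (raw content, then DB insert, then event) can be sketched as below. `createLogChunkCallback` is named in the list above, but its signature here, the `{ agentId, data }` payload shape, and the synchronous insert are assumptions made to keep the example small; Node's `EventEmitter` stands in for the project's EventBus.

```typescript
import { EventEmitter } from "node:events";

// Builds the callback handed to the file tailer as onRawContent.
// The event is emitted only after the insert returns without throwing.
export function createLogChunkCallback(
  agentId: string,
  insertChunk: (agentId: string, content: string) => void,
  eventBus: EventEmitter,
): (content: string) => void {
  return (content: string) => {
    insertChunk(agentId, content); // a failed insert throws, so no event fires
    eventBus.emit("agent:output", { agentId, data: content });
  };
}
```

Because the emit sits behind the insert, every `agent:output` event a subscriber sees corresponds to a chunk that is already persisted.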
Allow users to specify a custom branch when creating initiatives
(auto-generated if left blank). Add updateProject tRPC procedure
and /settings/projects page with inline-editable defaultBranch.
Planning tasks (research, discuss, plan, detail, refine) have their own
architect flow and should never enter the dispatch pipeline or clutter
agent context. Three changes:
1. Phase auto-queue skips planning-category tasks
2. Safety net in getNextDispatchable() skips planning tasks
3. gatherInitiativeContext() filters to execution tasks only
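All three changes rest on the same category check, sketched here. The category names come from the description above; the `CategorizedTask` type and the function names are hypothetical.

```typescript
const PLANNING_CATEGORIES: ReadonlySet<string> = new Set([
  "research",
  "discuss",
  "plan",
  "detail",
  "refine",
]);

interface CategorizedTask {
  id: string;
  category: string;
}

export function isPlanningTask(task: CategorizedTask): boolean {
  return PLANNING_CATEGORIES.has(task.category);
}

// Usable by the auto-queue, the dispatch safety net, and context gathering.
export function executionTasksOnly(
  tasks: CategorizedTask[],
): CategorizedTask[] {
  return tasks.filter((task) => !isPlanningTask(task));
}
```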
The dismiss mutation invalidated only `listAgents`, but the hook reads
from `getActiveRefineAgent`, so the banner stayed visible after dismiss.
Added optimistic cache clearing and invalidation for `getActiveRefineAgent`.