Apply font-display to headings across settings layout and health page.
Replace text-destructive with text-status-error-fg for consistency with
the design system status tokens. Increase projects page section spacing
from space-y-4 to space-y-6.
- Add Plus Jakarta Sans as display font for headings
- Add subtle noise texture overlay + indigo radial gradient for depth
- New keyframe animations: glow-pulse, fade-in-up, scale-in, slide-in-right
- Card: interactive hover-lift + selected ring variants
- Button: scale micro-interactions, destructive glow, transition-all
- Header: logo upgrade with wordmark, animated nav indicator bar, glass search button, gradient shadow depth
- StatusDot: glow halos per status variant (active/success/error/warning/urgent)
- HealthDot: glow effects for connected/disconnected/reconnecting states
- Card hover-lift and status glow CSS utilities
Add subtle fade-in + y-offset animations on mount to all main pages
(initiatives list, initiative detail, agents, inbox), plus staggered
card animations for the initiative and agent lists.
Tasks are now grouped by dependency depth using the same
groupPhasesByDependencyLevel utility as phases. Parallel tasks are wrapped in
dashed containers, and sequential layers are connected by status-aware lines.
Replaces the flat TaskRow list and DependencyIndicator callout bars.
Phases are now grouped by dependency depth using groupPhasesByDependencyLevel.
Single-phase layers render as compact nodes; multi-phase layers are wrapped in
a dashed "PARALLEL" container. Connectors between layers turn green when all
prior layers are completed. Each layer gets a staggered entrance animation.
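For reference, a minimal sketch of the layering idea: each item sits one layer below its deepest dependency, and items sharing a layer can run in parallel. This is not the actual groupPhasesByDependencyLevel code; the DepNode shape and the groupByDependencyLevel name are assumptions made for illustration.

```typescript
// Hypothetical shape; the real phase/task records are not shown in this log.
interface DepNode {
  id: string;
  dependsOn: string[];
}

// Groups nodes into layers: layer 0 has no dependencies, and every other node
// sits one layer below its deepest dependency. Throws on a dependency cycle.
function groupByDependencyLevel(nodes: DepNode[]): DepNode[][] {
  const byId = new Map<string, DepNode>();
  for (const n of nodes) byId.set(n.id, n);
  const depth = new Map<string, number>();

  const depthOf = (id: string, stack: Set<string>): number => {
    const cached = depth.get(id);
    if (cached !== undefined) return cached;
    if (stack.has(id)) throw new Error(`Dependency cycle at ${id}`);
    stack.add(id);
    const node = byId.get(id);
    // Dangling references are ignored so they cannot break the layout.
    const deps = node ? node.dependsOn.filter((d) => byId.has(d)) : [];
    const d =
      deps.length === 0 ? 0 : 1 + Math.max(...deps.map((x) => depthOf(x, stack)));
    stack.delete(id);
    depth.set(id, d);
    return d;
  };

  const layers: DepNode[][] = [];
  for (const n of nodes) {
    const d = depthOf(n.id, new Set<string>());
    if (!layers[d]) layers[d] = [];
    layers[d].push(n);
  }
  return layers;
}
```

A layer with one entry would map to a compact node, and a layer with several entries to the dashed parallel container.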
Replace plain text dependency indicators with visual, status-aware components:
- New DependencyChip/PhaseNumberBadge components with status-colored styling
- Sidebar shows compact numbered circles for phase deps instead of text
- Detail panel uses bordered cards with phase badges and status indicators
- Task dependency callout bars with resolved/total counters
- Collapse mechanism for tasks with 3+ dependencies (+N more button)
- Full dark mode support via semantic status tokens
The auto-spawned agent on initiative creation was using discuss mode
(Q&A) when it should use refine mode (expand content). Now:
- Description seeds root page as tiptap content (split on double newlines)
- Spawns refine agent with the populated page in inputContext
- getActiveRefineAgent broadened to also surface discuss agents (covers
CLI-spawned discuss agents)
- RefineAgentPanel shows mode-appropriate label for discuss vs refine
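A minimal sketch of the description seeding step, assuming the standard Tiptap/ProseMirror JSON document shape (doc > paragraph > text). The descriptionToTiptapDoc name is hypothetical, and treating runs of two or more newlines as one separator is an assumption about how "double newlines" is handled.

```typescript
interface TextNode {
  type: "text";
  text: string;
}
interface ParagraphNode {
  type: "paragraph";
  content: TextNode[];
}
interface DocNode {
  type: "doc";
  content: ParagraphNode[];
}

// Splits a plain-text description on blank lines and wraps each chunk
// in a paragraph node. Empty chunks are dropped.
function descriptionToTiptapDoc(description: string): DocNode {
  const paragraphs = description
    .split(/\n{2,}/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  return {
    type: "doc",
    content: paragraphs.map(
      (text): ParagraphNode => ({
        type: "paragraph",
        content: [{ type: "text", text }],
      }),
    ),
  };
}
```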
The discuss agent spawned on initiative creation received only the
initiative in its inputContext, missing the task that carries the user's
description. The agent started without knowing what to discuss.
Auto-spawned discuss/plan/refine agents were invisible because:
1. listInitiatives only filtered for mode='detail' agents
2. deriveInitiativeActivity returned 'idle' for zero phases before
checking for active agents
Broadened agent filter to all architect modes (discuss, plan, detail,
refine), moved active agent check before zero-phases early return, and
added 'discussing'/'refining' activity states with pulsing indicators.
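The reordering can be sketched as follows. This is a simplified illustration, not the real deriveInitiativeActivity: the record shapes, the "planning" label, the "has_phases" placeholder, and the choice of running/waiting_for_input as active statuses are assumptions.

```typescript
// Hypothetical, simplified shapes; the real records carry more fields.
interface AgentInfo {
  mode: string;
  status: string;
}
interface PhaseInfo {
  status: string;
}

// Architect modes mapped to activity states. "planning" is an assumed label.
const ARCHITECT_ACTIVITY = new Map<string, string>([
  ["discuss", "discussing"],
  ["plan", "planning"],
  ["detail", "detailing"],
  ["refine", "refining"],
]);

// Assumed set of statuses that count as "active".
const ACTIVE_STATUSES = new Set<string>(["running", "waiting_for_input"]);

function deriveActivity(phases: PhaseInfo[], agents: AgentInfo[]): string {
  // Active architect agents are checked first, so an initiative with zero
  // phases but a working agent is no longer reported as idle.
  for (const agent of agents) {
    const state = ARCHITECT_ACTIVITY.get(agent.mode);
    if (state !== undefined && ACTIVE_STATUSES.has(agent.status)) return state;
  }
  if (phases.length === 0) return "idle";
  // Phase-based derivation is omitted; this return value is a placeholder.
  return "has_phases";
}
```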
Prompt changes in detail.ts invalidated the old cassette hashes.
Re-recorded all 4 cassettes with updated prompt content. Replay
verified passing in 12s.
Detail agents define task dependencies in YAML frontmatter but they were
silently dropped — never written to the task_dependencies table. This
caused all tasks to dispatch in parallel regardless of intended ordering,
and the frontend showed no dependency information.
- Add fileIdToDbId mapping and second-pass dependency creation in
output-handler.ts (mirrors existing phase dependency pattern)
- Add task_dependency to changeset entry entityType enum
- Add listPhaseTaskDependencies tRPC procedure for batch querying
- Wire blockedBy in PhaseDetailPanel and PhaseWithTasks from real data
- Clarify dependency semantics in detail prompt
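A sketch of the two-pass pattern described above, under stated assumptions: ParsedTask, DependencyRow, and the createTask callback are hypothetical stand-ins, and the real output-handler.ts writes to the task_dependencies table rather than returning rows.

```typescript
// Hypothetical shapes for illustration only.
interface ParsedTask {
  fileId: string; // id used in the agent's YAML frontmatter
  title: string;
  dependencies: string[]; // file ids of prerequisite tasks
}
interface DependencyRow {
  taskId: string;
  dependsOnTaskId: string;
}

function persistTasks(
  parsed: ParsedTask[],
  createTask: (title: string) => string, // returns the new database id
): DependencyRow[] {
  // Pass 1: create every task and remember file id -> database id.
  const fileIdToDbId = new Map<string, string>();
  for (const task of parsed) {
    fileIdToDbId.set(task.fileId, createTask(task.title));
  }

  // Pass 2: all ids are known now, so forward references resolve too.
  const rows: DependencyRow[] = [];
  for (const task of parsed) {
    const taskId = fileIdToDbId.get(task.fileId);
    if (taskId === undefined) continue;
    for (const dep of task.dependencies) {
      const dependsOnTaskId = fileIdToDbId.get(dep);
      // Skip unknown references and self-dependencies.
      if (dependsOnTaskId === undefined || dependsOnTaskId === taskId) continue;
      rows.push({ taskId, dependsOnTaskId });
    }
  }
  return rows;
}
```

The second pass is needed because a task may list a dependency that appears later in the agent's output.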
Execution agents were spawning blind — no input files, no knowledge of
what predecessor tasks accomplished. This adds three capabilities:
1. summary column on tasks table — completeTask() reads the finishing
agent's result.message and stores it on the task record
2. dispatchNext() gathers full initiative context (initiative, phase,
sibling tasks, pages) and passes it as inputContext so agents get
.cw/input/task.md, initiative.md, phase.md, and context directories
3. context/tasks/*.md files now include the summary field in frontmatter
so dependent agents can see what prior agents accomplished
When an agent asks a question via `cw ask` targeting an idle agent,
the conversation router now auto-resumes the idle agent's session so
it can answer. Previously, questions to idle agents sat unanswered
forever because target resolution only matched running agents.
Changes:
- Add `resumeForConversation()` to AgentManager interface and implement
on MultiProviderAgentManager (mirrors resumeForCommit pattern)
- Relax createConversation target resolution: prefer running, fall back
to idle (was running-only)
- Trigger auto-resume after conversation creation for idle targets
- Add concurrency lock (conversationResumeLocks Set) to prevent
double-resume race conditions
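A rough sketch of the target fallback and the resume lock, not the actual router code. AgentRef, pickTarget, and resumeOnce are hypothetical names, and the "stopped" status is an assumption; only conversationResumeLocks comes from the change itself.

```typescript
// Hypothetical agent reference; the real record has more fields.
interface AgentRef {
  id: string;
  status: "running" | "idle" | "stopped";
}

// Prefer a running agent; fall back to an idle one (previously running-only).
function pickTarget(candidates: AgentRef[]): AgentRef | undefined {
  return (
    candidates.find((a) => a.status === "running") ??
    candidates.find((a) => a.status === "idle")
  );
}

// Ids of agents with a resume currently in flight.
const conversationResumeLocks = new Set<string>();

// Runs `resume` unless one is already in flight for this agent.
// Returns false when the call was skipped because of the lock.
async function resumeOnce(
  agentId: string,
  resume: (id: string) => Promise<void>,
): Promise<boolean> {
  if (conversationResumeLocks.has(agentId)) return false;
  conversationResumeLocks.add(agentId);
  try {
    await resume(agentId);
    return true;
  } finally {
    // Always release, even if the resume throws.
    conversationResumeLocks.delete(agentId);
  }
}
```

The check and the add happen before the first await, so two conversations created back to back for the same idle agent cannot both start a resume.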
Planning modes (plan, refine) get a minimal block with just cw ask
syntax. Execution modes get the full protocol: commands table, shell
recipe for listener lifecycle, targeting guidance, when/when-not
decision criteria, good/bad examples, and answering guidelines.
listInitiativeTasks was filtering out detail tasks server-side, so the
detailAgentByPhase mapping could never resolve agent.taskId to a phaseId.
Move the filter to client-side (displayTasks) so detail tasks are available
for agent mapping but excluded from counts and display grouping.
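A sketch of the client-side split, under assumptions: the TaskInfo and AgentInfo shapes, the splitTasks name, and the use of a category field to mark detail tasks are all hypothetical; only detailAgentByPhase and displayTasks are names from the change.

```typescript
// Hypothetical shapes; how a detail task is flagged is an assumption.
interface TaskInfo {
  id: string;
  phaseId: string;
  category: string;
}
interface AgentInfo {
  id: string;
  mode: string;
  taskId: string | null;
}

function splitTasks(tasks: TaskInfo[], agents: AgentInfo[]) {
  // The full task list (detail tasks included) resolves agent.taskId -> phaseId.
  const phaseByTaskId = new Map<string, string>();
  for (const t of tasks) phaseByTaskId.set(t.id, t.phaseId);

  const detailAgentByPhase = new Map<string, AgentInfo>();
  for (const agent of agents) {
    if (agent.mode !== "detail" || agent.taskId === null) continue;
    const phaseId = phaseByTaskId.get(agent.taskId);
    if (phaseId !== undefined) detailAgentByPhase.set(phaseId, agent);
  }

  // Only the filtered list feeds counts and display grouping.
  const displayTasks = tasks.filter((t) => t.category !== "detail");
  return { detailAgentByPhase, displayTasks };
}
```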
Thread detail agent info through PipelineGraph → PipelineStageColumn →
PipelinePhaseGroup. Phase groups now show spinner + "Detailing…" when a
detail agent is active and "Review changes" when finished with no tasks.
Add 'detailing' activity state derived from active detail agents
(mode=detail, status running/waiting_for_input). Initiative cards show
pulsing "Detailing" indicator. Phase sidebar items show spinner during
active detailing and "Review changes" when the agent finishes.
listInitiatives now returns an activity object (state, activePhase, phase
counts) derived server-side from phases, eliminating per-card listPhases
queries. Initiative cards show a StatusDot with pulse animation + label
instead of a static StatusBadge. Removed redundant View and Spawn Architect
buttons from cards. Added variant override prop to StatusDot.
tRPC subscriptions use connecting/pending/error/idle — not success.
The old code mapped pending→isConnecting and waited for success (which
never fires), causing AgentOutputViewer to permanently show "Connecting...".
Now: connecting→isConnecting, pending→isConnected, idle→disconnected.
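The corrected mapping, sketched as a pure function. The toConnectionFlags name and the ConnectionFlags shape are hypothetical, and treating "error" as disconnected is an assumption; the status names are the ones listed above.

```typescript
type SubscriptionStatus = "idle" | "connecting" | "pending" | "error";

interface ConnectionFlags {
  isConnecting: boolean;
  isConnected: boolean;
}

// "pending" means the subscription is open and waiting for data, so it is
// the connected state. There is no "success" status to wait for.
function toConnectionFlags(status: SubscriptionStatus): ConnectionFlags {
  return {
    isConnecting: status === "connecting",
    isConnected: status === "pending",
  };
}
```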
Architect agents (discuss, plan, detail, refine) were producing generic
analysis disconnected from the actual codebase. They had full tool access
in their worktrees but were never instructed to explore the code.
- Add CODEBASE_EXPLORATION shared constant: read project docs, explore
structure, check existing patterns, use subagents for parallel exploration
- Inject into all 4 architect prompts after INPUT_FILES
- Strengthen discuss prompt: analysis method references codebase, examples
cite specific paths, definition_of_done requires codebase references
- Fix spawnArchitectDiscuss to pass full context (pages/phases/tasks) via
gatherInitiativeContext() — was only passing bare initiative metadata
- Update docs/agent.md with new tag ordering and shared block table
Move drizzle/, dist/, and coverage/ into apps/server/ so all
server-specific artifacts live alongside the source they belong to.
- git mv drizzle/ → apps/server/drizzle/
- drizzle.config.ts: out → ./apps/server/drizzle
- tsconfig.json: outDir → ./apps/server/dist, exclude drizzle dir
- package.json: main/bin/clean point to apps/server/dist/
- vitest.config.ts: reportsDirectory → ./apps/server/coverage
- .gitignore: add coverage/ entry
- ensure-schema.ts: update getMigrationsPath() for new layout
- docs/database-migrations.md: update drizzle/ references
Move src/ → apps/server/ and packages/web/ → apps/web/ to adopt
standard monorepo conventions (apps/ for runnable apps, packages/
for reusable libraries). Update all config files, shared package
imports, test fixtures, and documentation to reflect new paths.
Key fixes:
- Update workspace config to ["apps/*", "packages/*"]
- Update tsconfig.json rootDir/include for apps/server/
- Add apps/web/** to vitest exclude list
- Update drizzle.config.ts schema path
- Fix ensure-schema.ts migration path detection (3 levels up in dev,
2 levels up in dist)
- Fix tests/integration/cli-server.test.ts import paths
- Update packages/shared imports to apps/server/ paths
- Update all docs/ files with new paths
The cassette-backed test (full-flow-cassette.test.ts) covers the same
discuss→plan→detail→execute pipeline without API cost. The real-agent
test added no unique value once cassettes were committed, and the
Stage 6 npm-test validation it included was soft (warn, not fail).
Also removes the now-unused shouldRunFullFlowTests export and the
FULL_FLOW_TESTS=1 entry from CLAUDE.md.
Three issues discovered and fixed after initial recording:
1. Agent workdir names not normalized — random animal names (e.g.
"available-sheep") embedded in workspace paths caused key drift.
Added AGENT_WORKDIR_RE to replace agent-workdirs/<name> with
agent-workdirs/__AGENT__ in normalizer.ts.
2. Phase/task files missing on replay — plan/detail agents write output
to .cw/output/ (phases/, tasks/) which the server reads on completion.
The replay worker only emits JSONL; it doesn't re-execute file writes.
Extended cassette format with outputFiles field and added capture
(walkOutputDir) + restore (restoreOutputFiles) logic to process-manager.
3. Recording timeout too short — fixed CASSETTE_FLOW_TIMEOUT to be
mode-aware: 60 min for recording runs, 5 min for replay.
Also commit the 4 recorded cassettes (discuss/plan/detail/execute)
that make the full-flow cassette test runnable in CI without API costs.
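A sketch of the workdir normalization from item 1. The exact pattern in normalizer.ts is not shown here, so this regex (one path segment after agent-workdirs/) and the normalizeWorkdirs name are assumptions.

```typescript
// Assumed pattern: a single path segment following "agent-workdirs/".
const AGENT_WORKDIR_RE = /agent-workdirs\/[^\/\s"']+/g;

// Replaces the random per-agent directory name with a stable placeholder
// so the same prompt hashes to the same cassette key across runs.
function normalizeWorkdirs(prompt: string): string {
  return prompt.replace(AGENT_WORKDIR_RE, "agent-workdirs/__AGENT__");
}
```

Running the function on already-normalized text leaves it unchanged, so it is safe to apply more than once.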
14 files in docs/wireframes/v2/ addressing 13 UX gaps from v1:
- Theme spec with indigo brand, status tokens, terminal/diff tokens,
dark mode, Geist typography, 6px radius, layered shadows
- Wireframes for all pages with loading/error/empty states
- Shared component specs (SaveIndicator, EmptyState, ErrorState,
CommandPalette, ThemeToggle)
- normalizer.ts: Add NANOID_RE (21-char alphanumeric) → __ID__ as step 2.5,
fixing cassette key instability from nanoid agent IDs in prompts
- harness.ts: Add FullFlowHarnessOptions.processManagerFactory for injecting
CassetteProcessManager without duplicating harness setup
- full-flow-cassette.test.ts: New cassette-backed variant of full-flow test;
skips automatically when no cassettes exist (fresh clone), runs in seconds
once cassettes are recorded and committed
- CLAUDE.md: Document cassette recording command for the full-flow test
- driveToCompletion() now catches inner waitForAgentAttention timeouts
instead of letting them propagate — long-running execute/detail agents
(>3 min without transitioning to waiting_for_input) no longer crash the
polling loop; the outer deadline handles termination correctly
- Switch execute stage from waitForAgentCompletion to driveToCompletion
so any clarifying questions get auto-answered
- Increase DETAIL_TIMEOUT_MS 8→15 min, PLAN_TIMEOUT_MS 8→12 min,
EXECUTE_TIMEOUT_MS 10→20 min. Agent run times vary widely in
practice; these are upper bounds, not expectations
- Raise FULL_FLOW_TIMEOUT 30→60 min to cover worst-case stacking
- Update CLAUDE.md test command with correct --test-timeout=3600000
Verified: full pipeline (discuss→plan→detail→execute) passes in ~499s
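A simplified sketch of the polling loop after this change, not the harness code itself. The options object, the timeoutError/isTimeout helpers, and detecting timeouts by error name are assumptions; answering the agent's questions is elided.

```typescript
// Hypothetical helpers for tagging and detecting timeout errors.
function timeoutError(message: string): Error {
  const err = new Error(message);
  err.name = "TimeoutError";
  return err;
}
function isTimeout(err: unknown): boolean {
  return err instanceof Error && err.name === "TimeoutError";
}

interface DriveOptions {
  isDone: () => Promise<boolean>;
  // Rejects with a timeout error if the agent asks nothing within timeoutMs.
  waitForAttention: (timeoutMs: number) => Promise<void>;
  deadlineMs: number;
  innerTimeoutMs: number;
}

async function driveToCompletion(opts: DriveOptions): Promise<void> {
  const deadline = Date.now() + opts.deadlineMs;
  while (!(await opts.isDone())) {
    // Only the outer deadline terminates the loop.
    if (Date.now() >= deadline) throw timeoutError("outer deadline exceeded");
    try {
      await opts.waitForAttention(opts.innerTimeoutMs);
      // The real helper would answer the agent's questions here.
    } catch (err) {
      // An inner timeout just means the agent is still working; poll again.
      if (!isTimeout(err)) throw err;
    }
  }
}
```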
Replace ## Heading sections with descriptive XML tags (<role>, <task>,
<execution_protocol>, <examples>, etc.) for unambiguous first-order vs
second-order delimiter separation per Anthropic best practices.
- shared.ts: All constants wrapped in their XML tag
- Mode prompts: Consistent tag vocabulary and ordering across all 5 modes
- Examples use <examples> > <example label="good/bad"> nesting
- workspace.ts: Output wrapped in <workspace> tags
- Delete dead src/agent/prompts.ts (zero imports)
- Update docs/agent.md with XML tag documentation
Adds a complete multi-agent workflow test gated behind FULL_FLOW_TESTS=1:
- src/test/fixtures/todo-api/ — minimal JS project with missing complete()
method and failing tests; gives execute agents a concrete, verifiable task
- src/test/integration/full-flow/harness.ts — FullFlowHarness wiring all 11
repos + real MultiProviderAgentManager + tRPC caller + driveToCompletion()
helper for Q&A loops
- src/test/integration/full-flow/report.ts — stage-by-stage console formatters
(discuss/plan/detail/execute/git diff/final summary)
- src/test/integration/full-flow/full-flow.test.ts — staged integration test
that validates breakdown granularity, agent output quality, and that npm test
passes in the project worktree after execution
Run with:
FULL_FLOW_TESTS=1 npm test -- src/test/integration/full-flow/ --test-timeout=1800000
Audited all 44 test files one by one. Documents what each test verifies and
identifies 12 redundant test pairs, 13 prioritized coverage gaps, and mock
style inconsistencies, with a fragility assessment.
Implements cassette recording/replay to test the full agent execution
pipeline (ProcessManager → FileTailer → OutputHandler → SignalManager)
without real AI API calls.
Key components:
- `CassetteProcessManager`: extends ProcessManager, intercepts spawnDetached
to replay cassettes or record real runs on completion
- `replay-worker.mjs`: standalone node script that replays JSONL + signal.json
as a subprocess, exercising the complete file-based output pipeline
- `CassetteStore`: reads/writes cassette JSON files keyed by SHA256 hash
- `normalizer.ts`: strips dynamic content (UUIDs, temp paths, timestamps,
session numbers) from prompts for stable cassette keys
- `key.ts`: hashes normalized prompt + provider args + worktree file content
(worktree hash detects content drift for execute-mode agents)
- `createCassetteHarness()`: wraps RealProviderHarness with cassette support,
same interface so existing real-provider tests work unchanged
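A sketch of how such a key could be computed with Node's built-in crypto module. The KeyInput shape, the cassetteKey name, and the JSON payload layout are assumptions; key.ts may combine its inputs differently.

```typescript
import { createHash } from "node:crypto";

// Hypothetical input shape for illustration.
interface KeyInput {
  normalizedPrompt: string; // prompt after the normalizer has run
  providerArgs: string[];
  worktreeHash?: string; // optional hash of worktree file content
}

// Serializes the inputs in a fixed field order and returns the hex SHA-256.
function cassetteKey(input: KeyInput): string {
  const payload = JSON.stringify({
    prompt: input.normalizedPrompt,
    args: input.providerArgs,
    worktree: input.worktreeHash ?? "",
  });
  return createHash("sha256").update(payload).digest("hex");
}
```

Any dynamic value that survives normalization changes the key, which is why the normalizer runs before hashing.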
Mode control via env vars:
  (default)                   → replay: cassette must exist (safe for CI)
  CW_CASSETTE_RECORD=1        → auto:   replay if exists, record if missing
  CW_CASSETTE_FORCE_RECORD=1  → record: always run real agent, overwrite cassette
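The mode table above can be sketched as a small resolver. The resolveCassetteMode name is hypothetical, and letting CW_CASSETTE_FORCE_RECORD win when both variables are set is an assumption.

```typescript
type CassetteMode = "replay" | "auto" | "record";

// Takes the environment as a parameter so it can be tested without
// touching process.env.
function resolveCassetteMode(
  env: Record<string, string | undefined>,
): CassetteMode {
  if (env.CW_CASSETTE_FORCE_RECORD === "1") return "record";
  if (env.CW_CASSETTE_RECORD === "1") return "auto";
  // Default: replay only, so CI never calls a real provider.
  return "replay";
}
```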
MultiProviderAgentManager gains an optional `processManagerOverride` constructor
parameter for clean dependency injection without changing existing callers.
Cassette files live in src/test/cassettes/ and are intended to be committed
to git so CI runs without API access.
- Add withFakeTimers(fn) helper to TestHarness for scoped timer control
- Replace all vi.runAllTimersAsync() with harness.advanceTimers() in E2E
and harness tests (37 call sites across 5 files)
- Keep vi.useFakeTimers() per-test activation pattern (intentional)
- Add @vitest/coverage-v8 dep so `npm run test:coverage` actually works
- Add exclude patterns to vitest config (node_modules, dist, packages)
- Replace dynamic import('vitest') in advanceTimers with direct vi import
Nulls out agents.initiativeId before deleting the initiative row,
ensuring the delete succeeds even on databases where migration 0025
(which adds ON DELETE SET NULL to the FK) hasn't been applied.