Codewalkers

Author	SHA1	Message	Date
Lukas May	c8f370583a	feat: Add codebase exploration to architect agent prompts Architect agents (discuss, plan, detail, refine) were producing generic analysis disconnected from the actual codebase. They had full tool access in their worktrees but were never instructed to explore the code. - Add CODEBASE_EXPLORATION shared constant: read project docs, explore structure, check existing patterns, use subagents for parallel exploration - Inject into all 4 architect prompts after INPUT_FILES - Strengthen discuss prompt: analysis method references codebase, examples cite specific paths, definition_of_done requires codebase references - Fix spawnArchitectDiscuss to pass full context (pages/phases/tasks) via gatherInitiativeContext() — was only passing bare initiative metadata - Update docs/agent.md with new tag ordering and shared block table	2026-03-03 12:45:14 +01:00
Lukas May	1043079a08	feat: Persist agents page filter in URL query params, default to questions	2026-03-03 12:42:32 +01:00
Lukas May	2f2ad6eb95	feat: Add remove account button to health page UI	2026-03-03 12:08:48 +01:00
Lukas May	86c6ad8be1	chore: Switch dev.sh to side-by-side split layout (server | web)	2026-03-03 12:04:23 +01:00
Lukas May	2eada071a1	fix: Use npx tsx so tsx resolves from local node_modules	2026-03-03 12:03:05 +01:00
Lukas May	0fad4a42b9	chore: Move dev.sh into workdir/ with correct working directory	2026-03-03 12:02:21 +01:00
Lukas May	8e77503941	chore: Add dev.sh tmux script to start server and frontend together	2026-03-03 11:59:34 +01:00
Lukas May	b11cae998c	refactor: Co-locate server artifacts under apps/server/ Move drizzle/, dist/, and coverage/ into apps/server/ so all server-specific artifacts live alongside the source they belong to. - git mv drizzle/ → apps/server/drizzle/ - drizzle.config.ts: out → ./apps/server/drizzle - tsconfig.json: outDir → ./apps/server/dist, exclude drizzle dir - package.json: main/bin/clean point to apps/server/dist/ - vitest.config.ts: reportsDirectory → ./apps/server/coverage - .gitignore: add coverage/ entry - ensure-schema.ts: update getMigrationsPath() for new layout - docs/database-migrations.md: update drizzle/ references	2026-03-03 11:55:12 +01:00
Lukas May	04c212da92	feat: Implement v2 design system with indigo brand, dark mode, and status tokens Complete frontend design overhaul replacing achromatic shadcn/ui defaults with an indigo-branded (#6366F1), status-aware, dark-mode-enabled token system. Phase 1 — Theme Foundation: - Replace all CSS tokens in index.css with v2 light/dark mode values - Add 24 status tokens (6 statuses × 4 variants), 22 terminal tokens, 7 diff tokens, 5 shadow tokens, 9 transition/animation tokens, 10 z-index tokens, 10-step extended indigo scale - Install Geist Sans/Mono variable fonts (public/fonts/) - Extend tailwind.config.ts with all new token utilities - Add dark mode flash-prevention script in index.html - Add status-pulse and shimmer keyframe animations - Add global focus-visible styles and reduced-motion media query Phase 2 — ThemeProvider + Toggle: - ThemeProvider context with system preference listener - 3-state ThemeToggle (Sun/Monitor/Moon) - Radix tooltip primitive for tooltips - localStorage persistence with 'cw-theme' key Phase 3 — Shared Components + Token Migration: - StatusDot: mapEntityStatus() maps raw statuses to 6 semantic variants - StatusBadge: uses status token bg/fg/border classes - Badge: 6 new status variants + xs size - EmptyState, ErrorState, SaveIndicator shared patterns - CommandPalette: Cmd+K search with fuzzy matching, keyboard nav - Skeleton with shimmer animation + SkeletonCard composite layouts - KeyboardShortcutHint, NavBadge, enhanced Sonner config - Migrate ALL hardcoded Tailwind colors to token classes across AgentOutputViewer, review/*, ProgressBar, AccountCard, InitiativeHeader, DependencyIndicator, PipelineTaskCard, PreviewPanel, ChangeSetBanner, MessageCard, PhaseDetailPanel Phase 4 — App Layout Overhaul: - Single 48px row header with CW logo, nav with NavBadge counts, Cmd+K search button, ThemeToggle, HealthDot - Remove max-w-7xl from header/main; pages control own widths - ConnectionBanner for offline/reconnecting states - BrowserTitleUpdater with running/questions counts - useGlobalKeyboard (1-4 nav, Cmd+K), useConnectionStatus hooks - Per-page width wrappers (initiatives max-w-6xl, settings max-w-4xl) Phase 5 — Page-Level Token Migration: - ReviewSidebar: all hardcoded green/orange/red → status/diff tokens - CommentThread: resolved state → status-success tokens - Settings health: green → status-success-dot	2026-03-03 11:43:09 +01:00
Lukas May	34578d39c6	refactor: Restructure monorepo to apps/server/ and apps/web/ layout Move src/ → apps/server/ and packages/web/ → apps/web/ to adopt standard monorepo conventions (apps/ for runnable apps, packages/ for reusable libraries). Update all config files, shared package imports, test fixtures, and documentation to reflect new paths. Key fixes: - Update workspace config to ["apps/", "packages/"] - Update tsconfig.json rootDir/include for apps/server/ - Add apps/web/** to vitest exclude list - Update drizzle.config.ts schema path - Fix ensure-schema.ts migration path detection (3 levels up in dev, 2 levels up in dist) - Fix tests/integration/cli-server.test.ts import paths - Update packages/shared imports to apps/server/ paths - Update all docs/ files with new paths	2026-03-03 11:22:53 +01:00
Lukas May	8c38d958ce	refactor: Remove full-flow.test.ts in favour of cassette variant The cassette-backed test (full-flow-cassette.test.ts) covers the same discuss→plan→detail→execute pipeline without API cost. The real-agent test added no unique value once cassettes were committed, and the Stage 6 npm-test validation it included was soft (warn, not fail). Also removes the now-unused shouldRunFullFlowTests export and the FULL_FLOW_TESTS=1 entry from CLAUDE.md.	2026-03-03 10:53:41 +01:00
Lukas May	25360e1711	fix: Stabilize full-flow cassette keys and restore output files on replay Three issues discovered and fixed after initial recording: 1. Agent workdir names not normalized — random animal names (e.g. "available-sheep") embedded in workspace paths caused key drift. Added AGENT_WORKDIR_RE to replace agent-workdirs/<name> with agent-workdirs/__AGENT__ in normalizer.ts. 2. Phase/task files missing on replay — plan/detail agents write output to .cw/output/ (phases/, tasks/) which the server reads on completion. The replay worker only emits JSONL; it doesn't re-execute file writes. Extended cassette format with outputFiles field and added capture (walkOutputDir) + restore (restoreOutputFiles) logic to process-manager. 3. Recording timeout too short — fixed CASSETTE_FLOW_TIMEOUT to be mode-aware: 60 min for recording runs, 5 min for replay. Also commit the 4 recorded cassettes (discuss/plan/detail/execute) that make the full-flow cassette test runnable in CI without API costs.	2026-03-03 10:35:13 +01:00
Lukas May	1e374abcd6	docs: Design review pass on all v2 wireframes 13 files reviewed with mission-control design lens. Key additions: - theme: extended indigo scale, 4-level surface hierarchy, 22 terminal tokens, transition/z-index/focus-visible token categories - All screens: keyboard shortcuts, loading/error/empty states hardened - 5 new shared components: StatusDot, SkeletonLoader, Toast, Badge, KeyboardShortcutHint - settings: expanded from 2 to 5 sub-pages (accounts, workspace, danger zone) - review-tab: 3-pane layout, inline comments, file nav, hunk controls - execution-tab: zoom, partial failure state, stale agent detection - dialogs: 2 bugs found (mutation locking, error placement) Total: 4,039 → 9,302 lines (+130% from review pass)	2026-03-02 19:36:26 +09:00
Lukas May	478a7f18e9	docs: Add v2 wireframes and theme specification 14 files in docs/wireframes/v2/ addressing 13 UX gaps from v1: - Theme spec with indigo brand, status tokens, terminal/diff tokens, dark mode, Geist typography, 6px radius, layered shadows - Wireframes for all pages with loading/error/empty states - Shared component specs (SaveIndicator, EmptyState, ErrorState, CommandPalette, ThemeToggle)	2026-03-02 18:13:17 +09:00
Lukas May	41b1d0e986	feat: Add cassette support for full-flow integration test - normalizer.ts: Add NANOID_RE (21-char alphanumeric) → __ID__ as step 2.5, fixing cassette key instability from nanoid agent IDs in prompts - harness.ts: Add FullFlowHarnessOptions.processManagerFactory for injecting CassetteProcessManager without duplicating harness setup - full-flow-cassette.test.ts: New cassette-backed variant of full-flow test; skips automatically when no cassettes exist (fresh clone), runs in ~seconds once cassettes are recorded and committed - CLAUDE.md: Document cassette recording command for the full-flow test	2026-03-02 17:42:43 +09:00
Lukas May	89db580ca4	docs: Add ASCII wireframe mockups for all frontend pages Covers: app layout, initiatives list, initiative detail (4 tabs), agents page, inbox, settings (health + projects), and all dialogs.	2026-03-02 17:28:14 +09:00
Lukas May	988160b2b7	fix: Patch full-flow test timeouts and driveToCompletion polling loop - driveToCompletion() now catches inner waitForAgentAttention timeouts instead of letting them propagate — long-running execute/detail agents (>3 min without transitioning to waiting_for_input) no longer crash the polling loop; the outer deadline handles termination correctly - Switch execute stage from waitForAgentCompletion to driveToCompletion so any clarifying questions get auto-answered - Increase DETAIL_TIMEOUT_MS 8→15 min, PLAN_TIMEOUT_MS 8→12 min, EXECUTE_TIMEOUT_MS 10→20 min — architect agents are variable in practice; these are upper bounds not expectations - Raise FULL_FLOW_TIMEOUT 30→60 min to cover worst-case stacking - Update CLAUDE.md test command with correct --test-timeout=3600000 Verified: full pipeline (discuss→plan→detail→execute) passes in ~499s	2026-03-02 17:15:12 +09:00
Lukas May	76aca71705	refactor: Restructure agent prompts with XML tags Replace ## Heading sections with descriptive XML tags (<role>, <task>, <execution_protocol>, <examples>, etc.) for unambiguous first-order vs second-order delimiter separation per Anthropic best practices. - shared.ts: All constants wrapped in their XML tag - Mode prompts: Consistent tag vocabulary and ordering across all 5 modes - Examples use <examples> > <example label="good/bad"> nesting - workspace.ts: Output wrapped in <workspace> tags - Delete dead src/agent/prompts.ts (zero imports) - Update docs/agent.md with XML tag documentation	2026-03-02 14:15:28 +09:00
Lukas May	55eb6a494b	test: Add full-flow integration test (discuss→plan→detail→execute) Adds a complete multi-agent workflow test gated behind FULL_FLOW_TESTS=1: - src/test/fixtures/todo-api/ — minimal JS project with missing complete() method and failing tests; gives execute agents a concrete, verifiable task - src/test/integration/full-flow/harness.ts — FullFlowHarness wiring all 11 repos + real MultiProviderAgentManager + tRPC caller + driveToCompletion() helper for Q&A loops - src/test/integration/full-flow/report.ts — stage-by-stage console formatters (discuss/plan/detail/execute/git diff/final summary) - src/test/integration/full-flow/full-flow.test.ts — staged integration test that validates breakdown granularity, agent output quality, and that npm test passes in the project worktree after execution Run with: FULL_FLOW_TESTS=1 npm test -- src/test/integration/full-flow/ --test-timeout=1800000	2026-03-02 13:28:23 +09:00
Lukas May	1540039c52	test: Remove redundant and dead tests (-743 lines) Delete 3 files: - completion-detection.test.ts (private method tests, covered by crash-race-condition) - completion-race-condition.test.ts (covered by mutex-completion + crash-race-condition) - real-e2e-crash.test.ts (dead: expect(true).toBe(true), hardcoded paths) Remove individual tests: - crash-race-condition.test.ts #4 (weaker duplicate of #2) - mock-manager.test.ts duplicate "(second test)" for detail_complete - process-manager.test.ts 2 "logs comprehensive" tests with empty assertions - edge-cases.test.ts 2 Q&A tests redundant with recovery-scenarios Update test-inventory.md to reflect removals.	2026-03-02 12:57:27 +09:00
Lukas May	a2ab4c4a84	docs: Add comprehensive test inventory with coverage gaps and redundancy map Audited all 44 test files one by one. Documents what each test verifies, identifies 12 redundant test pairs, 13 coverage gaps (prioritized), fragility assessment, and mock style inconsistencies.	2026-03-02 12:23:39 +09:00
Lukas May	e9ec5143fd	docs: Document cassette testing system in docs/testing.md and CLAUDE.md	2026-03-02 12:22:46 +09:00
Lukas May	ec031211a2	fix: Resolve advanceTimers return type mismatch (Promise<VitestUtils> → Promise<void>)	2026-03-02 12:19:47 +09:00
Lukas May	0ed657b644	feat: Add VCR-style cassette testing system for agent subprocess pipeline Implements cassette recording/replay to test the full agent execution pipeline (ProcessManager → FileTailer → OutputHandler → SignalManager) without real AI API calls. Key components: - `CassetteProcessManager`: extends ProcessManager, intercepts spawnDetached to replay cassettes or record real runs on completion - `replay-worker.mjs`: standalone node script that replays JSONL + signal.json as a subprocess, exercising the complete file-based output pipeline - `CassetteStore`: reads/writes cassette JSON files keyed by SHA256 hash - `normalizer.ts`: strips dynamic content (UUIDs, temp paths, timestamps, session numbers) from prompts for stable cassette keys - `key.ts`: hashes normalized prompt + provider args + worktree file content (worktree hash detects content drift for execute-mode agents) - `createCassetteHarness()`: wraps RealProviderHarness with cassette support, same interface so existing real-provider tests work unchanged Mode control via env vars: (default) → replay: cassette must exist (safe for CI) CW_CASSETTE_RECORD=1 → auto: replay if exists, record if missing CW_CASSETTE_FORCE_RECORD=1 → record: always run real agent, overwrite cassette MultiProviderAgentManager gains an optional `processManagerOverride` constructor parameter for clean dependency injection without changing existing callers. Cassette files live in src/test/cassettes/ and are intended to be committed to git so CI runs without API access.	2026-03-02 12:17:52 +09:00
Lukas May	a1366efe4d	refactor: Standardize fake timer usage across E2E tests - Add withFakeTimers(fn) helper to TestHarness for scoped timer control - Replace all vi.runAllTimersAsync() with harness.advanceTimers() in E2E and harness tests (37 call sites across 5 files) - Keep vi.useFakeTimers() per-test activation pattern (intentional)	2026-03-02 12:08:24 +09:00
Lukas May	dcb855ede1	fix: Repair test harness coverage, excludes, and timer overhead - Add @vitest/coverage-v8 dep so `npm run test:coverage` actually works - Add exclude patterns to vitest config (node_modules, dist, packages) - Replace dynamic import('vitest') in advanceTimers with direct vi import	2026-03-02 12:01:16 +09:00
Lukas May	863117c63a	fix: Detach agents before initiative deletion to prevent FK constraint failure Nulls out agents.initiativeId before deleting the initiative row, ensuring the delete succeeds even on databases where migration 0025 (which adds ON DELETE SET NULL to the FK) hasn't been applied.	2026-02-18 18:35:06 +09:00
Lukas May	6fa025251e	feat: Wire up initiative deletion end-to-end Add deleteInitiative tRPC procedure, wire Delete button in InitiativeCard with confirm dialog (Shift+click bypass), remove unused onDelete prop chain. Fix agents table FK constraints (initiative_id, account_id missing ON DELETE SET NULL) via table recreation migration. Register conversations migration in journal. Expand cascade delete tests to cover pages, projects, change sets, agents (set null), and conversations (set null).	2026-02-18 17:54:53 +09:00
Lukas May	80aa3e42fb	Fix StatusBadge crash when status is undefined	2026-02-18 17:44:38 +09:00
Lukas May	8bece70a61	fix: Wire archive button to updateInitiative mutation The Archive menu item in InitiativeCard had no onClick handler. Added mutation call with confirmation dialog (shift+click to skip).	2026-02-18 17:44:01 +09:00
Lukas May	e52b9d3332	Remove unused Edit and Duplicate menu items from initiative card	2026-02-18 17:43:21 +09:00
Lukas May	1331fb737d	refactor: Wire buildExecutePrompt into dispatch manager Dispatch manager now wraps task descriptions with buildExecutePrompt() so agents receive the full execution protocol. Update test to match prompt wrapping. Add worktree isolation note to workspace layout.	2026-02-18 17:40:03 +09:00
Lukas May	b63a8b605c	refactor: Compress refine prompt for conciseness (439→243 words, -45%) - Tighten items 1-3 arrow notation, compress item 4 to Better/Best progressive comparison, shorten item 5 scenario example - Cut 3 redundant Rules bullets (already stated in Output Files and guard paragraphs) - Collapse 5 DoD checks to 2 non-redundant verification items - Compress behavioral guard paragraphs	2026-02-18 17:30:57 +09:00
Lukas May	a4d48262c1	refactor: Compress detail prompt for conciseness (775→473 words, -39%) Drop redundant Specificity Test section (covered by examples and checklist), remove Task Design Rules (implied by entire prompt), flatten frontmatter docs, trim good example, tighten sizing/checkpoint/context sections.	2026-02-18 17:30:56 +09:00
Lukas May	c9769b09b7	refactor: Compress plan prompt for conciseness Cut ~35% of words while preserving all high-value content: - Merged Testing Strategy into Phase Design (rule + example) - Eliminated Rules section (redundant with Phase Design, Dependencies) - Compressed Dependency Graph intro (examples speak for themselves) - Trimmed File Ownership and Specificity prose - Reduced Existing Context from 4 to 2 bullets - Tightened Definition of Done checklist	2026-02-18 17:30:09 +09:00
Lukas May	a4502ebf77	refactor: Compress discuss prompt for conciseness (~30% word reduction) Cut redundant rules already demonstrated by good/bad examples, removed default-Claude-behavior instructions, collapsed verbose sections into single directives.	2026-02-18 17:30:07 +09:00
Lukas May	e73e99cb28	refactor: Compress shared agent prompts for conciseness (1060→699 words, -34%) Apply aggressive compression: imperative style, remove anti-laziness emphasis, cut rationale where obvious, eliminate redundant explanations. All constant names and function signatures preserved.	2026-02-18 17:30:04 +09:00
Lukas May	67f98f4f35	refactor: Compress execute prompt for conciseness (~47% word reduction) - Cut 5 anti-patterns: placeholder code, blind imports, ignoring test failures (all default Claude behavior), plus self-validating tests and test mutation (both already covered by TEST_INTEGRITY in shared.ts) - Compressed execution protocol steps to imperative essentials - Merged scope rules from 4 bullets to 3 - Trimmed definition of done checklist (removed redundant 5th item) - Removed anti-laziness language (IMPORTANT, MUST, aggressive emphasis)	2026-02-18 17:30:00 +09:00
Lukas May	44d2a3ff08	docs: Update agent.md to reflect prompt overhaul Remove CODEBASE_VERIFICATION references, document new shared constants (TEST_INTEGRITY, SESSION_STARTUP, PROGRESS_TRACKING), update mode prompt descriptions with TDD protocol, Definition of Done checklists, and mandatory test specifications.	2026-02-18 17:21:57 +09:00
Lukas May	9ed7e9ad16	refactor: Rewrite execute prompt with TDD protocol, test integrity rules, and definition-of-done checklist Replace the weak 7-step execution protocol with an explicit red-green-refactor cycle that requires agents to write failing tests before implementing. Move anti-patterns and scope rules above deviation/git sections so critical constraints get more attention. Add session startup verification, progress tracking, and a mandatory definition-of-done checklist that must pass before signaling completion. Remove dead CODEBASE_VERIFICATION import.	2026-02-18 17:20:11 +09:00
Lukas May	b5509232f6	refactor: Add testability focus and definition-of-done checklists to discuss/refine prompts Discuss prompt: add Testability & Verification question category, require verification criteria for behavioral decisions, add definition-of-done checklist. Refine prompt: strengthen unverifiable-requirements check to demand testable acceptance criteria with inputs/outputs, extend missing-edge-cases to frame as testable scenarios, add definition-of-done checklist.	2026-02-18 17:19:53 +09:00
Lukas May	09a388b490	refactor: Enforce mandatory test specs in detail prompt, add testing strategy to plan prompt Detail: Replace vague "how to verify" requirement with mandatory test specification (file path, scenarios, run command) for execute-category tasks. Update good-task example to demonstrate the new format. Add Definition of Done checklist. Plan: Add Testing Strategy section requiring tests within each implementation phase instead of trailing test phases. Add Definition of Done checklist.	2026-02-18 17:19:48 +09:00
Lukas May	298c570bc4	refactor: Overhaul shared prompt constants — remove CODEBASE_VERIFICATION, trim GIT_WORKFLOW/CONTEXT_MANAGEMENT, add TEST_INTEGRITY/SESSION_STARTUP/PROGRESS_TRACKING	2026-02-18 17:18:53 +09:00
Lukas May	c04e6d7778	refactor: Replace file-count task sizing with lines-changed heuristic Anchor on ~150 lines changed as the sweet spot based on SWE-bench Pro data (107 lines / 4.1 files = 46% success for best agents). Old rules used file count as the primary proxy which correlates poorly with task difficulty compared to lines changed.	2026-02-18 16:54:10 +09:00
Lukas May	7354582d69	refactor: Add context management to plan/detail prompts, update docs Add CONTEXT_MANAGEMENT shared block to plan and detail mode prompts so architect agents also benefit from compaction awareness and parallel execution hints. Update index.ts re-exports and agent docs.	2026-02-18 16:43:19 +09:00
Lukas May	4ef9db1501	refactor: Improve shared agent prompts — add context management, explain git rules, slim inter-agent comms - Add CONTEXT_MANAGEMENT constant: tells agents to keep working through context compaction and parallelize reads - Add "why" reasoning to each GIT_WORKFLOW rule so agents understand the purpose, not just the rule - Slim buildInterAgentCommunication: replace verbose bash code blocks with a brief usage pattern paragraph, condense CLI docs to bullet list	2026-02-18 16:41:55 +09:00
Lukas May	459c09b687	refactor: Overhaul execute prompt with test-first protocol, context management, anti-hardcoding - Add CONTEXT_MANAGEMENT import and inject into template - Rewrite execution protocol: test-first (step 3), parallel file reads, execution-over-deliberation - Add "why" rationale to scope rules (conflict prevention, overwrite risk) - Add hard-coded solutions anti-pattern, soften imperative tone - Rename section from "Anti-Patterns (never do these)" to "Anti-Patterns"	2026-02-18 16:41:53 +09:00
Lukas May	58514fef3f	docs: Document standalone agent path resolution in completion detection	2026-02-10 16:01:25 +01:00
Lukas May	2aa807a394	fix: Resolve signal.json path mismatch for standalone agents Standalone agents (no initiative or 0 linked projects) run in a workspace/ subdirectory, but signal.json lookups used the parent directory. This caused all standalone agents to be marked "crashed" despite successful completion. Track the actual agent cwd at spawn time via ActiveAgent.agentCwd and probe for the workspace/ subdirectory during reconciliation and crash detection paths.	2026-02-10 16:00:37 +01:00
Lukas May	62a542116d	feat: Add task deletion with shift+click auto-confirm - Add deleteTask tRPC mutation (repo already had delete method) - Add X button to TaskRow, hidden until hover, with confirmation dialog - Shift+click bypasses confirmation for fast bulk deletion - Invalidates listInitiativeTasks on success - Document shift+click pattern in CLAUDE.md as standard for destructive actions	2026-02-10 15:58:24 +01:00
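
Commit 0ed657b644 above describes three cassette modes selected through environment variables: replay by default, auto with CW_CASSETTE_RECORD=1, and record with CW_CASSETTE_FORCE_RECORD=1. The mapping can be sketched roughly as follows. This is an illustration only, not the repository's code: the names `CassetteMode`, `Env`, and `resolveCassetteMode` are hypothetical, and the precedence when both variables are set is an assumption, since the commit message does not state it.

```typescript
// Hypothetical sketch of the mode selection described in commit 0ed657b644.
// None of these identifiers are taken from the repository.
type CassetteMode = "replay" | "auto" | "record";

// A plain string map stands in for the process environment so the
// function stays pure and easy to test.
type Env = Record<string, string | undefined>;

function resolveCassetteMode(env: Env): CassetteMode {
  // Assumption: the force flag wins if both variables are set.
  if (env.CW_CASSETTE_FORCE_RECORD === "1") {
    return "record"; // always run the real agent and overwrite the cassette
  }
  if (env.CW_CASSETTE_RECORD === "1") {
    return "auto"; // replay if a cassette exists, record if it is missing
  }
  return "replay"; // default: the cassette must exist, which is safe for CI
}
```

In a Node process the function could be called as `resolveCassetteMode(process.env)`. Taking the environment as an argument keeps the sketch independent of any global state.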

443 Commits