Codewalkers

Author	SHA1	Message	Date
Lukas May	478a7f18e9	docs: Add v2 wireframes and theme specification 14 files in docs/wireframes/v2/ addressing 13 UX gaps from v1: - Theme spec with indigo brand, status tokens, terminal/diff tokens, dark mode, Geist typography, 6px radius, layered shadows - Wireframes for all pages with loading/error/empty states - Shared component specs (SaveIndicator, EmptyState, ErrorState, CommandPalette, ThemeToggle)	2026-03-02 18:13:17 +09:00
Lukas May	41b1d0e986	feat: Add cassette support for full-flow integration test - normalizer.ts: Add NANOID_RE (21-char alphanumeric) → __ID__ as step 2.5, fixing cassette key instability from nanoid agent IDs in prompts - harness.ts: Add FullFlowHarnessOptions.processManagerFactory for injecting CassetteProcessManager without duplicating harness setup - full-flow-cassette.test.ts: New cassette-backed variant of full-flow test; skips automatically when no cassettes exist (fresh clone), runs in ~seconds once cassettes are recorded and committed - CLAUDE.md: Document cassette recording command for the full-flow test	2026-03-02 17:42:43 +09:00
Lukas May	89db580ca4	docs: Add ASCII wireframe mockups for all frontend pages Covers: app layout, initiatives list, initiative detail (4 tabs), agents page, inbox, settings (health + projects), and all dialogs.	2026-03-02 17:28:14 +09:00
Lukas May	988160b2b7	fix: Patch full-flow test timeouts and driveToCompletion polling loop - driveToCompletion() now catches inner waitForAgentAttention timeouts instead of letting them propagate — long-running execute/detail agents (>3 min without transitioning to waiting_for_input) no longer crash the polling loop; the outer deadline handles termination correctly - Switch execute stage from waitForAgentCompletion to driveToCompletion so any clarifying questions get auto-answered - Increase DETAIL_TIMEOUT_MS 8→15 min, PLAN_TIMEOUT_MS 8→12 min, EXECUTE_TIMEOUT_MS 10→20 min — architect agents are variable in practice; these are upper bounds not expectations - Raise FULL_FLOW_TIMEOUT 30→60 min to cover worst-case stacking - Update CLAUDE.md test command with correct --test-timeout=3600000 Verified: full pipeline (discuss→plan→detail→execute) passes in ~499s	2026-03-02 17:15:12 +09:00
Lukas May	76aca71705	refactor: Restructure agent prompts with XML tags Replace ## Heading sections with descriptive XML tags (<role>, <task>, <execution_protocol>, <examples>, etc.) for unambiguous first-order vs second-order delimiter separation per Anthropic best practices. - shared.ts: All constants wrapped in their XML tag - Mode prompts: Consistent tag vocabulary and ordering across all 5 modes - Examples use <examples> > <example label="good/bad"> nesting - workspace.ts: Output wrapped in <workspace> tags - Delete dead src/agent/prompts.ts (zero imports) - Update docs/agent.md with XML tag documentation	2026-03-02 14:15:28 +09:00
Lukas May	55eb6a494b	test: Add full-flow integration test (discuss→plan→detail→execute) Adds a complete multi-agent workflow test gated behind FULL_FLOW_TESTS=1: - src/test/fixtures/todo-api/ — minimal JS project with missing complete() method and failing tests; gives execute agents a concrete, verifiable task - src/test/integration/full-flow/harness.ts — FullFlowHarness wiring all 11 repos + real MultiProviderAgentManager + tRPC caller + driveToCompletion() helper for Q&A loops - src/test/integration/full-flow/report.ts — stage-by-stage console formatters (discuss/plan/detail/execute/git diff/final summary) - src/test/integration/full-flow/full-flow.test.ts — staged integration test that validates breakdown granularity, agent output quality, and that npm test passes in the project worktree after execution Run with: FULL_FLOW_TESTS=1 npm test -- src/test/integration/full-flow/ --test-timeout=1800000	2026-03-02 13:28:23 +09:00
Lukas May	1540039c52	test: Remove redundant and dead tests (-743 lines) Delete 3 files: - completion-detection.test.ts (private method tests, covered by crash-race-condition) - completion-race-condition.test.ts (covered by mutex-completion + crash-race-condition) - real-e2e-crash.test.ts (dead: expect(true).toBe(true), hardcoded paths) Remove individual tests: - crash-race-condition.test.ts #4 (weaker duplicate of #2) - mock-manager.test.ts duplicate "(second test)" for detail_complete - process-manager.test.ts 2 "logs comprehensive" tests with empty assertions - edge-cases.test.ts 2 Q&A tests redundant with recovery-scenarios Update test-inventory.md to reflect removals.	2026-03-02 12:57:27 +09:00
Lukas May	a2ab4c4a84	docs: Add comprehensive test inventory with coverage gaps and redundancy map Audited all 44 test files one by one. Documents what each test verifies, identifies 12 redundant test pairs, 13 coverage gaps (prioritized), fragility assessment, and mock style inconsistencies.	2026-03-02 12:23:39 +09:00
Lukas May	e9ec5143fd	docs: Document cassette testing system in docs/testing.md and CLAUDE.md	2026-03-02 12:22:46 +09:00
Lukas May	ec031211a2	fix: Resolve advanceTimers return type mismatch (Promise<VitestUtils> → Promise<void>)	2026-03-02 12:19:47 +09:00
Lukas May	0ed657b644	feat: Add VCR-style cassette testing system for agent subprocess pipeline Implements cassette recording/replay to test the full agent execution pipeline (ProcessManager → FileTailer → OutputHandler → SignalManager) without real AI API calls. Key components: - `CassetteProcessManager`: extends ProcessManager, intercepts spawnDetached to replay cassettes or record real runs on completion - `replay-worker.mjs`: standalone node script that replays JSONL + signal.json as a subprocess, exercising the complete file-based output pipeline - `CassetteStore`: reads/writes cassette JSON files keyed by SHA256 hash - `normalizer.ts`: strips dynamic content (UUIDs, temp paths, timestamps, session numbers) from prompts for stable cassette keys - `key.ts`: hashes normalized prompt + provider args + worktree file content (worktree hash detects content drift for execute-mode agents) - `createCassetteHarness()`: wraps RealProviderHarness with cassette support, same interface so existing real-provider tests work unchanged Mode control via env vars: (default) → replay: cassette must exist (safe for CI) CW_CASSETTE_RECORD=1 → auto: replay if exists, record if missing CW_CASSETTE_FORCE_RECORD=1 → record: always run real agent, overwrite cassette MultiProviderAgentManager gains an optional `processManagerOverride` constructor parameter for clean dependency injection without changing existing callers. Cassette files live in src/test/cassettes/ and are intended to be committed to git so CI runs without API access.	2026-03-02 12:17:52 +09:00
Lukas May	a1366efe4d	refactor: Standardize fake timer usage across E2E tests - Add withFakeTimers(fn) helper to TestHarness for scoped timer control - Replace all vi.runAllTimersAsync() with harness.advanceTimers() in E2E and harness tests (37 call sites across 5 files) - Keep vi.useFakeTimers() per-test activation pattern (intentional)	2026-03-02 12:08:24 +09:00
Lukas May	dcb855ede1	fix: Repair test harness coverage, excludes, and timer overhead - Add @vitest/coverage-v8 dep so `npm run test:coverage` actually works - Add exclude patterns to vitest config (node_modules, dist, packages) - Replace dynamic import('vitest') in advanceTimers with direct vi import	2026-03-02 12:01:16 +09:00
Lukas May	863117c63a	fix: Detach agents before initiative deletion to prevent FK constraint failure Nulls out agents.initiativeId before deleting the initiative row, ensuring the delete succeeds even on databases where migration 0025 (which adds ON DELETE SET NULL to the FK) hasn't been applied.	2026-02-18 18:35:06 +09:00
Lukas May	6fa025251e	feat: Wire up initiative deletion end-to-end Add deleteInitiative tRPC procedure, wire Delete button in InitiativeCard with confirm dialog (Shift+click bypass), remove unused onDelete prop chain. Fix agents table FK constraints (initiative_id, account_id missing ON DELETE SET NULL) via table recreation migration. Register conversations migration in journal. Expand cascade delete tests to cover pages, projects, change sets, agents (set null), and conversations (set null).	2026-02-18 17:54:53 +09:00
Lukas May	80aa3e42fb	Fix StatusBadge crash when status is undefined	2026-02-18 17:44:38 +09:00
Lukas May	8bece70a61	fix: Wire archive button to updateInitiative mutation The Archive menu item in InitiativeCard had no onClick handler. Added mutation call with confirmation dialog (shift+click to skip).	2026-02-18 17:44:01 +09:00
Lukas May	e52b9d3332	Remove unused Edit and Duplicate menu items from initiative card	2026-02-18 17:43:21 +09:00
Lukas May	1331fb737d	refactor: Wire buildExecutePrompt into dispatch manager Dispatch manager now wraps task descriptions with buildExecutePrompt() so agents receive the full execution protocol. Update test to match prompt wrapping. Add worktree isolation note to workspace layout.	2026-02-18 17:40:03 +09:00
Lukas May	b63a8b605c	refactor: Compress refine prompt for conciseness (439→243 words, -45%) - Tighten items 1-3 arrow notation, compress item 4 to Better/Best progressive comparison, shorten item 5 scenario example - Cut 3 redundant Rules bullets (already stated in Output Files and guard paragraphs) - Collapse 5 DoD checks to 2 non-redundant verification items - Compress behavioral guard paragraphs	2026-02-18 17:30:57 +09:00
Lukas May	a4d48262c1	refactor: Compress detail prompt for conciseness (775→473 words, -39%) Drop redundant Specificity Test section (covered by examples and checklist), remove Task Design Rules (implied by entire prompt), flatten frontmatter docs, trim good example, tighten sizing/checkpoint/context sections.	2026-02-18 17:30:56 +09:00
Lukas May	c9769b09b7	refactor: Compress plan prompt for conciseness Cut ~35% of words while preserving all high-value content: - Merged Testing Strategy into Phase Design (rule + example) - Eliminated Rules section (redundant with Phase Design, Dependencies) - Compressed Dependency Graph intro (examples speak for themselves) - Trimmed File Ownership and Specificity prose - Reduced Existing Context from 4 to 2 bullets - Tightened Definition of Done checklist	2026-02-18 17:30:09 +09:00
Lukas May	a4502ebf77	refactor: Compress discuss prompt for conciseness (~30% word reduction) Cut redundant rules already demonstrated by good/bad examples, removed default-Claude-behavior instructions, collapsed verbose sections into single directives.	2026-02-18 17:30:07 +09:00
Lukas May	e73e99cb28	refactor: Compress shared agent prompts for conciseness (1060→699 words, -34%) Apply aggressive compression: imperative style, remove anti-laziness emphasis, cut rationale where obvious, eliminate redundant explanations. All constant names and function signatures preserved.	2026-02-18 17:30:04 +09:00
Lukas May	67f98f4f35	refactor: Compress execute prompt for conciseness (~47% word reduction) - Cut 5 anti-patterns: placeholder code, blind imports, ignoring test failures (all default Claude behavior), plus self-validating tests and test mutation (both already covered by TEST_INTEGRITY in shared.ts) - Compressed execution protocol steps to imperative essentials - Merged scope rules from 4 bullets to 3 - Trimmed definition of done checklist (removed redundant 5th item) - Removed anti-laziness language (IMPORTANT, MUST, aggressive emphasis)	2026-02-18 17:30:00 +09:00
Lukas May	44d2a3ff08	docs: Update agent.md to reflect prompt overhaul Remove CODEBASE_VERIFICATION references, document new shared constants (TEST_INTEGRITY, SESSION_STARTUP, PROGRESS_TRACKING), update mode prompt descriptions with TDD protocol, Definition of Done checklists, and mandatory test specifications.	2026-02-18 17:21:57 +09:00
Lukas May	9ed7e9ad16	refactor: Rewrite execute prompt with TDD protocol, test integrity rules, and definition-of-done checklist Replace the weak 7-step execution protocol with an explicit red-green-refactor cycle that requires agents to write failing tests before implementing. Move anti-patterns and scope rules above deviation/git sections so critical constraints get more attention. Add session startup verification, progress tracking, and a mandatory definition-of-done checklist that must pass before signaling completion. Remove dead CODEBASE_VERIFICATION import.	2026-02-18 17:20:11 +09:00
Lukas May	b5509232f6	refactor: Add testability focus and definition-of-done checklists to discuss/refine prompts Discuss prompt: add Testability & Verification question category, require verification criteria for behavioral decisions, add definition-of-done checklist. Refine prompt: strengthen unverifiable-requirements check to demand testable acceptance criteria with inputs/outputs, extend missing-edge-cases to frame as testable scenarios, add definition-of-done checklist.	2026-02-18 17:19:53 +09:00
Lukas May	09a388b490	refactor: Enforce mandatory test specs in detail prompt, add testing strategy to plan prompt Detail: Replace vague "how to verify" requirement with mandatory test specification (file path, scenarios, run command) for execute-category tasks. Update good-task example to demonstrate the new format. Add Definition of Done checklist. Plan: Add Testing Strategy section requiring tests within each implementation phase instead of trailing test phases. Add Definition of Done checklist.	2026-02-18 17:19:48 +09:00
Lukas May	298c570bc4	refactor: Overhaul shared prompt constants — remove CODEBASE_VERIFICATION, trim GIT_WORKFLOW/CONTEXT_MANAGEMENT, add TEST_INTEGRITY/SESSION_STARTUP/PROGRESS_TRACKING	2026-02-18 17:18:53 +09:00
Lukas May	c04e6d7778	refactor: Replace file-count task sizing with lines-changed heuristic Anchor on ~150 lines changed as the sweet spot based on SWE-bench Pro data (107 lines / 4.1 files = 46% success for best agents). Old rules used file count as the primary proxy which correlates poorly with task difficulty compared to lines changed.	2026-02-18 16:54:10 +09:00
Lukas May	7354582d69	refactor: Add context management to plan/detail prompts, update docs Add CONTEXT_MANAGEMENT shared block to plan and detail mode prompts so architect agents also benefit from compaction awareness and parallel execution hints. Update index.ts re-exports and agent docs.	2026-02-18 16:43:19 +09:00
Lukas May	4ef9db1501	refactor: Improve shared agent prompts — add context management, explain git rules, slim inter-agent comms - Add CONTEXT_MANAGEMENT constant: tells agents to keep working through context compaction and parallelize reads - Add "why" reasoning to each GIT_WORKFLOW rule so agents understand the purpose, not just the rule - Slim buildInterAgentCommunication: replace verbose bash code blocks with a brief usage pattern paragraph, condense CLI docs to bullet list	2026-02-18 16:41:55 +09:00
Lukas May	459c09b687	refactor: Overhaul execute prompt with test-first protocol, context management, anti-hardcoding - Add CONTEXT_MANAGEMENT import and inject into template - Rewrite execution protocol: test-first (step 3), parallel file reads, execution-over-deliberation - Add "why" rationale to scope rules (conflict prevention, overwrite risk) - Add hard-coded solutions anti-pattern, soften imperative tone - Rename section from "Anti-Patterns (never do these)" to "Anti-Patterns"	2026-02-18 16:41:53 +09:00
Lukas May	58514fef3f	docs: Document standalone agent path resolution in completion detection	2026-02-10 16:01:25 +01:00
Lukas May	2aa807a394	fix: Resolve signal.json path mismatch for standalone agents Standalone agents (no initiative or 0 linked projects) run in a workspace/ subdirectory, but signal.json lookups used the parent directory. This caused all standalone agents to be marked "crashed" despite successful completion. Track the actual agent cwd at spawn time via ActiveAgent.agentCwd and probe for the workspace/ subdirectory during reconciliation and crash detection paths.	2026-02-10 16:00:37 +01:00
Lukas May	62a542116d	feat: Add task deletion with shift+click auto-confirm - Add deleteTask tRPC mutation (repo already had delete method) - Add X button to TaskRow, hidden until hover, with confirmation dialog - Shift+click bypasses confirmation for fast bulk deletion - Invalidates listInitiativeTasks on success - Document shift+click pattern in CLAUDE.md as standard for destructive actions	2026-02-10 15:58:24 +01:00
Lukas May	bfefbc85af	feat: Switch cw ask from polling to SSE via onConversationAnswer subscription - New onConversationAnswer subscription: listens for conversation:answered events matching a specific conversation ID, yields the answer text - cw ask now subscribes via SSE instead of polling getConversation - Removed --poll-interval and --timeout flags from cw ask - Updated prompt to reflect SSE-based cw ask (no polling options)	2026-02-10 15:56:54 +01:00
Lukas May	bfc1b422f9	feat: Inject agent ID into prompts, SSE-based cw listen, all flags documented - INTER_AGENT_COMMUNICATION constant → buildInterAgentCommunication(agentId) function - Manager injects actual agent ID into prompt after DB record creation - Agent ID hardcoded in cw listen/ask commands — no manifest.json indirection - cw listen now uses onPendingConversation SSE subscription instead of polling - CLI trpc-client upgraded with splitLink for subscription support - All CLI flags (--agent-id, --from, --timeout, --poll-interval) documented in prompt - conversation:created/answered added to ALL_EVENT_TYPES	2026-02-10 15:53:01 +01:00
Lukas May	c2d665c24f	feat: Make initiative branch and execution mode editable from header - Execution mode badge toggles between YOLO/REVIEW on click - Branch badge opens inline editor (input + save/cancel) - Branch editing locked once any task has left pending status - Server-side guard rejects branch changes after work has started - getInitiative returns branchLocked flag - updateInitiativeConfig now accepts optional branch field	2026-02-10 15:52:40 +01:00
Lukas May	3ff1f485f1	fix: Prevent agents page from scrolling — lock layout to viewport Body: height 100vh + overflow hidden instead of min-height 100vh, so the browser never shows a scrollbar on html/body. AppLayout: h-screen flex column with shrink-0 header and flex-1 min-h-0 overflow-auto main. Pages like initiatives scroll within main; agents page uses h-full with internal panel scrollers.	2026-02-10 15:47:55 +01:00
Lukas May	142f67c131	fix: Prevent agents page from scrolling — constrain scroll to panels Left agent list gets min-h-0 for proper overflow containment in grid. Right output panel gets overflow-hidden so AgentOutputViewer stays within the available grid cell height.	2026-02-10 15:32:43 +01:00
Lukas May	9f5421f6bc	test: Rewrite conversation integration test with mock server and real tasks Replace full CoordinationServer with a lightweight mock that serves only conversation tRPC procedures backed by an in-memory repository. Agents now have real coding tasks (write spec, ask questions, create summary) and the two-question flow proves the listen→answer→re-listen cycle works.	2026-02-10 15:27:26 +01:00
Lukas May	f8c5dce588	test: Add PreviewManager integration tests 21 tests covering the full preview lifecycle: start (happy path, phaseId, Docker unavailable, project not found, compose failure, health check failure, no healthchecks), stop, list (with filter, missing labels), getStatus (running/failed/stopped/building/not found), and stopAll (including partial failure resilience).	2026-02-10 14:02:43 +01:00
Lukas May	9902069d8d	test: Add real Claude inter-agent conversation integration test Two-session test: Agent A listens for questions and answers, Agent B asks a question and captures the response. Also fixes missing conversationRepository passthrough in tRPC adapter.	2026-02-10 13:49:04 +01:00
Lukas May	60f06671e4	fix: Include dirty worktree paths in commit prompt and fix retry counter Three bugs fixed in auto-cleanup commit retry flow: 1. resumeForCommit now calls getDirtyWorktreePaths() to include specific project subdirectory names in the prompt, so the agent knows which dirs to cd into and commit (instead of running git from the non-repo parent dir). 2. Removed finally block in tryAutoCleanup that reset the retry counter after every call, making MAX_COMMIT_RETRIES ineffective. Counter is now only cleaned up on success, max retries, or error. 3. resumeForCommit returns false early if no worktrees are actually dirty, preventing unnecessary commit retries for clean agents.	2026-02-10 13:44:10 +01:00
Lukas May	a6371e156a	feat: Add inter-agent conversation system (listen, ask, answer) Enables parallel agents to communicate through a CLI-based conversation mechanism coordinated via tRPC. Agents can ask questions to peers and receive answers, with target resolution by agent ID, task ID, or phase ID.	2026-02-10 13:43:30 +01:00
Lukas May	270a5cb21d	feat: Add Docker-based preview deployments for phase review Preview deployments let reviewers spin up the app at a specific branch in local Docker containers, accessible through a single Caddy reverse proxy port. Docker is the source of truth — no database table needed. New module: src/preview/ with config discovery (.cw-preview.yml → compose → Dockerfile fallback), compose generation, Docker CLI wrapper, health checking, and port allocation (9100-9200 range).	2026-02-10 13:24:56 +01:00
Lukas May	783a07bfb7	fix: Show actionable error details for account health check failures Setup tokens from `claude setup-token` can't query the usage API, resulting in a useless "Usage API request failed" message. Now shows the actual HTTP status and guides users to complete OAuth setup. Also distinguishes warning state (yellow) from error state (red) in the AccountCard UI.	2026-02-10 13:16:03 +01:00
Lukas May	06f443ebc8	refactor: DB-driven agent output events with single emission point DB log chunk insertion is now the sole trigger for agent:output events. Eliminates triple emission (FileTailer, handleStreamEvent, output buffer) in favor of: FileTailer.onRawContent → DB insert → EventBus emit. - createLogChunkCallback emits agent:output after successful DB insert - spawnInternal now wires onRawContent callback (fixes session 1 gap) - Remove eventBus from FileTailer (no longer touches EventBus) - Remove eventBus from ProcessManager constructor (dead parameter) - Remove agent:output emission from handleStreamEvent text_delta - Remove outputBuffers map and all buffer helpers from manager/handler - Remove getOutputBuffer from AgentManager interface and implementations - getAgentOutput tRPC: DB-only, no file fallback - onAgentOutput subscription: no initial buffer yield, events only - AgentOutputViewer: accumulates raw JSONL chunks, parses uniformly	2026-02-10 11:47:36 +01:00


430 Commits
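Commits 0ed657b644 and 41b1d0e986 describe how cassette keys are derived. Dynamic prompt content (UUIDs, temp paths, timestamps, and nanoid agent IDs via NANOID_RE → __ID__) is normalized. The normalized prompt, the provider args, and a hash of worktree file content are then combined into a SHA256 key. The TypeScript below is a minimal sketch of that idea, not the project's normalizer.ts or key.ts. The function names (normalizePrompt, hashWorktree, cassetteKey), the exact regexes, every placeholder token other than __ID__, and the assumption that temp paths start with /tmp/ are hypothetical. Session-number stripping is omitted. It also assumes agent IDs use a purely alphanumeric alphabet, as the commit message states. nanoid's default alphabet also includes `_` and `-`, which this pattern would not match.

```typescript
import { createHash } from "node:crypto";

// Hypothetical patterns for illustration; the real normalizer.ts may differ.
const UUID_RE = /\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b/gi;
// Assumes agent IDs are 21-char alphanumeric nanoids (per commit 41b1d0e986).
const NANOID_RE = /\b[A-Za-z0-9]{21}\b/g;
// Assumes temp workspaces live under /tmp/.
const TMP_PATH_RE = /\/tmp\/[^\s"']+/g;
const ISO_TIMESTAMP_RE = /\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z/g;

/** Replace run-specific values with stable placeholder tokens. */
export function normalizePrompt(prompt: string): string {
  return prompt
    .replace(UUID_RE, "__UUID__")
    .replace(NANOID_RE, "__ID__")
    .replace(TMP_PATH_RE, "__TMP__")
    .replace(ISO_TIMESTAMP_RE, "__TIMESTAMP__");
}

/** Hash worktree files (path -> content) so the result ignores insertion order. */
export function hashWorktree(files: Record<string, string>): string {
  const entries = Object.keys(files)
    .sort()
    .map((path) => [path, files[path]]);
  return createHash("sha256").update(JSON.stringify(entries)).digest("hex");
}

/** SHA256 over the normalized prompt, provider args, and worktree hash. */
export function cassetteKey(
  prompt: string,
  providerArgs: string[],
  worktreeHash: string,
): string {
  const payload = JSON.stringify({
    prompt: normalizePrompt(prompt),
    providerArgs,
    worktreeHash,
  });
  return createHash("sha256").update(payload).digest("hex");
}
```

Sorting the worktree entries before hashing keeps the key independent of directory-walk order. Any change to file content still produces a new key, which is the content-drift detection the commit describes for execute-mode agents.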