Commit Graph

71 Commits

Author SHA1 Message Date
Lukas May
76aca71705 refactor: Restructure agent prompts with XML tags
Replace ## Heading sections with descriptive XML tags (<role>, <task>,
<execution_protocol>, <examples>, etc.) for unambiguous first-order vs
second-order delimiter separation per Anthropic best practices.

- shared.ts: All constants wrapped in their XML tag
- Mode prompts: Consistent tag vocabulary and ordering across all 5 modes
- Examples use <examples> > <example label="good/bad"> nesting
- workspace.ts: Output wrapped in <workspace> tags
- Delete dead src/agent/prompts.ts (zero imports)
- Update docs/agent.md with XML tag documentation
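
The tag-based layout described above can be sketched as follows. This is a minimal illustration, not the code in shared.ts: the `xmlTag` helper and all prompt text are hypothetical placeholders; only the tag names (`<role>`, `<examples>`, `<example label="good/bad">`) come from the commit message.

```typescript
// Hypothetical helper: wraps prompt text in a descriptive XML tag.
// Attribute support is an assumption, added to model <example label="...">.
function xmlTag(name: string, body: string, attrs: Record<string, string> = {}): string {
  const attrText = Object.keys(attrs)
    .map((key) => ` ${key}="${attrs[key]}"`)
    .join("");
  return `<${name}${attrText}>\n${body.trim()}\n</${name}>`;
}

// Examples nest as <examples> > <example label="good/bad">.
// The example sentences are placeholder text.
const examples = xmlTag(
  "examples",
  [
    xmlTag("example", "Write a failing test first, then implement.", { label: "good" }),
    xmlTag("example", "Implement everything, add tests later.", { label: "bad" }),
  ].join("\n"),
);

// Placeholder role text; a real mode prompt would combine more sections.
const prompt = [xmlTag("role", "You are an execution agent."), examples].join("\n\n");
```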
2026-03-02 14:15:28 +09:00
Lukas May
1540039c52 test: Remove redundant and dead tests (-743 lines)
Delete 3 files:
- completion-detection.test.ts (private method tests, covered by crash-race-condition)
- completion-race-condition.test.ts (covered by mutex-completion + crash-race-condition)
- real-e2e-crash.test.ts (dead: expect(true).toBe(true), hardcoded paths)

Remove individual tests:
- crash-race-condition.test.ts #4 (weaker duplicate of #2)
- mock-manager.test.ts duplicate "(second test)" for detail_complete
- process-manager.test.ts 2 "logs comprehensive" tests with empty assertions
- edge-cases.test.ts 2 Q&A tests redundant with recovery-scenarios

Update test-inventory.md to reflect removals.
2026-03-02 12:57:27 +09:00
Lukas May
0ed657b644 feat: Add VCR-style cassette testing system for agent subprocess pipeline
Implements cassette recording/replay to test the full agent execution
pipeline (ProcessManager → FileTailer → OutputHandler → SignalManager)
without real AI API calls.

Key components:
- `CassetteProcessManager`: extends ProcessManager, intercepts spawnDetached
  to replay cassettes or record real runs on completion
- `replay-worker.mjs`: standalone node script that replays JSONL + signal.json
  as a subprocess, exercising the complete file-based output pipeline
- `CassetteStore`: reads/writes cassette JSON files keyed by SHA256 hash
- `normalizer.ts`: strips dynamic content (UUIDs, temp paths, timestamps,
  session numbers) from prompts for stable cassette keys
- `key.ts`: hashes normalized prompt + provider args + worktree file content
  (worktree hash detects content drift for execute-mode agents)
- `createCassetteHarness()`: wraps RealProviderHarness with cassette support,
  same interface so existing real-provider tests work unchanged
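
As a rough sketch of how the normalizer and key components could fit together: the regular expressions, the placeholder strings, and the `cassetteKey` signature below are assumptions, since the actual rules in `normalizer.ts` and `key.ts` are not shown in this log. Only the overall idea (normalize the prompt, then SHA256-hash it together with provider args and a worktree hash) comes from the commit message.

```typescript
import { createHash } from "node:crypto";

// Placeholder patterns; the real normalizer also handles temp paths and session numbers.
const UUID_RE = /[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi;
const ISO_TIMESTAMP_RE = /\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})?/g;

// Replace dynamic content with fixed markers so the key stays stable across runs.
function normalizePrompt(prompt: string): string {
  return prompt.replace(UUID_RE, "<uuid>").replace(ISO_TIMESTAMP_RE, "<timestamp>");
}

// Hypothetical signature: the worktree hash is passed in precomputed.
function cassetteKey(prompt: string, providerArgs: string[], worktreeHash: string): string {
  const payload = JSON.stringify({
    prompt: normalizePrompt(prompt),
    providerArgs,
    worktreeHash,
  });
  return createHash("sha256").update(payload).digest("hex");
}
```

Two prompts that differ only in a UUID or timestamp map to the same key under this sketch, while a change in provider args or worktree content produces a different one.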

Mode control via env vars:
  (default)                  → replay: cassette must exist (safe for CI)
  CW_CASSETTE_RECORD=1       → auto: replay if exists, record if missing
  CW_CASSETTE_FORCE_RECORD=1 → record: always run real agent, overwrite cassette
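
The mode table above might translate into a resolver like the one below. The function name, the precedence when both variables are set, and the choice to treat only the value `1` as enabled are assumptions.

```typescript
type CassetteMode = "replay" | "auto" | "record";

// Assumption: FORCE_RECORD wins when both variables are set.
function resolveCassetteMode(env: Record<string, string | undefined>): CassetteMode {
  if (env.CW_CASSETTE_FORCE_RECORD === "1") return "record"; // always run the real agent
  if (env.CW_CASSETTE_RECORD === "1") return "auto";         // replay if present, else record
  return "replay";                                           // default: cassette must exist
}
```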

MultiProviderAgentManager gains an optional `processManagerOverride` constructor
parameter for clean dependency injection without changing existing callers.

Cassette files live in src/test/cassettes/ and are intended to be committed
to git so CI runs without API access.
2026-03-02 12:17:52 +09:00
Lukas May
1331fb737d refactor: Wire buildExecutePrompt into dispatch manager
Dispatch manager now wraps task descriptions with buildExecutePrompt()
so agents receive the full execution protocol. Update test to match
prompt wrapping. Add worktree isolation note to workspace layout.
2026-02-18 17:40:03 +09:00
Lukas May
b63a8b605c refactor: Compress refine prompt for conciseness (439→243 words, -45%)
- Tighten items 1-3 arrow notation, compress item 4 to Better/Best
  progressive comparison, shorten item 5 scenario example
- Cut 3 redundant Rules bullets (already stated in Output Files and
  guard paragraphs)
- Collapse 5 DoD checks to 2 non-redundant verification items
- Compress behavioral guard paragraphs
2026-02-18 17:30:57 +09:00
Lukas May
a4d48262c1 refactor: Compress detail prompt for conciseness (775→473 words, -39%)
Drop redundant Specificity Test section (covered by examples and checklist),
remove Task Design Rules (implied by entire prompt), flatten frontmatter
docs, trim good example, tighten sizing/checkpoint/context sections.
2026-02-18 17:30:56 +09:00
Lukas May
c9769b09b7 refactor: Compress plan prompt for conciseness
Cut ~35% of words while preserving all high-value content:
- Merged Testing Strategy into Phase Design (rule + example)
- Eliminated Rules section (redundant with Phase Design, Dependencies)
- Compressed Dependency Graph intro (examples speak for themselves)
- Trimmed File Ownership and Specificity prose
- Reduced Existing Context from 4 to 2 bullets
- Tightened Definition of Done checklist
2026-02-18 17:30:09 +09:00
Lukas May
a4502ebf77 refactor: Compress discuss prompt for conciseness (~30% word reduction)
Cut redundant rules already demonstrated by good/bad examples,
removed default-Claude-behavior instructions, collapsed verbose
sections into single directives.
2026-02-18 17:30:07 +09:00
Lukas May
e73e99cb28 refactor: Compress shared agent prompts for conciseness (1060→699 words, -34%)
Apply aggressive compression: imperative style, remove anti-laziness
emphasis, cut rationale where obvious, eliminate redundant explanations.
All constant names and function signatures preserved.
2026-02-18 17:30:04 +09:00
Lukas May
67f98f4f35 refactor: Compress execute prompt for conciseness (~47% word reduction)
- Cut 5 anti-patterns: placeholder code, blind imports, ignoring test
  failures (all default Claude behavior), plus self-validating tests
  and test mutation (both already covered by TEST_INTEGRITY in shared.ts)
- Compressed execution protocol steps to imperative essentials
- Merged scope rules from 4 bullets to 3
- Trimmed definition of done checklist (removed redundant 5th item)
- Removed anti-laziness language (IMPORTANT, MUST, aggressive emphasis)
2026-02-18 17:30:00 +09:00
Lukas May
9ed7e9ad16 refactor: Rewrite execute prompt with TDD protocol, test integrity rules, and definition-of-done checklist
Replace the weak 7-step execution protocol with an explicit red-green-refactor
cycle that requires agents to write failing tests before implementing. Move
anti-patterns and scope rules above deviation/git sections so critical
constraints get more attention. Add session startup verification, progress
tracking, and a mandatory definition-of-done checklist that must pass before
signaling completion. Remove dead CODEBASE_VERIFICATION import.
2026-02-18 17:20:11 +09:00
Lukas May
b5509232f6 refactor: Add testability focus and definition-of-done checklists to discuss/refine prompts
Discuss prompt: add Testability & Verification question category, require
verification criteria for behavioral decisions, add definition-of-done checklist.

Refine prompt: strengthen unverifiable-requirements check to demand testable
acceptance criteria with inputs/outputs, extend missing-edge-cases to frame as
testable scenarios, add definition-of-done checklist.
2026-02-18 17:19:53 +09:00
Lukas May
09a388b490 refactor: Enforce mandatory test specs in detail prompt, add testing strategy to plan prompt
Detail: Replace vague "how to verify" requirement with mandatory test specification
(file path, scenarios, run command) for execute-category tasks. Update good-task
example to demonstrate the new format. Add Definition of Done checklist.

Plan: Add Testing Strategy section requiring tests within each implementation phase
instead of trailing test phases. Add Definition of Done checklist.
2026-02-18 17:19:48 +09:00
Lukas May
298c570bc4 refactor: Overhaul shared prompt constants — remove CODEBASE_VERIFICATION, trim GIT_WORKFLOW/CONTEXT_MANAGEMENT, add TEST_INTEGRITY/SESSION_STARTUP/PROGRESS_TRACKING
2026-02-18 17:18:53 +09:00
Lukas May
c04e6d7778 refactor: Replace file-count task sizing with lines-changed heuristic
Anchor on ~150 lines changed as the sweet spot based on SWE-bench Pro
data (107 lines / 4.1 files = 46% success for best agents). Old rules
used file count as the primary proxy which correlates poorly with task
difficulty compared to lines changed.
2026-02-18 16:54:10 +09:00
Lukas May
7354582d69 refactor: Add context management to plan/detail prompts, update docs
Add CONTEXT_MANAGEMENT shared block to plan and detail mode prompts so
architect agents also benefit from compaction awareness and parallel
execution hints. Update index.ts re-exports and agent docs.
2026-02-18 16:43:19 +09:00
Lukas May
4ef9db1501 refactor: Improve shared agent prompts — add context management, explain git rules, slim inter-agent comms
- Add CONTEXT_MANAGEMENT constant: tells agents to keep working through
  context compaction and parallelize reads
- Add "why" reasoning to each GIT_WORKFLOW rule so agents understand the
  purpose, not just the rule
- Slim buildInterAgentCommunication: replace verbose bash code blocks with
  a brief usage pattern paragraph, condense CLI docs to bullet list
2026-02-18 16:41:55 +09:00
Lukas May
459c09b687 refactor: Overhaul execute prompt with test-first protocol, context management, anti-hardcoding
- Add CONTEXT_MANAGEMENT import and inject into template
- Rewrite execution protocol: test-first (step 3), parallel file reads, execution-over-deliberation
- Add "why" rationale to scope rules (conflict prevention, overwrite risk)
- Add hard-coded solutions anti-pattern, soften imperative tone
- Rename section from "Anti-Patterns (never do these)" to "Anti-Patterns"
2026-02-18 16:41:53 +09:00
Lukas May
2aa807a394 fix: Resolve signal.json path mismatch for standalone agents
Standalone agents (no initiative or 0 linked projects) run in a
workspace/ subdirectory, but signal.json lookups used the parent
directory. This caused all standalone agents to be marked "crashed"
despite successful completion.

Track the actual agent cwd at spawn time via ActiveAgent.agentCwd
and probe for the workspace/ subdirectory during reconciliation and
crash detection paths.
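
A sketch of the probing described above, under stated assumptions: `resolveSignalPath` and its parameters are hypothetical, and the location of signal.json relative to the agent directory is passed in because this log does not state it.

```typescript
// Hypothetical helper: returns the first candidate path that exists.
function resolveSignalPath(
  agentWorkdir: string,
  agentCwd: string | undefined,
  signalRelPath: string,
  exists: (path: string) => boolean,
): string | undefined {
  const candidates = [
    agentCwd,                    // cwd recorded at spawn time, when tracked
    `${agentWorkdir}/workspace`, // standalone agents run in this subdirectory
    agentWorkdir,                // parent directory, the pre-fix lookup location
  ].filter((dir): dir is string => dir !== undefined);

  for (const dir of candidates) {
    const candidate = `${dir}/${signalRelPath}`;
    if (exists(candidate)) return candidate;
  }
  return undefined;
}
```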
2026-02-10 16:00:37 +01:00
Lukas May
bfefbc85af feat: Switch cw ask from polling to SSE via onConversationAnswer subscription
- New onConversationAnswer subscription: listens for conversation:answered
  events matching a specific conversation ID, yields the answer text
- cw ask now subscribes via SSE instead of polling getConversation
- Removed --poll-interval and --timeout flags from cw ask
- Updated prompt to reflect SSE-based cw ask (no polling options)
2026-02-10 15:56:54 +01:00
Lukas May
bfc1b422f9 feat: Inject agent ID into prompts, SSE-based cw listen, all flags documented
- INTER_AGENT_COMMUNICATION constant → buildInterAgentCommunication(agentId) function
- Manager injects actual agent ID into prompt after DB record creation
- Agent ID hardcoded in cw listen/ask commands — no manifest.json indirection
- cw listen now uses onPendingConversation SSE subscription instead of polling
- CLI trpc-client upgraded with splitLink for subscription support
- All CLI flags (--agent-id, --from, --timeout, --poll-interval) documented in prompt
- conversation:created/answered added to ALL_EVENT_TYPES
2026-02-10 15:53:01 +01:00
Lukas May
3ff1f485f1 fix: Prevent agents page from scrolling — lock layout to viewport
Body: height 100vh + overflow hidden instead of min-height 100vh,
so the browser never shows a scrollbar on html/body.
AppLayout: h-screen flex column with shrink-0 header and flex-1
min-h-0 overflow-auto main. Pages like initiatives scroll within
main; agents page uses h-full with internal panel scrollers.
2026-02-10 15:47:55 +01:00
Lukas May
a6371e156a feat: Add inter-agent conversation system (listen, ask, answer)
Enables parallel agents to communicate through a CLI-based conversation
mechanism coordinated via tRPC. Agents can ask questions to peers and
receive answers, with target resolution by agent ID, task ID, or phase ID.
2026-02-10 13:43:30 +01:00
Lukas May
783a07bfb7 fix: Show actionable error details for account health check failures
Setup tokens from `claude setup-token` can't query the usage API,
resulting in a useless "Usage API request failed" message. Now shows
the actual HTTP status and guides users to complete OAuth setup.
Also distinguishes warning state (yellow) from error state (red)
in the AccountCard UI.
2026-02-10 13:16:03 +01:00
Lukas May
06f443ebc8 refactor: DB-driven agent output events with single emission point
DB log chunk insertion is now the sole trigger for agent:output events.
Eliminates triple emission (FileTailer, handleStreamEvent, output buffer)
in favor of: FileTailer.onRawContent → DB insert → EventBus emit.

- createLogChunkCallback emits agent:output after successful DB insert
- spawnInternal now wires onRawContent callback (fixes session 1 gap)
- Remove eventBus from FileTailer (no longer touches EventBus)
- Remove eventBus from ProcessManager constructor (dead parameter)
- Remove agent:output emission from handleStreamEvent text_delta
- Remove outputBuffers map and all buffer helpers from manager/handler
- Remove getOutputBuffer from AgentManager interface and implementations
- getAgentOutput tRPC: DB-only, no file fallback
- onAgentOutput subscription: no initial buffer yield, events only
- AgentOutputViewer: accumulates raw JSONL chunks, parses uniformly
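
The single emission point can be pictured as the sketch below. The interface shapes and method names (`insertChunk`, the event payload fields) are assumptions; the commit only states that `createLogChunkCallback` emits `agent:output` after a successful DB insert.

```typescript
// Assumed minimal shapes for the repository and event bus.
interface LogChunkRepository {
  insertChunk(agentId: string, content: string): Promise<void>;
}
interface AgentOutputEvent {
  type: "agent:output";
  agentId: string;
  data: string;
}
interface EventBus {
  emit(event: AgentOutputEvent): void;
}

// FileTailer.onRawContent -> DB insert -> EventBus emit.
function createLogChunkCallback(agentId: string, repo: LogChunkRepository, bus: EventBus) {
  return async (rawContent: string): Promise<void> => {
    // If the insert rejects, the emit below is never reached.
    await repo.insertChunk(agentId, rawContent);
    bus.emit({ type: "agent:output", agentId, data: rawContent });
  };
}
```

In this sketch a failed insert propagates to the caller; whether the real callback logs and swallows the error instead is not stated in the commit.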
2026-02-10 11:47:36 +01:00
Lukas May
ca548c1eaa feat: Auto-branch initiative system with per-project default branches
Planning tasks (research, discuss, plan, detail, refine) now run on
the project's defaultBranch instead of hardcoded 'main'. Execution
tasks (execute, verify, merge, review) auto-generate an initiative
branch (cw/<slug>) on first dispatch. Branch configuration removed
from initiative creation — it's now fully automatic.

- Add PLANNING_CATEGORIES/EXECUTION_CATEGORIES to branch-naming
- Dispatch manager splits logic by task category
- ProcessManager uses per-project defaultBranch fallback
- Phase dispatch uses project.defaultBranch for ensureBranch base
- Remove mergeTarget from createInitiative input
- Rename updateInitiativeMergeConfig → updateInitiativeConfig
- Add defaultBranch field to registerProject + UI
- Rename mergeTarget → branch across all frontend components
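
The category split could look roughly like this. The category lists and the `cw/<slug>` pattern come from the commit message; the `resolveBranch` function and its parameters are hypothetical, and slug generation is left out because it is not described here.

```typescript
const PLANNING_CATEGORIES = ["research", "discuss", "plan", "detail", "refine"] as const;
const EXECUTION_CATEGORIES = ["execute", "verify", "merge", "review"] as const;

type TaskCategory =
  | (typeof PLANNING_CATEGORIES)[number]
  | (typeof EXECUTION_CATEGORIES)[number];

function isPlanningCategory(category: TaskCategory): boolean {
  return (PLANNING_CATEGORIES as readonly string[]).indexOf(category) !== -1;
}

// Planning tasks use the project's default branch; execution tasks use the
// auto-generated initiative branch.
function resolveBranch(category: TaskCategory, defaultBranch: string, initiativeSlug: string): string {
  return isPlanningCategory(category) ? defaultBranch : `cw/${initiativeSlug}`;
}
```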
2026-02-10 10:53:35 +01:00
Lukas May
0407f05332 refactor: Rename agent modes breakdown→plan, decompose→detail
Full rename across the codebase for clarity:
- breakdown (initiative→phases) is now "plan"
- decompose (phase→tasks) is now "detail"

Updates schema enums, TypeScript types, events, prompts, output handler,
tRPC procedures, CLI commands, frontend components, tests, and docs.
Also fixes 0022 migration multi-statement issue (adds statement-breakpoint markers).
2026-02-10 10:51:42 +01:00
Lukas May
f9f8b4c185 refactor(agent): Use agent name instead of ID for log directory paths
Aligns agent-logs directory naming with agent-workdirs so both use the
human-readable agent name, making filesystem correlation trivial.
2026-02-10 10:41:47 +01:00
Lukas May
bf898cb86e feat(agent): Enrich breakdown/decompose agent input with full initiative context
Breakdown and decompose agents now receive all existing phases, tasks,
and pages as read-only context so they can plan with awareness of what
already exists instead of operating in a vacuum.
2026-02-10 10:18:55 +01:00
Lukas May
4d3bd9ca90 fix(agent): Add refresh token validation before token refresh
Check for refresh token availability before attempting credential refresh.
Setup tokens that expire without a refresh token now return a clear error
instead of attempting an invalid refresh operation.
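
A minimal sketch of the pre-refresh check, assuming a hypothetical `requireRefreshToken` guard. The `accessToken` field name and the error text are placeholders; the nullable `refreshToken` and `expiresAt` types match the interface change recorded in c204aab403.

```typescript
interface OAuthCredentials {
  accessToken: string;         // hypothetical field name
  refreshToken: string | null; // null for setup tokens
  expiresAt: number | null;    // null when the token carries no expiry
}

// Returns the refresh token, or throws a clear error before any refresh request is made.
function requireRefreshToken(creds: OAuthCredentials): string {
  if (creds.refreshToken === null || creds.refreshToken === "") {
    throw new Error("Token expired and no refresh token is available; complete OAuth setup.");
  }
  return creds.refreshToken;
}
```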

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-10 10:01:35 +01:00
Lukas May
265fcb1149 fix(agent): Add refresh token validation before token refresh
Check for refresh token availability before attempting credential refresh.
Setup tokens that expire without a refresh token now return a clear error
instead of attempting an invalid refresh operation.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-10 09:54:00 +01:00
Lukas May
e35927f321 fix(agent): Handle optional OAuth fields in usage.ts credential reader
Updated checkAccountHealth to handle setup tokens with null expiresAt:
- Changed currentExpiresAt type from number to number | null
- Use conditional for tokenExpiresAt ISO string conversion

This completes the OAuth setup token support across all credential
reading and validation functions.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-10 09:50:46 +01:00
Lukas May
b021b9690e fix(agent): Handle expired setup tokens without refresh token
Add validation to check for refresh token availability before attempting
token refresh. Setup tokens that expire without a refresh token now
return a clear error message instead of attempting an invalid refresh.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-10 09:50:40 +01:00
Lukas May
a59e18710f fix(agent): Handle optional OAuth fields in usage.ts credential reader
Make refreshToken and expiresAt optional in usage credential validation.
Aligns with changes in default-credential-manager.ts.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-10 09:50:22 +01:00
Lukas May
11b1378b91 fix(agent): Handle optional OAuth token fields in credential manager
Updated readCredentials and isTokenExpired to support setup tokens:
- Removed refreshToken requirement check
- Use nullish coalescing for refreshToken and expiresAt fields
- Treat tokens without expiresAt as non-expired
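
The three points above might combine as in this sketch. `RawCredentials`, the `accessToken` field, and the millisecond-epoch unit for `expiresAt` are assumptions; the function names come from the commit message, but their real signatures are not shown in this log.

```typescript
interface OAuthCredentials {
  accessToken: string;
  refreshToken: string | null;
  expiresAt: number | null;
}

// Assumed raw shape: setup tokens may omit refreshToken and expiresAt.
interface RawCredentials {
  accessToken: string;
  refreshToken?: string | null;
  expiresAt?: number | null;
}

// No refreshToken requirement; missing fields become null via nullish coalescing.
function readCredentials(raw: RawCredentials): OAuthCredentials {
  return {
    accessToken: raw.accessToken,
    refreshToken: raw.refreshToken ?? null,
    expiresAt: raw.expiresAt ?? null,
  };
}

// Tokens without expiresAt are treated as non-expired.
function isTokenExpired(creds: OAuthCredentials, nowMs: number): boolean {
  if (creds.expiresAt === null) return false;
  return nowMs >= creds.expiresAt;
}
```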

Completes OAuth credential handling for setup tokens across all
credential management functions.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-10 09:50:19 +01:00
Lukas May
8930d1aa43 fix(agent): Handle optional OAuth token fields in credential manager
Make refreshToken and expiresAt optional in OAuth credential validation.
Setup tokens without expiry are now treated as non-expired.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-10 09:49:55 +01:00
Lukas May
008c783c50 fix(agent): Handle null refreshToken/expiresAt in credential manager
Updated DefaultAccountCredentialManager to handle setup tokens:
- Removed refreshToken requirement in validation check
- Use nullish coalescing for refreshToken and expiresAt
- Treat tokens without expiresAt as non-expired (setup tokens)

Completes the setup token support changes.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-10 09:49:48 +01:00
Lukas May
c204aab403 fix(agent): Allow null refreshToken and expiresAt for setup tokens
Modified OAuthCredentials interface to support setup tokens that don't
have refresh tokens or expiry times:
- refreshToken: string | null
- expiresAt: number | null

Updated in both src/agent/accounts/usage.ts and
src/agent/credentials/types.ts for consistency.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-10 09:49:36 +01:00
Lukas May
342b490fe7 feat: Task decomposition for Tailwind/Radix/shadcn foundation setup
Decomposed "Foundation Setup - Install Dependencies & Configure Tailwind"
phase into 6 executable tasks:

1. Install Tailwind CSS, PostCSS & Autoprefixer
2. Map MUI theme to Tailwind design tokens
3. Setup CSS variables for dynamic theming
4. Install Radix UI primitives
5. Initialize shadcn/ui and setup component directory
6. Move MUI to devDependencies and verify setup

Tasks follow logical dependency chain with final human verification
checkpoint before proceeding with component migration.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-10 09:48:51 +01:00
Lukas May
fab7706f5c feat: Phase schema refactor, agent lifecycle module, and log chunks
Phase model changes:
- Drop `number` column (ordering now by createdAt + dependency DAG)
- Replace `description` (plain text) with `content` (Tiptap JSON)
- Add `approved` status as dispatch gate
- Add phase dependency management (list, remove, dependents)
- Approval gate in PhaseDispatchManager.queuePhase()

Agent log chunks:
- New `agent_log_chunks` table for DB-first output persistence
- LogChunkRepository port + DrizzleLogChunkRepository adapter
- FileTailer onRawContent callback streams chunks to DB
- getAgentOutput reads from DB first, falls back to file

Agent lifecycle module (src/agent/lifecycle/):
- SignalManager: atomic signal.json read/write/wait operations
- RetryPolicy: exponential backoff with error-specific strategies
- ErrorAnalyzer: pattern-based error classification
- CleanupStrategy: debug archival vs production cleanup
- AgentLifecycleController: orchestrates retry/recovery flow
- Missing signal recovery with instruction injection
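
To illustrate the RetryPolicy bullet, here is a small sketch of exponential backoff with a per-error strategy. The error kinds, multipliers, delay bounds, and attempt limit are all hypothetical values chosen for the example; the actual policy is not shown in this log.

```typescript
// Hypothetical error classification, standing in for ErrorAnalyzer output.
type ErrorKind = "rate_limit" | "transient" | "fatal";

interface RetryDecision {
  retry: boolean;
  delayMs: number;
}

// attempt is 0-based: 0 means the first retry.
function nextRetry(
  kind: ErrorKind,
  attempt: number,
  baseDelayMs = 1000,
  maxDelayMs = 60000,
  maxAttempts = 5,
): RetryDecision {
  if (kind === "fatal" || attempt >= maxAttempts) return { retry: false, delayMs: 0 };
  // Assumed strategy: rate limits back off faster than other transient errors.
  const factor = kind === "rate_limit" ? 4 : 2;
  const delayMs = Math.min(maxDelayMs, baseDelayMs * Math.pow(factor, attempt));
  return { retry: true, delayMs };
}
```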

Completion detection fixes:
- Read signal.json file instead of parsing stdout as JSON
- Cancellable pollForCompletion with { cancel } handle
- Centralized state cleanup via cleanupAgentState()
- Credential handler consolidation (prepareProcessEnv)

Prompts refactor:
- Split monolithic prompts.ts into per-mode modules
- Add workspace layout section to agent prompts
- Fix markdown-to-tiptap double-serialization

Server/tRPC:
- Subscription heartbeat (30s) and bounded queue (1000 max)
- Phase CRUD: approvePhase, deletePhase, dependency queries
- Page: findByIds, getPageUpdatedAtMap
- Wire new repositories through container and context
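
The bounded subscription queue mentioned above could be sketched as follows. The 1000-item limit comes from the commit message; the class itself and the drop-oldest overflow policy are assumptions.

```typescript
class BoundedQueue<T> {
  private items: T[] = [];

  constructor(private readonly maxSize: number = 1000) {}

  // Returns true when the oldest item was dropped to make room (assumed policy).
  push(item: T): boolean {
    this.items.push(item);
    if (this.items.length > this.maxSize) {
      this.items.shift();
      return true;
    }
    return false;
  }

  shift(): T | undefined {
    return this.items.shift();
  }

  size(): number {
    return this.items.length;
  }
}
```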
2026-02-09 22:33:28 +01:00
Lukas May
43e2c8b0ba fix(agent): Eliminate race condition in completion handling
PROBLEM:
- Agents completing with questions were incorrectly marked as "crashed"
- Race condition: polling handler AND crash handler both called handleCompletion()
- Caused database corruption and lost pending questions

SOLUTION:
- Add completion mutex in OutputHandler to prevent concurrent processing
- Remove duplicate completion call from crash handler
- Only one handler executes completion logic per agent
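
One way to picture the completion mutex is the sketch below. The `CompletionMutex` class and its skip-if-locked behavior are assumptions; the commit only states that the lock lives in OutputHandler, prevents concurrent processing per agent, and is cleaned up on exceptions.

```typescript
class CompletionMutex {
  private readonly locked = new Set<string>();

  // Runs handler unless completion for this agent is already in progress.
  // Returns false when the call was skipped.
  async runExclusive(agentId: string, handler: () => Promise<void>): Promise<boolean> {
    if (this.locked.has(agentId)) return false;
    this.locked.add(agentId);
    try {
      await handler();
      return true;
    } finally {
      // Released even if the handler throws.
      this.locked.delete(agentId);
    }
  }
}
```

Because the lock is keyed by agent ID, completions for different agents can still proceed concurrently.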

TESTING:
- Added mutex-completion.test.ts with 4 test cases
- Verified mutex prevents concurrent access
- Verified lock cleanup on exceptions
- Verified different agents can process concurrently

FIXES: residential-cuckoo and 12+ other agents stuck in crashed state
2026-02-08 15:51:32 +01:00
Lukas May
6f5fd3a0af fix(agent): Implement incremental JSONL parsing to eliminate race conditions
Replaces file completion detection with a superior approach that reads only
complete JSONL lines and tracks file position. This eliminates race conditions
without any delays or polling.

Key improvements:
- Read up to last complete line, avoiding partial lines during writes
- Track file position per agent for incremental reading
- Process only valid, complete JSON lines
- Clean up position tracking on completion/crash
- No hardcoded delays or polling required
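
The reading strategy above can be sketched on an in-memory string. This is an illustration under assumptions: the real code reads from a file and tracks a position per agent, whereas `readCompleteLines` here is a hypothetical pure function using string offsets.

```typescript
interface JsonlReadResult {
  records: unknown[];
  nextPosition: number; // offset just past the last complete line
}

function readCompleteLines(content: string, position: number): JsonlReadResult {
  // Anything after the last newline may be a partial line still being written.
  const lastNewline = content.lastIndexOf("\n");
  if (lastNewline < position) return { records: [], nextPosition: position };

  const records: unknown[] = [];
  for (const line of content.slice(position, lastNewline).split("\n")) {
    if (line.trim() === "") continue;
    try {
      records.push(JSON.parse(line));
    } catch (_err) {
      // Skip lines that are not valid JSON.
    }
  }
  return { records, nextPosition: lastNewline + 1 };
}
```

Calling it again with the returned `nextPosition` after more data arrives picks up only the newly completed lines.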

This approach is more robust, responsive, and elegant than timing-based solutions.
The race condition where agents were marked as crashed is now completely resolved.
2026-02-08 14:10:02 +01:00
Lukas May
604da7cd0d fix(agent): Replace hardcoded 500ms delay with robust file completion detection
Fixes race condition where agents were incorrectly marked as crashed when
output files took longer than 500ms to complete writing.

Changes:
- Replace hardcoded 500ms delay with polling-based file completion detection
- Add signal file validation to ensure JSON is complete before processing
- Make status updates atomic to prevent race conditions
- Update cleanup manager to pass outputFilePath for proper timing

This resolves the issue where successful agents like "abundant-wolverine"
were marked as crashed despite producing valid output.
2026-02-08 14:03:47 +01:00
Lukas May
2877484012 Add userDismissedAt field to agents schema
2026-02-07 00:33:12 +01:00
Lukas May
5605547aea fix(13-01): parse structured_output from Claude CLI response
- Add structured_output field to ClaudeCliResult interface
- Read from structured_output when present (--json-schema response)
- Fall back to parsing result for backwards compatibility
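
A sketch of the fallback order described above. The field types on `ClaudeCliResult` and the `extractOutput` name are assumptions; the commit confirms only that `structured_output` is read when present and that `result` is parsed otherwise.

```typescript
interface ClaudeCliResult {
  result: string;              // assumed to hold JSON text in the legacy path
  structured_output?: unknown; // present for --json-schema responses
}

function extractOutput(cli: ClaudeCliResult): unknown {
  if (cli.structured_output !== undefined && cli.structured_output !== null) {
    return cli.structured_output;
  }
  // Backwards-compatible path: parse the result string.
  return JSON.parse(cli.result);
}
```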
2026-02-02 10:38:10 +01:00
Lukas May
a79b15376e test(12-07): add MockAgentManager decompose mode tests
Add tests for decompose mode scenarios:
- Spawn agent in decompose mode
- Complete with tasks on decompose_complete
- Pause on questions in decompose mode
- Emit stopped event with decompose_complete reason
- Set result message with task count
2026-02-01 11:54:20 +01:00
Lukas May
7ff979becf feat(12-05): export buildDecomposePrompt from agent module
- Add buildDecomposePrompt to public exports
2026-02-01 11:49:57 +01:00
Lukas May
48336ec39d feat(12-05): create buildDecomposePrompt function
- Add buildDecomposePrompt for decompose mode agent operations
- Import Phase and Plan types from schema
- Comprehensive prompt explaining task breakdown rules, types, and output format
2026-02-01 11:49:45 +01:00
Lukas May
2bd0bc52a3 feat(12-03): add decompose mode support to MockAgentManager
- Import TaskBreakdown from schema.ts
- Add decompose_complete status to MockAgentScenario type
- Update completeAgent() to handle decompose_complete scenarios
- Emit agent:stopped with reason 'decompose_complete' for E2E testing
2026-02-01 11:44:40 +01:00
Lukas May
8754cdea98 feat(12-03): add decompose mode support to ClaudeAgentManager
- Import decomposeOutputSchema and decomposeOutputJsonSchema from schema.ts
- Update getJsonSchemaForMode() to handle 'decompose' mode
- Add handleDecomposeOutput() method following pattern of handleBreakdownOutput()
- Update handleAgentCompletion() switch to call handleDecomposeOutput for decompose mode
- Handle decompose_complete/questions/unrecoverable_error statuses
2026-02-01 11:43:55 +01:00