Replace ## Heading sections with descriptive XML tags (<role>, <task>,
<execution_protocol>, <examples>, etc.) for unambiguous first-order vs
second-order delimiter separation per Anthropic best practices.
- shared.ts: All constants wrapped in their XML tag
- Mode prompts: Consistent tag vocabulary and ordering across all 5 modes
- Examples use <examples> > <example label="good/bad"> nesting
- workspace.ts: Output wrapped in <workspace> tags
- Delete dead src/agent/prompts.ts (zero imports)
- Update docs/agent.md with XML tag documentation
Implements cassette recording/replay to test the full agent execution
pipeline (ProcessManager → FileTailer → OutputHandler → SignalManager)
without real AI API calls.
Key components:
- `CassetteProcessManager`: extends ProcessManager, intercepts spawnDetached
to replay cassettes or record real runs on completion
- `replay-worker.mjs`: standalone node script that replays JSONL + signal.json
as a subprocess, exercising the complete file-based output pipeline
- `CassetteStore`: reads/writes cassette JSON files keyed by SHA256 hash
- `normalizer.ts`: strips dynamic content (UUIDs, temp paths, timestamps,
session numbers) from prompts for stable cassette keys
- `key.ts`: hashes normalized prompt + provider args + worktree file content
(worktree hash detects content drift for execute-mode agents)
- `createCassetteHarness()`: wraps RealProviderHarness with cassette support,
same interface so existing real-provider tests work unchanged
Mode control via env vars:
(default) → replay: cassette must exist (safe for CI)
CW_CASSETTE_RECORD=1 → auto: replay if exists, record if missing
CW_CASSETTE_FORCE_RECORD=1 → record: always run real agent, overwrite cassette
MultiProviderAgentManager gains an optional `processManagerOverride` constructor
parameter for clean dependency injection without changing existing callers.
Cassette files live in src/test/cassettes/ and are intended to be committed
to git so CI runs without API access.
Dispatch manager now wraps task descriptions with buildExecutePrompt()
so agents receive the full execution protocol. Update test to match
prompt wrapping. Add worktree isolation note to workspace layout.
Drop redundant Specificity Test section (covered by examples and checklist),
remove Task Design Rules (implied by entire prompt), flatten frontmatter
docs, trim good example, tighten sizing/checkpoint/context sections.
Replace the weak 7-step execution protocol with an explicit red-green-refactor
cycle that requires agents to write failing tests before implementing. Move
anti-patterns and scope rules above deviation/git sections so critical
constraints get more attention. Add session startup verification, progress
tracking, and a mandatory definition-of-done checklist that must pass before
signaling completion. Remove dead CODEBASE_VERIFICATION import.
Detail: Replace vague "how to verify" requirement with mandatory test specification
(file path, scenarios, run command) for execute-category tasks. Update good-task
example to demonstrate the new format. Add Definition of Done checklist.
Plan: Add Testing Strategy section requiring tests within each implementation phase
instead of trailing test phases. Add Definition of Done checklist.
Anchor on ~150 lines changed as the sweet spot based on SWE-bench Pro
data (107 lines / 4.1 files = 46% success for best agents). Old rules
used file count as the primary proxy which correlates poorly with task
difficulty compared to lines changed.
Add CONTEXT_MANAGEMENT shared block to plan and detail mode prompts so
architect agents also benefit from compaction awareness and parallel
execution hints. Update index.ts re-exports and agent docs.
- Add CONTEXT_MANAGEMENT constant: tells agents to keep working through
context compaction and parallelize reads
- Add "why" reasoning to each GIT_WORKFLOW rule so agents understand the
purpose, not just the rule
- Slim buildInterAgentCommunication: replace verbose bash code blocks with
a brief usage pattern paragraph, condense CLI docs to bullet list
Standalone agents (no initiative or 0 linked projects) run in a
workspace/ subdirectory, but signal.json lookups used the parent
directory. This caused all standalone agents to be marked "crashed"
despite successful completion.
Track the actual agent cwd at spawn time via ActiveAgent.agentCwd
and probe for the workspace/ subdirectory during reconciliation and
crash detection paths.
- New onConversationAnswer subscription: listens for conversation:answered
events matching a specific conversation ID, yields the answer text
- cw ask now subscribes via SSE instead of polling getConversation
- Removed --poll-interval and --timeout flags from cw ask
- Updated prompt to reflect SSE-based cw ask (no polling options)
- INTER_AGENT_COMMUNICATION constant → buildInterAgentCommunication(agentId) function
- Manager injects actual agent ID into prompt after DB record creation
- Agent ID hardcoded in cw listen/ask commands — no manifest.json indirection
- cw listen now uses onPendingConversation SSE subscription instead of polling
- CLI trpc-client upgraded with splitLink for subscription support
- All CLI flags (--agent-id, --from, --timeout, --poll-interval) documented in prompt
- conversation:created/answered added to ALL_EVENT_TYPES
Body: height 100vh + overflow hidden instead of min-height 100vh,
so the browser never shows a scrollbar on html/body.
AppLayout: h-screen flex column with shrink-0 header and flex-1
min-h-0 overflow-auto main. Pages like initiatives scroll within
main; agents page uses h-full with internal panel scrollers.
Enables parallel agents to communicate through a CLI-based conversation
mechanism coordinated via tRPC. Agents can ask questions to peers and
receive answers, with target resolution by agent ID, task ID, or phase ID.
Setup tokens from `claude setup-token` can't query the usage API,
resulting in a useless "Usage API request failed" message. Now shows
the actual HTTP status and guides users to complete OAuth setup.
Also distinguishes warning state (yellow) from error state (red)
in the AccountCard UI.
DB log chunk insertion is now the sole trigger for agent:output events.
Eliminates triple emission (FileTailer, handleStreamEvent, output buffer)
in favor of: FileTailer.onRawContent → DB insert → EventBus emit.
- createLogChunkCallback emits agent:output after successful DB insert
- spawnInternal now wires onRawContent callback (fixes session 1 gap)
- Remove eventBus from FileTailer (no longer touches EventBus)
- Remove eventBus from ProcessManager constructor (dead parameter)
- Remove agent:output emission from handleStreamEvent text_delta
- Remove outputBuffers map and all buffer helpers from manager/handler
- Remove getOutputBuffer from AgentManager interface and implementations
- getAgentOutput tRPC: DB-only, no file fallback
- onAgentOutput subscription: no initial buffer yield, events only
- AgentOutputViewer: accumulates raw JSONL chunks, parses uniformly
Full rename across the codebase for clarity:
- breakdown (initiative→phases) is now "plan"
- decompose (phase→tasks) is now "detail"
Updates schema enums, TypeScript types, events, prompts, output handler,
tRPC procedures, CLI commands, frontend components, tests, and docs.
Also fixes 0022 migration multi-statement issue (adds statement-breakpoint markers).
Breakdown and decompose agents now receive all existing phases, tasks,
and pages as read-only context so they can plan with awareness of what
already exists instead of operating in a vacuum.
Check for refresh token availability before attempting credential refresh.
Setup tokens that expire without a refresh token now return a clear error
instead of attempting an invalid refresh operation.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Check for refresh token availability before attempting credential refresh.
Setup tokens that expire without a refresh token now return a clear error
instead of attempting an invalid refresh operation.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Updated checkAccountHealth to handle setup tokens with null expiresAt:
- Changed currentExpiresAt type from number to number | null
- Use conditional for tokenExpiresAt ISO string conversion
This completes the OAuth setup token support across all credential
reading and validation functions.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add validation to check for refresh token availability before attempting
token refresh. Setup tokens that expire without a refresh token now
return a clear error message instead of attempting an invalid refresh.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Make refreshToken and expiresAt optional in usage credential validation.
Aligns with changes in default-credential-manager.ts.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Updated readCredentials and isTokenExpired to support setup tokens:
- Removed refreshToken requirement check
- Use nullish coalescing for refreshToken and expiresAt fields
- Treat tokens without expiresAt as non-expired
Completes OAuth credential handling for setup tokens across all
credential management functions.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Make refreshToken and expiresAt optional in OAuth credential validation.
Setup tokens without expiry are now treated as non-expired.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Updated DefaultAccountCredentialManager to handle setup tokens:
- Removed refreshToken requirement in validation check
- Use nullish coalescing for refreshToken and expiresAt
- Treat tokens without expiresAt as non-expired (setup tokens)
Completes the setup token support changes.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Modified OAuthCredentials interface to support setup tokens that don't
have refresh tokens or expiry times:
- refreshToken: string | null
- expiresAt: number | null
Updated in both src/agent/accounts/usage.ts and
src/agent/credentials/types.ts for consistency.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
PROBLEM:
- Agents completing with questions were incorrectly marked as "crashed"
- Race condition: polling handler AND crash handler both called handleCompletion()
- Caused database corruption and lost pending questions
SOLUTION:
- Add completion mutex in OutputHandler to prevent concurrent processing
- Remove duplicate completion call from crash handler
- Only one handler executes completion logic per agent
TESTING:
- Added mutex-completion.test.ts with 4 test cases
- Verified mutex prevents concurrent access
- Verified lock cleanup on exceptions
- Verified different agents can process concurrently
FIXES: residential-cuckoo and 12+ other agents stuck in crashed state
Replaces file completion detection with a superior approach that reads only
complete JSONL lines and tracks file position. This eliminates race conditions
without any delays or polling.
Key improvements:
- Read up to last complete line, avoiding partial lines during writes
- Track file position per agent for incremental reading
- Process only valid, complete JSON lines
- Clean up position tracking on completion/crash
- No hardcoded delays or polling required
This approach is more robust, responsive, and elegant than timing-based solutions.
The race condition where agents were marked as crashed is now completely resolved.
Fixes race condition where agents were incorrectly marked as crashed when
output files took longer than 500ms to complete writing.
Changes:
- Replace hardcoded 500ms delay with polling-based file completion detection
- Add signal file validation to ensure JSON is complete before processing
- Make status updates atomic to prevent race conditions
- Update cleanup manager to pass outputFilePath for proper timing
This resolves the issue where successful agents like "abundant-wolverine"
were marked as crashed despite producing valid output.
- Add structured_output field to ClaudeCliResult interface
- Read from structured_output when present (--json-schema response)
- Fall back to parsing result for backwards compatibility
Add tests for decompose mode scenarios:
- Spawn agent in decompose mode
- Complete with tasks on decompose_complete
- Pause on questions in decompose mode
- Emit stopped event with decompose_complete reason
- Set result message with task count
- Add buildDecomposePrompt for decompose mode agent operations
- Import Phase and Plan types from schema
- Comprehensive prompt explaining task breakdown rules, types, and output format
- Import TaskBreakdown from schema.ts
- Add decompose_complete status to MockAgentScenario type
- Update completeAgent() to handle decompose_complete scenarios
- Emit agent:stopped with reason 'decompose_complete' for E2E testing