Crash Marking Race Condition Fix
Problem
Agents were being incorrectly marked as "crashed" despite completing successfully with valid signal.json files. This happened because of a race condition in the output polling logic.
Root Cause
In src/agent/output-handler.ts, the handleCompletion() method had two code paths for completion detection:
- Main path (line 273): Checked for signal.json completion - ✅ WORKED
- Error path (line 306): Marked agent as crashed when no new output detected - ❌ PROBLEM
The race condition occurred when:
- Agent completes and writes signal.json
- Polling happens before all output is flushed to disk
- No new output detected in output.jsonl
- Error path triggers handleAgentError() → marks agent as crashed
- Completion detection never runs because agent already marked as crashed
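The sequence above can be reduced to a small model. This is a simplified illustration, not the code in src/agent/output-handler.ts: PollSnapshot, Outcome, and decideBeforeFix are hypothetical names, and the model assumes the main path is reached only when result text is available.

```typescript
// Simplified, hypothetical model of the pre-fix decision; not the real handler code.
interface PollSnapshot {
  hasResultText: boolean;    // did the stream or output.jsonl yield result text?
  signalFileExists: boolean; // has the agent already written signal.json?
}

type Outcome = 'completed' | 'crashed';

function decideBeforeFix(snapshot: PollSnapshot): Outcome {
  if (snapshot.hasResultText) {
    // Main path: completion handling runs (assumed to succeed in this model).
    return 'completed';
  }
  // Error path: signalFileExists is never consulted here, which is the bug.
  return 'crashed';
}

// The agent finished and wrote signal.json, but its output is not flushed yet.
const racedPoll: PollSnapshot = { hasResultText: false, signalFileExists: true };
console.log(decideBeforeFix(racedPoll)); // prints "crashed"
```

The snapshot carries proof of completion, yet the outcome is crashed because the error path ignores it.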
Specific Case: slim-wildebeest
- Agent ID: t9itQywbC0aZBZyc_SL0V
- Status: crashed (incorrect)
- Exit Code: NULL (agent marked crashed before process exit)
- Signal File: Valid signal.json with status: "questions" ✅
- Problem: Error path triggered before completion detection
Solution
Added completion detection in the error path before marking agent as crashed.
Code Change
In src/agent/output-handler.ts around line 305:
// BEFORE (line 306)
log.warn({ agentId }, 'no result text from stream or file');
await this.handleAgentError(agentId, new Error('No output received'), provider, getAgentWorkdir);
// AFTER
// Before marking as crashed, check if the agent actually completed successfully
const agentWorkdir = getAgentWorkdir(agentId);
if (await this.checkSignalCompletion(agentWorkdir)) {
  const signalPath = join(agentWorkdir, '.cw/output/signal.json');
  const signalContent = await readFile(signalPath, 'utf-8');
  log.debug({ agentId, signalPath }, 'detected completion via signal.json in error path');
  this.filePositions.delete(agentId); // Clean up tracking
  await this.processSignalAndFiles(agentId, signalContent, agent.mode as AgentMode, getAgentWorkdir, active?.streamSessionId);
  return;
}
log.warn({ agentId }, 'no result text from stream or file');
await this.handleAgentError(agentId, new Error('No output received'), provider, getAgentWorkdir);
Logic
The fix adds a final completion check right before marking an agent as crashed. If the agent has a valid signal.json with status done, questions, or error, the handler processes the completion instead of marking the agent as crashed.
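The status check described here might look roughly like the sketch below. The body of checkSignalCompletion() is not shown in this document, so parseTerminalStatus, SignalStatus, and TERMINAL_STATUSES are hypothetical names, and treating unparseable content as "not complete" is an assumption.

```typescript
// Hypothetical sketch; the real checkSignalCompletion() may differ.
type SignalStatus = 'done' | 'questions' | 'error';

const TERMINAL_STATUSES: ReadonlySet<string> = new Set(['done', 'questions', 'error']);

/** Returns the terminal status found in a signal.json payload, or null if none. */
function parseTerminalStatus(raw: string): SignalStatus | null {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== 'object' || parsed === null) {
      return null;
    }
    const status = (parsed as { status?: unknown }).status;
    return typeof status === 'string' && TERMINAL_STATUSES.has(status)
      ? (status as SignalStatus)
      : null;
  } catch {
    // Assumption: a partially written file counts as "not complete yet".
    return null;
  }
}

console.log(parseTerminalStatus('{"status":"questions"}')); // prints "questions"
```

Keeping the parse step free of file I/O makes it easy to unit-test the accepted statuses separately from the polling logic.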
Impact
- ✅ Prevents false crash marking for agents that completed successfully
- ✅ No performance impact - only runs when no new output is detected (rare)
- ✅ Backward compatible - still marks truly crashed agents as crashed
- ✅ Comprehensive - uses same robust checkSignalCompletion() logic as main path
Testing
Verified with completion detection tests:
npm test -- src/agent/completion-detection.test.ts
Manual verification showed slim-wildebeest's signal.json would be detected:
- Signal file exists: true
- Status: questions
- Should complete: true ✅
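A manual check of this kind could be scripted along the following lines. This is a sketch under assumptions: inspectSignal and SignalReport are hypothetical names, the synchronous fs calls are chosen for brevity, and the agent's state is recreated in a throwaway temp directory rather than read from a real agent workdir. Only the .cw/output/signal.json path and the three statuses come from the text above.

```typescript
import { existsSync, mkdirSync, mkdtempSync, readFileSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

interface SignalReport {
  exists: boolean;
  status: string | null;
  shouldComplete: boolean;
}

const COMPLETION_STATUSES: ReadonlySet<string> = new Set(['done', 'questions', 'error']);

// Hypothetical helper for manual checks; not part of the codebase described here.
function inspectSignal(agentWorkdir: string): SignalReport {
  const signalPath = join(agentWorkdir, '.cw/output/signal.json');
  if (!existsSync(signalPath)) {
    return { exists: false, status: null, shouldComplete: false };
  }
  let status: string | null = null;
  try {
    const parsed = JSON.parse(readFileSync(signalPath, 'utf-8')) as { status?: unknown } | null;
    status = typeof parsed?.status === 'string' ? parsed.status : null;
  } catch {
    status = null; // unreadable or partially written file
  }
  return { exists: true, status, shouldComplete: status !== null && COMPLETION_STATUSES.has(status) };
}

// Recreate the reported state in a throwaway directory.
const workdir = mkdtempSync(join(tmpdir(), 'signal-check-'));
mkdirSync(join(workdir, '.cw/output'), { recursive: true });
writeFileSync(join(workdir, '.cw/output/signal.json'), JSON.stringify({ status: 'questions' }));
console.log(inspectSignal(workdir));
```

Pointing inspectSignal at a directory without a signal file returns exists: false and shouldComplete: false, which corresponds to the case that should still be treated as a crash.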
Future Improvements
- Unified completion detection - Consider consolidating all completion logic into a single method
- Enhanced logging - Add more detailed logs for completion vs crash decisions
- Metrics - Track completion detection success rates to identify remaining edge cases
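As a rough illustration of how these three ideas could fit together, the sketch below routes every decision through one function that also returns a reason, which can then be logged and counted. It is a design sketch only: Decision, AgentEvidence, classifyAgent, and record are hypothetical names, and the rule that a terminal signal.json status takes priority is an assumption.

```typescript
// Hypothetical design sketch; none of these names exist in the codebase described here.
type Decision =
  | { outcome: 'completed'; reason: 'result-text' | 'signal-file' }
  | { outcome: 'crashed'; reason: 'no-output-no-signal' };

interface AgentEvidence {
  hasResultText: boolean;
  signalStatus: string | null; // status read from signal.json, or null if absent
}

const TERMINAL: ReadonlySet<string> = new Set(['done', 'questions', 'error']);

// Single entry point for the completion-versus-crash decision.
function classifyAgent(evidence: AgentEvidence): Decision {
  if (evidence.signalStatus !== null && TERMINAL.has(evidence.signalStatus)) {
    return { outcome: 'completed', reason: 'signal-file' };
  }
  if (evidence.hasResultText) {
    return { outcome: 'completed', reason: 'result-text' };
  }
  return { outcome: 'crashed', reason: 'no-output-no-signal' };
}

// Tally decisions by reason so completions detected via signal.json become measurable.
const tally = new Map<string, number>();
function record(decision: Decision): void {
  tally.set(decision.reason, (tally.get(decision.reason) ?? 0) + 1);
}

record(classifyAgent({ hasResultText: false, signalStatus: 'questions' }));
record(classifyAgent({ hasResultText: false, signalStatus: null }));
tally.forEach((count, reason) => console.log(reason, count));
```

Returning the reason alongside the outcome lets the caller log why an agent was classified a given way without re-deriving it.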
Related Files
- src/agent/output-handler.ts - Main fix location
- src/agent/completion-detection.test.ts - Existing test coverage
- src/agent/manager.ts - Secondary crash handling (different logic)