# Crash Marking Race Condition Fix

## Problem

Agents were being incorrectly marked as "crashed" despite completing successfully with valid `signal.json` files. This happened because of a **race condition** in the output polling logic.

### Root Cause

In `src/agent/output-handler.ts`, the `handleCompletion()` method had two code paths that decide how a run ends:

1. **Main path** (line 273): Checked for `signal.json` completion - ✅ **WORKED**
2. **Error path** (line 306): Marked the agent as crashed when no new output was detected - ❌ **PROBLEM**

The race condition occurred when:

1. The agent completes and writes `signal.json`
2. Polling happens before all output is flushed to disk
3. No new output is detected in `output.jsonl`
4. The error path triggers `handleAgentError()` → the agent is marked as crashed
5. Completion detection never runs because the agent is already marked as crashed

### Specific Case: slim-wildebeest

- **Agent ID**: `t9itQywbC0aZBZyc_SL0V`
- **Status**: `crashed` (incorrect)
- **Exit Code**: `NULL` (agent marked as crashed before the process exited)
- **Signal File**: Valid `signal.json` with `status: "questions"` ✅
- **Problem**: Error path triggered before completion detection

## Solution

**Added completion detection in the error path** before marking the agent as crashed.
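The decision the fix adds to the error path can be isolated into a small pure function, which makes the race easy to unit test without real agents. The sketch below is a minimal illustration under stated assumptions, not the code in `output-handler.ts`:

- `decideOnNoOutput()` and `readSignal()` are hypothetical helpers.
- The `.cw/output/signal.json` path and the completion statuses `done`, `questions`, and `error` are taken from this document.
- Treating malformed JSON as "not completed" is an assumption about how `checkSignalCompletion()` behaves.

```typescript
import { readFile } from 'node:fs/promises';
import { join } from 'node:path';

// Statuses this document treats as a finished run.
const COMPLETION_STATUSES: ReadonlySet<string> = new Set(['done', 'questions', 'error']);

export type NoOutputOutcome = 'completed' | 'crashed';

/**
 * Hypothetical pure decision for the "no new output" error path.
 * `signalContent` is the raw text of signal.json, or null if the file is missing.
 */
export function decideOnNoOutput(signalContent: string | null): NoOutputOutcome {
  // No signal file at all: fall through to crash handling, as before the fix.
  if (signalContent === null) return 'crashed';
  try {
    const parsed: unknown = JSON.parse(signalContent);
    const status =
      typeof parsed === 'object' && parsed !== null
        ? (parsed as { status?: unknown }).status
        : undefined;
    // Only a recognised completion status counts as a finished run.
    return typeof status === 'string' && COMPLETION_STATUSES.has(status) ? 'completed' : 'crashed';
  } catch {
    // Assumption: malformed or truncated JSON is not treated as completion.
    return 'crashed';
  }
}

/** Hypothetical helper: reads signal.json once, returning null if it cannot be read. */
export async function readSignal(agentWorkdir: string): Promise<string | null> {
  try {
    return await readFile(join(agentWorkdir, '.cw/output/signal.json'), 'utf-8');
  } catch {
    return null;
  }
}

// Usage, mirroring the slim-wildebeest case:
console.log(decideOnNoOutput('{"status":"questions"}')); // completed
console.log(decideOnNoOutput(null)); // crashed
```

In a handler, this could be wired up roughly as `const signalContent = await readSignal(agentWorkdir)`. If `decideOnNoOutput(signalContent)` returns `'completed'`, that same content is passed on to processing. Otherwise the code falls through to `handleAgentError()`.

Reading the file once and reusing its content avoids a second read between the check and the processing step. This is a design choice of the sketch, not necessarily of the original fix, which calls `checkSignalCompletion()` and then reads the file again.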
### Code Change

In `src/agent/output-handler.ts` around line 305:

```typescript
// BEFORE (line 306)
log.warn({ agentId }, 'no result text from stream or file');
await this.handleAgentError(agentId, new Error('No output received'), provider, getAgentWorkdir);

// AFTER
// Before marking as crashed, check if the agent actually completed successfully
const agentWorkdir = getAgentWorkdir(agentId);
if (await this.checkSignalCompletion(agentWorkdir)) {
  const signalPath = join(agentWorkdir, '.cw/output/signal.json');
  const signalContent = await readFile(signalPath, 'utf-8');
  log.debug({ agentId, signalPath }, 'detected completion via signal.json in error path');
  this.filePositions.delete(agentId); // Clean up tracking
  await this.processSignalAndFiles(agentId, signalContent, agent.mode as AgentMode, getAgentWorkdir, active?.streamSessionId);
  return;
}
log.warn({ agentId }, 'no result text from stream or file');
await this.handleAgentError(agentId, new Error('No output received'), provider, getAgentWorkdir);
```

### Logic

The fix adds a **final completion check** right before marking an agent as crashed. If the agent has a valid `signal.json` with status `done`, `questions`, or `error`, the handler processes the completion instead of marking the agent as crashed.

## Impact

- ✅ **Prevents false crash marking** for agents that completed successfully
- ✅ **Negligible performance impact** - the extra file check only runs on the error path, when no new output is detected (rare)
- ✅ **Backward compatible** - agents without a completed `signal.json` are still marked as crashed
- ✅ **Consistent** - reuses the same `checkSignalCompletion()` logic as the main path

## Testing

Verified with the completion detection tests:

```bash
npm test -- src/agent/completion-detection.test.ts
```

Manual verification showed that slim-wildebeest's `signal.json` would be detected:

- Signal file exists: `true`
- Status: `questions`
- Should complete: `true` ✅

## Future Improvements

1. **Unified completion detection** - Consider consolidating all completion logic into a single method
2. **Enhanced logging** - Add more detailed logs for completion vs. crash decisions
3. **Metrics** - Track completion detection success rates to identify remaining edge cases

## Related Files

- `src/agent/output-handler.ts` - Main fix location
- `src/agent/completion-detection.test.ts` - Existing test coverage
- `src/agent/manager.ts` - Secondary crash handling (different logic)