# Crash Marking Race Condition Fix

## Problem

Agents were being incorrectly marked as "crashed" despite completing successfully with valid `signal.json` files. This happened because of a **race condition** in the output polling logic.

### Root Cause

In `src/agent/output-handler.ts`, the `handleCompletion()` method had two code paths for completion detection:

1. **Main path** (line 273): Checked for `signal.json` completion - ✅ **WORKED**
2. **Error path** (line 306): Marked the agent as crashed when no new output was detected - ❌ **PROBLEM**

The race condition occurred when:
1. The agent completes and writes `signal.json`
2. Polling happens before all output is flushed to disk
3. No new output is detected in `output.jsonl`
4. The error path triggers `handleAgentError()` → marks the agent as crashed
5. Completion detection never runs because the agent is already marked as crashed

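
The sequence above can be modelled as a small pure function. This is only a sketch under stated assumptions: `PollSnapshot`, `Outcome`, `errorPathBeforeFix`, and `errorPathWithSignalCheck` are hypothetical names for illustration and do not appear in `src/agent/output-handler.ts`.

```typescript
// Hypothetical model of the error path; these names are illustrative and do
// not come from src/agent/output-handler.ts.
type PollSnapshot = {
  resultText: string | null; // result text found in the stream or output file
  signalValid: boolean;      // a valid signal.json exists in the agent workdir
};

type Outcome = 'completed' | 'crashed';

// Behaviour as described above: missing result text goes straight to the
// crash handler, and signal.json is never consulted on this path.
function errorPathBeforeFix(snapshot: PollSnapshot): Outcome {
  if (snapshot.resultText !== null) return 'completed';
  return 'crashed';
}

// Same path with one extra check: a valid signal.json wins over missing text.
function errorPathWithSignalCheck(snapshot: PollSnapshot): Outcome {
  if (snapshot.resultText !== null) return 'completed';
  return snapshot.signalValid ? 'completed' : 'crashed';
}

// The racy snapshot: signal.json is written, but output is not yet flushed.
const racy: PollSnapshot = { resultText: null, signalValid: true };
console.log(errorPathBeforeFix(racy));       // prints: crashed
console.log(errorPathWithSignalCheck(racy)); // prints: completed
```

Modelling the decision as a pure function keeps the race reproducible without any timing: the racy state is just one input value.
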
### Specific Case: slim-wildebeest

- **Agent ID**: `t9itQywbC0aZBZyc_SL0V`
- **Status**: `crashed` (incorrect)
- **Exit Code**: `NULL` (the agent was marked as crashed before the process exited)
- **Signal File**: Valid `signal.json` with `status: "questions"` ✅
- **Problem**: The error path triggered before completion detection ran

## Solution

**Added completion detection in the error path** before marking the agent as crashed.

### Code Change

In `src/agent/output-handler.ts` around line 305:

```typescript
// Excerpt only; assumes `join` (node:path) and `readFile` (node:fs/promises)
// are imported, and that `agent`, `active`, `provider`, and `getAgentWorkdir`
// are in scope.

// BEFORE (line 306)
log.warn({ agentId }, 'no result text from stream or file');
await this.handleAgentError(agentId, new Error('No output received'), provider, getAgentWorkdir);

// AFTER
// Before marking as crashed, check if the agent actually completed successfully
const agentWorkdir = getAgentWorkdir(agentId);
if (await this.checkSignalCompletion(agentWorkdir)) {
  const signalPath = join(agentWorkdir, '.cw/output/signal.json');
  const signalContent = await readFile(signalPath, 'utf-8');
  log.debug({ agentId, signalPath }, 'detected completion via signal.json in error path');
  this.filePositions.delete(agentId); // Clean up tracking
  await this.processSignalAndFiles(agentId, signalContent, agent.mode as AgentMode, getAgentWorkdir, active?.streamSessionId);
  return;
}

log.warn({ agentId }, 'no result text from stream or file');
await this.handleAgentError(agentId, new Error('No output received'), provider, getAgentWorkdir);
```

### Logic

The fix adds a **final completion check** right before marking an agent as crashed. If the agent has a valid `signal.json` with status `done`, `questions`, or `error`, the handler processes the completion instead of marking the agent as crashed.

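
As a rough illustration of what "a valid `signal.json`" could mean, the sketch below validates the file content as a string. It is not the real `checkSignalCompletion()` implementation, which is not shown in this document; `parseCompletionStatus` and `COMPLETION_STATUSES` are hypothetical names, and the only fact taken from the text is the set of three statuses.

```typescript
// Statuses this document lists as valid completion signals.
const COMPLETION_STATUSES = ['done', 'questions', 'error'] as const;
type CompletionStatus = (typeof COMPLETION_STATUSES)[number];

// Hypothetical helper: returns the status if `raw` is a completion signal,
// or null if the content is malformed, partially written, or has another status.
function parseCompletionStatus(raw: string): CompletionStatus | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // e.g. the file was read while still being written
  }
  if (typeof parsed !== 'object' || parsed === null) return null;
  const status = (parsed as { status?: unknown }).status;
  const match = COMPLETION_STATUSES.find((s) => s === status);
  return match ?? null;
}
```

Treating malformed JSON as "not complete" rather than throwing matters here, because this check runs in exactly the situation where files may be only partially flushed.
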
## Impact

- ✅ **Prevents false crash marking** for agents that completed successfully
- ✅ **Negligible performance impact** - the check only runs when no new output is detected (rare)
- ✅ **Backward compatible** - still marks truly crashed agents as crashed
- ✅ **Comprehensive** - uses the same robust `checkSignalCompletion()` logic as the main path

## Testing

Verified with completion detection tests:
```bash
npm test -- src/agent/completion-detection.test.ts
```

Manual verification showed that slim-wildebeest's `signal.json` would be detected:
- Signal file exists: `true`
- Status: `questions`
- Should complete: `true` ✅

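
A manual check along these lines could be scripted as follows. This is a hedged sketch, not the script that was actually used: `inspectSignal` and `SignalReport` are hypothetical names, and the only details taken from this document are the `.cw/output/signal.json` path and the three completion statuses.

```typescript
import { existsSync, mkdirSync, mkdtempSync, readFileSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical report shape mirroring the three manual checks above.
type SignalReport = { exists: boolean; status: string | null; shouldComplete: boolean };

// Inspect <workdir>/.cw/output/signal.json without touching agent state.
function inspectSignal(workdir: string): SignalReport {
  const signalPath = join(workdir, '.cw/output/signal.json');
  if (!existsSync(signalPath)) {
    return { exists: false, status: null, shouldComplete: false };
  }
  let status: string | null = null;
  try {
    const parsed: unknown = JSON.parse(readFileSync(signalPath, 'utf-8'));
    if (typeof parsed === 'object' && parsed !== null) {
      const candidate = (parsed as { status?: unknown }).status;
      if (typeof candidate === 'string') status = candidate;
    }
  } catch {
    // Malformed or partially written file: leave status as null.
  }
  const shouldComplete = status !== null && ['done', 'questions', 'error'].includes(status);
  return { exists: true, status, shouldComplete };
}

// Recreate a signal file with status "questions" in a throwaway directory.
const workdir = mkdtempSync(join(tmpdir(), 'signal-check-'));
mkdirSync(join(workdir, '.cw/output'), { recursive: true });
writeFileSync(join(workdir, '.cw/output/signal.json'), JSON.stringify({ status: 'questions' }));
const report = inspectSignal(workdir);
console.log(report);
```

Because the helper only reads from disk, it can be pointed at a real agent workdir for a post-mortem without changing the agent's recorded status.
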
## Future Improvements

1. **Unified completion detection** - Consider consolidating all completion logic into a single method
2. **Enhanced logging** - Add more detailed logs for completion vs crash decisions
3. **Metrics** - Track completion detection success rates to identify remaining edge cases

## Related Files

- `src/agent/output-handler.ts` - Main fix location
- `src/agent/completion-detection.test.ts` - Existing test coverage
- `src/agent/manager.ts` - Secondary crash handling (different logic)