Codewalkers/docs/archive/crash-marking-fix.md
Lukas May 342b490fe7, 2026-02-10 09:48:51 +01:00

Crash Marking Race Condition Fix

Problem

Agents were being incorrectly marked as "crashed" despite completing successfully with valid signal.json files. This happened because of a race condition in the output polling logic.

Root Cause

In src/agent/output-handler.ts, the handleCompletion() method had two code paths for completion detection:

  1. Main path (line 273): Checked signal.json for completion. This path worked correctly.
  2. Error path (line 306): Marked the agent as crashed when no new output was detected. This path caused the problem.

The race condition occurred when:

  1. Agent completes and writes signal.json
  2. Polling happens before all output is flushed to disk
  3. No new output detected in output.jsonl
  4. Error path triggers handleAgentError() → marks agent as crashed
  5. Completion detection never runs because the agent is already marked as crashed
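
The sequence above can be modelled as a small decision function. This is a simplified sketch, not the code in output-handler.ts: PollSnapshot, Outcome, and decideBeforeFix are hypothetical names, and the model assumes the pre-fix handler took the error path whenever a poll found no new output, without consulting signal.json.

```typescript
// Hypothetical model of one polling pass; these are not the real handler's types.
interface PollSnapshot {
  signalWritten: boolean;  // signal.json already exists on disk
  newOutputFound: boolean; // output.jsonl yielded new result text in this poll
}

type Outcome = "completed" | "crashed";

// Assumed pre-fix ordering: with no new output, the error path ran
// before signal.json was ever checked.
function decideBeforeFix(snapshot: PollSnapshot): Outcome {
  if (!snapshot.newOutputFound) {
    return "crashed";
  }
  return snapshot.signalWritten ? "completed" : "crashed";
}

// The racy snapshot: signal.json is written, but the output is not yet flushed.
const racySnapshot: PollSnapshot = { signalWritten: true, newOutputFound: false };
console.log(decideBeforeFix(racySnapshot)); // prints "crashed"
```

In this model the agent's real state (signalWritten is true) never influences the result once newOutputFound is false, which is the false crash described above.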

Specific Case: slim-wildebeest

  • Agent ID: t9itQywbC0aZBZyc_SL0V
  • Status: crashed (incorrect)
  • Exit Code: NULL (agent marked crashed before process exit)
  • Signal File: Valid signal.json with status: "questions"
  • Problem: Error path triggered before completion detection

Solution

Added a completion check to the error path that runs before the agent is marked as crashed.

Code Change

In src/agent/output-handler.ts around line 305:

// BEFORE (line 306)
log.warn({ agentId }, 'no result text from stream or file');
await this.handleAgentError(agentId, new Error('No output received'), provider, getAgentWorkdir);

// AFTER
// Before marking as crashed, check if the agent actually completed successfully
const agentWorkdir = getAgentWorkdir(agentId);
if (await this.checkSignalCompletion(agentWorkdir)) {
  const signalPath = join(agentWorkdir, '.cw/output/signal.json');
  const signalContent = await readFile(signalPath, 'utf-8');
  log.debug({ agentId, signalPath }, 'detected completion via signal.json in error path');
  this.filePositions.delete(agentId); // Clean up tracking
  await this.processSignalAndFiles(agentId, signalContent, agent.mode as AgentMode, getAgentWorkdir, active?.streamSessionId);
  return;
}

log.warn({ agentId }, 'no result text from stream or file');
await this.handleAgentError(agentId, new Error('No output received'), provider, getAgentWorkdir);

Logic

The fix adds a final completion check immediately before an agent is marked as crashed. If the agent has a valid signal.json with status done, questions, or error, the handler processes the completion instead of marking the agent as crashed.
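
A minimal sketch of the status check described above, assuming signal.json is a JSON object with a string status field. The names isCompletionSignal and COMPLETION_STATUSES are illustrative only; the real checkSignalCompletion() may validate more than this.

```typescript
// Statuses this document lists as valid completions.
const COMPLETION_STATUSES = new Set(["done", "questions", "error"]);

// Hypothetical helper: returns true when the raw signal.json text parses to
// an object whose `status` is one of the completion statuses.
function isCompletionSignal(rawSignal: string): boolean {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawSignal);
  } catch {
    return false; // partially written or corrupt file: not a completion
  }
  if (typeof parsed !== "object" || parsed === null) {
    return false;
  }
  const status = (parsed as { status?: unknown }).status;
  return typeof status === "string" && COMPLETION_STATUSES.has(status);
}

console.log(isCompletionSignal('{"status":"questions"}')); // prints true
console.log(isCompletionSignal('{"status":"running"}'));   // prints false
console.log(isCompletionSignal('{"status":'));             // prints false
```

Treating unparseable content as "not complete" keeps the crash path as the fallback, so a half-written signal file cannot be mistaken for a completion.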

Impact

  • Prevents false crash marking for agents that completed successfully
  • Negligible performance impact: the extra check runs only on the error path, when no new output is detected (rare)
  • Backward compatible: agents that truly crashed are still marked as crashed
  • Consistent with the main path: reuses the same checkSignalCompletion() logic

Testing

Verified with completion detection tests:

npm test -- src/agent/completion-detection.test.ts

Manual verification showed slim-wildebeest's signal.json would be detected:

  • Signal file exists: true
  • Status: questions
  • Should complete: true
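
A manual check like the one above could be scripted roughly as follows. This is a sketch under assumptions: inspectSignal and SignalReport are hypothetical names, the .cw/output/signal.json location is taken from the code change above, and the temporary directory stands in for a real agent workdir.

```typescript
import { existsSync, mkdirSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

interface SignalReport {
  exists: boolean;
  status: string | null;
  shouldComplete: boolean;
}

// Hypothetical helper that reports the same three values listed above.
function inspectSignal(agentWorkdir: string): SignalReport {
  const signalPath = join(agentWorkdir, ".cw/output/signal.json");
  if (!existsSync(signalPath)) {
    return { exists: false, status: null, shouldComplete: false };
  }
  let status: string | null = null;
  try {
    const parsed = JSON.parse(readFileSync(signalPath, "utf-8"));
    if (typeof parsed?.status === "string") {
      status = parsed.status;
    }
  } catch {
    // Unreadable or partial JSON: leave status as null.
  }
  const shouldComplete =
    status !== null && ["done", "questions", "error"].includes(status);
  return { exists: true, status, shouldComplete };
}

// Recreate the slim-wildebeest situation in a throwaway directory.
const workdir = mkdtempSync(join(tmpdir(), "cw-signal-"));
mkdirSync(join(workdir, ".cw/output"), { recursive: true });
writeFileSync(
  join(workdir, ".cw/output/signal.json"),
  JSON.stringify({ status: "questions" }),
);
console.log(inspectSignal(workdir));
```

Pointing inspectSignal at a real agent workdir instead of the temporary one would give the same kind of report for an actual agent.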

Future Improvements

  1. Unified completion detection - Consider consolidating all completion logic into a single method
  2. Enhanced logging - Add more detailed logs for completion vs crash decisions
  3. Metrics - Track completion detection success rates to identify remaining edge cases
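
The three ideas above could be combined along these lines. This is only a sketch of one possible design: CompletionEvidence, CompletionDecision, decideCompletion, recordDecision, and decisionCounts are hypothetical names, and the sketch assumes that result text alone is enough to count as a completion.

```typescript
// Hypothetical evidence gathered once per poll.
interface CompletionEvidence {
  hasResultText: boolean;       // stream or file produced result text
  hasCompletionSignal: boolean; // signal.json holds done, questions, or error
}

type CompletionDecision =
  | { outcome: "completed"; reason: "signal-file" | "result-text" }
  | { outcome: "crashed"; reason: "no-output-no-signal" };

// Single decision point: the signal file is consulted first, so a late
// output flush cannot lead to a crash marking.
function decideCompletion(evidence: CompletionEvidence): CompletionDecision {
  if (evidence.hasCompletionSignal) {
    return { outcome: "completed", reason: "signal-file" };
  }
  if (evidence.hasResultText) {
    return { outcome: "completed", reason: "result-text" };
  }
  return { outcome: "crashed", reason: "no-output-no-signal" };
}

// In-memory tally of decisions, keyed by "outcome:reason".
const decisionCounts = new Map<string, number>();

// Logs the decision with its reason and updates the tally.
function recordDecision(agentId: string, decision: CompletionDecision): void {
  const key = `${decision.outcome}:${decision.reason}`;
  decisionCounts.set(key, (decisionCounts.get(key) ?? 0) + 1);
  console.log(JSON.stringify({ agentId, ...decision }));
}

recordDecision(
  "example-agent",
  decideCompletion({ hasResultText: false, hasCompletionSignal: true }),
);
```

Returning a reason alongside the outcome makes each completion-versus-crash decision visible in logs and countable, which is what the logging and metrics items ask for.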

Related Files

  • src/agent/output-handler.ts - Main fix location
  • src/agent/completion-detection.test.ts - Existing test coverage
  • src/agent/manager.ts - Secondary crash handling (different logic)