fix: prevent stale duplicate planning tasks from blocking phase completion

Three fixes for phases getting stuck when a detail task crashes and is retried:

1. detailPhase mutation (architect.ts): clean up orphaned pending/in_progress
   detail tasks before creating new ones, preventing duplicates at the source
2. orchestrator recovery: detect and complete stale duplicate planning tasks
   (same category+phase, one completed, one pending)
3. ensureBranch: catch the "already exists" error from a TOCTOU race instead of blocking the phase
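
Item 3 follows a common pattern. A check-then-create sequence can lose a race, so the create call's "already exists" failure is treated as success rather than as an error. This is a minimal sketch, not the project's code. It assumes a hypothetical synchronous `GitClient` interface in place of the real git wrapper, and assumes that `createBranch` throws an error whose message contains "already exists":

```typescript
// Hypothetical synchronous git wrapper; the project's real client may differ.
interface GitClient {
  branchExists(name: string): boolean;
  createBranch(name: string, base: string): void; // throws if the branch exists
}

type EnsureResult = "created" | "existed";

function ensureBranch(git: GitClient, name: string, base: string): EnsureResult {
  if (git.branchExists(name)) return "existed";
  try {
    git.createBranch(name, base);
    return "created";
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    // Lost the race: another caller created the branch after our check.
    // The branch exists either way, so this is not a failure.
    if (message.includes("already exists")) return "existed";
    throw err; // any other error is still a real failure
  }
}

// Fake client that reproduces the race: the existence check says "no",
// but the branch appears before createBranch runs.
const racyGit: GitClient = {
  branchExists: () => false,
  createBranch: (name: string) => {
    throw new Error(`fatal: a branch named '${name}' already exists`);
  },
};

const result = ensureBranch(racyGit, "phase/1", "main");
console.log(result); // existed
```

Only the "already exists" case is swallowed. Any other error still propagates to the caller.
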
Lukas May
2026-03-06 21:44:26 +01:00
parent ee8c7097db
commit 346d62ef8d
4 changed files with 40 additions and 3 deletions

@@ -149,8 +149,11 @@ When an agent crashes (`agent:crashed` event), the orchestrator automatically re
On server restart, `recoverDispatchQueues()` also recovers:
- Stuck `in_progress` tasks whose agents are dead (status is not `running` or `waiting_for_input`) — reset to `pending` and re-queued
- Erroneously `blocked` tasks whose agents completed successfully (status is `idle` or `stopped`) — marked `completed` so the phase can progress. This handles the legacy case where conflict resolution incorrectly blocked already-completed tasks.
- Stale duplicate planning tasks — if a phase has both a completed and a pending task of the same planning category (e.g. two `detail` tasks from a crash-and-retry), the pending one is marked `completed` with summary "Superseded by retry"
- Fully-completed `in_progress` phases — after task recovery, if all tasks in an `in_progress` phase are completed, triggers `handlePhaseAllTasksDone` to complete/review the phase
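
The recovery rules above can be sketched as a single in-memory pass. This is a minimal sketch, not the orchestrator's code. The `Task` and `Phase` shapes, the `recoverPhaseTasks` name, and treating `detail` as the only planning category are all assumptions. The callback stands in for `handlePhaseAllTasksDone`.

```typescript
type TaskStatus = "pending" | "in_progress" | "blocked" | "completed";

interface Task {
  id: string;
  phaseId: string;
  category: string;     // e.g. "detail"
  status: TaskStatus;
  agentStatus?: string; // status of the task's most recent agent, if any
  summary?: string;
}

interface Phase {
  id: string;
  status: "pending" | "in_progress" | "completed";
}

const LIVE_AGENT = new Set(["running", "waiting_for_input"]);
const FINISHED_AGENT = new Set(["idle", "stopped"]);
const PLANNING_CATEGORIES = new Set(["detail"]); // assumed; the real set may be larger

function recoverPhaseTasks(
  phases: Phase[],
  tasks: Task[],
  onPhaseAllTasksDone: (phaseId: string) => void, // stands in for handlePhaseAllTasksDone
): string[] {
  const reset: string[] = [];

  // Fix tasks whose status disagrees with their agent's status.
  for (const t of tasks) {
    const agent = t.agentStatus ?? "";
    if (t.status === "in_progress" && !LIVE_AGENT.has(agent)) {
      t.status = "pending"; // agent is dead: make the task dispatchable again
      reset.push(t.id);
    } else if (t.status === "blocked" && FINISHED_AGENT.has(agent)) {
      t.status = "completed"; // agent finished: the block was spurious
    }
  }

  // A completed planning task supersedes a pending one with the same phase and category.
  const key = (t: Task) => `${t.phaseId}:${t.category}`;
  const completedPlanning = new Set(
    tasks.filter((t) => t.status === "completed" && PLANNING_CATEGORIES.has(t.category)).map(key),
  );
  for (const t of tasks) {
    if (t.status === "pending" && completedPlanning.has(key(t))) {
      t.status = "completed";
      t.summary = "Superseded by retry";
    }
  }

  // Hand off in_progress phases whose tasks are now all completed.
  for (const p of phases) {
    const own = tasks.filter((t) => t.phaseId === p.id);
    // Assumption: a phase with no tasks is left alone.
    if (p.status === "in_progress" && own.length > 0 && own.every((t) => t.status === "completed")) {
      onPhaseAllTasksDone(p.id);
    }
  }

  // Re-queue only the reset tasks that survived the duplicate check.
  return reset.filter((id) => tasks.some((t) => t.id === id && t.status === "pending"));
}

// Crash-and-retry leftover: one completed and one pending detail task in the same phase.
const demoPhases: Phase[] = [{ id: "phase-1", status: "in_progress" }];
const demoTasks: Task[] = [
  { id: "detail-a", phaseId: "phase-1", category: "detail", status: "completed", agentStatus: "stopped" },
  { id: "detail-b", phaseId: "phase-1", category: "detail", status: "pending" },
];
const requeued = recoverPhaseTasks(demoPhases, demoTasks, (id) => console.log(`phase done: ${id}`));
const b = demoTasks.find((t) => t.id === "detail-b");
console.log(b?.status, b?.summary); // completed Superseded by retry
```

In this sketch, rule order matters. The dead-agent reset runs before the duplicate check, so a reset planning task that already has a completed sibling is superseded instead of re-queued.
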

The `detailPhase` mutation in `architect.ts` also cleans up orphaned pending/in_progress detail tasks before creating new ones, preventing duplicates at the source.
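
The following sketches the cleanup-before-create idea, with a hypothetical in-memory store in place of the real repository layer. The `InMemoryTaskStore` class and the `createDetailTask` function are illustrative names, not the project's API. Deleting the orphans, rather than marking them some other way, is also an assumption:

```typescript
type Status = "pending" | "in_progress" | "blocked" | "completed";

interface PlanTask {
  id: string;
  phaseId: string;
  category: string;
  status: Status;
}

// Hypothetical store; stands in for whatever persistence detailPhase really uses.
class InMemoryTaskStore {
  private tasks: PlanTask[] = [];
  private nextId = 1;

  list(): readonly PlanTask[] {
    return this.tasks;
  }

  insert(task: Omit<PlanTask, "id">): PlanTask {
    const created: PlanTask = { id: `task-${this.nextId++}`, ...task };
    this.tasks.push(created);
    return created;
  }

  deleteMany(ids: ReadonlySet<string>): void {
    this.tasks = this.tasks.filter((t) => !ids.has(t.id));
  }
}

function createDetailTask(store: InMemoryTaskStore, phaseId: string): PlanTask {
  // Any unfinished detail task for this phase is assumed to be left over
  // from a crashed attempt; remove it so the new task is the only live one.
  const orphaned = new Set(
    store
      .list()
      .filter(
        (t) =>
          t.phaseId === phaseId &&
          t.category === "detail" &&
          (t.status === "pending" || t.status === "in_progress"),
      )
      .map((t) => t.id),
  );
  store.deleteMany(orphaned);
  return store.insert({ phaseId, category: "detail", status: "pending" });
}

const store = new InMemoryTaskStore();
store.insert({ phaseId: "phase-2", category: "detail", status: "pending" }); // other phase: untouched
createDetailTask(store, "phase-1"); // first attempt, which then crashes
createDetailTask(store, "phase-1"); // retry replaces the orphaned task
console.log(store.list().map((t) => `${t.id}@${t.phaseId}`).join(", ")); // task-1@phase-2, task-3@phase-1
```

Completed detail tasks and tasks in other phases are left alone. The sketch does not check whether an agent is still attached, so it assumes the caller only runs it once the previous attempt is known to be dead.
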

Manual retry via `retryBlockedTask()` resets `retryCount` to 0, giving the task a fresh set of automatic retries.

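
The following sketch of a retry budget shows why the reset matters. `MAX_AUTO_RETRIES`, `handleCrash`, and the rule that a task becomes `blocked` once its budget is spent are assumptions made for illustration. Of the `retryBlockedTask` behavior, only the `retryCount` reset comes from the text above. Setting the status back to `pending` is also assumed.

```typescript
const MAX_AUTO_RETRIES = 3; // assumed limit, not taken from the source

interface RetryableTask {
  id: string;
  status: "pending" | "in_progress" | "blocked" | "completed";
  retryCount: number;
}

// Automatic path (hypothetical): each agent crash consumes one retry,
// and the task is blocked once the budget is spent.
function handleCrash(task: RetryableTask): void {
  if (task.retryCount < MAX_AUTO_RETRIES) {
    task.retryCount += 1;
    task.status = "pending";
  } else {
    task.status = "blocked";
  }
}

// Manual path: resetting retryCount gives the task a fresh retry budget.
function retryBlockedTask(task: RetryableTask): void {
  if (task.status !== "blocked") return;
  task.retryCount = 0;
  task.status = "pending";
}

const task: RetryableTask = { id: "t1", status: "in_progress", retryCount: 0 };
for (let i = 0; i < 4; i++) handleCrash(task);
console.log(task.status, task.retryCount); // blocked 3
retryBlockedTask(task);
console.log(task.status, task.retryCount); // pending 0
```

Without the reset, a manually retried task would still have `retryCount` at the limit, and the next crash would block it again immediately.
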
### Coalesced Scheduling