fix: prevent stale duplicate planning tasks from blocking phase completion

Three fixes for phases getting stuck when a detail task crashes and is retried:

1. detailPhase mutation (architect.ts): clean up orphaned pending/in_progress detail tasks before creating new ones, preventing duplicates at the source
2. orchestrator recovery: detect and complete stale duplicate planning tasks (same category+phase, one completed, one pending)
3. ensureBranch: catch "already exists" TOCTOU race instead of blocking phase
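
A minimal sketch of the orchestrator recovery rule described above, for illustration only. The `Task` shape, the `PLANNING_CATEGORIES` set, and the `supersedeStaleDuplicates` helper are all hypothetical names, not taken from the codebase. Only the `detail` category and the "Superseded by retry" summary come from this commit.

```typescript
// Hypothetical task model; the real one is not shown in this commit.
type TaskStatus = "pending" | "in_progress" | "completed" | "blocked";

interface Task {
  id: string;
  phaseId: string;
  category: string;
  status: TaskStatus;
  summary?: string;
}

// Assumption: only `detail` is named in the commit; other planning
// categories may exist in the real code.
const PLANNING_CATEGORIES = new Set<string>(["detail"]);

// Returns a new task list in which every pending planning task that has a
// completed sibling (same phase, same category) is marked completed.
function supersedeStaleDuplicates(tasks: Task[]): Task[] {
  // Record each phase+category pair that already has a completed planning task.
  const completedKeys = new Set<string>();
  for (const t of tasks) {
    if (t.status === "completed" && PLANNING_CATEGORIES.has(t.category)) {
      completedKeys.add(`${t.phaseId}:${t.category}`);
    }
  }

  // Close out pending duplicates; leave every other task untouched.
  return tasks.map((t) =>
    t.status === "pending" && completedKeys.has(`${t.phaseId}:${t.category}`)
      ? { ...t, status: "completed" as const, summary: "Superseded by retry" }
      : t
  );
}
```

In this sketch, a pending `detail` task is left alone unless its phase already has a completed `detail` task, so a first attempt that is still waiting to run is never closed by mistake.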
2026-03-06 21:44:26 +01:00
parent ee8c7097db
commit 346d62ef8d
4 changed files with 40 additions and 3 deletions
--- a/docs/dispatch-events.md
+++ b/docs/dispatch-events.md
@@ -149,8 +149,11 @@ When an agent crashes (`agent:crashed` event), the orchestrator automatically re
 On server restart, `recoverDispatchQueues()` also recovers:
 - Stuck `in_progress` tasks whose agents are dead (status is not `running` or `waiting_for_input`) — reset to `pending` and re-queued
 - Erroneously `blocked` tasks whose agents completed successfully (status is `idle` or `stopped`) — marked `completed` so the phase can progress. This handles the legacy case where conflict resolution incorrectly blocked already-completed tasks.
+- Stale duplicate planning tasks — if a phase has both a completed and a pending task of the same planning category (e.g. two `detail` tasks from a crash-and-retry), the pending one is marked `completed` with summary "Superseded by retry"
 - Fully-completed `in_progress` phases — after task recovery, if all tasks in an `in_progress` phase are completed, triggers `handlePhaseAllTasksDone` to complete/review the phase

+The `detailPhase` mutation in `architect.ts` also cleans up orphaned pending/in_progress detail tasks before creating new ones, preventing duplicates at the source.
+
 Manual retry via `retryBlockedTask()` resets `retryCount` to 0, giving the task a fresh set of automatic retries.

 ### Coalesced Scheduling