fix: conflict resolution tasks now get dispatched instead of permanently blocking initiative

- Remove original task blocking in handleConflict (task is already completed by handleAgentStopped)
- Return created conflict task from handleConflict so orchestrator can queue it for dispatch
- Add dedup check to prevent duplicate resolution tasks on crash retries
- Queue conflict resolution task via dispatchManager in mergeTaskIntoPhase
- Add recovery for erroneously blocked tasks in recoverDispatchQueues
- Update tests and docs
This commit is contained in:
Lukas May
2026-03-06 20:37:29 +01:00
parent a41caa633b
commit d4a28713f6
7 changed files with 103 additions and 28 deletions

View File

@@ -117,6 +117,15 @@ InitiativeChangesRequestedEvent { initiativeId, phaseId, taskId }
| `agent:crashed` | Auto-retry crashed task up to `MAX_TASK_RETRIES` (3). Increments `retryCount`, resets status to `pending`, re-queues. Exceeding retries leaves task `in_progress` for manual intervention. |
| `task:completed` | Merge task branch (if branch exists), check phase completion, dispatch next queued task |
### Conflict Resolution → Dispatch Flow
When a task branch merge produces conflicts:
1. `mergeTaskIntoPhase()` detects conflicts from `branchManager.mergeBranch()`
2. Calls `conflictResolutionService.handleConflict()` which creates a "Resolve conflicts" task (with dedup — skips if an identical pending/in_progress resolution task already exists)
3. The original task is **not blocked** — it was already completed by `handleAgentStopped` before the merge attempt. The pending resolution task prevents premature phase completion.
4. Orchestrator queues the new conflict task via `dispatchManager.queue()`
5. `scheduleDispatch()` picks it up and assigns it to an idle agent
### Crash Recovery
When an agent crashes (`agent:crashed` event), the orchestrator automatically retries the task:
@@ -125,7 +134,9 @@ When an agent crashes (`agent:crashed` event), the orchestrator automatically re
3. If under limit: increments `retryCount`, resets task to `pending`, re-queues for dispatch
4. If over limit: logs warning, task stays `in_progress` for manual intervention
On server restart, `recoverDispatchQueues()` also recovers stuck `in_progress` tasks whose agents are dead (status is not `running` or `waiting_for_input`). These are reset to `pending` and re-queued.
On server restart, `recoverDispatchQueues()` also recovers:
- Stuck `in_progress` tasks whose agents are dead (status is not `running` or `waiting_for_input`) — reset to `pending` and re-queued
- Erroneously `blocked` tasks whose agents completed successfully (status is `idle` or `completed`) — marked `completed` so the phase can progress. This handles the legacy case where conflict resolution incorrectly blocked already-completed tasks.
Manual retry via `retryBlockedTask()` resets `retryCount` to 0, giving the task a fresh set of automatic retries.