fix: Prevent lost task completions after server restart

Three bugs causing empty phase diffs when server restarts during agent
execution:

1. Startup ordering race: reconcileAfterRestart() emitted agent:stopped
   before orchestrator registered listeners — events lost. Moved
   reconciliation to after orchestrator.start().

2. Stuck in_progress tasks: recoverDispatchQueues() only re-queued
   pending tasks. Added recovery for in_progress tasks whose agents
   are dead (not running/waiting_for_input).

3. Branch force-reset destroys work: git branch -f wiped commits when
   a second agent was dispatched for the same task. Now checks if the
   branch has commits beyond baseBranch before resetting.

Also adds:
- agent:crashed handler with auto-retry (MAX_TASK_RETRIES=3)
- retryCount column on tasks table + migration
- retryCount reset on manual retryBlockedTask()
This commit is contained in:
Lukas May
2026-03-06 12:19:59 +01:00
parent a69527b7d6
commit eac03862e3
9 changed files with 94 additions and 13 deletions

View File

@@ -51,6 +51,7 @@ All adapters use nanoid() for IDs, auto-manage timestamps, and use Drizzle's `.r
| status | text enum | 'pending' \| 'in_progress' \| 'completed' \| 'blocked' |
| order | integer | default 0 |
| summary | text nullable | Agent result summary — propagated to dependent tasks as context |
| retryCount | integer NOT NULL | default 0, incremented on agent crash auto-retry, reset on manual retry |
| createdAt, updatedAt | integer/timestamp | |
### task_dependencies