refactor: Replace file-count task sizing with lines-changed heuristic

Anchor on ~150 lines changed as the sweet spot, based on SWE-bench Pro data (tasks there average 107 lines across 4.1 files, and the best agents reach 46% success). The old rules used file count as the primary proxy, which correlates poorly with task difficulty compared to lines changed.
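
For illustration only, the new sizing rules could be expressed as a small classifier. This is a sketch, not code from this commit: `classifyTaskSize`, `TaskEstimate`, and the `TaskSize` labels are hypothetical names, and the handling of boundary values (exactly 300 lines, exactly 5 files) is an assumption, since the prompt text states the thresholds only approximately.

```typescript
// Hypothetical helper mirroring the Task Sizing rules in the prompt.
// Names and exact boundary handling are illustrative assumptions.
type TaskSize = "too-small" | "sweet-spot" | "risky" | "too-big";

interface TaskEstimate {
  linesChanged: number; // expected lines added plus removed
  filesTouched: number; // expected number of files modified
}

function classifyTaskSize({ linesChanged, filesTouched }: TaskEstimate): TaskSize {
  // Assumption: exactly 300 lines counts as too big; exactly 5 files counts as risky.
  if (linesChanged >= 300 || filesTouched > 5) return "too-big";
  if (linesChanged >= 150 || filesTouched >= 4) return "risky";
  if (linesChanged < 20) return "too-small";
  return "sweet-spot";
}
```

Under these assumptions, an estimate of 100 lines across 2 files is classified as `"sweet-spot"`, while 200 lines across 2 files is `"risky"` because the line count dominates the file count.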
2026-02-18 16:54:10 +09:00
parent 7354582d69
commit c04e6d7778
1 changed file with 7 additions and 4 deletions
--- a/src/agent/prompts/detail.ts
+++ b/src/agent/prompts/detail.ts
@@ -65,10 +65,13 @@ If two tasks need to modify the same file or need the functionality another task

 ## Task Sizing

- **1-5 files**: Good task size
- **7+ files**: Too big — split into smaller tasks
- **1 sentence description**: Too small — merge with related work or add more detail
- **500+ words**: Probably overspecified — simplify or split
+Size tasks by expected lines changed, which predicts difficulty far better than file count.
+
+- **Under ~150 lines changed across 1-3 files**: Sweet spot. High confidence an agent completes this in one shot.
+- **~150-300 lines or 4-5 files**: Risky. Acceptable only if the work is highly mechanical (e.g., repetitive migrations, boilerplate) and the spec is very precise.
+- **300+ lines or 6+ files**: Too big. Split it; agent success drops sharply at this scale.
+- **1-sentence description**: Too vague. Merge with related work or add concrete detail.
+- **Under ~20 lines**: Too small. Merge with a related task to avoid per-task overhead.

 ## Checkpoint Tasks