refactor: Replace file-count task sizing with lines-changed heuristic

Anchor on ~150 lines changed as the sweet spot, based on SWE-bench Pro
data (tasks averaging 107 lines changed across 4.1 files: 46% success
for the best agents). The old rules used file count as the primary
proxy, which correlates poorly with task difficulty compared to lines
changed.
Lukas May
2026-02-18 16:54:10 +09:00
parent 7354582d69
commit c04e6d7778

@@ -65,10 +65,13 @@ If two tasks need to modify the same file or need the functionality another task
## Task Sizing
-- **1-5 files**: Good task size
-- **7+ files**: Too big — split into smaller tasks
-- **1 sentence description**: Too small — merge with related work or add more detail
-- **500+ words**: Probably overspecified — simplify or split
+Size tasks by expected lines changed — this predicts difficulty far more than file count.
+- **Under ~150 lines changed across 1-3 files**: Sweet spot. High confidence an agent completes this in one shot.
+- **~150-300 lines or 4-5 files**: Risky. Only if the work is highly mechanical (e.g., repetitive migrations, boilerplate). Needs very precise specs.
+- **300+ lines or 5+ files**: Too big — split it. Agent success drops sharply at this scale.
+- **1 sentence description**: Too vague — merge with related work or add concrete detail.
+- **Under ~20 lines**: Too small — merge with a related task to avoid per-task overhead.
## Checkpoint Tasks
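
The lines-changed thresholds introduced in this hunk can be read as a small classifier. The following is only a sketch of one possible reading, not part of the commit. The function name `classify_task` and its return labels are hypothetical. The guideline's "4-5 files" and "5+ files" ranges overlap at exactly five files; this sketch assumes the stricter "too big" reading there. It also assumes the approximate thresholds (~20, ~150, 300) are hard cutoffs.

```python
def classify_task(lines_changed: int, files_touched: int) -> str:
    """Return a rough size label for a planned task.

    Hypothetical helper: thresholds come from the Task Sizing rules,
    treated here as hard cutoffs (an assumption; the rules say "~").
    """
    # 300+ lines or 5+ files: too big, split it.
    # Assumption: exactly 5 files falls here, not under "risky".
    if lines_changed >= 300 or files_touched >= 5:
        return "too big"
    # ~150-300 lines or 4 files: risky unless highly mechanical.
    if lines_changed >= 150 or files_touched >= 4:
        return "risky"
    # Under ~20 lines: too small, merge with a related task.
    if lines_changed < 20:
        return "too small"
    # Under ~150 lines across 1-3 files: the sweet spot.
    return "sweet spot"


if __name__ == "__main__":
    for lines, files in [(107, 3), (200, 2), (120, 4), (350, 1), (10, 1)]:
        print(lines, files, classify_task(lines, files))
```

Checking "too big" and "risky" before "too small" means a tiny change spread over many files is still flagged for its file count rather than merged into another task.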