refactor: Replace file-count task sizing with lines-changed heuristic
Anchor on ~150 lines changed as the sweet spot, based on SWE-bench Pro data (107 lines changed across 4.1 files corresponds to 46% success for the best agents). The old rules used file count as the primary proxy, which correlates poorly with task difficulty compared to lines changed.
@@ -65,10 +65,13 @@ If two tasks need to modify the same file or need the functionality another task
 
 ## Task Sizing
 
-- **1-5 files**: Good task size
-- **7+ files**: Too big — split into smaller tasks
-- **1 sentence description**: Too small — merge with related work or add more detail
-- **500+ words**: Probably overspecified — simplify or split
+Size tasks by expected lines changed — this predicts difficulty far more than file count.
+
+- **Under ~150 lines changed across 1-3 files**: Sweet spot. High confidence an agent completes this in one shot.
+- **~150-300 lines or 4-5 files**: Risky. Only if the work is highly mechanical (e.g., repetitive migrations, boilerplate). Needs very precise specs.
+- **300+ lines or 5+ files**: Too big — split it. Agent success drops sharply at this scale.
+- **1 sentence description**: Too vague — merge with related work or add concrete detail.
+- **Under ~20 lines**: Too small — merge with a related task to avoid per-task overhead.
 
 ## Checkpoint Tasks
 
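As a rough illustration of the new sizing rules in this diff, the thresholds could be encoded as a small classifier. This is only a sketch and not part of the commit: `TaskEstimate`, `classify_task`, and the verdict strings are hypothetical names. Two boundary choices are assumptions, because the rules leave them open: exactly 5 files is treated as risky (the "4-5 files" and "5+ files" ranges overlap), and the too-big check runs before the too-small check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskEstimate:
    """Hypothetical container for a task's expected footprint."""
    lines_changed: int
    files_touched: int


def classify_task(est: TaskEstimate) -> str:
    """Map an estimate onto the sizing buckets from the new rules.

    Assumption: exactly 5 files counts as "risky"; more than 5 is "too_big".
    """
    # 300+ lines or more than 5 files: split the task.
    if est.lines_changed >= 300 or est.files_touched > 5:
        return "too_big"
    # Under ~20 lines: merge with a related task.
    if est.lines_changed < 20:
        return "too_small"
    # ~150-300 lines or 4-5 files: acceptable only for mechanical work.
    if est.lines_changed >= 150 or est.files_touched >= 4:
        return "risky"
    # Under ~150 lines across 1-3 files.
    return "sweet_spot"


if __name__ == "__main__":
    for est in (TaskEstimate(100, 2), TaskEstimate(200, 2), TaskEstimate(350, 2)):
        print(est, classify_task(est))
```

A "risky" verdict would still need the human judgment the rule describes: accept the task only if the work is highly mechanical and the spec is precise. The sketch covers only the size-based bullets; the "1 sentence description" rule concerns spec detail and is not modeled here.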