From c04e6d7778d5838d9e155f013c67654332ac3ed2 Mon Sep 17 00:00:00 2001
From: Lukas May
Date: Wed, 18 Feb 2026 16:54:10 +0900
Subject: [PATCH] refactor: Replace file-count task sizing with lines-changed heuristic

Anchor on ~150 lines changed as the sweet spot based on SWE-bench Pro data
(107 lines / 4.1 files = 46% success for best agents). Old rules used file
count as the primary proxy which correlates poorly with task difficulty
compared to lines changed.
---
 src/agent/prompts/detail.ts | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/src/agent/prompts/detail.ts b/src/agent/prompts/detail.ts
index d5865d2..93835ba 100644
--- a/src/agent/prompts/detail.ts
+++ b/src/agent/prompts/detail.ts
@@ -65,10 +65,13 @@ If two tasks need to modify the same file or need the functionality another task
 
 ## Task Sizing
 
-- **1-5 files**: Good task size
-- **7+ files**: Too big — split into smaller tasks
-- **1 sentence description**: Too small — merge with related work or add more detail
-- **500+ words**: Probably overspecified — simplify or split
+Size tasks by expected lines changed — this predicts difficulty far more than file count.
+
+- **Under ~150 lines changed across 1-3 files**: Sweet spot. High confidence an agent completes this in one shot.
+- **~150-300 lines or 4-5 files**: Risky. Only if the work is highly mechanical (e.g., repetitive migrations, boilerplate). Needs very precise specs.
+- **300+ lines or 5+ files**: Too big — split it. Agent success drops sharply at this scale.
+- **1 sentence description**: Too vague — merge with related work or add concrete detail.
+- **Under ~20 lines**: Too small — merge with a related task to avoid per-task overhead.
 
 ## Checkpoint Tasks
 
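
Reviewer note: the new sizing tiers can be read as a small decision function, which makes the boundary cases easier to see. The sketch below is illustrative only and is not part of `src/agent/prompts/detail.ts`; the names `TaskSize` and `classifyTaskSize` are hypothetical. It makes two assumptions the prompt text does not settle: a task touching exactly 5 files is treated as risky (the prompt lists 5 under both "4-5 files" and "5+ files"), and the too-big and risky checks take precedence over the "under ~20 lines" check. The prompt's thresholds are approximate ("~"), so the hard cutoffs here are a simplification.

```typescript
// Hypothetical helper mirroring the Task Sizing tiers in the prompt.
type TaskSize = "too-small" | "sweet-spot" | "risky" | "too-big";

function classifyTaskSize(linesChanged: number, filesTouched: number): TaskSize {
  // Assumption: exactly 5 files is "risky"; only more than 5 is "too-big".
  if (linesChanged >= 300 || filesTouched > 5) return "too-big";
  // ~150-300 lines or 4-5 files: only suitable for highly mechanical work.
  if (linesChanged >= 150 || filesTouched >= 4) return "risky";
  // Under ~20 lines: merge with a related task to avoid per-task overhead.
  if (linesChanged < 20) return "too-small";
  // Under ~150 lines across 1-3 files.
  return "sweet-spot";
}
```

For example, `classifyTaskSize(100, 5)` returns `"risky"` under these assumptions; if the intent is that 5 files is already too big, the first comparison would become `filesTouched >= 5`.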