From 09a388b490f460f3963b93c29b6bf6dae3dc36d1 Mon Sep 17 00:00:00 2001
From: Lukas May <lukas.may@carealytix.com>
Date: Wed, 18 Feb 2026 17:19:48 +0900
Subject: [PATCH] refactor: Enforce mandatory test specs in detail prompt, add
 testing strategy to plan prompt

Detail: Replace the vague "How to verify" requirement with a mandatory test
specification (file path, scenarios, run command) for execute-category tasks,
and add a separate verification command. Make file paths required rather than
"if possible". Update the good-task example to demonstrate the new format. Add
a Definition of Done checklist.

Plan: Add Testing Strategy section requiring tests within each implementation phase
instead of trailing test phases. Add Definition of Done checklist.
---
 src/agent/prompts/detail.ts | 45 ++++++++++++++++++++++++++++++-------
 src/agent/prompts/plan.ts   | 24 +++++++++++++++++++-
 2 files changed, 60 insertions(+), 9 deletions(-)
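
Reviewer note: the sketch below is not part of this patch. It illustrates one
way the new test-specification requirement could be checked mechanically
against a task body. The names checkTestSpec and isCompleteTestSpec are
hypothetical, and the "Test file:", "Test scenarios:", and "Verify:" labels
are assumed to match the good-task example in detail.ts.

```typescript
// Hypothetical helper, not part of this patch: checks a task body for the
// test-specification fields that the detail prompt now requires.
interface TestSpecCheck {
  hasTestFile: boolean;
  hasScenarios: boolean;
  hasVerifyCommand: boolean;
}

function checkTestSpec(body: string): TestSpecCheck {
  const lines = body.split("\n").map((line) => line.trim());
  const scenarioHeader = lines.findIndex((line) => /^Test scenarios:/i.test(line));
  // The scenario list counts only if a bullet directly follows the header.
  const hasScenarios =
    scenarioHeader !== -1 && (lines[scenarioHeader + 1] ?? "").startsWith("- ");
  return {
    hasTestFile: lines.some((line) => /^Test file:\s*\S+/i.test(line)),
    hasScenarios,
    hasVerifyCommand: lines.some((line) => /^Verify:\s*\S+/i.test(line)),
  };
}

function isCompleteTestSpec(body: string): boolean {
  const check = checkTestSpec(body);
  return check.hasTestFile && check.hasScenarios && check.hasVerifyCommand;
}
```

A check like this would only catch missing labels; whether the listed
scenarios are adequate still needs the specificity test from the prompt.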

diff --git a/src/agent/prompts/detail.ts b/src/agent/prompts/detail.ts
index 93835ba..b48a4f6 100644
--- a/src/agent/prompts/detail.ts
+++ b/src/agent/prompts/detail.ts
@@ -29,9 +29,14 @@ ${ID_GENERATION}
 Before finalizing each task, ask: **"Could a worker agent execute this without clarifying questions?"**
 
 Every task body MUST include:
-1. **What to create or modify** — if possible, specific file paths (e.g., \`src/db/schema.ts\`, \`src/api/routes/users.ts\`)
-2. **Expected behavior** — what the code should do, with concrete examples or edge cases
-3. **How to verify** — specific test to run, endpoint to hit, or behavior to check
+1. **What to create or modify** — specific file paths (e.g., \`src/db/schema.ts\`, \`src/api/routes/users.ts\`)
+2. **Expected behavior** — what the code should do, with concrete examples, inputs/outputs, and edge cases
+3. **Test specification** — REQUIRED for every execute-category task:
+   - Test file path (e.g., \`src/api/validators/user.test.ts\`)
+   - Test scenarios to cover (happy path, error cases, edge cases)
+   - Run command (e.g., \`npm test -- src/api/validators/user.test.ts\`)
+   Non-execute tasks (research, discuss, etc.) may omit this.
+4. **Verification command** — the exact command to confirm the task is complete (e.g., \`npm test -- path/to/test\`)
 
 **Bad task:**
 \`\`\`
@@ -42,13 +47,26 @@ Body: Add validation to the user model. Make sure all fields are validated prope
 **Good task:**
 \`\`\`
 Title: Add Zod validation schema for user creation
-Body: Create src/api/validators/user.ts with a Zod schema for CreateUserInput:
+Body: Create \`src/api/validators/user.ts\` with a Zod schema for CreateUserInput:
 - email: valid email format, lowercase, max 255 chars
 - name: string, 1-100 chars, trimmed
 - password: min 8 chars, must contain uppercase + number
-Export the schema and inferred type. Add unit tests in src/api/validators/user.test.ts
-covering: valid input, missing fields, invalid email, short password.
-Verify: npm test -- src/api/validators/user.test.ts
+
+Export the schema and inferred type.
+
+Test file: \`src/api/validators/user.test.ts\`
+Test scenarios:
+- Valid input passes validation
+- Missing required fields rejected
+- Invalid email format rejected
+- Password too short / missing uppercase / missing number rejected
+- Whitespace-only name rejected
+
+Files modified:
+- src/api/validators/user.ts (create)
+- src/api/validators/user.test.ts (create)
+
+Verify: \`npm test -- src/api/validators/user.test.ts\`
 \`\`\`
 
 ## File Ownership Constraints
@@ -94,5 +112,16 @@ Use checkpoint types for work that requires human judgment:
 - If a task in context/tasks/ already covers the same work (even under a different name), do NOT create a duplicate
 - Pages contain requirements — use them to create detailed task descriptions
 - DO NOT create tasks that overlap with existing tasks in other phases
-${CONTEXT_MANAGEMENT}`;
+${CONTEXT_MANAGEMENT}
+
+## Definition of Done
+
+Before writing signal.json with status "done", verify:
+
+- [ ] Every execute-category task has a test file path, test scenarios, and run command
+- [ ] Every task has a file ownership list
+- [ ] No two parallel tasks modify the same files
+- [ ] Every task passes the specificity test (a worker agent can execute without clarifying questions)
+- [ ] Tasks are sized within the ~20-300 lines-changed range
+- [ ] Context files were read — no duplicate work with existing tasks`;
 }
diff --git a/src/agent/prompts/plan.ts b/src/agent/prompts/plan.ts
index 3b3c96f..c1a54da 100644
--- a/src/agent/prompts/plan.ts
+++ b/src/agent/prompts/plan.ts
@@ -26,6 +26,17 @@ ${ID_GENERATION}
 - Size: 2-5 tasks each (not too big, not too small) - if the work is independent enough and the tasks are very similar you can also create more tasks for the phase
 - Clear, action-oriented names (describe what gets built, not how)
 
+## Testing Strategy
+
+Tests are not a separate phase — they're part of every phase.
+
+- Do NOT create standalone "write tests" or "integration testing" phases at the end. Tests must be written alongside implementation within each phase.
+- Foundation phases should include test infrastructure setup if the project needs it (test config, fixtures, utilities).
+- Each phase description should mention what aspects will be tested as part of that phase's work.
+
+**Bad plan**: Phase 1: Database → Phase 2: API → Phase 3: Frontend → Phase 4: Tests
+**Good plan**: Phase 1: Database + schema tests → Phase 2: API + endpoint tests → Phase 3: Frontend + component tests
+
 ## Dependency Graph
 
 Every plan MUST include an explicit dependency graph in the frontmatter in the output. For each phase, list:
@@ -74,5 +85,16 @@ Reference specific files and directories from the codebase when possible.
 - Group related work together
 - Make dependencies explicit using phase IDs
 - Each task should be completable in one session
-${CONTEXT_MANAGEMENT}`;
+${CONTEXT_MANAGEMENT}
+
+## Definition of Done
+
+Before writing signal.json with status "done", verify:
+
+- [ ] Every phase has explicit dependencies (or explicitly has none)
+- [ ] No fully-serial chain without justification — most real work has parallelizable tracks
+- [ ] Parallel phases do not modify the same files
+- [ ] Each phase description is specific enough for a detail agent to break into tasks without clarifying questions
+- [ ] Testing is part of each implementation phase, not a separate trailing phase
+- [ ] Existing context was accounted for — no planned work that's already covered`;
 }