From 09a388b490f460f3963b93c29b6bf6dae3dc36d1 Mon Sep 17 00:00:00 2001
From: Lukas May
Date: Wed, 18 Feb 2026 17:19:48 +0900
Subject: [PATCH] refactor: Enforce mandatory test specs in detail prompt, add testing strategy to plan prompt

Detail: Replace vague "how to verify" requirement with mandatory test
specification (file path, scenarios, run command) for execute-category
tasks. Update good-task example to demonstrate the new format. Add
Definition of Done checklist.

Plan: Add Testing Strategy section requiring tests within each
implementation phase instead of trailing test phases. Add Definition
of Done checklist.
---
 src/agent/prompts/detail.ts | 45 ++++++++++++++++++++++++++++++-------
 src/agent/prompts/plan.ts   | 24 +++++++++++++++++++-
 2 files changed, 60 insertions(+), 9 deletions(-)

diff --git a/src/agent/prompts/detail.ts b/src/agent/prompts/detail.ts
index 93835ba..b48a4f6 100644
--- a/src/agent/prompts/detail.ts
+++ b/src/agent/prompts/detail.ts
@@ -29,9 +29,14 @@ ${ID_GENERATION}
 Before finalizing each task, ask: **"Could a worker agent execute this without clarifying questions?"**
 
 Every task body MUST include:
-1. **What to create or modify** — if possible, specific file paths (e.g., \`src/db/schema.ts\`, \`src/api/routes/users.ts\`)
-2. **Expected behavior** — what the code should do, with concrete examples or edge cases
-3. **How to verify** — specific test to run, endpoint to hit, or behavior to check
+1. **What to create or modify** — specific file paths (e.g., \`src/db/schema.ts\`, \`src/api/routes/users.ts\`)
+2. **Expected behavior** — what the code should do, with concrete examples, inputs/outputs, and edge cases
+3. **Test specification** — REQUIRED for every execute-category task:
+   - Test file path (e.g., \`src/api/validators/user.test.ts\`)
+   - Test scenarios to cover (happy path, error cases, edge cases)
+   - Run command (e.g., \`npm test -- src/api/validators/user.test.ts\`)
+   Non-execute tasks (research, discuss, etc.) may omit this.
+4. **Verification command** — the exact command to confirm the task is complete (e.g., \`npm test -- path/to/test\`)
 
 **Bad task:**
 \`\`\`
@@ -42,13 +47,26 @@ Body: Add validation to the user model. Make sure all fields are validated prope
 **Good task:**
 \`\`\`
 Title: Add Zod validation schema for user creation
-Body: Create src/api/validators/user.ts with a Zod schema for CreateUserInput:
+Body: Create \`src/api/validators/user.ts\` with a Zod schema for CreateUserInput:
 - email: valid email format, lowercase, max 255 chars
 - name: string, 1-100 chars, trimmed
 - password: min 8 chars, must contain uppercase + number
-Export the schema and inferred type. Add unit tests in src/api/validators/user.test.ts
-covering: valid input, missing fields, invalid email, short password.
-Verify: npm test -- src/api/validators/user.test.ts
+
+Export the schema and inferred type.
+
+Test file: \`src/api/validators/user.test.ts\`
+Test scenarios:
+- Valid input passes validation
+- Missing required fields rejected
+- Invalid email format rejected
+- Password too short / missing uppercase / missing number rejected
+- Whitespace-only name rejected
+
+Files modified:
+- src/api/validators/user.ts (create)
+- src/api/validators/user.test.ts (create)
+
+Verify: \`npm test -- src/api/validators/user.test.ts\`
 \`\`\`
 
 ## File Ownership Constraints
@@ -94,5 +112,16 @@ Use checkpoint types for work that requires human judgment:
 - If a task in context/tasks/ already covers the same work (even under a different name), do NOT create a duplicate
 - Pages contain requirements — use them to create detailed task descriptions
 - DO NOT create tasks that overlap with existing tasks in other phases
-${CONTEXT_MANAGEMENT}`;
+${CONTEXT_MANAGEMENT}
+
+## Definition of Done
+
+Before writing signal.json with status "done", verify:
+
+- [ ] Every execute-category task has a test file path and run command
+- [ ] Every task has a file ownership list
+- [ ] No two parallel tasks modify the same files
+- [ ] Every task passes the specificity test (a worker agent can execute without clarifying questions)
+- [ ] Tasks are sized within the ~20-300 lines-changed range
+- [ ] Context files were read — no duplicate work with existing tasks`;
 }
diff --git a/src/agent/prompts/plan.ts b/src/agent/prompts/plan.ts
index 3b3c96f..c1a54da 100644
--- a/src/agent/prompts/plan.ts
+++ b/src/agent/prompts/plan.ts
@@ -26,6 +26,17 @@ ${ID_GENERATION}
 - Size: 2-5 tasks each (not too big, not too small) - if the work is independent enough and the tasks are very similar you can also create more tasks for the phase
 - Clear, action-oriented names (describe what gets built, not how)
 
+## Testing Strategy
+
+Tests are not a separate phase — they're part of every phase.
+
+- Do NOT create standalone "write tests" or "integration testing" phases at the end. Tests must be written alongside implementation within each phase.
+- Foundation phases should include test infrastructure setup if the project needs it (test config, fixtures, utilities).
+- Each phase description should mention what aspects will be tested as part of that phase's work.
+
+**Bad plan**: Phase 1: Database → Phase 2: API → Phase 3: Frontend → Phase 4: Tests
+**Good plan**: Phase 1: Database + schema tests → Phase 2: API + endpoint tests → Phase 3: Frontend + component tests
+
 ## Dependency Graph
 
 Every plan MUST include an explicit dependency graph in the frontmatter in the output. For each phase, list:
@@ -74,5 +85,16 @@ Reference specific files and directories from the codebase when possible.
 - Group related work together
 - Make dependencies explicit using phase IDs
 - Each task should be completable in one session
-${CONTEXT_MANAGEMENT}`;
+${CONTEXT_MANAGEMENT}
+
+## Definition of Done
+
+Before writing signal.json with status "done", verify:
+
+- [ ] Every phase has explicit dependencies (or explicitly has none)
+- [ ] No fully-serial chain without justification — most real work has parallelizable tracks
+- [ ] Parallel phases do not modify the same files
+- [ ] Each phase description is specific enough for a detail agent to break into tasks without clarifying questions
+- [ ] Testing is part of each implementation phase, not a separate trailing phase
+- [ ] Existing context was accounted for — no planned work that's already covered`;
 }
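
Reviewer note: three items on the detail prompt's Definition of Done checklist are mechanical enough to check in code rather than by prompt alone. The sketch below is only an illustration of that idea. The `TaskSpec` shape, its field names, the category values, and `checkDefinitionOfDone` are all hypothetical and are not part of this patch or the repository, and the check assumes every task passed in runs in parallel with the others.

```typescript
// Hypothetical task shape. The patch only changes prompt text and defines no such type.
interface TaskSpec {
  title: string;
  category: "execute" | "research" | "discuss"; // assumed category names
  files: string[];      // file ownership list
  testFile?: string;    // e.g. src/api/validators/user.test.ts
  runCommand?: string;  // e.g. npm test -- src/api/validators/user.test.ts
}

// Returns one message per violated checklist item; an empty array means
// the three mechanical checks passed.
function checkDefinitionOfDone(tasks: TaskSpec[]): string[] {
  const problems: string[] = [];
  const owners = new Map<string, string>(); // file path -> title of first task claiming it

  for (const task of tasks) {
    // "Every execute-category task has a test file path and run command"
    if (task.category === "execute" && (!task.testFile || !task.runCommand)) {
      problems.push(`${task.title}: missing test file path or run command`);
    }
    // "Every task has a file ownership list"
    if (task.files.length === 0) {
      problems.push(`${task.title}: missing file ownership list`);
    }
    // "No two parallel tasks modify the same files"
    // (assumes all tasks in the array are parallel)
    for (const file of task.files) {
      const other = owners.get(file);
      if (other !== undefined) {
        problems.push(`${task.title}: ${file} is also modified by ${other}`);
      } else {
        owners.set(file, task.title);
      }
    }
  }
  return problems;
}
```

The remaining checklist items (specificity, sizing, context awareness) need judgment and would stay with the agent.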