From b5509232f6b3334abdc42a5daf074795836129aa Mon Sep 17 00:00:00 2001
From: Lukas May
Date: Wed, 18 Feb 2026 17:19:53 +0900
Subject: [PATCH] refactor: Add testability focus and definition-of-done checklists to discuss/refine prompts

Discuss prompt: add Testability & Verification question category, require verification criteria for behavioral decisions, add definition-of-done checklist.

Refine prompt: strengthen unverifiable-requirements check to demand testable acceptance criteria with inputs/outputs, extend missing-edge-cases to frame as testable scenarios, add definition-of-done checklist.
---
 src/agent/prompts/discuss.ts | 50 +++++++++++++++++++++++++++++++++++-
 src/agent/prompts/refine.ts  | 24 ++++++++++++++++-
 2 files changed, 72 insertions(+), 2 deletions(-)

diff --git a/src/agent/prompts/discuss.ts b/src/agent/prompts/discuss.ts
index 147ddf6..b5555a7 100644
--- a/src/agent/prompts/discuss.ts
+++ b/src/agent/prompts/discuss.ts
@@ -20,15 +20,63 @@ Write decisions to \`.cw/output/decisions/{id}.md\`:
 
 ${ID_GENERATION}
 
+## Goal-Backward Analysis
+
+Before asking questions, work backward from the goal:
+
+1. **Observable outcome**: What will the user see/do when this is done?
+2. **Artifacts needed**: What code, config, or infra produces that outcome?
+3. **Wiring**: How do the artifacts connect (data flow, API contracts, events)?
+4. **Failure points**: What can go wrong? What are the edge cases?
+
+Only ask questions that this analysis cannot answer from the codebase alone.
+
+## Question Quality
+
+**Bad question**: "How should we handle errors?"
+**Good question**: "The current API returns HTTP 500 for all errors. Should we: (a) add specific error codes (400, 404, 409) with JSON error bodies, (b) keep 500 but add error details in the response body, or (c) add a custom error middleware that maps domain errors to HTTP codes?"
+
+Every question must:
+- Reference something concrete (file, pattern, constraint)
+- Offer specific options when choices are clear
+- Explain what depends on the answer
+
+## Decision Quality
+
+**Bad decision**: "We'll use a database for storage"
+**Good decision**: "Use SQLite via better-sqlite3 with drizzle-orm. Schema in src/db/schema.ts, migrations via drizzle-kit. Chosen over PostgreSQL because: single-node deployment, no external deps, existing pattern in the codebase."
+
+Every decision must include: what, why, and what alternatives were rejected.
+
+When the decision affects observable behavior, also include: how to verify it works (acceptance criteria, test approach, or measurable outcome).
+
+## Read Before Asking
+
+Before asking ANY question, check if the codebase already answers it:
+- Read existing code patterns, config files, package.json
+- Check if similar problems were already solved elsewhere
+- Don't ask "what framework should we use?" if the project already uses one
+
 ## Question Categories
 - **User Journeys**: Main workflows, success/failure paths, edge cases
 - **Technical Constraints**: Patterns to follow, things to avoid, reference code
 - **Data & Validation**: Data structures, validation rules, constraints
 - **Integration Points**: External systems, APIs, error handling
+- **Testability & Verification**: How will we verify each feature works? What are measurable acceptance criteria? What test strategies apply (unit, integration, e2e)?
 
 ## Rules
 - Ask 2-4 questions at a time, not more
 - Provide options when choices are clear
 - Capture every decision with rationale
-- Don't proceed until ambiguities are resolved`;
+- Don't proceed until ambiguities are resolved
+
+## Definition of Done
+
+Before writing signal.json with status "done", verify:
+
+- [ ] Every question references something concrete (file, pattern, constraint)
+- [ ] Every question offers specific options when choices are clear
+- [ ] Every decision includes what, why, and rejected alternatives
+- [ ] Behavioral decisions include verification criteria
+- [ ] The codebase was checked before asking — no questions the code already answers`;
 }
diff --git a/src/agent/prompts/refine.ts b/src/agent/prompts/refine.ts
index d69e684..c0f0d53 100644
--- a/src/agent/prompts/refine.ts
+++ b/src/agent/prompts/refine.ts
@@ -18,11 +18,33 @@ Write one file per modified page to \`.cw/output/pages/{pageId}.md\`:
 - Frontmatter: \`title\`, \`summary\` (what changed and why)
 - Body: Full new markdown content for the page (replaces entire page body)
 
+## What to Improve (priority order)
+
+1. **Ambiguity**: Requirements that could be interpreted multiple ways → make them specific
+2. **Missing details**: Gaps that would force an agent to guess → fill them with concrete decisions
+3. **Contradictions**: Statements that conflict with each other or with existing code → resolve them
+4. **Unverifiable requirements**: "Make it fast", "Handle errors properly" → add testable acceptance criteria with specific inputs, expected outputs, and verification commands where possible. "Response time under 200ms" is better than "make it fast", but "GET /api/users with 1000 records returns in under 200ms (verify: \`npm run bench -- api/users\`)" is what a worker agent actually needs.
+5. **Missing edge cases**: Happy path only → add error states, empty states, boundary conditions, concurrent access. Frame these as testable scenarios: "When the cart is empty and the user clicks checkout, the system should display 'Your cart is empty' and disable the payment button."
+
+Do NOT refine for style, grammar, or formatting unless it genuinely hurts clarity. A rough but precise requirement is better than a polished but vague one.
+
+If all pages are already clear and actionable, signal done with no output files. Don't refine for the sake of refining.
+
 ## Rules
 - Ask 2-4 questions at a time if you need clarification
 - Only propose changes for pages that genuinely need improvement
 - Each output page's body is the FULL new content (not a diff)
 - Preserve [[page:\$id|title]] cross-references in your output
 - Focus on clarity, completeness, and consistency
-- Do not invent new page IDs — only reference existing ones from .cw/input/pages/`;
+- Do not invent new page IDs — only reference existing ones from .cw/input/pages/
+
+## Definition of Done
+
+Before writing signal.json with status "done", verify:
+
+- [ ] Only pages with genuine clarity problems were modified — no style-only changes
+- [ ] All [[page:\$id|title]] cross-references are preserved
+- [ ] Ambiguous requirements now have specific, testable acceptance criteria
+- [ ] Each modified page's summary accurately describes what changed and why
+- [ ] If no pages needed improvement, signal done with no output files — don't refine for the sake of refining`;
 }
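
Two of the refine prompt's rules, preserving `[[page:$id|title]]` cross-references and not inventing new page IDs, can also be checked mechanically next to the definition-of-done checklist. The following is a minimal TypeScript sketch and is not part of this patch. It assumes references always use the literal `[[page:<id>|<title>]]` form shown in the prompt and compares ids only (treating a changed title as still "preserved" is an assumption). The helper names `extractPageRefs`, `droppedPageRefs` and `unknownPageRefs` are hypothetical.

```typescript
// Hypothetical helpers, not part of this patch. Assumes cross-references use
// the literal [[page:<id>|<title>]] form from the refine prompt; only ids are
// compared, titles are ignored.

/** Collect every page id referenced as [[page:<id>|<title>]] in a markdown body. */
function extractPageRefs(markdown: string): Set<string> {
  // Fresh regex per call so the global flag's lastIndex state is never shared.
  const pattern = /\[\[page:([^|\]]+)\|[^\]]*\]\]/g;
  const ids = new Set<string>();
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    ids.add(match[1]);
  }
  return ids;
}

/** Ids referenced in the original page body that the refined body no longer references. */
function droppedPageRefs(original: string, refined: string): string[] {
  const kept = extractPageRefs(refined);
  return Array.from(extractPageRefs(original)).filter((id) => !kept.has(id));
}

/** Ids referenced in the refined body that are not among the known input page ids. */
function unknownPageRefs(refined: string, knownIds: Set<string>): string[] {
  return Array.from(extractPageRefs(refined)).filter((id) => !knownIds.has(id));
}

// Usage with made-up page ids:
const before = "See [[page:auth-flow|Auth flow]] and [[page:db|Database]].";
const after = "See [[page:auth-flow|Auth flow]] and [[page:cache|Cache]].";
console.log(droppedPageRefs(before, after)); // [ 'db' ]
console.log(unknownPageRefs(after, new Set(["auth-flow", "db"]))); // [ 'cache' ]
```

In practice the known ids would come from the files in `.cw/input/pages/`. How they are read is left open here, since the patch does not show that code.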