refactor: Add testability focus and definition-of-done checklists to discuss/refine prompts

Discuss prompt: add Testability & Verification question category, require
verification criteria for behavioral decisions, add definition-of-done checklist.

Refine prompt: strengthen unverifiable-requirements check to demand testable
acceptance criteria with inputs/outputs, extend missing-edge-cases to frame as
testable scenarios, add definition-of-done checklist.
Lukas May
2026-02-18 17:19:53 +09:00
parent 09a388b490
commit b5509232f6
2 changed files with 72 additions and 2 deletions


@@ -20,15 +20,63 @@ Write decisions to \`.cw/output/decisions/{id}.md\`:
${ID_GENERATION}
## Goal-Backward Analysis
Before asking questions, work backward from the goal:
1. **Observable outcome**: What will the user see/do when this is done?
2. **Artifacts needed**: What code, config, or infra produces that outcome?
3. **Wiring**: How do the artifacts connect (data flow, API contracts, events)?
4. **Failure points**: What can go wrong? What are the edge cases?
Only ask questions that this analysis cannot answer from the codebase alone.
## Question Quality
**Bad question**: "How should we handle errors?"
**Good question**: "The current API returns HTTP 500 for all errors. Should we: (a) add specific error codes (400, 404, 409) with JSON error bodies, (b) keep 500 but add error details in the response body, or (c) add a custom error middleware that maps domain errors to HTTP codes?"
Every question must:
- Reference something concrete (file, pattern, constraint)
- Offer specific options when choices are clear
- Explain what depends on the answer
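Option (c) in the good-question example can be sketched concretely. This is a minimal illustration of a domain-error-to-HTTP-code mapping, assuming hypothetical error class names; it is not code from the project the prompt targets.

```typescript
// Illustrative domain errors — names are assumptions, not project code.
class ValidationError extends Error {}
class NotFoundError extends Error {}
class ConflictError extends Error {}

// The core of an error middleware: map a thrown domain error
// to the HTTP status code the response should carry.
function toHttpStatus(err: Error): number {
  if (err instanceof ValidationError) return 400;
  if (err instanceof NotFoundError) return 404;
  if (err instanceof ConflictError) return 409;
  return 500; // unknown errors still fall back to 500
}
```

A real middleware would wrap this mapping and attach a JSON error body, but the mapping itself is what the question is really asking the user to decide.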
## Decision Quality
**Bad decision**: "We'll use a database for storage"
**Good decision**: "Use SQLite via better-sqlite3 with drizzle-orm. Schema in src/db/schema.ts, migrations via drizzle-kit. Chosen over PostgreSQL because: single-node deployment, no external deps, existing pattern in the codebase."
Every decision must include: what, why, and what alternatives were rejected.
When the decision affects observable behavior, also include: how to verify it works (acceptance criteria, test approach, or measurable outcome).
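A decision written to `.cw/output/decisions/{id}.md` following these rules might look like the sketch below. The file layout and the verification command are assumptions for illustration; the source only specifies the output path, not the record format.

```markdown
# {id}: Storage layer

**What**: SQLite via better-sqlite3 with drizzle-orm; schema in src/db/schema.ts,
migrations via drizzle-kit.
**Why**: Single-node deployment, no external dependencies, matches an existing
pattern in the codebase.
**Rejected**: PostgreSQL — external service, overkill for a single node.
**Verify** (hypothetical): migrations apply on a fresh checkout and the db test
suite passes with no external services running.
```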
## Read Before Asking
Before asking ANY question, check if the codebase already answers it:
- Read existing code patterns, config files, package.json
- Check if similar problems were already solved elsewhere
- Don't ask "what framework should we use?" if the project already uses one
## Question Categories
- **User Journeys**: Main workflows, success/failure paths, edge cases
- **Technical Constraints**: Patterns to follow, things to avoid, reference code
- **Data & Validation**: Data structures, validation rules, constraints
- **Integration Points**: External systems, APIs, error handling
- **Testability & Verification**: How will we verify each feature works? What are measurable acceptance criteria? What test strategies apply (unit, integration, e2e)?
## Rules
- Ask 2-4 questions at a time, not more
- Provide options when choices are clear
- Capture every decision with rationale
- Don't proceed until ambiguities are resolved
## Definition of Done
Before writing signal.json with status "done", verify:
- [ ] Every question references something concrete (file, pattern, constraint)
- [ ] Every question offers specific options when choices are clear
- [ ] Every decision includes what, why, and rejected alternatives
- [ ] Behavioral decisions include verification criteria
- [ ] The codebase was checked before asking — no questions the code already answers`;
}


@@ -18,11 +18,33 @@ Write one file per modified page to \`.cw/output/pages/{pageId}.md\`:
- Frontmatter: \`title\`, \`summary\` (what changed and why)
- Body: Full new markdown content for the page (replaces entire page body)
## What to Improve (priority order)
1. **Ambiguity**: Requirements that could be interpreted multiple ways → make them specific
2. **Missing details**: Gaps that would force an agent to guess → fill them with concrete decisions
3. **Contradictions**: Statements that conflict with each other or with existing code → resolve them
4. **Unverifiable requirements**: "Make it fast", "Handle errors properly" → add testable acceptance criteria with specific inputs, expected outputs, and verification commands where possible. "Response time under 200ms" is better than "make it fast", but "GET /api/users with 1000 records returns in under 200ms (verify: \`npm run bench -- api/users\`)" is what a worker agent actually needs.
5. **Missing edge cases**: Happy path only → add error states, empty states, boundary conditions, concurrent access. Frame these as testable scenarios: "When the cart is empty and the user clicks checkout, the system should display 'Your cart is empty' and disable the payment button."
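The empty-cart scenario above is precise enough to execute. A minimal sketch, assuming a hypothetical `Cart` shape and `checkoutView` helper (neither comes from the project):

```typescript
// Illustrative cart model — an assumption, not project code.
type Cart = { items: { sku: string; qty: number }[] };

// Derive what the checkout UI should show for a given cart:
// an empty cart yields the notice and disables payment.
function checkoutView(cart: Cart): { notice: string | null; payEnabled: boolean } {
  if (cart.items.length === 0) {
    return { notice: "Your cart is empty", payEnabled: false };
  }
  return { notice: null, payEnabled: true };
}
```

Framing an edge case this way means a worker agent can turn it directly into a unit test instead of guessing at intended behavior.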
Do NOT refine for style, grammar, or formatting unless it genuinely hurts clarity. A rough but precise requirement is better than a polished but vague one.
If all pages are already clear and actionable, signal done with no output files. Don't refine for the sake of refining.
## Rules
- Ask 2-4 questions at a time if you need clarification
- Only propose changes for pages that genuinely need improvement
- Each output page's body is the FULL new content (not a diff)
- Preserve [[page:\$id|title]] cross-references in your output
- Focus on clarity, completeness, and consistency
- Do not invent new page IDs — only reference existing ones from .cw/input/pages/
## Definition of Done
Before writing signal.json with status "done", verify:
- [ ] Only pages with genuine clarity problems were modified — no style-only changes
- [ ] All [[page:\$id|title]] cross-references are preserved
- [ ] Ambiguous requirements now have specific, testable acceptance criteria
- [ ] Each modified page's summary accurately describes what changed and why
- [ ] If no pages needed improvement, signal done with no output files — don't refine for the sake of refining`;
}