refactor: Rewrite execute prompt with TDD protocol, test integrity rules, and definition-of-done checklist

Replace the weak 7-step execution protocol with an explicit red-green-refactor cycle that requires agents to write failing tests before implementing. Move anti-patterns and scope rules above deviation/git sections so critical constraints get more attention. Add session startup verification, progress tracking, and a mandatory definition-of-done checklist that must pass before signaling completion. Remove dead CODEBASE_VERIFICATION import.
2026-02-18 17:20:11 +09:00
parent b5509232f6
commit 9ed7e9ad16
1 changed files with 45 additions and 17 deletions
--- a/src/agent/prompts/execute.ts
+++ b/src/agent/prompts/execute.ts
@@ -3,12 +3,14 @@
 */

 import {
-  CODEBASE_VERIFICATION,
  CONTEXT_MANAGEMENT,
  DEVIATION_RULES,
  GIT_WORKFLOW,
  INPUT_FILES,
+  PROGRESS_TRACKING,
+  SESSION_STARTUP,
  SIGNAL_FORMAT,
+  TEST_INTEGRITY,
 } from './shared.js';

 export function buildExecutePrompt(taskDescription?: string): string {
@@ -23,19 +25,41 @@ Execute the assigned coding task. Read inputs, implement the work, test it, comm
 ${taskSection}
 ${INPUT_FILES}
 ${SIGNAL_FORMAT}
+${SESSION_STARTUP}

 ## Execution Protocol

-Follow these steps in order. Signal done only after committing.
+Follow these steps in order. Do not skip steps. Signal done only after the final checklist passes.

-1. **Read inputs**: Read \`.cw/input/manifest.json\`, then all listed input files. Understand the full task before touching any code.
-2. **Orient in the codebase**: Read the files you'll modify. Check imports, types, and patterns used nearby. Run \`git log --oneline -10\` to see recent changes. Read multiple files in parallel when they're independent.
-3. **Write or update tests**: Before implementing, write or update tests that define the expected behavior. This anchors your implementation and catches regressions early.
-4. **Implement**: Write the code to make the tests pass. Follow existing patterns. Keep changes focused on exactly what the task requires. Prioritize execution over deliberation — choose one approach and start producing output rather than comparing alternatives.
-5. **Verify**: Run the full relevant test suite. All tests must pass before committing.
-6. **Commit**: Stage and commit your changes with a descriptive message.
-7. **Signal**: Write \`.cw/output/signal.json\` with status "done".
-${CODEBASE_VERIFICATION}
+1. **Startup**: Verify your environment per the Session Startup section. If baseline tests fail, signal error immediately.
+
+2. **Read & orient**: Read \`.cw/input/manifest.json\`, then all listed input files. Understand the full task before touching any code. Read the files you'll modify. Check imports, types, and patterns used nearby. Run \`git log --oneline -10\` to see recent changes. Read multiple files in parallel when they're independent.
+
+3. **Write failing tests (RED)**: Before writing any implementation, write tests that define the expected behavior. Then run them. They MUST fail. If they pass before you've implemented anything, your tests are wrong — they're testing the existing state, not the new behavior. Rewrite them until you have genuinely failing tests that will only pass when the task is correctly implemented.
+
+4. **Implement (GREEN)**: Write the minimum code to make the tests pass. Follow existing patterns. Keep changes focused on exactly what the task requires. Prioritize execution over deliberation — choose one approach and start producing output rather than comparing alternatives.
+
+5. **Verify green**: Run the full relevant test suite — not just your new tests. ALL tests must pass. If a pre-existing test fails, your code broke something. Fix your code, not the test, unless the task explicitly changes the expected behavior.
+
+6. **Commit**: Stage specific files and commit with a descriptive message. Update progress file.
+
+7. **Iterate**: If the task has multiple parts, repeat steps 3-6 for each part. Each cycle should produce a commit.
+
+Not all tasks require tests (e.g., config changes, documentation). If the task genuinely has no testable behavior, skip steps 3 and 5, but state explicitly in your progress file why tests were not applicable.
+${TEST_INTEGRITY}
+
+## Anti-Patterns
+
+IMPORTANT: These are patterns that waste time, ship bugs, or break other agents' work. Avoid them.
+
+- **Placeholder code**: No \`// TODO: implement later\`, no \`throw new Error('not implemented')\`. Either implement it or signal that you can't.
+- **Blind imports**: Don't import modules you haven't verified exist. Read the file or check package.json first.
+- **Ignoring test failures**: If tests fail after your changes, fix them or signal an error. Never commit broken tests.
+- **Mega-commits**: Commit after each logical unit of work, not one giant commit at the end.
+- **Silent reinterpretation**: If the task says X, do X. Don't do Y because you think it's better.
+- **Hard-coded solutions**: Don't write code that only works for specific test cases or examples. Implement the actual logic that solves the problem generally.
+- **Self-validating tests**: NEVER write test assertions that duplicate your implementation logic. If your code calculates \`a + b\`, your test MUST hardcode the expected result, not recalculate \`a + b\`. Tests that mirror implementation logic prove nothing.
+- **Test mutation**: If a test fails, investigate why. Fix the code, not the assertion. Changing \`expect(discount).toBe(10)\` to \`expect(discount).toBe(0)\` to make a test pass is shipping a bug.

 ## Scope Rules

@@ -45,14 +69,18 @@ ${CODEBASE_VERIFICATION}
 - If you need to modify a file that another task explicitly owns, coordinate via \`cw ask\` first — that agent may have in-progress changes you'd overwrite.
 ${DEVIATION_RULES}
 ${GIT_WORKFLOW}
+${PROGRESS_TRACKING}
 ${CONTEXT_MANAGEMENT}

-## Anti-Patterns
+## Definition of Done

- **Placeholder code**: No \`// TODO: implement later\`, no \`throw new Error('not implemented')\`. Either implement it or signal that you can't.
- **Blind imports**: Don't import modules you haven't verified exist. Read the file or check package.json first.
- **Ignoring test failures**: If tests fail after your changes, fix them or signal an error. Never commit broken tests.
- **Mega-commits**: Commit after each logical unit of work, not one giant commit at the end.
- **Silent reinterpretation**: If the task says X, do X. Don't do Y because you think it's better.
- **Hard-coded solutions**: Don't write code that only works for specific test cases or examples. Implement the actual logic that solves the problem generally.`;
+Before writing signal.json with status "done", verify ALL of these:
+
+- [ ] All tests pass (run the full relevant test suite)
+- [ ] No uncommitted changes (\`git status\` is clean)
+- [ ] Progress file is updated with final status
+- [ ] You implemented exactly what the task asked — no more, no less
+- [ ] You did not modify files outside your task's scope
+
+If any item fails, fix it before signaling. If you cannot fix it, signal with status "error" explaining what's wrong.`;
 }