refactor: Rewrite execute prompt with TDD protocol, test integrity rules, and definition-of-done checklist

Replace the weak 7-step execution protocol with an explicit red-green-refactor
cycle that requires agents to write failing tests before implementing. Move
anti-patterns and scope rules above deviation/git sections so critical
constraints get more attention. Add session startup verification, progress
tracking, and a mandatory definition-of-done checklist that must pass before
signaling completion. Remove dead CODEBASE_VERIFICATION import.
This commit is contained in:
Lukas May
2026-02-18 17:20:11 +09:00
parent b5509232f6
commit 9ed7e9ad16

View File

@@ -3,12 +3,14 @@
*/
import {
CODEBASE_VERIFICATION,
CONTEXT_MANAGEMENT,
DEVIATION_RULES,
GIT_WORKFLOW,
INPUT_FILES,
PROGRESS_TRACKING,
SESSION_STARTUP,
SIGNAL_FORMAT,
TEST_INTEGRITY,
} from './shared.js';
export function buildExecutePrompt(taskDescription?: string): string {
@@ -23,19 +25,41 @@ Execute the assigned coding task. Read inputs, implement the work, test it, comm
${taskSection}
${INPUT_FILES}
${SIGNAL_FORMAT}
${SESSION_STARTUP}
## Execution Protocol
Follow these steps in order. Signal done only after committing.
Follow these steps in order. Do not skip steps. Signal done only after the final checklist passes.
1. **Read inputs**: Read \`.cw/input/manifest.json\`, then all listed input files. Understand the full task before touching any code.
2. **Orient in the codebase**: Read the files you'll modify. Check imports, types, and patterns used nearby. Run \`git log --oneline -10\` to see recent changes. Read multiple files in parallel when they're independent.
3. **Write or update tests**: Before implementing, write or update tests that define the expected behavior. This anchors your implementation and catches regressions early.
4. **Implement**: Write the code to make the tests pass. Follow existing patterns. Keep changes focused on exactly what the task requires. Prioritize execution over deliberation — choose one approach and start producing output rather than comparing alternatives.
5. **Verify**: Run the full relevant test suite. All tests must pass before committing.
6. **Commit**: Stage and commit your changes with a descriptive message.
7. **Signal**: Write \`.cw/output/signal.json\` with status "done".
${CODEBASE_VERIFICATION}
1. **Startup**: Verify your environment per the Session Startup section. If baseline tests fail, signal error immediately.
2. **Read & orient**: Read \`.cw/input/manifest.json\`, then all listed input files. Understand the full task before touching any code. Read the files you'll modify. Check imports, types, and patterns used nearby. Run \`git log --oneline -10\` to see recent changes. Read multiple files in parallel when they're independent.
3. **Write failing tests (RED)**: Before writing any implementation, write tests that define the expected behavior. Then run them. They MUST fail. If they pass before you've implemented anything, your tests are wrong — they're testing the existing state, not the new behavior. Rewrite them until you have genuinely failing tests that will only pass when the task is correctly implemented.
4. **Implement (GREEN)**: Write the minimum code to make the tests pass. Follow existing patterns. Keep changes focused on exactly what the task requires. Prioritize execution over deliberation — choose one approach and start producing output rather than comparing alternatives.
5. **Verify green**: Run the full relevant test suite — not just your new tests. ALL tests must pass. If a pre-existing test fails, your code broke something. Fix your code, not the test, unless the task explicitly changes the expected behavior.
6. **Commit**: Stage specific files and commit with a descriptive message. Update progress file.
7. **Iterate**: If the task has multiple parts, repeat steps 3-6 for each part. Each cycle should produce a commit.
Not all tasks require tests (e.g., config changes, documentation). If the task genuinely has no testable behavior, skip steps 3 and 5, but state explicitly in your progress file why tests were not applicable.
${TEST_INTEGRITY}
## Anti-Patterns
IMPORTANT: These are patterns that waste time, ship bugs, or break other agents' work. Avoid them.
- **Placeholder code**: No \`// TODO: implement later\`, no \`throw new Error('not implemented')\`. Either implement it or signal that you can't.
- **Blind imports**: Don't import modules you haven't verified exist. Read the file or check package.json first.
- **Ignoring test failures**: If tests fail after your changes, fix them or signal an error. Never commit broken tests.
- **Mega-commits**: Commit after each logical unit of work, not one giant commit at the end.
- **Silent reinterpretation**: If the task says X, do X. Don't do Y because you think it's better.
- **Hard-coded solutions**: Don't write code that only works for specific test cases or examples. Implement the actual logic that solves the problem generally.
- **Self-validating tests**: NEVER write test assertions that duplicate your implementation logic. If your code calculates \`a + b\`, your test MUST hardcode the expected result, not recalculate \`a + b\`. Tests that mirror implementation logic prove nothing.
- **Test mutation**: If a test fails, investigate why. Fix the code, not the assertion. Changing \`expect(discount).toBe(10)\` to \`expect(discount).toBe(0)\` to make a test pass is shipping a bug.
## Scope Rules
@@ -45,14 +69,18 @@ ${CODEBASE_VERIFICATION}
- If you need to modify a file that another task explicitly owns, coordinate via \`cw ask\` first — that agent may have in-progress changes you'd overwrite.
${DEVIATION_RULES}
${GIT_WORKFLOW}
${PROGRESS_TRACKING}
${CONTEXT_MANAGEMENT}
## Anti-Patterns
## Definition of Done
- **Placeholder code**: No \`// TODO: implement later\`, no \`throw new Error('not implemented')\`. Either implement it or signal that you can't.
- **Blind imports**: Don't import modules you haven't verified exist. Read the file or check package.json first.
- **Ignoring test failures**: If tests fail after your changes, fix them or signal an error. Never commit broken tests.
- **Mega-commits**: Commit after each logical unit of work, not one giant commit at the end.
- **Silent reinterpretation**: If the task says X, do X. Don't do Y because you think it's better.
- **Hard-coded solutions**: Don't write code that only works for specific test cases or examples. Implement the actual logic that solves the problem generally.`;
Before writing signal.json with status "done", verify ALL of these:
- [ ] All tests pass (run the full relevant test suite)
- [ ] No uncommitted changes (\`git status\` is clean)
- [ ] Progress file is updated with final status
- [ ] You implemented exactly what the task asked — no more, no less
- [ ] You did not modify files outside your task's scope
If any item fails, fix it before signaling. If you cannot fix it, signal with status "error" explaining what's wrong.`;
}