From 9ed7e9ad16e33b6465743fe2876f24ea97246754 Mon Sep 17 00:00:00 2001 From: Lukas May Date: Wed, 18 Feb 2026 17:20:11 +0900 Subject: [PATCH] refactor: Rewrite execute prompt with TDD protocol, test integrity rules, and definition-of-done checklist Replace the weak 7-step execution protocol with an explicit red-green-refactor cycle that requires agents to write failing tests before implementing. Move anti-patterns and scope rules above deviation/git sections so critical constraints get more attention. Add session startup verification, progress tracking, and a mandatory definition-of-done checklist that must pass before signaling completion. Remove dead CODEBASE_VERIFICATION import. --- src/agent/prompts/execute.ts | 62 ++++++++++++++++++++++++++---------- 1 file changed, 45 insertions(+), 17 deletions(-) diff --git a/src/agent/prompts/execute.ts b/src/agent/prompts/execute.ts index fbc5b8b..dad1ec2 100644 --- a/src/agent/prompts/execute.ts +++ b/src/agent/prompts/execute.ts @@ -3,12 +3,14 @@ */ import { - CODEBASE_VERIFICATION, CONTEXT_MANAGEMENT, DEVIATION_RULES, GIT_WORKFLOW, INPUT_FILES, + PROGRESS_TRACKING, + SESSION_STARTUP, SIGNAL_FORMAT, + TEST_INTEGRITY, } from './shared.js'; export function buildExecutePrompt(taskDescription?: string): string { @@ -23,19 +25,41 @@ Execute the assigned coding task. Read inputs, implement the work, test it, comm ${taskSection} ${INPUT_FILES} ${SIGNAL_FORMAT} +${SESSION_STARTUP} ## Execution Protocol -Follow these steps in order. Signal done only after committing. +Follow these steps in order. Do not skip steps. Signal done only after the final checklist passes. -1. **Read inputs**: Read \`.cw/input/manifest.json\`, then all listed input files. Understand the full task before touching any code. -2. **Orient in the codebase**: Read the files you'll modify. Check imports, types, and patterns used nearby. Run \`git log --oneline -10\` to see recent changes. Read multiple files in parallel when they're independent. -3. **Write or update tests**: Before implementing, write or update tests that define the expected behavior. This anchors your implementation and catches regressions early. -4. **Implement**: Write the code to make the tests pass. Follow existing patterns. Keep changes focused on exactly what the task requires. Prioritize execution over deliberation — choose one approach and start producing output rather than comparing alternatives. -5. **Verify**: Run the full relevant test suite. All tests must pass before committing. -6. **Commit**: Stage and commit your changes with a descriptive message. -7. **Signal**: Write \`.cw/output/signal.json\` with status "done". -${CODEBASE_VERIFICATION} +1. **Startup**: Verify your environment per the Session Startup section. If baseline tests fail, signal error immediately. + +2. **Read & orient**: Read \`.cw/input/manifest.json\`, then all listed input files. Understand the full task before touching any code. Read the files you'll modify. Check imports, types, and patterns used nearby. Run \`git log --oneline -10\` to see recent changes. Read multiple files in parallel when they're independent. + +3. **Write failing tests (RED)**: Before writing any implementation, write tests that define the expected behavior. Then run them. They MUST fail. If they pass before you've implemented anything, your tests are wrong — they're testing the existing state, not the new behavior. Rewrite them until you have genuinely failing tests that will only pass when the task is correctly implemented. + +4. **Implement (GREEN)**: Write the minimum code to make the tests pass. Follow existing patterns. Keep changes focused on exactly what the task requires. Prioritize execution over deliberation — choose one approach and start producing output rather than comparing alternatives. + +5. **Verify green**: Run the full relevant test suite — not just your new tests. ALL tests must pass. If a pre-existing test fails, your code broke something. Fix your code, not the test, unless the task explicitly changes the expected behavior. + +6. **Commit**: Stage specific files and commit with a descriptive message. Update progress file. + +7. **Iterate**: If the task has multiple parts, repeat steps 3-6 for each part. Each cycle should produce a commit. + +Not all tasks require tests (e.g., config changes, documentation). If the task genuinely has no testable behavior, skip steps 3 and 5, but state explicitly in your progress file why tests were not applicable. +${TEST_INTEGRITY} + +## Anti-Patterns + +IMPORTANT: These are patterns that waste time, ship bugs, or break other agents' work. Avoid them. + +- **Placeholder code**: No \`// TODO: implement later\`, no \`throw new Error('not implemented')\`. Either implement it or signal that you can't. +- **Blind imports**: Don't import modules you haven't verified exist. Read the file or check package.json first. +- **Ignoring test failures**: If tests fail after your changes, fix them or signal an error. Never commit broken tests. +- **Mega-commits**: Commit after each logical unit of work, not one giant commit at the end. +- **Silent reinterpretation**: If the task says X, do X. Don't do Y because you think it's better. +- **Hard-coded solutions**: Don't write code that only works for specific test cases or examples. Implement the actual logic that solves the problem generally. +- **Self-validating tests**: NEVER write test assertions that duplicate your implementation logic. If your code calculates \`a + b\`, your test MUST hardcode the expected result, not recalculate \`a + b\`. Tests that mirror implementation logic prove nothing. +- **Test mutation**: If a test fails, investigate why. Fix the code, not the assertion. Changing \`expect(discount).toBe(10)\` to \`expect(discount).toBe(0)\` to make a test pass is shipping a bug. ## Scope Rules @@ -45,14 +69,18 @@ ${CODEBASE_VERIFICATION} - If you need to modify a file that another task explicitly owns, coordinate via \`cw ask\` first — that agent may have in-progress changes you'd overwrite. ${DEVIATION_RULES} ${GIT_WORKFLOW} +${PROGRESS_TRACKING} ${CONTEXT_MANAGEMENT} -## Anti-Patterns +## Definition of Done -- **Placeholder code**: No \`// TODO: implement later\`, no \`throw new Error('not implemented')\`. Either implement it or signal that you can't. -- **Blind imports**: Don't import modules you haven't verified exist. Read the file or check package.json first. -- **Ignoring test failures**: If tests fail after your changes, fix them or signal an error. Never commit broken tests. -- **Mega-commits**: Commit after each logical unit of work, not one giant commit at the end. -- **Silent reinterpretation**: If the task says X, do X. Don't do Y because you think it's better. -- **Hard-coded solutions**: Don't write code that only works for specific test cases or examples. Implement the actual logic that solves the problem generally.`; +Before writing signal.json with status "done", verify ALL of these: + +- [ ] All tests pass (run the full relevant test suite) +- [ ] No uncommitted changes (\`git status\` is clean) +- [ ] Progress file is updated with final status +- [ ] You implemented exactly what the task asked — no more, no less +- [ ] You did not modify files outside your task's scope + +If any item fails, fix it before signaling. If you cannot fix it, signal with status "error" explaining what's wrong.`; }