feat: Add codebase exploration to architect agent prompts

Architect agents (discuss, plan, detail, refine) were producing generic analysis disconnected from the actual codebase. They had full tool access in their worktrees but were never instructed to explore the code.

- Add CODEBASE_EXPLORATION shared constant: read project docs, explore structure, check existing patterns, use subagents for parallel exploration
- Inject into all 4 architect prompts after INPUT_FILES
- Strengthen discuss prompt: analysis method references codebase, examples cite specific paths, definition_of_done requires codebase references
- Fix spawnArchitectDiscuss to pass full context (pages/phases/tasks) via gatherInitiativeContext(); it was only passing bare initiative metadata
- Update docs/agent.md with new tag ordering and shared block table
2026-03-03 12:45:14 +01:00
parent 1043079a08
commit c8f370583a
8 changed files with 60 additions and 25 deletions
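
To illustrate the injection described in the commit message, here is a minimal sketch of how a shared block slots into a prompt builder through template-literal interpolation. The block contents are abbreviated placeholders and `buildExamplePrompt` is a hypothetical name; the real constants live in `prompts/shared.ts`.

```typescript
// Hypothetical, abbreviated stand-ins for the shared blocks in prompts/shared.ts.
const INPUT_FILES = `
<input_files>
(file format documentation)
</input_files>`;

const CODEBASE_EXPLORATION = `
<codebase_exploration>
(codebase grounding instructions)
</codebase_exploration>`;

// Illustrative builder (not part of the commit): the exploration block is
// interpolated right after INPUT_FILES and before the output format.
function buildExamplePrompt(): string {
  return `<role>
You are an Architect agent in EXAMPLE mode. You do NOT write code.
</role>
${INPUT_FILES}
${CODEBASE_EXPLORATION}
<output_format>
(what to produce)
</output_format>`;
}

const examplePrompt = buildExamplePrompt();
```

Because the block is a plain string constant, each of the four architect builders can pick it up with a single extra import and one interpolation line.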
--- a/apps/server/agent/prompts/detail.ts
+++ b/apps/server/agent/prompts/detail.ts
@@ -2,13 +2,14 @@
 * Detail mode prompt — break a phase into executable tasks.
 */
-import { CONTEXT_MANAGEMENT, ID_GENERATION, INPUT_FILES, SIGNAL_FORMAT } from './shared.js';
+import { CODEBASE_EXPLORATION, CONTEXT_MANAGEMENT, ID_GENERATION, INPUT_FILES, SIGNAL_FORMAT } from './shared.js';
 export function buildDetailPrompt(): string {
  return `<role>
 You are an Architect agent in DETAIL mode. Break the phase into executable tasks. You do NOT write code.
 </role>
 ${INPUT_FILES}
+${CODEBASE_EXPLORATION}
 <output_format>
 Write one file per task to \`.cw/output/tasks/{id}.md\`:
--- a/apps/server/agent/prompts/discuss.ts
+++ b/apps/server/agent/prompts/discuss.ts
@@ -2,7 +2,7 @@
 * Discuss mode prompt — clarifying questions and decision capture.
 */
-import { ID_GENERATION, INPUT_FILES, SIGNAL_FORMAT } from './shared.js';
+import { CODEBASE_EXPLORATION, ID_GENERATION, INPUT_FILES, SIGNAL_FORMAT } from './shared.js';
 export function buildDiscussPrompt(): string {
  return `<role>
@@ -10,6 +10,7 @@ You are an Architect agent in the Codewalk multi-agent system operating in DISCU
 Transform user intent into clear, documented decisions. You do NOT write code — you capture decisions.
 </role>
 ${INPUT_FILES}
+${CODEBASE_EXPLORATION}
 <output_format>
 Write decisions to \`.cw/output/decisions/{id}.md\`:
@@ -21,37 +22,38 @@ ${ID_GENERATION}
 ${SIGNAL_FORMAT}
 <analysis_method>
-Work backward from the goal before asking anything:
+Work backward from the goal, grounded in the actual codebase:
 1. **Observable outcome**: What will the user see/do when this is done?
-2. **Artifacts needed**: What code, config, or infra produces that outcome?
+2. **Existing landscape**: What relevant code, patterns, and conventions already exist? (You explored this in the codebase exploration step — reference specific files.)
-3. **Wiring**: How do the artifacts connect (data flow, API contracts, events)?
+3. **Artifacts needed**: What code, config, or infra produces that outcome? How does it fit into the existing architecture?
-4. **Failure points**: What can go wrong? Edge cases?
+4. **Wiring**: How do the artifacts connect (data flow, API contracts, events)? What existing wiring can be reused?
+5. **Failure points**: What can go wrong? Edge cases?
 Only ask questions this analysis cannot answer from the codebase alone.
 </analysis_method>
 <question_quality>
-Every question must explain what depends on the answer.
+Every question must explain what depends on the answer and reference what the codebase already tells you.
 <examples>
 <example label="bad">
 "How should we handle errors?"
 </example>
 <example label="good">
-"The current API returns HTTP 500 for all errors. Should we: (a) add specific error codes (400, 404, 409) with JSON error bodies, (b) keep 500 but add error details in the response body, or (c) add a custom error middleware that maps domain errors to HTTP codes?"
+"The current API (\`src/server/trpc/\`) uses tRPC with TRPCError for error handling. The existing pattern returns typed error codes (NOT_FOUND, BAD_REQUEST, CONFLICT). Should we: (a) extend this with custom error codes for the new domain, (b) add an error middleware layer that maps domain errors before they reach tRPC, or (c) keep the existing TRPCError pattern as-is since it covers our cases?"
 </example>
 </examples>
 </question_quality>
 <decision_quality>
-Include: what, why, rejected alternatives. For behavioral decisions, add verification criteria.
+Include: what, why, rejected alternatives, and references to existing codebase patterns that informed the choice.
 <examples>
 <example label="bad">
 "We'll use a database for storage"
 </example>
 <example label="good">
-"Use SQLite via better-sqlite3 with drizzle-orm. Schema in src/db/schema.ts, migrations via drizzle-kit. Chosen over PostgreSQL because: single-node deployment, no external deps, existing pattern in the codebase."
+"Use SQLite via better-sqlite3 with drizzle-orm, following the existing pattern in \`apps/server/db/\`. Schema in \`apps/server/db/schema.ts\`, migrations via drizzle-kit (see \`drizzle/\` directory). Chosen over PostgreSQL because: single-node deployment, no external deps, matches existing codebase pattern. Repository port goes in \`apps/server/db/repositories/\`, Drizzle adapter in \`drizzle/\` subdirectory."
 </example>
 </examples>
 </decision_quality>
@@ -63,7 +65,7 @@ Include: what, why, rejected alternatives. For behavioral decisions, add verific
 - **Integration Points**: External systems, APIs, error handling
 - **Testability**: Acceptance criteria, test strategies
-Don't ask what the codebase already answers. If the project uses a framework, don't ask which framework to use.
+Don't ask what the codebase already answers. If the project uses a framework, don't ask which framework to use — you've already explored the codebase and know.
 </question_categories>
 <rules>
@@ -72,7 +74,9 @@ Don't ask what the codebase already answers. If the project uses a framework, do
 <definition_of_done>
 - Every decision includes what, why, and rejected alternatives
+- Every decision references specific files or patterns from the codebase
 - Behavioral decisions include verification criteria
 - No questions the codebase already answers
+- No generic advice — every output is specific to THIS project's architecture
 </definition_of_done>`;
 }
--- a/apps/server/agent/prompts/index.ts
+++ b/apps/server/agent/prompts/index.ts
@@ -5,7 +5,7 @@
 * input files, ID generation) are in shared.ts.
 */
-export { SIGNAL_FORMAT, INPUT_FILES, ID_GENERATION, CONTEXT_MANAGEMENT, DEVIATION_RULES, GIT_WORKFLOW, TEST_INTEGRITY, SESSION_STARTUP, PROGRESS_TRACKING, buildInterAgentCommunication } from './shared.js';
+export { SIGNAL_FORMAT, INPUT_FILES, ID_GENERATION, CODEBASE_EXPLORATION, CONTEXT_MANAGEMENT, DEVIATION_RULES, GIT_WORKFLOW, TEST_INTEGRITY, SESSION_STARTUP, PROGRESS_TRACKING, buildInterAgentCommunication } from './shared.js';
 export { buildExecutePrompt } from './execute.js';
 export { buildDiscussPrompt } from './discuss.js';
 export { buildPlanPrompt } from './plan.js';
--- a/apps/server/agent/prompts/plan.ts
+++ b/apps/server/agent/prompts/plan.ts
@@ -2,13 +2,14 @@
 * Plan mode prompt — plan initiative into phases.
 */
-import { CONTEXT_MANAGEMENT, ID_GENERATION, INPUT_FILES, SIGNAL_FORMAT } from './shared.js';
+import { CODEBASE_EXPLORATION, CONTEXT_MANAGEMENT, ID_GENERATION, INPUT_FILES, SIGNAL_FORMAT } from './shared.js';
 export function buildPlanPrompt(): string {
  return `<role>
 You are an Architect agent in PLAN mode. Plan the initiative into phases. You do NOT write code.
 </role>
 ${INPUT_FILES}
+${CODEBASE_EXPLORATION}
 <output_format>
 Write one file per phase to \`.cw/output/phases/{id}.md\`:
--- a/apps/server/agent/prompts/refine.ts
+++ b/apps/server/agent/prompts/refine.ts
@@ -2,13 +2,14 @@
 * Refine mode prompt — review and propose edits to initiative pages.
 */
-import { INPUT_FILES, SIGNAL_FORMAT } from './shared.js';
+import { CODEBASE_EXPLORATION, INPUT_FILES, SIGNAL_FORMAT } from './shared.js';
 export function buildRefinePrompt(): string {
  return `<role>
 You are an Architect agent reviewing initiative pages. You do NOT write code.
 </role>
 ${INPUT_FILES}
+${CODEBASE_EXPLORATION}
 ${SIGNAL_FORMAT}
 <output_format>
--- a/apps/server/agent/prompts/shared.ts
+++ b/apps/server/agent/prompts/shared.ts
@@ -60,6 +60,25 @@ You are in an isolated git worktree. Other agents work in parallel on separate b
 - Run \`git status\` before committing
 </git_workflow>`;
+export const CODEBASE_EXPLORATION = `
+<codebase_exploration>
+Before beginning your analysis, explore the actual codebase to ground every decision in reality.
+**Step 1 — Read project docs**
+Check for CLAUDE.md, README.md, and docs/ at the repo root. These contain architecture decisions, conventions, and patterns you MUST follow. If they exist, read them first — they override any assumptions.
+**Step 2 — Understand project structure**
+Explore the project layout: key directories, entry points, config files (package.json, tsconfig, pyproject.toml, go.mod, etc.). Understand the tech stack, frameworks, and build system before proposing anything.
+**Step 3 — Check existing patterns**
+Before proposing any approach, search for how similar things are already done in the codebase. If the project has an established pattern for routing, state management, database access, testing, etc. — your decisions must build on those patterns, not invent new ones.
+**Step 4 — Use subagents for parallel exploration**
+Spawn subagents to explore different aspects of the codebase simultaneously rather than reading files one at a time. For example: one subagent for project structure and tech stack, another for existing patterns related to the initiative, another for test conventions. Parallelize aggressively.
+**Grounding rule**: Every decision, question, and plan MUST reference specific files, patterns, or conventions found in the codebase. If your output could apply to any generic project without modification, you have failed — start over with deeper exploration.
+</codebase_exploration>`;
 export const CONTEXT_MANAGEMENT = `
 <context_management>
 When reading multiple files or running independent commands, execute them in parallel rather than sequentially. After each commit, update your progress file (see Progress Tracking).
--- a/apps/server/trpc/routers/architect.ts
+++ b/apps/server/trpc/routers/architect.ts
@@ -105,6 +105,8 @@ export function architectProcedures(publicProcedure: ProcedureBuilder) {
          status: 'in_progress',
        });
+        const context = await gatherInitiativeContext(ctx.phaseRepository, ctx.taskRepository, ctx.pageRepository, input.initiativeId);
        const prompt = buildDiscussPrompt();
        return agentManager.spawn({
@@ -114,7 +116,12 @@ export function architectProcedures(publicProcedure: ProcedureBuilder) {
          mode: 'discuss',
          provider: input.provider,
          initiativeId: input.initiativeId,
-          inputContext: { initiative },
+          inputContext: {
+            initiative,
+            pages: context.pages.length > 0 ? context.pages : undefined,
+            phases: context.phases.length > 0 ? context.phases : undefined,
+            tasks: context.tasks.length > 0 ? context.tasks : undefined,
+          },
        });
      }),
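
The `inputContext` change above maps empty collections to `undefined` so they are omitted rather than passed as empty arrays. A minimal sketch of that pattern follows, under stated assumptions: the `InitiativeContext` shape, the `nonEmpty` helper, and `toInputContext` are hypothetical names for illustration, and the `JSON.stringify` step is only an assumed way of showing the effect, not something this diff confirms about how the context is consumed.

```typescript
// Hypothetical shape; the real context type is not shown in this diff.
interface InitiativeContext {
  pages: string[];
  phases: string[];
  tasks: string[];
}

// Hypothetical helper: return the array only when it has entries.
function nonEmpty<T>(items: T[]): T[] | undefined {
  return items.length > 0 ? items : undefined;
}

// Mirrors the structure of the new inputContext object in the hunk above.
function toInputContext(initiative: { id: string }, context: InitiativeContext) {
  return {
    initiative,
    pages: nonEmpty(context.pages),
    phases: nonEmpty(context.phases),
    tasks: nonEmpty(context.tasks),
  };
}

const exampleInput = toInputContext(
  { id: 'init-1' },
  { pages: ['overview'], phases: [], tasks: [] },
);

// JSON.stringify skips undefined-valued properties, so the empty
// collections do not appear in the serialized form at all.
const serialized = JSON.stringify(exampleInput);
```

A helper like this would only remove the repeated ternaries; the inline form in the commit behaves the same way.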
--- a/docs/agent.md
+++ b/docs/agent.md
@@ -177,16 +177,17 @@ All prompts follow a consistent tag ordering:
 1. `<role>` — agent identity and mode
 2. `<task>` — dynamic task content (execute mode only)
 3. `<input_files>` — file format documentation
-4. `<output_format>` — what to produce, file paths, frontmatter
+4. `<codebase_exploration>` — codebase grounding instructions (architect modes only)
-5. `<id_generation>` — ID creation via `cw id`
+5. `<output_format>` — what to produce, file paths, frontmatter
-6. `<signal_format>` — completion signaling
+6. `<id_generation>` — ID creation via `cw id`
-7. `<session_startup>` — startup verification steps
+7. `<signal_format>` — completion signaling
-8. Mode-specific tags (see below)
+8. `<session_startup>` — startup verification steps
-9. Rules/constraints tags
+9. Mode-specific tags (see below)
-10. `<progress_tracking>` / `<context_management>`
+10. Rules/constraints tags
-11. `<definition_of_done>` — completion checklist
+11. `<progress_tracking>` / `<context_management>`
-12. `<workspace>` — workspace layout (appended by manager)
+12. `<definition_of_done>` — completion checklist
-13. `<inter_agent_communication>` — per-agent CLI instructions (appended by manager)
+13. `<workspace>` — workspace layout (appended by manager)
+14. `<inter_agent_communication>` — per-agent CLI instructions (appended by manager)
 ### Shared Blocks (`prompts/shared.ts`)
@@ -200,6 +201,7 @@ All prompts follow a consistent tag ordering:
 | `PROGRESS_TRACKING` | `<progress_tracking>` | Maintain `.cw/output/progress.md` after each commit — survives context compaction |
 | `DEVIATION_RULES` | `<deviation_rules>` | Typo→fix, bug→fix if small, missing dep→coordinate, architectural mismatch→STOP |
 | `GIT_WORKFLOW` | `<git_workflow>` | Specific file staging (no `git add .`), no force-push, check status first |
+| `CODEBASE_EXPLORATION` | `<codebase_exploration>` | Architect-mode codebase grounding: read project docs, explore structure, check existing patterns, use subagents for parallel exploration |
 | `CONTEXT_MANAGEMENT` | `<context_management>` | Parallel file reads, cross-reference to progress tracking |
 | `buildInterAgentCommunication()` | `<inter_agent_communication>` | Per-agent CLI instructions for `cw listen`, `cw ask`, `cw answer` |
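
The documented tag ordering could be checked mechanically. The sketch below is one possible approach, not something this commit adds: `followsTagOrder` is a hypothetical helper, and `EXPECTED_ORDER` covers only the first eight entries of the list in `docs/agent.md`. Tags that a given mode does not use are skipped, so only the relative order of the tags that are present is compared.

```typescript
// First eight tags from the documented ordering (subset chosen for brevity).
const EXPECTED_ORDER: string[] = [
  'role',
  'task',
  'input_files',
  'codebase_exploration',
  'output_format',
  'id_generation',
  'signal_format',
  'session_startup',
];

// Hypothetical helper: true when every expected tag that occurs in the
// prompt appears after the previously found expected tag.
function followsTagOrder(prompt: string, expected: string[]): boolean {
  let last = -1;
  for (const tag of expected) {
    const pos = prompt.indexOf(`<${tag}>`);
    if (pos < 0) continue; // tag not used in this mode
    if (pos < last) return false; // tag appears earlier than a predecessor
    last = pos;
  }
  return true;
}
```

A check like this could run in a unit test against each prompt builder's output to flag a prompt whose blocks drift from the documented order.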