docs(13): create phase plan for real Claude CLI integration tests

Phase 13: Real Claude E2E Tests

- 1 plan in 1 wave
- Validates JSON schemas with actual Claude CLI
- Fixes structured_output parsing in ClaudeAgentManager
- Tests skipped by default (expensive)
2026-02-02 10:31:45 +01:00
parent 5de7cd5f04
commit 6835dd45d5
1 file changed, with 178 additions and 0 deletions
--- a/.planning/phases/13-real-claude-e2e-tests/13-01-PLAN.md
+++ b/.planning/phases/13-real-claude-e2e-tests/13-01-PLAN.md
@@ -0,0 +1,178 @@
+---
+phase: 13-real-claude-e2e-tests
+plan: 01
+type: execute
+wave: 1
+depends_on: []
+files_modified: [src/test/integration/real-claude.test.ts, src/agent/manager.ts]
+autonomous: true
+---
+
+<objective>
+Create integration tests that validate Claude CLI JSON schema behavior with real Claude calls.
+
+Purpose: Verify that the JSON schemas defined in src/agent/schema.ts work correctly with the actual Claude CLI, confirming that MockAgentManager accurately simulates real behavior.
+Output: An integration test file with real Claude CLI tests (skipped by default due to cost and run time) and documented findings.
+</objective>
+
+<execution_context>
+@~/.claude/get-shit-done/workflows/execute-plan.md
+@~/.claude/get-shit-done/templates/summary.md
+</execution_context>
+
+<context>
+@.planning/PROJECT.md
+@.planning/ROADMAP.md
+@.planning/STATE.md
+
+# Key source files
+@src/agent/manager.ts
+@src/agent/schema.ts
+@src/agent/prompts.ts
+@src/test/harness.ts
+</context>
+
+<tasks>
+
+<task type="auto">
+  <name>Task 1: Create real Claude CLI integration test file</name>
+  <files>src/test/integration/real-claude.test.ts</files>
+  <action>
+Create an integration test file for real Claude CLI validation, structured as follows:
+
+1. Create the `src/test/integration/` directory if it does not exist
+2. Create the test file with the suite wrapped in `describe.skip` (the tests are expensive and are run manually)
+3. Add a helper function that calls the Claude CLI directly using execa:
+   - Takes a prompt and a JSON schema
+   - Returns the parsed `structured_output` from the CLI response
+   - Enforces a timeout (30s default)
+
+4. Add test cases for each agent mode:
+   - Execute mode: done status with result
+   - Execute mode: questions status with array
+   - Discuss mode: context_complete with decisions
+   - Breakdown mode: breakdown_complete with phases
+   - Decompose mode: decompose_complete with tasks
+
+5. Each test should:
+   - Use a minimal prompt that triggers the expected output
+   - Verify that the `structured_output` field is populated
+   - Verify that the output passes the mode's Zod schema validation
+   - Log the cost for the findings
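Steps 4 and 5 lend themselves to a table-driven suite. The sketch below is a minimal illustration, not the real schemas: it assumes each mode's output carries a `status` string and, for the array-returning modes, a top-level array whose names (`questions`, `decisions`, `phases`, `tasks`) are guesses. `ModeCase`, `modeCases` and `checkShape` are hypothetical names, and `checkShape` only stands in for the Zod schemas in src/agent/schema.ts.

```typescript
// One row per agent mode from step 4. Field names are assumptions;
// the authoritative shapes are the Zod schemas in src/agent/schema.ts.
interface ModeCase {
  name: string;
  expectedStatus: string;
  arrayField?: string; // field that must hold a non-empty array, if any
}

const modeCases: ModeCase[] = [
  { name: 'execute: done', expectedStatus: 'done' },
  { name: 'execute: questions', expectedStatus: 'questions', arrayField: 'questions' },
  { name: 'discuss', expectedStatus: 'context_complete', arrayField: 'decisions' },
  { name: 'breakdown', expectedStatus: 'breakdown_complete', arrayField: 'phases' },
  { name: 'decompose', expectedStatus: 'decompose_complete', arrayField: 'tasks' },
];

// Lightweight structural check returning a list of problems (empty means OK).
function checkShape(output: unknown, c: ModeCase): string[] {
  if (typeof output !== 'object' || output === null) {
    return ['structured_output is not an object'];
  }
  const errors: string[] = [];
  const record = output as Record<string, unknown>;
  if (record.status !== c.expectedStatus) {
    errors.push(`expected status "${c.expectedStatus}", got ${JSON.stringify(record.status)}`);
  }
  if (c.arrayField !== undefined) {
    const value = record[c.arrayField];
    if (!Array.isArray(value) || value.length === 0) {
      errors.push(`expected a non-empty array in "${c.arrayField}"`);
    }
  }
  return errors;
}
```

Each row would become one test case that sends the mode's minimal prompt through the CLI helper and expects `checkShape` to return an empty array; swapping in the mode's Zod schema (`schema.parse(output)`) gives the real check.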
+
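+Gate the suite so the tests don't run in CI. A bare `describe.skip` ignores environment variables, so choose the wrapper from the flag instead (for example `const describeReal = process.env.REAL_CLAUDE_TESTS ? describe : describe.skip;`). Add a comment explaining how to run the suite manually (use the test runner's name-filter flag, e.g. `-t` for Vitest or Jest):
+`REAL_CLAUDE_TESTS=1 npm test -- --grep "Real Claude"`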
+
+Key insight from validation: when invoked with `--json-schema`, the Claude CLI returns the parsed output in the `structured_output` field, not in `result`.
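Step 3 and the insight above can be sketched as follows. This version swaps execa for Node's built-in `child_process.execFile` so it has no dependencies. `runClaude` and `extractStructuredOutput` are hypothetical names, and every CLI flag other than `--json-schema` (`-p`, `--output-format json`), as well as passing the schema inline as a JSON string, is an assumption to verify against `claude --help`.

```typescript
import { execFile } from 'node:child_process';

// Subset of the CLI's JSON envelope that this helper reads.
interface CliEnvelope {
  type: string;
  is_error?: boolean;
  result?: string;
  structured_output?: unknown;
  total_cost_usd?: number;
}

interface ClaudeRun {
  output: unknown;
  costUsd?: number;
}

// Pure parser. The real-Claude tests require structured_output to be present,
// so unlike the manager's fallback this does NOT fall back to parsing `result`.
function extractStructuredOutput(stdout: string): ClaudeRun {
  const envelope = JSON.parse(stdout) as CliEnvelope;
  if (envelope.is_error) {
    throw new Error(`Claude CLI reported an error: ${envelope.result ?? '(no message)'}`);
  }
  if (envelope.structured_output == null) {
    throw new Error('Claude CLI response has no structured_output field');
  }
  return { output: envelope.structured_output, costUsd: envelope.total_cost_usd };
}

// Spawns the CLI with a 30 s default timeout. Flags other than --json-schema are assumptions.
function runClaude(prompt: string, schema: object, timeoutMs = 30_000): Promise<ClaudeRun> {
  const args = ['-p', prompt, '--output-format', 'json', '--json-schema', JSON.stringify(schema)];
  return new Promise((resolve, reject) => {
    execFile(
      'claude',
      args,
      { encoding: 'utf8', timeout: timeoutMs, maxBuffer: 10 * 1024 * 1024 },
      (error, stdout) => {
        if (error) {
          // Also covers the timeout: Node kills the child and sets error.killed.
          reject(error);
          return;
        }
        try {
          resolve(extractStructuredOutput(stdout));
        } catch (parseError) {
          reject(parseError);
        }
      },
    );
  });
}
```

A test would then `await runClaude(prompt, jsonSchema)`, log `costUsd`, and validate `output` against the mode's Zod schema. Keeping the parser separate from the spawn lets it be unit-tested without calling Claude.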
+  </action>
+  <verify>File exists at src/test/integration/real-claude.test.ts with skipped test suite</verify>
+  <done>Integration test file created with all mode tests, skipped by default</done>
+</task>
+
+<task type="auto">
+  <name>Task 2: Fix ClaudeAgentManager to parse structured_output</name>
+  <files>src/agent/manager.ts</files>
+  <action>
+Update handleAgentCompletion to read from the `structured_output` field instead of parsing `result` as JSON.
+
+Current code (line ~190):
+```typescript
+const rawOutput = JSON.parse(cliResult.result);
+```
+
+The Claude CLI with --json-schema returns:
+```json
+{
+  "type": "result",
+  "result": "",
+  "structured_output": { "status": "done", "result": "..." }
+}
+```
+
+Update to:
+```typescript
+// When --json-schema is used, structured output is in structured_output field
+const rawOutput = cliResult.structured_output ?? JSON.parse(cliResult.result);
+```
+
+Also update the ClaudeCliResult interface to include `structured_output`:
+```typescript
+interface ClaudeCliResult {
+  type: 'result';
+  subtype: 'success' | 'error';
+  is_error: boolean;
+  session_id: string;
+  result: string;
+  structured_output?: unknown;  // Add this
+  total_cost_usd?: number;
+}
+```
+
+This is backwards compatible: if `structured_output` is missing, the code falls back to parsing `result` as JSON.
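The fallback can be checked in isolation. A minimal sketch, assuming handleAgentCompletion only needs the parsed value: `parseAgentOutput` is a hypothetical extraction of that one line so both paths can be unit-tested without the CLI. Note that `??` (and the equivalent `!= null` below) also falls back when `structured_output` is `null`, and that `JSON.parse('')` throws a bare SyntaxError, so the sketch reports an empty response explicitly.

```typescript
// Mirrors the ClaudeCliResult interface from this task.
interface ClaudeCliResult {
  type: 'result';
  subtype: 'success' | 'error';
  is_error: boolean;
  session_id: string;
  result: string;
  structured_output?: unknown;
  total_cost_usd?: number;
}

// Hypothetical helper: prefer structured_output (--json-schema runs),
// otherwise fall back to parsing `result` as JSON (previous behavior).
function parseAgentOutput(cliResult: ClaudeCliResult): unknown {
  if (cliResult.structured_output != null) {
    return cliResult.structured_output;
  }
  if (cliResult.result.trim() === '') {
    // Neither field carries output; fail with context instead of a SyntaxError.
    throw new Error(`Claude CLI session ${cliResult.session_id} returned no output`);
  }
  return JSON.parse(cliResult.result);
}
```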
+  </action>
+  <verify>npm run build passes, existing tests still pass</verify>
+  <done>ClaudeAgentManager correctly reads structured_output from Claude CLI response</done>
+</task>
+
+<task type="auto">
+  <name>Task 3: Run real Claude tests and document findings</name>
+  <files>src/test/integration/real-claude.test.ts</files>
+  <action>
+Run the real Claude tests manually and document findings:
+
+1. Enable the tests temporarily by removing `.skip` or setting `REAL_CLAUDE_TESTS=1`
+2. Run: `npm test -- src/test/integration/real-claude.test.ts`
+3. Capture results:
+   - Which tests pass/fail
+   - Response times
+   - Costs per test
+   - Any unexpected behavior
+
+4. Add the findings as a comment in the test file:
+   ```typescript
+   /**
+    * Real Claude CLI Integration Tests
+    *
+    * Findings from validation run (DATE):
+    * - Execute mode: Works, ~$X.XX, ~Xs
+    * - Multi-question: Works, array format validated
+    * - Discuss mode: Works, decisions array validated
+    * - Breakdown mode: Works, phases array validated
+    * - Decompose mode: Works, tasks array validated
+    *
+    * Total validation cost: $X.XX
+    *
+    * Conclusion: MockAgentManager accurately simulates real CLI behavior.
+    * JSON schemas work correctly with Claude CLI --json-schema flag.
+    */
+   ```
+
+5. Restore `.skip` (if it was removed) to prevent accidental runs in CI
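To fill in the cost and timing placeholders from step 4, each test can wrap its CLI call in a small recorder. A sketch, assuming the helper resolves to an object with an optional `costUsd` taken from the CLI's `total_cost_usd`; `FindingsLog` is a hypothetical name, and the output format simply mirrors the comment template above.

```typescript
// One row of the findings comment.
interface Finding {
  name: string;
  ok: boolean;
  seconds: number;
  costUsd: number;
}

// Hypothetical recorder: times each real-Claude call and keeps its reported cost.
class FindingsLog {
  private readonly findings: Finding[] = [];

  async record<T extends { costUsd?: number }>(name: string, run: () => Promise<T>): Promise<T> {
    const start = Date.now();
    try {
      const value = await run();
      this.push(name, true, start, value.costUsd ?? 0);
      return value;
    } catch (error) {
      // Failed calls are still logged (cost unknown, logged as 0) before the error propagates.
      this.push(name, false, start, 0);
      throw error;
    }
  }

  // Lines in the same "- <mode>: Works, ~$X.XX, ~Xs" shape as the findings template.
  summary(): string[] {
    const lines = this.findings.map(
      (f) => `- ${f.name}: ${f.ok ? 'Works' : 'Fails'}, ~$${f.costUsd.toFixed(2)}, ~${Math.round(f.seconds)}s`,
    );
    const total = this.findings.reduce((sum, f) => sum + f.costUsd, 0);
    lines.push(`Total validation cost: $${total.toFixed(2)}`);
    return lines;
  }

  private push(name: string, ok: boolean, start: number, costUsd: number): void {
    this.findings.push({ name, ok, seconds: (Date.now() - start) / 1000, costUsd });
  }
}
```

In the suite, a shared instance would wrap each call (e.g. `await log.record('Execute mode', () => callClaude(...))`, where `callClaude` stands for the Task 1 helper), and an `afterAll` hook would print `log.summary()` for pasting into the findings comment.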
+  </action>
+  <verify>Tests run successfully when enabled, findings documented in file</verify>
+  <done>Real Claude CLI behavior validated, findings documented, tests skipped for CI</done>
+</task>
+
+</tasks>
+
+<verification>
+Before declaring plan complete:
+- [ ] src/test/integration/real-claude.test.ts exists with all mode tests
+- [ ] ClaudeAgentManager reads structured_output field
+- [ ] npm run build passes
+- [ ] npm test passes (integration tests skipped)
+- [ ] Manual run of real tests documents findings
+</verification>
+
+<success_criteria>
+
+- Integration test file created with real Claude CLI tests
+- Tests are skipped by default (cost/time)
+- ClaudeAgentManager correctly parses structured_output
+- At least one real test run validates expected behavior
+- Findings documented in test file comments
+</success_criteria>
+
+<output>
+After completion, create `.planning/phases/13-real-claude-e2e-tests/13-01-SUMMARY.md`
+</output>