From 6835dd45d50e93deb6cf5b80a333c9a169716576 Mon Sep 17 00:00:00 2001
From: Lukas May
Date: Mon, 2 Feb 2026 10:31:45 +0100
Subject: [PATCH] docs(13): create phase plan for real Claude CLI integration tests

Phase 13: Real Claude E2E Tests
- 1 plan in 1 wave
- Validates JSON schemas with actual Claude CLI
- Fixes structured_output parsing in ClaudeAgentManager
- Tests skipped by default (expensive)
---
 .../13-real-claude-e2e-tests/13-01-PLAN.md | 178 ++++++++++++++++++
 1 file changed, 178 insertions(+)
 create mode 100644 .planning/phases/13-real-claude-e2e-tests/13-01-PLAN.md

diff --git a/.planning/phases/13-real-claude-e2e-tests/13-01-PLAN.md b/.planning/phases/13-real-claude-e2e-tests/13-01-PLAN.md
new file mode 100644
index 0000000..811597f
--- /dev/null
+++ b/.planning/phases/13-real-claude-e2e-tests/13-01-PLAN.md
@@ -0,0 +1,178 @@
+---
+phase: 13-real-claude-e2e-tests
+plan: 01
+type: execute
+wave: 1
+depends_on: []
+files_modified: [src/test/integration/real-claude.test.ts, src/agent/manager.ts]
+autonomous: true
+---
+
+
+Create integration tests that validate Claude CLI JSON schema behavior with real Claude calls.
+
+Purpose: Verify that the JSON schemas defined in src/agent/schema.ts work correctly with the actual Claude CLI, confirming that MockAgentManager accurately simulates real behavior.
+Output: Integration test file with real Claude CLI tests (skipped by default due to cost/time), documented findings.
+
+
+
+@~/.claude/get-shit-done/workflows/execute-plan.md
+@~/.claude/get-shit-done/templates/summary.md
+
+
+
+@.planning/PROJECT.md
+@.planning/ROADMAP.md
+@.planning/STATE.md
+
+# Key source files
+@src/agent/manager.ts
+@src/agent/schema.ts
+@src/agent/prompts.ts
+@src/test/harness.ts
+
+
+
+
+
+  Task 1: Create real Claude CLI integration test file
+  src/test/integration/real-claude.test.ts
+
+Create integration test file for real Claude CLI validation. Structure:
+
+1. Create `src/test/integration/` directory if it does not exist
+2. Create test file with `describe.skip` wrapper (tests are expensive, run manually)
+3. Add a helper function that calls the Claude CLI directly using execa:
+   - Takes prompt and JSON schema
+   - Returns parsed structured_output from CLI response
+   - Handles timeout (30s default)
+
+4. Add test cases for each agent mode:
+   - Execute mode: done status with result
+   - Execute mode: questions status with array
+   - Discuss mode: context_complete with decisions
+   - Breakdown mode: breakdown_complete with phases
+   - Decompose mode: decompose_complete with tasks
+
+5. Each test should:
+   - Use a minimal prompt that triggers the expected output
+   - Verify structured_output field is populated
+   - Verify output matches Zod schema validation
+   - Log cost for documentation
+
+Use `describe.skip` so tests don't run in CI. Add a comment explaining how to run them manually:
+`REAL_CLAUDE_TESTS=1 npm test -- --grep "Real Claude"`
+
+Key insight from validation: with --json-schema, the Claude CLI returns the payload in the `structured_output` field (not in `result`).
+
+  File exists at src/test/integration/real-claude.test.ts with skipped test suite
+  Integration test file created with all mode tests, skipped by default
+
+
+
+  Task 2: Fix ClaudeAgentManager to parse structured_output
+  src/agent/manager.ts
+
+Update handleAgentCompletion to read the `structured_output` field instead of parsing `result` as JSON.
+
+Current code (line ~190):
+```typescript
+const rawOutput = JSON.parse(cliResult.result);
+```
+
+The Claude CLI with --json-schema returns:
+```json
+{
+  "type": "result",
+  "result": "",
+  "structured_output": { "status": "done", "result": "..." }
+}
+```
+
+Update to:
+```typescript
+// When --json-schema is used, structured output is in structured_output field
+const rawOutput = cliResult.structured_output ?? JSON.parse(cliResult.result);
+```
+
+Also update the ClaudeCliResult interface to include structured_output:
+```typescript
+interface ClaudeCliResult {
+  type: 'result';
+  subtype: 'success' | 'error';
+  is_error: boolean;
+  session_id: string;
+  result: string;
+  structured_output?: unknown;  // Add this
+  total_cost_usd?: number;
+}
+```
+
+This is backwards compatible: if structured_output is missing, it falls back to parsing result.
+
+  npm run build passes, existing tests still pass
+  ClaudeAgentManager correctly reads structured_output from Claude CLI response
+
+
+
+  Task 3: Run real Claude tests and document findings
+  src/test/integration/real-claude.test.ts
+
+Run the real Claude tests manually and document findings:
+
+1. Enable tests temporarily by removing .skip or setting env var
+2. Run: `npm test -- src/test/integration/real-claude.test.ts`
+3. Capture results:
+   - Which tests pass/fail
+   - Response times
+   - Costs per test
+   - Any unexpected behavior
+
+4. Add findings as comments in test file:
+   ```typescript
+   /**
+    * Real Claude CLI Integration Tests
+    *
+    * Findings from validation run (DATE):
+    * - Execute mode: Works, ~$X.XX, ~Xs
+    * - Multi-question: Works, array format validated
+    * - Discuss mode: Works, decisions array validated
+    * - Breakdown mode: Works, phases array validated
+    * - Decompose mode: Works, tasks array validated
+    *
+    * Total validation cost: $X.XX
+    *
+    * Conclusion: MockAgentManager accurately simulates real CLI behavior.
+    * JSON schemas work correctly with Claude CLI --json-schema flag.
+    */
+   ```
+
+5. Re-add .skip to prevent accidental runs in CI
+
+  Tests run successfully when enabled, findings documented in file
+  Real Claude CLI behavior validated, findings documented, tests skipped for CI
+
+
+
+
+
+Before declaring plan complete:
+- [ ] src/test/integration/real-claude.test.ts exists with all mode tests
+- [ ] ClaudeAgentManager reads structured_output field
+- [ ] npm run build passes
+- [ ] npm test passes (integration tests skipped)
+- [ ] Manual run of real tests documents findings
+
+
+
+
+- Integration test file created with real Claude CLI tests
+- Tests are skipped by default (cost/time)
+- ClaudeAgentManager correctly parses structured_output
+- At least one real test run validates expected behavior
+- Findings documented in test file comments
+
+
+
+After completion, create `.planning/phases/13-real-claude-e2e-tests/13-01-SUMMARY.md`
+
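Task 1 wraps the suite in an unconditional `describe.skip` but documents `REAL_CLAUDE_TESTS=1` as the way to run it. A bare `describe.skip` ignores environment variables, so that command would still skip every test. A minimal sketch of one way to reconcile the two, assuming the test runner exposes `describe.skip` (Vitest, Jest and Mocha all do), is to pick the suite function from the environment. The helper name `realClaudeTestsEnabled` is hypothetical. Depending on the runner, the name filter may also need to be `-t` (`--testNamePattern`, used by Vitest and Jest) rather than Mocha's `--grep`.

```typescript
// Hypothetical helper (not in the plan): true only when the caller explicitly
// opted in with REAL_CLAUDE_TESTS=1; any other value, or no value, keeps the
// expensive suite skipped.
function realClaudeTestsEnabled(
  env: Record<string, string | undefined>,
): boolean {
  return env.REAL_CLAUDE_TESTS === "1";
}

// Sketch of use in real-claude.test.ts (describe and describe.skip come from
// the test runner):
//   const suite = realClaudeTestsEnabled(process.env) ? describe : describe.skip;
//   suite("Real Claude CLI", () => { /* one test per agent mode */ });
```

With this gating, CI stays skipped by default and Task 3's manual run needs only the environment variable, not an edit that removes and later re-adds `.skip`.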
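The Claude CLI helper described in Task 1 can be sketched as a pure argument builder, with the execa call kept in a comment. This assumes print mode (`-p`), `--output-format json` for the JSON result envelope, and that `--json-schema` accepts the schema inline as a JSON string. `buildClaudeArgs` is a hypothetical name, and the schema object would come from however src/agent/schema.ts exposes its JSON Schema.

```typescript
// Hypothetical helper for Task 1: builds arguments for a one-shot,
// non-interactive Claude CLI call that prints a single JSON result object.
// Assumption: --json-schema takes the schema inline as a JSON string.
function buildClaudeArgs(prompt: string, jsonSchema: object): string[] {
  return [
    "-p", // print mode: answer the prompt once and exit
    prompt,
    "--output-format",
    "json", // wrap the reply in the JSON result envelope
    "--json-schema",
    JSON.stringify(jsonSchema),
  ];
}

// With execa, as Task 1 suggests (timeout is in milliseconds):
//   const { stdout } = await execa("claude", buildClaudeArgs(prompt, schema), {
//     timeout: 30_000,
//   });
```

Keeping argument construction separate from the process call lets the argument list be unit-tested without spending money on a real CLI invocation.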
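The fallback in Task 2 can be exercised in isolation with a sketch like the following, which parses the CLI's stdout and prefers `structured_output` over `result`. The function name `extractStructuredOutput` and the `is_error` check are illustrative additions, not something the plan specifies for `handleAgentCompletion`. In the envelope shown in Task 2, `result` is an empty string, so the fallback only helps when `result` actually contains JSON; `JSON.parse("")` throws a `SyntaxError`.

```typescript
// Shape of the CLI's JSON result envelope, mirroring the interface in Task 2.
interface ClaudeCliResult {
  type: "result";
  subtype: "success" | "error";
  is_error: boolean;
  session_id: string;
  result: string;
  structured_output?: unknown;
  total_cost_usd?: number;
}

// Parses CLI stdout and returns the agent payload: structured_output when
// present (the --json-schema case), otherwise `result` parsed as JSON.
function extractStructuredOutput(stdout: string): unknown {
  const cliResult = JSON.parse(stdout) as ClaudeCliResult;
  if (cliResult.is_error) {
    // Illustrative: surface CLI-level failures instead of parsing them.
    throw new Error(`Claude CLI reported an error: ${cliResult.result}`);
  }
  // `??` falls back when structured_output is undefined or null.
  return cliResult.structured_output ?? JSON.parse(cliResult.result);
}
```

The returned value is `unknown` on purpose: it should still go through the Zod schemas from src/agent/schema.ts before being trusted.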
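For Task 3, per-test cost and timing can be collected into records and rendered in the layout of the findings comment. The `RunRecord` type and `formatFindings` function are hypothetical; `costUsd` is assumed to come from `total_cost_usd` in the CLI result and `durationMs` from timing the CLI call.

```typescript
// Hypothetical record of one real-CLI test run (not defined in the plan).
interface RunRecord {
  mode: string; // label used in the findings comment, e.g. "Execute mode"
  passed: boolean;
  costUsd: number; // e.g. total_cost_usd from the CLI result
  durationMs: number; // wall-clock time measured around the CLI call
}

// Renders one findings line per run plus the total cost, mirroring the
// comment layout in Task 3 (costs rounded to cents, times to whole seconds).
function formatFindings(runs: RunRecord[]): string {
  const lines = runs.map((r) => {
    const status = r.passed ? "Works" : "Fails";
    const cost = r.costUsd.toFixed(2);
    const seconds = Math.round(r.durationMs / 1000);
    return ` * - ${r.mode}: ${status}, ~$${cost}, ~${seconds}s`;
  });
  const total = runs.reduce((sum, r) => sum + r.costUsd, 0);
  return [...lines, " *", ` * Total validation cost: $${total.toFixed(2)}`].join("\n");
}
```

The total is summed from unrounded per-run costs, so it can differ by a cent from adding up the rounded per-line figures.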