docs(13): create phase plan for real Claude CLI integration tests

Phase 13: Real Claude E2E Tests
- 1 plan in 1 wave
- Validates JSON schemas with actual Claude CLI
- Fixes structured_output parsing in ClaudeAgentManager
- Tests skipped by default (expensive)
Lukas May
2026-02-02 10:31:45 +01:00
parent 5de7cd5f04
commit 6835dd45d5


@@ -0,0 +1,178 @@
---
phase: 13-real-claude-e2e-tests
plan: 01
type: execute
wave: 1
depends_on: []
files_modified: [src/test/integration/real-claude.test.ts, src/agent/manager.ts]
autonomous: true
---
<objective>
Create integration tests that validate Claude CLI JSON schema behavior with real Claude calls.
Purpose: Verify that the JSON schemas defined in src/agent/schema.ts work correctly with the actual Claude CLI, confirming MockAgentManager accurately simulates real behavior.
Output: Integration test file with real Claude CLI tests (skipped by default due to cost/time), documented findings.
</objective>
<execution_context>
@~/.claude/get-shit-done/workflows/execute-plan.md
@~/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
# Key source files
@src/agent/manager.ts
@src/agent/schema.ts
@src/agent/prompts.ts
@src/test/harness.ts
</context>
<tasks>
<task type="auto">
<name>Task 1: Create real Claude CLI integration test file</name>
<files>src/test/integration/real-claude.test.ts</files>
<action>
Create the integration test file for real Claude CLI validation. Structure:
1. Create the `src/test/integration/` directory if it does not exist
2. Create the test file with a `describe.skip` wrapper (the tests are expensive and are run manually)
3. Add a helper function that calls the Claude CLI directly using execa:
   - Takes a prompt and a JSON schema
   - Returns the parsed `structured_output` from the CLI response
   - Handles timeouts (30s default)
4. Add test cases for each agent mode:
   - Execute mode: `done` status with result
   - Execute mode: `questions` status with array
   - Discuss mode: `context_complete` with decisions
   - Breakdown mode: `breakdown_complete` with phases
   - Decompose mode: `decompose_complete` with tasks
5. Each test should:
   - Use a minimal prompt that triggers the expected output
   - Verify that the `structured_output` field is populated
   - Verify that the output passes Zod schema validation
   - Log the cost for documentation
Use `describe.skip` so the tests don't run in CI. An unconditional `describe.skip` ignores environment variables, so make the skip depend on `REAL_CLAUDE_TESTS` and add a comment explaining how to run the suite manually:
`REAL_CLAUDE_TESTS=1 npm test -- --grep "Real Claude"`
Key insight from validation: when invoked with `--json-schema`, the Claude CLI returns the structured data in the `structured_output` field, not in `result`.
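The helper and the skip gate described in this task could be sketched roughly as follows. This is a minimal sketch, not the final test file: `realClaudeEnabled`, `buildClaudeArgs`, `readStructuredOutput`, and `DEFAULT_TIMEOUT_MS` are hypothetical names, and the `-p` and `--output-format json` arguments are an assumption about how the CLI is invoked (only `--json-schema` is named in this plan). The functions are kept pure so that they can be checked without spending money on a real call; Zod validation is left out.

```typescript
// Sketch only: pure helpers for the real-Claude test file. All names are hypothetical.

/** Default timeout for one CLI call, per the plan (30s). */
const DEFAULT_TIMEOUT_MS = 30 * 1000;

/** Minimal view of the CLI's JSON response; only the fields used here. */
interface CliResponse {
  result?: string;
  structured_output?: unknown;
  total_cost_usd?: number;
}

/** True only when the suite has been explicitly enabled via the environment. */
function realClaudeEnabled(env: Record<string, string | undefined>): boolean {
  return env.REAL_CLAUDE_TESTS === '1';
}

/**
 * Builds the argument list for one CLI call.
 * ASSUMPTION: `-p` and `--output-format json` are how the CLI is invoked here;
 * only `--json-schema` is named in the plan.
 */
function buildClaudeArgs(prompt: string, schema: object): string[] {
  return ['-p', prompt, '--output-format', 'json', '--json-schema', JSON.stringify(schema)];
}

/** Parses CLI stdout and returns structured_output, failing loudly if it is absent. */
function readStructuredOutput(stdout: string): unknown {
  const response = JSON.parse(stdout) as CliResponse;
  if (response.structured_output === undefined || response.structured_output === null) {
    throw new Error('Claude CLI response has no structured_output field');
  }
  return response.structured_output;
}

// In the test file, the helpers would be wired up roughly like this:
//   const suite = realClaudeEnabled(process.env) ? describe : describe.skip;
//   const { stdout } = await execa('claude', buildClaudeArgs(prompt, schema), {
//     timeout: DEFAULT_TIMEOUT_MS,
//   });
//   const output = readStructuredOutput(stdout);
```

Choosing between `describe` and `describe.skip` at load time keeps the suite skipped in CI while letting the environment variable enable it without editing the file.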
</action>
<verify>File exists at src/test/integration/real-claude.test.ts with skipped test suite</verify>
<done>Integration test file created with all mode tests, skipped by default</done>
</task>
<task type="auto">
<name>Task 2: Fix ClaudeAgentManager to parse structured_output</name>
<files>src/agent/manager.ts</files>
<action>
Update `handleAgentCompletion` to read from the `structured_output` field instead of parsing `result` as JSON.
Current code (line ~190):
```typescript
const rawOutput = JSON.parse(cliResult.result);
```
The Claude CLI with --json-schema returns:
```json
{
  "type": "result",
  "result": "",
  "structured_output": { "status": "done", "result": "..." }
}
```
Update to:
```typescript
// When --json-schema is used, structured output is in structured_output field
const rawOutput = cliResult.structured_output ?? JSON.parse(cliResult.result);
```
Also update the `ClaudeCliResult` interface to include `structured_output`:
```typescript
interface ClaudeCliResult {
  type: 'result';
  subtype: 'success' | 'error';
  is_error: boolean;
  session_id: string;
  result: string;
  structured_output?: unknown; // Add this
  total_cost_usd?: number;
}
```
This is backward compatible: if `structured_output` is missing, the code falls back to parsing `result` as JSON.
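One edge case in the fallback is worth making explicit: if `structured_output` is absent and `result` is an empty string, `JSON.parse('')` throws a bare `SyntaxError`. The following is a minimal sketch of the same logic as a standalone function, assuming it is acceptable to raise a more descriptive error in that case; `extractRawOutput` is a hypothetical name, not an existing function in `src/agent/manager.ts`.

```typescript
// Sketch of the fallback described above. `extractRawOutput` is a hypothetical name.

interface ClaudeCliResult {
  type: 'result';
  subtype: 'success' | 'error';
  is_error: boolean;
  session_id: string;
  result: string;
  structured_output?: unknown;
  total_cost_usd?: number;
}

function extractRawOutput(cliResult: ClaudeCliResult): unknown {
  // Preferred path: --json-schema populated structured_output.
  // The null/undefined check mirrors the semantics of the `??` operator.
  if (cliResult.structured_output !== undefined && cliResult.structured_output !== null) {
    return cliResult.structured_output;
  }
  // Legacy path: result carries the JSON as a string. An empty string would make
  // JSON.parse throw a bare SyntaxError, so report that case explicitly.
  if (cliResult.result.trim() === '') {
    throw new Error('Claude CLI returned neither structured_output nor a JSON result');
  }
  return JSON.parse(cliResult.result);
}
```

Keeping the logic in one small function also makes both paths easy to cover with unit tests that use canned CLI responses.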
</action>
<verify>npm run build passes, existing tests still pass</verify>
<done>ClaudeAgentManager correctly reads structured_output from Claude CLI response</done>
</task>
<task type="auto">
<name>Task 3: Run real Claude tests and document findings</name>
<files>src/test/integration/real-claude.test.ts</files>
<action>
Run the real Claude tests manually and document the findings:
1. Enable the tests temporarily by removing `.skip` or setting the `REAL_CLAUDE_TESTS` environment variable
2. Run: `npm test -- src/test/integration/real-claude.test.ts`
3. Capture the results:
   - Which tests pass or fail
   - Response times
   - Cost per test
   - Any unexpected behavior
4. Add the findings as a comment in the test file:
```typescript
/**
 * Real Claude CLI Integration Tests
 *
 * Findings from validation run (DATE):
 * - Execute mode: Works, ~$X.XX, ~Xs
 * - Multi-question: Works, array format validated
 * - Discuss mode: Works, decisions array validated
 * - Breakdown mode: Works, phases array validated
 * - Decompose mode: Works, tasks array validated
 *
 * Total validation cost: $X.XX
 *
 * Conclusion: MockAgentManager accurately simulates real CLI behavior.
 * JSON schemas work correctly with Claude CLI --json-schema flag.
 */
```
5. Re-add `.skip` if it was removed, to prevent accidental runs in CI
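The cost and timing lines for the findings comment can be generated from per-test measurements instead of being typed by hand. This is a minimal sketch under stated assumptions: `RunRecord`, `formatFinding`, and `formatTotal` are hypothetical names, the cost is assumed to come from `total_cost_usd` in the CLI response, and the duration is assumed to be measured around each CLI call.

```typescript
// Sketch: turning per-test measurements into findings lines. All names are hypothetical.

interface RunRecord {
  label: string;      // e.g. "Execute mode"
  passed: boolean;
  costUsd: number;    // assumed to come from total_cost_usd in the CLI response
  durationMs: number; // wall-clock time measured around the CLI call
}

/** Formats one line in the style of the findings comment template. */
function formatFinding(r: RunRecord): string {
  const status = r.passed ? 'Works' : 'FAILED';
  const seconds = Math.round(r.durationMs / 1000);
  return ` * - ${r.label}: ${status}, ~$${r.costUsd.toFixed(2)}, ~${seconds}s`;
}

/** Sums the cost of all recorded runs into the total line. */
function formatTotal(records: RunRecord[]): string {
  const total = records.reduce((sum, r) => sum + r.costUsd, 0);
  return ` * Total validation cost: $${total.toFixed(2)}`;
}
```

Each test would push one record after its CLI call, and the formatted lines would be printed once the suite finishes so that they can be pasted into the comment.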
</action>
<verify>Tests run successfully when enabled, findings documented in file</verify>
<done>Real Claude CLI behavior validated, findings documented, tests skipped for CI</done>
</task>
</tasks>
<verification>
Before declaring plan complete:
- [ ] src/test/integration/real-claude.test.ts exists with all mode tests
- [ ] ClaudeAgentManager reads structured_output field
- [ ] npm run build passes
- [ ] npm test passes (integration tests skipped)
- [ ] Manual run of real tests documents findings
</verification>
<success_criteria>
- Integration test file created with real Claude CLI tests
- Tests are skipped by default (cost/time)
- ClaudeAgentManager correctly parses structured_output
- At least one real test run validates expected behavior
- Findings documented in test file comments
</success_criteria>
<output>
After completion, create `.planning/phases/13-real-claude-e2e-tests/13-01-SUMMARY.md`
</output>