docs(13): create phase plan for real Claude CLI integration tests

Phase 13: Real Claude E2E Tests
- 1 plan in 1 wave
- Validates JSON schemas with actual Claude CLI
- Fixes structured_output parsing in ClaudeAgentManager
- Tests skipped by default (expensive)
.planning/phases/13-real-claude-e2e-tests/13-01-PLAN.md
@@ -0,0 +1,178 @@
---
phase: 13-real-claude-e2e-tests
plan: 01
type: execute
wave: 1
depends_on: []
files_modified: [src/test/integration/real-claude.test.ts, src/agent/manager.ts]
autonomous: true
---

<objective>
Create integration tests that validate Claude CLI JSON schema behavior with real Claude calls.

Purpose: Verify that the JSON schemas defined in src/agent/schema.ts work correctly with the actual Claude CLI, confirming that MockAgentManager accurately simulates real behavior.
Output: Integration test file with real Claude CLI tests (skipped by default due to cost/time), with findings documented in the file.
</objective>

<execution_context>
@~/.claude/get-shit-done/workflows/execute-plan.md
@~/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md

# Key source files
@src/agent/manager.ts
@src/agent/schema.ts
@src/agent/prompts.ts
@src/test/harness.ts
</context>

<tasks>

<task type="auto">
<name>Task 1: Create real Claude CLI integration test file</name>
<files>src/test/integration/real-claude.test.ts</files>
<action>
Create an integration test file for real Claude CLI validation. Structure:

1. Create the `src/test/integration/` directory if it does not exist
2. Create the test file with a `describe.skip` wrapper (tests are expensive, run manually)
3. Add a helper function that calls the Claude CLI directly using execa:
   - Takes a prompt and a JSON schema
   - Returns the parsed structured_output from the CLI response
   - Handles timeout (30s default)

4. Add test cases for each agent mode:
   - Execute mode: done status with result
   - Execute mode: questions status with array
   - Discuss mode: context_complete with decisions
   - Breakdown mode: breakdown_complete with phases
   - Decompose mode: decompose_complete with tasks

5. Each test should:
   - Use a minimal prompt that triggers the expected output
   - Verify the structured_output field is populated
   - Verify the output passes Zod schema validation
   - Log cost for documentation

Use `describe.skip` so the tests don't run in CI. For the command below to enable them, the skip has to be conditional on the `REAL_CLAUDE_TESTS` environment variable rather than unconditional. Add a comment explaining how to run manually:
`REAL_CLAUDE_TESTS=1 npm test -- --grep "Real Claude"`

Key insight from validation: the Claude CLI returns the `structured_output` field (not `result`) when using --json-schema.
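
The helper described in step 3 could be sketched roughly as follows. This is an illustration under assumptions, not the project's implementation: the names `Runner`, `buildClaudeArgs`, `parseCliStdout`, and `callClaude` are hypothetical, and only the `--json-schema` flag comes from this plan (the `-p` and `--output-format json` flags are assumed). The process runner is injected so that execa can be wired in for real runs while the parsing logic stays testable without spending money.

```typescript
// Sketch only: all names here are hypothetical, not taken from the project.

/** Minimal shape of the CLI's JSON envelope, following the sample in Task 2. */
interface CliEnvelope {
  type: string;
  result: string;
  structured_output?: unknown;
  total_cost_usd?: number;
}

/** Injected process runner: resolves with stdout. Back it with execa in real runs. */
type Runner = (cmd: string, args: string[], timeoutMs: number) => Promise<string>;

/** Build the CLI argument list. Flags other than --json-schema are assumptions. */
function buildClaudeArgs(prompt: string, schema: object): string[] {
  return ['-p', prompt, '--output-format', 'json', '--json-schema', JSON.stringify(schema)];
}

/** Parse stdout and return structured_output, failing loudly if it is absent. */
function parseCliStdout(stdout: string): { output: unknown; costUsd: number | undefined } {
  const envelope = JSON.parse(stdout) as CliEnvelope;
  if (envelope.structured_output === undefined || envelope.structured_output === null) {
    throw new Error('Claude CLI response has no structured_output field');
  }
  return { output: envelope.structured_output, costUsd: envelope.total_cost_usd };
}

/** Call the CLI through the injected runner, with the 30s default timeout. */
async function callClaude(run: Runner, prompt: string, schema: object, timeoutMs = 30000) {
  const stdout = await run('claude', buildClaudeArgs(prompt, schema), timeoutMs);
  return parseCliStdout(stdout);
}
```

Returning `costUsd` alongside the output keeps the "log cost" step in each test to a single line, and throwing on a missing `structured_output` makes that failure mode visible instead of silently validating `undefined`.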
</action>
<verify>File exists at src/test/integration/real-claude.test.ts with skipped test suite</verify>
<done>Integration test file created with all mode tests, skipped by default</done>
</task>

<task type="auto">
<name>Task 2: Fix ClaudeAgentManager to parse structured_output</name>
<files>src/agent/manager.ts</files>
<action>
Update handleAgentCompletion to read from the `structured_output` field instead of parsing `result` as JSON.

Current code (line ~190):
```typescript
const rawOutput = JSON.parse(cliResult.result);
```

The Claude CLI with --json-schema returns:
```json
{
  "type": "result",
  "result": "",
  "structured_output": { "status": "done", "result": "..." }
}
```

Update to:
```typescript
// When --json-schema is used, structured output is in the structured_output field
const rawOutput = cliResult.structured_output ?? JSON.parse(cliResult.result);
```

Also update the ClaudeCliResult interface to include structured_output:
```typescript
interface ClaudeCliResult {
  type: 'result';
  subtype: 'success' | 'error';
  is_error: boolean;
  session_id: string;
  result: string;
  structured_output?: unknown; // Add this
  total_cost_usd?: number;
}
```

This is backward compatible: if structured_output is missing, the code falls back to parsing result.
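
One edge case of the fallback: if `structured_output` is missing and `result` is an empty string (as in the sample response above), `JSON.parse("")` throws a bare `SyntaxError`. A minimal sketch of the same logic with an explicit error for that case, assuming a hypothetical helper name `extractRawOutput` that is not part of the existing code:

```typescript
interface ClaudeCliResult {
  type: 'result';
  subtype: 'success' | 'error';
  is_error: boolean;
  session_id: string;
  result: string;
  structured_output?: unknown;
  total_cost_usd?: number;
}

/**
 * Hypothetical helper: prefer structured_output, fall back to parsing result.
 * Behaves like `structured_output ?? JSON.parse(result)` but reports an
 * empty result with a descriptive error instead of a bare SyntaxError.
 */
function extractRawOutput(cliResult: ClaudeCliResult): unknown {
  if (cliResult.structured_output !== undefined && cliResult.structured_output !== null) {
    return cliResult.structured_output;
  }
  if (cliResult.result.trim() === '') {
    throw new Error('CLI response has neither structured_output nor a JSON result string');
  }
  return JSON.parse(cliResult.result);
}
```

Whether the extra check is worth it depends on how handleAgentCompletion already reports parse failures; the one-line `??` form in this task is sufficient if a thrown `SyntaxError` is handled upstream.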
</action>
<verify>npm run build passes, existing tests still pass</verify>
<done>ClaudeAgentManager correctly reads structured_output from Claude CLI response</done>
</task>

<task type="auto">
<name>Task 3: Run real Claude tests and document findings</name>
<files>src/test/integration/real-claude.test.ts</files>
<action>
Run the real Claude tests manually and document findings:

1. Enable the tests temporarily by removing `.skip` or setting the `REAL_CLAUDE_TESTS` env var
2. Run: `npm test -- src/test/integration/real-claude.test.ts`
3. Capture results:
   - Which tests pass/fail
   - Response times
   - Costs per test
   - Any unexpected behavior

4. Add findings as comments in the test file:
   ```typescript
   /**
    * Real Claude CLI Integration Tests
    *
    * Findings from validation run (DATE):
    * - Execute mode: Works, ~$X.XX, ~Xs
    * - Multi-question: Works, array format validated
    * - Discuss mode: Works, decisions array validated
    * - Breakdown mode: Works, phases array validated
    * - Decompose mode: Works, tasks array validated
    *
    * Total validation cost: $X.XX
    *
    * Conclusion: MockAgentManager accurately simulates real CLI behavior.
    * JSON schemas work correctly with Claude CLI --json-schema flag.
    */
   ```

5. If `.skip` was removed, re-add it to prevent accidental runs in CI
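
To make steps 3 and 4 less error-prone, the per-test measurements can be collected and turned into the comment lines mechanically. A small sketch, assuming a hypothetical `RunRecord` shape and `formatFindings` helper (neither exists in the project):

```typescript
/** Hypothetical record of one real-CLI test run. */
interface RunRecord {
  mode: string;       // e.g. "Execute mode"
  passed: boolean;
  costUsd: number;    // total_cost_usd reported by the CLI
  durationMs: number; // wall-clock time measured by the test
}

/** Format records as the body lines of the findings comment, plus a total. */
function formatFindings(records: RunRecord[]): string[] {
  const lines = records.map((r) => {
    const status = r.passed ? 'Works' : 'FAILED';
    const seconds = Math.round(r.durationMs / 1000);
    return ` * - ${r.mode}: ${status}, ~$${r.costUsd.toFixed(2)}, ~${seconds}s`;
  });
  const total = records.reduce((sum, r) => sum + r.costUsd, 0);
  lines.push(` * Total validation cost: $${total.toFixed(2)}`);
  return lines;
}
```

The output is meant to be pasted into the comment template from step 4; the conclusion lines still need to be written by hand once the results have been reviewed.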
</action>
<verify>Tests run successfully when enabled, findings documented in file</verify>
<done>Real Claude CLI behavior validated, findings documented, tests skipped for CI</done>
</task>

</tasks>

<verification>
Before declaring plan complete:
- [ ] src/test/integration/real-claude.test.ts exists with all mode tests
- [ ] ClaudeAgentManager reads structured_output field
- [ ] npm run build passes
- [ ] npm test passes (integration tests skipped)
- [ ] Manual run of real tests documents findings
</verification>

<success_criteria>

- Integration test file created with real Claude CLI tests
- Tests are skipped by default (cost/time)
- ClaudeAgentManager correctly parses structured_output
- At least one real test run validates expected behavior
- Findings documented in test file comments
</success_criteria>

<output>
After completion, create `.planning/phases/13-real-claude-e2e-tests/13-01-SUMMARY.md`
</output>