Codewalkers/.planning/phases/13-real-claude-e2e-tests/13-01-PLAN.md
Lukas May 6835dd45d5 docs(13): create phase plan for real Claude CLI integration tests
Phase 13: Real Claude E2E Tests
- 1 plan in 1 wave
- Validates JSON schemas with actual Claude CLI
- Fixes structured_output parsing in ClaudeAgentManager
- Tests skipped by default (expensive)
2026-02-02 10:31:45 +01:00


phase: 13-real-claude-e2e-tests
plan: 01
type: execute
wave: 1
depends_on: (none)
files_modified: src/test/integration/real-claude.test.ts, src/agent/manager.ts
autonomous: true

Create integration tests that validate Claude CLI JSON schema behavior with real Claude calls.

Purpose: Verify that the JSON schemas defined in src/agent/schema.ts work correctly with the actual Claude CLI, confirming that MockAgentManager accurately simulates real behavior.

Output: Integration test file with real Claude CLI tests (skipped by default due to cost/time), documented findings.

<execution_context> @/.claude/get-shit-done/workflows/execute-plan.md @/.claude/get-shit-done/templates/summary.md </execution_context>

@.planning/PROJECT.md @.planning/ROADMAP.md @.planning/STATE.md

Key source files

@src/agent/manager.ts @src/agent/schema.ts @src/agent/prompts.ts @src/test/harness.ts

Task 1: Create real Claude CLI integration test file

Files: src/test/integration/real-claude.test.ts

Action: Create an integration test file for real Claude CLI validation. Structure:
  1. Create src/test/integration/ directory if not exists

  2. Create test file with describe.skip wrapper (tests are expensive, run manually)

  3. Add helper function to call Claude CLI directly using execa:

    • Takes prompt and JSON schema
    • Returns parsed structured_output from CLI response
    • Handles timeout (30s default)
  4. Add test cases for each agent mode:

    • Execute mode: done status with result
    • Execute mode: questions status with array
    • Discuss mode: context_complete with decisions
    • Breakdown mode: breakdown_complete with phases
    • Decompose mode: decompose_complete with tasks
  5. Each test should:

    • Use minimal prompt that triggers expected output
    • Verify structured_output field is populated
    • Verify output matches Zod schema validation
    • Log cost for documentation
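
The helper from step 3 could be split into two pure pieces that are easy to check without spending money: one that builds the CLI arguments and one that pulls `structured_output` out of the CLI's stdout. This is a minimal sketch, not the plan's implementation. The names `buildClaudeArgs`, `parseStructuredOutput`, and `DEFAULT_TIMEOUT_MS` are hypothetical, and the `-p` and `--output-format json` flags are assumptions; only `--json-schema` and the `structured_output` field come from the plan itself.

```typescript
// Hypothetical helper pieces for the real-Claude test file.
// Only --json-schema and structured_output are taken from the plan;
// the other flags and all names below are assumptions.

interface CliEnvelope {
  result?: string;
  structured_output?: unknown;
  total_cost_usd?: number;
}

// The plan asks for a 30s default timeout on each CLI call.
const DEFAULT_TIMEOUT_MS = 30_000;

/** Build the argument list for one non-interactive Claude CLI call. */
function buildClaudeArgs(prompt: string, schema: object): string[] {
  return [
    "-p", prompt,              // assumption: non-interactive print mode
    "--output-format", "json", // assumption: JSON envelope on stdout
    "--json-schema", JSON.stringify(schema),
  ];
}

/** Parse CLI stdout and return the structured_output payload. */
function parseStructuredOutput(stdout: string): unknown {
  const envelope = JSON.parse(stdout) as CliEnvelope;
  if (envelope.structured_output === undefined || envelope.structured_output === null) {
    throw new Error("Claude CLI response has no structured_output field");
  }
  return envelope.structured_output;
}
```

In the test file, the two pieces would presumably be joined by the process call, for example `execa("claude", buildClaudeArgs(prompt, schema), { timeout: DEFAULT_TIMEOUT_MS })`, with the resulting stdout passed to `parseStructuredOutput` and then to the Zod schema.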

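Use describe.skip so the tests don't run in CI. For the REAL_CLAUDE_TESTS variable to have any effect, the skip must be conditional on it; an unconditional describe.skip can only be lifted by editing the file. Add a comment explaining how to run the tests manually:

REAL_CLAUDE_TESTS=1 npm test -- --grep "Real Claude"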
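
One way to make the skip depend on the environment variable is to choose between `describe` and `describe.skip` at load time. The sketch below is an assumption about how that could look: `realClaudeSuite` is a hypothetical name, and the `SkippableSuite` type only models the part of a test runner's `describe` that is needed here, since the plan does not name the runner.

```typescript
// Hypothetical env gate for the expensive suite.
// SkippableSuite models only the describe / describe.skip pair.

type SuiteBody = () => void;
type Suite = (name: string, body: SuiteBody) => void;
type SkippableSuite = Suite & { skip: Suite };

/** Return a suite function that runs only when REAL_CLAUDE_TESTS=1. */
function realClaudeSuite(
  describeFn: SkippableSuite,
  env: Record<string, string | undefined>,
): Suite {
  const enabled = env.REAL_CLAUDE_TESTS === "1";
  // Call through describeFn so the runner's own binding is preserved.
  return (name, body) =>
    enabled ? describeFn(name, body) : describeFn.skip(name, body);
}

// Usage inside the test file (assuming a global `describe`):
//   realClaudeSuite(describe, process.env)("Real Claude CLI", () => { /* tests */ });
```

With this gate the suite stays skipped in CI as long as the variable is unset, and no source edit is needed for a manual run.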

Key insight from validation: the Claude CLI returns its payload in the structured_output field (not result) when --json-schema is used.

Verify: File exists at src/test/integration/real-claude.test.ts with a skipped test suite.

Done: Integration test file created with all mode tests, skipped by default.

Task 2: Fix ClaudeAgentManager to parse structured_output

Files: src/agent/manager.ts

Action: Update handleAgentCompletion to read from the `structured_output` field instead of parsing `result` as JSON.

Current code (line ~190):

const rawOutput = JSON.parse(cliResult.result);

The Claude CLI with --json-schema returns:

{
  "type": "result",
  "result": "",
  "structured_output": { "status": "done", "result": "..." }
}

Update to:

// When --json-schema is used, structured output is in structured_output field
const rawOutput = cliResult.structured_output ?? JSON.parse(cliResult.result);

Also update ClaudeCliResult interface to include structured_output:

interface ClaudeCliResult {
  type: 'result';
  subtype: 'success' | 'error';
  is_error: boolean;
  session_id: string;
  result: string;
  structured_output?: unknown;  // Add this
  total_cost_usd?: number;
}

This is backwards compatible: if structured_output is missing, the code falls back to parsing result.

Verify: npm run build passes and existing tests still pass.

Done: ClaudeAgentManager correctly reads structured_output from the Claude CLI response.
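
One edge case in the fallback is worth spelling out: when `structured_output` is absent and `result` is an empty string, `JSON.parse("")` throws a `SyntaxError`. The sketch below wraps the same logic in a hypothetical `extractRawOutput` function (the name and the error message are assumptions, not code from src/agent/manager.ts) and raises a more descriptive error for that case.

```typescript
// Same shape as the ClaudeCliResult interface shown in the plan.
interface ClaudeCliResult {
  type: "result";
  subtype: "success" | "error";
  is_error: boolean;
  session_id: string;
  result: string;
  structured_output?: unknown;
  total_cost_usd?: number;
}

/** Hypothetical helper: prefer structured_output, fall back to parsing result. */
function extractRawOutput(cliResult: ClaudeCliResult): unknown {
  // `!= null` matches the `??` operator: both undefined and null fall through.
  if (cliResult.structured_output != null) {
    return cliResult.structured_output;
  }
  if (cliResult.result.trim() === "") {
    // JSON.parse("") would throw a bare SyntaxError; fail with context instead.
    throw new Error("Claude CLI returned neither structured_output nor a JSON result");
  }
  return JSON.parse(cliResult.result);
}
```

Whether the extra guard belongs in handleAgentCompletion is a design choice; the one-line `??` version behaves the same on the two well-formed inputs.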

Task 3: Run real Claude tests and document findings

Files: src/test/integration/real-claude.test.ts

Action: Run the real Claude tests manually and document findings:
  1. Enable tests temporarily by removing .skip or setting env var

  2. Run: npm test -- src/test/integration/real-claude.test.ts

  3. Capture results:

    • Which tests pass/fail
    • Response times
    • Costs per test
    • Any unexpected behavior
  4. Add findings as comments in test file:

    /**
     * Real Claude CLI Integration Tests
     *
     * Findings from validation run (DATE):
     * - Execute mode: Works, ~$X.XX, ~Xs
     * - Multi-question: Works, array format validated
     * - Discuss mode: Works, decisions array validated
     * - Breakdown mode: Works, phases array validated
     * - Decompose mode: Works, tasks array validated
     *
     * Total validation cost: $X.XX
     *
     * Conclusion: MockAgentManager accurately simulates real CLI behavior.
     * JSON schemas work correctly with Claude CLI --json-schema flag.
     */
    
  5. Re-add .skip to prevent accidental runs in CI

Verify: Tests run successfully when enabled, findings documented in file.

Done: Real Claude CLI behavior validated, findings documented, tests skipped for CI.
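
The captured results from step 3 can be turned into the findings comment mechanically, which keeps the per-test costs and the total consistent. This is a sketch under assumptions: `RunFinding` and `formatFindings` are hypothetical names, and the numbers must come from an actual run, since none are recorded in this plan.

```typescript
// Hypothetical record of one real-Claude test run.
interface RunFinding {
  mode: string;       // e.g. "Execute mode"
  passed: boolean;
  costUsd: number;    // cost reported by the CLI for this call
  durationMs: number; // wall-clock time of the call
}

/** Render the findings header comment for the test file. */
function formatFindings(date: string, findings: RunFinding[]): string {
  const lines = findings.map((f) => {
    const status = f.passed ? "Works" : "FAILED";
    const seconds = Math.round(f.durationMs / 1000);
    return ` * - ${f.mode}: ${status}, ~$${f.costUsd.toFixed(2)}, ~${seconds}s`;
  });
  const total = findings.reduce((sum, f) => sum + f.costUsd, 0);
  return [
    "/**",
    " * Real Claude CLI Integration Tests",
    " *",
    ` * Findings from validation run (${date}):`,
    ...lines,
    " *",
    ` * Total validation cost: $${total.toFixed(2)}`,
    " */",
  ].join("\n");
}
```

The conclusion lines of the comment are left out on purpose: whether MockAgentManager matches real CLI behavior is a judgment to write by hand after reading the results.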

Before declaring plan complete:
- [ ] src/test/integration/real-claude.test.ts exists with all mode tests
- [ ] ClaudeAgentManager reads structured_output field
- [ ] npm run build passes
- [ ] npm test passes (integration tests skipped)
- [ ] Manual run of real tests documents findings

<success_criteria>

  • Integration test file created with real Claude CLI tests
  • Tests are skipped by default (cost/time)
  • ClaudeAgentManager correctly parses structured_output
  • At least one real test run validates expected behavior
  • Findings documented in test file comments
</success_criteria>
After completion, create `.planning/phases/13-real-claude-e2e-tests/13-01-SUMMARY.md`