Codewalkers/.planning/phases/09-extended-scenarios/09-02-PLAN.md
Lukas May 3168b30185 docs(08.1, 09): insert agent output schema phase, update phase 9
Phase 8.1: Agent Output Schema (INSERTED)
- 2 plans in 2 waves (sequential)
- Defines discriminated union schema (done/question/error)
- Updates ClaudeAgentManager to use --json-schema flag
- Aligns MockAgentManager with new schema

Phase 9: Extended Scenarios (updated)
- 2 plans in 1 wave (parallel)
- Now depends on Phase 8.1
- Updated scenario format references
2026-01-31 15:19:14 +01:00


phase: 09-extended-scenarios
plan: 02
type: execute
wave: 1
depends_on: []
files_modified: src/test/e2e/recovery-scenarios.test.ts
autonomous: true
phase_depends_on: 08.1-agent-output-schema

Create E2E tests proving recovery/resume after interruption and extended agent Q&A scenarios work correctly.

Purpose: Validate that the system can recover state after an interruption and handle complex agent question/answer flows.
Output: A recovery scenarios test file containing state persistence and Q&A flow tests.

<execution_context>
@/.claude/get-shit-done/workflows/execute-plan.md
@/.claude/get-shit-done/templates/summary.md
</execution_context>

@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/09-extended-scenarios/09-CONTEXT.md
@.planning/phases/08-e2e-scenario-tests/08-02-SUMMARY.md
@.planning/phases/08.1-agent-output-schema/08.1-02-SUMMARY.md

@src/test/harness.ts
@src/test/fixtures.ts
@src/test/e2e/edge-cases.test.ts

Task 1: Create recovery/resume scenario tests
Files: src/test/e2e/recovery-scenarios.test.ts
Action: Create a new test file with a describe block "Recovery after interruption". Test scenarios:
  1. Test: queue state survives harness recreation

    • Seed fixture, queue tasks
    • Get queue state (tasks in queue)
    • Create NEW harness pointing to SAME database
    • Query queue state from new harness
    • Verify: queue state matches (tasks still queued)

    Implementation note: createTestHarness() creates a fresh in-memory DB on every call, so a second harness cannot see the first one's data. For this test, either:

    • Extract the DB from the first harness and manually create a second harness that reuses the same DB, or
    • Modify the test to verify DB persistence directly
  2. Test: in-progress task recoverable after agent crash

    • Dispatch task, agent crashes mid-execution
    • Verify task status is 'in_progress' (not completed, not lost)
    • Queue same task again (should be dispatchable)
    • Dispatch to new agent
    • Agent completes successfully
    • Verify: task completed, merge can proceed
  3. Test: blocked task state persists and can be unblocked

    • Queue task, block it with reason
    • Verify task in blocked state in DB
    • "Simulate restart" by recreating managers with same DB
    • Query blocked tasks
    • Unblock task
    • Verify: task now dispatchable
  4. Test: merge queue state recoverable

    • Complete task, queue for merge
    • Verify merge queue has pending item
    • Query merge queue state
    • Process merge
    • Verify: merge completes correctly
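Scenario 2 depends on one rule: when an agent crashes, its task stays in_progress and can be queued again. The following is a minimal sketch of that rule using a hypothetical in-memory `TaskTable`. The class, the `queued`/`completed` status names, and the `attempts` counter are illustrative assumptions. They are not the project's task repository or harness API.

```typescript
// Hypothetical task lifecycle for scenario 2; the real statuses and
// transitions live in the project's repositories, not here.
type TaskStatus = "queued" | "in_progress" | "blocked" | "completed";

interface TaskRow {
  id: string;
  status: TaskStatus;
  attempts: number; // illustrative: how many times the task was dispatched
}

class TaskTable {
  private rows = new Map<string, TaskRow>();

  // Queue a new task, or re-queue one whose agent crashed (left in_progress).
  queue(id: string): void {
    const row = this.rows.get(id);
    if (row && row.status !== "queued" && row.status !== "in_progress") {
      throw new Error(`cannot queue task ${id} from status ${row.status}`);
    }
    this.rows.set(id, { id, status: "queued", attempts: row ? row.attempts : 0 });
  }

  dispatch(id: string): void {
    const row = this.mustGet(id);
    if (row.status !== "queued") throw new Error(`task ${id} is not queued`);
    row.status = "in_progress";
    row.attempts += 1;
  }

  complete(id: string): void {
    const row = this.mustGet(id);
    if (row.status !== "in_progress") throw new Error(`task ${id} is not running`);
    row.status = "completed";
  }

  // Return a copy so callers can snapshot state between steps.
  get(id: string): TaskRow | undefined {
    const row = this.rows.get(id);
    return row ? { ...row } : undefined;
  }

  private mustGet(id: string): TaskRow {
    const row = this.rows.get(id);
    if (!row) throw new Error(`unknown task ${id}`);
    return row;
  }
}

const tasks = new TaskTable();
tasks.queue("task-1");
tasks.dispatch("task-1");
// The agent crashes here: complete() is never called.
const afterCrash = tasks.get("task-1");

// Recovery: re-queue, dispatch to a new agent, and complete.
tasks.queue("task-1");
tasks.dispatch("task-1");
tasks.complete("task-1");
const recovered = tasks.get("task-1");
```

In the real test, the crash would presumably come from a mock agent failure scenario rather than from simply skipping `complete()`.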

Focus on proving that DATABASE STATE is the source of truth and that managers can be recreated without losing work.

Verify: `npm test src/test/e2e/recovery-scenarios.test.ts -- --run` passes
Done: 4 recovery tests passing, proving state persistence works
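Below is a minimal sketch of the "database is the source of truth" property behind scenarios 1 and 3. It assumes a manager that keeps no state of its own. `Store` is a plain `Map` standing in for the shared DB handle, and `QueueManager` is a hypothetical name. Building a second manager over the same store plays the role of the simulated restart.

```typescript
// Stand-in for the shared database handle (assumption: a plain Map, not the
// project's real DB). Any manager built over it sees the same rows.
interface StoredTask {
  status: "queued" | "blocked";
  blockedReason?: string;
}
type Store = Map<string, StoredTask>;

// Hypothetical manager with no in-memory state of its own: every read and
// write goes straight to the store.
class QueueManager {
  private readonly db: Store;

  constructor(db: Store) {
    this.db = db;
  }

  enqueue(id: string): void {
    this.db.set(id, { status: "queued" });
  }

  block(id: string, reason: string): void {
    this.db.set(id, { status: "blocked", blockedReason: reason });
  }

  unblock(id: string): void {
    const task = this.db.get(id);
    if (!task || task.status !== "blocked") throw new Error(`task ${id} is not blocked`);
    this.db.set(id, { status: "queued" });
  }

  // Sorted so comparisons do not depend on insertion order.
  idsWithStatus(status: StoredTask["status"]): string[] {
    return Array.from(this.db.entries())
      .filter(([, task]) => task.status === status)
      .map(([id]) => id)
      .sort();
  }
}

const db: Store = new Map();

// First "process": seed tasks and block one of them.
const original = new QueueManager(db);
original.enqueue("task-a");
original.enqueue("task-b");
original.block("task-b", "needs user decision");
const queuedBefore = original.idsWithStatus("queued");

// Simulated restart: a fresh manager over the SAME store.
const restarted = new QueueManager(db);
const queuedAfter = restarted.idsWithStatus("queued");
const blockedAfter = restarted.idsWithStatus("blocked");
restarted.unblock("task-b");
const queuedAfterUnblock = restarted.idsWithStatus("queued");
```

The manager instance holds nothing of its own. A test that recreates managers can therefore only pass if every write actually reached the store.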

Task 2: Create extended agent Q&A scenario tests
Files: src/test/e2e/recovery-scenarios.test.ts
Action: Add a describe block "Agent Q&A extended scenarios" to the same test file. Test scenarios:
  1. Test: multiple questions in sequence from same agent

    • Dispatch task with a scenario that asks a question, then asks another after resume
    • Handle first question (agent:waiting -> resume -> agent:resumed)
    • Agent asks second question
    • Handle second question
    • Agent completes
    • Verify: 2 agent:waiting events, 2 agent:resumed events, 1 agent:stopped

    Implementation note: MockAgentManager may need a scenario that asks multiple questions. If that is not supported, test a single question but still verify that the waiting -> resumed -> stopped state machine works correctly.

  2. Test: question surfaces as message in message queue

    • Dispatch task with waiting_for_input scenario
    • Verify: agent:waiting event includes question
    • Check messageRepository for user-directed message
    • Verify: message contains the question text
  3. Test: agent resumes with user's answer in context

    • Dispatch task, agent asks question
    • Resume with specific answer "PostgreSQL"
    • Verify: resume call includes the answer
    • Agent completes
    • Verify: agent result reflects successful completion
  4. Test: waiting agent blocks task completion

    • Dispatch task, agent enters waiting_for_input
    • Attempt to complete task (should be rejected while the agent is waiting)
    • Resume agent, agent completes
    • Now complete task
    • Verify: completion is rejected while the agent is in waiting_for_input and succeeds after it stops
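The event counts in scenario 1 and the completion guard in scenario 4 both follow from a small state machine. The sketch below assumes a scripted agent that consumes one output per run or resume. `ScriptedAgent`, `canCompleteTask`, and the `running`/`stopped` state names are hypothetical stand-ins for MockAgentManager behavior. Only the event names and `waiting_for_input` come from this plan.

```typescript
// Hypothetical shape of what a scripted agent emits per run/resume.
type AgentOutput =
  | { status: "question"; question: string }
  | { status: "done"; result: string };

type AgentEvent = "agent:waiting" | "agent:resumed" | "agent:stopped";
type AgentState = "running" | "waiting_for_input" | "stopped";

class ScriptedAgent {
  readonly events: AgentEvent[] = [];
  readonly answers: string[] = [];
  state: AgentState = "running";
  pendingQuestion: string | null = null;
  private step = 0;
  private readonly script: AgentOutput[];

  constructor(script: AgentOutput[]) {
    this.script = script;
    this.advance(); // the initial run produces the first output
  }

  // Resume a waiting agent with the user's answer, then run to the next output.
  resume(answer: string): void {
    if (this.state !== "waiting_for_input") throw new Error("agent is not waiting");
    this.answers.push(answer);
    this.pendingQuestion = null;
    this.state = "running";
    this.events.push("agent:resumed");
    this.advance();
  }

  // Consume the next scripted output and enter the matching state.
  private advance(): void {
    const out = this.script[this.step++];
    if (out === undefined) throw new Error("script exhausted");
    if (out.status === "question") {
      this.state = "waiting_for_input";
      this.pendingQuestion = out.question;
      this.events.push("agent:waiting");
    } else {
      this.state = "stopped";
      this.events.push("agent:stopped");
    }
  }
}

// Scenario 4 guard (assumption): a task may only complete once its agent stopped.
function canCompleteTask(agent: ScriptedAgent): boolean {
  return agent.state === "stopped";
}

const agent = new ScriptedAgent([
  { status: "question", question: "Which database?" },
  { status: "question", question: "Which port?" },
  { status: "done", result: "Schema created" },
]);
const firstQuestion = agent.pendingQuestion;
const completableWhileWaiting = canCompleteTask(agent);
agent.resume("PostgreSQL");
agent.resume("5432");
const completableAfterStop = canCompleteTask(agent);
const count = (e: AgentEvent) => agent.events.filter((x) => x === e).length;
```

With two questions followed by a done output, the event log ends up containing two `agent:waiting`, two `agent:resumed`, and one `agent:stopped`, which matches the scenario 1 assertion.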

Use the new (Phase 8.1) schema format for scenarios:

  • { status: 'question', question: '...', options: [...] } for questions
  • { status: 'done', result: '...' } for success
  • { status: 'unrecoverable_error', error: '...' } for failures

Use harness.getPendingQuestion() to retrieve structured question data.

Verify: `npm test src/test/e2e/recovery-scenarios.test.ts -- --run` passes
Done: 4 Q&A tests passing, proving extended question flows work
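The scenario formats above can be expressed in TypeScript as a discriminated union on `status`, which lets the compiler check that every variant is handled. This is an assumed view of the Phase 8.1 schema, not its actual definition. `options` is modeled as `string[]` for illustration. `parseAgentOutput` and `expectedAgentState` are hypothetical helpers, and so is the `crashed` state.

```typescript
// Assumed TypeScript view of the Phase 8.1 output schema. `options` is
// modeled as string[] for illustration only.
type AgentOutput =
  | { status: "question"; question: string; options: string[] }
  | { status: "done"; result: string }
  | { status: "unrecoverable_error"; error: string };

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Validate untrusted JSON (e.g. a scenario fixture) into one union variant.
function parseAgentOutput(raw: unknown): AgentOutput {
  if (typeof raw !== "object" || raw === null) throw new Error("output must be an object");
  const { status, question, options, result, error } = raw as Record<string, unknown>;
  if (status === "question" && typeof question === "string" && isStringArray(options)) {
    return { status: "question", question, options };
  }
  if (status === "done" && typeof result === "string") {
    return { status: "done", result };
  }
  if (status === "unrecoverable_error" && typeof error === "string") {
    return { status: "unrecoverable_error", error };
  }
  throw new Error(`invalid agent output: ${JSON.stringify(raw)}`);
}

// Map each variant to the agent state a test would assert. The `never`
// assignment makes the compiler reject an unhandled new variant.
function expectedAgentState(out: AgentOutput): "waiting_for_input" | "stopped" | "crashed" {
  switch (out.status) {
    case "question":
      return "waiting_for_input";
    case "done":
      return "stopped";
    case "unrecoverable_error":
      return "crashed";
    default: {
      const unhandled: never = out;
      throw new Error(`unhandled output: ${JSON.stringify(unhandled)}`);
    }
  }
}

const parsedQuestion = parseAgentOutput(
  JSON.parse('{"status":"question","question":"Which database?","options":["PostgreSQL","SQLite"]}'),
);
```

`status` is the discriminant. A scenario literal typed as `AgentOutput` with a misspelled status (for example `'questoin'`) therefore fails to compile, which catches fixture mistakes before the test runs.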
Before declaring plan complete:
- [ ] `npm test src/test/e2e/recovery-scenarios.test.ts -- --run` passes
- [ ] At least 8 new tests (4 recovery + 4 Q&A)
- [ ] No flaky tests (run twice to verify)
- [ ] Test patterns consistent with existing E2E tests

<success_criteria>

  • All tasks completed
  • All verification checks pass
  • Recovery scenarios prove database is source of truth
  • Q&A flow handles multiple questions and state transitions
  • No regressions in existing E2E tests
</success_criteria>
After completion, create `.planning/phases/09-extended-scenarios/09-02-SUMMARY.md`