Codewalkers/.planning/phases/09-extended-scenarios/09-02-PLAN.md at 57784576e410c85989f31451d94e0368d4ee53cd


Lukas May 3168b30185 docs(08.1, 09): insert agent output schema phase, update phase 9

Phase 8.1: Agent Output Schema (INSERTED)
- 2 plans in 2 waves (sequential)
- Defines discriminated union schema (done/question/error)
- Updates ClaudeAgentManager to use --json-schema flag
- Aligns MockAgentManager with new schema

Phase 9: Extended Scenarios (updated)
- 2 plans in 1 wave (parallel)
- Now depends on Phase 8.1
- Updated scenario format references

2026-01-31 15:19:14 +01:00

5.4 KiB


| phase | plan | type | wave | depends_on | files_modified | autonomous | phase_depends_on |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 09-extended-scenarios | | execute | | | src/test/e2e/recovery-scenarios.test.ts | true | 08.1-agent-output-schema |

Create E2E tests proving recovery/resume after interruption and extended agent Q&A scenarios work correctly.

Purpose: Validate that the system can recover state after an interruption and handle complex agent question/answer flows.

Output: Recovery scenarios test file with state persistence and Q&A flow tests.

<execution_context>
@~/.claude/get-shit-done/workflows/execute-plan.md
@~/.claude/get-shit-done/templates/summary.md
</execution_context>

@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/09-extended-scenarios/09-CONTEXT.md
@.planning/phases/08-e2e-scenario-tests/08-02-SUMMARY.md
@.planning/phases/08.1-agent-output-schema/08.1-02-SUMMARY.md

@src/test/harness.ts
@src/test/fixtures.ts
@src/test/e2e/edge-cases.test.ts

Task 1: Create recovery/resume scenario tests

Files: src/test/e2e/recovery-scenarios.test.ts

Create a new test file with the describe block "Recovery after interruption". Test scenarios:

Test: queue state survives harness recreation
- Seed fixture, queue tasks
- Get queue state (tasks in queue)
- Create NEW harness pointing to SAME database
- Query queue state from new harness
- Verify: queue state matches (tasks still queued)
Implementation note: createTestHarness() creates a fresh in-memory DB, so calling it twice would not share state. This test needs to either:
- Extract the DB from the first harness and create the second harness manually, reusing the same DB
- Or verify DB persistence directly, without a second harness
Test: in-progress task recoverable after agent crash
- Dispatch task, agent crashes mid-execution
- Verify task status is 'in_progress' (not completed, not lost)
- Queue same task again (should be dispatchable)
- Dispatch to new agent
- Agent completes successfully
- Verify: task completed, merge can proceed
Test: blocked task state persists and can be unblocked
- Queue task, block it with reason
- Verify task in blocked state in DB
- "Simulate restart" by recreating managers with same DB
- Query blocked tasks
- Unblock task
- Verify: task now dispatchable
Test: merge queue state recoverable
- Complete task, queue for merge
- Verify merge queue has pending item
- Query merge queue state
- Process merge
- Verify: merge completes correctly

Focus on proving that DATABASE STATE is the source of truth and that managers can be recreated without losing work.

Verify: `npm test src/test/e2e/recovery-scenarios.test.ts -- --run` passes.

Done: 4 recovery tests passing, proving state persistence works.
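The "database as source of truth" idea behind these recovery tests can be sketched without the real harness. The following is a minimal illustration only: `InMemoryDb`, `QueueManager`, and their methods are hypothetical stand-ins, not the Codewalkers APIs, and a plain `Map` plays the role of the shared database. The point it shows is that a manager holding no state of its own can be discarded and rebuilt against the same store.

```typescript
// Hypothetical stand-ins for illustration; none of these names come from the codebase.
type TaskStatus = "queued" | "in_progress" | "blocked" | "completed";

interface TaskRow {
  id: string;
  status: TaskStatus;
  blockedReason: string | null;
}

// Plays the role of the shared database: the only place state lives.
class InMemoryDb {
  readonly tasks = new Map<string, TaskRow>();
}

// The manager keeps no state of its own, so it can be recreated at any time.
class QueueManager {
  private readonly db: InMemoryDb;

  constructor(db: InMemoryDb) {
    this.db = db;
  }

  enqueue(id: string): void {
    this.db.tasks.set(id, { id, status: "queued", blockedReason: null });
  }

  dispatch(id: string): void {
    this.setStatus(id, "in_progress", null);
  }

  block(id: string, reason: string): void {
    this.setStatus(id, "blocked", reason);
  }

  // Used both to unblock a task and to re-queue one whose agent crashed.
  requeue(id: string): void {
    this.setStatus(id, "queued", null);
  }

  idsWithStatus(status: TaskStatus): string[] {
    return Array.from(this.db.tasks.values())
      .filter((row) => row.status === status)
      .map((row) => row.id)
      .sort();
  }

  private setStatus(id: string, status: TaskStatus, blockedReason: string | null): void {
    if (!this.db.tasks.has(id)) throw new Error(`unknown task: ${id}`);
    this.db.tasks.set(id, { id, status, blockedReason });
  }
}

const db = new InMemoryDb();
const firstManager = new QueueManager(db);
firstManager.enqueue("task-a");
firstManager.enqueue("task-b");
firstManager.enqueue("task-c");
firstManager.dispatch("task-b"); // the agent "crashes" after this point
firstManager.block("task-c", "needs review");

// "Restart": a second manager is built on the same store.
const secondManager = new QueueManager(db);
const queuedAfterRestart = secondManager.idsWithStatus("queued");
const inProgressAfterRestart = secondManager.idsWithStatus("in_progress");
const blockedAfterRestart = secondManager.idsWithStatus("blocked");

// Recovery: the crashed task and the blocked task become dispatchable again.
secondManager.requeue("task-b");
secondManager.requeue("task-c");
const queuedAfterRecovery = secondManager.idsWithStatus("queued");
```

In the real tests the second "manager" would presumably be a second harness or set of managers constructed over the first harness's database, as the implementation note suggests.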

Task 2: Create extended agent Q&A scenario tests

Files: src/test/e2e/recovery-scenarios.test.ts

Add the describe block "Agent Q&A extended scenarios" to the test file. Test scenarios:

Test: multiple questions in sequence from same agent
- Dispatch task with scenario: first asks question, then after resume asks another
- Handle first question (agent:waiting -> resume -> agent:resumed)
- Agent asks second question
- Handle second question
- Agent completes
- Verify: 2 agent:waiting events, 2 agent:resumed events, 1 agent:stopped
Implementation: MockAgentManager may need a scenario that asks multiple questions. If that is not supported, test a single question but verify that the state machine transitions correctly.
Test: question surfaces as message in message queue
- Dispatch task with waiting_for_input scenario
- Verify: agent:waiting event includes question
- Check messageRepository for user-directed message
- Verify: message contains the question text
Test: agent resumes with user's answer in context
- Dispatch task, agent asks question
- Resume with specific answer "PostgreSQL"
- Verify: resume call includes the answer
- Agent completes
- Verify: agent result reflects successful completion
Test: waiting agent blocks task completion
- Dispatch task, agent enters waiting_for_input
- Attempt to complete task (should not be allowed while agent waiting)
- Resume agent, agent completes
- Now complete task
- Verify: proper state transitions
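The event sequence these Q&A tests expect can be illustrated with a small scripted agent. This is a sketch under assumptions: `ScriptedAgent`, `Step`, and the question texts are hypothetical and do not reflect the real MockAgentManager API; only the event names `agent:waiting`, `agent:resumed`, and `agent:stopped` are taken from the plan.

```typescript
// Hypothetical sketch: not the real MockAgentManager, only the expected event flow.
type AgentEvent = "agent:waiting" | "agent:resumed" | "agent:stopped";
type Step =
  | { kind: "question"; text: string }
  | { kind: "done"; result: string };

class ScriptedAgent {
  readonly events: AgentEvent[] = [];
  readonly answers: string[] = [];
  pendingQuestion: string | null = null;
  finished = false;
  private readonly script: Step[];
  private cursor = 0;

  constructor(script: Step[]) {
    this.script = script;
  }

  start(): void {
    this.advance();
  }

  // Resuming is only legal while a question is pending.
  resume(answer: string): void {
    if (this.pendingQuestion === null) throw new Error("agent is not waiting for input");
    this.answers.push(answer);
    this.pendingQuestion = null;
    this.events.push("agent:resumed");
    this.advance();
  }

  // Runs the next scripted step and records the matching event.
  private advance(): void {
    const step = this.script[this.cursor];
    this.cursor += 1;
    if (step === undefined) throw new Error("script exhausted");
    if (step.kind === "question") {
      this.pendingQuestion = step.text;
      this.events.push("agent:waiting");
    } else {
      this.finished = true;
      this.events.push("agent:stopped");
    }
  }
}

const agent = new ScriptedAgent([
  { kind: "question", text: "Which database should the service use?" },
  { kind: "question", text: "Run migrations now?" },
  { kind: "done", result: "schema created" },
]);

agent.start();
const firstQuestion = agent.pendingQuestion;
const finishedWhileWaiting = agent.finished; // a waiting agent must not count as done

agent.resume("PostgreSQL");
agent.resume("yes");

const countOf = (event: AgentEvent): number =>
  agent.events.filter((seen) => seen === event).length;
```

With this script, `countOf` reports two `agent:waiting` events, two `agent:resumed` events, and one `agent:stopped` event, which is the shape the "multiple questions in sequence" test verifies.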

Use new schema format for scenarios:

- `{ status: 'question', question: '...', options: [...] }` for questions
- `{ status: 'done', result: '...' }` for success
- `{ status: 'unrecoverable_error', error: '...' }` for failures
- `harness.getPendingQuestion()` to retrieve structured question data

Verify: `npm test src/test/e2e/recovery-scenarios.test.ts -- --run` passes.

Done: 4 Q&A tests passing, proving extended question flows work.
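The three scenario shapes listed above form a discriminated union on `status`. A minimal TypeScript sketch of how such a union could be typed, validated, and consumed follows. The field names are copied from the plan; `AgentOutput`, `parseAgentOutput`, and `summarize` are hypothetical names, and the element type of `options` is left as `unknown` because the plan does not specify it.

```typescript
// Field names follow the plan; the type and function names are hypothetical.
type AgentOutput =
  | { status: "question"; question: string; options: unknown[] }
  | { status: "done"; result: string }
  | { status: "unrecoverable_error"; error: string };

// Validates untrusted JSON text against the three shapes.
function parseAgentOutput(raw: string): AgentOutput {
  const value: unknown = JSON.parse(raw);
  if (typeof value !== "object" || value === null) {
    throw new Error("agent output must be a JSON object");
  }
  const record = value as Record<string, unknown>;
  const status = record["status"];
  const question = record["question"];
  const options = record["options"];
  const result = record["result"];
  const error = record["error"];

  if (status === "question" && typeof question === "string" && Array.isArray(options)) {
    return { status: "question", question, options };
  }
  if (status === "done" && typeof result === "string") {
    return { status: "done", result };
  }
  if (status === "unrecoverable_error" && typeof error === "string") {
    return { status: "unrecoverable_error", error };
  }
  throw new Error(`unrecognized agent output: ${raw}`);
}

// Switching on `status` lets the compiler narrow each branch to one shape.
function summarize(output: AgentOutput): string {
  switch (output.status) {
    case "question":
      return `waiting: ${output.question} (${output.options.length} options)`;
    case "done":
      return `done: ${output.result}`;
    case "unrecoverable_error":
      return `failed: ${output.error}`;
    default: {
      // Compile-time exhaustiveness check: adding a fourth shape breaks this line.
      const unreachable: never = output;
      throw new Error(`unhandled output: ${JSON.stringify(unreachable)}`);
    }
  }
}

const doneSummary = summarize(parseAgentOutput('{"status":"done","result":"ok"}'));
const questionSummary = summarize(
  parseAgentOutput('{"status":"question","question":"Which DB?","options":["a","b"]}'),
);
```

Keeping the `never` check in the `default` branch means a new status added to the union fails to compile until every consumer handles it.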

Before declaring plan complete:
- [ ] `npm test src/test/e2e/recovery-scenarios.test.ts -- --run` passes
- [ ] At least 8 new tests (4 recovery + 4 Q&A)
- [ ] No flaky tests (run twice to verify)
- [ ] Test patterns consistent with existing E2E tests

<success_criteria>

- All tasks completed
- All verification checks pass
- Recovery scenarios prove database is source of truth
- Q&A flow handles multiple questions and state transitions
- No regressions in existing E2E tests
</success_criteria>

After completion, create `.planning/phases/09-extended-scenarios/09-02-SUMMARY.md`
