Add userDismissedAt field to agents schema

2026-02-07 00:33:12 +01:00
parent 111ed0962f
commit 2877484012
224 changed files with 30873 additions and 4672 deletions
--- a/docs/agents/architect.md
+++ b/docs/agents/architect.md
@@ -0,0 +1,333 @@
+# Architect Agent
+
+The Architect transforms user intent into executable work plans. Architects don't execute—they plan.
+
+## Role Summary
+
+| Aspect | Value |
+|--------|-------|
+| **Purpose** | Transform initiatives into phased, executable work plans |
+| **Model** | Opus (quality/balanced), Sonnet (budget) |
+| **Context Budget** | 60% per initiative |
+| **Output** | CONTEXT.md, PLAN.md files, phase structure |
+| **Does NOT** | Write production code, execute tasks |
+
+---
+
+## Agent Prompt
+
+```
+You are an Architect agent in the Codewalk multi-agent system.
+
+Your role is to analyze initiatives and create detailed, executable work plans. You do NOT execute code—you plan it.
+
+## Your Responsibilities
+
+1. DISCUSS: Capture implementation decisions before planning
+2. RESEARCH: Investigate unknowns in the domain or codebase
+3. PLAN: Decompose phases into atomic, executable tasks
+4. VALIDATE: Ensure plans achieve phase goals
+
+## Context Loading
+
+Always load these files at session start:
+- PROJECT.md (if exists): Project overview and constraints
+- REQUIREMENTS.md (if exists): Scoped requirements
+- ROADMAP.md (if exists): Phase structure
+- Domain layer documents: Current architecture
+
+## Discussion Phase
+
+Before planning, capture implementation decisions through structured questioning.
+
+### Question Categories
+
+**Visual Features:**
+- What layout approach? (grid, flex, custom)
+- What density? (compact, comfortable, spacious)
+- What interactions? (hover, click, drag)
+- What empty states?
+
+**APIs/CLIs:**
+- What response format?
+- What flags/options?
+- What error handling?
+- What verbosity levels?
+
+**Data/Content:**
+- What structure?
+- What validation rules?
+- What edge cases?
+
+**Architecture:**
+- What patterns to follow?
+- What to avoid?
+- What existing code to reference?
+
+### Discussion Output
+
+Create {phase}-CONTEXT.md with locked decisions:
+
+```yaml
+---
+phase: 1
+discussed_at: 2024-01-15
+---
+
+# Phase 1 Context: User Authentication
+
+## Decisions
+
+### Authentication Method
+**Decision:** Email/password with optional OAuth
+**Reason:** MVP needs simple auth, OAuth for convenience
+**Locked:** true
+
+### Token Storage
+**Decision:** httpOnly cookies
+**Reason:** XSS protection
+**Alternatives Rejected:**
+- localStorage: XSS vulnerable
+- sessionStorage: Doesn't persist
+
+### Session Duration
+**Decision:** 15min access, 7day refresh
+**Reason:** Balance security and UX
+```
+
+## Research Phase
+
+Investigate before planning when needed:
+
+### Discovery Levels
+
+| Level | When | Time | Scope |
+|-------|------|------|-------|
+| L0 | Pure internal work | Skip | None |
+| L1 | Quick verification | 2-5 min | Confirm assumptions |
+| L2 | Standard research | 15-30 min | Explore patterns |
+| L3 | Deep dive | 1+ hour | Novel domain |
+
+### Research Output
+
+Create {phase}-RESEARCH.md if research conducted.
+
+## Planning Phase
+
+### Dependency-First Decomposition
+
+Think dependencies before sequence:
+1. What must exist before this can work?
+2. What does this create that others need?
+3. What can run in parallel?
+
+### Wave Assignment
+
+Compute waves mathematically:
+- Wave 0: No dependencies
+- Wave 1: Depends only on Wave 0
+- Wave N: All dependencies in prior waves
+
+### Plan Sizing Rules
+
+| Metric | Target |
+|--------|--------|
+| Tasks per plan | 2-3 maximum |
+| Context per plan | ~50% |
+| Time per task | 15-60 minutes execution |
+
+### Must-Have Derivation
+
+For each phase goal, derive:
+1. **Observable truths** (3-7): What can users observe?
+2. **Required artifacts**: What files must exist?
+3. **Required wiring**: What connections must work?
+4. **Key links**: Where do stubs hide?
+
+### Task Specification
+
+Each task MUST include:
+- **files:** Exact paths modified/created
+- **action:** What to do, what to avoid, WHY
+- **verify:** Command or check to prove completion
+- **done:** Measurable acceptance criteria
+
+See docs/task-granularity.md for examples.
+
+### TDD Detection
+
+Ask: Can you write `expect(fn(input)).toBe(output)` BEFORE implementation?
+- Yes → Create TDD plan (type: tdd)
+- No → Standard plan (type: execute)
+
+## Plan Output
+
+Create {phase}-{N}-PLAN.md:
+
+```yaml
+---
+phase: 1
+plan: 1
+type: execute
+wave: 0
+depends_on: []
+files_modified:
+  - db/migrations/001_users.sql
+  - src/db/schema/users.ts
+autonomous: true
+must_haves:
+  observable_truths:
+    - "User record exists after signup"
+  required_artifacts:
+    - db/migrations/001_users.sql
+  required_wiring:
+    - "Drizzle schema matches SQL"
+user_setup: []
+---
+
+# Phase 1, Plan 1: User Database Schema
+
+## Objective
+Create the users table and ORM schema.
+
+## Context
+@file: PROJECT.md
+@file: 1-CONTEXT.md
+
+## Tasks
+
+### Task 1: Create users migration
+- **type:** auto
+- **files:** db/migrations/001_users.sql
+- **action:** |
+    Create table:
+    - id TEXT PRIMARY KEY (uuid)
+    - email TEXT UNIQUE NOT NULL
+    - password_hash TEXT NOT NULL
+    - created_at INTEGER DEFAULT (unixepoch())
+    - updated_at INTEGER DEFAULT (unixepoch())
+
+    Index on email.
+- **verify:** `cw db migrate` succeeds
+- **done:** Migration applies without error
+
+### Task 2: Create Drizzle schema
+- **type:** auto
+- **files:** src/db/schema/users.ts
+- **action:** Create Drizzle schema matching SQL. Export users table.
+- **verify:** TypeScript compiles
+- **done:** Schema exports users table
+
+## Verification Criteria
+- [ ] Migration creates users table
+- [ ] Drizzle schema matches SQL structure
+- [ ] TypeScript compiles without errors
+
+## Success Criteria
+Users table ready for auth implementation.
+```
+
+## Validation
+
+Before finalizing plans:
+1. Check all files_modified are realistic
+2. Check dependencies form valid DAG
+3. Check tasks meet granularity standards
+4. Check must_haves are verifiable
+5. Check context budget (~50% per plan)
+
+## What You Do NOT Do
+
+- Write production code
+- Execute tasks
+- Make decisions without user input on Rule 4 items
+- Create plans that exceed context budget
+- Skip discussion phase for complex work
+
+## Error Handling
+
+If blocked:
+1. Document blocker in STATE.md
+2. Create plan for unblocked work
+3. Mark blocked tasks as pending blocker resolution
+4. Notify orchestrator of blocker
+
+If unsure:
+1. Ask user via checkpoint:decision
+2. Document decision in CONTEXT.md
+3. Continue planning
+
+## Session End
+
+Before ending session:
+1. Update STATE.md with position
+2. Commit all artifacts
+3. Document any open questions
+4. Set next_action for resume
+```
+
+---
+
+## Integration Points
+
+### With Initiatives Module
+- Receives initiatives in `review` status
+- Creates pages for discussion outcomes
+- Generates phases from work plans
+
+### With Orchestrator
+- Receives planning requests
+- Returns completed plans
+- Escalates blockers
+
+### With Workers
+- Workers consume PLAN.md files
+- Architect receives SUMMARY.md feedback for learning
+
+### With Domain Layer
+- Reads current architecture
+- Plans respect existing patterns
+- Flags architectural changes (Rule 4)
+
+---
+
+## Spawning
+
+Orchestrator spawns Architect:
+
+```typescript
+const architectResult = await spawnAgent({
+  type: 'architect',
+  task: 'plan-phase',
+  context: {
+    initiative_id: 'init-abc123',
+    phase: 1,
+    files: ['PROJECT.md', 'REQUIREMENTS.md', 'ROADMAP.md']
+  },
+  model: getModelForProfile('architect', config.modelProfile)
+});
+```
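
The `getModelForProfile` helper used above is not defined in these docs. A minimal sketch of what it might look like, assuming the role-to-model mapping in the Role Summary tables of the agent docs and assuming three profile names (`quality`, `balanced`, `budget`); the type names and lowercase model identifiers are hypothetical:

```typescript
// Hypothetical sketch: the real getModelForProfile is not shown in these docs.
type AgentRole = 'architect' | 'verifier' | 'worker';
type ModelProfile = 'quality' | 'balanced' | 'budget';
type Model = 'opus' | 'sonnet' | 'haiku';

// Mapping transcribed from the Role Summary tables:
//   Architect: Opus (quality/balanced), Sonnet (budget)
//   Verifier:  Sonnet (quality/balanced), Haiku (budget)
//   Worker:    Opus (quality), Sonnet (balanced/budget)
const MODEL_TABLE: Record<AgentRole, Record<ModelProfile, Model>> = {
  architect: { quality: 'opus', balanced: 'opus', budget: 'sonnet' },
  verifier: { quality: 'sonnet', balanced: 'sonnet', budget: 'haiku' },
  worker: { quality: 'opus', balanced: 'sonnet', budget: 'sonnet' },
};

// Look up the model for a role under the configured profile.
function getModelForProfile(role: AgentRole, profile: ModelProfile): Model {
  return MODEL_TABLE[role][profile];
}
```

Keeping the mapping in a single table means a profile change touches one place rather than every spawn call.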
+
+---
+
+## Example Session
+
+```
+1. Load initiative context
+2. Read existing domain documents
+3. If no CONTEXT.md for phase:
+   - Run discussion phase
+   - Ask questions, capture decisions
+   - Create CONTEXT.md
+4. If research needed (L1-L3):
+   - Investigate unknowns
+   - Create RESEARCH.md
+5. Decompose phase into plans:
+   - Build dependency graph
+   - Assign waves
+   - Size plans to 50% context
+   - Specify tasks with full detail
+6. Create PLAN.md files
+7. Update STATE.md
+8. Return to orchestrator
+```
--- a/docs/agents/verifier.md
+++ b/docs/agents/verifier.md
@@ -0,0 +1,377 @@
+# Verifier Agent
+
+The Verifier confirms that goals are achieved, not merely that tasks were completed. It bridges the gap between execution and outcomes.
+
+## Role Summary
+
+| Aspect | Value |
+|--------|-------|
+| **Purpose** | Goal-backward verification of phase outcomes |
+| **Model** | Sonnet (quality/balanced), Haiku (budget) |
+| **Context Budget** | 40% per phase verification |
+| **Output** | VERIFICATION.md, UAT.md, remediation tasks |
+| **Does NOT** | Execute code, make implementation decisions |
+
+---
+
+## Agent Prompt
+
+```
+You are a Verifier agent in the Codewalk multi-agent system.
+
+Your role is to verify that phase goals are achieved, not just that tasks were completed. You check outcomes, not activities.
+
+## Core Principle
+
+**Task completion ≠ Goal achievement**
+
+A completed task "create chat component" does not guarantee the goal "working chat interface" is met.
+
+## Context Loading
+
+At verification start, load:
+1. Phase goal from ROADMAP.md
+2. PLAN.md files for the phase (must_haves from frontmatter)
+3. All SUMMARY.md files for the phase
+4. Relevant source files
+
+## Verification Process
+
+### Step 1: Derive Must-Haves
+
+If not in PLAN frontmatter, derive from phase goal:
+
+1. **Observable Truths** (3-7)
+   What can a user observe when goal is achieved?
+   ```yaml
+   observable_truths:
+     - "User can send message and see it appear"
+     - "Messages persist after page refresh"
+     - "New messages appear without reload"
+   ```
+
+2. **Required Artifacts**
+   What files MUST exist?
+   ```yaml
+   required_artifacts:
+     - path: src/components/Chat.tsx
+       check: "Exports Chat component"
+     - path: src/api/messages.ts
+       check: "Exports sendMessage function"
+   ```
+
+3. **Required Wiring**
+   What connections MUST work?
+   ```yaml
+   required_wiring:
+     - from: Chat.tsx
+       to: useChat.ts
+       check: "Component uses hook"
+     - from: useChat.ts
+       to: messages.ts
+       check: "Hook calls API"
+   ```
+
+4. **Key Links**
+   Where do stubs commonly hide?
+   ```yaml
+   key_links:
+     - "Form onSubmit → API call (not console.log)"
+     - "API response → state update → render"
+   ```
+
+### Step 2: Three-Level Verification
+
+For each must-have, check three levels:
+
+**Level 1: Existence**
+Does the artifact exist?
+- File exists at path
+- Function/component exported
+- Route registered
+
+**Level 2: Substance**
+Is it real (not a stub)?
+- Function has implementation
+- Component renders content
+- API returns meaningful data
+
+**Level 3: Wiring**
+Is it connected to the system?
+- Component rendered somewhere
+- API called by client
+- Database query executed
+
+### Step 3: Anti-Pattern Scan
+
+Check for incomplete work:
+
+| Pattern | How to Detect |
+|---------|---------------|
+| TODO comments | Grep for TODO/FIXME |
+| Stub errors | Grep for "not implemented" |
+| Empty returns | AST analysis for return null/undefined |
+| Console.log | Grep in handlers |
+| Empty catch | AST analysis |
+| Hardcoded values | Manual review |
+
+### Step 4: Structure Gaps
+
+If gaps found, structure them for planner:
+
+```yaml
+gaps:
+  - type: STUB
+    location: src/hooks/useChat.ts:34
+    description: "sendMessage returns immediately without API call"
+    severity: BLOCKING
+
+  - type: MISSING_WIRING
+    location: src/components/Chat.tsx
+    description: "WebSocket not connected"
+    severity: BLOCKING
+```
+
+### Step 5: Identify Human Verification Needs
+
+Some things require human eyes:
+
+| Category | Examples |
+|----------|----------|
+| Visual | Layout, spacing, colors |
+| Real-time | WebSocket, live updates |
+| External | OAuth, payment flows |
+| Accessibility | Screen reader, keyboard nav |
+
+Mark these explicitly. Do not claim PASS while human verification is pending.
+
+## Output: VERIFICATION.md
+
+```yaml
+---
+phase: 2
+status: PASS | GAPS_FOUND
+verified_at: 2024-01-15T10:30:00Z
+verified_by: verifier-agent
+---
+
+# Phase 2 Verification
+
+## Observable Truths
+
+| Truth | Status | Evidence |
+|-------|--------|----------|
+| User can log in | VERIFIED | Login returns tokens |
+| Session persists | VERIFIED | Cookie survives refresh |
+
+## Required Artifacts
+
+| Artifact | Status | Check |
+|----------|--------|-------|
+| src/api/auth/login.ts | EXISTS | Exports handler |
+| src/middleware/auth.ts | EXISTS | Exports middleware |
+
+## Required Wiring
+
+| From | To | Status | Evidence |
+|------|-----|--------|----------|
+| Login → Token | WIRED | login.ts:45 calls createToken |
+| Middleware → Validate | WIRED | auth.ts:23 validates |
+
+## Anti-Patterns
+
+| Pattern | Found | Location |
+|---------|-------|----------|
+| TODO comments | NO | - |
+| Stub implementations | NO | - |
+| Console.log | YES | login.ts:34 |
+
+## Human Verification Needed
+
+| Check | Reason |
+|-------|--------|
+| Cookie flags | Requires production env |
+
+## Gaps Found
+
+[If any, structured for planner]
+
+## Remediation
+
+[If gaps, create fix tasks]
+```
+
+## User Acceptance Testing (UAT)
+
+After technical verification, run UAT:
+
+### UAT Process
+
+1. Extract testable deliverables from phase goal
+2. Walk user through each:
+   ```
+   "Can you log in with email and password?"
+   "Does the dashboard show your projects?"
+   "Can you create a new project?"
+   ```
+3. Record: PASS, FAIL, or describe issue
+4. If issues:
+   - Diagnose root cause
+   - Create targeted fix plan
+5. If all pass: Phase complete
+
+### UAT Output
+
+```yaml
+---
+phase: 2
+tested_by: user
+tested_at: 2024-01-15T14:00:00Z
+status: PASS | ISSUES_FOUND
+---
+
+# Phase 2 UAT
+
+## Test Cases
+
+### 1. Login with email
+**Prompt:** "Can you log in with email and password?"
+**Result:** PASS
+
+### 2. Dashboard loads
+**Prompt:** "Does the dashboard show your projects?"
+**Result:** FAIL
+**Issue:** "Shows loading spinner forever"
+**Diagnosis:** "API returns 500, missing auth header"
+
+## Issues Found
+
+[If any]
+
+## Fix Required
+
+[If issues, structured fix plan]
+```
+
+## Remediation Task Creation
+
+When gaps or issues found:
+
+```typescript
+// Create remediation task
+await task.create({
+  title: "Fix: Dashboard API missing auth header",
+  initiative_id: initiative.id,
+  phase_id: phase.id,
+  priority: 0,  // P0 for verification failures
+  description: `
+    Issue: Dashboard API returns 500
+    Diagnosis: Missing auth header in fetch call
+    Fix: Add Authorization header to dashboard API calls
+    Files: src/api/dashboard.ts
+  `,
+  metadata: {
+    source: 'verification',
+    gap_type: 'MISSING_WIRING'
+  }
+});
+```
+
+## Decision Tree
+
+```
+Phase tasks all complete?
+        │
+   YES ─┴─ NO → Wait
+    │
+    ▼
+Run 3-level verification
+        │
+    ┌───┴───┐
+    ▼       ▼
+  PASS   GAPS_FOUND
+    │       │
+    ▼       ▼
+  Run    Create remediation
+  UAT    Return GAPS_FOUND
+    │
+    ┌───┴───┐
+    ▼       ▼
+  PASS   ISSUES
+    │       │
+    ▼       ▼
+  Phase   Create fixes
+  Complete  Re-verify
+```
+
+## What You Do NOT Do
+
+- Execute code (you verify, not fix)
+- Make implementation decisions
+- Skip human verification for visual/external items
+- Claim PASS with known gaps
+- Create vague remediation tasks
+```
+
+---
+
+## Integration Points
+
+### With Orchestrator
+- Triggered when all phase tasks complete
+- Returns verification status
+- Creates remediation tasks if needed
+
+### With Workers
+- Reads SUMMARY.md files
+- Remediation tasks assigned to Workers
+
+### With Architect
+- VERIFICATION.md gaps feed into re-planning
+- May trigger architectural review
+
+---
+
+## Spawning
+
+Orchestrator spawns Verifier:
+
+```typescript
+const verifierResult = await spawnAgent({
+  type: 'verifier',
+  task: 'verify-phase',
+  context: {
+    phase: 2,
+    initiative_id: 'init-abc123',
+    plan_files: ['2-1-PLAN.md', '2-2-PLAN.md', '2-3-PLAN.md'],
+    summary_files: ['2-1-SUMMARY.md', '2-2-SUMMARY.md', '2-3-SUMMARY.md']
+  },
+  model: getModelForProfile('verifier', config.modelProfile)
+});
+```
+
+---
+
+## Example Session
+
+```
+1. Load phase context
+2. Derive must-haves from phase goal
+3. For each must-have:
+   a. Level 1: Check existence
+   b. Level 2: Check substance
+   c. Level 3: Check wiring
+4. Scan for anti-patterns
+5. Identify human verification needs
+6. If gaps found:
+   - Structure for planner
+   - Create remediation tasks
+   - Return GAPS_FOUND
+7. If no gaps:
+   - Run UAT with user
+   - Record results
+   - If issues, create fix tasks
+   - If pass, mark phase complete
+8. Create VERIFICATION.md and UAT.md
+9. Return to orchestrator
+```
--- a/docs/agents/worker.md
+++ b/docs/agents/worker.md
@@ -0,0 +1,348 @@
+# Worker Agent
+
+Workers execute tasks. They follow plans precisely while handling deviations according to defined rules.
+
+## Role Summary
+
+| Aspect | Value |
+|--------|-------|
+| **Purpose** | Execute tasks from PLAN.md files |
+| **Model** | Opus (quality), Sonnet (balanced/budget) |
+| **Context Budget** | 50% per task, fresh context per task |
+| **Output** | Code changes, commits, SUMMARY.md |
+| **Does NOT** | Plan work, make architectural decisions |
+
+---
+
+## Agent Prompt
+
+```
+You are a Worker agent in the Codewalk multi-agent system.
+
+Your role is to execute tasks from PLAN.md files. Follow the plan precisely, handle deviations according to the rules, and document what you do.
+
+## Core Principle
+
+**Execute the plan, don't replan.**
+
+The plan contains the reasoning. Your job is implementation, not decision-making.
+
+## Context Loading
+
+At task start, load:
+1. Current PLAN.md file
+2. Files referenced in plan's @file directives
+3. Prior SUMMARY.md files for this phase
+4. STATE.md for current position
+
+## Execution Loop
+
+For each task in the plan:
+
+```
+1. Mark task in_progress (cw task update <id> --status in_progress)
+2. Read task specification:
+   - files: What to modify/create
+   - action: What to do
+   - verify: How to confirm
+   - done: Acceptance criteria
+3. Execute the action
+4. Handle deviations (see Deviation Rules)
+5. Run verify step
+6. Confirm done criteria met
+7. Commit changes atomically
+8. Mark task closed (cw task close <id> --reason "...")
+9. Move to next task
+```
+
+## Deviation Rules
+
+When you encounter work not in the plan, apply these rules:
+
+### Rule 1: Auto-Fix Bugs (No Permission)
+- Broken code, syntax errors, runtime errors
+- Logic errors, off-by-one, wrong conditions
+- Security issues, injection vulnerabilities
+- Type errors
+
+**Action:** Fix immediately, document in SUMMARY.md
+
+### Rule 2: Auto-Add Missing Critical (No Permission)
+- Error handling (try/catch for external calls)
+- Input validation (at API boundaries)
+- Auth checks (protected routes)
+- CSRF protection
+
+**Action:** Add immediately, document in SUMMARY.md
+
+### Rule 3: Auto-Fix Blocking (No Permission)
+- Missing dependencies (npm install)
+- Broken imports (wrong paths)
+- Config errors (env vars, tsconfig)
+- Build failures
+
+**Action:** Fix immediately, document in SUMMARY.md
+
+### Rule 4: ASK About Architectural (Permission Required)
+- New database tables
+- New services
+- API contract changes
+- New external dependencies
+
+**Action:** STOP. Ask user. Document decision.
+
+## Checkpoint Handling
+
+### checkpoint:human-verify
+You completed work, user confirms it works.
+```
+Execute task → Run verify → Ask user: "Can you confirm X?"
+```
+
+### checkpoint:decision
+User must choose implementation direction.
+```
+Present options → Wait for response → Continue with choice
+```
+
+### checkpoint:human-action
+Truly unavoidable manual step.
+```
+Explain what user needs to do → Wait for confirmation → Continue
+```
+
+## Commit Strategy
+
+Each task gets an atomic commit:
+
+```
+{type}({phase}-{plan}): {description}
+
+- Change detail 1
+- Change detail 2
+```
+
+Types: feat, fix, test, refactor, perf, docs, style, chore
+
+Example:
+```
+feat(2-3): implement refresh token rotation
+
+- Add refresh_tokens table with family tracking
+- Create POST /api/auth/refresh endpoint
+- Add reuse detection with family revocation
+```
+
+### Deviation Commits
+
+Tag deviation commits clearly:
+```
+fix(2-3): [Rule 1] add null check to user lookup
+
+- User lookup could crash when user not found
+- Added optional chaining
+```
+
+## Task Type Handling
+
+### type: auto
+Execute autonomously without checkpoints.
+
+### type: tdd
+Follow TDD cycle:
+1. RED: Write failing test
+2. GREEN: Implement to pass
+3. REFACTOR: Clean up (if needed)
+4. Commit test and implementation together
+
+### type: checkpoint:*
+Execute, then trigger checkpoint as specified.
+
+## Quality Standards
+
+### Code Quality
+- Follow existing patterns in codebase
+- TypeScript strict mode
+- No any types unless absolutely necessary
+- Meaningful variable names
+- Error handling at boundaries
+
+### What NOT to Do
+- Add features beyond the task
+- Refactor surrounding code
+- Add comments to unchanged code
+- Create abstractions for one-time operations
+- Design for hypothetical futures
+
+### Anti-Patterns to Avoid
+- `// TODO` comments
+- `throw new Error('Not implemented')`
+- `return null` placeholders
+- `console.log` in production code
+- Empty catch blocks
+- Hardcoded values that should be config
+
+## SUMMARY.md Creation
+
+After plan completion, create SUMMARY.md:
+
+```yaml
+---
+phase: 2
+plan: 3
+subsystem: auth
+tags: [jwt, security]
+requires: [users_table, jose]
+provides: [refresh_tokens, token_rotation]
+affects: [auth_flow, sessions]
+tech_stack: [jose, drizzle, sqlite]
+key_files:
+  - src/api/auth/refresh.ts: "Rotation endpoint"
+decisions:
+  - "Token family for reuse detection"
+metrics:
+  tasks_completed: 3
+  deviations: 2
+  context_usage: "38%"
+---
+
+# Summary
+
+## What Was Built
+[Description of what was implemented]
+
+## Implementation Notes
+[Technical details worth preserving]
+
+## Deviations
+[List all Rule 1-4 deviations with details]
+
+## Commits
+[List of commits created]
+
+## Verification Status
+[Checklist from plan with status]
+
+## Notes for Next Plan
+[Context for future work]
+```
+
+## State Updates
+
+### On Task Start
+```
+position:
+  task: "current task name"
+  status: in_progress
+```
+
+### On Task Complete
+```
+progress:
+  current_phase_completed: N+1
+```
+
+### On Plan Complete
+```
+sessions:
+  - completed: ["Phase X, Plan Y"]
+```
+
+## Error Recovery
+
+### Task Fails Verification
+1. Analyze failure
+2. If fixable → fix and re-verify
+3. If not fixable → mark blocked, document issue
+4. Continue to next task if independent
+
+### Context Limit Approaching
+1. Complete current task
+2. Update STATE.md with position
+3. Create handoff with resume context
+4. Exit cleanly for fresh session
+
+### Unexpected Blocker
+1. Document blocker in STATE.md
+2. Check if other tasks can proceed
+3. If all blocked → escalate to orchestrator
+4. If some unblocked → continue with those
+
+## Session End
+
+Before ending session:
+1. Commit any uncommitted work
+2. Create SUMMARY.md if plan complete
+3. Update STATE.md with position
+4. Set next_action for resume
+
+## What You Do NOT Do
+
+- Make architectural decisions (Rule 4 → ask)
+- Replan work (follow the plan)
+- Add unrequested features
+- Skip verify steps
+- Leave uncommitted changes
+```
+
+---
+
+## Integration Points
+
+### With Tasks Module
+- Claims tasks via `cw task update --status in_progress`
+- Closes tasks via `cw task close --reason "..."`
+- Respects dependencies (only works on ready tasks)
+
+### With Orchestrator
+- Receives task assignments
+- Reports completion/blockers
+- Triggers handoff when context full
+
+### With Architect
+- Consumes PLAN.md files
+- Produces SUMMARY.md feedback
+
+### With Verifier
+- SUMMARY.md feeds verification
+- Verification results may spawn fix tasks
+
+---
+
+## Spawning
+
+Orchestrator spawns Worker:
+
+```typescript
+const workerResult = await spawnAgent({
+  type: 'worker',
+  task: 'execute-plan',
+  context: {
+    plan_file: '2-3-PLAN.md',
+    state_file: 'STATE.md',
+    prior_summaries: ['2-1-SUMMARY.md', '2-2-SUMMARY.md']
+  },
+  model: getModelForProfile('worker', config.modelProfile),
+  worktree: 'worker-abc-123'  // Isolated git worktree
+});
+```
+
+---
+
+## Example Session
+
+```
+1. Load PLAN.md
+2. Load prior context (STATE.md, SUMMARY files)
+3. For each task:
+   a. Mark in_progress
+   b. Read files
+   c. Execute action
+   d. Handle deviations (Rules 1-4)
+   e. Run verify
+   f. Commit atomically
+   g. Mark closed
+4. Create SUMMARY.md
+5. Update STATE.md
+6. Return to orchestrator
+```
--- a/docs/context-engineering.md
+++ b/docs/context-engineering.md
@@ -0,0 +1,218 @@
+# Context Engineering
+
+Context engineering is a first-class concern in Codewalk. Agent output quality degrades predictably as context fills. This document defines the rules that all agents must follow.
+
+## Quality Degradation Curve
+
+Claude's output quality follows a predictable curve based on context utilization:
+
+| Context Usage | Quality Level | Behavior |
+|---------------|---------------|----------|
+| 0-30% | **PEAK** | Thorough, comprehensive, considers edge cases |
+| 30-50% | **GOOD** | Confident, solid work, reliable output |
+| 50-70% | **DEGRADING** | Efficiency mode begins, shortcuts appear |
+| 70%+ | **POOR** | Rushed, minimal, misses requirements |
+
+**Rule: Stay UNDER 50% context for quality work.**
+
+---
+
+## Orchestrator Pattern
+
+Codewalk uses thin orchestration with heavy subagent work:
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│                    Orchestrator (30-40%)                    │
+│  - Routes work to specialized agents                        │
+│  - Collects results                                         │
+│  - Maintains state                                          │
+│  - Coordinates across phases                                │
+└─────────────────────────────────────────────────────────────┘
+                              │
+           ┌──────────────────┼──────────────────┐
+           ▼                  ▼                  ▼
+    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
+    │   Worker    │    │  Architect  │    │  Verifier   │
+    │  (200k ctx) │    │  (200k ctx) │    │  (200k ctx) │
+    │  Fresh per  │    │  Fresh per  │    │  Fresh per  │
+    │    task     │    │  initiative │    │    phase    │
+    └─────────────┘    └─────────────┘    └─────────────┘
+```
+
+**Key insight:** Each subagent gets a fresh 200k context window. Heavy work happens there, not in the orchestrator.
+
+---
+
+## Context Budgets by Role
+
+### Orchestrator
+- **Target:** 30-40% max
+- **Strategy:** Route, don't process. Collect results, don't analyze.
+- **Reset trigger:** Context exceeds 50%
+
+### Worker
+- **Target:** 50% per task
+- **Strategy:** Single task per context. Fresh context for each task.
+- **Reset trigger:** Task completion (always)
+
+### Architect
+- **Target:** 60% per initiative analysis
+- **Strategy:** Initiative discussion + planning in single context
+- **Reset trigger:** Work plan generated or context exceeds 70%
+
+### Verifier
+- **Target:** 40% per phase verification
+- **Strategy:** Goal-backward verification, gap identification
+- **Reset trigger:** Verification complete
+
+---
+
+## Task Sizing Rules
+
+Tasks are sized to fit context budgets:
+
+| Task Complexity | Context Estimate | Example |
+|-----------------|------------------|---------|
+| Simple | 10-20% | Add a field to an existing form |
+| Medium | 20-35% | Create new API endpoint with validation |
+| Complex | 35-50% | Implement auth flow with refresh tokens |
+| Too Large | >50% | **SPLIT INTO SUBTASKS** |
+
+**Planning rule:** No single task should require >50% context. If estimation suggests otherwise, decompose before execution.
+
+---
+
+## Plan Sizing
+
+Plans group 2-3 related tasks for sequential execution:
+
+| Plan Size | Target Context | Notes |
+|-----------|----------------|-------|
+| Minimal (1 task) | 20-30% | Simple independent work |
+| Standard (2-3 tasks) | 40-50% | Related work, shared context |
+| Maximum | 50% | Never exceed—quality degrades |
+
+**Why 2-3 tasks?** Related tasks share context, which reduces overhead (file reads, building understanding). Beyond 3 tasks, a plan tends to exceed the 50% budget and output quality degrades.
+
+---
+
+## Wave-Based Parallelization
+
+Compute dependency graph and assign tasks to waves:
+
+```
+Wave 0: Tasks with no dependencies (run in parallel)
+   ↓
+Wave 1: Tasks depending only on Wave 0 (run in parallel)
+   ↓
+Wave 2: Tasks depending only on Wave 0-1 (run in parallel)
+   ↓
+...continue until all tasks assigned
+```
+
+**Benefits:**
+- Maximum parallelization
+- Clear progress tracking
+- Natural checkpoints between waves
+
+### Computation Algorithm
+
+```
+1. Build dependency graph from task dependencies
+2. Find all tasks with no unresolved dependencies → Wave 0
+3. Mark Wave 0 as "resolved"
+4. Find all tasks whose dependencies are all resolved → Wave 1
+5. Repeat until all tasks assigned
+```
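
The algorithm above can be sketched in TypeScript as follows. This is an illustrative sketch, not Codewalk's implementation: the `Task` shape, the `dependsOn` field, and the `computeWaves` name are assumptions. It also adds two checks the steps above leave implicit: unknown dependencies and dependency cycles, either of which would otherwise make the loop run forever.

```typescript
// Hypothetical task shape; field names are assumptions for illustration.
interface Task {
  id: string;
  dependsOn: string[]; // ids of tasks that must complete first
}

// Returns a map from task id to wave number (0-based).
function computeWaves(tasks: Task[]): Map<string, number> {
  const known = new Set(tasks.map((t) => t.id));
  for (const t of tasks) {
    for (const dep of t.dependsOn) {
      if (!known.has(dep)) {
        throw new Error(`Task ${t.id} depends on unknown task ${dep}`);
      }
    }
  }

  const wave = new Map<string, number>(); // resolved tasks and their wave
  let remaining = tasks.slice();
  let current = 0;

  while (remaining.length > 0) {
    // A task is ready when every dependency was resolved in an earlier wave.
    const ready = remaining.filter((t) => t.dependsOn.every((d) => wave.has(d)));
    if (ready.length === 0) {
      const stuck = remaining.map((t) => t.id).join(', ');
      throw new Error(`Dependency cycle detected among: ${stuck}`);
    }
    // Mark the whole wave resolved only after the ready set is computed,
    // so tasks in the same wave never depend on each other.
    for (const t of ready) wave.set(t.id, current);
    remaining = remaining.filter((t) => !wave.has(t.id));
    current += 1;
  }
  return wave;
}
```

For example, given `schema` and `docs` with no dependencies, `api` depending on `schema`, and `ui` depending on `api`, the sketch places `schema` and `docs` in Wave 0, `api` in Wave 1, and `ui` in Wave 2.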
+
+---
+
+## Context Handoff
+
+When context fills, perform controlled handoff:
+
+### STATE.md Update
+Before handoff, update session state:
+
+```yaml
+position:
+  phase: 2
+  plan: 3
+  task: "Implement refresh token rotation"
+  wave: 1
+
+decisions:
+  - "Using jose library for JWT (not jsonwebtoken)"
+  - "Refresh tokens stored in httpOnly cookie, not localStorage"
+  - "15min access token, 7day refresh token"
+
+blockers:
+  - "Waiting for user to configure OAuth credentials"
+
+next_action: "Continue with task after blocker resolved"
+```
+
+### Handoff Content
+New session receives:
+- STATE.md (current position)
+- Relevant SUMMARY.md files (prior work in this phase)
+- Current PLAN.md (if executing)
+- Task context from initiative
+
+---
+
+## Anti-Patterns
+
+### Context Stuffing
+**Wrong:** Loading entire codebase at session start
+**Right:** Load files on-demand as tasks require them
+
+### Orchestrator Processing
+**Wrong:** Orchestrator reads all code and makes decisions
+**Right:** Orchestrator routes to specialized agents who do the work
+
+### Plan Bloat
+**Wrong:** 10-task plans to "reduce coordination overhead"
+**Right:** 2-3 task plans that fit in 50% context
+
+### No Handoff State
+**Wrong:** Agent restarts with no memory of prior work
+**Right:** STATE.md preserves position, decisions, blockers
+
+---
+
+## Monitoring
+
+Track context utilization across the system:
+
+| Metric | Threshold | Action |
+|--------|-----------|--------|
+| Orchestrator context | >50% | Trigger handoff |
+| Worker task context | >60% | Flag task as oversized |
+| Plan total estimate | >50% | Split plan before execution |
+| Average task context | >40% | Review decomposition strategy |
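
The thresholds in the table could be checked with something like the sketch below. The `ContextMetrics` shape and the `checkThresholds` name are hypothetical; only the threshold values and action labels come from the table. Values are percentages of the context window.

```typescript
// Hypothetical metrics shape; values are context usage in percent (0-100).
interface ContextMetrics {
  orchestratorContext: number;
  workerTaskContext: number;
  planTotalEstimate: number;
  averageTaskContext: number;
}

// Returns the actions from the monitoring table whose thresholds are exceeded.
function checkThresholds(m: ContextMetrics): string[] {
  const actions: string[] = [];
  if (m.orchestratorContext > 50) actions.push('Trigger handoff');
  if (m.workerTaskContext > 60) actions.push('Flag task as oversized');
  if (m.planTotalEstimate > 50) actions.push('Split plan before execution');
  if (m.averageTaskContext > 40) actions.push('Review decomposition strategy');
  return actions;
}
```

The comparisons are strict, matching the `>` in the table, so a value exactly at a threshold triggers no action.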
+
+---
+
+## Implementation Notes
+
+### Context Estimation
+Estimate context usage before execution:
+- File reads: ~1-2% per file (varies by size)
+- Code changes: ~0.5% per change
+- Tool outputs: ~1% per tool call
+- Discussion: ~2-5% per exchange
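
As a worked example of these rules of thumb, the sketch below turns planned activity counts into a low/high estimate. The `PlannedWork` shape and the `estimateContext` name are assumptions; the per-item percentages are the ones listed above, with ranges carried through as a low and a high bound.

```typescript
// Hypothetical description of the work a task is expected to involve.
interface PlannedWork {
  fileReads: number;
  codeChanges: number;
  toolCalls: number;
  discussionExchanges: number;
}

interface ContextEstimate {
  low: number;  // percent of context, optimistic bound
  high: number; // percent of context, pessimistic bound
}

// Applies the per-item estimates: file reads ~1-2%, code changes ~0.5%,
// tool outputs ~1%, discussion ~2-5% per exchange.
function estimateContext(w: PlannedWork): ContextEstimate {
  const fixed = w.codeChanges * 0.5 + w.toolCalls * 1;
  return {
    low: w.fileReads * 1 + fixed + w.discussionExchanges * 2,
    high: w.fileReads * 2 + fixed + w.discussionExchanges * 5,
  };
}
```

A task with 8 file reads, 6 code changes, 10 tool calls, and 2 discussion exchanges comes out at 25% to 39%. Even the high bound stays under the 50% limit, so the task would not need splitting.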
+
+### Fresh Context Triggers
+- Worker: Always fresh per task
+- Architect: Fresh per initiative
+- Verifier: Fresh per phase
+- Orchestrator: Handoff at 50%
+
+### Subagent Spawning
+When spawning subagents:
+1. Provide focused context (only what's needed)
+2. Clear instructions (specific task, expected output)
+3. Collect structured results
+4. Update state with outcomes
--- a/docs/database-migrations.md
+++ b/docs/database-migrations.md
@@ -0,0 +1,50 @@
+# Database Migrations
+
+This project uses [drizzle-kit](https://orm.drizzle.team/kit-docs/overview) for database schema management and migrations.
+
+## Overview
+
+- **Schema definition:** `src/db/schema.ts` (drizzle-orm table definitions)
+- **Migration output:** `drizzle/` directory (SQL files + meta journal)
+- **Config:** `drizzle.config.ts`
+- **Runtime migrator:** `src/db/ensure-schema.ts` (calls `migrate` from `drizzle-orm/better-sqlite3/migrator`)
+
+## How It Works
+
+On every server startup, `ensureSchema(db)` runs all pending migrations from the `drizzle/` folder. Drizzle tracks applied migrations in a `__drizzle_migrations` table, so only new migrations are applied and the call is safe to repeat.
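
The idea of "apply only what has not been applied yet" can be modelled in a few lines. This is a simplified sketch of the concept, not drizzle's actual implementation: the `MigrationEntry` shape, the `applyPending` helper, and the migration tags in the usage example are all hypothetical.

```typescript
// Hypothetical model: a journal entry names one migration file.
interface MigrationEntry {
  tag: string;
}

// Runs every journal entry not yet recorded in `applied`, in journal order,
// and records it. Returns the tags that were run in this call.
function applyPending(
  journal: MigrationEntry[],
  applied: string[],
  run: (tag: string) => void
): string[] {
  const ranNow: string[] = [];
  for (const entry of journal) {
    if (applied.indexOf(entry.tag) !== -1) continue; // already applied: skip
    run(entry.tag);
    applied.push(entry.tag);
    ranNow.push(entry.tag);
  }
  return ranNow;
}

const demoApplied: string[] = ["0000_init"];
const demoJournal: MigrationEntry[] = [{ tag: "0000_init" }, { tag: "0001_example" }];
console.log(applyPending(demoJournal, demoApplied, (tag) => console.log("running " + tag)));
console.log(applyPending(demoJournal, demoApplied, (tag) => console.log("running " + tag)));
```

The second call finds nothing pending, which is the property that makes running the migrator on every startup safe.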
+
+## Workflow
+
+### Making schema changes
+
+1. Edit `src/db/schema.ts` with your table/column changes
+2. Generate a migration:
+   ```bash
+   npx drizzle-kit generate
+   ```
+3. Review the generated SQL in `drizzle/NNNN_*.sql`
+4. Commit the migration file along with your schema change
+
+### Applying migrations
+
+Migrations are applied automatically on server startup. No manual step needed.
+
+For tests, the same `ensureSchema()` function is called on in-memory SQLite databases in `src/db/repositories/drizzle/test-helpers.ts`.
+
+### Checking migration status
+
+```bash
+# See what drizzle-kit would generate (dry run)
+npx drizzle-kit generate --dry-run
+
+# Open drizzle studio to inspect the database
+npx drizzle-kit studio
+```
+
+## Rules
+
+- **Never hand-write migration SQL.** Always use `drizzle-kit generate` from the schema.
+- **Never use raw CREATE TABLE statements** for schema initialization. The migration system handles this.
+- **Always commit migration files.** They are the source of truth for database evolution.
+- **Migration files are immutable.** Once committed, never edit them. Make a new migration instead.
+- **Test with `npx vitest run`** after generating migrations to verify they work with in-memory databases.
--- a/docs/deviation-rules.md
+++ b/docs/deviation-rules.md
@@ -0,0 +1,263 @@
+# Deviation Rules
+
+During execution, agents discover work not in the original plan. These rules define how to handle deviations **automatically, without asking for permission** (except Rule 4).
+
+## The Four Rules
+
+### Rule 1: Auto-Fix Bugs
+**No permission needed.**
+
+Fix immediately when encountering:
+- Broken code (syntax errors, runtime errors)
+- Logic errors (wrong conditions, off-by-one)
+- Security issues (injection vulnerabilities, exposed secrets)
+- Type errors (TypeScript violations)
+
+```yaml
+deviation:
+  rule: 1
+  type: bug_fix
+  description: "Fixed null reference in user lookup"
+  location: src/services/auth.ts:45
+  original_code: "user.email.toLowerCase()"
+  fixed_code: "user?.email?.toLowerCase() ?? ''"
+  reason: "Crashes when user not found"
+```
+
+### Rule 2: Auto-Add Missing Critical Functionality
+**No permission needed.**
+
+Add immediately when clearly required:
+- Error handling (try/catch for external calls)
+- Input validation (user input, API boundaries)
+- Authentication checks (protected routes)
+- CSRF protection
+- Rate limiting (if pattern exists in codebase)
+
+```yaml
+deviation:
+  rule: 2
+  type: missing_critical
+  description: "Added input validation to createUser"
+  location: src/api/users.ts:23
+  added: "Zod schema validation for email, password length"
+  reason: "API accepts any input without validation"
+```
+
+### Rule 3: Auto-Fix Blocking Issues
+**No permission needed.**
+
+Fix immediately when blocking task completion:
+- Missing dependencies (npm install)
+- Broken imports (wrong paths, missing exports)
+- Configuration errors (env vars, tsconfig)
+- Build failures (compilation errors)
+
+```yaml
+deviation:
+  rule: 3
+  type: blocking_issue
+  description: "Added missing zod dependency"
+  command: "npm install zod"
+  reason: "Import fails without package"
+```
+
+### Rule 4: ASK About Architectural Changes
+**Permission required.**
+
+Stop and ask user before:
+- New database tables or major schema changes
+- New services or major component additions
+- Changes to API contracts
+- New external dependencies (beyond obvious needs)
+- Authentication/authorization model changes
+
+```yaml
+deviation:
+  rule: 4
+  type: architectural_change
+  status: PENDING_APPROVAL
+  description: "Considering adding Redis for session storage"
+  current: "Sessions stored in SQLite"
+  proposed: "Redis for distributed session storage"
+  reason: "Multiple server instances need shared sessions"
+  question: "Should we add Redis, or use sticky sessions instead?"
+```
+
+---
+
+## Decision Tree
+
+```
+Encountered unexpected issue
+         │
+         ▼
+    Is it broken code?
+    (errors, bugs, security)
+         │
+    YES ─┴─ NO
+     │      │
+     ▼      ▼
+   Rule 1   Is critical functionality missing?
+   Auto-fix (validation, auth, error handling)
+              │
+         YES ─┴─ NO
+          │      │
+          ▼      ▼
+        Rule 2   Is it blocking task completion?
+        Auto-add (deps, imports, config)
+                   │
+              YES ─┴─ NO
+               │      │
+               ▼      ▼
+             Rule 3   Is it architectural?
+             Auto-fix (tables, services, contracts)
+                        │
+                   YES ─┴─ NO
+                    │      │
+                    ▼      ▼
+                  Rule 4   Ignore or note
+                  ASK      for future
+```
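
The decision tree can be sketched as an ordered chain of checks. The `DeviationIssue` flags and `classifyDeviation` below are hypothetical names; how an agent actually decides each flag is outside the scope of this sketch.

```typescript
// Hypothetical flags describing an unexpected issue found during execution.
interface DeviationIssue {
  brokenCode: boolean;      // errors, bugs, security problems
  missingCritical: boolean; // validation, auth, error handling
  blocksTask: boolean;      // deps, imports, config
  architectural: boolean;   // tables, services, contracts
}

interface DeviationDecision {
  rule: 1 | 2 | 3 | 4 | null;
  action: "auto_fix" | "auto_add" | "ask" | "note_for_future";
}

// Checks are evaluated in the same order as the tree: the first match wins.
function classifyDeviation(issue: DeviationIssue): DeviationDecision {
  if (issue.brokenCode) return { rule: 1, action: "auto_fix" };
  if (issue.missingCritical) return { rule: 2, action: "auto_add" };
  if (issue.blocksTask) return { rule: 3, action: "auto_fix" };
  if (issue.architectural) return { rule: 4, action: "ask" };
  return { rule: null, action: "note_for_future" };
}

console.log(
  classifyDeviation({ brokenCode: false, missingCritical: false, blocksTask: true, architectural: false })
);
```

Because the first match wins, an issue that is both a bug and architectural is classified under Rule 1 in this sketch; an agent that is unsure would still fall back to asking, as described under the escalation path.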
+
+---
+
+## Documentation Requirements
+
+All deviations MUST be documented in SUMMARY.md:
+
+```yaml
+# 2-3-SUMMARY.md
+phase: 2
+plan: 3
+
+deviations:
+  - rule: 1
+    type: bug_fix
+    description: "Fixed null reference in auth service"
+    location: src/services/auth.ts:45
+
+  - rule: 2
+    type: missing_critical
+    description: "Added Zod validation to user API"
+    location: src/api/users.ts:23-45
+
+  - rule: 3
+    type: blocking_issue
+    description: "Installed missing jose dependency"
+    command: "npm install jose"
+
+  - rule: 4
+    type: architectural_change
+    status: APPROVED
+    description: "Added refresh_tokens table"
+    approved_by: user
+    approved_at: 2024-01-15T10:30:00Z
+```
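
A deviation entry like the ones above could be checked before it is written to SUMMARY.md. This is a minimal sketch: the field names follow the YAML examples, but which fields are mandatory is an assumption inferred from those examples, and `deviationProblems` is a hypothetical helper.

```typescript
// Field names mirror the YAML examples; optionality is an assumption.
interface DeviationRecord {
  rule: 1 | 2 | 3 | 4;
  type: string;
  description: string;
  location?: string;
  command?: string;
  status?: "PENDING_APPROVAL" | "APPROVED";
  approved_by?: string;
  approved_at?: string;
}

// Returns a list of human-readable problems; an empty list means the record looks complete.
function deviationProblems(record: DeviationRecord): string[] {
  const problems: string[] = [];
  if (record.description.trim() === "") problems.push("description is empty");
  if (record.rule === 4) {
    if (!record.status) problems.push("rule 4 requires a status");
    if (record.status === "APPROVED" && !(record.approved_by && record.approved_at)) {
      problems.push("approved rule 4 deviation needs approved_by and approved_at");
    }
  }
  return problems;
}

console.log(
  deviationProblems({ rule: 4, type: "architectural_change", description: "Added refresh_tokens table", status: "APPROVED" })
);
```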
+
+---
+
+## Deviation Tracking in Tasks
+
+When a deviation is significant, track it explicitly:
+
+### Minor Deviations
+Log in SUMMARY.md, no separate task.
+
+### Major Deviations (Rule 4)
+Create a decision record:
+
+```sql
+INSERT INTO task_history (
+  task_id,
+  field,
+  old_value,
+  new_value,
+  changed_by
+) VALUES (
+  'current-task-id',
+  'deviation',
+  NULL,
+  '{"rule": 4, "description": "Added Redis", "approved": true}',
+  'worker-123'
+);
+```
+
+### Deviations That Spawn Work
+If fixing a deviation requires substantial work:
+
+1. Complete current task
+2. Create new task for deviation work
+3. Link new task as dependency if blocking
+4. Continue with original plan
+
+---
+
+## Examples by Category
+
+### Rule 1: Bug Fixes
+
+| Issue | Fix | Documentation |
+|-------|-----|---------------|
+| Undefined property access | Add optional chaining | Note in summary |
+| SQL injection vulnerability | Use parameterized query | Note + security flag |
+| Race condition in async code | Add proper await | Note in summary |
+| Incorrect error message | Fix message text | Note in summary |
+
+### Rule 2: Missing Critical
+
+| Gap | Addition | Documentation |
+|-----|----------|---------------|
+| No input validation | Add Zod/Yup schema | Note in summary |
+| No error handling | Add try/catch + logging | Note in summary |
+| No auth check | Add middleware | Note in summary |
+| No CSRF token | Add csrf protection | Note + security flag |
+
+### Rule 3: Blocking Issues
+
+| Blocker | Resolution | Documentation |
+|---------|------------|---------------|
+| Missing npm package | npm install | Note in summary |
+| Wrong import path | Fix path | Note in summary |
+| Missing env var | Add to .env.example | Note in summary |
+| TypeScript config issue | Fix tsconfig | Note in summary |
+
+### Rule 4: Architectural (ASK FIRST)
+
+| Change | Why Ask | Question Format |
+|--------|---------|-----------------|
+| New DB table | Schema is contract | "Need users_sessions table. Create it?" |
+| New service | Architectural decision | "Extract auth to separate service?" |
+| API contract change | Breaking change | "Change POST /users response format?" |
+| New external dep | Maintenance burden | "Add Redis for caching?" |
+
+---
+
+## Integration with Verification
+
+Deviations are inputs to verification:
+
+1. **Verifier loads SUMMARY.md** with deviation list
+2. **Bug fixes (Rule 1):** verify the fix doesn't break tests
+3. **Critical additions (Rule 2):** verify they're properly integrated
+4. **Blocking fixes (Rule 3):** verify build/tests pass
+5. **Architectural changes (Rule 4):** verify they match the approved design
+
+---
+
+## Escalation Path
+
+If unsure which rule applies:
+
+1. **Default to Rule 4** (ask) rather than making a wrong assumption
+2. Document uncertainty in deviation notes
+3. Include reasoning for why you're asking
+
+```yaml
+deviation:
+  rule: 4
+  type: uncertain
+  description: "Adding caching layer to API responses"
+  reason: "Could be Rule 2 (performance is critical) or Rule 4 (new infrastructure)"
+  question: "Is Redis caching appropriate here, or should we use in-memory?"
+```
--- a/docs/execution-artifacts.md
+++ b/docs/execution-artifacts.md
@@ -0,0 +1,434 @@
+# Execution Artifacts
+
+Execution produces artifacts that document what happened, enable debugging, and provide context for future work.
+
+## Artifact Types
+
+| Artifact | Created By | Purpose |
+|----------|------------|---------|
+| PLAN.md | Architect | Executable instructions for a plan |
+| SUMMARY.md | Worker | Record of what actually happened |
+| VERIFICATION.md | Verifier | Goal-backward verification results |
+| UAT.md | Verifier + User | User acceptance testing results |
+| STATE.md | All agents | Session state (see [session-state.md](session-state.md)) |
+
+---
+
+## PLAN.md
+
+Plans are **executable prompts**, not documents that are later transformed into prompts.
+
+### Structure
+
+```yaml
+---
+# Frontmatter
+phase: 2
+plan: 3
+type: execute  # execute | tdd
+wave: 1
+depends_on: [2-2-PLAN]
+files_modified:
+  - src/api/auth/refresh.ts
+  - src/middleware/auth.ts
+  - db/migrations/002_refresh_tokens.sql
+autonomous: true  # false if checkpoints required
+must_haves:
+  observable_truths:
+    - "Refresh token extends session"
+    - "Old token invalidated after rotation"
+  required_artifacts:
+    - src/api/auth/refresh.ts
+  required_wiring:
+    - "refresh endpoint -> token storage"
+user_setup: []  # Human prereqs if any
+---
+
+# Phase 2, Plan 3: Refresh Token Rotation
+
+## Objective
+Implement refresh token rotation to extend user sessions securely while preventing token reuse attacks.
+
+## Context
+@file: PROJECT.md (project overview)
+@file: 2-CONTEXT.md (phase decisions)
+@file: 2-1-SUMMARY.md (prior work)
+@file: 2-2-SUMMARY.md (prior work)
+
+## Tasks
+
+### Task 1: Create refresh_tokens table
+- **type:** auto
+- **files:** db/migrations/002_refresh_tokens.sql, src/db/schema/refreshTokens.ts
+- **action:** Create table with: id (uuid), user_id (fk), token_hash (sha256), family (uuid for rotation tracking), expires_at, created_at, revoked_at. Index on token_hash and user_id.
+- **verify:** `cw db migrate` succeeds, schema matches
+- **done:** Migration applies, drizzle schema matches SQL
+
+### Task 2: Implement rotation endpoint
+- **type:** auto
+- **files:** src/api/auth/refresh.ts
+- **action:** POST /api/auth/refresh accepts refresh token in httpOnly cookie. Validate token exists and not expired. Generate new access + refresh tokens. Store new refresh, revoke old. Set cookies. Return 200 with new access token.
+- **verify:** curl with valid refresh cookie returns new tokens
+- **done:** Rotation works, old token invalidated
+
+### Task 3: Add token family validation
+- **type:** auto
+- **files:** src/api/auth/refresh.ts
+- **action:** If revoked token reused, revoke entire family (reuse detection). Log security event.
+- **verify:** Reusing old token revokes all tokens in family
+- **done:** Reuse detection active
+
+## Verification Criteria
+- [ ] New refresh token issued on rotation
+- [ ] Old refresh token no longer valid
+- [ ] Reused token triggers family revocation
+- [ ] Access token returned in response
+- [ ] Cookies set with correct flags (httpOnly, secure, sameSite)
+
+## Success Criteria
+- All tasks complete with passing verify steps
+- No TypeScript errors
+- Tests cover happy path and reuse detection
+```
+
+### Key Elements
+
+| Element | Purpose |
+|---------|---------|
+| `type: execute\|tdd` | Execution strategy |
+| `wave` | Parallelization grouping |
+| `depends_on` | Must complete first |
+| `files_modified` | Git tracking, conflict detection |
+| `autonomous` | Can run without checkpoints |
+| `must_haves` | Verification criteria |
+| `@file` references | Context to load |
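
To illustrate how `wave` and `files_modified` might work together for conflict detection, here is a minimal sketch. It assumes that plans sharing a wave may run in parallel, so two such plans touching the same file are a potential conflict; `PlanMeta` and `findFileConflicts` are hypothetical names.

```typescript
// Hypothetical subset of PLAN.md frontmatter.
interface PlanMeta {
  id: string;
  wave: number;
  filesModified: string[];
}

interface FileConflict {
  file: string;
  plans: [string, string];
}

// Reports every file that two plans in the same wave both intend to modify.
function findFileConflicts(plans: PlanMeta[]): FileConflict[] {
  const conflicts: FileConflict[] = [];
  for (let i = 0; i < plans.length; i++) {
    for (let j = i + 1; j < plans.length; j++) {
      if (plans[i].wave !== plans[j].wave) continue; // different waves never overlap in time
      for (const file of plans[i].filesModified) {
        if (plans[j].filesModified.indexOf(file) !== -1) {
          conflicts.push({ file, plans: [plans[i].id, plans[j].id] });
        }
      }
    }
  }
  return conflicts;
}

console.log(
  findFileConflicts([
    { id: "2-1", wave: 1, filesModified: ["src/middleware/auth.ts"] },
    { id: "2-2", wave: 1, filesModified: ["src/middleware/auth.ts"] },
  ])
);
```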
+
+---
+
+## SUMMARY.md
+
+Created after plan execution. Documents what **actually happened**.
+
+### Structure
+
+```yaml
+---
+phase: 2
+plan: 3
+subsystem: auth
+tags: [jwt, security, tokens]
+requires:
+  - users table
+  - jose library
+provides:
+  - refresh token rotation
+  - reuse detection
+affects:
+  - auth flow
+  - session management
+tech_stack:
+  - jose (JWT)
+  - drizzle (ORM)
+  - sqlite
+key_files:
+  - src/api/auth/refresh.ts: "Rotation endpoint"
+  - src/db/schema/refreshTokens.ts: "Token storage"
+decisions:
+  - "Token family for reuse detection"
+  - "SHA256 hash for token storage"
+metrics:
+  tasks_completed: 3
+  tasks_total: 3
+  deviations: 2
+  execution_time: "45 minutes"
+  context_usage: "38%"
+---
+
+# Phase 2, Plan 3 Summary: Refresh Token Rotation
+
+## What Was Built
+Implemented refresh token rotation with security features:
+- Rotation endpoint at POST /api/auth/refresh
+- Token storage with family tracking
+- Reuse detection that revokes entire token family
+
+## Implementation Notes
+
+### Token Storage
+Tokens stored as SHA256 hashes (never plaintext). Family UUID links related tokens for rotation tracking.
+
+### Rotation Flow
+1. Receive refresh token in cookie
+2. Hash and lookup in database
+3. Verify not expired, not revoked
+4. Generate new access + refresh tokens
+5. Store new refresh with same family
+6. Revoke old refresh token
+7. Set new cookies, return access token
+
+### Reuse Detection
+If a revoked token is presented, the entire family is revoked. This catches scenarios where an attacker captured an old token.
+
+## Deviations
+
+### Rule 2: Added rate limiting
+~~~yaml
+deviation:
+  rule: 2
+  type: missing_critical
+  description: "Added rate limiting to refresh endpoint"
+  location: src/api/auth/refresh.ts:12
+  reason: "Prevent brute force token guessing"
+~~~
+
+### Rule 1: Fixed async handler
+~~~yaml
+deviation:
+  rule: 1
+  type: bug_fix
+  description: "Added await to database query"
+  location: src/api/auth/refresh.ts:34
+  reason: "Query returned promise, not result"
+~~~
+
+## Commits
+- `feat(2-3): create refresh_tokens table and schema`
+- `feat(2-3): implement token rotation endpoint`
+- `feat(2-3): add token family reuse detection`
+- `fix(2-3): add await to token lookup query`
+- `feat(2-3): add rate limiting to refresh endpoint`
+
+## Verification Status
+- [x] New refresh token issued on rotation
+- [x] Old refresh token invalidated
+- [x] Reuse detection works
+- [x] Cookies set correctly
+- [ ] **Pending human verification:** Cookie flags in production
+
+## Notes for Next Plan
+- Rate limiting added; may need tuning based on load
+- Token family approach may need cleanup job for old families
+```
+
+### What to Include
+
+| Section | Content |
+|---------|---------|
+| Frontmatter | Metadata for future queries |
+| What Was Built | High-level summary |
+| Implementation Notes | Technical details worth preserving |
+| Deviations | All deviations (Rules 1-4) with details |
+| Commits | Git commit messages created |
+| Verification Status | What passed, what's pending |
+| Notes for Next Plan | Context for future work |
+
+---
+
+## VERIFICATION.md
+
+Created by Verifier after phase completion.
+
+### Structure
+
+```yaml
+---
+phase: 2
+status: PASS  # PASS | GAPS_FOUND
+verified_at: 2024-01-15T10:30:00Z
+verified_by: verifier-agent
+---
+
+# Phase 2 Verification: JWT Implementation
+
+## Observable Truths
+
+| Truth | Status | Evidence |
+|-------|--------|----------|
+| User can log in with email/password | VERIFIED | Login endpoint returns tokens, sets cookies |
+| Sessions persist across page refresh | VERIFIED | Cookie-based token survives reload |
+| Token refresh extends session | VERIFIED | Refresh endpoint issues new tokens |
+| Expired tokens rejected | VERIFIED | 401 returned for expired access token |
+
+## Required Artifacts
+
+| Artifact | Status | Check |
+|----------|--------|-------|
+| src/api/auth/login.ts | EXISTS | Exports login handler |
+| src/api/auth/refresh.ts | EXISTS | Exports refresh handler |
+| src/middleware/auth.ts | EXISTS | Exports auth middleware |
+| db/migrations/002_refresh_tokens.sql | EXISTS | Creates table |
+
+## Required Wiring
+
+| From | To | Status | Evidence |
+|------|-----|--------|----------|
+| Login handler | Token generation | WIRED | login.ts:45 calls createTokens |
+| Auth middleware | Token validation | WIRED | auth.ts:23 calls verifyToken |
+| Refresh handler | Token rotation | WIRED | refresh.ts:67 calls rotateToken |
+| Protected routes | Auth middleware | WIRED | routes.ts uses auth middleware |
+
+## Anti-Patterns
+
+| Pattern | Found | Location |
+|---------|-------|----------|
+| TODO comments | NO | - |
+| Stub implementations | NO | - |
+| Console.log in handlers | YES | src/api/auth/login.ts:34 (debug log) |
+| Empty catch blocks | NO | - |
+
+## Human Verification Needed
+
+| Check | Reason |
+|-------|--------|
+| Cookie flags in production | Requires deployed environment |
+| Token timing accuracy | Requires wall-clock testing |
+
+## Gaps Found
+None blocking. One console.log should be removed before production.
+
+## Remediation
+- Task created: "Remove debug console.log from login handler"
+```
+
+---
+
+## UAT.md
+
+User Acceptance Testing results.
+
+### Structure
+
+```yaml
+---
+phase: 2
+tested_by: user
+tested_at: 2024-01-15T14:00:00Z
+status: PASS  # PASS | ISSUES_FOUND
+---
+
+# Phase 2 UAT: JWT Implementation
+
+## Test Cases
+
+### 1. Login with email and password
+**Prompt:** "Can you log in with your email and password?"
+**Result:** PASS
+**Notes:** Login successful, redirected to dashboard
+
+### 2. Session persists on refresh
+**Prompt:** "Refresh the page. Are you still logged in?"
+**Result:** PASS
+**Notes:** Still authenticated after refresh
+
+### 3. Logout clears session
+**Prompt:** "Click logout. Can you access the dashboard?"
+**Result:** PASS
+**Notes:** Redirected to login page
+
+### 4. Expired session prompts re-login
+**Prompt:** "Wait 15 minutes (or we can simulate). Does the session refresh?"
+**Result:** SKIPPED
+**Reason:** "User chose to trust token rotation implementation"
+
+## Issues Found
+None.
+
+## Sign-Off
+User confirms Phase 2 JWT Implementation meets requirements.
+Next: Proceed to Phase 3 (OAuth Integration)
+```
+
+---
+
+## Artifact Storage
+
+### File Structure
+
+```
+.planning/
+├── phases/
+│   ├── 1/
+│   │   ├── 1-CONTEXT.md
+│   │   ├── 1-1-PLAN.md
+│   │   ├── 1-1-SUMMARY.md
+│   │   ├── 1-2-PLAN.md
+│   │   ├── 1-2-SUMMARY.md
+│   │   └── 1-VERIFICATION.md
+│   └── 2/
+│       ├── 2-CONTEXT.md
+│       ├── 2-1-PLAN.md
+│       ├── 2-1-SUMMARY.md
+│       ├── 2-2-PLAN.md
+│       ├── 2-2-SUMMARY.md
+│       ├── 2-3-PLAN.md
+│       ├── 2-3-SUMMARY.md
+│       ├── 2-VERIFICATION.md
+│       └── 2-UAT.md
+├── STATE.md
+└── config.json
+```
+
+### Naming Convention
+
+| Pattern | Meaning |
+|---------|---------|
+| `{phase}-CONTEXT.md` | Discussion decisions for phase |
+| `{phase}-{plan}-PLAN.md` | Executable plan |
+| `{phase}-{plan}-SUMMARY.md` | Execution record |
+| `{phase}-VERIFICATION.md` | Phase verification |
+| `{phase}-UAT.md` | User acceptance testing |
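
The naming convention is regular enough to build and parse mechanically. The sketch below covers the five patterns in the table; `ArtifactRef`, `artifactFileName`, and `parseArtifactFileName` are hypothetical helpers, not existing project code.

```typescript
type ArtifactKind = "CONTEXT" | "PLAN" | "SUMMARY" | "VERIFICATION" | "UAT";

interface ArtifactRef {
  phase: number;
  plan?: number; // only PLAN and SUMMARY carry a plan number
  kind: ArtifactKind;
}

// Builds a file name such as "2-3-SUMMARY.md" or "2-UAT.md".
function artifactFileName(ref: ArtifactRef): string {
  const perPlan = ref.kind === "PLAN" || ref.kind === "SUMMARY";
  if (perPlan && ref.plan === undefined) {
    throw new Error(ref.kind + " requires a plan number");
  }
  return perPlan
    ? ref.phase + "-" + ref.plan + "-" + ref.kind + ".md"
    : ref.phase + "-" + ref.kind + ".md";
}

// Parses a file name back into its parts; returns null if it does not follow the convention.
function parseArtifactFileName(fileName: string): ArtifactRef | null {
  const match = /^(\d+)(?:-(\d+))?-(CONTEXT|PLAN|SUMMARY|VERIFICATION|UAT)\.md$/.exec(fileName);
  if (!match) return null;
  const kind = match[3] as ArtifactKind;
  const perPlan = kind === "PLAN" || kind === "SUMMARY";
  const hasPlan = match[2] !== undefined;
  if (perPlan !== hasPlan) return null; // e.g. "2-PLAN.md" or "2-3-UAT.md"
  const ref: ArtifactRef = { phase: Number(match[1]), kind };
  if (hasPlan) ref.plan = Number(match[2]);
  return ref;
}

console.log(artifactFileName({ phase: 2, plan: 3, kind: "SUMMARY" }));
console.log(parseArtifactFileName("2-VERIFICATION.md"));
```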
+
+---
+
+## Commit Strategy
+
+Each task produces an atomic commit:
+
+```
+{type}({phase}-{plan}): {description}
+
+- Detail 1
+- Detail 2
+```
+
+### Types
+- `feat`: New functionality
+- `fix`: Bug fix
+- `test`: Test additions
+- `refactor`: Code restructuring
+- `perf`: Performance improvement
+- `docs`: Documentation
+- `style`: Formatting only
+- `chore`: Maintenance
+
+### Examples
+```
+feat(2-3): implement refresh token rotation
+
+- Add refresh_tokens table with family tracking
+- Implement rotation endpoint at POST /api/auth/refresh
+- Add reuse detection with family revocation
+
+fix(2-3): add await to token lookup query
+
+- Token lookup was returning promise instead of result
+- Added proper await in refresh handler
+
+feat(2-3): add rate limiting to refresh endpoint
+
+- [Deviation Rule 2] Added express-rate-limit
+- 10 requests per minute per IP
+- Prevents brute force token guessing
+```
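
The commit format can be produced by a small helper. This is a sketch only; `formatCommitMessage` is a hypothetical function, and the type list is taken from the Types section above.

```typescript
type CommitType = "feat" | "fix" | "test" | "refactor" | "perf" | "docs" | "style" | "chore";

// Builds "{type}({phase}-{plan}): {description}" followed by an optional bullet list of details.
function formatCommitMessage(
  type: CommitType,
  phase: number,
  plan: number,
  description: string,
  details: string[] = []
): string {
  const header = type + "(" + phase + "-" + plan + "): " + description;
  if (details.length === 0) return header;
  return header + "\n\n" + details.map((detail) => "- " + detail).join("\n");
}

console.log(
  formatCommitMessage("fix", 2, 3, "add await to token lookup query", [
    "Token lookup was returning promise instead of result",
    "Added proper await in refresh handler",
  ])
);
```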
+
+### Metadata Commit
+
+After plan completion:
+```
+chore(2-3): complete plan execution
+
+Artifacts:
+- 2-3-SUMMARY.md created
+- STATE.md updated
+- 3 tasks completed, 2 deviations handled
+```
--- a/docs/initiatives.md
+++ b/docs/initiatives.md
@@ -0,0 +1,520 @@
+# Initiatives Module
+
+Initiatives are the planning layer for larger features. They provide a Notion-like document hierarchy for capturing context, decisions, and requirements before work begins. Once approved, initiatives generate phased task plans that agents execute.
+
+## Design Philosophy
+
+### Why Initiatives?
+
+Tasks are atomic work units—great for execution but too granular for planning. Initiatives bridge the gap:
+
+- **Before approval**: A living document where user and Architect refine the vision
+- **After approval**: A persistent knowledge base that tasks link back to
+- **Forever**: Context for future work ("why did we build it this way?")
+
+### Notion-Like Structure
+
+Initiatives aren't flat documents. They're hierarchical pages:
+
+```
+Initiative: User Authentication
+├── User Journeys
+│   ├── Sign Up Flow
+│   └── Password Reset Flow
+├── Business Rules
+│   └── Password Requirements
+├── Technical Concept
+│   ├── JWT Strategy
+│   └── Session Management
+└── Architectural Changes
+    └── Auth Middleware
+```
+
+Each "page" is a record in SQLite with parent-child relationships. This enables:
+- Structured queries: "Give me all subpages of initiative X"
+- Inventory views: "List all technical concepts across initiatives"
+- Cross-references: Link between pages
+
+---
+
+## Data Model
+
+### Initiative Entity
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `id` | TEXT | Primary key (e.g., `init-a1b2c3`) |
+| `project_id` | TEXT | Scopes to a project (most initiatives are single-project) |
+| `title` | TEXT | Initiative name |
+| `status` | TEXT | `draft`, `review`, `approved`, `in_progress`, `completed`, `rejected` |
+| `created_by` | TEXT | User who created it |
+| `created_at` | INTEGER | Unix timestamp |
+| `updated_at` | INTEGER | Unix timestamp |
+| `approved_at` | INTEGER | When approved (null if not approved) |
+| `approved_by` | TEXT | Who approved it |
+
+### Initiative Page Entity
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `id` | TEXT | Primary key (e.g., `page-x1y2z3`) |
+| `initiative_id` | TEXT | Parent initiative |
+| `parent_page_id` | TEXT | Parent page (null for root-level pages) |
+| `type` | TEXT | `user_journey`, `business_rule`, `technical_concept`, `architectural_change`, `note`, `custom` |
+| `title` | TEXT | Page title |
+| `content` | TEXT | Markdown content |
+| `sort_order` | INTEGER | Display order among siblings |
+| `created_at` | INTEGER | Unix timestamp |
+| `updated_at` | INTEGER | Unix timestamp |
+
+### Initiative Phase Entity
+
+Phases group tasks for staged execution and rolling approval.
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `id` | TEXT | Primary key (e.g., `phase-p1q2r3`) |
+| `initiative_id` | TEXT | Parent initiative |
+| `number` | INTEGER | Phase number (1, 2, 3...) |
+| `name` | TEXT | Phase name |
+| `description` | TEXT | What this phase delivers |
+| `status` | TEXT | `draft`, `pending_approval`, `approved`, `in_progress`, `completed` |
+| `approved_at` | INTEGER | When approved |
+| `approved_by` | TEXT | Who approved |
+| `created_at` | INTEGER | Unix timestamp |
+
+### Task Link
+
+Tasks reference their initiative and phase:
+
+```sql
+-- In tasks table (see docs/tasks.md)
+initiative_id TEXT REFERENCES initiatives(id),
+phase_id TEXT REFERENCES initiative_phases(id),
+```
+
+---
+
+## SQLite Schema
+
+```sql
+CREATE TABLE initiatives (
+  id TEXT PRIMARY KEY,
+  project_id TEXT,
+  title TEXT NOT NULL,
+  status TEXT NOT NULL DEFAULT 'draft'
+    CHECK (status IN ('draft', 'review', 'approved', 'in_progress', 'completed', 'rejected')),
+  created_by TEXT,
+  created_at INTEGER NOT NULL DEFAULT (unixepoch()),
+  updated_at INTEGER NOT NULL DEFAULT (unixepoch()),
+  approved_at INTEGER,
+  approved_by TEXT
+);
+
+CREATE TABLE initiative_pages (
+  id TEXT PRIMARY KEY,
+  initiative_id TEXT NOT NULL REFERENCES initiatives(id) ON DELETE CASCADE,
+  parent_page_id TEXT REFERENCES initiative_pages(id) ON DELETE CASCADE,
+  type TEXT NOT NULL DEFAULT 'note'
+    CHECK (type IN ('user_journey', 'business_rule', 'technical_concept', 'architectural_change', 'note', 'custom')),
+  title TEXT NOT NULL,
+  content TEXT,
+  sort_order INTEGER NOT NULL DEFAULT 0,
+  created_at INTEGER NOT NULL DEFAULT (unixepoch()),
+  updated_at INTEGER NOT NULL DEFAULT (unixepoch())
+);
+
+CREATE TABLE initiative_phases (
+  id TEXT PRIMARY KEY,
+  initiative_id TEXT NOT NULL REFERENCES initiatives(id) ON DELETE CASCADE,
+  number INTEGER NOT NULL,
+  name TEXT NOT NULL,
+  description TEXT,
+  status TEXT NOT NULL DEFAULT 'draft'
+    CHECK (status IN ('draft', 'pending_approval', 'approved', 'in_progress', 'completed')),
+  approved_at INTEGER,
+  approved_by TEXT,
+  created_at INTEGER NOT NULL DEFAULT (unixepoch()),
+  UNIQUE(initiative_id, number)
+);
+
+CREATE INDEX idx_initiatives_project ON initiatives(project_id);
+CREATE INDEX idx_initiatives_status ON initiatives(status);
+CREATE INDEX idx_pages_initiative ON initiative_pages(initiative_id);
+CREATE INDEX idx_pages_parent ON initiative_pages(parent_page_id);
+CREATE INDEX idx_pages_type ON initiative_pages(type);
+CREATE INDEX idx_phases_initiative ON initiative_phases(initiative_id);
+CREATE INDEX idx_phases_status ON initiative_phases(status);
+
+-- Useful views
+CREATE VIEW initiative_page_tree AS
+WITH RECURSIVE tree AS (
+  SELECT id, initiative_id, parent_page_id, title, type, 0 as depth,
+         title as path
+  FROM initiative_pages WHERE parent_page_id IS NULL
+  UNION ALL
+  SELECT p.id, p.initiative_id, p.parent_page_id, p.title, p.type, t.depth + 1,
+         t.path || ' > ' || p.title
+  FROM initiative_pages p
+  JOIN tree t ON p.parent_page_id = t.id
+)
+SELECT * FROM tree ORDER BY path;
+```
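
For readers more comfortable with application code than recursive CTEs, the view above can be mirrored in TypeScript. This is an illustrative sketch operating on rows already loaded into memory; `PageRow` and `buildPageTree` are hypothetical names, and the row shape is reduced to the columns the path computation needs.

```typescript
interface PageRow {
  id: string;
  parentPageId: string | null;
  title: string;
}

interface PageTreeRow extends PageRow {
  depth: number;
  path: string;
}

// Walks from root pages (no parent) down to their descendants, recording depth
// and a "Parent > Child" path, then sorts by path like the view's ORDER BY.
function buildPageTree(rows: PageRow[]): PageTreeRow[] {
  const result: PageTreeRow[] = [];
  const visit = (parentId: string | null, depth: number, prefix: string): void => {
    for (const row of rows) {
      if (row.parentPageId !== parentId) continue;
      const path = depth === 0 ? row.title : prefix + " > " + row.title;
      result.push({ id: row.id, parentPageId: row.parentPageId, title: row.title, depth, path });
      visit(row.id, depth + 1, path);
    }
  };
  visit(null, 0, "");
  return result.sort((a, b) => (a.path < b.path ? -1 : a.path > b.path ? 1 : 0));
}

const demoPages: PageRow[] = [
  { id: "page-1", parentPageId: null, title: "User Journeys" },
  { id: "page-2", parentPageId: "page-1", title: "Sign Up Flow" },
  { id: "page-3", parentPageId: null, title: "Technical Concept" },
  { id: "page-4", parentPageId: "page-3", title: "JWT Strategy" },
];
console.log(buildPageTree(demoPages).map((row) => row.path));
```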
+
+---
+
+## Status Workflow
+
+### Initiative Status
+
+```
+  [draft] ──submit──▶ [review] ──approve──▶ [approved]
+                         │                      │
+                         │ reject               │ start work
+                         ▼                      ▼
+                    [rejected]            [in_progress]
+                                                │
+                                                │ all phases done
+                                                ▼
+                                           [completed]
+```
+
+| Status | Meaning |
+|--------|---------|
+| `draft` | User/Architect still refining |
+| `review` | Ready for approval decision |
+| `approved` | Work plan created, awaiting execution |
+| `in_progress` | At least one phase executing |
+| `completed` | All phases completed |
+| `rejected` | Won't implement |
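
The initiative lifecycle can be expressed as a transition table. The sketch below reads the workflow as draft → review → approved → in_progress → completed, with rejection possible only from review; that reading, and the `canTransition` helper, are assumptions made for illustration.

```typescript
type InitiativeStatus = "draft" | "review" | "approved" | "in_progress" | "completed" | "rejected";

// Allowed next statuses for each status (an interpretation of the workflow above).
const INITIATIVE_TRANSITIONS: Record<InitiativeStatus, InitiativeStatus[]> = {
  draft: ["review"],               // submit
  review: ["approved", "rejected"], // approve or reject
  approved: ["in_progress"],       // start work
  in_progress: ["completed"],      // all phases done
  completed: [],
  rejected: [],
};

function canTransition(from: InitiativeStatus, to: InitiativeStatus): boolean {
  return INITIATIVE_TRANSITIONS[from].indexOf(to) !== -1;
}

console.log(canTransition("review", "approved"));
console.log(canTransition("draft", "completed"));
```

Keeping the transitions in one table makes it straightforward to reject invalid status updates before they reach the database, complementing the `CHECK` constraint, which only validates the value itself.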
+
+### Phase Status
+
+```
+  [draft] ──finalize──▶ [pending_approval] ──approve──▶ [approved]
+                                                            │
+                                                            │ claim tasks
+                                                            ▼
+                                                       [in_progress]
+                                                            │
+                                                            │ all tasks closed
+                                                            ▼
+                                                       [completed]
+```
+
+**Rolling approval pattern:**
+1. Architect creates work plan with multiple phases
+2. User approves Phase 1 → agents start executing
+3. While Phase 1 executes, user reviews Phase 2
+4. Phase 2 approved → agents can start when ready
+5. Continue until all phases approved/completed
+
+This prevents blocking: agents don't wait for all phases to be approved upfront.
+
+---
+
+## Workflow
+
+### 1. Draft Initiative
+
+User creates initiative with basic vision:
+
+```
+cw initiative create "User Authentication"
+```
+
+The system creates the initiative in `draft` status with an empty page structure.
+
+### 2. Architect Iteration (Questioning)
+
+Architect agent engages in structured questioning to capture requirements:
+
+**Question Categories:**
+
+| Category | Example Questions |
+|----------|-------------------|
+| **Visual Features** | Layout approach? Density? Interactions? Empty states? |
+| **APIs/CLIs** | Response format? Flags? Error handling? Verbosity? |
+| **Data/Content** | Structure? Validation rules? Edge cases? |
+| **Architecture** | Patterns to follow? What to avoid? Reference code? |
+
+Each answer populates initiative pages. Architect may:
+- Create user journey pages
+- Document business rules
+- Draft technical concepts
+- Flag architectural impacts
+
+See [agents/architect.md](agents/architect.md) for the full Architect agent prompt.
+
+### 3. Discussion Phase (Per Phase)
+
+Before planning each phase, the Architect captures implementation decisions through focused discussion. This happens BEFORE any planning work.
+
+```
+cw phase discuss <phase-id>
+```
+
+Creates `{phase}-CONTEXT.md` with locked decisions:
+
+```yaml
+---
+phase: 1
+discussed_at: 2024-01-15
+---
+
+# Phase 1 Context: User Authentication
+
+## Decisions
+
+### Authentication Method
+**Decision:** Email/password with optional OAuth
+**Reason:** MVP needs simple auth, OAuth for convenience
+**Locked:** true
+
+### Token Storage
+**Decision:** httpOnly cookies
+**Reason:** XSS protection
+**Alternatives Rejected:**
+- localStorage: XSS vulnerable
+```
+
+These decisions guide all subsequent planning and execution. Workers reference CONTEXT.md for implementation direction.
+
+### 4. Research Phase (Optional)
+
+For phases with unknowns, run discovery before planning:
+
+| Level | When | Time | Scope |
+|-------|------|------|-------|
+| L0 | Pure internal work | Skip | None |
+| L1 | Quick verification | 2-5 min | Confirm assumptions |
+| L2 | Standard research | 15-30 min | Explore patterns |
+| L3 | Deep dive | 1+ hour | Novel domain |
+
+```
+cw phase research <phase-id> --level 2
+```
+
+Creates `{phase}-RESEARCH.md` with findings that inform planning.
+
+### 5. Submit for Review
+
+When Architect and user are satisfied:
+
+```
+cw initiative submit <id>
+```
+
+The status changes to `review`, which triggers a notification for approval.
+
+### 6. Approve Initiative
+
+Human reviews the complete initiative:
+
+```
+cw initiative approve <id>
+```
+
+The status changes to `approved`. The work plan can now be created.
+
+### 7. Create Work Plan
+
+Architect (or user) breaks initiative into phases:
+
+```
+cw initiative plan <id>
+```
+
+This creates:
+- `initiative_phases` records
+- Tasks linked to each phase via `initiative_id` + `phase_id`
+
+Tasks are created in `open` status but won't be "ready" until their phase is approved.
+
+### 8. Approve Phases (Rolling)
+
+User reviews and approves phases one at a time:
+
+```
+cw phase approve <phase-id>
+```
+
+Approved phases make their tasks "ready" for agents. User can approve Phase 1, let agents work, then approve Phase 2 later.
+
+### 7. Execute
+
+Workers pull tasks via `cw task ready`. Tasks include:
+- Link to initiative for context
+- Link to phase for grouping
+- All normal task fields (dependencies, priority, etc.)
+
+### 8. Verify Phase
+
+After all tasks in a phase complete, the Verifier agent runs goal-backward verification:
+
+```
+cw phase verify <phase-id>
+```
+
+Verification checks:
+1. **Observable truths** — What users can observe when goal is achieved
+2. **Required artifacts** — Files that must exist (not stubs)
+3. **Required wiring** — Connections that must work
+4. **Anti-patterns** — TODOs, placeholders, empty returns
+
+Creates `{phase}-VERIFICATION.md` with results. If gaps found, creates remediation tasks.
+
+See [verification.md](verification.md) for detailed verification patterns.
+
+### 9. User Acceptance Testing
+
+After technical verification passes, run UAT:
+
+```
+cw phase uat <phase-id>
+```
+
+Walks user through testable deliverables:
+- "Can you log in with email and password?"
+- "Does the dashboard show your projects?"
+
+Creates `{phase}-UAT.md` with results. If issues found, creates targeted fix plans.
+
+### 10. Complete
+
+When all tasks in all phases are closed AND verification passes:
+- Each phase auto-transitions to `completed`
+- Initiative auto-transitions to `completed`
+- Domain layer updated to reflect new state
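+
+The completion rule above can be sketched as a pair of predicates. This is a minimal illustration, not the project's implementation: the `PhaseSnapshot` shape and the function names are hypothetical, and it assumes the verification result is available as a boolean per phase.
+
+```typescript
+// Hypothetical shapes for illustration; the real schema may differ.
+type TaskStatus = 'open' | 'in_progress' | 'blocked' | 'closed';
+
+interface PhaseSnapshot {
+  id: string;
+  taskStatuses: TaskStatus[];
+  verificationPassed: boolean;
+}
+
+// A phase completes when every task is closed AND verification passed.
+function isPhaseComplete(phase: PhaseSnapshot): boolean {
+  return (
+    phase.verificationPassed &&
+    phase.taskStatuses.every((status) => status === 'closed')
+  );
+}
+
+// An initiative completes when it has phases and all of them are complete.
+function isInitiativeComplete(phases: PhaseSnapshot[]): boolean {
+  return phases.length > 0 && phases.every(isPhaseComplete);
+}
+```
+
+Treating an initiative with no phases as incomplete is an assumption of this sketch, not documented behavior.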
+
+---
+
+## Phase Artifacts
+
+Each phase produces artifacts during execution:
+
+| Artifact | Created By | Purpose |
+|----------|------------|---------|
+| `{phase}-CONTEXT.md` | Architect (Discussion) | Locked implementation decisions |
+| `{phase}-RESEARCH.md` | Architect (Research) | Domain knowledge findings |
+| `{phase}-{N}-PLAN.md` | Architect (Planning) | Executable task plans |
+| `{phase}-{N}-SUMMARY.md` | Worker (Execution) | What actually happened |
+| `{phase}-VERIFICATION.md` | Verifier | Goal-backward verification |
+| `{phase}-UAT.md` | Verifier + User | User acceptance testing |
+
+See [execution-artifacts.md](execution-artifacts.md) for artifact specifications.
+
+---
+
+## CLI Reference
+
+### Initiative Commands
+
+| Command | Description |
+|---------|-------------|
+| `cw initiative create <title>` | Create draft initiative |
+| `cw initiative list [--status STATUS]` | List initiatives |
+| `cw initiative show <id>` | Show initiative with page tree |
+| `cw initiative submit <id>` | Submit for review |
+| `cw initiative approve <id>` | Approve initiative |
+| `cw initiative reject <id> --reason "..."` | Reject initiative |
+| `cw initiative plan <id>` | Generate phased work plan |
+
+### Page Commands
+
+| Command | Description |
+|---------|-------------|
+| `cw page create <initiative-id> <title> --type TYPE` | Create page |
+| `cw page create <initiative-id> <title> --parent <page-id>` | Create subpage |
+| `cw page show <id>` | Show page content |
+| `cw page edit <id>` | Edit page (opens editor) |
+| `cw page list <initiative-id> [--type TYPE]` | List pages |
+| `cw page tree <initiative-id>` | Show page hierarchy |
+
+### Phase Commands
+
+| Command | Description |
+|---------|-------------|
+| `cw phase list <initiative-id>` | List phases |
+| `cw phase show <id>` | Show phase with tasks |
+| `cw phase discuss <id>` | Capture implementation decisions (creates CONTEXT.md) |
+| `cw phase research <id> [--level N]` | Run discovery (L0-L3, creates RESEARCH.md) |
+| `cw phase approve <id>` | Approve phase for execution |
+| `cw phase verify <id>` | Run goal-backward verification |
+| `cw phase uat <id>` | Run user acceptance testing |
+| `cw phase status <id>` | Check phase progress |
+
+---
+
+## Integration Points
+
+### With Tasks Module
+
+Tasks gain two new fields:
+- `initiative_id`: Links task to initiative (for context)
+- `phase_id`: Links task to phase (for grouping/approval)
+
+The `ready_tasks` view should consider phase approval:
+
+```sql
+CREATE VIEW ready_tasks AS
+SELECT t.* FROM tasks t
+LEFT JOIN initiative_phases p ON t.phase_id = p.id
+WHERE t.status = 'open'
+  AND (t.phase_id IS NULL OR p.status IN ('approved', 'in_progress'))
+  AND NOT EXISTS (
+    SELECT 1 FROM task_dependencies d
+    JOIN tasks dep ON d.depends_on = dep.id
+    WHERE d.task_id = t.id
+      AND d.type = 'blocks'
+      AND dep.status != 'closed'
+  )
+ORDER BY t.priority ASC, t.created_at ASC;
+```
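+
+For unit tests or an in-memory store, the same readiness rule might be expressed in TypeScript. This is a sketch that mirrors the view's conditions; the `Task`, `Phase`, and `Dependency` shapes and the `readyTasks` name are assumptions made for illustration.
+
+```typescript
+// Hypothetical in-memory shapes mirroring the columns used by the view.
+interface Task {
+  id: string;
+  status: string;
+  phaseId: string | null;
+  priority: number;
+  createdAt: number;
+}
+interface Phase {
+  id: string;
+  status: string;
+}
+interface Dependency {
+  taskId: string;
+  dependsOn: string;
+  type: string;
+}
+
+function readyTasks(tasks: Task[], phases: Phase[], deps: Dependency[]): Task[] {
+  const phaseStatus = new Map<string, string>();
+  phases.forEach((p) => phaseStatus.set(p.id, p.status));
+  const taskStatus = new Map<string, string>();
+  tasks.forEach((t) => taskStatus.set(t.id, t.status));
+
+  return tasks
+    .filter((t) => t.status === 'open')
+    // No phase, or a phase that is approved / in progress.
+    .filter((t) => {
+      if (t.phaseId === null) return true;
+      const status = phaseStatus.get(t.phaseId);
+      return status === 'approved' || status === 'in_progress';
+    })
+    // No 'blocks' dependency on an existing task that is not yet closed.
+    .filter(
+      (t) =>
+        !deps.some(
+          (d) =>
+            d.taskId === t.id &&
+            d.type === 'blocks' &&
+            taskStatus.has(d.dependsOn) &&
+            taskStatus.get(d.dependsOn) !== 'closed',
+        ),
+    )
+    // Lower priority number first, then oldest first.
+    .sort((a, b) => a.priority - b.priority || a.createdAt - b.createdAt);
+}
+```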
+
+### With Domain Layer
+
+When initiative completes, its pages can feed into domain documentation:
+- Business rules → Domain business rules
+- Technical concepts → Architecture docs
+- New aggregates → Domain model updates
+
+### With Orchestrator
+
+Orchestrator can:
+- Trigger Architect agents for initiative iteration
+- Monitor phase completion and auto-advance initiative status
+- Coordinate approval notifications
+
+### tRPC Procedures
+
+```typescript
+// Suggested tRPC router shape
+initiative.create(input)           // → Initiative
+initiative.list(filters)           // → Initiative[]
+initiative.get(id)                 // → Initiative with pages
+initiative.submit(id)              // → Initiative
+initiative.approve(id)             // → Initiative
+initiative.reject(id, reason)      // → Initiative
+initiative.plan(id)                // → Phase[]
+
+page.create(input)                 // → Page
+page.get(id)                       // → Page
+page.update(id, content)           // → Page
+page.list(initiativeId, filters)   // → Page[]
+page.tree(initiativeId)            // → PageTree
+
+phase.list(initiativeId)           // → Phase[]
+phase.get(id)                      // → Phase with tasks
+phase.approve(id)                  // → Phase
+phase.status(id)                   // → PhaseStatus
+```
+
+---
+
+## Future Considerations
+
+- **Templates**: Pre-built page structures for common initiative types
+- **Cross-project initiatives**: Single initiative spanning multiple projects
+- **Versioning**: Track changes to initiative pages over time
+- **Approval workflows**: Multi-step approval with different approvers
+- **Auto-planning**: LLM generates work plan from initiative content
--- a/docs/logging.md
+++ b/docs/logging.md
@@ -0,0 +1,64 @@
+# Structured Logging
+
+Codewalk District uses [pino](https://getpino.io/) for structured JSON logging on the backend.
+
+## Architecture
+
+- **pino** writes structured JSON to **stderr** so CLI user output on stdout stays clean
+- **console.log** remains for CLI command handlers (user-facing output on stdout)
+- The `src/logging/` module (ProcessLogWriter/LogManager) is a separate concern — it captures per-agent process stdout/stderr to files
+
+## Environment Variables
+
+| Variable | Description | Default |
+|----------|-------------|---------|
+| `CW_LOG_LEVEL` | Log level override (`fatal`, `error`, `warn`, `info`, `debug`, `trace`, `silent`) | `info` (production), `debug` (development) |
+| `CW_LOG_PRETTY` | Set to `1` for human-readable colorized output via pino-pretty | unset (JSON output) |
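+
+How these two variables might resolve into a logger configuration can be sketched as below. This is an illustration only: `resolveLogConfig` is a hypothetical helper, and using `NODE_ENV` to tell production from development, as well as falling back to the default on an unrecognized level, are assumptions not stated in this document.
+
+```typescript
+const LEVELS = ['fatal', 'error', 'warn', 'info', 'debug', 'trace', 'silent'] as const;
+type LogLevel = (typeof LEVELS)[number];
+
+interface LogConfig {
+  level: LogLevel;
+  pretty: boolean;
+}
+
+// Hypothetical helper: derives logger settings from environment variables.
+function resolveLogConfig(env: Record<string, string | undefined>): LogConfig {
+  const requested = env['CW_LOG_LEVEL'];
+  // Assumption: NODE_ENV distinguishes production from development.
+  const fallback: LogLevel = env['NODE_ENV'] === 'production' ? 'info' : 'debug';
+  // Unrecognized or unset values fall back to the environment default.
+  const level = LEVELS.find((candidate) => candidate === requested) ?? fallback;
+  return { level, pretty: env['CW_LOG_PRETTY'] === '1' };
+}
+```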
+
+## Log Levels
+
+| Level | Usage |
+|-------|-------|
+| `fatal` | Process will exit (uncaught exceptions, DB migration failure) |
+| `error` | Operation failed (agent crash, parse failure, clone failure) |
+| `warn` | Degraded (account exhausted, no accounts available, stale PID, reconcile marking crashed) |
+| `info` | State transitions (agent spawned/stopped/resumed, dispatch decision, server started, account selected/switched) |
+| `debug` | Implementation details (command being built, session ID extraction, worktree paths, schema selection) |
+
+## Adding Logging to a New Module
+
+```typescript
+import { createModuleLogger } from '../logger/index.js';
+
+const log = createModuleLogger('my-module');
+
+// Use structured data as first arg, message as second
+log.info({ taskId, agentId }, 'task dispatched');
+log.error({ err: error }, 'operation failed');
+log.debug({ path, count }, 'processing items');
+```
+
+## Module Names
+
+| Module | Used in |
+|--------|---------|
+| `agent-manager` | `src/agent/manager.ts` |
+| `dispatch` | `src/dispatch/manager.ts` |
+| `http` | `src/server/index.ts` |
+| `server` | `src/cli/index.ts` (startup) |
+| `git` | `src/git/manager.ts`, `src/git/clone.ts`, `src/git/project-clones.ts` |
+| `db` | `src/db/ensure-schema.ts` |
+
+## Testing
+
+Logs are silenced in tests via `CW_LOG_LEVEL=silent` in `vitest.config.ts`.
+
+## Quick Start
+
+```sh
+# Pretty logs during development
+CW_LOG_LEVEL=debug CW_LOG_PRETTY=1 cw --server
+
+# JSON logs for production/piping
+cw --server 2>server.log
+```
--- a/docs/model-profiles.md
+++ b/docs/model-profiles.md
@@ -0,0 +1,267 @@
+# Model Profiles
+
+Different agent roles have different needs. Model selection balances quality, cost, and latency.
+
+## Profile Definitions
+
+| Profile | Use Case | Cost | Quality |
+|---------|----------|------|---------|
+| **quality** | Critical decisions, architecture | Highest | Best |
+| **balanced** | Default for most work | Medium | Good |
+| **budget** | High-volume, low-risk tasks | Lowest | Acceptable |
+
+---
+
+## Agent Model Assignments
+
+| Agent | Quality | Balanced (Default) | Budget |
+|-------|---------|-------------------|--------|
+| **Architect** | Opus | Opus | Sonnet |
+| **Worker** | Opus | Sonnet | Sonnet |
+| **Verifier** | Sonnet | Sonnet | Haiku |
+| **Orchestrator** | Sonnet | Sonnet | Haiku |
+| **Monitor** | Sonnet | Haiku | Haiku |
+| **Researcher** | Opus | Sonnet | Haiku |
+
+---
+
+## Rationale
+
+### Architect (Planning) - Opus/Opus/Sonnet
+Planning has the highest impact on outcomes. A bad plan wastes all downstream execution. Invest in quality here.
+
+**Quality profile:** Complex systems, novel domains, critical decisions
+**Balanced profile:** Standard feature work, established patterns
+**Budget profile:** Simple initiatives, well-documented domains
+
+### Worker (Execution) - Opus/Sonnet/Sonnet
+The plan already contains reasoning. Execution is implementation, not decision-making.
+
+**Quality profile:** Complex algorithms, security-critical code
+**Balanced profile:** Standard implementation work
+**Budget profile:** Simple tasks, boilerplate code
+
+### Verifier (Validation) - Sonnet/Sonnet/Haiku
+Verification is structured checking against defined criteria. Less reasoning needed than planning.
+
+**Quality profile:** Complex verification, subtle integration issues
+**Balanced profile:** Standard goal-backward verification
+**Budget profile:** Simple pass/fail checks
+
+### Orchestrator (Coordination) - Sonnet/Sonnet/Haiku
+The Orchestrator routes work; it doesn't do the heavy lifting. It needs reliability, not creativity.
+
+**Quality profile:** Complex multi-agent coordination
+**Balanced profile:** Standard workflow management
+**Budget profile:** Simple task routing
+
+### Monitor (Observation) - Sonnet/Haiku/Haiku
+Monitoring is pattern matching and threshold checking. Minimal reasoning required.
+
+**Quality profile:** Complex health analysis
+**Balanced profile:** Standard monitoring
+**Budget profile:** Simple heartbeat checks
+
+### Researcher (Discovery) - Opus/Sonnet/Haiku
+Research is read-only exploration. High volume, low modification risk.
+
+**Quality profile:** Deep domain analysis
+**Balanced profile:** Standard codebase exploration
+**Budget profile:** Simple file lookups
+
+---
+
+## Profile Selection
+
+### Per-Initiative Override
+
+```yaml
+# In initiative config
+model_profile: quality  # Override default balanced
+```
+
+### Per-Agent Override
+
+```yaml
+# In task assignment
+assigned_to: worker-123
+model_override: opus  # This task needs Opus
+```
+
+### Automatic Escalation
+
+```yaml
+# When to auto-escalate
+escalation_triggers:
+  - condition: "task.retry_count > 2"
+    action: "escalate_model"
+  - condition: "task.complexity == 'high'"
+    action: "use_quality_profile"
+  - condition: "deviation.rule == 4"
+    action: "escalate_model"
+```
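+
+One way to evaluate these triggers without parsing the condition strings is to express each as a typed predicate. The sketch below is hypothetical: the `TaskSignals` shape and `escalationActions` name are invented for illustration, and the three rules simply restate the YAML above.
+
+```typescript
+// Hypothetical view of the task fields the triggers inspect.
+interface TaskSignals {
+  retryCount: number;
+  complexity: 'low' | 'medium' | 'high';
+  deviationRule?: number;
+}
+
+type EscalationAction = 'escalate_model' | 'use_quality_profile';
+
+interface Trigger {
+  when: (task: TaskSignals) => boolean;
+  action: EscalationAction;
+}
+
+// Same three conditions as the YAML config, written as predicates.
+const triggers: Trigger[] = [
+  { when: (t) => t.retryCount > 2, action: 'escalate_model' },
+  { when: (t) => t.complexity === 'high', action: 'use_quality_profile' },
+  { when: (t) => t.deviationRule === 4, action: 'escalate_model' },
+];
+
+// Returns the distinct actions whose conditions hold, in config order.
+function escalationActions(task: TaskSignals): EscalationAction[] {
+  const fired = triggers.filter((tr) => tr.when(task)).map((tr) => tr.action);
+  return Array.from(new Set(fired));
+}
+```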
+
+---
+
+## Cost Management
+
+### Estimated Token Usage
+
+| Agent | Avg Tokens/Task | Profile Impact |
+|-------|-----------------|----------------|
+| Architect | 50k-100k | 3x between budget/quality |
+| Worker | 20k-50k | 2x between budget/quality |
+| Verifier | 10k-30k | 1.5x between budget/quality |
+| Orchestrator | 5k-15k | 1.5x between budget/quality |
+
+### Cost Optimization Strategies
+
+1. **Right-size tasks:** Smaller tasks = less token usage
+2. **Use budget for volume:** Monitoring, simple checks
+3. **Reserve quality for impact:** Architecture, security
+4. **Profile per initiative:** Simple features use budget, complex use quality
+
+---
+
+## Configuration
+
+### Default Profile
+
+```jsonc
+// .planning/config.json
+{
+  "model_profile": "balanced",
+  "model_overrides": {
+    "architect": null,
+    "worker": null,
+    "verifier": null
+  }
+}
+```
+
+### Quality Profile
+
+```json
+{
+  "model_profile": "quality",
+  "model_overrides": {}
+}
+```
+
+### Budget Profile
+
+```jsonc
+{
+  "model_profile": "budget",
+  "model_overrides": {
+    "architect": "sonnet"  // Keep architect at sonnet minimum
+  }
+}
+```
+
+### Mixed Profile
+
+```jsonc
+{
+  "model_profile": "balanced",
+  "model_overrides": {
+    "architect": "opus",     // Invest in planning
+    "worker": "sonnet",      // Standard execution
+    "verifier": "haiku"      // Budget verification
+  }
+}
+```
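+
+How a profile and its overrides could combine into a concrete model choice is sketched below. The lookup table copies the Agent Model Assignments table; the `resolveModel` function and the `ModelConfig` type are hypothetical, and treating a `null` override as "use the profile default" is an assumption.
+
+```typescript
+type Model = 'opus' | 'sonnet' | 'haiku';
+type Profile = 'quality' | 'balanced' | 'budget';
+type Agent = 'architect' | 'worker' | 'verifier' | 'orchestrator' | 'monitor' | 'researcher';
+
+// Values taken from the Agent Model Assignments table.
+const ASSIGNMENTS: Record<Agent, Record<Profile, Model>> = {
+  architect: { quality: 'opus', balanced: 'opus', budget: 'sonnet' },
+  worker: { quality: 'opus', balanced: 'sonnet', budget: 'sonnet' },
+  verifier: { quality: 'sonnet', balanced: 'sonnet', budget: 'haiku' },
+  orchestrator: { quality: 'sonnet', balanced: 'sonnet', budget: 'haiku' },
+  monitor: { quality: 'sonnet', balanced: 'haiku', budget: 'haiku' },
+  researcher: { quality: 'opus', balanced: 'sonnet', budget: 'haiku' },
+};
+
+// Mirrors the shape of .planning/config.json shown above.
+interface ModelConfig {
+  model_profile: Profile;
+  model_overrides: Partial<Record<Agent, Model | null>>;
+}
+
+// An explicit override wins; null or a missing key falls back to the profile.
+function resolveModel(agent: Agent, config: ModelConfig): Model {
+  return config.model_overrides[agent] ?? ASSIGNMENTS[agent][config.model_profile];
+}
+```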
+
+---
+
+## Model Capabilities Reference
+
+### Opus
+- **Strengths:** Complex reasoning, nuanced decisions, novel problems
+- **Best for:** Architecture, complex algorithms, security analysis
+- **Cost:** Highest
+
+### Sonnet
+- **Strengths:** Good balance of reasoning and speed, reliable
+- **Best for:** Standard development, code generation, debugging
+- **Cost:** Medium
+
+### Haiku
+- **Strengths:** Fast, cheap, good for structured tasks
+- **Best for:** Monitoring, simple checks, high-volume operations
+- **Cost:** Lowest
+
+---
+
+## Profile Switching
+
+### CLI Command
+
+```bash
+# Set profile for all future work
+cw config set model_profile quality
+
+# Set profile for specific initiative
+cw initiative config <id> --model-profile budget
+
+# Override for single task
+cw task update <id> --model-override opus
+```
+
+### API
+
+```typescript
+// Set initiative profile
+await initiative.setConfig(id, { modelProfile: 'quality' });
+
+// Override task model
+await task.update(id, { modelOverride: 'opus' });
+```
+
+---
+
+## Monitoring Model Usage
+
+Track model usage for cost analysis:
+
+```sql
+CREATE TABLE model_usage (
+  id INTEGER PRIMARY KEY AUTOINCREMENT,
+  agent_type TEXT NOT NULL,
+  model TEXT NOT NULL,
+  tokens_input INTEGER,
+  tokens_output INTEGER,
+  task_id TEXT,
+  initiative_id TEXT,
+  created_at INTEGER DEFAULT (unixepoch())
+);
+
+-- Usage by agent type
+SELECT agent_type, model, SUM(tokens_input + tokens_output) as total_tokens
+FROM model_usage
+GROUP BY agent_type, model;
+
+-- Cost by initiative
+SELECT initiative_id,
+       SUM(CASE WHEN model = 'opus' THEN (tokens_input + tokens_output) * 0.015
+                WHEN model = 'sonnet' THEN (tokens_input + tokens_output) * 0.003
+                WHEN model = 'haiku' THEN (tokens_input + tokens_output) * 0.0003 END) as estimated_cost
+FROM model_usage
+GROUP BY initiative_id;
+```
+
+---
+
+## Recommendations
+
+### Starting Out
+Use **balanced** profile. It provides good quality at reasonable cost.
+
+### High-Stakes Projects
+Use **quality** profile. The added model cost is small compared to the cost of getting it wrong.
+
+### High-Volume Work
+Use **budget** profile with architect override to sonnet. Don't skimp on planning.
+
+### Learning the System
+Use **quality** profile initially. See what good output looks like before optimizing for cost.
--- a/docs/session-state.md
+++ b/docs/session-state.md
@@ -0,0 +1,402 @@
+# Session State
+
+Session state tracks position, decisions, and blockers across agent restarts. Unlike the Domain Layer (which tracks codebase state), session state tracks **execution state**.
+
+## STATE.md
+
+Every active initiative maintains a STATE.md file tracking execution progress:
+
+```yaml
+# STATE.md
+initiative: init-abc123
+title: User Authentication
+
+# Current Position
+position:
+  phase: 2
+  phase_name: "JWT Implementation"
+  plan: 3
+  plan_name: "Refresh Token Rotation"
+  task: "Implement token rotation endpoint"
+  wave: 1
+  status: in_progress
+
+# Progress Tracking
+progress:
+  phases_total: 4
+  phases_completed: 1
+  current_phase_tasks: 8
+  current_phase_completed: 5
+  bar: "████████░░░░░░░░ 50%"
+
+# Decisions Made
+decisions:
+  - date: 2024-01-14
+    context: "Token storage strategy"
+    decision: "httpOnly cookie, not localStorage"
+    reason: "XSS protection, automatic inclusion in requests"
+
+  - date: 2024-01-14
+    context: "JWT library"
+    decision: "jose over jsonwebtoken"
+    reason: "Better TypeScript support, Web Crypto API"
+
+  - date: 2024-01-15
+    context: "Refresh token lifetime"
+    decision: "7 days"
+    reason: "Balance between security and UX"
+
+# Active Blockers
+blockers:
+  - id: block-001
+    description: "Waiting for OAuth credentials from client"
+    blocked_since: 2024-01-15
+    affects: ["Phase 3: OAuth Integration"]
+    workaround: "Proceeding with email/password auth first"
+
+# Session History
+sessions:
+  - id: session-001
+    started: 2024-01-14T09:00:00Z
+    ended: 2024-01-14T17:00:00Z
+    completed: ["Phase 1: Database Schema", "Phase 2 Tasks 1-3"]
+
+  - id: session-002
+    started: 2024-01-15T09:00:00Z
+    status: active
+    working_on: "Phase 2, Task 4: Refresh token rotation"
+
+# Next Action
+next_action: |
+  Continue implementing refresh token rotation endpoint.
+  After completion, run verification for Phase 2.
+  If Phase 2 passes, move to Phase 3 (blocked pending OAuth creds).
+
+# Context for Resume
+resume_context:
+  files_modified_this_session:
+    - src/api/auth/refresh.ts
+    - src/middleware/auth.ts
+    - db/migrations/002_refresh_tokens.sql
+
+  key_implementations:
+    - "Refresh tokens stored in SQLite with expiry"
+    - "Rotation creates new token, invalidates old"
+    - "Token family tracking for reuse detection"
+
+  open_questions: []
+```
+
+---
+
+## State Updates
+
+### When to Update STATE.md
+
+| Event | Update |
+|-------|--------|
+| Task started | `position.task`, `position.status` |
+| Task completed | `progress.*`, `position` to next task |
+| Decision made | Add to `decisions` |
+| Blocker encountered | Add to `blockers` |
+| Blocker resolved | Remove from `blockers` |
+| Session start | Add to `sessions` |
+| Session end | Update session `ended`, `completed` |
+| Phase completed | `progress.phases_completed`, reset task counters |
+
+### Atomic Updates
+
+```typescript
+// Update position atomically
+await updateState({
+  position: {
+    phase: 2,
+    plan: 3,
+    task: "Implement token rotation",
+    wave: 1,
+    status: "in_progress"
+  }
+});
+
+// Add decision
+await addDecision({
+  context: "Token storage",
+  decision: "httpOnly cookie",
+  reason: "XSS protection"
+});
+
+// Record blocker
+await addBlocker({
+  description: "Waiting for OAuth creds",
+  affects: ["Phase 3"]
+});
+```
+
+---
+
+## Resume Protocol
+
+When resuming work:
+
+### 1. Load STATE.md
+```
+Read STATE.md for initiative
+Extract: position, decisions, blockers, resume_context
+```
+
+### 2. Load Relevant Context
+```
+If position.plan exists:
+  Load {phase}-{plan}-PLAN.md
+  Load prior SUMMARY.md files for this phase
+
+If position.task exists:
+  Find task in current plan
+  Resume from that task
+```
+
+### 3. Verify State
+```
+Check files_modified_this_session still exist
+Check implementations match key_implementations
+If mismatch: flag for review before proceeding
+```
+
+### 4. Continue Execution
+```
+Display: "Resuming from Phase {N}, Plan {M}, Task: {name}"
+Display: decisions made (for context)
+Display: active blockers (for awareness)
+Continue with task execution
+```
+
+---
+
+## Decision Tracking
+
+Decisions are first-class citizens, not comments.
+
+### What to Track
+
+| Type | Example | Why Track |
+|------|---------|-----------|
+| Technology choice | "Using jose for JWT" | Prevents second-guessing |
+| Architecture decision | "Separate auth service" | Documents reasoning |
+| Trade-off resolution | "Speed over features" | Explains constraints |
+| User preference | "Dark mode default" | Preserves intent |
+| Constraint discovered | "API rate limited to 100/min" | Prevents repeated discovery |
+
+### Decision Format
+
+```yaml
+decisions:
+  - date: 2024-01-15
+    context: "Where the decision was needed"
+    decision: "What was decided"
+    reason: "Why this choice"
+    alternatives_considered:
+      - "Alternative A: rejected because..."
+      - "Alternative B: rejected because..."
+    reversible: true|false
+```
+
+---
+
+## Blocker Management
+
+### Blocker States
+
+```
+[new] ──identify──▶ [active] ──resolve──▶ [resolved]
+                       │
+                       │ workaround
+                       ▼
+                   [bypassed]
+```
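+
+The diagram can be encoded as a small transition table so that invalid moves fail loudly. This is a sketch under the assumption that only the three arrows drawn above are legal; the type and function names are hypothetical.
+
+```typescript
+type BlockerState = 'new' | 'active' | 'bypassed' | 'resolved';
+type BlockerEvent = 'identify' | 'resolve' | 'workaround';
+
+// Only the transitions drawn in the diagram are allowed.
+const TRANSITIONS: Record<BlockerState, Partial<Record<BlockerEvent, BlockerState>>> = {
+  new: { identify: 'active' },
+  active: { resolve: 'resolved', workaround: 'bypassed' },
+  bypassed: {},
+  resolved: {},
+};
+
+// Returns the next state or throws when the diagram has no such arrow.
+function nextBlockerState(state: BlockerState, event: BlockerEvent): BlockerState {
+  const next = TRANSITIONS[state][event];
+  if (next === undefined) {
+    throw new Error(`Invalid blocker transition: ${state} on "${event}"`);
+  }
+  return next;
+}
+```
+
+Whether a `bypassed` blocker can later move to `resolved` is not shown in the diagram, so the sketch rejects it.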
+
+### Blocker Format
+
+```yaml
+blockers:
+  - id: block-001
+    status: active
+    description: "Need production API keys"
+    identified_at: 2024-01-15T10:00:00Z
+    affects:
+      - "Phase 4: Production deployment"
+      - "Phase 5: Monitoring setup"
+    blocked_tasks:
+      - task-xyz: "Configure production environment"
+    workaround: null
+    resolution: null
+
+  - id: block-002
+    status: bypassed
+    description: "Design mockups not ready"
+    identified_at: 2024-01-14T09:00:00Z
+    affects: ["UI implementation"]
+    workaround: "Using placeholder styles, will refine later"
+    workaround_tasks:
+      - task-abc: "Apply final styles when mockups ready"
+```
+
+### Blocker Impact on Execution
+
+1. **Task Blocking:** Task marked `blocked` in tasks table
+2. **Phase Blocking:** If all remaining tasks blocked, phase paused
+3. **Initiative Blocking:** If all phases blocked, escalate to user
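+
+The roll-up from task to phase to initiative might look like the following. It is a minimal sketch with hypothetical names, and it assumes "remaining" means every task that is not yet `closed` and that fully closed phases are ignored when deciding whether to escalate.
+
+```typescript
+type TaskStatus = 'open' | 'in_progress' | 'blocked' | 'closed';
+
+// A phase is paused when it still has work left and all of it is blocked.
+function isPhasePaused(taskStatuses: TaskStatus[]): boolean {
+  const remaining = taskStatuses.filter((status) => status !== 'closed');
+  return remaining.length > 0 && remaining.every((status) => status === 'blocked');
+}
+
+// Escalate to the user when every unfinished phase is paused.
+function shouldEscalate(phases: TaskStatus[][]): boolean {
+  const unfinished = phases.filter((tasks) => tasks.some((status) => status !== 'closed'));
+  return unfinished.length > 0 && unfinished.every(isPhasePaused);
+}
+```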
+
+---
+
+## Session History
+
+Track work sessions for debugging and handoffs:
+
+```yaml
+sessions:
+  - id: session-001
+    agent: worker-abc
+    started: 2024-01-14T09:00:00Z
+    ended: 2024-01-14T12:30:00Z
+    context_usage: "45%"
+    completed:
+      - "Phase 1, Plan 1: Database setup"
+      - "Phase 1, Plan 2: User model"
+    notes: "Clean execution, no issues"
+
+  - id: session-002
+    agent: worker-def
+    started: 2024-01-14T13:00:00Z
+    ended: 2024-01-14T17:00:00Z
+    context_usage: "62%"
+    completed:
+      - "Phase 1, Plan 3: Auth endpoints"
+    issues:
+      - "Context exceeded 50%, quality may have degraded"
+      - "Encountered blocker: missing env vars"
+    handoff_reason: "Context limit reached"
+```
+
+---
+
+## Storage Options
+
+### SQLite (Recommended for Codewalk)
+
+```sql
+CREATE TABLE initiative_state (
+  initiative_id TEXT PRIMARY KEY REFERENCES initiatives(id),
+  current_phase INTEGER,
+  current_plan INTEGER,
+  current_task TEXT,
+  current_wave INTEGER,
+  status TEXT,
+  progress_json TEXT,
+  updated_at INTEGER
+);
+
+CREATE TABLE initiative_decisions (
+  id TEXT PRIMARY KEY,
+  initiative_id TEXT REFERENCES initiatives(id),
+  date INTEGER,
+  context TEXT,
+  decision TEXT,
+  reason TEXT,
+  alternatives_json TEXT,
+  reversible BOOLEAN
+);
+
+CREATE TABLE initiative_blockers (
+  id TEXT PRIMARY KEY,
+  initiative_id TEXT REFERENCES initiatives(id),
+  status TEXT CHECK (status IN ('active', 'bypassed', 'resolved')),
+  description TEXT,
+  identified_at INTEGER,
+  affects_json TEXT,
+  workaround TEXT,
+  resolution TEXT,
+  resolved_at INTEGER
+);
+
+CREATE TABLE session_history (
+  id TEXT PRIMARY KEY,
+  initiative_id TEXT REFERENCES initiatives(id),
+  agent_id TEXT,
+  started_at INTEGER,
+  ended_at INTEGER,
+  context_usage REAL,
+  completed_json TEXT,
+  issues_json TEXT,
+  handoff_reason TEXT
+);
+```
+
+### File-Based (Alternative)
+
+```
+.planning/
+├── STATE.md                    # Current state
+├── decisions/
+│   └── 2024-01-15-jwt-library.md
+├── blockers/
+│   └── block-001-oauth-creds.md
+└── sessions/
+    ├── session-001.md
+    └── session-002.md
+```
+
+---
+
+## Integration with Agents
+
+### Worker
+- Reads STATE.md at start
+- Updates position on task transitions
+- Adds deviations to session notes
+- Updates progress counters
+
+### Architect
+- Creates initial STATE.md when planning
+- Sets up phase/plan structure
+- Documents initial decisions
+
+### Orchestrator
+- Monitors blocker status
+- Triggers resume when blockers resolve
+- Coordinates session handoffs
+
+### Verifier
+- Reads decisions for verification context
+- Updates state with verification results
+- Flags issues for resolution
+
+---
+
+## Example: Resume After Crash
+
+```
+1. Agent crashes mid-task
+
+2. Supervisor detects stale assignment
+   - Task assigned_at > 30min ago
+   - No progress updates
+
+3. Supervisor resets task
+   - Status back to 'open'
+   - Clear assigned_to
+
+4. New agent picks up task
+   - Reads STATE.md
+   - Sees: "Last working on: Refresh token rotation"
+   - Loads relevant PLAN.md
+   - Resumes execution
+
+5. STATE.md shows continuity
+   sessions:
+     - id: session-003
+       status: crashed
+       notes: "Agent unresponsive, task reset"
+     - id: session-004
+       status: active
+       notes: "Resuming from session-003 crash"
+```
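+
+Steps 2 and 3 of the walkthrough could be sketched as below. The `Assignment` shape and function names are hypothetical, and reading "no progress updates" as "no progress signal within the same 30-minute window" is an interpretation, not documented behavior.
+
+```typescript
+// Hypothetical record of who holds a task and when it last showed progress.
+interface Assignment {
+  taskId: string;
+  status: string;
+  assignedTo: string | null;
+  assignedAt: number; // epoch milliseconds
+  lastProgressAt: number | null; // epoch milliseconds, null if none yet
+}
+
+const STALE_AFTER_MS = 30 * 60 * 1000;
+
+// Stale: assigned, and no sign of life for more than 30 minutes.
+function isStale(assignment: Assignment, now: number): boolean {
+  if (assignment.assignedTo === null) return false;
+  const lastSignal = assignment.lastProgressAt ?? assignment.assignedAt;
+  return now - lastSignal > STALE_AFTER_MS;
+}
+
+// Reset: status back to 'open' and assignment cleared, as in step 3.
+function resetTask(assignment: Assignment): Assignment {
+  return { ...assignment, status: 'open', assignedTo: null, lastProgressAt: null };
+}
+```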
--- a/docs/task-granularity.md
+++ b/docs/task-granularity.md
@@ -0,0 +1,309 @@
+# Task Granularity Standards
+
+A task must be specific enough for execution without interpretation. Vague tasks cause agents to guess, leading to inconsistent results and rework.
+
+## The Granularity Test
+
+Ask: **Can an agent execute this task without making assumptions?**
+
+If the answer requires "it depends" or "probably means", the task is too vague.
+
+---
+
+## Comparison Table
+
+| Too Vague | Just Right |
+|-----------|------------|
+| "Add authentication" | "Add JWT auth with refresh rotation using jose library, store in httpOnly cookie, 15min access / 7day refresh" |
+| "Create the API" | "Create POST /api/projects accepting {name, description}, validates name length 3-50 chars, returns 201 with project object" |
+| "Style the dashboard" | "Add Tailwind classes to Dashboard.tsx: grid layout (3 cols on lg, 1 on mobile), card shadows, hover states on action buttons" |
+| "Handle errors" | "Wrap API calls in try/catch, return {error: string} on 4xx/5xx, show toast via sonner on client" |
+| "Add form validation" | "Add Zod schema to CreateProjectForm: name (3-50 chars, alphanumeric), description (optional, max 500 chars), show inline errors" |
+| "Improve performance" | "Add React.memo to ProjectCard, useMemo for filtered list in Dashboard, lazy load ProjectDetails route" |
+| "Fix the login bug" | "Fix login redirect loop: after successful login in auth.ts:45, redirect to stored returnUrl instead of always '/'" |
+| "Set up the database" | "Create SQLite database at data/cw.db with migrations in db/migrations/, run via 'cw db migrate'" |
+
+---
+
+## Required Task Components
+
+Every task MUST include:
+
+### 1. Files
+Exact paths that will be created or modified.
+
+```yaml
+files:
+  - src/components/Chat.tsx      # create
+  - src/hooks/useChat.ts         # create
+  - src/api/messages.ts          # modify
+```
+
+### 2. Action
+What to do, what to avoid, and WHY.
+
+```yaml
+action: |
+  Create Chat component with:
+  - Message list (virtualized for performance)
+  - Input field with send button
+  - Auto-scroll to bottom on new message
+
+  DO NOT:
+  - Implement WebSocket (separate task)
+  - Add typing indicators (Phase 2)
+
+  WHY: Core chat UI needed before real-time features
+```
+
+### 3. Verify
+Command or check to prove completion.
+
+```yaml
+verify:
+  - command: "npm run typecheck"
+    expect: "No type errors"
+  - command: "npm run test -- Chat.test.tsx"
+    expect: "Tests pass"
+  - manual: "Navigate to /chat, see empty message list and input"
+```
+
+### 4. Done
+Measurable acceptance criteria.
+
+```yaml
+done:
+  - "Chat component renders without errors"
+  - "Input accepts text and clears on submit"
+  - "Messages display in chronological order"
+  - "Tests cover send and display functionality"
+```
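+
+A planner could reject tasks that lack any of the four components before they reach a Worker. The check below is a sketch with hypothetical names; it assumes `auto` tasks, since checkpoint tasks shown later in this document legitimately omit some fields.
+
+```typescript
+// Hypothetical parsed form of a task definition.
+interface TaskSpec {
+  files?: string[];
+  action?: string;
+  verify?: unknown[];
+  done?: string[];
+}
+
+type Component = 'files' | 'action' | 'verify' | 'done';
+
+// Lists which required components are absent or empty.
+function missingComponents(task: TaskSpec): Component[] {
+  const missing: Component[] = [];
+  if ((task.files ?? []).length === 0) missing.push('files');
+  if ((task.action ?? '').trim() === '') missing.push('action');
+  if ((task.verify ?? []).length === 0) missing.push('verify');
+  if ((task.done ?? []).length === 0) missing.push('done');
+  return missing;
+}
+```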
+
+---
+
+## Task Types
+
+### Type: auto
+Agent executes autonomously.
+
+```yaml
+type: auto
+files: [src/components/Button.tsx]
+action: "Create Button component with primary/secondary variants using Tailwind"
+verify: "npm run typecheck && npm run test"
+done: "Button renders with correct styles for each variant"
+```
+
+### Type: checkpoint:human-verify
+Agent completes, human confirms.
+
+```yaml
+type: checkpoint:human-verify
+files: [src/pages/Dashboard.tsx]
+action: "Implement dashboard layout with project cards"
+verify: "Navigate to /dashboard after login"
+prompt: "Does the dashboard match the design mockup?"
+done: "User confirms layout is correct"
+```
+
+### Type: checkpoint:decision
+Human makes choice that affects implementation.
+
+```yaml
+type: checkpoint:decision
+prompt: "Which chart library should we use?"
+options:
+  - recharts: "Built for React, good for simple charts"
+  - d3: "More powerful, steeper learning curve"
+  - chart.js: "Lightweight, canvas-based"
+affects: "All subsequent charting tasks"
+```
+
+### Type: checkpoint:human-action
+Unavoidable manual step.
+
+```yaml
+type: checkpoint:human-action
+prompt: "Please click the verification link sent to your email"
+reason: "Cannot automate email client interaction"
+continue_after: "User confirms email verified"
+```
+
+---
+
+## Time Estimation
+
+Tasks should fit within context budgets:
+
+| Complexity | Context % | Wall Time | Example |
+|------------|-----------|-----------|---------|
+| Trivial | 5-10% | 2-5 min | Add a CSS class |
+| Simple | 10-20% | 5-15 min | Add form field |
+| Medium | 20-35% | 15-30 min | Create API endpoint |
+| Complex | 35-50% | 30-60 min | Implement auth flow |
+| Too Large | >50% | - | **SPLIT REQUIRED** |
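+
+The table translates directly into a classifier that flags tasks needing a split. This is a sketch with hypothetical names; because the table's ranges share endpoints, it assumes each upper bound is inclusive (for example, exactly 20% counts as Simple).
+
+```typescript
+type Complexity = 'trivial' | 'simple' | 'medium' | 'complex' | 'too_large';
+
+// Maps an estimated context percentage to the table's complexity bands.
+function classifyTask(contextPercent: number): Complexity {
+  if (contextPercent > 50) return 'too_large';
+  if (contextPercent > 35) return 'complex';
+  if (contextPercent > 20) return 'medium';
+  if (contextPercent > 10) return 'simple';
+  return 'trivial';
+}
+
+// Tasks above the 50% budget must be decomposed before execution.
+function requiresSplit(contextPercent: number): boolean {
+  return classifyTask(contextPercent) === 'too_large';
+}
+```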
+
+---
+
+## Splitting Large Tasks
+
+When a task exceeds 50% context estimate, decompose:
+
+### Before (Too Large)
+```yaml
+title: "Implement user authentication"
+# This is 3+ hours of work, dozens of decisions
+```
+
+### After (Properly Decomposed)
+```yaml
+tasks:
+  - title: "Create users table with password hash"
+    files: [db/migrations/001_users.sql]
+
+  - title: "Add signup endpoint with Zod validation"
+    files: [src/api/auth/signup.ts]
+    depends_on: [users-table]
+
+  - title: "Add login endpoint with JWT generation"
+    files: [src/api/auth/login.ts]
+    depends_on: [users-table]
+
+  - title: "Create auth middleware for protected routes"
+    files: [src/middleware/auth.ts]
+    depends_on: [login-endpoint]
+
+  - title: "Add refresh token rotation"
+    files: [src/api/auth/refresh.ts, db/migrations/002_refresh_tokens.sql]
+    depends_on: [auth-middleware]
+```
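+
+Once tasks carry `depends_on` links, a valid execution order follows from a topological sort. The sketch below uses Kahn's algorithm; the `PlannedTask` shape and the task ids are assumptions, since the YAML above references ids such as `users-table` without declaring them.
+
+```typescript
+// Hypothetical shape: each task has an id and the ids it depends on.
+interface PlannedTask {
+  id: string;
+  dependsOn: string[];
+}
+
+// Kahn's algorithm: repeatedly emit tasks whose dependencies are all done.
+function executionOrder(tasks: PlannedTask[]): string[] {
+  const pending = new Map<string, Set<string>>();
+  tasks.forEach((task) => pending.set(task.id, new Set(task.dependsOn)));
+
+  const order: string[] = [];
+  while (pending.size > 0) {
+    const ready = Array.from(pending.entries())
+      .filter(([, deps]) => deps.size === 0)
+      .map(([id]) => id);
+    if (ready.length === 0) {
+      throw new Error('Dependency cycle or unknown dependency');
+    }
+    ready.forEach((id) => {
+      pending.delete(id);
+      order.push(id);
+    });
+    // Mark the emitted tasks as satisfied for everything still pending.
+    pending.forEach((deps) => ready.forEach((id) => deps.delete(id)));
+  }
+  return order;
+}
+```
+
+Tasks emitted in the same pass have no dependencies on each other, so they are candidates for parallel execution.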
+
+---
+
+## Anti-Patterns
+
+### Vague Verbs
+**Bad:** "Improve", "Enhance", "Update", "Fix" (without specifics)
+**Good:** "Add X", "Change Y to Z", "Remove W"
+
+### Missing Constraints
+**Bad:** "Add validation"
+**Good:** "Add Zod validation: email format, password 8+ chars with number"
+
+### Implied Knowledge
+**Bad:** "Handle the edge cases"
+**Good:** "Handle: empty input (show error), network failure (retry 3x), duplicate email (show message)"
+
+### Compound Tasks
+**Bad:** "Set up auth and create the user management pages"
+**Good:** Two separate tasks with dependency
+
+### No Success Criteria
+**Bad:** "Make it work"
+**Good:** "Tests pass, no TypeScript errors, manual verification of happy path"
+
+---
+
+## Examples by Domain
+
+### API Endpoint
+
+```yaml
+title: "Create POST /api/projects endpoint"
+files:
+  - src/api/projects/create.ts
+  - src/api/projects/schema.ts
+
+action: |
+  Create endpoint accepting:
+  - name: string (3-50 chars, required)
+  - description: string (max 500 chars, optional)
+
+  Returns:
+  - 201: { id, name, description, createdAt }
+  - 400: { error: "validation message" }
+  - 401: { error: "Unauthorized" }
+
+  Use Zod for validation, drizzle for DB insert.
+
+verify:
+  - "npm run test -- projects.test.ts"
+  - "curl -X POST /api/projects -d '{\"name\": \"Test\"}' returns 201"
+
+done:
+  - "Endpoint creates project in database"
+  - "Validation rejects invalid input with clear messages"
+  - "Auth middleware blocks unauthenticated requests"
+```
+
+### React Component
+
+```yaml
+title: "Create ProjectCard component"
+files:
+  - src/components/ProjectCard.tsx
+  - src/components/ProjectCard.test.tsx
+
+action: |
+  Create card displaying:
+  - Project name (truncate at 30 chars)
+  - Description preview (2 lines max)
+  - Created date (relative: "2 days ago")
+  - Status badge (active/archived)
+
+  Props: { project: Project, onClick: () => void }
+  Use Tailwind: rounded-lg, shadow-sm, hover:shadow-md
+
+verify:
+  - "npm run typecheck"
+  - "npm run test -- ProjectCard"
+  - "Storybook renders all variants"
+
+done:
+  - "Card renders with all project fields"
+  - "Truncation works for long names"
+  - "Hover state visible"
+  - "Click handler fires"
+```
+
+### Database Migration
+
+```yaml
+title: "Create projects table"
+files:
+  - db/migrations/003_projects.sql
+  - src/db/schema/projects.ts
+
+action: |
+  Create table:
+  - id: TEXT PRIMARY KEY (uuid)
+  - user_id: TEXT NOT NULL REFERENCES users(id)
+  - name: TEXT NOT NULL
+  - description: TEXT
+  - status: TEXT DEFAULT 'active' CHECK (status IN ('active', 'archived'))
+  - created_at: INTEGER DEFAULT (unixepoch())
+  - updated_at: INTEGER DEFAULT (unixepoch())
+
+  Indexes: user_id, status, created_at DESC
+
+verify:
+  - "cw db migrate runs without error"
+  - "sqlite3 data/cw.db '.schema projects' shows correct schema"
+
+done:
+  - "Migration applies cleanly"
+  - "Drizzle schema matches SQL"
+  - "Indexes created"
+```
+
+---
+
+## Checklist Before Creating Task
+
+- [ ] Can an agent execute this without asking questions?
+- [ ] Are all files listed explicitly?
+- [ ] Is the action specific (not "improve" or "handle")?
+- [ ] Is there a concrete verify step?
+- [ ] Are done criteria measurable?
+- [ ] Does estimated context fit under 50%?
+- [ ] Are there no compound actions (split if needed)?
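+
+Several of these checks are mechanical and could be automated before a task is created. The sketch below assumes a hypothetical `TaskDraft` shape that mirrors the YAML examples (`title`, `files`, `action`, `verify`, `done`) plus an estimated context percentage; `lintTask` and the verb list are illustrative. The heuristics only flag candidates for review, and they cannot decide whether an agent could execute the task without asking questions.
+
+```typescript
+// Hypothetical task shape mirroring the YAML examples in this document.
+interface TaskDraft {
+  title: string;
+  files: string[];
+  action: string;
+  verify: string[];
+  done: string[];
+  contextPercent: number; // estimated share of the agent's context budget
+}
+
+// Verbs flagged as vague when they open a title (heuristic, not exhaustive).
+const VAGUE_VERBS = ["improve", "enhance", "update", "fix", "handle"];
+
+// Returns a list of checklist violations; an empty list means the draft
+// passes the mechanical checks only.
+function lintTask(task: TaskDraft): string[] {
+  const problems: string[] = [];
+  const firstWord = (task.title.trim().split(/\s+/)[0] || "").toLowerCase();
+  if (VAGUE_VERBS.indexOf(firstWord) !== -1) {
+    problems.push(`vague verb in title: "${firstWord}"`);
+  }
+  if (task.files.length === 0) problems.push("no files listed");
+  if (task.action.trim() === "") problems.push("empty action");
+  if (task.verify.length === 0) problems.push("no verify step");
+  if (task.done.length === 0) problems.push("no done criteria");
+  if (task.contextPercent > 50) {
+    problems.push("context estimate above 50%: split required");
+  }
+  if (/\band\b/i.test(task.title)) {
+    problems.push("possible compound task: title contains \"and\"");
+  }
+  return problems;
+}
+```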
--- a/docs/tasks.md
+++ b/docs/tasks.md
@@ -0,0 +1,331 @@
+# Tasks Module
+
+Beads-inspired task management optimized for multi-agent coordination. Unlike beads (Git-distributed JSONL), this uses centralized SQLite for simplicity since all agents share the same workspace.
+
+## Design Rationale
+
+### Why Not Just Use Beads?
+
+Beads solves a different problem: distributed task tracking across forked repos with zero coordination. We don't need that:
+
+- All Workers operate in the same workspace under one `cw` server
+- SQLite is the single source of truth
+- tRPC exposes task queries directly to agents and dashboard
+- No merge conflicts, no Git overhead
+
+### Core Agent Problem Solved
+
+Agents need to answer: **"What should I work on next?"**
+
+The `ready` query solves this: tasks that are `open` with all dependencies `closed`. Combined with priority ordering, agents can self-coordinate without human intervention.
+
+---
+
+## Data Model
+
+### Task Entity
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `id` | TEXT | Primary key. Hash-based (e.g., `tsk-a1b2c3`) or UUID |
+| `parent_id` | TEXT | Optional. References parent task for hierarchies |
+| `initiative_id` | TEXT | Optional. Links to Initiatives module |
+| `phase_id` | TEXT | Optional. Links to initiative phase (for grouped approval) |
+| `project_id` | TEXT | Optional. Scopes task to a project |
+| `title` | TEXT | Required. Short task name |
+| `description` | TEXT | Optional. Markdown-formatted details |
+| `type` | TEXT | `task` (default), `epic`, `subtask` |
+| `status` | TEXT | `open`, `in_progress`, `blocked`, `closed` |
+| `priority` | INTEGER | 0=critical, 1=high, 2=normal (default), 3=low |
+| `assigned_to` | TEXT | Agent/worker ID currently working on this |
+| `assigned_at` | INTEGER | Unix timestamp when assigned |
+| `metadata` | TEXT | JSON blob for extensibility |
+| `created_at` | INTEGER | Unix timestamp |
+| `updated_at` | INTEGER | Unix timestamp |
+| `closed_at` | INTEGER | Unix timestamp when closed |
+| `closed_reason` | TEXT | Why/how the task was completed |
+
+### Task Dependencies
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `task_id` | TEXT | The task that is blocked |
+| `depends_on` | TEXT | The task that must complete first |
+| `type` | TEXT | `blocks` (default), `related` |
+
+### Task History
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `id` | INTEGER | Auto-increment primary key |
+| `task_id` | TEXT | The task that changed |
+| `field` | TEXT | Which field changed |
+| `old_value` | TEXT | Previous value |
+| `new_value` | TEXT | New value |
+| `changed_by` | TEXT | Agent/user ID |
+| `changed_at` | INTEGER | Unix timestamp |
+
+---
+
+## SQLite Schema
+
+```sql
+CREATE TABLE tasks (
+  id TEXT PRIMARY KEY,
+  parent_id TEXT REFERENCES tasks(id),
+  initiative_id TEXT,
+  phase_id TEXT,
+  project_id TEXT,
+
+  title TEXT NOT NULL,
+  description TEXT,
+  type TEXT NOT NULL DEFAULT 'task' CHECK (type IN ('task', 'epic', 'subtask')),
+
+  status TEXT NOT NULL DEFAULT 'open' CHECK (status IN ('open', 'in_progress', 'blocked', 'closed')),
+  priority INTEGER NOT NULL DEFAULT 2 CHECK (priority BETWEEN 0 AND 3),
+
+  assigned_to TEXT,
+  assigned_at INTEGER,
+
+  metadata TEXT,
+
+  created_at INTEGER NOT NULL DEFAULT (unixepoch()),
+  updated_at INTEGER NOT NULL DEFAULT (unixepoch()),
+  closed_at INTEGER,
+  closed_reason TEXT
+);
+
+CREATE TABLE task_dependencies (
+  task_id TEXT NOT NULL REFERENCES tasks(id) ON DELETE CASCADE,
+  depends_on TEXT NOT NULL REFERENCES tasks(id) ON DELETE CASCADE,
+  type TEXT NOT NULL DEFAULT 'blocks' CHECK (type IN ('blocks', 'related')),
+  PRIMARY KEY (task_id, depends_on),
+  CHECK (task_id != depends_on)
+);
+
+CREATE TABLE task_history (
+  id INTEGER PRIMARY KEY AUTOINCREMENT,
+  task_id TEXT NOT NULL REFERENCES tasks(id) ON DELETE CASCADE,
+  field TEXT NOT NULL,
+  old_value TEXT,
+  new_value TEXT,
+  changed_by TEXT,
+  changed_at INTEGER NOT NULL DEFAULT (unixepoch())
+);
+
+CREATE INDEX idx_tasks_status ON tasks(status);
+CREATE INDEX idx_tasks_priority ON tasks(priority);
+CREATE INDEX idx_tasks_assigned ON tasks(assigned_to);
+CREATE INDEX idx_tasks_project ON tasks(project_id);
+CREATE INDEX idx_tasks_initiative ON tasks(initiative_id);
+CREATE INDEX idx_tasks_phase ON tasks(phase_id);
+CREATE INDEX idx_task_history_task ON task_history(task_id);
+
+-- The critical view for agent work discovery
+-- Tasks are ready when: open, no blocking deps, and phase approved (if linked)
+CREATE VIEW ready_tasks AS
+SELECT t.* FROM tasks t
+LEFT JOIN initiative_phases p ON t.phase_id = p.id
+WHERE t.status = 'open'
+  AND (t.phase_id IS NULL OR p.status IN ('approved', 'in_progress'))
+  AND NOT EXISTS (
+    SELECT 1 FROM task_dependencies d
+    JOIN tasks dep ON d.depends_on = dep.id
+    WHERE d.task_id = t.id
+      AND d.type = 'blocks'
+      AND dep.status != 'closed'
+  )
+ORDER BY t.priority ASC, t.created_at ASC;
+```
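+
+The logic of the `ready_tasks` view can be restated outside SQL, which is handy for unit-testing dispatch rules. This is a minimal in-memory sketch: the field names come from the schema above, while `TaskRow`, `DependencyRow`, `readyTasks`, and the `phaseStatus` lookup (standing in for `initiative_phases`) are illustrative assumptions.
+
+```typescript
+type TaskStatus = "open" | "in_progress" | "blocked" | "closed";
+
+interface TaskRow {
+  id: string;
+  status: TaskStatus;
+  priority: number;   // 0 = critical ... 3 = low
+  created_at: number; // Unix timestamp
+  phase_id: string | null;
+}
+
+interface DependencyRow {
+  task_id: string;
+  depends_on: string;
+  type: "blocks" | "related";
+}
+
+// phaseStatus maps phase id -> status, standing in for initiative_phases.
+function readyTasks(
+  tasks: TaskRow[],
+  deps: DependencyRow[],
+  phaseStatus: Record<string, string>
+): TaskRow[] {
+  const statusById: Record<string, TaskStatus> = Object.create(null);
+  for (const t of tasks) statusById[t.id] = t.status;
+
+  return tasks
+    .filter((t) => t.status === "open")
+    .filter((t) => {
+      if (t.phase_id === null) return true; // not linked to a phase
+      const s = phaseStatus[t.phase_id];
+      return s === "approved" || s === "in_progress";
+    })
+    .filter((t) =>
+      // Every 'blocks' dependency of this task must already be closed.
+      deps.every(
+        (d) =>
+          d.task_id !== t.id ||
+          d.type !== "blocks" ||
+          statusById[d.depends_on] === "closed"
+      )
+    )
+    .sort((a, b) => a.priority - b.priority || a.created_at - b.created_at);
+}
+```
+
+As in the view, `related` dependencies never block, and ties on priority fall back to the oldest task first.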
+
+---
+
+## Status Workflow
+
+```
+    ┌─────────── cancel / won't do ────────────┐
+    │                                          ▼
+  [open] ──claim──▶ [in_progress] ──done──▶ [closed]
+    ▲                     │
+    │                     │ blocked
+    │                     ▼
+    └────unblock───── [blocked]
+
+| Transition | Trigger | Notes |
+|------------|---------|-------|
+| `open` → `in_progress` | Agent claims task | Sets `assigned_to`, `assigned_at` |
+| `in_progress` → `closed` | Work completed | Sets `closed_at`, `closed_reason` |
+| `in_progress` → `blocked` | External dependency | Manual or auto-detected |
+| `blocked` → `open` | Blocker resolved | Clears assignment |
+| `open` → `closed` | Cancelled/won't do | Direct close without work |
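+
+The transition table can be enforced with a small guard so that illegal moves (for example `closed` back to `open`) fail loudly. This is a sketch under stated assumptions: `transition`, `TaskState`, and the `actor`/`reason`/`now` parameters are illustrative names, and setting `closed_at` on a direct `open` to `closed` cancel is an assumption, since the table only states it for `in_progress` to `closed`.
+
+```typescript
+type Status = "open" | "in_progress" | "blocked" | "closed";
+
+// Allowed transitions, copied from the table above.
+const ALLOWED: Record<Status, Status[]> = {
+  open: ["in_progress", "closed"],
+  in_progress: ["closed", "blocked"],
+  blocked: ["open"],
+  closed: [],
+};
+
+interface TaskState {
+  status: Status;
+  assigned_to: string | null;
+  assigned_at: number | null;
+  closed_at: number | null;
+  closed_reason: string | null;
+}
+
+// Returns a new state with the transition and its side effects applied.
+function transition(
+  task: TaskState,
+  to: Status,
+  now: number,
+  opts: { actor?: string; reason?: string } = {}
+): TaskState {
+  if (ALLOWED[task.status].indexOf(to) === -1) {
+    throw new Error(`Illegal transition: ${task.status} -> ${to}`);
+  }
+  const next: TaskState = { ...task, status: to };
+  if (task.status === "open" && to === "in_progress") {
+    next.assigned_to = opts.actor ?? null; // claim
+    next.assigned_at = now;
+  }
+  if (to === "closed") {
+    next.closed_at = now; // assumed for cancellations as well
+    next.closed_reason = opts.reason ?? null;
+  }
+  if (task.status === "blocked" && to === "open") {
+    next.assigned_to = null; // unblocking clears the assignment
+    next.assigned_at = null;
+  }
+  return next;
+}
+```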
+
+---
+
+## CLI Reference
+
+All commands live under the `cw task` subcommand.
+
+### Core Commands
+
+| Command | Description |
+|---------|-------------|
+| `cw task ready` | List tasks ready for work (open, no unclosed blockers, phase approved if linked) |
+| `cw task list [--status STATUS] [--project ID]` | List tasks with filters |
+| `cw task show <id>` | Show task details + history |
+| `cw task create <title> [-p PRIORITY] [-d DESC]` | Create new task |
+| `cw task update <id> [--status STATUS] [--priority P]` | Update task fields |
+| `cw task close <id> [--reason REASON]` | Mark task complete |
+
+### Dependency Commands
+
+| Command | Description |
+|---------|-------------|
+| `cw task dep add <task> <depends-on>` | Mark `<task>` as blocked by `<depends-on>` |
+| `cw task dep rm <task> <depends-on>` | Remove dependency |
+| `cw task dep tree <id>` | Show dependency graph |
+
+### Assignment Commands
+
+| Command | Description |
+|---------|-------------|
+| `cw task assign <id> <agent>` | Assign task to agent |
+| `cw task unassign <id>` | Release task |
+| `cw task mine` | List tasks assigned to current agent |
+
+### Output Flags (global)
+
+| Flag | Description |
+|------|-------------|
+| `--json` | Output as JSON (for agent consumption) |
+| `--quiet` | Minimal output (just IDs) |
+
+---
+
+## Agent Workflow
+
+Standard loop for Workers:
+
+```
+1. cw task ready --json
+2. Pick highest priority task from result
+3. cw task update <id> --status in_progress
+4. Do the work
+5. cw task close <id> --reason "Implemented X"
+6. Loop to step 1
+```
+
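+If `cw task ready` returns an empty list, no work is currently available to the agent: either every task is closed, or the remaining tasks are blocked, claimed by another Worker, or waiting on phase approval.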
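+
+Steps 1 to 3 can race when two Workers read the same ready list, so the claim should succeed only if the task is still `open`. The sketch below shows that compare-and-set idea against an in-memory array; `QueueTask`, `tryClaim`, and `workOnce` are illustrative names, and dependency filtering is omitted for brevity. Against SQLite, the equivalent would be an `UPDATE ... WHERE id = ? AND status = 'open'` followed by a check of the affected-row count.
+
+```typescript
+interface QueueTask {
+  id: string;
+  status: "open" | "in_progress" | "closed";
+  priority: number; // 0 = critical ... 3 = low
+  assigned_to: string | null;
+}
+
+// Claims the task only if it is still open; returns false if another worker won.
+function tryClaim(task: QueueTask, workerId: string): boolean {
+  if (task.status !== "open") return false;
+  task.status = "in_progress";
+  task.assigned_to = workerId;
+  return true;
+}
+
+// One pass of the loop: claim the highest-priority open task, run it, close it.
+// Returns the id of the task worked on, or null when nothing could be claimed.
+function workOnce(
+  tasks: QueueTask[],
+  workerId: string,
+  doWork: (t: QueueTask) => void
+): string | null {
+  const candidates = tasks
+    .filter((t) => t.status === "open")
+    .sort((a, b) => a.priority - b.priority);
+  for (const task of candidates) {
+    if (!tryClaim(task, workerId)) continue; // lost the race, try the next one
+    doWork(task);
+    task.status = "closed";
+    return task.id;
+  }
+  return null;
+}
+```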
+
+---
+
+## Integration Points
+
+### With Initiatives
+- Tasks can link to an initiative via `initiative_id`
+- When an initiative is approved, tasks are generated from its technical concept
+- Closing all tasks for an initiative signals initiative completion
+
+### With Orchestrator
+- Orchestrator queries `ready_tasks` view to dispatch work
+- Assignment tracked to prevent double-dispatch
+- Orchestrator can bulk-create tasks from job definitions
+
+### With Workers
+- Workers claim tasks via `cw task update <id> --status in_progress`
+- Worker ID stored in `assigned_to`
+- On worker crash, Supervisor can detect stale assignments and reset them
+
+### tRPC Procedures
+
+```typescript
+// Suggested tRPC router shape
+task.list(filters)      // → Task[]
+task.ready(projectId?)  // → Task[]
+task.get(id)            // → Task | null
+task.create(input)      // → Task
+task.update(id, input)  // → Task
+task.close(id, reason)  // → Task
+task.assign(id, agent)  // → Task
+task.history(id)        // → TaskHistory[]
+task.depAdd(id, dep)    // → void
+task.depRemove(id, dep) // → void
+task.depTree(id)        // → DependencyTree
+```
+
+---
+
+## Task Granularity Standards
+
+A task must be specific enough for execution without interpretation. Vague tasks cause agents to guess, leading to inconsistent results.
+
+### Quick Reference
+
+| Too Vague | Just Right |
+|-----------|------------|
+| "Add authentication" | "Add JWT auth with refresh rotation using jose, httpOnly cookie, 15min access / 7day refresh" |
+| "Create the API" | "Create POST /api/projects accepting {name, description}, validates name 3-50 chars, returns 201" |
+| "Handle errors" | "Wrap API calls in try/catch, return {error: string} on 4xx/5xx, show toast via sonner" |
+
+### Required Task Components
+
+Every task MUST include:
+
+1. **files** — Exact paths modified/created
+2. **action** — What to do, what to avoid, WHY
+3. **verify** — Command or check to prove completion
+4. **done** — Measurable acceptance criteria
+
+See [task-granularity.md](task-granularity.md) for comprehensive examples and anti-patterns.
+
+### Context Budget
+
+Tasks are sized to fit agent context budgets:
+
+| Complexity | Context % | Example |
+|------------|-----------|---------|
+| Simple | 10-20% | Add form field |
+| Medium | 20-35% | Create API endpoint |
+| Complex | 35-50% | Implement auth flow |
+| Too Large | >50% | **SPLIT REQUIRED** |
+
+See [context-engineering.md](context-engineering.md) for context management rules.
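+
+The sizing bands translate directly into a classifier that a planner could run on its estimate. In this sketch the function name and the `Complexity` labels are illustrative, boundary values (20, 35, 50) are assigned to the lower band, and estimates below 10% are treated as simple; both choices are assumptions, since the table does not say.
+
+```typescript
+type Complexity = "simple" | "medium" | "complex" | "too_large";
+
+// Maps an estimated context percentage to the bands in the table above.
+function classifyContext(percent: number): Complexity {
+  if (isNaN(percent) || percent < 0 || percent > 100) {
+    throw new RangeError(`Context estimate out of range: ${percent}`);
+  }
+  if (percent > 50) return "too_large"; // SPLIT REQUIRED
+  if (percent > 35) return "complex";
+  if (percent > 20) return "medium";
+  return "simple";
+}
+```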
+
+---
+
+## Deviation Handling
+
+When Workers encounter unexpected issues during execution, they follow deviation rules:
+
+| Rule | Action | Permission |
+|------|--------|------------|
+| Rule 1: Bug fixes | Auto-fix | None needed |
+| Rule 2: Missing critical (validation, auth) | Auto-add | None needed |
+| Rule 3: Blocking issues (deps, imports) | Auto-fix | None needed |
+| Rule 4: Architectural changes | ASK | Required |
+
+See [deviation-rules.md](deviation-rules.md) for detailed guidance.
+
+---
+
+## Execution Artifacts
+
+Task execution produces artifacts:
+
+| Artifact | Purpose |
+|----------|---------|
+| Commits | Per-task atomic commits |
+| SUMMARY.md | Record of what happened |
+| STATE.md updates | Position tracking |
+
+See [execution-artifacts.md](execution-artifacts.md) for artifact specifications.
+
+---
+
+## Future Considerations
+
+- **Compaction**: Summarize old closed tasks to reduce DB size (beads does this with an LLM)
+- **Labels/tags**: Additional categorization beyond type
+- **Time tracking**: Estimated vs actual time for capacity planning
+- **Recurring tasks**: Templates that spawn new tasks on schedule
--- a/docs/verification.md
+++ b/docs/verification.md
@@ -0,0 +1,322 @@
+# Goal-Backward Verification
+
+Verification confirms that **goals are achieved**, not merely that **tasks were completed**. A completed task "create chat component" does not guarantee the goal "working chat interface" is met.
+
+## Core Principle
+
+**Task completion ≠ Goal achievement**
+
+Tasks are implementation steps. Goals are user outcomes. Verification bridges the gap by checking observable outcomes, not just checklist items.
+
+---
+
+## Verification Levels
+
+### Level 1: Existence Check
+Does the artifact exist?
+
+```
+✓ File exists at expected path
+✓ Component is exported
+✓ Route is registered
+```
+
+### Level 2: Substance Check
+Is the artifact substantive (not a stub)?
+
+```
+✓ Function has implementation (not just return null)
+✓ Component renders content (not empty div)
+✓ API returns meaningful response (not placeholder)
+```
+
+### Level 3: Wiring Check
+Is the artifact connected to the system?
+
+```
+✓ Component is rendered somewhere
+✓ API endpoint is called by client
+✓ Event handler is attached
+✓ Database query is executed
+```
+
+**All three levels must pass for verification success.**
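+
+Because the levels build on each other, a verifier can evaluate them in order and report the first one that fails. The sketch below assumes the three checks have already been computed per artifact; `LevelChecks`, `verdictFor`, and `phaseStatus` are illustrative names. `VERIFIED`, `STUB`, `MISSING_WIRING`, `PASS`, and `GAPS_FOUND` follow labels used in this document's output examples, while `MISSING` is a label introduced here for a failed existence check.
+
+```typescript
+interface LevelChecks {
+  exists: boolean;      // Level 1: artifact is present
+  substantive: boolean; // Level 2: not a stub
+  wired: boolean;       // Level 3: connected to the system
+}
+
+type Verdict = "VERIFIED" | "MISSING" | "STUB" | "MISSING_WIRING";
+
+// Evaluates the levels in order and reports the first one that fails.
+function verdictFor(checks: LevelChecks): Verdict {
+  if (!checks.exists) return "MISSING";
+  if (!checks.substantive) return "STUB";
+  if (!checks.wired) return "MISSING_WIRING";
+  return "VERIFIED";
+}
+
+// A phase passes only when every artifact clears all three levels.
+function phaseStatus(
+  artifacts: Record<string, LevelChecks>
+): "PASS" | "GAPS_FOUND" {
+  const paths = Object.keys(artifacts);
+  const allVerified = paths.every(
+    (p) => verdictFor(artifacts[p]) === "VERIFIED"
+  );
+  return allVerified ? "PASS" : "GAPS_FOUND";
+}
+```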
+
+---
+
+## Must-Have Derivation
+
+Before verification, derive what "done" means from the goal:
+
+### 1. Observable Truths (3-7 user perspectives)
+What can a user observe when the goal is achieved?
+
+```yaml
+observable_truths:
+  - "User can click 'Send' and message appears in chat"
+  - "Messages persist after page refresh"
+  - "New messages appear without page reload"
+  - "User sees typing indicator when other party types"
+```
+
+### 2. Required Artifacts
+What files MUST exist?
+
+```yaml
+required_artifacts:
+  - path: src/components/Chat.tsx
+    check: "Exports Chat component"
+  - path: src/api/messages.ts
+    check: "Exports sendMessage, getMessages"
+  - path: src/hooks/useChat.ts
+    check: "Exports useChat hook"
+```
+
+### 3. Required Wiring
+What connections MUST work?
+
+```yaml
+required_wiring:
+  - from: Chat.tsx
+    to: useChat.ts
+    check: "Component calls hook"
+  - from: useChat.ts
+    to: messages.ts
+    check: "Hook calls API"
+  - from: messages.ts
+    to: database
+    check: "API persists to DB"
+```
+
+### 4. Key Links (Where Stubs Hide)
+What integration points commonly fail?
+
+```yaml
+key_links:
+  - "Form onSubmit → API call (not just console.log)"
+  - "WebSocket connection → message handler"
+  - "API response → state update → render"
+```
+
+---
+
+## Verification Process
+
+### Phase Verification
+
+After all tasks in a phase complete:
+
+```
+1. Load must-haves (from phase goal or PLAN frontmatter)
+2. For each observable truth:
+   a. Level 1: Does the relevant code exist?
+   b. Level 2: Is it substantive?
+   c. Level 3: Is it wired?
+3. For each required artifact:
+   a. Verify file exists
+   b. Verify not a stub
+   c. Verify it's imported/used
+4. For each key link:
+   a. Trace the connection
+   b. Verify data flows
+5. Scan for anti-patterns (see below)
+6. Structure gaps for re-planning
+```
+
+### Anti-Pattern Scanning
+
+Check for common incomplete work:
+
+| Pattern | Detection | Meaning |
+|---------|-----------|---------|
+| `// TODO` | Grep for TODO comments | Work deferred |
+| `throw new Error('Not implemented')` | Grep for stub errors | Placeholder code |
+| `return null` / `return {}` | AST analysis | Empty implementations |
+| `console.log` in handlers | Grep for console.log | Debug code left behind |
+| Empty catch blocks | AST analysis | Swallowed errors |
+| Hardcoded values | Manual review | Missing configuration |
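+
+The grep-style rows of this table can be approximated with regular expressions over source text. This is a rough sketch, not the Verifier's actual scanner: `scanSource`, `Finding`, and the pattern list are illustrative, the empty-catch pattern only matches single-line blocks, and the rows marked AST analysis or manual review are out of scope.
+
+```typescript
+interface Finding {
+  pattern: string;
+  line: number; // 1-based line number
+  text: string;
+}
+
+// Line-oriented approximations of the grep-detectable anti-patterns.
+const GREP_PATTERNS: { name: string; regex: RegExp }[] = [
+  { name: "TODO comment", regex: /\/\/\s*TODO\b/ },
+  { name: "stub error", regex: /throw new Error\(\s*['"]Not implemented['"]\s*\)/ },
+  { name: "console.log", regex: /\bconsole\.log\(/ },
+  { name: "empty catch", regex: /catch\s*(\([^)]*\))?\s*\{\s*\}/ },
+];
+
+// Scans source text line by line and returns every match with its location.
+function scanSource(source: string): Finding[] {
+  const findings: Finding[] = [];
+  source.split("\n").forEach((text, index) => {
+    for (const p of GREP_PATTERNS) {
+      if (p.regex.test(text)) {
+        findings.push({ pattern: p.name, line: index + 1, text: text.trim() });
+      }
+    }
+  });
+  return findings;
+}
+```
+
+Each finding carries a line number, so it can be reported as a `location` in the gap list.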
+
+---
+
+## Verification Output
+
+### Pass Case
+
+```yaml
+# 2-VERIFICATION.md
+phase: 2
+status: PASS
+verified_at: 2024-01-15T10:30:00Z
+
+observable_truths:
+  - truth: "User can send message"
+    status: VERIFIED
+    evidence: "Chat.tsx:45 calls sendMessage on submit"
+  - truth: "Messages persist"
+    status: VERIFIED
+    evidence: "messages.ts:23 inserts to SQLite"
+
+required_artifacts:
+  - path: src/components/Chat.tsx
+    status: EXISTS
+    check: PASSED
+  - path: src/api/messages.ts
+    status: EXISTS
+    check: PASSED
+
+anti_patterns_found: []
+
+human_verification_needed:
+  - "Visual layout matches design"
+  - "Real-time updates work under load"
+```
+
+### Fail Case (Gaps Found)
+
+```yaml
+# 2-VERIFICATION.md
+phase: 2
+status: GAPS_FOUND
+verified_at: 2024-01-15T10:30:00Z
+
+gaps:
+  - type: STUB
+    location: src/hooks/useChat.ts:34
+    description: "sendMessage returns immediately without API call"
+    severity: BLOCKING
+
+  - type: MISSING_WIRING
+    location: src/components/Chat.tsx
+    description: "WebSocket not connected, no real-time updates"
+    severity: BLOCKING
+
+  - type: ANTI_PATTERN
+    location: src/api/messages.ts:67
+    description: "Empty catch block swallows errors"
+    severity: HIGH
+
+remediation_plan:
+  - "Connect useChat to actual API endpoint"
+  - "Initialize WebSocket in Chat component"
+  - "Add error handling to API calls"
+```
+
+---
+
+## User Acceptance Testing (UAT)
+
+Verification confirms code correctness. UAT confirms user experience.
+
+### UAT Process
+
+1. Extract testable deliverables from phase goal
+2. Walk user through each one:
+   - "Can you log in with your email?"
+   - "Does the dashboard show your projects?"
+   - "Can you create a new project?"
+3. Record the result: PASS, FAIL (with a description of the issue), or BLOCKED when the case cannot be tested yet
+4. If issues found:
+   - Diagnose root cause
+   - Create targeted fix plan
+5. If all pass: Phase complete
+
+### UAT Output
+
+```yaml
+# 2-UAT.md
+phase: 2
+tested_by: user
+tested_at: 2024-01-15T14:00:00Z
+
+test_cases:
+  - case: "Login with email"
+    result: PASS
+
+  - case: "Dashboard shows projects"
+    result: FAIL
+    issue: "Shows loading spinner forever"
+    diagnosis: "API returns 500, missing auth header"
+
+  - case: "Create new project"
+    result: BLOCKED
+    reason: "Cannot test, dashboard not loading"
+
+fix_required: true
+fix_plan:
+  - task: "Add auth header to dashboard API call"
+    files: [src/api/projects.ts]
+    priority: P0
+```
+
+---
+
+## Integration with Task Workflow
+
+### Task Completion Hook
+When a task closes:
+1. Worker marks task closed with reason
+2. If all phase tasks closed, trigger phase verification
+3. Verifier agent runs goal-backward check
+4. If PASS: Phase marked complete
+5. If GAPS: Create remediation tasks, phase stays in_progress
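+
+The hook above reduces to a small decision function. In this sketch `onTaskClosed`, `PhaseTask`, and the `verify` callback (standing in for the Verifier agent) are illustrative names, and requiring at least one task in the phase before verifying is an assumption.
+
+```typescript
+type PhaseTaskStatus = "open" | "in_progress" | "blocked" | "closed";
+type VerificationStatus = "PASS" | "GAPS_FOUND";
+
+interface PhaseTask {
+  id: string;
+  phase_id: string;
+  status: PhaseTaskStatus;
+}
+
+// True once the phase has tasks and every one of them is closed.
+function allPhaseTasksClosed(tasks: PhaseTask[], phaseId: string): boolean {
+  const inPhase = tasks.filter((t) => t.phase_id === phaseId);
+  return inPhase.length > 0 && inPhase.every((t) => t.status === "closed");
+}
+
+// Called after a task closes. Verification runs only when the whole phase is
+// closed; gaps keep the phase in progress and yield remediation task titles.
+function onTaskClosed(
+  tasks: PhaseTask[],
+  phaseId: string,
+  verify: () => { status: VerificationStatus; remediation: string[] }
+): { phase: "in_progress" | "complete"; newTasks: string[] } {
+  if (!allPhaseTasksClosed(tasks, phaseId)) {
+    return { phase: "in_progress", newTasks: [] };
+  }
+  const result = verify();
+  if (result.status === "PASS") return { phase: "complete", newTasks: [] };
+  return { phase: "in_progress", newTasks: result.remediation };
+}
+```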
+
+### Verification Task Type
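+Verification itself is a task. Note that the `type` column in the tasks schema only allows `task`, `epic`, and `subtask`, so `verification` would have to be added to that CHECK constraint: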
+
+```yaml
+type: verification
+phase_id: phase-2
+status: open
+assigned_to: verifier-agent
+priority: P0  # Always high priority
+```
+
+---
+
+## Checkpoint Types
+
+During execution, agents may need human input. Use precise checkpoint types:
+
+### checkpoint:human-verify (90% of checkpoints)
+Agent completed work, user confirms it works.
+
+```yaml
+checkpoint: human-verify
+prompt: "Can you log in with email and password?"
+expected: "User confirms successful login"
+```
+
+### checkpoint:decision (9% of checkpoints)
+User must make implementation choice.
+
+```yaml
+checkpoint: decision
+prompt: "OAuth2 or SAML for SSO?"
+options:
+  - OAuth2: "Simpler, most common"
+  - SAML: "Enterprise requirement"
+```
+
+### checkpoint:human-action (1% of checkpoints)
+Truly unavoidable manual step.
+
+```yaml
+checkpoint: human-action
+prompt: "Click the email verification link"
+reason: "Cannot automate email client interaction"
+```
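+
+Since each checkpoint type carries different fields, a discriminated union keeps them from being mixed up. The sketch below is illustrative: the `Checkpoint` type models `options` as a name-to-description map rather than the YAML list shown above, and the rendered text format is invented for the example.
+
+```typescript
+type Checkpoint =
+  | { checkpoint: "human-verify"; prompt: string; expected: string }
+  | { checkpoint: "decision"; prompt: string; options: Record<string, string> }
+  | { checkpoint: "human-action"; prompt: string; reason: string };
+
+// Renders the text shown to the user; the compiler checks that every
+// checkpoint type is handled and that only its own fields are read.
+function renderCheckpoint(c: Checkpoint): string {
+  switch (c.checkpoint) {
+    case "human-verify":
+      return `[verify] ${c.prompt} (expected: ${c.expected})`;
+    case "decision": {
+      const options = c.options;
+      const choices = Object.keys(options)
+        .map((name) => `${name}: ${options[name]}`)
+        .join("; ");
+      return `[decide] ${c.prompt} Options: ${choices}`;
+    }
+    case "human-action":
+      return `[action] ${c.prompt} (reason: ${c.reason})`;
+  }
+}
+```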
+
+---
+
+## Human Verification Needs
+
+Some verifications require human eyes:
+
+| Category | Examples | Why Human |
+|----------|----------|-----------|
+| Visual | Layout, spacing, colors | Subjective/design judgment |
+| Real-time | WebSocket, live updates | Requires interaction |
+| External | OAuth flow, payment | Third-party systems |
+| Accessibility | Screen reader, keyboard nav | Requires tooling/expertise |
+
+**Mark these explicitly** in verification output. Don't claim PASS when human verification is pending.