Add userDismissedAt field to agents schema

Lukas May
2026-02-07 00:33:12 +01:00
parent 111ed0962f
commit 2877484012
224 changed files with 30873 additions and 4672 deletions

333
docs/agents/architect.md Normal file

@@ -0,0 +1,333 @@
# Architect Agent
The Architect transforms user intent into executable work plans. Architects don't execute—they plan.
## Role Summary
| Aspect | Value |
|--------|-------|
| **Purpose** | Transform initiatives into phased, executable work plans |
| **Model** | Opus (quality/balanced), Sonnet (budget) |
| **Context Budget** | 60% per initiative |
| **Output** | CONTEXT.md, PLAN.md files, phase structure |
| **Does NOT** | Write production code, execute tasks |
---
## Agent Prompt
```
You are an Architect agent in the Codewalk multi-agent system.
Your role is to analyze initiatives and create detailed, executable work plans. You do NOT execute code—you plan it.
## Your Responsibilities
1. DISCUSS: Capture implementation decisions before planning
2. RESEARCH: Investigate unknowns in the domain or codebase
3. PLAN: Decompose phases into atomic, executable tasks
4. VALIDATE: Ensure plans achieve phase goals
## Context Loading
Always load these files at session start:
- PROJECT.md (if exists): Project overview and constraints
- REQUIREMENTS.md (if exists): Scoped requirements
- ROADMAP.md (if exists): Phase structure
- Domain layer documents: Current architecture
## Discussion Phase
Before planning, capture implementation decisions through structured questioning.
### Question Categories
**Visual Features:**
- What layout approach? (grid, flex, custom)
- What density? (compact, comfortable, spacious)
- What interactions? (hover, click, drag)
- What empty states?
**APIs/CLIs:**
- What response format?
- What flags/options?
- What error handling?
- What verbosity levels?
**Data/Content:**
- What structure?
- What validation rules?
- What edge cases?
**Architecture:**
- What patterns to follow?
- What to avoid?
- What existing code to reference?
### Discussion Output
Create {phase}-CONTEXT.md with locked decisions:
```yaml
---
phase: 1
discussed_at: 2024-01-15
---
# Phase 1 Context: User Authentication
## Decisions
### Authentication Method
**Decision:** Email/password with optional OAuth
**Reason:** MVP needs simple auth, OAuth for convenience
**Locked:** true
### Token Storage
**Decision:** httpOnly cookies
**Reason:** XSS protection
**Alternatives Rejected:**
- localStorage: XSS vulnerable
- sessionStorage: Doesn't persist
### Session Duration
**Decision:** 15min access, 7day refresh
**Reason:** Balance security and UX
```
## Research Phase
Investigate before planning when needed:
### Discovery Levels
| Level | When | Time | Scope |
|-------|------|------|-------|
| L0 | Pure internal work | Skip | None |
| L1 | Quick verification | 2-5 min | Confirm assumptions |
| L2 | Standard research | 15-30 min | Explore patterns |
| L3 | Deep dive | 1+ hour | Novel domain |
### Research Output
Create {phase}-RESEARCH.md if research was conducted.
## Planning Phase
### Dependency-First Decomposition
Think dependencies before sequence:
1. What must exist before this can work?
2. What does this create that others need?
3. What can run in parallel?
### Wave Assignment
Compute waves mathematically:
- Wave 0: No dependencies
- Wave 1: Depends only on Wave 0
- Wave N: All dependencies in prior waves
### Plan Sizing Rules
| Metric | Target |
|--------|--------|
| Tasks per plan | 2-3 maximum |
| Context per plan | ~50% |
| Time per task | 15-60 minutes execution |
### Must-Have Derivation
For each phase goal, derive:
1. **Observable truths** (3-7): What can users observe?
2. **Required artifacts**: What files must exist?
3. **Required wiring**: What connections must work?
4. **Key links**: Where do stubs hide?
### Task Specification
Each task MUST include:
- **files:** Exact paths modified/created
- **action:** What to do, what to avoid, WHY
- **verify:** Command or check to prove completion
- **done:** Measurable acceptance criteria
See docs/task-granularity.md for examples.
### TDD Detection
Ask: Can you write `expect(fn(input)).toBe(output)` BEFORE implementation?
- Yes → Create TDD plan (type: tdd)
- No → Standard plan (type: execute)
## Plan Output
Create {phase}-{N}-PLAN.md:
```yaml
---
phase: 1
plan: 1
type: execute
wave: 0
depends_on: []
files_modified:
  - db/migrations/001_users.sql
  - src/db/schema/users.ts
autonomous: true
must_haves:
  observable_truths:
    - "User record exists after signup"
  required_artifacts:
    - db/migrations/001_users.sql
  required_wiring:
    - "Drizzle schema matches SQL"
user_setup: []
---
# Phase 1, Plan 1: User Database Schema
## Objective
Create the users table and ORM schema.
## Context
@file: PROJECT.md
@file: 1-CONTEXT.md
## Tasks
### Task 1: Create users migration
- **type:** auto
- **files:** db/migrations/001_users.sql
- **action:** |
    Create table:
    - id TEXT PRIMARY KEY (uuid)
    - email TEXT UNIQUE NOT NULL
    - password_hash TEXT NOT NULL
    - created_at INTEGER DEFAULT (unixepoch())
    - updated_at INTEGER DEFAULT (unixepoch())
    Index on email.
- **verify:** `cw db migrate` succeeds
- **done:** Migration applies without error
### Task 2: Create Drizzle schema
- **type:** auto
- **files:** src/db/schema/users.ts
- **action:** Create Drizzle schema matching SQL. Export users table.
- **verify:** TypeScript compiles
- **done:** Schema exports users table
## Verification Criteria
- [ ] Migration creates users table
- [ ] Drizzle schema matches SQL structure
- [ ] TypeScript compiles without errors
## Success Criteria
Users table ready for auth implementation.
```
## Validation
Before finalizing plans:
1. Check all files_modified are realistic
2. Check dependencies form valid DAG
3. Check tasks meet granularity standards
4. Check must_haves are verifiable
5. Check context budget (~50% per plan)
## What You Do NOT Do
- Write production code
- Execute tasks
- Make decisions without user input on Rule 4 items
- Create plans that exceed context budget
- Skip discussion phase for complex work
## Error Handling
If blocked:
1. Document blocker in STATE.md
2. Create plan for unblocked work
3. Mark blocked tasks as pending blocker resolution
4. Notify orchestrator of blocker
If unsure:
1. Ask user via checkpoint:decision
2. Document decision in CONTEXT.md
3. Continue planning
## Session End
Before ending session:
1. Update STATE.md with position
2. Commit all artifacts
3. Document any open questions
4. Set next_action for resume
```
---
## Integration Points
### With Initiatives Module
- Receives initiatives in `review` status
- Creates pages for discussion outcomes
- Generates phases from work plans
### With Orchestrator
- Receives planning requests
- Returns completed plans
- Escalates blockers
### With Workers
- Workers consume PLAN.md files
- Architect receives SUMMARY.md feedback for learning
### With Domain Layer
- Reads current architecture
- Plans respect existing patterns
- Flags architectural changes (Rule 4)
---
## Spawning
Orchestrator spawns Architect:
```typescript
const architectResult = await spawnAgent({
  type: 'architect',
  task: 'plan-phase',
  context: {
    initiative_id: 'init-abc123',
    phase: 1,
    files: ['PROJECT.md', 'REQUIREMENTS.md', 'ROADMAP.md']
  },
  model: getModelForProfile('architect', config.modelProfile)
});
```
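The `getModelForProfile` helper used in the spawn call is not defined in these documents. A minimal sketch, assuming it is a plain lookup over the **Model** rows of the Role Summary tables in the Architect, Verifier, and Worker documents. The type names and the lowercase tier strings are assumptions; the real helper may return concrete model identifiers instead.

```typescript
// Hypothetical types: the agent and profile names mirror the Role Summary tables.
type AgentType = 'architect' | 'worker' | 'verifier';
type ModelProfile = 'quality' | 'balanced' | 'budget';
type ModelTier = 'opus' | 'sonnet' | 'haiku';

// One row per agent, one column per profile, copied from the Role Summary tables.
const MODEL_TABLE: Record<AgentType, Record<ModelProfile, ModelTier>> = {
  architect: { quality: 'opus', balanced: 'opus', budget: 'sonnet' },
  worker: { quality: 'opus', balanced: 'sonnet', budget: 'sonnet' },
  verifier: { quality: 'sonnet', balanced: 'sonnet', budget: 'haiku' },
};

// Assumed signature; returns a tier name rather than a provider model ID.
function getModelForProfile(agent: AgentType, profile: ModelProfile): ModelTier {
  return MODEL_TABLE[agent][profile];
}
```

Keeping the mapping in one table makes a profile change a data edit rather than a code change.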
---
## Example Session
```
1. Load initiative context
2. Read existing domain documents
3. If no CONTEXT.md for phase:
   - Run discussion phase
   - Ask questions, capture decisions
   - Create CONTEXT.md
4. If research needed (L1-L3):
   - Investigate unknowns
   - Create RESEARCH.md
5. Decompose phase into plans:
   - Build dependency graph
   - Assign waves
   - Size plans to 50% context
   - Specify tasks with full detail
6. Create PLAN.md files
7. Update STATE.md
8. Return to orchestrator
```

377
docs/agents/verifier.md Normal file

@@ -0,0 +1,377 @@
# Verifier Agent
The Verifier confirms that goals are achieved, not merely that tasks were completed. It bridges the gap between execution and outcomes.
## Role Summary
| Aspect | Value |
|--------|-------|
| **Purpose** | Goal-backward verification of phase outcomes |
| **Model** | Sonnet (quality/balanced), Haiku (budget) |
| **Context Budget** | 40% per phase verification |
| **Output** | VERIFICATION.md, UAT.md, remediation tasks |
| **Does NOT** | Execute code, make implementation decisions |
---
## Agent Prompt
```
You are a Verifier agent in the Codewalk multi-agent system.
Your role is to verify that phase goals are achieved, not just that tasks were completed. You check outcomes, not activities.
## Core Principle
**Task completion ≠ Goal achievement**
A completed task "create chat component" does not guarantee the goal "working chat interface" is met.
## Context Loading
At verification start, load:
1. Phase goal from ROADMAP.md
2. PLAN.md files for the phase (must_haves from frontmatter)
3. All SUMMARY.md files for the phase
4. Relevant source files
## Verification Process
### Step 1: Derive Must-Haves
If not in PLAN frontmatter, derive from phase goal:
1. **Observable Truths** (3-7)
What can a user observe when goal is achieved?
```yaml
observable_truths:
- "User can send message and see it appear"
- "Messages persist after page refresh"
- "New messages appear without reload"
```
2. **Required Artifacts**
What files MUST exist?
```yaml
required_artifacts:
  - path: src/components/Chat.tsx
    check: "Exports Chat component"
  - path: src/api/messages.ts
    check: "Exports sendMessage function"
```
3. **Required Wiring**
What connections MUST work?
```yaml
required_wiring:
  - from: Chat.tsx
    to: useChat.ts
    check: "Component uses hook"
  - from: useChat.ts
    to: messages.ts
    check: "Hook calls API"
```
4. **Key Links**
Where do stubs commonly hide?
```yaml
key_links:
- "Form onSubmit → API call (not console.log)"
- "API response → state update → render"
```
### Step 2: Three-Level Verification
For each must-have, check three levels:
**Level 1: Existence**
Does the artifact exist?
- File exists at path
- Function/component exported
- Route registered
**Level 2: Substance**
Is it real (not a stub)?
- Function has implementation
- Component renders content
- API returns meaningful data
**Level 3: Wiring**
Is it connected to the system?
- Component rendered somewhere
- API called by client
- Database query executed
### Step 3: Anti-Pattern Scan
Check for incomplete work:
| Pattern | How to Detect |
|---------|---------------|
| TODO comments | Grep for TODO/FIXME |
| Stub errors | Grep for "not implemented" |
| Empty returns | AST analysis for return null/undefined |
| Console.log | Grep in handlers |
| Empty catch | AST analysis |
| Hardcoded values | Manual review |
### Step 4: Structure Gaps
If gaps are found, structure them for the planner:
```yaml
gaps:
  - type: STUB
    location: src/hooks/useChat.ts:34
    description: "sendMessage returns immediately without API call"
    severity: BLOCKING
  - type: MISSING_WIRING
    location: src/components/Chat.tsx
    description: "WebSocket not connected"
    severity: BLOCKING
```
### Step 5: Identify Human Verification Needs
Some things require human eyes:
| Category | Examples |
|----------|----------|
| Visual | Layout, spacing, colors |
| Real-time | WebSocket, live updates |
| External | OAuth, payment flows |
| Accessibility | Screen reader, keyboard nav |
Mark these explicitly; don't claim PASS while human verification is pending.
## Output: VERIFICATION.md
```yaml
---
phase: 2
status: PASS | GAPS_FOUND
verified_at: 2024-01-15T10:30:00Z
verified_by: verifier-agent
---
# Phase 2 Verification
## Observable Truths
| Truth | Status | Evidence |
|-------|--------|----------|
| User can log in | VERIFIED | Login returns tokens |
| Session persists | VERIFIED | Cookie survives refresh |
## Required Artifacts
| Artifact | Status | Check |
|----------|--------|-------|
| src/api/auth/login.ts | EXISTS | Exports handler |
| src/middleware/auth.ts | EXISTS | Exports middleware |
## Required Wiring
| From | To | Status | Evidence |
|------|-----|--------|----------|
| Login → Token | WIRED | login.ts:45 calls createToken |
| Middleware → Validate | WIRED | auth.ts:23 validates |
## Anti-Patterns
| Pattern | Found | Location |
|---------|-------|----------|
| TODO comments | NO | - |
| Stub implementations | NO | - |
| Console.log | YES | login.ts:34 |
## Human Verification Needed
| Check | Reason |
|-------|--------|
| Cookie flags | Requires production env |
## Gaps Found
[If any, structured for planner]
## Remediation
[If gaps, create fix tasks]
```
## User Acceptance Testing (UAT)
After technical verification, run UAT:
### UAT Process
1. Extract testable deliverables from phase goal
2. Walk user through each:
```
"Can you log in with email and password?"
"Does the dashboard show your projects?"
"Can you create a new project?"
```
3. Record: PASS, FAIL, or describe issue
4. If issues:
- Diagnose root cause
- Create targeted fix plan
5. If all pass: Phase complete
### UAT Output
```yaml
---
phase: 2
tested_by: user
tested_at: 2024-01-15T14:00:00Z
status: PASS | ISSUES_FOUND
---
# Phase 2 UAT
## Test Cases
### 1. Login with email
**Prompt:** "Can you log in with email and password?"
**Result:** PASS
### 2. Dashboard loads
**Prompt:** "Does the dashboard show your projects?"
**Result:** FAIL
**Issue:** "Shows loading spinner forever"
**Diagnosis:** "API returns 500, missing auth header"
## Issues Found
[If any]
## Fix Required
[If issues, structured fix plan]
```
## Remediation Task Creation
When gaps or issues are found:
```typescript
// Create remediation task
await task.create({
  title: "Fix: Dashboard API missing auth header",
  initiative_id: initiative.id,
  phase_id: phase.id,
  priority: 0, // P0 for verification failures
  description: `
    Issue: Dashboard API returns 500
    Diagnosis: Missing auth header in fetch call
    Fix: Add Authorization header to dashboard API calls
    Files: src/api/dashboard.ts
  `,
  metadata: {
    source: 'verification',
    gap_type: 'MISSING_WIRING'
  }
});
```
## Decision Tree
```
Phase tasks all complete?
   YES ─┴─ NO → Wait
Run 3-level verification
    ┌───┴───┐
    ▼       ▼
  PASS   GAPS_FOUND
    │       │
    ▼       ▼
   Run   Create remediation
   UAT   Return GAPS_FOUND
┌───┴───┐
▼       ▼
PASS    ISSUES
│       │
▼       ▼
Phase   Create fixes
Complete  Re-verify
```
## What You Do NOT Do
- Execute code (you verify, not fix)
- Make implementation decisions
- Skip human verification for visual/external items
- Claim PASS with known gaps
- Create vague remediation tasks
```
---
## Integration Points
### With Orchestrator
- Triggered when all phase tasks complete
- Returns verification status
- Creates remediation tasks if needed
### With Workers
- Reads SUMMARY.md files
- Remediation tasks assigned to Workers
### With Architect
- VERIFICATION.md gaps feed into re-planning
- May trigger architectural review
---
## Spawning
Orchestrator spawns Verifier:
```typescript
const verifierResult = await spawnAgent({
  type: 'verifier',
  task: 'verify-phase',
  context: {
    phase: 2,
    initiative_id: 'init-abc123',
    plan_files: ['2-1-PLAN.md', '2-2-PLAN.md', '2-3-PLAN.md'],
    summary_files: ['2-1-SUMMARY.md', '2-2-SUMMARY.md', '2-3-SUMMARY.md']
  },
  model: getModelForProfile('verifier', config.modelProfile)
});
```
---
## Example Session
```
1. Load phase context
2. Derive must-haves from phase goal
3. For each observable truth:
   a. Level 1: Check existence
   b. Level 2: Check substance
   c. Level 3: Check wiring
4. Scan for anti-patterns
5. Identify human verification needs
6. If gaps found:
   - Structure for planner
   - Create remediation tasks
   - Return GAPS_FOUND
7. If no gaps:
   - Run UAT with user
   - Record results
   - If issues, create fix tasks
   - If pass, mark phase complete
8. Create VERIFICATION.md and UAT.md
9. Return to orchestrator
```

348
docs/agents/worker.md Normal file

@@ -0,0 +1,348 @@
# Worker Agent
Workers execute tasks. They follow plans precisely while handling deviations according to defined rules.
## Role Summary
| Aspect | Value |
|--------|-------|
| **Purpose** | Execute tasks from PLAN.md files |
| **Model** | Opus (quality), Sonnet (balanced/budget) |
| **Context Budget** | 50% per task, fresh context per task |
| **Output** | Code changes, commits, SUMMARY.md |
| **Does NOT** | Plan work, make architectural decisions |
---
## Agent Prompt
```
You are a Worker agent in the Codewalk multi-agent system.
Your role is to execute tasks from PLAN.md files. Follow the plan precisely, handle deviations according to the rules, and document what you do.
## Core Principle
**Execute the plan, don't replan.**
The plan contains the reasoning. Your job is implementation, not decision-making.
## Context Loading
At task start, load:
1. Current PLAN.md file
2. Files referenced in plan's @file directives
3. Prior SUMMARY.md files for this phase
4. STATE.md for current position
## Execution Loop
For each task in the plan:
```
1. Mark task in_progress (cw task update <id> --status in_progress)
2. Read task specification:
   - files: What to modify/create
   - action: What to do
   - verify: How to confirm
   - done: Acceptance criteria
3. Execute the action
4. Handle deviations (see Deviation Rules)
5. Run verify step
6. Confirm done criteria met
7. Commit changes atomically
8. Mark task closed (cw task close <id> --reason "...")
9. Move to next task
```
## Deviation Rules
When you encounter work not in the plan, apply these rules:
### Rule 1: Auto-Fix Bugs (No Permission)
- Broken code, syntax errors, runtime errors
- Logic errors, off-by-one, wrong conditions
- Security issues, injection vulnerabilities
- Type errors
**Action:** Fix immediately, document in SUMMARY.md
### Rule 2: Auto-Add Missing Critical (No Permission)
- Error handling (try/catch for external calls)
- Input validation (at API boundaries)
- Auth checks (protected routes)
- CSRF protection
**Action:** Add immediately, document in SUMMARY.md
### Rule 3: Auto-Fix Blocking (No Permission)
- Missing dependencies (npm install)
- Broken imports (wrong paths)
- Config errors (env vars, tsconfig)
- Build failures
**Action:** Fix immediately, document in SUMMARY.md
### Rule 4: ASK About Architectural (Permission Required)
- New database tables
- New services
- API contract changes
- New external dependencies
**Action:** STOP. Ask user. Document decision.
## Checkpoint Handling
### checkpoint:human-verify
You completed work, user confirms it works.
```
Execute task → Run verify → Ask user: "Can you confirm X?"
```
### checkpoint:decision
User must choose implementation direction.
```
Present options → Wait for response → Continue with choice
```
### checkpoint:human-action
Truly unavoidable manual step.
```
Explain what user needs to do → Wait for confirmation → Continue
```
## Commit Strategy
Each task gets an atomic commit:
```
{type}({phase}-{plan}): {description}
- Change detail 1
- Change detail 2
```
Types: feat, fix, test, refactor, perf, docs, style, chore
Example:
```
feat(2-3): implement refresh token rotation
- Add refresh_tokens table with family tracking
- Create POST /api/auth/refresh endpoint
- Add reuse detection with family revocation
```
### Deviation Commits
Tag deviation commits clearly:
```
fix(2-3): [Rule 1] add null check to user lookup
- User lookup could crash when user not found
- Added optional chaining
```
## Task Type Handling
### type: auto
Execute autonomously without checkpoints.
### type: tdd
Follow TDD cycle:
1. RED: Write failing test
2. GREEN: Implement to pass
3. REFACTOR: Clean up (if needed)
4. Commit test and implementation together
### type: checkpoint:*
Execute, then trigger checkpoint as specified.
## Quality Standards
### Code Quality
- Follow existing patterns in codebase
- TypeScript strict mode
- No `any` types unless absolutely necessary
- Meaningful variable names
- Error handling at boundaries
### What NOT to Do
- Add features beyond the task
- Refactor surrounding code
- Add comments to unchanged code
- Create abstractions for one-time operations
- Design for hypothetical futures
### Anti-Patterns to Avoid
- `// TODO` comments
- `throw new Error('Not implemented')`
- `return null` placeholders
- `console.log` in production code
- Empty catch blocks
- Hardcoded values that should be config
## SUMMARY.md Creation
After plan completion, create SUMMARY.md:
```yaml
---
phase: 2
plan: 3
subsystem: auth
tags: [jwt, security]
requires: [users_table, jose]
provides: [refresh_tokens, token_rotation]
affects: [auth_flow, sessions]
tech_stack: [jose, drizzle, sqlite]
key_files:
  - src/api/auth/refresh.ts: "Rotation endpoint"
decisions:
  - "Token family for reuse detection"
metrics:
  tasks_completed: 3
  deviations: 2
  context_usage: "38%"
---
# Summary
## What Was Built
[Description of what was implemented]
## Implementation Notes
[Technical details worth preserving]
## Deviations
[List all Rule 1-4 deviations with details]
## Commits
[List of commits created]
## Verification Status
[Checklist from plan with status]
## Notes for Next Plan
[Context for future work]
```
## State Updates
### On Task Start
```
position:
  task: "current task name"
  status: in_progress
```
### On Task Complete
```
progress:
  current_phase_completed: N+1
```
### On Plan Complete
```
sessions:
  - completed: ["Phase X, Plan Y"]
```
## Error Recovery
### Task Fails Verification
1. Analyze failure
2. If fixable → fix and re-verify
3. If not fixable → mark blocked, document issue
4. Continue to next task if independent
### Context Limit Approaching
1. Complete current task
2. Update STATE.md with position
3. Create handoff with resume context
4. Exit cleanly for fresh session
### Unexpected Blocker
1. Document blocker in STATE.md
2. Check if other tasks can proceed
3. If all blocked → escalate to orchestrator
4. If some unblocked → continue with those
## Session End
Before ending session:
1. Commit any uncommitted work
2. Create SUMMARY.md if plan complete
3. Update STATE.md with position
4. Set next_action for resume
## What You Do NOT Do
- Make architectural decisions (Rule 4 → ask)
- Replan work (follow the plan)
- Add unrequested features
- Skip verify steps
- Leave uncommitted changes
```
---
## Integration Points
### With Tasks Module
- Claims tasks via `cw task update --status in_progress`
- Closes tasks via `cw task close --reason "..."`
- Respects dependencies (only works on ready tasks)
### With Orchestrator
- Receives task assignments
- Reports completion/blockers
- Triggers handoff when context full
### With Architect
- Consumes PLAN.md files
- Produces SUMMARY.md feedback
### With Verifier
- SUMMARY.md feeds verification
- Verification results may spawn fix tasks
---
## Spawning
Orchestrator spawns Worker:
```typescript
const workerResult = await spawnAgent({
  type: 'worker',
  task: 'execute-plan',
  context: {
    plan_file: '2-3-PLAN.md',
    state_file: 'STATE.md',
    prior_summaries: ['2-1-SUMMARY.md', '2-2-SUMMARY.md']
  },
  model: getModelForProfile('worker', config.modelProfile),
  worktree: 'worker-abc-123' // Isolated git worktree
});
```
---
## Example Session
```
1. Load PLAN.md
2. Load prior context (STATE.md, SUMMARY files)
3. For each task:
   a. Mark in_progress
   b. Read files
   c. Execute action
   d. Handle deviations (Rules 1-4)
   e. Run verify
   f. Commit atomically
   g. Mark closed
4. Create SUMMARY.md
5. Update STATE.md
6. Return to orchestrator
```

218
docs/context-engineering.md Normal file

@@ -0,0 +1,218 @@
# Context Engineering
Context engineering is a first-class concern in Codewalk. Agent output quality degrades predictably as context fills. This document defines the rules that all agents must follow.
## Quality Degradation Curve
Claude's output quality follows a predictable curve based on context utilization:
| Context Usage | Quality Level | Behavior |
|---------------|---------------|----------|
| 0-30% | **PEAK** | Thorough, comprehensive, considers edge cases |
| 30-50% | **GOOD** | Confident, solid work, reliable output |
| 50-70% | **DEGRADING** | Efficiency mode begins, shortcuts appear |
| 70%+ | **POOR** | Rushed, minimal, misses requirements |
**Rule: Stay UNDER 50% context for quality work.**
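The bands in the table can be expressed as a small lookup. This is a sketch only: the function names are hypothetical, and the boundary handling (exactly 30% counts as GOOD, 50% as DEGRADING, 70% as POOR) is an assumption, since the table's ranges share their endpoints.

```typescript
type QualityLevel = 'PEAK' | 'GOOD' | 'DEGRADING' | 'POOR';

// Maps context utilization (0-100) to the quality band from the table.
// Assumption: shared endpoints fall into the lower-quality band.
function qualityLevel(contextPercent: number): QualityLevel {
  if (contextPercent < 0 || contextPercent > 100) {
    throw new RangeError(`context usage out of range: ${contextPercent}`);
  }
  if (contextPercent < 30) return 'PEAK';
  if (contextPercent < 50) return 'GOOD';
  if (contextPercent < 70) return 'DEGRADING';
  return 'POOR';
}

// The "stay UNDER 50%" rule as a guard an agent could check before taking on more work.
function withinQualityBudget(contextPercent: number): boolean {
  return contextPercent < 50;
}
```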
---
## Orchestrator Pattern
Codewalk uses thin orchestration with heavy subagent work:
```
┌─────────────────────────────────────────────────────────────┐
│                    Orchestrator (30-40%)                    │
│  - Routes work to specialized agents                        │
│  - Collects results                                         │
│  - Maintains state                                          │
│  - Coordinates across phases                                │
└─────────────────────────────────────────────────────────────┘
            ┌──────────────────┼──────────────────┐
            ▼                  ▼                  ▼
     ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
     │   Worker    │    │  Architect  │    │  Verifier   │
     │ (200k ctx)  │    │ (200k ctx)  │    │ (200k ctx)  │
     │ Fresh per   │    │ Fresh per   │    │ Fresh per   │
     │ task        │    │ initiative  │    │ phase       │
     └─────────────┘    └─────────────┘    └─────────────┘
```
**Key insight:** Each subagent gets a fresh 200k context window. Heavy work happens there, not in the orchestrator.
---
## Context Budgets by Role
### Orchestrator
- **Target:** 30-40% max
- **Strategy:** Route, don't process. Collect results, don't analyze.
- **Reset trigger:** Context exceeds 50%
### Worker
- **Target:** 50% per task
- **Strategy:** Single task per context. Fresh context for each task.
- **Reset trigger:** Task completion (always)
### Architect
- **Target:** 60% per initiative analysis
- **Strategy:** Initiative discussion + planning in single context
- **Reset trigger:** Work plan generated or context exceeds 70%
### Verifier
- **Target:** 40% per phase verification
- **Strategy:** Goal-backward verification, gap identification
- **Reset trigger:** Verification complete
---
## Task Sizing Rules
Tasks are sized to fit context budgets:
| Task Complexity | Context Estimate | Example |
|-----------------|------------------|---------|
| Simple | 10-20% | Add a field to an existing form |
| Medium | 20-35% | Create new API endpoint with validation |
| Complex | 35-50% | Implement auth flow with refresh tokens |
| Too Large | >50% | **SPLIT INTO SUBTASKS** |
**Planning rule:** No single task should require >50% context. If estimation suggests otherwise, decompose before execution.
---
## Plan Sizing
Plans group 2-3 related tasks for sequential execution:
| Plan Size | Target Context | Notes |
|-----------|----------------|-------|
| Minimal (1 task) | 20-30% | Simple independent work |
| Standard (2-3 tasks) | 40-50% | Related work, shared context |
| Maximum | 50% | Never exceed—quality degrades |
**Why 2-3 tasks?** Shared context reduces overhead (file reads, understanding). More than 3 loses quality benefits.
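One way to apply both limits (at most 3 tasks, at most 50% context per plan) is a greedy, order-preserving grouping. This is an illustrative sketch rather than Codewalk's planner: `TaskEstimate` and `groupIntoPlans` are hypothetical names, and the sketch ignores dependencies and whether tasks are actually related.

```typescript
interface TaskEstimate {
  id: string;
  contextPercent: number; // estimated context cost of the task, in percent
}

const MAX_TASKS_PER_PLAN = 3;
const MAX_PLAN_CONTEXT = 50;

// Walks tasks in order and starts a new plan whenever adding the next task
// would exceed either the task-count limit or the context limit.
function groupIntoPlans(tasks: TaskEstimate[]): TaskEstimate[][] {
  const plans: TaskEstimate[][] = [];
  let current: TaskEstimate[] = [];
  let used = 0;

  for (const task of tasks) {
    if (task.contextPercent > MAX_PLAN_CONTEXT) {
      // Mirrors the planning rule: oversized tasks are split before execution.
      throw new Error(`Task ${task.id} exceeds ${MAX_PLAN_CONTEXT}% and must be split first`);
    }
    const full =
      current.length === MAX_TASKS_PER_PLAN ||
      used + task.contextPercent > MAX_PLAN_CONTEXT;
    if (full && current.length > 0) {
      plans.push(current);
      current = [];
      used = 0;
    }
    current.push(task);
    used += task.contextPercent;
  }
  if (current.length > 0) plans.push(current);
  return plans;
}
```

A real planner would also keep tasks from different waves in separate plans; that is left out here to keep the sizing logic visible.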
---
## Wave-Based Parallelization
Compute dependency graph and assign tasks to waves:
```
Wave 0: Tasks with no dependencies (run in parallel)
Wave 1: Tasks depending only on Wave 0 (run in parallel)
Wave 2: Tasks depending only on Wave 0-1 (run in parallel)
...continue until all tasks assigned
```
**Benefits:**
- Maximum parallelization
- Clear progress tracking
- Natural checkpoints between waves
### Computation Algorithm
```
1. Build dependency graph from task dependencies
2. Find all tasks with no unresolved dependencies → Wave 0
3. Mark Wave 0 as "resolved"
4. Find all tasks whose dependencies are all resolved → Wave 1
5. Repeat until all tasks assigned
```
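The algorithm above can be sketched as a repeated "find everything that is ready" pass. The `PlannedTask` shape and the `computeWaves` name are assumptions for illustration. The cycle and unknown-dependency checks are additions that the steps above do not mention, included so that the loop always terminates.

```typescript
interface PlannedTask {
  id: string;
  dependsOn: string[]; // ids of tasks that must finish first
}

// Returns task ids grouped by wave: waves[0] is Wave 0, waves[1] is Wave 1, ...
function computeWaves(tasks: PlannedTask[]): string[][] {
  const known = new Set(tasks.map((t) => t.id));
  for (const t of tasks) {
    for (const dep of t.dependsOn) {
      if (!known.has(dep)) {
        throw new Error(`Task ${t.id} depends on unknown task ${dep}`);
      }
    }
  }

  const resolved = new Set<string>();
  const waves: string[][] = [];
  let remaining = tasks;

  while (remaining.length > 0) {
    // A task is ready when every dependency sits in an earlier wave.
    const ready = remaining.filter((t) => t.dependsOn.every((d) => resolved.has(d)));
    if (ready.length === 0) {
      throw new Error('Dependency cycle among: ' + remaining.map((t) => t.id).join(', '));
    }
    waves.push(ready.map((t) => t.id));
    for (const t of ready) resolved.add(t.id);
    remaining = remaining.filter((t) => !resolved.has(t.id));
  }
  return waves;
}
```

Tasks inside one wave have no dependencies on each other, so each inner array can be handed to parallel workers.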
---
## Context Handoff
When context fills, perform controlled handoff:
### STATE.md Update
Before handoff, update session state:
```yaml
position:
  phase: 2
  plan: 3
  task: "Implement refresh token rotation"
  wave: 1
decisions:
  - "Using jose library for JWT (not jsonwebtoken)"
  - "Refresh tokens stored in httpOnly cookie, not localStorage"
  - "15min access token, 7day refresh token"
blockers:
  - "Waiting for user to configure OAuth credentials"
next_action: "Continue with task after blocker resolved"
```
### Handoff Content
New session receives:
- STATE.md (current position)
- Relevant SUMMARY.md files (prior work in this phase)
- Current PLAN.md (if executing)
- Task context from initiative
---
## Anti-Patterns
### Context Stuffing
**Wrong:** Loading entire codebase at session start
**Right:** Load files on-demand as tasks require them
### Orchestrator Processing
**Wrong:** Orchestrator reads all code and makes decisions
**Right:** Orchestrator routes to specialized agents who do the work
### Plan Bloat
**Wrong:** 10-task plans to "reduce coordination overhead"
**Right:** 2-3 task plans that fit in 50% context
### No Handoff State
**Wrong:** Agent restarts with no memory of prior work
**Right:** STATE.md preserves position, decisions, blockers
---
## Monitoring
Track context utilization across the system:
| Metric | Threshold | Action |
|--------|-----------|--------|
| Orchestrator context | >50% | Trigger handoff |
| Worker task context | >60% | Flag task as oversized |
| Plan total estimate | >50% | Split plan before execution |
| Average task context | >40% | Review decomposition strategy |
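The thresholds in this table translate directly into a check that returns the actions to take. The metric field names and action labels below are hypothetical; only the threshold values come from the table.

```typescript
type MonitorAction =
  | 'TRIGGER_HANDOFF'
  | 'FLAG_TASK_OVERSIZED'
  | 'SPLIT_PLAN'
  | 'REVIEW_DECOMPOSITION';

// All values are percentages of the context window.
interface ContextMetrics {
  orchestratorContext: number;
  workerTaskContext: number;
  planTotalEstimate: number;
  averageTaskContext: number;
}

// Returns every action whose threshold is exceeded, in table order.
function requiredActions(m: ContextMetrics): MonitorAction[] {
  const actions: MonitorAction[] = [];
  if (m.orchestratorContext > 50) actions.push('TRIGGER_HANDOFF');
  if (m.workerTaskContext > 60) actions.push('FLAG_TASK_OVERSIZED');
  if (m.planTotalEstimate > 50) actions.push('SPLIT_PLAN');
  if (m.averageTaskContext > 40) actions.push('REVIEW_DECOMPOSITION');
  return actions;
}
```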
---
## Implementation Notes
### Context Estimation
Estimate context usage before execution:
- File reads: ~1-2% per file (varies by size)
- Code changes: ~0.5% per change
- Tool outputs: ~1% per tool call
- Discussion: ~2-5% per exchange
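As a rough worked example, the per-item costs above can be summed into a single estimate. This sketch assumes the midpoint of each stated range (1.5% per file read, 3.5% per discussion exchange); the midpoints, the `PlannedActivity` shape, and the function name are illustrative choices, not values fixed by this document.

```typescript
interface PlannedActivity {
  fileReads: number;
  codeChanges: number;
  toolCalls: number;
  discussionExchanges: number;
}

// Percent of the context window per unit of activity.
// Assumption: ranges from the list above are collapsed to their midpoints.
const COST_PERCENT = {
  fileRead: 1.5,
  codeChange: 0.5,
  toolCall: 1,
  discussionExchange: 3.5,
} as const;

function estimateContextPercent(a: PlannedActivity): number {
  return (
    a.fileReads * COST_PERCENT.fileRead +
    a.codeChanges * COST_PERCENT.codeChange +
    a.toolCalls * COST_PERCENT.toolCall +
    a.discussionExchanges * COST_PERCENT.discussionExchange
  );
}
```

For a task with 8 file reads, 10 code changes, 12 tool calls, and 2 discussion exchanges this gives 12 + 5 + 12 + 7 = 36%, which would sit in the "Complex" band of the task sizing table.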
### Fresh Context Triggers
- Worker: Always fresh per task
- Architect: Fresh per initiative
- Verifier: Fresh per phase
- Orchestrator: Handoff at 50%
### Subagent Spawning
When spawning subagents:
1. Provide focused context (only what's needed)
2. Clear instructions (specific task, expected output)
3. Collect structured results
4. Update state with outcomes


@@ -0,0 +1,50 @@
# Database Migrations
This project uses [drizzle-kit](https://orm.drizzle.team/kit-docs/overview) for database schema management and migrations.
## Overview
- **Schema definition:** `src/db/schema.ts` (drizzle-orm table definitions)
- **Migration output:** `drizzle/` directory (SQL files + meta journal)
- **Config:** `drizzle.config.ts`
- **Runtime migrator:** `src/db/ensure-schema.ts` (calls `drizzle-orm/better-sqlite3/migrator`)
## How It Works
On every server startup, `ensureSchema(db)` runs all pending migrations from the `drizzle/` folder. Drizzle tracks applied migrations in a `__drizzle_migrations` table so only new migrations are applied. This is safe to call repeatedly.
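Why repeated calls are safe can be seen in a toy model of the bookkeeping. This is a conceptual sketch only: it does not use drizzle-orm and does not reproduce how Drizzle tracks migrations internally. `ModelDb`, `Migration`, and `applyPending` are hypothetical names.

```typescript
interface Migration {
  tag: string; // hypothetical migration name, e.g. "0001_add_users"
  sql: string;
}

interface ModelDb {
  appliedTags: string[]; // stands in for the __drizzle_migrations table
  executedSql: string[]; // statements "run" against the database
}

// Applies only migrations that are not yet recorded; returns how many ran.
function applyPending(db: ModelDb, migrations: Migration[]): number {
  let appliedNow = 0;
  for (const migration of migrations) {
    if (db.appliedTags.includes(migration.tag)) continue; // already recorded, skip
    db.executedSql.push(migration.sql);
    db.appliedTags.push(migration.tag);
    appliedNow += 1;
  }
  return appliedNow;
}
```

Calling `applyPending` twice with the same list runs each statement once: the second call finds every tag already recorded and does nothing.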
## Workflow
### Making schema changes
1. Edit `src/db/schema.ts` with your table/column changes
2. Generate a migration:
```bash
npx drizzle-kit generate
```
3. Review the generated SQL in `drizzle/NNNN_*.sql`
4. Commit the migration file along with your schema change
### Applying migrations
Migrations are applied automatically on server startup. No manual step needed.
For tests, the same `ensureSchema()` function is called on in-memory SQLite databases in `src/db/repositories/drizzle/test-helpers.ts`.
### Checking migration status
```bash
# See what drizzle-kit would generate (dry run)
npx drizzle-kit generate --dry-run
# Open drizzle studio to inspect the database
npx drizzle-kit studio
```
## Rules
- **Never hand-write migration SQL.** Always use `drizzle-kit generate` from the schema.
- **Never use raw CREATE TABLE statements** for schema initialization. The migration system handles this.
- **Always commit migration files.** They are the source of truth for database evolution.
- **Migration files are immutable.** Once committed, never edit them. Make a new migration instead.
- **Test with `npx vitest run`** after generating migrations to verify they work with in-memory databases.

263
docs/deviation-rules.md Normal file

@@ -0,0 +1,263 @@
# Deviation Rules
During execution, agents discover work not in the original plan. These rules define how to handle deviations **automatically, without asking for permission** (except Rule 4).
## The Four Rules
### Rule 1: Auto-Fix Bugs
**No permission needed.**
Fix immediately when encountering:
- Broken code (syntax errors, runtime errors)
- Logic errors (wrong conditions, off-by-one)
- Security issues (injection vulnerabilities, exposed secrets)
- Type errors (TypeScript violations)
```yaml
deviation:
  rule: 1
  type: bug_fix
  description: "Fixed null reference in user lookup"
  location: src/services/auth.ts:45
  original_code: "user.email.toLowerCase()"
  fixed_code: "user?.email?.toLowerCase() ?? ''"
  reason: "Crashes when user not found"
```
### Rule 2: Auto-Add Missing Critical Functionality
**No permission needed.**
Add immediately when clearly required:
- Error handling (try/catch for external calls)
- Input validation (user input, API boundaries)
- Authentication checks (protected routes)
- CSRF protection
- Rate limiting (if pattern exists in codebase)
```yaml
deviation:
  rule: 2
  type: missing_critical
  description: "Added input validation to createUser"
  location: src/api/users.ts:23
  added: "Zod schema validation for email, password length"
  reason: "API accepts any input without validation"
```
### Rule 3: Auto-Fix Blocking Issues
**No permission needed.**
Fix immediately when blocking task completion:
- Missing dependencies (npm install)
- Broken imports (wrong paths, missing exports)
- Configuration errors (env vars, tsconfig)
- Build failures (compilation errors)
```yaml
deviation:
  rule: 3
  type: blocking_issue
  description: "Added missing zod dependency"
  command: "npm install zod"
  reason: "Import fails without package"
```
### Rule 4: ASK About Architectural Changes
**Permission required.**
Stop and ask user before:
- New database tables or major schema changes
- New services or major component additions
- Changes to API contracts
- New external dependencies (beyond obvious needs)
- Authentication/authorization model changes
```yaml
deviation:
  rule: 4
  type: architectural_change
  status: PENDING_APPROVAL
  description: "Considering adding Redis for session storage"
  current: "Sessions stored in SQLite"
  proposed: "Redis for distributed session storage"
  reason: "Multiple server instances need shared sessions"
  question: "Should we add Redis, or use sticky sessions instead?"
```
---
## Decision Tree
```
Encountered unexpected issue
Is it broken code?
(errors, bugs, security)
  YES ─┴─ NO
   │      │
   ▼      ▼
Rule 1    Is critical functionality missing?
Auto-fix  (validation, auth, error handling)
            YES ─┴─ NO
             │      │
             ▼      ▼
          Rule 2    Is it blocking task completion?
          Auto-add  (deps, imports, config)
                      YES ─┴─ NO
                       │      │
                       ▼      ▼
                    Rule 3    Is it architectural?
                    Auto-fix  (tables, services, contracts)
                                YES ─┴─ NO
                                 │      │
                                 ▼      ▼
                              Rule 4    Ignore or note
                              ASK       for future
```
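The decision tree can be read as an ordered chain of checks. The following is a minimal sketch of that ordering, assuming a hypothetical `Issue` shape whose boolean fields stand in for the four questions; neither the type nor the function name comes from the codebase.

```typescript
// Hypothetical description of an issue found mid-task. The field names are
// illustrative assumptions, one per question in the decision tree.
interface Issue {
  brokenCode: boolean;      // errors, bugs, security
  missingCritical: boolean; // validation, auth, error handling
  blocksTask: boolean;      // deps, imports, config
  architectural: boolean;   // tables, services, contracts
}

type Decision =
  | { rule: 1 | 2 | 3; action: 'auto' }
  | { rule: 4; action: 'ask' }
  | { rule: null; action: 'note' };

// Walks the questions in the same order as the tree: the first "yes" wins.
function classifyDeviation(issue: Issue): Decision {
  if (issue.brokenCode) return { rule: 1, action: 'auto' };
  if (issue.missingCritical) return { rule: 2, action: 'auto' };
  if (issue.blocksTask) return { rule: 3, action: 'auto' };
  if (issue.architectural) return { rule: 4, action: 'ask' };
  return { rule: null, action: 'note' };
}
```

Because the checks run top to bottom, an issue that is both a missing validation and a blocker is classified under Rule 2, matching the order of the tree.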
---
## Documentation Requirements
All deviations MUST be documented in SUMMARY.md:
```yaml
# 2-3-SUMMARY.md
phase: 2
plan: 3
deviations:
  - rule: 1
    type: bug_fix
    description: "Fixed null reference in auth service"
    location: src/services/auth.ts:45
  - rule: 2
    type: missing_critical
    description: "Added Zod validation to user API"
    location: src/api/users.ts:23-45
  - rule: 3
    type: blocking_issue
    description: "Installed missing jose dependency"
    command: "npm install jose"
  - rule: 4
    type: architectural_change
    status: APPROVED
    description: "Added refresh_tokens table"
    approved_by: user
    approved_at: 2024-01-15T10:30:00Z
```
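A check like the one below could flag Rule 4 entries that were recorded without an approval. It is only a sketch: the `DeviationEntry` type is a hypothetical in-memory form of the YAML entries above, and the function is not part of any existing tooling.

```typescript
// Hypothetical in-memory shape of one entry under `deviations:`.
interface DeviationEntry {
  rule: 1 | 2 | 3 | 4;
  type: string;
  description: string;
  status?: string;      // only expected on Rule 4 entries
  approved_by?: string; // only expected on Rule 4 entries
}

// Returns the Rule 4 entries that lack a recorded approval.
function unapprovedArchitecturalChanges(entries: DeviationEntry[]): DeviationEntry[] {
  return entries.filter(
    (e) => e.rule === 4 && (e.status !== 'APPROVED' || !e.approved_by),
  );
}
```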
---
## Deviation Tracking in Tasks
When a deviation is significant, track it explicitly:
### Minor Deviations
Log in SUMMARY.md, no separate task.
### Major Deviations (Rule 4)
Create a decision record:
```sql
INSERT INTO task_history (
task_id,
field,
old_value,
new_value,
changed_by
) VALUES (
'current-task-id',
'deviation',
NULL,
'{"rule": 4, "description": "Added Redis", "approved": true}',
'worker-123'
);
```
### Deviations That Spawn Work
If fixing a deviation requires substantial work:
1. Complete current task
2. Create new task for deviation work
3. Link new task as dependency if blocking
4. Continue with original plan
---
## Examples by Category
### Rule 1: Bug Fixes
| Issue | Fix | Documentation |
|-------|-----|---------------|
| Undefined property access | Add optional chaining | Note in summary |
| SQL injection vulnerability | Use parameterized query | Note + security flag |
| Race condition in async code | Add proper await | Note in summary |
| Incorrect error message | Fix message text | Note in summary |
### Rule 2: Missing Critical
| Gap | Addition | Documentation |
|-----|----------|---------------|
| No input validation | Add Zod/Yup schema | Note in summary |
| No error handling | Add try/catch + logging | Note in summary |
| No auth check | Add middleware | Note in summary |
| No CSRF token | Add csrf protection | Note + security flag |
### Rule 3: Blocking Issues
| Blocker | Resolution | Documentation |
|---------|------------|---------------|
| Missing npm package | npm install | Note in summary |
| Wrong import path | Fix path | Note in summary |
| Missing env var | Add to .env.example | Note in summary |
| TypeScript config issue | Fix tsconfig | Note in summary |
### Rule 4: Architectural (ASK FIRST)
| Change | Why Ask | Question Format |
|--------|---------|-----------------|
| New DB table | Schema is contract | "Need users_sessions table. Create it?" |
| New service | Architectural decision | "Extract auth to separate service?" |
| API contract change | Breaking change | "Change POST /users response format?" |
| New external dep | Maintenance burden | "Add Redis for caching?" |
---
## Integration with Verification
Deviations are inputs to verification:
1. **Verifier loads SUMMARY.md** with deviation list
2. **Bug fixes (Rule 1):** verify that the fix doesn't break tests
3. **Critical additions (Rule 2):** verify that they're properly integrated
4. **Blocking fixes (Rule 3):** verify that build and tests pass
5. **Architectural changes (Rule 4):** verify that they match the approved design
---
## Escalation Path
If unsure which rule applies:
1. **Default to Rule 4** (ask) rather than making a wrong assumption
2. Document uncertainty in deviation notes
3. Include reasoning for why you're asking
```yaml
deviation:
  rule: 4
  type: uncertain
  description: "Adding caching layer to API responses"
  reason: "Could be Rule 2 (performance is critical) or Rule 4 (new infrastructure)"
  question: "Is Redis caching appropriate here, or should we use in-memory?"
```

434
docs/execution-artifacts.md Normal file

@@ -0,0 +1,434 @@
# Execution Artifacts
Execution produces artifacts that document what happened, enable debugging, and provide context for future work.
## Artifact Types
| Artifact | Created By | Purpose |
|----------|------------|---------|
| PLAN.md | Architect | Executable instructions for a plan |
| SUMMARY.md | Worker | Record of what actually happened |
| VERIFICATION.md | Verifier | Goal-backward verification results |
| UAT.md | Verifier + User | User acceptance testing results |
| STATE.md | All agents | Session state (see [session-state.md](session-state.md)) |
---
## PLAN.md
Plans are **executable prompts**: they are handed to the worker as written, not converted into prompts by a later step.
### Structure
```yaml
---
# Frontmatter
phase: 2
plan: 3
type: execute # execute | tdd
wave: 1
depends_on: [2-2-PLAN]
files_modified:
- src/api/auth/refresh.ts
- src/middleware/auth.ts
- db/migrations/002_refresh_tokens.sql
autonomous: true # false if checkpoints required
must_haves:
  observable_truths:
    - "Refresh token extends session"
    - "Old token invalidated after rotation"
  required_artifacts:
    - src/api/auth/refresh.ts
  required_wiring:
    - "refresh endpoint -> token storage"
user_setup: [] # Human prereqs if any
---
# Phase 2, Plan 3: Refresh Token Rotation
## Objective
Implement refresh token rotation to extend user sessions securely while preventing token reuse attacks.
## Context
@file: PROJECT.md (project overview)
@file: 2-CONTEXT.md (phase decisions)
@file: 2-1-SUMMARY.md (prior work)
@file: 2-2-SUMMARY.md (prior work)
## Tasks
### Task 1: Create refresh_tokens table
- **type:** auto
- **files:** db/migrations/002_refresh_tokens.sql, src/db/schema/refreshTokens.ts
- **action:** Create table with: id (uuid), user_id (fk), token_hash (sha256), family (uuid for rotation tracking), expires_at, created_at, revoked_at. Index on token_hash and user_id.
- **verify:** `cw db migrate` succeeds, schema matches
- **done:** Migration applies, drizzle schema matches SQL
### Task 2: Implement rotation endpoint
- **type:** auto
- **files:** src/api/auth/refresh.ts
- **action:** POST /api/auth/refresh accepts refresh token in httpOnly cookie. Validate token exists and not expired. Generate new access + refresh tokens. Store new refresh, revoke old. Set cookies. Return 200 with new access token.
- **verify:** curl with valid refresh cookie returns new tokens
- **done:** Rotation works, old token invalidated
### Task 3: Add token family validation
- **type:** auto
- **files:** src/api/auth/refresh.ts
- **action:** If revoked token reused, revoke entire family (reuse detection). Log security event.
- **verify:** Reusing old token revokes all tokens in family
- **done:** Reuse detection active
## Verification Criteria
- [ ] New refresh token issued on rotation
- [ ] Old refresh token no longer valid
- [ ] Reused token triggers family revocation
- [ ] Access token returned in response
- [ ] Cookies set with correct flags (httpOnly, secure, sameSite)
## Success Criteria
- All tasks complete with passing verify steps
- No TypeScript errors
- Tests cover happy path and reuse detection
```
### Key Elements
| Element | Purpose |
|---------|---------|
| `type: execute\|tdd` | Execution strategy |
| `wave` | Parallelization grouping |
| `depends_on` | Must complete first |
| `files_modified` | Git tracking, conflict detection |
| `autonomous` | Can run without checkpoints |
| `must_haves` | Verification criteria |
| `@file` references | Context to load |
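One way `files_modified` could support conflict detection is to look for files claimed by more than one plan in the same wave. This sketch assumes that plans sharing a `wave` may run in parallel, which the table implies but does not spell out; `PlanMeta` and `findWaveConflicts` are hypothetical names.

```typescript
// Hypothetical subset of the PLAN.md frontmatter, already parsed.
interface PlanMeta {
  id: string;              // e.g. "2-3"
  wave: number;
  filesModified: string[];
}

interface WaveConflict {
  wave: number;
  file: string;
  plans: string[];         // ids of the plans that touch the file
}

// Groups plans by (wave, file) and reports every group with more than one plan.
function findWaveConflicts(plans: PlanMeta[]): WaveConflict[] {
  const groups = new Map<string, WaveConflict>();
  for (const plan of plans) {
    for (const file of plan.filesModified) {
      const key = `${plan.wave}|${file}`;
      const group = groups.get(key) ?? { wave: plan.wave, file, plans: [] };
      group.plans.push(plan.id);
      groups.set(key, group);
    }
  }
  return Array.from(groups.values()).filter((g) => g.plans.length > 1);
}
```

Plans in different waves that touch the same file are not reported, since under the stated assumption they would not run at the same time.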
---
## SUMMARY.md
Created after plan execution. Documents what **actually happened**.
### Structure
```yaml
---
phase: 2
plan: 3
subsystem: auth
tags: [jwt, security, tokens]
requires:
- users table
- jose library
provides:
- refresh token rotation
- reuse detection
affects:
- auth flow
- session management
tech_stack:
- jose (JWT)
- drizzle (ORM)
- sqlite
key_files:
- src/api/auth/refresh.ts: "Rotation endpoint"
- src/db/schema/refreshTokens.ts: "Token storage"
decisions:
- "Token family for reuse detection"
- "SHA256 hash for token storage"
metrics:
  tasks_completed: 3
  tasks_total: 3
  deviations: 2
  execution_time: "45 minutes"
  context_usage: "38%"
---
# Phase 2, Plan 3 Summary: Refresh Token Rotation
## What Was Built
Implemented refresh token rotation with security features:
- Rotation endpoint at POST /api/auth/refresh
- Token storage with family tracking
- Reuse detection that revokes entire token family
## Implementation Notes
### Token Storage
Tokens stored as SHA256 hashes (never plaintext). Family UUID links related tokens for rotation tracking.
### Rotation Flow
1. Receive refresh token in cookie
2. Hash and lookup in database
3. Verify not expired, not revoked
4. Generate new access + refresh tokens
5. Store new refresh with same family
6. Revoke old refresh token
7. Set new cookies, return access token
### Reuse Detection
If a revoked token is presented, the entire family is revoked. This catches scenarios where an attacker captured an old token.
## Deviations
### Rule 2: Added rate limiting
```yaml
deviation:
  rule: 2
  type: missing_critical
  description: "Added rate limiting to refresh endpoint"
  location: src/api/auth/refresh.ts:12
  reason: "Prevent brute force token guessing"
```
### Rule 1: Fixed async handler
```yaml
deviation:
  rule: 1
  type: bug_fix
  description: "Added await to database query"
  location: src/api/auth/refresh.ts:34
  reason: "Query returned promise, not result"
```
## Commits
- `feat(2-3): create refresh_tokens table and schema`
- `feat(2-3): implement token rotation endpoint`
- `feat(2-3): add token family reuse detection`
- `fix(2-3): add await to token lookup query`
- `feat(2-3): add rate limiting to refresh endpoint`
## Verification Status
- [x] New refresh token issued on rotation
- [x] Old refresh token invalidated
- [x] Reuse detection works
- [x] Cookies set correctly
- [ ] **Pending human verification:** Cookie flags in production
## Notes for Next Plan
- Rate limiting added; may need tuning based on load
- Token family approach may need cleanup job for old families
```
### What to Include
| Section | Content |
|---------|---------|
| Frontmatter | Metadata for future queries |
| What Was Built | High-level summary |
| Implementation Notes | Technical details worth preserving |
| Deviations | All deviations (Rules 1-4) with details |
| Commits | Git commit messages created |
| Verification Status | What passed, what's pending |
| Notes for Next Plan | Context for future work |
---
## VERIFICATION.md
Created by Verifier after phase completion.
### Structure
```yaml
---
phase: 2
status: PASS # PASS | GAPS_FOUND
verified_at: 2024-01-15T10:30:00Z
verified_by: verifier-agent
---
# Phase 2 Verification: JWT Implementation
## Observable Truths
| Truth | Status | Evidence |
|-------|--------|----------|
| User can log in with email/password | VERIFIED | Login endpoint returns tokens, sets cookies |
| Sessions persist across page refresh | VERIFIED | Cookie-based token survives reload |
| Token refresh extends session | VERIFIED | Refresh endpoint issues new tokens |
| Expired tokens rejected | VERIFIED | 401 returned for expired access token |
## Required Artifacts
| Artifact | Status | Check |
|----------|--------|-------|
| src/api/auth/login.ts | EXISTS | Exports login handler |
| src/api/auth/refresh.ts | EXISTS | Exports refresh handler |
| src/middleware/auth.ts | EXISTS | Exports auth middleware |
| db/migrations/002_refresh_tokens.sql | EXISTS | Creates table |
## Required Wiring
| From | To | Status | Evidence |
|------|-----|--------|----------|
| Login handler | Token generation | WIRED | login.ts:45 calls createTokens |
| Auth middleware | Token validation | WIRED | auth.ts:23 calls verifyToken |
| Refresh handler | Token rotation | WIRED | refresh.ts:67 calls rotateToken |
| Protected routes | Auth middleware | WIRED | routes.ts uses auth middleware |
## Anti-Patterns
| Pattern | Found | Location |
|---------|-------|----------|
| TODO comments | NO | - |
| Stub implementations | NO | - |
| Console.log in handlers | YES | src/api/auth/login.ts:34 (debug log) |
| Empty catch blocks | NO | - |
## Human Verification Needed
| Check | Reason |
|-------|--------|
| Cookie flags in production | Requires deployed environment |
| Token timing accuracy | Requires wall-clock testing |
## Gaps Found
None blocking. One console.log should be removed before production.
## Remediation
- Task created: "Remove debug console.log from login handler"
```
---
## UAT.md
User Acceptance Testing results.
### Structure
```yaml
---
phase: 2
tested_by: user
tested_at: 2024-01-15T14:00:00Z
status: PASS # PASS | ISSUES_FOUND
---
# Phase 2 UAT: JWT Implementation
## Test Cases
### 1. Login with email and password
**Prompt:** "Can you log in with your email and password?"
**Result:** PASS
**Notes:** Login successful, redirected to dashboard
### 2. Session persists on refresh
**Prompt:** "Refresh the page. Are you still logged in?"
**Result:** PASS
**Notes:** Still authenticated after refresh
### 3. Logout clears session
**Prompt:** "Click logout. Can you access the dashboard?"
**Result:** PASS
**Notes:** Redirected to login page
### 4. Expired session prompts re-login
**Prompt:** "Wait 15 minutes (or we can simulate). Does the session refresh?"
**Result:** SKIPPED
**Reason:** "User chose to trust token rotation implementation"
## Issues Found
None.
## Sign-Off
User confirms Phase 2 JWT Implementation meets requirements.
Next: Proceed to Phase 3 (OAuth Integration)
```
---
## Artifact Storage
### File Structure
```
.planning/
├── phases/
│   ├── 1/
│   │   ├── 1-CONTEXT.md
│   │   ├── 1-1-PLAN.md
│   │   ├── 1-1-SUMMARY.md
│   │   ├── 1-2-PLAN.md
│   │   ├── 1-2-SUMMARY.md
│   │   └── 1-VERIFICATION.md
│   └── 2/
│       ├── 2-CONTEXT.md
│       ├── 2-1-PLAN.md
│       ├── 2-1-SUMMARY.md
│       ├── 2-2-PLAN.md
│       ├── 2-2-SUMMARY.md
│       ├── 2-3-PLAN.md
│       ├── 2-3-SUMMARY.md
│       ├── 2-VERIFICATION.md
│       └── 2-UAT.md
├── STATE.md
└── config.json
```
### Naming Convention
| Pattern | Meaning |
|---------|---------|
| `{phase}-CONTEXT.md` | Discussion decisions for phase |
| `{phase}-{plan}-PLAN.md` | Executable plan |
| `{phase}-{plan}-SUMMARY.md` | Execution record |
| `{phase}-VERIFICATION.md` | Phase verification |
| `{phase}-UAT.md` | User acceptance testing |
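The naming patterns and directory layout above can be captured in a small helper. This is a sketch only; `artifactFileName` and `artifactPath` are hypothetical names, and the `.planning/phases/{phase}/` prefix is taken from the file structure shown in this section.

```typescript
type ArtifactKind = 'CONTEXT' | 'PLAN' | 'SUMMARY' | 'VERIFICATION' | 'UAT';

// PLAN and SUMMARY are per-plan artifacts; the rest are per-phase.
function artifactFileName(kind: ArtifactKind, phase: number, plan?: number): string {
  const perPlan = kind === 'PLAN' || kind === 'SUMMARY';
  if (perPlan && plan === undefined) {
    throw new Error(`${kind} requires a plan number`);
  }
  return perPlan ? `${phase}-${plan}-${kind}.md` : `${phase}-${kind}.md`;
}

// Full path relative to the repository root.
function artifactPath(kind: ArtifactKind, phase: number, plan?: number): string {
  return `.planning/phases/${phase}/${artifactFileName(kind, phase, plan)}`;
}
```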
---
## Commit Strategy
Each task produces an atomic commit:
```
{type}({phase}-{plan}): {description}
- Detail 1
- Detail 2
```
### Types
- `feat`: New functionality
- `fix`: Bug fix
- `test`: Test additions
- `refactor`: Code restructuring
- `perf`: Performance improvement
- `docs`: Documentation
- `style`: Formatting only
- `chore`: Maintenance
### Examples
```
feat(2-3): implement refresh token rotation
- Add refresh_tokens table with family tracking
- Implement rotation endpoint at POST /api/auth/refresh
- Add reuse detection with family revocation
fix(2-3): add await to token lookup query
- Token lookup was returning promise instead of result
- Added proper await in refresh handler
feat(2-3): add rate limiting to refresh endpoint
- [Deviation Rule 2] Added express-rate-limit
- 10 requests per minute per IP
- Prevents brute force token guessing
```
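A formatter for this commit convention might look like the sketch below. The function name is hypothetical, and separating the header from the detail bullets with a blank line is an assumption based on common git practice rather than something the format above specifies.

```typescript
type CommitType =
  | 'feat' | 'fix' | 'test' | 'refactor'
  | 'perf' | 'docs' | 'style' | 'chore';

// Builds "{type}({phase}-{plan}): {description}" followed by optional bullets.
function formatCommit(
  type: CommitType,
  phase: number,
  plan: number,
  description: string,
  details: string[] = [],
): string {
  const header = `${type}(${phase}-${plan}): ${description}`;
  if (details.length === 0) return header;
  // Assumption: a blank line separates the header from the body.
  return [header, '', ...details.map((d) => `- ${d}`)].join('\n');
}
```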
### Metadata Commit
After plan completion:
```
chore(2-3): complete plan execution
Artifacts:
- 2-3-SUMMARY.md created
- STATE.md updated
- 3 tasks completed, 2 deviations handled
```

520
docs/initiatives.md Normal file

@@ -0,0 +1,520 @@
# Initiatives Module
Initiatives are the planning layer for larger features. They provide a Notion-like document hierarchy for capturing context, decisions, and requirements before work begins. Once approved, initiatives generate phased task plans that agents execute.
## Design Philosophy
### Why Initiatives?
Tasks are atomic work units—great for execution but too granular for planning. Initiatives bridge the gap:
- **Before approval**: A living document where user and Architect refine the vision
- **After approval**: A persistent knowledge base that tasks link back to
- **Forever**: Context for future work ("why did we build it this way?")
### Notion-Like Structure
Initiatives aren't flat documents. They're hierarchical pages:
```
Initiative: User Authentication
├── User Journeys
│   ├── Sign Up Flow
│   └── Password Reset Flow
├── Business Rules
│   └── Password Requirements
├── Technical Concept
│   ├── JWT Strategy
│   └── Session Management
└── Architectural Changes
    └── Auth Middleware
```
Each "page" is a record in SQLite with parent-child relationships. This enables:
- Structured queries: "Give me all subpages of initiative X"
- Inventory views: "List all technical concepts across initiatives"
- Cross-references: Link between pages
---
## Data Model
### Initiative Entity
| Field | Type | Description |
|-------|------|-------------|
| `id` | TEXT | Primary key (e.g., `init-a1b2c3`) |
| `project_id` | TEXT | Scopes to a project (most initiatives are single-project) |
| `title` | TEXT | Initiative name |
| `status` | TEXT | `draft`, `review`, `approved`, `in_progress`, `completed`, `rejected` |
| `created_by` | TEXT | User who created it |
| `created_at` | INTEGER | Unix timestamp |
| `updated_at` | INTEGER | Unix timestamp |
| `approved_at` | INTEGER | When approved (null if not approved) |
| `approved_by` | TEXT | Who approved it |
### Initiative Page Entity
| Field | Type | Description |
|-------|------|-------------|
| `id` | TEXT | Primary key (e.g., `page-x1y2z3`) |
| `initiative_id` | TEXT | Parent initiative |
| `parent_page_id` | TEXT | Parent page (null for root-level pages) |
| `type` | TEXT | `user_journey`, `business_rule`, `technical_concept`, `architectural_change`, `note`, `custom` |
| `title` | TEXT | Page title |
| `content` | TEXT | Markdown content |
| `sort_order` | INTEGER | Display order among siblings |
| `created_at` | INTEGER | Unix timestamp |
| `updated_at` | INTEGER | Unix timestamp |
### Initiative Phase Entity
Phases group tasks for staged execution and rolling approval.
| Field | Type | Description |
|-------|------|-------------|
| `id` | TEXT | Primary key (e.g., `phase-p1q2r3`) |
| `initiative_id` | TEXT | Parent initiative |
| `number` | INTEGER | Phase number (1, 2, 3...) |
| `name` | TEXT | Phase name |
| `description` | TEXT | What this phase delivers |
| `status` | TEXT | `draft`, `pending_approval`, `approved`, `in_progress`, `completed` |
| `approved_at` | INTEGER | When approved |
| `approved_by` | TEXT | Who approved |
| `created_at` | INTEGER | Unix timestamp |
### Task Link
Tasks reference their initiative and phase:
```sql
-- In tasks table (see docs/tasks.md)
initiative_id TEXT REFERENCES initiatives(id),
phase_id TEXT REFERENCES initiative_phases(id),
```
---
## SQLite Schema
```sql
CREATE TABLE initiatives (
id TEXT PRIMARY KEY,
project_id TEXT,
title TEXT NOT NULL,
status TEXT NOT NULL DEFAULT 'draft'
CHECK (status IN ('draft', 'review', 'approved', 'in_progress', 'completed', 'rejected')),
created_by TEXT,
created_at INTEGER NOT NULL DEFAULT (unixepoch()),
updated_at INTEGER NOT NULL DEFAULT (unixepoch()),
approved_at INTEGER,
approved_by TEXT
);
CREATE TABLE initiative_pages (
id TEXT PRIMARY KEY,
initiative_id TEXT NOT NULL REFERENCES initiatives(id) ON DELETE CASCADE,
parent_page_id TEXT REFERENCES initiative_pages(id) ON DELETE CASCADE,
type TEXT NOT NULL DEFAULT 'note'
CHECK (type IN ('user_journey', 'business_rule', 'technical_concept', 'architectural_change', 'note', 'custom')),
title TEXT NOT NULL,
content TEXT,
sort_order INTEGER NOT NULL DEFAULT 0,
created_at INTEGER NOT NULL DEFAULT (unixepoch()),
updated_at INTEGER NOT NULL DEFAULT (unixepoch())
);
CREATE TABLE initiative_phases (
id TEXT PRIMARY KEY,
initiative_id TEXT NOT NULL REFERENCES initiatives(id) ON DELETE CASCADE,
number INTEGER NOT NULL,
name TEXT NOT NULL,
description TEXT,
status TEXT NOT NULL DEFAULT 'draft'
CHECK (status IN ('draft', 'pending_approval', 'approved', 'in_progress', 'completed')),
approved_at INTEGER,
approved_by TEXT,
created_at INTEGER NOT NULL DEFAULT (unixepoch()),
UNIQUE(initiative_id, number)
);
CREATE INDEX idx_initiatives_project ON initiatives(project_id);
CREATE INDEX idx_initiatives_status ON initiatives(status);
CREATE INDEX idx_pages_initiative ON initiative_pages(initiative_id);
CREATE INDEX idx_pages_parent ON initiative_pages(parent_page_id);
CREATE INDEX idx_pages_type ON initiative_pages(type);
CREATE INDEX idx_phases_initiative ON initiative_phases(initiative_id);
CREATE INDEX idx_phases_status ON initiative_phases(status);
-- Useful views
CREATE VIEW initiative_page_tree AS
WITH RECURSIVE tree AS (
SELECT id, initiative_id, parent_page_id, title, type, 0 as depth,
title as path
FROM initiative_pages WHERE parent_page_id IS NULL
UNION ALL
SELECT p.id, p.initiative_id, p.parent_page_id, p.title, p.type, t.depth + 1,
t.path || ' > ' || p.title
FROM initiative_pages p
JOIN tree t ON p.parent_page_id = t.id
)
SELECT * FROM tree ORDER BY path;
```
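For readers less familiar with recursive CTEs, the same depth and path computation can be sketched in application code. The row type below is a hypothetical, trimmed-down version of `initiative_pages`, and the function is illustrative rather than an existing repository method.

```typescript
// Hypothetical subset of an initiative_pages row.
interface PageRow {
  id: string;
  parentPageId: string | null;
  title: string;
}

interface PageTreeRow extends PageRow {
  depth: number;
  path: string; // titles joined with " > ", as in the view
}

function buildPageTree(pages: PageRow[]): PageTreeRow[] {
  // Index children by parent id (null for root-level pages).
  const children = new Map<string | null, PageRow[]>();
  for (const page of pages) {
    const siblings = children.get(page.parentPageId) ?? [];
    siblings.push(page);
    children.set(page.parentPageId, siblings);
  }

  const rows: PageTreeRow[] = [];
  const walk = (parentId: string | null, depth: number, prefix: string | null): void => {
    for (const page of children.get(parentId) ?? []) {
      const path = prefix === null ? page.title : `${prefix} > ${page.title}`;
      rows.push({ ...page, depth, path });
      walk(page.id, depth + 1, path);
    }
  };
  walk(null, 0, null);

  // Mirrors the view's ORDER BY path.
  return rows.sort((a, b) => (a.path < b.path ? -1 : a.path > b.path ? 1 : 0));
}
```

Unlike the SQL view, this version assumes the rows contain no parent cycles; a page whose parent is missing from the input is simply never reached.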
---
## Status Workflow
### Initiative Status
```
[draft] ──submit──▶ [review] ──approve──▶ [approved]
   │                   │                      │
   │                   │ reject               │ start work
   │                   ▼                      ▼
   │              [rejected]            [in_progress]
   │                                          │
   │                                          │ all phases done
   └──────────────────────────────────────────▶ [completed]
```
| Status | Meaning |
|--------|---------|
| `draft` | User/Architect still refining |
| `review` | Ready for approval decision |
| `approved` | Work plan created, awaiting execution |
| `in_progress` | At least one phase executing |
| `completed` | All phases completed |
| `rejected` | Won't implement |
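The status workflow can be enforced with a transition table. The sketch below models only the labelled edges of the diagram (submit, approve, reject, start work, all phases done); the constant and function names are hypothetical.

```typescript
type InitiativeStatus =
  | 'draft' | 'review' | 'approved'
  | 'in_progress' | 'completed' | 'rejected';

// Allowed next states, read from the labelled edges of the diagram.
const INITIATIVE_TRANSITIONS: Record<InitiativeStatus, InitiativeStatus[]> = {
  draft: ['review'],                // submit
  review: ['approved', 'rejected'], // approve / reject
  approved: ['in_progress'],        // start work
  in_progress: ['completed'],       // all phases done
  completed: [],
  rejected: [],
};

// Returns the new status, or throws if the move is not allowed.
function transitionInitiative(from: InitiativeStatus, to: InitiativeStatus): InitiativeStatus {
  const allowed = INITIATIVE_TRANSITIONS[from].some((s) => s === to);
  if (!allowed) {
    throw new Error(`invalid initiative transition: ${from} -> ${to}`);
  }
  return to;
}
```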
### Phase Status
```
[draft] ──finalize──▶ [pending_approval] ──approve──▶ [approved]
                                                          │ claim tasks
                                                    [in_progress]
                                                          │ all tasks closed
                                                     [completed]
```
**Rolling approval pattern:**
1. Architect creates work plan with multiple phases
2. User approves Phase 1 → agents start executing
3. While Phase 1 executes, user reviews Phase 2
4. Phase 2 approved → agents can start when ready
5. Continue until all phases approved/completed
This prevents blocking: agents don't wait for all phases to be approved upfront.
---
## Workflow
### 1. Draft Initiative
User creates initiative with basic vision:
```
cw initiative create "User Authentication"
```
System creates initiative in `draft` status with empty page structure.
### 2. Architect Iteration (Questioning)
Architect agent engages in structured questioning to capture requirements:
**Question Categories:**
| Category | Example Questions |
|----------|-------------------|
| **Visual Features** | Layout approach? Density? Interactions? Empty states? |
| **APIs/CLIs** | Response format? Flags? Error handling? Verbosity? |
| **Data/Content** | Structure? Validation rules? Edge cases? |
| **Architecture** | Patterns to follow? What to avoid? Reference code? |
Each answer populates initiative pages. Architect may:
- Create user journey pages
- Document business rules
- Draft technical concepts
- Flag architectural impacts
See [agents/architect.md](agents/architect.md) for the full Architect agent prompt.
### 3. Discussion Phase (Per Phase)
Before planning each phase, the Architect captures implementation decisions through focused discussion. This happens BEFORE any planning work.
```
cw phase discuss <phase-id>
```
Creates `{phase}-CONTEXT.md` with locked decisions:
```yaml
---
phase: 1
discussed_at: 2024-01-15
---
# Phase 1 Context: User Authentication
## Decisions
### Authentication Method
**Decision:** Email/password with optional OAuth
**Reason:** MVP needs simple auth, OAuth for convenience
**Locked:** true
### Token Storage
**Decision:** httpOnly cookies
**Reason:** XSS protection
**Alternatives Rejected:**
- localStorage: XSS vulnerable
```
These decisions guide all subsequent planning and execution. Workers reference CONTEXT.md for implementation direction.
### 4. Research Phase (Optional)
For phases with unknowns, run discovery before planning:
| Level | When | Time | Scope |
|-------|------|------|-------|
| L0 | Pure internal work | Skip | None |
| L1 | Quick verification | 2-5 min | Confirm assumptions |
| L2 | Standard research | 15-30 min | Explore patterns |
| L3 | Deep dive | 1+ hour | Novel domain |
```
cw phase research <phase-id> --level 2
```
Creates `{phase}-RESEARCH.md` with findings that inform planning.
### 5. Submit for Review
When Architect and user are satisfied:
```
cw initiative submit <id>
```
Status changes to `review`. Triggers notification for approval.
### 6. Approve Initiative
Human reviews the complete initiative:
```
cw initiative approve <id>
```
Status changes to `approved`. The work plan can now be created.
### 7. Create Work Plan
Architect (or user) breaks initiative into phases:
```
cw initiative plan <id>
```
This creates:
- `initiative_phases` records
- Tasks linked to each phase via `initiative_id` + `phase_id`
Tasks are created in `open` status but won't be "ready" until their phase is approved.
### 8. Approve Phases (Rolling)
User reviews and approves phases one at a time:
```
cw phase approve <phase-id>
```
Approved phases make their tasks "ready" for agents. User can approve Phase 1, let agents work, then approve Phase 2 later.
### 9. Execute
Workers pull tasks via `cw task ready`. Tasks include:
- Link to initiative for context
- Link to phase for grouping
- All normal task fields (dependencies, priority, etc.)
### 10. Verify Phase
After all tasks in a phase complete, the Verifier agent runs goal-backward verification:
```
cw phase verify <phase-id>
```
Verification checks:
1. **Observable truths**: What users can observe when the goal is achieved
2. **Required artifacts**: Files that must exist (not stubs)
3. **Required wiring**: Connections that must work
4. **Anti-patterns**: TODOs, placeholders, empty returns
Creates `{phase}-VERIFICATION.md` with results. If gaps are found, creates remediation tasks.
See [verification.md](verification.md) for detailed verification patterns.
### 11. User Acceptance Testing
After technical verification passes, run UAT:
```
cw phase uat <phase-id>
```
Walks user through testable deliverables:
- "Can you log in with email and password?"
- "Does the dashboard show your projects?"
Creates `{phase}-UAT.md` with results. If issues are found, creates targeted fix plans.
### 12. Complete
When all tasks in all phases are closed AND verification passes:
- Each phase auto-transitions to `completed`
- Initiative auto-transitions to `completed`
- Domain layer updated to reflect new state
---
## Phase Artifacts
Each phase produces artifacts during execution:
| Artifact | Created By | Purpose |
|----------|------------|---------|
| `{phase}-CONTEXT.md` | Architect (Discussion) | Locked implementation decisions |
| `{phase}-RESEARCH.md` | Architect (Research) | Domain knowledge findings |
| `{phase}-{N}-PLAN.md` | Architect (Planning) | Executable task plans |
| `{phase}-{N}-SUMMARY.md` | Worker (Execution) | What actually happened |
| `{phase}-VERIFICATION.md` | Verifier | Goal-backward verification |
| `{phase}-UAT.md` | Verifier + User | User acceptance testing |
See [execution-artifacts.md](execution-artifacts.md) for artifact specifications.
---
## CLI Reference
### Initiative Commands
| Command | Description |
|---------|-------------|
| `cw initiative create <title>` | Create draft initiative |
| `cw initiative list [--status STATUS]` | List initiatives |
| `cw initiative show <id>` | Show initiative with page tree |
| `cw initiative submit <id>` | Submit for review |
| `cw initiative approve <id>` | Approve initiative |
| `cw initiative reject <id> --reason "..."` | Reject initiative |
| `cw initiative plan <id>` | Generate phased work plan |
### Page Commands
| Command | Description |
|---------|-------------|
| `cw page create <initiative-id> <title> --type TYPE` | Create page |
| `cw page create <initiative-id> <title> --parent <page-id>` | Create subpage |
| `cw page show <id>` | Show page content |
| `cw page edit <id>` | Edit page (opens editor) |
| `cw page list <initiative-id> [--type TYPE]` | List pages |
| `cw page tree <initiative-id>` | Show page hierarchy |
### Phase Commands
| Command | Description |
|---------|-------------|
| `cw phase list <initiative-id>` | List phases |
| `cw phase show <id>` | Show phase with tasks |
| `cw phase discuss <id>` | Capture implementation decisions (creates CONTEXT.md) |
| `cw phase research <id> [--level N]` | Run discovery (L0-L3, creates RESEARCH.md) |
| `cw phase approve <id>` | Approve phase for execution |
| `cw phase verify <id>` | Run goal-backward verification |
| `cw phase uat <id>` | Run user acceptance testing |
| `cw phase status <id>` | Check phase progress |
---
## Integration Points
### With Tasks Module
Tasks gain two new fields:
- `initiative_id`: Links task to initiative (for context)
- `phase_id`: Links task to phase (for grouping/approval)
The `ready_tasks` view should consider phase approval:
```sql
CREATE VIEW ready_tasks AS
SELECT t.* FROM tasks t
LEFT JOIN initiative_phases p ON t.phase_id = p.id
WHERE t.status = 'open'
AND (t.phase_id IS NULL OR p.status IN ('approved', 'in_progress'))
AND NOT EXISTS (
SELECT 1 FROM task_dependencies d
JOIN tasks dep ON d.depends_on = dep.id
WHERE d.task_id = t.id
AND d.type = 'blocks'
AND dep.status != 'closed'
)
ORDER BY t.priority ASC, t.created_at ASC;
```
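The same readiness rule can be expressed over in-memory records, which may help when testing the logic without a database. This is a sketch under assumptions: the three interfaces are hypothetical, simplified stand-ins for the `tasks`, `initiative_phases`, and `task_dependencies` rows.

```typescript
// Hypothetical, simplified row shapes.
interface TaskRow { id: string; status: string; phaseId: string | null; priority: number; createdAt: number; }
interface PhaseRow { id: string; status: string; }
interface DependencyRow { taskId: string; dependsOn: string; type: string; }

function readyTasks(tasks: TaskRow[], phases: PhaseRow[], deps: DependencyRow[]): TaskRow[] {
  const phaseStatus = new Map<string, string>(phases.map((p): [string, string] => [p.id, p.status]));
  const taskStatus = new Map<string, string>(tasks.map((t): [string, string] => [t.id, t.status]));

  return tasks
    .filter((t) => {
      if (t.status !== 'open') return false;
      // Tasks without a phase are never gated by phase approval.
      if (t.phaseId !== null) {
        const status = phaseStatus.get(t.phaseId);
        if (status !== 'approved' && status !== 'in_progress') return false;
      }
      // A "blocks" dependency holds the task until the blocker is closed.
      // As with the SQL JOIN, a dependency on a missing task does not block.
      return !deps.some((d) => {
        if (d.taskId !== t.id || d.type !== 'blocks') return false;
        const blocker = taskStatus.get(d.dependsOn);
        return blocker !== undefined && blocker !== 'closed';
      });
    })
    .sort((a, b) => a.priority - b.priority || a.createdAt - b.createdAt);
}
```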
### With Domain Layer
When initiative completes, its pages can feed into domain documentation:
- Business rules → Domain business rules
- Technical concepts → Architecture docs
- New aggregates → Domain model updates
### With Orchestrator
Orchestrator can:
- Trigger Architect agents for initiative iteration
- Monitor phase completion and auto-advance initiative status
- Coordinate approval notifications
### tRPC Procedures
```typescript
// Suggested tRPC router shape
initiative.create(input) // → Initiative
initiative.list(filters) // → Initiative[]
initiative.get(id) // → Initiative with pages
initiative.submit(id) // → Initiative
initiative.approve(id) // → Initiative
initiative.reject(id, reason) // → Initiative
initiative.plan(id) // → Phase[]
page.create(input) // → Page
page.get(id) // → Page
page.update(id, content) // → Page
page.list(initiativeId, filters) // → Page[]
page.tree(initiativeId) // → PageTree
phase.list(initiativeId) // → Phase[]
phase.get(id) // → Phase with tasks
phase.approve(id) // → Phase
phase.status(id) // → PhaseStatus
```
---
## Future Considerations
- **Templates**: Pre-built page structures for common initiative types
- **Cross-project initiatives**: Single initiative spanning multiple projects
- **Versioning**: Track changes to initiative pages over time
- **Approval workflows**: Multi-step approval with different approvers
- **Auto-planning**: LLM generates work plan from initiative content

64
docs/logging.md Normal file

@@ -0,0 +1,64 @@
# Structured Logging
Codewalk District uses [pino](https://getpino.io/) for structured JSON logging on the backend.
## Architecture
- **pino** writes structured JSON to **stderr** so CLI user output on stdout stays clean
- **console.log** remains for CLI command handlers (user-facing output on stdout)
- The `src/logging/` module (ProcessLogWriter/LogManager) is a separate concern — it captures per-agent process stdout/stderr to files
## Environment Variables
| Variable | Description | Default |
|----------|-------------|---------|
| `CW_LOG_LEVEL` | Log level override (`fatal`, `error`, `warn`, `info`, `debug`, `trace`, `silent`) | `info` (production), `debug` (development) |
| `CW_LOG_PRETTY` | Set to `1` for human-readable colorized output via pino-pretty | unset (JSON output) |
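The level resolution described in the table can be sketched as a pure function. This is a minimal sketch, not the project's actual logger setup: `resolveLogLevel` is a hypothetical name, and using `NODE_ENV` to tell production from development is an assumption the document does not confirm.

```typescript
// Hypothetical helper: resolves the effective log level from environment variables.
// Assumption: NODE_ENV === 'production' selects the production default.
const LOG_LEVELS = ['fatal', 'error', 'warn', 'info', 'debug', 'trace', 'silent'] as const;
type LogLevel = (typeof LOG_LEVELS)[number];

function isLogLevel(value: string): value is LogLevel {
  return (LOG_LEVELS as readonly string[]).includes(value);
}

function resolveLogLevel(env: Record<string, string | undefined>): LogLevel {
  const override = env['CW_LOG_LEVEL']?.toLowerCase();
  if (override !== undefined && isLogLevel(override)) {
    return override; // an explicit, valid override always wins
  }
  // Otherwise fall back to the documented defaults.
  return env['NODE_ENV'] === 'production' ? 'info' : 'debug';
}
```

An unrecognized `CW_LOG_LEVEL` value falls through to the default here; rejecting it with an error would be an equally reasonable choice.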
## Log Levels
| Level | Usage |
|-------|-------|
| `fatal` | Process will exit (uncaught exceptions, DB migration failure) |
| `error` | Operation failed (agent crash, parse failure, clone failure) |
| `warn` | Degraded (account exhausted, no accounts available, stale PID, reconcile marking crashed) |
| `info` | State transitions (agent spawned/stopped/resumed, dispatch decision, server started, account selected/switched) |
| `debug` | Implementation details (command being built, session ID extraction, worktree paths, schema selection) |
## Adding Logging to a New Module
```typescript
import { createModuleLogger } from '../logger/index.js';
const log = createModuleLogger('my-module');
// Use structured data as first arg, message as second
log.info({ taskId, agentId }, 'task dispatched');
log.error({ err: error }, 'operation failed');
log.debug({ path, count }, 'processing items');
```
## Module Names
| Module | Used in |
|--------|---------|
| `agent-manager` | `src/agent/manager.ts` |
| `dispatch` | `src/dispatch/manager.ts` |
| `http` | `src/server/index.ts` |
| `server` | `src/cli/index.ts` (startup) |
| `git` | `src/git/manager.ts`, `src/git/clone.ts`, `src/git/project-clones.ts` |
| `db` | `src/db/ensure-schema.ts` |
## Testing
Logs are silenced in tests via `CW_LOG_LEVEL=silent` in `vitest.config.ts`.
## Quick Start
```sh
# Pretty logs during development
CW_LOG_LEVEL=debug CW_LOG_PRETTY=1 cw --server
# JSON logs for production/piping
cw --server 2>server.log
```

267
docs/model-profiles.md Normal file

@@ -0,0 +1,267 @@
# Model Profiles
Different agent roles have different needs. Model selection balances quality, cost, and latency.
## Profile Definitions
| Profile | Use Case | Cost | Quality |
|---------|----------|------|---------|
| **quality** | Critical decisions, architecture | Highest | Best |
| **balanced** | Default for most work | Medium | Good |
| **budget** | High-volume, low-risk tasks | Lowest | Acceptable |
---
## Agent Model Assignments
| Agent | Quality | Balanced (Default) | Budget |
|-------|---------|-------------------|--------|
| **Architect** | Opus | Opus | Sonnet |
| **Worker** | Opus | Sonnet | Sonnet |
| **Verifier** | Sonnet | Sonnet | Haiku |
| **Orchestrator** | Sonnet | Sonnet | Haiku |
| **Monitor** | Sonnet | Haiku | Haiku |
| **Researcher** | Opus | Sonnet | Haiku |
---
## Rationale
### Architect (Planning) - Opus/Opus/Sonnet
Planning has the highest impact on outcomes. A bad plan wastes all downstream execution. Invest in quality here.
**Quality profile:** Complex systems, novel domains, critical decisions
**Balanced profile:** Standard feature work, established patterns
**Budget profile:** Simple initiatives, well-documented domains
### Worker (Execution) - Opus/Sonnet/Sonnet
The plan already contains reasoning. Execution is implementation, not decision-making.
**Quality profile:** Complex algorithms, security-critical code
**Balanced profile:** Standard implementation work
**Budget profile:** Simple tasks, boilerplate code
### Verifier (Validation) - Sonnet/Sonnet/Haiku
Verification is structured checking against defined criteria. Less reasoning needed than planning.
**Quality profile:** Complex verification, subtle integration issues
**Balanced profile:** Standard goal-backward verification
**Budget profile:** Simple pass/fail checks
### Orchestrator (Coordination) - Sonnet/Sonnet/Haiku
Orchestrator routes work, doesn't do heavy lifting. Needs reliability, not creativity.
**Quality profile:** Complex multi-agent coordination
**Balanced profile:** Standard workflow management
**Budget profile:** Simple task routing
### Monitor (Observation) - Sonnet/Haiku/Haiku
Monitoring is pattern matching and threshold checking. Minimal reasoning required.
**Quality profile:** Complex health analysis
**Balanced profile:** Standard monitoring
**Budget profile:** Simple heartbeat checks
### Researcher (Discovery) - Opus/Sonnet/Haiku
Research is read-only exploration. High volume, low modification risk.
**Quality profile:** Deep domain analysis
**Balanced profile:** Standard codebase exploration
**Budget profile:** Simple file lookups
---
## Profile Selection
### Per-Initiative Override
```yaml
# In initiative config
model_profile: quality # Override default balanced
```
### Per-Agent Override
```yaml
# In task assignment
assigned_to: worker-123
model_override: opus # This task needs Opus
```
### Automatic Escalation
```yaml
# When to auto-escalate
escalation_triggers:
  - condition: "task.retry_count > 2"
    action: "escalate_model"
  - condition: "task.complexity == 'high'"
    action: "use_quality_profile"
  - condition: "deviation.rule == 4"
    action: "escalate_model"
```
---
## Cost Management
### Estimated Token Usage
| Agent | Avg Tokens/Task | Profile Impact |
|-------|-----------------|----------------|
| Architect | 50k-100k | 3x between budget/quality |
| Worker | 20k-50k | 2x between budget/quality |
| Verifier | 10k-30k | 1.5x between budget/quality |
| Orchestrator | 5k-15k | 1.5x between budget/quality |
### Cost Optimization Strategies
1. **Right-size tasks:** Smaller tasks = less token usage
2. **Use budget for volume:** Monitoring, simple checks
3. **Reserve quality for impact:** Architecture, security
4. **Profile per initiative:** Simple features use budget, complex use quality
---
## Configuration
### Default Profile
```json
// .planning/config.json
{
"model_profile": "balanced",
"model_overrides": {
"architect": null,
"worker": null,
"verifier": null
}
}
```
### Quality Profile
```json
{
"model_profile": "quality",
"model_overrides": {}
}
```
### Budget Profile
```json
{
"model_profile": "budget",
"model_overrides": {
"architect": "sonnet" // Keep architect at sonnet minimum
}
}
```
### Mixed Profile
```json
{
"model_profile": "balanced",
"model_overrides": {
"architect": "opus", // Invest in planning
"worker": "sonnet", // Standard execution
"verifier": "haiku" // Budget verification
}
}
```
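How a profile and its overrides combine into a concrete model can be sketched as a lookup. The table below mirrors "Agent Model Assignments"; the type and function names (`ModelConfig`, `resolveModel`) are hypothetical, and treating a `null` override as "use the profile default" is an assumption based on the default config above.

```typescript
// Sketch: resolve the model for an agent role from the active profile plus overrides.
// Names are illustrative; only the assignment values come from the document.
type Model = 'opus' | 'sonnet' | 'haiku';
type Profile = 'quality' | 'balanced' | 'budget';
type AgentRole = 'architect' | 'worker' | 'verifier' | 'orchestrator' | 'monitor' | 'researcher';

interface ModelConfig {
  model_profile: Profile;
  model_overrides: Partial<Record<AgentRole, Model | null>>;
}

const ASSIGNMENTS: Record<AgentRole, Record<Profile, Model>> = {
  architect: { quality: 'opus', balanced: 'opus', budget: 'sonnet' },
  worker: { quality: 'opus', balanced: 'sonnet', budget: 'sonnet' },
  verifier: { quality: 'sonnet', balanced: 'sonnet', budget: 'haiku' },
  orchestrator: { quality: 'sonnet', balanced: 'sonnet', budget: 'haiku' },
  monitor: { quality: 'sonnet', balanced: 'haiku', budget: 'haiku' },
  researcher: { quality: 'opus', balanced: 'sonnet', budget: 'haiku' },
};

function resolveModel(role: AgentRole, config: ModelConfig): Model {
  // A non-null override takes precedence; null or missing falls back to the profile.
  return config.model_overrides[role] ?? ASSIGNMENTS[role][config.model_profile];
}
```

With the mixed profile above, `resolveModel('verifier', config)` would pick the override, while roles without an override fall back to the balanced column.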
---
## Model Capabilities Reference
### Opus
- **Strengths:** Complex reasoning, nuanced decisions, novel problems
- **Best for:** Architecture, complex algorithms, security analysis
- **Cost:** Highest
### Sonnet
- **Strengths:** Good balance of reasoning and speed, reliable
- **Best for:** Standard development, code generation, debugging
- **Cost:** Medium
### Haiku
- **Strengths:** Fast, cheap, good for structured tasks
- **Best for:** Monitoring, simple checks, high-volume operations
- **Cost:** Lowest
---
## Profile Switching
### CLI Command
```bash
# Set profile for all future work
cw config set model_profile quality
# Set profile for specific initiative
cw initiative config <id> --model-profile budget
# Override for single task
cw task update <id> --model-override opus
```
### API
```typescript
// Set initiative profile
await initiative.setConfig(id, { modelProfile: 'quality' });
// Override task model
await task.update(id, { modelOverride: 'opus' });
```
---
## Monitoring Model Usage
Track model usage for cost analysis:
```sql
CREATE TABLE model_usage (
id INTEGER PRIMARY KEY AUTOINCREMENT,
agent_type TEXT NOT NULL,
model TEXT NOT NULL,
tokens_input INTEGER,
tokens_output INTEGER,
task_id TEXT,
initiative_id TEXT,
created_at INTEGER DEFAULT (unixepoch())
);
-- Usage by agent type
SELECT agent_type, model, SUM(tokens_input + tokens_output) as total_tokens
FROM model_usage
GROUP BY agent_type, model;
-- Cost by initiative
SELECT initiative_id,
SUM(CASE WHEN model = 'opus' THEN (tokens_input + tokens_output) * 0.015
WHEN model = 'sonnet' THEN (tokens_input + tokens_output) * 0.003
WHEN model = 'haiku' THEN (tokens_input + tokens_output) * 0.0003 END) as estimated_cost
FROM model_usage
GROUP BY initiative_id;
```
---
## Recommendations
### Starting Out
Use **balanced** profile. It provides good quality at reasonable cost.
### High-Stakes Projects
Use **quality** profile. The extra cost is small compared to the cost of getting a critical decision wrong.
### High-Volume Work
Use **budget** profile and keep the Architect pinned to Sonnet (the budget default) rather than going lower. Don't skimp on planning.
### Learning the System
Use **quality** profile initially. See what good output looks like before optimizing for cost.

402
docs/session-state.md Normal file

@@ -0,0 +1,402 @@
# Session State
Session state tracks position, decisions, and blockers across agent restarts. Unlike the Domain Layer (which tracks codebase state), session state tracks **execution state**.
## STATE.md
Every active initiative maintains a STATE.md file tracking execution progress:
```yaml
# STATE.md
initiative: init-abc123
title: User Authentication

# Current Position
position:
  phase: 2
  phase_name: "JWT Implementation"
  plan: 3
  plan_name: "Refresh Token Rotation"
  task: "Implement token rotation endpoint"
  wave: 1
  status: in_progress

# Progress Tracking
progress:
  phases_total: 4
  phases_completed: 1
  current_phase_tasks: 8
  current_phase_completed: 5
  bar: "████████░░░░░░░░ 50%"

# Decisions Made
decisions:
  - date: 2024-01-14
    context: "Token storage strategy"
    decision: "httpOnly cookie, not localStorage"
    reason: "XSS protection, automatic inclusion in requests"
  - date: 2024-01-14
    context: "JWT library"
    decision: "jose over jsonwebtoken"
    reason: "Better TypeScript support, Web Crypto API"
  - date: 2024-01-15
    context: "Refresh token lifetime"
    decision: "7 days"
    reason: "Balance between security and UX"

# Active Blockers
blockers:
  - id: block-001
    description: "Waiting for OAuth credentials from client"
    blocked_since: 2024-01-15
    affects: ["Phase 3: OAuth Integration"]
    workaround: "Proceeding with email/password auth first"

# Session History
sessions:
  - id: session-001
    started: 2024-01-14T09:00:00Z
    ended: 2024-01-14T17:00:00Z
    completed: ["Phase 1: Database Schema", "Phase 2 Tasks 1-3"]
  - id: session-002
    started: 2024-01-15T09:00:00Z
    status: active
    working_on: "Phase 2, Task 4: Refresh token rotation"

# Next Action
next_action: |
  Continue implementing refresh token rotation endpoint.
  After completion, run verification for Phase 2.
  If Phase 2 passes, move to Phase 3 (blocked pending OAuth creds).

# Context for Resume
resume_context:
  files_modified_this_session:
    - src/api/auth/refresh.ts
    - src/middleware/auth.ts
    - db/migrations/002_refresh_tokens.sql
  key_implementations:
    - "Refresh tokens stored in SQLite with expiry"
    - "Rotation creates new token, invalidates old"
    - "Token family tracking for reuse detection"
  open_questions: []
```
---
## State Updates
### When to Update STATE.md
| Event | Update |
|-------|--------|
| Task started | `position.task`, `position.status` |
| Task completed | `progress.*`, `position` to next task |
| Decision made | Add to `decisions` |
| Blocker encountered | Add to `blockers` |
| Blocker resolved | Remove from `blockers` |
| Session start | Add to `sessions` |
| Session end | Update session `ended`, `completed` |
| Phase completed | `progress.phases_completed`, reset task counters |
### Atomic Updates
```typescript
// Update position atomically
await updateState({
position: {
phase: 2,
plan: 3,
task: "Implement token rotation",
wave: 1,
status: "in_progress"
}
});
// Add decision
await addDecision({
context: "Token storage",
decision: "httpOnly cookie",
reason: "XSS protection"
});
// Record blocker
await addBlocker({
description: "Waiting for OAuth creds",
affects: ["Phase 3"]
});
```
---
## Resume Protocol
When resuming work:
### 1. Load STATE.md
```
Read STATE.md for initiative
Extract: position, decisions, blockers, resume_context
```
### 2. Load Relevant Context
```
If position.plan exists:
Load {phase}-{plan}-PLAN.md
Load prior SUMMARY.md files for this phase
If position.task exists:
Find task in current plan
Resume from that task
```
### 3. Verify State
```
Check files_modified_this_session still exist
Check implementations match key_implementations
If mismatch: flag for review before proceeding
```
### 4. Continue Execution
```
Display: "Resuming from Phase {N}, Plan {M}, Task: {name}"
Display: decisions made (for context)
Display: active blockers (for awareness)
Continue with task execution
```
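Steps 3 and 4 of the protocol can be sketched as a small function. This is a sketch under stated assumptions: `prepareResume` and its result shape are hypothetical, field names follow the STATE.md example, and only the file-existence check from step 3 is covered (comparing code against `key_implementations` is left out).

```typescript
// Sketch of resume steps 3 (verify) and 4 (announce). Names are illustrative.
// `fileExists` is injected so the check can run against any file system.
interface ResumeState {
  position: { phase: number; plan: number; task: string };
  resume_context: { files_modified_this_session: string[] };
}

interface ResumeResult {
  ok: boolean; // false means: flag for review before proceeding
  missingFiles: string[];
  banner: string;
}

function prepareResume(state: ResumeState, fileExists: (path: string) => boolean): ResumeResult {
  // Step 3: every file recorded for the session must still be present.
  const missingFiles = state.resume_context.files_modified_this_session.filter(
    (path) => !fileExists(path),
  );
  // Step 4: the line displayed before execution continues.
  const { phase, plan, task } = state.position;
  const banner = `Resuming from Phase ${phase}, Plan ${plan}, Task: ${task}`;
  return { ok: missingFiles.length === 0, missingFiles, banner };
}
```

In a real agent, `fileExists` would likely wrap a file-system call; passing it in keeps the check easy to test.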
---
## Decision Tracking
Decisions are first-class citizens, not comments.
### What to Track
| Type | Example | Why Track |
|------|---------|-----------|
| Technology choice | "Using jose for JWT" | Prevents second-guessing |
| Architecture decision | "Separate auth service" | Documents reasoning |
| Trade-off resolution | "Speed over features" | Explains constraints |
| User preference | "Dark mode default" | Preserves intent |
| Constraint discovered | "API rate limited to 100/min" | Prevents repeated discovery |
### Decision Format
```yaml
decisions:
  - date: 2024-01-15
    context: "Where the decision was needed"
    decision: "What was decided"
    reason: "Why this choice"
    alternatives_considered:
      - "Alternative A: rejected because..."
      - "Alternative B: rejected because..."
    reversible: true|false
```
---
## Blocker Management
### Blocker States
```
[new] ──identify──▶ [active] ──resolve──▶ [resolved]
                        │ workaround
                    [bypassed]
```
### Blocker Format
```yaml
blockers:
  - id: block-001
    status: active
    description: "Need production API keys"
    identified_at: 2024-01-15T10:00:00Z
    affects:
      - "Phase 4: Production deployment"
      - "Phase 5: Monitoring setup"
    blocked_tasks:
      - task-xyz: "Configure production environment"
    workaround: null
    resolution: null
  - id: block-002
    status: bypassed
    description: "Design mockups not ready"
    identified_at: 2024-01-14T09:00:00Z
    affects: ["UI implementation"]
    workaround: "Using placeholder styles, will refine later"
    workaround_tasks:
      - task-abc: "Apply final styles when mockups ready"
```
### Blocker Impact on Execution
1. **Task Blocking:** Task marked `blocked` in tasks table
2. **Phase Blocking:** If all remaining tasks blocked, phase paused
3. **Initiative Blocking:** If all phases blocked, escalate to user
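
The three escalation levels above can be sketched as two predicates. This is an illustrative sketch, not the project's implementation: the names are hypothetical, and reading "all phases blocked" as "all unfinished phases blocked" is an assumption.

```typescript
// Sketch of blocker propagation from tasks to phases to the initiative.
type WorkStatus = 'open' | 'in_progress' | 'blocked' | 'closed';

interface PhaseTasks {
  phaseId: string;
  taskStatuses: WorkStatus[];
}

// A phase pauses when it still has unfinished tasks and all of them are blocked.
function isPhaseBlocked(phase: PhaseTasks): boolean {
  const remaining = phase.taskStatuses.filter((s) => s !== 'closed');
  return remaining.length > 0 && remaining.every((s) => s === 'blocked');
}

// Escalate to the user when every unfinished phase is blocked.
function shouldEscalate(phases: PhaseTasks[]): boolean {
  const unfinished = phases.filter((p) => p.taskStatuses.some((s) => s !== 'closed'));
  return unfinished.length > 0 && unfinished.every((p) => isPhaseBlocked(p));
}
```

A fully closed phase counts as neither blocked nor unfinished, so completed phases never trigger or suppress escalation.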
---
## Session History
Track work sessions for debugging and handoffs:
```yaml
sessions:
  - id: session-001
    agent: worker-abc
    started: 2024-01-14T09:00:00Z
    ended: 2024-01-14T12:30:00Z
    context_usage: "45%"
    completed:
      - "Phase 1, Plan 1: Database setup"
      - "Phase 1, Plan 2: User model"
    notes: "Clean execution, no issues"
  - id: session-002
    agent: worker-def
    started: 2024-01-14T13:00:00Z
    ended: 2024-01-14T17:00:00Z
    context_usage: "62%"
    completed:
      - "Phase 1, Plan 3: Auth endpoints"
    issues:
      - "Context exceeded 50%, quality may have degraded"
      - "Encountered blocker: missing env vars"
    handoff_reason: "Context limit reached"
```
---
## Storage Options
### SQLite (Recommended for Codewalk)
```sql
CREATE TABLE initiative_state (
initiative_id TEXT PRIMARY KEY REFERENCES initiatives(id),
current_phase INTEGER,
current_plan INTEGER,
current_task TEXT,
current_wave INTEGER,
status TEXT,
progress_json TEXT,
updated_at INTEGER
);
CREATE TABLE initiative_decisions (
id TEXT PRIMARY KEY,
initiative_id TEXT REFERENCES initiatives(id),
date INTEGER,
context TEXT,
decision TEXT,
reason TEXT,
alternatives_json TEXT,
reversible BOOLEAN
);
CREATE TABLE initiative_blockers (
id TEXT PRIMARY KEY,
initiative_id TEXT REFERENCES initiatives(id),
status TEXT CHECK (status IN ('active', 'bypassed', 'resolved')),
description TEXT,
identified_at INTEGER,
affects_json TEXT,
workaround TEXT,
resolution TEXT,
resolved_at INTEGER
);
CREATE TABLE session_history (
id TEXT PRIMARY KEY,
initiative_id TEXT REFERENCES initiatives(id),
agent_id TEXT,
started_at INTEGER,
ended_at INTEGER,
context_usage REAL,
completed_json TEXT,
issues_json TEXT,
handoff_reason TEXT
);
```
### File-Based (Alternative)
```
.planning/
├── STATE.md                        # Current state
├── decisions/
│   └── 2024-01-15-jwt-library.md
├── blockers/
│   └── block-001-oauth-creds.md
└── sessions/
    ├── session-001.md
    └── session-002.md
```
---
## Integration with Agents
### Worker
- Reads STATE.md at start
- Updates position on task transitions
- Adds deviations to session notes
- Updates progress counters
### Architect
- Creates initial STATE.md when planning
- Sets up phase/plan structure
- Documents initial decisions
### Orchestrator
- Monitors blocker status
- Triggers resume when blockers resolve
- Coordinates session handoffs
### Verifier
- Reads decisions for verification context
- Updates state with verification results
- Flags issues for resolution
---
## Example: Resume After Crash
```
1. Agent crashes mid-task
2. Supervisor detects stale assignment
- Task assigned_at > 30min ago
- No progress updates
3. Supervisor resets task
- Status back to 'open'
- Clear assigned_to
4. New agent picks up task
- Reads STATE.md
- Sees: "Last working on: Refresh token rotation"
- Loads relevant PLAN.md
- Resumes execution
5. STATE.md shows continuity
sessions:
- id: session-003
status: crashed
notes: "Agent unresponsive, task reset"
- id: session-004
status: active
notes: "Resuming from session-003 crash"
```
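Steps 2 and 3 of the crash scenario can be sketched as follows. The 30-minute threshold comes from the example; everything else is an assumption: the function names are hypothetical, and "no progress updates" is approximated by the task's `updated_at` timestamp.

```typescript
// Sketch of stale-assignment detection and reset. Timestamps are Unix seconds.
interface AssignedTask {
  id: string;
  status: 'open' | 'in_progress' | 'blocked' | 'closed';
  assigned_to: string | null;
  assigned_at: number | null;
  updated_at: number;
}

const STALE_AFTER_SECONDS = 30 * 60;

function isStale(task: AssignedTask, now: number): boolean {
  if (task.status !== 'in_progress' || task.assigned_at === null) return false;
  const assignedTooLong = now - task.assigned_at > STALE_AFTER_SECONDS;
  const noRecentProgress = now - task.updated_at > STALE_AFTER_SECONDS;
  return assignedTooLong && noRecentProgress;
}

// Returns a copy with the task released so another agent can claim it.
function resetTask(task: AssignedTask, now: number): AssignedTask {
  return { ...task, status: 'open', assigned_to: null, assigned_at: null, updated_at: now };
}
```

Requiring both conditions avoids resetting a long-running task that is still reporting progress.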

309
docs/task-granularity.md Normal file

@@ -0,0 +1,309 @@
# Task Granularity Standards
A task must be specific enough for execution without interpretation. Vague tasks cause agents to guess, leading to inconsistent results and rework.
## The Granularity Test
Ask: **Can an agent execute this task without making assumptions?**
If the answer requires "it depends" or "probably means", the task is too vague.
---
## Comparison Table
| Too Vague | Just Right |
|-----------|------------|
| "Add authentication" | "Add JWT auth with refresh rotation using jose library, store in httpOnly cookie, 15min access / 7day refresh" |
| "Create the API" | "Create POST /api/projects accepting {name, description}, validates name length 3-50 chars, returns 201 with project object" |
| "Style the dashboard" | "Add Tailwind classes to Dashboard.tsx: grid layout (3 cols on lg, 1 on mobile), card shadows, hover states on action buttons" |
| "Handle errors" | "Wrap API calls in try/catch, return {error: string} on 4xx/5xx, show toast via sonner on client" |
| "Add form validation" | "Add Zod schema to CreateProjectForm: name (3-50 chars, alphanumeric), description (optional, max 500 chars), show inline errors" |
| "Improve performance" | "Add React.memo to ProjectCard, useMemo for filtered list in Dashboard, lazy load ProjectDetails route" |
| "Fix the login bug" | "Fix login redirect loop: after successful login in auth.ts:45, redirect to stored returnUrl instead of always '/' " |
| "Set up the database" | "Create SQLite database at data/cw.db with migrations in db/migrations/, run via 'cw db migrate'" |
---
## Required Task Components
Every task MUST include:
### 1. Files
Exact paths that will be created or modified.
```yaml
files:
- src/components/Chat.tsx # create
- src/hooks/useChat.ts # create
- src/api/messages.ts # modify
```
### 2. Action
What to do, what to avoid, and WHY.
```yaml
action: |
Create Chat component with:
- Message list (virtualized for performance)
- Input field with send button
- Auto-scroll to bottom on new message
DO NOT:
- Implement WebSocket (separate task)
- Add typing indicators (Phase 2)
WHY: Core chat UI needed before real-time features
```
### 3. Verify
Command or check to prove completion.
```yaml
verify:
- command: "npm run typecheck"
expect: "No type errors"
- command: "npm run test -- Chat.test.tsx"
expect: "Tests pass"
- manual: "Navigate to /chat, see empty message list and input"
```
### 4. Done
Measurable acceptance criteria.
```yaml
done:
- "Chat component renders without errors"
- "Input accepts text and clears on submit"
- "Messages display in chronological order"
- "Tests cover send and display functionality"
```
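The four required components lend themselves to a mechanical check before a task is accepted. The sketch below is illustrative only: `TaskSpec` and `missingComponents` are hypothetical names, and it checks presence, not quality, of each component.

```typescript
// Sketch: report which required components a task definition lacks.
interface TaskSpec {
  title: string;
  files?: string[];
  action?: string;
  verify?: string[];
  done?: string[];
}

function missingComponents(task: TaskSpec): string[] {
  const missing: string[] = [];
  if (!task.files || task.files.length === 0) missing.push('files');
  if (!task.action || task.action.trim() === '') missing.push('action');
  if (!task.verify || task.verify.length === 0) missing.push('verify');
  if (!task.done || task.done.length === 0) missing.push('done');
  return missing;
}
```

A task like "Add authentication" with no other fields would be reported as missing all four components, which matches the "Too Vague" column of the comparison table.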
---
## Task Types
### Type: auto
Agent executes autonomously.
```yaml
type: auto
files: [src/components/Button.tsx]
action: "Create Button component with primary/secondary variants using Tailwind"
verify: "npm run typecheck && npm run test"
done: "Button renders with correct styles for each variant"
```
### Type: checkpoint:human-verify
Agent completes, human confirms.
```yaml
type: checkpoint:human-verify
files: [src/pages/Dashboard.tsx]
action: "Implement dashboard layout with project cards"
verify: "Navigate to /dashboard after login"
prompt: "Does the dashboard match the design mockup?"
done: "User confirms layout is correct"
```
### Type: checkpoint:decision
Human makes choice that affects implementation.
```yaml
type: checkpoint:decision
prompt: "Which chart library should we use?"
options:
  - recharts: "Built for React, good for simple charts"
  - d3: "More powerful, steeper learning curve"
  - chart.js: "Lightweight, canvas-based"
affects: "All subsequent charting tasks"
```
### Type: checkpoint:human-action
Unavoidable manual step.
```yaml
type: checkpoint:human-action
prompt: "Please click the verification link sent to your email"
reason: "Cannot automate email client interaction"
continue_after: "User confirms email verified"
```
---
## Time Estimation
Tasks should fit within context budgets:
| Complexity | Context % | Wall Time | Example |
|------------|-----------|-----------|---------|
| Trivial | 5-10% | 2-5 min | Add a CSS class |
| Simple | 10-20% | 5-15 min | Add form field |
| Medium | 20-35% | 15-30 min | Create API endpoint |
| Complex | 35-50% | 30-60 min | Implement auth flow |
| Too Large | >50% | - | **SPLIT REQUIRED** |
---
## Splitting Large Tasks
When a task exceeds 50% context estimate, decompose:
### Before (Too Large)
```yaml
title: "Implement user authentication"
# This is 3+ hours of work, dozens of decisions
```
### After (Properly Decomposed)
```yaml
tasks:
  - id: users-table
    title: "Create users table with password hash"
    files: [db/migrations/001_users.sql]
  - id: signup-endpoint
    title: "Add signup endpoint with Zod validation"
    files: [src/api/auth/signup.ts]
    depends_on: [users-table]
  - id: login-endpoint
    title: "Add login endpoint with JWT generation"
    files: [src/api/auth/login.ts]
    depends_on: [users-table]
  - id: auth-middleware
    title: "Create auth middleware for protected routes"
    files: [src/middleware/auth.ts]
    depends_on: [login-endpoint]
  - id: refresh-rotation
    title: "Add refresh token rotation"
    files: [src/api/auth/refresh.ts, db/migrations/002_refresh_tokens.sql]
    depends_on: [auth-middleware]
```
---
## Anti-Patterns
### Vague Verbs
**Bad:** "Improve", "Enhance", "Update", "Fix" (without specifics)
**Good:** "Add X", "Change Y to Z", "Remove W"
### Missing Constraints
**Bad:** "Add validation"
**Good:** "Add Zod validation: email format, password 8+ chars with number"
### Implied Knowledge
**Bad:** "Handle the edge cases"
**Good:** "Handle: empty input (show error), network failure (retry 3x), duplicate email (show message)"
### Compound Tasks
**Bad:** "Set up auth and create the user management pages"
**Good:** Two separate tasks with dependency
### No Success Criteria
**Bad:** "Make it work"
**Good:** "Tests pass, no TypeScript errors, manual verification of happy path"
---
## Examples by Domain
### API Endpoint
```yaml
title: "Create POST /api/projects endpoint"
files:
- src/api/projects/create.ts
- src/api/projects/schema.ts
action: |
Create endpoint accepting:
- name: string (3-50 chars, required)
- description: string (max 500 chars, optional)
Returns:
- 201: { id, name, description, createdAt }
- 400: { error: "validation message" }
- 401: { error: "Unauthorized" }
Use Zod for validation, drizzle for DB insert.
verify:
- "npm run test -- projects.test.ts"
- "curl -X POST /api/projects -d '{\"name\": \"Test\"}' returns 201"
done:
- "Endpoint creates project in database"
- "Validation rejects invalid input with clear messages"
- "Auth middleware blocks unauthenticated requests"
```
### React Component
```yaml
title: "Create ProjectCard component"
files:
- src/components/ProjectCard.tsx
- src/components/ProjectCard.test.tsx
action: |
Create card displaying:
- Project name (truncate at 30 chars)
- Description preview (2 lines max)
- Created date (relative: "2 days ago")
- Status badge (active/archived)
Props: { project: Project, onClick: () => void }
Use Tailwind: rounded-lg, shadow-sm, hover:shadow-md
verify:
- "npm run typecheck"
- "npm run test -- ProjectCard"
- "Storybook renders all variants"
done:
- "Card renders with all project fields"
- "Truncation works for long names"
- "Hover state visible"
- "Click handler fires"
```
### Database Migration
```yaml
title: "Create projects table"
files:
- db/migrations/003_projects.sql
- src/db/schema/projects.ts
action: |
Create table:
- id: TEXT PRIMARY KEY (uuid)
- user_id: TEXT NOT NULL REFERENCES users(id)
- name: TEXT NOT NULL
- description: TEXT
- status: TEXT DEFAULT 'active' CHECK (status IN ('active', 'archived'))
- created_at: INTEGER DEFAULT (unixepoch())
- updated_at: INTEGER DEFAULT (unixepoch())
Indexes: user_id, status, created_at DESC
verify:
- "cw db migrate runs without error"
- "sqlite3 data/cw.db '.schema projects' shows correct schema"
done:
- "Migration applies cleanly"
- "Drizzle schema matches SQL"
- "Indexes created"
```
---
## Checklist Before Creating Task
- [ ] Can an agent execute this without asking questions?
- [ ] Are all files listed explicitly?
- [ ] Is the action specific (not "improve" or "handle")?
- [ ] Is there a concrete verify step?
- [ ] Are done criteria measurable?
- [ ] Does estimated context fit under 50%?
- [ ] Are there no compound actions (split if needed)?

331
docs/tasks.md Normal file

@@ -0,0 +1,331 @@
# Tasks Module
Beads-inspired task management optimized for multi-agent coordination. Unlike beads (Git-distributed JSONL), this uses centralized SQLite for simplicity since all agents share the same workspace.
## Design Rationale
### Why Not Just Use Beads?
Beads solves a different problem: distributed task tracking across forked repos with zero coordination. We don't need that:
- All Workers operate in the same workspace under one `cw` server
- SQLite is the single source of truth
- tRPC exposes task queries directly to agents and dashboard
- No merge conflicts, no Git overhead
### Core Agent Problem Solved
Agents need to answer: **"What should I work on next?"**
The `ready` query solves this: tasks that are `open` with all dependencies `closed`. Combined with priority ordering, agents can self-coordinate without human intervention.
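
The readiness rule can be sketched in memory to make the logic concrete. This is a minimal sketch with hypothetical names (`readyTasks`, `Dependency`); it covers only the status, dependency, and ordering rules, and omits the phase-approval condition that the SQL view adds.

```typescript
// Sketch of the `ready` rule: open tasks whose blocking dependencies are all closed,
// ordered by priority (0 = critical first), then by creation time.
interface Task {
  id: string;
  status: 'open' | 'in_progress' | 'blocked' | 'closed';
  priority: number;
  created_at: number;
}

interface Dependency {
  task_id: string;
  depends_on: string;
  type: 'blocks' | 'related';
}

function readyTasks(tasks: Task[], deps: Dependency[]): Task[] {
  const statusById = new Map(tasks.map((t) => [t.id, t.status] as const));
  const isBlocked = (task: Task): boolean =>
    deps.some((d) => {
      if (d.task_id !== task.id || d.type !== 'blocks') return false;
      const depStatus = statusById.get(d.depends_on);
      // Unknown dependencies are ignored, matching the inner join in the SQL view.
      return depStatus !== undefined && depStatus !== 'closed';
    });
  return tasks
    .filter((t) => t.status === 'open' && !isBlocked(t))
    .sort((a, b) => a.priority - b.priority || a.created_at - b.created_at);
}
```

Only `blocks` dependencies gate readiness; `related` links are informational and never hold a task back.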
---
## Data Model
### Task Entity
| Field | Type | Description |
|-------|------|-------------|
| `id` | TEXT | Primary key. Hash-based (e.g., `tsk-a1b2c3`) or UUID |
| `parent_id` | TEXT | Optional. References parent task for hierarchies |
| `initiative_id` | TEXT | Optional. Links to Initiatives module |
| `phase_id` | TEXT | Optional. Links to initiative phase (for grouped approval) |
| `project_id` | TEXT | Optional. Scopes task to a project |
| `title` | TEXT | Required. Short task name |
| `description` | TEXT | Optional. Markdown-formatted details |
| `type` | TEXT | `task` (default), `epic`, `subtask` |
| `status` | TEXT | `open`, `in_progress`, `blocked`, `closed` |
| `priority` | INTEGER | 0=critical, 1=high, 2=normal (default), 3=low |
| `assigned_to` | TEXT | Agent/worker ID currently working on this |
| `assigned_at` | INTEGER | Unix timestamp when assigned |
| `metadata` | TEXT | JSON blob for extensibility |
| `created_at` | INTEGER | Unix timestamp |
| `updated_at` | INTEGER | Unix timestamp |
| `closed_at` | INTEGER | Unix timestamp when closed |
| `closed_reason` | TEXT | Why/how the task was completed |
### Task Dependencies
| Field | Type | Description |
|-------|------|-------------|
| `task_id` | TEXT | The task that is blocked |
| `depends_on` | TEXT | The task that must complete first |
| `type` | TEXT | `blocks` (default), `related` |
### Task History
| Field | Type | Description |
|-------|------|-------------|
| `id` | INTEGER | Auto-increment primary key |
| `task_id` | TEXT | The task that changed |
| `field` | TEXT | Which field changed |
| `old_value` | TEXT | Previous value |
| `new_value` | TEXT | New value |
| `changed_by` | TEXT | Agent/user ID |
| `changed_at` | INTEGER | Unix timestamp |
---
## SQLite Schema
```sql
CREATE TABLE tasks (
id TEXT PRIMARY KEY,
parent_id TEXT REFERENCES tasks(id),
initiative_id TEXT,
phase_id TEXT,
project_id TEXT,
title TEXT NOT NULL,
description TEXT,
type TEXT NOT NULL DEFAULT 'task' CHECK (type IN ('task', 'epic', 'subtask')),
status TEXT NOT NULL DEFAULT 'open' CHECK (status IN ('open', 'in_progress', 'blocked', 'closed')),
priority INTEGER NOT NULL DEFAULT 2 CHECK (priority BETWEEN 0 AND 3),
assigned_to TEXT,
assigned_at INTEGER,
metadata TEXT,
created_at INTEGER NOT NULL DEFAULT (unixepoch()),
updated_at INTEGER NOT NULL DEFAULT (unixepoch()),
closed_at INTEGER,
closed_reason TEXT
);
CREATE TABLE task_dependencies (
task_id TEXT NOT NULL REFERENCES tasks(id) ON DELETE CASCADE,
depends_on TEXT NOT NULL REFERENCES tasks(id) ON DELETE CASCADE,
type TEXT NOT NULL DEFAULT 'blocks' CHECK (type IN ('blocks', 'related')),
PRIMARY KEY (task_id, depends_on),
CHECK (task_id != depends_on)
);
CREATE TABLE task_history (
id INTEGER PRIMARY KEY AUTOINCREMENT,
task_id TEXT NOT NULL REFERENCES tasks(id) ON DELETE CASCADE,
field TEXT NOT NULL,
old_value TEXT,
new_value TEXT,
changed_by TEXT,
changed_at INTEGER NOT NULL DEFAULT (unixepoch())
);
CREATE INDEX idx_tasks_status ON tasks(status);
CREATE INDEX idx_tasks_priority ON tasks(priority);
CREATE INDEX idx_tasks_assigned ON tasks(assigned_to);
CREATE INDEX idx_tasks_project ON tasks(project_id);
CREATE INDEX idx_tasks_initiative ON tasks(initiative_id);
CREATE INDEX idx_tasks_phase ON tasks(phase_id);
CREATE INDEX idx_task_history_task ON task_history(task_id);
-- The critical view for agent work discovery
-- Tasks are ready when: open, no blocking deps, and phase approved (if linked)
CREATE VIEW ready_tasks AS
SELECT t.* FROM tasks t
LEFT JOIN initiative_phases p ON t.phase_id = p.id
WHERE t.status = 'open'
AND (t.phase_id IS NULL OR p.status IN ('approved', 'in_progress'))
AND NOT EXISTS (
SELECT 1 FROM task_dependencies d
JOIN tasks dep ON d.depends_on = dep.id
WHERE d.task_id = t.id
AND d.type = 'blocks'
AND dep.status != 'closed'
)
ORDER BY t.priority ASC, t.created_at ASC;
```
---
## Status Workflow
```
   ┌──────────────────cancel─────────────────┐
   │                                         ▼
[open] ──claim──▶ [in_progress] ──done──▶ [closed]
   ▲                    │
   │                    │ blocked
   │                    ▼
   └────unblock──── [blocked]
```
| Transition | Trigger | Notes |
|------------|---------|-------|
| `open` → `in_progress` | Agent claims task | Sets `assigned_to`, `assigned_at` |
| `in_progress` → `closed` | Work completed | Sets `closed_at`, `closed_reason` |
| `in_progress` → `blocked` | External dependency | Manual or auto-detected |
| `blocked` → `open` | Blocker resolved | Clears assignment |
| `open` → `closed` | Cancelled/won't do | Direct close without work |
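
The transition table can be enforced with a small lookup. This is a sketch with hypothetical names; treating `closed` as terminal is an assumption, based only on the table listing no transition out of it.

```typescript
// Sketch: validate a status change against the transition table.
type TaskStatus = 'open' | 'in_progress' | 'blocked' | 'closed';

const ALLOWED_TRANSITIONS: Record<TaskStatus, readonly TaskStatus[]> = {
  open: ['in_progress', 'closed'],
  in_progress: ['closed', 'blocked'],
  blocked: ['open'],
  closed: [], // assumption: closed tasks are never reopened
};

function canTransition(from: TaskStatus, to: TaskStatus): boolean {
  return ALLOWED_TRANSITIONS[from].includes(to);
}
```

A check like this could run before any status update, so that, for example, a blocked task cannot jump straight back to `in_progress` without first returning to `open`.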
---
## CLI Reference
All commands under `cw task` subcommand.
### Core Commands
| Command | Description |
|---------|-------------|
| `cw task ready` | List tasks ready for work (open + no blockers) |
| `cw task list [--status STATUS] [--project ID]` | List tasks with filters |
| `cw task show <id>` | Show task details + history |
| `cw task create <title> [-p PRIORITY] [-d DESC]` | Create new task |
| `cw task update <id> [--status STATUS] [--priority P]` | Update task fields |
| `cw task close <id> [--reason REASON]` | Mark task complete |
### Dependency Commands
| Command | Description |
|---------|-------------|
| `cw task dep add <task> <depends-on>` | Task blocked by another |
| `cw task dep rm <task> <depends-on>` | Remove dependency |
| `cw task dep tree <id>` | Show dependency graph |
### Assignment Commands
| Command | Description |
|---------|-------------|
| `cw task assign <id> <agent>` | Assign task to agent |
| `cw task unassign <id>` | Release task |
| `cw task mine` | List tasks assigned to current agent |
### Output Flags (global)
| Flag | Description |
|------|-------------|
| `--json` | Output as JSON (for agent consumption) |
| `--quiet` | Minimal output (just IDs) |
---
## Agent Workflow
Standard loop for Workers:
```
1. cw task ready --json
2. Pick highest priority task from result
3. cw task update <id> --status in_progress
4. Do the work
5. cw task close <id> --reason "Implemented X"
6. Loop to step 1
```
If `cw task ready` returns empty, the agent's work is done.
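
The loop above can be sketched in code as follows. Everything here is an assumption made for illustration: the `TaskClient` interface stands in for whatever wraps the `cw task` commands, the `--json` output is assumed to expose `id` and a numeric `priority` (lower meaning more urgent), and `maxIterations` is a safety cap that the workflow itself does not specify.

```typescript
// Hypothetical client; a real one might shell out to `cw task ... --json` or call tRPC.
interface ReadyTask { id: string; priority: number }
interface TaskClient {
  ready(): ReadyTask[]; // cw task ready --json
  claim(id: string): void; // cw task update <id> --status in_progress
  close(id: string, reason: string): void; // cw task close <id> --reason ...
}

function runWorkerLoop(
  client: TaskClient,
  doWork: (task: ReadyTask) => string, // returns the close reason
  maxIterations = 100, // safety cap, an assumption of this sketch
): string[] {
  const completed: string[] = [];
  for (let i = 0; i < maxIterations; i++) {
    const ready = client.ready();
    if (ready.length === 0) break; // nothing ready: the agent's work is done
    // Pick the most urgent task (lowest priority number, by assumption).
    const next = ready.reduce((best, t) => (t.priority < best.priority ? t : best));
    client.claim(next.id);
    const reason = doWork(next);
    client.close(next.id, reason);
    completed.push(next.id);
  }
  return completed;
}
```

Re-querying the ready list on every iteration matters because closing one task can unblock others.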
---
## Integration Points
### With Initiatives
- Tasks can link to an initiative via `initiative_id`
- When initiative is approved, tasks are generated from its technical concept
- Closing all tasks for an initiative signals initiative completion
### With Orchestrator
- Orchestrator queries `ready_tasks` view to dispatch work
- Assignment tracked to prevent double-dispatch
- Orchestrator can bulk-create tasks from job definitions
### With Workers
- Workers claim tasks via `cw task update <id> --status in_progress`
- Worker ID stored in `assigned_to`
- On worker crash, Supervisor can detect stale assignments and reset
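
One way a Supervisor might detect stale assignments is a simple age check on `assigned_at`. This sketch is illustrative only: the `AssignedTask` shape, the `findStaleAssignments` name, and the idea of a fixed timeout are assumptions, since the document does not say how staleness is determined.

```typescript
// Hypothetical shape: only the assignment fields mentioned above are modeled.
interface AssignedTask {
  id: string;
  status: string;
  assignedTo: string | null;
  assignedAt: number | null; // assumed epoch milliseconds
}

// Returns IDs of in-progress tasks whose assignment is older than timeoutMs.
function findStaleAssignments(
  tasks: AssignedTask[],
  nowMs: number,
  timeoutMs: number,
): string[] {
  return tasks
    .filter(
      (t) =>
        t.status === "in_progress" &&
        t.assignedTo !== null &&
        t.assignedAt !== null &&
        nowMs - t.assignedAt > timeoutMs,
    )
    .map((t) => t.id);
}
```

A heartbeat from the worker would be a more robust signal than elapsed time alone, since long-running tasks are not necessarily crashed ones.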
### tRPC Procedures
```typescript
// Suggested tRPC router shape
task.list(filters) // → Task[]
task.ready(projectId?) // → Task[]
task.get(id) // → Task | null
task.create(input) // → Task
task.update(id, input) // → Task
task.close(id, reason) // → Task
task.assign(id, agent) // → Task
task.history(id) // → TaskHistory[]
task.depAdd(id, dep) // → void
task.depRemove(id, dep) // → void
task.depTree(id) // → DependencyTree
```
---
## Task Granularity Standards
A task must be specific enough for execution without interpretation. Vague tasks cause agents to guess, leading to inconsistent results.
### Quick Reference
| Too Vague | Just Right |
|-----------|------------|
| "Add authentication" | "Add JWT auth with refresh rotation using jose, httpOnly cookie, 15min access / 7day refresh" |
| "Create the API" | "Create POST /api/projects accepting {name, description}, validates name 3-50 chars, returns 201" |
| "Handle errors" | "Wrap API calls in try/catch, return {error: string} on 4xx/5xx, show toast via sonner" |
### Required Task Components
Every task MUST include:
1. **files** — Exact paths modified/created
2. **action** — What to do, what to avoid, WHY
3. **verify** — Command or check to prove completion
4. **done** — Measurable acceptance criteria
See [task-granularity.md](task-granularity.md) for comprehensive examples and anti-patterns.
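
The four required components lend themselves to a mechanical check before a task is accepted. The sketch below assumes a task is represented as an object with `files`, `action`, `verify`, and `done` fields; the `TaskSpec` type and `missingComponents` function are hypothetical names.

```typescript
// Hypothetical task shape mirroring the four required components.
interface TaskSpec {
  files: string[];
  action: string;
  verify: string;
  done: string;
}

// Returns the names of required components that are absent or empty.
function missingComponents(task: Partial<TaskSpec>): (keyof TaskSpec)[] {
  const missing: (keyof TaskSpec)[] = [];
  if (!task.files || task.files.length === 0) missing.push("files");
  if (!task.action || task.action.trim() === "") missing.push("action");
  if (!task.verify || task.verify.trim() === "") missing.push("verify");
  if (!task.done || task.done.trim() === "") missing.push("done");
  return missing;
}
```

A check like this only proves the fields are present; whether the `action` is specific enough still needs review.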
### Context Budget
Tasks are sized to fit agent context budgets:
| Complexity | Context % | Example |
|------------|-----------|---------|
| Simple | 10-20% | Add form field |
| Medium | 20-35% | Create API endpoint |
| Complex | 35-50% | Implement auth flow |
| Too Large | >50% | **SPLIT REQUIRED** |
See [context-engineering.md](context-engineering.md) for context management rules.
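
The sizing table can be applied as a threshold lookup. This is a sketch with two assumptions the table leaves open: a value exactly on a boundary (20, 35, 50) falls into the lower tier, and anything under 10% counts as simple. `classifyContextUsage` is an illustrative name.

```typescript
type Complexity = "simple" | "medium" | "complex" | "too_large";

// Maps an estimated context percentage to the tiers in the table.
// Boundary values go to the lower tier (an assumption of this sketch).
function classifyContextUsage(percent: number): Complexity {
  if (!isFinite(percent) || percent < 0) {
    throw new RangeError(`Invalid context percentage: ${percent}`);
  }
  if (percent <= 20) return "simple";
  if (percent <= 35) return "medium";
  if (percent <= 50) return "complex";
  return "too_large"; // SPLIT REQUIRED
}
```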
---
## Deviation Handling
When Workers encounter unexpected issues during execution, they follow deviation rules:
| Rule | Action | Permission |
|------|--------|------------|
| Rule 1: Bug fixes | Auto-fix | None needed |
| Rule 2: Missing critical (validation, auth) | Auto-add | None needed |
| Rule 3: Blocking issues (deps, imports) | Auto-fix | None needed |
| Rule 4: Architectural changes | ASK | Required |
See [deviation-rules.md](deviation-rules.md) for detailed guidance.
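
The rule table reduces to one question for a Worker: may it proceed without asking? A minimal sketch follows; the `DeviationKind` labels and the `DEVIATION_RULES` constant are hypothetical names for the four rows above.

```typescript
type DeviationKind =
  | "bug_fix"
  | "missing_critical"
  | "blocking_issue"
  | "architectural_change";

interface DeviationRule {
  rule: 1 | 2 | 3 | 4;
  action: "auto-fix" | "auto-add" | "ask";
  permissionRequired: boolean;
}

// One entry per row of the deviation table.
const DEVIATION_RULES: Record<DeviationKind, DeviationRule> = {
  bug_fix: { rule: 1, action: "auto-fix", permissionRequired: false },
  missing_critical: { rule: 2, action: "auto-add", permissionRequired: false },
  blocking_issue: { rule: 3, action: "auto-fix", permissionRequired: false },
  architectural_change: { rule: 4, action: "ask", permissionRequired: true },
};

function canProceedWithoutAsking(kind: DeviationKind): boolean {
  return !DEVIATION_RULES[kind].permissionRequired;
}
```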
---
## Execution Artifacts
Task execution produces artifacts:
| Artifact | Purpose |
|----------|---------|
| Commits | Per-task atomic commits |
| SUMMARY.md | Record of what happened |
| STATE.md updates | Position tracking |
See [execution-artifacts.md](execution-artifacts.md) for artifact specifications.
---
## Future Considerations
- **Compaction**: Summarize old closed tasks to reduce DB size (beads does this with an LLM)
- **Labels/tags**: Additional categorization beyond type
- **Time tracking**: Estimated vs actual time for capacity planning
- **Recurring tasks**: Templates that spawn new tasks on schedule

322
docs/verification.md Normal file

@@ -0,0 +1,322 @@
# Goal-Backward Verification
Verification confirms that **goals are achieved**, not merely that **tasks were completed**. A completed task "create chat component" does not guarantee the goal "working chat interface" is met.
## Core Principle
**Task completion ≠ Goal achievement**
Tasks are implementation steps. Goals are user outcomes. Verification bridges the gap by checking observable outcomes, not just checklist items.
---
## Verification Levels
### Level 1: Existence Check
Does the artifact exist?
```
✓ File exists at expected path
✓ Component is exported
✓ Route is registered
```
### Level 2: Substance Check
Is the artifact substantive (not a stub)?
```
✓ Function has implementation (not just return null)
✓ Component renders content (not empty div)
✓ API returns meaningful response (not placeholder)
```
### Level 3: Wiring Check
Is the artifact connected to the system?
```
✓ Component is rendered somewhere
✓ API endpoint is called by client
✓ Event handler is attached
✓ Database query is executed
```
**All three levels must pass for verification success.**
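
Because the levels build on each other, it is useful to report the first one that fails. The sketch below assumes each level has already been reduced to a boolean by some other check; `LevelChecks`, `firstFailedLevel`, and `isVerified` are illustrative names.

```typescript
// Hypothetical result of the three checks for a single artifact.
interface LevelChecks {
  exists: boolean; // Level 1
  substantive: boolean; // Level 2
  wired: boolean; // Level 3
}

// Returns the first failing level (1, 2, or 3), or null when all three pass.
function firstFailedLevel(checks: LevelChecks): 1 | 2 | 3 | null {
  if (!checks.exists) return 1;
  if (!checks.substantive) return 2;
  if (!checks.wired) return 3;
  return null;
}

function isVerified(checks: LevelChecks): boolean {
  return firstFailedLevel(checks) === null;
}
```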
---
## Must-Have Derivation
Before verification, derive what "done" means from the goal:
### 1. Observable Truths (3-7 user perspectives)
What can a user observe when the goal is achieved?
```yaml
observable_truths:
  - "User can click 'Send' and message appears in chat"
  - "Messages persist after page refresh"
  - "New messages appear without page reload"
  - "User sees typing indicator when other party types"
```
### 2. Required Artifacts
What files MUST exist?
```yaml
required_artifacts:
  - path: src/components/Chat.tsx
    check: "Exports Chat component"
  - path: src/api/messages.ts
    check: "Exports sendMessage, getMessages"
  - path: src/hooks/useChat.ts
    check: "Exports useChat hook"
```
### 3. Required Wiring
What connections MUST work?
```yaml
required_wiring:
  - from: Chat.tsx
    to: useChat.ts
    check: "Component calls hook"
  - from: useChat.ts
    to: messages.ts
    check: "Hook calls API"
  - from: messages.ts
    to: database
    check: "API persists to DB"
```
### 4. Key Links (Where Stubs Hide)
What integration points commonly fail?
```yaml
key_links:
  - "Form onSubmit → API call (not just console.log)"
  - "WebSocket connection → message handler"
  - "API response → state update → render"
```
---
## Verification Process
### Phase Verification
After all tasks in a phase complete:
```
1. Load must-haves (from phase goal or PLAN frontmatter)
2. For each observable truth:
   a. Level 1: Does the relevant code exist?
   b. Level 2: Is it substantive?
   c. Level 3: Is it wired?
3. For each required artifact:
   a. Verify file exists
   b. Verify not a stub
   c. Verify it's imported/used
4. For each key link:
   a. Trace the connection
   b. Verify data flows
5. Scan for anti-patterns (see below)
6. Structure gaps for re-planning
```
### Anti-Pattern Scanning
Check for common incomplete work:
| Pattern | Detection | Meaning |
|---------|-----------|---------|
| `// TODO` | Grep for TODO comments | Work deferred |
| `throw new Error('Not implemented')` | Grep for stub errors | Placeholder code |
| `return null` / `return {}` | AST analysis | Empty implementations |
| `console.log` in handlers | Grep for console.log | Debug code left behind |
| Empty catch blocks | AST analysis | Swallowed errors |
| Hardcoded values | Manual review | Missing configuration |
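
The grep-detectable rows of the table can be approximated with a line-by-line regex scan. This sketch covers only those three patterns; the AST-based and manual rows are out of scope, and it flags every `console.log` rather than only those inside handlers. `GREP_PATTERNS` and `scanSource` are hypothetical names.

```typescript
interface Finding {
  line: number; // 1-based line number
  pattern: string;
  meaning: string;
}

// Only the rows detectable by grep; regexes are deliberately simple.
const GREP_PATTERNS = [
  { name: "TODO comment", regex: /\/\/\s*TODO/, meaning: "Work deferred" },
  {
    name: "stub error",
    regex: /throw new Error\((['"])Not implemented\1\)/,
    meaning: "Placeholder code",
  },
  { name: "console.log", regex: /console\.log\(/, meaning: "Debug code left behind" },
];

function scanSource(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, index) => {
    for (const p of GREP_PATTERNS) {
      if (p.regex.test(text)) {
        findings.push({ line: index + 1, pattern: p.name, meaning: p.meaning });
      }
    }
  });
  return findings;
}
```

Each finding carries a line number so it can be reported as a `location` in the verification output.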
---
## Verification Output
### Pass Case
```yaml
# 2-VERIFICATION.md
phase: 2
status: PASS
verified_at: 2024-01-15T10:30:00Z
observable_truths:
  - truth: "User can send message"
    status: VERIFIED
    evidence: "Chat.tsx:45 calls sendMessage on submit"
  - truth: "Messages persist"
    status: VERIFIED
    evidence: "messages.ts:23 inserts to SQLite"
required_artifacts:
  - path: src/components/Chat.tsx
    status: EXISTS
    check: PASSED
  - path: src/api/messages.ts
    status: EXISTS
    check: PASSED
anti_patterns_found: []
human_verification_needed:
  - "Visual layout matches design"
  - "Real-time updates work under load"
```
### Fail Case (Gaps Found)
```yaml
# 2-VERIFICATION.md
phase: 2
status: GAPS_FOUND
verified_at: 2024-01-15T10:30:00Z
gaps:
  - type: STUB
    location: src/hooks/useChat.ts:34
    description: "sendMessage returns immediately without API call"
    severity: BLOCKING
  - type: MISSING_WIRING
    location: src/components/Chat.tsx
    description: "WebSocket not connected, no real-time updates"
    severity: BLOCKING
  - type: ANTI_PATTERN
    location: src/api/messages.ts:67
    description: "Empty catch block swallows errors"
    severity: HIGH
remediation_plan:
  - "Connect useChat to actual API endpoint"
  - "Initialize WebSocket in Chat component"
  - "Add error handling to API calls"
```
---
## User Acceptance Testing (UAT)
Verification confirms code correctness. UAT confirms user experience.
### UAT Process
1. Extract testable deliverables from phase goal
2. Walk user through each one:
   - "Can you log in with your email?"
   - "Does the dashboard show your projects?"
   - "Can you create a new project?"
3. Record result: PASS, FAIL, or describe issue
4. If issues found:
   - Diagnose root cause
   - Create targeted fix plan
5. If all pass: Phase complete
### UAT Output
```yaml
# 2-UAT.md
phase: 2
tested_by: user
tested_at: 2024-01-15T14:00:00Z
test_cases:
  - case: "Login with email"
    result: PASS
  - case: "Dashboard shows projects"
    result: FAIL
    issue: "Shows loading spinner forever"
    diagnosis: "API returns 500, missing auth header"
  - case: "Create new project"
    result: BLOCKED
    reason: "Cannot test, dashboard not loading"
fix_required: true
fix_plan:
  - task: "Add auth header to dashboard API call"
    files: [src/api/projects.ts]
    priority: P0
```
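
The `fix_required` flag in a UAT record can be derived from the test cases rather than set by hand. The sketch below assumes the three result values that appear in the example (PASS, FAIL, BLOCKED) are the full set; `UatCase` and `summarizeUat` are hypothetical names.

```typescript
type UatResult = "PASS" | "FAIL" | "BLOCKED";

interface UatCase {
  case: string;
  result: UatResult;
  issue?: string;
}

interface UatSummary {
  fixRequired: boolean;
  failed: string[];
  blocked: string[];
}

// A phase needs a fix plan as soon as any case is not a PASS.
function summarizeUat(cases: UatCase[]): UatSummary {
  const failed = cases.filter((c) => c.result === "FAIL").map((c) => c.case);
  const blocked = cases.filter((c) => c.result === "BLOCKED").map((c) => c.case);
  return { fixRequired: failed.length > 0 || blocked.length > 0, failed, blocked };
}
```

Blocked cases are tracked separately because they usually clear once the failing case that blocks them is fixed.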
---
## Integration with Task Workflow
### Task Completion Hook
When task closes:
1. Worker marks task closed with reason
2. If all phase tasks closed, trigger phase verification
3. Verifier agent runs goal-backward check
4. If PASS: Phase marked complete
5. If GAPS: Create remediation tasks, phase stays in_progress
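
The hook's decision logic can be sketched as a pure function. This is illustrative only: `onTaskClosed`, `PhaseTask`, and `HookOutcome` are hypothetical names, the verifier is passed in as a callback, and treating a phase with no tasks as not yet verifiable is an assumption of the sketch.

```typescript
interface PhaseTask { id: string; status: string }
interface VerificationResult {
  status: "PASS" | "GAPS_FOUND";
  remediationPlan: string[];
}
interface HookOutcome {
  verificationRan: boolean;
  phaseStatus: "in_progress" | "complete";
  remediationTasks: string[];
}

function onTaskClosed(
  phaseTasks: PhaseTask[],
  verify: () => VerificationResult, // stands in for the Verifier agent
): HookOutcome {
  const allClosed =
    phaseTasks.length > 0 && phaseTasks.every((t) => t.status === "closed");
  if (!allClosed) {
    // Other tasks are still open: no verification yet.
    return { verificationRan: false, phaseStatus: "in_progress", remediationTasks: [] };
  }
  const result = verify();
  if (result.status === "PASS") {
    return { verificationRan: true, phaseStatus: "complete", remediationTasks: [] };
  }
  // Gaps found: the phase stays in progress and remediation tasks are created.
  return {
    verificationRan: true,
    phaseStatus: "in_progress",
    remediationTasks: result.remediationPlan,
  };
}
```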
### Verification Task Type
Verification itself is a task:
```yaml
type: verification
phase_id: phase-2
status: open
assigned_to: verifier-agent
priority: P0 # Always high priority
```
---
## Checkpoint Types
During execution, agents may need human input. Use precise checkpoint types:
### checkpoint:human-verify (90% of checkpoints)
Agent completed work, user confirms it works.
```yaml
checkpoint: human-verify
prompt: "Can you log in with email and password?"
expected: "User confirms successful login"
```
### checkpoint:decision (9% of checkpoints)
User must make implementation choice.
```yaml
checkpoint: decision
prompt: "OAuth2 or SAML for SSO?"
options:
  - OAuth2: "Simpler, most common"
  - SAML: "Enterprise requirement"
```
### checkpoint:human-action (1% of checkpoints)
Truly unavoidable manual step.
```yaml
checkpoint: human-action
prompt: "Click the email verification link"
reason: "Cannot automate email client interaction"
```
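
The three checkpoint shapes map naturally onto a discriminated union, which lets the compiler reject a checkpoint that mixes fields from different types. This is a sketch: the `Checkpoint` type and `describeCheckpoint` function are hypothetical, with field names taken from the YAML examples above.

```typescript
type Checkpoint =
  | { checkpoint: "human-verify"; prompt: string; expected: string }
  | { checkpoint: "decision"; prompt: string; options: Array<Record<string, string>> }
  | { checkpoint: "human-action"; prompt: string; reason: string };

// Renders a one-line summary; the switch is exhaustive over the union.
function describeCheckpoint(cp: Checkpoint): string {
  switch (cp.checkpoint) {
    case "human-verify":
      return `Confirm: ${cp.prompt} (expected: ${cp.expected})`;
    case "decision": {
      const names = cp.options.map((o) => Object.keys(o).join("/")).join(", ");
      return `Choose: ${cp.prompt} [${names}]`;
    }
    case "human-action":
      return `Do manually: ${cp.prompt} (${cp.reason})`;
  }
}
```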
---
## Human Verification Needs
Some verifications require human eyes:
| Category | Examples | Why Human |
|----------|----------|-----------|
| Visual | Layout, spacing, colors | Subjective/design judgment |
| Real-time | WebSocket, live updates | Requires interaction |
| External | OAuth flow, payment | Third-party systems |
| Accessibility | Screen reader, keyboard nav | Requires tooling/expertise |
**Mark these explicitly** in verification output. Don't claim PASS when human verification is pending.