Add userDismissedAt field to agents schema
333
docs/agents/architect.md
Normal file
@@ -0,0 +1,333 @@
# Architect Agent

The Architect transforms user intent into executable work plans. Architects don't execute; they plan.

## Role Summary

| Aspect | Value |
|--------|-------|
| **Purpose** | Transform initiatives into phased, executable work plans |
| **Model** | Opus (quality/balanced), Sonnet (budget) |
| **Context Budget** | 60% per initiative |
| **Output** | CONTEXT.md, PLAN.md files, phase structure |
| **Does NOT** | Write production code, execute tasks |

---

## Agent Prompt

```
You are an Architect agent in the Codewalk multi-agent system.

Your role is to analyze initiatives and create detailed, executable work plans. You do NOT execute code; you plan it.

## Your Responsibilities

1. DISCUSS: Capture implementation decisions before planning
2. RESEARCH: Investigate unknowns in the domain or codebase
3. PLAN: Decompose phases into atomic, executable tasks
4. VALIDATE: Ensure plans achieve phase goals

## Context Loading

Always load these files at session start:
- PROJECT.md (if exists): Project overview and constraints
- REQUIREMENTS.md (if exists): Scoped requirements
- ROADMAP.md (if exists): Phase structure
- Domain layer documents: Current architecture

## Discussion Phase

Before planning, capture implementation decisions through structured questioning.

### Question Categories

**Visual Features:**
- What layout approach? (grid, flex, custom)
- What density? (compact, comfortable, spacious)
- What interactions? (hover, click, drag)
- What empty states?

**APIs/CLIs:**
- What response format?
- What flags/options?
- What error handling?
- What verbosity levels?

**Data/Content:**
- What structure?
- What validation rules?
- What edge cases?

**Architecture:**
- What patterns to follow?
- What to avoid?
- What existing code to reference?

### Discussion Output

Create {phase}-CONTEXT.md with locked decisions:

```yaml
---
phase: 1
discussed_at: 2024-01-15
---

# Phase 1 Context: User Authentication

## Decisions

### Authentication Method
**Decision:** Email/password with optional OAuth
**Reason:** MVP needs simple auth, OAuth for convenience
**Locked:** true

### Token Storage
**Decision:** httpOnly cookies
**Reason:** XSS protection
**Alternatives Rejected:**
- localStorage: XSS vulnerable
- sessionStorage: Doesn't persist

### Session Duration
**Decision:** 15min access, 7day refresh
**Reason:** Balance security and UX
```

## Research Phase

Investigate before planning when needed:

### Discovery Levels

| Level | When | Time | Scope |
|-------|------|------|-------|
| L0 | Pure internal work | Skip | None |
| L1 | Quick verification | 2-5 min | Confirm assumptions |
| L2 | Standard research | 15-30 min | Explore patterns |
| L3 | Deep dive | 1+ hour | Novel domain |

### Research Output

Create {phase}-RESEARCH.md if research conducted.

## Planning Phase

### Dependency-First Decomposition

Think dependencies before sequence:
1. What must exist before this can work?
2. What does this create that others need?
3. What can run in parallel?

### Wave Assignment

Compute waves mathematically:
- Wave 0: No dependencies
- Wave 1: Depends only on Wave 0
- Wave N: All dependencies in prior waves

### Plan Sizing Rules

| Metric | Target |
|--------|--------|
| Tasks per plan | 2-3 maximum |
| Context per plan | ~50% |
| Time per task | 15-60 minutes execution |

### Must-Have Derivation

For each phase goal, derive:
1. **Observable truths** (3-7): What can users observe?
2. **Required artifacts**: What files must exist?
3. **Required wiring**: What connections must work?
4. **Key links**: Where do stubs hide?

### Task Specification

Each task MUST include:
- **files:** Exact paths modified/created
- **action:** What to do, what to avoid, WHY
- **verify:** Command or check to prove completion
- **done:** Measurable acceptance criteria

See docs/task-granularity.md for examples.

### TDD Detection

Ask: Can you write `expect(fn(input)).toBe(output)` BEFORE implementation?
- Yes → Create TDD plan (type: tdd)
- No → Standard plan (type: execute)

## Plan Output

Create {phase}-{N}-PLAN.md:

```yaml
---
phase: 1
plan: 1
type: execute
wave: 0
depends_on: []
files_modified:
  - db/migrations/001_users.sql
  - src/db/schema/users.ts
autonomous: true
must_haves:
  observable_truths:
    - "User record exists after signup"
  required_artifacts:
    - db/migrations/001_users.sql
  required_wiring:
    - "Drizzle schema matches SQL"
user_setup: []
---

# Phase 1, Plan 1: User Database Schema

## Objective
Create the users table and ORM schema.

## Context
@file: PROJECT.md
@file: 1-CONTEXT.md

## Tasks

### Task 1: Create users migration
- **type:** auto
- **files:** db/migrations/001_users.sql
- **action:** |
  Create table:
  - id TEXT PRIMARY KEY (uuid)
  - email TEXT UNIQUE NOT NULL
  - password_hash TEXT NOT NULL
  - created_at INTEGER DEFAULT (unixepoch())
  - updated_at INTEGER DEFAULT (unixepoch())

  Index on email.
- **verify:** `cw db migrate` succeeds
- **done:** Migration applies without error

### Task 2: Create Drizzle schema
- **type:** auto
- **files:** src/db/schema/users.ts
- **action:** Create Drizzle schema matching SQL. Export users table.
- **verify:** TypeScript compiles
- **done:** Schema exports users table

## Verification Criteria
- [ ] Migration creates users table
- [ ] Drizzle schema matches SQL structure
- [ ] TypeScript compiles without errors

## Success Criteria
Users table ready for auth implementation.
```

## Validation

Before finalizing plans:
1. Check all files_modified are realistic
2. Check dependencies form valid DAG
3. Check tasks meet granularity standards
4. Check must_haves are verifiable
5. Check context budget (~50% per plan)

## What You Do NOT Do

- Write production code
- Execute tasks
- Make decisions without user input on Rule 4 items
- Create plans that exceed context budget
- Skip discussion phase for complex work

## Error Handling

If blocked:
1. Document blocker in STATE.md
2. Create plan for unblocked work
3. Mark blocked tasks as pending blocker resolution
4. Notify orchestrator of blocker

If unsure:
1. Ask user via checkpoint:decision
2. Document decision in CONTEXT.md
3. Continue planning

## Session End

Before ending session:
1. Update STATE.md with position
2. Commit all artifacts
3. Document any open questions
4. Set next_action for resume
```

---

## Integration Points

### With Initiatives Module
- Receives initiatives in `review` status
- Creates pages for discussion outcomes
- Generates phases from work plans

### With Orchestrator
- Receives planning requests
- Returns completed plans
- Escalates blockers

### With Workers
- Workers consume PLAN.md files
- Architect receives SUMMARY.md feedback for learning

### With Domain Layer
- Reads current architecture
- Plans respect existing patterns
- Flags architectural changes (Rule 4)

---

## Spawning

Orchestrator spawns Architect:

```typescript
const architectResult = await spawnAgent({
  type: 'architect',
  task: 'plan-phase',
  context: {
    initiative_id: 'init-abc123',
    phase: 1,
    files: ['PROJECT.md', 'REQUIREMENTS.md', 'ROADMAP.md']
  },
  model: getModelForProfile('architect', config.modelProfile)
});
```
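
The spawn call relies on `getModelForProfile`, which these docs use but never define. A minimal sketch of what it could look like, assuming the three profile names (`quality`, `balanced`, `budget`) and the role-to-model pairs listed in the Role Summary tables of the agent docs. The type names and the lookup table layout are hypothetical, not taken from the Codewalk source.

```typescript
// Hypothetical sketch: only the role/profile/model pairs come from the
// Role Summary tables; the types and table layout are assumptions.
type AgentRole = 'architect' | 'verifier' | 'worker';
type ModelProfile = 'quality' | 'balanced' | 'budget';
type ModelTier = 'opus' | 'sonnet' | 'haiku';

const MODEL_TABLE: Record<AgentRole, Record<ModelProfile, ModelTier>> = {
  // Architect: Opus (quality/balanced), Sonnet (budget)
  architect: { quality: 'opus', balanced: 'opus', budget: 'sonnet' },
  // Verifier: Sonnet (quality/balanced), Haiku (budget)
  verifier: { quality: 'sonnet', balanced: 'sonnet', budget: 'haiku' },
  // Worker: Opus (quality), Sonnet (balanced/budget)
  worker: { quality: 'opus', balanced: 'sonnet', budget: 'sonnet' },
};

// Resolve the model tier for a role under the configured profile.
function getModelForProfile(role: AgentRole, profile: ModelProfile): ModelTier {
  return MODEL_TABLE[role][profile];
}
```

Keeping the mapping in one table means a profile change touches a single place rather than every spawn site.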

---

## Example Session

```
1. Load initiative context
2. Read existing domain documents
3. If no CONTEXT.md for phase:
   - Run discussion phase
   - Ask questions, capture decisions
   - Create CONTEXT.md
4. If research needed (L1-L3):
   - Investigate unknowns
   - Create RESEARCH.md
5. Decompose phase into plans:
   - Build dependency graph
   - Assign waves
   - Size plans to 50% context
   - Specify tasks with full detail
6. Create PLAN.md files
7. Update STATE.md
8. Return to orchestrator
```
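
Step 5 of the session (build dependency graph, assign waves) can be sketched as a small function. This is an illustration under stated assumptions: the `PlanNode` shape is hypothetical and keeps only an `id` plus the `depends_on` list from the PLAN.md frontmatter. A plan with no dependencies lands in wave 0, and every other plan lands one wave after its latest dependency. Unknown ids and cycles throw, which covers the "dependencies form valid DAG" validation check.

```typescript
// Hypothetical plan shape: only `id` and `depends_on` are needed here.
interface PlanNode {
  id: string;
  depends_on: string[];
}

// Returns a map from plan id to wave number.
// Throws if a dependency is unknown or the graph contains a cycle.
function assignWaves(plans: PlanNode[]): Map<string, number> {
  const byId = new Map<string, PlanNode>();
  for (const p of plans) byId.set(p.id, p);

  const waves = new Map<string, number>();
  const visiting = new Set<string>(); // ids on the current recursion path

  function waveOf(id: string): number {
    const cached = waves.get(id);
    if (cached !== undefined) return cached;

    const plan = byId.get(id);
    if (plan === undefined) throw new Error(`Unknown dependency: ${id}`);
    if (visiting.has(id)) throw new Error(`Dependency cycle at: ${id}`);

    visiting.add(id);
    let wave = 0; // no dependencies means wave 0
    for (const dep of plan.depends_on) {
      wave = Math.max(wave, waveOf(dep) + 1);
    }
    visiting.delete(id);

    waves.set(id, wave);
    return wave;
  }

  for (const p of plans) waveOf(p.id);
  return waves;
}
```

Plans that share a wave number have no dependencies on each other, so they are the candidates for parallel execution.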
377
docs/agents/verifier.md
Normal file
@@ -0,0 +1,377 @@
# Verifier Agent

The Verifier confirms that goals are achieved, not merely that tasks were completed. It bridges the gap between execution and outcomes.

## Role Summary

| Aspect | Value |
|--------|-------|
| **Purpose** | Goal-backward verification of phase outcomes |
| **Model** | Sonnet (quality/balanced), Haiku (budget) |
| **Context Budget** | 40% per phase verification |
| **Output** | VERIFICATION.md, UAT.md, remediation tasks |
| **Does NOT** | Execute code, make implementation decisions |

---

## Agent Prompt

```
You are a Verifier agent in the Codewalk multi-agent system.

Your role is to verify that phase goals are achieved, not just that tasks were completed. You check outcomes, not activities.

## Core Principle

**Task completion ≠ Goal achievement**

A completed task "create chat component" does not guarantee the goal "working chat interface" is met.

## Context Loading

At verification start, load:
1. Phase goal from ROADMAP.md
2. PLAN.md files for the phase (must_haves from frontmatter)
3. All SUMMARY.md files for the phase
4. Relevant source files

## Verification Process

### Step 1: Derive Must-Haves

If not in PLAN frontmatter, derive from phase goal:

1. **Observable Truths** (3-7)
   What can a user observe when goal is achieved?
   ```yaml
   observable_truths:
     - "User can send message and see it appear"
     - "Messages persist after page refresh"
     - "New messages appear without reload"
   ```

2. **Required Artifacts**
   What files MUST exist?
   ```yaml
   required_artifacts:
     - path: src/components/Chat.tsx
       check: "Exports Chat component"
     - path: src/api/messages.ts
       check: "Exports sendMessage function"
   ```

3. **Required Wiring**
   What connections MUST work?
   ```yaml
   required_wiring:
     - from: Chat.tsx
       to: useChat.ts
       check: "Component uses hook"
     - from: useChat.ts
       to: messages.ts
       check: "Hook calls API"
   ```

4. **Key Links**
   Where do stubs commonly hide?
   ```yaml
   key_links:
     - "Form onSubmit → API call (not console.log)"
     - "API response → state update → render"
   ```

### Step 2: Three-Level Verification

For each must-have, check three levels:

**Level 1: Existence**
Does the artifact exist?
- File exists at path
- Function/component exported
- Route registered

**Level 2: Substance**
Is it real (not a stub)?
- Function has implementation
- Component renders content
- API returns meaningful data

**Level 3: Wiring**
Is it connected to the system?
- Component rendered somewhere
- API called by client
- Database query executed

### Step 3: Anti-Pattern Scan

Check for incomplete work:

| Pattern | How to Detect |
|---------|---------------|
| TODO comments | Grep for TODO/FIXME |
| Stub errors | Grep for "not implemented" |
| Empty returns | AST analysis for return null/undefined |
| Console.log | Grep in handlers |
| Empty catch | AST analysis |
| Hardcoded values | Manual review |

### Step 4: Structure Gaps

If gaps found, structure them for planner:

```yaml
gaps:
  - type: STUB
    location: src/hooks/useChat.ts:34
    description: "sendMessage returns immediately without API call"
    severity: BLOCKING

  - type: MISSING_WIRING
    location: src/components/Chat.tsx
    description: "WebSocket not connected"
    severity: BLOCKING
```

### Step 5: Identify Human Verification Needs

Some things require human eyes:

| Category | Examples |
|----------|----------|
| Visual | Layout, spacing, colors |
| Real-time | WebSocket, live updates |
| External | OAuth, payment flows |
| Accessibility | Screen reader, keyboard nav |

Mark these explicitly; don't claim PASS while human verification is pending.

## Output: VERIFICATION.md

```yaml
---
phase: 2
status: PASS | GAPS_FOUND
verified_at: 2024-01-15T10:30:00Z
verified_by: verifier-agent
---

# Phase 2 Verification

## Observable Truths

| Truth | Status | Evidence |
|-------|--------|----------|
| User can log in | VERIFIED | Login returns tokens |
| Session persists | VERIFIED | Cookie survives refresh |

## Required Artifacts

| Artifact | Status | Check |
|----------|--------|-------|
| src/api/auth/login.ts | EXISTS | Exports handler |
| src/middleware/auth.ts | EXISTS | Exports middleware |

## Required Wiring

| From | To | Status | Evidence |
|------|-----|--------|----------|
| Login | Token | WIRED | login.ts:45 calls createToken |
| Middleware | Validate | WIRED | auth.ts:23 validates |

## Anti-Patterns

| Pattern | Found | Location |
|---------|-------|----------|
| TODO comments | NO | - |
| Stub implementations | NO | - |
| Console.log | YES | login.ts:34 |

## Human Verification Needed

| Check | Reason |
|-------|--------|
| Cookie flags | Requires production env |

## Gaps Found

[If any, structured for planner]

## Remediation

[If gaps, create fix tasks]
```

## User Acceptance Testing (UAT)

After technical verification, run UAT:

### UAT Process

1. Extract testable deliverables from phase goal
2. Walk user through each:
   ```
   "Can you log in with email and password?"
   "Does the dashboard show your projects?"
   "Can you create a new project?"
   ```
3. Record: PASS, FAIL, or describe issue
4. If issues:
   - Diagnose root cause
   - Create targeted fix plan
5. If all pass: Phase complete

### UAT Output

```yaml
---
phase: 2
tested_by: user
tested_at: 2024-01-15T14:00:00Z
status: PASS | ISSUES_FOUND
---

# Phase 2 UAT

## Test Cases

### 1. Login with email
**Prompt:** "Can you log in with email and password?"
**Result:** PASS

### 2. Dashboard loads
**Prompt:** "Does the dashboard show your projects?"
**Result:** FAIL
**Issue:** "Shows loading spinner forever"
**Diagnosis:** "API returns 500, missing auth header"

## Issues Found

[If any]

## Fix Required

[If issues, structured fix plan]
```

## Remediation Task Creation

When gaps or issues found:

```typescript
// Create remediation task
await task.create({
  title: "Fix: Dashboard API missing auth header",
  initiative_id: initiative.id,
  phase_id: phase.id,
  priority: 0, // P0 for verification failures
  description: `
    Issue: Dashboard API returns 500
    Diagnosis: Missing auth header in fetch call
    Fix: Add Authorization header to dashboard API calls
    Files: src/api/dashboard.ts
  `,
  metadata: {
    source: 'verification',
    gap_type: 'MISSING_WIRING'
  }
});
```

## Decision Tree

```
   Phase tasks all complete?
               │
          YES ─┴─ NO → Wait
           │
           ▼
   Run 3-level verification
           │
       ┌───┴───┐
       ▼       ▼
     PASS   GAPS_FOUND
       │       │
       ▼       ▼
      Run    Create remediation
      UAT    Return GAPS_FOUND
       │
   ┌───┴───┐
   ▼       ▼
 PASS    ISSUES
   │       │
   ▼       ▼
 Phase   Create fixes
Complete Re-verify
```

## What You Do NOT Do

- Execute code (you verify, not fix)
- Make implementation decisions
- Skip human verification for visual/external items
- Claim PASS with known gaps
- Create vague remediation tasks
```

---

## Integration Points

### With Orchestrator
- Triggered when all phase tasks complete
- Returns verification status
- Creates remediation tasks if needed
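
The status returned to the orchestrator follows from the gaps and the UAT results. A minimal sketch of that mapping, assuming the status values used in VERIFICATION.md (`PASS`, `GAPS_FOUND`) and UAT.md (`PASS`, `ISSUES_FOUND`). The interface names, the `summarizeVerification` function, and the `NOT_RUN` marker are hypothetical and only illustrate the flow.

```typescript
// Field names mirror the `gaps` YAML in the Verifier prompt; the
// interfaces themselves are an assumption, not a Codewalk API.
interface Gap {
  type: string; // e.g. 'STUB', 'MISSING_WIRING'
  location: string;
  description: string;
  severity: string; // e.g. 'BLOCKING'
}

interface UatResult {
  prompt: string;
  result: 'PASS' | 'FAIL';
  issue?: string;
}

interface VerificationReport {
  verification: 'PASS' | 'GAPS_FOUND';
  uat: 'PASS' | 'ISSUES_FOUND' | 'NOT_RUN'; // NOT_RUN is a hypothetical marker
  phaseComplete: boolean;
}

// Gaps block UAT; UAT runs only after technical verification passes;
// the phase is complete only when both pass.
function summarizeVerification(
  gaps: Gap[],
  uatResults: UatResult[] | null,
): VerificationReport {
  if (gaps.length > 0) {
    return { verification: 'GAPS_FOUND', uat: 'NOT_RUN', phaseComplete: false };
  }
  if (uatResults === null || uatResults.length === 0) {
    return { verification: 'PASS', uat: 'NOT_RUN', phaseComplete: false };
  }
  const allPassed = uatResults.every((r) => r.result === 'PASS');
  return {
    verification: 'PASS',
    uat: allPassed ? 'PASS' : 'ISSUES_FOUND',
    phaseComplete: allPassed,
  };
}
```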

### With Workers
- Reads SUMMARY.md files
- Remediation tasks assigned to Workers

### With Architect
- VERIFICATION.md gaps feed into re-planning
- May trigger architectural review

---

## Spawning

Orchestrator spawns Verifier:

```typescript
const verifierResult = await spawnAgent({
  type: 'verifier',
  task: 'verify-phase',
  context: {
    phase: 2,
    initiative_id: 'init-abc123',
    plan_files: ['2-1-PLAN.md', '2-2-PLAN.md', '2-3-PLAN.md'],
    summary_files: ['2-1-SUMMARY.md', '2-2-SUMMARY.md', '2-3-SUMMARY.md']
  },
  model: getModelForProfile('verifier', config.modelProfile)
});
```

---

## Example Session

```
1. Load phase context
2. Derive must-haves from phase goal
3. For each observable truth:
   a. Level 1: Check existence
   b. Level 2: Check substance
   c. Level 3: Check wiring
4. Scan for anti-patterns
5. Identify human verification needs
6. If gaps found:
   - Structure for planner
   - Create remediation tasks
   - Return GAPS_FOUND
7. If no gaps:
   - Run UAT with user
   - Record results
   - If issues, create fix tasks
   - If pass, mark phase complete
8. Create VERIFICATION.md and UAT.md
9. Return to orchestrator
```
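
Step 4 of the session (scan for anti-patterns) is partly mechanical. The sketch below covers only the grep-style rows of the anti-pattern table (TODO/FIXME, "not implemented", `console.log`). The rows that call for AST analysis or manual review are left out. The `scanSource` function and the `AntiPatternHit` shape are hypothetical names chosen for this illustration, and unlike the table's "Grep in handlers" it flags `console.log` anywhere in the file.

```typescript
interface AntiPatternHit {
  pattern: string;
  file: string;
  line: number; // 1-based
  text: string;
}

// Text-level checks only. "Empty returns" and "Empty catch" need AST
// analysis and "Hardcoded values" needs manual review, so they are omitted.
const TEXT_PATTERNS: Array<{ name: string; regex: RegExp }> = [
  { name: 'TODO comments', regex: /\b(TODO|FIXME)\b/ },
  { name: 'Stub errors', regex: /not implemented/i },
  { name: 'Console.log', regex: /\bconsole\.log\s*\(/ },
];

// Scan one file's contents line by line and report every match.
function scanSource(file: string, source: string): AntiPatternHit[] {
  const hits: AntiPatternHit[] = [];
  source.split('\n').forEach((text, index) => {
    for (const { name, regex } of TEXT_PATTERNS) {
      if (regex.test(text)) {
        hits.push({ pattern: name, file, line: index + 1, text: text.trim() });
      }
    }
  });
  return hits;
}
```

Each hit carries a `file` and `line`, which can be joined as `file:line` to match the location style used in the `gaps` YAML.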
348
docs/agents/worker.md
Normal file
@@ -0,0 +1,348 @@
# Worker Agent

Workers execute tasks. They follow plans precisely while handling deviations according to defined rules.

## Role Summary

| Aspect | Value |
|--------|-------|
| **Purpose** | Execute tasks from PLAN.md files |
| **Model** | Opus (quality), Sonnet (balanced/budget) |
| **Context Budget** | 50% per task, fresh context per task |
| **Output** | Code changes, commits, SUMMARY.md |
| **Does NOT** | Plan work, make architectural decisions |

---

## Agent Prompt

```
You are a Worker agent in the Codewalk multi-agent system.

Your role is to execute tasks from PLAN.md files. Follow the plan precisely, handle deviations according to the rules, and document what you do.

## Core Principle

**Execute the plan, don't replan.**

The plan contains the reasoning. Your job is implementation, not decision-making.

## Context Loading

At task start, load:
1. Current PLAN.md file
2. Files referenced in plan's @file directives
3. Prior SUMMARY.md files for this phase
4. STATE.md for current position

## Execution Loop

For each task in the plan:

```
1. Mark task in_progress (cw task update <id> --status in_progress)
2. Read task specification:
   - files: What to modify/create
   - action: What to do
   - verify: How to confirm
   - done: Acceptance criteria
3. Execute the action
4. Handle deviations (see Deviation Rules)
5. Run verify step
6. Confirm done criteria met
7. Commit changes atomically
8. Mark task closed (cw task close <id> --reason "...")
9. Move to next task
```

## Deviation Rules

When you encounter work not in the plan, apply these rules:

### Rule 1: Auto-Fix Bugs (No Permission)
- Broken code, syntax errors, runtime errors
- Logic errors, off-by-one, wrong conditions
- Security issues, injection vulnerabilities
- Type errors

**Action:** Fix immediately, document in SUMMARY.md

### Rule 2: Auto-Add Missing Critical (No Permission)
- Error handling (try/catch for external calls)
- Input validation (at API boundaries)
- Auth checks (protected routes)
- CSRF protection

**Action:** Add immediately, document in SUMMARY.md

### Rule 3: Auto-Fix Blocking (No Permission)
- Missing dependencies (npm install)
- Broken imports (wrong paths)
- Config errors (env vars, tsconfig)
- Build failures

**Action:** Fix immediately, document in SUMMARY.md

### Rule 4: ASK About Architectural (Permission Required)
- New database tables
- New services
- API contract changes
- New external dependencies

**Action:** STOP. Ask user. Document decision.

## Checkpoint Handling

### checkpoint:human-verify
You completed work, user confirms it works.
```
Execute task → Run verify → Ask user: "Can you confirm X?"
```

### checkpoint:decision
User must choose implementation direction.
```
Present options → Wait for response → Continue with choice
```

### checkpoint:human-action
Truly unavoidable manual step.
```
Explain what user needs to do → Wait for confirmation → Continue
```

## Commit Strategy

Each task gets an atomic commit:

```
{type}({phase}-{plan}): {description}

- Change detail 1
- Change detail 2
```

Types: feat, fix, test, refactor, perf, docs, style, chore

Example:
```
feat(2-3): implement refresh token rotation

- Add refresh_tokens table with family tracking
- Create POST /api/auth/refresh endpoint
- Add reuse detection with family revocation
```

### Deviation Commits

Tag deviation commits clearly:
```
fix(2-3): [Rule 1] add null check to user lookup

- User lookup could crash when user not found
- Added optional chaining
```

## Task Type Handling

### type: auto
Execute autonomously without checkpoints.

### type: tdd
Follow TDD cycle:
1. RED: Write failing test
2. GREEN: Implement to pass
3. REFACTOR: Clean up (if needed)
4. Commit test and implementation together

### type: checkpoint:*
Execute, then trigger checkpoint as specified.

## Quality Standards

### Code Quality
- Follow existing patterns in codebase
- TypeScript strict mode
- No any types unless absolutely necessary
- Meaningful variable names
- Error handling at boundaries

### What NOT to Do
- Add features beyond the task
- Refactor surrounding code
- Add comments to unchanged code
- Create abstractions for one-time operations
- Design for hypothetical futures

### Anti-Patterns to Avoid
- `// TODO` comments
- `throw new Error('Not implemented')`
- `return null` placeholders
- `console.log` in production code
- Empty catch blocks
- Hardcoded values that should be config

## SUMMARY.md Creation

After plan completion, create SUMMARY.md:

```yaml
---
phase: 2
plan: 3
subsystem: auth
tags: [jwt, security]
requires: [users_table, jose]
provides: [refresh_tokens, token_rotation]
affects: [auth_flow, sessions]
tech_stack: [jose, drizzle, sqlite]
key_files:
  - src/api/auth/refresh.ts: "Rotation endpoint"
decisions:
  - "Token family for reuse detection"
metrics:
  tasks_completed: 3
  deviations: 2
  context_usage: "38%"
---

# Summary

## What Was Built
[Description of what was implemented]

## Implementation Notes
[Technical details worth preserving]

## Deviations
[List all Rule 1-4 deviations with details]

## Commits
[List of commits created]

## Verification Status
[Checklist from plan with status]

## Notes for Next Plan
[Context for future work]
```

## State Updates

### On Task Start
```
position:
  task: "current task name"
  status: in_progress
```

### On Task Complete
```
progress:
  current_phase_completed: N+1
```

### On Plan Complete
```
sessions:
  - completed: ["Phase X, Plan Y"]
```

## Error Recovery

### Task Fails Verification
1. Analyze failure
2. If fixable → fix and re-verify
3. If not fixable → mark blocked, document issue
4. Continue to next task if independent

### Context Limit Approaching
1. Complete current task
2. Update STATE.md with position
3. Create handoff with resume context
4. Exit cleanly for fresh session

### Unexpected Blocker
1. Document blocker in STATE.md
2. Check if other tasks can proceed
3. If all blocked → escalate to orchestrator
4. If some unblocked → continue with those

## Session End

Before ending session:
1. Commit any uncommitted work
2. Create SUMMARY.md if plan complete
3. Update STATE.md with position
4. Set next_action for resume

## What You Do NOT Do

- Make architectural decisions (Rule 4 → ask)
- Replan work (follow the plan)
- Add unrequested features
- Skip verify steps
- Leave uncommitted changes
```

---

## Integration Points

### With Tasks Module
- Claims tasks via `cw task update --status in_progress`
- Closes tasks via `cw task close --reason "..."`
- Respects dependencies (only works on ready tasks)

### With Orchestrator
- Receives task assignments
- Reports completion/blockers
- Triggers handoff when context full

### With Architect
- Consumes PLAN.md files
- Produces SUMMARY.md feedback

### With Verifier
- SUMMARY.md feeds verification
- Verification results may spawn fix tasks

---

## Spawning

Orchestrator spawns Worker:

```typescript
const workerResult = await spawnAgent({
  type: 'worker',
  task: 'execute-plan',
  context: {
    plan_file: '2-3-PLAN.md',
    state_file: 'STATE.md',
    prior_summaries: ['2-1-SUMMARY.md', '2-2-SUMMARY.md']
  },
  model: getModelForProfile('worker', config.modelProfile),
  worktree: 'worker-abc-123' // Isolated git worktree
});
```

---

## Example Session

```
1. Load PLAN.md
2. Load prior context (STATE.md, SUMMARY files)
3. For each task:
   a. Mark in_progress
   b. Read files
   c. Execute action
   d. Handle deviations (Rules 1-4)
   e. Run verify
   f. Commit atomically
   g. Mark closed
4. Create SUMMARY.md
5. Update STATE.md
6. Return to orchestrator
```
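
The "Commit atomically" step uses the `{type}({phase}-{plan}): {description}` format from the Commit Strategy section, with a `[Rule N]` tag for deviation commits. A minimal sketch of a formatter, assuming that format. The `CommitSpec` shape and the `formatCommitMessage` name are hypothetical and not part of the `cw` CLI.

```typescript
type CommitType =
  | 'feat' | 'fix' | 'test' | 'refactor'
  | 'perf' | 'docs' | 'style' | 'chore';

// Hypothetical input shape for illustration.
interface CommitSpec {
  type: CommitType;
  phase: number;
  plan: number;
  description: string;
  details: string[];
  deviationRule?: 1 | 2 | 3 | 4; // set only for deviation commits
}

// Build the commit message: header line, blank line, bulleted details.
function formatCommitMessage(spec: CommitSpec): string {
  const tag = spec.deviationRule !== undefined ? `[Rule ${spec.deviationRule}] ` : '';
  const header = `${spec.type}(${spec.phase}-${spec.plan}): ${tag}${spec.description}`;
  if (spec.details.length === 0) return header;
  const body = spec.details.map((d) => `- ${d}`).join('\n');
  return `${header}\n\n${body}`;
}
```

Restricting `type` to the listed commit types turns a typo such as `feature` into a compile-time error.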
218
docs/context-engineering.md
Normal file
@@ -0,0 +1,218 @@
# Context Engineering

Context engineering is a first-class concern in Codewalk. Agent output quality degrades predictably as context fills. This document defines the rules that all agents must follow.

## Quality Degradation Curve

Claude's output quality follows a predictable curve based on context utilization:

| Context Usage | Quality Level | Behavior |
|---------------|---------------|----------|
| 0-30% | **PEAK** | Thorough, comprehensive, considers edge cases |
| 30-50% | **GOOD** | Confident, solid work, reliable output |
| 50-70% | **DEGRADING** | Efficiency mode begins, shortcuts appear |
| 70%+ | **POOR** | Rushed, minimal, misses requirements |

**Rule: Stay UNDER 50% context for quality work.**
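
The bands in the table translate directly into a lookup. A minimal sketch, assuming usage is measured as tokens in use divided by the window size. The table does not say which band owns the shared boundaries (30, 50, 70), so this sketch assumes each band includes its lower bound. All function names are hypothetical.

```typescript
type QualityLevel = 'PEAK' | 'GOOD' | 'DEGRADING' | 'POOR';

// Percentage of the context window in use (0 to 100).
function usagePercent(tokensUsed: number, windowTokens: number): number {
  if (windowTokens <= 0) throw new RangeError('windowTokens must be positive');
  return (tokensUsed * 100) / windowTokens;
}

// Map a usage percentage to the quality band from the table.
// Assumption: each band includes its lower bound (30 is GOOD, 50 is DEGRADING).
function qualityLevel(usagePct: number): QualityLevel {
  if (Number.isNaN(usagePct) || usagePct < 0 || usagePct > 100) {
    throw new RangeError('usagePct must be between 0 and 100');
  }
  if (usagePct < 30) return 'PEAK';
  if (usagePct < 50) return 'GOOD';
  if (usagePct < 70) return 'DEGRADING';
  return 'POOR';
}

// The rule says to stay UNDER 50%, so exactly 50 is already out of budget.
function withinQualityBudget(usagePct: number): boolean {
  return usagePct < 50;
}
```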

---

## Orchestrator Pattern

Codewalk uses thin orchestration with heavy subagent work:

```
┌─────────────────────────────────────────────────────────────┐
│                    Orchestrator (30-40%)                    │
│  - Routes work to specialized agents                        │
│  - Collects results                                         │
│  - Maintains state                                          │
│  - Coordinates across phases                                │
└─────────────────────────────────────────────────────────────┘
                               │
            ┌──────────────────┼──────────────────┐
            ▼                  ▼                  ▼
     ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
     │   Worker    │    │  Architect  │    │  Verifier   │
     │ (200k ctx)  │    │ (200k ctx)  │    │ (200k ctx)  │
     │  Fresh per  │    │  Fresh per  │    │  Fresh per  │
     │    task     │    │ initiative  │    │    phase    │
     └─────────────┘    └─────────────┘    └─────────────┘
```

**Key insight:** Each subagent gets a fresh 200k context window. Heavy work happens there, not in the orchestrator.

---

## Context Budgets by Role

### Orchestrator
- **Target:** 30-40% max
- **Strategy:** Route, don't process. Collect results, don't analyze.
- **Reset trigger:** Context exceeds 50%

### Worker
- **Target:** 50% per task
- **Strategy:** Single task per context. Fresh context for each task.
- **Reset trigger:** Task completion (always)

### Architect
- **Target:** 60% per initiative analysis
- **Strategy:** Initiative discussion + planning in single context
- **Reset trigger:** Work plan generated or context exceeds 70%

### Verifier
- **Target:** 40% per phase verification
- **Strategy:** Goal-backward verification, gap identification
- **Reset trigger:** Verification complete

---

## Task Sizing Rules

Tasks are sized to fit context budgets:

| Task Complexity | Context Estimate | Example |
|-----------------|------------------|---------|
| Simple | 10-20% | Add a field to an existing form |
| Medium | 20-35% | Create new API endpoint with validation |
| Complex | 35-50% | Implement auth flow with refresh tokens |
| Too Large | >50% | **SPLIT INTO SUBTASKS** |

**Planning rule:** No single task should require >50% context. If estimation suggests otherwise, decompose before execution.

---

## Plan Sizing

Plans group 2-3 related tasks for sequential execution:

| Plan Size | Target Context | Notes |
|-----------|----------------|-------|
| Minimal (1 task) | 20-30% | Simple independent work |
| Standard (2-3 tasks) | 40-50% | Related work, shared context |
| Maximum | 50% | Never exceed—quality degrades |

**Why 2-3 tasks?** Related tasks share context (files already read, code already understood), which reduces overhead. Beyond 3 tasks, a plan tends to exceed the 50% budget and the quality benefit is lost.

---

## Wave-Based Parallelization

Compute the dependency graph and assign each task to a wave:

```
Wave 0: Tasks with no dependencies (run in parallel)
  ↓
Wave 1: Tasks depending only on Wave 0 (run in parallel)
  ↓
Wave 2: Tasks depending only on Wave 0-1 (run in parallel)
  ↓
...continue until all tasks assigned
```

**Benefits:**
- Maximum parallelization
- Clear progress tracking
- Natural checkpoints between waves

### Computation Algorithm

```
1. Build dependency graph from task dependencies
2. Find all tasks with no unresolved dependencies → Wave 0
3. Mark Wave 0 as "resolved"
4. Find all tasks whose dependencies are all resolved → Wave 1
5. Repeat until all tasks assigned
```
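
The algorithm above can be sketched in TypeScript as follows. This is a minimal illustration, not Codewalk's actual implementation: the `PlannedTask` shape, its `dependsOn` field, and the `computeWaves` name are assumptions introduced here. The sketch also adds two checks the steps do not mention (unknown dependencies and dependency cycles), since either one would otherwise make step 5 loop forever.

```typescript
// Hypothetical task shape; the field names are assumptions for illustration.
interface PlannedTask {
  id: string;
  dependsOn: string[];
}

// Assigns each task to the earliest wave in which all of its dependencies
// are already resolved. Throws on an unknown dependency or a cycle.
function computeWaves(tasks: PlannedTask[]): string[][] {
  const known = new Set(tasks.map((t) => t.id));
  for (const task of tasks) {
    for (const dep of task.dependsOn) {
      if (!known.has(dep)) {
        throw new Error(`Task ${task.id} depends on unknown task ${dep}`);
      }
    }
  }

  const resolved = new Set<string>();
  const waves: string[][] = [];
  let remaining = tasks.slice();

  while (remaining.length > 0) {
    // Steps 2 and 4: tasks whose dependencies are all resolved form the next wave.
    const ready = remaining.filter((t) =>
      t.dependsOn.every((d) => resolved.has(d)),
    );
    if (ready.length === 0) {
      throw new Error("Dependency cycle detected; no task can be scheduled");
    }
    // Step 3: mark the wave as resolved only after it has been fully collected,
    // so tasks in the same wave never depend on each other.
    for (const t of ready) {
      resolved.add(t.id);
    }
    waves.push(ready.map((t) => t.id));
    // Step 5: repeat with whatever is still unassigned.
    remaining = remaining.filter((t) => !resolved.has(t.id));
  }
  return waves;
}
```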

---

## Context Handoff

When context fills, perform a controlled handoff:

### STATE.md Update
Before handoff, update session state:

```yaml
position:
  phase: 2
  plan: 3
  task: "Implement refresh token rotation"
  wave: 1

decisions:
  - "Using jose library for JWT (not jsonwebtoken)"
  - "Refresh tokens stored in httpOnly cookie, not localStorage"
  - "15min access token, 7day refresh token"

blockers:
  - "Waiting for user to configure OAuth credentials"

next_action: "Continue with task after blocker resolved"
```

### Handoff Content
New session receives:
- STATE.md (current position)
- Relevant SUMMARY.md files (prior work in this phase)
- Current PLAN.md (if executing)
- Task context from initiative

---

## Anti-Patterns

### Context Stuffing
**Wrong:** Loading entire codebase at session start
**Right:** Load files on-demand as tasks require them

### Orchestrator Processing
**Wrong:** Orchestrator reads all code and makes decisions
**Right:** Orchestrator routes to specialized agents who do the work

### Plan Bloat
**Wrong:** 10-task plans to "reduce coordination overhead"
**Right:** 2-3 task plans that fit in 50% context

### No Handoff State
**Wrong:** Agent restarts with no memory of prior work
**Right:** STATE.md preserves position, decisions, blockers

---

## Monitoring

Track context utilization across the system:

| Metric | Threshold | Action |
|--------|-----------|--------|
| Orchestrator context | >50% | Trigger handoff |
| Worker task context | >60% | Flag task as oversized |
| Plan total estimate | >50% | Split plan before execution |
| Average task context | >40% | Review decomposition strategy |
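
As a rough sketch, the table above could be encoded as a lookup plus a single check. The metric keys, the `MONITORING_RULES` constant, and the `checkContextMetric` function are hypothetical names chosen for this example, and usage is assumed to be reported as a fraction (0.5 means 50%).

```typescript
// Hypothetical metric keys; one per row of the monitoring table.
type Metric = "orchestrator" | "workerTask" | "planEstimate" | "averageTask";

// Thresholds and actions copied from the monitoring table.
const MONITORING_RULES: Record<Metric, { threshold: number; action: string }> = {
  orchestrator: { threshold: 0.5, action: "Trigger handoff" },
  workerTask: { threshold: 0.6, action: "Flag task as oversized" },
  planEstimate: { threshold: 0.5, action: "Split plan before execution" },
  averageTask: { threshold: 0.4, action: "Review decomposition strategy" },
};

// Returns the action to take, or null while the metric is within budget.
// The comparison is strict because the table says ">" rather than ">=".
function checkContextMetric(metric: Metric, usage: number): string | null {
  const rule = MONITORING_RULES[metric];
  return usage > rule.threshold ? rule.action : null;
}
```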

---

## Implementation Notes

### Context Estimation
Estimate context usage before execution:
- File reads: ~1-2% per file (varies by size)
- Code changes: ~0.5% per change
- Tool outputs: ~1% per tool call
- Discussion: ~2-5% per exchange
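
The heuristics above amount to simple arithmetic, sketched below. The `TaskFootprint` shape and both function names are assumptions made for this example; the per-item percentages come from the list, with the ranges turned into a low and a high bound.

```typescript
// Hypothetical description of the work a task is expected to involve.
interface TaskFootprint {
  fileReads: number;
  codeChanges: number;
  toolCalls: number;
  discussionExchanges: number;
}

// Returns a [low, high] estimate in percent of the context window.
function estimateContextPercent(f: TaskFootprint): [number, number] {
  const fixed = f.codeChanges * 0.5 + f.toolCalls * 1;
  const low = f.fileReads * 1 + f.discussionExchanges * 2 + fixed;
  const high = f.fileReads * 2 + f.discussionExchanges * 5 + fixed;
  return [low, high];
}

// Conservative check: split when the high estimate exceeds the 50% budget.
function needsSplit(f: TaskFootprint): boolean {
  return estimateContextPercent(f)[1] > 50;
}
```

For instance, 10 file reads, 4 code changes, 6 tool calls, and 2 discussion exchanges come out at 22-38%, which fits the budget. Using the high bound for the split decision is a choice made in this sketch; the document does not say which end of the range to plan against.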

### Fresh Context Triggers
- Worker: Always fresh per task
- Architect: Fresh per initiative
- Verifier: Fresh per phase
- Orchestrator: Handoff at 50%

### Subagent Spawning
When spawning subagents:
1. Provide focused context (only what's needed)
2. Clear instructions (specific task, expected output)
3. Collect structured results
4. Update state with outcomes

50
docs/database-migrations.md
Normal file
@@ -0,0 +1,50 @@
# Database Migrations

This project uses [drizzle-kit](https://orm.drizzle.team/kit-docs/overview) for database schema management and migrations.

## Overview

- **Schema definition:** `src/db/schema.ts` (drizzle-orm table definitions)
- **Migration output:** `drizzle/` directory (SQL files + meta journal)
- **Config:** `drizzle.config.ts`
- **Runtime migrator:** `src/db/ensure-schema.ts` (calls `drizzle-orm/better-sqlite3/migrator`)

## How It Works

On every server startup, `ensureSchema(db)` runs all pending migrations from the `drizzle/` folder. Drizzle tracks applied migrations in a `__drizzle_migrations` table so only new migrations are applied. This is safe to call repeatedly.

## Workflow

### Making schema changes

1. Edit `src/db/schema.ts` with your table/column changes
2. Generate a migration:
   ```bash
   npx drizzle-kit generate
   ```
3. Review the generated SQL in `drizzle/NNNN_*.sql`
4. Commit the migration file along with your schema change

### Applying migrations

Migrations are applied automatically on server startup. No manual step needed.

For tests, the same `ensureSchema()` function is called on in-memory SQLite databases in `src/db/repositories/drizzle/test-helpers.ts`.

### Checking migration status

```bash
# See what drizzle-kit would generate (dry run)
npx drizzle-kit generate --dry-run

# Open drizzle studio to inspect the database
npx drizzle-kit studio
```

## Rules

- **Never hand-write migration SQL.** Always use `drizzle-kit generate` from the schema.
- **Never use raw CREATE TABLE statements** for schema initialization. The migration system handles this.
- **Always commit migration files.** They are the source of truth for database evolution.
- **Migration files are immutable.** Once committed, never edit them. Make a new migration instead.
- **Test with `npx vitest run`** after generating migrations to verify they work with in-memory databases.

263
docs/deviation-rules.md
Normal file
@@ -0,0 +1,263 @@
# Deviation Rules

During execution, agents discover work not in the original plan. These rules define how to handle deviations **automatically, without asking for permission** (except Rule 4).

## The Four Rules

### Rule 1: Auto-Fix Bugs
**No permission needed.**

Fix immediately when encountering:
- Broken code (syntax errors, runtime errors)
- Logic errors (wrong conditions, off-by-one)
- Security issues (injection vulnerabilities, exposed secrets)
- Type errors (TypeScript violations)

```yaml
deviation:
  rule: 1
  type: bug_fix
  description: "Fixed null reference in user lookup"
  location: src/services/auth.ts:45
  original_code: "user.email.toLowerCase()"
  fixed_code: "user?.email?.toLowerCase() ?? ''"
  reason: "Crashes when user not found"
```

### Rule 2: Auto-Add Missing Critical Functionality
**No permission needed.**

Add immediately when clearly required:
- Error handling (try/catch for external calls)
- Input validation (user input, API boundaries)
- Authentication checks (protected routes)
- CSRF protection
- Rate limiting (if pattern exists in codebase)

```yaml
deviation:
  rule: 2
  type: missing_critical
  description: "Added input validation to createUser"
  location: src/api/users.ts:23
  added: "Zod schema validation for email, password length"
  reason: "API accepts any input without validation"
```

### Rule 3: Auto-Fix Blocking Issues
**No permission needed.**

Fix immediately when blocking task completion:
- Missing dependencies (npm install)
- Broken imports (wrong paths, missing exports)
- Configuration errors (env vars, tsconfig)
- Build failures (compilation errors)

```yaml
deviation:
  rule: 3
  type: blocking_issue
  description: "Added missing zod dependency"
  command: "npm install zod"
  reason: "Import fails without package"
```

### Rule 4: ASK About Architectural Changes
**Permission required.**

Stop and ask user before:
- New database tables or major schema changes
- New services or major component additions
- Changes to API contracts
- New external dependencies (beyond obvious needs)
- Authentication/authorization model changes

```yaml
deviation:
  rule: 4
  type: architectural_change
  status: PENDING_APPROVAL
  description: "Considering adding Redis for session storage"
  current: "Sessions stored in SQLite"
  proposed: "Redis for distributed session storage"
  reason: "Multiple server instances need shared sessions"
  question: "Should we add Redis, or use sticky sessions instead?"
```

---

## Decision Tree

```
Encountered unexpected issue
            │
            ▼
    Is it broken code?
  (errors, bugs, security)
            │
       YES ─┴─ NO
        │       │
        ▼       ▼
    Rule 1   Is critical functionality missing?
    Auto-fix (validation, auth, error handling)
                    │
               YES ─┴─ NO
                │       │
                ▼       ▼
            Rule 2   Is it blocking task completion?
            Auto-add (deps, imports, config)
                            │
                       YES ─┴─ NO
                        │       │
                        ▼       ▼
                    Rule 3   Is it architectural?
                    Auto-fix (tables, services, contracts)
                                    │
                               YES ─┴─ NO
                                │       │
                                ▼       ▼
                            Rule 4   Ignore or note
                            ASK      for future
```
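
The tree above can be read as an ordered chain of checks, sketched here in TypeScript. The `DeviationSignals` flags, the `DeviationDecision` type, and `classifyDeviation` are hypothetical names introduced for illustration; how an agent actually determines each flag is outside this sketch. The checks run in the same order as the tree, so an issue that is both blocking and architectural is classified under Rule 3 here; an agent that is unsure should still default to Rule 4.

```typescript
// Hypothetical flags an agent might set after inspecting an unexpected issue.
interface DeviationSignals {
  isBrokenCode: boolean; // errors, bugs, security
  isMissingCritical: boolean; // validation, auth, error handling
  isBlocking: boolean; // deps, imports, config
  isArchitectural: boolean; // tables, services, contracts
}

type DeviationDecision =
  | { rule: 1 | 2 | 3; action: "auto_fix" | "auto_add"; needsApproval: false }
  | { rule: 4; action: "ask"; needsApproval: true }
  | { rule: null; action: "note_for_future"; needsApproval: false };

// Walks the decision tree top to bottom; the first matching question wins.
function classifyDeviation(s: DeviationSignals): DeviationDecision {
  if (s.isBrokenCode) {
    return { rule: 1, action: "auto_fix", needsApproval: false };
  }
  if (s.isMissingCritical) {
    return { rule: 2, action: "auto_add", needsApproval: false };
  }
  if (s.isBlocking) {
    return { rule: 3, action: "auto_fix", needsApproval: false };
  }
  if (s.isArchitectural) {
    return { rule: 4, action: "ask", needsApproval: true };
  }
  return { rule: null, action: "note_for_future", needsApproval: false };
}
```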

---

## Documentation Requirements

All deviations MUST be documented in SUMMARY.md:

```yaml
# 2-3-SUMMARY.md
phase: 2
plan: 3

deviations:
  - rule: 1
    type: bug_fix
    description: "Fixed null reference in auth service"
    location: src/services/auth.ts:45

  - rule: 2
    type: missing_critical
    description: "Added Zod validation to user API"
    location: src/api/users.ts:23-45

  - rule: 3
    type: blocking_issue
    description: "Installed missing jose dependency"
    command: "npm install jose"

  - rule: 4
    type: architectural_change
    status: APPROVED
    description: "Added refresh_tokens table"
    approved_by: user
    approved_at: 2024-01-15T10:30:00Z
```

---

## Deviation Tracking in Tasks

When a deviation is significant, track it explicitly:

### Minor Deviations
Log in SUMMARY.md, no separate task.

### Major Deviations (Rule 4)
Create a decision record:

```sql
INSERT INTO task_history (
  task_id,
  field,
  old_value,
  new_value,
  changed_by
) VALUES (
  'current-task-id',
  'deviation',
  NULL,
  '{"rule": 4, "description": "Added Redis", "approved": true}',
  'worker-123'
);
```

### Deviations That Spawn Work
If fixing a deviation requires substantial work:

1. Complete current task
2. Create new task for deviation work
3. Link new task as dependency if blocking
4. Continue with original plan

---

## Examples by Category

### Rule 1: Bug Fixes

| Issue | Fix | Documentation |
|-------|-----|---------------|
| Undefined property access | Add optional chaining | Note in summary |
| SQL injection vulnerability | Use parameterized query | Note + security flag |
| Race condition in async code | Add proper await | Note in summary |
| Incorrect error message | Fix message text | Note in summary |

### Rule 2: Missing Critical

| Gap | Addition | Documentation |
|-----|----------|---------------|
| No input validation | Add Zod/Yup schema | Note in summary |
| No error handling | Add try/catch + logging | Note in summary |
| No auth check | Add middleware | Note in summary |
| No CSRF token | Add csrf protection | Note + security flag |

### Rule 3: Blocking Issues

| Blocker | Resolution | Documentation |
|---------|------------|---------------|
| Missing npm package | npm install | Note in summary |
| Wrong import path | Fix path | Note in summary |
| Missing env var | Add to .env.example | Note in summary |
| TypeScript config issue | Fix tsconfig | Note in summary |

### Rule 4: Architectural (ASK FIRST)

| Change | Why Ask | Question Format |
|--------|---------|-----------------|
| New DB table | Schema is contract | "Need users_sessions table. Create it?" |
| New service | Architectural decision | "Extract auth to separate service?" |
| API contract change | Breaking change | "Change POST /users response format?" |
| New external dep | Maintenance burden | "Add Redis for caching?" |

---

## Integration with Verification

Deviations are inputs to verification:

1. **Verifier loads SUMMARY.md** with the deviation list
2. **Bug fixes (Rule 1):** verify the fix doesn't break tests
3. **Critical additions (Rule 2):** verify they're properly integrated
4. **Blocking fixes (Rule 3):** verify build/tests pass
5. **Architectural changes (Rule 4):** verify they match the approved design

---

## Escalation Path

If unsure which rule applies:

1. **Default to Rule 4** (ask) rather than making a wrong assumption
2. Document uncertainty in deviation notes
3. Include reasoning for why you're asking

```yaml
deviation:
  rule: 4
  type: uncertain
  description: "Adding caching layer to API responses"
  reason: "Could be Rule 2 (performance is critical) or Rule 4 (new infrastructure)"
  question: "Is Redis caching appropriate here, or should we use in-memory?"
```

434
docs/execution-artifacts.md
Normal file
@@ -0,0 +1,434 @@
# Execution Artifacts

Execution produces artifacts that document what happened, enable debugging, and provide context for future work.

## Artifact Types

| Artifact | Created By | Purpose |
|----------|------------|---------|
| PLAN.md | Architect | Executable instructions for a plan |
| SUMMARY.md | Worker | Record of what actually happened |
| VERIFICATION.md | Verifier | Goal-backward verification results |
| UAT.md | Verifier + User | User acceptance testing results |
| STATE.md | All agents | Session state (see [session-state.md](session-state.md)) |

---

## PLAN.md

Plans are **executable prompts**: they are handed to the Worker as written, not documents that are later transformed into prompts.

### Structure

```yaml
---
# Frontmatter
phase: 2
plan: 3
type: execute  # execute | tdd
wave: 1
depends_on: [2-2-PLAN]
files_modified:
  - src/api/auth/refresh.ts
  - src/middleware/auth.ts
  - db/migrations/002_refresh_tokens.sql
autonomous: true  # false if checkpoints required
must_haves:
  observable_truths:
    - "Refresh token extends session"
    - "Old token invalidated after rotation"
  required_artifacts:
    - src/api/auth/refresh.ts
  required_wiring:
    - "refresh endpoint -> token storage"
user_setup: []  # Human prereqs if any
---

# Phase 2, Plan 3: Refresh Token Rotation

## Objective
Implement refresh token rotation to extend user sessions securely while preventing token reuse attacks.

## Context
@file: PROJECT.md (project overview)
@file: 2-CONTEXT.md (phase decisions)
@file: 2-1-SUMMARY.md (prior work)
@file: 2-2-SUMMARY.md (prior work)

## Tasks

### Task 1: Create refresh_tokens table
- **type:** auto
- **files:** db/migrations/002_refresh_tokens.sql, src/db/schema/refreshTokens.ts
- **action:** Create table with: id (uuid), user_id (fk), token_hash (sha256), family (uuid for rotation tracking), expires_at, created_at, revoked_at. Index on token_hash and user_id.
- **verify:** `cw db migrate` succeeds, schema matches
- **done:** Migration applies, drizzle schema matches SQL

### Task 2: Implement rotation endpoint
- **type:** auto
- **files:** src/api/auth/refresh.ts
- **action:** POST /api/auth/refresh accepts refresh token in httpOnly cookie. Validate token exists and not expired. Generate new access + refresh tokens. Store new refresh, revoke old. Set cookies. Return 200 with new access token.
- **verify:** curl with valid refresh cookie returns new tokens
- **done:** Rotation works, old token invalidated

### Task 3: Add token family validation
- **type:** auto
- **files:** src/api/auth/refresh.ts
- **action:** If revoked token reused, revoke entire family (reuse detection). Log security event.
- **verify:** Reusing old token revokes all tokens in family
- **done:** Reuse detection active

## Verification Criteria
- [ ] New refresh token issued on rotation
- [ ] Old refresh token no longer valid
- [ ] Reused token triggers family revocation
- [ ] Access token returned in response
- [ ] Cookies set with correct flags (httpOnly, secure, sameSite)

## Success Criteria
- All tasks complete with passing verify steps
- No TypeScript errors
- Tests cover happy path and reuse detection
```

### Key Elements

| Element | Purpose |
|---------|---------|
| `type: execute\|tdd` | Execution strategy |
| `wave` | Parallelization grouping |
| `depends_on` | Must complete first |
| `files_modified` | Git tracking, conflict detection |
| `autonomous` | Can run without checkpoints |
| `must_haves` | Verification criteria |
| `@file` references | Context to load |

---

## SUMMARY.md

Created after plan execution. Documents what **actually happened**.

### Structure

````yaml
---
phase: 2
plan: 3
subsystem: auth
tags: [jwt, security, tokens]
requires:
  - users table
  - jose library
provides:
  - refresh token rotation
  - reuse detection
affects:
  - auth flow
  - session management
tech_stack:
  - jose (JWT)
  - drizzle (ORM)
  - sqlite
key_files:
  - src/api/auth/refresh.ts: "Rotation endpoint"
  - src/db/schema/refreshTokens.ts: "Token storage"
decisions:
  - "Token family for reuse detection"
  - "SHA256 hash for token storage"
metrics:
  tasks_completed: 3
  tasks_total: 3
  deviations: 2
  execution_time: "45 minutes"
  context_usage: "38%"
---

# Phase 2, Plan 3 Summary: Refresh Token Rotation

## What Was Built
Implemented refresh token rotation with security features:
- Rotation endpoint at POST /api/auth/refresh
- Token storage with family tracking
- Reuse detection that revokes entire token family

## Implementation Notes

### Token Storage
Tokens stored as SHA256 hashes (never plaintext). Family UUID links related tokens for rotation tracking.

### Rotation Flow
1. Receive refresh token in cookie
2. Hash and lookup in database
3. Verify not expired, not revoked
4. Generate new access + refresh tokens
5. Store new refresh with same family
6. Revoke old refresh token
7. Set new cookies, return access token

### Reuse Detection
If a revoked token is presented, the entire family is revoked. This catches scenarios where an attacker captured an old token.

## Deviations

### Rule 2: Added rate limiting
```yaml
deviation:
  rule: 2
  type: missing_critical
  description: "Added rate limiting to refresh endpoint"
  location: src/api/auth/refresh.ts:12
  reason: "Prevent brute force token guessing"
```

### Rule 1: Fixed async handler
```yaml
deviation:
  rule: 1
  type: bug_fix
  description: "Added await to database query"
  location: src/api/auth/refresh.ts:34
  reason: "Query returned promise, not result"
```

## Commits
- `feat(2-3): create refresh_tokens table and schema`
- `feat(2-3): implement token rotation endpoint`
- `feat(2-3): add token family reuse detection`
- `fix(2-3): add await to token lookup query`
- `feat(2-3): add rate limiting to refresh endpoint`

## Verification Status
- [x] New refresh token issued on rotation
- [x] Old refresh token invalidated
- [x] Reuse detection works
- [x] Cookies set correctly
- [ ] **Pending human verification:** Cookie flags in production

## Notes for Next Plan
- Rate limiting added; may need tuning based on load
- Token family approach may need cleanup job for old families
````

### What to Include

| Section | Content |
|---------|---------|
| Frontmatter | Metadata for future queries |
| What Was Built | High-level summary |
| Implementation Notes | Technical details worth preserving |
| Deviations | All Rules 1-4 deviations with details |
| Commits | Git commit messages created |
| Verification Status | What passed, what's pending |
| Notes for Next Plan | Context for future work |

---

## VERIFICATION.md

Created by Verifier after phase completion.

### Structure

```yaml
---
phase: 2
status: PASS  # PASS | GAPS_FOUND
verified_at: 2024-01-15T10:30:00Z
verified_by: verifier-agent
---

# Phase 2 Verification: JWT Implementation

## Observable Truths

| Truth | Status | Evidence |
|-------|--------|----------|
| User can log in with email/password | VERIFIED | Login endpoint returns tokens, sets cookies |
| Sessions persist across page refresh | VERIFIED | Cookie-based token survives reload |
| Token refresh extends session | VERIFIED | Refresh endpoint issues new tokens |
| Expired tokens rejected | VERIFIED | 401 returned for expired access token |

## Required Artifacts

| Artifact | Status | Check |
|----------|--------|-------|
| src/api/auth/login.ts | EXISTS | Exports login handler |
| src/api/auth/refresh.ts | EXISTS | Exports refresh handler |
| src/middleware/auth.ts | EXISTS | Exports auth middleware |
| db/migrations/002_refresh_tokens.sql | EXISTS | Creates table |

## Required Wiring

| From | To | Status | Evidence |
|------|-----|--------|----------|
| Login handler | Token generation | WIRED | login.ts:45 calls createTokens |
| Auth middleware | Token validation | WIRED | auth.ts:23 calls verifyToken |
| Refresh handler | Token rotation | WIRED | refresh.ts:67 calls rotateToken |
| Protected routes | Auth middleware | WIRED | routes.ts uses auth middleware |

## Anti-Patterns

| Pattern | Found | Location |
|---------|-------|----------|
| TODO comments | NO | - |
| Stub implementations | NO | - |
| Console.log in handlers | YES | src/api/auth/login.ts:34 (debug log) |
| Empty catch blocks | NO | - |

## Human Verification Needed

| Check | Reason |
|-------|--------|
| Cookie flags in production | Requires deployed environment |
| Token timing accuracy | Requires wall-clock testing |

## Gaps Found
None blocking. One console.log should be removed before production.

## Remediation
- Task created: "Remove debug console.log from login handler"
```

---

## UAT.md

User Acceptance Testing results.

### Structure

```yaml
---
phase: 2
tested_by: user
tested_at: 2024-01-15T14:00:00Z
status: PASS  # PASS | ISSUES_FOUND
---

# Phase 2 UAT: JWT Implementation

## Test Cases

### 1. Login with email and password
**Prompt:** "Can you log in with your email and password?"
**Result:** PASS
**Notes:** Login successful, redirected to dashboard

### 2. Session persists on refresh
**Prompt:** "Refresh the page. Are you still logged in?"
**Result:** PASS
**Notes:** Still authenticated after refresh

### 3. Logout clears session
**Prompt:** "Click logout. Can you access the dashboard?"
**Result:** PASS
**Notes:** Redirected to login page

### 4. Expired session prompts re-login
**Prompt:** "Wait 15 minutes (or we can simulate). Does the session refresh?"
**Result:** SKIPPED
**Reason:** "User chose to trust token rotation implementation"

## Issues Found
None.

## Sign-Off
User confirms Phase 2 JWT Implementation meets requirements.
Next: Proceed to Phase 3 (OAuth Integration)
```

---

## Artifact Storage

### File Structure

```
.planning/
├── phases/
│   ├── 1/
│   │   ├── 1-CONTEXT.md
│   │   ├── 1-1-PLAN.md
│   │   ├── 1-1-SUMMARY.md
│   │   ├── 1-2-PLAN.md
│   │   ├── 1-2-SUMMARY.md
│   │   └── 1-VERIFICATION.md
│   └── 2/
│       ├── 2-CONTEXT.md
│       ├── 2-1-PLAN.md
│       ├── 2-1-SUMMARY.md
│       ├── 2-2-PLAN.md
│       ├── 2-2-SUMMARY.md
│       ├── 2-3-PLAN.md
│       ├── 2-3-SUMMARY.md
│       ├── 2-VERIFICATION.md
│       └── 2-UAT.md
├── STATE.md
└── config.json
```

### Naming Convention

| Pattern | Meaning |
|---------|---------|
| `{phase}-CONTEXT.md` | Discussion decisions for phase |
| `{phase}-{plan}-PLAN.md` | Executable plan |
| `{phase}-{plan}-SUMMARY.md` | Execution record |
| `{phase}-VERIFICATION.md` | Phase verification |
| `{phase}-UAT.md` | User acceptance testing |
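
Combining the naming convention with the file structure above, artifact paths can be derived mechanically. The helper names and the two union types below are assumptions made for this sketch; the directory layout and filename patterns are taken from the tree and table.

```typescript
// Artifacts that exist once per phase vs. once per plan (from the naming table).
type PhaseArtifact = "CONTEXT" | "VERIFICATION" | "UAT";
type PlanArtifact = "PLAN" | "SUMMARY";

// Builds e.g. ".planning/phases/2/2-UAT.md".
function phaseArtifactPath(phase: number, kind: PhaseArtifact): string {
  return `.planning/phases/${phase}/${phase}-${kind}.md`;
}

// Builds e.g. ".planning/phases/2/2-3-SUMMARY.md".
function planArtifactPath(phase: number, plan: number, kind: PlanArtifact): string {
  return `.planning/phases/${phase}/${phase}-${plan}-${kind}.md`;
}
```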

---

## Commit Strategy

Each task produces an atomic commit:

```
{type}({phase}-{plan}): {description}

- Detail 1
- Detail 2
```

### Types
- `feat`: New functionality
- `fix`: Bug fix
- `test`: Test additions
- `refactor`: Code restructuring
- `perf`: Performance improvement
- `docs`: Documentation
- `style`: Formatting only
- `chore`: Maintenance

### Examples
```
feat(2-3): implement refresh token rotation

- Add refresh_tokens table with family tracking
- Implement rotation endpoint at POST /api/auth/refresh
- Add reuse detection with family revocation

fix(2-3): add await to token lookup query

- Token lookup was returning promise instead of result
- Added proper await in refresh handler

feat(2-3): add rate limiting to refresh endpoint

- [Deviation Rule 2] Added express-rate-limit
- 10 requests per minute per IP
- Prevents brute force token guessing
```
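
A small formatter makes the commit template concrete. This is a sketch only: `formatCommitMessage` and the `CommitType` union are hypothetical names, and the type list simply mirrors the types listed above.

```typescript
// The commit types listed in this section.
type CommitType =
  | "feat"
  | "fix"
  | "test"
  | "refactor"
  | "perf"
  | "docs"
  | "style"
  | "chore";

// Renders "{type}({phase}-{plan}): {description}" followed by an optional
// blank line and one "- detail" bullet per entry.
function formatCommitMessage(
  type: CommitType,
  phase: number,
  plan: number,
  description: string,
  details: string[] = [],
): string {
  const subject = `${type}(${phase}-${plan}): ${description}`;
  if (details.length === 0) {
    return subject;
  }
  return [subject, "", ...details.map((d) => `- ${d}`)].join("\n");
}
```

Called as `formatCommitMessage("fix", 2, 3, "add await to token lookup query", ["Added proper await in refresh handler"])`, it yields a subject line, a blank line, and one bullet, matching the shape of the examples above.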

### Metadata Commit

After plan completion:
```
chore(2-3): complete plan execution

Artifacts:
- 2-3-SUMMARY.md created
- STATE.md updated
- 3 tasks completed, 2 deviations handled
```

520
docs/initiatives.md
Normal file
@@ -0,0 +1,520 @@
|
||||
# Initiatives Module
|
||||
|
||||
Initiatives are the planning layer for larger features. They provide a Notion-like document hierarchy for capturing context, decisions, and requirements before work begins. Once approved, initiatives generate phased task plans that agents execute.
|
||||
|
||||
## Design Philosophy
|
||||
|
||||
### Why Initiatives?
|
||||
|
||||
Tasks are atomic work units—great for execution but too granular for planning. Initiatives bridge the gap:
|
||||
|
||||
- **Before approval**: A living document where user and Architect refine the vision
|
||||
- **After approval**: A persistent knowledge base that tasks link back to
|
||||
- **Forever**: Context for future work ("why did we build it this way?")
|
||||
|
||||
### Notion-Like Structure
|
||||
|
||||
Initiatives aren't flat documents. They're hierarchical pages:
|
||||
|
||||
```
|
||||
Initiative: User Authentication
├── User Journeys
│   ├── Sign Up Flow
│   └── Password Reset Flow
├── Business Rules
│   └── Password Requirements
├── Technical Concept
│   ├── JWT Strategy
│   └── Session Management
└── Architectural Changes
    └── Auth Middleware
|
||||
```
|
||||
|
||||
Each "page" is a record in SQLite with parent-child relationships. This enables:
|
||||
- Structured queries: "Give me all subpages of initiative X"
|
||||
- Inventory views: "List all technical concepts across initiatives"
|
||||
- Cross-references: Link between pages
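
As an illustration of the parent-child model, the sketch below turns flat page rows into a nested tree. It is only a sketch: `PageRow`, `PageNode`, and `buildPageTree` are hypothetical names, and the fields mirror a subset of the `initiative_pages` columns described in the Data Model section.

```typescript
// Hypothetical row shape mirroring a subset of the initiative_pages columns.
interface PageRow {
  id: string;
  parent_page_id: string | null;
  title: string;
  sort_order: number;
}

interface PageNode extends PageRow {
  children: PageNode[];
}

// Builds a nested tree from flat rows by following parent_page_id links.
function buildPageTree(rows: PageRow[]): PageNode[] {
  const nodes = new Map<string, PageNode>();
  rows.forEach((row) => nodes.set(row.id, { ...row, children: [] }));

  const roots: PageNode[] = [];
  nodes.forEach((node) => {
    const parent =
      node.parent_page_id !== null ? nodes.get(node.parent_page_id) : undefined;
    // Assumption of this sketch: rows whose parent is missing become roots.
    (parent ? parent.children : roots).push(node);
  });

  // Order siblings by sort_order at every level.
  const sortRecursively = (list: PageNode[]): void => {
    list.sort((a, b) => a.sort_order - b.sort_order);
    list.forEach((n) => sortRecursively(n.children));
  };
  sortRecursively(roots);
  return roots;
}
```

Building the tree in application code keeps the query a plain `SELECT` by `initiative_id`; the recursive view in the schema is the SQL-side alternative.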
|
||||
|
||||
---
|
||||
|
||||
## Data Model
|
||||
|
||||
### Initiative Entity
|
||||
|
||||
| Field | Type | Description |
|
||||
|-------|------|-------------|
|
||||
| `id` | TEXT | Primary key (e.g., `init-a1b2c3`) |
|
||||
| `project_id` | TEXT | Scopes to a project (most initiatives are single-project) |
|
||||
| `title` | TEXT | Initiative name |
|
||||
| `status` | TEXT | `draft`, `review`, `approved`, `in_progress`, `completed`, `rejected` |
|
||||
| `created_by` | TEXT | User who created it |
|
||||
| `created_at` | INTEGER | Unix timestamp |
|
||||
| `updated_at` | INTEGER | Unix timestamp |
|
||||
| `approved_at` | INTEGER | When approved (null if not approved) |
|
||||
| `approved_by` | TEXT | Who approved it |
|
||||
|
||||
### Initiative Page Entity
|
||||
|
||||
| Field | Type | Description |
|
||||
|-------|------|-------------|
|
||||
| `id` | TEXT | Primary key (e.g., `page-x1y2z3`) |
|
||||
| `initiative_id` | TEXT | Parent initiative |
|
||||
| `parent_page_id` | TEXT | Parent page (null for root-level pages) |
|
||||
| `type` | TEXT | `user_journey`, `business_rule`, `technical_concept`, `architectural_change`, `note`, `custom` |
|
||||
| `title` | TEXT | Page title |
|
||||
| `content` | TEXT | Markdown content |
|
||||
| `sort_order` | INTEGER | Display order among siblings |
|
||||
| `created_at` | INTEGER | Unix timestamp |
|
||||
| `updated_at` | INTEGER | Unix timestamp |
|
||||
|
||||
### Initiative Phase Entity
|
||||
|
||||
Phases group tasks for staged execution and rolling approval.
|
||||
|
||||
| Field | Type | Description |
|
||||
|-------|------|-------------|
|
||||
| `id` | TEXT | Primary key (e.g., `phase-p1q2r3`) |
|
||||
| `initiative_id` | TEXT | Parent initiative |
|
||||
| `number` | INTEGER | Phase number (1, 2, 3...) |
|
||||
| `name` | TEXT | Phase name |
|
||||
| `description` | TEXT | What this phase delivers |
|
||||
| `status` | TEXT | `draft`, `pending_approval`, `approved`, `in_progress`, `completed` |
|
||||
| `approved_at` | INTEGER | When approved |
|
||||
| `approved_by` | TEXT | Who approved |
|
||||
| `created_at` | INTEGER | Unix timestamp |
|
||||
|
||||
### Task Link
|
||||
|
||||
Tasks reference their initiative and phase:
|
||||
|
||||
```sql
|
||||
-- In tasks table (see docs/tasks.md)
|
||||
initiative_id TEXT REFERENCES initiatives(id),
|
||||
phase_id TEXT REFERENCES initiative_phases(id),
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## SQLite Schema
|
||||
|
||||
```sql
|
||||
CREATE TABLE initiatives (
|
||||
id TEXT PRIMARY KEY,
|
||||
project_id TEXT,
|
||||
title TEXT NOT NULL,
|
||||
status TEXT NOT NULL DEFAULT 'draft'
|
||||
CHECK (status IN ('draft', 'review', 'approved', 'in_progress', 'completed', 'rejected')),
|
||||
created_by TEXT,
|
||||
created_at INTEGER NOT NULL DEFAULT (unixepoch()),
|
||||
updated_at INTEGER NOT NULL DEFAULT (unixepoch()),
|
||||
approved_at INTEGER,
|
||||
approved_by TEXT
|
||||
);
|
||||
|
||||
CREATE TABLE initiative_pages (
|
||||
id TEXT PRIMARY KEY,
|
||||
initiative_id TEXT NOT NULL REFERENCES initiatives(id) ON DELETE CASCADE,
|
||||
parent_page_id TEXT REFERENCES initiative_pages(id) ON DELETE CASCADE,
|
||||
type TEXT NOT NULL DEFAULT 'note'
|
||||
CHECK (type IN ('user_journey', 'business_rule', 'technical_concept', 'architectural_change', 'note', 'custom')),
|
||||
title TEXT NOT NULL,
|
||||
content TEXT,
|
||||
sort_order INTEGER NOT NULL DEFAULT 0,
|
||||
created_at INTEGER NOT NULL DEFAULT (unixepoch()),
|
||||
updated_at INTEGER NOT NULL DEFAULT (unixepoch())
|
||||
);
|
||||
|
||||
CREATE TABLE initiative_phases (
|
||||
id TEXT PRIMARY KEY,
|
||||
initiative_id TEXT NOT NULL REFERENCES initiatives(id) ON DELETE CASCADE,
|
||||
number INTEGER NOT NULL,
|
||||
name TEXT NOT NULL,
|
||||
description TEXT,
|
||||
status TEXT NOT NULL DEFAULT 'draft'
|
||||
CHECK (status IN ('draft', 'pending_approval', 'approved', 'in_progress', 'completed')),
|
||||
approved_at INTEGER,
|
||||
approved_by TEXT,
|
||||
created_at INTEGER NOT NULL DEFAULT (unixepoch()),
|
||||
UNIQUE(initiative_id, number)
|
||||
);
|
||||
|
||||
CREATE INDEX idx_initiatives_project ON initiatives(project_id);
|
||||
CREATE INDEX idx_initiatives_status ON initiatives(status);
|
||||
CREATE INDEX idx_pages_initiative ON initiative_pages(initiative_id);
|
||||
CREATE INDEX idx_pages_parent ON initiative_pages(parent_page_id);
|
||||
CREATE INDEX idx_pages_type ON initiative_pages(type);
|
||||
CREATE INDEX idx_phases_initiative ON initiative_phases(initiative_id);
|
||||
CREATE INDEX idx_phases_status ON initiative_phases(status);
|
||||
|
||||
-- Useful views
|
||||
CREATE VIEW initiative_page_tree AS
|
||||
WITH RECURSIVE tree AS (
|
||||
SELECT id, initiative_id, parent_page_id, title, type, 0 as depth,
|
||||
title as path
|
||||
FROM initiative_pages WHERE parent_page_id IS NULL
|
||||
UNION ALL
|
||||
SELECT p.id, p.initiative_id, p.parent_page_id, p.title, p.type, t.depth + 1,
|
||||
t.path || ' > ' || p.title
|
||||
FROM initiative_pages p
|
||||
JOIN tree t ON p.parent_page_id = t.id
|
||||
)
|
||||
SELECT * FROM tree ORDER BY path;
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Status Workflow
|
||||
|
||||
### Initiative Status
|
||||
|
||||
```
|
||||
[draft] ──submit──▶ [review] ──approve──▶ [approved]
   │                   │                      │
   │                   │ reject               │ start work
   │                   ▼                      ▼
   │              [rejected]            [in_progress]
   │                                          │
   │                                          │ all phases done
   └──────────────────────────────────────────▶ [completed]
|
||||
```
|
||||
|
||||
| Status | Meaning |
|
||||
|--------|---------|
|
||||
| `draft` | User/Architect still refining |
|
||||
| `review` | Ready for approval decision |
|
||||
| `approved` | Work plan created, awaiting execution |
|
||||
| `in_progress` | At least one phase executing |
|
||||
| `completed` | All phases completed |
|
||||
| `rejected` | Won't implement |
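
The labeled transitions can be expressed as a lookup table. This is a sketch under assumptions, not the project's implementation: `INITIATIVE_TRANSITIONS` and `canTransition` are hypothetical names, and only the edges that carry a label in the diagram (submit, approve, reject, start work, all phases done) are encoded.

```typescript
type InitiativeStatus =
  | "draft"
  | "review"
  | "approved"
  | "in_progress"
  | "completed"
  | "rejected";

// Only the labeled edges from the status diagram are encoded here.
const INITIATIVE_TRANSITIONS: Record<InitiativeStatus, InitiativeStatus[]> = {
  draft: ["review"],                // submit
  review: ["approved", "rejected"], // approve / reject
  approved: ["in_progress"],        // start work
  in_progress: ["completed"],       // all phases done
  completed: [],
  rejected: [],
};

function canTransition(from: InitiativeStatus, to: InitiativeStatus): boolean {
  return INITIATIVE_TRANSITIONS[from].includes(to);
}
```

A guard like this would typically run before the status column is updated, so that the `CHECK` constraint in the schema only has to validate the value, not the path taken to reach it.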
|
||||
|
||||
### Phase Status
|
||||
|
||||
```
|
||||
[draft] ──finalize──▶ [pending_approval] ──approve──▶ [approved]
                                                          │
                                                          │ claim tasks
                                                          ▼
                                                    [in_progress]
                                                          │
                                                          │ all tasks closed
                                                          ▼
                                                     [completed]
|
||||
```
|
||||
|
||||
**Rolling approval pattern:**
|
||||
1. Architect creates work plan with multiple phases
|
||||
2. User approves Phase 1 → agents start executing
|
||||
3. While Phase 1 executes, user reviews Phase 2
|
||||
4. Phase 2 approved → agents can start when ready
|
||||
5. Continue until all phases approved/completed
|
||||
|
||||
This prevents blocking: agents don't wait for all phases to be approved upfront.
|
||||
|
||||
---
|
||||
|
||||
## Workflow
|
||||
|
||||
### 1. Draft Initiative
|
||||
|
||||
User creates initiative with basic vision:
|
||||
|
||||
```
|
||||
cw initiative create "User Authentication"
|
||||
```
|
||||
|
||||
System creates initiative in `draft` status with empty page structure.
|
||||
|
||||
### 2. Architect Iteration (Questioning)
|
||||
|
||||
Architect agent engages in structured questioning to capture requirements:
|
||||
|
||||
**Question Categories:**
|
||||
|
||||
| Category | Example Questions |
|
||||
|----------|-------------------|
|
||||
| **Visual Features** | Layout approach? Density? Interactions? Empty states? |
|
||||
| **APIs/CLIs** | Response format? Flags? Error handling? Verbosity? |
|
||||
| **Data/Content** | Structure? Validation rules? Edge cases? |
|
||||
| **Architecture** | Patterns to follow? What to avoid? Reference code? |
|
||||
|
||||
Each answer populates initiative pages. Architect may:
|
||||
- Create user journey pages
|
||||
- Document business rules
|
||||
- Draft technical concepts
|
||||
- Flag architectural impacts
|
||||
|
||||
See [agents/architect.md](agents/architect.md) for the full Architect agent prompt.
|
||||
|
||||
### 3. Discussion Phase (Per Phase)
|
||||
|
||||
Before planning each phase, the Architect captures implementation decisions through focused discussion. This happens BEFORE any planning work.
|
||||
|
||||
```
|
||||
cw phase discuss <phase-id>
|
||||
```
|
||||
|
||||
Creates `{phase}-CONTEXT.md` with locked decisions:
|
||||
|
||||
```markdown
|
||||
---
|
||||
phase: 1
|
||||
discussed_at: 2024-01-15
|
||||
---
|
||||
|
||||
# Phase 1 Context: User Authentication
|
||||
|
||||
## Decisions
|
||||
|
||||
### Authentication Method
|
||||
**Decision:** Email/password with optional OAuth
|
||||
**Reason:** MVP needs simple auth, OAuth for convenience
|
||||
**Locked:** true
|
||||
|
||||
### Token Storage
|
||||
**Decision:** httpOnly cookies
|
||||
**Reason:** XSS protection
|
||||
**Alternatives Rejected:**
|
||||
- localStorage: XSS vulnerable
|
||||
```
|
||||
|
||||
These decisions guide all subsequent planning and execution. Workers reference CONTEXT.md for implementation direction.
|
||||
|
||||
### 4. Research Phase (Optional)
|
||||
|
||||
For phases with unknowns, run discovery before planning:
|
||||
|
||||
| Level | When | Time | Scope |
|
||||
|-------|------|------|-------|
|
||||
| L0 | Pure internal work | Skip | None |
|
||||
| L1 | Quick verification | 2-5 min | Confirm assumptions |
|
||||
| L2 | Standard research | 15-30 min | Explore patterns |
|
||||
| L3 | Deep dive | 1+ hour | Novel domain |
|
||||
|
||||
```
|
||||
cw phase research <phase-id> --level 2
|
||||
```
|
||||
|
||||
Creates `{phase}-RESEARCH.md` with findings that inform planning.
|
||||
|
||||
### 5. Submit for Review
|
||||
|
||||
When Architect and user are satisfied:
|
||||
|
||||
```
|
||||
cw initiative submit <id>
|
||||
```
|
||||
|
||||
Status changes to `review`. Triggers notification for approval.
|
||||
|
||||
### 6. Approve Initiative
|
||||
|
||||
Human reviews the complete initiative:
|
||||
|
||||
```
|
||||
cw initiative approve <id>
|
||||
```
|
||||
|
||||
Status changes to `approved`. The work plan can now be created.
|
||||
|
||||
### 7. Create Work Plan
|
||||
|
||||
Architect (or user) breaks initiative into phases:
|
||||
|
||||
```
|
||||
cw initiative plan <id>
|
||||
```
|
||||
|
||||
This creates:
|
||||
- `initiative_phases` records
|
||||
- Tasks linked to each phase via `initiative_id` + `phase_id`
|
||||
|
||||
Tasks are created in `open` status but won't be "ready" until their phase is approved.
|
||||
|
||||
### 8. Approve Phases (Rolling)
|
||||
|
||||
User reviews and approves phases one at a time:
|
||||
|
||||
```
|
||||
cw phase approve <phase-id>
|
||||
```
|
||||
|
||||
Approved phases make their tasks "ready" for agents. User can approve Phase 1, let agents work, then approve Phase 2 later.
|
||||
|
||||
### 9. Execute
|
||||
|
||||
Workers pull tasks via `cw task ready`. Tasks include:
|
||||
- Link to initiative for context
|
||||
- Link to phase for grouping
|
||||
- All normal task fields (dependencies, priority, etc.)
|
||||
|
||||
### 10. Verify Phase
|
||||
|
||||
After all tasks in a phase complete, the Verifier agent runs goal-backward verification:
|
||||
|
||||
```
|
||||
cw phase verify <phase-id>
|
||||
```
|
||||
|
||||
Verification checks:
|
||||
1. **Observable truths** — What users can observe when goal is achieved
|
||||
2. **Required artifacts** — Files that must exist (not stubs)
|
||||
3. **Required wiring** — Connections that must work
|
||||
4. **Anti-patterns** — TODOs, placeholders, empty returns
|
||||
|
||||
Creates `{phase}-VERIFICATION.md` with results. If gaps found, creates remediation tasks.
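
The anti-pattern check is the easiest of the four to sketch. The example below is illustrative only: `scanForAntiPatterns` is a hypothetical name, and the three regular expressions are assumptions about what counts as a TODO, a placeholder, or an empty return, not the Verifier's actual rules.

```typescript
interface Finding {
  line: number; // 1-based line number
  kind: string;
  text: string;
}

// Illustrative patterns only; a real verifier would likely be language-aware.
const ANTI_PATTERNS: { kind: string; pattern: RegExp }[] = [
  { kind: "todo", pattern: /\b(TODO|FIXME)\b/ },
  { kind: "placeholder", pattern: /placeholder|not implemented/i },
  { kind: "empty-return", pattern: /^\s*return(\s+(null|undefined))?;?\s*$/ },
];

function scanForAntiPatterns(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, index) => {
    for (const { kind, pattern } of ANTI_PATTERNS) {
      if (pattern.test(text)) {
        findings.push({ line: index + 1, kind, text: text.trim() });
      }
    }
  });
  return findings;
}
```

Each finding could then be turned into a remediation task, with the line number and matched text as context.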
|
||||
|
||||
See [verification.md](verification.md) for detailed verification patterns.
|
||||
|
||||
### 11. User Acceptance Testing
|
||||
|
||||
After technical verification passes, run UAT:
|
||||
|
||||
```
|
||||
cw phase uat <phase-id>
|
||||
```
|
||||
|
||||
Walks user through testable deliverables:
|
||||
- "Can you log in with email and password?"
|
||||
- "Does the dashboard show your projects?"
|
||||
|
||||
Creates `{phase}-UAT.md` with results. If issues found, creates targeted fix plans.
|
||||
|
||||
### 12. Complete
|
||||
|
||||
When all tasks in all phases are closed AND verification passes:
|
||||
- Each phase auto-transitions to `completed`
|
||||
- Initiative auto-transitions to `completed`
|
||||
- Domain layer updated to reflect new state
|
||||
|
||||
---
|
||||
|
||||
## Phase Artifacts
|
||||
|
||||
Each phase produces artifacts during execution:
|
||||
|
||||
| Artifact | Created By | Purpose |
|
||||
|----------|------------|---------|
|
||||
| `{phase}-CONTEXT.md` | Architect (Discussion) | Locked implementation decisions |
|
||||
| `{phase}-RESEARCH.md` | Architect (Research) | Domain knowledge findings |
|
||||
| `{phase}-{N}-PLAN.md` | Architect (Planning) | Executable task plans |
|
||||
| `{phase}-{N}-SUMMARY.md` | Worker (Execution) | What actually happened |
|
||||
| `{phase}-VERIFICATION.md` | Verifier | Goal-backward verification |
|
||||
| `{phase}-UAT.md` | Verifier + User | User acceptance testing |
|
||||
|
||||
See [execution-artifacts.md](execution-artifacts.md) for artifact specifications.
|
||||
|
||||
---
|
||||
|
||||
## CLI Reference
|
||||
|
||||
### Initiative Commands
|
||||
|
||||
| Command | Description |
|
||||
|---------|-------------|
|
||||
| `cw initiative create <title>` | Create draft initiative |
|
||||
| `cw initiative list [--status STATUS]` | List initiatives |
|
||||
| `cw initiative show <id>` | Show initiative with page tree |
|
||||
| `cw initiative submit <id>` | Submit for review |
|
||||
| `cw initiative approve <id>` | Approve initiative |
|
||||
| `cw initiative reject <id> --reason "..."` | Reject initiative |
|
||||
| `cw initiative plan <id>` | Generate phased work plan |
|
||||
|
||||
### Page Commands
|
||||
|
||||
| Command | Description |
|
||||
|---------|-------------|
|
||||
| `cw page create <initiative-id> <title> --type TYPE` | Create page |
|
||||
| `cw page create <initiative-id> <title> --parent <page-id>` | Create subpage |
|
||||
| `cw page show <id>` | Show page content |
|
||||
| `cw page edit <id>` | Edit page (opens editor) |
|
||||
| `cw page list <initiative-id> [--type TYPE]` | List pages |
|
||||
| `cw page tree <initiative-id>` | Show page hierarchy |
|
||||
|
||||
### Phase Commands
|
||||
|
||||
| Command | Description |
|
||||
|---------|-------------|
|
||||
| `cw phase list <initiative-id>` | List phases |
|
||||
| `cw phase show <id>` | Show phase with tasks |
|
||||
| `cw phase discuss <id>` | Capture implementation decisions (creates CONTEXT.md) |
|
||||
| `cw phase research <id> [--level N]` | Run discovery (L0-L3, creates RESEARCH.md) |
|
||||
| `cw phase approve <id>` | Approve phase for execution |
|
||||
| `cw phase verify <id>` | Run goal-backward verification |
|
||||
| `cw phase uat <id>` | Run user acceptance testing |
|
||||
| `cw phase status <id>` | Check phase progress |
|
||||
|
||||
---
|
||||
|
||||
## Integration Points
|
||||
|
||||
### With Tasks Module
|
||||
|
||||
Tasks gain two new fields:
|
||||
- `initiative_id`: Links task to initiative (for context)
|
||||
- `phase_id`: Links task to phase (for grouping/approval)
|
||||
|
||||
The `ready_tasks` view should consider phase approval:
|
||||
|
||||
```sql
|
||||
CREATE VIEW ready_tasks AS
|
||||
SELECT t.* FROM tasks t
|
||||
LEFT JOIN initiative_phases p ON t.phase_id = p.id
|
||||
WHERE t.status = 'open'
|
||||
AND (t.phase_id IS NULL OR p.status IN ('approved', 'in_progress'))
|
||||
AND NOT EXISTS (
|
||||
SELECT 1 FROM task_dependencies d
|
||||
JOIN tasks dep ON d.depends_on = dep.id
|
||||
WHERE d.task_id = t.id
|
||||
AND d.type = 'blocks'
|
||||
AND dep.status != 'closed'
|
||||
)
|
||||
ORDER BY t.priority ASC, t.created_at ASC;
|
||||
```
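
The same readiness rule can be stated outside SQL, which may help when reasoning about it or testing it. The sketch below mirrors the view's conditions; `TaskLike` and `isTaskReady` are hypothetical names, and the dependency list is assumed to contain only `blocks`-type dependencies.

```typescript
interface TaskLike {
  id: string;
  status: string;         // e.g. "open", "closed"
  phaseId: string | null; // null for tasks outside any phase
  blockedBy: string[];    // ids of tasks this one depends on via 'blocks'
}

function isTaskReady(
  task: TaskLike,
  phaseStatusById: Map<string, string>,
  taskStatusById: Map<string, string>,
): boolean {
  if (task.status !== "open") return false;

  // Tasks in a phase are ready only once that phase is approved or running.
  if (task.phaseId !== null) {
    const phaseStatus = phaseStatusById.get(task.phaseId);
    if (phaseStatus !== "approved" && phaseStatus !== "in_progress") return false;
  }

  // Mirrors the NOT EXISTS clause: any known blocker that is not closed blocks the task.
  return task.blockedBy.every((id) => {
    const status = taskStatusById.get(id);
    return status === undefined || status === "closed";
  });
}
```

As in the view, a blocker id that matches no task is ignored rather than treated as blocking, because the SQL inner join drops such rows.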
|
||||
|
||||
### With Domain Layer
|
||||
|
||||
When initiative completes, its pages can feed into domain documentation:
|
||||
- Business rules → Domain business rules
|
||||
- Technical concepts → Architecture docs
|
||||
- New aggregates → Domain model updates
|
||||
|
||||
### With Orchestrator
|
||||
|
||||
Orchestrator can:
|
||||
- Trigger Architect agents for initiative iteration
|
||||
- Monitor phase completion and auto-advance initiative status
|
||||
- Coordinate approval notifications
|
||||
|
||||
### tRPC Procedures
|
||||
|
||||
```typescript
|
||||
// Suggested tRPC router shape
|
||||
initiative.create(input) // → Initiative
|
||||
initiative.list(filters) // → Initiative[]
|
||||
initiative.get(id) // → Initiative with pages
|
||||
initiative.submit(id) // → Initiative
|
||||
initiative.approve(id) // → Initiative
|
||||
initiative.reject(id, reason) // → Initiative
|
||||
initiative.plan(id) // → Phase[]
|
||||
|
||||
page.create(input) // → Page
|
||||
page.get(id) // → Page
|
||||
page.update(id, content) // → Page
|
||||
page.list(initiativeId, filters) // → Page[]
|
||||
page.tree(initiativeId) // → PageTree
|
||||
|
||||
phase.list(initiativeId) // → Phase[]
|
||||
phase.get(id) // → Phase with tasks
|
||||
phase.approve(id) // → Phase
|
||||
phase.status(id) // → PhaseStatus
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Future Considerations
|
||||
|
||||
- **Templates**: Pre-built page structures for common initiative types
|
||||
- **Cross-project initiatives**: Single initiative spanning multiple projects
|
||||
- **Versioning**: Track changes to initiative pages over time
|
||||
- **Approval workflows**: Multi-step approval with different approvers
|
||||
- **Auto-planning**: LLM generates work plan from initiative content
|
||||
64
docs/logging.md
Normal file
@@ -0,0 +1,64 @@
|
||||
# Structured Logging
|
||||
|
||||
Codewalk District uses [pino](https://getpino.io/) for structured JSON logging on the backend.
|
||||
|
||||
## Architecture
|
||||
|
||||
- **pino** writes structured JSON to **stderr** so CLI user output on stdout stays clean
|
||||
- **console.log** remains for CLI command handlers (user-facing output on stdout)
|
||||
- The `src/logging/` module (ProcessLogWriter/LogManager) is a separate concern — it captures per-agent process stdout/stderr to files
|
||||
|
||||
## Environment Variables
|
||||
|
||||
| Variable | Description | Default |
|
||||
|----------|-------------|---------|
|
||||
| `CW_LOG_LEVEL` | Log level override (`fatal`, `error`, `warn`, `info`, `debug`, `trace`, `silent`) | `info` (production), `debug` (development) |
|
||||
| `CW_LOG_PRETTY` | Set to `1` for human-readable colorized output via pino-pretty | unset (JSON output) |
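
The two variables resolve to a level and an output mode. The sketch below shows one way that resolution could work; it deliberately does not import pino. `resolveLogConfig` is a hypothetical name, and using `NODE_ENV` to tell production from development is an assumption, since the document does not say how the environment is detected.

```typescript
const LOG_LEVELS = [
  "fatal", "error", "warn", "info", "debug", "trace", "silent",
] as const;

type LogLevel = (typeof LOG_LEVELS)[number];

interface LogConfig {
  level: LogLevel;
  pretty: boolean;
}

function resolveLogConfig(env: Record<string, string | undefined>): LogConfig {
  // Assumption: NODE_ENV distinguishes production from development.
  const fallback: LogLevel = env.NODE_ENV === "production" ? "info" : "debug";
  const requested = env.CW_LOG_LEVEL;
  // Unknown or missing values fall back to the environment default.
  const level = (LOG_LEVELS as readonly string[]).includes(requested ?? "")
    ? (requested as LogLevel)
    : fallback;
  return { level, pretty: env.CW_LOG_PRETTY === "1" };
}
```

Taking the environment as a parameter, rather than reading `process.env` directly, keeps the function easy to test.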
|
||||
|
||||
## Log Levels
|
||||
|
||||
| Level | Usage |
|
||||
|-------|-------|
|
||||
| `fatal` | Process will exit (uncaught exceptions, DB migration failure) |
|
||||
| `error` | Operation failed (agent crash, parse failure, clone failure) |
|
||||
| `warn` | Degraded (account exhausted, no accounts available, stale PID, reconcile marking crashed) |
|
||||
| `info` | State transitions (agent spawned/stopped/resumed, dispatch decision, server started, account selected/switched) |
|
||||
| `debug` | Implementation details (command being built, session ID extraction, worktree paths, schema selection) |
|
||||
|
||||
## Adding Logging to a New Module
|
||||
|
||||
```typescript
|
||||
import { createModuleLogger } from '../logger/index.js';
|
||||
|
||||
const log = createModuleLogger('my-module');
|
||||
|
||||
// Use structured data as first arg, message as second
|
||||
log.info({ taskId, agentId }, 'task dispatched');
|
||||
log.error({ err: error }, 'operation failed');
|
||||
log.debug({ path, count }, 'processing items');
|
||||
```
|
||||
|
||||
## Module Names
|
||||
|
||||
| Module | Used in |
|
||||
|--------|---------|
|
||||
| `agent-manager` | `src/agent/manager.ts` |
|
||||
| `dispatch` | `src/dispatch/manager.ts` |
|
||||
| `http` | `src/server/index.ts` |
|
||||
| `server` | `src/cli/index.ts` (startup) |
|
||||
| `git` | `src/git/manager.ts`, `src/git/clone.ts`, `src/git/project-clones.ts` |
|
||||
| `db` | `src/db/ensure-schema.ts` |
|
||||
|
||||
## Testing
|
||||
|
||||
Logs are silenced in tests via `CW_LOG_LEVEL=silent` in `vitest.config.ts`.
|
||||
|
||||
## Quick Start
|
||||
|
||||
```sh
|
||||
# Pretty logs during development
|
||||
CW_LOG_LEVEL=debug CW_LOG_PRETTY=1 cw --server
|
||||
|
||||
# JSON logs for production/piping
|
||||
cw --server 2>server.log
|
||||
```
|
||||
267
docs/model-profiles.md
Normal file
@@ -0,0 +1,267 @@
|
||||
# Model Profiles
|
||||
|
||||
Different agent roles have different needs. Model selection balances quality, cost, and latency.
|
||||
|
||||
## Profile Definitions
|
||||
|
||||
| Profile | Use Case | Cost | Quality |
|
||||
|---------|----------|------|---------|
|
||||
| **quality** | Critical decisions, architecture | Highest | Best |
|
||||
| **balanced** | Default for most work | Medium | Good |
|
||||
| **budget** | High-volume, low-risk tasks | Lowest | Acceptable |
|
||||
|
||||
---
|
||||
|
||||
## Agent Model Assignments
|
||||
|
||||
| Agent | Quality | Balanced (Default) | Budget |
|
||||
|-------|---------|-------------------|--------|
|
||||
| **Architect** | Opus | Opus | Sonnet |
|
||||
| **Worker** | Opus | Sonnet | Sonnet |
|
||||
| **Verifier** | Sonnet | Sonnet | Haiku |
|
||||
| **Orchestrator** | Sonnet | Sonnet | Haiku |
|
||||
| **Monitor** | Sonnet | Haiku | Haiku |
|
||||
| **Researcher** | Opus | Sonnet | Haiku |
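
The assignment table maps directly onto a lookup. The following is a minimal sketch, assuming lowercase identifiers for agents and models; `MODEL_ASSIGNMENTS` and `resolveModel` are hypothetical names, and a `null` override is treated as "no override", matching the `model_overrides` examples in the Configuration section.

```typescript
type Profile = "quality" | "balanced" | "budget";
type Model = "opus" | "sonnet" | "haiku";
type Agent =
  | "architect" | "worker" | "verifier"
  | "orchestrator" | "monitor" | "researcher";

// Transcribed from the Agent Model Assignments table.
const MODEL_ASSIGNMENTS: Record<Agent, Record<Profile, Model>> = {
  architect:    { quality: "opus",   balanced: "opus",   budget: "sonnet" },
  worker:       { quality: "opus",   balanced: "sonnet", budget: "sonnet" },
  verifier:     { quality: "sonnet", balanced: "sonnet", budget: "haiku" },
  orchestrator: { quality: "sonnet", balanced: "sonnet", budget: "haiku" },
  monitor:      { quality: "sonnet", balanced: "haiku",  budget: "haiku" },
  researcher:   { quality: "opus",   balanced: "sonnet", budget: "haiku" },
};

function resolveModel(
  agent: Agent,
  profile: Profile = "balanced",
  overrides: Partial<Record<Agent, Model | null>> = {},
): Model {
  // A null or missing override falls through to the profile default.
  return overrides[agent] ?? MODEL_ASSIGNMENTS[agent][profile];
}
```

For example, `resolveModel("verifier", "budget")` yields `"haiku"`, while passing `{ verifier: "sonnet" }` as the third argument pins the Verifier to Sonnet regardless of profile.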
|
||||
|
||||
---
|
||||
|
||||
## Rationale
|
||||
|
||||
### Architect (Planning) - Opus/Opus/Sonnet
|
||||
Planning has the highest impact on outcomes. A bad plan wastes all downstream execution. Invest in quality here.
|
||||
|
||||
**Quality profile:** Complex systems, novel domains, critical decisions
|
||||
**Balanced profile:** Standard feature work, established patterns
|
||||
**Budget profile:** Simple initiatives, well-documented domains
|
||||
|
||||
### Worker (Execution) - Opus/Sonnet/Sonnet
|
||||
The plan already contains reasoning. Execution is implementation, not decision-making.
|
||||
|
||||
**Quality profile:** Complex algorithms, security-critical code
|
||||
**Balanced profile:** Standard implementation work
|
||||
**Budget profile:** Simple tasks, boilerplate code
|
||||
|
||||
### Verifier (Validation) - Sonnet/Sonnet/Haiku
|
||||
Verification is structured checking against defined criteria. Less reasoning needed than planning.
|
||||
|
||||
**Quality profile:** Complex verification, subtle integration issues
|
||||
**Balanced profile:** Standard goal-backward verification
|
||||
**Budget profile:** Simple pass/fail checks
|
||||
|
||||
### Orchestrator (Coordination) - Sonnet/Sonnet/Haiku
|
||||
Orchestrator routes work, doesn't do heavy lifting. Needs reliability, not creativity.
|
||||
|
||||
**Quality profile:** Complex multi-agent coordination
|
||||
**Balanced profile:** Standard workflow management
|
||||
**Budget profile:** Simple task routing
|
||||
|
||||
### Monitor (Observation) - Sonnet/Haiku/Haiku
|
||||
Monitoring is pattern matching and threshold checking. Minimal reasoning required.
|
||||
|
||||
**Quality profile:** Complex health analysis
|
||||
**Balanced profile:** Standard monitoring
|
||||
**Budget profile:** Simple heartbeat checks
|
||||
|
||||
### Researcher (Discovery) - Opus/Sonnet/Haiku
|
||||
Research is read-only exploration. High volume, low modification risk.
|
||||
|
||||
**Quality profile:** Deep domain analysis
|
||||
**Balanced profile:** Standard codebase exploration
|
||||
**Budget profile:** Simple file lookups
|
||||
|
||||
---
|
||||
|
||||
## Profile Selection
|
||||
|
||||
### Per-Initiative Override
|
||||
|
||||
```yaml
|
||||
# In initiative config
|
||||
model_profile: quality # Override default balanced
|
||||
```
|
||||
|
||||
### Per-Agent Override
|
||||
|
||||
```yaml
|
||||
# In task assignment
|
||||
assigned_to: worker-123
|
||||
model_override: opus # This task needs Opus
|
||||
```
|
||||
|
||||
### Automatic Escalation
|
||||
|
||||
```yaml
|
||||
# When to auto-escalate
|
||||
escalation_triggers:
|
||||
- condition: "task.retry_count > 2"
|
||||
action: "escalate_model"
|
||||
- condition: "task.complexity == 'high'"
|
||||
action: "use_quality_profile"
|
||||
- condition: "deviation.rule == 4"
|
||||
action: "escalate_model"
|
||||
```
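
The triggers above are written as condition strings; one way to evaluate them is as typed predicates. This sketch makes two assumptions that the document does not confirm: that `escalate_model` means moving one tier up (Haiku to Sonnet to Opus), and that the inputs are available as the fields shown. All names here are hypothetical.

```typescript
type Tier = "haiku" | "sonnet" | "opus";
const TIERS: Tier[] = ["haiku", "sonnet", "opus"];

interface EscalationInput {
  retryCount: number;
  complexity: "low" | "medium" | "high";
  deviationRule?: number; // set when a deviation rule was triggered
}

type EscalationAction = "escalate_model" | "use_quality_profile";

// Returns the actions whose conditions hold, in the order listed in the config.
function escalationActions(input: EscalationInput): EscalationAction[] {
  const actions: EscalationAction[] = [];
  if (input.retryCount > 2) actions.push("escalate_model");
  if (input.complexity === "high") actions.push("use_quality_profile");
  if (input.deviationRule === 4) actions.push("escalate_model");
  return actions;
}

// Assumption: escalation moves one tier up and stops at opus.
function escalateModel(current: Tier): Tier {
  const next = Math.min(TIERS.indexOf(current) + 1, TIERS.length - 1);
  return TIERS[next];
}
```

Typed predicates avoid evaluating arbitrary strings at runtime, at the cost of requiring a code change to add a new trigger.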
|
||||
|
||||
---
|
||||
|
||||
## Cost Management
|
||||
|
||||
### Estimated Token Usage
|
||||
|
||||
| Agent | Avg Tokens/Task | Profile Impact |
|
||||
|-------|-----------------|----------------|
|
||||
| Architect | 50k-100k | 3x between budget/quality |
|
||||
| Worker | 20k-50k | 2x between budget/quality |
|
||||
| Verifier | 10k-30k | 1.5x between budget/quality |
|
||||
| Orchestrator | 5k-15k | 1.5x between budget/quality |
|
||||
|
||||
### Cost Optimization Strategies
|
||||
|
||||
1. **Right-size tasks:** Smaller tasks = less token usage
|
||||
2. **Use budget for volume:** Monitoring, simple checks
|
||||
3. **Reserve quality for impact:** Architecture, security
|
||||
4. **Profile per initiative:** Simple features use budget, complex use quality
|
||||
|
||||
---
|
||||
|
||||
## Configuration
|
||||
|
||||
### Default Profile
|
||||
|
||||
```jsonc
|
||||
// .planning/config.json
|
||||
{
|
||||
"model_profile": "balanced",
|
||||
"model_overrides": {
|
||||
"architect": null,
|
||||
"worker": null,
|
||||
"verifier": null
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Quality Profile
|
||||
|
||||
```json
|
||||
{
|
||||
"model_profile": "quality",
|
||||
"model_overrides": {}
|
||||
}
|
||||
```
|
||||
|
||||
### Budget Profile
|
||||
|
||||
```jsonc
|
||||
{
|
||||
"model_profile": "budget",
|
||||
"model_overrides": {
|
||||
"architect": "sonnet" // Keep architect at sonnet minimum
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Mixed Profile
|
||||
|
||||
```jsonc
|
||||
{
|
||||
"model_profile": "balanced",
|
||||
"model_overrides": {
|
||||
"architect": "opus", // Invest in planning
|
||||
"worker": "sonnet", // Standard execution
|
||||
"verifier": "haiku" // Budget verification
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Model Capabilities Reference
|
||||
|
||||
### Opus
|
||||
- **Strengths:** Complex reasoning, nuanced decisions, novel problems
|
||||
- **Best for:** Architecture, complex algorithms, security analysis
|
||||
- **Cost:** Highest
|
||||
|
||||
### Sonnet
|
||||
- **Strengths:** Good balance of reasoning and speed, reliable
|
||||
- **Best for:** Standard development, code generation, debugging
|
||||
- **Cost:** Medium
|
||||
|
||||
### Haiku
|
||||
- **Strengths:** Fast, cheap, good for structured tasks
|
||||
- **Best for:** Monitoring, simple checks, high-volume operations
|
||||
- **Cost:** Lowest
|
||||
|
||||
---
|
||||
|
||||
## Profile Switching
|
||||
|
||||
### CLI Command
|
||||
|
||||
```bash
|
||||
# Set profile for all future work
|
||||
cw config set model_profile quality
|
||||
|
||||
# Set profile for specific initiative
|
||||
cw initiative config <id> --model-profile budget
|
||||
|
||||
# Override for single task
|
||||
cw task update <id> --model-override opus
|
||||
```
|
||||
|
||||
### API
|
||||
|
||||
```typescript
|
||||
// Set initiative profile
|
||||
await initiative.setConfig(id, { modelProfile: 'quality' });
|
||||
|
||||
// Override task model
|
||||
await task.update(id, { modelOverride: 'opus' });
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Monitoring Model Usage
|
||||
|
||||
Track model usage for cost analysis:
|
||||
|
||||
```sql
|
||||
CREATE TABLE model_usage (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
agent_type TEXT NOT NULL,
|
||||
model TEXT NOT NULL,
|
||||
tokens_input INTEGER,
|
||||
tokens_output INTEGER,
|
||||
task_id TEXT,
|
||||
initiative_id TEXT,
|
||||
created_at INTEGER DEFAULT (unixepoch())
|
||||
);
|
||||
|
||||
-- Usage by agent type
|
||||
SELECT agent_type, model, SUM(tokens_input + tokens_output) as total_tokens
|
||||
FROM model_usage
|
||||
GROUP BY agent_type, model;
|
||||
|
||||
-- Cost by initiative
|
||||
SELECT initiative_id,
|
||||
  SUM(CASE WHEN model = 'opus' THEN (tokens_input + tokens_output) * 0.015
           WHEN model = 'sonnet' THEN (tokens_input + tokens_output) * 0.003
           WHEN model = 'haiku' THEN (tokens_input + tokens_output) * 0.0003 END) as estimated_cost
|
||||
FROM model_usage
|
||||
GROUP BY initiative_id;
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Recommendations
|
||||
|
||||
### Starting Out
|
||||
Use **balanced** profile. It provides good quality at reasonable cost.
|
||||
|
||||
### High-Stakes Projects
|
||||
Use **quality** profile. The extra model cost is negligible compared to the cost of getting it wrong.
|
||||
|
||||
### High-Volume Work
|
||||
Use **budget** profile with architect override to sonnet. Don't skimp on planning.
|
||||
|
||||
### Learning the System
|
||||
Use **quality** profile initially. See what good output looks like before optimizing for cost.
|
||||
402
docs/session-state.md
Normal file
@@ -0,0 +1,402 @@
|
||||
# Session State
|
||||
|
||||
Session state tracks position, decisions, and blockers across agent restarts. Unlike the Domain Layer (which tracks codebase state), session state tracks **execution state**.
|
||||
|
||||
## STATE.md
|
||||
|
||||
Every active initiative maintains a STATE.md file tracking execution progress:
|
||||
|
||||
```yaml
|
||||
# STATE.md
initiative: init-abc123
title: User Authentication

# Current Position
position:
  phase: 2
  phase_name: "JWT Implementation"
  plan: 3
  plan_name: "Refresh Token Rotation"
  task: "Implement token rotation endpoint"
  wave: 1
  status: in_progress

# Progress Tracking
progress:
  phases_total: 4
  phases_completed: 1
  current_phase_tasks: 8
  current_phase_completed: 5
  bar: "████████░░░░░░░░ 50%"

# Decisions Made
decisions:
  - date: 2024-01-14
    context: "Token storage strategy"
    decision: "httpOnly cookie, not localStorage"
    reason: "XSS protection, automatic inclusion in requests"

  - date: 2024-01-14
    context: "JWT library"
    decision: "jose over jsonwebtoken"
    reason: "Better TypeScript support, Web Crypto API"

  - date: 2024-01-15
    context: "Refresh token lifetime"
    decision: "7 days"
    reason: "Balance between security and UX"

# Active Blockers
blockers:
  - id: block-001
    description: "Waiting for OAuth credentials from client"
    blocked_since: 2024-01-15
    affects: ["Phase 3: OAuth Integration"]
    workaround: "Proceeding with email/password auth first"

# Session History
sessions:
  - id: session-001
    started: 2024-01-14T09:00:00Z
    ended: 2024-01-14T17:00:00Z
    completed: ["Phase 1: Database Schema", "Phase 2 Tasks 1-3"]

  - id: session-002
    started: 2024-01-15T09:00:00Z
    status: active
    working_on: "Phase 2, Task 4: Refresh token rotation"

# Next Action
next_action: |
  Continue implementing refresh token rotation endpoint.
  After completion, run verification for Phase 2.
  If Phase 2 passes, move to Phase 3 (blocked pending OAuth creds).

# Context for Resume
resume_context:
  files_modified_this_session:
    - src/api/auth/refresh.ts
    - src/middleware/auth.ts
    - db/migrations/002_refresh_tokens.sql

  key_implementations:
    - "Refresh tokens stored in SQLite with expiry"
    - "Rotation creates new token, invalidates old"
    - "Token family tracking for reuse detection"

  open_questions: []
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## State Updates
|
||||
|
||||
### When to Update STATE.md
|
||||
|
||||
| Event | Update |
|
||||
|-------|--------|
|
||||
| Task started | `position.task`, `position.status` |
|
||||
| Task completed | `progress.*`, `position` to next task |
|
||||
| Decision made | Add to `decisions` |
|
||||
| Blocker encountered | Add to `blockers` |
|
||||
| Blocker resolved | Remove from `blockers` |
|
||||
| Session start | Add to `sessions` |
|
||||
| Session end | Update session `ended`, `completed` |
|
||||
| Phase completed | `progress.phases_completed`, reset task counters |
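
The event-to-field mapping in this table can be read as a small reducer. The sketch below is illustrative only: `StateEvent`, `applyEvent`, the trimmed `State` shape, and the status assigned after a task completes are assumptions made for this example, not part of Codewalk's API.

```typescript
// Hypothetical, trimmed view of STATE.md; keys mirror the YAML example.
interface Blocker { id: string; description: string }
interface State {
  position: { task: string | null; status: string };
  progress: { phases_completed: number; current_phase_tasks: number; current_phase_completed: number };
  blockers: Blocker[];
}

// Assumed event names covering a subset of the table rows.
type StateEvent =
  | { kind: "task_started"; task: string }
  | { kind: "task_completed"; nextTask: string | null }
  | { kind: "blocker_encountered"; blocker: Blocker }
  | { kind: "blocker_resolved"; id: string }
  | { kind: "phase_completed"; nextPhaseTasks: number };

// Returns a new State and never mutates the input, so a failed write
// cannot leave a half-updated object behind.
function applyEvent(state: State, event: StateEvent): State {
  switch (event.kind) {
    case "task_started":
      return { ...state, position: { task: event.task, status: "in_progress" } };
    case "task_completed":
      return {
        ...state,
        // Assumption: the next task starts out as "open" until it is claimed.
        position: { task: event.nextTask, status: "open" },
        progress: {
          ...state.progress,
          current_phase_completed: state.progress.current_phase_completed + 1,
        },
      };
    case "blocker_encountered":
      return { ...state, blockers: state.blockers.concat([event.blocker]) };
    case "blocker_resolved": {
      const resolvedId = event.id;
      return { ...state, blockers: state.blockers.filter((b) => b.id !== resolvedId) };
    }
    case "phase_completed":
      return {
        ...state,
        progress: {
          phases_completed: state.progress.phases_completed + 1,
          current_phase_tasks: event.nextPhaseTasks,
          current_phase_completed: 0, // reset task counters for the new phase
        },
      };
  }
}
```

Keeping each event as one pure transformation makes it straightforward to persist the result in a single write, whichever storage option is used.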
|
||||
|
||||
### Atomic Updates
|
||||
|
||||
```typescript
|
||||
// Update position atomically
|
||||
await updateState({
|
||||
position: {
|
||||
phase: 2,
|
||||
plan: 3,
|
||||
task: "Implement token rotation",
|
||||
wave: 1,
|
||||
status: "in_progress"
|
||||
}
|
||||
});
|
||||
|
||||
// Add decision
|
||||
await addDecision({
|
||||
context: "Token storage",
|
||||
decision: "httpOnly cookie",
|
||||
reason: "XSS protection"
|
||||
});
|
||||
|
||||
// Record blocker
|
||||
await addBlocker({
|
||||
description: "Waiting for OAuth creds",
|
||||
affects: ["Phase 3"]
|
||||
});
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Resume Protocol
|
||||
|
||||
When resuming work:
|
||||
|
||||
### 1. Load STATE.md
|
||||
```
|
||||
Read STATE.md for initiative
|
||||
Extract: position, decisions, blockers, resume_context
|
||||
```
|
||||
|
||||
### 2. Load Relevant Context
|
||||
```
|
||||
If position.plan exists:
  Load {phase}-{plan}-PLAN.md
  Load prior SUMMARY.md files for this phase

If position.task exists:
  Find task in current plan
  Resume from that task
|
||||
```
|
||||
|
||||
### 3. Verify State
|
||||
```
|
||||
Check files_modified_this_session still exist
|
||||
Check implementations match key_implementations
|
||||
If mismatch: flag for review before proceeding
|
||||
```
|
||||
|
||||
### 4. Continue Execution
|
||||
```
|
||||
Display: "Resuming from Phase {N}, Plan {M}, Task: {name}"
|
||||
Display: decisions made (for context)
|
||||
Display: active blockers (for awareness)
|
||||
Continue with task execution
|
||||
```
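
Steps 1, 2 and 4 of the resume protocol can be sketched as a pure function that turns a parsed STATE.md into a list of files to load and messages to display. `ResumeState`, `ResumePlan` and `buildResumePlan` are hypothetical names. Step 3 (checking files on disk) is left out because it needs filesystem access.

```typescript
// Hypothetical parsed subset of STATE.md used when resuming.
interface ResumeState {
  position: { phase: number; plan?: number; task?: string };
  decisions: { context: string; decision: string }[];
  blockers: { id: string; description: string }[];
}

interface ResumePlan {
  filesToLoad: string[];
  messages: string[];
}

function buildResumePlan(state: ResumeState): ResumePlan {
  const { phase, plan, task } = state.position;

  // Step 2: the plan file follows the {phase}-{plan}-PLAN.md pattern.
  const filesToLoad: string[] = [];
  if (plan !== undefined) {
    filesToLoad.push(`${phase}-${plan}-PLAN.md`);
  }

  // Step 4: position first, then decisions and blockers for context.
  const planLabel = plan !== undefined ? String(plan) : "?";
  const taskLabel = task !== undefined ? task : "(none)";
  const messages: string[] = [
    `Resuming from Phase ${phase}, Plan ${planLabel}, Task: ${taskLabel}`,
  ];
  for (const d of state.decisions) {
    messages.push(`Decision (${d.context}): ${d.decision}`);
  }
  for (const b of state.blockers) {
    messages.push(`Blocker ${b.id}: ${b.description}`);
  }
  return { filesToLoad, messages };
}
```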
|
||||
|
||||
---
|
||||
|
||||
## Decision Tracking
|
||||
|
||||
Decisions are first-class citizens, not comments.
|
||||
|
||||
### What to Track
|
||||
|
||||
| Type | Example | Why Track |
|
||||
|------|---------|-----------|
|
||||
| Technology choice | "Using jose for JWT" | Prevents second-guessing |
|
||||
| Architecture decision | "Separate auth service" | Documents reasoning |
|
||||
| Trade-off resolution | "Speed over features" | Explains constraints |
|
||||
| User preference | "Dark mode default" | Preserves intent |
|
||||
| Constraint discovered | "API rate limited to 100/min" | Prevents repeated discovery |
|
||||
|
||||
### Decision Format
|
||||
|
||||
```yaml
|
||||
decisions:
  - date: 2024-01-15
    context: "Where the decision was needed"
    decision: "What was decided"
    reason: "Why this choice"
    alternatives_considered:
      - "Alternative A: rejected because..."
      - "Alternative B: rejected because..."
    reversible: true  # or false
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Blocker Management
|
||||
|
||||
### Blocker States
|
||||
|
||||
```
|
||||
[new] ──identify──▶ [active] ──resolve──▶ [resolved]
                       │
                       │ workaround
                       ▼
                   [bypassed]
|
||||
```
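
The blocker states can be encoded as a transition table so that invalid moves fail loudly. This is a sketch under assumptions: it reads the `workaround` arrow as leaving `active`, it treats `bypassed` and `resolved` as terminal because the diagram shows no further arrows, and `nextBlockerStatus` is a hypothetical helper.

```typescript
type BlockerStatus = "new" | "active" | "bypassed" | "resolved";
type BlockerAction = "identify" | "resolve" | "workaround";

// One entry per arrow in the diagram; missing entries are invalid moves.
const BLOCKER_TRANSITIONS: Record<BlockerStatus, Partial<Record<BlockerAction, BlockerStatus>>> = {
  new: { identify: "active" },
  active: { resolve: "resolved", workaround: "bypassed" },
  bypassed: {},
  resolved: {},
};

function nextBlockerStatus(current: BlockerStatus, action: BlockerAction): BlockerStatus {
  const next = BLOCKER_TRANSITIONS[current][action];
  if (next === undefined) {
    throw new Error(`Invalid blocker transition: ${current} + ${action}`);
  }
  return next;
}
```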
|
||||
|
||||
### Blocker Format
|
||||
|
||||
```yaml
|
||||
blockers:
  - id: block-001
    status: active
    description: "Need production API keys"
    identified_at: 2024-01-15T10:00:00Z
    affects:
      - "Phase 4: Production deployment"
      - "Phase 5: Monitoring setup"
    blocked_tasks:
      - task-xyz: "Configure production environment"
    workaround: null
    resolution: null

  - id: block-002
    status: bypassed
    description: "Design mockups not ready"
    identified_at: 2024-01-14T09:00:00Z
    affects: ["UI implementation"]
    workaround: "Using placeholder styles, will refine later"
    workaround_tasks:
      - task-abc: "Apply final styles when mockups ready"
|
||||
```
|
||||
|
||||
### Blocker Impact on Execution
|
||||
|
||||
1. **Task Blocking:** Task marked `blocked` in tasks table
|
||||
2. **Phase Blocking:** If all remaining tasks blocked, phase paused
|
||||
3. **Initiative Blocking:** If all phases blocked, escalate to user
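
The three escalation levels above can be sketched as two predicates over task statuses. The names `isPhasePaused` and `shouldEscalate` are hypothetical, and the sketch assumes that "all phases blocked" means every phase that still has unfinished work is paused.

```typescript
type TaskStatus = "open" | "in_progress" | "blocked" | "closed";

interface PhaseTasks {
  phase: string;
  tasks: TaskStatus[];
}

// A phase is paused when it still has unfinished tasks and all of them are blocked.
function isPhasePaused(p: PhaseTasks): boolean {
  const remaining = p.tasks.filter((s) => s !== "closed");
  return remaining.length > 0 && remaining.every((s) => s === "blocked");
}

// Escalate to the user when unfinished work exists and every unfinished phase is paused.
function shouldEscalate(phases: PhaseTasks[]): boolean {
  const unfinished = phases.filter((p) => p.tasks.some((s) => s !== "closed"));
  return unfinished.length > 0 && unfinished.every((p) => isPhasePaused(p));
}
```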
|
||||
|
||||
---
|
||||
|
||||
## Session History
|
||||
|
||||
Track work sessions for debugging and handoffs:
|
||||
|
||||
```yaml
|
||||
sessions:
|
||||
- id: session-001
|
||||
agent: worker-abc
|
||||
started: 2024-01-14T09:00:00Z
|
||||
ended: 2024-01-14T12:30:00Z
|
||||
context_usage: "45%"
|
||||
completed:
|
||||
- "Phase 1, Plan 1: Database setup"
|
||||
- "Phase 1, Plan 2: User model"
|
||||
notes: "Clean execution, no issues"
|
||||
|
||||
- id: session-002
|
||||
agent: worker-def
|
||||
started: 2024-01-14T13:00:00Z
|
||||
ended: 2024-01-14T17:00:00Z
|
||||
context_usage: "62%"
|
||||
completed:
|
||||
- "Phase 1, Plan 3: Auth endpoints"
|
||||
issues:
|
||||
- "Context exceeded 50%, quality may have degraded"
|
||||
- "Encountered blocker: missing env vars"
|
||||
handoff_reason: "Context limit reached"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Storage Options
|
||||
|
||||
### SQLite (Recommended for Codewalk)
|
||||
|
||||
```sql
|
||||
CREATE TABLE initiative_state (
|
||||
initiative_id TEXT PRIMARY KEY REFERENCES initiatives(id),
|
||||
current_phase INTEGER,
|
||||
current_plan INTEGER,
|
||||
current_task TEXT,
|
||||
current_wave INTEGER,
|
||||
status TEXT,
|
||||
progress_json TEXT,
|
||||
updated_at INTEGER
|
||||
);
|
||||
|
||||
CREATE TABLE initiative_decisions (
|
||||
id TEXT PRIMARY KEY,
|
||||
initiative_id TEXT REFERENCES initiatives(id),
|
||||
date INTEGER,
|
||||
context TEXT,
|
||||
decision TEXT,
|
||||
reason TEXT,
|
||||
alternatives_json TEXT,
|
||||
reversible BOOLEAN
|
||||
);
|
||||
|
||||
CREATE TABLE initiative_blockers (
|
||||
id TEXT PRIMARY KEY,
|
||||
initiative_id TEXT REFERENCES initiatives(id),
|
||||
status TEXT CHECK (status IN ('active', 'bypassed', 'resolved')),
|
||||
description TEXT,
|
||||
identified_at INTEGER,
|
||||
affects_json TEXT,
|
||||
workaround TEXT,
|
||||
resolution TEXT,
|
||||
resolved_at INTEGER
|
||||
);
|
||||
|
||||
CREATE TABLE session_history (
|
||||
id TEXT PRIMARY KEY,
|
||||
initiative_id TEXT REFERENCES initiatives(id),
|
||||
agent_id TEXT,
|
||||
started_at INTEGER,
|
||||
ended_at INTEGER,
|
||||
context_usage REAL,
|
||||
completed_json TEXT,
|
||||
issues_json TEXT,
|
||||
handoff_reason TEXT
|
||||
);
|
||||
```
|
||||
|
||||
### File-Based (Alternative)
|
||||
|
||||
```
|
||||
.planning/
|
||||
├── STATE.md # Current state
|
||||
├── decisions/
|
||||
│ └── 2024-01-15-jwt-library.md
|
||||
├── blockers/
|
||||
│ └── block-001-oauth-creds.md
|
||||
└── sessions/
|
||||
├── session-001.md
|
||||
└── session-002.md
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Integration with Agents
|
||||
|
||||
### Worker
|
||||
- Reads STATE.md at start
|
||||
- Updates position on task transitions
|
||||
- Adds deviations to session notes
|
||||
- Updates progress counters
|
||||
|
||||
### Architect
|
||||
- Creates initial STATE.md when planning
|
||||
- Sets up phase/plan structure
|
||||
- Documents initial decisions
|
||||
|
||||
### Orchestrator
|
||||
- Monitors blocker status
|
||||
- Triggers resume when blockers resolve
|
||||
- Coordinates session handoffs
|
||||
|
||||
### Verifier
|
||||
- Reads decisions for verification context
|
||||
- Updates state with verification results
|
||||
- Flags issues for resolution
|
||||
|
||||
---
|
||||
|
||||
## Example: Resume After Crash
|
||||
|
||||
```
|
||||
1. Agent crashes mid-task

2. Supervisor detects stale assignment
   - Task assigned_at > 30min ago
   - No progress updates

3. Supervisor resets task
   - Status back to 'open'
   - Clear assigned_to

4. New agent picks up task
   - Reads STATE.md
   - Sees: "Last working on: Refresh token rotation"
   - Loads relevant PLAN.md
   - Resumes execution

5. STATE.md shows continuity
   sessions:
     - id: session-003
       status: crashed
       notes: "Agent unresponsive, task reset"
     - id: session-004
       status: active
       notes: "Resuming from session-003 crash"
|
||||
```
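
Steps 2 and 3 of this example (stale detection and reset) can be sketched as follows. The 30-minute threshold comes from the example. The `last_progress_at` field and the helper names are assumptions, since the source does not say how "no progress updates" is recorded.

```typescript
interface AssignedTask {
  id: string;
  status: string;
  assigned_to: string | null;
  assigned_at: number | null;       // Unix seconds
  last_progress_at: number | null;  // hypothetical field, Unix seconds
}

const STALE_AFTER_SECONDS = 30 * 60;

// Stale: in progress, and neither the assignment nor any progress signal is recent.
function isStale(task: AssignedTask, nowSeconds: number): boolean {
  if (task.status !== "in_progress" || task.assigned_at === null) return false;
  const lastProgress = task.last_progress_at !== null ? task.last_progress_at : 0;
  const lastSignal = Math.max(task.assigned_at, lastProgress);
  return nowSeconds - lastSignal > STALE_AFTER_SECONDS;
}

// Reset: status back to 'open' and the assignment cleared.
function resetTask(task: AssignedTask): AssignedTask {
  return { ...task, status: "open", assigned_to: null, assigned_at: null };
}
```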
|
||||
309
docs/task-granularity.md
Normal file
@@ -0,0 +1,309 @@
|
||||
# Task Granularity Standards
|
||||
|
||||
A task must be specific enough for execution without interpretation. Vague tasks cause agents to guess, leading to inconsistent results and rework.
|
||||
|
||||
## The Granularity Test
|
||||
|
||||
Ask: **Can an agent execute this task without making assumptions?**
|
||||
|
||||
If the answer requires "it depends" or "probably means", the task is too vague.
|
||||
|
||||
---
|
||||
|
||||
## Comparison Table
|
||||
|
||||
| Too Vague | Just Right |
|
||||
|-----------|------------|
|
||||
| "Add authentication" | "Add JWT auth with refresh rotation using jose library, store in httpOnly cookie, 15min access / 7day refresh" |
|
||||
| "Create the API" | "Create POST /api/projects accepting {name, description}, validates name length 3-50 chars, returns 201 with project object" |
|
||||
| "Style the dashboard" | "Add Tailwind classes to Dashboard.tsx: grid layout (3 cols on lg, 1 on mobile), card shadows, hover states on action buttons" |
|
||||
| "Handle errors" | "Wrap API calls in try/catch, return {error: string} on 4xx/5xx, show toast via sonner on client" |
|
||||
| "Add form validation" | "Add Zod schema to CreateProjectForm: name (3-50 chars, alphanumeric), description (optional, max 500 chars), show inline errors" |
|
||||
| "Improve performance" | "Add React.memo to ProjectCard, useMemo for filtered list in Dashboard, lazy load ProjectDetails route" |
|
||||
| "Fix the login bug" | "Fix login redirect loop: after successful login in auth.ts:45, redirect to stored returnUrl instead of always '/'" |
|
||||
| "Set up the database" | "Create SQLite database at data/cw.db with migrations in db/migrations/, run via 'cw db migrate'" |
|
||||
|
||||
---
|
||||
|
||||
## Required Task Components
|
||||
|
||||
Every task MUST include:
|
||||
|
||||
### 1. Files
|
||||
Exact paths that will be created or modified.
|
||||
|
||||
```yaml
|
||||
files:
|
||||
- src/components/Chat.tsx # create
|
||||
- src/hooks/useChat.ts # create
|
||||
- src/api/messages.ts # modify
|
||||
```
|
||||
|
||||
### 2. Action
|
||||
What to do, what to avoid, and WHY.
|
||||
|
||||
```yaml
|
||||
action: |
|
||||
Create Chat component with:
|
||||
- Message list (virtualized for performance)
|
||||
- Input field with send button
|
||||
- Auto-scroll to bottom on new message
|
||||
|
||||
DO NOT:
|
||||
- Implement WebSocket (separate task)
|
||||
- Add typing indicators (Phase 2)
|
||||
|
||||
WHY: Core chat UI needed before real-time features
|
||||
```
|
||||
|
||||
### 3. Verify
|
||||
Command or check to prove completion.
|
||||
|
||||
```yaml
|
||||
verify:
|
||||
- command: "npm run typecheck"
|
||||
expect: "No type errors"
|
||||
- command: "npm run test -- Chat.test.tsx"
|
||||
expect: "Tests pass"
|
||||
- manual: "Navigate to /chat, see empty message list and input"
|
||||
```
|
||||
|
||||
### 4. Done
|
||||
Measurable acceptance criteria.
|
||||
|
||||
```yaml
|
||||
done:
|
||||
- "Chat component renders without errors"
|
||||
- "Input accepts text and clears on submit"
|
||||
- "Messages display in chronological order"
|
||||
- "Tests cover send and display functionality"
|
||||
```
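
A planner could reject tasks that lack any of the four required components before they enter the queue. The sketch below is one possible check. `TaskSpec` and `missingComponents` are illustrative names, and the shape assumes `verify` and `done` are lists of strings.

```typescript
// Hypothetical task shape with the four required components.
interface TaskSpec {
  files?: string[];
  action?: string;
  verify?: string[];
  done?: string[];
}

// Returns the names of the required components that are absent or empty.
function missingComponents(task: TaskSpec): string[] {
  const missing: string[] = [];
  if (!task.files || task.files.length === 0) missing.push("files");
  if (!task.action || task.action.trim() === "") missing.push("action");
  if (!task.verify || task.verify.length === 0) missing.push("verify");
  if (!task.done || task.done.length === 0) missing.push("done");
  return missing;
}
```

An empty result means the task carries all four components. It says nothing about whether their content is specific enough, which still needs the granularity test.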
|
||||
|
||||
---
|
||||
|
||||
## Task Types
|
||||
|
||||
### Type: auto
|
||||
Agent executes autonomously.
|
||||
|
||||
```yaml
|
||||
type: auto
|
||||
files: [src/components/Button.tsx]
|
||||
action: "Create Button component with primary/secondary variants using Tailwind"
|
||||
verify: "npm run typecheck && npm run test"
|
||||
done: "Button renders with correct styles for each variant"
|
||||
```
|
||||
|
||||
### Type: checkpoint:human-verify
|
||||
Agent completes, human confirms.
|
||||
|
||||
```yaml
|
||||
type: checkpoint:human-verify
|
||||
files: [src/pages/Dashboard.tsx]
|
||||
action: "Implement dashboard layout with project cards"
|
||||
verify: "Navigate to /dashboard after login"
|
||||
prompt: "Does the dashboard match the design mockup?"
|
||||
done: "User confirms layout is correct"
|
||||
```
|
||||
|
||||
### Type: checkpoint:decision
|
||||
Human makes choice that affects implementation.
|
||||
|
||||
```yaml
|
||||
type: checkpoint:decision
|
||||
prompt: "Which chart library should we use?"
|
||||
options:
|
||||
- recharts: "Built for React, good for simple charts"
|
||||
- d3: "More powerful, steeper learning curve"
|
||||
- chart.js: "Lightweight, canvas-based"
|
||||
affects: "All subsequent charting tasks"
|
||||
```
|
||||
|
||||
### Type: checkpoint:human-action
|
||||
Unavoidable manual step.
|
||||
|
||||
```yaml
|
||||
type: checkpoint:human-action
|
||||
prompt: "Please click the verification link sent to your email"
|
||||
reason: "Cannot automate email client interaction"
|
||||
continue_after: "User confirms email verified"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Time Estimation
|
||||
|
||||
Tasks should fit within context budgets:
|
||||
|
||||
| Complexity | Context % | Wall Time | Example |
|
||||
|------------|-----------|-----------|---------|
|
||||
| Trivial | 5-10% | 2-5 min | Add a CSS class |
|
||||
| Simple | 10-20% | 5-15 min | Add form field |
|
||||
| Medium | 20-35% | 15-30 min | Create API endpoint |
|
||||
| Complex | 35-50% | 30-60 min | Implement auth flow |
|
||||
| Too Large | >50% | - | **SPLIT REQUIRED** |
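
The context bands in this table can be applied mechanically once a task has a context estimate. In this sketch the function name is hypothetical, and because adjacent bands share their boundary values (10, 20, 35), a boundary value is assumed to fall in the lower band.

```typescript
type Complexity = "trivial" | "simple" | "medium" | "complex" | "too_large";

// Maps an estimated context percentage to the bands in the table.
// Assumption: boundary values (10, 20, 35, 50) belong to the lower band.
function classifyByContext(contextPct: number): Complexity {
  if (contextPct > 50) return "too_large"; // split required
  if (contextPct > 35) return "complex";
  if (contextPct > 20) return "medium";
  if (contextPct > 10) return "simple";
  return "trivial";
}
```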
|
||||
|
||||
---
|
||||
|
||||
## Splitting Large Tasks
|
||||
|
||||
When a task exceeds 50% context estimate, decompose:
|
||||
|
||||
### Before (Too Large)
|
||||
```yaml
|
||||
title: "Implement user authentication"
|
||||
# This is 3+ hours of work, dozens of decisions
|
||||
```
|
||||
|
||||
### After (Properly Decomposed)
|
||||
```yaml
|
||||
tasks:
  - id: users-table
    title: "Create users table with password hash"
    files: [db/migrations/001_users.sql]

  - id: signup-endpoint
    title: "Add signup endpoint with Zod validation"
    files: [src/api/auth/signup.ts]
    depends_on: [users-table]

  - id: login-endpoint
    title: "Add login endpoint with JWT generation"
    files: [src/api/auth/login.ts]
    depends_on: [users-table]

  - id: auth-middleware
    title: "Create auth middleware for protected routes"
    files: [src/middleware/auth.ts]
    depends_on: [login-endpoint]

  - id: refresh-rotation
    title: "Add refresh token rotation"
    files: [src/api/auth/refresh.ts, db/migrations/002_refresh_tokens.sql]
    depends_on: [auth-middleware]
|
||||
```
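
Once tasks declare `depends_on`, they can be grouped into batches whose members have no unmet dependencies and could run in parallel. The sketch below shows one way to compute such batches. `computeWaves` is a hypothetical helper, the task IDs are illustrative, and treating each batch as a `wave` in the STATE.md sense is an assumption.

```typescript
interface PlannedTask {
  id: string;
  depends_on: string[];
}

// Repeatedly peels off every task whose dependencies are all satisfied.
function computeWaves(tasks: PlannedTask[]): string[][] {
  const done: Record<string, boolean> = {};
  let remaining = tasks.slice();
  const waves: string[][] = [];

  while (remaining.length > 0) {
    const ready = remaining.filter((t) => t.depends_on.every((d) => done[d] === true));
    if (ready.length === 0) {
      // Nothing can start: a cycle, or a dependency on an unknown task ID.
      throw new Error("Dependency cycle or unknown dependency");
    }
    waves.push(ready.map((t) => t.id));
    for (const t of ready) done[t.id] = true;
    remaining = remaining.filter((t) => done[t.id] !== true);
  }
  return waves;
}

// The decomposed authentication example, with illustrative IDs.
const authTasks: PlannedTask[] = [
  { id: "users-table", depends_on: [] },
  { id: "signup-endpoint", depends_on: ["users-table"] },
  { id: "login-endpoint", depends_on: ["users-table"] },
  { id: "auth-middleware", depends_on: ["login-endpoint"] },
  { id: "refresh-rotation", depends_on: ["auth-middleware"] },
];
const authWaves = computeWaves(authTasks);
```

For this example the signup and login endpoints land in the same batch, since both depend only on the users table.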
|
||||
|
||||
---
|
||||
|
||||
## Anti-Patterns
|
||||
|
||||
### Vague Verbs
|
||||
**Bad:** "Improve", "Enhance", "Update", "Fix" (without specifics)
|
||||
**Good:** "Add X", "Change Y to Z", "Remove W"
|
||||
|
||||
### Missing Constraints
|
||||
**Bad:** "Add validation"
|
||||
**Good:** "Add Zod validation: email format, password 8+ chars with number"
|
||||
|
||||
### Implied Knowledge
|
||||
**Bad:** "Handle the edge cases"
|
||||
**Good:** "Handle: empty input (show error), network failure (retry 3x), duplicate email (show message)"
|
||||
|
||||
### Compound Tasks
|
||||
**Bad:** "Set up auth and create the user management pages"
|
||||
**Good:** Two separate tasks with dependency
|
||||
|
||||
### No Success Criteria
|
||||
**Bad:** "Make it work"
|
||||
**Good:** "Tests pass, no TypeScript errors, manual verification of happy path"
|
||||
|
||||
---
|
||||
|
||||
## Examples by Domain
|
||||
|
||||
### API Endpoint
|
||||
|
||||
```yaml
|
||||
title: "Create POST /api/projects endpoint"
|
||||
files:
|
||||
- src/api/projects/create.ts
|
||||
- src/api/projects/schema.ts
|
||||
|
||||
action: |
|
||||
Create endpoint accepting:
|
||||
- name: string (3-50 chars, required)
|
||||
- description: string (max 500 chars, optional)
|
||||
|
||||
Returns:
|
||||
- 201: { id, name, description, createdAt }
|
||||
- 400: { error: "validation message" }
|
||||
- 401: { error: "Unauthorized" }
|
||||
|
||||
Use Zod for validation, drizzle for DB insert.
|
||||
|
||||
verify:
|
||||
- "npm run test -- projects.test.ts"
|
||||
- "curl -X POST <base-url>/api/projects -H 'Content-Type: application/json' -d '{\"name\": \"Test\"}' with valid auth returns 201"
|
||||
|
||||
done:
|
||||
- "Endpoint creates project in database"
|
||||
- "Validation rejects invalid input with clear messages"
|
||||
- "Auth middleware blocks unauthenticated requests"
|
||||
```
|
||||
|
||||
### React Component
|
||||
|
||||
```yaml
|
||||
title: "Create ProjectCard component"
|
||||
files:
|
||||
- src/components/ProjectCard.tsx
|
||||
- src/components/ProjectCard.test.tsx
|
||||
|
||||
action: |
|
||||
Create card displaying:
|
||||
- Project name (truncate at 30 chars)
|
||||
- Description preview (2 lines max)
|
||||
- Created date (relative: "2 days ago")
|
||||
- Status badge (active/archived)
|
||||
|
||||
Props: { project: Project, onClick: () => void }
|
||||
Use Tailwind: rounded-lg, shadow-sm, hover:shadow-md
|
||||
|
||||
verify:
|
||||
- "npm run typecheck"
|
||||
- "npm run test -- ProjectCard"
|
||||
- "Storybook renders all variants"
|
||||
|
||||
done:
|
||||
- "Card renders with all project fields"
|
||||
- "Truncation works for long names"
|
||||
- "Hover state visible"
|
||||
- "Click handler fires"
|
||||
```
|
||||
|
||||
### Database Migration
|
||||
|
||||
```yaml
|
||||
title: "Create projects table"
|
||||
files:
|
||||
- db/migrations/003_projects.sql
|
||||
- src/db/schema/projects.ts
|
||||
|
||||
action: |
|
||||
Create table:
|
||||
- id: TEXT PRIMARY KEY (uuid)
|
||||
- user_id: TEXT NOT NULL REFERENCES users(id)
|
||||
- name: TEXT NOT NULL
|
||||
- description: TEXT
|
||||
- status: TEXT DEFAULT 'active' CHECK (status IN ('active', 'archived'))
|
||||
- created_at: INTEGER DEFAULT (unixepoch())
|
||||
- updated_at: INTEGER DEFAULT (unixepoch())
|
||||
|
||||
Indexes: user_id, status, created_at DESC
|
||||
|
||||
verify:
|
||||
- "cw db migrate runs without error"
|
||||
- "sqlite3 data/cw.db '.schema projects' shows correct schema"
|
||||
|
||||
done:
|
||||
- "Migration applies cleanly"
|
||||
- "Drizzle schema matches SQL"
|
||||
- "Indexes created"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Checklist Before Creating Task
|
||||
|
||||
- [ ] Can an agent execute this without asking questions?
|
||||
- [ ] Are all files listed explicitly?
|
||||
- [ ] Is the action specific (not "improve" or "handle")?
|
||||
- [ ] Is there a concrete verify step?
|
||||
- [ ] Are done criteria measurable?
|
||||
- [ ] Does estimated context fit under 50%?
|
||||
- [ ] Are there no compound actions (split if needed)?
|
||||
331
docs/tasks.md
Normal file
@@ -0,0 +1,331 @@
|
||||
# Tasks Module
|
||||
|
||||
Beads-inspired task management optimized for multi-agent coordination. Unlike beads (Git-distributed JSONL), this uses centralized SQLite for simplicity since all agents share the same workspace.
|
||||
|
||||
## Design Rationale
|
||||
|
||||
### Why Not Just Use Beads?
|
||||
|
||||
Beads solves a different problem: distributed task tracking across forked repos with zero coordination. We don't need that:
|
||||
|
||||
- All Workers operate in the same workspace under one `cw` server
|
||||
- SQLite is the single source of truth
|
||||
- tRPC exposes task queries directly to agents and dashboard
|
||||
- No merge conflicts, no Git overhead
|
||||
|
||||
### Core Agent Problem Solved
|
||||
|
||||
Agents need to answer: **"What should I work on next?"**
|
||||
|
||||
The `ready` query solves this: tasks that are `open` with all dependencies `closed`. Combined with priority ordering, agents can self-coordinate without human intervention.
|
||||
|
||||
---
|
||||
|
||||
## Data Model
|
||||
|
||||
### Task Entity
|
||||
|
||||
| Field | Type | Description |
|
||||
|-------|------|-------------|
|
||||
| `id` | TEXT | Primary key. Hash-based (e.g., `tsk-a1b2c3`) or UUID |
|
||||
| `parent_id` | TEXT | Optional. References parent task for hierarchies |
|
||||
| `initiative_id` | TEXT | Optional. Links to Initiatives module |
|
||||
| `phase_id` | TEXT | Optional. Links to initiative phase (for grouped approval) |
|
||||
| `project_id` | TEXT | Optional. Scopes task to a project |
|
||||
| `title` | TEXT | Required. Short task name |
|
||||
| `description` | TEXT | Optional. Markdown-formatted details |
|
||||
| `type` | TEXT | `task` (default), `epic`, `subtask` |
|
||||
| `status` | TEXT | `open`, `in_progress`, `blocked`, `closed` |
|
||||
| `priority` | INTEGER | 0=critical, 1=high, 2=normal (default), 3=low |
|
||||
| `assigned_to` | TEXT | Agent/worker ID currently working on this |
|
||||
| `assigned_at` | INTEGER | Unix timestamp when assigned |
|
||||
| `metadata` | TEXT | JSON blob for extensibility |
|
||||
| `created_at` | INTEGER | Unix timestamp |
|
||||
| `updated_at` | INTEGER | Unix timestamp |
|
||||
| `closed_at` | INTEGER | Unix timestamp when closed |
|
||||
| `closed_reason` | TEXT | Why/how the task was completed |
|
||||
|
||||
### Task Dependencies
|
||||
|
||||
| Field | Type | Description |
|
||||
|-------|------|-------------|
|
||||
| `task_id` | TEXT | The task that is blocked |
|
||||
| `depends_on` | TEXT | The task that must complete first |
|
||||
| `type` | TEXT | `blocks` (default), `related` |
|
||||
|
||||
### Task History
|
||||
|
||||
| Field | Type | Description |
|
||||
|-------|------|-------------|
|
||||
| `id` | INTEGER | Auto-increment primary key |
|
||||
| `task_id` | TEXT | The task that changed |
|
||||
| `field` | TEXT | Which field changed |
|
||||
| `old_value` | TEXT | Previous value |
|
||||
| `new_value` | TEXT | New value |
|
||||
| `changed_by` | TEXT | Agent/user ID |
|
||||
| `changed_at` | INTEGER | Unix timestamp |
|
||||
|
||||
---
|
||||
|
||||
## SQLite Schema
|
||||
|
||||
```sql
|
||||
CREATE TABLE tasks (
|
||||
id TEXT PRIMARY KEY,
|
||||
parent_id TEXT REFERENCES tasks(id),
|
||||
initiative_id TEXT,
|
||||
phase_id TEXT,
|
||||
project_id TEXT,
|
||||
|
||||
title TEXT NOT NULL,
|
||||
description TEXT,
|
||||
type TEXT NOT NULL DEFAULT 'task' CHECK (type IN ('task', 'epic', 'subtask')),
|
||||
|
||||
status TEXT NOT NULL DEFAULT 'open' CHECK (status IN ('open', 'in_progress', 'blocked', 'closed')),
|
||||
priority INTEGER NOT NULL DEFAULT 2 CHECK (priority BETWEEN 0 AND 3),
|
||||
|
||||
assigned_to TEXT,
|
||||
assigned_at INTEGER,
|
||||
|
||||
metadata TEXT,
|
||||
|
||||
created_at INTEGER NOT NULL DEFAULT (unixepoch()),
|
||||
updated_at INTEGER NOT NULL DEFAULT (unixepoch()),
|
||||
closed_at INTEGER,
|
||||
closed_reason TEXT
|
||||
);
|
||||
|
||||
CREATE TABLE task_dependencies (
|
||||
task_id TEXT NOT NULL REFERENCES tasks(id) ON DELETE CASCADE,
|
||||
depends_on TEXT NOT NULL REFERENCES tasks(id) ON DELETE CASCADE,
|
||||
type TEXT NOT NULL DEFAULT 'blocks' CHECK (type IN ('blocks', 'related')),
|
||||
PRIMARY KEY (task_id, depends_on),
|
||||
CHECK (task_id != depends_on)
|
||||
);
|
||||
|
||||
CREATE TABLE task_history (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
task_id TEXT NOT NULL REFERENCES tasks(id) ON DELETE CASCADE,
|
||||
field TEXT NOT NULL,
|
||||
old_value TEXT,
|
||||
new_value TEXT,
|
||||
changed_by TEXT,
|
||||
changed_at INTEGER NOT NULL DEFAULT (unixepoch())
|
||||
);
|
||||
|
||||
CREATE INDEX idx_tasks_status ON tasks(status);
|
||||
CREATE INDEX idx_tasks_priority ON tasks(priority);
|
||||
CREATE INDEX idx_tasks_assigned ON tasks(assigned_to);
|
||||
CREATE INDEX idx_tasks_project ON tasks(project_id);
|
||||
CREATE INDEX idx_tasks_initiative ON tasks(initiative_id);
|
||||
CREATE INDEX idx_tasks_phase ON tasks(phase_id);
|
||||
CREATE INDEX idx_task_history_task ON task_history(task_id);
|
||||
|
||||
-- The critical view for agent work discovery
|
||||
-- Tasks are ready when: open, no blocking deps, and phase approved (if linked)
|
||||
CREATE VIEW ready_tasks AS
|
||||
SELECT t.* FROM tasks t
|
||||
LEFT JOIN initiative_phases p ON t.phase_id = p.id
|
||||
WHERE t.status = 'open'
|
||||
AND (t.phase_id IS NULL OR p.status IN ('approved', 'in_progress'))
|
||||
AND NOT EXISTS (
|
||||
SELECT 1 FROM task_dependencies d
|
||||
JOIN tasks dep ON d.depends_on = dep.id
|
||||
WHERE d.task_id = t.id
|
||||
AND d.type = 'blocks'
|
||||
AND dep.status != 'closed'
|
||||
)
|
||||
ORDER BY t.priority ASC, t.created_at ASC;
|
||||
```
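
The `ready_tasks` view can be mirrored in memory, which may help when unit-testing dispatch logic without a database. This is a sketch: `TaskRow`, `DepRow`, `readyTasks` and the `phaseStatus` lookup are illustrative names, not part of the schema.

```typescript
interface TaskRow {
  id: string;
  status: "open" | "in_progress" | "blocked" | "closed";
  priority: number;
  created_at: number;
  phase_id: string | null;
}

interface DepRow {
  task_id: string;
  depends_on: string;
  type: "blocks" | "related";
}

// Mirrors the view: open, phase approved or in progress (if linked),
// no unclosed 'blocks' dependency, ordered by priority then age.
function readyTasks(tasks: TaskRow[], deps: DepRow[], phaseStatus: Record<string, string>): TaskRow[] {
  const statusById: Record<string, string> = {};
  for (const t of tasks) statusById[t.id] = t.status;

  const phaseOk = (t: TaskRow): boolean => {
    if (t.phase_id === null) return true;
    const s = phaseStatus[t.phase_id];
    return s === "approved" || s === "in_progress";
  };

  const hasOpenBlocker = (t: TaskRow): boolean =>
    deps.some((d) => {
      const depStatus = statusById[d.depends_on];
      // Like the inner JOIN in the view, a dependency on an unknown task is ignored.
      return d.task_id === t.id && d.type === "blocks" && depStatus !== undefined && depStatus !== "closed";
    });

  return tasks
    .filter((t) => t.status === "open" && phaseOk(t) && !hasOpenBlocker(t))
    .sort((a, b) => a.priority - b.priority || a.created_at - b.created_at);
}
```

Note that `related` dependencies never block a task; only `blocks` edges to tasks that are not yet `closed` do.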
|
||||
|
||||
---
|
||||
|
||||
## Status Workflow
|
||||
|
||||
```
|
||||
[open] ──claim──▶ [in_progress] ──done──▶ [closed]
 │ ▲                    │                    ▲
 │ │                    │ blocked            │
 │ │                    ▼                    │
 │ └────unblock──── [blocked]                │
 │                                           │
 └──────────────────cancel───────────────────┘
|
||||
```
|
||||
|
||||
| Transition | Trigger | Notes |
|
||||
|------------|---------|-------|
|
||||
| `open` → `in_progress` | Agent claims task | Sets `assigned_to`, `assigned_at` |
|
||||
| `in_progress` → `closed` | Work completed | Sets `closed_at`, `closed_reason` |
|
||||
| `in_progress` → `blocked` | External dependency | Manual or auto-detected |
|
||||
| `blocked` → `open` | Blocker resolved | Clears assignment |
|
||||
| `open` → `closed` | Cancelled/won't do | Direct close without work |
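
The transitions in this table can be enforced with a lookup before any status update is written. A minimal sketch, with `canTransition` as a hypothetical helper name:

```typescript
type Status = "open" | "in_progress" | "blocked" | "closed";

// One entry per row of the transition table.
const ALLOWED: Record<Status, Status[]> = {
  open: ["in_progress", "closed"],     // claim, or cancel without work
  in_progress: ["closed", "blocked"],  // work completed, or external dependency
  blocked: ["open"],                   // blocker resolved, assignment cleared
  closed: [],
};

function canTransition(from: Status, to: Status): boolean {
  return ALLOWED[from].indexOf(to) !== -1;
}
```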
|
||||
|
||||
---
|
||||
|
||||
## CLI Reference
|
||||
|
||||
All commands under `cw task` subcommand.
|
||||
|
||||
### Core Commands
|
||||
|
||||
| Command | Description |
|
||||
|---------|-------------|
|
||||
| `cw task ready` | List tasks ready for work (open + no blockers) |
|
||||
| `cw task list [--status STATUS] [--project ID]` | List tasks with filters |
|
||||
| `cw task show <id>` | Show task details + history |
|
||||
| `cw task create <title> [-p PRIORITY] [-d DESC]` | Create new task |
|
||||
| `cw task update <id> [--status STATUS] [--priority P]` | Update task fields |
|
||||
| `cw task close <id> [--reason REASON]` | Mark task complete |
|
||||
|
||||
### Dependency Commands
|
||||
|
||||
| Command | Description |
|
||||
|---------|-------------|
|
||||
| `cw task dep add <task> <depends-on>` | Task blocked by another |
|
||||
| `cw task dep rm <task> <depends-on>` | Remove dependency |
|
||||
| `cw task dep tree <id>` | Show dependency graph |
|
||||
|
||||
### Assignment Commands
|
||||
|
||||
| Command | Description |
|
||||
|---------|-------------|
|
||||
| `cw task assign <id> <agent>` | Assign task to agent |
|
||||
| `cw task unassign <id>` | Release task |
|
||||
| `cw task mine` | List tasks assigned to current agent |
|
||||
|
||||
### Output Flags (global)
|
||||
|
||||
| Flag | Description |
|
||||
|------|-------------|
|
||||
| `--json` | Output as JSON (for agent consumption) |
|
||||
| `--quiet` | Minimal output (just IDs) |
|
||||
|
||||
---
|
||||
|
||||
## Agent Workflow
|
||||
|
||||
Standard loop for Workers:
|
||||
|
||||
```
|
||||
1. cw task ready --json
|
||||
2. Pick highest priority task from result
|
||||
3. cw task update <id> --status in_progress
|
||||
4. Do the work
|
||||
5. cw task close <id> --reason "Implemented X"
|
||||
6. Loop to step 1
|
||||
```
|
||||
|
||||
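If `cw task ready` returns an empty list, no work is currently available to the agent. Remaining tasks may still be blocked, in progress under another agent, or waiting on phase approval.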
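
The loop above can be sketched against an injected client so it is testable without the CLI. `TaskClient`, `runWorkerLoop` and the iteration cap are assumptions; the client methods stand in for the `cw task` commands and are synchronous here only to keep the sketch short.

```typescript
interface ReadyTask {
  id: string;
  title: string;
}

// Stand-ins for the CLI calls; a real client would likely be asynchronous.
interface TaskClient {
  ready(): ReadyTask[];                     // cw task ready --json
  claim(id: string): void;                  // cw task update <id> --status in_progress
  close(id: string, reason: string): void;  // cw task close <id> --reason "..."
}

// Runs the claim / work / close loop until nothing is ready.
// doWork returns the close reason; maxIterations guards against a runaway loop.
function runWorkerLoop(
  client: TaskClient,
  doWork: (task: ReadyTask) => string,
  maxIterations: number = 100
): string[] {
  const closedIds: string[] = [];
  for (let i = 0; i < maxIterations; i++) {
    const ready = client.ready();
    if (ready.length === 0) break;
    const task = ready[0]; // assumes the list arrives ordered by priority
    client.claim(task.id);
    const reason = doWork(task);
    client.close(task.id, reason);
    closedIds.push(task.id);
  }
  return closedIds;
}
```

Claiming with a plain status update leaves a window in which two Workers could pick the same task; a real implementation would need the server to reject a second claim.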
|
||||
|
||||
---
|
||||
|
||||
## Integration Points
|
||||
|
||||
### With Initiatives
|
||||
- Tasks can link to an initiative via `initiative_id`
|
||||
- When initiative is approved, tasks are generated from its technical concept
|
||||
- Closing all tasks for an initiative signals initiative completion
|
||||
|
||||
### With Orchestrator
|
||||
- Orchestrator queries `ready_tasks` view to dispatch work
|
||||
- Assignment tracked to prevent double-dispatch
|
||||
- Orchestrator can bulk-create tasks from job definitions
|
||||
|
||||
### With Workers
|
||||
- Workers claim tasks via `cw task update --status in_progress`
|
||||
- Worker ID stored in `assigned_to`
|
||||
- On worker crash, Supervisor can detect stale assignments and reset
|
||||
|
||||
### tRPC Procedures
|
||||
|
||||
```typescript
|
||||
// Suggested tRPC router shape
|
||||
task.list(filters) // → Task[]
|
||||
task.ready(projectId?) // → Task[]
|
||||
task.get(id) // → Task | null
|
||||
task.create(input) // → Task
|
||||
task.update(id, input) // → Task
|
||||
task.close(id, reason) // → Task
|
||||
task.assign(id, agent) // → Task
|
||||
task.history(id) // → TaskHistory[]
|
||||
task.depAdd(id, dep) // → void
|
||||
task.depRemove(id, dep) // → void
|
||||
task.depTree(id) // → DependencyTree
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task Granularity Standards
|
||||
|
||||
A task must be specific enough for execution without interpretation. Vague tasks cause agents to guess, leading to inconsistent results.
|
||||
|
||||
### Quick Reference
|
||||
|
||||
| Too Vague | Just Right |
|
||||
|-----------|------------|
|
||||
| "Add authentication" | "Add JWT auth with refresh rotation using jose, httpOnly cookie, 15min access / 7day refresh" |
|
||||
| "Create the API" | "Create POST /api/projects accepting {name, description}, validates name 3-50 chars, returns 201" |
|
||||
| "Handle errors" | "Wrap API calls in try/catch, return {error: string} on 4xx/5xx, show toast via sonner" |
|
||||
|
||||
### Required Task Components
|
||||
|
||||
Every task MUST include:
|
||||
|
||||
1. **files** — Exact paths modified/created
|
||||
2. **action** — What to do, what to avoid, WHY
|
||||
3. **verify** — Command or check to prove completion
|
||||
4. **done** — Measurable acceptance criteria
|
||||
|
||||
See [task-granularity.md](task-granularity.md) for comprehensive examples and anti-patterns.
|
||||
|
||||
### Context Budget
|
||||
|
||||
Tasks are sized to fit agent context budgets:
|
||||
|
||||
| Complexity | Context % | Example |
|
||||
|------------|-----------|---------|
|
||||
| Simple | 10-20% | Add form field |
|
||||
| Medium | 20-35% | Create API endpoint |
|
||||
| Complex | 35-50% | Implement auth flow |
|
||||
| Too Large | >50% | **SPLIT REQUIRED** |
|
||||
|
||||
See [context-engineering.md](context-engineering.md) for context management rules.
|
||||
|
||||
---
|
||||
|
||||
## Deviation Handling
|
||||
|
||||
When Workers encounter unexpected issues during execution, they follow deviation rules:
|
||||
|
||||
| Rule | Action | Permission |
|
||||
|------|--------|------------|
|
||||
| Rule 1: Bug fixes | Auto-fix | None needed |
|
||||
| Rule 2: Missing critical (validation, auth) | Auto-add | None needed |
|
||||
| Rule 3: Blocking issues (deps, imports) | Auto-fix | None needed |
|
||||
| Rule 4: Architectural changes | ASK | Required |
|
||||
|
||||
See [deviation-rules.md](deviation-rules.md) for detailed guidance.

---

## Execution Artifacts

Task execution produces artifacts:

| Artifact | Purpose |
|----------|---------|
| Commits | Per-task atomic commits |
| SUMMARY.md | Record of what happened |
| STATE.md updates | Position tracking |

See [execution-artifacts.md](execution-artifacts.md) for artifact specifications.

---

## Future Considerations

- **Compaction**: Summarize old closed tasks to reduce DB size (beads does this with an LLM)
- **Labels/tags**: Additional categorization beyond type
- **Time tracking**: Estimated vs actual time for capacity planning
- **Recurring tasks**: Templates that spawn new tasks on schedule
322
docs/verification.md
Normal file
@@ -0,0 +1,322 @@
# Goal-Backward Verification

Verification confirms that **goals are achieved**, not merely that **tasks were completed**. A completed task "create chat component" does not guarantee the goal "working chat interface" is met.

## Core Principle

**Task completion ≠ Goal achievement**

Tasks are implementation steps. Goals are user outcomes. Verification bridges the gap by checking observable outcomes, not just checklist items.

---

## Verification Levels

### Level 1: Existence Check
Does the artifact exist?

```
✓ File exists at expected path
✓ Component is exported
✓ Route is registered
```

### Level 2: Substance Check
Is the artifact substantive (not a stub)?

```
✓ Function has implementation (not just return null)
✓ Component renders content (not empty div)
✓ API returns meaningful response (not placeholder)
```

### Level 3: Wiring Check
Is the artifact connected to the system?

```
✓ Component is rendered somewhere
✓ API endpoint is called by client
✓ Event handler is attached
✓ Database query is executed
```

**All three levels must pass for verification success.**
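
The "all three levels" rule can be made explicit by recording one result per level and reporting the first level that fails. This is a sketch under assumptions: the `LevelResult` shape and the function names are hypothetical, and a level with no recorded result is treated as a failure rather than a pass.

```typescript
type Level = "existence" | "substance" | "wiring";

// Hypothetical record of one check; `evidence` is free text such as "Chat.tsx:45".
interface LevelResult {
  level: Level;
  passed: boolean;
  evidence: string;
}

const LEVEL_ORDER: Level[] = ["existence", "substance", "wiring"];

// Returns the first level that failed (or was never checked), or null if all passed.
function firstFailedLevel(results: LevelResult[]): Level | null {
  for (const level of LEVEL_ORDER) {
    const matching = results.filter((r) => r.level === level);
    if (matching.length === 0 || !matching.every((r) => r.passed)) {
      return level;
    }
  }
  return null;
}

function artifactVerified(results: LevelResult[]): boolean {
  return firstFailedLevel(results) === null;
}
```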

---

## Must-Have Derivation

Before verification, derive what "done" means from the goal:

### 1. Observable Truths (3-7 user perspectives)
What can a user observe when the goal is achieved?

```yaml
observable_truths:
  - "User can click 'Send' and message appears in chat"
  - "Messages persist after page refresh"
  - "New messages appear without page reload"
  - "User sees typing indicator when other party types"
```

### 2. Required Artifacts
What files MUST exist?

```yaml
required_artifacts:
  - path: src/components/Chat.tsx
    check: "Exports Chat component"
  - path: src/api/messages.ts
    check: "Exports sendMessage, getMessages"
  - path: src/hooks/useChat.ts
    check: "Exports useChat hook"
```

### 3. Required Wiring
What connections MUST work?

```yaml
required_wiring:
  - from: Chat.tsx
    to: useChat.ts
    check: "Component calls hook"
  - from: useChat.ts
    to: messages.ts
    check: "Hook calls API"
  - from: messages.ts
    to: database
    check: "API persists to DB"
```

### 4. Key Links (Where Stubs Hide)
What integration points commonly fail?

```yaml
key_links:
  - "Form onSubmit → API call (not just console.log)"
  - "WebSocket connection → message handler"
  - "API response → state update → render"
```
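
The four must-have lists could be carried around as one typed structure and sanity-checked before verification starts. The sketch below is an illustration, not a schema defined by this document: the interface names and `validateMustHaves` are assumptions, as is the rule that each wiring source should be the file name of a listed artifact.

```typescript
interface RequiredArtifact { path: string; check: string; }
interface RequiredWiring { from: string; to: string; check: string; }

// Hypothetical container mirroring the four YAML lists above.
interface MustHaves {
  observableTruths: string[];
  requiredArtifacts: RequiredArtifact[];
  requiredWiring: RequiredWiring[];
  keyLinks: string[];
}

// Returns a list of human-readable problems; an empty list means the must-haves look usable.
function validateMustHaves(m: MustHaves): string[] {
  const problems: string[] = [];
  const count = m.observableTruths.length;
  if (count < 3 || count > 7) {
    problems.push("expected 3-7 observable truths, got " + count);
  }
  if (m.requiredArtifacts.length === 0) {
    problems.push("no required artifacts listed");
  }
  // Assumption: wiring sources are named by file name, e.g. "Chat.tsx".
  const fileNames = m.requiredArtifacts.map((a) => a.path.split("/").slice(-1)[0]);
  for (const wire of m.requiredWiring) {
    if (fileNames.indexOf(wire.from) < 0) {
      problems.push("wiring source " + wire.from + " is not a required artifact");
    }
  }
  return problems;
}
```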

---

## Verification Process

### Phase Verification

After all tasks in a phase complete:

```
1. Load must-haves (from phase goal or PLAN frontmatter)
2. For each observable truth:
   a. Level 1: Does the relevant code exist?
   b. Level 2: Is it substantive?
   c. Level 3: Is it wired?
3. For each required artifact:
   a. Verify file exists
   b. Verify not a stub
   c. Verify it's imported/used
4. For each key link:
   a. Trace the connection
   b. Verify data flows
5. Scan for anti-patterns (see below)
6. Structure gaps for re-planning
```
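
Step 3 of the process above (checking each required artifact in level order) might look roughly like the following. It is a sketch only: the `ArtifactChecks` interface is a hypothetical seam for the real file and import checks, and the `MISSING` gap type is an assumed label for an artifact that does not exist at all.

```typescript
// "STUB" and "MISSING_WIRING" appear in the verification output; "MISSING" is assumed here.
type ArtifactGapType = "MISSING" | "STUB" | "MISSING_WIRING";

interface ArtifactGap {
  type: ArtifactGapType;
  location: string;
}

// Hypothetical seam: the caller supplies how each level is actually checked.
interface ArtifactChecks {
  exists(path: string): boolean;
  isSubstantive(path: string): boolean;
  isWired(path: string): boolean;
}

// Runs the three levels in order and stops at the first failing level per artifact.
function verifyArtifacts(paths: string[], checks: ArtifactChecks): ArtifactGap[] {
  const gaps: ArtifactGap[] = [];
  for (const path of paths) {
    if (!checks.exists(path)) {
      gaps.push({ type: "MISSING", location: path });
    } else if (!checks.isSubstantive(path)) {
      gaps.push({ type: "STUB", location: path });
    } else if (!checks.isWired(path)) {
      gaps.push({ type: "MISSING_WIRING", location: path });
    }
  }
  return gaps;
}
```

Passing the checks in as an object keeps the loop independent of how existence, substance, and wiring are detected, so the same function can run against fixtures in tests.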

### Anti-Pattern Scanning

Check for common incomplete work:

| Pattern | Detection | Meaning |
|---------|-----------|---------|
| `// TODO` | Grep for TODO comments | Work deferred |
| `throw new Error('Not implemented')` | Grep for stub errors | Placeholder code |
| `return null` / `return {}` | AST analysis | Empty implementations |
| `console.log` in handlers | Grep for console.log | Debug code left behind |
| Empty catch blocks | AST analysis | Swallowed errors |
| Hardcoded values | Manual review | Missing configuration |
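
The grep-detectable rows of the table can be approximated with a line-by-line regex scan. The sketch below covers only those three rows; the AST-based and manual-review rows are out of its scope. The pattern names, the `AntiPatternHit` shape, and `scanSource` are illustrative assumptions, and the regexes are deliberately simple (they will also match inside strings and comments).

```typescript
interface AntiPatternHit {
  pattern: string;
  line: number; // 1-based line number within the scanned source
  text: string;
}

// Regexes have no "g" flag, so `test` is stateless across calls.
const GREP_PATTERNS: { name: string; regex: RegExp }[] = [
  { name: "TODO comment", regex: /\/\/\s*TODO/ },
  { name: "Stub error", regex: /throw new Error\(['"]Not implemented['"]\)/ },
  { name: "console.log", regex: /console\.log\(/ },
];

// Scans source text line by line and reports every grep-style match.
function scanSource(source: string): AntiPatternHit[] {
  const hits: AntiPatternHit[] = [];
  const lines = source.split("\n");
  lines.forEach((text, index) => {
    for (const { name, regex } of GREP_PATTERNS) {
      if (regex.test(text)) {
        hits.push({ pattern: name, line: index + 1, text: text.trim() });
      }
    }
  });
  return hits;
}
```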

---

## Verification Output

### Pass Case

```yaml
# 2-VERIFICATION.md
phase: 2
status: PASS
verified_at: 2024-01-15T10:30:00Z

observable_truths:
  - truth: "User can send message"
    status: VERIFIED
    evidence: "Chat.tsx:45 calls sendMessage on submit"
  - truth: "Messages persist"
    status: VERIFIED
    evidence: "messages.ts:23 inserts to SQLite"

required_artifacts:
  - path: src/components/Chat.tsx
    status: EXISTS
    check: PASSED
  - path: src/api/messages.ts
    status: EXISTS
    check: PASSED

anti_patterns_found: []

human_verification_needed:
  - "Visual layout matches design"
  - "Real-time updates work under load"
```

### Fail Case (Gaps Found)

```yaml
# 2-VERIFICATION.md
phase: 2
status: GAPS_FOUND
verified_at: 2024-01-15T10:30:00Z

gaps:
  - type: STUB
    location: src/hooks/useChat.ts:34
    description: "sendMessage returns immediately without API call"
    severity: BLOCKING

  - type: MISSING_WIRING
    location: src/components/Chat.tsx
    description: "WebSocket not connected, no real-time updates"
    severity: BLOCKING

  - type: ANTI_PATTERN
    location: src/api/messages.ts:67
    description: "Empty catch block swallows errors"
    severity: HIGH

remediation_plan:
  - "Connect useChat to actual API endpoint"
  - "Initialize WebSocket in Chat component"
  - "Add error handling to API calls"
```
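
The relationship between the `gaps` list and the top-level `status` can be sketched in code. The gap types and severities below are the ones that appear in the example output; the function names and the choice to order `BLOCKING` gaps ahead of `HIGH` ones for remediation are assumptions.

```typescript
type GapType = "STUB" | "MISSING_WIRING" | "ANTI_PATTERN";
type Severity = "BLOCKING" | "HIGH";
type PhaseStatus = "PASS" | "GAPS_FOUND";

interface VerificationGap {
  type: GapType;
  location: string;
  description: string;
  severity: Severity;
}

// Any recorded gap turns the phase status into GAPS_FOUND.
function phaseStatus(gaps: VerificationGap[]): PhaseStatus {
  return gaps.length === 0 ? "PASS" : "GAPS_FOUND";
}

// Assumption: BLOCKING gaps are remediated before HIGH ones.
const SEVERITY_RANK: Record<Severity, number> = { BLOCKING: 0, HIGH: 1 };

// Returns a sorted copy; the input array is left untouched.
function orderForRemediation(gaps: VerificationGap[]): VerificationGap[] {
  return gaps.slice().sort((a, b) => SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity]);
}
```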

---

## User Acceptance Testing (UAT)

Verification confirms code correctness. UAT confirms user experience.

### UAT Process

1. Extract testable deliverables from the phase goal
2. Walk the user through each one:
   - "Can you log in with your email?"
   - "Does the dashboard show your projects?"
   - "Can you create a new project?"
3. Record the result: PASS, FAIL (with a description of the issue), or BLOCKED when a case cannot be tested
4. If issues found:
   - Diagnose root cause
   - Create targeted fix plan
5. If all pass: Phase complete

### UAT Output

```yaml
# 2-UAT.md
phase: 2
tested_by: user
tested_at: 2024-01-15T14:00:00Z

test_cases:
  - case: "Login with email"
    result: PASS

  - case: "Dashboard shows projects"
    result: FAIL
    issue: "Shows loading spinner forever"
    diagnosis: "API returns 500, missing auth header"

  - case: "Create new project"
    result: BLOCKED
    reason: "Cannot test, dashboard not loading"

fix_required: true
fix_plan:
  - task: "Add auth header to dashboard API call"
    files: [src/api/projects.ts]
    priority: P0
```
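
The `fix_required` flag in the UAT output follows from the individual case results. A minimal sketch of that derivation is shown below; the `UatCase` shape and function names are hypothetical, and it assumes that a `BLOCKED` case also counts as requiring a fix and that an empty test run never completes a phase.

```typescript
type UatResult = "PASS" | "FAIL" | "BLOCKED";

// Hypothetical in-memory form of one entry under `test_cases`.
interface UatCase {
  name: string;
  result: UatResult;
  issue?: string;
}

// Assumption: anything other than PASS (including BLOCKED) means a fix is required.
function fixRequired(cases: UatCase[]): boolean {
  return cases.some((c) => c.result !== "PASS");
}

// Assumption: a phase with no recorded test cases is not considered complete.
function uatComplete(cases: UatCase[]): boolean {
  return cases.length > 0 && cases.every((c) => c.result === "PASS");
}
```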

---

## Integration with Task Workflow

### Task Completion Hook
When a task closes:
1. Worker marks task closed with reason
2. If all phase tasks closed, trigger phase verification
3. Verifier agent runs goal-backward check
4. If PASS: Phase marked complete
5. If GAPS_FOUND: Create remediation tasks, phase stays in_progress
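
The hook's two decisions (when to trigger verification, and what happens to the phase afterwards) can be sketched as pure functions. The names `onTaskClosed` and `applyVerificationResult` are hypothetical; the status strings are the ones used elsewhere in this document.

```typescript
type TaskStatus = "open" | "in_progress" | "closed";

interface PhaseTask {
  id: string;
  status: TaskStatus;
}

type HookAction = "wait" | "trigger-verification";

// Step 2: verification is triggered only once every task in the phase is closed.
function onTaskClosed(phaseTasks: PhaseTask[]): HookAction {
  const allClosed = phaseTasks.length > 0 && phaseTasks.every((t) => t.status === "closed");
  return allClosed ? "trigger-verification" : "wait";
}

type PhaseState = "complete" | "in_progress";

// Steps 4 and 5: PASS completes the phase; GAPS_FOUND keeps it in progress.
function applyVerificationResult(status: "PASS" | "GAPS_FOUND"): PhaseState {
  return status === "PASS" ? "complete" : "in_progress";
}
```

Creating the remediation tasks themselves is left out of the sketch, since it depends on how tasks are stored.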

### Verification Task Type
Verification itself is a task:

```yaml
type: verification
phase_id: phase-2
status: open
assigned_to: verifier-agent
priority: P0  # Always high priority
```

---

## Checkpoint Types

During execution, agents may need human input. Use precise checkpoint types:

### checkpoint:human-verify (90% of checkpoints)
The agent completed the work; the user confirms it works.

```yaml
checkpoint: human-verify
prompt: "Can you log in with email and password?"
expected: "User confirms successful login"
```

### checkpoint:decision (9% of checkpoints)
The user must make an implementation choice.

```yaml
checkpoint: decision
prompt: "OAuth2 or SAML for SSO?"
options:
  - OAuth2: "Simpler, most common"
  - SAML: "Enterprise requirement"
```

### checkpoint:human-action (1% of checkpoints)
A truly unavoidable manual step.

```yaml
checkpoint: human-action
prompt: "Click the email verification link"
reason: "Cannot automate email client interaction"
```
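
Because each checkpoint type carries different fields, a discriminated union is one natural way to model them. The sketch below is an assumption about how a consumer might type the YAML above; the interface names, the `name`/`note` shape for options, and `describeCheckpoint` are all illustrative.

```typescript
interface HumanVerifyCheckpoint {
  checkpoint: "human-verify";
  prompt: string;
  expected: string;
}

interface DecisionCheckpoint {
  checkpoint: "decision";
  prompt: string;
  // Hypothetical flattening of the YAML `options` entries.
  options: { name: string; note: string }[];
}

interface HumanActionCheckpoint {
  checkpoint: "human-action";
  prompt: string;
  reason: string;
}

type Checkpoint = HumanVerifyCheckpoint | DecisionCheckpoint | HumanActionCheckpoint;

// The `checkpoint` field narrows the union, so each branch sees only its own fields.
function describeCheckpoint(cp: Checkpoint): string {
  switch (cp.checkpoint) {
    case "human-verify":
      return "Confirm: " + cp.prompt;
    case "decision":
      return "Choose (" + cp.options.map((o) => o.name).join(" / ") + "): " + cp.prompt;
    case "human-action":
      return "Manual step: " + cp.prompt + " (" + cp.reason + ")";
  }
}
```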

---

## Human Verification Needs

Some verifications require human eyes:

| Category | Examples | Why Human |
|----------|----------|-----------|
| Visual | Layout, spacing, colors | Subjective/design judgment |
| Real-time | WebSocket, live updates | Requires interaction |
| External | OAuth flow, payment | Third-party systems |
| Accessibility | Screen reader, keyboard nav | Requires tooling/expertise |

**Mark these explicitly** in verification output. Don't claim PASS when human verification is pending.
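
A guard like the following could enforce that rule before a final PASS is reported. It is a sketch under assumptions: `humanVerified` is a hypothetical list of items a person has already confirmed, and the function names are illustrative rather than part of any defined report format.

```typescript
interface VerificationReport {
  automatedStatus: "PASS" | "GAPS_FOUND";
  humanVerificationNeeded: string[];
  // Hypothetical field: items from the list above that a human has confirmed.
  humanVerified: string[];
}

// Items that still need a person to look at them.
function pendingHumanChecks(report: VerificationReport): string[] {
  return report.humanVerificationNeeded.filter(
    (item) => report.humanVerified.indexOf(item) < 0
  );
}

// PASS may be claimed only when automated checks passed and nothing is pending.
function canClaimPass(report: VerificationReport): boolean {
  return report.automatedStatus === "PASS" && pendingHumanChecks(report).length === 0;
}
```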