# Context Engineering

Context engineering is a first-class concern in Codewalk. Agent output quality degrades predictably as context fills. This document defines the rules that all agents must follow.

## Quality Degradation Curve

Claude's output quality follows a predictable curve based on context utilization:

| Context Usage | Quality Level | Behavior |
|---------------|---------------|----------|
| 0-30% | **PEAK** | Thorough, comprehensive, considers edge cases |
| 30-50% | **GOOD** | Confident, solid work, reliable output |
| 50-70% | **DEGRADING** | Efficiency mode begins, shortcuts appear |
| 70%+ | **POOR** | Rushed, minimal, misses requirements |

**Rule: Stay UNDER 50% context for quality work.**
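
As a minimal sketch, the table above can be expressed as a lookup. The function name `quality_level` is hypothetical, and because the table's ranges share their endpoints, assigning each boundary value (30, 50, 70) to the lower-quality band is an assumption.

```python
def quality_level(context_pct: float) -> str:
    """Map context utilization (0-100) to the quality band from the table.

    Assumption: a boundary value such as 50 falls into the worse band,
    which matches the rule of staying *under* 50%.
    """
    if not 0 <= context_pct <= 100:
        raise ValueError("context_pct must be between 0 and 100")
    if context_pct < 30:
        return "PEAK"
    if context_pct < 50:
        return "GOOD"
    if context_pct < 70:
        return "DEGRADING"
    return "POOR"


def fit_for_quality_work(context_pct: float) -> bool:
    """Apply the rule above: quality work requires staying under 50%."""
    return quality_level(context_pct) in ("PEAK", "GOOD")
```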

---

## Orchestrator Pattern

Codewalk uses thin orchestration with heavy subagent work:

```
┌─────────────────────────────────────────────────────────────┐
│                    Orchestrator (30-40%)                    │
│  - Routes work to specialized agents                        │
│  - Collects results                                         │
│  - Maintains state                                          │
│  - Coordinates across phases                                │
└─────────────────────────────────────────────────────────────┘
                              │
           ┌──────────────────┼──────────────────┐
           ▼                  ▼                  ▼
    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
    │   Worker    │    │  Architect  │    │  Verifier   │
    │  (200k ctx) │    │  (200k ctx) │    │  (200k ctx) │
    │  Fresh per  │    │  Fresh per  │    │  Fresh per  │
    │    task     │    │  initiative │    │    phase    │
    └─────────────┘    └─────────────┘    └─────────────┘
```

**Key insight:** Each subagent gets a fresh 200k context window. Heavy work happens there, not in the orchestrator.

---

## Context Budgets by Role

### Orchestrator
- **Target:** 30-40% max
- **Strategy:** Route, don't process. Collect results, don't analyze.
- **Reset trigger:** Context exceeds 50%

### Worker
- **Target:** 50% per task
- **Strategy:** Single task per context. Fresh context for each task.
- **Reset trigger:** Task completion (always)

### Architect
- **Target:** 60% per initiative analysis
- **Strategy:** Initiative discussion + planning in single context
- **Reset trigger:** Work plan generated or context exceeds 70%

### Verifier
- **Target:** 40% per phase verification
- **Strategy:** Goal-backward verification, gap identification
- **Reset trigger:** Verification complete
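
The budgets above could be captured as data so that reset decisions are made in one place. This is a sketch only: `RoleBudget`, `BUDGETS`, and `should_reset` are hypothetical names, and the percentages are copied from the role descriptions above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RoleBudget:
    target_pct: float            # upper end of the role's target range
    reset_pct: Optional[float]   # context level that forces a reset, if any
    reset_on_completion: bool    # finishing the unit of work always resets


# Values taken from the role descriptions above.
BUDGETS = {
    "orchestrator": RoleBudget(target_pct=40, reset_pct=50, reset_on_completion=False),
    "worker": RoleBudget(target_pct=50, reset_pct=None, reset_on_completion=True),
    "architect": RoleBudget(target_pct=60, reset_pct=70, reset_on_completion=True),
    "verifier": RoleBudget(target_pct=40, reset_pct=None, reset_on_completion=True),
}


def should_reset(role: str, context_pct: float, work_done: bool) -> bool:
    """Return True when the role's reset trigger has fired."""
    budget = BUDGETS[role]
    if work_done and budget.reset_on_completion:
        return True
    # "Exceeds" is read as strictly greater than the threshold.
    return budget.reset_pct is not None and context_pct > budget.reset_pct
```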

---

## Task Sizing Rules

Tasks are sized to fit context budgets:

| Task Complexity | Context Estimate | Example |
|-----------------|------------------|---------|
| Simple | 10-20% | Add a field to an existing form |
| Medium | 20-35% | Create new API endpoint with validation |
| Complex | 35-50% | Implement auth flow with refresh tokens |
| Too Large | >50% | **SPLIT INTO SUBTASKS** |

**Planning rule:** No single task should require more than 50% context. If the estimate is higher, decompose the task before execution.
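
A planner could apply the sizing table mechanically. The sketch below assumes the estimate is already available as a percentage; `classify_task` is a hypothetical helper, and treating each range's upper bound as inclusive (and anything under 10% as simple) is an assumption.

```python
def classify_task(estimate_pct: float) -> str:
    """Classify a task by its estimated context usage (0-100)."""
    if estimate_pct < 0:
        raise ValueError("estimate_pct must be non-negative")
    if estimate_pct > 50:
        return "too_large"   # must be split into subtasks
    if estimate_pct > 35:
        return "complex"
    if estimate_pct > 20:
        return "medium"
    return "simple"


def needs_split(estimate_pct: float) -> bool:
    """Planning rule: decompose any task estimated above 50% context."""
    return classify_task(estimate_pct) == "too_large"
```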

---

## Plan Sizing

Plans group 2-3 related tasks for sequential execution:

| Plan Size | Target Context | Notes |
|-----------|----------------|-------|
| Minimal (1 task) | 20-30% | Simple independent work |
| Standard (2-3 tasks) | 40-50% | Related work, shared context |
| Maximum | 50% | Never exceed; quality degrades |

**Why 2-3 tasks?** Related tasks share context (file reads, codebase understanding), which reduces per-task overhead. Beyond 3 tasks, the plan tends to exceed the 50% budget and the quality benefit is lost.
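
One way to apply these limits is a greedy pass over tasks that are already ordered so related work is adjacent. This is a sketch under assumptions: `group_into_plans` is a hypothetical helper, and summing per-task estimates ignores any savings from shared context, so it is deliberately conservative.

```python
def group_into_plans(tasks, max_tasks=3, max_pct=50.0):
    """Pack (name, estimate_pct) pairs into plans of at most `max_tasks`
    tasks whose summed estimates stay within `max_pct`."""
    plans = []
    current, current_total = [], 0.0
    for name, pct in tasks:
        if pct > max_pct:
            raise ValueError(f"task {name!r} exceeds {max_pct}% and must be split first")
        # Close the current plan if this task would break either limit.
        if current and (len(current) >= max_tasks or current_total + pct > max_pct):
            plans.append(current)
            current, current_total = [], 0.0
        current.append(name)
        current_total += pct
    if current:
        plans.append(current)
    return plans
```

For example, tasks estimated at 15%, 20%, 10%, 30%, and 25% would be grouped into three plans: the first three together, then one plan each for the last two.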

---

## Wave-Based Parallelization

Compute the dependency graph and assign each task to a wave:

```
Wave 0: Tasks with no dependencies (run in parallel)
   ↓
Wave 1: Tasks depending only on Wave 0 (run in parallel)
   ↓
Wave 2: Tasks depending only on Wave 0-1 (run in parallel)
   ↓
...continue until all tasks assigned
```

**Benefits:**
- Maximum parallelization
- Clear progress tracking
- Natural checkpoints between waves

### Computation Algorithm

```
1. Build dependency graph from task dependencies
2. Find all tasks with no unresolved dependencies → Wave 0
3. Mark Wave 0 as "resolved"
4. Find all tasks whose dependencies are all resolved → Wave 1
5. Repeat until all tasks are assigned (if a pass assigns no tasks, the graph contains a cycle)
```
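
The steps above can be sketched as a layered topological sort. The function name `assign_waves` and the input shape (a mapping from task name to the set of task names it depends on) are assumptions for illustration, not Codewalk's actual data model.

```python
def assign_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Return tasks grouped into waves; tasks within a wave can run in parallel."""
    known = set(dependencies)
    unknown = {dep for deps in dependencies.values() for dep in deps} - known
    if unknown:
        raise ValueError(f"dependencies on unknown tasks: {sorted(unknown)}")

    resolved: set[str] = set()
    remaining = dict(dependencies)
    waves: list[list[str]] = []
    while remaining:
        # A task is ready once every one of its dependencies is resolved.
        ready = sorted(t for t, deps in remaining.items() if deps <= resolved)
        if not ready:
            raise ValueError(f"dependency cycle among: {sorted(remaining)}")
        waves.append(ready)
        resolved.update(ready)
        for task in ready:
            del remaining[task]
    return waves
```

Sorting each wave only makes the output deterministic; execution order within a wave does not matter.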

---

## Context Handoff

When context fills, perform a controlled handoff:

### STATE.md Update
Before handoff, update session state:

```yaml
position:
  phase: 2
  plan: 3
  task: "Implement refresh token rotation"
  wave: 1

decisions:
  - "Using jose library for JWT (not jsonwebtoken)"
  - "Refresh tokens stored in httpOnly cookie, not localStorage"
  - "15min access token, 7day refresh token"

blockers:
  - "Waiting for user to configure OAuth credentials"

next_action: "Continue with task after blocker resolved"
```
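
If state is produced programmatically, a small typed structure helps keep the fields consistent. The sketch below is an assumption about how that might look: `SessionState` is a hypothetical class mirroring the example above, and it writes strings as JSON literals, which are also valid YAML double-quoted scalars, to avoid depending on a YAML library.

```python
import json
from dataclasses import dataclass, field


@dataclass
class SessionState:
    phase: int
    plan: int
    task: str
    wave: int
    decisions: list[str] = field(default_factory=list)
    blockers: list[str] = field(default_factory=list)
    next_action: str = ""

    def to_yaml(self) -> str:
        """Render the state in the same layout as the STATE.md example."""
        quote = json.dumps  # JSON strings are valid YAML double-quoted scalars
        lines = [
            "position:",
            f"  phase: {self.phase}",
            f"  plan: {self.plan}",
            f"  task: {quote(self.task)}",
            f"  wave: {self.wave}",
            "",
        ]
        for key, items in (("decisions", self.decisions), ("blockers", self.blockers)):
            if items:
                lines.append(f"{key}:")
                lines.extend(f"  - {quote(item)}" for item in items)
            else:
                lines.append(f"{key}: []")
            lines.append("")
        lines.append(f"next_action: {quote(self.next_action)}")
        return "\n".join(lines) + "\n"
```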

### Handoff Content
New session receives:
- STATE.md (current position)
- Relevant SUMMARY.md files (prior work in this phase)
- Current PLAN.md (if executing)
- Task context from initiative
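
A sketch of assembling that handoff into a single document for the new session follows. `build_handoff` is a hypothetical helper that takes file contents as strings; where the files live and how they are read is left out because it is not specified here.

```python
from typing import Optional, Sequence


def build_handoff(
    state_md: str,
    summaries: Sequence[str],
    task_context: str,
    plan_md: Optional[str] = None,
) -> str:
    """Concatenate handoff inputs in the order listed above."""
    sections = [("STATE.md", state_md)]
    sections += [(f"SUMMARY.md ({i})", text) for i, text in enumerate(summaries, 1)]
    if plan_md is not None:  # only present when a plan is being executed
        sections.append(("PLAN.md", plan_md))
    sections.append(("Task context", task_context))
    return "\n\n".join(f"## {title}\n{body.strip()}" for title, body in sections)
```

Passing only the summaries for the current phase keeps the new session's starting context small, in line with the budgets above.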

---

## Anti-Patterns

### Context Stuffing
**Wrong:** Loading entire codebase at session start
**Right:** Load files on-demand as tasks require them

### Orchestrator Processing
**Wrong:** Orchestrator reads all code and makes decisions
**Right:** Orchestrator routes to specialized agents who do the work

### Plan Bloat
**Wrong:** 10-task plans to "reduce coordination overhead"
**Right:** 2-3 task plans that fit in 50% context

### No Handoff State
**Wrong:** Agent restarts with no memory of prior work
**Right:** STATE.md preserves position, decisions, blockers

---

## Monitoring

Track context utilization across the system:

| Metric | Threshold | Action |
|--------|-----------|--------|
| Orchestrator context | >50% | Trigger handoff |
| Worker task context | >60% | Flag task as oversized |
| Plan total estimate | >50% | Split plan before execution |
| Average task context | >40% | Review decomposition strategy |
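
The thresholds in this table lend themselves to a simple check. In the sketch below, the metric keys and the `required_actions` function are hypothetical names; how the measurements are collected is outside its scope.

```python
# Metric key -> (threshold in percent, action), copied from the table above.
THRESHOLDS = {
    "orchestrator_context": (50, "Trigger handoff"),
    "worker_task_context": (60, "Flag task as oversized"),
    "plan_total_estimate": (50, "Split plan before execution"),
    "average_task_context": (40, "Review decomposition strategy"),
}


def required_actions(metrics: dict[str, float]) -> list[str]:
    """Return the action for every metric that exceeds its threshold.

    Raises KeyError for a metric name that is not in THRESHOLDS.
    """
    actions = []
    for name, value in metrics.items():
        threshold, action = THRESHOLDS[name]
        if value > threshold:
            actions.append(action)
    return actions
```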

---

## Implementation Notes

### Context Estimation
Estimate context usage before execution:
- File reads: ~1-2% per file (varies by size)
- Code changes: ~0.5% per change
- Tool outputs: ~1% per tool call
- Discussion: ~2-5% per exchange
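
These per-item costs can be combined into a rough range. The sketch below assumes the costs add linearly and uses the low and high ends of each figure listed above; `estimate_context` is a hypothetical helper, not a measured model.

```python
def estimate_context(files: int = 0, changes: int = 0,
                     tool_calls: int = 0, exchanges: int = 0) -> tuple[float, float]:
    """Return a (low, high) context estimate in percent."""
    low = files * 1.0 + changes * 0.5 + tool_calls * 1.0 + exchanges * 2.0
    high = files * 2.0 + changes * 0.5 + tool_calls * 1.0 + exchanges * 5.0
    return low, high


# Example: 5 file reads, 4 code changes, 6 tool calls, 2 discussion exchanges.
low, high = estimate_context(files=5, changes=4, tool_calls=6, exchanges=2)
print(f"Estimated context: {low:.0f}-{high:.0f}%")  # prints "Estimated context: 17-28%"
```

Comparing the high end of the range against the 50% limit is the more cautious choice when deciding whether to split a task.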

### Fresh Context Triggers
- Worker: Always fresh per task
- Architect: Fresh per initiative
- Verifier: Fresh per phase
- Orchestrator: Handoff at 50%

### Subagent Spawning
When spawning subagents:
1. Provide focused context (only what's needed)
2. Give clear instructions (specific task, expected output)
3. Collect structured results
4. Update state with outcomes
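
The four steps might look like the sketch below. Everything here is hypothetical: `run_agent` stands in for whatever mechanism actually launches a subagent and is injected as a callable, and the result fields (`status`, `summary`) are assumed for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class TaskResult:
    task: str
    status: str
    summary: str


def spawn_subagent(
    task: str,
    instructions: str,
    context_files: Sequence[str],
    run_agent: Callable[[dict], dict],  # hypothetical launcher, supplied by the caller
    state: dict,
) -> TaskResult:
    # 1-2. Focused context and clear instructions: pass only what the task needs.
    request = {"task": task, "instructions": instructions, "files": list(context_files)}
    raw = run_agent(request)
    # 3. Collect a structured result instead of raw transcript text.
    result = TaskResult(task=task, status=raw["status"], summary=raw["summary"])
    # 4. Record the outcome so the orchestrator's state reflects it.
    state.setdefault("completed", []).append(result)
    return result
```

Keeping the return value small and structured is what lets the orchestrator stay within its own budget while the subagent does the heavy work.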