# Context Engineering

Context engineering is a first-class concern in Codewalk. Agent output quality degrades predictably as context fills. This document defines the rules that all agents must follow.

## Quality Degradation Curve

Claude's output quality follows a predictable curve based on context utilization:

| Context Usage | Quality Level | Behavior |
|---------------|---------------|----------|
| 0-30% | **PEAK** | Thorough, comprehensive, considers edge cases |
| 30-50% | **GOOD** | Confident, solid work, reliable output |
| 50-70% | **DEGRADING** | Efficiency mode begins, shortcuts appear |
| 70%+ | **POOR** | Rushed, minimal, misses requirements |

**Rule: Stay UNDER 50% context for quality work.**
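The curve and the 50% rule can be encoded as a simple lookup on a utilization fraction. This is a minimal Python sketch. The function names `quality_band` and `fits_quality_rule` are hypothetical and not part of Codewalk. The table does not say which band a boundary value belongs to, so the sketch assumes each boundary belongs to the higher band.

```python
def quality_band(usage: float) -> str:
    """Map context utilization (0.0-1.0) to a band from the table above.

    Assumption: boundary values fall into the higher band
    (exactly 0.30 -> GOOD, exactly 0.50 -> DEGRADING).
    """
    if not 0.0 <= usage <= 1.0:
        raise ValueError(f"usage must be in [0, 1], got {usage}")
    if usage < 0.30:
        return "PEAK"
    if usage < 0.50:
        return "GOOD"
    if usage < 0.70:
        return "DEGRADING"
    return "POOR"


def fits_quality_rule(usage: float) -> bool:
    """The rule above: quality work stays strictly under 50% context."""
    return usage < 0.50
```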

---

## Orchestrator Pattern

Codewalk uses thin orchestration with heavy subagent work:

```
┌─────────────────────────────────────────────────────────────┐
│                    Orchestrator (30-40%)                    │
│  - Routes work to specialized agents                        │
│  - Collects results                                         │
│  - Maintains state                                          │
│  - Coordinates across phases                                │
└─────────────────────────────────────────────────────────────┘
                               │
            ┌──────────────────┼──────────────────┐
            ▼                  ▼                  ▼
     ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
     │   Worker    │    │  Architect  │    │  Verifier   │
     │ (200k ctx)  │    │ (200k ctx)  │    │ (200k ctx)  │
     │  Fresh per  │    │  Fresh per  │    │  Fresh per  │
     │    task     │    │ initiative  │    │    phase    │
     └─────────────┘    └─────────────┘    └─────────────┘
```

**Key insight:** Each subagent gets a fresh 200k-token context window. Heavy work happens there, not in the orchestrator.

---
## Context Budgets by Role

### Orchestrator
- **Target:** 30-40% max
- **Strategy:** Route, don't process. Collect results, don't analyze.
- **Reset trigger:** Context exceeds 50%

### Worker
- **Target:** 50% per task
- **Strategy:** Single task per context. Fresh context for each task.
- **Reset trigger:** Task completion (always)

### Architect
- **Target:** 60% per initiative analysis
- **Strategy:** Initiative discussion and planning in a single context
- **Reset trigger:** Work plan generated or context exceeds 70%

### Verifier
- **Target:** 40% per phase verification
- **Strategy:** Goal-backward verification and gap identification
- **Reset trigger:** Verification complete
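These budgets can be made machine-checkable by storing them as data. This is a minimal Python sketch. It assumes usage is reported as a fraction of the context window. The names `RoleBudget`, `ROLE_BUDGETS`, `needs_reset` and `over_target` are hypothetical. The orchestrator's 30-40% target range is collapsed to its 40% upper bound.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RoleBudget:
    target: float                 # upper end of the target range, as a fraction
    reset_above: Optional[float]  # usage that forces a reset, if the role has one
    reset_on_completion: bool     # whether finishing the unit of work forces a reset


# Values transcribed from the budgets above (fractions of the context window).
ROLE_BUDGETS = {
    "orchestrator": RoleBudget(target=0.40, reset_above=0.50, reset_on_completion=False),
    "worker": RoleBudget(target=0.50, reset_above=None, reset_on_completion=True),
    "architect": RoleBudget(target=0.60, reset_above=0.70, reset_on_completion=True),
    "verifier": RoleBudget(target=0.40, reset_above=None, reset_on_completion=True),
}


def needs_reset(role: str, usage: float, work_complete: bool = False) -> bool:
    """Return True when the role's reset trigger fires."""
    budget = ROLE_BUDGETS[role]
    if budget.reset_on_completion and work_complete:
        return True
    return budget.reset_above is not None and usage > budget.reset_above


def over_target(role: str, usage: float) -> bool:
    """Return True when usage exceeds the role's target (a warning, not a reset)."""
    return usage > ROLE_BUDGETS[role].target
```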
---

## Task Sizing Rules

Tasks are sized to fit context budgets:

| Task Complexity | Context Estimate | Example |
|-----------------|------------------|---------|
| Simple | 10-20% | Add a field to an existing form |
| Medium | 20-35% | Create a new API endpoint with validation |
| Complex | 35-50% | Implement an auth flow with refresh tokens |
| Too Large | >50% | **SPLIT INTO SUBTASKS** |

**Planning rule:** No single task should require more than 50% context. If the estimate exceeds that, decompose the task before execution.
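The sizing tiers translate directly into a classifier over an estimated context fraction. This is a minimal Python sketch, and `size_tier` and `must_split` are hypothetical names. The table's ranges share their endpoints (20%, 35%). The sketch therefore assumes each endpoint belongs to the lower tier. As a result, exactly 50% is still "Complex", which is consistent with the planning rule forbidding only tasks above 50%.

```python
def size_tier(estimate: float) -> str:
    """Classify an estimated context fraction into the tiers above.

    Assumption: shared endpoints belong to the lower tier, and anything
    below 10% is treated as Simple.
    """
    if estimate <= 0.20:
        return "Simple"
    if estimate <= 0.35:
        return "Medium"
    if estimate <= 0.50:
        return "Complex"
    return "Too Large"


def must_split(estimate: float) -> bool:
    """Planning rule: tasks estimated above 50% are decomposed before execution."""
    return size_tier(estimate) == "Too Large"
```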

---

## Plan Sizing

Plans group 2-3 related tasks for sequential execution:

| Plan Size | Target Context | Notes |
|-----------|----------------|-------|
| Minimal (1 task) | 20-30% | Simple independent work |
| Standard (2-3 tasks) | 40-50% | Related work, shared context |
| Maximum | 50% | Never exceed; quality degrades beyond this point |

**Why 2-3 tasks?** Shared context reduces overhead (repeated file reads, rebuilding an understanding of the code). With more than 3 tasks, the accumulated context erodes that quality benefit.
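The sizing rules suggest a simple packing step. What follows is a minimal greedy sketch in Python, not Codewalk's actual planner. It assumes each task already has a context estimate. It also assumes a plan's cost is roughly the sum of its tasks' estimates, which overstates the cost when tasks share files. `group_into_plans` is a hypothetical helper.

```python
from typing import List, Tuple

MAX_TASKS_PER_PLAN = 3     # from "Why 2-3 tasks?" above
MAX_PLAN_CONTEXT = 0.50    # from the "Maximum | 50%" row above


def group_into_plans(tasks: List[Tuple[str, float]]) -> List[List[str]]:
    """Greedily pack ordered (name, estimate) tasks into plans.

    A new plan starts when adding the next task would exceed 3 tasks or 50%
    combined estimated context. Task order is preserved because plans run
    sequentially. Estimates are fractions of the context window.
    """
    plans: List[List[str]] = []
    current: List[str] = []
    used = 0.0
    for name, estimate in tasks:
        if estimate > MAX_PLAN_CONTEXT:
            raise ValueError(f"{name} is estimated at {estimate:.0%}; split it first")
        if current and (len(current) == MAX_TASKS_PER_PLAN
                        or used + estimate > MAX_PLAN_CONTEXT):
            plans.append(current)
            current, used = [], 0.0
        current.append(name)
        used += estimate
    if current:
        plans.append(current)
    return plans
```

Because the packing is greedy and order-preserving, it never reorders dependent tasks. The trade-off is that it may leave a plan under-filled when a large task follows small ones.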

---

## Wave-Based Parallelization

Compute the dependency graph and assign tasks to waves:

```
Wave 0: Tasks with no dependencies (run in parallel)
   ↓
Wave 1: Tasks depending only on Wave 0 (run in parallel)
   ↓
Wave 2: Tasks depending only on Waves 0-1 (run in parallel)
   ↓
...continue until all tasks are assigned
```

**Benefits:**
- Maximum parallelization
- Clear progress tracking
- Natural checkpoints between waves

### Computation Algorithm

```
1. Build a dependency graph from task dependencies
2. Find all tasks with no unresolved dependencies → Wave 0
3. Mark Wave 0 tasks as "resolved"
4. Find all tasks whose dependencies are all resolved → Wave 1
5. Mark each new wave as resolved and repeat step 4 until all tasks are assigned
```
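These steps describe a layered topological sort. The following minimal Python sketch follows them literally: each round it rescans the remaining tasks, rather than maintaining Kahn-style in-degree counters. It assumes dependencies arrive as a mapping from each task name to the set of task names it depends on. `assign_waves` is a hypothetical helper. The unknown-dependency and cycle checks are additions that the steps above do not spell out.

```python
from typing import Dict, List, Set


def assign_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Layer tasks into waves following the steps above.

    `deps` maps every task to the set of tasks it depends on; each task must
    appear as a key, even with an empty dependency set.
    """
    unknown = set().union(*deps.values()) - set(deps)
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")

    resolved: Set[str] = set()
    remaining = set(deps)
    waves: List[List[str]] = []
    while remaining:
        # Tasks whose dependencies are all resolved form the next wave.
        wave = sorted(t for t in remaining if set(deps[t]) <= resolved)
        if not wave:
            # No task can run: the remaining tasks depend on each other.
            raise ValueError(f"dependency cycle among: {sorted(remaining)}")
        waves.append(wave)
        resolved.update(wave)
        remaining.difference_update(wave)
    return waves
```

Sorting each wave keeps the output deterministic, which makes wave assignments stable across runs and easier to track as progress checkpoints.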
---

## Context Handoff

When context fills, perform a controlled handoff:

### STATE.md Update
Before handoff, update the session state:

```yaml
position:
  phase: 2
  plan: 3
  task: "Implement refresh token rotation"
  wave: 1

decisions:
  - "Using jose library for JWT (not jsonwebtoken)"
  - "Refresh tokens stored in httpOnly cookie, not localStorage"
  - "15min access token, 7day refresh token"

blockers:
  - "Waiting for user to configure OAuth credentials"

next_action: "Continue with task after blocker resolved"
```
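Writing STATE.md by hand is error-prone, so a small serializer can produce the layout shown. This is a minimal Python sketch that uses only the standard library. `SessionState` and `render_state` are hypothetical names, and the field set simply mirrors the example. A real implementation might use a YAML library instead.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class SessionState:
    phase: int
    plan: int
    task: str
    wave: int
    decisions: List[str] = field(default_factory=list)
    blockers: List[str] = field(default_factory=list)
    next_action: str = ""


def _quote(text: str) -> str:
    # A JSON string literal is also a valid double-quoted YAML scalar,
    # so json.dumps takes care of escaping quotes, backslashes and control chars.
    return json.dumps(text)


def _yaml_list(key: str, items: List[str]) -> List[str]:
    if not items:
        return [f"{key}: []"]  # a bare "key:" would parse as null, not an empty list
    return [f"{key}:"] + [f"  - {_quote(item)}" for item in items]


def render_state(state: SessionState) -> str:
    """Render STATE.md content in the same layout as the example."""
    lines = [
        "position:",
        f"  phase: {state.phase}",
        f"  plan: {state.plan}",
        f"  task: {_quote(state.task)}",
        f"  wave: {state.wave}",
        "",
        *_yaml_list("decisions", state.decisions),
        "",
        *_yaml_list("blockers", state.blockers),
        "",
        f"next_action: {_quote(state.next_action)}",
    ]
    return "\n".join(lines) + "\n"
```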
### Handoff Content
A new session receives:
- STATE.md (current position)
- Relevant SUMMARY.md files (prior work in this phase)
- Current PLAN.md (if executing)
- Task context from the initiative

---

## Anti-Patterns

### Context Stuffing
**Wrong:** Loading the entire codebase at session start

**Right:** Loading files on demand as tasks require them

### Orchestrator Processing
**Wrong:** The orchestrator reads all code and makes decisions itself

**Right:** The orchestrator routes tasks to specialized agents that do the work

### Plan Bloat
**Wrong:** 10-task plans to "reduce coordination overhead"

**Right:** 2-3 task plans that fit within 50% context

### No Handoff State
**Wrong:** The agent restarts with no memory of prior work

**Right:** STATE.md preserves position, decisions, and blockers

---
## Monitoring

Track context utilization across the system:

| Metric | Threshold | Action |
|--------|-----------|--------|
| Orchestrator context | >50% | Trigger handoff |
| Worker task context | >60% | Flag task as oversized |
| Plan total estimate | >50% | Split plan before execution |
| Average task context | >40% | Review decomposition strategy |
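A monitoring loop could evaluate these thresholds mechanically. This is a minimal Python sketch. It assumes each metric is already available as a fraction of the context window under the hypothetical keys shown. How Codewalk actually collects these numbers is not specified in this document.

```python
from typing import Dict, List

# (metric key, threshold as a fraction, action), transcribed from the table above.
# The metric keys are hypothetical names chosen for this sketch.
THRESHOLDS = [
    ("orchestrator_context", 0.50, "Trigger handoff"),
    ("worker_task_context", 0.60, "Flag task as oversized"),
    ("plan_total_estimate", 0.50, "Split plan before execution"),
    ("average_task_context", 0.40, "Review decomposition strategy"),
]


def actions_for(metrics: Dict[str, float]) -> List[str]:
    """Return the actions whose thresholds are strictly exceeded.

    Metrics absent from `metrics` are skipped rather than treated as zero.
    """
    return [
        action
        for name, limit, action in THRESHOLDS
        if name in metrics and metrics[name] > limit
    ]
```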

---

## Implementation Notes

### Context Estimation
Estimate context usage before execution:
- File reads: ~1-2% per file (varies by size)
- Code changes: ~0.5% per change
- Tool outputs: ~1% per tool call
- Discussion: ~2-5% per exchange
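These per-item figures can be summed into a rough range before a task is dispatched. This is a minimal Python sketch. The constants restate the approximations above, taking the low and high ends where a range is given. `estimate_context` is a hypothetical helper, not part of Codewalk.

```python
from typing import Tuple

# Per-item cost as (low, high) fractions of the context window,
# restating the rough figures above.
FILE_READ = (0.01, 0.02)
CODE_CHANGE = (0.005, 0.005)
TOOL_CALL = (0.01, 0.01)
EXCHANGE = (0.02, 0.05)


def estimate_context(files: int, changes: int, tool_calls: int,
                     exchanges: int) -> Tuple[float, float]:
    """Return a (low, high) estimate of context usage for a planned task."""
    counts_and_costs = [
        (files, FILE_READ),
        (changes, CODE_CHANGE),
        (tool_calls, TOOL_CALL),
        (exchanges, EXCHANGE),
    ]
    low = sum(n * cost[0] for n, cost in counts_and_costs)
    high = sum(n * cost[1] for n, cost in counts_and_costs)
    return low, high
```

For example, a task that reads 8 files, makes 6 changes, uses 10 tool calls, and needs 2 exchanges comes out at roughly 25-39%. Comparing the high end against the 50% planning rule is one way to decide whether a task must be split before a worker starts.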
### Fresh Context Triggers
- Worker: Always fresh per task
- Architect: Fresh per initiative
- Verifier: Fresh per phase
- Orchestrator: Handoff at 50%

### Subagent Spawning
When spawning subagents:
1. Provide focused context (only what's needed)
2. Give clear instructions (specific task, expected output)
3. Collect structured results
4. Update state with outcomes
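The four spawning steps can be sketched as a single function. This is a minimal Python illustration under stated assumptions. `run` is a hypothetical callable that executes a prompt in a fresh subagent context and returns a structured `SubagentResult`. Neither `run` nor `SubagentResult` is an actual Codewalk or Claude API. Only the summary goes back into orchestrator state, which keeps the orchestrator thin.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SubagentResult:
    status: str               # e.g. "done" or "blocked"
    summary: str              # short outcome the orchestrator keeps
    files_changed: List[str]


# Hypothetical runner: executes a prompt in a fresh subagent context and returns
# a structured result. It stands in for whatever actually launches agents.
Runner = Callable[[str], SubagentResult]


def spawn_subagent(run: Runner, task: str, expected_output: str,
                   context_files: Dict[str, str], state: Dict[str, list]) -> SubagentResult:
    # 1. Focused context: only the files this task needs, not the whole codebase.
    context = "\n\n".join(f"## {path}\n{body}" for path, body in context_files.items())
    # 2. Clear instructions: the specific task plus the expected output.
    prompt = f"Task: {task}\nExpected output: {expected_output}\n\n{context}"
    # 3. Collect a structured result from the fresh context.
    result = run(prompt)
    # 4. Update state with the outcome; only the summary is kept, not the transcript.
    state.setdefault("outcomes", []).append(
        {"task": task, "status": result.status, "summary": result.summary}
    )
    return result
```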