Add userDismissedAt field to agents schema
docs/model-profiles.md
# Model Profiles

Different agent roles have different needs. Model selection balances quality, cost, and latency.

## Profile Definitions

| Profile | Use Case | Cost | Quality |
|---------|----------|------|---------|
| **quality** | Critical decisions, architecture | Highest | Best |
| **balanced** | Default for most work | Medium | Good |
| **budget** | High-volume, low-risk tasks | Lowest | Acceptable |

---
## Agent Model Assignments

| Agent | Quality | Balanced (Default) | Budget |
|-------|---------|--------------------|--------|
| **Architect** | Opus | Opus | Sonnet |
| **Worker** | Opus | Sonnet | Sonnet |
| **Verifier** | Sonnet | Sonnet | Haiku |
| **Orchestrator** | Sonnet | Sonnet | Haiku |
| **Monitor** | Sonnet | Haiku | Haiku |
| **Researcher** | Opus | Sonnet | Haiku |

---
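The agent model assignment table can be read as a two-level lookup from agent role and profile to model. The following is a minimal sketch of that lookup, not the project's actual implementation; the names `Profile`, `AgentRole`, `Model`, `MODEL_ASSIGNMENTS`, and `modelFor` are hypothetical and chosen only for illustration.

```typescript
// Hypothetical types and names; only the table contents come from this document.
type Profile = "quality" | "balanced" | "budget";
type Model = "opus" | "sonnet" | "haiku";
type AgentRole =
  | "architect"
  | "worker"
  | "verifier"
  | "orchestrator"
  | "monitor"
  | "researcher";

// One entry per agent, mirroring the rows of the assignment table.
const MODEL_ASSIGNMENTS: Record<AgentRole, Record<Profile, Model>> = {
  architect: { quality: "opus", balanced: "opus", budget: "sonnet" },
  worker: { quality: "opus", balanced: "sonnet", budget: "sonnet" },
  verifier: { quality: "sonnet", balanced: "sonnet", budget: "haiku" },
  orchestrator: { quality: "sonnet", balanced: "sonnet", budget: "haiku" },
  monitor: { quality: "sonnet", balanced: "haiku", budget: "haiku" },
  researcher: { quality: "opus", balanced: "sonnet", budget: "haiku" },
};

// Look up the model for an agent; "balanced" is the documented default profile.
function modelFor(agent: AgentRole, profile: Profile = "balanced"): Model {
  return MODEL_ASSIGNMENTS[agent][profile];
}
```

Typing the table as `Record<AgentRole, Record<Profile, Model>>` means that adding a new role or profile without filling in every cell becomes a compile-time error.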
## Rationale

### Architect (Planning) - Opus/Opus/Sonnet
Planning has the highest impact on outcomes. A bad plan wastes all downstream execution. Invest in quality here.

- **Quality profile:** Complex systems, novel domains, critical decisions
- **Balanced profile:** Standard feature work, established patterns
- **Budget profile:** Simple initiatives, well-documented domains
### Worker (Execution) - Opus/Sonnet/Sonnet
The plan already contains the reasoning. Execution is implementation, not decision-making.

- **Quality profile:** Complex algorithms, security-critical code
- **Balanced profile:** Standard implementation work
- **Budget profile:** Simple tasks, boilerplate code
### Verifier (Validation) - Sonnet/Sonnet/Haiku
Verification is structured checking against defined criteria. It needs less reasoning than planning.

- **Quality profile:** Complex verification, subtle integration issues
- **Balanced profile:** Standard goal-backward verification
- **Budget profile:** Simple pass/fail checks
### Orchestrator (Coordination) - Sonnet/Sonnet/Haiku
The orchestrator routes work; it does not do the heavy lifting. It needs reliability, not creativity.

- **Quality profile:** Complex multi-agent coordination
- **Balanced profile:** Standard workflow management
- **Budget profile:** Simple task routing
### Monitor (Observation) - Sonnet/Haiku/Haiku
Monitoring is pattern matching and threshold checking. Minimal reasoning required.

- **Quality profile:** Complex health analysis
- **Balanced profile:** Standard monitoring
- **Budget profile:** Simple heartbeat checks
### Researcher (Discovery) - Opus/Sonnet/Haiku
Research is read-only exploration: high volume, with low risk of unwanted modifications.

- **Quality profile:** Deep domain analysis
- **Balanced profile:** Standard codebase exploration
- **Budget profile:** Simple file lookups

---
## Profile Selection

### Per-Initiative Override

```yaml
# In initiative config
model_profile: quality  # Override default balanced
```

### Per-Agent Override

```yaml
# In task assignment
assigned_to: worker-123
model_override: opus  # This task needs Opus
```
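The two overrides above suggest a precedence order: a task-level `model_override` wins over the initiative's `model_profile`, which in turn wins over the global default. The sketch below encodes that order. The precedence itself is an assumption inferred from the comments in the YAML, and `Selection` and `resolveSelection` are hypothetical names.

```typescript
type Profile = "quality" | "balanced" | "budget";
type Model = "opus" | "sonnet" | "haiku";

interface Selection {
  profile: Profile; // profile used to look up the agent's model
  modelOverride?: Model; // present only when the task pins a specific model
}

// Assumed precedence: task override, then initiative profile, then global default.
function resolveSelection(
  globalDefault: Profile,
  initiativeProfile?: Profile,
  taskModelOverride?: Model,
): Selection {
  const profile = initiativeProfile ?? globalDefault;
  return taskModelOverride
    ? { profile, modelOverride: taskModelOverride }
    : { profile };
}
```

A caller would use `modelOverride` when it is present and otherwise look up the agent's model under `profile`.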
### Automatic Escalation

```yaml
# When to auto-escalate
escalation_triggers:
  - condition: "task.retry_count > 2"
    action: "escalate_model"
  - condition: "task.complexity == 'high'"
    action: "use_quality_profile"
  - condition: "deviation.rule == 4"
    action: "escalate_model"
```
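One way to read the triggers above is as a pure function from task state to a list of actions. The sketch below hard-codes the three documented conditions rather than parsing the condition strings. It also assumes that `escalate_model` means "move one tier up, stopping at Opus", which this document does not define. `TaskState`, `triggeredActions`, and `escalateModel` are hypothetical names.

```typescript
type Model = "opus" | "sonnet" | "haiku";
type Action = "escalate_model" | "use_quality_profile";

// Hypothetical task snapshot carrying the fields the trigger conditions refer to.
interface TaskState {
  retryCount: number;
  complexity: "low" | "medium" | "high";
  deviationRule?: number; // rule number of a raised deviation, if any
}

// Assumption: escalation moves one tier up and stops at opus.
const NEXT_TIER: Record<Model, Model> = {
  haiku: "sonnet",
  sonnet: "opus",
  opus: "opus",
};

function escalateModel(model: Model): Model {
  return NEXT_TIER[model];
}

// Evaluate the three documented triggers; each action is reported at most once.
function triggeredActions(task: TaskState): Action[] {
  const actions: Action[] = [];
  if (task.retryCount > 2 || task.deviationRule === 4) {
    actions.push("escalate_model");
  }
  if (task.complexity === "high") {
    actions.push("use_quality_profile");
  }
  return actions;
}
```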
---
## Cost Management

### Estimated Token Usage

| Agent | Avg Tokens/Task | Profile Impact |
|-------|-----------------|----------------|
| Architect | 50k-100k | 3x between budget/quality |
| Worker | 20k-50k | 2x between budget/quality |
| Verifier | 10k-30k | 1.5x between budget/quality |
| Orchestrator | 5k-15k | 1.5x between budget/quality |

### Cost Optimization Strategies

1. **Right-size tasks:** Smaller tasks use fewer tokens.
2. **Use budget for volume:** Monitoring and simple checks.
3. **Reserve quality for impact:** Architecture and security.
4. **Profile per initiative:** Simple features use budget; complex ones use quality.

---
## Configuration

### Default Profile

The default profile lives in `.planning/config.json`:

```json
{
  "model_profile": "balanced",
  "model_overrides": {
    "architect": null,
    "worker": null,
    "verifier": null
  }
}
```
### Quality Profile

```json
{
  "model_profile": "quality",
  "model_overrides": {}
}
```
### Budget Profile

```jsonc
{
  "model_profile": "budget",
  "model_overrides": {
    "architect": "sonnet" // Keep the architect at Sonnet or better
  }
}
```
### Mixed Profile

```jsonc
{
  "model_profile": "balanced",
  "model_overrides": {
    "architect": "opus",  // Invest in planning
    "worker": "sonnet",   // Standard execution
    "verifier": "haiku"   // Budget verification
  }
}
```
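The configurations above combine a base profile with per-agent overrides. The sketch below shows one plausible way to resolve the effective model. It assumes that a `null` or missing entry in `model_overrides` means "use whatever the profile assigns", which this document implies but does not state. `ProfileConfig` and `effectiveModel` are hypothetical names, and the caller passes in the profile-assigned model.

```typescript
type Profile = "quality" | "balanced" | "budget";
type Model = "opus" | "sonnet" | "haiku";

// Shape of the config objects shown above.
interface ProfileConfig {
  model_profile: Profile;
  model_overrides: { [agent: string]: Model | null | undefined };
}

// Assumption: null or absent means "no override"; fall back to the profile's model.
function effectiveModel(
  config: ProfileConfig,
  agent: string,
  profileModel: Model,
): Model {
  return config.model_overrides[agent] ?? profileModel;
}

// The "Mixed Profile" example: balanced base with three explicit overrides.
const mixed: ProfileConfig = {
  model_profile: "balanced",
  model_overrides: { architect: "opus", worker: "sonnet", verifier: "haiku" },
};
```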
---
## Model Capabilities Reference

### Opus
- **Strengths:** Complex reasoning, nuanced decisions, novel problems
- **Best for:** Architecture, complex algorithms, security analysis
- **Cost:** Highest

### Sonnet
- **Strengths:** Good balance of reasoning and speed, reliable
- **Best for:** Standard development, code generation, debugging
- **Cost:** Medium

### Haiku
- **Strengths:** Fast, cheap, good for structured tasks
- **Best for:** Monitoring, simple checks, high-volume operations
- **Cost:** Lowest

---
## Profile Switching

### CLI Command

```bash
# Set profile for all future work
cw config set model_profile quality

# Set profile for specific initiative
cw initiative config <id> --model-profile budget

# Override for single task
cw task update <id> --model-override opus
```

### API

```typescript
// Set initiative profile
await initiative.setConfig(initiativeId, { modelProfile: 'quality' });

// Override task model
await task.update(taskId, { modelOverride: 'opus' });
```

---
## Monitoring Model Usage

Track model usage for cost analysis:

```sql
CREATE TABLE model_usage (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  agent_type TEXT NOT NULL,
  model TEXT NOT NULL,
  tokens_input INTEGER,
  tokens_output INTEGER,
  task_id TEXT,
  initiative_id TEXT,
  created_at INTEGER DEFAULT (unixepoch())
);

-- Usage by agent type
SELECT agent_type, model, SUM(tokens_input + tokens_output) AS total_tokens
FROM model_usage
GROUP BY agent_type, model;

-- Cost by initiative
-- Rates are illustrative; substitute current pricing in matching units.
SELECT initiative_id,
  SUM(CASE WHEN model = 'opus' THEN (tokens_input + tokens_output) * 0.015
           WHEN model = 'sonnet' THEN (tokens_input + tokens_output) * 0.003
           WHEN model = 'haiku' THEN (tokens_input + tokens_output) * 0.0003 END) AS estimated_cost
FROM model_usage
GROUP BY initiative_id;
```
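The same per-initiative estimate can be computed in application code, for example when usage rows are already in memory. The sketch below mirrors the cost query under the same illustrative rates. `UsageRow`, `RATE`, and `costByInitiative` are hypothetical names, and the rates are the placeholder multipliers from the SQL, not actual pricing.

```typescript
type Model = "opus" | "sonnet" | "haiku";

// Hypothetical in-memory shape of a model_usage row (only the fields needed here).
interface UsageRow {
  model: Model;
  tokensInput: number;
  tokensOutput: number;
  initiativeId: string;
}

// Placeholder multipliers copied from the cost query; not real pricing.
const RATE: Record<Model, number> = {
  opus: 0.015,
  sonnet: 0.003,
  haiku: 0.0003,
};

// Sum (input + output tokens) * rate for each initiative.
function costByInitiative(rows: UsageRow[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const row of rows) {
    const tokens = row.tokensInput + row.tokensOutput;
    totals[row.initiativeId] =
      (totals[row.initiativeId] ?? 0) + tokens * RATE[row.model];
  }
  return totals;
}
```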
---
## Recommendations

### Starting Out
Use the **balanced** profile. It provides good quality at reasonable cost.

### High-Stakes Projects
Use the **quality** profile. The cost difference is small compared to the cost of getting it wrong.

### High-Volume Work
Use the **budget** profile, keeping the architect pinned to Sonnet with an override. Don't skimp on planning.

### Learning the System
Use the **quality** profile initially. See what good output looks like before optimizing for cost.