# Goal-Backward Verification

Verification confirms that **goals are achieved**, not merely that **tasks were completed**. A completed task "create chat component" does not guarantee the goal "working chat interface" is met.

## Core Principle

**Task completion ≠ Goal achievement**

Tasks are implementation steps. Goals are user outcomes. Verification bridges the gap by checking observable outcomes, not just checklist items.

---
## Verification Levels

### Level 1: Existence Check
Does the artifact exist?

```
✓ File exists at expected path
✓ Component is exported
✓ Route is registered
```
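As a rough illustration, the first two existence checks could be scripted in Python as below. The function names `file_exists` and `exports_symbol` are hypothetical, and the export detection is a regex approximation of TypeScript syntax rather than a real parser; route registration depends on the framework and is not covered.

```python
import re
from pathlib import Path


def file_exists(root: Path, relative_path: str) -> bool:
    """Level 1: the artifact is present at the expected path."""
    return (root / relative_path).is_file()


def exports_symbol(source: str, name: str) -> bool:
    """Level 1: heuristic check that a TypeScript module exports `name`.

    Covers `export function X`, `export const X`, `export default function X`
    and `export { ..., X, ... }`. Other export forms are not detected.
    """
    escaped = re.escape(name)
    declared = re.search(
        r"\bexport\s+(?:default\s+)?(?:async\s+)?(?:function|const|let|class)\s+"
        + escaped + r"\b",
        source,
    )
    listed = re.search(r"\bexport\s*\{[^}]*\b" + escaped + r"\b[^}]*\}", source)
    return bool(declared or listed)
```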
### Level 2: Substance Check
Is the artifact substantive (not a stub)?

```
✓ Function has implementation (not just return null)
✓ Component renders content (not empty div)
✓ API returns meaningful response (not placeholder)
```
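One way to approximate the first substance check is to test whether a function body is nothing but a placeholder. The sketch below assumes the body text has already been extracted; `is_stub_body` and its pattern list are illustrative and far from exhaustive.

```python
import re

# Bodies that do nothing useful. The alternatives are examples, not a full list.
STUB_BODY = re.compile(
    r"""^\s*(?:
        return\s*(?:null|undefined|\{\s*\}|\[\s*\])?\s*;?    # bare or empty return
      | throw\s+new\s+Error\(\s*['"]not\s+implemented['"]\s*\)\s*;?
      | //[^\n]*                                              # a lone comment
    )?\s*$""",
    re.IGNORECASE | re.VERBOSE,
)


def is_stub_body(body: str) -> bool:
    """Return True when a function body is empty or a known placeholder."""
    return bool(STUB_BODY.match(body))
```

A body such as `return fetch('/api/messages')` is not flagged, while `return null;` and an empty body are.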
### Level 3: Wiring Check
Is the artifact connected to the system?

```
✓ Component is rendered somewhere
✓ API endpoint is called by client
✓ Event handler is attached
✓ Database query is executed
```
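A minimal wiring probe might look for files that import a symbol from the module under test. This is a sketch under stated assumptions: `importers_of` is a hypothetical helper, it only recognizes named imports of the form `import { X } from '...'`, and an import alone does not prove the symbol is actually called.

```python
import re
from pathlib import Path


def importers_of(root: Path, module_stem: str, symbol: str) -> list[str]:
    """List files under `root` that import `symbol` from a module whose
    path ends in `module_stem` (for example "useChat")."""
    quote = r"""['"]"""
    pattern = re.compile(
        r"import\s*\{[^}]*\b" + re.escape(symbol) + r"\b[^}]*\}\s*from\s*"
        + quote + r"[^'\"]*" + re.escape(module_stem) + quote
    )
    hits = []
    for path in sorted(root.rglob("*.ts*")):
        # Skip the module itself; only other files count as consumers.
        if not path.is_file() or path.stem == module_stem:
            continue
        if pattern.search(path.read_text(encoding="utf-8")):
            hits.append(path.relative_to(root).as_posix())
    return hits
```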
**All three levels must pass for verification success.**

---
## Must-Have Derivation

Before verification, derive what "done" means from the goal:

### 1. Observable Truths (3-7 user perspectives)
What can a user observe when the goal is achieved?

```yaml
observable_truths:
  - "User can click 'Send' and message appears in chat"
  - "Messages persist after page refresh"
  - "New messages appear without page reload"
  - "User sees typing indicator when other party types"
```

### 2. Required Artifacts
What files MUST exist?

```yaml
required_artifacts:
  - path: src/components/Chat.tsx
    check: "Exports Chat component"
  - path: src/api/messages.ts
    check: "Exports sendMessage, getMessages"
  - path: src/hooks/useChat.ts
    check: "Exports useChat hook"
```

### 3. Required Wiring
What connections MUST work?

```yaml
required_wiring:
  - from: Chat.tsx
    to: useChat.ts
    check: "Component calls hook"
  - from: useChat.ts
    to: messages.ts
    check: "Hook calls API"
  - from: messages.ts
    to: database
    check: "API persists to DB"
```

### 4. Key Links (Where Stubs Hide)
What integration points commonly fail?

```yaml
key_links:
  - "Form onSubmit → API call (not just console.log)"
  - "WebSocket connection → message handler"
  - "API response → state update → render"
```
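The four must-have lists above could be held in one structure once loaded. The following is a sketch, assuming Python dataclasses; the class names and the `problems` method are hypothetical, and `source`/`target` stand in for the YAML keys `from`/`to` because `from` is a reserved word in Python.

```python
from dataclasses import dataclass, field


@dataclass
class Artifact:
    path: str
    check: str


@dataclass
class Wiring:
    source: str  # YAML key `from`
    target: str  # YAML key `to`
    check: str


@dataclass
class MustHaves:
    observable_truths: list[str]
    required_artifacts: list[Artifact] = field(default_factory=list)
    required_wiring: list[Wiring] = field(default_factory=list)
    key_links: list[str] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return reasons the derivation is incomplete; empty means usable."""
        issues = []
        if not 3 <= len(self.observable_truths) <= 7:
            issues.append("expected 3-7 observable truths")
        if not self.required_artifacts:
            issues.append("no required artifacts listed")
        return issues
```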
---
## Verification Process

### Phase Verification

After all tasks in a phase complete:

```
1. Load must-haves (from phase goal or PLAN frontmatter)
2. For each observable truth:
   a. Level 1: Does the relevant code exist?
   b. Level 2: Is it substantive?
   c. Level 3: Is it wired?
3. For each required artifact:
   a. Verify file exists
   b. Verify not a stub
   c. Verify it's imported/used
4. For each key link:
   a. Trace the connection
   b. Verify data flows
5. Scan for anti-patterns (see below)
6. Structure gaps for re-planning
```
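Step 3 of the procedure above can be sketched as a loop that applies the three levels in order and records the first failure per artifact. The checks are passed in as callables so the sketch stays independent of how each level is implemented. `verify_artifacts` is a hypothetical name, and the gap type `MISSING` is an assumption added for files that do not exist.

```python
from typing import Callable

Check = Callable[[str], bool]


def verify_artifacts(
    paths: list[str], exists: Check, substantive: Check, wired: Check
) -> list[dict]:
    """Run Levels 1-3 in order for each artifact and collect gaps.

    Later levels are skipped once an earlier one fails: a missing file
    cannot be a stub, and a stub is not worth tracing.
    """
    levels = [("MISSING", exists), ("STUB", substantive), ("MISSING_WIRING", wired)]
    gaps = []
    for path in paths:
        for gap_type, check in levels:
            if not check(path):
                gaps.append({"type": gap_type, "location": path})
                break
    return gaps
```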
### Anti-Pattern Scanning

Check for common incomplete work:

| Pattern | Detection | Meaning |
|---------|-----------|---------|
| `// TODO` | Grep for TODO comments | Work deferred |
| `throw new Error('Not implemented')` | Grep for stub errors | Placeholder code |
| `return null` / `return {}` | AST analysis | Empty implementations |
| `console.log` in handlers | Grep for console.log | Debug code left behind |
| Empty catch blocks | AST analysis | Swallowed errors |
| Hardcoded values | Manual review | Missing configuration |
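The grep-detectable rows of the table can be approximated with a line-by-line regex scan. This sketch covers only those three rows; the AST and manual-review rows need other tooling. The `scan` function and pattern labels are hypothetical, and the `console.log` pattern flags every call, not only those inside handlers.

```python
import re

# Grep-style patterns only; labels are illustrative.
PATTERNS = {
    "TODO comment": re.compile(r"//\s*TODO\b"),
    "stub error": re.compile(r"throw\s+new\s+Error\(\s*['\"]Not implemented['\"]\s*\)"),
    "console.log": re.compile(r"\bconsole\.log\("),
}


def scan(source: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_label) for each match, in line order."""
    findings = []
    for number, line in enumerate(source.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((number, label))
    return findings
```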

---
## Verification Output

### Pass Case

```yaml
# 2-VERIFICATION.md
phase: 2
status: PASS
verified_at: 2024-01-15T10:30:00Z

observable_truths:
  - truth: "User can send message"
    status: VERIFIED
    evidence: "Chat.tsx:45 calls sendMessage on submit"
  - truth: "Messages persist"
    status: VERIFIED
    evidence: "messages.ts:23 inserts to SQLite"

required_artifacts:
  - path: src/components/Chat.tsx
    status: EXISTS
    check: PASSED
  - path: src/api/messages.ts
    status: EXISTS
    check: PASSED

anti_patterns_found: []

human_verification_needed:
  - "Visual layout matches design"
  - "Real-time updates work under load"
```
### Fail Case (Gaps Found)

```yaml
# 2-VERIFICATION.md
phase: 2
status: GAPS_FOUND
verified_at: 2024-01-15T10:30:00Z

gaps:
  - type: STUB
    location: src/hooks/useChat.ts:34
    description: "sendMessage returns immediately without API call"
    severity: BLOCKING

  - type: MISSING_WIRING
    location: src/components/Chat.tsx
    description: "WebSocket not connected, no real-time updates"
    severity: BLOCKING

  - type: ANTI_PATTERN
    location: src/api/messages.ts:67
    description: "Empty catch block swallows errors"
    severity: HIGH

remediation_plan:
  - "Connect useChat to actual API endpoint"
  - "Initialize WebSocket in Chat component"
  - "Add error handling to API calls"
```
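A small sketch of how the report status and the order of remediation might be derived from the `gaps` list. It assumes that any recorded gap yields `GAPS_FOUND` and that `BLOCKING` outranks `HIGH`; both rules, and the function names, are assumptions inferred from the two examples rather than stated requirements.

```python
# Assumed ordering: lower rank is fixed first; unknown severities sort last.
SEVERITY_RANK = {"BLOCKING": 0, "HIGH": 1}


def phase_status(gaps: list[dict]) -> str:
    """Any recorded gap keeps the phase from passing."""
    return "GAPS_FOUND" if gaps else "PASS"


def prioritize(gaps: list[dict]) -> list[dict]:
    """Sort gaps by severity, keeping report order within a severity."""
    return sorted(gaps, key=lambda gap: SEVERITY_RANK.get(gap.get("severity"), 99))
```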
---
## User Acceptance Testing (UAT)

Verification confirms code correctness. UAT confirms user experience.

### UAT Process

1. Extract testable deliverables from phase goal
2. Walk user through each one:
   - "Can you log in with your email?"
   - "Does the dashboard show your projects?"
   - "Can you create a new project?"
3. Record the result: PASS, FAIL (describe the issue), or BLOCKED (state the reason)
4. If issues found:
   - Diagnose root cause
   - Create targeted fix plan
5. If all pass: Phase complete
### UAT Output

```yaml
# 2-UAT.md
phase: 2
tested_by: user
tested_at: 2024-01-15T14:00:00Z

test_cases:
  - case: "Login with email"
    result: PASS

  - case: "Dashboard shows projects"
    result: FAIL
    issue: "Shows loading spinner forever"
    diagnosis: "API returns 500, missing auth header"

  - case: "Create new project"
    result: BLOCKED
    reason: "Cannot test, dashboard not loading"

fix_required: true
fix_plan:
  - task: "Add auth header to dashboard API call"
    files: [src/api/projects.ts]
    priority: P0
```
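The `fix_required` flag can be derived from the recorded results instead of being set by hand. The sketch below assumes that a `BLOCKED` case also prevents sign-off, since it was never exercised; `summarize_uat` is a hypothetical helper.

```python
def summarize_uat(test_cases: list[dict]) -> dict:
    """Count results and derive `fix_required` from UAT test cases."""
    counts = {"PASS": 0, "FAIL": 0, "BLOCKED": 0}
    for case in test_cases:
        counts[case["result"]] += 1  # KeyError on an unrecognized result
    return {
        "counts": counts,
        # Assumption: BLOCKED cases hold up the phase just like FAIL cases.
        "fix_required": counts["FAIL"] > 0 or counts["BLOCKED"] > 0,
    }
```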
---
## Integration with Task Workflow

### Task Completion Hook
When a task closes:
1. Worker marks task closed with reason
2. If all phase tasks closed, trigger phase verification
3. Verifier agent runs goal-backward check
4. If PASS: Phase marked complete
5. If GAPS_FOUND: Create remediation tasks, phase stays in_progress
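The hook logic above reduces to a small decision function. In this sketch `on_task_closed` is a hypothetical name, `run_verification` stands in for the verifier agent and returns gap descriptions, and the `"Fix: ..."` task titles are purely illustrative.

```python
from typing import Callable


def on_task_closed(
    task_statuses: list[str],
    run_verification: Callable[[], list[str]],
) -> tuple[str, list[str]]:
    """Return (phase_state, remediation_tasks) after a task closes."""
    if any(status != "closed" for status in task_statuses):
        return "in_progress", []  # step 2: some tasks are still open
    gaps = run_verification()  # step 3: goal-backward check
    if not gaps:
        return "complete", []  # step 4: PASS
    # step 5: gaps found, so the phase stays open with new tasks
    return "in_progress", [f"Fix: {gap}" for gap in gaps]
```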
### Verification Task Type
Verification itself is a task:

```yaml
type: verification
phase_id: phase-2
status: open
assigned_to: verifier-agent
priority: P0  # Always high priority
```
---
## Checkpoint Types

During execution, agents may need human input. Use precise checkpoint types:

### checkpoint:human-verify (90% of checkpoints)
Agent completed work, user confirms it works.

```yaml
checkpoint: human-verify
prompt: "Can you log in with email and password?"
expected: "User confirms successful login"
```

### checkpoint:decision (9% of checkpoints)
User must make implementation choice.

```yaml
checkpoint: decision
prompt: "OAuth2 or SAML for SSO?"
options:
  - OAuth2: "Simpler, most common"
  - SAML: "Enterprise requirement"
```

### checkpoint:human-action (1% of checkpoints)
Truly unavoidable manual step.

```yaml
checkpoint: human-action
prompt: "Click the email verification link"
reason: "Cannot automate email client interaction"
```
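To keep checkpoint types precise, a loader could reject unknown types and missing fields. The sketch below assumes each checkpoint has been parsed into a Python mapping; the enum, the `REQUIRED_FIELD` table, and `parse_checkpoint` are hypothetical, with the per-type fields taken from the three YAML examples.

```python
from enum import Enum


class CheckpointType(Enum):
    HUMAN_VERIFY = "human-verify"
    DECISION = "decision"
    HUMAN_ACTION = "human-action"


# Field each type carries besides `prompt`, per the YAML examples.
REQUIRED_FIELD = {
    CheckpointType.HUMAN_VERIFY: "expected",
    CheckpointType.DECISION: "options",
    CheckpointType.HUMAN_ACTION: "reason",
}


def parse_checkpoint(raw: dict) -> CheckpointType:
    """Validate a checkpoint mapping and return its type.

    Raises ValueError for an unknown type or a missing field.
    """
    kind = CheckpointType(raw["checkpoint"])  # ValueError on unknown value
    for key in ("prompt", REQUIRED_FIELD[kind]):
        if key not in raw:
            raise ValueError(f"{kind.value} checkpoint is missing '{key}'")
    return kind
```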
---
## Human Verification Needs

Some verifications require human eyes:

| Category | Examples | Why Human |
|----------|----------|-----------|
| Visual | Layout, spacing, colors | Subjective/design judgment |
| Real-time | WebSocket, live updates | Requires interaction |
| External | OAuth flow, payment | Third-party systems |
| Accessibility | Screen reader, keyboard nav | Requires tooling/expertise |

**Mark these explicitly** in verification output. Don't claim PASS when human verification is pending.