Lukas May 2877484012 Add userDismissedAt field to agents schema

2026-02-07 00:33:12 +01:00

7.5 KiB

Goal-Backward Verification

Verification confirms that goals are achieved, not merely that tasks were completed. A completed task "create chat component" does not guarantee the goal "working chat interface" is met.

Core Principle

Task completion ≠ Goal achievement

Tasks are implementation steps. Goals are user outcomes. Verification bridges the gap by checking observable outcomes, not just checklist items.

Verification Levels

Level 1: Existence Check

Does the artifact exist?

✓ File exists at expected path
✓ Component is exported
✓ Route is registered

Level 2: Substance Check

Is the artifact substantive (not a stub)?

✓ Function has implementation (not just return null)
✓ Component renders content (not empty div)
✓ API returns meaningful response (not placeholder)

Level 3: Wiring Check

Is the artifact connected to the system?

✓ Component is rendered somewhere
✓ API endpoint is called by client
✓ Event handler is attached
✓ Database query is executed

All three levels must pass for verification success.
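The three levels can be sketched as successive checks against a source tree. The following is a minimal Python sketch, assuming the project is available as an in-memory mapping from path to file contents. The function names (`check_existence`, `check_substance`, `check_wiring`, `verify_artifact`) and the regex-based stub heuristics are illustrative assumptions, not part of any existing tool.

```python
import re
from pathlib import PurePosixPath

# Hypothetical stub heuristics; a real verifier would likely use AST analysis.
STUB_PATTERNS = [
    re.compile(r"throw new Error\(['\"]Not implemented['\"]\)"),
    re.compile(r"^\s*return null;?\s*$", re.MULTILINE),
]


def check_existence(files: dict[str, str], path: str) -> bool:
    """Level 1: the artifact exists at the expected path."""
    return path in files


def check_substance(files: dict[str, str], path: str) -> bool:
    """Level 2: the artifact is non-empty and matches no stub pattern."""
    source = files.get(path, "")
    if not source.strip():
        return False
    return not any(p.search(source) for p in STUB_PATTERNS)


def check_wiring(files: dict[str, str], path: str) -> bool:
    """Level 3: some other file imports the artifact by module name."""
    module = PurePosixPath(path).stem  # e.g. "Chat" for src/components/Chat.tsx
    import_re = re.compile(r"from\s+['\"][^'\"]*/" + re.escape(module) + r"['\"]")
    return any(import_re.search(src) for other, src in files.items() if other != path)


def verify_artifact(files: dict[str, str], path: str) -> dict[str, bool]:
    """Run all three levels; `passed` is true only if every level passes."""
    levels = {
        "exists": check_existence(files, path),
        "substantive": check_substance(files, path),
        "wired": check_wiring(files, path),
    }
    levels["passed"] = all(levels.values())
    return levels
```

A file that exists but only returns null, or that nothing imports, fails here even though a task such as "create useChat hook" could be marked complete.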

Must-Have Derivation

Before verification, derive what "done" means from the goal:

1. Observable Truths (3-7 user perspectives)

What can a user observe when the goal is achieved?

observable_truths:
  - "User can click 'Send' and message appears in chat"
  - "Messages persist after page refresh"
  - "New messages appear without page reload"
  - "User sees typing indicator when other party types"

2. Required Artifacts

What files MUST exist?

required_artifacts:
  - path: src/components/Chat.tsx
    check: "Exports Chat component"
  - path: src/api/messages.ts
    check: "Exports sendMessage, getMessages"
  - path: src/hooks/useChat.ts
    check: "Exports useChat hook"

3. Required Wiring

What connections MUST work?

required_wiring:
  - from: Chat.tsx
    to: useChat.ts
    check: "Component calls hook"
  - from: useChat.ts
    to: messages.ts
    check: "Hook calls API"
  - from: messages.ts
    to: database
    check: "API persists to DB"

4. Key Links (Where Stubs Hide)

What integration points commonly fail?

key_links:
  - "Form onSubmit → API call (not just console.log)"
  - "WebSocket connection → message handler"
  - "API response → state update → render"
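The four must-have groups above can also be held in a typed structure so a verifier can sanity-check them before use. This is a rough Python sketch under stated assumptions: the class names, the `source`/`target` field names, and the rule that every wiring source must be a listed artifact are all hypothetical. Only the 3-7 range for observable truths comes from the text.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class Artifact:
    path: str
    check: str


@dataclass
class Wiring:
    source: str  # the `from` key in the YAML (`from` is a Python keyword)
    target: str  # the `to` key in the YAML
    check: str


@dataclass
class MustHaves:
    observable_truths: list[str]
    required_artifacts: list[Artifact]
    required_wiring: list[Wiring] = field(default_factory=list)
    key_links: list[str] = field(default_factory=list)


def validate_must_haves(must_haves: MustHaves) -> list[str]:
    """Return a list of problems; an empty list means the must-haves are usable."""
    problems = []
    count = len(must_haves.observable_truths)
    if not 3 <= count <= 7:
        problems.append(f"expected 3-7 observable truths, got {count}")
    if not must_haves.required_artifacts:
        problems.append("no required artifacts listed")
    known = {PurePosixPath(a.path).name for a in must_haves.required_artifacts}
    for wire in must_haves.required_wiring:
        # Assumption: every wiring source should be a listed artifact.
        if wire.source not in known:
            problems.append(f"wiring source {wire.source} is not a required artifact")
    return problems
```

Running the validation before verification catches must-haves that are too thin to verify against, such as a goal with a single observable truth.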

Verification Process

Phase Verification

After all tasks in a phase complete:

1. Load must-haves (from phase goal or PLAN frontmatter)
2. For each observable truth:
   a. Level 1: Does the relevant code exist?
   b. Level 2: Is it substantive?
   c. Level 3: Is it wired?
3. For each required artifact:
   a. Verify file exists
   b. Verify not a stub
   c. Verify it's imported/used
4. For each key link:
   a. Trace the connection
   b. Verify data flows
5. Scan for anti-patterns (see below)
6. Structure gaps for re-planning
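Step 3 and step 6 of this process can be sketched as a loop that applies the three levels in order and records the first level each artifact fails. This is a minimal Python sketch, not a definitive implementation: the function name `find_artifact_gaps` is hypothetical, the three checks are injected as callables, and the `MISSING` gap type is an assumption (the document's own examples only show `STUB`, `MISSING_WIRING`, and `ANTI_PATTERN`).

```python
from typing import Callable

Check = Callable[[str], bool]


def find_artifact_gaps(
    paths: list[str], exists: Check, substantive: Check, wired: Check
) -> list[dict]:
    """Apply Levels 1-3 in order; report the first level each artifact fails."""
    gaps = []
    for path in paths:
        if not exists(path):
            gap_type, why = "MISSING", "file does not exist"  # hypothetical type
        elif not substantive(path):
            gap_type, why = "STUB", "file exists but is a stub"
        elif not wired(path):
            gap_type, why = "MISSING_WIRING", "file is never imported or used"
        else:
            continue  # all three levels passed
        gaps.append({"type": gap_type, "location": path, "description": why})
    return gaps
```

Stopping at the first failed level keeps the gap list short: there is little value in reporting that a missing file is also unwired.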

Anti-Pattern Scanning

Check for common incomplete work:

Pattern	Detection	Meaning
`// TODO`	Grep for TODO comments	Work deferred
`throw new Error('Not implemented')`	Grep for stub errors	Placeholder code
`return null` / `return {}`	AST analysis	Empty implementations
`console.log` in handlers	Grep for console.log	Debug code left behind
Empty catch blocks	AST analysis	Swallowed errors
Hardcoded values	Manual review	Missing configuration
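The grep-detectable rows of this table can be approximated with a few regular expressions. The sketch below is illustrative only: the pattern names, the `scan_source` function, and the exact regexes are assumptions, and it flags every `console.log` call rather than only those inside handlers. The rows marked "AST analysis" or "Manual review" are deliberately left out.

```python
import re

# Only the rows the table marks as grep-detectable; names are hypothetical.
GREP_PATTERNS = {
    "TODO comment": re.compile(r"//\s*TODO"),
    "Stub error": re.compile(r"throw new Error\(\s*['\"]Not implemented['\"]\s*\)"),
    "console.log": re.compile(r"\bconsole\.log\("),
}


def scan_source(path: str, source: str) -> list[dict]:
    """Return one finding per (line, pattern) match, with a path:line location."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in GREP_PATTERNS.items():
            if pattern.search(line):
                findings.append({"pattern": name, "location": f"{path}:{lineno}"})
    return findings
```

Each finding carries a `path:line` location so it can be reported as evidence in the same style as the verification output.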

Verification Output

Pass Case

# 2-VERIFICATION.md
phase: 2
status: PASS
verified_at: 2024-01-15T10:30:00Z

observable_truths:
  - truth: "User can send message"
    status: VERIFIED
    evidence: "Chat.tsx:45 calls sendMessage on submit"
  - truth: "Messages persist"
    status: VERIFIED
    evidence: "messages.ts:23 inserts to SQLite"

required_artifacts:
  - path: src/components/Chat.tsx
    status: EXISTS
    check: PASSED
  - path: src/api/messages.ts
    status: EXISTS
    check: PASSED

anti_patterns_found: []

human_verification_needed:
  - "Visual layout matches design"
  - "Real-time updates work under load"

Fail Case (Gaps Found)

# 2-VERIFICATION.md
phase: 2
status: GAPS_FOUND
verified_at: 2024-01-15T10:30:00Z

gaps:
  - type: STUB
    location: src/hooks/useChat.ts:34
    description: "sendMessage returns immediately without API call"
    severity: BLOCKING

  - type: MISSING_WIRING
    location: src/components/Chat.tsx
    description: "WebSocket not connected, no real-time updates"
    severity: BLOCKING

  - type: ANTI_PATTERN
    location: src/api/messages.ts:67
    description: "Empty catch block swallows errors"
    severity: HIGH

remediation_plan:
  - "Connect useChat to actual API endpoint"
  - "Initialize WebSocket in Chat component"
  - "Add error handling to API calls"
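The relationship between the gap list and the report status can be sketched as follows. This is a small Python sketch under stated assumptions: the function name `summarize_verification` is hypothetical, and ordering `BLOCKING` ahead of `HIGH` is an assumed convention based on the two severities shown in the example.

```python
def summarize_verification(phase: int, gaps: list[dict]) -> dict:
    """Derive the report status from the gaps: any gap means GAPS_FOUND."""
    report = {"phase": phase, "status": "GAPS_FOUND" if gaps else "PASS"}
    if gaps:
        # Assumed ordering: BLOCKING first, then HIGH, unknown severities last.
        order = {"BLOCKING": 0, "HIGH": 1}
        report["gaps"] = sorted(
            gaps, key=lambda gap: order.get(gap["severity"], len(order))
        )
    return report
```

Sorting by severity puts the gaps that block the goal at the top of the list handed to re-planning.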

User Acceptance Testing (UAT)

Verification confirms code correctness. UAT confirms user experience.

UAT Process

1. Extract testable deliverables from the phase goal
2. Walk the user through each one:
   - "Can you log in with your email?"
   - "Does the dashboard show your projects?"
   - "Can you create a new project?"
3. Record the result: PASS, FAIL (with a description of the issue), or BLOCKED (cannot be tested yet)
4. If issues are found:
   - Diagnose the root cause
   - Create a targeted fix plan
5. If all pass: phase complete

UAT Output

# 2-UAT.md
phase: 2
tested_by: user
tested_at: 2024-01-15T14:00:00Z

test_cases:
  - case: "Login with email"
    result: PASS

  - case: "Dashboard shows projects"
    result: FAIL
    issue: "Shows loading spinner forever"
    diagnosis: "API returns 500, missing auth header"

  - case: "Create new project"
    result: BLOCKED
    reason: "Cannot test, dashboard not loading"

fix_required: true
fix_plan:
  - task: "Add auth header to dashboard API call"
    files: [src/api/projects.ts]
    priority: P0
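The summary fields of a UAT record can be derived from its test cases. The sketch below is a hedged illustration in Python: the function name `summarize_uat` and the `failed`, `retest_after_fix`, and `phase_complete` keys are assumptions. Only `fix_required` and the PASS, FAIL, and BLOCKED results appear in the example above.

```python
def summarize_uat(test_cases: list[dict]) -> dict:
    """Summarize UAT results: any FAIL requires a fix; BLOCKED cases need a retest."""
    failed = [c["case"] for c in test_cases if c["result"] == "FAIL"]
    blocked = [c["case"] for c in test_cases if c["result"] == "BLOCKED"]
    return {
        "fix_required": bool(failed),
        "failed": failed,
        "retest_after_fix": blocked,
        # An empty run proves nothing, so it does not complete the phase.
        "phase_complete": bool(test_cases)
        and all(c["result"] == "PASS" for c in test_cases),
    }
```

Keeping blocked cases separate from failures makes it explicit that they still have to be tested once the blocking issue is fixed.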

Integration with Task Workflow

Task Completion Hook

When a task closes:

1. Worker marks the task closed with a reason
2. If all phase tasks are closed, trigger phase verification
3. Verifier agent runs the goal-backward check
4. If PASS: phase is marked complete
5. If GAPS_FOUND: create remediation tasks; the phase stays in_progress
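The hook logic can be sketched as a single function. This is a minimal Python sketch, assuming tasks are tracked as a mapping from task ID to status and that the verifier is passed in as a callable returning a list of gaps. The function name `on_task_closed` and the `remediation-N` IDs are hypothetical.

```python
from typing import Callable


def on_task_closed(
    tasks: dict[str, str], task_id: str, run_verification: Callable[[], list[dict]]
) -> str:
    """Close a task and return the resulting phase status."""
    tasks[task_id] = "closed"
    if any(status != "closed" for status in tasks.values()):
        return "in_progress"  # other tasks still open; do not verify yet

    gaps = run_verification()  # goal-backward check by the verifier
    if not gaps:
        return "complete"

    # One remediation task per gap; IDs are illustrative only.
    for index, _gap in enumerate(gaps, start=1):
        tasks[f"remediation-{index}"] = "open"
    return "in_progress"
```

Because remediation tasks are added to the same phase, closing the last of them triggers verification again, so the phase only completes once a verification run finds no gaps.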

Verification Task Type

Verification itself is a task:

type: verification
phase_id: phase-2
status: open
assigned_to: verifier-agent
priority: P0  # Always high priority

Checkpoint Types

During execution, agents may need human input. Use precise checkpoint types:

checkpoint:human-verify (90% of checkpoints)

The agent has completed the work; the user confirms it works.

checkpoint: human-verify
prompt: "Can you log in with email and password?"
expected: "User confirms successful login"

checkpoint:decision (9% of checkpoints)

The user must make an implementation choice.

checkpoint: decision
prompt: "OAuth2 or SAML for SSO?"
options:
  - OAuth2: "Simpler, most common"
  - SAML: "Enterprise requirement"

checkpoint:human-action (1% of checkpoints)

Truly unavoidable manual step.

checkpoint: human-action
prompt: "Click the email verification link"
reason: "Cannot automate email client interaction"

Human Verification Needs

Some verifications require human eyes:

Category	Examples	Why Human
Visual	Layout, spacing, colors	Subjective/design judgment
Real-time	WebSocket, live updates	Requires interaction
External	OAuth flow, payment	Third-party systems
Accessibility	Screen reader, keyboard nav	Requires tooling/expertise

Mark these explicitly in verification output. Don't claim PASS when human verification is pending.
