Lukas May 0f1c578269 fix: Fail fast when agent worktree creation or branch setup fails
Previously, branch computation errors and ensureBranch failures were
silently swallowed for all tasks, allowing execution agents to spawn
without proper git isolation. This caused alert-pony to commit directly
to main instead of its task branch.

- manager.ts: Verify each project worktree subdirectory exists after
  createProjectWorktrees; throw if any are missing. Convert passive
  cwdVerified log to a hard guard.
- dispatch/manager.ts: Make branch computation and ensureBranch errors
  fatal for execution tasks (execute, verify, merge, review) while
  keeping them non-fatal for planning tasks.
2026-03-06 14:08:59 +01:00


# Agent Module
`apps/server/agent/` — Agent lifecycle management, output parsing, multi-provider support, and account failover.
## File Inventory
| File | Purpose |
|------|---------|
| `types.ts` | Core types: `AgentInfo`, `AgentManager` interface, `SpawnOptions`, `StreamEvent` |
| `manager.ts` | `MultiProviderAgentManager` — main orchestrator class |
| `process-manager.ts` | `AgentProcessManager` — worktree creation, command building, detached spawn |
| `output-handler.ts` | `OutputHandler` — JSONL stream parsing, completion detection, proposal creation, task dedup, task dependency persistence |
| `file-tailer.ts` | `FileTailer` — watches output files, fires parser + raw content callbacks |
| `file-io.ts` | Input/output file I/O: frontmatter writing, signal.json reading, tiptap conversion. Output files support `action` field (create/update/delete) for chat mode CRUD. Includes `writeErrandManifest()` for errand agent input files. |
| `markdown-to-tiptap.ts` | Markdown to Tiptap JSON conversion using MarkdownManager |
| `index.ts` | Public exports, `ClaudeAgentManager` deprecated alias |
### Sub-modules
| Directory | Purpose |
|-----------|---------|
| `providers/` | Provider registry, presets (7 providers), config types |
| `providers/parsers/` | Provider-specific output parsers (Claude JSONL, generic line) |
| `accounts/` | Account discovery, config dir setup, credential management, usage API |
| `credentials/` | `AccountCredentialManager` — credential injection per account |
| `lifecycle/` | `LifecycleController` — retry policy, signal recovery, missing signal instructions |
| `prompts/` | Mode-specific prompt builders (execute, discuss, plan, detail, refine, chat, conflict-resolution, errand) + shared blocks (test integrity, deviation rules, git workflow, session startup, progress tracking) + inter-agent communication instructions. Conflict-resolution uses a minimal inline startup (pwd, git status, CLAUDE.md) instead of the full `SESSION_STARTUP`/`CONTEXT_MANAGEMENT` blocks. |
## Key Flows
### Spawning an Agent
1. **tRPC procedure** calls `agentManager.spawn(options)`
2. Manager generates alias (adjective-animal), creates DB record. Appends inter-agent communication and preview instructions unless `skipPromptExtras: true` (used by conflict-resolution agents to keep prompts lean).
3. `AgentProcessManager.createProjectWorktrees()` — creates git worktrees at `agent-workdirs/<alias>/<project>/`. After creation, each project subdirectory is verified to exist; missing worktrees throw immediately to prevent agents running in the wrong directory.
4. `file-io.writeInputFiles()` — writes `.cw/input/` with assignment files (initiative, pages, phase, task) and read-only context dirs (`context/phases/`, `context/tasks/`)
5. Provider config builds spawn command via `buildSpawnCommand()`
6. `spawnDetached()` — launches detached child process with file output redirection
7. `FileTailer` watches output file, fires `onEvent` (parsed stream events) and `onRawContent` (raw JSONL chunks) callbacks
8. `onRawContent` → DB insert via `createLogChunkCallback()` → `agent:output` event emitted (single emission point)
9. `OutputHandler.handleStreamEvent()` processes parsed events (session tracking, result capture — no event emission)
10. DB record updated with PID, output file path, session ID
11. `agent:spawned` event emitted
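The fail-fast check in step 3 is a post-condition on `createProjectWorktrees()`. Below is a minimal sketch. The names `verifyProjectWorktrees` and `WorktreeMissingError` are hypothetical. The `exists` predicate is injected, and production code would back it with something like `fs.existsSync`. Paths are joined with `/` for brevity.

```typescript
// Hypothetical error type that names every project whose worktree is missing.
class WorktreeMissingError extends Error {
  readonly missing: string[];
  constructor(missing: string[]) {
    super(`Missing project worktrees: ${missing.join(", ")}`);
    this.name = "WorktreeMissingError";
    this.missing = missing;
  }
}

// After worktree creation, every agent-workdirs/<alias>/<project>/ directory must exist.
// Throwing stops the spawn before an agent can run in the wrong directory.
function verifyProjectWorktrees(
  agentWorkdir: string,
  projects: string[],
  exists: (path: string) => boolean,
): string[] {
  const paths = projects.map((project) => `${agentWorkdir}/${project}`);
  const missing = projects.filter((_, i) => !exists(paths[i]));
  if (missing.length > 0) {
    throw new WorktreeMissingError(missing);
  }
  return paths; // verified per-project working directories
}
```

The function returns the verified paths. The caller can then use them as the agent's working directories instead of recomputing them.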
### Completion Detection
1. Polling detects process exit, `FileTailer.stop()` flushes remaining output
2. `OutputHandler.handleCompletion()` triggered
3. **Path resolution**: Uses `ActiveAgent.agentCwd` (recorded at spawn) to locate signal.json. Standalone agents run in a `workspace/` subdirectory under `agent-workdirs/<alias>/`, so the base `getAgentWorkdir()` path won't contain `.cw/output/signal.json`. Reconciliation and crash detection paths also probe for the `workspace/` subdirectory when `.cw/output` is missing at the base level.
4. **Primary path**: Reads `.cw/output/signal.json` from agent worktree
5. Signal contains `{ status: "done"|"questions"|"error", result?, questions?, error? }`
6. Agent DB status updated accordingly (idle, waiting_for_input, crashed)
7. For `done`: proposals created from structured output; `agent:stopped` emitted
8. For `questions`: parsed and stored as `pendingQuestions`; `agent:waiting` emitted
9. **Fallback**: If signal.json missing, lifecycle controller retries with instruction injection
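Steps 5 through 8 can be sketched as a validate-then-map function. This is an illustration, not the code in `output-handler.ts`, and `parseSignal` and `applySignal` are hypothetical names. This document does not say which event accompanies `error`, so the sketch emits none for it. A caller might treat a `null` from `parseSignal` the same way as a missing file and hand off to the lifecycle controller.

```typescript
// Documented shape of .cw/output/signal.json.
interface Signal {
  status: "done" | "questions" | "error";
  result?: unknown;
  questions?: unknown;
  error?: string;
}

type AgentStatus = "idle" | "waiting_for_input" | "crashed";

// Returns null for unparseable JSON or an unknown status value.
function parseSignal(raw: string): Signal | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const status = (value as { status?: unknown }).status;
  if (status !== "done" && status !== "questions" && status !== "error") return null;
  return value as Signal;
}

// Maps a signal to the DB status and event from steps 6-8.
// The event for "error" is not documented here, so none is emitted.
function applySignal(signal: Signal): { status: AgentStatus; event?: string } {
  switch (signal.status) {
    case "done":
      return { status: "idle", event: "agent:stopped" };
    case "questions":
      return { status: "waiting_for_input", event: "agent:waiting" };
    case "error":
      return { status: "crashed" };
  }
}
```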
### Account Failover
1. On usage-limit error, `markAccountExhausted(id, until)` called
2. `findNextAvailable(provider)` returns least-recently-used non-exhausted account
3. Agent re-spawned with new account's credentials
4. `agent:account_switched` event emitted
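The selection in steps 1 and 2 reduces to a filter followed by a least-recently-used sort. This minimal sketch assumes an in-memory `Account` shape with hypothetical `lastUsedAt` and `exhaustedUntil` fields, both in epoch milliseconds. The real methods take `(id, until)` and `(provider)` and read accounts from storage. Here the list is passed in instead.

```typescript
interface Account {
  id: string;
  provider: string;
  lastUsedAt: number; // assumed field, epoch ms
  exhaustedUntil: number | null; // assumed field, set on usage-limit errors
}

// Record that an account is unusable until the given time.
function markAccountExhausted(accounts: Account[], id: string, until: number): void {
  const account = accounts.find((a) => a.id === id);
  if (account) account.exhaustedUntil = until;
}

// Least-recently-used account for the provider whose exhaustion window has passed.
function findNextAvailable(accounts: Account[], provider: string, now: number = Date.now()): Account | undefined {
  return accounts
    .filter((a) => a.provider === provider)
    .filter((a) => a.exhaustedUntil === null || a.exhaustedUntil <= now)
    .sort((a, b) => a.lastUsedAt - b.lastUsedAt)[0]; // filter() copied, so sort() is safe
}
```

An exhausted account becomes eligible again automatically once `now` passes `exhaustedUntil`. No separate reset step is needed in this model.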
### Resume Flow
1. tRPC `resumeAgent` called with `answers: Record<string, string>`
2. Manager looks up agent's session ID and provider config
3. `buildResumeCommand()` creates resume command with session flag
4. `formatAnswersAsPrompt(answers)` converts answers to prompt text
5. New detached process spawned, same worktree, incremented session number
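Steps 2 through 5 can be sketched as building a resume plan from the stored session. The prompt wording from `formatAnswersAsPrompt` here is hypothetical, as are `planResume` and the `ResumableAgent` shape. Only the `resumeStyle !== 'none'` check and the session-number increment come from this document.

```typescript
// Hypothetical wording; the real formatAnswersAsPrompt() output may differ.
function formatAnswersAsPrompt(answers: Record<string, string>): string {
  const lines = Object.keys(answers).map((questionId) => `- ${questionId}: ${answers[questionId]}`);
  return ["Here are the answers to your questions:"].concat(lines).join("\n");
}

interface ResumableAgent {
  sessionId: string | null;
  sessionNumber: number;
}

interface ResumePlan {
  sessionId: string;
  prompt: string;
  sessionNumber: number;
}

// Refuse early when the provider or the agent cannot resume, then bump the session number.
function planResume(agent: ResumableAgent, resumeStyle: string, answers: Record<string, string>): ResumePlan {
  if (resumeStyle === "none") throw new Error("provider does not support resume");
  if (!agent.sessionId) throw new Error("agent has no session to resume");
  return {
    sessionId: agent.sessionId,
    prompt: formatAnswersAsPrompt(answers),
    sessionNumber: agent.sessionNumber + 1,
  };
}
```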
## Provider Configuration
Providers are defined in `providers/presets.ts`:
| Provider | Command | Resume | Prompt Mode |
|----------|---------|--------|-------------|
| claude | `claude` | `--resume <id>` | native (`-p`) |
| claude-code | `claude` | `--resume <id>` | native |
| codex | `codex` | none | flag (`--prompt`) |
| aider | `aider` | none | flag (`--message`) |
| cline | `cline` | none | flag |
| continue | `continue` | none | flag |
| cursor-agent | `cursor` | none | flag |
Each provider config specifies: `command`, `args`, `resumeStyle`, `promptMode`, `structuredOutput`, `sessionId` extraction, `nonInteractive` options.
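The sketch below shows how a preset might turn into an argv. It folds `buildSpawnCommand()` and `buildResumeCommand()` into one hypothetical `buildCommand`. The `resumeFlag` and `promptFlag` fields are assumptions standing in for `resumeStyle` and `promptMode`. `args` is left empty because the real preset arguments are not listed here. Only the flags come from the table above.

```typescript
// Simplified, hypothetical config; the real one also covers structuredOutput,
// sessionId extraction, and nonInteractive options.
interface ProviderConfig {
  command: string;
  args: string[];
  resumeFlag: string | null; // null when the provider's resume column is "none"
  promptFlag: string;
}

const claude: ProviderConfig = { command: "claude", args: [], resumeFlag: "--resume", promptFlag: "-p" };
const aider: ProviderConfig = { command: "aider", args: [], resumeFlag: null, promptFlag: "--message" };

// Builds an argv for a fresh spawn, or for a resume when a session ID is given.
function buildCommand(cfg: ProviderConfig, prompt: string, resumeSessionId?: string): { command: string; args: string[] } {
  const args = cfg.args.slice();
  if (resumeSessionId !== undefined) {
    if (cfg.resumeFlag === null) throw new Error(`${cfg.command} does not support resume`);
    args.push(cfg.resumeFlag, resumeSessionId);
  }
  args.push(cfg.promptFlag, prompt);
  return { command: cfg.command, args };
}
```

Keeping argv as an array rather than a joined string avoids shell quoting problems when the prompt contains spaces or quotes.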
## Output Parsing
The `OutputHandler` processes JSONL streams from Claude CLI:
- `init` event → session ID extracted and persisted
- `text_delta` events → no-op in handler (output streaming handled by DB log chunks)
- `result` event → final result with structured data captured on `ActiveAgent`
- Signal file (`signal.json`) → authoritative completion status
**Output event flow**: `FileTailer.onRawContent()` → DB `insertChunk()` → `EventBus.emit('agent:output')`. This is the single emission point; neither `handleStreamEvent()` nor `processLine()` emits events.
For providers without structured output, the generic line parser accumulates raw text.
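`FileTailer` delivers file content in arbitrary chunks, so a JSONL consumer must hold back a trailing partial line until its newline arrives. The sketch below shows that buffering plus the event handling listed above. The event shapes are deliberately simplified and are not the Claude CLI's exact stream schema. `JsonlLineBuffer` and `applyStreamEvent` are hypothetical names.

```typescript
// Splits tailed content into complete JSONL lines, keeping a trailing partial line.
class JsonlLineBuffer {
  private pending = "";

  push(chunk: string): string[] {
    const lines = (this.pending + chunk).split("\n");
    this.pending = lines.pop() ?? ""; // last element is "" or an unfinished line
    return lines.filter((line) => line.trim().length > 0);
  }

  // Drain whatever is left, e.g. when the tailer stops.
  flush(): string[] {
    const rest = this.pending;
    this.pending = "";
    return rest.trim().length > 0 ? [rest] : [];
  }
}

// Simplified event shapes for illustration only.
type StreamEvent =
  | { type: "init"; session_id: string }
  | { type: "text_delta"; text: string }
  | { type: "result"; result: unknown };

interface ActiveAgentState {
  sessionId?: string;
  result?: unknown;
}

function applyStreamEvent(state: ActiveAgentState, event: StreamEvent): void {
  switch (event.type) {
    case "init":
      state.sessionId = event.session_id; // persisted for later resumes
      break;
    case "text_delta":
      break; // no-op: live output comes from DB log chunks
    case "result":
      state.result = event.result;
      break;
  }
}
```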
## Credential Management
`AccountCredentialManager` in `credentials/` handles OAuth token lifecycle:
- `read()` — extracts `claudeAiOauth` from `.credentials.json`. Only `accessToken` is required; `refreshToken` and `expiresAt` may be null (setup tokens).
- `isExpired()` — returns false when `expiresAt` is null (setup tokens never "expire" from our perspective).
- `ensureValid()` — if expired and `refreshToken` exists, refreshes. If expired with no `refreshToken`, returns invalid with error.
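The `read()`, `isExpired()`, and `ensureValid()` rules can be captured as pure functions, with the network refresh left out. This sketch assumes `expiresAt` is epoch milliseconds. `readCredentials` and `checkValidity` are hypothetical names. `checkValidity` returns the decision that `ensureValid()` would act on; it does not perform the refresh itself.

```typescript
interface OAuthCredentials {
  accessToken: string;
  refreshToken: string | null;
  expiresAt: number | null; // assumed epoch ms
}

// Only accessToken is required; setup tokens lack the other two fields.
function readCredentials(json: string): OAuthCredentials {
  const parsed = JSON.parse(json) as { claudeAiOauth?: Record<string, unknown> };
  const oauth = parsed.claudeAiOauth ?? {};
  const accessToken = oauth.accessToken;
  if (typeof accessToken !== "string") {
    throw new Error("claudeAiOauth.accessToken is required");
  }
  const refreshToken = oauth.refreshToken;
  const expiresAt = oauth.expiresAt;
  return {
    accessToken,
    refreshToken: typeof refreshToken === "string" ? refreshToken : null,
    expiresAt: typeof expiresAt === "number" ? expiresAt : null,
  };
}

// A null expiry never counts as expired.
function isExpired(c: OAuthCredentials, now: number = Date.now()): boolean {
  return c.expiresAt !== null && c.expiresAt <= now;
}

type Validity = { kind: "valid" } | { kind: "needs_refresh" } | { kind: "invalid"; error: string };

function checkValidity(c: OAuthCredentials, now: number = Date.now()): Validity {
  if (!isExpired(c, now)) return { kind: "valid" };
  if (c.refreshToken) return { kind: "needs_refresh" };
  return { kind: "invalid", error: "token expired and no refresh token available" };
}
```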
### Setup Tokens
Setup tokens (from `claude setup-token`) are long-lived OAuth access tokens with no refresh token or expiry. Register via:
```sh
cw account add --token <token> --email user@example.com
```
Stored as `credentials: {"claudeAiOauth":{"accessToken":"<token>"}}` and `configJson: {"hasCompletedOnboarding":true}`.
## Errand Agent Support
### `sendUserMessage(agentId, message)`
Delivers a user message directly to a running or idle errand agent without going through the conversations table. Used by the `errand.sendMessage` tRPC procedure.
**Steps**: look up agent → validate status (`running`|`idle`) → validate `sessionId` → clear signal.json → update status to `running` → build resume command → stop active tailer/poll → spawn detached → start polling.
**Key difference from `resumeForConversation`**: no `conversationResumeLocks`, no conversations table entry, raw message passed as resume prompt.
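The step order above can be expressed against an injected dependency object, which also makes the ordering testable. `ErrandDeps` and its method names are hypothetical stand-ins for the manager's internals, one per documented step. The comment on why `signal.json` is cleared is an inference, not something this document states. The sketch is synchronous for brevity.

```typescript
type AgentStatus = "running" | "idle" | "waiting_for_input" | "crashed";

interface ErrandAgent {
  id: string;
  status: AgentStatus;
  sessionId: string | null;
}

// Hypothetical dependency surface, one method per documented step.
interface ErrandDeps {
  getAgent(id: string): ErrandAgent | undefined;
  clearSignal(agentId: string): void;
  setStatus(agentId: string, status: AgentStatus): void;
  buildResumeCommand(sessionId: string, prompt: string): string[];
  stopTailer(agentId: string): void;
  spawnDetached(agentId: string, command: string[]): void;
  startPolling(agentId: string): void;
}

function sendUserMessage(deps: ErrandDeps, agentId: string, message: string): void {
  const agent = deps.getAgent(agentId);
  if (!agent) throw new Error(`agent ${agentId} not found`);
  if (agent.status !== "running" && agent.status !== "idle") {
    throw new Error(`agent ${agentId} is ${agent.status}; expected running or idle`);
  }
  if (!agent.sessionId) throw new Error(`agent ${agentId} has no sessionId`);
  deps.clearSignal(agentId); // presumably so the previous signal.json is not read as this run's result
  deps.setStatus(agentId, "running");
  const command = deps.buildResumeCommand(agent.sessionId, message); // raw message is the prompt
  deps.stopTailer(agentId);
  deps.spawnDetached(agentId, command);
  deps.startPolling(agentId);
}
```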
### `writeErrandManifest(options)`
Writes errand input files to `<agentWorkdir>/.cw/input/`:
- `errand.md`: YAML frontmatter with `id`, `description`, `branch`, `project`
- `manifest.json`: `{ errandId, agentId, agentName, mode: "errand" }` (no `files`/`contextFiles` arrays)
- `expected-pwd.txt`: the agent workdir path
Written in order: `errand.md` first, `manifest.json` last (same discipline as `writeInputFiles`).
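The sketch below shows the three files and their write order, returned as `[path, content]` pairs so the caller decides how to write them. `planErrandInputFiles` and `ErrandManifestOptions` are hypothetical. Quoting frontmatter values with `JSON.stringify` is this sketch's choice, not necessarily what `writeErrandManifest()` does; it works because a JSON string is also a valid YAML double-quoted scalar.

```typescript
interface ErrandManifestOptions {
  agentWorkdir: string;
  errandId: string;
  agentId: string;
  agentName: string;
  description: string;
  branch: string;
  project: string;
}

// Returns files in write order: errand.md first, manifest.json last.
function planErrandInputFiles(o: ErrandManifestOptions): Array<[string, string]> {
  const dir = `${o.agentWorkdir}/.cw/input`;
  // JSON.stringify quotes values so colons or quotes in a description stay valid YAML.
  const frontmatter = [
    "---",
    `id: ${JSON.stringify(o.errandId)}`,
    `description: ${JSON.stringify(o.description)}`,
    `branch: ${JSON.stringify(o.branch)}`,
    `project: ${JSON.stringify(o.project)}`,
    "---",
    "",
  ].join("\n");
  // No files/contextFiles arrays for errands.
  const manifest = { errandId: o.errandId, agentId: o.agentId, agentName: o.agentName, mode: "errand" };
  return [
    [`${dir}/errand.md`, frontmatter],
    [`${dir}/expected-pwd.txt`, o.agentWorkdir],
    [`${dir}/manifest.json`, JSON.stringify(manifest, null, 2)],
  ];
}
```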
### `buildErrandPrompt(description)`
Builds the initial prompt for errand agents. Exported from `prompts/errand.ts` and re-exported from `prompts/index.ts`. The prompt instructs the agent to make only the changes needed for the description and write `signal.json` when done.
## Auto-Resume for Conversations
When Agent A asks Agent B a question via `cw ask` and Agent B is idle, the conversation router automatically resumes Agent B's session. This mirrors the `resumeForCommit()` pattern.
### Flow
1. `createConversation` tRPC procedure creates the conversation record
2. Target resolution prefers `running` agents, falls back to `idle` (previously only matched `running`)
3. After creation, checks if target agent is idle → calls `agentManager.resumeForConversation()`
4. Agent resumes with a prompt to: answer via `cw answer`, drain pending conversations via `cw listen`, then complete
### Guards
- Agent must be `idle` status with a valid `sessionId`
- Provider must support resume (`resumeStyle !== 'none'`)
- Worktree must still exist (`existsSync` check)
- In-memory `conversationResumeLocks` Set prevents double-resume race when multiple conversations arrive simultaneously
- Resume failure is caught and logged — conversation is always created even if resume fails
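Target resolution and the guards above can be sketched together. `resolveTarget`, `canAutoResume`, and `maybeResumeForConversation` are hypothetical names. The injected `resume` callback stands in for `agentManager.resumeForConversation()` and is synchronous for brevity. Releasing the lock in `finally` is an assumption about timing that this document does not specify.

```typescript
type Status = "running" | "idle" | "waiting_for_input" | "crashed";

interface Candidate {
  id: string;
  status: Status;
  sessionId: string | null;
}

// Prefer a running agent, fall back to an idle one.
function resolveTarget(candidates: Candidate[]): Candidate | undefined {
  return candidates.find((c) => c.status === "running") ?? candidates.find((c) => c.status === "idle");
}

// The documented guards, excluding the lock.
function canAutoResume(agent: Candidate, resumeStyle: string, worktreeExists: boolean): boolean {
  return agent.status === "idle" && Boolean(agent.sessionId) && resumeStyle !== "none" && worktreeExists;
}

// In-memory lock set: a second conversation arriving mid-resume is turned away.
const conversationResumeLocks = new Set<string>();

function maybeResumeForConversation(
  agent: Candidate,
  resumeStyle: string,
  worktreeExists: boolean,
  resume: (agentId: string) => void,
  log: (message: string) => void,
): boolean {
  if (!canAutoResume(agent, resumeStyle, worktreeExists)) return false;
  if (conversationResumeLocks.has(agent.id)) return false;
  conversationResumeLocks.add(agent.id);
  try {
    resume(agent.id);
    return true;
  } catch (err) {
    // Logged, never rethrown: the conversation record already exists.
    log(`auto-resume failed for ${agent.id}: ${String(err)}`);
    return false;
  } finally {
    conversationResumeLocks.delete(agent.id);
  }
}
```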
## Auto-Cleanup & Commit Retries
After an agent completes (status → `idle`), `tryAutoCleanup` checks if its project worktrees have uncommitted changes:
1. `CleanupManager.getDirtyWorktreePaths()` runs `git status --porcelain` in each project subdirectory (not the parent `agent-workdirs/<alias>/` dir), returns `{ name, absPath }[]`
2. If all clean → worktrees and logs removed immediately
3. If dirty → `resumeForCommit()` resumes the agent's session with a prompt listing **absolute paths** to dirty subdirectories, using `git add -u` (tracked files only) to avoid staging unrelated files
4. The agent `cd`s into each listed absolute path and commits tracked changes only
5. On next completion, cleanup runs again. `MAX_COMMIT_RETRIES` (1) limits retries — after that the workdir is left in place with a warning
The retry counter is cleaned up on: successful removal, max retries exceeded, or unexpected error. It is **not** cleaned up when a commit retry is successfully launched (so the counter persists across the retry cycle).
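The retry bookkeeping is easy to get wrong, so the sketch below shows the decision on its own. `MAX_COMMIT_RETRIES` is the documented constant; `decideCleanup`, `commitRetries`, and `buildCommitPrompt` are hypothetical. The prompt wording is invented. Only the absolute paths and `git add -u` come from the steps above. The counter is cleared at decision time here, a simplification of clearing it after removal succeeds.

```typescript
const MAX_COMMIT_RETRIES = 1;
// Per-agent count of commit retries already launched.
const commitRetries = new Map<string, number>();

interface DirtyWorktree {
  name: string;
  absPath: string;
}

// `git status --porcelain` prints nothing for a clean worktree.
function isDirty(porcelainOutput: string): boolean {
  return porcelainOutput.trim().length > 0;
}

type CleanupAction = "remove" | "resume_for_commit" | "leave_in_place";

function decideCleanup(agentId: string, dirty: DirtyWorktree[]): CleanupAction {
  if (dirty.length === 0) {
    commitRetries.delete(agentId); // clean: worktrees and logs can be removed
    return "remove";
  }
  const attempts = commitRetries.get(agentId) ?? 0;
  if (attempts >= MAX_COMMIT_RETRIES) {
    commitRetries.delete(agentId); // give up: leave the workdir with a warning
    return "leave_in_place";
  }
  // Keep the counter so the next completion sees this attempt.
  commitRetries.set(agentId, attempts + 1);
  return "resume_for_commit";
}

// Absolute paths plus tracked-only staging, per the steps above.
function buildCommitPrompt(dirty: DirtyWorktree[]): string {
  const steps = dirty.map((w) => `cd ${w.absPath} && git add -u && git commit -m "<message>"`);
  return ["Commit tracked changes in each of these directories:"].concat(steps).join("\n");
}
```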
## Log Chunks
Agent output is persisted to `agent_log_chunks` table and drives all live streaming:
- `onRawContent` callback fires for every raw JSONL chunk from `FileTailer`
- DB insert → `agent:output` event emission (single source of truth for UI)
- No FK to agents — survives agent deletion
- Session tracking: spawn=1, resume=previousMax+1
- Read path (`getAgentOutput` tRPC): returns timestamped chunks `{ content, createdAt }[]` from DB
- Live path (`onAgentOutput` subscription): listens for `agent:output` events (client stamps with `Date.now()`)
- Frontend: initial query loads timestamped chunks, subscription accumulates live chunks, both parsed via `parseAgentOutput()` which accepts `TimestampedChunk[]`
- Timestamps displayed inline (HH:MM:SS) on text, tool_call, system, and session_end messages
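The session numbering and inline timestamp rules are small enough to show directly. This sketch assumes `createdAt` arrives as a `Date` and that timestamps render in local time. `nextSessionNumber`, `formatHms`, and `stampLive` are hypothetical helpers. Returning 1 for a resume with no prior sessions is this sketch's choice.

```typescript
// Assumed shape: createdAt as a Date.
interface TimestampedChunk {
  content: string;
  createdAt: Date;
}

// Spawn starts at session 1; each resume uses previousMax + 1.
function nextSessionNumber(existingSessions: number[], isResume: boolean): number {
  if (!isResume || existingSessions.length === 0) return 1;
  return Math.max(...existingSessions) + 1;
}

// HH:MM:SS in local time, zero-padded.
function formatHms(d: Date): string {
  const pad = (n: number) => (n < 10 ? "0" : "") + n;
  return [d.getHours(), d.getMinutes(), d.getSeconds()].map(pad).join(":");
}

// Live subscription chunks have no DB timestamp, so the client stamps them on arrival.
function stampLive(content: string, now: number = Date.now()): TimestampedChunk {
  return { content, createdAt: new Date(now) };
}
```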
## Inter-Agent Communication
Agents can communicate with each other via the `conversations` table, coordinated through CLI commands.
### Prompt Integration
The `buildInterAgentCommunication(agentId, mode)` function in `prompts/shared.ts` generates per-agent communication instructions. It is called in `manager.ts` after the agent record is created, and the actual agent ID is injected directly into the prompt (no manifest.json indirection).
**Mode-aware branching:**
- **Planning modes** (`plan`, `refine`): Minimal block — just the agent ID and `cw ask` syntax for emergencies. These agents define high-level structure, not implementation details, so real-time coordination is almost never needed.
- **Execution + coordination modes** (`execute`, `detail`, `discuss`, `verify`, `merge`, `review`): Full protocol including:
   1. Commands table with accurate CLI behavior descriptions
   2. Numbered shell recipe for background listener lifecycle (start → check → answer → restart → cleanup)
   3. Targeting guidance (`--agent-id` vs `--task-id` vs `--phase-id`)
   4. Decision criteria: when to ask (uncommitted interfaces, shared file conflicts) and when NOT to ask (answer in codebase, answer in input files, not blocked, confirming approach)
   5. Good/bad examples using `<example label>` pattern
   6. Answering guidelines (be specific: include code snippets, file paths, type signatures)
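The mode split can be sketched as a set lookup. The prompt text below is placeholder wording, far shorter than the real block in `prompts/shared.ts`. `sketchInterAgentCommunication` is a hypothetical name. Modes not named in this section, such as `chat` or `errand`, are left out of the `Mode` type.

```typescript
// Only the modes named in this section.
type Mode = "plan" | "refine" | "execute" | "detail" | "discuss" | "verify" | "merge" | "review";

const PLANNING_MODES: ReadonlySet<Mode> = new Set<Mode>(["plan", "refine"]);

// Placeholder wording; the agent ID is interpolated directly into the text.
function sketchInterAgentCommunication(agentId: string, mode: Mode): string {
  const ask = `cw ask "<question>" --from ${agentId} --agent-id <target>`;
  const lines = [`Your agent ID: ${agentId}`];
  if (PLANNING_MODES.has(mode)) {
    // Minimal block: ID plus emergency-only ask syntax.
    lines.push(`Emergencies only: ${ask}`);
  } else {
    // Full protocol, heavily abridged: listen, ask, answer.
    lines.push(
      `Listen: cw listen --agent-id ${agentId}`,
      `Ask: ${ask}`,
      `Answer: cw answer "<answer>" --conversation-id <id>`,
    );
  }
  return `<inter_agent_communication>\n${lines.join("\n")}\n</inter_agent_communication>`;
}
```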
### Agent Identity
`manifest.json` includes `agentId` and `agentName` fields. The manager passes these from the DB record after agent creation. The agent ID is also injected directly into the prompt's communication instructions.
### CLI Commands
**`cw listen --agent-id <id>`**
- Subscribes to `onPendingConversation` SSE subscription, prints first pending as JSON, exits with code 0
- First yields any existing pending conversations from DB, then listens for `conversation:created` events
- Output: `{ conversationId, fromAgentId, question, phaseId?, taskId? }`
**`cw ask <question> --from <agentId> --agent-id|--task-id|--phase-id <target>`**
- Creates conversation, subscribes to `onConversationAnswer` SSE, prints answer text to stdout when answered
- Target resolution: `--agent-id` (direct), `--task-id` (find agent running task), `--phase-id` (find agent in phase)
**`cw answer <answer> --conversation-id <id>`**
- Calls `answerConversation`, prints `{ conversationId, status: "answered" }`
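A script that wraps these commands would typically validate `cw listen` output before acting on it. The sketch below assumes the documented JSON shapes. `parseListenOutput` and `resolveAskTarget` are hypothetical helpers. Requiring exactly one target flag is an assumption read from the `--agent-id|--task-id|--phase-id` syntax.

```typescript
// Documented JSON printed by `cw listen`.
interface PendingConversation {
  conversationId: string;
  fromAgentId: string;
  question: string;
  phaseId?: string;
  taskId?: string;
}

// Returns null unless stdout is a JSON object with the required string fields.
function parseListenOutput(stdout: string): PendingConversation | null {
  let value: unknown;
  try {
    value = JSON.parse(stdout);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const o = value as Record<string, unknown>;
  const required = ["conversationId", "fromAgentId", "question"];
  if (!required.every((key) => typeof o[key] === "string")) return null;
  return value as PendingConversation;
}

type AskTarget = { kind: "agent" | "task" | "phase"; id: string };

// Assumption: exactly one of the three target flags must be supplied.
function resolveAskTarget(flags: { agentId?: string; taskId?: string; phaseId?: string }): AskTarget {
  const given: AskTarget[] = [];
  if (flags.agentId) given.push({ kind: "agent", id: flags.agentId });
  if (flags.taskId) given.push({ kind: "task", id: flags.taskId });
  if (flags.phaseId) given.push({ kind: "phase", id: flags.phaseId });
  if (given.length !== 1) {
    throw new Error("pass exactly one of --agent-id, --task-id, --phase-id");
  }
  return given[0];
}
```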
## Prompt Architecture
Mode-specific prompts in `prompts/` use XML tags as top-level structural delimiters, with markdown formatting inside tags. This separates first-order instructions from second-order content (task descriptions, examples, templates) per Anthropic best practices. The old `apps/server/agent/prompts.ts` (flat markdown) has been deleted.
### XML Tag Structure
All prompts follow a consistent tag ordering:
1. `<role>` — agent identity and mode
2. `<task>` — dynamic task content (execute mode only)
3. `<input_files>` — file format documentation
4. `<codebase_exploration>` — codebase grounding instructions (architect modes only)
5. `<output_format>` — what to produce, file paths, frontmatter
6. `<id_generation>` — ID creation via `cw id`
7. `<signal_format>` — completion signaling
8. `<session_startup>` — startup verification steps
9. Mode-specific tags (see below)
10. Rules/constraints tags
11. `<progress_tracking>` / `<context_management>`
12. `<definition_of_done>` — completion checklist
13. `<workspace>` — workspace layout (appended by manager)
14. `<inter_agent_communication>` — per-agent CLI instructions (appended by manager)
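The fixed ordering can be sketched as a list of tag names that a builder walks, skipping sections a mode does not use. This is a hypothetical `assemblePrompt`, not the real builders in `prompts/`. The caller passes in the mode-specific and rules tags. `<workspace>` and `<inter_agent_communication>` are omitted because the manager appends them.

```typescript
// Wrap markdown content in an XML structural tag.
function tag(name: string, body: string): string {
  return `<${name}>\n${body.trim()}\n</${name}>`;
}

// Positions 1-8 and 11-12 from the list above; mode-specific and rules tags (9-10) go between.
const LEADING_TAGS = [
  "role", "task", "input_files", "codebase_exploration",
  "output_format", "id_generation", "signal_format", "session_startup",
];
const TRAILING_TAGS = ["progress_tracking", "context_management", "definition_of_done"];

// Emits the sections that are present, in order, and skips absent ones.
function assemblePrompt(sections: Record<string, string>, modeTags: string[]): string {
  const order = LEADING_TAGS.concat(modeTags, TRAILING_TAGS);
  return order
    .filter((name) => sections[name] !== undefined)
    .map((name) => tag(name, sections[name]))
    .join("\n\n");
}
```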
### Shared Blocks (`prompts/shared.ts`)
| Constant / Function | XML Tag | Content |
|---------------------|---------|---------|
| `SIGNAL_FORMAT` | `<signal_format>` | Done/questions/error via `.cw/output/signal.json` |
| `INPUT_FILES` | `<input_files>` | Manifest, assignment files, context files |
| `ID_GENERATION` | `<id_generation>` | `cw id` usage for generating entity IDs |
| `TEST_INTEGRITY` | `<test_integrity>` | No self-validating tests, no assertion mutation, no skipping, independent tests, full suite runs |
| `SESSION_STARTUP` | `<session_startup>` | Confirm working directory, check git state, establish green test baseline, read assignment |
| `PROGRESS_TRACKING` | `<progress_tracking>` | Maintain `.cw/output/progress.md` after each commit — survives context compaction |
| `DEVIATION_RULES` | `<deviation_rules>` | Typo→fix, bug→fix if small, missing dep→coordinate, architectural mismatch→STOP |
| `GIT_WORKFLOW` | `<git_workflow>` | Specific file staging (no `git add .`), no force-push, check status first |
| `CODEBASE_EXPLORATION` | `<codebase_exploration>` | Architect-mode codebase grounding: read project docs, explore structure, check existing patterns, use subagents for parallel exploration |
| `CONTEXT_MANAGEMENT` | `<context_management>` | Parallel file reads, cross-reference to progress tracking |
| `buildInterAgentCommunication()` | `<inter_agent_communication>` | Per-agent CLI instructions for `cw listen`, `cw ask`, `cw answer` |
### Mode-Specific Tags
| Mode | File | Mode-Specific Tags |
|------|------|--------------------|
| **execute** | `execute.ts` | `<task>`, `<execution_protocol>`, `<anti_patterns>`, `<scope_rules>` |
| **plan** | `plan.ts` | `<phase_design>`, `<dependencies>`, `<file_ownership>`, `<specificity>`, `<existing_context>` |
| **detail** | `detail.ts` | `<task_body_requirements>`, `<file_ownership>`, `<task_sizing>`, `<existing_context>` |
| **discuss** | `discuss.ts` | `<analysis_method>`, `<question_quality>`, `<decision_quality>`, `<question_categories>`, `<rules>` |
| **refine** | `refine.ts` | `<improvement_priorities>`, `<rules>` |
| **chat** | `chat.ts` | `<chat_history>`, `<instruction>`: iterative refinement loop; uses the `action` field (create/update/delete) in output files and signals "questions" after each change to stay alive |
Examples within mode-specific tags use `<examples>` > `<example label="good">` / `<example label="bad">` nesting.
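The helper below is hypothetical. It produces this nesting only to make the expected shape concrete.

```typescript
// Renders examples as <examples> > <example label="good|bad">.
function renderExamples(examples: Array<{ label: "good" | "bad"; body: string }>): string {
  const inner = examples.map((e) => `<example label="${e.label}">\n${e.body}\n</example>`).join("\n");
  return `<examples>\n${inner}\n</examples>`;
}
```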
### Execute Prompt Dispatch
`buildExecutePrompt(taskDescription?)` accepts an optional task description wrapped in a `<task>` tag. The dispatch manager (`apps/server/dispatch/manager.ts`) wraps `task.description || task.name` in `buildExecutePrompt()` so execute agents receive full system context alongside their task. The `<workspace>` and `<inter_agent_communication>` blocks are appended by the agent manager at spawn time.
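The sketch below shows the dispatch-side wrapping, with placeholder strings standing in for the real execute-mode blocks. `buildExecutePromptSketch`, `executePromptForTask`, and the `Task` shape are hypothetical. The `task.description || task.name` fallback is the documented behavior, so an empty description also falls back to the name.

```typescript
interface Task {
  name: string;
  description: string | null;
}

// Placeholders for the real execute-mode blocks; only the <task> wrapping matters here.
function buildExecutePromptSketch(taskDescription?: string): string {
  const parts = ["<role>\n(execute-mode role)\n</role>"];
  if (taskDescription) {
    parts.push(`<task>\n${taskDescription}\n</task>`);
  }
  parts.push("<execution_protocol>\n(protocol text)\n</execution_protocol>");
  return parts.join("\n\n");
}

// Dispatch side: `||` means an empty or null description falls back to the task name.
function executePromptForTask(task: Task): string {
  return buildExecutePromptSketch(task.description || task.name);
}
```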