braiin

---
name: braiin
description: Guide to integrating the `braiin` npm library (TypeScript LLM orchestrator). Covers the three primitives (Tool, Agent, Orchestrator), the 4-action LLM protocol (describe / call / finish / abort), OrchestratorConfig options, task result shape, toolTraces persistence, streaming the final answer via the onToken callback, both backends (OpenAI-compatible HTTP API and local Claude Code CLI), and best practices. Use when writing code that imports `braiin`, designing tools/agents, streaming responses, or integrating an LLM orchestration layer.
---

# braiin

## What it is

`braiin` (Behavioral Reasoning AI for Intelligent Navigation) is a TypeScript orchestrator that lets an LLM route between specialized **agents** and their **tools** to accomplish a task. Two backends are supported:

- **OpenAI-compatible HTTP API** (default): works with any provider exposing `/chat/completions`, such as OpenAI, Anthropic (compat), OpenRouter, Together, Groq, Ollama, vLLM, Azure, etc.
- **Local Claude Code CLI**: spawns the `claude` binary as a child process; the CLI maintains its own session memory so `history` does not need to be re-sent every turn.

Key facts:

- Package: `braiin` on npm. Peer dependency: `openai ^6.0.0`. For the Claude Code backend, `claude` must be installed and on `PATH`.
- The orchestrator and the LLM exchange a strict JSON protocol; free-form text replies are rejected.
- Source of truth: the library itself is small (~300 LOC core); when in doubt, read the type declarations under `node_modules/braiin/dist`.

## Mental model

Three primitives, nested:

```
Orchestrator ── has ──► Agent[] ── has ──► Tool[]
```

- **Tool**: one atomic capability (e.g. `read-file`, `fetch-user`). Has a `tag`, description, input schema (string or structured), and a `call(input)` that returns a string.
- **Agent**: a themed bundle of tools (e.g. "file-agent", "user-agent") with a name and description.
- **Orchestrator**: owns the agents and runs the reasoning loop. `executeTask(prompt)` returns a `TaskResult`.

## Public API

```ts
import {
  createAgent,
  createOrchestrator,
  Tool,
  Agent,
  Orchestrator,
  TaskResult,
  ToolTrace,
  LLMMessage,
  LLMMessageRole
} from 'braiin'
```

`createAgent` and `createOrchestrator` are the only factories exported. Types are exported for annotation, not runtime use.

## Defining a Tool

### Simple string input

```ts
import { Tool } from 'braiin'

// Sample in-memory data for illustration only; replace with your own store.
const users = [{ name: 'User 1', birthYear: 2001 }]

export const userRetrieverTool: Tool = {
  tag: 'user-retriever',
  description: "Retrieve a user's information by name",
  input: "The user's name",
  output: "A JSON object with the user's info, or empty string if not found",
  call: async (userName) => {
    const user = users.find(u => u.name === userName)
    // Tools must return a string, so serialize the match
    return user ? JSON.stringify(user) : ''
  }
}
```

### Structured object input

```ts
import { promises as fs } from 'node:fs'
import { Tool } from 'braiin'

export const writeFileTool: Tool = {
  tag: 'write-file',
  description: 'Write content to a file',
  input: [
    { name: 'path', description: 'File path', required: true },
    { name: 'content', description: 'File content', required: true }
  ],
  output: 'Confirmation message',
  call: async (input) => {
    // Structured input arrives as an object keyed by field name
    const { path, content } = input as Record<string, string>
    await fs.writeFile(path, content)
    return `Wrote ${content.length} chars to ${path}`
  }
}
```

**Rules for Tool authors:**

- `call` MUST return a `string`. Serialize objects with `JSON.stringify`.
- `call` MAY throw; the orchestrator catches the error and returns `status: 'error'`.
- `tag` must be unique **per agent** (the lookup key).
- `description`, `input`, `output` are all read by the LLM; write them as short instructions, not prose.
- When `input` is an array, the LLM receives an object `{ [name]: value }`. When it's a string, `input` arrives as a plain string.

## Creating an Agent

```ts
import { createAgent } from 'braiin'

// userBirthYearTool is assumed to be defined like userRetrieverTool above
const userAgent = createAgent(
  'user-agent',
  'You are a useful assistant that answers questions about users.',
  [userRetrieverTool, userBirthYearTool]
)
```

Agent names must be unique across the orchestrator.
Pick kebab-case names that the LLM can recognize without ambiguity.

## Creating the Orchestrator

```ts
import { createOrchestrator } from 'braiin'

// fileAgent is assumed to be built with createAgent, like userAgent above
const orchestrator = createOrchestrator(
  [userAgent, fileAgent],
  {
    apiKey: process.env.OPENAI_API_KEY!,
    model: 'gpt-4o',
    temperature: 0
  }
)

const result = await orchestrator.executeTask('When was User 1 born?')
```

## OrchestratorConfig: all options

| Option | Default | When to set |
| --- | --- | --- |
| `backend` | `'openai'` | `'openai'` for OpenAI-compatible HTTP API, `'claude-code'` for the local Claude Code CLI. |
| `optionalPrompt` | `undefined` | Extra instructions appended to the system prompt. For tone, domain facts, output constraints, etc. |
| `maxSteps` | `50` | Hard cap on reasoning iterations. Lower it aggressively (5–15) for bounded tasks to fail fast. |
| `stepsInterval` | `undefined` | ms to wait between steps. Useful for rate-limited providers. |
| `timeoutMs` | `60000` | Per-LLM-call timeout in ms. Combined with `signal` via `AbortSignal.any`. |
| `signal` | `undefined` | User-provided `AbortSignal` to cancel in-flight calls. |
| `llmService` | *internal* | Inject a mock `LLMService` for testing; see "Testing". |
| **OpenAI backend** | | (used when `backend` is `'openai'` or omitted) |
| `apiKey` | *required* | API key for the provider. |
| `model` | `'gpt-4o'` | Model ID. Anything the endpoint accepts. |
| `serverUrl` | `'https://api.openai.com/v1'` | Any OpenAI-compatible base URL. |
| `temperature` | `0` | Keep at `0` for reliable protocol adherence. Raise only for creative final answers. |
| `maxTokens` | `8192` | Per-call cap on completion tokens. |
| `enablePromptCaching` | `false` | Enable when the system prompt is large. See "Prompt caching" below. |
| `enforceJsonOutput` | `false` | Passes `response_format: { type: 'json_object' }` to the API. Forces the LLM to emit strict JSON (no prose preamble). Supported by OpenAI, Anthropic (compat), OpenRouter, Groq. **Stays active during streaming** (`onToken`): the answer is streamed out of the JSON `answer` field, so enforcement and live streaming work together. See "Enforcing JSON output" below. |
| **Claude Code backend** | | (used when `backend` is `'claude-code'`) |
| `sessionId` | *required* | UUID identifying the Claude Code session. Create-or-resume: if the UUID is unknown, a new session is created with that exact ID; if it exists, the session resumes silently. |
| `cliPath` | `'claude'` | Path to the `claude` binary. Override only if not on `PATH`. |

## Local Claude Code backend

Set `backend: 'claude-code'` and pass a `sessionId` (UUID) to use the locally-installed `claude` CLI as the LLM backend instead of an HTTP API:

```ts
import { randomUUID } from 'node:crypto'

const orchestrator = createOrchestrator(
  [userAgent],
  {
    backend: 'claude-code',
    sessionId: randomUUID()
  }
)
```

Each `executeTask` call spawns `claude -p --session-id <uuid> --system-prompt <braiin-prompt> --tools "" --output-format json <prompt>` as a child process and parses the JSON result.

Key properties of this backend:

- **Session memory is owned by the CLI.** The `history` argument passed to `LLMService.ask` is intentionally ignored: Claude Code maintains the conversation transcript on disk under the given session ID. This saves tokens (no history re-injection) and means the same `sessionId` reused across orchestrators picks up where the previous task left off.
- **Tool use is fully disabled** for security (`--tools ""`). Claude Code acts as a pure LLM; the only tool execution path is through your BRAIIN tools.
- **No true token streaming.** The CLI returns the full result at once. Both callbacks still fire but in a single emit (plus the `[[END]]` marker): `logCallback` gets the whole step response, and `onToken` gets the whole final answer at the end, not incremental chunks.
- **`enforceJsonOutput` does not apply.** The CLI has no `response_format` equivalent.
  The strict JSON discipline relies entirely on the BRAIIN system prompt and the tolerant `extractJson` parser.
- **`enablePromptCaching` does not apply.** Claude Code already caches its session prefix internally.
- **Errors map to `LLMResponse.error`** (not thrown): missing CLI binary, non-zero exit, invalid JSON, timeout/abort. The orchestrator surfaces them as `TaskResult { status: 'error', answer: '...' }` like the HTTP backend.
- **Cancellation**: `signal` and `timeoutMs` work: they kill the spawned process with SIGTERM. The session state is persisted before exit, so the next call with the same `sessionId` resumes cleanly.

When to use it:

- You want to avoid managing API keys and rate limits during local development.
- The user already has a Claude Code subscription and you don't want to bill API tokens for orchestration.
- You want long-running multi-turn conversations where re-injecting `history` would be wasteful.

When not to use it:

- Production / multi-tenant environments where you cannot guarantee `claude` is on `PATH` everywhere.
- Throughput-sensitive workloads: a child-process spawn per turn is significantly slower than an HTTP request.
- Cases where `enforceJsonOutput` is critical (very small/old models that hallucinate prose preambles).

## The LLM protocol (4 actions)

The orchestrator and the LLM exchange one JSON object per turn. Every reply MUST include an `action` field.

```jsonc
// 1. Ask an agent for the schemas of its tools
{ "action": "describe", "agent": "user-agent" }

// 2. Call one of an agent's tools
{ "action": "call", "agent": "user-agent", "tool": "user-retriever", "input": "User 1" }
// or with structured input:
{ "action": "call", "agent": "file-agent", "tool": "write-file", "input": { "path": "/tmp/x", "content": "..." } }

// 3. Return the final answer
{ "action": "finish", "answer": "User 1 was born in 2001." }

// 4. Give up with a reason
{ "action": "abort", "reason": "Cannot find any tool that returns birth dates." }
```

A typical chain: `describe` → `call` → (more calls) → `finish`.

The orchestrator also accepts **legacy shapes** (`{"tool":"finished","input":"..."}`, `{"tool":"none","input":"..."}`, `{"agent":"...","tool":"...","input":"..."}`) for backward compatibility, but new integrations should emit the canonical `action` form.

The parser (`extractJson`) tolerates markdown fences and surrounding prose: the reply does not have to be pure JSON, but it must contain a valid balanced JSON object somewhere.

**Streaming the `finish` answer.** The `finish` action is always plain JSON (`{"action":"finish","answer":"..."}`) whether or not you stream. When an `onToken` callback is passed to `executeTask` (see "Observability & streaming"), the orchestrator streams the **value of the `answer` field** as the model writes it, decoding JSON string escapes on the fly. No special wire format and no marker: the protocol is identical to the non-streaming case, so `enforceJsonOutput` can stay on.

## TaskResult

```ts
interface TaskResult {
  status: 'success' | 'error'
  answer: string            // the final answer OR the error message
  toolTraces: ToolTrace[]   // every tool call made during this task
}

interface ToolTrace {
  tool: string                          // tool tag
  input: string | Record<string, any>   // whatever was passed
  result: string                        // whatever `call` returned
}
```

**Always check `status` before using `answer`**: an error message lives in the same field.

## Persistence via toolTraces (follow-up questions)

Pass prior `toolTraces` to re-use their results without re-fetching:

```ts
const first = await orchestrator.executeTask("Tell me about User 1's wife")
// first.toolTraces holds ONLY the traces produced by this call

const second = await orchestrator.executeTask(
  'Has User 1 been married before?',
  [],               // history (conversation-level LLM messages, optional)
  first.toolTraces  // inject as known context (re-used, not re-fetched)
)
// second.toolTraces holds ONLY the second call's traces.
// Accumulate across turns yourself if you need a running history:
const allTraces = [...first.toolTraces, ...second.toolTraces]
```

`TaskResult.toolTraces` contains **only the traces produced during that call** (since v0.5.0). The `toolTraces` you pass in are injected as a synthetic `system` message (`"Known context from previous interactions: ..."`) so the model reuses their results instead of re-fetching, but they are **not echoed back** into the next result. The caller owns accumulation. (Before v0.5.0 the passed traces were prepended to the returned `toolTraces`, conflating "context in" with "produced out".)

The `history` parameter is distinct: it is a raw `LLMMessage[]` of prior user/assistant exchanges for multi-turn conversations.

## Observability & streaming

`executeTask` takes two **independent** callbacks with different jobs; pass either, both, or neither:

```ts
await orchestrator.executeTask(
  prompt,
  history,      // LLMMessage[] (optional)
  toolTraces,   // ToolTrace[] (optional)
  logCallback,  // (log: string) => void. Debug: raw protocol, every step
  onToken       // (token: string) => void. Clean final answer, streamed live
)
```

### `logCallback` (4th arg): debug stream

Providing `logCallback` switches the LLM call to **streaming mode**. It receives the **raw response of every step** (`describe`, `call`, *and* `finish`), chunk by chunk (plus a `[[END]]` marker per step). This is the *whole JSON protocol*, not a clean answer: use it for tracing/debugging, not for showing to an end user. The orchestrator still receives a full `LLMResponse` once each step's stream closes.

### `onToken` (5th arg): final-answer stream

Provide `onToken` to stream **only the final answer**, token by token, exactly as the model writes it: no JSON, no intermediate `describe`/`call` steps. This is what you wire to a chat UI.

```ts
await orchestrator.executeTask('Explain X', [], [], undefined, (t) => process.stdout.write(t))
```

It is **opt-in and costs no extra LLM call**.
When `onToken` is set, the orchestrator runs a stateful filter over the streamed deltas that parses the JSON incrementally and forwards **only the characters inside the top-level `answer` field** (escapes decoded on the fly). The `describe`/`call`/`abort` steps have no `answer` key, so they emit nothing to `onToken`; the filter is self-gating across the whole chain. The only buffered part is the tiny `{"action":"finish","answer":"` envelope prefix.

Notes:

- The same final answer is still returned in `TaskResult.answer` after the chain completes; `onToken` is purely additive.
- `onToken` is **fully compatible with `enforceJsonOutput`** (and they are recommended together): the model keeps emitting strict JSON, and only the `answer` value is streamed. Protocol JSON and model reasoning can never leak into the stream.
- On the **Claude Code backend** (non-streaming), `onToken` still fires but delivers the answer in a **single emit** at the end rather than token-by-token. Same for `logCallback`.

## Enforcing JSON output (`enforceJsonOutput`)

Some LLMs occasionally prepend prose to their JSON reply ("Sure, to answer your question, I'll first… {…}"). The orchestrator's parser tolerates this, but the preamble still leaks to `logCallback` / any UI that surfaces raw responses.

Turn on `enforceJsonOutput` and the orchestrator passes `response_format: { type: 'json_object' }` to the API. The provider will refuse to emit anything that isn't a valid JSON object, so no preamble is possible. It applies on every step, **including while streaming** (`onToken`): the answer is read out of the JSON `answer` field, so enforcement and live streaming compose cleanly.

Support per provider:

- **OpenAI**: native, stable on `gpt-4o`, `gpt-4o-mini`, `gpt-4-turbo`, `gpt-3.5-turbo-0125+`.
- **Anthropic (OpenAI-compat endpoint)**: accepted, mapped to their JSON mode on Claude 3.5+.
- **OpenRouter, Groq, Together**: forward the flag to the underlying model when supported.
- **Local/old providers (vLLM, Ollama, older endpoints)**: may reject the unknown field; keep the flag off in that case.

Caveat: providers require the word "JSON" to appear somewhere in the prompt. BRAIIN's system prompt already mentions it repeatedly, so this is satisfied.

## Prompt caching (`enablePromptCaching`)

The system prompt (agent descriptions + protocol) is identical across every step of a chain. Turn on `enablePromptCaching` and the orchestrator will send the system message as a content block with `cache_control: { type: 'ephemeral' }`:

- **Anthropic (native or OpenAI-compat)**: reads the marker and caches the prefix when ≥1024 tokens (2048 on Haiku).
- **OpenAI**: ignores the marker but does automatic prefix caching at the same threshold, giving the same savings with no side effect.
- **Other OpenAI-compat providers**: the marker is usually ignored silently.

Rule of thumb: enable it whenever your agent and tool descriptions push the system prompt over ~1k tokens or when chains routinely exceed 5 steps.

## Best practices

- **Start `temperature` at `0`.** The protocol is strict; creativity hurts it. Raise only if the final answer needs flair.
- **Lower `maxSteps` aggressively.** The default `50` is a ceiling. Use `10` or fewer for focused tasks so errors surface faster.
- **Make tool `description`/`input`/`output` LLM-facing, not dev-facing.** These strings ARE the prompt. Specify exact input shape, return shape (JSON? plain text?), and failure modes.
- **Keep tool outputs short.** The full output goes back into the history on the next step. Summarize or truncate oversized payloads (e.g. paginate, return counts + samples).
- **One agent per domain.** Avoid one mega-agent with 30 tools; narrow bundles help the LLM pick correctly.
- **Unique tool tags per agent.** Duplicates across different agents are fine; duplicates inside one agent are a bug.
- **Always handle `status === 'error'`.** Never assume success.
- **For long-running tasks, wire `signal`** (via `AbortController`) to cancellation UI.
- **Use `optionalPrompt` for global constraints.** Output language, forbidden topics, JSON schemas for the final answer, etc.
- **Point `serverUrl` at any OpenAI-compatible endpoint**; don't pull in a second SDK for Anthropic, OpenRouter, Ollama, vLLM, etc.
- **Stream the final answer to UIs with `onToken`** (5th arg), not `logCallback`. `logCallback` is the raw multi-step protocol for debugging; `onToken` is the clean final answer.

## Testing

Inject a mock `LLMService` via `config.llmService` to bypass the real API:

```ts
import { createOrchestrator } from 'braiin'
import { LLMService } from 'braiin/dist/service/llm.service' // internal path
// or re-create the interface locally:
interface MockLLMService {
  ask: (sys: string, prompt: string, history: any[]) => Promise<any>
}

// Always answers with a `finish` action wrapped in a chat-completion shape
const mockLLM: MockLLMService = {
  ask: async () => ({
    id: 't',
    object: 'chat.completion',
    created: 0,
    model: 'mock',
    choices: [{
      index: 0,
      message: { role: 'assistant', content: '{"action":"finish","answer":"ok"}' },
      finish_reason: 'stop'
    }]
  })
}

// `agent` is any Agent built with createAgent
const o = createOrchestrator([agent], { apiKey: '', llmService: mockLLM as any })
```

Script the `ask` function to return successive JSON responses to drive the chain through every branch you want to test.

## Common pitfalls

- **Tool returns non-string.** `call` must stringify. Returning an object silently breaks the next step's LLM input.
- **Tool description written for humans.** The LLM needs imperative schemas, not marketing copy.
- **Forgetting `status` check.** Errors surface in `answer`, not via `throw`.
- **Very long tool outputs.** They compound across steps and blow up token bills. Summarize.
- **Setting `temperature > 0.3`.** The protocol breaks quickly as temperature rises.
- **Passing `history` when you meant `toolTraces`.** Remember the signature: `executeTask(prompt, history?, toolTraces?, logCallback?, onToken?)`.
- **Confusing the two callbacks.** `logCallback` (4th) streams the raw JSON of *every* step; `onToken` (5th) streams *only* the clean final answer. For a chat UI you want `onToken`.
- **Expecting a clean answer stream from `logCallback`.** It emits the whole protocol (describe/call/finish), not just the answer; use `onToken` for that.
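
Two of the pitfalls above (non-string tool returns and oversized tool outputs) can be guarded against with a small wrapper around `call`. The sketch below is illustrative only: `ToolLike`, `toToolOutput`, and `withSafeOutput` are hypothetical names, not part of `braiin`, and `ToolLike` is a local stand-in that assumes the `Tool` shape shown in "Defining a Tool". The 2000-character default cap is an arbitrary choice.

```typescript
// Local stand-in for the documented Tool shape (assumption);
// in real code you would import `Tool` from 'braiin' instead.
type ToolInput = string | Record<string, unknown>

interface ToolLike {
  tag: string
  description: string
  input: string | { name: string; description: string; required: boolean }[]
  output: string
  call: (input: ToolInput) => Promise<unknown>
}

// Hypothetical helper: turn any value into a string and cap its length.
function toToolOutput(value: unknown, maxChars = 2000): string {
  // JSON.stringify returns undefined at runtime for undefined/functions,
  // so fall back to an empty string in that case.
  const text = typeof value === 'string' ? value : JSON.stringify(value) ?? ''
  if (text.length <= maxChars) return text
  const omitted = text.length - maxChars
  return `${text.slice(0, maxChars)}... [truncated ${omitted} chars]`
}

// Hypothetical wrapper: same tool, but `call` always resolves to a bounded string.
function withSafeOutput(tool: ToolLike, maxChars = 2000) {
  return {
    ...tool,
    call: async (input: ToolInput): Promise<string> =>
      toToolOutput(await tool.call(input), maxChars)
  }
}

// Usage: a tool that (incorrectly) returns an array instead of a string.
const rawTool: ToolLike = {
  tag: 'list-users',
  description: 'List all users',
  input: 'Ignored',
  output: 'A JSON array of users',
  call: async () => [{ name: 'User 1' }, { name: 'User 2' }]
}

const safeTool = withSafeOutput(rawTool, 20)
safeTool.call('').then((out) => console.log(out))
```

Wrapping at the tool boundary keeps the individual `call` implementations simple. Errors thrown by the wrapped `call` still propagate, so the orchestrator's own error handling is unaffected.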