---
name: braiin
description: Guide to integrating the `braiin` npm library (TypeScript LLM orchestrator). Covers the three primitives (Tool, Agent, Orchestrator), the 4-action LLM protocol (describe / call / finish / abort), OrchestratorConfig options, task result shape, toolTraces persistence, streaming the final answer via the onToken callback, both backends (OpenAI-compatible HTTP API and local Claude Code CLI), and best practices. Use when writing code that imports `braiin`, designing tools/agents, streaming responses, or integrating an LLM orchestration layer.
---
# braiin
## What it is
`braiin` (Behavioral Reasoning AI for Intelligent Navigation) is a TypeScript orchestrator that lets an LLM route between specialized **agents** and their **tools** to accomplish a task. Two backends are supported:
- **OpenAI-compatible HTTP API** (default) — works with any provider exposing `/chat/completions`: OpenAI, Anthropic (compat), OpenRouter, Together, Groq, Ollama, vLLM, Azure, etc.
- **Local Claude Code CLI** — spawns the `claude` binary as a child process; the CLI maintains its own session memory so `history` does not need to be re-sent every turn.
- Package: `braiin` on npm. Peer dependency: `openai ^6.0.0`. For the Claude Code backend, `claude` must be installed and on `PATH`.
- The orchestrator and the LLM exchange a strict JSON protocol; a reply that contains no valid JSON object (pure free-form text) is rejected.
- Source of truth: the library is small (~300 LOC core); when in doubt, read the type declarations under `node_modules/braiin/dist`.
## Mental model
Three primitives, nested:
```
Orchestrator ── has ──► Agent[] ── has ──► Tool[]
```
- **Tool**: one atomic capability (e.g. `read-file`, `fetch-user`). Has a `tag`, a `description`, an `input` schema (string or structured), an `output` description, and a `call(input)` function that returns a string.
- **Agent**: a themed bundle of tools (e.g. "file-agent", "user-agent") with a name and description.
- **Orchestrator**: owns the agents and runs the reasoning loop. `executeTask(prompt)` returns a `TaskResult`.
## Public API
```ts
import {
  createAgent,
  createOrchestrator,
  Tool,
  Agent,
  Orchestrator,
  TaskResult,
  ToolTrace,
  LLMMessage,
  LLMMessageRole
} from 'braiin'
```
`createAgent` and `createOrchestrator` are the only factories exported. Types are exported for annotation, not runtime use.
## Defining a Tool
### Simple string input
```ts
import { Tool } from 'braiin'

// Example in-memory data source used by the tool below
const users = [{ name: 'User 1', birthYear: 2001 }]

export const userRetrieverTool: Tool = {
  tag: 'user-retriever',
  description: "Retrieve a user's information from its name",
  input: "The user's name",
  output: "A JSON object with the user's info, or empty string if not found",
  call: async (userName) => {
    const user = users.find(u => u.name === userName)
    return user ? JSON.stringify(user) : ''
  }
}
```
### Structured object input
```ts
import { promises as fs } from 'node:fs'
import { Tool } from 'braiin'

export const writeFileTool: Tool = {
  tag: 'write-file',
  description: 'Write content to a file',
  input: [
    { name: 'path', description: 'File path', required: true },
    { name: 'content', description: 'File content', required: true }
  ],
  output: 'Confirmation message',
  call: async (input) => {
    // Structured input arrives as an object keyed by the declared field names
    const { path, content } = input as Record<string, string>
    await fs.writeFile(path, content)
    return `Wrote ${content.length} chars to ${path}`
  }
}
```
**Rules for Tool authors:**
- `call` MUST return a `string`. Serialize objects with `JSON.stringify`.
- `call` MAY throw — the orchestrator catches and returns `status:'error'`.
- `tag` must be unique **per agent** (the lookup key).
- `description`, `input`, `output` are all read by the LLM — write them as short instructions, not prose.
- When `input` is an array, the LLM sends (and `call` receives) an object `{ [name]: value }`. When it is a string, `call` receives a plain string.
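Because `call` may throw and the orchestrator catches the error, a structured-input tool can validate its own fields instead of blindly casting. Below is a minimal sketch. It assumes `call` receives `string | Record<string, unknown>`, which is what the `ToolTrace.input` type suggests. `requireString` is a hypothetical helper and is not part of `braiin`.

```typescript
// Assumption: a tool's `call` receives string | object, mirroring ToolTrace.input.
type ToolInput = string | Record<string, unknown>

// Hypothetical helper (not part of braiin): read a required string field from
// structured input, or throw an error message the LLM can act on.
function requireString(input: ToolInput, name: string): string {
  if (typeof input === 'string') {
    throw new Error(`Expected an object with a "${name}" field, got a plain string`)
  }
  const value = input[name]
  if (typeof value !== 'string' || value.length === 0) {
    throw new Error(`Missing required string field "${name}"`)
  }
  return value
}
```

Inside `writeFileTool.call`, `const path = requireString(input, 'path')` would then replace the `as Record<string, string>` cast. A missing field then fails with a message the model can read, instead of a `TypeError` deep inside `fs.writeFile`.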
## Creating an Agent
```ts
import { createAgent } from 'braiin'
const userAgent = createAgent(
  'user-agent',
  'You are a useful assistant that answers questions about users.',
  [userRetrieverTool, userBirthYearTool]
)
```
Agent names must be unique across the orchestrator. Pick kebab-case names that the LLM can recognize without ambiguity.
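Both uniqueness rules can be checked before the orchestrator is built. This sketch works on plain descriptors built from the same values you pass to `createAgent`, because the field names on `braiin`'s own `Agent` object are not documented here. `findNamingProblems` is a hypothetical helper.

```typescript
// Plain descriptors built from the values you pass to createAgent
// (assumption: braiin's own Agent object fields are not relied on here).
interface ToolDescriptor { tag: string }
interface AgentDescriptor { name: string; tools: ToolDescriptor[] }

// Hypothetical pre-flight check, not part of braiin. Agent names must be
// unique across the orchestrator; tool tags only need to be unique within one agent.
function findNamingProblems(agents: AgentDescriptor[]): string[] {
  const problems: string[] = []
  const seenAgents = new Set<string>()
  for (const agent of agents) {
    if (seenAgents.has(agent.name)) problems.push(`duplicate agent name "${agent.name}"`)
    seenAgents.add(agent.name)
    const seenTags = new Set<string>()
    for (const tool of agent.tools) {
      if (seenTags.has(tool.tag)) problems.push(`duplicate tool tag "${tool.tag}" in "${agent.name}"`)
      seenTags.add(tool.tag)
    }
  }
  return problems
}
```

Since every `Tool` has a `tag`, the tool arrays can be passed as-is, e.g. `{ name: 'user-agent', tools: [userRetrieverTool, userBirthYearTool] }`.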
## Creating the Orchestrator
```ts
import { createOrchestrator } from 'braiin'
const orchestrator = createOrchestrator(
  [userAgent, fileAgent],
  {
    apiKey: process.env.OPENAI_API_KEY!,
    model: 'gpt-4o',
    temperature: 0
  }
)

const result = await orchestrator.executeTask('When was User 1 born?')
```
## OrchestratorConfig — all options
| Option | Default | When to set |
| --- | --- | --- |
| `backend` | `'openai'` | `'openai'` for OpenAI-compatible HTTP API, `'claude-code'` for the local Claude Code CLI. |
| `optionalPrompt` | `undefined` | Extra instructions appended to the system prompt. For tone, domain facts, output constraints, etc. |
| `maxSteps` | `50` | Hard cap on reasoning iterations. Lower it aggressively (5–15) for bounded tasks to fail fast. |
| `stepsInterval` | `undefined` | ms to wait between steps. Useful for rate-limited providers. |
| `timeoutMs` | `60000` | Per-LLM-call timeout. Combined with `signal` via `AbortSignal.any`. |
| `signal` | `undefined` | User-provided `AbortSignal` to cancel in-flight calls. |
| `llmService` | *internal* | Inject a mock `LLMService` for testing — see "Testing". |
| **OpenAI backend** | | (used when `backend` is `'openai'` or omitted) |
| `apiKey` | *required* | API key for the provider. |
| `model` | `'gpt-4o'` | Model ID. Anything the endpoint accepts. |
| `serverUrl` | `'https://api.openai.com/v1'` | Any OpenAI-compatible base URL. |
| `temperature` | `0` | Keep at `0` for reliable protocol adherence. Raise only for creative final answers. |
| `maxTokens` | `8192` | Per-call cap on completion tokens. |
| `enablePromptCaching` | `false` | Enable when the system prompt is large. See "Prompt caching" below. |
| `enforceJsonOutput` | `false` | Passes `response_format: { type: 'json_object' }` to the API. Forces the LLM to emit strict JSON (no prose preamble). Supported by OpenAI, Anthropic (compat), OpenRouter, Groq. **Stays active during streaming** (`onToken`) — the answer is streamed out of the JSON `answer` field, so enforcement and live streaming work together. See "Enforcing JSON output" below. |
| **Claude Code backend** | | (used when `backend` is `'claude-code'`) |
| `sessionId` | *required* | UUID identifying the Claude Code session. Create-or-resume: if the UUID is unknown, a new session is created with that exact ID; if it exists, the session resumes silently. |
| `cliPath` | `'claude'` | Path to the `claude` binary. Override only if not on `PATH`. |
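
The `timeoutMs` row says the per-call timeout is combined with `signal` via `AbortSignal.any`. The sketch below shows what that combination means. `anySignal` and `perCallSignal` are hypothetical names, and `anySignal` is a hand-written stand-in for `AbortSignal.any` (Node 20.3+), not code from `braiin`.

```typescript
// Manual equivalent of AbortSignal.any for runtimes that lack it: the returned
// signal aborts as soon as any input signal aborts, carrying that signal's reason.
function anySignal(signals: AbortSignal[]): AbortSignal {
  const controller = new AbortController()
  for (const signal of signals) {
    if (signal.aborted) {
      controller.abort(signal.reason)
      break
    }
    signal.addEventListener('abort', () => controller.abort(signal.reason), { once: true })
  }
  return controller.signal
}

// Per-call signal as the table describes: a timeout of `timeoutMs`,
// optionally combined with the caller's own cancellation signal.
function perCallSignal(timeoutMs: number, userSignal?: AbortSignal): AbortSignal {
  const timeout = AbortSignal.timeout(timeoutMs)
  return userSignal ? anySignal([userSignal, timeout]) : timeout
}
```

On the caller side, you only own the user half. Create an `AbortController`, pass `controller.signal` as `signal` in the config, and call `controller.abort()` from your cancel button.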
## Local Claude Code backend
Set `backend: 'claude-code'` and pass a `sessionId` (UUID) to use the locally-installed `claude` CLI as the LLM backend instead of an HTTP API:
```ts
import { randomUUID } from 'node:crypto'
import { createOrchestrator } from 'braiin'

const orchestrator = createOrchestrator(
  [userAgent],
  {
    backend: 'claude-code',
    sessionId: randomUUID()
  }
)
```
Each `executeTask` call spawns `claude -p --session-id <uuid> --system-prompt <braiin-prompt> --tools "" --output-format json <prompt>` as a child process and parses the JSON result.
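Reproducing that command by hand can help when debugging a session outside BRAIIN. The sketch below assumes the flags are exactly as documented above. `buildClaudeArgs` and `runClaude` are hypothetical helpers. Unlike `braiin`, which maps failures to `LLMResponse.error`, this version simply rejects. Passing an argv array to `spawn` (no shell) lets `--tools` receive a genuinely empty string without quoting tricks.

```typescript
import { spawn } from 'node:child_process'

// argv for: claude -p --session-id <uuid> --system-prompt <prompt> --tools "" --output-format json <prompt>
function buildClaudeArgs(sessionId: string, systemPrompt: string, prompt: string): string[] {
  return [
    '-p',
    '--session-id', sessionId,
    '--system-prompt', systemPrompt,
    '--tools', '', // empty value: no built-in Claude Code tools
    '--output-format', 'json',
    prompt,
  ]
}

// Spawns the CLI without a shell and parses its JSON stdout. Aborting `signal`
// kills the child with SIGTERM (Node's default killSignal, set explicitly here).
function runClaude(args: string[], signal?: AbortSignal, cliPath = 'claude'): Promise<unknown> {
  return new Promise((resolve, reject) => {
    const child = spawn(cliPath, args, { signal, killSignal: 'SIGTERM' })
    let stdout = ''
    child.stdout.on('data', (chunk: Buffer) => { stdout += chunk.toString() })
    child.on('error', reject) // e.g. ENOENT if the binary is missing, AbortError on abort
    child.on('close', (code) => {
      if (code !== 0) {
        reject(new Error(`claude exited with code ${code}`))
        return
      }
      try {
        resolve(JSON.parse(stdout))
      } catch (err) {
        reject(err)
      }
    })
  })
}
```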
Key properties of this backend:
- **Session memory is owned by the CLI.** The `history` argument passed to `LLMService.ask` is intentionally ignored — Claude Code maintains the conversation transcript on disk under the given session ID. This saves tokens (no history re-injection) and means the same `sessionId` reused across orchestrators picks up where the previous task left off.
- **Tool use is fully disabled** for security (`--tools ""`). Claude Code acts as a pure LLM; the only tool execution path is through your BRAIIN tools.
- **No true token streaming.** The CLI returns the full result at once. Both callbacks still fire but in a single emit (plus the `[[END]]` marker): `logCallback` gets the whole step response, and `onToken` gets the whole final answer at the end — not incremental chunks.
- **`enforceJsonOutput` does not apply.** The CLI has no `response_format` equivalent. The strict JSON discipline relies entirely on the BRAIIN system prompt and the tolerant `extractJson` parser.
- **`enablePromptCaching` does not apply.** Claude Code already caches its session prefix internally.
- **Errors map to `LLMResponse.error`** (not thrown): missing CLI binary, non-zero exit, invalid JSON, timeout/abort. The orchestrator surfaces them as `TaskResult { status: 'error', answer: '...' }` like the HTTP backend.
- **Cancellation**: `signal` and `timeoutMs` both work; they kill the spawned process with SIGTERM. The session state is persisted before exit, so the next call with the same `sessionId` resumes cleanly.

When to use it:
- You want to avoid managing API keys and rate limits during local development.
- The user already has a Claude Code subscription and you don't want to bill API tokens for orchestration.
- You want long-running multi-turn conversations where re-injecting `history` would be wasteful.

When not to use it:
- Production / multi-tenant environments where you cannot guarantee `claude` is on `PATH` everywhere.
- Throughput-sensitive workloads — child-process spawn per turn is significantly slower than an HTTP request.
- Cases where `enforceJsonOutput` is critical (very small/old models that hallucinate prose preambles).
## The LLM protocol (4 actions)
The orchestrator and the LLM exchange one JSON object per turn. Every reply MUST include an `action` field.
```jsonc
// 1. Ask an agent for the schemas of its tools
{ "action": "describe", "agent": "user-agent" }
// 2. Call one of an agent's tools
{ "action": "call", "agent": "user-agent", "tool": "user-retriever", "input": "User 1" }
// or with structured input:
{ "action": "call", "agent": "file-agent", "tool": "write-file",
  "input": { "path": "/tmp/x", "content": "..." } }
// 3. Return the final answer
{ "action": "finish", "answer": "User 1 was born in 2001." }
// 4. Give up with a reason
{ "action": "abort", "reason": "Cannot find any tool that returns birth dates." }
```
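In TypeScript the four canonical shapes map naturally onto a discriminated union. The sketch below is a local model of the protocol, not a type exported by `braiin`. `parseAction` is a hypothetical validator that accepts only the canonical `action` form, so the legacy shapes described next would need extra handling.

```typescript
// Local model of the 4-action protocol (not exported by braiin).
type BraiinAction =
  | { action: 'describe'; agent: string }
  | { action: 'call'; agent: string; tool: string; input: string | Record<string, unknown> }
  | { action: 'finish'; answer: string }
  | { action: 'abort'; reason: string }

function isString(v: unknown): v is string {
  return typeof v === 'string'
}

// Returns a typed action, or null if the object is not a valid canonical reply.
function parseAction(value: unknown): BraiinAction | null {
  if (typeof value !== 'object' || value === null) return null
  const v = value as Record<string, unknown>
  switch (v.action) {
    case 'describe':
      return isString(v.agent) ? { action: 'describe', agent: v.agent } : null
    case 'call': {
      const input = v.input
      const inputOk = isString(input) || (typeof input === 'object' && input !== null && !Array.isArray(input))
      return isString(v.agent) && isString(v.tool) && inputOk
        ? { action: 'call', agent: v.agent, tool: v.tool, input: input as string | Record<string, unknown> }
        : null
    }
    case 'finish':
      return isString(v.answer) ? { action: 'finish', answer: v.answer } : null
    case 'abort':
      return isString(v.reason) ? { action: 'abort', reason: v.reason } : null
    default:
      return null // includes the legacy shapes, which have no `action` field
  }
}
```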
A typical chain: `describe` → `call` → (more calls) → `finish`. The orchestrator also accepts **legacy shapes** (`{"tool":"finished","input":"..."}`, `{"tool":"none","input":"..."}`, `{"agent":"...","tool":"...","input":"..."}`) for backward compatibility, but new integrations should emit the canonical `action` form.
The parser (`extractJson`) tolerates markdown fences and surrounding prose: the LLM does not have to emit pure JSON, but its reply must contain a valid, balanced JSON object somewhere.
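Here is one way to find "a valid, balanced JSON object somewhere" in practice. This is a hypothetical re-implementation, not `braiin`'s `extractJson`, whose exact rules may differ. It tries each `{` in turn, tracks nesting while ignoring braces inside strings, and returns the first candidate that `JSON.parse` accepts.

```typescript
// Hypothetical re-implementation for illustration; not braiin's extractJson.
// Returns the first balanced {...} span that JSON.parse accepts, or undefined.
function extractFirstJsonObject(text: string): unknown {
  for (let start = text.indexOf('{'); start !== -1; start = text.indexOf('{', start + 1)) {
    let depth = 0
    let inString = false
    let escaped = false
    for (let i = start; i < text.length; i++) {
      const ch = text.charAt(i)
      if (inString) {
        // Braces inside strings do not count toward nesting
        if (escaped) escaped = false
        else if (ch === '\\') escaped = true
        else if (ch === '"') inString = false
        continue
      }
      if (ch === '"') inString = true
      else if (ch === '{') depth++
      else if (ch === '}' && --depth === 0) {
        try {
          return JSON.parse(text.slice(start, i + 1))
        } catch {
          break // balanced but not valid JSON: try the next '{'
        }
      }
    }
  }
  return undefined
}
```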
**Streaming the `finish` answer.** The `finish` action is always plain JSON — `{"action":"finish","answer":"..."}` — whether or not you stream. When an `onToken` callback is passed to `executeTask` (see "Observability & streaming"), the orchestrator streams the **value of the `answer` field** as the model writes it, decoding JSON string escapes on the fly. No special wire format and no marker: the protocol is identical to the non-streaming case, so `enforceJsonOutput` can stay on.
## TaskResult
```ts
interface TaskResult {
  status: 'success' | 'error'
  answer: string           // the final answer OR the error message
  toolTraces: ToolTrace[]  // every tool call made during this task
}

interface ToolTrace {
  tool: string                         // tool tag
  input: string | Record<string, any>  // whatever was passed
  result: string                       // whatever `call` returned
}
```
**Always check `status` before using `answer`** — an error message lives in the same field.
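One way to make that check hard to forget is to route results through a small helper. The sketch below uses local copies of the interfaces above, with `unknown` in place of `any`. `unwrapAnswer` is hypothetical. It turns an error result into an exception for callers that prefer `try`/`catch`.

```typescript
// Local copies of the interfaces above (unknown instead of any).
interface ToolTrace {
  tool: string
  input: string | Record<string, unknown>
  result: string
}
interface TaskResult {
  status: 'success' | 'error'
  answer: string
  toolTraces: ToolTrace[]
}

// Hypothetical helper: returns the answer on success and throws on error, so an
// error message can never be mistaken for a real answer.
function unwrapAnswer(result: TaskResult): string {
  if (result.status === 'error') {
    throw new Error(`braiin task failed: ${result.answer}`)
  }
  return result.answer
}
```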
## Persistence via toolTraces (follow-up questions)
Pass prior `toolTraces` to re-use their results without re-fetching:
```ts
const first = await orchestrator.executeTask("Tell me about User 1's wife")
// first.toolTraces holds ONLY the traces produced by this call

const second = await orchestrator.executeTask(
  'Has User 1 been married before?',
  [],               // history (conversation-level LLM messages, optional)
  first.toolTraces  // inject as known context (re-used, not re-fetched)
)
// second.toolTraces holds ONLY the second call's traces.
// Accumulate across turns yourself if you need a running history:
const allTraces = [...first.toolTraces, ...second.toolTraces]
```
`TaskResult.toolTraces` contains **only the traces produced during that call** (since v0.5.0). The `toolTraces` you pass in are injected as a synthetic `system` message (`"Known context from previous interactions: ..."`) so the model reuses their results instead of re-fetching — but they are **not echoed back** into the next result. The caller owns accumulation. (Before v0.5.0 the passed traces were prepended to the returned `toolTraces`, conflating "context in" with "produced out".)
The `history` parameter is distinct: it is a raw `LLMMessage[]` of prior user/assistant exchanges for multi-turn conversations.
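Because the caller owns accumulation, a small wrapper can keep that bookkeeping in one place. The sketch below is minimal. `TraceHistory` is a hypothetical class. The optional cap on retained traces is an added design choice, not a `braiin` option, and it keeps the injected context from growing without bound.

```typescript
interface ToolTrace {
  tool: string
  input: string | Record<string, unknown>
  result: string
}

// Hypothetical accumulator for the traces you feed back into executeTask.
class TraceHistory {
  private traces: ToolTrace[] = []
  private readonly maxTraces: number

  constructor(maxTraces = Infinity) {
    this.maxTraces = maxTraces
  }

  // Everything gathered so far: pass this as the 3rd argument of executeTask.
  get all(): ToolTrace[] {
    return [...this.traces]
  }

  // Append what the latest call produced (braiin >= 0.5.0 returns only those),
  // keeping the newest `maxTraces` entries.
  record(result: { toolTraces: ToolTrace[] }): void {
    this.traces.push(...result.toolTraces)
    if (this.traces.length > this.maxTraces) {
      this.traces = this.traces.slice(-this.maxTraces)
    }
  }
}
```

On each turn: `const r = await orchestrator.executeTask(question, [], traces.all)`, then `traces.record(r)`.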
## Observability & streaming
`executeTask` takes two **independent** callbacks with different jobs — pass either, both, or neither:
```ts
await orchestrator.executeTask(
  prompt,
  history,      // LLMMessage[] (optional)
  toolTraces,   // ToolTrace[] (optional)
  logCallback,  // (log: string) => void; debug stream: raw protocol, every step
  onToken       // (token: string) => void; clean final answer, streamed live
)
```
### `logCallback` (4th arg) — debug stream
Providing `logCallback` switches the LLM call to **streaming mode**. It receives the **raw response of every step** — `describe`, `call`, *and* `finish` — chunk by chunk (plus a `[[END]]` marker per step). This is the *whole JSON protocol*, not a clean answer: use it for tracing/debugging, not for showing to an end user. The orchestrator still receives a full `LLMResponse` once each step's stream closes.
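`logCallback` receives raw chunks from every step, so debugging is often easier once those chunks are regrouped per step. The sketch below assumes, as stated above, that each step ends with the `[[END]]` marker. The marker may arrive split across chunks, so the helper buffers. `createStepCollector` is a hypothetical name.

```typescript
// Hypothetical helper: regroups raw logCallback chunks into one string per
// step, splitting on the [[END]] marker (which may arrive split across chunks).
function createStepCollector(onStep: (raw: string) => void): (log: string) => void {
  const MARKER = '[[END]]'
  let buffer = ''
  return (log: string) => {
    buffer += log
    let idx = buffer.indexOf(MARKER)
    while (idx !== -1) {
      onStep(buffer.slice(0, idx)) // raw protocol text of one finished step
      buffer = buffer.slice(idx + MARKER.length)
      idx = buffer.indexOf(MARKER)
    }
  }
}
```

Pass the result as the 4th argument, e.g. `createStepCollector((raw) => console.debug('[step]', raw))`.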
### `onToken` (5th arg) — final-answer stream
Provide `onToken` to stream **only the final answer**, token by token, exactly as the model writes it — no JSON, no intermediate `describe`/`call` steps. This is what you wire to a chat UI.
```ts
await orchestrator.executeTask('Explain X', [], [], undefined, (t) => process.stdout.write(t))
```
It is **opt-in and costs no extra LLM call**. When `onToken` is set, the orchestrator runs a stateful filter over the streamed deltas that parses the JSON incrementally and forwards **only the characters inside the top-level `answer` field** (escapes decoded on the fly). The `describe`/`call`/`abort` steps have no `answer` key, so they emit nothing to `onToken`; the filter gates itself across the whole chain. The only buffered part is the short `{"action":"finish","answer":"` envelope prefix.

Notes:
- The same final answer is still returned in `TaskResult.answer` after the chain completes — `onToken` is purely additive.
- `onToken` is **fully compatible with `enforceJsonOutput`** (and they are recommended together): the model keeps emitting strict JSON, and only the `answer` value is streamed. Protocol JSON and model reasoning can never leak into the stream.
- On the **Claude Code backend** (non-streaming), `onToken` still fires but delivers the answer in a **single emit** at the end rather than token-by-token. Same for `logCallback`.
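The filter described above can be sketched as a small character-level state machine. This illustrates the technique, not `braiin`'s actual implementation. It assumes the stream is strict protocol JSON with no prose around it, as with `enforceJsonOutput`. `createAnswerFilter` is a hypothetical name.

```typescript
// Sketch only: forwards the decoded characters of the top-level "answer"
// string and nothing else, across arbitrary chunk boundaries.
function createAnswerFilter(emit: (text: string) => void): (delta: string) => void {
  const ESCAPES: Record<string, string> = {
    n: '\n', t: '\t', r: '\r', b: '\b', f: '\f', '"': '"', '\\': '\\', '/': '/',
  }
  let depth = 0                      // current {/[ nesting level
  let inString = false               // inside any JSON string
  let escaped = false                // previous char was a backslash
  let unicode: string | null = null  // hex digits collected after \u
  let readingKey = false             // current string is a top-level key
  let key = ''                       // last top-level key seen
  let afterColon = false             // a top-level value is about to start
  let inAnswer = false               // current string is the "answer" value

  // Route one decoded character to the output or to the key buffer.
  const take = (c: string, out: string[]): void => {
    if (inAnswer) out.push(c)
    else if (readingKey) key += c
  }

  return (delta: string) => {
    const out: string[] = []
    for (let i = 0; i < delta.length; i++) {
      const ch = delta.charAt(i)
      if (inString) {
        if (unicode !== null) {
          unicode += ch
          if (unicode.length === 4) {
            take(String.fromCharCode(parseInt(unicode, 16)), out)
            unicode = null
          }
        } else if (escaped) {
          escaped = false
          if (ch === 'u') unicode = ''
          else take(ESCAPES[ch] ?? ch, out)
        } else if (ch === '\\') {
          escaped = true
        } else if (ch === '"') {
          inString = false
          inAnswer = false
          readingKey = false
        } else {
          take(ch, out)
        }
        continue
      }
      if (ch === '"') {
        inString = true
        if (depth === 1 && afterColon) {
          inAnswer = key === 'answer'
          afterColon = false
        } else if (depth === 1) {
          readingKey = true
          key = ''
        }
      } else if (ch === '{' || ch === '[') {
        depth++
        afterColon = false
      } else if (ch === '}' || ch === ']') {
        depth--
      } else if (depth === 1 && ch === ':') {
        afterColon = true
      } else if (depth === 1 && ch !== ',' && !/\s/.test(ch)) {
        afterColon = false // a non-string value (number, true, null...) started
      }
    }
    if (out.length > 0) emit(out.join(''))
  }
}
```

One filter instance can stay alive across the whole chain. Steps without an `answer` key never produce output, which is the self-gating property described above.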
## Enforcing JSON output (`enforceJsonOutput`)
Some LLMs occasionally prepend prose to their JSON reply ("Sure, to answer your question, I'll first… {…}"). The orchestrator's parser tolerates this, but the preamble still leaks to `logCallback` / any UI that surfaces raw responses.
Turn on `enforceJsonOutput` and the orchestrator passes `response_format: { type: 'json_object' }` to the API. The provider will refuse to emit anything that isn't a valid JSON object — no preamble possible. It applies on every step, **including while streaming** (`onToken`): the answer is read out of the JSON `answer` field, so enforcement and live streaming compose cleanly.
Support per provider:
- **OpenAI**: native, stable on `gpt-4o`, `gpt-4o-mini`, `gpt-4-turbo`, `gpt-3.5-turbo-0125+`.
- **Anthropic (OpenAI-compat endpoint)**: accepted, mapped to their JSON mode on Claude 3.5+.
- **OpenRouter, Groq, Together**: forward the flag to the underlying model when supported.
- **Local/old providers (vLLM, Ollama, older endpoints)**: may reject the unknown field; keep the flag off in that case.

Caveat: OpenAI's JSON mode requires the word "JSON" to appear somewhere in the messages, and other providers may impose the same rule. BRAIIN's system prompt already mentions it repeatedly, so this is satisfied.
## Prompt caching (`enablePromptCaching`)
The system prompt (agent descriptions + protocol) is identical across every step of a chain. Turn on `enablePromptCaching` and the orchestrator will send the system message as a content-block with `cache_control: { type: 'ephemeral' }`:
- **Anthropic (native or OpenAI-compat)**: reads the marker and caches the prefix once it reaches 1024 tokens (2048 on Haiku models).
- **OpenAI**: ignores the marker but applies automatic prefix caching at the same 1024-token threshold, so you get comparable savings with no side effects.
- **Other OpenAI-compat providers**: usually ignore the marker silently.

Rule of thumb: enable it whenever your agent and tool descriptions push the system prompt over ~1k tokens, or when chains routinely exceed 5 steps.
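This sketch shows the two system-message shapes described above: a plain string, and a single text content block carrying `cache_control`. `buildSystemMessage` and `worthCaching` are hypothetical helpers. The 4-characters-per-token ratio is a rough heuristic for English text, not a tokenizer.

```typescript
// Shapes as described above: a plain string, or one text block with cache_control.
interface CachedTextBlock {
  type: 'text'
  text: string
  cache_control: { type: 'ephemeral' }
}
interface SystemMessage {
  role: 'system'
  content: string | CachedTextBlock[]
}

// Hypothetical helper: what the system message looks like with the flag off/on.
function buildSystemMessage(prompt: string, enablePromptCaching: boolean): SystemMessage {
  if (!enablePromptCaching) return { role: 'system', content: prompt }
  return {
    role: 'system',
    content: [{ type: 'text', text: prompt, cache_control: { type: 'ephemeral' } }],
  }
}

// Rough check for the ~1k-token rule of thumb. Assumes ~4 characters per token,
// a crude average for English text; use a real tokenizer for exact counts.
function worthCaching(prompt: string, minTokens = 1024): boolean {
  return prompt.length / 4 >= minTokens
}
```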
## Best practices
- **Start `temperature` at `0`.** The protocol is strict; creativity hurts it. Raise only if the final answer needs flair.
- **Lower `maxSteps` aggressively.** The default `50` is a ceiling. Use `10` or fewer for focused tasks so errors surface faster.
- **Make tool `description`/`input`/`output` LLM-facing, not dev-facing.** These strings ARE the prompt. Specify exact input shape, return shape (JSON? plain text?), and failure modes.
- **Keep tool outputs short.** The full output goes back into the history on the next step. Summarize or truncate oversized payloads (e.g. paginate, return counts + samples).
- **One agent per domain.** Avoid one mega-agent with 30 tools — narrow bundles help the LLM pick correctly.
- **Unique tool tags per agent.** Duplicates across different agents are fine; duplicates inside one agent are a bug.
- **Always handle `status === 'error'`.** Never assume success.
- **For long-running tasks, wire `signal`** (via `AbortController`) to cancellation UI.
- **Use `optionalPrompt` for global constraints.** Output language, forbidden topics, JSON schemas for the final answer, etc.
- **Point `serverUrl` at any OpenAI-compatible endpoint** — don't pull in a second SDK for Anthropic, OpenRouter, Ollama, vLLM, etc.
- **Stream the final answer to UIs with `onToken`** (5th arg), not `logCallback`. `logCallback` is the raw multi-step protocol for debugging; `onToken` is the clean final answer.
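Two small helpers cover the common cases behind the "keep tool outputs short" advice. For collections, report a count plus a sample. For free text, apply a hard cap with an explicit marker. Both are hypothetical sketches, and the default sizes are arbitrary; tune them to your model's context budget.

```typescript
// Hypothetical helpers (not part of braiin) for keeping tool outputs small.

// Collections: report the total plus a few sample rows instead of everything.
function summarizeRows<T>(rows: T[], sampleSize = 5): string {
  return JSON.stringify({
    total: rows.length,
    sample: rows.slice(0, sampleSize),
    truncated: rows.length > sampleSize,
  })
}

// Free text: hard cap with a visible marker so the model knows text was cut.
function truncateOutput(text: string, maxChars = 4000): string {
  if (text.length <= maxChars) return text
  return `${text.slice(0, maxChars)}\n[truncated: ${text.length - maxChars} more chars]`
}
```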
## Testing
Inject a mock `LLMService` via `config.llmService` to bypass the real API:
```ts
import { LLMService } from 'braiin/dist/service/llm.service' // internal path

// or re-create the interface locally:
interface MockLLMService {
  ask: (sys: string, prompt: string, history: any[]) => Promise<any>
}

const mockLLM: MockLLMService = {
  ask: async () => ({
    id: 't', object: 'chat.completion', created: 0, model: 'mock',
    choices: [{
      index: 0,
      message: { role: 'assistant', content: '{"action":"finish","answer":"ok"}' },
      finish_reason: 'stop'
    }]
  })
}

const o = createOrchestrator([agent], { apiKey: '', llmService: mockLLM as any })
```
Script the `ask` function to return successive JSON responses to drive the chain through every branch you want to test.
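A reusable version of that idea is a mock whose `ask` replays a script of protocol replies in order. It fails loudly if the orchestrator asks for more steps than the script covers. The response shape copies the mock above. `scriptedLLM` is hypothetical. Whether `braiin`'s real `LLMService` expects exactly this shape is an assumption worth checking against the types in `node_modules/braiin/dist`.

```typescript
// Response shape copied from the mock above (assumption: this is what the
// orchestrator reads; braiin's real LLMService types may differ).
interface CompletionLike {
  id: string
  object: 'chat.completion'
  created: number
  model: string
  choices: {
    index: number
    message: { role: 'assistant'; content: string }
    finish_reason: 'stop'
  }[]
}

// Hypothetical helper: ask() returns the scripted protocol replies in order and
// throws once the script runs out, so an unexpected extra step fails the test.
function scriptedLLM(replies: object[]) {
  const prompts: string[] = [] // every prompt the orchestrator sent, for assertions
  let step = 0
  return {
    prompts,
    ask: async (_sys: string, prompt: string, _history: unknown[]): Promise<CompletionLike> => {
      prompts.push(prompt)
      const reply = replies[step]
      step++
      if (reply === undefined) throw new Error(`script exhausted at step ${step}`)
      return {
        id: `mock-${step}`,
        object: 'chat.completion',
        created: 0,
        model: 'mock',
        choices: [{
          index: 0,
          message: { role: 'assistant', content: JSON.stringify(reply) },
          finish_reason: 'stop',
        }],
      }
    },
  }
}
```

Wiring follows the mock above, e.g. `llmService: scriptedLLM([{ action: 'describe', agent: 'user-agent' }, { action: 'finish', answer: 'ok' }]) as any`.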
## Common pitfalls
- **Tool returns non-string.** `call` must stringify. Returning an object silently breaks the next step's LLM input.
- **Tool description written for humans.** The LLM needs imperative schemas, not marketing copy.
- **Forgetting `status` check.** Errors surface in `answer`, not via `throw`.
- **Very long tool outputs.** They compound across steps and blow up token bills. Summarize.
- **Setting `temperature > 0.3`.** The protocol breaks quickly as temperature rises.
- **Passing `history` when you meant `toolTraces`.** Remember the signature: `executeTask(prompt, history?, toolTraces?, logCallback?, onToken?)`.
- **Confusing the two callbacks.** `logCallback` (4th) streams the raw JSON of *every* step; `onToken` (5th) streams *only* the clean final answer. For a chat UI you want `onToken`.
- **Expecting a clean answer stream from `logCallback`.** It emits the whole protocol (describe/call/finish), not just the answer — use `onToken` for that.