# verifiers-ts
TypeScript implementation of the verifiers framework for building RL environments and evaluations with AI SDK integration.
## Overview
`verifiers-ts` provides the same core functionality as the Python `verifiers` library, enabling you to:
- Define custom interaction protocols between models and environments
- Build agents, multi-turn conversations, tool-augmented reasoning, and interactive games
- Create reusable evaluation environments with multi-criteria reward functions
- Integrate with [AI SDK](https://sdk.vercel.ai/docs) for model inference and native tool calling
## Installation
```bash
npm install verifiers-ts
```
Or if developing locally:
```bash
cd verifiers-ts
npm install
npm run build
```
## Quick Start
### Scaffold a Minimal RL Environment
```bash
pnpm dlx verifiers-ts vf-init weather-bot --minimal-rl
cd weather-bot
pnpm install
pnpm build
pnpm vf-eval -n 1 -r 1
```
This template scaffolds a tool-enabled agent, a tiny dataset, and a reward built with `structuredOutputReward`. Replace the prompt, tweak the agent defaults, and you're ready to evaluate. Remember to export `OPENAI_API_KEY` (or pass `--api-key` to `vf-eval`).
### Scaffold an Environment
```bash
pnpm dlx verifiers-ts vf-init my-environment
cd my-environment
pnpm install
pnpm build
pnpm vf-eval -n 1 -r 1
```
Customize the generated `src/index.ts`, dataset, and reward functions to match your task.
> `vf-eval` automatically compiles your TypeScript, provisions a local `.vf-eval/` virtualenv, and exposes the environment to Python tooling—no manual `uv sync` required.
> Provide `OPENAI_API_KEY` (or another provider key) so the default agent can make model calls.
### Minimal RL Environment
```typescript
import { generateText, tool } from "ai";
import { z } from "zod";
import { openai } from "@ai-sdk/openai";
import { createRLEnvironment } from "verifiers-ts";

const getCurrentWeather = tool({
  description: "Get the current weather for a specific location.",
  parameters: z.object({
    location: z
      .string()
      .describe("City and state, for example: Seattle, WA"),
    unit: z
      .enum(["celsius", "fahrenheit"])
      .describe("Temperature unit to return.")
      .optional(),
  }),
  execute: async ({ location, unit }) => {
    const preferredUnit = unit ?? "celsius";
    const temperature = preferredUnit === "celsius" ? 18 : 64;
    return `It is ${temperature}°${preferredUnit === "celsius" ? "C" : "F"} and sunny in ${location}.`;
  },
});

const weatherAgent = {
  generateText: (messages: any, options: Record<string, unknown> = {}) => {
    const { tools = {}, ...rest } = options as {
      tools?: Record<string, ReturnType<typeof tool>>;
    };
    return generateText({
      model: openai("gpt-4o-mini") as any,
      system:
        "You are WeatherBot. When a user asks about the weather, call the getCurrentWeather tool and report the results clearly.",
      temperature: 0,
      tools: { getCurrentWeather, ...tools },
      messages,
      ...rest,
    });
  },
  tools: { getCurrentWeather },
};

const env = await createRLEnvironment({
  agent: weatherAgent,
  dataset: [
    {
      prompt: [
        {
          role: "user",
          content: "What's the weather like in Seattle right now?",
        },
      ],
      answer: "seattle",
    },
  ],
  rewardFunction: (completion, answer) => {
    const text = Array.isArray(completion)
      ? completion
          .filter(
            (msg) =>
              typeof msg === "object" &&
              msg !== null &&
              "role" in msg &&
              msg.role === "assistant"
          )
          .map((msg) => (msg as { content?: string }).content ?? "")
          .join(" ")
      : typeof completion === "string"
        ? completion
        : "";
    const normalized = text.toLowerCase();
    return normalized.includes(answer) && normalized.includes("weather") ? 1 : 0;
  },
});
```
### Single-Turn Environment
```typescript
import { SingleTurnEnv, Rubric, Parser } from "verifiers-ts";

// Minimal helper: pull assistant text out of a string or chat-message completion.
function extractText(completion: any): string {
  if (typeof completion === "string") return completion;
  if (Array.isArray(completion)) {
    return completion
      .filter((msg) => msg?.role === "assistant")
      .map((msg) => msg.content ?? "")
      .join(" ");
  }
  return "";
}

function correctAnswer(params: {
  completion: any;
  answer: string;
}): number {
  const text = extractText(params.completion);
  return text.trim() === params.answer.trim() ? 1.0 : 0.0;
}

const rubric = new Rubric({
  funcs: [correctAnswer],
  weights: [1.0],
});

// Example dataset in the same prompt/answer shape used by the minimal RL example above.
const myDataset = [
  { prompt: [{ role: "user", content: "What is 2 + 2?" }], answer: "4" },
];

const env = new SingleTurnEnv({
  dataset: myDataset,
  systemPrompt: "Solve step by step",
  rubric,
});

const results = await env.evaluate(
  "gpt-4",
  {},
  10, // numExamples
  1, // rolloutsPerExample
  true, // scoreRollouts
  32, // maxConcurrent
  undefined, // maxConcurrentGeneration
  undefined, // maxConcurrentScoring
  process.env.OPENAI_API_KEY
);
```
### Tool-Using Environment
```typescript
import { ToolEnv, defineTool } from "verifiers-ts";
import { z } from "zod";

const calculator = defineTool(
  "calculate",
  "Perform arithmetic",
  z.object({
    expression: z.string(),
  }),
  async (args) => {
    return eval(args.expression); // Use proper parser in production
  }
);

const env = new ToolEnv({
  tools: [calculator],
  maxTurns: 10,
});

// AI SDK automatically handles tool calling loop
const results = await env.evaluate("gpt-4", {}, 10);
```
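The inline `eval` is only a placeholder. Below is a minimal sketch of a safer replacement, a hypothetical `safeCalculate` helper (not part of verifiers-ts) that parses basic arithmetic instead of executing arbitrary code:
```typescript
// Hypothetical safe replacement for eval(): a tiny recursive-descent evaluator
// supporting +, -, *, /, parentheses, and decimal numbers.
function safeCalculate(expression: string): number {
  let pos = 0;
  const peek = () => expression[pos];
  const skipWs = () => {
    while (peek() === " ") pos++;
  };

  function parseNumber(): number {
    skipWs();
    if (peek() === "(") {
      pos++; // consume "("
      const value = parseExpr();
      skipWs();
      if (peek() !== ")") throw new Error("Expected ')'");
      pos++; // consume ")"
      return value;
    }
    const start = pos;
    if (peek() === "-") pos++; // unary minus
    while (/[0-9.]/.test(peek() ?? "")) pos++;
    const token = expression.slice(start, pos);
    if (token === "" || Number.isNaN(Number(token))) {
      throw new Error(`Unexpected token at position ${start}`);
    }
    return Number(token);
  }

  function parseTerm(): number {
    let value = parseNumber();
    skipWs();
    while (peek() === "*" || peek() === "/") {
      const op = expression[pos++];
      const rhs = parseNumber();
      value = op === "*" ? value * rhs : value / rhs;
      skipWs();
    }
    return value;
  }

  function parseExpr(): number {
    let value = parseTerm();
    skipWs();
    while (peek() === "+" || peek() === "-") {
      const op = expression[pos++];
      const rhs = parseTerm();
      value = op === "+" ? value + rhs : value - rhs;
      skipWs();
    }
    return value;
  }

  const result = parseExpr();
  skipWs();
  if (pos !== expression.length) throw new Error("Unexpected trailing input");
  return result;
}
```
Swapping it into the tool body (`return safeCalculate(args.expression);`) keeps the example self-contained without pulling in a math library.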
## Architecture
The library mirrors the Python verifiers structure:
- **Environments**: Base `Environment` class with `MultiTurnEnv`, `SingleTurnEnv`, `ToolEnv`, `StatefulToolEnv`, and `SandboxEnv` variants
- **Rubrics**: Weighted reward functions for evaluation (see the sketch after this list)
- **Parsers**: Extract structured information (`Parser`, `ThinkParser`, `XMLParser`)
- **Tools**: Native AI SDK tool integration using `tool()` function from 'ai' package
- **AI SDK Integration**: Uses `generateText` for model calls and automatic tool calling
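A rubric can combine several criteria into one score. A minimal sketch, reusing the `Rubric` constructor shown in the Single-Turn example and assuming scores combine as a weighted sum as in the Python library; the two reward functions are hypothetical:
```typescript
import { Rubric } from "verifiers-ts";

// Hypothetical criteria: exact-match correctness plus a brevity bonus.
function exactMatch(params: { completion: any; answer: string }): number {
  const text = typeof params.completion === "string" ? params.completion : "";
  return text.trim() === params.answer.trim() ? 1.0 : 0.0;
}

function brevity(params: { completion: any; answer: string }): number {
  const text = typeof params.completion === "string" ? params.completion : "";
  return text.length <= 200 ? 1.0 : 0.0;
}

// Assumed final reward: 0.8 * exactMatch + 0.2 * brevity.
const rubric = new Rubric({
  funcs: [exactMatch, brevity],
  weights: [0.8, 0.2],
});
```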
## Key Features
### AI SDK Integration
- **Native Tool Calling**: Tools use AI SDK's `tool()` function with Zod schemas
- **Automatic Loop Handling**: AI SDK manages tool execution loops with `stopWhen` conditions
- **Type-Safe Tools**: Zod schemas provide runtime validation and TypeScript types
- **Structured Outputs**: Support for `generateObject` when needed
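Structured outputs follow the standard AI SDK `generateObject` pattern; a minimal sketch (the model name and schema are illustrative):
```typescript
import { generateObject } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

// Ask the model for a typed object instead of free-form text.
const { object } = await generateObject({
  model: openai("gpt-4o-mini"),
  schema: z.object({
    answer: z.string(),
    confidence: z.number().min(0).max(1),
  }),
  prompt: "What is the capital of France? Reply with an answer and a confidence score.",
});

console.log(object.answer, object.confidence);
```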
### Compatibility
- **Results Format**: Saves results in JSONL format compatible with Python `vf-tui`
- **Native TypeScript Evaluation**: TypeScript projects use native `vf-eval` CLI (no Python bridge needed)
- **Native Sandbox Client**: Direct HTTP API integration with Prime Intellect sandboxes (no Python dependencies)
- **State Management**: Same state structure as Python verifiers
## Environment Types
### SingleTurnEnv
For Q&A tasks requiring a single model response.
### MultiTurnEnv
Base class for custom interaction protocols. Override `is_completed` and `env_response`.
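A rough sketch of a custom protocol, assuming `is_completed` and `env_response` take the message list and state, and that `env_response` returns the environment's next messages plus updated state; check the base class for the exact signatures:
```typescript
import { MultiTurnEnv } from "verifiers-ts";

// Illustrative guessing game: the environment replies after each assistant
// turn and ends the rollout once the answer appears. Signatures are assumed.
class GuessingGameEnv extends MultiTurnEnv {
  async is_completed(messages: any[], state: any): Promise<boolean> {
    const last = messages[messages.length - 1];
    return (
      last?.role === "assistant" &&
      typeof last.content === "string" &&
      last.content.toLowerCase().includes(state.answer)
    );
  }

  async env_response(messages: any[], state: any): Promise<[any[], any]> {
    // Return the environment's next message(s) plus the updated state.
    return [[{ role: "user", content: "Not yet - guess again." }], state];
  }
}
```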
### ToolEnv
Uses AI SDK's native tool calling. Tools are defined with `defineTool()` and automatically handled by AI SDK.
### StatefulToolEnv
Extends `ToolEnv` for tools requiring dynamic state (e.g., sandbox IDs).
### SandboxEnv
Abstract base for Prime Intellect sandbox integration.
## Evaluation
TypeScript environments are evaluated natively using the TypeScript `vf-eval` CLI:
```bash
npx vf-eval hangman -n 5 -r 1
```
The CLI automatically:
- Detects TypeScript projects (those with `package.json` containing `verifiers.envId` but no `pyproject.toml`)
- Uses native TypeScript evaluation implementation
- Saves results in compatible JSONL format for `vf-tui`
For Python projects, `vf-eval` delegates to the Python `verifiers` CLI.
## Sandbox Support
Sandbox environments (using `SandboxEnv`) use native TypeScript HTTP client to interact with Prime Intellect sandboxes. No Python dependencies required.
**Configuration:**
- Set `PRIME_INTELLECT_API_KEY` or `PRIME_API_KEY` environment variable
- Optional: Set `PRIME_INTELLECT_API_URL` (default: `https://api.primeintellect.ai`)
- Optional: Set `PRIME_INTELLECT_TEAM_ID` for team-scoped sandboxes
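A small pre-flight check (illustrative, not part of the library) can surface missing configuration before a run; the variable names mirror the list above:
```typescript
// Fail fast if sandbox credentials are missing.
const apiKey =
  process.env.PRIME_INTELLECT_API_KEY ?? process.env.PRIME_API_KEY;
if (!apiKey) {
  throw new Error(
    "Set PRIME_INTELLECT_API_KEY or PRIME_API_KEY before running sandbox environments."
  );
}

const apiUrl =
  process.env.PRIME_INTELLECT_API_URL ?? "https://api.primeintellect.ai";
const teamId = process.env.PRIME_INTELLECT_TEAM_ID; // optional, for team-scoped sandboxes
```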
## Examples
See `environments/` directory for example implementations:
- `example-single-turn`: Basic Q&A environment
- `example-tool-use`: Tool calling with AI SDK
## Development
This workspace uses [Turborepo](https://turbo.build) for task orchestration and caching. Use `turbo run` commands to build all packages with automatic dependency resolution and caching.
```bash
# Install dependencies
pnpm install
# Build all packages (core + environments)
pnpm turbo run build
# Build a specific environment
pnpm turbo run build --filter hangman
# Run tests
pnpm turbo run test
# Lint all packages
pnpm turbo run lint
# Format code
pnpm turbo run format
# Watch mode (runs all dev tasks in parallel)
pnpm turbo run dev --parallel
# Watch a specific environment
pnpm turbo run dev --parallel --filter hangman
```
### Turbo Features
- **Task Dependencies**: Builds automatically respect workspace dependencies (`dependsOn: ["^build"]`)
- **Local Caching**: Build outputs are cached locally for faster rebuilds
- **Parallel Execution**: Dev tasks run in parallel across packages
- **Filtering**: Use `--filter <package-name>` to target specific packages
For remote caching (CI/CD), set `TURBO_TEAM` and `TURBO_TOKEN` environment variables.
## Status
✅ **Core Complete** - All base classes and AI SDK integration implemented
🔄 **In Progress** - Python bridge refinement
📝 **Pending** - Comprehensive tests and examples
## License
MIT