# LLM Provider Support
AgentRails supports multiple LLM providers for evaluating your agent's responses. You can use any provider regardless of which LLM your agent uses!
## Supported Providers
### OpenAI
**Models:** GPT-4 Turbo, GPT-4, GPT-3.5 Turbo
```javascript
module.exports = {
  llm: {
    provider: "openai",
    apiKey: process.env.OPENAI_API_KEY,
    model: "gpt-4-turbo-preview", // optional, default
    temperature: 0.3, // optional, default
  },
  agent: async (input) => {
    /* your agent */
  },
  tests: [
    /* your tests */
  ],
};
```
**Setup:**
```bash
npm install openai
export OPENAI_API_KEY="sk-..."
```
### Anthropic Claude
**Models:** Claude 3.5 Sonnet, Claude 3 Opus, Claude 3 Sonnet
```javascript
module.exports = {
  llm: {
    provider: "anthropic",
    apiKey: process.env.ANTHROPIC_API_KEY,
    model: "claude-3-5-sonnet-20241022", // optional, default
    temperature: 0.3, // optional
  },
  agent: async (input) => {
    /* your agent */
  },
  tests: [
    /* your tests */
  ],
};
```
**Setup:**
```bash
npm install @anthropic-ai/sdk
export ANTHROPIC_API_KEY="sk-ant-..."
```
### Google Gemini
**Models:** Gemini Pro, Gemini Pro Vision
```javascript
module.exports = {
  llm: {
    provider: "google",
    apiKey: process.env.GOOGLE_API_KEY,
    model: "gemini-pro", // optional, default
    temperature: 0.3, // optional
  },
  agent: async (input) => {
    /* your agent */
  },
  tests: [
    /* your tests */
  ],
};
```
**Setup:**
```bash
npm install @google/generative-ai
export GOOGLE_API_KEY="..."
```
### Grok (xAI)
**Models:** Grok Beta
```javascript
module.exports = {
  llm: {
    provider: "grok",
    apiKey: process.env.XAI_API_KEY,
    model: "grok-beta", // optional, default
    temperature: 0.3, // optional
    baseURL: "https://api.x.ai/v1", // optional, default
  },
  agent: async (input) => {
    /* your agent */
  },
  tests: [
    /* your tests */
  ],
};
```
**Setup:**
```bash
# Grok uses the OpenAI SDK
npm install openai
export XAI_API_KEY="..."
```
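Because the `grok` provider goes through the OpenAI SDK, it essentially amounts to pointing an OpenAI client at xAI's OpenAI-compatible endpoint. The snippet below is a rough sketch of that idea, not AgentRails' exact internals:

```javascript
// Sketch: an OpenAI SDK client aimed at xAI's OpenAI-compatible endpoint.
// This mirrors what the "grok" provider does; the library's internals may differ.
const OpenAI = require("openai");

const grokClient = new OpenAI({
  apiKey: process.env.XAI_API_KEY,
  baseURL: "https://api.x.ai/v1",
});

// Chat requests then target grok-beta instead of an OpenAI model, e.g.:
// await grokClient.chat.completions.create({ model: "grok-beta", messages: [...] });
```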
## Choosing a Provider
### When to use OpenAI
- **Best for:** General-purpose evaluation
- **Pros:** Well-documented, stable API; excellent at following JSON format; reliable and fast
- **Cons:** More expensive than alternatives
### When to use Anthropic
- **Best for:** Long context evaluations, detailed reasoning
- **Pros:** Excellent reasoning, large context window (200k tokens), good at nuanced evaluation
- **Cons:** Slightly slower; requires a separate SDK
### When to use Google Gemini
- **Best for:** Cost-effective evaluation, multimodal inputs
- **Pros:** Free tier available, fast, good for image inputs
- **Cons:** Newer; less consistent at producing parseable JSON
### When to use Grok
- **Best for:** Latest news/current events evaluation
- **Pros:** Access to real-time information, X integration
- **Cons:** Beta stage, limited availability
## Cost Comparison
Approximate costs per 1M tokens (input/output):
| Provider | Model | Input | Output |
| --------- | ----------------- | --------------------- | ------ |
| OpenAI | GPT-4 Turbo | $10 | $30 |
| OpenAI | GPT-3.5 Turbo | $0.50 | $1.50 |
| Anthropic | Claude 3.5 Sonnet | $3 | $15 |
| Google | Gemini Pro | Free tier, then $0.50 | $1.50 |
| Grok | Grok Beta | TBD | TBD |
## Best Practices
1. **Match model to task complexity:**
   - Simple pass/fail: GPT-3.5 Turbo or Gemini Pro
   - Nuanced evaluation: GPT-4 Turbo or Claude 3.5 Sonnet
2. **Use different providers for redundancy** (see the sketch after this list):
   ```javascript
   // Run the same tests with multiple evaluators
   const providers = ["openai", "anthropic", "google"];
   ```
3. **Set temperature low (0.1-0.3):**
   - Low temperature = more consistent evaluation
   - High temperature = more creative but less reliable
4. **Your agent can use a different LLM:**
   ```javascript
   // Agent uses Claude, evaluator uses GPT-4
   module.exports = {
     llm: { provider: "openai", apiKey: process.env.OPENAI_API_KEY },
     agent: async (input) => {
       // Your agent calls Claude internally
       return await yourClaudeAgent.chat(input);
     },
   };
   ```
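Expanding on point 2, one way to reuse a single agent and test suite across several evaluator providers is to build one config per provider. This is only a sketch: `myAgent`, `myTests`, and `envKeys` are placeholder names (not AgentRails APIs), and how you run each resulting config (for example, one config file per provider) depends on your setup.

```javascript
// Sketch: one evaluation config per evaluator provider.
// myAgent, myTests, and envKeys are placeholders, not part of AgentRails.
const myAgent = async (input) => `echo: ${input}`; // your real agent goes here
const myTests = [
  /* your tests */
];

const envKeys = {
  openai: "OPENAI_API_KEY",
  anthropic: "ANTHROPIC_API_KEY",
  google: "GOOGLE_API_KEY",
};

const configs = Object.keys(envKeys).map((provider) => ({
  llm: { provider, apiKey: process.env[envKeys[provider]] },
  agent: myAgent, // same agent under test
  tests: myTests, // same test suite
}));
```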
## Adding a New Provider
To add a new provider:
1. Implement the `LLMEvaluator` interface in `src/evaluator.ts`
2. Add the provider to the `LLMProvider` type in `src/types.ts`
3. Update the `createEvaluator` factory function
4. Add tests in `tests/evaluator.test.ts`
Example:
```typescript
export class CustomEvaluator implements LLMEvaluator {
  async evaluate(
    input: string | Record<string, any>,
    actualResponse: string | Record<string, any>,
    expectedBehavior?: string,
    exampleResponses?: string[]
  ): Promise<{ passed: boolean; reasoning: string }> {
    // Your implementation
  }
}
```
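For steps 2 and 3, you add your provider id to the `LLMProvider` union in `src/types.ts` and route it through the factory. The sketch below (shown without TypeScript annotations, with an assumed provider id `"custom"` and a simplified signature; the real `createEvaluator` lives in `src/evaluator.ts` and may differ) illustrates the routing step:

```javascript
// Sketch of step 3: route the new provider id through the factory.
// "custom" is a hypothetical id added to the LLMProvider union in step 2.
function createEvaluator(config) {
  switch (config.provider) {
    case "custom":
      return new CustomEvaluator(/* config.apiKey, config.model, ... */);
    // ...existing cases: "openai", "anthropic", "google", "grok"...
    default:
      throw new Error(`Unknown LLM provider: ${config.provider}`);
  }
}
```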
Pull requests welcome!