# Multi-LLM 🤖
A unified TypeScript/JavaScript package to use LLMs across **ALL** platforms with support for streaming, MCP tools, and intelligent response parsing.
## Features
- **🌐 Universal Provider Support**: **17 major providers**, including OpenAI, Anthropic, Google Gemini, Cohere, Mistral AI, Together AI, Fireworks AI, OpenRouter, Groq, Cerebras, Ollama, Azure OpenAI, Perplexity, DeepInfra, Replicate, Hugging Face, and AWS Bedrock
- **⚡ Streaming & Non-Streaming**: Real-time streaming or batch processing
- **🔄 Intelligent Retry System**: Exponential backoff retry logic for handling API failures and rate limits
- **🧠 Smart Response Parsing**: Automatic extraction of code blocks, thinking sections, and structured content
- **🔧 MCP Integration**: Add Model Context Protocol tools to enhance capabilities
- **📘 TypeScript Support**: Full type definitions and IntelliSense
- **🎯 Unified API**: Same interface across all 17 providers
- **🧪 Smart Testing**: Conditional tests run only for configured providers
## Installation
```bash
npm install multi-llm
```
## Quick Start
```typescript
import { MultiLLM } from 'multi-llm';
// Create a provider
const provider = MultiLLM.createProvider('openai', 'your-api-key');
// Get available models
const models = await provider.getModels();
console.log(models);
// Create LLM instance
const llm = provider.createLLM('gpt-4o-mini');
// Non-streaming chat
const result = await llm.chat('What is the capital of France?', {
temperature: 0.7,
maxTokens: 100,
system: 'You are a helpful geography assistant'
});
console.log(result.parsed.content);
// Chat with retry configuration for handling API failures
const robustResult = await llm.chat('What is the capital of France?', {
temperature: 0.7,
maxTokens: 100,
system: 'You are a helpful geography assistant',
retries: 3, // Retry up to 3 times on failure (default: 1)
retryInterval: 1000, // Initial retry delay: 1 second (default: 1000ms)
retryBackoff: 2 // Exponential backoff multiplier (default: 2)
});
console.log(robustResult.parsed.content);
// Streaming chat
const streamResult = await llm.chat('Tell me a story', {
temperature: 1.0
}, (chunk) => {
process.stdout.write(chunk); // Real-time streaming
});
```
## Supported Providers
### OpenAI
```typescript
const provider = MultiLLM.createProvider('openai', 'sk-...');
const llm = provider.createLLM('gpt-4o-mini');
```
### Anthropic
```typescript
const provider = MultiLLM.createProvider('anthropic', 'sk-ant-...');
const llm = provider.createLLM('claude-3-5-sonnet-20241022');
```
### OpenRouter
```typescript
const provider = MultiLLM.createProvider('openrouter', 'sk-or-...');
const llm = provider.createLLM('microsoft/wizardlm-2-8x22b');
```
### Groq
```typescript
const provider = MultiLLM.createProvider('groq', 'gsk_...');
const llm = provider.createLLM('llama3-70b-8192');
```
### Cerebras
```typescript
const provider = MultiLLM.createProvider('cerebras', 'csk-...');
const llm = provider.createLLM('llama3.1-70b');
```
### Ollama (Local)
```typescript
const provider = MultiLLM.createProvider('ollama', '', 'http://localhost:11434');
const llm = provider.createLLM('llama3.2');
```
### Azure OpenAI
```typescript
const provider = MultiLLM.createProvider('azure', 'your-api-key', 'https://your-resource.openai.azure.com');
const llm = provider.createLLM('your-deployment-name');
```
### Google Gemini
```typescript
const provider = MultiLLM.createProvider('google', 'your-api-key');
const llm = provider.createLLM('gemini-2.5-pro');
```
### Cohere
```typescript
const provider = MultiLLM.createProvider('cohere', 'your-api-key');
const llm = provider.createLLM('command-r-plus');
```
### Mistral AI
```typescript
const provider = MultiLLM.createProvider('mistral', 'your-api-key');
const llm = provider.createLLM('mistral-large-latest');
```
### Together AI
```typescript
const provider = MultiLLM.createProvider('together', 'your-api-key');
const llm = provider.createLLM('meta-llama/Llama-3.2-3B-Instruct-Turbo');
```
### Fireworks AI
```typescript
const provider = MultiLLM.createProvider('fireworks', 'your-api-key');
const llm = provider.createLLM('accounts/fireworks/models/llama-v3p1-70b-instruct');
```
### Perplexity
```typescript
const provider = MultiLLM.createProvider('perplexity', 'your-api-key');
const llm = provider.createLLM('llama-3.1-sonar-large-128k-online');
```
### DeepInfra
```typescript
const provider = MultiLLM.createProvider('deepinfra', 'your-api-key');
const llm = provider.createLLM('meta-llama/Meta-Llama-3.1-8B-Instruct');
```
### Replicate
```typescript
const provider = MultiLLM.createProvider('replicate', 'your-api-key');
const llm = provider.createLLM('meta/llama-2-70b-chat');
```
### Hugging Face
```typescript
const provider = MultiLLM.createProvider('huggingface', 'your-api-key');
const llm = provider.createLLM('mistralai/Mixtral-8x7B-Instruct-v0.1');
```
### Amazon Bedrock
```typescript
const provider = MultiLLM.createProvider('bedrock', 'accessKeyId:secretAccessKey');
const llm = provider.createLLM('anthropic.claude-3-5-sonnet-20241022-v2:0');
```
## Response Structure
Every chat response includes:
```typescript
interface ChatResult {
raw: any; // Raw provider response
parsed: {
content: string; // Clean text content
codeBlocks: Array<{ // Extracted code blocks
language: string;
code: string;
}>;
thinking?: string; // Extracted thinking/reasoning
toolCalls?: Array<{ // MCP tool calls (if available)
id: string;
function: string;
args: any;
execute: () => Promise<any>;
}>;
};
usage?: { // Token usage stats
inputTokens: number;
outputTokens: number;
totalTokens: number;
};
}
```
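For example, a coding prompt can be unpacked directly from these fields. This sketch uses only the `ChatResult` shape above; the prompt and option values are placeholders:

```typescript
const result = await llm.chat('Write a function that reverses a string', {
  maxTokens: 300
});

// Clean prose, with code fences and thinking sections stripped out
console.log(result.parsed.content);

// Each fenced block arrives pre-extracted with its language tag
for (const block of result.parsed.codeBlocks) {
  console.log(`--- ${block.language} ---`);
  console.log(block.code);
}

// Reasoning content, when the model emits it
if (result.parsed.thinking) {
  console.log('Model reasoning:', result.parsed.thinking);
}

// Token accounting, when the provider reports it
if (result.usage) {
  console.log(`Tokens used: ${result.usage.totalTokens}`);
}
```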
## Retry Configuration
Multi-LLM includes built-in retry functionality with exponential backoff to handle temporary API failures, rate limits, and network issues.
### Basic Retry Usage
```typescript
const result = await llm.chat('Your message', {
retries: 3, // Number of retry attempts (default: 1)
retryInterval: 1000, // Initial retry delay in ms (default: 1000)
retryBackoff: 2, // Backoff multiplier (default: 2)
// ... other chat options
});
```
### Retry Behavior
The retry system implements **exponential backoff** (see the sketch after this list):
- **1st retry**: after `retryInterval` ms (e.g., 1000ms)
- **2nd retry**: after `retryInterval × retryBackoff` ms (e.g., 2000ms)
- **3rd retry**: after `retryInterval × retryBackoff²` ms (e.g., 4000ms)
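In other words, the delay before retry *n* is `retryInterval × retryBackoff^(n-1)`. A minimal sketch of that schedule (it mirrors the documented behavior above; it is not the package's internal code):

```typescript
// Delay before the nth retry (1-indexed), per the schedule above
function retryDelay(n: number, retryInterval = 1000, retryBackoff = 2): number {
  return retryInterval * Math.pow(retryBackoff, n - 1);
}

retryDelay(1); // 1000 ms
retryDelay(2); // 2000 ms
retryDelay(3); // 4000 ms
```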
### Retry Examples
```typescript
// Conservative retry for important requests
const criticalResult = await llm.chat('Critical business query', {
  retries: 5,
  retryInterval: 2000, // Start with a 2-second delay
  retryBackoff: 1.5,   // Slower backoff: 2s, 3s, 4.5s, 6.75s, 10.125s
  maxTokens: 500
});
// Quick retry for real-time applications
const fastResult = await llm.chat('Fast query', {
  retries: 2,
  retryInterval: 200, // Quick 200ms initial delay
  retryBackoff: 3,    // Aggressive backoff: 200ms, 600ms
  maxTokens: 50
});
// Disable retries
const oneShotResult = await llm.chat('One-shot request', {
  retries: 0, // No retries, fail immediately
  maxTokens: 100
});
```
### Error Handling with Retries
```typescript
try {
const result = await llm.chat('Your message', {
retries: 3,
retryInterval: 1000,
retryBackoff: 2
});
console.log(result.parsed.content);
} catch (error) {
// After exhausting all retries
console.error('Request failed:', error.message);
// Error message includes retry context:
// "Failed after 3 retries (ProviderName:model-id): Original error message"
}
```
### When Retries Are Triggered
Retries are automatically triggered for the following (sketched as a predicate after these lists):
- **Network errors** (connection timeouts, DNS failures)
- **Rate limit errors** (429 status codes)
- **Server errors** (5xx status codes)
- **Authentication failures** (invalid API keys)
- **Model unavailability** (temporary model issues)
Retries are **NOT** triggered for:
- **Client errors** (400, 404 - malformed requests)
- **Successful responses** (2xx status codes)
- **Streaming responses** (retries could cause duplicate content)
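Taken together, these rules amount to a predicate over the failure kind and HTTP status. A hypothetical sketch of that decision (illustrative only, not the package's source):

```typescript
// Illustrative only: mirrors the retry rules listed above
function isRetryable(error: { status?: number }): boolean {
  if (error.status === undefined) return true; // network error (timeout, DNS failure)
  if (error.status === 429) return true;       // rate limited
  if (error.status >= 500) return true;        // server error
  if (error.status === 401) return true;       // auth failure, per the list above
  return false;                                // 400/404 and similar client errors fail fast
}
```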
### Default Configuration
If no retry options are specified, the system uses:
```typescript
{
retries: 1, // 1 retry attempt
retryInterval: 1000, // 1 second initial delay
retryBackoff: 2 // Double delay each retry
}
```
## MCP (Model Context Protocol) Integration
Add tools to enhance your LLM's capabilities:
```typescript
const llm = provider.createLLM('gpt-4o-mini');
// Add MCP server
llm.addMCP('python3 -m my_mcp_server');
// Chat with tool access
const result = await llm.chat('Calculate the Fibonacci sequence', {});
// Execute tool calls if present
if (result.parsed.toolCalls?.length) {
for (const toolCall of result.parsed.toolCalls) {
const toolResult = await toolCall.execute();
console.log(`Tool ${toolCall.function} result:`, toolResult);
}
}
```
## Testing
The package includes a comprehensive test suite for each provider. Tests run only for providers whose credentials are configured via environment variables, as the sketch below illustrates.
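Under the hood this is the familiar conditional-suite pattern in Jest (the runner implied by the `npm test` flags below). The gating might look roughly like this sketch, with all names illustrative rather than the package's internals:

```typescript
import { MultiLLM } from 'multi-llm';

// Illustrative gating: run the suite only when credentials are present
const hasOpenAI = Boolean(process.env.OPENAI_API_KEY && process.env.OPENAI_MODEL);

(hasOpenAI ? describe : describe.skip)('OpenAI provider', () => {
  it('answers a simple prompt', async () => {
    const provider = MultiLLM.createProvider('openai', process.env.OPENAI_API_KEY!);
    const llm = provider.createLLM(process.env.OPENAI_MODEL!);
    const result = await llm.chat('Say hello', { maxTokens: 10 });
    expect(result.parsed.content.length).toBeGreaterThan(0);
  });
});
```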
### Environment Setup
The test system **automatically detects available providers** based on environment variables. Only providers with valid credentials will run tests.
Create a `.env` file in the project root:
```bash
# Copy the example file
cp .env.example .env
# Edit .env with your API keys (add only the providers you want to test)
```
**Provider Environment Variables** (add only what you have):
```env
# OpenRouter
OPENROUTER_API_KEY=your_openrouter_api_key
OPENROUTER_MODEL=microsoft/wizardlm-2-8x22b
# OpenAI
OPENAI_API_KEY=your_openai_api_key
OPENAI_MODEL=gpt-4o-mini
# Anthropic
ANTHROPIC_API_KEY=your_anthropic_api_key
ANTHROPIC_MODEL=claude-3-5-sonnet-20241022
# Google Gemini
GOOGLE_API_KEY=your_google_api_key
GOOGLE_MODEL=gemini-2.5-pro
# Cohere
COHERE_API_KEY=your_cohere_api_key
COHERE_MODEL=command-r-plus
# Mistral AI
MISTRAL_API_KEY=your_mistral_api_key
MISTRAL_MODEL=mistral-large-latest
# Together AI
TOGETHER_API_KEY=your_together_api_key
TOGETHER_MODEL=meta-llama/Llama-3.2-3B-Instruct-Turbo
# Fireworks AI
FIREWORKS_API_KEY=your_fireworks_api_key
FIREWORKS_MODEL=accounts/fireworks/models/llama-v3p1-70b-instruct
# Groq
GROQ_API_KEY=your_groq_api_key
GROQ_MODEL=llama3-70b-8192
# Cerebras
CEREBRAS_API_KEY=your_cerebras_api_key
CEREBRAS_MODEL=llama3.1-70b
# Ollama (local)
OLLAMA_MODEL=llama3.2
OLLAMA_BASE_URL=http://localhost:11434
# Azure OpenAI
AZURE_API_KEY=your_azure_api_key
AZURE_BASE_URL=https://your-resource.openai.azure.com
AZURE_MODEL=your-deployment-name
# Perplexity
PERPLEXITY_API_KEY=your_perplexity_api_key
PERPLEXITY_MODEL=llama-3.1-sonar-large-128k-online
# DeepInfra
DEEPINFRA_API_KEY=your_deepinfra_api_key
DEEPINFRA_MODEL=meta-llama/Meta-Llama-3.1-8B-Instruct
# Replicate
REPLICATE_API_KEY=your_replicate_api_key
REPLICATE_MODEL=meta/llama-2-70b-chat
# Hugging Face
HUGGINGFACE_API_KEY=your_huggingface_api_key
HUGGINGFACE_MODEL=mistralai/Mixtral-8x7B-Instruct-v0.1
# AWS Bedrock
BEDROCK_API_KEY=accessKeyId:secretAccessKey
BEDROCK_MODEL=anthropic.claude-3-5-sonnet-20241022-v2:0
BEDROCK_REGION=us-east-1
```
### Running Tests
```bash
# Install dependencies
npm install
# Build the project
npm run build
# Run all tests
# ✅ Providers with valid credentials will run tests
# ⏭️ Providers without credentials will be skipped
npm test
# Run tests for specific provider
npm test -- --testPathPattern=openrouter
# Run tests with coverage
npm test -- --coverage
# Run tests in watch mode
npm test -- --watch
```
**Example Output:**
```
🔍 Provider Environment Status:
OpenRouter: ✅ Available
OpenAI: ❌ Missing credentials
Anthropic: ✅ Available
Google: ✅ Available
Cohere: ❌ Missing credentials
Mistral: ❌ Missing credentials
Together: ❌ Missing credentials
Fireworks: ❌ Missing credentials
Groq: ❌ Missing credentials
Cerebras: ❌ Missing credentials
Ollama: ❌ Missing credentials
Azure: ❌ Missing credentials
Perplexity: ❌ Missing credentials
DeepInfra: ❌ Missing credentials
Replicate: ❌ Missing credentials
HuggingFace: ❌ Missing credentials
Bedrock: ❌ Missing credentials
🎯 3 providers available for testing: openrouter, anthropic, google
✅ Test execution will run for 3 provider(s): openrouter, anthropic, google
📋 Provider-specific tests will execute for configured providers
⏭️ Provider tests without credentials will be skipped
```
### Test Categories
Each provider test suite includes:
- **Provider Creation**: Basic instantiation and configuration
- **Model Management**: Fetching available models and metadata
- **Non-Streaming Chat**: Standard request/response with performance metrics
- **Streaming Chat**: Real-time streaming with chunk analysis
- **Error Handling**: Invalid requests and edge cases
- **Response Parsing**: Code blocks, thinking extraction, and structured content
### Performance Metrics
Tests automatically measure and report (a measurement sketch follows this list):
- Response time for non-streaming requests
- Time to first chunk for streaming requests
- Total streaming time
- Token usage statistics (when available)
- Chunk count and average size for streaming
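Time to first chunk, for example, falls out of the streaming callback naturally. A minimal sketch using only the public `chat` API:

```typescript
// Measure time-to-first-chunk and total streaming time via the stream callback
const start = Date.now();
let firstChunkAt: number | undefined;
let chunkCount = 0;

await llm.chat('Stream a short poem', { temperature: 0.8 }, (chunk) => {
  if (firstChunkAt === undefined) firstChunkAt = Date.now();
  chunkCount++;
  process.stdout.write(chunk);
});

console.log(`\nFirst chunk after ${(firstChunkAt ?? start) - start} ms`);
console.log(`Total streaming time: ${Date.now() - start} ms across ${chunkCount} chunks`);
```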
## API Reference
### MultiLLM
```typescript
class MultiLLM {
static createProvider(type: ProviderType, apiKey: string, baseUrl?: string): Provider
}
```
### Provider
```typescript
abstract class Provider {
abstract getModels(): Promise<ModelInfo[]>
abstract createLLM(modelId: string): LLM
}
```
### LLM
```typescript
class LLM {
addMCP(startupCommand: string): void
chat(content: string, options: ChatOptions, streamCallback?: StreamCallback): Promise<ChatResult>
dispose(): void
}
```
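Since `dispose()` exists, instances presumably hold resources (spawned MCP servers, open connections), so a try/finally lifecycle is a reasonable pattern. A sketch, not a documented requirement:

```typescript
const llm = provider.createLLM('gpt-4o-mini');
try {
  llm.addMCP('python3 -m my_mcp_server');
  const result = await llm.chat('Hello', { maxTokens: 20 });
  console.log(result.parsed.content);
} finally {
  llm.dispose(); // release whatever the instance holds, even if chat() threw
}
```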
### ChatOptions
```typescript
interface ChatOptions {
temperature?: number; // 0.0 to 2.0
maxTokens?: number; // Maximum output tokens
topP?: number; // Nucleus sampling parameter
topK?: number; // Top-K sampling parameter
system?: string; // System message
stream?: boolean; // Automatically set based on callback presence
// Retry configuration
retries?: number; // Number of retry attempts (default: 1)
retryInterval?: number; // Initial retry delay in ms (default: 1000)
retryBackoff?: number; // Exponential backoff multiplier (default: 2)
[key: string]: any; // Provider-specific options
}
```
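The trailing index signature means unrecognized keys pass straight through to the provider. For example, an OpenAI-style parameter could be forwarded like this (the extra key is an illustration of pass-through, not an option multi-llm itself defines):

```typescript
const result = await llm.chat('Summarize this repository', {
  temperature: 0.3,
  maxTokens: 200,
  // Forwarded via the index signature; must match what the chosen provider accepts
  frequency_penalty: 0.5
});
```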
## Examples
See `example.js` for comprehensive usage examples across all providers.
## Contributing
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Add tests for your changes
4. Run the test suite (`npm test`)
5. Commit your changes (`git commit -m 'Add amazing feature'`)
6. Push to the branch (`git push origin feature/amazing-feature`)
7. Open a Pull Request
## License
MIT License - see LICENSE file for details.
## Changelog
### v1.1.0
- **🔄 Intelligent Retry System**: Added exponential backoff retry logic with customizable configuration
  - `retries`: Number of retry attempts (default: 1)
  - `retryInterval`: Initial retry delay in milliseconds (default: 1000)
  - `retryBackoff`: Exponential backoff multiplier (default: 2)
- **🧪 Comprehensive Retry Testing**: 13 new test cases covering retry behavior, backoff timing, and error handling
- **📚 Enhanced Documentation**: Complete retry configuration examples and best practices
- **⚡ Production Ready**: Robust error handling for network issues, rate limits, and API failures
### v1.0.0
- Initial release with support for **17 providers**:
- **Core Providers**: OpenAI, Anthropic, Google Gemini, OpenRouter
- **Performance Providers**: Groq, Cerebras, Together AI, Fireworks AI
- **Specialized Providers**: Cohere, Mistral AI, Perplexity, DeepInfra
- **Local/Custom**: Ollama, Azure OpenAI
- **Cloud Platforms**: Replicate, Hugging Face, AWS Bedrock
- **Streaming and non-streaming** support across all providers
- **Smart response parsing** with code block and thinking extraction
- **MCP integration** framework for enhanced capabilities
- **Conditional testing** system that adapts to available credentials
- **Comprehensive test suite** with performance metrics
- **Full TypeScript** definitions and IntelliSense support