# Vercel AI SDK Integration Improvements
This document outlines the improvements made to integrate the Vercel AI SDK into the exif-ai project, providing enhanced AI functionality with better provider support and modern patterns.
## Overview
The project has been upgraded to use the **Vercel AI SDK** (v4.3.16+) for all AI functionality, replacing the previous custom implementations with a unified, modern approach that supports multiple AI providers through standardized interfaces.
## Key Improvements
### 1. **Unified AI SDK Implementation**
- **Before**: Custom implementations for each provider with inconsistent interfaces
- **After**: Single unified implementation using Vercel AI SDK patterns
- **Benefits**: Consistent behavior, better error handling, standardized multimodal support
### 2. **Enhanced Provider Support**
The following AI providers are now supported through their official AI SDK packages:
| Provider | Package | Models Supported | Environment Variable |
|----------|---------|------------------|---------------------|
| OpenAI | `@ai-sdk/openai` | GPT-4o, GPT-4 Turbo, etc. | `OPENAI_API_KEY` |
| Google | `@ai-sdk/google` | Gemini 1.5 Pro, Gemini 1.5 Flash | `GOOGLE_API_KEY` |
| Anthropic | `@ai-sdk/anthropic` | Claude 3.5 Sonnet, Claude 3 Opus | `ANTHROPIC_API_KEY` |
| Mistral | `@ai-sdk/mistral` | Mistral Large, Mistral Medium | `MISTRAL_API_KEY` |
| Ollama | `@ai-sdk/openai` (compatible) | Llama 3.2 Vision, etc. | `OLLAMA_BASE_URL` |
| Amazon Bedrock | `@ai-sdk/amazon-bedrock` | Claude 3 Sonnet, etc. | `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` |
| Azure OpenAI | `@ai-sdk/azure` | GPT-4 Vision, etc. | `AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_ENDPOINT` |
| DeepInfra | `@ai-sdk/deepinfra` | CogVLM2, etc. | `DEEPINFRA_API_KEY` |
| Fireworks | `@ai-sdk/fireworks` | Llama 3, etc. | `FIREWORKS_API_KEY` |
| OpenAI Compatible | `@ai-sdk/openai-compatible` | Various models | `OPENAI_COMPATIBLE_API_KEY`, `OPENAI_COMPATIBLE_BASE_URL` |
| TogetherAI | `@ai-sdk/togetherai` | CogVLM2, etc. | `TOGETHER_API_KEY` |
| XAI | `@ai-sdk/xai` | Grok 1.5 Vision | `XAI_API_KEY` |
| OpenRouter | `@openrouter/ai-sdk-provider` | Various models | `OPENROUTER_API_KEY` |
### 3. **Improved Multimodal Support**
- **Native Buffer Support**: The AI SDK accepts image buffers directly, with no base64 conversion step (see the sketch after this list)
- **Automatic Image Processing**: Smart image resizing based on provider limits
- **Consistent Message Format**: Unified message structure across all providers
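A minimal sketch of what the unified multimodal call looks like with the AI SDK's `generateText`. The model and file name are placeholders; the actual call lives in `src/provider/ai-sdk.ts` and may differ in its details:
```typescript
import { readFile } from "node:fs/promises";
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";

// The image buffer is passed as-is; no manual base64 encoding is required.
const imageBuffer = await readFile("image.jpg");

const { text } = await generateText({
  model: openai("gpt-4o"),
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "Describe this image in detail." },
        { type: "image", image: imageBuffer },
      ],
    },
  ],
});
```
The same message shape works for every provider in the table above, which is what makes the unified implementation possible.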
### 4. **Better Error Handling**
- **Provider-Specific Errors**: Clear error messages for each provider
- **Graceful Fallbacks**: Better handling of API failures
- **Validation**: Input validation for models and providers (sketched after this list)
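As an illustration, the validation step might look like the following sketch, which produces the error messages shown in the Troubleshooting section below (the actual checks in `src/provider/ai-sdk.ts` may differ):
```typescript
// Hypothetical validation sketch; provider names follow the table above.
const SUPPORTED_PROVIDERS = [
  "openai", "google", "anthropic", "mistral", "ollama", "amazon", "azure",
  "deepinfra", "fireworks", "openai-compatible", "together", "xai", "openrouter",
] as const;

function validateProvider(provider: string, apiKey: string | undefined): void {
  if (!SUPPORTED_PROVIDERS.includes(provider as (typeof SUPPORTED_PROVIDERS)[number])) {
    throw new Error(`Unsupported provider: '${provider}'`);
  }
  // Ollama runs locally and does not require an API key.
  if (provider !== "ollama" && !apiKey) {
    throw new Error(`Missing API key for provider '${provider}'`);
  }
}
```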
## Technical Implementation
### Core Files Modified
1. **`src/provider/ai-sdk.ts`** - Main AI SDK implementation
- Unified provider configuration (see the sketch after this list)
- Standardized image processing
- Modern AI SDK patterns
2. **Package Dependencies** - Added official AI SDK packages
```json
{
"@ai-sdk/amazon-bedrock": "^2.2.x",
"@ai-sdk/anthropic": "^1.2.x",
"@ai-sdk/azure": "^1.3.x",
"@ai-sdk/deepinfra": "^0.2.x",
"@ai-sdk/fireworks": "^0.2.x",
"@ai-sdk/google": "^1.2.x",
"@ai-sdk/mistral": "^1.2.x",
"@ai-sdk/openai": "^1.3.x",
"@ai-sdk/openai-compatible": "^0.2.x",
"@ai-sdk/togetherai": "^0.2.x",
"@ai-sdk/xai": "^1.2.x",
"@openrouter/ai-sdk-provider": "^0.7.x"
}
```
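Conceptually, the unified provider configuration in `src/provider/ai-sdk.ts` reduces to a single lookup from provider name to an AI SDK model factory. The sketch below is illustrative only; names such as `resolveModel` are hypothetical and the real implementation may be organized differently:
```typescript
import type { LanguageModel } from "ai";
import { openai } from "@ai-sdk/openai";
import { google } from "@ai-sdk/google";
import { anthropic } from "@ai-sdk/anthropic";
import { mistral } from "@ai-sdk/mistral";

// Map CLI provider names to AI SDK model factories.
const providers: Record<string, (modelId: string) => LanguageModel> = {
  openai: (modelId) => openai(modelId),
  google: (modelId) => google(modelId),
  anthropic: (modelId) => anthropic(modelId),
  mistral: (modelId) => mistral(modelId),
  // ...the remaining providers from the table follow the same pattern
};

export function resolveModel(provider: string, modelId: string): LanguageModel {
  const factory = providers[provider];
  if (!factory) {
    throw new Error(`Unsupported provider: '${provider}'`);
  }
  return factory(modelId);
}
```
Because every factory returns the same `LanguageModel` interface, `getDescription` and `getTags` (below) never need provider-specific branches.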
### Key Functions
#### `getDescription(options)`
Generates image descriptions using the AI SDK's `generateText` function.
```typescript
const description = await getDescription({
  buffer: imageBuffer,
  model: "gpt-4o",
  prompt: "Describe this image in detail.",
  provider: "openai"
});
```
#### `getTags(options)`
Generates image tags using the same unified interface.
```typescript
const tags = await getTags({
  buffer: imageBuffer,
  model: "gemini-1.5-pro",
  prompt: "Generate tags for this image.",
  provider: "google"
});
```
## Usage Examples
### Command Line Usage
```bash
# Using OpenAI GPT-4o
OPENAI_API_KEY=your_key exif-ai -i image.jpg -a openai -m gpt-4o

# Using Google Gemini
GOOGLE_API_KEY=your_key exif-ai -i image.jpg -a google -m gemini-1.5-pro

# Using Anthropic Claude
ANTHROPIC_API_KEY=your_key exif-ai -i image.jpg -a anthropic -m claude-3-5-sonnet-20241022

# Using local Ollama
exif-ai -i image.jpg -a ollama -m llama3.2-vision

# Using Amazon Bedrock
AWS_ACCESS_KEY_ID=your_key AWS_SECRET_ACCESS_KEY=your_secret exif-ai -i image.jpg -a amazon -m anthropic.claude-3-sonnet-20240229-v1:0

# Using Azure OpenAI
AZURE_OPENAI_API_KEY=your_key AZURE_OPENAI_ENDPOINT=your_endpoint exif-ai -i image.jpg -a azure -m gpt-4-vision

# Using DeepInfra
DEEPINFRA_API_KEY=your_key exif-ai -i image.jpg -a deepinfra -m cogvlm2-llama3-8b-chat

# Using Fireworks
FIREWORKS_API_KEY=your_key exif-ai -i image.jpg -a fireworks -m accounts/fireworks/models/llama-v3-8b-instruct

# Using OpenAI Compatible
OPENAI_COMPATIBLE_API_KEY=your_key OPENAI_COMPATIBLE_BASE_URL=your_url exif-ai -i image.jpg -a openai-compatible -m gpt-4-vision

# Using TogetherAI
TOGETHER_API_KEY=your_key exif-ai -i image.jpg -a together -m cogvlm2-llama3-8b-chat

# Using XAI
XAI_API_KEY=your_key exif-ai -i image.jpg -a xai -m grok-1.5-vision

# Using OpenRouter
OPENROUTER_API_KEY=your_key exif-ai -i image.jpg -a openrouter -m openai/gpt-4o
```
### Programmatic Usage
```javascript
import { execute } from 'exif-ai';
await execute({
  path: 'image.jpg',
  provider: 'openai',
  model: 'gpt-4o',
  tasks: ['description', 'tags'],
  verbose: true
});
```
## Migration Guide
### For Users
- **No Breaking Changes**: All existing command-line usage continues to work
- **New Models**: Access to latest models from all providers
- **Better Performance**: Improved image processing and API efficiency
### For Developers
- **Updated Imports**: AI SDK packages are now used internally
- **Consistent Interface**: All providers use the same function signatures
- **Better TypeScript**: Improved type safety and IntelliSense
## Testing
The implementation includes comprehensive test coverage:
- **Unit Tests**: Provider-specific functionality
- **Integration Tests**: End-to-end image processing
- **Mock Providers**: Test providers for CI/CD environments (see the example below)
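The AI SDK also ships mock models in `ai/test`, which is one way such test providers can avoid real API calls. A minimal, standalone example (independent of exif-ai's actual test setup):
```typescript
import { generateText } from "ai";
import { MockLanguageModelV1 } from "ai/test";

// A mock model that always returns a fixed description, so tests never hit a real API.
const result = await generateText({
  model: new MockLanguageModelV1({
    doGenerate: async () => ({
      rawCall: { rawPrompt: null, rawSettings: {} },
      finishReason: "stop",
      usage: { promptTokens: 10, completionTokens: 20 },
      text: "A mocked image description.",
    }),
  }),
  prompt: "Describe this image.",
});

console.log(result.text); // "A mocked image description."
```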
Run tests with:
```bash
pnpm test
```
## Performance Improvements
1. **Reduced Memory Usage**: Direct buffer handling without base64 conversion
2. **Faster Processing**: Optimized image resizing pipeline
3. **Better Caching**: AI SDK's built-in request optimization
4. **Parallel Processing**: Concurrent description and tag generation
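Item 4 is straightforward to picture with the functions from the Key Functions section; a sketch of running both tasks concurrently:
```typescript
// imageBuffer, getDescription, and getTags as in the Key Functions examples above.
const [description, tags] = await Promise.all([
  getDescription({
    buffer: imageBuffer,
    model: "gpt-4o",
    prompt: "Describe this image in detail.",
    provider: "openai",
  }),
  getTags({
    buffer: imageBuffer,
    model: "gpt-4o",
    prompt: "Generate tags for this image.",
    provider: "openai",
  }),
]);
```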
## Future Enhancements
The AI SDK integration provides a foundation for future improvements:
- **Streaming Support**: Real-time response streaming
- **Tool Calling**: Function calling capabilities
- **Embeddings**: Semantic search and similarity
- **Image Generation**: AI-powered image creation
- **Additional Providers**: Easy integration of new AI services
## Troubleshooting
### Common Issues
1. **API Key Errors**
```
Error: Missing API key for provider 'openai'
```
**Solution**: Set the appropriate environment variable (e.g., `OPENAI_API_KEY`)
2. **Model Not Found**
```
Error: Model 'gpt-5' not found
```
**Solution**: Use a valid model name for the provider
3. **Provider Not Supported**
```
Error: Unsupported provider: 'custom'
```
**Solution**: Use one of the supported providers: openai, google, anthropic, mistral, ollama, amazon, bedrock, azure, deepinfra, fireworks, openai-compatible, together, togetherai, xai, openrouter
### Debug Mode
Enable verbose logging to see detailed AI SDK operations:
```bash
exif-ai -i image.jpg -a openai --verbose
```
## Contributing
When contributing to the AI functionality:
1. **Follow AI SDK Patterns**: Use official AI SDK patterns and conventions
2. **Test All Providers**: Ensure changes work across all supported providers
3. **Update Documentation**: Keep this document and examples up to date
4. **Performance Testing**: Verify that changes don't impact performance
## Resources
- [Vercel AI SDK Documentation](https://sdk.vercel.ai/docs)
- [AI SDK Providers](https://sdk.vercel.ai/providers)
- [Multimodal Guide](https://sdk.vercel.ai/docs/foundations/prompts#multimodal-messages)
- [Provider Comparison](https://sdk.vercel.ai/providers/ai-sdk-providers)