# Vision API Integration
Git Contextor includes experimental vision capabilities that allow you to index and search through images, diagrams, screenshots, and other visual content in your repository. This is particularly useful for repositories containing documentation with images, UI mockups, architectural diagrams, or code screenshots.
## Configuration
Vision capabilities are disabled by default. To enable them, configure your `.gitcontextor/config.json`:
```json
{
  "vision": {
    "enabled": true,
    "provider": "openai",
    "model": "gpt-4-vision-preview",
    "apiKey": "your-openai-api-key",
    "prompt": "Describe this image for a software developer. Focus on text, code, diagrams, UI elements, and technical content. Be concise but comprehensive."
  }
}
```
### Supported Providers
- **OpenAI**: `gpt-4-vision-preview` or `gpt-4o`
- **Google**: `gemini-pro-vision` (example configuration below)
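Switching providers only requires changing the `provider` and `model` fields. For example, a Google configuration might look like this (assuming here that the same `apiKey` field accepts a Gemini API key):

```json
{
  "vision": {
    "enabled": true,
    "provider": "google",
    "model": "gemini-pro-vision",
    "apiKey": "your-google-api-key"
  }
}
```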
### Configuration Options
- `enabled`: Enable/disable vision processing
- `provider`: Vision API provider (`openai` or `google`)
- `model`: Specific model to use
- `apiKey`: API key for the provider
- `prompt`: Custom prompt for image description (optional)
## Supported Image Formats
- `.png`
- `.jpg` / `.jpeg`
- `.gif`
- `.bmp`
- `.webp`
- `.svg` (converted to a raster format before processing; see the sketch after this list)
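Git Contextor converts SVGs to a raster format internally before they are sent to the vision API. As a rough illustration of what that step involves (not the tool's actual implementation), here is a minimal sketch using the `sharp` library, which also appears in the preprocessing example later in this document:

```javascript
const sharp = require('sharp');

// Illustrative sketch only: rasterize an SVG to PNG so that vision APIs
// which only accept raster formats can process it.
async function rasterizeSvg(svgPath) {
  // density sets the rasterization resolution (DPI) for vector input
  return await sharp(svgPath, { density: 150 })
    .png()
    .toBuffer();
}
```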
## How It Works
1. **Image Discovery**: Git Contextor scans your repository for supported image files
2. **Vision Processing**: Images are sent to the configured vision API for description (a simplified sketch of this step follows the list)
3. **Indexing**: The generated descriptions are indexed alongside your code
4. **Search**: You can search for images using natural language queries
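Step 2 boils down to sending each image to the configured provider and keeping the returned text for indexing. The sketch below shows roughly what that looks like against the OpenAI chat completions API; it is an illustration, not Git Contextor's actual implementation:

```javascript
const fs = require('fs/promises');

// Illustrative sketch of step 2: ask a vision model to describe an image.
async function describeImage(imagePath, apiKey, prompt) {
  const base64 = await fs.readFile(imagePath, { encoding: 'base64' });
  const response = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'gpt-4o',
      messages: [{
        role: 'user',
        content: [
          { type: 'text', text: prompt },
          { type: 'image_url', image_url: { url: `data:image/png;base64,${base64}` } }
        ]
      }]
    })
  });
  const data = await response.json();
  return data.choices[0].message.content; // this description is what gets indexed
}
```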
## CLI Usage
### Enable Vision Processing
```bash
# Enable vision in your configuration
git-contextor config --vision-enabled true --vision-provider openai --vision-api-key your-key
# Trigger reindexing to process images
git-contextor reindex
```
### Search for Images
```bash
# Search for images containing specific content
git-contextor query "architecture diagram showing database connections"
git-contextor query "screenshot of login form"
git-contextor query "flowchart explaining the authentication process"
```
## API Usage
### Search with Vision Content
```javascript
// Search that may include vision content
const response = await fetch('http://localhost:3333/api/search', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'x-api-key': 'your-api-key'
  },
  body: JSON.stringify({
    query: 'user interface mockup for the dashboard',
    maxTokens: 2048
  })
});

const results = await response.json();
console.log('Search results:', results);
```
### Chat with Vision Context
```javascript
// Chat that can reference images
const response = await fetch('http://localhost:3333/api/chat', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'x-api-key': 'your-api-key'
  },
  body: JSON.stringify({
    query: 'Explain the system architecture shown in the diagrams',
    context_type: 'general'
  })
});

const result = await response.json();
console.log('AI Response:', result.response);
```
## Integration Examples
### Python Integration
```python
import requests

class VisionEnabledGitContextor:
    def __init__(self, base_url, api_key):
        self.base_url = base_url.rstrip('/')
        self.headers = {
            'x-api-key': api_key,
            'Content-Type': 'application/json'
        }

    def search_images(self, query, max_tokens=2048):
        """Search for images using natural language."""
        url = f'{self.base_url}/api/search'
        payload = {
            'query': query,
            'maxTokens': max_tokens
        }
        response = requests.post(url, headers=self.headers, json=payload)
        response.raise_for_status()
        return response.json()

    def ask_about_images(self, question):
        """Ask questions about images in the repository."""
        url = f'{self.base_url}/api/chat'
        payload = {
            'query': question,
            'context_type': 'general'
        }
        response = requests.post(url, headers=self.headers, json=payload)
        response.raise_for_status()
        return response.json()

# Usage
vision_client = VisionEnabledGitContextor('http://localhost:3333', 'your-api-key')

# Search for specific types of images
ui_mockups = vision_client.search_images('user interface mockups and wireframes')
diagrams = vision_client.search_images('system architecture diagrams')
screenshots = vision_client.search_images('application screenshots')

# Ask questions about visual content
response = vision_client.ask_about_images('What does the login flow look like based on the UI mockups?')
print('AI Response:', response['response'])
```
### Node.js Integration
```javascript
class VisionEnabledGitContextor {
  constructor(baseUrl, apiKey) {
    this.baseUrl = baseUrl.replace(/\/$/, '');
    this.headers = {
      'x-api-key': apiKey,
      'Content-Type': 'application/json'
    };
  }

  async searchImages(query, maxTokens = 2048) {
    const response = await fetch(`${this.baseUrl}/api/search`, {
      method: 'POST',
      headers: this.headers,
      body: JSON.stringify({
        query: query,
        maxTokens: maxTokens
      })
    });
    if (!response.ok) {
      throw new Error(`HTTP ${response.status}: ${response.statusText}`);
    }
    return await response.json();
  }

  async askAboutImages(question) {
    const response = await fetch(`${this.baseUrl}/api/chat`, {
      method: 'POST',
      headers: this.headers,
      body: JSON.stringify({
        query: question,
        context_type: 'general'
      })
    });
    if (!response.ok) {
      throw new Error(`HTTP ${response.status}: ${response.statusText}`);
    }
    return await response.json();
  }

  async findImagesInContext(searchResults) {
    // Filter search results to find image-related content
    const imageResults = searchResults.results.filter(result => {
      const filePath = result.filePath.toLowerCase();
      return filePath.includes('.png') ||
             filePath.includes('.jpg') ||
             filePath.includes('.jpeg') ||
             filePath.includes('.gif') ||
             filePath.includes('.svg') ||
             result.content.toLowerCase().includes('image') ||
             result.content.toLowerCase().includes('diagram');
    });
    return imageResults;
  }
}

// Usage
const visionClient = new VisionEnabledGitContextor('http://localhost:3333', 'your-api-key');

// Search for images
const results = await visionClient.searchImages('dashboard screenshot showing user statistics');
console.log('Found images:', results.results.length);

// Ask about visual content
const response = await visionClient.askAboutImages('Describe the user interface design patterns used in this application');
console.log('AI Analysis:', response.response);
```
## Common Use Cases
### Documentation Analysis
```javascript
// Find all documentation images
const docImages = await visionClient.searchImages('documentation images diagrams screenshots');

// Analyze documentation completeness
const analysis = await visionClient.askAboutImages(
  'Based on the documentation images, what areas of the application are well-documented and what might need more visual documentation?'
);
```
### UI/UX Analysis
```javascript
// Find UI mockups and designs
const uiDesigns = await visionClient.searchImages('UI mockups wireframes user interface designs');

// Get design insights
const designAnalysis = await visionClient.askAboutImages(
  'What are the main design patterns and UI components shown in the mockups?'
);
```
### Architecture Documentation
```javascript
// Find architectural diagrams
const architectureDiagrams = await visionClient.searchImages('system architecture diagrams database schema API flow');

// Get architectural insights
const archAnalysis = await visionClient.askAboutImages(
  'Explain the system architecture based on the diagrams in the repository'
);
```
## Performance Considerations
1. **API Costs**: Vision API calls are more expensive than text processing
2. **Processing Time**: Image processing takes longer than text indexing
3. **File Size**: Large images may hit API limits
4. **Rate Limits**: Vision APIs often have stricter rate limits (see the throttling sketch after this list)
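If you drive a vision API yourself (for example, from a preprocessing script), the simplest way to respect rate limits is to throttle requests. A minimal sketch, assuming a `describeImage` function like the one sketched earlier:

```javascript
// Process images sequentially with a fixed pause between vision API calls.
// `describeImage` is a stand-in for whatever performs the actual API request.
async function describeAllThrottled(imagePaths, describeImage, delayMs = 1000) {
  const descriptions = [];
  for (const imagePath of imagePaths) {
    descriptions.push(await describeImage(imagePath));
    await new Promise(resolve => setTimeout(resolve, delayMs)); // pause between calls
  }
  return descriptions;
}
```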
## Optimization Tips
### Image Preprocessing
```javascript
// Resize large images before processing to stay under API size limits
const sharp = require('sharp');

async function preprocessImage(imagePath) {
  const buffer = await sharp(imagePath)
    .resize(1024, 1024, { fit: 'inside', withoutEnlargement: true })
    .jpeg({ quality: 80 })
    .toBuffer();
  return buffer;
}
```
### Selective Processing
```json
{
  "vision": {
    "enabled": true,
    "provider": "openai",
    "model": "gpt-4-vision-preview",
    "apiKey": "your-key",
    "includePatterns": [
      "docs/**/*.png",
      "assets/**/*.jpg",
      "screenshots/**/*"
    ],
    "excludePatterns": [
      "node_modules/**",
      "build/**",
      "dist/**"
    ]
  }
}
```
## Error Handling
```javascript
async function safeVisionSearch(query) {
  try {
    const results = await visionClient.searchImages(query);
    return results;
  } catch (error) {
    if (error.message.includes('rate limit')) {
      console.log('Rate limit exceeded, retrying in 60 seconds...');
      await new Promise(resolve => setTimeout(resolve, 60000));
      return await safeVisionSearch(query);
    } else if (error.message.includes('vision not enabled')) {
      console.log('Vision processing is not enabled for this repository');
      return null;
    } else {
      throw error;
    }
  }
}
```
## Best Practices
1. **Selective Enablement**: Only enable vision for repositories with meaningful visual content
2. **Cost Management**: Monitor API usage to control costs (a call-volume estimate is sketched after this list)
3. **Image Quality**: Use clear, high-quality images for better results
4. **Descriptive Filenames**: Use descriptive filenames to complement vision descriptions
5. **Regular Updates**: Re-index when adding new images
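For cost management, it can help to estimate how many vision calls a reindex would trigger before enabling the feature. Here is a sketch that counts matching images with the `fast-glob` package (an assumption for illustration, not a Git Contextor dependency), using the patterns from the selective-processing example above:

```javascript
const fg = require('fast-glob');

// Count the images a reindex would send to the vision API.
async function estimateVisionCalls() {
  const files = await fg(
    ['docs/**/*.png', 'assets/**/*.jpg', 'screenshots/**/*'],
    { ignore: ['node_modules/**', 'build/**', 'dist/**'] }
  );
  console.log(`Reindexing would process roughly ${files.length} images`);
  return files.length;
}
```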
## Limitations
1. **Text in Images**: OCR capability depends on the vision model used
2. **Context Understanding**: Vision models may not fully understand domain-specific diagrams
3. **File Size Limits**: Most vision APIs have file size restrictions
4. **Processing Time**: Vision processing is significantly slower than text processing
## Troubleshooting
### Common Issues
1. **Vision Not Working**: Check if vision is enabled in configuration
2. **API Errors**: Verify API key and model availability
3. **No Results**: Images might not be indexed yet, try reindexing
4. **Rate Limits**: Implement retry logic with exponential backoff (sketched below)
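For point 4, a backoff wrapper can replace the fixed 60-second delay used in the error-handling example above. A minimal sketch (adapt the error check to whatever your provider actually returns):

```javascript
// Retry a function with exponential backoff on rate-limit errors.
async function withBackoff(fn, maxRetries = 5) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (!error.message.includes('rate limit') || attempt === maxRetries - 1) {
        throw error;
      }
      const delayMs = 1000 * 2 ** attempt; // 1s, 2s, 4s, 8s...
      console.log(`Rate limited; retrying in ${delayMs / 1000}s...`);
      await new Promise(resolve => setTimeout(resolve, delayMs));
    }
  }
}

// Usage
const searchResults = await withBackoff(() => visionClient.searchImages('architecture diagram'));
```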
### Debug Commands
```bash
# Check vision configuration
git-contextor config --show | grep -A 10 vision
# Force reindex with debug info
DEBUG=true git-contextor reindex
```
## Future Enhancements
- Support for additional image formats
- Local vision processing options
- Better OCR integration
- Batch processing optimizations
- Custom vision prompts per file type
## Next Steps
- Learn about [programmatic access](./programmatic.md) for more API capabilities
- Explore [sharing features](../features/sharing-and-chat.md) for collaborative visual analysis
- Check out [LangChain integration](./langchain.md) for advanced RAG with vision content