# RAG MCP
A Retrieval-Augmented Generation (RAG) MCP Server that integrates with Supabase and Cohere to provide intelligent code search and retrieval capabilities.
## Features
- **Automatic Repository Indexing**: Scans and embeds all supported files in a repository
- **Vector Similarity Search**: Uses Cohere embeddings for semantic code search
- **Supabase Integration**: Stores embeddings and metadata in Supabase PostgreSQL with vector support
- **File Type Support**: Indexes 30+ file extensions across programming languages, web, configuration, documentation, and script files
- **Smart Filtering**: Automatically ignores build artifacts, dependencies, and large files
- **MCP Protocol**: Standard Model Context Protocol server for AI assistant integration
## Requirements
- Node.js 18 or higher
- Supabase account and project
- Cohere API account
- MCP-compatible client (Claude Desktop, etc.)
## Installation & Setup
### 1. Install the Package
```bash
# Install globally via npm
npm install -g @taiyokimura/rag-mcp
# Or run directly with npx
npx @taiyokimura/rag-mcp@latest
```
### 2. Database Setup
1. Create a Supabase project at https://supabase.com
2. Execute the SQL schema from `db/schema.sql` in your Supabase SQL editor
3. Get your Supabase URL and anon key from the API settings
### 3. Get API Keys
- **Supabase**: Project URL and anon key from your Supabase dashboard
- **Cohere**: API key from the Cohere dashboard (https://cohere.com)
### 4. Configure MCP Client
Add the server to your MCP client configuration:
#### Claude Desktop
Add to your `claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "rag-mcp": {
      "command": "npx",
      "args": ["@taiyokimura/rag-mcp@latest"],
      "env": {
        "SUPABASE_URL": "https://your-project-id.supabase.co",
        "SUPABASE_ANON_KEY": "your-anon-key-here",
        "COHERE_API_KEY": "your-cohere-api-key-here"
      }
    }
  }
}
```
#### Other MCP Clients
Use the command format:
```bash
npx @taiyokimura/rag-mcp@latest
```
With environment variables:
- `SUPABASE_URL`: Your Supabase project URL
- `SUPABASE_ANON_KEY`: Your Supabase anon key
- `COHERE_API_KEY`: Your Cohere API key
- `MCP_NAME`: (Optional) Custom server name (default: "rag-mcp")
## Usage
### 1. Initialize Repository
First, initialize your repository to embed all files:
```
Use the initialize_repository tool with your repository path
```
This will:
- Scan all supported files in the repository
- Generate embeddings using Cohere
- Store content and embeddings in Supabase
- Skip large files (>1MB) and ignored patterns
### 2. Search Code
Search for relevant code using natural language:
```
Use the search_code tool with your search query
```
Examples:
- "authentication functions"
- "database connection setup"
- "error handling middleware"
- "API endpoint for user management"
## Supported File Types
The server automatically processes these file types:
**Programming Languages:**
- JavaScript/TypeScript (`.js`, `.ts`, `.jsx`, `.tsx`)
- Python (`.py`)
- Java (`.java`)
- C/C++ (`.c`, `.cpp`, `.h`)
- C# (`.cs`)
- PHP (`.php`)
- Ruby (`.rb`)
- Go (`.go`)
- Rust (`.rs`)
- Swift (`.swift`)
- Kotlin (`.kt`)
- Scala (`.scala`)
**Web Technologies:**
- HTML (`.html`)
- CSS/SCSS/Sass/Less (`.css`, `.scss`, `.sass`, `.less`)
- XML (`.xml`)
**Configuration & Data:**
- JSON (`.json`)
- YAML (`.yaml`, `.yml`)
- SQL (`.sql`)
- Environment files (`.env`)
**Documentation:**
- Markdown (`.md`)
- Text files (`.txt`)
**Scripts:**
- Shell scripts (`.sh`, `.bash`, `.zsh`)
- PowerShell (`.ps1`)
- Batch files (`.bat`)
- Dockerfile
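As a rough illustration of how the list above could translate into a file filter, here is a minimal sketch. The helper name `isSupportedFile` and the constant `SUPPORTED_EXTENSIONS` are hypothetical and the extension set is copied from this README, so the server's actual implementation and list may differ.

```typescript
// Extensions copied from the list in this README (assumption: the real
// server's list may differ).
const SUPPORTED_EXTENSIONS = new Set<string>([
  ".js", ".ts", ".jsx", ".tsx", ".py", ".java", ".c", ".cpp", ".h", ".cs",
  ".php", ".rb", ".go", ".rs", ".swift", ".kt", ".scala",
  ".html", ".css", ".scss", ".sass", ".less", ".xml",
  ".json", ".yaml", ".yml", ".sql", ".env",
  ".md", ".txt",
  ".sh", ".bash", ".zsh", ".ps1", ".bat",
]);

// Hypothetical helper: decides whether a path would be picked up for indexing.
function isSupportedFile(filePath: string): boolean {
  // Take the last path segment, accepting both "/" and "\" separators.
  const name = filePath.split(/[\\/]/).pop() ?? "";
  // Dockerfile has no extension, so it is matched by file name.
  if (name === "Dockerfile") return true;
  const dot = name.lastIndexOf(".");
  if (dot < 0) return false;
  // Compare case-insensitively so "App.TS" is treated like "app.ts".
  return SUPPORTED_EXTENSIONS.has(name.slice(dot).toLowerCase());
}
```

Matching on the final path segment means a dotfile such as `.env` is treated as its own extension, which is consistent with it appearing in the list above.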
## Build & Development
### Local Development
```bash
# Clone the repository
git clone https://github.com/your-username/rag-mcp.git
cd rag-mcp
# Install dependencies
npm install
# Build the project
npm run build
# Run locally
npm run dev
```
### Build for Distribution
```bash
# Build TypeScript
npm run build
# Test the built version
node build/index.js
# Preview the package contents without creating a tarball
npm pack --dry-run
```
## Publishing to npm
```bash
# Login to npm
npm login
# Publish the package
npm publish
```
## Tools
### initialize_repository
Scans and embeds all files in a repository.
**Input Schema:**
```json
{
  "repository_path": "string (required) - Path to the repository"
}
```
**Output:**
- Success/failure status
- Statistics (total files, processed, failed)
- Processing details
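MCP clients normally build the tool call for you, but it can help to see what travels over the wire. The sketch below constructs a JSON-RPC 2.0 `tools/call` request for `initialize_repository`. The helper `buildToolCall`, the `ToolCallRequest` type, and the repository path are illustrative placeholders, not part of this package.

```typescript
// Shape of a JSON-RPC 2.0 request for the MCP "tools/call" method.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper that wraps a tool name and its arguments in a request.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

const request = buildToolCall(1, "initialize_repository", {
  repository_path: "/path/to/your/repo", // placeholder path
});

// Print the request as it would be serialized for the server.
console.log(JSON.stringify(request, null, 2));
```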
### search_code
Searches for code using vector similarity.
**Input Schema:**
```json
{
  "query": "string (required) - Search query",
  "limit": "number (optional) - Max results (default: 5)"
}
```
**Output:**
- Matching files with similarity scores
- Content previews
- File metadata (path, type, size)
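To make "similarity scores" concrete, here is a small in-memory sketch of ranking files against a query embedding. It assumes cosine similarity, which this README does not confirm. In the real server the comparison is presumably delegated to the vector index in Supabase rather than done in application code. All names (`IndexedFile`, `rankBySimilarity`) are hypothetical.

```typescript
interface IndexedFile {
  path: string;
  embedding: number[];
}

interface SearchHit {
  path: string;
  similarity: number;
}

// Cosine similarity: dot product divided by the product of vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  // A zero vector has no direction, so treat its similarity as 0.
  return denom === 0 ? 0 : dot / denom;
}

// Score every file, sort best-first, and keep the top `limit` results
// (default 5, mirroring the documented default of search_code).
function rankBySimilarity(
  queryEmbedding: number[],
  files: IndexedFile[],
  limit = 5,
): SearchHit[] {
  return files
    .map((f) => ({ path: f.path, similarity: cosineSimilarity(queryEmbedding, f.embedding) }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, limit);
}
```

A brute-force scan like this is linear in the number of files; an approximate index trades a little recall for much faster lookups on large repositories.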
## Name Consistency & Troubleshooting
### Consistency Matrix
Always use these standardized names:
- **npm package name** → `@taiyokimura/rag-mcp`
- **Binary name** → `rag-mcp`
- **MCP server name** → `rag-mcp`
- **Environment variable MCP_NAME** → `rag-mcp`
- **Client registry key** → `rag-mcp`
- **UI display label** → `RAG MCP`
### Conflict Cleanup
- Remove any old entries with different names and re-add with `rag-mcp`
- Ensure global MCP configurations only use `rag-mcp` for keys
- This project does not include `.cursor/mcp.json`; configure the server in the client UI only
### Example Configuration
**Correct:**
```json
{
  "mcpServers": {
    "rag-mcp": {
      "command": "npx",
      "args": ["@taiyokimura/rag-mcp@latest"]
    }
  }
}
```
**Incorrect:**
```json
{
  "mcpServers": {
    "RAG-MCP": { ... },
    "ragMcp": { ... }
  }
}
```
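If you maintain several client configurations, a quick check like the following can flag keys that are near-misses of `rag-mcp`. This is only a sketch: the helper name `findNonStandardKeys` is hypothetical, and it assumes the configuration has already been parsed into an object with an `mcpServers` map.

```typescript
interface McpConfig {
  mcpServers: Record<string, unknown>;
}

const STANDARD_KEY = "rag-mcp";

// Hypothetical helper: returns keys that look like variants of "rag-mcp"
// (different casing or separators) but are not spelled exactly that way.
function findNonStandardKeys(config: McpConfig): string[] {
  const normalize = (key: string) => key.toLowerCase().replace(/[-_\s]/g, "");
  const target = normalize(STANDARD_KEY);
  return Object.keys(config.mcpServers).filter(
    (key) => key !== STANDARD_KEY && normalize(key) === target,
  );
}
```

Any key it returns is a candidate for removal and re-adding under `rag-mcp`, as described in the cleanup steps.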
## Environment Variables
- `SUPABASE_URL`: Your Supabase project URL (required)
- `SUPABASE_ANON_KEY`: Your Supabase anonymous key (required)
- `COHERE_API_KEY`: Your Cohere API key (required)
- `MCP_NAME`: Server name (optional, default: "rag-mcp")
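The required/optional split above could be enforced at startup along these lines. This is a sketch under assumptions: `readConfig` and `ServerConfig` are illustrative names, and the server's real startup code and error messages may look different.

```typescript
interface ServerConfig {
  supabaseUrl: string;
  supabaseAnonKey: string;
  cohereApiKey: string;
  mcpName: string;
}

// Hypothetical helper: validates an environment map and applies the default name.
function readConfig(env: Record<string, string | undefined>): ServerConfig {
  const required = ["SUPABASE_URL", "SUPABASE_ANON_KEY", "COHERE_API_KEY"] as const;
  // Collect every missing variable so the error reports them all at once.
  const missing = required.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  return {
    supabaseUrl: env.SUPABASE_URL as string,
    supabaseAnonKey: env.SUPABASE_ANON_KEY as string,
    cohereApiKey: env.COHERE_API_KEY as string,
    // MCP_NAME is optional and falls back to the documented default.
    mcpName: env.MCP_NAME || "rag-mcp",
  };
}
```

In a Node.js process this would typically be called as `readConfig(process.env)`. Taking the environment as a parameter keeps the function easy to test.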
## Performance Considerations
- **File Size Limit**: Files larger than 1MB are automatically skipped
- **Batch Processing**: Files are processed in batches to avoid rate limits
- **Vector Index**: Uses HNSW index for fast similarity searches
- **Ignored Patterns**: Automatically skips `node_modules`, build directories, etc.
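The size limit, ignored patterns, and batching described above might combine roughly as follows. This is a sketch, not the server's code: the exact byte threshold, the ignored directory names other than `node_modules`, and the helper names are all assumptions.

```typescript
// Assumption: "1MB" is interpreted as 1024 * 1024 bytes.
const MAX_FILE_BYTES = 1024 * 1024;

// `node_modules` comes from this README; the other entries are assumed examples
// of "build directories, etc.".
const IGNORED_DIRS = new Set<string>(["node_modules", "build", "dist", ".git"]);

interface Candidate {
  path: string;
  sizeBytes: number;
}

// A file is skipped if it is too large or sits under an ignored directory.
function shouldSkip(file: Candidate): boolean {
  if (file.sizeBytes > MAX_FILE_BYTES) return true;
  return file.path.split(/[\\/]/).some((segment) => IGNORED_DIRS.has(segment));
}

// Split items into consecutive groups of at most `batchSize` elements, so each
// group can be sent to the embedding API as one request.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}
```

Filtering before batching keeps skipped files from counting against the batch size, so every request carries as much useful content as possible.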
## Troubleshooting
### Common Issues
1. **"Supabase client not initialized"**
   - Check your `SUPABASE_URL` and `SUPABASE_ANON_KEY` environment variables
   - Verify your Supabase project is active
2. **"Cohere client not initialized"**
   - Check your `COHERE_API_KEY` environment variable
   - Verify your Cohere account has API access
3. **"Vector extension not available"**
   - Make sure you executed the database schema in Supabase
   - The `vector` (pgvector) extension is available on Supabase but must be enabled for the project; enable it from the dashboard's Extensions page if the schema did not
4. **Slow search performance**
   - Ensure the HNSW vector index was created successfully
   - Consider adjusting the `match_threshold` in the search function
### Debug Mode
Run with debug logging:
```bash
DEBUG=* npx @taiyokimura/rag-mcp@latest
```
## References
- [Model Context Protocol](https://modelcontextprotocol.io/)
- [MCP SDK Documentation](https://modelcontextprotocol.io/docs/sdks)
- [Supabase Vector Documentation](https://supabase.com/docs/guides/ai/vector-embeddings)
- [Cohere Embeddings API](https://docs.cohere.com/docs/embeddings)
## License
MIT
## Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request
## Support
- GitHub Issues: https://github.com/your-username/rag-mcp/issues
- Documentation: See `db/setup-instructions.md` for detailed database setup