@taiyokimura/rag-mcp

RAG (Retrieval-Augmented Generation) MCP Server with Supabase and Cohere integration

# RAG MCP

A Retrieval-Augmented Generation (RAG) MCP server that integrates with Supabase and Cohere to provide semantic code search and retrieval.

## Features

- **Automatic Repository Indexing**: Scans and embeds all supported files in a repository
- **Vector Similarity Search**: Uses Cohere embeddings for semantic code search
- **Supabase Integration**: Stores embeddings and metadata in Supabase PostgreSQL with vector support
- **File Type Support**: Recognizes 30+ file extensions covering programming languages, web, configuration, documentation, and script files
- **Smart Filtering**: Automatically skips build artifacts, dependencies, and large files
- **MCP Protocol**: Standard Model Context Protocol server for AI assistant integration

## Requirements

- Node.js 18 or higher
- Supabase account and project
- Cohere API account
- MCP-compatible client (Claude Desktop, etc.)

## Installation & Setup

### 1. Install the Package

```bash
# Install globally via npm
npm install -g @taiyokimura/rag-mcp

# Or run directly with npx
npx @taiyokimura/rag-mcp@latest
```

### 2. Database Setup

1. Create a Supabase project at https://supabase.com
2. Run the SQL schema from `db/schema.sql` in your Supabase SQL editor
3. Copy your Supabase URL and anon key from the project's API settings

### 3. Get API Keys

- **Supabase**: Project URL and anon key from your Supabase dashboard
- **Cohere**: API key from the https://cohere.com dashboard

### 4. Configure MCP Client

Add the server to your MCP client configuration.

#### Claude Desktop

Add to your `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "rag-mcp": {
      "command": "npx",
      "args": ["@taiyokimura/rag-mcp@latest"],
      "env": {
        "SUPABASE_URL": "https://your-project-id.supabase.co",
        "SUPABASE_ANON_KEY": "your-anon-key-here",
        "COHERE_API_KEY": "your-cohere-api-key-here"
      }
    }
  }
}
```

#### Other MCP Clients

Use this command:

```bash
npx @taiyokimura/rag-mcp@latest
```

and set these environment variables:

- `SUPABASE_URL`: Your Supabase project URL
- `SUPABASE_ANON_KEY`: Your Supabase anon key
- `COHERE_API_KEY`: Your Cohere API key
- `MCP_NAME`: (Optional) Custom server name (default: `"rag-mcp"`)

## Usage

### 1. Initialize Repository

First, initialize your repository to embed all files:

```
Use the initialize_repository tool with your repository path
```

This will:

- Scan all supported files in the repository
- Generate embeddings using Cohere
- Store content and embeddings in Supabase
- Skip large files (>1MB) and ignored patterns

### 2. Search Code

Search for relevant code using natural language:

```
Use the search_code tool with your search query
```

Example queries:

- "authentication functions"
- "database connection setup"
- "error handling middleware"
- "API endpoint for user management"

## Supported File Types

The server automatically processes these file types:

**Programming Languages:**

- JavaScript/TypeScript (`.js`, `.ts`, `.jsx`, `.tsx`)
- Python (`.py`)
- Java (`.java`)
- C/C++ (`.c`, `.cpp`, `.h`)
- C# (`.cs`)
- PHP (`.php`)
- Ruby (`.rb`)
- Go (`.go`)
- Rust (`.rs`)
- Swift (`.swift`)
- Kotlin (`.kt`)
- Scala (`.scala`)

**Web Technologies:**

- HTML (`.html`)
- CSS/SCSS/Sass/Less (`.css`, `.scss`, `.sass`, `.less`)
- XML (`.xml`)

**Configuration & Data:**

- JSON (`.json`)
- YAML (`.yaml`, `.yml`)
- SQL (`.sql`)
- Environment files (`.env`)

**Documentation:**

- Markdown (`.md`)
- Text files (`.txt`)

**Scripts:**

- Shell scripts (`.sh`, `.bash`, `.zsh`)
- PowerShell (`.ps1`)
- Batch files (`.bat`)
- Dockerfile

## Build & Development

### Local Development

```bash
# Clone the repository
git clone https://github.com/your-username/rag-mcp.git
cd rag-mcp

# Install dependencies
npm install

# Build the project
npm run build

# Run locally
npm run dev
```

### Build for Distribution

```bash
# Build TypeScript
npm run build

# Test the built version
node build/index.js

# Preview the files that would be packed (no tarball is written)
npm pack --dry-run
```

## Publishing to npm

```bash
# Log in to npm
npm login

# Publish the package (scoped packages are restricted by default, so pass
# --access public unless package.json already sets publishConfig.access)
npm publish --access public
```

## Tools

### initialize_repository

Scans and embeds all supported files in a repository.

**Input Schema:**

```json
{
  "repository_path": "string (required) - Path to the repository"
}
```

**Output:**

- Success/failure status
- Statistics (total files, processed, failed)
- Processing details

### search_code

Searches for code using vector similarity.
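Vector similarity search ranks stored embeddings by how close they are to the query's embedding. The block below is a minimal, in-memory sketch of that ranking step, assuming cosine similarity (a common metric for embedding search) and a brute-force scan; the server itself stores vectors in Supabase and relies on an HNSW index, so its real query path differs. The names `EmbeddedFile`, `cosineSimilarity`, and `searchByVector`, and the 0.5 default threshold, are hypothetical. `limit` mirrors the `search_code` default of 5, and `matchThreshold` plays the role of the `match_threshold` mentioned under Troubleshooting.

```typescript
// Hypothetical types and helpers for illustration only; not the server's API.
interface EmbeddedFile {
  path: string;
  embedding: number[];
}

interface SearchHit {
  path: string;
  similarity: number;
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force search: score every file, drop scores below the threshold,
// sort best-first, and keep at most `limit` hits.
function searchByVector(
  query: number[],
  files: EmbeddedFile[],
  matchThreshold = 0.5,
  limit = 5,
): SearchHit[] {
  return files
    .map((f) => ({ path: f.path, similarity: cosineSimilarity(query, f.embedding) }))
    .filter((hit) => hit.similarity >= matchThreshold)
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, limit);
}

// Usage with toy 3-dimensional vectors (real Cohere embeddings are much larger).
const files: EmbeddedFile[] = [
  { path: "src/auth.ts", embedding: [1, 0, 0] },
  { path: "src/db.ts", embedding: [0, 1, 0] },
  { path: "src/login.ts", embedding: [0.9, 0.1, 0] },
];
const hits = searchByVector([1, 0, 0], files);
// src/auth.ts ranks first (similarity 1), then src/login.ts;
// src/db.ts scores 0 and is filtered out by the threshold.
console.log(hits.map((h) => `${h.path} ${h.similarity.toFixed(3)}`));
```

A brute-force scan costs one full comparison per stored file on every query; an approximate index such as HNSW gives up a little recall in exchange for much faster lookups on large repositories.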
**Input Schema:**

```json
{
  "query": "string (required) - Search query",
  "limit": "number (optional) - Max results (default: 5)"
}
```

**Output:**

- Matching files with similarity scores
- Content previews
- File metadata (path, type, size)

## Name Consistency & Troubleshooting

### Consistency Matrix

Always use these standardized names:

- **npm package name** → `@taiyokimura/rag-mcp`
- **Binary name** → `rag-mcp`
- **MCP server name** → `rag-mcp`
- **Environment variable `MCP_NAME`** → `rag-mcp`
- **Client registry key** → `rag-mcp`
- **UI display label** → `RAG MCP`

### Conflict Cleanup

- Remove any old entries registered under different names, then re-add the server as `rag-mcp`
- Ensure global MCP configurations use only `rag-mcp` as the key
- This project does not include a `.cursor/mcp.json`; configure it in the UI only

### Example Configuration

**Correct:**

```json
{
  "mcpServers": {
    "rag-mcp": {
      "command": "npx",
      "args": ["@taiyokimura/rag-mcp@latest"]
    }
  }
}
```

**Incorrect:**

```json
{
  "mcpServers": {
    "RAG-MCP": { ... },
    "ragMcp": { ... }
  }
}
```

## Environment Variables

- `SUPABASE_URL`: Your Supabase project URL (required)
- `SUPABASE_ANON_KEY`: Your Supabase anon key (required)
- `COHERE_API_KEY`: Your Cohere API key (required)
- `MCP_NAME`: Server name (optional, default: `"rag-mcp"`)

## Performance Considerations

- **File Size Limit**: Files larger than 1MB are automatically skipped
- **Batch Processing**: Files are processed in batches to avoid API rate limits
- **Vector Index**: Uses an HNSW index for fast similarity searches
- **Ignored Patterns**: Automatically skips `node_modules`, build directories, and similar paths

## Troubleshooting

### Common Issues

1. **"Supabase client not initialized"**
   - Check your `SUPABASE_URL` and `SUPABASE_ANON_KEY` environment variables
   - Verify that your Supabase project is active
2. **"Cohere client not initialized"**
   - Check your `COHERE_API_KEY` environment variable
   - Verify that your Cohere account has API access
3. **"Vector extension not available"**
   - Make sure you ran the database schema (`db/schema.sql`) in Supabase
   - The `vector` extension should be available by default in Supabase
4. **Slow search performance**
   - Ensure the HNSW vector index was created successfully
   - Consider adjusting the `match_threshold` in the search function

### Debug Mode

Run with debug logging:

```bash
DEBUG=* npx @taiyokimura/rag-mcp@latest
```

## References

- [Model Context Protocol](https://modelcontextprotocol.io/)
- [MCP SDK Documentation](https://modelcontextprotocol.io/docs/sdks)
- [Supabase Vector Documentation](https://supabase.com/docs/guides/ai/vector-embeddings)
- [Cohere Embeddings API](https://docs.cohere.com/docs/embeddings)

## License

MIT

## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request

## Support

- GitHub Issues: https://github.com/your-username/rag-mcp/issues
- Documentation: See `db/setup-instructions.md` for detailed database setup