embedocs-mcp
Version:
Transform any GitHub repository into searchable vector embeddings. MCP server with smart indexing, voyage-context-3 embeddings, and semantic search for Claude/Cursor IDEs.
535 lines (410 loc) • 18.8 kB
Markdown
```
███████╗███╗ ███╗██████╗ ███████╗██████╗ ██████╗ ██████╗███████╗
██╔════╝████╗ ████║██╔══██╗██╔════╝██╔══██╗██╔═══██╗██╔════╝██╔════╝
█████╗ ██╔████╔██║██████╔╝█████╗ ██║ ██║██║ ██║██║ ███████╗
██╔══╝ ██║╚██╔╝██║██╔══██╗██╔══╝ ██║ ██║██║ ██║██║ ╚════██║
███████╗██║ ╚═╝ ██║██████╔╝███████╗██████╔╝╚██████╔╝╚██████╗███████║
╚══════╝╚═╝ ╚═╝╚═════╝ ╚══════╝╚═════╝ ╚═════╝ ╚═════╝╚══════╝
```
<div align="center">
# 🧠 **AI That Actually Knows Your Docs**
[](https://www.npmjs.com/package/embedocs-mcp)
[](https://opensource.org/licenses/MIT)
[](https://nodejs.org)
[](https://embedocs.site/)
**Stop googling outdated Stack Overflow. Give your AI access to the LATEST documentation.**
*AI knowledge cutoffs are killing developer productivity*
[🌐 Website](https://embedocs.site/) • [🚀 Quick Start](#-quick-start) • [⚡ Power of Semantic Search](#-the-semantic-search-advantage) • [🎯 Examples](#-real-world-examples) • [📖 Setup](#-setup-guide)
</div>
## 🤕 **The Documentation Hell Every Developer Lives In**
Your AI assistant has **knowledge cutoffs** - it doesn't know about:
```
❌ New MongoDB 8.0 features (AI knows up to 7.0)
❌ Latest React 19 APIs (AI stuck on 18)
❌ Fresh TypeScript 5.6 syntax (AI knows 5.2)
❌ Your company's internal APIs (AI has no clue)
❌ Updated AWS services (AI knowledge is 6 months old)
```
**So you waste HOURS:**
- 🔍 Googling for current docs
- 📖 Reading through endless documentation pages
- 🤔 Figuring out what's changed since AI's training
- 😫 Getting outdated or wrong answers from AI
## 🧠 **EmbeDocs: AI With Current Knowledge**
```
┌──────────────────┐ ┌─────────────────┐ ┌──────────────────┐
│ Latest Docs │───▶│ EmbeDocs │───▶│ Smart AI │
│ 📚 MongoDB 8.0 │ │ 🧠 Semantic │ │ 💡 Current │
│ ⚛️ React 19 │ │ 🔍 Search │ │ Answers │
│ 🔷 TypeScript │ │ ⚡️ Instant │ │ │
│ ☁️ AWS Latest │ │ Context │ │ │
└──────────────────┘ └─────────────────┘ └──────────────────┘
```
**Give your AI CURRENT, ACCURATE documentation knowledge in minutes**
✅ **After EmbeDocs**:
```
✅ You: "How do I use MongoDB 8.0's new queryable encryption?"
🤖 AI: [Finds latest docs, explains step-by-step with current syntax]
✅ You: "What's new in React 19 server components?"
🤖 AI: [Returns exact React 19 documentation with examples]
✅ You: "How does TypeScript 5.6 handle the new import assertions?"
🤖 AI: [Shows current TypeScript docs with working code samples]
```
## ⚡ **The Semantic Search Advantage**
### 🔍 **Beyond Keyword Matching**
Traditional search finds words. **EmbeDocs understands MEANING.**
```bash
# You search: "slow database"
# Regular search finds: documents containing "slow" AND "database"
# EmbeDocs semantic search finds: performance optimization, indexing strategies,
# query bottlenecks, N+1 problems, connection pooling - ALL related concepts!
```
### 🧠 **Powered by voyage-context-3**
- **1024-dimensional embeddings** - Captures deep semantic relationships
- **32K token context** - Understands entire documentation pages
- **Code-optimized** - Specifically trained on programming content
- **Multi-language** - Works across JavaScript, Python, Go, Rust, Java, C++
### 🎯 **Smart Search Modes**
1. **Hybrid Search** (Default): Combines semantic understanding + keyword precision
2. **MMR Search** (Advanced): Maximum diversity - finds ALL related concepts, not just similar ones
3. **Vector Search** (Pure): 100% meaning-based, perfect for conceptual questions
## 🎯 **Real-World Examples**
### **👨💻 Keep Up With Fast-Moving Projects**
```bash
# Add repos via web interface
embedocs setup
# Select and add:
# - facebook/react (Latest React documentation)
# - microsoft/TypeScript (Current TypeScript docs)
# - Your company's documentation repos
# Then index them all:
embedocs index
# Now your AI knows CURRENT features:
"What's new in React 19?"
"How do TypeScript 5.6 decorators work?"
"Show me the latest Suspense patterns"
```
### **🏢 Company Internal Documentation**
```bash
# Add your company repos through the web interface
embedocs setup
# Add your private repositories:
# - yourcompany/api-docs
# - yourcompany/architecture-guide
# - yourcompany/internal-wiki
# Your AI now understands your business:
"How does our payment processing work?"
"What are our microservice communication patterns?"
"Where do we handle user authentication?"
```
### **📚 Master New Technologies**
```bash
# Use the web interface to add cutting-edge projects
embedocs setup
# Add repositories like:
# - vercel/next.js
# - openai/openai-python
# - langchain-ai/langchain
# Learn from the source:
"How does Next.js App Router actually work?"
"What's the best way to use OpenAI's new API?"
"Show me advanced LangChain patterns"
```
## 🚀 **Quick Start** *(3 Simple Steps)*
### **Step 1: Install**
```bash
npm install -g embedocs-mcp
```
### **Step 2: First Run** *(Auto-launches setup wizard!)*
```bash
embedocs
# ✨ Automatically opens setup wizard on first run!
```
Or manually run setup anytime:
```bash
embedocs setup
```
### **🎨 Beautiful Web Interface**
<div align="center">
<img src="https://embedocs.site/pics/Screenshot%202025-08-19%20at%2015.29.52.png" alt="EmbeDocs Setup Wizard" width="800">
<br>
<em>Modern, intuitive setup wizard with a stunning 2025 UI design</em>
</div>
**🌐 Opens a stunning web interface in your browser!**
- Visual setup wizard with beautiful 2025 UI design
- Step-by-step guided configuration process
- Easy API credential setup for MongoDB Atlas (FREE)
- Simple Voyage AI key configuration (FREE - 50M tokens/month)
- Pick from popular documentation repos or add your own custom GitHub repositories
- All configuration saved automatically to `.env`
- Real-time connection testing and validation
### **Step 3: Add & Index Your Documentation**
**Option A: Using Web Interface** (Recommended ✨)
```bash
embedocs setup # or just 'embedocs' on first run
```
- Select from popular repos, add your own GitHub repositories, or switch to the
"Official Website" tab and paste a docs root URL (e.g., https://www.mongodb.com/docs/).
- Click "Validate & Add Website" to ingest the entire site (sitemap + discover).
- Click "Start Indexing" to begin
- All selected repos are saved for future CLI use
**Option B: Command Line** (After adding repos via web)
```bash
# After adding repos through web interface:
embedocs index # Indexes all your selected repositories
embedocs update # Updates only changed files
embedocs rebuild # Force re-index everything
```
**Important**: You must first add repositories using the web interface (`embedocs setup`). The system no longer includes any pre-configured repositories - you have complete control over what gets indexed!
### **Step 4: Connect to Your AI**
**Cursor IDE** (Recommended):
```json
// .cursor/settings.json
{
"mcpServers": {
"embedocs": {
"command": "npx",
"args": ["embedocs-mcp"],
"env": {
"MONGODB_URI": "your-mongodb-connection-string",
"VOYAGE_API_KEY": "your-voyage-api-key"
}
}
}
}
```
**Claude Code** (Same configuration):
```json
{
"mcpServers": {
"embedocs": {
"command": "npx",
"args": ["embedocs-mcp"],
"env": {
"MONGODB_URI": "your-mongodb-connection-string",
"VOYAGE_API_KEY": "your-voyage-api-key"
}
}
}
}
```
### **Step 5: Ask Current Questions!**
Your AI now has access to the LATEST documentation! 🎉
## 🔧 **What EmbeDocs Actually Does**
### 🎯 **Core Function**
**Indexes documentation repositories** and makes them **semantically searchable** by your AI through the Model Context Protocol (MCP).
### 🧠 **Smart Processing**
- **Semantic Chunking**: Intelligently splits docs into meaningful pieces (100-2500 chars)
- **voyage-context-3 Embeddings**: Creates 1024-dimensional vectors that understand code context
- **Automatic Indexing**: MongoDB Atlas vector + text search indexes created automatically
- **Git-Aware Updates**: Only processes changed files on updates
### 🔍 **Semantic Search Power**
- **Understands Intent**: "slow queries" finds performance docs, indexing guides, optimization tips
- **Code Context**: Knows that "authentication" relates to JWT, OAuth, sessions, middleware
- **Cross-Language**: Finds similar patterns across JavaScript, Python, Go implementations
- **Lightning Fast**: <100ms search responses with 7.5x performance optimization
### 🔌 **Universal AI Integration**
- **MCP Protocol**: Works with Claude Desktop, Cursor IDE, any MCP-compatible AI
- **Four Powerful Tools**: Primary hybrid search, advanced MMR search, full context fetcher, system status
- **Production Ready**: Handles 14,880+ documents with 0 failures
## 📖 **Setup Requirements** *(All FREE!)*
### **1. MongoDB Atlas** (Free 512MB tier)
- [Sign up here](https://cloud.mongodb.com)
- Create cluster → Copy connection string
- Add `0.0.0.0/0` to Network Access (allows EmbeDocs to connect)
### **2. Voyage AI** (Free 50M tokens/month)
- [Get API key here](https://voyageai.com)
- Industry-leading code embeddings
- 50M tokens = process 1000+ documentation repositories
### **3. Node.js 18+**
- [Download here](https://nodejs.org)
## 📊 **Why Semantic Search Matters**
### **Traditional Keyword Search vs EmbeDocs Semantic Search**
| Query | Keyword Search | EmbeDocs Semantic Search |
|-------|----------------|-------------------------|
| "slow database" | Finds docs with "slow" + "database" | Finds: performance tuning, indexing strategies, query optimization, connection pooling, N+1 problems |
| "user login" | Finds "user" + "login" exact matches | Finds: authentication, JWT tokens, OAuth flows, session management, middleware, security |
| "API errors" | Finds "API" + "errors" | Finds: error handling, HTTP status codes, exception patterns, debugging, logging, monitoring |
### **Real Performance Gains**
- **Search Speed**: <100ms average response time
- **Accuracy**: 92% relevance score with MMR diversity
- **Coverage**: Finds 3-5x more relevant results than keyword search
- **Context**: Understands relationships between concepts
## 🛠️ **Advanced Usage**
### **Index Multiple Documentation Sources**
```bash
# Frontend ecosystem
embedocs index https://github.com/facebook/react
embedocs index https://github.com/vuejs/core
embedocs index https://github.com/angular/angular
# Backend frameworks
embedocs index https://github.com/expressjs/express
embedocs index https://github.com/nestjs/nest
embedocs index https://github.com/django/django
# Cloud & DevOps
embedocs index https://github.com/aws/aws-cli
embedocs index https://github.com/kubernetes/kubernetes
embedocs index https://github.com/docker/cli
```
### **Monitor Indexing Progress**
```bash
# 🌐 Opens beautiful web dashboard at http://localhost:3333
embedocs progress
```
**Features:**
- Real-time progress bars and statistics
- "Keep Mac Awake" button (prevents sleep during long indexing)
- Shows all repositories being indexed
- Auto-refreshes every 5 seconds
- Estimated time remaining
```bash
# Quick CLI status check (no browser)
embedocs status
```
### **Smart Search Workflow with Full Context**
**CRITICAL: Search returns CHUNKS, not complete files!**
Always use the two-step workflow for complete understanding:
```bash
# Step 1: Search for relevant files
"How does the chatbot generate responses?"
→ mongodb-search finds: generate-response.js (partial chunk showing ~500 chars)
# Step 2: Get COMPLETE file content
→ mongodb-fetch-full-context("generate-response.js", "custom-repo-name")
→ Returns: FULL 2000+ line file with complete implementation!
```
**The Four Tools:**
1. **mongodb-search**: RRF hybrid search - best for general queries
2. **mongodb-mmr-search**: Maximum Marginal Relevance - best for diverse results
3. **mongodb-fetch-full-context**: Gets COMPLETE file content after search
4. **mongodb-status**: System health and statistics
**Smart Search Strategies:**
```bash
# For broad understanding - use hybrid search + fetch full context
"How does React handle state management?"
→ Search finds relevant files → Fetch complete implementations
# For comprehensive research - use MMR search + fetch full context
"Find ALL approaches to database optimization"
→ MMR finds diverse approaches → Fetch full files for each
# For specific implementations - always fetch full context
"Show me the authentication middleware"
→ Search finds auth.js → Fetch complete middleware code
```
## 🏗️ **Architecture: How It Works**
```
GitHub Documentation
↓
Git Clone & Parse
↓
Semantic Chunking (100-2500 chars)
↓
voyage-context-3 Embeddings (1024 dimensions)
↓
MongoDB Atlas (Vector + Text Indexes)
↓
MCP Protocol Tools
↓
Your AI Assistant
```
**Built on Production Infrastructure**:
- 🚀 **MongoDB Atlas**: Auto-creates vector search indexes, handles 50K+ documents on free tier
- 🧭 **Voyage AI**: State-of-the-art code embeddings, specifically trained for programming content
- 🤖 **MCP Protocol**: Standard integration works with any MCP-compatible AI assistant
## 💰 **Pricing: 100% FREE for Most Developers**
- **MongoDB Atlas**: 512MB free tier (handles 50,000+ documents)
- **Voyage AI**: 50M tokens/month free (index 1000+ repositories)
- **EmbeDocs**: Open source MIT license
- **Total Cost**: $0/month for typical usage
**Enterprise Scale**: Both services offer paid tiers for massive documentation sets.
## 🌟 **Why EmbeDocs vs Alternatives**
### **vs Googling Documentation**
- ❌ Google: Outdated results, SEO spam, wrong versions
- ✅ EmbeDocs: Always current, semantic understanding, AI integration
### **vs AI with Knowledge Cutoffs**
- ❌ Standard AI: 6-month old knowledge, makes up answers
- ✅ EmbeDocs: Real-time current docs, factual responses
### **vs Manual Documentation Reading**
- ❌ Manual: Hours of reading, finding specific answers
- ✅ EmbeDocs: Instant semantic search, AI explains in context
### **vs Other Documentation Tools**
- ❌ Others: Keyword search only, complex setup, expensive
- ✅ EmbeDocs: Semantic understanding, 60-second setup, free tier
## 🎯 **Perfect For**
### **📚 Documentation-Heavy Projects**
- MongoDB, PostgreSQL, Redis documentation
- AWS, GCP, Azure cloud service docs
- React, Vue, Angular framework documentation
- Company internal API documentation
### **⚡ Fast-Moving Technologies**
- AI/ML libraries (OpenAI, LangChain, Transformers)
- New language features (TypeScript, JavaScript, Python)
- Framework updates (Next.js, Django, Spring)
- Database new features (MongoDB, PostgreSQL)
### **🏢 Enterprise Internal Docs**
- Architecture decision records
- API specifications and guides
- Deployment and operational procedures
- Company coding standards and best practices
## 🔧 **Troubleshooting**
### **Setup Issues**
- **"embedocs: command not found"**: Run `npm install -g embedocs-mcp` with sudo if needed
- **Web interface doesn't open**: Navigate manually to http://localhost:3333
- **MongoDB connection fails**: Make sure to add `0.0.0.0/0` to Network Access in Atlas
### **Environment Configuration**
If the web setup doesn't work, create `.env` file manually:
```bash
# Create .env in your project directory
MONGODB_URI=mongodb+srv://username:password@cluster.mongodb.net/
VOYAGE_API_KEY=pa-your-api-key-here
```
### **Indexing Issues**
- **"No repositories configured"**: Run `embedocs setup` to add repositories first
- **Rate limit errors**: Voyage AI free tier is limited to 2000 RPM - indexing automatically handles this
- **"0 chunks" for some files**: Normal for very small files
- **Process seems stuck**: Check `embedocs progress` for real-time status
### **Repository Management**
- All repositories are stored in `.repos/metadata.json`
- No hardcoded/default repositories - you control what gets indexed
- Add repos via web interface: `embedocs setup`
- Remove repos by editing `.repos/metadata.json` or using web interface
## 🤝 **Contributing**
Help make AI smarter about documentation!
```bash
git clone https://github.com/romiluz13/EmbeDocs-MCP.git
cd EmbeDocs-MCP
npm install
npm run build
npm test
```
**Areas for Contribution**:
- Support for more documentation formats (GitBook, Notion, etc.)
- Better chunking strategies for different content types
- Additional embedding models and search algorithms
- UI improvements for the setup wizard
## 📝 **License**
MIT © [Rom Iluz](https://github.com/romiluz13)
<div align="center">
### **🎯 Stop Fighting Outdated AI Knowledge**
```bash
npm install -g embedocs-mcp && embedocs
# Just run 'embedocs' - it auto-launches setup on first run!
```
**Give your AI access to current, accurate documentation in 60 seconds**
**[🌐 Website](https://embedocs.site/)** • **[⭐ Star on GitHub](https://github.com/romiluz13/EmbeDocs-MCP)** • **[📦 npm Package](https://www.npmjs.com/package/embedocs-mcp)** • **[🐛 Report Issues](https://github.com/romiluz13/EmbeDocs-MCP/issues)**
*"AI knowledge cutoffs are killing developer productivity. EmbeDocs fixes that."*
</div>