# MCP Documentation Server (@vjlanguage/mcp-vj-docs)
A Model Context Protocol (MCP) server for documentation crawling, indexing, and retrieval. This package provides tools for crawling websites, storing and indexing their content, and searching that content with TF-IDF-based ranking. Search results are structured for consumption by large language models (LLMs).
# MCP 文档服务器 (@vjlanguage/mcp-vj-docs)
一个用于文档爬取、索引和检索的模型上下文协议(MCP)服务器。该包提供了爬取网站、存储和索引内容以及使用基于 TF-IDF 的搜索来搜索内容的工具。搜索结果经过优化,适合大型语言模型使用。
## Features | 功能
- **Documentation Crawling**: Crawl documentation from websites using Firecrawl
- **Content Processing**: Convert HTML to Markdown and extract relevant content
- **Storage & Indexing**: Store documents using lowdb with TF-IDF-based indexing
- **LLM-Optimized Search**: Search for documentation with aggregated results optimized for large language models
- **Full Content Return**: No character length limits on search results
- **Content-First Results**: Prioritizes content over URLs in search results
- **Smart Deduplication**: Removes duplicate content and returns only the top 3 most relevant results
- **AI-Optimized Format**: Results structured specifically for AI consumption and code generation
- **Complete Document Context**: Returns full document content via `fullDocument` field for comprehensive context
- **Custom Corpus Management**: Add your own text corpus files for inclusion in search results
- **Multiple Format Support**: Supports TXT, Markdown, and PDF files
- **Automatic Indexing**: Files in the corpus directory are automatically indexed and made searchable
- **MCP Integration**: Expose tools for crawling and searching via Model Context Protocol
- **Path Handling**: Support for tilde (~) expansion in file paths
- **Server Modes**: Support for both SSE (Server-Sent Events) and stdio transports
## 功能
- **文档爬取**:使用 Firecrawl 从网站爬取文档
- **内容处理**:将 HTML 转换为 Markdown 并提取相关内容
- **存储和索引**:使用 lowdb 存储文档,并使用基于 TF-IDF 的索引
- **LLM 优化搜索**:搜索文档并返回经过聚合的结果,专为大型语言模型优化
- **完整内容返回**:搜索结果没有字符长度限制
- **内容优先结果**:在搜索结果中优先考虑内容而非 URL
- **智能去重**:移除重复内容并仅返回前 3 个最相关的结果
- **AI 优化格式**:结果结构专为 AI 消费和代码生成而设计
- **完整文档上下文**:通过 `fullDocument` 字段返回完整文档内容,提供全面的上下文
- **自定义语料库管理**:添加您自己的文本语料库文件以包含在搜索结果中
- **多格式支持**:支持 TXT、Markdown 和 PDF 文件
- **自动索引**:语料目录中的文件自动索引并可搜索
- **MCP 集成**:通过模型上下文协议暴露爬取和搜索工具
- **路径处理**:支持波浪号(~)在文件路径中的扩展
- **服务器模式**:支持 SSE(服务器发送事件)和 stdio 传输
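The features above mention TF-IDF-based indexing and search. The sketch below illustrates what TF-IDF ranking means in general; it is not the package's implementation. The tokenizer, the smoothed IDF formula, and the `rankByTfIdf` name are assumptions made for illustration, and the ASCII-only tokenizer ignores Chinese text, which a real tokenizer would need to handle.

```javascript
// Illustrative TF-IDF ranking; NOT the package's actual implementation.
// Assumptions: lowercase alphanumeric tokens, term frequency normalized by
// document length, and a smoothed IDF of ln((N + 1) / (df + 1)) + 1.

// Split text into lowercase alphanumeric tokens.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Score every document against the query and return matches, best first.
function rankByTfIdf(docs, query) {
  const tokenized = docs.map((doc) => tokenize(doc.content));
  const n = docs.length;

  // Document frequency: how many documents contain each term.
  const df = new Map();
  for (const tokens of tokenized) {
    for (const term of new Set(tokens)) {
      df.set(term, (df.get(term) ?? 0) + 1);
    }
  }
  const idf = (term) => Math.log((n + 1) / ((df.get(term) ?? 0) + 1)) + 1;

  const queryTerms = [...new Set(tokenize(query))];
  return docs
    .map((doc, i) => {
      const tokens = tokenized[i];
      let relevance = 0;
      for (const term of queryTerms) {
        const tf = tokens.filter((t) => t === term).length / (tokens.length || 1);
        relevance += tf * idf(term);
      }
      return { ...doc, relevance };
    })
    .filter((doc) => doc.relevance > 0)
    .sort((a, b) => b.relevance - a.relevance);
}

// Usage: only the first page contains the query terms.
const docs = [
  { url: "https://example.com/docs/api", content: "Use the API by sending HTTP requests to the API endpoints." },
  { url: "https://example.com/docs/guide", content: "The user guide covers installation and setup." },
];
console.log(rankByTfIdf(docs, "API endpoints").map((doc) => doc.url));
// logs: [ 'https://example.com/docs/api' ]
```

Documents that share no terms with the query score zero and are dropped, and rarer terms (higher IDF) contribute more to the score than common ones.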
## Changelog | 更新日志
### 2025-04-11
- **Search Result Enhancement**: Modified search functionality to include relevant paragraphs for each individual result item, rather than only showing content for the top result.
- **Result Format Improvement**: Changed the structure to make it clearer which document content belongs to which search result.
- **Document Retrieval Enhancement**: Improved the `vjdoc_get_document` tool to support partial matching for both URL and title parameters.
### 2025年04月11日
- **搜索结果增强**:修改了搜索功能,为每个单独的结果项包含相关段落,而不仅仅是显示顶部结果的内容。
- **结果格式改进**:更改了结构,使其更清晰地显示哪些文档内容属于哪个搜索结果。
- **文档检索增强**:改进了 `vjdoc_get_document` 工具,支持 URL 和标题参数的部分匹配。
## Installation | 安装
```bash
# Install globally | 全局安装
npm install -g @vjlanguage/mcp-vj-docs
# Or use with npx | 或使用 npx
npx @vjlanguage/mcp-vj-docs
```
## Firecrawl Registration and API Key | Firecrawl 注册和 API 密钥
### English
This package uses the Firecrawl service for web crawling. To set it up:
1. **Register for Firecrawl**:
   - Visit the [Firecrawl website](https://firecrawl.dev) and create an account
   - Alternatively, use a self-hosted Firecrawl instance by setting `FIRECRAWL_API_URL` to its endpoint
2. **Get your API key**:
   - After registering, open your account dashboard
   - Find and copy your API key
3. **Configure the API key**:
   - Set the `FIRECRAWL_API_KEY` environment variable
   - Or add it to your MCP configuration (see the example below)
### 中文
本包使用 Firecrawl 服务进行网页爬取。要使用它,您需要:
1. **注册 Firecrawl**:
   - 访问 [Firecrawl 网站](https://firecrawl.dev) 并创建账户
   - 或通过设置 `FIRECRAWL_API_URL` 为您的本地端点来使用本地 Firecrawl 服务
2. **获取您的 API 密钥**:
   - 注册后,导航到您的账户仪表板
   - 找到并复制您的 API 密钥
   - 将此密钥添加到您的环境变量或 MCP 配置中
3. **配置 API 密钥**:
   - 设置 `FIRECRAWL_API_KEY` 环境变量
   - 或将其添加到您的 MCP 配置中(见下面的示例)
## Usage | 使用方法
### Environment Variables | 环境变量
- `VJDOC_DB_PATH` - Path to the database file (default: ./data/docs.json) | 数据库文件路径(默认:./data/docs.json)
- `VJDOC_MAX_DEPTH` - Maximum depth to crawl (default: 3) | 最大爬取深度(默认:3)
- `VJDOC_MAX_PAGES` - Maximum number of pages to crawl (default: 100) | 最大爬取页面数(默认:100)
- `VJDOC_LOG_DIR` - Directory for log files | 日志文件目录
- `VJDOC_LOG_TO_FILE` - Whether to log to file (true/false) | 是否记录到文件(true/false)
- `VJDOC_LOG_LEVEL` - Log level (error, warn, info, debug) | 日志级别(error, warn, info, debug)
- `FIRECRAWL_API_KEY` - API key for Firecrawl service | Firecrawl 服务的 API 密钥
- `FIRECRAWL_API_URL` - Custom URL for Firecrawl API | Firecrawl API 的自定义 URL
- `MCP_TRANSPORT` - Transport method (sse or stdio, default: sse) | 传输方法(sse 或 stdio,默认:sse)
- `VJDOC_TFIDF_FILES_DIR` - Directory for custom corpus files (default: ~/mcpdata/tfidf_files) | 自定义语料库文件目录(默认:~/mcpdata/tfidf_files)
Example MCP configuration:
```json
{
  "mcpServers": {
    "mcp-vj-docs": {
      "command": "npx",
      "args": ["-y", "@vjlanguage/mcp-vj-docs@latest"],
      "env": {
        "FIRECRAWL_API_KEY": "YOUR_API_KEY_HERE",
        "VJDOC_MAX_DEPTH": "4",
        "VJDOC_MAX_PAGES": "100",
        "VJDOC_DB_PATH": "~/mcpdata/docs.json",
        "VJDOC_LOG_DIR": "~/mcpdata/logs",
        "VJDOC_LOG_TO_FILE": "true",
        "VJDOC_LOG_LEVEL": "debug",
        "FIRECRAWL_API_URL": "http://localhost:5002",
        "VJDOC_TFIDF_FILES_DIR": "~/mcpdata/tfidf_files"
      },
      "disabled": false,
      "timeout": 3600,
      "autoApprove": ["vjdoc_search", "vjdoc_crawl", "vjdoc_add_corpus_file"]
    }
  }
}
```
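The environment variables above, including the `~` paths used in the example configuration, could be resolved roughly as follows. This is a hedged sketch: `loadConfig`, `expandTilde`, and `toInt` are hypothetical names, the defaults are copied from the variable list above rather than from the package source, and the `info` log level and `false` log-to-file defaults are assumptions because the list states none.

```javascript
const os = require("node:os");
const path = require("node:path");

// Expand a leading "~" to the current user's home directory.
function expandTilde(p) {
  if (p === "~") return os.homedir();
  if (p.startsWith("~/")) return path.join(os.homedir(), p.slice(2));
  return p;
}

// Parse a non-negative integer, falling back to a default when unset or invalid.
function toInt(value, fallback) {
  const n = Number.parseInt(value ?? "", 10);
  return Number.isInteger(n) && n >= 0 ? n : fallback;
}

// Hypothetical loader; documented defaults, plus assumed log defaults.
function loadConfig(env = process.env) {
  return {
    dbPath: expandTilde(env.VJDOC_DB_PATH ?? "./data/docs.json"),
    maxDepth: toInt(env.VJDOC_MAX_DEPTH, 3),
    maxPages: toInt(env.VJDOC_MAX_PAGES, 100),
    logDir: env.VJDOC_LOG_DIR ? expandTilde(env.VJDOC_LOG_DIR) : undefined,
    logToFile: env.VJDOC_LOG_TO_FILE === "true", // assumption: off unless "true"
    logLevel: env.VJDOC_LOG_LEVEL ?? "info", // assumption: "info" when unset
    transport: env.MCP_TRANSPORT === "stdio" ? "stdio" : "sse",
    corpusDir: expandTilde(env.VJDOC_TFIDF_FILES_DIR ?? "~/mcpdata/tfidf_files"),
    firecrawlApiKey: env.FIRECRAWL_API_KEY,
    firecrawlApiUrl: env.FIRECRAWL_API_URL,
  };
}
```

Environment values always arrive as strings, which is why numeric settings such as `"4"` are parsed explicitly and fall back to the documented default when missing or malformed.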
## MCP Tools | MCP 工具
The server exposes the following MCP tools:
服务器暴露以下 MCP 工具:
### 1. `vjdoc_crawl` Tool | `vjdoc_crawl` 工具
Crawls a website and indexes its content for search.
爬取网站并为搜索索引其内容。
**Parameters | 参数:**
- `url` (string, required): The URL to crawl (e.g., "https://example.com/docs") | 要爬取的 URL(例如,"https://example.com/docs")
- `maxDepth` (number, optional): Maximum depth to crawl, default: 3 | 最大爬取深度,默认:3
- `maxPages` (number, optional): Maximum number of pages to crawl, default: 100 | 最大爬取页面数,默认:100
- `includePatterns` (array of strings, optional): Patterns to include in crawl (e.g., ["docs/*"]) | 要包含在爬取中的模式(例如,["docs/*"])
- `excludePatterns` (array of strings, optional): Patterns to exclude from crawl (e.g., ["blog/*"]) | 要从爬取中排除的模式(例如,["blog/*"])
- `defaultCategory` (string, optional): Default category for documents if not detected automatically | 如果未自动检测到,文档的默认类别
**Example | 示例:**
```json
{
  "url": "https://example.com/docs",
  "maxDepth": 3,
  "maxPages": 100,
  "includePatterns": ["docs/*"],
  "excludePatterns": ["blog/*"]
}
```
**Response | 响应:**
```json
{
  "success": true,
  "message": "Successfully crawled and indexed 42 pages from https://example.com/docs",
  "count": 42
}
```
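This README does not specify how `includePatterns` and `excludePatterns` are matched (the crawl itself is delegated to Firecrawl). One plausible interpretation, sketched below purely as an assumption, treats each pattern as a glob in which `*` matches any run of characters, tests it against the URL path, and lets exclusions win over inclusions. `shouldCrawl` and `globToRegExp` are hypothetical names.

```javascript
// Hypothetical glob matcher: "*" matches any characters; all else is literal.
function globToRegExp(pattern) {
  const escaped = pattern
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  // Allow an optional leading slash so "docs/*" matches "/docs/api".
  return new RegExp(`^/?${escaped}$`);
}

// Decide whether a URL should be crawled given include/exclude patterns.
function shouldCrawl(url, includePatterns = [], excludePatterns = []) {
  const { pathname } = new URL(url);
  const matches = (patterns) => patterns.some((p) => globToRegExp(p).test(pathname));
  if (matches(excludePatterns)) return false; // exclusions take precedence
  return includePatterns.length === 0 || matches(includePatterns);
}
```

With the example arguments above, `https://example.com/docs/api` would be crawled and `https://example.com/blog/post` skipped.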
### 2. `vjdoc_search` Tool | `vjdoc_search` 工具
Searches indexed documents with results optimized for large language models.
搜索已索引的文档,结果经过优化,适合大型语言模型。
**Parameters | 参数:**
- `query` (string, required): The search query (e.g., "how to use the API") | 搜索查询(例如,"如何使用 API")
- `limit` (number, optional): Maximum number of sources to consider, default: 10 | 要考虑的最大源数,默认:10
- `filters` (object, optional): Optional filters to narrow down search results | 可选过滤器,用于缩小搜索结果范围
  - `categories` (array of strings, optional): Filter by document categories | 按文档类别过滤
  - `dateFrom` (number, optional): Filter documents created after this timestamp | 过滤在此时间戳之后创建的文档
  - `dateTo` (number, optional): Filter documents created before this timestamp | 过滤在此时间戳之前创建的文档
  - `metadata` (object, optional): Filter by metadata fields | 按元数据字段过滤
- `userId` (string, optional): Optional user ID for personalized results | 可选的用户 ID,用于个性化结果
**Example | 示例:**
```json
{
  "query": "how to use the API",
  "limit": 5,
  "filters": {
    "categories": ["API Documentation"]
  }
}
```
**Response | 响应:**
```json
{
  "success": true,
  "results": {
    "paragraph": "The API can be used by making HTTP requests to the endpoints...",
    "sources": [
      {
        "url": "https://example.com/docs/api",
        "title": "API Documentation",
        "relevance": 0.85,
        "paragraph": "The API can be used by making HTTP requests to the endpoints...",
        "highlightedParagraph": "The **API** can be used by making **HTTP** requests to the **endpoints**...",
        "fullDocument": "Complete document content for this specific result..."
      }
    ]
  }
}
```
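The `filters` object narrows the candidate set before ranking. Below is a sketch of one way these filters could be applied to stored documents. The document shape (`category`, `timestamp`, `metadata`) is borrowed from the `vjdoc_get_docs_meta` response and is an assumption, as are inclusive `dateFrom`/`dateTo` bounds and exact-equality matching for `metadata`; `applyFilters` is a hypothetical name.

```javascript
// Hypothetical filter step applied before ranking.
// Assumptions: inclusive date bounds, exact-equality metadata matching.
function applyFilters(docs, filters = {}) {
  const { categories, dateFrom, dateTo, metadata } = filters;
  return docs.filter((doc) => {
    // Keep only documents in one of the requested categories.
    if (categories?.length && !categories.includes(doc.category)) return false;
    // Keep only documents whose timestamp lies within [dateFrom, dateTo].
    if (dateFrom !== undefined && doc.timestamp < dateFrom) return false;
    if (dateTo !== undefined && doc.timestamp > dateTo) return false;
    // Every requested metadata key must match exactly.
    if (metadata) {
      for (const [key, value] of Object.entries(metadata)) {
        if (doc.metadata?.[key] !== value) return false;
      }
    }
    return true;
  });
}
```

Applying cheap filters first keeps TF-IDF scoring limited to documents the caller actually wants.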
### 3. `vjdoc_add_corpus_file` Tool | `vjdoc_add_corpus_file` 工具
Adds a custom corpus file to the TF-IDF files directory so that its content is included in search results. This is useful for adding your own code snippets, documentation, error solutions, or technical notes that you want to be searchable.
向 TF-IDF 文件目录添加自定义语料库文件,以包含在搜索结果中。这非常适合添加您自己的代码片段、文档、错误解决方案或技术笔记,使它们可被搜索。
**Parameters | 参数:**
- `content` (string, required): The text content to add to the corpus file | 要添加到语料库文件的文本内容
- `filename` (string, optional): Optional filename for the corpus file (without extension) | 语料库文件的可选文件名(不带扩展名)
- `category` (string, optional): Optional category for the corpus file | 语料库文件的可选类别
**Recommended Categories | 推荐类别:**
- `Code Snippet` - Reusable code patterns and examples | 可重用的代码模式和示例
- `API Documentation` - Function and parameter descriptions | 函数和参数描述
- `Error Solution` - Common errors and their fixes | 常见错误及其修复方法
- `Technical Note` - Personal learning summaries | 个人学习总结
**Example | 示例:**
```json
{
  "content": "// Quicksort implementation\nfunction quickSort(arr) {\n  if (arr.length <= 1) return arr;\n  const pivot = arr[0];\n  const left = [];\n  const right = [];\n  for (let i = 1; i < arr.length; i++) {\n    arr[i] < pivot ? left.push(arr[i]) : right.push(arr[i]);\n  }\n  return [...quickSort(left), pivot, ...quickSort(right)];\n}\n\n// Common error: Uncaught TypeError\n// Solution: check whether the variable is null/undefined",
  "filename": "quicksort_algorithm",
  "category": "Code Snippet"
}
```
**Response | 响应:**
```json
{
  "success": true,
  "message": "Successfully added corpus file: code_snippet_quicksort_algorithm.txt",
  "filename": "code_snippet_quicksort_algorithm.txt",
  "category": "Code Snippet"
}
```
### 4. `vjdoc_get_docs_meta` Tool | `vjdoc_get_docs_meta` 工具
Retrieves metadata about all documents and corpus files to help LLMs understand the available content and plan effective searches.
获取所有文档和语料库文件的元数据,帮助大型语言模型了解可用内容并规划有效的搜索。
**Parameters | 参数:**
- `query` (string, required): Natural language query or requirement | 自然语言查询或需求
**Response Format | 响应格式:**
```json
{
  "query": "Original natural language query",
  "documents": [
    {
      "url": "Document URL",
      "title": "Document title",
      "category": "Document category",
      "timestamp": 1712190000000,
      "keywords": ["keyword1", "keyword2", "..."],
      "summary": "Brief summary of document content..."
    }
  ],
  "totalDocuments": 42,
  "categories": ["API Documentation", "Code Snippet", "..."],
  "suggestion": "Search guidance for LLMs"
}
```
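The response format above could be assembled from stored documents roughly as sketched below. This is a hypothetical `buildDocsMeta` helper, not the package's code: it assumes each document has `url`, `title`, `category`, `timestamp`, and `content`, treats the summary as a simple truncation (an assumption), and leaves out the `keywords` and `suggestion` fields, whose derivation this README does not describe.

```javascript
// Hypothetical builder for a vjdoc_get_docs_meta-style response.
function buildDocsMeta(query, docs, summaryLength = 200) {
  return {
    query,
    documents: docs.map((doc) => ({
      url: doc.url,
      title: doc.title,
      category: doc.category,
      timestamp: doc.timestamp,
      // Assumption: the summary is the first `summaryLength` characters.
      summary:
        doc.content.length > summaryLength
          ? `${doc.content.slice(0, summaryLength)}...`
          : doc.content,
    })),
    totalDocuments: docs.length,
    // Distinct, non-empty categories in alphabetical order.
    categories: [...new Set(docs.map((doc) => doc.category).filter(Boolean))].sort(),
  };
}
```

Returning short summaries rather than full text lets an LLM survey the whole store cheaply before choosing what to search for.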
### 5. `vjdoc_get_document` Tool | `vjdoc_get_document` 工具
Gets the full content of a specific document by URL or title.
通过 URL 或标题获取特定文档的完整内容。
**Parameters | 参数:**
- `url` (string, optional): URL of the document to retrieve | 要检索的文档的 URL
- `title` (string, optional): Title of the document to retrieve | 要检索的文档的标题
**Notes | 注意:**
- At least one of `url` or `title` must be provided | 必须提供 `url` 或 `title` 中的至少一个
- The tool supports partial matching for both parameters | 该工具支持两个参数的部分匹配
- When using `url` parameter, it will find documents where the URL contains the provided string | 使用 `url` 参数时,它将查找 URL 包含所提供字符串的文档
- When using `title` parameter, it will find documents where the title contains the provided string (case-insensitive) | 使用 `title` 参数时,它将查找标题包含所提供字符串的文档(不区分大小写)
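The matching rules in these notes can be expressed as a small lookup. The sketch follows them literally: URL matching is a case-sensitive substring test and title matching is a case-insensitive substring test. Returning the first match, and checking `url` before `title` when both are given, are assumptions; `findDocument` is a hypothetical name.

```javascript
// Hypothetical lookup implementing the partial-matching rules above.
function findDocument(docs, { url, title } = {}) {
  if (!url && !title) {
    throw new Error("At least one of `url` or `title` must be provided");
  }
  if (url) {
    // URL: the stored URL must contain the provided string.
    const byUrl = docs.find((doc) => doc.url.includes(url));
    if (byUrl) return byUrl;
  }
  if (title) {
    // Title: case-insensitive substring match.
    const needle = title.toLowerCase();
    const byTitle = docs.find((doc) => doc.title.toLowerCase().includes(needle));
    if (byTitle) return byTitle;
  }
  return null; // no document matched
}
```

Because matching is partial, a short string such as `"auth"` may match several documents; only the first match is returned in this sketch.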
**Example | 示例:**
```json
{
  "url": "https://example.com/docs/auth"
}
```
or | 或
```json
{
  "title": "Authentication Guide"
}
```
**Response | 响应:**
```json
{
  "url": "https://example.com/docs/auth",
  "title": "Authentication Guide",
  "content": "Complete document content...",
  "metadata": {
    "category": "API Documentation",
    "lastModified": "2023-01-15T12:00:00Z"
  }
}
```
## Using with AI Coding Assistants | 与 AI 编码助手一起使用
You can use these MCP tools with various AI coding assistants to enhance your documentation workflow.
您可以在各种 AI 编码助手中使用这些 MCP 工具来增强您的文档工作流程。
### Using with Cursor | 在 Cursor 中使用
In Cursor, you can use the MCP tools through the command interface:
在 Cursor 中,您可以通过命令界面使用 MCP 工具:
1. **Setup | 设置**: Configure Cursor to use your MCP server | 配置 Cursor 使用您的 MCP 服务器
2. **Crawling | 爬取**: Use the `/mcp` command to invoke the crawl tool | 使用 `/mcp` 命令调用 crawl 工具
```
/mcp mcp-vj-docs vjdoc_crawl {"url": "https://example.com/docs", "maxDepth": 3, "maxPages": 100}
```
3. **Searching | 搜索**: Use the `/mcp` command to invoke the search tool | 使用 `/mcp` 命令调用 search 工具
```
/mcp mcp-vj-docs vjdoc_search {"query": "authentication", "limit": 5, "filters": {"categories": ["API Documentation"]}}
```
4. **Adding Corpus Files | 添加语料库文件**: Use the `/mcp` command to add custom corpus files | 使用 `/mcp` 命令添加自定义语料库文件
```
/mcp mcp-vj-docs vjdoc_add_corpus_file {"content": "// Your code here", "category": "Code Snippet"}
```
5. **Getting Document Content | 获取文档内容**: Use the `/mcp` command to get full document content | 使用 `/mcp` 命令获取完整文档内容
```
/mcp mcp-vj-docs vjdoc_get_document {"url": "https://example.com/docs/auth"}
```
or | 或
```
/mcp mcp-vj-docs vjdoc_get_document {"title": "Authentication Guide"}
```
### Advanced Workflow with AI Assistants | 与 AI 助手的高级工作流程
When working with AI assistants like Claude or GPT, you can create a more effective workflow:
1. **First, get document metadata** to understand what's available:
```
/mcp mcp-vj-docs vjdoc_get_docs_meta {"query": "I need to implement JWT authentication"}
```
2. **Then, search for relevant documents**:
```
/mcp mcp-vj-docs vjdoc_search {"query": "JWT authentication implementation", "limit": 3}
```
3. **Finally, get the full content** of the most relevant document for comprehensive context:
```
/mcp mcp-vj-docs vjdoc_get_document {"url": "https://example.com/docs/auth/jwt"}
```
4. **Ask the AI assistant** to explain or generate code based on the full document:
```
Based on this documentation, please explain how to implement JWT authentication in my Node.js application.
```
This workflow gives the AI full context for the documents that matter while keeping token usage down, because full content is retrieved only for the most relevant documents.
当与 Claude 或 GPT 等 AI 助手一起工作时,您可以创建更有效的工作流程:
1. **首先,获取文档元数据**以了解有哪些可用内容:
```
/mcp mcp-vj-docs vjdoc_get_docs_meta {"query": "我需要实现 JWT 认证"}
```
2. **然后,搜索相关文档**:
```
/mcp mcp-vj-docs vjdoc_search {"query": "JWT 认证实现", "limit": 3}
```
3. **最后,获取最相关文档的完整内容**以获得全面的上下文:
```
/mcp mcp-vj-docs vjdoc_get_document {"url": "https://example.com/docs/auth/jwt"}
```
4. **请求 AI 助手**基于完整文档解释或生成代码:
```
根据这份文档,请解释如何在我的 Node.js 应用程序中实现 JWT 认证。
```
这个工作流程确保 AI 拥有完整的上下文,同时通过仅检索最相关文档的完整内容来最小化令牌使用。
## Troubleshooting | 故障排除
### Common Issues | 常见问题
1. **Database Path Issues | 数据库路径问题**
- Ensure the directory for your database exists | 确保您的数据库目录存在
- Check if you have write permissions to the specified path | 检查您是否有写入指定路径的权限
- For tilde paths, ensure your home directory is correctly detected | 对于波浪号路径,确保正确检测到您的主目录
2. **Firecrawl API Issues | Firecrawl API 问题**
- Verify your API key is correct | 验证您的 API 密钥是否正确
- Check if you've reached API rate limits | 检查您是否达到了 API 速率限制
- If using a local Firecrawl service, ensure it's running | 如果使用本地 Firecrawl 服务,确保它正在运行
3. **Crawling Issues | 爬取问题**
- Some websites may block crawlers | 某些网站可能会阻止爬虫
- Check if the website requires authentication | 检查网站是否需要身份验证
- Try reducing the crawl depth and page limit | 尝试减少爬取深度和页面限制
### Logs | 日志
Check the logs for more detailed error information:
查看日志以获取更详细的错误信息:
- If `VJDOC_LOG_TO_FILE` is enabled, check the log files in your log directory | 如果启用了 `VJDOC_LOG_TO_FILE`,请检查日志目录中的日志文件
- Otherwise, check the console output | 否则,检查控制台输出
## Search Tool Response Format | 搜索工具响应格式
The `vjdoc_search` tool returns results in the following format:
```jsonc
{
  "results": [
    {
      "url": "https://example.com/docs/api",
      "title": "API Documentation",
      "relevance": 0.85,
      "category": "API Documentation",
      "paragraph": "Content excerpt most relevant to this document...",
      "highlightedParagraph": "Content with **highlighted** query terms for this document...",
      "fullDocument": "Complete content for this specific document..." // Only present for the most relevant result
    },
    {
      "url": "https://example.com/docs/guide",
      "title": "User Guide",
      "relevance": 0.75,
      "category": "Documentation",
      "paragraph": "Content excerpt most relevant to this document...",
      "highlightedParagraph": "Content with **highlighted** query terms for this document..."
      // No fullDocument field for lower-ranked results
    },
    // More results...
  ],
  "content": "Summary of content most relevant to the query...",
  "fullDocument": "Complete document of the most relevant result",
  "personalized": true
}
```
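Key fields:
- `results`: list of sources with relevance scores. Each result includes:
  - `url`: document URL
  - `title`: document title
  - `relevance`: relevance score
  - `category`: document category
  - `paragraph`: relevant paragraph excerpt from this specific document
  - `highlightedParagraph`: the same paragraph with query terms highlighted
  - `fullDocument`: complete document content (only on the most relevant result)
- `content`: summary of the extracted content most relevant to the query
- `fullDocument`: complete document content of the most relevant result
- `personalized`: whether the results were personalized based on the user ID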
### 搜索工具响应格式
`vjdoc_search` 工具返回以下格式的结果:
```jsonc
{
  "results": [
    {
      "url": "https://example.com/docs/api",
      "title": "API Documentation",
      "relevance": 0.85,
      "category": "API Documentation",
      "paragraph": "与此文档最相关的内容段落...",
      "highlightedParagraph": "带有**高亮**查询词的此文档段落...",
      "fullDocument": "此特定文档的完整内容..." // 只有最相关的结果才包含此字段
    },
    {
      "url": "https://example.com/docs/guide",
      "title": "User Guide",
      "relevance": 0.75,
      "category": "Documentation",
      "paragraph": "与此文档最相关的内容段落...",
      "highlightedParagraph": "带有**高亮**查询词的此文档段落..."
      // 较低排名的结果没有 fullDocument 字段
    },
    // 更多结果...
  ],
  "content": "与查询最相关的摘要内容...",
  "fullDocument": "最相关结果的完整文档内容",
  "personalized": true
}
```
关键字段:
- `results`: 带有相关性分数的来源列表,每个结果包括:
  - `url`: 文档 URL
  - `title`: 文档标题
  - `relevance`: 相关性分数
  - `category`: 文档类别
  - `paragraph`: 来自此特定文档的相关段落摘录
  - `highlightedParagraph`: 带有高亮查询词的此文档段落
  - `fullDocument`: 完整的文档内容(仅适用于最相关的结果)
- `content`: 与查询相关的提取内容摘要
- `fullDocument`: 最相关结果的完整文档内容
- `personalized`: 结果是否基于用户 ID 进行了个性化
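To make the response format described above concrete, here is a sketch that assembles a response from already-ranked hits: it wraps query terms in `**` to produce `highlightedParagraph` and attaches `fullDocument` only to the first result. The input shape, the `buildSearchResponse` and `highlight` names, using the top paragraph as `content`, and setting `personalized` whenever a `userId` is supplied are all assumptions; the real server may select paragraphs and highlight terms differently.

```javascript
// Escape regex metacharacters so query terms are matched literally.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Wrap each whole-word query term in ** (Markdown bold), case-insensitively.
function highlight(text, query) {
  const terms = query.split(/\s+/).filter(Boolean).map(escapeRegExp);
  if (terms.length === 0) return text;
  const re = new RegExp(`\\b(${terms.join("|")})\\b`, "gi");
  return text.replace(re, "**$1**");
}

// Hypothetical: `hits` are sorted by relevance and carry url, title,
// relevance, category, paragraph, and content (the full document text).
function buildSearchResponse(hits, query, { userId } = {}) {
  const results = hits.map((hit, i) => {
    const result = {
      url: hit.url,
      title: hit.title,
      relevance: hit.relevance,
      category: hit.category,
      paragraph: hit.paragraph,
      highlightedParagraph: highlight(hit.paragraph, query),
    };
    if (i === 0) result.fullDocument = hit.content; // top result only
    return result;
  });
  return {
    results,
    content: hits[0]?.paragraph ?? "", // assumption: top paragraph as summary
    fullDocument: hits[0]?.content ?? "",
    personalized: Boolean(userId),
  };
}
```

Attaching the full text to a single result keeps the payload bounded while still giving the LLM one complete document to work from.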
## Examples | 示例
### Searching Across Database and Corpus | 在数据库和语料库中搜索
```
/mcp mcp-vj-docs vjdoc_search {"query": "authentication", "limit": 5}
```
This will search for "authentication" in both the crawled documents (database) and your custom corpus files.
### Using Natural Language Queries | 使用自然语言查询
For requirements phrased in natural language, you can call the metadata tool first to see which documents and categories are available:
```
/mcp mcp-vj-docs vjdoc_get_docs_meta {"query": "I need to implement user authentication in my React application"}
```
Then use the search tool with the refined query:
```
/mcp mcp-vj-docs vjdoc_search {"query": "React authentication implementation", "filters": {"categories": ["Code Snippet", "API Documentation"]}}
```
### Utilizing the fullDocument Field | 利用 fullDocument 字段
When working with LLMs, you can use the `fullDocument` field to provide comprehensive context:
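```javascript
// Example: use the fullDocument field as LLM context.
// `searchDocs` and `llm.generateCode` are hypothetical helpers; wire them to
// your MCP client (calling vjdoc_search) and your LLM SDK of choice.
async function generateJwtAuth(searchDocs, llm) {
  const searchResults = await searchDocs("How to implement JWT authentication");
  // Top-level fullDocument: complete text of the most relevant result
  const fullContext = searchResults.fullDocument ?? "";
  // Ask the LLM to generate code grounded in the full document
  return llm.generateCode(
    `Based on this documentation:\n${fullContext}\n\nGenerate a JWT authentication implementation.`
  );
}
```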
## Real-World Use Cases | 实际使用场景
### Personal Knowledge Base | 个人知识库
- Save code snippets you frequently use for easy reference | 保存您经常使用的代码片段以便于参考
- Document API endpoints with examples | 使用示例记录 API 端点
- Keep track of error messages and their solutions | 跟踪错误消息及其解决方案
- Store configuration examples for different environments | 存储不同环境的配置示例
- Create a personal knowledge base of technical notes | 创建技术笔记的个人知识库
**Pro Tip | 专业提示:**
Organize your corpus files with consistent categories to make searching more effective. You can then filter search results by category to find exactly what you need!
使用一致的类别组织您的语料库文件,使搜索更有效。然后,您可以按类别过滤搜索结果,以找到您需要的确切内容!
## PDF Support | PDF 支持
The system now supports adding PDF files to the corpus. PDFs are automatically converted to Markdown format for better searchability. | 系统现在支持将PDF文件添加到语料库。PDF会自动转换为Markdown格式以提高可搜索性。
**Adding a PDF file in Cline | 在Cline中添加PDF文件**:
Simply provide the absolute path to your PDF file:
```bash
cline mcp mcp-vj-docs vjdoc_add_corpus_file --filePath "/absolute/path/to/your/document.pdf" --category "Documentation"
```
**Adding a PDF file in Cursor | 在Cursor中添加PDF文件**:
Simply provide the absolute path to your PDF file:
```
/mcp mcp-vj-docs vjdoc_add_corpus_file {"filePath": "/absolute/path/to/your/document.pdf", "category": "Documentation"}
```
The system extracts text from the PDF and converts it to Markdown format, preserving structure like headings, code blocks, and lists where possible. | 系统从PDF中提取文本并将其转换为Markdown格式,尽可能保留标题、代码块和列表等结构。
## How It Works | 工作原理
1. When you add a corpus file, it's saved to the `VJDOC_TFIDF_FILES_DIR` directory | 当您添加语料库文件时,它会保存到 `VJDOC_TFIDF_FILES_DIR` 目录
2. If you don't specify a filename, one will be generated automatically | 如果您不指定文件名,将自动生成一个
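3. The category is added to the filename as a prefix (for example, category `Code Snippet` with filename `quicksort_algorithm` yields `code_snippet_quicksort_algorithm.txt`) | 类别将作为前缀添加到文件名中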
4. The file is automatically indexed and will appear in search results | 文件会自动索引并出现在搜索结果中
5. You can search for this content later using the `vjdoc_search` tool | 您可以稍后使用 `vjdoc_search` 工具搜索此内容
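Steps 2 and 3 can be sketched as follows. The slug rule (lowercase, with runs of non-alphanumeric characters collapsed to `_`) is inferred from the single example in this README, where category `Code Snippet` and filename `quicksort_algorithm` become `code_snippet_quicksort_algorithm.txt`. The timestamp-based fallback name and the `corpusFileName` and `slugify` helpers are hypothetical.

```javascript
// Turn arbitrary text into a lowercase, underscore-separated slug.
function slugify(text) {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "_")
    .replace(/^_+|_+$/g, "");
}

// Hypothetical: build "<category>_<name>.txt", generating a name when none
// is given (or when the given name slugifies to an empty string).
function corpusFileName({ filename, category } = {}, now = Date.now()) {
  const base = (filename && slugify(filename)) || `corpus_${now}`;
  const prefix = category ? `${slugify(category)}_` : "";
  return `${prefix}${base}.txt`;
}

console.log(corpusFileName({ filename: "quicksort_algorithm", category: "Code Snippet" }));
// logs: code_snippet_quicksort_algorithm.txt
```

Prefixing by category keeps related files grouped when the corpus directory is listed alphabetically.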
## Practical Workflow Examples | 实用工作流程示例
Here are some practical workflows combining these tools:
以下是结合这些工具的一些实用工作流程:
1. **Documentation Indexing | 文档索引**
- Crawl your project documentation: | 爬取您的项目文档:
```
/mcp mcp-vj-docs vjdoc_crawl {"url": "https://your-project-docs.com"}
```
- Add custom code snippets: | 添加自定义代码片段:
```
/mcp mcp-vj-docs vjdoc_add_corpus_file {"content": "// Your code here", "category": "Code Snippet"}
```
- Search across all indexed content: | 搜索所有已索引内容:
```
/mcp mcp-vj-docs vjdoc_search {"query": "how to implement feature X"}
```
2. **Personal Knowledge Base | 个人知识库**
- Add error solutions as you encounter them: | 添加您遇到的错误解决方案:
```
/mcp mcp-vj-docs vjdoc_add_corpus_file {"content": "Error: Module not found\nSolution: Run npm install", "category": "Error Solution"}
```
- Add API documentation for your projects: | 为您的项目添加 API 文档:
```
/mcp mcp-vj-docs vjdoc_add_corpus_file {"content": "function getData(id) - Retrieves data by ID from the API", "category": "API Documentation"}
```
- Search your knowledge base when needed: | 在需要时搜索您的知识库:
```
/mcp mcp-vj-docs vjdoc_search {"query": "module not found", "filters": {"categories": ["Error Solution"]}}
```