@vjlanguage/mcp-vj-docs

# MCP Documentation Server (@vjlanguage/mcp-vj-docs)

A Model Context Protocol (MCP) server for documentation crawling, indexing, and retrieval. This package provides tools for crawling websites, storing and indexing the content, and searching that content with TF-IDF based search. Search results are optimized for large language models.

## Features

- **Documentation Crawling**: Crawl documentation from websites using Firecrawl
- **Content Processing**: Convert HTML to Markdown and extract relevant content
- **Storage & Indexing**: Store documents using lowdb with TF-IDF based indexing
- **LLM-Optimized Search**: Search for documentation with aggregated results optimized for large language models
- **Full Content Return**: No character length limits on search results
- **Content-First Results**: Prioritizes content over URLs in search results
- **Smart Deduplication**: Removes duplicate content and returns only the top 3 most relevant results
- **AI-Optimized Format**: Results structured specifically for AI consumption and code generation
- **Complete Document Context**: Returns full document content via the `fullDocument` field for comprehensive context
- **Custom Corpus Management**: Add your own text corpus files for inclusion in search results
- **Multiple Format Support**: Supports TXT, Markdown, and PDF files
- **Automatic Indexing**: Files in the corpus directory are automatically indexed and searchable
- **MCP Integration**: Exposes tools for crawling and searching via Model Context Protocol
- **Path Handling**: Support for tilde (~) expansion in file paths
- **Server Modes**: Support for both SSE (Server-Sent Events) and stdio transports

## Changelog

### 2025-04-11

- **Search Result Enhancement**: Modified search functionality to include relevant paragraphs for each individual result item, rather than only showing content for the top result.
- **Result Format Improvement**: Changed the structure to make it clearer which document content belongs to which search result.
- **Document Retrieval Enhancement**: Improved the `vjdoc_get_document` tool to support partial matching for both URL and title parameters.

## Installation

```bash
# Install globally
npm install -g @vjlanguage/mcp-vj-docs

# Or use with npx
npx @vjlanguage/mcp-vj-docs
```

## Firecrawl Registration and API Key

This package uses the Firecrawl service for web crawling. To use it, you need to:

1. **Register for Firecrawl**:
   - Visit the [Firecrawl website](https://firecrawl.dev) and create an account
   - Or use a local Firecrawl service by setting `FIRECRAWL_API_URL` to your local endpoint
2. **Get your API key**:
   - After registration, navigate to your account dashboard
   - Find and copy your API key
   - Add this key to your environment variables or MCP configuration
3. **Configure the API key**:
   - Set the `FIRECRAWL_API_KEY` environment variable
   - Or add it to your MCP configuration (see example below)

## Usage

### Environment Variables

- `VJDOC_DB_PATH` - Path to the database file (default: ./data/docs.json)
- `VJDOC_MAX_DEPTH` - Maximum depth to crawl (default: 3)
- `VJDOC_MAX_PAGES` - Maximum number of pages to crawl (default: 100)
- `VJDOC_LOG_DIR` - Directory for log files
- `VJDOC_LOG_TO_FILE` - Whether to log to file (true/false)
- `VJDOC_LOG_LEVEL` - Log level (error, warn, info, debug)
- `FIRECRAWL_API_KEY` - API key for the Firecrawl service
- `FIRECRAWL_API_URL` - Custom URL for the Firecrawl API
- `MCP_TRANSPORT` - Transport method (sse or stdio, default: sse)
- `VJDOC_TFIDF_FILES_DIR` - Directory for custom corpus files (default: ~/mcpdata/tfidf_files)

Example MCP configuration:

```json
{
  "mcpServers": {
    "mcp-vj-docs": {
      "command": "npx",
      "args": ["-y", "@vjlanguage/mcp-vj-docs@latest"],
      "env": {
        "FIRECRAWL_API_KEY": "YOUR_API_KEY_HERE",
        "VJDOC_MAX_DEPTH": "4",
        "VJDOC_MAX_PAGES": "100",
        "VJDOC_DB_PATH": "~/mcpdata/docs.json",
        "VJDOC_LOG_DIR": "~/mcpdata/logs",
        "VJDOC_LOG_TO_FILE": "true",
        "VJDOC_LOG_LEVEL": "debug",
        "FIRECRAWL_API_URL": "http://localhost:5002",
        "VJDOC_TFIDF_FILES_DIR": "~/mcpdata/tfidf_files"
      },
      "disabled": false,
      "timeout": 3600,
      "autoApprove": ["vjdoc_search", "vjdoc_crawl", "vjdoc_add_corpus_file"]
    }
  }
}
```

## MCP Tools

The server exposes the following MCP tools:

### 1. `vjdoc_crawl` Tool

Crawls a website and indexes its content for search.

**Parameters:**

- `url` (string, required): The URL to crawl (e.g., "https://example.com/docs")
- `maxDepth` (number, optional): Maximum depth to crawl, default: 3
- `maxPages` (number, optional): Maximum number of pages to crawl, default: 100
- `includePatterns` (array of strings, optional): Patterns to include in the crawl (e.g., ["docs/*"])
- `excludePatterns` (array of strings, optional): Patterns to exclude from the crawl (e.g., ["blog/*"])
- `defaultCategory` (string, optional): Default category for documents if none is detected automatically

**Example:**

```json
{
  "url": "https://example.com/docs",
  "maxDepth": 3,
  "maxPages": 100,
  "includePatterns": ["docs/*"],
  "excludePatterns": ["blog/*"]
}
```

**Response:**

```json
{
  "success": true,
  "message": "Successfully crawled and indexed 42 pages from https://example.com/docs",
  "count": 42
}
```

### 2. `vjdoc_search` Tool

Searches indexed documents with results optimized for large language models.

**Parameters:**

- `query` (string, required): The search query (e.g., "how to use the API")
- `limit` (number, optional): Maximum number of sources to consider, default: 10
- `filters` (object, optional): Optional filters to narrow down search results
  - `categories` (array of strings, optional): Filter by document categories
  - `dateFrom` (number, optional): Filter documents created after this timestamp
  - `dateTo` (number, optional): Filter documents created before this timestamp
  - `metadata` (object, optional): Filter by metadata fields
- `userId` (string, optional): Optional user ID for personalized results

**Example:**

```json
{
  "query": "how to use the API",
  "limit": 5,
  "filters": {
    "categories": ["API Documentation"]
  }
}
```

**Response:**

```json
{
  "success": true,
  "results": {
    "paragraph": "The API can be used by making HTTP requests to the endpoints...",
    "sources": [
      {
        "url": "https://example.com/docs/api",
        "title": "API Documentation",
        "relevance": 0.85,
        "paragraph": "The API can be used by making HTTP requests to the endpoints...",
        "highlightedParagraph": "The **API** can be used by making **HTTP** requests to the **endpoints**...",
        "fullDocument": "Complete document content for this specific result..."
      }
    ]
  }
}
```

### 3. `vjdoc_add_corpus_file` Tool

Adds a custom corpus file to the TF-IDF files directory for inclusion in search results. This is useful for adding your own code snippets, documentation, error solutions, or technical notes that you want to be searchable.

**Parameters:**

- `content` (string, required): The text content to add to the corpus file
- `filename` (string, optional): Optional filename for the corpus file (without extension)
- `category` (string, optional): Optional category for the corpus file

**Recommended Categories:**

- `Code Snippet` - Reusable code patterns and examples
- `API Documentation` - Function and parameter descriptions
- `Error Solution` - Common errors and their fixes
- `Technical Note` - Personal learning summaries

**Example:**

```json
{
  "content": "// Quicksort implementation\nfunction quickSort(arr) {\n if (arr.length <= 1) return arr;\n const pivot = arr[0];\n const left = []; \n const right = [];\n for (let i = 1; i < arr.length; i++) {\n arr[i] < pivot ? left.push(arr[i]) : right.push(arr[i]);\n }\n return [...quickSort(left), pivot, ...quickSort(right)];\n}\n\n// Common error: Uncaught TypeError\n// Solution: check whether the variable is null/undefined",
  "filename": "quicksort_algorithm",
  "category": "Code Snippet"
}
```

**Response:**

```json
{
  "success": true,
  "message": "Successfully added corpus file: code_snippet_quicksort_algorithm.txt",
  "filename": "code_snippet_quicksort_algorithm.txt",
  "category": "Code Snippet"
}
```

### 4. `vjdoc_get_docs_meta` Tool

Retrieves metadata about all documents and corpus files to help LLMs understand the available content and plan effective searches.

**Parameters:**

- `query` (string, required): Natural language query or requirement

**Response Format:**

```json
{
  "query": "Original natural language query",
  "documents": [
    {
      "url": "Document URL",
      "title": "Document title",
      "category": "Document category",
      "timestamp": 1712190000000,
      "keywords": ["keyword1", "keyword2", "..."],
      "summary": "Brief summary of document content..."
    }
  ],
  "totalDocuments": 42,
  "categories": ["API Documentation", "Code Snippet", "..."],
  "suggestion": "Search guidance for LLMs"
}
```

### 5. `vjdoc_get_document` Tool

Gets the full content of a specific document by URL or title.

**Parameters:**

- `url` (string, optional): URL of the document to retrieve
- `title` (string, optional): Title of the document to retrieve

**Notes:**

- At least one of `url` or `title` must be provided
- The tool supports partial matching for both parameters
- With the `url` parameter, it finds documents whose URL contains the provided string
- With the `title` parameter, it finds documents whose title contains the provided string (case-insensitive)

**Example:**

```json
{
  "url": "https://example.com/docs/auth"
}
```

or

```json
{
  "title": "Authentication Guide"
}
```

**Response:**

```json
{
  "url": "https://example.com/docs/auth",
  "title": "Authentication Guide",
  "content": "Complete document content...",
  "metadata": {
    "category": "API Documentation",
    "lastModified": "2023-01-15T12:00:00Z"
  }
}
```

## Using with AI Coding Assistants

You can use these MCP tools with various AI coding assistants to enhance your documentation workflow.
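
Whichever assistant you use, each invocation reduces to a tool name plus a JSON arguments object, which MCP carries as a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for the tools documented above. `buildToolCall` and its parameter checks are illustrative assumptions, not part of this package; the accepted parameters are taken from the tool descriptions and examples in this README.

```javascript
// Illustrative only: buildToolCall is a hypothetical helper, not an API of this package.
// For each tool, at least one of the listed parameters must be present.
const ACCEPTED_PARAMS = {
  vjdoc_crawl: ["url"],
  vjdoc_search: ["query"],
  vjdoc_add_corpus_file: ["content", "filePath"],
  vjdoc_get_docs_meta: ["query"],
  vjdoc_get_document: ["url", "title"],
};

let nextId = 1;

function buildToolCall(name, args = {}) {
  const accepted = ACCEPTED_PARAMS[name];
  if (!accepted) {
    throw new Error(`Unknown tool: ${name}`);
  }
  if (!accepted.some((param) => args[param] !== undefined)) {
    throw new Error(`${name} requires at least one of: ${accepted.join(", ")}`);
  }
  // JSON-RPC 2.0 envelope that MCP uses for tool invocations.
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

const request = buildToolCall("vjdoc_search", { query: "authentication", limit: 5 });
console.log(JSON.stringify(request, null, 2));
```

Validating arguments before sending keeps obvious mistakes, such as calling `vjdoc_get_document` with neither `url` nor `title`, from reaching the server.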

### Using with Cursor

In Cursor, you can use the MCP tools through the command interface:

1. **Setup**: Configure Cursor to use your MCP server
2. **Crawling**: Use the `/mcp` command to invoke the crawl tool
   ```
   /mcp mcp-vj-docs vjdoc_crawl {"url": "https://example.com/docs", "maxDepth": 3, "maxPages": 100}
   ```
3. **Searching**: Use the `/mcp` command to invoke the search tool
   ```
   /mcp mcp-vj-docs vjdoc_search {"query": "authentication", "limit": 5, "filters": {"categories": ["API Documentation"]}}
   ```
4. **Adding Corpus Files**: Use the `/mcp` command to add custom corpus files
   ```
   /mcp mcp-vj-docs vjdoc_add_corpus_file {"content": "// Your code here", "category": "Code Snippet"}
   ```
5. **Getting Document Content**: Use the `/mcp` command to get full document content
   ```
   /mcp mcp-vj-docs vjdoc_get_document {"url": "https://example.com/docs/auth"}
   ```
   or
   ```
   /mcp mcp-vj-docs vjdoc_get_document {"title": "Authentication Guide"}
   ```

### Advanced Workflow with AI Assistants

When working with AI assistants like Claude or GPT, you can create a more effective workflow:

1. **First, get document metadata** to understand what's available:
   ```
   /mcp mcp-vj-docs vjdoc_get_docs_meta {"query": "I need to implement JWT authentication"}
   ```
2. **Then, search for relevant documents**:
   ```
   /mcp mcp-vj-docs vjdoc_search {"query": "JWT authentication implementation", "limit": 3}
   ```
3. **Finally, get the full content** of the most relevant document for comprehensive context:
   ```
   /mcp mcp-vj-docs vjdoc_get_document {"url": "https://example.com/docs/auth/jwt"}
   ```
4. **Ask the AI assistant** to explain or generate code based on the full document:
   ```
   Based on this documentation, please explain how to implement JWT authentication in my Node.js application.
   ```

This workflow gives the AI complete context while minimizing token usage, because full content is retrieved only for the most relevant documents.

## Troubleshooting

### Common Issues

1. **Database Path Issues**
   - Ensure the directory for your database exists
   - Check that you have write permissions to the specified path
   - For tilde paths, ensure your home directory is correctly detected
2. **Firecrawl API Issues**
   - Verify your API key is correct
   - Check whether you've reached API rate limits
   - If using a local Firecrawl service, ensure it's running
3. **Crawling Issues**
   - Some websites may block crawlers
   - Check whether the website requires authentication
   - Try reducing the crawl depth and page limit

### Logs

Check the logs for more detailed error information:

- If `VJDOC_LOG_TO_FILE` is enabled, check the log files in your log directory
- Otherwise, check the console output

## Search Tool Response Format

The `vjdoc_search` tool returns results in the following format:

```json
{
  "results": [
    {
      "url": "https://example.com/docs/api",
      "title": "API Documentation",
      "relevance": 0.85,
      "category": "API Documentation",
      "paragraph": "Content excerpt most relevant to this document...",
      "highlightedParagraph": "Content with **highlighted** query terms for this document...",
      "fullDocument": "Complete content for this specific document..."
    },
    {
      "url": "https://example.com/docs/guide",
      "title": "User Guide",
      "relevance": 0.75,
      "category": "Documentation",
      "paragraph": "Content excerpt most relevant to this document...",
      "highlightedParagraph": "Content with **highlighted** query terms for this document..."
    }
  ],
  "content": "Summary of content most relevant to the query...",
  "fullDocument": "Complete document of the most relevant result",
  "personalized": true
}
```

In this example the first result carries `fullDocument` because it is the most relevant; lower-ranked results omit the field, and any further results follow the same shape.

Key fields:

- `results`: List of sources with relevance scores. Each result includes:
  - `url`: Document URL
  - `title`: Document title
  - `relevance`: Relevance score
  - `category`: Document category
  - `paragraph`: Relevant paragraph excerpt from this specific document
  - `highlightedParagraph`: This document's paragraph with query terms highlighted
  - `fullDocument`: Complete document content (only for the most relevant result)
- `content`: Summary of extracted content relevant to the query
- `fullDocument`: Complete document content of the most relevant result
- `personalized`: Whether the results were personalized based on the user ID

## Examples

### Searching Across Database and Corpus

```
/mcp mcp-vj-docs vjdoc_search {"query": "authentication", "limit": 5}
```

This searches for "authentication" in both the crawled documents (database) and your custom corpus files.
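
The server describes its ranking as TF-IDF based. As a rough illustration of what that means when crawled pages and corpus files are searched together, the sketch below scores one merged document list against a query. It is not the package's implementation: `tokenize`, `rankByTfIdf`, and the sample documents are made up for this example, and real implementations differ in tokenization and weighting.

```javascript
// Minimal TF-IDF ranking sketch (illustrative; not this package's code).
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

function rankByTfIdf(query, docs) {
  const tokenized = docs.map((doc) => tokenize(doc.content));
  const queryTerms = [...new Set(tokenize(query))];

  // Document frequency: in how many documents each query term appears.
  const df = new Map();
  for (const term of queryTerms) {
    df.set(term, tokenized.filter((tokens) => tokens.includes(term)).length);
  }

  return docs
    .map((doc, i) => {
      const tokens = tokenized[i];
      let score = 0;
      for (const term of queryTerms) {
        const count = tokens.filter((t) => t === term).length;
        if (count === 0) continue;
        const tf = count / tokens.length;
        // Smoothed IDF so a term found in every document still contributes.
        const idf = Math.log((1 + docs.length) / (1 + df.get(term))) + 1;
        score += tf * idf;
      }
      return { title: doc.title, source: doc.source, score };
    })
    .filter((result) => result.score > 0)
    .sort((a, b) => b.score - a.score);
}

// Hypothetical sample data: two crawled pages and one custom corpus file.
const crawled = [
  { title: "Auth Guide", source: "database", content: "Authentication uses tokens. Configure authentication first." },
  { title: "Changelog", source: "database", content: "Release notes and version history." },
];
const corpus = [
  { title: "jwt_note", source: "corpus", content: "JWT authentication snippet for Express middleware." },
];

console.log(rankByTfIdf("authentication", [...crawled, ...corpus]));
```

Both sources end up in one ranked list: a document that mentions the query term more densely scores higher, and documents that never mention it are dropped.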
### Using Natural Language Queries

For natural language requirements, you can use the metadata tool first:

```
/mcp mcp-vj-docs vjdoc_get_docs_meta {"query": "I need to implement user authentication in my React application"}
```

Then use the search tool with the refined query:

```
/mcp mcp-vj-docs vjdoc_search {"query": "React authentication implementation", "filters": {"categories": ["Code Snippet", "API Documentation"]}}
```

### Utilizing the fullDocument Field

When working with LLMs, you can use the `fullDocument` field to provide comprehensive context:

```javascript
// Example of using the fullDocument field with an LLM.
// `searchDocs` and `llm` are placeholders for your own vjdoc_search wrapper and LLM client.
async function generateJwtAuthCode(searchDocs, llm) {
  const searchResults = await searchDocs("how to implement JWT authentication");
  const fullContext = searchResults.fullDocument;

  // Now you can ask the LLM to generate code based on the full document
  return llm.generateCode(
    `Based on this documentation: ${fullContext}\n\nGenerate a JWT authentication implementation`
  );
}
```

## Real-World Use Cases

### Personal Knowledge Base

- Save code snippets you frequently use for easy reference
- Document API endpoints with examples
- Keep track of error messages and their solutions
- Store configuration examples for different environments
- Create a personal knowledge base of technical notes

**Pro Tip:** Organize your corpus files with consistent categories to make searching more effective. You can then filter search results by category to find exactly what you need.

## PDF Support

The system now supports adding PDF files to the corpus. PDFs are automatically converted to Markdown format for better searchability.

**Adding a PDF file in Cline**: Simply provide the absolute path to your PDF file:

```bash
cline mcp mcp-vj-docs vjdoc_add_corpus_file --filePath "/absolute/path/to/your/document.pdf" --category "Documentation"
```

**Adding a PDF file in Cursor**: Simply provide the absolute path to your PDF file:

```
/mcp mcp-vj-docs vjdoc_add_corpus_file {"filePath": "/absolute/path/to/your/document.pdf", "category": "Documentation"}
```

The system extracts text from the PDF and converts it to Markdown format, preserving structure like headings, code blocks, and lists where possible.

## How It Works

1. When you add a corpus file, it's saved to the `VJDOC_TFIDF_FILES_DIR` directory
2. If you don't specify a filename, one will be generated automatically
3. The category will be added as a prefix to the filename
4. The file is automatically indexed and will appear in search results
5. You can search for this content later using the `vjdoc_search` tool

## Practical Workflow Examples

Here are some practical workflows combining these tools:

1. **Documentation Indexing**
   - Crawl your project documentation:
     ```
     /mcp mcp-vj-docs vjdoc_crawl {"url": "https://your-project-docs.com"}
     ```
   - Add custom code snippets:
     ```
     /mcp mcp-vj-docs vjdoc_add_corpus_file {"content": "// Your code here", "category": "Code Snippet"}
     ```
   - Search across all indexed content:
     ```
     /mcp mcp-vj-docs vjdoc_search {"query": "how to implement feature X"}
     ```
2. **Personal Knowledge Base**
   - Add error solutions as you encounter them:
     ```
     /mcp mcp-vj-docs vjdoc_add_corpus_file {"content": "Error: Module not found\nSolution: Run npm install", "category": "Error Solution"}
     ```
   - Add API documentation for your projects:
     ```
     /mcp mcp-vj-docs vjdoc_add_corpus_file {"content": "function getData(id) - Retrieves data by ID from the API", "category": "API Documentation"}
     ```
   - Search your knowledge base when needed:
     ```
     /mcp mcp-vj-docs vjdoc_search {"query": "module not found", "filters": {"categories": ["Error Solution"]}}
     ```
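
The metadata, search, and full-document sequence from the Advanced Workflow with AI Assistants section can also be scripted. The sketch below assumes a `callTool(name, args)` function supplied by whatever MCP client you use; it is hypothetical here and stubbed with canned data so the example runs on its own. It also assumes search responses carry a `results` array with `url` fields, as shown under Search Tool Response Format.

```javascript
// Sketch of the three-step workflow. `callTool` is an assumed async function
// (name, args) => result provided by your MCP client; it is not part of this package.
async function fetchBestDocument(callTool, requirement, searchQuery) {
  // 1. Ask for metadata so the caller can see which categories exist.
  const meta = await callTool("vjdoc_get_docs_meta", { query: requirement });
  const categories = meta.categories || [];

  // 2. Search with a small limit to keep the response short.
  const search = await callTool("vjdoc_search", { query: searchQuery, limit: 3 });
  const top = (search.results || [])[0];
  if (!top) return { categories, document: null };

  // 3. Fetch full content for the top hit only.
  const document = await callTool("vjdoc_get_document", { url: top.url });
  return { categories, document };
}

// Stub client returning canned data shaped like the documented responses.
async function stubCallTool(name, args) {
  if (name === "vjdoc_get_docs_meta") {
    return { query: args.query, categories: ["API Documentation"] };
  }
  if (name === "vjdoc_search") {
    return { results: [{ url: "https://example.com/docs/auth/jwt", relevance: 0.85 }] };
  }
  if (name === "vjdoc_get_document") {
    return { url: args.url, title: "JWT Guide", content: "Complete document content..." };
  }
  throw new Error(`Unexpected tool: ${name}`);
}

fetchBestDocument(
  stubCallTool,
  "I need to implement JWT authentication",
  "JWT authentication implementation"
).then((result) => console.log(result.document.title));
```

Fetching full content only for the top hit mirrors the token-saving point of the workflow: metadata and search responses stay small, and only one complete document is passed to the assistant.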