# Unstructured MCP Server

Pure JavaScript MCP server for document processing with the Unstructured.io API. Process PDFs, Word documents, HTML, images, and more directly from Claude Desktop and other AI clients - no Python required!

## Features

- 📄 **Multi-format Support**: PDF, DOCX, HTML, images (with OCR), and more
- 🚀 **NPX Executable**: No local installation required
- 🤖 **Claude Desktop Integration**: Works seamlessly with Claude
- **Pure JavaScript**: No Python dependencies
- 🔍 **OCR Support**: Extract text from scanned documents
- 📊 **Table Extraction**: Extract and convert tables
- 📝 **Multiple Output Formats**: JSON, text, markdown

## Quick Start

### 1. Get API Key

Sign up at [https://unstructuredapp.io](https://unstructuredapp.io) to get your API key.

### 2. Run with NPX (No Installation)

```bash
# Run directly with NPX
UNSTRUCTURED_API_KEY=your_key_here npx uns-mcp-server
```

### 3. Add to Claude Desktop

Add to your Claude Desktop configuration (`~/Library/Application Support/Claude/claude_desktop_config.json` on macOS):

```json
{
  "mcpServers": {
    "uns-mcp": {
      "command": "npx",
      "args": ["uns-mcp-server"],
      "env": {
        "UNSTRUCTURED_API_KEY": "your_key_here"
      }
    }
  }
}
```

## Installation Options

### Global Install

```bash
npm install -g uns-mcp-server
uns-mcp-server
```

### Local Project

```bash
npm install uns-mcp-server
npx uns-mcp-server
```

## Available Tools

The MCP server provides the following tools:

### Document Processing

- `process_document` - Process any document with OCR and formatting preservation
- `extract_text` - Extract plain text from documents
- `extract_tables` - Extract tables in JSON, CSV, or HTML format

### Connectors

- `list_sources` - List configured document sources
- `create_source_connector` - Create input source (S3, Azure, local, etc.)

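MCP clients invoke tools like the ones listed above through the protocol's JSON-RPC 2.0 `tools/call` method. The sketch below shows roughly what such a request could look like on the wire, assuming the server speaks MCP over stdio (as the Claude Desktop `command`/`args` configuration suggests). `buildToolCall` is a hypothetical helper written for illustration, not part of `uns-mcp-server`, and the argument names are taken from this README's usage examples rather than from the server's published tool schema.

```javascript
// Hypothetical helper (not part of uns-mcp-server): builds the JSON-RPC 2.0
// message an MCP client sends to invoke a tool.
let nextId = 1;

function buildToolCall(name, args = {}) {
  if (typeof name !== "string" || name.length === 0) {
    throw new TypeError("tool name must be a non-empty string");
  }
  return {
    jsonrpc: "2.0",
    id: nextId++,              // each request gets a unique id
    method: "tools/call",      // MCP method for invoking a tool
    params: { name, arguments: args },
  };
}

// Argument names are assumptions based on this README's usage examples.
const request = buildToolCall("extract_text", {
  file_path: "/path/to/document.pdf",
  include_metadata: true,
});

// On the stdio transport, each message is one line of JSON.
const wireMessage = JSON.stringify(request) + "\n";
console.log(wireMessage);
```

A real client would first complete the MCP `initialize` handshake before sending this message. Claude Desktop and the official MCP SDKs handle both steps, so a sketch like this is mainly useful for debugging or for understanding what travels between client and server.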
## Usage Examples

### Process a PDF

```javascript
// In Claude Desktop, use the tool directly
await process_document({
  file_path: "/path/to/document.pdf",
  strategy: "hi_res",
  ocr_enabled: true,
  output_format: "json"
});
```

### Extract Text

```javascript
await extract_text({
  file_path: "/path/to/document.pdf",
  include_metadata: true
});
```

### Extract Tables

```javascript
await extract_tables({
  file_path: "/path/to/spreadsheet.xlsx",
  format: "csv"
});
```

## Supported Formats

- **Documents**: PDF, DOCX, DOC, ODT, RTF, TXT
- **Images**: PNG, JPG, JPEG, TIFF, BMP (with OCR)
- **Web**: HTML, XML
- **Spreadsheets**: XLSX, XLS, CSV
- **Presentations**: PPTX, PPT
- **Email**: EML, MSG

## Processing Strategies

- `auto` - Automatically select the best strategy
- `hi_res` - High resolution processing with layout preservation
- `ocr_only` - Focus on OCR for scanned documents
- `fast` - Quick processing for simple documents

## Environment Variables

- `UNSTRUCTURED_API_KEY` - Your Unstructured.io API key (required)
- `UNSTRUCTURED_API_URL` - Custom API endpoint (optional)
- `LOG_LEVEL` - Logging level: ERROR, WARN, INFO, DEBUG (default: ERROR)

## Testing

Test your installation:

```bash
# Create a test file
echo "Hello World" > test.txt

# Process it
npx uns-mcp-server test.txt
```

## Troubleshooting

### API Key Issues

```bash
# Verify your API key is set
echo $UNSTRUCTURED_API_KEY

# Set it for current session
export UNSTRUCTURED_API_KEY=your_key_here
```

### Connection Issues

- Ensure you have internet connectivity
- Verify the API key is valid
- Check if you're behind a corporate firewall

## Changelog

### v2.0.2

- Updated GitHub repository to CG-Labs organization
- Documentation improvements

### v2.0.1

- Fixed API endpoint to use correct domain
- Improved error handling
- Better connection stability

### v2.0.0

- Complete rewrite in pure JavaScript
- Removed all Python dependencies
- Direct API integration
- Improved performance

### v1.0.x

- Initial release with Python bridge

## License

MIT License

## Support

- **Issues**: [GitHub Issues](https://github.com/CG-Labs/Unstructured-Document-Processor-MCP/issues)
- **GitHub**: [github.com/CG-Labs/Unstructured-Document-Processor-MCP](https://github.com/CG-Labs/Unstructured-Document-Processor-MCP)
- **NPM**: [npmjs.com/package/uns-mcp-server](https://www.npmjs.com/package/uns-mcp-server)
- **Unstructured.io**: [Documentation](https://unstructured-io.github.io/unstructured/)

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

---

Made with ❤️ for the Claude Desktop community