Pure JavaScript MCP server for Unstructured.io - No Python required!

# Unstructured MCP Server for Claude Code

Universal NPX-executable MCP server for document processing with Unstructured.io API. Process PDFs, Word documents, HTML, images, and more directly from Claude Code and other AI clients.

## Features

- 📄 **Multi-format Support**: PDF, DOCX, HTML, images (with OCR), and more
- 🚀 **NPX Executable**: No local installation required
- 🤖 **Claude Code Integration**: Dedicated document-processor agent
- ☁️ **Cloud Storage**: S3, Azure, Google Drive, SharePoint, OneDrive
- 🗄️ **Vector Databases**: Weaviate, Pinecone, MongoDB, Neo4j, AstraDB
- 🔄 **Workflow Automation**: Create processing pipelines
- 🕷️ **Web Crawling**: Firecrawl integration for web content

## Quick Start

### 1. Run with NPX (No Installation)

```bash
# Run in STDIO mode (for Claude Desktop)
npx uns-mcp-server

# Run in SSE mode (for development/testing)
npx uns-mcp-server --sse --port 8080

# With API key
npx uns-mcp-server --api-key YOUR_KEY_HERE
```

### 2. Add to Claude Desktop

Add to your Claude Desktop configuration:

```json
{
  "mcpServers": {
    "uns-mcp": {
      "command": "npx",
      "args": ["uns-mcp-server"],
      "env": {
        "UNSTRUCTURED_API_KEY": "your_key_here"
      }
    }
  }
}
```

Or use Claude's CLI:

```bash
claude mcp add uns-mcp npx uns-mcp-server
```

### 3. Use in Claude Code

```javascript
// Process documents with the dedicated agent
Task("Document Processor",
  "Extract data from PDF invoices in S3 bucket",
  "document-processor"
);
```

## Installation Options

### Global Install (Optional)

```bash
npm install -g uns-mcp-server
uns-mcp --help
```

### Local Project

```bash
npm install uns-mcp-server
npx uns-mcp-server
```

## Configuration

### Environment Variables

Create a `.env` file in your project root:

```env
# Required
UNSTRUCTURED_API_KEY=your_unstructured_api_key

# Optional Connectors
AWS_KEY=your_aws_key
AWS_SECRET=your_aws_secret
AZURE_CONNECTION_STRING=your_azure_connection
MONGO_DB_CONNECTION_STRING=your_mongodb_connection
# ... see .env.example for all options
```

### Command Line Options

```bash
npx uns-mcp-server [options]

Options:
  -m, --mode     Server mode (stdio or sse)  [default: "stdio"]
  -p, --port     Port for SSE mode           [default: 8080]
  -k, --api-key  Unstructured API key        [string]
  -c, --config   Path to config file         [string]
  -d, --debug    Enable debug mode           [boolean]
  -h, --help     Show help                   [boolean]
```

## Available Tools

### Document Management

- `list_sources` - List available document sources
- `create_source_connector` - Set up input sources (S3, Azure, etc.)
- `list_destinations` - List output destinations
- `create_destination_connector` - Set up outputs (Vector DBs, etc.)

### Workflow Processing

- `list_workflows` - View existing workflows
- `create_workflow` - Create processing pipelines
- `run_workflow` - Execute workflows
- `list_jobs` - Monitor processing jobs
- `get_job_info` - Get job details

### Web Crawling (Firecrawl)

- `invoke_firecrawl_crawlhtml` - Crawl websites for HTML
- `invoke_firecrawl_llmtxt` - Generate LLM-optimized text

## Examples

### Basic Document Processing

```javascript
// Create S3 source
const source = await create_source_connector({
  name: "documents-bucket",
  type: "s3",
  config: {
    bucket: "my-documents",
    aws_key: process.env.AWS_KEY,
    aws_secret: process.env.AWS_SECRET
  }
});

// Create vector database destination
const destination = await create_destination_connector({
  name: "vector-store",
  type: "weaviate",
  config: {
    collection: "documents",
    api_key: process.env.WEAVIATE_API_KEY
  }
});

// Create and run workflow
const workflow = await create_workflow({
  name: "document-pipeline",
  source_id: source.id,
  destination_id: destination.id,
  settings: {
    ocr_enabled: true,
    extract_tables: true
  }
});

await run_workflow(workflow.id);
```

### Process Web Content

```javascript
// Crawl website
const crawlJob = await invoke_firecrawl_crawlhtml({
  url: "https://docs.example.com",
  max_depth: 2
});

// Generate LLM-ready text
const textJob = await invoke_firecrawl_llmtxt({
  crawl_job_id: crawlJob.id,
  format: "markdown"
});
```

### Claude Code Agent Usage

```javascript
// Use the document-processor agent
Task("Document Processor",
  `Process all contracts in Azure storage:
   1. Extract parties, dates, and terms
   2. Identify risk clauses
   3. Store in MongoDB for analysis`,
  "document-processor"
);
```

## Supported Connectors

### Sources

- **S3**: Amazon S3 buckets
- **Azure**: Azure Blob Storage
- **Google Drive**: Google Drive folders
- **OneDrive**: Microsoft OneDrive
- **SharePoint**: SharePoint sites
- **Salesforce**: Salesforce documents
- **Databricks**: Databricks Volumes

### Destinations

- **S3**: Output to S3
- **Weaviate**: Vector database
- **Pinecone**: Vector search
- **MongoDB**: Document database
- **Neo4j**: Graph database
- **AstraDB**: Cassandra-based DB
- **Databricks Delta**: Delta tables

## Document Processor Agent

The package includes a specialized Claude Code agent for document processing tasks:

```bash
# View agent documentation
cat node_modules/uns-mcp-server/agents/document-processor.md
```

Key capabilities:

- Batch document processing
- OCR for scanned documents
- Table and form extraction
- Entity recognition
- Multi-format conversion
- Workflow orchestration

## API Key

Get your Unstructured API key:

1. Sign up at [https://unstructured.io](https://unstructured.io)
2. Navigate to API Keys section
3. Generate a new key
4. Add to your `.env` file

## Troubleshooting

### Python Not Found

```bash
# Install Python 3.8+
# macOS
brew install python@3.11

# Ubuntu/Debian
sudo apt-get install python3 python3-pip

# Windows
# Download from python.org
```

### Module Not Found

```bash
# Install Python dependencies
pip install uns_mcp
```

### API Key Issues

```bash
# Check if key is set
echo $UNSTRUCTURED_API_KEY

# Set for current session
export UNSTRUCTURED_API_KEY=your_key_here
```

## Development

### Local Development

```bash
# Install from npm
npm install uns-mcp-server

# Install dependencies
npm install
pip install uns_mcp

# Run in development mode
npm start -- --debug --sse
```

### Testing

```bash
# Run tests
npm test

# Test with MCP Inspector
npx @modelcontextprotocol/inspector uns-mcp-server
```

## License

MIT License

## Support

- **NPM Package**: [npmjs.com/package/uns-mcp-server](https://www.npmjs.com/package/uns-mcp-server)
- **Unstructured.io**: [Official Documentation](https://unstructured.io/docs)
- **Contact**: For support, please contact the package author through npm

## Changelog

### v1.0.2

- Fixed Python installation on macOS with --break-system-packages support
- Improved error handling for pip installation
- Removed GitHub repository references (private repo)
- Fixed SSE mode flag issue

### v1.0.1

- Python installation improvements

### v1.0.0

- Initial release with NPX support
- Document processor agent for Claude Code
- Support for major cloud storage providers
- Vector database integrations
- Firecrawl web crawling support

---

Made with ❤️ for the Claude Code community
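
As a supplement to the Claude Desktop configuration in Quick Start, here is a minimal sketch of producing that `uns-mcp` entry programmatically and merging it into an existing configuration object without overwriting other servers. The helper names `buildServerEntry` and `addUnsMcpServer` are hypothetical and not part of `uns-mcp-server`; only the JSON shape is taken from the configuration shown above.

```javascript
// Hypothetical helpers (not shipped with uns-mcp-server).
// They only reproduce the "uns-mcp" JSON entry from Quick Start.

// Build the server entry, refusing an empty or missing API key.
function buildServerEntry(apiKey) {
  if (typeof apiKey !== "string" || apiKey.trim() === "") {
    throw new Error("UNSTRUCTURED_API_KEY is missing or empty");
  }
  return {
    command: "npx",
    args: ["uns-mcp-server"],
    env: { UNSTRUCTURED_API_KEY: apiKey },
  };
}

// Return a new config object with the "uns-mcp" entry added.
// Other top-level keys and other mcpServers entries are preserved.
function addUnsMcpServer(config, apiKey) {
  const existing = config && config.mcpServers ? config.mcpServers : {};
  return {
    ...config,
    mcpServers: { ...existing, "uns-mcp": buildServerEntry(apiKey) },
  };
}

// Usage: merge into a config that already lists another (made-up) server.
const current = {
  mcpServers: { other: { command: "node", args: ["other.js"] } },
};
const updated = addUnsMcpServer(current, "your_key_here");
console.log(JSON.stringify(updated, null, 2));
```

Returning a new object instead of mutating the input keeps the original configuration available for comparison before anything is written to disk. Reading and writing the actual configuration file is left out because its location depends on the platform.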