Pure JavaScript MCP server for Unstructured.io - No Python required!
# Unstructured MCP Server for Claude Code
Universal NPX-executable MCP server for document processing with the Unstructured.io API. Process PDFs, Word documents, HTML, images, and more directly from Claude Code and other AI clients.
## Features
- 📄 **Multi-format Support**: PDF, DOCX, HTML, images (with OCR), and more
- 🚀 **NPX Executable**: No local installation required
- 🤖 **Claude Code Integration**: Dedicated document-processor agent
- ☁️ **Cloud Storage**: S3, Azure, Google Drive, SharePoint, OneDrive
- 🗄️ **Vector Databases**: Weaviate, Pinecone, MongoDB, Neo4j, AstraDB
- 🔄 **Workflow Automation**: Create processing pipelines
- 🕷️ **Web Crawling**: Firecrawl integration for web content
## Quick Start
### 1. Run with NPX (No Installation)
```bash
# Run in STDIO mode (for Claude Desktop)
npx uns-mcp-server
# Run in SSE mode (for development/testing)
npx uns-mcp-server --sse --port 8080
# With API key
npx uns-mcp-server --api-key YOUR_KEY_HERE
```
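In STDIO mode the client (for example Claude Desktop) starts the server as a child process and exchanges JSON-RPC 2.0 messages over stdin/stdout, one JSON object per line, as MCP's stdio transport specifies. The sketch below shows only that framing: it builds the opening `initialize` request and splits server output into complete messages. The `clientInfo` values and protocol version are example values, and after the server responds a real client must also send a `notifications/initialized` notification. In practice an MCP SDK handles all of this for you.

```javascript
// Minimal sketch of MCP stdio framing: newline-delimited JSON-RPC 2.0.
// clientInfo and protocolVersion below are example values, not requirements of this package.

// Serialize one message as a single line (JSON.stringify escapes newlines inside strings).
function frameMessage(message) {
  return JSON.stringify(message) + "\n";
}

// First request a client sends after spawning `npx uns-mcp-server`.
function initializeRequest(id) {
  return {
    jsonrpc: "2.0",
    id,
    method: "initialize",
    params: {
      protocolVersion: "2024-11-05",
      capabilities: {},
      clientInfo: { name: "example-client", version: "0.1.0" },
    },
  };
}

// Combine buffered output with a new stdout chunk; return parsed complete
// messages plus the trailing partial line to carry into the next call.
function splitMessages(buffer, chunk) {
  const lines = (buffer + chunk).split("\n");
  const rest = lines.pop(); // incomplete last line, or "" if the chunk ended on a newline
  const messages = lines
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
  return { messages, rest };
}
```

Carrying `rest` between calls matters because a pipe can deliver a single message across several `data` events.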
### 2. Add to Claude Desktop
Add the server to your Claude Desktop configuration file (`claude_desktop_config.json`):
```json
{
  "mcpServers": {
    "uns-mcp": {
      "command": "npx",
      "args": ["uns-mcp-server"],
      "env": {
        "UNSTRUCTURED_API_KEY": "your_key_here"
      }
    }
  }
}
```
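If `claude_desktop_config.json` already lists other servers, the `uns-mcp` entry has to be merged into the existing `mcpServers` object rather than overwriting the file. A small Node sketch of that merge follows; `addUnsMcpServer` is a hypothetical helper, and locating, reading, and writing the file (its path depends on your OS) is left out.

```javascript
// Hypothetical helper: merge the uns-mcp entry into an existing Claude Desktop config object.
// Other servers and top-level keys are preserved; an existing "uns-mcp" entry is replaced.
// The input object is not mutated.
function addUnsMcpServer(config, apiKey) {
  if (!apiKey) {
    throw new Error("An Unstructured API key is required");
  }
  const servers = { ...(config.mcpServers || {}) };
  servers["uns-mcp"] = {
    command: "npx",
    args: ["uns-mcp-server"],
    env: { UNSTRUCTURED_API_KEY: apiKey },
  };
  return { ...config, mcpServers: servers };
}
```

Writing the result back is a `JSON.parse` / `JSON.stringify(merged, null, 2)` round trip; keep a backup of the original file.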
For Claude Code, you can register the server with the `claude` CLI instead:
```bash
claude mcp add uns-mcp npx uns-mcp-server
```
### 3. Use in Claude Code
```javascript
// Delegate to the document-processor agent through Claude Code's Task tool
// (arguments: task description, prompt, agent type)
Task("Document Processor",
"Extract data from PDF invoices in S3 bucket",
"document-processor"
);
```
## Installation Options
### Global Install (Optional)
```bash
npm install -g uns-mcp-server
uns-mcp --help
```
### Local Project
```bash
npm install uns-mcp-server
npx uns-mcp-server
```
## Configuration
### Environment Variables
Create a `.env` file in your project root:
```env
# Required
UNSTRUCTURED_API_KEY=your_unstructured_api_key
# Optional Connectors
AWS_KEY=your_aws_key
AWS_SECRET=your_aws_secret
AZURE_CONNECTION_STRING=your_azure_connection
MONGO_DB_CONNECTION_STRING=your_mongodb_connection
# ... see .env.example for all options
```
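The `.env` format above is plain `KEY=value` lines with `#` comments. If you want to fail fast before launching the server, a minimal sketch that parses such a file's contents and reports missing required variables could look like this. It is deliberately simplified (no escape sequences, multiline values, or `export` prefix); it splits on the first `=` so values that themselves contain `=`, such as Azure connection strings, survive. Real projects usually rely on the `dotenv` package, and recent Node releases also provide `process.loadEnvFile`.

```javascript
// Simplified .env parser (hypothetical helper): KEY=value per line, "#" starts a comment line.
// Matching surrounding single or double quotes on a value are stripped; nothing else is interpreted.
function parseDotEnv(text) {
  const vars = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // skip malformed lines such as "=value" or "JUSTAKEY"
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    const first = value[0];
    if (value.length >= 2 && (first === '"' || first === "'") && value[value.length - 1] === first) {
      value = value.slice(1, -1);
    }
    vars[key] = value;
  }
  return vars;
}

// Return the names of required variables that are missing or empty.
function missingRequired(vars, required = ["UNSTRUCTURED_API_KEY"]) {
  return required.filter((name) => !vars[name]);
}
```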
### Command Line Options
```bash
npx uns-mcp-server [options]
Options:
-m, --mode Server mode (stdio or sse) [default: "stdio"]
-p, --port Port for SSE mode [default: 8080]
-k, --api-key Unstructured API key [string]
-c, --config Path to config file [string]
-d, --debug Enable debug mode [boolean]
-h, --help Show help [boolean]
```
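The option table maps directly onto Node's built-in `util.parseArgs` (Node 18.3+). The sketch below is not the package's own argument parser; it only illustrates how these flags and their defaults could be handled. The `--mode`/`--port` validation and the API-key fallback order (flag first, then `UNSTRUCTURED_API_KEY`) are assumptions rather than documented behavior.

```javascript
const { parseArgs } = require("node:util");

// Illustrative parser for the documented flags (hypothetical, not the package's implementation).
function parseServerOptions(argv, env = process.env) {
  const { values } = parseArgs({
    args: argv,
    options: {
      mode: { type: "string", short: "m" },
      port: { type: "string", short: "p" },
      "api-key": { type: "string", short: "k" },
      config: { type: "string", short: "c" },
      debug: { type: "boolean", short: "d" },
      help: { type: "boolean", short: "h" },
    },
    strict: true, // unknown flags throw a TypeError
  });

  // Defaults from the option table: mode "stdio", port 8080.
  const mode = values.mode ?? "stdio";
  if (mode !== "stdio" && mode !== "sse") {
    throw new Error(`--mode must be "stdio" or "sse", got "${mode}"`);
  }
  const port = Number(values.port ?? 8080);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error("--port must be an integer between 1 and 65535");
  }
  return {
    mode,
    port,
    // Assumed precedence: explicit flag, then environment variable.
    apiKey: values["api-key"] ?? env.UNSTRUCTURED_API_KEY,
    config: values.config,
    debug: values.debug === true,
    help: values.help === true,
  };
}
```

For example, `parseServerOptions(["-m", "sse", "-p", "9000"])` yields `mode: "sse"` and `port: 9000`.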
## Available Tools
### Document Management
- `list_sources` - List available document sources
- `create_source_connector` - Set up input sources (S3, Azure, etc.)
- `list_destinations` - List output destinations
- `create_destination_connector` - Set up outputs (Vector DBs, etc.)
### Workflow Processing
- `list_workflows` - View existing workflows
- `create_workflow` - Create processing pipelines
- `run_workflow` - Execute workflows
- `list_jobs` - Monitor processing jobs
- `get_job_info` - Get job details
### Web Crawling (Firecrawl)
- `invoke_firecrawl_crawlhtml` - Crawl websites for HTML
- `invoke_firecrawl_llmtxt` - Generate LLM-optimized text
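An MCP client invokes each of these tools with a `tools/call` request carrying the tool name and an `arguments` object; a `tools/list` request returns every tool with its `inputSchema`, which is the authoritative source for argument names. A sketch of building both requests follows. The allowlist only mirrors the tools named in this README (the server may expose others), and the empty `arguments` object is illustrative.

```javascript
// Tool names as listed in this README; prefer the live tools/list response when available.
const KNOWN_TOOLS = new Set([
  "list_sources", "create_source_connector", "list_destinations", "create_destination_connector",
  "list_workflows", "create_workflow", "run_workflow", "list_jobs", "get_job_info",
  "invoke_firecrawl_crawlhtml", "invoke_firecrawl_llmtxt",
]);

let nextId = 1; // JSON-RPC request IDs must be unique per session

// Request the server's tool list (each entry includes an inputSchema).
function listToolsRequest() {
  return { jsonrpc: "2.0", id: nextId++, method: "tools/list", params: {} };
}

// Request a single tool invocation with the given arguments.
function callToolRequest(name, args = {}) {
  if (!KNOWN_TOOLS.has(name)) {
    throw new Error(`Unknown tool: ${name}`);
  }
  return { jsonrpc: "2.0", id: nextId++, method: "tools/call", params: { name, arguments: args } };
}
```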
## Examples
### Basic Document Processing
```javascript
// Each call below invokes the MCP tool of the same name
// Create S3 source
const source = await create_source_connector({
name: "documents-bucket",
type: "s3",
config: {
bucket: "my-documents",
aws_key: process.env.AWS_KEY,
aws_secret: process.env.AWS_SECRET
}
});
// Create vector database destination
const destination = await create_destination_connector({
name: "vector-store",
type: "weaviate",
config: {
collection: "documents",
api_key: process.env.WEAVIATE_API_KEY
}
});
// Create and run workflow
const workflow = await create_workflow({
name: "document-pipeline",
source_id: source.id,
destination_id: destination.id,
settings: {
ocr_enabled: true,
extract_tables: true
}
});
await run_workflow(workflow.id);
```
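From an MCP client, each step in the pipeline above is a separate `tools/call` whose result supplies the IDs for the next step. The sketch below runs the same sequence against an injected `callTool(name, args)` function so it can be wired to a real client or to a stub. Because a real `tools/call` result arrives as a `content` array, `callTool` is assumed to extract and parse it into an object with an `id`; the argument shapes are copied from the example above, and the `workflow_id` argument to `run_workflow` is hypothetical.

```javascript
// Sketch: source -> destination -> workflow -> run, through an injected callTool(name, args)
// that is assumed to resolve to a parsed result object exposing an `id`.
async function buildAndRunPipeline(callTool, { bucket, collection, awsKey, awsSecret, weaviateKey }) {
  const source = await callTool("create_source_connector", {
    name: "documents-bucket",
    type: "s3",
    config: { bucket, aws_key: awsKey, aws_secret: awsSecret },
  });
  const destination = await callTool("create_destination_connector", {
    name: "vector-store",
    type: "weaviate",
    config: { collection, api_key: weaviateKey },
  });
  const workflow = await callTool("create_workflow", {
    name: "document-pipeline",
    source_id: source.id,
    destination_id: destination.id,
    settings: { ocr_enabled: true, extract_tables: true },
  });
  // Hypothetical argument name; check run_workflow's inputSchema from tools/list.
  return callTool("run_workflow", { workflow_id: workflow.id });
}
```

Injecting `callTool` keeps the orchestration testable without network access or credentials.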
### Process Web Content
```javascript
// Crawl website
const crawlJob = await invoke_firecrawl_crawlhtml({
url: "https://docs.example.com",
max_depth: 2
});
// Generate LLM-ready text
const textJob = await invoke_firecrawl_llmtxt({
crawl_job_id: crawlJob.id,
format: "markdown"
});
```
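Crawls and workflow runs are jobs, so a client typically waits for a job to finish before consuming its output (for workflow runs, `get_job_info` is the natural status check). A generic polling helper with capped exponential backoff is sketched below; `checkStatus` is a hypothetical callback standing in for whichever tool reports the job's state, assumed to resolve to an object with a boolean `done`.

```javascript
// Sketch: poll until checkStatus() reports completion, doubling the wait each time up to a cap.
// checkStatus is a hypothetical callback resolving to { done: boolean, ...details }.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function pollUntilDone(checkStatus, { initialDelayMs = 1000, maxDelayMs = 30000, maxAttempts = 20 } = {}) {
  let delay = initialDelayMs;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await checkStatus();
    if (status.done) {
      return status;
    }
    if (attempt < maxAttempts) {
      await sleep(delay);
      delay = Math.min(delay * 2, maxDelayMs); // exponential backoff, capped at maxDelayMs
    }
  }
  throw new Error(`Job not finished after ${maxAttempts} attempts`);
}
```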
### Claude Code Agent Usage
```javascript
// Use the document-processor agent
Task("Document Processor",
`Process all contracts in Azure storage:
1. Extract parties, dates, and terms
2. Identify risk clauses
3. Store in MongoDB for analysis`,
"document-processor"
);
```
## Supported Connectors
### Sources
- **S3**: Amazon S3 buckets
- **Azure**: Azure Blob Storage
- **Google Drive**: Google Drive folders
- **OneDrive**: Microsoft OneDrive
- **SharePoint**: SharePoint sites
- **Salesforce**: Salesforce documents
- **Databricks**: Databricks Volumes
### Destinations
- **S3**: Output to S3
- **Weaviate**: Vector database
- **Pinecone**: Vector search
- **MongoDB**: Document database
- **Neo4j**: Graph database
- **AstraDB**: Cassandra-based DB
- **Databricks Delta**: Delta tables
## Document Processor Agent
The package includes a specialized Claude Code agent for document processing tasks:
```bash
# View agent documentation
cat node_modules/uns-mcp-server/agents/document-processor.md
```
Key capabilities:
- Batch document processing
- OCR for scanned documents
- Table and form extraction
- Entity recognition
- Multi-format conversion
- Workflow orchestration
## API Key
Get your Unstructured API key:
1. Sign up at [https://unstructured.io](https://unstructured.io)
2. Navigate to the API Keys section
3. Generate a new key
4. Add it to your `.env` file as `UNSTRUCTURED_API_KEY`
## Troubleshooting
### Python Not Found
```bash
# Install Python 3.8+
# macOS
brew install python@3.11
# Ubuntu/Debian
sudo apt-get install python3 python3-pip
# Windows
# Download from python.org
```
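To confirm from Node (for example in a setup script) that the interpreter meets the Python 3.8+ requirement, you can run `python3 --version` and compare the parsed version. This sketch assumes the interpreter is on `PATH` as `python3`; on Windows it is often `python` or `py`, so pass the command name explicitly there.

```javascript
const { spawnSync } = require("node:child_process");

// Parse output such as "Python 3.11.4" into [major, minor, patch]; null if unrecognized.
function parsePythonVersion(output) {
  const match = /Python (\d+)\.(\d+)(?:\.(\d+))?/.exec(output);
  if (!match) return null;
  return [Number(match[1]), Number(match[2]), Number(match[3] ?? 0)];
}

// True if version >= min, compared component by component.
function meetsMinimum(version, min = [3, 8]) {
  for (let i = 0; i < min.length; i++) {
    if (version[i] !== min[i]) return version[i] > min[i];
  }
  return true;
}

// Run the interpreter; very old Pythons print the version to stderr, so check both streams.
function checkPython(command = "python3") {
  const result = spawnSync(command, ["--version"], { encoding: "utf8" });
  if (result.error) return { ok: false, reason: `${command} not found` };
  const version = parsePythonVersion(`${result.stdout}${result.stderr}`);
  if (!version) return { ok: false, reason: "could not parse version" };
  return { ok: meetsMinimum(version), version: version.join(".") };
}
```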
### Module Not Found
```bash
# Install Python dependencies
pip install uns_mcp
```
### API Key Issues
```bash
# Check if key is set
echo $UNSTRUCTURED_API_KEY
# Set for current session
export UNSTRUCTURED_API_KEY=your_key_here
```
## Development
### Local Development
```bash
# Install from npm
npm install uns-mcp-server
# Install dependencies
npm install
pip install uns_mcp
# Run in development mode
npm start -- --debug --sse
```
### Testing
```bash
# Run tests
npm test
# Test with MCP Inspector
npx @modelcontextprotocol/inspector uns-mcp-server
```
## License
MIT License
## Support
- **NPM Package**: [npmjs.com/package/uns-mcp-server](https://www.npmjs.com/package/uns-mcp-server)
- **Unstructured.io**: [Official Documentation](https://unstructured.io/docs)
- **Contact**: For support, please contact the package author through npm
## Changelog
### v1.0.2
- Fixed Python installation on macOS with `--break-system-packages` support
- Improved error handling for pip installation
- Removed GitHub repository references (private repo)
- Fixed SSE mode flag issue
### v1.0.1
- Python installation improvements
### v1.0.0
- Initial release with NPX support
- Document processor agent for Claude Code
- Support for major cloud storage providers
- Vector database integrations
- Firecrawl web crawling support
---
Made with ❤️ for the Claude Code community