# Unstructured MCP Server
A pure JavaScript MCP server for document processing with the Unstructured.io API. Process PDFs, Word documents, HTML, images, and more directly from Claude Desktop and other MCP clients, with no Python required!
## Features
- 📄 **Multi-format Support**: PDF, DOCX, HTML, images (with OCR), and more
- 🚀 **NPX Executable**: No local installation required
- 🤖 **Claude Desktop Integration**: Works seamlessly with Claude
- ⚡ **Pure JavaScript**: No Python dependencies
- 🔍 **OCR Support**: Extract text from scanned documents
- 📊 **Table Extraction**: Extract and convert tables
- 📝 **Multiple Output Formats**: JSON, text, markdown
## Quick Start
### 1. Get API Key
Sign up at [https://unstructuredapp.io](https://unstructuredapp.io) to get your API key.
### 2. Run with NPX (No Installation)
```bash
# Run directly with NPX
UNSTRUCTURED_API_KEY=your_key_here npx uns-mcp-server
```
### 3. Add to Claude Desktop
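Add the following to your Claude Desktop configuration file (`~/Library/Application Support/Claude/claude_desktop_config.json` on macOS, `%APPDATA%\Claude\claude_desktop_config.json` on Windows), then restart Claude Desktop: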
```json
{
  "mcpServers": {
    "uns-mcp": {
      "command": "npx",
      "args": ["uns-mcp-server"],
      "env": {
        "UNSTRUCTURED_API_KEY": "your_key_here"
      }
    }
  }
}
```
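
If you already have other MCP servers configured, the entry above has to be merged into the existing `mcpServers` object rather than replacing it. The sketch below illustrates that merge on plain objects. `addUnsMcpServer` is a hypothetical helper written for this README, not part of the package, and reading or writing the actual config file is left out.

```javascript
// Hypothetical helper (not part of uns-mcp-server): returns a copy of an
// existing Claude Desktop config object with the "uns-mcp" entry added.
function addUnsMcpServer(config, apiKey) {
  if (!apiKey) {
    throw new Error("An Unstructured API key is required");
  }
  return {
    ...config,
    mcpServers: {
      // Keep any servers that are already configured.
      ...(config.mcpServers || {}),
      "uns-mcp": {
        command: "npx",
        args: ["uns-mcp-server"],
        env: { UNSTRUCTURED_API_KEY: apiKey },
      },
    },
  };
}

// Example: a config that already contains another (made-up) server.
const existingConfig = {
  mcpServers: { other: { command: "node", args: ["other.js"] } },
};
const mergedConfig = addUnsMcpServer(existingConfig, "your_key_here");
console.log(JSON.stringify(mergedConfig, null, 2));
```

The helper returns a new object instead of mutating its input, so the original config stays available if you want to back it up before saving.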
## Installation Options
### Global Install
```bash
npm install -g uns-mcp-server
uns-mcp-server
```
### Local Project
```bash
npm install uns-mcp-server
npx uns-mcp-server
```
## Available Tools
The MCP server provides the following tools:
### Document Processing
- `process_document` - Process any document with OCR and formatting preservation
- `extract_text` - Extract plain text from documents
- `extract_tables` - Extract tables in JSON, CSV, or HTML format
### Connectors
- `list_sources` - List configured document sources
- `create_source_connector` - Create an input source connector (S3, Azure, local, etc.)
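
MCP clients such as Claude Desktop invoke these tools for you, so you rarely build requests by hand. For debugging, it can still help to know what goes over the wire: in the Model Context Protocol, a tool invocation is a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds such a request. `buildToolCall` is a hypothetical helper, and the argument names are taken from the usage examples in this README rather than from the server's published schema.

```javascript
// Hypothetical helper: build an MCP "tools/call" request (JSON-RPC 2.0).
function buildToolCall(id, toolName, args) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

// Argument names follow the "Extract Text" example in this README.
const toolRequest = buildToolCall(1, "extract_text", {
  file_path: "/path/to/document.pdf",
  include_metadata: true,
});
console.log(JSON.stringify(toolRequest));
```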
## Usage Examples
### Process a PDF
```javascript
// Illustrative tool arguments. In Claude Desktop, the client calls the tool for you.
await process_document({
  file_path: "/path/to/document.pdf",
  strategy: "hi_res",
  ocr_enabled: true,
  output_format: "json"
});
```
### Extract Text
```javascript
await extract_text({
  file_path: "/path/to/document.pdf",
  include_metadata: true
});
```
### Extract Tables
```javascript
await extract_tables({
  file_path: "/path/to/spreadsheet.xlsx",
  format: "csv"
});
```
## Supported Formats
- **Documents**: PDF, DOCX, DOC, ODT, RTF, TXT
- **Images**: PNG, JPG, JPEG, TIFF, BMP (with OCR)
- **Web**: HTML, XML
- **Spreadsheets**: XLSX, XLS, CSV
- **Presentations**: PPTX, PPT
- **Email**: EML, MSG
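
To avoid sending a file the service is unlikely to handle, you could check its extension against the list above before calling a tool. This is a minimal sketch: `SUPPORTED_FORMATS` simply mirrors the list in this README, and `formatCategory` is a hypothetical helper, not something the package exports.

```javascript
// Mirrors the "Supported Formats" list in this README.
const SUPPORTED_FORMATS = {
  Documents: ["pdf", "docx", "doc", "odt", "rtf", "txt"],
  Images: ["png", "jpg", "jpeg", "tiff", "bmp"],
  Web: ["html", "xml"],
  Spreadsheets: ["xlsx", "xls", "csv"],
  Presentations: ["pptx", "ppt"],
  Email: ["eml", "msg"],
};

// Hypothetical helper: return the category for a file path, or null if the
// extension is missing or not in the list.
function formatCategory(filePath) {
  const baseName = filePath.split(/[\\/]/).pop();
  const dot = baseName.lastIndexOf(".");
  if (dot <= 0) return null; // no extension, or a dotfile such as ".env"
  const extension = baseName.slice(dot + 1).toLowerCase();
  for (const [category, extensions] of Object.entries(SUPPORTED_FORMATS)) {
    if (extensions.includes(extension)) return category;
  }
  return null;
}

console.log(formatCategory("/path/to/report.PDF")); // "Documents"
console.log(formatCategory("archive.zip")); // null
```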
## Processing Strategies
- `auto` - Automatically select the best strategy
- `hi_res` - High resolution processing with layout preservation
- `ocr_only` - Focus on OCR for scanned documents
- `fast` - Quick processing for simple documents
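
A typo in the strategy name is easy to make, so it can be worth validating the value before passing it to `process_document`. The sketch below checks a requested strategy against the four names listed above. `resolveStrategy` is a hypothetical helper, and falling back to `auto` when nothing is requested is an assumption of this sketch, not documented server behavior.

```javascript
// The strategy names listed in this README.
const STRATEGIES = ["auto", "hi_res", "ocr_only", "fast"];

// Hypothetical helper: validate a strategy name.
// Assumption: "auto" is used when no strategy is requested.
function resolveStrategy(requested) {
  if (requested === undefined || requested === null) {
    return "auto";
  }
  if (!STRATEGIES.includes(requested)) {
    throw new Error(
      `Unknown strategy "${requested}". Expected one of: ${STRATEGIES.join(", ")}`
    );
  }
  return requested;
}

console.log(resolveStrategy("hi_res")); // "hi_res"
console.log(resolveStrategy()); // "auto"
```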
## Environment Variables
- `UNSTRUCTURED_API_KEY` - Your Unstructured.io API key (required)
- `UNSTRUCTURED_API_URL` - Custom API endpoint (optional)
- `LOG_LEVEL` - Logging level: ERROR, WARN, INFO, DEBUG (default: ERROR)
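
The variables above can be checked up front in a wrapper script so that a missing key fails fast with a clear message. This is a minimal sketch of that idea, not the server's own startup code. `loadSettings` is a hypothetical helper, and treating an unset `UNSTRUCTURED_API_URL` as `null` is an assumption made here for illustration.

```javascript
// Log levels listed in this README.
const LOG_LEVELS = ["ERROR", "WARN", "INFO", "DEBUG"];

// Hypothetical helper: read and validate settings from an env-like object.
// In a real script you would pass process.env.
function loadSettings(env) {
  const apiKey = (env.UNSTRUCTURED_API_KEY || "").trim();
  if (!apiKey) {
    throw new Error("UNSTRUCTURED_API_KEY is required");
  }
  const logLevel = env.LOG_LEVEL || "ERROR"; // documented default
  if (!LOG_LEVELS.includes(logLevel)) {
    throw new Error(`LOG_LEVEL must be one of: ${LOG_LEVELS.join(", ")}`);
  }
  return {
    apiKey,
    apiUrl: env.UNSTRUCTURED_API_URL || null, // assumption: null means "use the default endpoint"
    logLevel,
  };
}

// Example with a sample environment object instead of process.env.
const settings = loadSettings({ UNSTRUCTURED_API_KEY: "your_key_here", LOG_LEVEL: "DEBUG" });
console.log(settings);
```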
## Testing
Test your installation:
```bash
# Create a test file
echo "Hello World" > test.txt
# Process it
npx uns-mcp-server test.txt
```
## Troubleshooting
### API Key Issues
```bash
# Verify your API key is set
echo "$UNSTRUCTURED_API_KEY"
# Set it for the current session
export UNSTRUCTURED_API_KEY=your_key_here
```
### Connection Issues
- Ensure you have internet connectivity
- Verify the API key is valid
- Check if you're behind a corporate firewall
## Changelog
### v2.0.2
- Updated GitHub repository to CG-Labs organization
- Documentation improvements
### v2.0.1
- Fixed API endpoint to use correct domain
- Improved error handling
- Better connection stability
### v2.0.0
- Complete rewrite in pure JavaScript
- Removed all Python dependencies
- Direct API integration
- Improved performance
### v1.0.x
- Initial release with Python bridge
## License
MIT License
## Support
- **Issues**: [GitHub Issues](https://github.com/CG-Labs/Unstructured-Document-Processor-MCP/issues)
- **GitHub**: [github.com/CG-Labs/Unstructured-Document-Processor-MCP](https://github.com/CG-Labs/Unstructured-Document-Processor-MCP)
- **NPM**: [npmjs.com/package/uns-mcp-server](https://www.npmjs.com/package/uns-mcp-server)
- **Unstructured.io**: [Documentation](https://unstructured-io.github.io/unstructured/)
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
---
Made with ❤️ for the Claude Desktop community