cleanweb-mcp

A lightweight MCP server for extracting clean web content with intelligent content filtering and Markdown conversion

# CleanWeb MCP Usage Examples

## 🚀 Quick Start

### 1. Build and Start MCP Server

```bash
# Build project
npm run build

# Start MCP server (Stdio mode)
npm run mcp:stdio
```

After a successful startup, you will see:

```
🚀 CleanWeb MCP server started
```

### 2. Configure Claude

Add the following configuration to Claude's configuration file:

```json
{
  "mcpServers": {
    "cleanweb-mcp": {
      "command": "node",
      "args": ["path/to/your/project/build/index.js"]
    }
  }
}
```

### 3. Use in Claude

#### Basic Usage

```
Please help me extract content from this webpage: https://example.com/article
```

#### Specify Format

```
Please extract this webpage content in JSON format: https://news.example.com/tech-article
```

#### Fast Mode

```
Please extract this webpage content in fast mode (skip image loading): https://blog.example.com/post
```

## 🛠️ Tool Parameters

### extract_web_content

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| url | string | ✅ | - | URL of the webpage to extract content from |
| format | string | ❌ | "markdown" | Return format: "markdown" or "json" |
| timeout | number | ❌ | 30000 | Page loading timeout (milliseconds) |
| fastMode | boolean | ❌ | false | Fast mode: skip images and other resources |

## 📋 Use Cases

### 1. Article Content Extraction

```
Help me extract the content of this tech article: https://techblog.example.com/ai-trends-2024
```

### 2. News Content Organization

```
Please extract the main content of this news and summarize the key points: https://news.example.com/breaking-news
```

### 3. Blog Article Analysis

```
Extract the content of this blog post and help me analyze the viewpoints: https://personal-blog.example.com/opinion-piece
```

### 4. Research Material Collection

```
Please extract the content of this research report: https://research.example.com/report-2024
Format requirement: JSON format with complete metadata
```

## 🔧 Advanced Usage

### SSE Mode Deployment

```bash
# Start SSE server
npm run mcp:sse
```

Then configure in Claude:

```json
{
  "mcpServers": {
    "cleanweb-mcp": {
      "type": "sse",
      "url": "http://localhost:3100/sse",
      "timeout": 600
    }
  }
}
```

### Development Mode

```bash
# Watch file changes and auto-recompile
npm run mcp:dev
```

## ⚠️ Important Notes

1. **Network Access**: Ensure the server can reach the target websites
2. **Browser Dependency**: Requires a Chrome/Chromium browser
3. **Timeout Settings**: For slow-loading websites, increase the `timeout` parameter
4. **Resource Consumption**: Each extraction starts a browser, so use it sparingly

## 🐛 Troubleshooting

### Common Issues

1. **Browser Launch Failed**
   - Check if the Chrome browser is installed
   - Try setting the environment variable `CHROME_PATH`

2. **Webpage Loading Timeout**
   - Increase the `timeout` parameter
   - Use `fastMode` to skip resource loading

3. **Content Extraction Failed**
   - Check if the target website is accessible
   - Some websites may have anti-crawling mechanisms

### Debug Mode

```bash
# View detailed logs
DEBUG=* npm run mcp:stdio
```
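
When troubleshooting, it can help to know roughly what an MCP client sends when it invokes `extract_web_content`. The sketch below builds such a request by hand, assuming the standard MCP `tools/call` JSON-RPC shape and the defaults from the Tool Parameters table. The helper names (`withDefaults`, `buildExtractRequest`) and the `http(s)`-only URL check are illustrative assumptions, not part of cleanweb-mcp.

```typescript
// Hypothetical helper types: names are illustrative, not exported by cleanweb-mcp.
type ExtractFormat = "markdown" | "json";

interface ExtractArgs {
  url: string;            // required
  format?: ExtractFormat; // default "markdown"
  timeout?: number;       // default 30000 (milliseconds)
  fastMode?: boolean;     // default false
}

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Required<ExtractArgs> };
}

// Apply the documented defaults and reject obviously bad input.
// Assumption: only http:// and https:// URLs are meaningful here.
function withDefaults(args: ExtractArgs): Required<ExtractArgs> {
  if (!/^https?:\/\//i.test(args.url)) {
    throw new Error(`url must start with http:// or https://: ${args.url}`);
  }
  const timeout = args.timeout ?? 30000;
  if (!Number.isFinite(timeout) || timeout <= 0) {
    throw new Error(`timeout must be a positive number of milliseconds: ${timeout}`);
  }
  return {
    url: args.url,
    format: args.format ?? "markdown",
    timeout,
    fastMode: args.fastMode ?? false,
  };
}

// Wrap the arguments in a JSON-RPC 2.0 "tools/call" request.
function buildExtractRequest(id: number, args: ExtractArgs): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "extract_web_content", arguments: withDefaults(args) },
  };
}

// Usage: a slow page, so raise the timeout and skip images.
const request = buildExtractRequest(1, {
  url: "https://blog.example.com/post",
  timeout: 60000,
  fastMode: true,
});
console.log(JSON.stringify(request, null, 2));
```

In normal use Claude constructs this request for you. The sketch spells out every default explicitly, which is optional: a client may omit any non-required parameter and let the server fall back to the defaults in the table.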