cleanweb-mcp
A lightweight MCP server for extracting clean web content with intelligent content filtering and Markdown conversion
# CleanWeb MCP Usage Examples
## Quick Start
### 1. Build and Start MCP Server
```bash
# Build project
npm run build
# Start MCP server (Stdio mode)
npm run mcp:stdio
```
After successful startup, you will see:
```
CleanWeb MCP server started
```
### 2. Configure Claude
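Add the following entry to Claude's MCP configuration file (for Claude Desktop, `claude_desktop_config.json`), replacing the placeholder path with the location of your build output: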
```json
{
"mcpServers": {
"cleanweb-mcp": {
"command": "node",
"args": ["path/to/your/project/build/index.js"]
}
}
}
```
### 3. Use in Claude
#### Basic Usage
```
Please help me extract content from this webpage: https://example.com/article
```
#### Specify Format
```
Please extract this webpage content in JSON format: https://news.example.com/tech-article
```
#### Fast Mode
```
Please extract this webpage content in fast mode (skip image loading): https://blog.example.com/post
```
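For orientation, here is a rough sketch of the MCP `tools/call` requests that the three prompts above could translate into. The tool name and argument names come from the Tool Parameters section; the helper `buildExtractRequest` and the request ids are hypothetical, and the arguments Claude actually sends are chosen by the model, so treat this as an illustration rather than a guaranteed mapping.

```typescript
// JSON-RPC envelope used by MCP for tool invocations.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: wraps tool arguments in a tools/call request.
function buildExtractRequest(
  id: number,
  args: Record<string, unknown>
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "extract_web_content", arguments: args },
  };
}

// Basic usage: only the required `url`, everything else left at its default.
const basic = buildExtractRequest(1, { url: "https://example.com/article" });

// "Specify Format": ask for JSON instead of the default Markdown.
const asJson = buildExtractRequest(2, {
  url: "https://news.example.com/tech-article",
  format: "json",
});

// "Fast Mode": skip images and other resources.
const fast = buildExtractRequest(3, {
  url: "https://blog.example.com/post",
  fastMode: true,
});

console.log(JSON.stringify(fast, null, 2));
```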
## Tool Parameters
### extract_web_content
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| url | string | ✅ | - | URL of the webpage to extract content from |
| format | string | ❌ | "markdown" | Return format: "markdown" or "json" |
| timeout | number | ❌ | 30000 | Page loading timeout (milliseconds) |
| fastMode | boolean | ❌ | false | Fast mode, skip images and other resources |
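The table can also be read as a type plus a set of defaults. The sketch below is a minimal client-side illustration, assuming you want to check arguments before sending them; `normalizeExtractArgs` is a hypothetical helper, and the server's own validation may be stricter or looser than this.

```typescript
type OutputFormat = "markdown" | "json";

// Input shape: only `url` is required, as in the table.
interface ExtractArgs {
  url: string;
  format?: string;
  timeout?: number; // milliseconds
  fastMode?: boolean;
}

// Fully resolved arguments with the documented defaults applied.
interface ResolvedExtractArgs {
  url: string;
  format: OutputFormat;
  timeout: number;
  fastMode: boolean;
}

// Hypothetical helper: applies defaults and rejects obviously invalid input.
function normalizeExtractArgs(args: ExtractArgs): ResolvedExtractArgs {
  // Assumption: only http(s) pages are meaningful targets.
  if (!/^https?:\/\//i.test(args.url)) {
    throw new Error(`url must start with http:// or https://, got: ${args.url}`);
  }
  const format = args.format ?? "markdown";
  if (format !== "markdown" && format !== "json") {
    throw new Error(`format must be "markdown" or "json", got: ${format}`);
  }
  const timeout = args.timeout ?? 30000;
  if (!Number.isFinite(timeout) || timeout <= 0) {
    throw new RangeError("timeout must be a positive number of milliseconds");
  }
  return { url: args.url, format, timeout, fastMode: args.fastMode ?? false };
}
```

Calling `normalizeExtractArgs({ url: "https://example.com/article" })` fills in `"markdown"`, `30000`, and `false` for the optional fields.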
## Use Cases
### 1. Article Content Extraction
```
Help me extract the content of this tech article: https://techblog.example.com/ai-trends-2024
```
### 2. News Content Organization
```
Please extract the main content of this news and summarize the key points: https://news.example.com/breaking-news
```
### 3. Blog Article Analysis
```
Extract the content of this blog post and help me analyze the viewpoints: https://personal-blog.example.com/opinion-piece
```
### 4. Research Material Collection
```
Please extract the content of this research report: https://research.example.com/report-2024
Format requirement: JSON format with complete metadata
```
## Advanced Usage
### SSE Mode Deployment
```bash
# Start SSE server
npm run mcp:sse
```
Then add the SSE endpoint to Claude's configuration:
```json
{
"mcpServers": {
"cleanweb-mcp": {
"type": "sse",
"url": "http://localhost:3100/sse",
"timeout": 600
}
}
}
```
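If you generate these configuration files from a script, a typed view of the two entry shapes used in this guide can catch typos early. This is only a sketch: the type names and the `mcpConfig` helper are hypothetical, and the fields are limited to the ones shown in the JSON examples here.

```typescript
// Stdio entry: Claude launches the server as a child process.
type StdioEntry = { command: string; args: string[] };

// SSE entry: Claude connects to an already running server.
type SseEntry = { type: "sse"; url: string; timeout?: number };

type ServerEntry = StdioEntry | SseEntry;

// Hypothetical helper: wraps one named entry in the top-level config object.
function mcpConfig(
  name: string,
  entry: ServerEntry
): { mcpServers: Record<string, ServerEntry> } {
  return { mcpServers: { [name]: entry } };
}

// Same values as the SSE example above.
const sse = mcpConfig("cleanweb-mcp", {
  type: "sse",
  url: "http://localhost:3100/sse",
  timeout: 600,
});

// Same values as the stdio example in Quick Start (path left as a placeholder).
const stdio = mcpConfig("cleanweb-mcp", {
  command: "node",
  args: ["path/to/your/project/build/index.js"],
});

console.log(JSON.stringify(sse, null, 2));
```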
### Development Mode
```bash
# Watch file changes and auto-recompile
npm run mcp:dev
```
## Important Notes
1. **Network Access**: The machine running the server must be able to reach the target websites.
2. **Browser Dependency**: A Chrome or Chromium installation is required.
3. **Timeout Settings**: For slow-loading websites, increase the `timeout` parameter.
4. **Resource Consumption**: Each extraction launches a browser, so avoid issuing many extractions in quick succession.
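Because every extraction launches a browser, a client that processes many URLs may want to cap how many run at once. The following is a generic concurrency limiter, not part of cleanweb-mcp; `runLimited` is a hypothetical name, and the tasks you pass in would wrap whatever call your client uses to invoke `extract_web_content`.

```typescript
// Runs async tasks with at most `limit` of them in flight at any time.
// Results are returned in the same order as the input tasks.
async function runLimited<T>(
  tasks: Array<() => Promise<T>>,
  limit: number
): Promise<T[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results: T[] = new Array<T>(tasks.length);
  let next = 0; // index of the next task that has not been started

  // Each worker repeatedly claims the next unstarted task until none remain.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      const task = tasks[index];
      if (task) {
        results[index] = await task();
      }
    }
  }

  const workerCount = Math.min(limit, tasks.length);
  const workers: Promise<void>[] = [];
  for (let i = 0; i < workerCount; i++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return results;
}

// Usage idea (extractOne is a placeholder for your own client call):
//   const pages = await runLimited(urls.map((u) => () => extractOne(u)), 2);
```

A limit of 1 or 2 keeps browser launches sequential or nearly so; the right value depends on the memory available on the host.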
## Troubleshooting
### Common Issues
1. **Browser Launch Failed**
   - Check that Chrome is installed.
   - Try setting the `CHROME_PATH` environment variable.
2. **Webpage Loading Timeout**
   - Increase the `timeout` parameter.
   - Use `fastMode` to skip resource loading.
3. **Content Extraction Failed**
   - Check that the target website is accessible.
   - Some websites have anti-crawling mechanisms that block automated access.
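The two timeout remedies can be combined into a simple escalation plan: retry with `fastMode` first, then keep doubling `timeout`. The sketch below only computes the sequence of settings to try; `retryPlan` is a hypothetical helper, and the 120000 ms cap is an arbitrary assumption rather than a limit imposed by the server.

```typescript
interface Attempt {
  timeout: number; // milliseconds
  fastMode: boolean;
}

// Builds the list of settings to try, in order, after a loading timeout.
function retryPlan(initialTimeout = 30000, maxTimeout = 120000): Attempt[] {
  if (!Number.isFinite(initialTimeout) || initialTimeout <= 0) {
    throw new RangeError("initialTimeout must be a positive number");
  }
  const plan: Attempt[] = [
    { timeout: initialTimeout, fastMode: false }, // the original attempt
    { timeout: initialTimeout, fastMode: true },  // cheapest fix: skip resources
  ];
  // Then double the timeout until the cap would be exceeded.
  let timeout = initialTimeout * 2;
  while (timeout <= maxTimeout) {
    plan.push({ timeout, fastMode: true });
    timeout *= 2;
  }
  return plan;
}

console.log(retryPlan());
```

With the default arguments the plan has four steps: 30000 ms without fast mode, then 30000, 60000, and 120000 ms with fast mode enabled.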
### Debug Mode
```bash
# View detailed logs
DEBUG=* npm run mcp:stdio
```