# CleanWeb MCP
<div align="center">
[npm package](https://www.npmjs.com/package/cleanweb-mcp)
[GitHub repository](https://github.com/guangxiangdebizi/cleanweb-mcp)
[MIT License](https://opensource.org/licenses/MIT)
**A lightweight Model Context Protocol (MCP) server**
It extracts the main content of a web page, filters out ads and other irrelevant elements, and converts the result to clean Markdown.
[Quick Start](#quick-start) • [Documentation](#usage) • [Configuration](#claude-configuration) • [Contributing](#contributing)
</div>
## Features
<div align="center">
| Smart Extraction | Content Cleaning | Format Conversion | Lightweight Deployment |
|:---:|:---:|:---:|:---:|
| Axios + Cheerio + Readability | Auto-filters ads & distractions | HTML → Markdown | No browser dependency |
</div>
### Core Advantages
- **Smart Content Extraction**: Uses Axios, Cheerio, and the Readability algorithm to extract the main content of a page
- **Intelligent Content Cleaning**: Automatically removes ads, navigation, sidebars, and other distracting elements
- **Markdown Conversion**: Converts HTML content to clean Markdown
- **Image Link Optimization**: Automatically handles overly long image links for better readability
- **Lightweight Deployment**: No browser dependency, so deployment is simple and fast
- **Multiple Output Formats**: Returns either plain Markdown or JSON with metadata
- **MCP Support**: Compatible with the Model Context Protocol standard
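
This README does not specify how overly long image links are handled. Purely as an illustration, one way to keep extracted Markdown readable is to truncate image URLs that exceed a length limit. The function name `shortenImageLinks` and the 80-character limit are hypothetical and are not taken from the project's source.

```javascript
// Hypothetical sketch: truncate overly long image URLs in Markdown.
// Not the project's actual implementation; name and limit are assumptions.
function shortenImageLinks(markdown, maxLength = 80) {
  // Matches ![alt](url) where the URL contains no whitespace or ")".
  const imagePattern = /!\[([^\]]*)\]\(([^)\s]+)\)/g;
  return markdown.replace(imagePattern, (match, alt, url) => {
    if (url.length <= maxLength) {
      return match; // short links are left untouched
    }
    // Keep the start of the URL so its origin is still recognizable.
    return `![${alt}](${url.slice(0, maxLength)}...)`;
  });
}

// Example: an inline base64 image, a common source of very long links.
const longUrl = "data:image/png;base64," + "A".repeat(500);
console.log(shortenImageLinks(`![logo](${longUrl})`));
```

A truncated URL no longer resolves, so this trade-off only makes sense when the Markdown is meant for reading rather than for re-rendering the images.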
## Quick Start
### Installation
```bash
# Install from npm
npm install cleanweb-mcp
# Or clone the repository
git clone https://github.com/guangxiangdebizi/cleanweb-mcp.git
cd cleanweb-mcp
npm install
```
> **Advantage**: Uses a lightweight HTTP client, so no browser download is required and deployment is simpler. The project focuses on content cleaning and optimization.
## Build Project
```bash
npm run build
```
## Usage
### 1. Stdio Mode (Local Development)
```bash
npm run mcp:stdio
```
### 2. SSE Mode (via Supergateway)
```bash
npm run mcp:sse
```
The server starts at `http://localhost:3100/sse`.
### 3. WebSocket Mode
```bash
npm run mcp:ws
```
### 4. Development Mode (Watch file changes)
```bash
npm run mcp:dev
```
## Claude Configuration
### Stdio Mode Configuration
Add to Claude's configuration file:
```json
{
  "mcpServers": {
    "cleanweb-mcp": {
      "command": "node",
      "args": ["path/to/your/project/build/index.js"]
    }
  }
}
```
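
Since `path/to/your/project` is a placeholder, it can help to generate the entry with an absolute path to the compiled entry point. The sketch below assumes it is run from the project root after `npm run build`; the helper name `buildStdioConfig` is illustrative and not part of cleanweb-mcp.

```javascript
const path = require("node:path");

// Illustrative helper (not part of cleanweb-mcp): builds the stdio entry
// shown above with an absolute path to build/index.js.
function buildStdioConfig(projectRoot) {
  return {
    mcpServers: {
      "cleanweb-mcp": {
        command: "node",
        args: [path.resolve(projectRoot, "build", "index.js")],
      },
    },
  };
}

// Assumes the current working directory is the project root.
console.log(JSON.stringify(buildStdioConfig(process.cwd()), null, 2));
```

The printed JSON can then be merged into Claude's configuration file.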
### SSE Mode Configuration
```json
{
  "mcpServers": {
    "cleanweb-mcp-sse": {
      "type": "sse",
      "url": "http://localhost:3100/sse",
      "timeout": 600
    }
  }
}
```
## API Reference
### `extract_web_content`
Intelligently extract web content and convert to Markdown format.
#### Parameters
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `url` | string | ✅ | - | The web URL to extract content from |
| `format` | string | ❌ | `markdown` | Return format: `markdown` or `json` |
| `timeout` | number | ❌ | `30000` | Page loading timeout (milliseconds) |
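
To make the defaults in the table concrete, here is a minimal sketch of how a caller might validate and normalize arguments before invoking the tool. `normalizeArgs` is a hypothetical helper, not a function exported by cleanweb-mcp, and the restriction to http(s) URLs is an added assumption.

```javascript
// Hypothetical helper: applies the documented defaults and basic checks.
function normalizeArgs({ url, format = "markdown", timeout = 30000 } = {}) {
  if (typeof url !== "string" || url.length === 0) {
    throw new TypeError("url is required and must be a non-empty string");
  }
  const parsed = new URL(url); // throws TypeError on malformed URLs
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new TypeError("assumed here: only http(s) URLs are supported");
  }
  if (format !== "markdown" && format !== "json") {
    throw new TypeError('format must be "markdown" or "json"');
  }
  if (!Number.isFinite(timeout) || timeout <= 0) {
    throw new TypeError("timeout must be a positive number of milliseconds");
  }
  return { url: parsed.href, format, timeout };
}

// Only url is given, so format and timeout fall back to their defaults.
console.log(normalizeArgs({ url: "https://example.com/article" }));
```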
#### Usage Examples
```javascript
// Basic usage
extract_web_content({
  url: "https://example.com/article"
})

// Advanced usage
extract_web_content({
  url: "https://example.com/article",
  format: "json",
  timeout: 60000
})
```
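
The calls above are shorthand for an MCP tool invocation. Under the Model Context Protocol, a client sends a JSON-RPC 2.0 `tools/call` request that names the tool and carries its arguments. The sketch below builds such a message; `buildToolCall` and the request `id` value are illustrative.

```javascript
// Builds a JSON-RPC 2.0 "tools/call" request in the shape MCP defines.
// buildToolCall is an illustrative helper, not part of cleanweb-mcp.
function buildToolCall(id, name, args) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

const request = buildToolCall(1, "extract_web_content", {
  url: "https://example.com/article",
  format: "json",
  timeout: 60000,
});

// Over the stdio transport, each message is one line of JSON.
console.log(JSON.stringify(request));
```

In practice an MCP client such as Claude performs the initialization handshake and sends this message for you, so hand-building it is mainly useful for debugging.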
## Project Structure
```
cleanweb-mcp/
├── README.md                         # Project documentation
├── package.json                      # Project configuration
├── tsconfig.json                     # TypeScript configuration
├── claude-config-example.json        # Claude configuration example
├── example-usage.md                  # Usage examples
├── build/                            # Compiled output
│   ├── index.js
│   └── tools/
│       └── web-content-extractor.js
└── src/                              # Source code
    ├── index.ts                      # MCP server main entry
    └── tools/
        └── web-content-extractor.ts  # Web content extraction tool
```
## Migration from Express Server
The original Express server (`server.js`) can still run independently:
```bash
npm start
```
The MCP version provides the same core functionality but integrates with AI assistants through MCP.
## Important Notes
1. **Lightweight Implementation**: Uses an HTTP client to fetch static content, so no browser dependency is required
2. **Network Access**: The server must be able to reach the target websites
3. **Static Content**: Best suited to static HTML; dynamically rendered content may not be accessible
4. **Timeout Settings**: For slow-loading websites, increase the `timeout` parameter
5. **Content Optimization**: Automatically optimizes image link display for better readability
## Contributing
Issues and pull requests are welcome. If you have any questions or suggestions, feel free to contact me.
## Contact
- **GitHub**: [guangxiangdebizi](https://github.com/guangxiangdebizi/)
- **Email**: guangxiangdebizi@gmail.com
- **LinkedIn**: [Xingyu Chen](https://www.linkedin.com/in/xingyu-chen-b5b3b0313/)
- **NPM**: [@xingyuchen](https://www.npmjs.com/~xingyuchen)
## Related Links
- **GitHub Repository**: [https://github.com/guangxiangdebizi/cleanweb-mcp](https://github.com/guangxiangdebizi/cleanweb-mcp)
- **NPM Package**: [https://www.npmjs.com/package/cleanweb-mcp](https://www.npmjs.com/package/cleanweb-mcp)
## License
MIT License. See the [LICENSE](LICENSE) file for details.