web-parser-mcp
Version:
๐ MCP SERVER FIXED v3.7.9! Resolved import errors, middleware conflicts, type hints - NOW WORKING PERFECTLY!
378 lines (295 loc) โข 8.33 kB
Markdown
# Web Parser MCP - Developer Guide
## ๐ Quick Start
```bash
# Setup development environment
make dev-setup
# Run tests
make test
# Run with coverage
make test-cov
# Run linting
make lint
# Format code
make format
# Run everything
make check-all
```
## ๐ Project Structure
```
web-parser-mcp/
โโโ src/ # Main source code
โ โโโ config.py # Global configuration
โ โโโ schemas.py # Data schemas
โ โโโ utils/ # Utility modules
โ โ โโโ api_client.py # SessionAPIClient (requests + browser cookies)
โ โ โโโ http_client.py # UnifiedHTTPClient (Playwright)
โ โ โโโ ...
โ โโโ tools/ # MCP tools
โ โโโ session/ # Session management
โโโ tests/ # Test suite
โโโ main.py # Entry point
โโโ pyproject.toml # Python configuration
โโโ Makefile # Development commands
```
## ๐๏ธ Architecture Overview
### Hybrid Approach
```
Browser Login โ Cookies โ API Requests
โ โ โ
Playwright authenticated requests
Session session with cookies
```
### Key Components
#### SessionAPIClient
- **Purpose**: Efficient API requests with browser session cookies
- **Features**: Rotating User-Agent, security headers, JSON parsing
- **Use case**: All HTTP requests after browser authentication
#### UnifiedHTTPClient
- **Purpose**: Browser automation and session management
- **Features**: Human-like navigation, stealth mode, screenshot capabilities
- **Use case**: Authentication, complex interactions, debugging
## ๐งช Testing Strategy
### Unit Tests (`tests/test_*.py`)
```bash
# Test individual components
pytest tests/test_api_client.py -v
pytest tests/test_http_client.py -v
```
### Integration Tests (`tests/test_integration.py`)
```bash
# Test tool interactions
pytest tests/test_integration.py -v
```
### Coverage Requirements
- **Minimum**: 80% coverage
- **Critical**: api_client.py, http_client.py - 90%+
- **Tools**: 75%+ coverage
### Running Tests
```bash
# All tests
make test
# With coverage
make test-cov
# Fast testing (no coverage)
make test-fast
# Specific test file
pytest tests/test_api_client.py::TestSessionAPIClient::test_init -v
```
## ๐ ๏ธ Development Workflow
### 1. Setup
```bash
# Clone repository
git clone <repo-url>
cd web-parser-mcp
# Setup development environment
make dev-setup
# Verify setup
make check-all
```
### 2. Development
```bash
# Create feature branch
git checkout -b feature/new-tool
# Make changes
# Add tests for new functionality
# Run checks
make check-all
# Commit
git add .
git commit -m "feat: add new tool with tests"
```
### 3. Code Quality
#### Pre-commit Hooks
```bash
# Install hooks
make pre-commit
# Run on all files
make pre-commit-run
```
#### Linting & Formatting
```bash
# Check code quality
make lint
# Auto-fix issues
make format
# Type checking
make check-types
```
### 4. Testing
#### Writing Tests
```python
# Unit test example
class TestMyComponent:
def test_feature(self):
# Arrange
component = MyComponent()
# Act
result = component.do_something()
# Assert
assert result == expected_value
```
#### Mock Strategy
```python
# Mock external dependencies
def test_with_mock(self, mock_session):
mock_session.get.return_value = {"active": True}
# Test logic
```
## ๐ง Adding New Tools
### 1. Tool Structure
```python
# src/tools/new_tool.py
async def new_tool(args: Dict[str, Any]) -> List[TextContent]:
"""
Tool description.
Args:
args: Tool arguments
Returns:
List of text content results
"""
# Implementation
return [TextContent(type="text", text=json.dumps(result))]
```
### 2. Tool Definition
```python
# src/tools/definitions/new_tools.py
def get_new_tools():
return [
Tool(
name="new_tool",
description="Tool description",
inputSchema={
"type": "object",
"properties": {
"param": {"type": "string", "description": "Parameter description"}
},
"required": ["param"]
}
)
]
```
### 3. Export Tool
```python
# src/tools/__init__.py
from .new_tool import new_tool
```
### 4. Add Tests
```python
# tests/test_new_tool.py
def test_new_tool():
# Test implementation
pass
```
## ๐ Security Best Practices
### User-Agent Rotation
```python
# API client automatically rotates User-Agents
api_client = get_api_client() # Gets random UA from ANTI_DETECTION_HEADERS
# Manual rotation
api_client.rotate_user_agent()
```
### Session Management
```python
# Always use session-aware clients
api_client = get_api_client()
api_client.ensure_session_updated() # Auto-sync with browser session
# Check session status
security_info = api_client.get_security_info()
assert security_info["cookies_count"] > 0
```
### Input Validation
```python
def validate_url(url: str) -> bool:
"""Validate URL format and safety."""
from urllib.parse import urlparse
parsed = urlparse(url)
return bool(parsed.scheme and parsed.netloc)
```
## ๐ Monitoring & Debugging
### Performance Monitoring
```python
import time
class PerformanceMonitor:
def __init__(self):
self.metrics = {}
def measure_time(self, operation: str):
def decorator(func):
def wrapper(*args, **kwargs):
start = time.time()
result = func(*args, **kwargs)
duration = time.time() - start
self.metrics[f"{operation}_time"] = duration
return result
return wrapper
return decorator
```
### Debug Logging
```python
import logging
# Configure logging
logging.basicConfig(
level=logging.DEBUG,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)
# Usage in tools
logger.debug(f"Processing URL: {url}")
logger.info(f"Found {len(links)} links")
```
## ๐ Deployment
### Building
```bash
# Build package
make build
# Publish to PyPI
make publish
# Publish to test PyPI
make publish-test
```
### Docker (Future)
```bash
# Build image
make docker-build
# Run container
make docker-run
```
## ๐ API Reference
### SessionAPIClient
- `get(url, headers=None, params=None, debug=False)` - GET request
- `post(url, data=None, json_data=None, headers=None, debug=False)` - POST request
- `ensure_session_updated()` - Sync with browser session
- `get_security_info()` - Get security status
- `rotate_user_agent()` - Rotate User-Agent
### UnifiedHTTPClient
- `get_with_session(url, timeout=10, headers=None, debug=False)` - Browser session request
- `navigate_like_human(url, timeout=30, wait_for=None, stealth=False, debug=False)` - Human-like navigation
## ๐ค Contributing
1. Fork repository
2. Create feature branch (`git checkout -b feature/amazing-feature`)
3. Write tests for new functionality
4. Ensure all tests pass (`make test-cov`)
5. Follow code style (`make format`)
6. Commit changes (`git commit -m 'feat: add amazing feature'`)
7. Push to branch (`git push origin feature/amazing-feature`)
8. Open Pull Request
## ๐ Support
- **Issues**: GitHub Issues
- **Discussions**: GitHub Discussions
- **Documentation**: `/docs` or `make docs-serve`
---
## ๐ฏ Quick Commands
```bash
# Development setup
make dev-setup
# Daily development
make check-all # Lint + test + coverage
make format # Auto-format code
make test # Run tests
make lint # Check code quality
# Before commit
make pre-commit-run # Run all pre-commit checks
# Build & deploy
make build # Build package
make publish # Publish to PyPI
```