web-parser-mcp

# Web Parser MCP - Developer Guide ## 🚀 Quick Start ```bash # Setup development environment make dev-setup # Run tests make test # Run with coverage make test-cov # Run linting make lint # Format code make format # Run everything make check-all ``` ## 📁 Project Structure ``` web-parser-mcp/ ├── src/ # Main source code │ ├── config.py # Global configuration │ ├── schemas.py # Data schemas │ ├── utils/ # Utility modules │ │ ├── api_client.py # SessionAPIClient (requests + browser cookies) │ │ ├── http_client.py # UnifiedHTTPClient (Playwright) │ │ └── ... │ ├── tools/ # MCP tools │ └── session/ # Session management ├── tests/ # Test suite ├── main.py # Entry point ├── pyproject.toml # Python configuration └── Makefile # Development commands ``` ## 🏗️ Architecture Overview ### Hybrid Approach ``` Browser Login → Cookies → API Requests ↓ ↓ ↓ Playwright authenticated requests Session session with cookies ``` ### Key Components #### SessionAPIClient - **Purpose**: Efficient API requests with browser session cookies - **Features**: Rotating User-Agent, security headers, JSON parsing - **Use case**: All HTTP requests after browser authentication #### UnifiedHTTPClient - **Purpose**: Browser automation and session management - **Features**: Human-like navigation, stealth mode, screenshot capabilities - **Use case**: Authentication, complex interactions, debugging ## 🧪 Testing Strategy ### Unit Tests (`tests/test_*.py`) ```bash # Test individual components pytest tests/test_api_client.py -v pytest tests/test_http_client.py -v ``` ### Integration Tests (`tests/test_integration.py`) ```bash # Test tool interactions pytest tests/test_integration.py -v ``` ### Coverage Requirements - **Minimum**: 80% coverage - **Critical**: api_client.py, http_client.py - 90%+ - **Tools**: 75%+ coverage ### Running Tests ```bash # All tests make test # With coverage make test-cov # Fast testing (no coverage) make test-fast # Specific test file pytest tests/test_api_client.py::TestSessionAPIClient::test_init -v ``` ## 🛠️ Development Workflow ### 1. Setup ```bash # Clone repository git clone <repo-url> cd web-parser-mcp # Setup development environment make dev-setup # Verify setup make check-all ``` ### 2. Development ```bash # Create feature branch git checkout -b feature/new-tool # Make changes # Add tests for new functionality # Run checks make check-all # Commit git add . git commit -m "feat: add new tool with tests" ``` ### 3. Code Quality #### Pre-commit Hooks ```bash # Install hooks make pre-commit # Run on all files make pre-commit-run ``` #### Linting & Formatting ```bash # Check code quality make lint # Auto-fix issues make format # Type checking make check-types ``` ### 4. Testing #### Writing Tests ```python # Unit test example class TestMyComponent: def test_feature(self): # Arrange component = MyComponent() # Act result = component.do_something() # Assert assert result == expected_value ``` #### Mock Strategy ```python # Mock external dependencies @patch('src.utils.api_client.authenticated_session') def test_with_mock(self, mock_session): mock_session.get.return_value = {"active": True} # Test logic ``` ## 🔧 Adding New Tools ### 1. Tool Structure ```python # src/tools/new_tool.py async def new_tool(args: Dict[str, Any]) -> List[TextContent]: """ Tool description. Args: args: Tool arguments Returns: List of text content results """ # Implementation return [TextContent(type="text", text=json.dumps(result))] ``` ### 2. Tool Definition ```python # src/tools/definitions/new_tools.py def get_new_tools(): return [ Tool( name="new_tool", description="Tool description", inputSchema={ "type": "object", "properties": { "param": {"type": "string", "description": "Parameter description"} }, "required": ["param"] } ) ] ``` ### 3. Export Tool ```python # src/tools/__init__.py from .new_tool import new_tool ``` ### 4. Add Tests ```python # tests/test_new_tool.py def test_new_tool(): # Test implementation pass ``` ## 🔒 Security Best Practices ### User-Agent Rotation ```python # API client automatically rotates User-Agents api_client = get_api_client() # Gets random UA from ANTI_DETECTION_HEADERS # Manual rotation api_client.rotate_user_agent() ``` ### Session Management ```python # Always use session-aware clients api_client = get_api_client() api_client.ensure_session_updated() # Auto-sync with browser session # Check session status security_info = api_client.get_security_info() assert security_info["cookies_count"] > 0 ``` ### Input Validation ```python def validate_url(url: str) -> bool: """Validate URL format and safety.""" from urllib.parse import urlparse parsed = urlparse(url) return bool(parsed.scheme and parsed.netloc) ``` ## 📊 Monitoring & Debugging ### Performance Monitoring ```python import time class PerformanceMonitor: def __init__(self): self.metrics = {} def measure_time(self, operation: str): def decorator(func): def wrapper(*args, **kwargs): start = time.time() result = func(*args, **kwargs) duration = time.time() - start self.metrics[f"{operation}_time"] = duration return result return wrapper return decorator ``` ### Debug Logging ```python import logging # Configure logging logging.basicConfig( level=logging.DEBUG, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s' ) logger = logging.getLogger(__name__) # Usage in tools logger.debug(f"Processing URL: {url}") logger.info(f"Found {len(links)} links") ``` ## 🚀 Deployment ### Building ```bash # Build package make build # Publish to PyPI make publish # Publish to test PyPI make publish-test ``` ### Docker (Future) ```bash # Build image make docker-build # Run container make docker-run ``` ## 📚 API Reference ### SessionAPIClient - `get(url, headers=None, params=None, debug=False)` - GET request - `post(url, data=None, json_data=None, headers=None, debug=False)` - POST request - `ensure_session_updated()` - Sync with browser session - `get_security_info()` - Get security status - `rotate_user_agent()` - Rotate User-Agent ### UnifiedHTTPClient - `get_with_session(url, timeout=10, headers=None, debug=False)` - Browser session request - `navigate_like_human(url, timeout=30, wait_for=None, stealth=False, debug=False)` - Human-like navigation ## 🤝 Contributing 1. Fork repository 2. Create feature branch (`git checkout -b feature/amazing-feature`) 3. Write tests for new functionality 4. Ensure all tests pass (`make test-cov`) 5. Follow code style (`make format`) 6. Commit changes (`git commit -m 'feat: add amazing feature'`) 7. Push to branch (`git push origin feature/amazing-feature`) 8. Open Pull Request ## 📞 Support - **Issues**: GitHub Issues - **Discussions**: GitHub Discussions - **Documentation**: `/docs` or `make docs-serve` --- ## 🎯 Quick Commands ```bash # Development setup make dev-setup # Daily development make check-all # Lint + test + coverage make format # Auto-format code make test # Run tests make lint # Check code quality # Before commit make pre-commit-run # Run all pre-commit checks # Build & deploy make build # Build package make publish # Publish to PyPI ```