@pinkpixel/prysm-mcp

MCP server for the Prysm web scraper - enabling AI assistants to scrape web content

# 📋 Prysm MCP Server Implementation Plan

## Project Overview

This project aims to develop an MCP (Model Context Protocol) server that exposes the capabilities of the Prysm web scraper to AI assistants like Claude, Cursor, and other MCP-compatible platforms. The server will provide tools for the various scraping modes (focused, balanced, deep) while allowing customization of parameters and enabling features like multi-page scraping and image downloading.

## 1. Project Setup

- [ ] Create a new npm package: `prysm-mcp`
- [ ] Set up basic project structure
  - [ ] `/src` - Source code
  - [ ] `/dist` - Compiled code
  - [ ] `/types` - TypeScript type definitions
  - [ ] `/docs` - Documentation
  - [ ] `/examples` - Example usage and configuration

## 2. Dependencies

- [ ] Install required dependencies:
  - Core dependencies:
    - [ ] `@pinkpixel/prysm-scraper` - The web scraper package
    - [ ] `@modelcontextprotocol/sdk` - The MCP server SDK
  - Development dependencies:
    - [ ] `typescript`
    - [ ] `tsup` - For bundling
    - [ ] `jest` - For testing

## 3. MCP Tools Implementation
### 3.1 Core Scraping Tools

- [ ] Implement `scrapeFocused` tool
  - Parameters:
    - `url` (required) - URL to scrape
    - `maxScrolls` (optional) - Maximum scroll attempts
    - `scrollDelay` (optional) - Delay between scrolls in ms
    - `pages` (optional) - Number of pages to scrape
    - `scrapeImages` (optional) - Enable image scraping
    - `downloadImages` (optional) - Download images locally
    - `maxImages` (optional) - Maximum images to extract
    - `minImageSize` (optional) - Minimum width/height for images
- [ ] Implement `scrapeBalanced` tool
  - Same parameters as `scrapeFocused` but with standard/balanced mode
- [ ] Implement `scrapeDeep` tool
  - Same parameters as `scrapeFocused` but with deep mode

### 3.2 Specialized Tools

- [ ] Implement `scrapeArticle` tool - Optimized for articles and blog posts
- [ ] Implement `scrapeProduct` tool - Optimized for product pages
- [ ] Implement `scrapeListing` tool - Optimized for product listings and search results

### 3.3 Utility Tools

- [ ] Implement `analyzeUrl` tool - Analyzes site structure without scraping (returns page structure information)
- [ ] Implement `formatResult` tool - Structures scraped data into an organized format (Markdown, HTML, etc.)
  - Parameters:
    - `data` - Scraped data
    - `format` - Output format (markdown, html, json)
    - `includeImages` - Whether to include images in output

## 4. MCP Resources Implementation

- [ ] Implement resource handlers for cached results - Allows referencing previously scraped content

## 5. JSON Configuration
- [ ] Create MCP server configuration file

```json
{
  "name": "prysm-mcp",
  "displayName": "Prysm Web Scraper",
  "description": "Intelligent web scraping tools with three efficiency modes: focused, balanced, and deep",
  "version": "1.0.0",
  "tools": [
    {
      "name": "scrapeFocused",
      "description": "Fast web scraping optimized for speed (fewer scrolls, main content only)",
      "parameters": {...}
    },
    {
      "name": "scrapeBalanced",
      "description": "Balanced web scraping approach with good coverage and reasonable speed",
      "parameters": {...}
    },
    {
      "name": "scrapeDeep",
      "description": "Maximum extraction web scraping (slower but thorough)",
      "parameters": {...}
    },
    ...
  ]
}
```

## 6. Documentation and Example Setup

- [ ] Create comprehensive README.md
- [ ] Create installation guide for various environments
- [ ] Write example usage documentation for popular MCP-enabled apps:
  - [ ] Cursor
  - [ ] Claude.ai
  - [ ] Other MCP-supporting applications

## 7. Transports

- [ ] Implement stdio transport (for CLI integration)
- [ ] Implement SSE transport (for web integration)

## 8. Testing

- [ ] Write unit tests for MCP tools
- [ ] Create integration tests with sample web pages
- [ ] Test in Cursor and other MCP-enabled applications

## 9. Packaging & Publishing
- [ ] Set up npm package configuration
- [ ] Create build scripts
- [ ] Publish to npm registry

## Technical Architecture

```
┌────────────────────┐
│ MCP Client (Cursor)│
└──────────┬─────────┘
           │ MCP Protocol
┌──────────▼─────────┐
│  Prysm MCP Server  │
├────────────────────┤
│ ┌────────────────┐ │
│ │  Tool Handler  │ │
│ └───────┬────────┘ │
│         │          │
│ ┌───────▼────────┐ │
│ │ Prysm Scraper  │ │
│ └────────────────┘ │
└────────────────────┘
```

## Implementation Approach

1. Start with a minimal implementation of one tool (e.g., `scrapeBalanced`)
2. Test with a simple web page and verify results
3. Extend with additional tools and features
4. Add error handling and edge case management
5. Document and publish

## User Documentation Example

### Using Prysm MCP with Cursor

In Cursor, you can add the Prysm MCP server by:

1. Going to Settings > Features > MCP
2. Clicking "+ Add New MCP Server"
3. Selecting "stdio" as the transport
4. Setting the command to: `npx -y @pinkpixel/prysm-mcp-server`
5. Clicking "Add Server"
Then, in the Composer, you can ask the agent to:

- "Scrape https://example.com using the balanced mode and download images"
- "Analyze the structure of https://example.com"
- "Format the scraped results as markdown with images included"

### Using Prysm MCP with Claude.ai

Add the Prysm MCP tool in Claude's settings by configuring the tool with:

- Name: "Prysm Web Scraper"
- Transport: "stdio"
- Command: `npx -y @pinkpixel/prysm-mcp-server`

## Cursor `.cursor/mcp.json` Example

```json
{
  "mcpServers": {
    "prysm-scraper": {
      "command": "npx",
      "args": ["-y", "@pinkpixel/prysm-mcp-server"]
    }
  }
}
```
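Since the three core tools in section 3.1 share one parameter list, they could also share one options type with per-mode defaults. The sketch below mirrors the parameters listed above; the `applyDefaults` helper and every default value are illustrative placeholders, not Prysm's actual defaults:

```typescript
// Shared options shape for scrapeFocused / scrapeBalanced / scrapeDeep.
interface ScrapeOptions {
  url: string;              // required: page to scrape
  maxScrolls?: number;      // maximum scroll attempts
  scrollDelay?: number;     // delay between scrolls in ms
  pages?: number;           // number of pages to scrape
  scrapeImages?: boolean;   // enable image scraping
  downloadImages?: boolean; // download images locally
  maxImages?: number;       // maximum images to extract
  minImageSize?: number;    // minimum width/height for images
}

// Fill in per-mode defaults so all three tool handlers can share one
// code path. The numbers here are placeholders, not Prysm's defaults.
function applyDefaults(
  opts: ScrapeOptions,
  mode: "focused" | "balanced" | "deep"
): Required<ScrapeOptions> {
  const scrollsByMode = { focused: 3, balanced: 10, deep: 50 };
  return {
    url: opts.url,
    maxScrolls: opts.maxScrolls ?? scrollsByMode[mode],
    scrollDelay: opts.scrollDelay ?? 1000,
    pages: opts.pages ?? 1,
    scrapeImages: opts.scrapeImages ?? false,
    downloadImages: opts.downloadImages ?? false,
    maxImages: opts.maxImages ?? 20,
    minImageSize: opts.minImageSize ?? 100,
  };
}
```

This keeps each `scrape*` tool a thin wrapper that sets the mode and forwards the rest to the scraper.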
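The `formatResult` utility tool from section 3.3 could be sketched as a pure function over the scraped data. The `ScrapedData` shape below is a hypothetical stand-in for Prysm's real output format, and only the Markdown and JSON cases are shown (HTML would be analogous):

```typescript
// Hypothetical shapes for scraped data; Prysm's real output may differ.
interface ScrapedImage { src: string; alt?: string; }
interface ScrapedData {
  title: string;
  url: string;
  content: string[];       // extracted text blocks
  images?: ScrapedImage[];
}

// Sketch of formatResult: turn scraped data into the requested format.
function formatResult(
  data: ScrapedData,
  format: "markdown" | "json",
  includeImages: boolean
): string {
  if (format === "json") return JSON.stringify(data, null, 2);
  // Markdown: title heading, source link, text blocks, then images.
  const parts = [`# ${data.title}`, `Source: ${data.url}`, ...data.content];
  if (includeImages && data.images) {
    parts.push(...data.images.map((i) => `![${i.alt ?? ""}](${i.src})`));
  }
  return parts.join("\n\n");
}
```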
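For the cached-results resources in section 4, a minimal in-memory store might look like the following. The `scrape://` URI scheme is an assumption of this sketch, defined neither by MCP nor by Prysm:

```typescript
// In-memory store backing the MCP resource handlers for cached results.
// Keys are URIs like "scrape://<encoded-url>" (illustrative scheme).
class ResultCache {
  private store = new Map<string, { data: unknown; fetchedAt: number }>();

  // Cache a scrape result and return the URI a client can reference later.
  save(url: string, data: unknown): string {
    const uri = `scrape://${encodeURIComponent(url)}`;
    this.store.set(uri, { data, fetchedAt: Date.now() });
    return uri;
  }

  // Look up a previously cached result by its URI.
  get(uri: string): unknown | undefined {
    return this.store.get(uri)?.data;
  }

  // Enumerate cached URIs, e.g. for a resources/list handler.
  list(): string[] {
    return [...this.store.keys()];
  }
}
```

The MCP server's resource handlers would then translate list/read requests into `list()` and `get()` calls on this store.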