paper-search-mcp-nodejs

# AIPaper-assisant - Comprehensive Project Documentation ## 🎯 Project Overview AIPaper-assisant is a sophisticated Node.js Model Context Protocol (MCP) server that provides unified access to 14+ academic paper databases and search platforms. It serves as a centralized interface for searching, discovering, and downloading academic papers from sources including arXiv, Web of Science, PubMed, Google Scholar, Sci-Hub, ScienceDirect, Springer, Wiley, Scopus, Crossref, and more. ### Key Value Propositions - **Unified Interface**: Single API for multiple academic platforms - **Intelligent Fallbacks**: Automatic platform switching for better results - **Enterprise Features**: Rate limiting, error handling, and robust architecture - **MCP Integration**: Native Claude Desktop integration for seamless research workflows ## 🏗️ Architecture Documentation ### System Architecture ``` ┌─────────────────────────────────────────────────────────────┐ │ MCP Protocol Layer │ ├─────────────────────────────────────────────────────────────┤ │ Server Core (server.ts) │ ├─────────────────────────────────────────────────────────────┤ │ ┌──────────────┬──────────────┬──────────────┬──────────┐ │ │ │ Crossref │ arXiv │ Web of Sci │ PubMed │ │ │ │ Searcher │ Searcher │ Searcher │ Searcher │ │ │ └──────────────┴──────────────┴──────────────┴──────────┘ │ │ ┌──────────────┬──────────────┬──────────────┬──────────┐ │ │ │Google Scholar│ Sci-Hub │ScienceDirect │ Springer │ │ │ │ Searcher │ Searcher │ Searcher │ Searcher │ │ │ └──────────────┴──────────────┴──────────────┴──────────┘ │ ├─────────────────────────────────────────────────────────────┤ │ Abstract Platform Framework │ │ (PaperSource.ts) │ ├─────────────────────────────────────────────────────────────┤ │ Unified Data Model (Paper.ts) │ ├─────────────────────────────────────────────────────────────┤ │ Utility Layer (RateLimiter) │ └─────────────────────────────────────────────────────────────┘ ``` ### Core Components #### 1. MCP Server Core (`src/server.ts`) - **Purpose**: Main entry point implementing Model Context Protocol - **Responsibilities**: - Tool registration and lifecycle management - Request routing and parameter validation - Error handling and response formatting - MCP protocol compliance #### 2. Abstract Platform Framework (`src/platforms/PaperSource.ts`) - **Purpose**: Base class defining contract for all search platforms - **Key Features**: - Standardized search interface - Common HTTP client configuration - Error handling patterns - Capability discovery system - Rate limiting integration #### 3. Unified Data Model (`src/models/Paper.ts`) - **Purpose**: Consistent data structure across all platforms - **Components**: - `Paper` interface: Standard paper metadata - `PaperFactory`: Validation and transformation - Type-safe schemas using Zod #### 4. Rate Limiting System (`src/utils/RateLimiter.ts`) - **Algorithm**: Token bucket implementation - **Features**: - Per-platform rate configuration - Burst capacity management - Queue-based request handling - Automatic retry mechanisms ## 📋 Platform Integration Guide ### Platform Capabilities Matrix | Platform | Search | Download | Full Text | Citations | API Key | Special Features | |----------|--------|----------|-----------|-----------|---------|------------------| | Crossref | ✅ | ❌ | ❌ | ✅ | ❌ | Default fallback, extensive metadata | | arXiv | ✅ | ✅ | ✅ | ❌ | ❌ | Category filtering, PDF support | | Web of Science | ✅ | ❌ | ❌ | ✅ | ✅ | Multi-topic search, date sorting | | PubMed | ✅ | ❌ | ❌ | ❌ | 🟡 | Biomedical focus, MeSH terms | | Google Scholar | ✅ | ❌ | ❌ | ✅ | ❌ | Comprehensive coverage | | Sci-Hub | ✅ | ✅ | ✅ | ❌ | ❌ | Mirror management, DOI-based | | ScienceDirect | ✅ | ❌ | ❌ | ✅ | ✅ | Elsevier content | | Springer | ✅ | ✅* | ❌ | ❌ | ✅ | Dual API (Meta + OpenAccess) | | Scopus | ✅ | ❌ | ❌ | ✅ | ✅ | Largest citation database | | Wiley | ❌ | ✅ | ✅ | ❌ | ✅ | TDM API, DOI required | ### Integration Patterns #### 1. Basic Platform Implementation ```typescript export class PlatformSearcher extends PaperSource { constructor() { super('platform-name', { rateLimit: { tokens: 10, refillRate: 1 }, capabilities: { search: true, download: false, citations: true } }); } async search(query: string, options?: SearchOptions): Promise<Paper[]> { // Platform-specific implementation } } ``` #### 2. Error Handling Pattern ```typescript try { const response = await this.client.get(url); return this.transformResults(response.data); } catch (error) { if (error.response?.status === 429) { throw new RateLimitError('Platform rate limit exceeded'); } throw new PlatformError(`Search failed: ${error.message}`); } ``` #### 3. Rate Limiting Integration ```typescript // Automatic rate limiting via base class await this.rateLimiter.acquire(); const results = await this.performSearch(query); return results; ``` ## 🔧 API Documentation ### Core Tools #### `search_papers` Searches for academic papers across multiple platforms. **Parameters:** - `query` (string): Search query terms - `platform` (string, optional): Specific platform to search - `maxResults` (number, optional): Maximum results per platform (default: 10) - `yearRange` (string, optional): Year range filter (e.g., "2020-2024") - `sortBy` (string, optional): Sort criteria (relevance/date) **Returns:** ```json { "papers": [ { "title": "Paper Title", "authors": ["Author 1", "Author 2"], "abstract": "Abstract text...", "doi": "10.1000/example", "year": 2024, "source": "arxiv", "pdfUrl": "https://...", "citations": 42 } ], "metadata": { "platform": "arxiv", "totalResults": 150, "searchTime": 1.23 } } ``` #### `download_paper` Downloads a paper PDF from supported platforms. **Parameters:** - `doi` (string): DOI identifier - `title` (string, optional): Paper title for better matching - `platform` (string, optional): Preferred download platform **Returns:** ```json { "success": true, "pdfUrl": "https://download.example.com/paper.pdf", "platform": "arxiv", "fileSize": 1024000 } ``` #### `get_platform_status` Checks the health and capabilities of all platforms. **Returns:** ```json { "platforms": [ { "name": "arxiv", "status": "online", "capabilities": { "search": true, "download": true, "citations": false }, "rateLimit": { "tokens": 15, "refillRate": 1 } } ] } ``` ### Platform-Specific Features #### Web of Science Integration - Multi-topic search with boolean operators - Advanced date filtering and sorting - Citation network analysis - Research area categorization #### arXiv Integration - Category-specific filtering (cs.AI, physics.gen-ph, etc.) - Date-based sorting and filtering - Direct PDF download support - Preprint version tracking #### Sci-Hub Integration - Mirror health monitoring - Automatic mirror failover - DOI-based paper location - Legal compliance considerations ## 🚀 Development Workflow ### Setup Instructions 1. **Clone and Install** ```bash git clone <repository> cd paper-search-nodejs npm install ``` 2. **Configuration** - Copy `.env.example` to `.env` - Add API keys for platforms requiring authentication - Configure rate limits as needed 3. **Development Mode** ```bash npm run dev # Runs with tsx for hot reload ``` 4. **Build for Production** ```bash npm run build # Compiles TypeScript to dist/ npm start # Runs compiled JavaScript ``` ### Testing Strategy #### Test Categories 1. **Unit Tests**: Individual platform functionality 2. **Integration Tests**: Cross-platform compatibility 3. **Error Handling**: Invalid inputs and edge cases 4. **Rate Limiting**: Throttling and retry behavior 5. **MCP Protocol**: Tool registration and communication #### Running Tests ```bash npm test # Run all tests npm run test:watch # Watch mode for development npm run test:coverage # Generate coverage report ``` ### Code Quality Standards #### TypeScript Configuration - Strict mode enabled - ES2022 target with ESNext modules - Source maps for debugging - Explicit return types required #### Linting and Formatting ```bash npm run lint # ESLint with TypeScript support npm run format # Prettier formatting npm run type-check # TypeScript compiler checks ``` ## 🔐 Security Considerations ### API Key Management - Store keys in environment variables - Never commit keys to version control - Validate keys on platform initialization - Rotate keys regularly ### Rate Limiting - Respect platform rate limits - Implement exponential backoff - Queue requests during high load - Monitor for abuse patterns ### Error Sanitization - Remove sensitive data from errors - Log sanitized error messages only - Never expose internal implementation details - Use generic error messages for users ## 📊 Performance Optimization ### Concurrent Operations - Parallel search execution where possible - Non-blocking I/O for all network requests - Efficient memory usage with streaming ### Caching Strategy - Platform capability caching - Mirror health status (Sci-Hub) - Request deduplication - Configurable cache TTL ### Network Efficiency - Connection pooling with keep-alive - Request timeout configuration - Retry with exponential backoff - Compression support (gzip) ## 🎯 Best Practices ### Adding New Platforms 1. Extend `PaperSource` base class 2. Implement required methods (`search`, `download`) 3. Define platform capabilities 4. Add rate limiting configuration 5. Create comprehensive tests 6. Update documentation ### Error Handling Guidelines ```typescript // Platform-specific errors class PlatformError extends Error { constructor(message: string, public platform: string) { super(message); this.name = 'PlatformError'; } } // Rate limit errors class RateLimitError extends Error { constructor(message: string, public retryAfter?: number) { super(message); this.name = 'RateLimitError'; } } ``` ### Logging Standards ```typescript // Use structured logging logger.info('Platform search completed', { platform: 'arxiv', query: 'machine learning', results: 10, duration: 1234 }); // Log errors with context logger.error('Search failed', { platform: 'wos', error: error.message, query: options.query, retryCount }); ``` ## 📚 Additional Resources ### Related Documentation - [MCP Protocol Specification](https://modelcontextprotocol.io/) - [Platform API Documentation](./PLATFORM_APIS.md) - [Contributing Guidelines](./CONTRIBUTING.md) - [Changelog](./CHANGELOG.md) ### External References - [Crossref API](https://api.crossref.org/) - [arXiv API](https://arxiv.org/help/api/) - [Web of Science API](https://developer.clarivate.com/) - [PubMed API](https://www.ncbi.nlm.nih.gov/home/develop/) --- *This documentation is automatically maintained. For updates, please submit pull requests with detailed descriptions of changes.*