UNPKG

paper-search-mcp-nodejs

Version:

A Node.js MCP server for searching and downloading academic papers from multiple sources, including arXiv, PubMed, bioRxiv, Web of Science, and more.

501 lines (397 loc) β€’ 14 kB
# Paper Search MCP (Node.js) ## English|[δΈ­ζ–‡](README-sc.md) A Node.js Model Context Protocol (MCP) server for searching and downloading academic papers from multiple sources, including arXiv, Web of Science, PubMed, Google Scholar, Sci-Hub, ScienceDirect, Springer, Wiley, Scopus, and **13 academic platforms** in total. ![Node.js](https://img.shields.io/badge/node.js->=18.0.0-green.svg) ![TypeScript](https://img.shields.io/badge/typescript-^5.5.3-blue.svg) ![License](https://img.shields.io/badge/license-MIT-blue.svg) ![Platforms](https://img.shields.io/badge/platforms-13-brightgreen.svg) ## ✨ Key Features - **🌍 13 Academic Platforms**: arXiv, Web of Science, PubMed, Google Scholar, bioRxiv, medRxiv, Semantic Scholar, IACR ePrint, Sci-Hub, ScienceDirect, Springer Nature, Wiley, Scopus - **πŸ”— MCP Protocol Integration**: Seamless integration with Claude Desktop and other AI assistants - **πŸ“Š Unified Data Model**: Standardized paper format across all platforms - **⚑ High-Performance Search**: Concurrent search with intelligent rate limiting - **πŸ›‘οΈ Type Safety**: Complete TypeScript support - **🎯 Academic Papers First**: Smart filtering prioritizing academic papers over books - **πŸ”„ Smart Error Handling**: Platform fallback and auto-retry mechanisms ## πŸ“š Supported Platforms | Platform | Search | Download | Full Text | Citations | API Key | Special Features | |----------|--------|----------|-----------|-----------|---------|------------------| | **arXiv** | βœ… | βœ… | βœ… | ❌ | ❌ | Physics/CS preprints | | **Web of Science** | βœ… | ❌ | ❌ | βœ… | βœ… Required | High-quality journal index | | **PubMed** | βœ… | ❌ | ❌ | ❌ | 🟑 Optional | Biomedical literature | | **Google Scholar** | βœ… | ❌ | ❌ | βœ… | ❌ | Comprehensive academic search | | **bioRxiv** | βœ… | βœ… | βœ… | ❌ | ❌ | Biology preprints | | **medRxiv** | βœ… | βœ… | βœ… | ❌ | ❌ | Medical preprints | | **Semantic Scholar** | βœ… | βœ… | ❌ | βœ… | 🟑 Optional | AI semantic search | | **IACR ePrint** | βœ… | βœ… | βœ… | ❌ | ❌ | Cryptography papers | | **Sci-Hub** | βœ… | βœ… | ❌ | ❌ | ❌ | Universal paper access via DOI | | **ScienceDirect** | βœ… | ❌ | ❌ | βœ… | βœ… Required | Elsevier's full-text database | | **Springer Nature** | βœ… | βœ…* | ❌ | ❌ | βœ… Required | Dual API: Meta v2 & OpenAccess | | **Wiley** | βœ… | βœ… | ❌ | ❌ | βœ… Required | Text and Data Mining API | | **Scopus** | βœ… | ❌ | ❌ | βœ… | βœ… Required | Largest citation database | βœ… Supported | ❌ Not supported | 🟑 Optional | βœ…* Open Access only ## πŸš€ Quick Start ### System Requirements - Node.js >= 18.0.0 - npm or yarn ### Installation ```bash # Clone repository git clone https://github.com/your-username/paper-search-mcp-nodejs.git cd paper-search-mcp-nodejs # Install dependencies npm install # Copy environment template cp .env.example .env ``` ### Configuration 1. **Get Web of Science API Key** - Visit [Clarivate Developer Portal](https://developer.clarivate.com/apis) - Register and apply for Web of Science API access - Add API key to `.env` file 2. **Get PubMed API Key (Optional)** - Without API key: Free usage, 3 requests/second limit - With API key: 10 requests/second, more stable service - Get key: See [NCBI API Keys](https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/) 3. **Configure Environment Variables** ```bash # Edit .env file WOS_API_KEY=your_actual_api_key_here WOS_API_VERSION=v1 # PubMed API key (optional, recommended for better performance) PUBMED_API_KEY=your_ncbi_api_key_here # Semantic Scholar API key (optional, increases rate limits) SEMANTIC_SCHOLAR_API_KEY=your_semantic_scholar_api_key # Elsevier API key (required for ScienceDirect and Scopus) ELSEVIER_API_KEY=your_elsevier_api_key # Springer Nature API keys (required for Springer) SPRINGER_API_KEY=your_springer_api_key # For Metadata API v2 # Optional: Separate key for OpenAccess API (if different from main key) SPRINGER_OPENACCESS_API_KEY=your_openaccess_api_key # Wiley TDM token (required for Wiley) WILEY_TDM_TOKEN=your_wiley_tdm_token ``` ### Build and Run #### Method 1: NPX (Recommended for MCP) ```bash # Direct run with npx (most common MCP deployment) npx -y paper-search-mcp-nodejs # Or install globally npm install -g paper-search-mcp-nodejs paper-search-mcp ``` #### Method 2: Local Development ```bash # Build TypeScript code npm run build # Start server npm start # Or run in development mode npm run dev ``` ### MCP Server Configuration Add the following configuration to your Claude Desktop config file: **macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json` **Windows**: `%APPDATA%\Claude\claude_desktop_config.json` #### NPX Configuration (Recommended) ```json { "mcpServers": { "paper-search-nodejs": { "command": "npx", "args": ["-y", "paper-search-mcp-nodejs"], "env": { "WOS_API_KEY": "your_web_of_science_api_key" } } } } ``` #### Local Installation Configuration ```json { "mcpServers": { "paper_search_nodejs": { "command": "node", "args": ["/path/to/paper-search-mcp-nodejs/dist/server.js"], "env": { "WOS_API_KEY": "your_web_of_science_api_key" } } } } ``` ## πŸ› οΈ MCP Tools ### `search_papers` Search academic papers across multiple platforms ```typescript // Random platform selection (default behavior) search_papers({ query: "machine learning", platform: "all", // Randomly selects one platform for efficiency maxResults: 10, year: "2023", sortBy: "date" }) // Search specific platform search_papers({ query: "quantum computing", platform: "webofscience", // Target specific platform maxResults: 5 }) ``` **Platform Selection Behavior:** - `platform: "all"` - Randomly selects one platform for efficient, focused results - Specific platform - Searches only that platform - Available platforms: `arxiv`, `webofscience`/`wos`, `pubmed`, `biorxiv`, `medrxiv`, `semantic`, `iacr`, `googlescholar`/`scholar`, `scihub`, `sciencedirect`, `springer`, `wiley`, `scopus` ### `search_arxiv` Search arXiv preprints specifically ```typescript search_arxiv({ query: "transformer neural networks", maxResults: 10, category: "cs.AI", author: "Attention" }) ``` ### `search_webofscience` Search Web of Science database specifically ```typescript search_webofscience({ query: "CRISPR gene editing", maxResults: 15, year: "2022", journal: "Nature" }) ``` ### `search_pubmed` Search PubMed/MEDLINE biomedical literature database ```typescript search_pubmed({ query: "COVID-19 vaccine efficacy", maxResults: 20, year: "2023", author: "Smith", journal: "New England Journal of Medicine", publicationType: ["Journal Article", "Clinical Trial"] }) ``` ### `search_google_scholar` Search Google Scholar academic database ```typescript search_google_scholar({ query: "machine learning", maxResults: 10, yearLow: 2020, yearHigh: 2023, author: "Bengio" }) ``` ### `search_biorxiv` / `search_medrxiv` Search biology and medical preprints ```typescript search_biorxiv({ query: "CRISPR", maxResults: 15, days: 30 }) ``` ### `search_semantic_scholar` Search Semantic Scholar AI semantic database ```typescript search_semantic_scholar({ query: "deep learning", maxResults: 10, fieldsOfStudy: ["Computer Science"], year: "2023" }) ``` ### `search_iacr` Search IACR ePrint cryptography archive ```typescript search_iacr({ query: "zero knowledge proof", maxResults: 5, fetchDetails: true }) ``` ### `search_scihub` Search and download papers from Sci-Hub using DOI or paper URL ```typescript search_scihub({ doiOrUrl: "10.1038/nature12373", downloadPdf: true, savePath: "./downloads" }) ``` ### `check_scihub_mirrors` Check health status of Sci-Hub mirror sites ```typescript check_scihub_mirrors({ forceCheck: true // Force fresh health check }) ``` ### `download_paper` Download paper PDF files ```typescript download_paper({ paperId: "2106.12345", // or DOI for Sci-Hub platform: "arxiv", // or "scihub" for Sci-Hub downloads savePath: "./downloads" }) ``` ### `get_paper_by_doi` Get paper information by DOI ```typescript get_paper_by_doi({ doi: "10.1038/s41586-023-12345-6", platform: "all" }) ``` ### `get_platform_status` Check platform status and API keys ```typescript get_platform_status({}) ``` ## πŸ“Š Data Model All platform paper data is converted to a unified format: ```typescript interface Paper { paperId: string; // Unique identifier title: string; // Paper title authors: string[]; // Author list abstract: string; // Abstract doi: string; // DOI publishedDate: Date; // Publication date pdfUrl: string; // PDF link url: string; // Paper page URL source: string; // Source platform citationCount?: number; // Citation count journal?: string; // Journal name year?: number; // Publication year categories?: string[]; // Subject categories keywords?: string[]; // Keywords // ... more fields } ``` ## πŸ”§ Development ### Project Structure ``` src/ β”œβ”€β”€ models/ β”‚ └── Paper.ts # Paper data model β”œβ”€β”€ platforms/ β”‚ β”œβ”€β”€ PaperSource.ts # Abstract base class β”‚ β”œβ”€β”€ ArxivSearcher.ts # arXiv searcher β”‚ β”œβ”€β”€ WebOfScienceSearcher.ts # Web of Science searcher β”‚ β”œβ”€β”€ PubMedSearcher.ts # PubMed searcher β”‚ β”œβ”€β”€ GoogleScholarSearcher.ts # Google Scholar searcher β”‚ β”œβ”€β”€ BioRxivSearcher.ts # bioRxiv/medRxiv searcher | β”œβ”€β”€ SemanticScholarSearcher.ts # Semantic Scholar searcher | β”œβ”€β”€ IACRSearcher.ts # IACR ePrint searcher | β”œβ”€β”€ SciHubSearcher.ts # Sci-Hub searcher with mirror management | β”œβ”€β”€ ScienceDirectSearcher.ts # ScienceDirect (Elsevier) searcher β”‚ β”œβ”€β”€ SpringerSearcher.ts # Springer Nature searcher (Meta v2 & OpenAccess APIs) | β”œβ”€β”€ WileySearcher.ts # Wiley TDM API searcher | └── ScopusSearcher.ts # Scopus citation database searcher β”œβ”€β”€ utils/ β”‚ └── RateLimiter.ts # Token bucket rate limiter └── server.ts # MCP server main file ``` ### Adding New Platforms 1. Create new searcher class extending `PaperSource` 2. Implement required abstract methods 3. Register new searcher in `server.ts` 4. Add corresponding MCP tool ### Testing ```bash # Run tests npm test # Run linting npm run lint # Code formatting npm run format ``` ## 🌟 Platform-Specific Features ### Springer Nature Dual API System Springer Nature provides two APIs: 1. **Metadata API v2** (Main API) - Endpoint: `https://api.springernature.com/meta/v2/json` - Searches all Springer content (subscription + open access) - Requires API key from https://dev.springernature.com/ 2. **OpenAccess API** (Optional) - Endpoint: `https://api.springernature.com/openaccess/json` - Only searches open access content - May require separate API key or special permissions - Better for finding downloadable PDFs ```typescript // Search all Springer content search_springer({ query: "machine learning", maxResults: 10 }) // Search only open access papers search_springer({ query: "COVID-19", openAccess: true, // Uses OpenAccess API if available maxResults: 5 }) ``` ### Web of Science Advanced Search ```typescript // Use Web of Science query syntax search_webofscience({ query: 'TS="machine learning" AND PY=2023', maxResults: 20 }) // Author search search_webofscience({ query: 'AU="Smith, J*"', maxResults: 10 }) // Journal search search_webofscience({ query: 'SO="Nature" AND PY=2022-2023', maxResults: 15 }) ``` **Supported Fields:** - `TS`: Topic search - `AU`: Author - `SO`: Source journal - `PY`: Publication year - `DO`: DOI - `TI`: Title ### Google Scholar Features - **Academic Paper Priority**: Automatically filters out books, prioritizes peer-reviewed papers - **Citation Data**: Provides citation counts and academic metrics - **Anti-Detection**: Smart request patterns to avoid blocking - **Comprehensive Coverage**: Searches across all academic publishers ### Semantic Scholar Features - **AI-Powered Search**: Semantic understanding of queries - **Citation Networks**: Paper relationships and influence metrics - **Open Access PDFs**: Direct links to freely available papers - **Research Fields**: Filter by specific academic disciplines ### Sci-Hub Features - **Universal Access**: Access papers using DOI or direct URLs - **Mirror Network**: Automatic detection and use of fastest available mirror (11+ mirrors) - **Health Monitoring**: Continuous monitoring of mirror site availability - **Automatic Failover**: Seamless switching between mirrors when one fails - **Smart Retry**: Automatic retry with different mirrors on failure - **Response Time Optimization**: Mirrors sorted by response time for best performance ## πŸ“ License MIT License - see [LICENSE](LICENSE) file for details. ## 🀝 Contributing Contributions welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines. 1. Fork the project 2. Create feature branch (`git checkout -b feature/amazing-feature`) 3. Commit changes (`git commit -m 'Add amazing feature'`) 4. Push to branch (`git push origin feature/amazing-feature`) 5. Open Pull Request ## πŸ› Issue Reporting If you encounter issues, please report them at [GitHub Issues](https://github.com/your-username/paper-search-mcp-nodejs/issues). ## πŸ™ Acknowledgments - Original [paper-search-mcp](https://github.com/openags/paper-search-mcp) for the foundation - MCP community for the protocol standards --- ⭐ If this project helps you, please give it a star!