paper-search-mcp-nodejs
A Node.js MCP server for searching and downloading academic papers from multiple sources, including arXiv, PubMed, bioRxiv, Web of Science, and more.
# AIPaper-assisant - Comprehensive Project Documentation
## Project Overview
AIPaper-assisant is a Node.js Model Context Protocol (MCP) server that provides unified access to 14+ academic paper databases and search platforms. It offers a single interface for searching, discovering, and downloading academic papers from sources including arXiv, Web of Science, PubMed, Google Scholar, Sci-Hub, ScienceDirect, Springer, Wiley, Scopus, Crossref, and more.
### Key Value Propositions
- **Unified Interface**: Single API for multiple academic platforms
- **Intelligent Fallbacks**: Automatic platform switching for better results
- **Enterprise Features**: Rate limiting, error handling, and robust architecture
- **MCP Integration**: Native Claude Desktop integration for seamless research workflows
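
The fallback behavior listed above could be sketched roughly as follows. This is a minimal illustration, not the server's actual logic: `SearchFn`, `FallbackResult`, and `searchWithFallback` are hypothetical names, and results are reduced to plain strings for brevity.

```typescript
// Hypothetical types for illustration; the real server's signatures may differ.
type SearchFn = (query: string) => Promise<string[]>;

interface FallbackResult {
  platform: string;
  results: string[];
}

// Try each platform in order and return the first non-empty result set.
// A platform that throws is skipped rather than failing the whole search.
async function searchWithFallback(
  query: string,
  platforms: Array<[string, SearchFn]>
): Promise<FallbackResult | undefined> {
  for (const [platform, search] of platforms) {
    try {
      const results = await search(query);
      if (results.length > 0) {
        return { platform, results };
      }
    } catch {
      // Ignore the failure and fall through to the next platform.
    }
  }
  return undefined;
}
```

Ordering the list so that the preferred platform comes first and a broad-coverage source comes last gives the "automatic switching" effect without any extra state.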
## Architecture Documentation
### System Architecture
```
┌─────────────────────────────────────────────────────────────┐
│                     MCP Protocol Layer                      │
├─────────────────────────────────────────────────────────────┤
│                   Server Core (server.ts)                   │
├─────────────────────────────────────────────────────────────┤
│  ┌──────────────┬──────────────┬──────────────┬──────────┐  │
│  │   Crossref   │    arXiv     │  Web of Sci  │  PubMed  │  │
│  │   Searcher   │   Searcher   │   Searcher   │ Searcher │  │
│  └──────────────┴──────────────┴──────────────┴──────────┘  │
│  ┌──────────────┬──────────────┬──────────────┬──────────┐  │
│  │Google Scholar│   Sci-Hub    │ScienceDirect │ Springer │  │
│  │   Searcher   │   Searcher   │   Searcher   │ Searcher │  │
│  └──────────────┴──────────────┴──────────────┴──────────┘  │
├─────────────────────────────────────────────────────────────┤
│                 Abstract Platform Framework                 │
│                      (PaperSource.ts)                       │
├─────────────────────────────────────────────────────────────┤
│                Unified Data Model (Paper.ts)                │
├─────────────────────────────────────────────────────────────┤
│                 Utility Layer (RateLimiter)                 │
└─────────────────────────────────────────────────────────────┘
```
### Core Components
#### 1. MCP Server Core (`src/server.ts`)
- **Purpose**: Main entry point implementing the Model Context Protocol
- **Responsibilities**:
  - Tool registration and lifecycle management
  - Request routing and parameter validation
  - Error handling and response formatting
  - MCP protocol compliance
#### 2. Abstract Platform Framework (`src/platforms/PaperSource.ts`)
- **Purpose**: Base class defining the contract for all search platforms
- **Key Features**:
  - Standardized search interface
  - Common HTTP client configuration
  - Error handling patterns
  - Capability discovery system
  - Rate limiting integration
#### 3. Unified Data Model (`src/models/Paper.ts`)
- **Purpose**: Consistent data structure across all platforms
- **Components**:
  - `Paper` interface: Standard paper metadata
  - `PaperFactory`: Validation and transformation
  - Type-safe schemas using Zod
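
To make the validation-and-transformation role concrete, here is a dependency-free sketch. It deliberately swaps the Zod schemas mentioned above for hand-written checks so the snippet runs on its own; `PaperRecord` and `createPaper` are hypothetical names, and the field list is an assumed subset of the real `Paper` interface.

```typescript
// Assumed subset of the paper fields; the project's `Paper` interface may differ.
interface PaperRecord {
  title: string;
  authors: string[];
  doi?: string;
  year?: number;
  source: string;
}

// Hypothetical factory: normalizes a raw platform record into a PaperRecord
// and throws when the one required field (title) is missing.
function createPaper(raw: Record<string, unknown>, source: string): PaperRecord {
  const rawTitle = raw['title'];
  const rawAuthors = raw['authors'];
  const rawDoi = raw['doi'];
  const rawYear = raw['year'];

  const title = typeof rawTitle === 'string' ? rawTitle.trim() : '';
  if (title === '') {
    throw new Error(`Record from ${source} has no title`);
  }
  // Keep only string entries so downstream code can rely on string[].
  const authors: string[] = Array.isArray(rawAuthors)
    ? rawAuthors.filter((a): a is string => typeof a === 'string')
    : [];
  // DOIs are case-insensitive, so lowercasing is a safe normalization choice.
  const doi = typeof rawDoi === 'string' ? rawDoi.toLowerCase() : undefined;
  const year =
    typeof rawYear === 'number' && Number.isInteger(rawYear) ? rawYear : undefined;

  return { title, authors, doi, year, source };
}
```

Routing every platform's raw response through one function like this is what lets the rest of the server treat results from different sources uniformly.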
#### 4. Rate Limiting System (`src/utils/RateLimiter.ts`)
- **Algorithm**: Token bucket implementation
- **Features**:
  - Per-platform rate configuration
  - Burst capacity management
  - Queue-based request handling
  - Automatic retry mechanisms
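
For readers unfamiliar with the token bucket algorithm, a minimal sketch follows. It is not the project's `RateLimiter`: the class name `TokenBucket`, the non-blocking `tryAcquire` method, and the injectable clock are assumptions made for illustration, and the queueing and retry features listed above are left out.

```typescript
// Minimal token bucket. `capacity` bounds the burst size and `refillPerSecond`
// sets the sustained rate; how these map to the project's `tokens` and
// `refillRate` options is an assumption.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private readonly capacity: number,
    private readonly refillPerSecond: number,
    private readonly now: () => number = Date.now
  ) {
    this.tokens = capacity;
    this.lastRefill = now();
  }

  // Add tokens for the time elapsed since the last refill, capped at capacity.
  private refill(): void {
    const current = this.now();
    const elapsedSeconds = (current - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond
    );
    this.lastRefill = current;
  }

  // Take one token if available; returns false when the bucket is empty.
  tryAcquire(): boolean {
    this.refill();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

A queue-based limiter can be layered on top by parking callers whose `tryAcquire()` returns `false` and releasing them as tokens refill.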
## Platform Integration Guide
### Platform Capabilities Matrix
| Platform | Search | Download | Full Text | Citations | API Key | Special Features |
|----------|--------|----------|-----------|-----------|---------|------------------|
| Crossref | ✅ | ❌ | ❌ | ✅ | ❌ | Default fallback, extensive metadata |
| arXiv | ✅ | ✅ | ✅ | ❌ | ❌ | Category filtering, PDF support |
| Web of Science | ✅ | ❌ | ❌ | ✅ | ✅ | Multi-topic search, date sorting |
| PubMed | ✅ | ❌ | ❌ | ❌ | 🟡 | Biomedical focus, MeSH terms |
| Google Scholar | ✅ | ❌ | ❌ | ✅ | ❌ | Comprehensive coverage |
| Sci-Hub | ✅ | ✅ | ✅ | ❌ | ❌ | Mirror management, DOI-based |
| ScienceDirect | ✅ | ❌ | ❌ | ✅ | ✅ | Elsevier content |
| Springer | ✅ | ✅* | ❌ | ❌ | ✅ | Dual API (Meta + OpenAccess) |
| Scopus | ✅ | ❌ | ❌ | ✅ | ✅ | Largest citation database |
| Wiley | ❌ | ✅ | ✅ | ❌ | ✅ | TDM API, DOI required |
### Integration Patterns
#### 1. Basic Platform Implementation
```typescript
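// Assumes PaperSource, Paper, and SearchOptions are imported from the project's modules.
export class PlatformSearcher extends PaperSource {
  constructor() {
    super('platform-name', {
      rateLimit: { tokens: 10, refillRate: 1 },
      capabilities: {
        search: true,
        download: false,
        citations: true
      }
    });
  }

  async search(query: string, options?: SearchOptions): Promise<Paper[]> {
    // Platform-specific implementation goes here.
    // Placeholder return so the template compiles.
    return [];
  }
}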
```
#### 2. Error Handling Pattern
```typescript
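try {
  const response = await this.client.get(url);
  return this.transformResults(response.data);
} catch (error: any) {
  // HTTP 429 means the platform is throttling requests
  if (error.response?.status === 429) {
    throw new RateLimitError('Platform rate limit exceeded');
  }
  // PlatformError takes the platform name as its second argument
  throw new PlatformError(`Search failed: ${error.message}`, 'platform-name');
}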
```
#### 3. Rate Limiting Integration
```typescript
// Automatic rate limiting via base class
await this.rateLimiter.acquire();
const results = await this.performSearch(query);
return results;
```
## API Documentation
### Core Tools
#### `search_papers`
Searches for academic papers across multiple platforms.
**Parameters:**
- `query` (string): Search query terms
- `platform` (string, optional): Specific platform to search
- `maxResults` (number, optional): Maximum results per platform (default: 10)
- `yearRange` (string, optional): Year range filter (e.g., "2020-2024")
- `sortBy` (string, optional): Sort criteria (relevance/date)
**Returns:**
```json
{
  "papers": [
    {
      "title": "Paper Title",
      "authors": ["Author 1", "Author 2"],
      "abstract": "Abstract text...",
      "doi": "10.1000/example",
      "year": 2024,
      "source": "arxiv",
      "pdfUrl": "https://...",
      "citations": 42
    }
  ],
  "metadata": {
    "platform": "arxiv",
    "totalResults": 150,
    "searchTime": 1.23
  }
}
```
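
The `yearRange` parameter of `search_papers` is passed as a string such as "2020-2024", so it has to be parsed before it can be applied as a filter. The sketch below shows one way to do that. `parseYearRange` is a hypothetical helper, and accepting a single year such as "2023" is an assumption rather than documented behavior.

```typescript
interface YearRange {
  from: number;
  to: number;
}

// Parse "2020-2024" (or a single year "2023") into numeric bounds.
// Returns undefined for malformed input or a reversed range.
function parseYearRange(input: string): YearRange | undefined {
  const match = /^(\d{4})(?:-(\d{4}))?$/.exec(input.trim());
  if (!match) {
    return undefined;
  }
  const from = Number(match[1]);
  const to = match[2] !== undefined ? Number(match[2]) : from;
  return from <= to ? { from, to } : undefined;
}
```

Returning `undefined` instead of throwing lets the caller decide whether a bad range should be reported as a validation error or silently ignored.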
#### `download_paper`
Downloads a paper PDF from supported platforms.
**Parameters:**
- `doi` (string): DOI identifier
- `title` (string, optional): Paper title for better matching
- `platform` (string, optional): Preferred download platform
**Returns:**
```json
{
  "success": true,
  "pdfUrl": "https://download.example.com/paper.pdf",
  "platform": "arxiv",
  "fileSize": 1024000
}
```
#### `get_platform_status`
Checks the health and capabilities of all platforms.
**Returns:**
```json
{
  "platforms": [
    {
      "name": "arxiv",
      "status": "online",
      "capabilities": {
        "search": true,
        "download": true,
        "citations": false
      },
      "rateLimit": {
        "tokens": 15,
        "refillRate": 1
      }
    }
  ]
}
```
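
A client can use the `get_platform_status` response above to decide where to send a request. The sketch below filters for platforms that are online and support a given capability. The type and function names are hypothetical, and only the `"online"` status value appears in this document; any other status is treated as unavailable.

```typescript
// Shape mirrors the example response; only the fields used here are declared.
interface PlatformStatusEntry {
  name: string;
  status: string;
  capabilities: { search: boolean; download: boolean; citations: boolean };
}

// Names of platforms that are online and support the requested capability.
function platformsSupporting(
  entries: PlatformStatusEntry[],
  capability: keyof PlatformStatusEntry['capabilities']
): string[] {
  return entries
    .filter((entry) => entry.status === 'online' && entry.capabilities[capability])
    .map((entry) => entry.name);
}
```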
### Platform-Specific Features
#### Web of Science Integration
- Multi-topic search with boolean operators
- Advanced date filtering and sorting
- Citation network analysis
- Research area categorization
#### arXiv Integration
- Category-specific filtering (cs.AI, physics.gen-ph, etc.)
- Date-based sorting and filtering
- Direct PDF download support
- Preprint version tracking
#### Sci-Hub Integration
- Mirror health monitoring
- Automatic mirror failover
- DOI-based paper location
- Legal compliance considerations
## Development Workflow
### Setup Instructions
1. **Clone and Install**
   ```bash
   git clone <repository>
   cd paper-search-nodejs
   npm install
   ```
2. **Configuration**
   - Copy `.env.example` to `.env`
   - Add API keys for platforms requiring authentication
   - Configure rate limits as needed
3. **Development Mode**
   ```bash
   npm run dev  # Runs with tsx for hot reload
   ```
4. **Build for Production**
   ```bash
   npm run build  # Compiles TypeScript to dist/
   npm start      # Runs compiled JavaScript
   ```
### Testing Strategy
#### Test Categories
1. **Unit Tests**: Individual platform functionality
2. **Integration Tests**: Cross-platform compatibility
3. **Error Handling**: Invalid inputs and edge cases
4. **Rate Limiting**: Throttling and retry behavior
5. **MCP Protocol**: Tool registration and communication
#### Running Tests
```bash
npm test # Run all tests
npm run test:watch # Watch mode for development
npm run test:coverage # Generate coverage report
```
### Code Quality Standards
#### TypeScript Configuration
- Strict mode enabled
- ES2022 target with ESNext modules
- Source maps for debugging
- Explicit return types required
#### Linting and Formatting
```bash
npm run lint # ESLint with TypeScript support
npm run format # Prettier formatting
npm run type-check # TypeScript compiler checks
```
## Security Considerations
### API Key Management
- Store keys in environment variables
- Never commit keys to version control
- Validate keys on platform initialization
- Rotate keys regularly
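
Validating keys at initialization can be as simple as checking that each required variable is present and non-empty. The helper below is a sketch under that assumption. `missingApiKeys` is a hypothetical name, and the variable names in the usage note are placeholders, so check `.env.example` for the real ones.

```typescript
// Return the names of required keys that are absent or blank in `env`.
// `env` is passed in (rather than read from a global) so the check is easy to test.
function missingApiKeys(
  env: Record<string, string | undefined>,
  required: string[]
): string[] {
  return required.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === '';
  });
}
```

In a Node.js process this would typically be called as `missingApiKeys(process.env, ['WOS_API_KEY', 'SPRINGER_API_KEY'])`, with the affected platforms disabled or a startup warning logged when the result is non-empty.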
### Rate Limiting
- Respect platform rate limits
- Implement exponential backoff
- Queue requests during high load
- Monitor for abuse patterns
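
Exponential backoff, as recommended above, doubles the wait after each failed attempt up to a cap. The following is a minimal sketch. The names `backoffDelayMs` and `withRetry` and the default values (500 ms base, 30 s cap, 3 retries) are assumptions, not the project's configuration.

```typescript
// Delay before retrying after failed attempt `attempt` (0-based):
// baseMs * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30_000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Run `operation`, retrying up to `maxRetries` times with exponential backoff.
// `sleep` is injectable so tests can avoid real waiting.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxRetries = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms))
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < maxRetries) {
        await sleep(backoffDelayMs(attempt));
      }
    }
  }
  // All attempts failed: surface the last error to the caller.
  throw lastError;
}
```

Adding random jitter to each delay is a common refinement when many clients may retry at the same moment.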
### Error Sanitization
- Remove sensitive data from errors
- Log sanitized error messages only
- Never expose internal implementation details
- Use generic error messages for users
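
As an illustration of removing sensitive data from errors, the sketch below redacts credential-like values from a message before it is logged or returned. `sanitizeErrorMessage` is a hypothetical helper, and its two patterns (query-string keys and `Authorization` header values) are examples rather than a complete list.

```typescript
// Redact values that look like credentials. The patterns are illustrative,
// not exhaustive; extend them for the credential formats a platform uses.
function sanitizeErrorMessage(message: string): string {
  return message
    // key=value pairs such as api_key=..., apikey=..., api-key=..., token=...
    .replace(/(api[_-]?key|token)=([^&\s]+)/gi, '$1=[REDACTED]')
    // Authorization header values such as "Bearer abc123"
    .replace(/(authorization:\s*(?:bearer|basic)\s+)\S+/gi, '$1[REDACTED]');
}
```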
## Performance Optimization
### Concurrent Operations
- Parallel search execution where possible
- Non-blocking I/O for all network requests
- Efficient memory usage with streaming
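
Parallel search execution can be sketched with `Promise.all`, catching failures per platform so that one slow or broken source does not discard the others' results. `NamedSearcher` and `searchAllPlatforms` are hypothetical names, and results are simplified to strings.

```typescript
// Hypothetical shape for illustration; real searchers return Paper objects.
interface NamedSearcher {
  name: string;
  search: (query: string) => Promise<string[]>;
}

// Start every search at once and wait for all of them to settle.
// Failures are recorded by platform name instead of rejecting the whole call.
async function searchAllPlatforms(
  query: string,
  searchers: NamedSearcher[]
): Promise<{ papers: string[]; failed: string[] }> {
  const outcomes = await Promise.all(
    searchers.map(async ({ name, search }) => {
      try {
        return { name, papers: await search(query), failed: false };
      } catch {
        return { name, papers: [] as string[], failed: true };
      }
    })
  );

  const papers: string[] = [];
  const failed: string[] = [];
  for (const outcome of outcomes) {
    papers.push(...outcome.papers);
    if (outcome.failed) {
      failed.push(outcome.name);
    }
  }
  return { papers, failed };
}
```

Because the requests run concurrently, total latency is roughly that of the slowest platform rather than the sum of all of them.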
### Caching Strategy
- Platform capability caching
- Mirror health status (Sci-Hub)
- Request deduplication
- Configurable cache TTL
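
Request deduplication and a configurable TTL can be combined by caching the in-flight promise itself. The class below is a sketch of that idea, not the project's cache: `TtlCache` and its injectable clock are assumptions.

```typescript
// Caches the promise rather than the resolved value, so concurrent callers for
// the same key share one in-flight request, and later callers reuse the result
// until the TTL expires.
class TtlCache<T> {
  private entries = new Map<string, { value: Promise<T>; expiresAt: number }>();

  constructor(
    private readonly ttlMs: number,
    private readonly now: () => number = Date.now
  ) {}

  get(key: string, load: () => Promise<T>): Promise<T> {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > this.now()) {
      return hit.value;
    }
    const value = load();
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    // Do not keep failed requests in the cache.
    value.catch(() => {
      if (this.entries.get(key)?.value === value) {
        this.entries.delete(key);
      }
    });
    return value;
  }
}
```

Keying the cache on platform plus normalized query would let identical searches issued close together cost a single upstream request.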
### Network Efficiency
- Connection pooling with keep-alive
- Request timeout configuration
- Retry with exponential backoff
- Compression support (gzip)
## Best Practices
### Adding New Platforms
1. Extend `PaperSource` base class
2. Implement required methods (`search`, `download`)
3. Define platform capabilities
4. Add rate limiting configuration
5. Create comprehensive tests
6. Update documentation
### Error Handling Guidelines
```typescript
// Platform-specific errors
class PlatformError extends Error {
  constructor(message: string, public platform: string) {
    super(message);
    this.name = 'PlatformError';
  }
}

// Rate limit errors
class RateLimitError extends Error {
  constructor(message: string, public retryAfter?: number) {
    super(message);
    this.name = 'RateLimitError';
  }
}
```
### Logging Standards
```typescript
// Use structured logging
logger.info('Platform search completed', {
  platform: 'arxiv',
  query: 'machine learning',
  results: 10,
  duration: 1234
});

// Log errors with context
logger.error('Search failed', {
  platform: 'wos',
  error: error.message,
  query: options.query,
  retryCount
});
```
## Additional Resources
### Related Documentation
- [MCP Protocol Specification](https://modelcontextprotocol.io/)
- [Platform API Documentation](./PLATFORM_APIS.md)
- [Contributing Guidelines](./CONTRIBUTING.md)
- [Changelog](./CHANGELOG.md)
### External References
- [Crossref API](https://api.crossref.org/)
- [arXiv API](https://arxiv.org/help/api/)
- [Web of Science API](https://developer.clarivate.com/)
- [PubMed API](https://www.ncbi.nlm.nih.gov/home/develop/)
---
*This documentation is automatically maintained. For updates, please submit pull requests with detailed descriptions of changes.*