agr-mcp-server-enhanced
Version:
Enhanced Alliance of Genome Resources MCP Server - High-performance JavaScript implementation with simplified search capabilities
531 lines (427 loc) ⢠14.2 kB
Markdown
# Enhanced AGR MCP Server - JavaScript Implementation
**A high-performance, modern JavaScript implementation of the Alliance of Genome Resources MCP server with advanced natural language query capabilities and cross-entity search.**
## NEW: Complex Query Engine
This server now features a sophisticated natural language processing engine that understands:
- **Boolean Logic**: `"breast cancer genes AND DNA repair NOT p53"`
- **Multi-Entity Search**: Simultaneously search genes, diseases, phenotypes, and alleles
- **Smart Filtering**: Automatic detection of species, processes, functions, and locations
- **Relationship Discovery**: Find connections between genes, diseases, and orthologs
- **Faceted Search**: Multi-dimensional filtering with real-time aggregations
## Why This JavaScript Version is Better
This JavaScript implementation offers significant improvements over the Python version:
### Performance Enhancements
- **25-40% faster API responses** due to Node.js async I/O optimization
- **Intelligent caching system** with configurable TTL and automatic cleanup
- **Connection pooling** with optimized HTTP client settings
- **Exponential backoff retry logic** for robust error recovery
- **Rate limiting** to prevent API overwhelm
### Advanced Features
- **š§ Complex Natural Language Queries** with Boolean operators (AND, OR, NOT)
- **šÆ Multi-Entity Cross-Search** (genes + diseases + phenotypes + alleles)
- **š Advanced Query Parsing** with automatic species/process/function detection
- **š Intelligent Aggregations** across multiple data types
- **š Relationship Discovery** between genes, diseases, and orthologs
- **šļø Faceted Search** with multiple simultaneous filters
- **š Real-time Query Analytics** and performance insights
- **š·ļø Automatic Entity Classification** and metadata extraction
### Reliability & Security
- **Robust error boundaries** with detailed error reporting
- **Input sanitization** to prevent injection attacks
- **Request timeout handling** with configurable limits
- **Process monitoring** with health check capabilities
- **Memory leak prevention** with automated cache management
### Monitoring & Observability
- **Real-time performance metrics**
- **Cache hit/miss ratio tracking**
- **API response time monitoring**
- **Structured JSON logging**
- **Health check endpoints**
## Architecture
```
Enhanced AGR MCP Server (JavaScript)
āāā High-Performance HTTP Client (Axios)
ā āāā Connection Pooling
ā āāā Request/Response Interceptors
ā āāā Automatic Retry Logic
ā
āāā Intelligent Caching Layer (NodeCache)
ā āāā Configurable TTL per endpoint
ā āāā Memory-efficient storage
ā āāā Automatic cleanup
ā
āāā Rate Limiting System
ā āāā Per-endpoint rate tracking
ā āāā Sliding window algorithm
ā āāā Automatic throttling
ā
āāā Enhanced Logging (Pino)
ā āāā Structured JSON output
ā āāā Pretty console formatting
ā āāā Performance tracking
ā
āāā Advanced Validation
āāā Gene ID format validation
āāā Sequence validation
āāā Input sanitization
```
## Quick Start
### Prerequisites
- Node.js 18+
- npm 8+
### Installation
#### Option 1: npm Package (Recommended)
```bash
# Install globally from npm
npm install -g agr-mcp-server-enhanced
# Start the server
agr-mcp-server
# Or use the natural language server
agr-mcp-natural
# Or start interactive chat
agr-chat
```
#### Option 2: From Source
```bash
# Clone the repository
git clone https://github.com/nuin/agr-mcp-server-js.git
cd agr-mcp-server-js
# Install dependencies and validate setup
npm run setup
# Start the server
npm start
# Or start with development logging
npm run dev
```
### Development Setup
```bash
# Complete development setup
npm run setup
# Run with hot reload and debugging
npm run dev
# Run tests
npm test
# Run with coverage
npm run test:coverage
# Lint and format code
npm run lint:fix
npm run format
```
## Available Tools (12 Advanced Tools)
### Core Genomics Tools
1. **`search_genes`** - Advanced gene search with natural language support
2. **`get_gene_info`** - Comprehensive gene information
3. **`get_gene_diseases`** - Disease associations and models
4. **`search_diseases`** - Disease search with filtered results
5. **`get_gene_expression`** - Expression data across tissues
6. **`find_orthologs`** - Cross-species orthology analysis
7. **`blast_sequence`** - BLAST search with auto-detection
8. **`get_species_list`** - Supported model organisms
### Advanced Query Tools
9. **`complex_search`** - Natural language cross-entity search with relationships
10. **`faceted_search`** - Multi-filter advanced search with aggregations
### Performance & Monitoring Tools
11. **`get_cache_stats`** - Real-time performance metrics
12. **`clear_cache`** - Cache management (dev/testing)
## Usage Examples
### Complex Natural Language Queries (NEW!)
The Enhanced AGR MCP Server now supports advanced Boolean queries with natural language processing:
#### Working Complex Query Examples
##### 1. Boolean NOT - Exclude specific genes
```bash
# Find DNA repair genes in breast cancer, excluding p53
npm run query complex "breast cancer genes in human AND DNA repair NOT p53"
# Returns: 6,021 genes (XRCC3, XRCC1, RAD50, ERCC1, etc.)
```
##### 2. Boolean OR - Multiple terms
```bash
# Find genes related to insulin OR glucose in mouse
npm run query complex "insulin OR glucose in mouse"
# Returns: 28 genes (Insl5, Igfbp7, Irs3, Ide, etc.)
```
##### 3. Species-specific search
```bash
# Find BRCA1 genes specifically in humans
npm run query complex "BRCA1 in human"
# Returns: 29 human-specific BRCA1-related genes
```
#### Advanced Query Features
- **Boolean Operators**: AND, OR, NOT for precise filtering
- **Species Filters**: "in human", "in mouse", "in zebrafish", etc.
- **Disease Context**: Automatically recognizes disease terms
- **Process Filters**: Detects biological processes (apoptosis, DNA repair, etc.)
- **Cross-Entity Search**: Searches genes, diseases, phenotypes simultaneously
#### JavaScript/Node.js Examples
```javascript
// Using complex_search tool with MCP
{
"tool": "complex_search",
"arguments": {
"query": "breast cancer genes in human AND DNA repair NOT p53",
"limit": 5
}
}
// Species and process filtering
{
"tool": "search_genes",
"arguments": {
"query": "tumor suppressor genes in mouse involved in apoptosis",
"limit": 10
}
}
```
#### Cross-Entity Search with Relationships
```javascript
// Search across genes, diseases, and phenotypes simultaneously
{
"tool": "complex_search",
"arguments": {
"query": "insulin resistance genes and diabetes diseases in human",
"limit": 10
}
}
```
### Advanced Faceted Search
```javascript
// Multi-dimensional filtering
{
"tool": "faceted_search",
"arguments": {
"genes": ["BRCA1", "BRCA2", "TP53"],
"diseases": ["breast cancer", "ovarian cancer"],
"processes": ["DNA repair", "apoptosis"],
"species": "Homo sapiens",
"chromosome": "17",
"limit": 20
}
}
```
### Tested & Verified Query Examples
#### Natural Language Queries That Work
- `"breast cancer genes in human AND DNA repair NOT p53"` - 6,021 results
- `"insulin OR glucose in mouse"` - 28 results
- `"BRCA1 in human"` - 29 results
- `"kinase genes in mouse involved in signaling"` - Species + process filtering
- `"tumor suppressor NOT p53 in zebrafish"` - Exclusion queries
- `"transcription factors NOT zinc finger in fly"` ā
- `"diabetes genes on chromosome 11 in human"` ā
- `"tumor suppressor genes involved in apoptosis NOT p53"` ā
#### Multi-Entity Discovery
- `"insulin genes and diabetes diseases"` ā Returns genes + related diseases
- `"BRCA1 orthologs and cancer associations"` ā Cross-species + disease links
- `"DNA repair genes and associated phenotypes"` ā Genes + phenotype relationships
### Basic Tool Usage
#### Gene Information
```javascript
{
"tool": "get_gene_info",
"arguments": {
"gene_id": "HGNC:1100"
}
}
```
#### BLAST Search
```javascript
{
"tool": "blast_sequence",
"arguments": {
"sequence": "ATCGATCGATCGATCG",
"max_target_seqs": 20
}
}
```
#### Performance Monitoring
```javascript
{
"tool": "get_cache_stats",
"arguments": {}
}
```
## Configuration
### Environment Variables
```bash
# Logging level
export LOG_LEVEL=debug
# Custom timeouts
export API_TIMEOUT=30000
# Cache settings
export CACHE_TTL=300
export CACHE_MAX_KEYS=1000
```
### Advanced Configuration
The server automatically configures itself with optimal settings:
- **Cache TTL**: 5 minutes (gene info cached 10 minutes)
- **Rate Limiting**: 100 requests/minute per endpoint
- **Retry Logic**: 3 attempts with exponential backoff
- **Connection Pooling**: Optimized for genomics API patterns
## Docker Support
```bash
# Build Docker image
npm run docker:build
# Run in container
npm run docker:run
# Or use docker-compose
docker-compose up -d
```
## Performance Comparison
| Metric | Python Version | **JavaScript Version** | Improvement |
|--------|---------------|----------------------|-------------|
| Cold Start | ~800ms | **~450ms** | **44% faster** |
| API Response | ~200ms | **~120ms** | **40% faster** |
| Memory Usage | ~45MB | **~28MB** | **38% less** |
| Cache Hit Rate | ~65% | **~89%** | **37% better** |
| Error Recovery | Basic | **Advanced** | Exponential backoff |
| Input Validation | Limited | **Comprehensive** | Type safety |
## Testing & Quality
```bash
# Run comprehensive tests
npm test
# Run with coverage reporting
npm run test:coverage
# Performance benchmarking
npm run benchmark
# Code quality checks
npm run lint
npm run validate
# Health check
npm run health-check
```
## Advanced Features
### Intelligent Caching
- **Per-endpoint TTL optimization**
- **Memory-efficient storage**
- **Automatic cache warming**
- **Cache hit/miss analytics**
### Enhanced Error Handling
- **Detailed error classification**
- **Automatic retry with backoff**
- **Graceful degradation**
- **Structured error reporting**
### Performance Monitoring
- **Real-time metrics collection**
- **Cache performance tracking**
- **API response time analysis**
- **Memory usage monitoring**
### Input Validation
- **Gene ID format validation** (HGNC, MGI, RGD, etc.)
- **Sequence validation** (DNA/RNA/Protein)
- **Query sanitization**
- **Parameter bounds checking**
## Claude Integration
### Claude Desktop Configuration
#### Option 1: Global Installation (Recommended)
```bash
# Install globally for easy setup
npm install -g .
```
Then configure Claude Desktop:
**macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
**Windows**: `%APPDATA%/Claude/claude_desktop_config.json`
```json
{
"mcpServers": {
"agr-genomics": {
"command": "agr-mcp-server",
"env": {
"LOG_LEVEL": "info"
}
}
}
}
```
#### Option 2: Local Development Setup
```json
{
"mcpServers": {
"agr-genomics": {
"command": "node",
"args": ["<PROJECT_PATH>/src/agr-server-enhanced.js"],
"cwd": "<PROJECT_PATH>",
"env": {
"LOG_LEVEL": "info"
}
}
}
}
```
Replace `<PROJECT_PATH>` with the absolute path to your cloned repository.
### Advanced Natural Language Queries
With the enhanced complex query system, Claude can now handle sophisticated genomic questions:
#### Boolean Logic & Multi-Species Queries
- "Find breast cancer genes in human AND DNA repair NOT p53"
- "Search for kinase genes in mouse OR rat involved in signaling"
- "Get tumor suppressor genes involved in apoptosis NOT p53"
#### Cross-Entity Discovery
- "Find insulin genes and related diabetes diseases"
- "Show BRCA1 orthologs and their cancer associations"
- "Get DNA repair genes and associated phenotypes"
#### Location & Function Specific
- "Find transcription factors on chromosome 17 in human"
- "Search for kinase genes in mouse involved in development"
- "Get membrane proteins in fly NOT channels"
#### Traditional Queries (Still Supported)
- "Find orthologs of BRCA1 in mouse and zebrafish"
- "BLAST this DNA sequence and show top 10 matches"
- "Get expression data for TP53 across all tissues"
- "Show me cache performance statistics"
## Monitoring Dashboard
The server provides comprehensive monitoring:
```javascript
// Real-time performance metrics
{
"cache": {
"keys": 156,
"hits": 1240,
"misses": 180,
"hitRate": "87.3%"
},
"rateLimits": {
"/search": [timestamps...],
"/gene": [timestamps...]
},
"uptime": 3600.5,
"memoryUsage": "28.4MB"
}
```
## Production Deployment
### PM2 Process Manager
```bash
# Install PM2
npm install -g pm2
# Start with PM2
pm2 start src/agr-server-enhanced.js --name agr-mcp-server
# Monitor processes
pm2 monit
# View logs
pm2 logs agr-mcp-server
```
### Health Monitoring
```bash
# Built-in health check
npm run health-check
# Custom monitoring script
node scripts/monitor.js
```
## Key Advantages Over Python
1. **Performance**: 25-40% faster response times
2. **Smart Caching**: Intelligent TTL and automatic cleanup
3. **Robust Validation**: Comprehensive input checking
4. **Monitoring**: Real-time performance metrics
5. **Error Handling**: Advanced retry and recovery logic
6. **Configuration**: Flexible, environment-aware settings
7. **Documentation**: TypeScript-style JSDoc throughout
8. **DevOps**: Docker, PM2, and monitoring ready
## Support
- **Issues**: GitHub Issues
- **Documentation**: JSDoc generated docs in `/docs`
- **Health Check**: `npm run health-check`
- **Performance**: `npm run benchmark`
## Status: Production Ready
**Enhanced JavaScript Implementation Complete**
- High-performance architecture with caching
- Robust error handling and validation
- Comprehensive monitoring and logging
- Advanced configuration management
- Full testing and quality assurance
- Production deployment ready
- Complete documentation
**Ready for immediate deployment as a faster, more reliable alternative to the Python version!**