UNPKG

@rhofkens/mcp-quotes-server-claude-code

Version:

Model Context Protocol (MCP) server for managing and serving quotes

169 lines (133 loc) 5.46 kB
# Error Handling Implementation - Phase 3 ## Overview This document describes the comprehensive error handling system implemented for the MCP Quotes Server, providing robust error recovery, user-friendly messages, and detailed debugging capabilities. ## Key Components ### 1. Error Handling Utilities (`src/utils/errorHandling.ts`) A new comprehensive error handling module that provides: - **Retry Logic with Exponential Backoff**: Automatically retries failed operations with configurable delays - **Circuit Breaker Pattern**: Prevents cascading failures by temporarily disabling failed services - **Structured Error Responses**: Consistent error format with debugging context - **Error Context Builder**: Fluent API for building rich error context - **Error Aggregator**: Collects multiple errors in batch operations - **Timeout Protection**: Prevents operations from hanging indefinitely ### 2. Enhanced Serper Client (`src/services/serperClient.ts`) The Serper API client now includes: - Circuit breaker protection to prevent overwhelming failed services - Retry logic for transient failures (timeouts, rate limits) - Request ID tracking for distributed tracing - Comprehensive error context logging - Timeout protection for all API calls ### 3. Updated Quote Retrieval Tool (`src/tools/getQuotes.ts`) The getQuotes tool now features: - Error aggregation for parsing failures - Graceful degradation with fallback searches - Request ID propagation throughout the operation - Circuit breaker status checking - Structured error responses with recovery suggestions ## Error Types and Codes ### Custom Error Classes 1. **BaseError**: Base class for all custom errors 2. **APIError**: External service failures 3. **ValidationError**: Input validation failures 4. **ConfigError**: Configuration issues 5. **NetworkError**: Connectivity problems 6. **AuthenticationError**: Authentication failures 7. **RateLimitError**: Rate limit exceeded ### Error Codes - `UNKNOWN_ERROR`: Unhandled error type - `INTERNAL_ERROR`: Internal server error - `VALIDATION_ERROR`: Validation failure - `API_ERROR`: External API error - `API_TIMEOUT`: Request timeout - `API_RATE_LIMIT`: Rate limit exceeded - `API_UNAUTHORIZED`: Authentication failed - `API_NOT_FOUND`: Resource not found - `CONFIG_ERROR`: Configuration error - `MCP_ERROR`: MCP protocol error ## Usage Examples ### Retry Logic ```typescript const result = await withRetry( async () => await apiCall(), { maxRetries: 3, initialDelay: 1000, maxDelay: 10000, backoffMultiplier: 2, retryableErrors: [ErrorCode.API_TIMEOUT, ErrorCode.API_RATE_LIMIT], onRetry: (error, attempt) => { logger.warn('Retrying operation', { error, attempt }); } } ); ``` ### Circuit Breaker ```typescript const circuitBreaker = new CircuitBreaker( 5, // Open after 5 failures 60000, // Stay open for 1 minute 30000 // Try half-open after 30 seconds ); const result = await circuitBreaker.execute(async () => { return await riskyOperation(); }); ``` ### Error Context Building ```typescript const context = new ErrorContextBuilder() .setOperation('searchQuotes') .setInput({ query, limit }) .setEnvironment() .setStackTrace(error) .addRelatedError('PARSE_ERROR', 'Failed to parse response') .build(); ``` ### Structured Error Response ```typescript const structuredError = createStructuredError(error, context, requestId); // Returns: { error: { code: "API_TIMEOUT", message: "Technical error message", userMessage: "The request timed out. Please try again.", timestamp: "2025-07-05T12:00:00.000Z", requestId: "req_1234567890_abc123", details: { timeout: 10000 }, context: { /* operation context */ }, recovery: { suggestions: ["Check your internet connection", "Try again"], retryable: true, retryAfter: 60 } } } ``` ## Best Practices 1. **Always include request IDs** for distributed tracing 2. **Log errors with full context** for debugging 3. **Provide user-friendly error messages** that are actionable 4. **Include recovery suggestions** in error responses 5. **Use error aggregation** for batch operations 6. **Implement timeouts** for all external calls 7. **Monitor circuit breaker states** for service health 8. **Test error scenarios** thoroughly ## Benefits 1. **Improved Reliability**: Automatic retry and circuit breaker patterns prevent transient failures 2. **Better User Experience**: Clear, actionable error messages with recovery suggestions 3. **Enhanced Debugging**: Rich error context and request tracking 4. **Graceful Degradation**: Fallback mechanisms when primary operations fail 5. **Performance Protection**: Timeouts and circuit breakers prevent resource exhaustion ## Monitoring and Observability The implementation includes comprehensive logging at all error points: - Request IDs for tracing operations - Circuit breaker state changes - Retry attempts and outcomes - Error aggregation summaries - Performance metrics (timeouts, response times) ## Future Enhancements 1. **Metrics Collection**: Add Prometheus metrics for error rates and circuit breaker states 2. **Distributed Tracing**: Integrate with OpenTelemetry for full request tracing 3. **Alert Integration**: Add webhook support for critical error notifications 4. **Dashboard**: Create a monitoring dashboard for error trends 5. **A/B Testing**: Test different retry strategies and timeout values