UNPKG

yoda-mcp

Version:

Intelligent Planning MCP with Optional Dependencies and Graceful Fallbacks - wise planning through the Force of lean excellence

599 lines (442 loc) 13.7 kB
# Planner MCP Deployment Runbook ## Overview This runbook provides comprehensive deployment procedures for the Planner MCP system, including production deployments, rollbacks, and environment management. ## Deployment Strategy ### Blue-Green Deployment The system uses blue-green deployment for zero-downtime updates: - **Blue Environment**: Current production environment - **Green Environment**: New version for deployment - **Traffic Switch**: Instantaneous routing change - **Rollback Capability**: Switch back to blue environment if issues occur ### Canary Deployment For gradual rollouts: - **5%**: Initial canary deployment - **25%**: Expanded testing - **100%**: Full deployment ## Pre-Deployment Checklist ### Planning Phase - [ ] **Release Notes**: Complete and reviewed - [ ] **Deployment Plan**: Approved by team lead - [ ] **Rollback Plan**: Documented and tested - [ ] **Database Migrations**: Reviewed and tested - [ ] **Configuration Changes**: Documented - [ ] **Feature Flags**: Configured correctly - [ ] **Monitoring**: Alerts configured for new features - [ ] **Load Testing**: Performance validated - [ ] **Security Scan**: No critical vulnerabilities - [ ] **Stakeholder Notification**: All parties informed ### Technical Requirements - [ ] **CI/CD Pipeline**: All tests passing - [ ] **Docker Images**: Built and scanned - [ ] **Kubernetes Manifests**: Updated and validated - [ ] **Database Backups**: Recent backup available - [ ] **Environment Variables**: Configured correctly - [ ] **SSL Certificates**: Valid and up-to-date - [ ] **DNS Configuration**: Verified - [ ] **External Dependencies**: Available and compatible ## Environment Management ### Environment Hierarchy ``` Development → Staging → Production ↓ ↓ ↓ Feature Integration Live Testing Testing Traffic ``` ### Environment Specifications | Environment | Purpose | Infrastructure | Data | |-------------|---------|---------------|------| | **Development** | Feature development | Single node | Synthetic | | **Staging** | Pre-production testing | Production-like | Anonymized | | **Production** | Live system | Full cluster | Real | ## Deployment Procedures ### Standard Deployment Process #### Phase 1: Pre-deployment 1. **Environment Preparation** ```bash # Set deployment environment export DEPLOY_ENV=production export RELEASE_VERSION=v1.2.3 # Authenticate with cluster kubectl config use-context production-cluster # Verify cluster access kubectl get nodes ``` 2. **Backup Critical Data** ```bash # Create database backup ./scripts/backup-database.sh # Backup configuration ./scripts/backup-config.sh # Verify backups ./scripts/verify-backups.sh ``` 3. **Pre-flight Checks** ```bash # Check system health ./scripts/pre-deployment-check.sh # Validate resources ./scripts/resource-check.sh # Test external dependencies ./scripts/dependency-check.sh ``` #### Phase 2: Green Environment Deployment 1. **Deploy New Version** ```bash # Deploy to green environment ./scripts/deploy-green.sh $RELEASE_VERSION # Wait for deployment completion kubectl rollout status deployment/planner-mcp-green -n planner-mcp # Verify pods are running kubectl get pods -n planner-mcp -l environment=green ``` 2. **Database Migrations** ```bash # Run database migrations ./scripts/run-migrations.sh --environment=green # Verify migration success ./scripts/verify-migrations.sh ``` 3. **Configuration Updates** ```bash # Update configurations kubectl apply -f k8s/configmaps/production.yaml # Update secrets kubectl apply -f k8s/secrets/production.yaml # Restart pods to pick up config changes kubectl rollout restart deployment/planner-mcp-green -n planner-mcp ``` #### Phase 3: Testing and Validation 1. **Health Checks** ```bash # Test green environment health ./scripts/health-check.sh --environment=green # Verify all services are operational ./scripts/service-check.sh --environment=green ``` 2. **Smoke Testing** ```bash # Run smoke tests ./scripts/smoke-test.sh --environment=green # Test critical user journeys ./scripts/critical-path-test.sh --environment=green ``` 3. **Performance Validation** ```bash # Run performance tests ./scripts/performance-test.sh --environment=green # Validate against SLA requirements ./scripts/sla-validation.sh --environment=green ``` #### Phase 4: Traffic Switch 1. **Canary Traffic** ```bash # Route 5% traffic to green ./scripts/route-traffic.sh --green=5 --blue=95 # Monitor for 10 minutes sleep 600 ./scripts/monitor-metrics.sh --duration=10m ``` 2. **Progressive Rollout** ```bash # Increase to 25% traffic ./scripts/route-traffic.sh --green=25 --blue=75 ./scripts/monitor-metrics.sh --duration=15m # Increase to 50% traffic ./scripts/route-traffic.sh --green=50 --blue=50 ./scripts/monitor-metrics.sh --duration=15m # Full traffic switch ./scripts/route-traffic.sh --green=100 --blue=0 ``` 3. **Post-Switch Validation** ```bash # Verify traffic routing ./scripts/verify-traffic-routing.sh # Monitor system metrics ./scripts/post-deployment-monitoring.sh --duration=30m # Validate business metrics ./scripts/business-metrics-check.sh ``` #### Phase 5: Blue Environment Cleanup 1. **Environment Decommission** ```bash # Scale down blue environment kubectl scale deployment/planner-mcp-blue --replicas=1 -n planner-mcp # Wait before full cleanup sleep 1800 # 30 minutes # Remove blue environment ./scripts/cleanup-blue.sh ``` ### Database Migration Procedures #### Migration Types 1. **Backward Compatible Migrations** - Column additions - Index creation - New tables - Data additions 2. **Breaking Changes** - Column removals - Table structure changes - Data type changes - Constraint modifications #### Migration Execution ```bash # Create migration backup ./scripts/create-migration-backup.sh # Test migration on copy ./scripts/test-migration.sh --dry-run # Execute migration ./scripts/run-migration.sh --version=$MIGRATION_VERSION # Verify migration ./scripts/verify-migration.sh --version=$MIGRATION_VERSION ``` #### Migration Rollback ```bash # Check if rollback is possible ./scripts/check-migration-rollback.sh --version=$MIGRATION_VERSION # Execute rollback ./scripts/rollback-migration.sh --version=$MIGRATION_VERSION # Verify rollback ./scripts/verify-rollback.sh ``` ## Rollback Procedures ### Automatic Rollback Triggers - **Health Check Failures**: > 50% endpoints failing - **Error Rate Spike**: > 10% increase in error rate - **Performance Degradation**: > 100% increase in response time - **Business Metric Drop**: > 25% decrease in success rate ### Manual Rollback Process 1. **Emergency Rollback** ```bash # Immediate traffic switch to blue ./scripts/emergency-rollback.sh # Verify rollback success ./scripts/verify-rollback.sh # Send incident notification ./scripts/notify-rollback.sh --reason="Performance degradation" ``` 2. **Database Rollback** ```bash # Check if database rollback is needed ./scripts/check-db-rollback-needed.sh # Execute database rollback ./scripts/rollback-database.sh --backup-id=$BACKUP_ID # Verify database integrity ./scripts/verify-db-integrity.sh ``` ## Environment-Specific Procedures ### Production Deployment #### Additional Requirements - **Change Management**: Approved change ticket - **Maintenance Window**: Scheduled (if needed) - **Communication**: Stakeholder notifications - **Rollback Window**: 4 hours maximum #### Production-Specific Steps 1. **Change Freeze Check** ```bash # Verify no active change freeze ./scripts/check-change-freeze.sh ``` 2. **Maintenance Mode (if required)** ```bash # Enable maintenance mode ./scripts/enable-maintenance-mode.sh # Deploy changes # ... deployment steps ... # Disable maintenance mode ./scripts/disable-maintenance-mode.sh ``` ### Staging Deployment #### Continuous Integration Flow ```bash # Automated staging deployment ./scripts/ci-staging-deploy.sh \ --branch=main \ --run-tests=true \ --notify-team=true ``` #### Staging Validation ```bash # Full test suite ./scripts/run-all-tests.sh --environment=staging # Integration tests ./scripts/run-integration-tests.sh --environment=staging # E2E tests ./scripts/run-e2e-tests.sh --environment=staging ``` ## Configuration Management ### Configuration Sources 1. **Environment Variables**: Runtime configuration 2. **ConfigMaps**: Application settings 3. **Secrets**: Sensitive data 4. **Feature Flags**: Feature toggles ### Configuration Deployment ```bash # Update environment-specific configs kubectl apply -f k8s/environments/$DEPLOY_ENV/ # Validate configuration ./scripts/validate-config.sh --environment=$DEPLOY_ENV # Restart services to pick up changes kubectl rollout restart deployment/planner-mcp -n planner-mcp ``` ## Monitoring and Alerts ### Deployment Monitoring #### Key Metrics to Monitor - **Response Time**: P50, P95, P99 latencies - **Error Rates**: 4xx and 5xx error percentages - **Throughput**: Requests per second - **Resource Usage**: CPU, memory, disk utilization - **Business Metrics**: Planning success rate, user satisfaction #### Monitoring Commands ```bash # Real-time metrics ./scripts/monitor-deployment.sh --duration=30m # Custom dashboard ./scripts/open-deployment-dashboard.sh # Alert status ./scripts/check-alert-status.sh ``` ### Post-Deployment Alerts #### Success Notifications ```bash # Deployment success notification ./scripts/notify-deployment-success.sh \ --version=$RELEASE_VERSION \ --environment=$DEPLOY_ENV \ --duration=$DEPLOYMENT_DURATION ``` #### Failure Notifications ```bash # Deployment failure notification ./scripts/notify-deployment-failure.sh \ --version=$RELEASE_VERSION \ --error="$ERROR_MESSAGE" \ --rollback-status=$ROLLBACK_STATUS ``` ## Security Considerations ### Security Scanning ```bash # Container image scanning ./scripts/scan-images.sh --version=$RELEASE_VERSION # Configuration security check ./scripts/security-config-check.sh # Vulnerability assessment ./scripts/vulnerability-scan.sh ``` ### Access Control ```bash # Verify RBAC settings kubectl auth can-i --list --as=system:serviceaccount:planner-mcp:default # Check network policies kubectl get networkpolicies -n planner-mcp # Validate pod security policies kubectl get psp ``` ## Disaster Recovery ### Backup Procedures ```bash # Full system backup ./scripts/full-backup.sh --environment=production # Database backup ./scripts/backup-database.sh --compression=true # Configuration backup ./scripts/backup-configuration.sh ``` ### Recovery Procedures ```bash # Restore from backup ./scripts/restore-from-backup.sh --backup-id=$BACKUP_ID # Verify system integrity ./scripts/verify-system-integrity.sh # Resume normal operations ./scripts/resume-operations.sh ``` ## Troubleshooting ### Common Issues #### Deployment Stuck ```bash # Check rollout status kubectl rollout status deployment/planner-mcp -n planner-mcp --timeout=600s # Debug stuck rollout kubectl describe deployment/planner-mcp -n planner-mcp # Check pod events kubectl get events -n planner-mcp --sort-by=.metadata.creationTimestamp ``` #### Database Migration Failures ```bash # Check migration status ./scripts/check-migration-status.sh # View migration logs kubectl logs -n planner-mcp -l job-name=migration-$VERSION # Manual migration fix ./scripts/fix-migration.sh --version=$VERSION ``` #### Configuration Issues ```bash # Validate configuration syntax ./scripts/validate-config-syntax.sh # Check configuration diff ./scripts/config-diff.sh --old=$OLD_VERSION --new=$NEW_VERSION # Test configuration ./scripts/test-config.sh --dry-run ``` ## Scripts and Automation ### Deployment Scripts Location ``` scripts/ ├── deployment/ │ ├── deploy-green.sh │ ├── route-traffic.sh │ ├── rollback.sh │ └── cleanup-blue.sh ├── testing/ │ ├── smoke-test.sh │ ├── performance-test.sh │ └── critical-path-test.sh ├── monitoring/ │ ├── health-check.sh │ ├── monitor-metrics.sh │ └── alert-status.sh └── utilities/ ├── backup.sh ├── restore.sh └── notify.sh ``` ### CI/CD Pipeline Integration ```yaml # .github/workflows/deploy.yml name: Deploy to Production on: push: tags: - 'v*' jobs: deploy: runs-on: ubuntu-latest steps: - uses: actions/checkout@v2 - name: Run Pre-deployment Checks run: ./scripts/pre-deployment-check.sh - name: Deploy to Green Environment run: ./scripts/deploy-green.sh ${{ github.ref }} - name: Run Tests run: ./scripts/run-all-tests.sh - name: Switch Traffic run: ./scripts/progressive-rollout.sh ``` ## Contact Information ### Deployment Team - **Lead**: Deployment Manager - **DevOps**: @devops-team - **Engineering**: @engineering-team ### Escalation - **Engineering Manager**: For deployment issues - **VP Engineering**: For critical failures - **CTO**: For business-critical incidents ### Communication Channels - **Slack**: #deployments (status updates) - **PagerDuty**: Emergency escalation - **Email**: deployment-team@company.com --- **Last Updated**: {current_date} **Version**: 1.0 **Owner**: DevOps Team