yoda-mcp
Version:
Intelligent Planning MCP with Optional Dependencies and Graceful Fallbacks - wise planning through the Force of lean excellence
599 lines (442 loc) • 13.7 kB
Markdown
# Planner MCP Deployment Runbook
## Overview
This runbook provides comprehensive deployment procedures for the Planner MCP system, including production deployments, rollbacks, and environment management.
## Deployment Strategy
### Blue-Green Deployment
The system uses blue-green deployment for zero-downtime updates:
- **Blue Environment**: Current production environment
- **Green Environment**: New version for deployment
- **Traffic Switch**: Instantaneous routing change
- **Rollback Capability**: Switch back to blue environment if issues occur
### Canary Deployment
For gradual rollouts:
- **5%**: Initial canary deployment
- **25%**: Expanded testing
- **100%**: Full deployment
## Pre-Deployment Checklist
### Planning Phase
- [ ] **Release Notes**: Complete and reviewed
- [ ] **Deployment Plan**: Approved by team lead
- [ ] **Rollback Plan**: Documented and tested
- [ ] **Database Migrations**: Reviewed and tested
- [ ] **Configuration Changes**: Documented
- [ ] **Feature Flags**: Configured correctly
- [ ] **Monitoring**: Alerts configured for new features
- [ ] **Load Testing**: Performance validated
- [ ] **Security Scan**: No critical vulnerabilities
- [ ] **Stakeholder Notification**: All parties informed
### Technical Requirements
- [ ] **CI/CD Pipeline**: All tests passing
- [ ] **Docker Images**: Built and scanned
- [ ] **Kubernetes Manifests**: Updated and validated
- [ ] **Database Backups**: Recent backup available
- [ ] **Environment Variables**: Configured correctly
- [ ] **SSL Certificates**: Valid and up-to-date
- [ ] **DNS Configuration**: Verified
- [ ] **External Dependencies**: Available and compatible
## Environment Management
### Environment Hierarchy
```
Development → Staging → Production
↓ ↓ ↓
Feature Integration Live
Testing Testing Traffic
```
### Environment Specifications
| Environment | Purpose | Infrastructure | Data |
|-------------|---------|---------------|------|
| **Development** | Feature development | Single node | Synthetic |
| **Staging** | Pre-production testing | Production-like | Anonymized |
| **Production** | Live system | Full cluster | Real |
## Deployment Procedures
### Standard Deployment Process
#### Phase 1: Pre-deployment
1. **Environment Preparation**
```bash
# Set deployment environment
export DEPLOY_ENV=production
export RELEASE_VERSION=v1.2.3
# Authenticate with cluster
kubectl config use-context production-cluster
# Verify cluster access
kubectl get nodes
```
2. **Backup Critical Data**
```bash
# Create database backup
./scripts/backup-database.sh
# Backup configuration
./scripts/backup-config.sh
# Verify backups
./scripts/verify-backups.sh
```
3. **Pre-flight Checks**
```bash
# Check system health
./scripts/pre-deployment-check.sh
# Validate resources
./scripts/resource-check.sh
# Test external dependencies
./scripts/dependency-check.sh
```
#### Phase 2: Green Environment Deployment
1. **Deploy New Version**
```bash
# Deploy to green environment
./scripts/deploy-green.sh $RELEASE_VERSION
# Wait for deployment completion
kubectl rollout status deployment/planner-mcp-green -n planner-mcp
# Verify pods are running
kubectl get pods -n planner-mcp -l environment=green
```
2. **Database Migrations**
```bash
# Run database migrations
./scripts/run-migrations.sh --environment=green
# Verify migration success
./scripts/verify-migrations.sh
```
3. **Configuration Updates**
```bash
# Update configurations
kubectl apply -f k8s/configmaps/production.yaml
# Update secrets
kubectl apply -f k8s/secrets/production.yaml
# Restart pods to pick up config changes
kubectl rollout restart deployment/planner-mcp-green -n planner-mcp
```
#### Phase 3: Testing and Validation
1. **Health Checks**
```bash
# Test green environment health
./scripts/health-check.sh --environment=green
# Verify all services are operational
./scripts/service-check.sh --environment=green
```
2. **Smoke Testing**
```bash
# Run smoke tests
./scripts/smoke-test.sh --environment=green
# Test critical user journeys
./scripts/critical-path-test.sh --environment=green
```
3. **Performance Validation**
```bash
# Run performance tests
./scripts/performance-test.sh --environment=green
# Validate against SLA requirements
./scripts/sla-validation.sh --environment=green
```
#### Phase 4: Traffic Switch
1. **Canary Traffic**
```bash
# Route 5% traffic to green
./scripts/route-traffic.sh --green=5 --blue=95
# Monitor for 10 minutes
sleep 600
./scripts/monitor-metrics.sh --duration=10m
```
2. **Progressive Rollout**
```bash
# Increase to 25% traffic
./scripts/route-traffic.sh --green=25 --blue=75
./scripts/monitor-metrics.sh --duration=15m
# Increase to 50% traffic
./scripts/route-traffic.sh --green=50 --blue=50
./scripts/monitor-metrics.sh --duration=15m
# Full traffic switch
./scripts/route-traffic.sh --green=100 --blue=0
```
3. **Post-Switch Validation**
```bash
# Verify traffic routing
./scripts/verify-traffic-routing.sh
# Monitor system metrics
./scripts/post-deployment-monitoring.sh --duration=30m
# Validate business metrics
./scripts/business-metrics-check.sh
```
#### Phase 5: Blue Environment Cleanup
1. **Environment Decommission**
```bash
# Scale down blue environment
kubectl scale deployment/planner-mcp-blue --replicas=1 -n planner-mcp
# Wait before full cleanup
sleep 1800 # 30 minutes
# Remove blue environment
./scripts/cleanup-blue.sh
```
### Database Migration Procedures
#### Migration Types
1. **Backward Compatible Migrations**
- Column additions
- Index creation
- New tables
- Data additions
2. **Breaking Changes**
- Column removals
- Table structure changes
- Data type changes
- Constraint modifications
#### Migration Execution
```bash
# Create migration backup
./scripts/create-migration-backup.sh
# Test migration on copy
./scripts/test-migration.sh --dry-run
# Execute migration
./scripts/run-migration.sh --version=$MIGRATION_VERSION
# Verify migration
./scripts/verify-migration.sh --version=$MIGRATION_VERSION
```
#### Migration Rollback
```bash
# Check if rollback is possible
./scripts/check-migration-rollback.sh --version=$MIGRATION_VERSION
# Execute rollback
./scripts/rollback-migration.sh --version=$MIGRATION_VERSION
# Verify rollback
./scripts/verify-rollback.sh
```
## Rollback Procedures
### Automatic Rollback Triggers
- **Health Check Failures**: > 50% endpoints failing
- **Error Rate Spike**: > 10% increase in error rate
- **Performance Degradation**: > 100% increase in response time
- **Business Metric Drop**: > 25% decrease in success rate
### Manual Rollback Process
1. **Emergency Rollback**
```bash
# Immediate traffic switch to blue
./scripts/emergency-rollback.sh
# Verify rollback success
./scripts/verify-rollback.sh
# Send incident notification
./scripts/notify-rollback.sh --reason="Performance degradation"
```
2. **Database Rollback**
```bash
# Check if database rollback is needed
./scripts/check-db-rollback-needed.sh
# Execute database rollback
./scripts/rollback-database.sh --backup-id=$BACKUP_ID
# Verify database integrity
./scripts/verify-db-integrity.sh
```
## Environment-Specific Procedures
### Production Deployment
#### Additional Requirements
- **Change Management**: Approved change ticket
- **Maintenance Window**: Scheduled (if needed)
- **Communication**: Stakeholder notifications
- **Rollback Window**: 4 hours maximum
#### Production-Specific Steps
1. **Change Freeze Check**
```bash
# Verify no active change freeze
./scripts/check-change-freeze.sh
```
2. **Maintenance Mode (if required)**
```bash
# Enable maintenance mode
./scripts/enable-maintenance-mode.sh
# Deploy changes
# ... deployment steps ...
# Disable maintenance mode
./scripts/disable-maintenance-mode.sh
```
### Staging Deployment
#### Continuous Integration Flow
```bash
# Automated staging deployment
./scripts/ci-staging-deploy.sh \
--branch=main \
--run-tests=true \
--notify-team=true
```
#### Staging Validation
```bash
# Full test suite
./scripts/run-all-tests.sh --environment=staging
# Integration tests
./scripts/run-integration-tests.sh --environment=staging
# E2E tests
./scripts/run-e2e-tests.sh --environment=staging
```
## Configuration Management
### Configuration Sources
1. **Environment Variables**: Runtime configuration
2. **ConfigMaps**: Application settings
3. **Secrets**: Sensitive data
4. **Feature Flags**: Feature toggles
### Configuration Deployment
```bash
# Update environment-specific configs
kubectl apply -f k8s/environments/$DEPLOY_ENV/
# Validate configuration
./scripts/validate-config.sh --environment=$DEPLOY_ENV
# Restart services to pick up changes
kubectl rollout restart deployment/planner-mcp -n planner-mcp
```
## Monitoring and Alerts
### Deployment Monitoring
#### Key Metrics to Monitor
- **Response Time**: P50, P95, P99 latencies
- **Error Rates**: 4xx and 5xx error percentages
- **Throughput**: Requests per second
- **Resource Usage**: CPU, memory, disk utilization
- **Business Metrics**: Planning success rate, user satisfaction
#### Monitoring Commands
```bash
# Real-time metrics
./scripts/monitor-deployment.sh --duration=30m
# Custom dashboard
./scripts/open-deployment-dashboard.sh
# Alert status
./scripts/check-alert-status.sh
```
### Post-Deployment Alerts
#### Success Notifications
```bash
# Deployment success notification
./scripts/notify-deployment-success.sh \
--version=$RELEASE_VERSION \
--environment=$DEPLOY_ENV \
--duration=$DEPLOYMENT_DURATION
```
#### Failure Notifications
```bash
# Deployment failure notification
./scripts/notify-deployment-failure.sh \
--version=$RELEASE_VERSION \
--error="$ERROR_MESSAGE" \
--rollback-status=$ROLLBACK_STATUS
```
## Security Considerations
### Security Scanning
```bash
# Container image scanning
./scripts/scan-images.sh --version=$RELEASE_VERSION
# Configuration security check
./scripts/security-config-check.sh
# Vulnerability assessment
./scripts/vulnerability-scan.sh
```
### Access Control
```bash
# Verify RBAC settings
kubectl auth can-i --list --as=system:serviceaccount:planner-mcp:default
# Check network policies
kubectl get networkpolicies -n planner-mcp
# Validate pod security policies
kubectl get psp
```
## Disaster Recovery
### Backup Procedures
```bash
# Full system backup
./scripts/full-backup.sh --environment=production
# Database backup
./scripts/backup-database.sh --compression=true
# Configuration backup
./scripts/backup-configuration.sh
```
### Recovery Procedures
```bash
# Restore from backup
./scripts/restore-from-backup.sh --backup-id=$BACKUP_ID
# Verify system integrity
./scripts/verify-system-integrity.sh
# Resume normal operations
./scripts/resume-operations.sh
```
## Troubleshooting
### Common Issues
#### Deployment Stuck
```bash
# Check rollout status
kubectl rollout status deployment/planner-mcp -n planner-mcp --timeout=600s
# Debug stuck rollout
kubectl describe deployment/planner-mcp -n planner-mcp
# Check pod events
kubectl get events -n planner-mcp --sort-by=.metadata.creationTimestamp
```
#### Database Migration Failures
```bash
# Check migration status
./scripts/check-migration-status.sh
# View migration logs
kubectl logs -n planner-mcp -l job-name=migration-$VERSION
# Manual migration fix
./scripts/fix-migration.sh --version=$VERSION
```
#### Configuration Issues
```bash
# Validate configuration syntax
./scripts/validate-config-syntax.sh
# Check configuration diff
./scripts/config-diff.sh --old=$OLD_VERSION --new=$NEW_VERSION
# Test configuration
./scripts/test-config.sh --dry-run
```
## Scripts and Automation
### Deployment Scripts Location
```
scripts/
├── deployment/
│ ├── deploy-green.sh
│ ├── route-traffic.sh
│ ├── rollback.sh
│ └── cleanup-blue.sh
├── testing/
│ ├── smoke-test.sh
│ ├── performance-test.sh
│ └── critical-path-test.sh
├── monitoring/
│ ├── health-check.sh
│ ├── monitor-metrics.sh
│ └── alert-status.sh
└── utilities/
├── backup.sh
├── restore.sh
└── notify.sh
```
### CI/CD Pipeline Integration
```yaml
# .github/workflows/deploy.yml
name: Deploy to Production
on:
push:
tags:
- 'v*'
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Run Pre-deployment Checks
run: ./scripts/pre-deployment-check.sh
- name: Deploy to Green Environment
run: ./scripts/deploy-green.sh ${{ github.ref }}
- name: Run Tests
run: ./scripts/run-all-tests.sh
- name: Switch Traffic
run: ./scripts/progressive-rollout.sh
```
## Contact Information
### Deployment Team
- **Lead**: Deployment Manager
- **DevOps**: @devops-team
- **Engineering**: @engineering-team
### Escalation
- **Engineering Manager**: For deployment issues
- **VP Engineering**: For critical failures
- **CTO**: For business-critical incidents
### Communication Channels
- **Slack**: #deployments (status updates)
- **PagerDuty**: Emergency escalation
- **Email**: deployment-team@company.com
---
**Last Updated**: {current_date}
**Version**: 1.0
**Owner**: DevOps Team