# Rollback Execution Task

This task provides a systematic approach for executing rollbacks in Salesforce environments when deployments fail or cause unexpected issues.

## Purpose

Enable release automation engineers to:

- Execute rapid and safe rollbacks
- Minimize system downtime
- Preserve data integrity
- Maintain audit trails
- Implement lessons learned

## Prerequisites

- Rollback plan documented and tested
- Backup systems verified
- Communication channels ready
- Incident response team available
- Rollback tools and scripts prepared

## Rollback Decision Framework

### 1. Rollback Triggers

**Severity Assessment Matrix**

```yaml
Critical (Immediate Rollback):
  - Complete system outage
  - Data corruption detected
  - Security breach identified
  - Core business process failure
  - Revenue impact > $10k/hour
  Decision Time: < 5 minutes

High (Rapid Decision Required):
  - Major feature malfunction
  - Performance degradation > 50%
  - Integration failures
  - User authentication issues
  - Customer-facing errors
  Decision Time: < 15 minutes

Medium (Evaluated Rollback):
  - Minor feature issues
  - UI/UX problems
  - Non-critical process errors
  - Performance degradation < 50%
  Decision Time: < 1 hour

Low (Fix Forward Preferred):
  - Cosmetic issues
  - Non-blocking errors
  - Edge case failures
  - Documentation mismatches
  Decision Time: Next maintenance window
```

### 2. Rollback Strategy Selection

**Strategy Decision Tree**

```yaml
Metadata Rollback:
  When: Code/configuration issues
  Method: Deploy previous package version
  Time: 15-30 minutes
  Risk: Low

Data Rollback:
  When: Data transformation errors
  Method: Restore from backup
  Time: 1-4 hours
  Risk: Medium (data loss potential)

Full System Rollback:
  When: Complete deployment failure
  Method: Restore full org backup
  Time: 4-8 hours
  Risk: High (significant data loss)

Selective Rollback:
  When: Specific component failure
  Method: Targeted component reversion
  Time: 30-60 minutes
  Risk: Low-Medium
```

### 3. Pre-Rollback Checklist

**Critical Validations**

```yaml
System State:
  - [ ] Current system status documented
  - [ ] Active user sessions identified
  - [ ] In-flight transactions logged
  - [ ] Integration status checked
  - [ ] Backup availability confirmed

Communication:
  - [ ] Stakeholders notified
  - [ ] Maintenance page activated
  - [ ] Support team alerted
  - [ ] Customer communication drafted
  - [ ] Executive briefing prepared

Technical Readiness:
  - [ ] Rollback scripts tested
  - [ ] Previous version available
  - [ ] Database restore point identified
  - [ ] Monitoring enhanced
  - [ ] War room established
```

## Rollback Execution Process

### Phase 1: Immediate Stabilization

**Emergency Response Actions**

```yaml
Minute 0-5:
  1. Activate incident response
  2. Assess impact severity
  3. Make rollback decision
  4. Notify key stakeholders
  5. Begin system isolation

Minute 5-15:
  1. Enable maintenance mode
  2. Stop ongoing deployments
  3. Preserve system state
  4. Initialize rollback process
  5. Enhance monitoring
```
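The "make rollback decision" step in minute 0-5 is where the severity matrix and strategy decision tree above get applied under time pressure. The following is a minimal TypeScript sketch of that mapping; the metric names, thresholds, and failure-type labels are lifted from the tables in this section for illustration only and are not part of any Salesforce or framework API.

```typescript
// Illustrative only: encodes the severity matrix and strategy decision tree
// from this section. Names and thresholds are assumptions, not an API.
type Severity = "critical" | "high" | "medium" | "low";
type FailureType =
  | "code_or_config"        // -> metadata rollback
  | "data_transformation"   // -> data rollback
  | "complete_deployment"   // -> full system rollback
  | "specific_component";   // -> selective rollback

interface ImpactMetrics {
  systemOutage: boolean;
  dataCorruption: boolean;
  securityBreach: boolean;
  revenueImpactPerHour: number;      // USD
  performanceDegradationPct: number; // percent vs. baseline
  customerFacingErrors: boolean;
}

// Severity Assessment Matrix: determines how quickly the rollback call must be made.
function assessSeverity(m: ImpactMetrics): Severity {
  if (m.systemOutage || m.dataCorruption || m.securityBreach || m.revenueImpactPerHour > 10_000) {
    return "critical"; // decision time: < 5 minutes
  }
  if (m.customerFacingErrors || m.performanceDegradationPct > 50) {
    return "high";     // decision time: < 15 minutes
  }
  if (m.performanceDegradationPct > 0) {
    return "medium";   // decision time: < 1 hour
  }
  return "low";        // fix forward in the next maintenance window
}

// Strategy Decision Tree: the rollback method follows the failure type.
function selectStrategy(failure: FailureType): string {
  switch (failure) {
    case "code_or_config":      return "metadata-rollback";    // 15-30 min, low risk
    case "data_transformation": return "data-rollback";        // 1-4 h, medium risk
    case "complete_deployment": return "full-system-rollback"; // 4-8 h, high risk
    case "specific_component":  return "selective-rollback";   // 30-60 min, low-medium risk
  }
}
```

Encoding the decision logic this way helps keep the 5- and 15-minute decision windows realistic: the on-call engineer supplies the observed metrics and failure type instead of re-reading the matrix during an incident.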
### Phase 2: Rollback Implementation

**Execution Steps by Type**

#### Metadata Rollback Process

```yaml
Step 1: Preparation
  - Identify previous stable version
  - Generate deployment package
  - Validate package contents
  - Prepare deployment commands

Step 2: Execution
  - Deploy to production
  - Monitor deployment progress
  - Validate component status
  - Check for errors

Step 3: Verification
  - Run smoke tests
  - Validate core functionality
  - Check integration points
  - Confirm user access
```

#### Data Rollback Process

```yaml
Step 1: Data Assessment
  - Identify affected records
  - Calculate data volume
  - Determine restore point
  - Plan restoration sequence

Step 2: Restoration
  - Export current state (audit)
  - Restore from backup
  - Validate record counts
  - Verify data integrity

Step 3: Reconciliation
  - Compare restored data
  - Identify gaps
  - Manual corrections
  - Update audit logs
```

### Phase 3: System Validation

**Post-Rollback Verification**

```yaml
Functional Validation:
  - Core business processes
  - User authentication
  - Data access permissions
  - Integration connectivity
  - Email deliverability

Performance Validation:
  - Response time baselines
  - System resource usage
  - Database performance
  - API throughput
  - Concurrent user capacity

Data Validation:
  - Record counts
  - Referential integrity
  - Calculation accuracy
  - Report correctness
  - Audit trail continuity
```

### Phase 4: System Restoration

**Return to Normal Operations**

```yaml
Gradual Restoration:
  1. Internal users first
  2. Pilot group validation
  3. Phased user enablement
  4. Full system activation
  5. Remove maintenance mode

Monitoring Enhancement:
  - Increased logging levels
  - Real-time dashboards
  - Alert thresholds lowered
  - Support team standby
  - Executive updates scheduled
```

## Rollback Automation

### Automated Rollback Scenarios

**Trigger-Based Automation**

```yaml
Health Check Failures:
  Trigger: 3 consecutive health check failures
  Action: Initiate metadata rollback
  Notification: Ops team + Management

Performance Degradation:
  Trigger: Response time > 5s for 5 minutes
  Action: Revert to previous version
  Notification: Full incident response

Error Rate Spike:
  Trigger: Error rate > 10% for 10 minutes
  Action: Selective component rollback
  Notification: Dev team + Support
```

### Rollback Orchestration

**Automated Workflow**

```yaml
workflow:
  name: automated_rollback
  triggers:
    - health_check_failure
    - performance_threshold_breach
    - error_rate_spike
  steps:
    - name: assess_impact
      timeout: 2m
      actions:
        - query_system_metrics
        - identify_affected_components
        - calculate_business_impact
    - name: decide_action
      timeout: 1m
      conditions:
        - if: impact == "critical"
          then: immediate_rollback
        - if: impact == "high"
          then: rapid_rollback
        - else: manual_review
    - name: execute_rollback
      timeout: 30m
      actions:
        - enable_maintenance_mode
        - backup_current_state
        - deploy_previous_version
        - validate_deployment
    - name: verify_stability
      timeout: 10m
      actions:
        - run_health_checks
        - validate_functionality
        - check_performance
        - confirm_data_integrity
```
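A short TypeScript sketch makes it easier to see how the trigger thresholds above could feed the `automated_rollback` workflow. The helpers `startWorkflow` and `notify` are hypothetical stand-ins for whatever orchestration and alerting tooling is in use, and a single sliding window of samples is a simplification of the separate 5- and 10-minute windows defined above.

```typescript
// Sketch of the trigger-based automation above. Thresholds mirror the YAML;
// startWorkflow/notify are hypothetical stand-ins for real tooling, and one
// sliding window approximates the distinct 5- and 10-minute windows.
const HEALTH_CHECK_FAILURE_LIMIT = 3;     // 3 consecutive failures
const RESPONSE_TIME_THRESHOLD_MS = 5_000; // > 5 s sustained
const ERROR_RATE_THRESHOLD = 0.10;        // > 10% of requests failing

interface HealthSample {
  healthy: boolean;
  responseTimeMs: number;
  errorRate: number;
}

async function evaluateTriggers(samples: HealthSample[]): Promise<void> {
  const recent = samples.slice(-HEALTH_CHECK_FAILURE_LIMIT);
  if (recent.length < HEALTH_CHECK_FAILURE_LIMIT) return; // not enough data yet

  if (recent.every(s => !s.healthy)) {
    await startWorkflow("automated_rollback", { trigger: "health_check_failure" });
    notify(["ops", "management"], "Metadata rollback initiated: 3 consecutive health check failures");
  } else if (recent.every(s => s.responseTimeMs > RESPONSE_TIME_THRESHOLD_MS)) {
    await startWorkflow("automated_rollback", { trigger: "performance_threshold_breach" });
    notify(["incident-response"], "Reverting to previous version: sustained response times > 5 s");
  } else if (recent.every(s => s.errorRate > ERROR_RATE_THRESHOLD)) {
    await startWorkflow("automated_rollback", { trigger: "error_rate_spike" });
    notify(["dev", "support"], "Selective component rollback: error rate above 10%");
  }
}

// Hypothetical integration points; swap in real orchestration and alerting calls.
async function startWorkflow(name: string, payload: Record<string, string>): Promise<void> {
  console.log(`Starting workflow ${name}`, payload);
}

function notify(channels: string[], message: string): void {
  console.log(`[${channels.join(", ")}] ${message}`);
}
```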
## Post-Rollback Activities

### Incident Analysis

**Root Cause Investigation**

```yaml
Immediate Analysis (Day 0):
  - Timeline reconstruction
  - Change correlation
  - Log analysis
  - Error pattern identification
  - Initial hypothesis

Deep Dive (Day 1-3):
  - Code review
  - Test gap analysis
  - Environment comparison
  - Process evaluation
  - Tool assessment

Prevention Planning (Day 4-7):
  - Process improvements
  - Additional test cases
  - Monitoring enhancements
  - Training requirements
  - Tool updates
```

### Documentation Requirements

**Rollback Report Template**

```markdown
## Rollback Incident Report

### Incident Summary

- **Date/Time:** [YYYY-MM-DD HH:MM UTC]
- **Duration:** [X hours Y minutes]
- **Severity:** [Critical/High/Medium]
- **Business Impact:** [Description]
- **Root Cause:** [Brief summary]

### Timeline

| Time | Event             | Action Taken       | Owner  |
| ---- | ----------------- | ------------------ | ------ |
| T+0  | Issue detected    | Alert triggered    | System |
| T+5  | Decision made     | Rollback initiated | [Name] |
| T+30 | Rollback complete | Validation started | [Name] |
| T+45 | System stable     | Normal operations  | [Name] |

### Technical Details

- **Failed Component:** [Component name]
- **Error Type:** [Classification]
- **Rollback Method:** [Method used]
- **Data Impact:** [If any]

### Lessons Learned

1. **What Went Well:**
   - [Positive outcome 1]
   - [Positive outcome 2]

2. **What Needs Improvement:**
   - [Improvement area 1]
   - [Improvement area 2]

3. **Action Items:**
   - [ ] [Action 1] - Owner: [Name] - Due: [Date]
   - [ ] [Action 2] - Owner: [Name] - Due: [Date]
```

## Success Criteria

- [ ] Rollback completed within RTO
- [ ] No data loss or corruption
- [ ] System stability restored
- [ ] Users properly informed
- [ ] Root cause identified
- [ ] Prevention measures implemented
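The first three success criteria are measurable and can be checked mechanically while closing the incident. The sketch below illustrates one way to do that under assumed inputs: the RTO value, the record-count capture, and the result shape are placeholders to adapt, not something this task prescribes.

```typescript
// Illustrative check of the measurable success criteria. The RTO value and
// the captured record counts are assumed inputs, not part of this framework.
interface RollbackResult {
  startedAt: Date;
  completedAt: Date;
  baselineRecordCounts: Record<string, number>; // counts at the restore point
  restoredRecordCounts: Record<string, number>; // counts after the rollback
  healthChecksPassed: boolean;
}

const RTO_MINUTES = 60; // assumed recovery time objective; substitute your own target

function verifySuccessCriteria(result: RollbackResult): string[] {
  const failures: string[] = [];

  // Rollback completed within RTO
  const durationMin = (result.completedAt.getTime() - result.startedAt.getTime()) / 60_000;
  if (durationMin > RTO_MINUTES) {
    failures.push(`Rollback took ${Math.round(durationMin)} min, exceeding the ${RTO_MINUTES} min RTO`);
  }

  // No data loss or corruption (record-count level check only)
  for (const [objectName, expected] of Object.entries(result.baselineRecordCounts)) {
    const actual = result.restoredRecordCounts[objectName] ?? 0;
    if (actual !== expected) {
      failures.push(`${objectName}: expected ${expected} records, found ${actual} after restore`);
    }
  }

  // System stability restored
  if (!result.healthChecksPassed) {
    failures.push("Post-rollback health checks did not pass");
  }

  return failures; // an empty list means the measurable criteria were met
}
```

The remaining criteria (communication, root cause, prevention measures) are process outcomes and belong in the rollback incident report above rather than in code.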