# Salesforce Data Migration Best Practices
## Overview
Data migration is a critical component of Salesforce implementations. These best
practices ensure successful, efficient, and accurate data transfer while
minimizing risks and downtime.
## Migration Planning
### Pre-Migration Assessment
- **Data Inventory**: Catalog all data sources
- **Quality Assessment**: Current state of data quality (completeness, accuracy, consistency)
- **Volume Analysis**: Record counts and sizes
- **Complexity Evaluation**: Relationships and dependencies
- **Business Rules**: Validation and transformation needs
- **Timeline Requirements**: Deadlines and constraints
- **Resource Planning**: Team and tools needed
### Migration Strategy
**Big Bang Migration**:
- All data migrated at once
- Shorter timeline
- Higher risk
- Clear cutover
**Phased Migration**:
- Data migrated in stages
- Lower risk
- Longer timeline
- Complex coordination
**Parallel Run**:
- Both systems operate simultaneously
- Lowest risk
- Highest cost
- Complex synchronization
### Success Criteria
- **Data Completeness**: All required data migrated
- **Data Accuracy**: Information correctly transferred
- **Performance Targets**: Migration speed requirements
- **Business Continuity**: Minimal disruption
- **Quality Thresholds**: Acceptable error rates
## Data Preparation
### Data Profiling
```sql
-- Analyze source data
SELECT
    COUNT(*) AS total_records,
    COUNT(DISTINCT id) AS unique_records,
    SUM(CASE WHEN email IS NULL THEN 1 ELSE 0 END) AS missing_emails,
    SUM(CASE WHEN phone IS NULL THEN 1 ELSE 0 END) AS missing_phones
FROM source_contacts;
```
### Data Cleansing
**Common Cleansing Tasks**:
1. **Standardization**:
- Company names
- Addresses
- Phone formats
- Date formats
2. **Deduplication**:
- Identify duplicates
- Define merge rules
- Preserve relationships
- Maintain audit trail
3. **Validation**:
- Email format
- Required fields
- Data types
- Value ranges
4. **Enrichment**:
- Missing data
- Default values
- Calculated fields
- Reference data
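Concretely, a minimal cleansing pass might combine standardization, validation, enrichment, and deduplication in one place. The field names, email regex, and keep-first merge rule below are illustrative, not a prescription:
```python
import re

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def cleanse_contact(raw):
    """Standardize, validate, and enrich one source contact record."""
    record = {
        'first_name': raw.get('first_name', '').strip().title(),
        'last_name': raw.get('last_name', '').strip().title(),
        'email': raw.get('email', '').strip().lower(),
        'country': raw.get('country') or 'US',  # enrichment: default value
    }
    errors = []
    if not record['last_name']:
        errors.append('missing required last_name')
    if record['email'] and not EMAIL_RE.match(record['email']):
        errors.append(f"invalid email: {record['email']}")
    return record, errors

def dedupe(records, key='email'):
    """Keep the first record per key; route the rest to manual review."""
    seen, unique, duplicates = set(), [], []
    for rec in records:
        k = rec.get(key)
        if k and k in seen:
            duplicates.append(rec)
        else:
            seen.add(k)
            unique.append(rec)
    return unique, duplicates
```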
### Data Mapping
**Mapping Documentation**:
```yaml
Account_Mapping:
  source_table: companies
  target_object: Account
  fields:
    - source: company_name
      target: Name
      transformation: TRIM(UPPER(company_name))
    - source: annual_rev
      target: AnnualRevenue
      transformation: CAST(annual_rev AS DECIMAL)
    - source: emp_count
      target: NumberOfEmployees
      transformation: CAST(emp_count AS INTEGER)
  relationships:
    - source: parent_company_id
      target: ParentId
      lookup_field: External_ID__c
```
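Such a mapping file can drive the transformation code directly. The sketch below assumes the spec is saved as `mapping.yaml` and swaps the SQL-style expressions for simple Python callables; a real pipeline would need a proper expression engine:
```python
import yaml  # PyYAML; assumes the spec above is saved as mapping.yaml

# Simplified stand-ins for the SQL-style expressions in the mapping file
TRANSFORMS = {
    'company_name': lambda v: v.strip().upper() if v else None,
    'annual_rev': lambda v: float(v) if v else None,
    'emp_count': lambda v: int(v) if v else None,
}

def apply_mapping(mapping, source_row):
    """Build one target record from a source row using the mapping spec."""
    target = {}
    for field in mapping['fields']:
        transform = TRANSFORMS.get(field['source'], lambda v: v)
        target[field['target']] = transform(source_row.get(field['source']))
    return target

with open('mapping.yaml') as f:
    spec = yaml.safe_load(f)['Account_Mapping']

row = {'company_name': '  acme corp ', 'annual_rev': '1200000', 'emp_count': '250'}
# -> {'Name': 'ACME CORP', 'AnnualRevenue': 1200000.0, 'NumberOfEmployees': 250}
print(apply_mapping(spec, row))
```
Keeping the mapping in data rather than code makes it reviewable by business owners and diffable between test runs.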
## Migration Approach
### Order of Operations
1. **Reference Data**:
- Users
- Roles
- Profiles
- Record Types
- Picklist Values
2. **Master Data**:
- Accounts
- Contacts
- Products
- Price Books
3. **Transactional Data**:
- Opportunities
- Cases
- Orders
- Custom Objects
4. **Historical Data**:
- Activities
- Notes
- Attachments
- Field History
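A small driver can make this ordering explicit and stop before loading children whose parents failed; the loader callables below are placeholders for real load jobs:
```python
# Each phase must finish before the next starts, because later phases
# hold lookups to records created earlier. Object names are standard
# Salesforce API names; the phase contents here are abbreviated.
LOAD_PHASES = [
    ('reference', ['User', 'RecordType']),
    ('master', ['Account', 'Contact', 'Product2', 'Pricebook2']),
    ('transactional', ['Opportunity', 'Case', 'Order']),
    ('historical', ['Task', 'ContentVersion']),
]

def run_migration(loaders):
    """loaders: dict of object API name -> callable returning failed rows."""
    for phase, objects in LOAD_PHASES:
        print(f"Starting phase: {phase}")
        for obj in objects:
            failures = loaders[obj]()
            if failures:
                raise RuntimeError(
                    f"{obj} load had {len(failures)} failures; "
                    "stopping before dependent phases"
                )
```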
### Relationship Management
**Parent-Child Dependencies**:
```python
# Load parents first
accounts = load_accounts()
account_map = {a.external_id: a.id for a in accounts}

# Then load children with relationships
contacts = []
for contact_data in source_contacts:
    contact = Contact(
        FirstName=contact_data['first_name'],
        LastName=contact_data['last_name'],
        AccountId=account_map.get(contact_data['company_id'])
    )
    contacts.append(contact)
```
### External ID Strategy
- **Purpose**: Maintain source system references
- **Implementation**: Custom field on each object
- **Format**: SourceSystem_OriginalID
- **Usage**: Upsert operations and relationships
- **Benefits**: Simplifies updates and troubleshooting
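With simple-salesforce (one common Python client, assumed here), an external-ID upsert is a single call, which makes reruns idempotent:
```python
from simple_salesforce import Salesforce  # assumed client library

sf = Salesforce(username='user@example.com', password='...', security_token='...')

# Upserting on the external ID makes the load idempotent: a rerun
# updates existing rows instead of creating duplicates.
accounts = [
    {'External_ID__c': 'LEGACY_10042', 'Name': 'ACME CORP'},
    {'External_ID__c': 'LEGACY_10043', 'Name': 'GLOBEX'},
]
results = sf.bulk.Account.upsert(accounts, 'External_ID__c')
for rec, res in zip(accounts, results):
    if not res['success']:
        print(rec['External_ID__c'], res['errors'])
```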
## Migration Execution
### Tool Selection
**Data Loader**:
- Best for: <5 million records
- Features: GUI and CLI
- Pros: Free, simple
- Cons: Limited transformation
**Bulk API**:
- Best for: Large volumes
- Features: Async processing
- Pros: Efficient, parallel
- Cons: Complex error handling
**ETL Tools**:
- Best for: Complex transformations
- Options: Informatica, Talend, MuleSoft
- Pros: Powerful features
- Cons: Cost, learning curve
### Performance Optimization
**Batch Processing**:
```python
def migrate_records(records, batch_size=10000):
    total = len(records)
    for i in range(0, total, batch_size):
        batch = records[i:i + batch_size]
        result = bulk_api.insert(batch)
        log_progress(i, total, result)
        handle_errors(result.errors)
```
**Parallel Processing**:
- Split data by logical boundaries
- Run multiple jobs simultaneously
- Monitor API limits
- Coordinate dependencies
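A sketch of this pattern with `concurrent.futures`, reusing the `migrate_records` loader above; the regional partitions are assumed datasets with no cross-partition lookups:
```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def migrate_partition(name, records):
    """Run one independent partition and report what it handled."""
    migrate_records(records)  # batch loader defined above
    return name, len(records)

# amer_records / emea_records / apac_records are placeholder datasets
# split on a boundary with no shared parents, so parallel jobs never
# contend for the same rows.
partitions = {'AMER': amer_records, 'EMEA': emea_records, 'APAC': apac_records}

with ThreadPoolExecutor(max_workers=3) as pool:  # stay under API job limits
    futures = {pool.submit(migrate_partition, n, r): n for n, r in partitions.items()}
    for future in as_completed(futures):
        name, count = future.result()
        print(f"{name}: {count:,} records migrated")
```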
**Optimization Techniques**:
1. Disable triggers during load
2. Defer sharing calculations
3. Disable workflow rules
4. Turn off duplicate rules
5. Bulk API for large volumes
6. Binary format for attachments
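Bypasses like items 1-4 are often wired to a hierarchy custom setting that triggers, flows, and validation rules check before running. The setting and field names below (`Migration_Bypass__c`, `Bypass_Triggers__c`) are assumptions about your org, not standard objects:
```python
# Assumption: the org defines a hierarchy custom setting
# Migration_Bypass__c whose checkbox fields every trigger, flow, and
# validation rule consults before running.
def set_bypass(sf, setting_id, enabled):
    """Toggle automation bypass around the load window."""
    sf.Migration_Bypass__c.update(setting_id, {
        'Bypass_Triggers__c': enabled,
        'Bypass_Validation_Rules__c': enabled,
    })

# set_bypass(sf, org_default_id, True)   # before the load
# ... run the migration ...
# set_bypass(sf, org_default_id, False)  # re-enable automation
```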
## Quality Assurance
### Validation Approach
**Record Count Validation**:
```sql
-- Source count (source database SQL)
SELECT COUNT(*) FROM source_table;

-- Target count (SOQL; COUNT() without an argument is valid SOQL)
SELECT COUNT() FROM Target_Object__c

-- Counts should match, after accounting for filters and transformations
```
**Data Sampling**:
```python
import random

def validate_sample(source_records, target_records, sample_size=1000):
    # random.sample needs a sequence, so materialize the keys first
    sample_ids = random.sample(list(source_records),
                               min(sample_size, len(source_records)))
    for rec_id in sample_ids:
        source = source_records[rec_id]
        target = target_records[rec_id]
        # Compare fields
        assert source.name == target.Name
        assert source.email == target.Email__c
        # ... additional validations
```
**Relationship Validation**:
- Verify parent-child relationships
- Check lookup field populations
- Validate junction object records
- Confirm sharing rules applied
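One way to automate the first two checks is a SOQL sweep for orphaned children; the object and field names are illustrative:
```python
# Sweep for child records whose parent lookup never resolved.
orphans = sf.query_all(
    "SELECT Id, External_ID__c FROM Contact "
    "WHERE AccountId = null AND External_ID__c != null"
)
if orphans['totalSize']:
    print(f"{orphans['totalSize']} contacts missing their Account lookup")
    for rec in orphans['records'][:10]:  # show a sample for triage
        print('  ', rec['External_ID__c'])
```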
### Testing Strategy
**Test Migration Phases**:
1. **Unit Testing**: Individual transformations
2. **Sample Testing**: Small data subset
3. **UAT Testing**: Business validation
4. **Full Testing**: Complete dataset
5. **Performance Testing**: Load times
6. **Rollback Testing**: Recovery procedures
## Error Handling
### Error Categories
**Data Errors**:
- Validation rule failures
- Required field missing
- Invalid picklist values
- Duplicate detection
- Format mismatches
**System Errors**:
- API limits exceeded
- Timeout errors
- Connection failures
- Permission errors
- Storage limits
### Error Resolution
```python
from datetime import datetime

class MigrationErrorHandler:
    def __init__(self):
        self.error_log = []
        self.retry_queue = []

    def handle_error(self, record, error):
        if self.is_retryable(error):
            self.retry_queue.append(record)
        else:
            self.error_log.append({
                'record': record,
                'error': error,
                'timestamp': datetime.now()
            })

    def process_retries(self):
        for record in self.retry_queue:
            try:
                self.migrate_record(record)
            except Exception as e:
                self.log_permanent_error(record, e)
```
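The handler above leaves `is_retryable` undefined. One reasonable sketch keys on Salesforce status codes, retrying transient lock and limit errors while routing data errors straight to the permanent log:
```python
# Transient errors worth retrying; everything else is treated as a
# data error. The codes shown are real Salesforce status codes, but
# tune the set to what your loads actually hit.
RETRYABLE_CODES = {
    'UNABLE_TO_LOCK_ROW',      # row lock contention under parallel load
    'REQUEST_LIMIT_EXCEEDED',  # API limits; retry after backing off
}

def is_retryable(error):
    """Classify an error dict shaped like {'statusCode': ..., 'message': ...}."""
    return error.get('statusCode', '') in RETRYABLE_CODES
```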
## Post-Migration Activities
### Validation Checklist
- [ ] Record counts match
- [ ] Key fields populated
- [ ] Relationships intact
- [ ] Business rules applied
- [ ] Reports functioning
- [ ] Integrations working
- [ ] Performance acceptable
- [ ] Security verified
- [ ] Users can access data
- [ ] Workflows triggered
### Data Reconciliation
**Reconciliation Report**:
```sql
-- Missing records: source rows with no mapped target
SELECT s.id
FROM source_table s
LEFT JOIN target_mapping t ON s.id = t.source_id
WHERE t.source_id IS NULL;

-- Data discrepancies: field values that changed in flight
SELECT
    s.id,
    s.field AS source_value,
    t.field AS target_value
FROM source_table s
JOIN target_table t ON s.id = t.external_id
WHERE s.field != t.field;
```
### Decommissioning
1. **Final Backup**: Archive source data
2. **Access Removal**: Revoke source system access
3. **Documentation**: Update system inventory
4. **Communication**: Notify stakeholders
5. **Retention**: Follow data retention policies
## Common Pitfalls
### Pitfall: Underestimating Complexity
**Solution**: Thorough analysis and planning
### Pitfall: Poor Data Quality
**Solution**: Invest in cleansing upfront
### Pitfall: Inadequate Testing
**Solution**: Multiple test iterations
### Pitfall: Missing Dependencies
**Solution**: Map all relationships
### Pitfall: No Rollback Plan
**Solution**: Detailed recovery procedures
## Best Practices Summary
1. **Plan Thoroughly**: 80% planning, 20% execution
2. **Clean First**: Fix data quality at source
3. **Test Iteratively**: Multiple test runs
4. **Document Everything**: Mappings, issues, decisions
5. **Communicate Constantly**: Keep stakeholders informed
6. **Monitor Closely**: Track progress and issues
7. **Have Rollback Plan**: Be ready to revert
8. **Validate Thoroughly**: Multiple validation methods
9. **Train Users**: On new data structure
10. **Celebrate Success**: Recognize team efforts
## Migration Checklist
### Pre-Migration
- [ ] Data inventory complete
- [ ] Quality assessment done
- [ ] Mapping documented
- [ ] Test plan created
- [ ] Tools selected
- [ ] Team trained
- [ ] Stakeholders aligned
### During Migration
- [ ] Backups taken
- [ ] Monitoring active
- [ ] Issues logged
- [ ] Progress tracked
- [ ] Communication ongoing
### Post-Migration
- [ ] Validation complete
- [ ] Reconciliation done
- [ ] Users trained
- [ ] Documentation updated
- [ ] Lessons learned captured
- [ ] Success celebrated