# ETL Patterns for Salesforce

## Overview

ETL (Extract, Transform, Load) patterns provide proven approaches for moving and transforming data into and out of Salesforce while maintaining data quality and system performance.

## Core ETL Concepts

### ETL vs ELT

**ETL (Extract, Transform, Load)**:
- Transform data before loading
- Processing happens in middleware
- Better for complex transformations
- Reduces load on the target system

**ELT (Extract, Load, Transform)**:
- Load raw data first
- Transform within Salesforce
- Leverages platform capabilities
- Simpler architecture

### Key Considerations

- **Volume**: Data quantity impacts the approach
- **Velocity**: Speed requirements
- **Variety**: Data types and sources
- **Veracity**: Quality requirements
- **Value**: Business importance

## Common ETL Patterns

### Pattern 1: Batch Integration

**Use Case**: Large-volume, non-real-time data sync

**Implementation**:
```
1. Extract: Query source system
2. Transform: Apply business rules
3. Stage: Temporary storage
4. Validate: Quality checks
5. Load: Bulk API operation
6. Verify: Post-load validation
```

**Best Practices**:
- Use the Bulk API for >10k records
- Implement parallel processing
- Handle errors gracefully
- Monitor API limits
- Schedule during off-hours

### Pattern 2: Real-Time Sync

**Use Case**: Immediate data synchronization

**Implementation**:
```
1. Trigger: Source system event
2. Extract: Get changed data
3. Transform: Apply mappings
4. Load: REST/SOAP API call
5. Confirm: Acknowledge receipt
```

**Best Practices**:
- Use Platform Events for high volume (see the publishing sketch below)
- Implement circuit breakers
- Queue for resilience
- Monitor latency
- Handle failures gracefully
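To make the Platform Events bullet concrete, here is a minimal publishing sketch. The `Order_Sync__e` event and its fields are hypothetical placeholders you would define in Setup; only the `EventBus.publish` call is standard platform API.

```apex
// Sketch: publishing a platform event from Apex for real-time sync.
// Order_Sync__e and its fields are hypothetical; define them in Setup first.
public class OrderSyncPublisher {
    public static void publishChanges(List<Order> changedOrders) {
        List<Order_Sync__e> events = new List<Order_Sync__e>();
        for (Order o : changedOrders) {
            events.add(new Order_Sync__e(
                Record_Id__c = o.Id,
                Payload__c = JSON.serialize(o)
            ));
        }
        // EventBus.publish returns one SaveResult per event; check for failures
        List<Database.SaveResult> results = EventBus.publish(events);
        for (Database.SaveResult sr : results) {
            if (!sr.isSuccess()) {
                // Re-queue or log failed publishes rather than dropping them
                System.debug(LoggingLevel.ERROR, sr.getErrors());
            }
        }
    }
}
```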
### Pattern 3: Change Data Capture (CDC)

**Use Case**: Sync only changed records

**Implementation**:
```
1. Enable: CDC on objects
2. Subscribe: Change events
3. Process: Handle changes
4. Transform: Apply logic
5. Update: Target system
```

**Best Practices**:
- Filter unnecessary changes
- Handle redelivery (see the subscriber sketch below)
- Maintain event order
- Monitor event volume
- Plan the retention period
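One way to consume change events on-platform is an Apex change event trigger. A minimal sketch, assuming CDC is enabled for Account; `SyncQueue` is a hypothetical helper that forwards changes to the target system:

```apex
// Sketch: Apex subscriber for Account change events (requires CDC enabled
// for Account). Change event triggers run only in the after-insert context.
trigger AccountChangeTrigger on AccountChangeEvent (after insert) {
    for (AccountChangeEvent event : Trigger.new) {
        EventBus.ChangeEventHeader header = event.ChangeEventHeader;
        // Filter to the change types the target system cares about
        if (header.changeType == 'CREATE' || header.changeType == 'UPDATE') {
            for (String recordId : header.recordIds) {
                // SyncQueue is a hypothetical helper, e.g. a Queueable enqueuer
                SyncQueue.enqueue(recordId, header.changeType);
            }
        }
    }
}
```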
### Pattern 4: Master Data Management

**Use Case**: Salesforce as the single source of truth

**Implementation**:
```
1. Identify: Master records
2. Match: Find duplicates
3. Merge: Consolidate data
4. Enrich: Add missing data
5. Distribute: Sync to systems
```

**Best Practices**:
- Define match rules clearly
- Implement survivorship rules
- Maintain an audit trail
- Handle conflicts
- Monitor data quality

## Data Extraction Patterns

### Query-Based Extraction

**SOQL for Targeted Data** (`:lastRunDate` is an Apex bind variable supplied at runtime):
```sql
SELECT Id, Name, LastModifiedDate
FROM Account
WHERE LastModifiedDate > :lastRunDate
AND Type = 'Customer'
```

**Considerations**:
- Query optimization
- Selective filters
- Relationship queries
- Governor limits
- Pagination handling

### Bulk Data Extraction

**Bulk API Usage**:
```python
import time

# Assumes `bulk` is an authenticated client (e.g., salesforce-bulk's SalesforceBulk)
job = bulk.create_query_job("Account")
batch = bulk.query(job, "SELECT Id, Name FROM Account")

# Poll until Salesforce finishes processing the batch
while not bulk.is_batch_done(batch):
    time.sleep(10)

results = bulk.get_all_results_for_query_batch(batch)
```

**Best Practices**:
- Chunk large datasets
- Handle timeouts
- Monitor job status
- Stream results while processing
- Clean up completed jobs

### Report/Analytics API

**Extract Aggregated Data**:
```
1. Define report criteria
2. Execute report via API
3. Parse results
4. Transform as needed
5. Load to target
```

**Use Cases**:
- Summary data extraction
- Complex calculations
- Cross-object aggregations
- Trending analysis

## Transformation Patterns

### Field Mapping

**Simple Mapping**:
```json
{
  "source_field": "target_field",
  "FirstName": "First_Name__c",
  "LastName": "Last_Name__c",
  "Email": "Email__c"
}
```

**Complex Mapping**:
```javascript
// Concatenation
target.Full_Name__c = source.FirstName + ' ' + source.LastName;

// Lookup transformation
target.Account__c = lookupAccountId(source.CompanyName);

// Conditional logic
target.Status__c = source.IsActive ? 'Active' : 'Inactive';
```

### Data Type Conversion

**Common Conversions**:
```javascript
// String to Date (Date.parse returns a timestamp, so construct a Date instead)
target.Birth_Date__c = new Date(source.DOB);

// Number formatting
target.Revenue__c = parseFloat(source.Revenue.replace(/[^0-9.]/g, ''));

// Boolean conversion
target.Is_Active__c = source.Status === 'Active';

// Picklist mapping
target.Type__c = mapPicklistValue(source.CustomerType);
```

### Data Quality Transformations

**Cleansing Operations**:
```javascript
// Standardize phone
target.Phone = formatPhone(source.Phone);

// Clean email
target.Email = source.Email.toLowerCase().trim();

// Standardize company name
target.Account_Name__c = standardizeCompanyName(source.Company);

// Address formatting
target.Billing_Address__c = formatAddress(source.Address);
```

### Business Rule Application

**Complex Logic**:
```javascript
// Lead scoring
target.Lead_Score__c = calculateLeadScore({
  title: source.Title,
  company_size: source.Employees,
  industry: source.Industry,
});

// Territory assignment
target.Territory__c = assignTerritory({
  state: source.State,
  revenue: source.Annual_Revenue,
});

// Categorization
target.Customer_Segment__c = determineSegment(source);
```

## Loading Patterns

### Bulk API Loading

**Optimal for Large Volumes**:
```python
def bulk_load_records(records, object_name):
    job = bulk.create_insert_job(object_name)
    batches = []

    # Create batches of 10,000 records (the Bulk API maximum per batch)
    for i in range(0, len(records), 10000):
        batch = records[i:i+10000]
        batches.append(bulk.post_batch(job, batch))

    # Monitor completion, one batch at a time
    for batch in batches:
        bulk.wait_for_batch(job, batch)

    # Check results
    for batch in batches:
        results = bulk.get_batch_results(batch)
        process_results(results)
```

### Upsert Operations

**Insert or Update Based on External ID**:
```apex
List<Account> accounts = new List<Account>();
for (ExternalData__c ext : externalData) {
    accounts.add(new Account(
        External_ID__c = ext.Id,
        Name = ext.CompanyName
        // ... other fields
    ));
}
Database.upsert(accounts, Account.External_ID__c, false);
```

### Relationship Loading

**Parent-Child Relationships**:
```json
// Using external IDs
{
  "Name": "John Doe",
  "Account__r": { "External_ID__c": "EXT-12345" }
}

// Using Salesforce IDs
{
  "Name": "Opportunity ABC",
  "AccountId": "001XX000003DHPh"
}
```

## Error Handling Patterns

### Retry Logic

```javascript
const maxRetries = 3;
const retryDelay = 5000; // 5 seconds

// sleep() and isRetryable() are assumed helpers
async function loadWithRetry(data, attempt = 1) {
  try {
    return await salesforceAPI.insert(data);
  } catch (error) {
    if (attempt < maxRetries && isRetryable(error)) {
      await sleep(retryDelay * attempt);
      return loadWithRetry(data, attempt + 1);
    }
    throw error;
  }
}
```

### Error Logging

```apex
public class ETLErrorLogger {
    public static void logError(String process, String record, Exception e) {
        ETL_Error_Log__c errorLog = new ETL_Error_Log__c(
            Process__c = process,
            Record_Identifier__c = record,
            Error_Message__c = e.getMessage(),
            Stack_Trace__c = e.getStackTraceString(),
            Timestamp__c = DateTime.now()
        );
        insert errorLog;
    }
}
```

### Dead Letter Queue

```javascript
// Failed records go to a dead letter queue for later inspection and replay
function processWithDLQ(records) {
  const failed = [];
  for (const record of records) {
    try {
      processRecord(record);
    } catch (error) {
      failed.push({
        record: record,
        error: error.message,
        timestamp: new Date(),
      });
    }
  }
  if (failed.length > 0) {
    moveToDeadLetterQueue(failed);
  }
}
```

## Performance Optimization

### Parallel Processing

```python
import concurrent.futures
import math

def parallel_load(records, num_threads=5):
    chunk_size = math.ceil(len(records) / num_threads)
    chunks = [records[i:i+chunk_size] for i in range(0, len(records), chunk_size)]

    with concurrent.futures.ThreadPoolExecutor(max_workers=num_threads) as executor:
        futures = [executor.submit(load_chunk, chunk) for chunk in chunks]
        results = [f.result() for f in concurrent.futures.as_completed(futures)]

    return results
```

### Bulk API Best Practices

```javascript
// Optimal batch sizing
const OPTIMAL_BATCH_SIZE = 10000;
const MAX_BATCHES_PER_JOB = 100;

// Compression for large payloads (gzip() is an assumed helper)
const compressedData = gzip(JSON.stringify(records));

// Binary attachments handling
const attachments = records.filter((r) => r.hasAttachment);
const regularRecords = records.filter((r) => !r.hasAttachment);

// Process separately for performance
processBulkRecords(regularRecords);
processAttachments(attachments);
```

## Monitoring and Logging

### ETL Job Monitoring

```apex
public class ETLJobMonitor {
    public static void startJob(String jobName) {
        ETL_Job__c job = new ETL_Job__c(
            Name = jobName,
            Status__c = 'Running',
            Start_Time__c = DateTime.now()
        );
        insert job;
    }

    public static void updateProgress(Id jobId, Integer processed, Integer total) {
        ETL_Job__c job = new ETL_Job__c(
            Id = jobId,
            Records_Processed__c = processed,
            Total_Records__c = total,
            Progress__c = (Decimal)processed / total * 100
        );
        update job;
    }
}
```

### Performance Metrics

```javascript
class ETLMetrics {
  constructor() {
    this.startTime = Date.now();
    this.recordsProcessed = 0;
    this.errors = 0;
  }

  recordProcessed(success = true) {
    this.recordsProcessed++;
    if (!success) this.errors++;
  }

  getMetrics() {
    const duration = Date.now() - this.startTime;
    return {
      duration: duration,
      recordsPerSecond: this.recordsProcessed / (duration / 1000),
      errorRate: this.errors / this.recordsProcessed,
      successRate: 1 - this.errors / this.recordsProcessed,
    };
  }
}
```

## Security Considerations

### Credential Management

- Use Named Credentials (see the callout sketch below)
- Implement OAuth where possible
- Rotate API keys regularly
- Encrypt sensitive data in transit
- Use secure storage for credentials
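To make the Named Credentials bullet concrete, here is a minimal callout sketch. `Target_System` is a hypothetical Named Credential configured in Setup; the `callout:` prefix resolves the endpoint and injects authentication so no secrets live in code:

```apex
// Sketch: callout via a Named Credential so auth is handled by platform config.
// 'Target_System' is a hypothetical Named Credential defined in Setup.
public class TargetSystemClient {
    public static HttpResponse sendRecords(String jsonPayload) {
        HttpRequest req = new HttpRequest();
        // The callout: prefix expands to the credential's URL plus auth headers
        req.setEndpoint('callout:Target_System/api/records');
        req.setMethod('POST');
        req.setHeader('Content-Type', 'application/json');
        req.setBody(jsonPayload);
        return new Http().send(req);
    }
}
```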
### Data Security

```apex
// Field-level encryption
// Note: generate the AES key once and store it securely (e.g., in a protected
// custom setting); generating a fresh key per call, as shown, makes the
// ciphertext unrecoverable.
public String encryptSensitiveData(String data) {
    Blob key = Crypto.generateAesKey(256);
    Blob encrypted = Crypto.encryptWithManagedIV('AES256', key, Blob.valueOf(data));
    return EncodingUtil.base64Encode(encrypted);
}

// Masking sensitive data in logs (assumes the value is at least 8 characters)
public String maskSensitiveData(String data) {
    return data.substring(0, 4) + '****' + data.substring(data.length() - 4);
}
```

## Best Practices Summary

1. **Plan Thoroughly**: Understand data volumes and patterns
2. **Use Appropriate APIs**: Bulk for volume, REST for real-time
3. **Handle Errors Gracefully**: Implement retry and logging
4. **Monitor Performance**: Track metrics and optimize
5. **Ensure Data Quality**: Validate before and after
6. **Secure Data**: Encrypt and protect sensitive information
7. **Document Mappings**: Maintain transformation documentation
8. **Test Extensively**: Include edge cases and error scenarios
9. **Plan for Scale**: Design for future growth
10. **Maintain an Audit Trail**: Track all data movements