sf-agent-framework

AI Agent Orchestration Framework for Salesforce Development - Two-phase architecture with 70% context reduction

# Data Profiler Utility

This utility provides comprehensive data profiling capabilities for Salesforce orgs, analyzing data quality, patterns, and characteristics.

## Purpose

Automated data analysis to understand:

- Data quality metrics
- Field usage patterns
- Data volume distribution
- Relationship integrity
- Storage optimization opportunities

## Core Features

### 1. Field Analysis

```javascript
profileFields({
  objects: ['Account', 'Contact', 'Opportunity'],
  analysis: {
    nullability: true,
    uniqueness: true,
    patterns: true,
    distributions: true,
    outliers: true,
  },
});
```

### 2. Data Quality Assessment

```javascript
assessDataQuality({
  checks: {
    completeness: { threshold: 90 },
    accuracy: { validateAgainst: 'rules' },
    consistency: { crossObject: true },
    timeliness: { maxAge: '2 years' },
    duplicates: { fuzzyMatch: true },
  },
});
```

### 3. Storage Analysis

```javascript
analyzeStorage({
  objects: ['*'],
  metrics: ['record-count', 'storage-used', 'growth-rate', 'archive-candidates', 'large-attachments'],
});
```

## Profiling Categories

### Data Characteristics

- Field population rates
- Value distributions
- Pattern detection
- Statistical summaries
- Cardinality analysis

### Data Quality Metrics

- Completeness scores
- Accuracy validation
- Consistency checks
- Duplicate detection
- Anomaly identification

### Relationship Analysis

- Parent-child relationships
- Orphaned records
- Circular references
- Lookup integrity
- Junction object usage

### Performance Impact

- Large data volumes
- Wide tables
- Skewed data
- Index effectiveness
- Query performance

## Usage Examples

### Basic Data Profiling

```bash
# Profile specific object
profileData --object Account

# Profile all custom objects
profileData --custom-only

# Generate profiling report
profileData --output data-profile.html
```

### Automated Profiling

```yaml
schedule:
  weekly:
    - profileData --quick-scan
  monthly:
    - profileData --comprehensive
    - generateDataQualityReport
```

## Profile Results

### Summary Report

```
Data Profile Summary - Account Object
====================================
Total Records: 1,245,678
Data Quality Score: 87%
Storage Used: 2.3 GB

Key Findings:
✓ 95% field population rate
⚠ 3,421 potential duplicates
✗ 12% records missing required fields
```

### Detailed Analysis

```json
{
  "object": "Account",
  "profile": {
    "recordCount": 1245678,
    "fields": {
      "Name": {
        "populated": 100,
        "unique": 98.5,
        "avgLength": 35,
        "patterns": ["Company Inc", "LLC", "Corp"]
      },
      "Phone": {
        "populated": 78,
        "format": "mixed",
        "invalid": 234
      }
    },
    "quality": {
      "completeness": 87,
      "accuracy": 92,
      "duplicates": 3421
    }
  }
}
```

## Configuration

### Profiling Rules

```yaml
profilingRules:
  dataQuality:
    required_fields:
      Account: [Name, Type, Industry]
      Contact: [LastName, Email]
    validation_rules:
      Email: regex:^[^\s@]+@[^\s@]+\.[^\s@]+$
      Phone: regex:^\+?[\d\s\-\(\)]+$
  thresholds:
    high_volume: 1000000
    low_population: 10
    duplicate_threshold: 95
```

### Custom Profiling

```javascript
// Define custom profiling logic
addProfiler({
  name: 'industry-specific-validation',
  description: 'Validate industry-specific data requirements',
  profile: (records) => {
    // Custom profiling logic
    return analysis;
  },
});
```

## Data Quality Improvements

### Automated Cleanup

```javascript
// Suggest and apply data improvements
improveDataQuality({
  standardizeFormats: true,
  deduplicateRecords: true,
  fillMissingRequired: true,
  archiveOldData: { olderThan: '5 years' },
});
```

### Recommendations

Each profile includes:

- Data quality improvement suggestions
- Field optimization opportunities
- Storage reduction strategies
- Performance enhancement tips

## Integration Points

### With Data Management

- Migration planning
- Archive strategies
- Data governance
- Master data management

### With Development

- Field usage analysis
- Schema optimization
- Query performance
- Test data generation

## Visualization

### Charts and Graphs

- Field population heatmaps
- Value distribution histograms
- Trend analysis charts
- Relationship diagrams
- Quality score dashboards

### Export Formats

- HTML reports
- PDF summaries
- CSV data files
- JSON analysis
- Excel workbooks

## Best Practices

1. **Regular Profiling**
   - Weekly quick profiles
   - Monthly deep analysis
   - Pre-migration profiling
   - Post-deployment validation

2. **Action-Oriented**
   - Focus on actionable insights
   - Prioritize high-impact issues
   - Track improvement trends
   - Measure success metrics

3. **Comprehensive Coverage**
   - Profile all objects
   - Include custom fields
   - Analyze relationships
   - Consider all data types

4. **Continuous Monitoring**
   - Set quality baselines
   - Alert on degradation
   - Track improvements
   - Report to stakeholders

This utility provides deep insights into your Salesforce data quality and characteristics.
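
As an illustration of the field-level metrics reported under Detailed Analysis (`populated`, `unique`, `avgLength`), here is a minimal sketch of how they could be computed over records already loaded into memory. `profileField` and the sample records are hypothetical and are not part of this utility's API. The metric definitions are assumptions too: uniqueness, for instance, is measured here against populated values only.

```javascript
// Hypothetical helper, not part of the utility's API.
// Computes population rate, uniqueness, and average length for one field.
function profileField(records, field) {
  // Treat null, undefined, and blank strings as unpopulated.
  const values = records
    .map((record) => record[field])
    .filter((value) => value !== null && value !== undefined && String(value).trim() !== '');

  const total = records.length;
  const distinct = new Set(values.map((value) => String(value))).size;
  const totalLength = values.reduce((sum, value) => sum + String(value).length, 0);

  return {
    // Percentage of records with a non-empty value.
    populated: total === 0 ? 0 : (values.length / total) * 100,
    // Percentage of populated values that are distinct (assumed definition).
    unique: values.length === 0 ? 0 : (distinct / values.length) * 100,
    // Mean string length of populated values.
    avgLength: values.length === 0 ? 0 : totalLength / values.length,
  };
}

// Sample records invented for illustration.
const sampleAccounts = [
  { Name: 'Acme Corp', Phone: '+1 555 0100' },
  { Name: 'Globex LLC', Phone: null },
  { Name: 'Acme Corp', Phone: '' },
  { Name: 'Initech Inc', Phone: '+1 555 0101' },
];

console.log(profileField(sampleAccounts, 'Name'));
console.log(profileField(sampleAccounts, 'Phone'));
```

In a real org the records would come from a query rather than an in-memory array, and very large objects would likely need aggregate queries or sampling instead of a full scan.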
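
The `required_fields` and `validation_rules` entries under Profiling Rules can be read as per-record checks. The sketch below shows one way such rules might be applied. The `rules` object shape, the `checkRecord` helper, and the issue labels are all assumptions made for illustration; only the field lists and the two regular expressions come from the configuration above.

```javascript
// Field lists and patterns copied from the Profiling Rules example;
// the object layout itself is an assumption.
const rules = {
  requiredFields: {
    Account: ['Name', 'Type', 'Industry'],
    Contact: ['LastName', 'Email'],
  },
  validation: {
    Email: /^[^\s@]+@[^\s@]+\.[^\s@]+$/,
    Phone: /^\+?[\d\s\-\(\)]+$/,
  },
};

// True when a value is null, undefined, or a blank string.
function isBlank(value) {
  return value === null || value === undefined || String(value).trim() === '';
}

// Hypothetical helper: returns the list of issues found in one record.
function checkRecord(objectName, record) {
  const issues = [];

  // Completeness: every required field must be populated.
  for (const field of rules.requiredFields[objectName] || []) {
    if (isBlank(record[field])) {
      issues.push({ field, issue: 'missing-required' });
    }
  }

  // Accuracy: populated values must match their pattern.
  // Blank values are skipped here; emptiness is the required-field check's concern.
  for (const [field, pattern] of Object.entries(rules.validation)) {
    if (!isBlank(record[field]) && !pattern.test(String(record[field]))) {
      issues.push({ field, issue: 'invalid-format' });
    }
  }

  return issues;
}

// Sample contacts invented for illustration.
console.log(checkRecord('Contact', { LastName: 'Doe', Email: 'jane.doe@example', Phone: '(555) 010-0199' }));
console.log(checkRecord('Contact', { LastName: '', Email: 'a@b.co', Phone: 'call me' }));
```

Counting the records that return an empty issue list, divided by the total, would give one possible completeness-and-accuracy score to compare against a threshold.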