# Data Profiler Utility - Agent Instructions
## Purpose
This utility provides instructions for AI agents to generate comprehensive data
profiling and quality assessment solutions for Salesforce organizations,
enabling data-driven decision making and quality improvements.
## Agent Instructions
### When to Generate Data Profiling
Generate data profiling components when:
- Data migration projects need assessment
- Data quality issues need identification
- Compliance audits require data analysis
- Integration projects need data mapping
- Storage optimization is required
- Duplicate data needs detection
- Data governance needs metrics
### Core Components to Generate
#### 1. Object Profiler Engine
Generate an Apex class that:
- Analyzes object schemas and metadata
- Counts records and measures data volume
- Profiles field usage and completeness
- Maps relationships and dependencies
- Detects record type distributions
- Calculates storage utilization
Key profiling capabilities:
- Schema analysis with field metadata
- Record count and growth trends
- Field population statistics
- Relationship mapping
- Data type distribution
- Storage impact analysis
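As a hedged starting point, a generated profiler might look like the following sketch; the class and method names are illustrative, `objectName` is assumed to be a valid API name, and `Data_Profile__c` is the custom object defined under Configuration Requirements below.
```apex
// Minimal schema-and-volume profiler sketch; names are illustrative.
public with sharing class ObjectProfiler {
    public static Data_Profile__c profileObject(String objectName) {
        // Assumes objectName was validated against the org schema by the caller.
        Schema.SObjectType sobjType = Schema.getGlobalDescribe().get(objectName);
        Schema.DescribeSObjectResult describe = sobjType.getDescribe();

        Data_Profile__c profile = new Data_Profile__c(
            Object_Name__c  = describe.getName(),
            Profile_Date__c = System.now(),
            Field_Count__c  = describe.fields.getMap().size(),
            // COUNT() queries still count rows against query limits on very large
            // objects; those should use the batch pattern shown later in this document.
            Record_Count__c = Database.countQuery('SELECT COUNT() FROM ' + describe.getName())
        );
        insert profile;
        return profile;
    }
}
```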
#### 2. Data Quality Analyzer
Create components that:
- Measure data completeness percentages
- Identify data quality issues
- Detect duplicate records
- Validate data accuracy
- Check referential integrity
- Assess business rule compliance
Quality metrics to calculate:
- Completeness (null/empty values)
- Uniqueness (duplicate detection)
- Validity (format/pattern matching)
- Accuracy (business rule validation)
- Consistency (cross-field validation)
- Timeliness (data age analysis)
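The timeliness dimension, for example, can be approximated by how recently records were last modified. A minimal sketch follows; the class and method names, the caller-supplied freshness threshold, and the choice of LastModifiedDate are all assumptions, and `objectName` is assumed to be pre-validated against the schema describe.
```apex
public with sharing class DataQualityAnalyzer {
    // Timeliness as the percentage of records modified within the last N days.
    public static Decimal timelinessScore(String objectName, Integer freshWithinDays) {
        Integer total = Database.countQuery('SELECT COUNT() FROM ' + objectName);
        if (total == 0) {
            return 100;
        }
        Integer fresh = Database.countQuery(
            'SELECT COUNT() FROM ' + objectName +
            ' WHERE LastModifiedDate = LAST_N_DAYS:' + freshWithinDays);
        return (Decimal.valueOf(fresh) / total) * 100;
    }
}
```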
#### 3. Pattern Detection Engine
Implement pattern analysis for:
- Email format validation
- Phone number patterns
- Address standardization
- Date format consistency
- Numeric pattern detection
- Custom pattern matching
### Configuration Requirements
#### Custom Objects
Create these objects:
```yaml
Data_Profile__c:
- Object_Name__c (Text)
- Profile_Date__c (DateTime)
- Record_Count__c (Number)
- Field_Count__c (Number)
- Storage_Size_MB__c (Number)
- Quality_Score__c (Percent)
- Completeness_Score__c (Percent)
- Profile_Status__c (Picklist)
Field_Profile__c:
- Data_Profile__c (Master-Detail)
- Field_Name__c (Text)
- Field_Type__c (Text)
- Populated_Count__c (Number)
- Null_Count__c (Number)
- Unique_Values__c (Number)
- Completeness_Percent__c (Percent)
- Common_Patterns__c (Long Text)
Data_Quality_Issue__c:
- Data_Profile__c (Lookup)
- Issue_Type__c (Picklist)
- Severity__c (Picklist)
- Field_Name__c (Text)
- Record_Count__c (Number)
- Description__c (Text Area)
- Recommendation__c (Text Area)
```
#### Profile Configuration
```yaml
Profile_Config__mdt:
- Object_Name__c (Text)
- Include_In_Profile__c (Checkbox)
- Required_Fields__c (Long Text)
- Quality_Rules__c (Long Text)
- Sampling_Size__c (Number)
- Profile_Frequency__c (Picklist)
```
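A profiler run would typically load this configuration first. A minimal sketch, assuming the custom metadata fields defined above:
```apex
// Load profiling configuration; only opted-in objects are profiled.
List<Profile_Config__mdt> configs = [
    SELECT Object_Name__c, Sampling_Size__c, Required_Fields__c, Quality_Rules__c
    FROM Profile_Config__mdt
    WHERE Include_In_Profile__c = true
];
for (Profile_Config__mdt config : configs) {
    // Each configured object gets its own profile run (see the batch pattern below).
    System.debug('Profiling ' + config.Object_Name__c +
                 ' with sample size ' + config.Sampling_Size__c);
}
```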
### Implementation Patterns
#### Batch Processing Pattern
For large data volumes:
1. Implement the Database.Batchable interface (see the sketch after this list)
2. Process objects in chunks
3. Use Database.Stateful for aggregation
4. Handle governor limits
5. Store results incrementally
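A minimal sketch of this pattern, assuming a hypothetical `FieldProfileBatch` that profiles the completeness of one field on one object; a production version would profile many fields per chunk.
```apex
// Stateful batch: counters survive across chunks because of Database.Stateful.
public with sharing class FieldProfileBatch implements Database.Batchable<SObject>, Database.Stateful {
    private final Id dataProfileId;   // parent Data_Profile__c record created beforehand
    private final String objectName;  // assumed to be validated against the schema describe
    private final String fieldName;
    private Integer totalCount = 0;
    private Integer populatedCount = 0;

    public FieldProfileBatch(Id dataProfileId, String objectName, String fieldName) {
        this.dataProfileId = dataProfileId;
        this.objectName = objectName;
        this.fieldName = fieldName;
    }

    public Database.QueryLocator start(Database.BatchableContext bc) {
        return Database.getQueryLocator('SELECT ' + fieldName + ' FROM ' + objectName);
    }

    public void execute(Database.BatchableContext bc, List<SObject> scope) {
        for (SObject record : scope) {
            totalCount++;
            if (record.get(fieldName) != null) {
                populatedCount++;
            }
        }
    }

    public void finish(Database.BatchableContext bc) {
        // Store results incrementally against the parent profile record.
        insert new Field_Profile__c(
            Data_Profile__c = dataProfileId,
            Field_Name__c = fieldName,
            Populated_Count__c = populatedCount,
            Null_Count__c = totalCount - populatedCount,
            Completeness_Percent__c = totalCount == 0
                ? 0
                : (Decimal.valueOf(populatedCount) / totalCount) * 100
        );
    }
}
```
It would be launched once per configured object and field, for example `Database.executeBatch(new FieldProfileBatch(profileId, 'Contact', 'Email'), 2000);`.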
#### Sampling Pattern
For performance optimization:
1. Define sample size based on volume
2. Use random sampling for large datasets
3. Ensure statistical significance
4. Extrapolate results
5. Validate sample accuracy
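One hedged way to implement the random-sampling step is probabilistic inclusion, so that expensive per-record analysis (such as pattern matching) only runs on a representative subset; the class and method names are illustrative.
```apex
public with sharing class ProfileSampler {
    // Include each record with probability sampleSize / totalCount.
    public static Boolean includeInSample(Integer sampleSize, Integer totalCount) {
        if (totalCount <= sampleSize) {
            return true; // small datasets are profiled in full
        }
        return Math.random() * totalCount < sampleSize;
    }
}
```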
#### Real-time Analysis Pattern
For immediate insights:
1. Analyze on record save
2. Update quality metrics
3. Flag quality issues
4. Send notifications
5. Update dashboards
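A minimal trigger-based sketch of on-save analysis; the trigger name, the use of Contact and its Email field, and the picklist values for issue type and severity are all assumptions. In a real implementation the logic would live in a handler class and the rules would come from configuration.
```apex
trigger ContactQualityCheck on Contact (after insert, after update) {
    List<Data_Quality_Issue__c> issues = new List<Data_Quality_Issue__c>();
    for (Contact record : Trigger.new) {
        if (String.isBlank(record.Email)) {
            issues.add(new Data_Quality_Issue__c(
                Issue_Type__c   = 'Missing Value',
                Severity__c     = 'Medium',
                Field_Name__c   = 'Email',
                Record_Count__c = 1,
                Description__c  = 'Contact saved without an email address.'
            ));
        }
    }
    if (!issues.isEmpty()) {
        insert issues;
    }
}
```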
### Analysis Algorithms to Implement
#### Completeness Analysis
```
For each field:
1. Count total records
2. Count non-null values
3. Calculate: (non-null / total) * 100
4. Flag fields below threshold
5. Generate recommendations
```
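In SOQL, `COUNT(fieldName)` counts only non-null values, so both counts can come from a single aggregate query. A sketch using the standard Contact.Email field as an example:
```apex
// COUNT(Id) = total rows, COUNT(Email) = rows where Email is populated.
// Very large objects can exceed query row limits; fall back to the batch pattern above.
AggregateResult result = [SELECT COUNT(Id) total, COUNT(Email) populated FROM Contact];
Integer total = (Integer) result.get('total');
Integer populated = (Integer) result.get('populated');
Decimal completeness = total == 0 ? 0 : (Decimal.valueOf(populated) / total) * 100;
// Flag the field if completeness falls below the configured threshold.
```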
#### Duplicate Detection
```
1. Define matching criteria
2. Generate match keys
3. Group by match keys
4. Identify groups > 1
5. Calculate duplicate percentage
6. Suggest merge strategies
```
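A minimal match-key sketch over Contact email and last name, written as anonymous Apex; the key definition is an assumption, since real matching criteria would come from duplicate rules or configuration.
```apex
// Group contacts sharing a normalized match key; groups larger than one are duplicates.
Map<String, List<Contact>> groups = new Map<String, List<Contact>>();
for (Contact record : [SELECT LastName, Email FROM Contact WHERE Email != null LIMIT 50000]) {
    // Normalization beyond lower-casing (trimming, aliases, diacritics) is omitted for brevity.
    String key = (record.Email + '|' + record.LastName).toLowerCase();
    if (!groups.containsKey(key)) {
        groups.put(key, new List<Contact>());
    }
    groups.get(key).add(record);
}
Integer duplicateRecords = 0;
for (List<Contact> grouped : groups.values()) {
    if (grouped.size() > 1) {
        duplicateRecords += grouped.size();
    }
}
// DuplicateRate = (duplicateRecords / totalRecords) * 100, per the formulas at the end of this document.
```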
#### Pattern Recognition
```
1. Sample field values
2. Apply regex patterns
3. Calculate match percentages
4. Identify dominant patterns
5. Flag anomalies
6. Suggest standardization
```
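A hedged sketch of regex-based pattern scoring over sampled values, written as anonymous Apex; the patterns and sample values shown are illustrative, not a canonical list.
```apex
// Score how well a sample of field values matches each candidate pattern.
Map<String, String> patterns = new Map<String, String>{
    'Email'    => '^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$',
    'US Phone' => '^\\(?\\d{3}\\)?[ -]?\\d{3}-?\\d{4}$'
};
List<String> sampleValues = new List<String>{ 'jane@example.com', '(415) 555-0100', 'not-a-value' };
for (String patternName : patterns.keySet()) {
    Integer matches = 0;
    for (String value : sampleValues) {
        if (value != null && Pattern.matches(patterns.get(patternName), value)) {
            matches++;
        }
    }
    Decimal matchPercent = (Decimal.valueOf(matches) / sampleValues.size()) * 100;
    System.debug(patternName + ' matches ' + matchPercent + '% of the sample');
}
```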
### Reporting Components to Generate
#### Data Quality Dashboard
Display:
- Overall data quality score
- Object-level quality metrics
- Field completeness heat map
- Duplicate record statistics
- Trend analysis charts
- Top quality issues
#### Executive Summary Dashboard
Show:
- Data volume overview
- Quality score trends
- Critical issues count
- Compliance status
- ROI of data quality
- Improvement recommendations
#### Operational Dashboard
Include:
- Real-time quality monitoring
- Issue detection alerts
- Profile execution status
- Performance metrics
- User data quality scores
### Integration Requirements
#### ETL Tool Integration
- Informatica connectors
- MuleSoft data quality
- Talend integration
- Jitterbit profiles
- Custom API endpoints
#### Analytics Integration
- Tableau data quality metrics
- Einstein Analytics datasets
- Power BI connectors
- Custom reporting APIs
- Real-time streaming
#### Data Governance Integration
- Collibra integration
- Informatica MDM
- Custom governance tools
- Policy enforcement
- Compliance tracking
### Best Practices to Implement
1. **Performance Optimization**
- Use selective queries
- Implement caching
- Batch large operations
- Optimize algorithms
- Monitor resource usage
2. **Accuracy Enhancement**
- Validate profiling results
- Cross-reference metrics
- Use multiple algorithms
- Implement quality checks
- Recalibrate regularly
3. **Scalability Design**
- Handle millions of records
- Distribute processing
- Profile incrementally
- Manage resources
- Manage work queues
4. **Security Measures**
- Respect data visibility
- Implement encryption
- Log an audit trail
- Enforce access controls
- Mask sensitive data
### Advanced Features to Consider
1. **Machine Learning Integration**
- Anomaly detection models
- Quality prediction
- Pattern learning
- Auto-categorization
- Recommendation engine
2. **Automated Remediation**
- Data standardization
- Duplicate merging
- Format correction
- Validation rule updates
- Workflow triggers
3. **Predictive Analytics**
- Quality degradation prediction
- Volume growth forecasting
- Issue trend analysis
- Impact assessment
- Resource planning
### Error Handling Instructions
Implement error handling for:
1. Governor limit exceptions
2. Timeout scenarios
3. Memory limitations
4. API callout failures
5. Permission errors
Recovery strategies:
- Checkpoint processing
- Partial result saving
- Automatic retry logic
- Manual intervention
- Error notifications
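As one hedged example of checkpointing and partial-result saving, the `execute` method of the `FieldProfileBatch` sketch shown earlier could be revised as follows; note that governor limit exceptions themselves cannot be caught, so chunks must be kept small enough to avoid them.
```apex
public void execute(Database.BatchableContext bc, List<SObject> scope) {
    try {
        for (SObject record : scope) {
            totalCount++;
            if (record.get(fieldName) != null) {
                populatedCount++;
            }
        }
    } catch (Exception e) {
        // Counts accumulated so far are preserved by Database.Stateful;
        // log the failure and let finish() persist the partial result.
        System.debug(LoggingLevel.ERROR, 'Profiling chunk failed: ' + e.getMessage());
        // Optionally enqueue a retry or send an error notification here.
    }
}
```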
### Testing Requirements
Generate test classes that:
1. Test profiling accuracy
2. Verify calculations
3. Test edge cases
4. Validate performance
5. Check error handling
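A minimal test sketch for the completeness calculation shown earlier; the class name, test data, and assertion are illustrative.
```apex
@isTest
private class FieldCompletenessTest {
    @isTest
    static void calculatesEmailCompleteness() {
        insert new List<Contact>{
            new Contact(LastName = 'With Email', Email = 'with@example.com'),
            new Contact(LastName = 'Without Email')
        };
        AggregateResult result = [SELECT COUNT(Id) total, COUNT(Email) populated FROM Contact];
        Decimal completeness =
            (Decimal.valueOf((Integer) result.get('populated')) / (Integer) result.get('total')) * 100;
        System.assertEquals(50, completeness.intValue(), 'Exactly half of the test contacts have an email');
    }
}
```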
### Output Formats
Support multiple formats:
- JSON for API integration
- CSV for data analysis
- PDF for reports
- Excel for business users
- XML for system integration
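For the JSON case, Apex's built-in serializer is usually enough. A minimal sketch, assuming profile records created by the components above:
```apex
// Serialize recent profile results for an external consumer; pretty-printing aids debugging.
List<Data_Profile__c> profiles = [
    SELECT Object_Name__c, Record_Count__c, Quality_Score__c, Completeness_Score__c
    FROM Data_Profile__c
    ORDER BY Profile_Date__c DESC
    LIMIT 10
];
String payload = JSON.serializePretty(profiles);
// The payload can be returned from a REST resource or pushed to an integration endpoint.
```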
### Profiling Metrics Formulas
1. **Data Quality Score** (see the Apex sketch after this list)
```
DQS = (C × 0.3) + (U × 0.2) + (V × 0.2) + (A × 0.2) + (T × 0.1)
Where:
C = Completeness, U = Uniqueness, V = Validity
A = Accuracy, T = Timeliness
```
2. **Field Completeness**
```
Completeness = (PopulatedRecords / TotalRecords) × 100
```
3. **Duplicate Rate**
```
DuplicateRate = (DuplicateRecords / TotalRecords) × 100
```
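A direct Apex transcription of the Data Quality Score formula above, with the weights as given; the class and method names are illustrative.
```apex
public with sharing class QualityScoring {
    // Weighted Data Quality Score; each input is a 0-100 percentage per the formulas above.
    public static Decimal dataQualityScore(Decimal completeness, Decimal uniqueness,
                                           Decimal validity, Decimal accuracy, Decimal timeliness) {
        return (completeness * 0.3) + (uniqueness * 0.2) + (validity * 0.2)
             + (accuracy * 0.2) + (timeliness * 0.1);
    }
}
```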