# Data Profiler Utility
This utility provides comprehensive data profiling capabilities for Salesforce
orgs, analyzing data quality, patterns, and characteristics.
## Purpose
Automated data analysis to understand:
- Data quality metrics
- Field usage patterns
- Data volume distribution
- Relationship integrity
- Storage optimization opportunities
## Core Features
### 1. Field Analysis
```javascript
profileFields({
  objects: ['Account', 'Contact', 'Opportunity'],
  analysis: {
    nullability: true,
    uniqueness: true,
    patterns: true,
    distributions: true,
    outliers: true,
  },
});
```
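
This is a minimal sketch of the per-field statistics behind `nullability` and `uniqueness`. It uses the same `populated`, `unique` and `avgLength` keys as the detailed JSON report further down.

It rests on a few assumptions:
- `profileField` is a hypothetical helper, not part of the framework.
- Records are plain objects that have already been queried from the org.
- "Populated" means the value is not `null`, `undefined`, or an empty string.

```javascript
// Hypothetical helper: profiles one field across an array of plain record objects.
// Percentages are rounded to one decimal place.
function profileField(records, field) {
  const values = records.map((r) => r[field]);
  // "Populated" = not null, undefined, or empty string (an assumption of this sketch).
  const populated = values.filter((v) => v !== null && v !== undefined && v !== '');
  const distinct = new Set(populated);
  const totalLength = populated.reduce((sum, v) => sum + String(v).length, 0);
  const pct = (n, d) => (d === 0 ? 0 : Math.round((n / d) * 1000) / 10);
  return {
    populated: pct(populated.length, records.length), // share of records with a value
    unique: pct(distinct.size, populated.length), // distinct values / populated values
    avgLength: populated.length === 0 ? 0 : Math.round(totalLength / populated.length),
  };
}

const sample = [
  { Name: 'Acme Inc', Phone: '555-0100' },
  { Name: 'Acme Inc', Phone: null },
  { Name: 'Globex Corp', Phone: '' },
  { Name: 'Initech LLC', Phone: '(555) 0101' },
];
console.log(profileField(sample, 'Name')); // → { populated: 100, unique: 75, avgLength: 10 }
```

Uniqueness is measured against populated values only, so a mostly empty field is not reported as highly unique just because its blanks are excluded.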
### 2. Data Quality Assessment
```javascript
assessDataQuality({
  checks: {
    completeness: { threshold: 90 },
    accuracy: { validateAgainst: 'rules' },
    consistency: { crossObject: true },
    timeliness: { maxAge: '2 years' },
    duplicates: { fuzzyMatch: true },
  },
});
```
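
Two of these checks, `completeness` and `timeliness`, could plausibly be computed as in the minimal sketch below. It makes these assumptions:
- `completenessCheck` and `timelinessCheck` are hypothetical names.
- Completeness is the percentage of records with every required field populated, compared against `threshold`.
- Timeliness uses `LastModifiedDate`, with `maxAge` expressed as a number of years.

```javascript
// Hypothetical sketch of two assessDataQuality checks.
const isPopulated = (v) => v !== null && v !== undefined && v !== '';

// completeness: percentage of records whose required fields are all populated.
function completenessCheck(records, requiredFields, threshold) {
  const complete = records.filter((r) => requiredFields.every((f) => isPopulated(r[f]))).length;
  const score = records.length === 0 ? 100 : (complete / records.length) * 100;
  return { score, passed: score >= threshold };
}

// timeliness: records last modified more than maxAgeYears before `asOf` are stale.
function timelinessCheck(records, maxAgeYears, asOf = new Date()) {
  const cutoff = new Date(asOf.getTime());
  cutoff.setUTCFullYear(cutoff.getUTCFullYear() - maxAgeYears);
  // Records without a parsable LastModifiedDate compare as false and count as fresh.
  const staleCount = records.filter((r) => new Date(r.LastModifiedDate) < cutoff).length;
  const freshPct = records.length === 0 ? 100 : ((records.length - staleCount) / records.length) * 100;
  return { staleCount, freshPct };
}

const accounts = [
  { Name: 'Acme', Type: 'Customer', Industry: 'Retail', LastModifiedDate: '2024-03-01T00:00:00Z' },
  { Name: 'Globex', Type: 'Prospect', Industry: '', LastModifiedDate: '2020-01-15T00:00:00Z' },
];
const asOf = new Date('2025-01-01T00:00:00Z');
console.log(completenessCheck(accounts, ['Name', 'Type', 'Industry'], 90)); // → { score: 50, passed: false }
console.log(timelinessCheck(accounts, 2, asOf)); // → { staleCount: 1, freshPct: 50 }
```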
### 3. Storage Analysis
```javascript
analyzeStorage({
  objects: ['*'],
  metrics: ['record-count', 'storage-used', 'growth-rate', 'archive-candidates', 'large-attachments'],
});
```
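
The `storage-used` and `growth-rate` metrics can be approximated from record counts alone, as in the minimal sketch below. It assumes the flat 2 KB per record that Salesforce documents for most objects. Some objects are counted differently, so treat the result as an estimate. `estimateStorageMB` and `growthRatePct` are hypothetical names.

```javascript
// Hypothetical sketch: estimate data storage and growth from record counts.
// ASSUMPTION: a flat 2 KB per record, which Salesforce documents for most
// objects; some objects are counted differently, so this is an estimate only.
const KB_PER_RECORD = 2;

function estimateStorageMB(recordCount, kbPerRecord = KB_PER_RECORD) {
  return (recordCount * kbPerRecord) / 1024;
}

// Growth between two record-count snapshots of the same object, as a percentage.
function growthRatePct(previousCount, currentCount) {
  if (previousCount === 0) return currentCount === 0 ? 0 : Infinity;
  return ((currentCount - previousCount) / previousCount) * 100;
}

console.log(estimateStorageMB(512000)); // → 1000
console.log(growthRatePct(1000, 1500)); // → 50
```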
## Profiling Categories
### Data Characteristics
- Field population rates
- Value distributions
- Pattern detection
- Statistical summaries
- Cardinality analysis
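
Value distributions and cardinality come down to counting distinct values. The minimal sketch below shows one way to do that. `valueDistribution` is a hypothetical helper, and blank values are skipped by assumption.

```javascript
// Hypothetical sketch: cardinality and top-N value distribution for one field.
function valueDistribution(records, field, topN = 5) {
  const counts = new Map();
  for (const r of records) {
    const v = r[field];
    if (v === null || v === undefined || v === '') continue; // skip blanks
    counts.set(v, (counts.get(v) || 0) + 1);
  }
  const top = [...counts.entries()]
    .sort((a, b) => b[1] - a[1]) // most frequent first
    .slice(0, topN)
    .map(([value, count]) => ({ value, count }));
  return { cardinality: counts.size, top };
}

const industries = ['Retail', 'Retail', 'Energy', null, 'Retail', 'Energy', 'Media'].map((Industry) => ({ Industry }));
console.log(valueDistribution(industries, 'Industry', 2));
```

A cardinality close to the number of populated records suggests an identifier-like field. A very low cardinality suggests a candidate for a picklist.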
### Data Quality Metrics
- Completeness scores
- Accuracy validation
- Consistency checks
- Duplicate detection
- Anomaly identification
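
Fuzzy duplicate detection can be sketched with an edit-distance similarity score on a normalized field value. This is a swapped-in illustrative technique (Levenshtein distance), not necessarily what the utility uses. It makes these assumptions:
- `findFuzzyDuplicates` is a hypothetical name.
- The 0 to 100 score is one plausible reading of `duplicate_threshold: 95` in the configuration.
- Normalization keeps only ASCII letters, digits and spaces.
- The pairwise comparison is O(n²), so it only suits small samples.

```javascript
// Hypothetical sketch: pairwise fuzzy duplicate detection on one text field.
// Classic Levenshtein edit distance using two rolling rows.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Lowercase, keep only ASCII letters/digits/spaces, collapse whitespace.
const normalize = (s) => String(s).toLowerCase().replace(/[^a-z0-9 ]/g, '').replace(/\s+/g, ' ').trim();

// Similarity on a 0-100 scale: 100 * (1 - distance / longer length).
function similarity(a, b) {
  const x = normalize(a);
  const y = normalize(b);
  const longer = Math.max(x.length, y.length);
  return longer === 0 ? 100 : 100 * (1 - levenshtein(x, y) / longer);
}

function findFuzzyDuplicates(records, field, threshold = 95) {
  const pairs = [];
  for (let i = 0; i < records.length; i++) {
    for (let j = i + 1; j < records.length; j++) {
      const score = similarity(records[i][field], records[j][field]);
      if (score >= threshold) pairs.push({ a: records[i].Id, b: records[j].Id, score });
    }
  }
  return pairs;
}

const dupCandidates = [
  { Id: '001A', Name: 'Acme, Inc.' },
  { Id: '001B', Name: 'ACME Inc' },
  { Id: '001C', Name: 'Globex Corp' },
];
console.log(findFuzzyDuplicates(dupCandidates, 'Name'));
```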
### Relationship Analysis
- Parent-child relationships
- Orphaned records
- Circular references
- Lookup integrity
- Junction object usage
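
An orphan check compares each child's lookup value against the set of known parent Ids. The minimal sketch below does this, for example for exported Contacts against exported Accounts. `findOrphans` is a hypothetical name. Blank lookups are treated as "not related" rather than orphaned, by assumption.

```javascript
// Hypothetical sketch: find child records whose lookup points at a missing parent.
// Children with an empty lookup are not reported (they are simply unrelated).
function findOrphans(children, parents, lookupField) {
  const parentIds = new Set(parents.map((p) => p.Id));
  return children.filter((c) => c[lookupField] && !parentIds.has(c[lookupField]));
}

const parentAccounts = [{ Id: '001A' }, { Id: '001B' }];
const childContacts = [
  { Id: '003X', AccountId: '001A' },
  { Id: '003Y', AccountId: '001Z' },
  { Id: '003W', AccountId: null },
];
console.log(findOrphans(childContacts, parentAccounts, 'AccountId'));
```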
### Performance Impact
- Large data volumes
- Wide tables
- Skewed data
- Index effectiveness
- Query performance
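
Skewed data can be spotted by counting records per parent Id or per owner Id. The minimal sketch below flags any key above a limit. Its default of 10,000 follows the rough threshold commonly cited in Salesforce guidance for account data skew and ownership skew. `findSkew` is a hypothetical name.

```javascript
// Hypothetical sketch: flag parent (or owner) Ids with more records than `limit`.
// Default limit follows the commonly cited 10,000-record skew guideline.
function findSkew(records, keyField, limit = 10000) {
  const counts = new Map();
  for (const r of records) {
    const key = r[keyField];
    if (!key) continue; // ignore records without a parent/owner
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts.entries()]
    .filter(([, count]) => count > limit)
    .map(([id, count]) => ({ id, count }))
    .sort((a, b) => b.count - a.count);
}

// Tiny demo with a lowered limit of 2.
const opps = [{ AccountId: 'A' }, { AccountId: 'A' }, { AccountId: 'A' }, { AccountId: 'B' }];
console.log(findSkew(opps, 'AccountId', 2));
```

Run the same function with `OwnerId` as the key field to check ownership skew.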
## Usage Examples
### Basic Data Profiling
```bash
# Profile specific object
profileData --object Account
# Profile all custom objects
profileData --custom-only
# Generate profiling report
profileData --output data-profile.html
```
### Automated Profiling
```yaml
schedule:
  weekly:
    - profileData --quick-scan
  monthly:
    - profileData --comprehensive
    - generateDataQualityReport
```
## Profile Results
### Summary Report
```
Data Profile Summary - Account Object
====================================
Total Records: 1,245,678
Data Quality Score: 87%
Storage Used: 2.3 GB
Key Findings:
✓ 95% field population rate
⚠ 3,421 potential duplicates
✗ 12% records missing required fields
```
### Detailed Analysis
```json
{
  "object": "Account",
  "profile": {
    "recordCount": 1245678,
    "fields": {
      "Name": {
        "populated": 100,
        "unique": 98.5,
        "avgLength": 35,
        "patterns": ["Company Inc", "LLC", "Corp"]
      },
      "Phone": {
        "populated": 78,
        "format": "mixed",
        "invalid": 234
      }
    },
    "quality": {
      "completeness": 87,
      "accuracy": 92,
      "duplicates": 3421
    }
  }
}
```
## Configuration
### Profiling Rules
```yaml
profilingRules:
  dataQuality:
    required_fields:
      Account: [Name, Type, Industry]
      Contact: [LastName, Email]
    validation_rules:
      Email: 'regex:^[^\s@]+@[^\s@]+\.[^\s@]+$'
      Phone: 'regex:^\+?[\d\s\-\(\)]+$'
  thresholds:
    high_volume: 1000000
    low_population: 10
    duplicate_threshold: 95
```
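
The `required_fields` and `validation_rules` above could be applied to individual records as in the minimal sketch below. It makes these assumptions:
- The rules are mirrored by hand into a JavaScript object, with the `regex:` prefix stripped into regex literals, so the example needs no YAML parser.
- `checkRecord` is a hypothetical name.
- Validation rules only run on non-empty string values.

```javascript
// Hypothetical sketch: the profilingRules above, mirrored as a JS object.
const rules = {
  required_fields: { Account: ['Name', 'Type', 'Industry'], Contact: ['LastName', 'Email'] },
  validation_rules: {
    Email: /^[^\s@]+@[^\s@]+\.[^\s@]+$/,
    Phone: /^\+?[\d\s\-\(\)]+$/,
  },
};

// Returns a list of human-readable issues for one record of the given object.
function checkRecord(objectName, record) {
  const issues = [];
  for (const field of rules.required_fields[objectName] || []) {
    const v = record[field];
    if (v === null || v === undefined || v === '') issues.push(`${field}: missing`);
  }
  for (const [field, pattern] of Object.entries(rules.validation_rules)) {
    const v = record[field];
    // Only validate non-empty strings; missing values are handled above.
    if (typeof v === 'string' && v !== '' && !pattern.test(v)) issues.push(`${field}: invalid format`);
  }
  return issues;
}

console.log(checkRecord('Contact', { LastName: 'Lee', Email: 'lee@example', Phone: '+1 (555) 010-0199' }));
// → [ 'Email: invalid format' ]
```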
### Custom Profiling
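```javascript
// Define custom profiling logic
addProfiler({
  name: 'industry-specific-validation',
  description: 'Validate industry-specific data requirements',
  profile: (records) => {
    // Custom profiling logic goes here; build and return an analysis object.
    // The shape below is only a placeholder.
    const analysis = {
      recordCount: records.length,
      issues: [],
    };
    return analysis;
  },
});
```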
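
The minimal sketch below shows how a registry behind `addProfiler` might collect custom profilers and run them over the same record set. It is a hypothetical illustration, not the framework's actual implementation. `runProfilers` is an invented name, and each profiler's result is keyed by its `name`.

```javascript
// Hypothetical profiler registry: NOT the framework's actual code.
const profilers = new Map();

function addProfiler({ name, description = '', profile }) {
  if (typeof profile !== 'function') {
    throw new TypeError(`Profiler "${name}" needs a profile function`);
  }
  profilers.set(name, { description, profile }); // re-registering a name replaces it
}

// Runs every registered profiler over the same records; results keyed by name.
function runProfilers(records) {
  const results = {};
  for (const [name, { profile }] of profilers) {
    results[name] = profile(records);
  }
  return results;
}

addProfiler({ name: 'record-count', description: 'Counts records', profile: (records) => records.length });
console.log(runProfilers([{ Id: '001A' }, { Id: '001B' }])); // → { 'record-count': 2 }
```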
## Data Quality Improvements
### Automated Cleanup
```javascript
// Suggest and apply data improvements
improveDataQuality({
  standardizeFormats: true,
  deduplicateRecords: true,
  fillMissingRequired: true,
  archiveOldData: { olderThan: '5 years' },
});
```
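
One `standardizeFormats` rule, for the `"format": "mixed"` Phone values reported earlier, could normalize numbers to E.164. The minimal sketch below assumes that 10-digit values, or 11 digits with a leading 1, are North American numbers. Anything else is left unchanged for manual review. `standardizePhone` is a hypothetical name.

```javascript
// Hypothetical sketch of a Phone standardization rule (E.164 output).
// ASSUMPTION: 10 digits, or 11 digits starting with 1, are North American numbers.
function standardizePhone(raw) {
  if (typeof raw !== 'string') return raw;
  const digits = raw.replace(/\D/g, ''); // strip everything except digits
  const national = digits.length === 11 && digits.startsWith('1') ? digits.slice(1) : digits;
  if (national.length !== 10) return raw; // unknown format: leave for review
  return `+1${national}`;
}

console.log(standardizePhone('(555) 010-0199')); // → +15550100199
console.log(standardizePhone('1.555.010.0199')); // → +15550100199
console.log(standardizePhone('+44 20 7946 0958')); // → +44 20 7946 0958 (unchanged)
```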
### Recommendations
Each profile includes:
- Data quality improvement suggestions
- Field optimization opportunities
- Storage reduction strategies
- Performance enhancement tips
## Integration Points
### With Data Management
- Migration planning
- Archive strategies
- Data governance
- Master data management
### With Development
- Field usage analysis
- Schema optimization
- Query performance
- Test data generation
## Visualization
### Charts and Graphs
- Field population heatmaps
- Value distribution histograms
- Trend analysis charts
- Relationship diagrams
- Quality score dashboards
### Export Formats
- HTML reports
- PDF summaries
- CSV data files
- JSON analysis
- Excel workbooks
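
A CSV export of per-field results needs little more than correct quoting. The minimal sketch below follows RFC 4180: cells containing commas, quotes or line breaks are quoted, embedded quotes are doubled, and rows end in CRLF. `fieldProfileToCsv` and its column set are hypothetical.

```javascript
// Hypothetical sketch: export per-field profile results as CSV (RFC 4180 quoting).
function csvCell(value) {
  const s = value === null || value === undefined ? '' : String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

function fieldProfileToCsv(objectName, fields) {
  const header = ['object', 'field', 'populated', 'unique', 'invalid'];
  const rows = Object.entries(fields).map(([field, stats]) =>
    [objectName, field, stats.populated, stats.unique, stats.invalid].map(csvCell).join(','));
  return [header.join(','), ...rows].join('\r\n');
}

const csv = fieldProfileToCsv('Account', {
  Name: { populated: 100, unique: 98.5 },
  Phone: { populated: 78, invalid: 234 },
});
console.log(csv);
```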
## Best Practices
1. **Regular Profiling**
   - Weekly quick profiles
   - Monthly deep analysis
   - Pre-migration profiling
   - Post-deployment validation
2. **Action-Oriented**
   - Focus on actionable insights
   - Prioritize high-impact issues
   - Track improvement trends
   - Measure success metrics
3. **Comprehensive Coverage**
   - Profile all objects
   - Include custom fields
   - Analyze relationships
   - Consider all data types
4. **Continuous Monitoring**
   - Set quality baselines
   - Alert on degradation
   - Track improvements
   - Report to stakeholders
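
"Set quality baselines" and "alert on degradation" can be combined into a simple comparison between runs, as in the minimal sketch below. It makes these assumptions:
- Every metric is a 0 to 100 score where higher is better, such as `completeness` or `accuracy`. Counts like `duplicates` would need the opposite comparison.
- A drop of more than `tolerance` points triggers an alert.
- `detectDegradation` is a hypothetical name.

```javascript
// Hypothetical sketch: compare current quality scores with a stored baseline.
// ASSUMPTION: every metric is "higher is better" (e.g. completeness, accuracy).
function detectDegradation(baseline, current, tolerance = 2) {
  const alerts = [];
  for (const [metric, base] of Object.entries(baseline)) {
    const now = current[metric];
    if (typeof now !== 'number') continue; // metric not measured this run
    const drop = base - now;
    if (drop > tolerance) alerts.push({ metric, baseline: base, current: now, drop });
  }
  return alerts;
}

const lastMonth = { completeness: 87, accuracy: 92 };
const thisMonth = { completeness: 83, accuracy: 91 };
console.log(detectDegradation(lastMonth, thisMonth));
// → [ { metric: 'completeness', baseline: 87, current: 83, drop: 4 } ]
```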
This utility provides deep insights into your Salesforce data quality and
characteristics.