# Task: Analyze Data
## Overview
Conducts data analysis to extract insights, identify patterns, and produce actionable findings for business decision-making. Applies a systematic analytical methodology that combines statistical rigor with business context so that recommendations are grounded in evidence.
## Prerequisites
- Data availability and quality validation
- Analysis objectives and business questions
- Statistical computing tools and environment access
- Domain expertise and business context understanding
- Analysis methodology and framework selection
## Dependencies
- Templates: `data-analysis-tmpl.yaml`, `statistical-analysis-tmpl.yaml`
- Tasks: `create-dashboard.md`, `generate-insights.md`, `profile-data.md`
- Checklists: `analysis-quality-checklist.md`
## Steps
### 1. **Analysis Planning and Objective Definition**
- Define analysis objectives and business questions clearly
- Establish success criteria and expected outcomes
- Plan analysis methodology and statistical approach
- Identify required data sources and preparation needs
- **Validation**: Analysis plan approved by stakeholders and aligned with business objectives
### 2. **Data Exploration and Understanding**
- Conduct exploratory data analysis and profiling
- Identify data patterns, distributions, and relationships
- Assess data quality and completeness for analysis
- Document data characteristics and limitations
- **Quality Check**: Data exploration comprehensive with documented findings
### 3. **Statistical Analysis and Hypothesis Testing**
- Apply appropriate statistical methods and techniques
- Conduct hypothesis testing and significance analysis
- Perform correlation and regression analysis
- Calculate confidence intervals and effect sizes
- **Validation**: Statistical analysis rigorous with validated methodology
### 4. **Pattern Recognition and Trend Analysis**
- Identify trends, patterns, and anomalies in data
- Conduct time series analysis and forecasting
- Perform cohort analysis and segmentation
- Analyze seasonal patterns and cyclical behavior
- **Quality Check**: Pattern analysis thorough with business-relevant insights
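
A minimal forecasting sketch, assuming simple exponential smoothing is an adequate baseline: each new level is a weighted average of the latest observation and the previous level, and the final level serves as the one-step-ahead forecast. The function name, the `alpha` value, and the sample series are illustrative assumptions. The model has no trend or seasonal component, so seasonal series would need a method such as Holt-Winters or seasonal ARIMA instead.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation.

    The last element is the forecast for the next, not-yet-observed period.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = float(series[0])  # initialize the level with the first observation
    levels = [level]
    for y in series[1:]:
        # Blend the new observation with the previous level.
        level = alpha * y + (1 - alpha) * level
        levels.append(level)
    return levels


# Hypothetical weekly order counts.
weekly_orders = [10, 12, 11, 13]
levels = simple_exponential_smoothing(weekly_orders, alpha=0.5)
print(levels)      # [10.0, 11.0, 11.0, 12.0]
print(levels[-1])  # next-week forecast: 12.0
```

A larger `alpha` reacts faster to recent changes but filters less noise; it is usually chosen by minimizing forecast error on held-out data.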
### 5. **Business Context Integration and Interpretation**
- Interpret analytical results in business context
- Validate findings with domain experts and stakeholders
- Assess practical significance and business implications
- Identify actionable insights and recommendations
- **Validation**: Business interpretation validated with stakeholder agreement
### 6. **Visualization and Communication Preparation**
- Create compelling visualizations and charts
- Design analytical dashboards and reports
- Prepare narrative explanations and storytelling
- Develop executive summaries and key findings
- **Quality Check**: Visualizations effective with clear communication
### 7. **Documentation and Knowledge Transfer**
- Document analysis methodology and assumptions
- Create reproducible analysis code and procedures
- Prepare comprehensive analysis report
- Conduct findings presentation and knowledge transfer
- **Final Validation**: Analysis complete with stakeholder understanding and approval
## Interactive Features
### Data Analysis Platform
- **Interactive exploration** with drag-and-drop analysis and real-time results
- **Statistical computing** with automated statistical test selection and execution
- **Visualization engine** with dynamic chart creation and customization
- **Collaborative analysis** with shared workspaces and peer review
### Advanced Analytics Hub
- **Machine learning integration** with automated pattern recognition and classification
- **Predictive modeling** with forecasting and scenario analysis capabilities
- **Text analytics** with natural language processing and sentiment analysis
- **Geospatial analysis** with location-based insights and mapping
### Business Intelligence Integration
- **Dashboard creation** with automated report generation and scheduling
- **KPI tracking** with business metric monitoring and alerting
- **Benchmark analysis** with industry comparison and competitive intelligence
- **Impact assessment** with business outcome correlation and attribution
## Outputs
### Primary Deliverable
- **Comprehensive Data Analysis Report** (`data-analysis-report.md`)
  - Detailed analytical findings with statistical validation
  - Business interpretation and actionable insights
  - Visualizations and supporting evidence
  - Methodology documentation and reproducible procedures
### Supporting Artifacts
- **Analysis Code Package** - Reproducible analysis scripts and procedures
- **Visualization Suite** - Interactive dashboards and chart collections
- **Statistical Results** - Detailed statistical analysis outputs and validation
- **Executive Summary** - High-level findings and recommendations for leadership
## Success Criteria
### Analysis Quality and Business Value
- **Statistical Rigor**: Analysis methodology sound with appropriate statistical techniques
- **Business Relevance**: Findings actionable and aligned with business objectives
- **Insight Quality**: Clear, specific, and implementable recommendations
- **Communication Effectiveness**: Results clearly communicated to diverse stakeholders
- **Reproducibility**: Analysis reproducible with documented methodology
### Validation Requirements
- [ ] Analysis objectives clearly defined with stakeholder alignment
- [ ] Data exploration comprehensive with documented characteristics
- [ ] Statistical analysis rigorous with appropriate methods and validation
- [ ] Pattern analysis thorough with business-relevant insights
- [ ] Business interpretation validated with stakeholder agreement
- [ ] Visualizations effective with clear communication
- [ ] Documentation complete with reproducible methodology
### Evidence Collection
- Stakeholder validation of analysis objectives and business alignment
- Statistical validation of methodology appropriateness and execution
- Business expert validation of interpretation and practical significance
- Peer review validation of analytical approach and conclusions
- Reproducibility validation through independent verification
## Data Analysis Framework
### Analysis Types and Methodologies
- **Descriptive Analytics**: Summary statistics, data profiling, trend analysis
- **Diagnostic Analytics**: Root cause analysis, correlation analysis, hypothesis testing
- **Predictive Analytics**: Forecasting, regression modeling, machine learning
- **Prescriptive Analytics**: Optimization, scenario analysis, decision modeling
### Statistical Analysis Techniques
- **Univariate Analysis**: Single variable analysis, distribution analysis, outlier detection
- **Bivariate Analysis**: Correlation analysis, cross-tabulation, association testing
- **Multivariate Analysis**: Multiple regression, factor analysis, cluster analysis
- **Time Series Analysis**: Trend analysis, seasonality, forecasting, intervention analysis
### Business Analysis Applications
- **Customer Analytics**: Segmentation, lifetime value, churn prediction, satisfaction analysis
- **Operations Analytics**: Process optimization, efficiency analysis, capacity planning
- **Financial Analytics**: Revenue analysis, cost optimization, profitability assessment
- **Market Analytics**: Competitive analysis, market sizing, demand forecasting
## Exploratory Data Analysis
### Data Profiling and Assessment
- **Data Quality Assessment**: Completeness, accuracy, consistency, validity checks
- **Distribution Analysis**: Histograms, box plots, probability distributions
- **Missing Data Analysis**: Pattern identification and imputation strategies
- **Outlier Detection**: Statistical outlier identification and treatment options
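
A minimal profiling sketch for the completeness and outlier checks above, using only the Python standard library. The records and column names are hypothetical, `None` stands for a missing value, and outliers are flagged with Tukey's fences (values more than 1.5 times the interquartile range beyond the quartiles). Flagged values are candidates for investigation, not automatic deletion.

```python
import statistics

# Hypothetical order records; None marks a missing value.
rows = [
    {"order_id": 1, "amount": 25.0, "region": "EU"},
    {"order_id": 2, "amount": 30.0, "region": None},
    {"order_id": 3, "amount": None, "region": "US"},
    {"order_id": 4, "amount": 28.0, "region": "US"},
    {"order_id": 5, "amount": 27.0, "region": "EU"},
    {"order_id": 6, "amount": 26.0, "region": "US"},
    {"order_id": 7, "amount": 29.0, "region": "EU"},
    {"order_id": 8, "amount": 400.0, "region": "EU"},
]


def completeness(rows):
    """Share of non-missing values per column (assumes every row has the same keys)."""
    return {col: sum(r[col] is not None for r in rows) / len(rows) for col in rows[0]}


def iqr_outliers(values, k=1.5):
    """Values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


amounts = [r["amount"] for r in rows if r["amount"] is not None]
print(completeness(rows))     # {'order_id': 1.0, 'amount': 0.875, 'region': 0.875}
print(iqr_outliers(amounts))  # [400.0]
```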
### Relationship Analysis
- **Correlation Analysis**: Pearson, Spearman, partial correlation analysis
- **Association Rules**: Market basket analysis, frequent pattern mining
- **Dependency Analysis**: Mutual information, chi-square tests of independence
- **Interaction Effects**: Variable interaction identification and modeling
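
To make the association-rule metrics concrete, here is a minimal sketch that scores a single rule over a handful of hypothetical baskets. Support is the share of baskets containing both item sets, confidence is the share of antecedent baskets that also contain the consequent, and lift compares that confidence with the consequent's overall frequency (above 1 suggests positive association, below 1 negative). A full mining pass would enumerate frequent item sets with an algorithm such as Apriori or FP-Growth, which is beyond this sketch.

```python
# Hypothetical market baskets, one set of items per transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]


def rule_metrics(baskets, antecedent, consequent):
    """Support, confidence and lift for the rule antecedent -> consequent."""
    n = len(baskets)
    # `<=` on sets is a subset test: does the basket contain every listed item?
    n_antecedent = sum(antecedent <= b for b in baskets)
    n_consequent = sum(consequent <= b for b in baskets)
    n_both = sum((antecedent | consequent) <= b for b in baskets)
    if n_antecedent == 0 or n_consequent == 0:
        raise ValueError("antecedent and consequent must each appear in some basket")
    support = n_both / n
    confidence = n_both / n_antecedent
    lift = confidence / (n_consequent / n)
    return support, confidence, lift


support, confidence, lift = rule_metrics(baskets, {"bread"}, {"milk"})
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.3f}")
```

In this toy data bread and milk co-occur often (confidence 0.75), but milk already appears in 80% of baskets, so the lift is slightly below 1: buying bread does not make milk more likely.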
### Pattern Discovery
- **Clustering Analysis**: K-means, hierarchical clustering, density-based clustering
- **Classification Patterns**: Decision trees, rule mining, pattern classification
- **Sequence Analysis**: Sequential pattern mining, time-based pattern discovery
- **Anomaly Detection**: Statistical anomaly identification and investigation
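
A minimal k-means sketch (Lloyd's algorithm) in plain Python, assuming two numeric features per customer and caller-supplied starting centroids so the result is deterministic. The feature meanings and data points are hypothetical. Production work would more typically use a library implementation such as scikit-learn's `KMeans`, which offers k-means++ initialization and multiple restarts, and would scale features first so that no single feature dominates the distance.

```python
import math


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and update steps until centroids stop moving."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its points (unchanged if empty).
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments no longer change
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical customers: (orders per year, average basket value in tens of $).
customers = [(1, 1), (1.5, 2), (2, 1), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = kmeans(customers, initial_centroids=[(1, 1), (9, 9)])
print(centroids)
```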
## Statistical Analysis and Inference
### Hypothesis Testing Framework
- **Test Selection**: Appropriate statistical test selection based on data characteristics
- **Assumption Validation**: Normality, independence, homoscedasticity testing
- **Effect Size Calculation**: Practical significance beyond statistical significance
- **Power Analysis**: Sample size adequacy and statistical power assessment
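
A minimal sketch of the effect-size and power calculations above for a two-group comparison of means, using only the standard library. Cohen's d uses the pooled sample standard deviation. The sample-size function uses the normal approximation n = 2 * ((z_(1-alpha/2) + z_power) / d)^2 per group, which slightly understates the n that a t-test-based calculation would give. The function names and defaults (alpha = 0.05, power = 0.80) are illustrative assumptions.

```python
import math
import statistics
from statistics import NormalDist


def cohens_d(a, b):
    """Standardized mean difference using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-sample comparison of means."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)


print(cohens_d([2, 4, 6], [1, 3, 5]))  # 0.5
print(n_per_group(0.5))                 # 63
```

Running the power calculation before collecting data is what makes it useful: if the achievable sample is far below the computed n, a non-significant result says little about whether the effect exists.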
### Regression and Modeling
- **Linear Regression**: Simple and multiple linear regression analysis
- **Logistic Regression**: Binary and multinomial logistic regression
- **Time Series Modeling**: ARIMA, exponential smoothing, seasonal models
- **Nonparametric Methods**: Rank-based tests, bootstrap, permutation tests
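
A minimal sketch of a two-sided permutation test for a difference in means, one of the nonparametric methods listed above. It approximates the null distribution by randomly relabeling the pooled observations; the fixed seed keeps the result reproducible. The data and function names are hypothetical. SciPy also provides `scipy.stats.permutation_test` for the same purpose.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided Monte Carlo permutation test for a difference in means.

    Returns the observed absolute difference and an approximate p-value.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # random relabeling under the null of no difference
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        # Small tolerance so exact ties are not missed because of float rounding.
        if diff >= observed - 1e-12:
            extreme += 1
    # Add-one correction counts the observed labeling and keeps p above zero.
    return observed, (extreme + 1) / (n_resamples + 1)


# Hypothetical task completion times (minutes) before and after a process change.
before = [12.1, 11.8, 12.5, 12.9, 12.2]
after = [10.9, 11.2, 11.0, 11.5, 10.8]
observed, p_value = permutation_test(before, after)
print(f"observed difference = {observed:.2f}, p = {p_value:.4f}")
```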
### Advanced Statistical Techniques
- **Bayesian Analysis**: Bayesian inference, credible intervals, posterior distributions
- **Survival Analysis**: Time-to-event analysis, hazard modeling, survival curves
- **Experimental Design**: A/B testing, factorial designs, randomized controlled trials
- **Causal Inference**: Causal analysis, treatment effects, instrumental variables
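
For the A/B testing item, a minimal sketch of a two-sided, two-proportion z-test on conversion counts. It assumes independent users, a sample size fixed before the test started (no stopping early on interim results), and counts large enough for the normal approximation to hold. The counts and function name are hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates, using the pooled standard error."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # conversion rate under the null of no difference
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_b - p_a, z, p_value


# Hypothetical results: control A converts 200/5000, variant B converts 250/5000.
diff, z, p_value = two_proportion_ztest(200, 5000, 250, 5000)
print(f"difference = {diff:.3%}, z = {z:.2f}, p = {p_value:.4f}")
```

Reporting the absolute difference next to the p-value keeps practical significance visible: a one-point lift may or may not justify the cost of shipping the variant.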
## Visualization and Communication
### Visualization Design Principles
- **Chart Selection**: Appropriate visualization types for data and message
- **Design Aesthetics**: Color theory, typography, layout principles
- **Information Hierarchy**: Visual emphasis and information prioritization
- **Accessibility**: Color-blind friendly, screen reader compatible visualizations
### Interactive Dashboard Development
- **Dashboard Architecture**: Layout design, navigation, user experience
- **Real-time Updates**: Live data integration and automatic refresh
- **User Personalization**: Customizable views and user-specific content
- **Mobile Optimization**: Responsive design for multiple device types
### Storytelling and Narrative
- **Data Storytelling**: Narrative structure, compelling story arc, audience engagement
- **Executive Communication**: C-suite presentation format and key message delivery
- **Technical Documentation**: Detailed methodology and reproducibility documentation
- **Training Materials**: User guides and knowledge transfer content
## Domain-Specific Analysis Applications
### Customer Analytics
- **Customer Segmentation**: Behavioral, demographic, psychographic segmentation
- **Lifetime Value Analysis**: CLV calculation, retention modeling, revenue prediction
- **Churn Analysis**: Customer attrition prediction, retention strategies
- **Satisfaction Analysis**: Survey analysis, sentiment tracking, experience measurement
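
A minimal sketch of churn and lifetime value under a deliberately simplified model. It assumes a constant per-period margin m, a constant retention rate r, a discount rate d, an infinite horizon, and margin received at the end of each period in which the customer is retained. Summing m * r**t / (1 + d)**t over t >= 1 gives CLV = m * r / (1 + d - r). The figures and function names are hypothetical; cohort-based or probabilistic models (for example BG/NBD) relax these assumptions.

```python
def churn_rate(customers_at_start, customers_lost):
    """Share of customers present at the start of a period who left during it."""
    return customers_lost / customers_at_start


def simple_clv(margin, retention, discount):
    """Infinite-horizon CLV: sum over t >= 1 of margin * retention**t / (1 + discount)**t."""
    if not 0 <= retention < 1:
        raise ValueError("retention must be in [0, 1)")
    return margin * retention / (1 + discount - retention)


# Hypothetical monthly figures.
churn = churn_rate(customers_at_start=1000, customers_lost=50)  # 0.05
clv = simple_clv(margin=20.0, retention=1 - churn, discount=0.01)
print(f"monthly churn = {churn:.1%}, CLV = {clv:.2f}")
```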
### Operations Analytics
- **Process Analysis**: Workflow optimization, bottleneck identification, capacity analysis
- **Quality Analytics**: Defect analysis, process control, continuous improvement
- **Supply Chain Analytics**: Inventory optimization, demand forecasting, logistics analysis
- **Resource Optimization**: Staff scheduling, asset utilization, cost optimization
### Financial Analytics
- **Revenue Analysis**: Revenue decomposition, growth drivers, trend analysis
- **Profitability Analysis**: Margin analysis, cost allocation, profit optimization
- **Risk Analytics**: Credit risk, market risk, operational risk assessment
- **Budgeting Analytics**: Variance analysis, forecast accuracy, budget optimization
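
A minimal sketch of forecast-accuracy and budget-variance metrics, using hypothetical figures. MAE is expressed in the series' own units; MAPE is scale-free but undefined when an actual value is zero, so reporting both is common. Whether a positive budget variance is favorable depends on the line item: exceeding a cost budget is unfavorable, exceeding a revenue budget is favorable.

```python
def mae(actual, forecast):
    """Mean absolute error, in the units of the series."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error (%); undefined if any actual value is zero."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def budget_variance(actual, budget):
    """Return (actual - budget, variance as % of budget); positive means actual exceeded budget."""
    diff = actual - budget
    return diff, 100 * diff / budget


# Hypothetical monthly revenue (k$): actuals vs. the forecast made beforehand.
actual = [100, 120, 80]
forecast = [110, 114, 80]
print(f"MAE = {mae(actual, forecast):.2f}, MAPE = {mape(actual, forecast):.1f}%")
print(budget_variance(actual=52_000, budget=50_000))  # (2000, 4.0)
```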
### Marketing Analytics
- **Campaign Analysis**: Campaign effectiveness, ROI measurement, attribution modeling
- **Channel Analytics**: Multi-channel analysis, cross-channel attribution
- **Competitive Analysis**: Market share analysis, competitive benchmarking
- **Pricing Analytics**: Price sensitivity, price optimization, revenue management
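
Attribution modeling assigns conversion revenue to the channels in a customer's journey. A minimal sketch comparing two simple rule-based models on hypothetical journeys: last-touch gives all credit to the final touchpoint, while linear splits it evenly across every touchpoint. The channel names and function names are illustrative; data-driven approaches such as Shapley-value or Markov-chain attribution are out of scope here.

```python
from collections import defaultdict

# Hypothetical converting journeys: ordered channel touchpoints and the revenue they produced.
journeys = [
    (["search", "email", "social"], 90.0),
    (["social", "search"], 60.0),
    (["email"], 30.0),
]


def last_touch(journeys):
    """Credit all revenue to the last channel in each journey."""
    credit = defaultdict(float)
    for path, revenue in journeys:
        credit[path[-1]] += revenue
    return dict(credit)


def linear(journeys):
    """Split each journey's revenue evenly across its touchpoints."""
    credit = defaultdict(float)
    for path, revenue in journeys:
        for channel in path:
            credit[channel] += revenue / len(path)
    return dict(credit)


print(last_touch(journeys))
print(linear(journeys))
```

The two models tell different stories from the same data: last-touch favors social, while linear credits all three channels equally, which is why the attribution rule should be stated alongside any channel ROI figures.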
## Technology Stack Integration
### Statistical Computing Platforms
- **R**: Statistical analysis, data manipulation, visualization with ggplot2
- **Python**: Data science libraries (pandas, NumPy, SciPy, scikit-learn)
- **SAS**: Enterprise statistical analysis and advanced analytics
- **SPSS**: Statistical analysis and survey research capabilities
### Business Intelligence Tools
- **Tableau**: Interactive visualization and dashboard development
- **Power BI**: Microsoft business intelligence and self-service analytics
- **Looker**: Modern business intelligence with a semantic modeling layer (LookML)
- **Qlik Sense**: Associative analytics and data discovery
### Big Data Analytics
- **Apache Spark**: Large-scale data processing and machine learning
- **Databricks**: Unified analytics platform for big data and machine learning
- **Amazon EMR**: Managed big data processing and analytics
- **Google BigQuery**: Cloud data warehouse and analytics platform
## Validation Framework
### Data Analysis Quality Assurance
1. **Methodology Validation**: Analysis approach appropriateness and statistical rigor
2. **Data Quality Validation**: Input data quality and suitability for analysis
3. **Statistical Validation**: Statistical technique application and interpretation accuracy
4. **Business Validation**: Business relevance and practical significance assessment
5. **Reproducibility Validation**: Analysis replication and result consistency verification
### Continuous Analysis Improvement
- Regular methodology review and enhancement based on outcomes
- Adoption of improved statistical techniques and more capable tooling
- Business domain knowledge expansion and context integration
- Stakeholder feedback collection and analysis process optimization
## Best Practices
### Analysis Methodology
- Start with clear business questions and analysis objectives
- Use appropriate statistical methods for data characteristics and questions
- Validate assumptions and assess method limitations and constraints
- Apply multiple analytical approaches for validation and triangulation
### Data Quality and Preparation
- Invest significant time in data understanding and quality assessment
- Document data limitations and potential biases clearly
- Use appropriate data cleaning and preparation techniques
- Maintain data lineage and transformation documentation
### Communication and Stakeholder Engagement
- Tailor communication to audience expertise and information needs
- Use visualization effectively to enhance understanding and engagement
- Focus on actionable insights rather than technical methodology details
- Provide clear recommendations with implementation guidance
## Risk Mitigation
### Common Pitfalls
- **Analysis Bias**: Confirmation bias, selection bias, survivorship bias
- **Statistical Misuse**: Inappropriate test selection, assumption violations, p-hacking
- **Over-interpretation**: Reading too much into results, confusing correlation with causation
- **Communication Gaps**: Technical jargon, unclear visualizations, missing business context
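
One concrete guard against the p-hacking pitfall above is to correct for multiple comparisons when several hypotheses are tested on the same data. A minimal sketch of the Holm-Bonferroni step-down procedure, which controls the family-wise error rate at `alpha`; the function name and p-values are hypothetical.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a reject (True) / retain (False) decision per hypothesis, in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices from smallest p upward
    reject = [False] * m
    for rank, i in enumerate(order):
        # Compare the k-th smallest p-value (k = rank + 1) with alpha / (m - rank).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject


p_values = [0.01, 0.04, 0.03, 0.005]
print(holm_bonferroni(p_values))  # [True, False, False, True]
```

With an uncorrected 0.05 threshold all four example p-values would count as significant; after the correction only two do.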
### Success Factors
- Clear business objectives with stakeholder alignment and engagement
- Appropriate statistical methodology with rigorous execution and validation
- High-quality data with comprehensive quality assessment and preparation
- Effective communication with audience-appropriate presentation and visualization
- Continuous learning and methodology improvement based on feedback and outcomes
## Notes
Effective data analysis combines statistical rigor with business acumen to produce actionable insights that drive decisions and business value. Success depends on choosing an appropriate methodology, executing it carefully, and communicating results in terms that connect the technical analysis to business understanding and implementation.