agentic-data-stack-community

AI Agentic Data Stack Framework - Community Edition. Open source data engineering framework with 4 core agents, essential templates, and 3-dimensional quality validation.

# Task: Analyze Data

## Overview
Conducts comprehensive data analysis to extract meaningful insights, identify patterns, and generate actionable intelligence for business decision-making. Implements systematic analytical methodologies with statistical rigor and business context for evidence-based recommendations.

## Prerequisites
- Data availability and quality validation
- Analysis objectives and business questions
- Statistical computing tools and environment access
- Domain expertise and business context understanding
- Analysis methodology and framework selection

## Dependencies
- Templates: `data-analysis-tmpl.yaml`, `statistical-analysis-tmpl.yaml`
- Tasks: `create-dashboard.md`, `generate-insights.md`, `profile-data.md`
- Checklists: `analysis-quality-checklist.md`

## Steps

### 1. **Analysis Planning and Objective Definition**
- Define analysis objectives and business questions clearly
- Establish success criteria and expected outcomes
- Plan analysis methodology and statistical approach
- Identify required data sources and preparation needs
- **Validation**: Analysis plan approved by stakeholders and aligned with business objectives

### 2. **Data Exploration and Understanding**
- Conduct exploratory data analysis and profiling
- Identify data patterns, distributions, and relationships
- Assess data quality and completeness for analysis
- Document data characteristics and limitations
- **Quality Check**: Data exploration comprehensive with documented findings

### 3. **Statistical Analysis and Hypothesis Testing**
- Apply appropriate statistical methods and techniques
- Conduct hypothesis testing and significance analysis
- Perform correlation and regression analysis
- Calculate confidence intervals and statistical measures
- **Validation**: Statistical analysis rigorous with validated methodology

### 4. **Pattern Recognition and Trend Analysis**
- Identify trends, patterns, and anomalies in data
- Conduct time series analysis and forecasting
- Perform cohort analysis and segmentation
- Analyze seasonal patterns and cyclical behavior
- **Quality Check**: Pattern analysis thorough with business-relevant insights

### 5. **Business Context Integration and Interpretation**
- Interpret analytical results in business context
- Validate findings with domain experts and stakeholders
- Assess practical significance and business implications
- Identify actionable insights and recommendations
- **Validation**: Business interpretation validated with stakeholder agreement

### 6. **Visualization and Communication Preparation**
- Create compelling visualizations and charts
- Design analytical dashboards and reports
- Prepare narrative explanations and storytelling
- Develop executive summaries and key findings
- **Quality Check**: Visualizations effective with clear communication

### 7. **Documentation and Knowledge Transfer**
- Document analysis methodology and assumptions
- Create reproducible analysis code and procedures
- Prepare comprehensive analysis report
- Conduct findings presentation and knowledge transfer
- **Final Validation**: Analysis complete with stakeholder understanding and approval

## Interactive Features

### Data Analysis Platform
- **Interactive exploration** with drag-and-drop analysis and real-time results
- **Statistical computing** with automated statistical test selection and execution
- **Visualization engine** with dynamic chart creation and customization
- **Collaborative analysis** with shared workspaces and peer review

### Advanced Analytics Hub
- **Machine learning integration** with automated pattern recognition and classification
- **Predictive modeling** with forecasting and scenario analysis capabilities
- **Text analytics** with natural language processing and sentiment analysis
- **Geospatial analysis** with location-based insights and mapping

### Business Intelligence Integration
- **Dashboard creation** with automated report generation and scheduling
- **KPI tracking** with business metric monitoring and alerting
- **Benchmark analysis** with industry comparison and competitive intelligence
- **Impact assessment** with business outcome correlation and attribution

## Outputs

### Primary Deliverable
- **Comprehensive Data Analysis Report** (`data-analysis-report.md`)
  - Detailed analytical findings with statistical validation
  - Business interpretation and actionable insights
  - Visualizations and supporting evidence
  - Methodology documentation and reproducible procedures

### Supporting Artifacts
- **Analysis Code Package** - Reproducible analysis scripts and procedures
- **Visualization Suite** - Interactive dashboards and chart collections
- **Statistical Results** - Detailed statistical analysis outputs and validation
- **Executive Summary** - High-level findings and recommendations for leadership

## Success Criteria

### Analysis Quality and Business Value
- **Statistical Rigor**: Analysis methodology sound with appropriate statistical techniques
- **Business Relevance**: Findings actionable and aligned with business objectives
- **Insight Quality**: Clear, specific, and implementable recommendations
- **Communication Effectiveness**: Results clearly communicated to diverse stakeholders
- **Reproducibility**: Analysis reproducible with documented methodology

### Validation Requirements
- [ ] Analysis objectives clearly defined with stakeholder alignment
- [ ] Data exploration comprehensive with documented characteristics
- [ ] Statistical analysis rigorous with appropriate methods and validation
- [ ] Pattern analysis thorough with business-relevant insights
- [ ] Business interpretation validated with stakeholder agreement
- [ ] Visualizations effective with clear communication
- [ ] Documentation complete with reproducible methodology

### Evidence Collection
- Stakeholder validation of analysis objectives and business alignment
- Statistical validation of methodology appropriateness and execution
- Business expert validation of interpretation and practical significance
- Peer review validation of analytical approach and conclusions
- Reproducibility validation through independent verification

## Data Analysis Framework

### Analysis Types and Methodologies
- **Descriptive Analytics**: Summary statistics, data profiling, trend analysis
- **Diagnostic Analytics**: Root cause analysis, correlation analysis, hypothesis testing
- **Predictive Analytics**: Forecasting, regression modeling, machine learning
- **Prescriptive Analytics**: Optimization, scenario analysis, decision modeling

### Statistical Analysis Techniques
- **Univariate Analysis**: Single variable analysis, distribution analysis, outlier detection
- **Bivariate Analysis**: Correlation analysis, cross-tabulation, association testing
- **Multivariate Analysis**: Multiple regression, factor analysis, cluster analysis
- **Time Series Analysis**: Trend analysis, seasonality, forecasting, intervention analysis

### Business Analysis Applications
- **Customer Analytics**: Segmentation, lifetime value, churn prediction, satisfaction analysis
- **Operations Analytics**: Process optimization, efficiency analysis, capacity planning
- **Financial Analytics**: Revenue analysis, cost optimization, profitability assessment
- **Market Analytics**: Competitive analysis, market sizing, demand forecasting

## Exploratory Data Analysis

### Data Profiling and Assessment
- **Data Quality Assessment**: Completeness, accuracy, consistency, validity checks
- **Distribution Analysis**: Histograms, box plots, probability distributions
- **Missing Data Analysis**: Pattern identification and imputation strategies
- **Outlier Detection**: Statistical outlier identification and treatment options

### Relationship Analysis
- **Correlation Analysis**: Pearson, Spearman, partial correlation analysis
- **Association Rules**: Market basket analysis, frequent pattern mining
- **Dependency Analysis**: Mutual information, chi-square tests of independence
- **Interaction Effects**: Variable interaction identification and modeling

### Pattern Discovery
- **Clustering Analysis**: K-means, hierarchical clustering, density-based clustering
- **Classification Patterns**: Decision trees, rule mining, pattern classification
- **Sequence Analysis**: Sequential pattern mining, time-based pattern discovery
- **Anomaly Detection**: Statistical anomaly identification and investigation

## Statistical Analysis and Inference

### Hypothesis Testing Framework
- **Test Selection**: Appropriate statistical test selection based on data characteristics
- **Assumption Validation**: Normality, independence, homoscedasticity testing
- **Effect Size Calculation**: Practical significance beyond statistical significance
- **Power Analysis**: Sample size adequacy and statistical power assessment

### Regression and Modeling
- **Linear Regression**: Simple and multiple linear regression analysis
- **Logistic Regression**: Binary and multinomial logistic regression
- **Time Series Modeling**: ARIMA, exponential smoothing, seasonal models
- **Nonparametric Methods**: Rank-based tests, bootstrap, permutation tests

### Advanced Statistical Techniques
- **Bayesian Analysis**: Bayesian inference, credible intervals, posterior distributions
- **Survival Analysis**: Time-to-event analysis, hazard modeling, survival curves
- **Experimental Design**: A/B testing, factorial designs, randomized controlled trials
- **Causal Inference**: Causal analysis, treatment effects, instrumental variables

## Visualization and Communication

### Visualization Design Principles
- **Chart Selection**: Appropriate visualization types for data and message
- **Design Aesthetics**: Color theory, typography, layout principles
- **Information Hierarchy**: Visual emphasis and information prioritization
- **Accessibility**: Color-blind friendly, screen reader compatible visualizations

### Interactive Dashboard Development
- **Dashboard Architecture**: Layout design, navigation, user experience
- **Real-time Updates**: Live data integration and automatic refresh
- **User Personalization**: Customizable views and user-specific content
- **Mobile Optimization**: Responsive design for multiple device types

### Storytelling and Narrative
- **Data Storytelling**: Narrative structure, compelling story arc, audience engagement
- **Executive Communication**: C-suite presentation format and key message delivery
- **Technical Documentation**: Detailed methodology and reproducibility documentation
- **Training Materials**: User guides and knowledge transfer content

## Domain-Specific Analysis Applications

### Customer Analytics
- **Customer Segmentation**: Behavioral, demographic, psychographic segmentation
- **Lifetime Value Analysis**: CLV calculation, retention modeling, revenue prediction
- **Churn Analysis**: Customer attrition prediction, retention strategies
- **Satisfaction Analysis**: Survey analysis, sentiment tracking, experience measurement

### Operations Analytics
- **Process Analysis**: Workflow optimization, bottleneck identification, capacity analysis
- **Quality Analytics**: Defect analysis, process control, continuous improvement
- **Supply Chain Analytics**: Inventory optimization, demand forecasting, logistics analysis
- **Resource Optimization**: Staff scheduling, asset utilization, cost optimization

### Financial Analytics
- **Revenue Analysis**: Revenue decomposition, growth drivers, trend analysis
- **Profitability Analysis**: Margin analysis, cost allocation, profit optimization
- **Risk Analytics**: Credit risk, market risk, operational risk assessment
- **Budgeting Analytics**: Variance analysis, forecast accuracy, budget optimization

### Marketing Analytics
- **Campaign Analysis**: Campaign effectiveness, ROI measurement, attribution modeling
- **Channel Analytics**: Multi-channel analysis, cross-channel attribution
- **Competitive Analysis**: Market share analysis, competitive benchmarking
- **Pricing Analytics**: Price sensitivity, price optimization, revenue management

## Technology Stack Integration

### Statistical Computing Platforms
- **R**: Statistical analysis, data manipulation, visualization with ggplot2
- **Python**: Data science libraries (pandas, numpy, scipy, scikit-learn)
- **SAS**: Enterprise statistical analysis and advanced analytics
- **SPSS**: Statistical analysis and survey research capabilities

### Business Intelligence Tools
- **Tableau**: Interactive visualization and dashboard development
- **Power BI**: Microsoft business intelligence and self-service analytics
- **Looker**: Modern business intelligence with modeling layer
- **Qlik Sense**: Associative analytics and data discovery

### Big Data Analytics
- **Apache Spark**: Large-scale data processing and machine learning
- **Databricks**: Unified analytics platform for big data and machine learning
- **Amazon EMR**: Managed big data processing and analytics
- **Google BigQuery**: Cloud data warehouse and analytics platform

## Validation Framework

### Data Analysis Quality Assurance
1. **Methodology Validation**: Analysis approach appropriateness and statistical rigor
2. **Data Quality Validation**: Input data quality and suitability for analysis
3. **Statistical Validation**: Statistical technique application and interpretation accuracy
4. **Business Validation**: Business relevance and practical significance assessment
5. **Reproducibility Validation**: Analysis replication and result consistency verification

### Continuous Analysis Improvement
- Regular methodology review and enhancement based on outcomes
- Statistical technique advancement and tool capability improvement
- Business domain knowledge expansion and context integration
- Stakeholder feedback collection and analysis process optimization

## Best Practices

### Analysis Methodology
- Start with clear business questions and analysis objectives
- Use appropriate statistical methods for data characteristics and questions
- Validate assumptions and assess method limitations and constraints
- Apply multiple analytical approaches for validation and triangulation

### Data Quality and Preparation
- Invest significant time in data understanding and quality assessment
- Document data limitations and potential biases clearly
- Use appropriate data cleaning and preparation techniques
- Maintain data lineage and transformation documentation

### Communication and Stakeholder Engagement
- Tailor communication to audience expertise and information needs
- Use visualization effectively to enhance understanding and engagement
- Focus on actionable insights rather than technical methodology details
- Provide clear recommendations with implementation guidance

## Risk Mitigation

### Common Pitfalls
- **Analysis Bias**: Confirmation bias, selection bias, survivorship bias
- **Statistical Misuse**: Inappropriate test selection, assumption violations, p-hacking
- **Over-interpretation**: Reading too much into results, causation vs. correlation confusion
- **Communication Gaps**: Technical jargon, unclear visualizations, missing business context

### Success Factors
- Clear business objectives with stakeholder alignment and engagement
- Appropriate statistical methodology with rigorous execution and validation
- High-quality data with comprehensive quality assessment and preparation
- Effective communication with audience-appropriate presentation and visualization
- Continuous learning and methodology improvement based on feedback and outcomes

## Notes
Effective data analysis combines statistical rigor with business acumen to generate actionable insights that drive decision-making and business value. Success depends on appropriate methodology selection, quality execution, and clear communication that bridges technical analysis with business understanding and implementation.
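
As a minimal sketch of three checks this task names (outlier detection, effect size calculation, and a two-sample comparison), the following uses only Python's standard library. The function names `iqr_outliers`, `cohens_d`, and `welch_t` and the sample values are hypothetical, chosen for illustration; they are not part of the framework's templates. A real analysis would more likely rely on the pandas and scipy libraries listed under Statistical Computing Platforms.

```python
import math
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def cohens_d(a, b):
    """Effect size: difference in means divided by the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom.

    Does not assume equal variances; the p-value lookup is left to a
    statistics library such as scipy.
    """
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    daily_orders = [10, 12, 11, 13, 12, 11, 10, 95]
    print("outliers:", iqr_outliers(daily_orders))

    variant, control = [2, 4, 6], [1, 3, 5]
    print("Cohen's d:", cohens_d(variant, control))
    print("Welch t, df:", welch_t(variant, control))
```

Reporting the effect size next to the test statistic follows the guidance under Hypothesis Testing Framework: a difference can be statistically significant and still too small to matter in practice.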