agentic-data-stack-community

AI Agentic Data Stack Framework - Community Edition: an open-source data engineering framework with four core agents, essential templates, and three-dimensional quality validation.

The full workflow definition follows; the Python sketches after the file show how a consumer might load and act on it.

workflow:
  id: data-ingestion-workflow
  name: Interactive Data Ingestion Pipeline Development
  description: >-
    Complete workflow for developing data ingestion pipelines using an
    interactive validation framework, multi-agent collaboration, and
    real-time quality scoring. Supports both batch and real-time patterns.
  type: greenfield
  framework_version: 2.0
  validation_mode: interactive
  collaboration_mode: multi_agent

  project_types:
    - batch-ingestion
    - real-time-streaming
    - api-integration
    - file-based-ingestion
    - database-replication

  interactive_features:
    progressive_disclosure: enabled
    real_time_validation: enabled
    multi_agent_orchestration: enabled
    quality_scoring: continuous
    stakeholder_collaboration: active

  sequence:
    - step: interactive_requirements_analysis
      agent: data-product-manager
      action: create-data-contract
      uses_template: interactive-data-contract-tmpl
      creates: interactive-data-contract.md
      validation_mode: multi_stakeholder
      duration: 1-2 days
      interactive_features:
        progressive_disclosure: enabled
        stakeholder_routing: automated
        real_time_validation: active
      notes: |
        Create a comprehensive interactive data contract with:
        - Multi-stakeholder collaboration workflows
        - Progressive disclosure for complex requirements
        - Real-time validation and quality scoring
        - Advanced elicitation techniques
        - Automated evidence collection
        SAVE OUTPUT: Copy the final interactive-data-contract.md to your project's docs/ folder.
      quality_gates:
        stakeholder_approval: required
        validation_score_minimum: 85
        completeness_threshold: 95
        multi_agent_consensus: required

    - step: architecture_design
      agent: data-architect
      action: design-data-architecture
      creates: data-architecture.md
      requires: interactive-data-contract.md
      duration: 1-2 days
      notes: |
        Design the technical architecture, including:
        - Ingestion patterns (batch vs. streaming)
        - Data pipeline architecture
        - Storage and processing layer design
        - Integration points and API specifications
        SAVE OUTPUT: Copy the final data-architecture.md to your project's docs/ folder.

    - step: interactive_quality_framework_design
      agent: data-quality-engineer
      action: interactive-quality-validation
      uses_task: interactive-quality-validation
      creates: interactive-quality-framework.md
      requires: interactive-data-contract.md
      validation_mode: comprehensive
      duration: 0.5-1 day
      interactive_features:
        real_time_quality_scoring: enabled
        multi_dimensional_assessment: active
        automated_evidence_collection: enabled
      notes: |
        Define the interactive quality framework with:
        - Real-time quality validation and scoring
        - Multi-dimensional quality assessment
        - Automated evidence collection
        - Predictive quality analytics
        - Interactive quality dashboards
      quality_gates:
        quality_coverage: 100
        validation_framework_score: 90
        automated_check_percentage: 80
        stakeholder_quality_approval: required

    - step: interactive_governance_validation
      agent: data-governance-owner
      action: data-contract-validation
      uses_task: data-contract-validation
      validates: [interactive-data-contract.md, data-architecture.md]
      uses: interactive-quality-validation
      validation_mode: comprehensive_compliance
      duration: 0.5 day
      interactive_features:
        compliance_checking: automated
        regulatory_monitoring: real_time
        risk_assessment: continuous
      notes: |
        Interactive governance validation with:
        - Automated compliance checking
        - Real-time regulatory monitoring
        - Interactive risk assessment
        - Multi-jurisdictional compliance validation
        - Automated audit trail generation
      quality_gates:
        compliance_score: 95
        regulatory_validation: passed
        security_assessment: approved
        privacy_impact_assessment: completed

    - step: pipeline_implementation
      agent: data-engineer
      action: build-pipeline
      creates: pipeline-code
      requires: [data-architecture.md, interactive-quality-framework.md]
      duration: 3-5 days
      notes: |
        Implement the data ingestion pipeline:
        - Source system integration and data extraction
        - Data transformation and validation logic
        - Quality checks and error handling
        - Pipeline orchestration and scheduling
        - Monitoring and alerting implementation

    - step: interactive_quality_implementation
      agent: data-quality-engineer
      action: implement-quality-checks
      uses_framework: interactive-quality-validation
      creates: interactive-quality-tests
      requires: [pipeline-code, interactive-quality-framework.md]
      validation_mode: comprehensive_testing
      duration: 1-2 days
      interactive_features:
        real_time_test_validation: enabled
        automated_test_generation: active
        quality_score_tracking: continuous
      notes: |
        Implement interactive quality validation:
        - Real-time quality validation framework
        - Automated test generation and execution
        - Interactive quality dashboards
        - Multi-dimensional quality scoring
        - Predictive quality analytics
      quality_gates:
        test_coverage: 95
        quality_validation_score: 90
        automated_test_percentage: 85
        real_time_monitoring: operational

    - step: multi_agent_testing_validation
      agents: [data-engineer, data-quality-engineer]
      action: validate-data-story
      uses_task: validate-data-story
      validates: [pipeline-code, interactive-quality-tests]
      validation_mode: multi_agent_orchestration
      duration: 1-2 days
      interactive_features:
        multi_agent_collaboration: enabled
        real_time_validation_scoring: active
        automated_evidence_collection: comprehensive
      quality_gates:
        multi_agent_consensus: required
        validation_score_minimum: 90
        quality_framework_alignment: verified
        story_implementation_match: confirmed
      notes: |
        Comprehensive pipeline testing:
        - End-to-end pipeline testing
        - Data quality validation testing
        - Performance benchmarking
        - Error handling and recovery testing

    - step: user_acceptance_testing
      agent: data-analyst
      action: validate-business-requirements
      validates: pipeline-outputs
      requires: pipeline-code
      duration: 1 day
      notes: |
        Business validation of pipeline outputs:
        - Data accuracy validation against business rules
        - Completeness verification for business requirements
        - Performance validation against SLA requirements
        - User interface and reporting validation (if applicable)

    - step: deployment_preparation
      agent: data-engineer
      action: prepare-deployment
      creates: deployment-package
      requires: [pipeline-code, interactive-quality-tests]
      duration: 0.5-1 day
      notes: |
        Prepare for production deployment:
        - Infrastructure provisioning and configuration
        - Environment-specific configuration management
        - Deployment scripts and automation
        - Rollback procedures and contingency planning

    - step: production_deployment
      agent: data-engineer
      action: deploy-pipeline
      creates: production-deployment
      requires: deployment-package
      duration: 0.5 day
      notes: |
        Deploy the pipeline to production:
        - Execute deployment automation
        - Validate the production deployment
        - Configure monitoring and alerting
        - Initialize production data flows

    - step: monitoring_setup
      agent: data-quality-engineer
      action: setup-quality-monitoring
      creates: monitoring-dashboard
      requires: production-deployment
      duration: 0.5 day
      notes: |
        Configure ongoing monitoring:
        - Quality metrics monitoring dashboards
        - Automated alerting and notification setup
        - Performance monitoring and capacity planning
        - Operational runbooks and procedures

    - step: documentation_and_handoff
      agent: data-product-manager
      action: finalize-documentation
      creates: [user-documentation, operational-documentation]
      requires: [production-deployment, monitoring-dashboard]
      duration: 0.5 day
      notes: |
        Complete documentation and knowledge transfer:
        - User guides and API documentation
        - Operational procedures and troubleshooting guides
        - Team knowledge transfer sessions
        - Post-deployment support procedures

  validation_gates:
    - gate: requirements_validation
      criteria:
        - The data contract includes all required sections
        - Business stakeholders have approved the requirements
        - Quality dimensions and thresholds are defined
        - Governance requirements are documented
    - gate: architecture_validation
      criteria:
        - The architecture supports the scalability requirements
        - Integration patterns are well-defined
        - Security and compliance requirements are addressed
        - Performance requirements can be met
    - gate: implementation_validation
      criteria:
        - All unit tests pass with >85% code coverage
        - Integration tests validate the end-to-end data flow
        - Quality checks meet the defined thresholds
        - Error handling covers all failure scenarios
    - gate: deployment_validation
      criteria:
        - The production deployment completes successfully
        - All monitoring and alerting is functional
        - Performance meets SLA requirements
        - Security controls are properly configured

  success_criteria:
    technical:
      - Pipeline processes data within SLA timeframes
      - Data quality scores meet defined thresholds
      - System availability meets uptime requirements
      - Performance benchmarks are achieved
    business:
      - Business stakeholders can access the required data
      - Data supports decision-making requirements
      - Compliance and governance requirements are met
      - User adoption meets expected targets

  escalation_procedures:
    - condition: Quality gate failures
      action: Escalate to Data Architect and Data Governance Owner
      timeline: Within 4 hours
    - condition: Production deployment issues
      action: Escalate to Infrastructure Team and Data Engineering Manager
      timeline: Within 2 hours
    - condition: Business requirement conflicts
      action: Escalate to Data Product Manager and Business Stakeholders
      timeline: Within 1 business day

  post_deployment_activities:
    - activity: Performance monitoring
      frequency: Daily for the first week, then weekly
      responsible: Data Engineer
    - activity: Quality assessment
      frequency: Weekly for the first month, then monthly
      responsible: Data Quality Engineer
    - activity: User feedback collection
      frequency: 30 days post-deployment
      responsible: Data Product Manager
    - activity: Cost optimization review
      frequency: 60 days post-deployment
      responsible: Data Architect
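
Because each step declares what it creates and requires, a runner does not have to trust the listed order: it can resolve execution order from the artifact graph and catch dangling references. A minimal sketch of that resolution, assuming the definition above is saved locally as data-ingestion-workflow.yaml and parsed with PyYAML (hypothetical consumer code, not part of the package):

# Sketch: derive step execution order from creates/requires instead of
# list position. Hypothetical consumer code, not part of the package;
# assumes the definition above is saved as "data-ingestion-workflow.yaml".
import yaml  # PyYAML

with open("data-ingestion-workflow.yaml") as f:
    wf = yaml.safe_load(f)["workflow"]

def as_list(value):
    # "creates"/"requires" may be absent, a single string, or a list.
    if value is None:
        return []
    return value if isinstance(value, list) else [value]

steps = wf["sequence"]
producers = {}  # artifact name -> name of the step that creates it
for s in steps:
    for artifact in as_list(s.get("creates")):
        producers[artifact] = s["step"]

pending = {s["step"]: as_list(s.get("requires")) for s in steps}
ordered, done = [], set()
while pending:
    # A step is ready when every required artifact is either external
    # (no producer in this workflow) or already produced.
    ready = [name for name, reqs in pending.items()
             if all(r not in producers or producers[r] in done for r in reqs)]
    if not ready:
        raise ValueError(f"unsatisfiable requirements: {sorted(pending)}")
    for name in ready:
        ordered.append(name)
        done.add(name)
        del pending[name]

print(ordered)  # steps grouped into dependency-respecting waves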
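
Each step's quality_gates block mixes numeric minimums (e.g. validation_score_minimum: 85) with status values (required, passed, operational). How these are enforced is up to the runner; one plausible reading, sketched below, treats numbers as minimum scores and statuses as flags the runner must report as satisfied. The gates_pass helper and the measured dict are illustrative assumptions, not the package's API:

# Sketch: evaluate one step's quality_gates against measured results.
# Assumption (not the package's documented semantics): numeric gates are
# minimum scores; status-valued gates must be reported satisfied (True).
def gates_pass(gates: dict, measured: dict) -> dict:
    results = {}
    for gate, expected in gates.items():
        actual = measured.get(gate)
        if isinstance(expected, (int, float)):
            results[gate] = actual is not None and actual >= expected
        else:
            results[gate] = actual is True
    return results

# The interactive_requirements_analysis gates from the file above:
gates = {
    "stakeholder_approval": "required",
    "validation_score_minimum": 85,
    "completeness_threshold": 95,
    "multi_agent_consensus": "required",
}
measured = {
    "stakeholder_approval": True,    # approval recorded
    "validation_score_minimum": 91,  # interactive validation score
    "completeness_threshold": 96,    # contract completeness
    "multi_agent_consensus": True,   # all agents signed off
}
assert all(gates_pass(gates, measured).values())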
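
Likewise, escalation_procedures is structured data a runner can route failures through, not only guidance for humans. A small sketch under the same saved-file assumption; the escalate helper is hypothetical:

# Sketch: route a failure condition through escalation_procedures.
# Assumes the definition above is saved as "data-ingestion-workflow.yaml";
# escalate() is an illustrative helper, not part of the package.
import yaml  # PyYAML

with open("data-ingestion-workflow.yaml") as f:
    wf = yaml.safe_load(f)["workflow"]

escalations = {e["condition"]: e for e in wf["escalation_procedures"]}

def escalate(condition: str) -> str:
    entry = escalations.get(condition)
    if entry is None:
        return f"No escalation path defined for: {condition}"
    return f"{entry['action']} ({entry['timeline'].lower()})"

print(escalate("Quality gate failures"))
# -> Escalate to Data Architect and Data Governance Owner (within 4 hours)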