agentic-data-stack-community
AI Agentic Data Stack Framework - Community Edition. Open source data engineering framework with 4 core agents, essential templates, and 3-dimensional quality validation.
workflow:
  id: data-ingestion-workflow
  name: Interactive Data Ingestion Pipeline Development
  description: >-
    Complete workflow for developing data ingestion pipelines using the interactive validation framework,
    multi-agent collaboration, and real-time quality scoring. Supports both batch and real-time patterns.
  type: greenfield
  framework_version: 2.0
  validation_mode: interactive
  collaboration_mode: multi_agent

  project_types:
    - batch-ingestion
    - real-time-streaming
    - api-integration
    - file-based-ingestion
    - database-replication

  interactive_features:
    progressive_disclosure: enabled
    real_time_validation: enabled
    multi_agent_orchestration: enabled
    quality_scoring: continuous
    stakeholder_collaboration: active
  sequence:
    - step: interactive_requirements_analysis
      agent: data-product-manager
      action: create-data-contract
      uses_template: interactive-data-contract-tmpl
      creates: interactive-data-contract.md
      validation_mode: multi_stakeholder
      duration: 1-2 days
      interactive_features:
        progressive_disclosure: enabled
        stakeholder_routing: automated
        real_time_validation: active
      notes: |
        Create comprehensive interactive data contract with:
        - Multi-stakeholder collaboration workflows
        - Progressive disclosure for complex requirements
        - Real-time validation and quality scoring
        - Advanced elicitation techniques
        - Automated evidence collection
        SAVE OUTPUT: Copy final interactive-data-contract.md to your project's docs/ folder.
      quality_gates:
        stakeholder_approval: required
        validation_score_minimum: 85
        completeness_threshold: 95
        multi_agent_consensus: required
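    # Illustrative only: a minimal sketch of what one section of the resulting
    # interactive-data-contract.md might capture. The field names below (dataset,
    # owner, sla, quality_dimensions) are assumptions for illustration, not part
    # of this workflow's schema.
    #
    #   dataset: customer_orders
    #   owner: data-product-manager
    #   sla:
    #     freshness: "raw data available by 06:00 UTC"
    #     availability: "99.5%"
    #   quality_dimensions:
    #     completeness: ">= 95% of mandatory fields populated"
    #     accuracy: "order totals reconcile with the source system"
    #     timeliness: "ingested within 2 hours of source update"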
    - step: architecture_design
      agent: data-architect
      action: design-data-architecture
      creates: data-architecture.md
      requires: interactive-data-contract.md
      duration: 1-2 days
      notes: |
        Design technical architecture including:
        - Ingestion patterns (batch vs streaming)
        - Data pipeline architecture
        - Storage and processing layer design
        - Integration points and API specifications
        SAVE OUTPUT: Copy final data-architecture.md to your project's docs/ folder.
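    # Illustrative only: a hypothetical fragment of data-architecture.md showing
    # how a batch-vs-streaming decision might be recorded. Names and values are
    # assumptions, not framework-defined fields.
    #
    #   ingestion_pattern: batch          # chosen over streaming; source updates hourly
    #   source:
    #     type: postgres
    #     extraction: incremental, driven by an updated_at watermark
    #   storage_layers:
    #     raw: object storage (immutable landing zone)
    #     curated: warehouse tables partitioned by load date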
    - step: interactive_quality_framework_design
      agent: data-quality-engineer
      action: interactive-quality-validation
      uses_task: interactive-quality-validation
      creates: interactive-quality-framework.md
      requires: interactive-data-contract.md
      validation_mode: comprehensive
      duration: 0.5-1 day
      interactive_features:
        real_time_quality_scoring: enabled
        multi_dimensional_assessment: active
        automated_evidence_collection: enabled
      notes: |
        Define interactive quality framework with:
        - Real-time quality validation and scoring
        - Multi-dimensional quality assessment
        - Automated evidence collection
        - Predictive quality analytics
        - Interactive quality dashboards
      quality_gates:
        quality_coverage: 100
        validation_framework_score: 90
        automated_check_percentage: 80
        stakeholder_quality_approval: required
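    # Illustrative only: hypothetical rules from interactive-quality-framework.md
    # expressing weighted, multi-dimensional checks. The keys shown (dimension,
    # rule, threshold, weight) are assumptions for illustration.
    #
    #   - dimension: completeness
    #     rule: "null rate of customer_id is 0%"
    #     threshold: 100
    #     weight: 0.4
    #   - dimension: validity
    #     rule: "order_total >= 0"
    #     threshold: 99.9
    #     weight: 0.3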
    - step: interactive_governance_validation
      agent: data-governance-owner
      action: data-contract-validation
      uses_task: data-contract-validation
      validates: [interactive-data-contract.md, data-architecture.md]
      uses: interactive-quality-validation
      validation_mode: comprehensive_compliance
      duration: 0.5 day
      interactive_features:
        compliance_checking: automated
        regulatory_monitoring: real_time
        risk_assessment: continuous
      notes: |
        Interactive governance validation with:
        - Automated compliance checking
        - Real-time regulatory monitoring
        - Interactive risk assessment
        - Multi-jurisdictional compliance validation
        - Automated audit trail generation
      quality_gates:
        compliance_score: 95
        regulatory_validation: passed
        security_assessment: approved
        privacy_impact_assessment: completed
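    # Illustrative only: hypothetical automated compliance checks that this
    # governance validation could record as evidence. All keys and values are
    # assumptions, not framework-defined outputs.
    #
    #   - check: pii_columns_classified
    #     regulation: GDPR
    #     status: passed
    #     evidence: "email and phone tagged as PII; masking policy attached"
    #   - check: retention_policy_defined
    #     regulation: internal data policy
    #     status: passed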
    - step: pipeline_implementation
      agent: data-engineer
      action: build-pipeline
      creates: pipeline-code
      requires: [data-architecture.md, interactive-quality-framework.md]
      duration: 3-5 days
      notes: |
        Implement the data ingestion pipeline:
        - Source system integration and data extraction
        - Data transformation and validation logic
        - Quality checks and error handling
        - Pipeline orchestration and scheduling
        - Monitoring and alerting implementation
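    # Illustrative only: a hypothetical orchestration entry for the pipeline this
    # step produces (for example, a scheduler job definition). Field names are
    # assumptions and do not belong to this workflow's schema.
    #
    #   job: customer_orders_ingestion
    #   schedule: "0 5 * * *"            # daily at 05:00 UTC
    #   tasks:
    #     - extract_from_source          # incremental pull from the source system
    #     - validate_raw                 # schema and row-count checks; quarantine on failure
    #     - transform_and_load           # conform records to the curated model
    #     - publish_quality_metrics      # emit scores for the monitoring dashboard
    #   on_failure: alert the data-engineer and retry up to 3 times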
    - step: interactive_quality_implementation
      agent: data-quality-engineer
      action: implement-quality-checks
      uses_framework: interactive-quality-validation
      creates: interactive-quality-tests
      requires: [pipeline-code, interactive-quality-framework.md]
      validation_mode: comprehensive_testing
      duration: 1-2 days
      interactive_features:
        real_time_test_validation: enabled
        automated_test_generation: active
        quality_score_tracking: continuous
      notes: |
        Implement interactive quality validation:
        - Real-time quality validation framework
        - Automated test generation and execution
        - Interactive quality dashboards
        - Multi-dimensional quality scoring
        - Predictive quality analytics
      quality_gates:
        test_coverage: 95
        quality_validation_score: 90
        automated_test_percentage: 85
        real_time_monitoring: operational
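    # Illustrative only: hypothetical automated test definitions of the kind this
    # step could generate; names and syntax are assumptions, not a framework API.
    #
    #   - test: not_null
    #     table: curated.customer_orders
    #     column: customer_id
    #     severity: error
    #   - test: accepted_range
    #     table: curated.customer_orders
    #     column: order_total
    #     min: 0
    #     severity: warn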
    - step: multi_agent_testing_validation
      agents: [data-engineer, data-quality-engineer]
      action: validate-data-story
      uses_task: validate-data-story
      validates: [pipeline-code, interactive-quality-tests]
      validation_mode: multi_agent_orchestration
      duration: 1-2 days
      interactive_features:
        multi_agent_collaboration: enabled
        real_time_validation_scoring: active
        automated_evidence_collection: comprehensive
      quality_gates:
        multi_agent_consensus: required
        validation_score_minimum: 90
        quality_framework_alignment: verified
        story_implementation_match: confirmed
      notes: |
        Comprehensive pipeline testing:
        - End-to-end pipeline testing
        - Data quality validation testing
        - Performance benchmarking
        - Error handling and recovery testing
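    # Illustrative only: a hypothetical evidence record from a multi-agent
    # validation run, sketching how consensus and scores might be captured.
    # All keys and values are assumptions.
    #
    #   scores:
    #     data-engineer: 92
    #     data-quality-engineer: 94
    #   consensus: reached               # both scores above the 90 minimum
    #   evidence:
    #     - end_to_end_run: "full test dataset processed, 0 rows rejected"
    #     - performance: "p95 batch latency within the agreed SLA"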
    - step: user_acceptance_testing
      agent: data-analyst
      action: validate-business-requirements
      validates: pipeline-outputs
      requires: pipeline-code
      duration: 1 day
      notes: |
        Business validation of pipeline outputs:
        - Data accuracy validation against business rules
        - Completeness verification for business requirements
        - Performance validation against SLA requirements
        - User interface and reporting validation (if applicable)
    - step: deployment_preparation
      agent: data-engineer
      action: prepare-deployment
      creates: deployment-package
      requires: [pipeline-code, interactive-quality-tests]
      duration: 0.5-1 day
      notes: |
        Prepare for production deployment:
        - Infrastructure provisioning and configuration
        - Environment-specific configuration management
        - Deployment scripts and automation
        - Rollback procedures and contingency planning
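    # Illustrative only: a hypothetical environment overlay from the deployment
    # package, showing environment-specific configuration and a rollback plan.
    # Keys are assumptions, not framework-defined.
    #
    #   environment: production
    #   config:
    #     warehouse: analytics_prod
    #     batch_size: 50000
    #   rollback:
    #     strategy: redeploy the previously released pipeline artifact
    #     data: restore curated tables from the last validated snapshot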
    - step: production_deployment
      agent: data-engineer
      action: deploy-pipeline
      creates: production-deployment
      requires: deployment-package
      duration: 0.5 day
      notes: |
        Deploy pipeline to production:
        - Execute deployment automation
        - Validate production deployment
        - Configure monitoring and alerting
        - Initialize production data flows
    - step: monitoring_setup
      agent: data-quality-engineer
      action: setup-quality-monitoring
      creates: monitoring-dashboard
      requires: production-deployment
      duration: 0.5 day
      notes: |
        Configure ongoing monitoring:
        - Quality metrics monitoring dashboards
        - Automated alerting and notification setup
        - Performance monitoring and capacity planning
        - Operational runbooks and procedures
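    # Illustrative only: hypothetical alert rules the monitoring dashboard could
    # include; metric names and thresholds are assumptions.
    #
    #   - alert: freshness_breach
    #     metric: minutes_since_last_successful_load
    #     threshold: "> 180"
    #     notify: data-engineer, data-quality-engineer
    #   - alert: quality_score_drop
    #     metric: daily_quality_score
    #     threshold: "< 90"
    #     notify: data-quality-engineer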
    - step: documentation_and_handoff
      agent: data-product-manager
      action: finalize-documentation
      creates: [user-documentation, operational-documentation]
      requires: [production-deployment, monitoring-dashboard]
      duration: 0.5 day
      notes: |
        Complete documentation and knowledge transfer:
        - User guides and API documentation
        - Operational procedures and troubleshooting guides
        - Team knowledge transfer sessions
        - Post-deployment support procedures
  validation_gates:
    - gate: requirements_validation
      criteria:
        - Data contract includes all required sections
        - Business stakeholders have approved requirements
        - Quality dimensions and thresholds are defined
        - Governance requirements are documented
    - gate: architecture_validation
      criteria:
        - Architecture supports scalability requirements
        - Integration patterns are well-defined
        - Security and compliance requirements addressed
        - Performance requirements can be met
    - gate: implementation_validation
      criteria:
        - All unit tests pass with >85% code coverage
        - Integration tests validate end-to-end data flow
        - Quality checks meet defined thresholds
        - Error handling covers all failure scenarios
    - gate: deployment_validation
      criteria:
        - Production deployment completes successfully
        - All monitoring and alerting is functional
        - Performance meets SLA requirements
        - Security controls are properly configured
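  # Illustrative only: a hypothetical CI step enforcing the implementation gate's
  # coverage criterion. The job name is an assumption; the pytest-cov flags shown
  # are standard but are not prescribed by this framework.
  #
  #   - name: unit-tests-with-coverage
  #     run: pytest --cov=pipeline --cov-fail-under=85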
  success_criteria:
    technical:
      - Pipeline processes data within SLA timeframes
      - Data quality scores meet defined thresholds
      - System availability meets uptime requirements
      - Performance benchmarks are achieved
    business:
      - Business stakeholders can access required data
      - Data supports decision-making requirements
      - Compliance and governance requirements are met
      - User adoption meets expected targets
  escalation_procedures:
    - condition: Quality gate failures
      action: Escalate to Data Architect and Data Governance Owner
      timeline: Within 4 hours
    - condition: Production deployment issues
      action: Escalate to Infrastructure Team and Data Engineering Manager
      timeline: Within 2 hours
    - condition: Business requirement conflicts
      action: Escalate to Data Product Manager and Business Stakeholders
      timeline: Within 1 business day
  post_deployment_activities:
    - activity: Performance monitoring
      frequency: Daily for first week, then weekly
      responsible: Data Engineer
    - activity: Quality assessment
      frequency: Weekly for first month, then monthly
      responsible: Data Quality Engineer
    - activity: User feedback collection
      frequency: 30 days post-deployment
      responsible: Data Product Manager
    - activity: Cost optimization review
      frequency: 60 days post-deployment
      responsible: Data Architect