agentic-data-stack-community
AI Agentic Data Stack Framework - Community Edition. Open-source data engineering framework with 4 core agents, essential templates, and 3-dimensional quality validation.
metadata:
template_id: "data-contract-tmpl"
name: "Data Contract Specification Template"
version: "2.0.0"
description: "Comprehensive template for defining data contracts with validation and governance"
category: "governance"
tags: ["data-contract", "governance", "validation", "specification", "compliance"]
created_by: "AI Agentic Data Stack Framework"
created_date: "2025-01-23"
template:
id: data-contract-template
name: Data Contract Specification
version: "2.0.0"
type: data-contract
mode: interactive
output:
format: markdown
filename: docs/data-contracts/{{dataset_name}}-data-contract.md
title: "{{dataset_name}} Data Contract"
validation_mode: continuous
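The `{{dataset_name}}` placeholder in the output path can be rendered by any simple template engine; a minimal sketch assuming plain string substitution (the helper and the "customer-360" dataset name are illustrative, not part of the framework):

```python
# Render the templated output path for a hypothetical dataset.
# The {{dataset_name}} placeholder syntax comes from the template above;
# this rendering helper is an assumption for illustration only.
def render_output_path(template: str, dataset_name: str) -> str:
    return template.replace("{{dataset_name}}", dataset_name)

path = render_output_path(
    "docs/data-contracts/{{dataset_name}}-data-contract.md", "customer-360"
)
print(path)  # docs/data-contracts/customer-360-data-contract.md
```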
workflow:
mode: interactive
elicitation: advanced-data-elicitation
validation: multi-stage
collaboration: multi-stakeholder
approval_required: true
elicitation_config:
progressive_disclosure: true
context_awareness: true
intelligent_branching: true
real_time_validation: true
stakeholder_routing: true
validation_framework:
interactive_validation: true
multi_agent_orchestration: true
real_time_scoring: true
evidence_collection: automated
quality_gates: comprehensive
sections:
- id: overview
title: Overview
instruction: |
Establish the foundational context for this data contract using interactive elicitation.
Progressive disclosure adapts questions based on complexity and stakeholder expertise.
workflow: interactive
validation: real_time
collaboration: multi_stakeholder
sections:
- id: business_purpose
title: Business Purpose
type: paragraph
instruction: What business problem does this dataset solve? What decisions or processes does it support?
examples:
- "Customer segmentation for targeted marketing campaigns"
- "Financial risk assessment for loan approvals"
- "Supply chain optimization for inventory management"
- id: value_proposition
title: Data Value Proposition
type: paragraph
instruction: What specific business value does this data provide? How will success be measured?
examples:
- "Enables 15% improvement in marketing campaign conversion rates"
- "Reduces loan default risk by 20% through better risk scoring"
- "Decreases inventory carrying costs by 10% through demand forecasting"
- id: stakeholders
title: Multi-Stakeholder Collaboration Matrix
type: collaborative_table
columns: [Role, Team/Department, Responsibilities, Contact, Validation_Tasks, Approval_Level]
instruction: Identify all stakeholders using interactive stakeholder mapping with validation workflows
collaboration_mode: multi_party_input
validation: stakeholder_verification
examples:
- "Data Owner | Marketing | Business requirements, quality standards | marketing-lead@company.com | Business validation | Required"
- "Data Steward | Data Engineering | Technical implementation, monitoring | data-eng@company.com | Technical validation | Required"
- "Data Consumer | Analytics Team | Analysis and insights | analytics@company.com | Usage validation | Advisory"
interactive_features:
stakeholder_workflows: enabled
approval_routing: automated
notification_system: real_time
validation_tracking: comprehensive
- id: data_sources
title: Data Sources and Lineage
instruction: |
Document all data sources, update frequencies, and lineage information.
sections:
- id: source_systems
title: Source Systems
type: table
columns: [System Name, Description, Owner, Connection Type, Dependencies]
instruction: List all source systems that contribute data to this contract
examples:
- "Salesforce CRM | Customer relationship data | Sales Team | API | Customer Master Data"
- "Web Analytics | User behavior data | Marketing | Streaming | User Identity Service"
- "ERP System | Financial transaction data | Finance | Batch ETL | Chart of Accounts"
- id: update_frequency
title: Update Frequency and Timing
type: structured
instruction: Define when and how often data is updated
fields:
- field: batch_schedule
type: text
description: "Batch processing schedule (if applicable)"
- field: real_time_sources
type: text
description: "Real-time streaming sources (if applicable)"
- field: business_hours
type: text
description: "Business hours considerations"
examples:
- "Batch: Daily at 2 AM EST | Real-time: User events via Kafka | Business hours: 9 AM - 5 PM EST"
- id: data_freshness
title: Data Freshness Requirements
type: structured
instruction: Define acceptable data age and staleness thresholds
fields:
- field: maximum_age
type: text
description: "Maximum acceptable data age"
- field: sla_target
type: text
description: "Service level agreement for data freshness"
- field: alert_threshold
type: text
description: "When to alert if data becomes stale"
examples:
- "Max age: 4 hours | SLA: 95% of data < 2 hours old | Alert: > 6 hours"
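The freshness thresholds above can be enforced with a small staleness check; a sketch using the example values (max age 4 hours, alert above 6 hours — the function and status labels are assumptions):

```python
from datetime import datetime, timedelta, timezone

# Hypothetical freshness check implementing the thresholds from the
# example above. Names and status strings are illustrative.
MAX_AGE = timedelta(hours=4)
ALERT_THRESHOLD = timedelta(hours=6)

def freshness_status(last_updated: datetime, now: datetime) -> str:
    age = now - last_updated
    if age > ALERT_THRESHOLD:
        return "alert"   # page on-call: data well past acceptable age
    if age > MAX_AGE:
        return "stale"   # over the max-age target, not yet alert-worthy
    return "fresh"

now = datetime(2023, 12, 1, 12, 0, tzinfo=timezone.utc)
print(freshness_status(now - timedelta(hours=1), now))  # fresh
print(freshness_status(now - timedelta(hours=5), now))  # stale
print(freshness_status(now - timedelta(hours=7), now))  # alert
```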
- id: schema_definition
title: Schema Definition and Data Model
instruction: |
Define the complete data schema including fields, types, constraints, and business rules.
sections:
- id: field_specifications
title: Field Specifications
type: table
columns: [Field Name, Data Type, Constraints, Description, Business Rules, Examples]
instruction: Document every field in the dataset with complete specifications
examples:
- "customer_id | UUID | NOT NULL, UNIQUE | Unique customer identifier | Immutable once assigned | '123e4567-e89b-12d3-a456-426614174000'"
- "email_address | VARCHAR(255) | NOT NULL, VALID EMAIL | Customer email address | Must be valid email format | 'customer@example.com'"
- "created_date | TIMESTAMP | NOT NULL | Record creation timestamp | Must be in UTC | '2023-12-01T10:30:00Z'"
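The three example rows above translate directly into record-level checks; a minimal validation sketch (the helper, regex, and error messages are assumptions, not framework APIs):

```python
import re
import uuid
from datetime import datetime

# Sketch: validate a record against the example field specifications above
# (UUID customer_id, valid email <= 255 chars, ISO 8601 UTC timestamp).
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate_record(record: dict) -> list[str]:
    errors = []
    try:
        uuid.UUID(record.get("customer_id", ""))
    except (ValueError, TypeError):
        errors.append("customer_id: not a valid UUID")
    email = record.get("email_address") or ""
    if not EMAIL_RE.match(email) or len(email) > 255:
        errors.append("email_address: invalid email")
    try:
        # Accept the trailing "Z" on older Python versions too.
        datetime.fromisoformat(record.get("created_date", "").replace("Z", "+00:00"))
    except ValueError:
        errors.append("created_date: not an ISO 8601 UTC timestamp")
    return errors

ok = {"customer_id": "123e4567-e89b-12d3-a456-426614174000",
      "email_address": "customer@example.com",
      "created_date": "2023-12-01T10:30:00Z"}
print(validate_record(ok))  # []
```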
- id: data_types
title: Data Type Standards
type: structured
instruction: Define standard data types and formats used across the schema
fields:
- field: date_format
type: text
description: "Standard date/timestamp format"
- field: string_encoding
type: text
description: "Character encoding standard"
- field: number_precision
type: text
description: "Decimal precision standards"
examples:
- "Dates: ISO 8601 (YYYY-MM-DDTHH:MM:SSZ) | Encoding: UTF-8 | Decimals: 2 decimal places for currency"
- id: relationships
title: Data Relationships
type: table
columns: [Parent Table, Child Table, Relationship Type, Foreign Key, Description]
instruction: Document relationships between tables/entities in the data model
examples:
- "customers | orders | One-to-Many | customer_id | One customer can have multiple orders"
- "products | order_items | One-to-Many | product_id | One product can be in multiple order items"
- id: quality_rules
title: Interactive Data Quality Rules and Validation
instruction: |
Define comprehensive data quality rules using the interactive validation framework.
Real-time quality scoring and multi-agent validation ensure comprehensive coverage.
workflow: interactive_validation
validation_mode: comprehensive
quality_framework: interactive
sections:
- id: completeness_rules
title: Interactive Completeness Rules
type: interactive_structured
instruction: Define completeness rules with real-time validation and scoring
validation: automated_evidence_collection
interactive_features:
real_time_scoring: enabled
validation_preview: enabled
rule_testing: automated
fields:
- field: required_fields
type: validated_list
description: "Fields that must always have values"
validation: schema_field_verification
- field: conditional_requirements
type: rule_builder
description: "Fields required under certain conditions"
validation: business_logic_verification
- field: acceptable_null_rate
type: percentage_with_validation
description: "Maximum acceptable null rate for optional fields"
validation: threshold_feasibility_check
examples:
- "Required: customer_id, email, created_date | Conditional: phone_number (required for premium customers) | Null rate: < 5% for optional fields"
quality_gates:
completeness_threshold: 95
validation_passing_score: 90
stakeholder_approval: required
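The completeness rules above (required fields always present, optional fields under a 5% null rate) can be checked per batch; a sketch with illustrative field names:

```python
# Sketch of a completeness check matching the example rules above.
# Field names and the report shape are assumptions for illustration.
REQUIRED = {"customer_id", "email", "created_date"}
MAX_NULL_RATE = 0.05  # optional fields: < 5% nulls per the example

def completeness_report(rows: list[dict], optional: set[str]) -> dict:
    report = {"missing_required": set(), "null_rates": {}}
    for field in REQUIRED:
        if any(row.get(field) is None for row in rows):
            report["missing_required"].add(field)
    for field in optional:
        nulls = sum(1 for row in rows if row.get(field) is None)
        report["null_rates"][field] = nulls / len(rows)
    return report

rows = [{"customer_id": i, "email": "a@b.co", "created_date": "2023-01-01",
         "phone_number": None if i < 2 else "555-0100"} for i in range(100)]
rep = completeness_report(rows, {"phone_number"})
print(rep["null_rates"]["phone_number"] <= MAX_NULL_RATE)  # True (2% nulls)
```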
- id: accuracy_rules
title: Accuracy Rules
type: structured
instruction: Define rules for data accuracy and correctness
fields:
- field: format_validation
type: text
description: "Data format validation rules"
- field: range_checks
type: text
description: "Acceptable value ranges"
- field: business_rules
type: text
description: "Business logic validation rules"
examples:
- "Email: valid email format | Age: 0-120 years | Revenue: >= 0"
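The range checks in the example row (Age: 0-120, Revenue >= 0) reduce to a simple bounds table; a sketch with assumed field names:

```python
# Sketch of the range checks from the example above. The RANGES table
# and field names are illustrative assumptions.
RANGES = {"age": (0, 120), "revenue": (0, float("inf"))}

def range_violations(row: dict) -> list[str]:
    out = []
    for field, (lo, hi) in RANGES.items():
        value = row.get(field)
        if value is not None and not (lo <= value <= hi):
            out.append(f"{field}={value} outside [{lo}, {hi}]")
    return out

print(range_violations({"age": 35, "revenue": 1200.0}))  # []
print(range_violations({"age": 140, "revenue": -5.0}))   # two violations
```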
- id: consistency_rules
title: Consistency Rules
type: structured
instruction: Define rules for data consistency across systems and time
fields:
- field: cross_system_checks
type: text
description: "Consistency checks across source systems"
- field: historical_consistency
type: text
description: "Historical data consistency requirements"
- field: referential_integrity
type: text
description: "Foreign key and relationship consistency"
examples:
- "Customer count must match between CRM and billing systems | Historical records immutable after 30 days | All order_items must reference valid product_id"
- id: uniqueness_rules
title: Uniqueness Rules
type: table
columns: [Field/Combination, Uniqueness Scope, Exception Handling, Validation Method]
instruction: Define uniqueness constraints and duplicate handling rules
examples:
- "customer_id | Global | Error on duplicate | Primary key constraint"
- "email_address | Per customer | Allow multiple customers same email in B2B | Business rule validation"
- "order_number | Per year | Error on duplicate within year | Composite unique constraint"
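The scoped uniqueness rules above differ from a plain unique constraint: the order_number row, for instance, allows repeats across years but not within one. A sketch of that per-year check (the function and sample orders are illustrative):

```python
from collections import Counter

# Sketch of the per-year order_number uniqueness rule from the table
# above: duplicates are flagged only within the same calendar year.
def duplicate_order_numbers(orders: list[dict]) -> list[tuple]:
    # Key each order by (year, order_number); a count > 1 is a violation.
    counts = Counter((o["order_date"][:4], o["order_number"]) for o in orders)
    return [key for key, n in counts.items() if n > 1]

orders = [
    {"order_number": "A-100", "order_date": "2023-02-01"},
    {"order_number": "A-100", "order_date": "2024-02-01"},  # ok: new year
    {"order_number": "A-101", "order_date": "2023-03-01"},
    {"order_number": "A-101", "order_date": "2023-04-01"},  # duplicate
]
print(duplicate_order_numbers(orders))  # [('2023', 'A-101')]
```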
- id: governance
title: Interactive Governance and Compliance Framework
instruction: |
Define governance requirements using multi-agent collaboration with automated compliance checking.
Interactive validation ensures comprehensive coverage of regulatory requirements.
workflow: compliance_validation
agents: [data-governance-owner, data-product-manager]
validation: regulatory_compliance_check
sections:
- id: access_controls
title: Access Controls and Security
type: structured
instruction: Define who can access this data and under what conditions
fields:
- field: access_roles
type: text
description: "Roles and permissions for data access"
- field: authentication
type: text
description: "Authentication requirements"
- field: encryption
type: text
description: "Encryption requirements for data at rest and in transit"
examples:
- "Roles: Analytics (read), Data Engineering (read/write), Finance (read financial fields only) | Auth: SSO required | Encryption: AES-256 at rest, TLS 1.3 in transit"
- id: privacy_compliance
title: Privacy and Data Protection
type: structured
instruction: Define privacy requirements and data protection measures
fields:
- field: personal_data
type: text
description: "Classification of personal/sensitive data fields"
- field: retention_policy
type: text
description: "Data retention and deletion policies"
- field: consent_management
type: text
description: "Consent tracking and management requirements"
examples:
- "PII fields: email, phone, address | Retention: 7 years active, 2 years archived | Consent: tracked in consent_management table"
- id: regulatory_requirements
title: Interactive Regulatory Compliance Matrix
type: compliance_validation_table
columns: [Regulation, Applicable Data, Requirements, Compliance Measures, Validation_Status, Agent_Responsible]
instruction: Document regulatory requirements with automated compliance validation
validation_mode: continuous_compliance_monitoring
interactive_features:
compliance_checking: automated
risk_assessment: real_time
audit_trail: comprehensive
regulatory_updates: monitored
examples:
- "GDPR | Customer personal data | Right to deletion, data portability | Automated deletion process, data export API | Validated | data-governance-owner"
- "SOX | Financial transaction data | Audit trail, change tracking | Immutable audit log, quarterly reviews | Validated | data-governance-owner"
- "HIPAA | Health information (if applicable) | Access logging, encryption | Comprehensive audit logging, signed BAAs | N/A | data-governance-owner"
compliance_framework:
automated_scanning: enabled
regulatory_monitoring: active
violation_alerting: immediate
audit_preparation: automated
- id: sla
title: Service Level Agreements
instruction: |
Define service level commitments for data availability, performance, and quality.
sections:
- id: availability_sla
title: Availability Requirements
type: structured
instruction: Define uptime and availability commitments
fields:
- field: uptime_target
type: text
description: "Target uptime percentage"
- field: maintenance_windows
type: text
description: "Scheduled maintenance windows"
- field: disaster_recovery
type: text
description: "Disaster recovery and business continuity plans"
examples:
- "Uptime: 99.9% | Maintenance: Sundays 2-4 AM EST | Recovery: 4-hour RTO, 1-hour RPO"
- id: performance_sla
title: Performance Requirements
type: structured
instruction: Define performance benchmarks and response times
fields:
- field: query_response_time
type: text
description: "Maximum acceptable query response times"
- field: throughput_requirements
type: text
description: "Data processing throughput requirements"
- field: scalability_targets
type: text
description: "Scalability and growth planning targets"
examples:
- "Queries: < 5 seconds for standard reports, < 30 seconds for ad-hoc analysis | Throughput: 10,000 records/minute | Scale: 50% annual growth capacity"
- id: quality_sla
title: Data Quality SLA
type: structured
instruction: Define measurable quality commitments and targets
fields:
- field: quality_score_target
type: text
description: "Overall data quality score target"
- field: error_rate_threshold
type: text
description: "Maximum acceptable error rates"
- field: resolution_time
type: text
description: "Time to resolve quality issues"
examples:
- "Quality score: > 95% | Error rate: < 0.1% | Resolution: Critical issues within 2 hours, standard issues within 24 hours"
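The measurable targets above (quality score > 95%, error rate < 0.1%) can gate a batch before publication; a sketch with the thresholds copied from the example row (the gate function itself is an assumption):

```python
# Sketch of an SLA gate for the quality targets above. Thresholds are
# taken from the example row; the function and messages are illustrative.
def sla_breaches(quality_score: float, error_rate: float) -> list[str]:
    breaches = []
    if quality_score <= 0.95:          # target: score > 95%
        breaches.append(f"quality score {quality_score:.1%} below target")
    if error_rate >= 0.001:            # target: error rate < 0.1%
        breaches.append(f"error rate {error_rate:.2%} above threshold")
    return breaches

print(sla_breaches(0.97, 0.0005))  # [] — batch passes
print(sla_breaches(0.93, 0.002))   # two breaches — hold the batch
```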
- id: monitoring
title: Interactive Monitoring and Real-time Quality Scoring
instruction: |
Define comprehensive monitoring strategy with real-time quality scoring and intelligent alerting.
Interactive dashboards provide continuous visibility into data health and performance.
workflow: real_time_monitoring
validation: continuous_quality_scoring
interactive_features:
real_time_dashboards: enabled
predictive_alerting: enabled
automated_remediation: configured
sections:
- id: quality_monitoring
title: Interactive Quality Monitoring Framework
type: interactive_monitoring_system
instruction: Define continuous quality monitoring with real-time scoring and intelligent alerting
validation: quality_monitoring_validation
interactive_features:
real_time_scoring: enabled
predictive_analytics: enabled
automated_evidence_collection: active
multi_dimensional_assessment: comprehensive
fields:
- field: automated_checks
type: validation_scheduler
description: "Automated quality checks with intelligent scheduling"
validation: feasibility_check
- field: quality_metrics
type: metric_selector_with_thresholds
description: "Key quality metrics with dynamic thresholds"
validation: business_impact_assessment
- field: alerting_rules
type: intelligent_alerting_system
description: "Context-aware alerting with escalation logic"
validation: alert_effectiveness_check
examples:
- "Checks: Real-time streaming validation + scheduled batch validation | Metrics: Multi-dimensional quality scores with trend analysis | Alerts: Intelligent alerting with stakeholder routing"
monitoring_capabilities:
real_time_quality_scoring: enabled
anomaly_detection: automated
trend_analysis: predictive
stakeholder_dashboards: role_based
- id: operational_monitoring
title: Operational Monitoring
type: structured
instruction: Define operational metrics and system health monitoring
fields:
- field: performance_metrics
type: text
description: "System performance metrics to track"
- field: usage_analytics
type: text
description: "Data usage and consumption analytics"
- field: cost_monitoring
type: text
description: "Cost tracking and optimization monitoring"
examples:
- "Performance: Query response time, resource utilization | Usage: Active users, query volume, popular datasets | Cost: Storage costs, compute costs, trending"
- id: lifecycle
title: Data Lifecycle Management
instruction: |
Define how data will be managed throughout its lifecycle from creation to deletion.
sections:
- id: data_classification
title: Data Classification
type: table
columns: [Classification Level, Data Types, Access Requirements, Retention Period]
instruction: Classify data based on sensitivity and business importance
examples:
- "Public | Marketing metrics, product information | Open access | 3 years"
- "Internal | Customer analytics, business metrics | Employee access only | 7 years"
- "Confidential | PII, financial data | Role-based access | 7 years active + 3 years archived"
- "Restricted | Payment info, health data | Strict access controls | Legal minimum only"
- id: archival_strategy
title: Archival and Retention
type: structured
instruction: Define data archival and long-term retention strategy
fields:
- field: active_period
type: text
description: "Period data remains in active storage"
- field: archive_criteria
type: text
description: "Criteria for moving data to archive storage"
- field: deletion_policy
type: text
description: "When and how data is permanently deleted"
examples:
- "Active: 2 years in hot storage | Archive: Move to cold storage after 2 years | Deletion: Permanent deletion after 7 years or upon request"
- id: change_management
title: Change Management
instruction: |
Define how changes to this data contract will be managed and communicated.
sections:
- id: versioning
title: Contract Versioning
type: structured
instruction: Define how contract versions will be managed
fields:
- field: version_strategy
type: text
description: "Versioning strategy and numbering scheme"
- field: backward_compatibility
type: text
description: "Backward compatibility requirements"
- field: deprecation_process
type: text
description: "Process for deprecating old versions"
examples:
- "Strategy: Semantic versioning (major.minor.patch) | Compatibility: Maintain for 2 major versions | Deprecation: 6-month notice for breaking changes"
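Under the semantic-versioning strategy above, a consumer pinned to one major version can accept any minor or patch release within it; a minimal compatibility sketch (the parsing helpers are assumptions, not framework code):

```python
# Sketch of the semantic-versioning compatibility rule described above:
# minor/patch bumps are backward compatible, major bumps are breaking.
def parse_version(v: str) -> tuple[int, int, int]:
    major, minor, patch = (int(p) for p in v.split("."))
    return major, minor, patch

def is_backward_compatible(old: str, new: str) -> bool:
    # Same major version => no breaking change under semver.
    return parse_version(new)[0] == parse_version(old)[0]

print(is_backward_compatible("2.0.0", "2.3.1"))  # True
print(is_backward_compatible("2.3.1", "3.0.0"))  # False — breaking change
```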
- id: approval_process
title: Change Approval Process
type: structured
instruction: Define who must approve changes and the approval workflow
fields:
- field: approval_authority
type: text
description: "Who can approve different types of changes"
- field: review_process
type: text
description: "Review and validation process for changes"
- field: communication_plan
type: text
description: "How changes will be communicated to stakeholders"
examples:
- "Authority: Data Owner (minor), Governance Board (major) | Review: Technical review + business impact assessment | Communication: Email + Slack 2 weeks advance notice"
- id: appendix
title: Appendix
instruction: |
Include additional reference information and documentation.
sections:
- id: glossary
title: Glossary of Terms
type: table
columns: [Term, Definition, Context]
instruction: Define key terms and concepts used in this contract
examples:
- "Customer | Individual or organization that purchases products/services | Includes both active and inactive customers"
- "MAU | Monthly Active Users | Users who performed at least one action in the past 30 days"
- "Churn | Customer who canceled or didn't renew subscription | Calculated monthly"
- id: references
title: References and Related Documents
type: list
instruction: List related documents, standards, and external references
examples:
- "Data Architecture Document v2.1"
- "Company Data Governance Policy"
- "GDPR Compliance Handbook"
- "Data Quality Framework Documentation"