agentic-data-stack-community
AI Agentic Data Stack Framework - Community Edition. Open-source data engineering framework with 4 core agents, essential templates, and 3-dimensional quality validation.
metadata:
template_id: "data-contract-tmpl"
name: "Data Contract Specification Template"
version: "2.0.0"
description: "Comprehensive template for defining data contracts with validation and governance"
category: "governance"
tags: ["data-contract", "governance", "validation", "specification", "compliance"]
created_by: "AI Agentic Data Stack Framework"
created_date: "2025-01-23"
template:
id: data-contract-template
name: Data Contract Specification
version: "2.0.0"
type: data-contract
mode: interactive
output:
format: markdown
filename: docs/data-contracts/{{dataset_name}}-data-contract.md
title: "{{dataset_name}} Data Contract"
validation_mode: continuous
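The `{{dataset_name}}` placeholder in the output path can be rendered by any simple template engine; a minimal sketch assuming plain string substitution (the helper and the "customer-360" dataset name are illustrative, not part of the framework):

```python
# Render the templated output path for a hypothetical dataset.
# The {{dataset_name}} placeholder syntax comes from the template above;
# this rendering helper is an assumption for illustration only.
def render_output_path(template: str, dataset_name: str) -> str:
    return template.replace("{{dataset_name}}", dataset_name)

path = render_output_path(
    "docs/data-contracts/{{dataset_name}}-data-contract.md", "customer-360"
)
print(path)  # docs/data-contracts/customer-360-data-contract.md
```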
workflow:
mode: interactive
elicitation: advanced-data-elicitation
validation: multi-stage
collaboration: multi-stakeholder
approval_required: true
elicitation_config:
progressive_disclosure: true
context_awareness: true
intelligent_branching: true
real_time_validation: true
stakeholder_routing: true
validation_framework:
interactive_validation: true
multi_agent_orchestration: true
real_time_scoring: true
evidence_collection: automated
quality_gates: comprehensive
sections:
- id: overview
title: Overview
instruction: |
Establish the foundational context for this data contract using interactive elicitation.
Progressive disclosure adapts questions based on complexity and stakeholder expertise.
workflow: interactive
validation: real_time
collaboration: multi_stakeholder
sections:
- id: business_purpose
title: Business Purpose
type: paragraph
instruction: What business problem does this dataset solve? What decisions or processes does it support?
examples:
- "Customer segmentation for targeted marketing campaigns"
- "Financial risk assessment for loan approvals"
- "Supply chain optimization for inventory management"
- id: value_proposition
title: Data Value Proposition
type: paragraph
instruction: What specific business value does this data provide? How will success be measured?
examples:
- "Enables 15% improvement in marketing campaign conversion rates"
- "Reduces loan default risk by 20% through better risk scoring"
- "Decreases inventory carrying costs by 10% through demand forecasting"
- id: stakeholders
title: Multi-Stakeholder Collaboration Matrix
type: collaborative_table
columns: [Role, Team/Department, Responsibilities, Contact, Validation_Tasks, Approval_Level]
instruction: Identify all stakeholders using interactive stakeholder mapping with validation workflows
collaboration_mode: multi_party_input
validation: stakeholder_verification
examples:
- "Data Owner | Marketing | Business requirements, quality standards | marketing-lead@company.com | Business validation | Required"
- "Data Steward | Data Engineering | Technical implementation, monitoring | data-eng@company.com | Technical validation | Required"
- "Data Consumer | Analytics Team | Analysis and insights | analytics@company.com | Usage validation | Advisory"
interactive_features:
stakeholder_workflows: enabled
approval_routing: automated
notification_system: real_time
validation_tracking: comprehensive
- id: data_sources
title: Data Sources and Lineage
instruction: |
Document all data sources, update frequencies, and lineage information.
sections:
- id: source_systems
title: Source Systems
type: table
columns: [System Name, Description, Owner, Connection Type, Dependencies]
instruction: List all source systems that contribute data to this contract
examples:
- "Salesforce CRM | Customer relationship data | Sales Team | API | Customer Master Data"
- "Web Analytics | User behavior data | Marketing | Streaming | User Identity Service"
- "ERP System | Financial transaction data | Finance | Batch ETL | Chart of Accounts"
- id: update_frequency
title: Update Frequency and Timing
type: structured
instruction: Define when and how often data is updated
fields:
- field: batch_schedule
type: text
description: "Batch processing schedule (if applicable)"
- field: real_time_sources
type: text
description: "Real-time streaming sources (if applicable)"
- field: business_hours
type: text
description: "Business hours considerations"
examples:
- "Batch: Daily at 2 AM EST | Real-time: User events via Kafka | Business hours: 9 AM - 5 PM EST"
- id: data_freshness
title: Data Freshness Requirements
type: structured
instruction: Define acceptable data age and staleness thresholds
fields:
- field: maximum_age
type: text
description: "Maximum acceptable data age"
- field: sla_target
type: text
description: "Service level agreement for data freshness"
- field: alert_threshold
type: text
description: "When to alert if data becomes stale"
examples:
- "Max age: 4 hours | SLA: 95% of data < 2 hours old | Alert: > 6 hours"
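The freshness thresholds above can be enforced with a small staleness check; a sketch using the example values (max age 4 hours, alert above 6 hours — the function and status labels are assumptions):

```python
from datetime import datetime, timedelta, timezone

# Hypothetical freshness check implementing the thresholds from the
# example above. Names and status strings are illustrative.
MAX_AGE = timedelta(hours=4)
ALERT_THRESHOLD = timedelta(hours=6)

def freshness_status(last_updated: datetime, now: datetime) -> str:
    age = now - last_updated
    if age > ALERT_THRESHOLD:
        return "alert"   # page on-call: data well past acceptable age
    if age > MAX_AGE:
        return "stale"   # over the max-age target, not yet alert-worthy
    return "fresh"

now = datetime(2023, 12, 1, 12, 0, tzinfo=timezone.utc)
print(freshness_status(now - timedelta(hours=1), now))  # fresh
print(freshness_status(now - timedelta(hours=5), now))  # stale
print(freshness_status(now - timedelta(hours=7), now))  # alert
```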
- id: schema_definition
title: Schema Definition and Data Model
instruction: |
Define the complete data schema including fields, types, constraints, and business rules.
sections:
- id: field_specifications
title: Field Specifications
type: table
columns: [Field Name, Data Type, Constraints, Description, Business Rules, Examples]
instruction: Document every field in the dataset with complete specifications
examples:
- "customer_id | UUID | NOT NULL, UNIQUE | Unique customer identifier | Immutable once assigned | '123e4567-e89b-12d3-a456-426614174000'"
- "email_address | VARCHAR(255) | NOT NULL, VALID EMAIL | Customer email address | Must be valid email format | 'customer@example.com'"
- "created_date | TIMESTAMP | NOT NULL | Record creation timestamp | Must be in UTC | '2023-12-01T10:30:00Z'"
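The three example rows above translate directly into record-level checks; a minimal validation sketch (the helper, regex, and error messages are assumptions, not framework APIs):

```python
import re
import uuid
from datetime import datetime

# Sketch: validate a record against the example field specifications above
# (UUID customer_id, valid email <= 255 chars, ISO 8601 UTC timestamp).
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate_record(record: dict) -> list[str]:
    errors = []
    try:
        uuid.UUID(record.get("customer_id", ""))
    except (ValueError, TypeError):
        errors.append("customer_id: not a valid UUID")
    email = record.get("email_address") or ""
    if not EMAIL_RE.match(email) or len(email) > 255:
        errors.append("email_address: invalid email")
    try:
        # Accept the trailing "Z" on older Python versions too.
        datetime.fromisoformat(record.get("created_date", "").replace("Z", "+00:00"))
    except ValueError:
        errors.append("created_date: not an ISO 8601 UTC timestamp")
    return errors

ok = {"customer_id": "123e4567-e89b-12d3-a456-426614174000",
      "email_address": "customer@example.com",
      "created_date": "2023-12-01T10:30:00Z"}
print(validate_record(ok))  # []
```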
- id: data_types
title: Data Type Standards
type: structured
instruction: Define standard data types and formats used across the schema
fields:
- field: date_format
type: text
description: "Standard date/timestamp format"
- field: string_encoding
type: text
description: "Character encoding standard"
- field: number_precision
type: text
description: "Decimal precision standards"
examples:
- "Dates: ISO 8601 (YYYY-MM-DDTHH:MM:SSZ) | Encoding: UTF-8 | Decimals: 2 decimal places for currency"
- id: relationships
title: Data Relationships
type: table
columns: [Parent Table, Child Table, Relationship Type, Foreign Key, Description]
instruction: Document relationships between tables/entities in the data model
examples:
- "customers | orders | One-to-Many | customer_id | One customer can have multiple orders"
- "products | order_items | One-to-Many | product_id | One product can be in multiple order items"
- id: quality_rules
title: Interactive Data Quality Rules and Validation
instruction: |
Define comprehensive data quality rules using the interactive validation framework.
Real-time quality scoring and multi-agent validation ensure comprehensive coverage.
workflow: interactive_validation
validation_mode: comprehensive
quality_framework: interactive
sections:
- id: completeness_rules
title: Interactive Completeness Rules
type: interactive_structured
instruction: Define completeness rules with real-time validation and scoring
validation: automated_evidence_collection
interactive_features:
real_time_scoring: enabled
validation_preview: enabled
rule_testing: automated
fields:
- field: required_fields
type: validated_list
description: "Fields that must always have values"
validation: schema_field_verification
- field: conditional_requirements
type: rule_builder
description: "Fields required under certain conditions"
validation: business_logic_verification
- field: acceptable_null_rate
type: percentage_with_validation
description: "Maximum acceptable null rate for optional fields"
validation: threshold_feasibility_check
examples:
- "Required: customer_id, email, created_date | Conditional: phone_number (required for premium customers) | Null rate: < 5% for optional fields"
quality_gates:
completeness_threshold: 95
validation_passing_score: 90
stakeholder_approval: required
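The completeness rules above (required fields always present, optional fields under a 5% null rate) can be checked per batch; a sketch with illustrative field names:

```python
# Sketch of a completeness check matching the example rules above.
# Field names and the report shape are assumptions for illustration.
REQUIRED = {"customer_id", "email", "created_date"}
MAX_NULL_RATE = 0.05  # optional fields: < 5% nulls per the example

def completeness_report(rows: list[dict], optional: set[str]) -> dict:
    report = {"missing_required": set(), "null_rates": {}}
    for field in REQUIRED:
        if any(row.get(field) is None for row in rows):
            report["missing_required"].add(field)
    for field in optional:
        nulls = sum(1 for row in rows if row.get(field) is None)
        report["null_rates"][field] = nulls / len(rows)
    return report

rows = [{"customer_id": i, "email": "a@b.co", "created_date": "2023-01-01",
         "phone_number": None if i < 2 else "555-0100"} for i in range(100)]
rep = completeness_report(rows, {"phone_number"})
print(rep["null_rates"]["phone_number"] <= MAX_NULL_RATE)  # True (2% nulls)
```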
- id: accuracy_rules
title: Accuracy Rules
type: structured
instruction: Define rules for data accuracy and correctness
fields:
- field: format_validation
type: text
description: "Data format validation rules"
- field: range_checks
type: text
description: "Acceptable value ranges"
- field: business_rules
type: text
description: "Business logic validation rules"
examples:
- "Email: valid email format | Age: 0-120 years | Revenue: >= 0"
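The range checks in the example row (Age: 0-120, Revenue >= 0) reduce to a simple bounds table; a sketch with assumed field names:

```python
# Sketch of the range checks from the example above. The RANGES table
# and field names are illustrative assumptions.
RANGES = {"age": (0, 120), "revenue": (0, float("inf"))}

def range_violations(row: dict) -> list[str]:
    out = []
    for field, (lo, hi) in RANGES.items():
        value = row.get(field)
        if value is not None and not (lo <= value <= hi):
            out.append(f"{field}={value} outside [{lo}, {hi}]")
    return out

print(range_violations({"age": 35, "revenue": 1200.0}))  # []
print(range_violations({"age": 140, "revenue": -5.0}))   # two violations
```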
- id: consistency_rules
title: Consistency Rules
type: structured
instruction: Define rules for data consistency across systems and time
fields:
- field: cross_system_checks
type: text
description: "Consistency checks across source systems"
- field: historical_consistency
type: text
description: "Historical data consistency requirements"
- field: referential_integrity
type: text
description: "Foreign key and relationship consistency"
examples:
- "Customer count must match between CRM and billing systems | Historical records immutable after 30 days | All order_items must reference valid product_id"
- id: uniqueness_rules
title: Uniqueness Rules
type: table
columns: [Field/Combination, Uniqueness Scope, Exception Handling, Validation Method]
instruction: Define uniqueness constraints and duplicate handling rules
examples:
- "customer_id | Global | Error on duplicate | Primary key constraint"
- "email_address | Per customer | Allow multiple customers same email in B2B | Business rule validation"
- "order_number | Per year | Error on duplicate within year | Composite unique constraint"
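The scoped uniqueness rules above differ from a plain unique constraint: the order_number row, for instance, allows repeats across years but not within one. A sketch of that per-year check (the function and sample orders are illustrative):

```python
from collections import Counter

# Sketch of the per-year order_number uniqueness rule from the table
# above: duplicates are flagged only within the same calendar year.
def duplicate_order_numbers(orders: list[dict]) -> list[tuple]:
    # Key each order by (year, order_number); a count > 1 is a violation.
    counts = Counter((o["order_date"][:4], o["order_number"]) for o in orders)
    return [key for key, n in counts.items() if n > 1]

orders = [
    {"order_number": "A-100", "order_date": "2023-02-01"},
    {"order_number": "A-100", "order_date": "2024-02-01"},  # ok: new year
    {"order_number": "A-101", "order_date": "2023-03-01"},
    {"order_number": "A-101", "order_date": "2023-04-01"},  # duplicate
]
print(duplicate_order_numbers(orders))  # [('2023', 'A-101')]
```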
- id: governance
title: Interactive Governance and Compliance Framework
instruction: |
Define governance requirements using multi-agent collaboration with automated compliance checking.
Interactive validation ensures comprehensive coverage of regulatory requirements.
workflow: compliance_validation
agents: [data-governance-owner, data-product-manager]
validation: regulatory_compliance_check
sections:
- id: access_controls
title: Access Controls and Security
type: structured
instruction: Define who can access this data and under what conditions
fields:
- field: access_roles
type: text
description: "Roles and permissions for data access"
- field: authentication
type: text
description: "Authentication requirements"
- field: encryption
type: text
description: "Encryption requirements for data at rest and in transit"
examples:
- "Roles: Analytics (read), Data Engineering (read/write), Finance (read financial fields only) | Auth: SSO required | Encryption: AES-256 at rest, TLS 1.3 in transit"
- id: privacy_compliance
title: Privacy and Data Protection
type: structured
instruction: Define privacy requirements and data protection measures
fields:
- field: personal_data
type: text
description: "Classification of personal/sensitive data fields"
- field: retention_policy
type: text
description: "Data retention and deletion policies"
- field: consent_management
type: text
description: "Consent tracking and management requirements"
examples:
- "PII fields: email, phone, address | Retention: 7 years active, 2 years archived | Consent: tracked in consent_management table"
- id: regulatory_requirements
title: Interactive Regulatory Compliance Matrix
type: compliance_validation_table
columns: [Regulation, Applicable Data, Requirements, Compliance Measures, Validation_Status, Agent_Responsible]
instruction: Document regulatory requirements with automated compliance validation
validation_mode: continuous_compliance_monitoring
interactive_features:
compliance_checking: automated
risk_assessment: real_time
audit_trail: comprehensive
regulatory_updates: monitored
examples:
- "GDPR | Customer personal data | Right to deletion, data portability | Automated deletion process, data export API | Validated | data-governance-owner"
- "SOX | Financial transaction data | Audit trail, change tracking | Immutable audit log, quarterly reviews | Validated | data-governance-owner"
- "HIPAA | Health information (if applicable) | Access logging, encryption | Comprehensive audit logging, signed BAAs | N/A | data-governance-owner"
compliance_framework:
automated_scanning: enabled
regulatory_monitoring: active
violation_alerting: immediate
audit_preparation: automated
- id: sla
title: Service Level Agreements
instruction: |
Define service level commitments for data availability, performance, and quality.
sections:
- id: availability_sla
title: Availability Requirements
type: structured
instruction: Define uptime and availability commitments
fields:
- field: uptime_target
type: text
description: "Target uptime percentage"
- field: maintenance_windows
type: text
description: "Scheduled maintenance windows"
- field: disaster_recovery
type: text
description: "Disaster recovery and business continuity plans"
examples:
- "Uptime: 99.9% | Maintenance: Sundays 2-4 AM EST | Recovery: 4-hour RTO, 1-hour RPO"
- id: performance_sla
title: Performance Requirements
type: structured
instruction: Define performance benchmarks and response times
fields:
- field: query_response_time
type: text
description: "Maximum acceptable query response times"
- field: throughput_requirements
type: text
description: "Data processing throughput requirements"
- field: scalability_targets
type: text
description: "Scalability and growth planning targets"
examples:
- "Queries: < 5 seconds for standard reports, < 30 seconds for ad-hoc analysis | Throughput: 10,000 records/minute | Scale: 50% annual growth capacity"
- id: quality_sla
title: Data Quality SLA
type: structured
instruction: Define measurable quality commitments and targets
fields:
- field: quality_score_target
type: text
description: "Overall data quality score target"
- field: error_rate_threshold
type: text
description: "Maximum acceptable error rates"
- field: resolution_time
type: text
description: "Time to resolve quality issues"
examples:
- "Quality score: > 95% | Error rate: < 0.1% | Resolution: Critical issues within 2 hours, standard issues within 24 hours"
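The measurable targets above (quality score > 95%, error rate < 0.1%) can gate a batch before publication; a sketch with the thresholds copied from the example row (the gate function itself is an assumption):

```python
# Sketch of an SLA gate for the quality targets above. Thresholds are
# taken from the example row; the function and messages are illustrative.
def sla_breaches(quality_score: float, error_rate: float) -> list[str]:
    breaches = []
    if quality_score <= 0.95:          # target: score > 95%
        breaches.append(f"quality score {quality_score:.1%} below target")
    if error_rate >= 0.001:            # target: error rate < 0.1%
        breaches.append(f"error rate {error_rate:.2%} above threshold")
    return breaches

print(sla_breaches(0.97, 0.0005))  # [] — batch passes
print(sla_breaches(0.93, 0.002))   # two breaches — hold the batch
```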
- id: monitoring
title: Interactive Monitoring and Real-time Quality Scoring
instruction: |
Define comprehensive monitoring strategy with real-time quality scoring and intelligent alerting.
Interactive dashboards provide continuous visibility into data health and performance.
workflow: real_time_monitoring
validation: continuous_quality_scoring
interactive_features:
real_time_dashboards: enabled
predictive_alerting: enabled
automated_remediation: configured
sections:
- id: quality_monitoring
title: Interactive Quality Monitoring Framework
type: interactive_monitoring_system
instruction: Define continuous quality monitoring with real-time scoring and intelligent alerting
validation: quality_monitoring_validation
interactive_features:
real_time_scoring: enabled
predictive_analytics: enabled
automated_evidence_collection: active
multi_dimensional_assessment: comprehensive
fields:
- field: automated_checks
type: validation_scheduler
description: "Automated quality checks with intelligent scheduling"
validation: feasibility_check
- field: quality_metrics
type: metric_selector_with_thresholds
description: "Key quality metrics with dynamic thresholds"
validation: business_impact_assessment
- field: alerting_rules
type: intelligent_alerting_system
description: "Context-aware alerting with escalation logic"
validation: alert_effectiveness_check
examples:
- "Checks: Real-time streaming validation + scheduled batch validation | Metrics: Multi-dimensional quality scores with trend analysis | Alerts: Intelligent alerting with stakeholder routing"
monitoring_capabilities:
real_time_quality_scoring: enabled
anomaly_detection: automated
trend_analysis: predictive
stakeholder_dashboards: role_based
- id: operational_monitoring
title: Operational Monitoring
type: structured
instruction: Define operational metrics and system health monitoring
fields:
- field: performance_metrics
type: text
description: "System performance metrics to track"
- field: usage_analytics
type: text
description: "Data usage and consumption analytics"
- field: cost_monitoring
type: text
description: "Cost tracking and optimization monitoring"
examples:
- "Performance: Query response time, resource utilization | Usage: Active users, query volume, popular datasets | Cost: Storage costs, compute costs, trending"
- id: lifecycle
title: Data Lifecycle Management
instruction: |
Define how data will be managed throughout its lifecycle from creation to deletion.
sections:
- id: data_classification
title: Data Classification
type: table
columns: [Classification Level, Data Types, Access Requirements, Retention Period]
instruction: Classify data based on sensitivity and business importance
examples:
- "Public | Marketing metrics, product information | Open access | 3 years"
- "Internal | Customer analytics, business metrics | Employee access only | 7 years"
- "Confidential | PII, financial data | Role-based access | 7 years active + 3 years archived"
- "Restricted | Payment info, health data | Strict access controls | Legal minimum only"
- id: archival_strategy
title: Archival and Retention
type: structured
instruction: Define data archival and long-term retention strategy
fields:
- field: active_period
type: text
description: "Period data remains in active storage"
- field: archive_criteria
type: text
description: "Criteria for moving data to archive storage"
- field: deletion_policy
type: text
description: "When and how data is permanently deleted"
examples:
- "Active: 2 years in hot storage | Archive: Move to cold storage after 2 years | Deletion: Permanent deletion after 7 years or upon request"
- id: change_management
title: Change Management
instruction: |
Define how changes to this data contract will be managed and communicated.
sections:
- id: versioning
title: Contract Versioning
type: structured
instruction: Define how contract versions will be managed
fields:
- field: version_strategy
type: text
description: "Versioning strategy and numbering scheme"
- field: backward_compatibility
type: text
description: "Backward compatibility requirements"
- field: deprecation_process
type: text
description: "Process for deprecating old versions"
examples:
- "Strategy: Semantic versioning (major.minor.patch) | Compatibility: Maintain for 2 major versions | Deprecation: 6-month notice for breaking changes"
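Under the semantic-versioning strategy above, a consumer pinned to one major version can accept any minor or patch release within it; a minimal compatibility sketch (the parsing helpers are assumptions, not framework code):

```python
# Sketch of the semantic-versioning compatibility rule described above:
# minor/patch bumps are backward compatible, major bumps are breaking.
def parse_version(v: str) -> tuple[int, int, int]:
    major, minor, patch = (int(p) for p in v.split("."))
    return major, minor, patch

def is_backward_compatible(old: str, new: str) -> bool:
    # Same major version => no breaking change under semver.
    return parse_version(new)[0] == parse_version(old)[0]

print(is_backward_compatible("2.0.0", "2.3.1"))  # True
print(is_backward_compatible("2.3.1", "3.0.0"))  # False — breaking change
```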
- id: approval_process
title: Change Approval Process
type: structured
instruction: Define who must approve changes and the approval workflow
fields:
- field: approval_authority
type: text
description: "Who can approve different types of changes"
- field: review_process
type: text
description: "Review and validation process for changes"
- field: communication_plan
type: text
description: "How changes will be communicated to stakeholders"
examples:
- "Authority: Data Owner (minor), Governance Board (major) | Review: Technical review + business impact assessment | Communication: Email + Slack 2 weeks advance notice"
- id: appendix
title: Appendix
instruction: |
Include additional reference information and documentation.
sections:
- id: glossary
title: Glossary of Terms
type: table
columns: [Term, Definition, Context]
instruction: Define key terms and concepts used in this contract
examples:
- "Customer | Individual or organization that purchases products/services | Includes both active and inactive customers"
- "MAU | Monthly Active Users | Users who performed at least one action in the past 30 days"
- "Churn | Customer who canceled or didn't renew subscription | Calculated monthly"
- id: references
title: References and Related Documents
type: list
instruction: List related documents, standards, and external references
examples:
- "Data Architecture Document v2.1"
- "Company Data Governance Policy"
- "GDPR Compliance Handbook"
- "Data Quality Framework Documentation"