agentic-data-stack-community

AI Agentic Data Stack Framework - Community Edition. Open source data engineering framework with 4 core agents, essential templates, and 3-dimensional quality validation.

# Technical Preferences

## Data Stack

1. **Modern Data Stack (dbt + Snowflake + Fivetran)**
2. **Lakehouse Architecture (Databricks + Delta Lake)**
3. **Cloud-Native Stack (BigQuery + Dataflow + Looker)**
4. **Open Source Stack (Apache Spark + Apache Airflow + PostgreSQL)**
5. **Serverless Stack (AWS Glue + Redshift + QuickSight)**

## Programming Languages

1. **Python** - Most popular language for data engineering and analytics
2. **SQL** - Essential for data querying and transformation
3. **JavaScript/TypeScript** - Frontend and full-stack development
4. **Java** - Enterprise applications and big data processing
5. **Scala** - Functional programming and the Spark ecosystem

## Databases

1. **PostgreSQL** - Most popular open-source relational database
2. **MySQL** - Widely adopted relational database
3. **MongoDB** - Leading NoSQL document database
4. **Redis** - In-memory data structure store
5. **Snowflake** - Cloud data warehouse platform

## ETL/ELT Tools

1. **dbt (Data Build Tool)** - SQL-first data transformation inside the warehouse
2. **Apache Airflow** - Workflow orchestration platform
3. **Fivetran** - Automated data integration
4. **Stitch** - Cloud-first ETL service
5. **Talend** - Enterprise data integration platform

## Orchestration

1. **Apache Airflow** - Most widely used workflow orchestrator
2. **Prefect** - Modern workflow management
3. **Dagster** - Data orchestration platform
4. **Kubernetes** - Container orchestration
5. **AWS Step Functions** - Serverless workflow service

## Data Quality Tools

1. **Great Expectations** - Data validation framework
2. **dbt tests** - Built-in data testing capabilities
3. **Monte Carlo** - Data observability platform
4. **Datafold** - Data diff and validation
5. **Soda** - Data quality monitoring

## Visualization Tools

1. **Tableau** - Leading enterprise BI platform
2. **Power BI** - Microsoft's business intelligence tool
3. **Looker** - Modern BI and data platform
4. **Grafana** - Open-source dashboards for monitoring and observability
5. **Metabase** - Open-source business intelligence

## Cloud Platforms

1. **Amazon Web Services (AWS)** - Largest cloud provider by market share
2. **Microsoft Fabric** - Microsoft's unified SaaS analytics platform
3. **Google Cloud Platform (GCP)** - Strong in data and AI
4. **Snowflake** - Cloud data platform
5. **Databricks** - Unified analytics platform

## Testing Frameworks

1. **pytest** - Python testing framework
2. **Jest** - JavaScript testing framework
3. **JUnit** - Java testing framework
4. **dbt test** - Data transformation testing
5. **Selenium** - Browser automation for web application testing

## Development Tools

1. **Visual Studio Code** - Most popular code editor
2. **Git** - Version control system
3. **Docker** - Containerization platform
4. **Jupyter Notebooks** - Interactive notebook environment
5. **IntelliJ IDEA** - Integrated development environment for JVM languages

## Monitoring & Alerting

1. **Datadog** - Cloud monitoring and analytics
2. **New Relic** - Application performance monitoring
3. **Prometheus + Grafana** - Open-source monitoring stack
4. **Splunk** - Data platform for monitoring and security
5. **PagerDuty** - Incident response platform

## Documentation Standards

1. **Markdown** - Lightweight markup language
2. **Confluence** - Team collaboration and documentation
3. **GitBook** - Modern documentation platform
4. **Notion** - All-in-one workspace
5. **Sphinx** - Documentation generator for Python

## Security & Compliance

1. **SOC 2 Type II** - Audit attesting that security controls operate effectively over a period of time
2. **GDPR Compliance** - EU data protection regulation
3. **AWS IAM** - Identity and access management
4. **Vault by HashiCorp** - Secrets management
5. **RBAC (Role-Based Access Control)** - Access control based on user roles

## Performance Optimization

1. **Indexing Strategies** - Database performance optimization
2. **Caching (Redis/Memcached)** - Serving frequently accessed data from memory
3. **Query Optimization** - SQL performance tuning
4. **Partitioning** - Splitting large tables by a key (e.g. date) so queries scan less data
5. **CDN (Content Delivery Network)** - Global content distribution

## Code Standards

1. **PEP 8** - Python style guide
2. **ESLint + Prettier** - JavaScript linting and formatting
3. **Black** - Python code formatter
4. **SonarQube** - Static code quality and security analysis
5. **Pre-commit Hooks** - Automated code quality checks run before each commit

## Deployment Practices

1. **CI/CD Pipelines** - Continuous integration and deployment
2. **Infrastructure as Code (Terraform)** - Automated infrastructure management
3. **Blue-Green Deployment** - Zero-downtime deployment strategy
4. **Docker Containers** - Application containerization
5. **GitOps** - Git-based deployment workflow
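Of the deployment practices listed, blue-green deployment is the one whose mechanics are easiest to show in a few lines. The Python below is a minimal, in-process sketch under stated assumptions: `Environment`, `BlueGreenRouter`, and the `health_check` callback are hypothetical names, not the API of any tool named in this document, and a real setup would switch a load balancer target or DNS record rather than a Python attribute. It shows the core idea: release to the idle environment, verify it, then cut all traffic over in one step while the previous environment stays available for rollback.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Environment:
    """One of two identical production environments (hypothetical model)."""
    name: str
    version: Optional[str] = None


@dataclass
class BlueGreenRouter:
    """Hypothetical router: all live traffic goes to exactly one environment."""
    blue: Environment = field(default_factory=lambda: Environment("blue"))
    green: Environment = field(default_factory=lambda: Environment("green"))
    live_name: str = "blue"

    @property
    def live(self) -> Environment:
        return self.blue if self.live_name == "blue" else self.green

    @property
    def idle(self) -> Environment:
        return self.green if self.live_name == "blue" else self.blue

    def deploy(self, version: str,
               health_check: Callable[[Environment], bool]) -> bool:
        """Release to the idle environment; cut over only if it passes the check."""
        target = self.idle
        target.version = version  # live traffic is unaffected during this step
        if not health_check(target):
            return False  # failed check: keep serving from the current live side
        self.live_name = target.name  # single cutover step
        return True

    def rollback(self) -> None:
        """After a successful deploy, send traffic back to the previous environment."""
        self.live_name = self.idle.name


if __name__ == "__main__":
    router = BlueGreenRouter()
    router.blue.version = "v1"
    ok = router.deploy("v2", health_check=lambda env: env.version is not None)
    print(ok, router.live.name, router.live.version)  # True green v2
    router.rollback()
    print(router.live.name, router.live.version)  # blue v1
```

Keeping the previous environment running untouched is what makes rollback a single pointer flip rather than a redeploy; the cost is running two full environments.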