agentic-data-stack-community
AI Agentic Data Stack Framework - Community Edition. Open source data engineering framework with 4 core agents, essential templates, and 3-dimensional quality validation.
# Technical Preferences
## Data Stack
1. **Modern Data Stack (dbt + Snowflake + Fivetran)**
2. **Lakehouse Architecture (Databricks + Delta Lake)**
3. **Cloud-Native Stack (BigQuery + Dataflow + Looker)**
4. **Open Source Stack (Apache Spark + Apache Airflow + PostgreSQL)**
5. **Serverless Stack (AWS Glue + Redshift + QuickSight)**
## Programming Languages
1. **Python** - Widely used across data engineering and analytics
2. **SQL** - Essential for data querying and transformation
3. **JavaScript/TypeScript** - Frontend and full-stack development
4. **Java** - Enterprise applications and big data processing
5. **Scala** - Functional programming and Spark ecosystem
## Databases
1. **PostgreSQL** - Among the most popular open-source relational databases
2. **MySQL** - Widely adopted relational database
3. **MongoDB** - Leading NoSQL document database
4. **Redis** - In-memory data structure store
5. **Snowflake** - Cloud data warehouse platform
## ETL/ELT Tools
1. **dbt (data build tool)** - SQL-based data transformation inside the warehouse
2. **Apache Airflow** - Workflow orchestration platform
3. **Fivetran** - Automated data integration
4. **Stitch** - Cloud-first ETL service
5. **Talend** - Enterprise data integration platform
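
To illustrate the ELT pattern these tools support (load raw data first, then transform it with SQL inside the database), here is a minimal sketch. It uses Python's built-in `sqlite3` as a stand-in for a warehouse; the `raw_orders` and `paid_orders` tables and their columns are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse (an assumption for this sketch).
conn = sqlite3.connect(":memory:")

# Load step: land the raw rows unchanged.
conn.execute(
    "CREATE TABLE raw_orders (order_id INTEGER, amount_cents INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, 1250, "paid"), (2, 400, "refunded"), (3, 1000, "paid")],
)

# Transform step: a SELECT materialized as a table, run inside the database.
conn.execute(
    """
    CREATE TABLE paid_orders AS
    SELECT order_id, amount_cents / 100.0 AS amount
    FROM raw_orders
    WHERE status = 'paid'
    """
)

paid_orders = conn.execute(
    "SELECT order_id, amount FROM paid_orders ORDER BY order_id"
).fetchall()
print(paid_orders)
```

In a dbt project the transform step would typically live in a model file as the `SELECT` alone, with the tool handling materialization.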
## Orchestration
1. **Apache Airflow** - Widely used workflow orchestrator
2. **Prefect** - Modern workflow management
3. **Dagster** - Data orchestration platform
4. **Kubernetes** - Container orchestration
5. **AWS Step Functions** - Serverless workflow service
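
The core job shared by the workflow orchestrators above is turning task dependencies into a valid execution order. The sketch below shows that idea with the standard library's `graphlib`; it is not how any listed tool is implemented, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "extract": set(),
    "transform": {"extract"},
    "quality_check": {"transform"},
    "load": {"quality_check"},
}

def execution_order(dag):
    """Return task names in an order that respects every dependency."""
    return list(TopologicalSorter(dag).static_order())

order = execution_order(pipeline)
print(order)
```

Real orchestrators add scheduling, retries, and state tracking on top of this ordering step.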
## Data Quality Tools
1. **Great Expectations** - Data validation framework
2. **dbt tests** - Built-in data testing capabilities
3. **Monte Carlo** - Data observability platform
4. **Datafold** - Data diff and validation
5. **Soda** - Data quality monitoring
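
As a rough illustration of what these tools check, the sketch below implements two common validations, not-null and uniqueness, in plain Python. The function names and the sample `orders` rows are hypothetical; the listed tools express the same checks declaratively and run them against real tables.

```python
from collections import Counter

def check_not_null(rows, column):
    """Return the rows whose value for `column` is missing (None)."""
    return [row for row in rows if row.get(column) is None]

def check_unique(rows, column):
    """Return the non-null values that appear more than once in `column`."""
    counts = Counter(
        row[column] for row in rows if row.get(column) is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)

# Hypothetical sample data with one null and one duplicate key.
orders = [
    {"order_id": 1, "customer_id": 10},
    {"order_id": 2, "customer_id": None},
    {"order_id": 2, "customer_id": 11},
]

null_failures = check_not_null(orders, "customer_id")
duplicate_ids = check_unique(orders, "order_id")
print(null_failures, duplicate_ids)
```

Returning the failing rows or values, rather than a bare pass/fail flag, makes it easier to report what went wrong.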
## Visualization Tools
1. **Tableau** - Leading enterprise BI platform
2. **Power BI** - Microsoft's business intelligence tool
3. **Looker** - Modern BI and data platform
4. **Grafana** - Open-source monitoring and observability
5. **Metabase** - Open-source business intelligence
## Cloud Platforms
1. **Amazon Web Services (AWS)** - Largest cloud provider by market share
2. **Microsoft Fabric** - Microsoft's unified analytics platform, built on Azure
3. **Google Cloud Platform (GCP)** - Strong in data and AI
4. **Snowflake** - Cloud data platform
5. **Databricks** - Unified analytics platform
## Testing Frameworks
1. **pytest** - Python testing framework
2. **Jest** - JavaScript testing framework
3. **JUnit** - Java testing framework
4. **dbt test** - Data transformation testing
5. **Selenium** - Web application testing
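
For the Python side, a minimal sketch of tests written in the style pytest discovers (functions prefixed with `test_` that use plain `assert`). The `normalize_email` helper is hypothetical and exists only to give the tests something to exercise.

```python
def normalize_email(raw):
    """Trim whitespace and lowercase an email address (hypothetical helper)."""
    return raw.strip().lower()

def test_normalize_email_strips_and_lowercases():
    assert normalize_email("  Ada@Example.COM ") == "ada@example.com"

def test_normalize_email_is_idempotent():
    # Normalizing an already-normalized value should change nothing.
    once = normalize_email("Bob@Example.com")
    assert normalize_email(once) == once
```

Saved in a file whose name starts with `test_`, these functions would be collected and run by the `pytest` command.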
## Development Tools
1. **Visual Studio Code** - Widely used code editor
2. **Git** - Version control system
3. **Docker** - Containerization platform
4. **Jupyter Notebooks** - Interactive development environment
5. **IntelliJ IDEA** - Integrated development environment
## Monitoring & Alerting
1. **Datadog** - Cloud monitoring and analytics
2. **New Relic** - Application performance monitoring
3. **Prometheus + Grafana** - Open-source monitoring stack
4. **Splunk** - Data platform for monitoring and security
5. **PagerDuty** - Incident response platform
## Documentation Standards
1. **Markdown** - Lightweight markup language
2. **Confluence** - Team collaboration and documentation
3. **GitBook** - Modern documentation platform
4. **Notion** - All-in-one workspace
5. **Sphinx** - Documentation generator for Python
## Security & Compliance
1. **SOC 2 Type II** - Audit of how security and availability controls operate over time
2. **GDPR Compliance** - Data protection regulation
3. **AWS IAM** - Identity and access management
4. **Vault by HashiCorp** - Secrets management
5. **RBAC (Role-Based Access Control)** - Access control method
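
The RBAC item can be sketched in a few lines: permissions attach to roles, and a user is allowed an action if any of their roles grants it. The role names and permissions below are hypothetical, and a real deployment would normally rely on the platform's own access controls (for example, AWS IAM) rather than application code.

```python
# Hypothetical role-to-permission mapping for a data platform.
ROLE_PERMISSIONS = {
    "analyst": {"read"},
    "engineer": {"read", "write"},
    "admin": {"read", "write", "grant"},
}

def is_allowed(roles, action):
    """Return True if any of the user's roles grants `action`."""
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in roles)

print(is_allowed(["analyst"], "read"))
print(is_allowed(["analyst"], "write"))
```

Unknown roles grant nothing, so the check denies by default.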
## Performance Optimization
1. **Indexing Strategies** - Database performance optimization
2. **Caching (Redis/Memcached)** - In-memory data storage
3. **Query Optimization** - SQL performance tuning
4. **Partitioning** - Data distribution strategy
5. **CDN (Content Delivery Network)** - Global content distribution
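
To make the indexing item concrete, the sketch below compares a query plan before and after adding an index. It uses the built-in `sqlite3` module as a stand-in for a production database; the `events` table and `idx_events_user` index are hypothetical, and plan output differs across engines and versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, payload TEXT)"
)
conn.executemany(
    "INSERT INTO events (user_id, payload) VALUES (?, ?)",
    [(i % 100, "x") for i in range(1000)],
)

def query_plan(connection, sql):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in connection.execute("EXPLAIN QUERY PLAN " + sql)]

lookup = "SELECT * FROM events WHERE user_id = 42"

plan_before = query_plan(conn, lookup)  # no index yet: the table is scanned
conn.execute("CREATE INDEX idx_events_user ON events (user_id)")
plan_after = query_plan(conn, lookup)   # the planner can now use the index

print(plan_before)
print(plan_after)
```

Checking the plan, rather than only timing the query, confirms that the index is actually being used.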
## Code Standards
1. **PEP 8** - Python style guide
2. **ESLint + Prettier** - JavaScript linting and code formatting
3. **Black** - Python code formatter
4. **SonarQube** - Code quality analysis
5. **Pre-commit Hooks** - Automated code quality checks
## Deployment Practices
1. **CI/CD Pipelines** - Continuous integration and deployment
2. **Infrastructure as Code (Terraform)** - Automated infrastructure management
3. **Blue-Green Deployment** - Zero-downtime deployment strategy
4. **Docker Containers** - Application containerization
5. **GitOps** - Git-based deployment workflow
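
The blue-green strategy above comes down to keeping two environments and switching traffic only when the idle one passes a health check. Here is a minimal sketch of that decision logic; the `BlueGreenRouter` class and the health-check callable are hypothetical, and in practice the switch is made at a load balancer or DNS layer.

```python
class BlueGreenRouter:
    """Track which of two environments receives traffic."""

    def __init__(self, active="blue"):
        self.active = active

    @property
    def idle(self):
        """Name of the environment not currently serving traffic."""
        return "green" if self.active == "blue" else "blue"

    def cut_over(self, health_check):
        """Promote the idle environment if `health_check(env)` returns True."""
        candidate = self.idle
        if health_check(candidate):
            self.active = candidate
            return True
        return False

router = BlueGreenRouter()
# Hypothetical health check: only the green environment reports healthy.
switched = router.cut_over(lambda env: env == "green")
print(switched, router.active)
```

Because the previous environment stays deployed, rolling back is another call to `cut_over`.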