# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [1.0.0] - 2025-01-27

### Added

- **Initial Release** - First stable release of CSV Bulk Processor
- **Core Processing Engine** - Main BulkProcessor class with comprehensive file processing capabilities
- **Multi-Format Support** - Support for CSV, Excel (.xlsx, .xls), and JSON file formats
- **Memory-Efficient Streaming** - Streaming processor for large files with configurable chunk sizes
- **Data Validation Engine** - Comprehensive validation with schema, format, and custom rules
- **Data Transformation Engine** - Advanced data transformation with mapping, cleaning, and conversion
- **Progress Tracking System** - Real-time progress monitoring with resumable processing capabilities
- **Error Handling & Recovery** - Robust error detection, recovery, and rollback mechanisms
- **Batch Processing** - Optimized batch operations for large datasets
- **Performance Monitoring** - Built-in performance analytics and optimization tracking
- **Processing Analytics** - Detailed analytics and insights into processing performance
- **Event System** - Comprehensive event emission for progress, completion, and errors
- **Configuration Management** - Flexible configuration system for all processing options
- **Comprehensive Testing** - Full test suite with unit, integration, and performance tests
- **Documentation & Examples** - Complete documentation with practical usage examples

### Technical Features

- **Streaming Architecture** - Memory-efficient processing using Node.js streams
- **Format Processors** - Specialized processors for CSV, Excel, and JSON formats
- **Validation Framework** - Joi-based validation with custom rule support
- **Transformation Pipeline** - Data cleaning, mapping, and type conversion
- **Progress Tracking** - Real-time progress updates with session management
- **Error Recovery** - Automatic error detection and recovery mechanisms
- **Performance Metrics** - Throughput, memory usage, and processing time tracking
- **Batch Optimization** - Configurable batch sizes for optimal performance
- **Memory Management** - Automatic memory monitoring and optimization
- **Concurrent Processing** - Support for concurrent file processing operations

### File Format Support

- **CSV Files** - Full CSV support with custom delimiters and encoding
- **Excel Files** - Complete Excel support (.xlsx, .xls) with multiple sheets
- **JSON Files** - JSON processing with streaming support for large files

### Validation Features

- **Schema Validation** - Joi-based schema validation with custom rules
- **Format Validation** - Data format and type validation
- **Business Rules** - Custom business logic validation
- **Error Reporting** - Detailed validation error reporting

### Transformation Features

- **Field Mapping** - Flexible field mapping and renaming
- **Data Cleaning** - String trimming, date normalization, empty value handling
- **Type Conversion** - Automatic type conversion (string to number, date, boolean)
- **Data Normalization** - Data standardization and normalization

### Performance Features

- **Memory Monitoring** - Real-time memory usage tracking
- **Throughput Analytics** - Records per second processing metrics
- **Performance Optimization** - Automatic performance tuning
- **Resource Management** - Efficient resource utilization

### Error Handling

- **Error Detection** - Comprehensive error detection and classification
- **Recovery Mechanisms** - Automatic error recovery and retry logic
- **Rollback Support** - Transaction rollback on critical errors
- **Error Reporting** - Detailed error reporting and logging

### Testing & Quality

- **Unit Tests** - Comprehensive unit test coverage for all components
- **Integration Tests** - End-to-end integration testing
- **Performance Tests** - Performance and load testing
- **Test Coverage** - 100% test coverage for critical components
- **CI/CD Ready** - GitHub Actions compatible test suite

### Documentation

- **README** - Comprehensive documentation with examples
- **API Reference** - Complete API documentation
- **Usage Examples** - Practical examples for common use cases
- **Configuration Guide** - Detailed configuration options
- **Best Practices** - Performance and usage best practices

### Dependencies

- **Production Dependencies**:
  - `csv-parser` - CSV file parsing
  - `xlsx` - Excel file processing
  - `fast-csv` - High-performance CSV processing
  - `stream-json` - JSON streaming support
  - `joi` - Data validation
  - `lodash` - Utility functions
  - `moment` - Date handling
  - `winston` - Logging
  - `node-cron` - Scheduled processing
  - `axios` - HTTP requests
- **Development Dependencies**:
  - `jest` - Testing framework
  - `supertest` - HTTP testing
  - `eslint` - Code linting
  - `prettier` - Code formatting
  - `typescript` - Type definitions

### Platform Support

- **Node.js** - Version 14.0.0 and above
- **Operating Systems** - Windows, macOS, Linux
- **Architectures** - x64, arm64, ia32

### Performance Benchmarks

- **Small Files** (< 1MB) - Direct processing for optimal performance
- **Large Files** (> 1MB) - Streaming processing with memory efficiency
- **Memory Usage** - Configurable memory limits with automatic optimization
- **Throughput** - Optimized for high-volume data processing
- **Error Recovery** - Fast error detection and recovery

### Use Cases

- **Data Migration** - Efficient processing of large datasets
- **ETL Pipelines** - Extract, Transform, Load operations
- **Data Import/Export** - Enterprise data processing
- **Analytics Platforms** - Large-scale data processing
- **API Development** - Bulk data processing endpoints
- **Microservices** - Data processing components
- **Real-time Processing** - Streaming data processing
- **Startup MVPs** - Production-ready data processing

---

## Version History Summary

| Version | Date       | Key Changes                                  |
|---------|------------|----------------------------------------------|
| 1.0.0   | 2025-01-27 | Initial stable release with full feature set |

## Migration Guide

### Getting Started

**Installation:**

```bash
npm install @prathammahajan/csv-bulk-processor
```

**Basic Usage:**

```javascript
const BulkProcessor = require('@prathammahajan/csv-bulk-processor');

// CommonJS has no top-level await, so wrap the call in an async function
(async () => {
  const processor = new BulkProcessor();
  const result = await processor.processFile('data.csv');
})();
```

**Advanced Configuration:**

```javascript
const processor = new BulkProcessor({
  streaming: { enabled: true, chunkSize: 1000 },
  validation: { enabled: true },
  transformation: { enabled: true },
  progress: { enabled: true }
});
```

### Configuration Options

**Streaming Configuration:**

```javascript
streaming: {
  enabled: true,        // Enable streaming for large files
  chunkSize: 1000,      // Records per chunk
  memoryLimit: '100MB'  // Memory limit
}
```

**Validation Configuration:**

```javascript
validation: {
  enabled: true,          // Enable validation
  schema: { /* schema */ },
  format: true,           // Format validation
  business: true          // Business rules
}
```

**Transformation Configuration:**

```javascript
transformation: {
  enabled: true,          // Enable transformation
  mapping: { /* mapping */ },
  cleaning: true,         // Data cleaning
  conversion: true        // Type conversion
}
```

### Event Handling

**Progress Tracking:**

```javascript
processor.on('progress', (data) => {
  console.log(`Processed ${data.recordsProcessed} records`);
});
```

**Error Handling:**

```javascript
processor.on('error', (error) => {
  console.error('Processing error:', error);
});
```

**Completion Handling:**

```javascript
processor.on('complete', (data) => {
  console.log('Processing completed!');
});
```

---

## Support

- 📧 **Issues**: [GitHub Issues](https://github.com/prathammahajan13/csv-bulk-processor/issues)
- 📖 **Documentation**: [GitHub Wiki](https://github.com/prathammahajan13/csv-bulk-processor/wiki)
- 💬 **Discussions**: [GitHub Discussions](https://github.com/prathammahajan13/csv-bulk-processor/discussions)
- **Support Development**: [Buy Me a Coffee](https://buymeacoffee.com/mahajanprae)

---

**Made with ❤️ by [Pratham Mahajan](https://github.com/prathammahajan13)**