# Data-Genie

A lightweight and efficient **ETL Engine** in **TypeScript**, suitable for a wide range of data-processing tasks.

![](./diagram.jpg)

## 📦 Features

- 🔄 Read from various data sources (CSV, TSV, JSON, NDJSON, FixedWidth, etc.)
- ✏️ Write to multiple formats (JSON, NDJSON, CSV, TSV, FixedWidth, SQL, Console, etc.)
- ✂️ Filter and transform data with powerful field filters
- 📊 Supports complex filtering expressions
- 🔗 Chainable, high-performance operations for flexible data processing
- 🔍 Supports data validation and transformation
- 📈 Ideal for data cleaning, migration, and analysis
- 🧩 Modular design for easy integration into existing projects
- 🧪 Easy to use with TypeScript/JavaScript/Browser
- 🔒 Secure and reliable with TypeScript's type safety
- 🔧 Easy to install and get started (with examples)

## 🚀 Getting Started

### 🔧 Installation

Install from npm:

```bash
npm install @pujansrt/data-genie
```

Or, with yarn:

```bash
yarn add @pujansrt/data-genie
```

<details>
<summary>Development install (clone & build)</summary>

```bash
git clone https://github.com/pujansrt/data-genie.git
cd data-genie
npm install
npm run build
```

</details>

---

## 📚 How to use

### Example to read a CSV file, transform data, and write to console

```ts
import {
  ConsoleWriter,
  CSVReader,
  Job,
  RemoveDuplicatesReader,
  RemoveFields,
  SetCalculatedField,
  TransformingReader
} from '@pujansrt/data-genie';

async function runExample() {
  let reader: any = new CSVReader('input/credit-balance-01.csv').setFieldNamesInFirstRow(true);
  reader = new RemoveDuplicatesReader(reader, 'Rating', 'CreditLimit');
  reader = new TransformingReader(reader)
    .add(new SetCalculatedField('AvailableCredit', 'parseFloat(record.CreditLimit) - parseFloat(record.Balance)').transform())
    .add(new RemoveFields('CreditLimit', 'Balance').transform());

  await Job.run(reader, new ConsoleWriter());
  // Other output targets:
  // await Job.run(reader, new JsonWriter('output/filtered-data.json'));
  // await Job.run(reader, new CsvWriter('output/filtered-data.csv'));
  // await Job.run(reader, new FixedWidthWriter('output/filtered-data.fw').setFieldNamesInFirstRow(true).setFieldWidths(10, 15, 10, 15));
}

runExample().catch(console.error);
```

### Writing to Fixed Width File

```ts
const fwWriter = new FixedWidthWriter('output/ex-simulated.fw')
  .setFieldNamesInFirstRow(true)
  .setFieldWidths(10, 15, 10, 15);
await Job.run(reader, fwWriter);
```

### Example to read a CSV file, filter data, and write to console

```ts
import {
  ConsoleWriter,
  CSVReader,
  FieldFilter,
  FilterExpression,
  FilteringReader,
  IsNotNull,
  IsType,
  Job,
  PatternMatch,
  ValueMatch
} from '@pujansrt/data-genie';

async function runExample() {
  const reader = new CSVReader('input/example.csv').setFieldNamesInFirstRow(true);

  const filteringReader = new FilteringReader(reader)
    .add(new FieldFilter('Rating').addRule(IsNotNull()).addRule(IsType('string')).addRule(ValueMatch('B', 'C')).createRecordFilter())
    .add(new FieldFilter('Account').addRule(IsNotNull()).addRule(IsType('string')).addRule(PatternMatch('[0-9]*')).createRecordFilter())
    .add(
      new FilterExpression(
        'record.CreditLimit !== undefined && record.Balance !== undefined && parseFloat(record.CreditLimit) >= 0 && parseFloat(record.CreditLimit) <= 5000 && parseFloat(record.Balance) <= parseFloat(record.CreditLimit)'
      ).createRecordFilter()
    );

  await Job.run(filteringReader, new ConsoleWriter());
}

runExample().catch(console.error);
```

### Example to read a JSON file and transform data

```ts
import { ConsoleWriter, Job, JsonReader, SetCalculatedField, TransformingReader } from '@pujansrt/data-genie';

async function runExample() {
  let reader: any = new JsonReader('input/simple-json-input.json');
  reader = new TransformingReader(reader)
    .setCondition((record) => record.balance < 0) // only transform records with a negative balance
    .add(new SetCalculatedField('balance', '0.0').transform());

  await Job.run(reader, new ConsoleWriter());
}

runExample().catch(console.error);
```

### FixedWidth Example

```ts
import { ConsoleWriter, FixedWidthReader, Job } from '@pujansrt/data-genie';

async function runExample() {
  const reader = new FixedWidthReader('input/credit-balance-01.fw');
  reader.setFieldWidths(8, 16, 16, 12, 14, 16, 7);
  reader.setFieldNamesInFirstRow(true);

  await Job.run(reader, new ConsoleWriter());
}

runExample().catch(console.error);
```

### Transform, Deduplicate and Fields Manipulation Example

```ts
import { ConsoleWriter, CSVReader, Job, RemoveDuplicatesReader, RemoveFields, SetCalculatedField, TransformingReader } from '@pujansrt/data-genie';

async function runExample() {
  let reader: any = new CSVReader('input/credit-balance-01.csv').setFieldNamesInFirstRow(true);
  reader = new RemoveDuplicatesReader(reader, 'Rating', 'CreditLimit');
  reader = new TransformingReader(reader)
    .add(new SetCalculatedField('AvailableCredit', 'parseFloat(record.CreditLimit) - parseFloat(record.Balance)').transform())
    .add(new RemoveFields('CreditLimit', 'Balance').transform());

  await Job.run(reader, new ConsoleWriter());
}

runExample().catch(console.error);
```

---

## Upcoming Features

- Support for Apache Avro
- Support for Apache Parquet
- 🔗 Enhanced data validation rules

## 🧪 Use Cases

- Data cleaning and transformation
- Data validation and filtering
- Data migration and ETL processes
- Data analysis and reporting
- Data integration from multiple sources

## 🤝 Contributing

Contributions are welcome! Please open an issue or submit a pull request.

---

## 📜 License

MIT License. Free for personal and commercial use.

---

## 👤 Author

Developed and maintained by Pujan Srivastava, a mathematician and software engineer with 18+ years of programming experience.
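
---

## 💡 What a `FilterExpression` evaluates

The `FilterExpression` string in the filtering example above is evaluated against each record. As a rough, dependency-free sketch of what that particular expression checks (plain TypeScript for illustration only, not part of the library's API):

```typescript
// Records read from CSV carry string fields; the expression parses them as numbers.
interface CreditRecord {
  CreditLimit?: string;
  Balance?: string;
}

// Plain-TypeScript equivalent of the FilterExpression used in the filtering
// example: keep only records where both fields exist, the credit limit is
// between 0 and 5000, and the balance does not exceed the limit.
function creditFilter(record: CreditRecord): boolean {
  if (record.CreditLimit === undefined || record.Balance === undefined) return false;
  const limit = parseFloat(record.CreditLimit);
  const balance = parseFloat(record.Balance);
  return limit >= 0 && limit <= 5000 && balance <= limit;
}

console.log(creditFilter({ CreditLimit: '4000', Balance: '1500' })); // true
console.log(creditFilter({ CreditLimit: '6000', Balance: '1500' })); // false (limit above 5000)
```

Writing the predicate as an expression string lets the library keep filters serializable and chainable, at the cost of the compile-time type checking a plain function would give you.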