# Data-Genie
A lightweight, efficient **ETL engine** in **TypeScript** for reading, filtering, transforming, and writing data across many formats.

## 📦 Features
- 📄 Read from various data sources (CSV, TSV, JSON, NDJSON, FixedWidth, etc.)
- ✍️ Write to multiple formats (JSON, NDJSON, CSV, TSV, FixedWidth, SQL, Console, etc.)
- ⚙️ Filter and transform data with powerful field filters
- 🔍 Supports complex filtering expressions
- 🔗 Chainable, high-performance operations for flexible data processing
- ✅ Supports data validation and transformation
- 🧹 Ideal for data cleaning, migration, and analysis
- 🧩 Modular design for easy integration into existing projects
- 🧪 Easy to use from TypeScript, JavaScript, or the browser
- 🔒 Secure and reliable with TypeScript's type safety
- 🔧 Easy to install and get started (with examples)
## 🚀 Getting Started

### 🔧 Installation
Install from npm:
```bash
npm install @pujansrt/data-genie
```
Or, with yarn:
```bash
yarn add @pujansrt/data-genie
```
<details>
<summary>Development install (clone &amp; build)</summary>

```bash
git clone https://github.com/pujansrt/data-genie.git
cd data-genie
npm install
npm run build
```
</details>
---
## 🚀 How to use
### Example to read a CSV file, filter data, and write to console
```ts
import { ConsoleWriter, CSVReader, Job, RemoveDuplicatesReader, RemoveFields, SetCalculatedField, TransformingReader } from '@pujansrt/data-genie';

async function runExample() {
  let reader: any = new CSVReader('input/credit-balance-01.csv').setFieldNamesInFirstRow(true);
  reader = new RemoveDuplicatesReader(reader, 'Rating', 'CreditLimit');
  reader = new TransformingReader(reader)
    .add(new SetCalculatedField('AvailableCredit', 'parseFloat(record.CreditLimit) - parseFloat(record.Balance)').transform())
    .add(new RemoveFields('CreditLimit', 'Balance').transform());

  await Job.run(reader, new ConsoleWriter());
  // await Job.run(reader, new JsonWriter('output/filtered-data.json'));
  // await Job.run(reader, new CsvWriter('output/filtered-data.csv'));
  // await Job.run(reader, new FixedWidthWriter('output/filtered-data.fw').setFieldNamesInFirstRow(true).setFieldWidths(10, 15, 10, 15));
}

runExample().catch(console.error);
```
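`RemoveDuplicatesReader` drops records that repeat the combination of key fields it is given ('Rating' and 'CreditLimit' above). As an illustration of that behavior, here is a plain-TypeScript sketch of dedupe-by-keys (assuming first-seen records are kept; this is not the library's implementation):

```typescript
type Row = { [field: string]: string };

// Keep the first record seen for each distinct combination of key fields.
function dedupeByKeys(records: Row[], ...keys: string[]): Row[] {
  const seen = new Set<string>();
  const out: Row[] = [];
  for (const record of records) {
    const composite = keys.map((k) => record[k]).join('\u0000'); // composite key
    if (!seen.has(composite)) {
      seen.add(composite);
      out.push(record);
    }
  }
  return out;
}

const rows: Row[] = [
  { Rating: 'A', CreditLimit: '5000', Account: '101' },
  { Rating: 'A', CreditLimit: '5000', Account: '102' }, // same Rating+CreditLimit pair
  { Rating: 'B', CreditLimit: '3000', Account: '103' },
];
console.log(dedupeByKeys(rows, 'Rating', 'CreditLimit').length); // 2
```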
### Writing to Fixed Width File
```ts
// Assumes `reader` is any configured reader, with FixedWidthWriter and Job imported from '@pujansrt/data-genie'.
const fwWriter = new FixedWidthWriter('output/ex-simulated.fw')
  .setFieldNamesInFirstRow(true)
  .setFieldWidths(10, 15, 10, 15);

await Job.run(reader, fwWriter);
```
### Example to read a CSV file, filter it with field filters and expressions, and write to console
```ts
import { ConsoleWriter, CSVReader, FieldFilter, FilterExpression, FilteringReader, IsNotNull, IsType, Job, PatternMatch, ValueMatch } from '@pujansrt/data-genie';

async function runExample() {
  const reader = new CSVReader('input/example.csv').setFieldNamesInFirstRow(true);
  const filteringReader = new FilteringReader(reader)
    .add(new FieldFilter('Rating').addRule(IsNotNull()).addRule(IsType('string')).addRule(ValueMatch('B', 'C')).createRecordFilter())
    .add(new FieldFilter('Account').addRule(IsNotNull()).addRule(IsType('string')).addRule(PatternMatch('[0-9]*')).createRecordFilter())
    .add(
      new FilterExpression(
        'record.CreditLimit !== undefined && record.Balance !== undefined && parseFloat(record.CreditLimit) >= 0 && parseFloat(record.CreditLimit) <= 5000 && parseFloat(record.Balance) <= parseFloat(record.CreditLimit)'
      ).createRecordFilter()
    );

  await Job.run(filteringReader, new ConsoleWriter());
}

runExample().catch(console.error);
```
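The `FilterExpression` string is evaluated against each record. Written as an ordinary TypeScript predicate (for clarity only; this is not part of the library's API), the same condition looks like:

```typescript
type CreditRecord = { CreditLimit?: string; Balance?: string };

// Equivalent of the FilterExpression string: both fields present,
// 0 <= CreditLimit <= 5000, and Balance within the credit limit.
function withinCreditLimit(record: CreditRecord): boolean {
  if (record.CreditLimit === undefined || record.Balance === undefined) return false;
  const limit = parseFloat(record.CreditLimit);
  const balance = parseFloat(record.Balance);
  return limit >= 0 && limit <= 5000 && balance <= limit;
}

console.log(withinCreditLimit({ CreditLimit: '4000', Balance: '1500' })); // true
console.log(withinCreditLimit({ CreditLimit: '6000', Balance: '100' }));  // false
```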
### Example to read a JSON file and transform data
```ts
import { ConsoleWriter, Job, JsonReader, SetCalculatedField, TransformingReader } from '@pujansrt/data-genie';

async function runExample() {
  let reader: any = new JsonReader('input/simple-json-input.json');
  reader = new TransformingReader(reader)
    .setCondition((record) => record.balance < 0)               // only transform negative balances
    .add(new SetCalculatedField('balance', '0.0').transform()); // reset them to 0.0

  await Job.run(reader, new ConsoleWriter());
}

runExample().catch(console.error);
```
### FixedWidth Example
```ts
import { ConsoleWriter, FixedWidthReader, Job } from '@pujansrt/data-genie';

async function runExample() {
  let reader: any = new FixedWidthReader('input/credit-balance-01.fw');
  reader.setFieldWidths(8, 16, 16, 12, 14, 16, 7);
  reader.setFieldNamesInFirstRow(true);

  await Job.run(reader, new ConsoleWriter());
}

runExample().catch(console.error);
```
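Fixed-width files carry no delimiters; each field occupies a fixed number of characters, which is what the widths passed to `setFieldWidths` describe. A self-contained sketch of that slicing (illustrative only, not the library's parser):

```typescript
// Slice one fixed-width line into trimmed fields using per-field character widths.
function sliceFixedWidth(line: string, ...widths: number[]): string[] {
  const fields: string[] = [];
  let offset = 0;
  for (const width of widths) {
    fields.push(line.slice(offset, offset + width).trim());
    offset += width;
  }
  return fields;
}

// An 8-character Account column followed by a 16-character LastName column.
const line = '101'.padEnd(8) + 'Reeves'.padEnd(16);
console.log(sliceFixedWidth(line, 8, 16)); // [ '101', 'Reeves' ]
```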
---
## Upcoming Features
- Support for Apache Avro
- Support for Apache Parquet
- 🔍 Enhanced data validation rules
## 🧪 Use Cases
- Data cleaning and transformation
- Data validation and filtering
- Data migration and ETL processes
- Data analysis and reporting
- Data integration from multiple sources
## 🤝 Contributing
Contributions are welcome! Please open an issue or submit a pull request.
---
## 📄 License
MIT License: free for personal and commercial use.
---
## 👤 Author
Developed and maintained by Pujan Srivastava, a mathematician and software engineer with 18+ years of programming experience.