<p align="center">
<img src="https://raw.githubusercontent.com/bahador-r/db2lake/master/assets/db2lake-logo240.png" width="200" alt="db2lake logo" />
</p>
# @db2lake/core
Introduction
------------
db2lake is a small framework for extracting data from databases and loading it into
data lakes and warehouses. It provides a tiny, stable core API and a set of
driver packages (sources and destinations). Drivers can be scheduled and resumed
using cursor information so that only new data is transferred on subsequent runs.
This repository is a monorepo that includes the `core` package plus multiple
source and destination drivers.
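Conceptually, a pipeline pulls batches of rows from a source driver, runs them through a transformer, and pushes them to a destination driver until the source is exhausted. The sketch below illustrates that loop with simplified, hypothetical interfaces (`Source`, `Destination`, `runPipeline` are illustrative names only; the real `@db2lake/core` types differ):

```typescript
type Batch<T> = T[];

interface Source<T> {
  // Resolves to the next batch of rows, or null when the source is exhausted.
  fetchBatch(): Promise<Batch<T> | null>;
}

interface Destination<U> {
  writeBatch(rows: Batch<U>): Promise<void>;
}

type Transformer<T, U> = (rows: Batch<T>) => Batch<U>;

// Pull, transform, and write batches until the source reports no more data.
// Returns the total number of source rows processed.
async function runPipeline<T, U>(
  source: Source<T>,
  dest: Destination<U>,
  transform: Transformer<T, U>
): Promise<number> {
  let total = 0;
  for (;;) {
    const batch = await source.fetchBatch();
    if (!batch || batch.length === 0) break;
    await dest.writeBatch(transform(batch));
    total += batch.length;
  }
  return total;
}
```

The cursor mechanism fits into `fetchBatch`: a source driver tracks the highest cursor value it has seen and uses it to parameterize the next query, so each batch only contains rows newer than the last.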
Learn more about db2lake in our post on Dev.to:
[Introducing db2lake: A Lightweight and Powerful ETL Framework for Node.js](https://dev.to/bahador_r/introducing-db2lake-a-lightweight-and-powerful-etl-framework-for-nodejs-12b6)
Install
-------
Install the core package:
```bash
npm install @db2lake/core
```
Source drivers
--------------
| Purpose | Driver | Install |
|---|:---|---|
| MySQL | [`@db2lake/driver-mysql`](https://github.com/bahador-r/db2lake/tree/master/packages/driver-mysql) | `npm i @db2lake/driver-mysql` |
| Firestore | [`@db2lake/driver-firestore`](https://github.com/bahador-r/db2lake/tree/master/packages/driver-firestore) | `npm i @db2lake/driver-firestore` |
| Postgres | [`@db2lake/driver-postgres`](https://github.com/bahador-r/db2lake/tree/master/packages/driver-postgres) | `npm i @db2lake/driver-postgres` |
| Oracle | [`@db2lake/driver-oracle`](https://github.com/bahador-r/db2lake/tree/master/packages/driver-oracle) | `npm i @db2lake/driver-oracle` |
Destination drivers
-------------------
| Purpose | Driver | Install |
|---|:---|---|
| BigQuery | [`@db2lake/driver-bigquery`](https://github.com/bahador-r/db2lake/tree/master/packages/driver-bigquery) | `npm i @db2lake/driver-bigquery` |
| Databricks | [`@db2lake/driver-databricks`](https://github.com/bahador-r/db2lake/tree/master/packages/driver-databricks) | `npm i @db2lake/driver-databricks` |
| Redshift | [`@db2lake/driver-redshift`](https://github.com/bahador-r/db2lake/tree/master/packages/driver-redshift) | `npm i @db2lake/driver-redshift` |
| Snowflake | [`@db2lake/driver-snowflake`](https://github.com/bahador-r/db2lake/tree/master/packages/driver-snowflake) | `npm i @db2lake/driver-snowflake` |
Quick install example
---------------------
Install the core plus the MySQL source and BigQuery destination (example):
```bash
npm install @db2lake/core @db2lake/driver-mysql @db2lake/driver-bigquery
```
Complete TypeScript example
---------------------------
The following example demonstrates a simple pipeline using the MySQL source
driver and the BigQuery destination driver. It uses a transformer to adapt the
source rows and a lightweight logger passed into the pipeline.
Save as `examples/mysql-to-bigquery.ts` and run with `ts-node` or compile with
`tsc`.
```typescript
import { Pipeline, ITransformer, ILogger } from '@db2lake/core';
import { MySQLSourceDriver } from '@db2lake/driver-mysql';
import { BigQueryDestinationDriver } from '@db2lake/driver-bigquery';

// --- Configure drivers (fill in your credentials) ---
const mysqlConfig = {
  query: 'SELECT * FROM orders WHERE order_id > ? LIMIT 50',
  params: [0],
  cursorField: 'order_id',
  cursorParamsIndex: 0,
  connectionUri: 'mysql://user:password@localhost:3306/shopdb'
};

const bigqueryConfig = {
  bigQueryOptions: {
    keyFilename: './service-account.json',
    projectId: 'my-project-id'
  },
  dataset: 'my_dataset',
  table: 'orders',
  batchSize: 1000,
  // Optional: load rows as newline-delimited JSON
  writeOptions: {
    sourceFormat: 'NEWLINE_DELIMITED_JSON'
  }
};

// --- Transformer: adapt source row shape to destination schema ---
const transformer: ITransformer<any, any> = (rows) => rows.map(r => ({
  id: r.id,
  fullName: r.name,
  createdAt: r.created_at instanceof Date ? r.created_at.toISOString() : r.created_at
}));

// --- Logger ---
const logger: ILogger = (level, message, data) => {
  const ts = new Date().toISOString();
  console.log(`${ts} [${level.toUpperCase()}] ${message}`);
  if (data) console.debug(data);
};

async function main() {
  const source = new MySQLSourceDriver(mysqlConfig as any);
  const dest = new BigQueryDestinationDriver(bigqueryConfig as any);
  const pipeline = new Pipeline(source as any, dest as any, transformer, logger);

  try {
    await pipeline.run();
    console.log('Pipeline finished', pipeline.getMetrics());
  } catch (err) {
    console.error('Pipeline error', err);
  }
}

main().catch(err => { console.error(err); process.exit(1); });
```
Contributing
------------
PRs that add drivers or improve the core API are welcome. Try to keep the core
API minimal and well-documented so drivers remain simple to implement.
License
-------
MIT