# @db2lake/driver-databricks
<p align="center">
<img src="https://raw.githubusercontent.com/bahador-r/db2lake/master/assets/db2lake-logo240.png" width="200" alt="db2lake logo" />
</p>
High-performance Databricks destination driver for `@db2lake`. It writes batches of records to a Databricks SQL table using the `@databricks/sql` SDK and supports optional table creation, transactions with retries, and configurable batching.
Note: This driver depends on the `@databricks/sql` package at runtime.
## Installation

```bash
npm install @db2lake/driver-databricks
```
## Project Structure

```
├── src/
│   ├── index.ts
│   └── type.ts
└── package.json
```
## Usage

```typescript
import { DatabricksDestinationDriver, DatabricksConfig } from '@db2lake/driver-databricks';

const config: DatabricksConfig = {
  connection: {
    host: 'workspace.cloud.databricks.com',
    path: '/sql/1.0/warehouses/xxx',
    token: process.env.DATABRICKS_TOKEN!
  },
  database: 'my_db',
  table: 'my_table',
  batchSize: 1000,
  transaction: { enabled: true, maxRetries: 3 }
};

const driver = new DatabricksDestinationDriver<{ name: string; age: number }>(config);

try {
  // Inserts are buffered and flushed when batchSize is reached or on close()
  await driver.insert([{ name: 'John', age: 30 }]);
} finally {
  await driver.close();
}
```
## Automatic Table Creation

If you want the driver to create the target table automatically, pass `createTableOptions` in the config. The driver generates and executes a `CREATE TABLE IF NOT EXISTS` statement using the provided `schema`, optional `properties`, and `comment`.
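For example, a config with `createTableOptions` might look like the sketch below. The exact shape of `DatabricksColumn` is an assumption here (at minimum a column `name` and SQL `type`); check the package's `type.ts` for the authoritative definition, and note that the `delta.appendOnly` property is only an illustrative table property.

```typescript
import { DatabricksDestinationDriver, DatabricksConfig } from '@db2lake/driver-databricks';

const config: DatabricksConfig = {
  connection: {
    host: 'workspace.cloud.databricks.com',
    path: '/sql/1.0/warehouses/xxx',
    token: process.env.DATABRICKS_TOKEN!
  },
  database: 'my_db',
  table: 'my_table',
  createTableOptions: {
    // Assumed column shape: { name, type }; see the exported DatabricksColumn type
    schema: [
      { name: 'name', type: 'STRING' },
      { name: 'age', type: 'INT' }
    ],
    properties: { 'delta.appendOnly': 'true' }, // illustrative table property
    comment: 'Created by db2lake'
  }
};

const driver = new DatabricksDestinationDriver<{ name: string; age: number }>(config);
```

With this config, the first write ensures the table exists before any batch is inserted.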
## Configuration

- `connection` - `{ host, path, token }` for the Databricks SQL warehouse
- `database` - target database/catalog name
- `table` - target table name
- `createTableOptions?` - `{ schema: DatabricksColumn[], properties?: Record<string,string>, comment?: string }`
- `writeMode?` - `'append' | 'overwrite'` (default: `append`)
- `batchSize?` - flush threshold (default: 1000)
- `transaction?` - `{ enabled?: boolean, maxRetries?: number }` (defaults: `enabled: true`, `maxRetries: 3`)
## Notes

- Always call `close()` to flush pending rows and release connections.
- Tune `batchSize` to balance performance and memory use.
- Use `createTableOptions` to automate schema management in CI or initial runs.
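The buffer-and-flush behavior described above can be sketched in isolation. This is illustrative only, not the driver's actual implementation (which issues SQL through `@databricks/sql`); the `BatchBuffer` class and its names are hypothetical:

```typescript
// Minimal sketch of the buffered-insert pattern: rows accumulate until
// batchSize is reached, and close() flushes any remaining partial batch.
type FlushFn<T> = (batch: T[]) => void;

class BatchBuffer<T> {
  private buffer: T[] = [];
  constructor(private batchSize: number, private flush: FlushFn<T>) {}

  insert(rows: T[]): void {
    this.buffer.push(...rows);
    // Emit full batches as soon as the threshold is crossed
    while (this.buffer.length >= this.batchSize) {
      this.flush(this.buffer.splice(0, this.batchSize));
    }
  }

  close(): void {
    // Flush whatever is left, even if it is smaller than batchSize
    if (this.buffer.length > 0) {
      this.flush(this.buffer.splice(0));
    }
  }
}

// Usage: batchSize of 3, flushed batches collected for inspection
const flushed: number[][] = [];
const buf = new BatchBuffer<number>(3, (b) => flushed.push(b));
buf.insert([1, 2, 3, 4]); // flushes [1, 2, 3], keeps [4] buffered
buf.insert([5]);          // still below threshold
buf.close();              // flushes the remaining [4, 5]
console.log(flushed);     // [[1, 2, 3], [4, 5]]
```

This is why skipping `close()` can silently drop up to `batchSize - 1` rows.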
## License

MIT