@hotglue/gluestick-ts
TypeScript version of the gluestick ETL library for hotglue IPaaS platform
# Gluestick TypeScript
A powerful TypeScript library for data processing and ETL operations on the hotglue IPaaS platform, built on Polars for high-performance data manipulation. It supports multiple export formats: CSV, JSON, JSONL, Parquet, and the Singer specification format.
## Installation
```bash
npm install @hotglue/gluestick-ts
```
## Quick Start
```typescript
import * as gs from '@hotglue/gluestick-ts';
// Create a Reader to access your data
const reader = new gs.Reader();
// Get available data streams
const streams = reader.keys();
console.log('Available streams:', streams);
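// Read and process a specific stream
const dataFrame = reader.get('your_stream_name', { catalogTypes: true });
// get() is typed as DataFrame | null, so guard before exporting
if (dataFrame === null) {
  throw new Error('Stream "your_stream_name" could not be read');
}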
// Export processed data (defaults to CSV)
gs.toExport(dataFrame, 'output_name', './output');
// Export as Singer format
gs.toExport(dataFrame, 'output_name', './output', { exportFormat: 'singer' });
```
## Core Components
### Reader Class
The `Reader` class is your main interface for accessing data streams:
```typescript
// Both arguments are optional
const reader = new gs.Reader(inputDir, rootDir);
```
**Methods:**
- `get(stream, options)` - Read a specific stream as a Polars DataFrame
- `keys()` - Get all available stream names
- `getPk(stream)` - Get primary keys for a stream from the catalog

**Options for `get`:**
- `catalogTypes: boolean` - Use catalog for automatic type inference
### Export Functions
Export your processed data in multiple formats:
```typescript
// The options argument is optional
gs.toExport(dataFrame, outputName, outputDir, options);
```
**Supported formats:**
- **CSV** (default) - Comma-separated values
- **JSON** - Single JSON array
- **JSONL** - Newline-delimited JSON
- **Parquet** - Columnar storage format
- **Singer** - Singer specification format for data integration
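
The difference between the JSON and JSONL formats is easy to miss, so here is a minimal sketch in plain TypeScript, independent of the library. The sample `rows` and both helper functions are illustrative assumptions, not part of the gluestick API, and the exact bytes the library writes may differ.

```typescript
// Hypothetical sample rows; not produced by gluestick.
type Row = Record<string, unknown>;

const rows: Row[] = [
  { id: 1, name: 'Ada' },
  { id: 2, name: 'Grace' },
];

// JSON: the whole dataset is one array in a single document.
function toJsonArray(data: Row[]): string {
  return JSON.stringify(data);
}

// JSONL: one JSON object per line, so a file can be read line by line.
function toJsonLines(data: Row[]): string {
  return data.map((row) => JSON.stringify(row)).join('\n');
}

console.log(toJsonArray(rows));
console.log(toJsonLines(rows));
```

A JSON file has to be parsed as a whole, while each JSONL line parses on its own, which tends to suit larger exports.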
## Development
Build the project:
```bash
npm run build
```
Run examples:
```bash
# Run CSV processing example
npm run run:example:csv
# Run Parquet processing example
npm run run:example:parquet
```
## API Reference
### Reader Constructor
```typescript
new Reader(inputDir?: string, rootDir?: string)
```
- `inputDir` - Custom input directory (default: `${rootDir}/sync-output`)
- `rootDir` - Root directory (default: `process.env.ROOT_DIR || '.'`)
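
The default resolution described above can be sketched as a small standalone function. This only illustrates the documented defaults: `resolveReaderDirs` and its `env` parameter are hypothetical names, and the library's real implementation may differ (for example, in how it joins paths).

```typescript
// Hypothetical helper mirroring the documented defaults; not part of gluestick.
interface ReaderDirs {
  rootDir: string;
  inputDir: string;
}

function resolveReaderDirs(
  inputDir?: string,
  rootDir?: string,
  env: Record<string, string | undefined> = {},
): ReaderDirs {
  // rootDir falls back to the ROOT_DIR environment variable, then to '.'
  const root = rootDir ?? (env.ROOT_DIR || '.');
  // inputDir falls back to `${rootDir}/sync-output`
  const input = inputDir ?? `${root}/sync-output`;
  return { rootDir: root, inputDir: input };
}

// No arguments and no ROOT_DIR set
console.log(resolveReaderDirs());
// ROOT_DIR supplied through the environment (pass process.env in Node.js)
console.log(resolveReaderDirs(undefined, undefined, { ROOT_DIR: '/home/hotglue' }));
```

Passing the environment in as a parameter keeps the sketch testable without touching `process.env`.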
### Reader Methods
#### `get(stream: string, options?: ReadOptions): DataFrame | null`
Read a data stream as a Polars DataFrame.
```typescript
const df = reader.get('users', { catalogTypes: true });
```
**Options:**
- `catalogTypes: boolean` - Use catalog for automatic type inference
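
Because `get` is typed to return `DataFrame | null`, callers usually want to fail fast or skip the stream when nothing comes back. A minimal sketch of such a guard follows. `StreamSource` and `requireStream` are hypothetical names that assume only the documented `get` and `keys` signatures, and the in-memory `fakeReader` stands in for a real `Reader` so the snippet runs on its own.

```typescript
// Minimal shape assumed from the documented Reader methods (hypothetical interface).
interface StreamSource<T> {
  get(stream: string, options?: { catalogTypes?: boolean }): T | null;
  keys(): string[];
}

// Returns the stream's data, or throws an error that lists the available streams.
function requireStream<T>(reader: StreamSource<T>, stream: string): T {
  const data = reader.get(stream, { catalogTypes: true });
  if (data === null) {
    const available = reader.keys().join(', ');
    throw new Error(`Stream "${stream}" not found. Available streams: ${available}`);
  }
  return data;
}

// In-memory stand-in for a Reader, used only to make this sketch runnable.
const fakeStreams = new Map<string, number[]>([['users', [1, 2, 3]]]);
const fakeReader: StreamSource<number[]> = {
  get: (stream) => fakeStreams.get(stream) ?? null,
  keys: () => Array.from(fakeStreams.keys()),
};

const users = requireStream(fakeReader, 'users');
console.log(users.length);
```

With a real `Reader`, the same helper would be called as `requireStream(reader, 'users')`, assuming the reader satisfies this shape.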
#### `keys(): string[]`
Get all available stream names.
```typescript
const streams = reader.keys();
// Returns: ['users', 'orders', 'products']
```
#### `getPk(stream: string): string[] | null`
Get primary keys for a stream from the catalog.
```typescript
const primaryKeys = reader.getPk('users');
// Returns: ['id']
```
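
Since `getPk` may return `null`, it can help to normalize its result before handing it to an export call. The sketch below is one way to do that. `keysOption` is a hypothetical helper, and it assumes only the documented `string[] | null` return type and the `keys` export option.

```typescript
// Hypothetical helper: turn a getPk() result into the `keys` export option.
function keysOption(primaryKeys: string[] | null): { keys?: string[] } {
  // Omit `keys` entirely when no primary keys were returned.
  return primaryKeys && primaryKeys.length > 0 ? { keys: primaryKeys } : {};
}

// Spread the result into the export options alongside other settings.
const withKeys = { exportFormat: 'singer', ...keysOption(['id']) };
const withoutKeys = { exportFormat: 'singer', ...keysOption(null) };

console.log(withKeys);
console.log(withoutKeys);
```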
### Export Function
```typescript
toExport(
  dataFrame: DataFrame,
  outputName: string,
  outputDir: string,
  options?: ExportOptions
): void
```
**Parameters:**
- `dataFrame` - Polars DataFrame to export
- `outputName` - Name for the output file (without extension)
- `outputDir` - Directory to write the output file
- `options` - Export configuration options (optional)

**Export Options:**
```typescript
interface ExportOptions {
  exportFormat?: 'csv' | 'json' | 'jsonl' | 'parquet' | 'singer';
  outputFilePrefix?: string;
  keys?: string[]; // Primary keys for the data
  stringifyObjects?: boolean;
  reservedVariables?: Record<string, string>;
  allowObjects?: boolean; // For Singer format
  schema?: SingerHeaderMap; // For Singer format
}
```
**Examples:**
```typescript
// Export as CSV with prefix
gs.toExport(dataFrame, 'processed_users', './output', {
  exportFormat: 'csv',
  outputFilePrefix: 'tenant_123_',
  keys: ['user_id']
});

// Export as Singer format
gs.toExport(dataFrame, 'processed_users', './output', {
  exportFormat: 'singer',
  allowObjects: true,
  keys: ['user_id']
});
```
## Singer Format Support
Export data in [Singer specification](https://hub.meltano.com/singer/spec) format for data integration pipelines:
```typescript
// Basic Singer export
gs.toExport(dataFrame, 'users', './output', {
  exportFormat: 'singer',
  keys: ['id']
});

// Singer export with object support
gs.toExport(dataFrame, 'users', './output', {
  exportFormat: 'singer',
  allowObjects: true,
  keys: ['id']
});
```
The Singer export automatically generates SCHEMA, RECORD, and STATE messages according to the Singer specification.
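
For orientation, the three message types look roughly like this when serialized. The sketch below builds them by hand following the Singer specification. `toSingerLines`, the sample schema, and the `bookmarks` state value are illustrative assumptions, and the messages gluestick actually emits (field order, schema details, state contents) may differ.

```typescript
type JsonObject = Record<string, unknown>;

// Message shapes as defined by the Singer specification.
type SingerMessage =
  | { type: 'SCHEMA'; stream: string; schema: JsonObject; key_properties: string[] }
  | { type: 'RECORD'; stream: string; record: JsonObject }
  | { type: 'STATE'; value: JsonObject };

// Hypothetical helper: serialize one stream as newline-delimited Singer messages.
function toSingerLines(
  stream: string,
  schema: JsonObject,
  keyProperties: string[],
  records: JsonObject[],
  state: JsonObject,
): string {
  const messages: SingerMessage[] = [
    { type: 'SCHEMA', stream, schema, key_properties: keyProperties },
    ...records.map((record): SingerMessage => ({ type: 'RECORD', stream, record })),
    { type: 'STATE', value: state },
  ];
  return messages.map((message) => JSON.stringify(message)).join('\n');
}

const singerOutput = toSingerLines(
  'users',
  { type: 'object', properties: { id: { type: 'integer' }, name: { type: 'string' } } },
  ['id'],
  [{ id: 1, name: 'Ada' }],
  { bookmarks: {} },
);
console.log(singerOutput);
```

Each line of the output is a standalone JSON message, with the SCHEMA message preceding the RECORD messages for its stream.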