@uwdata/flechette
Version:
Fast, lightweight access to Apache Arrow data.
169 lines (130 loc) • 9.77 kB
Markdown
---
title: API Reference
---
# Flechette API Reference <a href="https://idl.uw.edu/flechette"><img align="right" src="../assets/logo.svg" height="38"/></a>
[**Top-Level**](/flechette/api) | [Data Types](data-types) | [Schema](schema) | [Table](table) | [Column](column)
## Top-Level Decoding and Encoding
* [tableFromIPC](#tableFromIPC)
* [tableToIPC](#tableToIPC)
* [tableFromArrays](#tableFromArrays)
* [columnFromArray](#columnFromArray)
* [columnFromValues](#columnFromValues)
* [tableFromColumns](#tableFromColumns)
<hr/><a id="tableFromIPC" href="#tableFromIPC">#</a>
<b>tableFromIPC</b>(<i>data</i>[, <i>options</i>])
Decode [Apache Arrow IPC data](https://arrow.apache.org/docs/format/Columnar.html#serialization-and-interprocess-communication-ipc) and return a new [`Table`](table). The input binary data may be either an `ArrayBuffer` or `Uint8Array`. For Arrow data in the [IPC 'stream' format](https://arrow.apache.org/docs/format/Columnar.html#ipc-streaming-format), an array of `Uint8Array` values is also supported.
* *data* (`ArrayBuffer` \| `Uint8Array` \| `Uint8Array[]`): The source byte buffer, or an array of buffers. If an array, each byte array may contain one or more self-contained messages. Messages may NOT span multiple byte arrays.
* *options* (`ExtractionOptions`): Options for controlling how values are transformed when extracted from an Arrow binary representation.
* *useBigInt* (`boolean`): If true, extract 64-bit integers as JavaScript `BigInt` values. Otherwise, coerce long integers to JavaScript number values (default `false`).
* *useDate* (`boolean`): If true, extract dates and timestamps as JavaScript `Date` objects. Otherwise, return numerical timestamp values (default `false`).
* *useDecimalInt* (`boolean`): If true, extract decimal-type data as scaled integer values, where fractional digits are scaled to integer positions. Returned integers are `BigInt` values for decimal bit widths of 64 bits or higher and 32-bit integers (as JavaScript `number`) otherwise. If false, decimals are converted to floating-point numbers (default).
* *useMap* (`boolean`): If true, extract Arrow 'Map' values as JavaScript `Map` instances. Otherwise, return an array of [key, value] pairs compatible with both `Map` and `Object.fromEntries` (default `false`).
* *useProxy* (`boolean`): If true, extract Arrow 'Struct' values and table row objects using zero-copy proxy objects that extract data from underlying Arrow batches. The proxy objects can improve performance and reduce memory usage, but do not support property enumeration (`Object.keys`, `Object.values`, `Object.entries`) or spreading (`{ ...object }`). Otherwise, use standard JS objects for structs and table rows (default `false`).
```js
import { tableFromIPC } from '@uwdata/flechette';
const url = 'https://vega.github.io/vega-datasets/data/flights-200k.arrow';
const ipc = await fetch(url).then(r => r.arrayBuffer());
const table = tableFromIPC(ipc);
```
<hr/><a id="tableToIPC" href="#tableToIPC">#</a>
<b>tableToIPC</b>(<i>table</i>[, <i>options</i>])
Encode an Arrow table into Arrow IPC binary format and return the result as a `Uint8Array`. Both the IPC ['stream'](https://arrow.apache.org/docs/format/Columnar.html#ipc-streaming-format) and ['file'](https://arrow.apache.org/docs/format/Columnar.html#ipc-file-format) formats are supported.
* *table* (`Table`): The Arrow table to encode.
* *options* (`object`): Encoding options object.
* *format* (`string`): Arrow `'stream'` (the default) or `'file'` format.
```js
import { tableToIPC } from '@uwdata/flechette';
const bytes = tableFromIPC(table, { format: 'stream' });
```
<hr/><a id="tableFromArrays" href="#tableFromArrays">#</a>
<b>tableFromArrays</b>(<i>data</i>[, <i>options</i>])
Create a new table from a set of named arrays. Data types for the resulting Arrow columns can be automatically inferred or specified using the *types* option. If the *types* option provides data types for only a subset of columns, the rest are inferred. Each input array must have the same length.
* *data* (`object | array`): The input data as a collection of named arrays. If object-valued, the objects keys are column names and the values are arrays or typed arrays. If array-valued, the data should consist of [name, array] pairs in the style of `Object.entries`.
* *options* (`object`): Options for building new tables and controlling how values are transformed when extracted from an Arrow binary representation.
* *types*: (`object`): An object mapping column names to [data types](data-types).
* *maxBatchRows* (`number`): The maximum number of rows to include in a single record batch. If the array lengths exceed this number, the resulting table will consist of multiple record batches.
* In addition, all [tableFromIPC](#tableFromIPC) extraction options are supported.
```js
import { tableFromArrays } from '@uwdata/flechette';
// create table with inferred types
const table = tableFromArrays({
ints: [1, 2, null, 4, 5],
floats: [1.1, 2.2, 3.3, 4.4, 5.5],
bools: [true, true, null, false, true],
strings: ['a', 'b', 'c', 'b', 'a']
});
```
```js
import {
bool, dictionary, float32, int32, tableFromArrays, tableToIPC, utf8
} from '@uwdata/flechette';
// create table with specified types
const table = tableFromArrays({
ints: [1, 2, null, 4, 5],
floats: [1.1, 2.2, 3.3, 4.4, 5.5],
bools: [true, true, null, false, true],
strings: ['a', 'b', 'c', 'b', 'a']
}, {
types: {
ints: int32(),
floats: float32(),
bools: bool(),
strings: dictionary(utf8())
}
});
```
<hr/><a id="columnFromArray" href="#columnFromArray">#</a>
<b>columnFromArray</b>(<i>array</i>[, <i>type</i>, <i>options</i>])
Create a new column from a provided data array. The data types for the column can be automatically inferred or specified using the *type* argument.
* *array* (`Array | TypedArray`): The input data as an Array or TypedArray.
* *type*: (`DataType`): The [data type](data-types) for the column. If not specified, type inference is attempted.
* *options* (`object`): Options for building new columns and controlling how values are transformed when extracted from an Arrow binary representation.
* *maxBatchRows* (`number`): The maximum number of rows to include in a single record batch. If the array length exceeds this number, the resulting column will consist of multiple record batches.
* In addition, all [tableFromIPC](#tableFromIPC) extraction options are supported.
```js
import { columnFromArray, float32, int64 } from '@uwdata/flechette';
// create column with inferred type (here, float64)
columnFromArray([1.1, 2.2, 3.3, 4.4, 5.5]);
// create column with specified type
columnFromArray([1.1, 2.2, 3.3, 4.4, 5.5], float32());
// create column with specified type and options
columnFromArray(
[1n, 32n, 2n << 34n], int64(),
{ maxBatchRows: 1000, useBigInt: true }
);
```
<hr/><a id="columnFromValues" href="#columnFromValues">#</a>
<b>columnFromValues</b>(<i>values</i>[, <i>type</i>, <i>options</i>])
Create a new column by iterating over provided *values*. The *values* argument can either be an iterable collection or a visitor function that applies a callback to successive data values (akin to `Array.forEach`). The data types for the column can be automatically inferred or specified using the *type* argument.
* *values* (`Iterable | (callback: (value: any) => void) => void`): An iterable object or a visitor function that applies a callback to successive data values (akin to `Array.forEach`). In most cases, providing a visitor function will be more efficient than an iterator, so is recommended for larger datasets.
* *type*: (`DataType`): The [data type](data-types) for the column. If not specified, type inference is attempted.
* *options* (`object`): Options for building new columns and controlling how values are transformed when extracted from an Arrow binary representation.
* *maxBatchRows* (`number`): The maximum number of rows to include in a single record batch. If the number of values exceeds this number, the resulting column will consist of multiple record batches.
* In addition, all [tableFromIPC](#tableFromIPC) extraction options are supported.
```js
import { columnFromValues, float32, int64 } from '@uwdata/flechette';
// an iterable set of values
const set = new Set([1.1, 1.1, 2.2, 3.4]);
// create column with inferred type (here, float64)
columnFromValues(set);
// create column with specified type
columnFromValues(set, float32());
// create column using only values with odd-numbered indices
const values = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
columnFromValues(callback => {
values.forEach((v, i) => { if (i % 2) callback(v); })
});
```
<hr/><a id="tableFromColumns" href="#tableFromColumns">#</a>
<b>tableFromColumns</b>(<i>columns</i>[, <i>useProxy</i>])
Create a new table from a collection of columns. This method is useful for creating new tables using one or more pre-existing column instances. Otherwise, [`tableFromArrays`](#tableFromArrays) should be preferred. Input columns are assumed to have the same record batch sizes.
* *data* (`object | array`): The input columns as an object with name keys, or an array of [name, column] pairs.
* *useProxy* (`boolean`): Flag indicating if row proxy objects should be used to represent table rows (default `false`). Typically this should match the value of the `useProxy` extraction option used for column generation.
```js
import { columnFromArray, tableFromColumns } from '@uwdata/flechette';
// create table from columns with inferred types (here, bool and float64)
const table = tableFromColumns({
bools: columnFromArray([true, true, null, false, true]),
floats: columnFromArray([1.1, 2.2, 3.3, 4.4, 5.5])
});
```