A TypeScript library for extracting data series from nested objects and arrays using a custom syntax.

# series-extractor

This library extracts data series from nested JavaScript objects and arrays. You define the structure of the data, and it automatically extracts the values. This removes the need to write custom logic that iterates over deeply nested structures to extract the data you want.

## Installation

```bash
npm install series-extractor
# or
yarn add series-extractor
```

## Syntax

This library uses a custom syntax for succinctly describing nested data structures and which values need to be extracted. The syntax essentially outlines the object/array structure, and then labels which parts are to be extracted. You only define the structure for the portions you want to extract data from.

### To specify the structure

- `{key:value, ...}`: Define an object
- `[value, key:value, ...]`: Define an array. The optional `key` is used to sparsely define the array structure. If omitted, it will use the value's index/position. For example: `[4:a, b]` will be interpreted as `[4:a, 1:b]`
- Only values can be a nested definition, not keys; e.g. `{a:{b:[c,d]}}`. This means `Map` objects are not supported currently.

*Shorthand:*

- `key.value`: Define an object with only one relevant key. Omit the `.` when nesting, except when the nesting is itself a shortcut; e.g. `key{key:value}` or `key.key.value`.
- `#key.value`: Define an array with only one relevant index. The `#` prefix is needed to indicate `key` is an index into an array

### To label parts of the structure to be extracted

- Prefix keys or values with a `$` to mark them as dimension variables in the series. For example: `[$count]` will save the array's first value to a dimension named `count`.
- Leaf values must *always* be dimension variables. For example: `a.b{c:[$d,$e],f:$g}`; note the leaves `d`, `e`, and `g` are all variables.
- The `$` comes after a `#`, e.g. `#$var.$val` is equivalent to `[$var:$val]`

*Anonymous variables:*

Use a single `$` character for an anonymous variable.
When extracting data as *rows* (see below), anonymous variables will still be iterated, but the anonymous variable itself is omitted from the output. Examples:

- `{$key: $}`: extract all keys, but ignore their value
- `[$: $val]`: extract all values, but ignore their index in the array

### Smaller syntactic details to be aware of

- Whitespace is ignored. Insert whitespace anywhere to help with readability
- You'll need to escape characters like `.`, `{`, whitespace, etc. with a backslash to treat them literally. Typically these characters are not used as object keys, so you should rarely need to escape.

### Common pitfalls

- **Array iteration**: `[{...}]` accesses only index 0. To iterate through all elements, use a dimension variable for the key: `[$idx: {...}]` or anonymous `[$: {...}]`
- **Nested arrays**: `[[...]]` means `[0: [0: ...]]` (accessing index 0 at both levels). To iterate through both levels: `[$: [$: ...]]`

## Basic Usage

Extract data as rows. This uses a heuristic to try and flatten the structure into rows.

```javascript
import { seriesExtractor } from 'series-extractor';

// `pattern` is a syntax string as described above;
// `data` is the nested object or array to extract from
const extractor = seriesExtractor(pattern)
for (const row of extractor.extractRows(data))
  console.log(row);
```

### Heuristic details

Extraction is a depth-first traversal: first constant keys are traversed, in the order defined in the syntax; second variable dimension keys are traversed in object/array order. To organize this into rows, we must make some assumptions about what constitutes a *row*. We assume that:

- Variable dimension *keys* indicate rows of data. A row is generated for each key. This fits most data because typically rows are not keyed explicitly, but instead placed in an array or object indexed by an unknown primary key.
- Nested data indicates data has been grouped by some primary key. Variable dimension keys are broadcast among nested rows.
- Constant keys indicate secondary grouping keys for any subsequent rows.
  These are broadcast among rows from any *following sibling*. Be careful to always define these secondary grouping keys before the rows you want to broadcast to. For example:
  - `{$: $row, b: $group}` or `{b: $group, $: $row}`: `$group` is included with each `$row`, because dimension variables always follow constant keys when traversing. *(Putting constants before variable keys in the syntax is recommended for clarity)*
  - `{a: $.$row, b: $group}`: `$group` is *not* included with each `$row`, but instead extracted separately. Both `a` and `b` are constant keys, so they are traversed in syntax order. Since the rows *precede* the group constant instead of *following* it, the group is not included. To be included, `b` must be rearranged to come first.

This heuristic is sensible for most data, but for complex structures it may not be what you want. This may be improved in future versions, but for now, if you want more control, you'll need to build the rows manually; see **Advanced Usage**.

## Advanced Usage

Extract individual values, optionally with traversal information. This is a depth-first traversal: first constant keys are traversed, in the order defined in the syntax; second variable dimension keys are traversed in object/array order.
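
The traversal order described above can be sketched without the library. The snippet below is a hand-written illustration, not the library's implementation: `visitOrder` is a hypothetical helper, and it assumes that a variable key only matches keys that are not already named as constants, which this README does not spell out. It models the pattern `{ $: $row, b: $group }` on a small object.

```javascript
// Hypothetical helper for illustration only; not part of series-extractor.
// Returns the keys of `node` in the order described above: constant keys
// first (in the order the pattern lists them), then the remaining keys in
// the object's own order.
function visitOrder(node, constantKeys) {
  const constants = constantKeys.filter((key) => key in node);
  const variables = Object.keys(node).filter((key) => !constantKeys.includes(key));
  return [...constants, ...variables];
}

// Data shaped like the pattern `{ $: $row, b: $group }`.
const data = { x: 1, b: "g", y: 2 };

// `b` is visited first even though it sits in the middle of the object,
// so its value is known before any row is produced.
const order = visitOrder(data, ["b"]);
console.log(order); // [ 'b', 'x', 'y' ]

// Broadcast the group value onto every row that follows it.
let group;
const rows = [];
for (const key of order) {
  if (key === "b") group = data[key];
  else rows.push({ row: data[key], group });
}
console.log(rows); // [ { row: 1, group: 'g' }, { row: 2, group: 'g' } ]
```

Because the constant key is handled before any variable key, the position of `b` inside the data does not matter; only its position relative to other constant keys in the pattern does. The library example that follows yields values from `extract` in this same order.
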
```javascript
import { seriesExtractor, ExtractableFlags, ExtractableStack } from 'series-extractor';

const extractor = seriesExtractor(pattern)

// Extract individual values
for (const value of extractor.extract(data))
  console.log(`${value.name} = ${value.value}`)

// Also include traversal information
for (const value of extractor.extract(data, ExtractableFlags.STACK | ExtractableFlags.ANONYMOUS)) {
  if (value == ExtractableStack.PUSH) {
    console.log("Entering nesting")
  } else if (value == ExtractableStack.POP) {
    console.log("Exiting nesting")
  } else if (value.name === null) {
    console.log(`Anonymous variable = ${value.value}`)
  } else {
    console.log(`${value.name} = ${value.value}`)
  }
}
```

## Example

Extract a flattened time series from nested sensor data:

```typescript
import { seriesExtractor } from 'series-extractor';

// Complex nested sensor data
const sensorData = {
  buildings: {
    "building-A": {
      floors: [
        [{ temp: 68, humidity: 45 }, { temp: 70, humidity: 48 }],
        [{ temp: 72, humidity: 50 }]
      ],
      roof: { temp: 90, humidity: 55 }
    },
    "building-B": {
      floors: [[{ temp: 69, humidity: 46 }]],
      roof: { temp: 85, humidity: 52 }
    }
  }
};

// Define extraction pattern
const extractor = seriesExtractor(`
  buildings.$building {
    floors: #$floor.#$ { temp: $temp, humidity: $humidity },
    roof: { temp: $temp, humidity: $humidity }
  }
`);

// Extract flattened rows
for (const row of extractor.extractRows(sensorData))
  console.log(row);

/* Output - complex nesting flattened to rows:
{ building: 'building-A', floor: 0, temp: 68, humidity: 45 }
{ building: 'building-A', floor: 0, temp: 70, humidity: 48 }
{ building: 'building-A', floor: 1, temp: 72, humidity: 50 }
{ building: 'building-A', temp: 90, humidity: 55 }
{ building: 'building-B', floor: 0, temp: 69, humidity: 46 }
{ building: 'building-B', temp: 85, humidity: 52 }
*/
```

## Development

- `src/index.ts`: Contains all the core logic for parsing and extraction.
- `package.json`: Project metadata and dependencies.
- `tsconfig.json`: TypeScript compiler configuration.

To build the project:

```bash
npm run build
```

To run tests:

```bash
npm test
```