series-extractor
Version:
A TypeScript library for extracting data series from nested objects and arrays using a custom syntax.
206 lines (159 loc) • 7.63 kB
Markdown
# series-extractor
This library extracts data series from nested JavaScript objects and arrays. You define the
structure of the data, and it will automatically extract values. It removes the need for
writing custom logic to iterate over deeply nested structures to extract the data you want.
## Installation
```bash
npm install series-extractor
# or
yarn add series-extractor
```
## Syntax
This library uses a custom syntax for succinctly describing nested data structures and which values
need to be extracted. The syntax essentially outlines the object/array structure, and then labels
which parts to be extracted. You only define the structure for the portions you want to extract data
from.
### To specify the structure
- `{key:value, ...}`: Define an object
- `[value, key:value, ...]`: Define an array. The optional `key` is used to sparsely define the
array structure. If omitted, it will use the value's index/position. For example: `[4:a, b]`,
will be interpreted as `[4:a, 1:b]`
- Only values can be a nested definition, not keys; e.g. `{a:{b:[c,d]}}`. This means `Map` objects
are not supported currently.
*Shorthand:*
- `key.value`: Define an object with only one relevant key. Omit the `.` when nesting,
except when the nesting is itself a shortcut; e.g. `key{key:value}` or `key.key.value`.
- `#key.value`: Define an array with only one relevant index. The `#` prefix is needed to indicate
`key` is an an index into an array
### To label parts of the structure to be extracted
- Prefix keys or values with a `$` to mark them as dimension variables in the series. For example:
`[$count]` will save the array's first value to a dimension named `count`.
- Leaf values must *always* be dimension variables. For example: `a.b{c:[$d,$e],f:$g}`; note the
leaves `d`, `e`, and `g` are all variables.
- The `$` comes after a `#`, e.g. `#$var.$val` is equivalent to `[$var:$val]`
*Anonymous variables:*
Use a single `$` character for an anonymous variable. When extracting data as *rows* (see below),
anonymous variables will still be iterated, but the anonymous variable itself is omitted from the
ouput. Examples:
- `{$key: $}`: extract all keys, but ignore their value
- `[$: $val]`: extract all values, but ignore their index in the array
### Smaller syntactic details to be aware of
- Whitespace is ignored. Insert whitespace anywhere to help with readability
- You'll need to escape characters like `.`, `{`, whitespace, etc with a backslash to treat them
literally. Typically these characters are not used as object keys, so you should rarely need to
escape.
### Common pitfalls
- **Array iteration**: `[{...}]` accesses only index 0. To iterate through all elements, use a
dimension variable for the key: `[$idx: {...}]` or anonymous `[$: {...}]`
- **Nested arrays**: `[[...]]` means `[0: [0: ...]]` (accessing index 0 at both levels). To
iterate through both levels: `[$: [$: ...]]`
## Basic Usage
Extract data as rows. This uses a heuristic to try and flatten the structure into rows.
```javascript
import { seriesExtractor } from 'series-extractor';
const extractor = seriesExtractor(pattern)
for (const row of extractor.extractRows(data))
console.log(row);
```
### Heuristic details
Extraction is a depth-first traversal: first constant keys are traversed, in the order defined in
the syntax; second variable dimension keys are traversed in object/array order. To organize this
into rows, we must make some assumptions about what constitutes a *row*. We assume that:
- Variable dimension *keys* indicate rows of data. A row is generated for each key. This fits most
data because typically rows are not keyed explicitly, but instead placed in an array or object
indexed by unknown primary key.
- Nested data indicates data has been grouped by some primary key. Variable dimension keys are
broadcast among nested rows.
- Constant keys indicate secondary grouping keys for any subsequent rows. These are broadcast among
rows from any *following sibling*. Be careful to always define these secondary grouping keys
before the rows you want to broadcast to. For example:
- `{$: $row, b: $group}` or `{b: $group, $: $row}`: `$group` is included with each `$row`, because
dimension variables always follow constant keys when traversing. *(Putting constants before
variable keys in the syntax is recommended for clarity)*
- `{a: $.$row, b: $group}`: `$group` is *not* included with each `$row`, but instead extracted
separately. Both `a` and `b` are constant keys, so are traversed in syntax order. Since the rows
*precede* the group constant instead of *follow*, they are not included. To be included, `b`
most be rearranged to come first.
This heuristic is sensible for most data, but for complex structures it may not be what you want.
This may be improved in future versions, but for now if you wish more control you'll need to
manually build the rows; see **Advanced Usage**.
## Advanced Usage
Extract individual values, optionally with traversal information. This is a depth-first traversal:
first constant keys are traversed, in the order defined in the syntax; second variable dimension
keys are traversed in object/array order.
```javascript
import { seriesExtractor, ExtractableFlags, ExtractableStack } from 'series-extractor';
const extractor = seriesExtractor(pattern)
// Extract individual values
for (const value of extractor.extract(data))
console.log(`${value.name} = ${value.value}`)
// Also include traversal information
for (const value of extractor.extract(data, ExtractableFlags.STACK | ExtractableFlags.ANONYMOUS)) {
if (value == ExtractableStack.PUSH) {
console.log("Entering nesting")
} else if (value == ExtractableStack.POP) {
console.log("Exiting nesting")
} else if (value.name === null) {
console.log(`Anonymous variable = ${value.value}`)
} else {
console.log(`${value.name} = ${value.value}`)
}
}
```
## Example
Extract a flattened time series from nested sensor data:
```typescript
import { seriesExtractor } from 'series-extractor';
// Complex nested sensor data
const sensorData = {
buildings: {
"building-A": {
floors: [
[{ temp: 68, humidity: 45 }, { temp: 70, humidity: 48 }],
[{ temp: 72, humidity: 50 }]
],
roof: { temp: 90, humidity: 55 }
},
"building-B": {
floors: [[{ temp: 69, humidity: 46 }]],
roof: { temp: 85, humidity: 52 }
}
}
};
// Define extraction pattern
const extractor = seriesExtractor(`
buildings.$building {
floors: #$floor.#$ {
temp: $temp,
humidity: $humidity
},
roof: {
temp: $temp,
humidity: $humidity
}
}
`);
// Extract flattened rows
for (const row of extractor.extractRows(sensorData))
console.log(row);
/* Output - complex nesting flattened to rows:
{ building: 'building-A', floor: 0, temp: 68, humidity: 45 }
{ building: 'building-A', floor: 0, temp: 70, humidity: 48 }
{ building: 'building-A', floor: 1, temp: 72, humidity: 50 }
{ building: 'building-A', temp: 90, humidity: 55 }
{ building: 'building-B', floor: 0, temp: 69, humidity: 46 }
{ building: 'building-B', temp: 85, humidity: 52 }
*/
```
## Development
- `src/index.ts`: Contains all the core logic for parsing and extraction.
- `package.json`: Project metadata and dependencies.
- `tsconfig.json`: TypeScript compiler configuration.
To build the project:
```bash
npm run build
```
To run tests:
```bash
npm test
```