data-projector
Load and transform datasets
Load datasets and map / transform the data for use.
- Load CSV datasets
- Guess types and cast data fields to types
- Calculate stats: global, per-field and pairwise (field-by-field correlations, etc.)
- Map datasets to other datasets using transform functions
It is designed to take a JSON specification object, load a dataset, and optionally map values to requested ranges.
The JSON specification objects can be saved in your application for use in presets.
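A specification that refers to mapping functions by name contains only plain data, so it survives a JSON round trip. The following is a minimal sketch of that idea, not the library's preset format: bundling `calculateStatsParams` and `mapFieldsParams` into one object, and the output field name `armyScaled`, are assumptions made for illustration.

```javascript
// Hypothetical preset: the key layout is an assumption made for this sketch.
const preset = {
  calculateStatsParams: {},
  mapFieldsParams: [
    // fn is a string key, so the entry stays serializable
    { input: 'Army', output: 'armyScaled', fn: 'linear', args: [0, 1] }
  ]
};

// Save the preset as text, e.g. in application settings
const saved = JSON.stringify(preset);

// Restore it later; the result is structurally identical
const restored = JSON.parse(saved);
console.log(restored.mapFieldsParams[0].fn);
```

A mapping entry whose `fn` is an inline function would not survive this round trip, which is one reason to prefer string keys in saved presets.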
Status: ALPHA
```javascript
// Simple example with only the default statistics and no field mappings requested.
const dp = require('data-projector');
const datasetPath = '/Users/crucial/code/idmx/playsplom/app/vendor/datasets/Nightingale.csv';
const functions = {};
const calculateStatsParams = {};
const mapFieldsParams = {};
const x = dp.project(functions, datasetPath, calculateStatsParams, mapFieldsParams);
x.then(console.log, console.error);
```
```javascript
{
"data": [
{
"0": 1,
"Date": "1854-04-01",
"Month": "Apr",
"Year": 1854,
"Army": 8571,
"Disease": 1,
"Wounds": 0,
"Other": 5,
"Disease.rate": 1.4,
"Wounds.rate": 0,
"Other.rate": 7
},
{
"0": 2,
"Date": "1854-05-01",
"Month": "May",
"Year": 1854,
"Army": 23333,
"Disease": 12,
"Wounds": 0,
"Other": 9,
"Disease.rate": 6.2,
"Wounds.rate": 0,
"Other.rate": 4.6
},
// etc etc
],
"fields": [
"0", // first column header did not have a name
"Date",
"Month",
"Year",
"Army",
"Disease",
"Wounds",
"Other",
"Disease.rate",
"Wounds.rate",
"Other.rate"
],
"path": "/Users/crucial/code/idmx/playsplom/app/vendor/datasets/Nightingale.csv",
"stats": {
"fields": {
"0": {
"minval": 1,
"maxval": 24,
"type": {
"type": "number",
"null": false
}
},
"Date": {
"minval": null, // bug: minval and maxval are not calculated for the date column
"maxval": null,
"type": {
"type": "date",
"null": false,
"dateFormat": "YYYY-MM-DD" // auto-detected the date format
}
},
"Month": {
"minval": null,
"maxval": null,
"type": {
"type": "string",
"null": false
}
},
"Year": {
"minval": 1854,
"maxval": 1856,
"type": {
"type": "number",
"null": false
}
},
"Army": {
"minval": 8571,
"maxval": 47751,
"type": {
"type": "number",
"null": false
}
},
"Disease": {
"minval": 1,
"maxval": 2761,
"type": {
"type": "number",
"null": false
}
},
"Wounds": {
"minval": 0,
"maxval": 287,
"type": {
"type": "number",
"null": false
}
},
"Other": {
"minval": 5,
"maxval": 361,
"type": {
"type": "number",
"null": false
}
},
"Disease.rate": {
"minval": 1.4,
"maxval": 1022.8,
"type": {
"type": "number",
"null": false
}
},
"Wounds.rate": {
"minval": 0,
"maxval": 115.8,
"type": {
"type": "number",
"null": false
}
},
"Other.rate": {
"minval": 2.5,
"maxval": 140.1,
"type": {
"type": "number",
"null": false
}
}
}
}
}
```
<a name="project"></a>
Load a dataset from disk, calculate statistics and apply transformations
**Returns**: <code>Promise.<Object></code> - Promise for a dataset
| Param | Type | Description |
| --- | --- | --- |
| functions | <code>Object</code> | Named function registry |
| path | <code>String</code> | |
| calculateStatsParams | <code>Object</code> | |
| mapFieldsParams | <code>Array.<Object></code> | |
<a name="readParseDataset"></a>
Load and parse a dataset from path.
Stats are not yet calculated, so types are unknown and all field values are strings.
**Returns**: <code>Promise.<Object></code> - Promise for a dataset
| Param | Type | Description |
| --- | --- | --- |
| path | <code>String</code> | Absolute path to file |
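To illustrate what "all fields are strings" means at this stage, here is a deliberately naive parsing sketch. `parseCsvText` is a hypothetical helper, not part of data-projector; it works on text that has already been read, and it ignores quoting and escaping, which a real CSV parser must handle.

```javascript
// Naive CSV parsing sketch: no quoting or escaping support.
// parseCsvText is a hypothetical helper, not part of data-projector.
function parseCsvText(text) {
  const lines = text.split(/\r?\n/).filter(line => line.length > 0);
  const fields = lines[0].split(',');
  const data = lines.slice(1).map(line => {
    const cells = line.split(',');
    const row = {};
    fields.forEach((field, i) => {
      row[field] = cells[i]; // values stay strings at this stage
    });
    return row;
  });
  return { data, fields };
}

const parsed = parseCsvText('Year,Army\n1854,8571\n1854,23333\n');
console.log(parsed.data[0]); // { Year: '1854', Army: '8571' }
```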
<a name="loadDataset"></a>
Load and parse a dataset, calculate stats, and coerce field values to their guessed types.
**Returns**: <code>Promise.<Object></code> - Promise for a dataset
| Param | Type | Description |
| --- | --- | --- |
| path | <code>String</code> | Absolute path to file |
| functions | <code>Object</code> | Named function registry |
| calculateStatsParams | <code>Object</code> | The `stats` object from params |
<a name="createDataset"></a>
Create a dataset object from an array of objects
**Returns**: <code>Object</code> - dataset - {data, fields, path}
| Param | Type | Description |
| --- | --- | --- |
| data | <code>Object</code> | [{field: value, field2: value}, ...] |
| fields | <code>Array.<String></code> | Field names |
| path | <code>String</code> | |
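The shape of the returned object can be pictured with a small stand-in. `createDatasetSketch` and the path `/tmp/example.csv` are hypothetical; the real function may validate or copy its inputs.

```javascript
// Hypothetical stand-in for createDataset; the real implementation may differ.
function createDatasetSketch(data, fields, path) {
  return { data, fields, path };
}

const rows = [
  { Month: 'Apr', Army: 8571 },
  { Month: 'May', Army: 23333 }
];
const dataset = createDatasetSketch(rows, ['Month', 'Army'], '/tmp/example.csv');
console.log(Object.keys(dataset)); // [ 'data', 'fields', 'path' ]
```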
<a name="_calculateStats"></a>
Calculate statistics (minval, maxval, avg etc.) for a dataset using a stats specification.
**Returns**: <code>Object</code> - stats
| Param | Type | Description |
| --- | --- | --- |
| functions | <code>Object</code> | Named function registry |
| calculateStatsParams | <code>Object</code> | The `stats` object from params |
| dataset | <code>Object</code> | As returned by loadDataset or from a previous transformation. |
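As a rough picture of the per-field part of these statistics, the sketch below computes `minval` and `maxval` for each field. `fieldMinMax` is a hypothetical helper; it ignores the stats specification and the function registry, and it assumes values have already been cast to numbers.

```javascript
// Sketch of per-field minval/maxval; fieldMinMax is a hypothetical helper.
function fieldMinMax(dataset) {
  const stats = { fields: {} };
  for (const field of dataset.fields) {
    const numbers = dataset.data
      .map(row => row[field])
      .filter(value => typeof value === 'number' && !Number.isNaN(value));
    stats.fields[field] = {
      // Fields without numeric values get null for both bounds
      minval: numbers.length ? Math.min(...numbers) : null,
      maxval: numbers.length ? Math.max(...numbers) : null
    };
  }
  return stats;
}

const sample = {
  fields: ['Month', 'Army'],
  data: [
    { Month: 'Apr', Army: 8571 },
    { Month: 'May', Army: 23333 }
  ]
};
console.log(fieldMinMax(sample).fields.Army); // { minval: 8571, maxval: 23333 }
```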
<a name="calculateStats"></a>
Calculate statistics and return a new dataset object with `.stats` set.
**Returns**: <code>Object</code> - dataset
| Param | Type | Description |
| --- | --- | --- |
| functions | <code>Object</code> | Named function registry |
| calculateStatsParams | <code>Object</code> | |
| dataset | <code>Object</code> | |
<a name="castTypes"></a>
Having guessed types with calculateStats, cast all fields to the guessed types.
- Numeric strings to numbers ('1.1' becomes 1.1)
- Enums of strings to their integer indices
- Date strings to Date objects
- String fields with high cardinality remain strings
**Returns**: <code>Object</code> - Dataset object with values cast to guessed types
| Param | Type | Description |
| --- | --- | --- |
| dataset | <code>Object</code> | Dataset object |
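The casting rules listed above can be sketched per value. `castValue` is a hypothetical helper: the `number`, `date` and `string` type names appear in the example stats, while the `enum` type name and its `values` list are assumptions made for this sketch.

```javascript
// castValue is a hypothetical helper. The 'number', 'date' and 'string' type
// names appear in the example stats; 'enum' and its `values` list are assumed.
function castValue(type, raw) {
  switch (type.type) {
    case 'number':
      return parseFloat(raw);
    case 'date':
      // Only handles ISO-style strings such as '1854-04-01'
      return new Date(raw);
    case 'enum':
      return type.values.indexOf(raw);
    default:
      return raw; // high-cardinality strings stay strings
  }
}

console.log(castValue({ type: 'number' }, '1.1')); // 1.1
console.log(castValue({ type: 'enum', values: ['Apr', 'May'] }, 'May')); // 1
```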
<a name="mapFields"></a>
Map input fields to output fields using mapping functions, as specified in `mapFieldsParams`. Each entry has this form:
```js
{
input: 'inFieldName',
output: 'outFieldName',
fn: 'linear', // named function in functions registry
args: [0, 1] // parameters for linear mapping function
}
```
`fn` may be a String key to a function in the functions registry,
or a function `(stats, fieldName, [...args], value)`.
| Param | Type | Description |
| --- | --- | --- |
| functions | <code>Object</code> | Named function registry |
| mapFieldsParams | <code>Array.<Object></code> | |
| dataset | <code>Object</code> | |
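A mapping function with the documented `(stats, fieldName, ...args, value)` shape might look like the sketch below. This `linear` is a hypothetical implementation written for illustration, with `args` assumed to be `[outMin, outMax]`; the function that ships with the library, if any, may behave differently.

```javascript
// Hypothetical `linear` mapping function following the documented
// (stats, fieldName, ...args, value) shape, with args = [outMin, outMax].
function linear(stats, fieldName, outMin, outMax, value) {
  const { minval, maxval } = stats.fields[fieldName];
  if (maxval === minval) return outMin; // avoid dividing by zero
  const t = (value - minval) / (maxval - minval);
  return outMin + t * (outMax - outMin);
}

const stats = { fields: { Year: { minval: 1854, maxval: 1856 } } };
console.log(linear(stats, 'Year', 0, 1, 1855)); // 0.5
```

Reading the input range from `stats` means the same entry rescales correctly for any dataset, without hard-coding minimum and maximum values in the specification.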
<a name="makeMapFunction"></a>
Make a mapping function from a `mapParam`.

mapParam:
- `.fn`
- `.args`

`fn` is a Function, or a String key used to look up a Function in `functions`.
The function should accept: `(stats, fieldName, ...args, value)`

`args` is an optional array of parameters that configure your mapping function, e.g. `[minval, maxval]`.

This curries the function, calls it with `(stats, fieldName, ...args)`, and returns the resulting mapping function, which accepts just `value` and returns the mapped value.
**Returns**: <code>function</code> - any => any
| Param | Type | Description |
| --- | --- | --- |
| functions | <code>Object</code> | Named function registry |
| stats | <code>Object</code> | |
| mapParam | <code>Object</code> | |
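The lookup and partial application described above can be sketched as follows. `makeMapFunctionSketch` and the `offset` registry entry are hypothetical, and taking the field name from `mapParam.input` is an assumption; the source does not say where `fieldName` comes from.

```javascript
// Sketch only: makeMapFunctionSketch is hypothetical, and taking the field
// name from mapParam.input is an assumption.
function makeMapFunctionSketch(functions, stats, mapParam) {
  const fn = typeof mapParam.fn === 'function'
    ? mapParam.fn
    : functions[mapParam.fn];
  if (typeof fn !== 'function') {
    throw new Error(`Unknown mapping function: ${mapParam.fn}`);
  }
  const args = mapParam.args || [];
  // Bake in everything except the value
  return value => fn(stats, mapParam.input, ...args, value);
}

const functions = {
  // Example registry entry: shift a value by a fixed offset
  offset: (stats, fieldName, amount, value) => value + amount
};
const addTen = makeMapFunctionSketch(functions, {}, { input: 'Army', fn: 'offset', args: [10] });
console.log(addTen(5)); // 15
```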
<a name="getRow"></a>
Get a single row as an Object.
As this function is curried you can bake in dataset and fields:
```js
const getter = getRow(dataset, null); // returns a function with the first two args satisfied
getter(12); // get row 12
```
**Returns**: <code>Object</code> - The object for this row.
| Param | Type | Description |
| --- | --- | --- |
| dataset | <code>Object</code> | |
| fields | <code>Array.<string></code> \| <code>null</code> | Optionally select just the fields you need. null selects all fields. |
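Field selection can be pictured with a small stand-in. `getRowSketch` is hypothetical; it assumes `dataset.data` is an array of row objects (as in the example output earlier) and is written as `(dataset, fields) => index => row` rather than with a general curry helper.

```javascript
// Hypothetical stand-in for getRow; assumes dataset.data is an array of
// row objects. Written as (dataset, fields) => index => row for brevity.
const getRowSketch = (dataset, fields) => index => {
  const row = dataset.data[index];
  if (fields === null) return { ...row }; // null selects all fields
  const selected = {};
  for (const field of fields) selected[field] = row[field];
  return selected;
};

const dataset = {
  data: [{ Month: 'Apr', Army: 8571, Wounds: 0 }],
  fields: ['Month', 'Army', 'Wounds']
};
console.log(getRowSketch(dataset, ['Month', 'Army'])(0)); // { Month: 'Apr', Army: 8571 }
```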
<a name="getCell"></a>
Get a single data value (row, column)
As this function is curried you can bake in dataset and field:
```js
const getter = getCell(dataset, 'sepalLength');
getter(12); // get value at row 12, field 'sepalLength'
```
**Returns**: <code>mixed</code> - The value for this cell.
| Param | Type | Description |
| --- | --- | --- |
| dataset | <code>Object</code> | |
| field | <code>String</code> | key of the field to select |
| index | <code>Number</code> | integer index of row |
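For readers unfamiliar with currying, the sketch below shows how a three-argument accessor can accept its arguments in stages. Both `curry` and `getCellSketch` are hypothetical and written for this example; the library may rely on a different curry implementation, and the row-object layout of `dataset.data` is assumed from the example output.

```javascript
// Minimal curry helper and a getCell stand-in; both are hypothetical.
// Assumes dataset.data is an array of row objects, as in the example output.
const curry = fn => {
  const collect = (...args) =>
    args.length >= fn.length
      ? fn(...args)
      : (...more) => collect(...args, ...more);
  return collect;
};

const getCellSketch = curry((dataset, field, index) => dataset.data[index][field]);

const dataset = { data: [{ Army: 8571 }, { Army: 23333 }], fields: ['Army'] };
const armyAt = getCellSketch(dataset, 'Army'); // dataset and field baked in
console.log(armyAt(1)); // 23333
```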
<a name="getColumn"></a>
Get all values for a column
As this function is curried you can bake in dataset:
```js
const getter = getColumn(dataset);
getter('sepalLength'); // get the array of values for the sepalLength field
```
**Returns**: <code>Array.<mixed></code> - Array of values for this field
| Param | Type | Description |
| --- | --- | --- |
| dataset | <code>Object</code> | |
| field | <code>String</code> | key of the field to select |
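Extracting a column amounts to collecting one field from every row. `getColumnSketch` below is a hypothetical stand-in that assumes row-object storage; unlike a fully curried function, it only supports the staged call form.

```javascript
// Hypothetical stand-in for getColumn; assumes row-object storage.
const getColumnSketch = dataset => field => dataset.data.map(row => row[field]);

const dataset = { data: [{ Army: 8571 }, { Army: 23333 }], fields: ['Army'] };
const columnOf = getColumnSketch(dataset);
console.log(columnOf('Army')); // [ 8571, 23333 ]
```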