dataship-frame
# frame
a DataFrame for JavaScript.
_crunch numbers in Node or the Browser_
## features
* Interactive performance (<100ms) on millions of rows
* Syntax similar to SQL and Pandas
* Compatible with `PapaParse` and [`BabyParse`](https://github.com/Rich-Harris/BabyParse)
## examples
Parse the [Iris](https://vincentarelbundock.github.io/Rdatasets/datasets.html)
dataset (with [`BabyParse`](https://github.com/Rich-Harris/BabyParse)) and create a `Frame` from the result.
```javascript
var baby = require('babyparse'),
    Frame = require('frame');
// parse the csv file (first row is the header, numeric strings become numbers)
var config = {"header": true, "dynamicTyping": true, "skipEmptyLines": true};
var iris = baby.parseFiles('iris.csv', config).data;
// create a frame from the parsed results
var frame = new Frame(iris);
```
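With `header: true`, the parser returns an array of row objects keyed by column name. The sketch below shows that shape and one way to pivot it into per-column arrays. The sample values and the `toColumns` helper are illustrative assumptions, not part of `frame`, and nothing here claims to match how `Frame` stores data internally.

```javascript
// Illustrative rows in the shape PapaParse/BabyParse return with header: true.
// These values are sample data for the sketch, not read from iris.csv.
var rows = [
  { "Sepal.Length": 5.1, "Sepal.Width": 3.5, "Species": "setosa" },
  { "Sepal.Length": 7.0, "Sepal.Width": 3.2, "Species": "versicolor" },
  { "Sepal.Length": 6.3, "Sepal.Width": 3.3, "Species": "virginica" }
];

// toColumns is a hypothetical helper: pivot row objects into one array per column.
// It assumes every row has the same keys.
function toColumns(rows) {
  var columns = {};
  rows.forEach(function (row) {
    Object.keys(row).forEach(function (name) {
      if (!columns[name]) columns[name] = [];
      columns[name].push(row[name]);
    });
  });
  return columns;
}

var columns = toColumns(rows);
console.log(columns["Species"]);
```

A column-per-array layout like this is a common choice for numeric work because each aggregate only has to scan the one array it needs.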
### groupby
Group on `Species` and find the average value (`mean`) for `Sepal.Length`.
```javascript
var g = frame.groupby("Species");
g.mean("Sepal.Length");
```
```json
{ "virginica": 6.58799, "versicolor": 5.9360, "setosa": 5.006 }
```
Using the same grouping, find the average value for `Sepal.Width`.
```javascript
g.mean("Sepal.Width");
```
```json
{ "virginica": 2.97399, "versicolor": 2.770, "setosa": 3.4279 }
```
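For intuition, a grouped mean can be sketched in plain JavaScript over row objects. `groupMean` and the sample rows are hypothetical and only illustrate the semantics of the calls above; they say nothing about how `frame` computes the result.

```javascript
// groupMean is a hypothetical helper, not part of frame's API.
// It accumulates a running sum and count per group, then divides.
function groupMean(rows, key, column) {
  var sums = {}, counts = {};
  rows.forEach(function (row) {
    var group = row[key];
    sums[group] = (sums[group] || 0) + row[column];
    counts[group] = (counts[group] || 0) + 1;
  });
  var means = {};
  Object.keys(sums).forEach(function (group) {
    means[group] = sums[group] / counts[group];
  });
  return means;
}

// made-up sample rows, small enough to check by hand
var sample = [
  { "Species": "setosa", "Sepal.Length": 5.0 },
  { "Species": "setosa", "Sepal.Length": 5.5 },
  { "Species": "virginica", "Sepal.Length": 6.5 }
];
console.log(groupMean(sample, "Species", "Sepal.Length"));
// { setosa: 5.25, virginica: 6.5 }
```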
### where
Keep only the rows where `Species` is `virginica`, then find the average `Sepal.Length`.
```javascript
var f = frame.where("Species", "virginica");
f.mean("Sepal.Length");
```
```json
6.58799
```
Get the number of rows that match the filter.
```javascript
f.count();
```
```json
50
```
Columns can also be accessed directly (with the filter applied).
```javascript
f["Species"]
```
```javascript
["virginica", "virginica", "virginica", ..., "virginica"]
```
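The same filter, mean, count, and column access can be sketched over plain row objects. The helpers `whereEquals`, `mean`, and `column` and the sample rows are hypothetical and exist only to show what the calls above mean; they are not `frame`'s API or implementation.

```javascript
// whereEquals, mean and column are hypothetical helpers, not frame's API.
function whereEquals(rows, column, value) {
  return rows.filter(function (row) { return row[column] === value; });
}
// mean of one numeric column (NaN when rows is empty)
function mean(rows, column) {
  var total = rows.reduce(function (sum, row) { return sum + row[column]; }, 0);
  return total / rows.length;
}
// extract one column as an array
function column(rows, name) {
  return rows.map(function (row) { return row[name]; });
}

// made-up sample rows, small enough to check by hand
var flowers = [
  { "Species": "virginica", "Sepal.Length": 6.0 },
  { "Species": "setosa", "Sepal.Length": 5.0 },
  { "Species": "virginica", "Sepal.Length": 7.0 }
];
var matched = whereEquals(flowers, "Species", "virginica");
console.log(mean(matched, "Sepal.Length")); // 6.5
console.log(matched.length);                // 2
console.log(column(matched, "Species"));    // [ 'virginica', 'virginica' ]
```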
# tests
Hundreds of tests verify correctness on millions of data points (against a Pandas reference).
`npm run data && npm run test`
# benchmarks
`npm run bench`
Typical performance on one million rows:
operation | time
----------|------
`groupby` | 54ms
`where` | 29ms
`sum` | 5ms
# design goals and inspiration
* compatibility with [feather](https://github.com/wesm/feather)
## interface
* pandas
* R
* Linq
* rethinkDB
* Matlab
## performance
* [datavore](https://github.com/StanfordHCI/datavore)