with-csv
Version:
A CSV file manipulation library with a fluent API Ć la Lodash
176 lines (115 loc) ⢠7.53 kB
Markdown
# withCSV
> _It's like lodash for CSV files !_
`withCSV` is a Node.js library to consume and produce CSV files with clean and readable code, without sacrificing performance. It features :
š A fluent API similar to lodash chainable methods, treating your CSV like the array of objects it really is
šļø Based on a [robust parsing library using Node streams](https://www.npmjs.com/package/csv-parser). It is stupid fast and memory-efficient by default, for you to go crazy on large files šŖ
š Equipped with a streaming [CSV stringifying library](https://csv.js.org/stringify/api/) fit for writing large volumes of data to disk
ā Barely 300 lines of Typescript
ā³ Support for asynchronous callbacks
## Installing
`withCSV` can be installed using your package manager of choice :
```bash
npm install with-csv
# or
yarn add with-csv
```
## Usage
### Example
Given the following CSV file :
```csv
id,name,phone,flag,category
1,Joe,0612345678,true,6
2,Jack,0698765421,false,12
3,Mark,0645631256,true,54
4,Valerie,0645631256,true,12
```
```typescript
import { withCSV } from 'with-csv'
const result = await withCSV('my.csv')
.columns(['name', 'phone', 'flag'])
// row (below) is automatically typed as {name: string, phone: string, flag: string}
.filter(row => row.flag === 'true')
.map(row => `${row.name}: ${row.phone}`)
// value (below) has been typed as the output of .map , which is a string
.filter(value => value.startsWith('J'))
// At this point the CSV file hasn't yet been read
// It will be read by the terminator method `rows` (below)
.rows()
console.log(result)
// [
// "Joe: 0612345678",
// "Jack: 0698765421"
// ]
// You can also use withCSV to produce CSV files after treatment
await withCSV('my.csv')
.columns(['name', 'phone', 'flag'])
// row (below) is automatically typed as {name: string, phone: string, flag: string}
.filter(row => row.flag === 'true')
.toCSV('your.csv')
```
### Initialization
**withCSV**(csvFile, options): Returns an instance of withCSV configured with the provided CSV file and options. At this stage _the CSV file is not opened yet_.
- **csvSource**: The path to the CSV file
- **options** (optional): A [csv-parse options object](https://github.com/mafintosh/csv-parser#options)
The `withCSV` instance exposes the methods **columns** which takes as input an array of column names. This allows `withCSV` to infer the type of the rows.
### Querying API
Once you have selected your columns, you can start manipulating your CSV data using the Querying API. It consists of two categories of methods :
āļø chainable methods which are stacked in a pipeline through which every row will be processed one by one
š§ terminator methods which will trigger the reading of the file, and the processing of each row through the pipeline
**Only one terminator method can be present.** It will return a promise which resolves to the output of your pipeline.
Unless otherwise specified, the methods signature are always the same as their [corresponding method in the javascript Array prototype](https://developer.mozilla.org/fr/docs/Web/JavaScript/Reference/Global_Objects/Array).
The only major differences are :
- All methods in the querying API accept **asynchronous callbacks**
- Array methods such as `filter`, `map` etc... will not receive the whole array as their last argument. This is by design as the CSV file is never held fully in memory.
### āļø Chainable methods
**map(callback)**: maps each record to a new shape. The output will be typed accordingly.
**pick(keys)**: picks a subset of properties from the records. `keys` is described in the [lodash.pick](https://lodash.com/docs/#pick) documentation.
**filter(callback)**: filters out records.
**forEach(callback)**: this is executed on each record, but doesn't alter the data of the rows.
**uniq(iterator)**: deduplicates records from your CSV file. It can accept as argument :
* A column name to deduplicate on that column
* An array of column anmes to deduplicate on the combination of those columns
* A callback returning a string to deduplicate on the value of that string
### š§ Terminator methods
_The following methods only ever consume one row at a time so they are safe to use on very large files_
**process**: Executes the pipeline on all the rows, but without outputting any data. This is useful for example when your pipeline is based on `forEach` and you want to discard the final output data.
**find(callback)**: returns the first matching record
**findIndex(callback)**: returns the index of the first matching record
**every(callback)**: returns `true` if all records match
**some(callback)**: returns `true` if at least one record matches
**includes(value)**: returns `true` if the final output contains the value passed. This uses [`lodash.isEqual`](https://lodash.com/docs/4.17.15#isEqual) so the value can be a primitive, object, array, etc...
**count()**: returns the number of rows at the end of the pipeline
**first(limit)**: returns the first elements of the result up to a maximum of `limit`
**toCSV(csvTarget, options)**: writes the result to a file or a stream
* if `csvTarget` is a string, a file at this path will be created
* if `csvTarget` is a WriteStream, the data will be piped directly to it
* `options` are documented in the [`csv-stringify` documentation page](https://csv.js.org/stringify/options/)
_The following methods consume the entirety of your CSV file and the resulting output will be stored in memory. Very large files should be adequately filtered beforehand or you may max out your machine's memory._
**last(limit)**: returns the last elements of the result up to a maximum of `limit`
**skip(offset)**: returns the whole result but omits the `offset` first items
**key(property, filterUndefined)**: returns an array of the values of that `property` for each row. If `filterUnderfined` is true, then only defined values will be returned.
**toJSON**(replacer, spaces): returns the final result of the query pipeline, as a JSON string. `replacer` and `spaces` are documented in the [JSON.stringify signature](https://developer.mozilla.org/fr/docs/Web/JavaScript/Reference/Global_Objects/JSON/stringify).
**rows**(): returns all the rows of the result, as an array of objects
## Contributing & Testing
Feel free to write us about any [bug you may find](https://github.com/amblerhq/withCSV/issues), or [submit a PR](https://github.com/amblerhq/withCSV/pulls) if you have fresh ideas for the library !
This project does its testing with a [ghetto Jest clone](tests/index.ts) written in <60 lines, and which includes basic benchmarking. You can launch the test suite & benchmarks with `npm test` or `yarn test` (it should take around 1 minute).
Here are the benchmark results on a decently powerful machine. Small sample contains 100 rows, medium contains 100 000 and large contains 2 000 000.
```
Benchmark Import and export : small: 12.515ms
Benchmark Import and export : medium: 390.564ms
Benchmark Import and export : large: 8.820s
1 => Import and export
Benchmark 1 map : small: 3.122s
Benchmark 1 map : medium: 576.674ms
Benchmark 1 map : large: 9.265s
2 => 1 map
Benchmark 4 chained map : small: 2.094ms
Benchmark 4 chained map : medium: 509.942ms
Benchmark 4 chained map : large: 11.925s
3 => 4 chained map
Benchmark uniq : small: 2.786ms
Benchmark uniq : medium: 667.513ms
Benchmark uniq : large: 15.352s
4 => uniq
```
Made with š @ [Ambler HQ](https://github.com/amblerhq)