UNPKG

xlsx-datafill

Version:

Scalable, template based data population for Excel XLSX spreadsheets.

295 lines (206 loc) 18.6 kB
# Xlsx smart templates DataFill The library takes an existing `.xlsx` file and _populates_ the extracted data from provided `JSON`, into the designated cell**s**, which we call _templates_. If the extracted data is an array - it is expanded and occupies as many cells, as the size of the data. In both dimensions. Each template follows a specific format and structure, and it defines these crucial aspects: * Where is the data coming from? * How is the data extracted? * How are the values for the cells extracted from the data? * How are the cell styles tweaked? All these in the context of the provided JavaScript object. If you've alrady know the story - check the [API](./API.md). ## Few examples that explain it all Consider the following JSON: ```json { "title": "An exemplary title", "rows": [ { "header": "Row 1", "data": [11, 12, 13, 14, 15] }, { "header": "Row 2", "data": [21, 22, 23, 24, 25] }, { "header": "Row 3", "data": [31, 32, 33, 34, 35] } ] } ``` A cell with the following content: `{{ | | title }}` will be expanded into `An exemplary title` after the data is filled. More information on the [template format](#template-format) will be given in a second. Consider this cell content: `{{ | rows | header }}`. It’ll be expanded into three cells, in a column. Like this: | | A | |--------|--------| | 1 | Row 1 | | 2 | Row 2 | | 3 | Row 3 | The template defines that the _data_ should be extracted from `rows` - which results in an array of 3 objects, and the _values_ for the target cells - 3, of course - should be extracted using the `header` _path_ inside each of the entries in the extracted _data_. So far, the data was extracted from the root of the initially provided JavaScript object. However, each template can refer another one, taking the data extracted from it, as a basis for its own processing. <a id="nested-blocks">Nested blocks</a>. Consider this spreadsheet: | | A | B | C | D | E | F | | ---- | ----------------------- | ----------------------- | ---- | ---- | ---- | ---- | | 1 | {{ \| rows \| header }} | {{ A1 \| 1 * data \| }} | | | | | | 2 | | | | | | | | 3 | | | | | | | The template in `A1` is clear - it expands into the range `A1:A3` as expected. The template in `B1`, however, introduces two new, interesting aspects. First, it _refers_ another template - the one in `A1`, and second - it gives strange notion of how the data is extracted - `1 * data`, instead of just `data`. The second one is simple - since the general data extraction form is `<rows> * <columns>` and, for example `rows` (in `A1`) is a shorthand for `rows * 1`, so the `1 * data` instructs the engine to expand the retrieved data horizontally (i.e. in many columns, but a single row), rather than vertically. The first one - the reference - means that the `B1` will _not_ extract the data directly from the provided JS object, but rather - _from the data already extracted from the referred template_. And this happens for _each data entry_ extracted from the referred template. Replaying in "slow-mo", the whole data extract & placement process will look like: 1. The engine processes `A1` template: 1. Extracts the data, resulting in an array of 3 _objects_ (exactly the one referred by `rows` property). 2. From each of these object, a value is extracted using the `header` property, resulting in the following array: `[“Row 1”, “Row 2”, “Row 3”]`. 3. The values are placed from the template’s cell (`A1`) downwards. 2. **For each** of the objects in the array, extracted in [1.a], all dependent templates - in this case `B1` are processed. We’ll show the processing for only the `{ “header”: “Row 1”, “data”: [...]}` object: 1. Data is extracted using the `data` property _from the provided reference object_, resulting in an array of 5 numbers. 2. Since there is no additional _value extractor_ - the data is used as is, for filling the cells - in this case in the range `B1:F1`. 3. The same process is repeated for _all three_ of the objects extracted in step [1.b]. The resulting table will look like: | | A | B | C | D | E | F | | ---- | ----- | ---- | ---- | ---- | ---- | ---- | | 1 | Row 1 | 11 | 12 | 13 | 14 | 15 | | 2 | Row 2 | 21 | 22 | 23 | 24 | 25 | | 3 | Row 3 | 31 | 32 | 33 | 34 | 35 | If you want to get a real feel about the power of the engine — check the [multi-dimensional example](./examples/multid-output.xlsx), showing how an automatically generated [5D data](./examples/gen-data-5d.json) can get expanded throughout the sheet, [by a simple template](./examples/multid-template.xlsx). **Great! That’s it!** _One more thing..._ As the general syntax of the data extraction suggests - there is another, more elegant way to achieve the same result. The template in `B1` could have be written in the following form: `{{ | rows * data | }}`. Quite natural to write, and should be clear, by now, why it leads to the same result. There is one more heavy lifting task that the engine does - it automatically _merges cells_, if the referring template turns to occupy more than one cell in the same dimension. In other words, if the template in `B1` was written as `{{ A1 | data | }}`, (i.e. without the `1*` part), this would instruct the engine to grow the data vertically. But, the data from `A1` template, already grows vertically, so the engine will have to make the `A1` cells “bigger”, i.e. occupying more rows. The result will look like this: | | A | B | | ---- | ----- | ---- | | 1 | Row 1 | 11 | | 2 | | 12 | | 3 | | 13 | | 4 | | 14 | | 5 | | 15 | | 6 | Row 2 | 21 | | 7 | | 22 | ... at least, these are the first 7 rows of it. Cells `A1:A5` will be merged, just like `A6:A10`, and `A11:A15`. Hope it is clear by now. Check this and the [other examples](./examples/). ## How to use it The actual access to a XLSX notebook is delegated to an external library, through a so-called _accessor_, and there is currently one implementation, based on [`xlsx-populate`](https://github.com/dtjohnson/xlsx-populate) library. Check the [API](./API.md) to see how a custom one can be implemented. Considering the existing accessor implementation, the use of `xlsx-datafill` is quite simple: ```javascript // Open the notebook and create the accessor for it const wb = await XlsxPopulate.fromFileAsync(path); const xlsxAccess = new XlsxPopulateAccess(wb, XlsxPopulate); // Create an instance of XlsxDataFill and provide custom // options. const dataFill = new XlsxDataFill(xlsxAccess, { callbacksMap: ... // Some custom handlers. }); // Make the actual processing of `data`. dataFill.fillData(data); // The data in populated inside the `wb`, so it can be used. wb.workbook().toFileAsync(...); ``` > **Note**: The template definitions are overwritten by the actual data, so don’t expect to be able to run `fillData()` with different data. Check the [template options section](#template-options) for more information on how to configure the `XlsxDataFill` instance. Refer to the [examples](./examples/) folder, as well as to the [API documentation](./API.md) for more and deeper documentation. ## <a name="template-format">Template format</a> The general format of each template is like follows: ``` {{ <reference cell> | <iterators> | <extractor> | <padding> | <styling> }} ``` Both the surrounding `{{`mustache`}}` brackets, and the `|` separator symbol are configurable, via [XlsxDataFill](./API.md#XlsxDataFill) constructor’s options. The meaning of each field is: | Field | Meaning | | ----------- | ------------------------------------------------------------ | | `reference` | The address of a cell, to be used as a reference for data extraction. If empty - the provided object’s root is considered. In both cases this is referred as _template data root_ in the rest of the description. | | `iterators` | JSON paths, determining how the data should be extracted from the _template data root_, and it follows the form <br />`<row data path> * <column data path>`. <br />The `<row path>` is applied on the _template data root_, while the `<col path` works on the result of `<row path>` extraction. If one needs the data to grow vertically (i.e. only as a column), the form `1 * <col path>` is allowed, in which case `<col path>` works directly on the _template data root_.<br />Can be empty, if the _template data root_ itself should be used. | | `extractor` | A JSON path, determining how the value that needs to be written in the cell(s) should be extracted from the data, provided by the iterators.<br />Can be omitted, in which case the iterators’ provided data is taken as a whole. | | `padding` | A `:` delimited pair specifying how many cells on each direction `row:column` need to be _added_ for each new entry from the extracted data. Can be omitted. | | `styling` | A comma-delimited styling pairs of the format `<style name>=<extractor>`, setting each cell’s style (with the given name), to the value extracted from iterator’s data, using the `extractor` as a JSON path. E.g. `fill='red'` will set the `fill` style of each of the cells to `red`. <br />It is allowed to pass a handler as value (e.g. `fill=:dangerColor` will set fill color to the value returned by `dangerColor` handler). Also, any of the [options](#template-options) can be overridden, by providing it here with a `!` as prefix - e.g. `!mergeCells=false` will set this options to _false_ only for this template. | The _JSON path_, mentioned above, refers to the ability to provide a full **path** of properties from the _template data root_ instead of just one property. So valid paths are, for example: `rows`, `genres.fiction`, `data[0].name`, etc. Check the [lodash’s `get` helper](https://lodash.com/docs/4.17.15#get), because this is what is used. In order to add additional flexibility, one can reference a user-provided function (via [template options](#template-options)) for both _iterators_ and _extractors_, including those used for styling. If a JSON-path component is suffixed with `:<handler name>` (e.g. `data:dataFix`, or even just `:dataGive`), the result of invoking the corresponding handler is used. The expected definition of such handler is: ```javascript /** * @param {object} data The data base for the current context. * @param {Cell} cell A target cell, if applicable. * @returns {*} The required value. */ function myHandler(data, cell); ``` Few things need to be clarified. First, the context (i.e. `this`) provided is the _options_ object, as provided upon [XlsxDataFill instantiation](./API.md#new_XlsxDataFill_new). The `data` object is the one that corresponds to the given context. For example, in the following template: ``` {{ A1 | rows:hRows * data:hData | :hNumber }} ``` All three handlers `hRows`, `hData` and `hNumber` will be invoked with different `data` argument - `hRows` will receive whatever `rows` extracted, `hData` will receive, whatever `hRows` returned (!), and `hNumber` will be given whatever `hData` returned. In other words: > Handlers are applied **after**, and their result taken **instead of**, whatever data is extracted with the JSON path. Another interesting thing is styling. It is quite straightforward, however. Each named style, as recognized by [accessor’s `setStyle()` method](./API.md#XlsxPopulateAccess+setStyle), is referred and the value is extracted in the usual way (JSON path + handler). ## <a name="template-options">Template options</a> Here are the options and their defaults. ```javascript { templateRegExp: new RegExp(/\{\{([^}]*)\}\}/), fieldSplitter: "|", joinText: ",", mergeCells: true, duplicateCells: false, followFormulae: false, copyStyle: true, callbacksMap: { '': data => _.keys(data), $: data => _.values(data) } }; ``` Check the [detailed description in the API](API.md#new-xlsxdatafillaccessor-opts). It is worth noting the `mergeCells` and `duplicateCells` behavior. First, they both have the same set of possible values: `true`, `false`, `”both”`, `”vertical”`, `”horizontal”` – in which direction the cells need to be merged/duplicated. As expected `true` and `”both”` have the same meaning. Second, cells duplication is valid only when merging is disabled, in other words, if `mergeCells == true`, `duplicateCells` is ignored. Given these options: ```js mergeCells: "vertical", duplicateCells: true ``` Is interpreted as follow: * If during data expansion a value in higher dimension occupies more than one cell in a **column**, they are merged – because these are _vertical_. * If a value from higher dimension occupies more than one cell in a **row**, then duplication options is taken into account and same value is duplicated on all these cells. ## <a id="formulae-handling">Formulae handling</a> As formulas are a key Excel feature, so `xlsx-datafill` is trying to keep them alive and meaningful. As a basic rule _Raw formulas are kept as they are, only those put in a template format, are handled_. So, the [template format](#template-format) has slightly different version: * `iterators` determine how the formula will be populated, and is one of the following values: `both`, `rows`, `cols` or `none`. The latter can be replaced with an empty value. * `extractor` is the actual formula and **must** start with `=` so the engine recognizes it as such. * `reference` should be present, otherwise an error will be issued. If a non-referenced formula is needed - just don’t use the template format. Two operations are performed during data population of the referenced template - _formula alteration_ and _formula population_. How this happens depends on the value inside `iterators` field: * When `none` is selected, the formula is not populated - The ranges inside it are expanded to match the size of the referenced data block. * When `cols` is selected, the formula is populated across the columns of referenced data block, while each range inside the formula is expanded across the rows of the data block. * When `rows` is selected, the population and expanding processes are reversed - the formula is populated across rows, and the ranges inside are expanded across columns. * Finally, when `both` is selected - the formula is not expanded - it is just populated across the same area of cells, as the referenced data block. It is important to note, that the formula population starts from the cell with the _formula_ template, or one with the same offset, as the current data block - review the [nested blocks](#nested-blocks) concept for more information. ### Few examples Consider the following template (and the same data as the previous examples): | | A | B | C | D | E | F | | ---- | :---------------------- | ------------------------- | ---- | ---- | ---- | ---- | | 1 | {{ \| rows \| header }} | {{ \| rows * data \| }} | | | | | | 2 | | | | | | | | 3 | | | | | | | | 4 | Formula: | {{ B1 \|\| =SUM(B1:B1) }} | | | | | Will result in the following output table: | | A | B | C | D | E | F | | ---- | -------- | ---- | ---- | ---- | ---- | ---- | | 1 | Row 1 | 11 | 12 | 13 | 14 | 15 | | 2 | Row 2 | 21 | 22 | 23 | 24 | 25 | | 3 | Row 3 | 31 | 32 | 33 | 34 | 35 | | 4 | Formula: | 345 | | | | | With the formula inside `B4` being altered to `=SUM(B1:F3)`. The skipped `iterators` field is synonym of `none`, therefore - no population, just expansion. If the _formula template_ was this one: `{{ B1 | cols | =SUM(B1:B1) }}`, then the resulting table would be this: | | A | B | C | D | E | F | | ---- | -------- | ---- | ---- | ---- | ---- | ---- | | 1 | Row 1 | 11 | 12 | 13 | 14 | 15 | | 2 | Row 2 | 21 | 22 | 23 | 24 | 25 | | 3 | Row 3 | 31 | 32 | 33 | 34 | 35 | | 4 | Formula: | 63 | 66 | 69 | 72 | 75 | The formula was populated across the columns, with each one being expanded across rows, i.e. the formula in `C4` will be `=SUM(C1:C3)`. Specifying `both` as _iterators_ keyword will result in the same size (3x5 in this example) table. It _does_ make sense if it includes anchored references, i.e a template like this `{{ B1 | both | =B1 * $A$5 }}` will result in a 3x5 table, starting from `B4`, with each value from `B1:F3` being multiplied by the value in `A5`. For example the formula in `C4` would be `C1 * $A$5`. ## Some important notes There are several specifics to be kept in mind: * The _iterators_ usually resolve to an array. If it resolves to an object, and still some iteration is expected — it can be converted to an array of object’s values, using the `$` handler (i.e. appending `:$`). If you need the keys (as opposed to object’s values) — append `:` to the last iterator, i.e. utilizing an empty _extraction handler_. Both of these default handlers can be overridden with the provided options. * If during _value_ extraction, the result is an array — it is automatically joined, using the (configurable) `joinText` from the options. * No matter what part of the cell’s value the template definition occupies, at the end — the whole cell is overwritten with the resolved value(s). Since, this is not a simple find-and-replace, and the cells need duplication — it is not as trivial as expected. A possible workaround is to use handlers to append whatever is needed for each cell. * The `xlsx-populate` library is _not_ a dependency, because (potentially in the future) other accessors can be used, so don’t expect it to be there if you just refer `xlsx-datafill`. ## Contribution Any help is appreciated! Check [**the list in the repo**](https://github.com/ideaconsult/xlsx-datafill/labels/contribution). **Thank you!** ## [Changelog](https://github.com/ideaconsult/xlsx-datafill/releases)