# Advanced DTL usage

DTL is quite easy to begin using, and in simple usage it is very easy to understand. This simplicity is deceptive, as DTL is extremely powerful. Once you are familiar with some of its more advanced features, you can do a great deal more with it.

If you have not already, be sure to read the [README](../README.md), [DTL Expression Guide](./DTL-Expressions.md) and [Helper guide](./DTL-Helpers.md).

## A word about undefined and null

Often in data you will encounter undefined or `null` values. DTL itself has no separate concept of `null`, only `undefined`, and therefore considers `null` and `undefined` in input data to be equivalent.

It should be noted that while DTL understands undefined values and handles them properly, it can appear as though it doesn't, because when outputting JSON, undefined properties can disappear. This is because in JavaScript the default JSON stringifier omits any object property whose value is `undefined` (properties set to `null` are kept and written as `null`). You can override this by using a replacer function with the `JSON.stringify()` call. [This page](https://muffinman.io/blog/json-stringify-removes-undefined/) describes how to do this.

## The concept of empty

Often in programming, the need arises to determine whether a particular variable holds a value or not. In JavaScript, you encounter the types `undefined` and `null` to represent the absence of value. These types, while useful, come with quirks that can lead to unexpected behaviors or bugs if not handled carefully.

For instance, you may have stumbled upon scenarios where the distinction between `undefined`, `null`, and other falsy values like an empty string `''` leads to different outcomes in your code, sometimes making the code more verbose with additional checks.

Here's a quick refresher on how JavaScript treats `undefined` and `null`:

```javascript
let val = undefined;
console.log(typeof val); // 'undefined'
val = null;
console.log(typeof val); // 'object' - a bit unexpected!
Object.keys(val); // throws an exception :(
```

Both `undefined` and `null` signify the absence of a value in JavaScript, but as demonstrated, they behave differently. Moreover, when dealing with JSON, only `null` is recognized, leaving `undefined` out in the cold, which can complicate data handling.

To streamline the handling of such scenarios, DTL introduces the concept of `empty`. In DTL, an item is considered `empty` if it holds no meaningful value. This includes variables that are `undefined`, empty strings `''`, arrays with no elements, and objects with no properties.

This unified approach towards handling the absence of value simplifies data checks, making your transforms more straightforward and less error-prone compared to handling `undefined`, `null`, and other falsy values separately as in JavaScript.

In the following sections, we'll explore how DTL's `empty` concept is leveraged through various helpers and how it contrasts with JavaScript's approach, aiming to provide a clearer, more efficient way of handling the absence of value in your data transformations.

It is possible in DTL to determine the type of a value by using the `typeof()` helper, but in almost every situation, using the concept of `empty` is preferable. You can determine if a value is empty using the `empty()` helper. Once again, an example is helpful:

```
empty(undefined) // true
empty('') // true
empty(' ') // false - contains a space
empty([]) // true
empty({}) // true
```

Why is this useful? Because in almost every situation, what you really want to know about the data isn't whether it has a value of `undefined` or similar, it's whether it has a meaningful value. `empty()` tells you whether it has a meaningful value or not in all situations.

For example, if an object is `empty()` there may be no need to process it. Likewise for an array. If you need some data, and the place you are looking for it is `empty()`, you will need to look elsewhere.
So making use of `empty()` can make your transforms much simpler than the equivalent code.

Related to the `empty()` helper is the *First Not Empty*, or `fne()`, helper. As its name implies, when you give `fne()` multiple values, it will return the first one that is not empty. This can be especially useful for getting a value from one of multiple places, or falling back to a default. For example:

```
fne($user.nickname $user.first_name 'User')
```

will try to obtain a value from `$user.nickname`. If that has a non-empty value, it will be returned. If, however, it is empty, it will look in `$user.first_name`, and if that is also empty, it will return the string 'User'.

Note that because of the concept of `empty()`, you don't need to concern yourself with checking whether `$user.nickname` is `undefined` or `null` or an empty string. In most cases, for the purposes of transforming data, they are all equivalently `empty()`.

As you can see, `fne()` makes it easy to do multiple fallback lookups and set default values.

As mentioned, if you really need to know, you can use `typeof()` to determine whether something is actually `string` or `undefined`. You can also use `==` to check its actual value, for example:

```
$v == '' // true if $v is an empty string, false if $v is undefined
```

But in most cases, we have found that we just want to know whether a value is `empty()` or not.

### Comments

When creating transforms, it can be helpful to add comments. DTL understands two types of comments in an expression. If you are familiar with C, JavaScript, or other languages with a C-derived syntax, these will be familiar to you.

```
/* Find what we call them */
fne($user.nickname $user.first_name 'User')

empty($user.nickname) // Does user have a nickname defined?
```

As you would expect, `/* ... */` can be placed anywhere in an expression, and `// ...` marks everything until the end of the line as a comment.
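
For readers who think in JavaScript, the `empty()` and `fne()` behavior described in this section can be approximated with the sketch below. It is only an illustration of the rules listed above, not DTL's implementation; the function names `isEmpty` and `firstNotEmpty` are hypothetical, and treating numbers and booleans as non-empty is an assumption of this sketch.

```javascript
// Illustrative approximation only: NOT DTL's implementation.
// isEmpty and firstNotEmpty are hypothetical names chosen for this sketch.
function isEmpty(value) {
  // DTL treats null in input data the same as undefined
  if (value === undefined || value === null) return true;
  if (typeof value === 'string') return value.length === 0;
  if (Array.isArray(value)) return value.length === 0;
  if (typeof value === 'object') return Object.keys(value).length === 0;
  // Assumption: numbers, booleans, etc. count as meaningful values
  return false;
}

// Returns the first argument that is not empty, or undefined if all are empty
function firstNotEmpty(...candidates) {
  return candidates.find((candidate) => !isEmpty(candidate));
}

const user = { nickname: '', first_name: null };
const displayName = firstNotEmpty(user.nickname, user.first_name, 'User');
console.log(displayName); // "User"
```

The equivalent hand-written checks (`=== undefined`, `=== null`, `=== ''`, and so on) are what the single `fne()` call replaces in a transform.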
### A word about whitespace

DTL expressions are fairly straightforward and do not require much extra punctuation. For example, DTL understands that within an array context `[ ... ]` each item separated by whitespace is an additional element, so commas are unnecessary. Likewise, no special termination character is required to signal the end of an expression; it simply ends where the expression ends.

DTL treats spaces, tabs and newlines as whitespace, and all are equivalent. This means that you can create complex transforms using multiple lines and indenting for clarity. This can be hard to see when transforms are encoded using JSON, because JSON does not support multi-line strings directly, but JSON is only one way of encoding transforms. If you encode your transforms some other way, such as YAML or [JSON5](https://json5.org), or extract them directly from a database, newlines and indentation can be quite helpful, and DTL will happily let you indent and newline your expressions however you see fit.

## Understanding DTL Transform Libraries

DTL (Data Transformation Language) facilitates the definition and execution of transformations on data. A central feature in DTL is transform libraries, which are collections of named transforms grouped together.

### Defining Transform Libraries

A transform library is an object where each key represents a transform name, and the value is the transform definition. This structure allows for organizing multiple related transforms together. Transforms within a library can reference each other, and can be accessed both internally and externally.

Here's a simplified example of a transform library:

```json
{
    "out": {
        "original_value": "(: $num :)",
        "rounded": {
            "num": "(: $num -> 'round_number' :)"
        }
    },
    "round_number": "(: math.round($.) :)"
}
```

### Applying Transforms

Transforms are triggered using the `DTL.apply` function, which accepts three parameters:

- `input_data`: The data subject to transformation.
- `transform_library`: The library comprising the transforms.
- `transform_name_to_run`: The specific transform to execute. If omitted, the 'out' transform is used.

```javascript
DTL.apply(input_data, transform_library, transform_name_to_run);
```

### Recursive Processing

If a transform is an object or array, DTL processes it recursively, traversing its structure and executing any enclosed expressions. This feature is invaluable for constructing specific data structures based on input data.

### Example: Creating a Nested Object

Given a flat data object with person details:

```json
{
    "name": "John Doe",
    "street": "123 Main St",
    "city": "Springfield",
    "state": "IL",
    "zip": "62704"
}
```

A transform library can be crafted to organize this data into a nested format:

```json
{
    "out": "(: $ -> 'create_nested_object' :)",
    "create_nested_object": {
        "person": {
            "name": "(: $.name :)"
        },
        "address": {
            "street": "(: $.street :)",
            "city": "(: $.city :)",
            "state": "(: $.state :)",
            "zip": "(: $.zip :)"
        }
    }
}
```

In this setup, `create_nested_object` is a transform that builds a nested object from the flat input data. The `out` transform references `create_nested_object`.

### Advanced Example: Multiple Transforms and Specified Transform Execution

Consider an input object with numerical data:

```json
{
    "num": 105,
    "num2": 42
}
```

We can design a transform library to perform various operations on these numbers:

```json
{
    "out": {
        "original_value": "(: $num :)",
        "rounded": {
            "num": "(: $num -> 'round_number' :)",
            "num2": "(: $num2 -> 'round_number' :)"
        },
        "num_is_big": "(: $num -> 'is_big' :)"
    },
    "round_number": "(: math.round($.) :)",
    "is_big": "(: $. >= 100 :)"
}
```

Here, besides the `out` transform, two additional transforms, `round_number` and `is_big`, are defined. They round a number and check if a number is big (>=100), respectively.
If we like, instead of defaulting to the 'out' transform, we can specify a transform name in `DTL.apply`:

```javascript
DTL.apply(input_data.num, transform_library, 'is_big');
```

This will only evaluate the `is_big` transform, checking if `num` is greater than or equal to 100.

Through this mechanism, DTL offers a flexible, organized approach to define and apply complex data transformations, rendering it a potent tool for data manipulation and structure definition.

### Example: Processing an Array of People using `map`

Consider an input array containing flat objects with person information:

```json
[
    {
        "name": "John Doe",
        "street": "123 Main St",
        "city": "Springfield",
        "state": "IL",
        "zip": "62704"
    },
    {
        "name": "Jane Smith",
        "street": "456 Elm St",
        "city": "Rivertown",
        "state": "TX",
        "zip": "75001"
    }
]
```

We can design a transform library to organize this data into nested structures for each person:

```json
{
    "out": "(: $. -> 'process_people' :)",
    "process_people": "(: map($. 'create_nested_object') :)",
    "create_nested_object": {
        "person": "(: $item -> 'get_person' :)",
        "address": "(: $item -> 'get_address' :)"
    },
    "get_person": {
        "first_name": "(: (split($.name ' '))[0] :)",
        "last_name": "(: (split($.name ' '))[1] :)"
    },
    "get_address": {
        "street": "(: $.street :)",
        "city": "(: $.city :)",
        "state": "(: $.state :)",
        "zip": "(: $.zip :)"
    }
}
```

In this setup:

- The `out` transform calls `process_people`.
- `process_people` employs the `map` helper to apply `create_nested_object` to each item in the input array.
- `create_nested_object` constructs a nested object from each flat person object, using the `get_person` transform for the person and the `get_address` transform to create the address structure.

The `map` helper is instrumental as it iterates over each item in the input array, applying the specified transform, and gathering the results into a new array.
This way, each flat person object is transformed into a nested structure, and a new array of these nested structures is produced.

The usage would be:

```javascript
DTL.apply(input_data, transform_library);
```

This will output an array of nested person objects:

```json
[
    {
        "person": {
            "first_name": "John",
            "last_name": "Doe"
        },
        "address": {
            "street": "123 Main St",
            "city": "Springfield",
            "state": "IL",
            "zip": "62704"
        }
    },
    {
        "person": {
            "first_name": "Jane",
            "last_name": "Smith"
        },
        "address": {
            "street": "456 Elm St",
            "city": "Rivertown",
            "state": "TX",
            "zip": "75001"
        }
    }
]
```

### Direct Transform Access in DTL

Besides utilizing the default `out` transform, DTL allows direct access to other named transforms within a library from your JavaScript code. This feature is handy for executing specific transformations without going through the entire library.

For instance, given a single person object:

```javascript
const one_person = {
    "name": "John Doe",
    "street": "123 Main St",
    "city": "Springfield",
    "state": "IL",
    "zip": "62704"
};
```

You can directly access and execute the `get_person` transform using the `DTL.apply` function:

```javascript
const result = DTL.apply(one_person, transform_library, 'get_person');
```

This call will bypass the `out` transform and instead directly execute `get_person` on the provided person object, producing an output with the first and last name separated:

```json
{
    "first_name": "John",
    "last_name": "Doe"
}
```

This mechanism offers flexibility to invoke any specific transform defined in the library, catering to various data processing needs directly from your JavaScript environment.

## Literal Values in Transform Processing

DTL permits the inclusion of literal values directly within a transform, which are then reflected in the result. This feature is particularly useful in scenarios where static values or structures are required in the output alongside dynamically transformed values.

Literal values can be part of transforms that are objects or arrays.
Let's enhance a previous example to illustrate this concept:

```json
{
    "out": "(: $. -> 'process_people' :)",
    "process_people": "(: map($. 'create_nested_object') :)",
    "create_nested_object": {
        "person": "(: $item -> 'get_person' :)",
        "address": "(: $item -> 'get_address' :)",
        "metadata": {
            "processing_date": "2023-10-25",
            "source": "user_input"
        }
    },
    "get_person": {
        "first_name": "(: (split($.name ' '))[0] :)",
        "last_name": "(: (split($.name ' '))[1] :)"
    },
    "get_address": {
        "street": "(: $.street :)",
        "city": "(: $.city :)",
        "state": "(: $.state :)",
        "zip": "(: $.zip :)"
    }
}
```

In the `create_nested_object` transform, a `metadata` object with literal values is introduced. These values, `processing_date` and `source`, are static and will appear as-is in the transformation result.

Here's a snippet of the output illustrating the inclusion of literal values:

```json
{
    "person": {
        "first_name": "John",
        "last_name": "Doe"
    },
    "address": {
        // ... address fields ...
    },
    "metadata": {
        "processing_date": "2023-10-25",
        "source": "user_input"
    }
}
```

It's important to note that keys of objects within transforms are never processed as expressions; they are treated as static identifiers. This allows for a clear demarcation between static structure and dynamic value transformation in DTL, facilitating precise control over the output format.

## Static Keys vs Dynamic Keys in DTL

In DTL, keys within transforms are kept static and are not processed as expressions. This design choice ensures the stability and predictability of your object structure during transformation, preventing inadvertent modifications which could occur if keys were dynamically interpreted.

However, there are cases where you might need to dynamically generate parts of your object structure based on data. For these scenarios, DTL offers the `object creator` syntax `{ }`.
This syntax allows you to explicitly create objects with dynamic keys, giving you the flexibility to construct complex or data-driven structures while maintaining a clear intention in your code.

With the object creator syntax, you construct key-value pairs in a controlled manner, specifying exactly how keys are generated and values are assigned, ensuring that dynamic object creation is both deliberate and transparent.

This balanced approach allows for a robust yet flexible data transformation process, catering to a wide range of use cases while preserving the integrity of your object structures.

### Object Creator Syntax

The object creator syntax allows you to generate an object from one or more pairs of values. Each pair is a two-element array, where the first item becomes the key, and the second item becomes the value. Here's the syntax:

```
{ [ $key $value ] }
```

#### Example:

Given a variable `$first_name` with a value of "John", this expression:

```
{ [ $first_name length($first_name) ] }
```

Produces an object:

```json
{ "John": 4 }
```

#### Unpacking Objects into Pairs

Conversely, if you have an object and wish to break it down into an array of key/value pairs, you can use the `pairs` helper. The `pairs` helper takes an object as an argument and returns an array of key/value pairs.

##### Syntax:

```
pairs($object)
```

##### Example:

Given an object:

```json
{ "John": 4, "Doe": 3 }
```

Applying the `pairs` helper:

```
pairs($.)
```

Produces an array of key/value pairs:

```json
[ ["John", 4], ["Doe", 3] ]
```

This complementary functionality allows for flexible manipulation of objects and arrays within DTL, enabling dynamic key generation and object decomposition to suit your data transformation needs.
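
As a point of comparison for JavaScript readers, the object creator and `pairs` round trip resembles the standard `Object.fromEntries()` and `Object.entries()` functions. The sketch below is only an analogy in plain JavaScript; it does not call DTL, and DTL's behavior may differ in edge cases.

```javascript
// Plain-JavaScript analogy only; this does not use DTL.
const firstName = 'John';

// Comparable to the DTL expression { [ $first_name length($first_name) ] }
const built = Object.fromEntries([[firstName, firstName.length]]);
console.log(built.John); // 4

// Comparable to pairs($.) applied to { "John": 4, "Doe": 3 }
const unpacked = Object.entries({ John: 4, Doe: 3 });
console.log(JSON.stringify(unpacked)); // [["John",4],["Doe",3]]

// Pairs can be turned back into an object, completing the round trip
const rebuilt = Object.fromEntries(unpacked);
console.log(rebuilt.Doe); // 3
```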
## Flattening and Unflattening Objects in DTL

Handling deeply nested objects efficiently is a common need in data transformation, especially when you want to interact with or overwrite specific fields deep within an object without disturbing the rest of the structure. DTL provides two powerful helpers for this purpose: `flatten($obj)` and `unflatten($obj)`.

### Flattening Objects

The `flatten()` helper is used to condense a deeply nested object into a single-level object while preserving the original structure within the key names. This is achieved by encoding the full key path using dot-notation (or a custom separator if provided). Here's the syntax:

```plaintext
flatten( $array_or_object [ $separator ] [ $prefix ] )
```

#### Example Usage:

Suppose you have a complex nested object, and you want to update specific deep fields within it. You can use `flatten()` to simplify the object, apply your transformations, and then `unflatten()` to restore the original nested structure.

```json
{
    "out": "(: unflatten( &(flatten($.) ($. -> 'get_new_deep_fields'))) :)",
    "get_new_deep_fields": {
        "metadata.detail.origin_ip": "(: $original_ip :)",
        "metadata.detail.geohash": "(: $geohash :)"
    }
}
```

In this example:

1. `flatten($.)` simplifies the input object into a single-level object.
2. The `get_new_deep_fields` transform creates or updates the specified deep fields.
3. `&()` merges the flattened original object with the new fields.
4. `unflatten()` restores the nested structure, integrating the new or updated fields.

### Unflattening Objects

The `unflatten()` helper reverses the flattening process, reconstructing the original nested structure from a single-level object.

#### Example Result:

Given the original object and the transform above, if `original_ip` and `geohash` are "192.168.1.1" and "u4pruydqqvj", the output will be:

```json
{
    "metadata": {
        "detail": {
            "origin_ip": "192.168.1.1",
            "geohash": "u4pruydqqvj"
        }
        // ... other existing fields ...
    }
    // ... other existing fields ...
}
```

This methodology facilitates precise manipulations on complex nested objects with ease, allowing for targeted updates or additions to the data structure while preserving the original format.
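
To make the flatten, merge, unflatten pattern above more concrete, here is a minimal plain-JavaScript sketch of the same idea. It is not DTL's implementation of `flatten()` or `unflatten()`: the names `flattenObject` and `unflattenObject` are hypothetical, the sample input is invented for illustration, and arrays are treated as leaf values to keep the sketch short.

```javascript
// Illustrative sketch only: NOT DTL's flatten()/unflatten() implementation.
// flattenObject and unflattenObject are hypothetical names; arrays are kept as leaf values.
function flattenObject(obj, separator = '.', prefix = '') {
  const result = {};
  for (const [key, value] of Object.entries(obj)) {
    const path = prefix ? prefix + separator + key : key;
    const isPlainObject =
      value !== null && typeof value === 'object' && !Array.isArray(value);
    if (isPlainObject && Object.keys(value).length > 0) {
      // Recurse, carrying the key path forward in the prefix
      Object.assign(result, flattenObject(value, separator, path));
    } else {
      result[path] = value;
    }
  }
  return result;
}

function unflattenObject(flat, separator = '.') {
  const result = {};
  for (const [path, value] of Object.entries(flat)) {
    const keys = path.split(separator);
    let node = result;
    // Walk (and create) intermediate objects for every key except the last
    for (const key of keys.slice(0, -1)) {
      if (typeof node[key] !== 'object' || node[key] === null) node[key] = {};
      node = node[key];
    }
    node[keys[keys.length - 1]] = value;
  }
  return result;
}

// Invented sample input for this sketch
const input = { id: 7, metadata: { detail: { origin_ip: '10.0.0.1' }, source: 'api' } };
const updates = {
  'metadata.detail.origin_ip': '192.168.1.1',
  'metadata.detail.geohash': 'u4pruydqqvj',
};

// Flatten, overlay the new deep fields, then rebuild the nesting
const output = unflattenObject({ ...flattenObject(input), ...updates });
console.log(output.metadata.detail.origin_ip); // "192.168.1.1"
console.log(output.metadata.source); // "api"
```

Because the update keys are just flattened paths, fields that are not mentioned (`id` and `metadata.source` here) pass through untouched.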