UNPKG

json-schema-to-es-mapping

Version:
814 lines (645 loc) 20.7 kB
# JSON Schema to ElasticSearch mappings Convert JSON schema to [ElasticSearch mappings](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html) A mapping type has: _Meta-fields_ Meta-fields are used to customize how a document’s metadata associated is treated. Examples of meta-fields include the document’s `_index`, `_type`, `_id`, and `_source` fields. _Fields or properties_ A mapping `type` contains a list of fields or properties pertinent to the document. _Field datatypes_ Each field has a data `type` which can be: - a simple type like `text`, `keyword`, `date`, `long`, `double`, `boolean` or `ip` - a type which supports the hierarchical nature of JSON such as `object` or `nested` - a specialised type like `geo_point`, `geo_shape`, or `completion` It is often useful to index the same field in different ways for different purposes. For instance, a `string` field could be indexed as a `text` field for full-text search, and as a `keyword` field for sorting or aggregations. Alternatively, you could index a string field with the `standard` analyzer, the `english` analyzer, and the `french` analyzer. This is the purpose of multi-fields. Most datatypes support multi-fields via the `fields` parameter. ## Quick start - npm: `npm install json-schema-to-es-mapping -S` - yarn: `yarn add json-schema-to-es-mapping` The easiest way to get started is to use `buildMappingsFor` to create a mappings object for a named index given a JSON schema. ```js const mappings = buildMappingsFor("people", schema); ``` Example: ```js const schema = { $schema: "http://json-schema.org/draft-07/schema#", $id: "http://example.com/person.schema.json", title: "Person", description: "A person", type: "object", properties: { name: { description: "Name of the person", type: "string" }, age: { description: "Age of person", type: "number" } }, required: ["name"] }; const { buildMappingsFor } = require("json-schema-to-es-mapping"); const mappings = buildMappingsFor("people", schema); console.log({ mappings }); ``` This will by default give the following mappings result: ```json { "mappings": { "people": { "properties": { "name": { "type": "text" }, "age": { "type": "integer" } } } } } ``` The function `buildMappingsFor` uses the `build` function to return the properties map and simply wraps them with a `mappings` object for the named index. ## Supported mappings Currently all Elastic Search core data types are supported (except for `binary`). - string - numeric - boolean - date - range (numeric, date) ### Numeric You can assist the numeric type mapper by supplying a `numType` for the field entry, such as `numType: "double"` See ES [number](https://www.elastic.co/guide/en/elasticsearch/reference/current/number.html#number) reference for list of valid `numType`s (except for `scaled_float`) ### Ranges - Numeric - Date #### Numeric ranges To make a numeric field entry be mapped to an ES numeric range: - Set `range: true` - Set a minimum range value, either `minimum` or `exlusiveMinimum` - Set a maximum range value, either `maximum` or `exlusiveMaximum` If you leave out the `range: true` it will be resolved as a number, using the min and max values and the `multipleOf` (precision). These properties will in combination be used to determine the exact numeric type (`byte`, `short`, ... `double`) to be used in the Elastic Search numeric type mapping. #### Date ranges To make an entry detect as a date range, the same applies as for a number range but the entry must also resolve to a date type (see `types/util.js` function `isDate(obj)` for details) ## Recent feature additions Now also resolves: - Array items that are themselves object types - References to object definitions (ie. `$ref`) - [Parent-child mapping](https://www.elastic.co/guide/en/elasticsearch/guide/current/parent-child-mapping.html) ## Limitations and coming features Support for Geo location mapping will likely be included in the near future. Please Let me know any other features you'd like to include for a more feature complete library! Initial work to support these features have been started in the [dev](https://github.com/kristianmandrup/json-schema-to-es-mapping/tree/dev) branch and should land soon (0.4.0). ## Fine grained control For more fine-grained control, use the `build` function directly. ```js const { build } = require("json-schema-to-es-mapping"); const { properties, results } = build(schema); console.log({ properties, results }); ``` Will output the following Elastic Search Mapping schema: ```json { "name": { "type": "text" }, "age": { "type": "float" } } ``` The `results` will in this (simple) case give the same results as the `mappings`: ```js { name: { type: "keyword" }, age: { type: "float" } } ``` ## Event driven approach You can use the Event driven approach with the `onResult` and other calback handlers, to generate a more context specific mapping for Elastic Search context, given your requirements. ```js const received = []; const onResult = result => { console.log("received", result); received.push(result); }; // potentially use to call resolve callback of Promise const onComplete = fullResult => { console.log("ES mapping done :)", { fullResult, // 'internal" results received // list built by onResult }); }; // potentially use to call reject callback of Promise const onError = errMsg => { console.error("ES mapping error", errMsg); throw errMsg; }; // potentially use to call reject callback of Promise const onThrow = err => throw err; const config = { onResult, onComplete, onError, onThrow }; ``` The `onResult` handler will populate the `received` array with the following: ```js [ { parentName: "Person", key: "name", resultKey: "name", type: "text" }, { parentName: "Person", key: "age", resultKey: "age", type: "float" } ]; ``` You will also get notified on: - successful completion of JSON schema mapping via `onComplete` callback - aborted due to processing error via `onError` callback - aborted due to throwing exception via `onThrow` callback The Event driven approach is entirely optional, but can be used for a more "stream like" approach. This approach works well with async promises (ie. `reject` and `resolve` callbacks). On each result received you can then issue a command to the Elastic Search server (f.ex via the REST interface) to add a new mapping that reflects the result received. [Put mapping](https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-put-mapping.html) ```bash PUT person/_mapping/_doc { "properties": { "age": { "type": "float" } } } ``` Alternatively only submit the ES index mappings after `onComplete` is triggered, to make sure the full JSON schema could be processed, so that you don't end up with partial schema mappings. ## Nested schemas For a nested schema of the form: ```js { $schema: "http://json-schema.org/draft-07/schema#", $id: "http://example.com/person.schema.json", title: "Person", description: "A person", type: "object", properties: { name: { description: "Name of the person", type: "string" }, dog: { type: "object", typeName: "Animal", properties: { name: { description: "Name of the dog", type: "string", required: true }, age: { description: "Age of dog", type: "number" } } } }, required: ["name"] }; ``` `buildMappingsFor` will in this case generate an Elastic Search mapping as follows: ```js mappings: { people: { properties: { name: { type: "keyword" }, dog: { properties: { name: { type: "keyword" }, age: { type: "float" } } } } } } ``` Note that the `dog` object results in a nested mapping (see ElasticSearch resources below) The `results` will in this case give: ```js { name: { type: 'keyword' }, dog_name: { type: 'keyword' }, dog_age: { type: 'float' }, dog: { name: { type: 'keyword' }, age: { type: 'float' } } } ``` Notice how the dog properties are provided both in flat and nested form. Depending on your requirements, you might want to store the Elastic Search data in a more flat form than in your general application domain model. ### Customizing the result You can pass a custom function `shouldSetResult(converter)` which controls under which converter conditions the result should be set. You can also pass: - a custom name separator `nameSeparator` - a `resultKey(converter)` function, to customize how result keys (names) are generated - a `nestedKey(converter)` function, to customize how nested result keys (names) are generated Example: ```js const config = { shouldSetResult: converter => { return converter.type !== "object"; }, nameSeparator: "__" // example: dog__age }; ``` This configuration will result in results discarding the nested form, thus only retaining flattened field mappings. ```js { name: { type: 'keyword' }, dog__name: { type: 'keyword' }, dog__age: { type: 'float' }, } ``` If you add an `onResult` handler to receive results, it will look as follows: ```js results: [ { parentName: 'Person', key: 'name', resultKey: 'name', type: 'keyword' }, { parentName: 'dog', key: 'name', resultKey: 'dog__name', type: 'keyword' }, { parentName: 'dog', key: 'age', resultKey: 'dog__age', type: 'float' }, { parentName: 'Person', typeName: 'Animal', key: 'dog', resultKey: 'dog', properties: { name: { type: 'keyword' }, age: { type: 'float' } } } ] } ``` Note the `typeName` in the result for the `dog` fields (more on this later) ## Default configuration The default configuration is as follows. ```js { _meta_: { types: { string: "keyword", number: "float", object: "object", array: "nested", boolean: "boolean", date: "date" } }, fields: { name: { type: "keyword" }, content: { type: "text" }, text: { type: "text" }, title: { type: "text" }, caption: { type: "text" }, label: { type: "text" }, tag: { type: "keyword", index: "not_analyzed" } } } ``` Note that some or all of these might benefit from being defined as multi fields, that are indexed and analyzed both as `text` and `keyword`. You can pass in a custom configuration object (last argument) to override or extend it ;) Note that for convenience, we pass in some typical field mappings based on names. Please customize this further to your needs. ## Customization - Type mappers - Rules ### Type mappers You can pass in custom Type mapper factories if you want to override how specific types are mapped. Internally this is managed in the `SchemaEntry` constructor in `entry.js`: ```js this.defaults = { types: { string: toString, number: toNumber, boolean: toBoolean, array: toArray, object: toObject, date: toDate, dateRange: toDateRange, numericRange: toNumericRange }, typeOrder: [ "string", "dateRange", "numericRange", "number", "boolean", "array", "object", "date" ] }; this.types = { ...this.defaults.types, ...(config.types || {}) }; this.typeOrder = config.typeOrder || this.defaults.typeOrder; ``` To override, simply pass in a custom `types` object and/or a custom `typeOrder` array of the precedence order they should be resolved in. #### Custom Type mapper example (object) Create a `toObject` file loally in your project that contains your overrides ```js const { types } = require("json-schema-to-es-mapping"); const { MappingObject, toObject, util } = types; class MyMappingObject extends MappingObject { // ...override createMappingResult() { return this.hasProperties ? this.buildObjectValueMapping() : this.defaultObjectValueMapping; } buildObjectValueMapping() { const { buildProperties } = this.config; return buildProperties(this.objectValue, this.mappingConfig); } } module.exports = function toObject(obj) { return util.isObject(obj) && new MyMappingObject(obj).convert(); }; ``` Import the `toObject` function and pass it in the `types` object of the `config` object passed to the `build` function. ```js // custom implementation const toObject = require("./toObject"); const myConfig = { types: { toObject } }; // will now use the custom toObject for mapping JSON schema object to ES object build(schema, myConfig); ``` Depending on your requirements, you can post-process the generated mapping to better suit your specific needs and strategies for handling nested/complex data relationships. ## Elastic search types - [Elasticsearch: mapping types](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html) Core: - String (`text`, `keyword`) - Numeric (`long`, `integer`, `short`, `byte`, `double`, `float`, `half_float`, `scaled_float`) - Date (`date`) - Boolean (`boolean`) - Binary (`binary`) - Range (`integer_range`, `float_range`, `long_range`, `double_range`, `date_range`) ## Type mappings The default type mappings are as follows: - `boolean` -> `boolean` - `object` -> `object` - `array` -> `nested` - `string` -> `keyword` - `number` -> `integer` - `date` -> `date` For `array` it will use `type` of first [array item](https://cswr.github.io/JsonSchema/spec/arrays/) if [basic type](https://cswr.github.io/JsonSchema/spec/basic_types/) and the type for all array items are the same. ```js { "type": "array", "items":{ "type": "integer" } } ``` If array item types are note "uniform" it will throw an error. For the following array JSON schema entry the mapper will currently set the mapping type to `string` (by default). Please use the customization options outlined to define a more appropriate mapping strategy if needed. ```js { "type": "array", "items" : [{ "type": "string" // ... }, { "type": "string" // ... }, ] } ``` You can override the default type mappings by passing a `types` entry with type mappings in the `_meta_` entry of `config` ```js const config = { _meta_: { types: { number: "long", // use "integer" for numbers string: "text" // use "text" for strings } } }; ``` ### Rules You can pass an extra configuration object with specific rules for ES mapping properties that will be merged into the resulting mapping. ```js const config = { _meta_: { types: { number: "long", // use "integer" for numbers string: "text" // use "text" for strings } }, fields: { created: { // add extra indexing field meta data for Elastic search format: "strict_date_optional_time||epoch_millis" // ... }, firstName: { type: "keyword" // make sure firstName will be a keyword field (exact match) in ES mapping } } }; const { build } = require("json-schema-to-es-mapping"); const mapping = build(schema, config); ``` Also note that you can pass in many of the functions used internally, so that the internal mechanics themselves can easily be customized as needed or used as building blocks. ### Elastic Search nested objects and data - [Elasticsearch: Nested datatype](https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html) - [Elasticsearch: Nested Objects](https://www.elastic.co/guide/en/elasticsearch/guide/current/nested-objects.html) - [Elasticsearch data schema for nested objects](https://stackoverflow.com/questions/43488166/elasticsearch-data-schema-for-nested-objects) - [Elasticsearch : Advanced search and nested objects](http://obtao.com/blog/2014/04/elasticsearch-advanced-search-and-nested-objects/) ## Advanced customization To override the default mappings for certain fields, you can pass in a fields mapping entry in the `config` object as follows: ```js const config = { fields: { timestamp: { type: "date", format: "dateOptionalTime" } // ... more custom field mappings } }; ``` For a more scalable customization, pass an `entryFor` function which returns custom mappings depending on the entry being processed. - `key` - `resultKey` (ie. potentially nested key name) - `parentName` name of parent entry if nested property - `schemaValue` (entry from JSON schema being mapped) You could f.ex use this to provide custom mappings for specific types of date fields. ```js const config = { entryFor: ({ key }) => { if (key === "date" || key === "timestamp") { return { type: "date", format: "dateOptionalTime" }; } } }; ``` ### resolve type maps You can use [resolve-type-maps](https://github.com/kristianmandrup/resolve-type-maps) to define mappings to be used across your application in various schema-like contexts: - GraphQL schema - Data storage (tables, colletions etc) - Validation - Forms - Data Display - Indexing (including Elastic Search) - Mocks and fake data ```js const fieldMap = { name: { matches: ['title', 'caption', 'label'], elastic: { type: 'string', } } tag: { matches: ['tags'], elastic: { type: 'keyword', } }, text: { matches: ['description', 'content'], elastic: { type: 'text', } }, date: { matches: ['date', 'timestamp'], elastic: { type: 'text', format: 'dateOptionalTime' } } } const typeMap = { Person: { matches: ['User'], fields: { dog: { // ... elastic: { type: 'nested', // ... } }, // ... } } } ``` Then pass an `entryFor` function in the config object to resolve the entry to be used for the ES mapping entry. ```js import { createTypeMapResolver } from "resolve-type-maps"; const map = { typeMap, fieldMap }; const resolverConfig = {}; const functions = { resolveResult: (obj) => obj.elastic; } const resolver = createTypeMapResolver( { map, functions }, resolverConfig ); const config = { entryFor: ({ parentName, typeName }) => { // ensure capitalized and camelized name const type = classify(typeName || parentName); const name = converter.key; return resolver.resolve({ type, name }); } }; ``` Note that for `typeName` to be set, either set a `className` or `typeName` property on the object entry in the JSON schema (see `dog` example above) or alternatively provide a lookup function `typeNameFor(name)` on the config object passed in. For inner workings, see [TypeMapResolver.ts](https://github.com/kristianmandrup/resolve-type-maps/blob/master/src/lib/TypeMapResolver.ts) The above configuration should look up the elastic mapping entry to use, based on the type/field combination in the `typeMap` first and then fall back to the field name only in the `fieldMap` if not found. On a match, it will resolve by returning entry named `elastic` in the object matching. ```js { Person: { matches: [/User/], fields: { dog: { // ... elastic: { type: 'nested', // ... } }, } } } ``` It should match a schema (or nested schema entry) named `Person` or `User` on the `typeMap` entry `Person`. For the nested `dog` entry it should then match on the entry `dog` under `fields` and return the entry for elastic, ie: ```js { type: "nested"; } ``` If no match is made in the `typeMap`, it will follow a similar strategy by lookup a match in the `fieldMap` (as per the `maps` entry passed in the `config` object when creating the `resolver`), matching only on the field name. ## ElasticSearch mapping resources - [mapping](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html) - [removal of types](https://www.elastic.co/guide/en/elasticsearch/reference/current/removal-of-types.html) - [nested](https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html) ## Testing Uses [jest](jestjs.io/) for unit testing. Currently not well tested. Please help add more test coverage :) ## TODO ### 1.0.0 - Convert project to TypeScript - Add unit tests for ~80% test coverage - Improve mappings for: - Date range ## Author 2019 Kristian Mandrup (CTO@Tecla5) ## License MIT