# JSON Schema to ElasticSearch mappings
Convert JSON schema to [ElasticSearch mappings](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html)
A mapping type has:
_Meta-fields_
Meta-fields are used to customize how a document’s associated metadata is treated. Examples of meta-fields include the document’s `_index`, `_type`, `_id`, and `_source` fields.
_Fields or properties_
A mapping `type` contains a list of fields or properties pertinent to the document.
_Field datatypes_
Each field has a data `type` which can be:
- a simple type like `text`, `keyword`, `date`, `long`, `double`, `boolean` or `ip`
- a type which supports the hierarchical nature of JSON such as `object` or `nested`
- a specialised type like `geo_point`, `geo_shape`, or `completion`
It is often useful to index the same field in different ways for different purposes. For instance, a `string` field could be indexed as a `text` field for full-text search, and as a `keyword` field for sorting or aggregations. Alternatively, you could index a string field with the `standard` analyzer, the `english` analyzer, and the `french` analyzer.
This is the purpose of multi-fields. Most datatypes support multi-fields via the `fields` parameter.
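As a minimal sketch of what a multi-field looks like in a mapping object, the snippet below adds a `keyword` sub-field to a `text` mapping through the `fields` parameter. The helper name `withKeywordSubField` and the sub-field name `raw` are assumptions made for illustration; they are not part of this library.

```js
// Hypothetical helper (not part of json-schema-to-es-mapping): adds a
// `keyword` sub-field to an existing mapping via the ES `fields` parameter.
function withKeywordSubField(mapping, subFieldName = "raw") {
  return {
    ...mapping,
    fields: {
      ...(mapping.fields || {}),
      [subFieldName]: { type: "keyword" }
    }
  };
}

// A `text` field for full-text search, with a `keyword` sub-field
// that can be used for sorting or aggregations.
const title = withKeywordSubField({ type: "text" });
console.log(JSON.stringify(title, null, 2));
```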
## Quick start
- npm: `npm install json-schema-to-es-mapping -S`
- yarn: `yarn add json-schema-to-es-mapping`
The easiest way to get started is to use `buildMappingsFor` to create a mappings object for a named index given a JSON schema.
```js
const mappings = buildMappingsFor("people", schema);
```
Example:
```js
const schema = {
$schema: "http://json-schema.org/draft-07/schema#",
$id: "http://example.com/person.schema.json",
title: "Person",
description: "A person",
type: "object",
properties: {
name: {
description: "Name of the person",
type: "string"
},
age: {
description: "Age of person",
type: "number"
}
},
required: ["name"]
};
const { buildMappingsFor } = require("json-schema-to-es-mapping");
const mappings = buildMappingsFor("people", schema);
console.log({ mappings });
```
This will by default give the following mappings result:
```json
{
"mappings": {
"people": {
"properties": {
"name": {
"type": "keyword"
},
"age": {
"type": "float"
}
}
}
}
}
```
The function `buildMappingsFor` uses the `build` function to return the properties map and simply wraps them with a `mappings` object for the named index.
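The wrapping step can be pictured roughly as follows. This is a sketch that assumes the result shape shown above; `wrapMappings` is a hypothetical name and not the library's internal function.

```js
// Hypothetical re-implementation of the wrapping step, for illustration only.
// It takes a properties map (as returned by `build`) and nests it under
// `mappings.<indexName>.properties`.
function wrapMappings(indexName, properties) {
  return { mappings: { [indexName]: { properties } } };
}

// Stand-in properties map; in practice this would come from `build(schema)`.
const wrapped = wrapMappings("people", { name: { type: "text" } });
console.log(JSON.stringify(wrapped, null, 2));
```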
## Supported mappings
Currently all Elastic Search core data types are supported (except for `binary`).
- string
- numeric
- boolean
- date
- range (numeric, date)
### Numeric
You can assist the numeric type mapper by supplying a `numType` for the field entry, such as `numType: "double"`
See ES [number](https://www.elastic.co/guide/en/elasticsearch/reference/current/number.html#number) reference for list of valid `numType`s (except for `scaled_float`)
### Ranges
- Numeric
- Date
#### Numeric ranges
To make a numeric field entry be mapped to an ES numeric range:
- Set `range: true`
- Set a minimum range value, either `minimum` or `exclusiveMinimum`
- Set a maximum range value, either `maximum` or `exclusiveMaximum`
If you leave out the `range: true` it will be resolved as a number, using the min and max values and the `multipleOf` (precision). These properties will in combination be used to determine the exact numeric type (`byte`, `short`, ... `double`) to be used in the Elastic Search numeric type mapping.
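To make the idea concrete, here is a simplified sketch of how bounds and precision could narrow the numeric type. It is not the library's actual algorithm: the function name `pickNumericType`, the fallbacks to `float` and `long`, and the decision to ignore exclusive bounds are all assumptions. The integer limits are the standard signed 8, 16 and 32 bit ranges used by the ES `byte`, `short` and `integer` types.

```js
// Illustrative only: NOT the library's actual resolution logic.
const INTEGER_TYPES = [
  { type: "byte", min: -128, max: 127 },
  { type: "short", min: -32768, max: 32767 },
  { type: "integer", min: -2147483648, max: 2147483647 }
];

function pickNumericType({ minimum, maximum, multipleOf } = {}) {
  // A fractional (or missing) step means values need not be whole numbers.
  if (!Number.isInteger(multipleOf)) return "float";
  // Without both bounds the integer type cannot be narrowed.
  if (typeof minimum !== "number" || typeof maximum !== "number") return "long";
  // Pick the smallest integer type whose range covers [minimum, maximum].
  const fit = INTEGER_TYPES.find(t => minimum >= t.min && maximum <= t.max);
  return fit ? fit.type : "long";
}

console.log(pickNumericType({ minimum: 0, maximum: 120, multipleOf: 1 }));
console.log(pickNumericType({ minimum: 0, maximum: 1, multipleOf: 0.01 }));
```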
#### Date ranges
To have an entry detected as a date range, the same rules apply as for a numeric range, but the entry must also resolve to a date type (see the `isDate(obj)` function in `types/util.js` for details).
## Recent feature additions
Now also resolves:
- Array items that are themselves object types
- References to object definitions (i.e. `$ref`)
- [Parent-child mapping](https://www.elastic.co/guide/en/elasticsearch/guide/current/parent-child-mapping.html)
## Limitations and coming features
Support for Geo location mapping will likely be included in the near future.
Please let me know of any other features you'd like included for a more feature-complete library!
Initial work to support these features has been started in the [dev](https://github.com/kristianmandrup/json-schema-to-es-mapping/tree/dev) branch and should land soon (0.4.0).
## Fine grained control
For more fine-grained control, use the `build` function directly.
```js
const { build } = require("json-schema-to-es-mapping");
const { properties, results } = build(schema);
console.log({ properties, results });
```
This outputs the following Elastic Search mapping `properties`:
```json
{
"name": {
"type": "keyword"
},
"age": {
"type": "float"
}
}
```
In this (simple) case, `results` contains the same entries as `properties`:
```js
{
name: { type: "keyword" },
age: { type: "float" }
}
```
## Event driven approach
You can use the event-driven approach with `onResult` and the other callback handlers to generate a more context-specific Elastic Search mapping, given your requirements.
```js
const received = [];
const onResult = result => {
console.log("received", result);
received.push(result);
};
// potentially use to call resolve callback of Promise
const onComplete = fullResult => {
console.log("ES mapping done :)", {
fullResult, // "internal" results
received // list built by onResult
});
};
// potentially use to call reject callback of Promise
const onError = errMsg => {
console.error("ES mapping error", errMsg);
throw errMsg;
};
// potentially use to call reject callback of Promise
const onThrow = err => {
throw err; // `throw` is a statement, so the arrow function needs a block body
};
const config = { onResult, onComplete, onError, onThrow };
```
The `onResult` handler will populate the `received` array with the following:
```js
[
{ parentName: "Person", key: "name", resultKey: "name", type: "text" },
{
parentName: "Person",
key: "age",
resultKey: "age",
type: "float"
}
];
```
You will also get notified on:
- successful completion of JSON schema mapping via `onComplete` callback
- aborted due to processing error via `onError` callback
- aborted due to throwing exception via `onThrow` callback
The event-driven approach is entirely optional, but can be used for a more "stream like" approach. It works well with promises (i.e. the `resolve` and `reject` callbacks).
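A minimal sketch of such a promise wrapper is shown below. It assumes that `build(schema, config)` invokes the `onResult`, `onComplete`, `onError` and `onThrow` handlers as described above; the helper name `buildAsPromise` is hypothetical, and `build` is passed in as an argument so the sketch makes no assumption about how it is imported.

```js
// Sketch: wraps a callback-driven `build` call in a Promise.
// Assumes the handlers are called as described in this section.
function buildAsPromise(build, schema, config = {}) {
  return new Promise((resolve, reject) => {
    const received = [];
    build(schema, {
      ...config,
      // collect each partial result as it arrives
      onResult: result => received.push(result),
      // resolve once the whole schema has been processed
      onComplete: fullResult => resolve({ fullResult, received }),
      // reject on a processing error or a thrown exception
      onError: errMsg => reject(new Error(String(errMsg))),
      onThrow: err => reject(err)
    });
  });
}

// Usage (with `build` required from the library):
// buildAsPromise(build, schema).then(({ received }) => console.log(received));
```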
On each result received you can then issue a command to the Elastic Search server (for example via the REST interface) to add a new mapping that reflects the result received.
[Put mapping](https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-put-mapping.html)
```bash
PUT person/_mapping/_doc
{
"properties": {
"age": {
"type": "float"
}
}
}
```
Alternatively only submit the ES index mappings after `onComplete` is triggered, to make sure the full JSON schema could be processed, so that you don't end up with partial schema mappings.
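As a sketch of that second strategy, the collected results can be turned into a single put-mapping request body once `onComplete` fires. The helper `toPutMappingBody` is hypothetical; it assumes each received result has the shape shown earlier (`parentName`, `key`, `resultKey`, plus the mapping attributes).

```js
// Hypothetical helper: builds a `{ properties }` request body from the list
// of results collected by an `onResult` handler.
function toPutMappingBody(received) {
  const properties = {};
  for (const result of received) {
    // Strip the bookkeeping attributes; what remains is the ES field mapping.
    const { parentName, key, resultKey, typeName, ...mapping } = result;
    properties[resultKey] = mapping;
  }
  return { properties };
}

const body = toPutMappingBody([
  { parentName: "Person", key: "name", resultKey: "name", type: "text" },
  { parentName: "Person", key: "age", resultKey: "age", type: "float" }
]);
console.log(JSON.stringify(body, null, 2));
```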
## Nested schemas
For a nested schema of the form:
```js
{
$schema: "http://json-schema.org/draft-07/schema#",
$id: "http://example.com/person.schema.json",
title: "Person",
description: "A person",
type: "object",
properties: {
name: {
description: "Name of the person",
type: "string"
},
dog: {
type: "object",
typeName: "Animal",
properties: {
name: {
description: "Name of the dog",
type: "string",
required: true
},
age: {
description: "Age of dog",
type: "number"
}
}
}
},
required: ["name"]
};
```
`buildMappingsFor` will in this case generate an Elastic Search mapping as follows:
```js
mappings: {
people: {
properties: {
name: {
type: "keyword"
},
dog: {
properties: {
name: {
type: "keyword"
},
age: {
type: "float"
}
}
}
}
}
}
```
Note that the `dog` object results in a nested mapping (see ElasticSearch resources below)
The `results` will in this case give:
```js
{
name: { type: 'keyword' },
dog_name: { type: 'keyword' },
dog_age: { type: 'float' },
dog: {
name: { type: 'keyword' },
age: { type: 'float' }
}
}
```
Notice how the dog properties are provided both in flat and nested form. Depending on your requirements, you might want to store the Elastic Search data in a more flat form than in your general application domain model.
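If you only have the nested form, the flat form can be derived from it. The sketch below is one way to do that, assuming nested objects are marked by a `properties` key as in the mapping above; `flattenProperties` is a hypothetical helper and not part of the library.

```js
// Hypothetical helper: flattens nested `properties` maps into a single level,
// joining parent and child keys with a separator (e.g. dog + name -> dog_name).
function flattenProperties(properties, separator = "_", prefix = "") {
  const flat = {};
  for (const [key, mapping] of Object.entries(properties)) {
    const flatKey = prefix ? `${prefix}${separator}${key}` : key;
    if (mapping.properties) {
      // Recurse into nested objects, carrying the accumulated key as prefix.
      Object.assign(flat, flattenProperties(mapping.properties, separator, flatKey));
    } else {
      flat[flatKey] = mapping;
    }
  }
  return flat;
}

const flat = flattenProperties({
  name: { type: "keyword" },
  dog: {
    properties: {
      name: { type: "keyword" },
      age: { type: "float" }
    }
  }
});
console.log(Object.keys(flat));
```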
### Customizing the result
You can pass a custom function `shouldSetResult(converter)` which controls under which converter conditions the result should be set. You can also pass:
- a custom name separator `nameSeparator`
- a `resultKey(converter)` function, to customize how result keys (names) are generated
- a `nestedKey(converter)` function, to customize how nested result keys (names) are generated
Example:
```js
const config = {
shouldSetResult: converter => {
return converter.type !== "object";
},
nameSeparator: "__" // example: dog__age
};
```
With this configuration the results discard the nested form and retain only the flattened field mappings.
```js
{
name: { type: 'keyword' },
dog__name: { type: 'keyword' },
dog__age: { type: 'float' },
}
```
If you add an `onResult` handler to receive results, it will look as follows:
```js
[
  {
    parentName: 'Person',
    key: 'name',
    resultKey: 'name',
    type: 'keyword'
  },
  {
    parentName: 'dog',
    key: 'name',
    resultKey: 'dog__name',
    type: 'keyword'
  },
  {
    parentName: 'dog',
    key: 'age',
    resultKey: 'dog__age',
    type: 'float'
  },
  {
    parentName: 'Person',
    typeName: 'Animal',
    key: 'dog',
    resultKey: 'dog',
    properties: {
      name: { type: 'keyword' },
      age: { type: 'float' }
    }
  }
]
```
Note the `typeName` in the result for the `dog` fields (more on this later)
## Default configuration
The default configuration is as follows.
```js
{
_meta_: {
types: {
string: "keyword",
number: "float",
object: "object",
array: "nested",
boolean: "boolean",
date: "date"
}
},
fields: {
name: {
type: "keyword"
},
content: {
type: "text"
},
text: {
type: "text"
},
title: {
type: "text"
},
caption: {
type: "text"
},
label: {
type: "text"
},
tag: {
type: "keyword",
index: "not_analyzed"
}
}
}
```
Note that some or all of these might benefit from being defined as multi fields, that are indexed and analyzed both as `text` and `keyword`.
You can pass in a custom configuration object (last argument) to override or extend it ;)
Note that for convenience, we pass in some typical field mappings based on names. Please customize this further to your needs.
## Customization
- Type mappers
- Rules
### Type mappers
You can pass in custom Type mapper factories if you want to override how specific types are mapped.
Internally this is managed in the `SchemaEntry` constructor in `entry.js`:
```js
this.defaults = {
types: {
string: toString,
number: toNumber,
boolean: toBoolean,
array: toArray,
object: toObject,
date: toDate,
dateRange: toDateRange,
numericRange: toNumericRange
},
typeOrder: [
"string",
"dateRange",
"numericRange",
"number",
"boolean",
"array",
"object",
"date"
]
};
this.types = {
...this.defaults.types,
...(config.types || {})
};
this.typeOrder = config.typeOrder || this.defaults.typeOrder;
```
To override, simply pass in a custom `types` object and/or a custom `typeOrder` array of the precedence order they should be resolved in.
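One way to picture how `types` and `typeOrder` interact is a first-match-wins loop over the mappers. This is a sketch under that assumption and not the actual `SchemaEntry` code; `resolveWithOrder` and the toy mappers are hypothetical.

```js
// Sketch (assumed behavior): try each mapper in `typeOrder` and return the
// first truthy result. Mappers return a falsy value when they do not apply.
function resolveWithOrder(entry, types, typeOrder) {
  for (const name of typeOrder) {
    const mapper = types[name];
    const result = mapper && mapper(entry);
    if (result) return result;
  }
  return undefined;
}

// Toy mappers standing in for toString, toNumber, etc.
const toyTypes = {
  string: entry => entry.type === "string" && { type: "keyword" },
  number: entry => entry.type === "number" && { type: "float" }
};

console.log(resolveWithOrder({ type: "number" }, toyTypes, ["string", "number"]));
```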
#### Custom Type mapper example (object)
Create a `toObject` file locally in your project that contains your overrides.
```js
const { types } = require("json-schema-to-es-mapping");
const { MappingObject, toObject, util } = types;
class MyMappingObject extends MappingObject {
// ...override
createMappingResult() {
return this.hasProperties
? this.buildObjectValueMapping()
: this.defaultObjectValueMapping;
}
buildObjectValueMapping() {
const { buildProperties } = this.config;
return buildProperties(this.objectValue, this.mappingConfig);
}
}
module.exports = function toObject(obj) {
return util.isObject(obj) && new MyMappingObject(obj).convert();
};
```
Import the `toObject` function and pass it in the `types` object of the `config` object passed to the `build` function.
```js
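const { build } = require("json-schema-to-es-mapping");
// custom implementation
const toObject = require("./toObject");
const myConfig = {
  types: {
    // keyed by type name, matching the keys of the defaults shown above
    object: toObject
  }
};
// will now use the custom toObject for mapping JSON schema object to ES object
build(schema, myConfig);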
```
Depending on your requirements, you can post-process the generated mapping to better suit your specific needs and strategies for handling nested/complex data relationships.
## Elastic search types
- [Elasticsearch: mapping types](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html)
Core:
- String (`text`, `keyword`)
- Numeric (`long`, `integer`, `short`, `byte`, `double`, `float`, `half_float`, `scaled_float`)
- Date (`date`)
- Boolean (`boolean`)
- Binary (`binary`)
- Range (`integer_range`, `float_range`, `long_range`, `double_range`, `date_range`)
## Type mappings
The default type mappings are as follows:
- `boolean` -> `boolean`
- `object` -> `object`
- `array` -> `nested`
- `string` -> `keyword`
- `number` -> `float`
- `date` -> `date`
For `array`, the mapper uses the `type` of the first [array item](https://cswr.github.io/JsonSchema/spec/arrays/) if it is a [basic type](https://cswr.github.io/JsonSchema/spec/basic_types/) and all array items share that same type.
```js
{
"type": "array",
"items":{
"type": "integer"
}
}
```
If array item types are not "uniform" it will throw an error.
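The uniformity check can be sketched as follows. This is an illustration of the rule described here, not the library's code; `uniformItemType` is a hypothetical name, and it accepts both the single-schema and the tuple form of `items`.

```js
// Sketch: returns the shared item type of an array schema entry, or throws
// if the items do not all declare the same type.
function uniformItemType(entry) {
  const items = Array.isArray(entry.items) ? entry.items : [entry.items];
  const types = new Set(items.map(item => item && item.type));
  if (types.size !== 1 || types.has(undefined)) {
    throw new Error("Array items must all declare the same type");
  }
  return [...types][0];
}

console.log(uniformItemType({ type: "array", items: { type: "integer" } }));
console.log(uniformItemType({ type: "array", items: [{ type: "string" }, { type: "string" }] }));
```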
For the following array JSON schema entry the mapper will currently set the mapping type to `string` (by default). Please use the customization options outlined to define a more appropriate mapping strategy if needed.
```js
{
"type": "array",
"items" : [{
"type": "string"
// ...
},
{
"type": "string"
// ...
},
]
}
```
You can override the default type mappings by passing a `types` entry with type mappings in the `_meta_` entry of `config`
```js
const config = {
_meta_: {
types: {
number: "long", // use "long" for numbers
string: "text" // use "text" for strings
}
}
};
```
### Rules
You can pass an extra configuration object with specific rules for ES mapping properties that will be merged into the resulting mapping.
```js
const config = {
_meta_: {
types: {
number: "long", // use "long" for numbers
string: "text" // use "text" for strings
}
},
fields: {
created: {
// add extra indexing field meta data for Elastic search
format: "strict_date_optional_time||epoch_millis"
// ...
},
firstName: {
type: "keyword" // make sure firstName will be a keyword field (exact match) in ES mapping
}
}
};
const { build } = require("json-schema-to-es-mapping");
const mapping = build(schema, config);
```
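Conceptually, the field rules are merged on top of the generated mapping for the field with the same name. The sketch below illustrates that idea with a shallow merge in which the rule wins on conflicts; the precedence and the helper name `applyFieldRules` are assumptions, not a description of the library internals.

```js
// Sketch (assumed merge semantics): shallow-merge per-field rules on top of
// generated mappings, letting the rule override generated attributes.
function applyFieldRules(properties, fields = {}) {
  const merged = {};
  for (const [key, mapping] of Object.entries(properties)) {
    merged[key] = { ...mapping, ...(fields[key] || {}) };
  }
  return merged;
}

const mergedMapping = applyFieldRules(
  { created: { type: "date" }, firstName: { type: "text" } },
  {
    created: { format: "strict_date_optional_time||epoch_millis" },
    firstName: { type: "keyword" }
  }
);
console.log(JSON.stringify(mergedMapping, null, 2));
```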
Also note that you can pass in many of the functions used internally, so that the internal mechanics themselves can easily be customized as needed or used as building blocks.
### Elastic Search nested objects and data
- [Elasticsearch: Nested datatype](https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html)
- [Elasticsearch: Nested Objects](https://www.elastic.co/guide/en/elasticsearch/guide/current/nested-objects.html)
- [Elasticsearch data schema for nested objects](https://stackoverflow.com/questions/43488166/elasticsearch-data-schema-for-nested-objects)
- [Elasticsearch : Advanced search and nested objects](http://obtao.com/blog/2014/04/elasticsearch-advanced-search-and-nested-objects/)
## Advanced customization
To override the default mappings for certain fields, you can pass in a fields mapping entry in the `config` object as follows:
```js
const config = {
fields: {
timestamp: {
type: "date",
format: "dateOptionalTime"
}
// ... more custom field mappings
}
};
```
For more scalable customization, pass an `entryFor` function which returns custom mappings
depending on the entry being processed. The function receives an object with the following properties:
- `key`
- `resultKey` (ie. potentially nested key name)
- `parentName` name of parent entry if nested property
- `schemaValue` (entry from JSON schema being mapped)
You could, for example, use this to provide custom mappings for specific types of date fields.
```js
const config = {
entryFor: ({ key }) => {
if (key === "date" || key === "timestamp") {
return {
type: "date",
format: "dateOptionalTime"
};
}
}
};
```
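The same idea scales to naming conventions rather than exact keys. The sketch below matches on key suffixes; the suffix conventions (`At`, `Date`, `Id`) are assumptions chosen for illustration, and returning `undefined` for unmatched keys mirrors the example above, where no value is returned when nothing matches.

```js
// Sketch: an `entryFor` function driven by naming conventions.
// The suffix rules below are illustrative assumptions.
const patternConfig = {
  entryFor: ({ key }) => {
    // e.g. createdAt, birthDate
    if (/(At|Date)$/.test(key)) {
      return { type: "date", format: "dateOptionalTime" };
    }
    // e.g. userId
    if (/Id$/.test(key)) {
      return { type: "keyword" };
    }
    // no custom mapping for this key
    return undefined;
  }
};

console.log(patternConfig.entryFor({ key: "createdAt" }));
console.log(patternConfig.entryFor({ key: "userId" }));
```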
### resolve type maps
You can use [resolve-type-maps](https://github.com/kristianmandrup/resolve-type-maps) to define mappings to be used across your application in various schema-like contexts:
- GraphQL schema
- Data storage (tables, collections etc)
- Validation
- Forms
- Data Display
- Indexing (including Elastic Search)
- Mocks and fake data
```js
const fieldMap = {
name: {
matches: ['title', 'caption', 'label'],
elastic: {
type: 'string',
}
},
tag: {
matches: ['tags'],
elastic: {
type: 'keyword',
}
},
text: {
matches: ['description', 'content'],
elastic: {
type: 'text',
}
},
date: {
matches: ['date', 'timestamp'],
elastic: {
type: 'date',
format: 'dateOptionalTime'
}
}
}
const typeMap = {
Person: {
matches: ['User'],
fields: {
dog: {
// ...
elastic: {
type: 'nested',
// ...
}
},
// ...
}
}
}
```
Then pass an `entryFor` function in the config object to resolve the entry to be used for the ES mapping entry.
```js
import { createTypeMapResolver } from "resolve-type-maps";
const map = {
typeMap,
fieldMap
};
const resolverConfig = {};
const functions = {
resolveResult: obj => obj.elastic
}
const resolver = createTypeMapResolver(
{ map, functions },
resolverConfig
);
const config = {
entryFor: ({ key, parentName, typeName }) => {
// ensure capitalized and camelized name
// (`classify` is not defined in this snippet; supply your own string helper)
const type = classify(typeName || parentName);
const name = key;
return resolver.resolve({ type, name });
}
};
```
Note that for `typeName` to be set, either set a `className` or `typeName` property on the object entry in the JSON schema (see `dog` example above) or alternatively provide a lookup function `typeNameFor(name)` on the config object passed in.
For inner workings, see [TypeMapResolver.ts](https://github.com/kristianmandrup/resolve-type-maps/blob/master/src/lib/TypeMapResolver.ts)
The above configuration should look up the elastic mapping entry to use, based on the type/field combination in the `typeMap` first, and then fall back to the field name only in the `fieldMap` if not found. On a match, it resolves by returning the entry named `elastic` in the matching object.
```js
{
Person: {
matches: [/User/],
fields: {
dog: {
// ...
elastic: {
type: 'nested',
// ...
}
},
}
}
}
```
It should match a schema (or nested schema entry) named `Person` or `User` on the `typeMap` entry `Person`. For the nested `dog` entry it should then match on the entry `dog` under `fields` and return the entry for elastic, i.e.:
```js
{
type: "nested"
}
```
If no match is made in the `typeMap`, it follows a similar strategy and looks up a match in the `fieldMap` (as per the `map` entry passed when creating the `resolver`), matching only on the field name.
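The lookup order can be illustrated with a small stand-alone sketch. This is not the resolve-type-maps implementation; `matchesName`, `findEntry` and `resolveElastic` are hypothetical helpers that only mimic the strategy described here (type/field match in the `typeMap` first, then field name only in the `fieldMap`).

```js
// A name matches an entry if it equals the entry key or any of its `matches`
// (plain strings or regular expressions).
const matchesName = (name, key, matches = []) =>
  name === key ||
  matches.some(m => (m instanceof RegExp ? m.test(name) : m === name));

function findEntry(map, name) {
  for (const [key, entry] of Object.entries(map || {})) {
    if (matchesName(name, key, entry.matches)) return entry;
  }
  return undefined;
}

// Illustration only: typeMap lookup first, then fall back to the fieldMap.
function resolveElastic({ type, name }, { typeMap, fieldMap }) {
  const typeEntry = findEntry(typeMap, type);
  const typedField = typeEntry && findEntry(typeEntry.fields, name);
  if (typedField && typedField.elastic) return typedField.elastic;
  const field = findEntry(fieldMap, name);
  return field ? field.elastic : undefined;
}

const maps = {
  typeMap: {
    Person: { matches: [/User/], fields: { dog: { elastic: { type: "nested" } } } }
  },
  fieldMap: {
    tag: { matches: ["tags"], elastic: { type: "keyword" } }
  }
};

console.log(resolveElastic({ type: "User", name: "dog" }, maps));
console.log(resolveElastic({ type: "User", name: "tags" }, maps));
```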
## ElasticSearch mapping resources
- [mapping](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html)
- [removal of types](https://www.elastic.co/guide/en/elasticsearch/reference/current/removal-of-types.html)
- [nested](https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html)
## Testing
Uses [jest](https://jestjs.io/) for unit testing.
Currently not well tested. Please help add more test coverage :)
## TODO
### 1.0.0
- Convert project to TypeScript
- Add unit tests for ~80% test coverage
- Improve mappings for:
  - Date range
## Author
2019 Kristian Mandrup (CTO@Tecla5)
## License
MIT