datocms-contentful-to-structured-text
Version:
Convert Contentful Rich Text to a valid DatoCMS Structured Text `dast` document
338 lines (265 loc) • 9.56 kB
Markdown
# `datocms-contentful-to-structured-text`
This package contains utilities to convert Contentful Rich Text to a DatoCMS Structured Text `dast` (DatoCMS Abstract Syntax Tree) document.
Please refer to [the `dast` format docs](https://www.datocms.com/docs/structured-text/dast) to learn more about the syntax tree format and the available nodes.
## Usage
The main utility in this package is `richTextToStructuredText` which takes a Rich Text JSON and transforms it into a valid `dast` document.
`richTextToStructuredText` returns a `Promise` that resolves with a Structured Text document.
```js
import { richTextToStructuredText } from 'datocms-contentful-to-structured-text';
const richText = {
nodeType: 'document',
data: {},
content: [
{
nodeType: 'heading-1',
content: [
{
nodeType: 'text',
value: 'Lorem ipsum dolor sit amet',
marks: [],
data: {},
},
],
data: {},
},
};
richTextToStructuredText(richText).then((structuredText) => {
console.log(structuredText);
});
```
## Validate `dast` documents
`dast` is a strict format for DatoCMS' Structured Text fields and follows a different pattern from Contentful Rich Text structure.
The `datocms-structured-text-utils` package provides a `validate` utility to validate a Structured Text content to make sure that it is compatible with DatoCMS' Structured Text field.
```js
import { validate } from 'datocms-structured-text-utils';
// ...
richTextToStructuredText(richText).then((structuredText) => {
const { valid, message } = validate(structuredText);
if (!valid) {
throw new Error(message);
}
});
```
We recommend to validate every `dast` to avoid errors later when creating records.
## Advanced Usage
### Options
All the `*ToStructuredText` utils accept an optional `options` object as second argument:
```js
type Options = Partial<{
// Override existing Contentful node handlers or add new ones.
handlers: Record<string, CreateNodeFunction>,
// Array of allowed Block nodes.
allowedBlocks: Array<
BlockquoteType | CodeType | HeadingType | LinkType | ListType,
>,
// Array of allowed marks.
allowedMarks: Mark[],
}>;
```
### Transforming Nodes
The utils in this library traverse a `Contentful Rich Text` tree and transform supported nodes to `dast` nodes. The transformation is done by working on a `Contentful Rich Text` node with a handler (async) function.
Handlers are associated to `Contentful Rich Text` nodes by `nodeType` and look as follow:
```js
import { visitChildren } from 'datocms-contentful-to-structured-text';
// Handler for the paragraph node type.
async function p(createDastNode, contentfulNode, context) {
return createDastNode('paragraph', {
children: await visitChildren(createDastNode, contentfulNode, context),
});
}
```
Handlers can return either a promise that resolves to a `dast` node, an array of `dast` Nodes, or `undefined` to skip the current node.
To ensure that a valid `dast` is generated, the default handlers also check that the current `contentfulNode` is a valid `dast` node for its parent and, if not, they ignore the current node and continue visiting its children.
Information about the parent `dast` node name is available in `context.parentNodeType`.
Please take a look at the [default handlers implementation](./handlers.ts) for examples.
The default handlers are available on `context.defaultHandlers`.
### Context
Every handler receives a `context` object containing the following information:
```js
export interface Context {
// The current parent `dast` node type.
parentNodeType: NodeType;
// The parent `Contentful Rich Text` node.
parentNode: ContentfulNode;
// A reference to the current handlers - merged default + user handlers.
handlers: Record<string, Handler<unknown>>;
// A reference to the default handlers record (map).
defaultHandlers: Record<string, Handler<unknown>>;
// Marks for span nodes.
marks?: Mark[];
// Array of allowed Block types.
allowedBlocks: Array<
BlockquoteType | CodeType | HeadingType | LinkType | ListType,
>;
// Array of allowed marks.
allowedMarks: Mark[];
}
```
### Custom Handlers
It is possible to register custom handlers and override the default behaviour via options, using the `makeHandler` function.
For example, to create a custom handler for the Contentful `text` element, specify a guard clause to specify the correct type
```ts
import { makeHandler } from 'datocms-contentful-to-structured-text';
const customTextHandler = makeHandler(
(node): node is Text => n.nodeType === "text",
async (node) => {
// This custom handler will generate two spans for every text node, instead of one.
return [
{ type: 'span', value: node.value },
{ type: 'span', value: node.value },
];
}),
richTextToStructuredText(richText, {
handlers: [
customTextHandler,
],
}).then((structuredText) => {
console.log(structuredText);
});
```
```js
import { paragraphHandler } from './customHandlers';
richTextToStructuredText(richText, {
handlers: {
paragraph: paragraphHandler,
},
}).then((structuredText) => {
console.log(structuredText);
});
```
It is **highly encouraged** to validate the `dast` when using custom handlers because handlers are responsible for dictating valid parent-children relationships and therefore generating a tree that is compliant with DatoCMS Structured Text.
## Preprocessing Rich Text
Because of the strictness of the `dast` spec, it is possible that some elements might be lost during transformation.
To improve the final result, you might want to modify the Rich Text tree before it is transformed to `dast`.
### Examples
<details>
<summary>Split a node that contains an image.</summary>
In `dast`, images can only be presented as `Block` nodes, but blocks are not allowed inside of `ListItem` nodes (unordered-list/ordered-list). In this example we will split the original `unordered-list` in one list, the lifted up image block and another list.
```js
import { liftAssets } from 'datocms-contentful-to-structured-text';
const richTextWithAssets = {
nodeType: 'document',
data: {},
content: [
{
nodeType: 'unordered-list',
content: [
{
nodeType: 'list-item',
content: [
{
nodeType: 'paragraph',
content: [
{
nodeType: 'text',
value: 'text',
marks: [],
data: {},
},
],
data: {},
},
{
content: [],
data: {
target: {
sys: {
id: 'zzz',
linkType: 'Asset',
type: 'Link',
},
},
},
nodeType: 'embedded-asset-block',
},
{
nodeType: 'paragraph',
content: [
{
nodeType: 'text',
value: 'text',
marks: [],
data: {},
},
],
data: {},
},
],
data: {},
},
],
data: {},
},
],
};
liftAssets(richTextWithAssets);
const handlers = {
'embedded-asset-block': async (createNode, node, context) => {
const item = '123';
return createNode('block', {
item,
});
},
};
const dast = await richTextToStructuredText(richTextWithAssets, { handlers });
```
The liftAssets function transforms the richText tree and moves the embedded-asset-block to root,splitting the list in two parts.
```js
function liftAssets(richText) {
const visit = (node, cb, index = 0, parents = []) => {
if (node.content && node.content.length > 0) {
node.content.forEach((child, index) => {
visit(child, cb, index, [...parents, node]);
});
}
cb(node, index, parents);
};
const liftedImages = new WeakSet();
visit(richText, (node, index, parents) => {
if (
!node ||
node.nodeType !== 'embedded-asset-block' ||
liftedImages.has(node) ||
parents.length === 1 // is a top level asset
) {
return;
}
const imgParent = parents[parents.length - 1];
imgParent.content.splice(index, 1);
let i = parents.length;
let splitChildrenIndex = index;
const contentAfterSplitPoint = [];
while (--i > 0) {
const parent = parents[i];
const parentsParent = parents[i - 1];
contentAfterSplitPoint = parent.content.splice(splitChildrenIndex);
splitChildrenIndex = parentsParent.content.indexOf(parent);
let nodeInserted = false;
if (i === 1) {
splitChildrenIndex += 1;
parentsParent.content.splice(splitChildrenIndex, 0, node);
liftedImages.add(node);
nodeInserted = true;
}
splitChildrenIndex += 1;
if (contentAfterSplitPoint.length > 0) {
parentsParent.content.splice(splitChildrenIndex, 0, {
...parent,
content: contentAfterSplitPoint,
});
}
// Remove the parent if empty
if (parent.content.length === 0) {
splitChildrenIndex -= 1;
parentsParent.content.splice(
nodeInserted ? splitChildrenIndex - 1 : splitChildrenIndex,
1,
);
}
}
});
}
```
</details>
## License
MIT