UNPKG

parsoid-jsapi

Version:

Parsoid JSAPI

173 lines (144 loc) 7.38 kB
Parsoid JSAPI ============= [![Build Status](https://travis-ci.org/wikimedia/parsoid-jsapi.svg?branch=master)](https://travis-ci.org/wikimedia/parsoid-jsapi) Usage of the JavaScript API --------------------------- This file describes usage of Parsoid as a standalone wikitext parsing package, in the spirit of [`mwparserfromhell`]. This is not the typical use case for Parsoid; it is more often used as a network service. See [the HTTP API guide](#!/guide/apiuse) or [Parsoid service] on the wiki for more details. These examples will use the [`prfun`] library and [ES6 generators] in order to fluently express asynchronous operations. The library also exports vanilla [`Promise`]s if you wish to maintain compatibility with old versions of `node` at the cost of a little bit of readability. Since many methods in the API return [`Promise`]s, we've also provided a [`Promise`]-aware REPL, that will wait for a promise to be resolved before printing its value. This can be started from the shell using: node -e 'require("parsoid-jsapi").repl()' Use `"./"` instead of `"parsoid"` if you are running this from a checked-out repository. Code examples below which contain lines starting with `>` show sessions using this REPL. (You may also wish to look in `tests/mocha/jsapi.js` for examples using a more traditional promise-chaining style.) Use of Parsoid as a wikitext parser is straightforward (where `text` is wikitext input): #/usr/bin/node --harmony-generators var Promise = require('prfun'); var Parsoid = require('parsoid-jsapi'); var main = Promise.async(function*() { var text = "I love wikitext!"; var pdoc = yield Parsoid.parse(text, { pdoc: true }); console.log(pdoc.document.outerHTML); }); // start me up! main().done(); As you can see, there is a little bit of boilerplate needed to get the asynchronous machinery started. The body of the `main()` method can be replaced with your code. The `pdoc` variable above holds a [`PDoc`] object, which has helpful methods to filter and manipulate the document. If you want to access the raw [Parsoid DOM], however, it is easily accessible via the [`document`](#!/api/PDoc-property-document) property, as shown above, and all normal DOM manipulation functions can be used on it (Parsoid uses [`domino`] to implement these methods). Be sure to call [`update()`](#!/api/PNode-method-update) after any direct DOM manipulation. [`PDoc`] is a subclass of [`PNodeList`], which provides a number of useful access and mutation methods -- and if you use these you won't need to manually call `update()`. These provided methods can be quite useful. For example: > var text = "I has a template! {{foo|bar|baz|eggs=spam}} See it?\n"; > var pdoc = yield Parsoid.parse(text, { pdoc: true }); > console.log(yield pdoc.toWikitext()); I has a template! {{foo|bar|baz|eggs=spam}} See it? > var templates = pdoc.filterTemplates(); > console.log(yield Promise.map(templates, Parsoid.toWikitext)); [ '{{foo|bar|baz|eggs=spam}}' ] > var template = templates[0]; > console.log(template.name); foo > template.name = 'notfoo'; > console.log(yield template.toWikitext()); {{notfoo|bar|baz|eggs=spam}} > console.log(template.params.map(function(p) { return p.name; })); [ '1', '2', 'eggs' ] > console.log(yield template.get(1).value.toWikitext()); bar > console.log(yield template.get("eggs").value.toWikitext()); spam Getting nested templates is trivial: > var text = "{{foo|bar={{baz|{{spam}}}}}}"; > var pdoc = yield Parsoid.parse(text, { pdoc: true }); > console.log(yield Promise.map(pdoc.filterTemplates(), Parsoid.toWikitext)); [ '{{foo|bar={{baz|{{spam}}}}}}', '{{baz|{{spam}}}}', '{{spam}}' ] You can also pass `{ recursive: false }` to [`filterTemplates()`](#!/api/PNodeList-method-filterTemplates) and explore templates manually. This is possible because the [`get`](#!/api/PTemplate-method-get) method on a [`PTemplate`] object returns an object containing further [`PNodeList`]s: > var text = "{{foo|this {{includes a|template}}}}"; > var pdoc = yield Parsoid.parse(text, { pdoc: true }); > var templates = pdoc.filterTemplates({ recursive: false }); > console.log(yield Promise.map(templates, Parsoid.toWikitext)); [ '{{foo|this {{includes a|template}}}}' ] > var foo = templates[0]; > console.log(yield foo.get(1).value.toWikitext()); this {{includes a|template}} > var more = foo.get(1).value.filterTemplates(); > console.log(yield Promise.map(more, Parsoid.toWikitext)); [ '{{includes a|template}}' ] > console.log(yield more[0].get(1).value.toWikitext()); template Templates can be easily modified to add, remove, or alter params. Templates also have a [`nameMatches()`](#!/api/PTemplate-method-nameMatches) method for comparing template names, which takes care of capitalization and white space: > var text = "{{cleanup}} '''Foo''' is a [[bar]]. {{uncategorized}}"; > var pdoc = yield Parsoid.parse(text, { pdoc: true }); > pdoc.filterTemplates().forEach(function(template) { ... if (template.nameMatches('Cleanup') && !template.has('date')) { ... template.add('date', 'July 2012'); ... } ... if (template.nameMatches('uncategorized')) { ... template.name = 'bar-stub'; ... } ... }); > console.log(yield pdoc.toWikitext()); {{cleanup|date = July 2012}} '''Foo''' is a [[bar]]. {{bar-stub}} At any time you can convert the `pdoc` into HTML conforming to the [MediaWiki DOM spec] (by referencing the [`document`](#!/api/PDoc-property-document) property) or into wikitext (by invoking [`toWikitext()`](#!/api/PNodeList-method-toWikitext), which returns a [`Promise`] for the wikitext string). This allows you to save the page using either standard API methods or the RESTBase API (once [T101501](https://phabricator.wikimedia.org/T101501) is resolved). For more tips, check out [PNodeList's full method list](#!/api/PNodeList) and the list of [PNode](#!/api/PNode) subclasses. [`mwparserfromhell`]: http://mwparserfromhell.readthedocs.org/en/latest/index.html [Parsoid service]: https://www.mediawiki.org/wiki/Parsoid [`prfun`]: https://github.com/cscott/prfun [ES6 generators]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/function* [`Promise`]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise [Parsoid DOM]: http://www.mediawiki.org/wiki/Parsoid/MediaWiki_DOM_spec [MediaWiki DOM spec]: http://www.mediawiki.org/wiki/Parsoid/MediaWiki_DOM_spec [`domino`]: https://www.npmjs.com/package/domino [`PDoc`]: #!/api/PDoc [`PNodeList`]: #!/api/PNodeList [`PTemplate`]: #!/api/PTemplate License ------- Copyright (c) 2011-2015 Wikimedia Foundation and others. This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.