saxen
Version:
A tiny, super fast, namespace aware sax-style XML parser written in plain JavaScript
146 lines (96 loc) • 4.45 kB
Markdown
# `/saxen/` parser
[](https://github.com/nikku/saxen/actions?query=workflow%3ACI)
[](https://codecov.io/gh/nikku/saxen)
A tiny, super fast, namespace aware [sax-style](https://en.wikipedia.org/wiki/Simple_API_for_XML) XML parser written in plain JavaScript.
## Features
* (optional) entity decoding and attribute parsing
* (optional) namespace aware
* element / attribute normalization in namespaced mode
* tiny (`2.6Kb` minified + gzipped)
* [pretty damn fast](https://github.com/nikku/js-sax-parser-tests)
## Usage
```javascript
var {
Parser
} = require('saxen');
var parser = new Parser();
// enable namespace parsing: element prefixes will
// automatically adjusted to the ones configured here
// elements in other namespaces will still be processed
parser.ns({
'http://foo': 'foo',
'http://bar': 'bar'
});
parser.on('openTag', function(elementName, attrGetter, decodeEntities, selfClosing, getContext) {
elementName;
// with prefix, i.e. foo:blub
var attrs = attrGetter();
// { 'bar:aa': 'A', ... }
});
parser.parse('<blub xmlns="http://foo" xmlns:bar="http://bar" bar:aa="A" />');
```
## Supported Hooks
We support the following parse hooks:
* `openTag(elementName, attrGetter, decodeEntities, selfClosing, contextGetter)`
* `closeTag(elementName, decodeEntities, selfClosing, contextGetter)`
* `error(err, contextGetter)`
* `warn(warning, contextGetter)`
* `text(value, decodeEntities, contextGetter)`
* `cdata(value, contextGetter)`
* `comment(value, decodeEntities, contextGetter)`
* `attention(str, decodeEntities, contextGetter)`
* `question(str, contextGetter)`
In contrast to `error`, `warn` receives recoverable errors, such as malformed attributes.
In [proxy mode](#proxy-mode), `openTag` and `closeTag` a view of the current element replaces the raw element name. In addition element attributes are not passed as a getter to `openTag`. Instead, they get exposed via the `element.attrs`:
* `openTag(element, decodeEntities, selfClosing, contextGetter)`
* `closeTag(element, selfClosing, contextGetter)`
## Namespace Handling
In namespace mode, the parser will adjust tag and attribute namespace prefixes before
passing the elements name to `openTag` or `closeTag`. To do that, you need to
configure default prefixes for wellknown namespaces:
```javascript
parser.ns({
'http://foo': 'foo',
'http://bar': 'bar'
});
```
To skip the adjustment and still process namespace information:
```javascript
parser.ns();
```
## Proxy Mode
In this mode, the first argument passed to `openTag` and `closeTag` is an object that exposes more internal XML parse state. This needs to be explicity enabled by instantiating the parser with `{ proxy: true }`.
```javascript
// instantiate parser with proxy=true
var parser = new Parser({ proxy: true });
parser.ns({
'http://foo-ns': 'foo'
});
parser.on('openTag', function(el, decodeEntities, selfClosing, getContext) {
el.originalName; // root
el.name; // foo:root
el.attrs; // { 'xmlns:foo': ..., id: '1' }
el.ns; // { xmlns: 'foo', foo: 'foo', foo$uri: 'http://foo-ns' }
});
parser.parse('<root xmlns:foo="http://foo-ns" id="1" />')
```
Proxy mode comes with a performance penelty of roughly five percent.
__Caution!__ For performance reasons the exposed element is a simple view into the current parser state. Because of that, it will change with the parser advancing and cannot be cached. If you would like to retain a persistent copy of the values, create a shallow clone:
```javascript
parser.on('openTag', function(el) {
var copy = Object.assign({}, el);
// copy, ready to keep around
});
```
## Non-Features
`/saxen/` lacks some features known in other XML parsers such as [sax-js](https://github.com/isaacs/sax-js):
* no support for parsing loose documents, such as arbitrary HTML snippets
* no support for text trimming
* no automatic entity decoding
* no automatic attribute parsing
...and that is ok ❤.
## Credits
We build on the awesome work done by [easysax](https://github.com/vflash/easysax).
`/saxen/` is named after [Sachsen](https://en.wikipedia.org/wiki/Saxony), a federal state of Germany. So geht sächsisch!
## LICENSE
MIT