# StAX-XML
---
## English
A high-performance, pull-based XML parser for JavaScript/TypeScript inspired by Java's StAX (Streaming API for XML). It offers both **fully asynchronous, stream-based parsing** for large files and **synchronous parsing** for smaller, in-memory XML documents. Unlike traditional XML-to-JSON mappers, StAX-XML allows you to map XML data to any custom structure you desire while efficiently handling XML files through streaming or direct string processing.
### Features
- **Fully Asynchronous (Stream-based)**: For memory-efficient processing of large XML files.
- **Synchronous (String-based)**: For high-performance parsing of smaller, in-memory XML strings.
- **Pull-based Parsing**: Your code iterates over parser events and requests the next one when it is ready, rather than receiving callbacks.
- **Custom Mapping**: Map XML data to any structure you want, not just plain JSON objects.
- **High Performance**: Optimized for speed and low memory usage.
- **Universal Compatibility**: Works in Node.js, Bun, Deno, and web browsers using only Web Standard APIs.
- **Namespace Support**: Basic XML namespace handling.
- **Entity Support**: Built-in entity decoding with custom entity support.
- **TypeScript Ready**: Full TypeScript support with comprehensive type definitions.
### Installation
```bash
# npm
npm install stax-xml
# yarn
yarn add stax-xml
# pnpm
pnpm add stax-xml
# bun
bun add stax-xml
# deno
deno add npm:stax-xml
```
### Quick Start
Here are basic examples to get started. For detailed usage and API references, please refer to the dedicated documentation files:
- [**StaxXmlParser (Asynchronous)**](docs/StaxXmlParser.md): For parsing XML from `ReadableStream`.
- [**StaxXmlParserSync (Synchronous)**](docs/StaxXmlParserSync.md): For parsing XML from `string`.
- [**StaxXmlWriter**](docs/StaxXmlWriter.md): For writing XML to `string`.
#### Basic Asynchronous Parsing (StaxXmlParser)
```typescript
import { StaxXmlParser } from 'stax-xml';

const xmlContent = '<root><item>Hello</item></root>';

// Wrap the XML string in a ReadableStream of UTF-8 bytes.
const stream = new ReadableStream({
  start(controller) {
    controller.enqueue(new TextEncoder().encode(xmlContent));
    controller.close();
  }
});

async function parseXml() {
  const parser = new StaxXmlParser(stream);
  // Pull events one at a time as the stream is consumed.
  for await (const event of parser) {
    console.log(event);
  }
}

parseXml();
```
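The example above enqueues the whole document as a single chunk. When experimenting with streaming input, it can be useful to split the bytes into several chunks instead. The following is a minimal sketch that relies only on Web Standard APIs; `stringToStream` and `readAll` are hypothetical helper names chosen for illustration and are not part of stax-xml.

```typescript
// Hypothetical helper (not part of stax-xml): expose a string as a
// ReadableStream of UTF-8 bytes, split into chunks of at most `chunkSize` bytes.
function stringToStream(text: string, chunkSize = 16): ReadableStream<Uint8Array> {
  if (chunkSize < 1) throw new RangeError('chunkSize must be at least 1');
  const bytes = new TextEncoder().encode(text);
  let offset = 0;
  return new ReadableStream<Uint8Array>({
    pull(controller) {
      if (offset >= bytes.length) {
        controller.close();
        return;
      }
      controller.enqueue(bytes.slice(offset, offset + chunkSize));
      offset += chunkSize;
    },
  });
}

// Hypothetical helper: read a byte stream back into a string.
// `stream: true` lets the decoder carry multi-byte characters across chunks.
async function readAll(stream: ReadableStream<Uint8Array>): Promise<string> {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let out = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    out += decoder.decode(value, { stream: true });
  }
  return out + decoder.decode(); // flush any bytes still buffered
}
```

A stream built this way can be used in place of the single-chunk stream above, for example `new StaxXmlParser(stringToStream(xmlContent))`.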
#### Basic Synchronous Parsing (StaxXmlParserSync)
```typescript
import { StaxXmlParserSync } from 'stax-xml';

const xmlContent = '<data><value>123</value></data>';

// The synchronous parser takes the XML string directly.
const parser = new StaxXmlParserSync(xmlContent);
for (const event of parser) {
  console.log(event);
}
```
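Because the parser is consumed as an iterator of events, mapping XML to a custom structure amounts to folding over those events. The sketch below shows the idea without depending on the library: `SketchEvent` and `sketchEvents` are stand-ins invented for this illustration, and the real event objects emitted by stax-xml may have different field names (see docs/StaxXmlParserSync.md).

```typescript
// Stand-in event shape for illustration only; stax-xml's real event
// objects are described in docs/StaxXmlParserSync.md and may differ.
type SketchEvent =
  | { kind: 'start'; name: string }
  | { kind: 'text'; value: string }
  | { kind: 'end'; name: string };

// Stand-in for a parser: roughly the events that
// '<data><value>123</value><value>456</value></data>' would produce.
function* sketchEvents(): Generator<SketchEvent> {
  yield { kind: 'start', name: 'data' };
  yield { kind: 'start', name: 'value' };
  yield { kind: 'text', value: '123' };
  yield { kind: 'end', name: 'value' };
  yield { kind: 'start', name: 'value' };
  yield { kind: 'text', value: '456' };
  yield { kind: 'end', name: 'value' };
  yield { kind: 'end', name: 'data' };
}

// Pull events one at a time and build a number[] rather than a generic tree.
function collectValues(events: Iterable<SketchEvent>): number[] {
  const values: number[] = [];
  let insideValue = false;
  let text = '';
  for (const event of events) {
    if (event.kind === 'start' && event.name === 'value') {
      insideValue = true;
      text = '';
    } else if (event.kind === 'text' && insideValue) {
      text += event.value; // text may arrive in several pieces
    } else if (event.kind === 'end' && event.name === 'value') {
      values.push(Number(text));
      insideValue = false;
    }
  }
  return values;
}

console.log(collectValues(sketchEvents())); // [ 123, 456 ]
```

With the real parser, the same loop structure would apply; only the checks on each event would change to match the library's event types.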
### Platform Compatibility
StAX-XML uses only Web Standard APIs, making it compatible with:
- **Node.js** (v18+)
- **Bun** (any version)
- **Deno** (any version)
- **Web Browsers** (modern browsers)
- **Edge Runtime** (Vercel, Cloudflare Workers, etc.)
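
To check whether a given runtime exposes the Web Standard APIs, a small feature-detection helper can be used. This is a sketch under an assumption: the list of globals below is inferred from the Quick Start examples (`ReadableStream`, `TextEncoder`) plus `TextDecoder`, and is not an official requirements list for stax-xml. `missingWebApis` is a hypothetical name.

```typescript
// Assumed set of globals, inferred from the Quick Start examples;
// this is not an official list of what stax-xml requires.
const requiredGlobals = ['ReadableStream', 'TextEncoder', 'TextDecoder'] as const;

// Hypothetical helper: return the names from `requiredGlobals` that are
// undefined on `scope` (defaults to the current global object).
function missingWebApis(
  scope: Record<string, unknown> = globalThis as unknown as Record<string, unknown>
): string[] {
  return requiredGlobals.filter((name) => typeof scope[name] === 'undefined');
}

const missing = missingWebApis();
if (missing.length > 0) {
  console.warn(`Missing Web Standard APIs: ${missing.join(', ')}`);
}
```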
### Testing
```bash
bun test
```
#### Benchmark Results
**Disclaimer:** These benchmarks were performed on a specific system (`cpu: 13th Gen Intel(R) Core(TM) i5-13600K`, `runtime: node 22.17.0 (x64-win32)`) and may vary on different hardware and environments.
**large.xml (97MB) parsing**
| Benchmark           | avg (min … max) | p75 / p99       | Memory (avg) |
| :------------------ | :-------------- | :-------------- | :----------- |
| stax-xml to object | 4.36 s/iter | 4.42 s | 2.66 mb |
| stax-xml consume | 3.61 s/iter | 3.65 s | 3.13 mb |
| xml2js | 6.00 s/iter | 6.00 s | 1.80 mb |
| fast-xml-parser | 4.25 s/iter | 4.26 s | 151.81 mb |
| txml | 1.05 s/iter | 1.06 s | 179.81 mb |
**midsize.xml (13MB) parsing**
| Benchmark           | avg (min … max) | p75 / p99       | Memory (avg) |
| :------------------ | :-------------- | :-------------- | :----------- |
| stax-xml to object  | 492.06 ms/iter  | 493.28 ms       | 326.28 kb    |
| stax-xml consume    | 469.66 ms/iter  | 471.54 ms       | 174.51 kb    |
| xml2js              | 163.26 µs/iter  | 161.20 µs       | 89.89 kb     |
| fast-xml-parser     | 529.99 ms/iter  | 531.12 ms       | 1.92 mb      |
| txml                | 112.81 ms/iter  | 113.26 ms       | 1.00 mb      |
**complex.xml (2KB) parsing**
| Benchmark           | avg (min … max) | p75 / p99       | Memory (avg) |
| :------------------ | :-------------- | :-------------- | :----------- |
| stax-xml to object  | 85.79 µs/iter   | 75.60 µs        | 105.11 kb    |
| stax-xml consume    | 50.38 µs/iter   | 49.43 µs        | 271.12 b     |
| xml2js              | 147.45 µs/iter  | 153.50 µs       | 89.42 kb     |
| fast-xml-parser     | 101.11 µs/iter  | 102.20 µs       | 92.92 kb     |
| txml                | 9.40 µs/iter    | 9.41 µs         | 125.89 b     |
**books.xml (4KB) parsing**
| Benchmark           | avg (min … max) | p75 / p99       | Memory (avg) |
| :------------------ | :-------------- | :-------------- | :----------- |
| stax-xml to object  | 166.73 µs/iter  | 156.20 µs       | 221.40 kb    |
| stax-xml consume    | 176.45 µs/iter  | 151.70 µs       | 202.08 kb    |
| xml2js              | 259.90 µs/iter  | 254.50 µs       | 161.25 kb    |
| fast-xml-parser     | 239.57 µs/iter  | 203.30 µs       | 226.17 kb    |
| txml                | 19.18 µs/iter   | 19.26 µs        | 303.13 b     |
### Sample File Sources
Sources of sample XML files used in testing:
- `books.xml`: [Microsoft XML Document Examples](https://learn.microsoft.com/en-us/previous-versions/windows/desktop/ms762271(v=vs.85))
- `simple-namespace.xml`: [W3Schools XML Namespaces Guide](https://www.w3schools.com/xml/xml_namespaces.asp)
- `treebank_e.xml`: [University of Washington XML Data Repository](https://aiweb.cs.washington.edu/research/projects/xmltk/xmldata/www/repository.html)
### License
MIT
### Contributing
Contributions are welcome! Please feel free to submit a Pull Request.