# @ioris/parser-ttml

[![npm version](https://badge.fury.io/js/@ioris%2Fparser-ttml.svg)](https://badge.fury.io/js/@ioris%2Fparser-ttml) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

A TypeScript library for parsing TTML (Timed Text Markup Language) documents into synchronized lyric structures for music applications. Part of the @ioris ecosystem for music lyric synchronization.

## Features

- 🎵 **TTML Parsing** - Parse TTML documents with timing information
- ⏱️ **Dual Timing Support** - Handle both line-level and word-level timing synchronization
- 🌐 **Japanese Support** - Optional Japanese text tokenization via @ioris/tokenizer-kuromoji
- 🔧 **TypeScript** - Full TypeScript support with type definitions
- 🚀 **Modern Build** - ESM/CommonJS dual build with tree-shaking support
- 🧪 **Well Tested** - Comprehensive test suite with Vitest

## Installation

```bash
npm install @ioris/parser-ttml
```

For Japanese tokenization support:

```bash
npm install @ioris/parser-ttml @ioris/tokenizer-kuromoji
```

## Quick Start

### Basic Usage

```typescript
import { TTMLParser } from '@ioris/parser-ttml';

// Create parser instance
const parser = new TTMLParser();

// Parse TTML document
const ttmlDocument = new DOMParser().parseFromString(ttmlContent, 'text/xml');
const lyric = await parser.parse(ttmlDocument, 'song-id');

console.log('Duration:', lyric.duration);
console.log('Paragraphs:', lyric.paragraphs.length);
```

### With Japanese Tokenization

```typescript
import { TTMLParser } from '@ioris/parser-ttml';
import { LineArgsTokenizer } from '@ioris/tokenizer-kuromoji';
import { builder } from 'kuromoji';

// Setup Kuromoji tokenizer
const kuromojiBuilder = builder({
  dicPath: './node_modules/kuromoji/dict'
});

const tokenizer = await new Promise((resolve, reject) => {
  kuromojiBuilder.build((err, tokenizer) => {
    if (err) reject(err);
    else resolve(tokenizer);
  });
});

// Create parser with Japanese tokenization
const parser = new TTMLParser({
  lineTokenizer: (lineArgs) => LineArgsTokenizer({
    lineArgs,
    tokenizer,
  }),
  offsetSec: 0.5 // Optional timing offset
});

const lyric = await parser.parse(ttmlDocument, 'japanese-song');
```

### Advanced Configuration

```typescript
const parser = new TTMLParser({
  lineTokenizer: customLineTokenizer,
  paragraphTokenizer: customParagraphTokenizer,
  offsetSec: 1.0 // Add 1 second offset to all timings
});
```

## API Reference

### TTMLParser

The main parser class for processing TTML documents.

#### Constructor

```typescript
new TTMLParser(options?: {
  lineTokenizer?: CreateLyricArgs["lineTokenizer"];
  paragraphTokenizer?: CreateLyricArgs["paragraphTokenizer"];
  offsetSec?: number;
})
```

**Parameters:**

- `lineTokenizer` (optional) - Custom tokenizer for processing line content
- `paragraphTokenizer` (optional) - Custom tokenizer for processing paragraph content
- `offsetSec` (optional) - Time offset in seconds to apply to all timing values

#### Methods

##### `parse(ttml: XMLDocument, resourceID: string): Promise<Lyric>`

Parses a TTML document and returns a structured Lyric object.


**Parameters:**

- `ttml` - The XML document containing TTML content
- `resourceID` - Unique identifier for the lyric resource

**Returns:**

- `Promise<Lyric>` - A promise that resolves to a structured lyric object with timing information

## TTML Format Support

This library supports TTML documents with the following structure:

### Supported Elements

- `<tt>` - Root element with timing attributes
- `<body>` - Container with duration information
- `<div>` - Paragraph groupings with timing
- `<p>` - Individual lines with timing
- `<span>` - Word-level timing (for detailed synchronization)

### Timing Attributes

- `begin` - Start time (supports seconds or HH:MM:SS format)
- `end` - End time (supports seconds or HH:MM:SS format)
- `dur` - Duration (on body element)

### Example TTML Structure

```xml
<tt xmlns="http://www.w3.org/ns/ttml" timing="Word">
  <body dur="3:22.827">
    <div begin="9.883" end="1:48.678">
      <p begin="9.883" end="15.323">
        <span begin="9.883" end="11.241">踏みつけ</span>
        <span begin="11.241" end="11.616">ら</span>
        <span begin="11.616" end="11.946">れ</span>
        <span begin="11.946" end="12.229">た</span>
      </p>
    </div>
  </body>
</tt>
```

## Integration with @ioris Ecosystem

This library is designed to work seamlessly with other @ioris packages:

- **[@ioris/core](https://www.npmjs.com/package/@ioris/core)** - Core lyric structures and utilities
- **[@ioris/tokenizer-kuromoji](https://www.npmjs.com/package/@ioris/tokenizer-kuromoji)** - Japanese text tokenization

## Development

### Prerequisites

- Node.js 16+
- npm or yarn

### Setup

```bash
# Clone the repository
git clone https://github.com/8beeeaaat/ioris_parser_ttml.git
cd ioris_parser_ttml

# Install dependencies
npm install

# Run tests
npm test

# Build the project
npm run build

# Format code
npm run format

# Lint code
npm run lint
```

### Project Structure

```text
src/
├── index.ts                    # Main exports
├── Parser.TTMLParser.ts        # TTML parser implementation
└── Parser.TTMLParser.test.ts   # Test suite
```

### Testing

The project uses Vitest for testing with JSDOM for XML parsing simulation:

```bash
npm test              # Run tests
npm run test:watch    # Run tests in watch mode
```

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Changelog

See [CHANGELOG.md](CHANGELOG.md) for a detailed history of changes.

## Related Projects

- [@ioris/core](https://github.com/8beeeaaat/ioris_core) - Core lyric synchronization library
- [@ioris/tokenizer-kuromoji](https://github.com/8beeeaaat/ioris_tokenizer_kuromoji) - Japanese tokenization support

---

Made with ❤️ by [8beeeaaat](https://github.com/8beeeaaat)
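
The timing values described under Timing Attributes appear in the example both as plain seconds (`9.883`) and as clock-style values (`1:48.678`, `3:22.827`). The following is a minimal sketch of how such values could be normalized to seconds and shifted by a constant offset like `offsetSec`. It is not the parser's actual implementation: `parseTtmlTime` and `applyOffset` are hypothetical names that are not exported by this package, and the sketch assumes only the `SS(.fff)`, `MM:SS(.fff)`, and `HH:MM:SS(.fff)` forms. Other TTML time expressions (frames, ticks, or metric suffixes such as `10s`) are out of scope here.

```typescript
// Hypothetical helper, NOT part of @ioris/parser-ttml.
// Converts "9.883", "1:48.678" or "01:02:03.5" into a number of seconds.
function parseTtmlTime(value: string): number {
  const parts = value.trim().split(":");
  const last = parts.length - 1;

  // Allow at most hours:minutes:seconds; only the seconds field may carry a fraction.
  const valid =
    parts.length <= 3 &&
    parts.every((part, i) =>
      (i === last ? /^\d+(\.\d+)?$/ : /^\d+$/).test(part)
    );
  if (!valid) {
    throw new Error(`Unsupported time format: ${value}`);
  }

  // Fold left to right: each colon multiplies the running total by 60.
  return parts.reduce((total, part) => total * 60 + Number(part), 0);
}

// Hypothetical illustration of a constant shift such as `offsetSec`.
function applyOffset(seconds: number, offsetSec = 0): number {
  return seconds + offsetSec;
}

// Usage: normalize a `begin` value, then shift it by half a second.
const begin = parseTtmlTime("1:48.678");
console.log(applyOffset(begin, 0.5));
```

Rejecting unknown formats with an error, rather than returning `NaN`, keeps a malformed `begin` or `end` attribute from silently producing broken timings further down the pipeline.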