# @ioris/parser-ttml
[npm version](https://badge.fury.io/js/@ioris%2Fparser-ttml)
[License: MIT](https://opensource.org/licenses/MIT)
A TypeScript library for parsing TTML (Timed Text Markup Language) documents into synchronized lyric structures for music applications. Part of the @ioris ecosystem for music lyric synchronization.
## Features
- 🎵 **TTML Parsing** - Parse TTML documents with timing information
- ⏱️ **Dual Timing Support** - Handle both line-level and word-level timing synchronization
- 🎌 **Japanese Support** - Optional Japanese text tokenization via @ioris/tokenizer-kuromoji
- 🔧 **TypeScript** - Full TypeScript support with type definitions
- 🚀 **Modern Build** - ESM/CommonJS dual build with tree-shaking support
- 🧪 **Well Tested** - Comprehensive test suite with Vitest
## Installation
```bash
npm install @ioris/parser-ttml
```
For Japanese tokenization support:
```bash
npm install @ioris/parser-ttml @ioris/tokenizer-kuromoji
```
## Quick Start
### Basic Usage
```typescript
import { TTMLParser } from '@ioris/parser-ttml';

// Create parser instance
const parser = new TTMLParser();

// ttmlContent is the TTML markup as a string (e.g. read from a file or fetched).
// DOMParser is a browser global; in Node.js a DOM implementation such as jsdom is needed.
const ttmlDocument = new DOMParser().parseFromString(ttmlContent, 'text/xml');

// Parse TTML document
const lyric = await parser.parse(ttmlDocument, 'song-id');

console.log('Duration:', lyric.duration);
console.log('Paragraphs:', lyric.paragraphs.length);
```
### With Japanese Tokenization
```typescript
import { TTMLParser } from '@ioris/parser-ttml';
import { LineArgsTokenizer } from '@ioris/tokenizer-kuromoji';
import { builder } from 'kuromoji';

// Setup Kuromoji tokenizer
const kuromojiBuilder = builder({
  dicPath: './node_modules/kuromoji/dict'
});

const tokenizer = await new Promise((resolve, reject) => {
  kuromojiBuilder.build((err, tokenizer) => {
    if (err) reject(err);
    else resolve(tokenizer);
  });
});

// Create parser with Japanese tokenization
const parser = new TTMLParser({
  lineTokenizer: (lineArgs) => LineArgsTokenizer({
    lineArgs,
    tokenizer,
  }),
  offsetSec: 0.5 // Optional timing offset
});

// ttmlDocument is an XMLDocument, created as in Basic Usage
const lyric = await parser.parse(ttmlDocument, 'japanese-song');
```
### Advanced Configuration
```typescript
// customLineTokenizer and customParagraphTokenizer are functions you supply
const parser = new TTMLParser({
  lineTokenizer: customLineTokenizer,
  paragraphTokenizer: customParagraphTokenizer,
  offsetSec: 1.0 // Add 1 second offset to all timings
});
```
## API Reference
### TTMLParser
The main parser class for processing TTML documents.
#### Constructor
```typescript
new TTMLParser(options?: {
  lineTokenizer?: CreateLyricArgs["lineTokenizer"];
  paragraphTokenizer?: CreateLyricArgs["paragraphTokenizer"];
  offsetSec?: number;
})
```
**Parameters:**
- `lineTokenizer` (optional) - Custom tokenizer for processing line content
- `paragraphTokenizer` (optional) - Custom tokenizer for processing paragraph content
- `offsetSec` (optional) - Time offset in seconds to apply to all timing values
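
The effect of `offsetSec` can be pictured as a uniform shift of every begin and end time. The following is only an illustrative sketch: `TimeRange` and `applyOffset` are hypothetical names, not part of the @ioris/parser-ttml API, and how the library handles edge cases (for example, negative results) is not specified here.

```typescript
// Hypothetical shape for a timed item; not a type exported by the library.
interface TimeRange {
  begin: number; // seconds
  end: number; // seconds
}

// Return new ranges with the same offset added to both ends of each range.
function applyOffset(ranges: TimeRange[], offsetSec: number): TimeRange[] {
  return ranges.map(({ begin, end }) => ({
    begin: begin + offsetSec,
    end: end + offsetSec,
  }));
}

const shifted = applyOffset([{ begin: 10, end: 15 }], 1.0);
console.log(shifted); // [ { begin: 11, end: 16 } ]
```

Because both ends move by the same amount, the duration of each item is unchanged; only its position on the timeline shifts.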
#### Methods
##### `parse(ttml: XMLDocument, resourceID: string): Promise<Lyric>`
Parses a TTML document and returns a structured Lyric object.
**Parameters:**
- `ttml` - The XML document containing TTML content
- `resourceID` - Unique identifier for the lyric resource
**Returns:**
- `Promise<Lyric>` - A promise that resolves to a structured lyric object with timing information
## TTML Format Support
This library supports TTML documents with the following structure:
### Supported Elements
- `<tt>` - Root element with timing attributes
- `<body>` - Container with duration information
- `<div>` - Paragraph groupings with timing
- `<p>` - Individual lines with timing
- `<span>` - Word-level timing (for detailed synchronization)
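
Whether a line carries word-level timing can be read off the structure above: a `<p>` whose `<span>` children have a `begin` attribute is word-timed, otherwise only the line-level timing on the `<p>` itself is available. The sketch below illustrates that check. `hasWordTiming` is a hypothetical helper, not the library's own logic, and it is typed structurally so it accepts a real DOM `Element` as well as a simple stub.

```typescript
// Minimal structural types so the helper works with DOM elements or test stubs.
interface AttrNode {
  getAttribute(name: string): string | null;
}

interface SpanContainer {
  getElementsByTagName(name: string): ArrayLike<AttrNode>;
}

// True when at least one <span> inside the given <p> has a `begin` attribute.
function hasWordTiming(p: SpanContainer): boolean {
  const spans = p.getElementsByTagName("span");
  for (let i = 0; i < spans.length; i++) {
    const span = spans[i];
    if (span && span.getAttribute("begin") !== null) {
      return true;
    }
  }
  return false;
}
```

In a browser this could be called as `hasWordTiming(ttmlDocument.getElementsByTagName("p")[0])`, assuming the document contains at least one `<p>`.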
### Timing Attributes
- `begin` - Start time (plain seconds such as `9.883`, or colon-separated clock values such as `1:48.678` / HH:MM:SS)
- `end` - End time (same formats as `begin`)
- `dur` - Duration (on body element)
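
Both notations can be normalized to seconds with a small helper. The sketch below is illustrative only: `parseTimeToSeconds` is a hypothetical name, not part of the @ioris/parser-ttml API, and it assumes clock values are colon-separated with at most three fields (hours, minutes, seconds), as in the `1:48.678` and `3:22.827` values used in this README.

```typescript
// Convert "9.883", "1:48.678" or "01:02:03" into a number of seconds.
function parseTimeToSeconds(value: string): number {
  const parts = value.trim().split(":");
  if (parts.length > 3) {
    throw new Error(`Invalid time expression: ${value}`);
  }

  let seconds = 0;
  for (const part of parts) {
    const n = Number(part);
    // Reject empty fields ("1::2") and non-numeric fields ("abc").
    if (part.trim() === "" || Number.isNaN(n)) {
      throw new Error(`Invalid time expression: ${value}`);
    }
    // Each additional field to the right is worth 1/60 of the previous one.
    seconds = seconds * 60 + n;
  }
  return seconds;
}

console.log(parseTimeToSeconds("01:02:03")); // 3723
console.log(parseTimeToSeconds("1:48.678")); // roughly 108.678
```

Accumulating from left to right means the same loop handles one, two, or three fields without separate branches.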
### Example TTML Structure
```xml
<tt xmlns="http://www.w3.org/ns/ttml" timing="Word">
  <body dur="3:22.827">
    <div begin="9.883" end="1:48.678">
      <p begin="9.883" end="15.323">
        <span begin="9.883" end="11.241">θΈγΏγ€γ</span>
        <span begin="11.241" end="11.616">γ</span>
        <span begin="11.616" end="11.946">γ</span>
        <span begin="11.946" end="12.229">γ</span>
      </p>
    </div>
  </body>
</tt>
```
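
Word-level timings like the `<span>` values above are what make per-word highlighting possible during playback. The sketch below shows one way to look up the active word for a playback position. It assumes the spans have already been converted to plain objects with numeric seconds; `TimedWord` and `findActiveWord` are hypothetical names and do not reflect the `Lyric` structure returned by the parser, and the `text` values are placeholders.

```typescript
// Hypothetical flattened representation of one <span>; not a library type.
interface TimedWord {
  text: string;
  begin: number; // seconds
  end: number; // seconds
}

// Return the word whose [begin, end) interval contains the playback position.
function findActiveWord(words: TimedWord[], positionSec: number): TimedWord | undefined {
  return words.find((w) => positionSec >= w.begin && positionSec < w.end);
}

// Timings taken from the example document; the text values are placeholders.
const words: TimedWord[] = [
  { text: "word-1", begin: 9.883, end: 11.241 },
  { text: "word-2", begin: 11.241, end: 11.616 },
  { text: "word-3", begin: 11.616, end: 11.946 },
  { text: "word-4", begin: 11.946, end: 12.229 },
];

console.log(findActiveWord(words, 11.5)?.text); // "word-2"
```

Treating each interval as half-open means a position exactly on a shared boundary, such as `11.241`, resolves to the word that is starting rather than the one that is ending.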
## Integration with @ioris Ecosystem
This library is designed to work seamlessly with other @ioris packages:
- **[@ioris/core](https://www.npmjs.com/package/@ioris/core)** - Core lyric structures and utilities
- **[@ioris/tokenizer-kuromoji](https://www.npmjs.com/package/@ioris/tokenizer-kuromoji)** - Japanese text tokenization
## Development
### Prerequisites
- Node.js 16+
- npm or yarn
### Setup
```bash
# Clone the repository
git clone https://github.com/8beeeaaat/ioris_parser_ttml.git
cd ioris_parser_ttml
# Install dependencies
npm install
# Run tests
npm test
# Build the project
npm run build
# Format code
npm run format
# Lint code
npm run lint
```
### Project Structure
```text
src/
├── index.ts                    # Main exports
├── Parser.TTMLParser.ts        # TTML parser implementation
└── Parser.TTMLParser.test.ts   # Test suite
```
### Testing
The project uses Vitest for testing, with jsdom providing DOM and XML parsing in the test environment:
```bash
npm test # Run tests
npm run test:watch # Run tests in watch mode
```
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Changelog
See [CHANGELOG.md](CHANGELOG.md) for a detailed history of changes.
## Related Projects
- [@ioris/core](https://github.com/8beeeaaat/ioris_core) - Core lyric synchronization library
- [@ioris/tokenizer-kuromoji](https://github.com/8beeeaaat/ioris_tokenizer_kuromoji) - Japanese tokenization support
---
Made with ❤️ by [8beeeaaat](https://github.com/8beeeaaat)