# md2hwp
Convert Markdown files to HWP (Hangul Word Processor) format. This library supports HWPX (HWP 5.x), the XML-based format used by Hancom Office.
## Features
- ✅ Convert Markdown to HWP format (HWPX/HWP 5.x)
- ✅ **Verified Working** with Hancom Office 2020+
- ✅ **Professional heading hierarchy** (H1-H6 with graduated font sizes)
- ✅ **Natural line wrapping** - no character spacing compression
- ✅ **Improved line spacing** (140%-160% for readability)
- ✅ **Smart paragraph spacing** - context-aware gaps between content types
- ✅ Support for paragraphs, lists, and mixed content
- ✅ **Nested lists** with proper indentation levels
- ✅ Text formatting: **bold** and *italic*
- ✅ **Bold text in list items** with mixed content support
- ✅ **Tables** with visible borders and proper formatting
- ✅ Korean language support (UTF-8 encoding)
- ✅ Works in Node.js environments
- ✅ TypeScript support with full type definitions
- ✅ Easy integration with React and other frameworks
- 🚧 Code blocks (planned)
- 🚧 Images (currently text placeholders)
## What's New in v1.2.6
### ✅ Fixed Nested List Indentation
**v1.2.6** fixes the indentation issue from v1.2.5 by using pre-defined paragraph properties instead of inline paraPr.
Full support for nested/indented lists with visual hierarchy:
**What Now Works:**
- ✅ **Nested list parsing** - correctly handles multi-level list structures
- ✅ **Visual indentation** - each nesting level is indented by 800 HWPUNIT
- ✅ **Bold text in lists** - supports `**label**: value` pattern in list items
- ✅ **Mixed content** - handles both bold and normal text in same list item
- ✅ **Recursive nesting** - parses lists nested to any depth; visual indentation stops growing once the deepest predefined paragraph property is reached
**Example:**
```markdown
- **총 예산**: 35,000,000원
- **주요 항목**:
  - 해외 연사 항공료 및 숙박: ~10,440,000원
  - 연사비: 3,400,000원
  - 식사 (점심 + 만찬): 4,400,000원
```
**Technical Details:**
- Recursively parses `item.tokens` array from marked.js
- Uses pre-defined paragraph properties in header (not inline paraPr)
- Created 13 paraPr definitions (id="0" to "10", plus "20" and "21") with progressive indentation
- Each nesting level adds 800 HWPUNIT (8 pt, approximately 2.8 mm) of left margin
- Handles text tokens with inline formatting (bold/italic) within lists
- Formula: `paraPrId = Math.min(level + 1, 10)` supports up to 9 levels of nesting
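The traversal described above can be sketched as follows. This is a hypothetical illustration, not md2hwp's source: the token interfaces are a trimmed-down assumption modeled on the list tokens marked.js produces, and `flattenList`, `inlineText`, and `extraIndent` are names invented here. It applies the `Math.min(level + 1, 10)` formula and assumes the left margin grows by 800 HWPUNIT for each paraPr step beyond the top level.

```typescript
// Simplified token shapes modeled on marked's list tokens.
// Assumption: only the fields this sketch reads; real marked tokens carry more.
interface InlineToken {
  type: string;
  text: string;
  tokens?: InlineToken[];
}
interface ListItemToken {
  type: "list_item";
  tokens: Array<InlineToken | ListToken>;
}
interface ListToken {
  type: "list";
  items: ListItemToken[];
}

interface FlatItem {
  level: number;       // 0 = top-level list item
  paraPrId: number;    // predefined paragraph property to reference
  extraIndent: number; // left margin beyond a top-level item, in HWPUNIT
  text: string;
}

const INDENT_STEP = 800; // HWPUNIT per nesting level (8 pt, about 2.8 mm)
const MAX_PARAPR_ID = 10;

// Flatten inline tokens to plain text; bold/italic tokens contribute their inner text.
function inlineText(t: InlineToken): string {
  return t.tokens ? t.tokens.map(inlineText).join("") : t.text;
}

function flattenList(list: ListToken, level = 0, out: FlatItem[] = []): FlatItem[] {
  for (const item of list.items) {
    const paraPrId = Math.min(level + 1, MAX_PARAPR_ID); // deep levels reuse id 10
    const text = item.tokens
      .filter((t): t is InlineToken => t.type !== "list")
      .map(inlineText)
      .join("");
    out.push({ level, paraPrId, extraIndent: (paraPrId - 1) * INDENT_STEP, text });
    // Nested lists become deeper levels, emitted right after their parent item.
    for (const t of item.tokens) {
      if (t.type === "list") flattenList(t as ListToken, level + 1, out);
    }
  }
  return out;
}
```

Emitting a flat sequence keeps the XML writer simple: each entry becomes one paragraph that references a predefined paraPr, which is the approach v1.2.6 describes in place of inline paraPr.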
## What's New in v1.2.4
### ✅ Bold Text Support - Finally Working!
After extensive debugging and analysis of the HWP format, bold text (`**text**`) now works correctly:
**The Problem We Solved:**
1. **Character Property References**: HWP's `charPrIDRef` uses position index, not the `id` attribute value
- Fixed by making all charPr IDs sequential (0-5)
2. **Bold Tag Discovery**: Found that HWP requires `<hh:bold/>` tag, not `<hh:fontweight>`
- Analyzed user-corrected HWP file to discover the correct format
3. **Font References**: Bold text needs different `fontRef` values for non-Latin scripts
**What Now Works:**
- ✅ **Bold inline text** renders at 10pt with bold weight (same size as normal text)
- ✅ **Headings** render at correct sizes (H1: 14pt, H2: 13pt, etc.) with bold
- ✅ Both **English** and **Korean** bold text work correctly
- ✅ No outline boxes or unwanted borders on bold text
**Technical Details:**
- Uses `<hh:bold/>` tag instead of fontweight
- Sequential charPr IDs (0=normal, 1=bold, 2=H1, 3=H2, 4=H3, 5=H4)
- Correct `fontRef` values for CJK scripts when bold
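Because `charPrIDRef` is resolved by position, the ID of every charPr entry has to equal its index. The sketch below mirrors the table listed above; it is a hypothetical illustration, not md2hwp's code. `CharPrDef`, `idsMatchPositions`, and `charPrIdForHeading` are invented names, heights are in HWPUNIT (100 = 1 pt), and mapping H5/H6 to the 10 pt bold entry (id 1) is an assumption based on their documented 10 pt bold rendering.

```typescript
// Hypothetical charPr table mirroring the sequential IDs listed above.
interface CharPrDef {
  id: number;
  role: string;
  height: number; // font size in HWPUNIT (1000 = 10 pt)
  bold: boolean;  // serialized as <hh:bold/> when true
}

const CHAR_PRS: CharPrDef[] = [
  { id: 0, role: "normal", height: 1000, bold: false },
  { id: 1, role: "bold", height: 1000, bold: true },
  { id: 2, role: "h1", height: 1400, bold: true },
  { id: 3, role: "h2", height: 1300, bold: true },
  { id: 4, role: "h3", height: 1200, bold: true },
  { id: 5, role: "h4", height: 1100, bold: true },
];

// charPrIDRef is resolved by position, so each id must equal its array index.
function idsMatchPositions(defs: CharPrDef[]): boolean {
  return defs.every((def, index) => def.id === index);
}

// H1-H4 get dedicated entries; H5/H6 are assumed to reuse the 10 pt bold entry (id 1).
function charPrIdForHeading(depth: number): number {
  if (!Number.isInteger(depth) || depth < 1 || depth > 6) {
    throw new RangeError(`heading depth must be 1-6, got ${depth}`);
  }
  return depth <= 4 ? depth + 1 : 1;
}
```

A check like `idsMatchPositions` is cheap to run when generating `header.xml` and catches the non-sequential-ID bug described in point 1 above.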
## What's New in v1.2
### 🎯 Professional Document Quality
This version includes significant improvements for professional HWP output:
1. **Heading Hierarchy** - H1-H6 headings now use different font sizes:
- H1: 14pt (1400 HWPUNIT)
- H2: 13pt (1300 HWPUNIT)
- H3: 12pt (1200 HWPUNIT)
- H4: 11pt (1100 HWPUNIT)
- H5/H6: 10pt (normal size)
2. **Improved Line Spacing (줄간격)**:
- Headings: 140% line spacing
- Lists: 150% line spacing
- Paragraphs: 160% line spacing
- Smart vertical gaps between different content types
3. **Natural Line Wrapping** ✨:
- Long sentences wrap naturally to the next line
- **No character spacing compression (자간 압축 해결)**
- Proper text flow for both Korean and English
- Fixed the common HWP issue where long text gets compressed onto one line
See [IMPROVEMENTS_SUMMARY.md](docs/IMPROVEMENTS_SUMMARY.md) for full details.
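To make these percentages concrete: if percentage line spacing scales the font height (our assumption about how Hancom Office applies it), 10 pt body text at 160% advances roughly 16 pt per line. The sketch below is hypothetical; `BlockKind`, `LINE_SPACING_PERCENT`, and `approxLinePitch` are illustrative names, not md2hwp exports.

```typescript
// Hypothetical mapping of block types to the line spacing percentages listed above.
type BlockKind = "heading" | "list" | "paragraph";

const LINE_SPACING_PERCENT: Record<BlockKind, number> = {
  heading: 140,
  list: 150,
  paragraph: 160,
};

// Approximate baseline-to-baseline distance in HWPUNIT (100 HWPUNIT = 1 pt),
// assuming percentage spacing is applied to the font height.
function approxLinePitch(fontHeight: number, kind: BlockKind): number {
  return Math.round((fontHeight * LINE_SPACING_PERCENT[kind]) / 100);
}
```

For example, `approxLinePitch(1400, "heading")` estimates the pitch of an H1 line (14 pt at 140%).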
## Installation
```bash
npm install md2hwp
```
## Usage
### Basic Usage
```javascript
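const { convertMarkdownToHwp } = require('md2hwp');
const fs = require('fs').promises;

const markdown = `# Hello World
This is **bold** and this is *italic*.
- Item 1
- Item 2
`;

// CommonJS modules have no top-level await, so run the conversion in an async function
async function main() {
  const buffer = await convertMarkdownToHwp(markdown);
  await fs.writeFile('output.hwp', buffer);
}

main().catch(console.error);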
```
### Convert File
```javascript
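const { convertFileToHwp } = require('md2hwp');

// CommonJS modules have no top-level await, so run the conversion in an async function
async function main() {
  await convertFileToHwp('./input.md', './output.hwp', {
    title: 'My Document',
    author: 'John Doe'
  });
}

main().catch(console.error);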
```
### Using the Class API
```javascript
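const { Md2Hwp } = require('md2hwp');
const fs = require('fs').promises;
const converter = new Md2Hwp({
  title: 'My Document',
  author: 'John Doe',
  pageWidth: 59528,  // A4 width (210 mm) in HWPUNIT
  pageHeight: 84188  // A4 height (297 mm) in HWPUNIT
});
// CommonJS modules have no top-level await, so run the conversion in an async function
async function main() {
  const markdown = '# Hello World\n\nConverted with the class API.';
  const buffer = await converter.convert(markdown);
  await fs.writeFile('output.hwp', buffer);
}
main().catch(console.error);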
```
### React Integration
```tsx
import React, { useState } from 'react';
import { convertMarkdownToHwp } from 'md2hwp';

function App() {
  const [markdown, setMarkdown] = useState('# Hello');

  const handleDownload = async () => {
    const buffer = await convertMarkdownToHwp(markdown);
    // Create download link
    const blob = new Blob([buffer], { type: 'application/octet-stream' });
    const url = URL.createObjectURL(blob);
    const link = document.createElement('a');
    link.href = url;
    link.download = 'document.hwp';
    link.click();
    URL.revokeObjectURL(url);
  };

  return (
    <div>
      <textarea value={markdown} onChange={e => setMarkdown(e.target.value)} />
      <button onClick={handleDownload}>Download HWP</button>
    </div>
  );
}
```
### Next.js Integration
```typescript
// app/api/convert/route.ts
import { convertMarkdownToHwp } from 'md2hwp';
import { NextResponse } from 'next/server';

export async function POST(request: Request) {
  const { markdown } = await request.json();
  const buffer = await convertMarkdownToHwp(markdown, {
    title: 'Document',
    author: 'User'
  });
  return new NextResponse(buffer, {
    headers: {
      'Content-Type': 'application/octet-stream',
      'Content-Disposition': 'attachment; filename="document.hwp"'
    }
  });
}
```
## Options
```typescript
interface Md2HwpOptions {
  title?: string;        // Document title (default: 'Document')
  author?: string;       // Document author (default: 'md2hwp')
  pageWidth?: number;    // Page width in HWPUNIT (default: 59528, about 210 mm, A4)
  pageHeight?: number;   // Page height in HWPUNIT (default: 84188, about 297 mm, A4)
  marginLeft?: number;   // Left margin in HWPUNIT (default: 8504, about 30 mm)
  marginRight?: number;  // Right margin in HWPUNIT (default: 8504, about 30 mm)
  marginTop?: number;    // Top margin in HWPUNIT (default: 5668, about 20 mm)
  marginBottom?: number; // Bottom margin in HWPUNIT (default: 4252, about 15 mm)
}
```
Note: 1 HWPUNIT = 1/7200 inch, so 100 HWPUNIT = 1 pt and 1 mm ≈ 283.5 HWPUNIT.
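Because HWPUNIT is a fixed fraction of an inch, converting to and from millimetres and points is simple arithmetic. The helpers below are hypothetical (they are not exported by md2hwp); they show, for instance, that the default options describe an A4 page (210 × 297 mm) with 30 mm side margins, a 20 mm top margin, and a 15 mm bottom margin.

```typescript
// Hypothetical helpers; 1 HWPUNIT = 1/7200 inch, so 1 pt (1/72 inch) = 100 HWPUNIT.
const HWPUNIT_PER_INCH = 7200;
const MM_PER_INCH = 25.4;
const HWPUNIT_PER_PT = 100;

function hwpunitToMm(value: number): number {
  return (value / HWPUNIT_PER_INCH) * MM_PER_INCH;
}

// Rounded because option values are integer HWPUNIT.
function mmToHwpunit(mm: number): number {
  return Math.round((mm / MM_PER_INCH) * HWPUNIT_PER_INCH);
}

function ptToHwpunit(pt: number): number {
  return Math.round(pt * HWPUNIT_PER_PT);
}
```

For a 25 mm margin you would pass `marginLeft: mmToHwpunit(25)`, which evaluates to 7087.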
## Supported Markdown Features
### ✅ Currently Working
#### Headings with Visual Hierarchy
```markdown
# Heading 1 (14pt, bold)
## Heading 2 (13pt, bold)
### Heading 3 (12pt, bold)
#### Heading 4 (11pt, bold)
##### Heading 5 (10pt, bold)
###### Heading 6 (10pt, bold)
```
#### Text Formatting
```markdown
**bold text**
*italic text*
```
#### Lists with Proper Spacing
```markdown
- Item 1
- Item 2
- Item 3
```
#### Tables
```markdown
| Name | Age | Role |
|------|-----|------|
| John | 30 | Developer |
| Jane | 28 | Designer |
```
#### Korean Text with Natural Wrapping
```markdown
# 안녕하세요
이것은 매우 긴 한글 문장입니다. 이 문장은 자간을 압축하지 않고 자연스럽게 다음 줄로 넘어가야 합니다.
문장이 길어서 한 줄에 다 들어가지 않을 때에도 자간을 조정하여 한 줄로 만들지 않습니다.
This is a very long English sentence that should wrap to the next line naturally
without compressing the character spacing to force everything onto a single line.
```
### 🚧 Planned Features
- Code blocks
- Links (currently rendered as plain text)
- Images (currently text placeholders)
## API Reference
### convertMarkdownToHwp(markdown, options?)
Converts a Markdown string to HWP format.
- `markdown` (string): The Markdown content to convert
- `options` (Md2HwpOptions): Optional conversion options
- Returns: `Promise<Buffer>` - The HWP file as a Buffer
### convertFileToHwp(inputPath, outputPath, options?)
Converts a Markdown file to HWP format.
- `inputPath` (string): Path to the input Markdown file
- `outputPath` (string): Path to the output HWP file
- `options` (Md2HwpOptions): Optional conversion options
- Returns: `Promise<void>`
### class Md2Hwp
The main converter class.
#### Constructor
```typescript
new Md2Hwp(options?: Md2HwpOptions)
```
#### Methods
- `convert(markdown: string): Promise<Buffer>` - Convert Markdown string to HWP
- `convertFile(inputPath: string, outputPath: string): Promise<void>` - Convert file
## Examples
Check the `examples/` directory for complete examples:
- `basic.js` - Basic conversion example
- `file-conversion.js` - File-to-file conversion
- `react-example.tsx` - React component example
## Technical Details
### HWPX Format
The library generates HWPX (HWP 5.x) files, which are ZIP archives containing XML files:
- `mimetype` - MIME type identifier (stored uncompressed)
- `version.xml` - HWP version information
- `settings.xml` - Document settings
- `Contents/header.xml` - Document metadata, styles, fonts, and formatting definitions
- `Contents/section0.xml` - Main document content
- `META-INF/` - Container metadata
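To check that a generated file follows this layout, you can inspect the first ZIP local file header. The sketch below is a hypothetical helper, not part of md2hwp. It relies on the standard ZIP local-header layout (signature `PK\x03\x04`, compression method at byte 8, file-name length at byte 26, name starting at byte 30) and additionally assumes, as in EPUB and ODF containers, that `mimetype` should be the first entry in the archive.

```typescript
// Hypothetical check, not part of md2hwp: does the ZIP start with a stored "mimetype" entry?
// Uses the standard ZIP local file header layout (all fields little-endian).
function firstEntryIsStoredMimetype(bytes: Uint8Array): boolean {
  const HEADER_SIZE = 30; // fixed part of a local file header
  if (bytes.length < HEADER_SIZE) return false;
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  // Local file header signature "PK\x03\x04"
  if (view.getUint32(0, true) !== 0x04034b50) return false;
  const method = view.getUint16(8, true); // 0 = stored (uncompressed), 8 = deflate
  const nameLength = view.getUint16(26, true);
  if (bytes.length < HEADER_SIZE + nameLength) return false;
  // The file name follows the fixed header; "mimetype" is plain ASCII.
  const entryName = Array.from(bytes.subarray(HEADER_SIZE, HEADER_SIZE + nameLength), (b) =>
    String.fromCharCode(b),
  ).join("");
  return method === 0 && entryName === "mimetype";
}
```

Since a Node.js `Buffer` is a `Uint8Array`, the result of `convertMarkdownToHwp` can be passed to this function directly.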
### Key Features of Implementation
- The `mimetype` file is stored **uncompressed** as required by the HWPX specification
- Includes proper `FONTFACELIST` for Korean (맑은 고딕, Malgun Gothic) and Latin (Arial) fonts
- Uses graduated character shapes (charPr) for heading hierarchy
- Multiple paragraph properties (paraPr) for different line spacing requirements
- **Natural line wrapping** via proper `breakNonLatinWord` settings
- **No linesegarray** for regular paragraphs - allows HWP to calculate line breaks naturally
- Generates valid HWPML (HWP Markup Language) XML structure
### Line Wrapping Implementation
Instead of fixing line breaks at generation time, the library leaves line layout to Hancom Office:
1. **No pre-calculated line breaks** - Regular paragraphs don't include `<hp:linesegarray>`
2. **Proper paragraph settings** - Uses `breakNonLatinWord="BREAK_WORD"` for Korean/CJK text
3. **Zero character spacing** - Prevents automatic spacing compression
4. **Context-aware layout** - HWP calculates optimal line breaks based on content
See [docs/Line_Wrapping_Fix.md](docs/Line_Wrapping_Fix.md) for detailed technical analysis.
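A paragraph emitted this way carries only its text runs and references to predefined properties, with no record of where lines break. The following is a hypothetical serializer, not md2hwp's code: the `hp:p`/`hp:run`/`hp:t` element names and the `paraPrIDRef` and `styleIDRef` attributes reflect HWPX section markup as we understand it and should be checked against a real file; real paragraphs typically carry more attributes than shown.

```typescript
// Hypothetical sketch of one body paragraph for Contents/section0.xml.
interface Run {
  text: string;
  charPrId: number; // position-based charPr reference (e.g. 0 = normal, 1 = bold)
}

function escapeXml(s: string): string {
  return s
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function paragraphXml(id: number, paraPrId: number, runs: Run[]): string {
  const body = runs
    .map((r) => `<hp:run charPrIDRef="${r.charPrId}"><hp:t>${escapeXml(r.text)}</hp:t></hp:run>`)
    .join("");
  // No <hp:linesegarray> is written, so the viewer computes line breaks itself.
  return `<hp:p id="${id}" paraPrIDRef="${paraPrId}" styleIDRef="0">${body}</hp:p>`;
}
```

Omitting the line-segment array trades a little load-time layout work in Hancom Office for text that reflows correctly, which is the behavior the steps above aim for.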
## Troubleshooting
### File appears corrupted in Hancom Office
If you see "파일이 손상되었습니다" (file is corrupted), make sure:
1. You're using the latest version of this library
2. The generated file has the `.hwp` extension
3. You're opening it with Hancom Office 2014 or later (output has been verified with Hancom Office 2020+)
### Korean characters not displaying correctly
The library includes proper font definitions for Korean text. If you encounter issues:
- Ensure your input Markdown is UTF-8 encoded
- The library uses "맑은 고딕" (Malgun Gothic) for Korean text
- Latin text uses Arial font
### Text appears compressed on one line
This issue has been **fixed in v1.2**! If you're still experiencing it:
1. Update to the latest version
2. Check that long sentences now wrap naturally
3. See [docs/Line_Wrapping_Fix.md](docs/Line_Wrapping_Fix.md) for technical details
## Documentation
Comprehensive documentation is available in the `docs/` directory:
- [IMPROVEMENTS_SUMMARY.md](docs/IMPROVEMENTS_SUMMARY.md) - Summary of all improvements
- [Line_Wrapping_Fix.md](docs/Line_Wrapping_Fix.md) - Detailed line wrapping fix analysis
- [HWP_Document_Data_Records.md](docs/HWP_Document_Data_Records.md) - HWPTAG reference
- [HWP_CharShape_Structure.md](docs/HWP_CharShape_Structure.md) - Character properties
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## License
MIT
## Acknowledgments
- Uses [marked](https://github.com/markedjs/marked) for Markdown parsing
- Uses [JSZip](https://github.com/Stuk/jszip) for HWPX file generation
- HWP format research and implementation based on Hancom Office documentation