# md2hwp

Convert Markdown files to HWP (Hangul Word Processor) format. This library supports HWPX (HWP 5.x), the XML-based format used by Hancom Office.

## Features

- ✅ Convert Markdown to HWP format (HWPX/HWP 5.x)
- ✅ **Verified Working** with Hancom Office 2020+
- ✅ **Professional heading hierarchy** (H1-H6 with graduated font sizes)
- ✅ **Natural line wrapping** - no character spacing compression
- ✅ **Improved line spacing** (140%-160% for readability)
- ✅ **Smart paragraph spacing** - context-aware gaps between content types
- ✅ Support for paragraphs, lists, and mixed content
- ✅ **Nested lists** with proper indentation levels
- ✅ Text formatting: **bold** and *italic*
- ✅ **Bold text in list items** with mixed content support
- ✅ **Tables** with visible borders and proper formatting
- ✅ Korean language support (UTF-8 encoding)
- ✅ Works in Node.js environments
- ✅ TypeScript support with full type definitions
- ✅ Easy integration with React and other frameworks
- 🚧 Code blocks (planned)
- 🚧 Images (currently text placeholders)

## What's New in v1.2.6

### ✅ Fixed Nested List Indentation

**v1.2.6** fixes the indentation issue from v1.2.5 by using pre-defined paragraph properties instead of inline paraPr.

Full support for nested/indented lists with visual hierarchy:

**What Now Works:**

- ✅ **Nested list parsing** - correctly handles multi-level list structures
- ✅ **Visual indentation** - each nesting level is indented by 800 HWPUNIT
- ✅ **Bold text in lists** - supports the `**label**: value` pattern in list items
- ✅ **Mixed content** - handles both bold and normal text in the same list item
- ✅ **Recursive nesting** - supports arbitrary depth of nested lists

**Example:**

```markdown
- **총 예산**: 35,000,000원
- **주요 항목**:
  - 해외 연사 항공료 및 숙박: ~10,440,000원
  - 연사비: 3,400,000원
  - 식사 (점심 + 만찬): 4,400,000원
```

**Technical Details:**

- Recursively parses the `item.tokens` array from marked.js
- Uses pre-defined paragraph properties in the header (not inline paraPr)
- Created 14 paraPr definitions (id="0" to "10", "20", "21") with progressive indentation
- Each nesting level adds 800 HWPUNIT (8 pt, approximately 2.8 mm) of left margin
- Handles text tokens with inline formatting (bold/italic) within lists
- Formula: `paraPrId = Math.min(level + 1, 10)` supports up to 9 levels of nesting

## What's New in v1.2.4

### ✅ Bold Text Support - Finally Working!

After extensive debugging and analysis of the HWP format, bold text (`**text**`) now works correctly:

**The Problem We Solved:**

1. **Character Property References**: HWP's `charPrIDRef` uses the position index, not the `id` attribute value
   - Fixed by making all charPr IDs sequential (0-5)
2. **Bold Tag Discovery**: Found that HWP requires the `<hh:bold/>` tag, not `<hh:fontweight>`
   - Analyzed a user-corrected HWP file to discover the correct format
3. **Font References**: Bold text needs different `fontRef` values for non-Latin scripts

**What Now Works:**

- ✅ **Bold inline text** renders at 10pt with bold weight (same size as normal text)
- ✅ **Headings** render at correct sizes (H1: 14pt, H2: 13pt, etc.) with bold
- ✅ Both **English** and **Korean** bold text work correctly
- ✅ No outline boxes or unwanted borders on bold text

**Technical Details:**

- Uses the `<hh:bold/>` tag instead of fontweight
- Sequential charPr IDs (0=normal, 1=bold, 2=H1, 3=H2, 4=H3, 5=H4)
- Correct `fontRef` values for CJK scripts when bold

## What's New in v1.2

### 🎯 Professional Document Quality

This version includes significant improvements for professional HWP output:

1. **Heading Hierarchy** - H1-H6 headings now use different font sizes:
   - H1: 14pt (1400 HWPUNIT)
   - H2: 13pt (1300 HWPUNIT)
   - H3: 12pt (1200 HWPUNIT)
   - H4: 11pt (1100 HWPUNIT)
   - H5/H6: 10pt (normal size)

2. **Improved Line Spacing (줄간격)**:
   - Headings: 140% line spacing
   - Lists: 150% line spacing
   - Paragraphs: 160% line spacing
   - Smart vertical gaps between different content types

3. **Natural Line Wrapping** ✨:
   - Long sentences wrap naturally to the next line
   - **No character spacing compression (자간 압축 해결)**
   - Proper text flow for both Korean and English
   - Fixed the common HWP issue where long text gets compressed onto one line

See [IMPROVEMENTS_SUMMARY.md](docs/IMPROVEMENTS_SUMMARY.md) for full details.

## Installation

```bash
npm install md2hwp
```

## Usage

### Basic Usage

```javascript
const { convertMarkdownToHwp } = require('md2hwp');
const fs = require('fs').promises;

const markdown = `# Hello World

This is **bold** and this is *italic*.

- Item 1
- Item 2
`;

// Wrapped in an async function because CommonJS has no top-level await
async function main() {
  const buffer = await convertMarkdownToHwp(markdown);
  await fs.writeFile('output.hwp', buffer);
}

main().catch(console.error);
```

### Convert File

```javascript
const { convertFileToHwp } = require('md2hwp');

// Wrapped in an async function because CommonJS has no top-level await
async function main() {
  await convertFileToHwp('./input.md', './output.hwp', {
    title: 'My Document',
    author: 'John Doe'
  });
}

main().catch(console.error);
```

### Using the Class API

```javascript
const { Md2Hwp } = require('md2hwp');

const converter = new Md2Hwp({
  title: 'My Document',
  author: 'John Doe',
  pageWidth: 59528,
  pageHeight: 84188
});

const markdown = '# Hello';

// Wrapped in an async function because CommonJS has no top-level await
async function main() {
  const buffer = await converter.convert(markdown);
  return buffer;
}

main().catch(console.error);
```

### React Integration

```tsx
import React, { useState } from 'react';
import { convertMarkdownToHwp } from 'md2hwp';

function App() {
  const [markdown, setMarkdown] = useState('# Hello');

  const handleDownload = async () => {
    const buffer = await convertMarkdownToHwp(markdown);

    // Create download link
    const blob = new Blob([buffer], { type: 'application/octet-stream' });
    const url = URL.createObjectURL(blob);
    const link = document.createElement('a');
    link.href = url;
    link.download = 'document.hwp';
    link.click();
    URL.revokeObjectURL(url);
  };

  return (
    <div>
      <textarea value={markdown} onChange={e => setMarkdown(e.target.value)} />
      <button onClick={handleDownload}>Download HWP</button>
    </div>
  );
}
```

### Next.js Integration

```typescript
// app/api/convert/route.ts
import { convertMarkdownToHwp } from 'md2hwp';
import { NextResponse } from 'next/server';

export async function POST(request: Request) {
  const { markdown } = await request.json();

  const buffer = await convertMarkdownToHwp(markdown, {
    title: 'Document',
    author: 'User'
  });

  return new NextResponse(buffer, {
    headers: {
      'Content-Type': 'application/octet-stream',
      'Content-Disposition': 'attachment; filename="document.hwp"'
    }
  });
}
```

## Options

```typescript
interface Md2HwpOptions {
  title?: string;        // Document title (default: 'Document')
  author?: string;       // Document author (default: 'md2hwp')
  pageWidth?: number;    // Page width in HWPUNIT (default: 59528)
  pageHeight?: number;   // Page height in HWPUNIT (default: 84188)
  marginLeft?: number;   // Left margin in HWPUNIT (default: 8504)
  marginRight?: number;  // Right margin in HWPUNIT (default: 8504)
  marginTop?: number;    // Top margin in HWPUNIT (default: 5668)
  marginBottom?: number; // Bottom margin in HWPUNIT (default: 4252)
}
```

Note: 1 HWPUNIT ≈ 1/7200 inch

## Supported Markdown Features

### ✅ Currently Working

#### Headings with Visual Hierarchy

```markdown
# Heading 1 (14pt, bold)
## Heading 2 (13pt, bold)
### Heading 3 (12pt, bold)
#### Heading 4 (11pt, bold)
##### Heading 5 (10pt, bold)
###### Heading 6 (10pt, bold)
```

#### Text Formatting

```markdown
**bold text**
*italic text*
```

#### Lists with Proper Spacing

```markdown
- Item 1
- Item 2
- Item 3
```

#### Tables

```markdown
| Name | Age | Role |
|------|-----|------|
| John | 30 | Developer |
| Jane | 28 | Designer |
```

#### Korean Text with Natural Wrapping

```markdown
# 안녕하세요

이것은 매우 긴 한글 문장입니다. 이 문장은 자간을 압축하지 않고 자연스럽게 다음 줄로 넘어가야 합니다. 문장이 길어서 한 줄에 다 들어가지 않을 때에도 자간을 조종하여 한 줄로 만들지 않습니다.

This is a very long English sentence that should wrap to the next line naturally without compressing the character spacing to force everything onto a single line.
```

### 🚧 Planned Features

- Code blocks
- Links (currently rendered as plain text)
- Images (currently text placeholders)

## API Reference

### convertMarkdownToHwp(markdown, options?)

Converts a Markdown string to HWP format.

- `markdown` (string): The Markdown content to convert
- `options` (Md2HwpOptions): Optional conversion options
- Returns: `Promise<Buffer>` - The HWP file as a Buffer

### convertFileToHwp(inputPath, outputPath, options?)

Converts a Markdown file to HWP format.

- `inputPath` (string): Path to the input Markdown file
- `outputPath` (string): Path to the output HWP file
- `options` (Md2HwpOptions): Optional conversion options
- Returns: `Promise<void>`

### class Md2Hwp

The main converter class.
#### Constructor

```typescript
new Md2Hwp(options?: Md2HwpOptions)
```

#### Methods

- `convert(markdown: string): Promise<Buffer>` - Convert a Markdown string to HWP
- `convertFile(inputPath: string, outputPath: string): Promise<void>` - Convert a file

## Examples

Check the `examples/` directory for complete examples:

- `basic.js` - Basic conversion example
- `file-conversion.js` - File-to-file conversion
- `react-example.tsx` - React component example

## Technical Details

### HWPX Format

The library generates HWPX (HWP 5.x) files, which are ZIP archives containing XML files:

- `mimetype` - MIME type identifier (stored uncompressed)
- `version.xml` - HWP version information
- `settings.xml` - Document settings
- `Contents/header.xml` - Document metadata, styles, fonts, and formatting definitions
- `Contents/section0.xml` - Main document content
- `META-INF/` - Container metadata

### Key Features of Implementation

- The `mimetype` file is stored **uncompressed** as required by the HWPX specification
- Includes a proper `FONTFACELIST` for Korean (맑은 고딕) and Latin (Arial) fonts
- Uses graduated character shapes (charPr) for the heading hierarchy
- Uses multiple paragraph properties (paraPr) for different line spacing requirements
- **Natural line wrapping** via proper `breakNonLatinWord` settings
- **No linesegarray** for regular paragraphs - allows HWP to calculate line breaks naturally
- Generates a valid HWPML (HWP Markup Language) XML structure

### Line Wrapping Implementation

The library ensures natural line wrapping in four ways:

1. **No pre-calculated line breaks** - Regular paragraphs don't include `<hp:linesegarray>`
2. **Proper paragraph settings** - Uses `breakNonLatinWord="BREAK_WORD"` for Korean/CJK text
3. **Zero character spacing** - Prevents automatic spacing compression
4. **Context-aware layout** - HWP calculates optimal line breaks based on content

See [docs/Line_Wrapping_Fix.md](docs/Line_Wrapping_Fix.md) for detailed technical analysis.
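
### Checking HWPUNIT Values

The HWPUNIT figures quoted in this README (page size, margins, font sizes, list indentation) can be sanity-checked with a little arithmetic. The sketch below is illustrative only: `hwpunitToMm`, `pointsToHwpunit`, and `paraPrIdForLevel` are hypothetical helpers, not part of the md2hwp API. It assumes the ratio given in the Options note (1 HWPUNIT ≈ 1/7200 inch) and that `level` is 0 for top-level list items.

```javascript
// Sketch only: these helpers are hypothetical and are NOT exported by md2hwp.
const HWPUNIT_PER_INCH = 7200; // from the README note: 1 HWPUNIT ≈ 1/7200 inch
const MM_PER_INCH = 25.4;
const HWPUNIT_PER_POINT = HWPUNIT_PER_INCH / 72; // 100, matching "14pt (1400 HWPUNIT)"

// Convert a length in HWPUNIT to millimetres.
function hwpunitToMm(units) {
  return (units / HWPUNIT_PER_INCH) * MM_PER_INCH;
}

// Convert a font size in points to HWPUNIT.
function pointsToHwpunit(points) {
  return Math.round(points * HWPUNIT_PER_POINT);
}

// Formula quoted in the v1.2.6 notes; assumes level 0 is a top-level list item.
function paraPrIdForLevel(level) {
  return Math.min(level + 1, 10);
}

console.log(hwpunitToMm(59528).toFixed(1)); // default pageWidth  → "210.0"
console.log(hwpunitToMm(84188).toFixed(1)); // default pageHeight → "297.0"
console.log(hwpunitToMm(800).toFixed(1));   // one nesting level  → "2.8"
console.log(pointsToHwpunit(14));           // H1 font size       → 1400
console.log(paraPrIdForLevel(3));           // nesting level 3    → 4
```

Under these assumptions the default page size works out to A4 (210 mm × 297 mm), and one list nesting level (800 HWPUNIT) is 8 pt, or roughly 2.8 mm.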
## Troubleshooting

### File appears corrupted in Hancom Office

If you see "파일이 손상되었습니다" (file is corrupted), check the following:

1. You're using the latest version of this library
2. The generated file has the `.hwp` extension
3. You're opening the file with Hancom Office 2014 or later

### Korean characters not displaying correctly

The library includes proper font definitions for Korean text. If you encounter issues:

- Ensure your input Markdown is UTF-8 encoded
- The library uses "맑은 고딕" (Malgun Gothic) for Korean text
- Latin text uses the Arial font

### Text appears compressed on one line

This issue has been **fixed in v1.2**! If you're still experiencing it:

1. Update to the latest version
2. Check that long sentences now wrap naturally
3. See [docs/Line_Wrapping_Fix.md](docs/Line_Wrapping_Fix.md) for technical details

## Documentation

Comprehensive documentation is available in the `docs/` directory:

- [IMPROVEMENTS_SUMMARY.md](docs/IMPROVEMENTS_SUMMARY.md) - Summary of all improvements
- [Line_Wrapping_Fix.md](docs/Line_Wrapping_Fix.md) - Detailed line wrapping fix analysis
- [HWP_Document_Data_Records.md](docs/HWP_Document_Data_Records.md) - HWPTAG reference
- [HWP_CharShape_Structure.md](docs/HWP_CharShape_Structure.md) - Character properties

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

MIT

## Acknowledgments

- Uses [marked](https://github.com/markedjs/marked) for Markdown parsing
- Uses [JSZip](https://github.com/Stuk/jszip) for HWPX file generation
- HWP format research and implementation based on Hancom Office documentation