# n8n-nodes-pdf-page-split

![n8n.io - Workflow Automation](https://raw.githubusercontent.com/n8n-io/n8n/master/assets/n8n-logo.png)

[![NPM Version](https://img.shields.io/npm/v/n8n-nodes-pdf-page-split.svg)](https://www.npmjs.com/package/n8n-nodes-pdf-page-split)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

> Powerful n8n community nodes for PDF document processing: split PDFs into pages and convert DOCX to PDF with page splitting

## 🌟 Features

### PDF Page Split Node

- 📄 **PDF Splitting**: Split multi-page PDFs into individual single-page files
- 🎯 **Page Selection**: Process specific page ranges with flexible selection options
- 📝 **Custom Naming**: Configure output file names with prefixes and page numbers
- 🔄 **Batch Processing**: Handle multiple PDFs in a single workflow

### DOCX to PDF Split Node (New in v0.2.0)

- 📑 **DOCX Conversion**: Convert DOCX documents to PDF format
- 📄 **Automatic Splitting**: Split converted PDFs into individual pages
- 🎯 **Page Selection**: Extract specific pages from converted documents
- 📝 **Custom Naming**: Configure output file names with prefixes and page numbers
- 💾 **Optional Full PDF**: Keep the complete converted PDF alongside individual pages

### General Features

- 🚀 **High Performance**: Pure JavaScript implementation for maximum compatibility
- 🐳 **Docker Ready**: Works seamlessly in containerized environments

## 📋 Prerequisites

- n8n version 0.147.0 or newer
- Node.js version 16 or newer

## 💻 Installation

### Via n8n Interface

1. Open your n8n instance
2. Go to **Settings > Community Nodes**
3. Click on **Install**
4. Enter `n8n-nodes-pdf-page-split` in the **Name** field
5. Click **Install**

### Via npm

```bash
npm install n8n-nodes-pdf-page-split
```

### Via yarn

```bash
yarn add n8n-nodes-pdf-page-split
```

## 🔧 Configuration

### PDF Page Split Node

#### Input Parameters

| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| Binary Property | string | Name of the binary property containing the PDF file | data |
| Page Range | string | Range of pages to process (e.g., "1-5,8,11-13") | (all pages) |
| File Name Prefix | string | Prefix for output file names | page_ |
| Start Number | number | Starting number for page numbering | 1 |

### DOCX to PDF Split Node

#### Input Parameters

| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| Binary Property | string | Name of the binary property containing the DOCX file | data |
| Page Range | string | Range of pages to process (e.g., "1-5,8,11-13") | (all pages) |
| File Name Prefix | string | Prefix for output file names | page_ |
| Start Number | number | Starting number for page numbering | 1 |
| Keep Original PDF | boolean | Include the full converted PDF in output | false |

### Output

Each processed page generates an item with:

- **Binary Data**: The PDF page as a binary file
- **JSON Data**:
  - `pageNumber`: Current page number
  - `totalPages`: Total pages in original/converted document
  - `fileName`: Generated file name

## 📚 Usage Examples

### Basic PDF Splitting

```typescript
// Split a PDF into individual pages
[
  {
    "node": "PDF Page Split",
    "parameters": {
      "binaryPropertyName": "data",
      "fileNamePrefix": "page_"
    }
  }
]
```

### Extract Specific Pages from PDF

```typescript
// Extract pages 1-3 and 5
[
  {
    "node": "PDF Page Split",
    "parameters": {
      "binaryPropertyName": "data",
      "pageRange": "1-3,5",
      "fileNamePrefix": "extract_"
    }
  }
]
```

### Convert DOCX to PDF and Split

```typescript
// Convert DOCX to PDF and split into pages
[
  {
    "node": "DOCX to PDF Split",
    "parameters": {
      "binaryPropertyName": "data",
      "fileNamePrefix": "converted_page_",
      "keepOriginalPdf": true
    }
  }
]
```

### Convert DOCX and Extract Specific Pages

```typescript
// Convert DOCX and extract specific pages
[
  {
    "node": "DOCX to PDF Split",
    "parameters": {
      "binaryPropertyName": "data",
      "pageRange": "1-5,10",
      "fileNamePrefix": "doc_page_",
      "keepOriginalPdf": false
    }
  }
]
```

## 🔍 Example Workflows

### 1. Split and Save PDF Pages

1. **HTTP Request** → Download PDF from URL
2. **PDF Page Split** → Split into pages
3. **Write Binary File** → Save pages locally

### 2. Process Selected Pages

1. **Read Binary File** → Load local PDF
2. **PDF Page Split** → Extract specific pages
   - Set "Page Range" to "1-3,5,10-12"
3. **Google Drive** → Upload selected pages

### 3. Convert DOCX and Process

1. **Read Binary File** → Load DOCX document
2. **DOCX to PDF Split** → Convert and split
   - Enable "Keep Original PDF" for complete document
3. **Email Send** → Send individual pages as attachments

### 4. Batch DOCX Processing

1. **Google Drive** → Download DOCX files
2. **DOCX to PDF Split** → Convert each to PDF pages
3. **Compress** → Create ZIP with all pages
4. **S3** → Upload to storage

## ⚠️ Troubleshooting

### Common Issues

| Issue | Solution |
|-------|----------|
| "No binary data found" | Ensure previous node outputs binary data |
| Empty PDF output | Verify input PDF is valid and not corrupted |
| Memory errors | Process fewer pages at once for large PDFs |

### Best Practices

- Verify PDF is not password protected
- Use page ranges for large documents
- Monitor memory usage in production

## 🔧 Technical Details

### Libraries Used

- **[pdf-lib](https://pdf-lib.js.org/)**: PDF manipulation and splitting
- **[mammoth](https://github.com/mwilliamson/mammoth.js)**: DOCX text and structure extraction
- **[pdfkit](http://pdfkit.org/)**: PDF generation from extracted content

### Key Features

- Pure JavaScript implementation
- No native dependencies
- Cross-platform compatibility
- Docker-friendly operation

### Limitations

- Does not support password-protected PDFs or DOCX files
- Cannot extract text or metadata
- Maximum file size depends on available memory
- DOCX conversion preserves formatting but may vary from native Office rendering

## 🤝 Contributing

We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details.

## 📝 License

[MIT](LICENSE)

## 🙏 Acknowledgments

- [n8n](https://n8n.io/) - For the amazing workflow automation platform
- [pdf-lib](https://pdf-lib.js.org/) - For reliable PDF manipulation
- All our contributors and users
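
## 🧩 Appendix: Page Range Sketch

The Page Range parameter accepts values such as "1-5,8,11-13", and output files are named from File Name Prefix and Start Number. The sketch below shows one way such a string could be expanded into page numbers and turned into file names. It is an illustration only, not the package's actual source: `parsePageRange` and `buildFileName` are hypothetical helpers, and both the `.pdf` naming scheme and the choice to skip out-of-range pages silently are assumptions.

```typescript
// Hypothetical helper, not taken from the package source.
// Expands a 1-based range string such as "1-5,8,11-13" into sorted, unique page numbers.
function parsePageRange(range: string, totalPages: number): number[] {
  // An empty range is treated as "all pages", matching the documented default.
  if (range.trim() === "") {
    return Array.from({ length: totalPages }, (_, i) => i + 1);
  }

  const pages = new Set<number>();
  for (const part of range.split(",")) {
    // Accept either a single page ("8") or an inclusive span ("11-13").
    const match = /^\s*(\d+)\s*(?:-\s*(\d+)\s*)?$/.exec(part);
    if (match === null) {
      throw new Error(`Invalid page range segment: "${part}"`);
    }
    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    if (start < 1 || end < start) {
      throw new Error(`Invalid page range segment: "${part}"`);
    }
    // Assumption: pages beyond the end of the document are skipped silently.
    for (let page = start; page <= Math.min(end, totalPages); page++) {
      pages.add(page);
    }
  }
  return Array.from(pages).sort((a, b) => a - b);
}

// Hypothetical naming scheme: prefix + running number + ".pdf".
// `index` is the zero-based position of the page among the emitted items.
function buildFileName(prefix: string, index: number, startNumber: number): string {
  return `${prefix}${startNumber + index}.pdf`;
}
```

Under these assumptions, `parsePageRange("1-3,5", 10)` yields `[1, 2, 3, 5]`, and `buildFileName("extract_", 0, 1)` yields `"extract_1.pdf"`. Using a `Set` keeps overlapping segments such as "1-3,2-4" from producing duplicate pages.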