# n8n-nodes-pdf-page-split

[npm package](https://www.npmjs.com/package/n8n-nodes-pdf-page-split)
[License: MIT](https://opensource.org/licenses/MIT)
> Powerful n8n community nodes for PDF document processing: split PDFs into pages and convert DOCX to PDF with page splitting
## 🌟 Features
### PDF Page Split Node
- 📄 **PDF Splitting**: Split multi-page PDFs into individual single-page files
- 🎯 **Page Selection**: Process specific page ranges with flexible selection options
- 📝 **Custom Naming**: Configure output file names with prefixes and page numbers
- 🔄 **Batch Processing**: Handle multiple PDFs in a single workflow
### DOCX to PDF Split Node (New in v0.2.0)
- 📑 **DOCX Conversion**: Convert DOCX documents to PDF format
- 📄 **Automatic Splitting**: Split converted PDFs into individual pages
- 🎯 **Page Selection**: Extract specific pages from converted documents
- 📝 **Custom Naming**: Configure output file names with prefixes and page numbers
- 💾 **Optional Full PDF**: Keep the complete converted PDF alongside individual pages
### General Features
- 🚀 **Pure JavaScript**: No native dependencies, for maximum compatibility across platforms
- 🐳 **Docker Ready**: Works seamlessly in containerized environments
## 📋 Prerequisites
- n8n version 0.147.0 or newer
- Node.js version 16 or newer
## 💻 Installation
### Via n8n Interface
1. Open your n8n instance
2. Go to **Settings > Community Nodes**
3. Click on **Install**
4. Enter `n8n-nodes-pdf-page-split` in the **Name** field
5. Click **Install**
### Via npm
```bash
npm install n8n-nodes-pdf-page-split
```
### Via yarn
```bash
yarn add n8n-nodes-pdf-page-split
```
## 🔧 Configuration
### PDF Page Split Node
#### Input Parameters
| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| Binary Property | string | Name of the binary property containing the PDF file | data |
| Page Range | string | Range of pages to process (e.g., "1-5,8,11-13") | (all pages) |
| File Name Prefix | string | Prefix for output file names | page_ |
| Start Number | number | Starting number for page numbering | 1 |
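
The Page Range syntax can be illustrated with a small parser. This is a minimal sketch, not the node's actual implementation: `parsePageRange` is a hypothetical helper, and skipping pages beyond the end of the document (rather than rejecting them) is an assumption.

```typescript
// Hypothetical helper: expands a page-range string such as "1-5,8,11-13"
// into a sorted list of unique 1-based page numbers.
// An empty string is treated as "all pages" (the documented default).
function parsePageRange(range: string, totalPages: number): number[] {
  const trimmed = range.trim();
  if (trimmed === "") {
    return Array.from({ length: totalPages }, (_, i) => i + 1);
  }

  const pages = new Set<number>();
  for (const part of trimmed.split(",")) {
    // Accepts "8" or "11-13", with optional spaces around the numbers.
    const match = /^\s*(\d+)\s*(?:-\s*(\d+)\s*)?$/.exec(part);
    if (!match) {
      throw new Error(`Invalid page range segment: "${part}"`);
    }
    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    if (start < 1 || end < start) {
      throw new Error(`Invalid page range segment: "${part}"`);
    }
    // Assumption: pages past the end of the document are skipped, not rejected.
    for (let page = start; page <= Math.min(end, totalPages); page++) {
      pages.add(page);
    }
  }
  return Array.from(pages).sort((a, b) => a - b);
}

const selected = parsePageRange("1-5,8,11-13", 12);
// selected: [1, 2, 3, 4, 5, 8, 11, 12]
```

Using a `Set` means overlapping segments such as `"1-3,2"` yield each page only once.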
### DOCX to PDF Split Node
#### Input Parameters
| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| Binary Property | string | Name of the binary property containing the DOCX file | data |
| Page Range | string | Range of pages to process (e.g., "1-5,8,11-13") | (all pages) |
| File Name Prefix | string | Prefix for output file names | page_ |
| Start Number | number | Starting number for page numbering | 1 |
| Keep Original PDF | boolean | Include the full converted PDF in output | false |
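
As a rough illustration, the table above can be expressed as a typed parameter object with its documented defaults. The names `binaryPropertyName`, `pageRange`, `fileNamePrefix`, and `keepOriginalPdf` follow the usage examples in this README; `startNumber`, `DocxToPdfSplitParams`, and `withDocxDefaults` are assumptions made for this sketch.

```typescript
// Parameter names follow the usage examples in this README;
// `startNumber` is an assumed internal name for "Start Number".
interface DocxToPdfSplitParams {
  binaryPropertyName: string;
  pageRange: string; // empty string means "all pages"
  fileNamePrefix: string;
  startNumber: number;
  keepOriginalPdf: boolean;
}

// Defaults taken from the "Default" column of the parameter table.
const DOCX_DEFAULTS: DocxToPdfSplitParams = {
  binaryPropertyName: "data",
  pageRange: "",
  fileNamePrefix: "page_",
  startNumber: 1,
  keepOriginalPdf: false,
};

// Hypothetical helper: fills in any parameter the caller did not set.
function withDocxDefaults(
  overrides: Partial<DocxToPdfSplitParams> = {},
): DocxToPdfSplitParams {
  return { ...DOCX_DEFAULTS, ...overrides };
}

const docxParams = withDocxDefaults({ pageRange: "1-5,10", keepOriginalPdf: true });
// docxParams.fileNamePrefix is "page_", docxParams.keepOriginalPdf is true
```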
### Output
Each processed page generates an item with:
- **Binary Data**: The PDF page as a binary file
- **JSON Data**:
  - `pageNumber`: Current page number
  - `totalPages`: Total pages in original/converted document
  - `fileName`: Generated file name
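
The JSON fields listed above could look like this for a given page selection. This is a sketch under stated assumptions, not the node's confirmed behavior: it assumes `pageNumber` is the 1-based page in the source document, that output files are numbered sequentially from Start Number, and that file names follow a `<prefix><number>.pdf` pattern. `describePages` is a hypothetical helper.

```typescript
// JSON part of each output item, as documented above.
interface PageItemJson {
  pageNumber: number;
  totalPages: number;
  fileName: string;
}

// Hypothetical helper. Assumptions: `pageNumber` is the page's position in
// the source document, files are numbered sequentially from `startNumber`,
// and file names look like `<prefix><number>.pdf`.
function describePages(
  selectedPages: number[],
  totalPages: number,
  fileNamePrefix = "page_",
  startNumber = 1,
): PageItemJson[] {
  return selectedPages.map((pageNumber, index) => ({
    pageNumber,
    totalPages,
    fileName: `${fileNamePrefix}${startNumber + index}.pdf`,
  }));
}

const described = describePages([1, 2, 3, 5], 8, "extract_");
// described[3] is { pageNumber: 5, totalPages: 8, fileName: "extract_4.pdf" }
```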
## 📚 Usage Examples
### Basic PDF Splitting
```typescript
// Split a PDF into individual pages
[
  {
    "node": "PDF Page Split",
    "parameters": {
      "binaryPropertyName": "data",
      "fileNamePrefix": "page_"
    }
  }
]
```
### Extract Specific Pages from PDF
```typescript
// Extract pages 1-3 and 5
[
  {
    "node": "PDF Page Split",
    "parameters": {
      "binaryPropertyName": "data",
      "pageRange": "1-3,5",
      "fileNamePrefix": "extract_"
    }
  }
]
```
### Convert DOCX to PDF and Split
```typescript
// Convert DOCX to PDF and split into pages
[
  {
    "node": "DOCX to PDF Split",
    "parameters": {
      "binaryPropertyName": "data",
      "fileNamePrefix": "converted_page_",
      "keepOriginalPdf": true
    }
  }
]
```
### Convert DOCX and Extract Specific Pages
```typescript
// Convert DOCX and extract specific pages
[
  {
    "node": "DOCX to PDF Split",
    "parameters": {
      "binaryPropertyName": "data",
      "pageRange": "1-5,10",
      "fileNamePrefix": "doc_page_",
      "keepOriginalPdf": false
    }
  }
]
]
```
## 🔍 Example Workflows
### 1. Split and Save PDF Pages
1. **HTTP Request** → Download PDF from URL
2. **PDF Page Split** → Split into pages
3. **Write Binary File** → Save pages locally
### 2. Process Selected Pages
1. **Read Binary File** → Load local PDF
2. **PDF Page Split** → Extract specific pages
   - Set "Page Range" to "1-3,5,10-12"
3. **Google Drive** → Upload selected pages
### 3. Convert DOCX and Process
1. **Read Binary File** → Load DOCX document
2. **DOCX to PDF Split** → Convert and split
   - Enable "Keep Original PDF" for complete document
3. **Email Send** → Send individual pages as attachments
### 4. Batch DOCX Processing
1. **Google Drive** → Download DOCX files
2. **DOCX to PDF Split** → Convert each to PDF pages
3. **Compress** → Create ZIP with all pages
4. **S3** → Upload to storage
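
Each workflow above is a linear chain of nodes. In an exported n8n workflow, such links are stored in a `connections` object keyed by the source node's name. The sketch below builds that object for a chain; `chainNodes` is a hypothetical helper, and the node names are simply the labels used in the lists above (in a real workflow they are whatever you named your nodes).

```typescript
// One outgoing link in n8n's exported workflow JSON.
interface Connection {
  node: string;
  type: "main";
  index: number;
}

type Connections = Record<string, { main: Connection[][] }>;

// Hypothetical helper: wires the given node names into a linear chain,
// each node's first output feeding the next node's first input.
function chainNodes(names: string[]): Connections {
  const connections: Connections = {};
  let previous: string | undefined;
  for (const current of names) {
    if (previous !== undefined) {
      connections[previous] = {
        main: [[{ node: current, type: "main", index: 0 }]],
      };
    }
    previous = current;
  }
  return connections;
}

// Workflow 1: Split and Save PDF Pages
const splitAndSave = chainNodes(["HTTP Request", "PDF Page Split", "Write Binary File"]);
```

The last node in the chain gets no entry, since it has no outgoing connection.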
## ⚠️ Troubleshooting
### Common Issues
| Issue | Solution |
|-------|----------|
| "No binary data found" | Ensure the previous node outputs binary data under the configured Binary Property |
| Empty PDF output | Verify that the input PDF is valid and not corrupted |
| Memory errors | For large PDFs, use Page Range to process fewer pages per run |
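
To narrow down a "No binary data found" error, it can help to check which binary properties an item actually carries before it reaches the split node. The following is a minimal sketch; `Item` is a simplified stand-in for an n8n item, and `checkBinary` is a hypothetical helper, not part of this package.

```typescript
// Simplified stand-in for an n8n item; real items carry more fields.
interface Item {
  json: Record<string, unknown>;
  binary?: Record<string, unknown>;
}

// Hypothetical pre-check: returns a diagnostic message,
// or null if the expected binary property is present.
function checkBinary(item: Item, binaryPropertyName = "data"): string | null {
  if (!item.binary) {
    return "Item has no binary data at all";
  }
  if (!(binaryPropertyName in item.binary)) {
    const available = Object.keys(item.binary).join(", ") || "(none)";
    return `No binary property "${binaryPropertyName}"; available: ${available}`;
  }
  return null;
}

const diagnosis = checkBinary({ json: {}, binary: { file: {} } });
// diagnosis: 'No binary property "data"; available: file'
```

If the message lists a different property name, set the node's Binary Property parameter to that name.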
### Best Practices
- Verify that the PDF is not password-protected
- Use page ranges to limit how much of a large document is processed at once
- Monitor memory usage in production
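
One way to apply page ranges to a large document is to split the work into fixed-size batches and run the node once per batch. The sketch below generates Page Range strings for that purpose; `batchPageRanges` is a hypothetical helper, and the batch size of 50 is an arbitrary example value.

```typescript
// Hypothetical helper: splits pages 1..totalPages into Page Range strings
// covering at most `batchSize` pages each.
function batchPageRanges(totalPages: number, batchSize: number): string[] {
  if (!Number.isInteger(totalPages) || totalPages < 0) {
    throw new Error("totalPages must be a non-negative integer");
  }
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error("batchSize must be a positive integer");
  }
  const ranges: string[] = [];
  for (let start = 1; start <= totalPages; start += batchSize) {
    const end = Math.min(start + batchSize - 1, totalPages);
    // A single-page batch is written as "101" rather than "101-101".
    ranges.push(start === end ? `${start}` : `${start}-${end}`);
  }
  return ranges;
}

const batches = batchPageRanges(120, 50);
// batches: ["1-50", "51-100", "101-120"]
```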
## 🔧 Technical Details
### Libraries Used
- **[pdf-lib](https://pdf-lib.js.org/)**: PDF manipulation and splitting
- **[mammoth](https://github.com/mwilliamson/mammoth.js)**: DOCX text and structure extraction
- **[pdfkit](http://pdfkit.org/)**: PDF generation from extracted content
### Key Features
- Pure JavaScript implementation
- No native dependencies
- Cross-platform compatibility
- Docker-friendly operation
### Limitations
- Does not support password-protected PDFs or DOCX files
- Cannot extract text or metadata
- Maximum file size depends on available memory
- DOCX conversion aims to preserve formatting, but the output may differ from native Office rendering
## 🤝 Contributing
We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details.
## 📝 License
[MIT](LICENSE)
## 🙏 Acknowledgments
- [n8n](https://n8n.io/) - For the amazing workflow automation platform
- [pdf-lib](https://pdf-lib.js.org/) - For reliable PDF manipulation
- All our contributors and users