@nutrient-sdk/dws-client-typescript
Version:
Node.js TypeScript client library for Nutrient Document Web Services (DWS) API
1,538 lines (1,194 loc) • 54 kB
Markdown
# Nutrient DWS TypeScript Client Documentation
> Nutrient DWS is a document processing service which provides document processing operations including conversion, merging, compression, watermarking, signage, and text extraction.
## Authentication
### Direct API Key
Provide your API key directly:
```typescript
const client = new NutrientClient({
apiKey: 'nutr_sk_your_secret_key'
});
```
### Token Provider
Use an async token provider to fetch tokens from a secure source:
```typescript
const client = new NutrientClient({
apiKey: async () => {
const response = await fetch('/api/get-nutrient-token');
const { token } = await response.json();
return token;
}
});
```
## NutrientClient
The main client for interacting with the Nutrient DWS Processor API.
### Constructor
```typescript
new NutrientClient(options: NutrientClientOptions)
```
Options:
- `apiKey` (required): Your API key string or async function returning a token
- `baseUrl` (optional): Custom API base URL (defaults to `https://api.nutrient.io`)
- `timeout` (optional): Request timeout in milliseconds
## Direct Methods
The client provides numerous methods for document processing:
### Account Methods
#### getAccountInfo()
Gets account information for the current API key.
**Returns**: `Promise<AccountInfo>` - Promise resolving to account information
```typescript
const accountInfo = await client.getAccountInfo();
// Access subscription information
console.log(accountInfo.subscriptionType);
```
#### createToken(params)
Creates a new authentication token.
**Parameters**:
- `params: CreateAuthTokenParameters` - Parameters for creating the token
**Returns**: `Promise<CreateAuthTokenResponse>` - Promise resolving to the created token information
```typescript
const token = await client.createToken({
expirationTime: 3600
});
console.log(token.id);
// Store the token for future use
const tokenId = token.id;
const tokenValue = token.token;
```
#### deleteToken(id)
Deletes an authentication token.
**Parameters**:
- `id: string` - ID of the token to delete
**Returns**: `Promise<void>` - Promise resolving when the token is deleted
```typescript
await client.deleteToken('token-id-123');
// Example in a token management function
async function revokeUserToken(tokenId) {
try {
await client.deleteToken(tokenId);
console.log(`Token ${tokenId} successfully revoked`);
return true;
} catch (error) {
console.error(`Failed to revoke token: ${error.message}`);
return false;
}
}
```
### Document Processing Methods
#### sign(file, data?, options?)
Signs a PDF document.
**Parameters**:
- `file: FileInput` - The PDF file to sign
- `data?: CreateDigitalSignature` - Signature data
- `options?: { image?: FileInput; graphicImage?: FileInput }` - Additional options
**Returns**: `Promise<WorkflowOutput>` - Promise resolving to the signed PDF file output
```typescript
const result = await client.sign('document.pdf', {
signature: {
signatureType: 'cms',
flatten: false,
cadesLevel: 'b-lt',
}
});
// Access the signed PDF buffer
const pdfBuffer = result.buffer;
// Get the MIME type of the output
console.log(result.mimeType); // 'application/pdf'
// Save the buffer to a file (Node.js example)
const fs = require('fs');
fs.writeFileSync('signed-document.pdf', Buffer.from(result.buffer));
```
#### createRedactionsAI(file, criteria, redaction_state?, pages?, options?)
Uses AI to redact sensitive information in a document.
**Parameters**:
- `file: FileInput` - The PDF file to redact
- `criteria: string` - AI redaction criteria
- `redaction_state?: 'stage' | 'apply'` - Whether to stage or apply redactions (default: 'stage')
- `pages?: { start?: number; end?: number }` - Optional pages to redact
- `options?: RedactDataOptions` - Optional redaction options
**Returns**: `Promise<WorkflowOutput>` - Promise resolving to the redacted document
```typescript
// Stage redactions
const result = await client.createRedactionsAI(
'document.pdf',
'Remove all emails'
);
// Apply redactions immediately
const result = await client.createRedactionsAI(
'document.pdf',
'Remove all PII',
'apply'
);
// Redact only specific pages
const result = await client.createRedactionsAI(
'document.pdf',
'Remove all emails',
'stage',
{ start: 0, end: 4 } // Pages 0, 1, 2, 3, 4
);
// Redact only the last 3 pages
const result = await client.createRedactionsAI(
'document.pdf',
'Remove all PII',
'stage',
{ start: -3, end: -1 } // Last three pages
);
// Access the redacted PDF buffer
const pdfBuffer = result.buffer;
// Get the MIME type of the output
console.log(result.mimeType); // 'application/pdf'
// Save the buffer to a file (Node.js example)
const fs = require('fs');
fs.writeFileSync('redacted-document.pdf', Buffer.from(result.buffer));
```
#### ocr(file, language)
Performs OCR (Optical Character Recognition) on a document.
**Parameters**:
- `file: FileInput` - The input file to perform OCR on
- `language: OcrLanguage | OcrLanguage[]` - The language(s) to use for OCR
**Returns**: `Promise<WorkflowOutput>` - Promise resolving to the OCR result
```typescript
const result = await client.ocr('scanned-document.pdf', 'english');
// Access the OCR-processed PDF buffer
const pdfBuffer = result.buffer;
// Get the MIME type of the output
console.log(result.mimeType); // 'application/pdf'
// Save the buffer to a file (Node.js example)
const fs = require('fs');
fs.writeFileSync('ocr-document.pdf', Buffer.from(result.buffer));
```
#### watermarkText(file, text, options?)
Adds a text watermark to a document.
**Parameters**:
- `file: FileInput` - The input file to watermark
- `text: string` - The watermark text
- `options?: TextWatermarkOptions` - Watermark options
**Returns**: `Promise<WorkflowOutput>` - Promise resolving to the watermarked document
```typescript
const result = await client.watermarkText('document.pdf', 'CONFIDENTIAL', {
opacity: 0.5,
fontSize: 24
});
// Access the watermarked PDF buffer
const pdfBuffer = result.buffer;
// Get the MIME type of the output
console.log(result.mimeType); // 'application/pdf'
// Save the buffer to a file (Node.js example)
const fs = require('fs');
fs.writeFileSync('watermarked-document.pdf', Buffer.from(result.buffer));
```
#### watermarkImage(file, image, options?)
Adds an image watermark to a document.
**Parameters**:
- `file: FileInput` - The input file to watermark
- `image: FileInput` - The watermark image
- `options?: ImageWatermarkOptions` - Watermark options
**Returns**: `Promise<WorkflowOutput>` - Promise resolving to the watermarked document
```typescript
const result = await client.watermarkImage('document.pdf', 'watermark.jpg', {
opacity: 0.5,
width: { value: 50, unit: "%"},
height: { value: 50, unit: "%"}
});
// Access the watermarked PDF buffer
const pdfBuffer = result.buffer;
// Get the MIME type of the output
console.log(result.mimeType); // 'application/pdf'
// Save the buffer to a file (Node.js example)
const fs = require('fs');
fs.writeFileSync('image-watermarked-document.pdf', Buffer.from(result.buffer));
```
#### convert(file, targetFormat)
Converts a document to a different format.
**Parameters**:
- `file: FileInput` - The input file to convert
- `targetFormat: string` - The target format to convert to
**Returns**: `Promise<WorkflowOutput>` - Promise resolving to the specific output type based on the target format
```typescript
// Convert DOCX to PDF
const pdfResult = await client.convert('document.docx', 'pdf');
// Supports formats: pdf, pdfa, pdfua, docx, xlsx, pptx, png, jpeg, jpg, webp, html, markdown
// Access the PDF buffer
const pdfBuffer = pdfResult.buffer;
console.log(pdfResult.mimeType); // 'application/pdf'
// Save the PDF (Node.js example)
const fs = require('fs');
fs.writeFileSync('converted-document.pdf', Buffer.from(pdfResult.buffer));
// Convert PDF to image
const imageResult = await client.convert('document.pdf', 'png');
// Access the PNG buffer
const pngBuffer = imageResult.buffer;
console.log(imageResult.mimeType); // 'image/png'
// Save the image (Node.js example)
fs.writeFileSync('document-page.png', Buffer.from(imageResult.buffer));
```
#### merge(files)
Merges multiple documents into one.
**Parameters**:
- `files: FileInput[]` - The files to merge
**Returns**: `Promise<WorkflowOutput>` - Promise resolving to the merged document
```typescript
const result = await client.merge([
'doc1.pdf',
'doc2.pdf',
'doc3.pdf'
]);
// Access the merged PDF buffer
const pdfBuffer = result.buffer;
// Get the MIME type of the output
console.log(result.mimeType); // 'application/pdf'
// Save the buffer to a file (Node.js example)
const fs = require('fs');
fs.writeFileSync('merged-document.pdf', Buffer.from(result.buffer));
```
#### extractText(file, pages?)
Extracts text content from a document.
**Parameters**:
- `file: FileInput` - The file to extract text from
- `pages?: { start?: number; end?: number }` - Optional page range to extract text from
**Returns**: `Promise<OutputTypeMap['json-content']>` - Promise resolving to the extracted text data
```typescript
const result = await client.extractText('document.pdf');
// Extract text from specific pages
const result = await client.extractText('document.pdf', { start: 0, end: 2 }); // Pages 0, 1, 2
// Extract text from the last page
const result = await client.extractText('document.pdf', { end: -1 }); // Last page
// Extract text from the second-to-last page to the end
const result = await client.extractText('document.pdf', { start: -2 }); // Second-to-last and last page
// Access the extracted text content
const textContent = result.data.pages[0].plainText;
// Process the extracted text
const wordCount = textContent.split(/\s+/).length;
console.log(`Document contains ${wordCount} words`);
// Search for specific content
if (textContent.includes('confidential')) {
console.log('Document contains confidential information');
}
```
#### extractTable(file, pages?)
Extracts table content from a document.
**Parameters**:
- `file: FileInput` - The file to extract tables from
- `pages?: { start?: number; end?: number }` - Optional page range to extract tables from
**Returns**: `Promise<OutputTypeMap['json-content']>` - Promise resolving to the extracted table data
```typescript
const result = await client.extractTable('document.pdf');
// Extract tables from specific pages
const result = await client.extractTable('document.pdf', { start: 0, end: 2 }); // Pages 0, 1, 2
// Extract tables from the last page
const result = await client.extractTable('document.pdf', { end: -1 }); // Last page
// Extract tables from the second-to-last page to the end
const result = await client.extractTable('document.pdf', { start: -2 }); // Second-to-last and last page
// Access the extracted tables
const tables = result.data.pages[0].tables;
// Process the first table if available
if (tables && tables.length > 0) {
const firstTable = tables[0];
// Get table dimensions
console.log(`Table has ${firstTable.rows.length} rows and ${firstTable.columns.length} columns`);
// Access table cells
for (let i = 0; i < firstTable.rows.length; i++) {
for (let j = 0; j < firstTable.columns.length; j++) {
const cell = firstTable.cells.find(cell => cell.rowIndex === i && cell.columnIndex === j);
const cellContent = cell?.text || '';
console.log(`Cell [${i}][${j}]: ${cellContent}`);
}
}
// Convert table to CSV
let csv = '';
for (let i = 0; i < firstTable.rows.length; i++) {
const rowData = [];
for (let j = 0; j < firstTable.columns.length; j++) {
const cell = firstTable.cells.find(cell => cell.rowIndex === i && cell.columnIndex === j);
rowData.push(cell?.text || '');
}
csv += rowData.join(',') + '\n';
}
console.log(csv);
}
```
#### extractKeyValuePairs(file, pages?)
Extracts key value pair content from a document.
**Parameters**:
- `file: FileInput` - The file to extract KVPs from
- `pages?: { start?: number; end?: number }` - Optional page range to extract KVPs from
**Returns**: `Promise<OutputTypeMap['json-content']>` - Promise resolving to the extracted KVPs data
```typescript
const result = await client.extractKeyValuePairs('document.pdf');
// Extract KVPs from specific pages
const result = await client.extractKeyValuePairs('document.pdf', { start: 0, end: 2 }); // Pages 0, 1, 2
// Extract KVPs from the last page
const result = await client.extractKeyValuePairs('document.pdf', { end: -1 }); // Last page
// Extract KVPs from the second-to-last page to the end
const result = await client.extractKeyValuePairs('document.pdf', { start: -2 }); // Second-to-last and last page
// Access the extracted key-value pairs
const kvps = result.data.pages[0].keyValuePairs;
// Process the key-value pairs
if (kvps && kvps.length > 0) {
// Iterate through all key-value pairs
kvps.forEach((kvp, index) => {
console.log(`KVP ${index + 1}:`);
console.log(` Key: ${kvp.key}`);
console.log(` Value: ${kvp.value}`);
console.log(` Confidence: ${kvp.confidence}`);
});
// Create a dictionary from the key-value pairs
const dictionary = {};
kvps.forEach(kvp => {
dictionary[kvp.key] = kvp.value;
});
// Look up specific values
console.log(`Invoice Number: ${dictionary['Invoice Number']}`);
console.log(`Date: ${dictionary['Date']}`);
console.log(`Total Amount: ${dictionary['Total']}`);
}
```
#### flatten(file, annotationIds?)
Flattens annotations in a PDF document.
```typescript
const result = await client.flatten('annotated-document.pdf');
```
#### rotate(file, angle, pages?)
Rotates pages in a document.
```typescript
const result = await client.rotate('document.pdf', 90);
// Rotate specific pages:
const result = await client.rotate('document.pdf', 90, { start: 1, end: 3 }); // Pages 1, 2, 3
// Rotate the last page:
const result = await client.rotate('document.pdf', 90, { end: -1 }); // Last page
// Rotate from page 2 to the second-to-last page:
const result = await client.rotate('document.pdf', 90, { start: 2, end: -2 });
```
#### passwordProtect(file, userPassword, ownerPassword, permissions?)
Password protects a PDF document.
**Parameters**:
- `file: FileInput` - The file to protect
- `userPassword: string` - Password required to open the document
- `ownerPassword: string` - Password required to modify the document
- `permissions?: PDFUserPermission[]` - Optional array of permissions granted when opened with user password
**Returns**: `Promise<WorkflowOutput>` - Promise resolving to the password-protected document
```typescript
const result = await client.passwordProtect('document.pdf', 'user123', 'owner456');
// Or with specific permissions:
const result = await client.passwordProtect('document.pdf', 'user123', 'owner456',
['printing', 'extract_accessibility']);
// Access the password-protected PDF buffer
const pdfBuffer = result.buffer;
// Get the MIME type of the output
console.log(result.mimeType); // 'application/pdf'
// Save the buffer to a file (Node.js example)
const fs = require('fs');
fs.writeFileSync('protected-document.pdf', Buffer.from(result.buffer));
```
#### setMetadata(file, metadata)
Sets metadata for a PDF document.
```typescript
const result = await client.setMetadata('document.pdf', {
title: 'My Document',
author: 'John Doe'
});
```
#### setPageLabels(file, labels)
Sets page labels for a PDF document.
```typescript
const result = await client.setPageLabels('document.pdf', [
{ pages: [0, 1, 2], label: 'Cover' },
{ pages: [3, 4, 5], label: 'Chapter 1' }
]);
```
#### applyInstantJson(file, instantJsonFile)
Applies Instant JSON to a document.
```typescript
const result = await client.applyInstantJson('document.pdf', 'annotations.json');
```
#### applyXfdf(file, xfdfFile, options?)
Applies XFDF to a document.
```typescript
const result = await client.applyXfdf('document.pdf', 'annotations.xfdf');
```
#### createRedactionsPreset(file, preset, redaction_state?, pages?, presetOptions?, options?)
Creates redaction annotations based on a preset pattern.
```typescript
const result = await client.createRedactionsPreset('document.pdf', 'email-address');
// With specific pages
const result = await client.createRedactionsPreset(
'document.pdf',
'email-address',
'stage',
{ start: 0, end: 4 } // Pages 0, 1, 2, 3, 4
);
// With the last 3 pages
const result = await client.createRedactionsPreset(
'document.pdf',
'email-address',
'stage',
{ start: -3, end: -1 } // Last three pages
);
```
#### createRedactionsRegex(file, regex, redaction_state?, pages?, regexOptions?, options?)
Creates redaction annotations based on a regular expression.
```typescript
const result = await client.createRedactionsRegex('document.pdf', 'Account:\\s*\\d{8,12}');
// With specific pages
const result = await client.createRedactionsRegex(
'document.pdf',
'Account:\\s*\\d{8,12}',
'stage',
{ start: 0, end: 4 } // Pages 0, 1, 2, 3, 4
);
// With the last 3 pages
const result = await client.createRedactionsRegex(
'document.pdf',
'Account:\\s*\\d{8,12}',
'stage',
{ start: -3, end: -1 } // Last three pages
);
```
#### createRedactionsText(file, text, redaction_state?, pages?, textOptions?, options?)
Creates redaction annotations based on text.
```typescript
const result = await client.createRedactionsText('document.pdf', 'email@example.com');
// With specific pages and options
const result = await client.createRedactionsText(
'document.pdf',
'email@example.com',
'stage',
{ start: 0, end: 4 }, // Pages 0, 1, 2, 3, 4
{ caseSensitive: false, includeAnnotations: true }
);
// Create redactions on the last 3 pages
const result = await client.createRedactionsText(
'document.pdf',
'email@example.com',
'stage',
{ start: -3, end: -1 } // Last three pages
);
```
#### applyRedactions(file)
Applies redaction annotations in a document.
```typescript
const result = await client.applyRedactions('document-with-redactions.pdf');
```
#### addPage(file, count?, index?)
Adds blank pages to a document.
```typescript
// Add 2 blank pages at the end
const result = await client.addPage('document.pdf', 2);
// Add 1 blank page after the first page (at index 1)
const result = await client.addPage('document.pdf', 1, 1);
```
#### optimize(file, options?)
Optimizes a PDF document for size reduction.
```typescript
const result = await client.optimize('large-document.pdf', {
grayscaleImages: true,
mrcCompression: true,
imageOptimizationQuality: 2
});
```
#### split(file, pageRanges)
Splits a PDF document into multiple parts based on page ranges.
**Parameters**:
- `file: FileInput` - The PDF file to split
- `pageRanges: { start?: number; end?: number }[]` - Array of page ranges to extract
**Returns**: `Promise<WorkflowOutput[]>` - Promise resolving to an array of PDF documents, one for each page range
```typescript
const results = await client.split('document.pdf', [
{ start: 0, end: 2 }, // Pages 0, 1, 2
{ start: 3, end: 5 } // Pages 3, 4, 5
]);
// Split using negative indices
const results = await client.split('document.pdf', [
{ start: 0, end: 2 }, // First three pages
{ start: 3, end: -3 }, // Middle pages
{ start: -2, end: -1 } // Last two pages
]);
// Process each resulting PDF
for (const result of results) {
// Access the PDF buffer
const pdfBuffer = result.buffer;
// Get the MIME type of the output
console.log(result.mimeType); // 'application/pdf'
// Save the buffer to a file (Node.js example)
const fs = require('fs');
fs.writeFileSync(`split-part-${i}.pdf`, Buffer.from(result.buffer));
}
```
#### duplicatePages(file, pageIndices)
Creates a new PDF containing only the specified pages in the order provided.
**Parameters**:
- `file: FileInput` - The PDF file to extract pages from
- `pageIndices: number[]` - Array of page indices to include in the new PDF (0-based)
**Returns**: `Promise<WorkflowOutput>` - Promise resolving to a new document with only the specified pages
```typescript
// Create a new PDF with only the first and third pages
const result = await client.duplicatePages('document.pdf', [0, 2]);
// Create a new PDF with pages in a different order
const result = await client.duplicatePages('document.pdf', [2, 0, 1]);
// Create a new PDF with duplicated pages
const result = await client.duplicatePages('document.pdf', [0, 0, 1, 1, 0]);
// Create a new PDF with the first and last pages
const result = await client.duplicatePages('document.pdf', [0, -1]);
// Create a new PDF with the last three pages in reverse order
const result = await client.duplicatePages('document.pdf', [-1, -2, -3]);
// Access the PDF buffer
const pdfBuffer = result.buffer;
// Get the MIME type of the output
console.log(result.mimeType); // 'application/pdf'
// Save the buffer to a file (Node.js example)
const fs = require('fs');
fs.writeFileSync('duplicated-pages.pdf', Buffer.from(result.buffer));
```
#### deletePages(file, pageIndices)
Deletes pages from a PDF document.
**Parameters**:
- `file: FileInput` - The PDF file to modify
- `pageIndices: number[]` - Array of page indices to delete (0-based)
**Returns**: `Promise<WorkflowOutput>` - Promise resolving to the document with deleted pages
```typescript
// Delete second and fourth pages
const result = await client.deletePages('document.pdf', [1, 3]);
// Delete the last page
const result = await client.deletePages('document.pdf', [-1]);
// Delete the first and last two pages
const result = await client.deletePages('document.pdf', [0, -1, -2]);
// Access the modified PDF buffer
const pdfBuffer = result.buffer;
// Get the MIME type of the output
console.log(result.mimeType); // 'application/pdf'
// Save the buffer to a file (Node.js example)
const fs = require('fs');
fs.writeFileSync('modified-document.pdf', Buffer.from(result.buffer));
```
### Error Handling
The library provides a comprehensive error hierarchy:
```typescript
import {
NutrientError,
ValidationError,
APIError,
AuthenticationError,
NetworkError
} from '@nutrient-sdk/dws-client-typescript';
try {
const result = await client.convert('file.docx', 'pdf');
} catch (error) {
if (error instanceof ValidationError) {
// Invalid input parameters
console.error('Invalid input:', error.message, error.details);
} else if (error instanceof AuthenticationError) {
// Authentication failed
console.error('Auth error:', error.message, error.statusCode);
} else if (error instanceof APIError) {
// API returned an error
console.error('API error:', error.message, error.statusCode, error.details);
} else if (error instanceof NetworkError) {
// Network request failed
console.error('Network error:', error.message, error.details);
}
}
```
## Workflow Methods
The Nutrient DWS TypeScript Client uses a fluent builder pattern with staged interfaces to create document processing workflows. This architecture provides several benefits:
1. **Type Safety**: The staged interface ensures that methods are only available at appropriate stages
2. **Readability**: Method chaining creates readable, declarative code
3. **Discoverability**: IDE auto-completion guides you through the workflow stages
4. **Flexibility**: Complex workflows can be built with simple, composable pieces
### Stage 0: Create Workflow
You have several ways of creating a workflow
```typescript
// Creating Workflow from a client
const workflow = client.workflow()
// Override the client timeout
const workflow = client.workflow(60000)
// Create a workflow without a client
const workflow = new StagedWorkflowBuilder({
apiKey: "your-api-key",
})
```
### Stage 1: Add Parts
In this stage, you add document parts to the workflow:
```typescript
const workflow = client.workflow()
.addFilePart('document.pdf')
.addFilePart('appendix.pdf');
```
Available methods:
#### `addFilePart(file, options?, actions?)`
Adds a file part to the workflow.
**Parameters:**
- `file: FileInput` - The file to add to the workflow. Can be a local file path, Buffer, or URL.
- `options?: object` - Additional options for the file part.
- `actions?: BuildAction[]` - Actions to apply to the file part.
**Returns:** `WorkflowWithPartsStage` - The workflow builder instance for method chaining.
**Example:**
```typescript
// Add a PDF file from a local path
workflow.addFilePart('/path/to/document.pdf');
// Add a file with options and actions
workflow.addFilePart(
'/path/to/document.pdf',
{ pages: { start: 1, end: 3 } },
[BuildActions.watermarkText('CONFIDENTIAL')]
);
```
#### `addHtmlPart(html, assets?, options?, actions?)`
Adds an HTML part to the workflow.
**Parameters:**
- `html: FileInput` - The HTML content to add. Can be a file path, Buffer, or URL.
- `assets?: FileInput[]` - Optional array of assets (CSS, images, etc.) to include with the HTML. Only local files or Buffers are supported (not URLs).
- `options?: object` - Additional options for the HTML part.
- `actions?: BuildAction[]` - Actions to apply to the HTML part.
**Returns:** `WorkflowWithPartsStage` - The workflow builder instance for method chaining.
**Example:**
```typescript
// Add HTML content from a file
workflow.addHtmlPart('/path/to/content.html');
// Add HTML with assets and options
workflow.addHtmlPart(
'/path/to/content.html',
['/path/to/style.css', '/path/to/image.png'],
{ layout: { size: 'A4' } }
);
```
#### `addNewPage(options?, actions?)`
Adds a new blank page to the workflow.
**Parameters:**
- `options?: object` - Additional options for the new page, such as page size, orientation, etc.
- `actions?: BuildAction[]` - Actions to apply to the new page.
**Returns:** `WorkflowWithPartsStage` - The workflow builder instance for method chaining.
**Example:**
```typescript
// Add a simple blank page
workflow.addNewPage();
// Add a new page with specific options
workflow.addNewPage(
{ layout: { size: 'A4', orientation: 'portrait' } }
);
```
#### `addDocumentPart(documentId, options?, actions?)`
Adds a document part to the workflow by referencing an existing document by ID.
**Parameters:**
- `documentId: string` - The ID of the document to add to the workflow.
- `options?: object` - Additional options for the document part.
- `options.layer?: string` - Optional layer name to select a specific layer from the document.
- `actions?: BuildAction[]` - Actions to apply to the document part.
**Returns:** `WorkflowWithPartsStage` - The workflow builder instance for method chaining.
**Example:**
```typescript
// Add a document by ID
workflow.addDocumentPart('doc_12345abcde');
// Add a document with a specific layer and options
workflow.addDocumentPart(
'doc_12345abcde',
{
layer: 'content',
pages: { start: 0, end: 3 }
}
);
```
### Stage 2: Apply Actions (Optional)
In this stage, you can apply actions to the document:
```typescript
workflow.applyAction(BuildActions.watermarkText('CONFIDENTIAL', {
opacity: 0.5,
fontSize: 48
}));
```
Available methods:
#### `applyAction(action)`
Applies a single action to the workflow.
**Parameters:**
- `action: BuildAction` - The action to apply to the workflow.
**Returns:** `WorkflowWithActionsStage` - The workflow builder instance for method chaining.
**Example:**
```typescript
// Apply a watermark action
workflow.applyAction(
BuildActions.watermarkText('CONFIDENTIAL', {
opacity: 0.3,
rotation: 45
})
);
// Apply an OCR action
workflow.applyAction(BuildActions.ocr('eng'));
```
#### `applyActions(actions)`
Applies multiple actions to the workflow.
**Parameters:**
- `actions: BuildAction[]` - An array of actions to apply to the workflow.
**Returns:** `WorkflowWithActionsStage` - The workflow builder instance for method chaining.
**Example:**
```typescript
// Apply multiple actions to the workflow
workflow.applyActions([
BuildActions.watermarkText('DRAFT', { opacity: 0.5 }),
BuildActions.ocr('eng'),
BuildActions.flatten()
]);
```
#### Action Types:
#### Document Processing
##### `BuildActions.ocr(language)`
Creates an OCR (Optical Character Recognition) action to extract text from images or scanned documents.
**Parameters:**
- `language: string | string[]` - Language(s) for OCR. Can be a single language or an array of languages.
**Example:**
```typescript
// Basic OCR with English language
workflow.applyAction(BuildActions.ocr('english'));
// OCR with multiple languages
workflow.applyAction(BuildActions.ocr(['english', 'french', 'german']));
// OCR with options (via object syntax)
workflow.applyAction(BuildActions.ocr({
language: 'english',
enhanceResolution: true
}));
```
##### `BuildActions.rotate(rotateBy)`
Creates an action to rotate pages in the document.
**Parameters:**
- `rotateBy: 90 | 180 | 270` - Rotation angle in degrees (must be 90, 180, or 270).
**Example:**
```typescript
// Rotate pages by 90 degrees
workflow.applyAction(BuildActions.rotate(90));
// Rotate pages by 180 degrees
workflow.applyAction(BuildActions.rotate(180));
```
##### `BuildActions.flatten(annotationIds?)`
Creates an action to flatten annotations into the document content, making them non-interactive but permanently visible.
**Parameters:**
- `annotationIds?: (string | number)[]` - Optional array of annotation IDs to flatten. If not specified, all annotations will be flattened.
**Example:**
```typescript
// Flatten all annotations
workflow.applyAction(BuildActions.flatten());
// Flatten specific annotations
workflow.applyAction(BuildActions.flatten(['annotation1', 'annotation2']));
```
#### Watermarking
##### `BuildActions.watermarkText(text, options?)`
Creates an action to add a text watermark to the document.
**Parameters:**
- `text: string` - Watermark text content.
- `options?: object` - Watermark options:
- `width`: Width dimension of the watermark (value and unit, e.g. `{value: 100, unit: '%'}`)
- `height`: Height dimension of the watermark (value and unit)
- `top`, `right`, `bottom`, `left`: Position of the watermark (value and unit)
- `rotation`: Rotation of the watermark in counterclockwise degrees (default: 0)
- `opacity`: Watermark opacity (0 is fully transparent, 1 is fully opaque)
- `fontFamily`: Font family for the text (e.g. 'Helvetica')
- `fontSize`: Size of the text in points
- `fontColor`: Foreground color of the text (e.g. '#ffffff')
- `fontStyle`: Text style array ('bold', 'italic', or both)
**Example:**
```typescript
// Simple text watermark
workflow.applyAction(BuildActions.watermarkText('CONFIDENTIAL'));
// Customized text watermark
workflow.applyAction(BuildActions.watermarkText('DRAFT', {
opacity: 0.5,
rotation: 45,
fontSize: 36,
fontColor: '#FF0000',
fontStyle: ['bold', 'italic']
}));
```
##### `BuildActions.watermarkImage(image, options?)`
Creates an action to add an image watermark to the document.
**Parameters:**
- `image: FileInput` - Watermark image (file path, Buffer, or URL).
- `options?: object` - Watermark options:
- `width`: Width dimension of the watermark (value and unit, e.g. `{value: 100, unit: '%'}`)
- `height`: Height dimension of the watermark (value and unit)
- `top`, `right`, `bottom`, `left`: Position of the watermark (value and unit)
- `rotation`: Rotation of the watermark in counterclockwise degrees (default: 0)
- `opacity`: Watermark opacity (0 is fully transparent, 1 is fully opaque)
**Example:**
```typescript
// Simple image watermark
workflow.applyAction(BuildActions.watermarkImage('/path/to/logo.png'));
// Customized image watermark
workflow.applyAction(BuildActions.watermarkImage('/path/to/logo.png', {
opacity: 0.3,
width: { value: 50, unit: '%' },
height: { value: 50, unit: '%' },
top: { value: 10, unit: 'px' },
left: { value: 10, unit: 'px' },
rotation: 0
}));
```
#### Annotations
##### `BuildActions.applyInstantJson(file)`
Creates an action to apply annotations from an Instant JSON file to the document.
**Parameters:**
- `file: FileInput` - Instant JSON file input (file path, Buffer, or URL).
**Example:**
```typescript
// Apply annotations from Instant JSON file
workflow.applyAction(BuildActions.applyInstantJson('/path/to/annotations.json'));
```
##### `BuildActions.applyXfdf(file, options?)`
Creates an action to apply annotations from an XFDF file to the document.
**Parameters:**
- `file: FileInput` - XFDF file input (file path, Buffer, or URL).
- `options?: object` - Apply XFDF options:
- `ignorePageRotation?: boolean` - If true, ignores page rotation when applying XFDF data (default: false)
- `richTextEnabled?: boolean` - If true, plain text annotations will be converted to rich text annotations. If false, all text annotations will be plain text annotations (default: true)
**Example:**
```typescript
// Apply annotations from XFDF file with default options
workflow.applyAction(BuildActions.applyXfdf('/path/to/annotations.xfdf'));
// Apply annotations with specific options
workflow.applyAction(BuildActions.applyXfdf('/path/to/annotations.xfdf', {
ignorePageRotation: true,
richTextEnabled: false
}));
```
#### Redactions
##### `BuildActions.createRedactionsText(text, options?, strategyOptions?)`
Creates an action to add redaction annotations based on text search.
**Parameters:**
- `text: string` - Text to search and redact.
- `options?: object` - Redaction options:
- `content?: object` - Visual aspects of the redaction annotation (background color, overlay text, etc.)
- `strategyOptions?: object` - Redaction strategy options:
- `includeAnnotations?: boolean` - If true, redaction annotations are created on top of annotations whose content match the provided text (default: true)
- `caseSensitive?: boolean` - If true, the search will be case sensitive (default: false)
- `start?: number` - The index of the page from where to start the search (default: 0)
- `limit?: number` - Starting from start, the number of pages to search (default: to the end of the document)
**Example:**
```typescript
// Create redactions for all occurrences of "Confidential"
workflow.applyAction(BuildActions.createRedactionsText('Confidential'));
// Create redactions with custom appearance and search options
workflow.applyAction(BuildActions.createRedactionsText('Confidential',
{
content: {
backgroundColor: '#000000',
overlayText: 'REDACTED',
textColor: '#FFFFFF'
}
},
{
caseSensitive: true,
start: 2,
limit: 5
}
));
```
##### `BuildActions.createRedactionsRegex(regex, options?, strategyOptions?)`
Creates an action to add redaction annotations based on regex pattern matching.
**Parameters:**
- `regex: string` - Regex pattern to search and redact.
- `options?: object` - Redaction options:
- `content?: object` - Visual aspects of the redaction annotation (background color, overlay text, etc.)
- `strategyOptions?: object` - Redaction strategy options:
- `includeAnnotations?: boolean` - If true, redaction annotations are created on top of annotations whose content match the provided regex (default: true)
- `caseSensitive?: boolean` - If true, the search will be case sensitive (default: true)
- `start?: number` - The index of the page from where to start the search (default: 0)
- `limit?: number` - Starting from start, the number of pages to search (default: to the end of the document)
**Example:**
```typescript
// Create redactions for email addresses
workflow.applyAction(BuildActions.createRedactionsRegex('[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}'));
// Create redactions with custom appearance and search options
workflow.applyAction(BuildActions.createRedactionsRegex('[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}',
{
content: {
backgroundColor: '#FF0000',
overlayText: 'EMAIL REDACTED'
}
},
{
caseSensitive: false,
start: 0,
limit: 10
}
));
```
##### `BuildActions.createRedactionsPreset(preset, options?, strategyOptions?)`
Creates an action to add redaction annotations based on a preset pattern.
**Parameters:**
- `preset: string` - Preset pattern to search and redact (e.g. 'email-address', 'credit-card-number', 'social-security-number', etc.)
- `options?: object` - Redaction options:
- `content?: object` - Visual aspects of the redaction annotation (background color, overlay text, etc.)
- `strategyOptions?: object` - Redaction strategy options:
- `includeAnnotations?: boolean` - If true, redaction annotations are created on top of annotations whose content match the provided preset (default: true)
- `start?: number` - The index of the page from where to start the search (default: 0)
- `limit?: number` - Starting from start, the number of pages to search (default: to the end of the document)
**Example:**
```typescript
// Create redactions for email addresses using preset
workflow.applyAction(BuildActions.createRedactionsPreset('email-address'));
// Create redactions for credit card numbers with custom appearance
workflow.applyAction(BuildActions.createRedactionsPreset('credit-card-number',
{
content: {
backgroundColor: '#000000',
overlayText: 'FINANCIAL DATA'
}
},
{
start: 0,
limit: 5
}
));
```
##### `BuildActions.applyRedactions()`
Creates an action to apply previously created redaction annotations, permanently removing the redacted content.
**Example:**
```typescript
// First create redactions
workflow.applyAction(BuildActions.createRedactionsPreset('email-address'));
// Then apply them
workflow.applyAction(BuildActions.applyRedactions());
```
### Stage 3: Set Output Format
In this stage, you specify the desired output format:
```typescript
workflow.outputPdf({
optimize: {
mrcCompression: true,
imageOptimizationQuality: 2
}
});
```
Available methods:
#### `outputPdf(options?)`
Sets the output format to PDF.
**Parameters:**
- `options?: object` - Additional options for PDF output, such as compression, encryption, etc.
- `options.metadata?: object` - Document metadata properties like title, author.
- `options.labels?: array` - Custom labels to add to the document for organization and categorization.
- `options.userPassword?: string` - Password required to open the document. When set, the PDF will be encrypted.
- `options.ownerPassword?: string` - Password required to modify the document. Provides additional security beyond the user password.
- `options.userPermissions?: array` - Array of permissions granted to users who open the document with the user password.
Options include: "printing", "modification", "content-copying", "annotation", "form-filling", etc.
- `options.optimize?: object` - PDF optimization settings to reduce file size and improve performance.
- `options.optimize.mrcCompression?: boolean` - When true, applies Mixed Raster Content compression to reduce file size.
- `options.optimize.imageOptimizationQuality?: number` - Controls the quality of image optimization (1-5, where 1 is highest quality).
**Returns:** `WorkflowWithOutputStage<'pdf'>` - The workflow builder instance for method chaining.
**Example:**
```typescript
// Set output format to PDF with default options
workflow.outputPdf();
// Set output format to PDF with specific options
workflow.outputPdf({
userPassword: 'secret',
userPermissions: ["printing"],
metadata: {
title: 'Important Document',
author: 'Document System'
},
optimize: {
mrcCompression: true,
imageOptimizationQuality: 3
}
});
```
#### `outputPdfA(options?)`
Sets the output format to PDF/A (archival PDF).
**Parameters:**
- `options?: object` - Additional options for PDF/A output.
- `options.conformance?: string` - The PDF/A conformance level to target. Options include 'pdfa-1b', 'pdfa-1a', 'pdfa-2b', 'pdfa-2a', 'pdfa-3b', 'pdfa-3a'.
Different levels have different requirements for long-term archiving.
- `options.vectorization?: boolean` - When true, attempts to convert raster content to vector graphics where possible, improving quality and reducing file size.
- `options.rasterization?: boolean` - When true, converts vector graphics to raster images, which can help with compatibility in some cases.
- `options.metadata?: object` - Document metadata properties like title, author.
- `options.labels?: array` - Custom labels to add to the document for organization and categorization.
- `options.userPassword?: string` - Password required to open the document. When set, the PDF will be encrypted.
- `options.ownerPassword?: string` - Password required to modify the document. Provides additional security beyond the user password.
- `options.userPermissions?: array` - Array of permissions granted to users who open the document with the user password.
Options include: "printing", "modification", "content-copying", "annotation", "form-filling", etc.
- `options.optimize?: object` - PDF optimization settings to reduce file size and improve performance.
- `options.optimize.mrcCompression?: boolean` - When true, applies Mixed Raster Content compression to reduce file size.
- `options.optimize.imageOptimizationQuality?: number` - Controls the quality of image optimization (1-5, where 1 is highest quality).
**Returns:** `WorkflowWithOutputStage<'pdfa'>` - The workflow builder instance for method chaining.
**Example:**
```typescript
// Set output format to PDF/A with default options
workflow.outputPdfA();
// Set output format to PDF/A with specific options
workflow.outputPdfA({
conformance: 'pdfa-2b',
vectorization: true,
metadata: {
title: 'Archive Document',
author: 'Document System'
},
optimize: {
mrcCompression: true
}
});
```
#### `outputPdfUA(options?)`
Sets the output format to PDF/UA (Universal Accessibility).
**Parameters:**
- `options?: object` - Additional options for PDF/UA output.
- `options.metadata?: object` - Document metadata properties like title, author.
- `options.labels?: array` - Custom labels to add to the document for organization and categorization.
- `options.userPassword?: string` - Password required to open the document. When set, the PDF will be encrypted.
- `options.ownerPassword?: string` - Password required to modify the document. Provides additional security beyond the user password.
- `options.userPermissions?: array` - Array of permissions granted to users who open the document with the user password.
Options include: "printing", "modification", "content-copying", "annotation", "form-filling", etc.
- `options.optimize?: object` - PDF optimization settings to reduce file size and improve performance.
- `options.optimize.mrcCompression?: boolean` - When true, applies Mixed Raster Content compression to reduce file size.
- `options.optimize.imageOptimizationQuality?: number` - Controls the quality of image optimization (1-5, where 1 is highest quality).
**Returns:** `WorkflowWithOutputStage<'pdfua'>` - The workflow builder instance for method chaining.
**Example:**
```typescript
// Set output format to PDF/UA with default options
workflow.outputPdfUA();
// Set output format to PDF/UA with specific options
workflow.outputPdfUA({
metadata: {
title: 'Accessible Document',
author: 'Document System'
},
optimize: {
mrcCompression: true,
imageOptimizationQuality: 3
}
});
```
#### `outputImage(format, options?)`
Sets the output format to an image format (PNG, JPEG, WEBP).
**Parameters:**
- `format: 'png' | 'jpeg' | 'jpg' | 'webp'` - The image format to output.
- PNG: Lossless compression, supports transparency, best for graphics and screenshots
- JPEG/JPG: Lossy compression, smaller file size, best for photographs
- WEBP: Modern format with both lossy and lossless compression, good for web use
- `options?: object` - Additional options for image output, such as resolution, quality, etc.
**Note: At least one of options.width, options.height, or options.dpi must be specified.**
- `options.pages?: object` - Specifies which pages to convert to images. If omitted, all pages are converted.
- `options.pages.start?: number` - The first page to convert (0-based index).
- `options.pages.end?: number` - The last page to convert (0-based index).
- `options.width?: number` - The width of the output image in pixels. If specified without height, aspect ratio is maintained.
- `options.height?: number` - The height of the output image in pixels. If specified without width, aspect ratio is maintained.
- `options.dpi?: number` - The resolution in dots per inch. Higher values create larger, more detailed images.
Common values: 72 (web), 150 (standard), 300 (print quality), 600 (high quality).
**Returns:** `WorkflowWithOutputStage<format>` - The workflow builder instance for method chaining.
**Example:**
```typescript
// Set output format to PNG with dpi specified
workflow.outputImage('png', { dpi: 300 });
// Set output format to JPEG with specific options
workflow.outputImage('jpeg', {
dpi: 300,
pages: { start: 1, end: 3 }
});
// Set output format to WEBP with specific dimensions
workflow.outputImage('webp', {
width: 1200,
height: 800,
dpi: 150
});
```
#### `outputOffice(format)`
Sets the output format to an Office document format (DOCX, XLSX, PPTX).
**Parameters:**
- `format: 'docx' | 'xlsx' | 'pptx'` - The Office format to output ('docx' for Word, 'xlsx' for Excel, or 'pptx' for PowerPoint).
**Returns:** `WorkflowWithOutputStage<format>` - The workflow builder instance for method chaining.
**Example:**
```typescript
// Set output format to Word document (DOCX)
workflow.outputOffice('docx');
// Set output format to Excel spreadsheet (XLSX)
workflow.outputOffice('xlsx');
// Set output format to PowerPoint presentation (PPTX)
workflow.outputOffice('pptx');
```
#### `outputHtml(layout)`
Sets the output format to HTML.
**Parameters:**
- `layout: 'page' | 'reflow'` - The layout type to use for conversion to HTML:
- 'page' layout keeps the original structure of the document, segmented by page.
- 'reflow' layout converts the document into a continuous flow of text, without page breaks.
**Returns:** `WorkflowWithOutputStage<'html'>` - The workflow builder instance for method chaining.
**Example:**
```typescript
// Set output format to HTML
workflow.outputHtml('page');
```
#### `outputMarkdown()`
Sets the output format to Markdown.
**Returns:** `WorkflowWithOutputStage<'markdown'>` - The workflow builder instance for method chaining.
**Example:**
```typescript
// Set output format to Markdown with default options
workflow.outputMarkdown();
```
#### `outputJson(options?)`
Sets the output format to JSON content.
**Parameters:**
- `options?: object` - Additional options for JSON output.
- `options.plainText?: boolean` - When true, extracts plain text content from the document and includes it in the JSON output.
This provides the raw text without structural information.
- `options.structuredText?: boolean` - When true, extracts text with structural information (paragraphs, headings, etc.)
and includes it in the JSON output.
- `options.keyValuePairs?: boolean` - When true, attempts to identify and extract key-value pairs from the document
(like form fields, labeled data, etc.) and includes them in the JSON output.
- `options.tables?: boolean` - When true, attempts to identify and extract tabular data from the document
and includes it in the JSON output as structured table objects.
- `options.language?: string | string[]` - Specifies the language(s) of the document content for better text extraction.
Can be a single language code or an array of language codes for multi-language documents.
Examples: "english", "french", "german", or ["english", "spanish"].
**Returns:** `WorkflowWithOutputStage<'json-content'>` - The workflow builder instance for method chaining.
**Example:**
```typescript
// Set output format to JSON with default options
workflow.outputJson();
// Set output format to JSON with specific options
workflow.outputJson({
plainText: true,
structuredText: true,
keyValuePairs: true,
tables: true,
language: "english"
});
// Set output format to JSON with multiple languages
workflow.outputJson({
plainText: true,
tables: true,
language: ["english", "french", "german"]
});
```
### Stage 4: Execute or Dry Run
In this final stage, you execute the workflow or perform a dry run:
```typescript
const result = await workflow.execute();
```
Available methods:
#### `execute(options?)`
Executes the workflow and returns the result.
**Parameters:**
- `options?: WorkflowExecuteOptions` - Options for workflow execution.
- `options.onProgress?: (current: number, total: number) => void` - Callback for progress updates.
**Returns:** `Promise<TypedWorkflowResult<TOutput>>` - A promise that resolves to the workflow result.
**Example:**
```typescript
// Execute the workflow with default options
const result = await workflow.execute();
// Execute with progress tracking
const result = await workflow.execute({
onProgress: (current, total) => {
console.log(`Processing step ${current} of ${total}`);
}
});
```
#### `dryRun(options?)`
Performs a dry run of the workflow without generating the final output. This is useful for validating the workflow configuration and estimating processing time.
**Returns:** `Promise<WorkflowDryRunResult>` - A promise that resolves to the dry run result, containing validation information and estimated processing time.
**Example:**
```typescript
// Perform a dry run with default options
const dryRunResult = await workflow
.addFilePart('/path/to/document.pdf')
.outputPdf()
.dryRun();
```
### Workflow Examples
#### Basic Document Conversion
```typescript
const result