# Document Extraction Service
`document-extraction-service` is a Node.js library for integrating with document processing APIs. It handles request preparation, response validation, and callback processing for document extraction workflows.
## Installation
To install the package, run:
```bash
npm install document-extraction-service
```
## Quick Start
### Configure the Request Validator
```javascript
const { createRequestValidator } = require('document-extraction-service');

const config = {
  endpoint: 'https://your-extraction-api.com',
  headers: {
    'Content-Type': 'multipart/form-data',
    'callback_url_pattern': 'https://your-service.com/callback/{{streamId}}/{{extractionStrategyId}}',
    'trace_id': '{{traceId}}'
  },
  requestBody: {
    'strategies_batch_id': '{{strategiesBatchId}}',
    'doc_id': '{{docId}}',
    'url': '{{file_url}}',
    'document_meta': '{{content}}'
  },
  timeout_days: 2,
  max_retries: 3
};

const requestValidator = createRequestValidator(config);
```
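The `{{name}}` placeholders in `headers` and `requestBody` are presumably filled in per request by the library, but its exact substitution rules are not documented here. As a standalone illustration only, the sketch below shows how a pattern such as `callback_url_pattern` could be expanded. `fillTemplate` is a hypothetical helper (not exported by this package), and it assumes a missing value should raise an error rather than leave the placeholder in place.

```javascript
// Hypothetical helper, not part of document-extraction-service.
// Replaces every {{name}} in `template` with values[name]; throws if a value is missing.
function fillTemplate(template, values) {
  return template.replace(/\{\{(\w+)\}\}/g, (match, name) => {
    if (!Object.prototype.hasOwnProperty.call(values, name)) {
      throw new Error(`No value for placeholder {{${name}}}`);
    }
    return String(values[name]);
  });
}

// Example values are illustrative; 'strat1' is a made-up strategy ID.
const url = fillTemplate(
  'https://your-service.com/callback/{{streamId}}/{{extractionStrategyId}}',
  { streamId: 'stream456', extractionStrategyId: 'strat1' }
);
console.log(url); // https://your-service.com/callback/stream456/strat1
```

If your IDs can contain characters such as `/` or spaces, wrap the substituted value in `encodeURIComponent` when building URLs.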
### Create the Callback Validator
```javascript
const { createCallbackValidator } = require('document-extraction-service');
const callbackValidator = createCallbackValidator();
```
### Processing a Document
```javascript
// Any HTTP client works; axios (npm install axios) is used here as an example
const axios = require('axios');

const processDocument = async () => {
  const docId = 'doc123';
  const content = { text: 'Your document content' };
  const streamId = 'stream456';

  try {
    // Prepare request parameters (url, method, headers, data)
    const requestParams = requestValidator.prepareRequest(docId, content, streamId);

    // Make the API call; axios accepts the returned object as its request config
    const response = await axios(requestParams);

    // Validate the response, passing the trace ID generated for this request
    const result = requestValidator.handleResponse(response, requestParams.headers['X-Trace-ID']);
    console.log('Document processing initiated:', result);
  } catch (error) {
    console.error('Error processing document:', error);
  }
};

processDocument();
```
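The configuration accepts `max_retries`, but this README does not say whether the library retries failed API calls on its own. If you want retries around the HTTP call, here is a minimal sketch. `withRetries` is a hypothetical helper, and the linear backoff (`baseDelayMs * attempt number`) is an arbitrary choice, not something the package specifies.

```javascript
// Hypothetical helper, not part of document-extraction-service.
// Calls `fn` up to 1 + maxRetries times, waiting longer after each failure.
async function withRetries(fn, maxRetries = 3, baseDelayMs = 500) {
  let lastError;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < maxRetries) {
        // Linear backoff: 500 ms, 1000 ms, 1500 ms, ... with the default base delay
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * (attempt + 1)));
      }
    }
  }
  // All attempts failed: surface the last error to the caller
  throw lastError;
}
```

Inside `processDocument`, the call would become `await withRetries(() => axios(requestParams), config.max_retries)`. In practice you may want to retry only on network errors or 5xx responses, since a 4xx usually indicates a request that will keep failing.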
### Handling Callback
```javascript
const handleCallback = async (callbackData) => {
  try {
    const result = await callbackValidator.handleCallback(callbackData);
    if (!result.success) {
      console.error(result.error);
    } else {
      console.log('Callback processed:', result);
    }
  } catch (error) {
    console.error('Error processing callback:', error);
  }
};
```
## API Documentation
### Configuration Object
```javascript
const config = {
  endpoint: 'https://api.example.com', // Required - API endpoint
  headers: {
    'Authorization': 'Bearer token',
    'callback_url_pattern': 'https://callback.com/{{docId}}/{{streamId}}' // Required
  },
  timeout_days: 2, // Optional - default: 2
  max_retries: 3 // Optional - default: 3
};
```
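To make the documented defaults explicit, and to fail early when a required field is missing, you can normalize the object before passing it to `createRequestValidator`. This is a hypothetical sketch: `normalizeConfig` is not part of the package, and whether the library validates these fields the same way is an assumption. The error text mirrors the `'Missing required field'` message checked under Error Handling.

```javascript
// Hypothetical helper, not part of document-extraction-service.
// Checks the two fields marked Required and applies the documented defaults.
function normalizeConfig(config) {
  if (!config.endpoint) {
    throw new Error('Missing required field: endpoint');
  }
  if (!config.headers || !config.headers.callback_url_pattern) {
    throw new Error('Missing required field: headers.callback_url_pattern');
  }
  return {
    ...config,
    // ?? keeps explicit values such as max_retries: 0 instead of overwriting them
    timeout_days: config.timeout_days ?? 2,
    max_retries: config.max_retries ?? 3
  };
}
```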
### Request Preparation
```javascript
const params = requestValidator.prepareRequest(docId, content, streamId);
```
#### Returns:
```javascript
{
  url: string,
  method: 'POST',
  headers: {
    'X-Document-ID': string,
    'X-Trace-ID': string,
    'X-Callback-URL': string,
    // ...other headers
  },
  data: {
    content: any,
    streamId: string
  }
}
```
### Response Handling
```javascript
const result = requestValidator.handleResponse(response, traceId);
```
#### Returns:
```javascript
{
  success: boolean,
  docId: string,
  traceId: string,
  message?: string,
  error?: string
}
```
### Callback Handling
```javascript
const result = await callbackValidator.handleCallback(callbackData);
```
#### Input Format:
```javascript
{
  doc_id: string,
  trace_id: string,
  chunk_data: Array<{
    content: string,
    index: number,
    chunkId: string,
    chunkText: string
  }>,
  last_batch: boolean
}
```
#### Returns:
```javascript
{
  success: boolean,
  docId: string,
  traceId: string,
  isLastBatch: boolean,
  chunks: Array<ProcessedChunk>,
  metadata: {
    processedAt: string,
    chunksCount: number
  }
}
```
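Because the `last_batch` / `isLastBatch` flag implies that a document's chunks can arrive over several callbacks, callers usually need to gather results per document. Below is a minimal in-memory sketch built only on the documented return fields (`success`, `docId`, `isLastBatch`, `chunks`). `createBatchCollector` and its `onComplete` callback are hypothetical names; the sketch assumes batches arrive in order and does not reorder chunks, since the fields of `ProcessedChunk` are not documented here.

```javascript
// Hypothetical helper, not part of document-extraction-service.
// Buffers chunks per docId and calls onComplete once the last batch arrives.
function createBatchCollector(onComplete) {
  const pending = new Map(); // docId -> chunks received so far

  return function collect(result) {
    if (!result.success) return false; // failed callbacks are left to the caller

    const chunks = pending.get(result.docId) || [];
    chunks.push(...result.chunks);

    if (result.isLastBatch) {
      pending.delete(result.docId);
      onComplete(result.docId, chunks);
    } else {
      pending.set(result.docId, chunks);
    }
    return true;
  };
}
```

A typical wiring is to create one collector at startup and call `collect(result)` after each `handleCallback`. A production service would likely persist this state (for example, in a database) rather than keep it in a `Map`, so that partial documents survive restarts.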
## Additional Utilities
### Chunk Validation
```javascript
const {
  ChunkData,
  ExtractionConfig,
  CustomExtractorFactory
} = require('document-extraction-service');

// Validate chunks
ChunkData.validateResponse(chunksData);
ChunkData.validateChunk(chunk);
```
### Custom Configurations
```javascript
const config = new ExtractionConfig({ /* configuration fields */ });
const factory = new CustomExtractorFactory();
const customRequestValidator = factory.createRequestValidator(config);
const customCallbackValidator = factory.createCallbackValidator();
```
## Error Handling
### Validating Input
```javascript
try {
  // prepareRequest returns the request object directly (as in Quick Start), so no await is needed
  const requestParams = requestValidator.prepareRequest(docId, content, streamId);
} catch (error) {
  if (error.message.includes('Missing required field')) {
    // Handle validation error
  } else {
    // Handle other errors
  }
}
```
### Handling Callback Validation Errors
```javascript
try {
  const result = await callbackValidator.handleCallback(callbackData);
  if (!result.success) {
    // Handle validation failure
    console.error(result.error);
  }
} catch (error) {
  // Handle unexpected errors
  console.error(error);
}
```
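`handleCallback` reports validation failures through `result.success` and `result.error`. If you also want to reject obviously malformed payloads at your HTTP layer before calling it (for example, to answer with a 400 status), a structural check against the documented input format might look like the sketch below. `describeCallbackErrors` is hypothetical and checks field types only; it is not a replacement for the library's own validation.

```javascript
// Hypothetical pre-check, not part of document-extraction-service.
// Returns a list of problems; an empty list means the payload matches the documented shape.
function describeCallbackErrors(payload) {
  if (payload === null || typeof payload !== 'object') {
    return ['payload must be an object'];
  }
  const errors = [];
  for (const field of ['doc_id', 'trace_id']) {
    if (typeof payload[field] !== 'string') errors.push(`${field} must be a string`);
  }
  if (typeof payload.last_batch !== 'boolean') errors.push('last_batch must be a boolean');
  if (!Array.isArray(payload.chunk_data)) {
    errors.push('chunk_data must be an array');
    return errors;
  }
  payload.chunk_data.forEach((chunk, i) => {
    if (chunk === null || typeof chunk !== 'object') {
      errors.push(`chunk_data[${i}] must be an object`);
      return;
    }
    for (const key of ['content', 'chunkId', 'chunkText']) {
      if (typeof chunk[key] !== 'string') errors.push(`chunk_data[${i}].${key} must be a string`);
    }
    if (typeof chunk.index !== 'number') errors.push(`chunk_data[${i}].index must be a number`);
  });
  return errors;
}
```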
## Testing
Run the provided test suite:
```bash
npm test
```
## Features
- **Request Preparation**: Builds API requests (URL, method, headers, and body) from a single configuration object.
- **Response Validation**: Checks that API responses are correctly formatted.
- **Callback Processing**: Validates callback payloads and returns the processed chunks with metadata.
- **Customizable Configuration**: Supports a configurable timeout (`timeout_days`), retry limit (`max_retries`), and callback URL patterns.
## License
This project is licensed under the MIT License. Contributions are welcome!