Parse, search and stream PDF tabular data using Node.js with Mozilla's PDF.js library.

# pdf-data-parser Change Log

- version 1.2.20
  - enhancement: options.missingValues - check for blank cells by comparing XY coordinates against table header cells, default: false
  - enhancement: options.hasHeader - indicates if the table has a header row, default: true
  - updated: dependency "pdfjs-dist": "^5.4.54"
- version 1.2.19
  - refactor: convert project to ECMAScript modules
  - refactor: options hasHeader default is false
- version 1.2.18
  - refactor: change project group to github:drewletcher
- version 1.2.17
  - feature: add password option to decrypt PDF documents
  - refactor: pdp CLI improved options handling
  - bug fix: optional JSON output of rows as arrays
- version 1.2.16
  - refactor: change project group to gitlab:drewletcher
- version 1.2.15
  - enhancement: options.cells accepts minimum number of cells or "min-max" range
  - enhancement: transform options hasHeader, header:m:n
  - enhancement: handle stream pause/resume
  - feature: support for TypeScript types
  - feature: add jsonc support for options files
  - refactor: error handling improvements for file not found
  - updated: "pdfjs-dist": "^4.5.136"
- version 1.2.14
  - bug fix: RepeatCellTransform splice repeating column at array index
  - feature: RepeatCellTransform handle missing value (empty cell)
- version 1.2.13
  - refactor: _compareFiles test
  - refactor: testrunner.bat, use tr_launcher CLI test app from @oby4/storage-lib project
- version 1.2.12
  - feature: add trim option to trim whitespace from output values
  - refactor: CLI format option: --format=csv|json|rows
- version 1.2.11
  - update: use pdfjs-dist 4.2.67, fixes a high severity vulnerability
- version 1.2.10
  - feature: implement pdf.js options.data argument for TypedArray input, instead of using options.url argument
- version 1.2.9
  - bug fix: use pdfjs-dist 4.0.379 legacy build, fixes Promise.withResolvers error
- version 1.2.8
  - feature: add options.stopHeading to identify the end of a table
- version 1.2.7
  - feature: read command-line options from a file
  - feature: pages option to limit parsing to specific pages
  - feature: support non-Marked, line oriented PDF content, e.g. printer style output to PDF files
  - feature: add RepeatCell and RepeatHeading transforms to normalize rows from printed reports
  - refactor: improved handling of XY coordinates for determining cell order
- version 1.2.6
  - updated: README.md
- version 1.2.5
  - feature: modulesPath() finds node_modules dir for default fonts
  - updated: README.md cli options
- version 1.2.4
  - feature: implement CLI program, pdp or pdf-data-parser
  - refactor: if options.heading is not defined process entire document, not just first table
  - updated: pdfjs-dist@4.0
- version 1.1.3
  - use latest pre-built PDF.js from npm package 'pdfjs-dist' instead of building it
- version 1.1.2
  - remove console debug statement
  - updated README.md documentation
- version 1.1.1
  - improved test coverage
  - remove console debug statement
  - fix typos in README.md
- version 1.1.0
  - remove page header and footer artifacts from output
  - option to remove repeating table headers on multiple page output
  - major refactor of marked content processing
  - handle cells that are out of normal x,y flow order, e.g. absolute positioning
  - improved handling of font size and line spacing
  - improved handling of cells with _Span_ marked content
  - testing: compare output files to expected data
- version 1.0.5
  - add comment about vertical spanning cells to README
- version 1.0.4
  - updated README with info about PDF marked content
- version 1.0.3
  - bug fix of typo using `options.cells` argument
- version 1.0.2
  - allow regexp for PdfDataParser `heading` option
  - bug fix of typo in code when using `options.cells`
- version 1.0.1
  - fix some typos in README file
- version 1.0.0
  - complete README file
  - testing updates
- version 0.9.1
  - cleanup code, comments and readme
- version 0.9.0
  - initial check-in of project files
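
The 1.2.15 entry above says the cell-count option accepts either a minimum number of cells or a "min-max" range. As a rough illustration of what that could mean for row filtering, here is a minimal sketch. The helper names `parseCellsRange` and `rowInRange` are hypothetical and are not part of the pdf-data-parser API; the library's actual parsing rules may differ.

```javascript
// Hypothetical helpers for illustration only; not pdf-data-parser's implementation.

// Interpret a cells setting given as a number (minimum only)
// or as a "min-max" string such as "2-4".
function parseCellsRange(cells) {
  if (typeof cells === "number") {
    if (Number.isInteger(cells) && cells >= 0) {
      return { min: cells, max: Infinity };
    }
    throw new TypeError(`invalid cells value: ${cells}`);
  }

  const text = String(cells).trim();

  // "min-max" form
  const match = /^(\d+)-(\d+)$/.exec(text);
  if (match) {
    return { min: Number(match[1]), max: Number(match[2]) };
  }

  // plain numeric string, treated as a minimum
  if (/^\d+$/.test(text)) {
    return { min: Number(text), max: Infinity };
  }

  throw new TypeError(`invalid cells value: ${cells}`);
}

// A row qualifies when its cell count falls inside the range.
function rowInRange(row, range) {
  return row.length >= range.min && row.length <= range.max;
}

// Sample rows (made-up data): a page title, a header, a data row, an overly wide row.
const rows = [
  ["Page 1"],
  ["id", "name", "state"],
  ["1", "Ada", "NY"],
  ["a", "b", "c", "d", "e"],
];

const range = parseCellsRange("2-4");
const kept = rows.filter((row) => rowInRange(row, range));
console.log(kept);
```

With a range like this, single-cell lines such as page titles and unusually wide rows would both be dropped, leaving only rows that look like part of the table.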