Parse, search and stream PDF tabular data using Node.js with Mozilla's PDF.js library.
# pdf-data-parser Change Log
- version 1.2.20
- enhancement: options.missingValues - check for blank cells by comparing XY coordinates against table header cells, default: false
- enhancement: options.hasHeader - indicates if the table has a header row, default: true
- updated: dependency "pdfjs-dist": "^5.4.54"
- version 1.2.19
- refactor: convert project to ECMAScript modules
- refactor: options.hasHeader default is false
- version 1.2.18
- refactor: change project group to github:drewletcher
- version 1.2.17
- feature: add password option to decrypt PDF documents
- refactor: pdp CLI improved options handling
- bug fix: optional JSON output of rows as arrays
- version 1.2.16
- refactor: change project group to gitlab:drewletcher
- version 1.2.15
- enhancement: options.cell accepts minimum number of cells or "min-max" range
- enhancement: transform options hasHeader, header:m:n
- enhancement: handle stream pause/resume
- feature: support for TypeScript types
- feature: add jsonc support for options files
- refactor: error handling improvements for file not found
- updated: "pdfjs-dist": "^4.5.136"
- version 1.2.14
- bug fix: RepeatCellTransform splice repeating column at array index
- feature: RepeatCellTransform handle missing value (empty cell)
- version 1.2.13
- refactor: _compareFiles test
- refactor: testrunner.bat, use tr_launcher CLI test app from @oby4/storage-lib project
- version 1.2.12
- feature: add trim option to trim whitespace from output values
- refactor: CLI format option: --format=csv|json|rows
- version 1.2.11
- update: use pdfjs-dist 4.2.67, fixes a high severity vulnerability
- version 1.2.10
- feature: implement pdf.js options.data argument for TypedArray input, instead of using options.url argument
- version 1.2.9
- bug fix: use pdfjs-dist 4.0.379 legacy build, fixes Promise.withResolvers error
- version 1.2.8
- feature: add options.stopHeading to identify the end of a table.
- version 1.2.7
- feature: read command-line options from a file
- feature: pages option to limit parsing to specific pages
- feature: support non-marked, line-oriented PDF content, e.g. printer-style output to PDF files
- feature: add RepeatCell and RepeatHeading transforms to normalize rows from printed reports
- refactor: improved handling of XY coordinates for determining cell order
- version 1.2.6
- updated: README.md
- version 1.2.5
- feature: modulesPath() finds node_modules dir for default fonts
- updated: README.md cli options
- version 1.2.4
- feature: implement CLI program, pdp or pdf-data-parser
- refactor: if options.heading is not defined process entire document, not just first table.
- updated: pdfjs-dist@4.0
- version 1.1.3
- use latest pre-built PDF.js from npm package 'pdfjs-dist' instead of building it
- version 1.1.2
- remove console debug statement
- updated README.md documentation
- version 1.1.1
- improved test coverage
- remove console debug statement
- fix typos in README.md
- version 1.1.0
- remove page header and footer artifacts from output
- option to remove repeating table headers on multiple page output
- major refactor of marked content processing
- handle cells that are out of normal x,y flow order, e.g. absolute positioning
- improved handling of font size and line spacing
- improved handling of cells with _Span_ marked content
- testing: compare output files to expected data
- version 1.0.5
- add comment about vertical spanning cells to README
- version 1.0.4
- updated README with info about PDF marked content
- version 1.0.3
- bug fix of typo using `options.cells` argument
- version 1.0.2
- allow regexp for PdfDataParser `heading` option
- bug fix of typo in code when using `options.cells`
- version 1.0.1
- fix some typos in README file
- version 1.0.0
- complete README file
- testing updates
- version 0.9.1
- cleanup code, comments and readme
- version 0.9.0
- initial check-in of project files
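
The `options.missingValues` entry under version 1.2.20 describes detecting blank cells by comparing cell coordinates against the table header cells. The following is a minimal sketch of that idea only, not the library's implementation: the helper `alignRowToHeader`, the `{ x, text }` cell shape, and the `tolerance` parameter are all hypothetical names chosen for illustration, and the sketch compares only the X coordinate.

```javascript
// Hypothetical helper; NOT part of the pdf-data-parser API.
// Assumes each cell is { x: number, text: string }, where x is the
// cell's left edge in PDF units.
function alignRowToHeader(headerCells, rowCells, tolerance = 5) {
  // Start with an empty string for every header column.
  const aligned = headerCells.map(() => "");

  for (const cell of rowCells) {
    // Find the header column whose x coordinate is closest to this cell.
    let best = -1;
    let bestDist = Infinity;
    headerCells.forEach((h, i) => {
      const dist = Math.abs(h.x - cell.x);
      if (dist < bestDist) {
        bestDist = dist;
        best = i;
      }
    });

    // Accept the match only when it falls within the tolerance;
    // columns with no matching cell keep their empty string.
    if (best >= 0 && bestDist <= tolerance) {
      aligned[best] = cell.text;
    }
  }
  return aligned;
}

const headerCells = [
  { x: 50, text: "Name" },
  { x: 200, text: "City" },
  { x: 350, text: "Zip" },
];
// The "City" cell is blank in the PDF, so no cell exists near x = 200.
const rowCells = [
  { x: 51, text: "Ada" },
  { x: 349, text: "90210" },
];

console.log(alignRowToHeader(headerCells, rowCells));
// prints [ 'Ada', '', '90210' ]
```

Without this kind of alignment, a row with a blank cell would simply come out shorter, and the remaining values would shift left into the wrong columns.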