afpp







Another f\*cking PDF parser. Because parsing PDFs in Node.js should be easy. Live long and parse PDFs.
There are plenty of PDF-related packages for Node.js. They work… until they don't.
Afpp was built to solve the headaches I ran into while trying to parse PDFs in Node.js:
- Do I need a package with 30+ MB just to read a PDF?
- Why is the event loop blocked?
- Is that a memory leak I smell?
- Should reading a PDF really be this performance-heavy?
- Why is everything so buggy?
- Why does it complain about the lack of a canvas in Node.js?
- Why does canvas require native C++/Python dependencies to build?
- Why does it complain about the missing window object?
- Why do I need ImageMagick for this?!
- What the hell is Ghostscript, and why does it keep failing?
- Where's the TypeScript support?
- Why are the dependencies older than my dev career?
- Why does everything work… until I try an encrypted PDF?
- Why does every OS need its own special setup ritual?

afpp requires Node.js >= v22.14.0.

You can install `afpp` via npm, Yarn, or pnpm.
```bash
npm install afpp
```
```bash
yarn add afpp
```
```bash
pnpm add afpp
```
The `afpp` library makes it simple to extract text or images from PDF files in Node.js. Whether your PDF is stored locally, hosted online, or encrypted, `afpp` provides an easy-to-use API to handle it all. All functions share common parameters and accept a string path, a buffer, or a URL object.
```ts
import { readFile } from 'fs/promises';
import path from 'path';
import { pdf2string } from 'afpp';

(async function main() {
  const pathToFile = path.join('..', 'test', 'example.pdf');
  const input = await readFile(pathToFile);
  const data = await pdf2string(input);
  console.log('Extracted text:', data); // ['page 1 content', 'page 2 content', ...]
})();
```
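Since `pdf2string` resolves to an array with one string per page, ordinary array methods cover most post-processing. The sketch below is a minimal, hypothetical helper (`findPagesContaining` is not part of afpp) that runs on a hard-coded sample array, so it works without a PDF at hand.

```ts
// Hypothetical helper, not part of afpp: returns the 1-based numbers of
// pages whose text contains `term` (case-insensitive).
function findPagesContaining(pages: string[], term: string): number[] {
  const needle = term.toLowerCase();
  const hits: number[] = [];
  pages.forEach((text, index) => {
    if (text.toLowerCase().includes(needle)) {
      hits.push(index + 1);
    }
  });
  return hits;
}

// Sample data shaped like the array returned by pdf2string.
const samplePages = ['Invoice 2024', 'Terms and conditions', 'Invoice appendix'];
console.log(findPagesContaining(samplePages, 'invoice')); // [1, 3]
```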
```ts
import { pdf2image } from 'afpp';

(async function main() {
  const url = new URL('https://pdfobject.com/pdf/sample.pdf');
  const arrayOfImages = await pdf2image(url);
  console.log(arrayOfImages); // [imageBuffer, imageBuffer, ...]
})();
```
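A common next step is writing the returned buffers to disk. The sketch below assumes each array element is a `Buffer` (or other `Uint8Array`) holding one encoded page image, as the comment above suggests. `savePageImages` and the `page-N.png` naming scheme are hypothetical, not part of afpp.

```ts
import { mkdir, writeFile } from 'fs/promises';
import { join } from 'path';

// Hypothetical helper, not part of afpp: writes one file per page and
// returns the paths it created. `extension` should match the encoding used
// for rendering ('png' is the documented default).
async function savePageImages(
  images: Uint8Array[],
  outDir: string,
  extension = 'png',
): Promise<string[]> {
  // Create the output directory if it does not exist yet.
  await mkdir(outDir, { recursive: true });
  const written: string[] = [];
  for (let i = 0; i < images.length; i++) {
    // Pages are numbered from 1: page-1.png, page-2.png, ...
    const filePath = join(outDir, `page-${i + 1}.${extension}`);
    await writeFile(filePath, images[i]);
    written.push(filePath);
  }
  return written;
}
```

In the example above, this could be called as `await savePageImages(arrayOfImages, 'output')`, where `'output'` is an arbitrary directory name.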
```ts
import { parsePdf } from 'afpp';

(async function main() {
  // Download PDF from URL
  const response = await fetch('https://pdfobject.com/pdf/sample.pdf');
  const buffer = Buffer.from(await response.arrayBuffer());
  // Parse the PDF buffer
  const result = await parsePdf(buffer, {}, (content) => content);
  console.log('Parsed PDF:', result);
})();
```
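The third argument in the example above is a callback applied to each page's content. A slightly more useful callback might look like the sketch below. It assumes, based on the option descriptions in this README, that text pages arrive as strings and rendered (non-text) pages as binary buffers. The `PageSummary` type and `summarizePage` function are hypothetical, and the exact callback signature should be checked against the afpp type definitions.

```ts
// Hypothetical result shape, not part of afpp.
type PageSummary =
  | { kind: 'text'; characters: number }
  | { kind: 'image'; bytes: number };

// Assumption: text pages are passed as strings and rendered pages as
// Buffer/Uint8Array instances. Buffer extends Uint8Array, so the fallback
// branch covers both.
function summarizePage(content: string | Uint8Array): PageSummary {
  if (typeof content === 'string') {
    return { kind: 'text', characters: content.length };
  }
  return { kind: 'image', bytes: content.byteLength };
}

// Could be passed in place of `(content) => content` in the call above.
console.log(summarizePage('Hello PDF')); // { kind: 'text', characters: 9 }
```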
All afpp functions accept the same options object.
Example usage:
```javascript
const result = await parsePdf(buffer, {
  concurrency: 5,
  imageEncoding: 'jpeg',
  password: 'STRONG_PASS',
  scale: 4,
});
```
> `optional` **concurrency**: `number`
Concurrency level for page processing. Defaults to 1.
Higher values may improve performance but increase memory usage.
#### Default
```ts
1;
```
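To make the trade-off concrete, here is a generic bounded-concurrency sketch. It is not afpp's implementation: `mapWithConcurrency` is a hypothetical helper that only illustrates the idea of keeping at most `limit` tasks in flight at once, which is why a higher limit tends to finish sooner but holds more intermediate results in memory at the same time.

```ts
// Hypothetical helper illustrating bounded concurrency; not afpp's code.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  task: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each worker repeatedly claims the next unprocessed index until none remain.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index], index);
    }
  }

  // Start at most `limit` workers (at least one, never more than there are items).
  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

With `limit` set to 1 this degrades to sequential processing, which matches the documented default.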
---
> `optional` **imageEncoding**: [`ImageEncoding`](../type-aliases/ImageEncoding.md)
Image encoding format when rendering non-text pages. Defaults to 'png'.
Supported formats: 'avif', 'jpeg', 'png', 'webp'.
#### Default
```ts
'png';
```
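When the encoding comes from user input or configuration, it helps to validate it before passing it on. The sketch below redeclares the four documented values locally so that it is self-contained; in a real project the `ImageEncoding` type linked above would presumably be used instead. `toImageEncoding` is a hypothetical helper.

```ts
// The four formats listed in this README, redeclared locally for the sketch.
const SUPPORTED_ENCODINGS = ['avif', 'jpeg', 'png', 'webp'] as const;
type SupportedEncoding = (typeof SUPPORTED_ENCODINGS)[number];

// Hypothetical helper: returns a supported encoding, falling back to the
// documented default ('png') for anything unrecognized.
function toImageEncoding(value: string | undefined): SupportedEncoding {
  const normalized = (value ?? '').trim().toLowerCase();
  const match = SUPPORTED_ENCODINGS.find((encoding) => encoding === normalized);
  return match ?? 'png';
}

console.log(toImageEncoding('WEBP')); // webp
console.log(toImageEncoding('gif')); // png
```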
---
> `optional` **password**: `string`
Password for encrypted PDF files.
---
> `optional` **scale**: `number`
Rendering scale applied to a page when its content is not text (or when `pdf2image` is used). Defaults to 2.0.
Higher values increase image resolution but also memory usage.
#### Default
```ts
2.0;
```
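A rough calculation shows why scale drives memory usage. The sketch assumes that a scale of 1 maps one PDF point (1/72 inch) to one pixel and that a rendered page is held as uncompressed RGBA (4 bytes per pixel) before encoding. Both are common conventions for PDF renderers, but they are assumptions here, not documented afpp behavior. `estimateRender` is a hypothetical helper.

```ts
// Hypothetical estimate, not part of afpp.
// Assumptions: scale 1 => 1 PDF point per pixel; 4 bytes per pixel (RGBA).
function estimateRender(widthPt: number, heightPt: number, scale: number) {
  const widthPx = Math.ceil(widthPt * scale);
  const heightPx = Math.ceil(heightPt * scale);
  return { widthPx, heightPx, rawBytes: widthPx * heightPx * 4 };
}

// US Letter is 612 x 792 points.
console.log(estimateRender(612, 792, 2)); // { widthPx: 1224, heightPx: 1584, rawBytes: 7755264 }
console.log(estimateRender(612, 792, 4)); // { widthPx: 2448, heightPx: 3168, rawBytes: 31021056 }
```

Doubling the scale quadruples the pixel count, so under these assumptions going from 2 to 4 takes the raw size of a single Letter page from roughly 7.8 MB to roughly 31 MB.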
This project is licensed under the terms of the [MIT License](./LICENSE).