pdf-parse-new

Version:

Pure javascript cross-platform module to extract text from PDFs with AI-powered optimization and multi-core processing.

github.com/simonegosetto/pdf-parse-new

simonegosetto/pdf-parse-new

110 lines (83 loc) • 3.29 kB

Markdown

# PDF Parse Examples This directory contains practical examples demonstrating all parsing methods. ## Quick Start ```bash # Run any example node test/examples/01-basic-parse.js node test/examples/06-smart-parser.js node test/examples/07-compare-all.js ``` ## Examples ### 01-basic-parse.js **Basic PDF parsing** with default options. - Learn: How to parse a PDF with minimal code - Best for: Quick integration, simple use cases ### 02-batch-parse.js **Batch processing** with different batch sizes. - Learn: How to optimize batch size for your PDFs - Best for: Small to medium PDFs (< 1000 pages) ### 03-stream-parse.js **Streaming parse** with progress tracking. - Learn: Memory-efficient parsing with visual progress - Best for: Large PDFs with limited memory ### 04-workers-parse.js **Worker threads** for true multi-core parallelism. - Learn: How to use all CPU cores for huge PDFs - Best for: PDFs with 1000+ pages on multi-core systems ### 05-processes-parse.js **Child processes** for maximum performance. - Learn: Multi-processing for fastest parsing - Best for: Huge PDFs (often faster than workers) ### 06-smart-parser.js **SmartPDFParser** with automatic method selection. - Learn: How the smart parser chooses the best method - See: Fast-path optimization, caching, statistics - Best for: Production use, mixed PDF sizes ### 07-compare-all.js **Compare all methods** on the same PDF. - Learn: Performance differences between methods - See: Benchmark results, speedup factors - Best for: Understanding when to use which method ## Running Examples ### With Small PDFs (included) All examples work with the included test PDFs: ```bash node test/examples/01-basic-parse.js node test/examples/02-batch-parse.js node test/examples/06-smart-parser.js ``` ### With Large PDFs (optional) Some examples perform better with large PDFs: ```bash # Place a large PDF in test/data/test_9000.pdf node test/examples/04-workers-parse.js node test/examples/05-processes-parse.js node test/examples/07-compare-all.js ``` If large PDF is not found, examples automatically use smaller test files. ## What You'll Learn ### Performance Optimization - When to use batch vs workers vs processes - How oversaturation improves CPU utilization - Impact of batch size on performance ### Smart Features - Fast-path optimization (50x faster overhead) - Decision caching (25x faster on cache hit) - CPU-aware thresholds (adapts to your hardware) ### Production Patterns - Error handling - Progress tracking - Method comparison - Statistics monitoring ## Tips 1. **Start with SmartPDFParser** (example 06) - it auto-selects the best method 2. **Run comparison** (example 07) to see differences on your hardware 3. **Test batch sizes** (example 02) to find optimal configuration 4. **Monitor progress** (examples 03, 04, 05) for long-running tasks ## Next Steps After trying examples: 1. Read the main [README.md](../../README.md) for full documentation 2. Check [lib/README.md](../../lib/README.md) for architecture details 3. See [benchmark/](../../benchmark/) for benchmarking tools --- **All examples are production-ready code!** Copy and adapt them for your projects.