client-side-ocr
High-performance client-side OCR with ONNX Runtime, RapidOCR and PPU PaddleOCR integration. 100+ language support. Process text from images entirely in the browser with state-of-the-art accuracy and complete privacy.
<div align="center">
# Client-Side OCR with ONNX Runtime
**Extract text from images directly in your browser - no server required! Now with RapidOCR and PPU PaddleOCR integration for 100+ languages!**
[**Live Demo**](https://siva-sub.github.io/client-ocr/) | [**NPM Package**](https://www.npmjs.com/package/client-side-ocr) | [**Documentation**](./docs/USAGE.md) | [**API Reference**](./docs/API.md) | [**Troubleshooting**](./docs/TROUBLESHOOTING.md)
</div>
---
A high-performance, privacy-focused OCR solution that runs entirely in the browser using ONNX Runtime with both RapidOCR and PPU PaddleOCR models. Process text from images and PDF documents without sending data to any server: everything happens locally on your device. It supports 100+ languages with state-of-the-art accuracy.
## Screenshots
<div align="center">
| Main Interface | Preprocessing Options | Performance Metrics |
|----------------|----------------------|---------------------|
</div>
## Why Choose Client-Side OCR?
### **Complete Privacy & Security**
Unlike cloud-based OCR services (Google Vision, AWS Textract, Azure OCR), your sensitive documents **never leave your device**. Perfect for:
- Legal documents & contracts
- Financial statements & invoices
- Medical records
- Personal IDs & passports
- Confidential business documents
### **Zero Costs, Unlimited Usage**
- **No API fees**: No per-request charges, unlike cloud OCR services
- **No rate limits**: Process unlimited documents
- **No subscriptions**: One-time integration, lifetime usage
- **No surprises**: Predictable performance with no dependence on a remote service's uptime
### **Superior Performance**
- **Low latency**: No network round trip (typically 300-1500ms per image)
- **Offline capable**: Works without internet after initial load
- **GPU acceleration**: Uses WebGL for faster processing when available
- **Batch optimization**: Process multiple regions efficiently
### **How It's Different**
| Feature | Client-Side OCR | Cloud OCR (Google/AWS) | Tesseract.js |
|---------|----------------|------------------------|--------------|
| **Privacy** | ✅ 100% local | ❌ Data sent to servers | ✅ Local |
| **Cost** | ✅ Free forever | ❌ Pay per request | ✅ Free |
| **Languages** | ✅ 100+ built-in | ✅ Many | ⚠️ Manual setup |
| **Performance** | ✅ Fast (ONNX) | ⚠️ Network dependent | ❌ Slow |
| **Accuracy** | ✅ State-of-art | ✅ High | ⚠️ Good |
| **Setup** | ✅ Simple npm install | ❌ Complex API setup | ⚠️ Large models |
| **Preprocessing** | ✅ Built-in OpenCV | ⚠️ Limited | ❌ Basic |
| **Model Size** | ✅ 15-30MB total | N/A | ❌ 60MB+ per language |
| **Offline** | ✅ Full support | ❌ Requires internet | ✅ Supported |
### **Advanced Features Not Found Elsewhere**
- **Smart Preprocessing**: Built-in OpenCV.js for image enhancement
- **Auto-rotation**: Detects and corrects upside-down text
- **Confidence scores**: Get reliability metrics for each word
- **Word segmentation**: Separate text into individual words
- **Mobile optimized**: Responsive design with camera capture
- **Progressive Web App**: Install as native app on any device
- **Multiple Model Support**: Choose between RapidOCR and PPU models
## Real-World Use Cases
### Perfect for Applications That Need:
- **Document Scanner Apps**: Build mobile/web document scanners
- **Enterprise Document Processing**: Process sensitive documents securely
- **Healthcare Systems**: Extract text from medical records privately
- **Government Portals**: Handle citizen documents without data leaks
- **Education Platforms**: Convert handwritten notes to digital text
- **Business Card Readers**: Extract contact information instantly
- **Receipt/Invoice Processing**: Automate expense tracking
- **Digital Libraries**: Make scanned books searchable
## Core Features
- **100% Client-Side**: All OCR processing happens in the browser - no data leaves your device
- **High Accuracy**: Uses state-of-the-art RapidOCR and PPU PaddleOCR v4/v5 models
- **100+ Languages**: Support for major world languages including Chinese, English, Japanese, Korean, Arabic, Hindi, Tamil, and more
- **PWA Support**: Works offline after initial load with service worker caching
- **Image Preprocessing**: Built-in OpenCV.js for auto-enhancement, denoising, deskewing
- **Auto-Rotation**: Automatically detects and corrects upside-down text
- **PDF Support**: Extract text from PDFs page-by-page with detailed results
- **Modern UI**: Beautiful, responsive interface built with React & Mantine UI
- **Smart Caching**: Models cached locally for instant subsequent use
- **Developer Friendly**: Simple API, TypeScript support, React components
- **Performance Monitoring**: Real-time metrics and processing insights
## About the Author
**Sivasubramanian Ramanathan**
I created this module while experimenting and learning about extracting data from unstructured documents. What started as a curiosity about client-side OCR capabilities evolved into this comprehensive library that brings powerful text recognition to the browser.
<div align="center">
[LinkedIn](https://www.linkedin.com/in/sivasub987) | [GitHub](https://github.com/siva-sub) | [Email](mailto:hello@sivasub.com) | [Website](https://sivasub.com)
</div>
## Technology Stack
- **Frontend**: React 19 + TypeScript + Vite
- **UI Framework**: Mantine UI v8
- **OCR Engine**: ONNX Runtime Web
- **Models**: RapidOCR + PPU PaddleOCR (PP-OCRv4/v5)
- **Processing**: RapidOCR techniques (CTC decoding, DB postprocessing)
- **PWA**: Vite PWA Plugin + Workbox
## Attribution & Credits
This project builds upon the excellent work of:
### RapidOCR
- Repository: [https://github.com/RapidAI/RapidOCR](https://github.com/RapidAI/RapidOCR)
- Advanced OCR implementation with multi-language support
- Processing techniques and model hosting
- Licensed under Apache License 2.0
### PaddleOCR
- Repository: [https://github.com/PaddlePaddle/PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)
- The state-of-the-art OCR models used in this application
- Licensed under Apache License 2.0
### OnnxOCR
- Repository: [https://github.com/jingsongliujing/OnnxOCR](https://github.com/jingsongliujing/OnnxOCR)
- ONNX model conversion and inference implementation reference
- Provided the ONNX models and dictionary files
### ppu-paddle-ocr
- Repository: [https://github.com/PT-Perkasa-Pilar-Utama/ppu-paddle-ocr](https://github.com/PT-Perkasa-Pilar-Utama/ppu-paddle-ocr)
- TypeScript implementation reference
- Deskew algorithm implementation inspiration
## Demo
Try the live demo: [https://siva-sub.github.io/client-ocr/](https://siva-sub.github.io/client-ocr/)
## Quick Comparison
```javascript
// ❌ Cloud OCR (Privacy Risk + Costs)
const cloudResponse = await fetch('https://api.service.com/ocr', {
  method: 'POST',
  body: formData, // Your sensitive data leaves your device!
  headers: { 'API-Key': 'sk-xxxxx' } // Costs money per request
});
// ❌ Tesseract.js (Slow + Large)
const worker = await Tesseract.createWorker('eng'); // 60MB+ download
const { data } = await worker.recognize(image); // Slow processing
// ✅ Client-Side OCR (Private + Fast + Free)
import { RapidOCREngine } from 'client-side-ocr';
const ocr = new RapidOCREngine({ lang: 'en' }); // 15MB total
await ocr.initialize(); // One-time setup
const result = await ocr.process(imageData); // Fast, local, private!
```
## Installation
<div align="center">
### Install from NPM
```bash
npm install client-side-ocr
```
```bash
yarn add client-side-ocr
```
```bash
pnpm add client-side-ocr
```
</div>
### For Development
```bash
# Clone the repository
git clone https://github.com/siva-sub/client-ocr.git
cd client-ocr
# Install dependencies
npm install
# Run development server
npm run dev
# Build for production
npm run build
```
## Quick Start
### As a Library
```typescript
import { createOCREngine } from 'client-side-ocr';
// Initialize the OCR engine with language selection
const ocr = createOCREngine({
  language: 'en', // or 'ch', 'fr', 'de', 'ja', 'ko', etc.
  modelVersion: 'PP-OCRv4' // or 'PP-OCRv5'
});
await ocr.initialize();
// Process an image with advanced options
const result = await ocr.processImage(imageFile, {
  enableWordSegmentation: true,
  returnConfidence: true
});
console.log(result.text);
console.log(result.confidence);
console.log(result.wordBoxes); // Word-level bounding boxes
```
### React Component
```tsx
import { RapidOCRInterface } from 'client-side-ocr/react';
function App() {
  return (
    <RapidOCRInterface
      defaultLanguage="en"
      modelVersion="PP-OCRv4"
      onResult={(result) => console.log(result)}
    />
  );
}
```
### Via CDN
```html
<script type="module">
  import { createOCREngine } from 'https://unpkg.com/client-side-ocr@latest/dist/index.mjs';
  const ocr = createOCREngine();
  await ocr.initialize();
</script>
```
## Documentation
### Comprehensive Guides
- **[Usage Guide](./docs/USAGE.md)** - Complete usage documentation with examples
- **[API Reference](./docs/API.md)** - Detailed API documentation
- **[Model Documentation](./docs/MODELS.md)** - Information about available OCR models
- **[Troubleshooting Guide](./docs/TROUBLESHOOTING.md)** - Common issues and solutions
## API Overview
```typescript
import { createRapidOCREngine } from 'client-side-ocr';
// Create RapidOCR engine
const ocr = createRapidOCREngine({
  language: 'en', // 'ch', 'fr', 'de', 'ja', 'ko', 'ru', 'pt', 'es', 'it', 'id', 'vi', 'fa', 'ka'
  modelVersion: 'PP-OCRv4', // or 'PP-OCRv5'
  modelType: 'mobile' // or 'server'
});
// Initialize with automatic model download
await ocr.initialize();
// Process image with RapidOCR techniques
const result = await ocr.processImage(file, {
  enableTextClassification: true, // 180° rotation detection
  enableWordSegmentation: true, // Word-level boxes
  preprocessConfig: {
    detectImageNetNorm: true, // ImageNet normalization for detection
    recStandardNorm: true // Standard normalization for recognition
  },
  postprocessConfig: {
    unclipRatio: 2.0, // Text region expansion
    boxThresh: 0.7 // Box confidence threshold
  }
});
// Access enhanced results
console.log(result.text); // Extracted text
console.log(result.confidence); // Overall confidence
console.log(result.lines); // Text lines with individual confidence
console.log(result.wordBoxes); // Word-level segmentation
console.log(result.angle); // Detected text angle (0° or 180°)
console.log(result.processingTime); // Processing time breakdown by stage
```
For detailed API documentation, see [API Reference](./docs/API.md).
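As a rough illustration of working with per-line confidence, the sketch below filters out low-confidence lines before joining them into text. The `OcrLine` shape, the `keepConfidentLines` and `linesToText` helpers, and the 0..1 confidence range are assumptions made for this example; check the API Reference for the actual type of `result.lines`.

```typescript
// Assumed shape of one entry in `result.lines`; the real type may differ.
interface OcrLine {
  text: string;
  confidence: number; // assumed to be in the range 0..1
}

// Hypothetical helper: keep only lines at or above a confidence threshold.
function keepConfidentLines(lines: OcrLine[], minConfidence = 0.8): OcrLine[] {
  return lines.filter((line) => line.confidence >= minConfidence);
}

// Hypothetical helper: join the surviving lines into plain text.
function linesToText(lines: OcrLine[]): string {
  return lines.map((line) => line.text).join('\n');
}

// Made-up sample data standing in for `result.lines`.
const sampleLines: OcrLine[] = [
  { text: 'Invoice #1024', confidence: 0.97 },
  { text: 'T0tal: 4l.00', confidence: 0.42 },
  { text: 'Thank you', confidence: 0.91 },
];
const confidentText = linesToText(keepConfidentLines(sampleLines));
```

A threshold like this trades recall for precision, so a lower value may suit noisy scans where partial text is still useful.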
## Model Support
The library supports both RapidOCR and PPU PaddleOCR models with multi-language capabilities:
### Supported Languages (100+)
| Language | Code | RapidOCR | PPU Models | Notes |
|----------|------|----------|------------|-------|
| Chinese | ch | ✅ | ✅ | Simplified & Traditional |
| English | en | ✅ | ✅ | Full support |
| French | fr | ✅ | ❌ | RapidOCR only |
| German | de | ✅ | ❌ | RapidOCR only |
| Japanese | ja | ✅ | ✅ | Hiragana, Katakana, Kanji |
| Korean | ko | ✅ | ✅ | Hangul support |
| Russian | ru | ✅ | ❌ | Cyrillic script |
| Portuguese | pt | ✅ | ❌ | Brazilian & European |
| Spanish | es | ✅ | ❌ | Latin American & European |
| Italian | it | ✅ | ❌ | RapidOCR only |
| Indonesian | id | ✅ | ❌ | RapidOCR only |
| Vietnamese | vi | ✅ | ❌ | With tone marks |
| Persian | fa | ✅ | ❌ | Right-to-left support |
| Kannada | ka | ✅ | ❌ | Indic script support |
### Model Specifications
| Model Component | Size | Purpose | Features |
|----------------|------|---------|----------|
| Detection | 4-5MB | Text region detection | DB algorithm with unclip expansion |
| Recognition | 8-17MB | Text recognition | CTC decoding with embedded dictionary |
| Classification | 0.5MB | Text angle detection | 0° and 180° rotation correction |
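The "unclip expansion" in the detection row refers to growing each detected text polygon outward, since DB predicts a shrunk text kernel. In the DB formulation the offset distance is `area * unclipRatio / perimeter`. The sketch below only computes that distance; the function names are illustrative and not this library's API, and a full implementation would also offset the polygon (typically with a polygon-clipping library).

```typescript
type Point = [number, number];

// Shoelace formula: absolute area of a simple polygon.
function polygonArea(points: Point[]): number {
  let twiceArea = 0;
  for (let i = 0; i < points.length; i++) {
    const [x1, y1] = points[i];
    const [x2, y2] = points[(i + 1) % points.length];
    twiceArea += x1 * y2 - x2 * y1;
  }
  return Math.abs(twiceArea) / 2;
}

// Sum of edge lengths, including the closing edge.
function polygonPerimeter(points: Point[]): number {
  let length = 0;
  for (let i = 0; i < points.length; i++) {
    const [x1, y1] = points[i];
    const [x2, y2] = points[(i + 1) % points.length];
    length += Math.hypot(x2 - x1, y2 - y1);
  }
  return length;
}

// Offset distance used to expand a detected box: area * ratio / perimeter.
function unclipDistance(points: Point[], unclipRatio: number): number {
  return (polygonArea(points) * unclipRatio) / polygonPerimeter(points);
}

// A 100x20 text box with unclipRatio 2.0: 2000 * 2 / 240, about 16.7 pixels.
const sampleBox: Point[] = [[0, 0], [100, 0], [100, 20], [0, 20]];
const sampleDistance = unclipDistance(sampleBox, 2.0);
```

Because the distance scales with area over perimeter, short thin boxes grow less than tall ones, which is why a single `unclipRatio` value works across text sizes.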
### Model Architecture
- **Detection Models**: Use the DB (Differentiable Binarization) algorithm with:
  - ImageNet normalization (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
  - Dynamic resolution adjustment (multiples of 32)
  - Unclip ratio for text region expansion
- **Recognition Models**: Features include:
  - CTC (Connectionist Temporal Classification) decoding
  - Embedded dictionaries in model metadata
  - Dynamic width calculation based on aspect ratio
  - Standard normalization ((pixel/255 - 0.5) / 0.5)
  - PPU models: Red channel only for grayscale, 0-based dictionary indexing
- **Classification Models**: Text orientation detection:
  - Detects 0° and 180° rotations
  - Batch processing with aspect ratio sorting
  - Automatic rotation correction
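The preprocessing arithmetic listed above can be sketched in a few lines. This is a minimal illustration, not the library's implementation: the function names are hypothetical, and the recognition target height of 48 and the 2048 width cap are assumptions chosen for the example.

```typescript
const IMAGENET_MEAN = [0.485, 0.456, 0.406];
const IMAGENET_STD = [0.229, 0.224, 0.225];

// Detection input: (pixel / 255 - mean[channel]) / std[channel], channel 0..2 = R, G, B.
function imagenetNormalize(value: number, channel: number): number {
  return (value / 255 - IMAGENET_MEAN[channel]) / IMAGENET_STD[channel];
}

// Recognition input: (pixel / 255 - 0.5) / 0.5, which maps 0..255 to -1..1.
function standardNormalize(value: number): number {
  return (value / 255 - 0.5) / 0.5;
}

// Detection resolution: round a side length to the nearest multiple of 32 (minimum 32).
function roundToMultipleOf32(size: number): number {
  return Math.max(32, Math.round(size / 32) * 32);
}

// Recognition width: scale to a fixed height while keeping the aspect ratio.
// targetHeight and maxWidth are assumed defaults for illustration only.
function recognitionWidth(
  srcWidth: number,
  srcHeight: number,
  targetHeight = 48,
  maxWidth = 2048
): number {
  return Math.min(maxWidth, Math.ceil((targetHeight * srcWidth) / srcHeight));
}
```

For example, a 200x50 text crop would be resized to 192x48 for recognition under these assumptions, and a 100-pixel detection side would be rounded to 96.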
### Model Sources
- **RapidOCR Models**: Hosted on RapidOCR's ModelScope repository
- **PPU Models**: Downloaded from PPU PaddleOCR repository with special preprocessing
## Architecture
```mermaid
graph TD
A[Image Upload] --> B[Language Selection]
B --> C[Model Download Check]
C -->|Not Cached| D[Download Models]
C -->|Cached| E[Detection Preprocessing]
D --> E
E --> F[ONNX Detection Worker]
F --> G[Text Classification]
G -->|180° Detected| H[Rotate Image]
G -->|Normal| I[Recognition Preprocessing]
H --> I
I --> J[ONNX Recognition Worker]
J --> K[CTC Decoding]
K --> L[Word Segmentation]
L --> M[Final Output]
subgraph Processing Pipeline
E -->|ImageNet/Standard Norm| F
I -->|Model-specific Norm| J
K -->|Dictionary| L
end
subgraph Model Management
C
D
end
```
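The "Rotate Image" step in the diagram only needs to handle 180°, which for a raw pixel buffer amounts to reversing the pixel order. The sketch below assumes a tightly packed RGBA buffer (as in `ImageData.data`); the `rotate180` name is illustrative and not taken from the library.

```typescript
// Rotate a tightly packed RGBA buffer by 180 degrees.
// Pixel (x, y) moves to (width - 1 - x, height - 1 - y), i.e. index i -> (pixelCount - 1 - i).
function rotate180(
  pixels: Uint8ClampedArray,
  width: number,
  height: number
): Uint8ClampedArray {
  const pixelCount = width * height;
  const out = new Uint8ClampedArray(pixels.length);
  for (let i = 0; i < pixelCount; i++) {
    const src = i * 4;
    const dst = (pixelCount - 1 - i) * 4;
    out[dst] = pixels[src];         // R
    out[dst + 1] = pixels[src + 1]; // G
    out[dst + 2] = pixels[src + 2]; // B
    out[dst + 3] = pixels[src + 3]; // A
  }
  return out;
}

// A 2x1 image: the two pixels swap places, channel order inside each pixel is kept.
const twoPixels = new Uint8ClampedArray([1, 2, 3, 4, 5, 6, 7, 8]);
const flipped = rotate180(twoPixels, 2, 1);
```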
## Performance
### Processing Speed
- Average processing time: 300-1500ms (depending on image size, language, and device)
- Batch processing optimization for multiple text regions
- Aspect ratio sorting for efficient recognition batching
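Aspect ratio sorting helps because every crop in a recognition batch is padded to the widest one, so grouping crops of similar shape wastes less computation. A minimal sketch of the idea, with a hypothetical `Region` type and `batchByAspectRatio` helper that are not part of the library's API:

```typescript
interface Region {
  id: number; // original position, kept so results can be restored to reading order
  width: number;
  height: number;
}

// Sort regions by width/height ratio, then split them into fixed-size batches.
function batchByAspectRatio(regions: Region[], batchSize: number): Region[][] {
  const sorted = [...regions].sort(
    (a, b) => a.width / a.height - b.width / b.height
  );
  const batches: Region[][] = [];
  for (let i = 0; i < sorted.length; i += batchSize) {
    batches.push(sorted.slice(i, i + batchSize));
  }
  return batches;
}

const sampleRegions: Region[] = [
  { id: 1, width: 300, height: 30 }, // ratio 10
  { id: 2, width: 60, height: 30 },  // ratio 2
  { id: 3, width: 150, height: 30 }, // ratio 5
  { id: 4, width: 90, height: 30 },  // ratio 3
];
const sampleBatches = batchByAspectRatio(sampleRegions, 2);
```

After recognition, the `id` field lets the caller put the decoded lines back into their original order.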
### Optimizations
- WebGL backend for GPU acceleration when available
- Web Workers for non-blocking parallel processing
- Automatic model caching with SHA256 verification
- Smart preprocessing pipeline selection based on model type
- Efficient memory management with typed arrays
- Width limiting for PPU models to prevent memory issues
### Advanced Features
- **Word-level segmentation**: Separates Chinese characters from English/numbers
- **Confidence scoring**: Per-character and per-line confidence metrics
- **Rotation detection**: Automatic 180° text correction
- **Dynamic resolution**: Adaptive image resizing for optimal accuracy
- **Stack overflow prevention**: Safe handling of large documents
## Browser Support
- Chrome/Edge 90+ (recommended)
- Firefox 89+
- Safari 15+
- Requires WebAssembly and Web Workers support
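A page can check for these requirements before loading any models. The sketch below is one possible feature check and is not part of the library: `detectOcrSupport` and `canRunOcr` are hypothetical names, and the WebGL entry is only a rough hint because the presence of the `WebGLRenderingContext` constructor does not guarantee that a context can actually be created.

```typescript
interface OcrSupport {
  webAssembly: boolean;
  webWorkers: boolean;
  webglHint: boolean;
}

// Inspect a global scope (defaults to globalThis) for the required browser features.
function detectOcrSupport(
  scope: Record<string, unknown> = globalThis as unknown as Record<string, unknown>
): OcrSupport {
  const wasm = scope['WebAssembly'];
  return {
    webAssembly: typeof wasm === 'object' && wasm !== null,
    webWorkers: typeof scope['Worker'] === 'function',
    webglHint: typeof scope['WebGLRenderingContext'] === 'function',
  };
}

// WebAssembly and Web Workers are required; WebGL is optional acceleration.
function canRunOcr(support: OcrSupport): boolean {
  return support.webAssembly && support.webWorkers;
}
```

Calling `canRunOcr(detectOcrSupport())` early lets an application show a fallback message instead of failing partway through model download.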
## Development
### Project Structure
```
client-ocr/
├── src/
│   ├── core/           # OCR engine and services
│   ├── workers/        # Web Workers for processing
│   ├── ui/             # React components
│   └── types/          # TypeScript definitions
├── public/
│   └── models/         # ONNX models and dictionaries
├── docs/               # Documentation
├── screenshots/        # Application screenshots
└── .github/
    └── workflows/      # GitHub Actions for deployment
```
### Key Components
- `RapidOCREngine`: Main OCR orchestrator with multi-language support
- `PPUModelHandler`: Special handling for PPU PaddleOCR models
- `DetPreProcess`: Detection preprocessing with model-specific normalization
- `RecPreProcess`: Recognition preprocessing with dynamic width calculation
- `ClsPreProcess`: Classification preprocessing for rotation detection
- `CTCLabelDecode`: CTC decoding with word segmentation
- `DBPostProcess`: DB postprocessing with unclip expansion
- `ModelDownloader`: Automatic model fetching from multiple sources
- `MetaONNXLoader`: Extract embedded dictionaries from models
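For readers unfamiliar with CTC decoding, the greedy variant can be sketched as follows: pick the most likely class at each time step, collapse consecutive repeats, and drop the blank class. This is a generic illustration rather than the code in `CTCLabelDecode`; the function name, the dictionary layout (blank stored at `blankIndex`), and averaging the kept probabilities as a confidence score are assumptions for this example.

```typescript
interface DecodedText {
  text: string;
  confidence: number; // mean probability of the kept characters, 0 if none
}

// Greedy CTC decode over a [timeSteps][classes] probability matrix.
function ctcGreedyDecode(
  probs: number[][],
  dictionary: string[],
  blankIndex = 0
): DecodedText {
  let text = '';
  let previous = -1;
  const kept: number[] = [];
  for (const step of probs) {
    // Argmax over classes for this time step.
    let best = 0;
    for (let c = 1; c < step.length; c++) {
      if (step[c] > step[best]) best = c;
    }
    // Emit only when the class is not blank and differs from the previous step.
    if (best !== blankIndex && best !== previous) {
      text += dictionary[best];
      kept.push(step[best]);
    }
    previous = best;
  }
  const confidence =
    kept.length > 0 ? kept.reduce((sum, p) => sum + p, 0) / kept.length : 0;
  return { text, confidence };
}

// Index 0 is the blank; made-up probabilities for a 3-class model.
const sampleDictionary = ['', 'h', 'i'];
const sampleProbs = [
  [0.1, 0.8, 0.1],   // h
  [0.1, 0.7, 0.2],   // h again, collapsed
  [0.9, 0.05, 0.05], // blank
  [0.1, 0.2, 0.7],   // i
  [0.2, 0.1, 0.7],   // i again, collapsed
];
const sampleDecoded = ctcGreedyDecode(sampleProbs, sampleDictionary);
```

A blank between two identical classes is what allows genuine double letters to survive the collapse step.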
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## License
MIT License - see [LICENSE](LICENSE) file for details
## Acknowledgments
Special thanks to:
- The RapidAI team for RapidOCR and model hosting
- The PaddlePaddle team for creating PaddleOCR
- The OnnxOCR project for ONNX conversion tools
- The ppu-paddle-ocr team for TypeScript implementation reference
- The open-source community for making this possible
## What's New in v2.0
- **RapidOCR Integration**: Complete integration with RapidOCR processing pipeline
- **PPU Model Support**: Added support for PPU PaddleOCR models with special preprocessing
- **100+ Language Support**: Extended language support beyond the original 14
- **Advanced Processing**: CTC decoding, DB postprocessing, and word segmentation
- **Model Auto-Download**: Automatic model fetching with progress tracking
- **Embedded Dictionaries**: Models now include character dictionaries in metadata
- **Improved Accuracy**: Better preprocessing with proper normalization techniques
- **Batch Optimization**: Aspect ratio sorting for efficient batch processing
- **Stack Overflow Prevention**: Safe handling of large documents without memory issues
- **Enhanced UI**: Modern interface with tabs for OCR, preprocessing, and performance
---
<div align="center">
Made with ❤️ by [Sivasubramanian Ramanathan](https://sivasub.com)
</div>