# PDQ WASM Examples

This directory contains practical examples demonstrating how to use PDQ WASM in different environments.

## Examples Overview

### 1. Browser Example (`browser/`)

Interactive web application demonstrating PDQ hashing in the browser.

**Features:**

- Upload and hash images directly in the browser
- Compare two images and see their similarity score
- Visual interface with real-time results
- Uses Canvas API for image processing

**Running the example:**

```bash
# From the pdq-wasm root directory
cd examples/browser

# Serve with any static file server, e.g.:
npx serve .

# Or use Python:
python -m http.server 8000

# Then open the served URL in your browser
# (http://localhost:8000 with the Python command above)
```

**Note:** You need to serve the files from a local server (not `file://`) because of WASM module loading restrictions.

### 2. Node.js Examples (`nodejs/`)

#### Basic Usage (`basic-usage.js`)

Demonstrates core PDQ operations with synthetic images.

**Running:**

```bash
# From the pdq-wasm root directory
cd examples/nodejs
node basic-usage.js
```

**What it demonstrates:**

- Initializing PDQ WASM
- Hashing grayscale and RGB images
- Calculating Hamming distance
- Hash format conversion (binary ↔ hex)
- Similarity comparison

#### Image Comparison (`image-comparison.js`)

Real-world example comparing actual image files.


**Prerequisites:**

```bash
npm install sharp  # Image processing library
```

**Running:**

```bash
node image-comparison.js image1.jpg image2.jpg image3.jpg
```

**What it demonstrates:**

- Loading images from disk
- Processing real image files
- Pairwise comparison of multiple images
- Identifying similar images using the PDQ threshold

## Installation

### Using from npm (published package)

```bash
npm install pdq-wasm
```

### Using local build

```bash
# From the pdq-wasm root directory
npm install
npm run build

# The examples will use the local build from ../../dist/
```

## Quick Start Code Snippets

### Browser (ES Modules)

```javascript
import { PDQ } from 'pdq-wasm';

// Initialize once
await PDQ.init();

// Hash an image from Canvas
const canvas = document.getElementById('myCanvas');
const ctx = canvas.getContext('2d');
const imageData = ctx.getImageData(0, 0, canvas.width, canvas.height);

// Extract RGB data (skip alpha channel)
const rgbData = new Uint8Array(canvas.width * canvas.height * 3);
let rgbIndex = 0;
for (let i = 0; i < imageData.data.length; i += 4) {
  rgbData[rgbIndex++] = imageData.data[i];     // R
  rgbData[rgbIndex++] = imageData.data[i + 1]; // G
  rgbData[rgbIndex++] = imageData.data[i + 2]; // B
}

const result = PDQ.hash({
  data: rgbData,
  width: canvas.width,
  height: canvas.height,
  channels: 3
});

console.log('Hash:', PDQ.toHex(result.hash));
console.log('Quality:', result.quality);
```

### Node.js (CommonJS)

```javascript
const { PDQ } = require('pdq-wasm');
const sharp = require('sharp');

async function hashImage(filePath) {
  await PDQ.init();

  // Decode the file to raw, uncompressed pixel data
  const img = sharp(filePath);
  const { data, info } = await img.raw().toBuffer({ resolveWithObject: true });

  const result = PDQ.hash({
    data: new Uint8Array(data),
    width: info.width,
    height: info.height,
    channels: info.channels
  });

  return PDQ.toHex(result.hash);
}
```

### Comparing Images

```javascript
await PDQ.init();

// ... generate hash1 and hash2 ...

// Method 1: Manual comparison
const distance = PDQ.hammingDistance(hash1, hash2);
const similarity = PDQ.similarity(hash1, hash2);
console.log(`Distance: ${distance}/256`);
console.log(`Similarity: ${similarity.toFixed(2)}%`);

// Method 2: Using similarity threshold (default: 31)
const areSimilar = PDQ.isSimilar(hash1, hash2);
console.log(`Similar? ${areSimilar}`);

// Method 3: Custom threshold
const areSimilarCustom = PDQ.isSimilar(hash1, hash2, 50);
```

## API Reference

See the main [README.md](../README.md) for complete API documentation.

## Common Use Cases

### Duplicate Image Detection

```javascript
const threshold = 31; // PDQ recommended threshold
const distance = PDQ.hammingDistance(hash1, hash2);

if (distance <= threshold) {
  console.log('Images are likely duplicates or near-duplicates');
}
```

### Content Moderation

```javascript
// Build a database of known inappropriate content hashes
const bannedHashes = [/* ... */]; // Load from database

// Check new upload against banned content
const uploadHash = PDQ.hash(uploadedImageData);

for (const bannedHash of bannedHashes) {
  const distance = PDQ.hammingDistance(uploadHash.hash, bannedHash);
  if (distance <= 31) {
    console.warn('Content flagged as inappropriate');
    break;
  }
}
```

### Similar Image Search

```javascript
// Find images similar to a query image using orderBySimilarity
const queryHash = PDQ.hash(queryImageData);

// Extract hashes from database
const dbHashes = imageDatabase.map(img => img.hash);

// Order by similarity (most similar first)
const ordered = PDQ.orderBySimilarity(queryHash.hash, dbHashes, true);

// Get top 10 most similar with original indices
const topResults = ordered.slice(0, 10).map(match => ({
  image: imageDatabase[match.index],
  distance: match.distance,
  similarity: match.similarity
}));

console.log('Top 10 similar images:');
topResults.forEach((result, i) => {
  console.log(`${i + 1}. ${result.image.name}: ${result.similarity.toFixed(2)}% similar`);
});
```

### Efficient Batch Similarity Ranking

```javascript
// Old approach: manual sorting (O(n log n) + O(n) comparisons)
const results = dbHashes.map(hash => ({
  hash,
  distance: PDQ.hammingDistance(queryHash.hash, hash),
  similarity: PDQ.similarity(queryHash.hash, hash)
})).sort((a, b) => a.distance - b.distance);

// New approach: using orderBySimilarity (optimized)
const ordered = PDQ.orderBySimilarity(queryHash.hash, dbHashes);
// Returns pre-sorted results with distance and similarity already calculated
```

## Troubleshooting

### "WebAssembly module failed to load"

Make sure you're serving the files over HTTP(S), not opening them with the `file://` protocol. Use a local server:

```bash
npx serve .
# or
python -m http.server 8000
```

### "Cannot find module 'pdq-wasm'"

Make sure you've installed the package:

```bash
npm install pdq-wasm
```

Or, if using the local build, ensure you've run:

```bash
npm run build
```

### "sharp module not found" (Node.js examples)

The image comparison example requires Sharp:

```bash
npm install sharp
```

### Module initialization errors

Always call `await PDQ.init()` before using any PDQ functions:

```javascript
await PDQ.init(); // Must be called once before using PDQ
const result = PDQ.hash(/* image data */);
```

## Performance Tips

1. **Initialize once**: Call `PDQ.init()` only once at application startup
2. **Reuse hashes**: Store generated hashes instead of recalculating
3. **Batch comparisons**: When comparing against many hashes, precompute and store them
4. **Use hex for storage**: Store hashes as hex strings (64 characters for a 256-bit hash) for easy database storage

## Further Reading

- [Meta PDQ Algorithm](https://github.com/facebook/ThreatExchange/tree/main/pdq) - Original C++ implementation
- [PDQ Paper](https://github.com/facebook/ThreatExchange/blob/main/pdq/docs/pdq_algorithm.md) - Algorithm details
- [Main README](../README.md) - Complete API documentation
- [CONTRIBUTING](../CONTRIBUTING.md) - Development guide
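
## Appendix: What a Hex Hash Comparison Looks Like

The examples above store hashes as 64-character hex strings and compare them by Hamming distance against a threshold of 31. The sketch below illustrates that idea in plain JavaScript, with no dependency on `pdq-wasm`. It is not the library's implementation, and the names `hammingDistanceHex` and `isWithinThreshold` are hypothetical helpers introduced here for illustration; in real code, prefer `PDQ.hammingDistance` and `PDQ.isSimilar`.

```javascript
// Illustrative only: Hamming distance between two hex-encoded 256-bit hashes.
// This is NOT the pdq-wasm implementation; helper names are hypothetical.

const HEX_LENGTH = 64; // 256 bits at 4 bits per hex character

// Number of set bits for each 4-bit value 0..15.
const NIBBLE_POPCOUNT = [0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4];

function hammingDistanceHex(hexA, hexB) {
  if (hexA.length !== HEX_LENGTH || hexB.length !== HEX_LENGTH) {
    throw new RangeError(`Expected ${HEX_LENGTH}-character hex strings`);
  }
  let distance = 0;
  for (let i = 0; i < HEX_LENGTH; i++) {
    const a = parseInt(hexA[i], 16);
    const b = parseInt(hexB[i], 16);
    if (Number.isNaN(a) || Number.isNaN(b)) {
      throw new TypeError(`Invalid hex character at position ${i}`);
    }
    // XOR leaves a 1 wherever the two nibbles differ; count those bits.
    distance += NIBBLE_POPCOUNT[a ^ b];
  }
  return distance;
}

// Threshold of 31 is the default quoted in the examples above.
function isWithinThreshold(hexA, hexB, threshold = 31) {
  return hammingDistanceHex(hexA, hexB) <= threshold;
}

// Synthetic hashes: the candidate differs from the stored hash in its last byte.
const stored = 'f'.repeat(64);
const candidate = 'f'.repeat(62) + '00';

console.log(hammingDistanceHex(stored, candidate)); // 8
console.log(isWithinThreshold(stored, candidate));  // true
```

A distance of 0 means the two hashes are identical and 256 means every bit differs, which is why the snippets above print the distance as a fraction of 256.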