UNPKG

sqlite-wasm-vec

Version:

SQLite Wasm with sqlite-vec extension for vector similarity search, conveniently wrapped as an ES Module.

575 lines (459 loc) 15.2 kB
# SQLite Wasm with sqlite-vec SQLite Wasm with [sqlite-vec](https://github.com/asg017/sqlite-vec) extension for vector similarity search, conveniently wrapped as an ES Module. This is a fork of [@sqlite.org/sqlite-wasm](https://github.com/sqlite/sqlite-wasm) that includes the sqlite-vec extension compiled into the WASM build, enabling vector operations directly in the browser. ## Features - 🚀 **Full SQLite** with WASM support (SQLite 3.51.0) - 🔍 **Vector similarity search** using sqlite-vec extension (v0.1.7-alpha.2) - 📦 **~1.5MB WASM file** (optimized with -O2) - 🌐 **Browser & Node.js** compatible - 💾 **OPFS support** for persistent storage - 🎯 **TypeScript definitions** included - ✅ **100% API compatible** with @sqlite.org/sqlite-wasm ## Node.js support > [!Warning] > > Node.js is currently only supported for in-memory databases without > persistence. ## Installation ```bash npm install sqlite-wasm-vec ``` ## sqlite-vec Features All sqlite-vec functions are available out of the box. Here are comprehensive examples: ### Basic Setup and Version Check ```javascript import sqlite3InitModule from 'sqlite-wasm-vec'; // Initialize SQLite with vec extension const sqlite3 = await sqlite3InitModule(); const db = new sqlite3.oo1.DB(':memory:'); // or use a file path // Check sqlite-vec version const vecVersion = db.selectValue('SELECT vec_version()'); console.log('sqlite-vec version:', vecVersion); // e.g., "v0.1.7-alpha.2" ``` ### Creating Vector Tables ```javascript // Create a virtual table for storing vectors // Specify the vector dimension (e.g., 384 for all-MiniLM-L6-v2 embeddings) db.exec(` CREATE VIRTUAL TABLE vec_items USING vec0( embedding float[384] ) `); // You can also create tables with different vector types db.exec(` CREATE VIRTUAL TABLE vec_int8_items USING vec0( embedding int8[128] -- 8-bit integer vectors ) `); db.exec(` CREATE VIRTUAL TABLE vec_bit_items USING vec0( embedding bit[1024] -- Binary vectors for hamming distance ) `); ``` ### Inserting Vectors ```javascript // Method 1: Using prepared statements (recommended for multiple inserts) const vectors = [ { id: 1, data: [0.1, 0.2, 0.3, 0.4] }, { id: 2, data: [0.5, 0.6, 0.7, 0.8] }, { id: 3, data: [0.9, 1.0, 1.1, 1.2] }, ]; const stmt = db.prepare( 'INSERT INTO vec_items(rowid, embedding) VALUES (?, ?)', ); for (const vector of vectors) { const float32Array = new Float32Array(vector.data); stmt.bind(vector.id, float32Array.buffer).stepReset(); } stmt.finalize(); // Method 2: Direct insert with typed arrays const embedding = new Float32Array(384); // ... fill embedding with your data ... db.exec({ sql: 'INSERT INTO vec_items(rowid, embedding) VALUES (?, ?)', bind: [100, embedding.buffer], }); ``` ### Vector Similarity Search ```javascript // Perform k-NN search using MATCH operator const queryVector = new Float32Array([0.2, 0.3, 0.4, 0.5]); // Find 5 most similar vectors const results = db.selectArrays( 'SELECT rowid, distance FROM vec_items WHERE embedding MATCH ? ORDER BY distance LIMIT 5', queryVector.buffer, ); // Results format: [[rowid1, distance1], [rowid2, distance2], ...] results.forEach(([id, distance]) => { console.log(`Item ${id}: distance = ${distance}`); }); // With additional filtering const filtered = db.selectObjects( `SELECT vi.rowid, vi.distance, items.title, items.category FROM vec_items vi JOIN items ON items.id = vi.rowid WHERE embedding MATCH ? AND items.category = 'electronics' ORDER BY distance LIMIT 10`, queryVector.buffer, ); ``` ### Using Distance Functions ```javascript // Store vectors for comparison db.exec('CREATE TEMP TABLE test_vectors (id INTEGER PRIMARY KEY, vec BLOB)'); const v1 = new Float32Array([1.0, 2.0, 3.0]); const v2 = new Float32Array([4.0, 5.0, 6.0]); const stmt = db.prepare('INSERT INTO test_vectors(id, vec) VALUES (?, ?)'); stmt.bind(1, v1.buffer).stepReset(); stmt.bind(2, v2.buffer).stepReset(); stmt.finalize(); // Calculate different distance metrics const l2Distance = db.selectValue(` SELECT vec_distance_l2( (SELECT vec FROM test_vectors WHERE id = 1), (SELECT vec FROM test_vectors WHERE id = 2) ) `); const cosineDistance = db.selectValue(` SELECT vec_distance_cosine( (SELECT vec FROM test_vectors WHERE id = 1), (SELECT vec FROM test_vectors WHERE id = 2) ) `); const l1Distance = db.selectValue(` SELECT vec_distance_l1( (SELECT vec FROM test_vectors WHERE id = 1), (SELECT vec FROM test_vectors WHERE id = 2) ) `); console.log('L2 distance:', l2Distance); console.log('Cosine distance:', cosineDistance); console.log('L1 distance:', l1Distance); ``` ### Vector Operations ```javascript // Vector normalization const normalized = db.selectValue( 'SELECT vec_normalize(?)', new Float32Array([3.0, 4.0]).buffer, ); // Vector addition and subtraction const sum = db.selectValue( 'SELECT vec_add(?, ?)', new Float32Array([1, 2, 3]).buffer, new Float32Array([4, 5, 6]).buffer, ); // Get vector metadata const length = db.selectValue( 'SELECT vec_length(?)', new Float32Array([1, 2, 3, 4, 5]).buffer, ); // Returns 5 // Convert vector to JSON for inspection const jsonArray = db.selectValue( 'SELECT vec_to_json(?)', new Float32Array([1.5, 2.5, 3.5]).buffer, ); // Returns "[1.5,2.5,3.5]" ``` ### Quantization for Memory Efficiency ```javascript // Quantize float32 vectors to int8 (reduces memory by 4x) const float32Vec = new Float32Array([0.1, 0.5, 0.9, -0.3]); const int8Vec = db.selectValue( 'SELECT vec_quantize_int8(?)', float32Vec.buffer, ); // Quantize to binary vectors (reduces memory by 32x) const binaryVec = db.selectValue( 'SELECT vec_quantize_binary(?)', float32Vec.buffer, ); // Use quantized vectors for approximate search db.exec(` CREATE VIRTUAL TABLE vec_binary USING vec0( embedding bit[1024] ) `); ``` ### Practical Example: Building a Semantic Search ```javascript import sqlite3InitModule from 'sqlite-wasm-vec'; async function buildSemanticSearch() { const sqlite3 = await sqlite3InitModule(); const db = new sqlite3.oo1.DB(':memory:'); // Create tables for documents and their embeddings db.exec(` CREATE TABLE documents ( id INTEGER PRIMARY KEY, title TEXT, content TEXT, metadata JSON ); CREATE VIRTUAL TABLE doc_embeddings USING vec0( embedding float[384] -- Using 384-dim embeddings ); `); // Insert sample documents with their embeddings const documents = [ { id: 1, title: 'Introduction to AI', content: 'AI is transforming...', embedding: [ /* ... */ ], }, { id: 2, title: 'Machine Learning Basics', content: 'ML algorithms learn...', embedding: [ /* ... */ ], }, // ... more documents ]; const insertDoc = db.prepare('INSERT INTO documents VALUES (?, ?, ?, ?)'); const insertEmb = db.prepare( 'INSERT INTO doc_embeddings(rowid, embedding) VALUES (?, ?)', ); for (const doc of documents) { insertDoc .bind(doc.id, doc.title, doc.content, JSON.stringify({})) .stepReset(); insertEmb.bind(doc.id, new Float32Array(doc.embedding).buffer).stepReset(); } insertDoc.finalize(); insertEmb.finalize(); // Search function function search(queryEmbedding, limit = 5) { return db.selectObjects( ` SELECT d.id, d.title, d.content, e.distance FROM doc_embeddings e JOIN documents d ON d.id = e.rowid WHERE e.embedding MATCH ? ORDER BY e.distance LIMIT ? `, new Float32Array(queryEmbedding).buffer, limit, ); } // Perform a search const queryVector = [ /* your query embedding */ ]; const results = search(queryVector, 10); return { db, search }; } ``` ### Available Functions All sqlite-vec functions are compiled and available: **Core Functions:** - `vec_version()` - Returns the version of the sqlite-vec extension - `vec_debug()` - Debugging information about the extension **Vector Creation and Manipulation:** - `vec_f32(values...)` - Create a float32 vector from values - `vec_int8(values...)` - Create an int8 vector from values - `vec_bit(values...)` - Create a bit vector from values - `vec_slice(vector, start, end)` - Extract a slice of a vector - `vec_length(vector)` - Get the number of dimensions in a vector - `vec_type(vector)` - Get the type of a vector (f32, int8, bit) - `vec_to_json(vector)` - Convert vector to JSON array representation **Vector Operations:** - `vec_add(a, b)` - Element-wise addition of two vectors - `vec_sub(a, b)` - Element-wise subtraction of two vectors - `vec_normalize(vector)` - Normalize a vector to unit length **Distance Functions:** - `vec_distance_l2(a, b)` - Euclidean/L2 distance between vectors - `vec_distance_cosine(a, b)` - Cosine distance (1 - cosine similarity) - `vec_distance_l1(a, b)` - Manhattan/L1 distance between vectors - `vec_distance_hamming(a, b)` - Hamming distance for bit vectors **Quantization Functions:** - `vec_quantize_binary(vector)` - Quantize to binary (bit) representation - `vec_quantize_int8(vector)` - Quantize float32 to int8 representation ## Usage There are three ways to use SQLite Wasm: - [in the main thread with a wrapped worker](#in-a-wrapped-worker-with-opfs-if-available) (🏆 preferred option) - [in a worker](#in-a-worker-with-opfs-if-available) - [in the main thread](#in-the-main-thread-without-opfs) Only the worker versions allow you to use the origin private file system (OPFS) storage back-end. ### In a wrapped worker (with OPFS if available): > [!Warning] > > For this to work, you need to set the following headers on your server: > > `Cross-Origin-Opener-Policy: same-origin` > > `Cross-Origin-Embedder-Policy: require-corp` ```js import { sqlite3Worker1Promiser } from 'sqlite-wasm-vec'; const log = console.log; const error = console.error; const initializeSQLite = async () => { try { log('Loading and initializing SQLite3 module...'); const promiser = await new Promise((resolve) => { const _promiser = sqlite3Worker1Promiser({ onready: () => resolve(_promiser), }); }); log('Done initializing. Running demo...'); const configResponse = await promiser('config-get', {}); log('Running SQLite3 version', configResponse.result.version.libVersion); const openResponse = await promiser('open', { filename: 'file:mydb.sqlite3?vfs=opfs', }); const { dbId } = openResponse; log( 'OPFS is available, created persisted database at', openResponse.result.filename.replace(/^file:(.*?)\?vfs=opfs$/, '$1'), ); // Your SQLite code here. } catch (err) { if (!(err instanceof Error)) { err = new Error(err.result.message); } error(err.name, err.message); } }; initializeSQLite(); ``` The `promiser` object above implements the [Worker1 API](https://sqlite.org/wasm/doc/trunk/api-worker1.md#worker1-methods). ### In a worker (with OPFS if available): > [!Warning] > > For this to work, you need to set the following headers on your server: > > `Cross-Origin-Opener-Policy: same-origin` > > `Cross-Origin-Embedder-Policy: require-corp` ```js // In `main.js`. const worker = new Worker('worker.js', { type: 'module' }); ``` ```js // In `worker.js`. import sqlite3InitModule from 'sqlite-wasm-vec'; const log = console.log; const error = console.error; const start = (sqlite3) => { log('Running SQLite3 version', sqlite3.version.libVersion); const db = 'opfs' in sqlite3 ? new sqlite3.oo1.OpfsDb('/mydb.sqlite3') : new sqlite3.oo1.DB('/mydb.sqlite3', 'ct'); log( 'opfs' in sqlite3 ? `OPFS is available, created persisted database at ${db.filename}` : `OPFS is not available, created transient database ${db.filename}`, ); // Your SQLite code here. }; const initializeSQLite = async () => { try { log('Loading and initializing SQLite3 module...'); const sqlite3 = await sqlite3InitModule({ print: log, printErr: error }); log('Done initializing. Running demo...'); start(sqlite3); } catch (err) { error('Initialization error:', err.name, err.message); } }; initializeSQLite(); ``` The `db` object above implements the [Object Oriented API #1](https://sqlite.org/wasm/doc/trunk/api-oo1.md). ### In the main thread (without OPFS): ```js import sqlite3InitModule from 'sqlite-wasm-vec'; const log = console.log; const error = console.error; const start = (sqlite3) => { log('Running SQLite3 version', sqlite3.version.libVersion); const db = new sqlite3.oo1.DB('/mydb.sqlite3', 'ct'); // Your SQLite code here. }; const initializeSQLite = async () => { try { log('Loading and initializing SQLite3 module...'); const sqlite3 = await sqlite3InitModule({ print: log, printErr: error, }); log('Done initializing. Running demo...'); start(sqlite3); } catch (err) { error('Initialization error:', err.name, err.message); } }; initializeSQLite(); ``` The `db` object above implements the [Object Oriented API #1](https://sqlite.org/wasm/doc/trunk/api-oo1.md). ## Usage with vite If you are using [vite](https://vitejs.dev/), you need to add the following config option in `vite.config.js`: ```js import { defineConfig } from 'vite'; export default defineConfig({ server: { headers: { 'Cross-Origin-Opener-Policy': 'same-origin', 'Cross-Origin-Embedder-Policy': 'require-corp', }, }, optimizeDeps: { exclude: ['sqlite-wasm-vec'], }, }); ``` Check out a [sample project](https://stackblitz.com/edit/vitejs-vite-ttrbwh?file=main.js) that shows this in action. ## Demo See the [demo](https://github.com/sqlite/sqlite-wasm/tree/main/demo) folder for examples of how to use this in the main thread and in a worker. (Note that the worker variant requires special HTTP headers, so it can't be hosted on GitHub Pages.) An example that shows how to use this with vite is available on [StackBlitz](https://stackblitz.com/edit/vitejs-vite-ttrbwh?file=main.js). ## Projects using this package See the list of [npm dependents](https://www.npmjs.com/browse/depended/@sqlite.org/sqlite-wasm) for this package. ## Deploying a new version (These steps can only be executed by maintainers.) 1. Update the version number in `package.json` reflecting the current [SQLite version number](https://sqlite.org/download.html) and add a build identifier suffix like `-build1`. The complete version number should read something like `3.41.2-build1`. 1. Run `npm run build` to build the ES Module. This downloads the latest SQLite Wasm binary and builds the ES Module. 1. Run `npm run deploy` to commit the changes, push to GitHub, and publish the new version to npm. ## License Apache 2.0. ## Acknowledgements This project is based on [SQLite Wasm](https://sqlite.org/wasm), which it conveniently wraps as an ES Module and publishes to npm as [`@sqlite.org/sqlite-wasm`](https://www.npmjs.com/package/@sqlite.org/sqlite-wasm).