# HNSWSQLite

[![npm version](https://img.shields.io/npm/v/hnswsqlite.svg)](https://www.npmjs.com/package/hnswsqlite)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
[![CI](https://github.com/praveencs87/hnswsqlite/actions/workflows/nodejs.yml/badge.svg)](https://github.com/praveencs87/hnswsqlite/actions)

A TypeScript library that combines approximate nearest neighbor vector search (via HNSWlib) with SQLite for persistent, lightweight, and efficient semantic search. Perfect for building semantic search applications, recommendation systems, and more.

## Features

- 🚀 **Fast Vector Search**: Approximate nearest neighbor search using HNSW algorithm
- 💾 **Persistence**: All data stored in SQLite for durability and easy backup
- 🔌 **Plugin System**: Support for multiple embedding providers:
  - OpenAI
  - HuggingFace
  - Dummy (for testing)
  - **WebLLM** (browser-based LLMs)
  - **MediaPipe** (image/video feature extraction)
  - **TensorFlow.js** (text/image/audio feature extraction)
- 🛠️ **CLI Tool**: Full-featured command-line interface for easy interaction
- 📦 **Lightweight**: No external dependencies other than SQLite and HNSWlib
- 🧩 **Extensible**: Easy to integrate with existing applications
- 🔄 **Batch Operations**: Support for adding and deleting multiple documents at once

## Installation

### As a Library

```bash
npm install hnswsqlite
```

### As a CLI Tool

```bash
# Install globally
npm install -g hnswsqlite

# Or use with npx
npx hnswsqlite --help
```

View on npm: [https://www.npmjs.com/package/hnswsqlite](https://www.npmjs.com/package/hnswsqlite)

View on GitHub: [https://github.com/praveencs87/hnswsqlite](https://github.com/praveencs87/hnswsqlite)

## Usage

### JavaScript/TypeScript API

```typescript
import { VectorStore } from 'hnswsqlite';

// Initialize with SQLite database path and embedding dimension
const store = new VectorStore('my_vectors.db', 1536);

try {
  // Add documents with embeddings
  const docId = store.addDocument('hello world', [0.1, 0.2, 0.3, ...]);

  // Search for similar documents
  const results = store.search([0.1, 0.2, 0.3, ...], 5);

  // Delete a document
  const deleted = store.deleteDocument(docId);

  // Batch operations
  const docIds = store.addDocuments([
    { text: 'first document', embedding: [0.1, 0.2, ...] },
    { text: 'second document', embedding: [0.3, 0.4, ...] }
  ]);
} finally {
  // Always close the store when done
  store.close();
}
```

### Command Line Interface (CLI)

#### Initialize a new database

```bash
hnswsqlite init
```

#### Add a document

```bash
# With automatic dummy embedding
hnswsqlite add "Your document text here"

# With custom embedding
hnswsqlite add "Another document" 0.1 0.2 0.3 ...

# (Planned) With specific provider (e.g., WebLLM, MediaPipe, TensorFlow.js)
hnswsqlite add "Text or image path" --provider webllm
```

#### Search for similar documents

```bash
hnswsqlite search "search query"
```

#### List all documents

```bash
hnswsqlite list
```

#### Delete a document

```bash
hnswsqlite delete 1
```

#### CLI Options

```
-d, --database <path>    Path to the SQLite database (default: vectors.db)
--dim <dimension>        Dimension of the vectors (default: 1536)
--provider <name>        Embedding provider to use (openai, huggingface, webllm, mediapipe, tensorflowjs, dummy)
--verbose                Enable verbose output
```

## Advanced Usage

### Using Different Embedding Providers

All embedding providers implement a common interface:

```typescript
type EmbeddingPlugin = {
  name: string;
  generateEmbedding(input: string | Buffer): Promise<number[]>;
};
```

#### Example: OpenAI

```typescript
import { VectorStore } from 'hnswsqlite';
import { OpenAIEmbedder } from 'hnswsqlite/plugins/openai';

const store = new VectorStore('my_vectors.db', 1536);
const embedder = new OpenAIEmbedder('your-api-key');

const embedding = await embedder.generateEmbedding('Your text here');
store.addDocument('Your text here', embedding);
```

#### Example: WebLLM (browser-based LLMs)

```typescript
import { WebLLMPlugin } from 'hnswsqlite/plugins/webllm';

const plugin = new WebLLMPlugin();
const embedding = await plugin.generateEmbedding('Your text here');
```

#### Example: MediaPipe (image/video feature extraction)

```typescript
import { MediaPipePlugin } from 'hnswsqlite/plugins/mediapipe';

const plugin = new MediaPipePlugin();
const embedding = await plugin.generateEmbedding(imageBuffer);
```

#### Example: TensorFlow.js (text/image/audio)

```typescript
import { TensorFlowPlugin } from 'hnswsqlite/plugins/tensorflow';

const plugin = new TensorFlowPlugin();
const embedding = await plugin.generateEmbedding('Your text or image buffer');
```

> **Note:** Each plugin may require additional dependencies or setup. See the plugin source for details.

### Performance Tuning

```typescript
const store = new VectorStore('my_vectors.db', 1536, {
  maxElements: 100000, // Maximum number of elements in the index
  M: 16, // Maximum number of outgoing connections in the graph
  efConstruction: 200, // Controls index search speed/build speed tradeoff
  randomSeed: 100, // Random seed for reproducibility
});
```

## Development

```bash
# Clone the repository
git clone https://github.com/praveencs87/hnswsqlite.git
cd hnswsqlite

# Install dependencies
npm install

# Build the project
npm run build

# Run tests
npm test

# Run the CLI in development mode
npm run cli -- --help
```

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

MIT © [Praveen CS](https://www.linkedin.com/in/praveen-cs/)

---

## Author

Maintained by [Praveen CS](https://www.linkedin.com/in/praveen-cs/)

- GitHub: [praveencs87](https://github.com/praveencs87)
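
## Appendix: Sketch of a Custom Embedding Plugin

The `EmbeddingPlugin` type shown under "Using Different Embedding Providers" only asks for a `name` and an async `generateEmbedding` method, so a plugin can be written without any external service. The following is a minimal sketch under stated assumptions, not code shipped with the library: `HashEmbedder` and its `dim` argument are hypothetical names, and the input type is widened from `Buffer` to `Uint8Array` (which `Buffer` extends) so the snippet compiles without Node type definitions. It produces a deterministic, L2-normalized vector that is suitable only for tests, not for real semantic search.

```typescript
// Plugin contract adapted from the README; `Uint8Array` stands in for `Buffer`
// (an assumption made here so the sketch needs no Node typings).
type EmbeddingPluginLike = {
  name: string;
  generateEmbedding(input: string | Uint8Array): Promise<number[]>;
};

// Hypothetical test-only plugin: buckets character/byte codes into a
// fixed-size vector and scales the result to unit length.
class HashEmbedder implements EmbeddingPluginLike {
  name = 'hash';
  private readonly dim: number;

  constructor(dim: number = 8) {
    this.dim = dim;
  }

  async generateEmbedding(input: string | Uint8Array): Promise<number[]> {
    const vec: number[] = new Array<number>(this.dim).fill(0);
    for (let i = 0; i < input.length; i++) {
      // Strings contribute UTF-16 code units, binary inputs contribute bytes.
      const code = typeof input === 'string' ? input.charCodeAt(i) : input[i];
      vec[(code + i) % this.dim] += 1;
    }
    // Guard against a zero norm for empty input.
    const norm = Math.sqrt(vec.reduce((sum, x) => sum + x * x, 0)) || 1;
    return vec.map((x) => x / norm);
  }
}
```

A vector produced this way could then be passed to `store.addDocument(text, embedding)` as in the OpenAI example, provided the `VectorStore` was created with the same dimension as the embedder.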