seekmix

# SeekMix SeekMix is a powerful semantic caching library for Node.js that leverages vector embeddings to cache and retrieve semantically similar queries, significantly reducing API calls to expensive LLM services. ## Features - **Semantic Caching**: Cache results based on the semantic meaning of queries, not just exact matches - **Configurable Similarity Threshold**: Fine-tune how semantically similar queries need to be for a cache hit - **Local Embedding Models**: By default, SeekMix uses Hugging Face embedding models locally, reducing external API dependencies - **Multiple Embedding Providers**: Support for OpenAI and Hugging Face embedding models - **SQLite + sqlite-vec**: Persistent vector storage powered by SQLite — no external services required - **Time-based Invalidation**: Easily invalidate old cache entries based on time criteria - **TTL Support**: Configure time-to-live for all cache entries - **Tag-based Filtering**: Classify cache entries with tags and filter on retrieval ## Benefits - **Cost Reduction**: Minimize expensive API calls to Large Language Models - **Improved Response Times**: Retrieve cached results for semantically similar queries instantly - **Perfect for RAG Applications**: Ideal for Retrieval-Augmented Generation systems - **Zero Infrastructure**: Just a local SQLite file - **Flexible Configuration**: Adapt to your specific use case with multiple configuration options - **Multi-model Support**: Use with OpenAI or open-source Hugging Face models ## Installation ```bash npm install seekmix ``` > **AI Skill**: You can also add SeekMix as a skill for AI agentic development: > ```bash > npx skills add https://github.com/clasen/SeekMix --skill seekmix > ``` ## Quick Start ```javascript import { SeekMix } from 'seekmix'; const cache = new SeekMix(); await cache.connect(); // Store a response await cache.set('How to make pasta', 'Boil water, add pasta, cook 8 min...'); // Retrieve it with a semantically similar query const hit = await cache.get('Steps for cooking pasta'); console.log(hit.result); // 'Boil water, add pasta, cook 8 min...' await cache.disconnect(); ``` The query `"Steps for cooking pasta"` was never stored — but SeekMix understands it means the same as `"How to make pasta"` and returns the cached result. ## Usage with an LLM A typical pattern is to check the cache before calling an expensive API: ```javascript import { SeekMix } from 'seekmix'; const cache = new SeekMix({ similarityThreshold: 0.9, ttl: 60 * 60, // 1 hour }); await cache.connect(); async function ask(question) { // 1. Check cache first const hit = await cache.get(question); if (hit) return hit.result; // 2. Cache miss — call the LLM const answer = await callYourLLM(question); // 3. Store for future similar questions await cache.set(question, answer); return answer; } // First call hits the LLM await ask('What are the best restaurants in New York'); // This call returns the cached result — no LLM call needed await ask('Recommend places to eat in New York'); await cache.disconnect(); ``` ## Advanced Configuration ```javascript import { SeekMix, OpenAIEmbeddingProvider } from 'seekmix'; // Create a semantic cache with OpenAI embeddings and custom settings const cache = new SeekMix({ dbPath: 'my-app-cache.db', // SQLite database file path (default: 'seekmix.db') ttl: 60 * 60 * 24 * 7, // 1 week similarityThreshold: 0.85, dropIndex: false, // Set to true to recreate tables on connect dropKeys: false, // Set to true to clear all cache entries on connect embeddingProvider: new OpenAIEmbeddingProvider({ model: 'text-embedding-ada-002', apiKey: process.env.OPENAI_API_KEY }) }); ``` ### Configuration Options | Option | Default | Description | |---|---|---| | `dbPath` | `'seekmix.db'` | Path to the SQLite database file. Use `':memory:'` for in-memory storage | | `ttl` | `-1` | Time-to-live in seconds for cache entries. `-1` means no expiration | | `similarityThreshold` | `0.87` | Cosine similarity threshold for cache hits (0-1) | | `dropIndex` | `false` | Drop and recreate tables on `connect()` | | `dropKeys` | `false` | Delete all entries on `connect()` | | `embeddingProvider` | `HuggingfaceProvider` | Embedding provider instance | ## Using Qwen3 Embedding (via OpenRouter) [Qwen3 Embedding 8B](https://openrouter.ai/qwen/qwen3-embedding-8b) is a state-of-the-art multilingual embedding model with 32k context window, excellent for multilingual queries, code retrieval, and long-text understanding. ```javascript import { SeekMix, QwenEmbeddingProvider } from 'seekmix'; const cache = new SeekMix({ embeddingProvider: new QwenEmbeddingProvider() }); await cache.connect(); // Works seamlessly across languages await cache.set('Best restaurants in New York', 'Try Le Bernardin or Eleven Madison Park.'); await cache.set('Cómo hacer pasta al dente', 'Hierve agua con sal y cocina 1-2 min menos.'); // Retrieve with a semantically similar query in any language const hit = await cache.get('Where should I eat in New York?'); console.log(hit.result); // 'Try Le Bernardin or Eleven Madison Park.' await cache.disconnect(); ``` Requires `OPENROUTER_API_KEY` in your environment. See [OpenRouter](https://openrouter.ai) for API key setup. ### Embedding Providers | Provider | Class | Model | Dimensions | Notes | |---|---|---|---|---| | Hugging Face (local) | `HuggingfaceProvider` | `Xenova/multilingual-e5-large` | 1024 | Default, no API key needed | | OpenAI | `OpenAIEmbeddingProvider` | `text-embedding-ada-002` | 1536 | Requires `OPENAI_API_KEY` | | OpenAI v3 small | `OpenAIEmbedding3Provider` | `text-embedding-3-small` | 1536 | Requires `OPENAI_API_KEY` | | OpenAI v3 large | `OpenAIEmbedding3LargeProvider` | `text-embedding-3-large` | 3072 | Requires `OPENAI_API_KEY` | | OpenRouter (generic) | `OpenRouterEmbeddingProvider` | any OpenRouter model | varies | Requires `OPENROUTER_API_KEY` | | Qwen3 Embedding 8B | `QwenEmbeddingProvider` | `qwen/qwen3-embedding-8b` | 4096 | Requires `OPENROUTER_API_KEY` | | BAAI bge-m3 | `BgeM3EmbeddingProvider` | `baai/bge-m3` | 1024 | Requires `OPENROUTER_API_KEY` | || Multilingual E5 Large | `MultilingualE5LargeProvider` | `intfloat/multilingual-e5-large` | 1024 | Requires `OPENROUTER_API_KEY` | | OpenAI text-embedding-3-small (OpenRouter) | `OpenAIEmbedding3SmallRouterProvider` | `openai/text-embedding-3-small` | 1536 | Requires `OPENROUTER_API_KEY` | | OpenAI text-embedding-3-large (OpenRouter) | `OpenAIEmbedding3LargeRouterProvider` | `openai/text-embedding-3-large` | 3072 | Requires `OPENROUTER_API_KEY` | ## Using with RAG Applications SeekMix is perfect for Retrieval-Augmented Generation applications, as it can cache both the retrieval and generation steps: ```javascript // Caching the retrieval step const retrievalCache = new SeekMix({ dbPath: 'rag-retrieval.db' }); await retrievalCache.connect(); // Caching the generation step const generationCache = new SeekMix({ dbPath: 'rag-generation.db' }); await generationCache.connect(); async function queryRAG(userQuestion) { // 1. Try to get the final answer from generation cache const cachedAnswer = await generationCache.get(userQuestion); if (cachedAnswer) return cachedAnswer.result; // 2. Try to get retrieved context from retrieval cache let context; const cachedRetrieval = await retrievalCache.get(userQuestion); if (cachedRetrieval) { context = cachedRetrieval.result; } else { // Perform actual retrieval from vector DB context = await retrieveDocuments(userQuestion); // Cache the retrieval results await retrievalCache.set(userQuestion, context); } // 3. Generate answer using LLM const answer = await generateAnswer(context, userQuestion); // 4. Cache the final answer await generationCache.set(userQuestion, answer); return answer; } ``` ## Tag-based Filtering Classify cache entries with tags to filter results by category, language, domain, or any custom dimension. ### Include tags (legacy + new format) - Legacy format: `tags: ['a', 'b']` - New format: `tags: { in: ['a', 'b'] }` Both use **AND logic** — all specified tags must be present for a match. ```javascript // Store entries with tags await cache.set('Mejores restaurantes en Madrid', resultEs, { tags: ['lang:es'] }); await cache.set('Best restaurants in Madrid', resultEn, { tags: ['lang:en'] }); await cache.set('Latest AI news', resultTech, { tags: ['lang:en', 'code:NVDA'] }); // Retrieve filtering by tag const hit = await cache.get('Restaurantes en Madrid', { tags: ['lang:es'] }); // ✅ Only matches entries tagged with 'lang:es' // Multiple tags (AND logic: entry must have ALL specified tags) const hit2 = await cache.get('AI news', { tags: ['lang:en', 'code:NVDA'] }); // ✅ Only matches entries tagged with BOTH 'lang:en' AND 'code:NVDA' // Without tags — same behavior as always const hit3 = await cache.get('Restaurants in Madrid'); ``` ### Exclude tags (`out`) You can also exclude tags at retrieval time: ```javascript // Exclude entries that have ANY of these tags const hit = await cache.get('AI news', { tags: { out: ['lang:es'] } }); // Combine include + exclude const hit2 = await cache.get('AI news', { tags: { in: ['lang:en'], out: ['code:NVDA'] } }); ``` The result object includes the matched entry's tags: ```javascript { query: 'Mejores restaurantes en Madrid', result: resultEs, timestamp: 1234567890, score: 0.032, tags: ['lang:es'] } ``` ## Invalidating Old Cache Entries You can manually invalidate old cache entries: ```javascript // Invalidate entries older than 1 hour const invalidated = await cache.invalidateOld(60 * 60); console.log(`Invalidated ${invalidated} old cache entries`); ``` ## License MIT