# SeekMix
SeekMix is a semantic caching library for Node.js. It uses vector embeddings to cache results and return them for semantically similar queries, which reduces the number of calls to expensive LLM APIs.
## Features
- **Semantic Caching**: Cache results based on the semantic meaning of queries, not just exact matches
- **Configurable Similarity Threshold**: Fine-tune how semantically similar queries need to be for a cache hit
- **Local Embedding Models**: By default, SeekMix uses Hugging Face embedding models locally, reducing external API dependencies
- **Multiple Embedding Providers**: Support for OpenAI and Hugging Face embedding models
- **SQLite + sqlite-vec**: Persistent vector storage powered by SQLite, with no external services required
- **Time-based Invalidation**: Easily invalidate old cache entries based on time criteria
- **TTL Support**: Configure time-to-live for all cache entries
- **Tag-based Filtering**: Classify cache entries with tags and filter on retrieval
## Benefits
- **Cost Reduction**: Minimize expensive API calls to Large Language Models
- **Improved Response Times**: Retrieve cached results for semantically similar queries instantly
- **Perfect for RAG Applications**: Ideal for Retrieval-Augmented Generation systems
- **Zero Infrastructure**: Just a local SQLite file
- **Flexible Configuration**: Adapt to your specific use case with multiple configuration options
- **Multi-model Support**: Use with OpenAI or open-source Hugging Face models
## Installation
```bash
npm install seekmix
```
> **AI Skill**: You can also add SeekMix as a skill for AI agentic development:
> ```bash
> npx skills add https://github.com/clasen/SeekMix --skill seekmix
> ```
## Quick Start
```javascript
import { SeekMix } from 'seekmix';
const cache = new SeekMix();
await cache.connect();
// Store a response
await cache.set('How to make pasta', 'Boil water, add pasta, cook 8 min...');
// Retrieve it with a semantically similar query
const hit = await cache.get('Steps for cooking pasta');
console.log(hit.result); // 'Boil water, add pasta, cook 8 min...'
await cache.disconnect();
```
The query `"Steps for cooking pasta"` was never stored, but SeekMix recognizes that it means the same as `"How to make pasta"` and returns the cached result.
## Usage with an LLM
A typical pattern is to check the cache before calling an expensive API:
```javascript
import { SeekMix } from 'seekmix';
const cache = new SeekMix({
  similarityThreshold: 0.9,
  ttl: 60 * 60, // 1 hour
});
await cache.connect();
async function ask(question) {
  // 1. Check cache first
  const hit = await cache.get(question);
  if (hit) return hit.result;
  // 2. Cache miss: call the LLM
  const answer = await callYourLLM(question);
  // 3. Store for future similar questions
  await cache.set(question, answer);
  return answer;
}
// First call hits the LLM
await ask('What are the best restaurants in New York');
// This call returns the cached result, so no LLM call is needed
await ask('Recommend places to eat in New York');
await cache.disconnect();
```
## Advanced Configuration
```javascript
import { SeekMix, OpenAIEmbeddingProvider } from 'seekmix';
// Create a semantic cache with OpenAI embeddings and custom settings
const cache = new SeekMix({
  dbPath: 'my-app-cache.db', // SQLite database file path (default: 'seekmix.db')
  ttl: 60 * 60 * 24 * 7, // 1 week
  similarityThreshold: 0.85,
  dropIndex: false, // Set to true to recreate tables on connect
  dropKeys: false, // Set to true to clear all cache entries on connect
  embeddingProvider: new OpenAIEmbeddingProvider({
    model: 'text-embedding-ada-002',
    apiKey: process.env.OPENAI_API_KEY
  })
});
```
### Configuration Options
| Option | Default | Description |
|---|---|---|
| `dbPath` | `'seekmix.db'` | Path to the SQLite database file. Use `':memory:'` for in-memory storage |
| `ttl` | `-1` | Time-to-live in seconds for cache entries. `-1` means no expiration |
| `similarityThreshold` | `0.87` | Cosine similarity threshold for cache hits (0-1) |
| `dropIndex` | `false` | Drop and recreate tables on `connect()` |
| `dropKeys` | `false` | Delete all entries on `connect()` |
| `embeddingProvider` | `HuggingfaceProvider` | Embedding provider instance |
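To make the `similarityThreshold` option more concrete, here is a minimal sketch of how a cosine similarity score can be compared against a threshold. It is an illustration of the concept only, not SeekMix's internal code: the helper names `cosineSimilarity` and `isCacheHit` are hypothetical, and treating a score exactly equal to the threshold as a hit is an assumption.

```javascript
// Illustrative only: not SeekMix's internal implementation.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same number of dimensions');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as "not similar".
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: a hit requires similarity at or above the threshold
// (the >= comparison is an assumption).
function isCacheHit(queryVec, storedVec, similarityThreshold = 0.87) {
  return cosineSimilarity(queryVec, storedVec) >= similarityThreshold;
}

// Toy 3-dimensional vectors; real embeddings have hundreds or thousands of dimensions.
const stored = [1, 0, 0];
console.log(isCacheHit([0.9, 0.1, 0], stored)); // true  (nearly parallel, ~0.994)
console.log(isCacheHit([0, 1, 0], stored));     // false (orthogonal, 0)
```

A higher threshold demands closer matches and yields fewer cache hits. A lower threshold yields more hits at the risk of returning an answer to a different question.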
## Using Qwen3 Embedding (via OpenRouter)
[Qwen3 Embedding 8B](https://openrouter.ai/qwen/qwen3-embedding-8b) is a state-of-the-art multilingual embedding model with a 32k context window. It is well suited to multilingual queries, code retrieval, and long-text understanding.
```javascript
import { SeekMix, QwenEmbeddingProvider } from 'seekmix';
const cache = new SeekMix({
embeddingProvider: new QwenEmbeddingProvider()
});
await cache.connect();
// Works seamlessly across languages
await cache.set('Best restaurants in New York', 'Try Le Bernardin or Eleven Madison Park.');
await cache.set('Cómo hacer pasta al dente', 'Hierve agua con sal y cocina 1-2 min menos.');
// Retrieve with a semantically similar query in any language
const hit = await cache.get('Where should I eat in New York?');
console.log(hit.result); // 'Try Le Bernardin or Eleven Madison Park.'
await cache.disconnect();
```
Requires `OPENROUTER_API_KEY` in your environment. See [OpenRouter](https://openrouter.ai) for API key setup.
### Embedding Providers
| Provider | Class | Model | Dimensions | Notes |
|---|---|---|---|---|
| Hugging Face (local) | `HuggingfaceProvider` | `Xenova/multilingual-e5-large` | 1024 | Default, no API key needed |
| OpenAI | `OpenAIEmbeddingProvider` | `text-embedding-ada-002` | 1536 | Requires `OPENAI_API_KEY` |
| OpenAI v3 small | `OpenAIEmbedding3Provider` | `text-embedding-3-small` | 1536 | Requires `OPENAI_API_KEY` |
| OpenAI v3 large | `OpenAIEmbedding3LargeProvider` | `text-embedding-3-large` | 3072 | Requires `OPENAI_API_KEY` |
| OpenRouter (generic) | `OpenRouterEmbeddingProvider` | any OpenRouter model | varies | Requires `OPENROUTER_API_KEY` |
| Qwen3 Embedding 8B | `QwenEmbeddingProvider` | `qwen/qwen3-embedding-8b` | 4096 | Requires `OPENROUTER_API_KEY` |
| BAAI bge-m3 | `BgeM3EmbeddingProvider` | `baai/bge-m3` | 1024 | Requires `OPENROUTER_API_KEY` |
| Multilingual E5 Large | `MultilingualE5LargeProvider` | `intfloat/multilingual-e5-large` | 1024 | Requires `OPENROUTER_API_KEY` |
| OpenAI text-embedding-3-small (OpenRouter) | `OpenAIEmbedding3SmallRouterProvider` | `openai/text-embedding-3-small` | 1536 | Requires `OPENROUTER_API_KEY` |
| OpenAI text-embedding-3-large (OpenRouter) | `OpenAIEmbedding3LargeRouterProvider` | `openai/text-embedding-3-large` | 3072 | Requires `OPENROUTER_API_KEY` |
## Using with RAG Applications
SeekMix is perfect for Retrieval-Augmented Generation applications, as it can cache both the retrieval and generation steps:
```javascript
import { SeekMix } from 'seekmix';
// Caching the retrieval step
const retrievalCache = new SeekMix({ dbPath: 'rag-retrieval.db' });
await retrievalCache.connect();
// Caching the generation step
const generationCache = new SeekMix({ dbPath: 'rag-generation.db' });
await generationCache.connect();
// retrieveDocuments() and generateAnswer() stand in for your own retrieval and LLM code
async function queryRAG(userQuestion) {
  // 1. Try to get the final answer from generation cache
  const cachedAnswer = await generationCache.get(userQuestion);
  if (cachedAnswer) return cachedAnswer.result;
  // 2. Try to get retrieved context from retrieval cache
  let context;
  const cachedRetrieval = await retrievalCache.get(userQuestion);
  if (cachedRetrieval) {
    context = cachedRetrieval.result;
  } else {
    // Perform actual retrieval from vector DB
    context = await retrieveDocuments(userQuestion);
    // Cache the retrieval results
    await retrievalCache.set(userQuestion, context);
  }
  // 3. Generate answer using LLM
  const answer = await generateAnswer(context, userQuestion);
  // 4. Cache the final answer
  await generationCache.set(userQuestion, answer);
  return answer;
}
```
## Tag-based Filtering
Classify cache entries with tags to filter results by category, language, domain, or any custom dimension.
### Include tags (legacy + new format)
- Legacy format: `tags: ['a', 'b']`
- New format: `tags: { in: ['a', 'b'] }`
Both use **AND logic**: all specified tags must be present for a match.
```javascript
// Store entries with tags
await cache.set('Mejores restaurantes en Madrid', resultEs, { tags: ['lang:es'] });
await cache.set('Best restaurants in Madrid', resultEn, { tags: ['lang:en'] });
await cache.set('Latest AI news', resultTech, { tags: ['lang:en', 'code:NVDA'] });
// Retrieve filtering by tag
const hit = await cache.get('Restaurantes en Madrid', { tags: ['lang:es'] });
// Only matches entries tagged with 'lang:es'
// Multiple tags (AND logic: entry must have ALL specified tags)
const hit2 = await cache.get('AI news', { tags: ['lang:en', 'code:NVDA'] });
// Only matches entries tagged with BOTH 'lang:en' AND 'code:NVDA'
// Without tags: same behavior as always
const hit3 = await cache.get('Restaurants in Madrid');
```
### Exclude tags (`out`)
You can also exclude tags at retrieval time:
```javascript
// Exclude entries that have ANY of these tags
const hit = await cache.get('AI news', { tags: { out: ['lang:es'] } });
// Combine include + exclude
const hit2 = await cache.get('AI news', { tags: { in: ['lang:en'], out: ['code:NVDA'] } });
```
The result object includes the matched entry's tags:
```javascript
{
  query: 'Mejores restaurantes en Madrid',
  result: resultEs,
  timestamp: 1234567890,
  score: 0.032,
  tags: ['lang:es']
}
```
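The include and exclude rules described in this section can be sketched as a small predicate. This is an illustration of the documented semantics (all `in` tags must be present, any `out` tag disqualifies), not SeekMix's actual implementation. The helper names `normalizeTagFilter` and `matchesTags` are hypothetical.

```javascript
// Illustrative sketch of the filter semantics; not SeekMix's internal code.
// Hypothetical helper: accept both the legacy array and the { in, out } object.
function normalizeTagFilter(tags) {
  if (!tags) return { in: [], out: [] };
  if (Array.isArray(tags)) return { in: tags, out: [] };
  return { in: tags.in ?? [], out: tags.out ?? [] };
}

// Returns true when an entry's tags satisfy the filter.
function matchesTags(entryTags, filter) {
  const { in: include, out: exclude } = normalizeTagFilter(filter);
  const present = new Set(entryTags);
  const hasAllIncluded = include.every((tag) => present.has(tag)); // AND logic
  const hasAnyExcluded = exclude.some((tag) => present.has(tag));  // ANY excludes
  return hasAllIncluded && !hasAnyExcluded;
}

const entry = ['lang:en', 'code:NVDA'];
console.log(matchesTags(entry, ['lang:en']));                             // true
console.log(matchesTags(entry, ['lang:en', 'lang:es']));                  // false
console.log(matchesTags(entry, { in: ['lang:en'], out: ['code:NVDA'] })); // false
console.log(matchesTags(entry, undefined));                               // true
```

In this sketch a missing filter matches every entry, which mirrors the "without tags" case shown earlier.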
## Invalidating Old Cache Entries
You can manually invalidate old cache entries:
```javascript
// Invalidate entries older than 1 hour
const invalidated = await cache.invalidateOld(60 * 60);
console.log(`Invalidated ${invalidated} old cache entries`);
```
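As a rough illustration of what "older than" means here, the sketch below selects entries whose age exceeds a limit. It is not SeekMix's implementation: `selectStaleEntries` is a hypothetical helper, and it assumes `timestamp` is a Unix time in seconds, which the README does not state.

```javascript
// Illustrative only; assumes `timestamp` is a Unix time in seconds.
function selectStaleEntries(entries, maxAgeSeconds, nowSeconds = Math.floor(Date.now() / 1000)) {
  // Anything stored before the cutoff is older than maxAgeSeconds.
  const cutoff = nowSeconds - maxAgeSeconds;
  return entries.filter((entry) => entry.timestamp < cutoff);
}

const now = 1_700_000_000; // fixed "current time" so the example is reproducible
const entries = [
  { query: 'old question', timestamp: now - 2 * 60 * 60 }, // 2 hours old
  { query: 'recent question', timestamp: now - 10 * 60 },  // 10 minutes old
];

// Entries older than 1 hour
console.log(selectStaleEntries(entries, 60 * 60, now).map((e) => e.query)); // [ 'old question' ]
```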
## License
MIT