# @wcs-colab/plugin-fuzzy-phrase
Advanced fuzzy phrase matching plugin for Orama with semantic weighting and synonym expansion.
## Features
- ✅ **Independent from QPS** - Direct radix tree access, no QPS dependency
- ✅ **Fuzzy matching** - Using `boundedLevenshtein` algorithm (same as match-highlight)
- ✅ **Phrase-level scoring** - Multi-factor scoring algorithm
- ✅ **Synonym expansion** - Optionally loads synonyms from Supabase
- ✅ **Adaptive tolerance** - Dynamically scales with query length
- ✅ **Semantic weighting** - TF-IDF scoring for relevance
- ✅ **Configurable** - All weights and thresholds are configurable
## Installation
```bash
npm install @wcs-colab/plugin-fuzzy-phrase
```
## Basic Usage
```typescript
import { create, search } from '@wcs-colab/orama';
import { pluginFuzzyPhrase } from '@wcs-colab/plugin-fuzzy-phrase';

const db = await create({
  schema: {
    content: 'string',
    title: 'string'
  },
  plugins: [
    pluginFuzzyPhrase({
      textProperty: 'content',
      tolerance: 1,
      adaptiveTolerance: true
    })
  ]
});

// Search with fuzzy phrase matching
const results = await search(db, {
  term: 'fuzzy search example',
  properties: ['content']
});
```
## Configuration
```typescript
interface FuzzyPhraseConfig {
  // Text property to search in
  textProperty?: string; // default: 'content'

  // Base fuzzy matching tolerance (edit distance)
  tolerance?: number; // default: 1

  // Enable adaptive tolerance (scales with query length)
  adaptiveTolerance?: boolean; // default: true

  // Enable synonym expansion
  enableSynonyms?: boolean; // default: false

  // Supabase configuration for loading synonyms
  supabase?: {
    url: string;
    serviceKey: string;
  };

  // Scoring weight for synonym matches (0-1)
  synonymMatchScore?: number; // default: 0.8

  // Scoring weights for different components
  weights?: {
    exact?: number;     // default: 1.0
    fuzzy?: number;     // default: 0.8
    order?: number;     // default: 0.3
    proximity?: number; // default: 0.2
    density?: number;   // default: 0.2
    semantic?: number;  // default: 0.15
  };

  // Maximum gap between words in a phrase
  maxGap?: number; // default: 5

  // Minimum phrase score to include in results
  minScore?: number; // default: 0.1
}
```
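Every option is optional, so the plugin has to fall back to the defaults listed above. The sketch below shows one way such a merge could work, including the nested `weights` object. `ResolvedConfig`, `DEFAULTS`, and `resolveConfig` are hypothetical names used only for illustration; they are not exports of this package, and the optional `supabase` block is left out for brevity.

```typescript
// Hypothetical illustration: not part of the plugin's public API.
interface Weights {
  exact: number;
  fuzzy: number;
  order: number;
  proximity: number;
  density: number;
  semantic: number;
}

interface ResolvedConfig {
  textProperty: string;
  tolerance: number;
  adaptiveTolerance: boolean;
  enableSynonyms: boolean;
  synonymMatchScore: number;
  weights: Weights;
  maxGap: number;
  minScore: number;
}

type UserConfig = Partial<Omit<ResolvedConfig, 'weights'>> & {
  weights?: Partial<Weights>;
};

// Default values copied from the FuzzyPhraseConfig comments above.
const DEFAULTS: ResolvedConfig = {
  textProperty: 'content',
  tolerance: 1,
  adaptiveTolerance: true,
  enableSynonyms: false,
  synonymMatchScore: 0.8,
  weights: { exact: 1.0, fuzzy: 0.8, order: 0.3, proximity: 0.2, density: 0.2, semantic: 0.15 },
  maxGap: 5,
  minScore: 0.1
};

// Shallow-merge top-level options, then merge `weights` separately so that
// overriding one weight does not discard the other defaults.
function resolveConfig(user: UserConfig = {}): ResolvedConfig {
  return {
    ...DEFAULTS,
    ...user,
    weights: { ...DEFAULTS.weights, ...user.weights }
  };
}
```

With this shape, `resolveConfig({ weights: { order: 0.5 } })` changes only the order weight and keeps every other documented default.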
## With Synonyms (Supabase)
```typescript
import { create } from '@wcs-colab/orama';
import { pluginFuzzyPhrase } from '@wcs-colab/plugin-fuzzy-phrase';

const db = await create({
  schema: {
    content: 'string'
  },
  plugins: [
    pluginFuzzyPhrase({
      textProperty: 'content',
      enableSynonyms: true,
      supabase: {
        // Both variables must be set; `!` asserts they are defined
        url: process.env.SUPABASE_URL!,
        // Service role key: keep it server-side only
        serviceKey: process.env.SUPABASE_SERVICE_ROLE_KEY!
      }
    })
  ]
});

// Now searches will include synonym matches
// e.g., "humanité" will also match "homme", "humain"
```
## How It Works
### 1. Candidate Expansion
For each query token, the plugin finds:
- **Exact matches** - Exact word match (score: 1.0)
- **Fuzzy matches** - Within edit distance tolerance (score: 0.6-0.95)
- **Synonym matches** - From the synonym dictionary (score: `synonymMatchScore`, default 0.8)
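To make the fuzzy step concrete, here is a minimal sketch of a bounded edit-distance check and a vocabulary scan built on it. It is not the plugin's `boundedLevenshtein`; `boundedEditDistance` and `findFuzzyCandidates` are hypothetical names, and returning `-1` when the tolerance is exceeded is a convention chosen for this example. How a distance maps to the 0.6-0.95 score range is not specified here, so the sketch returns raw distances only.

```typescript
// Hypothetical helper: returns the Levenshtein distance between a and b,
// or -1 as soon as the distance is known to exceed `tolerance`.
function boundedEditDistance(a: string, b: string, tolerance: number): number {
  // The length difference is a lower bound on the edit distance.
  if (Math.abs(a.length - b.length) > tolerance) return -1;

  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);

  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    let rowMin = i;
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      const value = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
      curr.push(value);
      if (value < rowMin) rowMin = value;
    }
    // Row minima never decrease, so the final distance cannot recover.
    if (rowMin > tolerance) return -1;
    prev = curr;
  }

  const distance = prev[b.length];
  return distance <= tolerance ? distance : -1;
}

interface FuzzyCandidate {
  word: string;
  distance: number; // 0 means an exact match
}

// Scan a vocabulary for words within `tolerance` edits of the query token.
function findFuzzyCandidates(token: string, vocabulary: string[], tolerance: number): FuzzyCandidate[] {
  const candidates: FuzzyCandidate[] = [];
  for (const word of vocabulary) {
    const distance = boundedEditDistance(token, word, tolerance);
    if (distance >= 0) candidates.push({ word, distance });
  }
  return candidates;
}
```

A linear scan is used here for clarity; the plugin itself reads candidates from Orama's radix tree.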
### 2. Phrase Finding
Uses a sliding window to find phrases where:
- Words are within `maxGap` distance
- Multiple query tokens are present
- Phrases don't overlap
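The three conditions above can be sketched as a single left-to-right pass over the matched positions. This is a simplified illustration, not the plugin's implementation: `WordMatch`, `Phrase`, and `findPhrases` are hypothetical names, and "gap" is assumed to mean the position difference between consecutive matched words.

```typescript
// Hypothetical types for illustration only.
interface WordMatch {
  position: number;   // token position inside the document
  queryIndex: number; // which query token this word matched
}

interface Phrase {
  start: number;
  end: number;
  matches: WordMatch[];
}

function findPhrases(matches: WordMatch[], maxGap: number): Phrase[] {
  const sorted = [...matches].sort((a, b) => a.position - b.position);
  const phrases: Phrase[] = [];
  let group: WordMatch[] = [];

  // Emit the current window as a phrase if it covers 2+ distinct query tokens.
  const flush = (): void => {
    const distinct = new Set(group.map((m) => m.queryIndex)).size;
    if (distinct >= 2) {
      phrases.push({
        start: group[0].position,
        end: group[group.length - 1].position,
        matches: group
      });
    }
    group = [];
  };

  for (const match of sorted) {
    const last = group[group.length - 1];
    // Close the window when the next word is further away than maxGap.
    if (last !== undefined && match.position - last.position > maxGap) flush();
    group.push(match);
  }
  if (group.length > 0) flush();

  return phrases; // windows are disjoint, so phrases cannot overlap
}
```

Because each match belongs to exactly one window, non-overlap follows from the construction rather than from a separate deduplication pass.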
### 3. Multi-Factor Scoring
Each phrase is scored using:
- **Base score** - Quality of word matches
- **Order bonus** - Words in correct order
- **Proximity bonus** - Words close together
- **Density bonus** - Percentage of query covered
- **Semantic bonus** - TF-IDF relevance weighting
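The exact formula is not documented here, so the following is only one plausible way to combine the five factors: a base score plus weighted bonuses. `scorePhrase`, the way each bonus is normalised to the 0-1 range, and the additive combination are all assumptions made for illustration; the bonus weights correspond to the `order`, `proximity`, `density`, and `semantic` entries of `weights`.

```typescript
// Hypothetical types and formula: a sketch, not the plugin's scoring code.
interface ScoredMatch {
  position: number;   // token position inside the document
  queryIndex: number; // index of the matched query token
  score: number;      // per-word match quality (1.0 for exact)
}

interface BonusWeights {
  order: number;
  proximity: number;
  density: number;
  semantic: number;
}

function scorePhrase(
  matches: ScoredMatch[],
  queryLength: number,
  maxGap: number,
  semantic: number, // assumed to be a pre-computed 0-1 TF-IDF factor
  w: BonusWeights
): number {
  if (matches.length === 0 || queryLength === 0) return 0;
  const byPos = [...matches].sort((a, b) => a.position - b.position);

  // Base score: average quality of the individual word matches.
  const base = byPos.reduce((sum, m) => sum + m.score, 0) / byPos.length;

  // Order: 1 when query tokens appear in query order, else 0.
  const inOrder = byPos.every((m, i) => i === 0 || byPos[i - 1].queryIndex < m.queryIndex);

  // Proximity: 1 for adjacent words, shrinking as extra gaps accumulate.
  const steps = byPos.length - 1;
  const span = byPos[byPos.length - 1].position - byPos[0].position;
  const maxSpan = Math.max(maxGap * Math.max(steps, 1), 1);
  const proximity = Math.max(0, 1 - Math.max(0, span - steps) / maxSpan);

  // Density: fraction of distinct query tokens covered by the phrase.
  const density = new Set(byPos.map((m) => m.queryIndex)).size / queryLength;

  return (
    base +
    w.order * (inOrder ? 1 : 0) +
    w.proximity * proximity +
    w.density * density +
    w.semantic * semantic
  );
}
```

Under these assumptions, two adjacent exact matches in query order score higher than the same two words reversed, because only the order bonus differs.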
### 4. Result Ranking
Results are sorted by highest phrase score.
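As a small illustration of this step combined with the `minScore` option, the sketch below keeps each document's best phrase score, drops documents below the threshold, and sorts the rest in descending order. `rankDocuments` and its input shape are hypothetical, not part of the plugin's API.

```typescript
// Hypothetical helper for illustration only.
interface RankedDocument {
  id: string;
  score: number;
}

function rankDocuments(phraseScores: Record<string, number[]>, minScore: number): RankedDocument[] {
  const ranked: RankedDocument[] = [];
  for (const id of Object.keys(phraseScores)) {
    const scores = phraseScores[id];
    if (scores.length === 0) continue;
    // A document is represented by its highest-scoring phrase.
    const best = Math.max(...scores);
    if (best >= minScore) ranked.push({ id, score: best });
  }
  // Highest score first.
  return ranked.sort((a, b) => b.score - a.score);
}
```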
## Architecture
The plugin is completely independent of QPS:
- Accesses Orama's radix tree directly
- Uses the same `boundedLevenshtein` as the match-highlight plugin
- Implements custom phrase-level scoring
- Loads synonyms from Supabase (optional)
## Performance
- **Bounded Levenshtein** - Early termination for performance
- **Vocabulary extraction** - One-time cost at index creation
- **TF-IDF** - Pre-calculated document frequencies
- **Deduplication** - Non-overlapping phrase optimization
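To illustrate the pre-calculated document frequencies, here is a minimal sketch that counts, once at index time, how many documents contain each term, and then derives an inverse document frequency from those counts. The function names are hypothetical, and the smoothed formula `ln((N + 1) / (df + 1)) + 1` is a common variant chosen for the example; the IDF variant the plugin uses is not stated here.

```typescript
// Hypothetical helpers: a sketch of the idea, not the plugin's code.

// Count, for each term, the number of documents that contain it.
// Intended to run once when the index is built.
function buildDocumentFrequencies(docs: string[][]): Map<string, number> {
  const df = new Map<string, number>();
  docs.forEach((tokens) => {
    // A Set ensures each term is counted at most once per document.
    new Set(tokens).forEach((term) => {
      df.set(term, (df.get(term) || 0) + 1);
    });
  });
  return df;
}

// Smoothed IDF: rare terms get a higher weight, and unseen terms
// do not cause a division by zero.
function inverseDocumentFrequency(term: string, df: Map<string, number>, totalDocs: number): number {
  const count = df.get(term) || 0;
  return Math.log((totalDocs + 1) / (count + 1)) + 1;
}
```

At query time only a map lookup and one logarithm are needed per term, which is why the counting cost is paid up front.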
## License
Apache-2.0
## Version
3.1.16-custom.1
Compatible with `@wcs-colab/orama@3.1.16-custom.9`