# @wcs-colab/plugin-fuzzy-phrase

Advanced fuzzy phrase matching plugin for Orama with semantic weighting and synonym expansion.

## Features

- ✅ **Independent of QPS** - Direct radix tree access, no QPS dependency
- ✅ **Fuzzy matching** - Using the `boundedLevenshtein` algorithm (same as match-highlight)
- ✅ **Phrase-level scoring** - Multi-factor scoring algorithm
- ✅ **Synonym expansion** - Load synonyms from Supabase
- ✅ **Adaptive tolerance** - Dynamically scales with query length
- ✅ **Semantic weighting** - TF-IDF scoring for relevance
- ✅ **Configurable** - All weights and thresholds are configurable

## Installation

```bash
npm install @wcs-colab/plugin-fuzzy-phrase
```

## Basic Usage

```typescript
import { create, search } from '@wcs-colab/orama';
import { pluginFuzzyPhrase } from '@wcs-colab/plugin-fuzzy-phrase';

const db = await create({
  schema: {
    content: 'string',
    title: 'string'
  },
  plugins: [
    pluginFuzzyPhrase({
      textProperty: 'content',
      tolerance: 1,
      adaptiveTolerance: true
    })
  ]
});

// Search with fuzzy phrase matching
const results = await search(db, {
  term: 'fuzzy search example',
  properties: ['content']
});
```

## Configuration

```typescript
interface FuzzyPhraseConfig {
  // Text property to search in
  textProperty?: string; // default: 'content'

  // Base fuzzy matching tolerance (edit distance)
  tolerance?: number; // default: 1

  // Enable adaptive tolerance (scales with query length)
  adaptiveTolerance?: boolean; // default: true

  // Enable synonym expansion
  enableSynonyms?: boolean; // default: false

  // Supabase configuration for loading synonyms
  supabase?: {
    url: string;
    serviceKey: string;
  };

  // Scoring weight for synonym matches (0-1)
  synonymMatchScore?: number; // default: 0.8

  // Scoring weights for different components
  weights?: {
    exact?: number;     // default: 1.0
    fuzzy?: number;     // default: 0.8
    order?: number;     // default: 0.3
    proximity?: number; // default: 0.2
    density?: number;   // default: 0.2
    semantic?: number;  // default: 0.15
  };

  // Maximum gap between words in a phrase
  maxGap?: number; // default: 5

  // Minimum phrase score to include in results
  minScore?: number; // default: 0.1
}
```

## With Synonyms (Supabase)

```typescript
import { create } from '@wcs-colab/orama';
import { pluginFuzzyPhrase } from '@wcs-colab/plugin-fuzzy-phrase';

const db = await create({
  schema: { content: 'string' },
  plugins: [
    pluginFuzzyPhrase({
      textProperty: 'content',
      enableSynonyms: true,
      supabase: {
        url: process.env.SUPABASE_URL,
        serviceKey: process.env.SUPABASE_SERVICE_ROLE_KEY
      }
    })
  ]
});

// Now searches will include synonym matches
// e.g., "humanité" will also match "homme", "humain"
```

## How It Works

### 1. Candidate Expansion

For each query token, the plugin finds:

- **Exact matches** - Exact word match (score: 1.0)
- **Fuzzy matches** - Within edit distance tolerance (score: 0.6-0.95)
- **Synonym matches** - From synonym dictionary (score: 0.8)

### 2. Phrase Finding

Uses a sliding window to find phrases where:

- Words are within `maxGap` distance of each other
- Multiple query tokens are present
- Phrases don't overlap

### 3. Multi-Factor Scoring

Each phrase is scored using:

- **Base score** - Quality of word matches
- **Order bonus** - Words in correct order
- **Proximity bonus** - Words close together
- **Density bonus** - Percentage of query covered
- **Semantic bonus** - TF-IDF relevance weighting

### 4. Result Ranking

Results are sorted by phrase score, highest first.

## Architecture

The plugin is completely independent of QPS:

- Accesses Orama's radix tree directly
- Uses the same `boundedLevenshtein` as the match-highlight plugin
- Implements custom phrase-level scoring
- Loads synonyms from Supabase (optional)

## Performance

- **Bounded Levenshtein** - Early termination for performance
- **Vocabulary extraction** - One-time cost at index creation
- **TF-IDF** - Pre-calculated document frequencies
- **Deduplication** - Non-overlapping phrase optimization

## License

Apache-2.0

## Version

3.1.16-custom.1

Compatible with `@wcs-colab/orama@3.1.16-custom.9`
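
## Sketch: Bounded Edit Distance

The Performance notes mention early termination in the bounded Levenshtein step. The following is a minimal sketch of that general technique, not the plugin's actual `boundedLevenshtein`. The function name `boundedEditDistance` and the `-1` "out of bounds" return value are assumptions made for illustration.

```typescript
// Hypothetical helper for illustration; not the plugin's own implementation.
// Returns the edit distance between `a` and `b`, or -1 if it exceeds `bound`.
function boundedEditDistance(a: string, b: string, bound: number): number {
  // A length difference larger than the bound can never be bridged.
  if (Math.abs(a.length - b.length) > bound) return -1;

  // prev[j] holds the distance between the processed prefix of `a` and b[0..j).
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);

  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    let rowMin = i;
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      const value = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution or match
      );
      curr.push(value);
      if (value < rowMin) rowMin = value;
    }
    // Early termination: row minima never decrease, so later rows cannot recover.
    if (rowMin > bound) return -1;
    prev = curr;
  }

  const distance = prev[b.length];
  return distance <= bound ? distance : -1;
}
```

With a `tolerance` of 1, a call such as `boundedEditDistance('fuzy', 'fuzzy', 1)` stays within bounds, while clearly unrelated words are rejected without filling in the whole table.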
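
## Sketch: Combining Score Components

The multi-factor scoring step lists five components but does not give the formula. As one possible reading, the sketch below adds the weighted bonuses to the base score and normalizes the result to the 0-1 range. The `PhraseFeatures` shape, the `scorePhrase` function, the proximity and density ratios, and the normalization are all assumptions; only the default weight values come from the Configuration section.

```typescript
// Hypothetical types for illustration; the plugin's internal types may differ.
interface BonusWeights {
  order: number;
  proximity: number;
  density: number;
  semantic: number;
}

interface PhraseFeatures {
  base: number;          // mean quality of the word matches, 0..1
  inOrder: boolean;      // matched words appear in query order
  matchedTokens: number; // distinct query tokens found in the phrase
  spanLength: number;    // number of document words the phrase spans
  queryLength: number;   // number of tokens in the query
  semantic: number;      // normalized TF-IDF relevance, 0..1
}

// Default bonus weights as documented under `weights` in the configuration.
const defaultWeights: BonusWeights = {
  order: 0.3,
  proximity: 0.2,
  density: 0.2,
  semantic: 0.15,
};

// Assumed combination rule: base score plus weighted bonuses, divided by the
// maximum attainable value so the result lies between 0 and 1.
function scorePhrase(f: PhraseFeatures, w: BonusWeights = defaultWeights): number {
  const order = f.inOrder ? 1 : 0;
  const proximity = f.spanLength > 0 ? Math.min(1, f.matchedTokens / f.spanLength) : 0;
  const density = f.queryLength > 0 ? Math.min(1, f.matchedTokens / f.queryLength) : 0;

  const raw =
    f.base + w.order * order + w.proximity * proximity + w.density * density + w.semantic * f.semantic;
  const max = 1 + w.order + w.proximity + w.density + w.semantic;
  return raw / max;
}
```

Under this reading, a phrase would be kept when `scorePhrase(...)` is at least `minScore`, and the surviving phrases would then be sorted in descending order of score.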