@zanreal/search

github.com/zanreal-labs/search

# Universal Search Engine

A powerful TypeScript fuzzy search library with intelligent scoring, exact match prioritization, and automatic field detection for any object structure.

## Features

- **Universal Search**: Works with any data structure without configuration
- **Intelligent Scoring**: Prioritizes exact matches, then fuzzy matches with smart weighting
- **Automatic Field Detection**: Automatically finds searchable string fields in your objects
- **Nested Object Support**: Search through nested properties using dot notation
- **Customizable Weights**: Define field importance with custom weights
- **Multiple Search Types**: Exact start matches, exact contains, and fuzzy matching
- **TypeScript First**: Full TypeScript support with comprehensive type definitions

## Installation

```bash
npm install @zanreal/search
```

Or with other package managers:

```bash
# Yarn
yarn add @zanreal/search

# pnpm
pnpm add @zanreal/search

# Bun
bun add @zanreal/search
```

## Quick Start

```typescript
import { search, searchItems, quickSearch } from '@zanreal/search';

// Simple search - returns just the matching items
const data = [
  { name: 'John Doe', email: 'john@example.com' },
  { name: 'Jane Smith', email: 'jane@example.com' }
];

const results = quickSearch(data, 'john');
// Returns: [{ name: 'John Doe', email: 'john@example.com' }]

// Detailed search - returns items with scores and match details
const detailedResults = search(data, 'john');
// Returns: [{ item: {...}, score: 15.2, matches: [...] }]
```

> 💡 **Want to see more examples?** Check out our [comprehensive examples collection](./examples/) with production-ready patterns for e-commerce, document search, user directories, and more! Run `npm run examples` for an interactive explorer.

## API Reference

### Core Functions

#### `search<T>(data: T[], query: string, options?: SearchOptions): SearchResult<T>[]`

The main search function that returns detailed results with scores and match information.
```typescript
const results = search(users, 'john', {
  fields: ['name', 'email'],
  fieldWeights: { name: 5, email: 1 },
  fuzzyThreshold: 0.7,
  limit: 10
});
```

#### `searchItems<T>(data: T[], query: string, options?: SearchOptions): T[]`

Simplified search that returns just the matching items.

```typescript
const items = searchItems(users, 'john', {
  fields: ['name', 'email']
});
```

#### `quickSearch<T>(data: T[], query: string, fields?: string[]): T[]`

Quick search with sensible defaults for most use cases.

```typescript
const results = quickSearch(users, 'john', ['name', 'email']);
```

### Factory Functions

#### `createSearcher<T>(config: SearchOptions)`

Create a reusable search function with predefined configuration.

```typescript
const searchUsers = createSearcher<User>({
  fieldWeights: { name: 5, email: 2, bio: 1 },
  fuzzyThreshold: 0.8
});

const results = searchUsers(users, 'john');
```

#### `createDocumentSearcher<T>()`

Create a document searcher with common defaults.

```typescript
const searchDocs = createDocumentSearcher<Document>();
const results = searchDocs(documents, 'typescript');
```

## Search Options

```typescript
interface SearchOptions {
  /** Fields to search in (auto-detected if not provided) */
  fields?: string[];

  /** Custom field weights (higher = more important) */
  fieldWeights?: Record<string, number>;

  /** Minimum similarity threshold for fuzzy matching (0-1) */
  fuzzyThreshold?: number;

  /** Minimum query length for fuzzy matching */
  minFuzzyLength?: number;

  /** Maximum number of results to return */
  limit?: number;

  /** Case sensitive search */
  caseSensitive?: boolean;
}
```

## Search Results

```typescript
interface SearchResult<T> {
  item: T;                // The matching item
  score: number;          // Relevance score
  matches: SearchMatch[]; // Detailed match information
}

interface SearchMatch {
  field: string;          // Which field matched
  value: string;          // The field value
  score: number;          // Match score for this field
  type: "exact-start" | "exact-contain" | "fuzzy";
  position?: number;      // Position of match in string
}
```

## Advanced Usage

### Nested Object Search

```typescript
const data = [
  {
    user: { name: 'John', profile: { title: 'Developer' } },
    company: { name: 'Tech Corp' }
  }
];

// Automatically searches nested fields: user.name, user.profile.title, company.name
const results = search(data, 'developer');
```

### Custom Field Weights

```typescript
const results = search(articles, 'typescript', {
  fieldWeights: {
    title: 10,   // Highest priority
    summary: 5,  // Medium priority
    content: 1   // Lower priority
  }
});
```

### Field-Specific Configuration

```typescript
const searchConfig = {
  fields: ['title', 'author.name', 'tags'],
  fieldWeights: {
    title: 8,
    'author.name': 3,
    tags: 2
  },
  fuzzyThreshold: 0.8,
  limit: 20
};

const searcher = createSearcher<Article>(searchConfig);
const results = searcher(articles, query);
```

## Match Types & Scoring

1. **Exact Start Match** (Highest Score): Query matches from the beginning of the field
2. **Exact Contains Match** (High Score): Query found anywhere in the field
3. **Fuzzy Match** (Lower Score): Similar strings based on Levenshtein distance

The library automatically adjusts scores based on:

- Field importance (via weights)
- Match position (earlier = better)
- Field length (shorter fields get bonus)
- String similarity (for fuzzy matches)

## Default Options

```typescript
export const DEFAULT_SEARCH_OPTIONS = {
  fieldWeights: {},
  fuzzyThreshold: 0.7,
  minFuzzyLength: 3,
  limit: 100,
  caseSensitive: false,
};
```

## Development

This project uses Bun for fast TypeScript execution and development.
```bash
# Install dependencies
bun install

# Run TypeScript directly
bun run src/index.ts
```

## Examples & Use Cases

The `/examples` directory contains comprehensive, production-ready examples for different domains and use cases:

### 🚀 Quick Start with Examples

**Prerequisites:** Make sure you've built the library first:

```bash
npm run build
```

**Run any example instantly:**

```bash
# Interactive explorer (recommended for beginners)
npm run examples

# Individual examples
npm run examples:basic        # Basic usage patterns
npm run examples:ecommerce    # E-commerce search
npm run examples:documents    # Document/CMS search
npm run examples:users        # People directory search
npm run examples:performance  # Performance benchmarks

# Validate all examples work correctly
npm run examples:validate
```

### 📖 Example Categories

| Example | Use Case | Key Features | Best For |
|---------|----------|--------------|----------|
| **Basic Usage** | Simple search operations | Quick search, custom searcher, field weighting | Getting started, simple applications |
| **E-commerce Search** | Product catalogs | Custom scoring, price awareness, category filtering | Online stores, marketplaces |
| **Document Search** | CMS/Blog search | Nested objects, content analysis, metadata search | Content sites, documentation |
| **User Directory** | Employee/people search | Fuzzy matching, skill matching, department search | HR systems, social networks |
| **Performance Demo** | Large datasets (1000+ items) | Benchmarking, auto-complete, analytics | High-scale applications |

### 🛠️ Common Patterns

#### Quick Search Pattern

```javascript
import { quickSearch } from '@zanreal/search';

// Fastest way to search - returns just the items
const results = quickSearch(data, query);
```

#### Detailed Search Pattern

```javascript
import { search } from '@zanreal/search';

// Full control with scoring and configuration
const results = search(data, query, {
  fieldWeights: { title: 10, description: 5 },
  fuzzyThreshold: 0.7
});
```

#### Reusable Searcher Pattern

```javascript
import { createSearcher } from '@zanreal/search';

// Create once, use many times for better performance
const searcher = createSearcher({
  fieldWeights: { name: 10, email: 5 },
  fuzzyThreshold: 0.7
});

const results = searcher(data, query);
```

### ⚖️ Field Weight Guidelines

| Field Type | Weight Range | Use Case | Example |
|------------|--------------|----------|---------|
| **Primary Identifiers** | 15-20 | Names, titles, SKUs | `name: 20` |
| **Important Content** | 10-15 | Descriptions, summaries | `description: 12` |
| **Secondary Content** | 5-10 | Categories, tags | `category: 8` |
| **Searchable Text** | 3-5 | Body content, reviews | `content: 4` |
| **Auxiliary Data** | 1-3 | IDs, timestamps | `id: 1` |

### ⚡ Performance Tips

1. **Limit Results**: Use the `limit` option for large datasets
2. **Pre-filter**: Filter data before searching when possible
3. **Field Selection**: Specify the `fields` array to search only relevant fields
4. **Fuzzy Threshold**: Adjust `fuzzyThreshold` based on your use case (the default of 0.7 is usually a good starting point)
5. **Caching**: Cache searcher instances for repeated use with the same configuration

### 🎛️ Configuration Presets

```javascript
// Strict Matching (Exact matches preferred)
const strictMatching = { fuzzyThreshold: 0.9, minFuzzyLength: 5 };

// Loose Matching (Typo-tolerant)
const looseMatching = { fuzzyThreshold: 0.5, minFuzzyLength: 2 };

// Title-First Search (Prioritize titles/names)
const titleFirst = { fieldWeights: { title: 20, content: 1 } };

// Balanced Search (Similar importance across fields)
const balanced = { fieldWeights: { title: 8, description: 5, content: 3 } };
```

For complete examples with sample data and detailed explanations, see the [`/examples`](./examples/) directory.

## 💡 Usage Ideas & Real-World Applications

The Universal Search library can be applied to a wide variety of use cases.
Here are some popular applications and implementation patterns:

### 🏪 E-commerce & Marketplace

#### Product Search with Business Logic

```javascript
const productSearcher = createSearcher({
  fieldWeights: {
    name: 15,       // Product names most important
    brand: 12,      // Brand recognition
    category: 8,    // Category filtering
    tags: 10,       // Searchable attributes
    description: 3  // Supporting content
  },
  fuzzyThreshold: 0.7
});

// E-commerce with filtering
function ecommerceSearch(products, query, filters = {}) {
  let filteredData = products;

  if (filters.category) {
    filteredData = filteredData.filter(p => p.category === filters.category);
  }
  if (filters.maxPrice) {
    filteredData = filteredData.filter(p => p.price <= filters.maxPrice);
  }
  if (filters.inStock) {
    filteredData = filteredData.filter(p => p.inStock);
  }

  return productSearcher(filteredData, query);
}
```

**Use Cases:**

- Product catalogs with thousands of items
- Auto-complete search suggestions
- Category and price filtering
- Brand and feature-based discovery

### 👥 HR & People Management

#### Employee Directory Search

```javascript
const peopleSearcher = createSearcher({
  fieldWeights: {
    'profile.firstName': 10,
    'profile.lastName': 10,
    'employment.title': 15,
    'employment.department': 8,
    skills: 12,              // Converted from array to string
    bio: 5,
    'profile.email': 3
  },
  fuzzyThreshold: 0.6,       // Lower threshold = more tolerance for name typos
  limit: 20
});
```

**Use Cases:**

- Employee directories with skills matching
- Department and team organization
- Project collaboration discovery
- Expertise location within organizations

### 📄 Content Management Systems

#### Document & Article Search

```javascript
const documentSearcher = createSearcher({
  fieldWeights: {
    title: 20,               // Titles are crucial
    'metadata.summary': 12,  // Executive summaries
    'content.excerpt': 8,    // Article previews
    'metadata.tags': 10,     // Topic classification
    'author.name': 5,        // Author attribution
    'content.body': 3        // Full content (lower weight)
  }
});

// Content with quality scoring
function contentSearch(documents, query) {
  const results = documentSearcher(documents, query);

  // Enhance with content quality metrics.
  // calculateFreshness is an application-supplied helper, not part of the library.
  return results.map(result => ({
    ...result,
    qualityScore: result.score * (result.item.readTime > 5 ? 1.2 : 1.0),
    freshness: calculateFreshness(result.item.publishDate)
  }));
}
```

**Use Cases:**

- Blog and news site search
- Documentation and knowledge bases
- Academic paper repositories
- Content recommendation systems

### 🏢 Business Applications

#### Customer Relationship Management

```javascript
const customerSearcher = createSearcher({
  fieldWeights: {
    'company.name': 15,
    'contact.name': 12,
    'contact.email': 8,
    'details.industry': 10,
    'notes.content': 5
  }
});
```

#### Inventory Management

```javascript
const inventorySearcher = createSearcher({
  fieldWeights: {
    sku: 20,        // Exact SKU matching critical
    name: 15,
    category: 10,
    supplier: 8,
    location: 12
  },
  fuzzyThreshold: 0.8  // Stricter for inventory
});
```

### 🎯 Specialized Use Cases

#### Real-time Auto-complete

```javascript
function autoComplete(dataset, query, limit = 5) {
  if (query.length < 2) return [];

  return search(dataset, query, {
    fieldWeights: { name: 15, brand: 12, category: 5 },
    fuzzyThreshold: 0.8,  // Stricter for autocomplete
    limit
  });
}
```

#### Multi-language Content

```javascript
const multiLangSearcher = createSearcher({
  fieldWeights: {
    'title.en': 10,
    'title.es': 10,
    'title.fr': 10,
    'content.en': 5,
    'content.es': 5,
    'content.fr': 5
  }
});
```

#### Location-based Search

```javascript
const locationSearcher = createSearcher({
  fieldWeights: {
    name: 15,
    'address.city': 12,
    'address.state': 10,
    'address.country': 8,
    category: 6
  }
});
```

### ⚡ Performance Patterns

#### Large Dataset Optimization

```javascript
// Pre-filter before search for better performance
const optimizedSearch = (data, query, category) => {
  const filtered = category
    ? data.filter(item => item.category === category)
    : data;

  return search(filtered, query, { limit: 50 });
};
```

#### Caching for Repeated Searches

```javascript
const searchCache = new Map();

// Note: the cache key covers only the query and options, so this assumes
// the same, unchanged data array is passed on every call.
function cachedSearch(data, query, options) {
  const cacheKey = `${query}-${JSON.stringify(options)}`;

  if (searchCache.has(cacheKey)) {
    return searchCache.get(cacheKey);
  }

  const results = search(data, query, options);
  searchCache.set(cacheKey, results);
  return results;
}
```

### 🔧 Advanced Patterns

#### Search with Analytics

```javascript
function analyticsSearch(data, query, options = {}) {
  const startTime = Date.now();
  const results = search(data, query, options);
  const endTime = Date.now();

  const totalScore = results.reduce((sum, r) => sum + r.score, 0);

  const analytics = {
    query,
    resultCount: results.length,
    executionTime: endTime - startTime,
    // Guard against division by zero when nothing matched
    avgScore: results.length > 0 ? totalScore / results.length : 0,
    topCategories: [...new Set(results.slice(0, 10).map(r => r.item.category))],
    timestamp: new Date().toISOString()
  };

  // Log analytics for monitoring
  console.log('Search Analytics:', analytics);

  return { results, analytics };
}
```

#### Progressive Search Enhancement

```javascript
function progressiveSearch(data, query) {
  // Start with strict matching
  let results = search(data, query, { fuzzyThreshold: 0.9, limit: 10 });

  // If insufficient results, try fuzzy matching
  if (results.length < 5) {
    results = search(data, query, { fuzzyThreshold: 0.6, limit: 20 });
  }

  // If still insufficient, broaden field search
  if (results.length < 3) {
    results = search(data, query, {
      fuzzyThreshold: 0.4,
      limit: 30,
      fields: undefined  // Search all fields
    });
  }

  return results;
}
```

### 🎨 Industry-Specific Examples

- **Real Estate**: Property search with location, price, features
- **Healthcare**: Patient records, medication databases
- **Education**: Course catalogs, student directories
- **Legal**: Case law, document discovery
- **Media**: Asset management, content libraries
- **Finance**: Transaction search, customer portfolios
- **Tourism**: Hotel/restaurant discovery, activity search
- **Food & Beverage**: Recipe databases, ingredient matching

Each of these patterns can be customized with appropriate field weights, fuzzy thresholds, and filtering logic to match your specific domain requirements.

## TypeScript Support

The library is built with TypeScript and provides full type safety:

```typescript
interface User {
  id: number;
  name: string;
  email: string;
}

// Full type inference and safety
const users: User[] = [/* ... */];
const results: SearchResult<User>[] = search(users, 'query');
const items: User[] = searchItems(users, 'query');
```

## 🔬 Benchmarks & Performance

The Universal Search library comes with a comprehensive benchmark suite to measure performance across different scenarios and data sizes. Our benchmarks help ensure the library performs well in real-world applications.

### Quick Performance Overview

| Data Size | Average Time | Operations/sec | Use Case |
|-----------|--------------|----------------|----------|
| 100 items | < 1ms | > 1,000 | Small apps, auto-complete |
| 1K items | < 5ms | > 200 | Medium catalogs |
| 5K items | < 25ms | > 40 | Large datasets |
| 10K items | < 50ms | > 20 | Enterprise scale |

### Running Benchmarks

**Prerequisites:** Make sure you've built the library first:

```bash
npm run build
```

**Run all benchmarks:**

```bash
# Complete benchmark suite with detailed analysis
npm run benchmark

# Quick performance check (ideal for CI/CD)
npm run benchmark:quick

# Individual benchmark types
npm run benchmark:main         # Core performance tests
npm run benchmark:comparative  # Compare different search functions
npm run benchmark:memory       # Memory usage and leak detection
```

**Advanced benchmarking with garbage collection:**

```bash
npm run benchmark:gc
```

### Benchmark Types

#### 🚀 Performance Benchmarks

Tests search speed across realistic scenarios:

- **Small Dataset (100 items)**: User directory search
- **Medium Dataset (1,000 items)**: E-commerce product search
- **Large Dataset (5,000 items)**: Document/content search
- **XL Dataset (10,000 items)**: Enterprise-scale stress test

**Metrics measured:**

- Average response time (milliseconds)
- Operations per second (higher is better)
- Result accuracy and relevance
- Memory usage during operations

#### ⚖️ Comparative Benchmarks

Compares performance between different search functions:

```typescript
// Functions tested in comparison
search()                 // Full results with detailed scoring
searchItems()            // Items only, no score details
quickSearch()            // Simplified search with defaults
createSearcher()         // Pre-configured reusable searcher
createDocumentSearcher() // Document-optimized searcher
```

**This comparison helps you choose the right function for your use case based on performance vs. feature trade-offs.**

#### 🧠 Memory Benchmarks

Analyzes memory efficiency and scalability:

- **Memory Usage**: Tests consumption across data sizes (100 to 10,000 items)
- **Scalability Analysis**: How performance scales with data size
- **Memory Leak Detection**: Sustained search operations to detect leaks

**Scalability ratings:**

- **Excellent (> 0.8)**: Near-linear performance scaling
- **Good (0.6-0.8)**: Performance degrades slowly
- **Fair (0.4-0.6)**: Noticeable impact with scale
- **Poor (< 0.4)**: Significant degradation

#### 🔍 Full Suite Analysis

The complete benchmark suite provides:

- **System Information**: Environment and runtime details
- **Performance Overview**: Comprehensive timing analysis
- **Memory Efficiency**: Detailed memory usage patterns
- **Recommendations**: Performance optimization suggestions

### Real-World Performance Data

Our benchmarks use realistic data generators that simulate actual usage patterns:

#### User Directory Search

```typescript
// 1,000 users with names, emails, roles, skills
const results = search(users, 'john developer', {
  fieldWeights: { name: 10, role: 8, skills: 6 }
});
// Typical: ~2-4ms, 250-500 ops/sec
```

#### E-commerce Product Search

```typescript
// 5,000 products with names, brands, categories, descriptions
const results = search(products, 'wireless headphones', {
  fieldWeights: { name: 15, brand: 12, category: 8 }
});
// Typical: ~8-15ms, 60-125 ops/sec
```

#### Document/Content Search

```typescript
// 10,000 articles with titles, content, tags, metadata
const results = search(documents, 'typescript tutorial', {
  fieldWeights: { title: 20, tags: 10, content: 3 }
});
// Typical: ~20-40ms, 25-50 ops/sec
```

### Performance Optimization Tips

Based on benchmark results, here are proven optimization strategies:

#### 🎯 Field Selection

```typescript
// ✅ Good: Specify relevant fields only
search(data, query, {
  fields: ['name', 'title', 'category']
})

// ❌ Avoid: Searching all fields unnecessarily
search(data, query) // Auto-detects ALL string fields
```

#### ⚡ Smart Limiting

```typescript
// ✅ Good: Limit results for better performance
search(data, query, { limit: 20 })

// ❌ Avoid: Relying on the default limit for large datasets
search(largeData, query) // Returns up to 100 by default
```

#### 🔄 Reusable Searchers

```typescript
// ✅ Good: Create once, use many times
const searcher = createSearcher({ fieldWeights: { /* ... */ } });
const results1 = searcher(data, 'query1');
const results2 = searcher(data, 'query2'); // Reuses config

// ❌ Avoid: Recreating configuration each time
search(data, 'query1', options);
search(data, 'query2', options); // Recreates config
```

#### 🎚️ Threshold Tuning

```typescript
// ✅ Good: Tune fuzzy threshold for your use case
search(data, query, { fuzzyThreshold: 0.8 }); // Stricter, favors near-exact matches
search(data, query, { fuzzyThreshold: 0.6 }); // More forgiving for typos
```

### Memory Efficiency Guidelines

Our memory benchmarks show these patterns:

| Data Size | Memory Usage | Memory/Item | Efficiency |
|-----------|--------------|-------------|------------|
| 100 items | ~2-4 MB | ~20-40 KB | Excellent |
| 1K items | ~8-15 MB | ~8-15 KB | Good |
| 5K items | ~25-40 MB | ~5-8 KB | Good |
| 10K items | ~45-70 MB | ~4.5-7 KB | Fair |

**Memory leak detection results:** < 5% memory growth over 100 iterations ✅

### CI/CD Integration

Integrate benchmarks into your CI/CD pipeline:

```yaml
# Example GitHub Actions step
- name: Run Performance Benchmarks
  run: |
    npm run build
    npm run benchmark:quick > benchmark-results.txt
    # Add custom logic to compare with baseline
```

### Benchmark Data Generators

The benchmark suite includes realistic data generators:

- **`generateUsers(count)`**: User directory with profiles, skills, departments
- **`generateProducts(count)`**: E-commerce with brands, categories, pricing
- **`generateDocuments(count)`**: Articles with content, metadata, tags

For detailed benchmark documentation and advanced usage, see [`/benchmarks/README.md`](./benchmarks/README.md).

### Performance Monitoring

Track performance in your application:

```typescript
function monitoredSearch(data, query, options = {}) {
  const start = performance.now();
  const results = search(data, query, options);
  const duration = performance.now() - start;

  console.log(`Search completed in ${duration.toFixed(2)}ms`);
  console.log(`Found ${results.length} results`);

  return results;
}
```

---
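When tuning `fuzzyThreshold`, it can help to see how a 0-1 similarity might be derived from Levenshtein distance, which the Match Types & Scoring section names as the basis for fuzzy matches. The sketch below is an illustration only and is not taken from the library's source: `levenshtein`, `similarity`, and `passesThreshold` are hypothetical helpers, and normalizing the distance by the longer string's length is an assumption about how such a score could be computed.

```typescript
// Illustrative sketch, NOT the library's internal implementation.
// Assumption: similarity = 1 - distance / max(a.length, b.length).

/** Classic dynamic-programming Levenshtein distance, keeping two rows. */
function levenshtein(a: string, b: string): number {
  if (a === b) return 0;
  if (a.length === 0) return b.length;
  if (b.length === 0) return a.length;

  // prev[j] holds the distance between a[0..i-1) and b[0..j)
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);

  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

/** Similarity in [0, 1]; 1 means the strings are identical. */
function similarity(a: string, b: string): number {
  const maxLen = Math.max(a.length, b.length);
  return maxLen === 0 ? 1 : 1 - levenshtein(a, b) / maxLen;
}

/** Hypothetical check mirroring a case-insensitive fuzzy threshold. */
function passesThreshold(query: string, value: string, fuzzyThreshold = 0.7): boolean {
  return similarity(query.toLowerCase(), value.toLowerCase()) >= fuzzyThreshold;
}

console.log(similarity("johm", "john"));           // 0.75
console.log(passesThreshold("johm", "john", 0.7)); // true
console.log(passesThreshold("johm", "john", 0.8)); // false
```

Under this assumed formula, a single typo in a four-letter word already drops similarity to 0.75, which is why stricter thresholds such as 0.8 or 0.9 tend to reject typos in short queries while still accepting them in longer ones.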