@mastra/rag

The Retrieval-Augmented Generation (RAG) module contains document processing and embedding utilities.

# DatabaseConfig

The `DatabaseConfig` type allows you to specify database-specific configurations when using vector query tools. These configurations enable you to leverage unique features and optimizations offered by different vector stores.

## Type definition

```typescript
export type DatabaseConfig = {
  pinecone?: PineconeConfig
  pgvector?: PgVectorConfig
  chroma?: ChromaConfig
  [key: string]: any // Extensible for future databases
}
```

## Database-specific types

### `PineconeConfig`

Configuration options specific to Pinecone vector store.

**namespace** (`string`): Pinecone namespace for organizing and isolating vectors within the same index. Useful for multi-tenancy or environment separation.

**sparseVector** (`{ indices: number[]; values: number[]; }`): Sparse vector for hybrid search combining dense and sparse embeddings. Enables better search quality for keyword-based queries. The indices and values arrays must be the same length.

**sparseVector.indices** (`number[]`): Array of indices for sparse vector components

**sparseVector.values** (`number[]`): Array of values corresponding to the indices

**Use Cases:**

- Multi-tenant applications (separate namespaces per tenant)
- Environment isolation (dev/staging/prod namespaces)
- Hybrid search combining semantic and keyword matching

### `PgVectorConfig`

Configuration options specific to PostgreSQL with pgvector extension.

**minScore** (`number`): Minimum similarity score threshold for results. Only vectors with similarity scores above this value will be returned.

**ef** (`number`): HNSW search parameter that controls the size of the dynamic candidate list during search. Higher values improve accuracy at the cost of speed. Typically set between topK and 200.

**probes** (`number`): IVFFlat probe parameter that specifies the number of index cells to visit during search. Higher values improve recall at the cost of speed.
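To make the `minScore` semantics concrete, the sketch below applies the same kind of threshold to an in-memory result list. `ScoredResult` and `filterByMinScore` are hypothetical names used only for illustration and are not part of `@mastra/rag`. The actual filtering happens inside the vector store, and the strict comparison here is an assumption based on the wording "above this value".

```typescript
// Hypothetical type and helper for illustration; not part of @mastra/rag.
type ScoredResult = { id: string; score: number }

// Keep only results whose similarity score is above the threshold.
// Assumption: the comparison is strict (">"), following "above this value".
function filterByMinScore(results: ScoredResult[], minScore: number): ScoredResult[] {
  return results.filter(result => result.score > minScore)
}

// Made-up scores, chosen only to show the effect of the threshold.
const results: ScoredResult[] = [
  { id: 'a', score: 0.92 },
  { id: 'b', score: 0.81 },
  { id: 'c', score: 0.43 },
]

const kept = filterByMinScore(results, 0.8)
console.log(kept.map(result => result.id)) // logs the ids 'a' and 'b'
```

Raising the threshold trades coverage for precision: with `minScore` at 0.95, none of the sample results above would be kept.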
**Performance Guidelines:**

- **ef**: Start with 2-4x your topK value, increase for better accuracy
- **probes**: Start with 1-10, increase for better recall
- **minScore**: Use values between 0.5-0.9 depending on your quality requirements

**Use Cases:**

- Performance optimization for high-load scenarios
- Quality filtering to remove irrelevant results
- Fine-tuning search accuracy vs speed tradeoffs

### `ChromaConfig`

Configuration options specific to Chroma vector store.

**where** (`Record<string, any>`): Metadata filtering conditions using MongoDB-style query syntax. Filters results based on metadata fields.

**whereDocument** (`Record<string, any>`): Document content filtering conditions. Allows filtering based on the actual document text content.

**Filter Syntax Examples:**

```typescript
// Simple equality
where: { "category": "technical" }

// Operators
where: { "price": { "$gt": 100 } }

// Multiple conditions
where: {
  "category": "electronics",
  "inStock": true
}

// Document content filtering
whereDocument: { "$contains": "API documentation" }
```

**Use Cases:**

- Advanced metadata filtering
- Content-based document filtering
- Complex query combinations

## Usage examples

### Basic Database Configuration

```typescript
import { createVectorQueryTool } from '@mastra/rag'

const vectorTool = createVectorQueryTool({
  vectorStoreName: 'pinecone',
  indexName: 'documents',
  model: embedModel,
  databaseConfig: {
    pinecone: {
      namespace: 'production',
    },
  },
})
```

### Runtime Configuration Override

```typescript
import { RequestContext } from '@mastra/core/request-context'

// Initial configuration
const vectorTool = createVectorQueryTool({
  vectorStoreName: 'pinecone',
  indexName: 'documents',
  model: embedModel,
  databaseConfig: {
    pinecone: {
      namespace: 'development',
    },
  },
})

// Override at runtime
const requestContext = new RequestContext()
requestContext.set('databaseConfig', {
  pinecone: {
    namespace: 'production',
  },
})

await vectorTool.execute({ queryText: 'search query' }, { mastra, requestContext })
```

### Multi-Database Configuration

```typescript
const vectorTool = createVectorQueryTool({
  vectorStoreName: 'dynamic', // Will be determined at runtime
  indexName: 'documents',
  model: embedModel,
  databaseConfig: {
    pinecone: {
      namespace: 'default',
    },
    pgvector: {
      minScore: 0.8,
      ef: 150,
    },
    chroma: {
      where: { type: 'documentation' },
    },
  },
})
```

> **Note:** **Multi-Database Support**: When you configure multiple databases, only the configuration matching the actual vector store being used will be applied.

### Performance Tuning

```typescript
// High accuracy configuration
const highAccuracyTool = createVectorQueryTool({
  vectorStoreName: 'postgres',
  indexName: 'embeddings',
  model: embedModel,
  databaseConfig: {
    pgvector: {
      ef: 400, // High accuracy
      probes: 20, // High recall
      minScore: 0.85, // High quality threshold
    },
  },
})

// High speed configuration
const highSpeedTool = createVectorQueryTool({
  vectorStoreName: 'postgres',
  indexName: 'embeddings',
  model: embedModel,
  databaseConfig: {
    pgvector: {
      ef: 50, // Lower accuracy, faster
      probes: 3, // Lower recall, faster
      minScore: 0.6, // Lower quality threshold
    },
  },
})
```

## Extensibility

The `DatabaseConfig` type is designed to be extensible. To add support for a new vector database:

```typescript
// 1. Define the configuration interface
export interface NewDatabaseConfig {
  customParam1?: string
  customParam2?: number
}

// 2. Extend DatabaseConfig type
export type DatabaseConfig = {
  pinecone?: PineconeConfig
  pgvector?: PgVectorConfig
  chroma?: ChromaConfig
  newdatabase?: NewDatabaseConfig
  [key: string]: any
}

// 3. Use in vector query tool
const vectorTool = createVectorQueryTool({
  vectorStoreName: 'newdatabase',
  indexName: 'documents',
  model: embedModel,
  databaseConfig: {
    newdatabase: {
      customParam1: 'value',
      customParam2: 42,
    },
  },
})
```

## Best practices

1. **Environment Configuration**: Use different namespaces or configurations for different environments
2. **Performance Tuning**: Start with default values and adjust based on your specific needs
3. **Quality Filtering**: Use minScore to filter out low-quality results
4. **Runtime Flexibility**: Override configurations at runtime for dynamic scenarios
5. **Documentation**: Document your specific configuration choices for team members

## Migration guide

Existing vector query tools continue to work without changes. To add database configurations:

```diff
const vectorTool = createVectorQueryTool({
  vectorStoreName: 'pinecone',
  indexName: 'documents',
  model: embedModel,
+ databaseConfig: {
+   pinecone: {
+     namespace: 'production'
+   }
+ }
});
```

## Related

- [createVectorQueryTool()](https://mastra.ai/reference/tools/vector-query-tool)
- [Hybrid Vector Search](https://mastra.ai/docs/rag/retrieval)
- [Metadata Filters](https://mastra.ai/reference/rag/metadata-filters)