@mastra/rag
The Retrieval-Augmented Generation (RAG) module contains document processing and embedding utilities.
# DatabaseConfig
The `DatabaseConfig` type allows you to specify database-specific configurations when using vector query tools. These configurations enable you to leverage unique features and optimizations offered by different vector stores.
## Type definition
```typescript
export type DatabaseConfig = {
  pinecone?: PineconeConfig
  pgvector?: PgVectorConfig
  chroma?: ChromaConfig
  [key: string]: any // Extensible for future databases
}
```
## Database-specific types
### `PineconeConfig`
Configuration options specific to Pinecone vector store.
**namespace** (`string`): Pinecone namespace for organizing and isolating vectors within the same index. Useful for multi-tenancy or environment separation.
**sparseVector** (`{ indices: number[]; values: number[]; }`): Sparse vector for hybrid search combining dense and sparse embeddings. Enables better search quality for keyword-based queries. The indices and values arrays must be the same length.
**sparseVector.indices** (`number[]`): Array of indices for the non-zero sparse vector components.
**sparseVector.values** (`number[]`): Array of values, one per entry in `indices` and in the same order.
**Use Cases:**
- Multi-tenant applications (separate namespaces per tenant)
- Environment isolation (dev/staging/prod namespaces)
- Hybrid search combining semantic and keyword matching
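The parallel-array shape of `sparseVector` is easy to get wrong when the weights come from a keyword scorer. The following is a minimal sketch, not part of `@mastra/rag`: `toSparseVector` and `assertValidSparseVector` are hypothetical helpers, and the token indices and weights are made-up illustration values.

```typescript
// Shape taken from the PineconeConfig description above.
type SparseVector = { indices: number[]; values: number[] }

// Hypothetical helper (not a Mastra or Pinecone API): converts a map of
// token index -> weight into the parallel-array form.
function toSparseVector(weights: Map<number, number>): SparseVector {
  // Drop zero weights and sort by index so the output is deterministic.
  const entries = Array.from(weights.entries())
    .filter(([, value]) => value !== 0)
    .sort(([a], [b]) => a - b)
  return {
    indices: entries.map(([index]) => index),
    values: entries.map(([, value]) => value),
  }
}

// Hypothetical guard for the "same length" requirement.
function assertValidSparseVector(v: SparseVector): void {
  if (v.indices.length !== v.values.length) {
    throw new Error(
      `sparseVector length mismatch: ${v.indices.length} indices, ${v.values.length} values`,
    )
  }
}

// Illustration values only.
const sparseVector = toSparseVector(new Map([[42, 0.8], [7, 0.3], [99, 0]]))
assertValidSparseVector(sparseVector)

// Object in the shape of PineconeConfig, ready to place under `databaseConfig.pinecone`.
const pineconeConfig = { namespace: 'tenant-a', sparseVector }
```

Validating before the query is sent keeps a length mismatch from surfacing later as a harder-to-read error from the vector store.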
### `PgVectorConfig`
Configuration options specific to PostgreSQL with pgvector extension.
**minScore** (`number`): Minimum similarity score threshold for results. Only vectors with similarity scores above this value will be returned.
**ef** (`number`): HNSW search parameter that controls the size of the dynamic candidate list during search. Higher values improve accuracy at the cost of speed. Typically set between topK and 200.
**probes** (`number`): IVFFlat probe parameter that specifies the number of index cells to visit during search. Higher values improve recall at the cost of speed.
**Performance Guidelines:**
- **ef**: Start with 2-4x your topK value, increase for better accuracy
- **probes**: Start with 1-10, increase for better recall
- **minScore**: Use values between 0.5 and 0.9, depending on your quality requirements
**Use Cases:**
- Performance optimization for high-load scenarios
- Quality filtering to remove irrelevant results
- Fine-tuning search accuracy vs speed tradeoffs
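The starting points in the performance guidelines can be captured in a small helper. This is a sketch under stated assumptions: `startingPgVectorConfig` is a hypothetical function, the local `PgVectorConfig` type only mirrors the three fields described above, and the chosen defaults (2x `topK`, one probe, `minScore` of 0.5) are simply the low end of each suggested range.

```typescript
// Local mirror of the fields documented above (assumption: all optional).
type PgVectorConfig = { minScore?: number; ef?: number; probes?: number }

// Hypothetical helper, not part of @mastra/rag.
// `ef` applies to HNSW indexes and `probes` to IVFFlat indexes,
// so only the parameter relevant to the index type is set.
function startingPgVectorConfig(
  indexType: 'hnsw' | 'ivfflat',
  topK: number,
): PgVectorConfig {
  if (!Number.isInteger(topK) || topK < 1) {
    throw new RangeError('topK must be a positive integer')
  }
  const base: PgVectorConfig = { minScore: 0.5 } // low end of the 0.5 to 0.9 range
  return indexType === 'hnsw'
    ? { ...base, ef: topK * 2 } // low end of the 2-4x topK guideline
    : { ...base, probes: 1 } // low end of the 1-10 guideline
}

const hnswConfig = startingPgVectorConfig('hnsw', 10)
const ivfflatConfig = startingPgVectorConfig('ivfflat', 10)
```

From these baselines, raise `ef` or `probes` while measuring recall and latency on your own data, rather than jumping directly to high values.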
### `ChromaConfig`
Configuration options specific to Chroma vector store.
**where** (`Record<string, any>`): Metadata filtering conditions using MongoDB-style query syntax. Filters results based on metadata fields.
**whereDocument** (`Record<string, any>`): Document content filtering conditions. Allows filtering based on the actual document text content.
**Filter Syntax Examples:**
```typescript
// Simple equality
where: { "category": "technical" }
// Operators
where: { "price": { "$gt": 100 } }
// Multiple conditions
where: {
  "category": "electronics",
  "inStock": true
}
// Document content filtering
whereDocument: { "$contains": "API documentation" }
```
**Use Cases:**
- Advanced metadata filtering
- Content-based document filtering
- Complex query combinations
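To make the filter semantics concrete, here is a small in-memory matcher that evaluates a `where` object against one metadata record. It is an illustration only, not Chroma's implementation: `matchesWhere` is a hypothetical function, it covers only equality and the numeric comparison operators, and it assumes that multiple fields are combined with AND.

```typescript
type Scalar = string | number | boolean
type NumericOps = { $gt?: number; $gte?: number; $lt?: number; $lte?: number }
type Condition = Scalar | NumericOps
type Where = Record<string, Condition>

// Hypothetical matcher: returns true when every field condition holds.
function matchesWhere(metadata: Record<string, Scalar>, where: Where): boolean {
  return Object.entries(where).every(([field, condition]) => {
    const actual = metadata[field]
    // A plain value means simple equality.
    if (typeof condition !== 'object') return actual === condition
    // Comparison operators only make sense for numeric metadata.
    if (typeof actual !== 'number') return false
    const { $gt, $gte, $lt, $lte } = condition
    return (
      ($gt === undefined || actual > $gt) &&
      ($gte === undefined || actual >= $gte) &&
      ($lt === undefined || actual < $lt) &&
      ($lte === undefined || actual <= $lte)
    )
  })
}

// Illustration record.
const item = { category: 'electronics', inStock: true, price: 150 }

const isElectronicsInStock = matchesWhere(item, { category: 'electronics', inStock: true })
const isOver100 = matchesWhere(item, { price: { $gt: 100 } })
const isOver200 = matchesWhere(item, { price: { $gt: 200 } })
```

A matcher like this can be handy in unit tests, where you want to check which fixture records a filter should select before running it against a real store.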
## Usage examples
### Basic Database Configuration
```typescript
import { createVectorQueryTool } from '@mastra/rag'

// `embedModel` is your embedding model instance, defined elsewhere.
const vectorTool = createVectorQueryTool({
  vectorStoreName: 'pinecone',
  indexName: 'documents',
  model: embedModel,
  databaseConfig: {
    pinecone: {
      namespace: 'production',
    },
  },
})
```
### Runtime Configuration Override
```typescript
import { createVectorQueryTool } from '@mastra/rag'
import { RequestContext } from '@mastra/core/request-context'

// Initial configuration (`embedModel` is defined elsewhere)
const vectorTool = createVectorQueryTool({
  vectorStoreName: 'pinecone',
  indexName: 'documents',
  model: embedModel,
  databaseConfig: {
    pinecone: {
      namespace: 'development',
    },
  },
})

// Override at runtime
const requestContext = new RequestContext()
requestContext.set('databaseConfig', {
  pinecone: {
    namespace: 'production',
  },
})

// `mastra` is your Mastra instance, defined elsewhere
await vectorTool.execute({ queryText: 'search query' }, { mastra, requestContext })
```
### Multi-Database Configuration
```typescript
const vectorTool = createVectorQueryTool({
  vectorStoreName: 'dynamic', // Will be determined at runtime
  indexName: 'documents',
  model: embedModel,
  databaseConfig: {
    pinecone: {
      namespace: 'default',
    },
    pgvector: {
      minScore: 0.8,
      ef: 150,
    },
    chroma: {
      where: { type: 'documentation' },
    },
  },
})
```
> **Note:** When you configure multiple databases, only the configuration that matches the vector store actually in use is applied.
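The selection behavior described in the note can be pictured as a keyed lookup. The sketch below is not Mastra's internal logic: `configFor` is a hypothetical function, the local `DatabaseConfig` type is a simplified mirror of the documented one, and the store key is passed in explicitly because how the tool detects the store type is not covered here.

```typescript
// Simplified local mirror of the documented type (assumption).
type DatabaseConfig = {
  pinecone?: { namespace?: string }
  pgvector?: { minScore?: number; ef?: number; probes?: number }
  chroma?: { where?: Record<string, any>; whereDocument?: Record<string, any> }
  [key: string]: any
}

// Hypothetical lookup: returns only the entry for the store in use,
// or undefined when nothing is configured for it.
function configFor(databaseConfig: DatabaseConfig, storeKey: string): unknown {
  return Object.prototype.hasOwnProperty.call(databaseConfig, storeKey)
    ? databaseConfig[storeKey]
    : undefined
}

const multiConfig: DatabaseConfig = {
  pinecone: { namespace: 'default' },
  pgvector: { minScore: 0.8, ef: 150 },
  chroma: { where: { type: 'documentation' } },
}

// Only the pgvector entry is relevant when the store is pgvector.
const activeConfig = configFor(multiConfig, 'pgvector')
// No entry exists for an unconfigured store.
const missingConfig = configFor(multiConfig, 'qdrant')
```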
### Performance Tuning
```typescript
// High accuracy configuration
const highAccuracyTool = createVectorQueryTool({
  vectorStoreName: 'postgres',
  indexName: 'embeddings',
  model: embedModel,
  databaseConfig: {
    pgvector: {
      ef: 400, // High accuracy
      probes: 20, // High recall
      minScore: 0.85, // High quality threshold
    },
  },
})

// High speed configuration
const highSpeedTool = createVectorQueryTool({
  vectorStoreName: 'postgres',
  indexName: 'embeddings',
  model: embedModel,
  databaseConfig: {
    pgvector: {
      ef: 50, // Lower accuracy, faster
      probes: 3, // Lower recall, faster
      minScore: 0.6, // Lower quality threshold
    },
  },
})
```
## Extensibility
The `DatabaseConfig` type is designed to be extensible. To add support for a new vector database:
```typescript
// 1. Define the configuration interface
export interface NewDatabaseConfig {
  customParam1?: string
  customParam2?: number
}

// 2. Extend DatabaseConfig type
export type DatabaseConfig = {
  pinecone?: PineconeConfig
  pgvector?: PgVectorConfig
  chroma?: ChromaConfig
  newdatabase?: NewDatabaseConfig
  [key: string]: any
}

// 3. Use in vector query tool
const vectorTool = createVectorQueryTool({
  vectorStoreName: 'newdatabase',
  indexName: 'documents',
  model: embedModel,
  databaseConfig: {
    newdatabase: {
      customParam1: 'value',
      customParam2: 42,
    },
  },
})
```
## Best practices
1. **Environment Configuration**: Use different namespaces or configurations for different environments
2. **Performance Tuning**: Start with default values and adjust based on your specific needs
3. **Quality Filtering**: Use `minScore` to filter out low-quality results
4. **Runtime Flexibility**: Override configurations at runtime for dynamic scenarios
5. **Documentation**: Document your specific configuration choices for team members
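As one way to apply the first practice, the environment-to-namespace mapping can live in a single typed table so that every tool derives its configuration from the same place. This is a sketch only: `databaseConfigFor`, the `Environment` type, and the namespace names are hypothetical choices, not Mastra conventions.

```typescript
type Environment = 'development' | 'staging' | 'production'

// Hypothetical namespace names; pick whatever matches your index layout.
const namespaces: Record<Environment, string> = {
  development: 'dev',
  staging: 'staging',
  production: 'prod',
}

// Hypothetical helper: builds the Pinecone portion of a databaseConfig
// for the given environment.
function databaseConfigFor(env: Environment) {
  return {
    pinecone: {
      namespace: namespaces[env],
    },
  }
}

const stagingConfig = databaseConfigFor('staging')
```

The returned object can be passed as `databaseConfig` when creating the tool, or set on the request context for a runtime override.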
## Migration guide
Existing vector query tools continue to work without changes. To add database configurations:
```diff
 const vectorTool = createVectorQueryTool({
   vectorStoreName: 'pinecone',
   indexName: 'documents',
   model: embedModel,
+  databaseConfig: {
+    pinecone: {
+      namespace: 'production'
+    }
+  }
 });
```
## Related
- [createVectorQueryTool()](https://mastra.ai/reference/tools/vector-query-tool)
- [Hybrid Vector Search](https://mastra.ai/docs/rag/retrieval)
- [Metadata Filters](https://mastra.ai/reference/rag/metadata-filters)