@mastra/rag
The Retrieval-Augmented Generation (RAG) module contains document processing and embedding utilities.
`createGraphRAGTool()` creates a tool that enhances RAG by building a graph of semantic relationships between documents. It uses the `GraphRAG` system under the hood to provide graph-based retrieval, finding relevant content through both direct similarity to the query and connections between related chunks.
```typescript
import { createGraphRAGTool } from '@mastra/rag'
import { ModelRouterEmbeddingModel } from '@mastra/core/llm'

const graphTool = createGraphRAGTool({
  vectorStoreName: 'pinecone',
  indexName: 'docs',
  model: new ModelRouterEmbeddingModel('openai/text-embedding-3-small'),
  graphOptions: {
    dimension: 1536,
    threshold: 0.7,
    randomWalkSteps: 100,
    restartProb: 0.15,
  },
})
```
> **Note:** **Parameter requirements:** Most fields can be set at creation as defaults, and some can be overridden at runtime via the request context or the tool input. If a required field is set neither at creation nor at runtime, an error is thrown. `model`, `id`, and `description` can only be set at creation time.
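
The precedence described in the note can be pictured with a small sketch. This is an illustration only, not Mastra's actual implementation: `resolveSetting` and the `Settings` type are hypothetical names, and a plain `Map` stands in for the request context.

```typescript
// Hypothetical illustration of "runtime value overrides creation-time default".
type Settings = { vectorStoreName?: string; indexName?: string }

function resolveSetting(
  key: keyof Settings,
  creationDefaults: Settings,
  runtime: Map<string, unknown>, // stand-in for the request context
): string {
  // 1. A non-empty runtime value wins.
  const runtimeValue = runtime.get(key)
  if (typeof runtimeValue === 'string' && runtimeValue.length > 0) return runtimeValue

  // 2. Otherwise fall back to the value given at creation.
  const fallback = creationDefaults[key]
  if (fallback !== undefined) return fallback

  // 3. Missing in both places: fail loudly.
  throw new Error(`Missing required setting: ${key}`)
}

const creationDefaults: Settings = { indexName: 'docs' }
const runtimeValues = new Map<string, unknown>([['vectorStoreName', 'my-store']])

// Present only at runtime, so the runtime value is used.
const resolvedStore = resolveSetting('vectorStoreName', creationDefaults, runtimeValues)
// Present only at creation, so the default is used.
const resolvedIndex = resolveSetting('indexName', creationDefaults, runtimeValues)
```

Calling `resolveSetting('indexName', {}, new Map())` in this sketch throws, mirroring the error case described in the note.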
**id** (`string`): Custom ID for the tool. By default: 'GraphRAG {vectorStoreName} {indexName} Tool'. (Set at creation only.)
**description** (`string`): Custom description for the tool. By default: 'Access and analyze relationships between information in the knowledge base to answer complex questions about connections and patterns.' (Set at creation only.)
**vectorStoreName** (`string`): Name of the vector store to query. (Can be set at creation or overridden at runtime.)
**indexName** (`string`): Name of the index within the vector store. (Can be set at creation or overridden at runtime.)
**model** (`EmbeddingModel`): Embedding model to use for vector search. (Set at creation only.)
**enableFilter** (`boolean`): Enable filtering of results based on metadata. (Set at creation only, but will be automatically enabled if a filter is provided in the request context.) (Default: `false`)
**includeSources** (`boolean`): Include the full retrieval objects in the results. (Can be set at creation or overridden at runtime.) (Default: `true`)
**graphOptions** (`GraphOptions`): Configuration for the graph-based retrieval (Default: `Default graph options`)
**graphOptions.dimension** (`number`): Dimension of the embedding vectors
**graphOptions.threshold** (`number`): Similarity threshold for creating edges between nodes (0-1)
**graphOptions.randomWalkSteps** (`number`): Number of steps in random walk for graph traversal. (Can be set at creation or overridden at runtime.)
**graphOptions.restartProb** (`number`): Probability of restarting random walk from query node. (Can be set at creation or overridden at runtime.)
**providerOptions** (`Record<string, Record<string, any>>`): Provider-specific options for the embedding model (e.g., `outputDimensionality`). **Important**: Only works with AI SDK EmbeddingModelV2 models. For V1 models, configure options when creating the model itself.
**vectorStore** (`MastraVector | VectorStoreResolver`): Direct vector store instance or a resolver function for dynamic selection. Use a function for multi-tenant applications where the vector store is selected based on request context. When provided, `vectorStoreName` becomes optional.
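
To make `threshold`, `randomWalkSteps`, and `restartProb` concrete, here is a minimal sketch of threshold-based edge creation followed by a random walk with restart. It is an illustration under stated assumptions, not the `GraphRAG` implementation: the helper names (`cosine`, `buildEdges`, `randomWalk`) are hypothetical, the vectors are toy 2-dimensional embeddings, and raw visit counts stand in for whatever ranking the library actually computes.

```typescript
type GraphNode = { id: string; embedding: number[] }

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB)
}

// Connect every pair of nodes whose similarity exceeds `threshold`.
// (Whether the real comparison is strict is an assumption of this sketch.)
function buildEdges(nodes: GraphNode[], threshold: number): Map<string, string[]> {
  const edges = new Map<string, string[]>()
  for (const node of nodes) edges.set(node.id, [])
  for (let i = 0; i < nodes.length; i++) {
    for (let j = i + 1; j < nodes.length; j++) {
      if (cosine(nodes[i].embedding, nodes[j].embedding) > threshold) {
        edges.get(nodes[i].id)!.push(nodes[j].id)
        edges.get(nodes[j].id)!.push(nodes[i].id)
      }
    }
  }
  return edges
}

// Random walk with restart: record one visit per step, then either jump
// back to `start` (with probability `restartProb`, or at a dead end) or
// move to a uniformly chosen neighbor.
function randomWalk(
  edges: Map<string, string[]>,
  start: string,
  steps: number,
  restartProb: number,
  rng: () => number = Math.random,
): Map<string, number> {
  const visits = new Map<string, number>()
  let current = start
  for (let step = 0; step < steps; step++) {
    visits.set(current, (visits.get(current) ?? 0) + 1)
    const neighbors = edges.get(current) ?? []
    if (neighbors.length === 0 || rng() < restartProb) {
      current = start
    } else {
      current = neighbors[Math.floor(rng() * neighbors.length)]
    }
  }
  return visits
}

const toyNodes: GraphNode[] = [
  { id: 'a', embedding: [1, 0] },
  { id: 'b', embedding: [0.9, 0.1] }, // close to 'a', so an edge is created
  { id: 'c', embedding: [0, 1] }, // dissimilar to both, so it stays isolated
]
const toyEdges = buildEdges(toyNodes, 0.7)
const toyVisits = randomWalk(toyEdges, 'a', 100, 0.15)
```

In this toy graph, a higher `threshold` removes edges and narrows what the walk can reach, more steps give a steadier visit distribution, and a higher `restartProb` keeps the walk closer to the starting node.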
## Returns
The tool returns an object with:
**relevantContext** (`string`): Combined text from the most relevant document chunks, retrieved using graph-based ranking
**sources** (`QueryResult[]`): Array of full retrieval result objects. Each object contains all information needed to reference the original document, chunk, and similarity score.
### `QueryResult` object structure
```typescript
{
  id: string;       // Unique chunk/document identifier
  metadata: any;    // All metadata fields (document ID, etc.)
  vector: number[]; // Embedding vector (if available)
  score: number;    // Similarity score for this retrieval
  document: string; // Full chunk/document text (if available)
}
```
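
As a usage illustration, the sketch below turns a `sources` array into short citation lines. It assumes only the fields listed above; `QueryResultLike` is a local mirror of that structure rather than a type imported from Mastra, and `formatSources` is a hypothetical helper.

```typescript
// Local mirror of the structure shown above; not imported from Mastra.
type QueryResultLike = {
  id: string
  metadata: any
  vector: number[]
  score: number
  document: string
}

// Sort by descending score and render one citation line per source.
function formatSources(sources: QueryResultLike[], maxChars = 80): string[] {
  return [...sources]
    .sort((a, b) => b.score - a.score)
    .map((source, index) => {
      // `document` is documented as "if available", so guard against a missing value.
      const text = source.document ?? ''
      const snippet = text.length > maxChars ? `${text.slice(0, maxChars)}...` : text
      return `[${index + 1}] ${source.id} (score ${source.score.toFixed(2)}): ${snippet}`
    })
}

const citationLines = formatSources([
  { id: 'chunk-2', metadata: {}, vector: [], score: 0.71, document: 'Graph traversal basics' },
  { id: 'chunk-1', metadata: {}, vector: [], score: 0.93, document: 'Random walks with restart' },
])
// citationLines[0] is '[1] chunk-1 (score 0.93): Random walks with restart'
```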
The default description focuses on:
- Analyzing relationships between documents
- Finding patterns and connections
- Answering complex queries

The following example sets a higher similarity threshold, more random walk steps, and a higher restart probability:
```typescript
const graphTool = createGraphRAGTool({
  vectorStoreName: 'pinecone',
  indexName: 'docs',
  model: new ModelRouterEmbeddingModel('openai/text-embedding-3-small'),
  graphOptions: {
    dimension: 1536,
    threshold: 0.8, // Higher similarity threshold
    randomWalkSteps: 200, // More exploration steps
    restartProb: 0.2, // Higher restart probability
  },
})
```
```typescript
const graphTool = createGraphRAGTool({
  vectorStoreName: 'pinecone',
  indexName: 'docs',
  model: new ModelRouterEmbeddingModel('openai/text-embedding-3-small'),
  description:
    "Analyze document relationships to find complex patterns and connections in our company's historical data",
})
```
This example shows how to customize the tool description for a specific use case while maintaining its core purpose of relationship analysis.
```typescript
const graphTool = createGraphRAGTool({
  vectorStoreName: 'pinecone',
  indexName: 'docs',
  model: new ModelRouterEmbeddingModel('openai/text-embedding-3-small'),
})
```
When using request context, supply or override the runtime parameters at execution time:
```typescript
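// Import path assumed; adjust it to your @mastra/core version.
import { RequestContext } from '@mastra/core/request-context'

// Declare every key that is set below so the calls type-check.
const requestContext = new RequestContext<{
  vectorStoreName: string
  indexName: string
  topK: number
  filter: any
  randomWalkSteps: number
  restartProb: number
}>()

requestContext.set('vectorStoreName', 'my-store')
requestContext.set('indexName', 'my-index')
requestContext.set('topK', 5)
requestContext.set('filter', { category: 'docs' })
requestContext.set('randomWalkSteps', 100)
requestContext.set('restartProb', 0.15)

// `agent` is assumed to be an existing agent that has `graphTool` registered.
const response = await agent.generate('Find documentation from the knowledge base.', {
  requestContext,
})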
```
For more information on request context, see:
- [Request Context](https://mastra.ai/docs/server/request-context)
For multi-tenant applications where each tenant has isolated data, you can pass a resolver function instead of a static vector store:
```typescript
import { createGraphRAGTool, type VectorStoreResolver } from '@mastra/rag'
import { ModelRouterEmbeddingModel } from '@mastra/core/llm'
import { PgVector } from '@mastra/pg'

// Resolve a tenant-specific store from the request context on each call.
const vectorStoreResolver: VectorStoreResolver = async ({ requestContext }) => {
  const tenantId = requestContext?.get('tenantId')

  return new PgVector({
    id: `pg-vector-${tenantId}`,
    connectionString: process.env.POSTGRES_CONNECTION_STRING!,
    schemaName: `tenant_${tenantId}`,
  })
}

const graphTool = createGraphRAGTool({
  indexName: 'embeddings',
  model: new ModelRouterEmbeddingModel('openai/text-embedding-3-small'),
  vectorStore: vectorStoreResolver, // vectorStoreName is optional when vectorStore is provided
})
```
See [createVectorQueryTool - Dynamic Vector Store](https://mastra.ai/reference/tools/vector-query-tool) for more details.
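
The resolver above constructs a new store on every call and interpolates the tenant ID into a schema name. One possible refinement, sketched here as an assumption rather than a Mastra recommendation, is to validate the ID and reuse one store per tenant. `createTenantStoreCache` is a hypothetical helper, and a plain object stands in for the vector store so the sketch has no dependencies.

```typescript
// Hypothetical helper: create one store per tenant and reuse it afterwards.
function createTenantStoreCache<T>(factory: (tenantId: string) => T) {
  const cache = new Map<string, T>()

  return (tenantId: unknown): T => {
    // The ID ends up inside a schema name, so accept only simple identifiers.
    if (typeof tenantId !== 'string' || !/^[A-Za-z0-9_]+$/.test(tenantId)) {
      throw new Error('Invalid or missing tenantId')
    }

    const cached = cache.get(tenantId)
    if (cached !== undefined) return cached

    const store = factory(tenantId)
    cache.set(tenantId, store)
    return store
  }
}

// Stand-in factory; a real one might return `new PgVector({ ... })`.
let factoryCalls = 0
const getStore = createTenantStoreCache((tenantId) => {
  factoryCalls++
  return { schemaName: `tenant_${tenantId}` }
})

const firstStore = getStore('acme')
const secondStore = getStore('acme') // served from the cache, factory not called again
```

Inside a resolver, the body could then reduce to something like `return getStore(requestContext?.get('tenantId'))`. Whether caching is appropriate depends on how the underlying store manages its connections.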
## Related
- [createVectorQueryTool](https://mastra.ai/reference/tools/vector-query-tool)
- [GraphRAG](https://mastra.ai/reference/rag/graph-rag)