@mastra/rag

The Retrieval-Augmented Generation (RAG) module contains document processing and embedding utilities.

# createGraphRAGTool()

`createGraphRAGTool()` creates a tool that enhances RAG by building a graph of semantic relationships between documents. It uses the `GraphRAG` system under the hood to provide graph-based retrieval, finding relevant content through both direct similarity and connected relationships.

## Usage example

```typescript
import { createGraphRAGTool } from '@mastra/rag'
import { ModelRouterEmbeddingModel } from '@mastra/core/llm'

const graphTool = createGraphRAGTool({
  vectorStoreName: 'pinecone',
  indexName: 'docs',
  model: new ModelRouterEmbeddingModel('openai/text-embedding-3-small'),
  graphOptions: {
    dimension: 1536,
    threshold: 0.7,
    randomWalkSteps: 100,
    restartProb: 0.15,
  },
})
```

## Parameters

> **Note:** **Parameter Requirements:** Most fields can be set at creation as defaults. Some fields can be overridden at runtime via the request context or input. If a required field is missing from both creation and runtime, an error will be thrown. Note that `model`, `id`, and `description` can only be set at creation time.

**id** (`string`): Custom ID for the tool. By default: 'GraphRAG {vectorStoreName} {indexName} Tool'. (Set at creation only.)

**description** (`string`): Custom description for the tool. By default: 'Access and analyze relationships between information in the knowledge base to answer complex questions about connections and patterns.' (Set at creation only.)

**vectorStoreName** (`string`): Name of the vector store to query. (Can be set at creation or overridden at runtime.)

**indexName** (`string`): Name of the index within the vector store. (Can be set at creation or overridden at runtime.)

**model** (`EmbeddingModel`): Embedding model to use for vector search. (Set at creation only.)

**enableFilter** (`boolean`): Enable filtering of results based on metadata. (Set at creation only, but will be automatically enabled if a filter is provided in the request context.) (Default: `false`)

**includeSources** (`boolean`): Include the full retrieval objects in the results. (Can be set at creation or overridden at runtime.) (Default: `true`)

**graphOptions** (`GraphOptions`): Configuration for the graph-based retrieval. (Default: `Default graph options`)

**graphOptions.dimension** (`number`): Dimension of the embedding vectors.

**graphOptions.threshold** (`number`): Similarity threshold for creating edges between nodes (0-1).

**graphOptions.randomWalkSteps** (`number`): Number of steps in the random walk for graph traversal. (Can be set at creation or overridden at runtime.)

**graphOptions.restartProb** (`number`): Probability of restarting the random walk from the query node. (Can be set at creation or overridden at runtime.)

**providerOptions** (`Record<string, Record<string, any>>`): Provider-specific options for the embedding model (e.g., outputDimensionality). **Important**: Only works with AI SDK EmbeddingModelV2 models. For V1 models, configure options when creating the model itself.

**vectorStore** (`MastraVector | VectorStoreResolver`): Direct vector store instance or a resolver function for dynamic selection. Use a function for multi-tenant applications where the vector store is selected based on request context. When provided, `vectorStoreName` becomes optional.

## Returns

The tool returns an object with:

**relevantContext** (`string`): Combined text from the most relevant document chunks, retrieved using graph-based ranking.

**sources** (`QueryResult[]`): Array of full retrieval result objects. Each object contains all information needed to reference the original document, chunk, and similarity score.

### `QueryResult` object structure

```typescript
{
  id: string; // Unique chunk/document identifier
  metadata: any; // All metadata fields (document ID, etc.)
  vector: number[]; // Embedding vector (if available)
  score: number; // Similarity score for this retrieval
  document: string; // Full chunk/document text (if available)
}
```

## Default tool description

The default description focuses on:

- Analyzing relationships between documents
- Finding patterns and connections
- Answering complex queries

## Advanced example

```typescript
const graphTool = createGraphRAGTool({
  vectorStoreName: 'pinecone',
  indexName: 'docs',
  model: new ModelRouterEmbeddingModel('openai/text-embedding-3-small'),
  graphOptions: {
    dimension: 1536,
    threshold: 0.8, // Higher similarity threshold
    randomWalkSteps: 200, // More exploration steps
    restartProb: 0.2, // Higher restart probability
  },
})
```

## Example with custom description

```typescript
const graphTool = createGraphRAGTool({
  vectorStoreName: 'pinecone',
  indexName: 'docs',
  model: 'openai/text-embedding-3-small',
  description:
    "Analyze document relationships to find complex patterns and connections in our company's historical data",
})
```

This example shows how to customize the tool description for a specific use case while maintaining its core purpose of relationship analysis.

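
The `graphOptions` values used in the examples above (`threshold`, `randomWalkSteps`, `restartProb`) are easier to reason about with a toy model. The following is a minimal sketch of a random walk with restart over a similarity graph, assuming cosine similarity and an unweighted neighbor choice. It is not the `GraphRAG` source code, and `GraphNode`, `cosine`, `buildEdges`, and `randomWalkWithRestart` are hypothetical names introduced only for illustration.

```typescript
// Illustrative sketch only; not the GraphRAG implementation.
// All names below are hypothetical helpers for this example.
type GraphNode = { id: string; vector: number[] }

// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB)
}

// Connect every pair of nodes whose similarity exceeds `threshold`.
function buildEdges(nodes: GraphNode[], threshold: number): Map<string, string[]> {
  const edges = new Map<string, string[]>()
  for (const node of nodes) edges.set(node.id, [])
  for (let i = 0; i < nodes.length; i++) {
    for (let j = i + 1; j < nodes.length; j++) {
      if (cosine(nodes[i].vector, nodes[j].vector) > threshold) {
        edges.get(nodes[i].id)!.push(nodes[j].id)
        edges.get(nodes[j].id)!.push(nodes[i].id)
      }
    }
  }
  return edges
}

// Walk for `steps` steps, counting visits per node. At each step the walk
// jumps back to the start node with probability `restartProb` (or when the
// current node has no neighbors); otherwise it moves to a random neighbor.
function randomWalkWithRestart(
  edges: Map<string, string[]>,
  startId: string,
  steps: number,
  restartProb: number,
  rand: () => number = Math.random,
): Map<string, number> {
  const visits = new Map<string, number>()
  let current = startId
  for (let step = 0; step < steps; step++) {
    visits.set(current, (visits.get(current) ?? 0) + 1)
    const neighbors = edges.get(current) ?? []
    if (neighbors.length === 0 || rand() < restartProb) {
      current = startId
    } else {
      current = neighbors[Math.floor(rand() * neighbors.length)]
    }
  }
  return visits
}

// 'a' and 'b' are similar (cosine ~0.99); 'c' is unrelated to both.
const nodes: GraphNode[] = [
  { id: 'a', vector: [1, 0] },
  { id: 'b', vector: [0.9, 0.1] },
  { id: 'c', vector: [0, 1] },
]
const edges = buildEdges(nodes, 0.7)
const visits = randomWalkWithRestart(edges, 'a', 100, 0.15)

// Rank nodes by how often the walk landed on them.
const ranked = Array.from(visits.entries()).sort((x, y) => y[1] - x[1])
console.log(ranked)
```

In this sketch, raising `threshold` removes edges, so fewer chunks are reachable from the start node, and raising `restartProb` keeps the walk closer to the start node. Node `c` has no edge to `a` or `b`, so the walk never reaches it.
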
## Example: Using request context

```typescript
const graphTool = createGraphRAGTool({
  vectorStoreName: 'pinecone',
  indexName: 'docs',
  model: 'openai/text-embedding-3-small',
})
```

When using request context, provide required parameters at execution time via the request context:

```typescript
const requestContext = new RequestContext<{
  vectorStoreName: string
  indexName: string
  topK: number
  filter: any
  randomWalkSteps: number
  restartProb: number
}>()
requestContext.set('vectorStoreName', 'my-store')
requestContext.set('indexName', 'my-index')
requestContext.set('topK', 5)
requestContext.set('filter', { category: 'docs' })
requestContext.set('randomWalkSteps', 100)
requestContext.set('restartProb', 0.15)

const response = await agent.generate('Find documentation from the knowledge base.', {
  requestContext,
})
```

For more information on request context, see:

- [Agent Request Context](https://mastra.ai/docs/server/request-context)
- [Request Context](https://mastra.ai/docs/server/request-context)

## Dynamic vector store for multi-tenant applications

For multi-tenant applications where each tenant has isolated data, you can pass a resolver function instead of a static vector store:

```typescript
import { createGraphRAGTool, VectorStoreResolver } from '@mastra/rag'
import { ModelRouterEmbeddingModel } from '@mastra/core/llm'
import { PgVector } from '@mastra/pg'

// Select a tenant-specific store based on the request context
const vectorStoreResolver: VectorStoreResolver = async ({ requestContext }) => {
  const tenantId = requestContext?.get('tenantId')

  return new PgVector({
    id: `pg-vector-${tenantId}`,
    connectionString: process.env.POSTGRES_CONNECTION_STRING!,
    schemaName: `tenant_${tenantId}`,
  })
}

const graphTool = createGraphRAGTool({
  indexName: 'embeddings',
  model: new ModelRouterEmbeddingModel('openai/text-embedding-3-small'),
  vectorStore: vectorStoreResolver,
})
```

See [createVectorQueryTool - Dynamic Vector Store](https://mastra.ai/reference/tools/vector-query-tool) for more details.

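
The resolver above builds a new store on every call and interpolates `tenantId` into the schema name without checking it. If either is a concern in your application, one option is to validate the ID and reuse one store per tenant. The following is a minimal sketch of that pattern. `StoreLike`, `requireTenantId`, and `cachePerTenant` are hypothetical helpers that are not part of `@mastra/rag`, and the object literal stands in for a real store such as `PgVector`.

```typescript
// Illustrative sketch only; none of these helpers are part of @mastra/rag.

// Hypothetical stand-in for a vector store instance (e.g. a PgVector).
interface StoreLike {
  id: string
}

// Reject missing or unsafe tenant IDs before they reach a schema name.
function requireTenantId(value: unknown): string {
  if (typeof value !== 'string' || !/^[A-Za-z0-9_]+$/.test(value)) {
    throw new Error('Missing or invalid tenantId in request context')
  }
  return value
}

// Wrap a per-tenant factory so each tenant's store is created once and reused.
function cachePerTenant<T>(create: (tenantId: string) => T): (tenantId: string) => T {
  const cache = new Map<string, T>()
  return (tenantId: string): T => {
    let store = cache.get(tenantId)
    if (store === undefined) {
      store = create(tenantId)
      cache.set(tenantId, store)
    }
    return store
  }
}

// Count how many stores are actually constructed.
let created = 0
const getStore = cachePerTenant<StoreLike>((tenantId) => {
  created++
  return { id: `pg-vector-${tenantId}` }
})

const first = getStore(requireTenantId('acme'))
const second = getStore(requireTenantId('acme'))
const other = getStore(requireTenantId('globex'))

console.log(first === second) // true: same tenant reuses the cached store
console.log(first === other) // false: different tenants get different stores
console.log(created) // 2
```

Inside a resolver, the same idea would amount to calling `getStore(requireTenantId(requestContext?.get('tenantId')))` and returning the result. An unbounded `Map` is only reasonable when the number of tenants is small; with many tenants an eviction policy would be needed.
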
## Related

- [createVectorQueryTool](https://mastra.ai/reference/tools/vector-query-tool)
- [GraphRAG](https://mastra.ai/reference/rag/graph-rag)