
@mastra/rag


The Retrieval-Augmented Generation (RAG) module contains document processing and embedding utilities.

# MDocument

The MDocument class processes documents for RAG applications. The main methods are `.chunk()` and `.extractMetadata()`.

## Constructor

**docs** (`Array<{ text: string, metadata?: Record<string, any> }>`): Array of document chunks with their text content and optional metadata

**type** (`'text' | 'html' | 'markdown' | 'json' | 'latex'`): Type of document content

## Static methods

### `fromText()`

Creates a document from plain text content.

```typescript
static fromText(text: string, metadata?: Record<string, any>): MDocument
```

### `fromHTML()`

Creates a document from HTML content.

```typescript
static fromHTML(html: string, metadata?: Record<string, any>): MDocument
```

### `fromMarkdown()`

Creates a document from Markdown content.

```typescript
static fromMarkdown(markdown: string, metadata?: Record<string, any>): MDocument
```

### `fromJSON()`

Creates a document from JSON content.

```typescript
static fromJSON(json: string, metadata?: Record<string, any>): MDocument
```

## Instance methods

### `chunk()`

Splits document into chunks and optionally extracts metadata.

```typescript
async chunk(params?: ChunkParams): Promise<Chunk[]>
```

See [chunk() reference](https://mastra.ai/reference/rag/chunk) for detailed options.

### `getDocs()`

Returns array of processed document chunks.

```typescript
getDocs(): Chunk[]
```

### `getText()`

Returns array of text strings from chunks.

```typescript
getText(): string[]
```

### `getMetadata()`

Returns array of metadata objects from chunks.

```typescript
getMetadata(): Record<string, any>[]
```

### `extractMetadata()`

Extracts metadata using specified extractors. See [ExtractParams reference](https://mastra.ai/reference/rag/extract-params) for details.

```typescript
async extractMetadata(params: ExtractParams): Promise<MDocument>
```

## Examples

```typescript
import { MDocument } from '@mastra/rag'

// Create document from text
const doc = MDocument.fromText('Your content here')

// Split into chunks with metadata extraction
const chunks = await doc.chunk({
  strategy: 'markdown',
  headers: [
    ['#', 'title'],
    ['##', 'section'],
  ],
  extract: {
    summary: true, // Extract summaries with default settings
    keywords: true, // Extract keywords with default settings
  },
})

// Get processed chunks
const docs = doc.getDocs()
const texts = doc.getText()
const metadata = doc.getMetadata()
```
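
To make the relationship between `getDocs()`, `getText()`, and `getMetadata()` concrete, the following is a minimal, dependency-free sketch of the data shape described above. `SketchDocument`, `SketchChunk`, and `chunkBySize()` are hypothetical names invented for illustration; they are not part of `@mastra/rag`, and the fixed-size splitter is a stand-in, not the library's chunking logic. The sketch also assumes that chunking replaces the chunks held by the document, which is what the call order in the example above suggests.

```typescript
// Hypothetical stand-in for illustration only; NOT the @mastra/rag implementation.
type SketchChunk = { text: string; metadata: Record<string, unknown> }

class SketchDocument {
  private docs: SketchChunk[]

  constructor(docs: SketchChunk[]) {
    this.docs = docs
  }

  // Mirrors the shape of a fromText()-style factory: one document, optional metadata.
  static fromText(text: string, metadata: Record<string, unknown> = {}): SketchDocument {
    return new SketchDocument([{ text, metadata }])
  }

  // Naive fixed-size splitter (an assumed strategy chosen for brevity).
  // Each resulting chunk receives a shallow copy of its parent's metadata.
  chunkBySize(maxSize: number): SketchChunk[] {
    if (maxSize <= 0) throw new Error('maxSize must be positive')
    const next: SketchChunk[] = []
    for (const doc of this.docs) {
      for (let start = 0; start < doc.text.length; start += maxSize) {
        next.push({
          text: doc.text.slice(start, start + maxSize),
          metadata: { ...doc.metadata },
        })
      }
    }
    this.docs = next // assumption: chunking replaces the stored chunks
    return this.docs
  }

  // The three accessors are different views over the same chunk array.
  getDocs(): SketchChunk[] {
    return this.docs
  }

  getText(): string[] {
    return this.docs.map((d) => d.text)
  }

  getMetadata(): Record<string, unknown>[] {
    return this.docs.map((d) => d.metadata)
  }
}

const sketch = SketchDocument.fromText('abcdefghij', { source: 'demo' })
sketch.chunkBySize(4)
console.log(sketch.getText()) // [ 'abcd', 'efgh', 'ij' ]
```

In this sketch `getText()` and `getMetadata()` are projections of `getDocs()`, so index `i` in each array refers to the same chunk.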