@mastra/rag
The Retrieval-Augmented Generation (RAG) module contains document processing and embedding utilities.
# MDocument
The MDocument class processes documents for RAG applications. The main methods are `.chunk()` and `.extractMetadata()`.
## Constructor
**docs** (`Array<{ text: string, metadata?: Record<string, any> }>`): Array of document chunks with their text content and optional metadata
**type** (`'text' | 'html' | 'markdown' | 'json' | 'latex'`): Type of document content
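As a rough illustration of the two parameter shapes above, the sketch below models them with local types. `DocType`, `DocInput`, and `DocumentInit` are hypothetical names that are not imported from `@mastra/rag`, and this section does not state whether the constructor takes the values as one object or separately, so the grouping is for illustration only.

```typescript
// Hypothetical local types that mirror the documented parameter shapes.
// They are NOT exported by @mastra/rag.
type DocType = 'text' | 'html' | 'markdown' | 'json' | 'latex'

interface DocInput {
  text: string
  metadata?: Record<string, any>
}

// Assumption: grouping both parameters in one object is only a convenience here.
interface DocumentInit {
  docs: DocInput[]
  type: DocType
}

const init: DocumentInit = {
  docs: [
    { text: '# Guide\n\nFirst section.', metadata: { source: 'guide.md' } },
    { text: 'Second section, no metadata.' }, // metadata is optional
  ],
  type: 'markdown',
}
```

In practice the static methods below are likely the more convenient entry point, since each one takes raw content plus optional metadata.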
## Static methods
### `fromText()`
Creates a document from plain text content.
```typescript
static fromText(text: string, metadata?: Record<string, any>): MDocument
```
### `fromHTML()`
Creates a document from HTML content.
```typescript
static fromHTML(html: string, metadata?: Record<string, any>): MDocument
```
### `fromMarkdown()`
Creates a document from Markdown content.
```typescript
static fromMarkdown(markdown: string, metadata?: Record<string, any>): MDocument
```
### `fromJSON()`
Creates a document from JSON content.
```typescript
static fromJSON(json: string, metadata?: Record<string, any>): MDocument
```
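When content comes from files, one possible way to decide which of the four factories to call is by file extension. The helper below is a sketch only: `factoryFor` and `FactoryName` are hypothetical names, the extension mapping is an assumption, and nothing here is part of `@mastra/rag`.

```typescript
// Names of the static factories documented above.
type FactoryName = 'fromText' | 'fromHTML' | 'fromMarkdown' | 'fromJSON'

// Hypothetical helper: choose a factory name from a file extension.
// Unknown or missing extensions fall back to plain text.
function factoryFor(filename: string): FactoryName {
  const dot = filename.lastIndexOf('.')
  const ext = dot >= 0 ? filename.slice(dot + 1).toLowerCase() : ''
  switch (ext) {
    case 'html':
    case 'htm':
      return 'fromHTML'
    case 'md':
    case 'markdown':
      return 'fromMarkdown'
    case 'json':
      return 'fromJSON'
    default:
      return 'fromText'
  }
}
```

Because the four factories share the same `(content, metadata?)` signature, the returned name could then be used as `MDocument[factoryFor(name)](content)`.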
## Instance methods
### `chunk()`
Splits the document into chunks and optionally extracts metadata.
```typescript
async chunk(params?: ChunkParams): Promise<Chunk[]>
```
See [chunk() reference](https://mastra.ai/reference/rag/chunk) for detailed options.
### `getDocs()`
Returns an array of the processed document chunks.
```typescript
getDocs(): Chunk[]
```
### `getText()`
Returns an array of text strings, one per chunk.
```typescript
getText(): string[]
```
### `getMetadata()`
Returns an array of metadata objects, one per chunk.
```typescript
getMetadata(): Record<string, any>[]
```
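The relationship between `getDocs()`, `getText()`, and `getMetadata()` can be pictured with a small stand-in class. This is a sketch, not the library's implementation: `SketchChunk` and `SketchDocument` are hypothetical, and the real `Chunk` type may carry more fields than `text` and `metadata`.

```typescript
// Hypothetical chunk shape; the real Chunk type in @mastra/rag may differ.
interface SketchChunk {
  text: string
  metadata: Record<string, any>
}

// Stand-in class that mimics the three documented getters.
class SketchDocument {
  private readonly chunks: SketchChunk[]

  constructor(chunks: SketchChunk[]) {
    this.chunks = chunks
  }

  // Full chunks: text plus metadata.
  getDocs(): SketchChunk[] {
    return this.chunks
  }

  // Only the text of each chunk, in chunk order.
  getText(): string[] {
    return this.chunks.map(c => c.text)
  }

  // Only the metadata of each chunk, in chunk order.
  getMetadata(): Record<string, any>[] {
    return this.chunks.map(c => c.metadata)
  }
}

const sketch = new SketchDocument([
  { text: 'alpha', metadata: { page: 1 } },
  { text: 'beta', metadata: { page: 2 } },
])
```

Under this assumption, the arrays returned by `getText()` and `getMetadata()` are index-aligned, so entry `i` of each describes the same chunk.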
### `extractMetadata()`
Extracts metadata using specified extractors. See [ExtractParams reference](https://mastra.ai/reference/rag/extract-params) for details.
```typescript
async extractMetadata(params: ExtractParams): Promise<MDocument>
```
## Examples
```typescript
import { MDocument } from '@mastra/rag'

// Create document from text
const doc = MDocument.fromText('Your content here')

// Split into chunks with metadata extraction
const chunks = await doc.chunk({
  strategy: 'markdown',
  headers: [
    ['#', 'title'],
    ['##', 'section'],
  ],
  extract: {
    summary: true, // Extract summaries with default settings
    keywords: true, // Extract keywords with default settings
  },
})

// Get processed chunks
const docs = doc.getDocs()
const texts = doc.getText()
const metadata = doc.getMetadata()
```