@mastra/rag
Version:
The Retrieval-Augmented Generation (RAG) module contains document processing and embedding utilities.
89 lines (63 loc) • 2.42 kB
Markdown
The `createDocumentChunkerTool()` function creates a tool for splitting documents into smaller chunks for efficient processing and retrieval. It supports different chunking strategies and configurable parameters.
```typescript
import { createDocumentChunkerTool, MDocument } from '@mastra/rag'
const document = new MDocument({
text: 'Your document content here...',
metadata: { source: 'user-manual' },
})
const chunker = createDocumentChunkerTool({
doc: document,
params: {
strategy: 'recursive',
size: 512,
overlap: 50,
separator: '\n',
},
})
const { chunks } = await chunker.execute()
```
**doc** (`MDocument`): The document to be chunked
**params** (`ChunkParams`): Configuration parameters for chunking (Default: `Default chunking parameters`)
### `ChunkParams`
**strategy** (`'recursive'`): The chunking strategy to use (Default: `'recursive'`)
**size** (`number`): Target size of each chunk in tokens/characters (Default: `512`)
**overlap** (`number`): Number of overlapping tokens/characters between chunks (Default: `50`)
**separator** (`string`): Character(s) to use as chunk separator (Default: `'\n'`)
## Returns
**chunks** (`DocumentChunk[]`): Array of document chunks with their content and metadata
## Example with custom parameters
```typescript
const technicalDoc = new MDocument({
text: longDocumentContent,
metadata: {
type: 'technical',
version: '1.0',
},
})
const chunker = createDocumentChunkerTool({
doc: technicalDoc,
params: {
strategy: 'recursive',
size: 1024, // Larger chunks
overlap: 100, // More overlap
separator: '\n\n', // Split on double newlines
},
})
const { chunks } = await chunker.execute()
// Process the chunks
chunks.forEach((chunk, index) => {
console.log(`Chunk ${index + 1} length: ${chunk.content.length}`)
})
```
The chunker is created as a Mastra tool with the following properties:
- **Tool ID**: `Document Chunker {strategy} {size}`
- **Description**: `Chunks document using {strategy} strategy with size {size} and {overlap} overlap`
- **Input Schema**: Empty object (no additional inputs required)
- **Output Schema**: Object containing the chunks array
## Related
- [MDocument](https://mastra.ai/reference/rag/document)
- [createVectorQueryTool](https://mastra.ai/reference/tools/vector-query-tool)