# Datasets overview
**Added in:** `@mastra/core@1.4.0`
Datasets are collections of test cases that you run experiments against to measure how well your agents and workflows perform. Each mutation creates a new version, so you can reproduce past experiments exactly. Pair datasets with [scorers](https://mastra.ai/docs/evals/overview) to track quality across prompts, models, or code changes.
## Usage
### Configure storage
Configure storage in your Mastra instance. Datasets require a storage adapter that provides the `datasets` domain:
```typescript
import { Mastra } from '@mastra/core'
import { LibSQLStore } from '@mastra/libsql'

export const mastra = new Mastra({
  storage: new LibSQLStore({
    id: 'my-store',
    url: 'file:./mastra.db',
  }),
})
```
### Accessing the datasets API
All dataset operations are available through `mastra.datasets`:
```typescript
const datasets = mastra.datasets

// Create a dataset
const dataset = await datasets.create({ name: 'my-dataset' })

// Retrieve an existing dataset
const existing = await datasets.get({ id: 'dataset-id' })

// List all datasets
const { datasets: all } = await datasets.list()
```
> **Info:** Visit the [`DatasetsManager` reference](https://mastra.ai/reference/datasets/datasets-manager) for the full list of methods.
## Studio
You can also manage datasets in [Studio](https://mastra.ai/docs/studio/overview). After opening Studio, select **Datasets** from the sidebar to see all your available datasets or create a new one.
To get started, select **Create Dataset** and set a name, description, and optional schemas. After confirming, you'll see the dataset details page with two tabs: **Items** and [**Experiments**](https://mastra.ai/docs/evals/datasets/running-experiments).
In the **Items** view you can add, update, and delete items, and view version history. Select **Add Item** to insert a new item with JSON editors for input and ground truth. From this view you can also import items in bulk from a CSV or JSON file. When importing, map each column to the corresponding dataset field.
Select **Versions** to see the full history of changes to the dataset. After selecting **Compare Versions**, choose any two versions and select **Compare** to see a side-by-side diff of all items that were added, changed, or removed between those versions.
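Conceptually, the comparison is a diff between two item snapshots. The following is a minimal sketch of that idea, not Studio's implementation: the `Item` shape, the `diffSnapshots` helper, and the use of `JSON.stringify` for equality are all assumptions made for illustration.

```typescript
// Hypothetical item shape for illustration; not Mastra's internal type.
interface Item {
  id: string
  input: unknown
  groundTruth?: unknown
}

interface SnapshotDiff {
  added: Item[]
  removed: Item[]
  changed: { before: Item; after: Item }[]
}

// Compare two snapshots of a dataset by item id.
// Equality uses JSON.stringify, which is sensitive to key order; fine for a sketch.
function diffSnapshots(older: Item[], newer: Item[]): SnapshotDiff {
  const olderById = new Map(older.map((item) => [item.id, item] as const))
  const newerIds = new Set(newer.map((item) => item.id))
  const diff: SnapshotDiff = { added: [], removed: [], changed: [] }

  for (const after of newer) {
    const before = olderById.get(after.id)
    if (!before) {
      diff.added.push(after)
    } else if (JSON.stringify(before) !== JSON.stringify(after)) {
      diff.changed.push({ before, after })
    }
  }
  for (const before of older) {
    if (!newerIds.has(before.id)) diff.removed.push(before)
  }
  return diff
}

const v1: Item[] = [
  { id: 'a', input: { text: 'Hello' }, groundTruth: { translation: 'Hola' } },
  { id: 'b', input: { text: 'Goodbye' }, groundTruth: { translation: 'Adiós' } },
]
const v2: Item[] = [
  { id: 'a', input: { text: 'Hello' }, groundTruth: { translation: '¡Hola!' } },
  { id: 'c', input: { text: 'Thank you' }, groundTruth: { translation: 'Gracias' } },
]

const versionDiff = diffSnapshots(v1, v2)
console.log(versionDiff.added.map((item) => item.id)) // ['c']
console.log(versionDiff.removed.map((item) => item.id)) // ['b']
console.log(versionDiff.changed.map((entry) => entry.after.id)) // ['a']
```

Keying the comparison on item id is what lets an edited item show up as "changed" rather than as one removal plus one addition.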
## Creating a dataset
Call [`create()`](https://mastra.ai/reference/datasets/create) with a name and optional description:
```typescript
import { mastra } from '../index'

const dataset = await mastra.datasets.create({
  name: 'translation-pairs',
  description: 'English to Spanish translation test cases',
})

console.log(dataset.id) // auto-generated UUID
```
### Defining schemas
You can enforce the shape of `input` and `groundTruth` by passing a [Standard JSON Schema](https://standardschema.dev/json-schema) ([Zod](https://zod.dev/), [Valibot](https://valibot.dev/), [ArkType](https://arktype.io/), etc.) when creating the dataset:
```typescript
import { z } from 'zod'
import { mastra } from '../index'

const dataset = await mastra.datasets.create({
  name: 'translation-pairs',
  inputSchema: z.object({
    text: z.string(),
    sourceLang: z.string(),
    targetLang: z.string(),
  }),
  groundTruthSchema: z.object({
    translation: z.string(),
  }),
})
```
Items that don't match the schema are rejected at insert time.
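To make "rejected at insert time" concrete, here is a self-contained sketch of the validate-then-store pattern. It does not use Mastra or Zod: the `Validator` type, `validateTranslationInput`, and `ValidatedStore` are hypothetical stand-ins, and the source does not specify how Mastra reports a rejection, so the sketch simply throws a plain `Error`.

```typescript
// A validator returns a list of human-readable issues; an empty list means valid.
type Validator = (value: unknown) => string[]

// Hypothetical hand-written check mirroring the `inputSchema` fields shown earlier.
const validateTranslationInput: Validator = (value) => {
  if (typeof value !== 'object' || value === null) return ['input must be an object']
  const record = value as Record<string, unknown>
  const issues: string[] = []
  for (const key of ['text', 'sourceLang', 'targetLang']) {
    if (typeof record[key] !== 'string') issues.push(`${key} must be a string`)
  }
  return issues
}

// Hypothetical in-memory stand-in for a dataset; it validates before storing.
class ValidatedStore {
  readonly items: unknown[] = []
  private readonly validate: Validator

  constructor(validate: Validator) {
    this.validate = validate
  }

  add(input: unknown): void {
    const issues = this.validate(input)
    if (issues.length > 0) {
      // Nothing is stored when validation fails.
      throw new Error(`Item rejected: ${issues.join('; ')}`)
    }
    this.items.push(input)
  }
}

const store = new ValidatedStore(validateTranslationInput)
store.add({ text: 'Hello', sourceLang: 'en', targetLang: 'es' }) // accepted

let rejection = ''
try {
  store.add({ text: 'Hello', sourceLang: 'en' }) // missing targetLang
} catch (error) {
  rejection = error instanceof Error ? error.message : String(error)
}
console.log(rejection) // "Item rejected: targetLang must be a string"
console.log(store.items.length) // 1
```

Validating before the write keeps malformed items out of every later version of the dataset, which is why the check belongs at insert time rather than at experiment time.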
## Adding items
Use [`addItem()`](https://mastra.ai/reference/datasets/addItem) for a single item or [`addItems()`](https://mastra.ai/reference/datasets/addItems) to insert in bulk:
```typescript
// Single item
await dataset.addItem({
  input: { text: 'Hello', sourceLang: 'en', targetLang: 'es' },
  groundTruth: { translation: 'Hola' },
})

// Bulk insert
await dataset.addItems({
  items: [
    {
      input: { text: 'Goodbye', sourceLang: 'en', targetLang: 'es' },
      groundTruth: { translation: 'Adiós' },
    },
    {
      input: { text: 'Thank you', sourceLang: 'en', targetLang: 'es' },
      groundTruth: { translation: 'Gracias' },
    },
  ],
})
```
## Updating and deleting items
[`updateItem()`](https://mastra.ai/reference/datasets/updateItem), [`deleteItem()`](https://mastra.ai/reference/datasets/deleteItem), and [`deleteItems()`](https://mastra.ai/reference/datasets/deleteItems) let you modify or remove existing items by `itemId`:
```typescript
await dataset.updateItem({
  itemId: 'item-abc-123',
  groundTruth: { translation: '¡Hola!' },
})

await dataset.deleteItem({ itemId: 'item-abc-123' })

await dataset.deleteItems({ itemIds: ['item-1', 'item-2'] })
```
## Listing and searching items
[`listItems()`](https://mastra.ai/reference/datasets/listItems) supports pagination and full-text search:
```typescript
// Paginated list
const { items, pagination } = await dataset.listItems({
  page: 0,
  perPage: 50,
})

// Full-text search
const { items: matches } = await dataset.listItems({
  search: 'Hello',
})

// List items at a specific version
const v2Items = await dataset.listItems({ version: 2 })
```
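The paginated call above passes `page: 0`, which suggests zero-based pages. The sketch below shows how zero-based `page` and `perPage` values map onto slices of a list and how a caller might walk every page. It runs against a plain array rather than a dataset, and the `PageInfo` fields (`total`, `hasMore`) are assumptions; check the `Dataset` reference for the real shape of `pagination`.

```typescript
// Hypothetical pagination metadata; the real `pagination` object may differ.
interface PageInfo {
  page: number
  perPage: number
  total: number
  hasMore: boolean
}

// Return one zero-based page of `all`, plus metadata describing it.
function paginate<T>(all: T[], page: number, perPage: number): { items: T[]; pagination: PageInfo } {
  if (perPage < 1 || page < 0) throw new Error('page must be >= 0 and perPage must be >= 1')
  const start = page * perPage
  const items = all.slice(start, start + perPage)
  return {
    items,
    pagination: { page, perPage, total: all.length, hasMore: start + items.length < all.length },
  }
}

// Walk every page until none remain, collecting each page's items.
function collectPages<T>(all: T[], perPage: number): T[][] {
  const pages: T[][] = []
  let page = 0
  while (true) {
    const result = paginate(all, page, perPage)
    pages.push(result.items)
    if (!result.pagination.hasMore) break
    page += 1
  }
  return pages
}

const words = ['Hello', 'Goodbye', 'Thank you', 'Please', 'Sorry']
console.log(paginate(words, 1, 2).items) // ['Thank you', 'Please']
console.log(collectPages(words, 2).length) // 3
```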
## Versioning
Every mutation to a dataset's items (add, update, or delete) bumps the dataset version. This lets you pin experiments to a specific snapshot of the data.
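One way to picture this behavior is an append-only log in which every mutation records a change and increments a counter. The sketch below is a hypothetical in-memory model, not Mastra's storage layout: `VersionedItems`, `Payload`, and `Change` are invented names, and whether a bulk call counts as one mutation or several is not covered here.

```typescript
// Hypothetical append-only model of dataset versioning, for illustration only.
type Payload = { input: unknown; groundTruth?: unknown }

interface Change {
  version: number
  itemId: string
  payload: Payload | null // null marks a deletion
}

class VersionedItems {
  private version = 0
  private readonly log: Change[] = []

  // Every mutation appends one change and bumps the version by one.
  private record(itemId: string, payload: Payload | null): number {
    this.version += 1
    this.log.push({ version: this.version, itemId, payload })
    return this.version
  }

  add(itemId: string, payload: Payload): number {
    return this.record(itemId, payload)
  }

  update(itemId: string, payload: Payload): number {
    return this.record(itemId, payload)
  }

  remove(itemId: string): number {
    return this.record(itemId, null)
  }

  // Rebuild the items as they existed at `version` by replaying the log.
  snapshot(version: number = this.version): Map<string, Payload> {
    const items = new Map<string, Payload>()
    for (const change of this.log) {
      if (change.version > version) break
      if (change.payload === null) items.delete(change.itemId)
      else items.set(change.itemId, change.payload)
    }
    return items
  }
}

const translations = new VersionedItems()
translations.add('item-1', { input: 'Hello', groundTruth: 'Hola' }) // version 1
translations.update('item-1', { input: 'Hello', groundTruth: '¡Hola!' }) // version 2
translations.remove('item-1') // version 3

console.log(translations.snapshot(1).get('item-1')?.groundTruth) // 'Hola'
console.log(translations.snapshot(2).get('item-1')?.groundTruth) // '¡Hola!'
console.log(translations.snapshot(3).size) // 0
```

Because earlier entries are never rewritten, the snapshot at version 1 stays the same no matter how many later mutations happen, which is the property that makes pinned experiments reproducible.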
### Listing versions
Use [`listVersions()`](https://mastra.ai/reference/datasets/listVersions) to see the paginated history of versions:
```typescript
const { versions, pagination } = await dataset.listVersions()
for (const v of versions) {
  console.log(`Version ${v.version}, created ${v.createdAt}`)
}
```
### Viewing item history
See how a specific item changed across versions by calling [`getItemHistory()`](https://mastra.ai/reference/datasets/getItemHistory) with the `itemId`:
```typescript
const history = await dataset.getItemHistory({ itemId: 'item-abc-123' })

for (const row of history) {
  console.log(`Version ${row.datasetVersion}`, row.input, row.groundTruth)
}
```
### Pinning to a version
Fetch the exact items that existed at a past version:
```typescript
const items = await dataset.listItems({ version: 2 })
```
You can also pin experiments to a version. See [running experiments](https://mastra.ai/docs/evals/datasets/running-experiments) for details.
> **Info:** Visit the [`Dataset` reference](https://mastra.ai/reference/datasets/dataset) for the full list of methods and parameters.
## Related
- [Running experiments](https://mastra.ai/docs/evals/datasets/running-experiments)
- [Scorers overview](https://mastra.ai/docs/evals/overview)
- [DatasetsManager reference](https://mastra.ai/reference/datasets/datasets-manager)
- [Dataset reference](https://mastra.ai/reference/datasets/dataset)