# Textual difference scorer
The `createTextualDifferenceScorer()` function uses sequence matching to measure the textual differences between two strings. It provides detailed information about changes, including the number of operations needed to transform one text into another.
## Parameters
The `createTextualDifferenceScorer()` function doesn't take any options.
This function returns an instance of the MastraScorer class. See the [MastraScorer reference](https://mastra.ai/reference/evals/mastra-scorer) for details on the `.run()` method and its input/output.
## `.run()` returns
**runId** (`string`): The id of the run (optional).
**analyzeStepResult** (`object`): Object with difference metrics: `{ confidence: number, ratio: number, changes: number, lengthDiff: number }`
**score** (`number`): Similarity ratio (0-1) where 1 indicates identical texts.
`.run()` returns a result in the following shape:
```typescript
{
  runId: string,
  analyzeStepResult: {
    confidence: number,
    ratio: number,
    changes: number,
    lengthDiff: number
  },
  score: number
}
```
## Scoring details
The scorer calculates several measures:
- **Similarity Ratio**: Based on sequence matching between texts (0-1)
- **Changes**: Count of non-matching operations needed
- **Length Difference**: Normalized difference in text lengths
- **Confidence**: Inversely proportional to length difference
### Scoring Process
1. Analyzes textual differences:
   - Performs sequence matching between input and output
   - Counts the number of change operations required
   - Measures length differences
2. Calculates metrics:
   - Computes similarity ratio
   - Determines confidence score
   - Combines into weighted score
Final score: `(similarity_ratio * confidence) * scale`
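The steps above can be sketched in plain TypeScript. This is a minimal illustration under stated assumptions, not the scorer's actual implementation: `lcsLength` and `sketchTextualDifference` are hypothetical helpers, a longest-common-subsequence ratio stands in for whatever sequence matcher the scorer uses internally, confidence is assumed to be `1 - lengthDiff`, and `scale` is assumed to default to 1 because this page does not define it.

```typescript
// Hypothetical shape for this sketch; not the type returned by the scorer.
interface DifferenceMetrics {
  ratio: number
  lengthDiff: number
  confidence: number
  score: number
}

// Length of the longest common subsequence of two strings.
// Assumption: used here as a stand-in for the scorer's sequence matcher.
function lcsLength(a: string, b: string): number {
  let prev = new Array<number>(b.length + 1).fill(0)
  for (let i = 1; i <= a.length; i++) {
    const curr = new Array<number>(b.length + 1).fill(0)
    for (let j = 1; j <= b.length; j++) {
      curr[j] =
        a[i - 1] === b[j - 1] ? prev[j - 1] + 1 : Math.max(prev[j], curr[j - 1])
    }
    prev = curr
  }
  return prev[b.length]
}

function sketchTextualDifference(
  expected: string,
  actual: string,
  scale = 1, // assumption: the source does not say what `scale` is
): DifferenceMetrics {
  // Similarity ratio: share of characters that line up in both texts (0-1).
  const total = expected.length + actual.length
  const ratio = total === 0 ? 1 : (2 * lcsLength(expected, actual)) / total

  // Length difference, normalized by the longer text.
  const maxLength = Math.max(expected.length, actual.length)
  const lengthDiff =
    maxLength === 0 ? 0 : Math.abs(expected.length - actual.length) / maxLength

  // Assumption: confidence falls linearly as the length difference grows.
  const confidence = 1 - lengthDiff

  return { ratio, lengthDiff, confidence, score: ratio * confidence * scale }
}
```

Under these assumptions, comparing `'abcd'` with `'ab'` gives a ratio of 2/3 and a confidence of 0.5, so the final score is about 0.33. The length penalty pulls the score well below the raw similarity ratio.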
### Score interpretation
The textual difference score ranges from 0 to 1:
- **1.0**: Identical texts – no differences detected.
- **0.7–0.9**: Minor differences – few changes needed.
- **0.4–0.6**: Moderate differences – noticeable changes required.
- **0.1–0.3**: Major differences – extensive changes needed.
- **0.0**: Completely different texts.
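If you want to turn a score into one of these labels in code, a small helper along the following lines could work. `interpretScore` and `DifferenceBand` are hypothetical names, not part of Mastra. The bands above leave gaps (for example between 0.9 and 1.0), so this sketch assumes each band extends up to the start of the next one.

```typescript
// Hypothetical label type for this sketch; not part of the Mastra API.
type DifferenceBand =
  | 'identical'
  | 'minor'
  | 'moderate'
  | 'major'
  | 'completely different'

// Maps a 0-1 score to a band.
// Assumption: each band's lower bound is inclusive and gaps between the
// documented ranges are assigned to the band below them.
function interpretScore(score: number): DifferenceBand {
  if (score >= 1) return 'identical'
  if (score >= 0.7) return 'minor'
  if (score >= 0.4) return 'moderate'
  if (score > 0) return 'major'
  return 'completely different'
}
```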
## Example
Measure textual differences between expected and actual agent outputs:
```typescript
import { runEvals } from '@mastra/core/evals'
import { createTextualDifferenceScorer } from '@mastra/evals/scorers/prebuilt'
import { myAgent } from './agent'

const scorer = createTextualDifferenceScorer()

const result = await runEvals({
  data: [
    {
      input: 'Summarize the concept of recursion',
      groundTruth:
        'Recursion is when a function calls itself to solve a problem by breaking it into smaller subproblems.',
    },
    {
      input: 'What is the capital of France?',
      groundTruth: 'The capital of France is Paris.',
    },
  ],
  scorers: [scorer],
  target: myAgent,
  onItemComplete: ({ scorerResults }) => {
    console.log({
      score: scorerResults[scorer.id].score,
      groundTruth: scorerResults[scorer.id].groundTruth,
    })
  },
})

console.log(result.scores)
```
For more details on `runEvals`, see the [runEvals reference](https://mastra.ai/reference/evals/run-evals).
To add this scorer to an agent, see the [Scorers overview](https://mastra.ai/docs/evals/overview) guide.
## Related
- [Content Similarity Scorer](https://mastra.ai/reference/evals/content-similarity)
- [Completeness Scorer](https://mastra.ai/reference/evals/completeness)
- [Keyword Coverage Scorer](https://mastra.ai/reference/evals/keyword-coverage)