# Answer similarity scorer
The `createAnswerSimilarityScorer()` function creates a scorer that evaluates how similar an agent's output is to a ground truth answer. This scorer is specifically designed for CI/CD testing scenarios where you have expected answers and want to ensure consistency over time.
## Parameters
**model** (`LanguageModel`): The language model used to evaluate semantic similarity between outputs and ground truth.
**options** (`AnswerSimilarityOptions`): Configuration options for the scorer.
**options.requireGroundTruth** (`boolean`): Whether to require ground truth for evaluation. If `false`, a run with missing ground truth returns a score of 0.
**options.semanticThreshold** (`number`): Weight for semantic matches vs exact matches (0-1).
**options.exactMatchBonus** (`number`): Additional score bonus for exact matches (0-1).
**options.missingPenalty** (`number`): Penalty per missing key concept from ground truth.
**options.contradictionPenalty** (`number`): Penalty for contradictory information. High value ensures wrong answers score near 0.
**options.extraInfoPenalty** (`number`): Mild penalty for extra information not present in ground truth (capped at 0.2).
**options.scale** (`number`): Score scaling factor.
This function returns an instance of the MastraScorer class. The `.run()` method accepts the same input as other scorers (see the [MastraScorer reference](https://mastra.ai/reference/evals/mastra-scorer)), but **requires ground truth** to be provided in the run object.
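As a rough illustration of the options listed above, the sketch below declares a local interface that mirrors the documented fields and fills it with a strict configuration. The interface name `AnswerSimilarityOptionsSketch` and every numeric value are assumptions chosen for illustration; they are not the library's type or its defaults. How the object is handed to `createAnswerSimilarityScorer()` is not shown in this section, so the call itself is left out.

```typescript
// Hypothetical local mirror of the documented option fields.
// The real AnswerSimilarityOptions type ships with the scorer package.
interface AnswerSimilarityOptionsSketch {
  requireGroundTruth?: boolean
  semanticThreshold?: number
  exactMatchBonus?: number
  missingPenalty?: number
  contradictionPenalty?: number
  extraInfoPenalty?: number
  scale?: number
}

// Illustrative values only (assumptions, not the library defaults).
const strictOptions: AnswerSimilarityOptionsSketch = {
  requireGroundTruth: true, // treat missing ground truth as a hard requirement
  semanticThreshold: 0.8, // weight for semantic vs exact matches (0-1)
  exactMatchBonus: 0.2, // bonus for exact matches (0-1)
  missingPenalty: 0.15, // per missing key concept
  contradictionPenalty: 1.0, // high, so wrong answers score near 0
  extraInfoPenalty: 0.05, // mild; documented cap is 0.2
  scale: 1, // keep scores in the 0-1 range
}
```

A high `contradictionPenalty` combined with a small `extraInfoPenalty` matches the intent described in the parameter list: contradictions should sink a score, while extra detail should only nudge it.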
## `.run()` returns
**runId** (`string`): The id of the run (optional).
**score** (`number`): Similarity score between 0 and 1 (or between 0 and `scale` when a custom scale is set). Higher scores indicate closer similarity to the ground truth.
**reason** (`string`): Human-readable explanation of the score with actionable feedback.
**preprocessStepResult** (`object`): Extracted semantic units from output and ground truth.
**analyzeStepResult** (`object`): Detailed analysis of matches, contradictions, and extra information.
**preprocessPrompt** (`string`): The prompt used for semantic unit extraction.
**analyzePrompt** (`string`): The prompt used for similarity analysis.
**generateReasonPrompt** (`string`): The prompt used for generating the explanation.
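Because this scorer targets CI/CD checks, one way to consume the result is a small gate that fails when the score drops below a threshold. The sketch below is a minimal example under stated assumptions: `AnswerSimilarityRunResult` is a hypothetical local type that lists the fields documented above, and `assertSimilarity` is a hypothetical helper, not part of Mastra.

```typescript
// Hypothetical local type listing the documented `.run()` result fields.
interface AnswerSimilarityRunResult {
  runId?: string
  score: number
  reason: string
  preprocessStepResult: object
  analyzeStepResult: object
  preprocessPrompt: string
  analyzePrompt: string
  generateReasonPrompt: string
}

// Hypothetical helper: throws when similarity falls below `minScore`,
// surfacing the scorer's explanation in the error message.
function assertSimilarity(
  result: Pick<AnswerSimilarityRunResult, 'score' | 'reason'>,
  minScore: number,
): void {
  if (result.score < minScore) {
    throw new Error(`Similarity ${result.score} is below ${minScore}: ${result.reason}`)
  }
}
```

Only `score` and `reason` are needed for the gate, so the helper accepts just those two fields. When a custom `scale` is configured, `minScore` would need to be expressed on that same scale.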
## Scoring details
The scorer uses a multi-step process:
1. **Extract**: Breaks down output and ground truth into semantic units
2. **Analyze**: Compares units and identifies matches, contradictions, and gaps
3. **Score**: Calculates weighted similarity with penalties for contradictions
4. **Reason**: Generates human-readable explanation
Score calculation: `max(0, base_score - contradiction_penalty - missing_penalty - extra_info_penalty) × scale`
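The formula above can be sketched directly in code. This is a minimal illustration, not the scorer's actual implementation: the `ScoreTerms` shape and the `similarityScore` function are hypothetical, and the sketch assumes each penalty has already been aggregated into a single number (how the scorer derives those totals from the analysis step is not specified here).

```typescript
// Hypothetical input shape; the field names mirror the terms of the formula.
interface ScoreTerms {
  baseScore: number
  contradictionPenalty: number
  missingPenalty: number
  extraInfoPenalty: number
  scale: number
}

// max(0, base_score - contradiction_penalty - missing_penalty - extra_info_penalty) × scale
function similarityScore(t: ScoreTerms): number {
  const raw = t.baseScore - t.contradictionPenalty - t.missingPenalty - t.extraInfoPenalty
  return Math.max(0, raw) * t.scale
}

// A mostly correct answer that misses one concept.
const partial = similarityScore({
  baseScore: 0.75,
  contradictionPenalty: 0,
  missingPenalty: 0.25,
  extraInfoPenalty: 0,
  scale: 1,
}) // 0.5

// A contradictory answer: the large penalty pushes the raw value below zero,
// and the max(0, ...) clamp floors the final score at 0.
const wrong = similarityScore({
  baseScore: 0.5,
  contradictionPenalty: 1,
  missingPenalty: 0,
  extraInfoPenalty: 0,
  scale: 1,
}) // 0
```

The clamp is applied before scaling, so the result always stays between 0 and `scale` as long as the base score does not exceed 1.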
## Example
Evaluate agent responses for similarity to ground truth across different scenarios:
```typescript
import { runEvals } from '@mastra/core/evals'
import { createAnswerSimilarityScorer } from '@mastra/evals/scorers/prebuilt'
import { myAgent } from './agent'

const scorer = createAnswerSimilarityScorer({ model: 'openai/gpt-5.4' })

const result = await runEvals({
  data: [
    {
      input: 'What is 2+2?',
      groundTruth: '4',
    },
    {
      input: 'What is the capital of France?',
      groundTruth: 'The capital of France is Paris',
    },
    {
      input: 'What are the primary colors?',
      groundTruth: 'The primary colors are red, blue, and yellow',
    },
  ],
  scorers: [scorer],
  target: myAgent,
  onItemComplete: ({ scorerResults }) => {
    console.log({
      score: scorerResults[scorer.id].score,
      reason: scorerResults[scorer.id].reason,
    })
  },
})

console.log(result.scores)
```
For more details on `runEvals`, see the [runEvals reference](https://mastra.ai/reference/evals/run-evals).
To add this scorer to an agent, see the [Scorers overview](https://mastra.ai/docs/evals/overview) guide.