# Answer similarity scorer

The `createAnswerSimilarityScorer()` function creates a scorer that evaluates how similar an agent's output is to a ground truth answer. This scorer is specifically designed for CI/CD testing scenarios where you have expected answers and want to ensure consistency over time.

## Parameters

**model** (`LanguageModel`): The language model used to evaluate semantic similarity between outputs and ground truth.

**options** (`AnswerSimilarityOptions`): Configuration options for the scorer.

**options.requireGroundTruth** (`boolean`): Whether to require ground truth for evaluation. If false, missing ground truth returns score 0.

**options.semanticThreshold** (`number`): Weight for semantic matches vs exact matches (0-1).

**options.exactMatchBonus** (`number`): Additional score bonus for exact matches (0-1).

**options.missingPenalty** (`number`): Penalty per missing key concept from ground truth.

**options.contradictionPenalty** (`number`): Penalty for contradictory information. High value ensures wrong answers score near 0.

**options.extraInfoPenalty** (`number`): Mild penalty for extra information not present in ground truth (capped at 0.2).

**options.scale** (`number`): Score scaling factor.

This function returns an instance of the MastraScorer class. The `.run()` method accepts the same input as other scorers (see the [MastraScorer reference](https://mastra.ai/reference/evals/mastra-scorer)), but **requires ground truth** to be provided in the run object.

## `.run()` returns

**runId** (`string`): The id of the run (optional).

**score** (`number`): Similarity score between 0-1 (or 0-scale if custom scale used). Higher scores indicate better similarity to ground truth.

**reason** (`string`): Human-readable explanation of the score with actionable feedback.

**preprocessStepResult** (`object`): Extracted semantic units from output and ground truth.

**analyzeStepResult** (`object`): Detailed analysis of matches, contradictions, and extra information.

**preprocessPrompt** (`string`): The prompt used for semantic unit extraction.

**analyzePrompt** (`string`): The prompt used for similarity analysis.

**generateReasonPrompt** (`string`): The prompt used for generating the explanation.

## Scoring details

The scorer uses a multi-step process:

1. **Extract**: Breaks down output and ground truth into semantic units
2. **Analyze**: Compares units and identifies matches, contradictions, and gaps
3. **Score**: Calculates weighted similarity with penalties for contradictions
4. **Reason**: Generates human-readable explanation

Score calculation: `max(0, base_score - contradiction_penalty - missing_penalty - extra_info_penalty) × scale`

## Example

Evaluate agent responses for similarity to ground truth across different scenarios:

```typescript
import { runEvals } from '@mastra/core/evals'
import { createAnswerSimilarityScorer } from '@mastra/evals/scorers/prebuilt'
import { myAgent } from './agent'

const scorer = createAnswerSimilarityScorer({ model: 'openai/gpt-5.4' })

const result = await runEvals({
  data: [
    {
      input: 'What is 2+2?',
      groundTruth: '4',
    },
    {
      input: 'What is the capital of France?',
      groundTruth: 'The capital of France is Paris',
    },
    {
      input: 'What are the primary colors?',
      groundTruth: 'The primary colors are red, blue, and yellow',
    },
  ],
  scorers: [scorer],
  target: myAgent,
  onItemComplete: ({ scorerResults }) => {
    console.log({
      score: scorerResults[scorer.id].score,
      reason: scorerResults[scorer.id].reason,
    })
  },
})

console.log(result.scores)
```

For more details on `runEvals`, see the [runEvals reference](https://mastra.ai/reference/evals/run-evals).

To add this scorer to an agent, see the [Scorers overview](https://mastra.ai/docs/evals/overview) guide.
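
As a rough illustration of the score calculation given under Scoring details, the formula can be written out as a small standalone function. This is only a sketch: `PenaltyInputs` and `sketchFinalScore` are hypothetical names, not part of the Mastra API, and how the scorer derives `base_score` and the individual penalty totals internally is not described here, so they are taken as plain inputs. The 0.2 cap on the extra-information penalty follows the `options.extraInfoPenalty` description.

```typescript
// Hypothetical input shape for illustration only; not a Mastra type.
interface PenaltyInputs {
  baseScore: number // weighted similarity before penalties (0-1)
  contradictionPenalty: number // total penalty for contradictory information
  missingPenalty: number // total penalty for missing key concepts
  extraInfoPenalty: number // total penalty for extra information
  scale: number // score scaling factor
}

// Applies the documented formula:
// max(0, base_score - contradiction_penalty - missing_penalty - extra_info_penalty) × scale
function sketchFinalScore(inputs: PenaltyInputs): number {
  // Assumption: the documented 0.2 cap applies to the total extra-info penalty.
  const cappedExtra = Math.min(inputs.extraInfoPenalty, 0.2)
  const raw =
    inputs.baseScore -
    inputs.contradictionPenalty -
    inputs.missingPenalty -
    cappedExtra
  // Clamp at zero so heavy penalties never produce a negative score.
  return Math.max(0, raw) * inputs.scale
}
```

Under this sketch, a large contradiction penalty drives the result to 0 regardless of the base score, which matches the note that a high `contradictionPenalty` makes wrong answers score near 0.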