@mastra/core
Version:
Mastra is a framework for building AI-powered applications and agents with a modern TypeScript stack.
105 lines (69 loc) • 3.64 kB
Markdown
# Answer relevancy scorer
The `createAnswerRelevancyScorer()` function accepts a single options object with the following properties:
## Parameters
**model** (`LanguageModel`): Configuration for the model used to evaluate relevancy.
**uncertaintyWeight** (`number`): Weight given to 'unsure' verdicts in scoring (0-1). (Default: `0.3`)
**scale** (`number`): Maximum score value. (Default: `1`)
This function returns an instance of the MastraScorer class. The `.run()` method accepts the same input as other scorers (see the [MastraScorer reference](https://mastra.ai/reference/evals/mastra-scorer)), but the return value includes LLM-specific fields as documented below.
## `.run()` returns
**runId** (`string`): The id of the run (optional).
**score** (`number`): Relevancy score (0 to scale, default 0-1)
**preprocessPrompt** (`string`): The prompt sent to the LLM for the preprocess step (optional).
**preprocessStepResult** (`object`): Object with extracted statements: { statements: string\[] }
**analyzePrompt** (`string`): The prompt sent to the LLM for the analyze step (optional).
**analyzeStepResult** (`object`): Object with results: { results: Array<{ result: 'yes' | 'unsure' | 'no', reason: string }> }
**generateReasonPrompt** (`string`): The prompt sent to the LLM for the reason step (optional).
**reason** (`string`): Explanation of the score.
## Scoring details
The scorer evaluates relevancy through query-answer alignment, considering completeness and detail level, but not factual correctness.
### Scoring Process
1. **Statement Preprocess:**
- Breaks output into meaningful statements while preserving context.
2. **Relevance Analysis:**
- Each statement is evaluated as:
- "yes": Full weight for direct matches
- "unsure": Partial weight (default: 0.3) for approximate matches
- "no": Zero weight for irrelevant content
3. **Score Calculation:**
- `((direct + uncertainty * partial) / total_statements) * scale`
### Score Interpretation
A relevancy score between 0 and 1:
- **1.0**: The response fully answers the query with relevant and focused information.
- **0.7–0.9**: The response mostly answers the query but may include minor unrelated content.
- **0.4–0.6**: The response partially answers the query, mixing relevant and unrelated information.
- **0.1–0.3**: The response includes minimal relevant content and largely misses the intent of the query.
- **0.0**: The response is entirely unrelated and doesn't answer the query.
## Example
Evaluate agent responses for relevancy across different scenarios:
```typescript
import { runEvals } from '@mastra/core/evals'
import { createAnswerRelevancyScorer } from '@mastra/evals/scorers/prebuilt'
import { myAgent } from './agent'
const scorer = createAnswerRelevancyScorer({ model: 'openai/gpt-5.4' })
const result = await runEvals({
data: [
{
input: 'What are the health benefits of regular exercise?',
},
{
input: 'What should a healthy breakfast include?',
},
{
input: 'What are the benefits of meditation?',
},
],
scorers: [scorer],
target: myAgent,
onItemComplete: ({ scorerResults }) => {
console.log({
score: scorerResults[scorer.id].score,
reason: scorerResults[scorer.id].reason,
})
},
})
console.log(result.scores)
```
For more details on `runEvals`, see the [runEvals reference](https://mastra.ai/reference/evals/run-evals).
To add this scorer to an agent, see the [Scorers overview](https://mastra.ai/docs/evals/overview) guide.
## Related
- [Faithfulness Scorer](https://mastra.ai/reference/evals/faithfulness)