# Hallucination scorer
The `createHallucinationScorer()` function evaluates whether an LLM's output stays factually consistent with the provided context. It measures hallucination by identifying statements in the output that contradict the context or that the context does not support.
## Parameters
The `createHallucinationScorer()` function accepts a single options object with the following properties:
**model** (`LanguageModel`): Configuration for the model used to evaluate hallucination.
**options** (`Options`): Configuration options.
**options.scale** (`number`): Maximum score value.
**options.context** (`string[]`): Static context strings to use as ground truth for hallucination detection.
**options.getContext** (`(params: GetContextParams) => string[] | Promise<string[]>`): A hook to dynamically resolve context at runtime. Takes priority over static context. Useful for live scoring where context (like tool results) is only available when the scorer runs.
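The precedence between `options.getContext` and `options.context` can be sketched as follows. This is a minimal illustration, not the library's implementation: `ContextOptions`, `resolveContext`, and the loose `GetContextParams` alias are hypothetical names, and the fallback to an empty list when neither option is set is an assumption.

```typescript
// Hypothetical stand-in for the library's GetContextParams type.
type GetContextParams = Record<string, unknown>;

// Hypothetical subset of the scorer options that affect context resolution.
interface ContextOptions {
  context?: string[];
  getContext?: (params: GetContextParams) => string[] | Promise<string[]>;
}

// Resolve the ground-truth context: the dynamic hook wins when provided,
// otherwise fall back to the static list (assumed empty when absent).
async function resolveContext(
  options: ContextOptions,
  params: GetContextParams,
): Promise<string[]> {
  if (options.getContext) {
    // Works for both synchronous and Promise-returning hooks.
    return await options.getContext(params);
  }
  return options.context ?? [];
}
```

With both options set, `resolveContext` returns whatever the hook produces and ignores the static list, which matches the "takes priority" behavior described for `getContext`.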
This function returns an instance of the MastraScorer class. The `.run()` method accepts the same input as other scorers (see the [MastraScorer reference](https://mastra.ai/reference/evals/mastra-scorer)), but the return value includes LLM-specific fields as documented below.
## `.run()` returns
**runId** (`string`): The id of the run (optional).
**preprocessStepResult** (`object`): Object with extracted claims: `{ claims: string[] }`
**preprocessPrompt** (`string`): The prompt sent to the LLM for the preprocess step (optional).
**analyzeStepResult** (`object`): Object with verdicts: `{ verdicts: Array<{ statement: string, verdict: 'yes' | 'no', reason: string }> }`
**analyzePrompt** (`string`): The prompt sent to the LLM for the analyze step (optional).
**score** (`number`): Hallucination score from 0 to `scale` (0 to 1 by default).
**reason** (`string`): Detailed explanation of the score and identified contradictions.
**generateReasonPrompt** (`string`): The prompt sent to the LLM for the generateReason step (optional).
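For readers who prefer types, the fields listed above can be summarized as a TypeScript interface. The names `HallucinationVerdict`, `HallucinationRunResult`, and `countVerdicts` are hypothetical and assembled only from this list; the library's actual exported types may differ.

```typescript
// Hypothetical interfaces assembled from the documented fields;
// not the library's actual type definitions.
interface HallucinationVerdict {
  statement: string;
  verdict: 'yes' | 'no';
  reason: string;
}

interface HallucinationRunResult {
  runId?: string;
  preprocessStepResult: { claims: string[] };
  preprocessPrompt?: string;
  analyzeStepResult: { verdicts: HallucinationVerdict[] };
  analyzePrompt?: string;
  score: number;
  reason: string;
  generateReasonPrompt?: string;
}

// Tally verdicts by value without assuming what 'yes' or 'no' means.
function countVerdicts(result: HallucinationRunResult): { yes: number; no: number } {
  const counts = { yes: 0, no: 0 };
  for (const v of result.analyzeStepResult.verdicts) {
    counts[v.verdict] += 1;
  }
  return counts;
}
```

A helper like `countVerdicts` can be handy when logging results, since it reports the verdict distribution alongside the single `score` value.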
## Scoring details
The scorer evaluates hallucination through contradiction detection and unsupported claim analysis.
### Scoring Process
1. Analyzes factual content:
   - Extracts statements from context
   - Identifies numerical values and dates
   - Maps statement relationships
2. Analyzes output for hallucinations:
   - Compares against context statements
   - Marks direct conflicts as hallucinations
   - Identifies unsupported claims as hallucinations
   - Evaluates numerical accuracy
   - Considers approximation context
3. Calculates hallucination score:
   - Counts hallucinated statements (contradictions and unsupported claims)
   - Divides by total statements
   - Scales to configured range
Final score: `(hallucinated_statements / total_statements) * scale`
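The formula above can be written as a small function. This is a sketch of the arithmetic only, not the scorer's source: `hallucinationScore` is a hypothetical helper, and returning 0 when there are no statements is an assumption made here to avoid dividing by zero.

```typescript
// Hypothetical helper mirroring the documented formula; not the library's code.
function hallucinationScore(
  hallucinatedStatements: number,
  totalStatements: number,
  scale = 1,
): number {
  // Assumption: with no statements there is nothing to hallucinate, so score 0.
  if (totalStatements === 0) return 0;
  return (hallucinatedStatements / totalStatements) * scale;
}

// One hallucinated statement out of four on the default 0-1 scale.
const defaultScale = hallucinationScore(1, 4); // 0.25

// The same ratio with scale set to 10.
const scaledToTen = hallucinationScore(1, 4, 10); // 2.5
```

Because the score is a ratio multiplied by `scale`, dividing a reported score by the configured `scale` gives a value between 0 and 1.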
### Important Considerations
- Claims not present in context are treated as hallucinations
- Subjective claims are hallucinations unless explicitly supported
- Speculative language ("might", "possibly") about facts IN context is allowed
- Speculative language about facts NOT in context is treated as hallucination
- Empty outputs result in zero hallucinations
- Numerical evaluation considers:
  - Scale-appropriate precision
  - Contextual approximations
  - Explicit precision indicators
### Score interpretation
A hallucination score between 0 and 1:
- **0.0**: No hallucination — all claims match the context.
- **0.3–0.4**: Low hallucination — a few contradictions.
- **0.5–0.6**: Mixed hallucination — several contradictions.
- **0.7–0.8**: High hallucination — many contradictions.
- **0.9–1.0**: Complete hallucination — most or all claims contradict the context.
**Note:** The score represents the degree of hallucination: lower scores indicate better factual alignment with the provided context.
## Examples
### Static Context
Use static context when you have known ground truth to compare against:
```typescript
import { createHallucinationScorer } from '@mastra/evals/scorers/prebuilt'

const scorer = createHallucinationScorer({
  model: 'openai/gpt-5.4',
  options: {
    context: [
      'The first iPhone was announced on January 9, 2007.',
      'It was released on June 29, 2007.',
      'Steve Jobs introduced it at Macworld.',
    ],
  },
})
```
### Dynamic Context with `getContext`
Use `getContext` for live scoring scenarios where context comes from tool results:
```typescript
import { createHallucinationScorer } from '@mastra/evals/scorers/prebuilt'
import { extractToolResults } from '@mastra/evals/scorers'

const scorer = createHallucinationScorer({
  model: 'openai/gpt-5.4',
  options: {
    getContext: ({ run, step }) => {
      // Extract tool results as context
      const toolResults = extractToolResults(run.output)
      return toolResults.map(t => JSON.stringify({ tool: t.toolName, result: t.result }))
    },
  },
})
```
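To make the shape of the resulting context concrete, here is a self-contained sketch of the mapping step. The `ToolResult` interface is a hypothetical stand-in for whatever `extractToolResults` returns (only the `toolName` and `result` fields are taken from the example above), and `toContextStrings` and the sample tool name are illustrative.

```typescript
// Hypothetical shape: only `toolName` and `result` come from the example above.
interface ToolResult {
  toolName: string;
  result: unknown;
}

// Serialize each tool result into one context string, as the hook above does.
function toContextStrings(toolResults: ToolResult[]): string[] {
  return toolResults.map(t => JSON.stringify({ tool: t.toolName, result: t.result }))
}

const contextStrings = toContextStrings([
  { toolName: 'getWeather', result: { city: 'Paris', tempC: 21 } },
])
// contextStrings[0] === '{"tool":"getWeather","result":{"city":"Paris","tempC":21}}'
```

Each tool call becomes one JSON string, so the scorer checks the output against exactly what the tools returned.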
### Live Scoring with Agent
Attach the scorer to an agent for live evaluation:
```typescript
import { Agent } from '@mastra/core/agent'
import { createHallucinationScorer } from '@mastra/evals/scorers/prebuilt'
import { extractToolResults } from '@mastra/evals/scorers'

const hallucinationScorer = createHallucinationScorer({
  model: 'openai/gpt-5.4',
  options: {
    getContext: ({ run }) => {
      const toolResults = extractToolResults(run.output)
      return toolResults.map(t => JSON.stringify({ tool: t.toolName, result: t.result }))
    },
  },
})

const agent = new Agent({
  name: 'my-agent',
  model: 'openai/gpt-5.4',
  instructions: 'You are a helpful assistant.',
  evals: {
    scorers: [hallucinationScorer],
  },
})
```
### Batch Evaluation with `runEvals`
```typescript
import { runEvals } from '@mastra/core/evals'
import { createHallucinationScorer } from '@mastra/evals/scorers/prebuilt'
import { myAgent } from './agent'

const scorer = createHallucinationScorer({
  model: 'openai/gpt-5.4',
  options: {
    context: ['Known fact 1', 'Known fact 2'],
  },
})

const result = await runEvals({
  data: [{ input: 'Tell me about topic A' }, { input: 'Tell me about topic B' }],
  scorers: [scorer],
  target: myAgent,
  onItemComplete: ({ scorerResults }) => {
    console.log({
      score: scorerResults[scorer.id].score,
      reason: scorerResults[scorer.id].reason,
    })
  },
})

console.log(result.scores)
```
For more details on `runEvals`, see the [runEvals reference](https://mastra.ai/reference/evals/run-evals).
To add this scorer to an agent, see the [Scorers overview](https://mastra.ai/docs/evals/overview) guide.
## Related
- [Faithfulness Scorer](https://mastra.ai/reference/evals/faithfulness)
- [Answer Relevancy Scorer](https://mastra.ai/reference/evals/answer-relevancy)