@mastra/core
Version:
Mastra is a framework for building AI-powered applications and agents with a modern TypeScript stack.
196 lines (143 loc) • 6.45 kB
Markdown
# Context precision scorer
The `createContextPrecisionScorer()` function creates a scorer that evaluates how relevant and well-positioned retrieved context pieces are for generating expected outputs. It uses **Mean Average Precision (MAP)** to reward systems that place relevant context earlier in the sequence.
It's especially useful for these use cases:
## RAG system evaluation
Ideal for evaluating retrieved context in RAG pipelines where:
- Context ordering matters for model performance
- You need to measure retrieval quality beyond basic relevance
- Early relevant context is more valuable than later relevant context
## Context window optimization
Use when optimizing context selection for:
- Limited context windows
- Token budget constraints
- Multi-step reasoning tasks
## Parameters
**model** (`MastraModelConfig`): The language model to use for evaluating context relevance
**options** (`ContextPrecisionMetricOptions`): Configuration options for the scorer
**Note**: Either `context` or `contextExtractor` must be provided. If both are provided, `contextExtractor` takes precedence.
## `.run()` returns
**score** (`number`): Mean Average Precision score between 0 and scale (default 0-1)
**reason** (`string`): Human-readable explanation of the context precision evaluation
## Scoring details
### Mean Average Precision (MAP)
Context Precision uses **Mean Average Precision** to evaluate both relevance and positioning:
1. **Context Evaluation**: Each context piece is classified as relevant or irrelevant for generating the expected output
2. **Precision Calculation**: For each relevant context at position `i`, precision = `relevant_items_so_far / (i + 1)`
3. **Average Precision**: Sum all precision values and divide by total relevant items
4. **Final Score**: Multiply by scale factor and round to 2 decimals
### Scoring Formula
```text
MAP = (Σ Precision@k) / R
Where:
- Precision@k = (relevant items in positions 1...k) / k
- R = total number of relevant items
- Only calculated at positions where relevant items appear
```
### Score Interpretation
- **0.9-1.0**: Excellent precision - all relevant context early in sequence
- **0.7-0.8**: Good precision - most relevant context well-positioned
- **0.4-0.6**: Moderate precision - relevant context mixed with irrelevant
- **0.1-0.3**: Poor precision - little relevant context or poorly positioned
- **0.0**: No relevant context found
### Reason analysis
The reason field explains:
- Which context pieces were deemed relevant/irrelevant
- How positioning affected the MAP calculation
- Specific relevance criteria used in evaluation
### Optimization insights
Use results to:
- **Improve retrieval**: Filter out irrelevant context before ranking
- **Optimize ranking**: Ensure relevant context surfaces early
- **Tune chunk size**: Balance context detail vs. relevance precision
- **Evaluate embeddings**: Test different embedding models for better retrieval
### Example Calculation
Given context: `[relevant, irrelevant, relevant, irrelevant]`
- Position 0: Relevant → Precision = 1/1 = 1.0
- Position 1: Skip (irrelevant)
- Position 2: Relevant → Precision = 2/3 = 0.67
- Position 3: Skip (irrelevant)
MAP = (1.0 + 0.67) / 2 = 0.835 ≈ **0.83**
## Scorer configuration
### Dynamic context extraction
```typescript
const scorer = createContextPrecisionScorer({
model: 'openai/gpt-5.4',
options: {
contextExtractor: (input, output) => {
// Extract context dynamically based on the query
const query = input?.inputMessages?.[0]?.content || ''
// Example: Retrieve from a vector database
const searchResults = vectorDB.search(query, { limit: 10 })
return searchResults.map(result => result.content)
},
scale: 1,
},
})
```
### Large context evaluation
```typescript
const scorer = createContextPrecisionScorer({
model: 'openai/gpt-5.4',
options: {
context: [
// Simulate retrieved documents from vector database
'Document 1: Highly relevant content...',
'Document 2: Somewhat related content...',
'Document 3: Tangentially related...',
'Document 4: Not relevant...',
'Document 5: Highly relevant content...',
// ... up to dozens of context pieces
],
},
})
```
## Example
Evaluate RAG system context retrieval precision for different queries:
```typescript
import { runEvals } from '@mastra/core/evals'
import { createContextPrecisionScorer } from '@mastra/evals/scorers/prebuilt'
import { myAgent } from './agent'
const scorer = createContextPrecisionScorer({
model: 'openai/gpt-5.4',
options: {
contextExtractor: (input, output) => {
// Extract context from agent's retrieved documents
return output.metadata?.retrievedContext || []
},
},
})
const result = await runEvals({
data: [
{
input: 'How does photosynthesis work in plants?',
},
{
input: 'What are the mental and physical benefits of exercise?',
},
],
scorers: [scorer],
target: myAgent,
onItemComplete: ({ scorerResults }) => {
console.log({
score: scorerResults[scorer.id].score,
reason: scorerResults[scorer.id].reason,
})
},
})
console.log(result.scores)
```
For more details on `runEvals`, see the [runEvals reference](https://mastra.ai/reference/evals/run-evals).
To add this scorer to an agent, see the [Scorers overview](https://mastra.ai/docs/evals/overview) guide.
## Comparison with context relevance
Choose the right scorer for your needs:
| Use Case | Context Relevance | Context Precision |
| ------------------------ | -------------------- | ------------------------- |
| **RAG evaluation** | When usage matters | When ranking matters |
| **Context quality** | Nuanced levels | Binary relevance |
| **Missing detection** | ✓ Identifies gaps | ✗ Not evaluated |
| **Usage tracking** | ✓ Tracks utilization | ✗ Not considered |
| **Position sensitivity** | ✗ Position agnostic | ✓ Rewards early placement |
## Related
- [Answer Relevancy Scorer](https://mastra.ai/reference/evals/answer-relevancy): Evaluates if answers address the question
- [Faithfulness Scorer](https://mastra.ai/reference/evals/faithfulness): Measures answer groundedness in context
- [Custom Scorers](https://mastra.ai/docs/evals/custom-scorers): Creating your own evaluation metrics