# @lobechat/prompts
This package contains prompt chains and templates for the LobeChat application, with comprehensive testing using promptfoo.
## Features
- **Prompt Chains**: Reusable prompt templates for various AI tasks
- **AI Testing**: Comprehensive testing using promptfoo for prompt quality assurance
- **Multi-language Support**: Prompts and tests for multiple languages
- **Type Safety**: Full TypeScript support with proper type definitions
## Available Prompt Chains
- `chainSummaryTitle` - Generate conversation titles
- `chainLangDetect` - Detect language of input text
- `chainTranslate` - Translate content between languages
- `chainPickEmoji` - Select appropriate emojis for content
- `chainAnswerWithContext` - Answer questions using knowledge base context
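Each chain builds the prompt payload for one task. A minimal usage sketch follows; the exact signatures live in `src/chains/`, so the argument shapes below are assumptions:

```typescript
import { chainSummaryTitle, chainTranslate } from '@lobechat/prompts';

// Build the messages for a title-summarization request (signature assumed):
const titlePayload = chainSummaryTitle(
  [{ role: 'user', content: 'How do I deploy LobeChat with Docker?' }],
  'en-US',
);

// Build the messages for a translation request (signature assumed):
const translatePayload = chainTranslate('Bonjour le monde', 'en-US');

// Each chain returns the prompt payload (messages) to send to the model.
console.log(titlePayload, translatePayload);
```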
## Testing with promptfoo
This package uses [promptfoo](https://promptfoo.dev) for AI-powered testing of prompts. The test suite evaluates prompt quality, consistency, and performance across different AI models.
### Prerequisites
Set up your API keys in your environment:
```bash
export OPENAI_API_KEY="your-openai-key"
export ANTHROPIC_API_KEY="your-anthropic-key" # optional
```
### Running Tests
```bash
# Run all prompt tests
pnpm test:prompts
# Run tests in watch mode for development
pnpm test:prompts:watch
# Generate summary report
pnpm test:prompts:summary
# Run tests for CI (no cache, structured output)
pnpm test:prompts:ci
# View test results in web UI
pnpm promptfoo:view
```
### Test Configuration
Tests are organized by prompt type in the `promptfoo/` directory:
```
promptfoo/
├── summary-title/
│   ├── eval.yaml          # Test configuration
│   └── prompt.ts          # Prompt wrapper
├── translation/
│   ├── eval.yaml
│   └── prompt.ts
├── language-detection/
│   ├── eval.yaml
│   └── prompt.ts
├── emoji-picker/
│   ├── eval.yaml
│   └── prompt.ts
└── knowledge-qa/
    ├── eval.yaml
    └── prompt.ts
```
Each test configuration includes:
- Multiple test cases with different inputs
- Assertions for output validation (regex, JSON, custom logic; see the assertion sketch after this list)
- LLM-based rubric evaluation for semantic correctness
- Performance and cost monitoring
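For the custom-logic assertions, promptfoo can load a JavaScript/TypeScript file that exports a grading function. A minimal sketch, with a hypothetical file name and wiring:

```typescript
// promptfoo/asserts/valid-json-title.ts (hypothetical file)
// Wire it into eval.yaml as:
//   assert:
//     - type: javascript
//       value: file://asserts/valid-json-title.ts
export default function (output: string): boolean {
  // Pass only if the model returned valid JSON with a non-empty "title" field.
  try {
    const parsed = JSON.parse(output);
    return typeof parsed.title === 'string' && parsed.title.trim().length > 0;
  } catch {
    return false;
  }
}
```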
### Test Structure
Tests exercise the actual prompt chain functions from `src/chains/`: the TypeScript wrapper files in `promptfoo/prompts/` import and call the real chain functions, so the tests always run against the current implementation. A typical `eval.yaml` looks like this:
```yaml
description: Test description
providers:
  - openai:gpt-4o-mini
  - anthropic:claude-3-5-haiku-latest
prompts:
  - file://prompts/summary-title.ts # Imports and uses src/chains/summaryTitle.ts
tests:
  - vars:
      messages: [...]
      locale: 'en-US'
    assert:
      - type: llm-rubric
        value: 'Expected behavior description'
        provider: openai:gpt-4o # Specify grader model for LLM rubric
      - type: contains
        value: 'expected text'
      - type: not-contains
        value: 'unwanted text'
```
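A wrapper file is just a prompt function that promptfoo invokes with each test case's `vars`. A minimal sketch; the import path and the chain's return shape are assumptions, so check the real files in `promptfoo/prompts/`:

```typescript
// promptfoo/prompts/summary-title.ts (illustrative sketch)
import { chainSummaryTitle } from '../../src/chains/summaryTitle';

// promptfoo calls the default export with the test case's vars and uses
// the returned messages as the prompt for every configured provider.
export default function ({ vars }: { vars: { messages: any[]; locale: string } }) {
  return chainSummaryTitle(vars.messages, vars.locale).messages;
}
```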
### Adding New Tests
1. Create a test configuration file in `promptfoo/`
2. Create a TypeScript wrapper in `promptfoo/prompts/` that imports and calls your chain function from `src/chains/`
3. Add the test to `promptfooconfig.yaml`
4. Run tests to validate
**Advantage**: The wrapper files automatically stay in sync with source code changes since they directly import and use the actual chain functions.
### Performance Monitoring
Tests include performance monitoring:
- Response time tracking
- Cost per request monitoring
- Quality score evaluation
- Cross-model consistency checks
### CI Integration
The `test:prompts:ci` script is designed for continuous integration:
- Structured JSON output for parsing
- No interactive prompts
- Clear pass/fail status codes
- Detailed error reporting
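If your pipeline needs to act on the results, the JSON output can be checked with a small script. A sketch, assuming the run was saved with `promptfoo eval -o`; the output path and exact JSON shape are assumptions and may vary by promptfoo version:

```typescript
// scripts/check-prompt-results.ts (hypothetical helper)
import { readFileSync } from 'node:fs';

const report = JSON.parse(readFileSync('promptfoo/results/latest.json', 'utf8'));
// promptfoo summarizes pass/fail counts under results.stats (assumed shape;
// verify against the JSON your promptfoo version emits).
const failures: number = report?.results?.stats?.failures ?? 0;

if (failures > 0) {
  console.error(`${failures} prompt test(s) failed`);
  process.exit(1);
}
console.log('All prompt tests passed');
```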
## Development
```bash
# Install dependencies
pnpm install
# Run unit tests
pnpm test
# Run prompt tests
pnpm test:prompts
# Run all tests
pnpm test && pnpm test:prompts
```
## Contributing
When adding new prompt chains:
1. Implement the prompt function in `src/chains/`
2. Add unit tests in `src/chains/__tests__/`
3. Create promptfoo tests in `promptfoo/`
4. Update this README with the new chain description
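As a shape reference, a new chain might look like the sketch below. This is a hypothetical chain with locally declared types; check `src/chains/` for the types the package actually exports:

```typescript
// src/chains/keywordExtract.ts (hypothetical example, not a real chain)
interface ChatMessage {
  content: string;
  role: 'assistant' | 'system' | 'user';
}

/**
 * Build the message payload for extracting keywords from a piece of content.
 * Mirrors the payload shape used by the existing chains (an assumption).
 */
export const chainKeywordExtract = (content: string): { messages: ChatMessage[] } => ({
  messages: [
    {
      content:
        'You are an assistant that extracts 3-5 keywords from the given text. Output ONLY a comma-separated list, no explanations.',
      role: 'system',
    },
    { content, role: 'user' },
  ],
});
```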
## Architecture
The package follows a layered architecture:
```
src/
├── chains/      # Prompt chain implementations
├── prompts/     # Prompt templates and utilities
└── index.ts     # Main exports

promptfoo/
├── prompts/     # Prompt implementations for testing
├── *.yaml       # Test configurations
└── results/     # Test output directory
```
## Best Practices
1. **Test Coverage**: Every prompt chain should have comprehensive promptfoo tests
2. **Multi-language**: Test prompts with multiple languages when applicable
3. **Edge Cases**: Include tests for edge cases and error conditions
4. **Performance**: Monitor cost and response time in tests
5. **Consistency**: Use consistent assertion patterns across tests
6. **Prompt Optimization**: Use test results to iteratively improve prompts (see CLAUDE.md for optimization workflow)
## Prompt Optimization Workflow
This package follows an iterative prompt optimization process using promptfoo test results:
### Example: Translation Prompt Optimization
**Initial State**: 85% pass rate, with two recurring issues:
- Claude models prepended explanatory text ("以下是翻译...", i.e. "The following is the translation...")
- GPT models over-translated technical terms (`API_KEY_12345` → `API 密钥_12345`)
**Optimization Process**:
1. **Identify Failures**: Run tests and analyze specific failure patterns
2. **Update Prompts**: Modify prompt rules based on failure analysis
- Added: "Output ONLY the translated text, no explanations"
- Added: "Preserve technical terms, code identifiers, API keys exactly as they appear"
3. **Re-run Tests**: Validate improvements across all models
4. **Iterate**: Repeat until a 100% pass rate is achieved
**Final Result**: 100% pass rate (14/14 tests) across GPT-5-mini, Claude-3.5-Haiku, and Gemini-Flash
### Example: Knowledge Q&A Optimization
**Initial State**: 71.43% pass rate with context handling issues
**Optimization Journey**:
- **Round 1** (80.95%): Clarified context relevance checking
- **Round 2** (90.48%): Distinguished between "no context" vs "irrelevant context"
- **Round 3** (92.86%): Added explicit rules for partial context
- **Round 4** (96.43%): Emphasized supplementing with general knowledge
- **Final** (100%): Added concrete example and MUST/SHOULD directives
**Key Learning**: When context is topic-relevant but information-limited, models should:
- Use context as foundation
- Supplement with general knowledge
- Provide practical, actionable guidance
See `CLAUDE.md` for detailed prompt engineering guidelines.