@equinor/fusion-framework-cli
Version:
[](./LICENSE)
578 lines (434 loc) • 18.1 kB
Markdown
> [!CAUTION]
>
> **⚠️ INTERNAL USE ONLY - NOT FOR EXTERNAL USE**
>
> **This feature is exclusively for internal use by the Fusion Core team.**
>
> This AI functionality is **NOT supported for third-party users** and is not part of the public API. These commands are experimental and intended only for internal development and documentation purposes within the Fusion Framework team.
>
> **Do not use these commands if you are:**
> - A third-party developer using Fusion Framework
> - Building applications on top of Fusion Framework
> - Expecting official support or documentation for external use
>
> For questions or issues related to these commands, contact the Fusion Core team directly.
---
The Fusion Framework CLI provides powerful AI commands for interacting with Large Language Models (LLMs), generating document embeddings, and performing semantic search. These commands integrate with Azure OpenAI and Azure Cognitive Search to enable intelligent codebase understanding and Q&A capabilities.
## Overview
The `ai` command group includes three main subcommands:
- **`chat`** - Interactive chat with AI models using vector store context retrieval
- **`embeddings`** - Generate embeddings from markdown and TypeScript files for semantic search
- **`search`** - Search the vector store to validate embeddings and retrieve relevant documents
## Prerequisites
Before using the AI commands, you need:
1. **Azure OpenAI Service** with:
- Chat model deployment (e.g., GPT-4, GPT-3.5-turbo)
- Embedding model deployment (e.g., text-embedding-ada-002)
2. **Azure Cognitive Search** with:
- A search service instance
- A search index configured for vector search
3. **Configuration** - Via environment variables (`.env` file for local development or GitHub Variables/Secrets for CI/CD)
## Configuration
### Environment Variables
All AI commands are configured via environment variables. For local development, use a `.env` file in your project root. For CI/CD, use GitHub Variables and Secrets.
#### Local Development (.env file)
Create a `.env` file in your project root:
```bash
# Azure OpenAI Configuration
AZURE_OPENAI_API_KEY=your-api-key
AZURE_OPENAI_API_VERSION=2024-02-15-preview
AZURE_OPENAI_INSTANCE_NAME=your-instance-name
AZURE_OPENAI_CHAT_DEPLOYMENT_NAME=gpt-4
AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME=text-embedding-ada-002
# Azure Cognitive Search Configuration
AZURE_SEARCH_ENDPOINT=https://your-search.search.windows.net
AZURE_SEARCH_API_KEY=your-search-api-key
AZURE_SEARCH_INDEX_NAME=your-index-name
```
**Note:** Add `.env` to your `.gitignore` to keep credentials secure.
#### CI/CD (GitHub Actions)
For GitHub Actions workflows, configure:
- **Secrets** (Settings → Secrets and variables → Actions → Secrets): For sensitive data
- `AZURE_OPENAI_API_KEY`
- `AZURE_SEARCH_API_KEY`
- **Variables** (Settings → Secrets and variables → Actions → Variables): For non-sensitive configuration
- `AZURE_OPENAI_API_VERSION`
- `AZURE_OPENAI_INSTANCE_NAME`
- `AZURE_OPENAI_CHAT_DEPLOYMENT_NAME`
- `AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME`
- `AZURE_SEARCH_ENDPOINT`
- `AZURE_SEARCH_INDEX_NAME`
### Configuration File
For the `embeddings` command, you can create a `fusion-ai.config.ts` file in your project root:
```typescript
import { configureFusionAI } from '@equinor/fusion-framework-cli-plugin-ai-index';
import type { FusionAIConfigWithIndex } from '@equinor/fusion-framework-cli-plugin-ai-index';
export default configureFusionAI((): FusionAIConfigWithIndex => {
return {
// Index-specific configuration
index: {
// File patterns to match for processing
patterns: [
'packages/**/src/**/*.{ts,tsx}',
'packages/**/docs/**/*.md',
'packages/**/README.md',
],
// Embedding generation configuration
embedding: {
// Size of text chunks for embedding
chunkSize: 1000,
// Overlap between chunks to maintain context
chunkOverlap: 200,
},
// Metadata processing configuration
metadata: {
// Optional: Custom metadata processor
// attributeProcessor: (metadata, document) => {
// // Transform or filter metadata attributes
// return metadata;
// },
},
},
};
});
```
## Commands
### `ai chat`
Interactive chat with Large Language Models enhanced with vector store context retrieval (RAG - Retrieval-Augmented Generation).
#### Features
- Real-time streaming responses from AI models
- Automatic context retrieval from vector store for enhanced accuracy
- Intelligent message history compression using AI summarization
- Special commands: `exit`, `quit`, `clear`, `help`
- Configurable context retrieval limits
#### Usage
```bash
ffc ai chat [options]
```
#### Options
| Option | Description | Default |
|--------|-------------|---------|
| `--context-limit <number>` | Max context documents to retrieve | `5` |
| `--history-limit <number>` | Max messages in conversation history | `20` |
| `--verbose` | Enable verbose output | `false` |
**Note:** Azure configuration (API keys, endpoints, etc.) is provided via environment variables (`.env` file or GitHub Variables/Secrets), not command-line options.
#### Interactive Commands
While in chat mode, you can use these special commands:
- `exit` or `quit` - End the conversation
- `clear` - Clear conversation history
- `help` - Show available commands
- `Ctrl+C` - Exit immediately
#### Examples
```bash
# Start interactive chat with default settings
# (Azure configuration loaded from .env file)
ffc ai chat
# Increase context retrieval limit for more comprehensive responses
ffc ai chat --context-limit 10
# Increase conversation history limit for longer sessions
ffc ai chat --history-limit 100
# Enable verbose output for debugging
ffc ai chat --verbose
```
#### How It Works
1. **Context Retrieval**: When you ask a question, the system automatically searches the vector store for relevant documents
2. **Message Formatting**: Retrieved context is included in the system message to provide the AI with relevant information
3. **Streaming Response**: The AI response streams in real-time for immediate feedback
4. **History Management**: Conversation history is automatically compressed when it reaches 10 messages to maintain context while reducing token usage
### `ai embeddings`
Generate embeddings from markdown and TypeScript files for semantic search indexing.
#### Features
- Markdown document chunking with frontmatter extraction
- TypeScript/TSX TSDoc extraction and chunking
- Glob pattern support for file collection
- Recursive directory processing
- Dry-run mode for testing
- Git diff-based processing for CI/CD workflows
- Automatic metadata extraction (git commit, author, etc.)
#### Usage
```bash
ffc ai embeddings [options] [glob-patterns...]
```
#### Options
| Option | Description | Default |
|--------|-------------|---------|
| `--dry-run` | Show what would be processed without doing it | `false` |
| `--config <path>` | Path to config file | `fusion-ai.config.ts` |
| `--recursive` | Process directories recursively | `false` |
| `--diff` | Process only changed files (workflow mode) | `false` |
| `--base-ref <ref>` | Git reference to compare against | `HEAD~1` |
| `--clean` | Delete all existing documents before processing | `false` |
**Note:** Azure configuration (API keys, endpoints, etc.) is provided via environment variables (`.env` file or GitHub Variables/Secrets), not command-line options.
#### Examples
```bash
# Dry run to see what would be processed
ffc ai embeddings --dry-run ./src
# Process all TypeScript and Markdown files in a directory
ffc ai embeddings ./src
# Process only changed files (useful for CI/CD)
ffc ai embeddings --diff ./src
# Process changed files compared to a specific branch
ffc ai embeddings --diff --base-ref origin/main ./src
# Clean and re-index all documents
ffc ai embeddings --clean ./src
# Process specific file patterns
ffc ai embeddings "packages/**/*.ts" "docs/**/*.md" "docs/**/*.mdx"
# Use custom config file
ffc ai embeddings --config ./custom-ai.config.ts ./src
```
#### Workflow Integration
The `--diff` flag is particularly useful for CI/CD pipelines. See the [CI/CD Integration](#cicd-integration) section for complete workflow examples.
#### File Processing
The command processes two types of files:
1. **Markdown/MDX Files** (`.md`, `.mdx`):
- Extracts frontmatter metadata
- Chunks content based on headers and structure
- Preserves document hierarchy
2. **TypeScript/TSX Files** (`.ts`, `.tsx`):
- Extracts TSDoc comments from functions, classes, and interfaces
- Chunks large TSDoc blocks intelligently
- Preserves code context and signatures
#### Metadata
Each document is enriched with metadata:
- `source` - File path relative to project root
- `rootPath` - Project root directory
- Git metadata (when available):
- `commit` - Git commit hash
- `author` - Commit author
- `date` - Commit date
- `message` - Commit message
### `ai search`
Search the vector store to validate embeddings and retrieve relevant documents using semantic search.
#### Features
- Semantic search using vector embeddings
- Configurable result limits
- OData filter support for metadata-based filtering
- JSON output option for programmatic use
- Detailed result display with scores and metadata
#### Usage
```bash
ffc ai search <query> [options]
```
#### Options
| Option | Description | Default |
|--------|-------------|---------|
| `<query>` | Search query string | Required |
| `--limit <number>` | Maximum number of results | `10` |
| `--filter <expression>` | OData filter expression for metadata filtering | - |
| `--json` | Output results as JSON | `false` |
| `--raw` | Output raw metadata without normalization | `false` |
| `--verbose` | Enable verbose output | `false` |
**Note:** Azure configuration (API keys, endpoints, etc.) is provided via environment variables (`.env` file or GitHub Variables/Secrets), not command-line options.
#### Examples
```bash
# Basic search
ffc ai search "how to use the framework"
# Limit results
ffc ai search "authentication" --limit 5
# Filter by source file
ffc ai search "typescript" --filter "metadata/source eq 'src/index.ts'"
# Output as JSON for programmatic use
ffc ai search "documentation" --json
# Output raw metadata structure
ffc ai search "documentation" --json --raw
# Enable verbose output
ffc ai search "API reference" --verbose
```
#### OData Filter Examples
The `--filter` option supports OData filter expressions:
```bash
# Filter by source file
--filter "metadata/source eq 'packages/framework/src/index.ts'"
# Filter by multiple sources
--filter "metadata/source eq 'src/a.ts' or metadata/source eq 'src/b.ts'"
# Filter by commit author
--filter "metadata/attributes/author eq 'John Doe'"
# Filter by date range (if available in metadata)
--filter "metadata/attributes/date gt '2024-01-01'"
```
#### Output Formats
**Human-readable format** (default):
- Shows results with scores, sources, and content previews
- Truncates long content for readability
- Displays metadata in a structured format
**JSON format** (`--json`):
- Machine-readable output
- Full document content and metadata
- Suitable for piping to other tools or scripts
## Common Workflows
### Initial Setup
1. **Create `.env` file** in your project root with Azure configuration:
```bash
# See Configuration section above for all required variables
AZURE_OPENAI_API_KEY=your-key
AZURE_OPENAI_INSTANCE_NAME=your-instance
# ... other variables
```
2. **Create configuration file** (optional):
```bash
# Create fusion-ai.config.ts in project root
```
3. **Generate initial embeddings**:
```bash
ffc ai embeddings --clean ./src
```
4. **Verify embeddings with search**:
```bash
ffc ai search "test query"
```
5. **Start using chat**:
```bash
ffc ai chat
```
### Incremental Updates
For ongoing development, use diff-based processing:
```bash
# Process only changed files
ffc ai embeddings --diff ./src
```
### CI/CD Integration
#### Basic Workflow
```yaml
# Example GitHub Actions workflow
name: Update AI Index
on:
push:
branches: [main]
jobs:
update-embeddings:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0 # Required for --diff to work
- uses: pnpm/action-setup@v2
with:
version: 9
- uses: actions/setup-node@v4
with:
node-version: '24'
- run: pnpm install
- name: Update embeddings
run: |
ffc ai embeddings --diff
env:
AZURE_OPENAI_API_KEY: ${{ secrets.AZURE_OPENAI_API_KEY }}
AZURE_OPENAI_API_VERSION: ${{ vars.AZURE_OPENAI_API_VERSION }}
AZURE_OPENAI_INSTANCE_NAME: ${{ vars.AZURE_OPENAI_INSTANCE_NAME }}
AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME: ${{ vars.AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME }}
AZURE_SEARCH_ENDPOINT: ${{ vars.AZURE_SEARCH_ENDPOINT }}
AZURE_SEARCH_API_KEY: ${{ secrets.AZURE_SEARCH_API_KEY }}
AZURE_SEARCH_INDEX_NAME: ${{ vars.AZURE_SEARCH_INDEX_NAME }}
```
#### Advanced Workflow with SHA Tracking
For production use, track the last indexed commit SHA to enable efficient incremental updates:
```yaml
name: Index Documentation
on:
push:
branches: [main]
jobs:
index-docs:
runs-on: ubuntu-latest
permissions:
contents: write # Required to commit the SHA file
steps:
- name: Checkout repository
uses: actions/checkout@v4
with:
fetch-depth: 0 # Fetch full history for diff comparison
- name: Setup Node.js
uses: actions/setup-node@v4
with:
node-version: '24'
cache: 'pnpm'
- name: Install dependencies
run: pnpm install
- name: Get last indexed SHA
id: last_sha
run: |
if [ -f .index-base-ref ]; then
echo "last_sha=$(cat .index-base-ref)" >> $GITHUB_OUTPUT
else
echo "last_sha=" >> $GITHUB_OUTPUT
fi
- name: Index documentation
run: |
if [ -n "${{ steps.last_sha.outputs.last_sha }}" ]; then
ffc ai embeddings --diff --base-ref ${{ steps.last_sha.outputs.last_sha }}
else
ffc ai embeddings
fi
env:
AZURE_OPENAI_API_KEY: ${{ secrets.AZURE_OPENAI_API_KEY }}
AZURE_OPENAI_API_VERSION: ${{ vars.AZURE_OPENAI_API_VERSION }}
AZURE_OPENAI_INSTANCE_NAME: ${{ vars.AZURE_OPENAI_INSTANCE_NAME }}
AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME: ${{ vars.AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME }}
AZURE_SEARCH_API_KEY: ${{ secrets.AZURE_SEARCH_API_KEY }}
AZURE_SEARCH_ENDPOINT: ${{ vars.AZURE_SEARCH_ENDPOINT }}
AZURE_SEARCH_INDEX_NAME: ${{ vars.AZURE_SEARCH_INDEX_NAME }}
- name: Save current SHA
run: echo ${{ github.sha }} > .index-base-ref
- name: Commit and push SHA file
run: |
git config --local user.email "action@github.com"
git config --local user.name "GitHub Action"
git add .index-base-ref
git commit -m "Update last indexed SHA" || echo "No changes to commit"
git push
```
**Benefits of SHA tracking:**
- Only processes files changed since the last successful indexing
- Handles cases where previous runs may have failed
- Automatically falls back to full indexing on first run
- More efficient than comparing against a fixed branch reference
**GitHub Secrets and Variables:**
- **Secrets** (Settings → Secrets and variables → Actions → Secrets): Store sensitive data like API keys
- `AZURE_OPENAI_API_KEY`
- `AZURE_SEARCH_API_KEY`
- **Variables** (Settings → Secrets and variables → Actions → Variables): Store non-sensitive configuration
- `AZURE_OPENAI_API_VERSION`
- `AZURE_OPENAI_INSTANCE_NAME`
- `AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME`
- `AZURE_SEARCH_ENDPOINT`
- `AZURE_SEARCH_INDEX_NAME`
## Troubleshooting
### Common Issues
**Error: "API key is required"**
- Ensure environment variables are set in your `.env` file
- Verify the `.env` file is in your project root
- Check that variable names match exactly (case-sensitive)
- For CI/CD, verify GitHub Secrets and Variables are configured
**Error: "Azure Search index name is required"**
- Ensure `AZURE_SEARCH_INDEX_NAME` is set
- Verify the index exists in your Azure Search service
**No results from search**
- Verify embeddings have been generated: `ffc ai embeddings --dry-run ./src`
- Check that the index contains documents
- Try a broader search query
**Chat not retrieving context**
- Verify vector store is configured correctly in your `.env` file
- Check that embeddings exist in the index
- Ensure `AZURE_SEARCH_INDEX_NAME` is set correctly
**Embeddings command processes no files**
- Check file patterns in config or command arguments
- Verify files match the patterns (use `--dry-run` to debug)
- Ensure files are not ignored by `.gitignore`
### Debug Mode
Use the `--verbose` flag for detailed output:
```bash
ffc ai chat --verbose
ffc ai search "query" --verbose
```
## Best Practices
1. **Use environment variables** for sensitive credentials
2. **Start with `--dry-run`** when testing new configurations
3. **Use `--diff` in CI/CD** to only process changed files
4. **Regular re-indexing** with `--clean` to keep index fresh
5. **Monitor token usage** - embeddings and chat consume API tokens
6. **Test search queries** before relying on chat context retrieval
7. **Keep configuration files** in version control (without secrets)
## Additional Resources
- [Azure OpenAI Documentation](https://learn.microsoft.com/azure/ai-services/openai/)
- [Azure Cognitive Search Documentation](https://learn.microsoft.com/azure/search/)
- [LangChain Documentation](https://js.langchain.com/) (used internally for RAG)