# @sap-ux/fiori-docs-embeddings
A Node.js package that indexes SAP Fiori related documentation from multiple sources and generates vector embeddings for semantic search capabilities.
## What it does
This module:
- Crawls documentation from GitHub repositories and JSON APIs (see Configuration for the list of default sources). Example sources:
  - https://github.com/SAP-docs/btp-fiori-tools (Fiori Tools documentation)
  - https://github.com/SAP-docs/sapui5/tree/main/docs/06_SAP_Fiori_Elements (UI5 Fiori Elements documentation)
  - https://github.com/SAP-samples/fiori-tools-samples (Sample applications)
  - https://github.com/SAP-samples/fiori-elements-feature-showcase (Feature examples)
  - https://ui5.sap.com/test-resources/sap/fe/macros/designtime/apiref/api.json (Design Time API Reference)
  - https://ui5.sap.com/test-resources/sap/fe/core/fpmExplorer/index.html#/overview/introduction (SAP Fiori Development Portal)
- Parses markdown, JSON, TypeScript, and other file types
- Generates vector embeddings using a transformer model (sentence-transformers/all-MiniLM-L6-v2)
- Stores embeddings in a local LanceDB vector database
- Provides tools for semantic and keyword search across documentation
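To illustrate what semantic search over embeddings means, here is a minimal sketch that ranks documents by cosine similarity to a query vector. It does not use LanceDB or the package's own API. The helper names `cosineSimilarity` and `rankBySimilarity`, the file names, and the toy three-dimensional vectors are assumptions made for this example only.

```javascript
// Illustrative only: these helpers are not part of @sap-ux/fiori-docs-embeddings.

/** Cosine similarity between two equal-length numeric vectors. */
function cosineSimilarity(a, b) {
    if (a.length !== b.length) {
        throw new Error('Vectors must have the same length');
    }
    let dot = 0;
    let normA = 0;
    let normB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    if (normA === 0 || normB === 0) {
        return 0; // a zero vector has no direction, so treat it as unrelated
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

/** Rank documents ({ id, vector }) by similarity to the query vector, best first. */
function rankBySimilarity(queryVector, documents, limit = 3) {
    return documents
        .map((doc) => ({ id: doc.id, score: cosineSimilarity(queryVector, doc.vector) }))
        .sort((x, y) => y.score - x.score)
        .slice(0, limit);
}

// Toy 3-dimensional vectors; real embeddings have far more dimensions.
const hits = rankBySimilarity([1, 0, 0], [
    { id: 'list-report.md', vector: [0.9, 0.1, 0] },
    { id: 'deployment.md', vector: [0, 1, 0] },
    { id: 'object-page.md', vector: [0.5, 0.5, 0] }
], 2);
console.log(hits.map((hit) => hit.id)); // [ 'list-report.md', 'object-page.md' ]
```

A vector database typically performs this ranking with an index instead of a full scan, but the underlying idea is the same.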
## Installation
```bash
npm install @sap-ux/fiori-docs-embeddings
```
## Usage
### Basic Usage
```javascript
import { getDataPath, getEmbeddingsPath } from '@sap-ux/fiori-docs-embeddings';
// Get paths to data directories
const docsPath = getDataPath();
const embeddingsPath = getEmbeddingsPath();
```
### Building Documentation Index
```bash
# Set GitHub token to avoid rate limits
export GITHUB_TOKEN=your_github_token
# Build documentation index
npm run update-docs
# Generate embeddings
npm run update-embeddings
# Or do both
npm run update-all
```
### Available Scripts
- `update-docs` - Crawl and index documentation from configured sources
- `update-embeddings` - Generate vector embeddings from indexed documents
- `update-all` - Run both documentation indexing and embedding generation
### Configuration
The module indexes documentation from these sources by default:
- SAP-docs/btp-fiori-tools (Fiori Tools documentation)
- SAP-docs/sapui5 (UI5 framework documentation)
- SAP-samples/fiori-tools-samples (Sample applications)
- SAP-samples/fiori-elements-feature-showcase (Feature examples)
### Environment Variables
- `GITHUB_TOKEN` - GitHub personal access token (recommended to avoid rate limits)
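As a rough sketch of how a token from the environment might be applied, the snippet below builds request headers for the GitHub REST API. `buildGitHubHeaders` is a hypothetical helper written for this example, not a function exported by the package, and the package's own crawler may handle authentication differently.

```javascript
// Hypothetical helper, not exported by @sap-ux/fiori-docs-embeddings.
function buildGitHubHeaders(token) {
    const headers = { Accept: 'application/vnd.github+json' };
    if (token) {
        // Authenticated requests get a much higher rate limit than anonymous ones.
        headers.Authorization = `Bearer ${token}`;
    }
    return headers;
}

const headers = buildGitHubHeaders(process.env.GITHUB_TOKEN);
console.log(headers.Authorization ? 'Authenticated requests' : 'Anonymous requests (lower rate limit)');
```

Leaving `GITHUB_TOKEN` unset still works in this sketch, which matches the variable being recommended rather than required.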
### Data Structure
Generated data is organized as:
```
data/
├── docs/ # Parsed documentation files
├── embeddings/ # Vector database (LanceDB)
└── search/ # Search indexes
```
## Features
- **Multi-source indexing** - Supports GitHub repositories and JSON APIs
- **File type support** - Markdown, JSON, TypeScript, JavaScript, XML, YAML, and more
- **Vector embeddings** - Uses the sentence-transformers/all-MiniLM-L6-v2 model
- **Local storage** - All data is stored locally with LanceDB
- **Caching** - Caches fetched content to avoid unnecessary API calls
- **Chunking** - Splits documents into chunks suited to embedding generation
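The chunking feature above can be pictured with a simple fixed-size splitter that overlaps neighbouring chunks so context is not lost at the boundaries. This is only a sketch: `chunkText`, its default sizes, and the character-based strategy are assumptions, and the package's actual chunking logic may differ.

```javascript
// Hypothetical sketch: the chunk size and overlap values are assumptions,
// not the settings used by the package.
function chunkText(text, maxChars = 1000, overlap = 100) {
    if (overlap >= maxChars) {
        throw new Error('overlap must be smaller than maxChars');
    }
    const chunks = [];
    const step = maxChars - overlap; // how far the window advances each time
    for (let start = 0; start < text.length; start += step) {
        chunks.push(text.slice(start, start + maxChars));
        if (start + maxChars >= text.length) {
            break; // the last chunk already reaches the end of the text
        }
    }
    return chunks;
}

const chunks = chunkText('abcdefghij', 4, 1);
console.log(chunks); // [ 'abcd', 'defg', 'ghij' ]
```

Each chunk shares its first character with the end of the previous one here. With realistic sizes, the overlap would span a sentence or two.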
## Development
```bash
# Install dependencies
npm install
# Run tests
npm test
# Check types
npm run build-compiler
# Clean generated files
npm run clean
```
## License
Apache-2.0