# Hikma Engine
A TypeScript-based code knowledge graph indexer that transforms Git repositories into searchable knowledge stores for AI agents. Creates interconnected representations of codebases through AST parsing and vector embeddings.
## Features
- **AST-based code structure extraction**: Deep understanding of code relationships
- **Vector embeddings**: Semantic similarity search with multiple providers
- **Configurable LLM providers**: Support for local Python models, OpenAI API, and local services (LM Studio, Ollama)
- **Python ML integration**: Advanced embedding models via Python bridge
- **Intelligent fallback system**: Automatic fallback between providers for reliability
- **Comprehensive monitoring**: Request tracking, performance metrics, and error analysis
- **Unified CLI**: Single `hikma-engine` command for all operations (embed, search, rag)
- **SQLite storage**: Unified storage with sqlite-vec extension
## Installation
### Prerequisites
- Node.js >= 20.0.0
- Git repository for indexing
- Python 3.10+ (needed only for the Python provider)
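A quick way to verify the prerequisites:
```bash
node --version    # expect v20 or later
git --version
python3 --version # 3.10+ (only needed for the Python provider)
```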
### Clone and install the project
```bash
# Clone repository
git clone https://github.com/foyzulkarim/hikma-engine
cd hikma-engine
# Install dependencies
npm install
```
For the Python provider, set up the Python dependencies:
```bash
# After installing hikma-engine
npm run setup-python
```
## CLI Usage
Hikma Engine provides three main commands (`embed`, `search`, and `rag`) and takes an **explicit CLI approach** that requires no configuration files.
### Key Features
- **No .env dependencies**: All configuration is explicit via CLI flags
- **Required provider**: `--provider` is mandatory for all commands
- **NPX-friendly**: Works perfectly with `npx` without local installation
- **Self-documenting**: All options are visible and explicit
- **Scriptable**: Perfect for CI/CD and automation
### Quick Examples
#### Using Python Provider (Local Models)
```bash
# Embed with Python provider
npm run embed -- --provider python --embedding-model "mixedbread-ai/mxbai-embed-large-v1"
# Search with Python provider
npm run search -- "database configuration" --provider python --embedding-model "mixedbread-ai/mxbai-embed-large-v1"
# RAG with Python provider
npm run rag -- "How does authentication work?" --provider python --embedding-model "mixedbread-ai/mxbai-embed-large-v1" --llm-model "Qwen/Qwen2.5-Coder-1.5B-Instruct"
```
#### Using Server Provider (Ollama/LM Studio)
```bash
# Embed with Ollama
npm run embed -- --provider server --server-url http://localhost:11434 --embedding-model mxbai-embed-large:latest
# Search with Ollama
npm run search -- "database configuration" --provider server --server-url http://localhost:11434 --embedding-model mxbai-embed-large:latest
# RAG with Ollama
npm run rag -- "How does authentication work?" --provider server --server-url http://localhost:11434 --embedding-model mxbai-embed-large:latest --llm-model qwen2.5-coder:7b --max-tokens 3000
```
#### Using NPX (No Local Installation)
```bash
# Works anywhere without installing hikma-engine locally
npx hikma-engine embed --provider python --embedding-model "mixedbread-ai/mxbai-embed-large-v1" --dir /path/to/project
npx hikma-engine search "authentication" --provider server --server-url http://localhost:11434 --embedding-model mxbai-embed-large:latest --dir /path/to/project
npx hikma-engine rag "How does this work?" --provider server --server-url http://localhost:11434 --embedding-model mxbai-embed-large:latest --llm-model qwen2.5-coder:7b --dir /path/to/project
```
### Required and Optional Flags
#### Required for All Commands
- `--provider <python|server|local|transformers>`: **REQUIRED** - Specifies the AI provider to use
#### Required for Server Provider
- `--server-url <url>`: **REQUIRED when using `--provider server`** - Base URL for OpenAI-compatible server
#### Common Optional Flags
- `--dir <path>`: Project directory (defaults to current directory)
- `--embedding-model <model>`: Override default embedding model
- `--llm-model <model>`: Override default LLM model (for `rag` command)
- `--install-python-deps`: Auto-install Python dependencies when using Python provider
#### Command-Specific Flags
- **embed**: `--force-full`, `--skip-embeddings`
- **search**: `--limit <n>`, `--min-similarity <0..1>`
- **rag**: `--top-k <n>`, `--max-tokens <n>` (see the combined examples below)
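Combining these flags, for example (flag names are as documented above; queries and values are purely illustrative):
```bash
# Re-index everything, skipping embedding generation
npm run embed -- --provider python --force-full --skip-embeddings

# Cap the number of results and require a minimum similarity score
npm run search -- "token validation" --provider python --limit 10 --min-similarity 0.7

# Use more context chunks and allow a longer answer
npm run rag -- "How is caching implemented?" --provider python --top-k 5 --max-tokens 2000
```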
### Intelligent Defaults
When you specify a provider, Hikma Engine automatically selects appropriate default models:
- **Python provider**: `mixedbread-ai/mxbai-embed-large-v1` (embedding), `Qwen/Qwen2.5-Coder-1.5B-Instruct` (LLM)
- **Server provider**: `text-embedding-ada-002` (embedding), `gpt-3.5-turbo` (LLM)
- **Local provider**: `Xenova/all-MiniLM-L6-v2` (embedding), `Xenova/gpt2` (LLM)
- **Transformers provider**: `Xenova/all-MiniLM-L6-v2` (embedding)
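So a command that omits the model flags simply falls back to these defaults; for example:
```bash
# Uses mixedbread-ai/mxbai-embed-large-v1 for embeddings (Python provider default)
npm run embed -- --provider python

# Uses the default embedding model and Qwen/Qwen2.5-Coder-1.5B-Instruct as the LLM
npm run rag -- "Where is the database schema defined?" --provider python
```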
### Directory Handling
Each project gets its own SQLite database stored in the project directory. You can work with multiple projects simultaneously:
```bash
# Index project A
npm run embed -- --provider python --dir /path/to/project-a
# Index project B
npm run embed -- --provider python --dir /path/to/project-b
# Search in specific project
npm run search -- "authentication" --provider python --dir /path/to/project-a
```
## Configuration
### Explicit CLI Approach (Recommended)
Hikma Engine now uses an **explicit CLI approach** that requires no configuration files. All settings are specified directly via command-line flags:
```bash
# Everything is explicit - no hidden configuration
npm run embed -- --provider python --embedding-model "mixedbread-ai/mxbai-embed-large-v1"
npm run search -- "query" --provider server --server-url http://localhost:11434 --embedding-model mxbai-embed-large:latest
npm run rag -- "question" --provider server --server-url http://localhost:11434 --embedding-model mxbai-embed-large:latest --llm-model qwen2.5-coder:7b
```
**Benefits:**
- ✅ **No .env files needed** - Everything is explicit
- ✅ **NPX-friendly** - Works without local installation
- ✅ **Self-documenting** - All options are visible
- ✅ **Scriptable** - Perfect for CI/CD pipelines
- ✅ **No hidden state** - What you see is what you get
### Legacy Environment Variables (Optional)
For backward compatibility, you can still use environment variables by copying `.env.example` to `.env`. However, **CLI flags take precedence** over environment variables.
```bash
cp .env.example .env
```
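If the same setting is defined in both places, the CLI flag wins. For example (assuming `.env` sets `HIKMA_EMBEDDING_MODEL` to some other model):
```bash
# The CLI flag below overrides HIKMA_EMBEDDING_MODEL from .env
npm run embed -- --provider python --embedding-model "mixedbread-ai/mxbai-embed-large-v1"
```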
#### Main Configuration
- `HIKMA_LOG_LEVEL`: Logging level (debug, info, warn, error). Default: `info`
- `HIKMA_SQLITE_PATH`: SQLite database path. Default: `./data/metadata.db`
- `HIKMA_SQLITE_VEC_EXTENSION`: sqlite-vec extension path. Default: `./extensions/vec0.dylib`
#### Legacy AI Configuration
These are only used when CLI flags are not provided:
**Embedding Configuration:**
- `HIKMA_EMBEDDING_PROVIDER`: Provider (`python`, `openai`). Default: `python`
- `HIKMA_EMBEDDING_MODEL`: Model for Python provider
- `HIKMA_EMBEDDING_OPENAI_API_URL`: Server URL for OpenAI-compatible APIs
- `HIKMA_EMBEDDING_OPENAI_API_KEY`: API key (optional for local services)
- `HIKMA_EMBEDDING_OPENAI_MODEL`: Model name for server provider
**LLM Configuration:**
- `HIKMA_ENGINE_LLM_PROVIDER`: Provider (`python`, `openai`). Default: `python`
- `HIKMA_ENGINE_LLM_PYTHON_MODEL`: Python model name
- `HIKMA_ENGINE_LLM_OPENAI_API_URL`: Server URL
- `HIKMA_ENGINE_LLM_OPENAI_API_KEY`: API key
- `HIKMA_ENGINE_LLM_OPENAI_MODEL`: Model name
- `HIKMA_ENGINE_LLM_OPENAI_MAX_TOKENS`: Max response tokens. Default: `400`
- `HIKMA_ENGINE_LLM_OPENAI_TEMPERATURE`: Sampling temperature. Default: `0.6`
Example for Ollama:
```bash
HIKMA_EMBEDDING_PROVIDER=openai
HIKMA_EMBEDDING_OPENAI_API_URL=http://localhost:11434
HIKMA_EMBEDDING_OPENAI_MODEL=mxbai-embed-large:latest
```
Example for LM Studio embeddings:
```bash
HIKMA_EMBEDDING_PROVIDER=openai
HIKMA_EMBEDDING_OPENAI_API_URL=http://localhost:1234
HIKMA_EMBEDDING_OPENAI_MODEL=text-embedding-mxbai-embed-large-v1
```
#### RAG Configuration
- `HIKMA_RAG_MODEL`: The RAG model for code explanation. Default: `Qwen/Qwen2.5-Coder-1.5B-Instruct`.
### LLM Provider Configuration
- `HIKMA_ENGINE_LLM_PROVIDER`: The LLM provider for code explanations. Options: `python`, `openai`. Default: `python`.
- `HIKMA_ENGINE_LLM_TIMEOUT`: Request timeout in milliseconds. Default: `300000`.
- `HIKMA_ENGINE_LLM_RETRY_ATTEMPTS`: Number of retry attempts. Default: `3`.
- `HIKMA_ENGINE_LLM_RETRY_DELAY`: Delay between retries in milliseconds. Default: `1000`.
#### Python Provider
When `HIKMA_ENGINE_LLM_PROVIDER=python`:
- `HIKMA_ENGINE_LLM_PYTHON_MODEL`: The model to use. Default: `Qwen/Qwen2.5-Coder-1.5B-Instruct`.
- `HIKMA_ENGINE_LLM_PYTHON_MAX_RESULTS`: Max results for the model. Default: `8`.
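A minimal `.env` sketch for this provider, combining the settings above (the timeout and retry values shown are illustrative, not defaults):
```bash
HIKMA_ENGINE_LLM_PROVIDER=python
HIKMA_ENGINE_LLM_PYTHON_MODEL=Qwen/Qwen2.5-Coder-1.5B-Instruct
HIKMA_ENGINE_LLM_PYTHON_MAX_RESULTS=8
# Optional: tolerate slower local generation
HIKMA_ENGINE_LLM_TIMEOUT=600000
HIKMA_ENGINE_LLM_RETRY_ATTEMPTS=3
HIKMA_ENGINE_LLM_RETRY_DELAY=1000
```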
#### OpenAI Provider
When `HIKMA_ENGINE_LLM_PROVIDER=openai` (the OpenAI API or compatible services such as LM Studio and Ollama; this corresponds to `--provider server` in the CLI):
- `HIKMA_ENGINE_LLM_OPENAI_API_URL`: The API endpoint.
- `HIKMA_ENGINE_LLM_OPENAI_API_KEY`: Your API key.
- `HIKMA_ENGINE_LLM_OPENAI_MODEL`: The model name.
- `HIKMA_ENGINE_LLM_OPENAI_MAX_TOKENS`: (Optional) Max tokens for the response. Default: `400`.
- `HIKMA_ENGINE_LLM_OPENAI_TEMPERATURE`: (Optional) Sampling temperature. Default: `0.6`.
Example for OpenAI API:
```bash
HIKMA_ENGINE_LLM_PROVIDER=openai
HIKMA_ENGINE_LLM_OPENAI_API_URL=https://api.openai.com/v1/chat/completions
HIKMA_ENGINE_LLM_OPENAI_API_KEY=sk-your-openai-api-key-here
HIKMA_ENGINE_LLM_OPENAI_MODEL=gpt-4
```
Example for local services (LM Studio, Ollama):
```bash
HIKMA_ENGINE_LLM_PROVIDER=openai
HIKMA_ENGINE_LLM_OPENAI_API_URL=http://localhost:1234 # For LM Studio (base URL; endpoint inferred)
# HIKMA_ENGINE_LLM_OPENAI_API_URL=http://localhost:11434 # For Ollama (base URL; endpoint inferred)
HIKMA_ENGINE_LLM_OPENAI_API_KEY=not-needed-for-local
HIKMA_ENGINE_LLM_OPENAI_MODEL=your-local-model
```
## Embedding Providers
Hikma Engine supports multiple embedding providers. The default is `python`, but the server-based (OpenAI-compatible) provider is fully supported and recommended for npx/global usage.
| Provider | Description | Examples | Setup Required | Status |
|----------|-------------|----------|----------------|--------|
| `openai` (server) | OpenAI-compatible HTTP API for embeddings | Ollama (`http://localhost:11434`), LM Studio (`http://localhost:1234`) | Run server; optional API key | Supported |
| `python` | Python-based embeddings using local models | Hugging Face transformers via Python | Python 3.10+ and pip deps | Supported (default) |
| `transformers` | In-process JS embeddings via `@xenova/transformers` | Browser/Node, no server | None | Supported |
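The `transformers` provider is the lightest-weight option. A sketch of using it, assuming it accepts the same CLI flags as the other providers and uses its default model from the defaults list above:
```bash
# In-process embeddings via @xenova/transformers (no Python, no server)
npm run embed -- --provider transformers --embedding-model "Xenova/all-MiniLM-L6-v2"
npm run search -- "error handling" --provider transformers --embedding-model "Xenova/all-MiniLM-L6-v2"
```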
## LLM Providers
Hikma Engine supports multiple LLM providers for generating code explanations:
| Provider | Description | Use Case | Setup Required | Status |
|----------|-------------|----------|----------------|--------|
| `python` | Local Python-based LLM using transformers | Privacy, offline usage, no API costs | Python + pip dependencies | Supported (default) |
| `openai` | OpenAI API or compatible services | High-quality responses, cloud-based | API key required | Supported |
### Local Services Integration
You can use local AI services for both embeddings and LLM. Here are tested working configurations:
#### Explicit CLI Approach (Recommended)
**Using Ollama:**
```bash
# Start Ollama and pull models
ollama serve
ollama pull mxbai-embed-large:latest
ollama pull qwen2.5-coder:7b
# Use with explicit CLI flags
npm run embed -- --provider server --server-url http://localhost:11434 --embedding-model mxbai-embed-large:latest
npm run search -- "query" --provider server --server-url http://localhost:11434 --embedding-model mxbai-embed-large:latest
npm run rag -- "question" --provider server --server-url http://localhost:11434 --embedding-model mxbai-embed-large:latest --llm-model qwen2.5-coder:7b
```
**Using LM Studio:**
```bash
# Start LM Studio on http://localhost:1234 and load models
# Use with explicit CLI flags
npm run embed -- --provider server --server-url http://localhost:1234 --embedding-model text-embedding-mxbai-embed-large-v1
npm run search -- "query" --provider server --server-url http://localhost:1234 --embedding-model text-embedding-mxbai-embed-large-v1
npm run rag -- "question" --provider server --server-url http://localhost:1234 --embedding-model text-embedding-mxbai-embed-large-v1 --llm-model openai/gpt-oss-20b
```
#### Legacy Environment Variables (Optional)
**Using Ollama (embeddings) + LM Studio (LLM):**
```bash
# .env configuration
HIKMA_EMBEDDING_PROVIDER=openai
HIKMA_EMBEDDING_OPENAI_API_URL=http://localhost:11434
HIKMA_EMBEDDING_OPENAI_MODEL=mxbai-embed-large:latest
HIKMA_ENGINE_LLM_PROVIDER=openai
HIKMA_ENGINE_LLM_OPENAI_API_URL=http://localhost:1234/v1/chat/completions
HIKMA_ENGINE_LLM_OPENAI_API_KEY=not-needed-for-local
HIKMA_ENGINE_LLM_OPENAI_MODEL=openai/gpt-oss-20b
```
**Using Only Ollama:**
```bash
# .env configuration
HIKMA_EMBEDDING_PROVIDER=openai
HIKMA_EMBEDDING_OPENAI_API_URL=http://localhost:11434
HIKMA_EMBEDDING_OPENAI_MODEL=mxbai-embed-large:latest
HIKMA_ENGINE_LLM_PROVIDER=openai
HIKMA_ENGINE_LLM_OPENAI_API_URL=http://localhost:11434/v1/chat/completions
HIKMA_ENGINE_LLM_OPENAI_API_KEY=not-needed-for-local
HIKMA_ENGINE_LLM_OPENAI_MODEL=gpt-oss:20b
```
#### Model Requirements
**For Ollama:**
- Embedding models: `mxbai-embed-large:latest`
- LLM models: `gpt-oss:20b`, `qwen2.5-coder:7b`, or similar
- Install models: `ollama pull mxbai-embed-large:latest && ollama pull gpt-oss:20b`
**For LM Studio:**
- Embedding models: `text-embedding-mxbai-embed-large-v1`, `text-embedding-nomic-embed-text-v1.5`
- LLM models: `openai/gpt-oss-20b`, `qwen/qwen3-coder-30b`, or similar
- Load models through LM Studio interface
## Quick Start
### Option 1: NPX (No Installation Required)
```bash
# Index your codebase with Python provider
npx hikma-engine embed --provider python --dir /path/to/your/project
# Search for code
npx hikma-engine search "authentication logic" --provider python --dir /path/to/your/project
# Get AI explanations
npx hikma-engine rag "how does authentication work?" --provider python --dir /path/to/your/project
```
### Option 2: Local Installation
1. **Install and setup:**
```bash
npm install
npm run build # Build the TypeScript code
npm rebuild # Rebuild native dependencies if needed
npm run setup-python # For Python-based features (optional)
```
2. **Index your codebase (explicit CLI - no .env needed):**
```bash
# Using Python provider (local models)
npm run embed -- --provider python --embedding-model "mixedbread-ai/mxbai-embed-large-v1"
# OR using server provider (Ollama/LM Studio)
npm run embed -- --provider server --server-url http://localhost:11434 --embedding-model mxbai-embed-large:latest
```
3. **Search and get explanations:**
```bash
# Search with explicit provider
npm run search -- "authentication logic" --provider python --embedding-model "mixedbread-ai/mxbai-embed-large-v1"
# RAG with explicit provider
npm run rag -- "how does user authentication work?" --provider python --embedding-model "mixedbread-ai/mxbai-embed-large-v1" --llm-model "Qwen/Qwen2.5-Coder-1.5B-Instruct"
```
### Important Notes
- **Build Required**: You must run `npm run build` after installation to compile TypeScript code
- **Native Dependencies**: If you encounter SQLite errors, run `npm rebuild` to recompile native modules
- **Provider Fallback**: The system automatically falls back between providers if one fails
- **Database Location**: The SQLite database is created in the project's `data/` directory (configurable with `--db-path`; see the example below)
- **Explicit CLI**: The `--provider` flag is now required for all commands - no more hidden `.env` dependencies
- **Server Provider**: When using `--provider server`, you must also specify `--server-url`
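For example, a sketch of pointing the index at a custom location with the `--db-path` flag mentioned above (the path is illustrative, and presumably the same path must be passed to `search` and `rag` so they read the same database):
```bash
npm run embed -- --provider python --db-path ./index/hikma.db
npm run search -- "logging" --provider python --db-path ./index/hikma.db
```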
### Testing Your Setup
To verify everything is working correctly with the explicit CLI approach:
```bash
# 1. Test embedding (indexing) with Python provider
npm run embed -- --provider python --embedding-model "mixedbread-ai/mxbai-embed-large-v1"
# Should show: "✅ Embedding completed successfully!"
# 2. Test search functionality
npm run search -- "CLI commands" --provider python --embedding-model "mixedbread-ai/mxbai-embed-large-v1" --limit 5
# Should return relevant code snippets with similarity scores
# 3. Test RAG (AI explanation)
npm run rag -- "How do the CLI commands work?" --provider python --embedding-model "mixedbread-ai/mxbai-embed-large-v1" --llm-model "Qwen/Qwen2.5-Coder-1.5B-Instruct"
# Should provide an AI-generated explanation based on your code
# 4. Test with npx (global usage)
npx hikma-engine search "database" --provider python --dir . --limit 3
# Should work without local installation
# 5. Test server provider (if you have Ollama running)
npm run embed -- --provider server --server-url http://localhost:11434 --embedding-model mxbai-embed-large:latest
npm run search -- "authentication" --provider server --server-url http://localhost:11434 --embedding-model mxbai-embed-large:latest
```
**Expected Results:**
- Embedding: Creates database file in `data/` directory with indexed code
- Search: Returns table with Node ID, Type, File Path, Similarity %, and Source Text Preview
- RAG: Provides detailed AI explanation with code context
- All commands should complete without errors and show cleanup logs
## Troubleshooting
### Common Issues
**SQLite Module Version Error:**
```bash
# Error: The module was compiled against a different Node.js version
npm rebuild
```
**OpenAI API Key Error:**
```bash
# Error: Incorrect API key provided
# Solution 1: Use explicit CLI flags (recommended)
npm run rag -- "question" --provider openai --openai-api-key sk-your-actual-api-key --embedding-model text-embedding-3-small --llm-model gpt-4o-mini
# Solution 2: Update your .env file (legacy approach)
HIKMA_EMBEDDING_OPENAI_API_KEY=sk-your-actual-api-key
HIKMA_ENGINE_LLM_OPENAI_API_KEY=sk-your-actual-api-key
```
**Local Service Connection Error:**
```bash
# Check if Ollama is running and accessible
curl -s http://localhost:11434/api/tags
ollama list # List available models
# If Ollama is not running:
ollama serve
# Test with explicit CLI flags:
npm run search -- "test" --provider server --server-url http://localhost:11434 --embedding-model mxbai-embed-large:latest
# Check if LM Studio is running and accessible
curl -s http://localhost:1234/v1/models
# Test with explicit CLI flags:
npm run search -- "test" --provider server --server-url http://localhost:1234 --embedding-model text-embedding-mxbai-embed-large-v1
# Ensure LM Studio is running on port 1234 with a model loaded
```
**"No healthy providers available" Error:**
```bash
# This usually means the LLM service is not accessible or the model is not available
# Solution 1: Use explicit CLI flags to test (recommended)
# Test different providers:
npm run rag -- "test" --provider python --embedding-model "mixedbread-ai/mxbai-embed-large-v1" --llm-model "Qwen/Qwen2.5-Coder-1.5B-Instruct"
npm run rag -- "test" --provider server --server-url http://localhost:11434 --embedding-model mxbai-embed-large:latest --llm-model qwen2.5-coder:7b
# Solution 2: Check your .env configuration (legacy approach):
# 1. Verify the API URL is correct
# 2. Ensure the model name matches exactly what's available
# 3. Test the service manually:
curl -s http://localhost:1234/v1/models # For LM Studio
ollama list # For Ollama
# 4. Try switching between services if one fails
```
**Model Runner Stopped Error (Ollama):**
```bash
# If you get "model runner has unexpectedly stopped"
# This usually indicates resource limitations or model issues
# Solution 1: Try explicit CLI with smaller model or different provider (recommended)
npm run rag -- "question" --provider python --embedding-model "mixedbread-ai/mxbai-embed-large-v1" --llm-model "Qwen/Qwen2.5-Coder-1.5B-Instruct"
# Or switch to LM Studio:
npm run rag -- "question" --provider server --server-url http://localhost:1234 --embedding-model text-embedding-mxbai-embed-large-v1 --llm-model openai/gpt-oss-20b
# Solution 2: Update .env file (legacy approach)
HIKMA_ENGINE_LLM_OPENAI_API_URL=http://localhost:1234/v1/chat/completions
HIKMA_ENGINE_LLM_OPENAI_MODEL=openai/gpt-oss-20b
```
**Python Dependencies Missing:**
```bash
# Install Python dependencies for local LLM/embedding
npm run setup-python
```
**CLI Command Not Found:**
```bash
# Build the project first
npm run build
# Then use with explicit provider flags
npm run embed -- --provider python --embedding-model "mixedbread-ai/mxbai-embed-large-v1"
# Or use npx for global access (no installation required)
npx hikma-engine embed --provider python --embedding-model "mixedbread-ai/mxbai-embed-large-v1"
npx hikma-engine search "query" --provider python --embedding-model "mixedbread-ai/mxbai-embed-large-v1"
```
## License
MIT License - see LICENSE file for details.