# AI Model Selector
This module implements an AI-powered model selector using a lightweight TabTransformer model that learns from performance benchmarks to recommend the optimal Ollama model for any given hardware configuration.
## How It Works
The AI selector uses a small (<150KB) quantized ONNX model trained on hardware specifications and model performance data to predict which model will perform best on your system.
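At runtime this reduces to a single ONNX session call per candidate model. Here is a minimal sketch of that flow using `onnxruntime-node`; the feature encoding and input layout are illustrative assumptions, not the model's actual signature:

```javascript
// Sketch: score one encoded hardware/model feature vector with the
// quantized ONNX model. The flat float32 input layout below is an
// assumption, not the model's actual contract.
const ort = require('onnxruntime-node');

async function scoreCandidate(featureVector) {
    // Load the quantized model; a real implementation would cache the session.
    const session = await ort.InferenceSession.create('trained/model_quantized.onnx');

    // Pack the encoded features into a 1 x N float32 tensor.
    const input = new ort.Tensor('float32',
        Float32Array.from(featureVector), [1, featureVector.length]);

    // Run inference and read the single probability output.
    const results = await session.run({ [session.inputNames[0]]: input });
    return results[session.outputNames[0]].data[0];
}
```

Each candidate model is scored this way, and the highest-probability candidate is recommended.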
### Features
- **Hardware-Aware Selection**: Considers CPU cores, RAM, GPU VRAM, and model architecture
- **Lightweight Model**: Quantized ONNX model under 150KB
- **Fallback Heuristics**: Falls back to smart heuristics when no trained model is available (see the sketch after this list)
- **Continuous Learning**: Can be retrained with new benchmark data
- **Fast Inference**: Sub-second model selection
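A fallback heuristic along these lines picks the largest model whose estimated memory footprint fits the available RAM or VRAM. The memory budget and per-parameter footprint multiplier below are placeholder assumptions, not values from the actual implementation:

```javascript
// Illustrative fallback: prefer the largest model that fits in memory.
// Both the 70% RAM budget and the GB-per-billion-parameters multiplier
// are placeholder assumptions, not constants from this project.
function selectModelHeuristicSketch(models, specs) {
    const budgetGb = Math.max(specs.gpuVramGb || 0, specs.totalRamGb * 0.7);
    const fits = models.filter(m => m.sizeBillion * 1.5 <= budgetGb);
    const bySizeDesc = (a, b) => b.sizeBillion - a.sizeBillion;
    // If nothing fits, fall back to the smallest available model.
    return fits.length ? fits.sort(bySizeDesc)[0]
                       : [...models].sort(bySizeDesc).pop();
}
```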
## Quick Start
### 1. Check AI Model Status
```bash
npm run ai-check -- --status
```
### 2. Collect Benchmark Data (Optional)
```bash
npm run benchmark
```
### 3. Train AI Model (Optional)
```bash
npm run train-ai
```
### 4. Use AI Selection
```bash
npm run ai-check
npm run ai-check -- --models llama2:7b mistral:7b phi3:mini
npm run ai-check -- --prompt "Explain machine learning"
```
## Architecture
### TabTransformer Model
- **Input Features**: Hardware specs (categorical + numerical)
- **Architecture**: 2-layer transformer with 32-dim embeddings
- **Output**: Binary classification (probability that a candidate model is the best choice for the given hardware)
- **Size**: <150KB quantized ONNX
### Feature Engineering
**Categorical Features:**
- `model_id`: Model identifier
- `gpu_model_normalized`: GPU category
- `hw_platform`: Operating system
- `ram_tier`, `cpu_tier`, `vram_tier`: Hardware capability tiers
**Numerical Features:**
- `model_size_numeric`: Model parameters in billions
- `hw_cpu_cores`: CPU core count
- `hw_cpu_freq_max`: Maximum CPU frequency
- `hw_total_ram_gb`: System RAM in GB
- `hw_gpu_vram_gb`: GPU VRAM in GB
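At inference time these features must be encoded exactly as during training: categorical values through the saved label encoders, numerical values through the saved scaler. A simplified sketch of that step follows; the `meta` layout is an assumption based on the artifacts in `trained/` (`label_encoders.joblib`, `scaler.joblib`), not the project's real metadata format:

```javascript
// Illustrative encoding: categorical features become integer codes,
// numerical features are standardized with training-time statistics.
// The `meta` object shape is assumed, not the actual schema.
function encodeFeatures(specs, meta) {
    const categorical = ['model_id', 'gpu_model_normalized', 'hw_platform',
                         'ram_tier', 'cpu_tier', 'vram_tier'];
    const numerical = ['model_size_numeric', 'hw_cpu_cores', 'hw_cpu_freq_max',
                       'hw_total_ram_gb', 'hw_gpu_vram_gb'];

    const cats = categorical.map(f => {
        const idx = meta.labelEncoders[f].indexOf(specs[f]);
        return idx >= 0 ? idx : 0; // unseen category -> fallback code 0
    });
    const nums = numerical.map((f, i) =>
        (specs[f] - meta.scaler.mean[i]) / meta.scaler.scale[i]);

    return [...cats, ...nums];
}
```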
## Development
### Python Requirements
```bash
cd ml-model
pip install -r requirements.txt
```
### Training Pipeline
1. **Data Collection**: `benchmark_collector.py`
- Runs performance tests on available models
- Collects hardware specifications
- Saves data as Parquet files
2. **Data Aggregation**: `dataset_aggregator.py`
- Combines benchmark data from multiple machines
- Creates training labels (best model per hardware config)
- Preprocesses features
3. **Model Training**: `train_model.py`
- Trains TabTransformer on processed data
- Exports to ONNX and quantizes to INT8
- Validates performance (target: >90% AUC)
### JavaScript Runtime
- **index.js**: ONNX runtime for model inference
- **cli.js**: Standalone CLI tool
- **test.js**: Testing utilities
## Performance Metrics
The model is trained to achieve:
- **>90% AUC** on validation set
- **<150KB** model size after quantization
- **<100ms** inference time (see the timing sketch below)
- **>80% accuracy** on hardware compatibility
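To sanity-check the inference-time target on your own machine, you can time a selection call directly. This assumes the `AIModelSelector` API shown under API Reference below:

```javascript
// Rough timing check for the <100ms inference target.
// Run inside an async context after selector.initialize().
const t0 = process.hrtime.bigint();
const result = await selector.selectBestModel(models, systemSpecs);
const elapsedMs = Number(process.hrtime.bigint() - t0) / 1e6;
console.log(`selection took ${elapsedMs.toFixed(1)} ms ->`, result);
```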
## Continuous Improvement
The model can be continuously improved by:
1. Running benchmarks on new hardware configurations
2. Adding new models to the training set
3. Retraining periodically with updated data
4. Fine-tuning hyperparameters based on performance
## API Reference
### AIModelSelector Class
```javascript
const selector = new AIModelSelector();
// Initialize (loads ONNX model)
await selector.initialize();
// Select best model
const result = await selector.selectBestModel(
  ['llama2:7b', 'mistral:7b'],
  systemSpecs
);
// Fallback selection
const fallback = selector.selectModelHeuristic(models, specs);
```
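The shape of `systemSpecs` is not shown above. One way to assemble comparable numbers is the `systeminformation` npm package; the field mapping below mirrors this README's feature names and is an assumption about what the selector expects, not its exact contract:

```javascript
// Gather hardware specs with the systeminformation package.
// The returned field names follow this README's feature list and are
// an assumption about what selectBestModel() expects.
const si = require('systeminformation');

async function collectSpecs() {
    const [cpu, mem, gfx] = await Promise.all([si.cpu(), si.mem(), si.graphics()]);
    const gpu = gfx.controllers[0] || {};
    return {
        hw_cpu_cores: cpu.cores,
        hw_cpu_freq_max: cpu.speedMax,            // GHz
        hw_total_ram_gb: mem.total / 1024 ** 3,
        hw_gpu_vram_gb: (gpu.vram || 0) / 1024,   // systeminformation reports MB
        gpu_model_normalized: gpu.model || 'none',
        hw_platform: process.platform,
    };
}
```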
### CLI Commands
```bash
# AI-powered selection
llm-checker ai-check
# With specific models
llm-checker ai-check -m llama2:7b mistral:7b
# With prompt
llm-checker ai-check --prompt "Hello world"
# Check training status
llm-checker ai-check --status
# Collect benchmarks
llm-checker ai-check --benchmark
# Train model
llm-checker ai-check --train
```
## Troubleshooting
### Common Issues
1. **"ONNX model not found"**
- Run `npm run train-ai` to train the model first
- Or collect benchmarks with `npm run benchmark`
2. **"Python not found"**
- Install Python 3.10 or later
- Install required packages: `pip install -r requirements.txt`
3. **"No models found"**
- Install Ollama models: `ollama pull llama2:7b`
4. **Training fails with low AUC**
- Collect more diverse benchmark data
- Run benchmarks on different hardware configurations
### Debug Mode
```bash
llm-checker ai-check --debug
```
## File Structure
```
ml-model/
├── README.md                    # This file
├── requirements.txt             # Python dependencies
├── python/                      # Training pipeline
│   ├── benchmark_collector.py
│   ├── dataset_aggregator.py
│   └── train_model.py
├── js/                          # JavaScript runtime
│   ├── package.json
│   ├── index.js
│   ├── cli.js
│   └── test.js
├── data/                        # Training data
│   ├── raw/                     # Benchmark Parquet files
│   └── processed/               # Processed training data
└── trained/                     # Trained model artifacts
    ├── model_quantized.onnx
    ├── metadata.json
    ├── scaler.joblib
    └── label_encoders.joblib
```
This AI-powered approach selects the model best suited to your specific hardware, maximizing performance while minimizing resource usage.