# LLaMA Farm CLI
Deploy AI models, agents, and databases into single deployable binaries - no cloud required.
## Installation
```bash
npm install -g @llamafarm/llamafarm
```
## Quick Start
```bash
# Deploy a model
llamafarm plant llama3-8b
# Deploy with optimization
llamafarm plant llama3-8b --optimize
# Deploy to specific target
llamafarm plant mistral-7b --target raspberry-pi
# Development/Testing (no model download)
llamafarm plant llama3-8b --mock
```
## Complete Workflow Example
```bash
# 1. Plant - Configure your AI deployment
llamafarm plant llama3-8b \
--device mac-arm \
--agent chat-assistant \
--rag \
--database vector
# 2. Bale - Compile to single binary
llamafarm bale ./.llamafarm/llama3-8b \
--device mac-arm \
--optimize
# 3. Harvest - Deploy anywhere
llamafarm harvest llama3-8b-mac-arm-v1.0.0.bin --run
# Or just copy and run directly (no dependencies needed!)
./llama3-8b-mac-arm-v1.0.0.bin
```
## Features
- **One-Line Deployment** - Deploy complex AI models with a single command
- **Zero Dependencies** - Compiled binaries run on their target platform with nothing else installed
- **100% Private** - Your data never leaves your device
- **Lightning Fast** - 10x faster than traditional deployments
- **90% Smaller** - Optimized models use a fraction of their original size
## Commands
### `plant`
Deploy a model to create a standalone binary.
```bash
llamafarm plant <model> [options]
Options:
--target <platform> Target platform (mac, linux, windows, raspberry-pi)
--optimize Enable size optimization
--agent <name> Include an agent
--rag Enable RAG pipeline
--database <type> Include database (vector, sqlite)
--mock Development/testing mode (no model download)
```
### Plant Examples
```bash
# Basic deployment
llamafarm plant llama3-8b
# Deploy with RAG and vector database
llamafarm plant mixtral-8x7b --rag --database vector
# Deploy optimized for Raspberry Pi
llamafarm plant llama3-8b --target raspberry-pi --optimize
# Deploy with custom agent
llamafarm plant llama3-8b --agent customer-service
```
### `bale`
**The Baler** - Compile your deployment into a single executable binary.
```bash
llamafarm bale <project-dir> [options]
Options:
--device <platform> Target platform (mac, linux, windows, raspberry-pi)
--output <path> Output binary path
--optimize <level> Optimization level (none, standard, max)
--sign Sign the binary for distribution
--compress Extra compression (slower but smaller)
```
The Baler packages everything into a single binary:
- Quantized model (GGUF format)
- Agent configuration & code
- Embedded vector database
- Web UI
- Node.js runtime
- Platform-specific optimizations
**Supported Platforms:**
- `mac` / `mac-arm` / `mac-intel` - macOS with Metal acceleration
- `linux` / `linux-arm` - Linux with CUDA support
- `windows` - Windows with DirectML/CUDA
- `raspberry-pi` - Optimized for ARM devices
- `jetson` - NVIDIA Jetson edge devices
**Typical Binary Sizes:**
- 7B models: 4-8GB (depending on quantization)
- 13B models: 8-13GB
- Mixtral: 25-45GB
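As a rough cross-check of these ranges, the size of a model's weights is approximately its parameter count times the bits per weight, divided by 8. The sketch below does only that arithmetic. `estimate_gb` is a hypothetical helper, not a `llamafarm` command, and it ignores the bundled runtime, web UI, and database, so real binaries will be somewhat larger.

```bash
# Hypothetical helper (not part of the llamafarm CLI): rough weight size in GB.
# Usage: estimate_gb <params-in-billions> <bits-per-weight>
estimate_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

estimate_gb 7 4    # 4-bit 7B model  -> prints 3.5
estimate_gb 7 8    # 8-bit 7B model  -> prints 7.0
estimate_gb 13 8   # 8-bit 13B model -> prints 13.0
```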
### Bale Examples
```bash
# Standard compilation
llamafarm bale ./.llamafarm/llama3-8b --device mac-arm
# Optimized for size
llamafarm bale ./.llamafarm/llama3-8b --device raspberry-pi --optimize max --compress
# Enterprise deployment with signing
llamafarm bale ./.llamafarm/mixtral --device linux --sign --output production.bin
```
### `harvest`
Deploy and run a compiled binary.
```bash
llamafarm harvest <binary-or-url> [options]
Options:
--run Run immediately after deployment
--daemon Run as background service
--port <number> Override default port
--verify Verify binary integrity
```
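The options above can be combined. The following is a minimal sketch, assuming `--verify`, `--daemon`, and `--port` can be used together as their descriptions suggest. `harvest_checked` is a hypothetical wrapper, not part of the CLI; it handles local files only (not URLs), and the default port 8080 is an arbitrary example value.

```bash
# Hypothetical wrapper: refuse to harvest a local binary that does not exist.
# Usage: harvest_checked <binary> [port]
harvest_checked() {
  bin="$1"
  port="${2:-8080}"   # arbitrary example port, not a documented default
  if [ ! -f "$bin" ]; then
    echo "binary not found: $bin" >&2
    return 1
  fi
  # Assumption: these three flags can be passed in one invocation.
  llamafarm harvest "$bin" --verify --daemon --port "$port"
}

# Example, using the binary name from the workflow section:
# harvest_checked llama3-8b-mac-arm-v1.0.0.bin 9000
```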
## Configuration
Create a `llamafarm.yaml` file for advanced configurations:
```yaml
name: my-assistant
base_model: llama3-8b
plugins:
- vector_search
- voice_recognition
data:
- path: ./company-docs
type: knowledge
optimization:
quantization: int8
target_size: 2GB
```
Then build:
```bash
llamafarm build
```
## Requirements
- Node.js 18+
- 8GB RAM (minimum)
- 10GB free disk space (compilation needs about 3x the final binary size; see the Baler FAQ)
## Documentation
For full documentation, visit [https://docs.llamafarm.ai](https://docs.llamafarm.ai)
## Support
- [Documentation](https://docs.llamafarm.ai)
- [Discord Community](https://discord.gg/llamafarm-ai)
- [Issue Tracker](https://github.com/llamafarm-ai/llamafarm/issues)
## Baler FAQ
**Q: Can I run the binary on a different OS than where I compiled it?**
A: No, you need to compile for each target platform. Use `--device` to specify the target.
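One way to cover several platforms is to generate one `bale` command per target. The sketch below is a dry run that only prints the commands. The project directory and output file names are illustrative, and `bale_commands` is a hypothetical helper rather than a CLI feature.

```bash
# Print one bale command per target platform (dry run; nothing is executed).
# The project directory and output names below are illustrative.
project=./.llamafarm/llama3-8b

bale_commands() {
  for device in "$@"; do
    echo "llamafarm bale $project --device $device --output llama3-8b-$device.bin"
  done
}

bale_commands mac-arm linux windows
# To run the printed commands: bale_commands mac-arm linux windows | sh
```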
**Q: How much disk space do I need?**
A: During compilation, you need ~3x the final binary size. The final binary is typically 4-8GB for 7B models.
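The 3x rule of thumb can be turned into a quick pre-flight check before running `bale`. This is a sketch under that assumption only. `required_gb` and `has_space_for` are hypothetical helpers, not part of the CLI, and the expected binary size is something you supply yourself.

```bash
# Hypothetical pre-flight check: about 3x the expected binary size must be free.
# Usage: required_gb <expected-binary-size-in-GB>
required_gb() {
  echo $(( $1 * 3 ))
}

# Usage: has_space_for <expected-binary-size-in-GB> [directory]
has_space_for() {
  need_kb=$(( $(required_gb "$1") * 1024 * 1024 ))
  # df -Pk reports available space in 1024-byte blocks in column 4
  free_kb=$(df -Pk "${2:-.}" | awk 'NR == 2 { print $4 }')
  [ "$free_kb" -ge "$need_kb" ]
}

required_gb 8   # prints 24
has_space_for 8 . || echo "not enough free space for an 8GB build" >&2
```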
**Q: Can I update the model without recompiling?**
A: No, the model is embedded in the binary. This ensures zero dependencies but means updates require recompilation.
**Q: Does the binary need internet access?**
A: No! Everything runs completely offline once deployed.
## License
MIT © LLaMA Farm Team