UNPKG

ai-server

Version:

An OpenAI and Claude API compatible server using node-llama-cpp for local LLM models

233 lines (178 loc) • 5.44 kB
# AI Server A TypeScript microservice that provides an API compatible with OpenAI and Claude for working with local LLM models through node-llama-cpp. ## Features - šŸ”„ Full compatibility with OpenAI Chat API (`/v1/chat/completions`) - šŸ¤– Compatibility with Anthropic Claude API (`/v1/messages`) - 🐟 Full compatibility with DeepSeek API (`/v1/chat/completions`) - 🌊 Support for streaming generation (Streaming API) - šŸ”‘ Your own API key authentication - 🧠 Run local LLM models in GGUF format - āš™ļø Configuration through environment variables - šŸ” Monitoring via `/health` endpoint - šŸ“‹ Standard API for retrieving model list (`/v1/models`) ## Requirements - Node.js 18+ - TypeScript 5.3+ - GGUF model (Llama 2, Mistral, LLaMA 3, or other compatible models) - Recommended minimum 16 GB RAM for 7B models ## Installation 1. Clone the repository: ```bash git clone https://github.com/ivanoff/ai-server.git cd ai-server ``` 2. Install dependencies: ```bash npm install ``` 3. Create a directory for models: ```bash mkdir -p models ``` 4. Download a GGUF model into the `models/` directory (for example, from [Hugging Face](https://huggingface.co/models)) 5. Copy the example `.env` file and configure it to your needs: ```bash cp .env.example .env ``` 6. Compile TypeScript: ```bash npm run build ``` 7. Start the server: ```bash npm start ``` ## Project Structure ``` ai-server/ ā”œā”€ā”€ src/ │ └── server.ts # Main server code ā”œā”€ā”€ models/ # Directory for GGUF models ā”œā”€ā”€ dist/ # Compiled files ā”œā”€ā”€ .env # Configuration ā”œā”€ā”€ package.json └── tsconfig.json ``` ## Configuration Configure the `.env` file to change server parameters: ```ini # Path to the model (absolute or relative to project root) MODEL_PATH=./models/llama-2-7b-chat.gguf # Server port PORT=3000 # Default maximum number of tokens DEFAULT_MAX_TOKENS=2048 # Number of model layers to offload to GPU (0 for CPU-only) GPU_LAYERS=120 # API key for authentication API_KEY=your_api_key ``` ## Usage Examples ### OpenAI API compatible request ```typescript const response = await fetch('http://localhost:3000/v1/chat/completions', { method: 'POST', headers: { 'Content-Type': 'application/json', 'Authorization': 'Bearer your_api_key' }, body: JSON.stringify({ model: 'llama-local', messages: [ { role: 'system', content: 'You are a helpful assistant.' }, { role: 'user', content: 'Tell me about TypeScript' } ], max_tokens: 500, temperature: 0.7 }) }); const data = await response.json(); console.log(data); ``` ### Anthropic Claude API compatible request ```typescript const response = await fetch('http://localhost:3000/v1/messages', { method: 'POST', headers: { 'Content-Type': 'application/json', 'x-api-key': 'your_api_key' }, body: JSON.stringify({ model: 'llama-local', messages: [ { role: 'human', content: 'Tell me about TypeScript' } ], max_tokens: 500, temperature: 0.7 }) }); const data = await response.json(); console.log(data); ``` ### Streaming mode To use streaming mode, add the `stream: true` parameter to the request and process the event stream: ```typescript const response = await fetch('http://localhost:3000/v1/chat/completions', { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify({ model: 'llama-local', messages: [ { role: 'system', content: 'You are a helpful assistant.' }, { role: 'user', content: 'Tell me about TypeScript' } ], max_tokens: 500, temperature: 0.7, stream: true }) }); // Process the event stream const reader = response.body.getReader(); const decoder = new TextDecoder(); while (true) { const { done, value } = await reader.read(); if (done) break; const chunk = decoder.decode(value); const lines = chunk.split('\n').filter(line => line.trim() !== ''); for (const line of lines) { if (line.startsWith('data: ') && line !== 'data: [DONE]') { const jsonData = JSON.parse(line.replace('data: ', '')); console.log(jsonData); } } } ``` ## API Endpoints ### `/v1/chat/completions` OpenAI Chat API compatible endpoint. **Request Parameters:** - `messages`: Array of message objects with `role` and `content` - `model`: Model identifier (optional) - `max_tokens`: Maximum tokens to generate (optional) - `temperature`: Randomness of generation (optional) - `stream`: Enable streaming mode (optional) ### `/v1/messages` Claude API compatible endpoint. **Request Parameters:** - `messages`: Array of message objects with `role` and `content` - `model`: Model identifier (optional) - `max_tokens`: Maximum tokens to generate (optional) - `temperature`: Randomness of generation (optional) - `stream`: Enable streaming mode (optional) ### `/health` Health check endpoint that returns server status and model path. ### `/v1/models` Returns a list of available models (currently returns a single model, `llama-local`). ## Development - Run in development mode with hot reloading: ```bash npm run dev ``` - Watch for TypeScript changes: ```bash npm run watch ``` ## License [MIT](https://choosealicense.com/licenses/mit/) ## Created by Dimitry Ivanov <2@ivanoff.org.ua> # curl -A cv ivanoff.org.ua