# node-ai-ragbot
`node-ai-ragbot` is a modular and framework-agnostic Node.js package for building intelligent chatbot and voicebot systems with **Retrieval-Augmented Generation (RAG)**, powered by **OpenAI** and **LangChain**.
It allows you to quickly integrate both text-based chatbots and voice-enabled assistants into your application with minimal setup. The package provides ready-to-use request handlers and adapters for popular frameworks, so you can focus on building your application instead of reinventing backend logic.
## Features
- Supports multiple local files as knowledge sources (`.pdf`, `.docx`, `.txt`, `.md`)
- Supports website scraping with sitemap crawling or exact URL fetching
- Chatbot via `/chat` endpoint
- Voicebot via `/voice` endpoint (speech-to-text using Whisper, text-to-speech using OpenAI TTS)
- Fully configurable: choose models, voices, embeddings, chunk sizes, and logging
- Provides handlers and adapters for **Express**, **Fastify**, **Koa**, **NestJS**, and raw **Node.js (http)**
- Simple initialization with sensible defaults
- In-memory vector store using LangChain for quick query retrieval
## Requirements
- Node.js v16 or higher
- An OpenAI API key
## Installation
```bash
npm install node-ai-ragbot
```
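The examples below load the OpenAI API key from the environment with `dotenv`. A minimal `.env` file for that setup looks like this (the variable name matches what the Quick Start reads):
```bash
# .env — keep this file out of version control
OPENAI_API_KEY=sk-...
```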
## Quick Start with Express
The easiest way to get started is with Express. You can initialize the bot and then attach the provided handlers to your routes.
```js
const express = require("express");
const cors = require("cors");
require("dotenv").config();

const { initRagVoiceBot, useInExpress } = require("node-ai-ragbot");

const app = express();
app.use(cors());

(async () => {
  const { chatHandler, voiceHandler } = await initRagVoiceBot({
    sources: {
      files: ["files/knowledge.txt", "files/knowledge.pdf"],
      // urls: ["https://docs.myproduct.com"],
    },
    openai: {
      apiKey: process.env.OPENAI_API_KEY,
      chat: { model: "gpt-4o", temperature: 0.3 },
      stt: { model: "whisper-1", language: "en" },
      tts: { model: "tts-1-hd", voice: "nova" },
    },
  });

  // Mounts /api/bot/chat and /api/bot/voice
  useInExpress(app, chatHandler, voiceHandler, "/api/bot");

  app.listen(3000, () =>
    console.log("Server running on http://localhost:3000")
  );
})();
```
This will expose the following endpoints:
- `POST /api/bot/chat`
- `POST /api/bot/voice`
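With the server running, you can smoke-test the chat endpoint from the command line (assuming the port and path prefix from the example above):
```bash
curl -X POST http://localhost:3000/api/bot/chat \
  -H "Content-Type: application/json" \
  -d '{"question": "What is in the knowledge base?"}'
```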
## Using Handlers with Other Frameworks
If you are not using Express, you can still initialize handlers with `initRagVoiceBot` and then connect them to your chosen framework using the adapters provided in this package.
```js
const {
  initRagVoiceBot,
  useInFastify,
  useInKoa,
  useInNestJS,
  useInHttp,
} = require("node-ai-ragbot");

(async () => {
  const { chatHandler, voiceHandler } = await initRagVoiceBot({
    sources: { files: ["files/knowledge.txt"] },
    openai: { apiKey: process.env.OPENAI_API_KEY },
  });

  // Fastify
  const fastify = require("fastify")();
  useInFastify(fastify, chatHandler, voiceHandler, "/api/bot");

  // Koa
  const Koa = require("koa");
  const Router = require("koa-router");
  const koaApp = new Koa();
  const koaRouter = new Router();
  useInKoa(koaRouter, chatHandler, voiceHandler, "/api/bot");
  koaApp.use(koaRouter.routes());

  // NestJS
  const { chat, voice } = useInNestJS(chatHandler, voiceHandler);
  // Example in a controller:
  // @Post("chat") chat(@Req() req, @Res() res) { return chat(req, res); }
  // @Post("voice") voice(@Req() req, @Res() res) { return voice(req, res); }

  // Raw Node.js
  const http = require("http");
  const server = http.createServer();
  useInHttp(server, chatHandler, voiceHandler, "/api/bot");
  server.listen(4000, () => console.log("Raw server on http://localhost:4000"));
})();
```
By using adapters, you can integrate seamlessly with any backend framework without being locked to a specific one.
## API Endpoints
### `/chat`
- **Method**: POST
- **Request body**:
```json
{
  "question": "What is in the knowledge base?"
}
```
- **Response**:
```json
{
  "success": true,
  "answer": "Answer from the chatbot"
}
```
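From a browser or Node.js 18+ client (where `fetch` is global), the same call might look like this minimal sketch; it assumes the server from the Quick Start is running on `localhost:3000`:
```js
// Hypothetical client helper: POST a question, return the chatbot's answer.
async function askBot(question) {
  const res = await fetch("http://localhost:3000/api/bot/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ question }),
  });
  const { success, answer } = await res.json();
  if (!success) throw new Error("Chat request failed");
  return answer;
}

askBot("What is in the knowledge base?").then(console.log);
```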
### `/voice`
- **Method**: POST
- **Request body**: raw audio buffer (e.g. `audio/webm`)
- **Response**:
```json
{
  "success": true,
  "transcription": "Your transcribed question",
  "answer": "Generated answer",
  "audio": "Base64 encoded audio response"
}
```
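From a browser, you can record a short clip with `MediaRecorder`, POST the raw bytes, and play back the base64 audio from the response. The sketch below is illustrative only: it hardcodes a five-second recording, omits error handling, and assumes the response audio is MP3 (adjust the MIME type to match your TTS configuration):
```js
// Record ~5 seconds of microphone audio, send it as a raw audio/webm body,
// then play the base64-encoded spoken answer returned by /voice.
async function askByVoice() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream, { mimeType: "audio/webm" });
  const chunks = [];
  recorder.ondataavailable = (e) => chunks.push(e.data);

  recorder.start();
  await new Promise((resolve) => setTimeout(resolve, 5000));
  const stopped = new Promise((resolve) => (recorder.onstop = resolve));
  recorder.stop();
  await stopped;

  const res = await fetch("http://localhost:3000/api/bot/voice", {
    method: "POST",
    headers: { "Content-Type": "audio/webm" },
    body: new Blob(chunks, { type: "audio/webm" }),
  });
  const { transcription, answer, audio } = await res.json();
  console.log("You said:", transcription, "| Bot:", answer);

  // Assumes MP3 output from the TTS model.
  new Audio(`data:audio/mpeg;base64,${audio}`).play();
}
```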
## Configuration
You can customize how the bot processes data by passing a configuration object.
```ts
interface RagConfig {
  sources: {
    files?: string[]; // Local file paths (.pdf, .docx, .txt, .md)
    urls?: string[];  // Website URLs
  };
  rag?: {
    maxPagesPerSite?: number; // default: 30
    textSplit?: {
      chunkSize?: number;    // default: 1000
      chunkOverlap?: number; // default: 200
    };
    topK?: number; // default: 3
  };
  openai: {
    apiKey: string;
    embeddings?: { model?: string }; // default: text-embedding-3-small
    chat?: {
      model?: string;       // default: gpt-4o
      temperature?: number; // default: 0.3
      maxTokens?: number;   // default: 200
      promptTemplate?: string;
    };
    stt?: {
      model?: string;    // default: whisper-1
      language?: string; // default: en
    };
    tts?: {
      model?: string; // default: tts-1
      voice?: string; // default: alloy
    };
  };
  logger?: Console; // default: console
}
```
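Putting it together, a more fully tuned setup based on the options above might look like this inside an async context, as in the Quick Start (the values are illustrative, not recommendations):
```js
const { chatHandler, voiceHandler } = await initRagVoiceBot({
  sources: {
    files: ["files/knowledge.pdf"],
    urls: ["https://docs.myproduct.com"],
  },
  rag: {
    maxPagesPerSite: 50, // crawl at most 50 pages per site
    textSplit: { chunkSize: 800, chunkOverlap: 150 },
    topK: 5, // retrieve the 5 most relevant chunks per query
  },
  openai: {
    apiKey: process.env.OPENAI_API_KEY,
    embeddings: { model: "text-embedding-3-small" },
    chat: {
      model: "gpt-4o",
      temperature: 0.2,
      maxTokens: 300,
      promptTemplate: "Answer using only the provided context.",
    },
    stt: { model: "whisper-1", language: "en" },
    tts: { model: "tts-1", voice: "alloy" },
  },
  logger: console,
});
```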
## In-Memory Vector Storage
The package uses LangChain’s in-memory vector store to hold embeddings of the content extracted from your files and websites.
- Data is stored only in memory and rebuilt on every server restart
- Provides fast query responses for small to medium datasets
- Best suited for real-time, non-persistent workloads
## Security Considerations
- The OpenAI API key is passed via configuration, never hardcoded
- CORS is not modified; you control it in your hosting app
- Audio files are processed in memory only and not stored on disk
- No persistent database is included; vector data is rebuilt each run
## Example Project Structure
```
my-app/
├── files/
│   ├── knowledge.txt
│   └── knowledge.pdf
├── server.js
├── package.json
└── README.md
```
## Contributing and Support
If you encounter issues or have feature requests, please open a GitHub issue.
For custom requirements, the package can be extended by wrapping handlers or building new adapters.
Happy building with node-ai-ragbot!