# RAG System Package

[![npm version](https://badge.fury.io/js/rag-system-pgvector.svg)](https://badge.fury.io/js/rag-system-pgvector) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

A production-ready **Retrieval-Augmented Generation (RAG) system** package built with PostgreSQL pgvector, LangChain, and LangGraph. Supports multiple AI providers including OpenAI, Anthropic, HuggingFace, Azure, Google AI, and local models.

## 🚀 Features

- **📦 Easy Integration**: Simple npm install and ready-to-use API
- **🤖 Multi-Provider Support**: OpenAI, Anthropic, HuggingFace, Azure, Google AI, Ollama
- **📚 Multi-Format Support**: PDF, DOCX, TXT, HTML, Markdown, JSON
- **🔍 Vector Search**: High-performance similarity search with pgvector
- **🎯 Structured Data Queries**: Accept JSON data for precise, contextual responses
- **💬 Chat History Support**: Full conversation memory with summarization
- **⚡ Production Ready**: Error handling, connection pooling, monitoring
- **🔧 Flexible Configuration**: Choose your preferred embedding and LLM providers
- **💾 Buffer Processing**: Process documents directly from memory buffers
- **🌐 URL Processing**: Download and process documents from web URLs
- **📊 Batch Operations**: Efficient processing of multiple documents

## 📦 Installation

```bash
npm install rag-system-pgvector

# Choose your AI provider (one or more):
npm install @langchain/openai       # For OpenAI
npm install @langchain/anthropic    # For Anthropic Claude
npm install @langchain/azure-openai # For Azure OpenAI
npm install @langchain/google-genai # For Google AI
npm install @langchain/community    # For HuggingFace, Ollama, etc.
```

## 🚀 Quick Start

### OpenAI Provider (Traditional)

```javascript
import { RAGSystem } from 'rag-system-pgvector';
import { OpenAIEmbeddings, ChatOpenAI } from '@langchain/openai';

// Create provider instances
const embeddings = new OpenAIEmbeddings({
  openAIApiKey: 'your-openai-api-key',
  modelName: 'text-embedding-ada-002',
});

const llm = new ChatOpenAI({
  openAIApiKey: 'your-openai-api-key',
  modelName: 'gpt-4',
  temperature: 0.7,
});

// Initialize RAG system
const rag = new RAGSystem({
  database: {
    host: 'localhost',
    database: 'your_db',
    username: 'postgres',
    password: 'your_password'
  },
  embeddings: embeddings,
  llm: llm,
  embeddingDimensions: 1536,
});

await rag.initialize();

// Add documents and query
await rag.addDocuments(['./docs/file1.pdf', './docs/file2.txt']);

// Simple query
const result = await rag.query("What is the main topic?");
console.log(result.answer);

// Query with structured data for precise responses
const structuredResult = await rag.query("Tell me about iPhone features", {
  structuredData: {
    intent: "product_information",
    entities: { product: "iPhone", category: "smartphone" },
    constraints: ["Focus on latest features", "Include specifications"],
    responseFormat: "structured_list"
  }
});
console.log(structuredResult.answer);
```

### Mixed Providers (Advanced)

```javascript
import { RAGSystem } from 'rag-system-pgvector';
import { OpenAIEmbeddings } from '@langchain/openai';
import { ChatAnthropic } from '@langchain/anthropic';

// Use OpenAI for embeddings, Anthropic for chat
const embeddings = new OpenAIEmbeddings({
  openAIApiKey: 'your-openai-api-key',
  modelName: 'text-embedding-ada-002',
});

const llm = new ChatAnthropic({
  anthropicApiKey: 'your-anthropic-api-key',
  modelName: 'claude-3-haiku-20240307',
  temperature: 0.7,
});

const rag = new RAGSystem({
  database: { /* your config */ },
  embeddings: embeddings,
  llm: llm,
  embeddingDimensions: 1536,
});
```

### Local Models (Privacy-First)

```javascript
import { RAGSystem } from 'rag-system-pgvector';
import { HuggingFaceTransformersEmbeddings } from '@langchain/community/embeddings/hf_transformers';
import { Ollama } from '@langchain/community/llms/ollama';

// Use local models (no API keys required)
const embeddings = new HuggingFaceTransformersEmbeddings({
  modelName: 'sentence-transformers/all-MiniLM-L6-v2',
});

const llm = new Ollama({
  baseUrl: 'http://localhost:11434',
  model: 'llama2',
});

const rag = new RAGSystem({
  database: { /* your config */ },
  embeddings: embeddings,
  llm: llm,
  embeddingDimensions: 384, // all-MiniLM-L6-v2 dimensions
});
```
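Whichever provider you pick, `embeddingDimensions` must match what the model actually emits. A quick sanity check before initializing — a minimal sketch that assumes only the standard LangChain `embedQuery` method, nothing package-specific:

```javascript
// Probe the embedding width before wiring it into RAGSystem.
// embedQuery() is the standard LangChain Embeddings method; 384 is
// specific to all-MiniLM-L6-v2 and is only an example value.
const probe = await embeddings.embedQuery('dimension probe');
console.log(`Model emits ${probe.length}-dimensional vectors`);
// If unsure, pass probe.length as embeddingDimensions.
```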
### Buffer Processing (New in v1.1.0)

```javascript
import fs from 'fs';
import { DocumentProcessor } from 'rag-system-pgvector/utils';

const processor = new DocumentProcessor();

// Process document from Buffer
const buffer = fs.readFileSync('document.pdf');
const result = await processor.processDocumentFromBuffer(
  buffer,
  'document.pdf',
  'pdf',
  { source: 'api-upload', category: 'research' }
);

console.log(result.chunks); // Processed chunks with embeddings
```

### URL Processing (New in v1.1.0)

```javascript
import { DocumentProcessor } from 'rag-system-pgvector/utils';

const processor = new DocumentProcessor();

// Process single URL
const result = await processor.processDocumentFromUrl(
  'https://example.com/document.pdf',
  { source: 'web-crawl', priority: 'high' }
);

// Process multiple URLs
const urls = [
  'https://example.com/doc1.pdf',
  'https://example.com/doc2.html',
  'https://example.com/doc3.md'
];

const results = await processor.processDocumentsFromUrls(urls, {
  source: 'batch-import',
  maxConcurrent: 3
});

console.log(`Processed ${results.successful.length} documents`);
```

## 🎯 Structured Data Queries (New in v2.2.0)

The RAG system now supports structured JSON data alongside natural language queries for more precise and contextual responses.

### Basic Structured Query

```javascript
const result = await rag.query("Tell me about iPhone features", {
  structuredData: {
    intent: "product_information",
    entities: {
      product: "iPhone",
      category: "smartphone",
      brand: "Apple"
    },
    constraints: [
      "Focus on latest model features",
      "Include technical specifications"
    ],
    context: {
      userType: "potential_buyer",
      priceRange: "premium"
    },
    responseFormat: "structured_list"
  }
});
```

### Troubleshooting Query

```javascript
const result = await rag.query("My device won't connect to WiFi", {
  structuredData: {
    intent: "troubleshooting",
    entities: {
      issue_type: "connectivity",
      device_category: "mobile",
      problem_area: "wifi"
    },
    constraints: [
      "Provide step-by-step solution",
      "Include alternative methods"
    ],
    responseFormat: "step_by_step_guide"
  }
});
```

### Comparison Query

```javascript
const result = await rag.query("Compare iPhone vs Samsung Galaxy", {
  structuredData: {
    intent: "comparison",
    entities: {
      item1: "iPhone",
      item2: "Samsung Galaxy"
    },
    constraints: [
      "Compare key specifications",
      "Highlight main differences"
    ],
    responseFormat: "comparison_table"
  }
});
```

### Combined with Chat History

```javascript
const result = await rag.query("What about the camera quality?", {
  chatHistory: [
    { role: 'user', content: 'Tell me about iPhone features' },
    { role: 'assistant', content: 'The iPhone offers excellent features...' }
  ],
  structuredData: {
    intent: "follow_up_question",
    entities: {
      topic: "camera",
      context_reference: "previous_iphone_discussion"
    },
    responseFormat: "detailed_explanation"
  }
});
```

### Structured Data Schema

```typescript
interface StructuredData {
  intent: string;              // Query intent/category (required)
  entities?: {                 // Named entities and values
    [key: string]: string | number;
  };
  constraints?: string[];      // Requirements/constraints
  context?: {                  // Additional context
    [key: string]: string | number | boolean;
  };
  responseFormat?: string;     // Desired response format
}
```

### Common Intents

- `product_information` - Product details and specifications
- `troubleshooting` - Problem-solving and technical support
- `comparison` - Comparing multiple items
- `how_to_guide` - Step-by-step instructions
- `explanation` - Detailed explanations
- `follow_up_question` - Context-aware follow-ups

### Response Formats

- `structured_list` - Organized bullet points
- `step_by_step_guide` - Numbered instructions
- `comparison_table` - Side-by-side comparison
- `detailed_explanation` - Comprehensive explanation
- `bullet_points` - Simple bullet format
- `json_format` - Structured JSON response
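The examples above don't demonstrate the `how_to_guide` intent or the `json_format` response format. A hedged sketch combining the two, using only the `rag.query(text, { structuredData })` shape documented above (the specific entity and constraint values are illustrative):

```javascript
// Hypothetical combination of a documented intent with a documented
// response format; field names follow the StructuredData schema above.
const guide = await rag.query("How do I reset my router?", {
  structuredData: {
    intent: "how_to_guide",
    entities: { device: "router", action: "factory_reset" },
    constraints: ["Keep steps under 10", "Note any data-loss warnings"],
    responseFormat: "json_format"
  }
});
console.log(guide.answer);
```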
### Advanced Filtering (New in v2.1.0)

```javascript
import RAGSystem from 'rag-system-pgvector';
import { DocumentProcessor } from 'rag-system-pgvector/utils';

const rag = new RAGSystem(config);
const processor = new DocumentProcessor();
await rag.initialize();

// Add documents with user/knowledgebot metadata
const documentData = await processor.processDocumentFromBuffer(
  buffer,
  'user-manual.pdf',
  'pdf',
  {
    userId: 'user_123',
    knowledgebotId: 'tech_support_bot',
    department: 'engineering',
    priority: 'high'
  }
);
await rag.documentStore.saveDocument(documentData);

// Query with user filtering
const userResults = await rag.query('What technical info is available?', {
  userId: 'user_123',
  limit: 5
});

// Query with knowledgebot filtering
const botResults = await rag.query('Help with technical issues', {
  knowledgebotId: 'tech_support_bot'
});

// Query with multiple filters
const filteredResults = await rag.query('Show important documents', {
  userId: 'user_123',
  filter: {
    priority: 'high',
    department: 'engineering'
  }
});

// Direct search with filtering
const searchResults = await rag.searchDocumentsByUserId(
  'documentation',
  'user_123'
);

// Get all documents for a specific user
const userDocs = await rag.getDocumentsByUserId('user_123');
```

### Chat History & Session Persistence (New in v2.3.0)

Enable multi-turn conversations with persistent chat history stored in PostgreSQL.

#### Basic Chat History

```javascript
// First query
const result1 = await rag.query('What is machine learning?');

// Follow-up with context
const result2 = await rag.query('Can you give me examples?', {
  chatHistory: result1.chatHistory
});

// Another follow-up
const result3 = await rag.query('Which one is most popular?', {
  chatHistory: result2.chatHistory
});
```

#### Session Persistence

```javascript
const sessionId = 'user_conversation_123';

// Query with automatic session save/load
const result = await rag.query('What is machine learning?', {
  sessionId: sessionId,
  persistSession: true, // Auto-save after query
  userId: 'user_456',
  knowledgebotId: 'tech_bot'
});

// Continue conversation (automatically loads history)
const result2 = await rag.query('Tell me more', {
  sessionId: sessionId,
  persistSession: true
});

// Load session manually
const session = await rag.loadSession(sessionId);
console.log(`Session has ${session.messageCount} messages`);

// Get all user sessions
const userSessions = await rag.getUserSessions('user_456');
console.log(`User has ${userSessions.length} sessions`);

// Get session statistics
const stats = await rag.getSessionStats({ userId: 'user_456' });
console.log(`Total messages: ${stats.totalMessages}`);
```

#### History Summarization

```javascript
// Long conversations are automatically managed
const result = await rag.query('Complex question', {
  sessionId: sessionId,
  persistSession: true,
  maxHistoryLength: 20 // Keeps recent 20 messages
});
```
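The package doesn't expose its summarization internals, but the configuration options suggest a windowing policy along these lines. A conceptual sketch only — not the package's actual code — with `summarize` standing in for the LLM call:

```javascript
// Conceptual illustration of how alwaysKeepFirst, keepRecentCount and
// summarizeThreshold plausibly interact; `summarize` is a stand-in for
// an LLM summarization call, not a real export of this package.
async function manageHistory(history, opts, summarize) {
  if (history.length <= opts.summarizeThreshold) return history;
  const first = opts.alwaysKeepFirst ? history.slice(0, 1) : [];
  const recent = history.slice(-opts.keepRecentCount);
  const middle = history.slice(first.length, history.length - recent.length);
  const summary = { role: 'system', content: await summarize(middle) };
  return [...first, summary, ...recent];
}
```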
#### Testing Chat Features

```bash
# Basic chat history
npm run test:chat:basic

# Session management
npm run test:chat:session

# History summarization
npm run test:chat:summarization

# Session persistence
npm run test:chat:persistence
```

**Documentation:**

- 📖 [Chat History Implementation Guide](./CHAT-HISTORY-IMPLEMENTATION.md)
- 📖 [Session Persistence Guide](./CHAT-HISTORY-SESSION-PERSISTENCE.md)
- 📖 [Chat History Summarization](./CHAT-HISTORY-SUMMARIZATION.md)

## 📚 API Documentation

### DocumentProcessor Class

The `DocumentProcessor` class provides powerful document processing capabilities for files, buffers, and URLs.

#### Buffer Processing Methods

##### `processDocumentFromBuffer(buffer, fileName, fileType, metadata = {})`

Process a document directly from a memory buffer.

```javascript
import { DocumentProcessor } from 'rag-system-pgvector/utils';

const processor = new DocumentProcessor();
const buffer = Buffer.from('This is a test document', 'utf8');

const result = await processor.processDocumentFromBuffer(
  buffer,
  'test.txt',
  'txt',
  { source: 'api', category: 'test' }
);

// Returns:
// {
//   title: 'Test Document',
//   content: 'This is a test document',
//   chunks: [...],      // Array of processed chunks with embeddings
//   metadata: { ... },
//   fileType: 'txt',
//   filePath: 'test.txt'
// }
```

**Parameters:**

- `buffer` (Buffer): The document content as a Buffer object
- `fileName` (string): Name of the file (used for metadata)
- `fileType` (string): File type ('pdf', 'docx', 'txt', 'html', 'md', 'json')
- `metadata` (object): Additional metadata to attach to the document

**Supported Buffer Types:**

- **TXT**: Plain text files
- **HTML**: HTML documents (extracts text content)
- **Markdown**: Markdown files
- **JSON**: JSON files (converts to readable text)

##### `extractTextFromBuffer(buffer, fileType)`

Extract raw text from a buffer without processing it into chunks.

```javascript
const text = await processor.extractTextFromBuffer(buffer, 'html');
console.log(text); // Extracted plain text
```

#### URL Processing Methods

##### `processDocumentFromUrl(url, metadata = {})`

Download and process a document from a URL.

```javascript
const result = await processor.processDocumentFromUrl(
  'https://example.com/document.pdf',
  {
    source: 'web-crawl',
    priority: 'high',
    category: 'research'
  }
);

// Automatically detects file type from URL and content headers
// Downloads to temp directory and processes
```

**Parameters:**

- `url` (string): HTTP/HTTPS URL to download from
- `metadata` (object): Additional metadata for the document

**Features:**

- Automatic file type detection from URL extension and Content-Type headers
- Temporary file handling (auto-cleanup)
- Support for redirects and various HTTP response types
- Comprehensive error handling

##### `processDocumentsFromUrls(urls, options = {})`

Process multiple URLs in parallel with concurrency control.

```javascript
const urls = [
  'https://site1.com/doc1.pdf',
  'https://site2.com/doc2.html',
  'https://site3.com/doc3.md'
];

const results = await processor.processDocumentsFromUrls(urls, {
  maxConcurrent: 3,                   // Process up to 3 URLs simultaneously
  metadata: { batch: 'import-2024' },
  timeout: 30000,                     // 30 second timeout per URL
  retries: 2                          // Retry failed downloads
});

// Returns:
// {
//   successful: [...], // Array of successfully processed documents
//   failed: [...],     // Array of failed URLs with error details
//   total: 3,
//   successCount: 2,
//   failureCount: 1
// }
```

**Options:**

- `maxConcurrent` (number): Maximum concurrent downloads (default: 5)
- `metadata` (object): Metadata applied to all documents
- `timeout` (number): Timeout per URL in milliseconds
- `retries` (number): Number of retry attempts for failed downloads

#### Error Handling

All methods include comprehensive error handling:

```javascript
try {
  const result = await processor.processDocumentFromBuffer(buffer, 'test.pdf', 'pdf');
} catch (error) {
  if (error.message.includes('Buffer is empty')) {
    console.log('Empty buffer provided');
  } else if (error.message.includes('Unsupported file type')) {
    console.log('File type not supported for buffer processing');
  } else {
    console.log('Processing error:', error.message);
  }
}
```
#### Integration with RAG System

Use processed documents with the RAG system:

```javascript
import fs from 'fs';
import RAGSystem from 'rag-system-pgvector';
import { DocumentProcessor } from 'rag-system-pgvector/utils';

const rag = new RAGSystem(config);
const processor = new DocumentProcessor();
await rag.initialize();

// Process from buffer
const buffer = fs.readFileSync('document.pdf');
const processed = await processor.processDocumentFromBuffer(buffer, 'doc.pdf', 'pdf');

// Add to RAG system
await rag.documentStore.saveDocument(processed);

// Process from URL and add to RAG
const urlProcessed = await processor.processDocumentFromUrl('https://example.com/doc.html');
await rag.documentStore.saveDocument(urlProcessed);

// Now query across all documents
const answer = await rag.query('What information is available?');
```
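A common deployment of the buffer API is behind an HTTP upload endpoint. A minimal sketch, assuming Express and Multer as the upload stack (neither ships with this package) and the `saveDocument` call shown above:

```javascript
import express from 'express';
import multer from 'multer';
import RAGSystem from 'rag-system-pgvector';
import { DocumentProcessor } from 'rag-system-pgvector/utils';

const app = express();
const upload = multer({ storage: multer.memoryStorage() });
const rag = new RAGSystem(config); // config as defined elsewhere
const processor = new DocumentProcessor();
await rag.initialize();

// Accept an uploaded file, process it from its in-memory buffer,
// and store the resulting chunks in the RAG document store.
app.post('/upload', upload.single('document'), async (req, res) => {
  try {
    const ext = req.file.originalname.split('.').pop().toLowerCase();
    const processed = await processor.processDocumentFromBuffer(
      req.file.buffer,
      req.file.originalname,
      ext,
      { source: 'http-upload' }
    );
    await rag.documentStore.saveDocument(processed);
    res.json({ title: processed.title, chunks: processed.chunks.length });
  } catch (err) {
    res.status(400).json({ error: err.message });
  }
});

app.listen(3001);
```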
## 🌐 With Web Interface

```javascript
const rag = new RAGSystem({
  // ...configuration
  server: {
    port: 3000,
    enableWebUI: true
  }
});

await rag.initialize();
await rag.startServer();
// Visit http://localhost:3000
```

## 📖 Documentation

- 📚 **[Complete Package Documentation](./PACKAGE.md)** - Full API reference and examples
- 🔧 **[Integration Guide](./INTEGRATION.md)** - Step-by-step integration examples
- 🎯 **[Examples](./examples.js)** - Ready-to-run examples

## ⚡ Quick Examples

Run the included examples:

```bash
# Basic usage example
npm run example:basic

# Web server example
npm run example:server

# Advanced integration example
npm run example:advanced

# Usage patterns overview
npm run example:patterns
```

## 🛠️ Development & Contributing

For local development and contributions:

### Prerequisites

- **Node.js** v18+
- **PostgreSQL** v12+ with pgvector extension
- **OpenAI API key** (or credentials for your chosen provider)

### Setup

```bash
# Clone and install
git clone https://github.com/yourusername/rag-system-pgvector.git
cd rag-system-pgvector
npm install

# Configure environment
cp .env.example .env
# Edit .env with your credentials

# Initialize database
npm run setup

# Start development
npm run dev
```

### Testing

```bash
# Run examples
npm run example:basic

# Run with web interface
npm run example:server
```

### API Endpoints

With the web server running, the REST API can be exercised directly:

#### Upload Document

```bash
curl -X POST http://localhost:3000/documents/upload \
  -F "document=@path/to/your/document.pdf" \
  -F "title=My Document"
```

#### Process Document from File Path

```bash
curl -X POST http://localhost:3000/documents/process \
  -H "Content-Type: application/json" \
  -d '{
    "filePath": "/path/to/document.pdf",
    "title": "My Document"
  }'
```

#### Search/Query

```bash
curl -X POST http://localhost:3000/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is the main topic of the document?",
    "sessionId": "optional-session-id"
  }'
```
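The same search endpoint can be called from Node. A hedged equivalent of the curl call above, assuming Node 18+ (for the built-in `fetch`) and a JSON response body:

```javascript
// Node 18+ ships a global fetch; the /search route and request body
// mirror the curl example above. The response shape is an assumption.
const res = await fetch('http://localhost:3000/search', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    query: 'What is the main topic of the document?',
    sessionId: 'optional-session-id'
  })
});
const data = await res.json();
console.log(data);
```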
#### Get All Documents

```bash
curl http://localhost:3000/documents
```

#### Get Specific Document

```bash
curl http://localhost:3000/documents/{document-id}
```

#### Delete Document

```bash
curl -X DELETE http://localhost:3000/documents/{document-id}
```

### Command Line Tools

#### Process Documents from Directory

```bash
npm run process-docs /path/to/documents/folder
```

#### Interactive Search

```bash
npm run search
```

#### Single Query Search

```bash
npm run search "Your question here"
```

## 🏗️ Architecture

### System Components

1. **Document Processor** (`src/utils/documentProcessor.js`)
   - Extracts text from various file formats
   - Splits documents into chunks with configurable overlap
   - Generates embeddings via the configured embedding provider

2. **Document Store** (`src/services/documentStore.js`)
   - Manages document and chunk storage in PostgreSQL
   - Performs vector similarity search using pgvector
   - Handles CRUD operations

3. **RAG Workflow** (`src/workflows/ragWorkflow.js`)
   - LangGraph-based workflow orchestration
   - Three-step process: Retrieve → Rerank → Generate
   - Supports conversational context

4. **API Server** (`src/index.js`)
   - Express.js REST API
   - File upload handling
   - Conversation session management

### Database Schema

```sql
-- Documents table
CREATE TABLE documents (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  title VARCHAR(255) NOT NULL,
  content TEXT NOT NULL,
  file_path VARCHAR(500),
  file_type VARCHAR(50),
  metadata JSONB DEFAULT '{}',
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Document chunks with embeddings
CREATE TABLE document_chunks (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  document_id UUID REFERENCES documents(id) ON DELETE CASCADE,
  chunk_index INTEGER NOT NULL,
  content TEXT NOT NULL,
  embedding vector(1536),
  metadata JSONB DEFAULT '{}',
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Search sessions for tracking
CREATE TABLE search_sessions (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  query TEXT NOT NULL,
  results JSONB,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Chat sessions for conversation persistence (NEW)
CREATE TABLE chat_sessions (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  session_id VARCHAR(255) UNIQUE NOT NULL,
  user_id VARCHAR(255),
  knowledgebot_id VARCHAR(255),
  history JSONB DEFAULT '[]'::jsonb,
  metadata JSONB DEFAULT '{}'::jsonb,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  last_activity TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  message_count INTEGER DEFAULT 0
);

-- Indexes for chat sessions
CREATE INDEX idx_chat_sessions_session_id ON chat_sessions(session_id);
CREATE INDEX idx_chat_sessions_user_id ON chat_sessions(user_id);
CREATE INDEX idx_chat_sessions_knowledgebot_id ON chat_sessions(knowledgebot_id);
CREATE INDEX idx_chat_sessions_last_activity ON chat_sessions(last_activity);
```
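Retrieval ultimately reduces to a pgvector distance query against `document_chunks`. A hedged sketch of what such a query looks like with `node-postgres` — the `<=>` cosine-distance operator and the `'[v1,v2,...]'` vector literal are standard pgvector; the surrounding wiring is illustrative, not the package's internal code:

```javascript
import pg from 'pg';

const pool = new pg.Pool({ /* database config from above */ });

// queryEmbedding would come from embeddings.embedQuery(question);
// pgvector accepts vectors as '[v1,v2,...]' string literals.
async function similarChunks(queryEmbedding, limit = 5) {
  const literal = `[${queryEmbedding.join(',')}]`;
  const { rows } = await pool.query(
    `SELECT content, embedding <=> $1::vector AS distance
       FROM document_chunks
      ORDER BY embedding <=> $1::vector
      LIMIT $2`,
    [literal, limit]
  );
  return rows; // lowest distance = most similar
}
```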
### LangGraph Workflow

```mermaid
graph TD
    A[Query Input] --> B[Retrieve Node]
    B --> C[Rerank Node]
    C --> D[Generate Node]
    D --> E[Response Output]
    B --> F[Vector Search]
    F --> G[Similar Chunks]
    C --> H[Score Ranking]
    H --> I[Top Chunks]
    D --> J[LLM Generation]
    J --> K[Contextual Response]
```
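Stripped of the LangGraph machinery, the three nodes compose like an ordinary async pipeline. A conceptual sketch only — the real workflow lives in `src/workflows/ragWorkflow.js`, and `searchChunks`, `scoreChunk`, and `llm` here are placeholders:

```javascript
// Conceptual shape of the Retrieve → Rerank → Generate flow.
// searchChunks and scoreChunk stand in for the actual implementations;
// llm.invoke is the standard LangChain chat-model call.
async function ragPipeline(query) {
  // 1. Retrieve: vector search for candidate chunks
  const candidates = await searchChunks(query);

  // 2. Rerank: order candidates by relevance score, keep the top few
  const top = candidates
    .sort((a, b) => scoreChunk(b, query) - scoreChunk(a, query))
    .slice(0, 5);

  // 3. Generate: ask the LLM to answer grounded in the top chunks
  const context = top.map((c) => c.content).join('\n---\n');
  const answer = await llm.invoke(
    `Context:\n${context}\n\nQuestion: ${query}`
  );
  return answer;
}
```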
## 🔧 Configuration

The RAG system is highly configurable. You can customize every aspect of its behavior through the constructor configuration object.

### Complete Configuration Example

```javascript
import RAGSystem from 'rag-system-pgvector';
import { OpenAIEmbeddings, ChatOpenAI } from '@langchain/openai';

const rag = new RAGSystem({
  // ========================================
  // 1. Database Configuration (Required)
  // ========================================
  database: {
    host: 'localhost',           // Database host
    port: 5432,                  // Database port
    database: 'rag_db',          // Database name
    username: 'postgres',        // Database user
    password: 'your_password',   // Database password

    // Connection Pool Settings
    max: 10,                     // Max connections in pool
    min: 0,                      // Min connections in pool
    maxUses: Infinity,           // Max uses per connection
    allowExitOnIdle: false,      // Allow pool to close when idle
    maxLifetimeSeconds: 0,       // Max connection lifetime (0 = unlimited)
    idleTimeoutMillis: 10000     // Idle timeout (10 seconds)
  },

  // ========================================
  // 2. AI Provider Configuration (Required)
  // ========================================
  embeddings: new OpenAIEmbeddings({
    openAIApiKey: process.env.OPENAI_API_KEY,
    modelName: 'text-embedding-ada-002'
  }),
  llm: new ChatOpenAI({
    openAIApiKey: process.env.OPENAI_API_KEY,
    modelName: 'gpt-4',
    temperature: 0.7
  }),

  // ========================================
  // 3. Embedding Configuration
  // ========================================
  embeddingDimensions: 1536,     // Dimensions for embeddings
                                 // OpenAI ada-002: 1536
                                 // HuggingFace MiniLM: 384
                                 // Anthropic: varies

  // ========================================
  // 4. Vector Store Configuration
  // ========================================
  vectorStore: {
    tableName: 'document_chunks_vector',
    vectorColumnName: 'embedding',
    contentColumnName: 'content',
    metadataColumnName: 'metadata'
  },

  // ========================================
  // 5. Document Processing Configuration
  // ========================================
  processing: {
    chunkSize: 1000,             // Characters per chunk
    chunkOverlap: 200            // Overlap between chunks
  },

  // ========================================
  // 6. Chat History Configuration (NEW)
  // ========================================
  chatHistory: {
    enabled: true,               // Enable chat history feature
    maxMessages: 20,             // Max messages before management kicks in
    maxTokens: 3000,             // Max tokens in chat history
    summarizeThreshold: 30,      // Trigger summarization after N messages
    keepRecentCount: 10,         // Recent messages to preserve
    alwaysKeepFirst: true,       // Always keep conversation starter
    persistSessions: true,       // Store sessions in database
    sessionTimeout: 3600000      // Session timeout (1 hour in ms)
  }
});

await rag.initialize();
```

### Configuration Sections Explained

#### 1. Database Configuration

Controls PostgreSQL connection and pool behavior:

```javascript
database: {
  host: 'localhost',         // Where PostgreSQL is running
  port: 5432,                // PostgreSQL port (default: 5432)
  database: 'rag_db',        // Your database name
  username: 'postgres',      // Database user
  password: 'your_password', // User password

  // Pool Settings (Advanced)
  max: 10,                   // Maximum concurrent connections
  min: 0,                    // Minimum idle connections
  idleTimeoutMillis: 10000   // Close idle connections after 10s
}
```

**Best Practices:**

- Use environment variables for sensitive data
- Set `max` based on your application's concurrency needs
- Monitor connection pool usage in production

#### 2. AI Provider Configuration

Specify your embedding and language model providers:

**OpenAI Example:**

```javascript
import { OpenAIEmbeddings, ChatOpenAI } from '@langchain/openai';

embeddings: new OpenAIEmbeddings({
  openAIApiKey: process.env.OPENAI_API_KEY,
  modelName: 'text-embedding-ada-002'
}),
llm: new ChatOpenAI({
  openAIApiKey: process.env.OPENAI_API_KEY,
  modelName: 'gpt-4',
  temperature: 0.7
})
```

**Anthropic Example:**

```javascript
import { OpenAIEmbeddings } from '@langchain/openai';
import { ChatAnthropic } from '@langchain/anthropic';

embeddings: new OpenAIEmbeddings({
  openAIApiKey: process.env.OPENAI_API_KEY,
  modelName: 'text-embedding-ada-002'
}),
llm: new ChatAnthropic({
  anthropicApiKey: process.env.ANTHROPIC_API_KEY,
  modelName: 'claude-3-sonnet-20240229',
  temperature: 0.7
})
```

**Local Models Example:**

```javascript
import { HuggingFaceTransformersEmbeddings } from '@langchain/community/embeddings/hf_transformers';
import { Ollama } from '@langchain/community/llms/ollama';

embeddings: new HuggingFaceTransformersEmbeddings({
  modelName: 'sentence-transformers/all-MiniLM-L6-v2'
}),
llm: new Ollama({
  baseUrl: 'http://localhost:11434',
  model: 'llama2'
})
```

#### 3. Embedding Dimensions

Match this to your embedding model's output dimensions:

| Model | Dimensions | Provider |
|-------|------------|----------|
| text-embedding-ada-002 | 1536 | OpenAI |
| all-MiniLM-L6-v2 | 384 | HuggingFace |
| text-embedding-3-small | 1536 | OpenAI |
| text-embedding-3-large | 3072 | OpenAI |

```javascript
embeddingDimensions: 1536 // Must match your embedding model
```

**Important:** If you change embedding models, you must recreate the database schema!
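Because stored vectors have a fixed width, switching models means changing the column type and re-embedding every chunk. A hedged sketch of the schema side, again via `node-postgres` — the statements are standard Postgres/pgvector DDL, and 384 is just the all-MiniLM-L6-v2 example from the table above:

```javascript
import pg from 'pg';

const pool = new pg.Pool({ /* database config */ });

// Drop and recreate the vector column at the new width.
// Old embeddings cannot be converted between widths.
await pool.query('ALTER TABLE document_chunks DROP COLUMN embedding');
await pool.query('ALTER TABLE document_chunks ADD COLUMN embedding vector(384)');

// Existing chunks now have no embeddings: re-run document processing
// (e.g. rag.addDocuments(...)) so every chunk is embedded with the new model.
```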
#### 4. Vector Store Configuration

Customize the vector store table structure:

```javascript
vectorStore: {
  tableName: 'document_chunks_vector', // Table name for vectors
  vectorColumnName: 'embedding',       // Column for embeddings
  contentColumnName: 'content',        // Column for text content
  metadataColumnName: 'metadata'       // Column for metadata
}
```

Most users can use the defaults.

#### 5. Document Processing

Control how documents are chunked:

```javascript
processing: {
  chunkSize: 1000,  // Characters per chunk (500-2000 recommended)
  chunkOverlap: 200 // Overlap between chunks (10-20% of chunkSize)
}
```

**Guidelines:**

- **Small chunks (500)**: Better precision, more chunks, higher cost
- **Large chunks (2000)**: Better context, fewer chunks, lower cost
- **Overlap**: Prevents context loss at boundaries (typically 10-20%)

**Examples:**

```javascript
// For technical documentation (needs precision)
processing: { chunkSize: 800, chunkOverlap: 150 }

// For books/long content (needs context)
processing: { chunkSize: 1500, chunkOverlap: 300 }

// For code documentation (needs structure)
processing: { chunkSize: 1000, chunkOverlap: 200 }
```

#### 6. Chat History Configuration (NEW in v2.3.0)

Control conversation history management:

```javascript
chatHistory: {
  enabled: true,           // Enable/disable chat history
  maxMessages: 20,         // Start management after N messages
  maxTokens: 3000,         // Maximum tokens in history
  summarizeThreshold: 30,  // Summarize after N messages
  keepRecentCount: 10,     // Recent messages to always keep
  alwaysKeepFirst: true,   // Keep conversation starter
  persistSessions: true,   // Store in database
  sessionTimeout: 3600000  // 1 hour timeout (in milliseconds)
}
```

**Chat History Options Explained:**

- **`enabled`**: Master switch for the chat history feature
- **`maxMessages`**: Soft limit before history management activates
- **`maxTokens`**: Hard limit on token count (prevents API errors)
- **`summarizeThreshold`**: When to trigger LLM-based summarization
- **`keepRecentCount`**: Recent messages to preserve during summarization
- **`alwaysKeepFirst`**: Preserve conversation context from the beginning
- **`persistSessions`**: Save sessions to database for persistence
- **`sessionTimeout`**: Milliseconds before a session is considered inactive

**Preset Configurations:**

```javascript
// Minimal (cost-effective)
chatHistory: {
  enabled: true,
  maxMessages: 10,
  maxTokens: 1500,
  summarizeThreshold: 15,
  keepRecentCount: 5,
  persistSessions: false
}

// Balanced (recommended)
chatHistory: {
  enabled: true,
  maxMessages: 20,
  maxTokens: 3000,
  summarizeThreshold: 30,
  keepRecentCount: 10,
  persistSessions: true
}

// Maximum context (for complex conversations)
chatHistory: {
  enabled: true,
  maxMessages: 40,
  maxTokens: 6000,
  summarizeThreshold: 50,
  keepRecentCount: 20,
  persistSessions: true
}

// Disabled (for single-shot queries)
chatHistory: { enabled: false }
```

### Environment Variables

Create a `.env` file for sensitive configuration:

```env
# Database
DB_HOST=localhost
DB_PORT=5432
DB_NAME=rag_db
DB_USER=postgres
DB_PASSWORD=your_secure_password

# OpenAI
OPENAI_API_KEY=sk-...

# Anthropic (optional)
ANTHROPIC_API_KEY=sk-ant-...

# Azure (optional)
AZURE_OPENAI_API_KEY=...
AZURE_OPENAI_ENDPOINT=https://...

# Processing (optional)
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
EMBEDDING_DIMENSIONS=1536
```

Then use in your code:

```javascript
import 'dotenv/config';

const rag = new RAGSystem({
  database: {
    host: process.env.DB_HOST,
    port: parseInt(process.env.DB_PORT),
    database: process.env.DB_NAME,
    username: process.env.DB_USER,
    password: process.env.DB_PASSWORD
  },
  embeddings: new OpenAIEmbeddings({
    openAIApiKey: process.env.OPENAI_API_KEY
  }),
  llm: new ChatOpenAI({
    openAIApiKey: process.env.OPENAI_API_KEY
  }),
  embeddingDimensions: parseInt(process.env.EMBEDDING_DIMENSIONS || '1536')
});
```

### Query-Time Configuration

You can also configure behavior at query time:

```javascript
const result = await rag.query('Your question', {
  // Filtering
  userId: 'user_123',           // Filter by user
  knowledgebotId: 'bot_456',    // Filter by bot
  filter: { category: 'tech' }, // Custom metadata filters

  // Retrieval
  limit: 10,                    // Number of chunks to retrieve
  threshold: 0.5,               // Similarity threshold (0-1)

  // Chat History
  chatHistory: previousHistory, // Previous conversation
  maxHistoryLength: 15,         // Override default history length
  sessionId: 'session_789',     // Session identifier
  persistSession: true,         // Save session to database

  // Context
  context: additionalContext,   // Extra context to include
  metadata: { source: 'api' }   // Custom metadata
});
```

### Configuration Best Practices

1. **Security**: Never hardcode API keys or passwords
2. **Environment-Specific**: Use different configs for dev/staging/prod
3. **Performance**: Monitor and adjust based on usage patterns
4. **Cost**: Balance context size with API costs
5. **Testing**: Test with different configurations to find optimal settings

## 📊 Performance Optimization

### Database Indexes

The system creates optimized indexes:

```sql
-- For vector similarity search
CREATE INDEX idx_document_chunks_embedding
ON document_chunks USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

-- For document relationships
CREATE INDEX idx_document_chunks_document_id
ON document_chunks(document_id);
```

### Chunking Strategy

- **Recursive Character Text Splitter**: Preserves semantic boundaries
- **Configurable overlap**: Ensures context continuity
- **Multiple separators**: Prioritizes paragraph, sentence, then word boundaries
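The chunking behavior described above matches LangChain's recursive splitter. A hedged standalone sketch, assuming the `@langchain/textsplitters` package (in older LangChain releases the same class lives under `langchain/text_splitter`):

```javascript
import { RecursiveCharacterTextSplitter } from '@langchain/textsplitters';

// Mirrors the processing config above: paragraph, sentence, then word
// boundaries are tried in order before falling back to hard character cuts.
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 200,
});

// longDocumentText: any long string you want to chunk (placeholder here)
const chunks = await splitter.splitText(longDocumentText);
console.log(`${chunks.length} chunks, first: ${chunks[0].slice(0, 80)}...`);
```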
## 🧪 Testing

### Test Document Processing

```bash
# Create test documents directory
mkdir test-docs

# Add some test files (PDF, DOCX, TXT, etc.)
# Then process them
npm run process-docs ./test-docs
```

### Test Search

```bash
# Interactive search
npm run search

# Or single query
npm run search "What is machine learning?"
```

## 🔍 Troubleshooting

### Common Issues

1. **pgvector extension not found**

   ```sql
   -- Install pgvector extension
   CREATE EXTENSION IF NOT EXISTS vector;
   ```

2. **OpenAI API quota exceeded**
   - Check your OpenAI API usage
   - Consider using alternative embedding models

3. **Large document processing fails**
   - Increase chunk size or reduce document size
   - Check memory limits

4. **Poor search results**
   - Lower similarity threshold
   - Adjust chunk size and overlap
   - Verify document content quality

### Debug Mode

Enable verbose logging by setting:

```env
NODE_ENV=development
```

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Commit your changes
4. Push to the branch
5. Create a Pull Request

## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

## 🙏 Acknowledgments

- [LangChain](https://langchain.com/) for the excellent AI/ML framework
- [pgvector](https://github.com/pgvector/pgvector) for vector similarity search
- [OpenAI](https://openai.com/) for embedding and language models

## 📚 Additional Resources

- [RAG Best Practices](https://docs.langchain.com/docs/use-cases/question-answering)
- [pgvector Documentation](https://github.com/pgvector/pgvector)
- [LangGraph Documentation](https://langchain-ai.github.io/langgraph/)
- [OpenAI Embeddings Guide](https://platform.openai.com/docs/guides/embeddings)