# RAG System Package
[npm version](https://badge.fury.io/js/rag-system-pgvector)
[License: MIT](https://opensource.org/licenses/MIT)
A production-ready **Retrieval-Augmented Generation (RAG) system** package built with PostgreSQL pgvector, LangChain, and LangGraph. Supports multiple AI providers including OpenAI, Anthropic, HuggingFace, Azure, Google AI, and local models.
## Features
- **Easy Integration**: Simple npm install and a ready-to-use API
- **Multi-Provider Support**: OpenAI, Anthropic, HuggingFace, Azure, Google AI, Ollama
- **Multi-Format Support**: PDF, DOCX, TXT, HTML, Markdown, JSON
- **Vector Search**: High-performance similarity search with pgvector
- **Structured Data Queries**: Accept JSON data for precise, contextual responses
- **Chat History Support**: Full conversation memory with summarization
- **Production Ready**: Error handling, connection pooling, monitoring
- **Flexible Configuration**: Choose your preferred embedding and LLM providers
- **Buffer Processing**: Process documents directly from memory buffers
- **URL Processing**: Download and process documents from web URLs
- **Batch Operations**: Efficient processing of multiple documents
## Installation
```bash
npm install rag-system-pgvector
# Choose your AI provider (one or more):
npm install @langchain/openai # For OpenAI
npm install @langchain/anthropic # For Anthropic Claude
npm install @langchain/azure-openai # For Azure OpenAI
npm install @langchain/google-genai # For Google AI
npm install @langchain/community # For HuggingFace, Ollama, etc.
```
## Quick Start
### OpenAI Provider (Traditional)
```javascript
import { RAGSystem } from 'rag-system-pgvector';
import { OpenAIEmbeddings, ChatOpenAI } from '@langchain/openai';
// Create provider instances
const embeddings = new OpenAIEmbeddings({
openAIApiKey: 'your-openai-api-key',
modelName: 'text-embedding-ada-002',
});
const llm = new ChatOpenAI({
openAIApiKey: 'your-openai-api-key',
modelName: 'gpt-4',
temperature: 0.7,
});
// Initialize RAG system
const rag = new RAGSystem({
database: {
host: 'localhost',
database: 'your_db',
username: 'postgres',
password: 'your_password'
},
embeddings: embeddings,
llm: llm,
embeddingDimensions: 1536,
});
await rag.initialize();
// Add documents and query
await rag.addDocuments(['./docs/file1.pdf', './docs/file2.txt']);
// Simple query
const result = await rag.query("What is the main topic?");
console.log(result.answer);
// Query with structured data for precise responses
const structuredResult = await rag.query("Tell me about iPhone features", {
structuredData: {
intent: "product_information",
entities: { product: "iPhone", category: "smartphone" },
constraints: ["Focus on latest features", "Include specifications"],
responseFormat: "structured_list"
}
});
console.log(structuredResult.answer);
```
### Mixed Providers (Advanced)
```javascript
import { RAGSystem } from 'rag-system-pgvector';
import { OpenAIEmbeddings } from '@langchain/openai';
import { ChatAnthropic } from '@langchain/anthropic';
// Use OpenAI for embeddings, Anthropic for chat
const embeddings = new OpenAIEmbeddings({
openAIApiKey: 'your-openai-api-key',
modelName: 'text-embedding-ada-002',
});
const llm = new ChatAnthropic({
anthropicApiKey: 'your-anthropic-api-key',
modelName: 'claude-3-haiku-20240307',
temperature: 0.7,
});
const rag = new RAGSystem({
database: { /* your config */ },
embeddings: embeddings,
llm: llm,
embeddingDimensions: 1536,
});
```
### Local Models (Privacy-First)
```javascript
import { RAGSystem } from 'rag-system-pgvector';
import { HuggingFaceTransformersEmbeddings } from '@langchain/community/embeddings/hf_transformers';
import { Ollama } from '@langchain/community/llms/ollama';
// Use local models (no API keys required)
const embeddings = new HuggingFaceTransformersEmbeddings({
modelName: 'sentence-transformers/all-MiniLM-L6-v2',
});
const llm = new Ollama({
baseUrl: 'http://localhost:11434',
model: 'llama2',
});
const rag = new RAGSystem({
database: { /* your config */ },
embeddings: embeddings,
llm: llm,
embeddingDimensions: 384, // all-MiniLM-L6-v2 dimensions
});
```
### Buffer Processing (New in v1.1.0)
```javascript
import fs from 'node:fs';
import { DocumentProcessor } from 'rag-system-pgvector/utils';
const processor = new DocumentProcessor();
// Process a document from a Buffer
const buffer = fs.readFileSync('document.pdf');
const result = await processor.processDocumentFromBuffer(
buffer,
'document.pdf',
'pdf',
{ source: 'api-upload', category: 'research' }
);
console.log(result.chunks); // Processed chunks with embeddings
```
### URL Processing (New in v1.1.0)
```javascript
import { DocumentProcessor } from 'rag-system-pgvector/utils';
const processor = new DocumentProcessor();
// Process single URL
const result = await processor.processDocumentFromUrl(
'https://example.com/document.pdf',
{ source: 'web-crawl', priority: 'high' }
);
// Process multiple URLs
const urls = [
'https://example.com/doc1.pdf',
'https://example.com/doc2.html',
'https://example.com/doc3.md'
];
const results = await processor.processDocumentsFromUrls(urls, {
source: 'batch-import',
maxConcurrent: 3
});
console.log(`Processed ${results.successful.length} documents`);
```
## Structured Data Queries (New in v2.2.0)
The RAG system now supports structured JSON data alongside natural language queries for more precise and contextual responses.
### Basic Structured Query
```javascript
const result = await rag.query("Tell me about iPhone features", {
structuredData: {
intent: "product_information",
entities: {
product: "iPhone",
category: "smartphone",
brand: "Apple"
},
constraints: [
"Focus on latest model features",
"Include technical specifications"
],
context: {
userType: "potential_buyer",
priceRange: "premium"
},
responseFormat: "structured_list"
}
});
```
### Troubleshooting Query
```javascript
const result = await rag.query("My device won't connect to WiFi", {
structuredData: {
intent: "troubleshooting",
entities: {
issue_type: "connectivity",
device_category: "mobile",
problem_area: "wifi"
},
constraints: [
"Provide step-by-step solution",
"Include alternative methods"
],
responseFormat: "step_by_step_guide"
}
});
```
### Comparison Query
```javascript
const result = await rag.query("Compare iPhone vs Samsung Galaxy", {
structuredData: {
intent: "comparison",
entities: {
item1: "iPhone",
item2: "Samsung Galaxy"
},
constraints: [
"Compare key specifications",
"Highlight main differences"
],
responseFormat: "comparison_table"
}
});
```
### Combined with Chat History
```javascript
const result = await rag.query("What about the camera quality?", {
chatHistory: [
{ role: 'user', content: 'Tell me about iPhone features' },
{ role: 'assistant', content: 'The iPhone offers excellent features...' }
],
structuredData: {
intent: "follow_up_question",
entities: {
topic: "camera",
context_reference: "previous_iphone_discussion"
},
responseFormat: "detailed_explanation"
}
});
```
### Structured Data Schema
```typescript
interface StructuredData {
intent: string; // Query intent/category (required)
entities?: { // Named entities and values
[key: string]: string | number;
};
constraints?: string[]; // Requirements/constraints
context?: { // Additional context
[key: string]: string | number | boolean;
};
responseFormat?: string; // Desired response format
}
```
### Common Intents
- `product_information` - Product details and specifications
- `troubleshooting` - Problem-solving and technical support
- `comparison` - Comparing multiple items
- `how_to_guide` - Step-by-step instructions
- `explanation` - Detailed explanations
- `follow_up_question` - Context-aware follow-ups
### Response Formats
- `structured_list` - Organized bullet points
- `step_by_step_guide` - Numbered instructions
- `comparison_table` - Side-by-side comparison
- `detailed_explanation` - Comprehensive explanation
- `bullet_points` - Simple bullet format
- `json_format` - Structured JSON response (see the sketch below)
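Where a downstream consumer needs machine-readable output, the `json_format` hint can be combined with a parse guard. A minimal sketch, assuming the LLM honors the format hint; model output is not guaranteed to be valid JSON, so the parse is wrapped:
```javascript
const res = await rag.query('List the key specs of the iPhone', {
  structuredData: {
    intent: 'product_information',
    entities: { product: 'iPhone' },
    responseFormat: 'json_format'
  }
});
// The format hint is advisory; fall back to raw text if parsing fails.
let specs;
try {
  specs = JSON.parse(res.answer);
} catch {
  specs = { raw: res.answer };
}
```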
### Advanced Filtering (New in v2.1.0)
```javascript
import fs from 'node:fs';
import RAGSystem from 'rag-system-pgvector';
import { DocumentProcessor } from 'rag-system-pgvector/utils';
const rag = new RAGSystem(config);
const processor = new DocumentProcessor();
await rag.initialize();
// Add documents with user/knowledgebot metadata
const buffer = fs.readFileSync('user-manual.pdf');
const documentData = await processor.processDocumentFromBuffer(
buffer,
'user-manual.pdf',
'pdf',
{
userId: 'user_123',
knowledgebotId: 'tech_support_bot',
department: 'engineering',
priority: 'high'
}
);
await rag.documentStore.saveDocument(documentData);
// Query with user filtering
const userResults = await rag.query('What technical info is available?', {
userId: 'user_123',
limit: 5
});
// Query with knowledgebot filtering
const botResults = await rag.query('Help with technical issues', {
knowledgebotId: 'tech_support_bot'
});
// Query with multiple filters
const filteredResults = await rag.query('Show important documents', {
userId: 'user_123',
filter: {
priority: 'high',
department: 'engineering'
}
});
// Direct search with filtering
const searchResults = await rag.searchDocumentsByUserId(
'documentation',
'user_123'
);
// Get all documents for a specific user
const userDocs = await rag.getDocumentsByUserId('user_123');
```
### Chat History & Session Persistence (New in v2.3.0)
Enable multi-turn conversations with persistent chat history stored in PostgreSQL.
#### Basic Chat History
```javascript
// First query
const result1 = await rag.query('What is machine learning?');
// Follow-up with context
const result2 = await rag.query('Can you give me examples?', {
chatHistory: result1.chatHistory
});
// Another follow-up
const result3 = await rag.query('Which one is most popular?', {
chatHistory: result2.chatHistory
});
```
#### Session Persistence
```javascript
const sessionId = 'user_conversation_123';
// Query with automatic session save/load
const result = await rag.query('What is machine learning?', {
sessionId: sessionId,
persistSession: true, // Auto-save after query
userId: 'user_456',
knowledgebotId: 'tech_bot'
});
// Continue conversation (automatically loads history)
const result2 = await rag.query('Tell me more', {
sessionId: sessionId,
persistSession: true
});
// Load session manually
const session = await rag.loadSession(sessionId);
console.log(`Session has ${session.messageCount} messages`);
// Get all user sessions
const userSessions = await rag.getUserSessions('user_456');
console.log(`User has ${userSessions.length} sessions`);
// Get session statistics
const stats = await rag.getSessionStats({ userId: 'user_456' });
console.log(`Total messages: ${stats.totalMessages}`);
```
#### History Summarization
```javascript
// Long conversations are automatically managed
const result = await rag.query('Complex question', {
sessionId: sessionId,
persistSession: true,
maxHistoryLength: 20 // Keeps recent 20 messages
});
```
#### Testing Chat Features
```bash
# Basic chat history
npm run test:chat:basic
# Session management
npm run test:chat:session
# History summarization
npm run test:chat:summarization
# Session persistence
npm run test:chat:persistence
```
**Documentation:**
- [Chat History Implementation Guide](./CHAT-HISTORY-IMPLEMENTATION.md)
- [Session Persistence Guide](./CHAT-HISTORY-SESSION-PERSISTENCE.md)
- [Chat History Summarization](./CHAT-HISTORY-SUMMARIZATION.md)
## API Documentation
### DocumentProcessor Class
The `DocumentProcessor` class handles document processing for local files, memory buffers, and web URLs.
#### Buffer Processing Methods
##### `processDocumentFromBuffer(buffer, fileName, fileType, metadata = {})`
Process a document directly from a memory buffer.
```javascript
import { DocumentProcessor } from 'rag-system-pgvector/utils';
const processor = new DocumentProcessor();
const buffer = Buffer.from('This is a test document', 'utf8');
const result = await processor.processDocumentFromBuffer(
buffer,
'test.txt',
'txt',
{ source: 'api', category: 'test' }
);
// Returns:
// {
// title: 'Test Document',
// content: 'This is a test document',
// chunks: [...], // Array of processed chunks with embeddings
// metadata: { ... },
// fileType: 'txt',
// filePath: 'test.txt'
// }
```
**Parameters:**
- `buffer` (Buffer): The document content as a Buffer object
- `fileName` (string): Name of the file (used for metadata)
- `fileType` (string): File type ('pdf', 'docx', 'txt', 'html', 'md', 'json')
- `metadata` (object): Additional metadata to attach to the document
**Supported Buffer Types:**
- **TXT**: Plain text files
- **HTML**: HTML documents (text content is extracted)
- **Markdown**: Markdown files
- **JSON**: JSON files (converted to readable text; see the sketch below)
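For example, a JSON payload held in memory can be processed the same way as a text file. A minimal sketch (the payload shape and metadata fields are illustrative):
```javascript
const jsonBuffer = Buffer.from(
  JSON.stringify({ name: 'Widget', description: 'A sample product' }),
  'utf8'
);
// JSON buffers are converted to readable text before chunking.
const doc = await processor.processDocumentFromBuffer(
  jsonBuffer,
  'product.json',
  'json',
  { source: 'inventory-api' } // illustrative metadata
);
```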
##### `extractTextFromBuffer(buffer, fileType)`
Extract raw text from a buffer without processing into chunks.
```javascript
const text = await processor.extractTextFromBuffer(buffer, 'html');
console.log(text); // Extracted plain text
```
#### URL Processing Methods
##### `processDocumentFromUrl(url, metadata = {})`
Download and process a document from a URL.
```javascript
const result = await processor.processDocumentFromUrl(
'https://example.com/document.pdf',
{
source: 'web-crawl',
priority: 'high',
category: 'research'
}
);
// Automatically detects file type from URL and content headers
// Downloads to temp directory and processes
```
**Parameters:**
- `url` (string): HTTP/HTTPS URL to download from
- `metadata` (object): Additional metadata for the document
**Features:**
- Automatic file type detection from URL extension and Content-Type headers
- Temporary file handling (auto-cleanup)
- Support for redirects and various HTTP response types
- Comprehensive error handling
##### `processDocumentsFromUrls(urls, options = {})`
Process multiple URLs in parallel with concurrency control.
```javascript
const urls = [
'https://site1.com/doc1.pdf',
'https://site2.com/doc2.html',
'https://site3.com/doc3.md'
];
const results = await processor.processDocumentsFromUrls(urls, {
maxConcurrent: 3, // Process up to 3 URLs simultaneously
metadata: { batch: 'import-2024' },
timeout: 30000, // 30 second timeout per URL
retries: 2 // Retry failed downloads
});
// Returns:
// {
// successful: [...], // Array of successfully processed documents
// failed: [...], // Array of failed URLs with error details
// total: 3,
// successCount: 2,
// failureCount: 1
// }
```
**Options:**
- `maxConcurrent` (number): Maximum concurrent downloads (default: 5)
- `metadata` (object): Metadata applied to all documents
- `timeout` (number): Timeout per URL in milliseconds
- `retries` (number): Number of retry attempts for failed downloads (see the retry sketch below)
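Because the return value separates `successful` and `failed` entries, a caller can retry only the URLs that failed. A hedged sketch, assuming each entry in `failed` records the original `url` (the exact error shape may differ):
```javascript
const { successful, failed } = await processor.processDocumentsFromUrls(urls, {
  maxConcurrent: 3,
  retries: 2
});
if (failed.length > 0) {
  // Assumed shape: each failure carries the URL that could not be processed.
  const retryUrls = failed.map((f) => f.url);
  const retryResults = await processor.processDocumentsFromUrls(retryUrls, {
    maxConcurrent: 1, // slower, gentler second pass
    timeout: 60000    // allow more time per URL on retry
  });
  successful.push(...retryResults.successful);
}
```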
#### Error Handling
All methods include comprehensive error handling:
```javascript
try {
const result = await processor.processDocumentFromBuffer(buffer, 'test.pdf', 'pdf');
} catch (error) {
if (error.message.includes('Buffer is empty')) {
console.log('Empty buffer provided');
} else if (error.message.includes('Unsupported file type')) {
console.log('File type not supported for buffer processing');
} else {
console.log('Processing error:', error.message);
}
}
```
#### Integration with RAG System
Use processed documents with the RAG system:
```javascript
import fs from 'node:fs';
import RAGSystem from 'rag-system-pgvector';
import { DocumentProcessor } from 'rag-system-pgvector/utils';
const rag = new RAGSystem(config);
const processor = new DocumentProcessor();
await rag.initialize();
// Process from buffer
const buffer = fs.readFileSync('document.pdf');
const processed = await processor.processDocumentFromBuffer(buffer, 'doc.pdf', 'pdf');
// Add to RAG system
await rag.documentStore.saveDocument(processed);
// Process from URL and add to RAG
const urlProcessed = await processor.processDocumentFromUrl('https://example.com/doc.html');
await rag.documentStore.saveDocument(urlProcessed);
// Now query across all documents
const answer = await rag.query('What information is available?');
```
## With Web Interface
```javascript
const rag = new RAGSystem({
// ... configuration
server: { port: 3000, enableWebUI: true }
});
await rag.initialize();
await rag.startServer();
// Visit http://localhost:3000
```
## Documentation
- **[Complete Package Documentation](./PACKAGE.md)** - Full API reference and examples
- **[Integration Guide](./INTEGRATION.md)** - Step-by-step integration examples
- **[Examples](./examples.js)** - Ready-to-run examples
## Quick Examples
Run the included examples:
```bash
# Basic usage example
npm run example:basic
# Web server example
npm run example:server
# Advanced integration example
npm run example:advanced
# Usage patterns overview
npm run example:patterns
```
## Development & Contributing
For local development and contributions:
### Prerequisites
- **Node.js** v18+
- **PostgreSQL** v12+ with pgvector extension
- **OpenAI API Key**
### Setup
```bash
# Clone and install
git clone https://github.com/yourusername/rag-system-pgvector.git
cd rag-system-pgvector
npm install
# Configure environment
cp .env.example .env
# Edit .env with your credentials
# Initialize database
npm run setup
# Start development
npm run dev
```
### Testing
```bash
# Run examples
npm run example:basic
# Run with web interface
npm run example:server
```
### API Endpoints
#### Upload Document
```bash
curl -X POST http://localhost:3000/documents/upload \
-F "document=@path/to/your/document.pdf" \
-F "title=My Document"
```
#### Process Document from File Path
```bash
curl -X POST http://localhost:3000/documents/process \
-H "Content-Type: application/json" \
-d '{
"filePath": "/path/to/document.pdf",
"title": "My Document"
}'
```
#### Search/Query
```bash
curl -X POST http://localhost:3000/search \
-H "Content-Type: application/json" \
-d '{
"query": "What is the main topic of the document?",
"sessionId": "optional-session-id"
}'
```
#### Get All Documents
```bash
curl http://localhost:3000/documents
```
#### Get Specific Document
```bash
curl http://localhost:3000/documents/{document-id}
```
#### Delete Document
```bash
curl -X DELETE http://localhost:3000/documents/{document-id}
```
### Command Line Tools
#### Process Documents from Directory
```bash
npm run process-docs /path/to/documents/folder
```
#### Interactive Search
```bash
npm run search
```
#### Single Query Search
```bash
npm run search "Your question here"
```
## Architecture
### System Components
1. **Document Processor** (`src/utils/documentProcessor.js`)
   - Extracts text from various file formats
   - Splits documents into chunks with configurable overlap
   - Generates embeddings via the configured embedding provider
2. **Document Store** (`src/services/documentStore.js`)
   - Manages document and chunk storage in PostgreSQL
   - Performs vector similarity search using pgvector
   - Handles CRUD operations
3. **RAG Workflow** (`src/workflows/ragWorkflow.js`)
   - LangGraph-based workflow orchestration
   - Three-step process: Retrieve → Rerank → Generate (sketched below)
   - Supports conversational context
4. **API Server** (`src/index.js`)
   - Express.js REST API
   - File upload handling
   - Conversation session management
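For orientation, a minimal sketch of how such a three-node pipeline is typically wired with LangGraph's JavaScript API. The node implementations are placeholders: `vectorSearch`, `rankByScore`, and `callLLM` are hypothetical helpers, not exports of this package.
```javascript
import { StateGraph, Annotation, START, END } from '@langchain/langgraph';

// Placeholder state: query in, retrieved chunks through, answer out.
const RagState = Annotation.Root({
  query: Annotation(),
  chunks: Annotation(),
  answer: Annotation()
});

// vectorSearch, rankByScore, and callLLM are hypothetical helpers.
const workflow = new StateGraph(RagState)
  .addNode('retrieve', async (state) => ({ chunks: await vectorSearch(state.query) }))
  .addNode('rerank', async (state) => ({ chunks: rankByScore(state.chunks) }))
  .addNode('generate', async (state) => ({ answer: await callLLM(state) }))
  .addEdge(START, 'retrieve')
  .addEdge('retrieve', 'rerank')
  .addEdge('rerank', 'generate')
  .addEdge('generate', END)
  .compile();

const { answer } = await workflow.invoke({ query: 'What is the main topic?' });
```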
### Database Schema
```sql
-- Documents table
CREATE TABLE documents (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
title VARCHAR(255) NOT NULL,
content TEXT NOT NULL,
file_path VARCHAR(500),
file_type VARCHAR(50),
metadata JSONB DEFAULT '{}',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Document chunks with embeddings
CREATE TABLE document_chunks (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
document_id UUID REFERENCES documents(id) ON DELETE CASCADE,
chunk_index INTEGER NOT NULL,
content TEXT NOT NULL,
embedding vector(1536),
metadata JSONB DEFAULT '{}',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Search sessions for tracking
CREATE TABLE search_sessions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
query TEXT NOT NULL,
results JSONB,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Chat Sessions for conversation persistence (NEW)
CREATE TABLE chat_sessions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
session_id VARCHAR(255) UNIQUE NOT NULL,
user_id VARCHAR(255),
knowledgebot_id VARCHAR(255),
history JSONB DEFAULT '[]'::jsonb,
metadata JSONB DEFAULT '{}'::jsonb,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
last_activity TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
message_count INTEGER DEFAULT 0
);
-- Indexes for chat sessions
CREATE INDEX idx_chat_sessions_session_id ON chat_sessions(session_id);
CREATE INDEX idx_chat_sessions_user_id ON chat_sessions(user_id);
CREATE INDEX idx_chat_sessions_knowledgebot_id ON chat_sessions(knowledgebot_id);
CREATE INDEX idx_chat_sessions_last_activity ON chat_sessions(last_activity);
```
### LangGraph Workflow
```mermaid
graph TD
A[Query Input] --> B[Retrieve Node]
B --> C[Rerank Node]
C --> D[Generate Node]
D --> E[Response Output]
B --> F[Vector Search]
F --> G[Similar Chunks]
C --> H[Score Ranking]
H --> I[Top Chunks]
D --> J[LLM Generation]
J --> K[Contextual Response]
```
## Configuration
The RAG system is highly configurable. You can customize every aspect of its behavior through the constructor configuration object.
### Complete Configuration Example
```javascript
import RAGSystem from 'rag-system-pgvector';
import { OpenAIEmbeddings, ChatOpenAI } from '@langchain/openai';
const rag = new RAGSystem({
// ========================================
// 1. Database Configuration (Required)
// ========================================
database: {
host: 'localhost', // Database host
port: 5432, // Database port
database: 'rag_db', // Database name
username: 'postgres', // Database user
password: 'your_password', // Database password
// Connection Pool Settings
max: 10, // Max connections in pool
min: 0, // Min connections in pool
maxUses: Infinity, // Max uses per connection
allowExitOnIdle: false, // Allow pool to close when idle
maxLifetimeSeconds: 0, // Max connection lifetime (0 = unlimited)
idleTimeoutMillis: 10000 // Idle timeout (10 seconds)
},
// ========================================
// 2. AI Provider Configuration (Required)
// ========================================
embeddings: new OpenAIEmbeddings({
openAIApiKey: process.env.OPENAI_API_KEY,
modelName: 'text-embedding-ada-002'
}),
llm: new ChatOpenAI({
openAIApiKey: process.env.OPENAI_API_KEY,
modelName: 'gpt-4',
temperature: 0.7
}),
// ========================================
// 3. Embedding Configuration
// ========================================
embeddingDimensions: 1536, // Dimensions for embeddings
// OpenAI ada-002: 1536
// HuggingFace MiniLM: 384
// Anthropic: varies
// ========================================
// 4. Vector Store Configuration
// ========================================
vectorStore: {
tableName: 'document_chunks_vector',
vectorColumnName: 'embedding',
contentColumnName: 'content',
metadataColumnName: 'metadata'
},
// ========================================
// 5. Document Processing Configuration
// ========================================
processing: {
chunkSize: 1000, // Characters per chunk
chunkOverlap: 200 // Overlap between chunks
},
// ========================================
// 6. Chat History Configuration (NEW)
// ========================================
chatHistory: {
enabled: true, // Enable chat history feature
maxMessages: 20, // Max messages before management kicks in
maxTokens: 3000, // Max tokens in chat history
summarizeThreshold: 30, // Trigger summarization after N messages
keepRecentCount: 10, // Recent messages to preserve
alwaysKeepFirst: true, // Always keep conversation starter
persistSessions: true, // Store sessions in database
sessionTimeout: 3600000 // Session timeout (1 hour in ms)
}
});
await rag.initialize();
```
### Configuration Sections Explained
#### 1. Database Configuration
Controls PostgreSQL connection and pool behavior:
```javascript
database: {
host: 'localhost', // Where PostgreSQL is running
port: 5432, // PostgreSQL port (default: 5432)
database: 'rag_db', // Your database name
username: 'postgres', // Database user
password: 'your_password', // User password
// Pool Settings (Advanced)
max: 10, // Maximum concurrent connections
min: 0, // Minimum idle connections
idleTimeoutMillis: 10000 // Close idle connections after 10s
}
```
**Best Practices:**
- Use environment variables for sensitive data
- Set `max` based on your application's concurrency needs
- Monitor connection pool usage in production
#### 2. AI Provider Configuration
Specify your embedding and language model providers:
**OpenAI Example:**
```javascript
import { OpenAIEmbeddings, ChatOpenAI } from '@langchain/openai';
embeddings: new OpenAIEmbeddings({
openAIApiKey: process.env.OPENAI_API_KEY,
modelName: 'text-embedding-ada-002'
}),
llm: new ChatOpenAI({
openAIApiKey: process.env.OPENAI_API_KEY,
modelName: 'gpt-4',
temperature: 0.7
})
```
**Anthropic Example:**
```javascript
import { OpenAIEmbeddings } from '@langchain/openai';
import { ChatAnthropic } from '@langchain/anthropic';
embeddings: new OpenAIEmbeddings({
openAIApiKey: process.env.OPENAI_API_KEY,
modelName: 'text-embedding-ada-002'
}),
llm: new ChatAnthropic({
anthropicApiKey: process.env.ANTHROPIC_API_KEY,
modelName: 'claude-3-sonnet-20240229',
temperature: 0.7
})
```
**Local Models Example:**
```javascript
import { HuggingFaceTransformersEmbeddings } from '@langchain/community/embeddings/hf_transformers';
import { Ollama } from '@langchain/community/llms/ollama';
embeddings: new HuggingFaceTransformersEmbeddings({
modelName: 'sentence-transformers/all-MiniLM-L6-v2'
}),
llm: new Ollama({
baseUrl: 'http://localhost:11434',
model: 'llama2'
})
```
#### 3. Embedding Dimensions
Match this to your embedding model's output dimensions:
| Model | Dimensions | Provider |
|-------|------------|----------|
| text-embedding-ada-002 | 1536 | OpenAI |
| all-MiniLM-L6-v2 | 384 | HuggingFace |
| text-embedding-3-small | 1536 | OpenAI |
| text-embedding-3-large | 3072 | OpenAI |
```javascript
embeddingDimensions: 1536 // Must match your embedding model
```
**Important:** If you change embedding models, you must recreate the database schema!
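For instance, moving to `text-embedding-3-large` means setting `embeddingDimensions` to 3072 and rebuilding the vector tables before re-ingesting documents. A sketch of the matching configuration:
```javascript
import RAGSystem from 'rag-system-pgvector';
import { OpenAIEmbeddings, ChatOpenAI } from '@langchain/openai';

const rag = new RAGSystem({
  database: { /* your config */ },
  embeddings: new OpenAIEmbeddings({
    openAIApiKey: process.env.OPENAI_API_KEY,
    modelName: 'text-embedding-3-large'
  }),
  llm: new ChatOpenAI({ openAIApiKey: process.env.OPENAI_API_KEY }),
  embeddingDimensions: 3072 // must match the new model (see table above)
});
```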
#### 4. Vector Store Configuration
Customize the vector store table structure:
```javascript
vectorStore: {
tableName: 'document_chunks_vector', // Table name for vectors
vectorColumnName: 'embedding', // Column for embeddings
contentColumnName: 'content', // Column for text content
metadataColumnName: 'metadata' // Column for metadata
}
```
Most users can use the defaults.
#### 5. Document Processing
Control how documents are chunked:
```javascript
processing: {
chunkSize: 1000, // Characters per chunk (500-2000 recommended)
chunkOverlap: 200 // Overlap between chunks (10-20% of chunkSize)
}
```
**Guidelines:**
- **Small chunks (500)**: Better precision, more chunks, higher cost
- **Large chunks (2000)**: Better context, fewer chunks, lower cost
- **Overlap**: Prevents context loss at boundaries (typically 10-20%)
**Examples:**
```javascript
// For technical documentation (needs precision)
processing: { chunkSize: 800, chunkOverlap: 150 }
// For books/long content (needs context)
processing: { chunkSize: 1500, chunkOverlap: 300 }
// For code documentation (needs structure)
processing: { chunkSize: 1000, chunkOverlap: 200 }
```
#### 6. Chat History Configuration (NEW in v2.3.0)
Control conversation history management:
```javascript
chatHistory: {
enabled: true, // Enable/disable chat history
maxMessages: 20, // Start management after N messages
maxTokens: 3000, // Maximum tokens in history
summarizeThreshold: 30, // Summarize after N messages
keepRecentCount: 10, // Recent messages to always keep
alwaysKeepFirst: true, // Keep conversation starter
persistSessions: true, // Store in database
sessionTimeout: 3600000 // 1 hour timeout (in milliseconds)
}
```
**Chat History Options Explained:**
- **`enabled`**: Master switch for chat history feature
- **`maxMessages`**: Soft limit before history management activates
- **`maxTokens`**: Hard limit on token count (prevents API errors)
- **`summarizeThreshold`**: When to trigger LLM-based summarization
- **`keepRecentCount`**: Recent messages to preserve during summarization
- **`alwaysKeepFirst`**: Preserve conversation context from the beginning
- **`persistSessions`**: Save sessions to database for persistence
- **`sessionTimeout`**: Milliseconds before session is considered inactive
**Preset Configurations:**
```javascript
// Minimal (cost-effective)
chatHistory: {
enabled: true,
maxMessages: 10,
maxTokens: 1500,
summarizeThreshold: 15,
keepRecentCount: 5,
persistSessions: false
}
// Balanced (recommended)
chatHistory: {
enabled: true,
maxMessages: 20,
maxTokens: 3000,
summarizeThreshold: 30,
keepRecentCount: 10,
persistSessions: true
}
// Maximum context (for complex conversations)
chatHistory: {
enabled: true,
maxMessages: 40,
maxTokens: 6000,
summarizeThreshold: 50,
keepRecentCount: 20,
persistSessions: true
}
// Disabled (for single-shot queries)
chatHistory: {
enabled: false
}
```
### Environment Variables
Create a `.env` file for sensitive configuration:
```env
# Database
DB_HOST=localhost
DB_PORT=5432
DB_NAME=rag_db
DB_USER=postgres
DB_PASSWORD=your_secure_password
# OpenAI
OPENAI_API_KEY=sk-...
# Anthropic (optional)
ANTHROPIC_API_KEY=sk-ant-...
# Azure (optional)
AZURE_OPENAI_API_KEY=...
AZURE_OPENAI_ENDPOINT=https://...
# Processing (optional)
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
EMBEDDING_DIMENSIONS=1536
```
Then use in your code:
```javascript
import 'dotenv/config';
const rag = new RAGSystem({
database: {
host: process.env.DB_HOST,
port: parseInt(process.env.DB_PORT),
database: process.env.DB_NAME,
username: process.env.DB_USER,
password: process.env.DB_PASSWORD
},
embeddings: new OpenAIEmbeddings({
openAIApiKey: process.env.OPENAI_API_KEY
}),
llm: new ChatOpenAI({
openAIApiKey: process.env.OPENAI_API_KEY
}),
embeddingDimensions: parseInt(process.env.EMBEDDING_DIMENSIONS || '1536')
});
```
### Query-Time Configuration
You can also configure behavior at query time:
```javascript
const result = await rag.query('Your question', {
// Filtering
userId: 'user_123', // Filter by user
knowledgebotId: 'bot_456', // Filter by bot
filter: { category: 'tech' }, // Custom metadata filters
// Retrieval
limit: 10, // Number of chunks to retrieve
threshold: 0.5, // Similarity threshold (0-1)
// Chat History
chatHistory: previousHistory, // Previous conversation
maxHistoryLength: 15, // Override default history length
sessionId: 'session_789', // Session identifier
persistSession: true, // Save session to database
// Context
context: additionalContext, // Extra context to include
metadata: { source: 'api' } // Custom metadata
});
```
### Configuration Best Practices
1. **Security**: Never hardcode API keys or passwords
2. **Environment-Specific**: Use different configs for dev/staging/prod
3. **Performance**: Monitor and adjust based on usage patterns
4. **Cost**: Balance context size with API costs
5. **Testing**: Test with different configurations to find optimal settings
## Performance Optimization
### Database Indexes
The system creates optimized indexes:
```sql
-- For vector similarity search
CREATE INDEX idx_document_chunks_embedding
ON document_chunks USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
-- For document relationships
CREATE INDEX idx_document_chunks_document_id
ON document_chunks(document_id);
```
### Chunking Strategy
- **Recursive Character Text Splitter**: Preserves semantic boundaries
- **Configurable overlap**: Ensures context continuity
- **Multiple separators**: Prioritizes paragraph, then sentence, then word boundaries (see the sketch below)
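This behavior corresponds to LangChain's `RecursiveCharacterTextSplitter`. A standalone sketch with equivalent settings, shown via the library's public API rather than this package's internal code (`longDocumentText` is assumed to be in scope):
```javascript
import { RecursiveCharacterTextSplitter } from '@langchain/textsplitters';

const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 200,
  // Tried in order: paragraph, line, word, then character boundaries.
  separators: ['\n\n', '\n', ' ', '']
});

// longDocumentText: the raw text of a document.
const chunks = await splitter.splitText(longDocumentText);
```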
## Testing
### Test Document Processing
```bash
# Create test documents directory
mkdir test-docs
# Add some test files (PDF, DOCX, TXT, etc.)
# Then process them
npm run process-docs ./test-docs
```
### Test Search
```bash
# Interactive search
npm run search
# Or single query
npm run search "What is machine learning?"
```
## Troubleshooting
### Common Issues
1. **pgvector extension not found**
```sql
-- Install pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;
```
2. **OpenAI API quota exceeded**
- Check your OpenAI API usage
- Consider using alternative embedding models
3. **Large document processing fails**
- Increase chunk size or reduce document size
- Check memory limits
4. **Poor search results**
   - Lower the similarity threshold (see the sketch below)
   - Adjust chunk size and overlap
   - Verify document content quality
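A quick way to diagnose poor results, using the query-time options documented above (the threshold and limit values here are illustrative starting points):
```javascript
// Cast a wider net: lower the similarity threshold and retrieve more chunks.
const relaxed = await rag.query('Your question', {
  threshold: 0.3, // down from a stricter value such as 0.5
  limit: 10
});
console.log(relaxed.answer);
```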
### Debug Mode
Enable verbose logging by setting:
```env
NODE_ENV=development
```
## Contributing
1. Fork the repository
2. Create a feature branch
3. Commit your changes
4. Push to the branch
5. Create a Pull Request
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments
- [LangChain](https://langchain.com/) for the excellent AI/ML framework
- [pgvector](https://github.com/pgvector/pgvector) for vector similarity search
- [OpenAI](https://openai.com/) for embedding and language models
## Additional Resources
- [RAG Best Practices](https://docs.langchain.com/docs/use-cases/question-answering)
- [pgvector Documentation](https://github.com/pgvector/pgvector)
- [LangGraph Documentation](https://langchain-ai.github.io/langgraph/)
- [OpenAI Embeddings Guide](https://platform.openai.com/docs/guides/embeddings)