# rag-doc-analyzer

A powerful TypeScript library for document analysis and processing using Retrieval-Augmented Generation (RAG) techniques. This library provides tools for document loading, text extraction, embedding generation, and semantic search.
## Features
- 📄 Document loading and processing (PDF support included)
- 🔍 Semantic search capabilities
- 🤖 Integration with various LLM providers (OpenAI, Ollama, Gemini)
- 🧠 In-memory vector storage
- 🚀 Built with TypeScript for type safety
## Installation
```bash
npm install rag-doc-analyzer
# or
yarn add rag-doc-analyzer
```
## Prerequisites
- Node.js >= 16.0.0
- API keys for your chosen LLM provider (OpenAI, Ollama, or Gemini)
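Rather than hard-coding the API key, a common pattern is to read it from an environment variable. A minimal sketch (the variable name `OPENAI_API_KEY` is a convention for illustration, not something this library requires):

```typescript
// Read the provider API key from the environment instead of hard-coding it.
// Falls back to an empty string so the check below can report a clear error.
const apiKey = process.env.OPENAI_API_KEY ?? '';

if (!apiKey) {
  console.warn('OPENAI_API_KEY is not set; the OpenAI provider will fail to authenticate.');
}
```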
## Quick Start
```typescript
import { DocAnalyzer } from 'rag-doc-analyzer';
import { readFileSync } from 'fs';
import { join } from 'path';

async function main() {
  try {
    // Initialize the analyzer
    const analyzer = await DocAnalyzer.init({
      pdfs: [readFileSync(join(process.cwd(), 'path-to-your-document.pdf'))],
      llm: {
        provider: 'openai', // or 'ollama' or 'gemini'
        apiKey: 'your-api-key-here',
        model: 'gpt-3.5-turbo' // or your preferred model
      },
      embedder: 'openai' // or 'ollama'
    });

    // Ask a question about the document
    const answer = await analyzer.ask('What is the main topic of this document?');
    console.log('Answer:', answer);

    // Or have a conversation
    const messages = [
      { role: 'user' as const, content: 'What are the key points?' }
      // The response will be added to the messages array
    ];
    const response = await analyzer.chat(messages);
    console.log('Chat response:', response);
  } catch (error) {
    console.error('Error:', error);
  }
}

main();
```
## API

### `RagOptions`

```typescript
interface RagOptions {
  // Array of PDF documents (as Buffer, File, or file path string)
  pdfs: (Buffer | File | string)[];

  // LLM configuration
  llm: {
    // The LLM provider to use
    provider: 'openai' | 'ollama' | 'gemini';

    // API key for the provider (optional for some providers like Ollama)
    apiKey?: string;

    // Model to use (e.g., 'gpt-3.5-turbo', 'llama2', 'gemini-pro')
    model: string;
  };

  // Embedding model to use (optional, defaults to 'openai')
  embedder?: 'openai' | 'ollama';

  // Vector store to use (currently only 'memory' is supported)
  vectorStore?: 'memory';
}
```
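For example, a configuration shaped like `RagOptions` for a local Ollama instance (which, per the comments above, needs no API key) might look like this; the file path and model name are illustrative:

```typescript
// Example RagOptions-shaped configuration for a local Ollama setup.
// No apiKey is needed for Ollama; 'llama2' is one of the example
// model names listed in the options above.
const options = {
  pdfs: ['./docs/report.pdf'],     // file path string form
  llm: {
    provider: 'ollama' as const,
    model: 'llama2'
  },
  embedder: 'ollama' as const,     // keep embeddings local too
  vectorStore: 'memory' as const   // the only supported store
};
```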
### `DocAnalyzer.init(options)`

Initialize a new `DocAnalyzer` instance with the provided options.

### `analyzer.ask(question)`

Ask a question about the loaded documents.

### `analyzer.chat(messages)`

Have a conversation about the loaded documents.
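Building on the Quick Start, a multi-turn conversation can be sketched as follows. The `Message` type here is an assumption inferred from the `role`/`content` objects used in the Quick Start, not a type exported by the library:

```typescript
// A message shape inferred from the Quick Start example
// (roles assumed to follow the usual user/assistant convention).
type Message = { role: 'user' | 'assistant'; content: string };

const messages: Message[] = [
  { role: 'user', content: 'Summarize the document.' },
  { role: 'assistant', content: 'The document covers ...' },
  { role: 'user', content: 'What are the key points?' }
];

// The history would then be passed to the analyzer:
// const response = await analyzer.chat(messages);
```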
## Documentation

For detailed documentation and API reference, please visit our [documentation website](https://github.com/yourusername/rag-doc-analyzer#readme).

## Contributing

Contributions are welcome! Please read our [contributing guidelines](CONTRIBUTING.md) to get started.

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.