# rag-doc-analyzer

A powerful TypeScript library for document analysis and processing using Retrieval-Augmented Generation (RAG) techniques. This library provides tools for document loading, text extraction, embedding generation, and semantic search.
## Features
- 📄 Document loading and processing (PDF support included)
- 🔍 Semantic search capabilities
- 🤖 Integration with various LLM providers (OpenAI, Ollama, Gemini)
- 🧠 In-memory vector storage
- 🚀 Built with TypeScript for type safety
## Installation
```bash
npm install rag-doc-analyzer
# or
yarn add rag-doc-analyzer
```
## Prerequisites
- Node.js >= 16.0.0
- API keys for your chosen LLM provider (OpenAI, Ollama, or Gemini)
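Rather than hard-coding the API key, a common pattern is to read it from an environment variable. A minimal sketch (the variable name `OPENAI_API_KEY` is a convention for illustration, not something this library requires):

```typescript
// Read the provider API key from the environment instead of hard-coding it.
// Falls back to an empty string so the check below can report a clear error.
const apiKey = process.env.OPENAI_API_KEY ?? '';

if (!apiKey) {
  console.warn('OPENAI_API_KEY is not set; the OpenAI provider will fail to authenticate.');
}
```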
## Quick Start
```typescript
import { DocAnalyzer } from 'rag-doc-analyzer';
import { readFileSync } from 'fs';
import { join } from 'path';

async function main() {
  try {
    // Initialize the analyzer
    const analyzer = await DocAnalyzer.init({
      pdfs: [readFileSync(join(process.cwd(), 'path-to-your-document.pdf'))],
      llm: {
        provider: 'openai', // or 'ollama' or 'gemini'
        apiKey: 'your-api-key-here',
        model: 'gpt-3.5-turbo' // or your preferred model
      },
      embedder: 'openai' // or 'ollama'
    });

    // Ask a question about the document
    const answer = await analyzer.ask('What is the main topic of this document?');
    console.log('Answer:', answer);

    // Or have a conversation
    const messages = [
      { role: 'user' as const, content: 'What are the key points?' }
      // The response will be added to the messages array
    ];
    const response = await analyzer.chat(messages);
    console.log('Chat response:', response);
  } catch (error) {
    console.error('Error:', error);
  }
}

main();
```
## API

### `RagOptions`

```typescript
interface RagOptions {
  // Array of PDF documents (as Buffer, File, or file path string)
  pdfs: (Buffer | File | string)[];

  // LLM configuration
  llm: {
    // The LLM provider to use
    provider: 'openai' | 'ollama' | 'gemini';

    // API key for the provider (optional for some providers like Ollama)
    apiKey?: string;

    // Model to use (e.g., 'gpt-3.5-turbo', 'llama2', 'gemini-pro')
    model: string;
  };

  // Embedding model to use (optional, defaults to 'openai')
  embedder?: 'openai' | 'ollama';

  // Vector store to use (currently only 'memory' is supported)
  vectorStore?: 'memory';
}
```
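For example, a configuration shaped like `RagOptions` for a local Ollama instance (which, per the comments above, needs no API key) might look like this; the file path and model name are illustrative:

```typescript
// Example RagOptions-shaped configuration for a local Ollama setup.
// No apiKey is needed for Ollama; 'llama2' is one of the example
// model names listed in the options above.
const options = {
  pdfs: ['./docs/report.pdf'],     // file path string form
  llm: {
    provider: 'ollama' as const,
    model: 'llama2'
  },
  embedder: 'ollama' as const,     // keep embeddings local too
  vectorStore: 'memory' as const   // the only supported store
};
```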
### `DocAnalyzer.init(options)`

Initialize a new `DocAnalyzer` instance with the provided options.

### `analyzer.ask(question)`

Ask a question about the loaded documents.

### `analyzer.chat(messages)`

Have a conversation about the loaded documents.
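Building on the Quick Start, a multi-turn conversation can be sketched as follows. The `Message` type here is an assumption inferred from the `role`/`content` objects used in the Quick Start, not a type exported by the library:

```typescript
// A message shape inferred from the Quick Start example
// (roles assumed to follow the usual user/assistant convention).
type Message = { role: 'user' | 'assistant'; content: string };

const messages: Message[] = [
  { role: 'user', content: 'Summarize the document.' },
  { role: 'assistant', content: 'The document covers ...' },
  { role: 'user', content: 'What are the key points?' }
];

// The history would then be passed to the analyzer:
// const response = await analyzer.chat(messages);
```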
## Documentation

For detailed documentation and API reference, please visit our [documentation website](https://github.com/yourusername/rag-doc-analyzer#readme).

## Contributing

Contributions are welcome! Please read our [contributing guidelines](CONTRIBUTING.md) to get started.

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.