# @weavebot/core

Generic content processing framework for web scraping and AI extraction.

## Overview

`@weavebot/core` is a lightweight, plugin-based framework for extracting structured data from web content. It provides infrastructure without implementation details, allowing you to build custom content processing pipelines.

## Features

- 🔌 **Plugin Architecture** - Extend functionality without modifying core
- 🤖 **Schema-Driven AI Extraction** - Register custom schemas for any data type
- 🌐 **Generic Web Scraper** - Platform-agnostic with plugin support
- 💾 **Flexible Storage Interface** - Use any backend (Airtable, MongoDB, etc.)
- 📝 **Dynamic Schema Registry** - Register schemas at runtime
- 🔧 **Zero Implementation Details** - Pure infrastructure, no domain logic

## Installation

```bash
npm install @weavebot/core
```

## Quick Start

```typescript
import ContentProcessor, {
  createWebScraper,
  createAIExtractor,
  SchemaRegistry
} from '@weavebot/core';
import { z } from 'zod';

// Create processor instance
const processor = new ContentProcessor();

// Register your schema
const ArticleSchema = z.object({
  title: z.string(),
  author: z.string(),
  content: z.string(),
  publishedAt: z.date()
});

processor.registerSchema('article', ArticleSchema);

// Set up processors
const scraper = createWebScraper();
const extractor = createAIExtractor({
  provider: 'openai',
  apiKey: process.env.OPENAI_API_KEY
});

// Register extraction configuration
extractor.registerExtractor('article', {
  schema: ArticleSchema,
  systemPrompt: 'Extract article information from the content',
  userPromptTemplate: 'Extract article from: {{content}}'
});

processor.addProcessor('web-scraper', scraper);
processor.addProcessor('ai-extractor', extractor);

// Process a URL
const result = await processor.process({
  type: 'url',
  data: 'https://example.com/article',
  schema: 'article'
});
```

## Plugin System

Create platform-specific plugins for the web scraper:

```typescript
import { WebScraperPlugin } from '@weavebot/core';

class MyPlatformPlugin implements WebScraperPlugin {
  name = 'my-platform';

  canHandle(url: string): boolean {
    return url.includes('myplatform.com');
  }

  getConfig(url: string) {
    return {
      strategy: 'spa',
      waitSelectors: ['.content-loaded'],
      timeout: 10000
    };
  }
}

scraper.registerPlugin(new MyPlatformPlugin());
```

## Storage Adapters

Implement the generic storage interface for your backend:

```typescript
import { StorageAdapter } from '@weavebot/core';

class MyStorageAdapter implements StorageAdapter {
  async initialize(config) { /* ... */ }
  async create(collection, data) { /* ... */ }
  async read(collection, id) { /* ... */ }
  async update(collection, id, data) { /* ... */ }
  async delete(collection, id) { /* ... */ }
  async query(collection, filter) { /* ... */ }
}

processor.addStorage('my-storage', new MyStorageAdapter());
```

## Documentation

For complete documentation, visit the [GitHub repository](https://github.com/weavebot/library).

## License

MIT
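
## Example: In-Memory Storage Adapter

The Storage Adapters section lists six methods but leaves their bodies as stubs. The sketch below fills them in with a `Map`-backed store, which may be handy for tests. It does not import from `@weavebot/core`: the `StorageAdapterLike` interface, the `StoredRecord` type, and every parameter and return type here are assumptions made for illustration, so check them against the package's published typings before adapting this.

```typescript
// Hypothetical local mirror of the six methods named in the README.
// The parameter and return types are assumptions, not the package's typings.
type StoredRecord = Record<string, unknown> & { id: string };

interface StorageAdapterLike {
  initialize(config?: Record<string, unknown>): Promise<void>;
  create(collection: string, data: Record<string, unknown>): Promise<StoredRecord>;
  read(collection: string, id: string): Promise<StoredRecord | null>;
  update(collection: string, id: string, data: Record<string, unknown>): Promise<StoredRecord | null>;
  delete(collection: string, id: string): Promise<boolean>;
  query(collection: string, filter: Record<string, unknown>): Promise<StoredRecord[]>;
}

class InMemoryStorageAdapter implements StorageAdapterLike {
  private collections = new Map<string, Map<string, StoredRecord>>();
  private nextId = 1;

  // Reset all state; a real backend would open a connection here.
  async initialize(): Promise<void> {
    this.collections.clear();
    this.nextId = 1;
  }

  // Look up a collection, creating it on first use.
  private table(collection: string): Map<string, StoredRecord> {
    let table = this.collections.get(collection);
    if (!table) {
      table = new Map<string, StoredRecord>();
      this.collections.set(collection, table);
    }
    return table;
  }

  // Store a copy of the data under a generated string id.
  async create(collection: string, data: Record<string, unknown>): Promise<StoredRecord> {
    const record: StoredRecord = { ...data, id: String(this.nextId++) };
    this.table(collection).set(record.id, record);
    return record;
  }

  async read(collection: string, id: string): Promise<StoredRecord | null> {
    return this.table(collection).get(id) ?? null;
  }

  // Shallow-merge new fields into an existing record; the id is preserved.
  async update(collection: string, id: string, data: Record<string, unknown>): Promise<StoredRecord | null> {
    const existing = this.table(collection).get(id);
    if (!existing) return null;
    const updated: StoredRecord = { ...existing, ...data, id };
    this.table(collection).set(id, updated);
    return updated;
  }

  // Returns true only if a record was actually removed.
  async delete(collection: string, id: string): Promise<boolean> {
    return this.table(collection).delete(id);
  }

  // Return records whose fields strictly equal every key in the filter.
  async query(collection: string, filter: Record<string, unknown>): Promise<StoredRecord[]> {
    const conditions = Object.entries(filter);
    return Array.from(this.table(collection).values()).filter((record) =>
      conditions.every(([key, value]) => record[key] === value)
    );
  }
}
```

If the real `StorageAdapter` interface turns out to be compatible, an instance could presumably be passed to `processor.addStorage(...)` as in the Storage Adapters section. The equality-only `query` is a deliberate simplification; a production adapter would translate the filter into the backend's own query language.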