# Universal Conversation Extractor

The Universal Extractor is a bulletproof system for safely extracting conversation data from multiple AI platforms and integrating it into your AICF core system. Built with **data preservation as the top priority**.

## 🔐 Safety First Architecture

### Key Principles

1. **BULLETPROOF BACKUPS** - All existing data is backed up before any operation
2. **INDEPENDENT PARSERS** - Each platform parser operates in complete isolation
3. **GRACEFUL FAILURES** - One parser failure doesn't affect others
4. **DATA VALIDATION** - All extracted data is validated before storage
5. **ROLLBACK CAPABILITY** - Any operation can be safely undone

### Data Loss Prevention

- ✅ **Automatic backups** before every operation
- ✅ **Incremental updates** - only adds new data, never overwrites
- ✅ **Duplicate detection** - prevents processing the same conversation twice
- ✅ **Format validation** - ensures all data meets AICF 3.0 standards
- ✅ **Error isolation** - platform failures don't corrupt existing data

## 🚀 Quick Start

### Basic Usage

```javascript
const { AICF } = require('@aicf/core');

// Create AICF instance
const aicf = new AICF();

// Extract conversations from all available platforms
const results = await aicf.extractConversations({
  maxConversations: 10,
  timeframe: '7d',
  platforms: 'all'
});

console.log(`Extracted ${results.summary.extraction.conversationsExtracted} conversations`);
```

### Platform-Specific Extraction

```javascript
// Extract only from Warp Terminal
const warpResults = await aicf.extractConversations({
  platforms: ['warp'],
  maxConversations: 5
});

// Extract from multiple specific platforms
const results = await aicf.extractConversations({
  platforms: ['warp', 'augment'],
  timeframe: '3d'
});
```

### Advanced Configuration

```javascript
const { AICFExtractorIntegration } = require('@aicf/core');

const extractor = new AICFExtractorIntegration({
  preserveExisting: true,
  normalizeFormat: true,
  organizeFiles: true,
  backupEnabled: true,
  maxBackups: 20
});

const results = await extractor.extractAndIntegrate({
  platforms: 'all',
  maxConversations: 20,
  timeframe: '14d',
  workspaceLimit: 5 // For Augment VSCode workspaces
});
```

## 🏗️ Supported Platforms

### Currently Supported

| Platform | Parser | Data Source | Status |
|----------|--------|-------------|--------|
| **Warp Terminal** | `WarpParser` | SQLite Database | ✅ Production Ready |
| **Augment VSCode** | `AugmentParser` | LevelDB Files | ✅ Production Ready |

### Planned Support

| Platform | Parser | Data Source | Status |
|----------|--------|-------------|--------|
| **OpenAI ChatGPT** | `OpenAIParser` | Export Files | 🚧 Coming Soon |
| **Claude/Anthropic** | `ClaudeParser` | Export Files | 🚧 Coming Soon |
| **Cursor AI** | `CursorParser` | Local Storage | 🚧 Coming Soon |

## 🔍 Platform Detection

The system automatically detects which AI platforms are available on your system:

```javascript
// Check what platforms are available
const status = await aicf.getExtractionStatus();

console.log('Available platforms:', status.extractor.available);
console.log('Unavailable platforms:', status.extractor.unavailable);
```

### Warp Terminal Detection

- Searches multiple possible database locations
- Validates database accessibility
- Reports recent activity status

### Augment VSCode Detection

- Scans VSCode workspace storage
- Identifies active LevelDB stores
- Prioritizes recently accessed workspaces
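Detection and extraction can be combined so that only platforms actually present on the machine are queried. The following is a minimal sketch built from the `getExtractionStatus()` and `extractConversations()` calls shown above; it assumes `status.extractor.available` is an array of platform identifiers (for example `'warp'` or `'augment'`), which may differ from the exact shape returned by your version.

```javascript
const { AICF } = require('@aicf/core');

async function extractFromDetectedPlatforms() {
  const aicf = new AICF();

  // Ask the extractor which platforms were detected on this machine.
  const status = await aicf.getExtractionStatus();

  // Assumption: `available` lists platform identifiers such as 'warp'.
  // If your version returns richer objects, map them to their identifiers first.
  const platforms = status.extractor.available;

  if (!platforms || platforms.length === 0) {
    console.log('No AI platforms detected - nothing to extract.');
    return null;
  }

  // Extract only from the platforms that are actually present.
  return aicf.extractConversations({
    platforms,
    maxConversations: 5,
    timeframe: '7d'
  });
}

extractFromDetectedPlatforms().catch(console.error);
```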
## 💾 Backup & Recovery System

### Automatic Backups

Every extraction operation creates a timestamped backup:

```javascript
const results = await aicf.extractConversations();
console.log(`Backup created: ${results.summary.extraction.backupId}`);
```

### Manual Backup Management

```javascript
const { UniversalExtractor } = require('@aicf/core');
const extractor = new UniversalExtractor();

// Create manual backup
const backupId = await extractor.createBackup('manual-backup');

// List all backups
const backups = extractor.listBackups();
console.log('Available backups:', backups);

// Restore from backup
await extractor.rollbackToBackup(backupId);
```

### Backup Structure

```
.aicf/backups/
├── backup-extraction-2025-01-06T18-30-45-123Z/
│   ├── manifest.json   # Backup metadata
│   ├── ai/             # Complete .ai/ directory backup
│   └── aicf/           # Complete .aicf/ directory backup
└── backup-manual-2025-01-06T17-15-30-456Z/
    └── ...
```

## 🔧 Data Processing Pipeline

### 1. Extraction Phase

```javascript
// Each parser operates independently
const warpConversations = await warpParser.extractConversations();
const augmentConversations = await augmentParser.extractConversations();
```

### 2. Analysis Phase

```javascript
// Analyze for insights, technologies, patterns
const analyzed = await analyzeConversation(conversation);
// Result: { insights, technologies, summary, quality, confidence }
```

### 3. Normalization Phase

```javascript
// Convert to clean AICF 3.0 format
const normalized = await prettifier.prettifyData(analyzed);
```

### 4. Integration Phase

```javascript
// Safely append to existing AICF files (preserves all existing content)
await appendToConversationFile(conversation);
```

## 📊 Data Quality & Validation

### Content Validation

- Minimum content length requirements
- Noise pattern filtering
- Duplicate detection
- Format compliance checking

### Quality Assessment

```javascript
// Automatic quality scoring
const quality = assessConversationQuality(conversation);
// Result: 'high', 'medium', or 'low'

// Confidence calculation
const confidence = calculateConfidence(conversation);
// Result: 0.0 to 1.0
```

### Data Enrichment

- Technology detection
- Insight extraction
- Conversation type classification
- Working directory analysis

## 🚨 Error Handling & Recovery

### Independent Parser Operation

```javascript
// If one parser fails, others continue
try {
  const warpData = await warpParser.extract();
} catch (error) {
  console.warn('Warp parser failed, continuing with other parsers');
}

try {
  const augmentData = await augmentParser.extract();
} catch (error) {
  console.warn('Augment parser failed, continuing with other parsers');
}
```

### Rollback on Critical Failure

```javascript
let results;
try {
  results = await extractor.extractAndIntegrate();
} catch (error) {
  // Roll back if the run got far enough to report its backup id
  if (results?.extraction?.backupId) {
    console.log('Rolling back due to critical failure...');
    await extractor.rollbackToBackup(results.extraction.backupId);
  }
}
```
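Because the result object is never assigned when `extractAndIntegrate()` throws, a more defensive variant is to capture a backup id up front and roll back to it on failure. The sketch below is a hedged example, not the documented API: it combines `createBackup()` and `rollbackToBackup()` from the Manual Backup Management section with `extractAndIntegrate()`, and assumes a single extractor object exposes all three (in practice they may live on `UniversalExtractor` and `AICFExtractorIntegration` separately).

```javascript
// Hedged sketch: extract with a guaranteed rollback point.
// Assumes `extractor` exposes createBackup(), extractAndIntegrate() and
// rollbackToBackup() as shown earlier; adjust if these live on separate classes.
async function extractWithRollback(extractor, options = {}) {
  // Capture a rollback point before touching any data.
  const backupId = await extractor.createBackup('pre-extraction');

  try {
    return await extractor.extractAndIntegrate(options);
  } catch (error) {
    console.log('Rolling back due to critical failure...');
    await extractor.rollbackToBackup(backupId);
    throw error; // Re-throw so callers still see the failure.
  }
}
```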
## 🧪 Testing & Validation

### Run Comprehensive Tests

```bash
# Test the complete system
node test-universal-extractor.js
```

### Test Output Example

```
🧪 Testing Universal Extractor System
=====================================

📊 Step 1: Checking system status...
🔍 Extractor Status:
   Available parsers: 2
   Processed conversations: 0
   Backup count: 3

✅ Available Parsers:
   - Warp Terminal: warp
   - Augment VSCode Extension: augment, vscode

💾 Step 2: Testing backup system...
✅ Backup created: backup-test-2025-01-06T18-45-30-789Z

🔍 Step 3: Testing extraction (limited scope)...
✅ Extraction completed successfully!
   Conversations extracted: 5
   Conversations processed: 5
   Conversations added: 5

🎉 Universal Extractor Test Complete!
✅ System architecture validated
✅ Backup/recovery system working
✅ Data integrity preserved
✅ Platform detection functional
```

## ⚙️ Configuration Options

### Extraction Options

```javascript
const options = {
  platforms: 'all',        // 'all' or ['warp', 'augment']
  maxConversations: 10,    // Limit per platform
  timeframe: '7d',         // '1d', '7d', '30d', etc.
  workspaceLimit: 3,       // For Augment workspaces
  organizeFiles: true      // Run file organization after extraction
};
```

### System Options

```javascript
const systemOptions = {
  backupEnabled: true,        // Always keep enabled for safety
  maxBackups: 10,             // Number of backups to retain
  validateContent: true,      // Validate extracted content
  parallelExtraction: false,  // Sequential for safety
  preserveExisting: true      // Never overwrite existing data
};
```

## 🔄 Integration with AICF Core

### Main AICF Class Integration

```javascript
const aicf = new AICF();

// Extract conversations
await aicf.extractConversations();

// Organize files
await aicf.organizeFiles();

// Run complete memory lifecycle
await aicf.runMemoryLifecycle();
```

### File Organization Integration

The extractor integrates with `FileOrganizationAgent` to:

- Move misplaced files to appropriate subdirectories
- Maintain whitelist of core AICF files
- Create organized documentation structure

### Memory Lifecycle Integration

Works with `MemoryLifecycleManager` to:

- Process extracted conversations through analysis pipeline
- Apply memory decay strategies
- Generate insights and summaries
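For routine use, the three calls above can be chained into a single maintenance pass. The sketch below is a hedged example rather than a built-in feature: it simply invokes the documented `extractConversations()`, `organizeFiles()`, and `runMemoryLifecycle()` methods in order, and the daily `setInterval` scheduling is an assumption (scheduled extraction is listed only as a planned feature below).

```javascript
const { AICF } = require('@aicf/core');

// Illustrative maintenance pass chaining the documented steps in order.
async function runMaintenancePass() {
  const aicf = new AICF();

  // 1. Pull recent conversations from whatever platforms are available.
  await aicf.extractConversations({ platforms: 'all', timeframe: '1d' });

  // 2. Tidy up any misplaced files.
  await aicf.organizeFiles();

  // 3. Apply memory decay, insights, and summaries.
  await aicf.runMemoryLifecycle();
}

// Run once now, then roughly once a day (ideally during off-peak hours).
// Note: this scheduling is a sketch, not a built-in scheduler.
runMaintenancePass().catch(console.error);
setInterval(() => runMaintenancePass().catch(console.error), 24 * 60 * 60 * 1000);
```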
## 🚦 Best Practices

### Production Usage

1. **Always test first**: Run `test-universal-extractor.js` before production use
2. **Start small**: Begin with limited conversations (`maxConversations: 5`)
3. **Monitor backups**: Regularly check backup creation and integrity
4. **Validate results**: Review extracted conversation quality
5. **Schedule extraction**: Run during off-peak hours to avoid platform interference

### Data Safety

1. **Never disable backups** in production
2. **Test rollback procedures** before critical operations
3. **Monitor disk space** for backup storage
4. **Validate data integrity** after each extraction
5. **Keep multiple backup generations** for redundancy

### Platform-Specific Tips

#### Warp Terminal

- Extraction works while Warp is running
- Database access is read-only, so extraction does not interfere with Warp's operation
- Recent conversations have richer metadata

#### Augment VSCode

- Close VSCode before extraction for best results
- LevelDB files can be large - monitor processing time
- Multiple workspaces are processed in parallel

## 🆘 Troubleshooting

### Common Issues

**"No parsers available"**

- Warp/Augment not installed or databases not found
- Check file paths and permissions

**"Backup creation failed"**

- Insufficient disk space
- Permission issues with the .aicf directory

**"Extraction timeout"**

- Large LevelDB files in Augment
- Reduce `workspaceLimit` or increase the timeout

### Debug Mode

```javascript
const extractor = new AICFExtractorIntegration({
  verbose: true // Enable detailed logging
});
```

### Recovery Procedures

```javascript
// Emergency rollback to last known good state
const extractor = new UniversalExtractor();
const backups = extractor.listBackups();
await extractor.rollbackToBackup(backups[0].id);
```

## 📈 Future Enhancements

### Planned Features

- **Real-time extraction**: Monitor AI platforms for new conversations
- **Enhanced analysis**: Integration with Dennis's full ConversationAnalyzer
- **Cloud backup**: Optional backup to cloud storage
- **Web interface**: Visual management of extractions and backups
- **Scheduled extraction**: Automatic daily/weekly extraction runs

### Parser Extensions

- OpenAI ChatGPT export parser
- Claude conversation parser
- Cursor AI session parser
- Generic JSON conversation parser

---

## 🎯 Summary

The Universal Extractor provides a **bulletproof, production-ready system** for safely extracting and integrating AI conversation data. With automatic backups, independent parser operation, and comprehensive error handling, it ensures your valuable conversation history is preserved and enhanced without risk.

**Key benefits:**

- ✅ **Zero data loss** with automatic backup/recovery
- ✅ **Platform independence** - parsers can't interfere with each other
- ✅ **Production ready** with comprehensive testing
- ✅ **Future proof** - easily extensible to new AI platforms
- ✅ **Integrated** - works seamlessly with existing AICF core systems