# AICF Ethical Design Guide

## AI Context File Format - Privacy-First Architecture

**Version:** 1.0  
**Last Updated:** October 7, 2025  
**Status:** Draft Specification

---

## Table of Contents

1. [Executive Summary](#executive-summary)
2. [Ethical Foundation](#ethical-foundation)
3. [The LLM Export Approach](#the-llm-export-approach)
4. [AICF Format Specification](#aicf-format-specification)
5. [Implementation Guide](#implementation-guide)
6. [Privacy & Security](#privacy--security)
7. [Legal Considerations](#legal-considerations)
8. [Comparison: System Extraction vs LLM Export](#comparison-system-extraction-vs-llm-export)
9. [Roadmap](#roadmap)

---

## Executive Summary

AICF (AI Context File) is a privacy-first format for capturing and sharing AI conversation context across different platforms and tools. Unlike traditional approaches that extract data from application storage, AICF uses a **user-initiated, consent-based export model** where users explicitly request their LLM to generate standardized context files.

### Core Principles

- **User Agency First** - All exports are user-initiated
- **No System Permissions** - No Full Disk Access or privileged permissions required
- **Platform Agnostic** - Works with any LLM via natural language
- **Local Processing** - All data stays on the user's machine
- **Transparent Format** - Human-readable, inspectable by users

---

## Ethical Foundation

### Key Ethical Principles

#### 1. Explicit User Consent

**Principle**: Individuals own their personal data, and collecting it without explicit consent is both unethical and potentially illegal.

**Implementation**:

```text
# ✅ User-initiated export
user: "Export our conversation in AICF format"

# ❌ Automatic system extraction (invasive)
extractFromSystem('~/Library/Application Support/app/');
```

#### 2. Transparency Requirements

Users must have clear information about:

- What data is collected
- How it will be used
- Who has access to it
- How to revoke access

#### 3. Minimum Necessary Data

Only collect the minimum viable information required for the intended purpose. Users should be able to selectively export only relevant portions of their conversations.

#### 4. Data Security & Privacy

Even with consent, users expect their data to remain private and secure. AICF processes all data locally and never transmits it to external servers without explicit user action.

#### 5. User Control

Users must have complete control over:

- When to export data
- What to include in exports
- How to share or delete data
- How to revoke access

---

## The LLM Export Approach

### Why LLM Self-Reporting?

Instead of scraping application databases, AICF leverages the LLM's own ability to understand and format its conversation history.

#### Comparison

```javascript
// System Snooping - Invasive
const extractFromSystem = require('./system-extractors');
await extractFromSystem('~/Library/Application Support/warp');

// LLM Self-Reporting - Consent-Based
const instructions = `
Please export our conversation history in AICF format.
Include: timestamps, context, decisions, code changes.
Format: @CONVERSATION, @DECISIONS, @INSIGHTS tags.
`;
```

### Advantages

| Aspect | System Extraction | LLM Export |
|--------|---------------------|---------------|
| **Consent** | Implicit, hidden | Explicit, visible |
| **Permissions** | Requires Full Disk Access | Requires nothing |
| **Privacy** | Accesses all app data | Only shared conversation |
| **Platform Support** | Custom parser per app | Universal prompt |
| **Maintenance** | Breaks when apps update | Stable |
| **User Trust** | Questionable | High |
| **Legal Risk** | May violate ToS | Uses intended interface |

---

## AICF Format Specification

### Core Structure

AICF files use a tagged section format with clear start/end markers.

```
@CONVERSATION_START
timestamp: 2025-10-07T14:30:00Z
platform: claude|chatgpt|cursor|warp|copilot
session_id: [optional]
user_id: [optional, anonymized]
@CONVERSATION_END

@CONTEXT
- Current project: [project name]
- Working directory: [if mentioned]
- Tech stack: [languages/frameworks discussed]
- Goals: [what user is trying to accomplish]
- Constraints: [limitations, preferences]
@CONTEXT_END

@MESSAGES
[timestamp] user: [message content]
[timestamp] assistant: [response content]
[timestamp] user: [message content]
[timestamp] assistant: [response content]
@MESSAGES_END

@DECISIONS
- [timestamp] Decided to use React instead of Vue because of team familiarity
- [timestamp] Chose PostgreSQL over MongoDB for relational data requirements
- [timestamp] Opted for REST API over GraphQL to reduce complexity
@DECISIONS_END

@CODE_CHANGES
[timestamp]
file: src/components/App.jsx
action: created
summary: Initial React component structure
lines: 45

[timestamp]
file: package.json
action: modified
summary: Added dependencies: react, react-dom, typescript
@CODE_CHANGES_END

@INSIGHTS
- User prefers functional components over class components
- Performance is a priority concern
- Team is familiar with TypeScript but new to React
- Accessibility (a11y) is important for this project
@INSIGHTS_END

@TODO
- [ ] Implement authentication with JWT
- [x] Set up database schema and migrations
- [ ] Add comprehensive error handling
- [ ] Write unit tests for core functionality
@TODO_END

@METADATA
export_version: 1.0
total_messages: 42
export_date: 2025-10-07T15:45:00Z
privacy_level: standard
@METADATA_END
```

### Section Definitions

#### @CONVERSATION

Metadata about the conversation session.

**Required fields**:

- `timestamp`: ISO 8601 format
- `platform`: Source platform name

**Optional fields**:

- `session_id`: Unique identifier for the session
- `user_id`: Anonymized user identifier

#### @CONTEXT

High-level context about the project and goals.

**Recommended fields**:

- Current project name
- Working directory (use relative paths for privacy)
- Tech stack and tools
- Primary goals
- Known constraints

#### @MESSAGES

Complete conversation history.

**Format**: `[timestamp] role: content`

**Roles**: `user`, `assistant`, `system`

#### @DECISIONS

Key decisions made during the conversation.

**Format**: `- [timestamp] Decision with rationale`

Include the reasoning behind decisions for future reference.

#### @CODE_CHANGES

Summary of code created or modified.

**Fields per change**:

- `timestamp`: When the change was made
- `file`: File path (relative preferred)
- `action`: created | modified | deleted
- `summary`: Brief description
- `lines`: Optional line count

#### @INSIGHTS

Important observations about preferences, patterns, and constraints.

**Examples**:

- Coding style preferences
- Performance priorities
- Team capabilities
- Domain-specific requirements

#### @TODO

Outstanding tasks and next steps.

**Format**: GitHub-style checkboxes

- `[ ]` - Incomplete
- `[x]` - Complete

#### @METADATA

Information about the export itself.

**Standard fields**:

- `export_version`: AICF format version
- `total_messages`: Message count
- `export_date`: When exported
- `privacy_level`: standard | enhanced | minimal

---

## Implementation Guide

### Phase 1: Export Prompt Template

#### Basic Export Prompt

```markdown
Please export our conversation in AICF format. Include the following sections:

1. @CONTEXT - Project context, goals, tech stack
2. @MESSAGES - Full conversation history with timestamps
3. @DECISIONS - Key decisions made during this conversation
4. @CODE_CHANGES - Any code created or modified
5. @INSIGHTS - Important observations about preferences, constraints
6. @TODO - Outstanding tasks or next steps

Format each section with clear @SECTION_START and @SECTION_END tags.
Use ISO 8601 timestamps where applicable.
Be comprehensive but concise.
```

#### Privacy-Aware Export

```markdown
Export our conversation in AICF format with privacy protections:

- Redact any API keys, passwords, or credentials
- Use relative file paths instead of absolute paths
- Anonymize any personal identifiers
- Exclude messages tagged with [PRIVATE]
- Replace sensitive data with [REDACTED]

Include standard AICF sections: @CONTEXT, @MESSAGES, @DECISIONS, @CODE_CHANGES, @INSIGHTS, @TODO
```

#### Focused Export

```markdown
Export a focused AICF file containing only:

- Time range: Last 2 hours
- Topic: Authentication implementation
- Sections: @DECISIONS and @CODE_CHANGES only
- Exclude: General discussion, unrelated messages

Use standard AICF format with timestamps.
```

### Phase 2: CLI Tool Design

#### Installation

```bash
npm install -g aicf-cli
# or
cargo install aicf-cli
```

#### Basic Commands

```bash
# Import from clipboard
aicf import --from-clipboard

# Import from file
aicf import --from-file conversation.aicf

# Import from URL (if platform has export API)
aicf import --from-url https://example.com/export/session123

# Parse and validate
aicf parse conversation.aicf

# Extract specific sections
aicf parse conversation.aicf --extract-decisions
aicf parse conversation.aicf --extract-code
aicf parse conversation.aicf --extract-context

# Merge multiple contexts
aicf merge claude-chat.aicf cursor-session.aicf warp-context.aicf --output unified.aicf

# Query the context
aicf query "What did we decide about the database?"
aicf query --related-to "authentication"

# Validate format
aicf validate conversation.aicf

# Convert formats
aicf convert conversation.aicf --to json
aicf convert conversation.aicf --to markdown
```

#### Advanced Commands

```bash
# Deduplicate merged contexts
aicf merge *.aicf --deduplicate --output unified.aicf

# Privacy check
aicf privacy-scan conversation.aicf

# Statistics
aicf stats conversation.aicf

# Split large files
aicf split large-conversation.aicf --by-date --chunk-size 100

# Redact sensitive data
aicf redact conversation.aicf --patterns api_keys,passwords,emails
```

### Phase 3: Parser Implementation

#### JavaScript/TypeScript

```typescript
// aicf-parser.ts

// Note: ConversationMetadata, Message, Decision, CodeChange, TodoItem,
// ExportMetadata, and ValidationResult, plus the parse* helpers other than
// parseMessages, are not defined in this draft.

interface AICFSection {
  type: string;
  content: string;
  metadata?: Record<string, any>;
}

interface AICFDocument {
  conversation: ConversationMetadata;
  context: string[];
  messages: Message[];
  decisions: Decision[];
  codeChanges: CodeChange[];
  insights: string[];
  todos: TodoItem[];
  metadata: ExportMetadata;
}

class AICFParser {
  parse(content: string): AICFDocument {
    const sections = this.extractSections(content);

    return {
      conversation: this.parseConversation(sections.CONVERSATION),
      context: this.parseContext(sections.CONTEXT),
      messages: this.parseMessages(sections.MESSAGES),
      decisions: this.parseDecisions(sections.DECISIONS),
      codeChanges: this.parseCodeChanges(sections.CODE_CHANGES),
      insights: this.parseInsights(sections.INSIGHTS),
      todos: this.parseTodos(sections.TODO),
      metadata: this.parseMetadata(sections.METADATA)
    };
  }

  private extractSections(content: string): Record<string, string> {
    const sections: Record<string, string> = {};
    const sectionRegex = /@(\w+)_START\n([\s\S]*?)@\1_END/g;

    // Also support simplified format: @SECTION\n...\n@SECTION_END
    const simplifiedRegex = /@(\w+)\n([\s\S]*?)@\1_END/g;

    let match: RegExpExecArray | null;
    while ((match = sectionRegex.exec(content)) !== null) {
      sections[match[1]] = match[2].trim();
    }

    while ((match = simplifiedRegex.exec(content)) !== null) {
      // Sections already captured in the _START form take precedence
      if (!sections[match[1]]) {
        sections[match[1]] = match[2].trim();
      }
    }

    return sections;
  }

  private parseMessages(content: string): Message[] {
    const lines = content.split('\n');
    const messages: Message[] = [];

    // One message per line: [timestamp] role: content
    const messageRegex = /^\[(.+?)\] (user|assistant|system): (.+)$/;

    for (const line of lines) {
      const match = line.match(messageRegex);
      if (match) {
        messages.push({
          timestamp: new Date(match[1]),
          role: match[2] as 'user' | 'assistant' | 'system',
          content: match[3]
        });
      }
    }

    return messages;
  }

  validate(content: string): ValidationResult {
    const issues: string[] = [];
    const warnings: string[] = [];

    // Check required sections
    const requiredSections = ['CONVERSATION', 'MESSAGES'];
    for (const section of requiredSections) {
      if (!content.includes(`@${section}`)) {
        issues.push(`Missing required section: @${section}`);
      }
    }

    // Validate timestamps
    const timestampRegex = /\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}/g;
    const timestamps = content.match(timestampRegex) || [];
    for (const ts of timestamps) {
      if (isNaN(Date.parse(ts))) {
        warnings.push(`Invalid timestamp format: ${ts}`);
      }
    }

    // Check for potential sensitive data
    const sensitivePatterns = [
      /api[_-]?key/i, // matches api_key, api-key, and apikey, as in the Python version
      /password/i,
      /secret/i,
      /token/i,
      /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/ // email
    ];

    for (const pattern of sensitivePatterns) {
      if (pattern.test(content)) {
        warnings.push(`Potential sensitive data detected: ${pattern.source}`);
      }
    }

    return {
      valid: issues.length === 0,
      issues,
      warnings
    };
  }
}

export default AICFParser;
```

#### Python Implementation

```python
# aicf_parser.py
import re
from datetime import datetime
from typing import Any, Dict, List, Optional
from dataclasses import dataclass


@dataclass
class Message:
    timestamp: datetime
    role: str
    content: str


@dataclass
class Decision:
    timestamp: datetime
    content: str


@dataclass
class CodeChange:
    timestamp: datetime
    file: str
    action: str
    summary: str
    lines: Optional[int] = None


@dataclass
class TodoItem:
    completed: bool
    content: str


class AICFParser:
    # Note: the _parse_* helpers other than _parse_messages are not
    # defined in this draft.

    def parse(self, content: str) -> Dict:
        sections = self._extract_sections(content)

        return {
            'conversation': self._parse_conversation(sections.get('CONVERSATION', '')),
            'context': self._parse_context(sections.get('CONTEXT', '')),
            'messages': self._parse_messages(sections.get('MESSAGES', '')),
            'decisions': self._parse_decisions(sections.get('DECISIONS', '')),
            'code_changes': self._parse_code_changes(sections.get('CODE_CHANGES', '')),
            'insights': self._parse_insights(sections.get('INSIGHTS', '')),
            'todos': self._parse_todos(sections.get('TODO', '')),
            'metadata': self._parse_metadata(sections.get('METADATA', ''))
        }

    def _extract_sections(self, content: str) -> Dict[str, str]:
        sections = {}

        # Match @SECTION_START ... @SECTION_END
        pattern = r'@(\w+)_START\n(.*?)@\1_END'
        matches = re.finditer(pattern, content, re.DOTALL)

        for match in matches:
            section_name = match.group(1)
            section_content = match.group(2).strip()
            sections[section_name] = section_content

        # Also support simplified format
        pattern = r'@(\w+)\n(.*?)@\1_END'
        matches = re.finditer(pattern, content, re.DOTALL)

        for match in matches:
            section_name = match.group(1)
            if section_name not in sections:
                section_content = match.group(2).strip()
                sections[section_name] = section_content

        return sections

    def _parse_messages(self, content: str) -> List[Message]:
        messages = []
        pattern = r'\[(.+?)\] (user|assistant|system): (.+)'

        for line in content.split('\n'):
            match = re.match(pattern, line)
            if match:
                timestamp_str, role, msg_content = match.groups()
                messages.append(Message(
                    # fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
                    # so normalize it to an explicit UTC offset first
                    timestamp=datetime.fromisoformat(timestamp_str.replace('Z', '+00:00')),
                    role=role,
                    content=msg_content
                ))

        return messages

    def validate(self, content: str) -> Dict[str, Any]:
        issues = []
        warnings = []

        # Check required sections
        required_sections = ['CONVERSATION', 'MESSAGES']
        for section in required_sections:
            if f'@{section}' not in content:
                issues.append(f'Missing required section: @{section}')

        # Check for sensitive data
        sensitive_patterns = [
            (r'api[_-]?key', 'API key'),
            (r'password', 'Password'),
            (r'secret', 'Secret'),
            (r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}', 'Email address')
        ]

        for pattern, desc in sensitive_patterns:
            if re.search(pattern, content, re.IGNORECASE):
                warnings.append(f'Potential sensitive data detected: {desc}')

        return {
            'valid': len(issues) == 0,
            'issues': issues,
            'warnings': warnings
        }
```

### Phase 4: Browser Extension

A browser extension can provide one-click export from web-based LLM interfaces.

#### Features

- Detect LLM platform (Claude, ChatGPT, Gemini, etc.)
- One-click "Export as AICF" button
- Preview export before saving
- Privacy controls (redact sensitive data)
- Direct save to file or clipboard

---

## Privacy & Security

### Data Processing Principles

1. **Local Processing Only**
   - All parsing and analysis happens on the user's machine
   - No data transmitted to external servers
   - No telemetry or analytics collection

2. **Privacy Controls**

   ```javascript
   // Privacy levels
   const PRIVACY_LEVELS = {
     minimal: {
       include: ['decisions', 'insights'],
       exclude: ['messages', 'code_changes'],
       redact: ['all_identifiers']
     },
     standard: {
       include: ['context', 'decisions', 'code_changes', 'insights'],
       exclude: ['raw_messages'],
       redact: ['credentials', 'emails', 'paths']
     },
     full: {
       include: ['all'],
       exclude: [],
       redact: ['credentials']
     }
   };
   ```

3. **Automatic Redaction**

   ```typescript
   // Patterns carry the g flag so that replace() redacts every occurrence,
   // not only the first one
   const SENSITIVE_PATTERNS = {
     api_keys: /[a-zA-Z0-9_-]{32,}/g,
     passwords: /password[:\s=]+\S+/gi,
     emails: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g,
     tokens: /token[:\s=]+\S+/gi,
     absolute_paths: /\/Users\/[^\/]+/g,
     ip_addresses: /\b(?:\d{1,3}\.){3}\d{1,3}\b/g
   };

   function redactSensitive(content: string): string {
     let redacted = content;

     for (const [type, pattern] of Object.entries(SENSITIVE_PATTERNS)) {
       redacted = redacted.replace(pattern, `[REDACTED_${type.toUpperCase()}]`);
     }

     return redacted;
   }
   ```

### Security Best Practices

- Store AICF files with appropriate permissions (chmod 600)
- Use encryption for sensitive exports
- Implement secure deletion (overwrite before removing)
- Audit trail for all operations
- Version control friendly (text format)

---

## Legal Considerations

### Compliance Framework

#### GDPR (European Union)

If your tool has EU users, ensure:

- **Right to Access**: Users can export their data (✅ built into AICF)
- **Right to Erasure**: Users can delete exported files (✅ standard file operations)
- **Data Portability**: Standard format enables portability (✅ AICF is portable)
- **Consent**: User explicitly requests export (✅ consent-based model)

#### CCPA (California)

For California users:

- **Disclosure**: Clear documentation of what data is collected (✅ format spec)
- **Opt-out Rights**: Users control all exports (✅ user-initiated only)
- **No Sale of Data**: AICF never transmits data (✅ local processing)

#### Platform Terms of Service

AICF uses the intended interface (chat/API) rather than scraping, which typically complies with ToS. However:

- ⚠️ Review each platform's export and data usage policies
- Use official APIs where available
- Don't circumvent rate limits or technical protections
- Respect robots.txt and API terms

### Recommended Legal Documentation

For your AICF project, create:

1. **Privacy Policy** - How AICF handles data
2. **Terms of Use** - User responsibilities
3. **Data Processing Agreement** - For enterprise users
4. **Security Documentation** - How data is protected

---

## Comparison: System Extraction vs LLM Export

### Technical Comparison

| Feature | System Extraction | LLM Export |
|---------|------------------|------------|
| **Setup Complexity** | High | Low |
| **Permissions Required** | Full Disk Access | None |
| **Platform Support** | Per-platform parser | Universal |
| **Maintenance Burden** | High (breaks with updates) | Low (stable prompts) |
| **Data Accuracy** | High (raw data) | Good (AI interpretation) |
| **Privacy Risk** | High (accesses all data) | Low (only shared data) |
| **User Control** | Low (automatic) | High (explicit) |
| **Installation** | Complex | Simple |

### Ethical Comparison

| Principle | System Extraction | LLM Export |
|-----------|------------------|------------|
| **Consent** | Implicit | Explicit |
| **Transparency** | Hidden process | Visible action |
| **User Agency** | Passive | Active |
| **Privacy** | Accesses all | Only requested |
| **Trust** | Low | High |
| **Legal Risk** | ⚠️ Moderate-High | Low |

### Decision Matrix

Choose **System Extraction** when:

- Working with legacy systems without APIs
- Need guaranteed data accuracy
- Have explicit user authorization
- Compliance allows system-level access

Choose **LLM Export** when:

- Building new tools (like AICF)
- Privacy is a priority
- Need wide platform support
- Want low maintenance burden
- User trust is critical

**Recommendation**: Use LLM Export as the primary method, with system extraction only as an advanced opt-in feature.

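
As a minimal sketch of the recommendation above, the following Python snippet treats LLM export as the default and refuses system extraction unless the user has explicitly opted in. The names `ExportMethod`, `ExportSettings`, and `resolve_export_method` are hypothetical; they are not part of the AICF specification or of any existing AICF tool.

```python
from dataclasses import dataclass
from enum import Enum


class ExportMethod(Enum):
    """Hypothetical identifiers for the two approaches compared above."""
    LLM_EXPORT = "llm_export"
    SYSTEM_EXTRACTION = "system_extraction"


@dataclass(frozen=True)
class ExportSettings:
    """Hypothetical user settings; system extraction stays off by default."""
    allow_system_extraction: bool = False


def resolve_export_method(requested: ExportMethod,
                          settings: ExportSettings) -> ExportMethod:
    """Return the requested method, rejecting system extraction without opt-in."""
    if (requested is ExportMethod.SYSTEM_EXTRACTION
            and not settings.allow_system_extraction):
        raise PermissionError("System extraction requires explicit user opt-in")
    return requested
```

Raising an error instead of silently falling back keeps the opt-in step visible to the user, which is one way to stay in line with the consent and transparency principles earlier in this guide.
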

---

## Roadmap

### Version 1.0 - Foundation (Current)

- AICF format specification
- Export prompt templates
- Basic parser (JS/TS, Python)
- CLI tool design
- Privacy framework

### Version 1.1 - Enhanced Features

- 🔄 Browser extension (Chrome, Firefox)
- 🔄 GUI application for non-technical users
- 🔄 Advanced privacy controls
- 🔄 Multi-format conversion (JSON, Markdown, HTML)
- 🔄 Encryption support

### Version 2.0 - Platform Integration

- 📋 Official API integrations (where available)
- 📋 IDE plugins (VSCode, JetBrains)
- 📋 Context injection helpers
- 📋 Real-time sync capabilities
- 📋 Collaborative features

### Version 3.0 - Intelligence Layer

- 📋 Semantic search across contexts
- 📋 Auto-tagging and categorization
- 📋 Conflict resolution for merged contexts
- 📋 Insight extraction and summarization
- 📋 Recommendation engine

---

## Contributing

AICF is an open standard. Contributions welcome:

- 📝 Format improvements
- 🔧 Parser implementations in other languages
- 🎨 UI/UX for tools
- 📚 Documentation
- 🧪 Test cases and validation

### Guidelines

1. **Privacy First** - Never compromise user privacy
2. **Simplicity** - Keep the format human-readable
3. **Compatibility** - Ensure backward compatibility
4. **Documentation** - Document all changes
5. **Testing** - Provide test cases

---

## License

[Your chosen license - recommend MIT or Apache 2.0 for open standards]

---

## Contact & Support

- **Issues**: [GitHub Issues](https://github.com/yourusername/aicf-core/issues)
- **Discussions**: [GitHub Discussions](https://github.com/yourusername/aicf-core/discussions)
- **Email**: [your-email@example.com]
- **Documentation**: [https://aicf.dev](https://aicf.dev)

---

## Acknowledgments

This specification was developed with ethical considerations as the primary design constraint, prioritizing user privacy, consent, and control over convenience and technical capability.

Special thanks to the privacy and security communities for establishing the principles that guided this work.

---

**Document Version**: 1.0  
**Last Updated**: October 7, 2025  
**Status**: Draft - Open for Community Feedback