UNPKG

oneie

Version:

Build apps, websites, and AI agents in English. Zero-interaction setup for AI agents (Claude Code, Cursor, Windsurf). Download to your computer, run in the cloud, deploy to the edge. Open source and free forever.

661 lines (553 loc) 17.4 kB
--- title: Knowledge dimension: knowledge category: knowledge.md tags: ai, cycle, knowledge, ontology, rag, things related_dimensions: connections, events, people, things scope: global created: 2025-11-03 updated: 2025-11-03 version: 1.0.0 ai_context: | This document is part of the knowledge dimension in the knowledge.md category. Location: one/knowledge/knowledge.md Purpose: Documents knowledge: vectors, embeddings & cycle Related dimensions: connections, events, people, things For AI agents: Read this to understand knowledge. --- # Knowledge: Vectors, Embeddings & Cycle **The Intelligence Layer** Knowledge is the fourth primitive in ONE's ontology. It powers: - **Semantic search** via vector embeddings - **RAG** (Retrieval-Augmented Generation) for AI responses - **Cycle** for content generation - **Taxonomy** via labels (replaces legacy "tags") --- ## The Knowledge Table ```typescript type KnowledgeType = 'label' | 'document' | 'chunk' | 'vector_only'; type Knowledge = { _id: Id<'knowledge'>; knowledgeType: KnowledgeType; // Content text?: string; // Omitted for vector_only embedding?: number[]; // Float32 vector (e.g., 3072 dims) embeddingModel?: string; // 'text-embedding-3-large' embeddingDim?: number; // Source linkage sourceThingId?: Id<'things'>; // Which thing this came from sourceField?: string; // 'content', 'transcript', 'title' chunk?: { index: number; start?: number; end?: number; tokenCount?: number; overlap?: number; }; // Taxonomy labels?: string[]; // ['industry:fitness', 'skill:video'] // Metadata metadata?: { language?: string; protocol?: string; // For protocol-specific knowledge hash?: string; // Content hash for deduplication modelVersion?: string; qualityScore?: number; [key: string]: any; }; createdAt: number; updatedAt: number; deletedAt?: number; } ``` ### Junction: thingKnowledge Links knowledge things with roles: ```typescript type ThingKnowledge = { _id: Id<'thingKnowledge'>; thingId: Id<'things'>; knowledgeId: Id<'knowledge'>; role?: 'label' | 'summary' | 'chunk_of' | 'caption' | 'keyword'; metadata?: Record<string, any>; createdAt: number; } ``` --- ## Four Knowledge Types ### 1. Label (Taxonomy) Replaces legacy "tags" with structured labels: ```typescript { knowledgeType: 'label', labels: ['industry:fitness', 'skill:video-editing'], text: 'fitness video-editing', // Searchable text metadata: { category: 'skill', usageCount: 42 } } ``` **Label Categories:** - `industry:*` - fitness, tech, education - `skill:*` - video-editing, copywriting, design - `topic:*` - seo, social-media, email-marketing - `format:*` - video, blog, podcast, course - `goal:*` - lead-generation, brand-awareness - `audience:*` - beginners, professionals - `technology:*` - youtube, instagram, tiktok - `status:*` - draft, published, archived - `capability:*` - chat, analyze, generate - `protocol:*` - a2a, acp, ap2, x402, ag-ui - `payment_method:*` - stripe, crypto, invoice - `network:*` - sui, solana, base, ethereum ### 2. Document (Pre-chunking Wrapper) Represents a source document before chunking: ```typescript { knowledgeType: 'document', text: '[Full document text]', sourceThingId: blogPostId, sourceField: 'content', metadata: { title: 'How to Build a Creator Business', wordCount: 2500, language: 'en' } } ``` ### 3. Chunk (Semantic Fragment) 800-token chunks with embeddings for RAG: ```typescript { knowledgeType: 'chunk', text: '[Paragraph of content...]', embedding: [0.123, -0.456, ...], // 3072-dim vector embeddingModel: 'text-embedding-3-large', embeddingDim: 3072, sourceThingId: blogPostId, sourceField: 'content', chunk: { index: 5, tokenCount: 783, start: 3200, // Char offset end: 6100, overlap: 200 }, labels: ['industry:fitness', 'topic:seo'] } ``` ### 4. Vector-Only (Privacy) Embedding without storing plaintext: ```typescript { knowledgeType: 'vector_only', embedding: [0.123, -0.456, ...], embeddingModel: 'text-embedding-3-large', sourceThingId: privateDocId, metadata: { hash: 'sha256:abc123...', // Content hash redacted: true } } ``` --- ## Cycle Flow ### How Knowledge Powers Generation ``` ┌─────────────────────────────────────────────────────┐ 1. User Request "Create a fitness course about weight loss" └──────────────────────┬──────────────────────────────┘ ┌─────────────────────────────────────────────────────┐ 2. Vector Search Query embedding Find similar chunks Filter: labels=['industry:fitness', 'format:course'] └──────────────────────┬──────────────────────────────┘ ┌─────────────────────────────────────────────────────┐ 3. RAG Context Assembly Top 10 chunks Context for LLM + Labels + Metadata └──────────────────────┬──────────────────────────────┘ ┌─────────────────────────────────────────────────────┐ 4. LLM Generation Context + Prompt Generated content Track: cycle_request event └──────────────────────┬──────────────────────────────┘ ┌─────────────────────────────────────────────────────┐ 5. New Thing Created Course entity + Lessons + Connections Log: content_event (action: 'created') └──────────────────────┬──────────────────────────────┘ ┌─────────────────────────────────────────────────────┐ 6. Embed New Content Course chunks embeddings knowledge Link via thingKnowledge └─────────────────────────────────────────────────────┘ ``` --- ## Ingestion Pipeline ### Step 1: Extract Text from Things ```typescript // Which thing types have embeddable content? const embeddableSources = [ 'blog_post', // properties.content 'video', // properties.transcript 'podcast', // properties.transcript 'social_post', // properties.text 'email', // properties.content 'course', // properties.description + lessons 'lesson', // properties.content 'livestream', // properties.recording.transcript 'website', // properties.pages[].html (stripped) ]; ``` ### Step 2: Chunk Text ```typescript // Chunking strategy const chunkConfig = { size: 800, // tokens overlap: 200, // tokens boundaries: 'sentence', // Don't split mid-sentence preserveCode: true, // Keep code blocks intact preserveTables: true }; // Chunker output type Chunk = { text: string; index: number; tokenCount: number; start: number; // Char offset in source end: number; field: string; // Source field name labels?: string[]; // Inherited from thing }; ``` ### Step 3: Generate Embeddings ```typescript // Call embedding provider (OpenAI, Cohere, etc.) const embedding = await embedText({ text: chunk.text, model: 'text-embedding-3-large' }); // Returns: { embedding: number[], model: string, dim: number } ``` ### Step 4: Store Knowledge Items ```typescript // Create knowledge item const knowledgeId = await db.insert('knowledge', { knowledgeType: 'chunk', text: chunk.text, embedding: embedding.embedding, embeddingModel: embedding.model, embeddingDim: embedding.dim, sourceThingId: thingId, sourceField: chunk.field, chunk: { index: chunk.index, tokenCount: chunk.tokenCount, start: chunk.start, end: chunk.end, overlap: 200 }, labels: chunk.labels, metadata: { hash: sha256(chunk.text), language: 'en' }, createdAt: Date.now(), updatedAt: Date.now() }); // Link to thing await db.insert('thingKnowledge', { thingId, knowledgeId, role: 'chunk_of', createdAt: Date.now() }); ``` --- ## Retrieval (RAG) ### Vector Search Query ```typescript // User asks: "How do I grow on Instagram?" const queryEmbedding = await embedText({ text: "How do I grow on Instagram?", model: 'text-embedding-3-large' }); // Vector search const results = await vectorSearch('knowledge', { vectorField: 'embedding', query: queryEmbedding.embedding, filter: { knowledgeType: 'chunk', organizationId: currentOrgId, // Multi-tenant isolation 'labels': { $in: ['topic:social-media', 'technology:instagram'] } }, k: 10, // Top 10 results threshold: 0.7 // Min similarity score }); ``` ### Hybrid Scoring Combine semantic similarity with symbolic signals: ```typescript // Boost by: // 1. Label matches // 2. Recency // 3. Quality score // 4. User feedback const reranked = results.map(r => ({ ...r, score: r.similarity * 0.6 + // Vector similarity labelBoost(r.labels) * 0.2 + // Label match recencyBoost(r.createdAt) * 0.1 + // Newer is better qualityBoost(r.metadata.qualityScore) * 0.1 })).sort((a, b) => b.score - a.score); ``` ### Context Assembly ```typescript // Build context for LLM const context = { chunks: reranked.slice(0, 5), // Top 5 chunks totalTokens: sum(chunks.map(c => c.chunk.tokenCount)), sources: unique(chunks.map(c => c.sourceThingId)), labels: unique(chunks.flatMap(c => c.labels)) }; // Generate answer with citations const answer = await llm.generate({ system: "You are a creator business expert.", context: context.chunks.map(c => c.text).join('\n\n'), prompt: "How do I grow on Instagram?", citations: context.sources // Link back to source things }); ``` --- ## Cycle Events Track all AI generations in the ontology: ```typescript // 1. Cycle requested await db.insert('events', { type: 'cycle_request', actorId: userId, targetId: cycleRequestId, timestamp: Date.now(), metadata: { organizationId, model: 'gpt-4', prompt: 'Create a fitness course...', contextChunks: 10, cost: 0.045, price: 0.10 } }); // 2. Cycle completed await db.insert('events', { type: 'cycle_completed', actorId: 'system', targetId: cycleRequestId, timestamp: Date.now(), metadata: { result: courseId, tokensGenerated: 2500, latency: 3.2, // seconds revenue: 0.10 } }); // 3. Update org usage const org = await db.get(organizationId); await db.patch(organizationId, { properties: { ...org.properties, usage: { ...org.properties.usage, cycles: org.properties.usage.cycles + 1 } } }); // 4. Check quota if (org.properties.usage.cycles >= org.properties.limits.cycles) { await db.insert('events', { type: 'cycle_quota_exceeded', actorId: 'system', targetId: organizationId, timestamp: Date.now(), metadata: { limit: org.properties.limits.cycles, usage: org.properties.usage.cycles } }); } ``` --- ## Cycle Revenue Model ### Daily Revenue Collection ```typescript // Collect all cycle revenue for the day const today = new Date().setHours(0, 0, 0, 0); const cycleEvents = await db .query('events') .withIndex('type_time', q => q.eq('type', 'cycle_completed') .gte('timestamp', today) ) .collect(); const metrics = { totalCycles: cycleEvents.length, totalRevenue: sum(cycleEvents.map(e => e.metadata.revenue)), totalCosts: sum(cycleEvents.map(e => e.metadata.cost)), netProfit: 0, // Calculated below profitMargin: 0 }; metrics.netProfit = metrics.totalRevenue - metrics.totalCosts; metrics.profitMargin = (metrics.netProfit / metrics.totalRevenue) * 100; // Log revenue collection event await db.insert('events', { type: 'cycle_revenue_collected', actorId: 'system', targetId: platformOwnerId, timestamp: Date.now(), metadata: { ...metrics, network: 'sui', treasuryAddress: process.env.PLATFORM_TREASURY_SUI, txDigest: '...' // On-chain transaction } }); ``` ### Revenue Share Distribution If organizations have revenue share agreements: ```typescript const org = await db.get(organizationId); if (org.properties.revenueShare > 0) { const orgRevenue = metrics.totalRevenue * org.properties.revenueShare; const platformRevenue = metrics.totalRevenue - orgRevenue; // Log revenue generation await db.insert('events', { type: 'org_revenue_generated', actorId: organizationId, targetId: platformOwnerId, timestamp: Date.now(), metadata: { totalRevenue: metrics.totalRevenue, orgShare: orgRevenue, platformShare: platformRevenue, revenueSharePercentage: org.properties.revenueShare } }); // Distribute to org await db.insert('events', { type: 'revenue_share_distributed', actorId: platformOwnerId, targetId: orgOwnerId, timestamp: Date.now(), metadata: { amount: orgRevenue, percentage: org.properties.revenueShare, network: 'sui', txDigest: '...' } }); } ``` --- ## Cycle Score Track how many times AI modifies the ontology: ```markdown # Cycle Score Measures ontology stability. Lower is better. Current Score: 15 Last Hash: adbe7c5cd7e0d7bd1f17917054091b9b02204936aaeb2e60a395df39e7deb8cd Updated: 2025-10-07T17:21:52Z ## History - 2025-10-07T17:09:16Z: score=1 (initial) - 2025-10-07T17:21:52Z: score=15 (current) ## Goal Keep score < 20 per month. Stability = beauty. ``` --- ## Lifecycle & Governance ### Versioning - Store `metadata.hash` (SHA256 of text) - Skip re-embedding if hash unchanged - Track `metadata.modelVersion` for model upgrades ### Re-embedding Policy - On content change: schedule re-embed - On model upgrade: background reindex - Debounce: wait 5 minutes before re-embedding ### Archival - Mark old chunks `deletedAt` on major edits - Keep for audit trail - GC after 90 days ### Quality Tracking ```typescript metadata: { qualityScore: 0.85, // 0-1 userFeedback: [ { userId, rating: 5, comment: 'very helpful' } ], clickThrough: 0.42, // CTR on citations bounceRate: 0.12 } ``` --- ## Security & Privacy ### PII Redaction Before embedding, strip: - Email addresses - Phone numbers - Credit card numbers - Social security numbers Mark with `metadata.redacted = true` ### Sensitive Content Use `knowledgeType: 'vector_only'`: - No plaintext stored - Only embeddings + hash - Citations show source but not content ### Access Control Vector search MUST filter by: ```typescript filter: { organizationId: currentOrgId, // Multi-tenant isolation // Only search content user can access } ``` --- ## Performance ### Indexes ```typescript // Knowledge table - by_type(knowledgeType) - by_source(sourceThingId) - by_created(createdAt) - by_embedding (vector index - provider-specific) // thingKnowledge junction - by_thing(thingId) - by_knowledge(knowledgeId) ``` ### Query Optimization - Use filters before vector search - Limit k to 5-20 - Set similarity threshold (0.7+) - Cache common queries ### Chunking Best Practices - 800 tokens = sweet spot - 200 token overlap = continuity - Sentence boundaries = coherence - Preserve structure (code, tables) --- ## The Beautiful Loop ``` Create Thing Extract Text Chunk Embed Store Knowledge └───────── Generate with RAG Vector Search ───────┘ ``` Knowledge makes AI generation **context-aware**, **organization-scoped**, and **infinitely scalable**. Every thing becomes searchable. Every connection becomes discoverable. Every event becomes learnable. **This is how ONE grows beautifully.**