---
title: Ontology
dimension: knowledge
category: ontology.md
tags: 6-dimensions, ai, architecture, ontology
related_dimensions: connections, events, groups, people, things
scope: global
created: 2025-11-25
updated: 2025-11-25
version: 2.0.0
ai_context: |
  This document is part of the knowledge dimension in the ontology.md category.
  Location: one/knowledge/ontology-v2.md
  Purpose: Documents one platform - ontology specification v2
  Related dimensions: connections, events, groups, people, things
  For AI agents: Read this to understand ontology.
---

# ONE Platform - Ontology Specification V2

**Version:** 2.0.0 (Reality as DSL - The Universal Code Generation Language)
**Status:** Active - Reality-Aware Architecture
**Design Principle:** This isn't just a data model. It's a Domain-Specific Language (DSL) that models reality itself, enabling 98% AI code generation accuracy through compound structure.

---

## Why This Changes Everything

### The Breakthrough: Reality as DSL

**Most developers think databases model their application.** We flipped this.

**The 6-dimension ontology models reality itself**. Applications map to it.

This enables:

- **98% AI code generation accuracy** (not 30-70%)
- **Compound structure** (each feature makes the next MORE accurate, not less)
- **Universal feature import** (clone ANY system into the ontology)
- **Never breaks** (reality doesn't change, technology does)

### What AI Sees

**Traditional Codebase (Pattern Divergence):**

```
Feature 1: createUser(email) ────────┐
Feature 2: addProduct(name) ─────────┼─→ 100 patterns
Feature 3: registerCustomer(data) ───┤    AI confused
Feature 4: insertOrder(items) ───────┤    Accuracy: 30%
...each uses different approach
```

**ONE Codebase (Pattern Convergence):**

```
Feature 1: provider.things.create({ type: "user" }) ────┐
Feature 2: provider.things.create({ type: "product" }) ─┼─→ 1 pattern
Feature 3: provider.things.create({ type: "customer" })─┤    AI masters it
Feature 4: provider.things.create({ type: "order" }) ───┤    Accuracy: 98%
...all use same pattern
```

**The difference:** Traditional codebases teach AI 100 patterns (chaos). ONE teaches AI 1 pattern (mastery).

### Why This Never Breaks

**Reality is stable. Technology changes.**

The 6 dimensions model reality:

1. **Groups** - Containers exist (friend circles → governments)
2. **People** - Actors authorize (who can do what)
3. **Things** - Entities exist (users, products, courses, agents)
4. **Connections** - Relationships relate (owns, purchased, enrolled_in)
5. **Events** - Actions happen (created, updated, purchased)
6. **Knowledge** - Understanding emerges (embeddings, search, RAG)

These dimensions NEVER change because they model reality itself, not any specific technology.

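To make the mapping concrete, here is a minimal sketch of how one feature ("a customer purchases a product") could land on three of the dimensions listed above. The `createProvider` factory and its in-memory arrays are hypothetical stand-ins for the `provider` referred to in this document, not the platform's actual implementation; only the `things.create`, `connections.create`, and `events.log` call shapes follow the patterns shown in this specification.

```typescript
// Hypothetical in-memory stand-in for `provider`; illustration only.
interface Thing {
  id: string;
  groupId: string;
  type: string;
  name: string;
  properties: Record<string, unknown>;
}

interface Connection {
  fromThingId: string;
  toThingId: string;
  relationshipType: string;
}

interface OntologyEvent {
  type: string;
  actorId: string;
  targetId: string;
  timestamp: number;
}

function createProvider() {
  const things: Thing[] = [];
  const connections: Connection[] = [];
  const events: OntologyEvent[] = [];

  return {
    things: {
      // One creation pattern for every entity type
      create(input: Omit<Thing, "id">): Thing {
        const thing: Thing = { ...input, id: `thing_${things.length + 1}` };
        things.push(thing);
        return thing;
      },
      list: (type: string): Thing[] => things.filter((t) => t.type === type),
    },
    connections: {
      create(connection: Connection): Connection {
        connections.push(connection);
        return connection;
      },
      all: (): Connection[] => [...connections],
    },
    events: {
      log(event: Omit<OntologyEvent, "timestamp">): OntologyEvent {
        const logged: OntologyEvent = { ...event, timestamp: Date.now() };
        events.push(logged);
        return logged;
      },
      all: (): OntologyEvent[] => [...events],
    },
  };
}

// "Customer purchases product" expressed with the same three calls
const provider = createProvider();
const groupId = "group_demo"; // hypothetical group ID

const customer = provider.things.create({
  groupId, type: "customer", name: "Ada", properties: { email: "ada@example.com" },
});
const product = provider.things.create({
  groupId, type: "product", name: "Mug", properties: { price: 12 },
});
provider.connections.create({
  fromThingId: customer.id, toThingId: product.id, relationshipType: "purchased",
});
provider.events.log({ type: "purchased", actorId: customer.id, targetId: product.id });
```

The same three calls would cover an enrollment or a token purchase by changing only the `type` and `relationshipType` strings, which is the convergence the section describes.
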
**Examples of systems that map perfectly:**

- **Shopify** → Products (things), Orders (connections + events), Customers (people)
- **Moodle** → Courses (things), Enrollments (connections), Completions (events)
- **Stripe** → Payments (things), Transactions (connections + events), Customers (people)
- **WordPress** → Posts (things), Authors (people), Categories (knowledge labels)

**Every system maps to the same 6 dimensions.** That's why AI agents achieve 98% accuracy.

---

## Structure

This ontology is organized into 6 dimension files:

1. **[organisation.md](./organisation.md)** - Multi-tenant isolation & ownership
2. **[people.md](./people.md)** - Authorization, governance, & user customization
3. **[things.md](./things.md)** - 66 entity types (what exists)
4. **[connections.md](./connections.md)** - 25 relationship types (how they relate)
5. **[events.md](./events.md)** - 67 event types (what happened)
6. **[knowledge.md](./knowledge.md)** - Vectors, embeddings, RAG (what it means)

**Execution Guide:**

7. **[todo.md](./todo.md)** - 100-cycle execution sequence (plan in cycles, not days)

**This document (Ontology.md)** contains the complete technical specification. The consolidated files above provide focused summaries and patterns.

**Planning Paradigm:** We don't plan in days. We plan in **cycle passes** (Cycle 1-100). See [todo.md](./todo.md) for the complete 100-cycle template that guides feature implementation from idea to production.

## The 6-Dimension Reality Model

**This is the universal interface.** Every feature in every system maps to these 6 dimensions.

**Every single thing in ONE platform exists within one of these 6 dimensions:**

```
┌──────────────────────────────────────────────────────────────┐
│ 1. GROUPS                                                    │
│ Multi-tenant isolation with hierarchical nesting - who owns  │
│ what at group level (friend circles → DAOs → governments)    │
└──────────────────────────────────────────────────────────────┘
                               ↓
┌──────────────────────────────────────────────────────────────┐
│ 2. PEOPLE                                                    │
│ Authorization & governance - platform owner, group owners    │
└──────────────────────────────────────────────────────────────┘
                               ↓
┌──────────────────────────────────────────────────────────────┐
│ 3. THINGS                                                    │
│ Every "thing" - users, agents, content, tokens, courses      │
└──────────────────────────────────────────────────────────────┘
                               ↓
┌──────────────────────────────────────────────────────────────┐
│ 4. CONNECTIONS                                               │
│ Every relationship - owns, follows, taught_by, powers        │
└──────────────────────────────────────────────────────────────┘
                               ↓
┌──────────────────────────────────────────────────────────────┐
│ 5. EVENTS                                                    │
│ Every action - purchased, created, viewed, completed         │
└──────────────────────────────────────────────────────────────┘
                               ↓
┌──────────────────────────────────────────────────────────────┐
│ 6. KNOWLEDGE                                                 │
│ Labels + chunks + vectors powering RAG & search              │
└──────────────────────────────────────────────────────────────┘
```

**The Universal Interface (How Technology Implements the Ontology):**

```
┌─────────────────────────────────────────────────────────────────────┐
│                    LAYER 1: UNIVERSAL INTERFACE                     │
│                        (The 6-Dimension DSL)                        │
├─────────────────────────────────────────────────────────────────────┤
│ groups      → Hierarchical containers (friend circles → governments)│
│ people      → Authorization & governance (who can do what)          │
│ things      → All entities (66 types: user, product, course...)     │
│ connections → All relationships (25 types: owns, purchased...)      │
│ events      → All actions (67 types: created, updated, logged...)   │
│ knowledge   → AI understanding (embeddings, search, RAG)            │
│                                                                     │
│ This layer NEVER changes. It models reality.                        │
└──────────────────┬──────────────────────────────────────────────────┘
                   │
                   ↓ Technology changes, ontology stays the same
┌─────────────────────────────────────────────────────────────────────┐
│                  TECHNOLOGY ADAPTERS (swap freely)                  │
│                 (Convex, Hono, Astro, React, etc.)                  │
├─────────────────────────────────────────────────────────────────────┤
│ Backend:   Hono API + Convex Database (implements ontology)         │
│ Frontend:  Astro SSR + React Islands (renders ontology)             │
│ Real-time: Convex hooks (live ontology subscriptions)               │
│ Static:    Astro Content Collections (ontology as files)            │
│                                                                     │
│ Technology can be swapped. Ontology stays the same.                 │
└─────────────────────────────────────────────────────────────────────┘
```

### Dimension 1: Groups (Containers)

**Purpose:** Partition the system with hierarchical nesting (friend circles → DAOs → governments)

**Why it never changes:** Containers always contain things. Whether it's a lemonade stand or a global government, the concept of "container" is universal.

**Pattern for AI:**

```typescript
// AI learns: Everything belongs to a group
provider.things.create({ groupId, type, name, properties });
```

**Example mappings:**

- Shopify Store → group (type: business)
- Moodle School → group (type: organization)
- DAO Treasury → group (type: dao)
- Friend Circle → group (type: friend_circle)

### Dimension 2: People (Authorization)

**Purpose:** Define who can do what (actors, roles, permissions)

**Why it never changes:** Authorization is a universal concept. Someone always performs actions.

**Pattern for AI:**

```typescript
// AI learns: Every action has an actor
provider.events.log({ actorId: personId, type, targetId });
```

**Example mappings:**

- Shopify Admin → person (role: group_owner)
- Moodle Student → person (role: customer)
- Platform Owner → person (role: platform_owner)
- Team Member → person (role: group_user)

### Dimension 3: Things (Entities)

**Purpose:** All nouns in the system (66 types, infinitely extensible)

**Why it never changes:** Entities exist. Users, products, courses, agents: these are all "things" with different types.

**Pattern for AI:**

```typescript
// AI learns: One pattern for all entities
provider.things.create({ type: "product" | "course" | "user" | ..., name, properties })
```

**Example mappings:**

- Shopify Product → thing (type: product)
- Moodle Course → thing (type: course)
- Stripe Payment → thing (type: payment)
- WordPress Post → thing (type: blog_post)

**New entity type?** Use a new `type` value and put its fields in `properties`. No schema migration needed.

### Dimension 4: Connections (Relationships)

**Purpose:** How entities relate to each other (25 types + metadata)

**Why it never changes:** Relationships are universal. Things connect to other things.

**Pattern for AI:**

```typescript
// AI learns: One pattern for all relationships
provider.connections.create({
  fromThingId,
  toThingId,
  relationshipType,
  metadata,
});
```

**Example mappings:**

- Shopify Order → connection (type: purchased) + event (type: order_placed)
- Moodle Enrollment → connection (type: enrolled_in)
- GitHub Follows → connection (type: following)
- Token Holdings → connection (type: holds_tokens, metadata: { balance })

### Dimension 5: Events (Actions)

**Purpose:** Complete audit trail of what happened when (67 types + metadata)

**Why it never changes:** Actions happen at specific times. This is universal.

**Pattern for AI:**

```typescript
// AI learns: All actions are logged the same way
provider.events.log({ type, actorId, targetId, timestamp, metadata });
```

**Example mappings:**

- Shopify Checkout → event (type: payment_processed)
- Moodle Lesson View → event (type: content_viewed)
- User Login → event (type: user_login)
- Token Purchase → event (type: tokens_purchased)

### Dimension 6: Knowledge (Understanding)

**Purpose:** Labels, embeddings, and semantic search for AI

**Why it never changes:** Categorization and understanding are universal concepts.

**Pattern for AI:**

```typescript
// AI learns: Knowledge is linked to things
provider.knowledge.create({
  sourceThingId,
  knowledgeType: "label" | "chunk",
  text,
  embedding,
});
```

**Example mappings:**

- WordPress Categories → knowledge (type: label)
- Course Content → knowledge (type: chunk, embedding: [...])
- Product Tags → knowledge (type: label)
- Semantic Search → knowledge vector search

---

**Golden Rule:** If you can't map your feature to these 6 dimensions, you're thinking about it wrong.

**For AI Agents:** This ontology is your universal language. Learn these 6 patterns and you can generate ANY feature with 98% accuracy.

---

## Compound Structure Accuracy: Why AI Gets Better Over Time

### Traditional AI Code Generation (Pattern Divergence)

**The death spiral:**

```
Generation 1: Clean code → 95% accurate
Generation 2: Slight drift → 90% accurate (-5% - patterns starting to diverge)
Generation 3: Pattern divergence → 80% accurate (-10% - AI sees multiple patterns)
Generation 4: Inconsistency → 65% accurate (-15% - AI confused by variations)
Generation N: Unmaintainable mess → 30% accurate (-20% - complete chaos)
```

**Why?** Each feature introduces NEW patterns. AI has to guess which pattern to use.

### ONE's AI Code Generation (Pattern Convergence)

**The virtuous cycle:**

```
Generation 1: Maps to ontology → 85% accurate (learning the ontology)
Generation 2: Follows patterns → 90% accurate (+5% - recognizing service pattern)
Generation 3: Reuses services → 93% accurate (+3% - composing existing services)
Generation 4: Predictable structure → 96% accurate (+3% - mastering Effect.ts patterns)
Generation N: Perfect consistency → 98%+ accurate (+2% - generalized patterns)
```

**Why?** Each feature uses the SAME patterns. AI masters the ontology, then reuses it.

### How Agents Learn the Ontology

**Generation 1-5: Learning (85% accurate)**

```
Agent thinks:
"I see things being created with `type` field. Let me check..."
"All entities are things? Even users, products, courses?"
"Connections link things together. Got it."
"Events log actions. People are actors."
"Pattern emerging: 6 dimensions for everything."

CONFIDENCE: 85% (still learning)
```

**Generation 6-20: Recognizing (90% accurate)**

```
Agent thinks:
"Every create operation uses provider.things.create. 100% of the time."
"Every service uses Effect.gen. Pattern confirmed."
"All errors are tagged unions. I can handle exhaustively."
"Dependencies are injected. I know what's needed."

CONFIDENCE: 90% (pattern recognized)
```

**Generation 21-50: Composing (93% accurate)**

```
Agent thinks:
"I can reuse existing services:"
"- createThing service exists"
"- createConnection service exists"
"- logEvent service exists"
"My new feature: compose these three services."
"No new patterns needed. Just composition."

CONFIDENCE: 93% (composing existing patterns)
```

**Generation 51-100: Mastering (96% accurate)**

```
Agent thinks:
"I've generated 50 services. All follow same structure:"
"1. Get provider from context"
"2. Call provider operation"
"3. Handle typed errors"
"4. Return typed result"
"Pattern is UNIVERSAL. Works for ANY feature."

CONFIDENCE: 96% (mastered the structure)
```

**Generation 100+: Generalizing (98%+ accurate)**

```
Agent thinks:
"I don't even need to think. The pattern IS the system."
"New feature request? Map to 6 dimensions."
"Need validation? Effect.ts service."
"Need data? Provider interface."
"Need state? Nanostores."
"Every decision is deterministic."

CONFIDENCE: 98%+ (system internalized)
```

### What This Means for Development

**Feature #1:**

- Traditional: 8 hours (70% AI, 30% human)
- ONE: 8 hours (70% AI, 30% human)
- **No difference yet**

**Feature #10:**

- Traditional: 10 hours (60% AI, 40% human - patterns diverging)
- ONE: 6 hours (85% AI, 15% human - patterns converging)
- **ONE is 1.7x faster**

**Feature #50:**

- Traditional: 16 hours (40% AI, 60% human - technical debt)
- ONE: 3 hours (95% AI, 5% human - pattern mastery)
- **ONE is 5.3x faster**

**Feature #100:**

- Traditional: 24 hours (25% AI, 75% human - chaos)
- ONE: 1.5 hours (98% AI, 2% human - generalized)
- **ONE is 16x faster**

**Cumulative for 100 features:**

- Traditional: 1,400 hours
- ONE: 350 hours
- **ONE is 4x faster overall**
- **And the gap keeps growing**

### Why Schema Migrations Never Break This

**New entity type?**

```typescript
// NO schema migration needed
{ type: "new_thing", name: "...", properties: { ...custom } }
```

**New field on existing type?**

```typescript
// NO schema migration needed
{ type: "product", properties: { price, SKU, newField: "value" } }
```

**New relationship?**

```typescript
// NO schema migration needed
{ relationshipType: "new_connection", metadata: { ...custom } }
```

**New protocol integration?**

```typescript
// NO schema migration needed
{ relationshipType: "transacted", metadata: { protocol: "new_protocol", ...custom } }
```

**Result:** Technology changes (React → Svelte, REST → GraphQL), but the ontology stays the same forever.

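One way to see why none of these cases needs a migration: the stored shape is always `{ type, name, properties }`, so new types and new fields only change data, not structure. The sketch below illustrates that idea with an optional per-type check at the application layer. The `StoredThing` interface, the `registerType` and `validateThing` helpers, and the `price` rule are assumptions made for illustration; the specification does not define them.

```typescript
// Hypothetical sketch: the stored shape is fixed, only the data varies.
interface StoredThing {
  type: string;
  name: string;
  properties: Record<string, unknown>;
}

// Optional application-level validators, keyed by type (an assumption,
// not part of the specification).
type Validator = (properties: Record<string, unknown>) => string[];
const validators = new Map<string, Validator>();

function registerType(type: string, validator: Validator): void {
  validators.set(type, validator);
}

// Types without a registered validator are accepted as-is.
function validateThing(thing: StoredThing): string[] {
  const validator = validators.get(thing.type);
  return validator ? validator(thing.properties) : [];
}

registerType("product", (p) =>
  typeof p.price === "number" ? [] : ["price must be a number"],
);

// Existing record, a record with an extra field, and a brand-new type:
// all three fit the same stored shape.
const oldProduct: StoredThing = { type: "product", name: "Mug", properties: { price: 12 } };
const extendedProduct: StoredThing = {
  type: "product",
  name: "Mug v2",
  properties: { price: 12, newField: "value" },
};
const newThing: StoredThing = { type: "new_thing", name: "Example", properties: { custom: true } };
const brokenProduct: StoredThing = { type: "product", name: "No price", properties: {} };
```

The trade-off in this design is that type-specific rules move from the database schema into application code, so checks like the `price` rule above have to be applied wherever things are written.
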
---

## GROUPS: The Isolation Boundary with Hierarchical Nesting

Purpose: Partition the system with perfect isolation and support nested groups (groups within groups), from friend circles to DAOs to governments. Every group owns its own graph of things, connections, events, and knowledge.

### Group Structure

```typescript
{
  _id: Id<'groups'>,
  slug: string,                    // REQUIRED: URL identifier (/group/slug)
  name: string,                    // REQUIRED: Display name
  type: 'friend_circle' | 'business' | 'community' | 'dao' | 'government' | 'organization',
  parentGroupId?: Id<'groups'>,    // OPTIONAL: Parent group for hierarchical nesting
  description?: string,            // OPTIONAL: About text
  metadata: Record<string, any>,
  settings: {
    visibility: 'public' | 'private',
    joinPolicy: 'open' | 'invite_only' | 'approval_required',
    plan: 'starter' | 'pro' | 'enterprise',
    limits: {
      users: number,
      storage: number,             // GB
      apiCalls: number,
    }
  },
  status: 'active' | 'archived',
  createdAt: number,
  updatedAt: number,
}
```

### Common Fields by Use Case

- **Identity:** `[slug, name]` - Who they are + URL
- **Web:** `[slug, name, description]` - Website generation
- **Operations:** `[status, type, settings, parentGroupId]` - System management

### Why Groups Matter

1. **Multi-Tenant Isolation:** Each group's data is completely separate
2. **Hierarchical Nesting:** Groups can contain sub-groups for complex organizations (parent → child → grandchild...)
3. **Flexible Types:** From friend circles (2 people) to businesses to DAOs to governments (billions)
4. **Resource Quotas:** Control costs and usage per group
5. **Privacy Control:** Groups can be public or private with controlled access
6. **Flexible Scale:** Scales from friend circles to global governments without schema changes

### Hierarchical Group Examples by Domain

**E-Commerce (Retail Chain):**

```
Corporate Headquarters (group)
├─ North American Division (child group)
│  ├─ New York Store (grandchild group)
│  └─ California Store (grandchild group)
└─ European Division (child group)
   ├─ London Store (grandchild group)
   └─ Paris Store (grandchild group)
```

**Education (University System):**

```
MIT (group)
├─ School of Engineering (child group)
│  ├─ Computer Science Dept (grandchild group)
│  ├─ Electrical Engineering Dept (grandchild group)
│  └─ Mechanical Engineering Dept (grandchild group)
├─ School of Science (child group)
│  ├─ Mathematics Dept (grandchild group)
│  └─ Physics Dept (grandchild group)
└─ School of Business (child group)
```

**Creator (Multi-Channel Brand):**

```
Creator Brand (group)
├─ YouTube Channel (child group)
│  └─ Content Series 1 (grandchild group)
├─ Podcast (child group)
│  └─ Season 2 (grandchild group)
└─ Community (child group - Discord server with channels)
```

**Crypto (DAO Treasury):**

```
DAO Treasury (group)
├─ Core Operations (child group)
│  ├─ Development Fund (grandchild group)
│  └─ Marketing Fund (grandchild group)
├─ Investment Committee (child group)
│  └─ Venture Capital Allocation (grandchild group)
└─ Community Grants (child group)
```

---

### System Group Pattern (Global Entities)

**Problem:** Some entities are truly global and don't belong to any user group.

**Examples:**

- Platform-wide settings
- System notifications
- Global rate limits
- Reference data (timezones, currencies, countries)
- Platform-level analytics

**Solution:** Reserve a special "system" group.

```typescript // Create system group on platform initialization const SYSTEM_GROUP_ID = 'system'; await ctx.db.insert('groups', { _id: SYSTEM_GROUP_ID, slug: 'system', name: 'System', type: 'organization', settings: { visibility: 'private', joinPolicy: 'invite_only', plan: 'enterprise', limits: { users: Infinity, storage: Infinity, apiCalls: Infinity } }, status: 'active', createdAt: Date.now(), }); // Use for global entities await ctx.db.insert('things', { type: 'platform_setting', name: 'Global Rate Limit', groupId: SYSTEM_GROUP_ID, // System group properties: { maxRequestsPerMinute: 1000, scope: 'global' }, status: 'active', createdAt: Date.now(), }); ``` **Rules:** - System group ID is reserved and cannot be deleted - Only platform owners can create things in system group - System group has no resource limits - System entities are visible to all groups (read-only) --- ## PEOPLE: Authorization & Governance Purpose: Define who can do what. People direct groups, customize AI agents, and govern access. ### Person Structure ```typescript { _id: Id<'people'>, email: string, username: string, displayName: string, // CRITICAL: Role determines access level role: 'platform_owner' | 'group_owner' | 'group_user' | 'customer', // Group context groupId?: Id<'groups'>, // Current/default group permissions?: string[], // Profile bio?: string, avatar?: string, // Multi-tenant tracking groups: Id<'groups'>[], // All groups this person belongs to createdAt: number, updatedAt: number, } ``` ### Four Roles 1. **Platform Owner** (Anthony) - Owns the ONE Platform - 100% revenue from platform-level services - Can access all groups (support/debugging) - Creates new groups 2. **Group Owner** - Owns/manages one or more groups - Controls users, permissions, billing within group - Customizes AI agents and frontend - Revenue sharing with platform 3. **Group User** - Works within a group - Limited permissions (defined by group owner) - Can create content, run agents (within quotas) 4. 
**Customer** - External user consuming content - Purchases tokens, enrolls in courses - No admin access ### Why People Matter 1. **Authorization:** Every action must have an actor (person) 2. **Governance:** Group owners control who can do what 3. **Audit Trail:** Events log who did what when 4. **Customization:** People teach AI agents their preferences --- ## KNOWLEDGE: Labels, Chunks, and Vectors (RAG) Purpose: unify taxonomy (“tags”) and retrieval‑augmented generation (RAG) under one table. A knowledge item can be a label (former tag), a document wrapper, or a chunk with an embedding. Design principles: - Protocol‑agnostic: store protocol details in `metadata`. - Many‑to‑many: link knowledge ⇄ things via `thingKnowledge` with optional context metadata. - Scalable: consolidated types minimize index fan‑out; embeddings enable semantic search. ### Knowledge Types ```typescript type KnowledgeType = | "label" // replaces legacy "tag"; lightweight categorical marker | "document" // wrapper for a source text/blob (pre-chunking) | "chunk" // atomic chunk of text with embedding | "vector_only"; // embedding without stored text (e.g., privacy) ``` ### Knowledge Structure ```typescript { _id: Id<'knowledge'>, knowledgeType: KnowledgeType, // Textual content (optional for label/vector_only) text?: string, // Embedding for semantic search (optional for label/document) embedding?: number[], // Float32 vector; model-dependent dimension embeddingModel?: string, // e.g., "text-embedding-3-large" embeddingDim?: number, // Source linkage sourceThingId?: Id<'things'>, // Primary source entity sourceField?: string, // e.g., 'content', 'transcript', 'title' chunk?: { index: number; start?: number; end?: number; tokenCount?: number; overlap?: number }, // Lightweight categorization (free-form) labels?: string[], // Replaces per-thing tags; applied to knowledge // Additional metadata (protocol, language, mime, hash, version) metadata?: Record<string, any>, createdAt: number, 
updatedAt: number, deletedAt?: number, } ``` ### Junction: thingKnowledge ```typescript { _id: Id<'thingKnowledge'>, thingId: Id<'things'>, knowledgeId: Id<'knowledge'>, role?: 'label' | 'summary' | 'chunk_of' | 'caption' | 'keyword', // Context for the link (e.g., confidence, section name) metadata?: Record<string, any>, createdAt: number, } ``` ### Indexes (recommended) - `knowledge.by_type` (knowledgeType) - `knowledge.by_source` (sourceThingId) - `knowledge.by_created` (createdAt) - `thingKnowledge.by_thing` (thingId) - `thingKnowledge.by_knowledge` (knowledgeId) - Vector index (provider-dependent): `knowledge.by_embedding` for ANN search ### How Domains Apply Knowledge **Education - Learning Objectives & Study Materials:** ```typescript // Knowledge: Learning objective chunk { knowledgeType: 'chunk', text: 'Students should be able to solve quadratic equations', sourceThingId: courseId, labels: ['subject:mathematics', 'grade:9-12', 'objective:apply', 'skill:algebra'] } // Link: Course references this learning objective { thingId: courseId, knowledgeId: knowledgeId, role: 'learning_objective' } ``` **Creator - Content SEO & Discovery:** ```typescript // Knowledge: Video description chunk with embedded metadata { knowledgeType: 'chunk', text: 'This video teaches React hooks for beginners...', sourceThingId: videoId, embedding: [0.1, 0.2, ...], labels: ['topic:react', 'difficulty:beginner', 'platform:youtube', 'series:javascript101'] } ``` **E-Commerce - Product Categorization & Search:** ```typescript // Knowledge: Product description for semantic search { knowledgeType: 'document', text: 'Blue wireless headphones with 40-hour battery life', sourceThingId: productId, embedding: [0.5, 0.3, ...], labels: ['category:electronics', 'color:blue', 'feature:wireless', 'price_range:premium'] } ``` **Crypto - Risk Analysis & Token Intelligence:** ```typescript // Knowledge: Token risk assessment { knowledgeType: 'chunk', text: 'Token has no minting restrictions, moderate 
holder concentration', sourceThingId: tokenId, labels: ['risk:medium', 'metric:tvl_trend_up', 'audit:completed', 'governance:none'] } // Knowledge: Protocol dependency analysis { knowledgeType: 'label', text: 'Depends on Chainlink oracle', sourceThingId: protocolId, labels: ['dependency:critical', 'type:oracle', 'risk_factor:oracle'] } ``` ### RAG Ingestion Strategy Objective: Attach vectors to **relevant** content for high-quality retrieval while controlling costs and maintaining performance. **CRITICAL:** Not every field needs RAG. Be selective. Embeddings are expensive in storage, compute, and money. --- #### What to Embed (Decision Matrix by Domain) **Universal Rule:** ``` IF "user will semantically search this" → EMBED IF "user will filter/sort this" → DON'T EMBED IF "structured data" → DON'T EMBED ``` | Content Type | Embed? | Domain Example | Use Case | |--------------|--------|----------------|----------| | **Long-form Content** ||||| | Blog post content | ✅ YES | Creator | "Find posts about React hooks" | | Course lesson content | ✅ YES | E-Learning | "Search lessons on form validation" | | Video/podcast transcripts | ✅ YES | E-Learning, Creator | Makes A/V content searchable | | Email campaign body | ✅ YES | Creator, E-Commerce | Content discovery | | **Product Content** ||||| | Product descriptions | ✅ YES | E-Commerce | "Find eco-friendly water bottles" | | Product specs (JSON) | ❌ NO | E-Commerce | Use filters: `size === 'L'` | | Customer reviews | ✅ YES | E-Commerce | "What do people say about durability?" 
| | Q&A responses | ✅ YES | E-Commerce | Customer support knowledge base | | Prices, SKUs, inventory | ❌ NO | E-Commerce | Exact match: `price < 50` | | **Social Content** ||||| | Social post text (>100 chars) | ✅ YES | Social | "Find my AI posts with high engagement" | | Social post text (<100 chars) | ❌ NO | Social | Too short, use labels | | Thread content (combined) | ✅ YES | Social | Combine into single chunk | | Hashtags | ❌ NO | Social | Exact match, not semantic | | Comments (>50 words) | ⚠️ MAYBE | Social | Only for community insights | | **Image Generation** ||||| | Image prompts | ✅ YES | Image Gen | "Find cyberpunk city prompts" | | Prompt descriptions | ✅ YES | Image Gen | Style discovery | | Negative prompts | ✅ YES | Image Gen | "Avoid common mistakes" | | Generation params | ❌ NO | Image Gen | Use filters: `steps === 50` | | Image pixels | ❌ NO | Image Gen | Use CLIP embeddings separately | | **Educational Content** ||||| | Course descriptions | ✅ YES | E-Learning | Discovery + recommendations | | Lesson summaries | ✅ YES | E-Learning | "React hooks for beginners" | | Student notes | ✅ YES | E-Learning | Personal knowledge base | | Quiz questions | ⚠️ MAYBE | E-Learning | Only for study guides | | Progress data | ❌ NO | E-Learning | Use analytics: `progress >= 0.5` | | Certificates | ❌ NO | E-Learning | Metadata only | | **Metadata** ||||| | Titles, summaries | ✅ YES | All | High signal-to-noise | | Descriptions (>50 words) | ✅ YES | All | Context for search | | Tags, categories | ❌ NO | All | Use `labels` instead (free) | | **User-Generated** ||||| | Bios, profiles | ⚠️ MAYBE | Social | Only for people search | | **System Data** ||||| | Logs, errors | ❌ NO | All | Use log aggregation tools | | Metrics, analytics | ❌ NO | All | Use time-series DB | | Audit trails | ❌ NO | All | Events table is sufficient | **Domain-Specific Examples:** **E-Commerce:** ```typescript // ✅ EMBED: Product discovery "Find sustainable yoga mats" → Semantic search on 
descriptions // ❌ DON'T EMBED: Filtering "Show mats under $30" → Filter: price < 30 "In stock only" → Filter: inventory > 0 ``` **E-Learning:** ```typescript // ✅ EMBED: Course/lesson discovery "Learn React hooks for beginners" → Semantic search on course descriptions + lesson transcripts // ❌ DON'T EMBED: Progress tracking "Show my completed courses" → Filter: connections where completed = true ``` **Image Generation:** ```typescript // ✅ EMBED: Prompt library "Cyberpunk city at night" → Semantic search on successful prompts // ❌ DON'T EMBED: Generation settings "Images with CFG 7.5" → Filter: metadata.cfg === 7.5 ``` **Social Posting:** ```typescript // ✅ EMBED: Content inspiration "My posts about AI with high engagement" → Semantic search on post text // ❌ DON'T EMBED: Engagement metrics "Posts with >1000 likes" → Filter: engagement.likes > 1000 ``` **Cost Reality Check (10K items):** | Domain | What to Embed | Monthly Cost | |--------|---------------|--------------| | E-Commerce | Product descriptions | ~$1.30 | | E-Learning | Lesson transcripts | ~$13 (longer content) | | Image Gen | Prompts + descriptions | ~$0.50 | | Social | Long posts only | ~$0.80 | **Key Insight:** Be ruthlessly selective. Only embed content users will **semantically search**, not data they'll **filter or sort**. --- #### When to Update Embeddings **Trigger:** Content changes in source thing. ```typescript // On content update export const updateBlogPost = mutation({ handler: async (ctx, { postId, content }) => { // 1. Update the thing await ctx.db.patch(postId, { properties: { content }, updatedAt: Date.now(), }); // 2. Schedule re-embedding (debounced) await ctx.scheduler.runAfter(5000, internal.knowledge.reEmbedThing, { thingId: postId, fields: ['content'], // Only re-embed changed fields }); // 3. 
Log event await ctx.db.insert('events', { type: 'content_event', actorId: ctx.auth.userId!, targetId: postId, groupId: post.groupId, timestamp: Date.now(), metadata: { action: 'updated', triggeredReEmbedding: true }, }); }, }); ``` **Re-embedding Strategy:** | Change Type | Action | Why | |-------------|--------|-----| | Content edited | Re-embed immediately | Content changed | | Title/summary edited | Re-embed immediately | High-signal metadata | | Tags/labels changed | Update labels only | No embedding needed | | Status changed (draft→published) | Re-embed if first publish | Visibility changed | | Minor typo fix | Debounce 5 seconds | Avoid re-embedding every keystroke | | Bulk import | Batch embed (100/batch) | Rate limiting | **Cost Optimization:** ```typescript // Hash content to detect actual changes import { createHash } from 'crypto'; export const reEmbedThing = internalMutation({ handler: async (ctx, { thingId, fields }) => { const thing = await ctx.db.get(thingId); const content = fields.map(f => thing.properties[f]).join('\n'); // Hash current content const contentHash = createHash('sha256').update(content).digest('hex'); // Check if content actually changed const existingKnowledge = await ctx.db .query('knowledge') .withIndex('by_source') .filter(q => q.eq(q.field('sourceThingId'), thingId)) .first(); if (existingKnowledge?.metadata?.contentHash === contentHash) { console.log('Content unchanged, skipping re-embedding'); return; // Save $$$ by skipping } // Content changed, re-embed await embedAndStore(ctx, thing, content, contentHash); }, }); ``` --- #### Chunking Standard **Window:** ~800 tokens (~3,200 characters) **Overlap:** ~200 tokens (~800 characters) **Boundaries:** Sentence-aware (don't split mid-sentence) ```typescript export async function chunkText(text: string): Promise<Chunk[]> { const chunks: Chunk[] = []; const sentences = text.split(/[.!?]+\s+/); // Split on sentence boundaries let currentChunk = ''; let currentTokens = 0; let 
chunkIndex = 0;
  let chunkStart = 0; // Approximate character offset of the current chunk

  for (const sentence of sentences) {
    const sentenceTokens = estimateTokens(sentence);

    if (currentTokens + sentenceTokens > 800 && currentChunk.length > 0) {
      // Save chunk
      const chunkBody = currentChunk.trim();
      chunks.push({
        index: chunkIndex++,
        text: chunkBody,
        tokenCount: currentTokens,
        start: chunkStart,
        end: chunkStart + chunkBody.length,
      });

      // Start new chunk with overlap (last 200 tokens)
      const overlapText = getLastNTokens(currentChunk, 200);
      chunkStart += chunkBody.length - overlapText.trim().length;
      currentChunk = overlapText + ' ' + sentence;
      currentTokens = estimateTokens(overlapText) + sentenceTokens;
    } else {
      currentChunk += ' ' + sentence;
      currentTokens += sentenceTokens;
    }
  }

  // Save final chunk
  if (currentChunk.length > 0) {
    const chunkBody = currentChunk.trim();
    chunks.push({
      index: chunkIndex,
      text: chunkBody,
      tokenCount: currentTokens,
      start: chunkStart,
      end: chunkStart + chunkBody.length,
    });
  }

  return chunks;
}
```

---

#### Embedding Pipeline

```typescript
import OpenAI from 'openai';

const openai = new OpenAI(); // Reads OPENAI_API_KEY from the environment

// 1. Schedule embedding
export const scheduleEmbeddingForThing = mutation({
  handler: async (ctx, { thingId, fields }) => {
    await ctx.scheduler.runAfter(0, internal.knowledge.embedThing, {
      thingId,
      fields,
    });
  },
});

// 2. Embed text (internal action - calls OpenAI)
export const embedText = internalAction({
  handler: async (ctx, { text, model = 'text-embedding-3-large' }) => {
    const response = await openai.embeddings.create({
      model,
      input: text,
    });

    return {
      embedding: response.data[0].embedding,
      dim: response.data[0].embedding.length,
    };
  },
});

// 3.
// Store chunks with embeddings
export const upsertKnowledgeChunks = internalMutation({
  handler: async (ctx, { thingId, chunks, embeddings }) => {
    // Delete old chunks
    const oldChunks = await ctx.db
      .query('knowledge')
      .withIndex('by_source')
      .filter(q => q.eq(q.field('sourceThingId'), thingId))
      .collect();

    for (const old of oldChunks) {
      await ctx.db.delete(old._id);
    }

    // Insert new chunks
    for (let i = 0; i < chunks.length; i++) {
      const knowledgeId = await ctx.db.insert('knowledge', {
        knowledgeType: 'chunk',
        text: chunks[i].text,
        embedding: embeddings[i].embedding,
        embeddingModel: 'text-embedding-3-large',
        embeddingDim: embeddings[i].dim,
        sourceThingId: thingId,
        chunk: chunks[i],
        metadata: {
          contentHash: chunks[i].hash,
          embeddingVersion: 'v3',
        },
        createdAt: Date.now(),
      });

      // Link to thing
      await ctx.db.insert('thingKnowledge', {
        thingId,
        knowledgeId,
        role: 'chunk_of',
        createdAt: Date.now(),
      });
    }
  },
});
```

---

#### Cost Management

**Embedding Costs (OpenAI text-embedding-3-large):**

- $0.13 per 1M tokens
- Average blog post: ~1,000 tokens = $0.00013
- 1M blog posts embedded: ~$130

**Storage Costs:**

- 3,072 dimensions × 4 bytes = ~12 KB per chunk
- 1M chunks = ~12 GB of vector data
- Convex: ~$0.25/GB/month = ~$3/month per 1M chunks

**Optimization Strategies:**

1. **Selective Embedding:** Only embed content types with high search value
2. **Lazy Embedding:** Embed on first publish, not on draft save
3. **Batch Processing:** Embed 100 items at a time to avoid rate limits
4. **Content Hashing:** Skip re-embedding if content unchanged
5. **Smaller Models:** Use `text-embedding-3-small` (1,536 dimensions by default, shortened to 512 via the `dimensions` parameter) for less critical content (roughly 75% cost savings)

---

#### Query & Retrieval

```typescript
export const semanticSearch = query({
  args: {
    query: v.string(),
    groupId: v.id('groups'),
    limit: v.optional(v.number()), // Optional: the handler defaults it to 10
  },
  handler: async (ctx, { query, groupId, limit = 10 }) => {
    // 1. Embed query
    const queryEmbedding = await ctx.runAction(internal.knowledge.embedText, {
      text: query,
    });

    // 2.
    // Vector search (filtered by group)
    const results = await ctx.db
      .vectorSearch('knowledge', 'by_embedding', {
        vector: queryEmbedding.embedding,
        limit: limit * 2, // Over-fetch for filtering
        filter: q => q.eq(q.field('knowledgeType'), 'chunk'),
      })
      .collect();

    // 3. Filter by group (get source things)
    const groupResults = [];
    for (const result of results) {
      const sourceThing = await ctx.db.get(result.sourceThingId);
      if (sourceThing?.groupId === groupId) {
        groupResults.push({
          ...result,
          score: result._score,
          thing: sourceThing,
        });
      }
      if (groupResults.length >= limit) break;
    }

    return groupResults;
  },
});
```

---

#### Governance & Lifecycle

**Versioning:**

- Store `metadata.contentHash` of source content
- If hash unchanged, skip re-embedding
- Track `metadata.embeddingVersion` for model migrations

**Retention:**

- Archive old chunks on major content edits (keep last 3 versions)
- Garbage collect orphaned knowledge items (no thingKnowledge links)
- Delete embeddings when source thing is hard-deleted

**Quality:**

- Track `metadata.qualityScore` based on user feedback
- Monitor search relevance metrics
- A/B test embedding models

**Summary:** Be ruthlessly selective about what gets embedded. RAG is powerful but expensive. Embed content users will semantically search, not structured data they'll filter.

### Knowledge Governance

Policy: Default is free-form, user-extensible knowledge labels for maximum flexibility and zero schema churn.

- Curated label prefixes (recommended): `skill:*`, `industry:*`, `topic:*`, `format:*`, `goal:*`, `audience:*`, `technology:*`, `status:*`, `capability:*`, `protocol:*`, `payment_method:*`, `network:*`.
- Validation: Enforce label hygiene (no duplicates within scope); allow synonyms via an alias list if needed.
- Ownership: Platform/group owners may curate official labels; users can still apply ad‑hoc labels.
- Hygiene: Periodically consolidate low-usage duplicates; do not delete knowledge items with active references—mark deprecated instead.
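
The label-hygiene rules above (no duplicates within a scope, synonyms resolved through an alias list, ad-hoc labels allowed but distinguishable from curated ones) can be sketched as a small pure function. This is a minimal illustration, not the platform's implementation: the names `normalizeLabel`, `cleanLabels`, and `LabelReport`, the lowercase/underscore normalization rule, and the shape of the alias map are all assumptions made for the example. Only the list of curated prefixes comes from the policy itself.

```typescript
// Curated prefixes, copied from the governance policy.
const CURATED_PREFIXES: readonly string[] = [
  "skill", "industry", "topic", "format", "goal", "audience",
  "technology", "status", "capability", "protocol", "payment_method", "network",
];

// Hypothetical report shape for one scope (e.g. one thing or one group).
interface LabelReport {
  labels: string[]; // canonical, de-duplicated labels in first-seen order
  adHoc: string[];  // labels without a curated prefix (allowed, but flagged)
}

// Assumed normalization: trim, lowercase, inner whitespace -> underscore.
function normalizeLabel(raw: string): string {
  return raw.trim().toLowerCase().replace(/\s+/g, "_");
}

// aliases maps a synonym to its canonical label (hypothetical alias list).
function cleanLabels(
  rawLabels: string[],
  aliases: Map<string, string> = new Map(),
): LabelReport {
  const seen = new Set<string>();
  const labels: string[] = [];
  const adHoc: string[] = [];

  for (const raw of rawLabels) {
    const normalized = normalizeLabel(raw);
    if (normalized === "") continue; // ignore empty labels

    const canonical = aliases.get(normalized) ?? normalized;
    if (seen.has(canonical)) continue; // no duplicates within scope
    seen.add(canonical);
    labels.push(canonical);

    // A label is "curated" only if it has a known prefix before the colon.
    const colon = canonical.indexOf(":");
    const prefix = colon > 0 ? canonical.slice(0, colon) : "";
    if (!CURATED_PREFIXES.includes(prefix)) adHoc.push(canonical);
  }

  return { labels, adHoc };
}

// Usage: the two spellings of the skill label collapse into one entry.
const example = cleanLabels(["Skill:TypeScript", "skill:typescript", "vibe:cozy"]);
console.log(example);
// labels: ["skill:typescript", "vibe:cozy"], adHoc: ["vibe:cozy"]
```

Returning ad-hoc labels in a separate list rather than rejecting them keeps the "users can still apply ad-hoc labels" rule intact, while giving group owners a ready-made queue of candidates to consolidate or promote.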
---

## THINGS: All The "Things"

### What Goes in Things?

**Simple test:** If you can point at it and say "this is a \_\_\_", it's a thing.

Examples:

- "This is a **creator**" ✅ Thing
- "This is a **blog post**" ✅ Thing
- "This is a **token**" ✅ Thing
- "This is a **relationship**" ❌ Connection, not thing
- "This is a **purchase**" ❌ Event, not thing

### Thing Types

**66 Types Organized in 13 Categories:**

```typescript
type ThingType =
  // CORE (4)
  | "creator" // Human creator (role: platform_owner, group_owner, group_user, customer)
  | "ai_clone" // Digital twin of creator
  | "audience_member" // Fan/user (role: customer)
  | "organization" // Multi-tenant organization

  // BUSINESS AGENTS (10)
  | "strategy_agent" // Vision, planning, OKRs
  | "research_agent" // Market, trends, competitors
  | "marketing_agent" // Content strategy, SEO, distribution
  | "sales_agent" // Funnels, conversion, follow-up
  | "service_agent" // Support, onboarding, success
  | "design_agent" // Brand, UI/UX, assets
  | "engineering_agent" // Tech, integration, automation
  | "finance_agent" // Revenue, costs, forecasting
  | "legal_agent" // Compliance, contracts, IP
  | "intelligence_agent" // Analytics, insights, predictions

  // CONTENT (7)
  | "blog_post" // Written content (guides, newsletters, articles)
  | "video" // Video content (lectures, demos, shorts)
  | "podcast" // Audio content (episodes, interviews)
  | "social_post" // Social media post (all platforms)
  | "email" // Email content (campaigns, newsletters)
  | "course" // Educational course (programs, learning paths)
  | "lesson" // Individual lesson (units, modules, segments)

  // PRODUCTS (4)
  | "digital_product" // Templates, tools, assets
  | "membership" // Tiered membership (Patreon, Substack)
  | "consultation" // 1-on-1 session (coaching, support)
  | "nft" // NFT collectible (governance, utility)

  // COMMUNITY (3)
  | "community" // Community space (Discord, forums)
  | "conversation" // Thread/discussion (boards, channels)
  | "message" // Individual message (chat, DM)

  // TOKEN (2)
  | "token" // Actual token instance
  | "token_contract" // Smart contract

  // KNOWLEDGE (2)
  | "knowledge_item" // Piece of creator knowledge
  | "embedding" // Vector embedding

  // PLATFORM (6)
  | "website" // Auto-generated creator site
  | "landing_page" // Custom landing pages (campaigns, sales)
  | "template" // Design templates (reusable components)
  | "livestream" // Live broadcast (streaming, webinars)
  | "recording" // Saved livestream content
  | "media_asset" // Images, videos, files

  // BUSINESS (7)
  | "payment" // Payment transaction
  | "subscription" // Recurring subscription
  | "invoice" // Invoice record
  | "metric" // Tracked metric
  | "insight" // AI-generated insight
  | "prediction" // AI prediction
  | "report" // Analytics report

  // AUTHENTICATION & SESSION (5)
  | "session" // User session (Better Auth)
  | "oauth_account" // OAuth connection (GitHub, Google)
  | "verification_token" // Email/2FA verification token
  | "password_reset_token" // Password reset token
  | "ui_preferences" // User UI settings (theme, layout)

  // MARKETING (6)
  | "notification" // System notification
  | "email_campaign" // Email marketing campaign
  | "announcement" // Platform announcement
  | "referral" // Referral record
  | "campaign" // Marketing campaign
  | "lead" // Potential customer/lead

  // EXTERNAL INTEGRATIONS (3)
  | "external_agent" // External AI agent (ElizaOS)
  | "external_workflow" // External workflow (n8n, Zapier)
  | "external_connection" // Connection config

  // PROTOCOL ENTITIES (2, protocol-agnostic)
  | "mandate" // Intent/cart (AP2, shopping)
  | "product"; // Sellable product (ACP marketplace)
```

**How Domains Apply These Types:**

- **E-Commerce**: Uses `product` (catalog items), `mandate` (shopping carts), `payment` (transactions), `subscription` (auto-renewals), `membership` (loyalty), `notification` (order updates), `email_campaign` (promotional)
- **Education**: Uses `course` (programs), `lesson` (units), `community` (cohorts), `assignment` (assessments), `conversation`
  (discussion boards), `metric` (grades), `report` (transcripts)
- **Creator**: Uses `video` (YouTube/TikTok), `podcast` (episodes), `blog_post` (newsletters), `membership` (tiers), `course` (products), `email_campaign` (outreach), `metric` (engagement), `insight` (analytics)
- **Crypto**: Uses `token` (holdings), `token_contract` (smart contracts), `metric` (TVL/volume), `payment` (transfers), `knowledge_item` (risk profiles), `report` (protocol analysis)

### Thing Structure

```typescript
{
  _id: Id<"things">,
  type: ThingType,
  name: string, // Display name
  groupId: Id<"groups">, // REQUIRED: Multi-tenant isolation
  properties: {
    // Type-specific properties (JSON)
    // For creator:
    email?: string,
    username?: string,
    niche?: string[],
    // For token:
    contractAddress?: string,
    totalSupply?: number,
    // etc...
  },
  status: "active" | "inactive" | "draft" | "published" | "archived",
  createdAt: number,
  updatedAt: number,
  deletedAt?: number
}
```

### Properties by Thing Type

**Creator Properties:**

```typescript
{
  email: string,
  username: string,
  displayName: string,
  bio?: string,
  avatar?: string,
  niche: string[],
  expertise: string[],
  targetAudience: string,
  brandColors?: {
    primary: string,
    secondary: string,
    accent: string
  },
  totalFollowers: number,
  totalContent: number,
  totalRevenue: number,
  // MULTI-TENANT ROLES
  role: "platform_owner" | "group_owner" | "group_user" | "customer",
  groupId?: Id<"groups">, // Current/default group (if group_owner or group_user)
  permissions?: string[], // Additional permissions
}
```

**Organization Properties:**

```typescript
{
  name: string,
  slug: string, // URL-friendly identifier
  domain?: string, // Custom domain (e.g., acme.one.ie)
  logo?: string,
  description?: string,
  status: "active" | "suspended" | "trial" | "cancelled",
  plan: "starter" | "pro" | "enterprise",
  limits: {
    users: number, // Max users allowed
    storage: number, // GB
    apiCalls: number, // Per month
  },
  usage: {
    users: number, // Current users
    storage: number, // GB used
    apiCalls:
      number, // This month
  },
  billing: {
    customerId?: string, // Stripe customer ID
    subscriptionId?: string, // Stripe subscription ID
    currentPeriodEnd?: number,
  },
  settings: {
    allowSignups: boolean,
    requireEmailVerification: boolean,
    enableTwoFactor: boolean,
    allowedDomains?: string[], // Email domain whitelist
  },
  createdAt: number,
  trialEndsAt?: number,
}
```

**AI Clone Properties:**

```typescript
{
  voiceId?: string,
  voiceProvider?: "elevenlabs" | "azure" | "custom",
  appearanceId?: string,
  appearanceProvider?: "d-id" | "heygen" | "custom",
  systemPrompt: string,
  temperature: number,
  knowledgeBaseSize: number,
  lastTrainingDate: number,
  totalInteractions: number,
  satisfactionScore: number
}
```

**Agent Properties:**

```typescript
{
  agentType: "strategy" | "marketing" | "sales" | ...,
  systemPrompt: string,
  model: string,
  temperature: number,
  capabilities: string[],
  tools: string[],
  totalExecutions: number,
  successRate: number,
  averageExecutionTime: number
}
```

**Token Properties:**

```typescript
{
  contractAddress: string,
  blockchain: "base" | "ethereum" | "polygon",
  standard: "ERC20" | "ERC721" | "ERC1155",
  totalSupply: number,
  circulatingSupply: number,
  price: number,
  marketCap: number,
  utility: string[],
  burnRate: number,
  holders: number,
  transactions24h: number,
  volume24h: number
}
```

**Course Properties:**

```typescript
{
  title: string,
  description: string,
  thumbnail?: string,
  modules: number,
  lessons: number,
  totalDuration: number,
  price: number,
  currency: string,
  tokenPrice?: number,
  enrollments: number,
  completions: number,
  averageRating: number,
  generatedBy: "ai" | "human" | "hybrid",
  personalizationLevel: "none" | "basic" | "advanced"
}
```

**Website Properties:**

```typescript
{
  domain: string,
  subdomain: string, // creator.one.ie
  template: "minimal" | "showcase" | "portfolio",
  customCSS?: string,
  customDomain?: string,
  sslEnabled: boolean,
  analytics: {
    visitors30d: number,
    pageViews: number,
    conversionRate: number
  }
}
```

**Livestream Properties:**
```typescript
{
  title: string,
  scheduledAt: number,
  startedAt?: number,
  endedAt?: number,
  platform: "youtube" | "twitch" | "custom",
  streamUrl: string,
  recordingUrl?: string,
  viewersPeak: number,
  viewersAverage: number,
  chatEnabled: boolean,
  aiCloneMixEnabled: boolean, // For human + AI mixing
  status: "scheduled" | "live" | "ended" | "cancelled"
}
```

**Payment Properties (Consolidated):**

```typescript
{
  protocol: "x402" | "acp" | "ap2" | "stripe", // Protocol identifier
  amount: number,
  currency: "usd" | "eur",
  paymentMethod: "stripe" | "crypto",
  stripePaymentIntentId?: string,
  txHash?: string, // Blockchain transaction
  status: "pending" | "completed" | "failed" | "refunded",
  fees: number,
  netAmount: number,
  processedAt?: number,
  // Protocol specifics
  scheme?: "permit", // X402
  network?: "base", // X402/Crypto
  invoiceId?: string // ACP
}
```

**Subscription Properties:**

```typescript
{
  tier: "starter" | "pro" | "enterprise",
  price: number,
  currency: string,
  interval: "monthly" | "yearly",
  status: "active" | "cancelled" | "past_due" | "expired",
  currentPeriodStart: number,
  currentPeriodEnd: number,
  cancelAt?: number,
  stripeSubscriptionId?: string
}
```

**Metric Properti