---
title: Ontology
dimension: knowledge
category: ontology.md
tags: 6-dimensions, ai, architecture, ontology
related_dimensions: connections, events, groups, people, things
scope: global
created: 2025-11-25
updated: 2025-11-25
version: 2.0.0
ai_context: |
  This document is part of the knowledge dimension in the ontology.md category.
  Location: one/knowledge/ontology-v2.md
  Purpose: Documents one platform - ontology specification v2
  Related dimensions: connections, events, groups, people, things
  For AI agents: Read this to understand ontology.
---

# ONE Platform - Ontology Specification V2

**Version:** 2.0.0 (Reality as DSL - The Universal Code Generation Language)

**Status:** Active - Reality-Aware Architecture

**Design Principle:** This isn't just a data model. It's a Domain-Specific Language (DSL) that models reality itself, enabling 98% AI code generation accuracy through compound structure.

---

## Why This Changes Everything

### The Breakthrough: Reality as DSL

**Most developers think databases model their application.** We flipped this.

**The 6-dimension ontology models reality itself**. Applications map to it.

This enables:

- **98% AI code generation accuracy** (not 30-70%)
- **Compound structure** (each feature makes the next MORE accurate, not less)
- **Universal feature import** (clone ANY system into the ontology)
- **Never breaks** (reality doesn't change, technology does)

### What AI Sees

**Traditional Codebase (Pattern Divergence):**

```
Feature 1: createUser(email) ────────┐
Feature 2: addProduct(name) ─────────┼─→ 100 patterns
Feature 3: registerCustomer(data) ───┤   AI confused
Feature 4: insertOrder(items) ───────┤   Accuracy: 30%
...each uses different approach
```

**ONE Codebase (Pattern Convergence):**

```
Feature 1: provider.things.create({ type: "user" }) ────┐
Feature 2: provider.things.create({ type: "product" }) ─┼─→ 1 pattern
Feature 3: provider.things.create({ type: "customer" })─┤   AI masters it
Feature 4: provider.things.create({ type: "order" }) ───┤   Accuracy: 98%
...all use same pattern
```

**The difference:** Traditional codebases teach AI 100 patterns (chaos). ONE teaches AI 1 pattern (mastery).

### Why This Never Breaks

**Reality is stable. Technology changes.**

The 6 dimensions model reality:

1. **Groups** - Containers exist (friend circles → governments)
2. **People** - Actors authorize (who can do what)
3. **Things** - Entities exist (users, products, courses, agents)
4. **Connections** - Relationships relate (owns, purchased, enrolled_in)
5. **Events** - Actions happen (created, updated, purchased)
6. **Knowledge** - Understanding emerges (embeddings, search, RAG)

These dimensions NEVER change because they model reality itself, not any specific technology.

**Examples of systems that map perfectly:**

- **Shopify** → Products (things), Orders (connections + events), Customers (people)
- **Moodle** → Courses (things), Enrollments (connections), Completions (events)
- **Stripe** → Payments (things), Transactions (connections + events), Customers (people)
- **WordPress** → Posts (things), Authors (people), Categories (knowledge labels)

**Every system maps to the same 6 dimensions.** That's why AI agents achieve 98% accuracy.

---

## Structure

This ontology is organized into 6 dimension files:

1. **[organisation.md](./organisation.md)** - Multi-tenant isolation & ownership
2. **[people.md](./people.md)** - Authorization, governance, & user customization
3. **[things.md](./things.md)** - 66 entity types (what exists)
4. **[connections.md](./connections.md)** - 25 relationship types (how they relate)
5. **[events.md](./events.md)** - 67 event types (what happened)
6. **[knowledge.md](./knowledge.md)** - Vectors, embeddings, RAG (what it means)

**Execution Guide:**

7. **[todo.md](./todo.md)** - 100-cycle execution sequence (plan in cycles, not days)

**This document (Ontology.md)** contains the complete technical specification. The consolidated files above provide focused summaries and patterns.

**Planning Paradigm:** We don't plan in days. We plan in **cycle passes** (Cycle 1-100). See [todo.md](./todo.md) for the complete 100-cycle template that guides feature implementation from idea to production.

## The 6-Dimension Reality Model

**This is the universal interface.** Every feature in every system maps to these 6 dimensions.

**Every single thing in ONE platform exists within one of these 6 dimensions:**

```
┌──────────────────────────────────────────────────────────────┐
│  1. GROUPS                                                   │
│  Multi-tenant isolation with hierarchical nesting - who owns │
│  what at group level (friend circles → DAOs → governments)   │
└──────────────────────────────────────────────────────────────┘
                               ↓
┌──────────────────────────────────────────────────────────────┐
│  2. PEOPLE                                                   │
│  Authorization & governance - platform owner, group owners   │
└──────────────────────────────────────────────────────────────┘
                               ↓
┌──────────────────────────────────────────────────────────────┐
│  3. THINGS                                                   │
│  Every "thing" - users, agents, content, tokens, courses     │
└──────────────────────────────────────────────────────────────┘
                               ↓
┌──────────────────────────────────────────────────────────────┐
│  4. CONNECTIONS                                              │
│  Every relationship - owns, follows, taught_by, powers       │
└──────────────────────────────────────────────────────────────┘
                               ↓
┌──────────────────────────────────────────────────────────────┐
│  5. EVENTS                                                   │
│  Every action - purchased, created, viewed, completed        │
└──────────────────────────────────────────────────────────────┘
                               ↓
┌──────────────────────────────────────────────────────────────┐
│  6. KNOWLEDGE                                                │
│  Labels + chunks + vectors powering RAG & search             │
└──────────────────────────────────────────────────────────────┘
```

**The Universal Interface (How Technology Implements the Ontology):**

```
┌─────────────────────────────────────────────────────────────────────┐
│                    LAYER 1: UNIVERSAL INTERFACE                     │
│                        (The 6-Dimension DSL)                        │
├─────────────────────────────────────────────────────────────────────┤
│ groups      → Hierarchical containers (friend circles → governments)│
│ people      → Authorization & governance (who can do what)          │
│ things      → All entities (66 types: user, product, course...)     │
│ connections → All relationships (25 types: owns, purchased...)      │
│ events      → All actions (67 types: created, updated, logged...)   │
│ knowledge   → AI understanding (embeddings, search, RAG)            │
│                                                                     │
│ This layer NEVER changes. It models reality.                        │
└──────────────────┬──────────────────────────────────────────────────┘
                   │
                   ↓ Technology changes, ontology stays the same
┌─────────────────────────────────────────────────────────────────────┐
│                  TECHNOLOGY ADAPTERS (swap freely)                  │
│                 (Convex, Hono, Astro, React, etc.)                  │
├─────────────────────────────────────────────────────────────────────┤
│ Backend:   Hono API + Convex Database (implements ontology)         │
│ Frontend:  Astro SSR + React Islands (renders ontology)             │
│ Real-time: Convex hooks (live ontology subscriptions)               │
│ Static:    Astro Content Collections (ontology as files)            │
│                                                                     │
│ Technology can be swapped. Ontology stays the same.                 │
└─────────────────────────────────────────────────────────────────────┘
```

### Dimension 1: Groups (Containers)

**Purpose:** Partition the system with hierarchical nesting (friend circles → DAOs → governments)

**Why it never changes:** Containers always contain things. Whether it's a lemonade stand or a global government, the concept of "container" is universal.

**Pattern for AI:**

```typescript
// AI learns: Everything belongs to a group
provider.things.create({ groupId, type, name, properties });
```

**Example mappings:**

- Shopify Store → group (type: business)
- Moodle School → group (type: organization)
- DAO Treasury → group (type: dao)
- Friend Circle → group (type: friend_circle)

### Dimension 2: People (Authorization)

**Purpose:** Define who can do what (actors, roles, permissions)

**Why it never changes:** Authorization is a universal concept. Someone always performs actions.

**Pattern for AI:**

```typescript
// AI learns: Every action has an actor
provider.events.log({ actorId: personId, type, targetId });
```

**Example mappings:**

- Shopify Admin → person (role: group_owner)
- Moodle Student → person (role: customer)
- Platform Owner → person (role: platform_owner)
- Team Member → person (role: group_user)

### Dimension 3: Things (Entities)

**Purpose:** All nouns in the system (66 types, infinitely extensible)

**Why it never changes:** Entities exist. Users, products, courses, agents: these are all "things" with different types.

**Pattern for AI:**

```typescript
// AI learns: One pattern for all entities
// `type` is one of "product" | "course" | "user" | ... (66 types)
provider.things.create({ type, name, properties });
```

**Example mappings:**

- Shopify Product → thing (type: product)
- Moodle Course → thing (type: course)
- Stripe Payment → thing (type: payment)
- WordPress Post → thing (type: blog_post)

**New entity type?** Use a new `type` value and put its type-specific fields in `properties`. No schema migration needed.

### Dimension 4: Connections (Relationships)

**Purpose:** How entities relate to each other (25 types + metadata)

**Why it never changes:** Relationships are universal. Things connect to other things.

**Pattern for AI:**

```typescript
// AI learns: One pattern for all relationships
provider.connections.create({
  fromThingId,
  toThingId,
  relationshipType,
  metadata,
});
```

**Example mappings:**

- Shopify Order → connection (type: purchased) + event (type: order_placed)
- Moodle Enrollment → connection (type: enrolled_in)
- GitHub Follows → connection (type: following)
- Token Holdings → connection (type: holds_tokens, metadata: { balance })

### Dimension 5: Events (Actions)

**Purpose:** Complete audit trail of what happened when (67 types + metadata)

**Why it never changes:** Actions happen at specific times. This is universal.

**Pattern for AI:**

```typescript
// AI learns: All actions are logged the same way
provider.events.log({ type, actorId, targetId, timestamp, metadata });
```

**Example mappings:**

- Shopify Checkout → event (type: payment_processed)
- Moodle Lesson View → event (type: content_viewed)
- User Login → event (type: user_login)
- Token Purchase → event (type: tokens_purchased)

### Dimension 6: Knowledge (Understanding)

**Purpose:** Labels, embeddings, and semantic search for AI

**Why it never changes:** Categorization and understanding are universal concepts.

**Pattern for AI:**

```typescript
// AI learns: Knowledge is linked to things
// `knowledgeType` is "label" or "chunk" in the common case
provider.knowledge.create({
  sourceThingId,
  knowledgeType,
  text,
  embedding,
});
```

**Example mappings:**

- WordPress Categories → knowledge (type: label)
- Course Content → knowledge (type: chunk, embedding: [...])
- Product Tags → knowledge (type: label)
- Semantic Search → knowledge vector search

---

**Golden Rule:** If you can't map your feature to these 6 dimensions, you're thinking about it wrong.

**For AI Agents:** This ontology is your universal language. Learn these 6 patterns and you can generate ANY feature with 98% accuracy.

---

## Compound Structure Accuracy: Why AI Gets Better Over Time

### Traditional AI Code Generation (Pattern Divergence)

**The death spiral:**

```
Generation 1: Clean code           → 95% accurate
Generation 2: Slight drift         → 90% accurate (-5% - patterns starting to diverge)
Generation 3: Pattern divergence   → 80% accurate (-10% - AI sees multiple patterns)
Generation 4: Inconsistency        → 65% accurate (-15% - AI confused by variations)
Generation N: Unmaintainable mess  → 30% accurate (-20% - complete chaos)
```

**Why?** Each feature introduces NEW patterns. AI has to guess which pattern to use.

### ONE's AI Code Generation (Pattern Convergence)

**The virtuous cycle:**

```
Generation 1: Maps to ontology      → 85% accurate (learning the ontology)
Generation 2: Follows patterns      → 90% accurate (+5% - recognizing service pattern)
Generation 3: Reuses services       → 93% accurate (+3% - composing existing services)
Generation 4: Predictable structure → 96% accurate (+3% - mastering Effect.ts patterns)
Generation N: Perfect consistency   → 98%+ accurate (+2% - generalized patterns)
```

**Why?** Each feature uses the SAME patterns. AI masters the ontology, then reuses it.

### How Agents Learn the Ontology

**Generation 1-5: Learning (85% accurate)**

```
Agent thinks:
"I see things being created with `type` field. Let me check..."
"All entities are things? Even users, products, courses?"
"Connections link things together. Got it."
"Events log actions. People are actors."
"Pattern emerging: 6 dimensions for everything."

CONFIDENCE: 85% (still learning)
```

**Generation 6-20: Recognizing (90% accurate)**

```
Agent thinks:
"Every create operation uses provider.things.create. 100% of the time."
"Every service uses Effect.gen. Pattern confirmed."
"All errors are tagged unions. I can handle exhaustively."
"Dependencies are injected. I know what's needed."

CONFIDENCE: 90% (pattern recognized)
```

**Generation 21-50: Composing (93% accurate)**

```
Agent thinks:
"I can reuse existing services:"
"- createThing service exists"
"- createConnection service exists"
"- logEvent service exists"
"My new feature: compose these three services."
"No new patterns needed. Just composition."

CONFIDENCE: 93% (composing existing patterns)
```

**Generation 51-100: Mastering (96% accurate)**

```
Agent thinks:
"I've generated 50 services. All follow same structure:"
"1. Get provider from context"
"2. Call provider operation"
"3. Handle typed errors"
"4. Return typed result"
"Pattern is UNIVERSAL. Works for ANY feature."

CONFIDENCE: 96% (mastered the structure)
```

**Generation 100+: Generalizing (98%+ accurate)**

```
Agent thinks:
"I don't even need to think. The pattern IS the system."
"New feature request? Map to 6 dimensions."
"Need validation? Effect.ts service."
"Need data? Provider interface."
"Need state? Nanostores."
"Every decision is deterministic."

CONFIDENCE: 98%+ (system internalized)
```

### What This Means for Development

**Feature #1:**

- Traditional: 8 hours (70% AI, 30% human)
- ONE: 8 hours (70% AI, 30% human)
- **No difference yet**

**Feature #10:**

- Traditional: 10 hours (60% AI, 40% human - patterns diverging)
- ONE: 6 hours (85% AI, 15% human - patterns converging)
- **ONE is 1.7x faster**

**Feature #50:**

- Traditional: 16 hours (40% AI, 60% human - technical debt)
- ONE: 3 hours (95% AI, 5% human - pattern mastery)
- **ONE is 5.3x faster**

**Feature #100:**

- Traditional: 24 hours (25% AI, 75% human - chaos)
- ONE: 1.5 hours (98% AI, 2% human - generalized)
- **ONE is 16x faster**

**Cumulative for 100 features:**

- Traditional: 1,400 hours
- ONE: 350 hours
- **ONE is 4x faster overall**
- **And the gap keeps growing**

### Why Schema Migrations Never Break This

**New entity type?**

```typescript
// NO schema migration needed
{ type: "new_thing", name: "...", properties: { ...custom } }
```

**New field on existing type?**

```typescript
// NO schema migration needed
{ type: "product", properties: { price, SKU, newField: "value" } }
```

**New relationship?**

```typescript
// NO schema migration needed
{ relationshipType: "new_connection", metadata: { ...custom } }
```

**New protocol integration?**

```typescript
// NO schema migration needed
{ relationshipType: "transacted", metadata: { protocol: "new_protocol", ...custom } }
```

**Result:** Technology changes (React → Svelte, REST → GraphQL), but the ontology stays the same forever.

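The "no schema migration" idea above can be illustrated with a small in-memory sketch. Everything named here (`Thing`, `ThingStore`, `create`, `listByType`) is a hypothetical stand-in invented for illustration, not the ONE provider API. It only shows why an open-ended `type` string plus a free-form `properties` bag lets a brand-new entity type flow through the same code path as an existing one.

```typescript
// Hypothetical in-memory sketch. `Thing` and `ThingStore` are illustrative
// names and do not come from the ONE codebase.
type Thing = {
  id: string;
  groupId: string;
  type: string; // open-ended: a new type is just a new string value
  name: string;
  properties: Record<string, unknown>; // type-specific fields live here
  createdAt: number;
};

class ThingStore {
  private things: Thing[] = [];
  private nextId = 1;

  // One create path for every entity type.
  create(input: Omit<Thing, "id" | "createdAt">): Thing {
    const thing: Thing = {
      ...input,
      id: `thing_${this.nextId++}`,
      createdAt: Date.now(),
    };
    this.things.push(thing);
    return thing;
  }

  // Queries are scoped to a group and filtered by type.
  listByType(groupId: string, type: string): Thing[] {
    return this.things.filter((t) => t.groupId === groupId && t.type === type);
  }
}

const store = new ThingStore();

// An existing type...
const product = store.create({
  groupId: "g1",
  type: "product",
  name: "Yoga Mat",
  properties: { price: 29, sku: "YM-1" },
});

// ...and a type the store has never seen, created the same way.
const episode = store.create({
  groupId: "g1",
  type: "podcast_episode",
  name: "Episode 1",
  properties: { durationSec: 1800 },
});
```

The trade-off of this design is that `properties` is untyped at the storage layer, so any per-type validation has to live in application code rather than in the schema.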
---

## GROUPS: The Isolation Boundary with Hierarchical Nesting

Purpose: Partition the system with perfect isolation and support nested groups (groups within groups), from friend circles to DAOs to governments. Every group owns its own graph of things, connections, events, and knowledge.

### Group Structure

```typescript
{
  _id: Id<'groups'>,
  slug: string,                  // REQUIRED: URL identifier (/group/slug)
  name: string,                  // REQUIRED: Display name
  type: 'friend_circle' | 'business' | 'community' | 'dao' | 'government' | 'organization',
  parentGroupId?: Id<'groups'>,  // OPTIONAL: Parent group for hierarchical nesting
  description?: string,          // OPTIONAL: About text
  metadata: Record<string, any>,
  settings: {
    visibility: 'public' | 'private',
    joinPolicy: 'open' | 'invite_only' | 'approval_required',
    plan: 'starter' | 'pro' | 'enterprise',
    limits: {
      users: number,
      storage: number,           // GB
      apiCalls: number,
    }
  },
  status: 'active' | 'archived',
  createdAt: number,
  updatedAt: number,
}
```

### Common Fields by Use Case

**Identity:** `[slug, name]` - Who they are + URL

**Web:** `[slug, name, description]` - Website generation

**Operations:** `[status, type, settings, parentGroupId]` - System management

### Why Groups Matter

1. **Multi-Tenant Isolation:** Each group's data is completely separate
2. **Hierarchical Nesting:** Groups can contain sub-groups for complex organizations (parent → child → grandchild...)
3. **Flexible Types:** From friend circles (2 people) to businesses to DAOs to governments (billions)
4. **Resource Quotas:** Control costs and usage per group
5. **Privacy Control:** Groups can be public or private with controlled access
6. **Flexible Scale:** Scales from friend circles to global governments without schema changes

### Hierarchical Group Examples by Domain

**E-Commerce (Retail Chain):**

```
Corporate Headquarters (group)
├─ North American Division (child group)
│  ├─ New York Store (grandchild group)
│  └─ California Store (grandchild group)
└─ European Division (child group)
   ├─ London Store (grandchild group)
   └─ Paris Store (grandchild group)
```

**Education (University System):**

```
MIT (group)
├─ School of Engineering (child group)
│  ├─ Computer Science Dept (grandchild group)
│  ├─ Electrical Engineering Dept (grandchild group)
│  └─ Mechanical Engineering Dept (grandchild group)
├─ School of Science (child group)
│  ├─ Mathematics Dept (grandchild group)
│  └─ Physics Dept (grandchild group)
└─ School of Business (child group)
```

**Creator (Multi-Channel Brand):**

```
Creator Brand (group)
├─ YouTube Channel (child group)
│  └─ Content Series 1 (grandchild group)
├─ Podcast (child group)
│  └─ Season 2 (grandchild group)
└─ Community (child group - Discord server with channels)
```

**Crypto (DAO Treasury):**

```
DAO Treasury (group)
├─ Core Operations (child group)
│  ├─ Development Fund (grandchild group)
│  └─ Marketing Fund (grandchild group)
├─ Investment Committee (child group)
│  └─ Venture Capital Allocation (grandchild group)
└─ Community Grants (child group)
```

---

### System Group Pattern (Global Entities)

**Problem:** Some entities are truly global and don't belong to any user group.

**Examples:**

- Platform-wide settings
- System notifications
- Global rate limits
- Reference data (timezones, currencies, countries)
- Platform-level analytics

**Solution:** Reserve a special "system" group.

```typescript // Create system group on platform initialization const SYSTEM_GROUP_ID = 'system'; await ctx.db.insert('groups', { _id: SYSTEM_GROUP_ID, slug: 'system', name: 'System', type: 'organization', settings: { visibility: 'private', joinPolicy: 'invite_only', plan: 'enterprise', limits: { users: Infinity, storage: Infinity, apiCalls: Infinity } }, status: 'active', createdAt: Date.now(), }); // Use for global entities await ctx.db.insert('things', { type: 'platform_setting', name: 'Global Rate Limit', groupId: SYSTEM_GROUP_ID, // System group properties: { maxRequestsPerMinute: 1000, scope: 'global' }, status: 'active', createdAt: Date.now(), }); ``` **Rules:** - System group ID is reserved and cannot be deleted - Only platform owners can create things in system group - System group has no resource limits - System entities are visible to all groups (read-only) --- ## PEOPLE: Authorization & Governance Purpose: Define who can do what. People direct groups, customize AI agents, and govern access. ### Person Structure ```typescript { _id: Id<'people'>, email: string, username: string, displayName: string, // CRITICAL: Role determines access level role: 'platform_owner' | 'group_owner' | 'group_user' | 'customer', // Group context groupId?: Id<'groups'>, // Current/default group permissions?: string[], // Profile bio?: string, avatar?: string, // Multi-tenant tracking groups: Id<'groups'>[], // All groups this person belongs to createdAt: number, updatedAt: number, } ``` ### Four Roles 1. **Platform Owner** (Anthony) - Owns the ONE Platform - 100% revenue from platform-level services - Can access all groups (support/debugging) - Creates new groups 2. **Group Owner** - Owns/manages one or more groups - Controls users, permissions, billing within group - Customizes AI agents and frontend - Revenue sharing with platform 3. **Group User** - Works within a group - Limited permissions (defined by group owner) - Can create content, run agents (within quotas) 4. 
**Customer** - External user consuming content - Purchases tokens, enrolls in courses - No admin access ### Why People Matter 1. **Authorization:** Every action must have an actor (person) 2. **Governance:** Group owners control who can do what 3. **Audit Trail:** Events log who did what when 4. **Customization:** People teach AI agents their preferences --- ## KNOWLEDGE: Labels, Chunks, and Vectors (RAG) Purpose: unify taxonomy (“tags”) and retrieval‑augmented generation (RAG) under one table. A knowledge item can be a label (former tag), a document wrapper, or a chunk with an embedding. Design principles: - Protocol‑agnostic: store protocol details in `metadata`. - Many‑to‑many: link knowledge ⇄ things via `thingKnowledge` with optional context metadata. - Scalable: consolidated types minimize index fan‑out; embeddings enable semantic search. ### Knowledge Types ```typescript type KnowledgeType = | "label" // replaces legacy "tag"; lightweight categorical marker | "document" // wrapper for a source text/blob (pre-chunking) | "chunk" // atomic chunk of text with embedding | "vector_only"; // embedding without stored text (e.g., privacy) ``` ### Knowledge Structure ```typescript { _id: Id<'knowledge'>, knowledgeType: KnowledgeType, // Textual content (optional for label/vector_only) text?: string, // Embedding for semantic search (optional for label/document) embedding?: number[], // Float32 vector; model-dependent dimension embeddingModel?: string, // e.g., "text-embedding-3-large" embeddingDim?: number, // Source linkage sourceThingId?: Id<'things'>, // Primary source entity sourceField?: string, // e.g., 'content', 'transcript', 'title' chunk?: { index: number; start?: number; end?: number; tokenCount?: number; overlap?: number }, // Lightweight categorization (free-form) labels?: string[], // Replaces per-thing tags; applied to knowledge // Additional metadata (protocol, language, mime, hash, version) metadata?: Record<string, any>, createdAt: number, 
updatedAt: number, deletedAt?: number, } ``` ### Junction: thingKnowledge ```typescript { _id: Id<'thingKnowledge'>, thingId: Id<'things'>, knowledgeId: Id<'knowledge'>, role?: 'label' | 'summary' | 'chunk_of' | 'caption' | 'keyword', // Context for the link (e.g., confidence, section name) metadata?: Record<string, any>, createdAt: number, } ``` ### Indexes (recommended) - `knowledge.by_type` (knowledgeType) - `knowledge.by_source` (sourceThingId) - `knowledge.by_created` (createdAt) - `thingKnowledge.by_thing` (thingId) - `thingKnowledge.by_knowledge` (knowledgeId) - Vector index (provider-dependent): `knowledge.by_embedding` for ANN search ### How Domains Apply Knowledge **Education - Learning Objectives & Study Materials:** ```typescript // Knowledge: Learning objective chunk { knowledgeType: 'chunk', text: 'Students should be able to solve quadratic equations', sourceThingId: courseId, labels: ['subject:mathematics', 'grade:9-12', 'objective:apply', 'skill:algebra'] } // Link: Course references this learning objective { thingId: courseId, knowledgeId: knowledgeId, role: 'learning_objective' } ``` **Creator - Content SEO & Discovery:** ```typescript // Knowledge: Video description chunk with embedded metadata { knowledgeType: 'chunk', text: 'This video teaches React hooks for beginners...', sourceThingId: videoId, embedding: [0.1, 0.2, ...], labels: ['topic:react', 'difficulty:beginner', 'platform:youtube', 'series:javascript101'] } ``` **E-Commerce - Product Categorization & Search:** ```typescript // Knowledge: Product description for semantic search { knowledgeType: 'document', text: 'Blue wireless headphones with 40-hour battery life', sourceThingId: productId, embedding: [0.5, 0.3, ...], labels: ['category:electronics', 'color:blue', 'feature:wireless', 'price_range:premium'] } ``` **Crypto - Risk Analysis & Token Intelligence:** ```typescript // Knowledge: Token risk assessment { knowledgeType: 'chunk', text: 'Token has no minting restrictions, moderate 
holder concentration', sourceThingId: tokenId, labels: ['risk:medium', 'metric:tvl_trend_up', 'audit:completed', 'governance:none'] } // Knowledge: Protocol dependency analysis { knowledgeType: 'label', text: 'Depends on Chainlink oracle', sourceThingId: protocolId, labels: ['dependency:critical', 'type:oracle', 'risk_factor:oracle'] } ``` ### RAG Ingestion Strategy Objective: Attach vectors to **relevant** content for high-quality retrieval while controlling costs and maintaining performance. **CRITICAL:** Not every field needs RAG. Be selective. Embeddings are expensive in storage, compute, and money. --- #### What to Embed (Decision Matrix by Domain) **Universal Rule:** ``` IF "user will semantically search this" → EMBED IF "user will filter/sort this" → DON'T EMBED IF "structured data" → DON'T EMBED ``` | Content Type | Embed? | Domain Example | Use Case | |--------------|--------|----------------|----------| | **Long-form Content** ||||| | Blog post content | ✅ YES | Creator | "Find posts about React hooks" | | Course lesson content | ✅ YES | E-Learning | "Search lessons on form validation" | | Video/podcast transcripts | ✅ YES | E-Learning, Creator | Makes A/V content searchable | | Email campaign body | ✅ YES | Creator, E-Commerce | Content discovery | | **Product Content** ||||| | Product descriptions | ✅ YES | E-Commerce | "Find eco-friendly water bottles" | | Product specs (JSON) | ❌ NO | E-Commerce | Use filters: `size === 'L'` | | Customer reviews | ✅ YES | E-Commerce | "What do people say about durability?" 
| | Q&A responses | ✅ YES | E-Commerce | Customer support knowledge base | | Prices, SKUs, inventory | ❌ NO | E-Commerce | Exact match: `price < 50` | | **Social Content** ||||| | Social post text (>100 chars) | ✅ YES | Social | "Find my AI posts with high engagement" | | Social post text (<100 chars) | ❌ NO | Social | Too short, use labels | | Thread content (combined) | ✅ YES | Social | Combine into single chunk | | Hashtags | ❌ NO | Social | Exact match, not semantic | | Comments (>50 words) | ⚠️ MAYBE | Social | Only for community insights | | **Image Generation** ||||| | Image prompts | ✅ YES | Image Gen | "Find cyberpunk city prompts" | | Prompt descriptions | ✅ YES | Image Gen | Style discovery | | Negative prompts | ✅ YES | Image Gen | "Avoid common mistakes" | | Generation params | ❌ NO | Image Gen | Use filters: `steps === 50` | | Image pixels | ❌ NO | Image Gen | Use CLIP embeddings separately | | **Educational Content** ||||| | Course descriptions | ✅ YES | E-Learning | Discovery + recommendations | | Lesson summaries | ✅ YES | E-Learning | "React hooks for beginners" | | Student notes | ✅ YES | E-Learning | Personal knowledge base | | Quiz questions | ⚠️ MAYBE | E-Learning | Only for study guides | | Progress data | ❌ NO | E-Learning | Use analytics: `progress >= 0.5` | | Certificates | ❌ NO | E-Learning | Metadata only | | **Metadata** ||||| | Titles, summaries | ✅ YES | All | High signal-to-noise | | Descriptions (>50 words) | ✅ YES | All | Context for search | | Tags, categories | ❌ NO | All | Use `labels` instead (free) | | **User-Generated** ||||| | Bios, profiles | ⚠️ MAYBE | Social | Only for people search | | **System Data** ||||| | Logs, errors | ❌ NO | All | Use log aggregation tools | | Metrics, analytics | ❌ NO | All | Use time-series DB | | Audit trails | ❌ NO | All | Events table is sufficient | **Domain-Specific Examples:** **E-Commerce:** ```typescript // ✅ EMBED: Product discovery "Find sustainable yoga mats" → Semantic search on 
descriptions // ❌ DON'T EMBED: Filtering "Show mats under $30" → Filter: price < 30 "In stock only" → Filter: inventory > 0 ``` **E-Learning:** ```typescript // ✅ EMBED: Course/lesson discovery "Learn React hooks for beginners" → Semantic search on course descriptions + lesson transcripts // ❌ DON'T EMBED: Progress tracking "Show my completed courses" → Filter: connections where completed = true ``` **Image Generation:** ```typescript // ✅ EMBED: Prompt library "Cyberpunk city at night" → Semantic search on successful prompts // ❌ DON'T EMBED: Generation settings "Images with CFG 7.5" → Filter: metadata.cfg === 7.5 ``` **Social Posting:** ```typescript // ✅ EMBED: Content inspiration "My posts about AI with high engagement" → Semantic search on post text // ❌ DON'T EMBED: Engagement metrics "Posts with >1000 likes" → Filter: engagement.likes > 1000 ``` **Cost Reality Check (10K items):** | Domain | What to Embed | Monthly Cost | |--------|---------------|--------------| | E-Commerce | Product descriptions | ~$1.30 | | E-Learning | Lesson transcripts | ~$13 (longer content) | | Image Gen | Prompts + descriptions | ~$0.50 | | Social | Long posts only | ~$0.80 | **Key Insight:** Be ruthlessly selective. Only embed content users will **semantically search**, not data they'll **filter or sort**. --- #### When to Update Embeddings **Trigger:** Content changes in source thing. ```typescript // On content update export const updateBlogPost = mutation({ handler: async (ctx, { postId, content }) => { // 1. Update the thing await ctx.db.patch(postId, { properties: { content }, updatedAt: Date.now(), }); // 2. Schedule re-embedding (debounced) await ctx.scheduler.runAfter(5000, internal.knowledge.reEmbedThing, { thingId: postId, fields: ['content'], // Only re-embed changed fields }); // 3. 
Log event await ctx.db.insert('events', { type: 'content_event', actorId: ctx.auth.userId!, targetId: postId, groupId: post.groupId, timestamp: Date.now(), metadata: { action: 'updated', triggeredReEmbedding: true }, }); }, }); ``` **Re-embedding Strategy:** | Change Type | Action | Why | |-------------|--------|-----| | Content edited | Re-embed immediately | Content changed | | Title/summary edited | Re-embed immediately | High-signal metadata | | Tags/labels changed | Update labels only | No embedding needed | | Status changed (draft→published) | Re-embed if first publish | Visibility changed | | Minor typo fix | Debounce 5 seconds | Avoid re-embedding every keystroke | | Bulk import | Batch embed (100/batch) | Rate limiting | **Cost Optimization:** ```typescript // Hash content to detect actual changes import { createHash } from 'crypto'; export const reEmbedThing = internalMutation({ handler: async (ctx, { thingId, fields }) => { const thing = await ctx.db.get(thingId); const content = fields.map(f => thing.properties[f]).join('\n'); // Hash current content const contentHash = createHash('sha256').update(content).digest('hex'); // Check if content actually changed const existingKnowledge = await ctx.db .query('knowledge') .withIndex('by_source') .filter(q => q.eq(q.field('sourceThingId'), thingId)) .first(); if (existingKnowledge?.metadata?.contentHash === contentHash) { console.log('Content unchanged, skipping re-embedding'); return; // Save $$$ by skipping } // Content changed, re-embed await embedAndStore(ctx, thing, content, contentHash); }, }); ``` --- #### Chunking Standard **Window:** ~800 tokens (~3,200 characters) **Overlap:** ~200 tokens (~800 characters) **Boundaries:** Sentence-aware (don't split mid-sentence) ```typescript export async function chunkText(text: string): Promise<Chunk[]> { const chunks: Chunk[] = []; const sentences = text.split(/[.!?]+\s+/); // Split on sentence boundaries let currentChunk = ''; let currentTokens = 0; let 
  chunkIndex = 0;
  // Approximate character offset of the current chunk within the chunked text.
  // The original mixed token counts (200) with character lengths here.
  let chunkStart = 0;

  for (const sentence of sentences) {
    const sentenceTokens = estimateTokens(sentence);

    if (currentTokens + sentenceTokens > 800 && currentChunk.length > 0) {
      // Save chunk
      const trimmed = currentChunk.trim();
      chunks.push({
        index: chunkIndex++,
        text: trimmed,
        tokenCount: currentTokens,
        start: chunkStart,
        end: chunkStart + trimmed.length,
      });

      // Start new chunk with overlap (last 200 tokens)
      const overlapText = getLastNTokens(currentChunk, 200);
      chunkStart = Math.max(0, chunkStart + trimmed.length - overlapText.length);
      currentChunk = overlapText + ' ' + sentence;
      currentTokens = 200 + sentenceTokens;
    } else {
      currentChunk += ' ' + sentence;
      currentTokens += sentenceTokens;
    }
  }

  // Save final chunk
  if (currentChunk.length > 0) {
    const trimmed = currentChunk.trim();
    chunks.push({
      index: chunkIndex,
      text: trimmed,
      tokenCount: currentTokens,
      start: chunkStart,
      end: chunkStart + trimmed.length,
    });
  }

  return chunks;
}
```

---

#### Embedding Pipeline

```typescript
// 1. Schedule embedding
export const scheduleEmbeddingForThing = mutation({
  handler: async (ctx, { thingId, fields }) => {
    await ctx.scheduler.runAfter(0, internal.knowledge.embedThing, {
      thingId,
      fields,
    });
  },
});

// 2. Embed text (internal action - calls OpenAI)
export const embedText = internalAction({
  handler: async (ctx, { text, model = 'text-embedding-3-large' }) => {
    const response = await openai.embeddings.create({
      model,
      input: text,
    });

    return {
      embedding: response.data[0].embedding,
      dim: response.data[0].embedding.length,
    };
  },
});

// 3.
// Store chunks with embeddings
export const upsertKnowledgeChunks = internalMutation({
  handler: async (ctx, { thingId, chunks, embeddings }) => {
    // Delete old chunks
    const oldChunks = await ctx.db
      .query('knowledge')
      .withIndex('by_source')
      .filter(q => q.eq(q.field('sourceThingId'), thingId))
      .collect();

    for (const old of oldChunks) {
      await ctx.db.delete(old._id);
    }

    // Insert new chunks
    for (let i = 0; i < chunks.length; i++) {
      const knowledgeId = await ctx.db.insert('knowledge', {
        knowledgeType: 'chunk',
        text: chunks[i].text,
        embedding: embeddings[i].embedding,
        embeddingModel: 'text-embedding-3-large',
        embeddingDim: embeddings[i].dim,
        sourceThingId: thingId,
        chunk: chunks[i],
        metadata: {
          contentHash: chunks[i].hash,
          embeddingVersion: 'v3',
        },
        createdAt: Date.now(),
      });

      // Link to thing
      await ctx.db.insert('thingKnowledge', {
        thingId,
        knowledgeId,
        role: 'chunk_of',
        createdAt: Date.now(),
      });
    }
  },
});
```

---

#### Cost Management

**Embedding Costs (OpenAI text-embedding-3-large):**

- $0.13 per 1M tokens
- Average blog post: ~1,000 tokens = $0.00013
- 1M blog posts embedded: ~$130

**Storage Costs:**

- 3,072 dimensions × 4 bytes = 12KB per chunk
- 1M chunks = 12GB of vector data
- Convex: ~$0.25/GB/month = $3/month per 1M chunks

**Optimization Strategies:**

1. **Selective Embedding:** Only embed content types with high search value
2. **Lazy Embedding:** Embed on first publish, not on draft save
3. **Batch Processing:** Embed 100 items at a time to avoid rate limits
4. **Content Hashing:** Skip re-embedding if content unchanged
5. **Smaller Models:** Use `text-embedding-3-small` (1,536 dims by default, reducible to 512 via the `dimensions` parameter) for less critical content (about 85% lower embedding cost at $0.02 per 1M tokens, plus smaller vectors to store)

---

#### Query & Retrieval

```typescript
export const semanticSearch = query({
  args: {
    query: v.string(),
    groupId: v.id('groups'),
    limit: v.optional(v.number()), // Optional so the default of 10 below can apply
  },
  handler: async (ctx, { query, groupId, limit = 10 }) => {
    // 1. Embed query
    const queryEmbedding = await ctx.runAction(internal.knowledge.embedText, {
      text: query,
    });

    // 2.
    // Vector search (filtered by group)
    const results = await ctx.db
      .vectorSearch('knowledge', 'by_embedding', {
        vector: queryEmbedding.embedding,
        limit: limit * 2, // Over-fetch for filtering
        filter: q => q.eq(q.field('knowledgeType'), 'chunk'),
      })
      .collect();

    // 3. Filter by group (get source things)
    const groupResults = [];
    for (const result of results) {
      const sourceThing = await ctx.db.get(result.sourceThingId);
      if (sourceThing?.groupId === groupId) {
        groupResults.push({
          ...result,
          score: result._score,
          thing: sourceThing,
        });
      }
      if (groupResults.length >= limit) break;
    }

    return groupResults;
  },
});
```

---

#### Governance & Lifecycle

**Versioning:**

- Store `metadata.contentHash` of source content
- If hash unchanged, skip re-embedding
- Track `metadata.embeddingVersion` for model migrations

**Retention:**

- Archive old chunks on major content edits (keep last 3 versions)
- Garbage collect orphaned knowledge items (no thingKnowledge links)
- Delete embeddings when source thing is hard-deleted

**Quality:**

- Track `metadata.qualityScore` based on user feedback
- Monitor search relevance metrics
- A/B test embedding models

**Summary:** Be ruthlessly selective about what gets embedded. RAG is powerful but expensive. Embed content users will semantically search, not structured data they'll filter.

### Knowledge Governance

Policy: Default is free-form, user-extensible knowledge labels for maximum flexibility and zero schema churn.

- Curated label prefixes (recommended): `skill:*`, `industry:*`, `topic:*`, `format:*`, `goal:*`, `audience:*`, `technology:*`, `status:*`, `capability:*`, `protocol:*`, `payment_method:*`, `network:*`.
- Validation: Enforce label hygiene (no duplicates within scope); allow synonyms via an alias list if needed.
- Ownership: Platform/group owners may curate official labels; users can still apply ad-hoc labels.
- Hygiene: Periodically consolidate low-usage duplicates; do not delete knowledge items with active references, mark them deprecated instead.
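
The label policy above can be sketched as a small validation helper. This is a minimal illustration, not the platform's implementation: `checkLabels`, `normalizeLabel`, `LabelCheck`, and the alias map are hypothetical names, and the trim-and-lowercase normalization is an assumption. Only the prefix list comes from the policy itself.

```typescript
// Curated prefixes copied from the governance policy; everything else here is hypothetical.
const CURATED_PREFIXES: ReadonlySet<string> = new Set([
  "skill", "industry", "topic", "format", "goal", "audience",
  "technology", "status", "capability", "protocol", "payment_method", "network",
]);

interface LabelCheck {
  label: string;    // Canonical label after normalization and alias resolution
  curated: boolean; // True when the label uses one of the curated prefixes
}

// Assumed normalization: trim, lowercase, then map synonyms to a canonical label.
function normalizeLabel(raw: string, aliases: ReadonlyMap<string, string>): string {
  const cleaned = raw.trim().toLowerCase();
  return aliases.get(cleaned) ?? cleaned;
}

// Enforces "no duplicates within scope" for one list of labels.
// Ad-hoc labels are kept (the policy allows them) and flagged as not curated.
function checkLabels(
  rawLabels: string[],
  aliases: ReadonlyMap<string, string> = new Map(),
): LabelCheck[] {
  const seen = new Set<string>();
  const result: LabelCheck[] = [];

  for (const raw of rawLabels) {
    const label = normalizeLabel(raw, aliases);
    if (label === "" || seen.has(label)) continue; // Skip blanks and duplicates
    seen.add(label);

    const colon = label.indexOf(":");
    const prefix = colon === -1 ? "" : label.slice(0, colon);
    result.push({ label, curated: CURATED_PREFIXES.has(prefix) });
  }

  return result;
}
```

Calling `checkLabels(["Topic:AI", " topic:ai ", "skill:js", "misc"], new Map([["skill:js", "skill:javascript"]]))` would keep three labels: `topic:ai` and `skill:javascript` as curated, and `misc` as an ad-hoc label. Flagging rather than rejecting ad-hoc labels keeps the default free-form behavior while still giving owners a signal for later consolidation.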
---

## THINGS: All The "Things"

### What Goes in Things?

**Simple test:** If you can point at it and say "this is a \_\_\_", it's a thing.

Examples:

- "This is a **creator**" ✅ Thing
- "This is a **blog post**" ✅ Thing
- "This is a **token**" ✅ Thing
- "This is a **relationship**" ❌ Connection, not thing
- "This is a **purchase**" ❌ Event, not thing

### Thing Types

**66 Types Organized in 13 Categories:**

```typescript
type ThingType =
  // CORE (4)
  | "creator" // Human creator (role: platform_owner, group_owner, group_user, customer)
  | "ai_clone" // Digital twin of creator
  | "audience_member" // Fan/user (role: customer)
  | "organization" // Multi-tenant organization

  // BUSINESS AGENTS (10)
  | "strategy_agent" // Vision, planning, OKRs
  | "research_agent" // Market, trends, competitors
  | "marketing_agent" // Content strategy, SEO, distribution
  | "sales_agent" // Funnels, conversion, follow-up
  | "service_agent" // Support, onboarding, success
  | "design_agent" // Brand, UI/UX, assets
  | "engineering_agent" // Tech, integration, automation
  | "finance_agent" // Revenue, costs, forecasting
  | "legal_agent" // Compliance, contracts, IP
  | "intelligence_agent" // Analytics, insights, predictions

  // CONTENT (7)
  | "blog_post" // Written content (guides, newsletters, articles)
  | "video" // Video content (lectures, demos, shorts)
  | "podcast" // Audio content (episodes, interviews)
  | "social_post" // Social media post (all platforms)
  | "email" // Email content (campaigns, newsletters)
  | "course" // Educational course (programs, learning paths)
  | "lesson" // Individual lesson (units, modules, segments)

  // PRODUCTS (4)
  | "digital_product" // Templates, tools, assets
  | "membership" // Tiered membership (Patreon, Substack)
  | "consultation" // 1-on-1 session (coaching, support)
  | "nft" // NFT collectible (governance, utility)

  // COMMUNITY (3)
  | "community" // Community space (Discord, forums)
  | "conversation" // Thread/discussion (boards, channels)
  | "message" // Individual message (chat, DM)

  // TOKEN (2)
  | "token" // Actual token instance
  | "token_contract" // Smart contract

  // KNOWLEDGE (2)
  | "knowledge_item" // Piece of creator knowledge
  | "embedding" // Vector embedding

  // PLATFORM (6)
  | "website" // Auto-generated creator site
  | "landing_page" // Custom landing pages (campaigns, sales)
  | "template" // Design templates (reusable components)
  | "livestream" // Live broadcast (streaming, webinars)
  | "recording" // Saved livestream content
  | "media_asset" // Images, videos, files

  // BUSINESS (7)
  | "payment" // Payment transaction
  | "subscription" // Recurring subscription
  | "invoice" // Invoice record
  | "metric" // Tracked metric
  | "insight" // AI-generated insight
  | "prediction" // AI prediction
  | "report" // Analytics report

  // AUTHENTICATION & SESSION (5)
  | "session" // User session (Better Auth)
  | "oauth_account" // OAuth connection (GitHub, Google)
  | "verification_token" // Email/2FA verification token
  | "password_reset_token" // Password reset token
  | "ui_preferences" // User UI settings (theme, layout)

  // MARKETING (6)
  | "notification" // System notification
  | "email_campaign" // Email marketing campaign
  | "announcement" // Platform announcement
  | "referral" // Referral record
  | "campaign" // Marketing campaign
  | "lead" // Potential customer/lead

  // EXTERNAL INTEGRATIONS (3)
  | "external_agent" // External AI agent (ElizaOS)
  | "external_workflow" // External workflow (n8n, Zapier)
  | "external_connection" // Connection config

  // PROTOCOL ENTITIES (2, protocol-agnostic)
  | "mandate" // Intent/cart (AP2, shopping)
  | "product"; // Sellable product (ACP marketplace)
```

**How Domains Apply These Types:**

- **E-Commerce**: Uses `product` (catalog items), `mandate` (shopping carts), `payment` (transactions), `subscription` (auto-renewals), `membership` (loyalty), `notification` (order updates), `email_campaign` (promotional)
- **Education**: Uses `course` (programs), `lesson` (units), `community` (cohorts), `assignment` (assessments), `conversation`
(discussion boards), `metric` (grades), `report` (transcripts)
- **Creator**: Uses `video` (YouTube/TikTok), `podcast` (episodes), `blog_post` (newsletters), `membership` (tiers), `course` (products), `email_campaign` (outreach), `metric` (engagement), `insight` (analytics)
- **Crypto**: Uses `token` (holdings), `token_contract` (smart contracts), `metric` (TVL/volume), `payment` (transfers), `knowledge_item` (risk profiles), `report` (protocol analysis)

### Thing Structure

```typescript
{
  _id: Id<"things">,
  type: ThingType,
  name: string, // Display name
  groupId: Id<"groups">, // REQUIRED: Multi-tenant isolation
  properties: {
    // Type-specific properties (JSON)
    // For creator:
    email?: string,
    username?: string,
    niche?: string[],
    // For token:
    contractAddress?: string,
    totalSupply?: number,
    // etc...
  },
  status: "active" | "inactive" | "draft" | "published" | "archived",
  createdAt: number,
  updatedAt: number,
  deletedAt?: number
}
```

### Properties by Thing Type

**Creator Properties:**

```typescript
{
  email: string,
  username: string,
  displayName: string,
  bio?: string,
  avatar?: string,
  niche: string[],
  expertise: string[],
  targetAudience: string,
  brandColors?: {
    primary: string,
    secondary: string,
    accent: string
  },
  totalFollowers: number,
  totalContent: number,
  totalRevenue: number,
  // MULTI-TENANT ROLES
  role: "platform_owner" | "group_owner" | "group_user" | "customer",
  groupId?: Id<"groups">, // Current/default group (if group_owner or group_user)
  permissions?: string[], // Additional permissions
}
```

**Organization Properties:**

```typescript
{
  name: string,
  slug: string, // URL-friendly identifier
  domain?: string, // Custom domain (e.g., acme.one.ie)
  logo?: string,
  description?: string,
  status: "active" | "suspended" | "trial" | "cancelled",
  plan: "starter" | "pro" | "enterprise",
  limits: {
    users: number, // Max users allowed
    storage: number, // GB
    apiCalls: number, // Per month
  },
  usage: {
    users: number, // Current users
    storage: number, // GB used
    apiCalls:
    number, // This month
  },
  billing: {
    customerId?: string, // Stripe customer ID
    subscriptionId?: string, // Stripe subscription ID
    currentPeriodEnd?: number,
  },
  settings: {
    allowSignups: boolean,
    requireEmailVerification: boolean,
    enableTwoFactor: boolean,
    allowedDomains?: string[], // Email domain whitelist
  },
  createdAt: number,
  trialEndsAt?: number,
}
```

**AI Clone Properties:**

```typescript
{
  voiceId?: string,
  voiceProvider?: "elevenlabs" | "azure" | "custom",
  appearanceId?: string,
  appearanceProvider?: "d-id" | "heygen" | "custom",
  systemPrompt: string,
  temperature: number,
  knowledgeBaseSize: number,
  lastTrainingDate: number,
  totalInteractions: number,
  satisfactionScore: number
}
```

**Agent Properties:**

```typescript
{
  agentType: "strategy" | "marketing" | "sales" | ...,
  systemPrompt: string,
  model: string,
  temperature: number,
  capabilities: string[],
  tools: string[],
  totalExecutions: number,
  successRate: number,
  averageExecutionTime: number
}
```

**Token Properties:**

```typescript
{
  contractAddress: string,
  blockchain: "base" | "ethereum" | "polygon",
  standard: "ERC20" | "ERC721" | "ERC1155",
  totalSupply: number,
  circulatingSupply: number,
  price: number,
  marketCap: number,
  utility: string[],
  burnRate: number,
  holders: number,
  transactions24h: number,
  volume24h: number
}
```

**Course Properties:**

```typescript
{
  title: string,
  description: string,
  thumbnail?: string,
  modules: number,
  lessons: number,
  totalDuration: number,
  price: number,
  currency: string,
  tokenPrice?: number,
  enrollments: number,
  completions: number,
  averageRating: number,
  generatedBy: "ai" | "human" | "hybrid",
  personalizationLevel: "none" | "basic" | "advanced"
}
```

**Website Properties:**

```typescript
{
  domain: string,
  subdomain: string, // creator.one.ie
  template: "minimal" | "showcase" | "portfolio",
  customCSS?: string,
  customDomain?: string,
  sslEnabled: boolean,
  analytics: {
    visitors30d: number,
    pageViews: number,
    conversionRate: number
  }
}
```

**Livestream Properties:**

```typescript
{
  title: string,
  scheduledAt: number,
  startedAt?: number,
  endedAt?: number,
  platform: "youtube" | "twitch" | "custom",
  streamUrl: string,
  recordingUrl?: string,
  viewersPeak: number,
  viewersAverage: number,
  chatEnabled: boolean,
  aiCloneMixEnabled: boolean, // For human + AI mixing
  status: "scheduled" | "live" | "ended" | "cancelled"
}
```

**Payment Properties (Consolidated):**

```typescript
{
  protocol: "x402" | "acp" | "ap2" | "stripe", // Protocol identifier
  amount: number,
  currency: "usd" | "eur",
  paymentMethod: "stripe" | "crypto",
  stripePaymentIntentId?: string,
  txHash?: string, // Blockchain transaction
  status: "pending" | "completed" | "failed" | "refunded",
  fees: number,
  netAmount: number,
  processedAt?: number,
  // Protocol specifics
  scheme?: "permit", // X402
  network?: "base", // X402/Crypto
  invoiceId?: string // ACP
}
```

**Subscription Properties:**

```typescript
{
  tier: "starter" | "pro" | "enterprise",
  price: number,
  currency: string,
  interval: "monthly" | "yearly",
  status: "active" | "cancelled" | "past_due" | "expired",
  currentPeriodStart: number,
  currentPeriodEnd: number,
  cancelAt?: number,
  stripeSubscriptionId?: string
}
```

**Metric Properti