---
title: Ontology
dimension: knowledge
category: ontology.md
tags: 6-dimensions, ai, architecture, ontology
related_dimensions: connections, events, groups, people, things
scope: global
created: 2025-11-25
updated: 2025-11-25
version: 2.0.0
ai_context: |
  This document is part of the knowledge dimension in the ontology.md category.
  Location: one/knowledge/ontology-v2.md
  Purpose: Documents one platform - ontology specification v2
  Related dimensions: connections, events, groups, people, things
  For AI agents: Read this to understand ontology.
---
# ONE Platform - Ontology Specification V2
**Version:** 2.0.0 (Reality as DSL - The Universal Code Generation Language)
**Status:** Active - Reality-Aware Architecture
**Design Principle:** This isn't just a data model. It's a Domain-Specific Language (DSL) that models reality itself, enabling 98% AI code generation accuracy through compound structure.
---
## Why This Changes Everything
### The Breakthrough: Reality as DSL
**Most developers think databases model their application.**
We flipped this. **The 6-dimension ontology models reality itself**. Applications map to it.
This enables:
- **98% AI code generation accuracy** (not 30-70%)
- **Compound structure** (each feature makes the next MORE accurate, not less)
- **Universal feature import** (clone ANY system into the ontology)
- **Never breaks** (reality doesn't change, technology does)
### What AI Sees
**Traditional Codebase (Pattern Divergence):**
```
Feature 1: createUser(email) ────────┐
Feature 2: addProduct(name) ─────────┼─→ 100 patterns
Feature 3: registerCustomer(data) ───┤ AI confused
Feature 4: insertOrder(items) ───────┤ Accuracy: 30%
...each uses different approach
```
**ONE Codebase (Pattern Convergence):**
```
Feature 1: provider.things.create({ type: "user" }) ────┐
Feature 2: provider.things.create({ type: "product" }) ─┼─→ 1 pattern
Feature 3: provider.things.create({ type: "customer" })─┤ AI masters it
Feature 4: provider.things.create({ type: "order" }) ───┤ Accuracy: 98%
...all use same pattern
```
**The difference:** Traditional codebases teach AI 100 patterns (chaos). ONE teaches AI 1 pattern (mastery).
### Why This Never Breaks
**Reality is stable. Technology changes.**
The 6 dimensions model reality:
1. **Groups** - Containers exist (friend circles → governments)
2. **People** - Actors authorize (who can do what)
3. **Things** - Entities exist (users, products, courses, agents)
4. **Connections** - Relationships relate (owns, purchased, enrolled_in)
5. **Events** - Actions happen (created, updated, purchased)
6. **Knowledge** - Understanding emerges (embeddings, search, RAG)
These dimensions NEVER change because they model reality itself, not any specific technology.
**Examples of systems that map perfectly:**
- **Shopify** → Products (things), Orders (connections + events), Customers (people)
- **Moodle** → Courses (things), Enrollments (connections), Completions (events)
- **Stripe** → Payments (things), Transactions (connections + events), Customers (people)
- **WordPress** → Posts (things), Authors (people), Categories (knowledge labels)
**Every system maps to the same 6 dimensions.** That's why AI agents achieve 98% accuracy.
---
## Structure
This ontology is organized into 6 dimension files:
1. **[organisation.md](./organisation.md)** - Multi-tenant isolation & ownership
2. **[people.md](./people.md)** - Authorization, governance, & user customization
3. **[things.md](./things.md)** - 66 entity types (what exists)
4. **[connections.md](./connections.md)** - 25 relationship types (how they relate)
5. **[events.md](./events.md)** - 67 event types (what happened)
6. **[knowledge.md](./knowledge.md)** - Vectors, embeddings, RAG (what it means)
**Execution Guide:**
7. **[todo.md](./todo.md)** - 100-cycle execution sequence (plan in cycles, not days)
**This document (Ontology.md)** contains the complete technical specification. The consolidated files above provide focused summaries and patterns.
**Planning Paradigm:** We don't plan in days. We plan in **cycle passes** (Cycle 1-100). See [todo.md](./todo.md) for the complete 100-cycle template that guides feature implementation from idea to production.
## The 6-Dimension Reality Model
**This is the universal interface.** Every feature in every system maps to these 6 dimensions.
**Every single thing in ONE platform exists within one of these 6 dimensions:**
```
┌──────────────────────────────────────────────────────────────┐
│ 1. GROUPS │
│ Multi-tenant isolation with hierarchical nesting - who owns │
│ what at group level (friend circles → DAOs → governments) │
└──────────────────────────────────────────────────────────────┘
↓
┌──────────────────────────────────────────────────────────────┐
│ 2. PEOPLE │
│ Authorization & governance - platform owner, group owners │
└──────────────────────────────────────────────────────────────┘
↓
┌──────────────────────────────────────────────────────────────┐
│ 3. THINGS │
│ Every "thing" - users, agents, content, tokens, courses │
└──────────────────────────────────────────────────────────────┘
↓
┌──────────────────────────────────────────────────────────────┐
│ 4. CONNECTIONS │
│ Every relationship - owns, follows, taught_by, powers │
└──────────────────────────────────────────────────────────────┘
↓
┌──────────────────────────────────────────────────────────────┐
│ 5. EVENTS │
│ Every action - purchased, created, viewed, completed │
└──────────────────────────────────────────────────────────────┘
↓
┌──────────────────────────────────────────────────────────────┐
│ 6. KNOWLEDGE │
│ Labels + chunks + vectors powering RAG & search │
└──────────────────────────────────────────────────────────────┘
```
**The Universal Interface (How Technology Implements the Ontology):**
```
┌─────────────────────────────────────────────────────────────────────┐
│ LAYER 1: UNIVERSAL INTERFACE │
│ (The 6-Dimension DSL) │
├─────────────────────────────────────────────────────────────────────┤
│ groups → Hierarchical containers (friend circles → governments)│
│ people → Authorization & governance (who can do what) │
│ things → All entities (66 types: user, product, course...) │
│ connections → All relationships (25 types: owns, purchased...) │
│ events → All actions (67 types: created, updated, logged...) │
│ knowledge → AI understanding (embeddings, search, RAG) │
│ │
│ This layer NEVER changes. It models reality. │
└──────────────────┬──────────────────────────────────────────────────┘
│
↓ Technology changes, ontology stays the same
┌─────────────────────────────────────────────────────────────────────┐
│ TECHNOLOGY ADAPTERS (swap freely) │
│ (Convex, Hono, Astro, React, etc.) │
├─────────────────────────────────────────────────────────────────────┤
│ Backend: Hono API + Convex Database (implements ontology) │
│ Frontend: Astro SSR + React Islands (renders ontology) │
│ Real-time: Convex hooks (live ontology subscriptions) │
│ Static: Astro Content Collections (ontology as files) │
│ │
│ Technology can be swapped. Ontology stays the same. │
└─────────────────────────────────────────────────────────────────────┘
```
### Dimension 1: Groups (Containers)
**Purpose:** Partition the system with hierarchical nesting (friend circles → DAOs → governments)
**Why it never changes:** Containers always contain things. Whether it's a lemonade stand or a global government, the concept of "container" is universal.
**Pattern for AI:**
```typescript
// AI learns: Everything belongs to a group
provider.things.create({ groupId, type, name, properties });
```
**Example mappings:**
- Shopify Store → group (type: business)
- Moodle School → group (type: organization)
- DAO Treasury → group (type: dao)
- Friend Circle → group (type: friend_circle)
### Dimension 2: People (Authorization)
**Purpose:** Define who can do what (actors, roles, permissions)
**Why it never changes:** Authorization is a universal concept. Someone always performs actions.
**Pattern for AI:**
```typescript
// AI learns: Every action has an actor
provider.events.log({ actorId: personId, type, targetId });
```
**Example mappings:**
- Shopify Admin → person (role: group_owner)
- Moodle Student → person (role: customer)
- Platform Owner → person (role: platform_owner)
- Team Member → person (role: group_user)
### Dimension 3: Things (Entities)
**Purpose:** All nouns in the system (66 types, infinitely extensible)
**Why it never changes:** Entities exist. Users, products, courses, agents—these are all "things" with different types.
**Pattern for AI:**
```typescript
// AI learns: One pattern for all entities
provider.things.create({ type: "product" | "course" | "user" | ..., name, properties })
```
**Example mappings:**
- Shopify Product → thing (type: product)
- Moodle Course → thing (type: course)
- Stripe Payment → thing (type: payment)
- WordPress Post → thing (type: blog_post)
**New entity type?** Use a new `type` value and put its custom fields in `properties`. No schema migration needed.
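
A minimal in-memory sketch of the single-pattern idea. `createThingStore` and `listByType` are hypothetical helpers invented for illustration, not part of the ONE provider API; the point is only that a brand-new type goes through the same `create` call as every other entity.

```typescript
// Hypothetical in-memory stand-in for provider.things; names are illustrative only.
interface Thing {
  id: string;
  groupId: string;
  type: string;
  name: string;
  properties: Record<string, unknown>;
  createdAt: number;
}

function createThingStore() {
  const things: Thing[] = [];
  return {
    // One create call for every entity type
    create(input: Omit<Thing, "id" | "createdAt">): Thing {
      const thing: Thing = { ...input, id: `thing_${things.length + 1}`, createdAt: Date.now() };
      things.push(thing);
      return thing;
    },
    listByType(type: string): Thing[] {
      return things.filter((t) => t.type === type);
    },
  };
}

const store = createThingStore();
store.create({ groupId: "g1", type: "product", name: "Yoga Mat", properties: { price: 29 } });
// A type the store has never seen before needs no schema change:
store.create({ groupId: "g1", type: "podcast_episode", name: "Episode 1", properties: { minutes: 42 } });
```

Because custom fields live in `properties`, the store itself never has to know which types exist.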
### Dimension 4: Connections (Relationships)
**Purpose:** How entities relate to each other (25 types + metadata)
**Why it never changes:** Relationships are universal. Things connect to other things.
**Pattern for AI:**
```typescript
// AI learns: One pattern for all relationships
provider.connections.create({
  fromThingId,
  toThingId,
  relationshipType,
  metadata,
});
```
**Example mappings:**
- Shopify Order → connection (type: purchased) + event (type: order_placed)
- Moodle Enrollment → connection (type: enrolled_in)
- GitHub Follows → connection (type: following)
- Token Holdings → connection (type: holds_tokens, metadata: { balance })
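
The mappings above can be sketched as edges in a small graph. This is an illustration over an in-memory array, assuming hypothetical helpers `targetsOf` and `sourcesOf`; only the field names (`fromThingId`, `toThingId`, `relationshipType`, `metadata`) come from the pattern above.

```typescript
// Hypothetical in-memory connection list; field names follow the pattern above.
interface Connection {
  fromThingId: string;
  toThingId: string;
  relationshipType: string;
  metadata?: Record<string, unknown>;
}

const connections: Connection[] = [
  { fromThingId: "alice", toThingId: "course_react", relationshipType: "enrolled_in" },
  { fromThingId: "alice", toThingId: "token_one", relationshipType: "holds_tokens", metadata: { balance: 250 } },
  { fromThingId: "bob", toThingId: "course_react", relationshipType: "enrolled_in" },
];

// Outgoing edges: what does this thing relate to via a given relationship?
function targetsOf(fromThingId: string, relationshipType: string): string[] {
  return connections
    .filter((c) => c.fromThingId === fromThingId && c.relationshipType === relationshipType)
    .map((c) => c.toThingId);
}

// Incoming edges: which things relate to this one via a given relationship?
function sourcesOf(toThingId: string, relationshipType: string): string[] {
  return connections
    .filter((c) => c.toThingId === toThingId && c.relationshipType === relationshipType)
    .map((c) => c.fromThingId);
}
```

Both directions use the same record shape, which is why a single `create` pattern can cover every relationship type.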
### Dimension 5: Events (Actions)
**Purpose:** Complete audit trail of what happened when (67 types + metadata)
**Why it never changes:** Actions happen at specific times. This is universal.
**Pattern for AI:**
```typescript
// AI learns: All actions are logged the same way
provider.events.log({ type, actorId, targetId, timestamp, metadata });
```
**Example mappings:**
- Shopify Checkout → event (type: payment_processed)
- Moodle Lesson View → event (type: content_viewed)
- User Login → event (type: user_login)
- Token Purchase → event (type: tokens_purchased)
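
As a rough sketch of the audit-trail idea, the snippet below appends events to an in-memory log and reads back the history of one target. `appendEvent` and `historyOf` are hypothetical names chosen for this example; a real deployment would go through `provider.events.log`.

```typescript
// Hypothetical in-memory event log; field names follow the pattern above.
interface LoggedEvent {
  type: string;
  actorId: string;
  targetId: string;
  timestamp: number;
  metadata?: Record<string, unknown>;
}

const eventLog: LoggedEvent[] = [];

// Events are only ever appended, never edited
function appendEvent(event: LoggedEvent): void {
  eventLog.push(event);
}

// Audit trail for one target, oldest first
function historyOf(targetId: string): LoggedEvent[] {
  return eventLog
    .filter((e) => e.targetId === targetId)
    .sort((a, b) => a.timestamp - b.timestamp);
}

appendEvent({ type: "payment_processed", actorId: "alice", targetId: "order_1", timestamp: 2000 });
appendEvent({ type: "content_viewed", actorId: "bob", targetId: "lesson_1", timestamp: 1500 });
appendEvent({ type: "order_placed", actorId: "alice", targetId: "order_1", timestamp: 1000 });
```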
### Dimension 6: Knowledge (Understanding)
**Purpose:** Labels, embeddings, and semantic search for AI
**Why it never changes:** Categorization and understanding are universal concepts.
**Pattern for AI:**
```typescript
// AI learns: Knowledge is linked to things
provider.knowledge.create({
  sourceThingId,
  knowledgeType: "label" | "chunk",
  text,
  embedding,
});
```
**Example mappings:**
- WordPress Categories → knowledge (type: label)
- Course Content → knowledge (type: chunk, embedding: [...])
- Product Tags → knowledge (type: label)
- Semantic Search → knowledge vector search
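
To make "knowledge vector search" concrete, here is a brute-force sketch using cosine similarity over toy vectors. It assumes nothing about the platform's actual vector index; `searchKnowledge` and the 3-dimensional embeddings are made up for illustration.

```typescript
interface KnowledgeChunk {
  text: string;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|)
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vectors have no direction
  return dot / Math.sqrt(normA * normB);
}

// Brute-force top-K search; a production system would use an ANN index instead
function searchKnowledge(query: number[], chunks: KnowledgeChunk[], topK: number): KnowledgeChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((x) => x.chunk);
}

// Toy 3-dimensional vectors; real embeddings have hundreds or thousands of dimensions.
const toyChunks: KnowledgeChunk[] = [
  { text: "React hooks for beginners", embedding: [0.9, 0.1, 0.0] },
  { text: "Blue wireless headphones", embedding: [0.0, 0.2, 0.9] },
];
```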
---
**Golden Rule:** If you can't map your feature to these 6 dimensions, you're thinking about it wrong.
**For AI Agents:** This ontology is your universal language. Learn these 6 patterns and you can generate ANY feature with 98% accuracy.
---
## Compound Structure Accuracy: Why AI Gets Better Over Time
### Traditional AI Code Generation (Pattern Divergence)
**The death spiral:**
```
Generation 1: Clean code → 95% accurate
Generation 2: Slight drift → 90% accurate (-5% - patterns starting to diverge)
Generation 3: Pattern divergence → 80% accurate (-10% - AI sees multiple patterns)
Generation 4: Inconsistency → 65% accurate (-15% - AI confused by variations)
Generation N: Unmaintainable mess → 30% accurate (-20% - complete chaos)
```
**Why?** Each feature introduces NEW patterns. AI has to guess which pattern to use.
### ONE's AI Code Generation (Pattern Convergence)
**The virtuous cycle:**
```
Generation 1: Maps to ontology → 85% accurate (learning the ontology)
Generation 2: Follows patterns → 90% accurate (+5% - recognizing service pattern)
Generation 3: Reuses services → 93% accurate (+3% - composing existing services)
Generation 4: Predictable structure → 96% accurate (+3% - mastering Effect.ts patterns)
Generation N: Perfect consistency → 98%+ accurate (+2% - generalized patterns)
```
**Why?** Each feature uses the SAME patterns. AI masters the ontology, then reuses it.
### How Agents Learn the Ontology
**Generation 1-5: Learning (85% accurate)**
```
Agent thinks:
"I see things being created with `type` field. Let me check..."
"All entities are things? Even users, products, courses?"
"Connections link things together. Got it."
"Events log actions. People are actors."
"Pattern emerging: 6 dimensions for everything."
CONFIDENCE: 85% (still learning)
```
**Generation 6-20: Recognizing (90% accurate)**
```
Agent thinks:
"Every create operation uses provider.things.create. 100% of the time."
"Every service uses Effect.gen. Pattern confirmed."
"All errors are tagged unions. I can handle exhaustively."
"Dependencies are injected. I know what's needed."
CONFIDENCE: 90% (pattern recognized)
```
**Generation 21-50: Composing (93% accurate)**
```
Agent thinks:
"I can reuse existing services:"
"- createThing service exists"
"- createConnection service exists"
"- logEvent service exists"
"My new feature: compose these three services."
"No new patterns needed. Just composition."
CONFIDENCE: 93% (composing existing patterns)
```
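The composition step can be sketched in plain TypeScript. The three functions below are hypothetical in-memory stand-ins for the `createThing`, `createConnection`, and `logEvent` services named above (the real services are described as Effect.ts services, which this sketch does not attempt to reproduce), and `publishCourse` is an invented feature.

```typescript
// Hypothetical in-memory stand-ins for the three existing services.
type Row = Record<string, unknown>;
const db = { things: [] as Row[], connections: [] as Row[], events: [] as Row[] };

function createThing(type: string, name: string): string {
  const id = `thing_${db.things.length + 1}`;
  db.things.push({ id, type, name });
  return id;
}

function createConnection(fromThingId: string, toThingId: string, relationshipType: string): void {
  db.connections.push({ fromThingId, toThingId, relationshipType });
}

function logEvent(type: string, actorId: string, targetId: string): void {
  db.events.push({ type, actorId, targetId, timestamp: Date.now() });
}

// New feature = composition only: no new pattern is introduced.
function publishCourse(creatorId: string, courseName: string): string {
  const courseId = createThing("course", courseName);
  createConnection(creatorId, courseId, "owns");
  logEvent("created", creatorId, courseId);
  return courseId;
}
```

The feature function contains no storage logic of its own, so every new feature written this way looks structurally identical.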
**Generation 51-100: Mastering (96% accurate)**
```
Agent thinks:
"I've generated 50 services. All follow same structure:"
"1. Get provider from context"
"2. Call provider operation"
"3. Handle typed errors"
"4. Return typed result"
"Pattern is UNIVERSAL. Works for ANY feature."
CONFIDENCE: 96% (mastered the structure)
```
**Generation 100+: Generalizing (98%+ accurate)**
```
Agent thinks:
"I don't even need to think. The pattern IS the system."
"New feature request? Map to 6 dimensions."
"Need validation? Effect.ts service."
"Need data? Provider interface."
"Need state? Nanostores."
"Every decision is deterministic."
CONFIDENCE: 98%+ (system internalized)
```
### What This Means for Development
**Feature #1:**
- Traditional: 8 hours (70% AI, 30% human)
- ONE: 8 hours (70% AI, 30% human)
- **No difference yet**
**Feature #10:**
- Traditional: 10 hours (60% AI, 40% human - patterns diverging)
- ONE: 6 hours (85% AI, 15% human - patterns converging)
- **ONE is 1.7x faster**
**Feature #50:**
- Traditional: 16 hours (40% AI, 60% human - technical debt)
- ONE: 3 hours (95% AI, 5% human - pattern mastery)
- **ONE is 5.3x faster**
**Feature #100:**
- Traditional: 24 hours (25% AI, 75% human - chaos)
- ONE: 1.5 hours (98% AI, 2% human - generalized)
- **ONE is 16x faster**
**Cumulative for 100 features:**
- Traditional: 1,400 hours
- ONE: 350 hours
- **ONE is 4x faster overall**
- **And the gap keeps growing**
### Why Schema Migrations Never Break This
**New entity type?**
```typescript
// NO schema migration needed
{ type: "new_thing", name: "...", properties: { ...custom } }
```
**New field on existing type?**
```typescript
// NO schema migration needed
{ type: "product", properties: { price, SKU, newField: "value" } }
```
**New relationship?**
```typescript
// NO schema migration needed
{ relationshipType: "new_connection", metadata: { ...custom } }
```
**New protocol integration?**
```typescript
// NO schema migration needed
{
  relationshipType: "transacted",
  metadata: { protocol: "new_protocol", ...custom }
}
```
**Result:** Technology changes (React → Svelte, REST → GraphQL), but the ontology stays the same forever.
---
## GROUPS: The Isolation Boundary with Hierarchical Nesting
Purpose: Partition the system with perfect isolation and support nested groups (groups within groups) - from friend circles to DAOs to governments. Every group owns its own graph of things, connections, events, and knowledge.
### Group Structure
```typescript
{
  _id: Id<'groups'>,
  slug: string, // REQUIRED: URL identifier (/group/slug)
  name: string, // REQUIRED: Display name
  type: 'friend_circle' | 'business' | 'community' | 'dao' | 'government' | 'organization',
  parentGroupId?: Id<'groups'>, // OPTIONAL: Parent group for hierarchical nesting
  description?: string, // OPTIONAL: About text
  metadata: Record<string, any>,
  settings: {
    visibility: 'public' | 'private',
    joinPolicy: 'open' | 'invite_only' | 'approval_required',
    plan: 'starter' | 'pro' | 'enterprise',
    limits: {
      users: number,
      storage: number, // GB
      apiCalls: number,
    }
  },
  status: 'active' | 'archived',
  createdAt: number,
  updatedAt: number,
}
```
### Common Fields by Use Case
**Identity:** `[slug, name]` - Who they are + URL
**Web:** `[slug, name, description]` - Website generation
**Operations:** `[status, type, settings, parentGroupId]` - System management
### Why Groups Matter
1. **Multi-Tenant Isolation:** Each group's data is completely separate
2. **Hierarchical Nesting:** Groups can contain sub-groups for complex organizations (parent → child → grandchild...)
3. **Flexible Types:** From friend circles (2 people) to businesses to DAOs to governments (billions)
4. **Resource Quotas:** Control costs and usage per group
5. **Privacy Control:** Groups can be public or private with controlled access
6. **Flexible Scale:** Scales from friend circles to global governments without schema changes
### Hierarchical Group Examples by Domain
**E-Commerce (Retail Chain):**
```
Corporate Headquarters (group)
├─ North American Division (child group)
│ ├─ New York Store (grandchild group)
│ └─ California Store (grandchild group)
└─ European Division (child group)
  ├─ London Store (grandchild group)
  └─ Paris Store (grandchild group)
```
**Education (University System):**
```
MIT (group)
├─ School of Engineering (child group)
│ ├─ Computer Science Dept (grandchild group)
│ ├─ Electrical Engineering Dept (grandchild group)
│ └─ Mechanical Engineering Dept (grandchild group)
├─ School of Science (child group)
│ ├─ Mathematics Dept (grandchild group)
│ └─ Physics Dept (grandchild group)
└─ School of Business (child group)
```
**Creator (Multi-Channel Brand):**
```
Creator Brand (group)
├─ YouTube Channel (child group)
│ └─ Content Series 1 (grandchild group)
├─ Podcast (child group)
│ └─ Season 2 (grandchild group)
└─ Community (child group - Discord server with channels)
```
**Crypto (DAO Treasury):**
```
DAO Treasury (group)
├─ Core Operations (child group)
│ ├─ Development Fund (grandchild group)
│ └─ Marketing Fund (grandchild group)
├─ Investment Committee (child group)
│ └─ Venture Capital Allocation (grandchild group)
└─ Community Grants (child group)
```
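The trees above are all expressed through the single `parentGroupId` field. As a sketch of how such a hierarchy might be traversed, the helpers below (`ancestorsOf` and `descendantsOf`, both hypothetical) walk an in-memory list of groups; string IDs stand in for `Id<'groups'>`.

```typescript
interface GroupNode {
  id: string;
  name: string;
  parentGroupId?: string;
}

const groups: GroupNode[] = [
  { id: "hq", name: "Corporate Headquarters" },
  { id: "na", name: "North American Division", parentGroupId: "hq" },
  { id: "ny", name: "New York Store", parentGroupId: "na" },
  { id: "eu", name: "European Division", parentGroupId: "hq" },
];

// Walk parentGroupId links upward: nearest parent first, root last.
function ancestorsOf(groupId: string): string[] {
  const byId = new Map(groups.map((g) => [g.id, g] as const));
  const path: string[] = [];
  const seen = new Set<string>([groupId]); // guards against accidental cycles
  let parentId = byId.get(groupId)?.parentGroupId;
  while (parentId !== undefined && !seen.has(parentId)) {
    path.push(parentId);
    seen.add(parentId);
    parentId = byId.get(parentId)?.parentGroupId;
  }
  return path;
}

// Collect every nested sub-group (children, grandchildren, ...), depth-first.
function descendantsOf(groupId: string): string[] {
  const children = groups.filter((g) => g.parentGroupId === groupId).map((g) => g.id);
  return children.flatMap((id) => [id, ...descendantsOf(id)]);
}
```

`ancestorsOf` is the lookup a permission check would need (does an owner of a parent group reach this store?), while `descendantsOf` supports roll-up reporting across a division.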
---
### System Group Pattern (Global Entities)
**Problem:** Some entities are truly global and don't belong to any user group.
**Examples:**
- Platform-wide settings
- System notifications
- Global rate limits
- Reference data (timezones, currencies, countries)
- Platform-level analytics
**Solution:** Reserve a special "system" group.
```typescript
// Create system group on platform initialization.
// Convex assigns `_id` on insert, so capture the generated ID instead of setting one.
const SYSTEM_GROUP_ID = await ctx.db.insert('groups', {
  slug: 'system', // Reserved slug identifies the system group
  name: 'System',
  type: 'organization',
  metadata: {},
  settings: {
    visibility: 'private',
    joinPolicy: 'invite_only',
    plan: 'enterprise',
    limits: { users: Infinity, storage: Infinity, apiCalls: Infinity }
  },
  status: 'active',
  createdAt: Date.now(),
  updatedAt: Date.now(),
});
// Use for global entities
await ctx.db.insert('things', {
  type: 'platform_setting',
  name: 'Global Rate Limit',
  groupId: SYSTEM_GROUP_ID, // System group
  properties: {
    maxRequestsPerMinute: 1000,
    scope: 'global'
  },
  status: 'active',
  createdAt: Date.now(),
});
```
**Rules:**
- System group ID is reserved and cannot be deleted
- Only platform owners can create things in system group
- System group has no resource limits
- System entities are visible to all groups (read-only)
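
The rules above could be enforced with two small guards. This is only a sketch: `Actor`, `canReadThing`, and `canCreateInSystemGroup` are hypothetical names, and the string `"system"` is a placeholder for whatever identifier the reserved group actually has.

```typescript
type Role = "platform_owner" | "group_owner" | "group_user" | "customer";

interface Actor {
  role: Role;
  groups: string[]; // IDs of the groups this actor belongs to
}

const SYSTEM_GROUP = "system"; // placeholder identifier for the reserved group

// System entities are readable by everyone; other things only by members
// of the owning group (platform owners can access all groups).
function canReadThing(actor: Actor, thingGroupId: string): boolean {
  if (thingGroupId === SYSTEM_GROUP) return true;
  return actor.role === "platform_owner" || actor.groups.includes(thingGroupId);
}

// Only platform owners may create things in the system group.
function canCreateInSystemGroup(actor: Actor): boolean {
  return actor.role === "platform_owner";
}
```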
---
## PEOPLE: Authorization & Governance
Purpose: Define who can do what. People direct groups, customize AI agents, and govern access.
### Person Structure
```typescript
{
  _id: Id<'people'>,
  email: string,
  username: string,
  displayName: string,
  // CRITICAL: Role determines access level
  role: 'platform_owner' | 'group_owner' | 'group_user' | 'customer',
  // Group context
  groupId?: Id<'groups'>, // Current/default group
  permissions?: string[],
  // Profile
  bio?: string,
  avatar?: string,
  // Multi-tenant tracking
  groups: Id<'groups'>[], // All groups this person belongs to
  createdAt: number,
  updatedAt: number,
}
```
### Four Roles
1. **Platform Owner** (Anthony)
- Owns the ONE Platform
- 100% revenue from platform-level services
- Can access all groups (support/debugging)
- Creates new groups
2. **Group Owner**
- Owns/manages one or more groups
- Controls users, permissions, billing within group
- Customizes AI agents and frontend
- Revenue sharing with platform
3. **Group User**
- Works within a group
- Limited permissions (defined by group owner)
- Can create content, run agents (within quotas)
4. **Customer**
- External user consuming content
- Purchases tokens, enrolls in courses
- No admin access
### Why People Matter
1. **Authorization:** Every action must have an actor (person)
2. **Governance:** Group owners control who can do what
3. **Audit Trail:** Events log who did what when
4. **Customization:** People teach AI agents their preferences
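
One way the four roles might translate into an authorization check is sketched below. The `authorize` function and the permission string `"content:create"` are assumptions made for illustration; the source defines the roles and the `permissions` array but not how they are evaluated.

```typescript
type PersonRole = "platform_owner" | "group_owner" | "group_user" | "customer";

interface PersonRecord {
  role: PersonRole;
  groups: string[]; // all groups this person belongs to
  permissions?: string[]; // granted by the group owner
}

// Hypothetical check: may `person` perform `permission` inside `groupId`?
function authorize(person: PersonRecord, groupId: string, permission: string): boolean {
  if (person.role === "platform_owner") return true; // can access all groups
  if (!person.groups.includes(groupId)) return false; // group isolation
  if (person.role === "group_owner") return true; // controls everything in own group
  if (person.role === "group_user") {
    return (person.permissions ?? []).includes(permission); // limited, owner-defined
  }
  return false; // customers have no admin access
}
```

Checking group membership before role keeps a group owner from acting inside a group they do not belong to.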
---
## KNOWLEDGE: Labels, Chunks, and Vectors (RAG)
Purpose: unify taxonomy (“tags”) and retrieval‑augmented generation (RAG) under one table. A knowledge item can be a label (former tag), a document wrapper, or a chunk with an embedding.
Design principles:
- Protocol‑agnostic: store protocol details in `metadata`.
- Many‑to‑many: link knowledge ⇄ things via `thingKnowledge` with optional context metadata.
- Scalable: consolidated types minimize index fan‑out; embeddings enable semantic search.
### Knowledge Types
```typescript
type KnowledgeType =
| "label" // replaces legacy "tag"; lightweight categorical marker
| "document" // wrapper for a source text/blob (pre-chunking)
| "chunk" // atomic chunk of text with embedding
| "vector_only"; // embedding without stored text (e.g., privacy)
```
### Knowledge Structure
```typescript
{
  _id: Id<'knowledge'>,
  knowledgeType: KnowledgeType,
  // Textual content (optional for label/vector_only)
  text?: string,
  // Embedding for semantic search (optional for label/document)
  embedding?: number[], // Float32 vector; model-dependent dimension
  embeddingModel?: string, // e.g., "text-embedding-3-large"
  embeddingDim?: number,
  // Source linkage
  sourceThingId?: Id<'things'>, // Primary source entity
  sourceField?: string, // e.g., 'content', 'transcript', 'title'
  chunk?: { index: number; start?: number; end?: number; tokenCount?: number; overlap?: number },
  // Lightweight categorization (free-form)
  labels?: string[], // Replaces per-thing tags; applied to knowledge
  // Additional metadata (protocol, language, mime, hash, version)
  metadata?: Record<string, any>,
  createdAt: number,
  updatedAt: number,
  deletedAt?: number,
}
```
### Junction: thingKnowledge
```typescript
{
  _id: Id<'thingKnowledge'>,
  thingId: Id<'things'>,
  knowledgeId: Id<'knowledge'>,
  role?: 'label' | 'summary' | 'chunk_of' | 'caption' | 'keyword',
  // Context for the link (e.g., confidence, section name)
  metadata?: Record<string, any>,
  createdAt: number,
}
```
### Indexes (recommended)
- `knowledge.by_type` (knowledgeType)
- `knowledge.by_source` (sourceThingId)
- `knowledge.by_created` (createdAt)
- `thingKnowledge.by_thing` (thingId)
- `thingKnowledge.by_knowledge` (knowledgeId)
- Vector index (provider-dependent): `knowledge.by_embedding` for ANN search
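
The two `thingKnowledge` indexes correspond to the two directions of the many-to-many link. The sketch below shows both lookups over in-memory arrays; `labelsOfThing` and `thingsWithLabel` are hypothetical helpers, and string IDs stand in for Convex `Id` values.

```typescript
interface KnowledgeItem {
  id: string;
  knowledgeType: "label" | "document" | "chunk" | "vector_only";
  text?: string;
}

interface ThingKnowledgeLink {
  thingId: string;
  knowledgeId: string;
  role?: string;
}

const knowledgeItems: KnowledgeItem[] = [
  { id: "k1", knowledgeType: "label", text: "topic:react" },
  { id: "k2", knowledgeType: "label", text: "difficulty:beginner" },
];

const links: ThingKnowledgeLink[] = [
  { thingId: "video_1", knowledgeId: "k1", role: "label" },
  { thingId: "video_1", knowledgeId: "k2", role: "label" },
  { thingId: "course_1", knowledgeId: "k1", role: "label" },
];

// by_thing direction: every label attached to one thing
function labelsOfThing(thingId: string): string[] {
  const ids = new Set(links.filter((l) => l.thingId === thingId).map((l) => l.knowledgeId));
  return knowledgeItems
    .filter((k) => k.knowledgeType === "label" && ids.has(k.id))
    .map((k) => k.text ?? "");
}

// by_knowledge direction: every thing that carries one label
function thingsWithLabel(labelText: string): string[] {
  const label = knowledgeItems.find((k) => k.knowledgeType === "label" && k.text === labelText);
  if (!label) return [];
  return links.filter((l) => l.knowledgeId === label.id).map((l) => l.thingId);
}
```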
### How Domains Apply Knowledge
**Education - Learning Objectives & Study Materials:**
```typescript
// Knowledge: Learning objective chunk
{
knowledgeType: 'chunk',
text: 'Students should be able to solve quadratic equations',
sourceThingId: courseId,
labels: ['subject:mathematics', 'grade:9-12', 'objective:apply', 'skill:algebra']
}
// Link: Course references this learning objective
{
thingId: courseId,
knowledgeId: knowledgeId,
role: 'learning_objective'
}
```
**Creator - Content SEO & Discovery:**
```typescript
// Knowledge: Video description chunk with embedded metadata
{
knowledgeType: 'chunk',
text: 'This video teaches React hooks for beginners...',
sourceThingId: videoId,
embedding: [0.1, 0.2, ...],
labels: ['topic:react', 'difficulty:beginner', 'platform:youtube', 'series:javascript101']
}
```
**E-Commerce - Product Categorization & Search:**
```typescript
// Knowledge: Product description for semantic search
{
knowledgeType: 'document',
text: 'Blue wireless headphones with 40-hour battery life',
sourceThingId: productId,
embedding: [0.5, 0.3, ...],
labels: ['category:electronics', 'color:blue', 'feature:wireless', 'price_range:premium']
}
```
**Crypto - Risk Analysis & Token Intelligence:**
```typescript
// Knowledge: Token risk assessment
{
knowledgeType: 'chunk',
text: 'Token has no minting restrictions, moderate holder concentration',
sourceThingId: tokenId,
labels: ['risk:medium', 'metric:tvl_trend_up', 'audit:completed', 'governance:none']
}
// Knowledge: Protocol dependency analysis
{
knowledgeType: 'label',
text: 'Depends on Chainlink oracle',
sourceThingId: protocolId,
labels: ['dependency:critical', 'type:oracle', 'risk_factor:oracle']
}
```
### RAG Ingestion Strategy
Objective: Attach vectors to **relevant** content for high-quality retrieval while controlling costs and maintaining performance.
**CRITICAL:** Not every field needs RAG. Be selective. Embeddings are expensive in storage, compute, and money.
---
#### What to Embed (Decision Matrix by Domain)
**Universal Rule:**
```
IF "user will semantically search this" → EMBED
IF "user will filter/sort this" → DON'T EMBED
IF "structured data" → DON'T EMBED
```
| Content Type | Embed? | Domain Example | Use Case |
|--------------|--------|----------------|----------|
| **Long-form Content** | | | |
| Blog post content | ✅ YES | Creator | "Find posts about React hooks" |
| Course lesson content | ✅ YES | E-Learning | "Search lessons on form validation" |
| Video/podcast transcripts | ✅ YES | E-Learning, Creator | Makes A/V content searchable |
| Email campaign body | ✅ YES | Creator, E-Commerce | Content discovery |
| **Product Content** | | | |
| Product descriptions | ✅ YES | E-Commerce | "Find eco-friendly water bottles" |
| Product specs (JSON) | ❌ NO | E-Commerce | Use filters: `size === 'L'` |
| Customer reviews | ✅ YES | E-Commerce | "What do people say about durability?" |
| Q&A responses | ✅ YES | E-Commerce | Customer support knowledge base |
| Prices, SKUs, inventory | ❌ NO | E-Commerce | Exact match: `price < 50` |
| **Social Content** | | | |
| Social post text (>100 chars) | ✅ YES | Social | "Find my AI posts with high engagement" |
| Social post text (<100 chars) | ❌ NO | Social | Too short, use labels |
| Thread content (combined) | ✅ YES | Social | Combine into single chunk |
| Hashtags | ❌ NO | Social | Exact match, not semantic |
| Comments (>50 words) | ⚠️ MAYBE | Social | Only for community insights |
| **Image Generation** | | | |
| Image prompts | ✅ YES | Image Gen | "Find cyberpunk city prompts" |
| Prompt descriptions | ✅ YES | Image Gen | Style discovery |
| Negative prompts | ✅ YES | Image Gen | "Avoid common mistakes" |
| Generation params | ❌ NO | Image Gen | Use filters: `steps === 50` |
| Image pixels | ❌ NO | Image Gen | Use CLIP embeddings separately |
| **Educational Content** | | | |
| Course descriptions | ✅ YES | E-Learning | Discovery + recommendations |
| Lesson summaries | ✅ YES | E-Learning | "React hooks for beginners" |
| Student notes | ✅ YES | E-Learning | Personal knowledge base |
| Quiz questions | ⚠️ MAYBE | E-Learning | Only for study guides |
| Progress data | ❌ NO | E-Learning | Use analytics: `progress >= 0.5` |
| Certificates | ❌ NO | E-Learning | Metadata only |
| **Metadata** | | | |
| Titles, summaries | ✅ YES | All | High signal-to-noise |
| Descriptions (>50 words) | ✅ YES | All | Context for search |
| Tags, categories | ❌ NO | All | Use `labels` instead (free) |
| **User-Generated** | | | |
| Bios, profiles | ⚠️ MAYBE | Social | Only for people search |
| **System Data** | | | |
| Logs, errors | ❌ NO | All | Use log aggregation tools |
| Metrics, analytics | ❌ NO | All | Use time-series DB |
| Audit trails | ❌ NO | All | Events table is sufficient |
**Domain-Specific Examples:**
**E-Commerce:**
```typescript
// ✅ EMBED: Product discovery
"Find sustainable yoga mats" → Semantic search on descriptions
// ❌ DON'T EMBED: Filtering
"Show mats under $30" → Filter: price < 30
"In stock only" → Filter: inventory > 0
```
**E-Learning:**
```typescript
// ✅ EMBED: Course/lesson discovery
"Learn React hooks for beginners" → Semantic search on course descriptions + lesson transcripts
// ❌ DON'T EMBED: Progress tracking
"Show my completed courses" → Filter: connections where completed = true
```
**Image Generation:**
```typescript
// ✅ EMBED: Prompt library
"Cyberpunk city at night" → Semantic search on successful prompts
// ❌ DON'T EMBED: Generation settings
"Images with CFG 7.5" → Filter: metadata.cfg === 7.5
```
**Social Posting:**
```typescript
// ✅ EMBED: Content inspiration
"My posts about AI with high engagement" → Semantic search on post text
// ❌ DON'T EMBED: Engagement metrics
"Posts with >1000 likes" → Filter: engagement.likes > 1000
```
**Cost Reality Check (10K items):**
| Domain | What to Embed | Monthly Cost |
|--------|---------------|--------------|
| E-Commerce | Product descriptions | ~$1.30 |
| E-Learning | Lesson transcripts | ~$13 (longer content) |
| Image Gen | Prompts + descriptions | ~$0.50 |
| Social | Long posts only | ~$0.80 |
**Key Insight:** Be ruthlessly selective. Only embed content users will **semantically search**, not data they'll **filter or sort**.
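The universal rule can be written as a small predicate. This is a sketch under assumptions: `FieldProfile` and `shouldEmbed` are hypothetical, and the optional `minChars` threshold generalizes the ">100 chars" cutoff that the decision matrix applies to social posts.

```typescript
type AccessPattern = "semantic_search" | "filter_or_sort" | "structured";

interface FieldProfile {
  access: AccessPattern; // how users will query this field
  text?: string; // the content itself, if textual
  minChars?: number; // optional length floor, e.g. 100 for social posts
}

// Embed only text that users search semantically and that is long enough to carry meaning.
function shouldEmbed(field: FieldProfile): boolean {
  if (field.access !== "semantic_search") return false;
  const text = field.text ?? "";
  return text.length > (field.minChars ?? 0);
}
```

For example, a product description passes, a price never does, and a short social post falls below its length floor and is handled with labels instead.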
---
#### When to Update Embeddings
**Trigger:** Content changes in source thing.
```typescript
// On content update
export const updateBlogPost = mutation({
  handler: async (ctx, { postId, content }) => {
    // 0. Load the post (needed for groupId and to preserve other properties)
    const post = await ctx.db.get(postId);
    if (!post) throw new Error('Post not found');
    // 1. Update the thing (patch replaces `properties` wholesale, so merge first)
    await ctx.db.patch(postId, {
      properties: { ...post.properties, content },
      updatedAt: Date.now(),
    });
    // 2. Schedule re-embedding after a 5-second delay
    await ctx.scheduler.runAfter(5000, internal.knowledge.reEmbedThing, {
      thingId: postId,
      fields: ['content'], // Only re-embed changed fields
    });
    // 3. Log event
    await ctx.db.insert('events', {
      type: 'content_event',
      actorId: ctx.auth.userId!,
      targetId: postId,
      groupId: post.groupId,
      timestamp: Date.now(),
      metadata: { action: 'updated', triggeredReEmbedding: true },
    });
  },
});
```
**Re-embedding Strategy:**
| Change Type | Action | Why |
|-------------|--------|-----|
| Content edited | Re-embed immediately | Content changed |
| Title/summary edited | Re-embed immediately | High-signal metadata |
| Tags/labels changed | Update labels only | No embedding needed |
| Status changed (draft→published) | Re-embed if first publish | Visibility changed |
| Minor typo fix | Debounce 5 seconds | Avoid re-embedding every keystroke |
| Bulk import | Batch embed (100/batch) | Rate limiting |
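
For the bulk-import row, batching is a matter of slicing the input into groups of 100 before calling the embedding provider. `toBatches` below is a hypothetical helper; how each batch is then sent and rate-limited is left out.

```typescript
// Split items into consecutive batches of at most `batchSize` elements.
function toBatches<T>(items: T[], batchSize = 100): T[][] {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new Error("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}
```

Usage: `toBatches(texts)` turns 250 imported texts into three batches of 100, 100, and 50.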
**Cost Optimization:**
```typescript
// Hash content to detect actual changes
import { createHash } from 'crypto';
export const reEmbedThing = internalMutation({
  handler: async (ctx, { thingId, fields }) => {
    const thing = await ctx.db.get(thingId);
    if (!thing) return; // Thing was deleted before the scheduled job ran
    const content = fields.map(f => thing.properties[f]).join('\n');
    // Hash current content
    const contentHash = createHash('sha256').update(content).digest('hex');
    // Check if content actually changed (query the index range directly)
    const existingKnowledge = await ctx.db
      .query('knowledge')
      .withIndex('by_source', q => q.eq('sourceThingId', thingId))
      .first();
    if (existingKnowledge?.metadata?.contentHash === contentHash) {
      console.log('Content unchanged, skipping re-embedding');
      return; // Save $$$ by skipping
    }
    // Content changed, re-embed
    await embedAndStore(ctx, thing, content, contentHash);
  },
});
```
---
#### Chunking Standard
**Window:** ~800 tokens (~3,200 characters)
**Overlap:** ~200 tokens (~800 characters)
**Boundaries:** Sentence-aware (don't split mid-sentence)
```typescript
export async function chunkText(text: string): Promise<Chunk[]> {
  const chunks: Chunk[] = [];
  // Split after sentence-ending punctuation so each sentence keeps its punctuation
  const sentences = text.split(/(?<=[.!?])\s+/);
  let currentChunk = '';
  let currentTokens = 0;
  let chunkStart = 0; // Approximate character offset of the current chunk in `text`
  // Push the current chunk and return the character offset where it ends
  const saveChunk = (): number => {
    const chunkBody = currentChunk.trim();
    const end = chunkStart + chunkBody.length;
    chunks.push({
      index: chunks.length,
      text: chunkBody,
      tokenCount: currentTokens,
      start: chunkStart,
      end,
    });
    return end;
  };
  for (const sentence of sentences) {
    const sentenceTokens = estimateTokens(sentence);
    if (currentTokens + sentenceTokens > 800 && currentChunk.length > 0) {
      const end = saveChunk();
      // Start new chunk with overlap (last ~200 tokens of the previous chunk)
      const overlapText = getLastNTokens(currentChunk, 200);
      chunkStart = Math.max(0, end - overlapText.length);
      currentChunk = overlapText + ' ' + sentence;
      currentTokens = estimateTokens(overlapText) + sentenceTokens;
    } else {
      currentChunk = currentChunk ? currentChunk + ' ' + sentence : sentence;
      currentTokens += sentenceTokens;
    }
  }
  // Save final chunk
  if (currentChunk.length > 0) {
    saveChunk();
  }
  return chunks;
}
```
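
`chunkText` relies on two helpers, `estimateTokens` and `getLastNTokens`, that are not shown in this section. A minimal sketch of both follows, assuming the ratio of roughly 4 characters per token implied by the window sizes above (800 tokens ≈ 3,200 characters). Both function bodies are assumptions; a real tokenizer would give exact counts.

```typescript
// Assumption: ~4 characters per token, matching the 800-token ≈ 3,200-character window.
const CHARS_PER_TOKEN = 4;

// Rough token estimate based on character count.
export function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Return roughly the last `n` tokens of `text`.
// If the text has to be cut, the leading (possibly partial) word is dropped.
export function getLastNTokens(text: string, n: number): string {
  const maxChars = n * CHARS_PER_TOKEN;
  if (text.length <= maxChars) return text;
  const tail = text.slice(text.length - maxChars);
  const firstSpace = tail.indexOf(' ');
  return firstSpace === -1 ? tail : tail.slice(firstSpace + 1);
}
```

Because the estimate is character-based, chunk sizes are approximate; that is usually acceptable for chunking, where the window only needs to stay comfortably under the embedding model's input limit.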
---
#### Embedding Pipeline
```typescript
// 1. Schedule embedding
export const scheduleEmbeddingForThing = mutation({
handler: async (ctx, { thingId, fields }) => {
await ctx.scheduler.runAfter(0, internal.knowledge.embedThing, {
thingId,
fields,
});
},
});
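// 2. Embed text (internal action - calls OpenAI)
// Assumes the official `openai` npm package, with OPENAI_API_KEY set in the environment
import OpenAI from 'openai';
const openai = new OpenAI();
export const embedText = internalAction({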
handler: async (ctx, { text, model = 'text-embedding-3-large' }) => {
const response = await openai.embeddings.create({
model,
input: text,
});
return {
embedding: response.data[0].embedding,
dim: response.data[0].embedding.length,
};
},
});
// 3. Store chunks with embeddings
export const upsertKnowledgeChunks = internalMutation({
  // contentHash: hash of the full source content, so reEmbedThing can compare against it
  handler: async (ctx, { thingId, chunks, embeddings, contentHash }) => {
    // Delete old chunks
    // Note: thingKnowledge links that point at the old chunks should be removed as well
    const oldChunks = await ctx.db
      .query('knowledge')
      .withIndex('by_source')
      .filter(q => q.eq(q.field('sourceThingId'), thingId))
      .collect();
    for (const old of oldChunks) {
      await ctx.db.delete(old._id);
    }
    // Insert new chunks
    for (let i = 0; i < chunks.length; i++) {
      const knowledgeId = await ctx.db.insert('knowledge', {
        knowledgeType: 'chunk',
        text: chunks[i].text,
        embedding: embeddings[i].embedding,
        embeddingModel: 'text-embedding-3-large',
        embeddingDim: embeddings[i].dim,
        sourceThingId: thingId,
        chunk: chunks[i],
        metadata: {
          contentHash,
          embeddingVersion: 'v3',
        },
        createdAt: Date.now(),
      });
      // Link to thing
      await ctx.db.insert('thingKnowledge', {
        thingId,
        knowledgeId,
        role: 'chunk_of',
        createdAt: Date.now(),
      });
    }
  },
});
```
---
#### Cost Management
**Embedding Costs (OpenAI text-embedding-3-large):**
- $0.13 per 1M tokens
- Average blog post: ~1,000 tokens = $0.00013
- 1M blog posts embedded: ~$130
**Storage Costs:**
- 3,072 dimensions × 4 bytes = 12KB per chunk
- 1M chunks = 12GB of vector data
- Convex: ~$0.25/GB/month = $3/month per 1M chunks
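
The arithmetic behind these figures can be checked with a short calculator. This is a sketch that copies the prices from the lists above as constants; treat them as illustrative, since provider pricing changes. The function names are hypothetical.

```typescript
// Figures copied from the lists above; illustrative only, since prices change.
const EMBEDDING_USD_PER_MILLION_TOKENS = 0.13; // text-embedding-3-large
const BYTES_PER_DIMENSION = 4;                 // 32-bit float per dimension
const STORAGE_USD_PER_GB_MONTH = 0.25;

// One-off cost of embedding `totalTokens` tokens.
function embeddingCostUsd(totalTokens: number): number {
  return (totalTokens / 1e6) * EMBEDDING_USD_PER_MILLION_TOKENS;
}

// Monthly cost of storing `chunkCount` vectors of the given dimensionality.
function storageCostUsdPerMonth(chunkCount: number, dimensions = 3072): number {
  const gigabytes = (chunkCount * dimensions * BYTES_PER_DIMENSION) / 1e9;
  return gigabytes * STORAGE_USD_PER_GB_MONTH;
}

// 1M blog posts at ~1,000 tokens each
const oneMillionPosts = embeddingCostUsd(1e6 * 1000);
// 1M chunks at 3,072 dimensions
const oneMillionChunks = storageCostUsdPerMonth(1e6);
```

With decimal gigabytes, 1M chunks come to 12.288 GB and about $3.07 per month, which the list above rounds to 12GB and $3.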
**Optimization Strategies:**
1. **Selective Embedding:** Only embed content types with high search value
2. **Lazy Embedding:** Embed on first publish, not on draft save
3. **Batch Processing:** Embed 100 items at a time to avoid rate limits
4. **Content Hashing:** Skip re-embedding if content unchanged
5. **Smaller Models:** Use `text-embedding-3-small` (1,536 dims by default, or fewer via the `dimensions` parameter) for less critical content (about 85% lower embedding cost at $0.02 per 1M tokens)
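
Strategy 3 (batch processing) can be sketched as follows. `toBatches` and `embedInBatches` are hypothetical helpers, and `embedBatch` stands in for whatever function actually calls the embedding provider; none of these names come from the platform.

```typescript
// Split items into fixed-size batches (100 per batch, per strategy 3 above).
function toBatches<T>(items: T[], batchSize = 100): T[][] {
  if (batchSize < 1) throw new Error('batchSize must be at least 1');
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Embed batch by batch. `embedBatch` is a caller-supplied function (hypothetical).
async function embedInBatches<T, R>(
  items: T[],
  embedBatch: (batch: T[]) => Promise<R[]>,
  batchSize = 100,
): Promise<R[]> {
  const results: R[] = [];
  for (const batch of toBatches(items, batchSize)) {
    // Sequential on purpose: one request in flight at a time
    results.push(...(await embedBatch(batch)));
  }
  return results;
}
```

Processing batches one after another trades throughput for predictability; a bounded level of parallelism is a reasonable next step if the provider's rate limits allow it.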
---
#### Query & Retrieval
```typescript
// Convex vector search runs in actions via ctx.vectorSearch, so this is an action, not a query
export const semanticSearch = action({
  args: {
    query: v.string(),
    groupId: v.id('groups'),
    limit: v.optional(v.number()),
  },
  handler: async (ctx, { query, groupId, limit = 10 }) => {
    // 1. Embed query
    const queryEmbedding = await ctx.runAction(internal.knowledge.embedText, {
      text: query,
    });
    // 2. Vector search (assumes 'knowledgeType' is one of the index's filterFields)
    const results = await ctx.vectorSearch('knowledge', 'by_embedding', {
      vector: queryEmbedding.embedding,
      limit: Math.min(limit * 2, 256), // Over-fetch for filtering; Convex allows at most 256
      filter: q => q.eq('knowledgeType', 'chunk'),
    });
    // 3. Filter by group (load each chunk and its source thing)
    // internal.knowledge.getChunkWithThing is a hypothetical internal query
    // that returns { chunk, thing } for a knowledge _id
    const groupResults = [];
    for (const result of results) {
      const loaded = await ctx.runQuery(internal.knowledge.getChunkWithThing, {
        knowledgeId: result._id,
      });
      if (loaded?.thing?.groupId === groupId) {
        groupResults.push({
          ...loaded.chunk,
          score: result._score,
          thing: loaded.thing,
        });
      }
      if (groupResults.length >= limit) break;
    }
    return groupResults;
  },
});
```
---
#### Governance & Lifecycle
**Versioning:**
- Store `metadata.contentHash` of source content
- If hash unchanged, skip re-embedding
- Track `metadata.embeddingVersion` for model migrations
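
The versioning rules above reduce to one predicate. A minimal sketch, assuming the two metadata fields named in the list; the `KnowledgeMetadata` interface and the `needsReEmbedding` name are hypothetical.

```typescript
// Hypothetical shape holding the two metadata fields named above.
interface KnowledgeMetadata {
  contentHash?: string;
  embeddingVersion?: string;
}

// Re-embed when nothing is stored yet, the content changed, or the model version moved on.
function needsReEmbedding(
  existing: KnowledgeMetadata | undefined,
  currentHash: string,
  currentVersion: string,
): boolean {
  if (!existing) return true;                              // Never embedded
  if (existing.contentHash !== currentHash) return true;   // Content changed
  return existing.embeddingVersion !== currentVersion;     // Model migration
}
```

Checking the version alongside the hash lets a model migration re-embed unchanged content gradually, without a separate backfill flag.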
**Retention:**
- Archive old chunks on major content edits (keep last 3 versions)
- Garbage collect orphaned knowledge items (no thingKnowledge links)
- Delete embeddings when source thing is hard-deleted
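
Orphan detection for the garbage-collection rule can be sketched as a set difference between knowledge IDs and the IDs referenced by `thingKnowledge` links. The function below works on plain arrays so it stays self-contained; the name and the input shapes are assumptions, and a real job would page through both tables.

```typescript
// Hypothetical minimal shape of a thingKnowledge link.
interface KnowledgeLink {
  knowledgeId: string;
}

// Return the IDs of knowledge items that no thingKnowledge link points at.
function findOrphanedKnowledgeIds(knowledgeIds: string[], links: KnowledgeLink[]): string[] {
  const linked = new Set(links.map(link => link.knowledgeId));
  return knowledgeIds.filter(id => !linked.has(id));
}
```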
**Quality:**
- Track `metadata.qualityScore` based on user feedback
- Monitor search relevance metrics
- A/B test embedding models
**Summary:** Be ruthlessly selective about what gets embedded. RAG is powerful but expensive. Embed content users will semantically search, not structured data they'll filter.
### Knowledge Governance
Policy: Knowledge labels are free-form and user-extensible by default, for maximum flexibility and zero schema churn.
- Curated label prefixes (recommended): `skill:*`, `industry:*`, `topic:*`, `format:*`, `goal:*`, `audience:*`, `technology:*`, `status:*`, `capability:*`, `protocol:*`, `payment_method:*`, `network:*`.
- Validation: Enforce label hygiene (no duplicates within scope); allow synonyms via an alias list if needed.
- Ownership: Platform/group owners may curate official labels; users can still apply ad‑hoc labels.
- Hygiene: Periodically consolidate low-usage duplicates; do not delete knowledge items with active references—mark deprecated instead.
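
The validation rules above (no duplicates within scope, synonyms via an alias list, ad-hoc labels still allowed) might look like this in code. It is a sketch under stated assumptions: the `ALIASES` entry is a made-up example, lowercasing labels is a choice this sketch makes rather than a documented rule, and `normalizeLabels` is a hypothetical name.

```typescript
// Prefixes copied from the curated list above.
const CURATED_PREFIXES = new Set([
  'skill', 'industry', 'topic', 'format', 'goal', 'audience',
  'technology', 'status', 'capability', 'protocol', 'payment_method', 'network',
]);

// Hypothetical alias table: maps a synonym to the official label.
const ALIASES: Record<string, string> = {
  'technology:js': 'technology:javascript',
};

// Deduplicate labels within one scope and report which ones use no curated prefix.
function normalizeLabels(labels: string[]): { labels: string[]; adHoc: string[] } {
  const seen = new Set<string>();
  const adHoc: string[] = [];
  for (const raw of labels) {
    const lowered = raw.trim().toLowerCase(); // Assumption: labels are case-insensitive
    if (!lowered) continue;
    const label = ALIASES[lowered] ?? lowered;
    if (seen.has(label)) continue; // No duplicates within scope
    seen.add(label);
    const colon = label.indexOf(':');
    const prefix = colon === -1 ? '' : label.slice(0, colon);
    if (!CURATED_PREFIXES.has(prefix)) adHoc.push(label); // Allowed, but not curated
  }
  return { labels: Array.from(seen), adHoc };
}
```

Ad-hoc labels are reported rather than rejected, which matches the ownership rule that users can still apply their own labels.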
---
## THINGS: All The "Things"
### What Goes in Things?
**Simple test:** If you can point at it and say "this is a \_\_\_", it's a thing.
Examples:
- "This is a **creator**" ✅ Thing
- "This is a **blog post**" ✅ Thing
- "This is a **token**" ✅ Thing
- "This is a **relationship**" ❌ Connection, not thing
- "This is a **purchase**" ❌ Event, not thing
### Thing Types
**61 Types Organized in 13 Categories:**
```typescript
type ThingType =
// CORE (4)
| "creator" // Human creator (role: platform_owner, group_owner, group_user, customer)
| "ai_clone" // Digital twin of creator
| "audience_member" // Fan/user (role: customer)
| "organization" // Multi-tenant organization
// BUSINESS AGENTS (10)
| "strategy_agent" // Vision, planning, OKRs
| "research_agent" // Market, trends, competitors
| "marketing_agent" // Content strategy, SEO, distribution
| "sales_agent" // Funnels, conversion, follow-up
| "service_agent" // Support, onboarding, success
| "design_agent" // Brand, UI/UX, assets
| "engineering_agent" // Tech, integration, automation
| "finance_agent" // Revenue, costs, forecasting
| "legal_agent" // Compliance, contracts, IP
| "intelligence_agent" // Analytics, insights, predictions
// CONTENT (7)
| "blog_post" // Written content (guides, newsletters, articles)
| "video" // Video content (lectures, demos, shorts)
| "podcast" // Audio content (episodes, interviews)
| "social_post" // Social media post (all platforms)
| "email" // Email content (campaigns, newsletters)
| "course" // Educational course (programs, learning paths)
| "lesson" // Individual lesson (units, modules, segments)
// PRODUCTS (4)
| "digital_product" // Templates, tools, assets
| "membership" // Tiered membership (Patreon, Substack)
| "consultation" // 1-on-1 session (coaching, support)
| "nft" // NFT collectible (governance, utility)
// COMMUNITY (3)
| "community" // Community space (Discord, forums)
| "conversation" // Thread/discussion (boards, channels)
| "message" // Individual message (chat, DM)
// TOKEN (2)
| "token" // Actual token instance
| "token_contract" // Smart contract
// KNOWLEDGE (2)
| "knowledge_item" // Piece of creator knowledge
| "embedding" // Vector embedding
// PLATFORM (6)
| "website" // Auto-generated creator site
| "landing_page" // Custom landing pages (campaigns, sales)
| "template" // Design templates (reusable components)
| "livestream" // Live broadcast (streaming, webinars)
| "recording" // Saved livestream content
| "media_asset" // Images, videos, files
// BUSINESS (7)
| "payment" // Payment transaction
| "subscription" // Recurring subscription
| "invoice" // Invoice record
| "metric" // Tracked metric
| "insight" // AI-generated insight
| "prediction" // AI prediction
| "report" // Analytics report
// AUTHENTICATION & SESSION (5)
| "session" // User session (Better Auth)
| "oauth_account" // OAuth connection (GitHub, Google)
| "verification_token" // Email/2FA verification token
| "password_reset_token" // Password reset token
| "ui_preferences" // User UI settings (theme, layout)
// MARKETING (6)
| "notification" // System notification
| "email_campaign" // Email marketing campaign
| "announcement" // Platform announcement
| "referral" // Referral record
| "campaign" // Marketing campaign
| "lead" // Potential customer/lead
// EXTERNAL INTEGRATIONS (3)
| "external_agent" // External AI agent (ElizaOS)
| "external_workflow" // External workflow (n8n, Zapier)
| "external_connection" // Connection config
// PROTOCOL ENTITIES (2, protocol-agnostic)
| "mandate" // Intent/cart (AP2, shopping)
| "product"; // Sellable product (ACP marketplace)
```
**How Domains Apply These Types:**
- **E-Commerce**: Uses `product` (catalog items), `mandate` (shopping carts), `payment` (transactions), `subscription` (auto-renewals), `membership` (loyalty), `notification` (order updates), `email_campaign` (promotional)
- **Education**: Uses `course` (programs), `lesson` (units), `community` (cohorts), `assignment` (assessments), `conversation` (discussion boards), `metric` (grades), `report` (transcripts)
- **Creator**: Uses `video` (YouTube/TikTok), `podcast` (episodes), `blog_post` (newsletters), `membership` (tiers), `course` (products), `email_campaign` (outreach), `metric` (engagement), `insight` (analytics)
- **Crypto**: Uses `token` (holdings), `token_contract` (smart contracts), `metric` (TVL/volume), `payment` (transfers), `knowledge_item` (risk profiles), `report` (protocol analysis)
### Thing Structure
```typescript
{
_id: Id<"things">,
type: ThingType,
name: string, // Display name
groupId: Id<"groups">, // REQUIRED: Multi-tenant isolation
properties: { // Type-specific properties (JSON)
// For creator:
email?: string,
username?: string,
niche?: string[],
// For token:
contractAddress?: string,
totalSupply?: number,
// etc...
},
status: "active" | "inactive" | "draft" | "published" | "archived",
createdAt: number,
updatedAt: number,
deletedAt?: number
}
```
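
To make the structure concrete, here is a hedged sketch of a typed thing and a soft-delete helper. The IDs are plain strings to keep the example self-contained (the real schema uses `Id<"things">` and `Id<"groups">`), the sample values are made up, and reading `deletedAt` as a soft-delete marker is an assumption based on the field name.

```typescript
// Simplified stand-ins: the real schema uses Id<"things">, Id<"groups">, and ThingType.
type ThingStatus = 'active' | 'inactive' | 'draft' | 'published' | 'archived';

interface Thing<P> {
  _id: string;
  type: string;
  name: string;
  groupId: string; // Required for multi-tenant isolation
  properties: P;   // Type-specific properties
  status: ThingStatus;
  createdAt: number;
  updatedAt: number;
  deletedAt?: number;
}

// Example: a token thing with type-specific properties (values are made up)
const exampleToken: Thing<{ contractAddress: string; totalSupply: number }> = {
  _id: 'thing_123',
  type: 'token',
  name: 'Creator Token',
  groupId: 'group_456',
  properties: {
    contractAddress: '0x0000000000000000000000000000000000000000',
    totalSupply: 1000000,
  },
  status: 'active',
  createdAt: 1700000000000,
  updatedAt: 1700000000000,
};

// Assumption: deletedAt marks a soft delete; the record itself is kept.
function softDelete<P>(thing: Thing<P>, now: number): Thing<P> {
  return { ...thing, deletedAt: now, updatedAt: now };
}

function isDeleted<P>(thing: Thing<P>): boolean {
  return thing.deletedAt !== undefined;
}
```

The generic parameter `P` keeps the shared envelope (`type`, `groupId`, `status`, timestamps) identical across all thing types while letting each type narrow `properties`.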
### Properties by Thing Type
**Creator Properties:**
```typescript
{
email: string,
username: string,
displayName: string,
bio?: string,
avatar?: string,
niche: string[],
expertise: string[],
targetAudience: string,
brandColors?: {
primary: string,
secondary: string,
accent: string
},
totalFollowers: number,
totalContent: number,
totalRevenue: number,
// MULTI-TENANT ROLES
role: "platform_owner" | "group_owner" | "group_user" | "customer",
groupId?: Id<"groups">, // Current/default group (if group_owner or group_user)
permissions?: string[], // Additional permissions
}
```
**Organization Properties:**
```typescript
{
name: string,
slug: string, // URL-friendly identifier
domain?: string, // Custom domain (e.g., acme.one.ie)
logo?: string,
description?: string,
status: "active" | "suspended" | "trial" | "cancelled",
plan: "starter" | "pro" | "enterprise",
limits: {
users: number, // Max users allowed
storage: number, // GB
apiCalls: number, // Per month
},
usage: {
users: number, // Current users
storage: number, // GB used
apiCalls: number, // This month
},
billing: {
customerId?: string, // Stripe customer ID
subscriptionId?: string, // Stripe subscription ID
currentPeriodEnd?: number,
},
settings: {
allowSignups: boolean,
requireEmailVerification: boolean,
enableTwoFactor: boolean,
allowedDomains?: string[], // Email domain whitelist
},
createdAt: number,
trialEndsAt?: number,
}
```
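
The paired `limits` and `usage` objects make quota checks a field-by-field comparison. A minimal sketch follows; the `Quota` interface mirrors the three fields above, while the function names and the rule that a user can be added only while usage is strictly below the limit are assumptions.

```typescript
// Mirrors the shared shape of `limits` and `usage` above.
interface Quota {
  users: number;
  storage: number;  // GB
  apiCalls: number; // Per month
}

// Return the quota keys whose current usage is above the plan limit.
function exceededLimits(limits: Quota, usage: Quota): Array<keyof Quota> {
  const keys: Array<keyof Quota> = ['users', 'storage', 'apiCalls'];
  return keys.filter(key => usage[key] > limits[key]);
}

// Assumption: a new user may be added only while usage.users is below limits.users.
function canAddUser(limits: Quota, usage: Quota): boolean {
  return usage.users < limits.users;
}
```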
**AI Clone Properties:**
```typescript
{
voiceId?: string,
voiceProvider?: "elevenlabs" | "azure" | "custom",
appearanceId?: string,
appearanceProvider?: "d-id" | "heygen" | "custom",
systemPrompt: string,
temperature: number,
knowledgeBaseSize: number,
lastTrainingDate: number,
totalInteractions: number,
satisfactionScore: number
}
```
**Agent Properties:**
```typescript
{
agentType: "strategy" | "marketing" | "sales" | ...,
systemPrompt: string,
model: string,
temperature: number,
capabilities: string[],
tools: string[],
totalExecutions: number,
successRate: number,
averageExecutionTime: number
}
```
**Token Properties:**
```typescript
{
contractAddress: string,
blockchain: "base" | "ethereum" | "polygon",
standard: "ERC20" | "ERC721" | "ERC1155",
totalSupply: number,
circulatingSupply: number,
price: number,
marketCap: number,
utility: string[],
burnRate: number,
holders: number,
transactions24h: number,
volume24h: number
}
```
**Course Properties:**
```typescript
{
title: string,
description: string,
thumbnail?: string,
modules: number,
lessons: number,
totalDuration: number,
price: number,
currency: string,
tokenPrice?: number,
enrollments: number,
completions: number,
averageRating: number,
generatedBy: "ai" | "human" | "hybrid",
personalizationLevel: "none" | "basic" | "advanced"
}
```
**Website Properties:**
```typescript
{
domain: string,
subdomain: string, // creator.one.ie
template: "minimal" | "showcase" | "portfolio",
customCSS?: string,
customDomain?: string,
sslEnabled: boolean,
analytics: {
visitors30d: number,
pageViews: number,
conversionRate: number
}
}
```
**Livestream Properties:**
```typescript
{
title: string,
scheduledAt: number,
startedAt?: number,
endedAt?: number,
platform: "youtube" | "twitch" | "custom",
streamUrl: string,
recordingUrl?: string,
viewersPeak: number,
viewersAverage: number,
chatEnabled: boolean,
aiCloneMixEnabled: boolean, // For human + AI mixing
status: "scheduled" | "live" | "ended" | "cancelled"
}
```
**Payment Properties (Consolidated):**
```typescript
{
protocol: "x402" | "acp" | "ap2" | "stripe", // Protocol identifier
amount: number,
currency: "usd" | "eur",
paymentMethod: "stripe" | "crypto",
stripePaymentIntentId?: string,
txHash?: string, // Blockchain transaction
status: "pending" | "completed" | "failed" | "refunded",
fees: number,
netAmount: number,
processedAt?: number,
// Protocol specifics
scheme?: "permit", // X402
network?: "base", // X402/Crypto
invoiceId?: string // ACP
}
```
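
A consistency check for payment properties could look like the sketch below. It rests on two assumptions that the structure above suggests but does not state: `netAmount` equals `amount` minus `fees`, and each `paymentMethod` requires its matching identifier (`stripePaymentIntentId` or `txHash`). The interface and function names are hypothetical.

```typescript
// Subset of the payment properties needed for the checks below.
interface PaymentCheckInput {
  amount: number;
  fees: number;
  netAmount: number;
  paymentMethod: 'stripe' | 'crypto';
  stripePaymentIntentId?: string;
  txHash?: string;
}

// Return a list of problems; an empty list means the payment looks consistent.
function validatePayment(p: PaymentCheckInput): string[] {
  const errors: string[] = [];
  if (p.amount < 0 || p.fees < 0) {
    errors.push('amount and fees must be non-negative');
  }
  // Assumption: netAmount = amount - fees (tolerance guards against float rounding)
  if (Math.abs(p.amount - p.fees - p.netAmount) > 1e-6) {
    errors.push('netAmount must equal amount - fees');
  }
  if (p.paymentMethod === 'stripe' && !p.stripePaymentIntentId) {
    errors.push('stripe payments need stripePaymentIntentId');
  }
  if (p.paymentMethod === 'crypto' && !p.txHash) {
    errors.push('crypto payments need txHash');
  }
  return errors;
}
```

Returning all problems at once, rather than throwing on the first, makes the check usable both in mutations and in import tooling that reports on many records.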
**Subscription Properties:**
```typescript
{
tier: "starter" | "pro" | "enterprise",
price: number,
currency: string,
interval: "monthly" | "yearly",
status: "active" | "cancelled" | "past_due" | "expired",
currentPeriodStart: number,
currentPeriodEnd: number,
cancelAt?: number,
stripeSubscriptionId?: string
}
```
**Metric Properti