kitcn

# Aggregates > Prerequisites: `setup/server.md` Canonical runtime rules: - Use ORM scalar metrics (`aggregateIndex` + `count()`/`aggregate()`) for counts, sums, averages - Use `_count` relation loading instead of per-row `.count()` fanout loops - Use `rankIndex` + `rank()` for rankings, random access, sorted pagination - `aggregateIndex` and `rankIndex` backfill automatically via `kitcn dev` — no manual trigger wiring needed ## ORM Scalar Metrics ### `aggregateIndex` Schema Declaration Declare count/aggregate coverage in table definitions: ```ts const orders = convexTable( "orders", { orgId: text(), amount: integer(), score: integer() }, (t) => [ aggregateIndex("by_org") .on(t.orgId) .sum(t.amount) .avg(t.amount) .min(t.score) .max(t.score), aggregateIndex("all_metrics").all().sum(t.amount).count(t.orgId), ] ); ``` - `.on(fields)` — filter key fields (namespaced counts) - `.all()` — unfiltered global metrics - `.count(field)` / `.sum(field)` / `.avg(field)` / `.min(field)` / `.max(field)` — chainable metrics After deploying, CLI runs `aggregateBackfill` automatically. Wait for `aggregateBackfillStatus` to report `READY`. ### `count()` — O(1) No-Scan Counts ```ts const total = await ctx.orm.query.todos.count({ where: { projectId } }); ``` Unfiltered `count()` uses native Convex count syscall (no aggregateIndex required). Filtered `count()` accepts `eq`, `in`, `isNull`, `gt`, `gte`, `lt`, `lte`, conjunction via `AND`, and bounded finite DNF `OR` when every branch is index-plannable on one `aggregateIndex`. Requires matching `aggregateIndex`. Windowed count: `count({ where, orderBy, skip, take, cursor })` counts rows within a window. - `skip`/`take` for pagination windows, `cursor` for "after this value" counting (requires `orderBy`, single field in v1) - `count({ select: { field: true } })` with `skip`/`take`/`cursor` throws `COUNT_FILTER_UNSUPPORTED` in v1 | Error | Cause | | -------------------------- | -------------------------------------------- | | `COUNT_NOT_INDEXED` | No `aggregateIndex` matches the filter shape | | `COUNT_FILTER_UNSUPPORTED` | Uses unsupported operators | | `COUNT_INDEX_BUILDING` | Index still backfilling | | `COUNT_RLS_UNSUPPORTED` | Called in RLS-restricted context | ### `aggregate()` — Prisma-style Aggregate Blocks ```ts const stats = await ctx.orm.query.orders.aggregate({ where: { orgId: "org-1" }, _count: { _all: true }, _sum: { amount: true }, _avg: { amount: true }, }); ``` Same filter rules as `count()`. Supports bounded finite DNF `OR` when every branch is index-plannable and resolves to one `aggregateIndex`. Windowed aggregate: - `orderBy` + `cursor` works for `_count/_sum/_avg/_min/_max` - `skip`/`take` are `_count`-only in v1 (`AGGREGATE_ARGS_UNSUPPORTED` for non-count metrics) because metric window skip/take is not bucket-computable under strict no-scan ### `groupBy()` — Finite Indexed Groups Only `groupBy()` is supported with strict no-scan bounds: - `by` is required - every `by` field must be constrained in `where` via `eq`/`in`/`isNull` - `orderBy` supports `by` fields and selected metric fields - `skip`/`take`/`cursor` require explicit `orderBy` - `having` supports conjunction filters on `by` fields and selected metrics - `OR`/`NOT` in `having` are unsupported (`AGGREGATE_FILTER_UNSUPPORTED`) ```ts const rows = await ctx.orm.query.orders.groupBy({ by: ["orgId"], where: { orgId: { in: ["org-1", "org-2"] }, status: "paid" }, _count: true, _sum: { amount: true }, orderBy: [{ _count: "desc" }, { _sum: { amount: "desc" } }], having: { _count: { gt: 0 } }, take: 10, }); ``` #### When to use `groupBy` vs alternatives Use `groupBy` when you need **multi-bucket metrics in one call** where each bucket is a distinct field value: | Pattern | Use instead | Why | | ---------------------------------------------------------- | ----------------------------------------- | ------------------------------------- | | Multiple `.count()` calls with different filter values | `groupBy({ by, _count })` | One call replaces N sequential counts | | `findMany` + manual Map/reduce grouping in JS | `groupBy({ by, _count, _sum })` | O(log n) per bucket vs O(n) scan | | Sampling + estimation (e.g. "count admins from 100 users") | `groupBy({ by: ['role'], _count })` | Exact counts, no estimation | | Dashboard stats with breakdowns by category | `groupBy({ by: ['status'], _sum, _avg })` | Single query for full breakdown | Delta from parity: Unlike Prisma, `groupBy` requires every `by` field to be finite-constrained in `where` (`eq`/`in`/`isNull`) and backed by an `aggregateIndex`. Unconstrained `by` fields throw `AGGREGATE_ARGS_UNSUPPORTED`. ### `findMany({ distinct })` (Unsupported) `findMany({ distinct })` is not available to keep strict no-scan/index-backed guarantees. If you need deduplication, use select-pipeline distinct: ```ts const result = await ctx.orm.query.todos .select() .distinct({ fields: ["status"] }) .paginate({ cursor: null, limit: 100 }); ``` ### Relation `_count` — Best Practice **Always prefer `_count` relation loading over per-row `.count()` fanout loops.** Single query with embedded count vs N+1 separate count queries. ```ts // ❌ BAD: N+1 count queries (one per tag) const tags = await ctx.orm.query.tags.findMany({ where: { createdBy: ctx.userId }, }); const usageCounts = await Promise.all( tags.map((tag) => ctx.orm.query.todoTags.count({ where: { tagId: tag.id } })) ); return tags.map((tag, idx) => ({ ...tag, usageCount: usageCounts[idx] ?? 0, })); // ✅ GOOD: Single query with embedded _count const tags = await ctx.orm.query.tags.findMany({ where: { createdBy: ctx.userId }, with: { _count: { todos: true, }, }, }); return tags.map((tag) => ({ ...tag, usageCount: tag._count?.todos ?? 0, })); ``` Filtered `_count`: ```ts const users = await ctx.orm.query.user.findMany({ with: { _count: { todos: { where: { deletionTime: { isNull: true } }, }, }, }, }); const usersWithTodos = users.filter( (user) => (user._count?.todos ?? 0) > 0 ).length; ``` Through-filtered `_count` works for `through()` relations: ```ts const users = await ctx.orm.query.users.findMany({ with: { _count: { memberTeams: { where: { name: "Core" } }, }, }, }); // users[0]._count?.memberTeams => 1 ``` Works on `findMany`, `findFirst`, `findFirstOrThrow`. Access via `row._count?.relation ?? 0`. ### Mutation `returning({ _count })` ```ts const [user] = await ctx.orm .insert(usersTable) .values({ name: "Alice" }) .returning({ id: usersTable.id, _count: { posts: true }, }); // user._count?.posts => 0 const [updated] = await ctx.orm .update(usersTable) .set({ name: "Bob" }) .where(eq(usersTable.id, userId)) .returning({ id: usersTable.id, _count: { posts: { where: { status: "published" } } }, }); // updated._count?.posts => 2 ``` Works on `insert`, `update`, and `delete`. ### `_sum` Nullability `_sum` returns `null` for empty sets or when all field values are `null` (Prisma-compatible): ```ts // Empty table or all-null amounts → { _sum: { amount: null } } // Non-empty with values → { _sum: { amount: 1500 } } ``` ## Ranked Access With `rankIndex` For **rankings**, **random access**, and **sorted pagination**. ORM-native, no external dependency, backfills automatically. | Operation | Description | | ------------------------------------ | --------------------------- | | `rank().indexOf({ id })` | Position/rank of a document | | `rank().at(offset)` | Row at a specific position | | `rank().paginate({ cursor, limit })` | Ordered page traversal | | `rank().max()` / `rank().min()` | Extremes by rank order | | `rank().random()` | Random row from ranked set | | `rank().count()` / `rank().sum()` | Ranked-set count/sum | ### Declaring `rankIndex` ```ts const scores = convexTable( "scores", { gameId: text().notNull(), score: integer().notNull(), createdAt: timestamp().notNull(), userId: text().notNull(), }, (t) => [ rankIndex("leaderboard") .partitionBy(t.gameId) .orderBy({ column: t.score, direction: "desc" }) .orderBy({ column: t.createdAt, direction: "asc" }) .sum(t.score), rankIndex("global_leaderboard") .all() .orderBy({ column: t.score, direction: "desc" }), ] ); ``` `partitionBy(...)` isolates ranked sets per unique partition value. `.all()` for global (unpartitioned). ### Ranked Queries ```ts const leaderboard = ctx.orm.query.scores.rank("leaderboard", { where: { gameId }, }); const top10 = await leaderboard.paginate({ cursor: null, limit: 10 }); const userRank = await leaderboard.indexOf({ id: userId }); const thirdPlace = await leaderboard.at(2); const best = await leaderboard.max(); const worst = await leaderboard.min(); const randomPick = await leaderboard.random(); const total = await leaderboard.count(); const totalScore = await leaderboard.sum(); ``` ### Leaderboard + User Stats ```ts const lb = ctx.orm.query.scores.rank("leaderboard", { where: { gameId: input.gameId }, }); const globalRank = await lb.indexOf({ id: ctx.userId }); const totalPlayers = await lb.count(); ``` ### Best Practices ```ts // ✅ Partition per tenant to isolate write hot spots rankIndex("tenant_scores") .partitionBy(t.tenantId) .orderBy({ column: t.score, direction: "desc" }); // ❌ Global rank can create cross-tenant contention rankIndex("global_scores") .all() .orderBy({ column: t.score, direction: "desc" }); ``` ## Repair If rank or aggregate state gets out of sync: ```bash kitcn aggregate rebuild ``` ## When to Use | Need | Use | | ---------------------- | --------------------------------------------------------------- | | Counts, sums, averages | ORM Scalar Metrics (`aggregateIndex` + `count()`/`aggregate()`) | | Relation counts | `_count` relation loading (`with: { _count: { ... } }`) | | Rankings, leaderboards | `rankIndex` + `rank()` (`indexOf`, `at`, `paginate`) | | Random document access | `rankIndex` + `rank()` (`random()`, `at()`) | | Sorted pagination | `rankIndex` + `rank()` (`paginate({ cursor, limit })`) | | Non-table data | Model as a table, then use `aggregateIndex` or `rankIndex` | ## Limitations | Consideration | Guideline | | ---------------- | ------------------------------------------------------ | | Update frequency | High-frequency updates to nearby keys cause contention | | Key size | Keep composite keys reasonable (3-4 components max) | | Namespace count | Each namespace has overhead | | Query patterns | Design keys for actual needs | ## API Reference ### Prisma Parity Matrix (No-Scan) | Prisma feature | Status | Notes | | ----------------------------------------------------------- | --------- | ------------------------------------------------------------------------------------------------------------------------------------- | | `aggregate({ _count/_sum/_avg/_min/_max, where })` | Supported | Bucket-backed, no base-table scan fallback | | `aggregate({ _sum })` nullability | Supported | Returns `null` for empty/all-null sets | | `groupBy({ by, where, _count/_sum/_avg/_min/_max })` | Supported | `by` fields must be finite-constrained (`eq/in/isNull`) in `where` | | `groupBy({ having/orderBy/skip/take/cursor })` | Partial | Supported for finite index-bounded groups with conjunction-only `having` | | `count()` | Supported | Native Convex count syscall | | `count({ where })` | Supported | Indexed scalar subset | | `count({ where, select: { _all, field } })` | Supported | Field counts require `aggregateIndex.count(field)` | | `findMany({ with: { _count: { relation: true } } })` | Supported | Indexed relation counts | | `findMany({ with: { _count: { relation: { where } } } })` | Supported | Direct relation scalar filters | | `aggregate({ orderBy/take/skip/cursor })` | Partial | `orderBy/cursor` supported; `skip/take` is `_count`-only in v1 | | Advanced aggregate/count filters (`OR/NOT/string/relation`) | Partial | Bounded finite DNF `OR` rewrite is supported when branches resolve to one `aggregateIndex`; `NOT`/string/relation filters are blocked | | Relation `_count` nested relation filter | Blocked | `RELATION_COUNT_FILTER_UNSUPPORTED` | | `findMany({ distinct })` | Blocked | Not available under strict no-scan contract. Use `select().distinct({ fields })` | | Relation `_count` filtered through relation | Supported | Indexed `through()` relation filters | | Mutation return `_count` parity | Supported | `returning({ _count })` on insert/update/delete |