@clickup/ent-framework

# Ent Framework

<div align="left"><figure><img src="https://844702935-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FNRmYySpdaO8NqIAEQYuO%2Fuploads%2Fgit-blob-6ccce58424156951ca18f24fdb38a5218c496786%2Flogo-berkshire-swash.svg?alt=media" alt="" width="375"><figcaption></figcaption></figure></div>

The TypeScript library for working with microsharded PostgreSQL databases.

* [Getting Started and Tutorials](https://ent-framework.net)
* [API documentation](https://github.com/clickup/ent-framework/blob/main/docs/globals.md)
* [Source code](https://github.com/clickup/ent-framework)
* [Ent Framework's Discord](https://discord.gg/QXvN6VTCKS)

#### Core Features

1. **Graph-like representation of entities.** With Ent Framework, you represent each Ent (a domain object of your business logic) as a TypeScript class with immutable properties. An Ent class instance maps to one row of some table in a relational database (like PostgreSQL). It may look similar to an ORM, but it has many aspects that traditional ORMs don't have.
2. **Row-level security in a graph (privacy layer).** You manage data as a graph where each node is an Ent instance, and each edge is a field link (think of foreign keys) to other Ents. To be allowed to read (or update/delete) some Ent, you define a set of explicit rules like "user can read EntA if they can read EntB or EntC". And, consequently, in EntB you define its own set of rules, like "user can read EntB if they can read EntD".
3. **Query batching and coalescing.** Ent Framework holistically solves the "N+1 selects" problem commonly known in the ORM world. You still write your code as if you work with individual Ents and individual IDs, and the framework magically takes care of sending batched requests (both reads and writes) to the underlying relational database. You don't need to work with lists and JOINs anymore.
4. **Microsharding and replication lag tracking support out of the box.** Splitting your database horizontally is a breeze: Ent Framework takes care of routing the requests to the proper microshards. When scaling reads, Ent Framework knows whether a replica node is "good enough" for a particular query. It automatically uses the proper replica when possible, falling back to master when not.
5. **Pluggable into your existing relational database.** If your project already uses some ORM or runs raw SQL queries, Ent Framework can be plugged in alongside.
6. **Tens of other features.** Some examples: cross-microshard foreign keys, composite fields, triggers, built-in caching, etc.

#### Installation

```
npm add ent-framework
pnpm add ent-framework
yarn add ent-framework
```

<div align="left"><figure><img src="https://github.com/clickup/ent-framework/actions/workflows/ci.yml/badge.svg?branch=main" alt="" width="188"><figcaption></figcaption></figure></div>

# Code Structure

Below, we'll show some Ent Framework usage examples. We will progress from the simplest code snippets to more and more advanced topics, like:

* custom ID schemas
* privacy rules
* triggers
* composite field types
* Viewer Context flavors
* master-replica and automatic replication lag tracking
* microsharding and migrations
* cross-shard foreign keys and inverse indexes
* etc.

### Code Structure

The examples in this tutorial will approximately follow the `src` folder structure of [examples/next-example](https://github.com/dimikot/ent-framework/tree/main/examples/next-example):

* ents/
  * cluster.sql
  * cluster.ts
  * EntComment.ts
  * EntTopic.ts
  * EntUser.ts
  * getServerVC.ts
* app/
  * api/
    * auth/\[...nextauth]
      * route.ts
    * topics/
      * route.ts

# Connect to a Database

To start simple, create a PostgreSQL database and several tables there.
You can also use your existing database.

```bash
$ psql postgresql://postgres:postgres@127.0.0.1/postgres -f ents/cluster.sql
```

{% code title="ents/cluster.sql" %}
```sql
CREATE TABLE users(
  id bigserial PRIMARY KEY,
  email varchar(256) NOT NULL UNIQUE,
  is_admin boolean NOT NULL DEFAULT FALSE
);

CREATE TABLE topics(
  id bigserial PRIMARY KEY,
  created_at timestamptz NOT NULL,
  updated_at timestamptz NOT NULL,
  slug varchar(64) NOT NULL UNIQUE,
  creator_id bigint NOT NULL,
  subject text DEFAULT NULL
);

CREATE TABLE comments(
  id bigserial PRIMARY KEY,
  created_at timestamptz NOT NULL,
  topic_id bigint REFERENCES topics,
  creator_id bigint NOT NULL,
  message text NOT NULL
);

CREATE TABLE organizations(
  id bigserial PRIMARY KEY,
  name text NOT NULL UNIQUE
);

CREATE TABLE organization_users(
  id bigserial PRIMARY KEY,
  organization_id bigint REFERENCES organizations,
  user_id bigint REFERENCES users,
  UNIQUE (organization_id, user_id)
);
```
{% endcode %}

To access that database, create an instance of Cluster:

{% code title="ents/cluster.ts" fullWidth="false" %}
```typescript
import { Cluster } from "ent-framework";
import type { PgClientOptions } from "ent-framework/pg";
import { PgClient } from "ent-framework/pg";
import type { PoolConfig } from "pg";

export const cluster = new Cluster<PgClient, PgClientOptions>({
  islands: async () => [ // sync or async
    {
      no: 0,
      nodes: [
        {
          name: "island0-master",
          config: {
            connectionString: process.env.DATABASE_URL, // e.g. from .env
            // This object is of the standard node-postgres type PoolConfig.
            // Thus, you can use host, port, user, password, database and other
            // properties instead of connectionString if you want.
            min: 5,
            max: 20,
          } satisfies PoolConfig,
        },
      ],
    },
  ],
  createClient: (node) => new PgClient(node),
  loggers: {
    clientQueryLogger: (props) => console.debug(props.msg),
    swallowedErrorLogger: (props) => console.log(props),
  },
});

// Pre-open min number of DB connections.
cluster.prewarm();
```
{% endcode %}

Terminology:

1. **Cluster** consists of **Islands**. Each Island is identified by an integer number (there can be many Islands for horizontal scaling of the cluster).
2. An Island consists of master + replica **nodes** (in the above example, we only define one master node and no replicas).
3. An Island also hosts **Microshards** (in the example above, we have no microshards, i.e. just one global shard). Microshards may travel from Island to Island during the shard rebalancing process; the engine tracks this automatically ("shards discovery").

Notice that we define the layout of the cluster using a callback. Ent Framework calls it from time to time to refresh its view of the cluster, so in this callback, you can read the data from some centralized configuration database (new nodes may be added, or empty nodes may be removed, with no downtime). This is called "dynamic real-time reconfiguration".

The [PgClient](https://github.com/clickup/ent-framework/blob/main/docs/classes/PgClient.md) class accepts several options; one of them is the standard [node-postgres PoolConfig](https://node-postgres.com/apis/pool) interface. For simplicity, when we define a cluster shape in `islands`, we just return a list of such configs, to be passed into the `createClient()` lambda.

As for the `prewarm()` call, it's explained in the Advanced section.

# Create Ent Classes

Once you have a Cluster instance, you can create Ent classes to access the data.
{% code title="ents/EntUser.ts" %}
```typescript
import { PgSchema } from "ent-framework/pg";
import {
  ID,
  BaseEnt,
  GLOBAL_SHARD,
  AllowIf,
  OutgoingEdgePointsToVC,
} from "ent-framework";
import { cluster } from "./cluster";

const schema = new PgSchema(
  "users",
  {
    id: { type: ID, autoInsert: "nextval('users_id_seq')" },
    email: { type: String },
    is_admin: { type: Boolean, autoInsert: "false" },
  },
  ["email"]
);

export class EntUser extends BaseEnt(cluster, schema) {
  static override configure() {
    return new this.Configuration({
      shardAffinity: GLOBAL_SHARD,
      privacyInferPrincipal: async (_vc, row) => row.id,
      privacyLoad: [new AllowIf(new OutgoingEdgePointsToVC("id"))],
      privacyInsert: [],
    });
  }
}
```
{% endcode %}

If your app uses the UUID type for IDs, replace `{ type: ID, autoInsert: "nextval('users_id_seq')" }` with something like:

```typescript
id: { type: String, autoInsert: "gen_random_uuid()" }
```

(Notice that you need to use type `String` and not `ID` for UUID fields. Read more about ID formats and microsharding aspects in the [locating-a-shard-id-format](https://docs.ent-framework.net/scalability/locating-a-shard-id-format "mention") article.)

Each Ent may also have one optional "unique key" (possibly composite), which the engine treats in a specific optimized way. In the above example, it's `email`.
{% code title="ents/EntTopic.ts" %}
```typescript
import { PgSchema } from "ent-framework/pg";
import {
  ID,
  BaseEnt,
  GLOBAL_SHARD,
  AllowIf,
  OutgoingEdgePointsToVC,
  Require,
} from "ent-framework";
import { cluster } from "./cluster";

const schema = new PgSchema(
  "topics",
  {
    id: { type: ID, autoInsert: "nextval('topics_id_seq')" },
    created_at: { type: Date, autoInsert: "now()" },
    updated_at: { type: Date, autoUpdate: "now()" },
    slug: { type: String },
    creator_id: { type: ID },
    subject: { type: String, allowNull: true },
  },
  ["slug"]
);

export class EntTopic extends BaseEnt(cluster, schema) {
  static override configure() {
    return new this.Configuration({
      shardAffinity: GLOBAL_SHARD,
      privacyInferPrincipal: async (_vc, row) => row.creator_id,
      privacyLoad: [new AllowIf(new OutgoingEdgePointsToVC("creator_id"))],
      privacyInsert: [new Require(new OutgoingEdgePointsToVC("creator_id"))],
    });
  }
}
```
{% endcode %}

By default, all fields are non-nullable (unless you provide the `allowNull` option).

Disregard the privacy rules for now: it's a more complicated topic which will be covered later. For now, the code should be obvious enough.
{% code title="ents/EntComment.ts" %}
```typescript
import { PgSchema } from "ent-framework/pg";
import {
  ID,
  BaseEnt,
  GLOBAL_SHARD,
  AllowIf,
  CanReadOutgoingEdge,
  OutgoingEdgePointsToVC,
  Require,
} from "ent-framework";
import { cluster } from "./cluster";
import { EntTopic } from "./EntTopic";

const schema = new PgSchema(
  "comments",
  {
    id: { type: ID, autoInsert: "nextval('comments_id_seq')" },
    created_at: { type: Date, autoInsert: "now()" },
    topic_id: { type: ID },
    creator_id: { type: ID },
    message: { type: String },
  },
  []
);

export class EntComment extends BaseEnt(cluster, schema) {
  static override configure() {
    return new this.Configuration({
      shardAffinity: GLOBAL_SHARD,
      privacyInferPrincipal: async (_vc, row) => row.creator_id,
      privacyLoad: [
        new AllowIf(new CanReadOutgoingEdge("topic_id", EntTopic)),
        new AllowIf(new OutgoingEdgePointsToVC("creator_id")),
      ],
      privacyInsert: [new Require(new OutgoingEdgePointsToVC("creator_id"))],
    });
  }
}
```
{% endcode %}

Since we have no microshards yet, `shardAffinity` basically does nothing. We'll talk about microsharding in [locating-a-shard-id-format](https://docs.ent-framework.net/scalability/locating-a-shard-id-format "mention").

# VC: Viewer Context and Principal

One of the most important Ent Framework traits is that it always knows "who" is sending a particular read/write query to the database, and is able to check permissions. Typically, that "who" is a user who opens a web page, or on whose behalf a background worker job is running, but it can be any other **Principal**.

This mechanism is quite different from traditional database abstraction layers or ORMs, which typically lack awareness of the specific user on whose behalf the queries are executed.

To send a query, you must always have an instance of the [VC](https://github.com/clickup/ent-framework/blob/main/docs/classes/VC.md) class in hand (short for **Viewer Context**). The most important property of a VC is `principal`: a string which identifies the party who's acting.
Typically, we store some user ID in `vc.principal`.

It is intentionally not easy to create a brand new VC instance. In fact, you should only do it once in your app (this VC is called the "root VC"), and all other VCs should **derive** from that VC using its methods.

Below is a basic example for the [Next.js](https://nextjs.org/) framework. (Of course, you can use any other framework, like Express. Next.js is here only for illustrative purposes; it has nothing to do with Ent Framework.)

## Integrate with e.g. Google Auth

For simplicity of the example, we'll plug the "Login with Google" feature into our Next app, and then use the user's email as the primary way of addressing an EntUser.

{% code title="app/api/auth/\[...nextauth]/route.ts" %}
```typescript
import NextAuth from "next-auth";
import GoogleProvider from "next-auth/providers/google";

const handler = NextAuth({
  providers: [
    GoogleProvider({
      clientId: process.env.GOOGLE_ID,
      clientSecret: process.env.GOOGLE_SECRET,
    }),
  ],
});

export { handler as GET, handler as POST };
```
{% endcode %}

Now on any page, you may place a [Sign in button component](https://github.com/dimikot/ent-framework/blob/main/examples/next-example/src/components/SignInButton.tsx):

{% code title="components/SignInButton.tsx" %}
```typescript
import { signIn } from "next-auth/react";
...
<a onClick={() => signIn("google")}>Sign in</a>
```
{% endcode %}

NextAuth exposes the `getServerSession()` function for server components, which gives you access to the user's session data, including their email:

{% code title="app/page.tsx" %}
```typescript
import { getServerSession } from "next-auth";

export default async function Home() {
  const session = await getServerSession();
  return session ? (
    <div>Welcome, {session.user?.name}!</div>
  ) : (
    <div>Please sign in to continue.</div>
  );
}
```
{% endcode %}

You can also use `getServerSession()` from inside your API route handlers.
## Build a Request VC Accessor Function

The same way as `getServerSession()` gives us access to the user's session, let's build a function that returns a VC instance for that user. Technically, this function should work exactly the same way as `getServerSession()`: it even uses the `session.user.email` field from there. And in case the user is not authenticated yet, we still need a "guest VC" to be returned by this function. Such a VC can still access some "public" Ents (depending on their privacy rules).

The VC instance should be "memoized" per HTTP request, so if the VC accessor function is called multiple times, it returns the same object. This is critical: otherwise, many Ent Framework features (like query batching and caching) will not work as they should.

Different frameworks have different ways of attaching a property to the request object. In Next, the easiest way so far is to use a `WeakMap` and the `headers()` API function. (In Express, you would likely just assign a value to `req.vc` in some middleware.)

{% code title="ents/getServerVC.ts" %}
```typescript
import { VC } from "ent-framework";
import { getServerSession } from "next-auth";
import { headers } from "next/headers";
import { EntUser } from "./EntUser";

const vcStore = new WeakMap<object, VC>();

export async function getServerVC(): Promise<VC> {
  const [heads, session] = await Promise.all([headers(), getServerSession()]);
  let vc = vcStore.get(heads);
  if (!vc) {
    vc = VC.createGuestPleaseDoNotUseCreationPointsMustBeLimited();
    if (session?.user?.email) {
      const vcOmni = vc.toOmniDangerous();
      let user = await EntUser.loadByNullable(vcOmni, {
        email: session.user.email,
      });
      if (!user) {
        // User did not exist: upsert the Ent.
        await EntUser.insertIfNotExists(vcOmni, {
          email: session.user.email,
          is_admin: false,
        });
        user = await EntUser.loadByX(vcOmni, {
          email: session.user.email,
        });
      }
      // Thanks to EntUser's privacyInferPrincipal rule, user.vc is
      // automatically assigned to a new derived VC with principal equal to
      // user.id.
      vc = user.vc;
    }
    vcStore.set(heads, vc);
  }
  return vc;
}
```
{% endcode %}

We will discuss `loadByX()` in the next sections. In short, it **loads** an Ent **by** unique key and throws an e**X**ception (this is what "X" stands for) if it doesn't exist.

Here comes the catch: `loadByX()` requires a VC that is allowed to read the Ent, and per EntUser's `privacyLoad` rule, that is a VC whose principal is the user's own ID. But to derive such a VC, we first need to load that very EntUser. It's a classic "chicken and egg" problem, so we derive a new VC in "god mode" with `vc.toOmniDangerous()`, which lets Ent Framework bypass privacy checks for the very first `EntUser` loaded.

## Use getServerVC() in Your Server Components and APIs

So now, everywhere you could use `getServerSession()`, you can use `getServerVC()` as well. For instance, in a server component:

{% code title="app/page.tsx" %}
```typescript
import { getServerVC } from "@/ents/getServerVC";

export default async function Home() {
  const vc = await getServerVC(); // <---
  return !vc.isGuest() ? (
    <div>Your vc.principal={vc.principal}.</div>
  ) : (
    <div>Please sign in to continue.</div>
  );
}
```
{% endcode %}

Or in an API route handler:

{% code title="app/api/topics/route.ts" %}
```typescript
import { EntTopic } from "@/ents/EntTopic";
import { getServerVC } from "@/ents/getServerVC";
import { NextRequest, NextResponse } from "next/server";

export async function POST(req: NextRequest) {
  const vc = await getServerVC(); // <---
  // App Router route handlers receive a Web Request: parse the JSON body.
  const body = await req.json();
  const topic = await EntTopic.insertReturning(vc, {
    slug: `t${Date.now()}`,
    creator_id: vc.principal,
    subject: body.subject ?? null,
  });
  return NextResponse.json({ id: topic.id });
}
```
{% endcode %}

In other frameworks, you would access the per-request VC differently. For instance, in Express, you would likely just read the `req.vc` value that you earlier assigned in a middleware.

# Ent API: insert\*()

Ent Framework exposes an opinionated API which allows you to write and read data from the microsharded database.

{% code title="app/api/topics/route.ts" %}
```typescript
import { EntComment } from "@/ents/EntComment";
import { EntTopic } from "@/ents/EntTopic";
import { EntUser } from "@/ents/EntUser";
import { getServerVC } from "@/ents/getServerVC";
import { NextRequest, NextResponse } from "next/server";

export async function POST(req: NextRequest) {
  const vc = await getServerVC();
  // Tolerate an empty or non-JSON body: fall back to the defaults below.
  const body = await req.json().catch(() => ({}));
  const user = await EntUser.loadX(vc, vc.principal);
  const topic = await EntTopic.insertReturning(vc, {
    slug: `t${Date.now()}`,
    creator_id: user.id,
    subject: String(body.subject || "My Topic"),
  });
  const commentID = await EntComment.insert(topic.vc, {
    topic_id: topic.id,
    creator_id: user.id,
    message: String(body.message || "My Message"),
  });
  return NextResponse.json({
    message: `Created topic ${topic.id} and comment ${commentID}`,
  });
}
```
{% endcode %}

There are several versions of `insert*` static methods on each Ent class.

## **insertIfNotExists(vc, { field: "...", ... }): string | null**

Inserts a new Ent and returns its ID, or null if the Ent violates unique index constraints. This is a low-level method; all other methods use it internally.

## **insert(vc, { field: "...", ... }): string**

Inserts a new Ent and returns its ID. Throws `EntUniqueKeyError` if it violates unique index constraints. Always returns the ID of the just-inserted Ent.

## **insertReturning(vc, { field: "...", ... }): Ent**

Same as `insert()`, but immediately loads the just-inserted Ent back from the database and returns it. The reasoning is that the database may have fields with default values or even PG triggers, so we always need 2 round-trips to get the actual data.

{% hint style="info" %}
In fact, `insert*()` methods do way more things. They check privacy rules to make sure that a VC can actually insert the data. They call Ent triggers. They infer a proper microshard to write the data to. We'll discuss all those topics later.
{% endhint %}

## VC Embedding

When some Ent is loaded in a VC, its `ent.vc` is assigned to that VC. In the above example, we use `vc` and `topic.vc` interchangeably.

**Embedding a VC into each Ent is a crucial aspect of Ent Framework.** It allows you to remove **lots** of boilerplate from the code. Instead of passing an instance of some VC everywhere from function to function, we can just pass Ents, and we'll always have an up-to-date VC:

```typescript
async function loadTopicOfComment(comment: EntComment) {
  return EntTopic.loadX(comment.vc, comment.topic_id);
}

async function loadTopicOfCommentUglyDontDoItPlease(vc: VC, commentID: string) {
  const comment = await EntComment.loadX(vc, commentID);
  return EntTopic.loadX(vc, comment.topic_id);
}
```

You almost never need to pass a VC from function to function: pass Ent instances instead. Having an explicit `vc` argument somewhere is a smell.
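To summarize how the three `insert*()` variants relate, here is a toy in-memory sketch. It is only an illustration under stated assumptions: `ToyTable`, `ToyUniqueKeyError` and the row shape are hypothetical, and there is no VC, privacy check, trigger, batching or SQL involved, so this is not how Ent Framework is implemented. It merely mirrors the contracts listed above: `insertIfNotExists()` returns an ID or `null`, `insert()` turns a unique key conflict into an exception, and `insertReturning()` reads the stored row back.

```typescript
// A toy, in-memory model of the documented insert*() contracts.
// Hypothetical names (ToyTable, ToyUniqueKeyError); no VC, privacy,
// triggers, batching or SQL. Not Ent Framework's implementation.

interface TopicRow {
  id: string;
  slug: string; // acts as the unique key
  created_at: Date;
}

// created_at is optional at insert time, similar to autoInsert: "now()".
interface TopicInput {
  slug: string;
  created_at?: Date;
}

class ToyUniqueKeyError extends Error {}

class ToyTable {
  private rows = new Map<string, TopicRow>();
  private nextID = 1;

  // Lowest-level primitive: the new ID, or null on a unique key conflict.
  async insertIfNotExists(input: TopicInput): Promise<string | null> {
    const conflict = Array.from(this.rows.values()).some(
      (row) => row.slug === input.slug
    );
    if (conflict) {
      return null;
    }
    const id = String(this.nextID++);
    this.rows.set(id, {
      id,
      slug: input.slug,
      created_at: input.created_at ?? new Date(), // default filled by the store
    });
    return id;
  }

  // Same as insertIfNotExists(), but a conflict becomes an exception.
  async insert(input: TopicInput): Promise<string> {
    const id = await this.insertIfNotExists(input);
    if (id === null) {
      throw new ToyUniqueKeyError(`slug "${input.slug}" already exists`);
    }
    return id;
  }

  // Inserts, then reads the stored row back (the "2nd round-trip"), so the
  // caller sees values filled in by the store, such as created_at.
  async insertReturning(input: TopicInput): Promise<TopicRow> {
    const id = await this.insert(input);
    return this.loadX(id);
  }

  async loadX(id: string): Promise<TopicRow> {
    const row = this.rows.get(id);
    if (!row) {
      throw new Error(`row ${id} not found`);
    }
    return row;
  }
}
```

With this layering, a single place (`insertIfNotExists()`) decides what counts as a conflict, and the stricter variants only change how the outcome is reported to the caller.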
# Built-in Field Types

Before we move on to the next Ent API calls, let's talk about the Ent field types that are natively supported in Ent Framework:

<table><thead><tr><th width="305.828125">Field Definition</th><th width="197.37890625">TypeScript Type</th><th>PostgreSQL Type</th></tr></thead><tbody>
<tr><td>{ type: String }</td><td>string</td><td>varchar, text, bigint, numeric, ...</td></tr>
<tr><td>{ type: ID }</td><td>string</td><td>varchar, text, bigint, ...</td></tr>
<tr><td>{ type: Number }</td><td>number</td><td>int, bigint, double precision, ...</td></tr>
<tr><td>{ type: Date }</td><td>Date</td><td>timestamptz, timestamp</td></tr>
<tr><td>{ type: Boolean }</td><td>boolean</td><td>boolean</td></tr>
<tr><td>{ type: EnumType<"a" | "b">() }</td><td>"a" | "b"</td><td>varchar, text, ...</td></tr>
<tr><td>{ type: EnumType<42 | 101>() }</td><td>42 | 101</td><td>integer, ...</td></tr>
<tr><td>{ type: EnumType<MyEnum>() }</td><td>MyEnum</td><td>varchar, text, integer, ...</td></tr>
<tr><td>{ type: YourCustomType }</td><td>see<br><a data-mention href="custom-field-types">custom-field-types</a></td><td>jsonb, bytea or anything else</td></tr>
</tbody></table>

You can also define custom field types: [custom-field-types](https://docs.ent-framework.net/getting-started/custom-field-types "mention")

Fields may be *nullable* and *optional*, with the corresponding support on the TypeScript side. Nullability and optionality are often mixed up. In Ent Framework, they are independent of each other and serve different use cases.

## Nullability: allowNull=true

By default, fields can't store a `null` TypeScript value. To allow storing a null, use the `allowNull` syntax:

```typescript
const schema = new PgSchema(
  "topics",
  {
    ...
    // TypeScript type will be: string | null.
    company_id: { type: ID, allowNull: true },
    // TypeScript type will be: string (non-nullable).
    slug: { type: String },
  },
  ["slug"]
);
```

**Notice that if your field is nullable, it doesn't mean that it is optional.** Nullability and optionality are independent concepts in both Ent Framework and TypeScript. E.g. you can have a required nullable field which allows saving `null` in it, but you will still need to explicitly pass this `null` in your TypeScript code:

```typescript
await EntTopic.insertReturning(vc, { slug: "abc" });
// ^ TypeScript error: missing required property, company_id.
await EntTopic.insertReturning(vc, { company_id: null, slug: "abc" });
// ^ OK.
```

By default, each field in the schema is **required at insert time**. I.e. if you run an `insert*()` call, TypeScript won't let you skip a required field.

## Optionality: autoInsert="..."

To make a field optional, you can use the `autoInsert="sql expression"` modifier: it makes the field optional at insert time. Ent Framework will use the provided raw SQL expression if you don't pass an explicit field value on an insert (which is convenient when refactoring, for instance). Several examples:

```typescript
const schema = new PgSchema(
  "topics",
  {
    // If not passed in insert*() call, uses nextval('topics_id_seq').
    id: { type: ID, autoInsert: "nextval('topics_id_seq')" },
    // If not passed in insert*() call, uses now().
    created_at: { type: Date, autoInsert: "now()" },
    // If not passed in insert*() call, uses NULL.
    company_id: { type: ID, allowNull: true, autoInsert: "NULL" },
    // Required AND non-nullable at the same time.
    slug: { type: String },
  },
  ["slug"]
);
```

Notice that now the `company_id` field is both *optional* and *nullable*. I.e. you can run this code:

```typescript
await EntTopic.insertReturning(vc, { slug: "abc" });
// ^ OK: company_id is both optional and nullable.
```

An example of an optional but non-nullable field is `created_at`. I.e. you can omit this field when inserting (and thus, Ent Framework will use the `now()` SQL expression for its value), but you can't pass a `null` TypeScript value there, and your `topic.created_at` will be of type `Date`, not `Date | null` or `Date | undefined`.

### autoUpdate

There is one more way to mark a field as optional: the `autoUpdate` modifier. It is very similar to `autoInsert`, but additionally, if the value is omitted in an `update*()` call, the field is automatically set to the result of the provided SQL expression. A classical use case for it is the `updated_at` field:

```typescript
const schema = new PgSchema(
  "topics",
  {
    // Defaults to now() if not mentioned at insert time.
    created_at: { type: Date, autoInsert: "now()" },
    // Auto-set to now() if not mentioned at update time.
    updated_at: { type: Date, autoUpdate: "now()" },
    ...
  },
  ["slug"]
);
```

# Ent API: load\*() by ID

There is a basic primitive used very frequently: having some Ent ID, load this Ent into memory.

{% code title="app/api/comments/\[id]/route.ts" %}
```typescript
import { EntComment } from "@/ents/EntComment";
import { getServerVC } from "@/ents/getServerVC";
import { NextRequest, NextResponse } from "next/server";

export async function GET(
  _req: NextRequest,
  { params }: { params: Promise<{ id: string }> }
) {
  const vc = await getServerVC();
  const comment = await EntComment.loadX(vc, (await params).id);
  return NextResponse.json({ message: comment.message });
}
```
{% endcode %}

There are several versions of `load*` static methods on each Ent class:

## **Ent.loadX(vc, id): Ent**

Loads an Ent by ID. Throws `EntNotFoundError` if there is no such Ent in the database, or `EntNotReadableError` if the VC has no permissions to read it.

## **Ent.loadNullable(vc, id): Ent | null**

Loads an Ent by ID if it exists in the database, otherwise returns null.
If an Ent with such ID exists, but the VC doesn't have permissions to access it, the call throws `EntNotReadableError`.

## **Ent.loadIfReadableNullable(vc, id): Ent | null**

This is a special method designed to return `null` in two cases: when an Ent with the specified ID does not exist, or when the user lacks the necessary permissions to read it. Basically, it doesn't throw in either of these cases. Permissions are enforced by the `privacyLoad` rules of the Ent, which were briefly introduced earlier and will be covered in more detail later.

{% hint style="info" %}
In most cases, prefer `loadX()` and rely on outer try-catch blocks, as opposed to `loadNullable()` with manual null-checking. Let the framework do its job. And you will almost never need to use `loadIfReadableNullable()`: it's a smell.
{% endhint %}

There is intentionally no method which loads multiple Ents at once from an array of IDs. Read on to find out why.

# N+1 Selects Solution

To reveal some magic, could you please do us a small favor? **Stop thinking in terms of lists when loading.** Always think in terms of an individual row/object and an individual ID, not in terms of an array of IDs:

```typescript
async function loadCommentsBadDontDoThis(ids: string[]): Promise<Comment[]> {
  // Please don't.
}

async function loadComment(id: string): Promise<Comment> {
  // Do this: one ID as an input, one row as an output.
}
```

It sounds contradictory. In the example above, if we always use `loadComment(id)`, how do we avoid sending too many queries to the database, especially when it comes to loading child records for each loaded parent? (This problem is well known as "N+1 Selects".)

The answer is: **let the DB access engine take care of batching**.

## Traditional List Based Approach

Imagine we have some list of comment IDs shown on the screen. For each comment, we want to load its creator and the owning topic, and for each topic, load its creator too. Then, return it all as JSON to the client.
Of course, we want to send as few SQL queries to the database as possible to minimize connection utilization and round-trip latency. We also do not want to use JOINs (imagine that `loadUsers()`, `loadTopics()` and `loadComments()` live in independent modules and don't want to know about each other; plus, the data lives in different microshards).

First, let's see what happens if we think in terms of the "load a list of things" abstraction. This is how people used to fight the "N+1 Selects" problem in the past.

```typescript
import { map, uniq, keyBy } from "lodash";

// Here, sql.query() stands for any raw SQL helper that returns rows.
async function loadUsers(ids: string[]): Promise<User[]> {
  return sql.query("SELECT * FROM users WHERE id = ANY($1)", [ids]);
}

async function loadTopics(ids: string[]): Promise<Topic[]> {
  return sql.query("SELECT * FROM topics WHERE id = ANY($1)", [ids]);
}

async function loadComments(ids: string[]): Promise<Comment[]> {
  return sql.query("SELECT * FROM comments WHERE id = ANY($1)", [ids]);
}

// Loads data using just 3 SQL queries.
app.get("/comments", async (req, res) => {
  const commentIDs = String(req.query.ids).split(",");
  const comments = keyBy(await loadComments(commentIDs), "id");
  const topicIDs = uniq(map(comments, (comment) => comment.topic_id));
  const topics = keyBy(await loadTopics(topicIDs), "id");
  const userIDs = uniq([
    ...map(comments, (comment) => comment.creator_id),
    ...map(topics, (topic) => topic.creator_id),
  ]);
  const users = keyBy(await loadUsers(userIDs), "id");
  res.json(
    map(comments, (comment) => ({
      comment,
      commentCreator: users[comment.creator_id],
      topic: topics[comment.topic_id],
      topicCreator: users[topics[comment.topic_id].creator_id],
    }))
  );
});
```

Look at this spaghetti mess. The code is very coupled. The root of the problem is clear: we think in terms of lists, and the code encourages us to "accumulate" lists manually.
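The way out of "accumulating lists by hand" is to let a loader do the accumulation. The toy sketch below illustrates that idea under explicit assumptions: it is not Ent Framework's actual code, and `ToyBatchLoader` and its `fetchMany` callback are hypothetical names. Each `load(id)` call returns a promise for one object; all calls made before the next timer tick are collected into one batch, duplicate IDs are coalesced, and the batch is fetched with a single `fetchMany()` call (which is where one `WHERE id = ANY($1)` query would go).

```typescript
// A toy batching + coalescing loader (illustration only; hypothetical API).
// Every load(id) issued before the next timer tick joins one batch, and the
// batch is fetched with a single fetchMany() call.

type FetchMany<V> = (ids: string[]) => Promise<Map<string, V>>;

interface Waiter<V> {
  promise: Promise<V | undefined>;
  resolve: (value: V | undefined) => void;
  reject: (error: unknown) => void;
}

class ToyBatchLoader<V> {
  private queue = new Map<string, Waiter<V>>();
  private scheduled = false;
  private readonly fetchMany: FetchMany<V>;

  constructor(fetchMany: FetchMany<V>) {
    this.fetchMany = fetchMany;
  }

  load(id: string): Promise<V | undefined> {
    // Coalescing: the same ID within one batch shares one promise.
    const existing = this.queue.get(id);
    if (existing) {
      return existing.promise;
    }
    let resolve!: (value: V | undefined) => void;
    let reject!: (error: unknown) => void;
    const promise = new Promise<V | undefined>((res, rej) => {
      resolve = res;
      reject = rej;
    });
    this.queue.set(id, { promise, resolve, reject });
    if (!this.scheduled) {
      this.scheduled = true;
      // A 0 ms timer fires after pending microtasks, so async functions that
      // resume in the meantime can still add their IDs to this batch.
      setTimeout(() => void this.flush(), 0);
    }
    return promise;
  }

  private async flush(): Promise<void> {
    const batch = this.queue;
    this.queue = new Map(); // loads issued from now on form the next batch
    this.scheduled = false;
    try {
      const rows = await this.fetchMany(Array.from(batch.keys()));
      batch.forEach((waiter, id) => waiter.resolve(rows.get(id)));
    } catch (error) {
      batch.forEach((waiter) => waiter.reject(error));
    }
  }
}
```

Because each dependency level (comments, then topics, then users) resolves before the next level's `load()` calls are issued, code written per object still ends up with one fetch per level. The sketch deliberately leaves out caching, privacy checks and shard routing.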
### Ent Framework Approach: Automatic Batching

Now let's see what happens if we stop thinking in terms of lists and, instead, switch to the "per individual object" paradigm.

```typescript
// Still using just 3 SQL queries. But wait a second...
app.get("/comments", async (req, res) => {
  const commentIDs = String(req.query.ids).split(",");
  res.json(
    await Promise.all(
      commentIDs.map(async (commentID) => {
        const comment = await EntComment.loadX(req.vc, commentID);
        const topic = await EntTopic.loadX(req.vc, comment.topic_id);
        const [commentCreator, topicCreator] = await Promise.all([
          EntUser.loadX(req.vc, comment.creator_id),
          EntUser.loadX(req.vc, topic.creator_id),
        ]);
        return { comment, commentCreator, topic, topicCreator };
      })
    )
  );
});
```

All calls to `uniq()`, `keyBy()` and `map()` are gone. We now use only `loadX(vc, id)`, which accepts an individual ID and returns an individual Ent. And still, it runs only 3 SQL queries under the hood:

```sql
SELECT * FROM comments WHERE id IN(...);
SELECT * FROM topics WHERE id IN(...);
SELECT * FROM users WHERE id IN(...);
```

* **Batching:** Ent Framework recognizes that the `loadX()` calls happen in concurrent Promises and batches them together intelligently.
* **Coalescing:** in case multiple `loadX(vc, id)` calls try to load the same Ent by the same ID, Ent Framework coalesces those calls into one.
* **Caching:** if enabled, an Ent loaded in some VC remains in the VC's cache, so the next time it's requested, the Ent is returned from the cache directly. Ents are immutable JS objects, which simplifies things even further.

{% hint style="info" %}
In fact, Ent Framework does similar batching not only for `loadX()`. It batches all other calls too, including inserts, updates, deletes and even more complicated expression-based multi-row selects.
{% endhint %}

To learn more about batching, "parallel Promises", and how the event loop works in Node, check out the [loaders-and-custom-batching](https://docs.ent-framework.net/advanced/loaders-and-custom-batching "mention") article.

## Helper Loading Methods

Each Ent is an immutable object, which means that you can't change its fields after loading it from the DB. But you can add helper methods to simplify things like loading.

Let's simplify the above example even further by adding `topic()` and `creator()` helper methods to the Ent classes directly.

```typescript
class EntComment extends ... {
  async topic() {
    return EntTopic.loadX(this.vc, this.topic_id);
  }

  async creator() {
    return EntUser.loadX(this.vc, this.creator_id);
  }
}

class EntTopic extends ... {
  async creator() {
    return EntUser.loadX(this.vc, this.creator_id);
  }
}

app.get("/comments", async (req, res) => {
  const commentIDs = String(req.query.ids).split(",");
  res.json(
    await mapJoin(commentIDs, async (commentID) => {
      const comment = await EntComment.loadX(req.vc, commentID);
      const topic = await comment.topic();
      const [commentCreator, topicCreator] = await Promise.all([
        comment.creator(),
        topic.creator(),
      ]);
      return { comment, commentCreator, topic, topicCreator };
    })
  );
});
```

{% hint style="info" %}
`mapJoin(arr, fn)` is a simple wrapper which calls `Promise.all(arr.map(fn))`.
{% endhint %}

Now it's the responsibility of each Ent to load the related data. As before, this produces the same 3 DB queries:

```sql
SELECT * FROM comments WHERE id IN(...);
SELECT * FROM topics WHERE id IN(...);
SELECT * FROM users WHERE id IN(...);
```

In traditional ORMs, such helper loading methods are added to the classes automatically. Ent Framework doesn't do it and requires you to write a bit of boilerplate. Why? For general purpose use cases, we may need not one, but 2 methods for each field, like `creator()` and `creatorNullable()`, which is not elegant.
This is because foreign keys do not work reliably across microshards, so we must always be prepared for a referenced Ent to be missing from the database, even when the field pointing to it is technically non-nullable. Luckily, in practice, it is not hard at all to add such methods manually, so we don't lose much here.

## Batching vs. JOINs

In traditional SQL and in many ORMs, people use JOINs to minimize the number of queries they send to the database engine. Although JOINs have advantages, they are also problematic:

1. One cannot do JOINs across microshards or machines.
2. JOINs encourage people to write highly coupled code, similar to the 1st example on this page.
3. JOINs generally can't run their subqueries in parallel.

Ent Framework's automatic batching can be treated as an alternative to JOINs. It doesn't have any of the above problems. Plus (and more importantly), the calls are batched across the entire async call stack, which means that you can easily split the code into independent abstraction layers.

Stop thinking in terms of lists. Start thinking in terms of an individual Ent and its behavior.

{% hint style="info" %}
Of course, in some cases, we still want to run JOINs. Ent Framework exposes a low-level API to access the underlying DB, so you can craft and run arbitrary queries. It also provides a `Loader` abstraction and a framework to build your own custom batching strategies. We'll discuss all of it in detail in the advanced section.
{% endhint %}

# Automatic Batching Examples

In the previous chapter, we talked about how Ent Framework batches calls. Let's look at some more examples.

## Batching of load\*() Calls

The following code will produce only one SQL query:

```typescript
await Promise.all([
  EntTopic.loadX(vc, "123"),
  EntTopic.loadX(vc, "456"),
  EntTopic.loadX(vc, "789"),
]);
```

SQL query produced under the hood:

```sql
SELECT * FROM topics WHERE id IN(...)
```

## Batching of insert\*() Calls

Since `insertReturning()` first inserts the Ent into the database and then loads the inserted data back, the following code will produce 2 SQL queries:

```typescript
await Promise.all([
  EntTopic.insertReturning(vc, { ... }),
  EntTopic.insertReturning(vc, { ... }),
  EntTopic.insertReturning(vc, { ... }),
]);
```

SQL queries produced:

```sql
INSERT INTO topics (...) VALUES ... RETURNING id;
SELECT * FROM topics WHERE id IN(...);
```

Even if `insertReturning()` is called in nested functions, Ent Framework will still batch the calls properly and produce just 2 queries:

```typescript
async function insertTopicsBatch(n: number) {
  await mapJoin(range(n), async () => EntTopic.insertReturning(vc, { ... }));
}

// ...

await Promise.all([
  insertTopicsBatch(42),
  insertTopicsBatch(101),
]);
```

## Batching of Update, Delete and All Other Calls

All Ent Framework API calls are batched the same way as described above.

## De-batching and Deadlocks

As in most MVCC databases, in PostgreSQL, reads never block writes, and writes never block reads. Still, if two clients update the same row, one client has to wait for the other one's transaction to finish. If two clients update rows in a different order, there is a chance of [deadlocks](https://www.postgresql.org/docs/current/runtime-config-locks.html).

E.g. imagine Alice updates row A and then row B in the same transaction, whilst Bob first updates B and then A. In this case, Alice blocks on row B, which Bob's transaction has already locked, while Bob blocks on row A, which Alice's transaction has already locked. Neither can proceed, so PostgreSQL detects the deadlock and cancels one of the transactions. (Notice that this situation never happens when both Alice and Bob update rows A and B in the same order.)

Deadlocks may occur during automatic query batching. It is rare (especially since Ent Framework always orders the rows being updated in a consistent way, e.g. by
id), but may still happen. In case of such a deadlock, when Ent Framework knows that it's safe to retry the write, it performs *de-batching*: it splits the batched query into individual queries and runs them in parallel, independently. This solves the deadlock problem entirely, in exchange for a very rare slowdown of mass insert, update or delete operations.

# Ent API: select() by Expression

The previous chapters explained how to load an Ent by its ID using the `load*()` API. Loading by ID is the most basic operation, and it is usually the most common one in the code as well. Now, let's talk about some more complicated ways of loading.

TL;DR:

```typescript
const comments = await EntComment.select(
  vc,
  {
    topic_id: "123",
    created_at: { $gte: new Date(Date.now() - 1000 * 3600 * 24) }, // last 24 hours
  },
  100, // limit
  [{ created_at: "ASC" }] // order by
);
```

## Dry Boring Theory

Below comes a bit of theory, so fasten your seatbelt.

In graph terms, where each Ent is a **node**, an Ent's field that points to the ID of another Ent represents an **edge**. (In relational databases, people typically use the term "foreign key" for it.) We often refer to it as a **field edge**. Traversing such edges is straightforward: you simply load another Ent by the ID obtained from a field of the current Ent. For example, `EntComment#topic_id` and `EntTopic#creator_id` are field edges.

From a different perspective, traversing a field edge can be seen as "going from a child Ent to a parent Ent" (for example, from `EntComment` to its owning `EntTopic`).
In other words, it's a **child-to-parent traversal**, or a **many-to-one relationship**:

{% @mermaid/diagram content="classDiagram
  direction BT
  class EntComment\["EntComment<br><small>(child)</small>"]
  EntComment : topic\_id
  EntComment : creator\_id
  class EntTopic\["EntTopic<br><small>(parent)</small>"]
  EntTopic : creator\_id
  class EntUser\["EntUser<br><small>(grandparent)</small>"]
  EntComment --> EntTopic : <small>field<br>edge</small>
  EntTopic --> EntUser : <small>field<br>edge</small>
  EntComment --> EntUser : <small>field<br>edge</small>" %}

Nothing too new yet, right? Just regular relational theory so far.

How do we go in the opposite direction, performing a **parent-to-children traversal** in a **one-to-many relationship**?

To accomplish this, Ent Framework provides (surprise!) a `select()` primitive. It allows you to fetch Ents from the database using an arbitrary expression, including expressions that constrain which parent Ent ID the selected Ents should have:

```typescript
const comments = await EntComment.select(
  vc,
  { topic_id: "123" }, // "load all children comments of topic 123"
  100, // limit
  [{ created_at: "ASC" }] // order by
);
```

In production databases with millions of Ents, it's assumed that the relevant table has the necessary index to run such queries efficiently; in the above example:

```sql
CREATE INDEX comments_topic_id_created_at ON comments(topic_id, created_at);
```

Nothing new again. Or is there?..
{% @mermaid/diagram content="classDiagram
  direction BT
  class EntComment1\["EntComment<br><small>(child)</small>"]
  EntComment1 : topic\_id
  class EntComment2\["EntComment<br><small>(child)</small>"]
  EntComment2 : topic\_id
  class EntComment3\["EntComment<br><small>(child)</small>"]
  EntComment3 : topic\_id
  class EntTopic\["EntTopic<br><small>(parent)</small>"]
  EntTopic : id
  EntComment1 <-- EntTopic : <small>???</small>
  EntComment1 --> EntTopic : <small>field<br>edge</small>
  EntComment2 <-- EntTopic : <small>???</small>
  EntComment2 --> EntTopic : <small>field<br>edge</small>
  EntComment3 <-- EntTopic : <small>???</small>
  EntComment3 --> EntTopic : <small>field<br>edge</small>" %}

Let's think about those `???` on the diagram. To traverse edges in a graph in both directions, the edges must be bi-directional (or, equivalently, there must be pairs of opposite edges). In the graph we discussed earlier, the child-to-parent direction of an edge is represented by a field edge. But what corresponds to `???`, the opposite **parent-to-children direction** of that edge?

This `???`, dear friends, is the **database index** (or an index prefix, which is `topic_id` in the example). In fact, as we hinted above, without such an index, the queries will just blow up.

This distinction between graph edge directions is crucial to understand: for free traversal of the graph, both **field edges** and **indexes** are absolutely essential.

* By defining a DB foreign key on an Ent, you define a field edge, which represents the child-to-parent direction in the graph.
* By defining a DB index, you define the opposite direction of that edge, which is the parent-to-children direction.

Modern database engines are pretty good at managing indexes.
You can add them without acquiring write locks on the tables (`CREATE INDEX CONCURRENTLY`), and you can also add more field edges (aka fields with foreign keys) to a table with no downtime, to reference some other Ent from an existing one.

## What About Microsharding and Horizontal Scaling?

The point of view described above works straightforwardly when your database is monolithic. Scaling your app introduces more complexity, though, because microshards get involved in the traversal process. Luckily, we can still mainly rely on parent-to-children indexes. And this is where the theory pays off.

When loading children of a parent Ent, the children might be distributed across multiple microshards. A naive approach would be to query all microshards with the exact same query (Ent IDs are globally unique) and then merge the results, but of course that would blow up the DB nodes.

Therefore, before Ent Framework executes the actual SELECT queries in parallel on multiple nodes and merges their results, it first determines the **minimal set of microshards** that need to be queried; in the vast majority of cases, this is just **one microshard**. The mechanisms behind this are called [Inverses](https://docs.ent-framework.net/scalability/inverses-cross-shard-foreign-keys) and [Ent Colocation](https://docs.ent-framework.net/scalability/shard-affinity-ent-colocation), and we'll explore them in detail later, in the advanced sections.

For now, all you need to know is that there is a magical subsystem in Ent Framework called Inverses which, given a parent ID (e.g. an EntTopic ID), returns the list of microshards where the children Ents (e.g. EntComment) may **or may not** reside. This "may not" is important: cross-shard writes are not transactional, so sometimes (rarely) slightly more candidate microshards may be returned, but **never fewer**. In practice, this causes no problems for business logic: the "excess" microshards, when queried, just return 0 children Ents.
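To make the "more candidates, but never fewer" guarantee concrete, here is a minimal in-memory sketch. It is not how Ent Framework implements Inverses. `InverseIndex`, `selectChildren()`, `CommentRow` and `Shard` are hypothetical names, and each microshard is modeled as a plain array rather than a PostgreSQL database. The sketch queries only the candidate microshards returned for a parent ID, runs those queries in parallel, and merges the results.

```typescript
// Hypothetical in-memory model for illustration only; NOT Ent Framework's API.
interface CommentRow {
  id: string;
  topic_id: string;
}

// Each microshard is modeled as a plain array of rows.
type Shard = CommentRow[];

// Hypothetical inverse index: parent ID -> numbers of the microshards that MAY
// contain its children. It is allowed to over-report, but never to under-report.
class InverseIndex {
  private readonly map = new Map<string, Set<number>>();

  add(parentID: string, shardNo: number): void {
    let shardNos = this.map.get(parentID);
    if (!shardNos) {
      shardNos = new Set<number>();
      this.map.set(parentID, shardNos);
    }
    shardNos.add(shardNo);
  }

  candidateShards(parentID: string): number[] {
    const shardNos = this.map.get(parentID);
    return shardNos ? Array.from(shardNos).sort((a, b) => a - b) : [];
  }
}

// Queries only the candidate microshards, in parallel, and merges the results.
async function selectChildren(
  shards: Map<number, Shard>,
  inverses: InverseIndex,
  topicID: string,
): Promise<CommentRow[]> {
  const perShard = await Promise.all(
    inverses.candidateShards(topicID).map(async (shardNo) => {
      const rows: Shard = shards.get(shardNo) ?? [];
      // An "excess" candidate microshard simply contributes zero rows.
      return rows.filter((row) => row.topic_id === topicID);
    }),
  );
  return ([] as CommentRow[]).concat(...perShard);
}

// Microshard 2 is an excess candidate for topic "t1": imagine a non-transactional
// cross-shard write that left a stale inverse entry behind.
const shards = new Map<number, Shard>([
  [1, [{ id: "c1", topic_id: "t1" }]],
  [2, []],
  [3, [{ id: "c2", topic_id: "t2" }]],
]);
const inverses = new InverseIndex();
inverses.add("t1", 1);
inverses.add("t1", 2);
inverses.add("t2", 3);

selectChildren(shards, inverses, "t1").then((rows) =>
  console.log(rows.map((row) => row.id)), // [ 'c1' ]
);
```

An excess candidate microshard only adds an empty result, so over-reporting costs one extra query but never changes the answer. Under-reporting, in contrast, would silently drop children, which is why the guarantee is "never fewer".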
## Ent.select(vc, { ... }, limit, order): Ent\[]

The `select()` API uses a simple query language. If a plain object is passed, it combines all the specified field constraints with AND:

```typescript
const comments = await EntComment.select(
  vc,
  {
    topic_id: ["123", "456"], // matches any of these topics
    created_at: { $gte: new Date(Date.now() - 1000 * 3600 * 24) }, // last 24 hours
  },
  100, // limit
  [{ created_at: "ASC" }] // order by
);
```

The full list of operations includes:

* equality and "one of array elements" implicit operators (see the examples above)
* logical: `$or`, `$and`, `$not`
* comparison: `$ne`, `$lte`, `$lt`, `$gte`, `$gt`
* `$overlap` (useful for array fields, typically backed by a PostgreSQL GIN index)
* `$isDistinctFrom` (for NULL-safe comparisons)
* `$literal` (to run a custom SQL sub-expression)

These operations can be nested in any way, but it's important to ensure that the SQL engine actually uses an appropriate database index for efficiency.

If your project uses microsharding, one of the top-level fields in the `select()` expression must match a parent ID or an array of parent IDs to help Ent Framework identify the relevant microshards. Notice that we used `topic_id` for this purpose in the example above. There's no magic here: Ent Framework has to determine somehow which microshards are involved. Alternatively, you can use the special `$shardOfID` operator to explicitly provide this hint in the query.

For illustrative purposes, below is a giant `select()` expression from one of Ent Framework's unit tests. It's generally obvious how the operations work (as opposed to e.g.
the Elasticsearch query language, BTW):

```typescript
const ents = await EntSome.select(
  vc,
  {
    name: ["aa", "bb"], // matches one of
    some_flag: true,
    $or: [
      { name: "aa" },
      { name: "bb" },
      { url_name: [] }, // will never match
      { url_name: [null, "zzz"] }, // null-safe
    ],
    $and: [
      { name: ["aa", "bb"] },
      { name: { $ne: "kk" } },
      { name: { $isDistinctFrom: "dd" } },
      { url_name: { $isDistinctFrom: null } }, // null-safe !=
      { url_name: { $ne: ["kk", null] } }, // null-safe too
      { url_name: { $ne: [] } }, // will always match
      { $literal: ["? > '2'", "5"] },
      { name: { $lte: "y", $gte: "a" } },
      { $overlap: [id1, id2, id3] } // most likely you want a GIN index here!
    ],
    $not: {
      name: "yy",
      $literal: ["length(name) < ?", 5], // custom SQL expression
    },
    // Optional; it's for the cases when you don't really have a field edge,
    // so Ent Framework can't infer microshards from the query.
    $shardOfID: "12345",
  },
  100,
  [{ name: "ASC" }, { url_name: "DESC" }, { $literal: ["1=?", 2] }]
);
```

For more details, see the TypeScript `Where<...>` definition in [types.ts](https://github.com/clickup/ent-framework/blob/main/src/types.ts).

## Batching of select() Calls

As with everything else in Ent Framework, when multiple `select()` calls run in parallel, they are batched into one giant SQL `UNION ALL` query. The following code will produce only one SQL query:

```typescript
await Promise.all([
  EntComment.select(vc, { topic_id: "42" }, 10),
  EntComment.select(vc, { creator_id: "101" }, 20),
]);
```

SQL query produced under the hood (with some simplifications):

```sql
(SELECT * FROM comments WHERE topic_id='42' LIMIT 10)
UNION ALL
(SELECT * FROM comments WHERE creator_id='101' LIMIT 20)
```

Sometimes, `select()` calls are expected to be relatively slow, and we don't want to batch them; instead, we prefer to run them in parallel, in different DB connections. To do so, you can just inject an "event loop spin" barrier:

```typescript
// Never produces a UNION ALL SQL query.
await Promise.all([
  EntComment.select(vc, { topic_id: "42" }, 10),
  // Spinning the event loop once (via setImmediate) makes this select() miss
  // the first batch, so it runs as a separate query.
  new Promise(setImmediate).then(
    async () => EntComment.select(vc, { creator_id: "101" }, 20),
  ),
]);
```

## JOIN, WITH, FROM and Subqueries, Planner Hints

In addition to database-independent features, the `select()` call also supports engine-specific customizations via its last optional argument:

```typescript
const comments = await EntComment.select(
  vc,
  { creator_id: "101" },
  20, // limit
  undefined, // order
  { joins, ctes, from, hints }, // untyped, but of type SelectInputCustom
);
```

Read more in:

* [postgresql-specific-features](https://docs.ent-framework.net/advanced/postgresql-specific-features "mention")
* [query-planner-hints](https://docs.ent-framework.net/a