# Architecture

This document explains the **big picture** of Universal Emoji Parser so a new contributor (human or agent) can be productive quickly. For day-to-day commands see [Development Commands](DEVELOPMENT_COMMANDS.md). For language-specific rules see [Standards](STANDARDS.md).

## High-level model

```
              ┌─────────────────────────────────┐
              │  src/index.ts (public API)      │
              │  ───────────────────────────    │
              │  uEmojiParser.parse(text, opts) │
              │  uEmojiParser.parseToHtml       │
              │  uEmojiParser.parseToUnicode    │
              │  uEmojiParser.parseToShortcode  │
              │  emojiLibJsonData               │
              │  DEFAULT_EMOJI_CDN              │
              └────────┬───────────────┬────────┘
                       │               │
     ┌─────────────────┘               └─────────────────┐
     ▼                                                   ▼
┌──────────────────────────┐                ┌─────────────────────────┐
│ src/lib/emoji-lib.json   │                │ @twemoji/parser         │
│ (1906 entries)           │                │ (only runtime dep)      │
│ shortcode → EmojiType    │                │ finds emoji entities    │
│ unicode → EmojiType      │                │ → CDN URLs              │
└──────────┬───────────────┘                └─────────────────────────┘
           │
           │ generated offline by
           ▼
┌──────────────────────────────────────────┐
│ test/prepareEmojiLibJson.test.ts         │
│ (it.skip, opt-in regeneration)           │
│ emojilib + unicode-emoji-json            │
│ + EMOJIS_SPECIAL_CASES overrides         │
│ → src/lib/emoji-lib-output.json          │
│ → (review + copy to emoji-lib.json)      │
└──────────────────────────────────────────┘
```

The runtime is **two files**: `src/index.ts` (~135 lines) and `src/lib/emoji-lib.json` (data). Everything else is type definitions, tests, or build/release infrastructure.

## Project structure

```
universal-emoji-parser/
├── AGENTS.md                        # Single source of truth for AI agents
├── CLAUDE.md → AGENTS.md            # Symlink (do not edit directly)
├── README.md                        # Human-facing intro and usage docs
├── LICENSE                          # MIT
├── package.json                     # Scripts, deps, version, engines.node ≥ 20.19
├── tsconfig.json                    # Strict TS config; tests + src; emits .d.ts via `build:tsc`
├── tsconfig.build.json              # `tsc`/ts-loader: compile `src/` only (`rootDir`)
├── webpack.config.js                # commonjs2 output, ts-loader → `tsconfig.build.json`
├── eslint.config.mjs                # ESLint flat config + Prettier integration
├── .prettierrc                      # semi:false, singleQuote:true, trailingComma:'es5'
├── .editorconfig                    # 2-space indent, LF, max 120 cols
├── .ncurc.json                      # npm-check-updates (optional `reject` list)
├── .babelrc                         # babel-preset-env + transform-runtime (legacy, kept for compat)
├── .npmignore                       # Trims source/test/config from npm tarball
│
├── src/
│   ├── index.ts                     # The public API; see "src/index.ts" below
│   └── lib/
│       ├── type.ts                  # EmojiType, EmojiParseOptionsType, UEmojiParserType
│       ├── emoji-lib.json           # The catalog (committed; ~543 KB; 1906 entries)
│       └── emoji-lib-output.json    # Last regeneration output (git-ignored)
│
├── test/
│   ├── main.test.ts                 # Integration tests for the public methods
│   ├── emojiLibJson.test.ts         # Validates catalog metadata + count
│   └── prepareEmojiLibJson.test.ts  # `it.skip`-guarded regenerator
│
├── dist/                            # Webpack output (git-ignored, npm-published)
│   ├── index.js
│   ├── index.d.ts
│   └── *.map
│
├── docker/local/                    # Dev container Docker Compose + Dockerfile
├── .devcontainer/                   # VS Code Dev Container config (uses docker/local/)
│
├── .github/
│   ├── workflows/
│   │   ├── code_check.yml                               # PR: lint + format + test
│   │   ├── pull_request_check.yml                       # PR: title/body length + size labels
│   │   ├── release_and_publish.yml                      # PR merge → bump version, build, publish
│   │   ├── check_packages_versions.yml                  # Weekly: open deps PR via ncu
│   │   ├── check_and_merge_packages_upgrades_pr.yml     # Auto-merge that PR if green
│   │   ├── check_branches_state.yml                     # Stale branch report
│   │   └── cleanup_caches.yml                           # GHA cache GC
│   └── scripts/
│       ├── get_github_release_log.sh                    # Build release notes from git log
│       └── get_packages_upgrades.sh                     # Format ncu output for the PR body
│
├── .agents/                         # AI agent skills, commands, subagents
│   ├── README.md
│   ├── skills/
│   ├── commands/
│   └── agents/
├── .claude/ → .agents               # Symlink (Claude Code looks here natively)
│
├── docs/                            # This documentation
└── tmp/                             # Git-ignored scratch space
```

## `src/index.ts` walkthrough

### Imports

```ts
import { EmojiLibJsonType, EmojiParseOptionsType, EmojiType, TwemojiEntity, UEmojiParserType } from './lib/type'
import emojiLibJson from './lib/emoji-lib.json'
import { parse } from '@twemoji/parser'
```

`emoji-lib.json` is imported as a typed JSON module (`resolveJsonModule: true` in `tsconfig.json`) and cast to `EmojiLibJsonType`. There is **no** runtime construction of the catalog; it is literally a `.json` import.

### Constants

```ts
export const DEFAULT_EMOJI_CDN: string = 'https://cdn.jsdelivr.net/gh/jdecked/twemoji@latest/assets/svg/'
export const emojiLibJsonData: EmojiLibJsonType = emojiLibJson
```

`DEFAULT_EMOJI_CDN` is the URL prefix Twemoji's `parse()` produces. Custom CDNs work by string-replacing this prefix in `__parseEmojiToHtml`.

### The `uEmojiParser` object

Seven methods, each described below. `getEmojiObjectByShortcode` and `getDefaultOptions` are public (typed in `UEmojiParserType`) but rarely used directly.

#### `getEmojiObjectByShortcode(shortcode)`

Two-tier lookup:

1. Strip `:` from the shortcode
2. **Direct hit** on `emojiLibJsonData[shortcode]`: fast path for canonical slugs (`smiling_face_with_sunglasses`)
3. **Keyword scan**: `Object.keys(...).find(k => emojiLibJsonData[k].keywords.includes(shortcode))`, the fallback for dialects like `:thumbsup:` (Slack/legacy) that aren't the slug

This is what makes Slack-style aliases coexist with the canonical slugs in a single catalog.

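The lookup order described above can be sketched in isolation. This is a minimal illustration, not the library's code: `ToyEmoji`, `toyCatalog`, and `lookupByShortcode` are hypothetical names, the two catalog entries and their keywords are made up for the example, and a `Map` stands in for the JSON object so that inherited property names cannot produce false hits.

```ts
// Hypothetical, trimmed-down entry shape (the real EmojiType has more fields).
interface ToyEmoji {
  slug: string
  char: string
  keywords: string[]
}

// Illustrative two-entry catalog keyed by emoji character.
const toyCatalog = new Map<string, ToyEmoji>([
  ['😎', { slug: 'smiling_face_with_sunglasses', char: '😎', keywords: ['smiling_face_with_sunglasses', 'cool'] }],
  ['👍', { slug: 'thumbs_up', char: '👍', keywords: ['thumbs_up', 'thumbsup', '+1'] }],
])

function lookupByShortcode(shortcode: string): ToyEmoji | undefined {
  // Step 1: strip the colons, so ":thumbsup:" becomes "thumbsup".
  const name = shortcode.replace(/:/g, '')

  // Step 2: direct hit on the catalog key.
  const direct = toyCatalog.get(name)
  if (direct) return direct

  // Step 3: fall back to a linear scan over every entry's keywords.
  return Array.from(toyCatalog.values()).find((entry) => entry.keywords.includes(name))
}

console.log(lookupByShortcode(':thumbsup:')?.char)
console.log(lookupByShortcode(':smiling_face_with_sunglasses:')?.slug)
```

The fallback scan costs one pass over the catalog per miss, which is why the direct hit is tried first.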
#### `getDefaultOptions(options)`

Merges user options with defaults. Subtle detail: it uses `Object.getOwnPropertyDescriptor(options, 'emojiCDN')` to distinguish "explicitly undefined" from "missing". For booleans (`parseToHtml`, `parseToUnicode`, `parseToShortcode`) it just calls `Boolean(...)` because `undefined → false` is the right default for those.

Defaults: `parseToHtml: true`, `parseToUnicode: false`, `parseToShortcode: false`, `emojiCDN: undefined`.

#### `__parseEmojiToHtml(text, emojiCDN)`

Internal. The function is exported, but the `__` prefix is a JS-style "please don't call this" marker, not a hard private:

1. Run `@twemoji/parser`'s `parse(text)` to get `Array<TwemojiEntity>` (each has `text`, `url`, `indices`, `type`)
2. Track `entitiesFound` to avoid replacing the same emoji twice
3. For each entity: rewrite the URL prefix if `emojiCDN` is set, then `text.replace(new RegExp(entity.text, 'g'), <img...>)` to swap all occurrences

Output: `<img class="emoji" alt="<unicode>" src="<url>"/>`. See [API Reference → HTML output contract](API_REFERENCE.md).

#### `parseToHtml(text, emojiCDN?)`

Convenience: runs `parseToUnicode` first (so `:smile:` becomes `🙂` first), then hands off to `__parseEmojiToHtml`. It **always** runs unicode resolution first because Twemoji only sees unicode characters.

#### `parseToUnicode(text)`

Match `/:(\w+):/g` to find shortcodes, look each one up via `getEmojiObjectByShortcode`, replace with `emoji.char`. Linear scan over matches; one regex per shortcode found.

#### `parseToShortcode(text)`

Builds a single alternation regex from `Object.keys(emojiLibJsonData).join('|')`, escapes the `*️⃣` keycap (it has special regex characters), then uses `text.matchAll` to find every emoji and replace it with `:slug:`. The escape is load-bearing: without it, the regex compiles but corrupts the keycap match.

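The alternation-regex idea behind `parseToShortcode` can be sketched on a toy catalog. This is an illustration under stated assumptions, not the library's implementation: `toyCatalog`, `escapeRegExp`, and `toShortcodes` are hypothetical names, the slugs are illustrative, every key is escaped generically instead of special-casing the keycap, and `String.prototype.replace` with a callback stands in for the `matchAll` loop. Sorting keys longest-first is an extra precaution added here so that multi-code-point sequences win over their prefixes.

```ts
// Illustrative catalog keyed by emoji character; the keycap is written with
// escapes so its three code points (*, U+FE0F, U+20E3) are unambiguous.
const toyCatalog: Record<string, { slug: string }> = {
  '😎': { slug: 'smiling_face_with_sunglasses' },
  '👍': { slug: 'thumbs_up' },
  '*\uFE0F\u20E3': { slug: 'keycap_asterisk' },
}

// Escape regex metacharacters so a key such as "*..." is matched literally.
const escapeRegExp = (s: string): string => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')

function toShortcodes(text: string): string {
  const alternation = Object.keys(toyCatalog)
    .sort((a, b) => b.length - a.length) // longest keys first (assumption of this sketch)
    .map(escapeRegExp)
    .join('|')
  const pattern = new RegExp(alternation, 'g')

  // Replace each matched emoji with its :slug:, leaving everything else untouched.
  return text.replace(pattern, (char) => {
    const entry = toyCatalog[char]
    return entry ? `:${entry.slug}:` : char
  })
}

console.log(toShortcodes('ok 👍 and *\uFE0F\u20E3'))
```

A plain `*` in the input is left alone, because the escaped pattern only matches the full keycap sequence.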
#### `parse(text, options)`

The dispatcher:

```ts
if (typeof text !== 'string') throw new Error('The text parameter should be a string.')
if (!opts.parseToHtml && opts.parseToShortcode) text = parseToShortcode(text)
if (opts.parseToHtml || opts.parseToUnicode) text = parseToUnicode(text)
if (opts.parseToHtml) text = __parseEmojiToHtml(text, opts.emojiCDN)
```

Order matters: shortcode → unicode → HTML. Each stage is a no-op if its option is off.

### CommonJS reattachment

```ts
export default uEmojiParser
module.exports = uEmojiParser
module.exports.emojiLibJsonData = emojiLibJsonData
module.exports.DEFAULT_EMOJI_CDN = DEFAULT_EMOJI_CDN
```

Webpack's `libraryTarget: 'commonjs2'` exposes the default export as `module.exports.default`, which would break `require('universal-emoji-parser').parse(...)`. The three `module.exports` assignments at the bottom flatten the API so `require` and `import` users see the same shape. Every `export const` declared at the top of `src/index.ts` must be reattached here too, otherwise it ships as `undefined` to CommonJS consumers (regression-tested in `test/exports.test.ts`).

## Type model: `src/lib/type.ts`

```ts
export interface EmojiType {
  name: string // "smiling face with sunglasses"
  slug: string // "smiling_face_with_sunglasses" (canonical shortcode)
  group: string // "Smileys & Emotion"
  emoji_version: string // "1.0"
  unicode_version: string // "1.0"
  skin_tone_support: boolean
  char: string // "😎", the unicode literal
  keywords: Array<string> // ["smiling_face_with_sunglasses", "cool", "summer", ...]
  keyword_index_found?: number // Used by the regenerator only; don't rely on it
}

export interface EmojiLibJsonType {
  [key: string]: EmojiType // keyed by emoji char (the unicode literal)
}

export interface EmojiParseOptionsType {
  emojiCDN?: string
  parseToHtml?: boolean
  parseToUnicode?: boolean
  parseToShortcode?: boolean
}

export interface UEmojiParserType {
  getEmojiObjectByShortcode: (shortcode: string) => EmojiType | undefined
  getDefaultOptions(options?: EmojiParseOptionsType): EmojiParseOptionsType
  __parseEmojiToHtml(text: string, emojiCDN?: string): string
  parseToHtml: (text: string, emojiCDN?: string) => string
  parseToUnicode: (text: string) => string
  parseToShortcode: (text: string) => string
  parse: (text: string, options?: EmojiParseOptionsType) => string
}

export interface TwemojiEntity {
  url: string
  indices: Array<number>
  text: string
  type: string
}
```

The catalog is **keyed by unicode literal**, not by slug. That's because the regenerator pipeline starts from `unicode-emoji-json` (whose keys are unicode) and merges keywords from `emojilib` (whose keys are also unicode). Looking up by slug requires the two-tier scan in `getEmojiObjectByShortcode`.

## The regeneration pipeline

`test/prepareEmojiLibJson.test.ts` is the **only** sanctioned way to rebuild `src/lib/emoji-lib.json`. The test is `it.skip`-guarded so it never runs on CI:

1. Load `unicode-emoji-json` (1906 emojis with metadata: name, slug, group, version)
2. Load `emojilib` (1898 emojis with curated keyword arrays)
3. For each emoji in `unicode-emoji-json`:
   - Set `char` to the key
   - Use `emojilib` keywords if present, else `[slug]`
   - Ensure the slug is in keywords (unshift if missing)
   - Apply `EMOJIS_SPECIAL_CASES` overrides (include/exclude)
4. **Deduplicate keywords** across emojis. The same keyword can appear on multiple emojis (e.g., `coffee` on `☕` and `🤎`). The algorithm picks the emoji with the lowest `keyword_index_found` (i.e., where the keyword is most prominent) and removes it from the rest. This is O(n²) but only runs at regeneration time.
5. Write to `src/lib/emoji-lib-output.json`

After regeneration:

- Diff `emoji-lib-output.json` vs `emoji-lib.json` to review changes
- Copy the new contents to `emoji-lib.json` (the runtime source)
- Update `TOTAL_EMOJIS` in `emojiLibJson.test.ts` if the count changed
- Commit both files together

See [`/regenerate-emoji-lib`](../.agents/commands/regenerate-emoji-lib.md) for the full workflow.

## Special cases (`EMOJIS_SPECIAL_CASES`)

The regenerator applies hand-curated keyword overrides for a handful of emojis where the upstream `emojilib` keywords are wrong, missing, or collide with another emoji. Current entries:

| Emoji | Include                                | Exclude           | Why                                                                               |
| ----- | -------------------------------------- | ----------------- | --------------------------------------------------------------------------------- |
| `☕`  | `coffee`                               | (none)            | `emojilib` has it, but the dedup loop would otherwise hand `coffee` to `🤎` first |
| `🤎`  | (none)                                 | `coffee`          | Brown heart should not match `:coffee:`                                           |
| `❤️`  | `heart`                                | (none)            | The plain red heart is the canonical `:heart:`                                    |
| `💘`  | (none)                                 | `heart`           | Heart-with-arrow shouldn't steal `:heart:`                                        |
| `👮‍♀️`  | `policewoman`, `female-police-officer` | `legal`, `arrest` | Common Slack aliases; remove ambiguous keywords                                   |
| `✅`  | `white_check_mark`                     | (none)            | GitHub-flavored alias                                                             |
| `⏸️`  | `double_vertical_bar`                  | (none)            | Niche but supported                                                               |

Add new entries by editing `EMOJIS_SPECIAL_CASES` in `prepareEmojiLibJson.test.ts` and regenerating.

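The keyword deduplication step, and why it needs overrides, can be sketched on two entries. This is an illustration only: `ToyEntry` and `dedupeKeywords` are hypothetical names, the keyword arrays are invented, and "lowest `keyword_index_found`" is assumed here to mean the keyword's position in the entry's keyword array. The sketch uses a `Map` in two passes rather than the O(n²) loop the document describes.

```ts
// Hypothetical, trimmed-down entry shape for the sketch.
interface ToyEntry {
  char: string
  keywords: string[]
}

function dedupeKeywords(entries: ToyEntry[]): ToyEntry[] {
  // Pass 1: for each keyword, remember the entry where it appears earliest.
  // On a tie, the entry seen first keeps ownership.
  const owner = new Map<string, { char: string; index: number }>()
  for (const entry of entries) {
    entry.keywords.forEach((keyword, index) => {
      const best = owner.get(keyword)
      if (!best || index < best.index) owner.set(keyword, { char: entry.char, index })
    })
  }

  // Pass 2: keep each keyword only on the entry that owns it.
  return entries.map((entry) => ({
    char: entry.char,
    keywords: entry.keywords.filter((keyword) => owner.get(keyword)?.char === entry.char),
  }))
}

// Invented keyword lists: "coffee" sits at index 2 on the cup but index 1 on the heart.
const deduped = dedupeKeywords([
  { char: '☕', keywords: ['hot_beverage', 'caffeine', 'coffee'] },
  { char: '🤎', keywords: ['brown_heart', 'coffee'] },
])
console.log(deduped)
```

With these invented inputs the brown heart ends up owning `coffee`, which is the kind of outcome an include/exclude override in `EMOJIS_SPECIAL_CASES` exists to correct.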
## Build configuration

### Webpack (`webpack.config.js`)

```js
{
  entry: { index: { import: './src/index.ts' } },
  output: {
    path: 'dist/',
    filename: '[name].js',
    libraryTarget: 'commonjs2', // critical, see "CommonJS reattachment" above
    globalObject: 'this',
  },
  module: { rules: [{ test: /\.tsx?$/, use: 'ts-loader' }] },
  resolve: { extensions: ['.tsx', '.ts', '.js'] },
  optimization: { chunkIds: 'size', minimize: true },
  // CleanWebpackPlugin only on `--mode production`
}
```

Single-entry, single-output. ts-loader runs the TypeScript compiler, with no Babel involvement at build time (the `.babelrc` is legacy; Babel only kicks in if a downstream tool reaches for it).

### TypeScript (`tsconfig.json`)

Highlights:

- `strictNullChecks: true`
- `noImplicitAny: true`
- `noUnusedLocals: true`, `noUnusedParameters: true`
- `declaration: true`: emits `.d.ts` so consumers get types
- `module: 'commonjs'`, `moduleResolution: 'node'`
- `lib: ['es6', 'dom']`: includes DOM types because consumers may use this in the browser
- `resolveJsonModule: true`: required to `import emojiLibJson from './lib/emoji-lib.json'`

`outDir` is `'./dist/'`, but Webpack overrides this; `tsc` is used only via `npm run build:tsc` to emit type declarations.

### npm scripts

| Script                            | Runs                                                                            | Purpose                   |
| --------------------------------- | ------------------------------------------------------------------------------- | ------------------------- |
| `dev`                             | `nodemon src/index.ts`                                                          | Watch-run a smoke script  |
| `build`                           | `webpack --mode production --progress`                                          | Production bundle         |
| `build:dev`                       | `webpack --mode development --progress`                                         | Unminified bundle         |
| `build:tsc`                       | `tsc --build tsconfig.json`                                                     | Type-check + emit `.d.ts` |
| `test`                            | `tsx ./node_modules/mocha/bin/mocha.js 'test/**/*.ts' --timeout 25000 --colors` | Run all specs             |
| `test:watch`                      | `mocha -w --watch-extensions ts ...`                                            | TDD inner loop            |
| `eslint:check` / `eslint:fix`     | ESLint over `*.ts`                                                              | Lint                      |
| `prettier:check` / `prettier:fix` | Prettier over `*.{css,html,js,ts,json,md,yaml,yml}`                             | Format                    |
| `release`                         | `npm version patch -m "[🤖 DailyBot] New release to v%s launched 🚀"`           | Bump version (CI-only)    |
| `ncu:check` / `ncu:upgrade`       | `npm-check-updates`                                                             | Dep upgrade pipeline      |

## CI/CD pipeline

The release flow (`.github/workflows/release_and_publish.yml`) is triggered on `pull_request: closed` with `merged == true` against `main`:

```
PR merged to main
        │
        ▼
check_pr_size_label (XS / S / M / L / XL / XXL based on lines changed)
        │
        ▼
notify_on_channel_start (DailyBot Slack-like notification)
        │
        ▼
deploy_setup (npm install with cache)
        │
        ▼
deploy_validate_linters_and_code_format (eslint:check + prettier:check)
        │
        ▼
deploy_tests (npm test)
        │
        ▼
build (npm run build → dist/)
        │
        ▼
release_and_publish (npm version patch + push tag + create GH release + npm publish)
        │
        ▼
cleanup_caches + notify_on_channel_end
```

Every job runs on `ubuntu-latest` with Node 24 and aggressive caching of `~/.npm` and `node_modules`. The release job uses `secrets.AUTOMATION_GITHUB_TOKEN` (push + tag) and `secrets.NPM_TOKEN` (npm publish). The DailyBot identity (`🤖 DailyBot <ops@dailybot.com>`) is hardcoded.

Detailed walkthrough: **[Build & Deploy](BUILD_DEPLOY.md)**.

## Mental model summary

1. **Two-file runtime.** `src/index.ts` + `src/lib/emoji-lib.json`. Everything else is build/test/CI.
2. **The catalog is generated, not authored.** Edit `EMOJIS_SPECIAL_CASES` and regenerate; never hand-edit the JSON.
3. **One runtime dependency.** `@twemoji/parser`. Adding more dependencies requires justification because they ship to consumer bundles.
4. **Dual ESM/CommonJS shape.** The `module.exports` reattachment at the bottom of `src/index.ts` is non-negotiable.
5. **HTML output is a contract.** `<img class="emoji" alt="<unicode>" src="<url>"/>`: exactly that shape, forever (until a major bump).
6. **CI owns the release.** Humans never run `npm version` or `npm publish`. The merge to `main` is the release trigger.
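
To make the HTML contract in item 5 concrete, here is a minimal sketch of how one emoji and its Twemoji URL could be turned into the documented tag, including the CDN prefix swap described for `__parseEmojiToHtml`. `toImgTag`, the `1f60e.svg` file name, and the `example.com` CDN are hypothetical and used only for illustration; `DEFAULT_CDN` repeats the value of `DEFAULT_EMOJI_CDN` quoted earlier.

```ts
// Same value as DEFAULT_EMOJI_CDN in src/index.ts, repeated so the sketch is self-contained.
const DEFAULT_CDN = 'https://cdn.jsdelivr.net/gh/jdecked/twemoji@latest/assets/svg/'

// Hypothetical helper: builds the contract-shaped tag for a single emoji.
function toImgTag(char: string, url: string, emojiCDN?: string): string {
  // Swap the default prefix for the custom one only when a custom CDN is given.
  const src = emojiCDN ? url.replace(DEFAULT_CDN, emojiCDN) : url
  return `<img class="emoji" alt="${char}" src="${src}"/>`
}

// Illustrative asset name; real names come from @twemoji/parser.
const defaultTag = toImgTag('😎', `${DEFAULT_CDN}1f60e.svg`)
const customTag = toImgTag('😎', `${DEFAULT_CDN}1f60e.svg`, 'https://example.com/emoji/')
console.log(defaultTag)
console.log(customTag)
```

Any change to the attribute order, the class name, or the self-closing `/>` would alter the string consumers may be matching on, which is why the summary treats the shape as a contract.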