universal-emoji-parser

# Security

Baseline security guidance for Universal Emoji Parser. The package emits HTML that consumers typically inject via `innerHTML` / `v-html` / `dangerouslySetInnerHTML`, which makes both the correctness of the output and the package's supply chain directly relevant to consumer security.

## Principles

1. **The HTML output is trusted by consumers.** Do not let untrusted data leak into the output template
2. **No secrets in the repo.** Everything that authenticates as the package (npm token, signing keys) lives in CI secrets
3. **Default deny on dependencies.** Adding a runtime dependency ships it to every consumer's bundle and increases the supply-chain surface
4. **Pin versions.** No `^`/`~` ranges: `package.json` uses exact versions for both deps and devDeps, so CI installs the direct dependencies it was tested with

---

## HTML output safety

### The output template is hardcoded

```ts
text = text.replace(regex, `<img class="emoji" alt="${entity.text}" src="${emojiUrl}"/>`)
```

Two interpolated values:

- **`entity.text`**: the unicode emoji literal returned by `@twemoji/parser`. **Never user-controlled**; it's whatever Twemoji decided was an emoji entity
- **`emojiUrl`**: either Twemoji's CDN URL or a string-replaced version using the `emojiCDN` option. The CDN comes from the consuming app's config (or default), not from end-user input

Both are safe under normal use. The risks come from:

### Risk 1: Consumer passes user input as `emojiCDN`

```ts
// ❌ DANGEROUS: never do this
const userCdn = req.query.cdn // user-controlled
uEmojiParser.parse(text, { emojiCDN: userCdn })
```

If a consumer pipes user input into `emojiCDN`, an attacker could set:

```
emojiCDN = '"><script>alert(1)</script>'
```

…and the value would close the `src` attribute early, so the output would contain executable JavaScript. This is **the consumer's bug**, not the package's, but this document flags it.

**Mitigation in this package:** the package doesn't validate the shape of `emojiCDN`.
We **could** add a check (e.g., must start with `https://`, must end with `/`, no quotes), but that adds complexity for a misuse that's clearly the consumer's responsibility. Document it loudly here and in [API Reference](API_REFERENCE.md); don't add defensive validation that can be bypassed.

If a consumer wants to allow user-configurable CDNs, they should validate before passing:

```ts
function safeCdn(input: unknown): string | undefined {
  // Query values are not guaranteed to be strings (repeated params can arrive as arrays)
  if (typeof input !== 'string') return undefined
  // Require https, a plain host, an optional port, and a trailing slash; quotes and angle brackets never match
  if (!/^https:\/\/[\w.-]+(:\d+)?(\/[\w.-]*)*\/$/.test(input)) return undefined
  return input
}

uEmojiParser.parse(text, { emojiCDN: safeCdn(req.query.cdn) })
```

### Risk 2: Consumer doesn't escape surrounding text

```ts
// ❌ DANGEROUS
const userMessage = req.body.message // "<script>alert(1)</script> :smile:"
const html = uEmojiParser.parseToHtml(userMessage)
res.send(html) // ships the script tag
```

The package **does not escape** the surrounding text. It's a _transformer_, not a sanitizer. Consumers who feed user input through it must escape first:

```ts
import escape from 'lodash.escape'

const safe = escape(req.body.message) // escapes &, <, >, ", and '
const html = uEmojiParser.parseToHtml(safe) // emojis still resolve; HTML is escaped
res.send(html)
```

The reason: HTML-escaping inside `parseToHtml` would corrupt content for consumers who already consider the input safe (e.g., trusted markdown rendered to HTML by another library, then emoji-replaced). Whether and when to escape is the consumer's choice.

This package's responsibility is to never _introduce_ unsafe HTML.
The output template is safe for any `text` input because:

- `entity.text` is always a unicode emoji (Twemoji's regex doesn't match arbitrary strings)
- `emojiUrl` is either trusted (default CDN) or consumer-supplied (their responsibility, see Risk 1)

### Risk 3: Future feature additions

If we ever add user-controlled HTML attributes (e.g., custom `data-*` attributes from a `customDataAttributes` option), we must:

- HTML-escape all values before inserting
- Validate attribute names against an allowlist regex
- Document the security implications

Currently no such feature exists; the output template is fixed.

---

## Input handling

### `parse(text, options)`

- **`text` non-string check**: throws `Error('The text parameter should be a string.')`. There's a test for this; don't remove it
- **`text` empty string**: returns empty string; no errors
- **`text` very long**: no length limit. Latency scales linearly with text length and number of emojis. Consumers who accept untrusted input should cap input size and rate-limit on their side; we can't enforce it here
- **`text` with malformed input**: unmatched shortcodes (`:not_real:`), garbage Unicode, and partial surrogate pairs all pass through as text, with no errors

### Catalog data integrity

`src/lib/emoji-lib.json` is the catalog. It's:

- Generated by `prepareEmojiLibJson.test.ts` from `emojilib` and `unicode-emoji-json`
- Reviewed by humans (PR diff) before merging
- Loaded as a static JSON import: no `eval`, no dynamic require, no remote fetch

A malicious entry in the catalog (e.g., a slug with HTML special characters) would only affect `parseToShortcode` output. Currently every slug matches `[a-z0-9_]+`, so this isn't a real risk. If you ever notice an unsafe character in `slug`, that's a bug in the regenerator's input; fix `EMOJIS_SPECIAL_CASES` to scrub it.

---

## Supply chain

### Runtime dependency footprint

```json
"dependencies": {
  "@twemoji/parser": "17.0.1"
}
```

**One** runtime dependency. Adding a second is a major decision because it ships to every consumer.

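
The "one runtime dependency, exact versions only" rule can be checked mechanically. The following is a minimal sketch, not something that exists in this repo: `auditDependencies`, `PackageJson`, and `EXACT_VERSION` are hypothetical names, and the function assumes the caller has already parsed `package.json` into an object.

```ts
// Hypothetical helper, not part of this repo. Works on an already-parsed package.json.
type PackageJson = {
  dependencies?: Record<string, string>
  devDependencies?: Record<string, string>
}

// Exact semver only, e.g. "17.0.1" or "1.0.0-beta.1".
// Rejects ranges (^, ~, >=), wildcards, dist-tags, and URLs.
const EXACT_VERSION = /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$/

function auditDependencies(pkg: PackageJson, allowedRuntime: string[]): string[] {
  const problems: string[] = []

  // Rule 1: runtime dependencies must be on the allowlist.
  for (const name of Object.keys(pkg.dependencies ?? {})) {
    if (!allowedRuntime.includes(name)) {
      problems.push(`unexpected runtime dependency: ${name}`)
    }
  }

  // Rule 2: every dependency, runtime or dev, must use an exact version.
  const sections: Array<[string, Record<string, string>]> = [
    ['dependencies', pkg.dependencies ?? {}],
    ['devDependencies', pkg.devDependencies ?? {}],
  ]
  for (const [section, deps] of sections) {
    for (const [name, version] of Object.entries(deps)) {
      if (!EXACT_VERSION.test(version)) {
        problems.push(`${section}: ${name}@${version} is not an exact version`)
      }
    }
  }

  return problems
}
```

A test or CI step could call `auditDependencies(pkg, ['@twemoji/parser'])` and fail when the returned list is non-empty. Returning messages instead of throwing keeps every violation visible in a single run.
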

### `@twemoji/parser` trust

- Maintained by [jdecked](https://github.com/jdecked) (former Twemoji maintainer) and the broader Twemoji community
- License: MIT
- No native dependencies (pure JS)
- Used by Twitter/X, Discord, and many other major products
- We pin to an exact version; bumping is a deliberate `chore: bump @twemoji/parser` PR

### Dev-only dependencies

`emojilib` and `unicode-emoji-json` are `devDependencies`: they only run during `prepareEmojiLibJson.test.ts` regeneration. They never ship to npm consumers.

If either project becomes unmaintained or compromised, the impact is limited to catalog regeneration; we'd switch to a fork or fall back to the previously committed catalog.

### CI-side security

- **`secrets.NPM_TOKEN`**: automation-scoped; can publish but not modify package metadata
- **`secrets.AUTOMATION_GITHUB_TOKEN`**: fine-grained PAT or GitHub App token with `contents: write`, `pull-requests: write`. Scoped to this repo only
- **`secrets.DAILYBOT_API_KEY`**: sends notifications; no repo access

CI workflows check out the repo with `actions/checkout@v4` (pinned major version) and use `actions/setup-node@v4`, `actions/cache@v4` (also pinned majors). Pinning to SHAs would be more rigorous; we accept the major-pin risk for now.

### Dependency upgrade automation

`check_packages_versions.yml` runs `ncu -u` weekly and opens an auto-PR. The PR goes through `code_check.yml` (lint + format + test) before auto-merging. This means a malicious new release of any devDep that breaks the build is caught, but a malicious release that _passes_ all checks would auto-merge.

Mitigations:

- `.ncurc.json` rejects bumps for `chai` and `eslint` (specific versions we want to control)
- The auto-merge workflow can be disabled if a high-profile supply-chain incident hits

To harden further, consider:

- Adding `npm audit` to CI (`npm audit --audit-level=high`)
- Pinning to exact SHAs for action versions (`actions/checkout@<sha>`)
- Using [Socket.dev](https://socket.dev/) or [Snyk](https://snyk.io/) PR checks
- Disabling auto-merge and reviewing every dep PR by hand

We don't currently do any of these; this section documents the gap so a future security-focused contributor can close it.

---

## Publishing security

### npm 2FA

The npm account that owns `secrets.NPM_TOKEN` should have 2FA enabled in **`auth-and-writes`** mode. With that mode, a stolen password or standard publish token is not enough to publish; the attacker would also need a TOTP code.

CI can't enter a TOTP code, so the workflow needs an npm token type meant for automation, which bypasses the 2FA prompt. That makes the token itself the secret to protect: **scope it to this single package** (npm's granular access tokens support per-package permissions). A leaked package-scoped token can't be used to publish other packages.

### Reproducible builds

The package is built by Webpack from TypeScript source. The build is deterministic for a given:

- Node version (CI: Node 24)
- npm lockfile state (note: no lockfile committed; see [`.gitignore`](../.gitignore))
- Source tree

Running `npm install && npm run build` on different machines usually produces a byte-identical `dist/index.js`, but not always: with no lockfile committed, transitive dependency versions can differ between installs. Webpack's chunk IDs and minification are deterministic; the JSON catalog is checked-in source.

If you ever need to verify a published version against source: run `npm pack` locally and diff the result against the published tarball.

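
The comparison step can be reduced to diffing two file manifests once both tarballs are unpacked. The sketch below is illustrative only: `diffManifests` and `ManifestDiff` are hypothetical names, and it assumes the caller has already produced a map from file path to content digest (for example a SHA-256 hex string) for each tarball.

```ts
// Hypothetical helper, not part of this repo.
// Each manifest maps a file path inside the tarball to a digest of its contents.
interface ManifestDiff {
  onlyInLocal: string[]
  onlyInPublished: string[]
  changed: string[]
}

function diffManifests(
  local: Map<string, string>,
  published: Map<string, string>,
): ManifestDiff {
  const diff: ManifestDiff = { onlyInLocal: [], onlyInPublished: [], changed: [] }

  // Sort paths so the report is stable across runs.
  for (const path of Array.from(local.keys()).sort()) {
    if (!published.has(path)) {
      diff.onlyInLocal.push(path)
    } else if (published.get(path) !== local.get(path)) {
      diff.changed.push(path)
    }
  }
  for (const path of Array.from(published.keys()).sort()) {
    if (!local.has(path)) diff.onlyInPublished.push(path)
  }

  return diff
}

// True only when both tarballs contain the same files with the same digests.
function manifestsMatch(diff: ManifestDiff): boolean {
  return diff.onlyInLocal.length === 0 && diff.onlyInPublished.length === 0 && diff.changed.length === 0
}
```

A file listed under `onlyInPublished` deserves the closest look, since it shipped to consumers without a counterpart in the local build.
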

### Tarball contents

`.npmignore` restricts what ships (we don't currently use the `files` field in `package.json`; `.npmignore` is the source of truth):

```
Included:
  dist/index.js
  dist/index.d.ts
  dist/lib/type.d.ts
  dist/*.map
  package.json
  README.md
  LICENSE

Excluded:
  src/, test/, docker/, .github/, docs/, .agents/, .claude/, .vscode/, .devcontainer/
  *.config.js, eslint.config.mjs, .prettierrc, .editorconfig, .babelrc, tsconfig.json
  package-lock.json, *.txt
```

Verify before publishing: `npm pack --dry-run`.

---

## License compliance

- **This package**: MIT (see [LICENSE](../LICENSE))
- **`@twemoji/parser`**: MIT
- **Twemoji assets** (the SVGs the CDN serves): CC-BY 4.0. Consumers using the default CDN should attribute Twemoji per the license

If a consumer rebrands or self-hosts assets, they must respect Twemoji's license. We document this in [Emoji Providers → CDN selection](EMOJI_PROVIDERS.md#cdn-selection).

---

## Reporting vulnerabilities

If you find a security issue:

1. **Don't open a public GitHub issue.** Email developers@dailybot.com with details, repro steps, and impact assessment
2. **Expected response**: acknowledgement within 5 business days; coordinated disclosure timeline depends on severity
3. **CVE assignment**: we'll request one if the issue qualifies

For non-critical issues (e.g., "this docs example shows an unsafe pattern"), a public issue or PR is fine.


---

## Threat model checklist

For a typical consumer integration:

- [ ] User input is HTML-escaped **before** being passed to `parse` / `parseToHtml`
- [ ] `emojiCDN` is hardcoded or validated against an allowlist; never user-controlled
- [ ] The CDN serving Twemoji assets is HTTPS (the default, `cdn.jsdelivr.net`, is)
- [ ] If self-hosting Twemoji assets, the CDN sets correct Content-Type (`image/svg+xml`)
- [ ] If serving on a strict CSP, the CDN host is whitelisted in `img-src`
- [ ] If hosting on edge functions with strict bundle limits, lazy-load the package
- [ ] Consumer's npm-audit / Snyk / Dependabot picks up vulnerabilities in this package's dependencies

For maintainers of this package:

- [ ] No `console.log`, no `eval`, no `Function` constructor in `src/`
- [ ] No `dangerously*` patterns; output template is fixed
- [ ] `package.json` `dependencies` has only `@twemoji/parser`; verify on every PR
- [ ] CI's `secrets.NPM_TOKEN` is scoped to this package
- [ ] npm 2FA is `auth-and-writes`
- [ ] Major Twemoji bumps go through manual review (don't auto-merge `@twemoji/parser` updates without reading release notes)

---

## What we don't protect against

- **Consumer-side XSS**: if a consumer's templating system is misconfigured and renders the package's output unescaped while _also_ rendering attacker-controlled text unescaped, that's the consumer's vulnerability. We document the safe pattern; we don't try to make every misuse safe
- **CDN compromise**: if the Twemoji CDN itself is compromised, every consumer using the default CDN serves compromised SVGs. Mitigation: pin a Twemoji version (so a future compromise of `@latest` doesn't propagate), or self-host
- **DoS via huge inputs**: `parse('a'.repeat(10_000_000))` will be slow (linear in input length × catalog regex).
  Consumers facing DoS risk should bound input size before calling
- **Catastrophic regex**: the `parseToShortcode` alternation regex doesn't show catastrophic backtracking under standard inputs, but a maliciously crafted input made mostly of partial emoji prefixes could degrade performance. We don't currently fuzz-test for this
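
For consumers, the escape-first and bound-the-input advice can be combined in one wrapper. This is a minimal sketch under stated assumptions, not an API of this package: `renderUntrusted`, `escapeHtml`, and the 10000-character cap are hypothetical, and `parseEmoji` stands in for whichever HTML-producing function the consumer calls (for example `uEmojiParser.parseToHtml`).

```ts
// Hypothetical consumer-side wrapper, not part of this package.
// `parseEmoji` is whatever function turns text into emoji HTML.
type EmojiParser = (text: string) => string

// Assumed cap; pick a value that fits the product's longest legitimate message.
const MAX_INPUT_LENGTH = 10000

const HTML_ESCAPES: Record<string, string> = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
}

// Escapes the five characters that can change HTML structure.
function escapeHtml(text: string): string {
  return text.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch] ?? ch)
}

function renderUntrusted(
  input: unknown,
  parseEmoji: EmojiParser,
  maxLength: number = MAX_INPUT_LENGTH,
): string {
  if (typeof input !== 'string') {
    throw new TypeError('message must be a string')
  }
  // Reject oversized input before any regex work happens.
  if (input.length > maxLength) {
    throw new RangeError(`message exceeds ${maxLength} characters`)
  }
  // Escape first, then let the parser add its own <img> tags.
  return parseEmoji(escapeHtml(input))
}
```

Taking the parser as a parameter keeps the wrapper testable with a stub and avoids tying it to one export name. The order matters: escaping after parsing would also escape the `<img>` tags the parser just produced.
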