universal-emoji-parser
This tool parses Unicode emoji and emoji shortcodes into HTML images using emojilib and the Twemoji CDN.
# Standards
Canonical coding rules for Universal Emoji Parser. Every contributor (human or agent) must follow these. ESLint + Prettier handle most details automatically — these standards cover decisions tools cannot make.
## Language
- **English only** for code, identifiers, comments, JSDoc, commit messages, branch names, and PR descriptions
- Emoji shortcodes (`:smile:`, `:thumbs_up:`) are **data**, not language. Their casing/naming follows the emoji catalog, not English convention
## TypeScript
### Strict by default
`tsconfig.json` enforces:
- `strictNullChecks: true` — every nullable union must be handled (`?.`, `??`, narrowing, or explicit type guard)
- `noImplicitAny: true` — every parameter and return type must be inferable or annotated
- `noUnusedLocals: true`, `noUnusedParameters: true` — dead code fails the build
- `declaration: true` — every public export gets a `.d.ts` entry
If you need `any`, prefer `unknown` and narrow. If `any` is genuinely the right call, suppress with a targeted `// eslint-disable-next-line @typescript-eslint/no-explicit-any` and explain why.
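As a minimal sketch of the "prefer `unknown` and narrow" rule, the helper below reads an optional CDN URL from untyped input. The function name `readEmojiCdn` and the input shape are assumptions made for illustration; nothing like it is confirmed to exist in the package.

```ts
// Hypothetical helper, not part of the package: reads an optional CDN URL
// from untyped input (for example, parsed JSON) without resorting to `any`.
function readEmojiCdn(input: unknown): string | undefined {
  // Narrow step by step: it is an object, it is not null, it has the key.
  if (typeof input === 'object' && input !== null && 'emojiCDN' in input) {
    const value = (input as { emojiCDN: unknown }).emojiCDN
    // Only a string is accepted; every other value is treated as "not provided".
    return typeof value === 'string' ? value : undefined
  }
  return undefined
}
```

Because the parameter is `unknown`, the compiler rejects any property access until the checks have run, which is the guarantee `any` would silently discard.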
### Explicit return types on exports
```ts
// ✅ exported — annotate
export function parseToHtml(text: string, emojiCDN?: string): string { ... }
// ✅ internal helper — inference is fine
const formatEntity = (e: TwemojiEntity) => `<img src="${e.url}"/>`
```
This keeps the public `.d.ts` stable across minor refactors.
### Interfaces over type aliases for public types
`type.ts` uses `interface` for `EmojiType`, `EmojiLibJsonType`, `EmojiParseOptionsType`, etc. Reasons:
- Interfaces support declaration merging (consumers can extend in their own `.d.ts` if needed)
- TypeScript error messages reference interface names cleanly
- They show up as "interface" in IDE hover tooltips, signaling "this is part of the API"
Reserve `type` for unions and mapped types: `type EmojiKey = keyof typeof emojiLibJsonData`.
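To illustrate the declaration-merging point, here is a small sketch. The fields on `EmojiType` are placeholders (the real interface in `type.ts` may look different), and `label`, `catalog`, and `CatalogKey` are hypothetical names.

```ts
// Illustrative shape only; the real EmojiType in type.ts may differ.
interface EmojiType {
  slug: string
  char: string
}

// A second declaration with the same name merges into the first.
// A consumer would normally write this in their own .d.ts file.
interface EmojiType {
  label?: string
}

const emoji: EmojiType = { slug: 'smile', char: '😄', label: 'Happy face' }

// Unions and mapped types stay as `type` aliases.
const catalog = { smile: emoji }
type CatalogKey = keyof typeof catalog
const catalogKey: CatalogKey = 'smile'
```

A `type` alias cannot be reopened this way, which is the practical reason public shapes are declared as interfaces.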
## Naming
| Element | Convention | Example |
| --------------------------- | ------------------------------------------- | ----------------------------------------------------------- |
| Source file | `camelCase.ts` matching the dominant export | `emojiLibJson.test.ts`, `index.ts`, `type.ts` |
| Test file | `<subject>.test.ts` | `main.test.ts`, `emojiLibJson.test.ts` |
| Class / interface | `PascalCase` | `EmojiType`, `UEmojiParserType` |
| Function (top-level) | `camelCase` | `parseToHtml`, `getEmojiObjectByShortcode` |
| Internal "private" function | `__camelCase` (double underscore prefix) | `__parseEmojiToHtml` |
| Constant (compile-time) | `SCREAMING_SNAKE_CASE` | `DEFAULT_EMOJI_CDN`, `EMOJIS_SPECIAL_CASES`, `TOTAL_EMOJIS` |
| Local variable | `camelCase` | `entitiesFound`, `emojiUrl` |
| Catalog slug | `snake_case` | `smiling_face_with_sunglasses` |
The `__` prefix on `__parseEmojiToHtml` is a JavaScript-era marker meaning "implementation detail, may change without notice". TypeScript's actual `private` modifier doesn't apply because we use a plain object literal, not a class.
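The sketch below shows why the naming convention stands in for `private`. It is a reduced, hypothetical object (`miniParser`, `MiniParserType`, and `__wrap` are invented for this example) and not the package's actual implementation.

```ts
// Hypothetical miniature of the pattern; names and bodies are illustrative.
interface MiniParserType {
  parseToHtml(src: string): string
  __wrap(src: string): string
}

const miniParser: MiniParserType = {
  // Public method: signature changes are versioned.
  parseToHtml(src: string): string {
    return miniParser.__wrap(src)
  },
  // Internal helper: the `__` prefix is the only signal that it is private,
  // because members of an object literal cannot carry the `private` modifier.
  __wrap(src: string): string {
    return `<img class="emoji" src="${src}"/>`
  },
}
```

Nothing stops a caller from invoking `miniParser.__wrap` directly; the prefix is a convention, not an enforcement mechanism.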
## Module structure (`src/index.ts`)
Order in this exact sequence:
1. **External imports** — packages from `node_modules` (`@twemoji/parser`)
2. **Internal imports** — relative paths (`./lib/type`, `./lib/emoji-lib.json`)
3. **Constants** — `export const DEFAULT_EMOJI_CDN`, `export const emojiLibJsonData`
4. **The main object** — `const uEmojiParser: UEmojiParserType = { ... }`
5. **Default export** — `export default uEmojiParser`
6. **CommonJS reattachment** — `module.exports = uEmojiParser; module.exports.emojiLibJsonData = emojiLibJsonData`
The CommonJS reattachment is **mandatory** — see [Architecture → CommonJS reattachment](ARCHITECTURE.md#commonjs-reattachment). Don't move it, don't delete it, don't refactor around it.
## Public API discipline
The public surface is:
```ts
// from src/index.ts
export const DEFAULT_EMOJI_CDN: string
export const emojiLibJsonData: EmojiLibJsonType
export default uEmojiParser // UEmojiParserType — 7 methods
// from src/lib/type.ts (re-exported via .d.ts)
export interface EmojiType
export interface EmojiLibJsonType
export interface EmojiParseOptionsType
export interface UEmojiParserType
export interface TwemojiEntity
```
Rules:
1. **Don't add new top-level exports.** Extend `uEmojiParser` instead — that's how consumers expect to find new functionality
2. **Don't change method signatures.** Adding optional parameters is OK; reordering, renaming, or changing return types is a major bump
3. **Don't change the HTML output template.** `<img class="emoji" alt="..." src="..."/>` is a contract — see [API Reference](API_REFERENCE.md)
4. **Don't break dual ESM/CommonJS.** Both `import` and `require` consumers must keep working
5. **Don't expose internal helpers.** If something's prefixed with `__`, it's internal. If you add a new helper, mark it the same way
## Formatting (Prettier)
Configured in `.prettierrc`:
```json
{
"semi": false,
"singleQuote": true,
"trailingComma": "es5"
}
```
Implications:
```ts
// ✅ no semicolons (except where ASI hazards exist — Prettier inserts a leading semi)
const x = 1
const y = 2
// ✅ single quotes for strings; backticks for templates
const a = 'hello'
const b = `hello, ${name}`
// ✅ trailing comma in multi-line arrays/objects; single-line ones get none
const arr = ['a', 'b', 'c']
const options = {
  parseToHtml: true,
  parseToUnicode: false,
}
fn('a', 'b', 'c') // ✅ no trailing comma in function calls (es5)
```
Auto-fix with `npm run prettier:fix`. CI fails on `prettier:check`, so always run before committing.
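The ASI exception mentioned in the comments above can be sketched as follows. The variable names are made up for the example; the behavior shown is standard JavaScript statement parsing.

```ts
const collected: string[] = []
const first = 'a'
// Without the leading semicolon, the next line would be parsed as a
// continuation of the previous statement: 'a'[first, 'b'].forEach(...)
;[first, 'b'].forEach((item) => collected.push(item))
```

A line that starts with `[`, `(`, or a template literal is where this matters; in a semicolon-free style, the leading `;` is what keeps the two statements separate.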
### Line length
`.editorconfig` sets `max_line_length = 120`. Prettier reflows past it when possible (long string literals stay inline). Don't force-wrap shorter lines for cosmetic reasons.
## Linting (ESLint)
`eslint.config.mjs` (flat config) composes:
- `@eslint/js` `recommended`
- `typescript-eslint` `recommended`
- `eslint-plugin-prettier/recommended`
Custom rules:
| Rule | Setting | Reason |
| ------------------------------------------ | -------------- | ------------------------------------------------------------------------------------------------ |
| `no-console` | `2` (error) | This is a library — `console.*` in `src/` leaks into consumers. Tests may log freely |
| `@typescript-eslint/no-inferrable-types` | `off` | We sometimes annotate inferable types for clarity (e.g., `const emojiCDN: string = '...'`) |
| `@typescript-eslint/no-non-null-assertion` | `off` | Allowed sparingly when the type system can't see the invariant (e.g., dedup loop in regenerator) |
| `@typescript-eslint/ban-ts-comment` | `off` | `// @ts-ignore` allowed for unavoidable interop |
| `semi` | `[2, 'never']` | Reinforces Prettier's `semi: false` |
Run `npm run eslint:check` before committing; auto-fix is `npm run eslint:fix`.
## Comments
- **Don't comment what the code does** — the code already says that
- **Do comment why** when the reason is non-obvious: a workaround, a constraint, an upstream quirk
- **Do JSDoc public methods** with at minimum a one-line description; consumers see this in their IDE hover. The current `src/index.ts` is light on JSDoc — adding more is welcome
- **TODOs:** `// TODO(<owner>): <action>` — never bare `// TODO`. Even better, open an issue and reference it
Examples that are _worth_ keeping:
```ts
// Track processed entities to avoid duplicate replacements when the same emoji
// appears multiple times — Twemoji parse() returns one entry per occurrence
const entitiesFound: Array<string> = []
```
```ts
// Escape the keycap; * has special regex semantics and would corrupt the alternation
regexText = regexText.replace(/\*️⃣/g, '\\*️⃣')
```
Both explain a non-obvious _why_; without them, a reader would think the code was redundant or buggy.
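To make the second comment concrete, here is a sketch of what happens when an unescaped keycap asterisk lands in a regex alternation. `escapeForRegex` is a generic, hypothetical helper and the emoji list is invented; the package itself is only known to escape the keycap.

```ts
// Generic regex-escaping helper (hypothetical name, not from the package).
function escapeForRegex(literal: string): string {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// '*\uFE0F\u20E3' is the keycap asterisk: '*' plus two combining code points.
const emojis: string[] = ['😄', '*\uFE0F\u20E3', '👍']

// Unescaped, the alternation contains `|*`, a quantifier with nothing to repeat.
let unescapedFails = false
try {
  new RegExp(emojis.join('|'), 'g')
} catch (error) {
  unescapedFails = error instanceof SyntaxError
}

// Escaped, `\*` matches a literal asterisk and the pattern compiles.
const pattern = new RegExp(emojis.map(escapeForRegex).join('|'), 'g')
const matches = 'Press *\uFE0F\u20E3 then 👍'.match(pattern) ?? []
```

Without the comment, the extra `replace` call looks like noise; with it, a reader knows which input breaks the pattern.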
## Object option-merge pattern
The `getDefaultOptions` helper uses an unusual pattern — preserve it:
```ts
emojiCDN: options && Object.getOwnPropertyDescriptor(options, 'emojiCDN')
? String(options.emojiCDN)
: undefined,
parseToHtml: options && Object.getOwnPropertyDescriptor(options, 'parseToHtml')
? Boolean(options.parseToHtml)
: true,
```
Why `Object.getOwnPropertyDescriptor` instead of `options.emojiCDN === undefined`?
Because callers passing `{ emojiCDN: undefined }` should be treated as "explicitly clearing", and a future signature might want to distinguish "unset" from "undefined". `getOwnPropertyDescriptor` returns `undefined` when the key doesn't exist, and a descriptor object (which is truthy) when the key is set to _anything_, including `undefined`.
For the boolean options, the simpler `Boolean(options?.x)` yields `false` whenever the key is absent. That is fine for `parseToUnicode` and `parseToShortcode`, whose defaults are `false`. `parseToHtml` defaults to **true**, so it needs the `getOwnPropertyDescriptor` check to tell "absent" (use the default) apart from "present" (coerce the value).
Don't refactor this to nullish coalescing without verifying every test still passes — the option semantics are subtle.
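The difference is easiest to see side by side. This is a reduced sketch with a single field; `OptionsSketch`, `withDescriptor`, and `withNullish` are hypothetical names, and only the descriptor-based expression mirrors the snippet above.

```ts
// Hypothetical, reduced version of the option merge; only one field is shown.
interface OptionsSketch {
  parseToHtml?: boolean | undefined
}

// Presence-based: any own key, even one set to undefined, overrides the default.
function withDescriptor(options?: OptionsSketch): boolean {
  return options && Object.getOwnPropertyDescriptor(options, 'parseToHtml')
    ? Boolean(options.parseToHtml)
    : true
}

// Value-based: undefined falls through to the default.
function withNullish(options?: OptionsSketch): boolean {
  return options?.parseToHtml ?? true
}

const explicitUndefined: OptionsSketch = { parseToHtml: undefined }
const results = {
  absent: [withDescriptor({}), withNullish({})],
  explicit: [withDescriptor(explicitUndefined), withNullish(explicitUndefined)],
}
```

With the key absent, both variants return `true`. With `{ parseToHtml: undefined }`, the descriptor variant returns `false` while the nullish variant returns `true`, which is the behavior change a refactor would introduce.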
## Error handling
The package only throws in one place:
```ts
if (typeof text !== 'string') {
throw new Error('The text parameter should be a string.')
}
```
Rules:
- **The message string is part of the contract.** A test asserts the throw, and consumers may catch by message. Don't reword it
- **Don't add other throws.** Bad input (an unmatched shortcode like `:not_an_emoji:`) is just left as text — it's not an error
- **Never throw asynchronously.** The whole API is synchronous; introducing `Promise.reject` paths is a major change
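The first rule can be sketched from the consumer's side. `assertText` and `safeParse` are hypothetical stand-ins; only the `typeof` check and the message string come from the snippet above.

```ts
// Hypothetical stand-in for the guard; the check and message match the source.
function assertText(text: unknown): string {
  if (typeof text !== 'string') {
    throw new Error('The text parameter should be a string.')
  }
  return text
}

// A consumer that matches on the message. Rewording the message would
// silently turn this handled case into an unhandled exception.
function safeParse(input: unknown): string {
  try {
    return assertText(input)
  } catch (error) {
    if (error instanceof Error && error.message === 'The text parameter should be a string.') {
      return ''
    }
    throw error
  }
}
```

An unmatched shortcode such as `:not_an_emoji:` passes through `safeParse` unchanged, consistent with the rule that bad input is left as text rather than raised as an error.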
## Testing standards
See [Testing Guide](TESTING_GUIDE.md). Summary:
- Specs in `test/*.test.ts`, run by Mocha + Chai 6 + tsx
- BDD style: `describe('Test emoji parser', () => { describe('Using default options', () => { it('should ...') })})`
- One behavior per `it` — split if you'd write "and" in the name
- `expect(result).to.be.equal(...)` for primitive equality; `.deep.equal` for objects/arrays
- Paste the exact failing input verbatim when adding a regression test — don't summarize
## Catalog discipline
- **Do not edit `src/lib/emoji-lib.json` by hand.** Regenerate via `prepareEmojiLibJson.test.ts`
- **Do not commit `src/lib/emoji-lib-output.json`** — gitignored intentionally
- **Do not export new fields from `EmojiType`** without measuring the bundle-size cost; every field × 1906 entries × every consumer's bundle adds up
- **Do update `EMOJIS_SPECIAL_CASES`** in `prepareEmojiLibJson.test.ts` when a Slack-style alias needs to be supported
See [`/regenerate-emoji-lib`](../.agents/commands/regenerate-emoji-lib.md) and [`/add-special-case`](../.agents/commands/add-special-case.md).
## Imports
Unused imports fail the build: `noUnusedLocals` is a TypeScript compiler option set in `tsconfig.json`, not an ESLint rule. Prefer named imports for clarity:
```ts
// ✅ named — clear what we're using
import { parse } from '@twemoji/parser'
// ✅ default — when the lib's primary export is a single object/value
import emojiLibJson from './lib/emoji-lib.json'
// ⚠️ namespace (only when truly needed); fine here because we use fs.writeFileSync and fs.existsSync
import * as fs from 'fs'
```
Don't insert blank lines between import groups; let the file flow naturally.
## Visibility
TypeScript classes aren't used here, but the same intent applies via naming:
- **`__name`** — internal, may change without notice
- **`name`** without underscore — public API, signature changes are versioned
- **Type re-exports** — only re-export types from `src/lib/type.ts` that consumers will reasonably use; don't pollute the `.d.ts` with internal helpers
## Versioning
The package follows **Semantic Versioning** loosely:
- **Patch** (`2.0.78` → `2.0.79`) — bug fixes, catalog regenerations, doc-only changes. CI auto-bumps on merge
- **Minor** (`2.0.x` → `2.1.0`) — new methods on `uEmojiParser`, new options, new catalog fields (rare). **Bump manually** before merging
- **Major** (`2.x` → `3.0`) — HTML output template change, default option flip, removed/renamed method, dual-export break, dropped Node version. Reserved for intentional breakage
CI's `npm version patch` is the right default. If a change deserves a minor or major bump, edit the `version` field in `package.json` manually in the same PR. The workflow's `npm version patch` step will then fail loudly, so the auto-bump has to be skipped for that release; open an issue about the workflow at that point.
## Build hygiene
- Don't commit `dist/` (gitignored)
- Don't commit `node_modules/` (gitignored)
- Don't commit `package-lock.json` — gitignored intentionally; CI rebuilds from `package.json` + cached `node_modules`. _(If you have strong feelings, open an issue and discuss before changing.)_
- Don't commit `.env` files — gitignored
- Don't commit `git_logs.txt`, `git_logs_output.txt`, `packages_upgrades.txt`, `packages_upgrades_output.txt` — gitignored CI scratch
- Don't commit `src/lib/emoji-lib-output.json` — gitignored
## Don't
- ❌ Hand-edit `src/lib/emoji-lib.json`
- ❌ Add a new runtime dependency without measuring bundle-size impact
- ❌ Change the HTML output template (`<img class="emoji" alt="..." src="..."/>`)
- ❌ Use `console.log` / `console.error` in `src/`
- ❌ Use `==` (TypeScript ESLint allows `===` only)
- ❌ Use `!!x` for boolean coercion in option parsing — use `Boolean(x)` (matches existing style)
- ❌ Add semicolons (Prettier strips them; ESLint errors)
- ❌ Use double quotes (`"..."`)
- ❌ Skip `npm run eslint:check` / `prettier:check` before committing
- ❌ Modify `EmojiType` shape without regenerating the catalog and bumping consumer-visible types
## Do
- ✅ Run `npm run test:watch` while editing `src/`
- ✅ Add a regression test for every parsing fix; paste the failing input verbatim
- ✅ Use `npm run prettier:fix` and `npm run eslint:fix` before committing
- ✅ Annotate exported function return types explicitly
- ✅ Use `Object.getOwnPropertyDescriptor` for option-merge "explicit-undefined" detection
- ✅ Update `EMOJIS_SPECIAL_CASES` for keyword overrides; never mutate the catalog at runtime
- ✅ Bump deps via `npm run ncu:upgrade` (respects `.ncurc.json`)
- ✅ Write conventional commit messages (`feat:`, `fix:`, `chore:`, etc.)