# i18n‑ai‑translate

[![npm version](https://img.shields.io/npm/v/i18n-ai-translate.svg)](https://www.npmjs.com/package/i18n-ai-translate)
[![npm downloads](https://img.shields.io/npm/dw/i18n-ai-translate.svg)](https://www.npmjs.com/package/i18n-ai-translate)
[![Build](https://img.shields.io/github/actions/workflow/status/taahamahdi/i18n-ai-translate/build.yml?branch=master)](https://github.com/taahamahdi/i18n-ai-translate/actions/workflows/build.yml)
[![License: GPL‑3.0](https://img.shields.io/npm/l/i18n-ai-translate.svg)](https://github.com/taahamahdi/i18n-ai-translate/blob/master/LICENSE)

AI‑powered localization for your translation catalogues. Automate translating single files or entire directories with ChatGPT, Gemini, Claude, or local Ollama models, while keeping translations accurate, formatting consistent, and placeholders intact. Works with **i18next‑style** JSON out of the box, plus Gettext `.po`, Java `.properties`, and iOS `.strings`.

_For a detailed walkthrough and advanced tips, see [ADVANCED_GUIDE.md](ADVANCED_GUIDE.md)._

---

## Why use it?

| Feature           | What it means                                                                           |
| ----------------- | --------------------------------------------------------------------------------------- |
| **Multi‑engine**  | Choose OpenAI, Google, Anthropic, or your own Ollama models                             |
| **Fast**          | Parallel per-batch workers share one rate limiter; translate 20 locales concurrently    |
| **Safe**          | Translations verified against the source before being written                           |
| **Diff‑aware**    | Only re‑translate keys you changed; existing translations are preserved                 |
| **Check mode**    | Audit existing translations for drift, missing placeholders, or quality regressions     |
| **Format‑aware**  | i18next JSON, Gettext `.po`, Java `.properties`, iOS `.strings`, all round‑tripped intact |
| **Context-aware** | `--context` flag injects product info so the model picks domain-appropriate terminology |
| **Dry‑run**       | Preview updates before touching disk                                                    |
| **Everywhere**    | Use as a CLI, GitHub Action, or Node library                                            |

---

## Quick start

### 1 · Install

```bash
npm i -g i18n-ai-translate   # or yarn add i18n-ai-translate --dev
export OPENAI_API_KEY=•••    # or GEMINI_API_KEY / ANTHROPIC_API_KEY
```

### 2 · Translate a file

```bash
i18n-ai-translate translate -i i18n/en.json -o fr \
  -e chatgpt -m gpt-5.2
```

Need more languages? Pass multiple codes (`-o fr es de`) or `-A` for **all** 180+. Filenames like `es-ES.json` / `pt-BR.json` are accepted too; the language subtag is extracted automatically. Skip specific locales with `--exclude-languages fr de` (handy for locales you maintain by hand).

**Other formats:** besides i18next JSON, Gettext `.po`, Java `.properties`, and iOS `.strings` files work too. The format is inferred from the file extension (override with `--file-format json|po|properties|strings`). Non-translatable structure round-trips losslessly: PO comments, `msgctxt`, and plural forms; `.properties` comments, separators, and line continuations; `.strings` `/* */` and `//` comments and quoting. Native placeholders (`printf` `%s`/`%1$s`/`%@`, MessageFormat `{0}`/`{1}`) are preserved across the translation. Works across `translate` (file + folder), `diff`, and `check`.

### 3 · Translate a folder

```bash
i18n-ai-translate translate -i i18n/en -o fr es de \
  -e chatgpt -m gpt-5.2
```

Recursively translates every `*.json` file in `en` and writes the results to `i18n/fr`, `i18n/es`, and `i18n/de`.

### 4 · Translate only what changed

```bash
i18n-ai-translate diff \
  -b i18n/en-before.json -a i18n/en.json \
  -l en -e claude -m claude-sonnet-4-6
```

Preserves every existing translation; only added/modified keys are re-translated, only deleted keys are removed. Per-locale writes are persisted as each language finishes, so a mid-run crash doesn't discard completed work.

### 5 · Check an existing translation

```bash
i18n-ai-translate check -i i18n/en.json -o fr de \
  -e chatgpt -m gpt-5.2 --format json
```

Runs the verification pipeline against your existing translations without writing anything. Emits a structured report of keys the model flagged. Exits non-zero if any issue is found, so you can gate CI on it.

### 6 · Keep PRs up‑to‑date

Add a one‑liner GitHub Action to auto‑translate whenever `en.json` changes:

```yaml
- uses: taahamahdi/i18n-ai-translate@master
  with:
    json-file-path: i18n/en.json
    api-key: ${{ secrets.OPENAI_API_KEY }}
```

---

## CLI cheat‑sheet

```bash
translate -i <src> -o <lang…> [options]      # Translate a file or folder
diff      -b <before> -a <after> [options]   # Re-translate only edited keys
check     -i <src> -o <lang…> [options]      # Verify existing translations (no writes)
```

Common flags (all subcommands accept these unless noted):

| Flag                     | Default         | Description                                                                     |
| ------------------------ | --------------- | ------------------------------------------------------------------------------- |
| `-e, --engine`           | chatgpt         | chatgpt · gemini · claude · ollama                                              |
| `-m, --model`            | gpt-5.2         | e.g. `gemini-2.5-flash`, `claude-sonnet-4-6`, `llama3.3`                        |
| `-l, --input-language`   | from filename   | ISO‑639‑1 code or English name (`en`, `French`); BCP‑47 tags like `pt-BR` OK    |
| `-r, --rate-limit-ms`    | engine‑specific | Minimum gap between requests                                                    |
| `--concurrency`          | 2               | Batches to run in parallel within one language                                  |
| `--language-concurrency` | 1               | Target languages to translate in parallel (shares pool + rate limit)            |
| `--tokens-per-minute`    | off             | Extra TPM cap across all workers; pair with `--concurrency` to stay under tier  |
| `--context <string>`     | none            | Product/domain context, e.g. `"a B2B invoicing SaaS"`                           |
| `--glossary <path>`      | none            | JSON file: keep-verbatim terms + forced per-language translations               |
| `--exclude-languages`    | none            | Locales to skip (for manually‑maintained targets)                               |
| `--no-continue-on-error` | continue        | Abort on first key/batch failure instead of skipping                            |
| `--dry-run`              | false           | Don't write files, preview instead (translate/diff only)                        |
| `--cache [path]`         | off             | Reuse a translation memory across runs; skip unchanged strings (translate/diff) |
| `--file-format`          | from extension  | File format: `json`, `po`, `properties`, `strings` (translate/diff/check)       |
| `--format`               | table           | `table` or `json` report output (check only)                                    |

Full flag list: `i18n-ai-translate <subcommand> --help`.

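
To make the interaction between `--concurrency` and `--rate-limit-ms` more concrete, here is a minimal sketch of one way several workers could share a single minimum-gap limiter. It is an illustration only, not the tool's actual implementation: `MinGapLimiter`, `runPool`, and the injectable `now` clock are hypothetical names introduced for this example.

```ts
// Hypothetical sketch: NOT taken from i18n-ai-translate's source.
// A limiter that hands out send slots at least `gapMs` apart.
class MinGapLimiter {
  private nextSlot = 0;
  private readonly gapMs: number;
  private readonly now: () => number;

  // `now` is injectable so the limiter can be exercised with a fake clock.
  constructor(gapMs: number, now: () => number = Date.now) {
    this.gapMs = gapMs;
    this.now = now;
  }

  // Reserves the next free slot and returns how long the caller must wait (ms).
  reserve(): number {
    const t = this.now();
    const slot = Math.max(t, this.nextSlot);
    this.nextSlot = slot + this.gapMs;
    return slot - t;
  }
}

// Runs `work` over `items` with at most `concurrency` workers,
// all of which draw their send slots from the same limiter.
async function runPool<T, R>(
  items: T[],
  concurrency: number,
  limiter: MinGapLimiter,
  work: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  const worker = async (): Promise<void> => {
    while (next < items.length) {
      const i = next++; // claim an item (safe: JS runs this synchronously)
      const wait = limiter.reserve();
      if (wait > 0) {
        await new Promise<void>((resolve) => setTimeout(resolve, wait));
      }
      results[i] = await work(items[i]);
    }
  };

  const workers = Array.from(
    { length: Math.min(concurrency, items.length) },
    () => worker(),
  );
  await Promise.all(workers);
  return results;
}
```

In this sketch, raising the worker count only helps while requests take longer than the gap; once the limiter is the bottleneck, extra workers simply queue for slots. That matches the flag table's advice to tune concurrency together with the rate limits, though the real scheduling may differ.
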
---

## Use as a library

```ts
import { translate, translateDiff, check } from "i18n-ai-translate";

const fr = await translate({
  inputJSON: require("./en.json"),
  inputLanguageCode: "en",
  outputLanguageCode: "fr",
  engine: "chatgpt",
  model: "gpt-5.2",
  apiKey: process.env.OPENAI_API_KEY,
  context: "a music trivia game for Discord", // optional
  concurrency: 4, // optional
});

const report = await check({
  inputJSON: require("./en.json"),
  targetJSON: require("./fr.json"),
  inputLanguageCode: "en",
  outputLanguageCode: "fr",
  engine: "chatgpt",
  model: "gpt-5.2",
  apiKey: process.env.OPENAI_API_KEY,
});
// report.issues = [{ key, original, translated, issue, suggestion }]
```

---

## Advanced topics

* **Prompt modes**: `csv` (faster, GPT‑class models only) vs `json` (structured output, works with weaker models too)
* **Custom prompts**: swap in your own generation/verification prompts via `--override-prompt`
* **Translation memory**: `--cache [path]` stores translations in a JSON file (default `.i18n-ai-translate-cache.json`) and reuses them on later runs, so unchanged strings are never re-sent to the model. The key is the source text + languages + `--context`, independent of engine/model, so the cache survives a provider switch. Library callers can pass their own `cache` object.
* **Glossary**: `--glossary <path>` points to a JSON file that steers terminology: `doNotTranslate` keeps brand/product names verbatim, and `terms` forces exact per-language translations: `{ "doNotTranslate": ["Acme"], "terms": { "fr": { "Account": "Compte" } } }`. The rules are injected into both the generation and verification prompts; only the run's target language is applied (with BCP-47 base-subtag fallback, so `pt` covers `pt-BR`).
* **Plural awareness**: keys ending in `_one`/`_other`/`_few`/`_many` get a CLDR plural hint in JSON mode
* **Placeholders**: `{{variables}}` are preserved; customise delimiters with `-p`/`-s`
* **Rate-limit handling**: per-engine defaults + exponential backoff; `--tokens-per-minute` adds a TPM cap

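
The glossary's base-subtag fallback described above can be sketched as follows. This is a rough illustration under stated assumptions, not the library's code: `resolveGlossary` and the two interfaces are hypothetical names, and letting an exact tag (`pt-BR`) override entries inherited from its base (`pt`) is an assumed merge rule that the README does not spell out. Only the JSON shape (`doNotTranslate`, `terms`) comes from the text above.

```ts
// Hypothetical sketch: names and merge order are assumptions, not the library's API.
// Shape of the glossary file as described in the README.
interface Glossary {
  doNotTranslate?: string[];
  terms?: Record<string, Record<string, string>>;
}

// The subset of rules that applies to a single target language.
interface ResolvedGlossary {
  doNotTranslate: string[];
  terms: Record<string, string>;
}

function resolveGlossary(
  glossary: Glossary,
  targetLanguage: string,
): ResolvedGlossary {
  const allTerms = glossary.terms ?? {};
  // "pt-BR" -> "pt"; a tag without a region is its own base.
  const base = targetLanguage.split("-")[0];

  return {
    doNotTranslate: glossary.doNotTranslate ?? [],
    // Base-language entries first, then exact-tag entries on top (assumed order).
    terms: { ...(allTerms[base] ?? {}), ...(allTerms[targetLanguage] ?? {}) },
  };
}
```

For example, given `terms` with a `pt` entry mapping `Account` to `Conta` (illustrative data), resolving for `pt-BR` would pick up that mapping, while resolving for a language with no entry yields an empty term map and only the `doNotTranslate` list.
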