language-data

Version:

Linguistic data useful for font testing and development.

39 lines (35 loc) • 4.89 kB

Markdown

# Data structure The JSON data file [`/dist/language-data.json`](https://github.com/hyvyys/language-data/blob/master/dist/language-data.json) is generated from JavaScript source [`/src/languageData.js`](https://github.com/hyvyys/language-data/blob/master/src/languageData.js) and contains an array of entries, each containing the following fields: Field | Data type | Description --- | --- | --- **language** | `String` | Language name in English. **altNames** | `Array` of `String` | Alternative language names, also used for looking up HTML tags if default fails. **htmlTag** | `String` | A minimal [BCP-47 language tag](https://www.ietf.org/rfc/bcp/bcp47.txt) used for HTML lang attribute. Typically equivalent to the 2-letter ISO-639-1 code or the 3-letter ISO-639-3 code when the former isn't defined. **opentypeTag** | `String` | [Four-character language code](https://docs.microsoft.com/en-us/typography/opentype/spec/languagetags) used by OpenType features. For unsupported languages, some engines (notably Harfbuzz) use ISO-639-3 codes as fallback, so it might be useful to implement them in fonts. **script** | `String` | Four letter ISO-15924 script code, e.g. `Latn` or `Cyrl`. Or the arbitrary value `IPA`, used for the IPA entry. **scriptName** | `String` | ISO-15924 script name, e.g. `Latin` or `Cyrillic`. `IPA` for the (pseudo-)language IPA. **region** | `String` | Arbitrary geographical region that the language belongs in. **speakers** | `Number` | Number of L1 (native) speakers. Can be set to 0 for artificial languages or typographical conventions that don't correspond to an actual language, e.g. phonetic transcription. **pangrams** | `Array` of `String` | Pangrams, i.e. sentences that contain all letters of the language's alphabet. **letterings** | `Array` of `String` | Letterings, i.e. strings of words starting with each letter of the language's alphabet, preferably also repeating the initial letter within. This way a single word can be used to show off both uppercase and lowercase in a natural setting. **sentences** | `Array` of `String` | Single sentences in the given language, approx. 100-200 characters. **paragraphs** | `Array` of `String` | Paragraphs, i.e. longer passages in the given language, approx. 250-750 characters. **smallcaps** | `Array` of `String` | Paragraphs or sentences in HTML, sprinkled with small caps words formatted like this: `<span style='font-variant-caps: all-small-caps;'>AWOL<span>` **gotchas** | `Array` of `Object` | Typographic challenges specific to given language, e.g. required ligatures, kerning/spacing pairs (also for punctuation), things to look out for when adding language support to a font. gotchas[i].**topic** | `String` | Concerned letters or their names (applies to diacritics), or other concise description of the issue. gotchas[i].**tags** | `Array` of `String` | One or more of:<ul><li>`metrics` — for issues related to spacing or kerning,</li><li>`ligature` — concerning a possibly needed ligature,</li><li>`contextual` — concerning a possibly needed contextual alternate,</li><li>`localization` — related to alternate localized glyphs, (gotchas without this tag are just pointers to making a better font in general)</li><li>`congruency` — regarding interplay between design of particular glyphs</li><li>`optional` — for issues that might be considered irrelevant (the described feature is more `nice-to-have` than `must-have`).</li></ul> gotchas[i].**description** | `String` | Description of the issue and/or design recommendations. gotchas[i].**tests** | `Array` of `String` | Strings that can be used to test a font against the issue. **specialCharacters** | `String` | Special characters (mainly accented letters — diacritics) used by the language. **alphabet** | `String` | The letters of the language's alphabet in order, separated by spaces. Typically A-Z with `specialCharacters` intertwined or appended, depending on the language's convention. **alphabetIsSorted** | `Boolean` | If true, ignore sorting suggested by JavaScript. **optionalCharacters** | `String` | Optional characters used by the language on rare occasions. **optionalCharactersNote** | `String` | Details regarding the usage of optional characters. **pseudo** | `Boolean` | Set to `true` for writing systems that are not everyday orthographies of spoken languages (e.g. linguistics). <br> > #### Do not edit manually > This documentation file is generated from [`/src/LanguageDataParser/entryFormat.js`](https://github.com/hyvyys/language-data/blob/master/src/LanguageDataParser/entryFormat.js) > by the script at [`/scripts/build.js`](https://github.com/hyvyys/language-data/blob/master/scripts/build.js). > > To update it, edit either and run `npm run build`. > Then you can paste the result here to evaluate the preview but instead of saving, commit your local changes.