larvitgeodata
Version:
Geo data, primarily ISO territories, languages etc. Data fetched mostly from CLDR.
396 lines (394 loc) • 17.7 kB
text/xml
<!--
Copyright © 1991-2013 Unicode, Inc.
CLDR data files are interpreted according to the LDML specification (http://unicode.org/reports/tr35/)
For terms of use, see http://www.unicode.org/copyright.html
-->
<supplementalData>
<version number="$Revision: 11914 $"/>
<transforms>
<transform source="Latin" target="Katakana" direction="both">
<comment># note: a global filter is more efficient, but MUST include all source chars</comment>
<comment>#:: [\u0000-\u007E 、。 ゙-゜ ァ-ー 。-゚ [:Latin:][:Katakana:] [:nonspacing mark:]] ;</comment>
<comment># MINIMAL FILTER GENERATED FOR: Latin-Katakana</comment>
<comment>### WARNING -- must add width filter, both here and below!!! ###</comment>
<tRule>:: [[ᄀ-ᄒᄚᄡ\u1160-ᅵᆪᆬ-ᆭᆰ-ᆵ←-↓│■○\u3000-。「-」゙-゚ァ-ロワヲ-ヴヷヺ-ー!-~¢-₩][',.A-Za-z~À-ÖØ-öø-ďĒ-ĥĨ-İĴ-ķĹ-ľŃ-ňŌ-őŔ-ťŨ-žƠ-ơƯ-ưǍ-ǜǞ-ǣǦ-ǭǰǴ-ǵǸ-țȞ-ȟȦ-ȳ̄Ӣ-ӣӮ-ӯḀ-ẙẠ-ỹᾱᾹῑῙῡῩK-Å]] ;</tRule>
<tRule>:: [:Latin:] fullwidth-halfwidth ();</tRule>
<tRule>:: NFD (NFC);</tRule>
<tRule>:: Lower (); # whenever transliterating from cased to uncased script, include this</tRule>
<comment># :: NFD () ; # this would catch the odd cases where a lowercase is not in NFD, but none are important for Japanese</comment>
<comment># Uses modified Hepburn. Small changes to make unambiguous.</comment>
<comment># | Kunrei-shiki: Hepburn/MHepburn</comment>
<comment># | ------------------------------</comment>
<comment># | si: shi</comment>
<comment># | si ~ya: sha</comment>
<comment># | si ~yu: shu</comment>
<comment># | si ~yo: sho</comment>
<comment># | zi: ji</comment>
<comment># | zi ~ya: ja</comment>
<comment># | zi ~yu: ju</comment>
<comment># | zi ~yo: jo</comment>
<comment># | ti: chi</comment>
<comment># | ti ~ya: cha</comment>
<comment># | ti ~yu: chu</comment>
<comment># | ti ~yu: cho</comment>
<comment># | tu: tsu</comment>
<comment># | di: ji/dji</comment>
<comment># | du: zu/dzu</comment>
<comment># | hu: fu</comment>
<comment># | For foreign words:</comment>
<comment># | -----------------</comment>
<comment># | se ~i si</comment>
<comment># | si ~e she</comment>
<comment># |</comment>
<comment># | ze ~i zi</comment>
<comment># | zi ~e je</comment>
<comment># |</comment>
<comment># | te ~i ti</comment>
<comment># | ti ~e che</comment>
<comment># | te ~u tu</comment>
<comment># |</comment>
<comment># | de ~i di</comment>
<comment># | de ~u du</comment>
<comment># | de ~i di</comment>
<comment># |</comment>
<comment># | he ~u: hu</comment>
<comment># | hu ~a fa</comment>
<comment># | hu ~i fi</comment>
<comment># | hu ~e he</comment>
<comment># | hu ~o ho</comment>
<comment># Most small forms are generated, but if necessary</comment>
<comment># explicit small forms are given with ~a, ~ya, etc.</comment>
<comment>#------------------------------------------------------</comment>
<comment># Variables</comment>
<tRule>$vowel = [aeiou] ;</tRule>
<tRule>$consonant = [bcdfghjklmnpqrstvwxyz] ;</tRule>
<tRule>$macron = ̄ ;</tRule>
<comment># Variables used for doubled-consonants with tsu</comment>
<tRule>$kana = [ぁ-ゔ] ;</tRule>
<tRule>$voice = [゙゛];</tRule>
<tRule>$semivoice = [゚゜];</tRule>
<tRule>$k_start = [カキクケコかきくけこ] ;</tRule>
<tRule>$s_start = [サシスセソさしすせそ] ;</tRule>
<tRule>$j_start = [シし] $voice ;</tRule>
<tRule>$t_start = [タチツテトたちつてと] ;</tRule>
<tRule>$n_start = [ナニヌネノンなにぬねの] ;</tRule>
<tRule>$h_start = [ハヒヘホはひへほ] ;</tRule>
<tRule>$f_start = [フふ] ;</tRule>
<tRule>$m_start = [マミムメモまみむめも] ;</tRule>
<tRule>$y_start = [ヤユヨやゆよ] ;</tRule>
<tRule>$r_start = [ラリルレロらりるれろ] ;</tRule>
<tRule>$w_start = [ワヰヱヲわゐゑを] ;</tRule>
<tRule>$v_start = [ワヰヱヲ]゙ ;</tRule>
<tRule>$voweled_basekana = [ァ-オカキクケコサシスセソタチッツテトナ-ノハヒフヘホマ-ヲヵヶ] ;</tRule>
<comment># if ン is followed by $n_quoter, then it needs an</comment>
<comment># apostrophe after its romaji form to disambiguate it.</comment>
<comment># e.g., ン ア ! = ナ, so represent as "n'a", not "na".</comment>
<tRule>$n_quoter = [ア イ ウ エ オ ナ ニ ヌ ネ ノ ヤ ユ ヨ ン] ;</tRule>
<tRule>$small_y = [ャィュェョ] ;</tRule>
<tRule>$iteration = ゝ ;</tRule>
<comment>#------------------------------------------------------</comment>
<comment># katakana rules</comment>
<comment># Punctuation</comment>
<tRule>'.' ↔ 。;</tRule>
<tRule>',' ↔ 、;</tRule>
<comment># ' ' } [a-z] → ; # delete spaces before latin</comment>
<comment># ' ' ← [^' '゠-ヿ] {} ['゠-ヿ] ; #insert spaces before hiragana</comment>
<comment># Iteration Mark</comment>
<comment># Copy previous letter § marks</comment>
<comment># TODO</comment>
<comment># | $1 $1 ← ($kana [[:M:]$voice$semivoice]?) $iteration</comment>
<comment># Specials for katakana -- not shared with hiragana</comment>
<tRule>va ↔ ヷ ;</tRule>
<tRule>vi ↔ ヸ ;</tRule>
<tRule>ve ↔ ヹ ;</tRule>
<tRule>vo ↔ ヺ ;</tRule>
<tRule>'~ka' ↔ ヵ ;</tRule>
<tRule>'~ke' ↔ ヶ ;</tRule>
<comment># ~~~ begin shared rules ~~~</comment>
<comment>#special</comment>
<tRule>ya ← '~'ャ;</tRule>
<tRule>yi ← '~'ィ ;</tRule>
<tRule>yu ← '~'ュ;</tRule>
<tRule>ye ← '~'ェ;</tRule>
<tRule>yo ← '~'ョ;</tRule>
<comment>#normal</comment>
<tRule>a ↔ ア ;</tRule>
<tRule>b | '~' ← ヒ ゙} $small_y ;</tRule>
<tRule>by } $vowel → ビ | '~y' ;</tRule>
<tRule>ba ↔ バ ;</tRule>
<tRule>bi ↔ ビ ;</tRule>
<tRule>bu ↔ ブ ;</tRule>
<tRule>be ↔ ベ ;</tRule>
<tRule>bo ↔ ボ ;</tRule>
<tRule>c } i → | s ;</tRule>
<tRule>c } e → | s ;</tRule>
<tRule>da ↔ ダ ;</tRule>
<tRule>di ↔ ディ ;</tRule>
<tRule>du ↔ デゥ ;</tRule>
<tRule>de ↔ デ ;</tRule>
<tRule>do ↔ ド ;</tRule>
<tRule>dzu ↔ ヅ ;</tRule>
<tRule>dja ← ヂャ ;</tRule>
<tRule>dji'~i' ← ヂィ ; # liu</tRule>
<tRule>dju ← ヂュ ;</tRule>
<tRule>dje ← ヂェ ;</tRule>
<tRule>djo ← ヂョ ;</tRule>
<tRule>dji ↔ ヂ ;</tRule>
<tRule>dj } $vowel → ヂ | '~y' ;</tRule>
<comment># TODO: QUESTION: use ĵĴżŻ instead of dj, dz</comment>
<tRule>cha ← チャ ;</tRule>
<tRule>chi'~i' ← チィ ; # liu</tRule>
<tRule>chu ← チュ ;</tRule>
<tRule>che ← チェ ;</tRule>
<tRule>cho ← チョ ;</tRule>
<tRule>chi ↔ チ ;</tRule>
<tRule>ch } $vowel → チ | '~y' ;</tRule>
<tRule>e ↔ エ ;</tRule>
<tRule>g | '~' ← ギ} $small_y ;</tRule>
<tRule>gy } $vowel → ギ | '~y' ;</tRule>
<tRule>ga ↔ ガ ;</tRule>
<tRule>gi ↔ ギ ;</tRule>
<tRule>gu ↔ グ ;</tRule>
<tRule>ge ↔ ゲ ;</tRule>
<tRule>go ↔ ゴ ;</tRule>
<tRule>i ↔ イ ;</tRule>
<comment># j } $vowel → ジ | '~y' ;</comment>
<tRule>ja ↔ ジャ ;</tRule>
<tRule>ji'~i' ← ジィ ; # liu</tRule>
<tRule>ju ↔ ジュ ;</tRule>
<tRule>je ↔ ジェ ;</tRule>
<tRule>jo ↔ ジョ ;</tRule>
<tRule>ji ↔ ジ ;</tRule>
<tRule>k | '~' ← キ} $small_y ;</tRule>
<tRule>ky } $vowel → キ | '~y' ;</tRule>
<tRule>ka ↔ カ ;</tRule>
<tRule>ki ↔ キ ;</tRule>
<tRule>ku ↔ ク ;</tRule>
<tRule>ke ↔ ケ ;</tRule>
<tRule>ko ↔ コ ;</tRule>
<tRule>m | '~' ← ミ} $small_y ;</tRule>
<tRule>my } $vowel → ミ | '~y' ;</tRule>
<tRule>ma ↔ マ ;</tRule>
<tRule>mi ↔ ミ ;</tRule>
<tRule>mu ↔ ム ;</tRule>
<tRule>me ↔ メ ;</tRule>
<tRule>mo ↔ モ ;</tRule>
<tRule>m } [pbfv] → ン ;</tRule>
<tRule>n | '~' ← ニ } $small_y ;</tRule>
<tRule>ny } $vowel → ニ | '~y' ;</tRule>
<tRule>na ↔ ナ ;</tRule>
<tRule>ni ↔ ニ ;</tRule>
<tRule>nu ↔ ヌ ;</tRule>
<tRule>ne ↔ ネ ;</tRule>
<tRule>no ↔ ノ ;</tRule>
<tRule>o ↔ オ ;</tRule>
<tRule>p | '~' ← ピ } $small_y ;</tRule>
<tRule>py } $vowel → ピ | '~y' ;</tRule>
<tRule>pa ↔ パ ;</tRule>
<tRule>pi ↔ ピ ;</tRule>
<tRule>pu ↔ プ ;</tRule>
<tRule>pe ↔ ペ ;</tRule>
<tRule>po ↔ ポ ;</tRule>
<tRule>h | '~' ← ヒ } $small_y ;</tRule>
<tRule>hy } $vowel → ヒ | '~y' ;</tRule>
<tRule>ha ↔ ハ ;</tRule>
<tRule>hi ↔ ヒ ;</tRule>
<tRule>hu ↔ ヘゥ ;</tRule>
<tRule>he ↔ ヘ ;</tRule>
<tRule>ho ↔ ホ ;</tRule>
<comment># f | '~' ← フ } $small_y ;</comment>
<comment># f } $vowel → フ | '~' ;</comment>
<tRule>fa ↔ ファ ;</tRule>
<tRule>fi ↔ フィ ;</tRule>
<tRule>fe ↔ フェ ;</tRule>
<tRule>fo ↔ フォ ;</tRule>
<tRule>fu ↔ フ ;</tRule>
<tRule>r | '~' ← リ } $small_y ;</tRule>
<tRule>ry } $vowel → リ | '~y' ;</tRule>
<tRule>ra ↔ ラ ;</tRule>
<tRule>ri ↔ リ ;</tRule>
<tRule>ru ↔ ル ;</tRule>
<tRule>re ↔ レ ;</tRule>
<tRule>ro ↔ ロ ;</tRule>
<tRule>za ↔ ザ ;</tRule>
<tRule>zi ↔ ゼィ ;</tRule>
<tRule>zu ↔ ズ ;</tRule>
<tRule>ze ↔ ゼ ;</tRule>
<tRule>zo ↔ ゾ ;</tRule>
<tRule>sa ↔ サ ;</tRule>
<tRule>si ↔ セィ ;</tRule>
<tRule>su ↔ ス ;</tRule>
<tRule>se ↔ セ ;</tRule>
<tRule>so ↔ ソ ;</tRule>
<tRule>sha ← シャ ;</tRule>
<tRule>shi'~i' ← シィ ; # liu</tRule>
<tRule>shu ← シュ ;</tRule>
<tRule>she ← シェ ;</tRule>
<tRule>sho ← ショ ;</tRule>
<tRule>shi ↔ シ ;</tRule>
<tRule>sh } $vowel → シ | '~y' ;</tRule>
<tRule>ta ↔ タ ;</tRule>
<tRule>ti ↔ ティ ;</tRule>
<tRule>tu ↔ テゥ ;</tRule>
<tRule>te ↔ テ ;</tRule>
<tRule>to ↔ ト ;</tRule>
<tRule>tsu ↔ ツ ;</tRule>
<comment># v } $vowel → ヴ | '~' ;</comment>
<comment>#'v~a' ← ヴァ ; # liu</comment>
<comment>#'v~i' ← ヴィ ; # liu</comment>
<comment>#'v~e' ← ヴェ ; # liu</comment>
<comment>#'v~o' ← ヴォ ; # liu</comment>
<tRule>vu ↔ ヴ ;</tRule>
<tRule>u ↔ ウ ;</tRule>
<comment># w } $vowel → ウ | '~' ;</comment>
<tRule>wa ↔ ワ ;</tRule>
<tRule>wi ↔ ヰ ;</tRule>
<tRule>wu → ウ ;</tRule>
<tRule>we ↔ ヱ ;</tRule>
<tRule>wo ↔ ヲ ;</tRule>
<tRule>ya ↔ ヤ ;</tRule>
<tRule>yi → イ ;</tRule>
<tRule>yu ↔ ユ ;</tRule>
<tRule>ye → エ ;</tRule>
<tRule>yo ↔ ヨ ;</tRule>
<comment># double consonants</comment>
<comment>#specials</comment>
<tRule>s } sh → ッ ;</tRule>
<tRule>t } ch → ッ ;</tRule>
<comment>#voiced</comment>
<tRule>j } j ↔ ッ } $j_start ;</tRule>
<tRule>b } b ↔ ッ } [$h_start$f_start] $voice;</tRule>
<tRule>d } d ↔ ッ } $t_start $voice;</tRule>
<tRule>g } g ↔ ッ } $k_start $voice;</tRule>
<tRule>p } p ↔ ッ } [$h_start$f_start] $semivoice;</tRule>
<comment># v } v ↔ ッ } [ワヰウヱヲう] $voice ;</comment>
<tRule>z } z ↔ ッ } $s_start $voice;</tRule>
<tRule>v } v ↔ ッ } $v_start;</tRule>
<comment># normal</comment>
<tRule>k } k ↔ ッ } $k_start ;</tRule>
<tRule>m } m ↔ ッ } $m_start ;</tRule>
<tRule>n } n ↔ ッ } $n_start ;</tRule>
<tRule>h } h ↔ ッ } $h_start ;</tRule>
<tRule>f } f ↔ ッ } $f_start ;</tRule>
<tRule>r } r ↔ ッ } $r_start ;</tRule>
<tRule>t } t ↔ ッ } $t_start ;</tRule>
<tRule>s } s ↔ ッ } $s_start ;</tRule>
<tRule>w } w ↔ ッ } $w_start;</tRule>
<tRule>y } y ↔ ッ } $y_start;</tRule>
<comment># completeness</comment>
<tRule>x } x → ッ ;</tRule>
<tRule>c } k → ッ ;</tRule>
<tRule>c } c → ッ ;</tRule>
<tRule>c } q → ッ ;</tRule>
<tRule>l } l → ッ ;</tRule>
<tRule>q } q → ッ ;</tRule>
<comment># y } y → ッ ;</comment>
<comment># w } w → ッ ;</comment>
<comment># prolonged vowel mark. this indicates a doubling of</comment>
<comment># the preceding vowel sound</comment>
<comment>#a ← a { ー ; # liu</comment>
<comment>#e ← e { ー ; # liu</comment>
<comment>#i ← i { ー ; # liu</comment>
<comment>#o ← o { ー ; # liu</comment>
<comment>#u ← u { ー ; # liu</comment>
<tRule>$macron ↔ ー ;</tRule>
<comment># small forms</comment>
<tRule>'~a' ↔ ァ ;</tRule>
<tRule>'~i' ↔ ィ ;</tRule>
<tRule>'~u' ↔ ゥ ;</tRule>
<tRule>'~e' ↔ ェ ;</tRule>
<tRule>'~o' ↔ ォ ;</tRule>
<tRule>'~tsu' ↔ ッ ;</tRule>
<tRule>'~wa' ↔ ヮ ;</tRule>
<tRule>'~ya' ↔ ャ ;</tRule>
<tRule>'~yi' → ィ ;</tRule>
<tRule>'~yu' ↔ ュ ;</tRule>
<tRule>'~ye' → ェ ;</tRule>
<tRule>'~yo' ↔ ョ ;</tRule>
<comment># iteration marks</comment>
<comment># TODO: make more accurate</comment>
<tRule>j $1 ← sh (y* $vowel) {ヽ$voice ;</tRule>
<tRule>dj $1 ← ch (y* $vowel) {ヽ$voice ;</tRule>
<tRule>dz $1 ← ts (y* $vowel) {ヽ$voice ;</tRule>
<tRule>g $1 ← k (y* $vowel) {ヽ$voice ;</tRule>
<tRule>z $1 ← s (y* $vowel) {ヽ$voice ;</tRule>
<tRule>d $1 ← t (y* $vowel) {ヽ$voice ;</tRule>
<tRule>h $1 ← b (y* $vowel) {ヽ$voice ;</tRule>
<tRule>v $1 ← w (y* $vowel) {ヽ$voice ;</tRule>
<tRule>sh $1 ← sh (y* $vowel) {ヽ$voice ;</tRule>
<tRule>j $1 ← j (y* $vowel) {ヽ$voice ;</tRule>
<tRule>ch $1 ← ch (y* $vowel) {ヽ$voice ;</tRule>
<tRule>dj $1 ← dj(y* $vowel) {ヽ$voice ;</tRule>
<tRule>ts $1 ← ts (y* $vowel) {ヽ$voice ;</tRule>
<tRule>dz $1 ← dz (y* $vowel) {ヽ$voice ;</tRule>
<tRule>$1 ← ($consonant y* $vowel) {ヽ$voice? ;</tRule>
<tRule>$1 ← (.) {ヽ $voice? ; # otherwise repeat last character</tRule>
<tRule>← ヽ $voice? ; # delete if no characters found</tRule>
<comment># h- rule: lengthens vowel if not followed by a vowel.</comment>
<comment># At the point this is applied, latin [cons]?vowel sequences</comment>
<comment># have been converted to katakana in NFD form.</comment>
<tRule>$voweled_basekana [\u3099 \u309A]? { h → ー ;</tRule>
<comment># one-way latin- → kana rules. these do not occur in</comment>
<comment># well-formed romaji representing actual japanese text.</comment>
<comment># their purpose is to make all romaji map to kana of</comment>
<comment># some sort.</comment>
<comment># the following are not really necessary, but produce</comment>
<comment># slightly more natural results.</comment>
<tRule>cy → セィ ;</tRule>
<tRule>dy → ディ ;</tRule>
<tRule>hy → ヒ ;</tRule>
<tRule>sy → セィ ;</tRule>
<tRule>ty → ティ ;</tRule>
<tRule>zy → ゼィ ;</tRule>
<tRule>h → ヘ ;</tRule>
<comment># isolated consonants listed here so as not to mask</comment>
<comment># longer rules above.</comment>
<tRule>ch → チ;</tRule>
<tRule>sh → シ ;</tRule>
<tRule>dz → ヅ ;</tRule>
<tRule>dj → ヂ;</tRule>
<tRule>b → ブ ;</tRule>
<tRule>d → デ ;</tRule>
<tRule>g → グ ;</tRule>
<tRule>k → ク ;</tRule>
<tRule>m → ム ;</tRule>
<tRule>n'' ← ン } $n_quoter ;</tRule>
<tRule>n ↔ ン ;</tRule>
<tRule>p → プ ;</tRule>
<tRule>r → ル ;</tRule>
<tRule>s → ス ;</tRule>
<tRule>t → テ ;</tRule>
<tRule>y → イ ;</tRule>
<tRule>z → ズ ;</tRule>
<tRule>v → ヴ ;</tRule>
<tRule>f → フ;</tRule>
<tRule>j → ジ;</tRule>
<tRule>w → ウ;</tRule>
<tRule>ß → | ss ;</tRule>
<tRule>æ → | e ;</tRule>
<tRule>ð → | d ;</tRule>
<tRule>ø → | u ;</tRule>
<tRule>þ → | th ;</tRule>
<comment># simple substitutions using backup</comment>
<tRule>c → | k ;</tRule>
<tRule>l → | r ;</tRule>
<tRule>q → | k ;</tRule>
<tRule>x → | ks ;</tRule>
<comment># ~~~ END shared rules ~~~</comment>
<comment>#------------------------------------------------------</comment>
<comment># Final cleanup</comment>
<tRule>'~' → ; # delete stray tildes between letters</tRule>
<tRule>[:Katakana:] { '' } [:Latin:] → ; # delete stray quotes between letters</tRule>
<comment># [ʾ[:Nonspacing Mark:]-[゙-゜]] → ; # delete any non-spacing marks that we didn't use</comment>
<tRule>:: NFC (NFD) ;</tRule>
<tRule>:: ([[:Katakana:][\u309B\u309C\u30A0\u30FC\uFF70\uFF9E\uFF9F]] halfwidth-fullwidth);</tRule>
<comment># note: a global filter is more efficient, but MUST include all source chars!!</comment>
<comment>#:: ([\u0000-\u007E 、。 ゙-゜ ァ-ー 。-゚ [:Latin:][:Katakana:] [:nonspacing mark:]]);</comment>
<comment># MINIMAL FILTER GENERATED FOR: Latin-Katakana BACKWARD</comment>
<tRule>:: ( [[\ -~¢-£¥-¦¬̄₩。-하-ᅦᅧ-ᅬᅭ-ᅲᅳ-ᅵ│-○][~、-。がぎぐげござじずぜぞだぢづでどば-ぱび-ぴぶ-ぷべ-ぺぼ-ぽゔ゙-゛ゞァ-ヺー-ヾ][\u309B\u309C\u30A0\u30FC\uFF70\uFF9E\uFF9F]] ) ;</tRule>
<comment># eof</comment>
</transform>
</transforms>
</supplementalData>