larvitgeodata

<?xml version="1.0" encoding="UTF-8" ?> <!DOCTYPE supplementalData SYSTEM "../../common/dtd/ldmlSupplemental.dtd">  <supplementalData> <version number="$Revision: 11914 $"/> <transforms> <transform source="Latin" target="Katakana" direction="both"> <comment># note: a global filter is more efficient, but MUST include all source chars</comment> <comment>#:: [\u0000-\u007E 、。゙-゜ァ-ー｡-ﾟ [:Latin:][:Katakana:] [:nonspacing mark:]] ;</comment> <comment># MINIMAL FILTER GENERATED FOR: Latin-Katakana</comment> <comment>### WARNING -- must add width filter, both here and below!!! ###</comment> <tRule>:: [[ᄀ-ᄒᄚᄡ\u1160-ᅵᆪᆬ-ᆭᆰ-ᆵ←-↓│■○\u3000-。「-」゙-゚ァ-ロワヲ-ヴヷヺ-ー！-～￠-￦][',.A-Za-z~À-ÖØ-öø-ďĒ-ĥĨ-İĴ-ķĹ-ľŃ-ňŌ-őŔ-ťŨ-žƠ-ơƯ-ưǍ-ǜǞ-ǣǦ-ǭǰǴ-ǵǸ-țȞ-ȟȦ-ȳ̄Ӣ-ӣӮ-ӯḀ-ẙẠ-ỹᾱᾹῑῙῡῩK-Å]] ;</tRule> <tRule>:: [:Latin:] fullwidth-halfwidth ();</tRule> <tRule>:: NFD (NFC);</tRule> <tRule>:: Lower (); # whenever transliterating from cased to uncased script, include this</tRule> <comment># :: NFD () ; # this would catch the odd cases where a lowercase is not in NFD, but none are important for Japanese</comment> <comment># Uses modified Hepburn. 
Small changes to make unambiguous.</comment> <comment># | Kunrei-shiki: Hepburn/MHepburn</comment> <comment># | ------------------------------</comment> <comment># | si: shi</comment> <comment># | si ~ya: sha</comment> <comment># | si ~yu: shu</comment> <comment># | si ~yo: sho</comment> <comment># | zi: ji</comment> <comment># | zi ~ya: ja</comment> <comment># | zi ~yu: ju</comment> <comment># | zi ~yo: jo</comment> <comment># | ti: chi</comment> <comment># | ti ~ya: cha</comment> <comment># | ti ~yu: chu</comment> <comment># | ti ~yo: cho</comment> <comment># | tu: tsu</comment> <comment># | di: ji/dji</comment> <comment># | du: zu/dzu</comment> <comment># | hu: fu</comment> <comment># | For foreign words:</comment> <comment># | -----------------</comment> <comment># | se ~i si</comment> <comment># | si ~e she</comment> <comment># |</comment> <comment># | ze ~i zi</comment> <comment># | zi ~e je</comment> <comment># |</comment> <comment># | te ~i ti</comment> <comment># | ti ~e che</comment> <comment># | te ~u tu</comment> <comment># |</comment> <comment># | de ~i di</comment> <comment># | de ~u du</comment> <comment># |</comment> <comment># | he ~u hu</comment> <comment># | hu ~a fa</comment> <comment># | hu ~i fi</comment> <comment># | hu ~e fe</comment> <comment># | hu ~o fo</comment> <comment># Most small forms are generated, but if necessary</comment> <comment># explicit small forms are given with ~a, ~ya, etc.</comment> <comment>#------------------------------------------------------</comment> <comment># Variables</comment> <tRule>$vowel = [aeiou] ;</tRule> <tRule>$consonant = [bcdfghjklmnpqrstvwxyz] ;</tRule> <tRule>$macron = ̄ ;</tRule> <comment># Variables used for doubled-consonants with tsu</comment> <tRule>$kana = [ぁ-ゔ] ;</tRule> <tRule>$voice = [゙゛];</tRule> <tRule>$semivoice = [゚゜];</tRule> <tRule>$k_start = [カキクケコかきくけこ] ;</tRule> <tRule>$s_start = [サシスセソさしすせそ] ;</tRule> <tRule>$j_start = [シし] $voice ;</tRule> 
<tRule>$t_start = [タチツテトたちつてと] ;</tRule> <tRule>$n_start = [ナニヌネノンなにぬねの] ;</tRule> <tRule>$h_start = [ハヒヘホはひへほ] ;</tRule> <tRule>$f_start = [フふ] ;</tRule> <tRule>$m_start = [マミムメモまみむめも] ;</tRule> <tRule>$y_start = [ヤユヨやゆよ] ;</tRule> <tRule>$r_start = [ラリルレロらりるれろ] ;</tRule> <tRule>$w_start = [ワヰヱヲわゐゑを] ;</tRule> <tRule>$v_start = [ワヰヱヲ]゙ ;</tRule> <tRule>$voweled_basekana = [ァ-オカキクケコサシスセソタチッツテトナ-ノハヒフヘホマ-ヲヵヶ] ;</tRule> <comment># if ン is followed by $n_quoter, then it needs an</comment> <comment># apostrophe after its romaji form to disambiguate it.</comment> <comment># e.g., ンア != ナ, so represent as "n'a", not "na".</comment> <tRule>$n_quoter = [アイウエオナニヌネノヤユヨン] ;</tRule> <tRule>$small_y = [ャィュェョ] ;</tRule> <tRule>$iteration = ゝ ;</tRule> <comment>#------------------------------------------------------</comment> <comment># katakana rules</comment> <comment># Punctuation</comment> <tRule>'.' ↔ 。;</tRule> <tRule>',' ↔ 、;</tRule> <comment># ' ' } [a-z] → ; # delete spaces before latin</comment> <comment># ' ' ← [^' '゠-ヿ] {} ['゠-ヿ] ; #insert spaces before hiragana</comment> <comment># Iteration Mark</comment> <comment># Copy previous letter § marks</comment> <comment># TODO</comment> <comment># | $1 $1 ← ($kana [[:M:]$voice$semivoice]?) 
$iteration</comment> <comment># Specials for katakana -- not shared with hiragana</comment> <tRule>va ↔ ヷ ;</tRule> <tRule>vi ↔ ヸ ;</tRule> <tRule>ve ↔ ヹ ;</tRule> <tRule>vo ↔ ヺ ;</tRule> <tRule>'~ka' ↔ ヵ ;</tRule> <tRule>'~ke' ↔ ヶ ;</tRule> <comment># ~~~ begin shared rules ~~~</comment> <comment>#special</comment> <tRule>ya ← '~'ャ;</tRule> <tRule>yi ← '~'ィ ;</tRule> <tRule>yu ← '~'ュ;</tRule> <tRule>ye ← '~'ェ;</tRule> <tRule>yo ← '~'ョ;</tRule> <comment>#normal</comment> <tRule>a ↔ ア ;</tRule> <tRule>b | '~' ← ビ} $small_y ;</tRule> <tRule>by } $vowel → ビ | '~y' ;</tRule> <tRule>ba ↔ バ ;</tRule> <tRule>bi ↔ ビ ;</tRule> <tRule>bu ↔ ブ ;</tRule> <tRule>be ↔ ベ ;</tRule> <tRule>bo ↔ ボ ;</tRule> <tRule>c } i → | s ;</tRule> <tRule>c } e → | s ;</tRule> <tRule>da ↔ ダ ;</tRule> <tRule>di ↔ ディ ;</tRule> <tRule>du ↔ デゥ ;</tRule> <tRule>de ↔ デ ;</tRule> <tRule>do ↔ ド ;</tRule> <tRule>dzu ↔ ヅ ;</tRule> <tRule>dja ← ヂャ ;</tRule> <tRule>dji'~i' ← ヂィ ; # liu</tRule> <tRule>dju ← ヂュ ;</tRule> <tRule>dje ← ヂェ ;</tRule> <tRule>djo ← ヂョ ;</tRule> <tRule>dji ↔ ヂ ;</tRule> <tRule>dj } $vowel → ヂ | '~y' ;</tRule> <comment># TODO: QUESTION: use ĵĴżŻ instead of dj, dz</comment> <tRule>cha ← チャ ;</tRule> <tRule>chi'~i' ← チィ ; # liu</tRule> <tRule>chu ← チュ ;</tRule> <tRule>che ← チェ ;</tRule> <tRule>cho ← チョ ;</tRule> <tRule>chi ↔ チ ;</tRule> <tRule>ch } $vowel → チ | '~y' ;</tRule> <tRule>e ↔ エ ;</tRule> <tRule>g | '~' ← ギ} $small_y ;</tRule> <tRule>gy } $vowel → ギ | '~y' ;</tRule> <tRule>ga ↔ ガ ;</tRule> <tRule>gi ↔ ギ ;</tRule> <tRule>gu ↔ グ ;</tRule> <tRule>ge ↔ ゲ ;</tRule> <tRule>go ↔ ゴ ;</tRule> <tRule>i ↔ イ ;</tRule> <comment># j } $vowel → ジ | '~y' ;</comment> <tRule>ja ↔ ジャ ;</tRule> <tRule>ji'~i' ← ジィ ; # liu</tRule> <tRule>ju ↔ ジュ ;</tRule> <tRule>je ↔ ジェ ;</tRule> <tRule>jo ↔ ジョ ;</tRule> <tRule>ji ↔ ジ ;</tRule> <tRule>k | '~' ← キ} $small_y ;</tRule> <tRule>ky } $vowel → キ | '~y' ;</tRule> <tRule>ka ↔ カ ;</tRule> <tRule>ki ↔ キ ;</tRule> 
<tRule>ku ↔ ク ;</tRule> <tRule>ke ↔ ケ ;</tRule> <tRule>ko ↔ コ ;</tRule> <tRule>m | '~' ← ミ} $small_y ;</tRule> <tRule>my } $vowel → ミ | '~y' ;</tRule> <tRule>ma ↔ マ ;</tRule> <tRule>mi ↔ ミ ;</tRule> <tRule>mu ↔ ム ;</tRule> <tRule>me ↔ メ ;</tRule> <tRule>mo ↔ モ ;</tRule> <tRule>m } [pbfv] → ン ;</tRule> <tRule>n | '~' ← ニ } $small_y ;</tRule> <tRule>ny } $vowel → ニ | '~y' ;</tRule> <tRule>na ↔ ナ ;</tRule> <tRule>ni ↔ ニ ;</tRule> <tRule>nu ↔ ヌ ;</tRule> <tRule>ne ↔ ネ ;</tRule> <tRule>no ↔ ノ ;</tRule> <tRule>o ↔ オ ;</tRule> <tRule>p | '~' ← ピ } $small_y ;</tRule> <tRule>py } $vowel → ピ | '~y' ;</tRule> <tRule>pa ↔ パ ;</tRule> <tRule>pi ↔ ピ ;</tRule> <tRule>pu ↔ プ ;</tRule> <tRule>pe ↔ ペ ;</tRule> <tRule>po ↔ ポ ;</tRule> <tRule>h | '~' ← ヒ } $small_y ;</tRule> <tRule>hy } $vowel → ヒ | '~y' ;</tRule> <tRule>ha ↔ ハ ;</tRule> <tRule>hi ↔ ヒ ;</tRule> <tRule>hu ↔ ヘゥ ;</tRule> <tRule>he ↔ ヘ ;</tRule> <tRule>ho ↔ ホ ;</tRule> <comment># f | '~' ← フ } $small_y ;</comment> <comment># f } $vowel → フ | '~' ;</comment> <tRule>fa ↔ ファ ;</tRule> <tRule>fi ↔ フィ ;</tRule> <tRule>fe ↔ フェ ;</tRule> <tRule>fo ↔ フォ ;</tRule> <tRule>fu ↔ フ ;</tRule> <tRule>r | '~' ← リ } $small_y ;</tRule> <tRule>ry } $vowel → リ | '~y' ;</tRule> <tRule>ra ↔ ラ ;</tRule> <tRule>ri ↔ リ ;</tRule> <tRule>ru ↔ ル ;</tRule> <tRule>re ↔ レ ;</tRule> <tRule>ro ↔ ロ ;</tRule> <tRule>za ↔ ザ ;</tRule> <tRule>zi ↔ ゼィ ;</tRule> <tRule>zu ↔ ズ ;</tRule> <tRule>ze ↔ ゼ ;</tRule> <tRule>zo ↔ ゾ ;</tRule> <tRule>sa ↔ サ ;</tRule> <tRule>si ↔ セィ ;</tRule> <tRule>su ↔ ス ;</tRule> <tRule>se ↔ セ ;</tRule> <tRule>so ↔ ソ ;</tRule> <tRule>sha ← シャ ;</tRule> <tRule>shi'~i' ← シィ ; # liu</tRule> <tRule>shu ← シュ ;</tRule> <tRule>she ← シェ ;</tRule> <tRule>sho ← ショ ;</tRule> <tRule>shi ↔ シ ;</tRule> <tRule>sh } $vowel → シ | '~y' ;</tRule> <tRule>ta ↔ タ ;</tRule> <tRule>ti ↔ ティ ;</tRule> <tRule>tu ↔ テゥ ;</tRule> <tRule>te ↔ テ ;</tRule> <tRule>to ↔ ト ;</tRule> <tRule>tsu ↔ ツ ;</tRule> <comment># v } $vowel → ヴ | '~' ;</comment> 
<comment>#'v~a' ← ヴァ ; # liu</comment> <comment>#'v~i' ← ヴィ ; # liu</comment> <comment>#'v~e' ← ヴェ ; # liu</comment> <comment>#'v~o' ← ヴォ ; # liu</comment> <tRule>vu ↔ ヴ ;</tRule> <tRule>u ↔ ウ ;</tRule> <comment># w } $vowel → ウ | '~' ;</comment> <tRule>wa ↔ ワ ;</tRule> <tRule>wi ↔ ヰ ;</tRule> <tRule>wu → ウ ;</tRule> <tRule>we ↔ ヱ ;</tRule> <tRule>wo ↔ ヲ ;</tRule> <tRule>ya ↔ ヤ ;</tRule> <tRule>yi → イ ;</tRule> <tRule>yu ↔ ユ ;</tRule> <tRule>ye → エ ;</tRule> <tRule>yo ↔ ヨ ;</tRule> <comment># double consonants</comment> <comment>#specials</comment> <tRule>s } sh → ッ ;</tRule> <tRule>t } ch → ッ ;</tRule> <comment>#voiced</comment> <tRule>j } j ↔ ッ } $j_start ;</tRule> <tRule>b } b ↔ ッ } [$h_start$f_start] $voice;</tRule> <tRule>d } d ↔ ッ } $t_start $voice;</tRule> <tRule>g } g ↔ ッ } $k_start $voice;</tRule> <tRule>p } p ↔ ッ } [$h_start$f_start] $semivoice;</tRule> <comment># v } v ↔ ッ } [ワヰウヱヲう] $voice ;</comment> <tRule>z } z ↔ ッ } $s_start $voice;</tRule> <tRule>v } v ↔ ッ } $v_start;</tRule> <comment># normal</comment> <tRule>k } k ↔ ッ } $k_start ;</tRule> <tRule>m } m ↔ ッ } $m_start ;</tRule> <tRule>n } n ↔ ッ } $n_start ;</tRule> <tRule>h } h ↔ ッ } $h_start ;</tRule> <tRule>f } f ↔ ッ } $f_start ;</tRule> <tRule>r } r ↔ ッ } $r_start ;</tRule> <tRule>t } t ↔ ッ } $t_start ;</tRule> <tRule>s } s ↔ ッ } $s_start ;</tRule> <tRule>w } w ↔ ッ } $w_start;</tRule> <tRule>y } y ↔ ッ } $y_start;</tRule> <comment># completeness</comment> <tRule>x } x → ッ ;</tRule> <tRule>c } k → ッ ;</tRule> <tRule>c } c → ッ ;</tRule> <tRule>c } q → ッ ;</tRule> <tRule>l } l → ッ ;</tRule> <tRule>q } q → ッ ;</tRule> <comment># y } y → ッ ;</comment> <comment># w } w → ッ ;</comment> <comment># prolonged vowel mark. 
this indicates a doubling of</comment> <comment># the preceding vowel sound</comment> <comment>#a ← a { ー ; # liu</comment> <comment>#e ← e { ー ; # liu</comment> <comment>#i ← i { ー ; # liu</comment> <comment>#o ← o { ー ; # liu</comment> <comment>#u ← u { ー ; # liu</comment> <tRule>$macron ↔ ー ;</tRule> <comment># small forms</comment> <tRule>'~a' ↔ ァ ;</tRule> <tRule>'~i' ↔ ィ ;</tRule> <tRule>'~u' ↔ ゥ ;</tRule> <tRule>'~e' ↔ ェ ;</tRule> <tRule>'~o' ↔ ォ ;</tRule> <tRule>'~tsu' ↔ ッ ;</tRule> <tRule>'~wa' ↔ ヮ ;</tRule> <tRule>'~ya' ↔ ャ ;</tRule> <tRule>'~yi' → ィ ;</tRule> <tRule>'~yu' ↔ ュ ;</tRule> <tRule>'~ye' → ェ ;</tRule> <tRule>'~yo' ↔ ョ ;</tRule> <comment># iteration marks</comment> <comment># TODO: make more accurate</comment> <tRule>j $1 ← sh (y* $vowel) {ヽ$voice ;</tRule> <tRule>dj $1 ← ch (y* $vowel) {ヽ$voice ;</tRule> <tRule>dz $1 ← ts (y* $vowel) {ヽ$voice ;</tRule> <tRule>g $1 ← k (y* $vowel) {ヽ$voice ;</tRule> <tRule>z $1 ← s (y* $vowel) {ヽ$voice ;</tRule> <tRule>d $1 ← t (y* $vowel) {ヽ$voice ;</tRule> <tRule>h $1 ← b (y* $vowel) {ヽ$voice ;</tRule> <tRule>v $1 ← w (y* $vowel) {ヽ$voice ;</tRule> <tRule>sh $1 ← sh (y* $vowel) {ヽ$voice ;</tRule> <tRule>j $1 ← j (y* $vowel) {ヽ$voice ;</tRule> <tRule>ch $1 ← ch (y* $vowel) {ヽ$voice ;</tRule> <tRule>dj $1 ← dj(y* $vowel) {ヽ$voice ;</tRule> <tRule>ts $1 ← ts (y* $vowel) {ヽ$voice ;</tRule> <tRule>dz $1 ← dz (y* $vowel) {ヽ$voice ;</tRule> <tRule>$1 ← ($consonant y* $vowel) {ヽ$voice? ;</tRule> <tRule>$1 ← (.) {ヽ $voice? ; # otherwise repeat last character</tRule> <tRule>← ヽ $voice? ; # delete if no characters found</tRule> <comment># h- rule: lengthens vowel if not followed by a vowel.</comment> <comment># At the point this is applied, latin [cons]?vowel sequences</comment> <comment># have been converted to katakana in NFD form.</comment> <tRule>$voweled_basekana [\u3099 \u309A]? { h → ー ;</tRule> <comment># one-way latin- → kana rules. 
these do not occur in</comment> <comment># well-formed romaji representing actual japanese text.</comment> <comment># their purpose is to make all romaji map to kana of</comment> <comment># some sort.</comment> <comment># the following are not really necessary, but produce</comment> <comment># slightly more natural results.</comment> <tRule>cy → セィ ;</tRule> <tRule>dy → ディ ;</tRule> <tRule>hy → ヒ ;</tRule> <tRule>sy → セィ ;</tRule> <tRule>ty → ティ ;</tRule> <tRule>zy → ゼィ ;</tRule> <tRule>h → ヘ ;</tRule> <comment># isolated consonants listed here so as not to mask</comment> <comment># longer rules above.</comment> <tRule>ch → チ;</tRule> <tRule>sh → シ ;</tRule> <tRule>dz → ヅ ;</tRule> <tRule>dj → ヂ;</tRule> <tRule>b → ブ ;</tRule> <tRule>d → デ ;</tRule> <tRule>g → グ ;</tRule> <tRule>k → ク ;</tRule> <tRule>m → ム ;</tRule> <tRule>n'' ← ン } $n_quoter ;</tRule> <tRule>n ↔ ン ;</tRule> <tRule>p → プ ;</tRule> <tRule>r → ル ;</tRule> <tRule>s → ス ;</tRule> <tRule>t → テ ;</tRule> <tRule>y → イ ;</tRule> <tRule>z → ズ ;</tRule> <tRule>v → ヴ ;</tRule> <tRule>f → フ;</tRule> <tRule>j → ジ;</tRule> <tRule>w → ウ;</tRule> <tRule>ß → | ss ;</tRule> <tRule>æ → | e ;</tRule> <tRule>ð → | d ;</tRule> <tRule>ø → | u ;</tRule> <tRule>þ → | th ;</tRule> <comment># simple substitutions using backup</comment> <tRule>c → | k ;</tRule> <tRule>l → | r ;</tRule> <tRule>q → | k ;</tRule> <tRule>x → | ks ;</tRule> <comment># ~~~ END shared rules ~~~</comment> <comment>#------------------------------------------------------</comment> <comment># Final cleanup</comment> <tRule>'~' → ; # delete stray tildes between letters</tRule> <tRule>[:Katakana:] { '' } [:Latin:] → ; # delete stray quotes between letters</tRule> <comment># [ʾ[:Nonspacing Mark:]-[゙-゜]] → ; # delete any non-spacing marks that we didn't use</comment> <tRule>:: NFC (NFD) ;</tRule> <tRule>:: ([[:Katakana:][\u309B\u309C\u30A0\u30FC\uFF70\uFF9E\uFF9F]] halfwidth-fullwidth);</tRule> <comment># note: a global filter is more efficient, 
but MUST include all source chars!!</comment> <comment>#:: ([\u0000-\u007E 、。゙-゜ァ-ー｡-ﾟ [:Latin:][:Katakana:] [:nonspacing mark:]]);</comment> <comment># MINIMAL FILTER GENERATED FOR: Latin-Katakana BACKWARD</comment> <tRule>:: ( [[\ -~¢-£¥-¦¬̄₩｡-ﾾￂ-ￇￊ-ￏￒ-ￗￚ-ￜ￨-￮][~、-。がぎぐげござじずぜぞだぢづでどば-ぱび-ぴぶ-ぷべ-ぺぼ-ぽゔ゙-゛ゞァ-ヺー-ヾ][\u309B\u309C\u30A0\u30FC\uFF70\uFF9E\uFF9F]] ) ;</tRule> <comment># eof</comment> </transform> </transforms> </supplementalData>