<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<title>meSpeak – Voices &amp; Languages</title>
<style type="text/css">
li { margin-bottom: 0.5em; }
h2,h3 { margin-top: 2em; }
</style>
</head>
<body>
<h1>meSpeak – Voices &amp; Languages</h1>
<p>A short guide to setting up languages and voices for meSpeak.<br />
Please note that meSpeak is based on an Emscripten port of <a href="http://espeak.sourceforge.net/" target="_blank">eSpeak</a>, so all of the eSpeak grammar also applies to meSpeak.</p>
<h2>Standard Language Files</h2>
<p>meSpeak's language-files provide eSpeak's language- and voice-files in a single package.<br />
(Since a voice usually refers to a language and its dictionary, it seems suitable to bundle them together in a single file.)<br />
The language-files have the following structure (JSON):</p>
<xmp>{
  "voice_id": "<filename>",
  "dict_id": "<filename>",
  "dict": "<base64-encoded octet stream>",
  "voice": "<base64-encoded octet stream>"
}
</xmp>
<p>The values of <em>voice_id</em> and <em>dict_id</em> are actually UNIX-filenames: <code>dict_id</code> is relative to the path of eSpeak's data-directory "<code>espeak-data/</code>", <em>voice_id</em> is relative to "<code>espeak-data/voices/</code>".</p>
<p>If we were to embed the files for the language "<code>en-en</code>", these would be:</p>
<ul>
<li>"<code>en/en-en</code>" for the voice and</li>
<li>"<code>en_dict</code>" for the dictionary used by "en-en"</li>
</ul>
<p>For a standard language-file, you would add base64-representations of the respective eSpeak-files as the string values of <em>dict</em> and <em>voice</em>.</p>
<h2>Customizing</h2>
<p>There is an alternate layout for meSpeak's language-files, which is especially useful for customizing and testing:</p>
<xmp>{
  "voice_id": "<filename>",
  "dict_id": "<filename>",
  "dict": "<base64-encoded octet stream>",
  "voice": "<text-string>",
  "voice_encoding": "text"
}
</xmp>
<p>Since eSpeak's voice-files are actually plain-text files,
you may use a simple string for these, provided that you also add the property <code>"voice_encoding": "text"</code>.</p>
<p><em>For dictionaries, which are binary files in eSpeak, see the note at the end of the page.</em></p>
<h3>Example</h3>
<p>As an example, we will configure a basic female voice for "en-us", which will be named "en-us-f".</p>
<ol>
<li>Make a copy of the meSpeak language-file (JSON) that you want to modify (in this case "<code>voices/en/en-us.json</code>").</li>
<li>Rename the file (e.g.: "<code>en-us-f.json</code>") and open it in an editor.</li>
<li>Download the source of <a href="http://espeak.sourceforge.net/" target="_blank">eSpeak</a> and go to the "<code>espeak-data/</code>" directory.</li>
<li>The eSpeak-file "<code>espeak-data/voices/en-us</code>" looks like this:
<xmp>// moving towards US English
name english-us
language en-us 2
language en-r
language en 3
gender male
// and more, skipped here
</xmp></li>
<li>Change the value of the "<code>name</code>" parameter to make it unique (e.g.: "<code>name english-us-f</code>").</li>
<li>Change any parameters as you wish; in this case, change "<code>gender male</code>" to "<code>gender female</code>" for a female voice.</li>
<li>You should have arrived at something like this (first line removed, since it is just a comment):
<xmp>name english-us-f
language en-us 2
language en-r
language en 3
gender female
</xmp></li>
<li>Replace every line-break with "<code>\n</code>" in order to get a valid JSON-string:
<xmp>"name english-us-f\nlanguage en-us 2\nlanguage en-r\nlanguage en 3\ngender female"</xmp>
Use this as the value of the "<code>voice</code>"-property of the JSON-file.</li>
<li>Add the line <code>"voice_encoding": "text"</code> to the JSON to indicate that the voice is plain-text.<br />
Your voice file should now look like this:
<xmp>Content of file: "en-us-f.json":
{
  "voice_id": "en-us-f",
  "dict_id": "en_dict",
  "dict": "<base64-encoded octet stream>",
  "voice": "name english-us-f\nlanguage en-us 2\nlanguage en-r\nlanguage en 3\ngender female",
  "voice_encoding": "text"
}
</xmp></li>
<li>Save it and load it into meSpeak.</li>
</ol>
<p><em>Please note that eSpeak is not very graceful with syntax errors in a voice-definition and will just throw an error, which, in the case of meSpeak, will show up in the console-log.</em></p>
<p>For further details on voice-parameters and fine-tuning, please refer to the eSpeak-documentation: <a href="http://espeak.sourceforge.net/voices.html" target="_blank">http://espeak.sourceforge.net/voices.html</a>.</p>
<h2>Custom Dictionaries</h2>
<p>eSpeak's dictionaries are binary files, which must be compiled with eSpeak first.<br />
You would have to install eSpeak and compile a file following the <a href="http://espeak.sourceforge.net/docindex.html" target="_blank">eSpeak documentation</a>.<br />
Next, you would insert a base64-encoded string of the resulting object-file's content as the value of the <em>dict</em> property of a meSpeak-language-file.<br />
Finally, you would set a suitable and unique value for the property <em>dict_id</em> (UNIX file path).</p>
<p>There is no shortcut to this. Sorry.</p>
<p> </p>
<p>Norbert Landsteiner<br />
Vienna, July 2013</p>
</body>
</html>