<html lang="en">
<head>
<meta charset="utf-8" />
<title>meSpeak – Voices &amp; Languages</title>
<style type="text/css">
li { margin-bottom: 0.5em; }
h2,h3 { margin-top: 2em; }
</style>
</head>
<body>
<h1>meSpeak – Voices &amp; Languages</h1>
<p>A short guide to the set-up of languages and voices for meSpeak.<br />
Please note that meSpeak is based on an Emscripten port of <a href="http://espeak.sourceforge.net/" target="_blank">eSpeak</a>, so all of the eSpeak grammar also applies to meSpeak.</p>
<h2>Standard Language Files</h2>
<p>meSpeak's language-files provide eSpeak's language- and voice-files in a single package.<br />(Since a voice usually refers to a language and its dictionary, it seems suitable to bundle them together in a single file.)<br />The language-files have the following structure (JSON):</p>
<xmp>{
"voice_id": "<filename>",
"dict_id": "<filename>",
"dict": "<base64-encoded octet stream>",
"voice": "<base64-encoded octet stream>"
}
</xmp>
<p>The values of <em>voice_id</em> and <em>dict_id</em> are actually UNIX filenames: <em>dict_id</em> is relative to the path of eSpeak's data-directory "<code>espeak-data/</code>", and <em>voice_id</em> is relative to "<code>espeak-data/voices/</code>".</p>
<p>If we were to embed the files for the language "<code>en-en</code>", these would be:</p>
<ul>
<li>"<code>en/en-en</code>" for the voice and</li>
<li>"<code>en_dict</code>" for the dictionary used by "en-en"</li>
</ul>
<p>For a standard language-file, you would add a base64 representation of the respective eSpeak-files as the string values of <em>dict</em> and <em>voice</em>.</p>
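<p>As a rough illustration of the structure above, the following JavaScript sketch assembles such an object from raw file contents. The helpers <code>toBase64</code> and <code>buildLanguageFile</code> are hypothetical names and not part of meSpeak, the sample bytes are stand-ins for real eSpeak files, and a global <code>btoa</code> (as in browsers or recent versions of Node.js) is assumed:</p>
<xmp>
```javascript
// Hypothetical helper (not part of meSpeak): base64-encode raw bytes.
// Assumes a global btoa(), as found in browsers and recent Node.js.
function toBase64(bytes) {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Hypothetical helper: assemble a standard language-file object
// from the contents of an eSpeak voice-file and dictionary-file.
function buildLanguageFile(voiceId, dictId, voiceBytes, dictBytes) {
  return {
    voice_id: voiceId,
    dict_id: dictId,
    dict: toBase64(dictBytes),
    voice: toBase64(voiceBytes)
  };
}

// Stand-in bytes for illustration only; real content would be read
// from the eSpeak files "espeak-data/voices/en/en-en" and "espeak-data/en_dict".
const exampleVoice = new TextEncoder().encode("name english\nlanguage en\n");
const exampleDict = new Uint8Array([0, 1, 2, 253, 254, 255]);

const languageFile = buildLanguageFile("en/en-en", "en_dict", exampleVoice, exampleDict);

// Serialized form, ready to be saved as a ".json" language-file.
const languageFileJson = JSON.stringify(languageFile, null, 2);
```
</xmp>
<p>The voice is encoded in the same way as the dictionary here, although it is plain text, since a standard language-file expects both values as base64.</p>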
<h2>Customizing</h2>
<p>There is an alternate layout for meSpeak's language-files, which is especially useful for the purpose of customizing and testing:</p>
<xmp>{
"voice_id": "<filename>",
"dict_id": "<filename>",
"dict": "<base64-encoded octet stream>",
"voice": "<text-string>",
"voice_encoding": "text"
}
</xmp>
<p>Since eSpeak's voice-files are actually plain-text files, you may use a simple string for these, if you provide an additional property <code>"voice_encoding": "text"</code> at the same time.</p>
<p><em>For dictionaries, which are binary files in eSpeak, see the note at the end of the page.</em></p>
<h3>Example</h3>
<p>As an example, we will configure a basic female voice for "en-us", which will be named "en-us-f".</p>
<ol>
<li>Make a copy of the meSpeak-language file (JSON) that you want to modify (in this case "<code>voices/en/en-us.json</code>").</li>
<li>Rename the file (e.g.: "<code>en-us-f.json</code>") and open it in an editor.</li>
<li>Download the source of <a href="http://espeak.sourceforge.net/" target="_blank">eSpeak</a> and go to the "<code>espeak-data/</code>" directory.</li>
<li>The eSpeak-file "<code>espeak-data/voices/en-us</code>" looks like this:
<xmp>// moving towards US English
name english-us
language en-us 2
language en-r
language en 3
gender male
// and more, skipped here
</xmp></li>
<li>Change the value of the "<code>name</code>" parameter to make it unique (e.g.: "<code>name english-us-f</code>").</li>
<li>Change any parameters as you wish, in this case change "<code>gender male</code>" to "<code>gender female</code>" for a female voice.</li>
<li>You should have arrived at something like this (first line removed, since it is just a comment):
<xmp>name english-us-f
language en-us 2
language en-r
language en 3
gender female
</xmp></li>
<li>Replace any line-breaks with "<code>\n</code>" in order to get a valid JSON-string:
<xmp>"name english-us-f\nlanguage en-us 2\nlanguage en-r\nlanguage en 3\ngender female"</xmp>
Use this as the value of the "<code>voice</code>"-property of the JSON-file.</li>
<li>Add the line <code>"voice_encoding": "text"</code> to the JSON to indicate that the voice is plain-text.<br />Your voice file should now look like this:
<xmp>Content of file: "en-us-f.json":
{
"voice_id": "en-us-f",
"dict_id": "en_dict",
"dict": "<base64-encoded octet stream>",
"voice": "name english-us-f\nlanguage en-us 2\nlanguage en-r\nlanguage en 3\ngender female",
"voice_encoding": "text"
}
</xmp></li>
<li>Save it and load it into meSpeak.</li>
</ol>
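<p>The manual steps above could also be scripted. The following JavaScript sketch is only an illustration under stated assumptions: <code>withTextVoice</code> is a hypothetical helper and not part of meSpeak, and the <code>original</code> object merely stands in for the parsed content of the copied language-file. <code>JSON.stringify</code> takes care of escaping the line-breaks:</p>
<xmp>
```javascript
// Hypothetical helper (not part of meSpeak): return a copy of a
// language-file object that carries a plain-text voice definition.
function withTextVoice(languageFile, voiceId, voiceText) {
  return Object.assign({}, languageFile, {
    voice_id: voiceId,
    voice: voiceText,
    voice_encoding: "text"
  });
}

// The edited voice definition from the example, one parameter per line.
const voiceText = [
  "name english-us-f",
  "language en-us 2",
  "language en-r",
  "language en 3",
  "gender female"
].join("\n");

// Stand-in for the parsed content of the copied "en-us.json";
// the placeholder strings are not real data.
const original = {
  voice_id: "<filename>",
  dict_id: "en_dict",
  dict: "<base64-encoded octet stream>",
  voice: "<base64-encoded octet stream>"
};

const customFile = withTextVoice(original, "en-us-f", voiceText);

// JSON.stringify writes each line-break as the two characters \n,
// so the result is a valid JSON-string without manual editing.
const customJson = JSON.stringify(customFile, null, 2);
```
</xmp>
<p>The content of <code>customJson</code> would then be saved as "<code>en-us-f.json</code>". In a browser, such a file is typically loaded with meSpeak's <code>loadVoice</code> call; please check the meSpeak documentation for the exact usage.</p>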
<p><em>Please note that eSpeak does not handle syntax errors in a voice-definition gracefully: it will just throw an error, which, in the case of meSpeak, will show up in the console-log.</em></p>
<p>For further details on voice-parameters and fine-tuning, please refer to the eSpeak-documentation: <a href="http://espeak.sourceforge.net/voices.html" target="_blank">http://espeak.sourceforge.net/voices.html</a>.</p>
<h2>Custom Dictionaries</h2>
<p>eSpeak's dictionaries are binary files, which must be compiled with eSpeak first.<br />
You would have to install eSpeak and compile a file following the <a href="http://espeak.sourceforge.net/docindex.html" target="_blank">eSpeak documentation</a>.<br />
Further, you would insert a base64-encoded string of the resulting object-file's content as the value of the <em>dict</em> property of a meSpeak-language-file.<br />
Finally, you would set a suitable and unique value for the property <em>dict_id</em> (UNIX file path).</p>
<p>There is no shortcut to this. Sorry.</p>
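<p>Once a dictionary has been compiled with eSpeak, the remaining two steps (base64-encoding and setting <em>dict_id</em>) could be sketched in JavaScript as follows. This is a minimal sketch and not a meSpeak feature: <code>withCustomDictionary</code> and the id "<code>custom_dict</code>" are hypothetical, the sample bytes stand in for a real compiled dictionary, and a global <code>btoa</code> is assumed:</p>
<xmp>
```javascript
// Hypothetical helper (not part of meSpeak): return a copy of a
// language-file object with its dictionary replaced by compiled bytes.
function withCustomDictionary(languageFile, dictId, dictBytes) {
  let binary = "";
  for (const byte of dictBytes) {
    binary += String.fromCharCode(byte);
  }
  return Object.assign({}, languageFile, {
    dict_id: dictId,
    dict: btoa(binary) // assumes a global btoa()
  });
}

// Stand-in bytes; in Node.js the compiled dictionary could be read
// with require("fs").readFileSync(), passing the path of the compiled file.
const compiledDict = new Uint8Array([0, 1, 2, 253, 254, 255]);

// Stand-in for an existing, parsed language-file.
const baseFile = {
  voice_id: "<filename>",
  dict_id: "en_dict",
  dict: "<base64-encoded octet stream>",
  voice: "<base64-encoded octet stream>"
};

// "custom_dict" is a made-up id; it must be unique and suit the voice.
const dictFile = withCustomDictionary(baseFile, "custom_dict", compiledDict);
```
</xmp>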
<p> </p>
<p>Norbert Landsteiner<br />
Vienna, July 2013</p>
</body>
</html>