watson-speech
IBM Watson Speech to Text and Text to Speech SDK for web browsers.
<html lang="en">
<head>
<meta charset="utf-8">
<title>JSDoc: Home</title>
<script src="scripts/prettify/prettify.js"> </script>
<script src="scripts/prettify/lang-css.js"> </script>
<!--[if lt IE 9]>
<script src="//html5shiv.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
<link type="text/css" rel="stylesheet" href="styles/prettify-tomorrow.css">
<link type="text/css" rel="stylesheet" href="styles/jsdoc-default.css">
</head>
<body>
<div id="main">
<h1 class="page-title">Home</h1>
<h3>watson-speech 0.32.2</h3>
<section>
<article><h1>IBM Watson Speech Services for Web Browsers</h1><p><a href="https://travis-ci.org/watson-developer-cloud/speech-javascript-sdk"><img src="https://travis-ci.org/watson-developer-cloud/speech-javascript-sdk.svg?branch=master" alt="Build Status"></a>
<a href="https://www.npmjs.com/package/watson-speech"><img src="https://img.shields.io/npm/v/watson-speech.svg" alt="npm-version"></a></p>
<p>Lets you add speech recognition and synthesis to any web app with minimal code.</p>
<h3>Built for Browsers</h3><p>This library is primarily intended for use in web browsers.
Check out <a href="https://www.npmjs.com/package/watson-developer-cloud">watson-developer-cloud</a> to use Watson services (speech and others) from Node.js.</p>
<p>However, a <strong>server-side component is required to generate auth tokens</strong>.
The examples/ folder includes example Node.js and Python servers, and SDKs are available for <a href="https://github.com/watson-developer-cloud/node-sdk#authorization">Node.js</a>,
<a href="https://github.com/watson-developer-cloud/java-sdk">Java</a>,
and <a href="https://github.com/watson-developer-cloud/python-sdk/blob/master/examples/authorization_v1.py">Python</a>;
there is also a <a href="http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/doc/getting_started/gs-tokens.shtml">REST API</a>.</p>
<h3>Installation - standalone</h3><p>Pre-compiled bundles are available from GitHub Releases; just download the file and drop it into your website: https://github.com/watson-developer-cloud/speech-javascript-sdk/releases</p>
<h3>Installation - bower</h3><pre class="prettyprint source lang-sh"><code>bower install --save watson-speech</code></pre><h3>Installation - npm with Browserify or Webpack</h3><p>This library can be bundled with <a href="http://browserify.org/">browserify</a> or <a href="http://webpack.github.io/">Webpack</a>
and easily included in larger projects:</p>
<pre class="prettyprint source"><code>npm install --save watson-speech</code></pre><p>This method enables a smaller bundle by only including the desired components, for example:</p>
<pre class="prettyprint source lang-js"><code>var recognizeMic = require('watson-speech/speech-to-text/recognize-microphone');</code></pre><h2>Breaking change for v0.22.0</h2><p>The format of objects emitted in objectMode has changed from <code>{alternatives: [...], index: 1}</code> to <code>{results: [{alternatives: [...]}], result_index: 1}</code>.</p>
<p>There is a new <code>ResultExtractor</code> class that restores the old behavior; <code>recognizeMicrophone()</code> and <code>recognizeFile()</code> both accept a new <code>extract_results</code> option to enable it.</p>
<p>This was done to enable the new <code>speaker_labels</code> feature. The format now exactly matches what the Watson Speech to Text service returns and shouldn't change again unless the Watson service changes.</p>
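<p>For illustration, the old shape can be recovered with a small mapping function along these lines (a sketch of the idea only, not the SDK's actual <code>ResultExtractor</code> implementation):</p>

```javascript
// Sketch: convert a new-style (v0.22.0+) message back into the
// pre-v0.22.0 shape. Illustrative only -- not part of the SDK's API.
function toOldFormat(message) {
  return {
    alternatives: message.results[0].alternatives,
    index: message.result_index
  };
}
```

In practice, passing `extract_results: true` to `recognizeMicrophone()` or `recognizeFile()` accomplishes the same thing.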
<h2>API &amp; Examples</h2><p>The basic API is outlined below; see the complete API docs at http://watson-developer-cloud.github.io/speech-javascript-sdk/master/</p>
<p>See several basic examples at http://watson-speech.mybluemix.net/ (<a href="https://github.com/watson-developer-cloud/speech-javascript-sdk/tree/master/examples/">source</a>)</p>
<p>See a more advanced example at https://speech-to-text-demo.mybluemix.net/</p>
<p>All API methods require an auth token that must be <a href="https://github.com/watson-developer-cloud/node-sdk#authorization">generated server-side</a>.
(See https://github.com/watson-developer-cloud/speech-javascript-sdk/tree/master/examples/ for a couple of basic examples in Node.js and Python.)</p>
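<p>For example, the browser might fetch a token from your own server before calling any SDK method. This is a sketch: the <code>/api/token</code> endpoint is hypothetical, and the examples/ folder shows matching server-side implementations:</p>

```javascript
// Sketch: fetch an auth token from your own server-side endpoint.
// '/api/token' is a hypothetical route -- your server generates the
// token with one of the server-side SDKs and returns it as plain text.
function getToken() {
  return fetch('/api/token').then(function (res) {
    return res.text();
  });
}
```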
<h2><a href="http://watson-developer-cloud.github.io/speech-javascript-sdk/master/module-watson-speech_text-to-speech.html"><code>WatsonSpeech.TextToSpeech</code></a></h2><h3><a href="http://watson-developer-cloud.github.io/speech-javascript-sdk/master/module-watson-speech_text-to-speech_synthesize.html"><code>.synthesize({text, token})</code></a> -> <code>&lt;audio&gt;</code></h3><p>Speaks the supplied text through an automatically-created <code>&lt;audio&gt;</code> element.
Currently limited to text that can fit within a GET URL (this is particularly an issue on <a href="http://stackoverflow.com/questions/32267442/url-length-limitation-of-microsoft-edge">Internet Explorer before Windows 10</a>,
where the max length is around 1000 characters after the token is accounted for).</p>
<p>Options: </p>
<ul>
<li>text - the text to speak</li>
<li>voice - the desired playback voice's name - see .getVoices(). Note that the voices are language-specific.</li>
<li>autoPlay - set to false to prevent the audio from automatically playing</li>
</ul>
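<p>Typical browser usage looks something like this sketch (it assumes the pre-built bundle is loaded on the page and <code>token</code> was generated server-side):</p>

```javascript
// Sketch: speak a short string through an automatically-created <audio>
// element. Assumes the WatsonSpeech bundle is loaded and `token` is valid.
function speak(text, token) {
  return WatsonSpeech.TextToSpeech.synthesize({
    text: text,
    token: token,
    autoPlay: true // default; set to false to control playback yourself
  });
}
```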
<h2><a href="http://watson-developer-cloud.github.io/speech-javascript-sdk/master/module-watson-speech_speech-to-text.html"><code>WatsonSpeech.SpeechToText</code></a></h2><p>The <code>recognizeMicrophone()</code> and <code>recognizeFile()</code> helper methods are recommended for most use-cases. They set up the streams in the appropriate order and enable common options. These two methods are documented below.</p>
<p>The core of the library is the <a href="http://watson-developer-cloud.github.io/speech-javascript-sdk/master/RecognizeStream.html">RecognizeStream</a> that performs the actual transcription, and a collection of other Node.js-style streams that manipulate the data in various ways. For less common use-cases, the core components may be used directly with the helper methods serving as optional templates to follow. The full library is documented at http://watson-developer-cloud.github.io/speech-javascript-sdk/master/module-watson-speech_speech-to-text.html</p>
<h3><a href="http://watson-developer-cloud.github.io/speech-javascript-sdk/master/module-watson-speech_speech-to-text_recognize-microphone.html"><code>.recognizeMicrophone({token})</code></a> -> Stream</h3><p>Options: </p>
<ul>
<li><code>keepMic</code>: if true, preserves the MicrophoneStream for subsequent calls, preventing additional permissions requests in Firefox</li>
<li><code>mediaStream</code>: Optionally pass in an existing media stream rather than prompting the user for microphone access.</li>
<li>Other options passed to <a href="http://watson-developer-cloud.github.io/speech-javascript-sdk/master/RecognizeStream.html">RecognizeStream</a></li>
<li>Other options passed to <a href="http://watson-developer-cloud.github.io/speech-javascript-sdk/master/SpeakerStream.html">SpeakerStream</a> if <code>options.resultsBySpeaker</code> is set to true</li>
<li>Other options passed to <a href="http://watson-developer-cloud.github.io/speech-javascript-sdk/master/FormatStream.html">FormatStream</a> if <code>options.format</code> is not set to false</li>
<li>Other options passed to <a href="http://watson-developer-cloud.github.io/speech-javascript-sdk/master/WritableElementStream.html">WritableElementStream</a> if <code>options.outputElement</code> is set</li>
</ul>
<p>Requires the <code>getUserMedia</code> API, so browser support is limited (see http://caniuse.com/#search=getusermedia).
Also note that Chrome requires HTTPS, with a few exceptions for localhost and the like; see https://www.chromium.org/Home/chromium-security/prefer-secure-origins-for-powerful-new-features</p>
<p>No more data will be sent after <code>.stop()</code> is called on the returned stream, but additional results may be received for already-sent data.</p>
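<p>Putting the options above together, a minimal sketch (browser only; it assumes the pre-built bundle, a server-generated token, and a hypothetical <code>#output</code> element on the page):</p>

```javascript
// Sketch: start transcribing microphone input and pipe formatted text
// into a page element. '#output' is a hypothetical selector.
function startTranscription(token) {
  var stream = WatsonSpeech.SpeechToText.recognizeMicrophone({
    token: token,
    outputElement: '#output' // stream formatted text into this element
  });
  stream.on('error', function (err) {
    console.log(err);
  });
  return stream; // call stream.stop() to end the session
}
```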
<h3><a href="http://watson-developer-cloud.github.io/speech-javascript-sdk/master/module-watson-speech_speech-to-text_recognize-file.html"><code>.recognizeFile({data, token})</code></a> -> Stream</h3><p>Can recognize and optionally attempt to play a URL, <a href="https://developer.mozilla.org/en-US/docs/Web/API/File">File</a> or <a href="https://developer.mozilla.org/en-US/docs/Web/API/Blob">Blob</a>
(such as from an <code>&lt;input type="file"/&gt;</code> or from an Ajax request).</p>
<p>Options: </p>
<ul>
<li><code>file</code>: a String URL or a <code>Blob</code> or <code>File</code> instance. Note that <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Access_control_CORS">CORS</a> restrictions apply to URLs.</li>
<li><code>play</code>: (optional, default=<code>false</code>) Attempt to also play the file locally while uploading it for transcription </li>
<li>Other options passed to <a href="http://watson-developer-cloud.github.io/speech-javascript-sdk/master/RecognizeStream.html">RecognizeStream</a></li>
<li>Other options passed to <a href="http://watson-developer-cloud.github.io/speech-javascript-sdk/master/TimingStream.html">TimingStream</a> if <code>options.realtime</code> is true, or unset and <code>options.play</code> is true</li>
<li>Other options passed to <a href="http://watson-developer-cloud.github.io/speech-javascript-sdk/master/SpeakerStream.html">SpeakerStream</a> if <code>options.resultsBySpeaker</code> is set to true</li>
<li>Other options passed to <a href="http://watson-developer-cloud.github.io/speech-javascript-sdk/master/FormatStream.html">FormatStream</a> if <code>options.format</code> is not set to false</li>
<li>Other options passed to <a href="http://watson-developer-cloud.github.io/speech-javascript-sdk/master/WritableElementStream.html">WritableElementStream</a> if <code>options.outputElement</code> is set</li>
</ul>
<p><code>play</code> requires that the browser supports the format (most browsers support WAV and Ogg/Opus, but not FLAC).
If playback fails, an <code>UNSUPPORTED_FORMAT</code> error is emitted on the RecognizeStream; this error is special in that it does not stop the streaming of results.</p>
<p>Playback will automatically stop when <code>.stop()</code> is called on the returned stream. </p>
<p>For Mobile Safari compatibility, a URL must be provided, and <code>recognizeFile()</code> must be called in direct response to a user interaction (so the token must be pre-loaded).</p>
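<p>A minimal sketch of file recognition (browser only; the URL and the pre-fetched token are placeholders):</p>

```javascript
// Sketch: transcribe an audio file by URL while playing it locally.
// 'audio/sample.wav' is a placeholder; CORS rules apply to
// cross-origin URLs.
function transcribeFile(token) {
  var stream = WatsonSpeech.SpeechToText.recognizeFile({
    token: token,
    file: 'audio/sample.wav',
    play: true // also play the file while it uploads
  });
  stream.on('error', function (err) {
    console.log(err);
  });
  return stream;
}
```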
<h2>Changes</h2><p>There have been a few breaking changes in recent releases:</p>
<ul>
<li>Removed <code>SpeechToText.recognizeElement()</code> due to quality issues. The code is <a href="https://github.com/watson-developer-cloud/speech-javascript-sdk/tree/master/examples/static/audio-video-deprecated">available in an (unsupported) example</a> if you wish to use it with current releases of the SDK.</li>
<li>renamed <code>recognizeBlob</code> to <code>recognizeFile</code> to make the primary usage more apparent</li>
<li>Changed <code>playFile</code> option of <code>recognizeBlob()</code> to just <code>play</code>, corrected default</li>
<li>Changed format of objects emitted in objectMode to exactly match what the service sends. Added <code>ResultStream</code> class and <code>extract_results</code> option to enable the older behavior.</li>
<li>Changed <code>playback-error</code> event to just <code>error</code> when recognizing and playing a file. Check for <code>error.name == 'UNSUPPORTED_FORMAT'</code> to identify playback errors. This error is special in that it does not stop the streaming of results.</li>
<li>Renamed <code>recognizeFile()</code>'s <code>data</code> option to <code>file</code> because it now may be a URL. Using a URL enables faster playback and mobile Safari support</li>
</ul>
<p>See <a href="CHANGELOG.md">CHANGELOG.md</a> for a complete list of changes.</p>
<h2>todo</h2><ul>
<li>Further solidify API</li>
<li>break components into standalone npm modules where it makes sense</li>
<li>run integration tests on travis (fall back to offline server for pull requests)</li>
<li>add even more tests</li>
<li>better cross-browser testing (IE, Safari, mobile browsers - maybe saucelabs?)</li>
<li>update node-sdk to use current version of this lib's RecognizeStream (and also provide the FormatStream + anything else that might be handy)</li>
<li>move <code>result</code> and <code>results</code> events to node wrapper (along with the deprecation notice)</li>
<li>improve docs</li>
<li>consider a wrapper to match https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html</li>
<li>support a "hard" stop that prevents any further data events, even for already uploaded audio, ensure timing stream also implements this.</li>
<li>look for bug where single-word final results may omit word confidence (possibly due to FormatStream?)</li>
<li>fix bug where TimingStream shows words slightly before they're spoken</li>
</ul></article>
</section>
</div>
<nav>
<h2><a href="index.html">Home</a></h2><h3>Modules</h3><ul><li><a href="module-watson-speech.html">watson-speech</a></li><li><a href="module-watson-speech_speech-to-text.html">watson-speech/speech-to-text</a></li><li><a href="module-watson-speech_speech-to-text_get-models.html">watson-speech/speech-to-text/get-models</a></li><li><a href="module-watson-speech_speech-to-text_recognize-file.html">watson-speech/speech-to-text/recognize-file</a></li><li><a href="module-watson-speech_speech-to-text_recognize-microphone.html">watson-speech/speech-to-text/recognize-microphone</a></li><li><a href="module-watson-speech_text-to-speech.html">watson-speech/text-to-speech</a></li><li><a href="module-watson-speech_text-to-speech_get-voices.html">watson-speech/text-to-speech/get-voices</a></li><li><a href="module-watson-speech_text-to-speech_synthesize.html">watson-speech/text-to-speech/synthesize</a></li></ul><h3>Classes</h3><ul><li><a href="FilePlayer.html">FilePlayer</a></li><li><a href="FormatStream.html">FormatStream</a></li><li><a href="RecognizeStream.html">RecognizeStream</a></li><li><a href="ResultStream.html">ResultStream</a></li><li><a href="SpeakerStream.html">SpeakerStream</a></li><li><a href="TimingStream.html">TimingStream</a></li><li><a href="UrlPlayer.html">UrlPlayer</a></li><li><a href="WebAudioL16Stream.html">WebAudioL16Stream</a></li><li><a href="WritableElementStream.html">WritableElementStream</a></li></ul><h3>Events</h3><ul><li><a href="RecognizeStream.html#event:close">close</a></li><li><a href="RecognizeStream.html#event:data">data</a></li><li><a href="RecognizeStream.html#event:error">error</a></li><li><a href="RecognizeStream.html#event:listening">listening</a></li><li><a href="RecognizeStream.html#event:message">message</a></li><li><a href="RecognizeStream.html#event:open">open</a></li><li><a href="RecognizeStream.html#event:send-data">send-data</a></li><li><a href="RecognizeStream.html#event:send-json">send-json</a></li><li><a 
href="RecognizeStream.html#event:stop">stop</a></li><li><a href="SpeakerStream.html#event:data">data</a></li></ul><h3>Global</h3><ul><li><a href="global.html#getContentTypeFromFile">getContentTypeFromFile</a></li><li><a href="global.html#playFile">playFile</a></li></ul>
</nav>
<br class="clear">
<footer>
Documentation generated by <a href="https://github.com/jsdoc3/jsdoc">JSDoc 3.4.3</a> on Tue Feb 21 2017 17:41:51 GMT+0000 (UTC)
</footer>
<script> prettyPrint(); </script>
<script src="scripts/linenumber.js"> </script>
</body>
</html>