<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>JSDoc: Home</title>
<script src="scripts/prettify/prettify.js"> </script>
<script src="scripts/prettify/lang-css.js"> </script>
<!--[if lt IE 9]>
<script src="//html5shiv.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
<link type="text/css" rel="stylesheet" href="styles/prettify-tomorrow.css">
<link type="text/css" rel="stylesheet" href="styles/jsdoc-default.css">
</head>
<body>
<div id="main">
<h1 class="page-title">Home</h1>
<h3>watson-speech 0.7.3</h3>
<section>
<article><h1>IBM Watson Speech To Text Browser Client Library</h1><p><a href="https://travis-ci.org/watson-developer-cloud/speech-javascript-sdk"><img src="https://travis-ci.org/watson-developer-cloud/speech-javascript-sdk.svg?branch=master" alt="Build Status"></a></p>
<p>Allows you to easily add voice recognition to any web app with minimal code. </p>
<p><strong>Warning</strong> This library is still early-stage and may see significant breaking changes.</p>
<p><strong>For Web Browsers Only</strong> This library is primarily intended for use in browsers.
Check out <a href="https://www.npmjs.com/package/watson-developer-cloud">watson-developer-cloud</a> to use Watson services (speech and others) from Node.js.</p>
<p>However, a server-side component is required to generate auth tokens.
The examples/ folder includes a Node.js token server, and SDKs are available for <a href="https://github.com/watson-developer-cloud/node-sdk#authorization">Node.js</a>,
<a href="https://github.com/watson-developer-cloud/java-sdk">Java</a>,
and <a href="https://github.com/watson-developer-cloud/python-sdk/blob/master/examples/authorization_v1.py">Python</a>;
there is also a <a href="http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/doc/getting_started/gs-tokens.shtml">REST API</a>.</p>
<p>See several examples at https://github.com/watson-developer-cloud/speech-javascript-sdk/tree/master/examples</p>
<p>This library is built with <a href="http://browserify.org/">browserify</a> and easy to use in browserify-based projects (<code>npm install --save watson-speech</code>), but you can also grab the compiled bundle from the
<code>dist/</code> folder and use it as a standalone library.</p>
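<p>For example, either of the following makes the library available (a minimal sketch; the <code>WatsonSpeech</code> global comes from the standalone bundle, while the <code>require()</code> form assumes a browserify build):</p>
<pre class="prettyprint source"><code>// Option 1: the standalone bundle from dist/ exposes a WatsonSpeech global:
// &lt;script src="dist/watson-speech.js">&lt;/script>

// Option 2: browserify-based project:
var WatsonSpeech = require('watson-speech');

console.log(typeof WatsonSpeech.SpeechToText); // 'object'
</code></pre>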
<h2><code>WatsonSpeech.SpeechToText</code> Basic API</h2><p>Complete API docs should be published at http://watson-developer-cloud.github.io/speech-javascript-sdk/</p>
<p>All API methods require an auth token that must be <a href="https://github.com/watson-developer-cloud/node-sdk#authorization">generated server-side</a>.
(See examples/token-server.js for a basic example.)</p>
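<p>A minimal sketch of such a token server (not a copy of the actual examples/token-server.js; Express is an assumption here, and the authorization call follows the Node.js SDK docs linked above):</p>
<pre class="prettyprint source"><code>// Sketch only: serves short-lived Speech to Text auth tokens at GET /token.
var express = require('express');
var watson = require('watson-developer-cloud');

var app = express();

var authorization = watson.authorization({
  username: 'YOUR_USERNAME', // placeholder service credentials
  password: 'YOUR_PASSWORD',
  version: 'v1'
});

app.get('/token', function(req, res) {
  // Scope the token to the Speech to Text service endpoint
  authorization.getToken({
    url: 'https://stream.watsonplatform.net/speech-to-text/api'
  }, function(err, token) {
    if (err) {
      console.error(err);
      return res.status(500).send('Error retrieving token');
    }
    res.send(token);
  });
});

app.listen(3000);
</code></pre>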
<h3><code>.recognizeMicrophone({token})</code> -> <code>RecognizeStream</code></h3><p>Options: no direct options; all provided options are passed through to MicrophoneStream and RecognizeStream.</p>
<p>Requires the <code>getUserMedia</code> API, so browser compatibility is limited (see http://caniuse.com/#search=getusermedia).
Also note that Chrome requires https (with a few exceptions for localhost and such) - see https://www.chromium.org/Home/chromium-security/prefer-secure-origins-for-powerful-new-features</p>
<p>Pipes results through a <code>{FormatStream}</code> by default, set <code>options.format=false</code> to disable.</p>
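<p>A minimal usage sketch (assumes a <code>token</code> string has already been fetched from your server-side token endpoint):</p>
<pre class="prettyprint source"><code>var stream = WatsonSpeech.SpeechToText.recognizeMicrophone({token: token});

stream.setEncoding('utf8'); // emit text instead of Buffers

stream.on('data', function(text) {
  console.log(text); // transcribed text, formatted by default
});

stream.on('error', function(err) {
  console.error(err);
});

// later, to end the microphone capture:
// stream.stop();
</code></pre>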
<h3><code>.recognizeElement({element, token})</code> -> <code>RecognizeStream</code></h3><p>Options: </p>
<ul>
<li><code>element</code>: an <code>&lt;audio&gt;</code> or <code>&lt;video&gt;</code> element (could be generated programmatically, e.g. <code>new Audio()</code>)</li>
<li>Other options passed to MediaElementAudioStream and RecognizeStream</li>
</ul>
<p>Requires that the browser support MediaElement and whatever audio codec is used in your media file.</p>
<p>Will automatically call <code>.play()</code> on the <code>element</code>; set <code>options.autoplay=false</code> to disable. Calling <code>.stop()</code> on the returned stream will automatically call <code>.stop()</code> on the <code>element</code>.</p>
<p>Pipes results through a <code>{FormatStream}</code> by default, set <code>options.format=false</code> to disable.</p>
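<p>A sketch of transcribing an existing media element (assumes a <code>token</code> and an <code>&lt;audio id="myaudio" src="..."&gt;</code> element on the page):</p>
<pre class="prettyprint source"><code>var stream = WatsonSpeech.SpeechToText.recognizeElement({
  element: document.getElementById('myaudio'), // playback starts automatically
  token: token
});

stream.setEncoding('utf8');
stream.on('data', function(text) {
  console.log(text);
});
</code></pre>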
<h3><code>.recognizeBlob({data, token})</code> -> <code>RecognizeStream</code></h3><p>Options: </p>
<ul>
<li><code>data</code>: a <code>Blob</code> (or <code>File</code>) instance. </li>
<li><code>play</code>: (optional, default=<code>false</code>) Attempt to also play the file locally while uploading it for transcription </li>
<li>Other options passed to RecognizeStream</li>
</ul>
<p><code>play</code> requires that the browser support the format; most browsers support wav and ogg/opus, but not flac.
Will emit a <code>playback-error</code> on the RecognizeStream if playback fails.
Playback will automatically stop when <code>.stop()</code> is called on the RecognizeStream.</p>
<p>Pipes results through a <code>{TimingStream}</code> if <code>options.play=true</code>; set <code>options.realtime=false</code> to disable.</p>
<p>Pipes results through a <code>{FormatStream}</code> by default, set <code>options.format=false</code> to disable.</p>
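<p>A sketch of transcribing a user-selected file (assumes a <code>token</code> and an <code>&lt;input type="file" id="myfile"&gt;</code> on the page):</p>
<pre class="prettyprint source"><code>document.getElementById('myfile').addEventListener('change', function() {
  var stream = WatsonSpeech.SpeechToText.recognizeBlob({
    data: this.files[0], // a File is also a Blob
    token: token,
    play: true // also play the audio locally while transcribing
  });

  stream.setEncoding('utf8');
  stream.on('data', function(text) {
    console.log(text);
  });
});
</code></pre>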
<h3>Class <code>RecognizeStream()</code></h3><p>A <a href="https://nodejs.org/api/stream.html">Node.js-style stream</a> of the final text, with some helpers and extra events built in.</p>
<p>RecognizeStream is generally not instantiated directly but rather returned as the result of calling one of the recognize* methods.</p>
<p>The RecognizeStream waits until after receiving data to open a connection.
If no <code>content-type</code> option is set, it will attempt to parse the first chunk of data to determine type.</p>
<p>See speech-to-text/recognize-stream.js for other options.</p>
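<p>For example, to skip auto-detection when the format is known in advance (treat the exact <code>content-type</code> value as an assumption; see speech-to-text/recognize-stream.js for the accepted values):</p>
<pre class="prettyprint source"><code>var stream = WatsonSpeech.SpeechToText.recognizeBlob({
  data: blob,
  token: token,
  'content-type': 'audio/wav' // skips parsing the first chunk to determine the type
});
</code></pre>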
<h4>Methods</h4><ul>
<li><p><code>.promise()</code>: returns a promise that will resolve to the final text (see the sketch after this list).
Note that you must either set <code>continuous: false</code> or call <code>.stop()</code> on the stream to make the promise resolve in a timely manner.</p>
</li>
<li><p><code>.stop()</code>: stops the stream. No more data will be sent, but the stream may still receive additional results with the transcription of already-sent audio.
Standard <code>close</code> event will fire once the underlying websocket is closed and <code>end</code> once all of the data is consumed.</p>
</li>
</ul>
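<p>A sketch of the promise-based usage (assumes a <code>token</code>; <code>continuous: false</code> ends recognition at the first pause so that the promise can resolve):</p>
<pre class="prettyprint source"><code>WatsonSpeech.SpeechToText.recognizeMicrophone({
  token: token,
  continuous: false // stop transcribing at the first pause
}).promise().then(function(transcript) {
  console.log(transcript); // the complete final text
}, function(err) {
  console.error(err);
});
</code></pre>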
<h4>Events</h4><p>In addition to the standard <a href="https://nodejs.org/api/stream.html">Node.js stream events</a>, the following events are fired:</p>
<ul>
<li><code>result</code>: an individual result object from the results array (see the example below).
May include final or interim transcription, alternatives, word timing, confidence scores, etc., depending on the options passed in.
Note: listening for <code>result</code> will automatically put the stream into flowing mode.</li>
</ul>
<p>(Note: there are several other events, but they are intended for internal usage.)</p>
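<p>For example, to inspect interim results and word confidence (the option names follow the Watson Speech to Text recognize API and are assumptions here; see speech-to-text/recognize-stream.js for the full list):</p>
<pre class="prettyprint source"><code>var stream = WatsonSpeech.SpeechToText.recognizeMicrophone({
  token: token,
  interim_results: true,
  word_confidence: true
});

stream.on('result', function(result) {
  // result.final is false for interim results
  console.log(result.final, result.alternatives);
});
</code></pre>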
<h3>Class <code>FormatStream()</code></h3><p>Pipe a <code>RecognizeStream</code> to a format stream, and the resulting text and <code>results</code> events will have basic formatting applied:</p>
<ul>
<li>Capitalize the first word of each sentence</li>
<li>Add a period to the end</li>
<li>Fix any "cruft" in the transcription</li>
<li>A few other tweaks for Asian languages and such.</li>
</ul>
<p>Inherits <code>.promise()</code> and <code>.stop()</code> methods and <code>result</code> event from the <code>RecognizeStream</code>.</p>
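<p>The recognize* methods apply this formatting by default, but a manual pipe is straightforward (the require path is an assumption based on the speech-to-text/ layout referenced above):</p>
<pre class="prettyprint source"><code>var FormatStream = require('watson-speech/speech-to-text/format-stream');

var formatted = recognizeStream.pipe(new FormatStream());
formatted.setEncoding('utf8');
formatted.on('data', function(text) {
  console.log(text); // capitalized, punctuated text
});
</code></pre>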
<h3>Class <code>TimingStream()</code></h3><p>For use with <code>.recognizeBlob({play: true})</code> - slows the results down to match the audio. Pipe in the <code>RecognizeStream</code> (or <code>FormatStream</code>) and listen for results as usual.</p>
<p>Inherits <code>.stop()</code> method and <code>result</code> event from the <code>RecognizeStream</code>.</p>
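<p>A rough sketch of manual use (the require path and constructor are assumptions; <code>recognizeBlob({play: true})</code> sets this up automatically via <code>options.realtime</code>):</p>
<pre class="prettyprint source"><code>var TimingStream = require('watson-speech/speech-to-text/timing-stream');

var timed = recognizeStream.pipe(new TimingStream());
timed.on('result', function(result) {
  // results now arrive in sync with the local audio playback rather than
  // as fast as the service returns them
  console.log(result);
});
</code></pre>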
<h2>Changelog</h2><h3>v0.7</h3><ul>
<li>Changed <code>playFile</code> option of <code>recognizeBlob()</code> to just <code>play</code>, corrected default</li>
<li>Added <code>options.format=true</code> to <code>recognize*()</code> to pipe text through a FormatStream</li>
<li>Added <code>options.realtime=options.play</code> to <code>recognizeBlob()</code> to automatically pipe results through a TimingStream when playing locally</li>
<li>Added <code>close</code> and <code>end</code> events to TimingStream</li>
<li>Added <code>delay</code> option to <code>TimingStream</code></li>
<li>Moved compiled binary to GitHub Releases (in addition to uncompiled source on npm).</li>
<li>Misc. doc and internal improvements</li>
</ul>
<h2>todo</h2><ul>
<li>Solidify API</li>
<li>support objectMode instead of having random events</li>
<li>add text-to-speech support</li>
<li>add an example that includes alternatives and word confidence scores</li>
<li>enable eslint</li>
<li>break components into standalone npm modules where it makes sense</li>
<li>record which shims/polyfills would be useful to extend partial support to older browsers (Promise, etc.)</li>
<li>run integration tests on travis (fall back to offline server for pull requests)</li>
<li>more tests in general</li>
<li>update node-sdk to use current version of this lib's RecognizeStream (and also provide the FormatStream + anything else that might be handy)</li>
<li>improve docs</li>
</ul></article>
</section>
</div>
<nav>
<h2><a href="index.html">Home</a></h2><h3>Classes</h3><ul><li><a href="FormatStream.html">FormatStream</a></li><li><a href="MediaElementAudioStream.html">MediaElementAudioStream</a></li><li><a href="RecognizeStream.html">RecognizeStream</a></li><li><a href="TimingStream.html">TimingStream</a></li><li><a href="WebAudioL16Stream.html">WebAudioL16Stream</a></li></ul><h3>Events</h3><ul><li><a href="RecognizeStream.html#event:connection-close">connection-close</a></li><li><a href="RecognizeStream.html#event:data">data</a></li><li><a href="RecognizeStream.html#event:error">error</a></li><li><a href="RecognizeStream.html#event:results">results</a></li></ul>
</nav>
<br class="clear">
<footer>
Documentation generated by <a href="https://github.com/jsdoc3/jsdoc">JSDoc 3.4.0</a> on Mon Feb 08 2016 20:03:58 GMT+0000 (UTC)
</footer>
<script> prettyPrint(); </script>
<script src="scripts/linenumber.js"> </script>
</body>
</html>