# @jackdbd/eleventy-plugin-text-to-speech

Eleventy plugin that synthesizes **any text** you want, on **any page** of your Eleventy site, using the [Google Cloud Text-to-Speech API](https://cloud.google.com/text-to-speech). You can either self-host the audio assets this plugin generates, or host them on [Cloud Storage](https://cloud.google.com/storage).
> :warning: The Cloud Text-to-Speech API has a [limit of 5000 characters](https://cloud.google.com/text-to-speech/quotas) per request.
>
> See also:
>
> - [this issue of the Wavenet for Chrome extension](https://github.com/wavenet-for-chrome/extension/issues/12)
>
> - [this discussion on Google Groups](https://groups.google.com/g/google-translate-api/c/2JsRdq0tEdA)
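
Because of the limit in the warning above, it can help to catch over-long text before a build calls the API. Below is a minimal sketch of such a pre-flight check; `exceedsTtsLimit` and `TTS_MAX_INPUT` are hypothetical names, not part of this plugin. It counts UTF-8 bytes rather than characters, which is the conservative choice: the UTF-8 byte length of a JavaScript string is never smaller than its `length`.

```js
// Hypothetical pre-flight check, not exported by the plugin.
// Assumption: the API rejects inputs longer than 5000 units per request.
const TTS_MAX_INPUT = 5000

// Counting UTF-8 bytes is stricter than counting characters,
// e.g. "é" is 1 character but 2 bytes.
function exceedsTtsLimit(text, max = TTS_MAX_INPUT) {
  return Buffer.byteLength(text, 'utf8') > max
}

console.log(exceedsTtsLimit('Once upon a time...')) // → false
```

If a match is too long, one option is to narrow the rule's CSS selector or XPath expression so that it selects a smaller element.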
## Installation
```sh
npm install --save-dev @jackdbd/eleventy-plugin-text-to-speech
```
## Preliminary Operations
### Enable the Text-to-Speech API
Before you can begin using the Text-to-Speech API, you must enable it. You can enable the API with the following command:
```sh
gcloud services enable texttospeech.googleapis.com
```
### Set up authentication via a service account
This plugin uses the [official Node.js client library for the Text-to-Speech API](https://github.com/googleapis/nodejs-text-to-speech). To authenticate to any Google Cloud API you need credentials; at the moment this plugin supports only authentication via a service account JSON key.

First, create a service account that will call the Text-to-Speech API, or reuse an existing one. You only need the service account itself: there is no need to configure any IAM permissions.
```sh
gcloud iam service-accounts create sa-text-to-speech-user \
--display-name "Text-to-Speech user SA"
```
Second, [download the JSON key of this service account](https://cloud.google.com/iam/docs/creating-managing-service-account-keys) and store it somewhere safe. Do **not** track this file in git.
### Optional: Create Cloud Storage bucket (only if you want to host audio files on Cloud Storage)
Create a Cloud Storage bucket in your desired [location](https://cloud.google.com/storage/docs/locations). Enable [uniform bucket-level access](https://cloud.google.com/storage/docs/uniform-bucket-level-access) and use the `nearline` [storage class](https://cloud.google.com/storage/docs/storage-classes).
```sh
gsutil mb \
-p $GCP_PROJECT_ID \
-l $CLOUD_STORAGE_LOCATION \
-c nearline \
-b on \
gs://bkt-eleventy-plugin-text-to-speech-audio-files
```
If you want, you can check that uniform bucket-level access is **enabled** using this command:
```sh
gsutil uniformbucketlevelaccess get \
gs://bkt-eleventy-plugin-text-to-speech-audio-files
```
Make the bucket's objects publicly readable (otherwise people will not be able to listen to or download the audio files):
```sh
gsutil iam ch allUsers:objectViewer \
gs://bkt-eleventy-plugin-text-to-speech-audio-files
```
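Once the bucket is public, anyone can fetch its objects at `https://storage.googleapis.com/BUCKET_NAME/OBJECT_NAME`. The sketch below builds such a URL, which can be handy for checking by hand that an uploaded audio file is reachable. `publicObjectUrl` and the object name `some-audio-file.mp3` are hypothetical; the plugin may name its files differently.

```js
// Hypothetical helper: public URL of an object in a publicly readable bucket.
function publicObjectUrl(bucketName, objectName) {
  // Encode each path segment separately, so that "/" in object names is kept.
  const path = objectName.split('/').map(encodeURIComponent).join('/')
  return `https://storage.googleapis.com/${bucketName}/${path}`
}

console.log(
  publicObjectUrl('bkt-eleventy-plugin-text-to-speech-audio-files', 'some-audio-file.mp3')
)
// → https://storage.googleapis.com/bkt-eleventy-plugin-text-to-speech-audio-files/some-audio-file.mp3
```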
## Usage
Let's say that you are hosting your Eleventy website on Cloudflare Pages, which exposes the URL of the current deployment in the `CF_PAGES_URL` [environment variable](https://developers.cloudflare.com/pages/platform/build-configuration/#environment-variables).
### Self-hosting the generated audio assets
If you want to self-host the audio assets that this plugin generates and use all default options, you can register the plugin with this code:
```js
const { plugin: tts } = require('@jackdbd/eleventy-plugin-text-to-speech')

module.exports = function (eleventyConfig) {
  // some eleventy configuration...

  eleventyConfig.addPlugin(tts, {
    audioHost: process.env.CF_PAGES_URL
      ? new URL(`${process.env.CF_PAGES_URL}/assets/audio`)
      : new URL('http://localhost:8090/assets/audio')
  })

  // some more eleventy configuration...
}
```
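The `audioHost` expression in the self-hosting configuration picks the Cloudflare Pages deployment URL when `CF_PAGES_URL` is set, and falls back to a local server on port 8090 otherwise. Here is the same logic pulled out into a standalone function so it can be checked in isolation; `resolveAudioHost` is a hypothetical helper and the deployment URL is made up.

```js
// Hypothetical helper mirroring the ternary used for `audioHost`.
function resolveAudioHost(env = process.env) {
  return env.CF_PAGES_URL
    ? new URL(`${env.CF_PAGES_URL}/assets/audio`)
    : new URL('http://localhost:8090/assets/audio')
}

// On Cloudflare Pages (example deployment URL)
console.log(resolveAudioHost({ CF_PAGES_URL: 'https://abc123.my-site.pages.dev' }).href)
// → https://abc123.my-site.pages.dev/assets/audio

// Locally, with CF_PAGES_URL unset
console.log(resolveAudioHost({}).href)
// → http://localhost:8090/assets/audio
```

A side benefit of passing a `URL` object: if `CF_PAGES_URL` is malformed, the `URL` constructor throws when the configuration loads, instead of the build silently producing broken audio links.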
### Hosting the generated audio assets on Cloud Storage
If you want to host the audio assets on a Cloud Storage bucket and customize which texts get synthesized (the `rules`) and which voice is used, you could register the plugin with something like this:
```js
const { plugin: tts } = require('@jackdbd/eleventy-plugin-text-to-speech')

module.exports = function (eleventyConfig) {
  // some eleventy configuration...

  eleventyConfig.addPlugin(tts, {
    audioHost: {
      bucketName: 'some-bucket-containing-publicly-readable-files'
    },
    rules: [
      // synthesize the text contained in all <h1> tags, in all posts
      {
        regex: new RegExp('posts\\/.*\\.html$'),
        cssSelectors: ['h1']
      },
      // synthesize the text contained in all <p> tags that start with "Once upon a time",
      // in all HTML pages, except the 404.html page
      {
        regex: new RegExp('^((?!404).)*\\.html$'),
        xPathExpressions: ['//p[starts-with(., "Once upon a time")]']
      }
    ],
    voice: 'en-GB-Wavenet-C'
  })

  // some more eleventy configuration...
}
```
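To see which pages each rule would select, you can test the two `regex` values against a few output paths. This sketch assumes that each rule's `regex` is matched against the page's output path, with the CSS selectors and XPath expressions then applied to that page's HTML; the rule names and the paths below are hypothetical.

```js
// The two regexes from the configuration above, with illustrative names.
const rules = [
  { name: 'h1 in posts', regex: new RegExp('posts\\/.*\\.html$') },
  { name: '"Once upon a time" paragraphs', regex: new RegExp('^((?!404).)*\\.html$') }
]

// Names of the rules whose regex matches the given output path.
function matchingRules(outputPath) {
  return rules.filter((rule) => rule.regex.test(outputPath)).map((rule) => rule.name)
}

const outputPaths = [
  '_site/posts/my-first-post/index.html', // both rules
  '_site/about/index.html', // only the "Once upon a time" rule
  '_site/404.html', // no rule: the negative lookahead rejects "404"
  '_site/feed.xml' // no rule: not an HTML page
]

for (const outputPath of outputPaths) {
  const matched = matchingRules(outputPath)
  console.log(outputPath, '→', matched.length ? matched.join(', ') : 'no rule')
}
```

Note that `^((?!404).)*\.html$` excludes any path that contains `404` anywhere, for example a post whose slug contains `404`, not only the `404.html` page.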
### Multiple hosts
If you want to host the generated audio assets on multiple hosts, register this plugin multiple times. Here are a few examples:

- self-host some audio assets, and host others on a Cloud Storage bucket;
- host all audio assets on Cloud Storage, some on one bucket and others on a different bucket.

Have a look at the Eleventy configuration of the [demo-site in this monorepo](../demo-site/README.md).
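
As a rough sketch of the first example, the configuration below registers the plugin twice: once self-hosting the audio for the `<h1>` of every post, and once hosting on Cloud Storage the audio for "Once upon a time" paragraphs. Two assumptions here: that giving each registration its own `collectionName` and `transformName` avoids name clashes between registrations, and that the rules of different registrations should not overlap, so that the same text does not get two audio tags. The collection and transform names are hypothetical; check the demo-site configuration for the setup the author actually uses.

```js
const { plugin: tts } = require('@jackdbd/eleventy-plugin-text-to-speech')

module.exports = function (eleventyConfig) {
  // Registration 1: self-host the audio for the <h1> of every post.
  eleventyConfig.addPlugin(tts, {
    audioHost: process.env.CF_PAGES_URL
      ? new URL(`${process.env.CF_PAGES_URL}/assets/audio`)
      : new URL('http://localhost:8090/assets/audio'),
    collectionName: 'audio-items-self-hosted', // hypothetical name
    transformName: 'inject-audio-tags-self-hosted', // hypothetical name
    rules: [{ regex: new RegExp('posts\\/.*\\.html$'), cssSelectors: ['h1'] }]
  })

  // Registration 2: host on Cloud Storage the audio for "Once upon a time" paragraphs.
  eleventyConfig.addPlugin(tts, {
    audioHost: { bucketName: 'bkt-eleventy-plugin-text-to-speech-audio-files' },
    collectionName: 'audio-items-cloud-storage', // hypothetical name
    transformName: 'inject-audio-tags-cloud-storage', // hypothetical name
    rules: [
      {
        regex: new RegExp('^((?!404).)*\\.html$'),
        xPathExpressions: ['//p[starts-with(., "Once upon a time")]']
      }
    ]
  })
}
```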
## Configuration
### Required parameters
| Parameter | Explanation |
| --- | --- |
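| `audioHost` | Where the generated audio assets are hosted: a `URL` to self-host them, or an object with a `bucketName` to host them on Cloud Storage. Each audio host must have a matching writer responsible for writing/uploading the assets to the host. |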
### Options
| Option | Default | Explanation |
| --- | --- | --- |
| `audioEncodings` | `['OGG_OPUS', 'MP3']` | List of [audio encodings](https://cloud.google.com/speech-to-text/docs/encoding#audio-encodings) to use when generating audio assets from text matches. |
| `audioInnerHTML` | see in [src/dom.ts](./src/dom.ts) | Function that generates the innerHTML of the `<audio>` tag injected into the page for each text match. |
| `cacheExpiration` | `365d` | Expiration of the 11ty `AssetCache`. See [Change the cache duration](https://www.11ty.dev/docs/plugins/fetch/#change-the-cache-duration). |
| `collectionName` | `audio-items` | Name of the 11ty collection created by this plugin. |
| `keyFilename` | `process.env.GOOGLE_APPLICATION_CREDENTIALS` | Path to the service account JSON key used to authenticate to the Cloud Text-to-Speech API (and to the Cloud Storage API, unless you set a key in `audioHost`). |
| `rules` | see in [src/constants.ts](./src/constants.ts) | Rules that determine which texts to convert into speech. |
| `transformName` | `inject-audio-tags-into-html` | Name of the 11ty transform created by this plugin. |
| `voice` | `en-US-Standard-J` | Voice to use when generating audio assets from text matches. The Text-to-Speech API supports [these voices](https://cloud.google.com/text-to-speech/docs/voices), and its [pricing](https://cloud.google.com/text-to-speech/pricing) can differ between voices. |
> :warning: Don't forget to set either `keyFilename` or the `GOOGLE_APPLICATION_CREDENTIALS` environment variable on your build server.
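
To fail fast when neither is set, you could resolve the credentials yourself in your Eleventy configuration before registering the plugin. This is a minimal sketch of the fallback described in the options table (an explicit `keyFilename` wins, otherwise `GOOGLE_APPLICATION_CREDENTIALS` is used); `resolveKeyFilename` is a hypothetical helper, not something the plugin exports.

```js
// Hypothetical helper: explicit keyFilename first, then the environment variable.
function resolveKeyFilename(options = {}, env = process.env) {
  const keyFilename = options.keyFilename || env.GOOGLE_APPLICATION_CREDENTIALS
  if (!keyFilename) {
    // Stop the build early with a clear message instead of failing on the first API call.
    throw new Error(
      'No credentials: set the keyFilename option or the GOOGLE_APPLICATION_CREDENTIALS environment variable'
    )
  }
  return keyFilename
}
```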
## Debug
This plugin uses the [debug](https://github.com/debug-js/debug) library for logging. You can control what's logged using the `DEBUG` environment variable. For example, if you set your environment variables in a `.envrc` file, you could do:
```sh
# print all logging statements
export DEBUG=eleventy-plugin-text-to-speech/*
# print just the logging statements from the dom module and the writers module
export DEBUG=eleventy-plugin-text-to-speech/dom,eleventy-plugin-text-to-speech/writers
# print all logging statements, except the ones from the dom module and the transforms module
export DEBUG=eleventy-plugin-text-to-speech/*,-eleventy-plugin-text-to-speech/dom,-eleventy-plugin-text-to-speech/transforms
```
## Credits
I had the idea for this plugin while reading the code of the homonym [eleventy-plugin-text-to-speech](https://github.com/larryhudson/eleventy-plugin-text-to-speech) by [Larry Hudson](https://larryhudson.io/). There are a few differences between the two plugins; the main one is that this plugin uses the [Google Cloud Text-to-Speech API](https://cloud.google.com/text-to-speech), while Larry's plugin uses the [Microsoft Azure Speech SDK](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-sdk).