# Voice in Mastra

Mastra's Voice system provides a unified interface for voice interactions, enabling text-to-speech (TTS), speech-to-text (STT), and real-time speech-to-speech (STS) capabilities in your applications.

## Adding voice to agents

To learn how to integrate voice capabilities into your agents, check out the [Adding Voice to Agents](https://mastra.ai/docs/agents/adding-voice) documentation. This section covers how to use both single and multiple voice providers, as well as real-time interactions.

```typescript
import { Agent } from '@mastra/core/agent'
import { OpenAIVoice } from '@mastra/voice-openai'

// Initialize OpenAI voice for TTS
const voiceAgent = new Agent({
  id: 'voice-agent',
  name: 'Voice Agent',
  instructions: 'You are a voice assistant that can help users with their tasks.',
  model: 'openai/gpt-5.4',
  voice: new OpenAIVoice(),
})
```

You can then use the following voice capabilities:

### Text to Speech (TTS)

Turn your agent's responses into natural-sounding speech using Mastra's TTS capabilities. Choose from multiple providers like OpenAI, ElevenLabs, and more.

For detailed configuration options and advanced features, check out our [Text-to-Speech guide](https://mastra.ai/docs/voice/text-to-speech).
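
The provider examples in this section hand the returned audio stream to `playAudio`. If you would rather keep the audio, one option is to write the stream to a file. The sketch below uses only Node.js built-ins and assumes that `speak()` resolves to a Node.js readable stream, which you should confirm in the reference for your provider. `saveAudio` is a hypothetical helper, not part of Mastra, and the `Readable.from(...)` stream is a stand-in for real provider output.

```typescript
import { createWriteStream } from 'node:fs'
import { stat } from 'node:fs/promises'
import { tmpdir } from 'node:os'
import { join } from 'node:path'
import { Readable } from 'node:stream'
import { pipeline } from 'node:stream/promises'

// Hypothetical helper (not part of Mastra): writes an audio stream to disk
// and resolves with the size of the resulting file in bytes.
export async function saveAudio(audio: NodeJS.ReadableStream, filePath: string): Promise<number> {
  // pipeline() handles backpressure and rejects if either stream errors
  await pipeline(audio, createWriteStream(filePath))
  const { size } = await stat(filePath)
  return size
}

// Stand-in for the stream you would get from `voiceAgent.voice.speak(text)`
const demoPath = join(tmpdir(), 'mastra-speech-demo.wav')
const fakeAudio = Readable.from([Buffer.from('RIFF'), Buffer.from('demo')])

saveAudio(fakeAudio, demoPath).then(bytes => {
  console.log(`Wrote ${bytes} bytes to ${demoPath}`)
})
```

With a real agent you would pass the result of `voiceAgent.voice.speak(text)` as the first argument in place of `fakeAudio`.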

**OpenAI**:

```typescript
import { Agent } from '@mastra/core/agent'
import { OpenAIVoice } from '@mastra/voice-openai'
import { playAudio } from '@mastra/node-audio'

const voiceAgent = new Agent({
  id: 'voice-agent',
  name: 'Voice Agent',
  instructions: 'You are a voice assistant that can help users with their tasks.',
  model: 'openai/gpt-5.4',
  voice: new OpenAIVoice(),
})

const { text } = await voiceAgent.generate('What color is the sky?')

// Convert text to speech to an Audio Stream
const audioStream = await voiceAgent.voice.speak(text, {
  speaker: 'default', // Optional: specify a speaker
  responseFormat: 'wav', // Optional: specify a response format
})

playAudio(audioStream)
```

Visit the [OpenAI Voice Reference](https://mastra.ai/reference/voice/openai) for more information on the OpenAI voice provider.

**Azure**:

```typescript
import { Agent } from '@mastra/core/agent'
import { AzureVoice } from '@mastra/voice-azure'
import { playAudio } from '@mastra/node-audio'

const voiceAgent = new Agent({
  id: 'voice-agent',
  name: 'Voice Agent',
  instructions: 'You are a voice assistant that can help users with their tasks.',
  model: 'openai/gpt-5.4',
  voice: new AzureVoice(),
})

const { text } = await voiceAgent.generate('What color is the sky?')

// Convert text to speech to an Audio Stream
const audioStream = await voiceAgent.voice.speak(text, {
  speaker: 'en-US-JennyNeural', // Optional: specify a speaker
})

playAudio(audioStream)
```

Visit the [Azure Voice Reference](https://mastra.ai/reference/voice/azure) for more information on the Azure voice provider.

**ElevenLabs**:

```typescript
import { Agent } from '@mastra/core/agent'
import { ElevenLabsVoice } from '@mastra/voice-elevenlabs'
import { playAudio } from '@mastra/node-audio'

const voiceAgent = new Agent({
  id: 'voice-agent',
  name: 'Voice Agent',
  instructions: 'You are a voice assistant that can help users with their tasks.',
  model: 'openai/gpt-5.4',
  voice: new ElevenLabsVoice(),
})

const { text } = await voiceAgent.generate('What color is the sky?')

// Convert text to speech to an Audio Stream
const audioStream = await voiceAgent.voice.speak(text, {
  speaker: 'default', // Optional: specify a speaker
})

playAudio(audioStream)
```

Visit the [ElevenLabs Voice Reference](https://mastra.ai/reference/voice/elevenlabs) for more information on the ElevenLabs voice provider.

**PlayAI**:

```typescript
import { Agent } from '@mastra/core/agent'
import { PlayAIVoice } from '@mastra/voice-playai'
import { playAudio } from '@mastra/node-audio'

const voiceAgent = new Agent({
  id: 'voice-agent',
  name: 'Voice Agent',
  instructions: 'You are a voice assistant that can help users with their tasks.',
  model: 'openai/gpt-5.4',
  voice: new PlayAIVoice(),
})

const { text } = await voiceAgent.generate('What color is the sky?')

// Convert text to speech to an Audio Stream
const audioStream = await voiceAgent.voice.speak(text, {
  speaker: 'default', // Optional: specify a speaker
})

playAudio(audioStream)
```

Visit the [PlayAI Voice Reference](https://mastra.ai/reference/voice/playai) for more information on the PlayAI voice provider.

**Google**:

```typescript
import { Agent } from '@mastra/core/agent'
import { GoogleVoice } from '@mastra/voice-google'
import { playAudio } from '@mastra/node-audio'

const voiceAgent = new Agent({
  id: 'voice-agent',
  name: 'Voice Agent',
  instructions: 'You are a voice assistant that can help users with their tasks.',
  model: 'openai/gpt-5.4',
  voice: new GoogleVoice(),
})

const { text } = await voiceAgent.generate('What color is the sky?')

// Convert text to speech to an Audio Stream
const audioStream = await voiceAgent.voice.speak(text, {
  speaker: 'en-US-Studio-O', // Optional: specify a speaker
})

playAudio(audioStream)
```

Visit the [Google Voice Reference](https://mastra.ai/reference/voice/google) for more information on the Google voice provider.

**Cloudflare**:

```typescript
import { Agent } from '@mastra/core/agent'
import { CloudflareVoice } from '@mastra/voice-cloudflare'
import { playAudio } from '@mastra/node-audio'

const voiceAgent = new Agent({
  id: 'voice-agent',
  name: 'Voice Agent',
  instructions: 'You are a voice assistant that can help users with their tasks.',
  model: 'openai/gpt-5.4',
  voice: new CloudflareVoice(),
})

const { text } = await voiceAgent.generate('What color is the sky?')

// Convert text to speech to an Audio Stream
const audioStream = await voiceAgent.voice.speak(text, {
  speaker: 'default', // Optional: specify a speaker
})

playAudio(audioStream)
```

Visit the [Cloudflare Voice Reference](https://mastra.ai/reference/voice/cloudflare) for more information on the Cloudflare voice provider.

**Deepgram**:

```typescript
import { Agent } from '@mastra/core/agent'
import { DeepgramVoice } from '@mastra/voice-deepgram'
import { playAudio } from '@mastra/node-audio'

const voiceAgent = new Agent({
  id: 'voice-agent',
  name: 'Voice Agent',
  instructions: 'You are a voice assistant that can help users with their tasks.',
  model: 'openai/gpt-5.4',
  voice: new DeepgramVoice(),
})

const { text } = await voiceAgent.generate('What color is the sky?')

// Convert text to speech to an Audio Stream
const audioStream = await voiceAgent.voice.speak(text, {
  speaker: 'aura-english-us', // Optional: specify a speaker
})

playAudio(audioStream)
```

Visit the [Deepgram Voice Reference](https://mastra.ai/reference/voice/deepgram) for more information on the Deepgram voice provider.

**Speechify**:

```typescript
import { Agent } from '@mastra/core/agent'
import { SpeechifyVoice } from '@mastra/voice-speechify'
import { playAudio } from '@mastra/node-audio'

const voiceAgent = new Agent({
  id: 'voice-agent',
  name: 'Voice Agent',
  instructions: 'You are a voice assistant that can help users with their tasks.',
  model: 'openai/gpt-5.4',
  voice: new SpeechifyVoice(),
})

const { text } = await voiceAgent.generate('What color is the sky?')

// Convert text to speech to an Audio Stream
const audioStream = await voiceAgent.voice.speak(text, {
  speaker: 'matthew', // Optional: specify a speaker
})

playAudio(audioStream)
```

Visit the [Speechify Voice Reference](https://mastra.ai/reference/voice/speechify) for more information on the Speechify voice provider.

**Sarvam**:

```typescript
import { Agent } from '@mastra/core/agent'
import { SarvamVoice } from '@mastra/voice-sarvam'
import { playAudio } from '@mastra/node-audio'

const voiceAgent = new Agent({
  id: 'voice-agent',
  name: 'Voice Agent',
  instructions: 'You are a voice assistant that can help users with their tasks.',
  model: 'openai/gpt-5.4',
  voice: new SarvamVoice(),
})

const { text } = await voiceAgent.generate('What color is the sky?')

// Convert text to speech to an Audio Stream
const audioStream = await voiceAgent.voice.speak(text, {
  speaker: 'default', // Optional: specify a speaker
})

playAudio(audioStream)
```

Visit the [Sarvam Voice Reference](https://mastra.ai/reference/voice/sarvam) for more information on the Sarvam voice provider.

**Murf**:

```typescript
import { Agent } from '@mastra/core/agent'
import { MurfVoice } from '@mastra/voice-murf'
import { playAudio } from '@mastra/node-audio'

const voiceAgent = new Agent({
  id: 'voice-agent',
  name: 'Voice Agent',
  instructions: 'You are a voice assistant that can help users with their tasks.',
  model: 'openai/gpt-5.4',
  voice: new MurfVoice(),
})

const { text } = await voiceAgent.generate('What color is the sky?')

// Convert text to speech to an Audio Stream
const audioStream = await voiceAgent.voice.speak(text, {
  speaker: 'default', // Optional: specify a speaker
})

playAudio(audioStream)
```

Visit the [Murf Voice Reference](https://mastra.ai/reference/voice/murf) for more information on the Murf voice provider.

### Speech to Text (STT)

Transcribe spoken content using various providers like OpenAI, ElevenLabs, and more. For detailed configuration options, check out [Speech to Text](https://mastra.ai/docs/voice/speech-to-text).

You can download a sample audio file from [here](https://github.com/mastra-ai/realtime-voice-demo/raw/refs/heads/main/how_can_i_help_you.mp3).
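
The STT examples in this section read `./how_can_i_help_you.mp3` from the working directory. As a minimal sketch, assuming Node.js 18 or later (for the global `fetch`), the sample file can be fetched and saved like this. `downloadSample` is a hypothetical helper, not part of Mastra, and its `fetchImpl` parameter exists only so the function can be exercised without network access.

```typescript
import { writeFile } from 'node:fs/promises'

// Sample file URL taken from the link above
const SAMPLE_URL =
  'https://github.com/mastra-ai/realtime-voice-demo/raw/refs/heads/main/how_can_i_help_you.mp3'

// Hypothetical helper (not part of Mastra): downloads `url` to `dest`
// and resolves with the number of bytes written.
export async function downloadSample(
  dest: string = './how_can_i_help_you.mp3',
  url: string = SAMPLE_URL,
  fetchImpl: typeof fetch = fetch,
): Promise<number> {
  const res = await fetchImpl(url)
  if (!res.ok) {
    throw new Error(`Download failed: ${res.status} ${res.statusText}`)
  }
  // Buffer the whole body in memory; fine for a short sample clip
  const bytes = Buffer.from(await res.arrayBuffer())
  await writeFile(dest, bytes)
  return bytes.length
}

// Usage (performs a real network request):
// await downloadSample()
```

After the download completes, `createReadStream('./how_can_i_help_you.mp3')` in the examples will find the file.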

**OpenAI**:

```typescript
import { Agent } from '@mastra/core/agent'
import { OpenAIVoice } from '@mastra/voice-openai'
import { createReadStream } from 'fs'

const voiceAgent = new Agent({
  id: 'voice-agent',
  name: 'Voice Agent',
  instructions: 'You are a voice assistant that can help users with their tasks.',
  model: 'openai/gpt-5.4',
  voice: new OpenAIVoice(),
})

// Read the sample audio file from disk
const audioStream = createReadStream('./how_can_i_help_you.mp3')

// Convert audio to text
const transcript = await voiceAgent.voice.listen(audioStream)
console.log(`User said: ${transcript}`)

// Generate a response based on the transcript
const { text } = await voiceAgent.generate(transcript)
```

Visit the [OpenAI Voice Reference](https://mastra.ai/reference/voice/openai) for more information on the OpenAI voice provider.

**Azure**:

```typescript
import { Agent } from '@mastra/core/agent'
import { AzureVoice } from '@mastra/voice-azure'
import { createReadStream } from 'fs'

const voiceAgent = new Agent({
  id: 'voice-agent',
  name: 'Voice Agent',
  instructions: 'You are a voice assistant that can help users with their tasks.',
  model: 'openai/gpt-5.4',
  voice: new AzureVoice(),
})

// Read the sample audio file from disk
const audioStream = createReadStream('./how_can_i_help_you.mp3')

// Convert audio to text
const transcript = await voiceAgent.voice.listen(audioStream)
console.log(`User said: ${transcript}`)

// Generate a response based on the transcript
const { text } = await voiceAgent.generate(transcript)
```

Visit the [Azure Voice Reference](https://mastra.ai/reference/voice/azure) for more information on the Azure voice provider.

**ElevenLabs**:

```typescript
import { Agent } from '@mastra/core/agent'
import { ElevenLabsVoice } from '@mastra/voice-elevenlabs'
import { createReadStream } from 'fs'

const voiceAgent = new Agent({
  id: 'voice-agent',
  name: 'Voice Agent',
  instructions: 'You are a voice assistant that can help users with their tasks.',
  model: 'openai/gpt-5.4',
  voice: new ElevenLabsVoice(),
})

// Read the sample audio file from disk
const audioStream = createReadStream('./how_can_i_help_you.mp3')

// Convert audio to text
const transcript = await voiceAgent.voice.listen(audioStream)
console.log(`User said: ${transcript}`)

// Generate a response based on the transcript
const { text } = await voiceAgent.generate(transcript)
```

Visit the [ElevenLabs Voice Reference](https://mastra.ai/reference/voice/elevenlabs) for more information on the ElevenLabs voice provider.

**Google**:

```typescript
import { Agent } from '@mastra/core/agent'
import { GoogleVoice } from '@mastra/voice-google'
import { createReadStream } from 'fs'

const voiceAgent = new Agent({
  id: 'voice-agent',
  name: 'Voice Agent',
  instructions: 'You are a voice assistant that can help users with their tasks.',
  model: 'openai/gpt-5.4',
  voice: new GoogleVoice(),
})

// Read the sample audio file from disk
const audioStream = createReadStream('./how_can_i_help_you.mp3')

// Convert audio to text
const transcript = await voiceAgent.voice.listen(audioStream)
console.log(`User said: ${transcript}`)

// Generate a response based on the transcript
const { text } = await voiceAgent.generate(transcript)
```

Visit the [Google Voice Reference](https://mastra.ai/reference/voice/google) for more information on the Google voice provider.

**Cloudflare**:

```typescript
import { Agent } from '@mastra/core/agent'
import { CloudflareVoice } from '@mastra/voice-cloudflare'
import { createReadStream } from 'fs'

const voiceAgent = new Agent({
  id: 'voice-agent',
  name: 'Voice Agent',
  instructions: 'You are a voice assistant that can help users with their tasks.',
  model: 'openai/gpt-5.4',
  voice: new CloudflareVoice(),
})

// Read the sample audio file from disk
const audioStream = createReadStream('./how_can_i_help_you.mp3')

// Convert audio to text
const transcript = await voiceAgent.voice.listen(audioStream)
console.log(`User said: ${transcript}`)

// Generate a response based on the transcript
const { text } = await voiceAgent.generate(transcript)
```

Visit the [Cloudflare Voice Reference](https://mastra.ai/reference/voice/cloudflare) for more information on the Cloudflare voice provider.

**Deepgram**:

```typescript
import { Agent } from '@mastra/core/agent'
import { DeepgramVoice } from '@mastra/voice-deepgram'
import { createReadStream } from 'fs'

const voiceAgent = new Agent({
  id: 'voice-agent',
  name: 'Voice Agent',
  instructions: 'You are a voice assistant that can help users with their tasks.',
  model: 'openai/gpt-5.4',
  voice: new DeepgramVoice(),
})

// Read the sample audio file from disk
const audioStream = createReadStream('./how_can_i_help_you.mp3')

// Convert audio to text
const transcript = await voiceAgent.voice.listen(audioStream)
console.log(`User said: ${transcript}`)

// Generate a response based on the transcript
const { text } = await voiceAgent.generate(transcript)
```

Visit the [Deepgram Voice Reference](https://mastra.ai/reference/voice/deepgram) for more information on the Deepgram voice provider.

**Sarvam**:

```typescript
import { Agent } from '@mastra/core/agent'
import { SarvamVoice } from '@mastra/voice-sarvam'
import { createReadStream } from 'fs'

const voiceAgent = new Agent({
  id: 'voice-agent',
  name: 'Voice Agent',
  instructions: 'You are a voice assistant that can help users with their tasks.',
  model: 'openai/gpt-5.4',
  voice: new SarvamVoice(),
})

// Read the sample audio file from disk
const audioStream = createReadStream('./how_can_i_help_you.mp3')

// Convert audio to text
const transcript = await voiceAgent.voice.listen(audioStream)
console.log(`User said: ${transcript}`)

// Generate a response based on the transcript
const { text } = await voiceAgent.generate(transcript)
```

Visit the [Sarvam Voice Reference](https://mastra.ai/reference/voice/sarvam) for more information on the Sarvam voice provider.

### Speech to Speech (STS)

Create conversational experiences with speech-to-speech capabilities. The unified API enables real-time voice interactions between users and AI agents.

For detailed configuration options and advanced features, check out [Speech to Speech](https://mastra.ai/docs/voice/speech-to-speech).

**OpenAI**:

```typescript
import { Agent } from '@mastra/core/agent'
import { playAudio, getMicrophoneStream } from '@mastra/node-audio'
import { OpenAIRealtimeVoice } from '@mastra/voice-openai-realtime'

const voiceAgent = new Agent({
  id: 'voice-agent',
  name: 'Voice Agent',
  instructions: 'You are a voice assistant that can help users with their tasks.',
  model: 'openai/gpt-5.4',
  voice: new OpenAIRealtimeVoice(),
})

// Listen for agent audio responses
voiceAgent.voice.on('speaker', ({ audio }) => {
  playAudio(audio)
})

// Initiate the conversation
await voiceAgent.voice.speak('How can I help you today?')

// Send continuous audio from the microphone
const micStream = getMicrophoneStream()
await voiceAgent.voice.send(micStream)
```

Visit the [OpenAI Realtime Voice Reference](https://mastra.ai/reference/voice/openai-realtime) for more information on the OpenAI Realtime voice provider.

**Google**:

```typescript
import { Agent } from '@mastra/core/agent'
import { playAudio, getMicrophoneStream } from '@mastra/node-audio'
import { GeminiLiveVoice } from '@mastra/voice-google-gemini-live'

const voiceAgent = new Agent({
  id: 'voice-agent',
  name: 'Voice Agent',
  instructions: 'You are a voice assistant that can help users with their tasks.',
  model: 'openai/gpt-5.4',
  voice: new GeminiLiveVoice({
    // Live API mode
    apiKey: process.env.GOOGLE_API_KEY,
    model: 'gemini-2.0-flash-exp',
    speaker: 'Puck',
    debug: true,
    // Vertex AI alternative:
    // vertexAI: true,
    // project: 'your-gcp-project',
    // location: 'us-central1',
    // serviceAccountKeyFile: '/path/to/service-account.json',
  }),
})

// Connect before using speak/send
await voiceAgent.voice.connect()

// Listen for agent audio responses
voiceAgent.voice.on('speaker', ({ audio }) => {
  playAudio(audio)
})

// Listen for text responses and transcriptions
voiceAgent.voice.on('writing', ({ text, role }) => {
  console.log(`${role}: ${text}`)
})

// Initiate the conversation
await voiceAgent.voice.speak('How can I help you today?')

// Send continuous audio from the microphone
const micStream = getMicrophoneStream()
await voiceAgent.voice.send(micStream)
```

Visit the [Google Gemini Live Reference](https://mastra.ai/reference/voice/google-gemini-live) for more information on the Google Gemini Live voice provider.

## Voice configuration

Each voice provider can be configured with different models and options. Below are the detailed configuration options for all supported providers:

**OpenAI**:

```typescript
// OpenAI Voice Configuration
const voice = new OpenAIVoice({
  speechModel: {
    name: 'tts-1', // Example TTS model name
    apiKey: process.env.OPENAI_API_KEY,
    language: 'en-US', // Language code
    voiceType: 'neural', // Type of voice model
  },
  listeningModel: {
    name: 'whisper-1', // Example model name
    apiKey: process.env.OPENAI_API_KEY,
    language: 'en-US', // Language code
    format: 'wav', // Audio format
  },
  speaker: 'alloy', // Example speaker name
})
```

Visit the [OpenAI Voice Reference](https://mastra.ai/reference/voice/openai) for more information on the OpenAI voice provider.

**Azure**:

```typescript
// Azure Voice Configuration
const voice = new AzureVoice({
  speechModel: {
    name: 'en-US-JennyNeural', // Example model name
    apiKey: process.env.AZURE_SPEECH_KEY,
    region: process.env.AZURE_SPEECH_REGION,
    language: 'en-US', // Language code
    style: 'cheerful', // Voice style
    pitch: '+0Hz', // Pitch adjustment
    rate: '1.0', // Speech rate
  },
  listeningModel: {
    name: 'en-US', // Example model name
    apiKey: process.env.AZURE_SPEECH_KEY,
    region: process.env.AZURE_SPEECH_REGION,
    format: 'simple', // Output format
  },
})
```

Visit the [Azure Voice Reference](https://mastra.ai/reference/voice/azure) for more information on the Azure voice provider.

**ElevenLabs**:

```typescript
// ElevenLabs Voice Configuration
const voice = new ElevenLabsVoice({
  speechModel: {
    voiceId: 'your-voice-id', // Example voice ID
    model: 'eleven_multilingual_v2', // Example model name
    apiKey: process.env.ELEVENLABS_API_KEY,
    language: 'en', // Language code
    emotion: 'neutral', // Emotion setting
  },
  // ElevenLabs may not have a separate listening model
})
```

Visit the [ElevenLabs Voice Reference](https://mastra.ai/reference/voice/elevenlabs) for more information on the ElevenLabs voice provider.

**PlayAI**:

```typescript
// PlayAI Voice Configuration
const voice = new PlayAIVoice({
  speechModel: {
    name: 'playai-voice', // Example model name
    speaker: 'emma', // Example speaker name
    apiKey: process.env.PLAYAI_API_KEY,
    language: 'en-US', // Language code
    speed: 1.0, // Speech speed
  },
  // PlayAI may not have a separate listening model
})
```

Visit the [PlayAI Voice Reference](https://mastra.ai/reference/voice/playai) for more information on the PlayAI voice provider.

**Google**:

```typescript
// Google Voice Configuration
const voice = new GoogleVoice({
  speechModel: {
    name: 'en-US-Studio-O', // Example model name
    apiKey: process.env.GOOGLE_API_KEY,
    languageCode: 'en-US', // Language code
    gender: 'FEMALE', // Voice gender
    speakingRate: 1.0, // Speaking rate
  },
  listeningModel: {
    name: 'en-US', // Example model name
    sampleRateHertz: 16000, // Sample rate
  },
})
```

Visit the [Google Voice Reference](https://mastra.ai/reference/voice/google) for more information on the Google voice provider.

**Cloudflare**:

```typescript
// Cloudflare Voice Configuration
const voice = new CloudflareVoice({
  speechModel: {
    name: 'cloudflare-voice', // Example model name
    accountId: process.env.CLOUDFLARE_ACCOUNT_ID,
    apiToken: process.env.CLOUDFLARE_API_TOKEN,
    language: 'en-US', // Language code
    format: 'mp3', // Audio format
  },
  // Cloudflare may not have a separate listening model
})
```

Visit the [Cloudflare Voice Reference](https://mastra.ai/reference/voice/cloudflare) for more information on the Cloudflare voice provider.

**Deepgram**:

```typescript
// Deepgram Voice Configuration
const voice = new DeepgramVoice({
  speechModel: {
    name: 'nova-2', // Example model name
    speaker: 'aura-english-us', // Example speaker name
    apiKey: process.env.DEEPGRAM_API_KEY,
    language: 'en-US', // Language code
    tone: 'formal', // Tone setting
  },
  listeningModel: {
    name: 'nova-2', // Example model name
    format: 'flac', // Audio format
  },
})
```

Visit the [Deepgram Voice Reference](https://mastra.ai/reference/voice/deepgram) for more information on the Deepgram voice provider.

**Speechify**:

```typescript
// Speechify Voice Configuration
const voice = new SpeechifyVoice({
  speechModel: {
    name: 'speechify-voice', // Example model name
    speaker: 'matthew', // Example speaker name
    apiKey: process.env.SPEECHIFY_API_KEY,
    language: 'en-US', // Language code
    speed: 1.0, // Speech speed
  },
  // Speechify may not have a separate listening model
})
```

Visit the [Speechify Voice Reference](https://mastra.ai/reference/voice/speechify) for more information on the Speechify voice provider.

**Sarvam**:

```typescript
// Sarvam Voice Configuration
const voice = new SarvamVoice({
  speechModel: {
    name: 'sarvam-voice', // Example model name
    apiKey: process.env.SARVAM_API_KEY,
    language: 'en-IN', // Language code
    style: 'conversational', // Style setting
  },
  // Sarvam may not have a separate listening model
})
```

Visit the [Sarvam Voice Reference](https://mastra.ai/reference/voice/sarvam) for more information on the Sarvam voice provider.

**Murf**:

```typescript
// Murf Voice Configuration
const voice = new MurfVoice({
  speechModel: {
    name: 'murf-voice', // Example model name
    apiKey: process.env.MURF_API_KEY,
    language: 'en-US', // Language code
    emotion: 'happy', // Emotion setting
  },
  // Murf may not have a separate listening model
})
```

Visit the [Murf Voice Reference](https://mastra.ai/reference/voice/murf) for more information on the Murf voice provider.

**OpenAI Realtime**:

```typescript
// OpenAI Realtime Voice Configuration
const voice = new OpenAIRealtimeVoice({
  speechModel: {
    name: 'gpt-3.5-turbo', // Example model name
    apiKey: process.env.OPENAI_API_KEY,
    language: 'en-US', // Language code
  },
  listeningModel: {
    name: 'whisper-1', // Example model name
    apiKey: process.env.OPENAI_API_KEY,
    format: 'ogg', // Audio format
  },
  speaker: 'alloy', // Example speaker name
})
```

For more information on the OpenAI Realtime voice provider, refer to the [OpenAI Realtime Voice Reference](https://mastra.ai/reference/voice/openai-realtime).

**Google Gemini Live**:

```typescript
// Google Gemini Live Voice Configuration
const voice = new GeminiLiveVoice({
  speechModel: {
    name: 'gemini-2.0-flash-exp', // Example model name
    apiKey: process.env.GOOGLE_API_KEY,
  },
  speaker: 'Puck', // Example speaker name
  // Google Gemini Live is a realtime bidirectional API without separate speech and listening models
})
```

Visit the [Google Gemini Live Reference](https://mastra.ai/reference/voice/google-gemini-live) for more information on the Google Gemini Live voice provider.

**AI SDK**:

```typescript
// AI SDK Voice Configuration
import { Agent } from '@mastra/core/agent'
import { CompositeVoice } from '@mastra/core/voice'
import { openai } from '@ai-sdk/openai'
import { elevenlabs } from '@ai-sdk/elevenlabs'

// Use AI SDK models directly - no need to install separate packages
const voice = new CompositeVoice({
  input: openai.transcription('whisper-1'), // AI SDK transcription
  output: elevenlabs.speech('eleven_turbo_v2'), // AI SDK speech
})

// Works seamlessly with your agent
const voiceAgent = new Agent({
  id: 'aisdk-voice-agent',
  name: 'AI SDK Voice Agent',
  instructions: 'You are a helpful assistant with voice capabilities.',
  model: 'openai/gpt-5.4',
  voice,
})
```

### Using Multiple Voice Providers

This example demonstrates how to create and use two different voice providers in Mastra: OpenAI for speech-to-text (STT) and PlayAI for text-to-speech (TTS).

Start by creating instances of the voice providers with any necessary configuration.

```typescript
import { OpenAIVoice } from '@mastra/voice-openai'
import { PlayAIVoice } from '@mastra/voice-playai'
import { CompositeVoice } from '@mastra/core/voice'
import { playAudio, getMicrophoneStream } from '@mastra/node-audio'

// Initialize OpenAI voice for STT
const input = new OpenAIVoice({
  listeningModel: {
    name: 'whisper-1',
    apiKey: process.env.OPENAI_API_KEY,
  },
})

// Initialize PlayAI voice for TTS
const output = new PlayAIVoice({
  speechModel: {
    name: 'playai-voice',
    apiKey: process.env.PLAYAI_API_KEY,
  },
})

// Combine the providers using CompositeVoice
const voice = new CompositeVoice({
  input,
  output,
})

// Implement voice interactions using the combined voice provider
const audioStream = getMicrophoneStream() // Assume this function gets audio input
const transcript = await voice.listen(audioStream)

// Log the transcribed text
console.log('Transcribed text:', transcript)

// Convert text to speech
const responseAudio = await voice.speak(`You said: ${transcript}`, {
  speaker: 'default', // Optional: specify a speaker
  responseFormat: 'wav', // Optional: specify a response format
})

// Play the audio response
playAudio(responseAudio)
```

### Using AI SDK Model Providers

You can also use AI SDK models directly with `CompositeVoice`:

```typescript
import { CompositeVoice } from '@mastra/core/voice'
import { openai } from '@ai-sdk/openai'
import { elevenlabs } from '@ai-sdk/elevenlabs'
import { playAudio, getMicrophoneStream } from '@mastra/node-audio'

// Use AI SDK models directly - no provider setup needed
const voice = new CompositeVoice({
  input: openai.transcription('whisper-1'), // AI SDK transcription
  output: elevenlabs.speech('eleven_turbo_v2'), // AI SDK speech
})

// Works the same way as Mastra providers
const audioStream = getMicrophoneStream()
const transcript = await voice.listen(audioStream)
console.log('Transcribed text:', transcript)

// Convert text to speech
const responseAudio = await voice.speak(`You said: ${transcript}`, {
  speaker: 'Rachel', // ElevenLabs voice
})

playAudio(responseAudio)
```

You can also mix AI SDK models with Mastra providers:

```typescript
import { CompositeVoice } from '@mastra/core/voice'
import { PlayAIVoice } from '@mastra/voice-playai'
import { groq } from '@ai-sdk/groq'

const voice = new CompositeVoice({
  input: groq.transcription('whisper-large-v3'), // AI SDK for STT
  output: new PlayAIVoice(), // Mastra provider for TTS
})
```

For more information on `CompositeVoice`, refer to the [CompositeVoice Reference](https://mastra.ai/reference/voice/composite-voice).
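
To make the idea of combining providers concrete, here is a self-contained sketch of the delegation pattern: one object forwards `listen` to an input provider and `speak` to an output provider. It is an illustration only and not Mastra's implementation. The `Listener` and `Speaker` interfaces, the `SimpleCompositeVoice` class, and the two fake providers are all hypothetical, and audio is modelled as a plain `Uint8Array` to keep the example short.

```typescript
// Hypothetical, simplified provider interfaces (not Mastra's types)
interface Listener {
  listen(audio: Uint8Array): Promise<string>
}

interface Speaker {
  speak(text: string): Promise<Uint8Array>
}

// Illustrative composite: routes each capability to the provider that owns it
class SimpleCompositeVoice implements Listener, Speaker {
  private readonly input: Listener
  private readonly output: Speaker

  constructor(parts: { input: Listener; output: Speaker }) {
    this.input = parts.input
    this.output = parts.output
  }

  // Speech-to-text is delegated to the input provider
  listen(audio: Uint8Array): Promise<string> {
    return this.input.listen(audio)
  }

  // Text-to-speech is delegated to the output provider
  speak(text: string): Promise<Uint8Array> {
    return this.output.speak(text)
  }
}

// Fake providers: "transcribe" by decoding bytes, "synthesize" by encoding text
const fakeStt: Listener = {
  listen: async audio => new TextDecoder().decode(audio),
}

const fakeTts: Speaker = {
  speak: async text => new TextEncoder().encode(text),
}

const compositeSketch = new SimpleCompositeVoice({ input: fakeStt, output: fakeTts })

compositeSketch
  .listen(new TextEncoder().encode('hello'))
  .then(transcript => compositeSketch.speak(`You said: ${transcript}`))
  .then(audio => console.log(`Synthesized ${audio.length} bytes`))
```

Because the caller only sees `listen` and `speak`, either provider can be swapped without touching the code that uses the combined voice, which is the property the examples above rely on.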

## More resources

- [CompositeVoice](https://mastra.ai/reference/voice/composite-voice)
- [MastraVoice](https://mastra.ai/reference/voice/mastra-voice)
- [OpenAI Voice](https://mastra.ai/reference/voice/openai)
- [OpenAI Realtime Voice](https://mastra.ai/reference/voice/openai-realtime)
- [Azure Voice](https://mastra.ai/reference/voice/azure)
- [Google Voice](https://mastra.ai/reference/voice/google)
- [Google Gemini Live Voice](https://mastra.ai/reference/voice/google-gemini-live)
- [Deepgram Voice](https://mastra.ai/reference/voice/deepgram)
- [PlayAI Voice](https://mastra.ai/reference/voice/playai)
- [Voice Examples](https://github.com/mastra-ai/voice-examples)