@mastra/voice-cloudflare
Mastra Cloudflare AI voice integration
# Voice in Mastra
Mastra's Voice system provides a unified interface for voice interactions, enabling text-to-speech (TTS), speech-to-text (STT), and real-time speech-to-speech (STS) capabilities in your applications.
## Adding voice to agents
To learn how to integrate voice capabilities into your agents, check out the [Adding Voice to Agents](https://mastra.ai/docs/agents/adding-voice) documentation. That guide covers single and multiple voice providers, as well as real-time interactions.
```typescript
import { Agent } from '@mastra/core/agent'
import { OpenAIVoice } from '@mastra/voice-openai'
// Initialize OpenAI voice for TTS
const voiceAgent = new Agent({
id: 'voice-agent',
name: 'Voice Agent',
instructions: 'You are a voice assistant that can help users with their tasks.',
model: 'openai/gpt-5.4',
voice: new OpenAIVoice(),
})
```
You can then use the following voice capabilities:
### Text to Speech (TTS)
Turn your agent's responses into natural-sounding speech using Mastra's TTS capabilities. Choose from multiple providers like OpenAI, ElevenLabs, and more.
For detailed configuration options and advanced features, check out our [Text-to-Speech guide](https://mastra.ai/docs/voice/text-to-speech).
**OpenAI**:
```typescript
import { Agent } from '@mastra/core/agent'
import { OpenAIVoice } from '@mastra/voice-openai'
import { playAudio } from '@mastra/node-audio'
const voiceAgent = new Agent({
id: 'voice-agent',
name: 'Voice Agent',
instructions: 'You are a voice assistant that can help users with their tasks.',
model: 'openai/gpt-5.4',
voice: new OpenAIVoice(),
})
const { text } = await voiceAgent.generate('What color is the sky?')
// Convert text to speech to an Audio Stream
const audioStream = await voiceAgent.voice.speak(text, {
speaker: 'default', // Optional: specify a speaker
responseFormat: 'wav', // Optional: specify a response format
})
playAudio(audioStream)
```
Visit the [OpenAI Voice Reference](https://mastra.ai/reference/voice/openai) for more information on the OpenAI voice provider.
**Azure**:
```typescript
import { Agent } from '@mastra/core/agent'
import { AzureVoice } from '@mastra/voice-azure'
import { playAudio } from '@mastra/node-audio'
const voiceAgent = new Agent({
id: 'voice-agent',
name: 'Voice Agent',
instructions: 'You are a voice assistant that can help users with their tasks.',
model: 'openai/gpt-5.4',
voice: new AzureVoice(),
})
const { text } = await voiceAgent.generate('What color is the sky?')
// Convert text to speech to an Audio Stream
const audioStream = await voiceAgent.voice.speak(text, {
speaker: 'en-US-JennyNeural', // Optional: specify a speaker
})
playAudio(audioStream)
```
Visit the [Azure Voice Reference](https://mastra.ai/reference/voice/azure) for more information on the Azure voice provider.
**ElevenLabs**:
```typescript
import { Agent } from '@mastra/core/agent'
import { ElevenLabsVoice } from '@mastra/voice-elevenlabs'
import { playAudio } from '@mastra/node-audio'
const voiceAgent = new Agent({
id: 'voice-agent',
name: 'Voice Agent',
instructions: 'You are a voice assistant that can help users with their tasks.',
model: 'openai/gpt-5.4',
voice: new ElevenLabsVoice(),
})
const { text } = await voiceAgent.generate('What color is the sky?')
// Convert text to speech to an Audio Stream
const audioStream = await voiceAgent.voice.speak(text, {
speaker: 'default', // Optional: specify a speaker
})
playAudio(audioStream)
```
Visit the [ElevenLabs Voice Reference](https://mastra.ai/reference/voice/elevenlabs) for more information on the ElevenLabs voice provider.
**PlayAI**:
```typescript
import { Agent } from '@mastra/core/agent'
import { PlayAIVoice } from '@mastra/voice-playai'
import { playAudio } from '@mastra/node-audio'
const voiceAgent = new Agent({
id: 'voice-agent',
name: 'Voice Agent',
instructions: 'You are a voice assistant that can help users with their tasks.',
model: 'openai/gpt-5.4',
voice: new PlayAIVoice(),
})
const { text } = await voiceAgent.generate('What color is the sky?')
// Convert text to speech to an Audio Stream
const audioStream = await voiceAgent.voice.speak(text, {
speaker: 'default', // Optional: specify a speaker
})
playAudio(audioStream)
```
Visit the [PlayAI Voice Reference](https://mastra.ai/reference/voice/playai) for more information on the PlayAI voice provider.
**Google**:
```typescript
import { Agent } from '@mastra/core/agent'
import { GoogleVoice } from '@mastra/voice-google'
import { playAudio } from '@mastra/node-audio'
const voiceAgent = new Agent({
id: 'voice-agent',
name: 'Voice Agent',
instructions: 'You are a voice assistant that can help users with their tasks.',
model: 'openai/gpt-5.4',
voice: new GoogleVoice(),
})
const { text } = await voiceAgent.generate('What color is the sky?')
// Convert text to speech to an Audio Stream
const audioStream = await voiceAgent.voice.speak(text, {
speaker: 'en-US-Studio-O', // Optional: specify a speaker
})
playAudio(audioStream)
```
Visit the [Google Voice Reference](https://mastra.ai/reference/voice/google) for more information on the Google voice provider.
**Cloudflare**:
```typescript
import { Agent } from '@mastra/core/agent'
import { CloudflareVoice } from '@mastra/voice-cloudflare'
import { playAudio } from '@mastra/node-audio'
const voiceAgent = new Agent({
id: 'voice-agent',
name: 'Voice Agent',
instructions: 'You are a voice assistant that can help users with their tasks.',
model: 'openai/gpt-5.4',
voice: new CloudflareVoice(),
})
const { text } = await voiceAgent.generate('What color is the sky?')
// Convert text to speech to an Audio Stream
const audioStream = await voiceAgent.voice.speak(text, {
speaker: 'default', // Optional: specify a speaker
})
playAudio(audioStream)
```
Visit the [Cloudflare Voice Reference](https://mastra.ai/reference/voice/cloudflare) for more information on the Cloudflare voice provider.
**Deepgram**:
```typescript
import { Agent } from '@mastra/core/agent'
import { DeepgramVoice } from '@mastra/voice-deepgram'
import { playAudio } from '@mastra/node-audio'
const voiceAgent = new Agent({
id: 'voice-agent',
name: 'Voice Agent',
instructions: 'You are a voice assistant that can help users with their tasks.',
model: 'openai/gpt-5.4',
voice: new DeepgramVoice(),
})
const { text } = await voiceAgent.generate('What color is the sky?')
// Convert text to speech to an Audio Stream
const audioStream = await voiceAgent.voice.speak(text, {
speaker: 'aura-english-us', // Optional: specify a speaker
})
playAudio(audioStream)
```
Visit the [Deepgram Voice Reference](https://mastra.ai/reference/voice/deepgram) for more information on the Deepgram voice provider.
**Speechify**:
```typescript
import { Agent } from '@mastra/core/agent'
import { SpeechifyVoice } from '@mastra/voice-speechify'
import { playAudio } from '@mastra/node-audio'
const voiceAgent = new Agent({
id: 'voice-agent',
name: 'Voice Agent',
instructions: 'You are a voice assistant that can help users with their tasks.',
model: 'openai/gpt-5.4',
voice: new SpeechifyVoice(),
})
const { text } = await voiceAgent.generate('What color is the sky?')
// Convert text to speech to an Audio Stream
const audioStream = await voiceAgent.voice.speak(text, {
speaker: 'matthew', // Optional: specify a speaker
})
playAudio(audioStream)
```
Visit the [Speechify Voice Reference](https://mastra.ai/reference/voice/speechify) for more information on the Speechify voice provider.
**Sarvam**:
```typescript
import { Agent } from '@mastra/core/agent'
import { SarvamVoice } from '@mastra/voice-sarvam'
import { playAudio } from '@mastra/node-audio'
const voiceAgent = new Agent({
id: 'voice-agent',
name: 'Voice Agent',
instructions: 'You are a voice assistant that can help users with their tasks.',
model: 'openai/gpt-5.4',
voice: new SarvamVoice(),
})
const { text } = await voiceAgent.generate('What color is the sky?')
// Convert text to speech to an Audio Stream
const audioStream = await voiceAgent.voice.speak(text, {
speaker: 'default', // Optional: specify a speaker
})
playAudio(audioStream)
```
Visit the [Sarvam Voice Reference](https://mastra.ai/reference/voice/sarvam) for more information on the Sarvam voice provider.
**Murf**:
```typescript
import { Agent } from '@mastra/core/agent'
import { MurfVoice } from '@mastra/voice-murf'
import { playAudio } from '@mastra/node-audio'
const voiceAgent = new Agent({
id: 'voice-agent',
name: 'Voice Agent',
instructions: 'You are a voice assistant that can help users with their tasks.',
model: 'openai/gpt-5.4',
voice: new MurfVoice(),
})
const { text } = await voiceAgent.generate('What color is the sky?')
// Convert text to speech to an Audio Stream
const audioStream = await voiceAgent.voice.speak(text, {
speaker: 'default', // Optional: specify a speaker
})
playAudio(audioStream)
```
Visit the [Murf Voice Reference](https://mastra.ai/reference/voice/murf) for more information on the Murf voice provider.
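The TTS examples above pass the audio stream straight to `playAudio`. If you would rather save or post-process the audio, one option is to drain the stream into a single `Buffer` first. The sketch below assumes the stream behaves like a Node.js `Readable`; `streamToBuffer` is a hypothetical helper written for illustration and is not part of Mastra.

```typescript
import { Readable } from 'node:stream'

// Hypothetical helper (not part of Mastra): drain a readable stream into one Buffer.
async function streamToBuffer(stream: Readable): Promise<Buffer> {
  const chunks: Buffer[] = []
  for await (const chunk of stream) {
    // Chunks may arrive as Buffer, Uint8Array, or string depending on the source
    chunks.push(Buffer.from(chunk))
  }
  return Buffer.concat(chunks)
}
```

With a helper like this, the result of `voiceAgent.voice.speak(text)` could be collected and then written to disk with `fs.promises.writeFile`, using a file extension that matches the provider's output format.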
### Speech to Text (STT)
Transcribe spoken content using various providers like OpenAI, ElevenLabs, and more. For detailed configuration options, check out the [Speech to Text](https://mastra.ai/docs/voice/speech-to-text) guide.
You can download a sample audio file from [here](https://github.com/mastra-ai/realtime-voice-demo/raw/refs/heads/main/how_can_i_help_you.mp3).
**OpenAI**:
```typescript
import { Agent } from '@mastra/core/agent'
import { OpenAIVoice } from '@mastra/voice-openai'
import { createReadStream } from 'fs'
const voiceAgent = new Agent({
id: 'voice-agent',
name: 'Voice Agent',
instructions: 'You are a voice assistant that can help users with their tasks.',
model: 'openai/gpt-5.4',
voice: new OpenAIVoice(),
})
// Read the downloaded sample audio file from disk
const audioStream = createReadStream('./how_can_i_help_you.mp3')
// Convert audio to text
const transcript = await voiceAgent.voice.listen(audioStream)
console.log(`User said: ${transcript}`)
// Generate a response based on the transcript
const { text } = await voiceAgent.generate(transcript)
```
Visit the [OpenAI Voice Reference](https://mastra.ai/reference/voice/openai) for more information on the OpenAI voice provider.
**Azure**:
```typescript
import { Agent } from '@mastra/core/agent'
import { AzureVoice } from '@mastra/voice-azure'
import { createReadStream } from 'fs'
const voiceAgent = new Agent({
id: 'voice-agent',
name: 'Voice Agent',
instructions: 'You are a voice assistant that can help users with their tasks.',
model: 'openai/gpt-5.4',
voice: new AzureVoice(),
})
// Read the downloaded sample audio file from disk
const audioStream = createReadStream('./how_can_i_help_you.mp3')
// Convert audio to text
const transcript = await voiceAgent.voice.listen(audioStream)
console.log(`User said: ${transcript}`)
// Generate a response based on the transcript
const { text } = await voiceAgent.generate(transcript)
```
Visit the [Azure Voice Reference](https://mastra.ai/reference/voice/azure) for more information on the Azure voice provider.
**ElevenLabs**:
```typescript
import { Agent } from '@mastra/core/agent'
import { ElevenLabsVoice } from '@mastra/voice-elevenlabs'
import { createReadStream } from 'fs'
const voiceAgent = new Agent({
id: 'voice-agent',
name: 'Voice Agent',
instructions: 'You are a voice assistant that can help users with their tasks.',
model: 'openai/gpt-5.4',
voice: new ElevenLabsVoice(),
})
// Read the downloaded sample audio file from disk
const audioStream = createReadStream('./how_can_i_help_you.mp3')
// Convert audio to text
const transcript = await voiceAgent.voice.listen(audioStream)
console.log(`User said: ${transcript}`)
// Generate a response based on the transcript
const { text } = await voiceAgent.generate(transcript)
```
Visit the [ElevenLabs Voice Reference](https://mastra.ai/reference/voice/elevenlabs) for more information on the ElevenLabs voice provider.
**Google**:
```typescript
import { Agent } from '@mastra/core/agent'
import { GoogleVoice } from '@mastra/voice-google'
import { createReadStream } from 'fs'
const voiceAgent = new Agent({
id: 'voice-agent',
name: 'Voice Agent',
instructions: 'You are a voice assistant that can help users with their tasks.',
model: 'openai/gpt-5.4',
voice: new GoogleVoice(),
})
// Read the downloaded sample audio file from disk
const audioStream = createReadStream('./how_can_i_help_you.mp3')
// Convert audio to text
const transcript = await voiceAgent.voice.listen(audioStream)
console.log(`User said: ${transcript}`)
// Generate a response based on the transcript
const { text } = await voiceAgent.generate(transcript)
```
Visit the [Google Voice Reference](https://mastra.ai/reference/voice/google) for more information on the Google voice provider.
**Cloudflare**:
```typescript
import { Agent } from '@mastra/core/agent'
import { CloudflareVoice } from '@mastra/voice-cloudflare'
import { createReadStream } from 'fs'
const voiceAgent = new Agent({
id: 'voice-agent',
name: 'Voice Agent',
instructions: 'You are a voice assistant that can help users with their tasks.',
model: 'openai/gpt-5.4',
voice: new CloudflareVoice(),
})
// Read the downloaded sample audio file from disk
const audioStream = createReadStream('./how_can_i_help_you.mp3')
// Convert audio to text
const transcript = await voiceAgent.voice.listen(audioStream)
console.log(`User said: ${transcript}`)
// Generate a response based on the transcript
const { text } = await voiceAgent.generate(transcript)
```
Visit the [Cloudflare Voice Reference](https://mastra.ai/reference/voice/cloudflare) for more information on the Cloudflare voice provider.
**Deepgram**:
```typescript
import { Agent } from '@mastra/core/agent'
import { DeepgramVoice } from '@mastra/voice-deepgram'
import { createReadStream } from 'fs'
const voiceAgent = new Agent({
id: 'voice-agent',
name: 'Voice Agent',
instructions: 'You are a voice assistant that can help users with their tasks.',
model: 'openai/gpt-5.4',
voice: new DeepgramVoice(),
})
// Read the downloaded sample audio file from disk
const audioStream = createReadStream('./how_can_i_help_you.mp3')
// Convert audio to text
const transcript = await voiceAgent.voice.listen(audioStream)
console.log(`User said: ${transcript}`)
// Generate a response based on the transcript
const { text } = await voiceAgent.generate(transcript)
```
Visit the [Deepgram Voice Reference](https://mastra.ai/reference/voice/deepgram) for more information on the Deepgram voice provider.
**Sarvam**:
```typescript
import { Agent } from '@mastra/core/agent'
import { SarvamVoice } from '@mastra/voice-sarvam'
import { createReadStream } from 'fs'
const voiceAgent = new Agent({
id: 'voice-agent',
name: 'Voice Agent',
instructions: 'You are a voice assistant that can help users with their tasks.',
model: 'openai/gpt-5.4',
voice: new SarvamVoice(),
})
// Read the downloaded sample audio file from disk
const audioStream = createReadStream('./how_can_i_help_you.mp3')
// Convert audio to text
const transcript = await voiceAgent.voice.listen(audioStream)
console.log(`User said: ${transcript}`)
// Generate a response based on the transcript
const { text } = await voiceAgent.generate(transcript)
```
Visit the [Sarvam Voice Reference](https://mastra.ai/reference/voice/sarvam) for more information on the Sarvam voice provider.
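The STT examples above treat the result of `listen()` as a string. As a defensive measure, and assuming a provider might instead resolve to a text stream or to nothing at all, the transcript can be normalized before it is passed to `generate()`. `transcriptToString` below is a hypothetical helper for illustration, not a Mastra API.

```typescript
import { Readable } from 'node:stream'

// Hypothetical helper (not part of Mastra): coerce a listen() result into a plain string.
async function transcriptToString(result: string | Readable | undefined): Promise<string> {
  if (typeof result === 'string') return result
  if (result === undefined) return ''

  // Collect raw bytes first so multi-byte characters split across chunks decode correctly
  const chunks: Buffer[] = []
  for await (const chunk of result) {
    chunks.push(typeof chunk === 'string' ? Buffer.from(chunk, 'utf8') : Buffer.from(chunk))
  }
  return Buffer.concat(chunks).toString('utf8')
}
```

A call such as `await transcriptToString(await voiceAgent.voice.listen(audioStream))` would then always yield a string, whichever shape the provider returned.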
### Speech to Speech (STS)
Create conversational experiences with speech-to-speech capabilities. The unified API enables real-time voice interactions between users and AI agents. For detailed configuration options and advanced features, check out [Speech to Speech](https://mastra.ai/docs/voice/speech-to-speech).
**OpenAI**:
```typescript
import { Agent } from '@mastra/core/agent'
import { playAudio, getMicrophoneStream } from '@mastra/node-audio'
import { OpenAIRealtimeVoice } from '@mastra/voice-openai-realtime'
const voiceAgent = new Agent({
id: 'voice-agent',
name: 'Voice Agent',
instructions: 'You are a voice assistant that can help users with their tasks.',
model: 'openai/gpt-5.4',
voice: new OpenAIRealtimeVoice(),
})
// Listen for agent audio responses
voiceAgent.voice.on('speaker', ({ audio }) => {
playAudio(audio)
})
// Initiate the conversation
await voiceAgent.voice.speak('How can I help you today?')
// Send continuous audio from the microphone
const micStream = getMicrophoneStream()
await voiceAgent.voice.send(micStream)
```
Visit the [OpenAI Realtime Voice Reference](https://mastra.ai/reference/voice/openai-realtime) for more information on the OpenAI Realtime voice provider.
**Google**:
```typescript
import { Agent } from '@mastra/core/agent'
import { playAudio, getMicrophoneStream } from '@mastra/node-audio'
import { GeminiLiveVoice } from '@mastra/voice-google-gemini-live'
const voiceAgent = new Agent({
id: 'voice-agent',
name: 'Voice Agent',
instructions: 'You are a voice assistant that can help users with their tasks.',
model: 'openai/gpt-5.4',
voice: new GeminiLiveVoice({
// Live API mode
apiKey: process.env.GOOGLE_API_KEY,
model: 'gemini-2.0-flash-exp',
speaker: 'Puck',
debug: true,
// Vertex AI alternative:
// vertexAI: true,
// project: 'your-gcp-project',
// location: 'us-central1',
// serviceAccountKeyFile: '/path/to/service-account.json',
}),
})
// Connect before using speak/send
await voiceAgent.voice.connect()
// Listen for agent audio responses
voiceAgent.voice.on('speaker', ({ audio }) => {
playAudio(audio)
})
// Listen for text responses and transcriptions
voiceAgent.voice.on('writing', ({ text, role }) => {
console.log(`${role}: ${text}`)
})
// Initiate the conversation
await voiceAgent.voice.speak('How can I help you today?')
// Send continuous audio from the microphone
const micStream = getMicrophoneStream()
await voiceAgent.voice.send(micStream)
```
Visit the [Google Gemini Live Reference](https://mastra.ai/reference/voice/google-gemini-live) for more information on the Google Gemini Live voice provider.
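As a quick summary of the three subsections above, the sketch below records which capability each provider class is demonstrated with on this page. It only reflects the examples shown here and is not an authoritative feature matrix; `demonstrated` and `providersFor` are hypothetical names, so check each provider's reference before relying on a capability.

```typescript
type Capability = 'tts' | 'stt' | 'sts'

// Capabilities as demonstrated by the examples on this page (not exhaustive)
const demonstrated: Record<string, Capability[]> = {
  OpenAIVoice: ['tts', 'stt'],
  AzureVoice: ['tts', 'stt'],
  ElevenLabsVoice: ['tts', 'stt'],
  PlayAIVoice: ['tts'],
  GoogleVoice: ['tts', 'stt'],
  CloudflareVoice: ['tts', 'stt'],
  DeepgramVoice: ['tts', 'stt'],
  SpeechifyVoice: ['tts'],
  SarvamVoice: ['tts', 'stt'],
  MurfVoice: ['tts'],
  OpenAIRealtimeVoice: ['sts'],
  GeminiLiveVoice: ['sts'],
}

// List the provider class names that appear in an example for the given capability
function providersFor(capability: Capability): string[] {
  return Object.entries(demonstrated)
    .filter(([, caps]) => caps.includes(capability))
    .map(([name]) => name)
}
```

For instance, `providersFor('sts')` lists the two realtime providers shown in the speech-to-speech examples.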
## Voice configuration
Each voice provider can be configured with different models and options. Below are the detailed configuration options for all supported providers:
**OpenAI**:
```typescript
// OpenAI Voice Configuration
const voice = new OpenAIVoice({
speechModel: {
name: 'tts-1', // Example speech model name
apiKey: process.env.OPENAI_API_KEY,
language: 'en-US', // Language code
voiceType: 'neural', // Type of voice model
},
listeningModel: {
name: 'whisper-1', // Example model name
apiKey: process.env.OPENAI_API_KEY,
language: 'en-US', // Language code
format: 'wav', // Audio format
},
speaker: 'alloy', // Example speaker name
})
```
Visit the [OpenAI Voice Reference](https://mastra.ai/reference/voice/openai) for more information on the OpenAI voice provider.
**Azure**:
```typescript
// Azure Voice Configuration
const voice = new AzureVoice({
speechModel: {
name: 'en-US-JennyNeural', // Example model name
apiKey: process.env.AZURE_SPEECH_KEY,
region: process.env.AZURE_SPEECH_REGION,
language: 'en-US', // Language code
style: 'cheerful', // Voice style
pitch: '+0Hz', // Pitch adjustment
rate: '1.0', // Speech rate
},
listeningModel: {
name: 'en-US', // Example model name
apiKey: process.env.AZURE_SPEECH_KEY,
region: process.env.AZURE_SPEECH_REGION,
format: 'simple', // Output format
},
})
```
Visit the [Azure Voice Reference](https://mastra.ai/reference/voice/azure) for more information on the Azure voice provider.
**ElevenLabs**:
```typescript
// ElevenLabs Voice Configuration
const voice = new ElevenLabsVoice({
speechModel: {
voiceId: 'your-voice-id', // Example voice ID
model: 'eleven_multilingual_v2', // Example model name
apiKey: process.env.ELEVENLABS_API_KEY,
language: 'en', // Language code
emotion: 'neutral', // Emotion setting
},
// ElevenLabs may not have a separate listening model
})
```
Visit the [ElevenLabs Voice Reference](https://mastra.ai/reference/voice/elevenlabs) for more information on the ElevenLabs voice provider.
**PlayAI**:
```typescript
// PlayAI Voice Configuration
const voice = new PlayAIVoice({
speechModel: {
name: 'playai-voice', // Example model name
speaker: 'emma', // Example speaker name
apiKey: process.env.PLAYAI_API_KEY,
language: 'en-US', // Language code
speed: 1.0, // Speech speed
},
// PlayAI may not have a separate listening model
})
```
Visit the [PlayAI Voice Reference](https://mastra.ai/reference/voice/playai) for more information on the PlayAI voice provider.
**Google**:
```typescript
// Google Voice Configuration
const voice = new GoogleVoice({
speechModel: {
name: 'en-US-Studio-O', // Example model name
apiKey: process.env.GOOGLE_API_KEY,
languageCode: 'en-US', // Language code
gender: 'FEMALE', // Voice gender
speakingRate: 1.0, // Speaking rate
},
listeningModel: {
name: 'en-US', // Example model name
sampleRateHertz: 16000, // Sample rate
},
})
```
Visit the [Google Voice Reference](https://mastra.ai/reference/voice/google) for more information on the Google voice provider.
**Cloudflare**:
```typescript
// Cloudflare Voice Configuration
const voice = new CloudflareVoice({
speechModel: {
name: 'cloudflare-voice', // Example model name
accountId: process.env.CLOUDFLARE_ACCOUNT_ID,
apiToken: process.env.CLOUDFLARE_API_TOKEN,
language: 'en-US', // Language code
format: 'mp3', // Audio format
},
// Cloudflare may not have a separate listening model
})
```
Visit the [Cloudflare Voice Reference](https://mastra.ai/reference/voice/cloudflare) for more information on the Cloudflare voice provider.
**Deepgram**:
```typescript
// Deepgram Voice Configuration
const voice = new DeepgramVoice({
speechModel: {
name: 'nova-2', // Example model name
speaker: 'aura-english-us', // Example speaker name
apiKey: process.env.DEEPGRAM_API_KEY,
language: 'en-US', // Language code
tone: 'formal', // Tone setting
},
listeningModel: {
name: 'nova-2', // Example model name
format: 'flac', // Audio format
},
})
```
Visit the [Deepgram Voice Reference](https://mastra.ai/reference/voice/deepgram) for more information on the Deepgram voice provider.
**Speechify**:
```typescript
// Speechify Voice Configuration
const voice = new SpeechifyVoice({
speechModel: {
name: 'speechify-voice', // Example model name
speaker: 'matthew', // Example speaker name
apiKey: process.env.SPEECHIFY_API_KEY,
language: 'en-US', // Language code
speed: 1.0, // Speech speed
},
// Speechify may not have a separate listening model
})
```
Visit the [Speechify Voice Reference](https://mastra.ai/reference/voice/speechify) for more information on the Speechify voice provider.
**Sarvam**:
```typescript
// Sarvam Voice Configuration
const voice = new SarvamVoice({
speechModel: {
name: 'sarvam-voice', // Example model name
apiKey: process.env.SARVAM_API_KEY,
language: 'en-IN', // Language code
style: 'conversational', // Style setting
},
// Sarvam may not have a separate listening model
})
```
Visit the [Sarvam Voice Reference](https://mastra.ai/reference/voice/sarvam) for more information on the Sarvam voice provider.
**Murf**:
```typescript
// Murf Voice Configuration
const voice = new MurfVoice({
speechModel: {
name: 'murf-voice', // Example model name
apiKey: process.env.MURF_API_KEY,
language: 'en-US', // Language code
emotion: 'happy', // Emotion setting
},
// Murf may not have a separate listening model
})
```
Visit the [Murf Voice Reference](https://mastra.ai/reference/voice/murf) for more information on the Murf voice provider.
**OpenAI Realtime**:
```typescript
// OpenAI Realtime Voice Configuration
const voice = new OpenAIRealtimeVoice({
speechModel: {
name: 'gpt-3.5-turbo', // Example model name
apiKey: process.env.OPENAI_API_KEY,
language: 'en-US', // Language code
},
listeningModel: {
name: 'whisper-1', // Example model name
apiKey: process.env.OPENAI_API_KEY,
format: 'ogg', // Audio format
},
speaker: 'alloy', // Example speaker name
})
```
For more information on the OpenAI Realtime voice provider, refer to the [OpenAI Realtime Voice Reference](https://mastra.ai/reference/voice/openai-realtime).
**Google Gemini Live**:
```typescript
// Google Gemini Live Voice Configuration
const voice = new GeminiLiveVoice({
speechModel: {
name: 'gemini-2.0-flash-exp', // Example model name
apiKey: process.env.GOOGLE_API_KEY,
},
speaker: 'Puck', // Example speaker name
// Google Gemini Live is a realtime bidirectional API without separate speech and listening models
})
```
Visit the [Google Gemini Live Reference](https://mastra.ai/reference/voice/google-gemini-live) for more information on the Google Gemini Live voice provider.
**AI SDK**:
```typescript
// AI SDK Voice Configuration
import { Agent } from '@mastra/core/agent'
import { CompositeVoice } from '@mastra/core/voice'
import { openai } from '@ai-sdk/openai'
import { elevenlabs } from '@ai-sdk/elevenlabs'
// Use AI SDK models directly - no need to install separate packages
const voice = new CompositeVoice({
input: openai.transcription('whisper-1'), // AI SDK transcription
output: elevenlabs.speech('eleven_turbo_v2'), // AI SDK speech
})
// Works seamlessly with your agent
const voiceAgent = new Agent({
id: 'aisdk-voice-agent',
name: 'AI SDK Voice Agent',
instructions: 'You are a helpful assistant with voice capabilities.',
model: 'openai/gpt-5.4',
voice,
})
```
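Most of the configuration examples in this section read credentials from `process.env`. A missing variable would otherwise only surface later as a provider error, so it can help to fail fast at startup. The following is a minimal sketch; `requireEnv` is a hypothetical helper, not something Mastra provides.

```typescript
// Hypothetical helper (not part of Mastra): read a required environment variable
// and throw a descriptive error if it is unset or blank.
function requireEnv(
  name: string,
  env: Record<string, string | undefined> = process.env,
): string {
  const value = env[name]
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`)
  }
  return value
}
```

A configuration could then use `apiKey: requireEnv('OPENAI_API_KEY')` in place of `apiKey: process.env.OPENAI_API_KEY`. The optional second parameter exists only to make the helper easy to test.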
### Using Multiple Voice Providers
This example demonstrates how to create and use two different voice providers in Mastra: OpenAI for speech-to-text (STT) and PlayAI for text-to-speech (TTS).
Start by creating instances of the voice providers with any necessary configuration.
```typescript
import { OpenAIVoice } from '@mastra/voice-openai'
import { PlayAIVoice } from '@mastra/voice-playai'
import { CompositeVoice } from '@mastra/core/voice'
import { playAudio, getMicrophoneStream } from '@mastra/node-audio'
// Initialize OpenAI voice for STT
const input = new OpenAIVoice({
listeningModel: {
name: 'whisper-1',
apiKey: process.env.OPENAI_API_KEY,
},
})
// Initialize PlayAI voice for TTS
const output = new PlayAIVoice({
speechModel: {
name: 'playai-voice',
apiKey: process.env.PLAYAI_API_KEY,
},
})
// Combine the providers using CompositeVoice
const voice = new CompositeVoice({
input,
output,
})
// Implement voice interactions using the combined voice provider
const audioStream = getMicrophoneStream() // Assume this function gets audio input
const transcript = await voice.listen(audioStream)
// Log the transcribed text
console.log('Transcribed text:', transcript)
// Convert text to speech
const responseAudio = await voice.speak(`You said: ${transcript}`, {
speaker: 'default', // Optional: specify a speaker
responseFormat: 'wav', // Optional: specify a response format
})
// Play the audio response
playAudio(responseAudio)
```
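Conceptually, the combined provider in the example above routes `listen()` to the `input` provider and `speak()` to the `output` provider. The sketch below illustrates that delegation idea with stand-in providers so it runs without API keys. It is not `CompositeVoice`'s actual implementation, and `SttProvider`, `TtsProvider`, and `TinyCompositeVoice` are hypothetical names.

```typescript
import { Readable } from 'node:stream'

// Minimal, hypothetical interfaces for illustration; Mastra's real types are richer
interface SttProvider {
  listen(audio: Readable): Promise<string>
}
interface TtsProvider {
  speak(text: string): Promise<Readable>
}

// Routes listen() to one provider and speak() to another
class TinyCompositeVoice {
  private readonly input: SttProvider
  private readonly output: TtsProvider

  constructor(parts: { input: SttProvider; output: TtsProvider }) {
    this.input = parts.input
    this.output = parts.output
  }

  listen(audio: Readable): Promise<string> {
    return this.input.listen(audio)
  }

  speak(text: string): Promise<Readable> {
    return this.output.speak(text)
  }
}

// Stand-in providers: a fixed transcript, and "audio" that is just the text bytes
const fakeStt: SttProvider = { listen: async () => 'hello' }
const fakeTts: TtsProvider = { speak: async (text) => Readable.from([Buffer.from(text)]) }

const tinyVoice = new TinyCompositeVoice({ input: fakeStt, output: fakeTts })
```

Because each side is only reached through its own method, either provider can be swapped without touching the calling code, which is the property the `CompositeVoice` example relies on.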
### Using AI SDK Model Providers
You can also use AI SDK models directly with `CompositeVoice`:
```typescript
import { CompositeVoice } from '@mastra/core/voice'
import { openai } from '@ai-sdk/openai'
import { elevenlabs } from '@ai-sdk/elevenlabs'
import { playAudio, getMicrophoneStream } from '@mastra/node-audio'
// Use AI SDK models directly - no provider setup needed
const voice = new CompositeVoice({
input: openai.transcription('whisper-1'), // AI SDK transcription
output: elevenlabs.speech('eleven_turbo_v2'), // AI SDK speech
})
// Works the same way as Mastra providers
const audioStream = getMicrophoneStream()
const transcript = await voice.listen(audioStream)
console.log('Transcribed text:', transcript)
// Convert text to speech
const responseAudio = await voice.speak(`You said: ${transcript}`, {
speaker: 'Rachel', // ElevenLabs voice
})
playAudio(responseAudio)
```
You can also mix AI SDK models with Mastra providers:
```typescript
import { CompositeVoice } from '@mastra/core/voice'
import { PlayAIVoice } from '@mastra/voice-playai'
import { groq } from '@ai-sdk/groq'
const voice = new CompositeVoice({
input: groq.transcription('whisper-large-v3'), // AI SDK for STT
output: new PlayAIVoice(), // Mastra provider for TTS
})
```
For more information on the CompositeVoice, refer to the [CompositeVoice Reference](https://mastra.ai/reference/voice/composite-voice).
## More resources
- [CompositeVoice](https://mastra.ai/reference/voice/composite-voice)
- [MastraVoice](https://mastra.ai/reference/voice/mastra-voice)
- [OpenAI Voice](https://mastra.ai/reference/voice/openai)
- [OpenAI Realtime Voice](https://mastra.ai/reference/voice/openai-realtime)
- [Azure Voice](https://mastra.ai/reference/voice/azure)
- [Google Voice](https://mastra.ai/reference/voice/google)
- [Google Gemini Live Voice](https://mastra.ai/reference/voice/google-gemini-live)
- [Deepgram Voice](https://mastra.ai/reference/voice/deepgram)
- [PlayAI Voice](https://mastra.ai/reference/voice/playai)
- [Voice Examples](https://github.com/mastra-ai/voice-examples)