# 🎙️ MBZ Voice SDK
> **Speak. Think. Respond. Seamlessly.**
MBZ-Voice-SDK is a developer tool for integrating voice input, AI understanding (via Gemini), and spoken responses into any modern web app. Whether you're building a chatbot, an AI assistant, or a voice-powered UI, this SDK aims to make it plug-and-play.
---
## 📋 Table of Contents
- [Features](#-features)
- [Requirements](#-requirements)
- [Installation](#-install-the-sdk)
- [Backend Setup](#️-backend-setup-guide)
- [Usage Examples](#-sdk-usage-example)
- [API Documentation](#-api-documentation)
- [Troubleshooting](#-troubleshooting)
- [Contributing](#-contributing)
- [Security Notice](#-security-notice)
- [Tools Used](#-tools-used)
- [License](#-license)
- [Support](#-support)
---
## 🔥 Features
- ✅ **Voice Input**: Capture user speech via the browser microphone using the Web Speech API
- ✅ **AI Processing**: Gemini-powered AI backend built with FastAPI
- ✅ **Voice Response**: Convert AI text responses to spoken words using Web Speech TTS
- ✅ **Audio Controls**: Easily toggle mute/unmute functionality
- ✅ **Conversation Memory**: Store the last 3 Q&A exchanges using localStorage
- ✅ **Framework Agnostic**: Integrate with plain JavaScript, React, Vue, or any modern frontend framework
- ✅ **Customizable**: Configure language, voice type, and response behavior
- ✅ **Lightweight**: Minimal dependencies for optimal performance
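The conversation memory feature can be pictured as a small bounded list persisted under a single storage key. The sketch below is illustrative only: the key name `mbz_history`, the helper names, and the `{ question, answer }` entry shape are assumptions, not the SDK's actual internals. The helpers accept any object exposing `getItem`/`setItem`, so `window.localStorage` can be passed in the browser.

```javascript
// Hypothetical helpers: not part of the SDK's public API.
const HISTORY_KEY = "mbz_history"; // assumed key name

// Read the stored history, falling back to an empty list on missing or corrupt data.
function loadHistory(storage, key = HISTORY_KEY) {
  try {
    const parsed = JSON.parse(storage.getItem(key) ?? "[]");
    return Array.isArray(parsed) ? parsed : [];
  } catch {
    return [];
  }
}

// Append one Q&A pair and keep only the newest `max` entries.
function rememberExchange(storage, question, answer, max = 3, key = HISTORY_KEY) {
  const all = [...loadHistory(storage, key), { question, answer }];
  const history = max > 0 ? all.slice(-max) : [];
  storage.setItem(key, JSON.stringify(history));
  return history;
}

// Minimal in-memory stand-in for localStorage, handy outside the browser.
function createMemoryStorage() {
  const data = new Map();
  return {
    getItem: (k) => (data.has(k) ? data.get(k) : null),
    setItem: (k, v) => data.set(k, String(v)),
  };
}
```

In a page, `rememberExchange(window.localStorage, question, answer)` would keep the three most recent exchanges across reloads.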
## 💻 Requirements
- Modern web browser with support for:
  - Web Speech API (SpeechRecognition)
  - Web Speech API (SpeechSynthesis)
  - localStorage
- Node.js 14+ (for development)
- Python 3.8+ (for backend)
- Gemini API key from Google AI Studio
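The browser requirements above can be checked at runtime before an agent is created. This is a minimal sketch; `detectVoiceSupport` is a hypothetical helper name and is not exported by the SDK. It takes the global scope as a parameter so it can be exercised outside the browser.

```javascript
// Hypothetical helper: reports which required browser features exist on `scope`.
// Pass `window` in the browser; any plain object works for testing.
function detectVoiceSupport(scope) {
  return {
    // Chromium-based browsers and Safari expose the prefixed constructor.
    recognition: Boolean(scope.SpeechRecognition || scope.webkitSpeechRecognition),
    synthesis: "speechSynthesis" in scope,
    storage: "localStorage" in scope,
  };
}
```

A page might call `detectVoiceSupport(window)` and hide the microphone button when `recognition` is `false`.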
## 📦 Install the SDK
### NPM Installation
After publishing on npm:
```bash
npx mbz-voice-sdk init
```
### Yarn Installation
```shellscript
yarn add mbz-voice-sdk
```
### Local Installation (if cloned)
```shellscript
cd mbz-voice-sdk/sdk
npm install
```
### CDN Usage
```html
<script src="https://unpkg.com/mbz-voice-sdk@latest/dist/mbz-voice-sdk.min.js"></script>
```
## ⚙️ Backend Setup Guide
This SDK requires a backend API endpoint connected to Gemini (Google AI). We've provided a ready-to-use FastAPI backend in the `/backend` folder.
### 1️⃣ Navigate to the backend directory
```shellscript
cd ../backend
```
### 2️⃣ Install Python dependencies
```shellscript
pip install -r requirements.txt
```
### 3️⃣ Add Your Gemini API Key
Create a `.env` file in the backend folder and paste your Gemini API key:
```plaintext
GEMINI_API_KEY=your_google_gemini_api_key_here
```
🔑 Get your key from: [https://makersuite.google.com/app/apikey](https://makersuite.google.com/app/apikey)
### 4️⃣ Run the server
```shellscript
uvicorn main:app --reload
```
Now your backend is live at:
```plaintext
http://localhost:8000/ask
```
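To confirm the server responds before wiring up the SDK, the endpoint can be called directly. The sketch below is a guess at the contract, not a description of it: the request body `{ question }` and the response field `answer` are assumptions, so check `main.py` in the backend folder for the real field names. `askBackend` is a hypothetical helper, and its `fetchImpl` parameter exists only so a stub can be injected.

```javascript
// ASSUMPTION: the endpoint accepts POST { question } and returns JSON { answer }.
// The real field names may differ; verify them against the backend code.
async function askBackend(apiUrl, question, fetchImpl = fetch) {
  const res = await fetchImpl(apiUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ question }),
  });
  if (!res.ok) {
    // Surface HTTP failures (404, 500, ...) instead of parsing an error page.
    throw new Error(`Backend responded with HTTP ${res.status}`);
  }
  const data = await res.json();
  return data.answer;
}
```

For example, `askBackend("http://localhost:8000/ask", "Hello").then(console.log)` would print the reply if the assumed field names match.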
## 🧠 SDK Usage Example
### Basic Usage
```javascript
import { MBZVoiceAgent } from "mbz-voice-sdk";

const agent = new MBZVoiceAgent({
  apiUrl: "http://localhost:8000/ask",
  lang: "en-US",
  speak: true
});

agent.onTranscript((text) => {
  console.log("User said:", text);
});

agent.onResponse((reply) => {
  console.log("AI replied:", reply);
});

document.getElementById("start-btn").onclick = () => agent.listen();
```
### React Integration
```javascriptreact
import React, { useEffect, useState } from 'react';
import { MBZVoiceAgent } from 'mbz-voice-sdk';

function VoiceAssistant() {
  const [transcript, setTranscript] = useState('');
  const [response, setResponse] = useState('');
  const [isListening, setIsListening] = useState(false);
  const [agent, setAgent] = useState(null);

  useEffect(() => {
    // Initialize the agent
    const voiceAgent = new MBZVoiceAgent({
      apiUrl: "http://localhost:8000/ask",
      lang: "en-US",
      speak: true
    });

    // Set up event handlers
    voiceAgent.onTranscript((text) => {
      setTranscript(text);
    });
    voiceAgent.onResponse((reply) => {
      setResponse(reply);
    });
    voiceAgent.onListeningChange((listening) => {
      setIsListening(listening);
    });

    setAgent(voiceAgent);

    // Cleanup on unmount
    return () => {
      voiceAgent.cleanup();
    };
  }, []);

  const handleListen = () => {
    if (agent) {
      agent.listen();
    }
  };

  return (
    <div className="voice-assistant">
      <button
        onClick={handleListen}
        className={isListening ? 'listening' : ''}
      >
        {isListening ? '🔴 Listening...' : '🎙️ Start Talking'}
      </button>
      {transcript && (
        <div className="transcript">
          <h3>You said:</h3>
          <p>{transcript}</p>
        </div>
      )}
      {response && (
        <div className="response">
          <h3>AI response:</h3>
          <p>{response}</p>
        </div>
      )}
    </div>
  );
}

export default VoiceAssistant;
```
## 🧪 HTML Quick Test
```html
<button id="start-btn">🎙️ Start Talking</button>
<div id="transcript"></div>
<div id="response"></div>
<script type="module">
  // Note: a bare specifier like 'mbz-voice-sdk' only resolves in the browser
  // through a bundler or an import map.
  import { MBZVoiceAgent } from 'mbz-voice-sdk';

  const agent = new MBZVoiceAgent({
    apiUrl: 'http://localhost:8000/ask',
    speak: true
  });

  const transcriptEl = document.getElementById('transcript');
  const responseEl = document.getElementById('response');

  agent.onTranscript(text => {
    console.log("🎤", text);
    transcriptEl.textContent = `You said: ${text}`;
  });

  agent.onResponse(reply => {
    console.log("🤖", reply);
    responseEl.textContent = `AI says: ${reply}`;
  });

  document.getElementById("start-btn").onclick = () => agent.listen();
</script>
```
## 📚 API Documentation
### `MBZVoiceAgent` Class
The main class for interacting with the SDK.
#### Constructor
```javascript
const agent = new MBZVoiceAgent(options);
```
#### Options
| Option | Type | Default | Description
|-----|-----|-----|-----
| `apiUrl` | String | Required | The URL of your backend API endpoint
| `lang` | String | 'en-US' | The language for speech recognition
| `speak` | Boolean | true | Whether to speak the AI's response
| `voiceIndex` | Number | 0 | Index of the voice to use for speech synthesis
| `pitch` | Number | 1.0 | The pitch of the voice (0.1 to 2.0)
| `rate` | Number | 1.0 | The speed of the voice (0.1 to 10.0)
| `volume` | Number | 1.0 | The volume of the voice (0.0 to 1.0)
| `maxHistory` | Number | 3 | Maximum number of Q&A pairs to store in history
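The defaults and ranges in the table can be expressed as a small normalization step. This is a sketch only: `normalizeOptions` is a hypothetical helper, and whether the SDK itself clamps out-of-range values or rejects them is not stated in this document.

```javascript
// Defaults and ranges copied from the options table; the helper itself is hypothetical.
const DEFAULTS = {
  lang: "en-US", speak: true, voiceIndex: 0,
  pitch: 1.0, rate: 1.0, volume: 1.0, maxHistory: 3,
};
const RANGES = { pitch: [0.1, 2.0], rate: [0.1, 10.0], volume: [0.0, 1.0] };

function clamp(value, [min, max]) {
  return Math.min(max, Math.max(min, value));
}

// Merge user options over the defaults and clamp numeric voice settings.
function normalizeOptions(options = {}) {
  if (typeof options.apiUrl !== "string" || options.apiUrl === "") {
    throw new TypeError("apiUrl is required");
  }
  const merged = { ...DEFAULTS, ...options };
  for (const [name, range] of Object.entries(RANGES)) {
    const n = Number(merged[name]);
    // Non-numeric input falls back to the documented default.
    merged[name] = Number.isFinite(n) ? clamp(n, range) : DEFAULTS[name];
  }
  return merged;
}
```

Clamping is one possible policy; throwing on invalid values would surface configuration mistakes earlier.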
#### Methods
| Method | Parameters | Description
|-----|-----|-----
| `listen()` | None | Start listening for voice input
| `stop()` | None | Stop listening for voice input
| `mute()` | None | Mute the voice response
| `unmute()` | None | Unmute the voice response
| `cleanup()` | None | Clean up resources and event listeners
| `onTranscript(callback)` | Function | Set callback for transcript events
| `onResponse(callback)` | Function | Set callback for AI response events
| `onListeningChange(callback)` | Function | Set callback for listening state changes
| `onError(callback)` | Function | Set callback for error events
| `getHistory()` | None | Get the conversation history
| `clearHistory()` | None | Clear the conversation history
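The callback setters in the table can be wired up in one place. The sketch below uses a hypothetical helper, `attachLogging`, that works with any object exposing the documented methods. The shape of the value passed to the `onError` callback is not documented here, so the code assumes it is either an `Error` or a string, and stopping on error is a design choice of this example rather than SDK behavior.

```javascript
// Hypothetical helper: registers the documented callbacks on an agent-like object.
// `log` receives short status strings; in a page it might update the DOM instead.
function attachLogging(agent, log = console.log) {
  agent.onTranscript((text) => log(`user: ${text}`));
  agent.onResponse((reply) => log(`ai: ${reply}`));
  agent.onListeningChange((listening) => log(listening ? "listening" : "idle"));
  agent.onError((error) => {
    // ASSUMPTION: `error` is an Error or a string; its exact shape is not documented.
    log(`error: ${error?.message ?? error}`);
    // Stop listening so the UI does not stay stuck in a listening state.
    agent.stop();
  });
}
```

With the SDK installed, `attachLogging(new MBZVoiceAgent({ apiUrl: "http://localhost:8000/ask" }))` would log each event to the console.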
## 🔧 Troubleshooting
### Microphone Not Working
- Ensure your browser has permission to access the microphone
- Check if your microphone is properly connected and working
- Try using a different browser (Chrome and Edge have the best support)
### Speech Recognition Not Starting
- Make sure you're using a supported browser (Chrome, Edge, Safari)
- Check your internet connection (Chrome's speech recognition relies on a server-based engine and does not work offline)
- Verify that your site is served over HTTPS (required for production)
### Backend Connection Issues
- Confirm your backend server is running
- Check for CORS issues (the backend should allow requests from your frontend)
- Verify your API URL is correct in the SDK initialization
### Voice Response Not Working
- Check if your device's volume is turned on
- Make sure the `speak` option is set to `true`
- Try using a different voice by changing the `voiceIndex`
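When experimenting with `voiceIndex`, keep in mind that the list of voices differs per browser and operating system, so an index that works on one machine may be out of range on another. A minimal sketch of a defensive lookup follows; `pickVoice` is a hypothetical helper, not an SDK function.

```javascript
// Hypothetical helper: return the voice at `voiceIndex`, or the first
// available voice when the index is out of range. Returns null for an empty list.
function pickVoice(voices, voiceIndex = 0) {
  if (!Array.isArray(voices) || voices.length === 0) return null;
  return voices[voiceIndex] ?? voices[0];
}

// Browser usage (commented out so the snippet also runs outside the browser).
// Voices often load asynchronously, so wait for the `voiceschanged` event:
// speechSynthesis.addEventListener("voiceschanged", () => {
//   const voice = pickVoice(speechSynthesis.getVoices(), 2);
//   console.log(voice ? voice.name : "no voices installed");
// });
```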
## 🤝 Contributing
Contributions are welcome! Here's how you can help:
1. **Fork the repository**
2. **Create a feature branch**:
```shellscript
git checkout -b feature/amazing-feature
```
3. **Commit your changes**:
```shellscript
git commit -m 'Add some amazing feature'
```
4. **Push to the branch**:
```shellscript
git push origin feature/amazing-feature
```
5. **Open a Pull Request**
### Development Setup
```shellscript
# Clone the repository
git clone https://github.com/ProMBZ/mbz-voice-sdk.git
# Install dependencies
cd mbz-voice-sdk
npm install
# Run development server
npm run dev
# Build for production
npm run build
```
## 🔐 Security Notice
This SDK does not ship with a built-in Gemini key.
🔑 You are responsible for adding your own Gemini key to the backend.
Never include your Gemini key in frontend code.
## 🧰 Tools Used
- **Frontend**:
  - JavaScript (SpeechRecognition + TTS APIs)
  - localStorage for conversation persistence
  - Rollup for bundling
- **Backend**:
  - FastAPI (Python)
  - Google Generative AI SDK (Gemini 1.5 Flash)
  - Python-dotenv for environment variables
## 📄 License
MIT © 2025. Developed by Muhammad (MBZ-Voice-SDK).
🔗 GitHub: @ProMBZ
## 💬 Support
If you have questions, suggestions, or want to collaborate:
- 📧 Email: [muhammadzohaib1415@gmail.com](mailto:muhammadzohaib1415@gmail.com)
- 🌐 Portfolio: [https://kzml8bqhnxp4cn0duf08.lite.vusercontent.net/](https://kzml8bqhnxp4cn0duf08.lite.vusercontent.net/)
---
Made with ❤️ by Muhammad