# 🎙️ MBZ Voice SDK

> **Speak. Think. Respond. Seamlessly.**

MBZ-Voice-SDK is a powerful developer tool that enables you to integrate voice input, AI understanding (via Gemini), and spoken responses into any modern web app. Whether you're building a chatbot, AI assistant, or a voice-powered UI, this SDK makes it plug-and-play.

---

## 📋 Table of Contents

- [Features](#-features)
- [Requirements](#-requirements)
- [Installation](#-install-the-sdk)
- [Backend Setup](#️-backend-setup-guide)
- [Usage Examples](#-sdk-usage-example)
- [API Documentation](#-api-documentation)
- [Troubleshooting](#-troubleshooting)
- [Contributing](#-contributing)
- [Security Notice](#-security-notice)
- [Tools Used](#-tools-used)
- [License](#-license)
- [Support](#-support)

---

## 🔥 Features

- ✅ **Voice Input**: Capture user speech via browser microphone using Web Speech API
- ✅ **AI Processing**: Gemini-powered AI backend built with FastAPI
- ✅ **Voice Response**: Convert AI text responses to spoken words using Web Speech TTS
- ✅ **Audio Controls**: Easily toggle mute/unmute functionality
- ✅ **Conversation Memory**: Store the last 3 Q&A exchanges using localStorage
- ✅ **Framework Agnostic**: Seamlessly integrate with plain JavaScript, React, Vue, or any modern frontend framework
- ✅ **Customizable**: Configure language, voice type, and response behavior
- ✅ **Lightweight**: Minimal dependencies for optimal performance

## 💻 Requirements

- Modern web browser with support for:
  - Web Speech API (SpeechRecognition)
  - Web Speech API (SpeechSynthesis)
  - localStorage
- Node.js 14+ (for development)
- Python 3.8+ (for backend)
- Gemini API key from Google AI Studio

## 📦 Install the SDK

### NPM Installation

After publishing on npm:

```bash
npx mbz-voice-sdk init
```

### Yarn Installation

```shellscript
yarn add mbz-voice-sdk
```

### Local Installation (if cloned)

```shellscript
cd mbz-voice-sdk/sdk
npm install
```

### CDN Usage

```html
<script src="https://unpkg.com/mbz-voice-sdk@latest/dist/mbz-voice-sdk.min.js"></script>
```

## ⚙️ Backend Setup Guide

This SDK requires a backend API endpoint connected to Gemini (Google AI). We've provided a ready-to-use FastAPI backend in the `/backend` folder.

### 1️⃣ Navigate to the backend directory

```shellscript
cd ../backend
```

### 2️⃣ Install Python dependencies

```shellscript
pip install -r requirements.txt
```

### 3️⃣ Add Your Gemini API Key

Create a `.env` file in the backend folder and paste your Gemini API key:

```plaintext
GEMINI_API_KEY=your_google_gemini_api_key_here
```

👉 Get your key from: [https://makersuite.google.com/app/apikey](https://makersuite.google.com/app/apikey)

### 4️⃣ Run the server

```shellscript
uvicorn main:app --reload
```

Now your backend is live at:

```plaintext
http://localhost:8000/ask
```

## 🧠 SDK Usage Example

### Basic Usage

```javascript
import { MBZVoiceAgent } from "mbz-voice-sdk";

const agent = new MBZVoiceAgent({
  apiUrl: "http://localhost:8000/ask",
  lang: "en-US",
  speak: true
});

agent.onTranscript((text) => {
  console.log("User said:", text);
});

agent.onResponse((reply) => {
  console.log("AI replied:", reply);
});

document.getElementById("start-btn").onclick = () => agent.listen();
```

### React Integration

```javascriptreact
import React, { useEffect, useState } from 'react';
import { MBZVoiceAgent } from 'mbz-voice-sdk';

function VoiceAssistant() {
  const [transcript, setTranscript] = useState('');
  const [response, setResponse] = useState('');
  const [isListening, setIsListening] = useState(false);
  const [agent, setAgent] = useState(null);

  useEffect(() => {
    // Initialize the agent
    const voiceAgent = new MBZVoiceAgent({
      apiUrl: "http://localhost:8000/ask",
      lang: "en-US",
      speak: true
    });

    // Set up event handlers
    voiceAgent.onTranscript((text) => {
      setTranscript(text);
    });

    voiceAgent.onResponse((reply) => {
      setResponse(reply);
    });

    voiceAgent.onListeningChange((listening) => {
      setIsListening(listening);
    });

    setAgent(voiceAgent);

    // Cleanup on unmount
    return () => {
      voiceAgent.cleanup();
    };
  }, []);

  const handleListen = () => {
    if (agent) {
      agent.listen();
    }
  };

  return (
    <div className="voice-assistant">
      <button
        onClick={handleListen}
        className={isListening ? 'listening' : ''}
      >
        {isListening ? '🔴 Listening...' : '🎙️ Start Talking'}
      </button>

      {transcript && (
        <div className="transcript">
          <h3>You said:</h3>
          <p>{transcript}</p>
        </div>
      )}

      {response && (
        <div className="response">
          <h3>AI response:</h3>
          <p>{response}</p>
        </div>
      )}
    </div>
  );
}

export default VoiceAssistant;
```

## 🧪 HTML Quick Test

```html
<button id="start-btn">🎙️ Start Talking</button>
<div id="transcript"></div>
<div id="response"></div>

<!-- The bare specifier 'mbz-voice-sdk' only resolves in the browser through a bundler or an import map. -->
<script type="module">
  import { MBZVoiceAgent } from 'mbz-voice-sdk';

  const agent = new MBZVoiceAgent({
    apiUrl: 'http://localhost:8000/ask',
    speak: true
  });

  const transcriptEl = document.getElementById('transcript');
  const responseEl = document.getElementById('response');

  // Show the recognized speech on the page
  agent.onTranscript(text => {
    console.log("🎤", text);
    transcriptEl.textContent = `You said: ${text}`;
  });

  // Show the AI reply on the page
  agent.onResponse(reply => {
    console.log("🤖", reply);
    responseEl.textContent = `AI says: ${reply}`;
  });

  document.getElementById("start-btn").onclick = () => agent.listen();
</script>
```

## 📚 API Documentation

### `MBZVoiceAgent` Class

The main class for interacting with the SDK.

#### Constructor

```javascript
const agent = new MBZVoiceAgent(options);
```

#### Options

| Option | Type | Default | Description |
|-----|-----|-----|-----|
| `apiUrl` | String | Required | The URL of your backend API endpoint |
| `lang` | String | 'en-US' | The language for speech recognition |
| `speak` | Boolean | true | Whether to speak the AI's response |
| `voiceIndex` | Number | 0 | Index of the voice to use for speech synthesis |
| `pitch` | Number | 1.0 | The pitch of the voice (0.1 to 2.0) |
| `rate` | Number | 1.0 | The speed of the voice (0.1 to 10.0) |
| `volume` | Number | 1.0 | The volume of the voice (0.0 to 1.0) |
| `maxHistory` | Number | 3 | Maximum number of Q&A pairs to store in history |

#### Methods

| Method | Parameters | Description |
|-----|-----|-----|
| `listen()` | None | Start listening for voice input |
| `stop()` | None | Stop listening for voice input |
| `mute()` | None | Mute the voice response |
| `unmute()` | None | Unmute the voice response |
| `cleanup()` | None | Clean up resources and event listeners |
| `onTranscript(callback)` | Function | Set callback for transcript events |
| `onResponse(callback)` | Function | Set callback for AI response events |
| `onListeningChange(callback)` | Function | Set callback for listening state changes |
| `onError(callback)` | Function | Set callback for error events |
| `getHistory()` | None | Get the conversation history |
| `clearHistory()` | None | Clear the conversation history |

## 🔧 Troubleshooting

### Microphone Not Working

- Ensure your browser has permission to access the microphone
- Check if your microphone is properly connected and working
- Try using a different browser (Chrome and Edge have the best support)

### Speech Recognition Not Starting

- Make sure you're using a supported browser (Chrome, Edge, Safari)
- Check your internet connection
- Verify that your site is served over HTTPS (required for production)

### Backend Connection Issues

- Confirm your backend server is running
- Check for CORS issues
  (the backend should allow requests from your frontend)
- Verify your API URL is correct in the SDK initialization

### Voice Response Not Working

- Check if your device's volume is turned on
- Make sure the `speak` option is set to `true`
- Try using a different voice by changing the `voiceIndex`

## 🤝 Contributing

Contributions are welcome! Here's how you can help:

1. **Fork the repository**
2. **Create a feature branch**:

   ```shellscript
   git checkout -b feature/amazing-feature
   ```

3. **Commit your changes**:

   ```shellscript
   git commit -m 'Add some amazing feature'
   ```

4. **Push to the branch**:

   ```shellscript
   git push origin feature/amazing-feature
   ```

5. **Open a Pull Request**

### Development Setup

```shellscript
# Clone the repository
git clone https://github.com/ProMBZ/mbz-voice-sdk.git

# Install dependencies
cd mbz-voice-sdk
npm install

# Run development server
npm run dev

# Build for production
npm run build
```

## 🔐 Security Notice

This SDK does not use any built-in Gemini key.

🔐 You are responsible for adding your own Gemini key to the backend. Never include your Gemini key in frontend code.

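
One way to enforce the rule above during development is a small guard that rejects option objects carrying credential-looking fields before they reach the browser bundle. The sketch below is illustrative only: `findSecretLikeOptions`, `assertNoSecrets`, and the name pattern are hypothetical helpers, not part of mbz-voice-sdk, and a name-based check cannot catch every leaked secret.

```javascript
// Field names that suggest a credential. This pattern is an illustrative
// assumption; adjust it to your own naming conventions.
const SECRET_NAME_PATTERN = /(api[_-]?key|secret|token)/i;

// Hypothetical helper (not part of mbz-voice-sdk): returns the names of
// option fields that look like they hold a credential.
function findSecretLikeOptions(options) {
  return Object.keys(options).filter((name) => SECRET_NAME_PATTERN.test(name));
}

// Hypothetical guard: throws if any credential-like field is present,
// otherwise returns the options unchanged.
function assertNoSecrets(options) {
  const suspicious = findSecretLikeOptions(options);
  if (suspicious.length > 0) {
    throw new Error(
      `Credential-like option(s) found: ${suspicious.join(", ")}. ` +
        "Keep the Gemini key in the backend .env file instead."
    );
  }
  return options;
}

// The frontend only needs the backend URL and presentation settings.
const safeOptions = assertNoSecrets({
  apiUrl: "http://localhost:8000/ask",
  lang: "en-US",
  speak: true,
});
```

The checked object can then be passed to `new MBZVoiceAgent(safeOptions)`. The key itself stays in the backend `.env` file described in the Backend Setup Guide.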
## 🧰 Tools Used

- **Frontend**:
  - JavaScript (SpeechRecognition + TTS APIs)
  - localStorage for conversation persistence
  - Rollup for bundling
- **Backend**:
  - FastAPI (Python)
  - Google Generative AI SDK (Gemini 1.5 Flash)
  - Python-dotenv for environment variables

## 📄 License

MIT © 2025. Developed by Muhammad (MBZ-Voice-SDK)

🔗 GitHub: @ProMBZ

## 💬 Support

If you have questions, suggestions, or want to collaborate:

- 📧 Email: [muhammadzohaib1415@gmail.com](mailto:muhammadzohaib1415@gmail.com)
- 🌐 Portfolio: [https://kzml8bqhnxp4cn0duf08.lite.vusercontent.net/](https://kzml8bqhnxp4cn0duf08.lite.vusercontent.net/)

---

Made with ❤️ by Muhammad