@namastexlabs/speak
Open source voice dictation for everyone
# Mission
## Pitch
Speak is a voice-to-text dictation application that works seamlessly across all your apps. Hold Ctrl+Win to dictate anywhere: in emails, messages, docs, or any other application. It is an open-source clone inspired by Flow, with a focus on accuracy, speed, and universal compatibility.
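The hold-to-dictate interaction is a push-to-talk key chord. Below is a minimal sketch of that state machine. It assumes the app receives separate key-down and key-up events for Ctrl and Win from some OS-level global keyboard hook, which is not shown. The `PushToTalk` class and its handler names are hypothetical and do not come from Speak's actual code.

```typescript
// Push-to-talk sketch: dictation runs only while BOTH Ctrl and Win are held.
// Hypothetical names: "Ctrl"/"Win" key labels and the onStart/onStop handlers.
// A real app would feed keyDown/keyUp from an OS-level global keyboard hook.

type ChordKey = "Ctrl" | "Win";

interface PushToTalkHandlers {
  onStart: () => void; // e.g. open the microphone and show the recording indicator
  onStop: () => void; // e.g. stop capture, transcribe, insert text at the cursor
}

class PushToTalk {
  private readonly held = new Set<ChordKey>();
  private recording = false;
  private readonly handlers: PushToTalkHandlers;

  constructor(handlers: PushToTalkHandlers) {
    this.handlers = handlers;
  }

  keyDown(key: ChordKey): void {
    this.held.add(key);
    // Start once the full chord is down; OS key auto-repeat cannot restart it.
    if (!this.recording && this.held.has("Ctrl") && this.held.has("Win")) {
      this.recording = true;
      this.handlers.onStart();
    }
  }

  keyUp(key: ChordKey): void {
    this.held.delete(key);
    // Releasing either key breaks the chord and ends the dictation.
    if (this.recording) {
      this.recording = false;
      this.handlers.onStop();
    }
  }

  get isRecording(): boolean {
    return this.recording;
  }
}

// Usage: simulate pressing and releasing the chord.
const events: string[] = [];
const ptt = new PushToTalk({
  onStart: () => events.push("start"),
  onStop: () => events.push("stop"),
});
ptt.keyDown("Ctrl");
ptt.keyDown("Win"); // chord complete: recording starts
ptt.keyDown("Win"); // auto-repeat while held: ignored
ptt.keyUp("Win"); // chord broken: recording stops
// events is now ["start", "stop"]
```

The sketch tracks the held keys in a set rather than toggling state on each event. This keeps auto-repeated key-down events from restarting a recording, and it means that pressing only one key of the chord does nothing.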
## Users
- **Knowledge Workers**: Professionals who write frequently (emails, documents, reports)
- **Content Creators**: Writers, bloggers, journalists who need fast text input
- **Accessibility Users**: People with mobility challenges or repetitive strain injury (RSI) who prefer voice input
- **Multilingual Users**: People working across multiple languages who need accurate transcription
- **Productivity Enthusiasts**: Users seeking to optimize their workflow with voice input
## Problem
Typing is slower than speaking. Most voice dictation tools:
- Are limited to specific applications
- Require context switching
- Have poor accuracy with technical terms or multilingual content
- Lack offline capabilities
- Don't integrate seamlessly into existing workflows
## Solution
Speak provides:
- **Universal Compatibility**: Works in any application via global hotkey (Ctrl+Win)
- **High Accuracy**: Powered by OpenAI's gpt-4o-transcribe model
- **Seamless Integration**: Direct text insertion at the cursor position
- **Multi-language Support**: Supports 50+ languages out of the box
- **Privacy-Focused**: Users control their own data and API keys
- **Extensible**: Open-source architecture allowing customization
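Below is a sketch of how the transcription step could assemble a request to OpenAI's audio transcription endpoint (`POST /v1/audio/transcriptions`) for `gpt-4o-transcribe`. It assumes a runtime with global `FormData` and `Blob`, such as Node 18+ or Electron. `buildTranscriptionRequest` and its option names are hypothetical. The multipart fields `file`, `model`, `language`, and `prompt` are parameters of OpenAI's endpoint. The API key is supplied by the user, which matches the privacy point above.

```typescript
// Sketch: build (but do not send) a request to OpenAI's transcription endpoint.
// buildTranscriptionRequest and TranscriptionOptions are hypothetical names.

const TRANSCRIPTIONS_URL = "https://api.openai.com/v1/audio/transcriptions";

interface TranscriptionOptions {
  apiKey: string; // user-supplied OpenAI key
  audio: Blob; // recorded audio clip
  fileName?: string; // upload name; the extension hints the audio format
  language?: string; // optional ISO-639-1 code such as "en" or "pt"
  prompt?: string; // optional context, e.g. domain-specific vocabulary
}

interface TranscriptionRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: FormData;
}

function buildTranscriptionRequest(opts: TranscriptionOptions): TranscriptionRequest {
  const form = new FormData();
  form.append("file", opts.audio, opts.fileName ?? "dictation.webm");
  form.append("model", "gpt-4o-transcribe");
  if (opts.language) form.append("language", opts.language);
  if (opts.prompt) form.append("prompt", opts.prompt);
  return {
    url: TRANSCRIPTIONS_URL,
    method: "POST",
    // No Content-Type header: fetch sets the multipart boundary itself.
    headers: { Authorization: `Bearer ${opts.apiKey}` },
    body: form,
  };
}
```

You can then send the request with a plain `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`. With the default JSON response format, the transcript is in the `text` field of the response body.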
## Key Features
### Phase 0 (MVP)
- [x] Global hotkey activation (Ctrl+Win)
- [x] Voice recording with visual feedback
- [x] Real-time transcription via OpenAI Whisper API
- [x] Direct text insertion into active application
- [x] User statistics (usage tracking, word count, words per minute (WPM))
- [x] Dictionary/snippets management
- [x] Style preferences
- [x] Notes feature
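The user statistics above need a definition of WPM, which this document does not give. The sketch below assumes WPM is the transcribed word count divided by the recording duration in minutes. The project may instead use speaking time or another window. `countWords` and `dictationStats` are hypothetical names.

```typescript
// Sketch of per-dictation statistics. Assumption: WPM = words / recording minutes.

interface DictationStats {
  words: number;
  wpm: number;
}

function countWords(text: string): number {
  // Split on runs of whitespace; drop empty pieces from leading/trailing space.
  return text.split(/\s+/).filter((w) => w.length > 0).length;
}

function dictationStats(text: string, durationMs: number): DictationStats {
  const words = countWords(text);
  const minutes = durationMs / 60000;
  // Guard against zero-length recordings to avoid dividing by zero.
  const wpm = minutes > 0 ? Math.round(words / minutes) : 0;
  return { words, wpm };
}
```

For example, `dictationStats("Send the report to Ana by Friday", 3000)` gives `{ words: 7, wpm: 140 }`: seven words in 0.05 minutes.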
### Phase 1 (Current; see roadmap.md)
- [ ] Speaker diarization (meeting mode with gpt-4o-transcribe-diarize)
- [ ] Streaming transcription for real-time feedback
- [ ] Custom vocabulary/prompting for domain-specific accuracy
- [ ] Multi-format audio support
- [ ] Enhanced error handling and retry logic
- [ ] Settings persistence and configuration management
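One common approach for the planned retry logic is capped exponential backoff with "full jitter". The sketch below assumes transcription calls are wrapped in a promise-returning function. `withRetry`, `backoffCeilingMs`, and the defaults (4 attempts, 500 ms base, 8 s cap) are illustrative choices, not values from the roadmap.

```typescript
// Sketch: retry a failing async task with capped exponential backoff + full jitter.
// All names and default limits here are hypothetical.

interface RetryOptions {
  maxAttempts?: number; // total tries, including the first one
  baseMs?: number; // delay ceiling before the first retry
  capMs?: number; // upper bound on any single delay
  shouldRetry?: (err: unknown) => boolean; // e.g. only network errors, HTTP 429/5xx
  random?: () => number; // injectable for tests; defaults to Math.random
}

// Delay ceiling for 0-based retry number `retry`: base * 2^retry, capped.
function backoffCeilingMs(retry: number, baseMs: number, capMs: number): number {
  return Math.min(capMs, baseMs * 2 ** retry);
}

function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

async function withRetry<T>(task: () => Promise<T>, opts: RetryOptions = {}): Promise<T> {
  const {
    maxAttempts = 4,
    baseMs = 500,
    capMs = 8000,
    shouldRetry = () => true,
    random = Math.random,
  } = opts;
  for (let attempt = 1; ; attempt++) {
    try {
      return await task();
    } catch (err) {
      // Give up on the last attempt, or on errors that retrying will not fix.
      if (attempt >= maxAttempts || !shouldRetry(err)) throw err;
      // Full jitter: wait a random time in [0, ceiling) so clients don't retry in lockstep.
      await sleep(random() * backoffCeilingMs(attempt - 1, baseMs, capMs));
    }
  }
}
```

The `shouldRetry` predicate matters in practice. A rate-limit or server error is worth retrying, but an invalid API key is not, so a real predicate would inspect the HTTP status of the failed request.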
### Future Phases
- Offline mode with local models
- Custom voice commands (formatting, navigation)
- Team collaboration features
- API for third-party integrations
- Mobile companion app
## Success Metrics
- **Transcription Accuracy**: >95% word accuracy across supported languages
- **Response Time**: <2 seconds from when the user stops speaking to text insertion
- **User Adoption**: 1000+ active users within 6 months
- **Reliability**: 99.5% uptime for critical path components
- **User Satisfaction**: >4.5/5 average rating from user feedback
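The document does not say how word accuracy is measured. A common reading is 1 minus word error rate (WER), where WER = (substitutions + deletions + insertions) / number of reference words. The sketch below uses that assumed definition. It computes WER with a word-level edit distance after lowercasing and splitting on whitespace. `wordErrorRate` and `wordAccuracy` are hypothetical names.

```typescript
// Sketch: word error rate via word-level Levenshtein distance.
// Assumed normalization: lowercase + whitespace split (no punctuation handling).

function tokenize(text: string): string[] {
  return text.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
}

function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = tokenize(reference);
  const hyp = tokenize(hypothesis);
  if (ref.length === 0) throw new Error("reference must contain at least one word");
  // prev[j] = edit distance between the first i-1 ref words and the first j hyp words.
  let prev = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr = [i]; // i deletions turn i ref words into an empty hypothesis
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost, // substitution or match
      );
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}

// WER can exceed 1 when there are many insertions, so clamp accuracy at 0.
function wordAccuracy(reference: string, hypothesis: string): number {
  return Math.max(0, 1 - wordErrorRate(reference, hypothesis));
}
```

Take the reference "send the report to ana by friday" and the hypothesis "send a report to anna by friday". That pair has two substitutions over seven reference words, so WER is 2/7 (about 28.6%). Under this definition, the >95% accuracy target corresponds to a WER below 5%.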