Awesome Whisper Apps
A curated collection of applications, tools, and resources built with OpenAI Whisper - a robust automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data.
Quick Start Guide
Looking for something specific?
Voice typing on Linux? → Linux System Integration or try nerd-dictation
Voice typing on Mac? → macOS Apps or try SuperWhisper
Voice typing on Windows? → Windows Apps or try WinWhisper
Cross-platform desktop app? → Try Buzz or whisper-writer
Generate video subtitles? → Subtitles & Captioning
Real-time transcription? → Real-Time & Streaming
Meeting transcription? → Meeting & Productivity
Cloud/SaaS solution? → SaaS Platforms
Self-hosted web interface? → Web UI
Developer integration? → Libraries & APIs or Model Variants
Popular Picks
Top projects by community engagement and activity:
Desktop Applications
Buzz (cross-platform) - Feature-rich desktop transcription app
whisper-writer (cross-platform) - Voice-to-text for system-wide input
SuperWhisper (macOS) - Premium Mac app for voice-to-text
WinWhisper (Windows) - System-wide hotkey support for Windows
Model Variants & Performance
whisper.cpp - High-performance C/C++ implementation
faster-whisper - Faster implementation using CTranslate2
WhisperX - Word-level timestamps + speaker diarization
insanely-fast-whisper - Speed-optimized implementation
Developer Tools
WhisperLive - Real-time transcription server
whisper_streaming - Long-form streaming transcription
Whisper-WebUI - Self-hosted web interface
Getting Started
Official Whisper & Models
Official Repository: openai/whisper
Hugging Face Collection: Whisper Model Release
Official Paper: Robust Speech Recognition via Large-Scale Weak Supervision
Official Model Sizes
Choose based on your accuracy/speed requirements:
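
| Model | Parameters | English-only | Multilingual | Speed | Best for |
|--------|-----------|--------------|--------------|-----------|----------|
| tiny | 39M | ✓ | ✓ | Fastest | Minimal resource usage, real-time apps |
| base | 74M | ✓ | ✓ | Very Fast | Resource-constrained environments |
| small | 244M | ✓ | ✓ | Fast | Good balance for most use cases |
| medium | 769M | ✓ | ✓ | Moderate | Better accuracy, moderate speed |
| large | 1550M | N/A | ✓ | Slower | Best accuracy, research use |
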
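
The trade-off between the sizes listed above can be expressed as a small selection helper. This is only a sketch: `pick_model` and its parameter budget are hypothetical names introduced for illustration, and the parameter counts are copied from the list above. The `.en` suffix follows the naming used for the English-only checkpoints in the openai/whisper package.

```python
# Parameter counts in millions, copied from the model sizes listed above.
MODEL_PARAMS_M = {
    "tiny": 39,
    "base": 74,
    "small": 244,
    "medium": 769,
    "large": 1550,
}

# Sizes that also ship an English-only checkpoint ("<size>.en").
ENGLISH_ONLY_SIZES = {"tiny", "base", "small", "medium"}


def pick_model(max_params_m: int, english_only: bool = False) -> str:
    """Return the largest model name whose parameter count fits the budget.

    Hypothetical helper: the budget is expressed in millions of parameters.
    """
    candidates = [
        name for name, params in MODEL_PARAMS_M.items() if params <= max_params_m
    ]
    if not candidates:
        raise ValueError(f"No model fits within {max_params_m}M parameters")
    best = max(candidates, key=MODEL_PARAMS_M.__getitem__)
    # "large" has no English-only variant, so it is returned unchanged.
    if english_only and best in ENGLISH_ONLY_SIZES:
        return best + ".en"
    return best
```

With the openai/whisper package installed, the returned name could then be passed to `whisper.load_model(...)`, for example `whisper.load_model(pick_model(300))`.
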
By Use Case
Voice Typing & Dictation
Cross-Platform:
Buzz - Feature-rich desktop app
whisper-writer - System-wide voice-to-text
whisper-dictation - Dictation application
Linux:
nerd-dictation - Hackable offline speech-to-text
BlahST - Linux speech-to-text integration
whisper-to-input - Convert transcription to keyboard input
voice-typing-linux - Voice typing integration
macOS:
SuperWhisper - Premium Mac voice-to-text app
OpenSuperWhisper - Open-source Mac app
WhisperKit - Native macOS implementation
Windows:
WinWhisper - System-wide hotkey support
Whisper Typing for Windows - Desktop voice typing
Mobile:
whisperIME (Android) - Input method editor
Whisperboard (iOS) - Keyboard with Whisper
SaaS Platforms & Cloud Services
Whisper Transcribe - Online transcription platform
WhisperAI - Cloud-based transcription service
Whisper Typing - Online typing and transcription
Wisprflow - Workflow automation with transcription
CleverType - Smart typing assistant
SpeechPulse - Cross-platform speech-to-text
Blabby.ai - Browser-based transcription
Subtitles & Captioning
Generate subtitles and captions for videos:
[auto-subs](https://github.com/tmoroney/auto-subs) - Automatic subtitle generation
[TeroSubtitler](https://github.com/URUWorks/TeroSubtitler) - Professional subtitle editor
[whisper-youtube](https://github.com/ArthurFDLR/whisper-youtube) - YouTube subtitle generation
[yt-whisper](https://github.com/m1guelpf/yt-whisper) - YouTube transcription tool
[whisper-subs](https://github.com/GhostNaN/whisper-subs) - CLI for adding subtitles to videos
[whisply](https://github.com/tsmdt/whisply) - Automatic subtitle generation (Linux)
[template-tiktok](https://github.com/remotion-dev/template-tiktok) - TikTok-style captioning with Remotion
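
As a rough illustration of what subtitle tools do with a transcript, the sketch below converts timestamped segments into SubRip (.srt) text. It assumes each segment is a dict with `start` and `end` (in seconds) and `text` keys, which is the shape of the entries in `result["segments"]` from the openai/whisper package. The function names are hypothetical and do not come from any project listed above.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render segments (dicts with 'start', 'end', 'text') as SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        text = seg["text"].strip()
        # Each SRT cue: sequence number, time range, then the caption text.
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    # Cues are separated by a blank line.
    return "\n".join(blocks)
```

The resulting string can be written to a `.srt` file and loaded by most video players alongside the video.
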
Meeting & Productivity
Tools for transcribing meetings and generating notes:
[meeting-minutes](https://github.com/Zackriya-Solutions/meeting-minutes) - Generate meeting minutes
[ScribeWizard](https://github.com/Bklieger/ScribeWizard) - AI-powered note-taking
Web Interfaces
Self-Hosted:
[Whisper-WebUI](https://github.com/jhj0517/Whisper-WebUI) - Web interface for transcription
[NeuroSandboxWebUI](https://github.com/Dartvauder/NeuroSandboxWebUI) - Comprehensive web UI for AI models
By Platform
Cross-Platform Desktop Applications
Applications that work on Linux, macOS, and Windows:
[Buzz](https://github.com/chidiwilliams/buzz) - Feature-rich transcription app
[whisper-writer](https://github.com/savbell/whisper-writer) - Voice-to-text application
[faster-whisper-GUI](https://github.com/CheshireCC/faster-whisper-GUI) - GUI for faster-whisper
[SoftWhisper](https://github.com/NullMagic2/SoftWhisper) - User-friendly GUI
[speech-assistant](https://github.com/Mohamad-Hussein/speech-assistant) - Speech assistant GUI
[whisper-dictation](https://github.com/foges/whisper-dictation) - Dictation application
[whisper-realtime-gui](https://github.com/phongthanhbuiit/whisper-realtime-gui) - Real-time transcription GUI
[whisper-ui](https://github.com/schnoddelbotz/whisper-ui) - Cross-platform desktop UI
[whisper_dictation](https://github.com/themanyone/whisper_dictation) - Voice dictation tool
[WhisperGUI](https://github.com/ADT109119/WhisperGUI) - Simple GUI
Linux
Desktop Applications
[froshine](https://github.com/AdrianScott/froshine) - Linux desktop app
[speak-to-ai](https://github.com/AshBuk/speak-to-ai) - Voice interaction app
[Whisper-Notepad-For-Linux](https://github.com/danielrosehill/Whisper-Notepad-For-Linux) - Notepad-style transcription
[WhisperNow](https://github.com/shinglyu/WhisperNow) - Desktop application
CLI Tools
[whisper.cpp-cli](https://github.com/charliermarsh/whisper.cpp-cli) - CLI for whisper.cpp
[blurt](https://github.com/QuantiusBenignus/blurt) - Command-line transcription tool
System Integration
[nerd-dictation](https://github.com/ideasman42/nerd-dictation) - Hackable offline STT (VOSK-API)
[BlahST](https://github.com/QuantiusBenignus/BlahST) - Speech-to-text integration
[Linux-Dictation-Project](https://github.com/wheeler01/Linux-Dictation-Project) - Dictation system
[linux-stt-input](https://github.com/fengwk/linux-stt-input) - STT input method
[linux-voice-to-text-ai](https://github.com/trebormc/linux-voice-to-text-ai) - Voice-to-text AI
[LinuxWhisper](https://github.com/vitali87/LinuxWhisper) - Linux implementation
[voice-typing-linux](https://github.com/GitJuhb/voice-typing-linux) - Voice typing integration
[Whisper-Dictation](https://github.com/LumenYoung/Whisper-Dictation) - Dictation system
[whisper-flow-linux](https://github.com/sapountzis/whisper-flow-linux) - Workflow integration
[whisper-hotkey-linux](https://github.com/atkvishnu/whisper-hotkey-linux) - Hotkey-based integration
[whispertrigger](https://github.com/RetroTrigger/whispertrigger) - System integration
[whisprd](https://github.com/AgenticToaster/whisprd) - Whisper daemon
[whisper-to-input](https://github.com/j3soon/whisper-to-input) - Transcription to keyboard input
[whispy](https://github.com/daaku/whispy) - Integration tool
[dicti](https://github.com/tksimson/dicti) - Dictation tool
[sonori](https://github.com/0xPD33/sonori) - Voice input system
[hushnote](https://github.com/peteonrails/hushnote) - Private note-taking
[Local-Voice](https://github.com/shashank2122/Local-Voice) - Local voice processing
[s2t](https://github.com/franchesoni/s2t) - Speech-to-text
[Whisper-Notepad-Simple](https://github.com/danielrosehill/Whisper-Notepad-Simple) - Simple notepad app
[Linux-AI-Assistant-scripts](https://github.com/samoylenkodmitry/Linux-AI-Assistant-scripts) - AI assistant scripts
macOS
Desktop Applications
[SuperWhisper](https://superwhisper.com/) - Premium Mac voice-to-text app
[OpenSuperWhisper](https://github.com/Starmel/OpenSuperWhisper) - Open-source Mac app
[WhisperKit](https://github.com/argmaxinc/WhisperKit) - Native macOS implementation
[Careless Whisper](https://carelesswhisper.app/) - Lightweight transcription app
System Integration
[ollama-voice-mac](https://github.com/apeatling/ollama-voice-mac) - Voice interface for Ollama
[whisperanywhere-js](https://github.com/unclecode/whisperanywhere-js) - System-wide transcription
Windows
Desktop Applications
[AI Transcription](https://apps.microsoft.com/detail/9p7f1j2svk3g) - Microsoft Store app
[Whisper Typing for Windows](https://whispertyping.com/download) - Desktop voice typing
System Integration
[WinWhisper](https://github.com/GewoonJaap/WinWhisper) - System-wide hotkey support
Android
[whisperIME](https://github.com/woheller69/whisperIME) - Input method editor
[WhisperInput](https://github.com/alex-vt/WhisperInput) - Input app
[WhisperKitAndroid](https://github.com/argmaxinc/WhisperKitAndroid) - WhisperKit for Android
[RTranslator](https://github.com/niedev/RTranslator) - Real-time translation app
[Dictate](https://github.com/DevEmperor/Dictate) - Voice dictation app
[whisper_android](https://github.com/vilassn/whisper_android) - Android integration
iOS
[Whisperboard](https://github.com/Saik0s/Whisperboard) - iOS keyboard with Whisper integration
Embedded / Raspberry Pi
[Local-Voice](https://github.com/shashank2122/Local-Voice) - Local voice processing for embedded systems
For Developers
Model Variants & Performance Optimizations
Enhanced and optimized versions of Whisper:
[whisper.cpp](https://github.com/ggerganov/whisper.cpp) - High-performance C/C++ implementation
[faster-whisper](https://github.com/SYSTRAN/faster-whisper) - Up to 4x faster using CTranslate2
[insanely-fast-whisper](https://github.com/Vaibhavs10/insanely-fast-whisper) - Speed-optimized implementation
[WhisperX](https://github.com/m-bain/whisperX) - Word-level timestamps + diarization
[distil-whisper](https://github.com/huggingface/distil-whisper) - Distilled models from Hugging Face
[CrisperWhisper](https://github.com/nyrahealth/CrisperWhisper) - Enhanced accuracy variant
[whisper.net](https://github.com/sandrohanea/whisper.net) - .NET implementation
[whisper-turbo](https://github.com/FL33TW00D/whisper-turbo) - High-performance implementation
Real-Time & Streaming
For live transcription and streaming audio:
[WhisperLive](https://github.com/collabora/WhisperLive) - Real-time transcription server
[whisper_streaming](https://github.com/ufal/whisper_streaming) - Long-form streaming transcription
[whisper_real_time](https://github.com/davabase/whisper_real_time) - Real-time implementation
[whisper-flow](https://github.com/dimastatz/whisper-flow) - Real-time flow
Diarization & Advanced Features
Speaker diarization and word-level timestamps:
[whisper-diarization](https://github.com/MahmoudAshraf97/whisper-diarization) - Speaker diarization
[whisper-timestamped](https://github.com/linto-ai/whisper-timestamped) - Word-level timestamps
[pyannote-whisper](https://github.com/yinruiqing/pyannote-whisper) - Pyannote integration
[cog-whisper-diarization](https://github.com/thomasmol/cog-whisper-diarization) - Cog-wrapped diarization
[WhisperTimeSync](https://github.com/EtienneAb3d/WhisperTimeSync) - Time sync & diarization
Fine-Tuning
Tools for customizing Whisper models:
[Whisper-Finetune](https://github.com/yeyupiaoling/Whisper-Finetune) - Fine-tuning utilities
[whisper-finetuning](https://github.com/jumon/whisper-finetuning) - Fine-tuning framework
Deployment & Containers
[cog-whisper](https://github.com/replicate/cog-whisper) - Cog container for deployment
IDE & Editor Integrations
VS Code Extensions:
Whisper Assistant - Voice-to-text integration
Yap - Cursor Extension - Voice input for VS Code/Cursor
WhisperX Assistant - WhisperX integration
Other Editors:
[whisper-obsidian-plugin](https://github.com/nikdanilov/whisper-obsidian-plugin) - Obsidian integration
Game Development
[whisper.unity](https://github.com/Macoron/whisper.unity) - Unity game engine integration
Pipelines & Workflows
[WhisperChain](https://github.com/chrischoy/WhisperChain) - Pipeline framework for Whisper workflows
[whisper-playground](https://github.com/saharmor/whisper-playground) - Interactive playground for experimentation
Resources
Official Documentation
- Official repository: [openai/whisper](https://github.com/openai/whisper)
- Paper (arXiv.org): Robust Speech Recognition via Large-Scale Weak Supervision. We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual and multitask supervision, the resulting models generalize well to standard benchmarks and are often competitive with prior fully supervised results but in a zero-shot transfer setting without the need for any fine-tuning. When compared to humans, the models approach their accuracy and robustness. We are releasing models and inference code to serve as a foundation for further work on robust speech processing.
- Hugging Face (huggingface.co): Whisper Release, an openai collection. Whisper includes both English-only and multilingual checkpoints for ASR and ST, ranging from 38M params for the tiny models to 1.5B params for large.