Voice Apps Index

Index of voice typing, dictation, and speech-to-text applications and utilities.

In Development

VoiceType

A fork of Deepgram's Linux starter with CLI-to-GUI conversion, hotkey support, API key editing, and cost tracking. Uses Deepgram streaming ASR.

danielrosehill/VoiceType View on GitHub

Parakeet Type Ubuntu

On-device voice typing for Linux using Parakeet and NeMo ASR models via sherpa-onnx. Built-in punctuation, multiple model profiles, system tray app with configurable hotkeys. No cloud, no GPU required.

danielrosehill/Parakeet-Type-Ubuntu View on GitHub

AI Typer V2

Voice dictation with multimodal AI cleanup — speak naturally, get polished text. Uses Gemini multimodal audio processing.

danielrosehill/AI-Typer-V2 View on GitHub

Wayland Voice Typer

Simple GUI around whisper.cpp for voice-to-text on Linux.

danielrosehill/Wayland-Voice-Typer View on GitHub

Quick STT

Optimised always-on STT for Ubuntu with ROCm support.

danielrosehill/Quick-STT View on GitHub

hyprvoice

Voice-powered typing for Wayland/Hyprland desktops.

danielrosehill/hyprvoice View on GitHub

Moonshine Dictation App

Moonshine-based dictation application.

danielrosehill/Mooshine-Dictation-App-0326 View on GitHub

Local STT App

Local speech-to-text application.

danielrosehill/Local-STT-App View on GitHub

Voice Typing 1125

Voice typing application iteration.

danielrosehill/Voice-Typing-1125 View on GitHub

Old Iterations

AI Transcription Notepad

Voice note-taking utility that uses cloud audio multimodal models for single-pass transcription and text cleanup.

danielrosehill/AI-Transcription-Notepad View on GitHub

Thought Pad

Linux desktop application providing a two-stage process for creating notes from dictated speech — transcription via Whisper API followed by light text formatting. Exports to markdown docs.

danielrosehill/Thought-Pad View on GitHub

Whisper Typer 0911

Earlier Whisper-based voice typing iteration.

danielrosehill/Whisper-Typer-0911 View on GitHub

Deepgram Voice Keyboard Ubuntu

WIP STT utility using cloud STT APIs on Ubuntu.

danielrosehill/Deepgram-Voice-Keyboard-Ubuntu View on GitHub

Voiceflow V1

Early voice flow implementation.

danielrosehill/Voiceflow-V1 View on GitHub

Voiceflow Dev

Voice flow development iteration.

danielrosehill/Voiceflow-Dev View on GitHub

Voice Flow Idea Dev

Voice flow idea development workspace.

danielrosehill/Voice-Flow-Idea-Dev View on GitHub

Whisper Typing Linux 1125

Whisper-based typing tool for Linux.

danielrosehill/Whisper-Typing-Linux-1125 View on GitHub

Voice Keyboard

Voice keyboard application.

danielrosehill/Voice-Keyboard View on GitHub

Android Voice Keyboard

Voice keyboard for Android.

danielrosehill/Android-Voice-Keyboard View on GitHub

Voice Notepad Android

Android fork of transcription UI.

danielrosehill/Voice-Notepad-Android View on GitHub

Transcription Tools

Gemini Audio Transcriber

File-upload-based multimodal transcription tool using Gemini via OpenRouter.

danielrosehill/Gemini-Audio-Transcriber View on GitHub

Gemini Transcription Notepad

Gemini-powered transcription notepad with cleanup.

danielrosehill/Gemini-Transcription-Notepad View on GitHub

Gemini ASR Transcriber

Transcription notepad for Gemini ASR.

danielrosehill/Gemini-ASR-Transcriber View on GitHub

DVR Transcriber

Workflow workspace for importing recordings from a DVR and using AI for transcription.

danielrosehill/DVR-Transcriber View on GitHub

Transcript Creator

Audio cleanup and transcription tool.

danielrosehill/Transcript-Creator View on GitHub

Local Multimodal Transcriber

Local transcription app with audio multimodal design.

danielrosehill/Local-Multimodal-Transcriber View on GitHub

ASR Transcription Pipeline

ASR transcription pipeline.

danielrosehill/ASR-Transcription-Pipeline View on GitHub

Transcription MCPs

Gemini Transcription MCP

MCP server for Gemini multimodal audio transcription with built-in post-processing.

danielrosehill/Gemini-Transcription-MCP View on GitHub

Cloud ASR MCP

MCP for using various cloud ASR models for speech-to-text and transcription.

danielrosehill/Cloud-ASR-MCP View on GitHub

Local AI Transcription MCP

MCP for local AI transcription.

danielrosehill/Local-AI-Transcription-MCP View on GitHub

Local Transcription MCP

WIP MCP for local STT with cleanup on AMD GPU machines.

danielrosehill/Local-Transcription-MCP View on GitHub

OR Audio Transcription MCP

OpenRouter-based audio transcription MCP server.

danielrosehill/OR-Audio-Transcription-MCP View on GitHub

Evaluations & Benchmarks

Whisper Fine Tune Accuracy Eval

Comparing Whisper fine-tunes versus stock Whisper on local inference.

danielrosehill/Whisper-Fine-Tune-Accuracy-Eval View on GitHub

Whisper WPM Background Noise Eval

Quick eval to answer: how much does speaking pace affect WER/accuracy in ASR?

danielrosehill/Whisper-WPM-Background-Noise-Eval View on GitHub

Transcription Cleanup Eval

Evaluating various cloud audio understanding models on the transcribe-and-cleanup workflow.

danielrosehill/Transcription-Cleanup-Eval-1225 View on GitHub

One Shot Transcription Microphone Eval

Test samples for various microphones with an STT accuracy evaluation.

danielrosehill/One-Shot-Transcription-Microphone-Eval View on GitHub

Local ASR STT Benchmark

Quick evaluation to find the best STT model in Speech Note (Ubuntu) for local hardware.

danielrosehill/Local-ASR-STT-Benchmark View on GitHub

Whisper WPM Test

Whisper words-per-minute testing.

danielrosehill/Whisper-WPM-Test View on GitHub

Gemini 3.1 Lite Audio Understanding Eval

Evaluation of Gemini 3.1 Lite on audio understanding tasks.

danielrosehill/Gemini-31-Lite-Audio-Understanding-Eval View on GitHub

Voice Cleanup Prompt Experiment

Testing various permutations in system prompting for raw audio transcript cleanup and comparing multimodal ASR vs. the STT + LLM approach.

danielrosehill/Voice-Cleanup-Prompt-Experiment View on GitHub

Whisper Fine-Tuning & Setup

Whisper Finetune V2

Whisper fine-tuning iteration.

danielrosehill/Whisper-Finetune-V2 View on GitHub

Modal Whisper Finetune Script

Validated script for fine-tuning Whisper on Modal GPUs with a preformatted audio dataset.

danielrosehill/Modal-Whisper-Finetune-Script View on GitHub

Whisper Fine Tuning Data

Whisper fine-tuning dataset.

danielrosehill/Whisper-Fine-Tuning-Data View on GitHub

Whisper Fine Tune 171125

Whisper fine-tuning iteration.

danielrosehill/Whisper-Fine-Tune-171125 View on GitHub

Whisper Base FUTO

Whisper base model via FUTO.

danielrosehill/Whisper-Base-FUTO View on GitHub

Local STT Fine Tune Tests

Local STT fine-tuning tests.

danielrosehill/Local-STT-Fine-Tune-Tests View on GitHub

Fine Tuned STT Formats

Fine-tuned STT data formats.

danielrosehill/Fine-Tuned-STT-Formats View on GitHub

whisper-wayland-rocm

Whisper-Wayland with ROCm GPU acceleration — Docker setup for AMD GPUs.

danielrosehill/whisper-wayland-rocm View on GitHub

whisper-cpp-rocm-setup

whisper.cpp ROCm setup scripts.

danielrosehill/whisper-cpp-rocm-setup View on GitHub

Whisper Local Notes

Notes on local Whisper usage.

danielrosehill/Whisper-Local-Notes View on GitHub

ASR Training Data

ASR Training Data Collector

GUI for gathering training data for ASR/STT apps into organised datasets, with audio capture, text capture, and JSONL metadata construction. Supports both LLM-generated and user-provided text.

danielrosehill/ASR-Training-Data-Collector View on GitHub