Speech Tech Index
Part of the Daniel Rosehill Index Collection
danielrosehill/Index View on GitHub
Index of speech technology repositories, tools, and resources, covering the full pipeline from speech capture through transcription, cleanup, and text transformation.
Last updated: 2026-03-25
ASR Fine-Tuning
Resources, scripts, and models for fine-tuning automatic speech recognition systems.
Modal ACFT Finetune Script
Validated Whisper fine-tuning script on Modal for FUTO
danielrosehill/Modal-ACFT-Finetune-Script View on GitHub
Modal Whisper Finetune Script
Validated script for fine-tuning Whisper on a Modal GPU with a preformatted audio dataset
danielrosehill/Modal-Whisper-Finetune-Script View on GitHub
My Whisper ACFT Fine-Tunes (Collection)
Collection of fine-tuned Whisper models specifically for FUTO Keyboard on mobile. Fine-tuned on ~1 hour of personal voice samples.
Whisper ACFT - Base
Base-sized Whisper fine-tune
Whisper ACFT - Small
Small-sized Whisper fine-tune
Whisper ACFT - Tiny
Tiny-sized Whisper fine-tune
My Whisper Fine-Tunes V2 (Collection)
Collection of general Whisper fine-tuned models for desktop use, available in GGML and CTranslate2 formats. Fine-tuned on ~1 hour of personal voice samples.
Whisper Fine-Tune - Large V3 Turbo
Large V3 Turbo-sized Whisper fine-tune
Whisper Fine-Tune - Medium
Medium-sized Whisper fine-tune
Whisper Fine-Tune - Tiny
Tiny-sized Whisper fine-tune
Whisper Fine-Tune - Base
Base-sized Whisper fine-tune
STT Fine Tune Project Outline
Planning doc for STT fine-tuning and eval project
danielrosehill/STT-Fine-Tune-Project-Outline View on GitHub
whisper-acft
Whisper ACFT fine-tuning
danielrosehill/whisper-acft View on GitHub
Whisper Fine Tuning Resources
Some resources for those looking to fine-tune Whisper ASR
danielrosehill/Whisiper-Fine-Tuning-Resources View on GitHub
Whisper-Hebrish
Fine-tuned Whisper model for Hebrew/English mixed speech
ASR Training Data GUIs
GUI applications for creating and collecting training data for ASR fine-tuning.
ASR Training Data Chunker
Breaks up texts by approximate reading duration
danielrosehill/ASR-Training-Data-Chunker View on GitHub
ASR Training Data Collector
GUI to facilitate gathering training data for ASR/STT apps in organised datasets with audio capture, text capture, and JSONL metadata construction. Supports both LLM-generated and user-provided text.
danielrosehill/ASR-Training-Data-Collector View on GitHub
Voice Training Data Creator
GUI to facilitate capturing voice data for TTS / voice clone training with LLM synthetic text generation and saving logic (Ubuntu Linux)
danielrosehill/Voice-Training-Data-Creator View on GitHub
ASR Datasets
Curated datasets for training and evaluating ASR/STT models.
My Public Audio Datasets (Collection)
Collection of public audio datasets for speech recognition training and evaluation
English-Hebrew Mixed Sentences
Dataset of mixed English/Hebrew sentences for multilingual ASR training
Tech Audio Samples
Technical audio samples for STT evaluation
Whisper WPM Test
Dataset for testing words-per-minute recognition accuracy
STT Applications
Desktop applications and utilities for speech-to-text input.
Whisper-Based Linux Prototypes
Voice Prompt Editor
Streamlit app for capturing and editing prompts and system prompts
danielrosehill/Voice-Prompt-Editor View on GitHub
Voice Prompt Runner
Demo UI which parses and then runs audio prompts
danielrosehill/Voice-Prompt-Runner View on GitHub
Whisper Notepad For Linux
Notepad for Linux that uses OpenAI Whisper (API) and reformats dictated text
danielrosehill/Whisper-Notepad-For-Linux View on GitHub
Whisper Notepad Simple
A Linux desktop utility for converting speech to text using the OpenAI Whisper API
danielrosehill/Whisper-Notepad-Simple View on GitHub
Whisper Transcription Notepad Linux
Transcription notepad with cloud speech to text (STT) for Linux
danielrosehill/Whisper-Transcription-Notepad-Linux View on GitHub
Deepgram-Based Linux Prototypes
Deepgram Voice Keyboard
A fork of Deepgram's Linux starter. CLI -> GUI + hotkey support, API key editing, cost tracking. WIP
danielrosehill/deepgram-voice-keyboard View on GitHub
Deepgram Voice Keyboard Ubuntu
WIP to try to create a good STT utility with cloud STT APIs
danielrosehill/Deepgram-Voice-Keyboard-Ubuntu View on GitHub
Other STT & Dictation Apps
amical
Open Source AI Dictation App - Type 3x faster, no keyboard needed
danielrosehill/amical View on GitHub
Handy
A free, open source, and extensible speech-to-text application that works completely offline
danielrosehill/Handy View on GitHub
hyprvoice
Voice-powered typing for Wayland/Hyprland desktops
danielrosehill/hyprvoice View on GitHub
parakeet-dictation
On-device voice typing for Linux using Parakeet and NeMo ASR models via sherpa-onnx
danielrosehill/parakeet-dictation View on GitHub
speech-notes-with-text-fixes
Speech Note Linux app. Note taking, reading and translating with offline STT, TTS and Machine translation
danielrosehill/speech-notes-with-text-fixes View on GitHub
Thought-Pad
Linux desktop application for creating notes from dictated speech
danielrosehill/Thought-Pad View on GitHub
Voice-Note-Recorder-Ubuntu
GUI for recording voice notes
danielrosehill/Voice-Note-Recorder-Ubuntu View on GitHub
Wayland-Voice-Typer
Simple GUI around whisper.cpp for voice-to-text on Linux
danielrosehill/Wayland-Voice-Typer View on GitHub
Multimodal Audio Transcription
AI-Transcription-Notepad
Voice note taking utility with cloud audio multimodal models for transcription and text cleanup
danielrosehill/AI-Transcription-Notepad View on GitHub
Cloud-ASR-MCP
WIP MCP for using various cloud ASR models for speech to text / transcription
danielrosehill/Cloud-ASR-MCP View on GitHub
DVR-Transcriber
Workflow workspace for importing recordings from a DVR and using AI for transcription
danielrosehill/DVR-Transcriber View on GitHub
Gemini-Audio-Transcriber
File upload based multimodal transcription tool using Gemini
danielrosehill/Gemini-Audio-Transcriber View on GitHub
Gemini-Transcription-MCP
MCP for Gemini multimodal audio transcription with built in post-processing
danielrosehill/Gemini-Transcription-MCP View on GitHub
Local-Multimodal-Transcriber
Local transcription app with audio multimodal design
danielrosehill/Local-Multimodal-Transcriber View on GitHub
Transcript Processing
System prompts and tools for cleaning, transforming, and enhancing STT output.
Basic STT Transcript Cleanup
Clean up raw speech-to-text transcripts
Diarised Transcript Assistant
System prompt for generating diarised transcripts (STT plus stylistic guidance)
danielrosehill/Diarised-Transcript-Assistant View on GitHub
Speech To Text System Prompt Library
An updated skeleton library of system prompts for using LLMs to refine STT output
danielrosehill/Speech-To-Text-System-Prompt-Library View on GitHub
STT Basic Cleanup System Prompt
Basic foundational system prompt for cleaning up AI voice transcripts
danielrosehill/STT-Basic-Cleanup-System-Prompt View on GitHub
Text Magic Fix Linux
WIP/Idea - Select text and fix typos with local AI
danielrosehill/Text-Magic-Fix-Linux View on GitHub
Text Transformation Prompt Collection 2
An abbreviated collection of STT transformation prompts
danielrosehill/Text-Transformation-Prompt-Collection-2 View on GitHub
Text Transformation Prompt Combiner
Basic implementation of a prompt concatenation utility for text transformation system prompts for converting transcribed text
danielrosehill/Text-Transformation-Prompt-Combiner View on GitHub
Text Transformation Prompt Library
Updated repo of text transformation prompts (raw STT transcripts -> *). New repo for capturing via automations.
danielrosehill/Text-Transformation-Prompt-Library View on GitHub
AI Text Rewriting Toolbox
LLM text reformatting and rewriting toolbox comprising many system prompts
danielrosehill/AI-Text-Rewriting-Toolbox View on GitHub
Audiopenai Edit Prompts
Text transformation prompts library for Audiopen.ai
danielrosehill/Audiopenai-Edit-Prompts View on GitHub
Shakespearean Text Generators
System prompts for rewriting text in Shakespearean English
danielrosehill/Shakespearean-Text-Generators View on GitHub
Text Cleanup Fine Tuning Set
Fine-tuning dataset/plans for text cleanup with audio multimodal models
danielrosehill/Text-Cleanup-Fine-Tuning-Set View on GitHub
Text Transformation Prompt Stack
Documentation/notes for a "prompt stack" for audio multimodal text processing
danielrosehill/Text-Transformation-Prompt-Stack View on GitHub
Transcription Cleanup Eval 1225
Evaluating various cloud audio understanding models on transcription and cleanup
danielrosehill/Transcription-Cleanup-Eval-1225 View on GitHub
Voice Cleanup Prompt Experiment
Testing various permutations in system prompting for raw audio transcript cleanup
danielrosehill/Voice-Cleanup-Prompt-Experiment View on GitHub
Voice Note Redaction Agent
Config for a text redaction agent for voicenote -> * workflows
danielrosehill/Voice-Note-Redaction-Agent View on GitHub
Voice Automation & Pipelines
Workflows and agents for voice-to-action automation.
Audio Context Pipeline Model 0425
Planning repo for personalised AI context pipeline with revised tooling
danielrosehill/Audio-Context-Pipeline-Model-0425 View on GitHub
STT To TTS
Gemini app which captures user speech, condenses (LLM), and then synthesises
danielrosehill/STT-To-TTS View on GitHub
Voice Prompt Enhancement Node
Configuration for an intermediate agent in voice automation workflows that bridges voice input to other actions
danielrosehill/Voice-Prompt-Enhancement-Node View on GitHub
Voice Prompt Pipeline
Voice-to-prompt pipeline for processing spoken instructions
Voice Spec Driven Development Demo
Demonstrating a voice to text spec driven development workflow
danielrosehill/Voice-Spec-Driven-Development-Demo View on GitHub
Voice To Prompt Pipeline
A conceptual voice to prompt pipeline that attempts to separate instructions from provided context for better results
danielrosehill/Voice-To-Prompt-Pipeline View on GitHub
N8N Voice Note Context Pipeline Workflow
Workflow for extracting context data from voice notes to Pinecone
danielrosehill/N8N-Voice-Note-Context-Pipeline-Workflow View on GitHub
Voice Note Ragie Pipeline
Test pipeline: voice context data to Ragie
danielrosehill/Voice-Note-Ragie-Pipeline View on GitHub
Voicenotes Prompt To Email Workflow N8N
danielrosehill/Voicenotes-Prompt-To-Email-Workflow-N8N View on GitHub
Evaluation & Benchmarking
Tools for testing and comparing STT performance.
Local STT Eval One Sample
Single-sample evaluation for local STT models
Long Form Audio Eval
Single shot STT benchmark for long form audio
danielrosehill/Long-Form-Audio-Eval View on GitHub
STT Comparison
Compare different speech-to-text models and services
STT Voice Note Evaluation
danielrosehill/STT-Voice-Note-Evaluation View on GitHub
Local ASR STT Benchmark
Quick evaluation to find the best STT model in Speech Note (Ubuntu) for specific hardware
danielrosehill/Local-ASR-STT-Benchmark View on GitHub
Long Form Audio Pipeline
Basic audio pipeline for preparing long audio content for ASR transcription
danielrosehill/Long-Form-Audio-Pipeline View on GitHub
One Shot Transcription Microphone Eval
Test samples for various microphones with an STT accuracy eval
danielrosehill/One-Shot-Transcription-Microphone-Eval View on GitHub
Speech And ASR Evaluations
Index repository for speech recognition and ASR evaluations
danielrosehill/Speech-And-ASR-Evaluations View on GitHub
Whisper Fine-Tune Accuracy Eval
Comparing Whisper fine-tunes versus stock Whisper on local inference
danielrosehill/Whisper-Fine-Tune-Accuracy-Eval View on GitHub
Whisper Fine-Tune Eval
Evaluation interface for fine-tuned Whisper models
Whisper WPM Background Noise Eval
Quick eval: how much does speaking pace affect WER/accuracy in ASR?
danielrosehill/Whisper-WPM-Background-Noise-Eval View on GitHub
Audio Processing
Microphone setup, EQ, and audio chain tools for optimal STT input.
Deepnet Baby Noise Scrub
Audio cleaning tool for removing baby/background noise from recordings
EQ Template Generator
Generate EQ templates for audio processing
Mic Input Boot FX Script Ubuntu
Boot script to ensure that Easy Effects manages the input sound source on boot (Ubuntu)
danielrosehill/Mic-Input-Boot-FX-Script-Ubuntu View on GitHub
Speech Recognition Audio Chain
Attempt to set up a good autostart audio processing chain for STT
danielrosehill/Speech-Recognition-Audio-Chain View on GitHub
Voice Analyzer
Analyses voice data
danielrosehill/Voice-Analyzer View on GitHub
TTS & Speech Synthesis
Text-to-speech and SSML generation tools.
Text To SSML Generator
Generates SSML from text by inference
danielrosehill/Text-To-SSML-Generator View on GitHub
Hebrew TTS Providers
Reference of Hebrew text-to-speech providers and services
Point in time snapshot of TTS providers with Modern Hebrew support
General
Documentation, research, curated lists, and miscellaneous speech tech resources.
ASR And STT AI Notebook
Prompts and outputs (and some notes) on STT + ASR + fine-tuning. LLM: Claude
danielrosehill/ASR-And-STT-AI-Notebook View on GitHub
Awesome Whisper Apps
Useful speech to text tools that use Whisper under the hood (API/local)
danielrosehill/Awesome-Whisper-Apps View on GitHub
Deepgram Text Input
Analysis of Deepgram text input
danielrosehill/Deepgram-Text-Input View on GitHub
Linux Voice Typing App Notes
Planning notes for a tool I've been working on for a while!
danielrosehill/Linux-Voice-Typing-App-Notes View on GitHub
STT Price Points 260225
Some timestamped API pricepoints for speech to text providers
danielrosehill/STT-Price-Points-260225 View on GitHub
Voice LLM App Notes
A few notes describing the kind of voice app for large language models I would love to have!
danielrosehill/Voice-LLM-App-Notes View on GitHub
Dictation Macropad
Plan/key allocation for a macropad optimised for heavy daily dictation workflows
danielrosehill/Dictation-Macropad View on GitHub
Linux Friendly Voice Tech
List of resources for voice technology with support for Linux
danielrosehill/Linux-Friendly-Voice-Tech View on GitHub
Speech To Text Chain Notes
Notes on STT processing chain (for future voice projects)
danielrosehill/Speech-To-Text-Chain-Notes View on GitHub
Ubuntu Mic Selector
Utility for switching microphone sources
danielrosehill/Ubuntu-Mic-Selector View on GitHub
Voice Control Linux
Claude-enhanced research for voice control platforms with Linux support
danielrosehill/Voice-Control-Linux View on GitHub
Voicepad
Planning notes for a macropad for STT users
danielrosehill/Voicepad View on GitHub