Speech Tech Index

Part of the Daniel Rosehill Index Collection

Index of speech technology repositories, tools, and resources — covering the full pipeline from speech capture through transcription, cleanup, and text transformation.

Last updated: 2026-03-25

ASR Fine-Tuning

Resources, scripts, and models for fine-tuning automatic speech recognition systems.

Modal ACFT Finetune Script

Validated script for fine-tuning Whisper on Modal for use with FUTO Keyboard

GitHub

danielrosehill/Modal-ACFT-Finetune-Script View on GitHub

Modal Whisper Finetune Script

Validated script for fine-tuning Whisper on a Modal GPU with a preformatted audio dataset

GitHub

danielrosehill/Modal-Whisper-Finetune-Script View on GitHub

My Whisper ACFT Fine-Tunes (Collection)

Collection of fine-tuned Whisper models specifically for FUTO Keyboard on mobile. Fine-tuned on ~1 hour of personal voice samples.

huggingface.co

My Whisper ACFT Fine Tunes - a danielrosehill Collection

Whisper fine tunes for use with FUTO keyboard on Android (training: Modal based on Whisper-ACFT skeleton from FUTO)

Whisper ACFT - Base

Base-sized Whisper fine-tune

huggingface.co

danielrosehill/daniel_whisper_acft_base_v2 · Hugging Face

Whisper ACFT - Small

Small-sized Whisper fine-tune

huggingface.co

danielrosehill/daniel_whisper_acft_small_v2 · Hugging Face

Whisper ACFT - Tiny

Tiny-sized Whisper fine-tune

huggingface.co

danielrosehill/daniel_whisper_acft_tiny_v2 · Hugging Face

My Whisper Fine-Tunes V2 (Collection)

Collection of general Whisper fine-tuned models for desktop use, available in GGML and CTranslate2 formats. Fine-tuned on ~1 hour of personal voice samples.

huggingface.co

My Whisper Fine-Tunes (V2) - a danielrosehill Collection

Whisper fine-tunes for my voice and vocab (tech, Hebrew). About 1 hour of training data so still very much POCs!

Whisper Fine-Tune - Large V3 Turbo

Large V3 Turbo-sized Whisper fine-tune

huggingface.co

danielrosehill/daniel_whisper_finetune_large_v3_turbo_v2 · Hugging Face

Whisper Fine-Tune - Medium

Medium-sized Whisper fine-tune

huggingface.co

danielrosehill/daniel_whisper_finetune_medium_v2 · Hugging Face

Whisper Fine-Tune - Tiny

Tiny-sized Whisper fine-tune

huggingface.co

danielrosehill/daniel_whisper_finetune_tiny_v2 · Hugging Face

Whisper Fine-Tune - Base

Base-sized Whisper fine-tune

huggingface.co

danielrosehill/daniel_whisper_finetune_base_v2 · Hugging Face

STT Fine Tune Project Outline

Planning doc for STT fine-tuning and eval project

GitHub

danielrosehill/STT-Fine-Tune-Project-Outline View on GitHub

whisper-acft

Whisper ACFT fine-tuning

GitHub

danielrosehill/whisper-acft View on GitHub

Whisper Fine Tuning Resources

Some resources for those looking to fine-tune Whisper ASR

GitHub

danielrosehill/Whisiper-Fine-Tuning-Resources View on GitHub

Whisper-Hebrish

Fine-tuned Whisper model for Hebrew/English mixed speech

huggingface.co

danielrosehill/Whisper-Hebrish · Hugging Face

ASR Training Data GUIs

GUI applications for creating and collecting training data for ASR fine-tuning.

ASR Training Data Chunker

Breaks up texts by approximate reading duration

GitHub

danielrosehill/ASR-Training-Data-Chunker View on GitHub
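
As a rough illustration of chunking text by approximate reading duration, the sketch below converts a target duration into a word budget and splits on whitespace. It is not taken from the repository: the function name, the 150 words-per-minute default, and the 20-second target are all assumptions chosen for the example.

```python
def chunk_by_reading_duration(text, target_seconds=20.0, words_per_minute=150.0):
    """Split text into chunks whose estimated read-aloud time is near target_seconds.

    Both defaults are illustrative assumptions, not values from the repository.
    """
    if target_seconds <= 0 or words_per_minute <= 0:
        raise ValueError("target_seconds and words_per_minute must be positive")

    # Convert the duration budget into a word budget (at least one word per chunk).
    words_per_chunk = max(1, round(target_seconds * words_per_minute / 60.0))

    words = text.split()
    return [
        " ".join(words[start:start + words_per_chunk])
        for start in range(0, len(words), words_per_chunk)
    ]


if __name__ == "__main__":
    sample = "one two three four five six seven eight nine ten"
    for chunk in chunk_by_reading_duration(sample, target_seconds=2.0):
        print(chunk)
```

A real tool would likely split on sentence boundaries rather than raw word counts so that each recording prompt reads naturally.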

ASR Training Data Collector

GUI for gathering ASR/STT training data into organised datasets, with audio capture, text capture, and JSONL metadata construction. Supports both LLM-generated and user-provided text.

GitHub

danielrosehill/ASR-Training-Data-Collector View on GitHub
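
For readers unfamiliar with JSONL metadata, the sketch below shows one plausible shape: one JSON object per line, pairing an audio file with its transcript. The field names (`audio`, `text`, `source`) and the helper functions are hypothetical and are not taken from the repository.

```python
import json
from pathlib import Path


def append_sample(metadata_path, audio_file, text, source):
    """Append one record to a JSONL metadata file (field names are assumptions)."""
    record = {"audio": audio_file, "text": text, "source": source}
    with Path(metadata_path).open("a", encoding="utf-8") as handle:
        # ensure_ascii=False keeps non-Latin text (for example Hebrew) readable on disk.
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def read_samples(metadata_path):
    """Read every non-empty line of a JSONL metadata file back into dicts."""
    with Path(metadata_path).open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


if __name__ == "__main__":
    append_sample("metadata.jsonl", "clips/0001.wav", "Hello world", "user")
    print(read_samples("metadata.jsonl"))
```

Because each line is an independent JSON document, new samples can be appended without rewriting the file.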

Voice Training Data Creator

GUI to facilitate capturing voice data for TTS / voice clone training with LLM synthetic text generation and saving logic (Ubuntu Linux)

GitHub

danielrosehill/Voice-Training-Data-Creator View on GitHub

ASR Datasets

Curated datasets for training and evaluating ASR/STT models.

My Public Audio Datasets (Collection)

Collection of public audio datasets for speech recognition training and evaluation

huggingface.co

My Public Audio Datasets - a danielrosehill Collection

Open-sourced audio datasets for STT/ASR. All recordings by me (Daniel Rosehill) unless otherwise credited.

English-Hebrew Mixed Sentences

Dataset of mixed English/Hebrew sentences for multilingual ASR training

huggingface.co

danielrosehill/English-Hebrew-Mixed-Sentences · Datasets at Hugging Face

Tech Audio Samples

Technical audio samples for STT evaluation

huggingface.co

danielrosehill/Tech-Sentences-For-ASR-Training · Datasets at Hugging Face

Whisper WPM Test

Dataset for testing words-per-minute recognition accuracy

huggingface.co

404 – Hugging Face

STT Applications

Desktop applications and utilities for speech-to-text input.

Whisper-Based Linux Prototypes

Voice Prompt Editor

Streamlit app for capturing and editing prompts and system prompts

GitHub

danielrosehill/Voice-Prompt-Editor View on GitHub

Voice Prompt Runner

Demo UI which parses and then runs audio prompts

GitHub

danielrosehill/Voice-Prompt-Runner View on GitHub

Whisper Notepad For Linux

Notepad for Linux that uses OpenAI Whisper (API) and reformats dictated text

GitHub

danielrosehill/Whisper-Notepad-For-Linux View on GitHub

Whisper Notepad Simple

A Linux desktop utility for converting speech to text using the OpenAI Whisper API

GitHub

danielrosehill/Whisper-Notepad-Simple View on GitHub

Whisper Transcription Notepad Linux

Transcription notepad with cloud speech to text (STT) for Linux

GitHub

danielrosehill/Whisper-Transcription-Notepad-Linux View on GitHub

Deepgram-Based Linux Prototypes

Deepgram Voice Keyboard

A fork of Deepgram's Linux starter. CLI -> GUI + hotkey support, API key editing, cost tracking. WIP

GitHub

danielrosehill/deepgram-voice-keyboard View on GitHub

Deepgram Voice Keyboard Ubuntu

WIP to try to create a good STT utility with cloud STT APIs

GitHub

danielrosehill/Deepgram-Voice-Keyboard-Ubuntu View on GitHub

Other STT & Dictation Apps

amical

Open Source AI Dictation App - Type 3x faster, no keyboard needed

GitHub

danielrosehill/amical View on GitHub

Handy

A free, open source, and extensible speech-to-text application that works completely offline

GitHub

danielrosehill/Handy View on GitHub

hyprvoice

Voice-powered typing for Wayland/Hyprland desktops

GitHub

danielrosehill/hyprvoice View on GitHub

parakeet-dictation

On-device voice typing for Linux using Parakeet and NeMo ASR models via sherpa-onnx

GitHub

danielrosehill/parakeet-dictation View on GitHub

speech-notes-with-text-fixes

Speech Note Linux app. Note taking, reading and translating with offline STT, TTS and Machine translation

GitHub

danielrosehill/speech-notes-with-text-fixes View on GitHub

Thought-Pad

Linux desktop application for creating notes from dictated speech

GitHub

danielrosehill/Thought-Pad View on GitHub

Voice-Note-Recorder-Ubuntu

GUI for recording voice notes

GitHub

danielrosehill/Voice-Note-Recorder-Ubuntu View on GitHub

Wayland-Voice-Typer

Simple GUI around whisper.cpp for voice-to-text on Linux

GitHub

danielrosehill/Wayland-Voice-Typer View on GitHub

Multimodal Audio Transcription

AI-Transcription-Notepad

Voice note taking utility with cloud audio multimodal models for transcription and text cleanup

GitHub

danielrosehill/AI-Transcription-Notepad View on GitHub

Cloud-ASR-MCP

WIP MCP for using various cloud ASR models for speech to text / transcription

GitHub

danielrosehill/Cloud-ASR-MCP View on GitHub

DVR-Transcriber

Workflow workspace for importing recordings from a DVR and using AI for transcription

GitHub

danielrosehill/DVR-Transcriber View on GitHub

Gemini-Audio-Transcriber

File upload based multimodal transcription tool using Gemini

GitHub

danielrosehill/Gemini-Audio-Transcriber View on GitHub

Gemini-Transcription-MCP

MCP for Gemini multimodal audio transcription with built in post-processing

GitHub

danielrosehill/Gemini-Transcription-MCP View on GitHub

Local-Multimodal-Transcriber

Local transcription app with audio multimodal design

GitHub

danielrosehill/Local-Multimodal-Transcriber View on GitHub

Transcript Processing

System prompts and tools for cleaning, transforming, and enhancing STT output.

Basic STT Transcript Cleanup

Clean up raw speech-to-text transcripts

huggingface.co

Basic STT Transcript Cleanup - a Hugging Face Space by danielrosehill

Clean up speech-to-text transcripts with AI

Diarised Transcript Assistant

System prompt for generating diarised transcripts (STT plus stylistic guidance)

GitHub

danielrosehill/Diarised-Transcript-Assistant View on GitHub

Speech To Text System Prompt Library

An updated skeleton library of system prompts for using LLMs to refine STT output

GitHub

danielrosehill/Speech-To-Text-System-Prompt-Library View on GitHub

STT Basic Cleanup System Prompt

Basic foundational system prompt for cleaning up AI voice transcripts

GitHub

danielrosehill/STT-Basic-Cleanup-System-Prompt View on GitHub

Text Magic Fix Linux

WIP/Idea - Select text and fix typos with local AI

GitHub

danielrosehill/Text-Magic-Fix-Linux View on GitHub

Text Transformation Prompt Collection 2

An abbreviated collection of STT transformation prompts

GitHub

danielrosehill/Text-Transformation-Prompt-Collection-2 View on GitHub

Text Transformation Prompt Combiner

Basic prompt concatenation utility that combines text transformation system prompts for converting transcribed text

GitHub

danielrosehill/Text-Transformation-Prompt-Combiner View on GitHub
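
Prompt concatenation of this kind can be sketched in a few lines. The example below is an illustration under assumptions rather than the repository's implementation: the function name, the blank-line separator, and the de-duplication step are all hypothetical choices.

```python
def combine_prompts(base_prompt, fragments, separator="\n\n"):
    """Join a base system prompt with optional fragments, skipping blanks and repeats."""
    parts = []
    seen = set()
    for chunk in [base_prompt, *fragments]:
        cleaned = chunk.strip()
        # Skip empty fragments and exact duplicates so the final prompt stays compact.
        if cleaned and cleaned not in seen:
            parts.append(cleaned)
            seen.add(cleaned)
    return separator.join(parts)


if __name__ == "__main__":
    print(
        combine_prompts(
            "Clean up the transcript.",
            ["Remove filler words.", "Add paragraph breaks."],
        )
    )
```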

Text Transformation Prompt Library

Updated repo of text transformation prompts (raw STT transcripts -> *). New repo for capturing via automations.

GitHub

danielrosehill/Text-Transformation-Prompt-Library View on GitHub

AI Text Rewriting Toolbox

LLM text reformatting and rewriting toolbox comprising many system prompts

GitHub

danielrosehill/AI-Text-Rewriting-Toolbox View on GitHub

Audiopenai Edit Prompts

Text transformation prompts library for Audiopen.ai

GitHub

danielrosehill/Audiopenai-Edit-Prompts View on GitHub

Shakespearean Text Generators

System prompts for rewriting text in Shakespearean English

GitHub

danielrosehill/Shakespearean-Text-Generators View on GitHub

Text Cleanup Fine Tuning Set

Fine-tuning dataset and plans for text cleanup with audio multimodal models

GitHub

danielrosehill/Text-Cleanup-Fine-Tuning-Set View on GitHub

Text Transformation Prompt Stack

Documentation/notes for a "prompt stack" for audio multimodal text processing

GitHub

danielrosehill/Text-Transformation-Prompt-Stack View on GitHub

Transcription Cleanup Eval 1225

Evaluating various cloud audio understanding models on transcribe and cleanup

GitHub

danielrosehill/Transcription-Cleanup-Eval-1225 View on GitHub

Voice Cleanup Prompt Experiment

Testing various permutations in system prompting for raw audio transcript cleanup

GitHub

danielrosehill/Voice-Cleanup-Prompt-Experiment View on GitHub

Voice Note Redaction Agent

Config for a text redaction agent for voicenote -> * workflows

GitHub

danielrosehill/Voice-Note-Redaction-Agent View on GitHub

Voice Automation & Pipelines

Workflows and agents for voice-to-action automation.

Audio Context Pipeline Model 0425

Planning repo for personalised AI context pipeline with revised tooling

GitHub

danielrosehill/Audio-Context-Pipeline-Model-0425 View on GitHub

STT To TTS

Gemini app that captures user speech, condenses it with an LLM, and then synthesises the result

GitHub

danielrosehill/STT-To-TTS View on GitHub

Voice Prompt Enhancement Node

Configuration for an intermediate agent that bridges voice input to other actions in voice automation workflows

GitHub

danielrosehill/Voice-Prompt-Enhancement-Node View on GitHub

Voice Prompt Pipeline

Voice-to-prompt pipeline for processing spoken instructions

huggingface.co

404 – Hugging Face

Voice Spec Driven Development Demo

Demonstrating a voice to text spec driven development workflow

GitHub

danielrosehill/Voice-Spec-Driven-Development-Demo View on GitHub

Voice To Prompt Pipeline

A conceptual voice to prompt pipeline that attempts to separate instructions from provided context for better results

GitHub

danielrosehill/Voice-To-Prompt-Pipeline View on GitHub

N8N Voice Note Context Pipeline Workflow

Workflow for extracting context data from voice notes to Pinecone

GitHub

danielrosehill/N8N-Voice-Note-Context-Pipeline-Workflow View on GitHub

Voice Note Ragie Pipeline

Test pipeline: voice context data to Ragie

GitHub

danielrosehill/Voice-Note-Ragie-Pipeline View on GitHub

Voicenotes Prompt To Email Workflow N8N

GitHub

danielrosehill/Voicenotes-Prompt-To-Email-Workflow-N8N View on GitHub

Evaluation & Benchmarking

Tools for testing and comparing STT performance.

Local STT Eval One Sample

Single-sample evaluation for local STT models

huggingface.co

Local STT Eval One Sample - a Hugging Face Space by danielrosehill

Single sample eval for WER on various Whisper models
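
For context, word error rate (WER) is the word-level edit distance between a reference transcript and a model's output, divided by the number of reference words. The sketch below is a minimal standalone version and is not the code behind the Space. Lowercasing and whitespace tokenisation are simplifying assumptions; real evaluations usually also normalise punctuation and numbers.

```python
def word_error_rate(reference, hypothesis):
    """Return WER = (substitutions + deletions + insertions) / reference word count."""
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row in memory.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current.append(
                min(
                    previous[j] + 1,                      # deletion
                    current[j - 1] + 1,                   # insertion
                    previous[j - 1] + substitution_cost,  # match or substitution
                )
            )
        previous = current
    return previous[-1] / len(ref_words)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

WER can exceed 1.0 when the hypothesis contains many inserted words, so it should not be read as a percentage of "wrong" words.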

Long Form Audio Eval

Single shot STT benchmark for long form audio

GitHub

danielrosehill/Long-Form-Audio-Eval View on GitHub

STT Comparison

Compare different speech-to-text models and services

huggingface.co

STT Comparison - a Hugging Face Space by danielrosehill

Comparing STT models against audio

STT Voice Note Evaluation

GitHub

danielrosehill/STT-Voice-Note-Evaluation View on GitHub

Local ASR STT Benchmark

Quick evaluation to find the best STT model in Speech Note (Ubuntu) for specific hardware

GitHub

danielrosehill/Local-ASR-STT-Benchmark View on GitHub

Long Form Audio Pipeline

Basic audio pipeline for preparing long audio content for ASR transcription

GitHub

danielrosehill/Long-Form-Audio-Pipeline View on GitHub

One Shot Transcription Microphone Eval

Test samples for various microphones with an STT accuracy eval

GitHub

danielrosehill/One-Shot-Transcription-Microphone-Eval View on GitHub

Speech And ASR Evaluations

Index repository for speech recognition and ASR evaluations

GitHub

danielrosehill/Speech-And-ASR-Evaluations View on GitHub

Whisper Fine-Tune Accuracy Eval

Comparing Whisper fine-tunes versus stock Whisper on local inference

GitHub

danielrosehill/Whisper-Fine-Tune-Accuracy-Eval View on GitHub

Whisper Fine-Tune Eval

Evaluation interface for fine-tuned Whisper models

huggingface.co

Whisper Fine-Tune vs Commercial APIs - a Hugging Face Space by danielrosehill

Local fine-tunes beat commercial STT APIs

Whisper WPM Background Noise Eval

Quick eval: how much does speaking pace affect WER/accuracy in ASR?

GitHub

danielrosehill/Whisper-WPM-Background-Noise-Eval View on GitHub

Audio Processing

Microphone setup, EQ, and audio chain tools for optimal STT input.

Deepnet Baby Noise Scrub

Audio cleaning tool for removing baby/background noise from recordings

huggingface.co

Baby Noise Cancellation Demo - a Hugging Face Space by danielrosehill

AI-powered baby noise removal demo with STT comparison

EQ Template Generator

Generate EQ templates for audio processing

huggingface.co

EQ Template Generator - a Hugging Face Space by danielrosehill

Create custom vocal EQ templates using AI. Upload your audio or describe your preferences, and get tailored EQ settings to enhance your sound.

Mic Input Boot FX Script Ubuntu

Boot script to ensure that Easy Effects manages the input sound source on boot (Ubuntu)

GitHub

danielrosehill/Mic-Input-Boot-FX-Script-Ubuntu View on GitHub

Speech Recognition Audio Chain

Attempt to set up a good autostart audio processing chain for STT

GitHub

danielrosehill/Speech-Recognition-Audio-Chain View on GitHub

Voice Analyzer

Analyses voice data

GitHub

danielrosehill/Voice-Analyzer View on GitHub

TTS & Speech Synthesis

Text-to-speech and SSML generation tools.

Text To SSML Generator

Generates SSML from text by inference

GitHub

danielrosehill/Text-To-SSML-Generator View on GitHub

Hebrew TTS Providers

Reference of Hebrew text-to-speech providers and services

GitHub

danielrosehill/Hebrew-TTS-Providers ★ 2

Point in time snapshot of TTS providers with Modern Hebrew support

Updated Mar 2026

General

Documentation, research, curated lists, and miscellaneous speech tech resources.

ASR And STT AI Notebook

Prompts and outputs (and some notes) on STT + ASR + fine-tuning. LLM: Claude

GitHub

danielrosehill/ASR-And-STT-AI-Notebook View on GitHub

Awesome Whisper Apps

Useful speech to text tools that use Whisper under the hood (API/local)

GitHub

danielrosehill/Awesome-Whisper-Apps View on GitHub

Deepgram Text Input

Analysis of Deepgram text input

GitHub

danielrosehill/Deepgram-Text-Input View on GitHub

Linux Voice Typing App Notes

Planning notes for a tool I've been working on for a while!

GitHub

danielrosehill/Linux-Voice-Typing-App-Notes View on GitHub

STT Price Points 260225

Some timestamped API price points for speech-to-text providers

GitHub

danielrosehill/STT-Price-Points-260225 View on GitHub

Voice LLM App Notes

A few notes describing the kind of voice app for large language models I would love to have!

GitHub

danielrosehill/Voice-LLM-App-Notes View on GitHub

Dictation Macropad

Plan/key allocation for a macropad optimised for heavy daily dictation workflows

GitHub

danielrosehill/Dictation-Macropad View on GitHub

Linux Friendly Voice Tech

List of resources for voice technology with support for Linux

GitHub

danielrosehill/Linux-Friendly-Voice-Tech View on GitHub

Speech To Text Chain Notes

Notes on STT processing chain (for future voice projects)

GitHub

danielrosehill/Speech-To-Text-Chain-Notes View on GitHub

Ubuntu Mic Selector

Utility for switching microphone sources

GitHub

danielrosehill/Ubuntu-Mic-Selector View on GitHub

Voice Control Linux

Claude-enhanced research for voice control platforms with Linux support

GitHub

danielrosehill/Voice-Control-Linux View on GitHub

Voicepad

Planning notes for a macropad for STT users

GitHub

danielrosehill/Voicepad View on GitHub