Experiments & Evaluations Index

Repository Index

This repository serves as an index of experimental projects, evaluations, proof-of-concepts, templates, patterns, and exploratory ideas related to AI/LLM development and workflows.

About this Index

This collection brings together various experimental repositories exploring AI agent workflows, LLM capabilities, evaluation frameworks, and development patterns. These repositories represent hands-on experiments, proof-of-concepts, benchmarking efforts, and reusable templates for AI-driven development.

Quick Reference: Evaluations

Whisper Fine-Tune Accuracy Eval — Speech · Smaller models improve with fine-tuning; larger models degrade unless handling code-switching
danielrosehill/Whisper-Fine-Tune-Accuracy-Eval View on GitHub
One-Shot Transcription Microphone Eval — Speech · Environment matters more than equipment cost for STT accuracy
danielrosehill/One-Shot-Transcription-Microphone-Eval View on GitHub
Transcription Cleanup Eval — Speech · Compares cloud models on single-step transcription + cleanup
danielrosehill/Transcription-Cleanup-Eval-1225 View on GitHub
Whisper WPM Background Noise Eval — Speech · Speaking pace and background noise impact on Whisper accuracy
danielrosehill/Whisper-WPM-Background-Noise-Eval View on GitHub
Long Form Audio Eval — Speech · Long-form audio transcription evaluation
danielrosehill/Long-Form-Audio-Eval View on GitHub
Local ASR STT Benchmark — Speech · Local ASR/STT benchmarking
danielrosehill/Local-ASR-STT-Benchmark View on GitHub
Hebrew Image Generation Eval — Image · Hebrew text rendering in AI image generation
danielrosehill/Hebrew-Image-Generation-Eval View on GitHub
Bias Censorship Eval Tests — LLM · Testing for bias and censorship in LLMs
danielrosehill/Bias-Censorship-Eval-Tests View on GitHub

Quick Reference: Experiments

Voice Cloning Difference Test — Speech · How training data duration affects voice cloning quality
danielrosehill/Voice-Cloning-Difference-Test View on GitHub
Text Cleanup Fine-Tuning Set — Speech · Dataset for training AI to clean up STT transcripts
danielrosehill/Text-Cleanup-Fine-Tuning-Set View on GitHub
Voice Cleanup Prompt Experiment — Speech · Comparing OpenAI vs Gemini for transcript cleanup
danielrosehill/Voice-Cleanup-Prompt-Experiment View on GitHub
Impact Bond Policy Simulator — Multi-Agent · Simulating stakeholder reactions to policy proposals
danielrosehill/Impact-Bond-Policy-Simulator View on GitHub
Peace In The Middle East — Multi-Agent · AI simulation of geopolitical dialogue
danielrosehill/Peace-In-The-Middle-East View on GitHub
Weird AI Experiment Ideator — Multi-Agent · Blind multi-pass review for generating experiment ideas
danielrosehill/Weird-AI-Experiment-Ideator View on GitHub
LLM Long Codegen Test — LLM · Testing long-form code generation
danielrosehill/LLM-Long-Codegen-Test View on GitHub
Single Shot Brevity Training — LLM · Training for concise responses
danielrosehill/Single-Shot-Brevity-Training View on GitHub

AI Agent Development

Agent Workflows & Patterns

Agent Handover Demo — View Repo · Demonstration of agent handover patterns · 2025
danielrosehill/Agent-Handover-Demo View on GitHub
Agent Network Expander Template — View Repo · Template for expanding agent networks · 2025
danielrosehill/Agent-Network-Expander-Template View on GitHub
Agent Task Repo Pattern With MCP — View Repo · Repository pattern for agent tasks using MCP · 2025
danielrosehill/Agent-Task-Repo-Pattern-With-MCP View on GitHub
AI Agent UN — View Repo · AI agent unified namespace · 2025
danielrosehill/AI-Agent-UN View on GitHub
AI Agent Workspace Spec — View Repo · Agent workspace specification · Mar 2025
danielrosehill/AI-Agent-Workspace-Spec-310325 View on GitHub
Weird AI Experiment Ideator — View Repo · CrewAI multi-agent system using blind multi-pass review to generate creative AI experiment ideas · 2025
danielrosehill/Weird-AI-Experiment-Ideator View on GitHub

Development Templates

AI Assistant Template — View Repo · Template for AI assistant development · 2025
danielrosehill/AI-Assistant-Template View on GitHub
AI Dev Prompts Example — View Repo · Example development prompts for AI · 2025
danielrosehill/AI-Dev-Prompts-Example View on GitHub
AI Development Template — View Repo · General AI development template · 2025
danielrosehill/Ai-Development-Template View on GitHub

LLM Evaluation & Benchmarking

LLM Capabilities & Testing

Bias Censorship Eval Tests — View Repo · Evaluation tests for bias and censorship · 2025
danielrosehill/Bias-Censorship-Eval-Tests View on GitHub
LLM Evaluation Prompts — View Repo · Prompts for evaluating LLMs · 2025
danielrosehill/LLM-Evaluation-Prompts View on GitHub
Assistant Self Ideation — View Repo · Demo of AI "self ideation" in practice · Mar 2025
danielrosehill/Assistant-Self-Ideation-280325 View on GitHub

LLM Experiments

LLM Experiment Notebook — View Repo · Notebook of LLM experiments · 2025
danielrosehill/LLM-Experiment-Notebook View on GitHub
LLM Long Codegen Test — View Repo · Testing long-form code generation · 2025
danielrosehill/LLM-Long-Codegen-Test View on GitHub
LLM Max Token Length — View Repo · Maximum token length exploration · Feb 2025
danielrosehill/LLM-Max-Token-Length-0225 View on GitHub
Single Shot Brevity Training — View Repo · Training for concise single-shot responses · 2025
danielrosehill/Single-Shot-Brevity-Training View on GitHub
One Prompt AI Book — View Repo · Experiment in generating content from single prompts · 2025
danielrosehill/One-Prompt-AI-Book View on GitHub
Long AI Prompting Experiment — View Repo · Experiments testing extended prompts · 2025
danielrosehill/Long-AI-Prompting-Experiment View on GitHub

Hugging Face Spaces

Single Shot Brevity Training — View Space · Brevity training interface · 2025
LLM Long Code Generation Experiment — View Space · Long-form code generation experiment · 2025
Max Output Tokens Analysis — View Space · Maximum output tokens analysis · Feb 2025

Speech-to-Text & Audio Processing

STT Benchmarks & Evaluation

Local ASR STT Benchmark — View Repo · Local ASR and STT benchmarking · 2025
danielrosehill/Local-ASR-STT-Benchmark View on GitHub
Long Form Audio Eval — View Repo · Evaluation of long-form audio transcription · 2025
danielrosehill/Long-Form-Audio-Eval View on GitHub
Personal STT Benchmarking — View Repo · Personal speech-to-text benchmarking · 2025
danielrosehill/Personal-STT-Benchmarking View on GitHub
STT Voice Note Evaluation — View Repo · Evaluation of STT for voice notes · 2025
danielrosehill/STT-Voice-Note-Evaluation View on GitHub
Whisper WPM Background Noise Eval — View Repo · Evaluating how speaking pace and background noise affect Whisper ASR accuracy · 2025
danielrosehill/Whisper-WPM-Background-Noise-Eval View on GitHub
Whisper Fine-Tune Accuracy Eval — View Repo · GUI tool for comparing fine-tuned vs original Whisper models using WER metrics with whisper.cpp/Vulkan acceleration · 2025
danielrosehill/Whisper-Fine-Tune-Accuracy-Eval View on GitHub
One-Shot Transcription Microphone Eval — View Repo · Microphone benchmarking for STT; found that environment matters more than equipment cost across 10 mics and 15 samples · 2025
danielrosehill/One-Shot-Transcription-Microphone-Eval View on GitHub
Transcription Cleanup Eval — View Repo · Evaluates cloud audio models (GPT-4o, Gemini, Voxtral, Qwen) on single-step transcription with cleanup · Dec 2025
danielrosehill/Transcription-Cleanup-Eval-1225 View on GitHub
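
Several of the evaluations above report accuracy as word error rate (WER). As a rough illustration only, and not the code used in any of these repositories, WER can be sketched as the word-level edit distance between a reference transcript and a model's output, divided by the number of reference words. The function name `word_error_rate`, the lowercase-only normalization, and the sample sentences are assumptions made for this example; real evaluations typically also strip punctuation and may rely on a dedicated library.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    # Assumption: lowercase + whitespace split is the only normalization applied.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts, invented purely for illustration.
reference = "the quick brown fox jumps"
stock_output = "the quick brown box jumps"
fine_tuned_output = "the quick brown fox jumps"

print(word_error_rate(reference, stock_output))       # 0.2
print(word_error_rate(reference, fine_tuned_output))  # 0.0
```

A lower value is better, and because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.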

Hugging Face Spaces

Single Podcast ASR Eval — View Space · Single podcast ASR evaluation · 2025
STT Comparison — View Space · Speech-to-text comparison tool · 2025
Local STT Eval One Sample — View Space · Local STT evaluation with single samples · 2025
Whisper Fine-Tune Eval — View Space · Interactive evaluation of fine-tuned Whisper models · 2025
Single Shot ASR Eval — View Space · Single-shot ASR evaluation tool · 2025

Hugging Face Datasets

Podcast ASR Evaluation — View Dataset · Dataset for podcast ASR evaluation · 2025
Whisper Fine-Tune One Shot Eval — View Dataset · WER and accuracy evaluation comparing fine-tuned Whisper (Tiny, Base, Small, Medium) vs stock models on 1 hour of audio, inference on Modal A100 · 2025

Audio Samples & Resources

Microphone Audio Samples — View Repo · Collection of microphone audio samples · 2025
danielrosehill/Microphone-Audio-Samples View on GitHub

Audio Processing Experiments

Crying Baby Audio Scrub — View Repo · Audio processing for baby noise removal · 2025
danielrosehill/Crying-Baby-Audio-Scrub View on GitHub
Audio Context Pipeline Model — View Repo · Notes and model for audio context pipeline · Apr 2025
danielrosehill/Audio-Context-Pipeline-Model-0425 View on GitHub
Voice Cleanup Prompt Experiment — View Repo · Comparing OpenAI (Whisper+GPT-4) vs Gemini for transcript cleanup · 2025
danielrosehill/Voice-Cleanup-Prompt-Experiment View on GitHub
Voice Cloning Difference Test — View Repo · Experiment testing how training data duration (1/3/5 min) affects one-shot voice cloning quality · 2025
danielrosehill/Voice-Cloning-Difference-Test View on GitHub
Text Cleanup Fine-Tuning Set — View Repo · Dataset and tooling for training AI to automatically clean up STT transcripts · 2025
danielrosehill/Text-Cleanup-Fine-Tuning-Set View on GitHub

Image Generation & Visual AI

Image Generation Evaluation

Hebrew Image Generation Eval — View Repo · Evaluation of AI image generation models for Hebrew text rendering · 2025
danielrosehill/Hebrew-Image-Generation-Eval View on GitHub

Specialized Applications

Multi-Agent Simulations

Impact Bond Policy Simulator — View Repo · CrewAI multi-agent framework simulating stakeholder reactions to Pay-for-Success impact bond proposals · 2025
danielrosehill/Impact-Bond-Policy-Simulator View on GitHub
Peace In The Middle East — View Repo · Experimental multi-agent AI system simulating geopolitical dialogue with state and non-state actors · 2025
danielrosehill/Peace-In-The-Middle-East View on GitHub

OSINT & Intelligence

OSINT Missile Intelligence Agent — View Repo · OSINT-focused intelligence agent · 2025
danielrosehill/OSINT-Missile-Intelligence-Agent View on GitHub

Data Analysis

GHG EBITDA Correlations — View Repo · Analysis of greenhouse gas and EBITDA correlations · 2025
danielrosehill/GHG-EBITDA-Correlations View on GitHub

Testing & Documentation

Test Repositories

Test Markdown Docs — View Repo · Test repository for markdown documentation · 2025
danielrosehill/Test-Markdown-Docs View on GitHub
Test System Prompts — View Repo · Test repository for system prompts · 2025
danielrosehill/Test-System-Prompts View on GitHub

Related Subindexes

Speech & ASR Evaluations — View Repo · Comprehensive index of speech recognition and ASR evaluation studies
danielrosehill/Speech-And-ASR-Evaluations View on GitHub

Note: This is a focused index covering experimental AI/LLM development projects. For a higher-level collection of all repository indexes and other projects, see the GitHub Master Index.

danielrosehill/Github-Master-Index View on GitHub

Author

Daniel Rosehill · Contact: public@danielrosehill.com · Website: danielrosehill.com

Proof of Concepts — AI Self-Ideation

AI Agent Ideation Agent — View Repo · Ideation agent that generates ideas for AI agents · 2025
danielrosehill/AI-Agent-Ideation-Agent View on GitHub
AI Assistant Ideator — View Repo · Streamlit app for ideating AI assistants · 2025
danielrosehill/AI-Assistant-Ideator View on GitHub
Claude Space Self-ideator — View Repo · Claude Code generating ideas for new applications of Claude Code workspaces · 2025
danielrosehill/Claude-Space-Self-ideator View on GitHub

Proof of Concepts — Context & Interview Workflows

Agentic Context Development Interview Demo — View Repo · Demonstration of chained LLM agent workflow for generating personal contextual data · 2025
danielrosehill/Agentic-Context-Development-Interview-Demo View on GitHub
AI Interview Workflow V2 — View Repo · AI interview workflow, version 2 · 2025
danielrosehill/AI-Interview-Workflow-V2 View on GitHub
My LLM Context Repo Public — View Repo · A context repo for experimenting with LLMs (public version) · 2025
danielrosehill/My-LLM-Context-Repo-Public View on GitHub

Proof of Concepts — Multi-Agent Panels & Decision Making

AI Agent Virtual Panel Configs — View Repo · Sets of "panels" for testing virtual AI persona voting bodies and thinking groups · 2025
danielrosehill/AI-Agent-Virtual-Panel-Configs View on GitHub
Claude AI Conference — View Repo · AI experiment: panel + TTS, mini conference/symposium · 2025
danielrosehill/Claude-AI-Conference View on GitHub
Panel Of Claude — View Repo · Exploratory Claude model: multiple agents mimicking a panel debate · 2025
danielrosehill/Panel-Of-Claude View on GitHub
Claude Change My View — View Repo · CMV with AI (pattern/template) · 2025
danielrosehill/Claude-Change-My-View View on GitHub
Claude Decision Evaluation Framework — View Repo · Claude Code model for decision evaluation · 2025
danielrosehill/Claude-Decision-Evaluation-Framework View on GitHub

Proof of Concepts — Research & Report Generation

All About MCP — View Repo · Example repository for agentic, AI-generated long-form reports · 2025
danielrosehill/All-About-MCP View on GitHub
Claude Deep Research Model — View Repo · Repo model for an iterative deep research model with voice pipeline · 2025
danielrosehill/Claude-Deep-Research-Model View on GitHub
Claude Georeaction Researcher — View Repo · Claude template for analysing global sentiment/reaction to a geopolitical issue · 2025
danielrosehill/Claude-Georeaction-Researcher View on GitHub
Geopol Forecaster POC — View Repo · Experimental prediction analysis for real world events · 2025
danielrosehill/Geopol-Forecaster-POC View on GitHub

Proof of Concepts — Other

AI Resume — View Repo · Notes/test for creating a resume/CV specifically intended for AI agents · 2025
danielrosehill/AI-Resume View on GitHub
Claude Agent Picker Pattern — View Repo · Pattern/idea for the "too many subagents" problem · 2025
danielrosehill/Claude-Agent-Picker-Pattern View on GitHub
Policy Visualiser — View Repo · Visualise how different countries approach policy challenges with Gemini identifying clusters · 2025
danielrosehill/Policy-Visualiser View on GitHub
System Prompt Factory — View Repo · A system prompt generation UI combining model and user characteristics · 2025
danielrosehill/System-Prompt-Factory View on GitHub
System Prompt Generation Configurations — View Repo · System prompts for using AI tools to generate and improve system prompts · 2025
danielrosehill/System-Prompt-Generation-Configurations View on GitHub
The Jerusalem Odyssey Text — View Repo · A 100 page book manuscript generated from a single prompt using Sonnet 3.7 · 2025
danielrosehill/The-Jerusalem-Odyssey-Text View on GitHub

Additional Evaluations

Gemini 3.1 Lite Audio Understanding Eval — View Repo · Voice recording for testing with TTS/cloning · 2025
danielrosehill/Gemini-31-Lite-Audio-Understanding-Eval View on GitHub

Additional Experiments

Better Web Design Inc — View Repo · Agentic web design crew specialising in offbeat designs and client dissatisfaction! · 2025
danielrosehill/Better-Web-Design-Inc View on GitHub
Gemini Body Language Analyst — View Repo · Test app "vibe coded" in Google AI Studio: analyse body language from photo plus context · 2025
danielrosehill/Gemini-Body-Language-Analyst View on GitHub
LLM Detective — View Repo · Agent that tries to probe other models' capabilities with conversation · 2025
danielrosehill/LLM-Detective View on GitHub
LLM Wars — View Repo · (Experiment) LLMs argue "who's better" in a podcast · 2025
danielrosehill/LLM-Wars View on GitHub
Natural Language Relationship Definition — View Repo · Experiment: trying to create a database schema using natural language · 2025
danielrosehill/Natural-Language-Relationship-Definition View on GitHub
No BS AI System Prompt — View Repo · System prompt for a blunt AI assistant that gets to the point · 2025
danielrosehill/No-BS-AI-System-Prompt View on GitHub
No Wheel Inventions — View Repo · Slash command for Claude to encourage avoiding reinventing the wheel · 2025
danielrosehill/No-Wheel-Inventions View on GitHub
Two AIs Talk — View Repo · Experiment: two AI agents, each one thinks the other is a liar · 2025
danielrosehill/Two-AIs-Talk View on GitHub
Weird AI Bots — View Repo · Some configurations for offbeat AI roleplay characters just for fun · 2025
danielrosehill/Weird-AI-Bots View on GitHub

Data Visualization

Agentic AI Architecture Visualisation — View Repo · Framework-agnostic data model and visualizations mapping the moving pieces of agentic AI systems · 2025
danielrosehill/Agentic-AI-Architecture-Visualisation View on GitHub