Experiments & Evaluations Index
Repository Index
This repository serves as an index of experimental projects, evaluations, proof-of-concepts, templates, patterns, and exploratory ideas related to AI/LLM development and workflows.
About this Index
This collection brings together various experimental repositories exploring AI agent workflows, LLM capabilities, evaluation frameworks, and development patterns. These repositories represent hands-on experiments, proof-of-concepts, benchmarking efforts, and reusable templates for AI-driven development.
Quick Reference: Evaluations
Whisper Fine-Tune Accuracy Eval — Speech · Smaller models improve with fine-tuning; larger models degrade unless handling code-switching
danielrosehill/Whisper-Fine-Tune-Accuracy-Eval
One-Shot Transcription Microphone Eval — Speech · Environment matters more than equipment cost for STT accuracy
danielrosehill/One-Shot-Transcription-Microphone-Eval
Transcription Cleanup Eval — Speech · Compares cloud models on single-step transcription + cleanup
danielrosehill/Transcription-Cleanup-Eval-1225
Whisper WPM Background Noise Eval — Speech · Speaking pace and background noise impact on Whisper accuracy
danielrosehill/Whisper-WPM-Background-Noise-Eval
Long Form Audio Eval — Speech · Long-form audio transcription evaluation
danielrosehill/Long-Form-Audio-Eval
Local ASR STT Benchmark — Speech · Local ASR/STT benchmarking
danielrosehill/Local-ASR-STT-Benchmark
Hebrew Image Generation Eval — Image · Hebrew text rendering in AI image generation
danielrosehill/Hebrew-Image-Generation-Eval
Bias Censorship Eval Tests — LLM · Testing for bias and censorship in LLMs
danielrosehill/Bias-Censorship-Eval-Tests
Quick Reference: Experiments
Voice Cloning Difference Test — Speech · How training data duration affects voice cloning quality
danielrosehill/Voice-Cloning-Difference-Test
Text Cleanup Fine-Tuning Set — Speech · Dataset for training AI to clean up STT transcripts
danielrosehill/Text-Cleanup-Fine-Tuning-Set
Voice Cleanup Prompt Experiment — Speech · Comparing OpenAI vs Gemini for transcript cleanup
danielrosehill/Voice-Cleanup-Prompt-Experiment
Impact Bond Policy Simulator — Multi-Agent · Simulating stakeholder reactions to policy proposals
danielrosehill/Impact-Bond-Policy-Simulator
Peace In The Middle East — Multi-Agent · AI simulation of geopolitical dialogue
danielrosehill/Peace-In-The-Middle-East
Weird AI Experiment Ideator — Multi-Agent · Blind multi-pass review for generating experiment ideas
danielrosehill/Weird-AI-Experiment-Ideator
LLM Long Codegen Test — LLM · Testing long-form code generation
danielrosehill/LLM-Long-Codegen-Test
Single Shot Brevity Training — LLM · Training for concise responses
danielrosehill/Single-Shot-Brevity-Training
AI Agent Development
Agent Workflows & Patterns
Agent Handover Demo — View Repo · Demonstration of agent handover patterns · 2025
danielrosehill/Agent-Handover-Demo
Agent Network Expander Template — View Repo · Template for expanding agent networks · 2025
danielrosehill/Agent-Network-Expander-Template
Agent Task Repo Pattern With MCP — View Repo · Repository pattern for agent tasks using MCP · 2025
danielrosehill/Agent-Task-Repo-Pattern-With-MCP
AI Agent UN — View Repo · AI agent unified namespace · 2025
danielrosehill/AI-Agent-UN
AI Agent Workspace Spec — View Repo · Agent workspace specification · Mar 2025
danielrosehill/AI-Agent-Workspace-Spec-310325
Weird AI Experiment Ideator — View Repo · CrewAI multi-agent system using blind multi-pass review to generate creative AI experiment ideas · 2025
danielrosehill/Weird-AI-Experiment-Ideator
Development Templates
AI Assistant Template — View Repo · Template for AI assistant development · 2025
danielrosehill/AI-Assistant-Template
AI Dev Prompts Example — View Repo · Example development prompts for AI · 2025
danielrosehill/AI-Dev-Prompts-Example
AI Development Template — View Repo · General AI development template · 2025
danielrosehill/Ai-Development-Template
LLM Evaluation & Benchmarking
LLM Capabilities & Testing
Bias Censorship Eval Tests — View Repo · Evaluation tests for bias and censorship · 2025
danielrosehill/Bias-Censorship-Eval-Tests
LLM Evaluation Prompts — View Repo · Prompts for evaluating LLMs · 2025
danielrosehill/LLM-Evaluation-Prompts
Assistant Self Ideation — View Repo · Demo of AI "self ideation" in practice · Mar 2025
danielrosehill/Assistant-Self-Ideation-280325
LLM Experiments
LLM Experiment Notebook — View Repo · Notebook of LLM experiments · 2025
danielrosehill/LLM-Experiment-Notebook
LLM Long Codegen Test — View Repo · Testing long-form code generation · 2025
danielrosehill/LLM-Long-Codegen-Test
LLM Max Token Length — View Repo · Maximum token length exploration · Feb 2025
danielrosehill/LLM-Max-Token-Length-0225
Single Shot Brevity Training — View Repo · Training for concise single-shot responses · 2025
danielrosehill/Single-Shot-Brevity-Training
One Prompt AI Book — View Repo · Experiment in generating content from single prompts · 2025
danielrosehill/One-Prompt-AI-Book
Long AI Prompting Experiment — View Repo · Experiments with extended prompts · 2025
danielrosehill/Long-AI-Prompting-Experiment
Hugging Face Spaces
Single Shot Brevity Training — View Space · Brevity training interface · 2025
LLM Long Code Generation Experiment — View Space · Long-form code generation experiment · 2025
Max Output Tokens Analysis — View Space · Maximum output tokens analysis · Feb 2025
Speech-to-Text & Audio Processing
STT Benchmarks & Evaluation
Local ASR STT Benchmark — View Repo · Local ASR and STT benchmarking · 2025
danielrosehill/Local-ASR-STT-Benchmark
Long Form Audio Eval — View Repo · Evaluation of long-form audio transcription · 2025
danielrosehill/Long-Form-Audio-Eval
Personal STT Benchmarking — View Repo · Personal speech-to-text benchmarking · 2025
danielrosehill/Personal-STT-Benchmarking
STT Voice Note Evaluation — View Repo · Evaluation of STT for voice notes · 2025
danielrosehill/STT-Voice-Note-Evaluation
Whisper WPM Background Noise Eval — View Repo · Evaluating how speaking pace and background noise affect Whisper ASR accuracy · 2025
danielrosehill/Whisper-WPM-Background-Noise-Eval
Whisper Fine-Tune Accuracy Eval — View Repo · GUI tool for comparing fine-tuned vs original Whisper models using WER metrics with whisper.cpp/Vulkan acceleration · 2025
danielrosehill/Whisper-Fine-Tune-Accuracy-Eval
One-Shot Transcription Microphone Eval — View Repo · Microphone benchmarking for STT; found that environment matters more than equipment cost across 10 mics and 15 samples · 2025
danielrosehill/One-Shot-Transcription-Microphone-Eval
Transcription Cleanup Eval — View Repo · Evaluates cloud audio models (GPT-4o, Gemini, Voxtral, Qwen) on single-step transcription with cleanup · Dec 2025
danielrosehill/Transcription-Cleanup-Eval-1225
Hugging Face Spaces
Single Podcast ASR Eval — View Space · Single podcast ASR evaluation · 2025
STT Comparison — View Space · Speech-to-text comparison tool · 2025
Local STT Eval One Sample — View Space · Local STT evaluation with single samples · 2025
Whisper Fine-Tune Eval — View Space · Interactive evaluation of fine-tuned Whisper models · 2025
Single Shot ASR Eval — View Space · Single-shot ASR evaluation tool · 2025
Hugging Face Datasets
Podcast ASR Evaluation — View Dataset · Dataset for podcast ASR evaluation · 2025
Whisper Fine-Tune One Shot Eval — View Dataset · WER and accuracy evaluation comparing fine-tuned Whisper (Tiny, Base, Small, Medium) vs stock models on 1 hour of audio, inference on Modal A100 · 2025
Audio Samples & Resources
Microphone Audio Samples — View Repo · Collection of microphone audio samples · 2025
danielrosehill/Microphone-Audio-Samples
Audio Processing Experiments
Crying Baby Audio Scrub — View Repo · Audio processing for baby noise removal · 2025
danielrosehill/Crying-Baby-Audio-Scrub
Audio Context Pipeline Model — View Repo · Notes and model for audio context pipeline · Apr 2025
danielrosehill/Audio-Context-Pipeline-Model-0425
Voice Cleanup Prompt Experiment — View Repo · Comparing OpenAI (Whisper+GPT-4) vs Gemini for transcript cleanup · 2025
danielrosehill/Voice-Cleanup-Prompt-Experiment
Voice Cloning Difference Test — View Repo · Experiment testing how training data duration (1/3/5 min) affects one-shot voice cloning quality · 2025
danielrosehill/Voice-Cloning-Difference-Test
Text Cleanup Fine-Tuning Set — View Repo · Dataset and tooling for training AI to automatically clean up STT transcripts · 2025
danielrosehill/Text-Cleanup-Fine-Tuning-Set
Image Generation & Visual AI
Image Generation Evaluation
Hebrew Image Generation Eval — View Repo · Evaluation of AI image generation models for Hebrew text rendering · 2025
danielrosehill/Hebrew-Image-Generation-Eval
Specialized Applications
Multi-Agent Simulations
Impact Bond Policy Simulator — View Repo · CrewAI multi-agent framework simulating stakeholder reactions to Pay-for-Success impact bond proposals · 2025
danielrosehill/Impact-Bond-Policy-Simulator
Peace In The Middle East — View Repo · Experimental multi-agent AI system simulating geopolitical dialogue with state and non-state actors · 2025
danielrosehill/Peace-In-The-Middle-East
OSINT & Intelligence
OSINT Missile Intelligence Agent — View Repo · OSINT-focused intelligence agent · 2025
danielrosehill/OSINT-Missile-Intelligence-Agent
Data Analysis
GHG EBITDA Correlations — View Repo · Analysis of greenhouse gas and EBITDA correlations · 2025
danielrosehill/GHG-EBITDA-Correlations
Testing & Documentation
Test Repositories
Test Markdown Docs — View Repo · Test repository for markdown documentation · 2025
danielrosehill/Test-Markdown-Docs
Test System Prompts — View Repo · Test repository for system prompts · 2025
danielrosehill/Test-System-Prompts
Related Subindexes
Speech & ASR Evaluations — View Repo · Comprehensive index of speech recognition and ASR evaluation studies
danielrosehill/Speech-And-ASR-Evaluations
Note: This is a focused index covering experimental AI/LLM development projects. For a higher-level collection of all repository indexes and other projects, see the GitHub Master Index.
danielrosehill/Github-Master-Index
Author
Daniel Rosehill
Contact: public@danielrosehill.com
Website: danielrosehill.com
Proof of Concepts — AI Self-Ideation
AI Agent Ideation Agent — View Repo · Ideation agent that generates ideas for AI agents · 2025
danielrosehill/AI-Agent-Ideation-Agent
AI Assistant Ideator — View Repo · Streamlit app for ideating AI assistants · 2025
danielrosehill/AI-Assistant-Ideator
Claude Space Self-ideator — View Repo · Claude Code generating ideas for new applications of Claude Code workspaces · 2025
danielrosehill/Claude-Space-Self-ideator
Proof of Concepts — Context & Interview Workflows
Agentic Context Development Interview Demo — View Repo · Demonstration of chained LLM agent workflow for generating personal contextual data · 2025
danielrosehill/Agentic-Context-Development-Interview-Demo
AI Interview Workflow V2 — View Repo · AI interview workflow, version 2 · 2025
danielrosehill/AI-Interview-Workflow-V2
My LLM Context Repo Public — View Repo · A context repo for experimenting with LLMs (public version) · 2025
danielrosehill/My-LLM-Context-Repo-Public
Proof of Concepts — Multi-Agent Panels & Decision Making
AI Agent Virtual Panel Configs — View Repo · Sets of "panels" for testing virtual AI persona voting bodies and thinking groups · 2025
danielrosehill/AI-Agent-Virtual-Panel-Configs
Claude AI Conference — View Repo · AI experiment: panel + TTS, mini conference/symposium · 2025
danielrosehill/Claude-AI-Conference
Panel Of Claude — View Repo · Exploratory Claude model: multiple agents mimicking a panel debate · 2025
danielrosehill/Panel-Of-Claude
Claude Change My View — View Repo · CMV with AI (pattern/template) · 2025
danielrosehill/Claude-Change-My-View
Claude Decision Evaluation Framework — View Repo · Claude Code model for decision evaluation · 2025
danielrosehill/Claude-Decision-Evaluation-Framework
Proof of Concepts — Research & Report Generation
All About MCP — View Repo · Example repository for long-form report generation by agentic AI · 2025
danielrosehill/All-About-MCP
Claude Deep Research Model — View Repo · Repo model for an iterative deep research model with voice pipeline · 2025
danielrosehill/Claude-Deep-Research-Model
Claude Georeaction Researcher — View Repo · Claude template for analysing global sentiment/reaction to a geopolitical issue · 2025
danielrosehill/Claude-Georeaction-Researcher
Geopol Forecaster POC — View Repo · Experimental prediction analysis for real world events · 2025
danielrosehill/Geopol-Forecaster-POC
Proof of Concepts — Other
AI Resume — View Repo · Notes/test for creating a resume/CV specifically intended for AI agents · 2025
danielrosehill/AI-Resume
Claude Agent Picker Pattern — View Repo · Pattern/idea for the "too many subagents" problem · 2025
danielrosehill/Claude-Agent-Picker-Pattern
Policy Visualiser — View Repo · Visualise how different countries approach policy challenges, with Gemini identifying clusters · 2025
danielrosehill/Policy-Visualiser
System Prompt Factory — View Repo · A system prompt generation UI combining model and user characteristics · 2025
danielrosehill/System-Prompt-Factory
System Prompt Generation Configurations — View Repo · System prompts for using AI tools to generate and improve system prompts · 2025
danielrosehill/System-Prompt-Generation-Configurations
The Jerusalem Odyssey Text — View Repo · A 100-page book manuscript generated from a single prompt using Sonnet 3.7 · 2025
danielrosehill/The-Jerusalem-Odyssey-Text
Additional Evaluations
Gemini 3.1 Lite Audio Understanding Eval — View Repo · Voice recording for testing with TTS/cloning · 2025
danielrosehill/Gemini-31-Lite-Audio-Understanding-Eval
Additional Experiments
Better Web Design Inc — View Repo · Agentic web design crew specialising in offbeat designs and client dissatisfaction! · 2025
danielrosehill/Better-Web-Design-Inc
Gemini Body Language Analyst — View Repo · Test app "vibe coded" in Google AI Studio: analyse body language from photo plus context · 2025
danielrosehill/Gemini-Body-Language-Analyst
LLM Detective — View Repo · Agent that tries to probe other models' capabilities with conversation · 2025
danielrosehill/LLM-Detective
LLM Wars — View Repo · (Experiment) LLMs argue "who's better" in a podcast · 2025
danielrosehill/LLM-Wars
Natural Language Relationship Definition — View Repo · Experiment: trying to create a database schema using natural language · 2025
danielrosehill/Natural-Language-Relationship-Definition
No BS AI System Prompt — View Repo · System prompt for a blunt AI assistant that gets to the point · 2025
danielrosehill/No-BS-AI-System-Prompt
No Wheel Inventions — View Repo · Slash command for Claude to encourage avoiding reinventing the wheel · 2025
danielrosehill/No-Wheel-Inventions
Two AIs Talk — View Repo · Experiment: two AI agents, each one thinks the other is a liar · 2025
danielrosehill/Two-AIs-Talk
Weird AI Bots — View Repo · Some configurations for offbeat AI roleplay characters just for fun · 2025
danielrosehill/Weird-AI-Bots
Data Visualization
Agentic AI Architecture Visualisation — View Repo · Framework-agnostic data model and visualizations mapping the moving pieces of agentic AI systems · 2025
danielrosehill/Agentic-AI-Architecture-Visualisation