A developer built erm, a CLI tool that removes verbal disfluencies like "um" and "uh" from audio recordings. The obvious approach of using Whisper for transcription and cutting filler words only works about 60% of the time and sounds worse than the original. The tool uses faster-whisper with the medium.en model and runs locally.
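As a rough illustration of the "obvious approach" described above, the sketch below turns word-level timestamps into the spans of audio to keep once filler words are cut. It is not erm's actual implementation: the `keep_intervals` function, the `FILLERS` set, and the `(text, start, end)` tuple layout are all hypothetical names chosen for this example. It also assumes the timestamps come from a transcription run with word timestamps enabled, which faster-whisper can provide.

```python
import string

# Hypothetical filler list for illustration; a real tool would likely need more.
FILLERS = {"um", "uh", "erm"}


def keep_intervals(words, duration, fillers=FILLERS):
    """Return the (start, end) spans to keep after cutting filler words.

    words: list of (text, start, end) tuples sorted by start time, in seconds.
    duration: total length of the recording, in seconds.
    """
    keep = []
    cursor = 0.0  # start of the span currently being kept
    for text, start, end in words:
        # Normalise tokens such as " um," to "um" before comparing.
        token = text.strip().strip(string.punctuation).lower()
        if token in fillers:
            if start > cursor:
                keep.append((cursor, start))
            cursor = max(cursor, end)
    if cursor < duration:
        keep.append((cursor, duration))
    return keep


# Assumed input shape: one tuple per transcribed word.
words = [("So", 0.0, 0.3), (" um,", 0.3, 0.7), (" hello", 0.7, 1.2)]
print(keep_intervals(words, duration=1.5))
```

Hard cuts at these boundaries depend entirely on the accuracy of the word timestamps and leave no room for smoothing the joins, which may be part of why the naive result can sound worse than the original.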