Testing Whisper Transcriptions

30 January, 2023 10:20AM · 3 minute read

In my efforts to support the Podcasting 2.0 initiative, I’ve had a stab at developing transcripts for Causality. I first tried this back in 2018, before PC2.0, and again later that year using Dragon Dictate, with only very average results.

I then tried YouTube. For several years I’d been publishing auto-generated YouTube videos of Pragmatic and Causality episodes via LibSyn, and I realised I could extract the SRT files from YouTube. They were substantially better, but still had issues that took a significant amount of effort to bring up to a standard I would accept on TEN.

Using Subtitle Studio I was able to align, tweak and correct the files, then publish them to the site. All the plumbing had been built, so all I needed to do was add the SRT file and job done.
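For anyone unfamiliar with the format: an SRT file is plain text made up of numbered cues, each with a start/end timestamp line and one or more lines of caption text, with cues separated by blank lines. The following is a minimal Python sketch of reading one. It is not how Subtitle Studio or the TEN site actually handle SRTs; `parse_srt` and `Cue` are hypothetical names used only for illustration, and the sample cues are made up.

```python
import re
from dataclasses import dataclass


# Hypothetical type for illustration: one SRT cue.
@dataclass
class Cue:
    index: int
    start: str
    end: str
    text: str


# SRT timing line, e.g. "00:00:03,500 --> 00:00:07,250"
TIMING = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2},\d{3})")


def parse_srt(data: str) -> list[Cue]:
    """Hypothetical helper: split SRT text into cues separated by blank lines."""
    cues = []
    for block in re.split(r"\n\s*\n", data.strip()):
        lines = block.strip().splitlines()
        # A valid cue needs at least a numeric index line and a timing line.
        if len(lines) < 2 or not lines[0].strip().isdigit():
            continue
        match = TIMING.match(lines[1].strip())
        if not match:
            continue
        # Remaining lines are the caption text (may span several lines).
        cues.append(Cue(int(lines[0].strip()), match.group(1), match.group(2),
                        "\n".join(lines[2:])))
    return cues


# Made-up sample with two cues.
sample = """1
00:00:00,000 --> 00:00:03,500
This is the first cue.

2
00:00:03,500 --> 00:00:07,250
This is the second cue.
"""

for cue in parse_srt(sample):
    print(cue.index, cue.start, cue.end, cue.text)
```

Correcting a transcript mostly means editing the `text` of each cue while leaving the timings alone, which is why punctuation and capitalisation errors, rather than timing, dominate the editing effort described here.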

Unfortunately, correcting the numerous errors in the YouTube SRTs was a monstrous process. A 45 minute episode would take me 4-5 hours to fix. I edited episodes 3, 11, 17, 18, 22, 31, 32, 35, 36 and 47, and simply burned out. I lost many weekends and long nights, and the payoff was precisely zero. No listeners gave any feedback of any kind either way, and so…I stopped bothering.

Irrespective of whether you think I should just put up rubbish transcripts anyway, or how you choose to interpret the legal requirements for posting transcripts in your specific country, I still wanted to do this…so when I saw people discussing MacWhisper, a project built on OpenAI’s Whisper, I gave it a shot. The results were, frankly, transformative.

Whisper Comparison 1

The screenshot on the left is the YouTube SRT, the middle is Whisper’s, and the right is my corrected YouTube SRT. In the comparison above, note that with a dodgy name like “Chidgey” no-one gets it right except me. That’s fine; it’s a cross I have to bear, I guess. The first point of note: Whisper nails the punctuation almost every time! Every missed full stop in the YouTube SRT also means a missed capital letter on the word that follows, so getting the punctuation right is a huge time saver!

Whisper Comparison 2

Again, Whisper nails the capitalisation of the right words: in the example above, the title of the episode. It also inserts a comma that I arguably missed in mine. I also like that Whisper capitalises the date for me. Not too shabby.

Whisper Comparison 3

Here’s where I lost so much time in the past with the YouTube SRTs: company names. Whisper got the capitalisation and punctuation correct (yellow highlight), including the capitalisation of GCE (the company name) and the punctuation between sentences. The only tweak I would make is adding parentheses around “or GCE”; otherwise it’s perfect. Amazing!

Whisper Comparison 4

The last and hardest part is technical terminology, which is always a struggle for AI transcription. YouTube put “9 ue” when it should be “9UE”, but Whisper nails that once again. YouTube does recognise the next two terms, but doesn’t capitalise 8UE and 10UE, whereas Whisper does.

I’m just blown away by these results, and last night I began transcribing every Causality episode with MacWhisper using the Best accuracy model (2.8GB) to get the best result I can. I’ll work my way through the back catalogue, and hopefully SRT editing times will be significantly reduced so I can finally post word-accurate transcripts to TEN.