This led to improved recognition of distinctive accents, varied background noise, and technical jargon.
“The primary intended users of Whisper models are AI researchers studying robustness, capabilities, biases, generalization, and constraints of the current model. However, Whisper is also potentially useful as an automatic speech recognition solution for developers, especially for English speech recognition,” OpenAI said in the GitHub repository notes for Whisper. Anyone can download Whisper from GitHub; it is entirely free to use.
The models show strength
Also in the repo, OpenAI wrote: “The models show strong ASR results in about 10 languages. They may exhibit additional capabilities if fine-tuned on certain tasks, like voice activity detection, speaker diarization, and speaker classification, but have not been robustly evaluated in these areas.”
There are some limits
Whisper’s limitations show up in particular areas, such as text prediction. Because the system was trained on a great deal of “noisy” data, OpenAI cautions that Whisper might include words in its transcriptions that weren’t actually spoken. This may stem from the model simultaneously trying to predict the next word in the audio and trying to transcribe the audio itself.
Furthermore, Whisper doesn’t perform equally well across languages. The system suffers a higher error rate for speakers of languages that are underrepresented in its training data.
There is the ever-present problem of racial bias
This is not something new to the world of ASR, unfortunately. Biases have long plagued even the best systems: a 2020 study from Stanford found that ASR systems from big tech companies – Amazon, Apple, Google, Microsoft, and IBM – made far fewer errors, an error rate of about 19%, for users who were white than for users who were Black.