By Matt Reynolds
Does Siri have trouble with your accent? A project is turning to crowdsourced voice donations to overcome this problem, and to iron out other biases inherent in voice recognition.
Voice assistants like Siri and Alexa are trained on huge databases of recorded speech. But if those don’t contain enough samples of a particular accent or dialect, the voice assistants will struggle to understand people who speak that way.
So Mozilla – the foundation behind the Firefox web browser – is turning to crowdsourcing to create voice recognition systems that avoid these problems. Through Project Common Voice, which launched last month, Mozilla aims to collect 10,000 hours of spoken English from people with a wide range of accents.
“Historically, most collections of speech and language tend to be more male and white and middle class,” says Rachael Tatman at the University of Washington. And while companies like Google and Amazon are getting better at balancing their data sets, particularly when it comes to gender, voice recognition systems still have a harder time understanding Americans who speak with certain accents, she says.
The software tends to work better for accents that have traditionally been seen as more prestigious, says Tatman. And the accents and dialects that tend to be under-represented in training data sets are typically associated with groups of people who are already marginalised in US society. That’s why voice software sometimes has lower recognition rates for African Americans, she says.
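To get a sense of how such gaps are measured, here is a minimal sketch in Python using the jiwer library; the results file and its column names are hypothetical, invented for illustration. It computes a word error rate (WER) for each accent group in a set of test transcripts, the standard way of comparing how often a recogniser mishears different speakers.

```python
# Minimal sketch: quantifying accent bias as word error rate (WER) per group.
# Assumes a hypothetical CSV of test utterances with columns:
#   reference  - the true transcript
#   hypothesis - what the recogniser produced
#   accent     - the speaker's self-reported accent
import pandas as pd
from jiwer import wer  # pip install jiwer

df = pd.read_csv("recognition_results.csv")  # hypothetical evaluation file

# A separate WER per accent group: a higher score means the recogniser
# misheard more words for speakers with that accent.
for accent, group in df.groupby("accent"):
    score = wer(group["reference"].tolist(), group["hypothesis"].tolist())
    print(f"{accent}: WER = {score:.2%}")
```

Studies like Tatman's compare exactly this kind of per-group score: a system that posts, say, 10 per cent WER for one accent and 25 per cent for another is doing a markedly worse job for the second group of speakers.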
Words fail me
The same may be true for female voices. Naomi Saphra at the University of Edinburgh, UK, uses voice recognition software to write code as she is unable to type. She has to change how she pronounces some words because the software she uses doesn’t always correctly recognise female-sounding voices. “I would love to see the area of speech recognition more democratised,” she says.
But as with most issues around algorithmic bias, there’s no easy fix.
At the moment, voice recognition data sets are mostly concentrated in the hands of a few companies, says Kelly Davis, who leads the Mozilla project. When people speak to Alexa or Google Voice, the interactions are logged, creating an ever-expanding database of voice data – for Amazon and Google. That strengthens these companies’ dominance in high-quality voice recognition, making it harder for competitors to develop voice assistants.
Alan Black at Carnegie Mellon University in Pennsylvania says the big voice recognition companies do want to make sure their software is accurate for a wide range of dialects and accents. However, if people with certain accents find a system doesn’t recognise them, that might deter future use. If they stop using the devices, the voice assistant companies will miss out on data from people with those accents, further skewing which voices Siri and Alexa can easily understand.
Once Mozilla has collected enough audio clips, the foundation plans to release them to allow anyone – even Google and Amazon – to train their own voice recognition system using machine learning. “The top people are really in universities,” says Black, but they don’t usually have access to very large voice-training data sets.
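As a rough illustration of what building on such a release might look like, here is a minimal sketch in Python; the file name, the tab-separated layout and the column names are assumptions made for the example, not details Mozilla has published. It balances the clips across accent groups before training, one simple way to avoid reproducing the skew described above.

```python
# Minimal sketch: preparing a crowdsourced voice corpus for training.
# Assumes (hypothetically) that the release pairs audio clips with a
# metadata table whose columns include: path, sentence, accent.
import pandas as pd

meta = pd.read_csv("common_voice_clips.tsv", sep="\t")  # hypothetical file

# Down-sample over-represented accents to the size of the smallest group,
# so no single accent dominates the training mix.
n = meta["accent"].value_counts().min()
balanced = (
    meta.groupby("accent", group_keys=False)
        .apply(lambda g: g.sample(n, random_state=0))
)

# `balanced` now pairs each clip's path with its transcript, ready to feed
# into whatever speech-to-text training pipeline a team prefers.
print(balanced[["path", "sentence", "accent"]].head())
```

Crude down-sampling throws data away; in practice a team might instead weight under-represented groups more heavily during training, but the balancing idea is the same.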
To add their voice to the project, people can visit the Common Voice website, record a clip of themselves reading a preselected sentence and add their demographic details. Visitors can also listen to other people’s recordings and verify that each one matches the sentence it was meant to capture.
All this will eventually help build software that can recognise a wide range of accents. At the moment, Mozilla is only accepting recordings in English, but if the project is successful, the foundation plans to launch similar initiatives in other languages.