Researchers from Imperial College London have developed a new deep learning model for visual speech recognition in multiple languages. The new model outperforms some previously proposed models trained on larger datasets.
Pingchuan Ma, a Ph.D. graduate from Imperial College London, and his colleagues noticed that most visual speech recognition projects deal only with the English language. Their objective was to recognize speech in languages other than English from the lip movements of speakers and then compare the results with models trained to recognize English speech.
The new model created by the Imperial College researchers is architecturally similar to earlier speech recognition models, but the researchers optimized some of its hyperparameters, augmented the training dataset, and added extra loss functions. Through this careful design of the deep learning model, Ma and his colleagues achieved state-of-the-art performance in visual speech recognition.
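The article mentions that extra loss functions were added alongside the main training objective. A common way to do this is to sum a primary loss with a weighted auxiliary loss; the sketch below illustrates the general idea in plain Python. The function names, the use of cross-entropy, and the weighting value are illustrative assumptions, not details from the researchers' actual code.

```python
import math

# Hypothetical sketch: combining a primary loss with a weighted
# auxiliary loss, a common pattern when "additional loss functions"
# are used during training. Names and weights are assumptions.

def cross_entropy(probs, target_idx):
    """Negative log-likelihood of the target class."""
    return -math.log(probs[target_idx])

def combined_loss(probs, target_idx, aux_probs, aux_weight=0.3):
    """Primary loss plus a weighted auxiliary prediction loss.

    probs      -- main head's class probabilities
    aux_probs  -- auxiliary head's class probabilities
    aux_weight -- assumed weighting factor for the auxiliary term
    """
    primary = cross_entropy(probs, target_idx)
    auxiliary = cross_entropy(aux_probs, target_idx)
    return primary + aux_weight * auxiliary

# Example: a confident main head and a less confident auxiliary head.
loss = combined_loss([0.7, 0.2, 0.1], 0, [0.5, 0.3, 0.2])
print(round(loss, 4))
```

In practice the auxiliary term is computed from an extra prediction head and discarded at inference time; only the weighted sum drives the gradient updates during training.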
Some deep learning models in the past have achieved strong performance on visual speech recognition tasks, but they were trained primarily to recognize English speech, as most existing training datasets include only English. This limits their use to English-speaking contexts. The new model by the London researchers, in contrast, can operate in many languages.
In the future, the new model could inspire other researchers to develop alternative visual speech recognition systems that effectively recognize speech from the lip movements of speakers.