Machine learning technique boosts lip-reading accuracy

An emerging “lip-reading” technology is likely to transform modern-day crime solving by deciphering speech captured visually on CCTV, a new research study has claimed.

Sophisticated “visual speech recognition” technology developed by a team of University of East Anglia researchers in Norwich can reportedly detect, with a surprisingly high degree of precision, what people say in different situations. It is aimed in particular at camera footage for which adequate audio is unavailable.

According to Helen Bear, a computer scientist at the university and a researcher on the project, the technology could be applied in a wide range of scenarios, from “criminal investigations” to entertainment.

According to lip-reading experts, the fundamental process of lip reading entails recognizing a pattern of shapes formed by the human mouth and matching them to particular words or sequences of words, which makes it a daunting task even for many experienced lip-reading professionals. It is also believed to be more challenging than the audio speech recognition that is common today. However, experts are confident that rapid strides made by technology in recent times will enable researchers to tackle these challenges.
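The matching problem described above can be illustrated with a minimal Python sketch. It is not the researchers' method. The mouth-shape classes (often called visemes), the phoneme table, and the five-word lexicon are all hypothetical simplifications chosen for illustration. The sketch relies only on the well-known fact that sounds such as "p", "b", and "m" look alike on the lips, so several words can share one sequence of mouth shapes.

```python
from collections import defaultdict

# Hypothetical, heavily simplified phoneme-to-viseme table (illustration only).
# Sounds in the same class are assumed to look the same on the lips.
PHONEME_TO_VISEME = {
    "p": "BILABIAL", "b": "BILABIAL", "m": "BILABIAL",
    "f": "LABIODENTAL", "v": "LABIODENTAL",
    "t": "ALVEOLAR", "d": "ALVEOLAR", "n": "ALVEOLAR",
    "ae": "OPEN_VOWEL",
}

# Toy pronunciation lexicon (assumed for this example).
LEXICON = {
    "pat": ["p", "ae", "t"],
    "bat": ["b", "ae", "t"],
    "mat": ["m", "ae", "t"],
    "fat": ["f", "ae", "t"],
    "vat": ["v", "ae", "t"],
}


def viseme_sequence(phonemes):
    """Convert a phoneme sequence to the mouth-shape sequence a camera would see."""
    return tuple(PHONEME_TO_VISEME[p] for p in phonemes)


def build_viseme_index(lexicon):
    """Group words by their viseme sequence."""
    index = defaultdict(list)
    for word, phonemes in lexicon.items():
        index[viseme_sequence(phonemes)].append(word)
    return index


def candidate_words(observed_visemes, index):
    """Return every word consistent with the observed mouth shapes, sorted."""
    return sorted(index.get(tuple(observed_visemes), []))


INDEX = build_viseme_index(LEXICON)

if __name__ == "__main__":
    observed = ["BILABIAL", "OPEN_VOWEL", "ALVEOLAR"]
    print(candidate_words(observed, INDEX))
```

Run as a script, this prints `['bat', 'mat', 'pat']`: one observed sequence of mouth shapes fits three different words. A practical system would need context or a language model to choose among them, which is part of why visual speech recognition is considered harder than audio speech recognition.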

Most experts agree that the most fundamental challenge in automated lip reading is face and lip recognition. And despite considerable headway in this area, recognizing, extracting, and categorizing the geometric features of the lips during speech remains an exceedingly complicated task.

Lip-reading techniques involve understanding speech by visually interpreting the movements of the lips, face, and tongue, primarily in the absence of sound. Lip readers also rely on context, understanding of the language, and any residual hearing. Historically a key communication technique for people who are deaf or who rely on hearing aids, lip-reading is a widely recognized skill employed by people of all ages with any kind of hearing loss.

According to contributing researcher Richard Harvey, machine lip-reading technology can now successfully differentiate between speech sounds, allowing a more precise interpretation of what is said.

A few years ago, German researchers at the Karlsruhe Institute of Technology claimed to have introduced a lip-reading phone that allowed soundless communication, a development billed as a massive leap forward for speech technology.

They introduced software that enabled the device to pick up the subtle movements of users' mouths as they spoke and translate them into decipherable speech for the person on the other end of the call. The process, based on the principle of electromyography (the acquisition and recording of electrical potentials generated by muscle activity), was thought to be a radical game-changer.

The present research is part of a three-year project supported by the Engineering and Physical Sciences Research Council (EPSRC). The researchers presented their work at the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) in Shanghai.