Google’s AI can now lip read better than humans

November 24, 2016 - 18:18 AMT

PanARMENIAN.Net - Researchers from Google’s AI division DeepMind and the University of Oxford have used artificial intelligence to create the most accurate lip-reading software ever. Using thousands of hours of TV footage from the BBC, scientists trained a neural network to annotate video footage with 46.8 percent accuracy. That might not seem that impressive at first, especially compared to AI accuracy rates when transcribing audio, but tested on the same footage, a professional human lip-reader was only able to get the right word 12.4 percent of the time, The Verge reports.

The research follows similar work published by a separate group at the University of Oxford earlier this month. Using related techniques, these scientists were able to create a lip-reading program called LipNet that achieved 93.4 percent accuracy in tests, compared to 52.3 percent human accuracy. However, LipNet was only tested on specially recorded footage of volunteers speaking formulaic sentences. By comparison, DeepMind’s software, known as “Watch, Listen, Attend, and Spell”, was tested on far more challenging footage: natural, unscripted conversations from BBC politics shows.

More than 5,000 hours of footage from TV shows including Newsnight, Question Time, and the World Today were used to train the program. The videos included 118,000 different sentences and some 17,500 unique words, compared to LipNet’s test database of just 51 unique words.

DeepMind’s researchers suggest that the program could have a host of applications, including helping hearing-impaired people understand conversations. It could also be used to annotate silent films, or allow you to control digital assistants like Siri or Alexa by just mouthing words to a camera (handy if you’re using the program in public).

But when most people learn that an AI program has learned how to lip-read, their first thought is how it might be used for surveillance. Researchers say that there’s still a big difference between transcribing brightly lit, high-resolution TV footage and grainy CCTV video with a low frame rate, but you can’t ignore the fact that artificial intelligence seems to be closing this gap.