YouTube’s AI enabled to describe sound effects

PanARMENIAN.Net - YouTube has used algorithms to automatically caption speech for eight years now in an effort to make its billions of videos more accessible for the deaf and hard of hearing. While the feature was pretty rough at first, YouTube has significantly improved it over time, getting "closer and closer to human transcription error rates," Google said in its developers blog. Since speech is just one part of the audio picture, though, YouTube has now launched automatic sound effect captioning for the first time, Engadget says.

For now, the system can show only three classes of sounds: applause, music and laughter. "These were among the most frequent manually captioned sounds, and they can add meaningful context for viewers who are deaf and hard of hearing," the company wrote.

As with the automatic captions, Google uses machine learning to pick out sounds and display them as text. It developed a "deep neural network (DNN)" model for ambient sound, and trained it with "thousands of hours of videos" to get the best results. The toughest part, it wrote in a technical blog post, was separating and displaying events that tend to occur at the same time, like laughter and applause.

---