Microsoft to use AI to automatically describe images for the blind

December 3, 2016 - 12:35 AMT

PanARMENIAN.Net - Microsoft on Friday, December 2 said that, starting in early 2017, its Word and PowerPoint applications will be able to automatically come up with descriptions of photos that users add into documents. Office 365 subscribers will see it first in Word and PowerPoint for PCs, VentureBeat reports.

Ordinarily, if you drop a photo into PowerPoint you can type out an “Alt Text” title and description for the photo. But not everyone does that when they’re making slide decks. Then, when a blind person opens the slide deck, they won’t be able to understand what’s going on in the picture, which could make the slide or the entire deck more difficult to fully grasp.

Microsoft wants to change that. So it’s chosen to automate the process of making Alt Text for photo, drawing on its Cognitive Services Computer Vision application programming interface (API). “Through machine learning, this service will keep improving as more people use it, saving you significant time to make media-rich presentations accessible,” John Jendrezak, accessibility lead and partner director of program management for Microsoft’s Office Engineering team, wrote in a blog post. To use this feature, you’ll simply have to right-click on a photo and select “Automatic Alt Text.”

The technology uses a type of artificial intelligence (AI) called deep learning to recognize objects in photos and then figure out the best words to explain the photos in its entirety. From there, operating systems’ screen readers can read the captions aloud.

Deep learning generally involves training artificial neural networks on lots of data, such as photos, and then getting the neural networks to make inferences about new data. This is a method that has caught on at Apple, Facebook, Google, and Twitter, as well as Microsoft.

In fact, earlier this year Facebook did something similar to what Microsoft is doing. It started automatically generating captions of photos that people share so that when blind people are scrolling through the News Feed on iOS, the VoiceOver screen reader embedded in iOS can quickly read out automatically generated captions. As a result, blind people can then understand what people are talking about in the text that people include with the pictures they post, as well as the comments that users make.

Twitter recently made it possible for users to manually write captions for images they post. While Twitter does do deep learning research, it hasn’t offered to automatically generate captions for images.

Also in the blog post Jendrezak noted that Microsoft will be merging the separate fields for title and description. That way, he wrote, “you have no confusion about where to enter alt-text.” Currently to add that information to an image, you have to open the Format pane, choose the Size & Properties tab (in PowerPoint; if you’re using Word, the tab is called Layout & Properties), and select the Alt Text dropdown.