Last week, Google parent Alphabet Inc (NASDAQ:GOOGL) released a report on a new milestone in artificial intelligence (AI) speech recognition that, according to the company, outperforms existing technology by 50%. Microsoft Corporation (NASDAQ:MSFT) did not want to be left behind, announcing a breakthrough of its own in this market. According to Xuedong Huang, Microsoft's chief speech scientist, the company's researchers achieved a word error rate (WER) of 6.3%, considered the lowest in the industry.
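For readers unfamiliar with the metric, WER is conventionally computed as the word-level edit distance (substitutions, insertions, and deletions) between a reference transcript and the recognizer's output, divided by the number of reference words. A minimal sketch of that standard calculation (not Microsoft's scoring code, which is not public) might look like this:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming edit-distance table.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all remaining reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return d[len(ref)][len(hyp)] / len(ref)


# One deleted word out of six reference words -> WER of about 16.7%.
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

On this scale, Microsoft's reported 6.3% means roughly one word in sixteen was substituted, dropped, or inserted relative to the human transcript.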
Microsoft recorded the benchmark rate by combining neural-network-based acoustic and language modeling on the US National Institute of Standards and Technology (NIST) 2000 Switchboard speech recognition task, a conversational telephone speech recognition test used as an industry standard.
A paper posted on arXiv.org describes the Microsoft researchers' findings:
“We describe Microsoft’s conversational speech recognition system, in which we combine recent developments in neural-network-based acoustic and language modeling to advance the state of the art on the Switchboard recognition task. Inspired by machine learning ensemble techniques, the system uses a range of convolutional and recurrent neural networks. I-vector modeling and lattice-free MMI training provide significant gains for all acoustic model architectures. Language model rescoring with multiple forward and backward running RNNLMs, and word posterior-based system combination provide a 20% boost. The best single system uses a ResNet architecture acoustic model with RNNLM rescoring, and achieves a word error rate of 6.9% on the NIST 2000 Switchboard task. The combined system has an error rate of 6.3%, representing an improvement over previously reported results on this benchmark task.”
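The "word posterior-based system combination" the abstract credits with part of the 20% boost amounts to merging the per-word confidence scores of several recognizers and keeping the word the ensemble agrees on most strongly. As a toy illustration only (the actual combination operates on aligned lattices, not simple dictionaries), the idea can be sketched as:

```python
from collections import defaultdict


def combine_word_posteriors(system_outputs: list[dict]) -> str:
    """Average each candidate word's posterior across systems and
    return the word with the highest combined score.

    `system_outputs` is a hypothetical format: one dict per recognizer,
    mapping candidate words for a single time slot to posterior
    probabilities.
    """
    totals = defaultdict(float)
    for posteriors in system_outputs:
        for word, prob in posteriors.items():
            totals[word] += prob
    n = len(system_outputs)
    averaged = {word: total / n for word, total in totals.items()}
    return max(averaged, key=averaged.get)


# Two of three systems lean toward "cat"; the ensemble picks it.
systems = [
    {"cat": 0.6, "hat": 0.4},
    {"cat": 0.5, "hat": 0.5},
    {"bat": 0.7, "cat": 0.3},
]
print(combine_word_posteriors(systems))  # -> cat
```

The intuition is the same one behind machine learning ensembles generally: individual models make partly uncorrelated mistakes, so averaging their confidences cancels some errors out.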
Last weekend, the international conference on speech communication and technology, Interspeech, was held in San Francisco. During the event, IBM proudly announced that it had reached a WER of only 6.6%. Over two decades ago, the error rate of the best published research system for computer speech recognition stood at 43%.
Experts say that these scientific achievements could open the door to greater possibilities. Computers may soon understand the words people say as well as another person would. That is precisely what Microsoft aims to accomplish: the company wants to make computer-human interaction more personal. The gradual integration of this technology can already be seen in applications like the Cortana personal assistant, Apple's Siri, Skype Translator, and other services that depend on speech and language recognition.
Microsoft's achievement also brings it closer to its ultimate goal of providing AI technology that can anticipate users' needs without waiting for a command. It also paves the way for Microsoft's plan to build intelligent systems that can see, hear, speak, and understand. If that happens, many everyday tasks could become easier.
Microsoft and IBM both attribute their advances in speech recognition to the use of deep neural networks for speech processing. Artificial neural networks are modeled loosely on how scientists believe the brain functions. For decades, researchers have sought ways to teach computers to perform tasks that come naturally to humans, such as speech comprehension and image recognition.