The Tech Innovators Series is supported by Lenovo. Lenovo does not just manufacture technology. They make Do machines — super-powered creation engines designed to help the people who do, do more, do better, do in brand new ways.
Speech and voice recognition technology has been around for half a century, but it’s still far from mainstream. Where are the machines that write down what we say? Where are the appliances that simply work based on voice commands?
The answer is that speech technology is a tough business. Not only does a machine have to recognize words, but it has to process accents, sentence structure, grammar, language, noise and other factors that help a machine distinguish “pain” from “pane” and a television from a human being.
That’s changing, though, thanks in no small part to Google’s efforts in developing voice recognition tech. Since 2008, Google has been steadily releasing products that turn voice into text and voice into action. It started with Google Search by Voice for Android, and most recently voice search debuted as an alternative way to search for content on the desktop.
Starting with Mobile
In February 2009, Google introduced Google Search by Voice for Android. This was back when Android ran on the T-Mobile G1, a device with only 192 MB of RAM and Android 1.6. And while Google Search by Voice was far from perfect in those early days, it was still a technological leap for voice recognition. Here was a handheld that could dial and search by voice, something never successfully done at scale before.
Google’s speech recognition technology really took off, though, with the introduction of Google Voice Actions. “It was a transformative moment for us,” says Vincent Vanhoucke of Google’s speech recognition team. Voice Actions gave users the ability to send texts, make calls, send emails, get directions and play songs without ever having to type.
While it took years for Google to get voice recognition good enough for voice-activated commands, the effort is paying off. Vanhoucke says that there has been a sixfold increase in voice traffic as the microphone button has become more ubiquitous across Google’s Android devices.
The challenge of perfecting voice search on mobile wouldn’t even compare to the difficulties Vanhoucke and the speech recognition team faced with desktop voice search, though.
The Difficulties of Voice Recognition on Desktops
Google has received a lot of press recently for bringing Google Voice Search to the desktop. As with Google Search by Voice for Android, Google picks up what the user says through the microphone, quickly works out what was said and returns results in a matter of seconds.
And this technology was not easy to develop. “In fact, doing voice search on the desktop is a lot more difficult than doing it on mobile. No user would expect that,” says Vanhoucke, who leads Google’s acoustic modeling effort. He outlines two key differences between mobile and desktop search:
- Cell phones are designed for voice input. Desktop microphones are not as sensitive.
- If you’re talking to a microphone in your laptop, you’re sitting far away instead of talking right next to it as you would with a mobile phone. This creates a big difference in noise level and ambient sound.
The result is that new algorithms had to be written to account for increased ambient noise and decreased voice clarity. The problem is only compounded by the thousands of languages and approximately 230 billion words Google Search by Voice will eventually have to deal with.
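To get a feel for why distance matters so much, here is a rough back-of-the-envelope sketch in Python. It is not Google’s method. It assumes an idealized free-field room in which speech level falls off with the inverse-square law (about 6 dB per doubling of distance) while background noise at the microphone stays constant. The function names and the decibel figures are illustrative assumptions, not measurements.

```python
import math


def speech_level_db(level_at_ref_db, ref_distance_m, distance_m):
    """Speech level at distance_m, assuming free-field inverse-square falloff."""
    return level_at_ref_db - 20 * math.log10(distance_m / ref_distance_m)


def snr_db(speech_db, noise_db):
    """Signal-to-noise ratio in dB: speech level minus noise level."""
    return speech_db - noise_db


# Illustrative numbers only (assumptions, not data from Google).
NOISE_DB = 40.0          # steady room noise reaching the microphone
SPEECH_AT_5CM_DB = 80.0  # speech level 5 cm from the speaker's mouth

# Phone held about 5 cm from the mouth versus a laptop mic about 50 cm away.
phone_snr = snr_db(speech_level_db(SPEECH_AT_5CM_DB, 0.05, 0.05), NOISE_DB)
laptop_snr = snr_db(speech_level_db(SPEECH_AT_5CM_DB, 0.05, 0.50), NOISE_DB)

print(f"phone SNR:  {phone_snr:.1f} dB")
print(f"laptop SNR: {laptop_snr:.1f} dB")
```

Under these assumptions, sitting ten times farther from the microphone costs 20 dB of signal-to-noise ratio before microphone quality is even considered. Real rooms add echoes and other effects that this sketch ignores.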
Given all of those challenges, it’s astounding that voice search works at all.
The Broader Implications
Voice recognition technology has made some major advances, thanks to Google, but what will that mean for society?
For Google, it means the ubiquity of data. “Speech is another part of the accessibility and ubiquity story of being able to input information on any device,” Vanhoucke argues. He doesn’t believe speech will ever replace touchscreens or keyboards, but he says it makes it possible to access and input data while on the go (e.g., while driving).
Now that voice recognition is available in our pockets, Google’s vision of opening up new ways of inputting information and communicating is coming true. Much of what has to be done now, Vanhoucke argues, is perfecting the system’s accuracy and speed so that it can “deliver on the Star Trek promise of being able to just talk everywhere and anywhere, and we still have a lot of work to do there.” As voice recognition gets more accurate, it becomes more useful.
One thing to watch out for is Android@Home, the search giant’s framework for controlling light switches, alarm clocks and other home appliances through Android devices. The goal is to make the home a smart and connected device.
As Google Android already has voice command technology built in, it’s not inconceivable that Google or a third party could add voice commands for home appliances. It may not be long until we say “lights on” and “oven heat up 400 degrees” into our Android devices and assume vocal control over everyday household objects.
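As a purely hypothetical illustration of that last step, the Python sketch below turns an already-transcribed phrase into a structured appliance command. It does not use Android@Home or any Google API. The `Command` type, the `parse_command` function and the small device and action vocabularies are all invented for this example, and the hard part (turning audio into text) is assumed to have happened already.

```python
import re
from typing import NamedTuple, Optional


class Command(NamedTuple):
    device: str
    action: str
    value: Optional[int]  # e.g. a temperature, when one is spoken


# Hypothetical vocabulary, invented for this sketch.
DEVICES = {"lights", "oven", "alarm"}
ACTIONS = {"on": "on", "off": "off", "heat up": "heat"}


def parse_command(transcript: str) -> Optional[Command]:
    """Map text such as 'lights on' to a Command, or None if unrecognized."""
    words = transcript.lower().split()
    if not words or words[0] not in DEVICES:
        return None
    device, rest = words[0], " ".join(words[1:])
    for phrase, action in ACTIONS.items():
        # Match the action as a whole phrase, not as a prefix of another word.
        if rest == phrase or rest.startswith(phrase + " "):
            number = re.search(r"\d+", rest)
            value = int(number.group()) if number else None
            return Command(device, action, value)
    return None


print(parse_command("lights on"))
print(parse_command("oven heat up 400 degrees"))
print(parse_command("play some music"))
```

The first two phrases come straight from the examples above and parse into a device, an action and (for the oven) the number 400. The third is not in the toy vocabulary, so the function returns `None`. A real system would need far more forgiving matching than this.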
It might be some time until we’re giving voice commands to robot butlers and having philosophical conversations with our computers, but thanks to Google and other technology companies, the voice barrier to a smarter future is being torn down. In its place is a system for voice search and voice commands that, while still far from perfect, is improving faster than ever.