Voice recognition has been around for a long time, and the ability to talk to your computer or device conversationally and have it respond exactly as you want was perfected decades ago – in the movies. Real-life voice recognition technology hasn’t always worked quite as well. In the 90s, I struggled with Dragon Dictate, finally giving up with the conclusion that I could type much more quickly than the software could figure out my words. Things have improved since then – a lot – but I still get a good laugh now and then at the way Siri, Cortana and Google Voice misinterpret what they hear.
Here’s a recent transcript of a Google voicemail message:
“Hey Dad, This is wrong. Robertson just want to confirm that we can meet and greet is Wednesday at 1:30. It’s gonna be. You know the frankenstein round okay. I’m. S. Not sure it’s gonna be published in the sometimes they didn’t know anything about that, so I’m trying to get that dressed now I’m taking care of. Also, so we just need to get the word out this weekend. Okay, if you need to give me a shout. ON my phone with me bye bye.”
Luckily, the voicemail email also includes a link to the sound file so I can play it and find out what the caller really said. While this is an extreme example – the service handles some people’s speech much better than others’ – it illustrates the fact that we aren’t yet at the point where we can do everything by simply talking to our computers, a la Star Trek (and even on the Enterprise, you’ll notice that when the situation really got serious, Mr. Data’s fingers started flying over the LCARS touchscreen).
Voice input makes sense, though, here on earth as well as on a starship, given the increasing mobility of computing devices and especially the coming explosion of wearable tech. Typing into a tiny smart watch screen isn’t going to cut it. That doesn’t mean the transition is going to go smoothly. As someone whose Texas twang regularly gets mangled in translation from speech to text, I have mixed feelings about the day when I’ll have to wean myself from my keyboard. Nonetheless, I know talking to my tech gadgets is the future, so let’s look at where that technology has been, where it is today, and where it’s headed.
Thanks to my first summer jobs as a legal secretary and court reporter transcriptionist, I became a fast typist at a young age, and decades of working with computers have kept my keyboard skills intact. However, for those who can’t type 90 words per minute, voice input can be a godsend. And even top typists can be frustrated by the small on-screen keyboards on tablets and smart phones. I find myself talking to my phone more and more these days. And ironically, many courts now use STTR (Speech To Text Reporting) machines to record proceedings in place of the old shorthand machines.
Traditionally, there were two different types of voice recognition: voice command and dictation. The former is used to tell your computer what to do (“Open application,” “Save file” and so forth). The latter is used in speech-to-text applications, to compose documents or fill text fields in forms. Because commands are shorter and simpler, they’ve generally worked better. Dictation of complex sentences, especially by those with expansive vocabularies, proves a little more challenging.
In the olden days, you had to spend hours “training” your speech recognition program, reading pre-defined text to it so the system could analyze your voice. This training produced more accurate results, especially for those whose accents didn’t fit the software’s expectations, but the time involved in teaching the programs to understand you discouraged many people from using them. Many speech recognition apps today are speaker-independent and recognize a wide variety of pronunciations, making them more “turnkey.”
The lines between command and dictation have blurred somewhat with modern applications that allow you to “converse” with your device. When I tell Google Now to “look up voice recognition,” that’s a command, but what I’m basically doing is dictating a search term for it to “type” into the search engine. Note that although it might seem as if these voice-activated “assistants” are pretty smart, this technology really isn’t artificial intelligence. They don’t analyze your words and “understand” your requests; rather they send queries to search engines and respond with results that are translated to speech. This article, although a little outdated, does a pretty good job of explaining the difference.
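To make the command/dictation distinction concrete, here is a minimal Python sketch of how already-transcribed speech might be routed. It is only an illustration: the phrase table, the "look up" prefix rule, and the function name `route_utterance` are my own assumptions, not how Google Now or any other assistant actually works.

```python
# Hypothetical command phrases, borrowed from the examples in this article.
# A real assistant's vocabulary is far larger and is not documented here.
COMMANDS = {
    "open application": "OPEN_APP",
    "save file": "SAVE_FILE",
}

# Assumed trigger phrase for handing the rest of the utterance to a search engine.
SEARCH_PREFIX = "look up "


def route_utterance(text):
    """Classify transcribed speech as a command, a search, or plain dictation."""
    normalized = text.strip().lower().rstrip(".!?")

    # Short, fixed phrases map directly to an action.
    if normalized in COMMANDS:
        return ("command", COMMANDS[normalized])

    # "Look up X" is a command whose payload is really a dictated search term.
    if normalized.startswith(SEARCH_PREFIX):
        return ("search", normalized[len(SEARCH_PREFIX):])

    # Anything else is treated as text to be typed out as spoken.
    return ("dictation", text.strip())
```

Calling `route_utterance("Look up voice recognition")` returns `("search", "voice recognition")`, which mirrors the blurring described above: the utterance is a command, but most of it is simply dictated text passed along as a query.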
Intelligent or not, for some applications, voice input is not only convenient; it’s necessary. Using your phone’s built-in GPS navigation app is one example. Fiddling with typing an address into the Nav interface while you’re driving can be dangerous, and in some jurisdictions is illegal. In fact, Google Nav was the first app in which I started using voice input extensively, and I branched out from there.
One problem with voice input today is that there are so many competing technologies. It’s bad enough that each mobile OS vendor has its own unique voice program, but even on the same device you may find multiple means for using speech recognition. On my Galaxy Note 3 (and other Galaxy phones), Samsung has installed its own S Voice alongside Google Now, which comes built into Android. Since I prefer the latter, I’ve installed an app (S For Switch Voice) that makes Google Now the default instead of S Voice.
Voice input never caught on widely with desktop users, but that might change in the future. There have been rumors that Microsoft might include its Cortana voice-input personal assistant software in the next version of Windows (currently known as “Threshold” or Windows 9). Cortana (named after the fictional artificial intelligence in the Halo video games) has received some good reviews, and in fact, some in the industry are even asking if it’s going to be the killer app that saves Windows Phone.
As the Big Three tech players battle it out for dominance in the new voice-centric world, it’s becoming clear that speech recognition is destined to play a much bigger role in our computing lives in the coming years. Now if you’ll excuse me, I need to go have a chat with my phone.