Automatic Speech Recognition & Mixing it with IVR

When companies first began using Automatic Speech Recognition (ASR) to direct callers through their automated menus, callers were constantly frustrated because early speech recognition was crude and very limited in its capability.

Though it has vastly improved since the original versions and is used by more businesses all the time, there is still some debate as to whether it is effective. In fact, most callers who are forced to interact with it would probably tell you that they would rather never hear that automated customer service robot again.


How has speech recognition improved?

ASR now uses advanced algorithms to process the data it gathers from your voice, filter out background noise, and even learn your voice patterns in real time, all of which allow the program to understand what you’re saying with far greater accuracy than the original versions.

Speech recognition programs are also making customer service calls more efficient and informative by integrating with Customer Information Systems (CIS). This means that the program can be linked to proprietary databases that hold all of a customer’s information, enabling it to pull up a caller’s file using their incoming caller ID, so it knows who you are by the time the call is answered. The program can even be configured to check your recent browsing activity on the company’s website to predict why you may be calling.
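To make the caller ID lookup more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `CUSTOMERS` dictionary stands in for a real CIS database, and the names `CustomerRecord`, `lookup_caller`, and `predict_reason` are hypothetical, not part of any particular product. A real system would query its own database and would likely use a far more sophisticated way of guessing why someone is calling.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CustomerRecord:
    """Hypothetical customer file, as a CIS might store it."""
    name: str
    account_id: str
    recent_pages: list = field(default_factory=list)  # oldest first


# Stand-in for a CIS database, keyed by phone number (made-up data).
CUSTOMERS = {
    "+15555550100": CustomerRecord(
        name="Pat Example",
        account_id="A-1001",
        recent_pages=["/billing/invoices", "/billing/payment-methods"],
    ),
}


def normalize_number(raw: str) -> str:
    """Reduce a caller ID to digits; assume a US number if it has 10 digits."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    if len(digits) == 10:
        digits = "1" + digits
    return "+" + digits


def lookup_caller(caller_id: str) -> Optional[CustomerRecord]:
    """Return the caller's file, or None if the number is unknown."""
    return CUSTOMERS.get(normalize_number(caller_id))


def predict_reason(record: Optional[CustomerRecord]) -> Optional[str]:
    """Guess the call topic from the first path segment of the last page viewed."""
    if record is None or not record.recent_pages:
        return None
    last_page = record.recent_pages[-1]
    return last_page.strip("/").split("/")[0] or None


caller = lookup_caller("(555) 555-0100")
if caller is not None:
    print(f"Hello {caller.name}, are you calling about {predict_reason(caller)}?")
```

In this sketch, a caller whose number is not on file simply gets `None` back, which is the point where a real auto attendant would fall back to its standard greeting.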


Why do businesses want to use ASR?

Businesses are always eager to embrace new technology when it promises to streamline systems, save money, or even eliminate the costly human element altogether. What a business aims to accomplish by adding speech recognition to its IVR (Interactive Voice Response, or “auto attendant”) is essentially to replace a human being with a computer program, which saves the business a great deal of money in customer service wages.

What they are hoping is that the technology will continue to get better, until it actually becomes easier and more efficient than speaking to an actual person. It’s just not there yet, which brings us to the downside of ASR.


Most people still hate it.

Despite advances in the technology, most people still consider it an annoyance because the bottom line is that if a customer is calling in, they probably want to speak to a person. One of the problems with a standard auto attendant is that it quite often does not provide a menu option that corresponds with the customer’s reason for calling, which is very frustrating.

Current speech recognition programs have gone through an evolution in the past few years, and the aforementioned issue is one of the things ASR is supposed to solve. The program tells the caller, “You can say things like…”, indicating that you can say whatever you want, and it will connect you with the appropriate department. The problem is that it frequently misunderstands what you’re saying, unless what you say sounds close to one of the phrases it is programmed to recognize. This is why we all end up desperately yelling, “Representative!”


Why can’t ASR communicate with us effectively?

ASR processes our speech by looking for sound patterns and cues, and trying to fit the data it gathers from our speech into pre-programmed logic and categories. A computer program has little trouble picking out the words that are programmed into its memory; what it struggles with is interpreting what we mean when we don’t use exactly those words in the order that it expects.
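A rough sketch of that "fit it into a category" step might look like the following. It assumes the speech has already been turned into text, and it uses simple string similarity from Python's standard `difflib` module as a stand-in for whatever matching a real ASR product does. The phrase lists, the `route_utterance` name, and the 0.6 cutoff are all made up for illustration.

```python
import difflib
from typing import Optional

# Hypothetical phrases the menu is programmed to recognize, by department.
INTENT_PHRASES = {
    "billing": ["pay my bill", "billing question", "check my balance"],
    "support": ["technical support", "my service is not working"],
    "agent": ["representative", "speak to a person"],
}


def route_utterance(transcript: str, cutoff: float = 0.6) -> Optional[str]:
    """Return the department whose phrase is most similar to the transcript.

    Returns None when nothing scores at or above the cutoff, which is the
    "Sorry, I didn't understand that" case.
    """
    text = transcript.lower().strip()
    best_intent, best_score = None, 0.0
    for intent, phrases in INTENT_PHRASES.items():
        for phrase in phrases:
            # ratio() is 1.0 for identical strings and falls toward 0.0
            # as the two strings share fewer characters in order.
            score = difflib.SequenceMatcher(None, text, phrase).ratio()
            if score > best_score:
                best_intent, best_score = intent, score
    return best_intent if best_score >= cutoff else None


print(route_utterance("pay my bill"))
print(route_utterance("Representative!"))
print(route_utterance("I was charged twice last month"))
```

The last call is the interesting one: the caller clearly has a billing problem, but because the sentence does not resemble any of the programmed phrases, this sketch gives up and returns `None`. That is the gap between recognizing words and understanding meaning.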


The complexity of the human voice.

The human voice is staggeringly complex and nuanced, which alone makes it challenging for a computer to process, but the way we communicate with each other goes well beyond the words we use. In addition to our words, we rely on body language, facial expressions, and even the space between words, not to mention volume and emphasis, all working together to convey our message. This is precisely why speech recognition is still a long way from conversing with us in a natural and effective manner.

The bottom line is that when we call a business, or any other organization, what we really want is to speak to a person. But since a computer program is way cheaper than a person, it is becoming increasingly difficult to get through to one.

Although speech recognition is way better than it used to be, we are still filled with dread when we hear the robot voice because we know it may take a while to accomplish the thing for which we are calling. Until either the ASR programs are abandoned, or they truly begin to communicate like a human does, automated customer service will continue to be a challenge.


Stephanie is the Marketing Director at Talkroute and has been featured in Forbes, Inc, and Entrepreneur as a leading authority on business and telecommunications.

Stephanie is also the chief editor and contributing author for the Talkroute blog helping more than 100k entrepreneurs to start, run, and grow their businesses.
