When planning or implementing an IVR system, there are a number of terms and acronyms that you’re likely to encounter. Here are some of the most commonly used:
Automated Speech Recognition (ASR)
Speech recognition, or ASR, is the application that allows the IVR to understand speech. In reality, what it’s doing is transcribing: it translates your spoken words to text, and then reads that text to work out what you want to do. Our close partner LumenVox is a leader in this space; read more about them here.
Containment
In today’s world, we may know more about the word containment than we should (if you are reading this in 2050 or beyond, please use whatever search engine is trendy by then to look up COVID-19). Moving on, containment is how successful the IVR is at allowing you to self-serve. If 100 people call in to an IVR, 80 of those callers get what they need from the IVR and hang up, and 20 need to speak to an agent, then we would say we have 80% containment.
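The arithmetic behind that 80% figure can be sketched in a few lines of Python. This is only an illustration; the function name `containment_rate` and its parameters are made up for this example, not taken from any IVR product.

```python
def containment_rate(total_calls: int, transferred_to_agent: int) -> float:
    """Return the percentage of calls handled entirely by the IVR."""
    if total_calls <= 0:
        raise ValueError("total_calls must be positive")
    if not 0 <= transferred_to_agent <= total_calls:
        raise ValueError("transferred_to_agent must be between 0 and total_calls")
    # A call is "contained" if the caller never needed a live agent.
    contained = total_calls - transferred_to_agent
    return 100 * contained / total_calls


# The example from the text: 100 callers, 20 of whom needed an agent.
print(containment_rate(100, 20))  # 80.0
```

In practice, how you count a "contained" call (for example, whether a caller who hangs up in frustration counts) is a reporting decision you would need to agree on first.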
Directed Dialog (or Guided Dialog)
Any IVR that tells you what to say is trying to guide you. “Press or say 1” and “Say sales, service, or finance” are both guided prompts. The system is able to receive speech inputs, but only very specific ones, and is therefore guiding (or directing) you.
Grammars
Think of grammars as the rules used to determine what information the IVR should expect to receive. Let’s say you are about to give a policy number for your superyacht marine insurance. The IVR might have a grammar telling it to expect a policy number that is 10 characters long, where the 4th character is a P or a K and the rest are digits 0-9. These rules help the IVR process what you have said. Any opportunity to let a system know definitively whether it should expect a C, a T or a Z, which all sound similar when spoken over the phone, will improve the success of the IVR.
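To make the policy-number rule concrete, here is a minimal sketch in Python. Real speech platforms usually express grammars in their own formats (the W3C SRGS standard is one example), so the regular expression here is just a stand-in, and the names `POLICY_NUMBER`, `is_valid_policy_number` and `best_match` are hypothetical.

```python
import re

# Stand-in for the grammar described above: 10 characters,
# the 4th is P or K, and every other character is a digit 0-9.
POLICY_NUMBER = re.compile(r"[0-9]{3}[PK][0-9]{6}")


def is_valid_policy_number(candidate: str) -> bool:
    """Check a transcribed string against the policy-number rule."""
    return POLICY_NUMBER.fullmatch(candidate.upper()) is not None


def best_match(hypotheses):
    """Return the first recognizer guess that satisfies the rule, or None."""
    for guess in hypotheses:
        if is_valid_policy_number(guess):
            return guess.upper()
    return None


# "T" and "P" sound alike on a phone line; the rule rules out the "T" guess.
print(best_match(["123T456789", "123P456789"]))  # 123P456789
```

The point of the sketch is the second function: when the recognizer is torn between similar-sounding letters, the rule lets it discard guesses that could never be a valid policy number.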
Intents
Intents are the reasons a user might want to interact with an IVR. For example, when you call in to your bank, you may have any number of intents: Check Balance, Replacement Card, New Account. Each of these is an intent, and each intent usually consists of another term we use: utterances.
Natural Language Processing (NLP)
An IVR that uses NLP is one that allows you to speak to it naturally. An IVR that lets you say “I want to book a flight from San Francisco to Las Vegas” is using NLP.
Self-Service
Most IVRs have one purpose: to keep you away from a live agent. It’s expensive to speak to a live person, and generally, the questions agents field are simple and can be handled by a machine. Machines are cheap unless you’re buying a Mac.
So to dress up this idea that companies would prefer you not to speak to an agent, they have a fancy term called self-service, or the hope that you will self-serve. For me, I’ve always thought the term self-serve was great…it reminds me of soft-serve which equates to sugar, which makes me happy. I think the marketing people that came up with the term self-serve knew this.
Now, an IVR is only effective at keeping you away from an agent if it can respond to your needs. If you are looking to rebook a flight, confirm that a bank payment has cleared, or find out where your order is in processing, IVRs are ideal for responding to you with that information.
Text To Speech (TTS)
Text to speech does exactly what it says on the tin (UK inside joke). It reads text you have provided and synthesizes speech. Now you may think of the robotic voice of these applications from movies or from the late genius Stephen Hawking, or you may see more futuristic
Touch-tone (or DTMF)
If you’ve ever accidentally pressed a number key during a call and heard some beeps or tones, that is DTMF. Behind the scenes, these differently pitched tones equate to the characters 0-9, A-D, * and #. There is a whole bunch of science that you can read all about here, but for now, just know that it works, and it’s an open standard that is well understood.
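For the curious, the "science" boils down to each key playing two tones at once: one for its row on the keypad and one for its column. The sketch below builds that lookup in Python using the standard DTMF frequencies; the names `DTMF_TONES` and `tones_for` are just illustrative.

```python
# Standard DTMF frequencies in Hz: one low tone per keypad row,
# one high tone per keypad column.
ROW_HZ = (697, 770, 852, 941)
COL_HZ = (1209, 1336, 1477, 1633)

# Keypad layout, row by row. The A-D column exists in the standard
# but is missing from most consumer phones.
KEYPAD = ("123A", "456B", "789C", "*0#D")

DTMF_TONES = {
    key: (ROW_HZ[r], COL_HZ[c])
    for r, row in enumerate(KEYPAD)
    for c, key in enumerate(row)
}


def tones_for(key: str) -> tuple:
    """Return the (low, high) frequency pair played for a key."""
    return DTMF_TONES[key.upper()]


print(tones_for("5"))  # (770, 1336)
print(tones_for("#"))  # (941, 1477)
```

An IVR does the reverse: it listens for a pair of these frequencies and maps it back to the key you pressed.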
Tuning
Tuning is the process of taking your IVR application data and interrogating it to decide which application changes will improve accuracy, containment and caller experience. For example, if you find a lot of interactions are failing because you are expecting callers to say “cellphone” but a number of them are saying “mobile”, you would tune your application based on that data and add “mobile” as an utterance in the grammar.
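A first pass at tuning can be as simple as counting what callers said when the IVR failed to understand them. Here is a minimal sketch, assuming you can export the failed utterances as a list of strings; the function name `top_unrecognized` and the sample data are invented for illustration.

```python
from collections import Counter


def top_unrecognized(failed_utterances, limit=3):
    """Count the most frequent utterances the IVR failed to match."""
    # Normalize case and stray whitespace so "Mobile " and "mobile" count together.
    normalized = (u.strip().lower() for u in failed_utterances)
    return Counter(normalized).most_common(limit)


# Made-up sample of utterances that did not match the grammar.
failed = ["mobile", "Mobile", "my cell", "mobile ", "landline"]
print(top_unrecognized(failed, 1))  # [('mobile', 3)]
```

Anything that shows up at the top of this list repeatedly is a good candidate to add to the grammar.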
Web Service
Typically, to understand that it is Bob calling in to his favorite airline, a lookup is done against his caller ID. To determine whether he has a flight today or tomorrow, a web service has to be called and respond with that data.
A web service is a bit like a database lookup. The “web” portion of the term indicates that it’s simply a request that’s formed a bit like a web address, i.e. the IVR calls a web service such as https://getmycallerID.acme.corp and sends the caller ID, and the service could respond with a wealth of information such as first name, last name, frequent flyer information and upcoming flights in the next 48 hours.
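Here is a rough sketch of the two halves of such a lookup in Python: forming the request address, and reading the reply. Everything specific is an assumption for illustration. The address is the made-up one from the example above, and the `callerId` parameter and the JSON field names are hypothetical, since a real service defines its own.

```python
import json
from urllib.parse import urlencode

# The example address from the text; not a real endpoint.
BASE_URL = "https://getmycallerID.acme.corp"


def build_lookup_url(caller_id: str) -> str:
    """Form the web-address-style request for a caller ID lookup."""
    # "callerId" is a hypothetical parameter name.
    return f"{BASE_URL}/?{urlencode({'callerId': caller_id})}"


def parse_caller_profile(body: str) -> dict:
    """Pull the fields the IVR cares about out of a JSON reply."""
    data = json.loads(body)
    return {
        "first_name": data.get("firstName"),
        "last_name": data.get("lastName"),
        "upcoming_flights": data.get("upcomingFlights", []),
    }


# A canned reply standing in for what the service might send back.
sample_reply = '{"firstName": "Bob", "lastName": "Smith", "upcomingFlights": ["SFO-LAS"]}'

print(build_lookup_url("+14155550100"))
print(parse_caller_profile(sample_reply)["first_name"])  # Bob
```

The sketch deliberately stops short of making the network call, so it runs anywhere; in a real IVR, the platform would send the request and hand the reply back to your call flow.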
Utterances
An utterance is any input from a caller. It could be a single word, a phrase or a string of characters. Typically, when you build an intent, there are many variations you want to add to it: “what is my bank balance”, “am I overdrawn” and “how much money in the vault” are all possible variations a caller could utter.
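One way to picture the relationship between intents and utterances is as a simple lookup table. The sketch below is a toy that only handles exact matches, which is far cruder than what a real speech platform does, and the intent names, the Replacement Card utterances and the function `match_intent` are all made up for illustration.

```python
# Each intent lists the utterances that should trigger it.
INTENTS = {
    "CheckBalance": [
        "what is my bank balance",
        "am i overdrawn",
        "how much money in the vault",
    ],
    # Hypothetical utterances, not from any real deployment.
    "ReplacementCard": [
        "i lost my card",
        "i need a replacement card",
    ],
}


def match_intent(utterance: str):
    """Return the intent whose utterance list contains this input, or None."""
    # Lowercase and collapse whitespace before comparing.
    normalized = " ".join(utterance.lower().split())
    for intent, examples in INTENTS.items():
        if normalized in examples:
            return intent
    return None


print(match_intent("Am I overdrawn"))       # CheckBalance
print(match_intent("book me a flight"))     # None
```

The more variations you list under each intent, the fewer callers fall through to the `None` case.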
Vocabulary
A vocabulary is a list of all words and phrases in your grammars that could be utterances from a caller.
So there you have it! If you have come across any other terms that you’d like us to explain, please leave a comment below.
Chris has been working in UC/CX for 20 years. He’s passionate about project excellence but also believes a project is only worth doing if you have fun doing it.