Automatic Speech Recognition in Voice Portals

What is speech recognition or automatic speech recognition?
Automatic speech recognition (ASR) can be defined as the process by which spoken words are converted to text. Many people confuse speech recognition with voice recognition. Voice recognition identifies the speaker, while automatic speech recognition recognises the words spoken. Speech recognition is mainly used for menu selection, whereas voice recognition is used to verify an authorised user.

How can automatic speech recognition be used in a voice portal?
A voice portal is a large IVR application that provides many value-added services to its users. Voice portals are normally operated by telecom operators to provide information about their services, to activate or deactivate services, and to offer utility services such as matrimonial listings, job search, music listening and download, jokes etc. Because a voice portal has a large amount of information to offer a caller, it needs to present long lists of options. But the DTMF digits (touch tones) available on a mobile phone or telephone keypad are limited. So automatic speech recognition is used as an alternative to touch-tone or DTMF input for selecting a menu option in the voice portal or IVR.

Advantages of Automatic Speech Recognition over DTMF in Voice Portal:
1. The caller can choose a menu option quickly without having to listen to a long, boring list of menus. The IVR can become truly interactive, since the caller answers it in the same human voice.
2. Using an IVR on a phone has always been a problem, as the caller has to take the phone away from the ear to look at the keypad and press a key. This can be quite irritating for a voice portal user. Implementing ASR removes this problem completely.
3. Changing a menu, introducing new options into a menu, or shuffling the options in a menu becomes easier, because the menu options are the words most commonly used by everybody for a particular option. In a touch-tone or DTMF menu, each option is assigned to a DTMF key or number. With ASR, it is assigned to the word itself.
4. Not many people are familiar with the # and * keys, as they are hardly used in normal operation, yet IVRs use them for some options in very long menus. ASR does not face this kind of problem.
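
Point 3 above can be illustrated with a minimal sketch. The service names and helper functions here are hypothetical, invented only for illustration, and do not come from any real IVR platform. The sketch shows that inserting a new option into a DTMF menu shifts the key of every later option, while a word-keyed ASR menu leaves the existing words untouched.

```python
# Hypothetical menu data: the service names are invented for illustration.
DTMF_MENU = {"1": "weather", "2": "jokes", "3": "music"}
ASR_MENU = {"weather": "weather", "jokes": "jokes", "music": "music"}


def route_dtmf(key, menu):
    """Return the service bound to a pressed key, or None if unassigned."""
    return menu.get(key)


def route_asr(word, menu):
    """Return the service bound to a recognised word, or None if unknown."""
    return menu.get(word.strip().lower())


def insert_dtmf_option(menu, position, service):
    """Insert a service at a 1-based position; later options move to new keys."""
    services = [menu[key] for key in sorted(menu, key=int)]
    services.insert(position - 1, service)
    return {str(i): name for i, name in enumerate(services, start=1)}


def add_asr_option(menu, word, service):
    """Add a spoken option; existing words keep their meaning."""
    updated = dict(menu)
    updated[word.lower()] = service
    return updated


new_dtmf = insert_dtmf_option(DTMF_MENU, 1, "cricket")
new_asr = add_asr_option(ASR_MENU, "cricket", "cricket")
print(route_dtmf("1", DTMF_MENU), "->", route_dtmf("1", new_dtmf))
print(route_asr("weather", ASR_MENU), "->", route_asr("weather", new_asr))
```

In this sketch, a caller who has memorised "press 1 for weather" now reaches a different service, whereas a caller who says "weather" is unaffected by the change.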

So, from a developer's point of view, ASR is very useful in a voice portal because it gives more flexibility in designing menus and presenting data. But from a user's point of view it may not be as user-friendly as touch tone. This is mainly due to the lack of 100% accuracy in recognising the words spoken by the caller. Wrong detection of words by ASR may lead to very awkward menu selections and information fetching. One way of tackling this problem is to ask the user to confirm each chosen menu option with a YES or NO. But that would be too irritating for a user! A voice portal is normally a premium service, where irritating the caller is the biggest crime one can commit!

Problems of Speech recognition
1. Accuracy of recognition
Accuracy of recognition can be thought of as how accurately the words spoken by a user are converted to their corresponding text.
According to the information available (from Googling and visiting ASR engine providers' websites), almost all ASR engines have high accuracy in detecting two words, YES and NO. Other inputs, such as numbers, dates of birth etc., are recognised with lower accuracy. For example, the accuracy of recognising the numbers 1, 2, 3 etc. varies from 87 to 91% depending on the ASR engine. The accuracy of recognising other natural English speech fares much worse! According to reports, the accuracy of recognising department names in a company is 85% at most!
Accuracy can be improved by training the ASR, but this is not practical at all and does not serve a voice portal.
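
To see why per-word accuracy in this range matters, here is a small worked calculation. It rests on an assumption that is not from any ASR vendor: each digit is recognised independently with the same accuracy. Under that assumption, the chance of getting a whole multi-digit number right is the per-digit accuracy raised to the number of digits.

```python
def whole_number_accuracy(per_digit_accuracy, digits):
    """Chance that every digit is recognised correctly.

    Assumes each digit is recognised independently with the same accuracy,
    which is a simplification made only for this illustration.
    """
    return per_digit_accuracy ** digits


# 87% and 91% are the per-digit figures quoted above; 10 digits is a
# typical length for a phone number.
for accuracy in (0.87, 0.91):
    chance = whole_number_accuracy(accuracy, 10)
    print(f"per-digit {accuracy:.0%}: whole 10-digit number {chance:.0%}")
```

With these figures, a spoken 10-digit number would be recognised entirely correctly only about 25% to 39% of the time, which is why long numbers are a poor fit for ASR input.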

2. Different accents
When ASR is used in a big country where people speak different languages, or the same language with different accents, speech recognition accuracy is bound to fare worse.

3. Confirming YES/NO again and again is irritating
An IVR may be designed very intelligently to ask for a YES or NO confirmation only when a word recognition is doubtful, but this is still irritating for many people and it increases the time taken to fetch information. In a voice portal where the caller pays per minute of usage, this may not work in the interest of voice portal users.
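
Confirming only doubtful recognitions could be sketched as below. This assumes the ASR engine returns a confidence score between 0 and 1 along with the recognised word. The function name and both threshold values are hypothetical and would need tuning against real call data.

```python
def next_action(word, confidence, accept_at=0.85, reject_below=0.40):
    """Decide what the IVR does with one recognition result.

    The thresholds are illustrative values, not taken from any ASR engine.
    """
    if confidence >= accept_at:
        # Confident enough: act on the word without bothering the caller.
        return ("accept", word)
    if confidence >= reject_below:
        # Doubtful: ask "Did you say <word>? YES or NO."
        return ("confirm", word)
    # Too unreliable even to confirm: play the menu prompt again.
    return ("reprompt", None)


for word, confidence in [("weather", 0.93), ("jokes", 0.60), ("music", 0.10)]:
    print(word, confidence, next_action(word, confidence))
```

Raising `accept_at` reduces wrong selections but triggers more YES/NO prompts, which is exactly the trade-off between accuracy and caller irritation described above.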

Still, automatic speech recognition (ASR) is widely used in voice portals nowadays, and here are a few voice portal services where ASR may work well:

1. Contests
2. Classifieds
3. City-wise weather information

2 comments
  • Mousheer November 9, 2009, 11:18 pm

    Thank you Uttam for your rich post.
    Reading through it, I believe it's very helpful, and it might be worth adding one question:
    1. Can we replace DTMF input with ASR in today's IVR applications?
    I believe not. The best IVR applications I've ever used were those that asked me at the beginning whether I preferred to continue in DTMF or ASR, or, even smarter, offered to switch the input type during the call. Sometimes you'll make your call in a noisy environment, where ASR will not be as accurate as in a quiet one. Besides, PIN codes and long numbers such as account or phone numbers are still more secure and easier to enter as DTMF.

  • Uttam Pegu November 10, 2009, 10:50 am

    Hi Mousheer,
    Thank you for your valuable feedback. I totally agree with you!

    Speaking out one's sensitive account number, ATM number and then PIN may be a real security hazard!

    DTMF is secure and much faster. Also, if one uses a wireless headphone with a mobile phone, pressing keys is as good as typing on a keyboard!

