IVRS and Speech Recognition

Speech recognition in IVRS has become more of a necessity since mobile phones arrived. A POTS (Plain Old Telephone System) phone has a telephone instrument with a proper keypad and a separate handset. That arrangement suits DTMF-based interaction with an IVR system, where you respond by pressing keys.

You normally lift the handset with your left hand and dial by pressing keys on the telephone instrument with your right. So once the IVR is connected, you can listen to the IVR system's prompts and press keys at the same time. On a mobile phone, the tiny instrument is itself the handset, so while you are listening to IVR instructions it is almost impossible to press keys without taking the phone away from your ear and looking at the keypad. One solution is to use earphones or a Bluetooth wireless headset.

Most probably, this has prompted IVR developers as well as CTI vendors to come out with speech recognition for IVR input. Although speech recognition has evolved a lot, it still cannot claim 100% accuracy, mainly because pronunciation and accents vary from one region to another.

There are many speech recognition engines available. Nuance, Loquendo, Microsoft and SpinVox are a few of the companies that provide speech recognition engines built on various technologies.

IVR with speech recognition is supposed to be more advanced and better suited to mobile users, but it is actually very difficult to predict whether the IVR application and speech engine will recognise spoken commands properly. Even so, IVR with speech recognition has been on the rise, and every IVR company nowadays provides it. CTI vendors support speech recognition in their hardware too.

Still, many people like me will prefer to use DTMF for giving commands to an IVR, as it feels more technical. 🙂 Most probably some people will take time to get used to the idea of commanding invisible machines by speaking!

{ 5 comments… add one }
  • Rob McGrady March 14, 2009, 9:32 pm

    IVRS in the clinical trial arena normally has callers in fixed locations (hospitals, clinics, doctors’ offices, etc.) and, more importantly, requires that callers enter very specific data about the patients they are enrolling into a trial (from height and weight to age to health status). This data can be plugged into an algorithm that determines the type and amount of medication the IVRS will assign to a patient. For these reasons, I have not seen voice recognition used in clinical IVR systems. If anything, users will prefer a web-based application (IWRS) in place of the phone. As you say, the reliability of existing voice recognition tools is not high enough to employ them in clinical trials. Good blog!!

  • Shailender March 15, 2009, 11:11 am

    It is really difficult for speech recognition technologies to be accurate in India, where there are more than 250 languages! Many voice portals here give quite absurd results, and the relentless confirmation prompts from the IVR are quite irritating.

  • Uttam Pegu March 15, 2009, 8:09 pm

    Hi Rob,
    Thank you for the feedback. Accuracy of data seems to be very important in clinical trials. We have tried speech recognition on a few of our voice portals, and the successful recognition rate never went above 80%, even for regular callers.

  • Abhishek Mittal March 27, 2010, 4:35 pm

    Why should we speak digits as choices for an IVR, when that is exactly where we face the speech recognition problem? A typical IVR system normally has at most 5 choices. I think we could instead use the same sound for every caller and vary only how long it is held.
    Let's take an example:
    We will use the sound of the Hindi character “अ”, which is pronounced somewhat like the English letter “A”.
    My suggestion is shown in the following table:

    Choice   Duration in seconds
    1        less than 1
    2        more than 1 and less than 2
    3        more than 2 and less than 3
    4        more than 3 and less than 4
    5        more than 4

    For choice 3, the user will say “अ” for more than 2 seconds and less than 3 seconds.
    That is, the IVR system will use the duration of the sound spoken by the customer.
    Choice 4 may be slightly difficult for the customer, but for choices 1 to 3 and for 5 I think there is no problem.
    Can it be implemented?
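
The duration table above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function name `choice_from_duration` is hypothetical, the duration is assumed to have been measured already (in seconds) by some voice-activity detector that is not shown, and a duration of exactly 1, 2, 3 or 4 seconds, which the table leaves undefined, is assigned here to the higher choice.

```python
def choice_from_duration(seconds: float) -> int:
    """Map the duration of one sustained sound to an IVR menu choice (1-5).

    Assumption: `seconds` comes from a separate voice-activity detector.
    Boundary values (exactly 1, 2, 3 or 4 s) are not defined by the table;
    this sketch assigns them to the higher choice.
    """
    if seconds <= 0:
        raise ValueError("duration must be positive")
    # Each completed second moves to the next choice; anything past 4 s is choice 5.
    return min(int(seconds) + 1, 5)


if __name__ == "__main__":
    for d in (0.5, 1.5, 2.5, 3.5, 6.0):
        print(d, "->", choice_from_duration(d))
```

A real system would also need some tolerance around each boundary, since callers cannot time a sound precisely; that is not handled in this sketch.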
