The SpeechTEK speech conference has a lot to say about the state of the desktop speech interface. The exhibits in 2006 and 2007 were largely about where all the speech interface action is these days: not on the desktop, but over the telephone with interactive voice response (IVR) systems.
I went to several sessions aimed at the voice user interface designers (Vuids) who construct telephone speech command interfaces, even though I’m something of an imposter as a desktop voice user interface designer. I guess Dvuid would be the appropriate term.
We’re dealing with a lot of the same issues, though often with different twists:
- Making sure people know what to say and stay oriented in the system
- Accommodating beginners and experienced users
- Making the process as fast and efficient as possible so people won’t hit the operator button or hang up (or, on the desktop, stop using the software; many people who buy desktop speech recognition software end up not using it)
- In both cases the communication relationship is between a person and a machine
And we’re looking at similar answers:
- Making commands consistent
- Avoiding ambiguity
- Doing user testing
- Thinking about arranging information in a deliberate order to make it more memorable (good mental maps and appropriate training wheels)
- And above all, avoiding the trap of thinking that people can just say anything: even if you truly could say anything, you still wouldn’t know what to say
I’ve also been thinking about the differences between IVR and the desktop speech interface. These differences make the challenges harder or easier for each system:
- Desktop users tend to follow a more predictable curve: they either get more experienced or drop the software, while some IVR systems have occasional users.
- People are more often forced to use IVR, while most people can easily avoid the desktop speech interface if they wish.
- The desktop is capable of both visual and audio feedback, while IVR systems tend to have only audio feedback. (Interestingly, even though most speech engines come with the ability to speak, desktop computer interfaces generally don’t use this feedback channel. In our user testing, judicious use of audio feedback has produced positive results.)
- Both systems suffer from the widespread use of pseudo-natural language. Natural language doesn’t really exist on either type of system, and trying to fake it creates its own problems.