Utter Command from Redstart Systems

Structured vs. Natural Language

Many speech interfaces, including the native NaturallySpeaking commands, aim to let you speak to the computer the same way you speak to another person. True natural language understanding is a major area of computer science research, but it is a difficult problem that remains many years from practical use. Instead of true understanding, many of today's speech interfaces use a pseudo-natural language approach that provides several ways to say a given command.

There are three serious drawbacks to the pseudo-natural language approach.

1. The programs don't cover all the ways to say a given command. When people are left to figure out command wording for themselves, they often use wording that's not included.

When the computer doesn't respond to a command, there are several possibilities for what went wrong — the computer may not have interpreted your words correctly, or those words may not be correct wording for that particular command. Having several possibilities for what went wrong makes it difficult to know what to do next. If the computer didn't interpret your words correctly, you should repeat the command. If the words aren't correct for that particular command, you should try another wording.

Having multiple wording possibilities for commands also makes it difficult to provide full, usable documentation; users are advised to guess rather than look up commands because the on-line facility for looking up a command in the full command list is slow and awkward.

This makes speech recognition software frustrating to use.

In contrast, the structured grammar approach used by UC provides rules and words that make it easy to learn commands.

2. Having many ways to word commands means the computer must listen for many different possibilities, which slows its response time. Synonymous ways to word commands also mean you must choose one way, which slows your response time.

This makes using a computer slower and more difficult than it needs to be.

3. Synonymous commands make it impossible to combine several computer steps into one command. With a keyboard and mouse, a single task such as finding a particular file often takes many steps. This is because the keyboard and mouse have real estate limitations: a finite number of keys on the keyboard, and a finite amount of screen space for mouse choices. In theory, speech has no such real estate problem, because there are many words and word combinations available. The pseudo-natural language approach, however, squanders this potential.

If you have an average of 5 ways to say each of only 20 commands (100 wordings in all) and you'd like to be able to combine any 2 of these commands, the computer must listen for 100 x 95, or 9,500, possible combinations: the first command can use any of the 100 wordings, and the second any of the 95 wordings that belong to a different command. Allowing for three-command combinations of the same 20 commands (100 x 95 x 90) adds 855,000 combinations. Four-command combinations (100 x 95 x 90 x 85) add nearly 73 million (72,675,000) more.
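The arithmetic above can be checked with a minimal Python sketch. It assumes, as the paragraph does, 20 commands with 5 interchangeable wordings each, and counts ordered sequences of distinct commands. For comparison it also counts the same sequences assuming, purely for illustration, exactly one wording per command. The helper name `combination_count` is hypothetical.

```python
from math import perm


def combination_count(commands: int, wordings: int, length: int) -> int:
    """Hypothetical helper: count ordered sequences of `length` distinct
    commands, where each command can be spoken in `wordings` ways."""
    # Pick `length` distinct commands in order, then one wording for each.
    return perm(commands, length) * wordings ** length


for n in (2, 3, 4):
    pseudo = combination_count(20, 5, n)      # about 5 wordings per command
    structured = combination_count(20, 1, n)  # assumed: one wording per command
    print(f"{n} commands: {pseudo:>12,} vs {structured:>8,}")
```

With five wordings per command the counts are 9,500, 855,000 and 72,675,000, matching the figures in the text; with one wording per command they drop to 380, 6,840 and 116,280.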

This combinatorial growth generally limits pseudo-natural language systems to commands that mimic individual keyboard and mouse steps. In contrast, the structured grammar approach used by UC makes it possible to combine commands, which greatly speeds computing.
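To illustrate why a single wording per command keeps combination tractable, here is a toy Python sketch. The vocabulary, the keystroke names and the function `parse_utterance` are all hypothetical; they are not Utter Command's actual commands or implementation. The sketch also works on already-recognized text, whereas a real recognizer uses its grammar to constrain what it listens for. Because each command has exactly one wording, several commands spoken together can be split back into their parts and expanded into the keyboard steps they stand for.

```python
# Hypothetical vocabulary: each command has exactly one wording, mapped to
# the keyboard steps it replaces. Illustrative only, not real UC commands.
COMMANDS: dict[tuple[str, ...], list[str]] = {
    ("file", "save"): ["ctrl+s"],
    ("line", "end"): ["end"],
    ("line", "start"): ["home"],
    ("select", "line"): ["home", "shift+end"],
    ("copy",): ["ctrl+c"],
}
MAX_WORDS = max(len(wording) for wording in COMMANDS)


def parse_utterance(utterance: str) -> list[str]:
    """Split an utterance into known commands and return the combined
    keystroke steps. Raises ValueError if any part matches no command."""
    words = utterance.lower().split()
    steps: list[str] = []
    i = 0
    while i < len(words):
        # Try the longest wording first so multi-word commands win
        # (a simple greedy match, enough for this toy vocabulary).
        for size in range(min(MAX_WORDS, len(words) - i), 0, -1):
            key = tuple(words[i:i + size])
            if key in COMMANDS:
                steps.extend(COMMANDS[key])
                i += size
                break
        else:
            raise ValueError(f"no command matches at word {i}: {words[i]!r}")
    return steps


print(parse_utterance("select line copy file save"))
```

Here one utterance, "select line copy file save", stands in for three commands and four keystroke steps: `['home', 'shift+end', 'ctrl+c', 'ctrl+s']`. Adding synonyms for every entry would multiply the sequences the grammar has to accept, as in the arithmetic above.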

Structure preferred

An independent study (www.cs.cmu.edu/~usi/papers/HLT04.pdf) by researchers at Carnegie Mellon University found that 74% of users prefer a structured rather than natural language approach to speech recognition.

- Kim Patch, 2006
