Archive for the ‘NaturallySpeaking’ Category

Speech recognition and Eastern equine encephalitis

Wednesday, May 6th, 2009


I have a bone to pick with Nuance. Several times I’ve seen Dragon NaturallySpeaking demonstrators wow people by saying a long phrase. “Eastern equine encephalitis” is a favorite. The implication is that if computer speech recognition can get this difficult phrase right, it can get anything right.

The reality is just the opposite, and the demonstration gives people an incorrect mental map of how the speech engine works.

It’s important to have a good mental map of how something works. If your mental map is correct your instincts will be correct. If you’re working with a child you probably have an idea of the types of simple mistakes that child is going to make, and you’ll expect and have more patience for simple mistakes than when you’re working with an expert.

Working with the NaturallySpeaking speech engine is different from working with either a child or an expert: it’s very good at some things, but not so good at others. The mix is different than it is with people. NaturallySpeaking is very good at identifying long words and even better at identifying common phrases. “Eastern equine encephalitis” is both and therefore very easy; the engine will rarely get it wrong. What’s more difficult for the engine is getting short utterances and uncommon phrases correct. If you give the speech engine more to work with (a longer word, a phrase, or even the same word drawn out a bit), it does better.

A more impressive demo phrase for a speech engine would be “at up be”.

With the correct mental map of what’s easy and what’s difficult for the speech-recognition engine, you’ll instinctively speak in phrases and draw things out a bit if you see the engine start to make mistakes. This is probably different from how you tend to adjust to a person who is having trouble hearing you. With a person, a common instinct is to say one word at a time: “Eastern… equine… encephalitis”, which is more difficult for a speech engine.

The good news is that a mental map works on instinct: if your mental map is correct, you often don’t even have to think about adjustments; they flow naturally. The bad news is that a mental map works on instinct: if it’s incorrect, your adjustments won’t work, but it will feel like they should be working.

Check out NatSpeak Preferred to Pro upgrade

Wednesday, April 29th, 2009

If you’re thinking about upgrading from Dragon NaturallySpeaking Standard or Preferred to Professional, now’s the time to do it (Utter Command runs on the NaturallySpeaking Professional engine).

There’s a Dragon NaturallySpeaking pricing special in conjunction with the 10.1 upgrade: for $300 you can upgrade to NaturallySpeaking Professional 10.1 (retail $899) from the much less expensive Standard or Preferred versions of Dragon 7-9. Dragon resellers are offering the special; here’s a link to one of them: http://www.1st-dragon.com/drupsa.html.

The special pricing is scheduled to last until June 30.

Ten things I’d like to see

Tuesday, March 24th, 2009

In December 2003 the Boston Voice users group (BVUG) and its New York City counterpart (NYPC) did top 10 lists of what they would like to see in speech recognition engines. At the time both Dragon NaturallySpeaking and IBM’s ViaVoice were available.

Here’s my version for Dragon NaturallySpeaking 10. This list is also posted on the UC Exchange Wiki so I can keep track of whether and when the items are implemented.

1. I’d like a default user option that would let me start the program hands-free.

2. I’d like the ability to check audio settings hands-free.

3. I’d also like the ability to save and switch Check Audio settings. This is useful if you travel a lot. I do an audio check whenever I land someplace new, but there’s no reason I should have to do another audio check rather than go back to a saved setting once I’m back in the office. I have a couple more minor suggestions for the Check Audio dialog box. First, it’s important enough to deserve its own menu item rather than only being buried in the Accuracy menu. Second, there’s an interface gotcha. Once you’ve finished checking the microphone, the focus is still on the go button. If you’re not thinking and click without moving the focus, you find yourself checking the microphone again instead of going on to the accuracy check, which at best makes the process longer, and at worst is confusing.

4. I’d like separate controls for buttons and menus. I’d like to be able to say whatever’s on the button, such as “yes” or “no”. But at the same time I want a longer command for menu items, e.g. “File Menu” rather than just “File”, because menu options are often active when I’m writing text.

5. The Dragon NaturallySpeaking engine should understand that when I say “Cap” what I’m looking for is a written word, not a number or symbol. “Cap Sixty” should return “Sixty”, not “60”. And “Cap Ampersand” should return “Ampersand”, not “&”.

6. In the Spell Correction dialog box, I’d like a way to tell NatSpeak to type a whole word. I’d like to say the word “Word” to indicate that the rest of the phrase is going to be a word, just like I can say “Spell” to indicate that the rest of the phrase is going to be spelled.

7. The old Dragon Dictate where you could say separate words was better for people who have some types of disabilities. Putting a “speak words separately” mode in NaturallySpeaking would help a lot of people.

8. I’d like the option to be able to train the NatSpeak speech engine by repeating audio read to me through headphones rather than reading from text. This would also make training easier for younger kids.

9. I’d like a simple way to duplicate a user. Right now you can do this, but it’s a multistep and confusing process. To make a copy of the current user you have to back up, then restore. A separate menu item for duplicating would take the confusion out of the process.

10. Bring back the Dragon logo :-). The Dragon was much cooler than the green spiky blob.

What do you think of my top 10 list for NaturallySpeaking? What’s yours? Reply here or let me know at info@ this website address.

Friday Tip: Remembering boilerplate and vocabulary commands

Friday, August 1st, 2008

NatSpeak boilerplate Text and Graphics commands allow you to insert any text or graphics into a document using a single speech command. These commands can be very powerful — they’re good for adding text and graphics that you use often, such as your address or a set of directions.

The NatSpeak Vocabulary editor allows you to add words or phrases to your vocabulary that have different spoken and written forms. This allows you to make words like your email address easily pronounceable.

The key to using boilerplate and vocabulary commands is being able to remember them.

There are two ways to make these types of commands easy to remember:

1. Word them consistently

2. Make them easy to look up

I find the easiest way to remember boilerplate Text and Graphics commands is to simply say the first part of the text you’re inserting followed by “Full”. So “Redstart Full” prints the full name and address of Redstart Systems. If you have two different versions of the address, add a number. “Redstart Full 1” prints the same address in a different format.

You can use the Utter Command Clipboard facility to make anything easy to look up. Once you name your Text and Graphics command, say “Line Copy To” followed by the name of the UC Clipboard file and you’ve got it recorded. For example, to keep your boilerplate commands in “UC List 1” say “Line Copy To List 1”.

Now any time you want to consult your list of commands, say “List 1 File”. You can also print it out.

I also use the start-to-say method for vocabulary words that have different written and spoken forms. I’ve put my Redstart email address in as a vocabulary word with the spoken form “Kim at Red” and my Gmail address in as a vocabulary word with the spoken form “Kim at G Mail” (in address commands I use “Kim” whether or not the actual address is just Kim or something longer).

One caution in using vocabulary in this way — make sure commands are at least two words and make sure the two words are not a common phrase that you’d want to say as is. If you need to, use the “Full” method above to avoid this problem. Also make sure to save your user after adding vocabulary words.

If you wish, keep vocabulary words that have different written and spoken forms on the same list as your boilerplate commands.

The difference between boilerplate commands and written/spoken vocabulary words is that a block of boilerplate is returned exactly as written, while vocabulary commands are treated like words, with appropriate spacing before and after them.
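
As a rough illustration of that difference, here is a toy Python model. It is not how NatSpeak works internally and does not use any NatSpeak API: the function names, the simplified spacing rule, and the example.com address are all assumptions made up for the sketch. Boilerplate is modeled as a verbatim insert, while a vocabulary entry is joined to the surrounding dictation the way an ordinary word would be.

```python
def insert_boilerplate(document: str, block: str) -> str:
    """Append a boilerplate block exactly as written, adding no spacing."""
    return document + block


def insert_vocabulary_word(document: str, written_form: str) -> str:
    """Append a vocabulary entry as a word, adding a space if one is needed."""
    if document and not document[-1].isspace():
        return document + " " + written_form
    return document + written_form


text = "Write to"
# Hypothetical address, used only for illustration.
print(repr(insert_boilerplate(text, "kim@example.com")))      # 'Write tokim@example.com'
print(repr(insert_vocabulary_word(text, "kim@example.com")))  # 'Write to kim@example.com'
```

In this model, any spacing you want around a boilerplate block has to be part of the block itself, while a vocabulary entry picks up its spacing from context.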

UC Commands Tip: say “NatSpeak” followed by the first one or two words in a NatSpeak dialog box title to call up that dialog box.

Commands for the dialog boxes mentioned above:

“NatSpeak My Commands” calls up the NatSpeak My Commands dialog box where you can write a boilerplate Text and Graphics macro

“NatSpeak Vocabulary” calls up the NatSpeak Vocabulary Editor dialog box