Archive for the 'Speech Interface' Category

Trying out Dragon Dictation for the iPhone

Wednesday, December 16th, 2009

I’ve been trying out the Dragon Dictation iPhone app. It’s still not what I really want, which is system-level speech control of a mobile device that would give me the option to use speech for anything. But it’s a step in the right direction of making the iPhone more hands-free.

Here’s how Dragon Dictation for the iPhone works: open the app, hit one button, speak up to 30 seconds of dictation, then hit another button to say you’re done. Your dictation shows up on the screen a few seconds later. Behind the scenes the audio file you’ve dictated is sent to a server, put through a speech-recognition engine, and the results sent back to your screen. Now you can add to your text by dictating again, or hit an actions button that gives you three choices: send what you’ve written to your e-mail app, send it to your text app, or copy it to the clipboard so you can paste it someplace else.

The recognition is usually fairly accurate in quiet environments. Not surprisingly, you get a lot of errors in noisy environments. In the app’s defense, the built-in microphone on a mobile device is not optimal for speech recognition, and the app does pretty well given that constraint.

Here’s a practical suggestion that should be easy to implement: Add a decibel meter so people can see exactly how much background noise there is at any given time. This would make people more aware of background noise so they could set their expectations accordingly.
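Here’s a rough sketch of what such a meter could compute. It is not how Dragon Dictation handles audio; it assumes the app can read raw 16-bit PCM samples from the microphone, and the function names and the quiet/noisy thresholds are hypothetical values chosen for illustration. The level is relative to digital full scale (dBFS), not a calibrated sound-pressure reading.

```python
import math


def level_dbfs(samples, full_scale=32768.0):
    """Return the RMS level of 16-bit PCM samples in dB relative to full scale.

    Returns negative infinity for an empty buffer or pure digital silence.
    """
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    rms = math.sqrt(mean_square)
    return 20.0 * math.log10(rms / full_scale)


def noise_label(dbfs, quiet_below=-50.0, noisy_above=-30.0):
    """Map a level to a rough label. The thresholds are illustrative guesses."""
    if dbfs < quiet_below:
        return "quiet"
    if dbfs > noisy_above:
        return "noisy"
    return "moderate"


# A buffer sitting at half of full scale is about 6 dB below full scale.
half_scale_buffer = [16384] * 100
print(round(level_dbfs(half_scale_buffer), 2), noise_label(level_dbfs(half_scale_buffer)))
```

Measuring the level while the user is not speaking, just before recording starts, would give the background-noise reading the suggestion is after.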

The interface for correcting errors is reasonable. Tap a word and you can delete it or, sometimes, choose from a list of alternates. Tap the keyboard button and you can use the regular system keyboard to clean things up.

I have two interface suggestions:

1. You can’t use the regular system copy and paste without going into the keyboard mode. You should be able to. I suspect this is fairly easy to fix.

2. There is no speech facility for correcting errors. I think there’s a practical fix here as well.

First, some background. Full dictation on a mobile device is tricky. Full dictation speech engines take a lot of horsepower. Dragon Dictation sidesteps the problem by sending the dictation over the network to a server running a speech engine. The trade-off is it’s difficult to give the user close control of the text — you must dictate in batches and wait briefly to see the results. This makes it more difficult to offer ways to correct using speech. But I think there is a good solution already in use on another platform.

Although it’s difficult to implement most speech commands given the server setup, the “Resume With” command that’s part of the Dragon NaturallySpeaking desktop speech application is a different animal. This command lets you start over at any point in the phrase you last dictated by picking up the last couple of words that will remain the same and dictating the rest over again.
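The reason this command fits the batch setup is that it only needs the text of the new dictation, which the server already returns. Here’s a minimal sketch of the splice, assuming the app keeps the previous text and treats the first couple of words of the new dictation as the anchor. The function name and the anchor length are hypothetical, and this is not how NaturallySpeaking implements the command.

```python
def resume_with(previous_text, new_utterance, anchor_len=2):
    """Re-dictate the tail of previous_text starting at an anchor.

    The first anchor_len words of new_utterance are the anchor. The last
    place they occur in previous_text is where the new words take over.
    If the anchor is not found, previous_text is returned unchanged.
    Matching is case-insensitive and does not strip punctuation.
    """
    old_words = previous_text.split()
    new_words = new_utterance.split()
    if len(new_words) < anchor_len:
        return previous_text
    anchor = [w.lower() for w in new_words[:anchor_len]]

    # Search from the end so the most recent occurrence wins.
    for start in range(len(old_words) - anchor_len, -1, -1):
        window = [w.lower() for w in old_words[start:start + anchor_len]]
        if window == anchor:
            return " ".join(old_words[:start] + new_words)
    return previous_text


fixed = resume_with(
    "Please send the report to Jon by Friday",
    "report to John by Monday",
)
print(fixed)
```

In the example, “report to” is the part that stays the same, and everything after it is replaced by the new dictation.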

Adding this command would make Dragon Dictation much more useful for people who are trying to be as hands-free as possible. It would also lower the frustration of misrecognitions and subtly teach people to dictate better.

It’s nice to see progress on mobile speech. I’m looking forward to more.

Speech recognition and Eastern equine encephalitis

Wednesday, May 6th, 2009

I have a bone to pick with Nuance. I’ve several times seen Dragon NaturallySpeaking demonstrators wow people by saying a long phrase. “Eastern equine encephalitis” is a favorite. The implication is if computer speech recognition can get this difficult phrase right, it can get anything right.

The reality is just the opposite, and the demonstration gives people an incorrect mental map of how the speech engine works.

It’s important to have a good mental map of how something works. If your mental map is correct your instincts will be correct. If you’re working with a child you probably have an idea of the types of simple mistakes that child is going to make, and you’ll expect and have more patience for simple mistakes than when you’re working with an expert.

Working with the NaturallySpeaking speech engine is different from working with either a child or an expert: it’s very good at some things, but not so good at others, and the mix is different than it is with people. NaturallySpeaking is very good at identifying long words and even better at identifying common phrases. Eastern equine encephalitis is both and therefore very easy. It will rarely get this wrong. What’s more difficult for the engine is getting short utterances and uncommon phrases correct. If you give the speech engine more to work with, such as a longer word, a phrase, or even the same word drawn out a bit, it does better.

A more impressive demo phrase for a speech engine would be “at up be”.

With the correct mental map of what’s easy and what’s difficult for the speech-recognition engine, you’ll instinctively speak in phrases and draw things out a bit if you see the engine start to make mistakes. This is probably different from how you tend to adjust to a person who isn’t hearing you well. With a person, a common instinct is to say one word at a time: “Eastern… equine… encephalitis”, which is more difficult for a speech engine.

The good news is a mental map works on instinct — if your mental map is correct, you often don’t even have to think about adjustments, they flow naturally. The bad news is a mental map works on instinct — if it’s incorrect your adjustments won’t work but it will feel like they should be working.

Rulers right

Thursday, April 30th, 2009

I’ve changed the way I position the mouse rulers, and it’s changed my behavior.

I used to leave Rulers in the default position of top and left. But lately I’ve been using them on the right and bottom, and I’m liking this better for a couple of reasons. I tend to notice them less when they’re tucked above the Taskbar and off to the right, so I tend to leave them on whether I’m using them or not. More important, they don’t change the position of windows, and so don’t affect named mouse touches.

(To change Rulers so they’re just on the right and bottom say “Rulers On”, “Rulers Right Bottom”)

Where do you like Rulers? Let me know here or e-mail at info at this Web address.

We’re live

Tuesday, April 21st, 2009

After working with beta testers and presales customers for the past year, today we’ve announced the general release of Utter Command.

It’s been a long time coming. It started 15 years ago when I got repetitive strain injuries in my hands. I first used the Kurzweil speech engine, and then, when it came out, the first Windows version of DragonDictate, the precursor to Dragon NaturallySpeaking.

After several years of writing macros that were similar to everyone else’s — and that I often forgot — I started thinking about the way the brain works with language and started working on a more consistent system. Sometime after that we decided to make a general product out of it. We were thinking it would take six months. It’s taken five years.

One of the reasons it took so long is we’ve produced thorough, cross-referenced documentation. Every command is explained. Many thanks to our beta testers, trainers, and presales customers for using and commenting on the UC command system, applets and documentation as we were developing and refining them. Special thanks to Laurie, our VP of QA, and Bill the Trainer for many reads through the documentation and many trips through the self-guided tours.

And special thanks to Wren, a programmer who worked with us in the early days. The bird that appears in our logo is the Painted Redstart (we’d already named the company when Wren, also named for a bird, joined us).

Note to presales customers: you should have received your general release copy of Utter Command. Contact us via the support email or Make a Comment contact form if you haven’t.