Category Archives: Speech Interface

Utter Command Knowledge Base Updated

We’ve updated the Utter Command Knowledge Base with a couple of new pages:
Generally useful software, mostly free
Useful help and effective complaint URLs

“Generally useful software, mostly free” is just what it sounds like. In the coming weeks you’ll see more updates to the Knowledge Base, including strategies on using the software listed on this page.

“Useful help and effective complaint URLs” points you to effective places to complain about problems with common software. Make sure to mention that you use speech recognition when you register a bug or complaint about other software you use. The more obvious it is to software makers that speech users are using their products, the more attention they’ll pay to how their software works with speech.

Heads-up: Dragon Recorder iPhone App

By Kimberly Patch

Nuance has released Dragon Recorder, a free iPhone application you can use with the “Transcribe Recording” feature of Dragon NaturallySpeaking for the desktop.

Dragon Recorder is a relatively simple recorder with a fairly clean interface that lets you record WAV files and transfer them to your computer via wifi. Once the files are on your computer, you can process them through Dragon’s Transcribe Recording feature, which is designed to transcribe the voice of a person who has trained a profile on Dragon NaturallySpeaking. It does pretty well with a relatively quiet recording of just that person’s voice.

Dragon Recorder gives you some useful, basic abilities:

  • You can pause, then continue recording.
  • You can play back the recording on the iPhone, and you can move the pause/play button to jump to different portions of the recording.
  • You can continue recording at the end of any previous recording. This is a little tricky — drag the play button all the way to the right and it turns into a record button.
  • You designate the first portion of the name of your file in settings. The second portion of the name is an automatic date and time stamp.

I can think of a couple of additions I’m hoping to see in updates:

  • The ability to bookmark recordings on-the-fly during recording and playback. I’m picturing several types of bookmarks you can use like hash tags. Bookmarks should also show up in the transcription.
  • Although this is designed to be transcribed automatically, it would also be useful to have slider bars for controlling the speed and pitch of the recording on playback, so you have a good way to transcribe manually as well.

What do you think? Let me know at Kim at this website address or look me up on Google+. Feel free to + me if you want to be in my Accessibility, Utter Command or Redstart Reports circles.

iPhone 4S: speech advances but there’s more to do

By Kimberly Patch

Apple’s iPhone 4S has taken a couple of nice big steps toward adding practical speech to smart phones. There are still some big gaps, mind you. I’ll get to those as well.

Speech on the keyboard

The long-awaited speech button is now part of the keyboard. Everywhere there’s a keyboard you can dictate rather than type. This is far better than having to use an app to dictate, then cut and paste into applications. This is one of the big steps. It will make life much easier for people who have trouble using the keyboard. And I suspect a large contingent of others will find themselves dictating into the iPhone a good amount of the time, increasingly reserving the keyboard for situations where they don’t want to be overheard.

The key question about speech on the keyboard is how it works beyond the letter keys and straight dictation.
For instance, after you type
“Great! I’ll meet you at the usual place (pool cue at the ready) at 6:30.”
how easy is it to change what you said to something like this?
“Excellent :-) I’ll meet you at the usual place (pool cue at the ready) at 7:00.”
And then how easy is it to go back to the original if you change your mind again?

Speech assistant

After we all use the speech assistant for a couple of days or weeks it’ll become readily apparent where Siri lies on the very-useful-to-very-annoying continuum.

The key parameters are
– how much time Siri saves you
– how a particular type of Siri audio feedback hits you the 10th time you’ve heard it
– how physically and cognitively easy it is to switch between the assistant and whatever you have to do with your hands on the phone.

One thing that has the potential to tame the annoyance factor is giving users some control over the feedback.

I think the tricky thing about computer-human feedback is it’s inherently different from human-human feedback. One difference is the computer has no feelings and we know that. Good computer-human feedback isn’t necessarily the same as good human-human feedback.

The big gap

There’s still a big speech gap on the iPhone. Speech is still just a partial interface.

Picture sitting in an office with a desktop computer and a human assistant. Type anything you want using the letter keys on your keyboard or ask the assistant to do things for you. You could get a fair amount of work done this way, but there’d still be situations where you’d want to control your computer directly using keyboard shortcuts, arrow keys or the mouse. Partial interfaces have a high annoyance factor.

Even if you use a mix of speech, keyboard and gesture, true efficiencies will emerge when you’re able to choose the method of input based on what you want to do rather than what happens to be available.

Ultimately, I want to be able to completely control my phone by speech. And I suspect if we figure out how to do that, then make it available for everyone, the general mix of input will become more efficient.

I’d like to see the computer industry tap folks who have to use speech recognition as testers. I think this would push speech input into practical use more quickly and cut out some of the annoyance-factor growing pains.

What do you think? Let me know at Kim@ this domain name.

Good signs around Google accessibility

By Kimberly Patch

It looks like Google is stepping up its accessibility effort and resources.

– Google accessibility page:
http://www.google.com/accessibility/

– Google Accessibility Twitter account:
@googleaccess Google Accessibility

– Accessibility Google Group
http://groups.google.com/group/accessible

Here’s a tweet about accessibility in Google+:
http://twitter.com/#!/googleaccess/status/86442474523992065
“We considered accessibility of Google+ from day 1. Find something we missed? Press Send Feedback link & let us know.”

I do think there’s a lot missing.

For starters, Google+ is quite short on keyboard shortcuts (the Google Manager add-on addresses this in part). It’s also short on basic keyboard navigation — in a perfect world, the down/up arrows and Enter key would let you navigate anything that looks like a list or a menu.
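
To make the idea concrete, here’s a minimal sketch of the kind of arrow-key list navigation I mean, written in TypeScript against standard DOM APIs. The function name and the role-based selector are my own illustration, not anything from Google’s code.

```typescript
// Minimal sketch: let the Up/Down arrows and Enter drive any list-like widget.
// The role="option" selector is illustrative; real widgets vary.
function makeListKeyboardNavigable(list: HTMLElement): void {
  const options = Array.from(list.querySelectorAll<HTMLElement>('[role="option"]'));
  options.forEach(option => (option.tabIndex = -1)); // focusable from script, not from Tab
  list.tabIndex = 0;                                 // the list itself can take focus
  let focused = 0;

  list.addEventListener("keydown", (event: KeyboardEvent) => {
    if (event.key === "ArrowDown") {
      focused = Math.min(focused + 1, options.length - 1);
    } else if (event.key === "ArrowUp") {
      focused = Math.max(focused - 1, 0);
    } else if (event.key === "Enter") {
      options[focused].click(); // activate the highlighted item
      return;
    } else {
      return; // leave every other key alone
    }
    event.preventDefault();   // keep the arrows from scrolling the page
    options[focused].focus();
  });
}
```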

Asking for feedback like this is a very good sign, however. One thing I’ve used the Send Feedback link to point out is that once you get past a dozen circles or so, it’s important to have a list view unless you’re willing and able to do a lot of unnecessary scrolling.

Here’s a recent Google blog post about accessibility in Docs, Sites, and Calendar that talks about additional keyboard shortcuts:
http://googleblog.blogspot.com/2011/09/enhanced-accessibility-in-docs-sites.html

Some Google applications are gaining more keyboard shortcuts. You still can’t use down/up arrows on everything that looks like a list or a menu, however.

The bottom line is there are some channels open and some good intentions. This is great. Now let’s hold them to it, and keep the keyboard shortcuts coming.

My #1 request as a speech user is the ability to adjust, organize and share keyboard shortcuts across apps. An adjust-your-shortcuts facility that works across apps would not only be good for many different types of users, it would address a special problem of speech users and the type of keyboard shortcuts that web apps tend to use. More on that issue next.

Discover, Adjust, Organize and Share

By Kimberly Patch

Keyboard shortcuts have a lot of potential. They’re fast.

For example, cutting and pasting by

– Hitting “Control x”
– Moving the cursor to the paste location
– Then hitting “Control v”

is speedier than

– Moving the mouse to the “Edit” menu
– Clicking “Edit”
– Clicking “Cut”
– Moving the cursor to the paste location
– Moving back up to click “Edit”
– Then clicking “Paste”.

Add this up over many tasks and you have a big difference in productivity.

So why don’t we see more people using keyboard shortcuts?

Ask someone who uses the mouse for just about everything and you’re likely to get a compelling answer — it’s easier. And it is — it’s cognitively easier to choose a menu item than to remember a shortcut.

Given a choice, people generally do what’s easier. On a couple of different occasions I’ve heard people say that, all else being equal, they’d hire a blind programmer over a sighted one because the blind programmer is faster. The blind programmer must use keyboard shortcuts.

This is a common theme — we have something potentially better, but human behavior stands in the way of adoption.

In the case of keyboard shortcuts there’s a little more to the story, however.

As a software community we haven’t implemented keyboard shortcuts well.

Many folks know keyboard shortcuts for a few very common actions like cut, paste and bold, but it’s more difficult to come up with keyboard shortcuts for actions like adding a link or a hanging indent because they are used less often and are less likely to be the same across programs.

So the user is often stuck with different shortcuts for the same tasks in different programs, requiring them to memorize and keep track of multiple sets of controls. This is cognitively difficult for everyone, and more so for some disabled populations and the elderly.

This type of implementation is akin to asking someone to speak different languages depending on who they are speaking to. Depending on how motivated and talented they are, some folks may be able to do it, but not many. And if there’s an easier way, even those capable of doing it either way will often choose the easier way, even if it’s less efficient.

So we aren’t letting keyboard shortcuts live up to their potential.

There’s a second keyboard shortcuts issue that’s getting worse as Web apps become more prevalent: clashing shortcuts. If you hit “Control f” in a Google document, do you get the Google Find facility or the browser Find facility? Go ahead and try it out. It’s messy.
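
The mechanics of the clash are simple: a web page can listen for the keystroke and swallow it before the browser acts on it. Here’s a hypothetical sketch in TypeScript (not Google’s actual code) of how an in-app Find ends up overriding the browser’s:

```typescript
// Hypothetical sketch of how a web app hijacks a browser shortcut.
// Once the handler calls preventDefault(), the browser's own Find never opens.
document.addEventListener("keydown", (event: KeyboardEvent) => {
  if ((event.ctrlKey || event.metaKey) && event.key.toLowerCase() === "f") {
    event.preventDefault(); // suppress the browser's Find bar
    openInAppFind();        // show the app's own Find UI instead
  }
});

// Placeholder for the app's own Find facility; purely illustrative.
function openInAppFind(): void {
  console.log("App-level Find opened");
}
```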

This is already an issue in the assistive technology community, where people who require alternate input or output must use software that runs all the time in conjunction with everything else. For example, a speech engine must be on all the time listening for commands, and screen magnifier software must be running all the time to enlarge whatever you’re working in.

So there are two problems: keyboard shortcuts aren’t living up to their potential to increase efficiency, and, especially on the Web, keyboard shortcuts are increasingly likely to clash.

I think there’s a good answer to both problems: a cross-program facility to easily discover, adjust, organize and share shortcuts.

– We need to easily discover shortcuts in order to see them all at once, so we can see patterns across programs and conflicts between programs/apps that may be open at the same time.

– We need to easily adjust shortcuts so we can choose common shortcuts and avoid clashes. We need to organize so we can remember what we did.

– We need to easily arrange commands and add headings so we can find commands quickly and, over time, build a good mental map of them. Lack of ability to organize is the Achilles’ heel of many macro facilities. It’s like asking people to play cards without being able to rearrange the cards in their hand. It’s possible, but unless there’s a reason for it, it makes things unnecessarily difficult.

– We need to share the adjustments because it makes us much more efficient as a community. My friend Dan, for instance, is very logical. He uses many of the same programs I do, and we both use speech input. So if there were a facility to discover, adjust, organize and share keyboard shortcuts, I’d look to see if Dan had posted his changes, and I would adjust to my needs from there.

The organizing and sharing parts are the most important, because they allow for crowdsourcing.
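
To show how small the sharing piece could be, here’s a hypothetical sketch in TypeScript of a shareable, cross-program shortcut map. None of these names come from an existing tool; they’re only meant to illustrate that a profile like Dan’s would be a tiny, easily exchanged artifact you could borrow and then adjust.

```typescript
// Hypothetical shape for a shareable, cross-program shortcut map.
// Nothing here is from a real product; it only shows how little data
// "discover, adjust, organize, share" actually requires.
interface ShortcutBinding {
  action: string; // e.g. "insert-link", "hanging-indent"
  keys: string;   // e.g. "Ctrl+K"
  note?: string;  // why this binding was chosen
}

interface ShortcutProfile {
  owner: string;                               // whose adjustments these are
  programs: Record<string, ShortcutBinding[]>; // bindings grouped per program
}

// Start from a friend's published profile...
const dansProfile: ShortcutProfile = {
  owner: "Dan",
  programs: {
    "text-editor": [{ action: "insert-link", keys: "Ctrl+K" }],
    "browser": [{ action: "insert-link", keys: "Ctrl+K", note: "kept consistent across programs" }],
  },
};

// ...then adjust one binding for your own needs.
const myProfile: ShortcutProfile = {
  ...dansProfile,
  owner: "Kim",
  programs: {
    ...dansProfile.programs,
    "text-editor": [{ action: "insert-link", keys: "Ctrl+Shift+K" }],
  },
};
```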

Over the past few decades the computer interface ecosystem has shifted from single, unrelated programs to separate programs that share information, to programs so integrated that users may not know when they are going from one to another. This has increased ease-of-use and efficiency but at the same time complicated program control.

At the same time programs have grown more sophisticated. There’s a lot of wasted potential in untapped features.

If we give users the tools to discover, adjust, organize and share, I bet we’ll see an increase in speed and efficiency and an uptick in people discovering nifty new program features.

Suggestion for Dragon: Easier Correction

In the last couple of months I’ve had a couple of occasions to suggest to the folks at Nuance, the company that makes the Dragon NaturallySpeaking speech engine, that their “Resume With” command is under-advertised. The command is very useful, but I keep meeting people who don’t know about it.

“Resume With” lets you change text on the fly. For instance, if you say “The black cat jumped over the brown dog”, then — once you see it on the screen — change your mind about the last bit and say “Resume With over the moon”, the phrase will change to “The black cat jumped over the moon.”

This is a particularly useful command for doing something people do a lot — change text as they dictate.

Now I have a suggestion that I think would make the command both better and more often used. Split “Resume With” into two commands: “Try Again” and “Change To”. The two commands would have the same result as “Resume With”, but “Try Again” would tell the computer that the recognition engine got it wrong the first time and you are correcting the error. “Change To” would tell the computer that you are simply changing text.

This would be a less painful way to correct text than the traditional correction box. From my observations, users are tempted to change text rather than correct it when the computer gets something wrong, simply because changing is easier. Splitting the command would make it equally easy to correct and change using what is arguably the fastest and easiest way to make a change.

Easy correction is important because NaturallySpeaking learns from corrections and because it’s annoying when the computer gets things wrong. Minimizing the interruption reduces frustration and lets users concentrate on their work rather than spending time telling Dragon how to do its job.

It would be great to have these commands both in Dragon NaturallySpeaking on the desktop and in Dragon Dictation, the iPhone application. This would enable truly hands-free dictation in Dragon Dictation.

Trying out Dragon Dictation for the iPhone

I’ve been trying out the Dragon Dictation iPhone app. It’s still not what I really want, which is system-level speech control of a mobile device that would give me the option to use speech for anything. But it’s a step in the right direction of making the iPhone more hands-free.

Here’s how Dragon Dictation for the iPhone works: open the app, hit one button, speak up to 30 seconds of dictation, then hit another button to say you’re done. Your dictation shows up on the screen a few seconds later. Behind the scenes the audio file you’ve dictated is sent to a server, put through a speech-recognition engine, and the results are sent back to your screen. Now you can add to your text by dictating again, or hit an actions button that gives you three choices: send what you’ve written to your e-mail app, send it to your text app, or copy it to the clipboard so you can paste it someplace else.
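
As an illustration of that round trip, here’s roughly what the flow looks like sketched in TypeScript. This is not Nuance’s actual protocol; the endpoint URL and response shape below are invented for the example.

```typescript
// Purely hypothetical sketch of a record-then-transcribe round trip.
// The URL and the response shape are invented; the real service isn't documented here.
async function transcribe(recording: Blob): Promise<string> {
  const body = new FormData();
  body.append("audio", recording, "dictation.wav");

  // Ship the whole batch of audio to a server-side speech engine...
  const response = await fetch("https://example.com/hypothetical-transcribe", {
    method: "POST",
    body,
  });

  // ...and wait for the recognized text to come back a few seconds later.
  const result = (await response.json()) as { text: string };
  return result.text;
}
```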

The recognition is usually fairly accurate in quiet environments. Not surprisingly, you get a lot of errors in noisy environments. To be fair, the built-in microphone on a mobile device is not optimal for speech recognition, and the app does pretty well given that constraint.

Here’s a practical suggestion that should be easy to implement: add a decibel meter so people can see exactly how much background noise there is at any given time. This would make people more aware of background noise so they could set their expectations accordingly.
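
For a sense of what such a meter involves, here’s a rough TypeScript sketch of the underlying arithmetic (root-mean-square level converted to decibels), using the Web Audio API as a stand-in for the iPhone’s audio stack, which this post doesn’t cover.

```typescript
// Rough sketch of an input-level meter: RMS of the incoming samples,
// converted to decibels relative to full scale (dBFS).
// The Web Audio API here is a stand-in; the math is the same on any platform.
async function startLevelMeter(onLevel: (dbfs: number) => void): Promise<void> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const context = new AudioContext();
  const analyser = context.createAnalyser();
  analyser.fftSize = 2048;
  context.createMediaStreamSource(stream).connect(analyser);

  const samples = new Float32Array(analyser.fftSize);

  const tick = () => {
    analyser.getFloatTimeDomainData(samples);
    const meanSquare = samples.reduce((sum, s) => sum + s * s, 0) / samples.length;
    const dbfs = 10 * Math.log10(meanSquare + 1e-12); // avoid log(0) on silence
    onLevel(dbfs); // e.g. drive a simple on-screen bar or number
    requestAnimationFrame(tick);
  };
  tick();
}
```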

The interface for correcting errors is reasonable. Tap on a word and there are sometimes alternates available or you can delete it. Tap the keyboard button and you can use the regular system keyboard to clean things up.

I have two interface suggestions:

1. You can’t use the regular system copy and paste without going into the keyboard mode. You should be able to. I suspect this is fairly easy to fix.

2. There is no speech facility for correcting errors. I think there’s a practical fix here as well.

First, some background. Full dictation on a mobile device is tricky. Full dictation speech engines take a lot of horsepower. Dragon Dictation sidesteps the problem by sending the dictation over the network to a server running a speech engine. The trade-off is it’s difficult to give the user close control of the text — you must dictate in batches and wait briefly to see the results. This makes it more difficult to offer ways to correct using speech. But I think there is a good solution already in use on another platform.

Although it’s difficult to implement most speech commands given the server setup, the “Resume With” command that’s part of the Dragon NaturallySpeaking desktop speech application is a different animal. This command lets you start over at any point in the phrase you last dictated by picking up the last couple of words that will remain the same and dictating the rest over again.

This would make Dragon Dictation much more useful for people who are trying to be as hands-free as possible. It would also lower the frustration of misrecognitions and subtly teach people to dictate better.

It’s nice to see progress on mobile speech. I’m looking forward to more.

Speech recognition and Eastern equine encephalitis


I have a bone to pick with Nuance. I’ve several times seen Dragon NaturallySpeaking demonstrators wow people by saying a long phrase. “Eastern equine encephalitis” is a favorite. The implication is if computer speech recognition can get this difficult phrase right, it can get anything right.

The reality is just the opposite, and the demonstration gives people an incorrect mental map of how the speech engine works.

It’s important to have a good mental map of how something works. If your mental map is correct your instincts will be correct. If you’re working with a child you probably have an idea of the types of simple mistakes that child is going to make, and you’ll expect and have more patience for simple mistakes than when you’re working with an expert.

The NaturallySpeaking speech engine is different from working with either a child or an expert — it’s very good at some things, but not so good at others, and the mix is different than it is with people. NaturallySpeaking is very good at identifying long words and even better at identifying common phrases — Eastern equine encephalitis is both, and therefore very easy. It will rarely get this wrong. What’s more difficult for the engine is getting short utterances and uncommon phrases correct. If you give the speech engine more to work with — a longer word, a phrase, or even the same word drawn out a bit — it has more information to go on and does better.

A more impressive demo phrase for a speech engine would be “at up be”.

With the correct mental map of what’s easy and what’s difficult for the speech-recognition engine, you’ll instinctively speak in phrases and draw things out a bit if you see the engine start to make mistakes. This is probably different from how you tend to adjust to a person who is having trouble hearing you. In the case of a person, a common instinct is to say one word at a time: “Eastern… equine… encephalitis”, which is more difficult for a speech engine.

The good news is a mental map works on instinct — if your mental map is correct, you often don’t even have to think about adjustments, they flow naturally. The bad news is a mental map works on instinct — if it’s incorrect your adjustments won’t work but it will feel like they should be working.

Rulers right

I’ve changed the way I position the mouse rulers, and it’s changed my behavior.

I used to leave Rulers in the default position of top and left. But lately I’ve been using them on the right and bottom, and I’m liking this better for a couple of reasons. I tend to notice them less when they’re tucked above the Taskbar and off to the right, so I tend to leave them on whether I’m using them or not. More important, they don’t change the position of windows, and so don’t affect named mouse touches.

(To change Rulers so they’re just on the right and bottom, say “Rulers On”, “Rulers Right Bottom”.)

Where do you like Rulers? Let me know here or e-mail at info at this Web address.

We're live


After working with beta testers and presales customers for the past year, today we’ve announced the general release of Utter Command.

It’s been a long time coming. It started 15 years ago when I got repetitive strain injuries in my hands. I first used the Kurzweil speech engine, and then, when it came out, the first Windows version of DragonDictate, the precursor to Dragon NaturallySpeaking.

After several years of writing macros that were similar to everyone else’s — and that I often forgot — I started thinking about the way the brain works with language and started working on a more consistent system. Sometime after that we decided to make a general product out of it. We were thinking it would take six months. It’s taken five years.

One of the reasons it took so long is we’ve produced thorough, cross-referenced documentation. Every command is explained. Many thanks to our beta testers, trainers, and presales customers for using and commenting on the UC command system, applets and documentation as we were developing and refining them. Special thanks to Laurie, our VP of QA, and Bill the Trainer for many reads through the documentation and many trips through the self-guided tours.

And special thanks to Wren, a programmer who worked with us in the early days. The bird that appears in our logo is the Painted Redstart (we’d already named the company when Wren, also named for a bird, joined us).

Note to presales customers: you should have received your general release copy of Utter Command. Contact us via the support email or Make a Comment contact form if you haven’t.