Category Archives: Speech Commands

Utter Command Knowledge Base Updated

We’ve updated the Utter Command Knowledge Base with a couple of new pages:
Generally useful software, mostly free
Useful help and effective complaint URLs

“Generally useful software, mostly free” is just what it sounds like. In the coming weeks you’ll see more updates to the Knowledge Base, including strategies on using the software listed on this page.

“Useful help and effective complaint URLs” points you to effective places to complain about problems with common software. Make sure to mention that you use speech recognition when you register a bug or complaint about other software you are using. The more obvious it is that speech users are using their software, the more software makers will pay attention to how their software works with speech.

Getting Gmail working well with speech commands

By Kimberly Patch

If you haven’t used speech commands to control a computer, it might not be obvious that single-character commands, for instance “y” to archive a message in Gmail, can present a challenge.

Single-character commands seem like a great idea, especially for Web programs, because your Web browser already takes up some common keyboard shortcuts. Gmail has a lot of single-character commands, and once you get to know them you can fly along using the keyboard. In general I’m all for more keyboard shortcuts because it’s easy to enable them using speech.

Command conundrum

Single-character commands that can’t be changed, however, can get speech users in a lot of trouble. Say a command or make a noise that’s misheard as text in a program that doesn’t use single-character shortcuts and either nothing happens or you get some stray text you can easily undo. Do the same thing in a single-character-command program and you can cause many actions to happen at once.

A stray “Kelly” in your Gmail inbox, for instance, will move the cursor up one message (single-character command “k”) and archive it (single-character command “y”). “Bruno” causes even more damage.

Turn off the keyboard shortcuts, though, and the program becomes fairly inaccessible for speech users. We need the shortcuts, and we can combine multiple keystrokes into single utterances to make things even better. It’s having little control over them that presents a problem.

Speech-safe single-character shortcuts

Google Labs has a nifty extension that presents a simple fix. It lets you change the characters you use for keyboard shortcuts, including using two characters rather than one. Add a plus sign (+) to the beginning of every shortcut and they all become speech-safe.
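To make the difference concrete, here is a minimal Python sketch of the problem and the fix. It is a simplified model, not Gmail's actual code: the shortcut table holds only the shortcuts mentioned in this post, and the function names are made up for illustration.

```python
# Simplified model: only the Gmail shortcuts named in this post.
SINGLE_KEY = {
    "k": "move up one message",
    "y": "archive",
    "j": "move down one message",
}


def prefixed(shortcuts, prefix="+"):
    """Return a copy of the shortcut table with the prefix added to every key."""
    return {prefix + key: action for key, action in shortcuts.items()}


def actions_for(typed, shortcuts):
    """Scan the typed characters left to right and collect triggered actions."""
    longest = max(len(key) for key in shortcuts)
    triggered = []
    i = 0
    while i < len(typed):
        # Try the longest possible shortcut first, then shorter ones.
        for size in range(longest, 0, -1):
            chunk = typed[i:i + size]
            if chunk in shortcuts:
                triggered.append(shortcuts[chunk])
                i += size
                break
        else:
            i += 1  # this character is not part of any shortcut, so skip it
    return triggered


stray = "kelly"  # a misrecognized word typed into the inbox (case ignored here)
print(actions_for(stray, SINGLE_KEY))
print(actions_for(stray, prefixed(SINGLE_KEY)))
print(actions_for("+j+y", prefixed(SINGLE_KEY)))
```

With the single-character table the stray word triggers two actions. With the plus-prefixed table it triggers none, while the deliberate "+j+y" still works.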

Here are step-by-step instructions.
– go to your Gmail account, click the settings gear icon at the top right of the screen
– click “Labs”
– search for the “Custom Keyboard Shortcuts” extension and click to download. This will add a “Keyboard Shortcuts” tab to your Gmail settings
– now, click the settings gear icon at the top right of the screen
– click Keyboard Shortcuts
– add “+” to the beginning of every command

If you’re using Utter Command 2.0 you’re now all set. Say “Plus” and any one- or two-character command. Say, for instance, “Plus j” or “Plus Juliet” to move down one item. You can also say a command multiple times in a single utterance. Say “Plus j Repeat 5” to move down five items, for instance. And you can combine two commands: “Plus j Plus y” moves down one item, then archives that item. (Say “Question Mark” to call up the keyboard shortcuts list.)
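For readers who like to see the pattern spelled out, here is a rough Python sketch of how utterances like these map to key sequences. It is an illustration only and says nothing about how Utter Command is implemented. The `expand` function and the small `NAMES` table are hypothetical.

```python
# Hypothetical table of spoken letter names (NATO alphabet) to keys.
NAMES = {"juliet": "j", "kilo": "k", "yankee": "y"}


def expand(utterance):
    """Turn an utterance such as "Plus j Repeat 5" into a list of key sequences."""
    words = utterance.split()
    keys = []
    i = 0
    while i < len(words):
        word = words[i]
        if word == "Plus" and i + 1 < len(words):
            spoken = words[i + 1].lower()
            keys.append("+" + NAMES.get(spoken, spoken))
            i += 2
        elif word == "Repeat" and i + 1 < len(words) and keys:
            # "Repeat 5" means the previous key sequence is sent five times in total.
            keys.extend([keys[-1]] * (int(words[i + 1]) - 1))
            i += 2
        else:
            raise ValueError(f"unrecognized word: {word}")
    return keys


print(expand("Plus Juliet"))
print(expand("Plus j Repeat 5"))
print(expand("Plus j Plus y"))
```

The point of the sketch is that one utterance can carry several prefixed shortcuts, so the extra plus sign costs a speech user almost nothing.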

Raising the bar

The Google Labs add-on enables Gmail for speech users, but there are many other programs out there that use single-character shortcuts, including other Google programs, and other Web-based programs like Twitter. Message for Google: How about one facility that would let us control keyboard shortcuts across Google programs?

It would also improve things if we could have a larger number of characters available for a given shortcut, the ability to also control control-key shortcuts, the ability to save and share different sets, and the ability to apply at least some shortcuts across applications.

Important Note: If you were a beta tester or received the Utter Command 2.0 pre-release, you might not have the “Plus” set of commands. If this is the case, send e-mail to “Info” at this web address, and we’ll make sure you have the release version. The release version shows 15 new sets of commands on the “New commands for 2.0” list you can open from the Taskbar icon menu.

Posting to WordPress by speech

I get a lot of inquiries about how I carry out particular computer tasks by speech.

Here are the gory details on what I do to write a blog item and post it to WordPress:

Getting ready to write

When I think of an idea for a Patch on Speech blog post I say
– “Blog Pending Site” to bring up the Google document I write the blog in. Then I say
– “Find Mark 1”, then “Another Graph” to position the cursor. I have “MARK 1” written at the top of my working section. The first command selects “MARK 1”, and the second one positions the cursor two lines below it at the top of the section. Then I say
– “Today Short Enter” to add the current date and move the cursor to the next line

Writing

I either jot down an idea, or write a whole post.

When I’m writing I make heavy use of “1-20 Befores” to select the last few words I said and change them. A key point about this technique is I don’t count how many words I want to select back. I just make sure to select more words than I need to change, then look to see what is selected and resay what I need to.

I also make use of the Dragon inline commands, which allow you to say punctuation like “Open Quote” and “New Paragraph” without pausing. I use “Another Graph” to start a new paragraph when I’m not at the very end of a line. I occasionally find myself speaking keyboard to fix something, for instance “Left Backspace Right” to correct “two” to “to”.

We’ve just been testing a series of commands that lets you use a mouse without clicking, and I’ve been experimenting with commands like “Touch Word” and “Touch 3 Words” to select text.

Posting

After I’ve written and edited a piece, I say
– “Find Mark 1”, then “2 Down Home” to put the cursor at the beginning of the headline
Then I use several “1-100 Ups/Downs” commands combined with a copy command to select the story, e.g. “50 Downs”, “20 Downs”, “5 Ups Copy”

Then I open the page where I post by saying
– “WordPress Site”
If I’m not already logged on it prompts me for my username. I have my username in the UC Enter list so I can say it and hit the Enter key in one utterance. Since my password is stored I can log in with a single utterance:
“<username> Enter”
Once I’m in I say
– “31 Go” to click the “New Post” link
– “Tab Paste” to tab to the body field and paste the text
– “Go Top” to move the cursor to the top of the file
– “Line Cut” to cut the headline
– “2 Delete” to remove the extra lines
– “49 Go” to move to the headline field
– “This Paste” to paste the headline

Categories and Publish

I add categories using the Go numbers, one or two at a time, e.g. “31 Go” to add one category and “38 Go 41 Go” to add two categories in a single utterance, and use a Go number to hit the “Preview” button.

Then I look over the post, say “Doc Close” to close the preview, and use a Go number to hit “Publish”.

Avoid having to remember commands

I think the key to enabling a program for efficient speech control is to take the time to look at what you want to do in detail and plot it out — take the time to write out the steps. Make a game of figuring out just how efficient you can be. Then take the steps and put them in one of the UC Custom Guides, so you can call it up instantly, e.g. “Custom 3 Guide”, and read the set of commands to carry out the task.

This way you don’t have to remember commands. Eventually, after using the guide a bunch of times, you’ll have the sequence memorized without having to consciously memorize it.

If you have a way of carrying out a task by speech that you’re particularly proud of — or if there’s something you’re struggling with — drop me a line at kim @ this web address.

I get a lot of inquiries into how I carry out particular computer tasks by speech.

Here are the gory details on what I do to write a blog item and post it to WordPress.

Getting ready to write

When I think of an idea for a Patch on Speech blog post I say

– “Blog Pending Site” to bring up the Google document I write the blog in. Then I say

– “Find Placeholder”, then “Another Graph” to position the cursor. I have “MARK 1” written at the top of my working section. The first command selects “MARK 1”, and the second command positions the cursor two lines below it, so the new ideas are always at the top of the section. Then I say

– “Today Short Enter” to add the current date and move the cursor to the next line

Writing

I either jot down an idea, or write a whole post.

When I’m writing I make heavy use of “1-20 Befores” to select the last few words I said and change them. A key point about this technique is that I don’t count how many words I want to select back. I just make sure to go over the number I want to change, then I look to see what is selected and resay what I need to.

I also make use of the Dragon inline commands, which allow you to say punctuation like “Open Quote” and “New Paragraph” without pausing. I use “Another Graph” to start a new paragraph when I’m not at the very end of a line. I occasionally find myself speaking keyboard to fix something, for instance “Left Backspace Right” to correct “two” to “to”.

We’ve just been testing out a series of commands that lets you use a mouse device without clicking, and I’ve found that commands like “Touch Word” and “

Posting

After I’ve written and edited a piece, I select the blog text and say

– “Copy to 1 File” to copy the story to the “1 File” clipboard so I can paste it later

– “2 Up” to unselect and put the cursor on the headline, and

– “Line Copy” to copy the headline

Once I have the blog and headline loaded up, I open the page where I post by saying

– “Word Press Site”

If I’m not already logged on it prompts me for my username. I have my username in the UC Enter list so I can say it and hit the Enter key in one utterance. Since my password is stored, this is all I need to say to log in:

“<username> Enter”

Once I’m in I say

– “31 Go” to click the “post” link

– “Paste Tab” to paste the headline and tab to the next field

– “1 File Paste” to paste the blog text.

I think the key to enabling a program for efficient speech control is to take the time to look at what you want to do in detail and plot it out: take the time to write out the steps. Make a game of figuring out just how efficient you can be. Then take the steps and put them in one of the UC Custom Guides, so you can call it up instantly, e.g. “Custom 3 Guide”, and simply read the set of commands to carry out the task. This way you don’t have to remember commands. Eventually, from the repetition of saying and picturing the commands in the guide, you’ll have them memorized. But you won’t have to spend extra energy memorizing them while you’re trying to do your work.

If you have a way of carrying out a task by speech that you’re particularly proud of — or if there’s something you’re struggling with — drop me a line.

Long web documents at a glance

There are two ways to speed up a computer task: Carry out the same steps you always have, but go faster, or find an easy-to-use tool that requires fewer steps.

If you need to navigate through long documents on the Internet — papers, standards documents, patent documents etc. — the Firefox HeadingsMap extension will save you a lot of time. It lets you navigate using a map of the headings in a document. The headings map also gives you a great overview — a quick mental map of the document. It works especially well with speech. And it shows errors in headings, which is useful when you’re putting together a long document.

HeadingsMap shows up as a small symbol containing an “h” on the Status bar at the bottom left corner of the Firefox window. If your Firefox window is maximized the “h” appears immediately above the “Start” button.

Click the “h” and a narrow window appears on the left containing all the headings and subheadings in a document. Click the “h” again and the window disappears. Right-click on the “h” and you’re presented with configuration options. I usually uncheck the “levels” checkbox, which makes the headings map a little cleaner looking.

In general, there are three different ways to navigate among items on tree views like the headings map:
– the mouse
– the Up/Down arrows
– the letter keys

The most efficient way to implement letter keys navigation is to let the user type more than one key of a selection, say “d o” to select “dove” rather than “dinosaur”. A less efficient way is to treat every letter as a new navigation event and jump to the next instance beginning with that letter.

Fortunately, HeadingsMap has implemented all three methods, including the efficient letter key method. This method works especially well with speech because you simply say the whole word to navigate to it, e.g. “dove”.
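The difference between the two letter-key methods is easier to see in code. This is a minimal Python sketch of the general idea, not HeadingsMap's actual implementation. The function names and the sample list of headings are made up for illustration.

```python
def first_letter_jump(items, current, letter):
    """Less efficient method: each key press jumps to the next item that
    starts with that letter, wrapping around the list."""
    n = len(items)
    for step in range(1, n + 1):
        idx = (current + step) % n
        if items[idx].lower().startswith(letter.lower()):
            return idx
    return current  # no item starts with this letter, so stay put


def type_ahead(items, typed):
    """More efficient method: everything typed so far is treated as a prefix,
    and the first item that matches the whole prefix is selected."""
    for idx, item in enumerate(items):
        if item.lower().startswith(typed.lower()):
            return idx
    return None  # nothing matches the prefix


headings = ["Dinosaur", "Dove", "Duck", "Owl"]

# Typing "d" then "o" with the first-letter method, starting from "Owl":
pos = first_letter_jump(headings, 3, "d")    # lands on "Dinosaur"
pos = first_letter_jump(headings, pos, "o")  # jumps away to "Owl"
print(headings[pos])

# The same keys with the prefix method go straight to "Dove":
print(headings[type_ahead(headings, "do")])
print(headings[type_ahead(headings, "dove")])
```

With the prefix method, saying a whole word amounts to typing its letters in a row, which is why it suits speech so well.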

You can download the HeadingsMap extension here: addons.mozilla.org/en-us/firefox/addon/headingsmap/

And here are a couple of especially long documents you can try it out on:
A paper on the effects of climate change on birds from the Public Library of Science:
www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1000585
The public draft of a World Wide Web Consortium standards document:
www.w3.org/TR/2010/WD-UAAG20-20100617/

Spell Everywhere

I’ve been getting a lot of questions lately about the Dragon NaturallySpeaking “Spell XYZ” command. This command lets you say, for instance “Spell s a”. People are complaining that it sometimes doesn’t work. They’re right.

This command doesn’t work everywhere. It only works in text boxes. This is an unfortunate oversight in the Dragon user interface.

Logically, any speech command should work in all contexts where it could be useful. It’s unnecessarily difficult to make the user remember different commands to carry out the same operations in different contexts. Something as basic as pressing a letter key should work anywhere you might want to use a letter, including menus.

This is what people are complaining about. Those who are complaining have gotten adept enough at speech that something basic like pressing letter keys becomes second nature. They have a habit of saying “Spell” and then a letter, number or symbol name whenever they have to hit separate keys. The definition of habit is you don’t have to think about it. And this is where they get in trouble — the habit kicks in everywhere, including when you are in a drop-down menu that doesn’t respond to full words.

If you’d like to use the “Spell XYZ” command everywhere rather than having to stop and think about where you can and can’t use it, complain to Nuance, the company that makes Dragon (there are a couple of ways to do this; details are posted on the Redstart wiki: http://redstartsystems.com/Wikka/wikka.php?wakka=NatSpeakUtilitiesandResources).

Thunderbird tabs and consistency

Thunderbird now has tabs for open messages, which is very convenient. You can have three messages open and see where they are from the tabs — this is similar to tabbed browsing in programs like Firefox and Internet Explorer. And you can move among tabs using the same commands you use to move among tabs in your browser: “Tab Back”, “Tab Forward” and “1-20 Tab Back/Forward”.

Unfortunately, however, the keyboard shortcut to close a message tab is different from the standard close document/tab command used in most programs including Firefox, even though Thunderbird is developed by the same organization as Firefox. The usual command “Control Function 4” logically mirrors the common “Alternate Function 4” that’s used to close a window.

If the standard keyboard shortcut were enabled like it is in programs like Microsoft Word and Firefox, you could say the shortcut or “Document Close” to close a document or tab. And if you wanted to close more than one you could say “Document Close Times 3”, for instance.

If you dig through the keyboard shortcuts for Thunderbird, you’ll find that there is a nonstandard keyboard shortcut to close a message tab: “Control w”. So you can train yourself to say “Control w” to close a message when you’re in Thunderbird. Keep in mind you can also say “Control w Times 3” to close three open messages. But it would be far better to not have to think about which program you are in when closing a tab or document. Feel free to complain to Thunderbird about this oversight at the Thunderbird support forum.

Here’s another Thunderbird tip: If you want to move a message rather than just closing it try “Move Recent”, “1-10 Down Enter”.
There’s more Thunderbird strategy on the Redstart Wiki: http://redstartsystems.com/Wikka/wikka.php?wakka=UCandThunderbird

Suggestion for Dragon: Easier Correction

In the last couple of months I’ve had a couple of occasions to suggest to the folks at Nuance, the company that makes the Dragon NaturallySpeaking speech engine, that their “Resume With” command is under-advertised. The command is very useful, but I keep meeting people who don’t know about it.

“Resume With” lets you change text on the fly. For instance, if you say “The black cat jumped over the brown dog”, then — once you see it on the screen — change your mind about the last bit and say “Resume With over the moon”, the phrase will change to “The black cat jumped over the moon.”
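Here is a rough Python sketch of what the command does to the text in the example above. It assumes the command anchors on the last occurrence of the first word you say after “Resume With”. That matching rule is my guess for illustration, not Dragon's documented behavior, and the function name is made up.

```python
def resume_with(dictated, replacement):
    """Replace the tail of the dictated text, starting at the last occurrence
    of the replacement's first word (an assumed matching rule)."""
    words = dictated.split()
    new_words = replacement.split()
    if not new_words:
        return dictated
    anchor = new_words[0].lower()
    # Search backward so the most recently dictated match wins.
    for i in range(len(words) - 1, -1, -1):
        if words[i].lower() == anchor:
            return " ".join(words[:i] + new_words)
    return dictated  # anchor word not found, so leave the text unchanged


print(resume_with("The black cat jumped over the brown dog", "over the moon"))
```

Everything from the anchor word to the end of the dictation is replaced, so you only need to repeat one word of context before saying the new text.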

This is a particularly useful command for doing something people do a lot — change text as they dictate.

Now I have a suggestion that I think would make the command both better and more often used. Split “Resume With” into two commands: “Try Again” and “Change To”. The two commands would have the same result as “Resume With”, but “Try Again” would tell the computer that the recognition engine got it wrong the first time and you are correcting the error. “Change To” would tell the computer that you are simply changing text.

This would be a less painful way to correct text than the traditional correction box. Users are tempted to change text rather than correct it because it’s easier. This would make it equally easy to correct and change using what is arguably the fastest and easiest way to make a change.

Easy correcting is important because NaturallySpeaking learns from correcting and because it’s annoying when the computer gets things wrong. Correcting improves recognition. Minimizing the interruption reduces frustration and lets users concentrate on their work rather than spending time telling Dragon how to do its job. From my observations, many users are tempted to change text rather than correct it when the computer gets something wrong simply because it’s easier.

It would be great to have these commands both in Dragon NaturallySpeaking on the desktop and in Dragon Dictation, the iPhone application. This would enable truly hands-free dictation in Dragon Dictation.

Tip: What to do when dictation isn’t recognized as text

Occasionally the Dragon NaturallySpeaking speech engine will get mixed up about whether or not the program or field in focus is something you should be able to type text into. When this happens you’ll see lots of question marks in the recognition box.

The problem is usually easy to fix — move the focus out of whatever program this is happening in, then back in. Here’s a quick way to do that — the UC command “Notepad Open · Notepad Close”.

Tip: Help on the NaturallySpeaking utilities


A few weeks ago the folks at Nuance posted a Dragon NaturallySpeaking 10 user workbook. It’s an excellent resource. It includes detailed instructions on NaturallySpeaking speech engine utilities that will make your speech experience better.

Here are my favorite parts:

The instructions for creating a user profile (page 1) explain important concepts like choosing the right dictation source. If you accidentally choose the wrong dictation source, your accuracy will not be as good.

The Vocabulary dialog box (page 8) allows you to add custom vocabulary including phrases, and import custom vocabulary from existing documents. I make sure my users have “Utter Command” and “Redstart Systems” as phrases. I have a usability complaint about the Vocabulary dialog box, however. It’s difficult to find in the menus because the menu name doesn’t match the dialog box name: NaturallySpeaking/Words/View-Edit. I think the label should be Vocabulary Editor instead.

The Formatting dialog box (page 23) allows you to control automatic formatting of special text like numbers. This section explains what you can control and how to control it.

The My Commands dialog box (page 44) allows you to create Text and Graphics commands. These boilerplate commands are relatively easy to create and can save you a lot of time. You can assign a command like “My Address” to a larger block of text, complete with line breaks and formatting.

The improving accuracy section (page 50) includes instructions for the Train Words, Acoustic Settings and Acoustic and Language Model Optimizer dialog boxes.

Here are some resources that have to do with using NatSpeak utilities with Utter Command:

In the Utter Command manual we touch on how Utter Command dovetails with NatSpeak Correction, Vocabulary, Recognition and Train Words utilities in UC Lesson 1 (1.3, 1.4, 1.5, 1.11). Also see the “Dragon NaturallySpeaking” section on UC Exchange.