Category Archives: Dragon NaturallySpeaking

Google+: will keyboard access improve beyond jk Enter Tab?

It’s obvious that Google+ is a powerful tool for personal and business use, and that we’ll see much more of it as time goes on. Circles are a brilliant way to organize contacts and share information, and hangouts makes video conferencing easy. What’s not to like?

Well, have you ever been in a situation where you’re looking at food that looks great and smells great and you’re hungry and would very much like to have some, but there’s some reason you can’t?

I have to give Google+ very mixed reviews on accessibility.

The more important a tool like Google+ turns out to be, the more important that everyone have access to it — including a couple of large communities who find it difficult or impossible to use a mouse: folks who have repetitive strain injuries and folks who are blind.

The good news is it’s relatively simple to make a program accessible to everyone: include keyboard controls. Better yet, provide a way to reconfigure keyboard shortcuts and share configurations. Enable the keyboard, and alternative controls like speech recognition can translate keyboard shortcuts to give users whatever type of access they need.

So how accessible is Google+ on this basic — keyboard shortcuts?

I couldn’t easily find shortcuts documentation.

So I tried some things out.

It failed on one basic requirement — you can’t use the arrow keys to move up and down conversations and drop-down menu items.

There’s a little good news, however.
– You can use the “j” and “k” keys to move between conversations, just like in Gmail.
– The Enter key sets you up to write a comment.
– And a combination of the right number of Tabs and Enter lets you post, delete or cancel from the comment field.

So if you use Dragon plus Utter Command speech software to control the computer you can say “Letter j” or “Letter k” to move back or forward through entries. You can also skip forward or back in a single utterance, e.g. “k Times 3”. And you can open a comment field by saying “Enter”, and, once the comment is in, say, “Tab Enter” to post it, or “2 Tab Enter” to cancel it.
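
To make the pattern concrete, here is a minimal Python sketch of how spoken phrases like these could be expanded into individual key presses. It is an illustration only, not how Utter Command is implemented: the function name expand_utterance and the parsing rules are assumptions that cover just the phrases quoted above.

```python
def expand_utterance(utterance):
    """Turn a spoken phrase into a list of key names (illustrative only)."""
    words = utterance.split()
    keys = []
    i = 0
    while i < len(words):
        word = words[i]
        if word == "Letter" and i + 1 < len(words):
            # "Letter j" -> the single key j
            keys.append(words[i + 1])
            i += 2
        elif word.isdigit() and i + 1 < len(words):
            # "2 Tab" -> Tab pressed twice
            keys.extend([words[i + 1]] * int(word))
            i += 2
        elif (i + 2 < len(words) and words[i + 1] == "Times"
              and words[i + 2].isdigit()):
            # "k Times 3" -> k pressed three times
            keys.extend([word] * int(words[i + 2]))
            i += 3
        else:
            # Any other word is treated as a key name, e.g. "Enter"
            keys.append(word)
            i += 1
    return keys
```

Under these assumptions, “k Times 3” expands to three presses of k, and “2 Tab Enter” expands to Tab, Tab, Enter. The point is that every one of these commands bottoms out in plain keystrokes, which is why keyboard shortcuts in the application matter so much.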

But that’s about it for useful direct keyboard control.

If you use the Firefox Mouseless browsing extension, you can go directly to most elements for a small penalty — numbers taking up space on your screen.

And if you use Dragon’s speak links ability you can click links and buttons that contain words, but this breaks down with less pronounceable names like +1’s and unpronounceable things like the home, pictures, profile and circles icons. Dragon’s speak links ability is also a little fragile — it’s all too easy to accidentally say a word that clicks a link.

And both solutions require you to identify something by sight before you take action, which can make things slower or a showstopper depending on your abilities. This is where keyboard shortcuts should be filling in the blanks.

Given the mixed situation, the easiest way for Dragon plus Utter Command users to access commonly clicked items like the search field, unpronounceable icons like profile, and unpronounceable symbols like the little drop-down list on the top right corner of each entry might be to use the UC Touch List to set up named mouse clicks. It takes a little setup, but will get you to the meal in the end.

Google+ clearly needs more keyboard shortcuts.

Better yet, how about a tool to allow us to easily configure keyboard shortcuts in Google+? Or even better, how about a tool to allow us to easily configure keyboard shortcuts across Google products? This would allow more people into the circle. It also has a lot of potential to improve the experience for folks who are already in.

7/14/11
All too often software vendors act like they’re the only ones with a software vendor relationship with the user.
It’s no big deal if an update automatically downloads every once in a while. It is a big deal if you use 20 pieces of software and an update from each downloads every once in a while.
This is why standards or good user controls are important for communicating with software.

Posting to WordPress by speech

I get a lot of inquiries about how I carry out particular computer tasks by speech.

Here are the gory details on what I do to write a blog item and post it to WordPress:

Getting ready to write

When I think of an idea for a Patch on Speech blog post I say
– “Blog Pending Site” to bring up the Google document I write the blog in. Then I say
– “Find Mark 1”, then “Another Graph” to position the cursor. I have “MARK 1” written at the top of my working section. The first command selects “MARK 1”, and the second one positions the cursor two lines below it at the top of the section. Then I say
– “Today Short Enter” to add the current date and move the cursor to the next line

Writing

I either jot down an idea, or write a whole post.

When I’m writing I make heavy use of “1-20 Befores” to select the last few words I said and change them. A key point about this technique is I don’t count how many words I want to select back. I just make sure to select more words than I need to change, then look to see what is selected and resay what I need to.

I also make use of the Dragon inline commands, which allow you to say punctuation like “Open Quote” and “New Paragraph” without pausing. I use “Another Graph” to start a new paragraph when I’m not at the very end of a line. I occasionally find myself speaking keyboard to fix something, for instance “Left Backspace Right” to correct “two” to “to”.

We’ve just been testing a series of commands that lets you use a mouse without clicking, and I’ve been experimenting with commands like “Touch Word” and “Touch 3 Words” to select text.

Posting

After I’ve written and edited a piece, I say
– “Find Mark 1”, then “2 Down Home” to put the cursor at the beginning of the headline
Then I use several “1-100 Up\Downs” commands combined with a copy command to select the story, e.g. “50 Downs”, “20 Downs”, “5 Ups Copy”

Then I open the page where I post by saying
– “WordPress Site”
If I’m not already logged on, it prompts me for my username. I have my username in the UC Enter list so I can say it and hit the Enter key in one utterance. Since my password is stored, I can log in with a single utterance:
“<username> Enter”
Once I’m in I say
– “31 Go” to click the “New Post” link
– “Tab Paste” to tab to the body field and paste the text
– “Go Top” to move the cursor to the top of the file
– “Line Cut” to cut the headline
– “2 Delete” to remove the extra lines
– “49 Go” to move to the headline field
– “This Paste” to paste the headline

Categories and Publish

I add categories using the Go numbers, one or two at a time, e.g. “31 Go” to add one category and “38 Go 41 Go” to add two categories in a single utterance, and use a Go number to hit the “Preview” button.

Then I look over the post, say “Doc Close” to close the preview, and use a Go number to hit “Publish”.

Avoid having to remember commands

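I think the key to enabling a program for efficient speech control is to take the time to look at what you want to do in detail and write out the steps. Make a game of figuring out just how efficient you can be. Then put the steps in one of the UC custom guides, so you can call it up instantly and simply read the set of commands to carry out the task, e.g. “Custom 3 Guide”.

This way you don’t have to remember commands. Eventually, after using the guide a bunch of times, you’ll have the sequence memorized without having to consciously memorize it.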

If you have a way of carrying out a task by speech that you’re particularly proud of — or if there’s something you’re struggling with — drop me a line at kim @ this web address.

Watch for Dragon 11.5

It looks like a new point release of Dragon NaturallySpeaking – 11.5 – will be available for free download for Dragon 11 customers within a few weeks.

Articles about the point release tout its new-found ability to use the iPhone as a wireless microphone for Dragon over wifi. The point release also has some new commands for a couple of very popular websites: Twitter and Facebook.

There have been some more subtle changes as well, including bug fixes aimed at making things run smoother. We weren’t recommending that our customers jump to 11, partly because version 10 was still a little faster, but 11.5 is probably worth the change.

This is the first time since NaturallySpeaking 3.5 that Dragon has done a mid-point release. That one was free as well, only back then Dragon had to send CDs out to everyone.

Here’s where to watch for the free download for Dragon 11 customers:

http://www.nuance.com/dragon/whats-new-upgrade/index.htm

http://www.nuance.com/for-business/by-product/dragon/product-resources/whats-new-version-11/index.htm

Redstart Systems will also have an announcement within a couple of weeks — watch this space.

Spell Everywhere

I’ve been getting a lot of questions lately about the Dragon NaturallySpeaking “Spell XYZ” command. This command lets you say, for instance “Spell s a”. People are complaining that it sometimes doesn’t work. They’re right.

This command doesn’t work everywhere. It only works in text boxes. This is an unfortunate oversight in the Dragon user interface.

Logically, any speech command should work in all contexts where it could be useful. It’s unnecessarily difficult to make the user remember different commands to carry out the same operations in different contexts. Something as basic as pressing a letter key should work anywhere you might want to use a letter, including menus.

This is what people are complaining about. Those who are complaining have gotten adept enough at speech that something basic like pressing letter keys becomes second nature. They have a habit of saying “Spell” and then a letter, number or symbol name whenever they have to hit separate keys. The definition of habit is you don’t have to think about it. And this is where they get in trouble — the habit kicks in everywhere, including when you are in a drop-down menu that doesn’t respond to full words.

If you’d like to use the “Spell XYZ” command everywhere rather than having to stop and think about where you can and can’t use it, complain to Nuance, the company that makes Dragon (there are a couple of ways to do this; details are posted on the Redstart wiki: http://redstartsystems.com/Wikka/wikka.php?wakka=NatSpeakUtilitiesandResources).

Friday Tip: Creative New Line

Although you can use the Utter Command Folders List utility to directly open any folder on your folders list using a single command, you’re also likely to at least occasionally navigate ad-hoc through the file system. A common way to navigate is to say a folder name to navigate to the folder (e.g. “Financials”), say “Enter” to go into the folder, say the name of a subfolder (e.g. “Budget”), then say “Enter” to go into the subfolder, etc.

Here’s a tip from Jacob Cole, an MIT student I’ve been training on Utter Command.

Navigate folders using the Dragon “New Line” in-line command, which was originally conceived as a text command. In-line commands are used within a text phrase. They’re mostly punctuation marks like “comma”. “New Line” is a little different. It literally hits the Enter key to give you a new line. The classic “New Line” example is saying a grocery list without pausing between lines, e.g. “Avocados New Line Eggs New Line Flour”.

Jacob pointed out that you can also use “New Line” to reduce the number of phrases you have to say when navigating through folders. Using the above example, instead of having to say four separate utterances to go two folder layers deep, you can say “Financials New Line” to navigate to the financial folder and go into it, then “Budget New Line” to navigate to the budget folder and go into it. Or even “Financials New Line Budget New Line”.

Happy navigating.

Have any good tips or pet peeves about using speech input? Let me know at info@ this website address.

What’s in a name? Lots.

I get a lot of inquiries from people who are confused about the Dragon speech engine’s many names, and also the name of the company that owns it.

Here’s a brief history:

The Dragon speech engine has changed hands twice, but the name of the company owning it has changed three times.

In the beginning Dragon Systems created the DragonDictate speech engine. Several other companies also created programs that let you speak to a computer: Kurzweil Applied Intelligence, Lernout & Hauspie, IBM and Philips. These early speech engines all required you to pause between words. This was a somewhat frustrating way to dictate and was hard on your voice.

Dragon, Lernout & Hauspie, IBM and Philips eventually improved their speech engines so you could dictate in phrases. When Dragon Systems brought out continuous speech recognition, it changed the name of its product to Dragon NaturallySpeaking. Dragon NaturallySpeaking generally worked better for dictation than DragonDictate.

People who were trying to use Dragon NaturallySpeaking hands-free, however, found that Dragon NaturallySpeaking lacked some of the DragonDictate features. Some of us who needed hands-free speech input used a combination of DragonDictate and Dragon NaturallySpeaking for years. (For me it was until NaturallySpeaking 3.5 came out. There are still a couple of features that were in the old DragonDictate that haven’t made it into Dragon NaturallySpeaking. The one I miss the most is the ability to go straight to a macro script from the recognition dialog box where you could see what Dragon had heard.) So DragonDictate was used and talked about long after development stopped.

Just before Dragon NaturallySpeaking version 5 came out Dragon Systems was sold to Lernout & Hauspie, makers of rival speech engine VoiceXpress Pro. NaturallySpeaking 6 was a merger of the products, keeping the NaturallySpeaking name and most of the look and feel (with the notable exception of the macro creation facility). When Lernout & Hauspie famously melted down, the Lernout & Hauspie speech assets were sold to ScanSoft, a company that started with optical scanning recognition technology acquired from Xerox, who acquired it by buying Kurzweil Computer Products, Inc., one of several companies started by Ray Kurzweil. (The Lernout & Hauspie speech assets also included the Kurzweil Voice speech engine, which Lernout & Hauspie had acquired by buying Kurzweil Applied Intelligence, another company started by Ray Kurzweil.)

Just before ScanSoft acquired Dragon, they’d signed a 10-year deal with IBM to market IBM’s ViaVoice, which by then included PC and Mac versions. After the ScanSoft acquisition there were no more new ViaVoice products. Over the next few years ScanSoft acquired many more speech-related companies including Nuance. After the Nuance acquisition, ScanSoft switched its name to Nuance. Some people refer to the old Nuance as blue Nuance and the current Nuance as green Nuance. (This was the second name change for ScanSoft. It was founded in 1992 as Visioneer.)

This year, Nuance created an iPhone app named Dragon Dictation — name sound familiar?

Also this year Nuance bought MacSpeech. There’s some name history here too. MacSpeech’s original speech engine for the Mac, iListen, was based on the Philips FreeSpeech2000 speech engine. MacSpeech changed its product name to match the company name after signing an initial deal with Nuance in early 2008 to use the Dragon NaturallySpeaking engine. (Later in 2008 Nuance bought Philips Speech Recognition Systems.) After buying MacSpeech, Nuance renamed the speech engine product Dragon Dictate for Mac. Name sound familiar? The old DragonDictate had no space between words. The new Dragon Dictate is two separate words.

OK. Got that all straight? There’s a little more nitty-gritty. The Dragon NaturallySpeaking product line includes a basic version, middle version, professional version, legal version and medical version. The professional, legal and medical versions all originally had the “Dragon NaturallySpeaking” first and middle names, but somewhere along the line the legal and medical versions lost NaturallySpeaking, becoming Dragon Legal, and Dragon Medical.

Meanwhile, the basic and middle versions have recently changed names. The basic version has in the past gone by “standard” but is currently “home”. The middle version has in the past gone by “preferred” but is currently “premium”. There’s also a sub-basic version, Dragon NaturallySpeaking Essentials, that is not usually sold by resellers but can be found in retail stores, usually around Christmastime.

One last thing. I’m not sure where Dragon Speak came from. I’ve heard many people refer to Dragon NaturallySpeaking as Dragon Speak, but that’s never been an official name — so far.

So — I hope that clears everything up.

Utter Command has always been named Utter Command — just saying.

Suggestion for Dragon: Easier Correction

In the last couple of months I’ve had a couple of occasions to suggest to the folks at Nuance, the company that makes the Dragon NaturallySpeaking speech engine, that their “Resume With” command is under-advertised. The command is very useful, but I keep meeting people who don’t know about it.

“Resume With” lets you change text on the fly. For instance, if you say “The black cat jumped over the brown dog”, then — once you see it on the screen — change your mind about the last bit and say “Resume With over the moon”, the phrase will change to “The black cat jumped over the moon.”
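
Here is a minimal Python sketch of the idea, to show why the command is so quick to use. It is an assumption about the general approach, not Dragon’s actual implementation, and the function name resume_with is hypothetical: it looks for the last place in the dictated text where the opening words of the new phrase appear, and replaces everything from that point on.

```python
def resume_with(dictated, resumed):
    """Replace the tail of `dictated` starting where `resumed` picks up.

    Illustrative sketch only; Dragon's real matching rules are not
    documented here.
    """
    old_words = dictated.split()
    new_words = resumed.split()
    # Try the longest overlap first, then shorter ones.
    for size in range(len(new_words), 0, -1):
        lead = new_words[:size]
        # Scan from the end so the most recent match wins.
        for start in range(len(old_words) - size, -1, -1):
            if old_words[start:start + size] == lead:
                return " ".join(old_words[:start] + new_words)
    # No overlapping words found: leave the text unchanged.
    return dictated
```

With the example above, resume_with("The black cat jumped over the brown dog", "over the moon") matches on “over the” and returns “The black cat jumped over the moon”. The speaker never has to select anything first, which is what makes the command feel effortless.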

This is a particularly useful command for doing something people do a lot — change text as they dictate.

Now I have a suggestion that I think would make the command both better and more often used. Split “Resume With” into two commands: “Try Again” and “Change To”. The two commands would have the same result as “Resume With”, but “Try Again” would tell the computer that the recognition engine got it wrong the first time and you are correcting the error. “Change To” would tell the computer that you are simply changing text.

This would be a less painful way to correct text than the traditional correction box. Users are tempted to change text rather than correct it because it’s easier. This would make it equally easy to correct and change using what is arguably the fastest and easiest way to make a change.

Easy correcting is important because NaturallySpeaking learns from correcting and because it’s annoying when the computer gets things wrong. Correcting improves recognition. Minimizing the interruption reduces frustration and lets users concentrate on their work rather than spending time telling Dragon how to do its job. From my observations, many users are tempted to change text rather than correct it when the computer gets something wrong simply because it’s easier.

It would be great to have these commands both in Dragon NaturallySpeaking on the desktop and in Dragon Dictation, the iPhone application. This would enable truly hands-free dictation in Dragon Dictation.

Trying out Dragon Search for the iPhone

Dragon Search is a nice app. Here’s how it works: open the app, hit one button, speak the phrase you want to search for. By default the app stops listening and starts the search when you pause so you don’t have to hit another button when you’re done.

The app comes up quickly, which from a practical standpoint is extremely important. And in my experience so far the search has been fast. There’s also a button you can push to cancel out of the search. The big plus of this application is the different search channels: Google, iTunes, Twitter, Wikipedia, and YouTube. You can search for something, like green apples, and the results will come up in the channel you used last. Once you’ve done a search you can switch channels easily to see results across channels.

I have a couple of practical suggestions.

1. The history list is just three items long — I’d like a much longer scrolling history list. Google Voice Search has a long scrolling list that includes dates. I would’ve liked to have seen Nuance improve on that.

2. I’d also like to be able to add my own channel.

I’ll also take the opportunity to repeat what I said a couple of days ago. I appreciate the progress on speech apps — don’t get me wrong. But speech on the iPhone is still not what I really want, which is system-level speech control of a mobile device that would give me the option to use speech for anything. These new apps are steps in the right direction — making the iPhone more hands-free. But there’s still a long way to go.

A few more thoughts on Dragon Dictation

I’ve been using Dragon Dictation on the iPhone a little more over the past few days and have a couple more thoughts for improvement.

1. If you select text in the full-screen application, then switch to the keyboard, the text doesn’t stay selected. The text should stay selected. If you’ve selected an incorrect word or phrase, found there are no correct choices, and are proceeding to the keyboard to correct it, it’s frustrating to have to select again.

2. I’ve lost dictation a couple of times because I’ve switched out of the app. This is unexpected because writing apps like Notepad tend to stay where you left them. I suspect that Dragon Dictation maker Nuance made this choice in order to limit the number of steps for new dictation. I think there are ways to provide this valuable option without increasing steps. The quick solution would be a “remember last dictation” option in settings that would let the user decide which way to do it. Maybe a better solution would be adding a “continue” button to the bottom of the initial screen. So if you wanted to start fresh you would press the main button in the middle of the screen, but if you wanted to continue you could press the smaller “continue” button at the bottom of the screen.

Trying out Dragon Dictation for the iPhone

I’ve been trying out the Dragon Dictation iPhone app. It’s still not what I really want, which is system-level speech control of a mobile device that would give me the option to use speech for anything. But it’s a step in the right direction of making the iPhone more hands-free.

Here’s how Dragon Dictation for the iPhone works: open the app, hit one button, speak up to 30 seconds of dictation, then hit another button to say you’re done. Your dictation shows up on the screen a few seconds later. Behind the scenes the audio file you’ve dictated is sent to a server, put through a speech-recognition engine, and the results sent back to your screen. Now you can add to your text by dictating again, or hit an actions button that gives you three choices: send what you’ve written to your e-mail app, send it to your text app, or copy it to the clipboard so you can paste it someplace else.

The recognition is usually fairly accurate in quiet environments. Not surprisingly, you get a lot of errors in noisy environments. The built-in microphone on a mobile device is not optimal for speech recognition. To its credit, Dragon Dictation does pretty well given these constraints.

Here’s a practical suggestion that should be easy to implement: Add a decibel meter so people can see exactly how much background noise there is at any given time. This would make people more aware of background noise so they could set their expectations accordingly.

The interface for correcting errors is reasonable. Tap on a word and there are sometimes alternates available or you can delete it. Tap the keyboard button and you can use the regular system keyboard to clean things up.

I have two interface suggestions:

1. You can’t use the regular system copy and paste without going into the keyboard mode. You should be able to. I suspect this is fairly easy to fix.

2. There is no speech facility for correcting errors. I think there’s a practical fix here as well.

First, some background. Full dictation on a mobile device is tricky. Full dictation speech engines take a lot of horsepower. Dragon Dictation sidesteps the problem by sending the dictation over the network to a server running a speech engine. The trade-off is it’s difficult to give the user close control of the text — you must dictate in batches and wait briefly to see the results. This makes it more difficult to offer ways to correct using speech. But I think there is a good solution already in use on another platform.

Although it’s difficult to implement most speech commands given the server setup, the “Resume With” command that’s part of the Dragon NaturallySpeaking desktop speech application is a different animal. This command lets you start over at any point in the phrase you last dictated by picking up the last couple of words that will remain the same and dictating the rest over again.

Adding “Resume With” would make Dragon Dictation much more useful for people who are trying to be as hands-free as possible. It would also lower the frustration of misrecognitions and subtly teach people to dictate better.

It’s nice to see progress on mobile speech. I’m looking forward to more.
