Category Archives: Patch on Speech

We’re live


After working with beta testers and presales customers for the past year, today we’ve announced the general release of Utter Command.

 

It’s been a long time coming. It started 15 years ago when I got repetitive strain injuries in my hands. I first used the Kurzweil speech engine, and then, when it came out, the first Windows version of DragonDictate, the precursor to Dragon NaturallySpeaking.

 

After several years of writing macros that were similar to everyone else’s — and that I often forgot — I started thinking about the way the brain works with language and started working on a more consistent system. Sometime after that we decided to make a general product out of it. We were thinking it would take six months. It’s taken five years.

 

One of the reasons it took so long is that we’ve produced thorough, cross-referenced documentation. Every command is explained. Many thanks to our beta testers, trainers, and presales customers for using and commenting on the UC command system, applets and documentation as we were developing and refining them. Special thanks to Laurie, our VP of QA, and Bill the trainer for many reads through the documentation and many trips through the self-guided tours.

 

And special thanks to Wren, a programmer who worked with us in the early days. The bird that appears in our logo is the Painted Redstart (we’d already named the company when Wren, also named for a bird, joined us). 

 

Note to presales customers: you should have received your general release copy of Utter Command. Contact us via the support email or Make a Comment contact form if you haven’t.

More on naming a mouse touch


We’re continuing to find new uses for Utter Command’s naming-a-mouse-touch ability.

Here are some new ones:

– “Folders Touch” to click the folder tree button in Windows Explorer. This lets you toggle the folder tree pane on or off – thanks to Bill Z the trainer

– “Web Touch” to click on the top left corner of a Web page, away from any links. This lets you return focus to the page – thanks to Jill

– In general, iTunes buttons – thanks to Jill

– “Snapshot Touch” to click the snapshot button on the history window in Photoshop – thanks to Eric

– “Highlight Touch” to click the highlighter button in Word – thanks to Jeff

And here’s a new one I’ve been using: “Right Touch” and “Left Touch” to click the right and left side of a horizontal scroll bar in Excel. This lets you scroll left and right by page.

We’re also finding some new uses for naming two mouse clicks in a row.

– “Balloon Middle Touch” to dismiss the Dragon NaturallySpeaking balloon that comes up in NaturallySpeaking 10 Service Pack 1. The command clicks the balloon to make it go away, then clicks the middle of the screen to put the focus back on your application – thanks to Bill Z the trainer

– “Capture Settings Touch” in FastStone Capture. The command clicks the tiny main menu icon on the software toolbar menu, then clicks settings. This makes it easy to switch among full-screen, active area and window capture – thanks to Eric

And here’s one from Daniel:

– “I use a Microsoft address book that always opens in the wrong folder (“shared contacts” instead of “main identity contacts”). The window is also divided so I can’t switch folders with the cursor without moving the mouse or tabbing a lot. So I named a Local Touch to click “main identity contacts” and another one to click inside the portion of the window that lists the names and addresses. What it comes down to is that the brief command “Local Contacts Names Touch” puts me where I want to be after the window opens. This is extremely convenient!”

Thanks, and keep them coming – reply here or let me know at info@ this website address.

4/30/09 Note: see the naming a mouse touch video.

New Videos: The Whirlwind Tour

Check out our new videos — UC Whirlwind Tour part 1 and UC Whirlwind Tour part 2.

The Whirlwind Tour gives you a taste of what you can do with Utter Command in some key areas:

  • Controlling the Utter Command menu
  • Opening and closing programs
  • Clicking the mouse
  • Moving and sizing windows
  • Using Windows and program menus
  • Opening files and folders directly
  • Dictating and closely editing in any program
  • E-mailing
  • Using the Internet

There’s a printed copy of this tour, including cross references, in Getting Started. UC also includes an on-screen guide for this tour (say “UC Whirlwind” and the Whirlwind Tour commands will appear in a narrow on-screen guide window on the right side of your screen; other programs will resize around it).

Watch the tour, then use the on-screen guide or a printout of Getting Started to take it yourself.

What’s your favorite UC command? What would you like to see a tour on? Reply here or let me know at info@ this website address.

Ten things I’d like to see

In December 2003 the Boston Voice users group (BVUG) and its New York City counterpart (NYPC) did top 10 lists of what they would like to see in speech recognition engines. At the time both Dragon NaturallySpeaking and IBM’s ViaVoice were available.

Here’s my version for Dragon NaturallySpeaking 10. This list is also posted on the UC Exchange Wiki so I can keep track of whether and when they’re implemented.

1. I’d like a default user option that would let me start the program hands-free.

2. I’d like the ability to check audio settings hands-free.

3. I’d also like the ability to save and switch Check Audio settings. This is useful if you travel a lot. I do an audio check whenever I land someplace new, but there’s no reason I should have to do another audio check rather than go back to a saved one once I’m back in the office. I have a couple more minor suggestions for the Check Audio dialog box. First, it’s important enough to deserve its own menu item rather than only being buried in the Accuracy menu. Second, there’s an interface gotcha. Once you’ve finished checking the microphone, the focus is still on the go button. If you’re not thinking and click without moving the focus, you find yourself checking the microphone again instead of going on to the accuracy check, which at best makes the process longer and at worst is confusing.

4. I’d like separate controls for buttons and menus. I’d like to be able to say whatever’s on the button, such as “yes” or “no”. But at the same time I want a longer command for menu items, e.g. “File Menu” rather than just “File”, because menu options are often active when I’m writing text.

5. The Dragon NaturallySpeaking engine should understand that when I say “Cap” what I’m looking for is a written word, not a number or symbol. “Cap Sixty” should return “Sixty”, not “60”. And “Cap Ampersand” should return “Ampersand” not “&”.

6. In the Spell Correction dialog box, I’d like a way to tell NatSpeak to type a whole word. I’d like to say the word “Word” to indicate that the rest of the phrase is going to be a word, just like I can say “Spell” to indicate that the rest of the phrase is going to be spelled.

7. The old DragonDictate, where you could say separate words, was better for people who have some types of disabilities. Putting a “speak words separately” mode in NaturallySpeaking would help a lot of people.

8. I’d like the option to be able to train the NatSpeak speech engine by repeating audio read to me through headphones rather than reading from text. This would also make training easier for younger kids.

9. I’d like a simple way to duplicate a user. Right now you can do this, but it’s a multistep and confusing process. To make a copy of the current user you have to back up, then restore. A separate menu item for duplicating would take the confusion out of the process.

10. Bring back the Dragon logo :-). The Dragon was much cooler than the green spiky blob.

What do you think of my top 10 list for NaturallySpeaking? What’s yours? Reply here or let me know at info@ this website address.

New Videos: Commandline and quick Perl


We have a couple of new demo videos up.

Utter Command: Commandline by Speech shows how you can use the UC List Enter facility to speed up the command-line interface.

Utter Command: Writing a Perl Script by Speech shows how you can use UC’s combined keyboard commands to speed up writing code. Note that for this demo we don’t use any custom coding commands, just standard commands that work the same in any program.

You may recognize this Perl script from a YouTube video of a Microsoft speech demonstration. The big difference between the videos is that with UC I had fewer commands to say and therefore fewer potential points of failure. There were a couple of other differences as well. I’m using the ideal speech setup: the NaturallySpeaking Pro speech engine running on XP with a Sennheiser ME3 microphone and a buddy USB pod. I also wasn’t in front of an audience. I suspect the computer hardware is similar. My laptop is a two-year-old Intel Core Duo 2.16 GHz with 2GB of memory.

Useful free software on UC Exchange


People often ask me for advice on software, and there are a lot of free programs I regularly recommend. I put up a UsefulFreeSoftware page on UC Exchange as a general reply. I also included links to help pages and forums for the software. Let me know if there’s something you think I should add to the list. Reply here or let me know at info@ this website address.

Dealing with the Office 2007 ribbon


I’ve been getting a lot of questions lately about Microsoft Office 2007 versus Microsoft Office 2003.

My stock answer is that I prefer the 2003 drop-down menus to the 2007 ribbon. It’s funny: at the same time that Office made the switch from drop-down menus to the more Web-like ribbon, the Web application Google Documents made the opposite move, changing from a tab-based interface to drop-down menus. Out of the box, 2007 is less efficient: it takes up more screen space and requires more steps than 2003.

Having said that, the 2007 interface is also very configurable. You can put any drop-down menu or menu item on the Quick Access Toolbar that runs across the very top of the screen. And you can hide the ribbon. If you take the time to put the items you use most on the Quick Access Toolbar, you can make Office 2007 much more accessible.

For details on setting things up and using Microsoft Office 2007 with Utter Command, see UCExchange: UCandOffice2007.

What’s your opinion on 2007 versus 2003? Reply here or let me know at info@ this website address.

UC Exchange


The UC Exchange Wiki is up! Check it out (say “UC Exchange”). Over the coming months you’ll see pages on specific applications with advice on how to apply UC to those programs, including step-by-step tours. 

Research Watch: What you see changes what you hear


Who says looks don’t matter?

It looks like what you see changes what you hear. Researchers from Haskins Laboratories and MIT have found that different facial expressions alter the sounds we hear.

This shows that the somatosensory system — the mix of senses and brain filtering that determines how you perceive your body — is involved when you process speech.

This doesn’t have a whole lot to do with speech commands except to show that it’s easy to underestimate the complexity, and subtlety, of our perception of spoken language.

Resources:

Somatosensory function in speech perception
www.pnas.org/cgi/doi/10.1073/pnas.0810063106