Archive for the 'Interface' Category

Gravity on the Web

Wednesday, August 6th, 2008

Computer commands of all kinds — speech, keyboard and mouse — are much easier to use when they’re consistent across programs.

At the base level, it’s important that common elements like drop-down menus act the same. You control drop-down menus without thinking — click on an element or use the Left, Right, Up, Down and Enter keys.

Consistent commands are the real-world equivalent of having the same gravity in every room, or keys turning the same way to unlock.

Web applications are looking more and more like standard computer programs, but sometimes the elements that look familiar don’t act the way we’re used to. Drop-down menus usually respond in a familiar way to the mouse, but often don’t respond to the Up, Down and Enter keys.

But perhaps things are getting better.

The first drop-down menus to show up on Google Docs didn’t respond to Left, Right, Up, Down and Enter. Then most of the folder-view drop-down menus were arrow key/Enter enabled, but not document menus. A few months ago document menus changed from looking tab-like to looking more menu-like, but still didn’t respond to arrow keys and Enter. Then, sometime in the last few weeks, the Doc menus were arrow key/Enter enabled (the change didn’t show up on the update notice).

The keyboard shortcuts enable better speech navigation as well. I can say, for instance, “3 Down Enter” to choose an item in an open menu, “3 Down 2 Right Enter” to choose a color on the open color menu, or “7 Right Wait 3” to take a three-second peek at each of the seven successive menus starting with the File menu open.
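To make the idea concrete, here is a rough sketch of how a spoken phrase like “3 Down 2 Right Enter” could map onto individual key presses. This is only an illustration, not how Utter Command or NatSpeak actually work internally: the function name phrase_to_keys and the rule that a number repeats the key that follows it are assumptions made for the example, and the sketch leaves out “Wait”.

```python
# Hypothetical illustration: the names and parsing rule below are assumptions,
# not the actual Utter Command implementation.
KEYS = {"Left", "Right", "Up", "Down", "Enter"}


def phrase_to_keys(phrase):
    """Turn a phrase such as '3 Down 2 Right Enter' into a list of key presses."""
    keys = []
    repeat = 1  # a bare key name is pressed once
    for word in phrase.split():
        if word.isdigit():
            # A number applies to the next key name only.
            repeat = int(word)
        elif word in KEYS:
            keys.extend([word] * repeat)
            repeat = 1
        else:
            raise ValueError(f"unrecognized word: {word!r}")
    return keys


if __name__ == "__main__":
    print(phrase_to_keys("3 Down 2 Right Enter"))
```

The point of the sketch is that one short phrase stands in for six separate key presses, which is only possible when the menu responds to the arrow keys and Enter in the first place.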

This is a great trend.

Now all we need is keyboard shortcuts to open the menus in the first place. We also need the same kind of control in all Web applications, including Google spreadsheets.

Friday Tip: Remembering boilerplate and vocabulary commands

Friday, August 1st, 2008

NatSpeak boilerplate Text and Graphics commands allow you to insert any text or graphics into a document using a single speech command. These commands can be very powerful — they’re good for adding text and graphics that you use often, such as your address or a set of directions.

The NatSpeak Vocabulary editor allows you to add words or phrases to your vocabulary that have different spoken and written forms. This allows you to make words like your email address easily pronounceable.

The key to using boilerplate and vocabulary commands is being able to remember them.

There are two ways to make these types of commands easy to remember:

1. Word them consistently

2. Make them easy to look up

I find the easiest way to remember boilerplate Text and Graphics commands is to simply say the first part of the text you’re inserting followed by “Full”. So “Redstart Full” prints the full name and address of Redstart Systems. If you have two different versions of the address, add a number. “Redstart Full 1” prints the same address in a different format.

You can use the Utter Command Clipboard facility to make anything easy to look up. Once you name your Text and Graphics command, say “Line Copy To” followed by the name of the UC Clipboard file and you’ve got it recorded. For example, to keep your boilerplate commands in “UC List 1” say “Line Copy To List 1”.

Now any time you want to consult your list of commands say “List 1 File”. You can also print it out.

I also use the start-to-say method for vocabulary words that have different written and spoken forms. I’ve put my Redstart email address in as a vocabulary word with the spoken form “Kim at Red” and my Gmail address in as a vocabulary word with the spoken form “Kim at G Mail” (in address commands I use “Kim” whether or not the actual address is just Kim or something longer).

One caution in using vocabulary in this way — make sure commands are at least two words and make sure the two words are not a common phrase that you’d want to say as is. If you need to, use the “Full” method above to avoid this problem. Also make sure to save your user after adding vocabulary words.

If you wish, keep vocabulary words that have different written and spoken forms on the same list as your boilerplate commands.

The difference between boilerplate commands and written/spoken vocabulary words is that a block of boilerplate is returned exactly as written, while vocabulary commands are treated like words, with appropriate spacing before and after them.

UC Commands Tip: say “NatSpeak” followed by the first one or two words in a NatSpeak dialog box title to call up that dialog box.

Commands for the dialog boxes mentioned above:

“NatSpeak My Commands” calls up the NatSpeak My Commands dialog box where you can write a boilerplate Text and Graphics macro

“NatSpeak Vocabulary” calls up the NatSpeak Vocabulary Editor dialog box

Solving the page down problem

Monday, May 19th, 2008

Whenever I talk to people who use speech commands to control a computer I encourage them to complain. Something that frequently comes up is that it’s a drag having to say “Page Down” so much.

We’ve come up with several ways to diminish the drag:

1. Several screens at once

First, “Page” is a back-of-the-mouth word, which is more difficult to say than words that only use sounds that originate in the front of the mouth. This isn’t a problem for commands you don’t use frequently, but looms large when you have to repeat something over and over again.

And when you say “Page Down”, you’re really moving by screen, not by page. This is fortunate, because “Screen” is easier to say than “Page”.

Using Utter Command you can say “Page Down” and “Page Up” to hit the Page Down and Page Up keys, but you can also say “Screen Down” and “Screen Up”. And you can move multiple screens: “2 Screen Down”, “5 Screen Up”.

2. Right to the point

You can also go to a given screen. “Screen 3”, for instance, jumps you right to the third screen of information in a document.

And in programs whose Find facilities recognize page numbers, including PDFs, you can go right to a given page by saying, for instance, “Find Page 22”. You can try this out on a UC lesson document: “UC Lesson 1”.

3. Wait

It’s still tedious to say “Screen Down” every couple of seconds when you want to glance quickly at subsequent pages. Try this: “3 Screen Down Wait 5”. This moves down a screen, waits 5 seconds, moves down another screen, waits 5 seconds, then moves down another screen.
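Spelled out as a sequence of steps, the “3 Screen Down Wait 5” example above looks something like the sketch below. The function name and the step format are made up for illustration and are not part of Utter Command; the only behavior taken from the description is that the wait happens between moves, not after the last one.

```python
# Hypothetical illustration of the step sequence described above;
# the function name and (kind, value) step format are assumptions.
def expand_screen_command(count, direction, wait_seconds):
    """Expand e.g. '3 Screen Down Wait 5' into an ordered list of steps."""
    key = "PageDown" if direction == "down" else "PageUp"
    steps = []
    for i in range(count):
        steps.append(("key", key))
        # Pause between moves only, so the last move is not followed by a wait.
        if i < count - 1:
            steps.append(("wait", wait_seconds))
    return steps


if __name__ == "__main__":
    for step in expand_screen_command(3, "down", 5):
        print(step)
```

So a single spoken phrase covers three key presses and two pauses that you would otherwise have to time yourself.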

4. The right tool for the job

It’s also important to look at exactly why you’re going through a document screen by screen. Often you’re looking through pages for a certain section. In this case the screen-by-screen facility isn’t the right tool for the job, but you may be using it because usually it’s the best tool available.

If you’re looking through a document that has numbers, letters or symbols to differentiate sections you can use the UC Keywords facility to go directly to any of these. To see what I mean say “Find 1 Period”, “Find 3 Period” in this document. Now picture a longer document with more and longer sections, and a section outline along these lines:

1. Speech Command Problems
1.1 Page Down
1.2 Page Down Solution

2. Speech Command

You could say, for example, “Find 1 Period”, “Find 1 Point 1”, “Find 1 Point 2” and “Find 2 Period” to jump among these sections.

Using the UC Keyword list you can use any section organization scheme you want: numbers, letters, numbers and letters (1a., 1b., …) or heading words themselves (“Find Introduction”, “Find Summary”). Sometimes I put tildes (~) at key points in a document so I can jump to those points (“Find Tilde”). I also use the word “PLACEHOLDER” this way (“Find Placeholder”).

You can also use “Wait” with keywords. I use this one to scan a document for placeholders: “Find Placeholder Wait 2 Repeat 5”.

Speeding search by speech

Friday, February 1st, 2008

Keyboard shortcuts are powerful tools for the speech interface because they work across all programs and they can be combined — you can say several keyboard shortcuts in one phrase to speed things up.

This is why we encourage all software makers to make all features available via keyboard shortcuts.

Google is experimenting with adding keyboard shortcuts to search results. Here are the experimental keyboard shortcuts:

Command               Action
Letter J              Selects next result
Letter K              Selects previous result
Enter (or Letter O)   Opens selected result
Slash                 Moves cursor to search box
Escape                Moves cursor to results

And here’s how to speed things up further with Utter Command combinations:

Command                 Action
Letter J · Enter        Opens next result
Letter K · Enter        Opens previous result
J Times 1-100           Moves down 1-100 and selects result
K Times 1-100           Moves up 1-100 and selects result
J Times 1-100 · Enter   Moves down 1-100 and opens result
K Times 1-100 · Enter   Moves up 1-100 and opens result
Escape · Enter          Moves cursor to results and opens
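
For anyone who wants to see the effect of these combinations spelled out, here is a small simulation of a results list responding to the J, K and Enter shortcuts. It is a sketch under assumptions, not Google’s code: the function names are invented, and it assumes the selection starts on the first result and stops at either end of the list rather than wrapping around.

```python
# Hypothetical simulation of the shortcut behavior in the tables above.
# Assumptions: selection starts at index 0 and is clamped at both ends.
def times(key, n):
    """Repeat a key n times, as in 'J Times 3'."""
    return [key] * n


def run_shortcuts(keys, result_count, selected=0):
    """Apply keys in order; return (selected_index, opened_index or None)."""
    opened = None
    for key in keys:
        if key == "j":
            selected = min(selected + 1, result_count - 1)  # next result
        elif key == "k":
            selected = max(selected - 1, 0)  # previous result
        elif key == "Enter":
            opened = selected  # open whatever is selected now
        else:
            raise ValueError(f"unsupported key: {key!r}")
    return selected, opened


if __name__ == "__main__":
    # "J Times 3 · Enter": move down three results, then open that one.
    print(run_shortcuts(times("j", 3) + ["Enter"], result_count=10))
```

Because each shortcut is a plain key press, any number of them can be chained in one phrase, which is what makes the combined commands possible.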

To try these out

1. Go to the Google experimental page www.google.com/experimental/
2. Under the Keyboard Shortcuts heading click “Join Experiment”
3. Go to regular Google search www.google.com or Advanced Google search www.google.com/advanced_search?hl=en, type a query, then try the shortcuts on the results.

As long as you’re logged in you’ll be able to use these shortcuts in the regular and advanced Google search pages.

Note: the Join Experiment button uses cookies. If your browser is set to remove all cookies at the end of a session and you want to retain this setting add www.google.com to your exceptions list (Firefox: Tools/Options/Privacy/Exceptions; Internet Explorer: Tools/Options/Privacy/Sites).


Talking to your telephone vs. talking to your computer

Monday, January 28th, 2008

The SpeechTEK speech conference has a lot to say about the state of the desktop speech interface. The exhibits in 2006 and 2007 were largely about where all the speech interface action is these days: not on the desktop, but over the telephone with interactive voice response (IVR) systems.

I went to several sessions aimed at the voice user interface designers (Vuids) who construct telephone speech command interfaces (even though I’m something of an imposter as a desktop voice user interface designer — I guess Dvuid would be the appropriate term).

We’re dealing with a lot of the same issues, though often with different twists:

  • Making sure people know what to say and stay oriented in the system
  • Accommodating beginners and experienced users
  • Making the process as fast and efficient as possible so people won’t hit the operator button or hang up (or not use the software — many people who buy desktop speech recognition software end up not using it)
  • In both cases the communications relationship is between a person and machine

And we’re looking at similar answers:

  • Making commands consistent
  • Avoiding ambiguity
  • Doing user testing
  • Thinking about configuring information in a certain order to make it more memorable (good mental maps and appropriate training wheels)
  • And above all avoiding the trap of thinking that people can just say anything because even if you truly could just say anything you still don’t know what to say

I’ve also been thinking about the differences between IVR and the desktop speech interface — these differences make the challenges more difficult or easier for each of the systems.

  • Desktop users tend to follow a more predictable curve — they get more experienced or drop it, while for some IVR systems you have occasional users.
  • People are more often forced to use IVR, while most people can easily avoid the desktop speech interface if they wish.
  • The desktop is capable of both visual and audio feedback, while IVR systems tend to only have audio feedback. (Interestingly, even though most speech engines come with the ability to speak, desktop computer interfaces generally don’t use this feedback channel. We’ve had positive results in user testing of judicious use of audio feedback.)
  • Both systems suffer from the widespread use of pseudo natural language. Natural language doesn’t really exist on either type of system and trying to fake natural language creates its own problems.

Outside the mouse and keyboard box

Saturday, January 5th, 2008

Here’s an attempt to explain the potential of the speech interface.

Controlling a computer using a mouse and keyboard is a very specific type of control, and for many years it was all we knew. This type of control still defines how we think about communicating with computers.

While it’s good to tap existing knowledge, it’s important not to let experience confine new methods of communication.

The way today’s speech interfaces work, speech commands often follow in the footsteps of the keyboard and mouse (“File”, “Open”, “Budget”, “Enter”) rather than tapping the full potential of speech (“Budget Folder”).

Think about the differences between road travel and air travel.

A plane goes faster than a car, so following a road from the air is faster than driving, and following roads might not be a bad idea at first to get your bearings. But the real power of air travel is the ability to travel any route, including over areas inaccessible by car like large bodies of water, mountain ranges and polar regions.

The Human-Machine Grammar that underpins Utter Command is aimed at mapping the best way to use speech to control the computer. The real power of speech is the ability to command the computer in ways not possible using the keyboard and mouse.

Here’s another metaphor:

In the days when cars that went 15 miles an hour were cutting-edge, this seemed fast — four times faster than walking and you didn’t have to expend energy. It may seem like working on a computer is fast today. It’s not. Speech has the potential to take us into another realm in terms of productivity.