Conversation is much more than a simple exchange of information.
In conversation, we share natural assumptions about a topic. We know how a conversation should develop. We have expectations about the quality and quantity of the contributions that each person should make. On top of that, we roll in politeness, consistency, and other natural rules of conversation. Plus, everyone instinctively knows to disregard superficial meanings if they're unclear or hyperbolic and search for deeper, non-literal interpretations. While we all do it naturally, conversation is actually a rather complicated process.
Philosopher of language Paul Grice argued that to be understood, people need to speak cooperatively. He called this the Cooperative Principle. Because we can assume an undercurrent of cooperation, we can leave a lot of information unsaid, which makes conversation significantly more efficient. Asking "Do you … ?" rarely means "Say 'yes' or 'no.'" Rather, it's often an indirect, polite way to ask something more specific.
He also formulated four maxims, now known as Grice's Maxims, that describe the basic rules of cooperative conversation:
- Quality — Only say things that are true
- Quantity — Don't be more or less informative than needed
- Relevance — Only say things relevant to the topic
- Manner — Be brief, get to the point, and avoid ambiguity and obscurity
In other words, people should be as truthful, informative, relevant, and clear as the situation calls for. This is something voice user interfaces (VUIs) also need to do to be effective.
Logic and accuracy don't always rule
Our verbal shortcuts reveal the often illogical, non-mathematical nature of conversation. "Sue has two kids," for example, is technically and logically true even if Sue has five kids. But the statement is misleading in conversation, because listeners assume the speaker is being as informative as the situation requires, and here it leaves out Sue's other three kids.
Plus, people can sometimes be deliberately uncooperative. In some cases, they're only trying to be kind or polite. When asked how a prospect did in a job interview, for example, they might evade a negative reply by saying, "He wore a beautiful tie."
A VUI has to accommodate all of these rules, which most people follow without thinking.
Recognition grammar performance and repair (error) prompting go hand in hand
VUI designers also have to anticipate certain types of human "errors" and understand how a speech recognition grammar (the set of everything a person might say at a given point in the conversation) is constructed. For example, consider this confirmation prompt for the purchase of a plane ticket:
|Alright, from Atlanta to Geneva on September 13th at 6 p.m. Is that right?|
If the answer is yes, people tend to give a short answer — "Yes," "Yeah," "Correct," "That's right," etc. But when the answer is no, they typically don't say just "no." Instead, they're cooperative, moving the conversation ahead with responses such as, "No, not Geneva, I said 'Jamaica'" or "No, not the 13th, the 30th, three zero." Or they might respond only to part of what they heard, "You got the time right, but the date's wrong."
If a technical limitation prevents your VUI from accommodating such cooperative exchanges, the dialog shouldn't set the user up to expect that the system can handle them. If it does, first-level repair strategies are limited to unnatural prompts like "So, did I get that right? Just say 'yes' or 'no.'" Prompts like this instantly distance the user, expose the VUI's underlying limitations and the artificial nature of the conversation, and undermine any potential for naturalness.
When a VUI isn't designed to accept responses that correct, qualify, or add information, it may mistakenly convey that it regards such naturally cooperative verbal behavior as uncooperative. This misperception often shows up in the brusque or mechanical tone of some so-called "error" prompts, as if the person has misbehaved ("That was not a valid response."), or in a tone of mock concern, as if speaking to a slow learner ("I'm sorry. I did not understand the response I heard.").
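One way to keep a corrective "no" from being handled as an invalid response is to classify the reply before choosing the next prompt. The following is a minimal sketch, not a real recognition grammar: the function name `interpret_confirmation`, the intent labels, and the small phrase list are all hypothetical, and it assumes the recognizer has already produced plain text.

```python
import re

# Hypothetical list of short affirmative replies; a real grammar would be larger.
YES_PHRASES = {"yes", "yeah", "yep", "correct", "right", "that's right"}

# "No" or "nope" as a whole word, optionally followed by more speech.
NO_RE = re.compile(r"^(?:no|nope)\b[,\s]*(.*)$")


def interpret_confirmation(utterance):
    """Classify a reply to a yes/no confirmation prompt.

    Returns a dict with an "intent" key:
      confirm - a plain yes
      deny    - a bare no
      correct - a no followed by more information (kept in "detail")
      other   - anything else, such as a partial confirmation
    """
    text = utterance.lower().strip().rstrip(".!?")
    if text in YES_PHRASES:
        return {"intent": "confirm"}
    match = NO_RE.match(text)
    if match:
        detail = match.group(1).strip()
        if detail:
            # The speaker is cooperating: pass the correction on for parsing.
            return {"intent": "correct", "detail": detail}
        return {"intent": "deny"}
    # Not an error: the reply may qualify or partially confirm the summary.
    return {"intent": "other", "detail": text}


print(interpret_confirmation("Yeah"))
print(interpret_confirmation("No, not the 13th, the 30th, three zero"))
print(interpret_confirmation("You got the time right, but the date's wrong."))
```

In this sketch only the `confirm` intent ends the step. The `correct` and `other` results carry the rest of the utterance forward, so the dialog can act on what the person actually said rather than falling back to a yes-or-no reprompt.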
Varied user responses are opportunities, not "errors"
Consider how a question could be framed to anticipate repair prompting while keeping the conversation on track without discomfort or distraction. In fact, such an instance can become an opportunity for another meaningful turn in the dialog.
Because they've had bad experiences with speech-recognition technology, often compounded by bad VUI design, some people now take prompts literally. To avoid triggering a recognition error, they hold back the additional information that would be "naturally cooperative." VUI design should comfortably accommodate these users as well.
Let's take a scenario in which a VUI needs to gather both a date and time. It should start out using a general prompt, which could work for both:
|User||Set an alarm.|
|VUI||Okay, for when?|
|User||Tuesday at 6 a.m.|
|VUI||Done. Your alarm's set for Tuesday at 6 a.m.|
But, since a more tentative or less-informative user might parcel out the information, the VUI doesn't treat it as an error, but simply accommodates:
|User||Set an alarm.|
|VUI||Alright, for when?|
|User||Tuesday.|
|VUI||And the time?|
|User||6 a.m.|
|VUI||There! Your alarm's set for Tuesday at 6 a.m.|
The above exchange doesn't expose the hidden logic needed to make the VUI cooperative and prepared for other possible user responses. The user could have completed the entire step with a single command, without being prompted at all, giving either or both pieces of data: "Set an alarm for 6 a.m. Monday morning," "Wake me up in 6 hours," or "Set an alarm for 7 o'clock." Each of these leaves something unspoken, and it's up to the VUI to interpret the missing information.
The prompt "Alright, for when?" invites easy responses from people who have either a day or a time in mind, as well as from those ready with both. This kind of prompt puts the Cooperative Principle to work.
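The hidden logic behind this exchange can be sketched as simple slot filling: keep whatever the user has supplied so far, and ask only for what is still missing. This is an illustrative sketch under loud assumptions. The helper names (`extract_slots`, `merge_slots`, `next_prompt`), the tiny day list, and the time pattern are hypothetical, the "And which day?" prompt is invented for the example, and relative times such as "in 6 hours" are not handled.

```python
import re

DAYS = ("monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday")

# Matches times such as "6 a.m.", "6:30 pm", or "7 o'clock" (assumed formats).
TIME_RE = re.compile(
    r"\b\d{1,2}(?::\d{2})?\s*(?:a\.?m\.?|p\.?m\.?|o'clock)(?![a-z])"
)


def extract_slots(utterance):
    """Pull out whichever of the two slots the utterance mentions."""
    text = utterance.lower()
    day = next((d for d in DAYS if d in text), None)
    match = TIME_RE.search(text)
    return {"day": day, "time": match.group(0) if match else None}


def merge_slots(known, utterance):
    """Add newly heard values to what is known; never discard earlier ones."""
    found = extract_slots(utterance)
    return {name: found[name] or known[name] for name in known}


def next_prompt(slots):
    """Ask only for what is missing, or confirm when both slots are filled."""
    if slots["day"] is None and slots["time"] is None:
        return "Okay, for when?"
    if slots["time"] is None:
        return "And the time?"
    if slots["day"] is None:
        return "And which day?"  # hypothetical prompt, not from the dialogs
    sentence = f"Done. Your alarm's set for {slots['day'].capitalize()} at {slots['time']}"
    return sentence if sentence.endswith(".") else sentence + "."


# A user who parcels out the information one piece at a time.
slots = {"day": None, "time": None}
for said in ("Set an alarm.", "Tuesday", "6 a.m."):
    slots = merge_slots(slots, said)
    print(f"User: {said}\nVUI:  {next_prompt(slots)}")
```

The same functions handle a user who says everything at once: merging "Set an alarm for 6 a.m. Monday morning" into empty slots fills both, so the first prompt is already the confirmation. A production design might infer a missing day instead of asking for it. This sketch asks, to keep the logic visible.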
Like telling a joke — if you have to explain it, you're doing it wrong
A good VUI focuses on the intuitive power of language and meaning, rather than showcasing how a computer can be programmed to take "commands." It leverages the communication system that people learned first and know best: everyday speech. We're already proficient in our own language, so we don't have to be taught how to say an expected response or command in plain English (or plain Spanish, Tagalog, or Hindi). Put another way: Avoid commands as such, but if you need to help people understand what they can say to move the conversation along, use something intuitive.
So, instead of:
|To hear the message again, say 'Repeat;' to reply to it, say 'Reply;' and to move on to the next one, say 'Next.'|
Consider the more intuitive:
|Repeat, reply, or go on to the next one?|
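When the available options come from application state, a small helper can phrase them as one natural question instead of a string of instructions. This is a minimal sketch. The function name `options_prompt` is hypothetical, and it assumes each option is already worded the way a person would say it.

```python
def options_prompt(options):
    """Join spoken options into a single short question."""
    if not options:
        raise ValueError("at least one option is required")
    if len(options) <= 2:
        text = " or ".join(options)
    else:
        text = ", ".join(options[:-1]) + ", or " + options[-1]
    # Capitalize only the first letter so the rest of the wording is untouched.
    return text[0].upper() + text[1:] + "?"


print(options_prompt(["repeat", "reply", "go on to the next one"]))
# prints: Repeat, reply, or go on to the next one?
```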
Natural conversation is time-tested and user-approved
The Cooperative Principle underscores our ability to communicate efficiently and in socially appropriate ways, built on a powerful shared base of knowledge. By leveraging the conventions of natural conversation instead of ignoring them, we can make far better VUIs that people intuitively know how to use and feel comfortable with.