We’re entering a promising new era of computing, in which advances in machine learning and artificial intelligence are driving a resurgence of interest in voice interfaces and natural language processing, making conversation a potential new mode of interaction with technology.
The problem of recognizing spoken input has been largely solved, which opens up a new challenge: how to build a user experience modeled after natural human conversation.
This site outlines the basic mechanics of conversation, introduces core principles to design by, and presents you with a practical voice user interface (VUI) toolkit to start creating conversational experiences that engage, delight, and truly help your users.
We can discover the key ingredients for a good VUI conversation by deconstructing the rules and conventions of natural human conversation, most of which we follow without being aware of them. Building blocks for successful VUI conversations include:
Turn-taking
We take turns in a conversation based on subtle signals that we pass back and forth. Without effective turn-taking, we might talk over each other, or our conversations can fall out of sync and become hard to follow.
Threading
In natural speech, all elements of a conversation are usually woven together in a coherent thread that includes context and the way the conversation evolves over time. Threading helps us keep track of that conversational flow.
Leveraging the inherent efficiency of speech
People often use verbal shortcuts because they intuitively understand what’s being said: we can “read between the lines” in a conversation, so some things can be left unspoken. For VUIs, this means compensating for the seemingly illogical, non-mathematical nature of human speech.
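To make this concrete, here is a minimal Python sketch of one common way a VUI can recover what was left unspoken: carrying slot values over from the previous turn. The `DialogContext` class, the `get_weather` intent, and the slot names are hypothetical; a production system would typically rely on its language-understanding platform’s own context handling.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DialogContext:
    """Remembers the intent and slot values from the previous turn (hypothetical)."""
    intent: Optional[str] = None
    slots: dict[str, str] = field(default_factory=dict)

    def resolve(self, intent: Optional[str], slots: dict[str, str]) -> tuple[str, dict[str, str]]:
        """Fill in what the user left unspoken, using the previous turn."""
        # An elliptical follow-up such as "What about tomorrow?" carries no
        # intent of its own, so fall back to the intent of the previous turn.
        resolved_intent = intent or self.intent
        if resolved_intent is None:
            raise ValueError("no intent in this turn or in the context")
        if resolved_intent == self.intent:
            # Same topic: inherit unspoken slots; newly spoken slots win.
            resolved_slots = {**self.slots, **slots}
        else:
            # New topic: start fresh rather than leaking old slot values.
            resolved_slots = dict(slots)
        self.intent, self.slots = resolved_intent, resolved_slots
        return resolved_intent, resolved_slots


ctx = DialogContext()
ctx.resolve("get_weather", {"city": "Paris", "day": "today"})  # "What's the weather in Paris today?"
print(ctx.resolve(None, {"day": "tomorrow"}))                  # "What about tomorrow?"
# ('get_weather', {'city': 'Paris', 'day': 'tomorrow'})
```

Resetting the slots when the topic changes is a deliberate choice in this sketch: inheriting the city from a weather question into an unrelated request would be its own kind of misunderstanding.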
Anticipating variable user behavior
People use a variety of words and styles to say the same thing, depending on their situational context and on the expectations they have formed from previous VUI experiences. VUIs should support these variations so that every user has a frictionless experience.
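As a rough illustration of supporting varied phrasing, the sketch below maps several hypothetical phrasings onto one intent after light normalization. The `INTENT_PHRASES` table and the intent names are invented for the example, and whole-phrase matching is only a stand-in for the trained language-understanding models that real VUIs typically use.

```python
import re
from typing import Optional

# Hypothetical intent table: several everyday phrasings per intent.
INTENT_PHRASES = {
    "set_alarm": ["set an alarm", "wake me up", "alarm for"],
    "cancel": ["cancel", "never mind", "forget it", "stop"],
}


def normalize(utterance: str) -> str:
    """Lowercase, turn punctuation (except apostrophes) into spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s']", " ", utterance.lower())
    return " ".join(text.split())


def match_intent(utterance: str) -> Optional[str]:
    """Return the first intent whose phrase appears as whole words, else None."""
    # Padding with spaces means "stop" matches "stop" but not "stopwatch".
    text = f" {normalize(utterance)} "
    for intent, phrases in INTENT_PHRASES.items():
        if any(f" {phrase} " in text for phrase in phrases):
            return intent
    return None


for said in ["Wake me up at 7, please", "Hmm, forget it!", "What's the weather?"]:
    print(said, "->", match_intent(said))
# Wake me up at 7, please -> set_alarm
# Hmm, forget it! -> cancel
# What's the weather? -> None
```

Naive matching also produces false positives ("bus stop" would trigger `cancel`), which is one reason keyword lists alone rarely cover real users’ variation.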
Repairing conversations
Instead of focusing only on the so-called “happy path,” designers can create experiences that are robust in every scenario, even those that seem like “errors.” Things can go wrong in any conversation, and just as humans usually spot and repair their own mistakes, a VUI must be able to repair the conversation based on its flow and the nature of the interaction.
Read more on conversation basics in Understanding How Conversations Work: The Key to a Better Voice UI.
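One way to repair a conversation without looping forever is to escalate: reprompt briefly after the first miss, offer more help after the second, then exit gracefully. The Python sketch below assumes that strategy; the prompts, the `RepairTracker` name, and the limit of two reprompts are illustrative choices, not a prescribed pattern.

```python
# Hypothetical prompts; the wording is illustrative only.
REPROMPTS = [
    "Sorry, what city was that?",
    "I didn't catch that. You can say a city name, like Chicago or Boston.",
]
FALLBACK = "Sorry, I'm having trouble understanding. Let's try again later."


class RepairTracker:
    """Counts consecutive misunderstandings for a single question."""

    def __init__(self, reprompts: list[str], fallback: str) -> None:
        self.reprompts = reprompts
        self.fallback = fallback
        self.misses = 0

    def on_no_match(self) -> tuple[str, bool]:
        """Return (prompt, keep_listening) after a turn the VUI could not understand."""
        self.misses += 1
        if self.misses <= len(self.reprompts):
            # Each reprompt gives a little more help than the previous one.
            return self.reprompts[self.misses - 1], True
        # Out of reprompts: end gracefully instead of repeating ourselves.
        return self.fallback, False

    def on_match(self) -> None:
        """The user was understood, so the next miss starts from scratch."""
        self.misses = 0


tracker = RepairTracker(REPROMPTS, FALLBACK)
for _ in range(3):
    print(tracker.on_no_match())
```

Resetting the counter on every successful turn keeps one earlier misunderstanding from making later reprompts unnecessarily terse or verbose.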
Understanding cooperative behavior
Turn-taking, context, and threading are all part of cooperative conversation, an idea formulated by the philosopher of language Paul Grice, who called it the Cooperative Principle. He also developed Grice’s Maxims to define the essential conversational rules he observed: namely, that people should be truthful, as informative as the exchange requires, relevant, and clear when talking with each other.
A VUI should try to follow these inherent rules of cooperation as well — and be ready to support wary users who’ve had bad experiences with other voice interfaces in the past.
Read more on the Cooperative Principle in Be Cooperative...Like Your Users.
Unlocking the power of spoken language
A good VUI doesn’t follow a stale script, and shouldn’t be based on the old touch-tone phone systems that force users down narrow paths. It also shouldn’t try to “teach” people what to say to protect them from veering off the so-called “happy path.”
Instead, it should focus on the intuitive power of language and meaning, using everyday language to communicate with users. VUIs should also avoid stating the obvious or talking down to users; people don’t appreciate a device that sounds like it thinks it’s smarter than they are.
Read more on building an intuitive VUI in Unlocking the Power of Spoken Language.
Instilling user confidence
A good VUI also validates user input and manages users’ expectations in order to earn their trust and instill confidence.
Once someone makes a request, a VUI can use acknowledgers — words or phrases like “OK,” “Sure,” “Alright,” “Thanks,” and “Got it” — that show the VUI is engaged and listening. Randomizing acknowledgers can help make the experience feel more fluid and natural.
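A minimal sketch of randomized acknowledgers, assuming Python’s standard `random` module. The extra rule that the same acknowledger is never used twice in a row is a design choice added here, not something the guidance requires; passing a seeded `random.Random` makes the behavior reproducible in tests.

```python
import random
from typing import Optional

ACKNOWLEDGERS = ["OK", "Sure", "Alright", "Thanks", "Got it"]


class Acknowledger:
    """Picks a random acknowledger, never the same one twice in a row."""

    def __init__(self, options: list[str], rng: Optional[random.Random] = None) -> None:
        if len(options) < 2:
            raise ValueError("need at least two acknowledgers to vary them")
        self.options = options
        self.rng = rng or random.Random()
        self.last: Optional[str] = None

    def pick(self) -> str:
        # Exclude the previous choice so back-to-back responses differ.
        choices = [option for option in self.options if option != self.last]
        self.last = self.rng.choice(choices)
        return self.last


ack = Acknowledger(ACKNOWLEDGERS, random.Random(42))
print(f"{ack.pick()}, playing your workout playlist.")
print(f"{ack.pick()}, setting a timer for ten minutes.")
```

In practice, not every acknowledger fits every moment (“Thanks” suits a user supplying information better than a user asking for a song), so a real VUI might keep separate lists per situation.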
After acknowledging the request, the system can confirm that it understood, either explicitly or implicitly. With explicit confirmation (which is usually used when something major is at stake, such as buying a plane ticket), the VUI asks the user for verbal agreement before proceeding.
With implicit confirmation (used for lower-risk situations, such as streaming a song), the VUI incorporates a key element of the user’s request into its response to validate and instill confidence in the user, but doesn’t require verbal approval per se.
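The choice between explicit and implicit confirmation can be sketched as a single risk check. In this hypothetical Python example, the `Request` type, the `high_stakes` flag, the prompt templates, and the small set of affirmative replies are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical and far from exhaustive; real VUIs use trained yes/no detection.
AFFIRMATIVE = {"yes", "yeah", "yep", "sure", "go ahead"}


@dataclass
class Request:
    summary: str       # key element of the request to echo back to the user
    high_stakes: bool  # e.g. True for buying a plane ticket, False for streaming a song


def confirmation_prompt(req: Request) -> tuple[str, bool]:
    """Return (what the VUI says, whether it must wait for a yes before acting)."""
    if req.high_stakes:
        # Explicit confirmation: ask for verbal agreement and do nothing yet.
        return f"That's {req.summary}. Should I go ahead?", True
    # Implicit confirmation: echo the key element and proceed without asking.
    return f"Sure, here's {req.summary}.", False


def user_agreed(reply: str) -> bool:
    """Very rough yes-detection for the explicit case."""
    return reply.strip().lower().rstrip(".!") in AFFIRMATIVE


print(confirmation_prompt(Request("a one-way ticket to Denver on Friday", True)))
print(confirmation_prompt(Request("Clair de Lune by Debussy", False)))
print(user_agreed("Yes!"))
```

Both paths echo the key element of the request; the only difference is whether the VUI waits for approval, which mirrors the split between the two confirmation styles described above.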