Conversational Actions

Conversational Actions let you extend Google Assistant with your own conversational interfaces that give users access to your products and services. Actions leverage Assistant's powerful natural language understanding (NLU) engine to process and understand natural language input and carry out tasks based on that input.

Overview

A Conversational Action is a simple object that defines an entry point (referred to as invocation) into a conversation:

  • An invocation defines how users tell Assistant they want to start a conversation with one of your Actions. An Action's invocation is defined by a single intent that gets matched when users request the Action.
  • A conversation defines how users interact with an Action after it's invoked. You build conversations with intents, types, scenes, and prompts.
  • In addition, your Actions can delegate extra work to fulfillment: web services that communicate with your Actions via webhooks. This lets you validate data, call other web services, carry out business logic, and more.

You bundle one or many Actions together, based on the use cases that are important for your users, into a logical container called an Actions project. Your Actions project contains your entire invocation model (the collection of all your invocations), which lets users start at logical places in your conversation model (all the possible things users can say and all the possible ways you respond back to users).

Figure 1. A collection of Actions that serve as entry points into a conversation model. Intents that are eligible for invocation are considered to be global.

Invocation

Invocation is associated with a display name, which represents a brand, name, or persona and lets users ask Assistant to invoke your Actions. Users can use this display name on its own (called the main invocation) or in combination with optional deep link phrases to invoke your Actions.

For example, users can say the following phrases to invoke three separate Actions in a project with a display name of "Facts about Google":

  • "Ok Google, talk to Facts about Google"
  • "Ok Google, talk to Facts about Google to get company facts"
  • "Ok Google, talk to Facts about Google to get history facts"

The first invocation in the example is the main invocation. This invocation is associated with a special system intent named actions.intent.MAIN. The second and third invocations are deep link invocations, which add phrases that let users ask for specific functionality. These invocations correspond to custom intents that you designated as global. Each invocation in this example provides an entry point into a conversation and corresponds to a single Action.

Figure 2. Example of main invocation

Figure 2 describes a typical main invocation flow:

  1. When users request an Action, they typically ask Assistant for it by your display name.
  2. Assistant matches the user's request with the corresponding intent. In this case, it would be actions.intent.MAIN.
  3. The Action is notified of the intent match and responds with the corresponding prompt to start a conversation with the user.
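
The mapping from invocation phrases to intents can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the real Assistant NLU does not do exact string matching, and the function match_invocation, the table DEEP_LINK_INTENTS, and the intent names company_facts and history_facts are hypothetical names invented for this example. Only the display name and actions.intent.MAIN come from the text above.

```python
from typing import Optional

DISPLAY_NAME = "Facts about Google"
MAIN_INTENT = "actions.intent.MAIN"

# Hypothetical mapping from deep link phrases to global custom intents.
DEEP_LINK_INTENTS = {
    "get company facts": "company_facts",
    "get history facts": "history_facts",
}


def match_invocation(utterance: str) -> Optional[str]:
    """Return the intent matched by an invocation phrase, or None."""
    text = utterance.lower().strip()
    hotword = "ok google,"
    # Drop the hotword if the utterance starts with it.
    if text.startswith(hotword):
        text = text[len(hotword):].strip()

    prefix = f"talk to {DISPLAY_NAME}".lower()
    if not text.startswith(prefix):
        return None

    rest = text[len(prefix):].strip()
    if not rest:
        # Display name on its own: the main invocation.
        return MAIN_INTENT
    if rest.startswith("to "):
        # Display name plus a deep link phrase.
        return DEEP_LINK_INTENTS.get(rest[len("to "):].strip())
    return None
```

Each value this function can return stands for one entry point, which matches the idea that every invocation corresponds to a single Action.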

Conversation

Conversation defines how users interact with an Action after it's invoked. You build these interactions by defining the valid user input for your conversation, the logic to process that input, and the corresponding prompts that respond to the user. The following figure and explanation show how a typical conversation turn works with a conversation's low-level components: intents, types, scenes, and prompts.

Figure 3. Example of a conversation

Figure 3 describes a typical conversation turn:

  1. When users say something, the Assistant NLU matches the input to an appropriate intent. An intent is matched if the language model for that intent can closely or exactly match the user input. You define the language model by specifying training phrases, or examples of things users might want to say. Assistant takes these training phrases and expands upon them to create the intent's language model.
  2. When the Assistant NLU matches an intent, it can extract parameters that you need from the input. These parameters have types associated with them, such as a date or number. You annotate specific parts of an intent's training phrases to specify what parameters you want to extract.
  3. A scene then processes the matched intent. You can think of scenes as the logic executors of an Action, doing the heavy lifting and carrying out logic necessary to drive a conversation forward. Scenes run in a loop, providing a flexible execution lifecycle that lets you do things like validate intent parameters, do slot filling, send prompts back to the user, and more.
  4. When a scene is done executing, it typically sends a prompt back to users to continue the conversation or can end the conversation if appropriate.
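
The four steps above can be mimicked with a toy model. This is a rough sketch, not how Assistant is implemented: regular expressions stand in for the language model that Assistant builds from training phrases, and the names Intent, Scene, match_intent, book_table, and party_size are all hypothetical and chosen only for this example.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Intent:
    name: str
    # Regex patterns stand in for the model built from training phrases.
    patterns: Tuple[str, ...]


INTENTS = (
    Intent("book_table", (r"book a table for (?P<party_size>\d+)", r"book a table")),
    Intent("cancel", (r"cancel", r"never mind")),
)


def match_intent(text: str) -> Optional[Tuple[str, Dict[str, int]]]:
    """Steps 1 and 2: match an intent and extract typed parameters."""
    for intent in INTENTS:
        for pattern in intent.patterns:
            m = re.search(pattern, text.lower())
            if m:
                # Named groups play the role of annotated parameters (type: number).
                params = {k: int(v) for k, v in m.groupdict().items() if v is not None}
                return intent.name, params
    return None


@dataclass
class Scene:
    """Steps 3 and 4: process the matched intent and choose the next prompt."""
    slots: Dict[str, int] = field(default_factory=dict)
    ended: bool = False

    def handle(self, intent: str, params: Dict[str, int]) -> str:
        if intent == "cancel":
            self.ended = True
            return "Okay, cancelled."
        self.slots.update(params)
        if "party_size" not in self.slots:
            # A required slot is missing, so prompt and keep the conversation open.
            return "For how many people?"
        self.ended = True
        return f"Booked a table for {self.slots['party_size']}."
```

For instance, passing the result of match_intent("Book a table") to a fresh Scene yields the prompt asking for the party size, while "Book a table for 4" fills the slot and ends the conversation.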

Fulfillment

During invocation or a conversation, your Action can trigger a webhook that notifies a fulfillment service to carry out some tasks.

Figure 4. Example of a conversation

Figure 4 describes how you can use fulfillment to generate prompts, a common way to use fulfillment:

  1. At specific points in its execution, your Action can trigger a webhook that sends a request with a JSON payload to a registered webhook handler (your fulfillment service).
  2. Your fulfillment processes the request, for example by calling a REST API to look up data or by validating data from the JSON payload. A very common way to use fulfillment is to generate a dynamic prompt at runtime so your conversations are more tailored to the current user.
  3. Your fulfillment returns a response containing a JSON payload to your Action. Your Action can use the data from the payload to continue its execution and respond to the user.
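
The request and response cycle above can be sketched as a plain function that takes a JSON request body and returns a JSON response body. The payload shape used here (the handler, session_id, params, and prompt fields) is a simplified assumption made for illustration and is not the actual webhook format that Assistant sends; the names handle_webhook, get_fact, HANDLERS, and FACTS are likewise hypothetical, and the fact strings are placeholders.

```python
import json
from typing import Callable, Dict

# Placeholder data standing in for a lookup against a real data source.
FACTS = {
    "company": "Placeholder company fact.",
    "history": "Placeholder history fact.",
}


def get_fact(payload: dict) -> str:
    """Build a dynamic prompt from the parameters in the request payload."""
    category = payload.get("params", {}).get("category", "")
    return FACTS.get(category, "Sorry, I don't have a fact for that.")


# Hypothetical registry of webhook handlers, keyed by handler name.
HANDLERS: Dict[str, Callable[[dict], str]] = {"get_fact": get_fact}


def handle_webhook(request_body: str) -> str:
    """Parse the JSON request, run the named handler, return a JSON response."""
    payload = json.loads(request_body)
    handler = HANDLERS.get(payload.get("handler", ""))
    if handler is None:
        prompt = "Sorry, something went wrong."
    else:
        prompt = handler(payload)
    # Echo the session identifier so the caller can correlate the response.
    return json.dumps({"session_id": payload.get("session_id"), "prompt": prompt})
```

In a deployed service this function would sit behind an HTTPS endpoint; keeping it free of any web framework here makes the JSON-in, JSON-out contract easy to see.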