Prompts

Prompts define how your Action renders responses to users and how your Action asks them to continue. As you build your Action, you can add prompts to invocations and to various places within scenes. Prompts can be as simple as a text or speech response, or can be more complex and contain rich content like cards, images, and tables.

Response types

For each prompt, you select from a variety of engaging response types for Assistant to present to users:

  • Simple responses: Simple responses take the form of a chat bubble visually and use text-to-speech (TTS) or Speech Synthesis Markup Language (SSML) for sound. Simple responses are the only responses supported on all device types.
  • Rich responses: Rich responses contain visual or functional elements that enhance user interactions with your Actions. With rich responses, you can also display tabular data or play longer-form audio content.
  • Visual selection responses: Visual selection responses provide a visual interface for users to choose between multiple options that are most easily differentiated by their title or by an image.
  • Media responses: Media responses let your Actions play longer-form audio content than SSML allows, and provide a visual component with media controls.
  • Interactive Canvas: Interactive Canvas renders responses as full-screen web views and functions as an interactive web app that Assistant sends to the user as a response in conversation. Canvas uses a slightly different prompt format to accommodate the flexibility of web standards like HTML, CSS, and JavaScript.

Each of these response types uses the same base prompt format and has access to the same general features described below.
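
Because Interactive Canvas hands rendering over to your web app, its prompts reference the web app rather than describing visuals directly. The following is a rough sketch of that shape; the URL and data payload are placeholders, not values from this guide:

YAML

candidates:
  - first_simple:
      variants:
        - speech: Welcome! Let's get started.
    canvas:
      url: 'https://example.com/canvas.html'
      data:
        - command: START

The data list is passed along to the web app, which decides how to render the response.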

Format of a prompt

In your Actions project, you define prompts in either YAML or JSON format. Each prompt can contain up to two simple responses and optionally define a rich response. You define responses in the following fields:

  • first_simple: Initial text or speech (simple) response to send to the user.
  • content: Supplemental rich response content to send after simple responses.
  • last_simple: Final text or speech (simple) response to send to the user.
  • canvas: References a web app that integrates with Interactive Canvas.

By default, prompts are appended to a prompt queue in the order listed above. Assistant presents all prompts in the queue before the user can respond.
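
For example, the following sketch (the field values are illustrative) queues a simple response, a rich card, and a closing simple response, which Assistant delivers in that order:

YAML

candidates:
  - first_simple:
      variants:
        - speech: Here are today's top headlines.
    content:
      card:
        title: Headlines
        text: A short summary of today's top stories
    last_simple:
      variants:
        - speech: Which story would you like to hear about?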

You can additionally add flexibility to your prompts with the following features:

  • Candidates: Candidates allow you to define responses based on a user's device capabilities. For example, you can have Assistant display rich responses only when a user interacts with your Action on a display-capable device.
  • Variants: Variants are alternate variations of a single message. For example, you can have Assistant choose between five different welcome message variants each time a user invokes your Action.
  • Suggestions: Suggestions provide users on display-capable devices with suggestion chips when Assistant displays the prompt.

A default prompt uses one candidate, one variant, and a first_simple response.
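
In its minimal form, that default prompt is a sketch like this:

YAML

candidates:
  - first_simple:
      variants:
        - speech: Hello, world.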

Candidates

In a prompt, the candidates object allows you to define responses based on a user's device capabilities. For example, you can have Assistant display rich responses only when a user interacts with your Action on a display-capable device. To define the device types where Assistant can return a candidate, use the selector property of the candidates object.

In the example below, the selector property contains the device capability information. Prompts set in the first candidate are sent to users on devices that can render rich responses. The second candidate contains prompts for users whose devices support only text and speech responses.

YAML

candidates:
  - selector:
      surface_capabilities:
        capabilities:
          - RICH_RESPONSE
    first_simple:
      variants:
        - speech: Here's a simple message.
    content:
      card:
        title: Image card title
        text: Some details about the image
        image:
          url: 'https://www.example.com/image/'
  - first_simple:
      variants:
        - speech: Text explains what the image might have shown in more detail.

JSON

{
  "candidates": [{
    "selector": {
      "surface_capabilities": {
        "capabilities": ["RICH_RESPONSE"]
      }
    },
    "first_simple": {
      "variants": [{
        "speech": "Here's a simple message."
      }]
    },
    "content": {
      "card": {
        "title": "Image card title",
        "text": "Some details about the image",
        "image": {
          "url": "https://www.example.com/image/"
        }
      }
    }
  }, {
    "first_simple": {
      "variants": [{
        "speech": "Text explains what the image might have shown in more detail."
      }]
    }
  }]
}

You can provide one or more capability requirements for a given candidate. The following list describes each of the available capability requirements:

  • SPEECH: Device can speak to the user via text-to-speech or SSML.
  • RICH_RESPONSE: Device can display rich responses like cards, lists, and tables.
  • LONG_FORM_AUDIO: Device can play long form audio media like music and podcasts.
  • INTERACTIVE_CANVAS: Device can display an Interactive Canvas response.
  • WEB_LINK: Device can use web links in rich responses to open a web browser.
  • HOME_STORAGE: Device can store to and access data from home storage.
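
You can list several of these capabilities in one selector to narrow a candidate further. For example, a candidate meant only for devices that have a display and can also play long-form audio might look like the following sketch (the speech text is illustrative):

YAML

candidates:
  - selector:
      surface_capabilities:
        capabilities:
          - RICH_RESPONSE
          - LONG_FORM_AUDIO
    first_simple:
      variants:
        - speech: Playing your episode, with details shown on screen.

A device must support every capability listed in the selector for Assistant to choose that candidate.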

Variants

Variants provide a way to define multiple versions of a response. When Assistant sends the prompt to a user, one of the variants is chosen at random. As a best practice in conversation design, provide users with alternate responses when they converse with your Action.

For example, provide different welcome message variants so users don't hear the same response each time they invoke your Action:

YAML

candidates:
  - first_simple:
      variants:
        - speech: Hello.
        - speech: Hi there.
        - speech: Welcome.

JSON

{
  "candidates": [{
    "first_simple": {
      "variants": [{
        "speech": "Hello."
      },{
        "speech": "Hi there."
      },{
        "speech": "Welcome."
      }]
    }
  }]
}

Suggestions

Figure: Example of suggestion chips on a smart display.

Suggestions provide users on display-capable devices with suggestion chips when Assistant displays the prompt. Use suggestion chips to hint at user responses to continue or pivot the conversation. When tapped, a suggestion chip returns the displayed text to the conversation verbatim, as if the user had typed it.

You may have a maximum of 8 suggestions in a single prompt, each with a maximum length of 25 plaintext characters.

To add suggestions, provide a Suggestion object for each chip, with its text in the title field. Each title must be unique among the set of suggestion chips. In Actions Builder, this object is represented in YAML and JSON as suggestions.

For example, you can provide "Yes" and "No" suggestions alongside a question:

YAML

candidates:
  - first_simple:
      variants:
        - speech: 'Welcome, do you want to play a game?'
    suggestions:
      - title: 'Yes'
      - title: 'No'

JSON

{
  "candidates": [{
    "first_simple": {
      "variants": [{
        "speech": "Welcome, do you want to play a game?"
      }]
    },
    "suggestions": [{
      "title": "Yes"
    }, {
      "title": "No"
    }]
  }]
}