Responses

This document describes the types of user interfaces you can present to your users when building Actions. Responses can have an audio component, a visual component, or both.

Simple Responses

Audio (TTS/SSML)

Howdy, this is GeekNum. I can tell you fun facts about almost any number, my favorite is 42. What number do you have in mind?

Visual

Simple responses can appear on audio-only surfaces, screen-only surfaces, or both. Visually, they take the form of a chat bubble; audibly, they are rendered as TTS/SSML sound.

By default, the TTS text is used as the chat bubble content, so if that text reads well on screen, you do not need to specify separate display text.

Requirements
  • Supported on actions.capability.AUDIO_OUTPUT and actions.capability.SCREEN_OUTPUT surfaces
  • 640 character limit per chat bubble. Strings longer than the limit are truncated at the first word break (or whitespace) before 640 characters.

  • Chat bubble content must be a phonetic subset or a complete transcript of the TTS/SSML output. This helps users map out what you are saying and increases comprehension in various conditions.

  • At most 2 chat bubbles per turn

  • Chat head (logo) that you submit to Google must be 192x192 pixels and cannot be animated
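
The 640-character rule above can be illustrated with a small sketch. This mimics the documented truncation behavior; it is not the platform's actual implementation:

```javascript
// Illustrative only: truncate a chat bubble at the last whitespace
// before the 640-character limit, as the platform is documented to do.
function truncateBubble(text, limit = 640) {
  if (text.length <= limit) return text;
  const cut = text.lastIndexOf(' ', limit);
  return text.slice(0, cut > 0 ? cut : limit);
}
```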

Sample code

Note that the JSON below describes a webhook response using the Actions SDK.

Node.js

conv.ask(new SimpleResponse({
  speech: 'Howdy! I can tell you fun facts about almost any number, like 42. What do you have in mind?',
  text: 'Howdy! I can tell you fun facts about almost any number, like 42. What do you have in mind?',
}));

JSON

{
    "conversationToken": "",
    "expectUserResponse": true,
    "expectedInputs": [
        {
            "inputPrompt": {
                "richInitialPrompt": {
                    "items": [
                        {
                            "simpleResponse": {
                                "textToSpeech": "Howdy! I can tell you fun facts about almost any number, like 42. What do you have in mind?",
                                "displayText": "Howdy! I can tell you fun facts about almost any number. What do you have in mind?"
                            }
                        }
                    ],
                    "suggestions": []
                }
            },
            "possibleIntents": [
                {
                    "intent": "actions.intent.TEXT"
                }
            ]
        }
    ]
}

SSML and sounds

Using SSML and sounds in your responses gives them more polish and enhances the user experience. The following code snippet shows you how to create a response that uses SSML:

Node.js
function saySSML(conv) {
  const ssml = '<speak>' +
    'Here are <say-as interpret-as="characters">SSML</say-as> samples. ' +
    'I can pause <break time="3s" />. ' +
    'I can play a sound <audio src="https://www.example.com/MY_WAVE_FILE.wav">your wave file</audio>. ' +
    'I can speak in cardinals. Your position is <say-as interpret-as="cardinal">10</say-as> in line. ' +
    'Or I can speak in ordinals. You are <say-as interpret-as="ordinal">10</say-as> in line. ' +
    'Or I can even speak in digits. Your position in line is <say-as interpret-as="digits">10</say-as>. ' +
    'I can also substitute phrases, like the <sub alias="World Wide Web Consortium">W3C</sub>. ' +
    'Finally, I can speak a paragraph with two sentences. ' +
    '<p><s>This is sentence one.</s><s>This is sentence two.</s></p>' +
    '</speak>';
  conv.ask(ssml);
}
    
JSON
{
    "conversationToken": "",
    "expectUserResponse": true,
    "expectedInputs": [
        {
            "inputPrompt": {
                "initialPrompts": [
                    {
                        "ssml": "<speak>Here are <say-as interpret-as=\"characters\">SSML</say-as> samples. I can pause <break time=\"3\" />. I can play a sound <audio src=\"https://www.example.com/MY_WAVE_FILE.wav\">your wave file</audio>. I can speak in cardinals. Your position is <say-as interpret-as=\"cardinal\">10</say-as> in line. Or I can speak in ordinals. You are <say-as interpret-as=\"ordinal\">10</say-as> in line. Or I can even speak in digits. Your position in line is <say-as interpret-as=\"digits\">10</say-as>. I can also substitute phrases, like the <sub alias=\"World Wide Web Consortium\">W3C</sub>. Finally, I can speak a paragraph with two sentences. <p><s>This is sentence one.</s><s>This is sentence two.</s></p></speak>"
                    }
                ],
                "noInputPrompts": []
            },
            "possibleIntents": [
                {
                    "intent": "actions.intent.TEXT"
                }
            ]
        }
    ]
}

See the SSML reference documentation for more information.

Sound library

We provide a variety of free, short sounds in our sound library. These sounds are hosted for you, so all you need to do is include them in your SSML.
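
For example, you might wrap a hosted clip in an SSML audio tag like this. The clip path below is a placeholder, not a real library entry; browse the sound library for actual URLs:

```javascript
// Build an SSML prompt that plays a clip from the sound library.
// The URL passed in below is a placeholder path, not a real clip.
function soundEffectSsml(clipUrl, fallbackText) {
  return '<speak>' +
    'Here we go! ' +
    `<audio src="${clipUrl}">${fallbackText}</audio>` +
    '</speak>';
}

const ssml = soundEffectSsml(
  'https://actions.google.com/sounds/v1/example/clip.ogg', // placeholder
  'a sound effect');
// In your fulfillment, pass the string to conv.ask(ssml).
```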

Rich responses

Rich responses can appear in screen-only experiences or experiences that combine audio and screens. They can contain the following components:

  • One or two simple responses (chat bubbles)
  • An optional basic card
  • Optional suggestion chips
  • An optional link-out chip
  • An option interface (list or carousel)
Requirements
  • Supported on actions.capability.SCREEN_OUTPUT surfaces
  • The first item in a rich response must be a simple response
  • At most two simple responses
  • At most one basic card, option interface (list or carousel), or StructuredResponse (i.e., you cannot have both a basic card and an option interface in the same response)
  • At most 8 suggestion chips
  • Suggestion chips are not allowed in a FinalResponse
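
The ordering rules above can be sketched as a helper that assembles content the way the webhook JSON's richInitialPrompt expects. This is an illustrative sketch mirroring the JSON structure shown in this document, not part of any client library:

```javascript
// Assemble rich-response content in the required order:
// a simple response first, at most one card, chips (max 8) last.
function buildRichPrompt({ speech, text, card, chips = [] }) {
  const items = [
    { simpleResponse: { textToSpeech: speech, displayText: text } },
  ];
  if (card) items.push({ basicCard: card });
  return {
    items,
    suggestions: chips.slice(0, 8).map((title) => ({ title })),
  };
}
```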

The following examples show you how to build various types of rich responses.

Basic card

A basic card displays information that can include the following:

  • Image
  • Title
  • Sub-title
  • Text body
  • Link button
  • Border

Use basic cards mainly for display purposes. They are designed to be concise, to present key (or summary) information to users, and to let users learn more if you choose (via a web link).

In most situations, you should add suggestion chips below the cards to continue or pivot the conversation.

Avoid repeating the card's information in the chat bubble.

Requirements
  • Supported on actions.capability.SCREEN_OUTPUT surfaces
  • Formatted text (required if there's no image)
    • Plain text by default.
    • Must not contain a link.
    • 10 line limit with an image, 15 line limit without an image. This is about 500 characters (with an image) or 750 characters (without an image). Phones with smaller screens truncate text earlier than phones with larger screens. If the text contains too many lines, it's truncated at the last word break with an ellipsis.
    • A limited subset of markdown is supported:
      • New line with a double space
      • **bold**
      • *italics*
  • Image (required if there's no formatted text)
    • All images are forced to a height of 192 dp.
    • If the image's aspect ratio differs from the screen's, the image is centered with gray bars along the vertical or horizontal edges.
    • Image source is a URL.
    • Motion GIFs are allowed.
Optional
  • Title
    • Plain text.
    • Fixed font and size.
    • At most one line; extra characters are truncated.
    • The card height collapses if no title is specified.
  • Sub-title
    • Plain text.
    • Fixed font and font size.
    • At most one line; extra characters are truncated.
    • The card height collapses if no subtitle is specified.
  • Link button
    • Link title is required
    • At most one link
    • Links to sites outside the developer's domain are allowed.
    • Link text cannot be misleading. This is checked in the approval process.
    • A basic card has no interaction capabilities without a link. Tapping on the link sends the user to the link, while the main body of the card remains inactive.
  • Border
    • The border between the card and the image container can be adjusted to customize the presentation of your basic card.
    • Configured by setting the card's image display option (the imageDisplayOptions field in the webhook JSON), which accepts a string value.
    • Valid values are DEFAULT, WHITE, or CROPPED.
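
A formatted-text string that stays within the supported markdown subset might look like this (an illustrative string, not tied to any particular card):

```javascript
// Formatted text limited to the supported markdown subset.
// Two trailing spaces before \n produce a line break on the card.
const formattedText =
  'This line ends with a markdown line break.  \n' +
  'This line has **bold** and *italics*.';
```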

Sample code

Note that the JSON below describes a webhook response using the Actions SDK.

Node.js

if (!conv.surface.capabilities.has('actions.capability.SCREEN_OUTPUT')) {
  conv.ask('Sorry, try this on a screen device or select the ' +
    'phone surface in the simulator.');
  return;
}
// Create a basic card
conv.ask(new BasicCard({
  text: `This is a basic card.  Text in a basic card can include "quotes" and
  most other unicode characters including emoji 📱.  Basic cards also support
  some markdown formatting like *emphasis* or _italics_, **strong** or
  __bold__, and ***bold italic*** or ___strong emphasis___ as well as other
  things like line  \nbreaks`, // Note the two spaces before '\n' required for
                               // a line break to be rendered in the card.
  subtitle: 'This is a subtitle',
  title: 'Title: this is a title',
  buttons: new Button({
    title: 'This is a button',
    url: 'https://assistant.google.com/',
  }),
  image: new Image({
    url: 'https://example.com/image.png',
    alt: 'Image alternate text',
  }),
}));

JSON

{
    "conversationToken": "",
    "expectUserResponse": true,
    "expectedInputs": [
        {
            "inputPrompt": {
                "richInitialPrompt": {
                    "items": [
                        {
                            "simpleResponse": {
                                "textToSpeech": "Math and prime numbers it is!"
                            }
                        },
                        {
                            "basicCard": {
                                "title": "Math & prime numbers",
                                "formattedText": "42 is an even composite number. It\n    is composed of three distinct prime numbers multiplied together. It\n    has a total of eight divisors. 42 is an abundant number, because the\n    sum of its proper divisors 54 is greater than itself. To count from\n    1 to 42 would take you about twenty-one…",
                                "image": {
                                    "url": "https://example.google.com/42.png",
                                    "accessibilityText": "Image alternate text"
                                },
                                "buttons": [
                                    {
                                        "title": "Read more",
                                        "openUrlAction": {
                                            "url": "https://example.google.com/mathandprimes"
                                        }
                                    }
                                ],
                                "imageDisplayOptions": "CROPPED"
                            }
                        }
                    ],
                    "suggestions": []
                }
            },
            "possibleIntents": [
                {
                    "intent": "actions.intent.TEXT"
                }
            ]
        }
    ]
}

List

The single-select list presents the user with a vertical list of multiple items and allows the user to select a single one. Selecting an item from the list generates a user query (chat bubble) containing the title of the list item.

Requirements
  • Supported on actions.capability.SCREEN_OUTPUT surfaces
  • List Title (optional)
    • Fixed font and font size
    • Restricted to a single line. (Excessive characters will be truncated.)
    • Plain text; Markdown is not supported.
    • The card height collapses if no title is specified.
  • List item

    • Title
      • Fixed font and font size
      • Max length: 1 line (truncated with an ellipsis)
      • Required to be unique (to support voice selection)
    • Body Text (optional)
      • Fixed font and font size
      • Max length: 2 lines (truncated with an ellipsis)
    • Image (optional)
      • Size: 48x48 px
  • Pagination

    • The pagination control appears under two conditions
      • Simple list: more than 5 items
      • Lists with body text or image: more than 3 items
    • 30 item max
  • Interaction

    • Voice/Text
      • The user can always say or type an item's title instead of tapping it.
      • Must have an intent for touch input that handles the actions_intent_OPTION event.
    • Swipe
      • If the number of items in the list is great enough to make the pagination control appear, then swiping left/right reveals different list items
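
The pagination conditions above can be expressed as a small predicate. This is a sketch of the documented thresholds, not platform code:

```javascript
// Whether the pagination control would appear for a given list,
// per the documented thresholds: more than 5 items for a simple
// list, more than 3 when any item has body text or an image.
function listNeedsPagination(items) {
  const isRich = items.some((item) => item.description || item.image);
  return isRich ? items.length > 3 : items.length > 5;
}
```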
Guidance

Lists are good when it's important to compare options (e.g., which "Peter" do you need to speak to: Peter Jons or Peter Hans?), or when the user needs to choose between options that can be scanned at a glance.

We recommend adding suggestion chips below a list to enable the user to pivot or expand the conversation. Never repeat the options presented in the list as suggestion chips. Chips in this context are used to pivot the conversation (not for choice selection).

Notice that in the example presented here, the chat bubble that accompanies the list card is a subset of the audio (TTS/SSML). The audio (TTS/SSML) here integrates the first listed item. We discourage reading all the elements from the list. It's best to mention the top items (for example, the most popular, the most recently purchased, or the most talked about).

Sample code

Note that the JSON below describes a webhook response using the Actions SDK.

Node.js

if (!conv.surface.capabilities.has('actions.capability.SCREEN_OUTPUT')) {
  conv.ask('Sorry, try this on a screen device or select the ' +
    'phone surface in the simulator.');
  return;
}
// Create a list
conv.ask(new List({
  title: 'List Title',
  items: {
    // Add the first item to the list
    [SELECTION_KEY_ONE]: {
      synonyms: [
        'synonym of title 1',
        'synonym of title 2',
        'synonym of title 3',
      ],
      title: 'Title of First List Item',
      description: 'This is a description of a list item.',
      image: new Image({
        url: IMG_URL_AOG,
        alt: 'Image alternate text',
      }),
    },
    // Add the second item to the list
    [SELECTION_KEY_GOOGLE_HOME]: {
      synonyms: [
        'Google Home Assistant',
        'Assistant on the Google Home',
    ],
      title: 'Google Home',
      description: 'Google Home is a voice-activated speaker powered by ' +
        'the Google Assistant.',
      image: new Image({
        url: IMG_URL_GOOGLE_HOME,
        alt: 'Google Home',
      }),
    },
    // Add the third item to the list
    [SELECTION_KEY_GOOGLE_PIXEL]: {
      synonyms: [
        'Google Pixel XL',
        'Pixel',
        'Pixel XL',
      ],
      title: 'Google Pixel',
      description: 'Pixel. Phone by Google.',
      image: new Image({
        url: IMG_URL_GOOGLE_PIXEL,
        alt: 'Google Pixel',
      }),
    },
  },
}));

JSON

{
    "conversationToken": "",
    "expectUserResponse": true,
    "expectedInputs": [
        {
            "inputPrompt": {
                "initialPrompts": [
                    {
                        "textToSpeech": "Alright! Here are a few things you can learn. Which sounds interesting?"
                    }
                ],
                "noInputPrompts": []
            },
            "possibleIntents": [
                {
                    "intent": "actions.intent.OPTION",
                    "inputValueData": {
                        "@type": "type.googleapis.com/google.actions.v2.OptionValueSpec",
                        "listSelect": {
                            "title": "Things to learn about",
                            "items": [
                                {
                                    "optionInfo": {
                                        "key": "MATH_AND_PRIME",
                                        "synonyms": [
                                            "math",
                                            "math and prime",
                                            "prime numbers",
                                            "prime"
                                        ]
                                    },
                                    "title": "Math & prime numbers",
                                    "description": "42 is an abundant number because the sum of its proper divisors 54 is greater…",
                                    "image": {
                                        "url": "http://example.com/math_and_prime.jpg",
                                        "accessibilityText": "Math & prime numbers"
                                    }
                                },
                                {
                                    "optionInfo": {
                                        "key": "EGYPT",
                                        "synonyms": [
                                            "religion",
                                            "egpyt",
                                            "ancient egyptian"
                                        ]
                                    },
                                    "title": "Ancient Egyptian religion",
                                    "description": "42 gods who ruled on the fate of the dead in the afterworld. Throughout the under…",
                                    "image": {
                                        "url": "http://example.com/egypt",
                                        "accessibilityText": "Egypt"
                                    }
                                },
                                {
                                    "optionInfo": {
                                        "key": "RECIPES",
                                        "synonyms": [
                                            "recipes",
                                            "recipe",
                                            "42 recipes"
                                        ]
                                    },
                                    "title": "42 recipes with 42 ingredients",
                                    "description": "Here's a beautifully simple recipe that's full of flavor! All you need is some ginger and…",
                                    "image": {
                                        "url": "http://example.com/recipe",
                                        "accessibilityText": "Recipe"
                                    }
                                }
                            ]
                        }
                    }
                }
            ]
        }
    ]
}

Handling a selected item

When users select an item, the selected option's key is passed to your fulfillment as an argument. With the Node.js client library, you can read it from the third parameter of your actions.intent.OPTION intent handler:

Node.js

const SELECTED_ITEM_RESPONSES = {
  [SELECTION_KEY_ONE]: 'You selected the first item',
  [SELECTION_KEY_GOOGLE_HOME]: 'You selected the Google Home!',
  [SELECTION_KEY_GOOGLE_PIXEL]: 'You selected the Google Pixel!',
};

app.intent('actions.intent.OPTION', (conv, params, option) => {
  let response = 'You did not select any item';
  if (option && SELECTED_ITEM_RESPONSES.hasOwnProperty(option)) {
    response = SELECTED_ITEM_RESPONSES[option];
  }
  conv.ask(response);
});

JSON

{
    "user": {
        "userId": "123456abcde",
        "locale": "en-US"
    },
    "conversation": {
        "conversationId": "123456",
        "type": "ACTIVE",
        "conversationToken": ""
    },
    "inputs": [
        {
            "intent": "actions.intent.OPTION",
            "rawInputs": [
                {
                    "inputType": "VOICE",
                    "query": "42 recipes with 42 ingredients"
                }
            ],
            "arguments": [
                {
                    "name": "OPTION",
                    "textValue": "RECIPES"
                }
            ]
        }
    ],
    "surface": {
        "capabilities": [
            {
                "name": "actions.capability.AUDIO_OUTPUT"
            },
            {
                "name": "actions.capability.SCREEN_OUTPUT"
            }
        ]
    },
    "isInSandbox": true,
    "availableSurfaces": [
        {
            "capabilities": [
                {
                    "name": "actions.capability.AUDIO_OUTPUT"
                },
                {
                    "name": "actions.capability.SCREEN_OUTPUT"
                }
            ]
        }
    ]
}

Carousel

The carousel scrolls horizontally and allows the user to select one item. Compared to the list selector, it has large tiles, allowing for richer content. The tiles that make up a carousel are similar to a basic card with an image. Selecting an item from the carousel simply generates a chat bubble as the response, just as with the list selector.

Requirements
  • Supported on actions.capability.SCREEN_OUTPUT surfaces
  • Carousel
    • Max # tiles: 10
    • Min # tiles: 2
    • Plain text; Markdown is not supported.
  • Carousel tile
    • Image (optional)
      • Image is forced to be 128 dp tall x 232 dp wide
      • If the image aspect ratio doesn't match the image bounding box, then the image is centered with bars on either side
      • If an image link is broken then a placeholder image is used instead
    • Title (required)
      • Same as the Basic Text Card
      • Titles must be unique (to support voice selection)
    • Description (optional)
      • Same formatting options as the Basic Text Card
      • Max 4 lines
      • Plain text; Markdown is not supported.
  • Interaction
    • Swipe left/right: Slide the carousel to reveal different cards.
    • Tap card: Tapping an item simply generates a chat bubble with the same text as the element title.
      • Must have an intent for touch input that handles the actions_intent_OPTION event.
    • Voice/Keyboard: Replying with the card title (if specified) functions the same as selecting that item.
Guidance

Carousels are good when various options are presented to the user, but a direct comparison is not required among them (versus lists). In general, lists are preferred over carousels simply because lists are easier to visually scan and interact with via voice.

We recommend adding suggestion chips below a carousel if you want to continue the conversation. Never repeat the options presented in the list as suggestion chips. Chips in this context are used to pivot the conversation (not for choice selection).

Same as with lists, the chat bubble that accompanies the carousel card is a subset of the audio (TTS/SSML). The audio (TTS/SSML) here integrates the first tile in the carousel, and we also strongly discourage reading all the elements from the carousel. It's best to mention the first item and the reason why it's there (e.g. the most popular, the most recently purchased, the most talked about, etc.).

Sample code

Note that the JSON below describes a webhook response using the Actions SDK.

Node.js

if (!conv.surface.capabilities.has('actions.capability.SCREEN_OUTPUT')) {
  conv.ask('Sorry, try this on a screen device or select the ' +
    'phone surface in the simulator.');
  return;
}
// Create a carousel
conv.ask(new Carousel({
  items: {
    // Add the first item to the carousel
    [SELECTION_KEY_ONE]: {
      synonyms: [
        'synonym of title 1',
        'synonym of title 2',
        'synonym of title 3',
      ],
      title: 'Title of First Carousel Item',
      description: 'This is a description of a carousel item.',
      image: new Image({
        url: IMG_URL_AOG,
        alt: 'Image alternate text',
      }),
    },
    // Add the second item to the carousel
    [SELECTION_KEY_GOOGLE_HOME]: {
      synonyms: [
        'Google Home Assistant',
        'Assistant on the Google Home',
    ],
      title: 'Google Home',
      description: 'Google Home is a voice-activated speaker powered by ' +
        'the Google Assistant.',
      image: new Image({
        url: IMG_URL_GOOGLE_HOME,
        alt: 'Google Home',
      }),
    },
    // Add third item to the carousel
    [SELECTION_KEY_GOOGLE_PIXEL]: {
      synonyms: [
        'Google Pixel XL',
        'Pixel',
        'Pixel XL',
      ],
      title: 'Google Pixel',
      description: 'Pixel. Phone by Google.',
      image: new Image({
        url: IMG_URL_GOOGLE_PIXEL,
        alt: 'Google Pixel',
      }),
    },
  },
}));

JSON

{
    "conversationToken": "",
    "expectUserResponse": true,
    "expectedInputs": [
        {
            "inputPrompt": {
                "initialPrompts": [
                    {
                        "textToSpeech": "Alright! Here are a few things you can learn. Which sounds interesting?"
                    }
                ],
                "noInputPrompts": []
            },
            "possibleIntents": [
                {
                    "intent": "actions.intent.OPTION",
                    "inputValueData": {
                        "@type": "type.googleapis.com/google.actions.v2.OptionValueSpec",
                        "carouselSelect": {
                            "items": [
                                {
                                    "optionInfo": {
                                        "key": "MATH_AND_PRIME",
                                        "synonyms": [
                                            "math",
                                            "math and prime",
                                            "prime numbers",
                                            "prime"
                                        ]
                                    },
                                    "title": "Math & prime numbers",
                                    "description": "42 is an abundant number because the sum of its proper divisors 54 is greater…",
                                    "image": {
                                        "url": "http://example.com/math_and_prime.jpg",
                                        "accessibilityText": "Math & prime numbers"
                                    }
                                },
                                {
                                    "optionInfo": {
                                        "key": "EGYPT",
                                        "synonyms": [
                                            "religion",
                                            "egpyt",
                                            "ancient egyptian"
                                        ]
                                    },
                                    "title": "Ancient Egyptian religion",
                                    "description": "42 gods who ruled on the fate of the dead in the afterworld. Throughout the under…",
                                    "image": {
                                        "url": "http://example.com/egypt",
                                        "accessibilityText": "Egypt"
                                    }
                                },
                                {
                                    "optionInfo": {
                                        "key": "RECIPES",
                                        "synonyms": [
                                            "recipes",
                                            "recipe",
                                            "42 recipes"
                                        ]
                                    },
                                    "title": "42 recipes with 42 ingredients",
                                    "description": "Here's a beautifully simple recipe that's full of flavor! All you need is some ginger and…",
                                    "image": {
                                        "url": "http://example.com/recipe",
                                        "accessibilityText": "Recipe"
                                    }
                                }
                            ]
                        }
                    }
                }
            ]
        }
    ]
}

Handling a selected item

When users select an item, the selected option's key is passed to your fulfillment as an argument. With the Node.js client library, you can read it from the third parameter of your actions.intent.OPTION intent handler:

Node.js

const SELECTED_ITEM_RESPONSES = {
  [SELECTION_KEY_ONE]: 'You selected the first item',
  [SELECTION_KEY_GOOGLE_HOME]: 'You selected the Google Home!',
  [SELECTION_KEY_GOOGLE_PIXEL]: 'You selected the Google Pixel!',
};

app.intent('actions.intent.OPTION', (conv, params, option) => {
  let response = 'You did not select any item';
  if (option && SELECTED_ITEM_RESPONSES.hasOwnProperty(option)) {
    response = SELECTED_ITEM_RESPONSES[option];
  }
  conv.ask(response);
});

JSON

{
    "user": {
        "userId": "123456abcde",
        "locale": "en-US"
    },
    "conversation": {
        "conversationId": "123456",
        "type": "ACTIVE",
        "conversationToken": ""
    },
    "inputs": [
        {
            "intent": "actions.intent.OPTION",
            "rawInputs": [
                {
                    "inputType": "VOICE",
                    "query": "42 recipes with 42 ingredients"
                }
            ],
            "arguments": [
                {
                    "name": "OPTION",
                    "textValue": "RECIPES"
                }
            ]
        }
    ],
    "surface": {
        "capabilities": [
            {
                "name": "actions.capability.AUDIO_OUTPUT"
            },
            {
                "name": "actions.capability.SCREEN_OUTPUT"
            }
        ]
    },
    "isInSandbox": true,
    "availableSurfaces": [
        {
            "capabilities": [
                {
                    "name": "actions.capability.AUDIO_OUTPUT"
                },
                {
                    "name": "actions.capability.SCREEN_OUTPUT"
                }
            ]
        }
    ]
}

Browsing carousel

A browsing carousel is a rich response similar to the carousel response in that it scrolls horizontally and allows users to select a tile. Browsing carousels are designed specifically for web content: selecting a tile opens it in a web browser (or an AMP browser if all tiles are AMP-enabled). The browsing carousel also persists on the user's Assistant surface for later browsing.

Requirements
  • Supported on actions.capability.SCREEN_OUTPUT surfaces
  • Browsing carousel
    • Max # tiles: 10
    • Min # tiles: 2
    • Tiles in the carousel must all link to web content (AMP content recommended).
      • In order for the user to be taken to an AMP viewer, the UrlHintType on AMP content tiles must be set to "AMP_CONTENT".
  • Browsing carousel tiles

    • Tile Consistency (required)
      • All tiles in a browsing carousel must have the same components. E.g. if one tile has an image field, the rest of the tiles in the carousel must also have image fields.
      • If all tiles link to AMP-enabled content, the user will be taken to an AMP browser with additional functionality. If any tiles link to non-AMP content, then all tiles will direct users to a web browser.
    • Image (optional)
      • Image is forced to be 128 dp tall x 232 dp wide.
      • If the image aspect ratio doesn't match the image bounding box, then the image is centered with bars on either side.
      • If an image link is broken then a placeholder image is used instead.
      • Alt-text is required on an image.
    • Title (required)
      • Same formatting options as the Basic Text Card.
      • Titles must be unique (to support voice selection).
      • Max 2 lines.
      • Font size 16 sp.
    • Description (optional)
      • Same formatting options as the Basic Text Card.
      • Max 4 lines.
      • Truncated with an ellipsis (...)
      • Font size 14sp, gray color.
    • Footer (optional)
      • Fixed font and font size.
      • Max 1 line.
      • Truncated with an ellipsis (...)
      • Anchored at the bottom, so tiles with fewer lines of body text may have white space above the sub-text.
      • Font size 14sp, gray color.
  • Interaction

    • Swipe left/right: Slide the carousel to reveal different cards.
    • Tap card: Tapping an item takes the user to a browser, displaying the linked page.
  • Voice input

    • Mic behavior
      • The mic doesn't re-open when a browsing carousel is sent to the user.
      • The user can still tap the mic or invoke the Assistant ("OK Google") to re-open the mic.
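The AMP requirement above can be sketched as a raw tile payload. This is a minimal sketch, assuming the JSON field name urlTypeHint on openUrlAction; the title, description, and URL are placeholders:

```javascript
// Minimal sketch of a browsing-carousel tile that links to AMP content.
// The urlTypeHint field on openUrlAction carries the AMP_CONTENT hint so
// the tile opens in the AMP viewer; the URL here is a placeholder.
const ampTile = {
  title: 'An AMP-enabled article',
  description: 'Opens in the AMP viewer when all tiles are AMP-enabled.',
  openUrlAction: {
    url: 'https://example.com/article.amp.html',
    urlTypeHint: 'AMP_CONTENT',
  },
};
```

Remember that if any one tile in the carousel links to non-AMP content, all tiles fall back to a regular web browser.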
Guidance

By default, the mic remains closed after a browsing carousel is sent. If you want to continue the conversation afterwards, we strongly recommend adding suggestion chips below the carousel. Never repeat the options presented in the carousel as suggestion chips; chips in this context are used to pivot the conversation, not for choice selection.

As with lists, the chat bubble that accompanies the carousel card is a subset of the audio (TTS/SSML). The audio here incorporates the first tile in the carousel, and we strongly discourage reading out every element. It's best to mention the first item and why it's there (for example, the most popular, the most recently purchased, or the most talked about).

Sample code

Note that the JSON below describes a webhook response using the Actions SDK.

Node.js

const {BrowseCarousel, BrowseCarouselItem, Image} = require('actions-on-google');

const a11yText = 'Google Assistant Bubbles';
const googleUrl = 'https://google.com';
// Image used for both tiles (same URL as in the JSON sample below).
const IMG_URL_AOG = 'https://www.gstatic.com/mobilesdk/170329_assistant/assistant_color_96dp.png';
if (!conv.surface.capabilities.has('actions.capability.SCREEN_OUTPUT')) {
  conv.ask('Sorry, try this on a screen device or select the ' +
    'phone surface in the simulator.');
  return;
}
// Create a browse carousel
conv.ask(new BrowseCarousel({
  items: [
    new BrowseCarouselItem({
      title: 'Title of item 1',
      url: googleUrl,
      description: 'Description of item 1',
      image: new Image({
        url: IMG_URL_AOG,
        alt: a11yText,
      }),
      footer: 'Item 1 footer',
    }),
    new BrowseCarouselItem({
      title: 'Title of item 2',
      url: googleUrl,
      description: 'Description of item 2',
      image: new Image({
        url: IMG_URL_AOG,
        alt: a11yText,
      }),
      footer: 'Item 2 footer',
    }),
  ],
}));

JSON

{
  "conversationToken": "[\"_actions_on_google_\"]",
  "expectUserResponse": true,
  "expectedInputs": [
    {
      "inputPrompt": {
        "richInitialPrompt": {
          "items": [
            {
              "simpleResponse": {
                "textToSpeech": "Alright! Here are a few web pages you might want to check out."
              }
            },
            {
              "carouselBrowse": {
                "items": [
                  {
                    "title": "Title of item 1",
                    "description": "Description of item 1",
                    "footer": "Item 1 footer",
                    "image": {
                      "url": "https://www.gstatic.com/mobilesdk/170329_assistant/assistant_color_96dp.png",
                      "accessibilityText": "Google Assistant Bubbles"
                    },
                    "openUrlAction": {
                      "url": "https://google.com"
                    }
                  },
                  {
                    "title": "Title of item 2",
                    "description": "Description of item 2",
                    "footer": "Item 2 footer",
                    "image": {
                      "url": "https://www.gstatic.com/mobilesdk/170329_assistant/assistant_color_96dp.png",
                      "accessibilityText": "Google Assistant Bubbles"
                    },
                    "openUrlAction": {
                      "url": "https://google.com"
                    }
                  }
                ]
              }
            }
          ]
        }
      },
      "possibleIntents": [
        {
          "intent": "assistant.intent.action.TEXT"
        }
      ]
    }
  ],
  "responseMetadata": {
    "status": {},
    "queryMatchInfo": {
      "queryMatched": true,
      "intent": "74c61f1c-8d79-4c07-a2c5-84e20ebd3d16"
    }
  }
}

Handling selected item

No follow-up fulfillment is necessary for user interactions with browse carousel items, since the carousel handles the browser handoff. Keep in mind that the mic will not re-open after the user interacts with a browse carousel item, so you should either end the conversation or include suggestion chips in your response as per the guidance above.
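Putting that guidance together, one way to sketch such a turn is to build the rich prompt payload with both the carousel and pivot chips. Field names mirror the JSON sample above; the TTS, tile titles, and chip titles are placeholder examples:

```javascript
// Sketch: a browsing carousel paired with suggestion chips. Because the
// mic stays closed after the carousel, the chips give the user a tap
// target to continue the conversation. Field names follow the JSON
// sample above; titles and URLs are placeholders.
function carouselWithChips(tts, tiles, chipTitles) {
  return {
    items: [
      {simpleResponse: {textToSpeech: tts}},
      {carouselBrowse: {items: tiles}},
    ],
    // Pivot chips, not repeats of the carousel options.
    suggestions: chipTitles.map((title) => ({title})),
  };
}

const prompt = carouselWithChips(
  'Alright! Here are a few web pages you might want to check out.',
  [
    {title: 'Title of item 1', openUrlAction: {url: 'https://google.com'}},
    {title: 'Title of item 2', openUrlAction: {url: 'https://google.com'}},
  ],
  ['Search again', 'Never mind'],
);
```
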

Suggestion Chip

Requirements
  • Supported on actions.capability.SCREEN_OUTPUT surfaces
  • Max number of chips: 8
  • Max text length: 25 characters
  • Supports only plain text
Guidance

Use suggestion chips to hint at responses to continue or pivot the conversation. If during the conversation there is a primary call for action, consider listing that as the first suggestion chip.

Whenever possible, you should incorporate one key suggestion as part of the chat bubble, but do so only if the response or chat conversation feels natural.

Sample code

Note that the JSON below describes a webhook response using the Actions SDK.

Node.js

if (!conv.surface.capabilities.has('actions.capability.SCREEN_OUTPUT')) {
  conv.ask('Sorry, try this on a screen device or select the ' +
    'phone surface in the simulator.');
  return;
}
conv.ask(new Suggestions('Suggestion Chips'));
conv.ask(new Suggestions(['suggestion 1', 'suggestion 2']));
conv.ask(new LinkOutSuggestion({
  name: 'Suggestion Link',
  url: 'https://assistant.google.com/',
}));

JSON

{
    "conversationToken": "",
    "expectUserResponse": true,
    "expectedInputs": [
        {
            "inputPrompt": {
                "richInitialPrompt": {
                    "items": [
                        {
                            "simpleResponse": {
                                "textToSpeech": "Howdy! I can tell you fun facts about almost any number like 0, 42, or 100. What number do you have in mind?",
                                "displayText": "Howdy! I can tell you fun facts about almost any number. What number do you have in mind?"
                            }
                        }
                    ],
                    "suggestions": [
                        {
                            "title": "0"
                        },
                        {
                            "title": "42"
                        },
                        {
                            "title": "100"
                        },
                        {
                            "title": "Never mind"
                        }
                    ],
                    "linkOutSuggestion": {
                        "destinationName": "Suggestion Link",
                        "url": "https://assistant.google.com/"
                    }
                }
            },
            "possibleIntents": [
                {
                    "intent": "actions.intent.TEXT"
                }
            ]
        }
    ]
}

Media responses

Media responses let your Actions play audio content with a playback duration longer than the 120-second limit of SSML. The primary component of a media response is the single-track card. The card allows the user to perform these operations:

  • Replay the last 10 seconds.
  • Skip forward 30 seconds.
  • View the total length of the media content.
  • View a progress indicator for audio playback.
  • View the elapsed playback time.

Media responses support the following audio controls for voice interaction:

  • “Ok Google, play.”
  • “Ok Google, pause.”
  • “Ok Google, stop.”
  • “Ok Google, start over.”

Requirements

In order to use media responses in your Actions project, make sure it adheres to the following requirements:

  • Supported on actions.capability.MEDIA_RESPONSE_AUDIO surfaces.
  • Audio for playback must be in a correctly formatted .mp3 file. Live streaming is not supported.
  • The media file for playback must be specified as an HTTPS URL.

Behavior on surfaces

Media responses are supported on Android phones and on Google Home. The behavior of media responses depends on the surface on which users interact with your Actions.

On Android phones, users can see media responses when any of these conditions are met:

  • Google Assistant is in the foreground, and the phone screen is on.
  • The user leaves Google Assistant while audio is playing and returns to Google Assistant within 10 minutes of playback completion. On returning to Google Assistant, the user sees the media card and suggestion chips.

Media controls are available while the phone is locked. On Android, the controls also appear in the notification area.

Sample code

The following code sample shows how you might update your rich responses to include media.

Note that the JSON below describes a webhook response using the Actions SDK.

Node.js

if (!conv.surface.capabilities.has('actions.capability.MEDIA_RESPONSE_AUDIO')) {
  conv.ask('Sorry, this device does not support audio playback.');
  return;
}
conv.ask(new MediaObject({
  name: 'Jazz in Paris',
  url: 'https://storage.googleapis.com/automotive-media/Jazz_In_Paris.mp3',
  description: 'A funky Jazz tune',
  icon: new Image({
    url: 'https://storage.googleapis.com/automotive-media/album_art.jpg',
    alt: 'Media icon',
  }),
}));

JSON

"mediaResponse": {
  "mediaType": "AUDIO",
  "mediaObjects": [{
    "name": "Song One",
    "description": "This is song one.", //optional
    "largeImage": {
      // Use "icon" if displaying icon.
      "url": "https://...jpg"
    },
    "contentUrl": "https://...mp3"
  }]
}

Building Actions with media responses

Your response must include a mediaResponse with a mediaType of AUDIO and containing a mediaObject within the rich response's item array. A media response supports a single media object. A media object must include the name and content URL of the audio file. A media object may optionally include sub-text (description) and an icon or large image URL.

On phones and Google Home, when your Action completes audio playback, Google Assistant checks if the media response is a final response. If not, it sends a callback to your fulfillment, allowing you to respond to the user.

Your Action must include suggestion chips if the response is not a final response.
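A non-final media turn can be sketched as a raw richResponse payload. Field names mirror the JSON sample above; the simple-response text and chip titles are placeholder examples:

```javascript
// Sketch: a media response that is not a final response, so suggestion
// chips are included as required. A media response supports a single
// media object with a name and a content URL; description is optional.
const richResponse = {
  items: [
    {simpleResponse: {textToSpeech: 'Here is Jazz in Paris.'}},
    {
      mediaResponse: {
        mediaType: 'AUDIO',
        mediaObjects: [{
          name: 'Jazz in Paris',
          description: 'A funky Jazz tune',
          contentUrl: 'https://storage.googleapis.com/automotive-media/Jazz_In_Paris.mp3',
        }],
      },
    },
  ],
  // Required because this turn expects a user response.
  suggestions: [{title: 'Play another'}, {title: 'No thanks'}],
};
```
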

Handling callback after playback completion

Your Action should handle the actions.intent.MEDIA_STATUS intent to prompt the user for a follow-up (for example, to play another song). Your Action receives this callback when media playback completes. In the callback, the MEDIA_STATUS argument contains status information about the current media. The status value will be either FINISHED or STATUS_UNSPECIFIED.

Using Dialogflow

If you want to perform conversational branching in Dialogflow, you’ll need to set up an input context of actions_capability_media_response_audio on the intent to ensure it only triggers on surfaces that support a media response.

Building your fulfillment

The code snippet below shows how you might write the fulfillment code for your Action. If you're using Dialogflow, replace actions.intent.MEDIA_STATUS with the action name specified in the intent that receives the actions_intent_MEDIA_STATUS event (for example, "media.status.update").

Node.js

app.intent('actions.intent.MEDIA_STATUS', (conv) => {
  const mediaStatus = conv.arguments.get('MEDIA_STATUS');
  let response = 'Unknown media status received.';
  if (mediaStatus && mediaStatus.status === 'FINISHED') {
    response = 'Hope you enjoyed the tunes!';
  }
  conv.ask(response);
});

JSON

"inputs": [
  {
    "intent": "actions.intent.MEDIA_STATUS",
    "rawInputs": [
      {}
    ],
    "arguments": [
      {
        "name": "MEDIA_STATUS",
        "extension": {
          "@type": "type.googleapis.com/google.actions.v2.MediaStatus",
          "status": "FINISHED"
        }
      }
    ]
  }
]

Table card

Table cards allow you to display tabular data in your response (for example, sports standings, election results, and flights). You can define columns and rows (up to 3 each) that the Assistant is required to show in your table card. You can also define additional columns and rows along with their prioritization.

Tables differ from vertical lists in that they display static data and, unlike list elements, are not interactable.

Requirements
  • Supported on actions.capability.SCREEN_OUTPUT surfaces
Customize table card elements

The following section summarizes how you can customize the elements in a table card.

  • title (optional, customizable): Overall title of the table. Must be set if subtitle is set. You can customize the font family and color.
  • subtitle (optional, not customizable): Subtitle for the table.
  • image (optional, customizable): Image associated with the table.
  • Row (required, customizable): Row data of the table. Consists of an array of Cell objects and a divider_after property, which indicates whether there should be a divider after the row. The first 3 rows are guaranteed to be shown, but others might not appear on certain surfaces. Test with the simulator to see which rows are shown for a given surface. On surfaces that support the WEB_BROWSER capability, you can point the user to a web page with more data.
  • ColumnProperties (optional, customizable): Header and alignment for a column. Consists of a header property (the header text for a column) and a horizontal_alignment property (of type HorizontalAlignment).
  • Cell (required, customizable): Describes a cell in a row. Each cell contains a string representing a text value. You can customize the text in the cell.
  • Button (optional, customizable): A button object that usually appears at the bottom of a card. A table card can have only 1 button. You can customize the button color.
  • HorizontalAlignment (optional, customizable): Horizontal alignment of content within the cell. Values can be LEADING, CENTER, or TRAILING. If unspecified, content is aligned to the leading edge of the cell.
Sample code

The following snippets show how to implement a simple table card:

Node.js

// Simple table
conv.ask('Simple Response')
conv.ask(new Table({
  dividers: true,
  columns: ['header 1', 'header 2', 'header 3'],
  rows: [
    ['row 1 item 1', 'row 1 item 2', 'row 1 item 3'],
    ['row 2 item 1', 'row 2 item 2', 'row 2 item 3'],
  ],
}))

JSON

{
  "payload": {
    "google": {
      "expectUserResponse": true,
      "richResponse": {
        "items": [
          {
            "simpleResponse": {
              "textToSpeech": "Simple Response"
            }
          },
          {
            "tableCard": {
              "rows": [
                {
                  "cells": [
                    {
                      "text": "row 1 item 1"
                    },
                    {
                      "text": "row 1 item 2"
                    },
                    {
                      "text": "row 1 item 3"
                    }
                  ],
                  "dividerAfter": true
                },
                {
                  "cells": [
                    {
                      "text": "row 2 item 1"
                    },
                    {
                      "text": "row 2 item 2"
                    },
                    {
                      "text": "row 2 item 3"
                    }
                  ],
                  "dividerAfter": true
                }
              ],
              "columnProperties": [
                {
                  "header": "header 1"
                },
                {
                  "header": "header 2"
                },
                {
                  "header": "header 3"
                }
              ]
            }
          }
        ]
      },
      "userStorage": "{\"data\":{}}"
    }
  }
}

The following snippets show how to implement a complex table card:

Node.js

// All fields
conv.ask('Simple Response')
conv.ask(new Table({
  title: 'Table Title',
  subtitle: 'Table Subtitle',
  image: new Image({
    url: 'https://avatars0.githubusercontent.com/u/23533486',
    alt: 'Actions on Google'
  }),
  columns: [
    {
      header: 'header 1',
      align: 'CENTER',
    },
    {
      header: 'header 2',
      align: 'LEADING',
    },
    {
      header: 'header 3',
      align: 'TRAILING',
    },
  ],
  rows: [
    {
      cells: ['row 1 item 1', 'row 1 item 2', 'row 1 item 3'],
      dividerAfter: false,
    },
    {
      cells: ['row 2 item 1', 'row 2 item 2', 'row 2 item 3'],
      dividerAfter: true,
    },
    {
      cells: ['row 2 item 1', 'row 2 item 2', 'row 2 item 3'],
    },
  ],
  buttons: new Button({
    title: 'Button Title',
    url: 'https://github.com/actions-on-google'
  }),
}))

JSON

{
  "payload": {
    "google": {
      "expectUserResponse": true,
      "richResponse": {
        "items": [
          {
            "simpleResponse": {
              "textToSpeech": "Simple Response"
            }
          },
          {
            "tableCard": {
              "title": "Table Title",
              "subtitle": "Table Subtitle",
              "image": {
                "url": "https://avatars0.githubusercontent.com/u/23533486",
                "accessibilityText": "Actions on Google"
              },
              "rows": [
                {
                  "cells": [
                    {
                      "text": "row 1 item 1"
                    },
                    {
                      "text": "row 1 item 2"
                    },
                    {
                      "text": "row 1 item 3"
                    }
                  ],
                  "dividerAfter": false
                },
                {
                  "cells": [
                    {
                      "text": "row 2 item 1"
                    },
                    {
                      "text": "row 2 item 2"
                    },
                    {
                      "text": "row 2 item 3"
                    }
                  ],
                  "dividerAfter": true
                },
                {
                  "cells": [
                    {
                      "text": "row 2 item 1"
                    },
                    {
                      "text": "row 2 item 2"
                    },
                    {
                      "text": "row 2 item 3"
                    }
                  ]
                }
              ],
              "columnProperties": [
                {
                  "header": "header 1",
                  "horizontalAlignment": "CENTER"
                },
                {
                  "header": "header 2",
                  "horizontalAlignment": "LEADING"
                },
                {
                  "header": "header 3",
                  "horizontalAlignment": "TRAILING"
                }
              ],
              "buttons": [
                {
                  "title": "Button Title",
                  "openUrlAction": {
                    "url": "https://github.com/actions-on-google"
                  }
                }
              ]
            }
          }
        ]
      },
      "userStorage": "{\"data\":{}}"
    }
  }
}

Customizing your responses

You can change the appearance of your rich responses by creating a custom theme. If you define a theme for your Actions project, rich responses across your project's Actions will be styled according to your theme. This custom branding can be useful for defining a unique look and feel to the conversation when users invoke your Actions on a surface with a screen.

To set a custom response theme, do the following:

  1. In the Actions console, navigate to Build > Theme customization.
  2. Set any or all of the following:
    • Background color will be used as the background of your cards. In general, you should use a light color for the background so the card's content will be easy to read.
    • Primary color is the main color for your cards' header texts and UI elements. In general, you should use a darker primary color to contrast with the background.
    • Font family describes the type of font used for titles and other prominent text elements.
    • Image corner style can alter the look of your cards' corners.
    • Background image will use a custom image in place of the background color. You'll need to provide two different images, for when the surface device is in landscape or portrait mode respectively. Note that if you use a background image, the primary color will be set to white.
  3. Click Save.

UI checklist

The following checklist highlights common things you can do to make sure your responses appear appropriately on whatever surface users are experiencing your Actions on.

Cards and Options
Use cards and options

Cards and options let you display information in a richer, more customizable format.

  • Basic card - If you need to present a lot of text to the user, use a basic card. A card can display up to 15 lines of text, and link to a website for further reading. Unlike chat bubbles, the card supports text formatting. You can also add an image and a list or carousel to display options.
  • List - If you are asking the user to pick from a list of choices, consider using a list instead of writing out the list in a chat bubble.
  • Carousel - If you want the user to pick from a list of choices with a focus on larger images, use a carousel, which has a limit of 8 items.

Suggestion Chips
Use them after most turns

The best thing you can do to increase your Action's usability on devices with screens is to add chips, so the user can quickly tap to respond in addition to using voice or the keyboard. For example, any yes/no question should have suggestion chips for Yes and No.

When there are a few choices...

When offering the user a small number of choices (8 or fewer), you can add a suggestion chip for each choice (present them in the same order as in your TTS, using the same terminology).
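One way to keep the chips and the TTS in lockstep is to derive both from a single choice list. A minimal sketch, using the number choices from the suggestion chip JSON sample earlier:

```javascript
// Sketch: derive the TTS and the suggestion chips from one choice list
// so the order and terminology always match.
const choices = ['0', '42', '100'];
const textToSpeech =
  `I can tell you fun facts about almost any number, like ${choices.join(', ')}. ` +
  'What number do you have in mind?';
// One chip per choice, in the same order as the TTS.
const suggestions = choices.map((title) => ({title}));
```
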

When there are many choices...

If you ask a question with a wide range of possible answers, present a few of the most popular answers.

When returning media responses...

Your fulfillment must include suggestion chips with the media response if the response is not a final response.

Chat Bubbles
Correct capitalization and punctuation

Now that your TTS strings can show up as chat bubbles, check them for correct capitalization and punctuation.

Fix phonetic spellings

If you spelled something out phonetically in your TTS to help with a pronunciation issue, that phonetic misspelling will appear in your chat bubble. Provide separate display text with the correct spelling for chat bubbles on devices with screens.

Avoid truncation

Chat bubbles are limited to 640 characters and are truncated after that limit (however, we recommend around 300 characters as a general design guideline). If you have more than that, you can:

  • Use a 2nd chat bubble - Up to 2 chat bubbles are allowed per turn, so find a natural break point and create a second chat bubble.
  • Don't show everything - If you are presenting long TTS content, consider showing only a subset of the TTS content in the chat bubble, such as just an introduction. You can use shorter display text than TTS text in this case.
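If you opt for shorter display text, a small helper can trim at a word break so the platform's truncation never kicks in mid-sentence. This is a sketch only; the word-break logic mirrors the documented truncation rule, applied at the recommended ~300 characters instead of the hard 640:

```javascript
// Sketch: trim display text at the last whitespace before a limit,
// mirroring the platform's word-break truncation rule but at the
// recommended ~300 characters rather than the hard 640-character cap.
function fitDisplayText(text, limit = 300) {
  if (text.length <= limit) return text;
  const cut = text.lastIndexOf(' ', limit);
  // Fall back to a hard cut only if there is no whitespace to break at.
  return text.slice(0, cut > 0 ? cut : limit);
}

const display = fitDisplayText('word '.repeat(100).trim(), 30);
// display is six words (29 characters), cut cleanly at a space.
```
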

Recorded Audio
Remove <audio> text from chat bubbles

If you have text inside your SSML <audio> tag, it's displayed in your corresponding chat bubble. For example, if your SSML is:

<speak>
  Here's that song.
  <audio src="...">song audio</audio>
</speak>

your chat bubble text appears as "Here's that song. song audio".

Instead, add a <desc> element inside your <audio> element. Any text inside <desc> is displayed, and any other text inside <audio> (but outside <desc>) is spoken only as alternate text if the audio source file cannot be loaded. For example:

<speak>
  Here's that song.
  <audio src="bad_url"><desc></desc>song audio</audio>
</speak>

results in the audio output "Here's that song. song audio" (the alternate text is spoken because the source URL is broken) and the chat bubble text "Here's that song."

Alternatively, you can just remove the text from your <audio> tag altogether, or use the SSML <sub> tag.

Eliminate empty chat bubbles

Every dialog turn is required to have at least one chat bubble. If your Action has dialogs that are composed of only streaming audio (no TTS), the chat bubble text will be missing and your response will fail. In these cases, add display text that matches the words in your recorded audio or introduces it.
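Such an audio-only turn can be sketched as a raw simpleResponse payload; the stream URL and wording are placeholders:

```javascript
// Sketch: an audio-only turn (SSML with no spoken TTS text) paired with
// explicit display text so the chat bubble is never empty. The audio
// URL is a placeholder; the <desc> text feeds the display as well.
const simpleResponse = {
  ssml: '<speak><audio src="https://example.com/intro.mp3">' +
        '<desc>Welcome to Number Facts!</desc></audio></speak>',
  // Display text matching the words in the recorded audio.
  displayText: 'Welcome to Number Facts!',
};
```
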