Conversation Webhook

When a user says something that the Assistant interprets as an intent to trigger your app, the Assistant sends a JSON-formatted HTTP request to your fulfillment. This reference documents the request and response formats.

HTTP request

This section describes the HTTP request format that the Google Assistant sends to your fulfillment. The request is sent as a POST request, with a JSON body.

The following example shows an Assistant request body that's sent to your fulfillment:

{
    "query": "Sounds right",
    "accessToken": "...",
    "expectUserResponse": true,
    "conversationToken": "...",
    "surface": "PHONE",
    "debugInfo": {
        "assistantToAgentDebug": {
            "assistantToAgentJson": {
                "user": {
                    "user_id": "..."
                },
                "conversation": {
                    "conversation_id": "...",
                    "type": 2,
                    "conversation_token": "[\"_actions_on_google_\"]"
                },
                "inputs": [
                    {
                        "intent": "assistant.intent.action.TEXT",
                        "raw_inputs": [
                            {
                                "input_type": 2,
                                "query": "Sounds right",
                                "annotation_sets": []
                            }
                        ],
                        "arguments": [
                            {
                                "name": "text",
                                "raw_text": "Sounds right",
                                "text_value": "Sounds right"
                            }
                        ]
                    }
                ],
                "surface": {
                    "capabilities": [
                        {
                            "name": "actions.capability.AUDIO_OUTPUT"
                        },
                        {
                            "name": "actions.capability.SCREEN_OUTPUT"
                        }
                    ]
                },
                "device": {}
            }
        }
    }
}
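As a sketch of how a fulfillment might consume this body, the snippet below pulls the intent, raw user text, and conversation ID out of an already-parsed request. The helper name `parseAssistantRequest` and the trimmed-down `sampleRequest` object are illustrative, not part of the API; the field names follow the tables below.

```javascript
// Sketch: extract the intent, raw user text, and conversation ID from a
// parsed request body. Helper and sample names are illustrative only.
function parseAssistantRequest(body) {
  const input = (body.inputs || [])[0] || {};
  const raw = (input.raw_inputs || [])[0] || {};
  return {
    intent: input.intent || null, // e.g. "assistant.intent.action.TEXT"
    query: raw.query || null,     // what the user said or typed
    conversationId: body.conversation ? body.conversation.conversation_id : null
  };
}

const sampleRequest = {
  conversation: { conversation_id: "abc123", type: 2, conversation_token: "..." },
  inputs: [{
    intent: "assistant.intent.action.TEXT",
    raw_inputs: [{ input_type: 2, query: "Sounds right" }]
  }]
};

console.log(parseAssistantRequest(sampleRequest).query); // "Sounds right"
```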

The following sections describe the JSON objects in the HTTP request body.

Request root objects

The following table describes the root-level objects of the request body:

Field Type Description
user User Describes the user that initiated this conversation.
device Device Information associated with the device from which the conversation was initiated.
conversation Conversation Holds session data like the conversation ID and conversation_token.
inputs Array[Inputs] List of inputs corresponding to the expected input specified by the developer.
surface Surface Information specific to the Google Assistant client surface the user is interacting with. Surface is distinguished from Device by the fact that multiple Assistant surfaces may live on the same device.

User

The user object contains information about the user. The following table describes the elements of the user object:

Field Type Description Requires permission
user_id String A random string identifier for the Google user. The user_id can be used to track a user across multiple sessions and devices. -
profile UserProfile Information about the user. -
access_token String A unique OAuth2 token that identifies the user in your system. Only available if Account Linking configuration is defined in the action package and the user links their account. -

UserProfile

Stores the user's personal information. It's accessible only after the user grants the app permission to access it.

Field Type Description Requires permission
given_name String The user's first name as specified in their Google account. NAME
family_name String The user's last name as specified in their Google account. Note that this field could be empty. NAME
display_name String The user's full name as specified in their Google account. NAME

Device

The device object contains information about the device through which the conversation is taking place. The following table describes the elements of the device object:

Field Type Description Requires permission
location Location Representation of the device location. DEVICE_PRECISE_LOCATION or DEVICE_COARSE_LOCATION

Location

Field Type Description Requires permission
coordinates.latitude Double The device's latitude, in degrees. It must be in the range [-90.0, +90.0]. DEVICE_PRECISE_LOCATION
coordinates.longitude Double The device's longitude, in degrees. It must be in the range [-180.0, +180.0]. DEVICE_PRECISE_LOCATION
formatted_address String The device's display address; for example "1600 Amphitheatre Pkwy, Mountain View, CA 94043". DEVICE_PRECISE_LOCATION
city String The city in which the device is located. DEVICE_PRECISE_LOCATION or DEVICE_COARSE_LOCATION
zip_code String The ZIP code in which the device is located. DEVICE_PRECISE_LOCATION or DEVICE_COARSE_LOCATION

Conversation

The conversation object holds session data about the ongoing conversation.

Field Type Description
conversation_id String Unique ID for the multi-step conversation. It's assigned on the first step and remains the same for subsequent user queries until the conversation is terminated.
type Enum[
  'TYPE_UNSPECIFIED',
  'NEW',
  'ACTIVE',
  'EXPIRED',
  'ARCHIVED'
]
Indicates the current stage of the dialog's life cycle, such as whether it's a new dialog, or an active dialog.
conversation_token String Opaque token specified by the action endpoint in a previous response; mainly used by the app to maintain the current conversation state.

Inputs

The inputs object contains useful data about the request.

Field Type Description
intent String Indicates the user's intent; will be one of the possible_intents specified in the developer request.
raw_inputs Array[RawInputs] Raw input transcription from each turn of conversation in the dialog that resulted from the previous expected input.
arguments Array[Arguments] Semantically annotated values extracted from the user's inputs.

RawInputs

Field Type Description
create_time.seconds Integer Represents seconds of UTC time since Unix epoch 1970-01-01T00:00:00Z. Must be from 0001-01-01T00:00:00Z to 9999-12-31T23:59:59Z inclusive.
create_time.nanos Integer Non-negative fractions of a second at nanosecond resolution. Negative second values with fractions must still have non-negative nanos values that count forward in time. Must be from 0 to 999,999,999 inclusive.
input_type Enum[
  'UNSPECIFIC_INPUT_TYPE',
  'TOUCH',
  'VOICE',
  'KEYBOARD'
]
Indicates the kind of input: touch, voice, keyboard, or unspecified.
query String Keyboard input or spoken input from end user.

Arguments

Field Type Description
name String Name of the payload in the query.
raw_text String Raw text value for the argument.
text_value String Specified when the query pattern includes a $SchemaOrg_TEXT type or the expected input uses a built-in intent: "assistant.intent.action.TEXT" or "assistant.intent.action.SELECT". Note that for the SELECT intent, text_value is set to the option key; raw_text above indicates the raw span in the user's query.

Surface

Field Type Description
capabilities Array[Capabilities] A list of capabilities the surface supports at the time of the request; for example, "actions.capability.AUDIO_OUTPUT".

Capabilities

The capabilities object is an array of objects, each containing a capability name.

Field Type Description
name String The name of a capability the surface supports. It can be one of the following two values, and both can be present together:
  • actions.capability.AUDIO_OUTPUT
  • actions.capability.SCREEN_OUTPUT
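A fulfillment can branch on these values before choosing a response type. The following is a sketch; `hasCapability` is a hypothetical helper name, not part of the API.

```javascript
// Sketch: check whether the surface supports a given capability.
// hasCapability is a hypothetical helper, not part of the API.
function hasCapability(surface, name) {
  return (surface.capabilities || []).some(function (c) {
    return c.name === name;
  });
}

const surface = {
  capabilities: [
    { name: "actions.capability.AUDIO_OUTPUT" },
    { name: "actions.capability.SCREEN_OUTPUT" }
  ]
};

// Use a visual response only when the surface can render it:
const canShowCards = hasCapability(surface, "actions.capability.SCREEN_OUTPUT");
console.log(canShowCards); // true
```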

HTTP response

This section describes the expected HTTP response from your fulfillment. The HTTP response from the fulfillment endpoint must be in JSON and conform to the schema described below.

In addition to conforming to the schema, you must also set the following header in your response:

  • Header name: "Google-Assistant-API-Version"
  • Header value: "v1"

For example, using Express:

response.append("Google-Assistant-API-Version", "v1");

The following example shows the structure of a response that is sent from the action endpoint to the Assistant Platform:

Syntax

{
  "conversation_token": "token",
  "expect_user_response": [true|false],

  // When expect_user_response = true:
  "expected_inputs": [
    {
      "input_prompt": {
        // 1 initial prompt
        "initial_prompts": [
          {
            "text_to_speech": "...", // OR
            "ssml": "..."
          }
        ],
        // Up to 3 no-input prompts
        "no_input_prompts": [
          {
            "text_to_speech": "...", // OR
            "ssml": "..."
          },
          {
            "text_to_speech": "...", // OR
            "ssml": "..."
          },
          {
            "text_to_speech": "...", // OR
            "ssml": "..."
          }
        ]
      },
      "possible_intents": [
        {
          "intent": "intent_type"
        }
      ]
    }
  ]

  // OR, when expect_user_response = false:
  "final_response": {
    "simple_response": {
      "text_to_speech": "..."
    }
  }
}

Example 1

// Defines prompts when expect_user_response = true
{
  "conversation_token": "42",
  "expect_user_response": true,

  "expected_inputs": [
    {
      "input_prompt": {
        "initial_prompts": [
          {
            "text_to_speech": "What is your next guess?"
          }
        ],
        "no_input_prompts": [
          {
            "text_to_speech": "I didn't hear a number."
          },
          {
            "text_to_speech": "If you're still there, what's your guess?"
          },
          {
            "text_to_speech": "We can stop here. Let's play again soon."
          }
        ]
      },
      "possible_intents": [
        {
          "intent": "assistant.intent.action.TEXT"
        }
      ]
    }
  ]
}

Example 2

// Defines final response when expect_user_response = false
{
  "conversation_token": "42",
  "expect_user_response": false,

  "final_response": {
    "simple_response": {
      "text_to_speech": "Thanks for playing!"
    }
  }
}
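A fulfillment typically wraps these two shapes in small helpers. The following sketch builds both; `buildAsk` and `buildTell` are hypothetical names, and only the JSON fields come from the format above.

```javascript
// Sketch: build the two response shapes from the examples above.
// buildAsk and buildTell are hypothetical helper names.
function buildAsk(conversationToken, prompt, noInputPrompts) {
  return {
    conversation_token: conversationToken,
    expect_user_response: true,
    expected_inputs: [{
      input_prompt: {
        initial_prompts: [{ text_to_speech: prompt }],
        // At most three no-input prompts are allowed.
        no_input_prompts: (noInputPrompts || []).slice(0, 3).map(function (t) {
          return { text_to_speech: t };
        })
      },
      possible_intents: [{ intent: "assistant.intent.action.TEXT" }]
    }]
  };
}

function buildTell(conversationToken, text) {
  return {
    conversation_token: conversationToken,
    expect_user_response: false,
    final_response: { simple_response: { text_to_speech: text } }
  };
}

const ask = buildAsk("42", "What is your next guess?", ["I didn't hear a number."]);
const tell = buildTell("42", "Thanks for playing!");
```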

The following sections describe the objects in the JSON body of the response.

Response root objects

The following table describes the root-level objects of the fulfillment's response:

Field Type Description
conversation_token String An opaque token for any session state that your action wants the Assistant to send back on the next request.
expect_user_response Boolean Indicates whether the fulfillment is expecting a response from the user. This is true when the dialog is ongoing; false when the dialog is done.
expected_inputs Array[ExpectedInputs] Lists inputs that the action requires in the next response.
final_response SimpleResponse The response to return when expect_user_response is false and there are no expected_inputs.

ExpectedInputs

The expected_inputs object enables the action to request input from the user by matching one of a set of specified possible_intents.

Field Type Description
input_prompt InputPrompt The customized prompt that asks the user for input.
possible_intents Array[ExpectedIntent] A list of intents that can be used to fulfill the input.

ExpectedIntent

Field Type Description
intent String The ID of the assistant-provided intent.
input_value_spec.permission_value_spec {
  "opt_context": "String",
  "permissions": [
    "NAME",
    "DEVICE_PRECISE_LOCATION",
    "DEVICE_COARSE_LOCATION"
  ]
}
Specified in order to request the user's permission to access profile and device information. The opt_context string provides TTS explaining why the fulfillment needs to request permission.
input_value_spec.option_value_spec {
  "name": "String",
  "list_select": ListSelect
  OR
  "carousel_select": CarouselSelect
}
When a list or carousel item is selected, an actions.intent.OPTION intent is triggered, and the selected item's key information is carried inside the input_value_spec.option_value_spec element.
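As an illustration, a permission request could be assembled like this. This is a sketch: the intent name assistant.intent.action.PERMISSION is an assumption not stated in this section, and `buildPermissionIntent` is a hypothetical helper; only the input_value_spec.permission_value_spec shape comes from the table above.

```javascript
// Sketch: an expected intent that requests permissions, following the
// input_value_spec.permission_value_spec shape above. The intent name is
// an assumption (not stated in this section); the helper is hypothetical.
function buildPermissionIntent(optContext, permissions) {
  return {
    intent: "assistant.intent.action.PERMISSION",
    input_value_spec: {
      permission_value_spec: {
        opt_context: optContext, // TTS explaining why permission is needed
        permissions: permissions // e.g. ["NAME", "DEVICE_COARSE_LOCATION"]
      }
    }
  };
}

const permIntent = buildPermissionIntent(
  "To find places near you", ["NAME", "DEVICE_COARSE_LOCATION"]);
```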

ListSelect and CarouselSelect

Field Type Description
title String Overall title of the list. Optional. Carousels do not have titles.
items Array[Items] An array of Items. Lists can have at most 30 items. A carousel can have at most 10.

Item

Field Type Description
title String Title of the item. Titles must be unique within an array of Items. Required.
description String Main text describing the item. Optional.
image Image object An image for the item. The object has a single key-value pair indicating the URL. Here's an example:
{"url":"https://www.appx.com/logo.gif"}
option_info OptionInfo object {
  "key": "String", // required
  "synonyms": Array[String] // optional; synonym strings for this carousel tile or list item
}

InputPrompt

Field Type Description
initial_prompts Array[SimpleResponse] A single prompt that asks the user to provide an input.
rich_initial_prompt Array[RichResponse] A rich response that can include audio, text, cards, and suggestions.
no_input_prompts Array[SimpleResponse] Up to three prompts that are used to re-ask the user when there is no input from user. For example, "To get started, how many are playing?"
no_match_prompts Array[SimpleResponse] Prompts used to re-ask user when user's response does not match expected input.
mic_waiting_time_seconds Integer Number of seconds the microphone stays open waiting for the user's response. In some cases the mic needs to wait a few seconds because the user must take some action first; for example, in a game, the user may need to think about the problem before responding.

SimpleResponse

The initial and no-input prompts are defined by either a text_to_speech object or an ssml object; you can specify one or the other, but not both. The final_response object wraps these in a simple_response object.

Field Type Description
text_to_speech String Plain text of the speech output; for example, "where do you want to go?"
ssml String Structured spoken response to the user. The string can include SSML markup. For more information, see SSML.
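A sketch of a validity check for the mutual-exclusion rule; `isValidSimpleResponse` is a hypothetical helper, not part of the API.

```javascript
// Sketch: enforce the "either text_to_speech or ssml, but not both" rule.
// isValidSimpleResponse is a hypothetical helper, not part of the API.
function isValidSimpleResponse(sr) {
  const hasTts = typeof sr.text_to_speech === "string";
  const hasSsml = typeof sr.ssml === "string";
  return (hasTts || hasSsml) && !(hasTts && hasSsml);
}

console.log(isValidSimpleResponse({ text_to_speech: "hi" }));                          // true
console.log(isValidSimpleResponse({ text_to_speech: "hi", ssml: "<speak>hi</speak>" })); // false
```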

RichResponse

A rich response that can include audio, text, cards, and suggestions. A rich response has the following fields:

Field Type Description
items Array[Items]

A list of RichResponse items to be shown in the response.
Rules:

  1. The first item must be a SimpleResponse.
  2. At most two SimpleResponses are allowed.
  3. At most one BasicCard.
  4. Cards may not be used if a visual select (ListSelect or CarouselSelect) is used.

The details of the data elements inside a list or a carousel are part of the Options documentation.

suggestions Array[Suggestions] A list of suggested replies. These will always appear at the end of the response. If used in a FinalResponse, they will be ignored. The structure of a suggestion is as follows:
Field Type Description
title String The text shown in the suggestion chip. When tapped, this text is posted back to the conversation verbatim, as if the user had typed it. Each title must be unique among the set of suggestion chips. Maximum 25 characters.
link_out_suggestion Object Creates a suggestion chip that allows the user to jump out to the App or Website associated with this app. It has the following structure:
Field Type Description
destination_name String The target name displayed on the chip. Always shown as Open {destination_name}.
url String The URL of the App or Site to open when the user taps the suggestion chip.

Items

An item can be only one of the following:

BasicCard

A basic card for displaying information such as an image and/or text.

Field Type Description
title String Overall title of the card.
subtitle String To extend the information presented in the title.
formatted_text String Body text of the card. Supports a limited set of markdown syntax for formatting. Required unless an image is present.
image Image object An image for the card. The height is fixed to 192dp. The object has a single key-value pair indicating the URL. Here's an example:
{"url":"https://www.appx.com/logo.gif"}. Note that animated GIFs are supported as images for basic cards.
buttons Array[Buttons] A single button object can be optionally included. It has the following structure:
Field Type Description
title String The title of the button.
open_url_action Object Action to take when a user taps on the button. The object has a single key-value pair indicating the http or https scheme URL. Here's an example:
{"url":"https://en.wikipedia.org/wiki/George_Bernard_Shaw"}
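To tie the fields together, a card could be assembled with a helper like the following sketch. `buildBasicCard` is a hypothetical name and the content is illustrative; only the field names come from the table above.

```javascript
// Sketch: assemble a BasicCard from the fields above. buildBasicCard is a
// hypothetical helper; the sample content and URL are illustrative.
function buildBasicCard(opts) {
  const card = {
    title: opts.title,
    subtitle: opts.subtitle,
    formatted_text: opts.formattedText // required unless an image is present
  };
  if (opts.imageUrl) {
    card.image = { url: opts.imageUrl };
  }
  if (opts.buttonTitle && opts.buttonUrl) {
    card.buttons = [{
      title: opts.buttonTitle,
      open_url_action: { url: opts.buttonUrl }
    }];
  }
  return card;
}

const card = buildBasicCard({
  title: "George Bernard Shaw",
  subtitle: "Playwright",
  formattedText: "Irish playwright and critic.",
  buttonTitle: "Read more",
  buttonUrl: "https://en.wikipedia.org/wiki/George_Bernard_Shaw"
});
```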