Conversation API

When a user says something that the Assistant Platform interprets as an intent to trigger your agent's actions, it sends a JSON-formatted HTTP request to your agent webhook. This reference documents the request and response formats.

The webhook is a RESTful service that adheres to the JSON request and response format described below.

HTTP request

This section describes the HTTP request format that the Assistant Platform sends to your agent webhook. The request is sent as a POST request, with a JSON body.

The following example shows an Assistant request body that is sent to the agent webhook:

{
  "user": {
    "user_id": "...",
    "profile": {
      "given_name": "John",
      "family_name": "Doe",
      "display_name": "John Doe"
    },
    "access_token": "..."
  },
  "device": {
    "location": {
      "coordinates": {
        "latitude": 123.456,
        "longitude": -123.456
      },
      "formatted_address": "1234 Random Road, Anytown, CA 12345, United States",
      "city": "Anytown",
      "zip_code": "12345"
    }
  },
  "conversation": {
    "conversation_id": "...",
    "type": "ACTIVE",
    "conversation_token": "..."
  },
  "inputs": [
    {
      "intent": "assistant.intent.action.MAIN",
      "raw_inputs": [
        {
          "query": "..."
        }
      ],
      "arguments": [
        {
          "name": "destination",
          "raw_text": "SFO",
          "location_value": {
            "latlng": {
              "latitude": 37.620565,
              "longitude": -122.384964
            },
            "formatted_address": "1000 Broadway, San Francisco, CA 95133"
          }
        }
      ]
    }
  ]
}
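
The sketch below shows one way a webhook might receive and unpack this body. It is illustrative, not part of the API surface: the Express framework, the /webhook path, and the placeholder reply are all assumptions.

// Minimal sketch of a webhook endpoint, assuming Express with JSON body parsing.
import express from "express";

const app = express();
app.use(express.json());

app.post("/webhook", (req, res) => {
  const input = req.body.inputs[0];       // currently only one expected input is supported
  const intent = input.intent;            // e.g. "assistant.intent.action.MAIN"
  const query = input.raw_inputs?.[0]?.query;

  // The required version header is described under "HTTP response" below.
  res.append("Google-Assistant-API-Version", "v1");
  res.json({
    conversation_token: "",
    expect_user_response: false,
    final_response: { speech_response: { text_to_speech: `You said: ${query}` } },
  });
});

app.listen(8080);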

The following sections describe the JSON objects in the HTTP request body.

Request root objects

The following table describes the root-level objects of the request body:

Field Type Description
user User Describes the user that initiated this conversation.
device Device Information associated with the device from which the conversation was initiated.
conversation Conversation Holds session data like the conversation ID and token.
inputs Array[Inputs] List of inputs, with one entry per input that the agent required. Each entry holds either the query semantics of the initial query or the Assistant-provided response to a developer-required input; the platform guarantees a 1:1 mapping with the required inputs. Note that currently only one expected input is supported.

User

The user object contains information about the user. The following table describes the elements of the user object:

Field Type Description Requires permission
user_id String A random string identifier for the Google user. The user_id can be used to track a user across multiple sessions and devices. -
profile UserProfile Information about the user. -
access_token String A unique OAuth2 token that identifies the user in your system. Only available if an Account Linking configuration is defined in the action package and the user links their account. -

UserProfile

Stores the user's personal information. It's accessible only after the user grants the relevant permission to the agent.

Field Type Description Requires permission
given_name String The user's first name as specified in their Google account. NAME
family_name String The user's last name as specified in their Google account. Note that this field could be empty. NAME
display_name String The user's full name as specified in their Google account. NAME

Device

The device object contains information about the device through which the conversation is taking place. The following table describes the elements of the device object:

Field Type Description Requires permission
location Location Representation of the device location. DEVICE_PRECISE_LOCATION or DEVICE_COARSE_LOCATION

Location

Field Type Description Requires permission
coordinates.latitude Double The device's latitude, in degrees. It must be in the range [-90.0, +90.0]. DEVICE_PRECISE_LOCATION
coordinates.longitude Double The device's longitude, in degrees. It must be in the range [-180.0, +180.0]. DEVICE_PRECISE_LOCATION
formatted_address String The device's display address; for example "1600 Amphitheatre Pkwy, Mountain View, CA 94043". DEVICE_PRECISE_LOCATION
city String The city in which the device is located. DEVICE_PRECISE_LOCATION or DEVICE_COARSE_LOCATION
zip_code String The ZIP code in which the device is located. DEVICE_PRECISE_LOCATION or DEVICE_COARSE_LOCATION

Conversation

The conversation object defines session data about the ongoing conversation.

Field Type Description
conversation_id String Unique ID for the multi-step conversation. It's assigned on the first step and then remains the same for subsequent user queries until the conversation is terminated.
type Enum['TYPE_UNSPECIFIED', 'NEW', 'ACTIVE', 'EXPIRED', 'ARCHIVED'] Indicates the current stage of the dialog's life cycle, such as whether it's a new dialog or an active dialog.
conversation_token String Opaque token specified by the action endpoint in a previous response; mainly used by the agent to maintain the current conversation state.
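
Because the platform echoes back whatever conversation_token the agent last returned, a common pattern is to serialize per-conversation state into the token. A minimal sketch, assuming a hypothetical GameState shape (the API itself treats the token as opaque):

// Sketch: round-tripping session state through conversation_token.
interface GameState { answer: number; guesses: number; }

function readState(conversationToken: string | undefined): GameState {
  if (!conversationToken) {
    // No token yet: this is the first step of the conversation.
    return { answer: Math.floor(Math.random() * 100), guesses: 0 };
  }
  return JSON.parse(conversationToken) as GameState;
}

function writeState(state: GameState): string {
  // Returned in the response; the platform sends it back on the next request.
  return JSON.stringify(state);
}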

Inputs

The inputs object contains data about the user input for this request. The input could be the query semantics for the initial query, or the Assistant-provided response for a developer-required input.

Field Type Description
intent String Indicates the user's intent; it will be one of the possible_intents specified in the developer request.
raw_inputs Array[RawInputs] Raw input transcription from each turn of conversation in the dialog that resulted from the previous expected input.
arguments Array[Arguments] Semantically annotated values extracted from the user's inputs.

RawInputs

Field Type Description
create_time.seconds Integer Seconds of UTC time since the Unix epoch (1970-01-01T00:00:00Z). Must be from 0001-01-01T00:00:00Z to 9999-12-31T23:59:59Z inclusive.
create_time.nanos Integer Non-negative fractions of a second at nanosecond resolution. Negative second values with fractions must still have non-negative nanos values that count forward in time. Must be from 0 to 999,999,999 inclusive.
input_type Enum['UNSPECIFIED_INPUT_TYPE', 'TOUCH', 'VOICE'] Indicates the kind of input: touch (typed) input, voice input, or unspecified.
query String Keyboard input or spoken input from the end user.
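
As a small worked example, create_time can be converted to a JavaScript Date by combining the two fields (a sketch, assuming the field shapes above):

// Sketch: converting create_time.seconds + create_time.nanos to a Date.
function toDate(createTime: { seconds: number; nanos: number }): Date {
  // Date takes milliseconds since the Unix epoch.
  return new Date(createTime.seconds * 1000 + createTime.nanos / 1e6);
}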

Arguments

Field Type Description
name String Name of the payload in the query.
raw_text String Raw text value for the argument.
int_value Integer Specified when the user input had a $SchemaOrg_Number argument.
bool_value Boolean Specified when the user input had a $SchemaOrg_YesNo argument.
text_value String Specified when the user input had a $SchemaOrg_Text argument.
date_value {"year": Integer, "month": Integer, "day": Integer} Specified when the user input had a $SchemaOrg_Date argument.
time_value {"hours": Integer, "minutes": Integer, "seconds": Integer, "nanos": Integer} Specified when the user input had a $SchemaOrg_Time argument.
location_value Location Specified when the user input had a $SchemaOrg_Place argument.
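
For a given argument, the value appears in the *_value field matching its schema.org type, so a handler typically looks up an argument by name and then checks which value field is present. A sketch using the destination argument from the request example earlier (requestBody stands in for the parsed POST body):

// Sketch: reading the "destination" argument from a parsed request body.
declare const requestBody: any;

function getArgument(body: any, name: string) {
  return body.inputs?.[0]?.arguments?.find((a: any) => a.name === name);
}

const destination = getArgument(requestBody, "destination");
if (destination?.location_value) {
  const { latitude, longitude } = destination.location_value.latlng;
  console.log(`Going to ${destination.raw_text} near ${latitude}, ${longitude}`);
}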

HTTP response

This section describes the expected HTTP response from the agent webhook. The HTTP response from the agent endpoint must be in JSON and conform to the schema described below.

In addition to conforming to the schema, you must also set the following header in your response:

  • Header name: "Google-Assistant-API-Version"
  • Header value: "v1"

For example:

response.append("Google-Assistant-API-Version", "v1");
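
If your webhook doesn't run on a framework with an append-style helper, any mechanism that sets a response header works. A minimal sketch using Node's built-in http module (illustrative only):

// Sketch: setting the required header on a plain Node.js response.
import http from "http";

http.createServer((req, res) => {
  res.setHeader("Google-Assistant-API-Version", "v1");
  res.setHeader("Content-Type", "application/json");
  res.end(JSON.stringify({ /* response body as described below */ }));
}).listen(8080);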

The following example shows the structure of a response that is sent from the action endpoint to the Assistant Platform:

Syntax

{
  "conversation_token": "token",
  "expect_user_response": [true|false],

  // When expect_user_response = true:
  "expected_inputs": [
    {
      "input_prompt": {
        // 1 initial prompt
        "initial_prompts": [
          {
            "text_to_speech": "...", // OR
            "ssml": "..."
          }
        ],
        // Up to 3 no input prompts
        "no_input_prompts": [
          {
            "text_to_speech": "..."// OR
            "ssml": "..."
          },
          {
            "text_to_speech": "..."// OR
            "ssml": "..."
          },
          {
            "text_to_speech": "..."// OR
            "ssml": "..."
          }
        ]
      },
      "possible_intents": [
        {
          "intent": "intent_type"
        }
      ]
    }
  ]

  // OR

  // When expect_user_response = false:
  "final_response": {
    "speech_response": {
      "text_to_speech": "..."
    }
  }
}

Example 1

// Defines prompts when expect_user_response = true
{
  "conversation_token": "42",
  "expect_user_response": true,

  "expected_inputs": [
    {
      "input_prompt": {
        "initial_prompts": [
          {
            "text_to_speech": "What is your next guess?"
          }
        ],
        "no_input_prompts": [
          {
            "text_to_speech": "I didn't hear a number."
          },
          {
            "text_to_speech": "If you're still there, what's your guess?"
          },
          {
            "text_to_speech": "We can stop here. Let's play again soon."
          }
        ]
      },
      "possible_intents": [
        {
          "intent": "assistant.intent.action.TEXT"
        }
      ]
    }
  ]
}

Example 2

// Defines final response when expect_user_response = false
{
  "conversation_token": "42",
  "expect_user_response": false,

  "final_response": {
    "speech_response": {
      "text_to_speech": "Thanks for playing!"
    }
  }
}
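
Since these two shapes are the only ones an agent returns, many webhooks wrap them in small helper functions. A sketch (the ask/tell names are a common convention, not part of the API):

// Sketch: helpers that build the two response shapes shown above.
function ask(conversationToken: string, prompt: string, noInputPrompts: string[]) {
  return {
    conversation_token: conversationToken,
    expect_user_response: true,
    expected_inputs: [{
      input_prompt: {
        initial_prompts: [{ text_to_speech: prompt }],
        no_input_prompts: noInputPrompts.map(p => ({ text_to_speech: p })), // up to 3
      },
      possible_intents: [{ intent: "assistant.intent.action.TEXT" }],
    }],
  };
}

function tell(conversationToken: string, message: string) {
  return {
    conversation_token: conversationToken,
    expect_user_response: false,
    final_response: { speech_response: { text_to_speech: message } },
  };
}

// Usage, mirroring the examples above:
// ask("42", "What is your next guess?", ["I didn't hear a number."])
// tell("42", "Thanks for playing!")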

The following sections describe the objects in the JSON body of the response.

Response root objects

The following table describes the root-level objects of the agent's response:

Field Type Description
conversation_token String A serialized opaque token for any session object that your action wants the Assistant Platform to circulate back.
expect_user_response Boolean Indicates whether the agent is expecting a response from the user. This is true when the dialog is ongoing and false when the dialog is done.
expected_inputs Array[ExpectedInputs] Lists the inputs that the action requires from the user next. Specified only when expect_user_response is true.
final_response SpeechResponse The final spoken response. Specified only when expect_user_response is false and there are no expected_inputs.

ExpectedInputs

The expected_inputs object enables the action to request input from the user by matching one of a set of specified possible_intents.

Field Type Description
input_prompt InputPrompt The customized prompt that asks the user for input.
possible_intents Array[ExpectedIntent] A list of intents that can be used to fulfill the input.

ExpectedIntent

Field Type Description
intent String The ID of the assistant-provided intent.
input_value_spec.permission_value_spec {"opt_context": String, "permissions": Array[Enum['NAME', 'DEVICE_PRECISE_LOCATION', 'DEVICE_COARSE_LOCATION']]} Specified in order to request the user's permission to access profile and device information. The opt_context string provides TTS explaining why the agent needs to request permission.
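
For example, a response that asks for the user's name and coarse location might look like the following. This is a sketch: the assistant.intent.action.PERMISSION intent ID is an assumption, since this reference doesn't enumerate the built-in intent IDs.

{
  "conversation_token": "...",
  "expect_user_response": true,
  "expected_inputs": [
    {
      "input_prompt": {
        "initial_prompts": [
          {
            "text_to_speech": "Just a moment."
          }
        ],
        "no_input_prompts": []
      },
      "possible_intents": [
        {
          "intent": "assistant.intent.action.PERMISSION",
          "input_value_spec": {
            "permission_value_spec": {
              "opt_context": "To find stores near you",
              "permissions": ["NAME", "DEVICE_COARSE_LOCATION"]
            }
          }
        }
      ]
    }
  ]
}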

InputPrompt

Field Type Description
initial_prompts Array[SpeechResponse] A list containing a single prompt that asks the user to provide an input.
no_input_prompts Array[SpeechResponse] Up to three prompts that are used to re-prompt the user when there is no input from the user. For example, "I'm sorry, I didn't hear you. Can you repeat that please?"

SpeechResponse

The initial and no-input prompts are defined by either a text_to_speech field or an ssml field. The final_response object wraps these in a speech_response object.

Field Type Description
text_to_speech String Plain text of the speech output; for example, "where do you want to go?" You can specify either text_to_speech or ssml, but not both.
ssml String Structured spoken response to the user. The string can include SSML markup; for more information, see SSML. You can specify either text_to_speech or ssml, but not both.
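
For example, a final response that uses SSML instead of plain text might look like this sketch, which follows the schema above (the SSML markup shown is generic):

{
  "conversation_token": "...",
  "expect_user_response": false,
  "final_response": {
    "speech_response": {
      "ssml": "<speak>Thanks for playing!<break time=\"500ms\"/>Goodbye.</speak>"
    }
  }
}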