Google Talk for Developers

Google Talk Call Signaling

Table of Contents

  1. Introduction
  2. Status of this Document
  3. Sequence of Events
  4. Messages
    1. session-initiate
    2. transport-info
    3. session-accept
    4. session-terminate
  5. Supported Media Types
    1. Audio
      1. Packetization
      2. Rate Control
      3. VAD/Comfort Noise
    2. Video
      1. Packetization
      2. Rate Control
    3. Encryption
  6. Detecting Support and Call Routing
    1. Capabilities
    2. Call Routing
  7. Examples
    1. Initiating and Terminating a Video Call
  8. Related Documentation
  9. Document History

Introduction

This document describes the XMPP signaling and other related protocols used by Google Talk (and the libjingle library on which it is based) to conduct voice and video calls. It is intended for use by third-party IM client developers who would like to achieve interoperability with Google Talk/Gmail voice and video chat. For more Google Talk developer information, please refer to the Google Talk Developer Documentation page.

The relevant XMPP extension standards, XEP-0166: Jingle and XEP-0167: Jingle RTP Sessions, have been advanced to Draft status by the XMPP Standards Foundation. Google Talk uses these standards for signaling.

Status of this Document

This document describes the current signaling protocol as used by Google Talk and Gmail video chat.

Note that when using this document, you may encounter issues with your client that are not documented here. If this happens, or if anything in the document is unclear, please let us know by sending feedback to xmpp+jingle@google.com.

Sequence of Events

The sequence of events is described in XEP-0167: Jingle RTP Sessions. The following diagram shows the basic process with Google transport events for a successful call:

INITIATOR                            RESPONDER
    |                                    |
    | session-initiate (1)               |
    |----------------------------------->|
    |                            ack (1) |
    |<-----------------------------------|
    | transport-info(candidates) (2)     |
    |----------------------------------->|
    |                            ack (2) |
    |<-----------------------------------|
    |     transport-info(candidates) (2) |
    |<-----------------------------------|
    | ack (2)                            |
    |----------------------------------->|
    | STUN Binding Request (3)           |
    |===================================>|
    |            STUN Binding Result (3) |
    |<===================================|
    |           STUN Binding Request (3) |
    |<===================================|
    | STUN Binding Result (3)            |
    |===================================>|
    |                 session-accept (4) |
    |<-----------------------------------|
    | ack (4)                            |
    |----------------------------------->|
    |<=======AUDIO/VIDEO RTP (5)========>|
    | session-terminate (6)              |
    |----------------------------------->|
    |                            ack (6) |
    |<-----------------------------------|
    |                                    |

Messages

Messages are described by example, with additional notes following each example. The current implementation of Google Talk uses the XMPP extension standards XEP-0166: Jingle and XEP-0167: Jingle RTP Sessions; please refer to the standards for more details.

session-initiate

The session-initiate message initiates a Google Talk call. It specifies the initiator and all the payload types the initiator can send and receive. The following is an example of a session-initiate message.

  <iq from="romeo@montague.lit/orchard" to="juliet@capulet.lit/balcony" type="set" id="8">
    <jingle xmlns="urn:xmpp:jingle:1" action="session-initiate" sid="2018324252" initiator="romeo@montague.lit/orchard">
      <content name="audio" creator="initiator">
        <description xmlns="urn:xmpp:jingle:apps:rtp:1" media="audio">
          <payload-type id="103" name="ISAC" clockrate="16000"/>
          <payload-type id="0" name="PCMU" clockrate="8000"/>
        </description>
        <transport xmlns="http://www.google.com/transport/p2p"/>
      </content>
      <content name="video" creator="initiator">
        <description xmlns="urn:xmpp:jingle:apps:rtp:1" media="video">
          <payload-type id="97" name="H264">
            <parameter name="width" value="320"/>
            <parameter name="height" value="240"/>
            <parameter name="framerate" value="30"/>
          </payload-type>
        </description>
        <transport xmlns="http://www.google.com/transport/p2p"/>
      </content>
    </jingle>
  </iq>

Note that the transport element is required, and its xmlns must be exactly the value shown in the example above. In a future version of Google Talk, we plan to support XEP-0176: Jingle ICE-UDP Transport Method. The current version uses an older version of ICE.
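To illustrate the message structure above, the following minimal sketch builds an audio-only session-initiate IQ with Python's standard xml.etree.ElementTree module. The function name build_session_initiate and its arguments are hypothetical. A real client would add the video content in the same way and send the serialized stanza over its XMPP connection.

```python
import xml.etree.ElementTree as ET

JINGLE_NS = "urn:xmpp:jingle:1"
RTP_NS = "urn:xmpp:jingle:apps:rtp:1"
GOOGLE_P2P_NS = "http://www.google.com/transport/p2p"

# Prefixes used only when serializing, matching the style of the session-accept example.
ET.register_namespace("jin", JINGLE_NS)
ET.register_namespace("rtp", RTP_NS)
ET.register_namespace("p", GOOGLE_P2P_NS)


def build_session_initiate(initiator, responder, sid, iq_id, audio_payloads):
    """Build an audio-only session-initiate IQ (hypothetical helper).

    audio_payloads is a list of (id, name, clockrate) tuples in preference order.
    """
    iq = ET.Element("iq", {"from": initiator, "to": responder,
                           "type": "set", "id": iq_id})
    jingle = ET.SubElement(iq, f"{{{JINGLE_NS}}}jingle", {
        "action": "session-initiate", "sid": sid, "initiator": initiator})
    content = ET.SubElement(jingle, f"{{{JINGLE_NS}}}content",
                            {"name": "audio", "creator": "initiator"})
    description = ET.SubElement(content, f"{{{RTP_NS}}}description",
                                {"media": "audio"})
    for pt_id, name, clockrate in audio_payloads:
        ET.SubElement(description, f"{{{RTP_NS}}}payload-type",
                      {"id": str(pt_id), "name": name,
                       "clockrate": str(clockrate)})
    # The Google p2p transport element is required, even though it is empty
    # here; candidates follow later in transport-info messages.
    ET.SubElement(content, f"{{{GOOGLE_P2P_NS}}}transport")
    return iq


iq = build_session_initiate("romeo@montague.lit/orchard",
                            "juliet@capulet.lit/balcony", "2018324252", "8",
                            [(103, "ISAC", 16000), (0, "PCMU", 8000)])
print(ET.tostring(iq, encoding="unicode"))
```

ElementTree declares every namespace on the root element when it serializes. The output is therefore namespace-equivalent to the example above, although it is laid out differently.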

transport-info

The transport-info message describes candidate transports for a channel. As described in the Sequence of Events section above, once a session-initiate is acknowledged, both call parties send candidates via transport-info. Each party probes the candidates it receives (using ICE) until a candidate pair succeeds and the responder sends session-accept. This process is described further in the ICE Transport section below. The following is an example of a transport-info message.

  <iq from="romeo@montague.lit/orchard" to="juliet@capulet.lit/balcony" type="set" id="9">
    <jingle xmlns="urn:xmpp:jingle:1" action="transport-info" sid="2018324252">
      <content name="audio" creator="initiator">
        <transport xmlns="http://www.google.com/transport/p2p">
          <candidate name="rtp"
                     address="127.0.0.1"
                     port="60802"
                     preference="1"
                     type="stun"
                     protocol="udp"
                     network="0"
                     username="EiuU8u5pNNMXsVJ/"
                     password="GoVuJl7O2vQTyNln"
                     generation="0"/>
        </transport>
      </content>
    </jingle>
  </iq>
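The following sketch shows how a client might turn its local candidates into a transport-info message like the one above. The Candidate dataclass and build_transport_info are hypothetical names, and the default attribute values are taken from the examples in this document rather than from a specification.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

JINGLE_NS = "urn:xmpp:jingle:1"
GOOGLE_P2P_NS = "http://www.google.com/transport/p2p"


@dataclass
class Candidate:
    """One local candidate; field names mirror the XML attributes shown above."""
    name: str                       # channel name: "rtp", "rtcp", "video_rtp", ...
    address: str
    port: int
    username: str
    password: str
    candidate_type: str = "local"   # serialized as the "type" attribute
    protocol: str = "udp"
    preference: str = "1"
    network: int = 0
    generation: int = 0


def build_transport_info(sender, receiver, sid, iq_id, content_name, candidates):
    """Wrap candidates for one content in a transport-info IQ (hypothetical helper)."""
    iq = ET.Element("iq", {"from": sender, "to": receiver,
                           "type": "set", "id": iq_id})
    jingle = ET.SubElement(iq, f"{{{JINGLE_NS}}}jingle",
                           {"action": "transport-info", "sid": sid})
    # creator names the party that created the content (the initiator in the
    # examples), not the party sending this message.
    content = ET.SubElement(jingle, f"{{{JINGLE_NS}}}content",
                            {"name": content_name, "creator": "initiator"})
    transport = ET.SubElement(content, f"{{{GOOGLE_P2P_NS}}}transport")
    for c in candidates:
        ET.SubElement(transport, f"{{{GOOGLE_P2P_NS}}}candidate", {
            "name": c.name, "address": c.address, "port": str(c.port),
            "username": c.username, "password": c.password,
            "type": c.candidate_type, "protocol": c.protocol,
            "preference": c.preference, "network": str(c.network),
            "generation": str(c.generation)})
    return iq
```

For a video call, the same helper would be called once per content ("audio" and "video"), or extended to emit both contents in a single message as the example in the Examples section does.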

session-accept

The session-accept message is sent by the call responder once the call can go ahead. It specifies the payload types the responder has chosen or accepted. The following is an example of a session-accept message.

  <iq type="set" to="romeo@montague.lit/orchard" id="AC5CDCBE50BA2CE8" from="juliet@capulet.lit/balcony">
    <jin:jingle action="session-accept" sid="2018324252" xmlns:jin="urn:xmpp:jingle:1">
      <jin:content name="audio" creator="initiator">
        <rtp:description media="audio" xmlns:rtp="urn:xmpp:jingle:apps:rtp:1">
          <rtp:payload-type id="103" name="ISAC" clockrate="16000"/>
          <rtp:payload-type id="0" name="PCMU" clockrate="8000"/>
        </rtp:description>
        <p:transport xmlns:p="http://www.google.com/transport/p2p"/>
      </jin:content>
      <jin:content name="video" creator="initiator">
        <rtp:description media="video" xmlns:rtp="urn:xmpp:jingle:apps:rtp:1">
          <rtp:payload-type id="97" name="H264">
            <rtp:parameter name="width" value="320"/>
            <rtp:parameter name="height" value="200"/>
            <rtp:parameter name="framerate" value="30"/>
          </rtp:payload-type>
        </rtp:description>
        <p:transport xmlns:p="http://www.google.com/transport/p2p"/>
      </jin:content>
    </jin:jingle>
  </iq>

session-terminate

The session-terminate message is used to terminate a call. A <reason> element may optionally be included as a child of the <jingle> element.

  <iq type="set" to="romeo@montague.lit/orchard" id="CE62887DCDC1CE0F" from="juliet@capulet.lit/balcony">
    <jingle action="session-terminate" sid="2018324252" xmlns="urn:xmpp:jingle:1">
      <reason>
        <success/>
      </reason>
    </jingle>
  </iq>

Reason codes

The reason codes are defined in section 7.4 of XEP-0166: Jingle. Please refer to the standard for details.
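Because the <reason> element is optional, a receiving client has to handle its absence. The following sketch extracts the reason condition from a session-terminate IQ. It skips the optional <text> child that XEP-0166 allows alongside the condition. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

JINGLE_NS = "urn:xmpp:jingle:1"


def termination_reason(iq):
    """Return the reason condition (e.g. "success") of a session-terminate, or None."""
    jingle = iq.find(f"{{{JINGLE_NS}}}jingle")
    if jingle is None or jingle.get("action") != "session-terminate":
        return None
    reason = jingle.find(f"{{{JINGLE_NS}}}reason")
    if reason is None:
        return None  # <reason> is optional
    for child in reason:
        local_name = child.tag.split("}", 1)[-1]  # strip the namespace
        if local_name != "text":                  # <text> is free-form, not a condition
            return local_name
    return None


stanza = ('<iq type="set" id="CE62887DCDC1CE0F">'
          '<jingle action="session-terminate" sid="2018324252" xmlns="urn:xmpp:jingle:1">'
          '<reason><success/></reason></jingle></iq>')
print(termination_reason(ET.fromstring(stanza)))
```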

ICE Transport

As mentioned above, audio and video media are transmitted over RTP streams using the ICE transport. Audio-only calls use a single "rtp" channel; RTCP is not currently used for these calls. Video calls establish RTP and RTCP channels for both audio and video media, resulting in four channels named "rtp", "rtcp", "video_rtp", and "video_rtcp".
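The channel layout described above can be captured in a small lookup. This sketch is only a restatement of the paragraph as code. The function names are hypothetical. The audio/video mapping follows the transport-info example in the Examples section, where "rtp" and "rtcp" candidates sit under the audio content.

```python
def channel_names(video_call):
    """Channel names the ICE transport sets up for a call."""
    if not video_call:
        # Audio-only calls use a single RTP channel and no RTCP.
        return ["rtp"]
    return ["rtp", "rtcp", "video_rtp", "video_rtcp"]


def content_for_channel(channel):
    """Map a channel name to the Jingle content it belongs to."""
    return "video" if channel.startswith("video_") else "audio"
```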

RTP and RTCP channels are established and maintained by performing connectivity checks against all received candidates using STUN binding requests and responses. A candidate is considered usable only if it answers a STUN binding request with an appropriate STUN binding response. In addition to the initial connectivity checks, STUN is used after the channel is established to verify that it is still alive.

During the call, if multiple usable candidates are available, the client may switch from the candidate it is currently using to another one that appears to be working better. The Google Talk client supports both TCP and UDP, but prefers UDP candidates for the best performance.

Note that the client verifies that received packets are coming from a valid candidate port on the remote machine.
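The two rules above are preferring UDP among usable candidates and accepting packets only from known remote candidates. The following sketch illustrates both. The ranking key is an assumption of this sketch: any UDP candidate outranks any TCP candidate, and the candidate's preference attribute breaks ties. The actual client may weigh additional factors such as measured quality. RemoteCandidate and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RemoteCandidate:
    address: str
    port: int
    protocol: str        # "udp" or "tcp"
    preference: float    # the candidate's preference attribute


def pick_candidate(usable):
    """Choose among candidates that answered a STUN binding request.

    Assumed ranking: UDP before TCP, then higher preference.
    Returns None if nothing is usable yet.
    """
    if not usable:
        return None
    return max(usable, key=lambda c: (c.protocol == "udp", c.preference))


def accept_packet(source, remote_candidates):
    """Accept a media packet only if its (address, port) is a known remote candidate."""
    return source in {(c.address, c.port) for c in remote_candidates}
```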

Supported Media Types

Audio

The Google Talk client and Gmail voice and video chat support many common voice codecs, including G.711, G.722, Speex, iLBC, and ISAC. When sending, the client will use whatever codec the remote side specifies first in its codec list. The following shows the list of codecs as they would appear in the session description:

  <payload-type id="103" name="ISAC" clockrate="16000">
    <parameter name="bitrate" value="32000"/>
  </payload-type>
  <payload-type id="104" name="ISAC" clockrate="32000">
    <parameter name="bitrate" value="56000"/>
  </payload-type>
  <payload-type id="119" name="ISACLC" clockrate="16000">
    <parameter name="bitrate" value="40000"/>
  </payload-type>
  <payload-type id="99" name="speex" clockrate="16000">
    <parameter name="bitrate" value="22000"/>
  </payload-type>
  <payload-type id="97" name="IPCMWB" clockrate="16000">
    <parameter name="bitrate" value="80000"/>
  </payload-type>
  <payload-type id="9" name="G722" clockrate="16000">
    <parameter name="bitrate" value="64000"/>
  </payload-type>
  <payload-type id="102" name="iLBC" clockrate="8000">
    <parameter name="bitrate" value="13300"/>
  </payload-type>
  <payload-type id="98" name="speex" clockrate="8000">
    <parameter name="bitrate" value="11000"/>
  </payload-type>
  <payload-type id="3" name="GSM" clockrate="8000">
    <parameter name="bitrate" value="13200"/>
  </payload-type>
  <payload-type id="100" name="EG711U" clockrate="8000">
    <parameter name="bitrate" value="64000"/>
  </payload-type>
  <payload-type id="101" name="EG711A" clockrate="8000">
    <parameter name="bitrate" value="64000"/>
  </payload-type>
  <payload-type id="0" name="PCMU" clockrate="8000">
    <parameter name="bitrate" value="64000"/>
  </payload-type>
  <payload-type id="8" name="PCMA" clockrate="8000">
    <parameter name="bitrate" value="64000"/>
  </payload-type>
  <payload-type id="126" name="CN" clockrate="32000"/>
  <payload-type id="105" name="CN" clockrate="16000"/>
  <payload-type id="13" name="CN" clockrate="8000"/>
  <payload-type id="117" name="red" clockrate="8000"/>
  <payload-type id="106" name="telephone-event" clockrate="8000"/>

The payload IDs listed here are shown for example purposes only; different Google clients may use different values for the dynamic payload types (those above 95). Note also that separate payload types are used for the same codec at different clockrates, so that the decoder can unambiguously determine how to decode a payload with a given id.
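The sending rule stated earlier is that the client uses whatever codec the remote side lists first. Combined with the per-clockrate payload types, that rule can be sketched as follows. Matching on both name and clockrate follows from the paragraph above. Skipping CN, telephone-event, and red entries is an assumption of this sketch, since those entries do not carry voice. The function name is hypothetical.

```python
# Payload types that signal comfort noise, DTMF, or redundancy rather than
# a voice codec; skipping them here is an assumption of this sketch.
NON_VOICE = {"cn", "telephone-event", "red"}


def choose_send_codec(remote_payloads, local_codecs):
    """Pick the codec to send: the first remote entry this client can encode.

    remote_payloads: (id, name, clockrate) tuples in the remote's order.
    local_codecs: (name, clockrate) pairs this client can encode.
    """
    supported = {(name.lower(), rate) for name, rate in local_codecs}
    for pt_id, name, clockrate in remote_payloads:
        if name.lower() in NON_VOICE:
            continue
        # Match on name *and* clockrate: the same codec at another clockrate
        # is a separate payload type.
        if (name.lower(), clockrate) in supported:
            return pt_id, name, clockrate  # id as advertised by the remote side
    return None
```

For example, if the remote lists ISAC/32000 before ISAC/16000 but this client can only encode ISAC at 16000 Hz, the sketch returns the 16000 Hz entry instead of mismatching the clockrate.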

Packetization

Where standards exist (iLBC, Speex, G.722 and PCMU/PCMA), RTP packetization is as specified in the respective RFCs. When sending telephone-event (DTMF) signals, RFC 2833 will be used.
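For reference, an RFC 2833 telephone-event payload is four bytes. It consists of an 8-bit event code, an end (E) bit, a reserved (R) bit, a 6-bit volume, and a 16-bit duration in RTP timestamp units. Digits 0-9 map to events 0-9, "*" to 10, "#" to 11, and A-D to 12-15. The following sketch packs one such payload with the standard struct module. The helper name and the default volume are assumptions. The surrounding RTP header (marker bit, timestamp) and the RFC's retransmission rules are left out.

```python
import struct

# RFC 2833 event codes for DTMF keys.
DTMF_EVENTS = {str(d): d for d in range(10)}
DTMF_EVENTS.update({"*": 10, "#": 11, "A": 12, "B": 13, "C": 14, "D": 15})


def dtmf_payload(key, duration, volume=10, end=False):
    """Pack one RFC 2833 telephone-event payload (4 bytes).

    duration is in RTP timestamp units (the 8000 Hz clock of the
    telephone-event payload type listed above); volume is 0-63,
    the tone power in dBm0 with the sign dropped.
    """
    event = DTMF_EVENTS[key]
    if not 0 <= volume <= 63:
        raise ValueError("volume must be 0-63")
    if not 0 <= duration <= 0xFFFF:
        raise ValueError("duration must fit in 16 bits")
    flags = (0x80 if end else 0) | volume  # E bit, R bit left at 0, 6-bit volume
    return struct.pack("!BBH", event, flags, duration)


print(dtmf_payload("5", 800, end=True).hex())  # 100 ms of digit 5 at 8 kHz, final packet
```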

Rate Control

Most of the supported audio codecs use a fixed bitrate, which is indicated during session negotiation. ISAC uses a dynamic bitrate of 18 to 48 kbps; this bitrate is negotiated in-band by a proprietary mechanism.

VAD/Comfort Noise

By default, Google Talk does not request the use of VAD. However, it supports VAD if the other side advertises it in its initiate or accept message. The other side does this by specifying a payload-type with name="CN", similar to how this is indicated in RFC 3389.
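The check described above can be sketched as a scan of the remote payload list for a CN entry. This sketch assumes the CN entry to use is the one whose clockrate matches the voice codec in use. The codec list above offers CN at 32000, 16000, and 8000 Hz, which suggests but does not state this pairing. The function name is hypothetical.

```python
def cn_payload_for(remote_payloads, clockrate):
    """Return the remote's CN payload id at this clockrate, or None.

    None means the remote did not advertise comfort noise at this rate,
    so the client keeps VAD off.
    """
    for pt_id, name, rate in remote_payloads:
        if name.upper() == "CN" and rate == clockrate:
            return pt_id
    return None
```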

Video

Video chat supports the H.264/SVC, H.264/AVC, and H.263 codecs, using the Scalable Baseline (SVC) and Baseline (AVC) profiles as described in the H.264 specification. Other codecs may be supported in future versions.

The Google Talk client will always prefer to receive H.264. H.263 is only offered for compatibility purposes. However, when sending, the client will use whatever codec the remote side specifies first in their XMPP session-initiate or session-accept message. The version of H.263 supported is H.263-1996. The following shows the list of codecs as they would appear in a session description:

  <payload-type id="99" name="H264-SVC">
    <parameter name="width" value="640"/>
    <parameter name="height" value="400"/>
    <parameter name="framerate" value="30"/>
  </payload-type>
  <payload-type id="96" name="H264-SVC-draft-02">
    <parameter name="width" value="640"/>
    <parameter name="height" value="400"/>
    <parameter name="framerate" value="30"/>
  </payload-type>
  <payload-type id="97" name="H264">
    <parameter name="width" value="640"/>
    <parameter name="height" value="400"/>
    <parameter name="framerate" value="30"/>
  </payload-type>
  <payload-type id="98" name="H263">
    <parameter name="width" value="640"/>
    <parameter name="height" value="400"/>
    <parameter name="framerate" value="30"/>
  </payload-type>

Note that the width, height, and framerate parameters are 'hints' that tell the sender what the receiver prefers to decode; the receiving client should choose these values based on its own capabilities. When interpreting these values, the sender may send the specified resolution, a lower resolution, or a similar resolution with a different aspect ratio. For example, to a client that prefers 320x200, the sender may choose to send 320x200, 160x100, or 320x240, depending on what is best for the sender. The sender should not send a resolution or framerate that markedly exceeds the one specified by the receiver. For example, to a client that prefers 320x200, the sender should not send 640x400, as the receiver may not be capable of decoding it.
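The text does not define "markedly exceeds". The following sketch therefore uses a hypothetical 25% tolerance on pixel count and forbids any frame-rate increase. These thresholds are assumptions chosen so that the examples above come out as described: 320x240 is acceptable against a 320x200 hint, and 640x400 is not.

```python
def within_hint(send_w, send_h, send_fps, hint_w, hint_h, hint_fps,
                tolerance=1.25):
    """Check a candidate send format against the receiver's hint.

    tolerance is a hypothetical allowance on pixel count; the frame rate
    is not allowed to exceed the hint at all in this sketch.
    """
    return (send_w * send_h <= hint_w * hint_h * tolerance
            and send_fps <= hint_fps)
```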

Note also that the Google Talk client may change the resolution it sends during the call. If there is not enough bandwidth to maintain sufficient quality (as measured by the quantization parameter, QP), the client may reduce resolution from 640x400 -> 480x300 -> 320x200 -> 240x150 -> 160x100, as needed.
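The step-down ladder above can be sketched as a list walk. The trigger is the hypothetical part. In H.264 the QP ranges from 0 to 51, and a higher QP means coarser quantization. This sketch therefore steps down when the average QP exceeds an assumed limit of 40, which is not a value taken from the client.

```python
RESOLUTION_LADDER = [(640, 400), (480, 300), (320, 200), (240, 150), (160, 100)]


def next_lower_resolution(current):
    """Return the next step down the ladder, staying put at the bottom.

    Raises ValueError if current is not on the ladder.
    """
    i = RESOLUTION_LADDER.index(current)
    return RESOLUTION_LADDER[min(i + 1, len(RESOLUTION_LADDER) - 1)]


def adapt_resolution(current, average_qp, qp_limit=40):
    """Step down one rung when quality drops (qp_limit is a hypothetical threshold)."""
    if average_qp > qp_limit:
        return next_lower_resolution(current)
    return current
```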

Packetization

The RTP packetization for H.264/SVC has not been finalized; we intend to comply with the standard when it is finalized. RTP packetization for H.264/AVC is as specified in RFC 6184. When sending, AVC packetization mode 0 (single NAL) is used, but when receiving, all packetization modes from RFC 6184 are supported.
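Because the client sends only single NAL unit packets but must accept every RFC 6184 mode, a receiver first has to classify each payload by the NAL unit type in the low five bits of its first byte. Under RFC 6184, types 1-23 are single NAL units. Types 24-27 are aggregation packets (STAP-A, STAP-B, MTAP16, MTAP24), and types 28-29 are fragmentation units (FU-A, FU-B). The following sketch shows only that classification step; the function name is hypothetical, and depacketization itself is not shown.

```python
def h264_packet_kind(payload):
    """Classify an RTP H.264 payload by the NAL unit type in its first byte."""
    if not payload:
        raise ValueError("empty payload")
    nal_type = payload[0] & 0x1F
    if 1 <= nal_type <= 23:
        return "single NAL unit"     # the only kind sent in packetization mode 0
    if nal_type in (24, 25, 26, 27):
        return "aggregation packet"  # STAP-A, STAP-B, MTAP16, MTAP24
    if nal_type in (28, 29):
        return "fragmentation unit"  # FU-A, FU-B
    return "reserved/undefined"      # types 0, 30, 31
```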

For H.263, the RTP packetization is as specified in RFC 2429.

Rate Control

H.264 is capable of scaling its bitrate dynamically. We use a mechanism based on TCP Friendly Rate Control (TFRC) to determine the available bandwidth and adjust the rate accordingly. This mechanism has two components. The first is a TFRC value written into an RTP extension header in each outgoing RTP packet. The second is an RTCP feedback message indicating the available bandwidth, which the client receiving that feedback uses to adjust its send bitrate.

The format for the RTP extension header is based on this draft: http://tools.ietf.org/html/draft-ietf-avt-tfrc-profile-10.

The format for the RTCP feedback message will be added to a future version of this document.
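The text does not give the formula Google Talk applies. The following sketch therefore shows only the standard TFRC throughput equation from RFC 5348 (section 3.1), on which TFRC-based rate control is built. The function name is hypothetical. The use of b = 1 and t_RTO = 4R follows the simplifications suggested in RFC 5348. Neither the RTP extension header nor the RTCP feedback format is modeled.

```python
import math


def tfrc_rate(s, rtt, p, b=1.0):
    """TCP throughput equation (RFC 5348, section 3.1), in bytes per second.

    s: segment size in bytes; rtt: round-trip time in seconds;
    p: loss event rate (0 < p <= 1); b: packets acknowledged per ACK.
    """
    if p <= 0:
        raise ValueError("loss event rate must be positive")
    t_rto = 4 * rtt  # simplification suggested by RFC 5348
    denominator = (rtt * math.sqrt(2 * b * p / 3)
                   + t_rto * 3 * math.sqrt(3 * b * p / 8) * p * (1 + 32 * p * p))
    return s / denominator


# 1000-byte packets, 100 ms RTT, 1% loss events: roughly 112 kB/s.
print(round(tfrc_rate(1000, 0.1, 0.01)))
```

A sender would keep its H.264 encoder bitrate at or below such an estimate. How the estimate is smoothed and reported between peers is outside what this document specifies.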

Encryption

Google Talk supports SRTP for media encryption, with the AES_CM_128_HMAC_SHA1_32 and AES_CM_128_HMAC_SHA1_80 crypto suites, as described in RFC 3711.
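To make the difference between the two crypto suites concrete, the following sketch computes SRTP packet size. AES counter mode does not expand the payload. Under RFC 3711, the only per-packet additions are therefore the optional MKI and the authentication tag. The tag is 80 bits for AES_CM_128_HMAC_SHA1_80 and 32 bits for AES_CM_128_HMAC_SHA1_32. The helper name is hypothetical, and the sketch performs no encryption.

```python
# Authentication tag length per crypto suite, in bytes.
SRTP_AUTH_TAG_BYTES = {
    "AES_CM_128_HMAC_SHA1_80": 10,  # 80-bit HMAC-SHA1 tag
    "AES_CM_128_HMAC_SHA1_32": 4,   # 32-bit HMAC-SHA1 tag
}


def srtp_packet_length(rtp_length, suite, mki_length=0):
    """Length of the SRTP packet produced from an RTP packet of rtp_length bytes."""
    return rtp_length + mki_length + SRTP_AUTH_TAG_BYTES[suite]
```

For a typical small voice packet, the 32-bit suite saves 6 bytes per packet compared with the 80-bit suite, at the cost of a weaker integrity check.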

Detecting Support and Call Routing

Capabilities

The Google Talk client uses capabilities to determine whether users support the Google Talk voice and video functionality. Three capabilities are currently defined, and they are specified in the ext attribute of an XEP-0115 capabilities element:

  • voice-v1: indicates the user is capable of sending and receiving voice media.
  • video-v1: indicates the user is capable of receiving video media.
  • camera-v1: indicates the user is capable of sending video media.

  <c xmlns="http://jabber.org/protocol/caps"
     node="http://mail.google.com/xmpp/client/caps"
     ver="1.0"
     ext="voice-v1 video-v1 camera-v1"/>
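A client reading a contact's presence can recover these capabilities by splitting the ext attribute. The following sketch does that with ElementTree. The function name is hypothetical, and it only looks at a <c> element that is a direct child of the presence stanza.

```python
import xml.etree.ElementTree as ET

CAPS_NS = "http://jabber.org/protocol/caps"


def google_av_caps(presence):
    """Return the set of ext tokens (e.g. "voice-v1") advertised in a presence stanza."""
    c = presence.find(f"{{{CAPS_NS}}}c")
    if c is None:
        return set()
    return set(c.get("ext", "").split())


presence = ET.fromstring(
    '<presence from="juliet@capulet.lit/balcony">'
    '<c xmlns="http://jabber.org/protocol/caps" '
    'node="http://mail.google.com/xmpp/client/caps" ver="1.0" '
    'ext="voice-v1 video-v1 camera-v1"/></presence>')
caps = google_av_caps(presence)
print("can receive video:", "video-v1" in caps, "can send video:", "camera-v1" in caps)
```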

Call Routing

When there are multiple possible endpoints (i.e., XMPP resources) to receive a call, the initiating Google client selects what it considers to be the "best" endpoint to receive the call. Currently, forking of calls is not supported, although it is expected to be implemented in a future version.

For voice calls, the following algorithm is used:

  • If any endpoint is active (i.e., its status is not "idle") and supports "voice-v1", use that endpoint.
  • Otherwise, select the first endpoint that supports "voice-v1".

For video calls, the following algorithm is used:

  • If any endpoint is active and supports "video-v1" and "camera-v1", use that endpoint.
  • Otherwise, if any endpoint is active and supports "video-v1", use that endpoint.
  • Otherwise, if any endpoint supports "video-v1" and "camera-v1", use that endpoint.
  • Otherwise, select the first endpoint that supports "video-v1".

Note that Gmail combines all of its endpoints (i.e., multiple Gmail logins) into a single XMPP resource, which aggregates the capabilities of the individual endpoints. When receiving a call, the Gmail server uses its own knowledge of endpoint capabilities and activity to determine the best endpoint to forward the call to.
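The client-side selection rules above can be sketched as ordered predicates that are tried in turn. In this sketch, the Endpoint type, its idle flag, and the function names are hypothetical. "First" is taken to mean the order in which the client lists the contact's resources, and ties among active endpoints are broken the same way. Gmail's server-side aggregation is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Endpoint:
    resource: str
    idle: bool                              # True if presence status is "idle"
    caps: set = field(default_factory=set)  # ext tokens, e.g. {"voice-v1"}


def route_voice(endpoints):
    """Prefer an active voice-capable endpoint, else the first voice-capable one."""
    voice = [e for e in endpoints if "voice-v1" in e.caps]
    for e in voice:
        if not e.idle:
            return e
    return voice[0] if voice else None


def route_video(endpoints):
    """Apply the four video rules in order; the first rule with a match wins."""
    rules = [
        lambda e: not e.idle and {"video-v1", "camera-v1"} <= e.caps,
        lambda e: not e.idle and "video-v1" in e.caps,
        lambda e: {"video-v1", "camera-v1"} <= e.caps,
        lambda e: "video-v1" in e.caps,
    ]
    for rule in rules:
        for e in endpoints:
            if rule(e):
                return e
    return None
```

Note that an active endpoint without a camera wins over an idle endpoint with one, because activity is checked before camera support.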

Examples

The following example shows all the IQ messages exchanged over the course of a video call. Messages from the call initiator are sent from romeo@montague.lit/orchard, while those from the responder are sent from juliet@capulet.lit/balcony. Each message is acknowledged with an IQ result that carries the id of the message it acknowledges.
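The acknowledgement pattern above amounts to echoing the id and swapping the addresses. The following sketch builds such a result from a received IQ set. The function name is hypothetical. Setting an explicit from address is optional in practice, because the server normally stamps it.

```python
import xml.etree.ElementTree as ET


def make_result(request):
    """Build the IQ result acknowledging a received IQ set."""
    if request.get("type") != "set":
        raise ValueError("only IQ set requests are acknowledged this way")
    iq_id = request.get("id")
    if iq_id is None:
        raise ValueError("IQ set without an id cannot be acknowledged")
    attrs = {"type": "result", "id": iq_id}
    # Address the result back to the sender; our own address becomes 'from'.
    if request.get("from"):
        attrs["to"] = request.get("from")
    if request.get("to"):
        attrs["from"] = request.get("to")
    return ET.Element("iq", attrs)


request = ET.fromstring('<iq from="romeo@montague.lit/orchard" '
                        'to="juliet@capulet.lit/balcony" type="set" id="8"/>')
print(ET.tostring(make_result(request), encoding="unicode"))
```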

Initiating and Terminating a Video Call

Note that in an audio-only call, the video content would be omitted.

  <iq from="romeo@montague.lit/orchard" to="juliet@capulet.lit/balcony" type="set" id="8">
    <jingle xmlns="urn:xmpp:jingle:1" action="session-initiate" sid="2018324252" initiator="romeo@montague.lit/orchard">
      <content name="audio" creator="initiator">
        <description xmlns="urn:xmpp:jingle:apps:rtp:1" media="audio">
          <payload-type id="103" name="ISAC" clockrate="16000"/>
          <payload-type id="0" name="PCMU" clockrate="8000"/>
        </description>
        <transport xmlns="http://www.google.com/transport/p2p"/>
      </content>
      <content name="video" creator="initiator">
        <description xmlns="urn:xmpp:jingle:apps:rtp:1" media="video">
          <payload-type id="97" name="H264">
            <parameter name="width" value="320"/>
            <parameter name="height" value="240"/>
            <parameter name="framerate" value="30"/>
          </payload-type>
        </description>
        <transport xmlns="http://www.google.com/transport/p2p"/>
      </content>
    </jingle>
  </iq>
  <iq to="romeo@montague.lit/orchard" from="juliet@capulet.lit/balcony" id="8" type="result">
  </iq>
  <iq from="romeo@montague.lit/orchard" to="juliet@capulet.lit/balcony" type="set" id="9">
    <jingle xmlns="urn:xmpp:jingle:1" action="transport-info" sid="2018324252">
      <content name="audio" creator="initiator">
        <transport xmlns="http://www.google.com/transport/p2p">
          <candidate name="rtp"
                     address="192.168.0.2"
                     port="52979"
                     username="KcVNlN3GilL593k/"
                     password="7N5A2keU13yo/Abk"
                     preference="1"
                     protocol="udp"
                     generation="0"
                     network="0"
                     type="local"/>
          <candidate name="rtcp"
                     address="192.168.0.2"
                     port="52980"
                     username="pxFyhd6lUm+6rZ5N"
                     password="nHIVwJyMqAeQIhDK"
                     preference="1"
                     protocol="udp"
                     generation="0"
                     network="0"
                     type="local"/>
        </transport>
      </content>
      <content name="video" creator="initiator">
        <transport xmlns="http://www.google.com/transport/p2p">
          <candidate name="video_rtp"
                     address="192.168.0.2"
                     port="52981"
                     username="ITdrwHBGOhVg6pWW"
                     password="y4FwqAgIjMSv/JnM"
                     preference="1"
                     protocol="udp"
                     generation="0"
                     network="0"
                     type="local"/>
          <candidate name="video_rtcp"
                     address="192.168.0.2"
                     port="52982"
                     username="5Q4txm1vACIArjC/"
                     password="ssKFEmRNSHIYl1QP"
                     preference="1"
                     protocol="udp"
                     generation="0"
                     network="0"
                     type="local"/>
        </transport>
      </content>
    </jingle>
  </iq>
  <iq to="romeo@montague.lit/orchard" from="juliet@capulet.lit/balcony" id="9" type="result"/>
  <iq type="set" to="romeo@montague.lit/orchard" id="EC6FAFAFA789F006" from="juliet@capulet.lit/balcony">
    <jin:jingle action="transport-info" sid="2018324252" xmlns:jin="urn:xmpp:jingle:1">
      <jin:content name="audio" creator="initiator">
        <p:transport xmlns:p="http://www.google.com/transport/p2p">
          <p:candidate name="rtp"
                       address="192.168.0.3"
                       port="4252"
                       preference="1"
                       username="LhCkgya5HMil6OHs"
                       protocol="udp"
                       generation="0"
                       password="EErusOJinbn98oeA"
                       type="local"
                       network="0"/>
          <p:candidate name="rtcp"
                       address="192.168.0.3"
                       port="4253"
                       preference="1"
                       username="+2xHg478LZxRoyXK"
                       protocol="udp"
                       generation="0"
                       password="3RUE2M3kFV3NJP/W"
                       type="local"
                       network="0"/>
        </p:transport>
      </jin:content>
      <jin:content name="video" creator="initiator">
        <p:transport xmlns:p="http://www.google.com/transport/p2p">
          <p:candidate name="video_rtp"
                       address="192.168.0.3"
                       port="4254"
                       preference="1"
                       username="j8A7m8iqXwzyPewt"
                       protocol="udp"
                       generation="0"
                       password="OQHblqhXT6gJuRle"
                       type="local"
                       network="0"/>
          <p:candidate name="video_rtcp"
                       address="192.168.0.3"
                       port="4255"
                       preference="1"
                       username="GaWnVaNbqGKmOSB1"
                       protocol="udp"
                       generation="0"
                       password="9+NYAZMvHo3iIUkH"
                       type="local"
                       network="0"/>
        </p:transport>
      </jin:content>
    </jin:jingle>
  </iq>
  <iq from="romeo@montague.lit/orchard" to="juliet@capulet.lit/balcony" id="EC6FAFAFA789F006" type="result"/>
  <iq type="set" to="romeo@montague.lit/orchard" id="AC5CDCBE50BA2CE8" from="juliet@capulet.lit/balcony">
    <jin:jingle action="session-accept" sid="2018324252" xmlns:jin="urn:xmpp:jingle:1">
      <jin:content name="audio" creator="initiator">
        <rtp:description media="audio" xmlns:rtp="urn:xmpp:jingle:apps:rtp:1">
          <rtp:payload-type id="103" name="ISAC" clockrate="16000"/>
          <rtp:payload-type id="0" name="PCMU" clockrate="8000"/>
        </rtp:description>
        <p:transport xmlns:p="http://www.google.com/transport/p2p"/>
      </jin:content>
      <jin:content name="video" creator="initiator">
        <rtp:description media="video" xmlns:rtp="urn:xmpp:jingle:apps:rtp:1">
          <rtp:payload-type id="97" name="H264">
            <rtp:parameter name="width" value="320"/>
            <rtp:parameter name="height" value="200"/>
            <rtp:parameter name="framerate" value="30"/>
          </rtp:payload-type>
        </rtp:description>
        <p:transport xmlns:p="http://www.google.com/transport/p2p"/>
      </jin:content>
    </jin:jingle>
  </iq>
  <iq from="romeo@montague.lit/orchard" to="juliet@capulet.lit/balcony" id="AC5CDCBE50BA2CE8" type="result"/>
  <iq type="set" to="romeo@montague.lit/orchard" id="CE62887DCDC1CE0F" from="juliet@capulet.lit/balcony">
    <jingle action="session-terminate" sid="2018324252" xmlns="urn:xmpp:jingle:1">
      <reason>
        <success/>
      </reason>
    </jingle>
  </iq>
  <iq to="juliet@capulet.lit/balcony" id="CE62887DCDC1CE0F" type="result"/>

Document History

Version 2.0 (Aug. 04, 2011)

  • Updated to reflect the move to Jingle.

Version 1.3 (Dec. 11, 2010)

  • Added Encryption information.
  • Added information about RTP/RTCP channels in the ICE Transport section.
  • Updated supported codecs in the Audio and Video sections.

Version 1.2

  • Documented proper handling of the <transport> element in an initiate message.
  • Specified how comfort noise is signaled.
  • Added more information about video rate control.
  • Added more information about how call routing works with Gmail.

Version 1.1

  • Added description of how width and height fields are used.

Version 1.0

  • Initial version.
