Web Receiver Player Streaming Protocols

The Web Receiver SDK supports three types of streaming protocols today:

DASH, HTTP Live Streaming, and Smooth Streaming.

In this document we list our support for each of the streaming protocols. Note the explanation of supported tags for each protocol is quite abbreviated compared to the detailed protocol spec. The goal is to provide a quick glimpse and understanding of how to use each protocol, and which features of the protocol are supported on Cast enabled devices to deliver their streaming experiences.

Dynamic Adaptive Streaming over HTTP (DASH)

ISO's detailed specification of DASH.

DASH is an adaptive bitrate streaming protocol that enables high quality video streaming through HTTP(S) servers. A manifest, composed in XML, contains most of the metadata needed to initialize and download the video content. The key concepts that the Web Receiver Player supports are <Period>, <AdaptationSet>, <Representation>, <SegmentTemplate>, <SegmentList>, <BaseURL>, and <ContentProtection>.

A DASH manifest starts with a root <MPD> tag that includes one or more <Period> tags, each of which represents one piece of streaming content. <Period> tags allow ordering of different pieces of streaming content and are often used to separate main content from advertisements or to sequence multiple consecutive videos.

An <AdaptationSet> under <Period> is a set of representations for one type of media stream, in most cases video, audio, or captions. The most commonly supported MIME types are "video/mp4", "audio/mp4", and "text/vtt". An optional <ContentComponent contentType="$TYPE$"> can be included under <AdaptationSet>.

Inside each <AdaptationSet>, a list of <Representation> tags should be present. The Web Receiver Player uses the codecs information to initialize the MSE source buffer and the bandwidth information to automatically choose the right representation/bitrate to play.
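The bandwidth-based choice described above can be sketched as follows. This is only an illustration of the general idea, not the Web Receiver Player's actual selection logic: the chooseRepresentation helper and the throughput estimate passed to it are hypothetical.

```javascript
// Hypothetical helper (not part of the Cast SDK): pick the highest-bandwidth
// representation that fits the estimated throughput, or the lowest one if
// none fits.
function chooseRepresentation(representations, estimatedBitsPerSecond) {
  const sorted = [...representations].sort((a, b) => a.bandwidth - b.bandwidth);
  let chosen = sorted[0];
  for (const representation of sorted) {
    if (representation.bandwidth <= estimatedBitsPerSecond) {
      chosen = representation;
    }
  }
  return chosen;
}

// Illustrative values, as a manifest might declare them on <Representation>.
const videoRepresentations = [
  {bandwidth: 212191, width: 384, height: 208},
  {bandwidth: 366954, width: 512, height: 288},
  {bandwidth: 673914, width: 640, height: 352},
];

const picked = chooseRepresentation(videoRepresentations, 500000);
console.log(picked.width + 'x' + picked.height);
```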

For each <Representation>, media segments are described using either a <BaseURL> for single segment representation, <SegmentList> for list of segments (similar to HLS), or <SegmentTemplate>.

A <SegmentTemplate> indicates how the initialization segment and media segments can be represented through templating. In the example below, $Number$ indicates the segment number as available from the CDN, so it translates to seg1.m4s, seg2.m4s, etc. as playback continues.

<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" xmlns:ns2="http://www.w3.org/1999/xlink"
  profiles="urn:mpeg:dash:profile:isoff-live:2011,http://dashif.org/guidelines/dash264" type="static"
  publishTime="2016-10-05T22:07:14.859Z" mediaPresentationDuration="P1DT0H0M0.000S" minBufferTime="P0DT0H0M7.500S">
  <Period id="P0">
    <AdaptationSet lang="en" segmentAlignment="true">
      <ContentComponent id="1" contentType="audio"/>
      <SegmentTemplate media="seg$Number$.m4s" initialization="seginit.mp4"
        duration="10000" startNumber="1" timescale="1000" presentationTimeOffset="0"/>
      <Representation id="1" bandwidth="150123" audioSamplingRate="44100"
        mimeType="audio/mp4" codecs="mp4a.40.2" startWithSAP="1">
        <AudioChannelConfiguration schemeIdUri="urn:mpeg:dash:23003:3:audio_channel_configuration:2011" value="2"/>
      </Representation>
    </AdaptationSet>
    <AdaptationSet segmentAlignment="true">
      <ContentComponent id="1" contentType="video"/>
      <SegmentTemplate media="seg$Number$.m4s" initialization="seginit.mp4"
        duration="10000" startNumber="1" timescale="1000" presentationTimeOffset="0"/>
      <Representation id="1" bandwidth="212191" width="384" height="208" sar="26:27"
        frameRate="25" mimeType="video/mp4" codecs="avc1.42c01f" startWithSAP="1">
      </Representation>
      <Representation id="1" bandwidth="366954" width="512" height="288" sar="1:1"
        frameRate="25" mimeType="video/mp4" codecs="avc1.42c01f" startWithSAP="1">
      </Representation>
      <Representation id="1" bandwidth="673914" width="640" height="352" sar="44:45"
        frameRate="25" mimeType="video/mp4" codecs="avc1.42c01f" startWithSAP="1">
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>
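To make the $Number$ substitution concrete, here is a minimal sketch that maps a playback time to a segment URL using the duration, timescale, and startNumber attributes from the example. The segmentUrlForTime helper is hypothetical, and it assumes presentationTimeOffset is 0, as in the example.

```javascript
// Hypothetical helper: expands a $Number$ media template for the segment that
// covers `timeSeconds`. Assumes presentationTimeOffset is 0.
function segmentUrlForTime(template, timeSeconds) {
  const segmentSeconds = template.duration / template.timescale;
  const index = Math.floor(timeSeconds / segmentSeconds);
  const number = template.startNumber + index;
  return template.media.replace('$Number$', String(number));
}

// Attribute values taken from the <SegmentTemplate> in the example manifest.
const template = {
  media: 'seg$Number$.m4s',
  duration: 10000,
  timescale: 1000,
  startNumber: 1,
};

console.log(segmentUrlForTime(template, 0));   // seg1.m4s
console.log(segmentUrlForTime(template, 25));  // seg3.m4s
```

With a 10 second segment duration (10000 / 1000), playback time 25 falls in the third segment, which is why the second call yields seg3.m4s.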

For a <SegmentTemplate>, it’s common to use the <SegmentTimeline> tag to indicate how long each segment is and which segments repeat. A timescale (the number of units that represent one second) is often included as an attribute of <SegmentTemplate> so that segment times can be calculated in this unit. In the example below, each <S> tag describes a segment: the d attribute specifies the segment's duration, and the r attribute specifies how many additional consecutive segments have the same duration, so that $Time$ can be calculated for downloading each media segment as specified in the media attribute.

      <S t="0" d="96256" r="2" />
      <S d="95232" />
      <S d="96256" r="2" />
      <S d="95232" />
      <S d="96256" r="2" />
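As a rough sketch of how $Time$ values follow from the <S> entries above, the code below expands the timeline into one start time per segment, in timescale units. The expandTimeline helper is hypothetical, and it does not handle negative r values, which the DASH specification also allows.

```javascript
// Hypothetical helper: expands <S> entries into one {time, duration} record per
// segment. `r` counts additional repeats, so r=2 yields three segments.
function expandTimeline(entries) {
  const segments = [];
  let nextStart = 0;
  for (const entry of entries) {
    if (entry.t !== undefined) {
      nextStart = entry.t;  // An explicit t resets the running start time.
    }
    const repeat = entry.r || 0;
    for (let i = 0; i <= repeat; i++) {
      segments.push({time: nextStart, duration: entry.d});
      nextStart += entry.d;
    }
  }
  return segments;
}

// The five <S> entries from the example, as plain objects.
const timeline = [
  {t: 0, d: 96256, r: 2},
  {d: 95232},
  {d: 96256, r: 2},
  {d: 95232},
  {d: 96256, r: 2},
];

const segments = expandTimeline(timeline);
console.log(segments.length);                        // 11
console.log(segments.slice(0, 5).map(s => s.time));  // [0, 96256, 192512, 288768, 384000]
```

Each time value is what would replace $Time$ in the media attribute for that segment.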

For representation using <SegmentList>, here is an example:

<Representation id="FirstRep" bandwidth="2000000" width="1280">
  <SegmentList timescale="90000" duration="270000">
     <RepresentationIndex sourceURL="representation-index.sidx"/>
     <SegmentURL media="seg-1.ts"/>
     <SegmentURL media="seg-2.ts"/>
     <SegmentURL media="seg-3.ts"/>
  </SegmentList>
</Representation>

For a single segment file, a <SegmentBase> is often used with byte range requests to specify which part of a <BaseURL> file contains the index, and the rest can be fetched on demand as playback continues or a seek happens. Here the Initialization range specifies the init metadata range and the indexRange specifies the index for the media segments. Note that right now we only support consecutive byte ranges.

<Representation bandwidth="4190760" codecs="avc1.640028"
  height="1080" id="1" mimeType="video/mp4" width="1920">
  <SegmentBase indexRange="674-1149">
    <Initialization range="0-673" />
  </SegmentBase>
</Representation>
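The two ranges in this example sit back to back in the file (0-673 followed by 674-1149). The sketch below shows one possible reading of the "consecutive byte ranges" note: it checks that the index range starts right after the initialization range and builds a single HTTP Range header value covering both. The helper names are hypothetical, and this is not a description of how the Web Receiver Player issues its requests.

```javascript
// Parses a DASH byte range such as "0-673" into numeric bounds.
function parseByteRange(range) {
  const [start, end] = range.split('-').map(Number);
  return {start, end};
}

// Hypothetical helper: returns one Range header value covering the
// initialization and index data, assuming the two ranges are adjacent.
function headerRangeRequest(initializationRange, indexRange) {
  const init = parseByteRange(initializationRange);
  const index = parseByteRange(indexRange);
  if (index.start !== init.end + 1) {
    throw new Error('Initialization and index ranges are not consecutive');
  }
  return `bytes=${init.start}-${index.end}`;
}

const rangeHeader = headerRangeRequest('0-673', '674-1149');
console.log(rangeHeader);  // bytes=0-1149
```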

Regardless of which representation is used, if the streams are protected, a <ContentProtection> section can appear under <AdaptationSet>, where a schemeIdUri uniquely identifies the DRM system to use. An optional key ID can be included for common encryption.

<!-- Common Encryption -->

<!-- Widevine -->

For more examples and details please refer to the MPEG-DASH specification. Below is a list of additional DASH attributes on tags not mentioned above that we currently support:

Attribute Name | Attribute Function
mediaPresentationDuration | How long the video content is.
minimumUpdatePeriod | Attribute of the <MPD> tag; specifies how often we need to reload the manifest.
type | Attribute of the <MPD> tag; "dynamic" to indicate that this is a live stream.
presentationTimeOffset | Attribute of the <SegmentBase> tag; specifies the presentation time offset from the beginning of the period.
startNumber | Specifies the number of the first media segment in a presentation in a period. This is often used in live streams.

We also support recognizing the EMSG box inside MP4 fragments for DASH and provide an EmsgEvent to developers.

While our current Web Receiver Player supports the major DASH use cases, here is a list of common attributes that our current implementation of DASH ignores or does not use. This means regardless of whether the manifest contains them, they have no impact on the playback experience of the content.

  • availabilityStartTime
  • segmentAlignment

HTTP Live Streaming (HLS)

The overview and full spec of HTTP live streaming can be obtained here.

One of the key strengths of the Web Receiver Player is its ability to support playback of HLS in MSE. Unlike DASH, where the manifest comes in a single file, HLS uses a master playlist that lists all the variant streams with their respective URLs. Each variant playlist is a media playlist. The two major HLS tags that the Web Receiver Player currently supports in the master playlist are:

Tag Name | Functionality
#EXT-X-STREAM-INF | Specifies a bitrate/variant stream. The BANDWIDTH attribute is required and supports adaptive bitrate stream selection. The CODECS attribute is strongly recommended for initializing MSE, such as "avc1.42c01e,mp4a.40.2". If not specified, the content is assumed to be H.264 Main Profile level 3.0 video and "mp4a.40.2" audio.
#EXT-X-MEDIA | Specifies an additional media playlist (in the URI attribute) that represents the content. These are usually alternative audio streams in another format (5.1 surround sound) or language. The TYPE attribute may be VIDEO, AUDIO, SUBTITLES, or CLOSED-CAPTIONS. Setting the DEFAULT attribute to YES indicates that this alternative stream is chosen by default.
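To illustrate how the BANDWIDTH and CODECS attributes can be read from a master playlist line, here is a small parsing sketch. The function names and the sample line are hypothetical, and the Web Receiver Player does its own parsing internally. A missing CODECS attribute is reported as null so that a caller could fall back to the default described above.

```javascript
// Splits an HLS attribute list into key/value pairs, keeping commas that
// appear inside double-quoted values (such as CODECS).
function parseAttributeList(text) {
  const attributes = {};
  const pattern = /([A-Z0-9-]+)=("[^"]*"|[^,]*)/g;
  let match;
  while ((match = pattern.exec(text)) !== null) {
    attributes[match[1]] = match[2].replace(/^"|"$/g, '');
  }
  return attributes;
}

// Hypothetical helper: extracts the fields discussed above from one
// #EXT-X-STREAM-INF line.
function parseStreamInf(line) {
  const prefix = '#EXT-X-STREAM-INF:';
  if (!line.startsWith(prefix)) {
    throw new Error('Not an #EXT-X-STREAM-INF line');
  }
  const attributes = parseAttributeList(line.slice(prefix.length));
  if (!attributes.BANDWIDTH) {
    throw new Error('BANDWIDTH is required');
  }
  return {
    bandwidth: Number(attributes.BANDWIDTH),
    codecs: attributes.CODECS ? attributes.CODECS.split(',') : null,
  };
}

const variant = parseStreamInf(
    '#EXT-X-STREAM-INF:BANDWIDTH=1280000,CODECS="avc1.42c01e,mp4a.40.2"');
console.log(variant.bandwidth, variant.codecs);
```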

Here is a list of HLS tags that the Web Receiver Player currently supports in the media playlist:

Tag Name | Functionality
#EXTINF | Stream information, usually followed by the duration of the segment in seconds, and on the next line the URL of the segment.
#EXT-X-TARGETDURATION | How long in seconds each segment is. This also determines how often we download/refresh the playlist manifest for a live stream. The Web Receiver Player does not support durations shorter than 0.1 sec.
#EXT-X-MEDIA-SEQUENCE | The sequence number (often for a live stream) that the first segment in this playlist represents.
#EXT-X-KEY | DRM key information. The METHOD attribute tells us what key system to use. Today we support AES-128 and SAMPLE-AES.
#EXT-X-BYTERANGE | The byte range to fetch for a segment URL.
#EXT-X-DISCONTINUITY | Specifies a discontinuity between consecutive segments. This is often seen with server side ad insertion where an ad segment appears in the middle of the main stream.
#EXT-X-PROGRAM-DATE-TIME | Absolute time of the first sample of the next segment, for example "2016-09-21T23:23:52.066Z".
#EXT-X-ENDLIST | Indicates whether this is a VOD or live stream: the tag is present for VOD and absent for live.

For live stream, we use #EXT-X-PROGRAM-DATE-TIME and #EXT-X-MEDIA-SEQUENCE as the key factors to determine how to merge a newly refreshed manifest. If present, the #EXT-X-PROGRAM-DATE-TIME is used to match the refreshed segments. Otherwise, the #EXT-X-MEDIA-SEQUENCE number will be used. Note that per the HLS spec, we do not use file name comparison for matching.
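The matching rule above can be sketched as follows: prefer #EXT-X-PROGRAM-DATE-TIME when it is available, otherwise fall back to #EXT-X-MEDIA-SEQUENCE, and never compare file names. This is a simplified illustration rather than the player's implementation. The mergeLivePlaylist helper and the segment object shape are hypothetical, and each segment is assumed to already carry its resolved sequence number and, optionally, its program date-time.

```javascript
// Hypothetical helper: appends only the segments from a refreshed playlist
// that are newer than the last segment already known.
function mergeLivePlaylist(knownSegments, refreshedSegments) {
  if (knownSegments.length === 0) {
    return [...refreshedSegments];
  }
  const last = knownSegments[knownSegments.length - 1];
  const hasDateTime = segment => typeof segment.programDateTime === 'string';
  // Use program date-time only when every segment involved provides it.
  const useDateTime = hasDateTime(last) && refreshedSegments.every(hasDateTime);
  const isNewer = useDateTime
      ? segment => Date.parse(segment.programDateTime) > Date.parse(last.programDateTime)
      : segment => segment.mediaSequence > last.mediaSequence;
  return knownSegments.concat(refreshedSegments.filter(isNewer));
}

const known = [
  {mediaSequence: 5, uri: 'a.ts'},
  {mediaSequence: 6, uri: 'b.ts'},
];
const refreshed = [
  {mediaSequence: 6, uri: 'b.ts'},
  {mediaSequence: 7, uri: 'c.ts'},
];
const merged = mergeLivePlaylist(known, refreshed);
console.log(merged.map(segment => segment.mediaSequence));  // [5, 6, 7]
```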

Our HLS implementation supports selecting an alternative audio stream, such as 5.1 surround sound, as the main audio playback. This can be accomplished by having an #EXT-X-MEDIA tag with the alternative codecs as well as providing the segment format in the stream configuration.

The Web Receiver Player expects certain per-spec behavior. For example, after an #EXTINF tag, we expect a URI. If the next line is not a URI (for example, an #EXT-X-DISCONTINUITY tag), parsing of the playlist will fail.

Every #EXT-X-TARGETDURATION seconds, we reload the playlist/manifest to get the new segment list and replace our internal representation of the segments with the refreshed one. Any time a seek is requested, we only seek within the seekable range. For live, we only allow seeking from the beginning of the newest list until three target durations from the end. So for example, if you have a 10 segment list, and you are on segment 6, you can only seek up to 7, but not 8.
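The live seekable range rule can be worked through with a short sketch. It assumes segment durations and the target duration are given in seconds and that a segment is reachable when it starts before the end of the seekable range. Both helper names are hypothetical.

```javascript
// Hypothetical helper: the live seekable range runs from the start of the
// newest list to three target durations before its end.
function liveSeekableRange(segmentDurations, targetDuration) {
  const total = segmentDurations.reduce((sum, duration) => sum + duration, 0);
  return {start: 0, end: Math.max(0, total - 3 * targetDuration)};
}

// Returns the 1-based number of the last segment that starts inside the
// seekable range, or 0 if no segment does.
function lastSeekableSegmentNumber(segmentDurations, targetDuration) {
  const {end} = liveSeekableRange(segmentDurations, targetDuration);
  let segmentStart = 0;
  let lastNumber = 0;
  segmentDurations.forEach((duration, index) => {
    if (segmentStart < end) {
      lastNumber = index + 1;
    }
    segmentStart += duration;
  });
  return lastNumber;
}

// Ten segments of 10 seconds each with a 10 second target duration.
const durations = new Array(10).fill(10);
console.log(liveSeekableRange(durations, 10));          // { start: 0, end: 70 }
console.log(lastSeekableSegmentNumber(durations, 10));  // 7
```

With ten 10 second segments, the range ends at 70 seconds, so segment 7 (60 to 70 seconds) is reachable and segment 8 is not, which matches the example in the text.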

Segment format support

The CAF SDK supports playing content delivered in multiple formats as referenced in HlsSegmentFormat for audio and HlsVideoSegmentFormat for video. This includes support for packed audio such as AAC and AC3 playback, both encrypted and non-encrypted. It is required to specify this information in the MediaInformation of the LoadRequestData in order to properly describe your content to the player. If not specified, the default player configuration will attempt to play the content as Transport Stream packaged content. This property can be set from any of the senders in the load request data (Android, iOS and Web) or within the receiver through message interceptors.

Check out the sample code snippet below or the Loading media using contentId, contentUrl and entity guide for more information on how to prepare content on the Web Receiver.

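const context = cast.framework.CastReceiverContext.getInstance();
const playerManager = context.getPlayerManager();
// Intercept the LOAD request so the segment format is set before playback.
playerManager.setMessageInterceptor(
    cast.framework.messages.MessageType.LOAD, loadRequestData => {
      // Specify segment format for an HLS stream playing CMAF packaged content.
      loadRequestData.media.contentType = 'application/x-mpegurl';
      loadRequestData.media.hlsSegmentFormat = cast.framework.messages.HlsSegmentFormat.FMP4;
      loadRequestData.media.hlsVideoSegmentFormat = cast.framework.messages.HlsVideoSegmentFormat.FMP4;
      return loadRequestData;
    });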

Content protection

As listed in the #EXT-X-KEY tag section above, the Cast SDK supports SAMPLE-AES or SAMPLE-AES-CTR, where a URI to the key and an initialization vector can be specified:

URI="data:text/plain;base64,XXXXXX", \
IV=0x6df49213a781e338628d0e9c812d328e, \
KEYFORMAT="com.widevine", \

The KEYFORMAT we support now is Widevine, and the URI contains Base64-encoded DRM info (XXXXXX above) which, when decoded, contains the key ID:

   "content_id": "MTQ1NjkzNzM1NDgxNA==",
   "key_ids": [

Version 1 defines the following attributes:

Attribute | Example | Description
KEYFORMATVERSIONS | "1" | This proposal defines key format version 1.
KEYFORMAT | "urn:uuid:edef8ba9-79d6-4ace-a3c8-27dcd51d21ed" | The UUID is the Widevine UUID from DASH IF IOP. The same exact string is used in MPD with Widevine encrypted streams.
URI | "data:text/plain;base64, <base64 encoded PSSH box>" | URI of the stream containing the data type and PSSH box.
METHOD | SAMPLE-AES-CTR | Indicates the encryption cipher used when encrypting the content. SAMPLE-AES signals that the content is encrypted using ‘cbcs’. SAMPLE-AES-CTR signals that the content is encrypted using one of the AES-CTR protection schemes, namely ‘cenc’.

Attributes mapped to DASH MPD:

Attribute | Description
KEYFORMAT | ContentProtection element’s schemeIdUri attribute.
URI | The content of cenc:pssh element.
KEYID | 16-byte hexadecimal string encoding the key ID which has the same role as the default_kid in MPEG DASH. If using a hierarchical key scheme, this would be the "root" key.

Example HLS Playlist with V2 Signaling:


Below is a list of features and tags in HLS that we currently do not use or support. Their presence or absence does not affect the streaming behavior.

  • RESOLUTION= attribute in #EXT-X-STREAM-INF is ignored.
  • AUTOSELECT= attribute in #EXT-X-MEDIA is not used. Instead we rely on DEFAULT=.
  • #EXT-X-I-FRAME-STREAM-INF in master playlist is ignored.
  • #EXT-X-PLAYLIST-TYPE:EVENT can be present in a live stream and #EXT-X-PLAYLIST-TYPE:VOD can be present in a VOD stream, but currently our Web Receiver Player only relies on the existence of #EXT-X-ENDLIST to determine live vs. VOD.

Smooth streaming

Microsoft's official Smooth Streaming spec.

Smooth Streaming provides an adaptive streaming protocol and XML specification over HTTP (similar to DASH). Unlike DASH, Smooth Streaming recommends only MPEG-4 packaging for the media segments.

Here is a table of the most common tags and attributes in Smooth Streaming that the Web Receiver Player supports today. Many concepts are already explained in the DASH section above.

Tag/Attribute | Usage
<SmoothStreamingMedia> | Main tag for the manifest. Contains the attributes:
  • TimeScale: Number of units that represent one second, typically 10,000,000.
  • Duration: Length of the content in time scale units. The Web Receiver Player does not support durations shorter than 0.1 sec.
  • IsLive: Whether the manifest describes live media.
<StreamIndex> | One set of streams, similar to DASH’s AdaptationSet. The type is usually "text", "video", or "audio". The Url attribute usually contains a templated fragment URL using information like bitrate or start time.
<QualityLevel> | Each QualityLevel tag specifies its Bitrate and a FourCC codec. The FourCC codes are often ‘H264’, ‘AVC1’, ‘AACL’ etc. For video, it specifies its resolution through MaxWidth and MaxHeight. For audio, it specifies its frequency (such as 44100) through SamplingRate and its number of Channels.
<c> | Stream Fragment Element. Contains:
  • d: duration of a fragment.
  • t: Media Time of the fragment.
<Protection> | A tag with the optional SystemID attribute listing the ID of the system DRM to use under the <SmoothStreamingMedia> tag.
<ProtectionHeader> | Under <Protection>, can contain an attribute of SystemID and custom data, usually Base64 encoded. For Widevine, it will contain the key id, the key length, the algorithm ID, such as AESCTR, the LA_URL (license acquisition url), LUI_URL (license user interface url), and DS_ID (domain service id).

Content protection

To encode the protection system IDs properly, please use the below mapping:

  • WIDEVINE: 'EDEF8BA9-79D6-4ACE-A3C8-27DCD51D21ED',
  • CLEARKEY: '1077EFEC-C0B2-4D02-ACE3-3C1E52E2FB4B',

For <ProtectionHeader>, below is an example with Base64 encoded data. The data, when decoded, conforms to the same decoded format as described in the DASH content protection support above.

  <ProtectionHeader SystemID="9a04f079-9840-4286-ab92-e65be0885f95">

Below is an example of a live Smooth Streaming manifest with a 300 second duration of content (a Duration of 3000000000 at a TimeScale of 10000000):

<?xml version="1.0"?>
  <SmoothStreamingMedia MajorVersion="2" MinorVersion="0" Duration="3000000000"
    TimeScale="10000000" IsLive="TRUE" LookAheadFragmentCount="2" DVRWindowLength="600000000" CanSeek="TRUE" CanPause="TRUE">
    <StreamIndex Type="text" Name="textstream301_swe" Language="swe" Subtype="CAPT" Chunks="0"
      TimeScale="10000000" Url="QualityLevels({bitrate})/Fragments(textstream301_swe={start time})">
      <QualityLevel Index="0" Bitrate="20000" CodecPrivateData="" FourCC="DFXP"/>
      <c d="40000000" t="80649382288125"/>
      <c d="39980000"/>
      <c d="40020000"/>
    </StreamIndex>
    <Protection>
      <ProtectionHeader> SystemID="$BASE64ENCODEDDRMDATA$"</ProtectionHeader>
    </Protection>
    <StreamIndex Type="audio" Name="audio101_eng" Language="eng" Subtype="AACL" Chunks="0"
      TimeScale="10000000" Url="QualityLevels({bitrate})/Fragments(audio101_eng={start time})">
      <QualityLevel Index="0" Bitrate="128000" CodecPrivateData="1290" FourCC="AACL" AudioTag="255"
        Channels="2" SamplingRate="32000" BitsPerSample="16" PacketSize="4"/>
      <c d="40000000" t="80649401327500"/>
      <c d="40000000"/>
      <c d="40000000"/>
    </StreamIndex>
    <StreamIndex Type="video" Name="video" Subtype="AVC1" Chunks="0" TimeScale="10000000"
      Url="QualityLevels({bitrate})/Fragments(video={start time})">
      <QualityLevel Index="0" Bitrate="400000" CodecPrivateData="000000016742E01596540C0EFCB808140000000168CE3880"
        FourCC="AVC1" MaxWidth="384" MaxHeight="216"/>
      <QualityLevel Index="1" Bitrate="800000" CodecPrivateData="00000001674D401E965281004B6020500000000168EF3880"
        FourCC="AVC1" MaxWidth="512" MaxHeight="288"/>
      <QualityLevel Index="2" Bitrate="1600000" CodecPrivateData="00000001674D401E965281B07BCDE020500000000168EF3880"
        FourCC="AVC1" MaxWidth="854" MaxHeight="480"/>
      <QualityLevel Index="3" Bitrate="2200000" CodecPrivateData="00000001674D401F96528080093602050000000168EF3880"
        FourCC="AVC1" MaxWidth="1024" MaxHeight="576"/>
      <c d="40000000" t="80649401378125"/>
      <c d="40000000"/>
      <c d="40000000"/>
    </StreamIndex>
  </SmoothStreamingMedia>

In the above example for the video stream, the url template is:

QualityLevels({bitrate})/Fragments(video={start time})

So the first two segments (assuming we are on index 2 quality level) will be the following, with initial time extracted from t="80649401378125" under the video StreamIndex and the increment of time of 4 seconds * 10000000 per segment:
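The substitution described above can be reproduced with a short sketch that fills in the {bitrate} and {start time} placeholders from the values in the example manifest. The fragmentUrls helper is hypothetical and only illustrates the arithmetic: each fragment's start time is the previous start time plus the previous fragment's d value.

```javascript
// Hypothetical helper: expands a Smooth Streaming Url template into one URL
// per fragment, advancing the start time by each fragment's duration.
function fragmentUrls(urlTemplate, bitrate, firstStartTime, durations) {
  const urls = [];
  let startTime = firstStartTime;
  for (const duration of durations) {
    urls.push(
        urlTemplate
            .replace('{bitrate}', String(bitrate))
            .replace('{start time}', String(startTime)));
    startTime += duration;
  }
  return urls;
}

// Quality level index 2 has Bitrate="1600000"; the first <c> has
// t="80649401378125" and each fragment lasts 40000000 units (4 seconds).
const urls = fragmentUrls(
    'QualityLevels({bitrate})/Fragments(video={start time})',
    1600000, 80649401378125, [40000000, 40000000]);
console.log(urls.join('\n'));
// QualityLevels(1600000)/Fragments(video=80649401378125)
// QualityLevels(1600000)/Fragments(video=80649441378125)
```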


Here is a list of Smooth Streaming attributes that we currently ignore. They have no effect on the streaming experience regardless of whether they are provided:

  • CanSeek, CanPause in <SmoothStreamingMedia> tag.
  • Chunks, QualityLevels in <StreamIndex> tag. Instead, we calculate the number of segments and number of quality levels based on information provided inside <StreamIndex> such as the actual QualityLevel tag and the <c> tags.
  • BitsPerSample, PacketSize in <QualityLevel> are not used.

Check display type

The canDisplayType method checks for video and audio capabilities of the Web Receiver device and display by validating the media parameters passed in, returning a boolean. All parameters but the first are optional — the more parameters you include, the more precise the check will be.

Its signature is canDisplayType(mimeType, codecs, width, height, framerate)


Checks whether the Web Receiver device and display support the video/mp4 mimetype with this particular codec, dimensions, and framerate:

canDisplayType("video/mp4", "avc1.42e015,mp4a.40.5", 1920, 1080, 30)

Checks whether the Web Receiver device and display support 4K video format for this codec by specifying the width of 3840 and height of 2160:

canDisplayType("video/mp4", "hev1.1.2.L150", 3840, 2160)

Checks whether the Web Receiver device and display support HDR10 for this codec, dimensions, and framerate:

canDisplayType("video/mp4", "hev1.2.6.L150", 3840, 2160, 30)

Checks whether the Web Receiver device and display support Dolby Vision (DV) for this codec, dimensions, and framerate:

canDisplayType("video/mp4", "dvhe.04.06", 1920, 1080, 30)
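One way to use these checks is to walk a list of candidate formats from most to least demanding and keep the first one the device reports as supported. The sketch below is only an illustration: pickFirstSupported and the candidate list are hypothetical, and the check function is passed in so that the logic can be tried without a Cast device.

```javascript
// Hypothetical helper: returns the first candidate accepted by the supplied
// check function, or null if none is accepted.
function pickFirstSupported(candidates, canDisplayType) {
  for (const candidate of candidates) {
    const supported = canDisplayType(
        candidate.mimeType, candidate.codecs, candidate.width,
        candidate.height, candidate.framerate);
    if (supported) {
      return candidate;
    }
  }
  return null;
}

// Candidates ordered from most to least demanding.
const candidates = [
  {label: '4K HEVC', mimeType: 'video/mp4', codecs: 'hev1.1.2.L150',
   width: 3840, height: 2160, framerate: 30},
  {label: '1080p AVC', mimeType: 'video/mp4', codecs: 'avc1.42e015,mp4a.40.5',
   width: 1920, height: 1080, framerate: 30},
];

// Stand-in check that pretends the device tops out at 1080p. On a receiver
// you would pass a wrapper around the real method instead, for example
// (assuming it is called on the CastReceiverContext instance):
//   (...args) => context.canDisplayType(...args)
const fakeCanDisplayType = (mimeType, codecs, width, height) =>
    width <= 1920 && height <= 1080;

const chosen = pickFirstSupported(candidates, fakeCanDisplayType);
console.log(chosen.label);  // 1080p AVC
```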


Some media content requires Digital Rights Management (DRM). For media content that has its DRM license (and key URL) stored in its manifest (DASH or HLS), the Cast SDK handles this case for you. A subset of that content requires a licenseUrl, which is needed to obtain the decryption key. In the Web Receiver, you can use PlaybackConfig to set the licenseUrl as needed.

The following code snippet shows how you can set request information for license requests such as withCredentials:

const context = cast.framework.CastReceiverContext.getInstance();
const playbackConfig = new cast.framework.PlaybackConfig();
// Customize the license url for playback
playbackConfig.licenseUrl = 'http://widevine/yourLicenseServer';
playbackConfig.protectionSystem = cast.framework.ContentProtection.WIDEVINE;
playbackConfig.licenseRequestHandler = requestInfo => {
  requestInfo.withCredentials = true;
};
context.start({playbackConfig: playbackConfig});

// Update playback config licenseUrl according to provided value in load request.
context.getPlayerManager().setMediaPlaybackInfoHandler((loadRequest, playbackConfig) => {
  if (loadRequest.media.customData && loadRequest.media.customData.licenseUrl) {
    playbackConfig.licenseUrl = loadRequest.media.customData.licenseUrl;
  }
  return playbackConfig;
});

If you have a Google Assistant integration, some of the DRM information, such as the credentials necessary for the content, might be linked directly to your Google account through mechanisms such as OAuth/SSO. In those cases, if the media content is loaded through voice or comes from the cloud, a setCredentials is invoked from the cloud to the Cast device providing those credentials. Applications writing a Web Receiver app can then use the setCredentials information to operate DRM as necessary. Here is an example of using the credential to construct the media.

Tip: Also see Loading media using contentId, contentUrl and entity.

Audio channel handling

When the Cast player loads media, it sets up a single audio source buffer. At the same time, it also selects an appropriate codec to be used by the buffer, based on the MIME type of the primary track. A new buffer and codec are set up:

  • when playback starts,
  • at every ad break, and
  • every time the main content resumes.

Because the buffer uses a single codec, and because the codec is chosen based on the primary track, there are situations where secondary tracks may be filtered out and not heard. This can happen when a media program's primary track is in surround sound, but secondary audio tracks use stereo sound. Because secondary tracks are frequently used to offer content in alternate languages, providing media whose tracks differ in channel count can have a substantial impact, such as large numbers of viewers being unable to hear content in their native language.

The following scenarios illustrate why it is important to provide programming where primary and secondary tracks contain the same number of channels:

Scenario 1 - media stream lacking channel parity across primary and secondary tracks:

  • English - AC-3 5.1 channel (primary)
  • Swedish - AAC 2-channel
  • French - AAC 2-channel
  • German - AAC 2-channel

In this scenario, if the player's language is set to anything other than English, the user does not hear the track they expect to hear, because all two-channel tracks are filtered out during playback. The only track that could be played would be the primary AC-3 5.1-channel, and then only when the language is set to English.

Scenario 2 - media stream with channel parity across primary and secondary tracks:

  • English - AC-3 5.1 channel (primary)
  • Swedish - AC-3 5.1 channel
  • French - AC-3 5.1 channel
  • German - AC-3 5.1 channel

Because this stream's tracks all have the same number of channels, an audience will hear a track regardless of the selected language.
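The two scenarios can be modeled with a small sketch that keeps only the tracks whose channel count matches the primary track. This is a simplified model of the behavior described above, not the player's actual filtering code. The audibleTracks helper and the track object shape are hypothetical, with a 5.1 track counted as six channels.

```javascript
// Hypothetical helper: returns the tracks that share the primary track's
// channel count. Other tracks are treated as filtered out.
function audibleTracks(tracks) {
  const primary = tracks.find(track => track.primary);
  return tracks.filter(track => track.channels === primary.channels);
}

const scenario1 = [
  {language: 'English', codec: 'AC-3', channels: 6, primary: true},
  {language: 'Swedish', codec: 'AAC', channels: 2},
  {language: 'French', codec: 'AAC', channels: 2},
  {language: 'German', codec: 'AAC', channels: 2},
];
const scenario2 = scenario1.map(
    track => ({...track, codec: 'AC-3', channels: 6}));

console.log(audibleTracks(scenario1).map(track => track.language));
console.log(audibleTracks(scenario2).map(track => track.language));
```

In this model, only the English track survives in the first scenario, while all four languages remain available in the second.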

Shaka audio channel handling

The Shaka player (DASH) defaults to a preferred channel count of two, as a mitigation measure when encountering media that lacks parity across secondary audio tracks.

If the primary track is not surround sound (for instance, a two-channel stereo track), then the Shaka player will default to two channels, and will automatically filter out any secondary media tracks that have more than two channels.

Shaka's preferred number of audio channels can also be configured by setting the preferredAudioChannelCount in the shakaConfig property on cast.framework.PlaybackConfig.

For example:

const playbackConfig = new cast.framework.PlaybackConfig();
playbackConfig.shakaConfig = {preferredAudioChannelCount: 6};

With the preferredAudioChannelCount set to 6, Shaka Player checks to see if it can support the surround sound codecs (AC-3 or EC-3), and automatically filters out any media tracks that do not conform to the preferred number of channels.