Media Player Library Streaming Protocols

The Google Cast Media Player Library supports three streaming protocols today:

DASH, HTTP Live Streaming, and Smooth Streaming.

This document lists our support for each of these streaming protocols. Note that the explanation of supported tags for each protocol is quite abbreviated compared to the detailed protocol specification. The goal is to give a quick overview of how to use each protocol, and of which protocol features are supported on Cast-enabled devices to deliver streaming experiences.

Dynamic Adaptive Streaming over HTTP (DASH)

A detailed specification of DASH (ISO/IEC 23009-1) is available from the ISO.

DASH, as its name suggests, is an adaptive bitrate streaming protocol that enables high-quality video streaming through HTTP(S) servers. A manifest, composed in XML, contains most of the metadata describing how to initialize and download the video content. The key concepts that the Media Player Library supports are <Period>, <AdaptationSet>, <Representation>, <SegmentTemplate>, <SegmentList>, <BaseURL>, and <ContentProtection>.

A DASH manifest starts with a root <MPD> tag that contains one or more <Period>s, where each <Period> represents a single piece of streaming content. <Period>s allow ordering of different pieces of streaming content and are often used to separate the main content from advertisements, or to concatenate multiple consecutive pieces of video content.
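For example, a manifest that splices an advertisement between two parts of the main content could be structured roughly as follows (an illustrative skeleton only; the ids and durations are invented):

<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static"
  profiles="urn:mpeg:dash:profile:isoff-live:2011">
  <Period id="main-part-1" duration="PT10M">
    <!-- AdaptationSets for the first part of the main content -->
  </Period>
  <Period id="ad-break-1" duration="PT30S">
    <!-- AdaptationSets for the advertisement -->
  </Period>
  <Period id="main-part-2" duration="PT20M">
    <!-- AdaptationSets for the remainder of the main content -->
  </Period>
</MPD>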

An <AdaptationSet> under a <Period> is a set of representations for one type of media stream, in most cases video, audio, or captions. The most commonly supported MIME types are "video/mp4", "audio/mp4", and "text/vtt". An optional <ContentComponent contentType="$TYPE$"> can be included under the <AdaptationSet>.

Inside each <AdaptationSet>, a list of <Representation>s should be present. The Media Player Library uses the codecs information to initialize the MSE source buffer and the bandwidth information to automatically choose the right representation/bitrate to play.

For each <Representation>, media segments are described using either a <BaseURL> for a single-segment representation, a <SegmentList> for a list of segments (similar to HLS), or a <SegmentTemplate>.

A <SegmentTemplate> indicates how the initialization segment and the media segments can be represented through templating. In the example below, $Number$ indicates the segment number as available from the CDN, so the template resolves to seg1.m4s, seg2.m4s, and so on as playback continues.

<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" xmlns:ns2="http://www.w3.org/1999/xlink"
  profiles="urn:mpeg:dash:profile:isoff-live:2011,http://dashif.org/guidelines/dash264" type="static"
  publishTime="2016-10-05T22:07:14.859Z" mediaPresentationDuration="P1DT0H0M0.000S" minBufferTime="P0DT0H0M7.500S">
  <Period id="P0">
    <AdaptationSet lang="en" segmentAlignment="true">
      <ContentComponent id="1" contentType="audio"/>
      <SegmentTemplate media="seg$Number$.m4s" initialization="seginit.mp4"
        duration="10000" startNumber="1" timescale="1000" presentationTimeOffset="0"/>
      <Representation id="1" bandwidth="150123" audioSamplingRate="44100"
        mimeType="audio/mp4" codecs="mp4a.40.2" startWithSAP="1">
        <AudioChannelConfiguration schemeIdUri="urn:mpeg:dash:23003:3:audio_channel_configuration:2011" value="2"/>
        <BaseURL>http://www.google.com/testVideo</BaseURL>
      </Representation>
    </AdaptationSet>
    <AdaptationSet segmentAlignment="true">
      <ContentComponent id="1" contentType="video"/>
      <SegmentTemplate media="seg$Number$.m4s" initialization="seginit.mp4"
        duration="10000" startNumber="1" timescale="1000" presentationTimeOffset="0"/>
      <Representation id="1" bandwidth="212191" width="384" height="208" sar="26:27"
        frameRate="25" mimeType="video/mp4" codecs="avc1.42c01f" startWithSAP="1">
        <BaseURL>http://www.google.com/testVideo/bitrate1/</BaseURL>
      </Representation>
      <Representation id="1" bandwidth="366954" width="512" height="288" sar="1:1"
        frameRate="25" mimeType="video/mp4" codecs="avc1.42c01f" startWithSAP="1">
        <BaseURL>http://www.google.com/testVideo/bitrate2/</BaseURL>
      </Representation>
      <Representation id="1" bandwidth="673914" width="640" height="352" sar="44:45"
        frameRate="25" mimeType="video/mp4" codecs="avc1.42c01f" startWithSAP="1">
        <BaseURL>http://www.google.com/testVideo/bitrate3/</BaseURL>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>
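In the manifest above, each media segment is duration/timescale = 10000/1000 = 10 seconds long, and the templated segment names are resolved against the <BaseURL> of the chosen representation. For the first video representation, for example, the requests resolve to:

http://www.google.com/testVideo/bitrate1/seginit.mp4
http://www.google.com/testVideo/bitrate1/seg1.m4s
http://www.google.com/testVideo/bitrate1/seg2.m4s
...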

With a <SegmentTemplate>, it's common to use the <SegmentTimeline> tag to indicate how long each segment is and which segment durations repeat. A timescale (the number of units that represent one second) is often included among the attributes of <SegmentTemplate> so that the time of each segment can be calculated in those units. In the example below, each <S> tag describes a segment, its d attribute specifies the segment duration, and its r attribute specifies how many additional segments of the same duration follow (so r="2" means three consecutive segments of that duration). From these values, $Time$ can be calculated for downloading the media segment as specified in the media attribute.

<SegmentTemplate
  timescale="48000"
  initialization="$RepresentationID$-init.dash"
  media="$RepresentationID$-$Time$.dash"
  startNumber="1">
  <SegmentTimeline>
    <S t="0" d="96256" r="2" />
    <S d="95232" />
    <S d="96256" r="2" />
    <S d="95232" />
    <S d="96256" r="2" />
  </SegmentTimeline>
</SegmentTemplate>
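As a worked example, assume a representation whose id is video1 (a hypothetical value, not part of the snippet above). $Time$ is the running sum of the preceding segment durations, so the first few requests would be:

video1-init.dash        (initialization segment)
video1-0.dash           (t = 0)
video1-96256.dash
video1-192512.dash
video1-288768.dash      (3 × 96256 = 288768)
video1-384000.dash      (288768 + 95232 = 384000)
...

At a timescale of 48000 units per second, a duration of 96256 corresponds to roughly two seconds of media per segment.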

For a representation using <SegmentList>, here is an example:

<Representation id="FirstRep" bandwidth="2000000" width="1280"
  height="720">  
  <BaseURL>FirstRep/</BaseURL>
  <SegmentList timescale="90000" duration="270000">  
     <RepresentationIndex sourceURL="representation-index.sidx"/>  
     <SegmentURL media="seg-1.ts"/>  
     <SegmentURL media="seg-2.ts"/>  
     <SegmentURL media="seg-3.ts"/>  
  </SegmentList>
</Representation>

For a single-segment file, a <SegmentBase> is often used together with byte-range requests to specify which part of the <BaseURL> file contains the index; the rest can be fetched on demand as playback continues or when a seek happens. Here the <Initialization> range specifies the byte range of the initialization metadata, and indexRange specifies the byte range of the index for the media segments. Note that right now we only support consecutive byte ranges.

<Representation bandwidth="4190760" codecs="avc1.640028"
  height="1080" id="1" mimeType="video/mp4" width="1920">
  <BaseURL>video.mp4</BaseURL>
  <SegmentBase indexRange="674-1149">
    <Initialization range="0-673" />
  </SegmentBase>
</Representation>
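Given the ranges above, the library first fetches the initialization metadata and the segment index with byte-range requests along these lines (illustrative HTTP request lines only); the byte ranges of the individual media segments are then read from the downloaded index:

GET /video.mp4 HTTP/1.1
Range: bytes=0-673

GET /video.mp4 HTTP/1.1
Range: bytes=674-1149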

Regardless of which representation is used, if the streams are protected, a <ContentProtection> section can appear under <AdaptationSet>, where a schemeIdUri uniquely identifies the DRM system to use. An optional key ID can be included for common encryption.

<!-- Common Encryption -->
<ContentProtection
  schemeIdUri="urn:mpeg:dash:mp4protection:2011"
  value="cenc"
  cenc:default_KID="7D2714D0-552D-41F5-AD56-8DD9592FF891">
</ContentProtection>

<!-- PlayReady -->
<ContentProtection
  schemeIdUri="urn:uuid:9A04F079-9840-4286-AB92-E65BE0885F95"
  value="MSPR 2.0">
</ContentProtection>

<!-- Widevine -->
<ContentProtection
  schemeIdUri="urn:uuid:EDEF8BA9-79D6-4ACE-A3C8-27DCD51D21ED">
</ContentProtection>

Here's an example of embedded content protection using Microsoft PlayReady, with a license-server request:

<ContentProtection schemeIdUri="urn:uuid:9a04f079-9840-4286-ab92-e65be0885f95"
  value="2.0" default_KID="10000000-1000-1000-1000-100000000001">
  <mspr:pro>XXXXXX</mspr:pro>
</ContentProtection>

Here XXXXXX is Base64-encoded data which, when decoded, looks something like this:

<WRMHEADER xmlns="http://schemas.microsoft.com/DRM/2007/03/PlayReadyHeader" version="4.0.0.0">  
  <DATA>
     <PROTECTINFO>
       <KEYLEN>16</KEYLEN>
       <ALGID>AESCTR</ALGID>
     </PROTECTINFO>
     <KID>AAAAEAAQABAQABAAAAAAAQ==</KID>
     <CHECKSUM>5TzIYQ2hrOY=</CHECKSUM>
     <LA_URL>http://playready.directtaps.net/pr/svc/rightsmanager.asmx?PlayRight=1&UseSimpleNonPersistentLicense=1</LA_URL>  
     <CUSTOMATTRIBUTES>
       <IIS_DRM_VERSION>7.1.1565.4</IIS_DRM_VERSION>
     </CUSTOMATTRIBUTES>
  </DATA>
</WRMHEADER>

KEYLEN specifies the length of the content key, and ALGID specifies the encryption algorithm, which in this example is AES counter mode (AESCTR). LA_URL is the license acquisition URL; if it is not present, a LUI_URL (license user interface URL) should be in place.

For more examples and details, please refer to the MPEG-DASH specification. Below is a list of additional DASH attributes, on tags not mentioned above, that we currently support; a short live-manifest example follows the list.

  • mediaPresentationDuration: Attribute of the <MPD> tag; how long the video content is.
  • minimumUpdatePeriod: Attribute of the <MPD> tag; specifies how often we need to reload the manifest.
  • type: Attribute of the <MPD> tag; "dynamic" indicates that this is a live stream.
  • presentationTimeOffset: Attribute of the <SegmentBase> tag; specifies the presentation time offset from the beginning of the period.
  • startNumber: Specifies the number of the first media segment in a presentation in a period. This is often used in live streams.
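For instance, the root element of a live manifest typically looks something like this (an illustrative fragment; the attribute values are invented):

<MPD xmlns="urn:mpeg:dash:schema:mpd:2011"
  profiles="urn:mpeg:dash:profile:isoff-live:2011"
  type="dynamic" minimumUpdatePeriod="PT8S">
  <!-- Periods, AdaptationSets, and Representations as in the static example above -->
</MPD>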

We also support recognizing the EMSG box inside MP4 fragments for DASH and provide a notification to developers. For usage of this notification, please refer to processMetadata on the application Host API, where the type will be 'EMSG' and the data will contain the EMSG box.

While the current Media Player Library supports the major DASH use cases, below is a list of common attributes that our current DASH implementation ignores or doesn't use. Regardless of whether the manifest contains them, they have no impact on the playback experience of the content.

  • availabilityStartTime
  • language
  • maxSegmentDuration
  • minBufferTime
  • segmentAlignment
  • timeShiftBufferDepth

Additionally, our library doesn't have full support for the following features:

  • Multiple periods that have different representation structures
  • WebM in DASH

We are looking to add support for these features in the future; we recommend not relying on them for now.

HTTP Live Streaming (HLS)

The overview and full specification of HTTP Live Streaming are available from the IETF (RFC 8216).

One of the key strengths of the Media Player Library is its ability to play HLS through MSE. Unlike DASH, where the manifest comes in a single file, HLS sends a master playlist that lists all the variant streams with their respective URLs; each variant playlist is a media playlist that lists the media segments. The two major HLS tags that MPL currently supports in the master playlist are listed below, followed by a short example:

  • #EXT-X-STREAM-INF: Specifies a bitrate/variant stream. Its BANDWIDTH attribute is used for adaptive bitrate selection, and its CODECS attribute, such as "avc1.42c01e,mp4a.40.2", is used to initialize MSE.
  • #EXT-X-MEDIA: Specifies an additional media playlist (in the URI= attribute) that represents the content. These are usually alternative audio streams in another format (such as 5.1 surround sound) or language. A TYPE attribute containing VIDEO, AUDIO, SUBTITLES, or CLOSED-CAPTIONS is allowed. DEFAULT=YES can be used to indicate that this alternative stream should be chosen by default.
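Here is a short, illustrative master playlist using both tags (the URLs, bandwidths, and group names are invented):

#EXTM3U
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English",LANGUAGE="en",DEFAULT=YES,URI="subtitles/en/prog_index.m3u8"
#EXT-X-STREAM-INF:BANDWIDTH=800000,CODECS="avc1.42c01e,mp4a.40.2",SUBTITLES="subs"
video/low/prog_index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2000000,CODECS="avc1.42c01f,mp4a.40.2",SUBTITLES="subs"
video/high/prog_index.m3u8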

Here is a list of the HLS tags that MPL currently supports in the media playlist, followed by a short example:

  • #EXTINF: Segment information. The tag carries the duration of the segment in seconds, and the next line contains the URL of the segment.
  • #EXT-X-TARGETDURATION: The maximum duration, in seconds, of any segment. This also determines how often we download/refresh the playlist manifest for a live stream.
  • #EXT-X-MEDIA-SEQUENCE: The sequence number of the first segment in this playlist (often used for a live stream).
  • #EXT-X-KEY: DRM key information. The METHOD= attribute tells us what key system to use. Today we support AES-128 and SAMPLE-AES.
  • #EXT-X-BYTERANGE: The byte range to fetch for a segment URL.
  • #EXT-X-DISCONTINUITY: Specifies a discontinuity between consecutive segments. This is often seen with server-side ad insertion, where an ad segment appears in the middle of the main stream.
  • #EXT-X-PROGRAM-DATE-TIME: The absolute time of the first sample of the next segment, for example "2016-09-21T23:23:52.066Z".
  • #EXT-X-ENDLIST: Indicates that no more segments will be added; its presence or absence distinguishes a VOD stream from a live stream.
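Here is a short, illustrative VOD media playlist using these tags (the segment names and durations are invented):

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:10.0,
seg0.ts
#EXTINF:10.0,
seg1.ts
#EXTINF:9.8,
seg2.ts
#EXT-X-ENDLIST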

For live streams, we use #EXT-X-PROGRAM-DATE-TIME and #EXT-X-MEDIA-SEQUENCE as the key factors for determining how to merge a newly refreshed manifest. An explicit preferSequenceNumberForPlaylistMatching option can be specified in the Host options to choose which of the two to use. By default, we prefer #EXT-X-PROGRAM-DATE-TIME over the #EXT-X-MEDIA-SEQUENCE number for matching the refreshed segments; in the absence of one, we always use the other. Note that, per the HLS spec, we do not use file-name comparison for matching.
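As an illustration (all values invented), consider excerpts of a live media playlist before and after a refresh. The segment with sequence number 101, whose program date-time is 2016-09-21T23:23:56.066Z, appears in both excerpts, and it is this date-time (or, if preferred, the sequence number) that lines the two playlists up:

Before the refresh:

#EXT-X-MEDIA-SEQUENCE:100
#EXT-X-PROGRAM-DATE-TIME:2016-09-21T23:23:52.066Z
#EXTINF:4.0,
seg100.ts
#EXTINF:4.0,
seg101.ts

After the refresh:

#EXT-X-MEDIA-SEQUENCE:101
#EXT-X-PROGRAM-DATE-TIME:2016-09-21T23:23:56.066Z
#EXTINF:4.0,
seg101.ts
#EXTINF:4.0,
seg102.ts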

Our HLS implementation supports selecting an alternative audio stream, such as 5.1 surround sound, as the main audio playback. This can be accomplished by providing an #EXT-X-MEDIA tag with the alternative codecs and by adding HlsSegmentFormat to the stream configuration.

Our HLS implementation also supports packed audio such as AAC and AC-3, both encrypted and unencrypted.
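For example, a master playlist could offer an AC-3 5.1 English rendition as the default audio alongside a video variant (all names and values here are invented):

#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="surround",NAME="English 5.1",LANGUAGE="en",DEFAULT=YES,URI="audio_ac3/prog_index.m3u8"
#EXT-X-STREAM-INF:BANDWIDTH=2500000,CODECS="avc1.42c01f,ac-3",AUDIO="surround"
video/high/prog_index.m3u8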

MPL expects certain per-spec behavior. For example, after an #EXTINF tag we expect a URI; if something else appears instead, such as an #EXT-X-DISCONTINUITY tag, parsing of the playlist will fail.

Every #EXT-X-TARGETDURATION seconds, we reload the playlist/manifest to get new segment lists and update our internal representation of all the segments to the new list. Any time a seek is requested, we only seek within the seekable range. For live streams, we only allow seeking from the beginning of the newest list up to three target durations from its end. So, for example, if you have a 10-segment list and you are on segment 6, you can only seek up to segment 7, but not to segment 8.

Content Protection

As listed in the #EXT-X-KEY tag description above, we support AES-128, where a URI to the key and an initialization vector can be specified:

#EXT-X-KEY:METHOD=AES-128,URI="https://example.com/crypt.key?id=testKey",
  IV=0x4B3E8EA2C3CB13E942A5F8F3CDB2C3C9

or SAMPLE-AES:

#EXT-X-KEY:METHOD=SAMPLE-AES,KEYFORMAT="com.widevine",KEYFORMATVERSIONS="1",
  URI="data:text/plain;base64,XXXXXX",IV=0x31812F52BAAB4FE0DAFBF71CDDCAEEAF

The only KEYFORMAT we support today is Widevine, and the URI contains Base64-encoded DRM information (XXXXXX above) which, when decoded, contains the key ID:

{
   "provider": "cast",
   "content_id": "MTQ1NjkzNzM1NDgxNA==",
   "key_ids": [
      "xxxxxxxxxxxxxxxx"
   ]
}

Below is a list of features and tags in HLS that we currently do not use or support. Their presence or absence does not affect the streaming behavior.

  • The RESOLUTION= attribute in #EXT-X-STREAM-INF is ignored.
  • The AUTOSELECT= attribute in #EXT-X-MEDIA is not used; instead, we rely on the DEFAULT= attribute.
  • GROUP-ID= in #EXT-X-MEDIA is ignored.
  • #EXT-X-I-FRAME-STREAM-INF in the master playlist is ignored.
  • #EXT-X-DISCONTINUITY-SEQUENCE is ignored.
  • #EXT-X-MAP in the playlist is ignored.
  • #EXT-X-PLAYLIST-TYPE:EVENT can be present in a live stream and #EXT-X-PLAYLIST-TYPE:VOD can be present in a VOD stream, but currently our Media Player Library only relies on the existence of #EXT-X-ENDLIST to determine live vs. VOD.

Smooth Streaming

The official Smooth Streaming spec is on MSDN here: https://msdn.microsoft.com/en-us/library/ff469518.aspx

Smooth Streaming provides an adaptive streaming protocol and an XML manifest specification over HTTP (similar to DASH). Unlike DASH, Smooth Streaming recommends only MPEG-4 packaging for the media segments.

Here is a list of the most common Smooth Streaming tags and attributes that the Media Player Library supports today. Many of these concepts are already explained in the DASH section above.

  • <SmoothStreamingMedia>: Main tag of the manifest. Its attributes include:
      • TimeScale: Number of units used to represent one second, typically 10,000,000.
      • Duration: Length of the content in time-scale units.
      • IsLive: Whether the manifest describes live media.
  • <StreamIndex>: One set of streams, similar to DASH's AdaptationSet. The Type is usually "text", "video", or "audio". The Url attribute usually contains a templated fragment URL built from information such as the bitrate or start time.
  • <QualityLevel>: Each QualityLevel tag specifies its Bitrate and a FourCC codec. The FourCC codes are often 'H264', 'AVC1', 'AACL', and so on. For video, it specifies the resolution through MaxWidth and MaxHeight. For audio, it specifies the sampling frequency (such as 44100) through SamplingRate and the number of Channels.
  • <c>: Stream fragment element. Contains:
      • d: Duration of the fragment.
      • t: Media time of the fragment.
  • <Protection>: A tag under the <SmoothStreamingMedia> tag with an optional SystemID attribute listing the ID of the DRM system to use.
  • <ProtectionHeader>: Under <Protection>; can contain a SystemID attribute and custom data, usually Base64 encoded. For PlayReady, it will contain the key ID, the key length, the algorithm ID (such as AESCTR), the LA_URL (license acquisition URL), the LUI_URL (license user interface URL), and the DS_ID (domain service ID).

Content Protection

To encode the protection system IDs properly, please use the below mapping:

  • PLAYREADY: 9A04F079-9840-4286-AB92-E65BE0885F95
  • WIDEVINE: EDEF8BA9-79D6-4ACE-A3C8-27DCD51D21ED
  • CLEARKEY: 1077EFEC-C0B2-4D02-ACE3-3C1E52E2FB4B
  • MPEG_DASH_MP4PROTECTION: URN:MPEG:DASH:MP4PROTECTION:2011

For <ProtectionHeader>, below is an example with Base64-encoded data. The data, when decoded, conforms to the same decoded format described in the DASH content protection section above.

<Protection>
  <ProtectionHeader SystemID="9a04f079-9840-4286-ab92-e65be0885f95">
    $BASE64ENCODED_DATA
  </ProtectionHeader>
</Protection>

Below is an example of a live Smooth Streaming manifest with 300 seconds of content (Duration="3000000000" at a TimeScale of 10,000,000):

<?xml version="1.0"?>  
  <SmoothStreamingMedia MajorVersion="2" MinorVersion="0" Duration="3000000000"
    TimeScale="10000000" IsLive="TRUE" LookAheadFragmentCount="2" DVRWindowLength="600000000" CanSeek="TRUE" CanPause="TRUE">  
    <StreamIndex Type="text" Name="textstream301_swe" Language="swe" Subtype="CAPT" Chunks="0"
      TimeScale="10000000" Url="QualityLevels({bitrate})/Fragments(textstream301_swe={start time})">  
      <QualityLevel Index="0" Bitrate="20000" CodecPrivateData="" FourCC="DFXP"/>  
        <c d="40000000" t="80649382288125"/>  
        <c d="39980000"/>  
        <c d="40020000"/>   
    </StreamIndex>  
    <Protection>
      <ProtectionHeader SystemID="9a04f079-9840-4286-ab92-e65be0885f95">$BASE64ENCODEDDRMDATA$</ProtectionHeader>
    </Protection>
    <StreamIndex Type="audio" Name="audio101_eng" Language="eng" Subtype="AACL" Chunks="0"
      TimeScale="10000000" Url="QualityLevels({bitrate})/Fragments(audio101_eng={start time})">  
      <QualityLevel Index="0" Bitrate="128000" CodecPrivateData="1290" FourCC="AACL" AudioTag="255"
        Channels="2" SamplingRate="32000" BitsPerSample="16" PacketSize="4"/>  
      <c d="40000000" t="80649401327500"/>  
      <c d="40000000"/>  
      <c d="40000000"/>  
    </StreamIndex>  
    <StreamIndex Type="video" Name="video" Subtype="AVC1" Chunks="0" TimeScale="10000000"
      Url="QualityLevels({bitrate})/Fragments(video={start time})">  
      <QualityLevel Index="0" Bitrate="400000" CodecPrivateData="000000016742E01596540C0EFCB808140000000168CE3880"
        FourCC="AVC1" MaxWidth="384" MaxHeight="216"/>  
      <QualityLevel Index="1" Bitrate="800000" CodecPrivateData="00000001674D401E965281004B6020500000000168EF3880"
        FourCC="AVC1" MaxWidth="512" MaxHeight="288"/>  
      <QualityLevel Index="2" Bitrate="1600000" CodecPrivateData="00000001674D401E965281B07BCDE020500000000168EF3880"
        FourCC="AVC1" MaxWidth="854" MaxHeight="480"/>  
      <QualityLevel Index="3" Bitrate="2200000" CodecPrivateData="00000001674D401F96528080093602050000000168EF3880"
        FourCC="AVC1" MaxWidth="1024" MaxHeight="576"/>  
      <c d="40000000" t="80649401378125"/>  
      <c d="40000000"/>  
      <c d="40000000"/>  
    </StreamIndex>  
  </SmoothStreamingMedia>

In the above example, the URL template for the video stream is:

QualityLevels({bitrate})/Fragments(video={start time})

So, assuming we are on the index 2 quality level (Bitrate="1600000"), the first two segments will be the following, with the initial time taken from t="80649401378125" under the video StreamIndex and the time advancing by 4 seconds × 10,000,000 units per segment:

QualityLevels(1600000)/Fragments(video=80649401378125)
QualityLevels(1600000)/Fragments(video=80649441378125)
...

Here is a list of Smooth Streaming attributes that we currently ignore; they have no effect on the streaming experience regardless of whether they are provided:

  • CanSeek, CanPause in <SmoothStreamingMedia> tag.
  • Chunks, QualityLevels in <StreamIndex> tag. Instead, we calculate the number of segments and number of quality levels based on information provided inside <StreamIndex> such as the actual QualityLevel tag and the <c> tags.
  • BitsPerSample and PacketSize in <QualityLevel> are not used.