Live encoding with VP9 using FFmpeg

Encoding parameters

VP9 provides a range of parameters to optimize live encoding. Some broad principles of these are discussed in Bitrate Modes.

FFmpeg VP9 encoding example

The table below describes the parameters of an example ffmpeg call for VP9 encoding.

Parameter Description
-quality realtime realtime is essential for live streaming and for speeds above 5.
-speed 6 Speed 5 to 8 should be used for live / real-time encoding. Lower numbers (5 or 6) are higher quality but require more CPU power. Higher numbers (7 or 8) will be lower quality but more manageable for lower latency use cases and also for lower CPU power devices such as mobile.
-tile-columns 4 Tiling splits the video into rectangular regions, which allows multi-threading for encoding and decoding. The number of tiles is always a power of two. 0 = 1 tile, 1 = 2, 2 = 4, 3 = 8, 4 = 16, 5 = 32.
-frame-parallel 1 Enable parallel decodability features.
-threads 8 Maximum number of threads to use.
-static-thresh 0 Motion detection threshold.
-max-intra-rate 300 Maximum i-Frame bitrate (pct)
-deadline realtime Alternative (legacy) version of -quality realtime
-lag-in-frames 0 Maximum number of frames to lag
-qmin 4 -qmax 48 Minimum and maximum values for the quantizer. The values here are merely a suggestion and adjusting this will help increase/decrease video quality at the expense of compression efficiency.
-row-mt 1 Enable row-multithreading. Allows use of up to 2x thread as tile columns. 0 = off, 1 = on.
-error-resilient 1 Enable error resiliency features.

Choosing encoding parameters

The information below uses constant bitrate (CBR) encoding for live adaptive bitrate streaming (ABR), where each target rate is explicitly set in the packager's manifest. This will result in cleaner "switching" between rates for clients. Variable bitrate (VBR) encoding and CQ mode are also options if the bitrate can be more flexible or the encoding is being chunked. Q mode will struggle with the realtime encoding required for live video. See Bitrate Modes for more information.

For further details on how to manipulate VP9 it is also worth referring to the accompanying article on VOD settings, but taking into consideration a focus on CBR.

Tips and tricks

Remember that, when live streaming, everything is constrained to a minimum realtime encoding speed of 1x (FFmpeg reports encoding speed as it progresses). If your encoding speed drops below 1x then the encoding process will not keep up with the input of live video, and users will experience buffering, and breaks in the transmission will render the stream unusable during the live broadcast (although the archive will be generally usable).

Examples of encoding parameters in action

The following shows CPU utilization at 25 fps for various frame sizes on a quad-core i5 3.6Ghz desktop running Linux:

Target Resolution FFmpeg VP9 Parameters CPU / Speed (example)
3840x2160 (2160p) -r 30 -g 90 -s 3840x2160 -quality realtime -speed 5 -threads 16 -row-mt 1 -tile-columns 3 -frame-parallel 1 -qmin 4 -qmax 48 -b:v 7800k ~88% 0.39x
2560x1440 (1440p) -r 30 -g 90 -s 2560x1440 -quality realtime -speed 5 -threads 16 -row-mt 1 -tile-columns 3 -frame-parallel 1 -qmin 4 -qmax 48 -b:v 6000k ~86% 0.68x
1920x1080 (1080p) -r 30 -g 90 -s 1920x1080 -quality realtime -speed 5 -threads 8 -row-mt 1 -tile-columns 2 -frame-parallel 1 -qmin 4 -qmax 48 -b:v 4500k ~82% 1.04x
1280x720 (720p) -r 30 -g 90 -s 1280x720 -quality realtime -speed 5 -threads 8 -row-mt 1 -tile-columns 2 -frame-parallel 1 -qmin 4 -qmax 48 -b:v 3000k ~78% 1.77x
854x480 (480p) -r 30 -g 90 -s 854x480 -quality realtime -speed 6 -threads 4 -row-mt 1 -tile-columns 1 -frame-parallel 1 -qmin 4 -qmax 48 -b:v 1800k ~64% 3.51x
640x360 (360p) -r 30 -g 90 -s 640x360 -quality realtime -speed 7 -threads 4 -row-mt 1 -tile-columns 1 -frame-parallel 0 -qmin 4 -qmax 48 -b:v 730k ~62% 5.27x
426x240 (240p) -r 30 -g 90 -s 426x240 -quality realtime -speed 8 -threads 2 -row-mt 1 -tile-columns 0 -frame-parallel 0 -qmin 4 -qmax 48 -b:v 365k ~66% 8.27x

An example FFmpeg might look like this:

ffmpeg -stream_loop 100 -i /home/id3as/Videos/120s_tears_of_steel_1080p.webm \
  -r 30 -g 90 -s 3840x2160 -quality realtime -speed 5 -threads 16 -row-mt 1 \
  -tile-columns 3 -frame-parallel 1 -qmin 4 -qmax 48 -b:v 7800k -c:v vp9 \
  -b:a 128k -c:a libopus -f webm pipe1

Tips and tricks

  • Note that here we are outputting to a FIFO pipe ("pipe1"), which should be created before execution, before running the FFmpeg command. To do this, give the command mkfifo pipe1 in your working directory. When using Shaka Packager, it will listen to that pipe as its input source for the given stream. Other packaging models may require a different method.

  • To ensure -row-mt commands are recognized, use the latest stable release of FFmpeg (3.3.3 currently) from https://www.ffmpeg.org/download.html

Example adaptive bitrate set

Depending on the power of the machine running the FFmpeg encode it may or may not be possible to deliver all of the following encodings at the same time, so a subset suited to your own available resources and target audiences should be selected from the list.

FFmpeg full ABR set

In an ideal scenario we combine the encoding examples outlined in the preceding section to create a single command that produces them all at the same time:

ffmpeg -stream_loop 100 -i lakes1080p.mp4 \
  -y -r 25 -g 75 -s 3840x2160 -quality realtime -speed 5 -threads 8 \
  -tile-columns 2 -frame-parallel 1 \
  -b:v 7800k -c:v vp9 -b:a 196k -c:a libopus -f webm pipe1 \
  -y -r 25 -g 75 -s 2560x1440 -quality realtime -speed 5 -threads 8 \
  -tile-columns 2 -frame-parallel 1 \
  -b:v 6000k -c:v vp9 -b:a 196k -c:a libopus -f webm pipe2 \
  -y -r 25 -g 75 -s 1920x1080 -quality realtime -speed 5 -threads 4 \
  -tile-columns 2 -frame-parallel 1 \
  -b:v 4500k -c:v vp9 -b:a 196k -c:a libopus -f webm pipe3 \
  -y -r 25 -g 75 -s 1280x720 -quality realtime -speed 5 -threads 4 \
  -tile-columns 2 -frame-parallel 1 \
  -b:v 3000k -c:v vp9 -b:a 196k -c:a libopus -f webm pipe4 \
  -y -r 25 -g 75 -s 854x480 -quality realtime -speed 6 -threads 4 \
  -tile-columns 2 -frame-parallel 1 \
  -b:v 2000k -c:v vp9 -b:a 196k -c:a libopus -f webm pipe5 \
  -y -r 25 -g 75 -s 640x360 -quality realtime -speed 7 -threads 2 \
  -tile-columns 1 -frame-parallel 0 \
  -b:v 730k -c:v vp9 -b:a 128k -c:a libopus -f webm pipe6 \
  -y -r 25 -g 75 -s 426x240 -quality realtime -speed 8 -threads 2 \
  -tile-columns 1 -frame-parallel 0 \
  -b:v 365k -c:v vp9 -b:a 64k -c:a libopus -f webm pipe7

The above full set will, however, require a very powerful CPU, or possibly support from hardware GPU offload such as some chipsets increasingly provide. Intel Kabylake (and beyond) has a full hardware encoding pipeline. (Note that the Kabylake GPU can do 8-bit VP9 encode, but not 10-bit).

A practical desktop example using Shaka Packager

A more practical example for common desktop machines might use Shaka Packager. A simple way to setup Shaka is to install it within a Docker container, using Google's DockerHub image. Instructions can be found here:

https://github.com/google/shaka-packager#using-docker-for-testing--development

For this example, we used a machine with the following configuration:

System Host: obs Kernel: 4.4.0-91-lowlatency x86_64 (64 bit)
Desktop Xfce 4.12.3 Distro: OS: https://ubuntustudio.org/2016/10/ubuntu-studio-16-10-released/
CPU Quad core Intel Core i5-6500 (-MCP-)
cache: 6144 KB
clock speeds: max: 3600 MHz 1: 800 MHz 2: 800 MHz 3: 800 MHz 4: 800 MHz
Graphics Card Intel Skylake integrated graphics
Memory 8GB RAM
{:.no-th }

In practice this machine could optimally produce the following usable range of ABR encodes, with FFmpeg consistently reporting 1x encode speed:

ffmpeg -stream_loop 100 -i 120s_tears_of_steel_1080p.webm \
  -y -r 30 -g 90 -s 1920x1080 -quality realtime -speed 7 -threads 8 \
  -row-mt 1 -tile-columns 2 -frame-parallel 1 -qmin 4 -qmax 48 \
  -b:v 4500k -c:v vp9 -b:a 128k -c:a libopus -f webm pipe1 \
  -y -r 30 -g 90 -s 1280x720 -quality realtime -speed 8 -threads 6 \
  -row-mt 1 -tile-columns 2 -frame-parallel 1 -qmin 4 -qmax 48 \
  -b:v 3000k -c:v vp9 -b:a 128k -c:a libopus -f webm pipe2 \
  -y -r 30 -g 90 -s 640x360 -quality realtime -speed 8 -threads 2 \
  -row-mt 1 -tile-columns 1 -frame-parallel 1 -qmin 4 -qmax 48 \
  -b:v 730k -c:v vp9 -b:a 128k -c:a libopus -f webm pipe3

Note that the -speed settings are quite high. These settings were established experimentally and will vary from machine to machine.

Shaka Packager overhead

Packaging is not a particularly CPU-intensive activity. Shaka Packager can be set to listen for all the outputs, even if only a subset are being delivered by FFmpeg. These are the packager settings tested on the machine outlined above:

packager \
  in=pipe1,stream=audio,init_segment=livehd-audio-1080.webm,segment_template=livehd-audio-1080-\$Number\$.webm \
  in=pipe1,stream=video,init_segment=livehd-video-1080.webm,template=livehd-video-1080-\$Number\$.webm \
  in=pipe2,stream=audio,init_segment=livehd-audio-720.webm,segment_template=livehd-audio-720-\$Number\$.webm \
  in=pipe2,stream=video,init_segment=livehd-video-720.webm,template=livehd-video-720-\$Number\$.webm \
  in=pipe3,stream=audio,init_segment=livehd-audio-360.webm,segment_template=livehd-audio-360-\$Number\$.webm \
  in=pipe3,stream=video,init_segment=livehd-video-360.webm,template=livehd-video-360-\$Number\$.webm \
  --mpd_output livehd.mpd --dump_stream_info --min_buffer_time=10 --time_shift_buffer_depth=300 \
  --segment_duration=3 --io_block_size 65536