
Configuring Downstream Caches

Overview

Note: New feature as of 1.6.29.3

Note: This feature is currently experimental. Options and configuration described here are subject to change in future releases. Please subscribe to the announcements mailing list to keep yourself informed of updates to this feature.

PageSpeed serves HTML files with Cache-Control: no-cache, max-age=0 so that changes to the HTML and the resources embedded in it can be sent afresh on every request. This is essential for the correct working of several filters, such as Extend Cache, which rely on updating the content hash when the content of an embedded resource changes. Note that downstream caches can still accelerate the delivery of resources already rewritten by PageSpeed, since these are served with a one-year TTL, but they cannot cache the rewritten HTML because of the no-cache header.

However, as of 1.6.29.3, PageSpeed servers and downstream caches such as Nginx's proxy_cache and Varnish can be configured to cache and serve rewritten HTML. This is achieved by the PageSpeed server sending a purge request to the caching layer whenever it identifies an opportunity for more rewriting to be done on content that was just served. Such opportunities could arise because of, say, the resources now becoming available in the PageSpeed cache or an image compression operation completing. The cache purge forces the next request for the HTML file to come all the way to the backend PageSpeed server and obtain better rewritten content, which is then stored in the cache. This interaction between the PageSpeed server and the downstream caching layer is depicted in the diagram given below.

In the interaction depicted above, note that the partially optimized HTML will be served from the cache until a purge request is sent by the PageSpeed server. This interaction also needs to be repeated once for every User-Agent class that supports a different version of some of the PageSpeed optimizations. For example, the HTML response stored for a WebP-capable browser cannot be served to a non-WebP-capable browser. Hence, the cache needs to be fragmented by User-Agent class, and purge requests also need to purge the file from the right fragment. This is done by incorporating the User-Agent class into the cache key and forwarding the incoming User-Agent request header on purge requests. It is recommended to set up PageSpeed servers and downstream caching servers on a one-to-one basis so that purges are sent to the correct downstream server.
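
For example, with the sample User-Agent classification used later in this document, a request for /index.html from a recent WebP-capable desktop browser and the same request from a small-screen phone would be stored under different cache keys, roughly of the form:
ll,ii,dj,jw,ws:/index.html                            (WebP-capable desktop browser)
TinyScreen.SkipUADependentOptimizations:/index.html   (small-screen mobile device)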

Since the rewritten HTML might refer to cache-extended embedded resources with long TTLs, you should ensure that the rewritten HTML is cached for a shorter duration than the TTLs of all the publicly cacheable embedded resources on the page.
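
For instance, if your backend serves its HTML through Nginx, a minimal sketch of this constraint (the location pattern and the 10-minute value are illustrative assumptions, not required settings) is to give HTML an explicit max-age far below the one-year TTL on rewritten resources:
# Minimal sketch (backend/origin configuration): cap the HTML freshness
# lifetime well below the one-year TTL on rewritten resources.
location ~ \.html$ {
  add_header Cache-Control "public, max-age=600";   # 10 minutes
}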

Configuring PageSpeed servers for downstream caching

Sample Apache and Nginx PageSpeed server configurations that work with Varnish and Nginx's proxy_cache are given below. These directives should be placed in the Nginx location block, or the Apache Directory or VirtualHost block, that corresponds to the root location "/".

Apache:
ModPagespeedDownstreamCachePurgeLocationPrefix http://localhost:80
ModPagespeedDownstreamCachePurgeMethod PURGE
ModPagespeedDownstreamCacheRewrittenPercentageThreshold 95
Nginx:
pagespeed DownstreamCachePurgeLocationPrefix http://localhost:80;
pagespeed DownstreamCachePurgeMethod PURGE;
pagespeed DownstreamCacheRewrittenPercentageThreshold 95;

DownstreamCachePurgeLocationPrefix specifies the host:port of the caching layer server, plus any purge path prefix, to be used when sending purge requests. Setting it to a non-empty string enables the downstream caching feature. DownstreamCachePurgeMethod specifies the HTTP request method to be used for purging, typically PURGE for Varnish and GET for proxy_cache.

Whenever PageSpeed serves an HTML response that is not fully optimized, it continues rewriting in the background. When it finishes, if the percentage of rewriting completed for the HTML it served was below DownstreamCacheRewrittenPercentageThreshold, it sends a purge request to the downstream cache.
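
As a rough illustration (not an exact wire capture), with the sample directives above a purge of /some/page.html amounts to a request like the following being sent to the caching layer, with the original User-Agent forwarded so that the correct cache fragment gets purged; you can simulate one by hand when testing:
curl -X PURGE -H 'User-Agent: <original-request-UA>' http://localhost:80/some/page.html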

Assuming that your backend servers already serve publicly cacheable HTML with a non-zero max-age and a public Cache-Control header, these headers will be propagated as-is by the PageSpeed servers if you have configured ModPagespeedDownstreamCachePurgeLocationPrefix (Apache) or pagespeed DownstreamCachePurgeLocationPrefix (Nginx) with a non-empty value. In such cases, setting ModPagespeedModifyCachingHeaders (Apache) or pagespeed ModifyCachingHeaders (Nginx) to on will not affect the Cache-Control header, but will continue to affect other caching headers such as Last-Modified as expected.

Most caching layers respect Cache-Control headers set by upstream servers. However, HTML should still be served to users with Cache-Control: no-cache, max-age=0 so that they do not see stale content from their browser caches. Caching layers can be configured to make this transformation as described in the next two sections.

Configuring proxy_cache servers

This section describes the steps for configuring proxy_cache to work with a PageSpeed server running with DownstreamCache* directives. Note that the configuration below allows caching of four versions for each HTML page corresponding to the most popular User-Agent classes.

  1. Setting up proxy_cache servers: proxy_cache is provided by Nginx's HTTP Proxy module. To set up a caching server, follow these steps.
    1. Specify proxy_cache_path: You need to specify the directory in which the files cached by proxy_cache will be stored and the name of the zone for caching. A sample proxy_cache_path configuration to be added to the http block in the Nginx configuration file is:
      proxy_cache_path /path/to/proxy-cache-dir levels=1:2 keys_zone=htmlcache:60m inactive=90m max_size=50m;
      
    2. Define server hostname and port: Add a server block defining its hostname and the port on which the server will run. A sample server block is:
      server {
        # Block 1:  Basic port, server_name definitions.
        # This server represents the external caching layer server which
        # receives user requests and proxies them to the upstream PageSpeed
        # server when the response is not available in the cache.
        # It also services purge requests from the upstream server.
        listen 80;
        server_name proxy_cache.example.com;
      
        # Disable PageSpeed on this server.
        pagespeed off;
      }
      
    3. Install ngx_cache_purge: You need to install the ngx_cache_purge module for this feature to work correctly. Clone it with the git command below and then include it among the additional modules that Nginx is compiled with, using --add-module.
      git clone https://github.com/FRiCKLE/ngx_cache_purge.git
      
  2. Building the proxy_cache_key prefix based on User-Agent: Within the proxy_cache server block, the proxy_cache_key should be redefined to use the User-Agent classification logic given below in order to make sure that the cache stores different versions of the HTML based on whatever optimizations are possible with the given User-Agents.
        # Block 2: Define prefix for proxy_cache_key based on the UserAgent.
    
        # Define placeholder PS-CapabilityList header values for large and small
        # screens with no UA dependent optimizations. Note that these placeholder
        # values should not contain any of ll, ii, dj, jw or ws, since these
        # codes will end up representing optimizations to be supported for the
        # request.
        set $default_ps_capability_list_for_large_screens "LargeScreen.SkipUADependentOptimizations:";
        set $default_ps_capability_list_for_small_screens "TinyScreen.SkipUADependentOptimizations:";
    
        # As a fallback, the PS-CapabilityList header that is sent to the upstream
        # PageSpeed server should be for a large screen device with no browser
        # specific optimizations.
        set $ps_capability_list $default_ps_capability_list_for_large_screens;
    
        # Cache-fragment 1: Desktop User-Agents that support lazyload_images (ll),
        # inline_images (ii) and defer_javascript (dj).
        # Note: Wget is added for testing purposes only.
        if ($http_user_agent ~* "Chrome/|Firefox/|Gecko|Trident/6\.|Safari|Wget") {
          set $ps_capability_list "ll,ii,dj:";
        }
        # Cache-fragment 2: Desktop User-Agents that support lazyload_images (ll),
        # inline_images (ii), defer_javascript (dj), webp (jw) and lossless_webp
        # (ws).
        if ($http_user_agent ~*
            "Chrome/[2][3-9]+\.|Chrome/[[3-9][0-9]+\.|Chrome/[0-9]{3,}\.") {
          set $ps_capability_list "ll,ii,dj,jw,ws:";
        }
        # Cache-fragment 3: This fragment contains (a) Desktop User-Agents that
        # match fragments 1 or 2 but should not because they represent older
        # versions of certain browsers or bots and (b) Tablet User-Agents that
        # correspond to large screens. These will only get optimizations that work
        # on all browsers and use image compression qualities applicable to large
        # screens. Note that even Tablets that are capable of supporting inline or
        # webp images, e.g. Android 4.1.2, will not get these advanced
        # optimizations.
        if ($http_user_agent ~* "Firefox/[1-2]\.|bot|Yahoo!|Ruby|RPT-HTTPClient|(Google \(\+https\:\/\/developers\.google\.com\/\+\/web\/snippet\/\))|Android|iPad|TouchPad|Silk-Accelerated|Kindle Fire") {
          set $ps_capability_list $default_ps_capability_list_for_large_screens;
        }
        # Cache-fragment 4: Mobiles and small screen Tablets will use image compression
        # qualities applicable to small screens, but all other optimizations will be
        # those that work on all browsers.
        if ($http_user_agent ~* "Mozilla.*Android.*Mobile*|iPhone|BlackBerry|Opera Mobi|Opera Mini|SymbianOS|UP.Browser|J-PHONE|Profile/MIDP|portalmmm|DoCoMo|Obigo|Galaxy Nexus|GT-I9300|GT-N7100|HTC One|Nexus [4|7|S]|Xoom|XT907") {
          set $ps_capability_list $default_ps_capability_list_for_small_screens;
        }
    
    

    Note that the logic above allows for four classes of User-Agents to be cached. If you do not have any User-Agent dependent filters such as inline_images, inline_preview_images, lazyload_images and defer_javascript enabled on your server, you can drop the first cache-fragment. If you do not have WebP related optimizations enabled on your PageSpeed server, you can drop the second cache-fragment block. Finally, if you do not have separate image compression qualities defined for small screens, you can drop the fourth cache-fragment.

  3. Bypassing the cache for certain requests: Requests for .pagespeed. resources are set to bypass the cache since these are already cached in the PageSpeed server. Adding these to the cache would cause bloating due to identical copies of the same resource getting cached in different User-Agent fragments.
        # Block 3a: Bypass the cache for .pagespeed. resource. PageSpeed has its own
        # cache for these, and these could bloat up the caching layer.
        if ($uri ~ "\.pagespeed\.([a-z]\.)?[a-z]{2}\.[^.]{10}\.[^.]+") {
          set $bypass_cache "1";
        }
    
    Request headers other than User-Agent can also alter the response sent by the backend server. For example, the Accept-Encoding header dictates whether the response will be gzipped or not.
        # Block 3b: Only cache responses to clients that support gzip.  Most clients
        # do, and the cache holds much more if it stores gzipped responses.
        if ($http_accept_encoding !~* gzip) {
          set $bypass_cache "1";
        }
    
    Cookies or source IP addresses could also influence the content served by your backend server. You should check for any such request headers to which your backend server is sensitive and consider incorporating them into the caching logic, either by adding them to the cache key or by bypassing the cache entirely (a cookie-based sketch is given after this list).
  4. Defining a handler for purge requests: A location block for handling purge requests is required in the proxy_cache server configuration. Assuming that htmlcache is the name of the zone to be used for caching as defined in the proxy_cache_path directive, a sample configuration for the purge block is given below. Note how the proxy_cache_key is redefined in the proxy_cache_purge directive to use the User-Agent dependent prefix generated previously.
    # Block 4: Location block for purge requests.
        location ~ /purge(/.*) {
          allow localhost;
          allow 127.0.0.1;
          allow YOUR-SERVER-IP;
          deny all;
          proxy_cache_purge htmlcache $ps_capability_list$1$is_args$args;
        }
    
  5. Respecting Cache-Control and Content-Type response headers: Assuming that your backend server already serves cacheable HTML with the right Cache-Control headers, the max-age values will be respected by proxy_cache. However, these responses still need to be served to clients with Cache-Control: no-cache, max-age=0. In order to infer whether the response is publicly cacheable HTML or not, you can add the following lines to the http block in your Nginx configuration file:
         # Block 5a: Decide on Cache-Control header value to use for outgoing
         # response.
         # Map new_cache_control_header_val to "no-cache, max-age=0" if the
         # content is html and use the original Cache-Control header value
         # in all other cases.
         map $upstream_http_content_type $new_cache_control_header_val {
           default $upstream_http_cache_control;
           "~*text/html" "no-cache, max-age=0";
         }
    
    To override the Cache-Control headers with the inferred value from the above snippet, you should add the following lines to the location block corresponding to "/":
         # Block 5b: Override Cache-Control headers as needed.
         # Hide the upstream cache control header.
         proxy_hide_header Cache-Control;
         # Add the inferred Cache-Control header.
         add_header Cache-Control $new_cache_control_header_val;
    
  6. Defining proxy_cache directives: The location block for "/" on the proxy_cache server should specify the upstream PageSpeed server location and the zone to use for caching. It should redefine proxy_cache_key using the previously computed User-Agent based prefix, exactly as done in the proxy_cache_purge directive above. The directive for bypassing specific requests should also be present here. The Host header should be forwarded to the upstream PageSpeed server, and the PS-CapabilityList header should be set to the User-Agent based cache key prefix as an indication of the optimizations supported by the associated cache-fragment. A header reporting cache hits and misses can optionally be added for debugging purposes. A sample configuration is:
        # Block 6: Location block with proxy_cache directives.
        location / {
          # 1: Upstream PageSpeed server is running at localhost:8050.
          proxy_pass http://localhost:8050;
    
          # 2: Use htmlcache as the zone for caching.
          proxy_cache htmlcache;
    
          # 3: Bypass requests that correspond to .pagespeed. resources
          # or clients that do not support gzip etc.
          proxy_cache_bypass $bypass_cache;
    
          # 4: Use the redefined proxy_cache_key and make sure the /purge/
          # block uses the same key.
          proxy_cache_key $ps_capability_list$uri$is_args$args;
    
          # 5: Forward Host header to upstream server.
          proxy_set_header Host $host;
    
          # 6: Set the PS-CapabilityList header for PageSpeed server to respect.
          proxy_set_header PS-CapabilityList $ps_capability_list;
    
          # 7: Add a header for identifying cache hits/misses/expires. This is
          # for debugging purposes only and can be commented out in production.
          add_header X-Cache $upstream_cache_status;
        }
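
As noted in step 3, cookies may also need to be factored into the caching logic. A hypothetical rule that bypasses the cache whenever a session cookie is present (the cookie name "sessionid" is a placeholder for whatever your backend actually sets) could sit alongside Blocks 3a and 3b:
  # Hypothetical addition to Block 3: bypass the cache for requests that
  # carry a session cookie, since such responses are usually personalized.
  # "sessionid" is a placeholder cookie name.
  if ($http_cookie ~* "sessionid") {
    set $bypass_cache "1";
  }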
    

Configuring Varnish servers

This section describes the steps for configuring Varnish 3.x or 4.x to work with a PageSpeed server running with the DownstreamCache* directives. Note that the configuration below allows caching of four versions for each HTML page corresponding to the most popular User-Agent classes.

  1. Setting up Varnish servers: Varnish can be set up using the following steps. First, install the Varnish package:
    sudo apt-get install varnish
    
    Edit /etc/default/varnish and configure the port on which the Varnish server should listen. A sample set of lines to add at the end of the file is:
      DAEMON_OPTS="-a :80 \
                 -T localhost:6082 \
                 -f /etc/varnish/default.vcl \
                 -S /etc/varnish/secret \
                 -s file,/var/lib/varnish/$INSTANCE/varnish_storage.bin,1G"
    
    The /etc/varnish/default.vcl file contains the Varnish server configuration. You should update the backend default block to reflect the correct upstream PageSpeed server host and port. A sample configuration is:
    # Block 1: Define upstream server's host and port.
    backend default {
      # Location of PageSpeed server.
      .host = "127.0.0.1";
      .port = "8080";
    }
    
    To apply any changes made to the default.vcl file, restart the Varnish server with the following command:
    sudo service varnish restart
    
  2. Building the User-Agent based key for hashing into the cache: In order to make sure that the cache stores different versions of the HTML based on whatever optimizations are possible with the given User-Agents, the below subroutine with User-Agent based logic should be defined.
    # Block 2: Define a key based on the User-Agent which can be used for hashing.
    # Also set the PS-CapabilityList header for PageSpeed server to respect.
    sub generate_user_agent_based_key {
        # Define placeholder PS-CapabilityList header values for large and small
        # screens with no UA dependent optimizations. Note that these placeholder
        # values should not contain any of ll, ii, dj, jw or ws, since these
        # codes will end up representing optimizations to be supported for the
        # request.
        set req.http.default_ps_capability_list_for_large_screens = "LargeScreen.SkipUADependentOptimizations:";
        set req.http.default_ps_capability_list_for_small_screens = "TinyScreen.SkipUADependentOptimizations:";
    
        # As a fallback, the PS-CapabilityList header that is sent to the upstream
        # PageSpeed server should be for a large screen device with no browser
        # specific optimizations.
        set req.http.PS-CapabilityList = req.http.default_ps_capability_list_for_large_screens;
    
        # Cache-fragment 1: Desktop User-Agents that support lazyload_images (ll),
        # inline_images (ii) and defer_javascript (dj).
        # Note: Wget is added for testing purposes only.
        if (req.http.User-Agent ~ "(?i)Chrome/|Firefox/|Gecko|Trident/6\.|Safari|Wget") {
          set req.http.PS-CapabilityList = "ll,ii,dj:";
        }
        # Cache-fragment 2: Desktop User-Agents that support lazyload_images (ll),
        # inline_images (ii), defer_javascript (dj), webp (jw) and lossless_webp
        # (ws).
        if (req.http.User-Agent ~
            "(?i)Chrome/[2][3-9]+\.|Chrome/[[3-9][0-9]+\.|Chrome/[0-9]{3,}\.") {
          set req.http.PS-CapabilityList = "ll,ii,dj,jw,ws:";
        }
        # Cache-fragment 3: This fragment contains (a) Desktop User-Agents that
        # match fragments 1 or 2 but should not because they represent older
        # versions of certain browsers or bots and (b) Tablet User-Agents that
        # correspond to large screens. These will only get optimizations that work
        # on all browsers and use image compression qualities applicable to large
        # screens. Note that even Tablets that are capable of supporting inline or
        # WebP images, e.g. Android 4.1.2, will not get these advanced
        # optimizations.
        if (req.http.User-Agent ~ "(?i)Firefox/[1-2]\.|bot|Yahoo!|Ruby|RPT-HTTPClient|(Google \(\+https\:\/\/developers\.google\.com\/\+\/web\/snippet\/\))|Android|iPad|TouchPad|Silk-Accelerated|Kindle Fire") {
          set req.http.PS-CapabilityList = req.http.default_ps_capability_list_for_large_screens;
        }
        # Cache-fragment 4: Mobiles and small screen Tablets will use image compression
        # qualities applicable to small screens, but all other optimizations will be
        # those that work on all browsers.
        if (req.http.User-Agent ~ "(?i)Mozilla.*Android.*Mobile*|iPhone|BlackBerry|Opera Mobi|Opera Mini|SymbianOS|UP.Browser|J-PHONE|Profile/MIDP|portalmmm|DoCoMo|Obigo|Galaxy Nexus|GT-I9300|GT-N7100|HTC One|Nexus [4|7|S]|Xoom|XT907") {
          set req.http.PS-CapabilityList = req.http.default_ps_capability_list_for_small_screens;
        }
        # Remove placeholder header values.
        # Varnish 4.x uses "unset" instead of "remove".
        remove req.http.default_ps_capability_list_for_large_screens;
        remove req.http.default_ps_capability_list_for_small_screens;
    }
    

    Note that the logic above allows for four classes of User-Agents to be cached. If you do not have any User-Agent dependent filters such as inline_images, inline_preview_images, lazyload_images and defer_javascript enabled on your server, you can drop the first cache-fragment block. If you do not have WebP related optimizations enabled on your PageSpeed server, you can drop the second cache-fragment. Finally, if you do not have separate image compression qualities defined for small screens, you can drop the fourth cache-fragment.

    The PS-CapabilityList header being set in the subroutine will be subsequently used for computing the cache key. It is also sent to the PageSpeed server as an indication of the optimizations that will be supported by the associated cache-fragment.

  3. Redefine the cache key hash: The vcl_hash method needs to be redefined to incorporate the PS-CapabilityList header value into the cache key hash as given below:
    sub vcl_hash {
      # Block 3: Use the PS-CapabilityList value for computing the hash.
      hash_data(req.http.PS-CapabilityList);
    }
    
  4. Define an ACL for purge: An ACL needs to be defined for accepting purge requests. A sample ACL definition is:
    # Block 3a: Define ACL for purge requests.
    acl purge {
      # Purge requests are only allowed from localhost.
      "localhost";
      "127.0.0.1";
      "YOUR-SERVER-IP";
    }
    
  5. Redefine hit/miss handlers for purge requests: vcl_hit and vcl_miss should be redefined to process purge requests and issue purges. Sample redefinitions are:
    # Blocks 3b and 3c are only for Varnish 3.x, so if you're using 4.x don't
    # include them.  In Varnish 4.x purging when you get a purge request is what
    # happens by default.
    
    # Block 3b: Issue purge when there is a cache hit for the purge request.
    sub vcl_hit {
      if (req.request == "PURGE") {
        purge;
        error 200 "Purged.";
      }
    }
    
    # Block 3c: Issue a no-op purge when there is a cache miss for the purge
    # request.
    sub vcl_miss {
      if (req.request == "PURGE") {
         purge;
         error 200 "Purged.";
      }
    }
    
  6. Generate the User-Agent based key: vcl_recv is the entry point for every request to the cache. It should be redefined to call the subroutine responsible for generating the User-Agent based key used in the cache key hash. Purge requests also need to be processed here: in Varnish 3.x they are routed to vcl_hit or vcl_miss via return (lookup), while in 4.x return (purge) handles them directly.
    Varnish 3.x:
    # Block 4: In vcl_recv, on receiving a request, call the method responsible for
    # generating the User-Agent based key for hashing into the cache.
    sub vcl_recv {
      call generate_user_agent_based_key;
      # Block 3d: Verify the ACL for an incoming purge request and handle it.
      if (req.request == "PURGE") {
        if (!client.ip ~ purge) {
          error 405 "Not allowed.";
        }
        return (lookup);
      }
      # Blocks which decide whether cache should be bypassed or not go here.
    }
    
    Varnish 4.x:
    # Block 4: In vcl_recv, on receiving a request, call the method responsible for
    # generating the User-Agent based key for hashing into the cache.
    sub vcl_recv {
      call generate_user_agent_based_key;
      # Block 3d: Verify the ACL for an incoming purge request and handle it.
      if (req.method == "PURGE") {
        if (!client.ip ~ purge) {
          return (synth(405,"Not allowed."));
        }
        return (purge);
      }
      # Blocks which decide whether cache should be bypassed or not go here.
    }
    
  7. Bypassing the cache for certain requests: Requests for .pagespeed. resources are set to bypass the cache since these are already cached in the PageSpeed server. Adding these to the cache would cause bloating due to identical copies of the same resource getting cached in different User-Agent fragments.
      # Block 5a: Bypass the cache for .pagespeed. resource. PageSpeed has its own
      # cache for these, and these could bloat up the caching layer.
      if (req.url ~ "\.pagespeed\.([a-z]\.)?[a-z]{2}\.[^.]{10}\.[^.]+") {
        # Skip the cache for .pagespeed. resource.  PageSpeed has its own
        # cache for these, and these could bloat up the caching layer.
        return (pass);
      }
    
    Request headers other than User-Agent can also alter the response sent by the backend server. For example, the Accept-Encoding header dictates whether the response will be gzipped or not.
      # Block 5b: Only cache responses to clients that support gzip.  Most clients
      # do, and the cache holds much more if it stores gzipped responses.
      if (req.http.Accept-Encoding !~ "gzip") {
        return (pass);
      }
    
    Cookies or source IP addresses could also influence the content served by your backend server. You should check for any such request headers to which your backend server is sensitive and consider incorporating them into the caching logic, either by adding them to the cache key or by bypassing the cache entirely (a cookie-based sketch is given after this list).
  8. Respecting Cache-Control and Content-Type response headers: If your backend server already serves cacheable HTML with appropriate Cache-Control headers, the max-age values will be respected by Varnish. However, these responses still need to be served with Cache-Control: no-cache, max-age=0. You can do this by redefining vcl_fetch as shown below:
    # Block 6: Mark HTML uncacheable by caches beyond our control.
    # Varnish 4.x calls this "vcl_backend_response" instead of "vcl_fetch".
    sub vcl_fetch {
       if (beresp.http.Content-Type ~ "text/html") {
         # Hide the upstream cache control header.
         # Varnish 4.x uses "unset" instead of "remove".
         remove beresp.http.Cache-Control;
         # Add no-cache Cache-Control header for html.
         set beresp.http.Cache-Control = "no-cache, max-age=0";
       }
       return (deliver);
    }
    
  9. Record hits/misses in response headers: vcl_deliver can be redefined to output a response header that indicates whether the response was a cache hit or cache miss. A sample redefinition is:
    # Block 7: Add a header for identifying cache hits/misses.
    sub vcl_deliver {
      if (obj.hits > 0) {
        set resp.http.X-Cache = "HIT";
      } else {
        set resp.http.X-Cache = "MISS";
      }
    }
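
As noted in step 7, cookies may also need to be factored into the caching logic. A hypothetical VCL rule that bypasses the cache when a session cookie is present (the cookie name "sessionid" is a placeholder for whatever your backend actually sets) could be added to vcl_recv alongside Blocks 5a and 5b:
      # Hypothetical addition to Block 5: bypass the cache for requests that
      # carry a session cookie, since such responses are usually personalized.
      # "sessionid" is a placeholder cookie name.
      if (req.http.Cookie ~ "sessionid") {
        return (pass);
      }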
    

Configuring generic caching proxy servers

If you want to use a caching proxy server other than Varnish or Nginx's proxy_cache, it needs to be able to:

  • Accept purge requests via HTTP. Both PURGE and GET are supported, and PageSpeed can add a prefix to the path of a purge request.
  • Allow setting a custom cache key based on User-Agent regexps.
  • Allow setting custom request headers for upstream servers based on User-Agent regexps.
  • Override outgoing caching headers for text/html responses.
  • Bypass the cache for certain requests.
If your caching proxy server supports all of these things, you can have a look at the Varnish and proxy_cache configurations described in the above sections and try to convert them to the configuration language of your caching server. Please let us know if you get it working or if you run into issues via the discussion group.

Integration with beaconing dependent filters

Note: New feature as of 1.8.31.2

Several filters such as inline_images, inline_preview_images, lazyload_images and prioritize_critical_css depend extensively on client beacons to determine critical images and CSS. When such filters are enabled, pages periodically have beaconing JavaScript inserted as part of the rewriting process. The configuration recommended in the previous sections will not work effectively with such filters because instrumented pages can be cached in the downstream caching layer and served to multiple users. This will result in excessive beacon traffic to the server, most of which will be ignored, thus reducing the quantity of valid beacon data available to the server.

The instructions in this section will allow filters that depend on beaconing to interact well with downstream caches. A certain percentage of the traffic seen by the downstream caching layer will be redirected to the backend with a special beaconing header. This beaconing header will force the PageSpeed server to instrument the page whenever the header value matches the configured beaconing key value. These intermittent instrumentation opportunities will ensure that the server receives sufficient beacon responses. Instrumented pages will also be made uncacheable so that the instrumented page will be served to only one user whose beacon response will be accepted by the server as valid.

Configuring PageSpeed servers

PageSpeed servers must be configured to specify a beaconing key which is known and used only by the downstream caching layers. Choose a key that external users cannot guess, because knowledge of it would allow them to force the servers to instrument large numbers of pages and to send potentially malicious beacon responses that corrupt the server's view of critical images and CSS. You should also ensure that your server does not echo request headers back to the user, since that could reveal the secret key to an attacker. The beaconing key can be specified as follows:

Apache:
ModPagespeedDownstreamCacheRebeaconingKey <your-secret-key>
Nginx:
pagespeed DownstreamCacheRebeaconingKey <your-secret-key>;

Configuring proxy_cache servers

If you are using proxy_cache as your downstream caching layer, here are the configuration snippets required in addition to the ones mentioned in the previous proxy_cache section.
  1. Install an Nginx module that helps generate random numbers: The following example snippets use the ngx_set_misc module for generating random numbers. Any module with the same functionality can be used.
  2. Send a small percentage of traffic to the backend with the beaconing key: In the proxy_cache server block, add the following snippet just before the relevant location block is defined. Ideally, the special beaconing header would be set on 5% of cache HITs and 25% of MISSes before passing these requests along to the backend for instrumentation. However, since Nginx does not allow the cache to be bypassed after the HIT/MISS status is known, a constant 5% of all traffic is used for this purpose, as indicated below.
    set_random $rand 0 100;
    set $should_beacon_header_val "";
    if ($rand ~* "^[0-4]$") {
      set $should_beacon_header_val <your-secret-key>;
      set $bypass_cache 1;
    }
    
    In the "/" location block where other proxy_cache directives have been specified, add in the following snippet to set the beaconing header to the value computed above.
    proxy_set_header PS-ShouldBeacon $should_beacon_header_val;
    
  3. Remove incoming beaconing headers: Add the following snippet to the "/" location block to avoid passing incoming beaconing request headers to the backend. This prevents malicious users from being able to request the instrumented page even if they are able to discover the secret key.
    proxy_hide_header PS-ShouldBeacon;
    

Configuring Varnish servers

If you are using Varnish as your downstream caching layer, here are the configuration snippets required in addition to the ones mentioned in the previous Varnish section.
  1. Import a module that helps generate random numbers: The Varnish Standard Module can be imported at the beginning of the vcl file and used for generating random numbers.
    import std;
    
  2. Send a small percentage of traffic to the backend with the beaconing key: Define the beaconing key value in vcl_recv for reuse in other blocks.
      # We want to support beaconing filters, i.e., one or more of inline_images,
      # lazyload_images, inline_preview_images or prioritize_critical_css are
      # enabled. We define a placeholder constant called ps_should_beacon_key_value
      # so that some percentages of hits and misses can be sent to the backend
      # with this value used for the PS-ShouldBeacon header to force beaconing.
      set req.http.ps_should_beacon_key_value = "<your-secret-key>";
    
    vcl_hit should be modified to send a small percentage of its traffic to the backend by bypassing the cache and specifying the beaconing key.
    sub vcl_hit {
      # ... Old content goes here ...
      # Send 5% of the HITs to the backend for instrumentation.
      if (std.random(0, 100) < 5) {
        set req.http.PS-ShouldBeacon = req.http.ps_should_beacon_key_value;
        return (pass);
      }
    }
    
    vcl_miss should also be modified to send a somewhat larger percentage of its traffic to the backend with the specified beaconing key.
    sub vcl_miss {
      # ... Old content goes here ...
      # Send 25% of the MISSes to the backend for instrumentation.
      if (std.random(0, 100) < 25) {
        set req.http.PS-ShouldBeacon = req.http.ps_should_beacon_key_value;
        return (pass);
      }
    }
    
  3. Remove incoming beaconing headers: Add the following snippet to vcl_recv to avoid passing incoming beaconing request headers to the backend. This prevents malicious users from being able to request the instrumented page even if they are able to discover the secret key.
      # Incoming PS-ShouldBeacon headers should not be allowed since this will allow
      # external entities to force the server to instrument pages.
      remove req.http.PS-ShouldBeacon;