Extend Cache

Configuration

The 'Extend Cache' filter is enabled by specifying:

Apache:
ModPagespeedEnableFilters extend_cache
Nginx:
pagespeed EnableFilters extend_cache;

in the configuration file. This is equivalent to enabling all three of extend_cache_images, extend_cache_scripts, and extend_cache_css.

Also see: extend_cache_pdfs.

Description

'Extend Cache' seeks to improve the cacheability of a web page's resources without compromising the ability of site owners to change the resources and have those changes propagate to users' browsers.

This filter implements the "optimize caching" best practice, as applied to the browser.

Operation

The 'Extend Cache' filter rewrites the URL references in the HTML page to include a hash of the resource content (if rewrite_css is enabled then image URLs in CSS will also be rewritten). Thus if the site owners change the resource content, then the URL for the rewritten resource will also change. The old content in the user's browser cache will not be referenced again, because it will not match the new name.

The 'Extend Cache' filter also rewrites the HTTP header to extend the max-age value of the cacheable resource to 31536000 seconds, which is one year.

For example, for the following HTML tag/HTTP header pair:

HTML tag   : <img src="images/logo.gif">
HTTP header: Cache-Control:public, max-age=300

PageSpeed will rewrite these into:

HTML tag   : <img src="images/logo.gif.pagespeed.ce.xo4He3_gYf.gif">
HTTP header: Cache-Control:public, max-age=31536000
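
The naming scheme in this example can be sketched in a few lines of Python. This is an illustrative sketch, not PageSpeed's implementation: the choice of SHA-256, the 10-character truncation, and the function names content_hash, cache_extended_url, and cache_extended_header are assumptions made here. Only the .pagespeed.ce. URL shape and the one-year max-age come from the text above.

```python
import base64
import hashlib
import posixpath

ONE_YEAR_SECONDS = 31536000  # max-age served for cache-extended resources


def content_hash(content: bytes, length: int = 10) -> str:
    """Short URL-safe digest of the resource bytes.

    The hash function and length are assumptions for illustration only.
    """
    digest = hashlib.sha256(content).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii")[:length]


def cache_extended_url(url: str, content: bytes) -> str:
    """Append .pagespeed.ce.<hash>.<ext> to the original URL."""
    ext = posixpath.splitext(url)[1].lstrip(".")
    return f"{url}.pagespeed.ce.{content_hash(content)}.{ext}"


def cache_extended_header() -> str:
    """Cache-Control header served with the rewritten resource."""
    return f"Cache-Control: public, max-age={ONE_YEAR_SECONDS}"
```

In this sketch the same bytes always produce the same URL, and any change to the bytes produces a different one, which is what makes the one-year lifetime safe.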

PageSpeed uses the origin cache time-to-live (TTL), in this case 300 seconds, to periodically re-examine the content and see whether it has changed. If it has, the hash of the content changes too. It is therefore safe to serve the hashed URL with a long timeout; PageSpeed uses one year.

If the site owners change the logo, then PageSpeed will notice within 5 minutes and begin serving a different URL to users. But if the content does not change, then the hash will not change, and the copy in each user's browser will still be valid and reachable.

Thus the site owners are still in complete control of how rapidly they can deploy changes to the site, but this does not affect the effectiveness of the browser cache. Decreasing the TTL only affects how often PageSpeed will need to re-examine the resource.
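
The interaction between the origin TTL and the hashed URL can be modeled with a small simulation. This is a sketch under stated assumptions, not PageSpeed's code: the class name RewrittenResource, the fetch callback, the explicit now argument, and the SHA-256 hex digest are all hypothetical and chosen for illustration.

```python
import hashlib
import posixpath
from typing import Callable


class RewrittenResource:
    """Tracks one origin resource and the hashed URL currently served for it."""

    def __init__(self, url: str, fetch: Callable[[str], bytes], origin_ttl: int):
        self.url = url
        self.fetch = fetch            # hypothetical callback returning origin bytes
        self.origin_ttl = origin_ttl  # seconds, taken from the origin max-age
        self.expires_at = float("-inf")  # forces a fetch on first use
        self.hashed_url = ""

    def current_url(self, now: float) -> str:
        """Return the hashed URL, re-reading the origin once the TTL has expired."""
        if now >= self.expires_at:
            content = self.fetch(self.url)
            digest = hashlib.sha256(content).hexdigest()[:10]
            ext = posixpath.splitext(self.url)[1].lstrip(".")
            self.hashed_url = f"{self.url}.pagespeed.ce.{digest}.{ext}"
            self.expires_at = now + self.origin_ttl
        return self.hashed_url
```

With origin_ttl=300, a changed logo is picked up at the first request made 300 seconds or more after the previous check. If the bytes are unchanged at that point, the URL stays the same and browser copies remain valid.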

Cache extension is also built into other PageSpeed filters. All filters that rewrite resources include a content hash in the generated URL and serve the resource with a one-year TTL. The purpose of this filter is to extend cache lifetimes for all resources that are not otherwise optimized.

Example

You can see the filter in action at www.modpagespeed.com, with examples of cache-extending resources in HTML and in CSS.

Limitations

Cache extension is only applied to resources that are publicly cacheable to begin with. Cache extension is not done on resources that have Cache-Control: private or Cache-Control: no-cache.

This can be overridden with:

Apache:
ModPagespeedForceCaching on
Nginx:
pagespeed ForceCaching on;

This switch is intended for experimental purposes only, to help evaluate the benefit of cache extension against the effort of adding cache-control headers to resources. Live traffic should not be served this way.
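
The eligibility rule at the start of this section can be sketched as a small predicate. This is a simplified model, not PageSpeed's actual cacheability logic: it checks only the two directives named in the text (private and no-cache), and the function name is_cache_extendable and its force_caching parameter are hypothetical stand-ins for the ForceCaching switch.

```python
def is_cache_extendable(cache_control: str, force_caching: bool = False) -> bool:
    """Return True if a resource with this Cache-Control value may be cache-extended.

    Models only the directives named in the text; other headers and
    directives are deliberately ignored in this sketch.
    """
    if force_caching:
        # Experimental override, not meant for live traffic.
        return True
    directives = {
        part.strip().split("=", 1)[0].lower()
        for part in cache_control.split(",")
        if part.strip()
    }
    return not ({"private", "no-cache"} & directives)
```

For example, is_cache_extendable("public, max-age=300") is true, while is_cache_extendable("private, max-age=300") is false unless force_caching is set.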

The following configuration file fragment demonstrates how to configure caching headers in Apache. This is how the mod_pagespeed_example directory is set up.

# These caching headers are set up for the mod_pagespeed example, and
# also serve as a demonstration of good values to set for the entire
# site, if it is to be optimized by mod_pagespeed.
<Directory /var/www/mod_pagespeed_example>
  # Any caching headers set on HTML are ignored, and all HTML is served
  # uncacheable.  PageSpeed rewrites HTML files each time they are served.  The
  # first time mod_pagespeed sees an HTML file, it generally won't optimize it
  # fully.  It will optimize better after the second view.  Caching defeats this
  # behavior.

  # Images, styles, and JavaScript are all cache-extended for
  # a year by rewriting URLs to include a content hash.  mod_pagespeed
  # can only do this if the resources are cacheable in the first place.
  # The origin caching policy, set here to 10 minutes, dictates how
  # frequently mod_pagespeed must re-read the content files and recompute
  # the content-hash.  As long as the content doesn't actually change,
  # the content-hash will remain the same, and the resources stored
  # in browser caches will stay relevant.
  <FilesMatch "\.(jpg|jpeg|gif|png|js|css)$">
    Header set Cache-Control "public, max-age=600"
  </FilesMatch>
</Directory>

The equivalent configuration for Nginx would be:

# Make sure this goes after the .pagespeed. location regexp in your
# configuration file so that .pagespeed. resources don't get this header
# applied.
location /mod_pagespeed_example {
  location ~* \.(jpg|jpeg|gif|png|js|css)$ {
    add_header Cache-Control "public, max-age=600";
  }
}

Risks

This filter is considered low risk. However, the rewritten URL has a different name from the original URL, so JavaScript that uses URLs as templates can stop working. For example, consider a site that has <input type=image src="button.gif"> and runs JavaScript that turns button.gif into button-hover.gif when the user hovers over the button. With cache extension enabled, or any other filter that changes the URLs of images, PageSpeed would replace the HTML fragment with something like <input type=image src="button.gif.pagespeed.ce.xo4He3_gYf.gif">. If the script were coded as "insert '-hover' before the final '.'", it would construct an invalid hover URL of button.gif.pagespeed.ce.xo4He3_gYf-hover.gif. If this is a problem on your site, consider In-Place Resource Optimization.
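
The failure mode is easy to reproduce. The sketch below uses Python rather than browser JavaScript to show the string manipulation involved; the function name hover_url is hypothetical, and it implements the "insert '-hover' before the final '.'" rule described above.

```python
def hover_url(src: str) -> str:
    """Insert '-hover' before the final '.' of the given URL."""
    stem, dot, ext = src.rpartition(".")
    return f"{stem}-hover{dot}{ext}"


# Works on the original URL.
original = hover_url("button.gif")

# Produces a URL that does not exist once the src has been rewritten.
rewritten = hover_url("button.gif.pagespeed.ce.xo4He3_gYf.gif")
```

Here original is button-hover.gif, while rewritten is button.gif.pagespeed.ce.xo4He3_gYf-hover.gif, which matches no resource on the server.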

When applied to JavaScript files, this filter is sensitive to AvoidRenamingIntrospectiveJavascript. For example, a JavaScript file that calls document.getElementsByTagName('script') will not be cache-extended.