Optimize HTML

PageSpeed Service was turned off on August 3rd, 2015. Please see Turndown Information for PageSpeed Service.

Objective

Reduce payload size of HTML document and reduce processing on the browser when rendering the page.

PageSpeed rule

This rewriter implements the PageSpeed rule for minimizing payload size and optimizing browser rendering.

Operation

Optimize HTML is made up of the following rewriters, each of which controls the functionality described under its heading.

Collapse Whitespace

This rewriter reduces the bytes transmitted in an HTML file by replacing contiguous whitespace with a single whitespace character. Because HTML is often formatted with extra whitespace for human readability or as an incidental effect of the templates used to generate it, this technique can reduce the number of bytes needed to transmit HTML resources.

For example, if the HTML document looks like this:

<html>

  <head>
    <title>Hello,   world!   </title>
    <script> var x = 'Hello,   world!';</script>
  </head>

  <body>
    Hello, World!
    <pre>
      Hello,
        World!
    </pre>
  </body>

</html>

Then PageSpeed Service will rewrite it into:

<html>
<head>
<title>Hello, world!</title>
<script> var x = 'Hello,   world!';</script>
</head>
<body>
Hello, World!
<pre>
      Hello,
        World!
</pre>
</body>
</html>

This rewriter will not modify whitespace appearing within <pre>, <textarea>, <script> and <style> tags. Extraneous whitespace within inline scripts and styles can be removed using the Minify JavaScript and Minify CSS rewriters. This rewriter will attempt to preserve newline characters to an extent: a contiguous sequence of whitespace with at least one newline anywhere in it will always collapse to a single newline.
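The collapsing rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the PageSpeed Service implementation: the function name collapse_whitespace and the regular-expression approach are hypothetical, and the sketch does not reproduce the trimming of whitespace next to tags that the example output shows.

```python
import re

# Tags whose contents are copied through untouched, per the text above.
PRESERVED = ("pre", "textarea", "script", "style")

# Matches either a whole preserved element (group 1) or a whitespace run (group 3).
_TOKEN = re.compile(
    r"(<(?P<tag>" + "|".join(PRESERVED) + r")\b[^>]*>.*?</(?P=tag)\s*>)|(\s+)",
    re.IGNORECASE | re.DOTALL,
)


def collapse_whitespace(html):
    """Collapse whitespace runs outside <pre>, <textarea>, <script>, <style>."""

    def replace(match):
        if match.group(1) is not None:
            return match.group(1)  # preserved element: copy verbatim
        run = match.group(3)
        # A run containing a newline collapses to a newline, otherwise to a space.
        return "\n" if "\n" in run else " "

    return _TOKEN.sub(replace, html)
```

For instance, collapse_whitespace("<p>a   b</p>\n\n  <pre>  x </pre>") keeps the <pre> contents intact while the text before it shrinks to "<p>a b</p>" followed by a single newline.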

Limitations:

Although contiguous whitespace in HTML (beyond the first space) is normally ignored by the browser outside of tags like <pre> and <textarea>, one can use CSS properties such as "white-space: pre" to make the browser preserve whitespace within a portion of the document. Use of such properties is relatively rare; however, this rewriter is not yet CSS-aware, so any pages that might use such CSS properties (either statically or dynamically) should not use this rewriter at this time.

Combine HEADs

This rewriter combines multiple <head> sections in the HTML document into one. Technically HTML documents are not allowed to have multiple <head> sections, but sites which aggregate content from multiple sources sometimes have them. This rewriter moves the content from later <head> sections into the first head. This can change the order of content (e.g. CSS and JS) in the later <head> sections relative to intervening <body> elements.

For example, if the HTML document looks like this:

<html>
  <head>
    <link rel="stylesheet" type="text/css" href="styles/yellow.css">
    <link rel="stylesheet" type="text/css" href="styles/blue.css">
  </head>
  <body>
    <div class="blue yellow big bold">
      Hello, world!
    </div>
  </body>
  <head>
    <link rel="stylesheet" type="text/css" href="styles/big.css">
    <link rel="stylesheet" type="text/css" href="styles/bold.css">
  </head>
</html>

Then PageSpeed Service will rewrite it into:

<html>
  <head>
    <link rel="stylesheet" type="text/css" href="styles/yellow.css">
    <link rel="stylesheet" type="text/css" href="styles/blue.css">
    <link rel="stylesheet" type="text/css" href="styles/big.css">
    <link rel="stylesheet" type="text/css" href="styles/bold.css">
  </head>
  <body>
    <div class="blue yellow big bold">
      Hello, world!
    </div>
  </body>
</html>
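The merge shown in this example can be approximated as follows. This is a minimal sketch, assuming the whole document is available as one string and that <head> sections are well-formed; the name combine_heads and the regex-based parsing are hypothetical and are not how PageSpeed Service is known to work.

```python
import re

# Matches one <head>...</head> section; group 1 is its content.
# The \b keeps tags such as <header> from matching.
_HEAD = re.compile(r"<head\b[^>]*>(.*?)</head\s*>", re.IGNORECASE | re.DOTALL)


def combine_heads(html):
    """Move the content of later <head> sections into the first one."""
    heads = list(_HEAD.finditer(html))
    if len(heads) < 2:
        return html  # nothing to merge

    first = heads[0]
    extra = "".join(m.group(1) for m in heads[1:])

    # Delete later heads back to front so earlier offsets stay valid.
    out = html
    for m in reversed(heads[1:]):
        out = out[: m.start()] + out[m.end():]

    # Append the collected content at the end of the first head's content.
    insert_at = first.end(1)
    return out[:insert_at] + extra + out[insert_at:]
```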

This rewriter operates within the scope of a "flush window". Specifically, large or dynamically generated HTML files may be "flushed" by the resource generator before they are complete. If the rewriter encounters a flush prior to the end of the first <head>, then subsequent <head> sections will not be merged in.

Limitations:

In some browsers, the original version of the above example will flash briefly, because the browser renders the "Hello, world!" text before it sees the second set of stylesheet links that define "big" and "bold". This transformation eliminates that flash, but the end result is the same.

If there are style or script tags in the body, between two heads, then the rewrite pass can change their order. The risk is reduced if the Minify CSS rewriter is also enabled. Additionally, JavaScript that is executed before a later <head> will see a different view of the DOM in the presence of this rewriter. If there is such JavaScript embedded in the middle of a page then this rewriter may change its behavior.

Convert Meta Tags

This rewriter adds a response header that matches each meta tag with an http-equiv attribute. For example, <meta http-equiv="Content-Language" content="fr"> would be converted to Content-Language: fr in the response headers. The original tag is left unchanged.

Certain http-equiv meta tags, specifically those that specify content-type, require a browser to reparse the HTML document if they do not match the headers. By ensuring that the headers match the meta tags, these reparsing delays are avoided.
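As an illustration of the mapping from meta tags to headers, the sketch below collects header name and value pairs with Python's standard html.parser module. The class and function names are hypothetical, and how PageSpeed Service actually attaches the headers to the response is not covered here.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class MetaHeaderCollector(HTMLParser):
    """Collect (name, value) pairs from <meta http-equiv=... content=...> tags."""

    def __init__(self):
        super().__init__()
        self.headers = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)  # the parser lowercases attribute names
        name = attributes.get("http-equiv")
        value = attributes.get("content")
        if name and value is not None:
            self.headers.append((name, value))


def headers_from_meta(html: str) -> List[Tuple[str, str]]:
    """Return the response headers implied by the document's http-equiv tags."""
    collector = MetaHeaderCollector()
    collector.feed(html)
    collector.close()
    return collector.headers
```

Given the tag from the example above, headers_from_meta returns the single pair ("Content-Language", "fr"); the document itself is not modified.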

Elide Attributes

This rewriter reduces the transfer size of HTML files by removing attributes from tags when the specified value is equal to the default value for that attribute. This can save a modest number of bytes, and may make the document more compressible by canonicalizing the affected tags.

There are two cases where an attribute value can be removed. First, some attributes are "single-valued" or "boolean", in that the value specified for the attribute is irrelevant -- all that matters is whether the attribute is present or not. In such cases, the rewriter will remove the value from the tag, leaving only the attribute name.

For example, the following tag:

  <button name="ok" disabled="disabled">

can be rewritten to:

  <button name="ok" disabled>

The second case is an optional attribute with a default value. If an HTML tag includes an explicit value for an attribute (perhaps to aid readability) that is equal to the attribute's default value, the rewriter will remove the attribute name and value, knowing that the browser will infer the intended value anyway. For example, the following tag:

  <form method="get">

can be rewritten to:

  <form>
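The two cases can be summarized in a short sketch. The lookup tables below are illustrative assumptions containing only a few entries taken from the examples in this section; the actual lists used by the rewriter are not given here, and the name elide_attributes is hypothetical.

```python
from typing import Dict, Optional

# Illustrative tables only (assumptions, not the rewriter's real lists).
BOOLEAN_ATTRIBUTES = {"disabled", "checked", "selected"}
DEFAULT_VALUES = {("form", "method"): "get", ("td", "colspan"): "1"}


def elide_attributes(tag: str, attrs: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Return a copy of attrs with boolean values removed and defaults dropped.

    A value of None stands for an attribute written without a value.
    """
    result = {}
    for name, value in attrs.items():
        if name in BOOLEAN_ATTRIBUTES:
            result[name] = None  # case 1: keep the name, drop the value
        elif value is not None and DEFAULT_VALUES.get((tag, name)) == value.lower():
            continue  # case 2: equal to the default, drop the attribute entirely
        else:
            result[name] = value
    return result
```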

Limitations:

This rewriter must be wary of documents with an XHTML doctype, as removing the value from a single-valued attribute will result in invalid XHTML. The rewriter attempts to recognize XHTML doctype declarations and will disable this rewriting feature in such cases.

This rewriter can break CSS formatting on pages that select on default attributes. For example, it is possible to write a CSS rule such as:

  td[colspan="1"] { ... }

This rule matches all <td> elements with a colspan attribute whose value is "1". When this rewriter removes the colspan attribute, the table cell will still span a single column, but the CSS rule above will no longer match.

Further, JavaScript can be written that inspects the DOM looking for the presence of certain attributes. Such JavaScript may behave differently on a page which has removed otherwise unnecessary attributes.

Remove Comments

This rewriter eliminates HTML comments, which are often used to document the code or to comment out experiments. Note that this rewriter applies only to HTML files. CSS comments are eliminated with the Minify CSS rewriter, and JavaScript comments are eliminated with the Minify JavaScript rewriter.

This rewriter reduces the transfer size of HTML files by removing most HTML comments. Depending on the HTML file, this rewriter can significantly reduce the number of bytes transmitted on the network.

For example, if the HTML document looks like this:

<html>
  <head>
    <!-- Page title shown in the browser tab -->
    <title>Hello, world!</title>
  </head>
  <body>
    <!-- Greeting text -->
    Hello, World!
    <!-- <p>Old experiment, commented out</p> -->
  </body>
</html>

Then PageSpeed Service will rewrite it to:

<html>
  <head>
    <title>Hello, world!</title>
  </head>
  <body>
    Hello, World!
  </body>
</html>

This rewriter is aware of Internet Explorer conditional comments and does not remove them.
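A rough sketch of this behavior, assuming that a conditional comment can be recognized by its body starting with "[if", might look like the following. The function name remove_comments is hypothetical, and a regular expression like this one does not protect comment-like text inside scripts, so it should be read as an illustration rather than as the rewriter's logic.

```python
import re

# Matches one HTML comment; group 1 is the text between <!-- and -->.
_COMMENT = re.compile(r"<!--(.*?)-->", re.DOTALL)


def remove_comments(html):
    """Strip HTML comments but keep Internet Explorer conditional comments."""

    def replace(match):
        body = match.group(1)
        # Conditional comments look like <!--[if IE]> ... <![endif]-->
        if body.startswith("[if"):
            return match.group(0)  # keep unchanged
        return ""  # ordinary comment: remove

    return _COMMENT.sub(replace, html)
```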

Limitations:

Some web pages use comments to embed data or JavaScript, in order to reduce the parse time of the HTML document. This rewriter should be disabled for such pages, as it will remove the comments containing the data or JavaScript that is needed by these web pages.

Remove Quotes

This rewriter eliminates unnecessary quotation marks (either "" or '') from HTML attributes. While required by the various HTML specifications, browsers permit their omission when the value of an attribute is composed of a certain subset of characters (alphanumerics and some punctuation characters).

Quote removal produces modest savings in byte count on most pages. It may also benefit gzip compression by canonicalizing the textual representation of name=value pairs.

For example, if the HTML document looks like this:

<html>
  <head>
  </head>
  <body>
    <img src="BikeCrashIcn.png" align='left' alt="" border="0" width='70' height='30' >
  </body>
</html>

Then PageSpeed Service will rewrite it into:

<html>
  <head>
  </head>
  <body>
    <img src=BikeCrashIcn.png align=left alt="" border=0 width=70 height=30 >
  </body>
</html>

Only previously-quoted attributes are subject to quote removal. Quote removal occurs after most rewriting passes, so that any alterations to attribute values (such as rewritten URLs) will be correctly accounted for. This rewriter will act as a pass-through when XHTML is detected via doctype or content-type.
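The decision of whether a value can lose its quotes can be sketched as a character-set check. The exact set of characters the rewriter accepts is not stated here, so the pattern below (letters, digits, period, underscore, hyphen, colon) is an assumption, as is the helper name format_attribute.

```python
import html
import re

# Assumption: values made only of these characters are safe to leave unquoted.
_SAFE_UNQUOTED = re.compile(r"[A-Za-z0-9._:-]+")


def format_attribute(name, value):
    """Serialize one attribute, omitting quotes when the value allows it."""
    if _SAFE_UNQUOTED.fullmatch(value):
        return f"{name}={value}"
    # Empty values and values with spaces or other characters keep their quotes.
    return f'{name}="{html.escape(value, quote=True)}"'
```

With this check, src="BikeCrashIcn.png" becomes src=BikeCrashIcn.png, while alt="" keeps its quotes because an empty value cannot be written unquoted, matching the example above.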

Limitations:

Space savings from the rewriter may be small (so the cost of running the rewriter may outweigh its benefits in certain settings).

Trim URLs

This rewriter trims URLs by resolving them and making them relative to the base URL for the page. For example, on a page whose base URL is http://www.example.com/, "http://www.example.com/foo" would be shortened to "foo". This rewriter works only on URLs that are the values specified by src or href attributes. It also trims image URLs in CSS if Minify CSS is enabled.

This rewriter reduces the transfer size of HTML files by shortening most of the URLs. Depending on the HTML file, this rewriter can significantly reduce the number of bytes transmitted on the network. While it's useful for development to fully specify your URLs so that links don't break when things move around, these are bytes that are sent unnecessarily on the wire.

For example, if the HTML document looks like this:

<html>
  <head>
  <base href="http://www.example.com/">
  </head>
  <body>
    <a href="http://www.example.com/bar">Link with long URL</a>
    <img src="http://www.othersite.example.org/image.jpg">
  </body>
</html>

Then PageSpeed Service will rewrite it to:

<html>
  <head>
  <base href="http://www.example.com/">
  </head>
  <body>
    <a href="bar">Link with long URL</a>
    <img src="//www.othersite.example.org/image.jpg">
  </body>
</html>
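The two shortenings in this example (a relative path for a same-origin URL, and a scheme-relative URL for a different host on the same scheme) can be sketched with Python's urllib.parse. The rules are inferred from the example only; the function name trim_url is hypothetical, and edge cases such as path segments containing a colon are not handled.

```python
from urllib.parse import urlsplit, urlunsplit


def trim_url(url, base):
    """Shorten url relative to base, following the example above."""
    target, origin = urlsplit(url), urlsplit(base)
    if target.scheme != origin.scheme:
        return url  # different scheme: leave untouched

    if target.netloc != origin.netloc:
        # Same scheme, different host: drop only the scheme.
        return urlunsplit(("", target.netloc, target.path, target.query, target.fragment))

    # Same origin: strip the base directory from the front of the path if possible.
    base_dir = origin.path.rsplit("/", 1)[0] + "/"
    path = target.path
    if path.startswith(base_dir) and len(path) > len(base_dir):
        path = path[len(base_dir):]

    trimmed = urlunsplit(("", "", path, target.query, target.fragment))
    return trimmed or url  # never return an empty URL
```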

Limitations:

Only URLs referenced by href and src attributes and, if Minify CSS is enabled, URLs in CSS files are rewritten. URLs that occur elsewhere are not altered.