Full Specification

Abstract

This document describes an agreement between web servers and search engine crawlers that allows for dynamically created content to be visible to crawlers. Google currently supports this agreement. The hope is that other search engines will also adopt this proposal.

Basic definitions

  • Web application: In this document, a web application is an AJAX-enabled, interactive web application.
  • State: While traditional static web sites consist of many pages, a more appropriate term for AJAX applications is "state". An application consists of a number of states, where each state constitutes a specific user experience or a response to user input. Examples of states: For a mail application, states could be base state, inbox, compose, etc. For a chess application, states could be base state, start new game, but also current state x of the chessboard, including information about past moves, whose player's turn it is, and so forth. In an AJAX application, a state often corresponds to a URL with a hash fragment.
  • Hash fragments: Traditionally, hash fragments (that is, everything after # in the URL) have been used to indicate one portion of a static HTML document. By contrast, AJAX applications often use hash fragments in another function, namely to indicate state. For example, when a user navigates to the URL http://www.example.com/ajax.html#key1=value1&key2=value2, the AJAX application will parse the hash fragment and move the application to the "key1=value1&key2=value2" state. This is similar in spirit to moving to a portion of a static document, that is, the traditional use of hash fragments. History (the back button) in AJAX applications is generally handled with these hash fragments as well. Why are hash fragments used in this way? While the same effect could often be achieved with query parameters (for example, ?key1=value1&key2=value2), hash fragments have the advantage that in and of themselves, they do not incur an HTTP request and thus no round-trip from the browser to the server and back. In other words, when navigating from www.example.com/ajax.html to www.example.com/ajax.html#key1=value1&key2=value2, the web application moves to the state key1=value1&key2=value2 without a full page refresh. As such, hash fragments are an important tool in making AJAX applications fast and responsive. Importantly, however, hash fragments are not part of HTTP requests (and as a result they are not sent to the server), which is why our approach must handle them in a new way. See RFC 3986 for more details on hash fragments.
  • Query parameters: Query parameters (for example, ?s=value in the URL) are used by web sites and applications to post to or obtain information from the server. They incur a server round-trip and full page reload. In other words, navigating from www.example.com to www.example.com?s=value is handled by an HTTP request to the server and a full page reload. See RFC 3986 for more details. Query parameters are routinely used in AJAX applications as well.
  • HTML snapshot: An HTML snapshot is the serialization of the DOM that the browser produces when loading the page, including executing any JavaScript needed to render the initial page.
  • Pretty URL: Any URL containing a hash fragment beginning with !, for example, www.example.com?myquery#!key1=value1&key2=value2.
  • Ugly URL: Any URL containing a query parameter with the key _escaped_fragment_, for example, www.example.com?myquery&_escaped_fragment_=key1=value1%26key2=value2.

Bidirectional mapping between #! URLs and _escaped_fragment_ URLs

A bidirectional mapping exists between pretty and ugly URLs:

?_escaped_fragment_=key1=value1%26key2=value2: used for crawling only, indicates an indexable AJAX app state

#!key1=value1&key2=value2: used for normal (browser) web site interaction

Mapping from #! to _escaped_fragment_ format

Each URL that contains a hash fragment beginning with the exclamation mark is considered a #! URL. Note that any URL may contain at most one hash fragment. Each pretty (#!) URL has a corresponding ugly (_escaped_fragment_) URL, which is derived with the following steps:

  1. The hash fragment becomes part of the query parameters.
  2. The hash fragment is indicated in the query parameters by preceding it with _escaped_fragment_=
  3. Some characters are escaped when the hash fragment becomes part of the query parameters. These characters are listed below.
  4. All other parts of the URL (host, port, path, existing query parameters, and so on) remain unchanged.

Mapping from _escaped_fragment_ format to #! format

Any URL whose last query parameter has the key _escaped_fragment_ is considered an _escaped_fragment_ URL. The URL must contain exactly one _escaped_fragment_ parameter, and it must be the last query parameter. The corresponding #! URL can be derived with the following steps:

  1. Remove from the URL all tokens beginning with _escaped_fragment_= (Note especially that the = must be removed as well).
  2. Remove from the URL the trailing ? or & (depending on whether the URL had query parameters other than _escaped_fragment_).
  3. Add to the URL the tokens #!.
  4. Add to the URL all tokens after _escaped_fragment_= after unescaping them.

Note: As is explained below, there is a special syntax for pages without hash fragments, but that still contain dynamic Ajax content. For those pages, to map from the _escaped_fragment_ URL to the original URL, omit steps 3 and 4 above.
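
The four steps above, together with the special case from the note, can be sketched in Python. This is only an illustration under stated assumptions: the function name ugly_to_pretty is hypothetical, and the sketch treats an empty _escaped_fragment_ value as the "page without hash fragment" case, which is a simplification.

```python
from urllib.parse import unquote

TOKEN = "_escaped_fragment_="


def ugly_to_pretty(url: str) -> str:
    """Map an _escaped_fragment_ (ugly) URL back to its #! (pretty) URL.

    Hypothetical helper; assumes the token appears exactly once and is
    the last query parameter, as the mapping rules require.
    """
    index = url.find(TOKEN)
    if index <= 0 or url[index - 1] not in ("?", "&") or url.count(TOKEN) != 1:
        raise ValueError("not a valid _escaped_fragment_ URL")

    # Steps 1 and 2: drop the token and the '?' or '&' that precedes it.
    base = url[: index - 1]

    # Step 4: unescape everything after the token (UTF-8 percent-decoding).
    fragment = unquote(url[index + len(TOKEN):])

    # Assumption: an empty value means the page had no hash fragment,
    # so steps 3 and 4 are omitted and no "#!" is appended.
    if not fragment:
        return base

    # Step 3: add "#!" followed by the unescaped fragment.
    return f"{base}#!{fragment}"
```

For example, ugly_to_pretty("www.example.com?myquery&_escaped_fragment_=key1=value1%26key2=value2") yields www.example.com?myquery#!key1=value1&key2=value2.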

Escaping characters in the bidirectional mapping

The following characters will be escaped when moving the hash fragment string to the query parameters of the URL, and must be unescaped by the web server to obtain the original URL:

  • %00..20
  • %23
  • %25..26
  • %2B
  • %7F..FF

Control characters (0x00..1F and 0x7F) should be avoided. Non-ASCII text will be converted to UTF-8 before escaping.
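
As a rough illustration of the escaping rules listed above, the following Python sketch percent-escapes exactly the listed byte ranges after UTF-8 encoding and then builds the ugly URL from a pretty one. The names escape_fragment and pretty_to_ugly are hypothetical and not part of the specification.

```python
def _must_escape(byte: int) -> bool:
    """True for the byte values listed above: 00..20, 23, 25..26, 2B, 7F..FF."""
    return byte <= 0x20 or byte in (0x23, 0x25, 0x26, 0x2B) or byte >= 0x7F


def escape_fragment(fragment: str) -> str:
    """Percent-escape a hash fragment (without the leading '!')."""
    # Non-ASCII text is converted to UTF-8 before escaping.
    return "".join(
        f"%{byte:02X}" if _must_escape(byte) else chr(byte)
        for byte in fragment.encode("utf-8")
    )


def pretty_to_ugly(url: str) -> str:
    """Map a #! (pretty) URL to its _escaped_fragment_ (ugly) URL."""
    base, _, fragment = url.partition("#")
    if not fragment.startswith("!"):
        raise ValueError("not a #! URL")
    # Use '&' if the URL already has query parameters, otherwise '?'.
    delimiter = "&" if "?" in base else "?"
    return f"{base}{delimiter}_escaped_fragment_={escape_fragment(fragment[1:])}"
```

For example, pretty_to_ugly("www.example.com#!key1=value1&key2=value2") yields www.example.com?_escaped_fragment_=key1=value1%26key2=value2. Note that "=" is not in the list of escaped characters and is therefore left as-is.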

Role of the Search Engine Crawler

Transformation of URL

  1. URLs of the format domain[:port]/path#!hashfragment, for example, www.example.com#!key1=value1&key2=value2 are temporarily transformed into domain[:port]/path?_escaped_fragment_=hashfragment, such as www.example.com?_escaped_fragment_=key1=value1%26key2=value2. In other words, a hash fragment beginning with an exclamation mark ('!') is turned into a query parameter. We refer to the former as "pretty URLs" and to the latter as "ugly URLs".
  2. URLs of the format domain[:port]/path?queryparams#!hashfragment (for example, www.example.com?user=userid#!key1=value1&key2=value2) are temporarily transformed into domain[:port]/path?queryparams&_escaped_fragment_=hashfragment (for the above example, www.example.com?user=userid&_escaped_fragment_=key1=value1%26key2=value2). In other words, a hash fragment beginning with an exclamation mark ('!') is made part of the existing query parameters by adding a query parameter with the key "_escaped_fragment_" and the value of the hash fragment without the "!". As in this case the URL already contains query parameters, the new query parameter is delimited from the existing ones with the standard delimiter '&'. We refer to the former #! as "pretty URLs" and to the latter _escaped_fragment_ URLs as "ugly URLs".
  3. Some characters are escaped when making a hash fragment part of the query parameters. See the previous section for more information.
  4. If a page has no hash fragments, but contains <meta name="fragment" content="!"> in the <head> of the HTML, the crawler will transform the URL of this page from domain[:port]/path to domain[:port]/path?_escaped_fragment_= (or from domain[:port]/path?queryparams to domain[:port]/path?queryparams&_escaped_fragment_=) and will then access the transformed URL. For example, if www.example.com contains <meta name="fragment" content="!"> in the head, the crawler will transform this URL into www.example.com?_escaped_fragment_= and fetch that URL from the web server.

Request

The crawler agrees to request from the server ugly URLs of the format:

  • domain[:port]/path?_escaped_fragment_=hashfragment
  • domain[:port]/path?queryparams&_escaped_fragment_=hashfragment
  • domain[:port]/path?_escaped_fragment_=
  • domain[:port]/path?queryparams&_escaped_fragment_=

Search result

The search engine agrees to display in the search results the corresponding pretty URLs:

  • domain[:port]/path#!hashfragment
  • domain[:port]/path?queryparams#!hashfragment
  • domain[:port]/path
  • domain[:port]/path?queryparams

Role of the application and web server

Opting into the AJAX crawling scheme

The application must opt into the AJAX crawling scheme to notify the crawler to request ugly URLs. An application can opt in with either or both of the following:

  • Use #! in your site's hash fragments.
  • Add a trigger to the head of the HTML of a page without a hash fragment (for example, your home page):

    <meta name="fragment" content="!">
    

Once the scheme is implemented, AJAX URLs containing hash fragments with #! are eligible to be crawled and indexed by the search engine.

Transformation of URL

In response to a request of a URL that contains _escaped_fragment_ (which should always be a request from a crawler), the server agrees to return an HTML snapshot of the corresponding pretty #! URL. See above for the mapping between _escaped_fragment_ (ugly) URLs and #! (pretty) URLs.

Serving the HTML snapshot corresponding to the dynamic page

In response to an _escaped_fragment_ URL, the origin server agrees to return to the crawler an HTML snapshot of the corresponding #! URL. The HTML snapshot must contain the same content as the dynamically created page.
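
On the server side, this behavior might look roughly like the following minimal WSGI sketch in Python. It is not a definitive implementation: the SNAPSHOTS table, the APP_SHELL page, and the helper fragment_from_query are all hypothetical, and a real site would generate or look up its snapshots in its own way.

```python
from urllib.parse import unquote

TOKEN = "_escaped_fragment_="

# Hypothetical pre-rendered snapshots, keyed by the unescaped hash fragment.
SNAPSHOTS = {
    "": "<html><body>Home snapshot</body></html>",
    "key1=value1&key2=value2": "<html><body>State snapshot</body></html>",
}

# Hypothetical page served to normal browsers; JavaScript renders the state.
APP_SHELL = '<html><head><meta name="fragment" content="!"></head><body></body></html>'


def fragment_from_query(query: str):
    """Return the unescaped fragment for a crawler request, else None."""
    last_param = query.rsplit("&", 1)[-1]  # the token must be the last parameter
    if not last_param.startswith(TOKEN):
        return None
    return unquote(last_param[len(TOKEN):])


def application(environ, start_response):
    """WSGI entry point: serve snapshots for _escaped_fragment_ requests."""
    fragment = fragment_from_query(environ.get("QUERY_STRING", ""))
    if fragment is None:
        status, body = "200 OK", APP_SHELL
    elif fragment in SNAPSHOTS:
        status, body = "200 OK", SNAPSHOTS[fragment]
    else:
        status, body = "404 Not Found", "<html><body>No snapshot</body></html>"
    payload = body.encode("utf-8")
    start_response(status, [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(payload))),
    ])
    return [payload]
```

The sketch could be served locally with wsgiref.simple_server.make_server from the Python standard library.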

HTML snapshots can be obtained in an offline process or dynamically in response to a crawler request. For a guide on producing an HTML snapshot, see the HTML snapshot section.

Pages without hash fragments

It may be impossible or undesirable for some pages to have hash fragments in their URLs. For this reason, this scheme has a special provision for such pages: in order to indicate that a page without a hash fragment should be crawled again in _escaped_fragment_ form, it is possible to embed a special meta tag into the head of its HTML. The syntax for this meta tag is as follows:

<meta name="fragment" content="!">

The following important restrictions apply:

  1. The meta tag may only appear in pages without hash fragments.
  2. Only "!" may appear in the content field.
  3. The meta tag must appear in the head of the document.

The crawler treats this meta tag as follows: If the page www.example.com contains the meta tag in its head, the crawler will retrieve the URL www.example.com?_escaped_fragment_=. It will index the content of the page www.example.com and will display www.example.com in search results.

As noted above, the mapping from the _escaped_fragment_ to the #! syntax is slightly different in this case: to retrieve the original URL, the web server instead simply removes the tokens _escaped_fragment_= (note the =) from the URL. In other words, you want to end up with the URL www.example.com instead of www.example.com#!.

Warning: Should the content for www.example.com?_escaped_fragment_= return a 404 code, no content will be indexed for www.example.com! So, be careful if you add this meta tag to your page and make sure an HTML snapshot is returned.
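
To illustrate how a crawler could apply the meta tag rules from this section, here is a small Python sketch based on the standard html.parser module. The class FragmentMetaFinder and the functions has_fragment_meta and crawl_url are hypothetical names; how any real crawler detects the tag is not described in this document.

```python
from html.parser import HTMLParser


class FragmentMetaFinder(HTMLParser):
    """Looks for <meta name="fragment" content="!"> inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            attributes = dict(attrs)
            # Only "!" may appear in the content field.
            if attributes.get("name") == "fragment" and attributes.get("content") == "!":
                self.found = True

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def has_fragment_meta(html: str) -> bool:
    finder = FragmentMetaFinder()
    finder.feed(html)
    finder.close()
    return finder.found


def crawl_url(url: str, html: str) -> str:
    """Return the URL to fetch for a page whose HTML was already retrieved."""
    # The meta tag only applies to pages without hash fragments.
    if "#" in url or not has_fragment_meta(html):
        return url
    delimiter = "&" if "?" in url else "?"
    return f"{url}{delimiter}_escaped_fragment_="
```

A tag placed in the body rather than the head is ignored by this sketch, in line with the third restriction above.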

Hyperlinks and Sitemaps

In order to crawl your site's URLs, a crawler must be able to find them. Here are two common ways to accomplish this:

  1. Hyperlinks: An HTML page or an HTML snapshot can contain hyperlinks to pretty URLs, that is, URLs containing #! hash fragments. Note: The crawler will not follow links extracted from HTML that contain _escaped_fragment_.
  2. Sitemap: Pretty URLs may be listed in Sitemaps. For more information on Sitemaps, please see www.sitemaps.org.
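
As a sketch of the Sitemap option, the following Python snippet builds a minimal Sitemap that lists pretty URLs. The function name build_sitemap and the example URL are illustrative assumptions. One practical detail is that the & characters commonly found in hash fragments must be XML-escaped inside the Sitemap, which ElementTree does automatically.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal Sitemap XML string listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes '&' as '&amp;' when serializing the text.
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap([
    "http://www.example.com/ajax.html#!key1=value1&key2=value2",
])
```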

Backward compatibility to current practice

Current practices will still be supported. Hijax remains a valid solution, as we describe here. Giving the crawler access to static content remains the main goal.

Existing uses of #!

A few web pages already use exclamation marks as the first character in a hash fragment. Because hash fragments are not part of the URL that is sent to a server, such URLs have never been crawled. In other words, such URLs are not currently in the search index.

Under the new scheme, they can be crawled. In other words, a crawler will map each #! URL to its corresponding _escaped_fragment_ URL and request this URL from the web server. Because the site uses the pretty URL syntax (that is, #! hash fragments), the crawler will assume that the site has opted into the AJAX crawling scheme. This can cause problems, because the crawler will not get any meaningful content for these URLs if the web server does not return an HTML snapshot.

There are two options:

  1. The site adopts the AJAX crawling scheme and returns HTML snapshots.
  2. If this is not desired, it is possible to opt out of the scheme by adding a directive to the robots.txt file:

    Disallow: /*_escaped_fragment_
