This document describes an agreement between web servers and search engine crawlers that allows for dynamically created content to be visible to crawlers. Google currently supports this agreement. The hope is that other search engines will also adopt this proposal.
- Web application: In this document, a web application is an AJAX-enabled, interactive web application.
- State: While traditional static web sites consist of many pages, a more appropriate term for AJAX applications is "state". An application consists of a number of states, where each state constitutes a specific user experience or a response to user input. Examples of states: For a mail application, states could be base state, inbox, compose, etc. For a chess application, states could be base state, start new game, but also current state x of the chessboard, including information about past moves, which player's turn it is, and so forth. In an AJAX application, a state often corresponds to a URL with a hash fragment.
- Hash fragments: Traditionally, hash fragments (that is, everything after # in the URL) have been used to indicate one portion of a static HTML document. By contrast, AJAX applications often use hash fragments in another function, namely to indicate state. For example, when a user navigates to the URL http://www.example.com/ajax.html#key1=value1&key2=value2, the AJAX application will parse the hash fragment and move the application to the "key1=value1&key2=value2" state. This is similar in spirit to moving to a portion of a static document, that is, the traditional use of hash fragments. History (the back button) in AJAX applications is generally handled with these hash fragments as well. Why are hash fragments used in this way? While the same effect could often be achieved with query parameters (for example, ?key1=value1&key2=value2), hash fragments have the advantage that in and of themselves, they do not incur an HTTP request and thus no round-trip from the browser to the server and back. In other words, when navigating from www.example.com/ajax.html to www.example.com/ajax.html#key1=value1&key2=value2, the web application moves to the state key1=value1&key2=value2 without a full page refresh. As such, hash fragments are an important tool in making AJAX applications fast and responsive. Importantly, however, hash fragments are not part of HTTP requests (and as a result they are not sent to the server), which is why our approach must handle them in a new way. See RFC 3986 for more details on hash fragments.
- Query parameters: Query parameters (for example, ?s=value in the URL) are used by web sites and applications to post to or obtain information from the server. They incur a server round-trip and full page reload. In other words, navigating from www.example.com to www.example.com?s=value is handled by an HTTP request to the server and a full page reload. See RFC 3986 for more details. Query parameters are routinely used in AJAX applications as well.
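- Pretty URL: Any URL containing a hash fragment beginning with !, for example, www.example.com#!key1=value1&key2=value2
- Ugly URL: Any URL containing a query parameter with the key _escaped_fragment_, for example, www.example.com?_escaped_fragment_=key1=value1%26key2=value2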
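To illustrate the difference between hash fragments and query parameters defined above, the following minimal sketch splits a URL with Python's standard `urllib.parse`. The helper name `request_target` is hypothetical, and it only approximates what a browser places on the HTTP request line.

```python
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """Return the path and query a client would send; the fragment is dropped."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


url = "http://www.example.com/ajax.html?s=value#key1=value1&key2=value2"
print(urlsplit(url).fragment)  # the state, visible only on the client side
print(request_target(url))     # what reaches the server: no fragment
```

The query string survives into the request target, while the fragment never leaves the client, which is why the scheme needs a separate URL form for crawlers.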
Mapping between #! URL and _escaped_fragment_ URL
A bidirectional mapping exists between pretty and ugly URLs:
?_escaped_fragment_=key1=value1%26key2=value2: used for crawling only, indicates an indexable AJAX app state
#!key1=value1&key2=value2: used for normal (browser) web site interaction
Each URL that contains a hash fragment beginning with the exclamation mark is considered a #! URL. Note that any URL may contain at most one hash fragment. Each pretty (#!) URL has a corresponding ugly (_escaped_fragment_) URL, which is derived with the following steps:
- The hash fragment becomes part of the query parameters.
- The hash fragment is indicated in the query parameters by preceding it with _escaped_fragment_=
- Some characters are escaped when the hash fragment becomes part of the query parameters. These characters are listed below.
- All other parts of the URL (host, port, path, existing query parameters, and so on) remain unchanged.
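The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: the function name `pretty_to_ugly` is hypothetical, and the set of escaped characters (`#`, `%`, `&`, `+`, plus space, control characters, and non-ASCII bytes) is assumed for the example.

```python
from urllib.parse import quote

# Assumption: only these printable ASCII characters are escaped; space,
# control characters and non-ASCII bytes are escaped by quote() as well.
_ESCAPED = "#%&+"
_SAFE = "".join(chr(c) for c in range(0x21, 0x7F) if chr(c) not in _ESCAPED)


def pretty_to_ugly(url: str) -> str:
    """Map a #! URL to its _escaped_fragment_ form; other URLs pass through."""
    base, sep, fragment = url.partition("#")
    if not sep or not fragment.startswith("!"):
        return url  # no hash fragment beginning with "!"
    escaped = quote(fragment[1:], safe=_SAFE)  # drop the "!" and escape
    joiner = "&" if "?" in base else "?"       # keep existing query parameters
    return f"{base}{joiner}_escaped_fragment_={escaped}"


print(pretty_to_ugly("www.example.com#!key1=value1&key2=value2"))
print(pretty_to_ugly("www.example.com?user=userid#!key1=value1&key2=value2"))
```

Everything before the `#` is copied unchanged, which corresponds to the last step in the list.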
Mapping from _escaped_fragment_ format to #! format
Any URL whose query parameters contain the special token _escaped_fragment_ as the last query parameter is considered an _escaped_fragment_ URL. Further, there must only be one _escaped_fragment_ in the URL, and it must be the last query parameter. The corresponding #! URL can be derived with the following steps:
- Remove from the URL all tokens beginning with _escaped_fragment_= (note especially that the = must be removed as well).
- Remove from the URL the trailing ? or & (depending on whether the URL had query parameters other than _escaped_fragment_).
- Add to the URL the tokens #!.
- Add to the URL all tokens after _escaped_fragment_= after unescaping them.
Note: As is explained below, there is a special syntax for pages that have no hash fragments but still contain dynamic AJAX content. For those pages, to map from the _escaped_fragment_ URL to the original URL, omit steps 3 and 4 above (the two "Add to the URL" steps).
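The reverse mapping can be sketched in the same way. The function name `ugly_to_pretty` is hypothetical, and the sketch assumes that an empty `_escaped_fragment_=` value always indicates a page without a hash fragment, so the two "add" steps are skipped in that case.

```python
from urllib.parse import unquote

TOKEN = "_escaped_fragment_="


def ugly_to_pretty(url: str) -> str:
    """Map an _escaped_fragment_ URL back to the URL shown to users."""
    head, sep, value = url.rpartition(TOKEN)
    if not sep or head[-1:] not in ("?", "&"):
        return url  # not an _escaped_fragment_ URL
    base = head[:-1]  # remove the trailing "?" or "&"
    if value == "":
        return base   # assumed meta-tag case: no "#!" is added
    return base + "#!" + unquote(value)


print(ugly_to_pretty("www.example.com?_escaped_fragment_=key1=value1%26key2=value2"))
print(ugly_to_pretty("www.example.com?_escaped_fragment_="))
```

`rpartition` is used because the token must be the last query parameter, so everything after it belongs to the escaped hash fragment.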
Escaping characters in the bidirectional mapping
The following characters will be escaped when moving the hash fragment string to the query parameters of the URL, and must be unescaped by the web server to obtain the original URL:
Control characters (0x00..1F and 0x7F) should be avoided. Non-ASCII text will be converted to UTF-8 before escaping.
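As a rough illustration of escaping on the crawler side and unescaping on the server side, the sketch below round-trips a hash fragment that contains non-ASCII text. The helper names are hypothetical, and the escaped character set is an assumption chosen for the example; the only escape visible in this document's own examples is `&` becoming `%26`.

```python
from urllib.parse import quote, unquote

# Assumption: "#", "%", "&" and "+" are escaped; quote() additionally escapes
# space, control characters and the UTF-8 bytes of non-ASCII text.
_SAFE = "".join(chr(c) for c in range(0x21, 0x7F) if chr(c) not in "#%&+")


def escape_fragment(fragment: str) -> str:
    """Escape a hash fragment for use as the _escaped_fragment_ value."""
    return quote(fragment, safe=_SAFE, encoding="utf-8")


def unescape_fragment(value: str) -> str:
    """Recover the original hash fragment on the web server."""
    return unquote(value, encoding="utf-8")


original = "q=caf\u00e9&page=2"
escaped = escape_fragment(original)
print(escaped)  # non-ASCII text appears as percent-encoded UTF-8 bytes
print(unescape_fragment(escaped) == original)
```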
Role of the Search Engine Crawler
Transformation of URL
- URLs of the format domain[:port]/path#!hashfragment, for example, www.example.com#!key1=value1&key2=value2 are temporarily transformed into domain[:port]/path?_escaped_fragment_=hashfragment, such as www.example.com?_escaped_fragment_=key1=value1%26key2=value2. In other words, a hash fragment beginning with an exclamation mark ('!') is turned into a query parameter. We refer to the former as "pretty URLs" and to the latter as "ugly URLs".
- URLs of the format domain[:port]/path?queryparams#!hashfragment (for example, www.example.com?user=userid#!key1=value1&key2=value2) are temporarily transformed into domain[:port]/path?queryparams&_escaped_fragment_=hashfragment (for the above example, www.example.com?user=userid&_escaped_fragment_=key1=value1%26key2=value2). In other words, a hash fragment beginning with an exclamation mark ('!') is made part of the existing query parameters by adding a query parameter with the key "_escaped_fragment_" and the value of the hash fragment without the "!". As in this case the URL already contains query parameters, the new query parameter is delimited from the existing ones with the standard delimiter '&'. We refer to the former #! URLs as "pretty URLs" and to the latter _escaped_fragment_ URLs as "ugly URLs".
- Some characters are escaped when making a hash fragment part of the query parameters. See the previous section for more information.
- If a page has no hash fragments, but contains <meta name="fragment" content="!"> in the <head> of the HTML, the crawler will transform the URL of this page from domain[:port]/path to domain[:port]/path?_escaped_fragment_= (or from domain[:port]/path?queryparams to domain[:port]/path?queryparams&_escaped_fragment_=) and will then access the transformed URL. For example, if www.example.com contains <meta name="fragment" content="!"> in the head, the crawler will transform this URL into www.example.com?_escaped_fragment_= and fetch it from the web server.
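For the last case in the list above, a page without a hash fragment that carries the meta tag, the transformation reduces to appending an empty `_escaped_fragment_=` parameter. The function name below is hypothetical and the sketch does not validate the URL beyond checking for a hash fragment.

```python
def snapshot_url_for_meta_page(url: str) -> str:
    """Build the URL a crawler would request for a page opted in by meta tag."""
    if "#" in url:
        raise ValueError("the meta tag applies only to URLs without a hash fragment")
    joiner = "&" if "?" in url else "?"  # respect existing query parameters
    return f"{url}{joiner}_escaped_fragment_="


print(snapshot_url_for_meta_page("www.example.com"))
print(snapshot_url_for_meta_page("www.example.com?user=userid"))
```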
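The crawler agrees to request from the server ugly URLs of the format:
- domain[:port]/path?_escaped_fragment_=hashfragment
- domain[:port]/path?queryparams&_escaped_fragment_=hashfragment
- domain[:port]/path?_escaped_fragment_=
- domain[:port]/path?queryparams&_escaped_fragment_=
The search engine agrees to display in the search results the corresponding pretty URLs:
- domain[:port]/path#!hashfragment
- domain[:port]/path?queryparams#!hashfragment
- domain[:port]/path
- domain[:port]/path?queryparams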
Role of the application and web server
Opting into the AJAX crawling scheme
The application must opt into the AJAX crawling scheme to notify the crawler to request ugly URLs. An application can opt in with either or both of the following:
- Use #! in your site's hash fragments.
- Add a trigger to the head of the HTML of a page without a hash fragment (for example, your home page):
<meta name="fragment" content="!">
Once the scheme is implemented, AJAX URLs containing hash fragments with #! are eligible to be crawled and indexed by the search engine.
Transformation of URL
In response to a request for a URL that contains _escaped_fragment_ (which should always be a request from a crawler), the server agrees to return an HTML snapshot of the corresponding pretty #! URL. See above for the mapping between _escaped_fragment_ (ugly) URLs and #! (pretty) URLs.
Serving the HTML snapshot corresponding to the dynamic page
In response to an _escaped_fragment_ URL, the origin server agrees to return to the crawler an HTML snapshot of the corresponding #! URL. The HTML snapshot must contain the same content as the dynamically created page.
HTML snapshots can be obtained in an offline process or dynamically in response to a crawler request. For a guide on producing an HTML snapshot, see the HTML snapshot section.
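A server-side handler for this agreement might look like the minimal WSGI sketch below. It is only an illustration: the `SNAPSHOTS` table, the `APP_SHELL` page, and the choice to answer unknown states with a 404 are all assumptions, and a real application would produce snapshots with whatever offline or dynamic process it uses.

```python
from urllib.parse import parse_qs

# Hypothetical pre-rendered snapshots, keyed by the unescaped hash fragment.
SNAPSHOTS = {
    "": "<html><body>Home page snapshot</body></html>",
    "key1=value1&key2=value2": "<html><body>State snapshot</body></html>",
}
# Hypothetical page served to browsers, which render states with JavaScript.
APP_SHELL = "<html><body><script src='app.js'></script></body></html>"


def app(environ, start_response):
    """Serve an HTML snapshot for _escaped_fragment_ requests."""
    query = parse_qs(environ.get("QUERY_STRING", ""), keep_blank_values=True)
    if "_escaped_fragment_" in query:
        state = query["_escaped_fragment_"][-1]  # parse_qs already unescapes
        body = SNAPSHOTS.get(state)
        status = "200 OK" if body is not None else "404 Not Found"
        if body is None:
            body = "<html><body>No snapshot</body></html>"
    else:
        status, body = "200 OK", APP_SHELL
    start_response(status, [("Content-Type", "text/html; charset=utf-8")])
    return [body.encode("utf-8")]
```

`keep_blank_values=True` matters here: without it, a request ending in `_escaped_fragment_=` with an empty value would be indistinguishable from a normal browser request.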
Pages without hash fragments
It may be impossible or undesirable for some pages to have hash fragments in their URLs. For this reason, this scheme has a special provision for such pages: in order to indicate that a page without a hash fragment should be crawled again in _escaped_fragment_ form, it is possible to embed a special meta tag into the head of its HTML.
The syntax for this meta tag is as follows:
<meta name="fragment" content="!">
The following important restrictions apply:
- The meta tag may only appear in pages without hash fragments.
- Only "!" may appear in the content field.
- The meta tag must appear in the head of the document.
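The restrictions above can be checked mechanically. The sketch below uses Python's standard `html.parser` to report whether a document's head contains the tag with exactly "!" as its content; the class and function names are hypothetical, and this is not a description of how any particular crawler parses pages.

```python
from html.parser import HTMLParser


class FragmentMetaFinder(HTMLParser):
    """Look for <meta name="fragment" content="!"> inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            attributes = dict(attrs)
            if attributes.get("name") == "fragment" and attributes.get("content") == "!":
                self.found = True

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def has_fragment_meta(html: str) -> bool:
    """Return True if the head of the document contains the fragment meta tag."""
    finder = FragmentMetaFinder()
    finder.feed(html)
    finder.close()
    return finder.found
```

A tag placed in the body, or one whose content is anything other than "!", is ignored by this check.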
The crawler treats this meta tag as follows: If the page www.example.com contains the meta tag in its head, the crawler will retrieve the URL www.example.com?_escaped_fragment_=. It will index the content of the page as www.example.com and will display www.example.com in search results.
As noted above, the mapping from the _escaped_fragment_ to the #! syntax is slightly different in this case: to retrieve the original URL, the web server instead simply removes the tokens _escaped_fragment_= (note the =) from the URL. In other words, you want to end up with the URL www.example.com instead of www.example.com#!.
Warning: Should the content for www.example.com?_escaped_fragment_= return a 404 code, no content will be indexed for www.example.com! So, be careful if you add this meta tag to your page and make sure an HTML snapshot is returned.
Hyperlinks and Sitemaps
In order to crawl your site's URLs, a crawler must be able to find them. Here are two common ways to accomplish this:
- Hyperlinks: An HTML page or an HTML snapshot can contain hyperlinks to pretty URLs, that is, URLs containing #! hash fragments. Note: The crawler will not follow links extracted from HTML that contain _escaped_fragment_.
- Sitemap: Pretty URLs may be listed in Sitemaps. For more information on Sitemaps, please see www.sitemaps.org.
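As an illustration of the Sitemap option, the sketch below writes pretty URLs into a minimal Sitemap document with Python's standard `xml.etree.ElementTree`. The function name `build_sitemap` and the example URL are assumptions; only the `urlset`, `url`, and `loc` elements of the Sitemap protocol are used.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal Sitemap XML string listing the given pretty URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for pretty_url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = pretty_url  # "&" is XML-escaped on output
    return ET.tostring(urlset, encoding="unicode")


print(build_sitemap(["http://www.example.com/ajax.html#!key1=value1&key2=value2"]))
```

The URLs are listed in their #! form; the `&` inside the hash fragment is written as `&amp;` because the Sitemap is an XML file, which is separate from the URL escaping described earlier.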
Backward compatibility to current practice
Current practices will still be supported. Hijax remains a valid solution, as we describe in our blog post A spider's view of Web 2.0. Giving the crawler access to static content remains the main goal.
Existing uses of #!
A few web pages already use exclamation marks as the first character in a hash fragment. Because hash fragments are not a part of the URL that is sent to a server, such URLs have never been crawled. In other words, such URLs are not currently in the search index.
Under the new scheme, they can be crawled. In other words, a crawler will map each #! URL to its corresponding _escaped_fragment_ URL and request this URL from the web server. Because the site uses the pretty URL syntax (that is, #! hash fragments), the crawler will assume that the site has opted into the AJAX crawling scheme. This can cause problems, because the crawler will not get any meaningful content for these URLs if the web server does not return an HTML snapshot.
There are two options:
- The site adopts the AJAX crawling scheme and returns HTML snapshots.
- If this is not desired, it is possible to opt out of the scheme by adding a directive to the