What the user sees, what the crawler sees
In recent years, AJAX-based applications have replaced static HTML pages across more and more of the web. This is a great development for users because it makes applications much faster and richer. But the added responsiveness comes at a huge cost: crawlers cannot see any content that is created dynamically. As a result, the most modern applications are often also the least searchable. For example, the crawler may see the following for a typical AJAX application:
But imagine that what the user actually sees in the browser is lots of content relating to movies and information about them. How does this happen? The browser executes the script getMovieInformation.js and creates the HTML that the user sees, for example something like this:
<html>
  <head>
    <title>MovieInfo</title>
  </head>
  <body>
    <div id="browseArea">
      ...
      <div style="font-weight: bold;">Select from below:</div>
      ...
      <div id="browseTable" valign="top">
        ...
        <a href="#%21tab0&q=Walking+on+Frozen+Water" class="menuItem">Walking on Frozen Water</a>
        ...
        <a href="#%21tab0&q=Climbing+Mauna+Kea" class="menuItem">Climbing Mauna Kea</a>
        ...
        <a href="#%21tab0&q=Sea+Turtles" class="menuItem">Sea Turtles</a>
        ...
        <a href="#%21tab0&q=This+Street+Makes+Me+Look+Fat" class="menuItem">This Street Makes Me Look Fat</a>
        ...
        <a href="#%21tab0&q=Octopus+spotting" class="menuItem">Octopus spotting</a>
        ...
        <a href="#%21tab0&q=Falling+in+Love" class="menuItem">Falling in Love</a>
        ...
      </div>
      <div id="load">
        <p>Octopus spotting follows an octopus through an average octopus day. It tells stories of hiding from predators and divers, of the neighborhood the octopus lives in, and the other animals that share its living quarters.</p>
      </div>
      ...
  </body>
</html>
If you're curious about your own application, load it in a browser and then view the source (for example, in Firefox, right-click and select "View Page Source"). In our example, the page source would not contain the word "octopus". Similarly, if some of your content is created dynamically, the page source will not include all the content you want the crawler to see, because "View Page Source" shows exactly what the crawler gets. This matters because search results are based in part on the words that the crawler finds on the page: if the crawler can't find your content, it's not searchable.
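The "View Page Source" check can also be scripted. The following is a minimal sketch, not part of any crawler: missingFromSource is a hypothetical helper, and rawSource is an assumed stand-in for a page whose visible content is created by getMovieInformation.js.

```javascript
// Hypothetical helper: returns the words that do NOT appear in the raw page
// source, i.e. the text that "View Page Source" would show.
function missingFromSource(pageSource, words) {
  const haystack = pageSource.toLowerCase();
  return words.filter((word) => !haystack.includes(word.toLowerCase()));
}

// Assumed, illustrative markup: the body is empty until the script runs,
// so the movie descriptions are absent from the source itself.
const rawSource =
  '<html><head><title>MovieInfo</title></head>' +
  '<body><script src="getMovieInformation.js"></script></body></html>';

// "MovieInfo" is in the static source; "octopus" only appears after the
// browser has executed the script.
console.log(missingFromSource(rawSource, ['MovieInfo', 'octopus']));
// prints [ 'octopus' ]
```

Any word this check reports as missing is a word a crawler reading only the page source would not find.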
If you're starting from scratch, one good approach is to build your site's structure and navigation using only HTML. Then, once you have the site's pages, links, and content in place, you can spice up the appearance and interface with AJAX. Googlebot will be happy looking at the HTML, while users with modern browsers can enjoy your AJAX bonuses.
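For example, a link can carry a static URL in its href attribute while its onClick handler calls the AJAX code:
<a href="ajax.htm?foo=32" onClick="navigate('ajax.html#foo=32'); return false">foo 32</a>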
Note that the static link's URL has a parameter (?foo=32) instead of the fragment (#foo=32) that the AJAX code uses. This is important, as search engines understand URL parameters but often ignore fragments. Web developer Jeremy Keith calls this technique Hijax. Since you now offer static links, users and search engines can link to the exact content they want to share or reference.
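The mapping between the two link forms can be sketched in a few lines. This is only an illustration under stated assumptions: toFragmentUrl and toStaticUrl are hypothetical helper names, and both link forms are assumed to share the same path, differing only in whether the state follows "?" or "#".

```javascript
// Hypothetical helper: turn a static link such as "ajax.html?foo=32" into
// the fragment form "ajax.html#foo=32" used by the AJAX code.
function toFragmentUrl(staticUrl) {
  const queryStart = staticUrl.indexOf('?');
  if (queryStart === -1) return staticUrl; // no parameter, nothing to convert
  return staticUrl.slice(0, queryStart) + '#' + staticUrl.slice(queryStart + 1);
}

// Hypothetical inverse: turn "ajax.html#foo=32" back into the static link
// "ajax.html?foo=32" that search engines can follow.
function toStaticUrl(fragmentUrl) {
  const hashStart = fragmentUrl.indexOf('#');
  if (hashStart === -1) return fragmentUrl; // no fragment, nothing to convert
  return fragmentUrl.slice(0, hashStart) + '?' + fragmentUrl.slice(hashStart + 1);
}

console.log(toFragmentUrl('ajax.html?foo=32')); // prints ajax.html#foo=32
console.log(toStaticUrl('ajax.html#foo=32'));   // prints ajax.html?foo=32
```

With helpers like these, the href can be generated from the same state that the onClick handler passes to the AJAX code, so the two forms of a link stay in sync.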
This approach will continue to work. If your site is already configured with Hijax, you're good to go. But if your content changes regularly and you don't want to update it manually, or if you want search engines to serve fast AJAX links, or if you have not yet implemented the Hijax scheme, you should consider the new scheme we describe here.
An agreement between crawler and server
In order to make your AJAX application crawlable, your site needs to abide by a new agreement. This agreement rests on the following:
- The site adopts the AJAX crawling scheme.
- For each URL that has dynamically produced content, your server provides an HTML snapshot, which is the content a user (with a browser) sees. Often, such URLs will be AJAX URLs, that is, URLs containing a hash fragment, for example
- The search engine indexes the HTML snapshot and serves your original AJAX URLs in search results.
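As a rough sketch of how the two sides of this agreement might fit together, assume the syntax from the published AJAX crawling scheme rather than from this list: AJAX URLs mark their hash fragment with "#!", and the crawler requests the same URL with the fragment moved into an _escaped_fragment_ query parameter. The function names are hypothetical, and the escaping is deliberately simplified to a handful of ASCII characters.

```javascript
const CRAWLER_PARAM = '_escaped_fragment_';

// Simplified escaping (an assumption): percent-encode only control
// characters, spaces, and the characters #, %, & and +.
function escapeFragment(fragment) {
  return fragment.replace(/[\x00-\x20#%&+]/g, (ch) =>
    '%' + ch.charCodeAt(0).toString(16).toUpperCase().padStart(2, '0'));
}

// Crawler side: rewrite an AJAX URL into the URL that is actually requested.
function toCrawlerUrl(ajaxUrl) {
  const marker = ajaxUrl.indexOf('#!');
  if (marker === -1) return ajaxUrl; // not an AJAX URL under this scheme
  const base = ajaxUrl.slice(0, marker);
  const fragment = ajaxUrl.slice(marker + 2);
  const separator = base.includes('?') ? '&' : '?';
  return base + separator + CRAWLER_PARAM + '=' + escapeFragment(fragment);
}

// Server side: recover the original AJAX URL from a crawler request, or
// return null when the request is an ordinary one from a browser.
// Assumes the crawler parameter is the last parameter in the query string.
function toOriginalUrl(requestUrl) {
  const match = requestUrl.match(/[?&]_escaped_fragment_=([^&]*)$/);
  if (!match) return null;
  return requestUrl.slice(0, match.index) + '#!' + decodeURIComponent(match[1]);
}

// Server side: decide whether to answer with an HTML snapshot.
function wantsSnapshot(requestUrl) {
  return toOriginalUrl(requestUrl) !== null;
}

console.log(toCrawlerUrl('http://www.example.com/ajax.html#!key=value'));
// prints http://www.example.com/ajax.html?_escaped_fragment_=key=value
console.log(wantsSnapshot('http://www.example.com/ajax.html')); // prints false
```

A server following this sketch would return the HTML snapshot whenever wantsSnapshot is true and the regular page otherwise, while the search engine keeps listing the original AJAX URL.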
To make this work, the application must use a specific syntax in the AJAX URLs (let's call them "pretty URLs"; you'll see why in the following sections). The search engine crawler temporarily modifies these "pretty URLs" into "ugly URLs" and requests those from your server. A request for an "ugly URL" tells the server that it should not return the regular web page it would give to a browser, but an HTML snapshot instead. Once the crawler has obtained the content for the ugly URL, it indexes that content and then displays the original pretty URL in the search results. In other words, end users will always see the pretty URL containing a hash fragment. The following diagram summarizes the agreement: