Wednesday, October 07, 2009
Today we're excited to propose a new standard for making AJAX-based websites crawlable. This will
benefit webmasters and users by making content from rich and interactive AJAX-based websites
universally accessible through search results on any search engine that chooses to take part. We
believe that making this content available for crawling and indexing could significantly improve
the web.
While AJAX-based websites are popular with users, search engines traditionally are not able to
access any of the content on them. The last time we checked, almost 70% of the websites we know
about use JavaScript in some form or another. Of course, most of that JavaScript is not AJAX, but
the better search engines can crawl and index AJAX, the more freely developers can add richer
features to their websites and still appear in search results.
Some of the goals that we wanted to achieve with this proposal were:
Minimal changes are required as the website grows
Users and search engines see the same content (no cloaking)
Search engines can send users directly to the AJAX URL (not to a static copy)
Site owners have a way of verifying that their AJAX website is rendered correctly and thus that
the crawler has access to all the content
Here's how search engines would crawl and index AJAX in our initial proposal:
Slightly modify the URL fragments for stateful AJAX pages
Stateful AJAX pages display the same content whenever accessed directly. These are pages that
could be referred to in search results. Instead of a URL like
https://example.com/page?query#state
we would like to propose adding a token to make it possible to recognize these URLs:
https://example.com/page?query#[FRAGMENTTOKEN]state. Based on a review of current
URLs on the web, we propose using "!" (an exclamation point) as the token for this. The proposed
URL that could be shown in search results would then be:
https://example.com/page?query#!state.
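As a rough illustration of this step, here is a minimal Python sketch that tags a stateful URL with the proposed "!" token and checks whether a URL is already tagged. The function names add_fragment_token and is_crawlable_ajax_url are our own, chosen for this example only; they are not part of the proposal.

```python
from urllib.parse import urlsplit, urlunsplit

FRAGMENT_TOKEN = "!"  # the token proposed in this post


def add_fragment_token(url: str) -> str:
    """Prepend the fragment token to a non-empty, untagged fragment."""
    parts = urlsplit(url)
    if not parts.fragment or parts.fragment.startswith(FRAGMENT_TOKEN):
        return url  # no state to tag, or already tagged
    return urlunsplit(parts._replace(fragment=FRAGMENT_TOKEN + parts.fragment))


def is_crawlable_ajax_url(url: str) -> bool:
    """True if the URL's fragment starts with the fragment token."""
    return urlsplit(url).fragment.startswith(FRAGMENT_TOKEN)
```

With these helpers, add_fragment_token("https://example.com/page?query#state") returns https://example.com/page?query#!state, and URLs without a fragment are returned unchanged.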
Use a headless browser that outputs an HTML snapshot on your web server
The headless browser is used to access the AJAX page and generates HTML code based on the final
state in the browser. Only specially tagged URLs are passed to the headless browser for
processing. By doing this on the server side, the website owner is in control of the HTML code
that is generated and can easily verify that all JavaScript is executed correctly. An example of
such a browser is HtmlUnit, an open-source "GUI-less browser for Java programs".
Allow search engine crawlers to access these URLs by escaping the state
As URL fragments are never sent with requests to servers, it's necessary to slightly modify the
URL used to access the page. At the same time, this tells the server to use the headless browser
to generate HTML code instead of returning a page with JavaScript. Other, existing URLs - such
as those used by the user - would be processed normally, bypassing the headless browser. We
propose escaping the state information and adding it to the query parameters with a token.
Using the previous example, one such URL would be
https://example.com/page?query&[QUERYTOKEN]=state. Based on our analysis of
current URLs on the web, we propose using _escaped_fragment_ as the token. The
proposed URL would then become
https://example.com/page?query&_escaped_fragment_=state.
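To make the mapping concrete, the following Python sketch shows both sides of this step: how a crawler might rewrite a "#!" URL into its _escaped_fragment_ form, and how a server might detect that a request is asking for an HTML snapshot. The post does not spell out the exact escaping rules, so percent-encoding the state is an assumption here, and the names to_crawl_url and requested_state are hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qs, quote, urlsplit, urlunsplit

FRAGMENT_TOKEN = "!"
QUERY_TOKEN = "_escaped_fragment_"


def to_crawl_url(url: str) -> Optional[str]:
    """Crawler side: map '#!state' to '?_escaped_fragment_=state'.

    Returns None for URLs that do not carry the fragment token.
    """
    parts = urlsplit(url)
    if not parts.fragment.startswith(FRAGMENT_TOKEN):
        return None
    # Assumption: the state is percent-encoded before being moved to the query.
    state = quote(parts.fragment[len(FRAGMENT_TOKEN):], safe="")
    pair = f"{QUERY_TOKEN}={state}"
    query = f"{parts.query}&{pair}" if parts.query else pair
    return urlunsplit(parts._replace(query=query, fragment=""))


def requested_state(url: str) -> Optional[str]:
    """Server side: return the unescaped state if a snapshot is requested.

    Returns None for ordinary requests, which bypass the headless browser.
    """
    query = urlsplit(url).query
    values = parse_qs(query, keep_blank_values=True).get(QUERY_TOKEN)
    return values[0] if values else None
```

A server could call requested_state on each incoming URL and hand the request to the headless browser only when the result is not None; every other request would be served the regular JavaScript page.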
Show the original URL to users in the search results
To improve the user experience, it makes sense to refer users directly to the AJAX-based pages.
This can be achieved by showing the original URL (such as
https://example.com/page?query#!state from our example above) in the search results.
Search engines can check that the indexable text returned to the crawler (for example, Googlebot)
is the same as, or a subset of, the text that is returned to users.
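The two ideas in this step can be sketched in Python as well. The first function recovers the URL to show in search results from the crawled URL. The second is a deliberately crude stand-in for the "same or a subset" check: it only compares word counts, and it is not a description of how any search engine performs this check. Both function names are hypothetical, and the percent-decoding mirrors an assumed percent-encoding of the state.

```python
from collections import Counter
from urllib.parse import unquote, urlsplit, urlunsplit

QUERY_TOKEN = "_escaped_fragment_"


def to_display_url(crawl_url: str) -> str:
    """Move the escaped state from the query back into a '#!' fragment."""
    parts = urlsplit(crawl_url)
    kept, state = [], None
    for item in parts.query.split("&"):
        name, sep, value = item.partition("=")
        if name == QUERY_TOKEN and sep:
            state = unquote(value)  # assumption: state was percent-encoded
        else:
            kept.append(item)  # leave all other parameters untouched
    if state is None:
        return crawl_url  # not a snapshot URL
    return urlunsplit(parts._replace(query="&".join(kept), fragment="!" + state))


def is_text_subset(snapshot_text: str, user_text: str) -> bool:
    """Crude check: every word in the snapshot also appears, at least as
    often, in the text shown to users."""
    extra = Counter(snapshot_text.split()) - Counter(user_text.split())
    return not extra
```

For the example above, to_display_url("https://example.com/page?query&_escaped_fragment_=state") gives back https://example.com/page?query#!state.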
In summary, a stateful URL such as https://example.com/dictionary.html#AJAX would be made
available to both crawlers and users as https://example.com/dictionary.html#!AJAX. Crawlers
would fetch it as https://example.com/dictionary.html?_escaped_fragment_=AJAX, while search
results would show, and users would access, https://example.com/dictionary.html#!AJAX.
We're currently working on a proposal and a prototype implementation. Feedback is very
welcome—please add your comments below or in our
Webmaster Help Forum.
Thank you for your interest in making the AJAX-based web accessible and useful through search
engines!
Proposal by Katharina Probst, Bruce Johnson, Arup Mukherjee, Erik van der Poel and Li Xiao, Google
Blog post by
John Mueller,
Webmaster Trends Analyst, Google Zürich