Robots meta tag and X-Robots-Tag HTTP header specifications

Abstract

This document details how Google handles the page-level indexing settings that allow you to control how Google makes content available through search results. You can specify these settings by including a meta tag on (X)HTML pages or in an HTTP header.

Note: Keep in mind that these settings can be read and followed only if crawlers are allowed to access the pages that include these settings.

Using the robots meta tag

The robots meta tag lets you take a granular, page-specific approach to controlling how an individual page should be indexed and served to users in search results. Place the robots meta tag in the <head> section of a given page, like this:

<!DOCTYPE html>
<html><head>
<meta name="robots" content="noindex" />
(…)
</head>
<body>(…)</body>
</html>

The robots meta tag in the above example instructs all search engines not to show the page in search results. The value of the name attribute (robots) specifies that the directive applies to all crawlers. To address a specific crawler, replace the robots value of the name attribute with the name of the crawler that you are addressing. Specific crawlers are also known as user-agents (a crawler uses its user-agent to request a page). Google's standard web crawler has the user-agent name Googlebot. To prevent only Googlebot from indexing your page, update the tag as follows:

<meta name="googlebot" content="noindex" />

This tag now instructs Google (but no other search engines) not to show this page in its web search results. Both the name and the content attributes are not case sensitive.
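
For example, the following tag (written with a different letter case) would be treated the same as the lowercase version above:

<meta name="GOOGLEBOT" content="NOINDEX" />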

Search engines may have different crawlers for different properties or purposes. See the appendix for a complete list of Google's crawlers. For example, to show a page in Google's web search results, but not in Google News, use the following meta tag:

<meta name="googlebot-news" content="noindex" />

If you need to specify multiple crawlers individually, it's okay to use multiple robots meta tags:

<meta name="googlebot" content="noindex">
<meta name="googlebot-news" content="nosnippet">

If our crawlers encounter competing directives, we will use the most restrictive directive we find.
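
For example, if a page contained both of the following tags (a hypothetical conflict), the more restrictive value, noindex, would be the one applied:

<meta name="robots" content="all">
<meta name="robots" content="noindex">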

Using the X-Robots-Tag HTTP header

The X-Robots-Tag can be used as an element of the HTTP header response for a given URL. Any directive that can be used in a robots meta tag can also be specified as an X-Robots-Tag. Here's an example of an HTTP response with an X-Robots-Tag instructing crawlers not to index a page:

HTTP/1.1 200 OK
Date: Tue, 25 May 2010 21:42:43 GMT
(…)
X-Robots-Tag: noindex
(…)

Multiple X-Robots-Tag headers can be combined within the HTTP response, or you can specify a comma-separated list of directives. Here's an example of an HTTP header response that has a noarchive X-Robots-Tag combined with an unavailable_after X-Robots-Tag.

HTTP/1.1 200 OK
Date: Tue, 25 May 2010 21:42:43 GMT
(…)
X-Robots-Tag: noarchive
X-Robots-Tag: unavailable_after: 25 Jun 2010 15:00:00 PST
(…)
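
Equivalently, the same directives can be specified as a comma-separated list in a single X-Robots-Tag header:

X-Robots-Tag: noarchive, unavailable_after: 25 Jun 2010 15:00:00 PST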

The X-Robots-Tag may optionally specify a user-agent before the directives. For instance, the following set of X-Robots-Tag HTTP headers can be used to conditionally allow showing of a page in search results for different search engines:

HTTP/1.1 200 OK
Date: Tue, 25 May 2010 21:42:43 GMT
(…)
X-Robots-Tag: googlebot: nofollow
X-Robots-Tag: otherbot: noindex, nofollow
(…)

Directives specified without a user-agent are valid for all crawlers. The section below demonstrates how to handle combined directives. Neither the user-agent name nor the specified values are case sensitive.
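
For example, in the following (hypothetical) set of headers the noarchive directive would apply to all crawlers, while the noindex directive would apply only to googlebot:

X-Robots-Tag: noarchive
X-Robots-Tag: googlebot: noindex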

Valid indexing & serving directives

Several other directives can be used to control indexing and serving with the robots meta tag and the X-Robots-Tag. Each value represents a specific directive. The following table shows all the directives that Google honors and their meaning. Note: it is possible that these directives may not be treated the same by all other search engine crawlers. Multiple directives may be combined in a comma-separated list (see below for the handling of combined directives). These directives are not case-sensitive.

Directive | Meaning
all | There are no restrictions for indexing or serving. Note: this directive is the default value and has no effect if explicitly listed.
noindex | Do not show this page in search results and do not show a "Cached" link in search results.
nofollow | Do not follow the links on this page.
none | Equivalent to noindex, nofollow.
noarchive | Do not show a "Cached" link in search results.
nosnippet | Do not show a snippet in the search results for this page.
noodp | Do not use metadata from the Open Directory Project for titles or snippets shown for this page.
notranslate | Do not offer translation of this page in search results.
noimageindex | Do not index images on this page.
unavailable_after: [RFC-850 date/time] | Do not show this page in search results after the specified date/time. The date/time must be specified in the RFC 850 format.
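
For example, the unavailable_after directive shown in the X-Robots-Tag example above could also be placed in a robots meta tag (the date shown is illustrative):

<meta name="robots" content="unavailable_after: 25 Jun 2010 15:00:00 PST">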

Once the robots.txt file (or the absence of one) has given permission to crawl a page, pages are by default treated as crawlable, indexable, and archivable, and their content is approved for use in snippets that show up in the search results, unless permission is specifically denied in a robots meta tag or X-Robots-Tag.

Handling combined indexing and serving directives

You can create a multi-directive instruction by combining robots meta tag directives with commas. Here is an example of a robots meta tag that instructs web crawlers not to index the page and not to follow any of the links on the page:

<meta name="robots" content="noindex, nofollow">

For situations where multiple crawlers are specified along with different directives, the search engine will use the sum of the negative directives. For example:

<meta name="robots" content="nofollow">
<meta name="googlebot" content="noindex">

The page containing these meta tags will be interpreted as having a noindex, nofollow directive when crawled by Googlebot.

Practical implementation of X-Robots-Tag with Apache

You can add the X-Robots-Tag to a site's HTTP responses using the .htaccess and httpd.conf files that are available by default on Apache-based web servers. The benefit of using the X-Robots-Tag with HTTP responses is that you can specify crawling directives that are applied globally across a site. Support for regular expressions allows a high level of flexibility.

For example, to add a noindex, nofollow X-Robots-Tag to the HTTP response for all .pdf files across an entire site, add the following snippet to the site's root .htaccess file or httpd.conf file:

<Files ~ "\.pdf$">
  Header set X-Robots-Tag "noindex, nofollow"
</Files>

You can use the X-Robots-Tag for non-HTML files such as image files, where the usage of robots meta tags is not possible. Here's an example of adding a noindex X-Robots-Tag directive for image files (.png, .jpeg, .jpg, .gif) across an entire site:

<Files ~ "\.(png|jpe?g|gif)$">
  Header set X-Robots-Tag "noindex"
</Files>

Combining crawling with indexing / serving directives

Robots meta tags and X-Robots-Tag HTTP headers are discovered when a URL is crawled. If a page is disallowed from crawling through the robots.txt file, then any information about indexing or serving directives will not be found and will therefore be ignored. If indexing or serving directives must be followed, the URLs containing those directives cannot be disallowed from crawling.
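
For example, with a robots.txt rule such as the following (a hypothetical disallow rule), a noindex robots meta tag or X-Robots-Tag on any URL under /private/ would never be discovered, because those URLs are never crawled:

User-agent: *
Disallow: /private/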
