Abstract
This document details how page-level indexing settings let you control how Google makes content available through search results. You can specify these settings by including a meta tag on (X)HTML pages or in an HTTP header.
Using the robots meta tag
The robots meta tag lets you use a granular, page-specific approach to control how an individual page should be indexed and served to users in search results. Place the robots meta tag in the `<head>` section of a given page, like this:

```html
<!DOCTYPE html>
<html>
<head>
  <meta name="robots" content="noindex" />
  (…)
</head>
<body>(…)</body>
</html>
```
The robots meta tag in the above example instructs most search engines not to show the page in search results. The value of the `name` attribute (`robots`) specifies that the directive applies to all crawlers. To address a specific crawler, replace the `robots` value of the `name` attribute with the name of the crawler that you are addressing. Specific crawlers are also known as user agents (a crawler uses its user agent to request a page). Google's standard web crawler has the user agent name `Googlebot`. To prevent only Googlebot from indexing your page, update the tag as follows:
<meta name="googlebot" content="noindex" />
This tag now instructs Google (but no other search engines) not to show this page in its web search results. Both the `name` and the `content` attributes are case-insensitive.
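Because the attributes are case-insensitive, a tag written in uppercase is treated the same way. For example, the following is equivalent to the lowercase tag above:

```html
<META NAME="GOOGLEBOT" CONTENT="NOINDEX" />
```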
Search engines may have different crawlers for different properties or purposes. See the complete list of Google's crawlers. For example, to show a page in Google's web search results, but not in Google News, use the following meta tag:
<meta name="googlebot-news" content="noindex" />
If you need to specify multiple crawlers individually, it's okay to use multiple robots meta tags:
<meta name="googlebot" content="noindex"> <meta name="googlebot-news" content="nosnippet">
If our crawlers encounter competing directives, we will use the most restrictive directive we find.
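For example, if a page carried both of the following tags (a hypothetical combination, shown only for illustration), Googlebot would apply `noindex`, the more restrictive of the two:

```html
<meta name="robots" content="all">
<meta name="googlebot" content="noindex">
```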
Using the X-Robots-Tag HTTP header
The `X-Robots-Tag` can be used as an element of the HTTP header response for a given URL. Any directive that can be used in a robots meta tag can also be specified as an `X-Robots-Tag`. Here's an example of an HTTP response with an `X-Robots-Tag` instructing crawlers not to index a page:
```http
HTTP/1.1 200 OK
Date: Tue, 25 May 2010 21:42:43 GMT
(…)
X-Robots-Tag: noindex
(…)
```
Multiple `X-Robots-Tag` headers can be combined within the HTTP response, or you can specify a comma-separated list of directives. Here's an example of an HTTP header response which has a `noarchive` `X-Robots-Tag` combined with an `unavailable_after` `X-Robots-Tag`:
```http
HTTP/1.1 200 OK
Date: Tue, 25 May 2010 21:42:43 GMT
(…)
X-Robots-Tag: noarchive
X-Robots-Tag: unavailable_after: 25 Jun 2010 15:00:00 PST
(…)
```
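The same directives can also be written as a comma-separated list in a single header. The following response should be equivalent to the one above:

```http
HTTP/1.1 200 OK
Date: Tue, 25 May 2010 21:42:43 GMT
(…)
X-Robots-Tag: noarchive, unavailable_after: 25 Jun 2010 15:00:00 PST
(…)
```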
The `X-Robots-Tag` may optionally specify a user agent before the directives. For instance, the following set of `X-Robots-Tag` HTTP headers can be used to conditionally allow showing of a page in search results for different search engines:
```http
HTTP/1.1 200 OK
Date: Tue, 25 May 2010 21:42:43 GMT
(…)
X-Robots-Tag: googlebot: nofollow
X-Robots-Tag: otherbot: noindex, nofollow
(…)
```
Directives specified without a user agent are valid for all crawlers. The section below demonstrates how combined directives are handled. Both the user agent name and the specified values are not case sensitive.
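To illustrate the rule about directives without a user agent (a hypothetical response, for illustration): in the following headers, `noarchive` applies to all crawlers, while `noindex` applies only to Googlebot:

```http
HTTP/1.1 200 OK
(…)
X-Robots-Tag: noarchive
X-Robots-Tag: googlebot: noindex
(…)
```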
Valid indexing & serving directives
Several other directives can be used to control indexing and serving with the robots meta tag and the `X-Robots-Tag`. Each value represents a specific directive. The following table shows all the directives that Google honors and their meaning. Note: these directives may not be treated the same by all other search engine crawlers. Multiple directives may be combined in a comma-separated list (see below for the handling of combined directives). These directives are not case-sensitive.
| Directive | Meaning |
| --- | --- |
| `all` | There are no restrictions for indexing or serving. Note: this directive is the default value and has no effect if explicitly listed. |
| `noindex` | Do not show this page in search results and do not show a "Cached" link in search results. |
| `nofollow` | Do not follow the links on this page. |
| `none` | Equivalent to `noindex, nofollow`. |
| `noarchive` | Do not show a "Cached" link in search results. |
| `nosnippet` | Do not show a text snippet or video preview in the search results for this page. A static thumbnail (if available) will still be visible. |
| `notranslate` | Do not offer translation of this page in search results. |
| `noimageindex` | Do not index images on this page. |
| `unavailable_after: [RFC-850 date/time]` | Do not show this page in search results after the specified date/time. The date/time must be specified in the RFC 850 format. |
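For instance, to drop a page from search results after a specific date, a tag such as the following could be used (the date shown here mirrors the earlier header example and is purely illustrative):

```html
<meta name="robots" content="unavailable_after: 25 Jun 2010 15:00:00 PST">
```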
After the robots.txt file (or the absence of one) has given permission to crawl a page, by default pages are treated as crawlable, indexable, archivable, and their content is approved for use in snippets that show up in the search results, unless permission is specifically denied in a robots meta tag or `X-Robots-Tag`.
Handling combined indexing and serving directives
You can create a multi-directive instruction by combining robots meta tag directives in a comma-separated list. Here is an example of a robots meta tag that instructs web crawlers not to index the page and not to follow any of the links on the page:
<meta name="robots" content="noindex, nofollow">
For situations where multiple crawlers are specified along with different directives, the search engine will use the sum of the negative directives. For example:
<meta name="robots" content="nofollow"> <meta name="googlebot" content="noindex">
The page containing these meta tags will be interpreted as having a `noindex, nofollow` directive when crawled by Googlebot.
Practical implementation of X-Robots-Tag with Apache
You can add the `X-Robots-Tag` to a site's HTTP responses using the .htaccess and httpd.conf files that are available by default on Apache-based web servers. The benefit of using an `X-Robots-Tag` with HTTP responses is that you can specify indexing and serving directives that are applied globally across a site. The support of regular expressions allows a high level of flexibility.
For example, to add a `noindex, nofollow` `X-Robots-Tag` to the HTTP response for all .PDF files across an entire site, add the following snippet to the site's root .htaccess file or httpd.conf file:
<Files ~ "\.pdf$"> Header set X-Robots-Tag "noindex, nofollow" </Files>
You can use the `X-Robots-Tag` for non-HTML files like image files, where the usage of robots meta tags is not possible. Here's an example of adding a `noindex` `X-Robots-Tag` directive for image files (.png, .jpeg, .jpg, .gif) across an entire site:
<Files ~ "\.(png|jpe?g|gif)$"> Header set X-Robots-Tag "noindex" </Files>
Combining crawling with indexing / serving directives
Robots meta tags and `X-Robots-Tag` HTTP headers are discovered when a URL is crawled. If a page is disallowed from crawling through the robots.txt file, then any information about indexing or serving directives will not be found and will therefore be ignored. If indexing or serving directives must be followed, the URLs containing those directives cannot be disallowed from crawling.
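As a concrete illustration (the path is hypothetical): with the following robots.txt rule in place, crawlers never fetch pages under /private/, so any `noindex` meta tag or `X-Robots-Tag` on those pages is never seen, and the URLs may still appear in search results:

```
# robots.txt at the site root
User-agent: *
Disallow: /private/
```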