- Does my website need a robots.txt file?
- Which method should I use?
- Can I use these methods to remove someone else's site?
- How can I slow down Google's crawling of my website?
- I use the same robots.txt for multiple websites. Can I use a full URL instead of a relative path?
- Can I place the robots.txt file in a subdirectory?
- I want to block a private folder. Can I prevent other people from reading my robots.txt file?
- Do I have to include an allow directive to allow crawling?
- What happens if I have a mistake in my robots.txt file or use an unsupported directive?
- What program should I use to create a robots.txt file?
- If I block Google from crawling a page using a robots.txt disallow directive, will it disappear from search results?
- How long will it take for changes in my robots.txt file to affect my search results?
- How do I specify AJAX-crawling URLs in the robots.txt file?
- How can I temporarily suspend all crawling of my website?
- My server is not case-sensitive. How can I disallow crawling of some folders completely?
- I return 403 "Forbidden" for all URLs including the robots.txt file. Why is the site still being crawled?
Robots meta tag
- Is the robots meta tag a replacement for the robots.txt file?
- Can the robots meta tag be used to block a part of a page from being indexed?
- Can I use the robots meta tag outside of a <head> section?
- Does the robots meta tag disallow crawling?
- How does the nofollow robots meta tag compare to the rel="nofollow" link attribute?
X-Robots-Tag HTTP header
- How can I check the X-Robots-Tag for a URL?
Does my website need a robots.txt file?
No. When Googlebot visits a website, we first ask for permission to crawl by attempting to retrieve the robots.txt file. A website without a robots.txt file, robots meta tags or X-Robots-Tag HTTP headers will generally be crawled and indexed normally.
Which method should I use?
It depends. In short, there are good reasons to use each of these methods:
- robots.txt: Use it if crawling of your content is causing issues on your server. For example, you may want to disallow crawling of infinite calendar scripts. You should not use robots.txt to block private content (use server-side authentication instead), or to handle canonicalization (see our Help Center). If you must be certain that a URL is not indexed, use the robots meta tag or X-Robots-Tag HTTP header instead.
- robots meta tag: Use it if you need to control how an individual HTML page is shown in search results (or to make sure that it's not shown).
- X-Robots-Tag HTTP header: Use it if you need to control how non-HTML content is shown in search results (or to make sure that it's not shown).
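As a sketch, the three methods look like this (paths, page, and header values are hypothetical examples, not recommendations for any particular site):

```
# robots.txt — controls crawling:
User-agent: *
Disallow: /calendar/

<!-- robots meta tag in an HTML page's <head> — controls indexing of that page: -->
<meta name="robots" content="noindex">

# X-Robots-Tag HTTP header — controls indexing of non-HTML content such as PDFs:
X-Robots-Tag: noindex
```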
Can I use these methods to remove someone else's site?
No. These methods are only valid for sites where you can modify the code or add files. If you want to remove content from a third-party site, you need to contact the webmaster to have them remove the content.
How can I slow down Google's crawling of my website?
You can generally adjust the crawl rate setting for your website in Google Webmaster Tools.
I use the same robots.txt for multiple websites. Can I use a full URL instead of a relative path?
No. The directives in the robots.txt file (with the exception of "Sitemap:") are only valid for relative paths.
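For illustration, a shared robots.txt (hypothetical paths) might look like this; note that only the Sitemap line may use a full URL:

```
User-agent: *
Disallow: /private/                         # relative path: valid
# Disallow: https://example.com/private/    # full URL: not valid

Sitemap: https://example.com/sitemap.xml    # full URL allowed here
```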
Can I place the robots.txt file in a subdirectory?
No. The file must be placed in the topmost directory of the website.
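For example (hypothetical host):

```
https://example.com/robots.txt         # valid: topmost directory, will be used
https://example.com/shop/robots.txt    # ignored: placed in a subdirectory
```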
I want to block a private folder. Can I prevent other people from reading my robots.txt file?
No. The robots.txt file may be read by various users. If folders or filenames of content should not be public, they should not be listed in the robots.txt file. It is not recommended to serve different robots.txt files based on the user-agent or other attributes.
Do I have to include an allow directive to allow crawling?
No, you do not need to include an allow directive. The allow directive is used to override disallow directives in the same robots.txt file.
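A minimal sketch of this override behavior, using hypothetical paths:

```
User-agent: *
Disallow: /archive/
Allow: /archive/latest.html    # overrides the disallow for this one URL
```

Everything else under /archive/ remains disallowed; only the explicitly allowed URL may be crawled.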
What happens if I have a mistake in my robots.txt file or use an unsupported directive?
Web crawlers are generally very flexible and typically will not be swayed by minor mistakes in the robots.txt file. In general, the worst that can happen is that incorrect or unsupported directives will be ignored. Bear in mind, though, that Google can't read minds when interpreting a robots.txt file; we have to interpret the robots.txt file we fetched. That said, if you are aware of problems in your robots.txt file, they're usually easy to fix.
What program should I use to create a robots.txt file?
You can use anything that creates a valid text file. Common programs used to create robots.txt files are Notepad, TextEdit, vi, or emacs. Google Webmaster Tools contains a tool that helps you to create a robots.txt file for your website. Once the robots.txt file has been placed on the website, you can verify its validity by using the Fetch as Googlebot feature in Google Webmaster Tools.
If I block Google from crawling a page using a robots.txt disallow directive, will it disappear from search results?
Blocking Google from crawling a page is likely to decrease that page's ranking or cause it to drop out altogether over time. It may also reduce the amount of detail provided to users in the text below the search result. This is because without the page's content, the search engine has much less information to work with.
Disallow does not guarantee that a page will not appear in results: Google may still decide, based on external information such as incoming links, that it is relevant. If you wish to explicitly block a page from being indexed, you should instead use the noindex robots meta tag or X-Robots-Tag HTTP header. In this case, you should not disallow the page in robots.txt, because the page must be crawled in order for the tag to be seen and obeyed.
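As a sketch, a page can be marked noindex in its HTML (the page must remain crawlable for this tag to take effect):

```
<!-- In the HTML page's <head>: -->
<meta name="robots" content="noindex">
```

For non-HTML content, the equivalent HTTP response header is X-Robots-Tag: noindex.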
How long will it take for changes in my robots.txt file to affect my search results?
First, the cache of the robots.txt file must be refreshed (we generally cache the contents for up to one day). Even after finding the change, crawling and indexing is a complicated process that can sometimes take quite some time for individual URLs, so it's impossible to give an exact timeline. Also, keep in mind that even if your robots.txt file is disallowing access to a URL, that URL may remain visible in search results despite the fact that we can't crawl it. If you wish to expedite removal of the pages you've blocked from Google, please submit a removal request via Google Webmaster Tools.
How do I specify AJAX-crawling URLs in the robots.txt file?
You must use the crawled URLs when specifying URLs that use the AJAX-crawling proposal. For more information, see the AJAX-crawling FAQ.
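Under the AJAX-crawling proposal, a pretty URL with a #! fragment is crawled in its _escaped_fragment_ form, so the robots.txt rule must match that crawled form (paths here are hypothetical):

```
# Pretty URL:   https://example.com/page#!state
# Crawled URL:  https://example.com/page?_escaped_fragment_=state
User-agent: *
Disallow: /page?_escaped_fragment_=
```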
How can I temporarily suspend all crawling of my website?
You can temporarily suspend all crawling by returning an HTTP result code of 503 for all URLs, including the robots.txt file. The robots.txt file will be retried periodically until it can be accessed again. We do not recommend changing your robots.txt file to disallow crawling.
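A minimal sketch of this in Python's standard library, assuming a hypothetical maintenance-mode server on port 8503: every URL, including /robots.txt, answers 503 so crawlers treat the outage as temporary.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class MaintenanceHandler(BaseHTTPRequestHandler):
    """Answer every request, including /robots.txt, with 503."""

    def do_GET(self):
        self.send_response(503)
        self.send_header("Retry-After", "3600")  # hint: retry in an hour
        self.end_headers()

    def log_message(self, *args):
        pass  # silence request logging for the example

server = HTTPServer(("localhost", 8503), MaintenanceHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

try:
    urllib.request.urlopen("http://localhost:8503/robots.txt")
except urllib.error.HTTPError as err:
    print(err.code)  # 503: crawlers treat this as "try again later"

server.shutdown()
```

In production this would be done in the web server or load balancer configuration rather than application code; the point is simply that the 503 must cover every URL, robots.txt included.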
My server is not case-sensitive. How can I disallow crawling of some folders completely?
Directives in the robots.txt file are case-sensitive. In this case, it is recommended to make sure that only one version of the URL is indexed using canonicalization methods. Doing this allows you to simplify your robots.txt file. Should this not be possible, we recommend that you list the common combinations of the folder name, or shorten it as much as possible, using only the first few characters instead of the full name. For instance, instead of listing all upper and lower-case permutations of "/MyPrivateFolder", you could list the permutations of "/MyP" (if you are certain that no other, crawlable URLs exist with those first characters). Alternatively, it may make sense to use a robots meta tag or X-Robots-Tag HTTP header instead, if crawling is not an issue.
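The shortened-prefix approach might look like this (covering only the common case variants, as a sketch):

```
User-agent: *
# Covers /MyPrivateFolder, /myprivatefolder, /MYPRIVATEFOLDER, etc.,
# assuming no other crawlable URLs start with these characters:
Disallow: /MyP
Disallow: /myp
Disallow: /MYP
```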
I return 403 "Forbidden" for all URLs, including the robots.txt file. Why is the site still being crawled?
The HTTP result code 403, like all other 4xx HTTP result codes, is seen as a sign that the robots.txt file does not exist. Because of this, crawlers will generally assume that they can crawl all URLs of the website. In order to block crawling of the website, the robots.txt file must be returned normally (with a 200 "OK" HTTP result code) with an appropriate "disallow" in it.
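In other words, to block all crawling, the file below must be served with a 200 "OK" status rather than a 403:

```
User-agent: *
Disallow: /
```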
Robots meta tag
Is the robots meta tag a replacement for the robots.txt file?
No. The robots.txt file controls which pages are accessed. The robots meta tag controls whether a page is indexed, but to see this tag the page needs to be crawled. If crawling a page is problematic (for example, if the page causes a high load on the server), you should use the robots.txt file. If it is only a matter of whether or not a page is shown in search results, you can use the robots meta tag.
Can the robots meta tag be used to block a part of a page from being indexed?
No, the robots meta tag is a page-level setting.
Can I use the robots meta tag outside of a <head> section?
No, the robots meta tag currently needs to be in the <head> section of a page.
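For example, a valid placement (hypothetical page):

```
<!DOCTYPE html>
<html>
  <head>
    <meta name="robots" content="noindex">
    <title>Example page</title>
  </head>
  <body>...</body>
</html>
```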
Does the robots meta tag disallow crawling?
No. Even if the robots meta tag currently says noindex, we'll need to recrawl that URL occasionally to check if the meta tag has changed.
How does the nofollow robots meta tag compare to the rel="nofollow" link attribute?
The nofollow robots meta tag applies to all links on a page. The rel="nofollow" link attribute only applies to specific links on a page. For more information on the rel="nofollow" link attribute, please see our Help Center articles on user-generated spam and rel="nofollow".
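Side by side, the two look like this (hypothetical link target):

```
<!-- Page-level: every link on this page is nofollowed -->
<meta name="robots" content="nofollow">

<!-- Link-level: only this one link is nofollowed -->
<a href="https://example.com/untrusted" rel="nofollow">user-submitted link</a>
```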
X-Robots-Tag HTTP header
How can I check the X-Robots-Tag for a URL?
A simple way to view the server headers is to use a web-based server header checker, or the Fetch as Googlebot feature in Google Webmaster Tools.
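As a self-contained sketch in Python's standard library, the snippet below stands up a hypothetical local server that attaches an X-Robots-Tag header to a PDF URL, then fetches the URL and reads the header back, which is what any header checker does:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class TagHandler(BaseHTTPRequestHandler):
    """Serve a (hypothetical) PDF with an X-Robots-Tag header."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("X-Robots-Tag", "noindex, nofollow")
        self.send_header("Content-Type", "application/pdf")
        self.end_headers()

    def log_message(self, *args):
        pass  # silence request logging for the example

server = HTTPServer(("localhost", 8204), TagHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Fetch the URL and inspect the response headers:
response = urllib.request.urlopen("http://localhost:8204/whitepaper.pdf")
print(response.headers.get("X-Robots-Tag"))  # noindex, nofollow

server.shutdown()
```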
Have we missed anything?
Feel free to post in our Webmaster Help Forum for more help!