A robots.txt file gives instructions to the automated search engine robots (known as "bots") that crawl the web. Before these bots access the pages of a site, they check whether a robots.txt file exists that prevents them from accessing certain pages. If you want to keep some of your content from being indexed by search engines, robots.txt is a simple tool for the job. This article discusses how to use it.
The "/robots.txt" file is a text file containing one or more records. It must reside in the root of the domain and must be named exactly "robots.txt". A robots.txt file located in a subdirectory is not valid, as bots only check for this file in the root of the domain.
For instance, http://www.example.com/robots.txt is a valid location, but http://www.example.com/mysite/robots.txt is not.
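The location rule above can be sketched with Python's standard library: a crawler derives the robots.txt URL from the scheme and host alone, then applies the parsed rules to each page. This is a minimal sketch; `robots_txt_url` is a hypothetical helper name, and the rules are parsed from literal lines rather than fetched over the network.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

def robots_txt_url(page_url):
    # Bots derive the robots.txt location from the scheme and host
    # alone; any path component such as /mysite/ is ignored.
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_txt_url("http://www.example.com/mysite/page.html"))
# → http://www.example.com/robots.txt

# Once fetched, the rules decide whether a given bot may access a page.
rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /private/"])
print(rp.can_fetch("Googlebot", "http://www.example.com/private/a.html"))  # False
print(rp.can_fetch("Googlebot", "http://www.example.com/index.html"))      # True
```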
Block or remove your entire website using a robots.txt file
To remove your site from search engines and prevent all robots from crawling it in the future, place the following robots.txt file in your server root:
User-agent: *
Disallow: /

To remove your site from Google only and prevent just Googlebot from crawling your site in the future, place the following robots.txt file in your server root:
User-agent: Googlebot
Disallow: /

Each port must have its own robots.txt file. In particular, if you serve content via both http and https, you'll need a separate robots.txt file for each of these protocols. For example, to allow Googlebot to index all http pages but no https pages, you'd use the robots.txt files below.
For your http protocol (http://yourserver.com/robots.txt):
User-agent: *
Allow: /

For the https protocol (https://yourserver.com/robots.txt):
User-agent: *
Disallow: /

(Alternative solution: just create an empty "/robots.txt" file, or don't use one at all.)
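The effect of a blocking or an empty robots.txt can be checked offline with Python's standard-library `urllib.robotparser`; a minimal sketch, assuming the stdlib parser's default-allow behavior when a file has no matching rules:

```python
from urllib.robotparser import RobotFileParser

# "Disallow: /" for all agents: every URL on the site is blocked.
block_all = RobotFileParser()
block_all.parse(["User-agent: *", "Disallow: /"])
print(block_all.can_fetch("Googlebot", "http://yourserver.com/page.html"))  # False

# An empty robots.txt (or no file at all) blocks nothing.
allow_all = RobotFileParser()
allow_all.parse([])
print(allow_all.can_fetch("Googlebot", "http://yourserver.com/page.html"))  # True
```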
Block or remove pages using a robots.txt file
You can use a robots.txt file to block Googlebot from crawling pages on your site. For example, if you're manually creating a robots.txt file, to block Googlebot from crawling all pages under a particular directory (for example, private), you'd use the following robots.txt entry:
User-agent: Googlebot
Disallow: /private/

To block Googlebot from crawling all files of a specific file type (for example, .gif), you'd use the following robots.txt entry:
User-agent: Googlebot
Disallow: /*.gif$

To block Googlebot from crawling any URL that includes a ? (more specifically, any URL that begins with your domain name, followed by any string, followed by a question mark, followed by any string):
User-agent: Googlebot
Disallow: /*?

While Google won't crawl or index the content of pages blocked by robots.txt, it may still crawl and index the URLs if it finds them on other pages on the web. As a result, the URL of the page or other publicly available information, such as anchor text in links to the site, can appear in Google search results. However, no content from your pages will be crawled, indexed, or displayed.
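Note that the * and $ wildcards used in these patterns are a Googlebot extension to the original robots.txt specification, and Python's stdlib robotparser does not understand them. The sketch below shows how such patterns might be matched; `googlebot_pattern_matches` is a hypothetical helper for illustration, not Google's actual implementation, and for simplicity it treats the query string as part of the path.

```python
import re

def googlebot_pattern_matches(pattern, path):
    # Googlebot-style matching: '*' matches any run of characters,
    # and a trailing '$' anchors the pattern to the end of the URL.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything except '*', which becomes '.*' in the regex.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # A robots.txt rule matches when the URL starts with the pattern.
    return re.match(regex, path) is not None

# "Disallow: /*.gif$" blocks any URL ending in .gif ...
print(googlebot_pattern_matches("/*.gif$", "/images/photo.gif"))      # True
# ... but not one where .gif is followed by more characters.
print(googlebot_pattern_matches("/*.gif$", "/images/photo.gif?x=1"))  # False
# "Disallow: /*?" blocks any URL containing a question mark.
print(googlebot_pattern_matches("/*?", "/search?q=robots"))           # True
```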