Useful robots.txt rules
Here are some common useful robots.txt rules:
**Disallow crawling of the entire site**

Keep in mind that in some situations URLs from the site may still be indexed, even if they haven't been crawled.

```
User-agent: *
Disallow: /
```
**Allow crawling of an entire site (with an empty Disallow rule)**

This explicitly allows all crawlers to access the entire site. It is functionally equivalent to having no robots.txt file at all, or to using an empty `Disallow` rule.

```
User-agent: *
Disallow:
```
**Disallow crawling of a directory and its contents**

Append a forward slash to the directory name to disallow crawling of a whole directory.

```
User-agent: *
Disallow: /calendar/
Disallow: /junk/
Disallow: /books/fiction/contemporary/
```
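These prefix rules can be exercised with Python's standard-library `urllib.robotparser`. This is only a quick sanity-check sketch: the crawler name `mybot` and the URLs are made-up examples, and note that the stdlib parser applies rules in file order rather than Google's most-specific-match precedence, so results can differ when `Allow` and `Disallow` rules overlap (they don't here).

```python
# Sketch: verifying directory-prefix rules with the stdlib parser.
# "mybot" and the example URLs are hypothetical.
from urllib import robotparser

rules = """\
User-agent: *
Disallow: /calendar/
Disallow: /junk/
Disallow: /books/fiction/contemporary/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Paths under a disallowed directory are blocked; everything else is allowed.
print(rp.can_fetch("mybot", "https://example.com/calendar/2024"))          # False
print(rp.can_fetch("mybot", "https://example.com/books/fiction/classics/"))  # True
```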
**Disallow crawling of a single web page**

For example, disallow the `useless_file.html` page and the `other_useless_file.html` page in the `junk` directory.

```
User-agent: *
Disallow: /useless_file.html
Disallow: /junk/other_useless_file.html
```
**Disallow crawling of the whole site except a subdirectory**

Crawlers may only access the `public` subdirectory.

```
User-agent: *
Disallow: /
Allow: /public/
```
**Allow access to a single crawler**

Only `Googlebot-News` may crawl the whole site; all other crawlers are disallowed.

```
User-agent: Googlebot-News
Allow: /

User-agent: *
Disallow: /
```
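Per-crawler rules like this can also be checked with `urllib.robotparser`, which picks the group whose `User-agent` line matches the crawler name and falls back to the `*` group otherwise (a sketch; the URLs and the name `SomeOtherBot` are made up):

```python
# Sketch: one crawler allowed, everyone else blocked.
from urllib import robotparser

rules = """\
User-agent: Googlebot-News
Allow: /

User-agent: *
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("Googlebot-News", "https://example.com/story"))  # True
print(rp.can_fetch("SomeOtherBot", "https://example.com/story"))    # False
```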
**Allow access to all but a single crawler**

`Unnecessarybot` may not crawl the site; all other crawlers may.

```
User-agent: Unnecessarybot
Disallow: /

User-agent: *
Allow: /
```
**Disallow crawling of an entire site, but allow `Storebot-Google`**

This implementation hides your pages from Google Search results, but the `Storebot-Google` crawler can still crawl them.

```
User-agent: *
Disallow: /

User-agent: Storebot-Google
Allow: /
```
**Block all images on your site from Google** (this includes anywhere images are displayed on Google, including Google Images and Discover)

Google can't index images and videos without crawling them.

```
User-agent: Googlebot-Image
Disallow: /
```
**Block a specific image from Google Images**

For example, disallow the `dogs.jpg` image.

```
User-agent: Googlebot-Image
Disallow: /images/dogs.jpg
```
**Disallow crawling of files of a specific file type**

For example, disallow crawling of all `.gif` files.

```
User-agent: Googlebot
Disallow: /*.gif$
```
**Use the `*` and `$` wildcards to match URLs that end with a specific string**

For example, disallow all `.xls` files.

```
User-agent: Googlebot
Disallow: /*.xls$
```
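Python's standard-library `urllib.robotparser` treats rule paths as plain prefixes and does not implement these `*` and `$` wildcards, so experimenting with the wildcard semantics takes a small translator. The helper below is a sketch of that matching behavior; the function name is illustrative, not part of any library:

```python
import re

def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex.

    '*' matches any run of characters; a trailing '$' anchors the
    pattern to the end of the URL path. (Hypothetical helper, for
    illustration only.)
    """
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    # Escape literal pieces, then stitch them together with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in core.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

blocked = robots_pattern_to_regex("/*.xls$")
print(bool(blocked.match("/reports/q1.xls")))   # True: ends with .xls
print(bool(blocked.match("/reports/q1.xlsx")))  # False: '$' anchors the match
```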
**Combine multiple user agents in a single group**

Consolidating rules for multiple crawlers into one group makes the file shorter and easier to manage, as all rules in the group apply to every user agent listed. This is equivalent to writing a separate group for each user agent with the same rules repeated.

```
User-agent: Googlebot
User-agent: Storebot-Google
Allow: /cats
Disallow: /
```
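The stdlib `urllib.robotparser` also accepts grouped `User-agent` lines, so this consolidation can be sanity-checked directly (a sketch; the URLs are made-up examples):

```python
# Sketch: both listed crawlers get the same rules from the shared group.
from urllib import robotparser

rules = """\
User-agent: Googlebot
User-agent: Storebot-Google
Allow: /cats
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("Googlebot", "https://example.com/cats/felix"))        # True
print(rp.can_fetch("Storebot-Google", "https://example.com/cats/felix"))  # True
print(rp.can_fetch("Googlebot", "https://example.com/dogs"))              # False
```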