"Crawler" (sometimes also called a "robot" or "spider") is a generic term for any program that
is used to automatically discover and scan websites by following links from one webpage to
another. Google's main crawler is called
Googlebot. This table lists information
about the common Google crawlers you may see in your referrer logs, and how to specify them in
robots.txt, the
robots meta tags, and the
X-Robots-Tag HTTP directives.
The following table shows the crawlers used by various products and services at Google:
The user agent token is used in the User-agent: line in robots.txt
to match a crawler type when writing crawl rules for your site. Some crawlers have more than
one token, as shown in the table; you need to match only one crawler token for a rule to
apply. This list is not complete, but covers most of the crawlers you might see on your
website.
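For example, a crawl rule keyed to one of these tokens looks like the following sketch (Storebot-Google is the token from the table entry below; the /checkout/ path is just a placeholder):

User-agent: Storebot-Google
Disallow: /checkout/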
The full user agent string is a complete description of the crawler, and appears in
the HTTP request and your web logs.
AdsBot Mobile Web
User agent token
AdsBot-Google-Mobile
Full user agent string
Mozilla/5.0 (iPhone; CPU iPhone OS 14_7_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.2 Mobile/15E148 Safari/604.1 (compatible; AdsBot-Google-Mobile; +http://www.google.com/mobile/adsbot.html)
Google Web Light
User agent token
googleweblight
Full user agent string
Mozilla/5.0 (Linux; Android 4.2.1; en-us; Nexus 5 Build/JOP40D) AppleWebKit/535.19 (KHTML, like Gecko; googleweblight) Chrome/38.0.1025.166 Mobile Safari/535.19
Google StoreBot
User agent token
Storebot-Google
Full user agent strings
Desktop agent:
Mozilla/5.0 (X11; Linux x86_64; Storebot-Google/1.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36
Mobile agent:
Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012; Storebot-Google/1.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Mobile Safari/537.36
A note about Chrome/W.X.Y.Z in user agents
Wherever you see the string Chrome/W.X.Y.Z in the user agent
strings in the table, W.X.Y.Z is actually a placeholder that represents the version
of the Chrome browser used by that user agent: for example, 41.0.2272.96. This version
number will increase over time to
match the latest Chromium release version used by Googlebot.
If you are searching your logs or filtering your server for a user agent with this pattern,
use wildcards for the version number rather than an exact one.
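For example, here is a minimal Python sketch of that kind of version-agnostic filtering; the regular expression and the sample string are illustrative, not an official pattern:

import re

# Treat each numeric component of Chrome/W.X.Y.Z as a wildcard
# instead of pinning an exact version number.
chrome_version = re.compile(r"Chrome/\d+\.\d+\.\d+\.\d+")

ua = ("Mozilla/5.0 (X11; Linux x86_64; Storebot-Google/1.0) "
      "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36")

if chrome_version.search(ua):
    print("Matched a Chrome/W.X.Y.Z version token")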
User agents in robots.txt
Where several user agents are recognized in the robots.txt file, Google follows the most
specific one. If you want all of Google to be able to crawl your pages, you don't need a
robots.txt file at all. If you want to block all of Google's crawlers from some of your
content, or to allow them all access to it, you can do so by specifying Googlebot as the user
agent. For example, if you want all your pages to appear in Google Search and you want AdSense
ads to appear on your pages, you don't need a robots.txt file. Similarly, if you want to block
some pages from Google altogether, blocking the Googlebot user agent also blocks all of
Google's other user agents.
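As a minimal sketch, a single group addressed to Googlebot covers every Google crawler (the /not-for-google/ path is just a placeholder):

User-agent: Googlebot
Disallow: /not-for-google/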
But if you want more fine-grained control, you can get more specific. For example, you might
want all your pages to appear in Google Search, but you don't want images in your personal
directory to be crawled. In this case, use robots.txt to disallow the
Googlebot-Image user agent from crawling the files in your personal directory
(while allowing Googlebot to crawl all files), like this:
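User-agent: Googlebot-Image
Disallow: /personal

User-agent: Googlebot
Disallow:

(Here /personal stands for the directory that holds those images.)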
To take another example, say that you want ads on all your pages, but you don't want those
pages to appear in Google Search. Here, you'd block Googlebot, but allow the
Mediapartners-Google user agent, like this:
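User-agent: Googlebot
Disallow: /

User-agent: Mediapartners-Google
Disallow: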
Each Google crawler accesses sites for a specific purpose and at different rates. Google uses
algorithms to determine the optimal crawl rate for each site. If a Google crawler is crawling
your site too often, you can
reduce the crawl rate.