Overview of Google crawlers and fetchers (user agents)
Google uses crawlers and fetchers to perform actions for its products, either automatically or triggered by user request.
"Crawler" (sometimes also called a "robot" or "spider") is a generic term for any program that is used to automatically discover and scan websites by following links from one web page to another. Google's main crawler used for Google Search is called Googlebot.
Fetchers, like a browser, are tools that request a single URL when prompted by a user.
The following tables show the Google crawlers and fetchers used by various products and services, how you may see them in your referrer logs, and how to specify them in robots.txt. The lists are not exhaustive; they only cover the most common requestors that may show up in log files.
- The user agent token is used in the User-agent: line in robots.txt to match a crawler type when writing crawl rules for your site. Some crawlers have more than one token, as shown in the table; you need to match only one crawler token for a rule to apply. This list is not complete, but covers most crawlers you might see on your website.
- The full user agent string is a full description of the crawler, and appears in the HTTP request and your web logs.
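To make the role of the token concrete, here is a minimal sketch of how robots.txt lines could be grouped by user agent token. It is a simplified model, not Google's parser: the function name `parse_groups` is hypothetical, consecutive User-agent lines are assumed to share one group, and tokens are compared case-insensitively.

```python
def parse_groups(robots_txt: str) -> dict[str, list[tuple[str, str]]]:
    """Map each lowercased user agent token to the (directive, value) rules of its group.

    Simplified sketch: consecutive User-agent lines share one group, a
    User-agent line that follows a rule starts a new group, and rules for a
    token that appears in several groups are merged into one list.
    """
    groups: dict[str, list[tuple[str, str]]] = {}
    current_tokens: list[str] = []
    seen_rule = False
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if ":" not in line:
            continue  # blank or malformed line
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if seen_rule:  # a User-agent line after rules opens a new group
                current_tokens, seen_rule = [], False
            token = value.lower()  # tokens are matched case-insensitively
            current_tokens.append(token)
            groups.setdefault(token, [])
        elif field in ("allow", "disallow") and current_tokens:
            seen_rule = True
            for token in current_tokens:
                groups[token].append((field, value))
    return groups
```

A group headed by both `User-agent: Googlebot` and `User-agent: Googlebot-Image` yields the same rules under each token, which mirrors the idea that matching any one token is enough for a rule to apply.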
Common crawlers
Google's common crawlers are used to find information for building Google's search indexes, perform other product-specific crawls, and for analysis. They always obey robots.txt rules and generally crawl from the IP ranges published in the googlebot.json object.
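Because common crawlers generally come from the published ranges, one way to check a logged request is to test its IP against those prefixes. The sketch below assumes googlebot.json has a "prefixes" list whose entries carry either an "ipv4Prefix" or an "ipv6Prefix" key; `SAMPLE_RANGES`, `load_networks`, and `ip_in_ranges` are hypothetical names, and the sample prefixes are documentation address blocks, not real Google ranges.

```python
import ipaddress

# Illustrative data only: assumes the googlebot.json layout of a "prefixes"
# list with "ipv4Prefix"/"ipv6Prefix" entries, and uses documentation address
# blocks (RFC 5737 / RFC 3849) instead of real Google ranges.
SAMPLE_RANGES = {
    "prefixes": [
        {"ipv4Prefix": "192.0.2.0/27"},
        {"ipv6Prefix": "2001:db8:4801:10::/64"},
    ]
}


def load_networks(ranges: dict) -> list:
    """Convert each published prefix entry into an ipaddress network object."""
    networks = []
    for entry in ranges.get("prefixes", []):
        prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if prefix:
            networks.append(ipaddress.ip_network(prefix))
    return networks


def ip_in_ranges(ip: str, networks: list) -> bool:
    """Return True if the client IP lies inside any published prefix.

    Testing an IPv4 address against an IPv6 network (or vice versa) yields False.
    """
    address = ipaddress.ip_address(ip)
    return any(address in network for network in networks)
```

With the real file, you would first read it with `json.load` and pass the resulting dict to `load_networks`. Note that 192.0.2.40 falls outside the sample /27, which only covers .0 through .31.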
| Crawler | Description |
|---|---|
| Googlebot Smartphone | |
| Googlebot Desktop | |
| Googlebot Image | Used for crawling image URLs for Google Images and products dependent on images. |
| Googlebot News | Googlebot News uses Googlebot for crawling news articles; however, it respects its historic user agent token. |
| Googlebot Video | Used for crawling video URLs for Google Video and products dependent on videos. |
| Google StoreBot | Google StoreBot crawls through certain types of pages, including, but not limited to, product details pages, cart pages, and checkout pages. |
| Google-InspectionTool | Google-InspectionTool is the crawler used by Search testing tools such as the Rich Results Test and URL inspection in Search Console. Apart from the user agent and user agent token, it mimics Googlebot. |
| GoogleOther | GoogleOther is the generic crawler that may be used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development. |
| GoogleOther-Image | GoogleOther-Image is the version of GoogleOther optimized for fetching publicly accessible image URLs. |
| GoogleOther-Video | GoogleOther-Video is the version of GoogleOther optimized for fetching publicly accessible video URLs. |
| Google-CloudVertexBot | Google-CloudVertexBot crawls sites at the site owners' request when building Vertex AI Agents. |
| Google-Extended | |
Special-case crawlers
The special-case crawlers are used by specific products where there's an agreement between the crawled site and the product about the crawl process. For example, AdsBot ignores the global robots.txt user agent (*) with the ad publisher's permission. The special-case crawlers may ignore robots.txt rules, so they operate from a different IP range than the common crawlers. The IP ranges are published in the special-crawlers.json object.
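The practical effect of ignoring the global user agent can be sketched as a group lookup with a switch for the `*` fallback. This is a simplified model under stated assumptions: `select_rules` and its `honors_wildcard` flag are hypothetical, the token `AdsBot-Google` is assumed for AdsBot, and a special-case crawler is assumed to still follow a group that names its own token.

```python
from typing import Optional


def select_rules(groups: dict[str, list[str]], token: str, honors_wildcard: bool) -> Optional[list[str]]:
    """Pick the Disallow paths a crawler follows under a simplified model.

    groups maps lowercased user agent tokens to Disallow paths. A crawler whose
    own token has a group uses it; otherwise it falls back to the global "*"
    group only if honors_wildcard is True. None means no group applies.
    """
    own = groups.get(token.lower())
    if own is not None:
        return own
    if honors_wildcard:
        return groups.get("*")
    return None
```

Under this model, a file containing only `User-agent: *` with `Disallow: /` restricts a common crawler but leaves a special-case crawler with no applicable group, so rules meant for it have to name its token explicitly.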
| Crawler | Description |
|---|---|
| APIs-Google | Used by Google APIs to deliver push notification messages. Ignores the global user agent (*). |
| AdsBot Mobile Web | Checks mobile web page ad quality. Ignores the global user agent (*). |
| AdsBot | Checks desktop web page ad quality. Ignores the global user agent (*). |
| AdSense | The AdSense crawler visits your site to determine its content in order to provide relevant ads. Ignores the global user agent (*). |
| Mobile AdSense | The Mobile AdSense crawler visits your site to determine its content in order to provide relevant ads. Ignores the global user agent (*). |
| Google-Safety | The Google-Safety user agent handles abuse-specific crawling, such as malware discovery for publicly posted links on Google properties. This user agent ignores robots.txt rules. |
User-triggered fetchers
User-triggered fetchers are initiated by users to perform a product-specific fetching function. For example, Google Site Verifier acts on a user's request, or a site hosted on Google Cloud (GCP) has a feature that allows the site's users to retrieve an external RSS feed. Because the fetch was requested by a user, these fetchers generally ignore robots.txt rules. The IP ranges the user-triggered fetchers use are published in the user-triggered-fetchers.json and user-triggered-fetchers-google.json objects.
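Since each category publishes its own ranges, a log analysis script could label a request by checking its IP against every category in turn. This is a minimal sketch: `RANGE_SETS` and `classify_ip` are hypothetical, the prefixes are documentation address blocks standing in for the published JSON objects, and the user-triggered category merges prefixes from both of its files.

```python
import ipaddress
from typing import Optional

# Hypothetical prefix lists per crawler category. Real data would come from the
# published JSON objects; documentation address blocks are used here so that
# no real Google ranges are implied.
RANGE_SETS = {
    "common crawler": ["192.0.2.0/24"],            # stands in for googlebot.json
    "special-case crawler": ["198.51.100.0/24"],   # stands in for special-crawlers.json
    "user-triggered fetcher": [
        "203.0.113.0/24",   # stands in for user-triggered-fetchers.json
        "2001:db8::/32",    # stands in for user-triggered-fetchers-google.json
    ],
}


def classify_ip(ip: str, range_sets: dict[str, list[str]]) -> Optional[str]:
    """Return the first category whose prefixes contain ip, or None."""
    address = ipaddress.ip_address(ip)
    for category, prefixes in range_sets.items():
        if any(address in ipaddress.ip_network(prefix) for prefix in prefixes):
            return category
    return None
```

Knowing the category tells you whether robots.txt rules are expected to apply: common crawlers obey them, while special-case crawlers and user-triggered fetchers may not.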
| Fetcher | Description |
|---|---|
| Feedfetcher | Feedfetcher is used for crawling RSS or Atom feeds for Google Podcasts, Google News, and PubSubHubbub. |
| Google Publisher Center | Fetches and processes feeds that publishers explicitly supplied through the Google Publisher Center to be used in Google News landing pages. |
| Google Read Aloud | Upon user request, Google Read Aloud fetches and reads out web pages using text-to-speech (TTS). |
| Google Site Verifier | Google Site Verifier fetches Search Console verification tokens upon user request. |
A note about Chrome/W.X.Y.Z in user agents
Wherever you see the string Chrome/W.X.Y.Z in the user agent strings in the tables, W.X.Y.Z is actually a placeholder that represents the version of the Chrome browser used by that user agent: for example, 41.0.2272.96. This version number will increase over time to match the latest Chromium release version used by Googlebot.
If you are searching your logs or filtering your server for a user agent with this pattern, use wildcards for the version number rather than specifying an exact version number.
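One way to apply that wildcard advice is to turn a published user agent template into a regular expression in which only the placeholder varies. In this sketch, `template_to_regex` and `VERSION_PATTERN` are hypothetical names, and `TEMPLATE` is an illustrative string; copy the exact template from Google's current documentation before relying on it.

```python
import re

# Matches one or more dot-separated integers, e.g. 41.0.2272.96 or 120.0.6099.224.
VERSION_PATTERN = r"\d+(?:\.\d+)*"


def template_to_regex(template: str) -> re.Pattern:
    """Compile a user agent template so that W.X.Y.Z matches any version number.

    Every other character in the template is escaped and must match literally.
    """
    literal_parts = template.split("W.X.Y.Z")
    return re.compile(VERSION_PATTERN.join(re.escape(part) for part in literal_parts))


# Illustrative template only; use the exact string Google currently publishes.
TEMPLATE = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Googlebot/2.1; "
    "+http://www.google.com/bot.html) Chrome/W.X.Y.Z Safari/537.36"
)
matcher = template_to_regex(TEMPLATE)
```

Use `matcher.fullmatch(ua)` when you have the isolated user agent header, or `matcher.search(line)` when scanning a whole log line that also contains other fields.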
User agents in robots.txt
Where several user agents are recognized in the robots.txt file, Google follows the most specific one. If you want all of Google to be able to crawl your pages, you don't need a robots.txt file at all. If you want to block or allow all of Google's crawlers from accessing some of your content, you can do this by specifying Googlebot as the user agent. For example, if you want all your pages to appear in Google Search, and if you want AdSense ads to appear on your pages, you don't need a robots.txt file. Similarly, if you want to block some pages from Google altogether, blocking the Googlebot user agent will also block all of Google's other user agents.
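The "most specific group wins" behavior can be sketched as walking a crawler's tokens from most to least specific. The ordered token list, such as `["Googlebot-Image", "Googlebot"]`, is a hypothetical fallback chain chosen for illustration, not an official list, and `applicable_group` is a made-up helper name.

```python
from typing import Optional


def applicable_group(group_names: set[str], crawler_tokens: list[str]) -> Optional[str]:
    """Return the robots.txt group a crawler follows under a simplified model.

    crawler_tokens is ordered from most to least specific. The first token that
    has its own group wins; otherwise the crawler falls back to the global "*"
    group, or to no group at all (None) if there is none.
    """
    names = {name.lower() for name in group_names}
    for token in crawler_tokens:
        if token.lower() in names:
            return token.lower()
    return "*" if "*" in names else None
```

With only a Googlebot group present, the image crawler in this model follows the Googlebot group, which is why blocking Googlebot also affects it; adding a Googlebot-Image group overrides that.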
But if you want more fine-grained control, you can get more specific. For example, you might want all your pages to appear in Google Search, but you don't want images in your personal directory to be crawled. In this case, use robots.txt to disallow the Googlebot-Image user agent from crawling the files in your personal directory (while allowing Googlebot to crawl all files), like this:
User-agent: Googlebot
Disallow:

User-agent: Googlebot-Image
Disallow: /personal
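To check what that file permits, here is a minimal sketch that evaluates the two groups with plain prefix matching. The names `ROBOTS_GROUPS`, `is_allowed`, and `can_crawl` are hypothetical, and Allow lines and the `*` and `$` wildcards are deliberately not modeled.

```python
# The two groups from the robots.txt file above, keyed by lowercased token.
# A bare "Disallow:" line is stored as an empty string and blocks nothing.
ROBOTS_GROUPS = {
    "googlebot": [""],
    "googlebot-image": ["/personal"],
}


def is_allowed(path: str, disallows: list[str]) -> bool:
    """Simplified prefix matching: blocked if path starts with any non-empty
    Disallow value. Allow lines and wildcards are not modeled."""
    return not any(rule and path.startswith(rule) for rule in disallows)


def can_crawl(token: str, path: str) -> bool:
    """Look up the crawler's own group; a token with no group is treated as unrestricted."""
    return is_allowed(path, ROBOTS_GROUPS.get(token.lower(), []))
```

Because matching is by prefix, `Disallow: /personal` also covers a path such as /personal-notes.png; writing `Disallow: /personal/` limits the rule to the directory itself.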
To take another example, say that you want ads on all your pages, but you don't want those pages to appear in Google Search. Here, you'd block Googlebot but allow the Mediapartners-Google user agent, like this:
User-agent: Googlebot
Disallow: /

User-agent: Mediapartners-Google
Disallow:
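If you generate robots.txt programmatically, a small helper can render one group per token. This is a sketch with a hypothetical `build_robots_txt` function: it emits only User-agent and Disallow lines, and an empty path list becomes a bare `Disallow:` line, which allows everything.

```python
def build_robots_txt(groups: dict[str, list[str]]) -> str:
    """Render {user agent token: [Disallow paths]} as robots.txt text.

    Groups are separated by a blank line. An empty path list renders as a bare
    "Disallow:" line, which blocks nothing for that user agent.
    """
    blocks = []
    for token, disallows in groups.items():
        lines = [f"User-agent: {token}"]
        lines += [f"Disallow: {path}" for path in disallows] or ["Disallow:"]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


# Reproduces the ads-only example: block Googlebot, allow Mediapartners-Google.
print(build_robots_txt({"Googlebot": ["/"], "Mediapartners-Google": []}))
```

Keeping the rules in a dict keyed by token makes each crawler's policy explicit, and the same helper reproduces the earlier Googlebot-Image example with `{"Googlebot": [], "Googlebot-Image": ["/personal"]}`.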
Controlling crawl speed
Each Google crawler accesses sites for a specific purpose and at different rates. Google uses algorithms to determine the optimal crawl rate for each site. If a Google crawler is crawling your site too often, you can reduce the crawl rate.
Retired Google crawlers
The following Google crawlers are no longer in use, and are only noted here for historical reference.
| Crawler | Description |
|---|---|
| Duplex on the web | Supported the Duplex on the web service. |
| Web Light | Checked for the presence of the … |
| AdsBot Mobile Web | Checked iPhone web page ad quality. Ignored the global user agent (*). |
| Mobile Apps Android | Checked Android app page ad quality. Obeyed … |
| Google Favicon | |