Friday, February 17, 2023
Over the last few months we noticed an uptick in website owners and some content delivery networks
(CDNs) attempting to use 404 and other 4xx client errors (but not 429) to reduce Googlebot's
crawl rate.
The short version of this blog post is: please don't do that; we have documentation about how to reduce Googlebot's crawl rate. Read that instead and learn how to effectively manage Googlebot's crawl rate.
Back to basics: 4xx errors are for client errors
The 4xx errors that servers return to clients are a signal from the server that the
client's request was wrong in some sense. Most of the errors in this category are pretty benign:
"not found" errors, "forbidden", "I'm a teapot" (yes, that's a thing). They don't suggest that
anything is wrong with the server itself.
The one exception is 429, which stands for "too many requests". This error is a clear
signal to any well-behaved robot, including our beloved Googlebot, that it needs to slow down
because it's overloading the server.
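To make the distinction concrete, here is a minimal Python sketch of how a polite crawler might react to status codes. It is not Googlebot's actual logic: the function name `next_delay`, the doubling strategy, and the 300-second cap are all assumptions made for illustration.

```python
from typing import Optional


def next_delay(status: int, current_delay: float,
               retry_after: Optional[str] = None,
               max_delay: float = 300.0) -> float:
    """Return how long (in seconds) a hypothetical crawler waits before its next request."""
    if status == 429:
        # 429 is the only 4xx code that says "you are sending too many requests".
        # Honor Retry-After when it is given as a whole number of seconds
        # (the HTTP-date form of the header is not handled in this sketch).
        if retry_after is not None and retry_after.strip().isdigit():
            return min(float(retry_after.strip()), max_delay)
        # Otherwise back off by doubling the delay, up to the cap.
        return min(current_delay * 2, max_delay)
    # Every other status, including 404 and 403, says nothing about server load,
    # so the crawl pace stays the same.
    return current_delay
```

With these assumptions, a 404 leaves the delay unchanged, while a 429 lengthens it.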
4xx errors are bad for rate limiting Googlebot (except 429)
Client errors are just that: client errors. They generally don't suggest an error with the server:
not that it's overloaded, not that it's encountered a critical error and is unable to respond
to the request. They simply mean that the client's request was bad in some way. There's no
sensible way to equate, for example, a 404 error to the server being overloaded.
Imagine if that were the case: you get an influx of 404 errors from your friend accidentally
linking to the wrong pages on your site, and in turn Googlebot slows down its crawling. That
would be pretty bad. The same goes for the other 4xx client errors.
And again, the big exception is the 429 status code, which translates to "too many
requests".
What rate limiting with 4xx does to Googlebot
4xx HTTP status codes (again, except 429) will cause your content to be removed from
Google Search. What's worse, if you also serve your robots.txt file with a
4xx HTTP status code, it will be treated as if it didn't exist. If you had a rule
there that disallowed crawling your dirty laundry, now Googlebot also knows about it; not great
for either party involved.
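The robots.txt consequence can be modeled in a few lines of Python. This is a simplified sketch of the behavior described above, not Googlebot's implementation: `may_crawl` and its prefix-matching rules are hypothetical, and only two cases are modeled (a 200 response, and a 4xx response other than 429).

```python
from typing import Iterable


def may_crawl(path: str, disallowed_prefixes: Iterable[str], robots_status: int) -> bool:
    """Decide whether a path may be crawled, given how robots.txt was served."""
    if 400 <= robots_status < 500 and robots_status != 429:
        # robots.txt is treated as if it didn't exist, so no disallow rule applies.
        return True
    if robots_status == 200:
        # The file was served normally, so its disallow rules are honored.
        return not any(path.startswith(prefix) for prefix in disallowed_prefixes)
    # Other status codes are outside the scope of this sketch.
    raise ValueError(f"status {robots_status} is not modeled here")
```

Under this model, `may_crawl("/private/laundry.html", ["/private/"], 200)` is `False`, but the same call with a robots.txt status of 404 returns `True`: the rule that protected the page is gone.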
How to reduce Googlebot's crawl rate, the right way
We have extensive documentation about how to reduce Googlebot's crawl rate and also about how Googlebot (and Search indexing) handles the different HTTP status codes; be sure to check them out. In short, you want to do either of these things:
- Use Search Console to temporarily reduce crawl rate.
- Return a 429 HTTP status code to Googlebot when it's crawling too fast.
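For the second option, the following is a minimal sketch of a server that answers with 429 once requests arrive too fast, using only Python's standard library. The limit of 5 requests per second, the `Retry-After` value of 60, and the names `SlidingWindowLimiter`, `Handler`, and `serve` are illustrative assumptions; a real deployment would more likely do this at the web server or CDN layer and track clients separately.

```python
import time
from collections import deque
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most max_requests within any window_seconds-long window."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = deque()  # timestamps of recently accepted requests

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Forget requests that have fallen out of the window.
        while self._hits and now - self._hits[0] >= self.window_seconds:
            self._hits.popleft()
        if len(self._hits) >= self.max_requests:
            return False
        self._hits.append(now)
        return True


# One shared limiter for all clients (an assumption made to keep the sketch short).
LIMITER = SlidingWindowLimiter(max_requests=5, window_seconds=1.0)


class Handler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if not LIMITER.allow():
            # Too many requests: signal overload with 429, not 404 or 403.
            self.send_response(429)
            self.send_header("Retry-After", "60")
            self.end_headers()
            return
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Start the demo server on localhost; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Calling `serve()` starts the demo on localhost. The point of the design is that normal pages keep returning 200 and only the excess requests receive 429, so nothing is reported as missing or forbidden.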
If you need more tips or clarifications, catch us on Twitter or post in our help forums.