Don't use 403s or 404s for rate limiting

Friday, February 17, 2023

Over the last few months we noticed an uptick in website owners and some content delivery networks (CDNs) attempting to use 404 and other 4xx client errors (but not 429) to reduce Googlebot's crawl rate.

The short version of this blog post is: please don't do that; we have documentation about how to reduce Googlebot's crawl rate. Read that instead to learn how to manage it effectively.

Back to basics: 4xx errors are for client errors

The 4xx errors servers return to clients are a signal from the server that the client's request was wrong in some sense. Most of the errors in this category are pretty benign: "not found" errors, "forbidden", "I'm a teapot" (yes, that's a thing). They don't suggest that anything is wrong with the server itself.

The one exception is 429, which stands for "too many requests". This error is a clear signal to any well-behaved robot, including our beloved Googlebot, that it needs to slow down because it's overloading the server.
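To make that concrete, here's a minimal sketch of the idea, assuming a plain Python standard-library server and a made-up request counter and threshold; a real site would rely on its web server's or CDN's built-in rate limiting instead. The point is simply that an overloaded server answers with 429 and a Retry-After hint rather than a 403 or 404.

```python
# Minimal sketch: answer with 429 + Retry-After when the server decides it is
# getting too many requests. The window and threshold below are made up for
# illustration; use whatever your actual capacity allows.
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

WINDOW_SECONDS = 10           # hypothetical sliding window
MAX_REQUESTS_PER_WINDOW = 60  # hypothetical threshold
_request_times = []

class RateLimitedHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        now = time.time()
        # Keep only the timestamps inside the window, then record this request.
        _request_times[:] = [t for t in _request_times if now - t < WINDOW_SECONDS]
        _request_times.append(now)

        if len(_request_times) > MAX_REQUESTS_PER_WINDOW:
            # Too many requests: tell the client (Googlebot included) to back off.
            self.send_response(429)
            self.send_header("Retry-After", "120")  # seconds; pick what suits your server
            self.end_headers()
            self.wfile.write(b"Too many requests, please slow down.\n")
            return

        # Normal response path.
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"Hello, crawler.\n")

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), RateLimitedHandler).serve_forever()
```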

Why 4xx errors are bad for rate limiting Googlebot (except 429)

Client errors are just that: client errors. They generally don't suggest an error with the server: not that it's overloaded, not that it's encountered a critical error and is unable to respond to the request. They simply mean that the client's request was bad in some way. There's no sensible way to equate, for example, a 404 error with the server being overloaded. Imagine if that were the case: you get an influx of 404 errors from your friend accidentally linking to the wrong pages on your site, and in turn Googlebot slows down its crawling. That would be pretty bad. The same goes for 403, 410, and 418.

And again, the big exception is the 429 status code, which translates to "too many requests".

What rate limiting with 4xx does to Googlebot

All 4xx HTTP status codes (again, except 429) will cause your content to be removed from Google Search. What's worse, if you also serve your robots.txt file with a 4xx HTTP status code, it will be treated as if it didn't exist, which means none of its rules apply anymore. If you had a rule there that disallowed crawling of your dirty laundry, now Googlebot also knows about it; not great for either party involved.
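If you're not sure what your robots.txt currently returns, a quick check along these lines shows the status code; the URL below is a placeholder for your own site. A 4xx there (other than 429) means the file is treated as if it didn't exist.

```python
# Quick check of the HTTP status your robots.txt returns; the URL is a placeholder.
import urllib.request
import urllib.error

ROBOTS_URL = "https://www.example.com/robots.txt"  # replace with your own site

try:
    with urllib.request.urlopen(ROBOTS_URL) as resp:
        print(ROBOTS_URL, "returned", resp.status)
except urllib.error.HTTPError as err:
    # urllib raises HTTPError for 4xx and 5xx responses.
    print(ROBOTS_URL, "returned", err.code)
    if 400 <= err.code < 500 and err.code != 429:
        print("Careful: a 4xx robots.txt is treated as if the file didn't exist.")
```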

How to reduce Googlebot's crawl rate, the right way

We have extensive documentation about how to reduce Googlebot's crawl rate and also about how Googlebot (and Search indexing) handles the different HTTP status codes; be sure to check them out. In short, you want to do either of these things:

- Temporarily reduce Googlebot's crawl rate with Search Console.
- Return 500, 503, or 429 HTTP status codes to Googlebot when it's crawling too fast (see the sketch after this list).
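For the second option, here's one rough way that could look in practice: a minimal WSGI sketch in which the OVERLOADED flag and the simple User-Agent check are placeholders for whatever overload detection your stack already has. While the server is actually struggling, crawler requests get a temporary 503 with Retry-After instead of a 403 or 404.

```python
# Sketch: serve a temporary 503 + Retry-After to crawlers while overloaded.
# OVERLOADED and the User-Agent check are illustrative placeholders.
from wsgiref.simple_server import make_server

OVERLOADED = False  # flip this from your own monitoring / load checks

def app(environ, start_response):
    user_agent = environ.get("HTTP_USER_AGENT", "")
    is_crawler = "Googlebot" in user_agent  # crude check, for illustration only

    if OVERLOADED and is_crawler:
        # Temporary condition: a 503 with Retry-After asks Googlebot to come back
        # later, instead of signaling a client error with a 403 or 404.
        start_response("503 Service Unavailable", [
            ("Retry-After", "300"),  # seconds; tune to your situation
            ("Content-Type", "text/plain"),
        ])
        return [b"Temporarily overloaded, please retry later.\n"]

    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Regular content.\n"]

if __name__ == "__main__":
    make_server("localhost", 8001, app).serve_forever()
```

Treat this as a stopgap for genuine overload, not a permanent setting.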

If you need more tips or clarifications, catch us on Twitter or post in our help forums.