Rate Limits

The Google Ads API buckets requests for rate limiting by queries per second (QPS) per client customer ID (CID) and developer token, meaning that metering is enforced independently on both CIDs and developer tokens. The Google Ads API uses a Token Bucket algorithm to meter requests and determine an appropriate QPS limit, so the exact limit will vary depending on the overall server load at any given time.

The purpose of imposing rate limits is to prevent one user from disrupting service for other users by (either intentionally or unintentionally) overwhelming the Google Ads API servers with a high volume of requests.

Requests that violate rate limits are rejected with the error RESOURCE_TEMPORARILY_EXHAUSTED.

You can mitigate rate limit errors by both actively reducing the number of requests your app makes and throttling QPS from the client side.

There are a number of ways to reduce the chances of exceeding the rate limit. Becoming familiar with Enterprise Integration Patterns (EIP) concepts such as Messaging, Redelivery, and Throttling can help you build a more robust client app.

The following recommended practices are ordered by complexity, with simpler strategies first and more robust but sophisticated architectures after:

Limit concurrent tasks

One root cause of exceeding rate limits is that the client app spawns an excessive number of parallel tasks. While we don't limit the number of parallel requests a client app can make, this can easily exceed the Requests Per Second limit at the developer token level.

We recommend setting a reasonable upper bound for the total number of concurrent tasks that will make requests (across all processes and machines), then adjusting it upward to optimize your throughput without exceeding the rate limit.

Furthermore, you can consider throttling QPS from the client side (check out Throttling and rate limiters).
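As a minimal sketch of this idea in Python, a fixed-size thread pool caps the number of requests in flight. The pool size and the `make_request` function here are hypothetical placeholders, not part of any client library:

```python
from concurrent.futures import ThreadPoolExecutor

# Upper bound on concurrent requests in this process; tune it upward
# gradually while watching for RESOURCE_TEMPORARILY_EXHAUSTED errors.
MAX_CONCURRENT_TASKS = 4

def make_request(customer_id):
    # Placeholder for a real Google Ads API call.
    return f"processed {customer_id}"

customer_ids = [f"123-456-{i:04d}" for i in range(20)]

# The pool size is the hard cap: at most MAX_CONCURRENT_TASKS requests
# run at the same time, no matter how many customer IDs are queued.
with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_TASKS) as pool:
    results = list(pool.map(make_request, customer_ids))
```

In a multi-process or multi-machine deployment, the cap has to be budgeted across all workers, since the rate limit applies to the developer token as a whole.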

Batching requests

Consider batching multiple operations into a single request. This is most applicable to MutateFoo calls. For example, if you're updating the status of multiple instances of AdGroupAd, instead of calling MutateAdGroupAds once for each AdGroupAd, you can call MutateAdGroupAds once and pass in multiple operations. Refer to our batch operations guidance for some additional examples.
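The shape of a batched mutate call can be sketched as follows. The service class and method below are stand-ins for the real client library, used only to show one request carrying many operations:

```python
# Stand-in for a Google Ads API mutate service: it accepts a list of
# operations in a single request and returns one result per operation.
class FakeAdGroupAdService:
    def __init__(self):
        self.calls = 0  # counts round trips to the API

    def mutate_ad_group_ads(self, customer_id, operations):
        self.calls += 1  # one request, regardless of how many operations
        return [f"updated {op['resource_name']}" for op in operations]

service = FakeAdGroupAdService()

# Build one operation per AdGroupAd whose status should change ...
operations = [
    {"resource_name": f"customers/123/adGroupAds/{i}", "status": "PAUSED"}
    for i in range(500)
]

# ... then send them all in one request: 1 round trip instead of 500.
responses = service.mutate_ad_group_ads("123", operations)
```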

While batching requests reduces the total number of requests and mitigates the Requests Per Minute rate limit, it may trigger the Operations Per Minute rate limit if you perform a large number of operations against a single account.

Throttling and rate limiters

In addition to limiting the total number of threads in your client application, you can also implement rate limiters on the client side. This ensures that all the threads across your processes or clusters are governed by a specific QPS limit from the client side.

You can check out Guava Rate Limiter, or implement your own Token Bucket based algorithm for a clustered environment. For example, you could generate tokens and store them in shared transactional storage such as a database, and each client would have to acquire and consume a token before processing a request. If the tokens are used up, the client waits until the next batch of tokens is generated.
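A minimal single-process version of the Token Bucket idea can be sketched as below; in a clustered setup the bucket state would instead live in the shared transactional storage described above. The rate and capacity values are illustrative, not recommended limits:

```python
import threading
import time

class TokenBucket:
    """Minimal thread-safe token bucket: refills at `rate` tokens per
    second up to `capacity`; acquire() blocks until a token is free."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()
        self.lock = threading.Lock()

    def acquire(self):
        while True:
            with self.lock:
                now = time.monotonic()
                # Credit tokens for the time elapsed since the last refill.
                self.tokens = min(
                    self.capacity,
                    self.tokens + (now - self.last_refill) * self.rate,
                )
                self.last_refill = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
            # No token available yet; wait roughly one token's worth of time.
            time.sleep(1.0 / self.rate)

# Cap this process at ~5 QPS, allowing bursts of up to 5 requests.
bucket = TokenBucket(rate=5, capacity=5)
start = time.monotonic()
for _ in range(10):
    bucket.acquire()  # each request blocks until a token is available
elapsed = time.monotonic() - start
```

With `rate=5` and `capacity=5`, the first five acquisitions succeed immediately from the initial burst, and the remaining five are spread out at roughly five per second.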


Message queues

A message queue is a solution for distributing operation load while also controlling request and consumer rates. There are a number of message queue options available—some open source, some proprietary—and many of them can work with different languages.

When using message queues, you can have multiple producers pushing messages to the queue and multiple consumers processing those messages. Throttling can be implemented on the consumer side by limiting the number of concurrent consumers, or by implementing rate limiters or throttlers for either the producers or the consumers.

For example, if a message consumer encounters a rate limit error, that consumer can return the request to the queue to be retried. At the same time, it can notify all other consumers to pause processing for a number of seconds to recover from the error.
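This requeue-and-back-off pattern can be sketched in a single process as follows. `RateLimitError`, `process()`, and the shared `backoff_until` timestamp are all stand-ins: a real deployment would use the API's actual error type and a distributed signal rather than a module-level variable:

```python
import queue
import threading
import time

class RateLimitError(Exception):
    pass  # stand-in for the API's RESOURCE_TEMPORARILY_EXHAUSTED error

work_queue = queue.Queue()
backoff_until = 0.0        # shared "pause processing until this time" signal
backoff_lock = threading.Lock()
failed_once = set()        # makes the simulated error fire only on first try

def process(item):
    # Placeholder for a real API request; "burst" fails on its first attempt.
    if item == "burst" and item not in failed_once:
        failed_once.add(item)
        raise RateLimitError()
    return f"done {item}"

def consume(results):
    global backoff_until
    while True:
        try:
            item = work_queue.get_nowait()
        except queue.Empty:
            return  # queue drained; this consumer exits
        # Honor a pause requested by any consumer that hit the limit.
        wait = backoff_until - time.monotonic()
        if wait > 0:
            time.sleep(wait)
        try:
            results.append(process(item))
        except RateLimitError:
            with backoff_lock:
                # Tell every consumer to back off for half a second.
                backoff_until = time.monotonic() + 0.5
            work_queue.put(item)  # return the failed request to the queue

for item in ["a", "burst", "b"]:
    work_queue.put(item)

results = []
consumers = [threading.Thread(target=consume, args=(results,)) for _ in range(2)]
for t in consumers:
    t.start()
for t in consumers:
    t.join()
```

All three items are eventually processed: the rate-limited request goes back on the queue, and both consumers honor the shared back-off window before continuing.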