Usage Limits

The Google Play EMM API has a default limit of 100 queries per second (QPS) for each enterprise. If you send fewer than 100 QPS, the remainder of your quota is automatically carried over for several seconds. For example:

Time     Total quota   Number of requests sent   Unused quota carried over
0 sec.   100           80                        20
1 sec.   120           50                        70
2 sec.   170           170                       0
3 sec.   100           75                        25

Note that in the example above, the quota is never exceeded (at 2 seconds it is fully used, but not surpassed). If you exceed the quota, the Google Play EMM API returns HTTP 429 Too Many Requests. To help ensure that you stay below the stated usage limits and offer an optimal experience for your users, consider implementing the best practices described in the section below.

Recommendations for staying below the API usage limits

When using the Google Play EMM API, there are some best practices that you can implement to distribute requests and reduce your risk of exceeding the usage limits.

Randomize start times and intervals

Activities such as syncing or checking in many devices at the same time are likely to cause a significant spike in request volume. Instead of performing these activities at fixed, regularly scheduled intervals, you can distribute your request load by randomizing the intervals. For example, rather than syncing each device every 24 hours, sync each device after a randomly chosen interval of between 23 and 25 hours. This spreads out the number of requests.

Similarly, if you run a daily job that makes many API calls in quick succession, consider starting the job at a random time each day to prevent making a high volume of requests for all your enterprise customers at the same time.
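The randomized sync interval described above can be sketched as follows (a minimal example; the 23-to-25-hour window matches the example in this section, and the function name is illustrative, not part of the API):

```python
import random

def next_sync_delay(base_hours: float = 24.0, jitter_hours: float = 1.0) -> float:
    """Return a randomized delay, in seconds, until the next device sync.

    Instead of a fixed 24-hour cycle, pick a delay uniformly between
    23 and 25 hours so that device check-ins drift apart over time.
    """
    hours = random.uniform(base_hours - jitter_hours, base_hours + jitter_hours)
    return hours * 3600

# Schedule each device independently so their syncs don't align.
delays = [next_sync_delay() for _ in range(3)]
```

Because each device draws its own delay, devices that start out synchronized gradually desynchronize, flattening the request load.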

Use exponential backoff to retry requests

If you run jobs that consist of many API calls, use an exponential backoff strategy in response to reaching the quota. Exponential backoff is an algorithm that retries failed requests with exponentially increasing wait times between attempts. An example flow for implementing simple exponential backoff is as follows:

  1. Make a request to the Google Play EMM API.
  2. Receive an HTTP 429 response.
  3. Wait 2 seconds + random_time, then retry the request.
  4. Receive an HTTP 429 response.
  5. Wait 4 seconds + random_time, then retry the request.
  6. Receive an HTTP 429 response.
  7. Wait 8 seconds + random_time, then retry the request.

The random_time is typically a random number ranging from -0.5 * wait time to +0.5 * wait time. Generate a new random_time each time you retry your request. API calls that are required to complete user-facing actions can be retried on a more frequent schedule (0.5s, 1s, and 2s, for example).
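The flow above can be sketched in Python as shown below. QuotaExceededError is a hypothetical exception standing in for an HTTP 429 response; substitute whatever error your HTTP client raises:

```python
import random
import time

class QuotaExceededError(Exception):
    """Stand-in for an HTTP 429 Too Many Requests response."""

def call_with_backoff(request, max_retries=5, base_wait=2.0):
    """Retry `request` with exponential backoff plus jitter.

    Waits 2s, 4s, 8s, ... between attempts, each adjusted by a
    random_time drawn from -0.5 * wait to +0.5 * wait.
    """
    for attempt in range(max_retries):
        try:
            return request()
        except QuotaExceededError:
            wait = base_wait * (2 ** attempt)
            random_time = random.uniform(-0.5 * wait, 0.5 * wait)
            time.sleep(wait + random_time)
    # Final attempt: let the error propagate to the caller.
    return request()
```

For user-facing actions, pass a smaller base_wait (for example, 0.5) so retries happen on the more frequent schedule mentioned above.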

Rate-limit batch processes

Each time a batched process reaches the quota, the latency of user actions that call the API increases. In situations like these, strategies such as exponential backoff may not be enough to maintain low latency for user actions.

To avoid reaching the API’s usage limits repeatedly and increasing latency for user-facing actions, consider using a rate limiter for your batched processes (see Google’s RateLimiter). With a rate limiter you can adjust the rate of your API requests so that you consistently remain below the usage limits.

For example, start a batched process with a default rate limit of 50 QPS. As long as the API doesn't return an error, increase the rate limit slowly (1% every minute). Each time you reach the quota, reduce your request rate by 20%. This adaptive approach converges on a near-optimal request rate while reducing latency for user-facing actions.
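The adaptive scheme above can be sketched as a small state object (a minimal illustration in Python, not Guava's RateLimiter; the class and method names are hypothetical):

```python
class AdaptiveRate:
    """Track an adaptive request rate for a batched process.

    Starts at 50 QPS, grows the rate by 1% after each minute
    without errors, and cuts it by 20% whenever the API returns
    HTTP 429, per the strategy described above.
    """

    def __init__(self, start_qps=50.0, floor_qps=1.0, ceiling_qps=100.0):
        self.qps = start_qps
        self.floor = floor_qps      # never stall the batch entirely
        self.ceiling = ceiling_qps  # never exceed the documented limit

    def on_minute_without_errors(self):
        """Probe upward slowly: +1% per error-free minute."""
        self.qps = min(self.qps * 1.01, self.ceiling)

    def on_quota_exceeded(self):
        """Back off sharply on HTTP 429: -20%."""
        self.qps = max(self.qps * 0.80, self.floor)
```

Feed the current qps value into whatever rate limiter paces your requests (for example, Guava's RateLimiter.setRate in a Java implementation). The slow increase and sharp decrease mirror the additive-increase/multiplicative-decrease idea used in congestion control.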