This guide explains best practices to consider when developing applications that use the RTB Protocol.
Keep connections alive
Establishing a new connection increases latencies and takes far more resources on both ends than reusing an existing one. By closing fewer connections, you can reduce the number of connections that must be opened again.
First, every new connection requires an extra network round-trip to establish. Because we establish connections on demand, the first request on a connection has a shorter effective deadline and is more likely to time out than subsequent requests. Any extra timeouts increase the error rate, which can lead to your bidder being throttled.
Second, many web servers spawn a dedicated worker thread for each connection established. This means that to close and recreate the connection, the server must shut down and discard a thread, allocate a new one, make it runnable, and build the connection state, before finally processing the request. That's a lot of unnecessary overhead.
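The extra round trip described above can be put into rough numbers. This back-of-the-envelope Python sketch assumes a simplified model in which the deadline is spent only on network round trips and bidder processing. The function name and the 100 ms deadline and 20 ms round-trip time in the usage note are hypothetical.

```python
def effective_budget_ms(deadline_ms, rtt_ms, new_connection, handshake_round_trips=1):
    """Time left for the bidder to compute a response (simplified model).

    Every request costs one round trip (request out, response back). A new
    connection first spends `handshake_round_trips` extra round trips before
    the request can be sent: 1 for a plain TCP handshake, more if TLS is used.
    """
    round_trips = 1 + (handshake_round_trips if new_connection else 0)
    return deadline_ms - round_trips * rtt_ms
```

For example, `effective_budget_ms(100, 20, new_connection=False)` leaves 80 ms for the bidder, while `new_connection=True` leaves 60 ms. Passing `handshake_round_trips=2` (a TCP handshake plus a TLS 1.3 handshake) leaves only 40 ms.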
Avoid closing connections
Begin by tuning connection behavior. Most server defaults are tailored for environments with large numbers of clients, each making a small number of requests. For RTB, by contrast, a relatively small pool of machines sends requests on behalf of a large number of browsers. Under these conditions, it makes sense to reuse connections as many times as possible. We recommend that you set:
- Idle timeout to 2 minutes.
- Maximum number of requests on a connection to the highest possible value.
- Maximum number of connections to the highest value your RAM can accommodate, while taking care to verify that the number of connections does not approach that value too closely.
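In Apache, for example, this would entail setting KeepAliveTimeout to 120, MaxKeepAliveRequests to 0 (which Apache treats as unlimited), and MaxClients (renamed MaxRequestWorkers in Apache 2.4) to a value that depends on server type.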
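The same settings can be sketched outside Apache. The example below is a minimal sketch using Python's standard-library `http.server`, chosen only for illustration; `BidHandler`, `start_bid_server`, and the empty response body are hypothetical. It shows a 2-minute idle timeout, persistent HTTP/1.1 connections, and no cap on requests per connection.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

IDLE_TIMEOUT_S = 120  # the 2-minute idle timeout recommended above


class BidHandler(BaseHTTPRequestHandler):
    # HTTP/1.1 keeps connections open between requests; the http.server
    # default, HTTP/1.0, closes the connection after every response.
    protocol_version = "HTTP/1.1"
    # Socket timeout applied to every blocking read on the connection, so a
    # connection that sends no new request for IDLE_TIMEOUT_S seconds is closed.
    timeout = IDLE_TIMEOUT_S

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # consume the body so the next request parses cleanly
        body = b""  # placeholder: a real bidder would serialize a BidResponse here
        self.send_response(200)
        # Content-Length lets the client find the end of the response without
        # the server closing the connection to signal it.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging in this sketch


def start_bid_server(handler=BidHandler, host="127.0.0.1", port=0):
    """Serve on a background thread; port 0 lets the OS pick a free port.

    ThreadingHTTPServer uses one thread per connection and never caps the
    number of requests on a connection. It has no built-in equivalent of
    Apache's MaxClients.
    """
    server = ThreadingHTTPServer((host, port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Any HTTP/1.1 client that reads each response fully, such as `http.client.HTTPConnection`, can then send many requests over one connection to the address in `start_bid_server().server_address`.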
Once your connection behavior is tuned, you should also ensure that your bidder code does not close connections needlessly. For example, if you have front-end code that returns a default "no bid" response in the event of backend errors or timeouts, make sure the code returns its response without closing the connection. Otherwise, when your bidder gets overloaded, connections start to close and timeouts increase, which causes your bidder to be throttled.
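The fallback described above can be sketched in Python. This is a minimal sketch under assumptions: `call_backend` is a hypothetical function that forwards the request and returns serialized response bytes, and a no-bid is assumed to be an empty BidResponse, which as a protobuf message with no fields set serializes to zero bytes. The point it shows is that every path, including the fallback, returns a normal response without a `Connection: close` header.

```python
import concurrent.futures

# Assumption: a no-bid is an empty BidResponse, which serializes to b"".
NO_BID_BODY = b""

_backend_pool = concurrent.futures.ThreadPoolExecutor(max_workers=8)


def handle_bid_request(request_body, call_backend, timeout_s):
    """Return (status, headers, body) for one bid request.

    call_backend is hypothetical: it forwards the request to a backend and
    returns the serialized response bytes.
    """
    future = _backend_pool.submit(call_backend, request_body)
    try:
        body = future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        future.cancel()  # best effort; a call that is already running keeps going
        body = NO_BID_BODY
    except Exception:
        body = NO_BID_BODY  # backend error: fall back to the canned no-bid
    # No "Connection: close" header on any path, so the caller can keep
    # serving requests on the same connection.
    headers = {"Content-Length": str(len(body))}
    return 200, headers, body
```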
Keep connections balanced
If Authorized Buyers connects to your bidder's servers through a proxy server, the connections may become unbalanced over time because, knowing only the proxy server's IP address, Authorized Buyers cannot determine which bidder server is receiving each callout. Over time, as Authorized Buyers establishes and closes connections and the bidder's servers restart, the number of connections mapped to each can become highly variable.
When some connections are heavily utilized, other open connections may remain mostly idle because they are not needed at the time. As Authorized Buyers' traffic changes, idle connections can become active and active connections can go idle. If the connections are clustered poorly, this can cause uneven loads on your bidder servers. Google attempts to prevent this by closing all connections after 10,000 requests, which automatically rebalances hot connections over time. If you still find traffic becoming unbalanced in your environment, there are further steps you can take:
- Select the backend per request rather than once per connection if you are using frontend proxies.
- Specify a maximum number of requests per connection if you are proxying connections through a hardware load balancer or firewall and the mapping is fixed once the connections are established. Note that Google already specifies an upper limit of 10,000 requests per connection, so you should only need to provide a stricter value if you still find hot connections becoming clustered in your environment. In Apache, for example, this limit is set with the MaxKeepAliveRequests directive.
- Configure the bidder's servers to monitor their request rates and close some of their own connections if they are consistently handling too many requests compared to their peers.
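The three remedies above can be sketched as small, independent Python helpers. This is a minimal sketch under stated assumptions: the round-robin policy, the helper names, and the tolerance of 1.5 times the peers' median rate are all hypothetical choices, and any per-connection cap should be stricter than the 10,000 requests Google already enforces.

```python
import itertools
import statistics
import threading


class PerRequestBalancer:
    """Choose a backend for each request rather than once per connection."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)  # simple round-robin policy
        self._lock = threading.Lock()  # requests may arrive on many threads

    def pick(self):
        with self._lock:
            return next(self._cycle)


def should_close_connection(requests_on_connection, max_requests):
    """Per-connection request cap; max_requests should be below Google's 10,000."""
    return requests_on_connection >= max_requests


def is_overloaded_vs_peers(my_rate, peer_rates, tolerance=1.5):
    """Whether this server handles far more requests than its peers.

    my_rate and peer_rates should be smoothed over minutes, so that only a
    consistent imbalance triggers closing connections. tolerance is a
    hypothetical threshold: 1.5 means 50% above the peers' median rate.
    """
    if not peer_rates:
        return False
    return my_rate > tolerance * statistics.median(peer_rates)
```

A server for which `is_overloaded_vs_peers` returns `True` could close a few connections after their current response, so that they are reopened and possibly mapped to a less loaded server.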
Handle overload gracefully
Ideally, quotas would be set high enough so your bidder can receive all the requests it can handle, but no more than that. In practice, keeping quotas at optimal levels is a difficult task, and overloads do happen, for a variety of reasons: a backend going down during peak times, a traffic mix changing so that more processing is required for each request, or a quota value just being set too high. Consequently, it pays to consider how your bidder will behave with too much traffic coming in.
In terms of behavior under heavy loads, bidders fall into three broad categories:
- The "respond to everything" bidder
While straightforward to implement, this bidder fares the worst when overloaded. It simply tries to respond to every bid request that comes in, no matter what, queueing up any that cannot be served immediately. The scenario that ensues is often something like this:
- As the request rate climbs, so do the request latencies, until all requests start timing out
- Latencies rise precipitously as callout rates approach peak
- Throttling kicks in, sharply reducing the number of allowed callouts
- Latencies start to recover, causing throttling to be reduced
- The cycle begins again.
The graph of latency for this bidder resembles a very steep saw-tooth pattern. In another common outcome, queued-up requests cause the server to start paging memory or do something else that causes a long-term slowdown, and latencies do not recover at all until peak times are over, leading to depressed callout rates during the entire peak period. In either case, fewer callouts are made or responded to than if the quota had simply been set to a lower value.
- The "error on overload" bidder
This bidder accepts callouts up to a certain rate, then starts returning errors for some callouts. This may be done through internal timeouts, disabling connection queuing (controlled by ListenBackLog on Apache), implementing a probabilistic drop mode when utilization or latencies get too high, or some other mechanism. If Google observes an error rate above 15%, we’ll start throttling. Unlike the "respond to everything" bidder, this bidder "cuts its losses," which allows it to recover immediately when request rates go down.
The graph of latency for this bidder resembles a shallow saw-tooth pattern during overloads, localized around the maximum acceptable rate.
- The "no-bid on overload" bidder
This bidder accepts callouts up to a certain rate, then starts returning "no-bid" responses for any overload. Similar to the "error on overload" bidder, this can be implemented in a number of ways. What's different here is that no signal is returned to Google, so we never throttle back on callouts. The overload is absorbed by the front-end machines, which only allow the traffic that they can handle to continue through to the backends.
The graph of latency for this bidder shows a plateau that (artificially) stops paralleling the request rate at peak times, and a corresponding drop in the fraction of responses that contain a bid.
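Two of the mechanisms named above can be sketched in Python: `drop_probability` and `error_on_overload` implement a probabilistic drop mode for the "error on overload" bidder, and `NoBidOnOverload` caps in-flight work for the "no-bid on overload" bidder. The thresholds, the choice of HTTP 503 as the error status, and all names are assumptions for illustration, not part of the RTB protocol.

```python
import random
import threading


def drop_probability(utilization, low=0.8, high=1.0):
    """Drop probability for an "error on overload" bidder.

    Nothing is dropped below `low` and everything is dropped above `high`; in
    between, the probability rises linearly. Both thresholds are hypothetical.
    """
    if utilization <= low:
        return 0.0
    if utilization >= high:
        return 1.0
    return (utilization - low) / (high - low)


def error_on_overload(utilization, rng=random.random):
    """Return 503 if this request should be dropped, else None to proceed."""
    return 503 if rng() < drop_probability(utilization) else None


class NoBidOnOverload:
    """Admit at most `limit` in-flight requests; answer the rest with a no-bid."""

    def __init__(self, limit):
        self._slots = threading.BoundedSemaphore(limit)

    def handle(self, request, process, no_bid):
        # Non-blocking acquire: excess requests are never queued, which is
        # what separates this bidder from the "respond to everything" bidder.
        if not self._slots.acquire(blocking=False):
            return no_bid
        try:
            return process(request)
        finally:
            self._slots.release()
```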
We recommend combining the "error on overload" and "no-bid on overload" approaches, in the following way:
- Over-provision the front-ends and set them to error on overload, to help maximize the number of connections they can respond to in some form.
- When erroring on overload, the front-end machines can use a canned "no-bid" response, and do not need to parse the request at all.
- Implement health-checking of the backends, so that if no backend has sufficient capacity available, the front-ends return a "no-bid" response.
This allows some overload to be absorbed and gives the backends a chance to respond to exactly as many requests as they can handle. You can think of this as "no-bid on overload" with front-end machines falling back to "error on overload" when request counts are significantly higher than expected.
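The front-end decision in the combined approach can be sketched as one routing function. This is a minimal sketch: the `Backend` record, its `spare_capacity` field (standing in for whatever your health check reports), and the "most spare capacity wins" policy are hypothetical choices. "shed" means the front-end answers with its canned response without parsing the request.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    spare_capacity: int  # e.g. free worker slots, as reported by the last health check
    healthy: bool = True


def route(backends, frontend_inflight, frontend_limit):
    """Decide what a front-end does with one incoming request.

    Returns ("shed", None) when the front-end itself is over its limit,
    ("no-bid", None) when no backend has room, or ("forward", backend).
    """
    # Front-end overload: answer with the canned response, no parsing needed.
    if frontend_inflight >= frontend_limit:
        return ("shed", None)
    # Otherwise forward to the healthy backend with the most spare capacity.
    candidates = [b for b in backends if b.healthy and b.spare_capacity > 0]
    if not candidates:
        return ("no-bid", None)
    return ("forward", max(candidates, key=lambda b: b.spare_capacity))
```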
If you have a "respond to everything" bidder, consider transforming it into an "error on overload" bidder by tuning connection behavior so it in effect refuses to be overloaded. While this causes more errors to be returned, it reduces timeouts and prevents the server from getting into a state where it cannot respond to any requests.
Respond to pings
Making sure your bidder can respond to ping requests, while not connection management per se, is surprisingly important for debugging. Google uses ping requests for sanity-checking and debugging of bidder status, connection close behavior, latency, and more. Ping requests take the following form:
BidRequest < id: "1234567890123456" is_test: true is_ping: true >
Keep in mind that, contrary to what you might expect, the ping request does not contain any adslots. And, as detailed above, you should not close the connection after responding to a ping request.
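A ping-aware request path can be sketched as below. The parsed BidRequest is represented by a plain Python dict, and the field names `is_ping` and `adslot` and the `bid_for_slot` callback are assumptions for illustration, not the protobuf API. The sketch answers pings immediately and never assumes that a request carries at least one ad slot.

```python
def respond(bid_request, bid_for_slot):
    """Return the list of bids for a parsed bid request (a dict in this sketch).

    bid_for_slot is a hypothetical per-slot bidding function that returns a
    bid, or None to skip the slot.
    """
    if bid_request.get("is_ping", False):
        # Pings carry no ad slots: return an empty response right away and
        # keep the connection open, as for any other request.
        return []
    bids = []
    for slot in bid_request.get("adslot", []):  # may be empty or missing
        bid = bid_for_slot(slot)
        if bid is not None:
            bids.append(bid)
    return bids
```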
Peer with Google
Another way to reduce network latency or variability is to peer with Google. Peering helps optimize the path traffic takes to get to your bidder. The connection endpoints stay the same, but the intermediate links change. See the Peering guide for details. The reason to think of peering as a best practice can be summarized as follows:
On the internet, transit links are chosen primarily through "hot-potato routing," which finds the closest link outside of our network that can get a packet to its destination, and routes the packet through that link. When traffic traverses a section of backbone owned by a provider with whom we have many peering connections, the chosen link is likely to be close to where the packet starts. Beyond that point we have no control over the route the packet follows to the bidder, so it may be bounced to other autonomous systems (networks) along the way.
In contrast, when a direct peering agreement is in place, packets are always sent along a peering link. No matter where the packet originates, it traverses links that Google owns or leases until it reaches the shared peering point, which should be close to the bidder location. The reverse trip begins with a short hop to the Google network and remains on the Google network the rest of the way. Keeping most of the trip on Google-managed infrastructure ensures that the packet takes a low-latency route, and avoids much potential variability.