EDNS Client Subnet (ECS) Guidelines

Introduction

RFC 7871 – Client Subnet in DNS Queries – defines a mechanism for recursive resolvers like Google Public DNS to send partial client IP address information to authoritative DNS name servers. Content Delivery Networks (CDNs) and latency-sensitive services use this information to return accurately geo-located responses to name lookups that arrive through public DNS resolvers.

The RFC describes ECS features that authoritative name servers must implement, but implementers don’t always follow those requirements. There are also ECS operational and deployment issues that the RFC does not address. These can cause problems for resolvers like Google Public DNS that auto-detect ECS support in authoritative name servers, as well as for resolvers that require ECS whitelisting, like OpenDNS.

These guidelines are intended to help authoritative DNS implementers and operators avoid many common mistakes that can cause problems for ECS.

Definitions of Terms

We use the following terms to describe ECS operations:

  • A name server implements (or supports) ECS if it replies to ECS queries with ECS responses that have matching ECS options (even if the ECS options always have a global /0 scope prefix length).

  • A zone is ECS-enabled if ECS queries to its name servers sent with a non-zero source prefix receive ECS responses with a non-zero scope.
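
As a rough illustration of these two definitions, the following Python sketch classifies a single query/response pair. The `EcsOption` type, its field names, and the `classify` function are hypothetical names introduced for this example; they do not come from RFC 7871 or from any resolver implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EcsOption:
    family: int         # 1 = IPv4, 2 = IPv6
    source_prefix: int  # source prefix length sent by the resolver
    scope_prefix: int   # scope prefix length (0 in queries)
    address: bytes      # address bits, truncated to the source prefix


def classify(query: EcsOption, response: Optional[EcsOption]) -> str:
    """Classify one exchange using the definitions above."""
    if response is None:
        return "no-ecs"  # the server did not send an ECS option at all
    matching = (
        response.family == query.family
        and response.source_prefix == query.source_prefix
        and response.address == query.address
    )
    if not matching:
        return "mismatch"  # an ECS option is present but does not match the query
    if query.source_prefix > 0 and response.scope_prefix > 0:
        return "ecs-enabled"  # non-zero source prefix answered with non-zero scope
    return "implements-ecs"  # matching option, but global /0 scope
```

A server that always answers with a matching option and /0 scope falls into the "implements-ecs" case: it supports ECS, but the zone it serves is not ECS-enabled.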

Guidelines for Authoritative Name Servers

  1. All authoritative name servers for an ECS-enabled zone must enable ECS for the zone.

    • Even if only one name server does not implement ECS or enable it for the zone, it quickly becomes the source of most cached data. Because its responses have global scope, they are used (until their TTL expires) to answer all queries for the same name, regardless of client subnet. Responses from servers that do implement and enable ECS are used only for queries from clients within the specific scope, so they are much less likely to be used than the global scope responses.
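
The effect described in guideline 1 can be seen in a toy cache model. This is only a sketch under simplifying assumptions: `ToyEcsCache` is a hypothetical class, TTLs are ignored, and the longest-prefix-match lookup is an illustrative choice rather than a description of how Google Public DNS or any other resolver is implemented.

```python
import ipaddress


class ToyEcsCache:
    """Toy resolver cache: per name, one entry per scope network."""

    def __init__(self):
        self._entries = {}  # name -> list of (scope network, answer)

    def store(self, name, client_subnet, scope_len, answer):
        # The entry covers the client subnet truncated to the scope prefix
        # length, so a /0 scope covers every client address.
        subnet = ipaddress.ip_network(client_subnet, strict=False)
        scope_len = min(scope_len, subnet.prefixlen)
        scope = ipaddress.ip_network(f"{subnet.network_address}/{scope_len}", strict=False)
        self._entries.setdefault(name, []).append((scope, answer))

    def lookup(self, name, client_ip):
        ip = ipaddress.ip_address(client_ip)
        hits = [
            (scope, answer)
            for scope, answer in self._entries.get(name, [])
            if scope.version == ip.version and ip in scope
        ]
        if not hits:
            return None  # cache miss: the resolver would query upstream
        # Prefer the most specific covering entry.
        return max(hits, key=lambda hit: hit[0].prefixlen)[1]


cache = ToyEcsCache()
cache.store("www.example.com", "192.0.2.0/24", 24, "pop-a")    # ECS-enabled server
cache.store("www.example.com", "203.0.113.0/24", 0, "global")  # server without ECS
print(cache.lookup("www.example.com", "192.0.2.9"))     # pop-a
print(cache.lookup("www.example.com", "198.51.100.7"))  # global
```

Once the global scope entry is cached, every client outside 192.0.2.0/24 is answered from it without a new upstream query, which is why a single non-ECS name server ends up dominating the cached data.
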
  2. Authoritative name servers that implement ECS must send ECS responses to ECS queries for all zones served from an IP address or NS hostname, even for zones that are not ECS-enabled.

    • Google Public DNS auto-detects ECS support by IP address rather than name server hostname or DNS zone because the number of addresses is smaller than the number of name server hostnames and much smaller than the number of DNS zones. If an authoritative name server does not always send ECS responses to ECS queries (even for zones that are not ECS-enabled), Google Public DNS may stop sending it ECS queries.
  3. Authoritative name servers that implement ECS must respond to all ECS queries with ECS responses, including negative and referral responses.

    • The same issues about auto-detection of ECS support apply here too.

    • Negative responses (NXDOMAIN and NODATA) should use global /0 scope for better caching and compatibility with RFC 7871 (see footnote 2).

    • Besides NXDOMAIN and NODATA (NOERROR with empty answer section), other error responses to ECS queries (particularly SERVFAIL and REFUSED) should include a matching ECS option with global /0 scope.

      • If an authoritative name server is attempting to shed load from a DoS attack, it can return a SERVFAIL without ECS data; doing this repeatedly causes Google Public DNS to stop sending queries with ECS (which may reduce the number of legitimate queries it sends, but would not affect random subdomain attack queries). Reducing legitimate query load during a DoS attack may or may not improve the success rate for legitimate queries (although responses can be served from cache for all clients).

        A more effective load-shedding approach is to send all responses with global /0 scope so that Google Public DNS continues to send ECS queries. This lets Google Public DNS return geo-located responses much sooner after the attack stops, as it does not need to re-detect ECS support, just to re-query once the global scope response TTLs expire.

    • Referral (subdelegation) responses must also have matching ECS data. They should generally use a global /0 scope; this is not required if geo-located glue records are present, although geo-locating glue records is itself a questionable practice.

  4. Authoritative name servers that implement ECS must include a matching ECS option in responses to all query types received with an ECS option. It is not enough to include ECS data only in responses to IPv4 address (A) queries; responses to A, AAAA, PTR, MX, or any other query type must have matching ECS data. Otherwise, resolvers may drop the response as possibly forged, and Google Public DNS may stop sending queries with ECS data.

    In particular, ECS responses to SOA, NS, and DS queries should always use global /0 scope for better caching and a consistent view of delegation (geo-located responses to A/AAAA queries for name server hostnames are OK). Responses to any query type (e.g. TXT, PTR, etc.) that do not change based on ECS data should not use a scope equal to the source prefix length; they should use a global /0 scope for better caching and reduced query load.
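
The scope recommendations in guidelines 3 and 4 can be summarized as a small decision function. This is only a sketch: the function name, its parameters, and the assumption that only A and AAAA answers vary by client subnet are illustrative choices, not part of RFC 7871. In every case the response still carries a matching ECS option; only the scope prefix length changes.

```python
GLOBAL_SCOPE = 0
# Assumption for this sketch: only address records are geo-located.
GEO_LOCATED_TYPES = {"A", "AAAA"}


def response_scope(qtype, rcode, answer_count, geo_scope, shedding_load=False):
    """Return the scope prefix length for the ECS option in a response."""
    if shedding_load:
        return GLOBAL_SCOPE  # keep sending ECS responses, cacheable for all clients
    if rcode != "NOERROR":
        return GLOBAL_SCOPE  # NXDOMAIN, SERVFAIL, REFUSED, ...
    if answer_count == 0:
        return GLOBAL_SCOPE  # NODATA and referrals without geo-located glue
    if qtype not in GEO_LOCATED_TYPES:
        return GLOBAL_SCOPE  # SOA, NS, DS, TXT, PTR, ... do not vary by subnet
    return geo_scope         # geo-located answer: use the configured scope


print(response_scope("A", "NOERROR", 1, 24))    # 24
print(response_scope("TXT", "NOERROR", 1, 24))  # 0
print(response_scope("A", "SERVFAIL", 0, 24))   # 0
```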

  5. Authoritative name servers returning ECS-enabled CNAME responses SHOULD (see footnote 3) only include the first CNAME in the chain, and the final target of the CNAME chain should be ECS-enabled with the same scope prefix length. Because of ambiguity in the ECS specification, some recursive resolvers (notably Unbound; see footnote 4) may return a response with the scope of the final non-CNAME domain (/0 if it is not ECS-enabled).

  6. ECS data may contain IPv6 addresses even for IPv4-only name servers (and vice-versa, although IPv6-only name servers are rare).

    • Name servers need to respond with valid ECS option data (/0 scope is OK, but source address and prefix length must match).

    • ECS for a zone can be enabled separately for IPv4 and IPv6 addresses.
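
To make "matching" concrete, here is a minimal sketch of the ECS option wire format from RFC 7871 (option code 8; a 2-byte family, 1-byte source prefix length, 1-byte scope prefix length, then the address truncated to the source prefix). The helper names `encode_ecs` and `response_option` are hypothetical. The response option copies the family, source prefix length, and address from the query unchanged, whatever the address family of the server itself, and sets only the scope.

```python
import ipaddress
import struct

ECS_OPTION_CODE = 8  # EDNS option code assigned to Client Subnet


def encode_ecs(subnet: str, scope_prefix: int = 0) -> bytes:
    """Encode an ECS option (code, length, data) for an IPv4 or IPv6 subnet."""
    net = ipaddress.ip_network(subnet, strict=False)
    family = 1 if net.version == 4 else 2
    # Only the bytes needed to hold the source prefix are sent.
    address = net.network_address.packed[: (net.prefixlen + 7) // 8]
    data = struct.pack("!HBB", family, net.prefixlen, scope_prefix) + address
    return struct.pack("!HH", ECS_OPTION_CODE, len(data)) + data


def response_option(query_option: bytes, scope_prefix: int = 0) -> bytes:
    """Build a matching response option: same family, source prefix, address."""
    code, length = struct.unpack("!HH", query_option[:4])
    if code != ECS_OPTION_CODE or length != len(query_option) - 4 or length < 4:
        raise ValueError("not a well-formed ECS option")
    family, source_prefix, _query_scope = struct.unpack("!HBB", query_option[4:8])
    address = query_option[8:]
    data = struct.pack("!HBB", family, source_prefix, scope_prefix) + address
    return struct.pack("!HH", code, len(data)) + data


query = encode_ecs("2001:db8::/56")
print(response_option(query).hex())  # 0008000b0002380020010db8000000
```

With the default /0 scope the response option is byte-for-byte identical to the query option, which is the simplest valid answer for a name server that has no IPv6 geo-location data.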

  7. Authoritative name servers returning ECS-enabled responses should not return answers with overlapping scope prefixes. An example of overlapping scopes would be the following:

    • Query with source prefix 198.18.0.0/15: response A with scope prefix 198.0.0.0/8

    • Query with source prefix 198.51.100.0/24: response B with scope prefix 198.51.0.0/16

    If a client queries an ECS-enabled recursive resolver in the order above, both queries may get response A, because the scope of the cached response A includes the second query’s source prefix. Even if the client queries are made in the opposite order, and both queries are forwarded to authoritative name servers, cached responses may expire at different times; subsequent queries to the recursive resolver in the overlapping prefix 198.51.100.0/24 could get either response A or B.
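
A configuration check for overlapping scopes might look like the following sketch, which uses Python's standard `ipaddress` module. The function name `overlapping_scopes` and the idea of feeding it a flat list of configured scope prefixes are assumptions made for illustration.

```python
import ipaddress
from itertools import combinations


def overlapping_scopes(scopes):
    """Return every pair of scope prefixes where one overlaps the other."""
    nets = [ipaddress.ip_network(scope) for scope in scopes]
    return [
        (str(a), str(b))
        for a, b in combinations(nets, 2)
        if a.version == b.version and a.overlaps(b)
    ]


print(overlapping_scopes(["198.0.0.0/8", "198.51.0.0/16", "203.0.113.0/24"]))
# [('198.0.0.0/8', '198.51.0.0/16')]
```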

  8. When implementing ECS support for the first time on name servers, use new IP addresses for the name servers serving these ECS-enabled zones.

    • When authoritative name servers that implemented ECS but returned global scope results start returning ECS-enabled answers for a zone, Google Public DNS starts returning geo-located responses to queries as soon as the TTLs of previous global scope responses expire.

    • Google Public DNS auto-detection of ECS support very rarely tries ECS queries for an IP address (or name server hostname) when it has auto-detected lack of ECS support (timeouts, returning FORMERR, BADVERS, or sending non-ECS responses). New ECS implementations on those IP addresses (or NS hostnames) are auto-detected very slowly, or not at all.

  9. Make sure that network connections are reliable and that any response rate limiting is set sufficiently high that name servers do not drop queries (or worse, respond with errors lacking a matching ECS option).

    • For name servers implementing response rate limiting on ECS queries, the best response is NODATA with the truncation (TC) flag set, containing only a matching question section and a matching ECS option.
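
As a hedged illustration of such a rate-limited reply, the sketch below assembles the raw message by hand: a header with the TC flag set and RCODE 0, the original question, an empty answer section, and an OPT record carrying the matching ECS option. The function names, the choice to set the AA flag, and the 1232-byte advertised UDP size are assumptions of this example, not requirements stated in these guidelines.

```python
import struct

QR, AA, TC, RD = 0x8000, 0x0400, 0x0200, 0x0100  # DNS header flag bits
OPT_RR_TYPE = 41                                  # EDNS(0) OPT pseudo-record


def encode_question(name: str, qtype: int = 1, qclass: int = 1) -> bytes:
    """Encode a question section entry (default: type A, class IN)."""
    labels = [label.encode("ascii") for label in name.rstrip(".").split(".") if label]
    qname = b"".join(bytes([len(label)]) + label for label in labels) + b"\x00"
    return qname + struct.pack("!HH", qtype, qclass)


def rate_limited_reply(query_id, question, ecs_option, rd=False, udp_size=1232):
    """Build a truncated NODATA reply: question plus OPT with the ECS option."""
    flags = QR | AA | TC | (RD if rd else 0)  # RCODE bits stay 0 (NOERROR)
    # Counts: 1 question, 0 answers, 0 authority records, 1 additional (OPT).
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 1)
    # OPT RR: root name, type 41, class = UDP size, TTL field 0, then RDATA.
    opt_rr = (
        b"\x00"
        + struct.pack("!HHIH", OPT_RR_TYPE, udp_size, 0, len(ecs_option))
        + ecs_option
    )
    return header + question + opt_rr
```

Here `ecs_option` is the already-encoded option (code, length, and data) matching the one received in the query, with its scope prefix length set to 0.
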
  10. Send timely responses to all queries (ideally within 1 second).

    • Using online Geo-IP lookup services for ECS queries won't work reliably, as the cumulative latency of the DNS query and online Geo-IP service is unlikely to be within one second. Google Public DNS auto-detection of ECS support considers delayed responses an indication of poor or incomplete ECS support, and reduces the likelihood that future queries are sent with ECS. If enough responses are delayed, it stops sending ECS queries.

1 https://tools.ietf.org/html/rfc7871#section-7.2.1

2 https://tools.ietf.org/html/rfc7871#section-7.4

3 https://tools.ietf.org/html/rfc7871#section-7.2.1

4 https://unbound.net/pipermail/unbound-users/2015-May/003875.html

Using the scope of the final domain in a CNAME chain is harmless in Unbound, since it is usually deployed as a local stub or forwarding resolver, where all clients are in the same subnet and would get the same response.