Introduction: DNS security threats and mitigations
Because of the open, distributed design of the Domain Name System, and its use of the User Datagram Protocol (UDP), DNS is vulnerable to various forms of attack. Public or "open" recursive DNS resolvers are especially at risk, since they do not restrict incoming packets to a set of allowable source IP addresses. We are mostly concerned with two common types of attacks:
- Spoofing attacks leading to DNS cache poisoning. Various types of DNS spoofing and forgery exploits abound, which aim to redirect users from legitimate sites to malicious websites. These include so-called Kaminsky attacks, in which attackers take authoritative control of an entire DNS zone.
- Denial-of-service (DoS) attacks. Attackers may launch DDoS attacks against the resolvers themselves, or hijack resolvers to launch DoS attacks on other systems. Attacks that use DNS servers to launch DoS attacks on other systems by exploiting large DNS record/response size are known as amplification attacks.
Each class of attack is discussed further below.
Cache poisoning attacks
There are several variants of DNS spoofing attacks that can result in cache poisoning, but the general scenario is as follows:
- The attacker sends a target DNS resolver multiple queries for a domain name for which they know the server is not authoritative, and that is unlikely to be in the server's cache.
- The resolver sends out requests to other name servers (whose IP addresses the attacker can also predict).
- In the meantime, the attacker floods the victim server with forged responses that appear to originate from the delegated name server. The responses contain records that ultimately resolve the requested domain to IP addresses controlled by the attacker. They might contain answer records for the resolved name or, worse, they may further delegate authority to a name server owned by the attacker, so that they take control of an entire zone.
- If one of the forged responses matches the resolver's request (for example, by query name, type, ID and resolver source port) and is received before a response from the genuine name server, the resolver accepts the forged response and caches it, and discards the genuine response.
- Future queries for the compromised domain or zone are answered with the forged DNS resolutions from the cache. If the attacker has specified a very long time-to-live on the forged response, the forged records stay in the cache for as long as possible without being refreshed.
For an excellent introduction to Kaminsky attacks, see An Illustrated Guide to the Kaminsky DNS Vulnerability.
DoS and amplification attacks
DNS resolvers are subject to the usual DoS threats that plague any networked system. However, amplification attacks are of particular concern because DNS resolvers are attractive targets to attackers who exploit the resolvers' large response-to-request size ratio to gain additional free bandwidth. Resolvers that support EDNS0 (Extension Mechanisms for DNS) are especially vulnerable because of the substantially larger packet size that they can return.
In an amplification scenario, the attack proceeds as follows:
- The attacker sends a victim DNS server queries using a forged source IP address. The queries may be sent from a single system or a network of systems all using the same forged IP address. The queries are for records that the attacker knows will result in much larger responses, up to several dozen times the size of the original queries (hence the name "amplification" attack).
- The victim server sends the large responses to the source IP address passed in the forged requests, overwhelming the system and causing a DoS situation.
The standard system-wide solution to DNS vulnerabilities is DNSSEC. However, until it is universally implemented, open DNS resolvers need to independently take some measures to mitigate against known threats. Many techniques have been proposed; see IETF RFC 5452: Measures for making DNS more resilient against forged answers for an overview of most of them. In Google Public DNS, we have implemented, and we recommend, the following approaches:
- Securing your code against buffer overflows, particularly the code responsible for parsing and serializing DNS messages.
- Overprovisioning machine resources to protect against direct DoS attacks on the resolvers themselves. Since IP addresses are trivial for attackers to forge, it's impossible to block queries based on IP address or subnet; the only effective way to handle such attacks is to simply absorb the load.
- Implementing basic validity-checking of response packets and of name server credibility, to protect against simple cache poisoning. These are standard mechanisms and sanity checks that any standards-compliant caching resolver should perform.
- Adding entropy to request messages, to reduce the probability of more sophisticated spoofing/cache poisoning attacks such as Kaminsky attacks. There are many recommended techniques for adding entropy, including randomizing source ports; randomizing the choice of name servers (destination IP addresses); randomizing case in name requests; and appending nonce prefixes to name requests. Below, we give an overview of the benefits, limitations, and challenges of each of these techniques, and discuss how we implemented them in Google Public DNS.
- Removing duplicate queries, to reduce the probability of successful "birthday attacks".
- Rate-limiting requests, to prevent DoS and amplification attacks.
- Monitoring the service for the client IPs using the most bandwidth and experiencing the highest response-to-request size ratio.
Resolvers that implement DNSSEC counter cache poisoning attacks by verifying the authenticity of responses received from name servers. Each DNS zone maintains a set of private/public key pairs, and for each DNS record a unique digital signature is generated using the private key. The corresponding public key is then authenticated via a chain of trust by keys belonging to parent zones. DNSSEC-compliant resolvers reject responses that do not contain the correct signatures. DNSSEC effectively prevents responses from being tampered with, because in practice, signatures are almost impossible to forge without access to private keys.
As of January 2013, Google Public DNS fully supports DNSSEC. We accept and forward DNSSEC-formatted messages and validate responses for correct authentication. We strongly encourage other resolvers to do the same.
We also cache NSEC responses as specified in IETF RFC 8198: Aggressive Use of DNSSEC-Validated Cache. This can reduce NXDOMAIN queries to name servers implementing DNSSEC and using NSEC for negative answers.
DNS over HTTPS
As of April 2016, Google Public DNS offers DNS over HTTPS, DNS resolution over an encrypted HTTPS connection. DNS over HTTPS prevents tampering, eavesdropping and spoofing, greatly enhancing privacy and security between a client and Google Public DNS. It complements DNSSEC to provide end-to-end authenticated DNS lookups.
Implementing basic validity checking
Some DNS cache corruption can be due to unintentional, and not necessarily malicious, mismatches between requests and responses (for example, because of a misconfigured name server or a bug in the DNS software). At a minimum, DNS resolvers should put in checks to verify the credibility and relevance of name servers' responses. We recommend (and implement) all of the following defenses:
- Do not set the recursive bit in outgoing requests, and always follow delegation chains explicitly. Disabling the recursive bit ensures that your resolver operates in "iterative" mode so that you query each name server in the delegation chain explicitly, rather than allowing another name server to perform these queries on your behalf.
- Reject suspicious response messages. See below for details of what we consider to be "suspicious".
- Do not return A records to clients based on glue records cached from previous requests. For example, if you receive a client query for ns1.example.com, you should re-resolve the address, rather than sending an A record based on cached glue records returned from a .com TLD name server.
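As a rough illustration of the first recommendation, the following sketch builds a minimal DNS query whose header leaves the "recursion desired" (RD) bit cleared. The function name and defaults are hypothetical and chosen for illustration; this is not the code used by Google Public DNS.

```python
import secrets
import struct

RD_FLAG = 0x0100  # "recursion desired" bit in the 16-bit DNS header flags word


def build_iterative_query(qname: str, qtype: int = 1) -> bytes:
    """Build a minimal DNS query with the RD bit cleared (qtype 1 = A)."""
    query_id = secrets.randbits(16)   # unpredictable 16-bit query ID
    flags = 0x0000                    # standard query; RD deliberately not set
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    # The name is encoded as length-prefixed labels ending with a zero byte.
    labels = [label.encode("ascii") for label in qname.rstrip(".").split(".")]
    question = b"".join(bytes([len(label)]) + label for label in labels) + b"\x00"
    question += struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN
    return header + question


packet = build_iterative_query("www.example.com")
```

A resolver sending such queries has to follow each referral itself, which is the "iterative" mode described above.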
Rejecting responses that do not meet required criteria
Google Public DNS rejects all of the following:
- Unparseable or malformed responses.
- Responses where key fields do not match corresponding fields in the request. This includes query ID, source IP, source port, destination IP or query name. See RFC 5452, Section 3 for the complete description of DNS spoof behavior.
- Records which are not relevant to the request.
- Answer records for which we cannot reconstruct the CNAME chain.
- Records (in the answer, authority, or additional sections) for which the responding name server is not credible. We determine the "credibility" of a name server by its place in the delegation chain for a given domain. Google Public DNS caches delegation chain information, and we verify each incoming response against the cached information to determine the responding name server's credibility for responding to a particular request.
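The field-matching rule in the list above can be sketched as follows. The two record types and their field names are hypothetical and exist only for this example; parsing, relevance, CNAME-chain, and credibility checks are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutstandingQuery:
    query_id: int
    qname: str
    qtype: int
    server_ip: str    # address the query was sent to
    local_port: int   # UDP source port the query was sent from


@dataclass(frozen=True)
class ReceivedResponse:
    query_id: int
    qname: str
    qtype: int
    source_ip: str    # address the response claims to come from
    dest_port: int    # local UDP port the response arrived on


def _normalize(name: str) -> str:
    # DNS names compare case-insensitively; ignore a trailing root dot.
    return name.lower().rstrip(".")


def matches_query(query: OutstandingQuery, response: ReceivedResponse) -> bool:
    """Accept a response only if every key field matches the request."""
    return (
        response.query_id == query.query_id
        and response.qtype == query.qtype
        and _normalize(response.qname) == _normalize(query.qname)
        and response.source_ip == query.server_ip
        and response.dest_port == query.local_port
    )
```

Any response for which `matches_query` returns `False` would be discarded before its records are considered for caching.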
Adding entropy to requests
Once a resolver does enforce basic sanity checks, an attacker has to flood the victim resolver with responses in an effort to match the query ID, UDP port (of the request), IP address (of the response), and query name of the original request before the legitimate name server does.
Unfortunately, this is not difficult to achieve, as the one uniquely identifying field, the query ID, is only 16 bits long, which gives an attacker a 1 in 65,536 chance of guessing it correctly with each forged response. The other fields are also limited in range, making the total number of unique combinations a relatively low number. See RFC 5452, Section 7 for a calculation of the combinatorics involved.
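To make the combinatorics concrete, here is a small sketch based on our reading of the simplified model in RFC 5452, Section 7: the chance that one of the forged packets arriving within a single window matches is roughly D × F / (N × P × I), where D is the number of identical outstanding queries, F the number of forged packets, N the number of name servers, P the number of source ports, and I the number of query IDs. The parameter names and the example values are illustrative assumptions.

```python
def spoof_probability(
    fake_packets: int,
    name_servers: int,
    ports: int,
    ids: int = 65536,
    identical_outstanding: int = 1,
) -> float:
    """Approximate chance that a forged response matches within one window."""
    p = identical_outstanding * fake_packets / (name_servers * ports * ids)
    return min(1.0, p)


# Same attack volume, with a fixed source port versus ~32,000 random ports.
fixed_port = spoof_probability(fake_packets=100, name_servers=2, ports=1)
random_ports = spoof_probability(fake_packets=100, name_servers=2, ports=32000)
```

In this model each additional independent field multiplies the denominator, which is why the techniques in the following sections focus on adding entropy.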
Therefore, the challenge is to add as much entropy to the request packet as possible, within the standard format of the DNS message, to make it more difficult for attackers to successfully match a valid combination of fields within the window of opportunity. We recommend, and have implemented, all the techniques discussed in the following sections.
Randomizing source ports
As a basic step, never allow outgoing request packets to use the default UDP port 53, or to use a predictable algorithm for assigning multiple ports (e.g. simple incrementing). Use as wide a range of ports from 1024 to 65535 as allowable in your system, and use a reliable random number generator to assign ports. For example, Google Public DNS uses ~15 bits, to allow for approximately 32,000 different port numbers.
Note that if your servers are deployed behind firewalls, load-balancers, or other devices that perform network address translation (NAT), those devices may de-randomize ports on outgoing packets. Make sure you configure NAT devices to disable port de-randomization.
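A minimal sketch of random source-port selection, assuming the full 1024 to 65535 range is usable on the host. The helper names are hypothetical; `secrets` is used so that the ports come from a cryptographically strong generator rather than a predictable one.

```python
import secrets
import socket

PORT_MIN, PORT_MAX = 1024, 65535


def random_source_port() -> int:
    """Pick a source port uniformly at random from the unprivileged range."""
    return PORT_MIN + secrets.randbelow(PORT_MAX - PORT_MIN + 1)


def open_query_socket(bind_ip: str = "0.0.0.0", attempts: int = 10) -> socket.socket:
    """Bind a UDP socket to a random port, retrying if the port is in use."""
    for _ in range(attempts):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            sock.bind((bind_ip, random_source_port()))
            return sock
        except OSError:
            sock.close()  # port unavailable; try another random one
    raise OSError("could not bind to a random source port")
```

A resolver would call `open_query_socket()` for each outgoing request (or draw from a pool of such sockets) instead of reusing one fixed port.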
Randomizing choice of name servers
Some resolvers, when sending out requests to root, TLD, or other name servers, select the name server's IP address based on the shortest distance (latency). We recommend that you randomize destination IP addresses to add entropy to the outgoing requests. In Google Public DNS, we simply pick a name server randomly among configured name servers for each zone, somewhat favoring fast and reliable name servers.
If you are concerned about latency, you can use round-trip time (RTT) banding, which consists of randomizing within a range of addresses that are below a certain latency threshold (e.g. 30 ms, 300 ms, etc.).
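RTT banding can be sketched as follows. The function name, the 30 ms default, and the fallback to the full server list when no server is inside the band are assumptions for illustration; the addresses come from the documentation range 192.0.2.0/24.

```python
import secrets


def pick_name_server(rtt_ms_by_ip: dict[str, float], band_ms: float = 30.0) -> str:
    """Pick a name server at random among those within the latency band."""
    if not rtt_ms_by_ip:
        raise ValueError("no name servers configured for this zone")
    in_band = [ip for ip, rtt in rtt_ms_by_ip.items() if rtt <= band_ms]
    # If nothing is fast enough, fall back to randomizing over all servers.
    candidates = in_band or list(rtt_ms_by_ip)
    return secrets.choice(candidates)


servers = {"192.0.2.1": 12.0, "192.0.2.2": 25.0, "192.0.2.3": 180.0}
choice = pick_name_server(servers)  # one of the two servers under 30 ms
```

A narrower band favors latency and a wider band favors entropy, so the threshold is a tuning decision.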
Randomizing case in query names
The DNS standards require that name servers treat names with case-insensitivity. That is, the names example.com and EXAMPLE.COM should resolve to the same IP address. However, in the response, most name servers echo back the name as it appeared in the request, preserving the original case.
Therefore, another way to add entropy to requests is to randomly vary the case of letters in domain names queried. This technique, also known as "0x20" because bit 0x20 is used to set the case of US-ASCII letters, was first proposed in the IETF internet draft Use of Bit 0x20 in DNS Labels to Improve Transaction Identity. With this technique, the name server response must match not only the query name but also the case of every letter in the name string. This may add little or no entropy to queries for the top-level and root domains, but it's effective for most hostnames.
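The case-randomization idea can be sketched in a few lines. The function names are hypothetical; each ASCII letter contributes one bit of entropy, and the check on the echoed name is an exact, case-sensitive comparison.

```python
import secrets


def randomize_case(qname: str) -> str:
    """Flip each ASCII letter to upper or lower case with equal probability."""
    out = []
    for ch in qname:
        if ch.isascii() and ch.isalpha():
            out.append(ch.upper() if secrets.randbits(1) else ch.lower())
        else:
            out.append(ch)  # digits, hyphens, and dots carry no case bit
    return "".join(out)


def entropy_bits(qname: str) -> int:
    """Number of extra bits an attacker must guess: one per ASCII letter."""
    return sum(1 for ch in qname if ch.isascii() and ch.isalpha())


def case_matches(sent_qname: str, echoed_qname: str) -> bool:
    """The response must echo the query name with identical case."""
    return sent_qname == echoed_qname


sent = randomize_case("www.example.com")  # e.g. "wWw.ExAMplE.cOm"
```

A name such as `www.example.com` has 13 letters, so it gains 13 bits, whereas a query for `com` alone gains only 3.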
One significant challenge we discovered when implementing this technique is that some name servers do not follow the expected response behavior:
- Some name servers respond with complete case-insensitivity: they correctly return the same results regardless of case in the request, but the response does not match the exact case of the name in the request.
- Other name servers respond with complete case-sensitivity (in violation of the DNS standards): they handle equivalent names differently depending on case in the request, either failing to reply at all or returning incorrect NXDOMAIN responses that match the exact case of the name in the request.
For both of these types of name servers, altering the case of the query name would produce undesirable results: for the first group, the response would be indistinguishable from a forged response; for the second group, the response (if any) could be totally incorrect.
Our current solution to this problem is to create a whitelist of name servers which we know apply the standards correctly, and to only apply the case randomization technique in requests to those servers. We also list the appropriate exception subdomains for each of them, based on analyzing our logs. If a response that appears to come from those servers does not contain the correct case, we reject the response. The whitelisted name servers comprise more than 70% of our traffic.
Note that while upper and lower case letters are allowed in domain names, no significance is attached to the case. That is, two names with the same spelling but different case are to be treated as if identical.
Prepending nonce labels to query names
If a resolver cannot directly resolve a name from the cache, or cannot directly
query an authoritative name server, then it must follow referrals from a root or
TLD name server.
In most cases, requests to the root or TLD name servers will result in a
referral to another name server,
rather than an attempt to resolve the name to an IP address.
For such requests, it should therefore be safe to attach a random label to a
query name to increase the entropy of the request, while not risking a failure
to resolve a non-existent name.
That is, sending a request to a referring name server for a name prefixed with a nonce label, such as entriih-f10r3.www.google.com, should return the same result as a request for www.google.com.
Although in practice such requests make up less than 3% of outgoing requests, assuming normal traffic (since most queries can be answered directly from the cache or by a single query), these are precisely the types of requests that an attacker tries to force a resolver to issue. Therefore, this technique can be very effective at preventing Kaminsky-style exploits.
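Generating and prepending a nonce label might look like the sketch below. The label alphabet and the 12-character length are assumptions made for this example and are not the format used by Google Public DNS; a DNS label may be at most 63 octets long.

```python
import secrets
import string

NONCE_ALPHABET = string.ascii_lowercase + string.digits
MAX_LABEL_LENGTH = 63  # upper bound on the length of a single DNS label


def make_nonce_label(length: int = 12) -> str:
    """Return a random label built from lowercase letters and digits."""
    if not 1 <= length <= MAX_LABEL_LENGTH:
        raise ValueError("label length must be between 1 and 63")
    return "".join(secrets.choice(NONCE_ALPHABET) for _ in range(length))


def prepend_nonce(qname: str, length: int = 12) -> str:
    """Prefix the query name with a fresh random label."""
    return f"{make_nonce_label(length)}.{qname}"


query_name = prepend_nonce("www.google.com")
```

With 36 possible characters per position, a 12-character label adds roughly 62 bits that a forged referral would have to match.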
Implementing this technique requires that nonce labels only be used for requests that are guaranteed to result in referrals; that is, responses that do not contain records in the answers section. However, we encountered several challenges when attempting to define the set of such requests:
- Some country-code TLD (ccTLD) name servers are actually authoritative for other second-level TLDs (2LDs). Although they have two labels, 2LDs behave just like TLDs, which is why they are often handled by ccTLD name servers. For example, the .uk name servers are also authoritative for zones such as nic.uk and, hence, for hostnames contained in those zones, such as www.mod.uk, and so on. In other words, requests to ccTLD name servers for resolution of such hostnames will not result in referrals, but in authoritative answers; prepending nonce labels to such hostnames will cause the names to be unresolvable.
- Sometimes generic TLD (gTLD) name servers return non-authoritative responses for name servers. That is, there are some name server hostnames that happen to live in a gTLD zone rather than in the zone for their domain. A gTLD name server will return a non-authoritative answer for these hostnames, using whatever glue record it happens to have in its database, rather than returning a referral. For example, the name server ns3.indexonlineserver.com used to be in the .COM gTLD zone rather than in the indexonlineserver.com zone. When we issued a request to a gTLD server for ns3.indexonlineserver.com, we got an IP address for it, rather than a referral. However, if we prepended a nonce label, we got a referral to indexonlineserver.com, which was then unable to resolve the hostname. Therefore, we cannot prepend nonce labels for name servers which require a resolution from a gTLD server.
- Authorities for zones and hostnames change over time. This can cause a nonce-prepended hostname that was once resolvable to become unresolvable if the delegation chain changes.
To address these challenges, we created a "blacklist" file containing exceptions for which we cannot prepend nonce labels. The file is populated with hostnames for which TLD name servers return non-referring responses, according to our server logs. We continually review the exceptions list to ensure that it stays valid over time.
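A lookup against such an exceptions list could be sketched as below. Matching on the name itself and on every parent domain is our assumption about how the list would be applied, and the entries shown are only illustrative values taken from the examples in the text.

```python
def may_prepend_nonce(qname: str, exceptions: set[str]) -> bool:
    """Return False if qname or any of its parent domains is an exception."""
    labels = qname.lower().rstrip(".").split(".")
    # Check "www.mod.uk", then "mod.uk", then "uk" against the list.
    suffixes = (".".join(labels[i:]) for i in range(len(labels)))
    return not any(suffix in exceptions for suffix in suffixes)


# Illustrative entries only; a real list would be derived from server logs.
exceptions = {"mod.uk", "nic.uk", "ns3.indexonlineserver.com"}

may_prepend_nonce("www.mod.uk", exceptions)       # answered by the ccTLD server
may_prepend_nonce("www.example.com", exceptions)  # expected to yield a referral
```

Names that fail the check would simply be queried without a nonce label, relying on the other sources of entropy.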
Removing duplicate queries
DNS resolvers are vulnerable to "birthday attacks", so called because they exploit the mathematical "birthday paradox", in which the likelihood of a match does not require a large number of inputs. Birthday attacks involve flooding the victim server not only with forged responses but also with initial queries, counting on the resolver to issue multiple requests for a single name resolution. The greater the number of issued outgoing requests, the greater the probability that the attacker will match one of those requests with a forged response: an attacker only needs on the order of 300 in-flight requests for a 50% success chance at matching a forged response, and 700 requests for close to 100% success.
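The figures above are consistent with one common formulation of the birthday problem, in which n outstanding requests and n forged responses produce n(n − 1)/2 chances of a collision in a space of 65,536 query IDs. The source does not state which formula was used, so treat the following as a sketch that happens to reproduce those numbers.

```python
def birthday_success_probability(n: int, id_space: int = 65536) -> float:
    """Chance of at least one ID collision given n requests and n forgeries."""
    pairs = n * (n - 1) // 2
    return 1.0 - (1.0 - 1.0 / id_space) ** pairs


p_300 = birthday_success_probability(300)  # roughly 0.50
p_700 = birthday_success_probability(700)  # roughly 0.98
```

With only a single outstanding request the pair count is zero in this model, so the attacker is left with the plain brute-force odds discussed earlier.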
To guard against this attack strategy, you should be sure to discard all duplicate queries from the outbound queue. For example, Google Public DNS never allows more than a single outstanding request for the same query name, query type, and destination IP address.
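One way to enforce a single outstanding request per (query name, query type, destination IP) is to keep a table of in-flight requests and attach later duplicates to the existing entry. The class and method names below are hypothetical; this is a sketch, not the Google Public DNS implementation.

```python
class InFlightTable:
    """Tracks outstanding upstream requests keyed by (qname, qtype, server IP)."""

    def __init__(self) -> None:
        self._pending: dict[tuple[str, int, str], list[str]] = {}

    @staticmethod
    def _key(qname: str, qtype: int, server_ip: str) -> tuple[str, int, str]:
        # Names are case-insensitive, so normalize before comparing.
        return (qname.lower().rstrip("."), qtype, server_ip)

    def add(self, qname: str, qtype: int, server_ip: str, waiter: str) -> bool:
        """Register a waiter; return True only if a request must be sent."""
        key = self._key(qname, qtype, server_ip)
        is_first = key not in self._pending
        self._pending.setdefault(key, []).append(waiter)
        return is_first

    def complete(self, qname: str, qtype: int, server_ip: str) -> list[str]:
        """Remove the entry and return every waiter to be answered."""
        return self._pending.pop(self._key(qname, qtype, server_ip), [])
```

When `add` returns `False`, the caller waits for the response to the request already in flight rather than sending another, so flooding the resolver with identical queries does not multiply the attacker's targets.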
Preventing denial-of-service attacks poses several particular challenges for open recursive DNS resolvers:
- Open recursive resolvers are attractive targets for launching amplification attacks. They are high-capacity, high-reliability servers and can produce larger responses than a typical authoritative name server—especially if an attacker can inject a large response into their cache. It is incumbent on any developer of an open DNS service to prevent their servers from being used to launch attacks on other systems.
- Amplification attacks can be difficult to detect while they are occurring. Attackers can launch an attack via thousands of open resolvers, so that each resolver only sees a small fraction of the overall query volume and cannot extract a clear signal that it has been compromised.
- Malicious traffic must be blocked without any disruption or degradation of the DNS service to normal users. DNS is an essential network service, so shutting down servers to cut off an attack is not an option, nor is denying service to any given client IP for too long. Resolvers must be able to quickly block an attack as soon as it starts, and restore fully operational service as soon as the attack ends.
The best approach for combating DoS attacks is to impose a rate-limiting or "throttling" mechanism. Google Public DNS implements two kinds of rate control:
- Rate control of outgoing requests to other name servers. To protect other DNS name servers against DoS attacks that could be launched from our resolver servers, Google Public DNS enforces QPS limits on outgoing requests from each serving cluster for each name server IP address.
- Rate control of outgoing responses to clients. To protect any other systems against amplification and traditional distributed DoS (botnet) attacks that could be launched from our resolver servers, Google Public DNS performs two types of rate limiting on client queries:
  - To protect against traditional volume-based attacks, each server imposes per-client-IP QPS and average bandwidth limits.
  - To guard against amplification attacks, in which large responses to small queries are exploited, each server enforces a per-client-IP maximum average amplification factor. The average amplification factor is a configurable ratio of response-to-query size, determined from historical traffic patterns observed in our server logs.
If DNS queries from one source IP address exceed the maximum QPS rate, excess queries will be dropped. If DNS queries over UDP from one source IP address exceed the average bandwidth or amplification limit consistently (the occasional large response will pass), queries may be dropped or only a small response may be sent. Small responses may be an error response or an empty response with the truncation bit set (so that most legitimate queries will be retried via TCP and succeed). Not all systems or programs will retry via TCP, and DNS over TCP may be blocked by firewalls on the client side, so some applications may not operate correctly when replies are truncated. Nonetheless, truncation allows RFC-compliant clients to work properly in most cases.
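The per-client behavior described above can be approximated with a token bucket for QPS and a running response-to-query byte ratio for amplification. The class name, the limits, and the choice of a simple cumulative average are assumptions for this sketch; the bandwidth limit is omitted, and a production limiter would also age out old samples and idle clients.

```python
from dataclasses import dataclass


@dataclass
class ClientState:
    tokens: float          # remaining query budget (token bucket)
    last_seen: float       # timestamp of the previous query, in seconds
    query_bytes: int = 0
    response_bytes: int = 0


class ClientRateLimiter:
    def __init__(self, max_qps: float = 100.0, max_amplification: float = 10.0):
        self.max_qps = max_qps
        self.max_amplification = max_amplification
        self._clients: dict[str, ClientState] = {}

    def decide(self, client_ip: str, query_size: int, response_size: int,
               now: float) -> str:
        """Return "allow", "truncate" (small TC=1 reply), or "drop"."""
        state = self._clients.get(client_ip)
        if state is None:
            state = ClientState(tokens=self.max_qps, last_seen=now)
            self._clients[client_ip] = state
        # Refill the bucket in proportion to elapsed time, capped at max_qps.
        elapsed = max(0.0, now - state.last_seen)
        state.tokens = min(self.max_qps, state.tokens + elapsed * self.max_qps)
        state.last_seen = now
        if state.tokens < 1.0:
            return "drop"                      # over the QPS limit
        state.tokens -= 1.0
        state.query_bytes += max(query_size, 1)
        state.response_bytes += response_size
        average = state.response_bytes / state.query_bytes
        if average > self.max_amplification:
            return "truncate"                  # client can retry over TCP
        return "allow"
```

Because the decision uses an average, a client with a history of small responses can still receive an occasional large one, while a client that only ever requests large responses is pushed to TCP.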