Infrastructure Challenges to DNS Scaling

by Bill Manning

This article looks a few steps beyond the Root Scaling Study report from 2009.[1] In 2009, the Internet Corporation for Assigned Names and Numbers (ICANN) board commissioned a report to evaluate the effect of scaling the root zone from its current size to an undefined but larger root zone. Attributes considered were Domain Name System Security Extensions (DNSSEC), Internet Protocol Version 6 (IPv6), Internationalized Domain Names (IDNs), and a larger number of entries in the zone. The report itself focused on the editorial processes and presentation of the finished root zone to the greater Internet. The report concluded that with prudence and with the addition of some “watch & warn” systems, the root zone could accommodate adding IPv6, DNSSEC, and IDNs along with other new Top-Level Domain (TLD) entries in a controlled manner. What the report did not consider was the effect of the deployed Internet infrastructure on the ability to get this new information into the rest of the Domain Name System (DNS) infrastructure of the Internet. Early experimental evidence[7, 8] suggests that the current state of infrastructure deployment will create problems for the deployment of these attributes.

Until recently the root zone of the DNS has enjoyed two important stabilizing properties:

  • It is relatively small—currently the root zone holds delegation information for 280 generic, country-code, and special-purpose TLDs, and the size of the root zone file is roughly 80,000 bytes.
  • It changes slowly—on average, the root zone absorbs less than one change per TLD per year, and the changes tend to be minor.

The root system has therefore evolved in an environment in which information about a small number of familiar TLDs remains stable for long periods of time. However, the type, amount, and volatility of the information that is contained in the root zone are expected to change as a result of the following four recent or pending policy decisions:

  • Support for DNSSEC, or “signing the root”
  • The addition of IDN TLDs
  • Support for the additional larger addresses associated with IPv6
  • The addition of new TLDs

These changes arrive against the backdrop of an infrastructure that is itself changing fundamentally. That shift removes a third stabilizing property of the DNS: the presumption of a common transport protocol with well-defined constraints.

Core Design Principles

The DNS was designed so that queries and responses would have the greatest chance of survival and broadest reachability by using an IPv4 default User Datagram Protocol (UDP) packet size of 512 bytes for the initial bootstrapping. Larger packet sizes are supported, and the Transmission Control Protocol (TCP) was defined as an alternate transport protocol, but it was expected to be used infrequently.

With these core principles intact, the DNS was able to successfully evolve into a highly decentralized dynamic system. The geographic and organizational decentralization of the root system arises from a deliberate design decision in favor of diversity and minimal fate-sharing coordination, which confers substantial stability and robustness benefits on the global Internet.

Simple quantitative extrapolation from a baseline model of the current DNS does not predict realistic future states of the system beyond the very short term, because:

  • Each part of the system adapts in different ways to changes in the quantity, type, and update frequency of information, while also responding to changes in the rest of the Internet.
  • These adaptations are not—and cannot be—effectively coordinated.
  • For some, if not all, of the actors, nonquantifiable considerations dominate their individual adaptation behavior (both strategically, in a planning context, and tactically, in an operations context).

Adding DNSSEC and IPv6 addresses to the DNS at the same time carries risk because together they change the basic assumption about DNS query/response reachability. Signing DNS data would, by itself, immediately increase the size of any zone by roughly a factor of 4 and increase the size of the response message[2]. The second of these effects could be absorbed by adding bandwidth to recover the lost headroom. Adding IPv6 addresses would further increase response sizes. However, simply adding bandwidth may be insufficient when there are middleboxes, application layer gateways, or divergent transport options between the query path and the response path.

In these cases more information has to be carried in the packets that are returned in response to a query, so the network bandwidth needed to support the operations of the server increases. As DNS messages get bigger, they will no longer fit in the single 512-byte packets carried by the UDP transport mechanism of the Internet. Clients will then be forced to resend their queries using UDP “jumbograms” or the TCP transport mechanism. TCP requires the end nodes to maintain much more state information, and it adds overhead in the form of “extra packets” sent just to keep things on track. The benefit is, of course, that it can carry much larger pieces of information.

Moving the root system from its default UDP behavior to UDP “jumbograms” or TCP will not only have the undesirable effects mentioned previously, it will also affect the current trend of deploying servers using IP anycast[10]. Anycast works well with single packet transactions (such as UDP), but is much less well suited to handle TCP packet streams. If TCP transactions become more prevalent, the anycast architecture may require changes.

The point of view from the client side is worth mentioning. In certain client configurations, where firewalls are incorrectly configured[3], the following scenario can occur:

A resolver inside the misconfigured firewall receives a DNS request that it cannot satisfy locally. The query is sent to the root servers, usually over UDP, and a root server responds to this query with a referral, also over UDP. Today, this response fits nicely in 512 bytes. It is also true that for the past 6 years, the Internet Systems Consortium (ISC) has been anticipating DNSSEC and has shipped resolver code that, by default, requests DNSSEC data. After the root is signed, the response no longer fits into a 512-byte message. Estimates from the National Institute of Standards and Technology (NIST), using standard key lengths, indicate that DNSSEC will push the response to 2048 bytes or more. This larger response will not be able to get past a misconfigured firewall that restricts DNS packets to 512 bytes, not recognizing the more modern extensions to the protocol that allow for bigger packets.

Upon not receiving the answer, the resolver on the inside will retry the query, setting the buffer size to 512 bytes. Because the full response does not fit in a 512-byte message, the root server will send back a shortened reply with the “truncated” (TC) flag set, indicating to the resolver that the answer is incomplete and encouraging the resolver to retry the query once more using TCP transport. The resolver will do so, and the root server will respond using TCP, but the misconfigured firewall also will reject DNS over TCP, because this transport has not been considered a normal or widely used transport for DNS queries.

In this worst case, a node will be unable to get DNS resolution after the root zone is signed, and the DNS traffic will triple, including one round in which TCP state must be maintained between the server and the resolver. There are of course ways around this problem, the most apparent ones being to configure the firewall correctly, or to configure the resolver to not ask for DNSSEC records.
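The three attempts in this scenario can be replayed with a small simulation. This is only a sketch under stated assumptions: the Firewall class, its field names, and the 4096-byte buffer advertised in the first attempt are hypothetical, and the 2048-byte response size is taken from the NIST estimate quoted above. No real network traffic is involved.

```python
from __future__ import annotations

from dataclasses import dataclass

SIGNED_RESPONSE = 2048   # assumed size in bytes of a signed referral
CLASSIC_LIMIT = 512      # classic DNS-over-UDP message limit


@dataclass
class Firewall:
    """Hypothetical middlebox policy; the field names are illustrative."""
    max_udp_dns: int      # largest DNS message allowed over UDP
    allow_tcp_dns: bool   # whether DNS over TCP is passed at all


def udp_reply(buffer_size: int) -> tuple[int, bool]:
    """Return (bytes_sent, truncated) for a UDP query advertising buffer_size."""
    if SIGNED_RESPONSE <= buffer_size:
        return SIGNED_RESPONSE, False
    # The answer does not fit, so the server trims it and marks it truncated.
    return buffer_size, True


def resolve(fw: Firewall) -> list[str]:
    """Replay the three attempts from the scenario and log each outcome."""
    log = []

    # Attempt 1: UDP with EDNS0 and a large advertised buffer (assumed 4096).
    size, truncated = udp_reply(4096)
    if size <= fw.max_udp_dns and not truncated:
        return log + ["udp edns0: answered"]
    log.append("udp edns0: response dropped by firewall")

    # Attempt 2: UDP again, advertising only 512 bytes.
    size, truncated = udp_reply(CLASSIC_LIMIT)
    if size <= fw.max_udp_dns and not truncated:
        return log + ["udp 512: answered"]
    log.append("udp 512: truncated reply, retry over TCP")

    # Attempt 3: TCP, which carries the full response if it is allowed through.
    if fw.allow_tcp_dns:
        return log + ["tcp: answered"]
    return log + ["tcp: blocked by firewall, resolution fails"]


if __name__ == "__main__":
    for line in resolve(Firewall(max_udp_dns=512, allow_tcp_dns=False)):
        print(line)
```

With the misconfigured policy the log has three entries and ends in failure, which corresponds to the tripled traffic described above. Allowing either larger UDP messages or DNS over TCP lets one of the attempts succeed.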

Effect of IPv6 on Priming Queries

The basic DNS protocol specifies that clients, resolvers, and servers be capable of handling message sizes of at least 512 bytes. They may support larger message sizes, but are not required to do so.

The 512-byte “minimal maximum” was the original reason for having only nine root servers. In 1996 Bill Manning, Mark Kosters, and Paul Vixie presented a plan to Jon Postel to change the naming of the root name servers to take advantage of DNS label compression and allow the creation of four more authoritative name servers for the root zone. The outcome was the root name server convention as it stands today.

The use of 13 “letters” left a few unused bytes in the priming response, which were left there to allow for changes—which soon arrived. With the advent of IPv6 addressing for the root servers, it was no longer possible to include both an IPv4 “A” record and an IPv6 “AAAA” record for every root server in the priming response without truncation; AAAA records for only two servers could be included without exceeding the 512-byte limit. Fortunately the root system was able to rely on the practical circumstance that any node asking for IPv6 address information also supported Extension Mechanisms for DNS (EDNS0)[4].
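The byte budget behind this limit can be checked with a little arithmetic. The sketch below assumes the standard DNS wire format, server names of the form a.root-servers.net through m.root-servers.net, full use of name compression, and no additional records other than the addresses. The function names are illustrative only.

```python
HEADER = 12                 # fixed DNS message header
QUESTION = 1 + 2 + 2        # root name ".", QTYPE, QCLASS
RR_FIXED = 2 + 2 + 4 + 2    # TYPE, CLASS, TTL, RDLENGTH
POINTER = 2                 # compression pointer to an earlier name
ROOT_NAME = 1               # owner name "." on each NS record
LIMIT = 512                 # classic DNS-over-UDP message limit


def encoded_len(name: str) -> int:
    """Uncompressed wire length: a length byte per label plus the root byte."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    return sum(1 + len(label) for label in labels) + 1


def priming_size(servers: int = 13, aaaa: int = 0) -> int:
    """Estimated priming response size with `aaaa` IPv6 glue records."""
    # The first NS target is spelled out in full.
    first_ns = ROOT_NAME + RR_FIXED + encoded_len("a.root-servers.net.")
    # Later targets are a one-letter label plus a pointer to "root-servers.net".
    other_ns = ROOT_NAME + RR_FIXED + (1 + 1) + POINTER
    ns_section = first_ns + (servers - 1) * other_ns
    a_records = servers * (POINTER + RR_FIXED + 4)    # 4-byte IPv4 address
    aaaa_records = aaaa * (POINTER + RR_FIXED + 16)   # 16-byte IPv6 address
    return HEADER + QUESTION + ns_section + a_records + aaaa_records


def max_aaaa(servers: int = 13) -> int:
    """Largest number of AAAA records that still fits within 512 bytes."""
    count = 0
    while priming_size(servers, count + 1) <= LIMIT:
        count += 1
    return count


if __name__ == "__main__":
    print(priming_size(), priming_size(aaaa=2), max_aaaa())
```

Under these assumptions the IPv4-only response comes to 436 bytes, leaving 76 bytes of headroom. Each AAAA record costs 28 bytes, so two fit and a third does not.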

DNSSEC also increases the size of the priming response, particularly because there are now more records in the Resource Record set and those records are larger. In [5] the authors make the following observation: “The resolver MAY choose to use DNSSEC OK[6], in which case it MUST announce and handle a message size of at least 1220 octets.”
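To make the quoted requirement concrete, the following sketch builds a priming query that carries an EDNS0 OPT pseudo-record announcing a 1220-octet message size with the DNSSEC OK bit set. The field layout follows the EDNS0 and DNSSEC OK definitions in [4] and [6] as commonly implemented. The function names, the fixed query ID, and the parser that only understands this exact layout are assumptions made for illustration.

```python
import struct

TYPE_NS = 2        # NS record type
CLASS_IN = 1       # Internet class
TYPE_OPT = 41      # EDNS0 OPT pseudo-record type
DO_BIT = 0x8000    # DNSSEC OK flag, top bit of the OPT flags field


def build_priming_query(payload_size: int = 1220,
                        dnssec_ok: bool = True,
                        query_id: int = 0x1234) -> bytes:
    """Build a query for the root NS set with one OPT record attached."""
    # Header: ID, flags (plain query), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=1.
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 1)
    # Question: the root name is a single zero byte.
    question = b"\x00" + struct.pack("!HH", TYPE_NS, CLASS_IN)
    # OPT: root owner name, TYPE=41, CLASS carries the advertised UDP size,
    # TTL is split into extended RCODE, version, and flags; RDLENGTH is 0.
    flags = DO_BIT if dnssec_ok else 0
    opt = b"\x00" + struct.pack("!HHBBHH", TYPE_OPT, payload_size, 0, 0, flags, 0)
    return header + question + opt


def parse_opt(message: bytes) -> tuple:
    """Return (advertised_size, dnssec_ok) for a message built above."""
    opt = message[12 + 5:]   # skip the header and the fixed-size root question
    _name, rtype, size, _rcode, _version, flags, _rdlen = struct.unpack("!BHHBBHH", opt)
    if rtype != TYPE_OPT:
        raise ValueError("no OPT record at the expected offset")
    return size, bool(flags & DO_BIT)


if __name__ == "__main__":
    query = build_priming_query()
    print(len(query), parse_opt(query))
```

The whole query is 28 bytes, of which the OPT record accounts for 11. A resolver that sets EDNS0 but advertises only 512 bytes would pass payload_size=512 here, which keeps responses within the classic limit.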

EDNS and MTU Considerations

The changes described will also affect other parts of the Internet, including (for example) end-system applications such as web browsers; intermediary “middleboxes” that perform traffic shaping, firewall, and caching functions; and Internet Service Providers (ISPs) that “manage” the DNS services provided to customers.

Although modern DNS server software defaults to using EDNS0, current measurement[7] collected from several of the RFC 1918[11] servers suggests that EDNS0 usage has not yet reached generally accepted levels of usefulness. Over the 12-month study, the share of EDNS0 queries received at these nodes remained at roughly 65 percent of the total queries received, with about 33 percent being non-EDNS queries. In the “other” camp are queries that set EDNS0 but then restrict packet sizes to 512 bytes. These queries cannot use the larger, negotiable Maximum Transmission Unit (MTU) sizes for larger UDP responses and therefore must use TCP to support larger responses. Some evidence suggests that with signed data, there is a pattern of retransmission of queries when responses larger than 512 bytes are generated and blocked. Such retransmissions can take as long as 7 seconds before timing out.

Lack of EDNS0 support in DNS caches suggests that many parts of the Internet will be constrained to using the traditional UDP sizes or will fall back to using TCP. Even where EDNS0 is indicated as being available, there are increased difficulties in knowing or negotiating a consistent Path Maximum Transmission Unit (Path MTU)[8].

The data supports the argument that it is unrealistic to expect either a useful UDP “jumbogram” or enough resources to manage hundreds of thousands or millions of TCP connections, because the deployed infrastructure was built around historical expectations of “normal” DNS packet profiles. Clean, clear paths that allow larger packet sizes are rare across the wider Internet, although larger packet sizes are much more likely to be found and supported locally. This raises a question for wide-scale deployment of IPv6 or DNSSEC, because both attributes require larger packet sizes regardless of transport. If neither larger UDP packets nor TCP will be viable, what other choices are there?

Recent work inside the Internet Engineering Task Force (IETF) is exploring the use of the Hypertext Transfer Protocol (HTTP) as an alternative transport protocol for DNS messages.[9] It might be possible to augment the deployed DNS base to understand the addition of a third transport protocol.

The augmentation of the DNS protocol to support multiple transport protocols will require additional logic on the part of the servers to keep track of which transport a query was received on and select that transport when sending back the response. It will also require more complex logic to determine failover selection from one transport to another.
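A minimal sketch of that bookkeeping follows. It is not drawn from any existing DNS implementation: the class name, the method names, and the preference order of transports are hypothetical, and "http" simply stands in for the third transport discussed above.

```python
from typing import Dict, List, Optional

# Assumed preference order for illustration; a real deployment might differ.
PREFERENCE: List[str] = ["udp", "tcp", "http"]


class TransportTracker:
    """Server side: remember which transport each pending query arrived on."""

    def __init__(self) -> None:
        self.pending: Dict[int, str] = {}

    def receive(self, query_id: int, transport: str) -> None:
        if transport not in PREFERENCE:
            raise ValueError(f"unknown transport: {transport}")
        self.pending[query_id] = transport

    def reply_transport(self, query_id: int) -> str:
        # The response goes back on the transport the query came in on.
        return self.pending.pop(query_id)


def next_transport(failed: List[str]) -> Optional[str]:
    """Client side: pick the first transport that has not yet failed."""
    for transport in PREFERENCE:
        if transport not in failed:
            return transport
    return None   # every known transport has been tried


if __name__ == "__main__":
    tracker = TransportTracker()
    tracker.receive(42, "tcp")
    print(tracker.reply_transport(42))
    print(next_transport(["udp"]), next_transport(["udp", "tcp", "http"]))
```

Even this toy version needs per-query state on the server and an ordered retry policy on the client, which is the extra logic the paragraph above refers to.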

With the efforts going into making the infrastructure of the Internet IPv6-capable, the underlying MTU problems may be corrected faster than a new transport protocol for the DNS is adopted. Certainly MTU problems have been considered for many years and for slightly different reasons[8], principally related to faster signaling rates and changes in the types of data being moved through the Internet. Regardless, this transition will take considerably more time than a simple DNS code refresh. Full support for larger packet sizes in the DNS will require changes in the equipment and code that make up the baseline Internet infrastructure, and such changes may take decades.

References

[1]  Jaap Akkerhuis, Lyman Chapin, Patrik Fältström, Glenn Kowack, Lars-Johan Liman, and Bill Manning, “Report on the Impact on the DNS Root System of Increasing the Size and Volatility of the Root Zone, Prepared by the Root Scaling Study Team,” Version 1.0, September 2009.
[2]  “DNSSEC and Its Impact on DNS Performance,” 17 August 2009, http://www.dnsops.gov/dnssec-perform.html
[3]  Ray Bellis and Lisa Phifer, “Test Report: DNSSEC Impact on Broadband Routers and Firewalls,” SAC035, 16 September 2008, http://www.icann.org/en/committees/security/ssac-documents.htm
[4]  Paul Vixie, “Extension Mechanisms for DNS (EDNS0),” RFC 2671, August 1999.
[5]  Peter Koch and Matt Larson, “Initializing a DNS Resolver with Priming Queries,” Internet Draft, expired, July 2008, http://tools.ietf.org/id/draft-ietf-dnsop-resolver-priming-01.txt
[6]  Roy Arends, Rob Austein, Matt Larson, Dan Massey, and Scott Rose, “DNS Security Introduction and Requirements,” RFC 4033, March 2005.
[7]  EDNS Support: http://www.ripe.net/data-tools/dns/as112/edns
[8]  Matt Mathis, “The Case for Raising the Internet MTU,” July 2003, http://staff.psc.edu/mathis/papers/Cisco200307/index.html
[9]  Mohan Parthasarathy and Paul Vixie, “Representing DNS Messages Using XML,” Internet Draft, work in progress, September 2011, http://www.ietf.org/id/draft-mohan-dns-query-xml-00.txt
[10]  Ted Hardie, “Distributing Authoritative Name Servers via Shared Unicast Addresses,” RFC 3258, April 2002.
[11]  Yakov Rekhter, Robert G Moskowitz, Daniel Karrenberg, Geert Jan de Groot, and Eliot Lear, “Address Allocation for Private Internets,” RFC 1918, February 1996.

BILL MANNING has been in the network field since 1979, most recently with Booz Allen Hamilton. He has been an IETF Working Group chair, RFC author, and an ARIN Trustee, and he has been on numerous ICANN committees. He has worked as part of the teams that run Internet Root name servers, built the first Internet Exchange points, and worked on transitioning from NSFnet to commercial services. Current client work is focused on Internet Policy and Governance, Risk Analysis, and the future of naming systems.
E-mail: bmanning@sfc.keio.ac.jp
