Please stop using the DNS protocol for your application

Earlier in my career I spent five years working with the DNS protocol, and in that time I collected a fairly large number of ways to shoot myself in the foot in places that looked perfectly safe.

DNS is great for bringing visitors to your site or API endpoint, but that is where the use of the DNS protocol in your application should end. If you need to publish configuration for your clients, or you are looking for the best protocol for service discovery, please do not use DNS.

Let’s follow the process of developing a DNS client from scratch to highlight the most obvious implementation issues you may run into. I must say right away that the number of good libraries for working with DNS is very small. I can only recommend MiekG DNS, although it is also not well suited as a client library. Almost all DNS client implementations in the standard libraries of programming languages are oversimplified and have poor interfaces that hide many important details. I’m sure my list of good libraries is incomplete, so others may well exist.

Suppose we have made sure that there is no suitable implementation of the DNS protocol for our programming language and have begun to implement our own client. It is well known that DNS is one of the few protocols that run over UDP, so we have already prepared a UDP client that connects to a remote host on which a recursive DNS resolver is expected to listen on port 53.
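A minimal sketch of such a UDP client in Python, using only the standard library. The function name `open_resolver_socket` and the 2-second default timeout are assumptions made for illustration, not part of any existing library.

```python
import socket


def open_resolver_socket(resolver_ip: str, port: int = 53,
                         timeout: float = 2.0) -> socket.socket:
    """Create a UDP socket whose default peer is the recursive resolver."""
    family = socket.AF_INET6 if ":" in resolver_ip else socket.AF_INET
    sock = socket.socket(family, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    # For UDP, connect() sends nothing on the wire. It only fixes the peer
    # address, so send()/recv() can be used and datagrams coming from any
    # other source address are dropped by the kernel.
    sock.connect((resolver_ip, port))
    return sock
```

Note that "connection" is a loose term here: nothing tells us at this point whether a resolver is actually listening on the other side.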

The next step is to tell the DNS server what exactly we want from it. Let’s look at the DNS message diagram to get some idea of how to make a query (thanks to the authors of this image):

It looks scary and very sophisticated to an unprepared reader, but we can see the Questions field and guess that this is where our request goes (encoded in a special wire format, of course).

The first problem you may encounter is that the word Questions is plural. In practice, RFC-compliant DNS servers do not let you request more than one record per query, so you can ask just one question.
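To make the wire format less abstract, here is a rough sketch of how a query with exactly one question could be encoded. The helper names `encode_name` and `build_query` are made up for this example; the layout follows the standard 12-byte DNS header followed by a single question entry.

```python
import secrets
import struct
from typing import Optional

QTYPE_A = 1      # IPv4 address record
QTYPE_AAAA = 28  # IPv6 address record
QCLASS_IN = 1    # Internet class


def encode_name(name: str) -> bytes:
    """Encode a hostname as length-prefixed labels ending with a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"invalid label: {label!r}")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name: str, qtype: int, query_id: Optional[int] = None) -> bytes:
    """Build a DNS query message that carries exactly one question."""
    if query_id is None:
        query_id = secrets.randbelow(1 << 16)  # unpredictable 16-bit ID
    flags = 0x0100  # RD bit set: ask the resolver to recurse for us
    # Header fields: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", qtype, QCLASS_IN)
    return header + question
```

For example, `build_query("example.com", QTYPE_A)` produces a 29-byte message: 12 bytes of header, 13 bytes for the encoded name, and 4 bytes for the type and class.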

But what if your application supports dual stack (IPv4 and IPv6 addresses) and you don’t want to make two consecutive requests (A and then AAAA, although I would recommend the opposite order)? Obviously, the ANY query type comes to the rescue, since it returns all records.

This is another trap! Many DNS providers are moving away from the ANY type, and there is an RFC about it (RFC 8482). Some providers do not support this type at all, and you will receive NOTIMP in return. Therefore, you will have to resolve the name sequentially using two queries.
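The sequential approach can be sketched as follows. The `query` callable is hypothetical: it stands for whatever function in your client sends a single question and returns the addresses found, and it is passed in as a parameter so the example stays self-contained.

```python
from typing import Callable, List

QTYPE_A = 1
QTYPE_AAAA = 28


def resolve_dual_stack(name: str,
                       query: Callable[[str, int], List[str]]) -> List[str]:
    """Ask for AAAA first, then A, and return all addresses in that order."""
    addresses: List[str] = []
    for qtype in (QTYPE_AAAA, QTYPE_A):
        # One question per query: ANY cannot be relied on to return both.
        addresses.extend(query(name, qtype))
    return addresses
```

A real client would probably also want to decide what to do when one of the two queries fails while the other succeeds; this sketch simply lets the error propagate.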

So, we have reached the point where we have built a DNS query that requests the A record for a domain. How will the server respond, and will it respond at all? It may not answer, and choosing how long to wait for a response and how many times to retry is a very difficult task. On the open Internet, response times can vary from three hundred milliseconds to several thousand milliseconds. Therefore, be very careful with the timeout.
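One possible shape for the wait-and-retry logic is shown below. The defaults (a 2-second initial timeout, 3 attempts, doubling the wait each time) are assumptions chosen for illustration and not recommended values; the right numbers depend on your network.

```python
import socket
from typing import Tuple


def exchange_udp(payload: bytes, server: Tuple[str, int],
                 timeout: float = 2.0, attempts: int = 3) -> bytes:
    """Send a query over UDP and wait for the reply, retrying on timeout."""
    family = socket.AF_INET6 if ":" in server[0] else socket.AF_INET
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        sock.connect(server)
        for attempt in range(attempts):
            # Double the wait on every retry: 2 s, 4 s, 8 s with the defaults.
            sock.settimeout(timeout * (2 ** attempt))
            sock.send(payload)
            try:
                response = sock.recv(4096)
            except socket.timeout:
                continue  # no reply in time, send the query again
            # Accept only a reply whose 16-bit ID matches our query;
            # anything else is ignored and the query is sent again.
            if response[:2] == payload[:2]:
                return response
    raise TimeoutError(f"no response from {server} after {attempts} attempts")
```

Even in this small sketch the trade-off is visible: with the defaults a dead resolver costs the caller 14 seconds before the error is reported.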

The server responded! How do you know whether the lookup of the requested name was successful? We look more closely and see the rCode field. What does a qualified programmer do? They go to the IANA site and look at the allowed values for this field:

A couple more seconds of careful searching and some common sense, and we find NoError. Obviously, this is how the server reports that the name lookup was successful. Hooray!

Let’s look again at the picture with the structure of the DNS message and try to find exactly where to look for the answer. What is the difference between answer, authority and additional? Common sense suggests that we will find the answer in the answer section, but we still have no idea about the purpose of the other sections.

Will it always be this way? No. In the DNS protocol, it is perfectly normal to return rCode set to NoError together with an empty Answer section (a response known as NODATA). In this case, the authority section will contain an SOA record that we do not need at this point.

So, we come to the conclusion that the success of a DNS request can be determined by two checks: rCode == NoError and the answer section is not empty.
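The two checks can be expressed directly against the 12-byte message header. This is only a sketch: `parse_header` and `lookup_succeeded` are names invented for this example, and only the fields needed here are extracted.

```python
import struct

RCODE_NOERROR = 0
RCODE_SERVFAIL = 2
RCODE_NXDOMAIN = 3


def parse_header(message: bytes) -> dict:
    """Extract the fields we need from the fixed 12-byte DNS header."""
    if len(message) < 12:
        raise ValueError("message is shorter than a DNS header")
    query_id, flags, _qd, ancount, nscount, _ar = struct.unpack(
        "!HHHHHH", message[:12])
    return {
        "id": query_id,
        "tc": bool(flags & 0x0200),  # truncation flag
        "rcode": flags & 0x000F,     # low 4 bits of the flags word
        "ancount": ancount,          # records in the answer section
        "nscount": nscount,          # records in the authority section
    }


def lookup_succeeded(message: bytes) -> bool:
    """NoError alone is not enough: NODATA has NoError and zero answers."""
    header = parse_header(message)
    return header["rcode"] == RCODE_NOERROR and header["ancount"] > 0
```

A NODATA response (rCode 0, `ancount` 0, one SOA in the authority section) is rejected by the second check, while NXDOMAIN is rejected by the first.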

Hooray! We are close to a working DNS client. It’s time to try running it from a cloud server in order to test it in conditions closer to production. But we are out of luck: our client just stopped working. We can see that rCode is equal to NoError and the Answer section is not empty, but there is no A record in it. Instead, it has some kind of CNAME record that points to another hostname, with no trace of an IP address at all. This is where we get closer to the reasons behind my advice to avoid using DNS in applications when other options exist.

Yes, CNAME responses are perfectly legitimate and are used by various recursive and authoritative server implementations. DNS servers can return a single CNAME, a chain of multiple CNAMEs, or even a CNAME together with an A/AAAA record. Each of these cases must be handled carefully, and when the response carries no address for the CNAME target, another (recursive only) request is required to get its IP address.
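As an illustration, the three cases can be handled by walking the chain inside one answer section. This sketch assumes the records have already been decoded into `(owner, type, value)` tuples, which is a simplification made for the example; `follow_cnames` and the `max_hops` limit of 8 are likewise assumptions.

```python
from typing import List, Tuple

Record = Tuple[str, str, str]  # (owner name, record type, value)


def _norm(name: str) -> str:
    """Names compare case-insensitively and without the trailing dot."""
    return name.rstrip(".").lower()


def follow_cnames(qname: str, answer: List[Record],
                  max_hops: int = 8) -> Tuple[str, List[str]]:
    """Walk the CNAME chain found in a single answer section.

    Returns the last name in the chain and the A/AAAA values that belong
    to it. An empty address list means the caller has to send another
    query for the returned name.
    """
    cnames = {_norm(o): _norm(v) for o, t, v in answer if t == "CNAME"}
    current = _norm(qname)
    hops = 0
    while current in cnames:
        hops += 1
        if hops > max_hops:  # guards against loops such as a -> b -> a
            raise ValueError("CNAME chain is too long or contains a loop")
        current = cnames[current]
    addresses = [v for o, t, v in answer
                 if t in ("A", "AAAA") and _norm(o) == current]
    return current, addresses
```

Address records are only accepted when their owner name is the end of the chain, so unrelated records in the same section are ignored.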

For the next six months, our application works just fine and our client successfully receives a list of 3 backend servers from the DNS. Due to the increase in load, we decide to add 10 more servers. But something strange happens: only some of the servers are returned in the response from the DNS server.

What happened? We ran into the DNS response length limit over UDP: 512 bytes. What to do in this case? It is time to dust off our TCP skills and add TCP support to our DNS client. When a server returns a response with the TC flag set over UDP, the request must be repeated over TCP, which raises the limit to 65,535 bytes per message.
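A sketch of the two pieces involved: detecting the TC flag and exchanging one message over TCP, where every DNS message is preceded by a 2-byte length field. The function names are made up for this example, and `exchange_tcp` assumes the caller has already connected the TCP socket to the same server.

```python
import socket
import struct


def is_truncated(response: bytes) -> bool:
    """True when the TC bit (0x0200 in the flags word) is set."""
    (flags,) = struct.unpack("!H", response[2:4])
    return bool(flags & 0x0200)


def recv_exact(sock: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes; TCP may deliver them in several chunks."""
    buf = b""
    while len(buf) < size:
        chunk = sock.recv(size - len(buf))
        if not chunk:
            raise ConnectionError("connection closed in the middle of a message")
        buf += chunk
    return buf


def exchange_tcp(sock: socket.socket, query: bytes) -> bytes:
    """Send one query over a connected TCP socket and read one response."""
    # The 2-byte big-endian length prefix is what caps a message at 65,535 bytes.
    sock.sendall(struct.pack("!H", len(query)) + query)
    (length,) = struct.unpack("!H", recv_exact(sock, 2))
    return recv_exact(sock, length)
```

The client would call `is_truncated` on every UDP response and, when it returns true, discard that response and repeat the same query through `exchange_tcp`.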

Congratulations, we have taken into account almost all the well-known issues and created a stable DNS client that works with the whole variety of authoritative and recursive DNS servers.

Unfortunately, as you keep improving your client you will run into more issues in quite unexpected places. For example, when you try to add a cache, you will find that successful responses (where the server returned records) and unsuccessful ones must be cached for different durations. In both cases the caching duration must not exceed the value announced by the remote server: the TTL field of the records for positive answers, or the SOA record for negative lookups.
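The two rules can be written down as a pair of small functions. The local caps (`local_max` of 300 and 60 seconds) are assumptions for the sake of the example; the negative rule follows RFC 2308, which takes the lesser of the SOA record's own TTL and its MINIMUM field.

```python
from typing import List


def positive_cache_ttl(record_ttls: List[int], local_max: int = 300) -> int:
    """Cache a successful answer no longer than its shortest record TTL."""
    return min(min(record_ttls), local_max)


def negative_cache_ttl(soa_ttl: int, soa_minimum: int,
                       local_max: int = 60) -> int:
    """Cache a negative answer for the lesser of the SOA TTL and SOA MINIMUM."""
    return min(soa_ttl, soa_minimum, local_max)
```

For example, an answer whose records carry TTLs of 300, 60 and 3600 seconds may be cached for at most 60 seconds, whatever the client's own preference is.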

Oh yes, one more issue. It is almost impossible to distinguish a temporary DNS server failure from a permanent one, since in both cases you will see SERVFAIL.

I hope you will not use the DNS protocol where it is not needed and will instead serve configuration to your applications in JSON format over HTTPS.

