The latest DNS bug is terrifying, widespread, and reveals deep flaws in Internet security




Luckily, it can be mitigated by using a caching resolver that limits the response length to below the size that triggers the buffer overflow. Most such resolvers do.
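A minimal Python sketch of the kind of length limit a forwarding cache can enforce, assuming the 2048-byte figure discussed in this thread. Anything larger is cut down to the header and question with the TC ("truncated") bit set, which is the standard DNS way of telling a client the answer didn't fit. `enforce_limit`, `skip_name` and the threshold are illustrative assumptions, not how dnsmasq, djbdns or any particular resolver implements it. Since TC makes clients retry over TCP, a real mitigation has to cap TCP answers too.

```python
import struct

# Illustrative threshold: the overflow involves replies larger than the
# 2048-byte buffer discussed in this thread.
MAX_RESPONSE = 2048
DNS_HEADER = struct.Struct("!6H")  # ID, flags, QD/AN/NS/AR counts
TC_BIT = 0x0200  # "truncated" flag in the header flags field


def skip_name(msg: bytes, offset: int) -> int:
    """Return the offset just past the domain name that starts at offset."""
    while True:
        length = msg[offset]
        if length == 0:  # root label ends the name
            return offset + 1
        if length & 0xC0 == 0xC0:  # 2-byte compression pointer ends the name
            return offset + 2
        offset += 1 + length


def enforce_limit(response: bytes, limit: int = MAX_RESPONSE) -> bytes:
    """Pass small responses through unchanged. Cut oversized ones down to
    header + question with TC set and all record counts zeroed."""
    if len(response) <= limit:
        return response
    ident, flags, qdcount, _an, _ns, _ar = DNS_HEADER.unpack_from(response)
    offset = DNS_HEADER.size
    for _ in range(qdcount):
        offset = skip_name(response, offset) + 4  # skip QTYPE and QCLASS
    header = DNS_HEADER.pack(ident, flags | TC_BIT, qdcount, 0, 0, 0)
    return header + response[DNS_HEADER.size:offset]
```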

…of course, if the attacker has control over the resolver, e.g. controls a Wi-Fi network or does a MITM on your connection, you’re potentially in trouble.


With the Internet of Things spreading so rapidly, perhaps it’s time to be less concerned about being able to spy on every last phone call and more concerned about making sure innovators have better environments to build upon.

That’s adorable! <3


So, it’s another buffer overflow thing? I thought plans to mitigate buffer overflow problems once and for all were set in motion quite some time ago, by clearly segregating code from data. Surely that went somewhere?


From TFA:

Length Limits Are Silly Mitigations
No other way to say it. Red Hat might as well have suggested filtering all AAAA (IPv6) records; that might actually be effective, as it happens, but it turns out security is not the only engineering requirement at play. DNS has had to engineer several mechanisms for sending more than 512 bytes, and not because it was a fun thing to do on a Saturday night. JavaScript is not the only thing that’s gotten bigger over the years; we are putting more and more in there, and not just DNSSEC signatures either. What is worth noting is that IT, and even IT Security, has learned the very, very hard way not to apply traditional firewalling approaches to DNS. Basically, as a foundational protocol, it’s very far away from normal debugging interfaces. That means when something goes wrong (say, somebody who was not themselves a DNS engineer applied a length limit to DNS traffic), there’s this sudden outage that nobody can trace for some absurd amount of time. By the time the problem gets traced… well, if you ever wondered why DNS doesn’t get filtered, that is why.

And ultimately, any DNS packet filter is a poor version of what you really want, which is an actual protocol-enforcing scrubbing firewall, i.e. a name server that is not a stub, though it might be a forwarder (meaning it enforces all the rules and provides a cache, but doesn’t wander around the Internet resolving names). My expectations for mitigations, particularly as we start getting some real intelligence about cache-traversing glibc attacks, are:

- We will put more intelligent resolvers on more devices, so that glibc only talks to the local resolver, not over the network, and
- Caching resolvers will learn how to specially handle simultaneous A and AAAA requests. If we’re protected from traversing attacks, it’s because the attacker just can’t play a lot of games between UDP and TCP and A and AAAA responses. As we learn more about when the attacks can traverse caches, we can deliberately work to make sure they can’t.
Local resolvers are popular anyway, because they mean there’s a DNS cache improving performance. A large number of embedded routers are already safe against the verified on-path attack scenario due to their use of dnsmasq, a common forwarding cache.
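glibc reads its upstream servers from `/etc/resolv.conf`, so one quick check of whether a Linux host is in the "glibc only talks to a local resolver" situation is whether every `nameserver` line points at loopback. A minimal Python sketch; the function names are hypothetical. A loopback address only shows that a local resolver is configured, not that it limits response sizes, and LAN clients that point at a dnsmasq router will not show loopback even though they sit behind a forwarding cache.

```python
import ipaddress


def nameservers(resolv_conf_text: str) -> list[str]:
    """Extract nameserver addresses from resolv.conf-style text.
    Commented-out lines start with '#' or ';', so they never match."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def only_local_resolvers(resolv_conf_text: str) -> bool:
    """True if at least one nameserver is configured and all are loopback."""
    servers = nameservers(resolv_conf_text)
    return bool(servers) and all(
        # Strip any IPv6 zone index such as "%eth0" before parsing.
        ipaddress.ip_address(s.split("%")[0]).is_loopback
        for s in servers
    )
```

Usage: `with open("/etc/resolv.conf") as f: print(only_local_resolvers(f.read()))`.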

Note that technologies like DNSSEC are mostly orthogonal to this threat; the attacker can simply send us validly signed responses that are crafted specifically to break us. I say mostly because one mode of DNSSEC deployment involves the use of a local validating resolver; such resolvers are also DNS caches that insulate glibc from the outside world.

There is the interesting question of how to scan for and detect nodes on your network with vulnerable versions of glibc. I’ve been worried for a while that we’re only going to end up fixing the sorts of bugs that are aggressively trivial to detect, independent of their actual impact on our risk profiles. Short of actually intercepting traffic and injecting exploits, I’m not sure what we can do here. Certainly one can look for simultaneous A and AAAA requests with identical source ports and no EDNS0, but that pattern will stay the same even after patching. Detecting what on our networks still needs to be patched (especially when this sort of platform failure ultimately infests the smallest of devices) is certain to become a priority, even if we end up making it easier for attackers to detect our faults as well.
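The A/AAAA heuristic above can be sketched in Python over already-captured query packets. As the text says, this finds hosts using the glibc stub resolver, not hosts that are still vulnerable, because the pattern survives the patch. The function names and the `(src_ip, src_port, raw_bytes)` input shape are hypothetical, and this sketch pairs queries across the whole capture rather than within a short time window, which a real monitor would add.

```python
import struct
from collections import defaultdict

TYPE_A, TYPE_AAAA, TYPE_OPT = 1, 28, 41


def read_name(msg: bytes, offset: int):
    """Return (lowercased name, offset past it). Compression pointers are
    skipped, not followed; queries rarely use them."""
    labels = []
    while True:
        length = msg[offset]
        if length == 0:
            return ".".join(labels).lower(), offset + 1
        if length & 0xC0 == 0xC0:
            return ".".join(labels).lower(), offset + 2
        labels.append(msg[offset + 1:offset + 1 + length].decode("ascii", "replace"))
        offset += 1 + length


def parse_query(msg: bytes):
    """Return (qname, qtype, has_edns0) for a DNS query message."""
    _id, _flags, qdcount, ancount, nscount, arcount = struct.unpack_from("!6H", msg)
    if qdcount == 0:
        raise ValueError("query has no question")
    qname, offset = read_name(msg, 12)
    (qtype,) = struct.unpack_from("!H", msg, offset)
    offset += 4  # QTYPE + QCLASS
    for _ in range(qdcount - 1):  # skip any further questions
        offset = read_name(msg, offset)[1] + 4
    has_edns0 = False
    for i in range(ancount + nscount + arcount):
        offset = read_name(msg, offset)[1]
        rtype, _cls, _ttl, rdlength = struct.unpack_from("!HHIH", msg, offset)
        offset += 10 + rdlength
        if i >= ancount + nscount and rtype == TYPE_OPT:  # OPT sits in additional
            has_edns0 = True
    return qname, qtype, has_edns0


def suspicious_pairs(packets):
    """packets: iterable of (src_ip, src_port, raw DNS query bytes).
    Yield (src_ip, src_port, qname) once for each name that got both an A and
    an AAAA query from the same source port, neither carrying EDNS0."""
    seen = defaultdict(set)
    reported = set()
    for src_ip, src_port, raw in packets:
        try:
            qname, qtype, has_edns0 = parse_query(raw)
        except (ValueError, IndexError, struct.error):
            continue  # skip malformed packets
        if has_edns0 or qtype not in (TYPE_A, TYPE_AAAA):
            continue
        key = (src_ip, src_port, qname)
        seen[key].add(qtype)
        if seen[key] == {TYPE_A, TYPE_AAAA} and key not in reported:
            reported.add(key)
            yield key
```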

If you’re looking for actual exploit attempts, don’t just look for large DNS packets. UDP attacks will actually be fragmented (a 2048-byte reply doesn’t fit in a single IP packet on a typical 1500-byte-MTU link), and you might forget that DNS can be carried over TCP. And again, large DNS replies are not necessarily malicious.
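The arithmetic behind the fragmentation point, plus the TCP framing a monitor has to handle, as a small Python sketch. The header sizes assume no IPv4 options or IPv6 extension headers; the 2-byte length prefix for DNS over TCP comes from RFC 1035, section 4.2.2. The function names are hypothetical. In practice this means size checks belong on reassembled UDP datagrams and reassembled TCP streams, not on individual packets.

```python
import struct

IPV4_HEADER, IPV6_HEADER, UDP_HEADER = 20, 40, 8


def max_unfragmented_dns(mtu: int = 1500, ipv6: bool = False) -> int:
    """Largest DNS message that fits in one unfragmented UDP datagram."""
    return mtu - (IPV6_HEADER if ipv6 else IPV4_HEADER) - UDP_HEADER


def split_tcp_dns(stream: bytes) -> list[bytes]:
    """Split a reassembled DNS-over-TCP stream into messages. Each message
    is preceded by its length as a 2-byte big-endian integer."""
    messages, offset = [], 0
    while offset + 2 <= len(stream):
        (length,) = struct.unpack_from("!H", stream, offset)
        if offset + 2 + length > len(stream):
            break  # incomplete trailing message
        messages.append(stream[offset + 2:offset + 2 + length])
        offset += 2 + length
    return messages
```

On Ethernet that gives 1500 − 20 − 8 = 1472 bytes over IPv4 (1452 over IPv6), well short of 2048, while a single TCP-framed message can be up to 65535 bytes.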

And thus, we end up at a good transition point to discuss security policy. What do we learn from this situation?


That’s EXACTLY what I meant: the length limit I talked about is the 2048 bytes, and the limiting I meant is implemented in most caching resolvers. dnsmasq is a good choice, though I prefer djbdns myself for now.


I think you’re thinking of the attempts at a non-executable stack. If non-executable stack protection is on, it will at a minimum be much harder for an attacker to exploit this for RCE. Even then, an attacker could still crash the program somewhere odd and DoS the target. Implementations of non-executable stacks on Linux are hardware-specific and aren’t guaranteed to be present or enabled, whereas this bug affects all hardware. Even common x86 Linux distros like Ubuntu and Fedora don’t enable non-executable stack protection by default, since it would break support for some hardware platforms. Unless you have the luxury of throwing out support for old hardware, it’s a really hard problem to solve.
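On Linux, whether an ELF binary asks for a non-executable stack is recorded in its `PT_GNU_STACK` program header (shown as `GNU_STACK` by `readelf -l`); the kernel can only honor that where the hardware supports NX. A minimal Python sketch that reads the flag, assuming a 64-bit ELF file; `stack_is_executable` is a hypothetical helper, and 32-bit ELF, which lays out its program headers differently, is not handled.

```python
import struct

PT_GNU_STACK = 0x6474E551
PF_X = 0x1  # "executable" bit in p_flags


def stack_is_executable(elf: bytes):
    """Return True/False from the PT_GNU_STACK header of a 64-bit ELF image,
    or None if the binary has no such header (the kernel then falls back
    to an architecture-dependent default)."""
    if elf[:4] != b"\x7fELF" or elf[4] != 2:  # EI_CLASS 2 = ELFCLASS64
        raise ValueError("not a 64-bit ELF file")
    endian = "<" if elf[5] == 1 else ">"  # EI_DATA 1 = little-endian
    # e_phoff sits at offset 32; e_phentsize and e_phnum at offsets 54 and 56.
    (phoff,) = struct.unpack_from(endian + "Q", elf, 32)
    phentsize, phnum = struct.unpack_from(endian + "HH", elf, 54)
    for i in range(phnum):
        # Each 64-bit program header starts with p_type, then p_flags.
        p_type, p_flags = struct.unpack_from(endian + "II", elf, phoff + i * phentsize)
        if p_type == PT_GNU_STACK:
            return bool(p_flags & PF_X)
    return None
```

Usage: `with open("/bin/ls", "rb") as f: print(stack_is_executable(f.read()))`.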


Couldn’t the support be autodetected by the installer?
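On x86 Linux, one way an installer could check is the `nx` flag that the kernel lists in `/proc/cpuinfo` when the CPU's NX/XD capability is available. A minimal Python sketch; `cpu_supports_nx` is a hypothetical helper. It is x86-specific (other architectures report features differently), and it only says what the CPU and kernel advertise, not whether a given program actually runs with a non-executable stack.

```python
def cpu_supports_nx(cpuinfo_text: str) -> bool:
    """True if any 'flags' line in /proc/cpuinfo-style text lists 'nx'."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags" and "nx" in value.split():
            return True
    return False
```

Usage: `with open("/proc/cpuinfo") as f: print(cpu_supports_nx(f.read()))`.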


As much as neckbeards shit on managed code, it can be very resilient to this type of super-common exploit. Obviously there’s no cure for stupid coding, but it helps catch common mistakes.


Non-executable stacks are part of the world of post-exploitation defenses I’ve kind of lost some faith in. Not that they’re not cool and all, but when these bugs are found, a (small?) number of hackers reliably are able to break through these defenses in a day or two.

Maybe we can do better.


Yeah by coding in python :wink:

(former C programmer here, and author of many buffer overflow bugs).


Yeah, you’re right. There are ways around non-executable stacks, and while ASLR makes that harder, it’s not sufficient protection against a skilled attacker, and it’ll only get worse once exploits start showing up through so many vectors. Stack canaries could theoretically help, but they have to be compiled in, and given the sheer range of software calling getaddrinfo(), that makes them useless in practice.


While using higher-level languages can help, this affects Haskell (and Ruby, Perl, Python, Java, JavaScript, and pretty much every other language runtime), plus a Jesus-ton of other code all over, like ‘sudo’ (‘sudo’!). You shouldn’t write anything in C unless you really, really have to, but a lot of code ultimately winds its way down to libc regardless of the language you’re writing in.
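To illustrate how a high-level language still ends up in libc: CPython's `socket.getaddrinfo()` is a thin wrapper around the platform's getaddrinfo(3), which on Linux is glibc's. A small Python sketch; `resolve` is a hypothetical helper. It passes `AI_NUMERICHOST` so it stays offline, which also means it never reaches glibc's DNS code, the part with the bug; with `numeric_only=False` and a real hostname, the same call would go out through glibc's stub resolver.

```python
import platform
import socket


def resolve(host: str, port: int, numeric_only: bool = True) -> list:
    """Return the socket addresses getaddrinfo() produces for host:port.
    With numeric_only, AI_NUMERICHOST forbids any DNS lookup."""
    flags = socket.AI_NUMERICHOST if numeric_only else 0
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP, flags=flags)
    return [info[4] for info in infos]  # info = (family, type, proto, canonname, sockaddr)


# On glibc systems this reports ("glibc", "<version>"); elsewhere it may be empty.
print(platform.libc_ver())
print(resolve("127.0.0.1", 80))
```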


All of your internet services do this then? You know this for a fact?


ha ha ha

Not as long as things are written in C or C++! We talk about “footguns” at work and C and C++ are the biggest.


Channelling Andrew Tanenbaum…

With a microkernel, the resolver would be a service running in its own address space, and it could be written in a high-level language, so buffer overflow issues like this would be less likely.
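A rough user-space approximation of the "own address space" idea, in Python: do the lookup in a child process, so that if resolver code crashes (e.g. with SIGSEGV), only the child dies and the caller just sees a failure. `isolated_resolve` and `CHILD` are hypothetical names. This is crash containment, not a microkernel: the child still runs glibc with the caller's privileges, so a working exploit would still get code execution there unless the child were also sandboxed.

```python
import json
import subprocess
import sys

# Code run in the child: resolve sys.argv[1] and print the addresses as JSON.
CHILD = """
import json, socket, sys
infos = socket.getaddrinfo(sys.argv[1], None, proto=socket.IPPROTO_TCP)
print(json.dumps(sorted({info[4][0] for info in infos})))
"""


def isolated_resolve(host: str, timeout: float = 5.0):
    """Resolve host in a separate process. Return a list of addresses,
    or None if the child failed, crashed, or timed out."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", CHILD, host],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode != 0:  # on POSIX, negative if killed by a signal
        return None
    return json.loads(result.stdout)
```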


I get the feeling that Linus might not be the biggest fan of microkernels. :slightly_smiling: The downside is performance, but at this point the world would be better off suffering slightly slower software that’s free from this class of Internet-wide security nightmare. Some kind of micro-sandboxing of individual code paths is something the Linux world should really work to make a reality.


One more bullet dodged by the BSD boxen.


Most ISPs run caching resolvers. Or you can use OpenDNS or Google’s DNS servers. Some home network gateways run a lightweight embedded version of such a server. Or you can run your own, to be sure (and to be able to selectively block or redirect domains); it’s a handy tool.

It won’t 100% protect portable devices that connect to unknown networks (though you can run your own caching resolver on those, too; that’s easy on a laptop, and it should be possible with a bit of pain on a rooted Android device if it doesn’t already exist as an app).

It should be good enough for things on a local LAN. It could be attacked by a hostile machine that disables or DoSes the LAN’s DHCP server and takes over DNS (or does ARP hijacking or similar), though, if it’s already on the LAN; but then you likely already have a bigger problem.

The third-party resolvers should be testable by running your own authoritative DNS server for a subdomain, then checking how the domain resolves and whether the bad stuff gets through. The server can be as simple as a short script that spits out precrafted, fragmented packets with an oversized payload. (I did something vaguely similar in Lua for the ESP8266, or rather improved some German guy’s code, to redirect all DNS queries to the module’s IP.)
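A minimal Python sketch of that kind of test server, assuming you control a subdomain delegated to the machine running it. It answers A queries with 200 A records from 192.0.2.0/24 (TEST-NET-1, reserved for documentation), about 3.2 KB, and sends them over UDP without setting TC, so the kernel has to fragment the datagram. A well-behaved server would set TC when an answer exceeds what the client can accept; skipping that is the point of the test. The records are harmless documentation addresses, not an exploit payload. `build_padded_response` and `serve` are hypothetical names.

```python
import socket
import struct

TEST_NET_1 = (192, 0, 2)  # 192.0.2.0/24, reserved for documentation (RFC 5737)


def build_padded_response(query: bytes, n_records: int = 200) -> bytes:
    """Answer an A query with n_records A records from TEST-NET-1, ignoring
    any size limit; other query types get an empty answer section."""
    if not 1 <= n_records <= 254:
        raise ValueError("n_records must be 1..254 (one record per address)")
    ident, qflags = struct.unpack_from("!HH", query)
    offset = 12
    while query[offset] != 0:  # walk the uncompressed question name
        offset += 1 + query[offset]
    (qtype,) = struct.unpack_from("!H", query, offset + 1)
    question = query[12:offset + 5]  # name + root byte + QTYPE + QCLASS
    count = n_records if qtype == 1 else 0
    flags = 0x8400 | (qflags & 0x0100)  # QR + AA, copy RD, RCODE 0
    header = struct.pack("!6H", ident, flags, 1, count, 0, 0)
    answers = b"".join(
        b"\xc0\x0c"  # compression pointer to the question name at offset 12
        + struct.pack("!HHIH", 1, 1, 60, 4)  # TYPE A, CLASS IN, TTL 60, RDLENGTH 4
        + bytes([*TEST_NET_1, i + 1])
        for i in range(count)
    )
    return header + question + answers


def serve(bind_addr: str = "0.0.0.0", port: int = 53) -> None:
    """Answer every UDP query with a padded response. Port 53 usually needs root."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_addr, port))
    while True:
        query, client = sock.recvfrom(4096)
        try:
            sock.sendto(build_padded_response(query), client)
        except (IndexError, struct.error):
            pass  # ignore malformed packets
```

Run `serve()` on the delegated host, then query a name under the subdomain through the resolver under test (e.g. `dig @<resolver> <name> A`) and see whether the full answer, a truncated one, or an error comes back.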


Do you have any experience with the OS X resolver? I run BIND on a BSD system for my home DNS, and noticed when looking at pcap files that Macs have an annoying habit of querying root servers after the initial lookup, which, of course, defeats the purpose of caching. I’ve figured out how to make OS X “behave” in other ways, but the resolver issue is vexing.