JUNOS (Juniper) Kernel Crash Video

Since our post yesterday detailing the information in Juniper bulletin PSN-2010-01-623 and our thoughts on its somewhat understated effect, we have noted some interesting responses. The bulletin itself has since been updated and is now more specific about the affected versions (essentially excluding JUNOS 10.x and versions no longer supported by Juniper). We’ve been quoted here and there saying that the worst-case scenario for this flaw could have been widespread Internet outages (not an overstatement, in our opinion), and that such a simple attack, one that escapes filtering and can reboot high-end routers, is a big deal. We tested sending all 256 permutations of the Options field in the TCP header to a router running a vulnerable version of the Juniper operating system, found the value that triggers the flaw, and reproduced the kernel crash, which is demonstrated in the video below.

We used Scapy to send the test packets.

Responses to Our Original Post

The Bizarre

We’ve seen kind of off the wall responses, like this one:

It seems like this isn’t as major as they say. Sure it’s a kernel crash, but it requires a packet to be sent to a listening port. I doubt any core routers have any ports open to the public internet at all.

To function as a router, a device must have some TCP ports open. On a core router, the BGP port (TCP 179) will be open. It is true that a core router should not have ports open to the public Internet. The BGP port, however, will be open to neighbors. A packet that cannot be filtered defeats the ACL rules meant to keep everyone but those neighbors out. At a high level, that is how high-end equipment is affected.
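The neighbor-only restriction described above can be sketched as a simple source-address filter on the BGP port. This is a minimal illustration, assuming a hypothetical neighbor list built from documentation-range addresses. It is not JUNOS filter syntax. It only shows what such an ACL checks: who sent the packet. It does not inspect what the packet contains, which is why a malformed packet that slips past filtering makes the rule moot.

```python
import ipaddress

BGP_PORT = 179  # well-known TCP port for BGP

# Hypothetical BGP neighbors (RFC 5737 documentation ranges); a real router
# would derive these from its BGP configuration.
NEIGHBORS = [
    ipaddress.ip_network("192.0.2.1/32"),
    ipaddress.ip_network("198.51.100.0/30"),
]

def permit_to_control_plane(src: str, dst_port: int) -> bool:
    """Return True if a TCP packet from src to dst_port passes a
    neighbor-only filter. Everything except BGP from a listed neighbor
    is denied in this sketch."""
    if dst_port != BGP_PORT:
        return False
    addr = ipaddress.ip_address(src)
    return any(addr in net for net in NEIGHBORS)
```

For example, `permit_to_control_plane("192.0.2.1", 179)` is accepted, while the same port from `203.0.113.5` is dropped.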

The Official

The Juniper response we discussed yesterday was repeated today, and it continues to leave something to be desired: A Juniper spokeswoman declined to provide more technical details on the issue, saying that the company only passes on this information to customers and partners. The advisory was one of seven issued recently by the company, she said via e-mail.

Yes, there were seven advisories. Six of them were somewhat less interesting than the seventh:

Unofficial, but from Juniper Anyway

We received a response from Matt at Juniper in the comment section of the original post, which we appreciated. He clarified the list of affected versions by pointing out the error in the original Juniper bulletin, which stated that version 10.x was affected.

Again, thanks for the update Matt.

Another Unofficial, but from Juniper Anyway

JuniperPhilly responds in the comments of the Register article as follows:

it’s probably not as bad as you might think- All Junos software releases built on or after January 28, 2009 have fixed this specific issue. In short, we fixed this particular problem about 350 days ago. …. Disclaimer: I work for Juniper as a Systems Engineer.

Well, sort of. The criticality of the defect was certainly reclassified, so the fix made a while back seems divorced from the discovery that this problem lets a remote attacker crash the kernel. The Juniper advisory itself reads this way, suggesting that the fix was made without knowing that it closed a remote exploit. This is not that uncommon: problems are fixed for one reason without anyone ever knowing there was an even better reason for correcting them.
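The cutoff JuniperPhilly quotes can be expressed as a simple date comparison. This is a minimal sketch, assuming you already know the build date of the image a router is running. The January 28, 2009 cutoff comes from an unofficial comment, not from the advisory, and the function name is hypothetical. It also illustrates the argument above: a fix existing in builds since that date says nothing about whether a given router was actually running such a build.

```python
from datetime import date

# Cutoff quoted by a Juniper systems engineer in a comment (unofficial):
# "All Junos software releases built on or after January 28, 2009 have
# fixed this specific issue."
FIX_CUTOFF = date(2009, 1, 28)

def build_has_fix(build_date: date) -> bool:
    """Return True if a JUNOS build dated build_date falls on or after the
    quoted cutoff. Does not encode the advisory's version list."""
    return build_date >= FIX_CUTOFF
```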

But routers, especially high-capacity ones, are patched only for serious reasons. A defect that was identified back in January 2009, but not reported the same way, does not carry the same effect as a bulletin labeled critical that was released yesterday. The latter makes the people who maintain those routers move, as the example below shows.

Qwest, like other backbone providers, doesn’t have unannounced outages for unspecified security concerns over “not as bad as you might think” issues:

Date: 2010-01-07 10:04:08 GMT We just had a qwest outage of about 2 mins at 1:41am pst. When I called to report it I was told it was a 200+ emergency software upgrade due to a security concern, and that we will get a notice later after the fact. Normally we get notices in advance, even for software upgrades due to security or other important issues, so I am curious if other qwest customers had the same experience and wether this is how it’s going to be from here on in? The affected platform was juniper and I’d love to know the specfic case being addressed here.

Mike-

Source: http://thread.gmane.org/gmane.org.operators.nanog/71244

The thread also produced interesting responses about how the notification was published after the outage:

The thread link above contains this and the rest of this particular discussion.

The Newsgroups

We were told that the problem had not been corroborated by discussions in newsgroups. Those discussions started showing up today:

Yeah but Cisco makes the Core Routers

Sigh…

Not to act as public relations for Juniper, but:

The innovations listed above, as well as many others, have helped the T Series become the industry’s most widely deployed core routing family. Juniper has shipped over 5000 T Series to more than 220 customers around the world — including more than 500 T1600s in just over a year of availability. According to Synergy Research, in the past five years, Juniper’s share of the core routing market has grown by 44 percent — with the company gaining 11 points of share as others have seen share declines.

Source: http://www.juniper.net/us/en/company/press-center/press-releases/2009/pr_2009_06_08-09_00.html

And the following line from the same press release:

All of these platforms are powered by JUNOS® Software, a single operating system integrating routing, switching, security and network services from Juniper Networks.

What about Anti-spoofing and egress filtering

(Comment from Anton Delport)

One thing that will also be required for a successful attacked would be spoofed IP packets. Keep in mind that most ISP follow the best practice guidelines and implement ACL and anti-spoofing. So yes, the router will listen to BGP port but only for a small range of prefixes. If the source address (and destination) is not correct, the packet will be dropped in hardware before it can do any damage.

Anti-spoofing and egress filtering, as recommended by BCP 38, help mitigate this issue for routers that are not at the edge. They do nothing to protect the edge routers themselves. Example:

This issue is real because I can identify border networks simply with traceroute, and I know that BGP is used to exchange routes between them. Given this information, nothing protects providers that are running an affected version of the software at the edge of their network.
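BCP 38 ingress filtering can be sketched as a per-interface source check. A provider accepts a packet from a customer only if its source address lies inside a prefix assigned to that customer. This is a minimal illustration, assuming hypothetical interface names and documentation-range prefixes. The check validates only where a packet claims to come from. It does not look at what the packet carries, so it cannot stop a malformed packet sent from a legitimate address.

```python
import ipaddress

# Hypothetical customer-facing interfaces and the prefixes assigned to each
# (RFC 5737 documentation ranges).
ASSIGNED = {
    "cust-a": [ipaddress.ip_network("203.0.113.0/25")],
    "cust-b": [ipaddress.ip_network("198.51.100.64/26")],
}

def bcp38_accept(interface: str, src: str) -> bool:
    """Accept a packet arriving on a customer interface only if its source
    address falls within a prefix assigned to that customer. Unknown
    interfaces have no assigned prefixes, so their traffic is rejected."""
    addr = ipaddress.ip_address(src)
    return any(addr in net for net in ASSIGNED.get(interface, []))
```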

Finally

Some of the viewpoints people are attaching to this problem don’t entirely make sense. A high-end router is not like your desktop Microsoft Windows installation. It doesn’t get updated every month after Patch Tuesday. It gets updated when a network administrator determines that a problem is severe enough to warrant an outage for the patch. Many of the “big iron” routers that would have been affected had this been out in the wild (which, as far as we know, it is not yet) were not patched as of Monday, and from all appearances were patched as of late Tuesday.

Juniper is a major player in the high-end router market, even though it is not a one-player market. If an unpatched Juniper router were hit with this packet, it would crash.

But let’s walk through a thought experiment for the “this wouldn’t have been a big deal if uncorrected” crowd:

As the video above shows, the OS reboot takes a while on a virtual machine, and big routers take longer. Imagine a botnet rented to run the program developed for the video at a certain time (say, midnight). Picture the bad actor identifying boundary routers between service providers (with traceroute) and sending the crafted packet to the BGP port of the IP addresses on both sides. That would reboot the boxes and sever their BGP sessions. Even after the reboot, the effects are magnified as BGP reconverges globally.

You can rent a decent-size botnet on the Internet right now if you like. The program that found the right option to send took a couple of hours to write (on and off, with other things going on). It identified the option value that causes the problem fairly quickly after that. The second program, which sends the packet, is just a small Python script.

This hypothetical scenario would have been a long day on the old Intertubes. I’m sure there are details to be worked out (if you crash enough gateways, can you continue the attack?), but you get the idea.

So let’s be realistic as we head into the automatic “nothing is ever really a big issue, everything is FUD” reactive mode that so often follows news in information security. Remote exploits are still bad. Ones that cause kernel crashes are still bad. Remote exploits that cause kernel crashes in one of the most widely used network operating systems in the world are bad. Identifying critical security issues, responding to them appropriately, sending out bulletins with appropriate CVSS ratings, and avoiding big potential problems like this one are good. We can’t call it a total win (it’s not hard to find the option value, so this could enter the wild shortly), but from the outside it looks like large providers have taken preventative steps to be prepared.

And in case anyone else noticed: Twitter seemed to have its own blackout of Juniper personnel, none of whom have been tweeting much this week.