Royal Pingdom

Sweden’s Internet broken by DNS mistake

Last night, routine maintenance of Sweden’s top-level domain .se went seriously wrong, introducing an error that made DNS lookups for all .se domain names start failing. The entire Swedish Internet effectively stopped working at this point. Swedish (.se) websites could not be reached, email to Swedish domain names stopped working, and for many these problems still persist.

According to sources we have inside the Swedish web hosting industry, the .se zone, the central record for the .se top-level domain, broke at 21:45 local time and was not returned to normal until 22:43 local time.

However, since DNS lookups are cached externally by Internet service providers (ISPs) and web hosting companies, the problems remained even after that. It wasn’t until around 23:30 local time last night that the major Swedish ISPs had flushed their own DNS caches, meaning that they cleared away the broken results so that new DNS lookups could start working properly again. If they had not done this, the problem could have remained for up to a full 24 hours.

There are still a large number of smaller ISPs that have not yet fixed the problem. It is also likely that ISPs outside of Sweden are not aware of the incident, so the effects of the problem may remain there as well.

We (Pingdom) are based in Sweden, so we have witnessed the massive effects of this incident firsthand and also the widespread frustration from end users. The incident is also receiving a significant amount of media attention.

What exactly happened?

The problem happened during planned maintenance of the .se domain. The .SE registry used an incorrectly configured script to update the .se zone, which introduced an error into every single .se domain name.

We have spoken to a number of industry insiders and what happened is that when updating the data, the script did not add a terminating “.” to the DNS records in the .se zone. That trailing dot is necessary in the settings for DNS to understand that “.se” is the top-level domain. It is a seemingly small detail, but without it, the whole DNS lookup chain broke down.
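To make the trailing-dot rule concrete, here is a minimal sketch of how a name in a zone file is expanded. It follows the standard convention that a name ending in “.” is absolute, while any other name has the zone's origin appended. The function name `qualify` is our own invention, the host name is borrowed from a reader comment on this post, and none of this is the .SE registry's actual script.

```python
def qualify(name: str, origin: str) -> str:
    """Expand a zone-file name the way a DNS server reads it.

    `qualify` is a hypothetical helper for illustration only.
    """
    if name.endswith("."):
        # A trailing dot marks the name as absolute: use it as written.
        return name
    if name == "@":
        # "@" is zone-file shorthand for the origin itself.
        return origin
    # No trailing dot: the name is relative, so the origin is appended.
    return f"{name}.{origin}"


origin = "se."  # the zone being edited

# With the dot, the name server's name is left alone.
print(qualify("ns1.someserver.com.", origin))

# Without the dot, the name is silently placed under .se.
print(qualify("ns1.someserver.com", origin))
```

The second call produces a name under .se that does not exist, so resolvers following it have nowhere to go.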

The problems were made worse by the fact that DNS lookups are cached externally. Since DNS lookups are cached a certain time and the .se zone has a 24 hour time-to-live (the time information is cached by external DNS servers), the problem could last for up to 24 hours for some users.
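The arithmetic behind that 24-hour figure can be sketched in a few lines. The times of day and the 24-hour time-to-live come from this post. The calendar date is a placeholder we made up so the code can run, and `cache_expiry` is a hypothetical helper, not part of any DNS software.

```python
from datetime import datetime, timedelta

# From the article: the .se zone is cached with a 24-hour time-to-live.
ZONE_TTL = timedelta(hours=24)


def cache_expiry(cached_at: datetime, ttl: timedelta = ZONE_TTL) -> datetime:
    """Return when a resolver that cached a record at `cached_at` drops it."""
    return cached_at + ttl


# Placeholder date (an assumption); only the times of day are from the article.
day = datetime(2000, 1, 1)
broke_at = day.replace(hour=21, minute=45)
fixed_at = day.replace(hour=22, minute=43)

# How long the broken zone was being handed out.
bad_window = fixed_at - broke_at

# Worst case: a resolver cached the bad data just before the fix went in.
worst_case = cache_expiry(fixed_at)

print("Broken zone was served for:", bad_window)
print("Last bad answer could expire at:", worst_case.strftime("%H:%M"), "the next day")
```

A resolver that picked up the broken data at any point in that window keeps it for a full TTL unless its operator flushes the cache by hand.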

The solution once the problem had been corrected was to “flush” the cache of external DNS servers, i.e. empty their cache, but this can only be done by the ones controlling the DNS servers, usually ISPs and web hosting companies. The end user has little control over this and is left at the mercy of his/her ISP.

The implications

Pingdom monitors the uptime of tens of thousands of websites for our customers, and we often see downtime due to DNS problems. These problems are very common all over the world, but usually it’s a single domain name that has been incorrectly configured or the DNS servers of a single web host having problems. An entire top-level domain breaking is exceptionally rare.

Problems that affect an entire top-level zone have very wide-ranging effects, as the .se incident shows. There are just over 900,000 .se domain names, and every single one of these was affected.

Imagine the same thing happening to the .com domain, which has over 80 million domain names. Although not all of these are actually in use by websites or for email, the effects would still be huge and cause an unprecedented amount of downtime across the entire Internet.

Update: According to a statement issued by the .SE registry the problem started at 21:45 local time, not 21:19 as we previously noted from our source. Changed this accordingly.


24 Comments

Since the entire .se zone was broken, imagine what happened to other domains that had name servers like ns1.company.se… Those domains broke down as well. :(

What the article doesn’t say is *where* this happened – was this in Sweden, or in the US?

I think the caching issue is overstated. As I understand it the missing . was on the end of the domains, which means the origin would have been added turning all .se domains into .se.se domains. That would then lead to NXDOMAIN responses, which the SOA record for .se indicates should be cached for only two hours.

Jay: No, the missing dot was for the NS records. The problem caused NS records like ns1.someserver.com.se., which does not lead to NXDOMAIN. It leads to domains not resolving, and resolvers thus caching the authority records (which have an 86400-second TTL), since they can’t get better answers from an authoritative server.

Rob: It happened to the .SE zone, which affected users trying to reach .se domains no matter where they were geographically.

Rob: I’m not sure I understand your question. The DNS is global. The error was *made* in Sweden but its effects were seen worldwide.

One who remembers

October 15th, 2009 at 7:28 am


Reminds me of the days when a missing “.” in a COBOL program or the placement of a symbol in JCL would be one position off; these would drive mainframers bonkers. It’s amusingly ironic that technology has advanced so far yet is still subject to the same problems as its dinosaur ancestors – human error.

Who cares?

I think the problem is that these Linux/Unix chaps use Notepad too much, instead of apps that check values and could have some intelligence built in.

Tushar Kapila: They have failsafes in place, afaik, but apparently in this case something went wrong in the routines.


