Pingdom Blog

Royal Pingdom

Ramblings from the Pingdom team about the Internet and web tech


Sweden’s Internet broken by DNS mistake

Last night, routine maintenance of Sweden’s top-level domain .se went seriously wrong, introducing an error that made DNS lookups for all .se domain names start failing. The entire Swedish Internet effectively stopped working at this point. Swedish (.se) websites could not be reached, email to Swedish domain names stopped working, and for many users these problems still persist.

According to sources we have inside the Swedish web hosting industry, the .se zone, the central record for the .se top-level domain, broke at 21:45 local time and was not returned to normal until 22:43 local time.

However, since DNS lookups are cached externally by Internet service providers (ISPs) and web hosting companies, the problems remained even after that. It wasn’t until around 23:30 local time last night that the major Swedish ISPs had flushed their own DNS caches, meaning that they cleared away the broken results so that new DNS lookups could start working properly again. If they had not done this the problem would have remained for a full 24 hours.

There are still a large number of smaller ISPs that have not yet fixed the problem. It is also likely that ISPs outside of Sweden are not aware of the incident, so the effects of the problem may remain there as well.

We (Pingdom) are based in Sweden, so we have witnessed the massive effects of this incident firsthand and also the widespread frustration from end users. The incident is also receiving a significant amount of media attention.

What exactly happened?

The problem happened during planned maintenance of the .se domain. The .SE registry used an incorrectly configured script to update the .se zone, which introduced an error into every single .se domain name.

We have spoken to a number of industry insiders and what happened is that when updating the data, the script did not add a terminating “.” to the DNS records in the .se zone. That trailing dot is necessary in the settings for DNS to understand that “.se” is the top-level domain. It is a seemingly small detail, but without it, the whole DNS lookup chain broke down.
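
To illustrate why the trailing dot matters, here is a minimal sketch of the standard zone file rule: a name that ends in a dot is taken as fully qualified, while a name without one is treated as relative and has the zone origin appended. The function name `qualify` and the host `ns1.example.com` are hypothetical and only serve the illustration; this is not the registry's actual script.

```python
def qualify(name: str, origin: str) -> str:
    """Expand a name as written in a zone file into a fully qualified name.

    Names ending in "." are already absolute and are returned unchanged.
    Anything else is relative, so the zone origin is appended.
    """
    if name.endswith("."):
        return name
    return f"{name}.{origin}"


ORIGIN = "se."  # origin of the .se zone

# Hypothetical name server record target, written with and without the dot.
correct = qualify("ns1.example.com.", ORIGIN)
broken = qualify("ns1.example.com", ORIGIN)

print(correct)  # ns1.example.com.
print(broken)   # ns1.example.com.se.
```

Without the dot, the name server target ends up inside the .se zone itself, which is why lookups could no longer follow the chain to the real name servers.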

The problems were made worse by the fact that DNS lookups are cached externally. Since DNS lookups are cached for a certain time and the .se zone has a 24-hour time-to-live (the time that information is cached by external DNS servers), the problem could last for up to 24 hours for some users.
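
As a rough worked example of that arithmetic, the sketch below computes when a cached broken record would expire if nobody flushed the cache. The 24-hour TTL and the 21:45 and 22:43 times come from this post; the calendar date is an arbitrary placeholder, and the function name `cache_expiry` is made up for illustration.

```python
from datetime import datetime, timedelta

ZONE_TTL = timedelta(hours=24)  # TTL of the .se zone, as stated above


def cache_expiry(cached_at: datetime, ttl: timedelta = ZONE_TTL) -> datetime:
    """Return the time at which a record cached at `cached_at` expires."""
    return cached_at + ttl


# Placeholder date; only the times of day are taken from the post.
BROKEN_AT = datetime(2000, 1, 1, 21, 45)
FIXED_AT = datetime(2000, 1, 1, 22, 43)

# A resolver that cached the broken data right before the fix keeps it
# for a full TTL after that moment, unless its operator flushes the cache.
earliest_expiry = cache_expiry(BROKEN_AT)
latest_expiry = cache_expiry(FIXED_AT)

print(earliest_expiry.strftime("%d %H:%M"))  # 02 21:45
print(latest_expiry.strftime("%d %H:%M"))    # 02 22:43
```

In other words, an unflushed resolver could keep serving the broken data until sometime between 21:45 and 22:43 the following evening, depending on when it first cached it.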

The solution once the problem had been corrected was to “flush” the cache of external DNS servers, i.e. empty their cache, but this can only be done by the ones controlling the DNS servers, usually ISPs and web hosting companies. The end user has little control over this and is left at the mercy of his/her ISP.

The implications

Pingdom monitors the uptime of tens of thousands of websites for our customers, and we often see downtime due to DNS problems. These problems are very common all over the world, but usually it’s a single domain name that has been incorrectly configured or the DNS servers of a single web host having problems. An entire top-level domain breaking is exceptionally rare.

Problems that affect an entire top-level zone have very wide-ranging effects, as can be seen from the .se incident. There are just over 900,000 .se domain names, and every single one of these was affected.

Imagine the same thing happening to the .com domain, which has over 80 million domain names. Although not all of these are actually in use by websites or for email, the effects would still be huge and cause an unprecedented amount of downtime across the entire Internet.

Update: According to a statement issued by the .SE registry the problem started at 21:45 local time, not 21:19 as we previously noted from our source. Changed this accordingly.


25 Comments

Since the entire .se zone was broken, imagine what happened to other domains that had name servers like ns1.company.se… Those domains broke down as well. :(

What the article doesn’t say is *where* this happened – was this in Sweden, or in the US?

I think the caching issue is overstated. As I understand it the missing . was on the end of the domains, which means the origin would have been added turning all .se domains into .se.se domains. That would then lead to NXDOMAIN responses, which the SOA record for .se indicates should be cached for only two hours.

Jay: No, the missing dot was in the NS records. The problem produced NS records like ns1.someserver.com.se., which does not lead to NXDOMAIN; it leads to domains not resolving, and resolvers therefore cache the authority records (which have a TTL of 86400), since they can’t get better answers from an authoritative server.

Rob: It happened to the .SE zone, which affected users trying to reach .se domains no matter where they were geographically.

Rob: I’m not sure I understand your question. The DNS is global. The error was *made* in Sweden but its effects were seen worldwide.

One who remembers

October 15th, 2009 at 7:28 am


Reminds me of the days when a missing “.” in a COBOL program or the placement of a symbol in JCL would be one position off; these would drive mainframers bonkers. It’s amusingly ironic that technology has advanced so far yet is still subject to the same problems as its dinosaur ancestors – human error.

Who cares?

I think the problem is that these linux / unix chaps use notepad too much, vs. apps that check values and could have some intelligence built in.

Tushar Kapila: They have failsafes in place, afaik, but apparently in this case something went wrong in the routines.

Only just found out about this. It’s interesting to me because, as far as I can tell, this happens to the dot-zm (Zambia) TLD on a regular basis.
