Sweden’s Internet broken by DNS mistake

Last night, routine maintenance of Sweden’s top-level domain .se went seriously wrong, introducing an error that made DNS lookups for all .se domain names start failing. The entire Swedish Internet effectively stopped working at this point. Swedish (.se) websites could not be reached, email to Swedish domain names stopped working, and for many, these problems still persist.

According to sources we have inside the Swedish web hosting industry, the .se zone, the central record for the .se top-level domain, broke at 21:45 local time and was not returned to normal until 22:43 local time.

However, since DNS lookups are cached externally by Internet service providers (ISPs) and web hosting companies, the problems remained even after that. It wasn’t until around 23:30 local time last night that the major Swedish ISPs had flushed their own DNS caches, meaning that they cleared away the broken results so that new DNS lookups could start working properly again. If they had not done this, the problem could have remained for up to 24 hours.

There are still a large number of smaller ISPs that have not yet fixed the problem. It is also likely that ISPs outside of Sweden are not aware of the incident, so the effects of the problem may remain there as well.

We (Pingdom) are based in Sweden, so we have witnessed the massive effects of this incident firsthand and also the widespread frustration from end users. The incident is also receiving a significant amount of media attention.

What exactly happened?

The problem happened during planned maintenance of the .se domain. The .SE registry used an incorrectly configured script to update the .se zone, which introduced an error into every single .se domain name.

We have spoken to a number of industry insiders and what happened is that when updating the data, the script did not add a terminating “.” to the DNS records in the .se zone. That trailing dot is necessary in the settings for DNS to understand that “.se” is the top-level domain. It is a seemingly small detail, but without it, the whole DNS lookup chain broke down.
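To illustrate why the dot matters, here is a minimal Python sketch of how names in a zone file are commonly expanded: a name ending in a dot is treated as absolute, and any other name has the zone's origin appended to it. The `qualify` function is hypothetical and greatly simplified (it is not the registry's script), and the name server name is borrowed from the example given in the comments on this post.

```python
def qualify(name: str, origin: str) -> str:
    """Expand a zone-file name roughly the way a zone-file parser would.

    A name ending in "." is absolute and is returned unchanged.
    Any other name is relative, so the zone origin is appended.
    """
    if name.endswith("."):
        return name
    return f"{name}.{origin}"


ORIGIN = "se."  # origin of the .se zone

# Target written with the terminating dot: left exactly as written.
print(qualify("ns1.someserver.com.", ORIGIN))  # ns1.someserver.com.

# Same target with the dot left off: the origin is appended.
print(qualify("ns1.someserver.com", ORIGIN))   # ns1.someserver.com.se.
```

With the dot missing, a record meant to point at a name server under .com ends up pointing at a name under .se that was never intended to exist.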

The problems were made worse by the fact that DNS lookups are cached externally. Since DNS lookups are cached for a certain time and the .se zone has a 24-hour time-to-live (the length of time that information is cached by external DNS servers), the problem could last for up to 24 hours for some users.

The solution once the problem had been corrected was to “flush” the cache of external DNS servers, i.e. empty their cache, but this can only be done by the ones controlling the DNS servers, usually ISPs and web hosting companies. The end user has little control over this and is left at the mercy of his/her ISP.
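The interaction between the time-to-live and a manual flush can be sketched with a toy cache. This is only an illustration under simplifying assumptions: `TtlCache` and `FakeClock` are hypothetical names, real resolver caches are considerably more involved, and the cached "answer" here is just a placeholder string.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlCache:
    """Toy resolver cache: an answer is reused until its TTL runs out."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, name: str, answer: str, ttl_seconds: int) -> None:
        # Remember the answer together with the moment it expires.
        self._entries[name] = (answer, self._clock() + ttl_seconds)

    def get(self, name: str) -> Optional[str]:
        entry = self._entries.get(name)
        if entry is None:
            return None
        answer, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # expired: forget it
            return None
        return answer

    def flush(self) -> None:
        # What an operator does by hand: drop everything, good or bad.
        self._entries.clear()


class FakeClock:
    """Manually advanced clock so the example does not have to wait."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


DAY = 24 * 60 * 60  # 86400 seconds, the 24-hour TTL of the .se zone

clock = FakeClock()
cache = TtlCache(clock)
cache.put("example.se.", "broken answer", DAY)

clock.advance(58 * 60)            # 21:45 to 22:43: the zone itself is fixed
print(cache.get("example.se."))   # broken answer (still served from cache)

cache.flush()                     # the operator flushes the cache
print(cache.get("example.se."))   # None (the next lookup fetches fresh data)
```

Until the entry expires or the cache is flushed, the cache keeps handing out the stale answer even though the source has long since been corrected.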

The implications

Pingdom monitors the uptime of tens of thousands of websites for our customers, and we often see downtime due to DNS problems. These problems are very common all over the world, but usually it’s a single domain name that has been incorrectly configured or the DNS servers of a single web host having problems. An entire top-level domain breaking is exceptionally rare.

Problems that affect an entire top-level zone have very wide-ranging effects, as can be seen from the .se incident. There are just over 900,000 .se domain names, and every single one of these was affected.

Imagine the same thing happening to the .com domain, which has over 80 million domain names. Although not all of these are actually in use by websites or for email, the effects would still be huge and cause an unprecedented amount of downtime across the entire Internet.

Update: According to a statement issued by the .SE registry, the problem started at 21:45 local time, not 21:19 as we previously noted from our source. We have changed the article accordingly.


10 comments
Craig

Only just found out about this. It's interesting to me because, as far as I can tell, this happens to the dot-zm (Zambia) TLD on a regular basis.

Tushar Kapila

I think the problem is that these Linux/Unix chaps use Notepad too much, versus apps that check values and could have some intelligence built in.

One who remembers

Reminds me of the days when a missing "." in a COBOL program, or a symbol in JCL placed one position off, would drive mainframers bonkers. It's amusingly ironic that technology has advanced so far yet is still subject to the same problems as its dinosaur ancestors - human error.

Stéphane Bortzmeyer

Rob: I'm not sure I understand your question. The DNS is global. The error was *made* in Sweden but its effects were seen worldwide.

Jimmy Bergman

Jay: No, the missing dot was for the NS records. The problem caused NS records like ns1.someserver.com.se., which does not lead to NXDOMAIN; it leads to domains not resolving, and resolvers thus cache the authority records (which have an 86400 TTL), since they can't get better answers from an authoritative server. Rob: It happened to the .SE zone, which affected users trying to reach .se domains no matter where they were geographically.

Jay Daley

I think the caching issue is overstated. As I understand it the missing . was on the end of the domains, which means the origin would have been added turning all .se domains into .se.se domains. That would then lead to NXDOMAIN responses, which the SOA record for .se indicates should be cached for only two hours.

Rob

What the article doesn't say is *where* this happened - was this in Sweden, or in the US?

Andreas

Since the entire .se zone was broken, imagine what happened to other domains that had name servers like ns1.company.se... Those domains broke down as well. :(

Pingdom

Tushar Kapila: They have failsafes in place, afaik, but apparently in this case something went wrong in the routines.
