Royal Pingdom

Ramblings from the Pingdom team about the Internet and web tech

How to stop an outage from becoming an outrage

Sooner or later, every site or application will fail. However, the consequences depend not only on how the failure is managed but also on how it is communicated. Recently, the web hosting company Media Temple and even Google have shown how hard it is for modern connected organizations to respond quickly enough to system outages. Here’s a suggested crisis checklist, along with some notes on why it is hard to follow consistently.

On Saturday, February 28, a storage cluster at Media Temple failed, depriving thousands of customers of their service until the following Monday morning. During the outage, the company did not mass e-mail its customers and did not appear to promptly update anything other than its system status account on Twitter. Only later did the company attempt to send private messages to some irritated customers. This quickly led to outrage on blogs and online communities.

Four days earlier, in an incident covered here as well as elsewhere, Google faced a comparable crisis when its Gmail service stopped working globally for two to four hours. While millions of users and companies were unable to use their e-mail, the company communicated only very briefly on its official blog. Predictably, the mainstream media, the blogosphere and online communities were soon ablaze with messages about “Gfail”.

These examples show that modern organizations need to excel at following the deceptively simple rules of crisis communication, and should always try to reserve the capacity to:

I. Preparations:
  • Define your main stakeholders – customers, investors, partners, suppliers etc.
  • Monitor the large real-time forums where they are likely to communicate.
  • Define what a serious error is and how to notice when one has happened.
II. Urgent actions:
  • Define what has happened as far as possible – be careful to separate facts from guesses.
  • Define what to do about it – recovery, calling in extra resources etc.
  • Define which stakeholders are affected.
  • Define how to communicate with these groups – avoid speculation and optimistic promises in favor of continuous updates, addressing the information vacuum and user frustration.
  • Start communicating.
III. Follow-up:
  • Respond quickly to further questions from key stakeholders – always stick to the facts/message as agreed above and avoid speculation.
  • If an error has been committed, offer apologies and compensation (as both of the companies mentioned have since done).
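The "Preparations" step asks you to define what a serious error is and how to notice when one has happened. As a minimal sketch, assuming a simple rule of our own choosing (an incident is declared once a fixed number of consecutive monitoring checks have failed), that definition can be written down explicitly. The names `IncidentRule` and `first_incident` and the default threshold of 3 are hypothetical illustrations, not part of any Pingdom product or API.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class IncidentRule:
    """Hypothetical definition of a 'serious error': the site counts as down
    once `threshold` consecutive checks have failed (assumed value)."""
    threshold: int = 3


def first_incident(checks: Iterable[bool], rule: IncidentRule) -> Optional[int]:
    """Return the index of the check at which an incident should be declared,
    or None if the failure streak never reaches the threshold.

    `checks` holds one entry per monitoring check: True = up, False = down.
    """
    streak = 0
    for i, ok in enumerate(checks):
        # A successful check resets the streak; a failure extends it.
        streak = 0 if ok else streak + 1
        if streak >= rule.threshold:
            return i
    return None


if __name__ == "__main__":
    # Hypothetical one-minute checks: a single blip, then a sustained outage.
    history = [True, False, True, True, False, False, False, False]
    print(first_incident(history, IncidentRule(threshold=3)))  # prints 6
```

With this sample history, the isolated failure at index 1 does not trigger anything; the incident is declared at index 6, the third consecutive failure. Requiring a streak rather than a single failed check is one way to avoid starting crisis communication over a momentary blip, at the cost of noticing a real outage a few checks later.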

In hindsight, Media Temple reacted as quickly as possible, throwing all its resources at solving the issue, but forgot to communicate actively with its customers, generating anger and accusations that might have been avoided. Google, for its part, aggravated the error by initially giving its users inaccurate information: the failure was hardly “limited to a small subset of users”.

Both companies were pilloried on Twitter, underlining the need for organizations to monitor real-time communities like this one, which can improve or aggravate a situation by instantly spreading information, provided such information is available. Media Temple later claimed that it lacked the staff to handle thousands of individual micro-conversations.

In this kind of situation, the best course for a company may be to define a single message, mass-communicate it, update it actively, and avoid speculation or individualized responses. This is where it helps to have one single source of information that all customers can be referred to for status updates, for example an externally hosted status blog.

So, are we saying that following the above rules means communication mishaps will never happen? Of course not. Crisis management is never easy; otherwise it wouldn’t be a crisis.

Do you have any examples of superb crisis communications – or the opposite?

Please don’t hesitate to share them with us in the comments.

No news is good news for the Super Bowl website

The New England Patriots held what seemed to be a commanding lead (17-15) with five minutes left in Super Bowl XLVI last night. But the New York Giants came back and won 21-17.

As exciting as the game sounds, we missed the whole thing, instead spending our time watching the Superbowl.com website.

It turned out to be a rather dull thing to do because the site held up well and there was no downtime at all. The response times also showed no significant sign of heavy Super Bowl traffic.

As Super Bowl 46 approaches, fans will flock to the Lucas Oil Stadium in Indianapolis, Indiana, and to TV sets around the world to watch the New York Giants battle it out with the New England Patriots.

Kickoff is scheduled for 6:30 p.m. EST on Sunday, February 5, and we’re already monitoring Superbowl.com to see how the site will handle the event.

What team will win Super Bowl 46? How will the site cope? We can only wait to find out.

Weekend must-read articles #2

Every Friday we bring you a collection of links to places on the web that we find particularly newsworthy, interesting, entertaining, and topical. We try to focus on a particular area or topic each week, but in general we will cover the Internet, web development, networking, performance, and other geeky topics.

This week we bring you a collection of articles focusing on cloud, with a few other topics thrown in to boot.

Out of the 59 US-based e-commerce sites we monitored during last year’s holiday season, 28 scored a perfect 100% uptime for December.

Whether this helped spur on the booming sales in the US, we don’t know, but retail e-commerce spending in the US reached $37.2 billion for the November to December 2011 period. That was an increase of 15% from the same period in 2010.

We decided to dig into the numbers for these e-commerce sites to see how well they did in terms of uptime and performance. After massaging the data coming from our Pingdom probes, it turns out that the sites overall performed well during December 2011 in terms of uptime, but response time was an issue for several sites.

Pingdom Podcast #5

Pingdom’s Mobile Podcast is a weekly show about Internet, web, and mobile stuff.

In this show, Saleh also gives us an update on the pending submission of his Carbon for Windows Phone Twitter client. We’re also joined by Mario Lurig, who talks about using Amazon S3 and CloudFront to speed up a website.
