
In regard to the recent outages

In less than a week, we’ve experienced two outages. Combined, these two outages have been the worst we have had since the company was founded in 2007.

We want to take this opportunity to update you on the situation and tell you where we go from here. All our core websites and services are currently up and running as they should be. This includes the monitoring you have set up for your websites, alerts, our API, and so on.

The only services not yet restarted are the non-core Ping and Traceroute services, parts of our Tools, and Report Banners.

When the most recent outage started yesterday, we were of course alerted right away through the various ways we monitor our own services. Despite this, the severity of the incident and the challenges we faced meant that it took longer than anticipated to address the problems. During both incidents, we have worked closely with consultants and suppliers.

We are in the website monitoring business, and we think you will agree that everyone will, at some point, be struck by unscheduled downtime. To put things in perspective, our systems handle around 280,000 customer accounts and store almost 300 million monitoring results each day. We invest millions every year in hardware, software, services, and more to provide the best possible monitoring solution available today. Unfortunately, even with the best of intentions and the most thorough preparations, things sometimes go wrong.

We’re not trying to play the blame game, but we want to be as upfront as we can with you, our customers. Next week we will present a detailed plan for how we will fix the points of failure we have today, including a specific timeline for when each part will be done. After that, we will keep you continuously updated on our progress.

While the long-term fixes are being implemented, we have already put measures in place that we hope will prevent a similar incident from happening in the meantime. We expect to have completed the first step of the coming changes and updates within 5-10 days from now.

Everyone at Pingdom is now dedicated to fixing these issues, and all other development has been put on hold. Even after what has happened, we’re very proud of the services we provide, and we’ll work exclusively on making our systems as reliable as they can possibly be.

Even though this is just the start of what will be an intensive process for us, we’d like to think that we learn from our mistakes. This is our chance to prove to you, our customers, that we do just that.

If you have any questions or concerns, please send an email to support@pingdom.com.

As always, it would be very helpful if you could provide as much information as possible when you contact us, including your account information and which checks are affected. Wherever possible, a screenshot of the exact issue will also help us help you faster.



3 Comments

Thanks for the suggestion, Dan; that’s certainly something that’s in our plans.

I’ve heard nothing in Pingdom’s replies here to indicate they agree their silence was unacceptable. In fact the paragraph containing “we’re not trying to blame someone else” did exactly that…we’re not unintelligent individuals. Bad things happen, being honest/communicative about them and making sure that permutation never happens again is all I expect of my vendors.
 
I opened a support ticket on Friday when it was clear they were having issues… No response to that ticket and it’s now Tuesday. Not so much as a customer-wide mass mailing admitting the issue and its nature (much less an RFO with a resolution). We moved to Pingdom from Alertra because it looked like you got more for your money. Guess not… in 6 years of using Alertra for the same purpose (and check types) we had not one false positive or critical outage. In 2 years of using Pingdom this is the 5th significant issue. Short of a public mea culpa indicating Pingdom values its customers, we will be moving again…
 
While we’re on the subject of valuing your customers… if you’re generating enough revenue to justify millions of $$ in “hardware, software, and services”, why the heck don’t you spend some of that on a couple of guys to provide 24-hour email support? The lack of it is just another indication that all that is valued about Pingdom’s customers is their payments.

@jasonjwwilliams Thanks for the input, Jason; we do appreciate it. We posted this comment on Facebook in response to comments there (you can also see it below), but we’ll add it here too since it applies:
 
“This incident was certainly a learning experience for all of us at Pingdom, when it comes to technology as well as in other ways, including communications. We have taken everything that our customers have said to heart and will do better in the future; that much we can promise. As we said in the blog post a few days ago, this week we will publish our plan for how we are getting to grips with the points of failure that led to the incident this past weekend. Please keep the comments coming: everything you’ve said is greatly appreciated, and we *will* do better.”
 
Regarding increasing support, that’s something that is in the plans for the near future.
 
About your email to support, if you could give us the ticket number, we’ll follow up on that.