Bug temporarily affected monitoring for a portion of our customers today

Today, the Pingdom team deployed a software upgrade to some of our monitoring probes. Despite thorough testing, this upgrade contained a malfunction that led to false down alerts being sent to a portion of our customers.

Even though the issue affected monitoring for less than 90 minutes and only for a limited number of customers, it’s of course frustrating if you were one of them. We take a lot of pride in delivering a reliable service, and this doesn’t represent what Pingdom stands for.

Let us first stress how rare it is that something like this happens at Pingdom. In fact, this is the first time an incident of this kind has struck us. That said, we want to take this opportunity to explain what happened, present the actions we’ve already taken, and tell you how we will move forward.

Our normal deployment of new and updated software includes a series of tests designed to make sure that our systems are reliable. This means that we roll out updates gradually to our infrastructure, and only after they’ve been thoroughly tested in our development and staging environments.

Today at around 8 am GMT we started to gradually roll out the update to a few selected monitoring probes. We immediately saw that there was an issue with the code and rolled the update back. Unfortunately, a limited number of customers had false downtime recorded in their data and, in some cases, also received false down alerts for a limited time.

After a thorough investigation we’ve already initiated actions to minimize the effect this may have had, including:

  • Affected Pingdom checks will have their up and down records marked as unmonitored for the period in question, up to a maximum of 90 minutes. Therefore, each site’s uptime record will not be affected. In other words, your uptime percentage will not change due to this incident.
  • Any SMS credits lost due to incorrect alerts in connection with this issue have been refunded. You will receive double the number of credits that were used during the incident.
  • We will take further steps to make sure that future upgrades to our infrastructure will be implemented with even more caution. This incident has already led to improvements in our deployment routines.
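
To illustrate why marking the affected period as unmonitored leaves the uptime percentage unchanged, here is a minimal sketch. It assumes uptime is calculated as up time divided by monitored time (up plus down), so unmonitored minutes are left out of both parts of the fraction. The `Interval` type, the `uptime_percentage` function and the sample figures are hypothetical and are not taken from Pingdom's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Interval:
    """A stretch of time for one check (hypothetical structure)."""
    minutes: float
    status: str  # "up", "down", or "unmonitored"


def uptime_percentage(intervals: List[Interval]) -> Optional[float]:
    """Return uptime as a percentage of monitored time only.

    Assumption: unmonitored minutes are excluded from both the
    numerator and the denominator.
    """
    up = sum(i.minutes for i in intervals if i.status == "up")
    down = sum(i.minutes for i in intervals if i.status == "down")
    monitored = up + down
    if monitored == 0:
        return None  # nothing was monitored, so uptime is undefined
    return 100.0 * up / monitored


# One day (1440 minutes) containing a 90-minute false downtime.
day_with_false_downtime = [
    Interval(600, "up"),
    Interval(90, "down"),
    Interval(750, "up"),
]

# The same day after the 90 minutes are re-marked as unmonitored.
day_after_correction = [
    Interval(600, "up"),
    Interval(90, "unmonitored"),
    Interval(750, "up"),
]

print(uptime_percentage(day_with_false_downtime))  # 93.75
print(uptime_percentage(day_after_correction))     # 100.0
```

Under this assumption, the re-marked period neither counts against the site nor inflates its record; it is simply ignored.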

We want you to rest assured that all of us working at Pingdom take significant pride in delivering the best possible service, and even though mistakes happen they are not acceptable to us.

If you were affected by this, we’re really sorry. You can be sure that someone will be wearing the stupid hat today.

Please contact us at support@pingdom.com if you have any questions or comments.



10 comments
minhwanl

@pingdom It's better to not receive any alerts than to receive fake ones... however, we didn't receive any alerts and it's good to know.

pkclark

@pingdom We're gonna need photos of the guilty party wearing the "stupid hat". Nice proactive response BTW.

evanculver

@codysoyland @webology it definitely feels like there should be a better, more simple solution to this problem. #its2012

ZestCRM

Do we get to see a picture of the individual in the stupid hat?

Starefossen

@pingdom thanks for being so up-front about this and coming clean!

jonpott

@pingdom thanks for being up-front about it; now just send me a few hours' sleep in the mail and we'll call it even. ;-)

Mikel Bullis

Thank you for the notification. That was a crazy sleepless night, but glad to confirm it wasn't our cluster of servers. Thanks for all you do, you guys are awesome!