Royal Pingdom

Major Google App Engine hiccup reveals weaknesses

Google’s App Engine suffered increased data access latency and errors yesterday, including problems serving applications. According to TechCrunch, the problems lasted approximately six hours.

From the App Engine status page:

On July 2nd, all applications experienced increased error rate and latency with read and write Datastore and memcache operations, as well as some serving errors. Datastore access and serving have been fully restored as of 12:25 PM PDT.

According to a longer explanation the App Engine team posted later, the problem was apparently caused by an issue with GFS (the Google File System) in one of App Engine’s datacenters. This in turn broke Bigtable, which App Engine’s Datastore depends on, and also caused the application serving problems (in plain English: actual application downtime, not just slowdown).

The increased latency is clearly visible in the graphs on the status page.

In Google’s defense, system problems like this do happen, and App Engine is no exception. Still, six hours is a significant period of time and is sure to have been very frustrating to those who host their applications on the service (and equally frustrating to the people trying to use those applications!).

The cloud weakness

Whenever something like this happens, it reveals one of the big drawbacks of cloud computing: a cloud computing service becomes a single point of failure for every application that depends on it, so any downtime of the service has a wide impact.

From many companies’ perspective, another drawback is having to rely on an external service for something that may be business-critical, but that is a discussion for another day.

But now on to something that really surprised us:

The App Engine status page went missing in action

The App Engine issue revealed a weakness in the way Google has set up its system status page.

It’s good that they have one, but while yesterday’s problems lasted, the App Engine status page was unavailable, something that surprised us here at Pingdom a great deal.

Why? Because normally it’s good practice to have the status page for a service completely separated from the service’s infrastructure and placed in a different datacenter, which apparently wasn’t the case here. Considering the scale of Google’s operation, we’re not sure why Google hasn’t done this.

Here is what Google had to say about it:

Many users noted that the System Status site was also down. The System Status site is hosted separately from App Engine applications, and is not typically affected by availability problems. However, due to the low level problem with GFS in this case, the System Status site was also affected.

The team ended up posting status updates via Google Groups and Twitter.

Note that Google says the status page is “hosted separately from App Engine applications,” which is good, but it also shouldn’t depend on any of the infrastructure that App Engine uses. Google really should think about adding another level of separation.
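The underlying point is that two services are only independent if they share nothing anywhere down their dependency chains, not just at the top level. The following is a minimal Python sketch of that check, assuming a hand-written, purely hypothetical dependency map loosely modeled on the chain Google described (GFS underneath Bigtable underneath the Datastore). The component names are illustrative and do not describe Google’s actual architecture. It computes each component’s transitive dependencies and reports which ones a status page shares with the applications it reports on.

```python
from collections import deque

# Hypothetical dependency map: each component lists what it directly relies on.
# Names are illustrative only, not Google's real architecture.
DEPENDS_ON = {
    "app": ["datastore", "serving"],
    "datastore": ["bigtable"],
    "bigtable": ["gfs_dc1"],
    "serving": ["gfs_dc1"],
    # "Hosted separately" from the apps, but still on the same file system:
    "status_page": ["gfs_dc1"],
    # A status page on fully separate infrastructure in another datacenter:
    "status_page_separate": ["storage_dc2"],
    "gfs_dc1": [],
    "storage_dc2": [],
}


def all_dependencies(component, graph):
    """Return every component reachable from `component` (transitive closure)."""
    seen = set()
    queue = deque(graph.get(component, []))
    while queue:
        dep = queue.popleft()
        if dep not in seen:
            seen.add(dep)
            queue.extend(graph.get(dep, []))
    return seen


def shared_failure_points(a, b, graph):
    """Components whose failure would take down both `a` and `b`."""
    return all_dependencies(a, graph) & all_dependencies(b, graph)


print(sorted(shared_failure_points("app", "status_page", DEPENDS_ON)))           # ['gfs_dc1']
print(sorted(shared_failure_points("app", "status_page_separate", DEPENDS_ON)))  # []
```

In this toy model, the status page that is merely separate from the applications still shares the file system layer with them, so a low-level failure there takes both down, while the status page on its own infrastructure shares nothing.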
