Pingdom Blog

Royal Pingdom

Ramblings from the Pingdom team about the Internet and web tech

Major Google App Engine hiccup reveals weaknesses

Google’s App Engine suffered from increased data access latency and errors yesterday, including problems serving applications. According to TechCrunch, the problems lasted for approximately six hours.

From the App Engine status page:

On July 2nd, all applications experienced increased error rate and latency with read and write Datastore and memcache operations, as well as some serving errors. Datastore access and serving have been fully restored as of 12:25 PM PDT.

According to a longer explanation posted later by the App Engine team, the problem was apparently caused by an issue with GFS (the Google File System) in one of App Engine’s datacenters. This in turn broke Bigtable, which App Engine’s Datastore depends on, and also caused the application serving problems (in plain English: actual application downtime, not just slowdown).

The increased latency is clearly visible on the status page:

In Google’s defense, system problems like this do happen, and App Engine is no exception. Still, six hours is a significant period of time and is sure to have been very frustrating to those who host their applications on the service (and equally frustrating to the people trying to use those applications!).

The cloud weakness

Whenever something like this happens, it reveals one of the big drawbacks of cloud computing: the cloud service becomes a single point of failure for every application that depends on it, so any downtime in the service has a wide impact.

From many companies’ perspective, another drawback is having to rely on an external service for something that may be business critical, but that is a discussion for another day.

But now on to something that really surprised us:

The App Engine status page went missing in action

The App Engine issue revealed a weakness in the way Google has set up its system status page.

It’s good that they have one, but the App Engine status page was unavailable for as long as yesterday’s problems lasted, which surprised us here at Pingdom a great deal.

Why? Because normally it’s good practice to have the status page for a service completely separated from the service’s infrastructure and placed in a different datacenter, which apparently wasn’t the case here. Considering the scale of Google’s operation, we’re not sure why Google hasn’t done this.

Here is what Google had to say about it:

Many users noted that the System Status site was also down. The System Status site is hosted separately from App Engine applications, and is not typically affected by availability problems. However, due to the low level problem with GFS in this case, the System Status site was also affected.

The team ended up posting status updates via Google Groups and Twitter.

Note that they said the status page is “hosted separately from App Engine applications,” which is good, but it shouldn’t depend on any of the infrastructure that App Engine uses. Google really should think about adding another level of separation.
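To make the point about separation concrete, here is a minimal Python sketch that checks whether a status page shares any underlying infrastructure with the service it reports on. The dependency map is our own illustration: the GFS, Bigtable and Datastore chain follows Google's description above, but the component names, the exact edges, and the "separate_status_page" and "external_hosting" entries are assumptions and not Google's actual architecture.

```python
# Hypothetical dependency map: each component lists what it depends on directly.
# Names and edges are illustrative assumptions, loosely based on the chain
# GFS -> Bigtable -> Datastore described in Google's explanation.
DEPENDS_ON = {
    "app_serving": {"datastore"},
    "datastore": {"bigtable"},
    "bigtable": {"gfs"},
    "gfs": set(),
    "status_page": {"gfs"},                        # hosted separately, but still on GFS
    "separate_status_page": {"external_hosting"},  # hypothetical fully separated setup
    "external_hosting": set(),
}


def transitive_dependencies(component, graph):
    """Return everything `component` depends on, directly or indirectly."""
    seen = set()
    stack = list(graph.get(component, ()))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph.get(dep, ()))
    return seen


def shared_dependencies(a, b, graph):
    """Return the infrastructure that both `a` and `b` ultimately rely on."""
    return transitive_dependencies(a, graph) & transitive_dependencies(b, graph)


print(shared_dependencies("app_serving", "status_page", DEPENDS_ON))
# prints {'gfs'}
print(shared_dependencies("app_serving", "separate_status_page", DEPENDS_ON))
# prints set()
```

Any non-empty result means a single failure can take down both the service and the page that is supposed to report on it, which is roughly what appears to have happened here.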


One Comment

I am trying to imagine how some people might have been frustrated by that
