Synthetic Monitoring

Simulate visitor interaction with your site to monitor the end user experience.

View Product Info

FEATURES

Simulate visitor interaction

Identify bottlenecks and speed up your website.

Learn More

Real User Monitoring

Enhance your site performance with data from actual site visitors

View Product Info

FEATURES

Real user insights in real time

Know how your site or web app is performing with real user insights

Learn More

Infrastructure Monitoring Powered by SolarWinds AppOptics

Instant visibility into servers, virtual hosts, and containerized environments

View Infrastructure Monitoring Info
Comprehensive set of turnkey infrastructure integrations

Including dozens of AWS and Azure services, container orchestrations like Docker and Kubernetes, and more 

Learn More

Application Performance Monitoring Powered by SolarWinds AppOptics

Comprehensive, full-stack visibility, and troubleshooting

View Application Performance Monitoring Info
Complete visibility into application issues

Pinpoint the root cause down to a poor-performing line of code

Learn More

Log Management and Analytics Powered by SolarWinds Loggly

Integrated, cost-effective, hosted, and scalable full-stack, multi-source log management

 View Log Management and Analytics Info
Collect, search, and analyze log data

Quickly jump into the relevant logs to accelerate troubleshooting

Learn More

Major Google App Engine hiccup reveals weaknesses

Google’s App Engine suffered from increased data access latency and errors yesterday, including problems serving applications. According to TechCrunch, the problems lasted for approximately six hours.
From the App Engine status page:

On July 2nd, all applications experienced increased error rate and latency with read and write Datastore and memcache operations, as well as some serving errors. Datastore access and serving have been fully restored as of 12:25 PM PDT.

In a longer explanation posted later by the App Engine team, the problem was apparently due to an issue with GFS (the Google File System) in one of App Engine’s datacenters. This in turn broke Bigtable, which App Engine’s Datastore depends on and also caused the application serving problems (in plain English: causing actual application downtime, not just slowdown).
The increased latency is clearly visible on the status page:

In Google’s defense, system problems like this do happen, and App Engine is no exception. Still, six hours is a significant period of time and is sure to have been very frustrating to those who host their applications on the service (and equally frustrating to the people trying to use those applications!).

The cloud weakness

Whenever something like this happens, it clearly reveals one of the big drawbacks with cloud computing: Cloud computing services become the single point of failure for all applications depending on that service. Therefore any service downtime will have a wide impact.
From many companies’ perspective another drawback is also relying on an external service for something that may be business critical, but that is a discussion for another day.
But now on to something that really surprised us:

The App Engine status page went missing in action

The App Engine issue revealed a weakness in the way Google has set up its system status page.
It’s good that they have one, but while yesterday’s problems lasted, the App Engine status page was unavailable, something that surprised us here at Pingdom a great deal.
Why? Because normally it’s good practice to have the status page for a service completely separated from the service’s infrastructure and placed in a different datacenter, which apparently wasn’t the case here. Considering the scale of Google’s operation, we’re not sure why Google hasn’t done this.
Here is what Google had to say about it:

Many users noted that the System Status site was also down. The System Status site is hosted separately from App Engine applications, and is not typically affected by availability problems. However, due to the low level problem with GFS in this case, the System Status site was also affected.

The team ended up posting status updates via Google Groups and Twitter.
Note that they said that the status page is “hosted separately from App Engine applications,” which is good, but it shouldn’t depend on any of the infrastructure that the App Engine uses. Google really should think about adding another level of separation.

Introduction to Observability

These days, systems and applications evolve at a rapid pace. This makes analyzi [...]

Webpages Are Getting Larger Every Year, and Here’s Why it Matters

Last updated: February 29, 2024 Average size of a webpage matters because it [...]

A Beginner’s Guide to Using CDNs

Last updated: February 28, 2024 Websites have become larger and more complex [...]

The Five Most Common HTTP Errors According to Google

Last updated: February 28, 2024 Sometimes when you try to visit a web page, [...]

Page Load Time vs. Response Time – What Is the Difference?

Last updated: February 28, 2024 Page load time and response time are key met [...]

Monitor your website’s uptime and performance

With Pingdom's website monitoring you are always the first to know when your site is in trouble, and as a result you are making the Internet faster and more reliable. Nice, huh?

START YOUR FREE 30-DAY TRIAL

MONITOR YOUR WEB APPLICATION PERFORMANCE

Gain availability and performance insights with Pingdom – a comprehensive web application performance and digital experience monitoring tool.

START YOUR FREE 30-DAY TRIAL
Start monitoring for free