Google Apps SLA loophole allows for major downtime without consequences
Gmail could be unavailable for more than 21 hours in a day, and Google could still tell you that according to their SLA, the service has had 100% uptime.
It sounds impossible, but it’s a direct consequence of how Google has written its SLA for Google Apps (which includes Gmail, Google Docs, Google Calendar and more). We will explain this in detail further down, but let’s first look at what the SLA actually says.
What the Google Apps SLA says
Here are the key parts, quoted from the Google Apps SLA, emphasis added by us:
“Downtime Period” means, for a domain, a period of ten consecutive minutes of Downtime. Intermittent Downtime for a period of less than ten minutes will not be counted towards any Downtime Periods.
[...]
“Monthly Uptime Percentage” means total number of minutes in a calendar month minus the number of minutes of Downtime suffered from all Downtime Periods in a calendar month, divided by the total number of minutes in a calendar month.
So, “downtime periods” are what’s ultimately used for counting the uptime percentage for Google Apps, and these downtime periods ignore all downtime that lasts less than 10 minutes.
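To make the two quoted definitions concrete, here is a minimal Python sketch of the calculation as we read it. The function names are our own, and we assume that an outage of 10 minutes or more is counted in full while anything shorter is counted as zero.

```python
# Sketch of the SLA arithmetic as interpreted in this article.
# Function names are illustrative; they do not come from Google.

MIN_PERIOD_MINUTES = 10  # threshold from the quoted "Downtime Period" definition


def counted_downtime(outages_minutes):
    """Sum only the outages long enough to qualify as a Downtime Period."""
    return sum(m for m in outages_minutes if m >= MIN_PERIOD_MINUTES)


def monthly_uptime_percentage(outages_minutes, days_in_month=30):
    """(total minutes - counted downtime) / total minutes, as a percentage."""
    total = days_in_month * 24 * 60
    return 100 * (total - counted_downtime(outages_minutes)) / total


print(monthly_uptime_percentage([9, 9, 9]))             # 100.0
print(round(monthly_uptime_percentage([9, 10, 9]), 3))  # 99.977
```

In the second call, only the 10-minute outage reduces the percentage; the two 9-minute outages leave it untouched.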
A worst-case scenario
Now back to our initial statement. How does the SLA make it possible to have more than 21 hours of downtime in a day and yet Google would call it 100% uptime?
Here is the problem: what if Google Apps were down for 9 minutes, up for 1 minute, down for 9 minutes, and so on? That would mean 54 minutes of downtime each hour, but Google still wouldn’t count it because none of the individual outages lasted 10 minutes or more.
Over a day (24 hours), that’s 21 hours and 36 minutes of downtime that Google would simply ignore when calculating the final uptime percentage.
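The worst case can be checked with a small minute-by-minute simulation. This is only a sketch under our reading of the SLA; the helper names `build_day` and `downtime_runs` are hypothetical.

```python
from itertools import groupby


def build_day(down=9, up=1, minutes=24 * 60):
    """Return one boolean per minute: False = down, True = up."""
    cycle = [False] * down + [True] * up
    return [cycle[i % len(cycle)] for i in range(minutes)]


def downtime_runs(timeline):
    """Lengths, in minutes, of each consecutive stretch of downtime."""
    return [len(list(group)) for is_up, group in groupby(timeline) if not is_up]


runs = downtime_runs(build_day())
actual = sum(runs)                          # every minute of downtime
counted = sum(r for r in runs if r >= 10)   # only runs of 10+ minutes

print(actual, divmod(actual, 60), counted)  # 1296 (21, 36) 0
```

All 144 outages in the simulated day are 9 minutes long, so none reaches the 10-minute threshold and the counted downtime stays at zero.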

Above: Red is downtime, green is uptime. Note that none of the outages above lasts 10 minutes or more, so none of them is counted according to the Google Apps SLA.
It’s an extreme and very unlikely worst-case scenario, but we wanted to illustrate the consequence of how Google’s SLA sums up its downtime and calculates its uptime percentage.
A more likely scenario
Now let’s take a much more likely example of intermittent problems:
3m, 8m, 12m, 5m, 9m, 14m, 4m = 55 minutes of actual downtime
But Google would only count this as 26 minutes of downtime, including only the downtime periods lasting 12 and 14 minutes.

Above: In this scenario, only the downtime periods lasting 12 and 14 minutes (marked with yellow) would be counted according to the Google Apps SLA.
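The same filter applied to the seven outages listed in this scenario, as a short Python sketch (the variable names are ours):

```python
# Outage durations in minutes, taken from the scenario above.
outages = [3, 8, 12, 5, 9, 14, 4]

# Only outages of at least 10 minutes qualify under our reading of the SLA.
counted = [m for m in outages if m >= 10]

print(sum(outages))           # 55
print(counted, sum(counted))  # [12, 14] 26
```

Summing the listed durations directly gives the actual downtime, while the counted figure covers only the 12-minute and 14-minute outages.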
Short outages are common in the real world
The problem with the Google Apps SLA is that short outages, less than 10 minutes in length, are actually very common in the real world.
As you may know, we here at Pingdom run an uptime monitoring service, and we know from our own experience (and a LOT of data from thousands of sites) that it’s much more common for sites to have multiple short intervals of downtime instead of a few long ones.
The 99.9% monthly uptime guarantee in the Google Apps SLA allows for 43 minutes of downtime in a 30-day month, but ignoring problems that last less than 10 minutes at a time will definitely make it much easier for Google to honor its SLA.
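The 43-minute figure follows from simple arithmetic, sketched below in Python. The function name `allowed_downtime_minutes` is our own, and results are rounded to two decimals to sidestep floating-point noise.

```python
def allowed_downtime_minutes(days_in_month, guarantee_percent=99.9):
    """Minutes of counted downtime a month can absorb before the guarantee is missed."""
    total_minutes = days_in_month * 24 * 60
    return round(total_minutes * (100 - guarantee_percent) / 100, 2)


for days in (28, 30, 31):
    print(days, allowed_downtime_minutes(days))
# 28 40.32
# 30 43.2
# 31 44.64
```

Because only outages of 10 minutes or more count against this budget, a month of frequent short outages would never use any of it.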



Jason
January 6th, 2009 at 3:55 pm
Actually it is even more interesting, if they monitor by the second, which I’m sure they do.
9 Mins and 59 seconds would not count as downtime, so you can have 9:59 mins down and 1 second up, which would give you basically about 2 minutes of uptime daily.