Google Apps SLA loophole allows for major downtime without consequences
Gmail could be unavailable for more than 21 hours in a day, and Google could still tell you that according to their SLA, the service has had 100% uptime.
It sounds impossible, but it’s a direct consequence of how Google has written its SLA for Google Apps (which includes Gmail, Google Docs, Google Calendar and more). We will explain this in detail further down, but let’s first look at what the SLA actually says.
What the Google Apps SLA says
Here are the key parts, quoted from the Google Apps SLA:
“Downtime Period” means, for a domain, a period of ten consecutive minutes of Downtime. Intermittent Downtime for a period of less than ten minutes will not be counted towards any Downtime Periods.
“Monthly Uptime Percentage” means total number of minutes in a calendar month minus the number of minutes of Downtime suffered from all Downtime Periods in a calendar month, divided by the total number of minutes in a calendar month.
So “Downtime Periods” are what ultimately determine the uptime percentage for Google Apps, and they ignore any downtime that lasts less than 10 consecutive minutes.
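To make the counting rule concrete, here is a minimal Python sketch of the Monthly Uptime Percentage as we read the definitions above. It assumes minute-level granularity (one True/False sample per minute, True meaning "down") and, like the examples in this post, counts an outage in full once it reaches 10 consecutive minutes. The quoted text doesn't spell out whether, say, a 25-minute outage counts as 25 minutes or as two 10-minute periods, so treat that part as our assumption. The function names are our own illustration, not anything Google publishes.

```python
MIN_PERIOD = 10  # consecutive minutes of downtime needed for a "Downtime Period" (our reading)


def outage_runs(timeline):
    """Return the lengths (in minutes) of consecutive 'down' runs.

    timeline: sequence of booleans, one per minute, True = service down.
    """
    runs, current = [], 0
    for down in timeline:
        if down:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:  # timeline ended while the service was still down
        runs.append(current)
    return runs


def monthly_uptime_percentage(timeline):
    """Uptime % that counts only outages lasting MIN_PERIOD minutes or more."""
    total = len(timeline)
    counted = sum(r for r in outage_runs(timeline) if r >= MIN_PERIOD)
    return 100.0 * (total - counted) / total
```

For example, a 100-minute timeline with one 15-minute outage and one 9-minute outage comes out at 85% uptime: the 9-minute outage simply disappears from the calculation.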
A worst-case scenario
Now back to our initial statement. How can the SLA allow more than 21 hours of downtime in a day while Google still calls it 100% uptime?
Here is the problem: what if Google Apps were down for 9 minutes, up for 1 minute, down for another 9 minutes, and so on? That would mean 54 minutes of downtime every hour, yet Google still wouldn’t count any of it, because no single outage lasted 10 minutes or more.
Over a day (24 hours), that’s 54 × 24 = 1,296 minutes, or 21 hours and 36 minutes of downtime that Google would simply ignore when calculating the final uptime percentage.
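The worst case is easy to reproduce. This sketch builds a hypothetical 24-hour timeline from the 9-minutes-down, 1-minute-up pattern described above and compares the actual downtime with what a 10-minute threshold would count. It assumes one sample per minute and our reading of the SLA; `outage_lengths` is just an illustrative helper.

```python
from itertools import groupby

MINUTES_PER_DAY = 24 * 60  # 1,440
MIN_PERIOD = 10            # minimum outage length that counts under the SLA (our reading)


def outage_lengths(timeline):
    """Lengths of consecutive 'down' runs in a per-minute timeline (True = down)."""
    return [len(list(run)) for down, run in groupby(timeline) if down]


# Hypothetical pattern: 9 minutes down, then 1 minute up, repeated all day.
cycle = [True] * 9 + [False]
day = cycle * (MINUTES_PER_DAY // len(cycle))  # 144 repetitions

actual_down = sum(day)  # every minute the service was down
counted_down = sum(n for n in outage_lengths(day) if n >= MIN_PERIOD)

hours, minutes = divmod(actual_down, 60)
print(f"Actual downtime:  {hours} h {minutes} min")  # Actual downtime:  21 h 36 min
print(f"Counted downtime: {counted_down} min")       # Counted downtime: 0 min
print(f"Reported uptime:  {100 * (MINUTES_PER_DAY - counted_down) / MINUTES_PER_DAY}%")  # Reported uptime:  100.0%
```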
Above: Red is downtime, green is uptime. Note that none of the outages above lasts 10 minutes or more, so none of them is counted according to the Google Apps SLA.
It’s an extreme and very unlikely worst-case scenario, but we wanted to illustrate the consequence of how Google’s SLA sums up its downtime and calculates its uptime percentage.
A more likely scenario
Now let’s take a much more likely example of intermittent problems:
3m, 8m, 12m, 5m, 9m, 14m, 4m = 55 minutes of actual downtime
But Google would count only 26 minutes of this downtime: the two outages lasting 12 and 14 minutes.
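Here is the same threshold applied to the outage durations in this example, as a short sketch (the durations come from the scenario; the 10-minute cutoff is our reading of the SLA):

```python
MIN_PERIOD = 10  # outages shorter than this are ignored (our reading of the SLA)

# Outage durations in minutes, from the example scenario.
outages = [3, 8, 12, 5, 9, 14, 4]

actual = sum(outages)
counted = sum(d for d in outages if d >= MIN_PERIOD)
ignored = [d for d in outages if d < MIN_PERIOD]

print(f"Actual downtime:  {actual} min")   # Actual downtime:  55 min
print(f"Counted downtime: {counted} min")  # Counted downtime: 26 min
print(f"Ignored outages:  {ignored}")      # Ignored outages:  [3, 8, 5, 9, 4]
```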
Above: In this scenario, only the downtime periods lasting 12 and 14 minutes (marked with yellow) would be counted according to the Google Apps SLA.
Short outages are common in the real world
The problem with the Google Apps SLA is that short outages, less than 10 minutes in length, are actually very common in the real world.
As you may know, we here at Pingdom run an uptime monitoring service, and we know from our own experience (and a LOT of data from thousands of sites) that it’s much more common for sites to have multiple short intervals of downtime than a few long ones.
The 99.9% monthly uptime guarantee in the Google Apps SLA allows for about 43 minutes of downtime in a 30-day month (0.1% of 43,200 minutes), but ignoring problems that last less than 10 minutes at a time will definitely make it much easier for Google to honor its SLA.
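To show where the 43-minute figure comes from and why the threshold matters for it, this sketch computes the downtime budget that a 99.9% guarantee leaves in a 30-day month, then checks a hypothetical month with twenty 9-minute outages, with and without the 10-minute rule. The `meets_sla` helper and its `ignore_short` flag are purely illustrative, and the month is made up for the example.

```python
from fractions import Fraction

MONTH_MINUTES = 30 * 24 * 60     # 43,200 minutes in a 30-day month
GUARANTEE = Fraction(999, 1000)  # 99.9% monthly uptime
MIN_PERIOD = 10                  # outages shorter than this are ignored (our reading)

# Downtime allowed before the guarantee is missed: 0.1% of the month.
budget = MONTH_MINUTES * (1 - GUARANTEE)
print(f"Downtime budget: {float(budget)} min")  # Downtime budget: 43.2 min


def meets_sla(outages, ignore_short=True):
    """Hypothetical check: is the counted downtime within the monthly budget?"""
    counted = sum(d for d in outages if d >= MIN_PERIOD or not ignore_short)
    return counted <= budget


# Hypothetical month: twenty separate 9-minute outages (180 minutes in total).
month = [9] * 20
print(meets_sla(month))                      # True  (nothing is counted)
print(meets_sla(month, ignore_short=False))  # False (180 > 43.2)
```

Under the 10-minute rule that hypothetical month passes with 100% reported uptime; counting every minute, its 180 minutes of downtime would be more than four times the budget.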