How Google collects data about you and the Internet

We are watching you

Google has, perhaps more than any other company, realized that information is power. Information about the Internet, information about innumerable trends, and information about its users, YOU.

So how much does Google know about you and your online habits? It’s only when you sit down and actually start listing all of the various Google services you use on a regular basis that you begin to realize how much information you’re handing over to Google.

This has, as these things tend to do, given rise to various privacy concerns. It probably didn’t help when Google’s CEO, Eric Schmidt, recently went on the record saying: “If you have something that you don’t want anyone to know, maybe you shouldn’t be doing it in the first place.”

Now let’s have a look at how Google is gathering information from you, and about you.

Google’s information-gathering channels

Google’s stated mission is “to organize the world’s information and make it universally accessible and useful” and it is making good on this promise. However, Google is gathering even more information than most of us realize.

  • Searches (web, images, news, blogs, etc.) – Google is, as you all know, the most popular search engine in the world with a market share of almost 70% (for example, 66% of searches in the US are made on Google). Google tracks all searches, and with search becoming more and more personalized, this information is bound to grow increasingly detailed and user-specific.
  • Clicks on search results – Not only does Google get information on what we search for, it also gets to find out which search results we click on.
  • Web crawling – Googlebot, Google’s web crawler, is a busy bee, continuously reading and indexing billions of web pages.
  • Website analytics – Google Analytics is by far the most popular website analytics package out there. Due to being free and still supporting a number of advanced features, it’s used by a large percentage of the world’s websites.
  • Ad serving – AdWords and AdSense are cornerstones of Google’s financial success, but they also provide Google with a lot of valuable data. Which ads are people clicking on, which keywords are advertisers bidding on, and which ones are worth the most? All of this is useful information.
  • Email – Gmail is one of the three largest email services in the world, together with competing options from Microsoft (Hotmail) and Yahoo. Email content, both sent and received, is parsed and analyzed. Even from a security standpoint this is a great service for Google. Google’s email security service, Postini, gets a huge amount of data about spam, malware and email security trends from the huge mass of Gmail users.
  • Twitter – “All your tweets are belong to us,” to paraphrase an early Internet meme. Google has direct access to all tweets that pass through Twitter after a deal made late last year.
  • Google Apps (Docs, Spreadsheets, Calendar, etc.) – Google’s office suite has many users and is of course a valuable data source to Google.
  • Google Public Profiles – Google encourages you to put a profile about yourself publicly on the Web, including where you can be found on social media sites and your homepage, etc.
  • Orkut – Google’s social network isn’t a success everywhere, but it’s huge in some parts of the world (mainly Brazil and India).
  • Google Public DNS – Google’s newly launched DNS service doesn’t just help people get fast DNS lookups, it helps Google too, because it will get a ton of statistics from this, for example what websites people access.
  • The Google Chrome browser – What is your web browsing behavior? What sites do you visit?
  • Google Finance – Aside from the finance data itself, what users search for and use on Google Finance is sure to be valuable data to Google.
  • YouTube – The world’s largest and most popular video site by far is, as you know, owned by Google. It gives Google a huge amount of information about its users’ viewing habits.
  • Google Translate – Helps Google perfect its natural language parsing and translation.
  • Google Books – Not huge for now, but has the potential to help Google figure out what people are reading and want to read.
  • Google Reader – By far the most popular feed reader in the world. What RSS feeds do you subscribe to? What blog posts do you read? Google will know.
  • Feedburner – Most blogs use Feedburner to publicize their RSS feeds, and every Feedburner link is tracked by Google.
  • Google Maps and Google Earth – What parts of the world are you interested in?
  • Your contact network – Your contacts in Google Talk, Gmail, etc., make up an intricate network of users. And if those contacts also use Google, the network can be mapped even further. We don’t know if Google does this, but the data is there for the taking.
  • Coming soon – Chrome OS, Google Wave, more up-and-coming products from Google.

And the list could go on since there are even more Google products out there, but we think that by now you’ve gotten the gist of it… ;)

Much of this data is anonymized, but not always right away. Logs are kept for nine months, and cookies (for services that use them) aren’t anonymized until after 18 months. Even after that, the sheer amount of generic user data that Google has on its hands is a huge competitive advantage against most other companies, a veritable gold mine.

Google’s unstoppable data collection machine

There are many different aspects of Google’s data collection. The IP addresses requests are made from are logged, cookies are used for settings and tracking purposes, and if you are logged into your Google account, what you do on Google-owned sites can often be coupled to you personally, not just your computer.
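To make the three levels of identification above concrete, here is a minimal sketch of how log records could be grouped by the most specific identifier available: account first, then cookie, then IP address. This is purely illustrative. The record layout, field names and sample values are assumptions made up for this example, and nothing here describes how Google actually stores or processes its logs.

```python
from collections import defaultdict

# Hypothetical log records; the field names and values are assumptions
# made for illustration only, not a real log format.
events = [
    {"ip": "203.0.113.7", "cookie": None, "account": None, "action": "search: weather"},
    {"ip": "203.0.113.7", "cookie": "c-81f2", "account": None, "action": "watch: video 1"},
    {"ip": "198.51.100.4", "cookie": "c-81f2", "account": "alice", "action": "read: feed item"},
]


def identity_key(event):
    """Return the most specific identifier present: account, then cookie, then IP."""
    if event["account"]:
        return ("account", event["account"])
    if event["cookie"]:
        return ("cookie", event["cookie"])
    return ("ip", event["ip"])


def group_by_identity(records):
    """Collect the actions seen under each identity key."""
    grouped = defaultdict(list)
    for record in records:
        grouped[identity_key(record)].append(record["action"])
    return dict(grouped)


profiles = group_by_identity(events)
for key, actions in profiles.items():
    print(key, actions)
```

The point of the sketch is the ordering in `identity_key`: an anonymous request can only be tied to a computer or connection, while a logged-in request can be tied to a person.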

In short, if you use Google services, Google will know what you’re searching for, what websites you visit, what news and blog posts you read, and more. As Google adds more services and its presence gets increasingly widespread, the so-called Googlization (a term coined by John Battelle and Alex Salkever in 2003) of almost everything continues.

The information you give to any single one of Google’s services wouldn’t be much to huff about. The really interesting dilemma comes when you use multiple Google services, and these days, who doesn’t?

Try using the Internet for a week without touching a single one of Google’s services. This means no YouTube, no Gmail, no Google Docs, no clicking on Feedburner links, no Google search, and so on. Strictly, you’d even have to skip services that Google partners with, so, sorry, no Twitter either.

This increasing Googlization is probably why some people won’t want to use Google’s Chrome OS, which will be strongly coupled with multiple Google services and most likely give Google an unprecedented amount of data about your habits.

Why does Google do this?

As we stated in the very first sentence of this article, information is power.

With all this information at its fingertips, Google can group data together in very useful ways. Not just per user or visitor, but Google can also examine trends and behaviors for entire cities or countries.

Google can use the information it collects for a wide array of useful things. In all of the various fields where Google is active, it can make market decisions, research, refine its products, anything, with the help of this collected data.

For example, if you can discover certain market trends early, you can react effectively to the market. You can discover what people are looking for, what people want, and make decisions based on those discoveries. This is of course extremely useful to a large company like Google.

And let’s not forget that Google earns much of its money serving ads. The more Google knows about you, the more effectively it will be able to serve ads to you, which has a direct effect on Google’s bottom line.

It’s not just Google

It should be mentioned that Google isn’t alone in doing this kind of data collection. Rest assured that Microsoft is doing similar things with Bing and Hotmail, to name just one example.

The problem (if you want to call it a problem) with Google is that, like an octopus, its arms are starting to reach almost everywhere. Google has become so mixed up in so many aspects of our online lives that it is getting an unprecedented amount of information about our actions, behavior and affiliations online.

Google, an octopus?

Accessing Google’s data vault

To its credit, Google is making some of its enormous cache of data available to you as well via various services.

  • Google Trends
  • Google Trends for Websites
  • Google Insights for Search
  • Google Ad Planner
  • Various search tools like the Wonder Wheel.

If Google can make that much data publicly available, just imagine the amount of data and the level of detail Google can get access to internally. And ironically, these services give Google even more data, such as what trends we are interested in, what sites we are trying to find information about, and so on.

An interesting observation when using these tools is that in many cases information can be found for everything except for Google’s own products. For example, Ad Planner and Trends for Websites don’t show site statistics for Google sites, but you can find information about any other sites.

No free lunch

Did you ever wonder why almost all of Google’s services are free of charge? Well, now you know. That old saying, “there ain’t no such thing as a free lunch,” still holds true. You may not be paying Google with dollars (aside from clicking on those Google ads), but you are paying with information. That doesn’t have to be a bad thing, but you should be aware of it.

Photo credit: Octopus by Albert Kok.



26 comments
Data Mining Courses

It is extremely disturbing how much data Google has about us. As you listed, it has everything from names and addresses to credit card numbers and what we are doing throughout the day (social media sites).

Isabelle

As a consumer using websites to purchase or create a profile, I hesitate every time about what will happen to my information. Perhaps the best thing is to create a fake profile to use every time, with a PO Box, and use a disposable Visa so no one can get "real" details from you.

Sociale

It's time to wake up and stop publishing private data on the web. This behaviour has to be popularized so that no one will have their private data collected unwittingly.

Xris

Really good and important article. It's nice to see such a broad number of views and thoughtful responses. I'm heartened to see folks take this seriously. My two cents: We should not be afraid of these technologies, but we MUST pay attention. I'm a big Google user. And I watch these matters carefully. I realized many years ago that tracking and data mining are wide-spread, unstoppable, and potentially dangerous. I'm particularly worried about ending up on watch lists by accident with no knowledgeable human intervention. Google was the first player (I'm aware of) to address these concerns head-on and openly. Therefore I support Google as a model of good practice in a world of far more nefarious players. But I also realize we must watch them carefully. Google certainly is not Little Red Riding Hood. But I don't see them as 'the wolf' either. They do at least maintain some humanity in potentially damaging processes that machines are all too often allowed to run without human oversight. It's up to each and every individual citizen to stay informed. And we must also undertake to enlighten our friends and family to today's common practices and the potential dangers of 'blind faith' in the system. Not to scare anyone, but to open their eyes to the reality of life in cyberspace. Thanks again for the clear and concise report. Xris

taikan

All of the earlier comments seemed to focus only on what Google might do with the information it collects about each of us. Even though the federal government is prohibited from collecting similar information, there is nothing to prevent it from getting that information from Google by means of an administrative subpoena or, if the government can persuade a judge, search warrant.

rockey16

OK, we all know information is very valuable. The more information Google has about us, the better it can profile us and in turn target ads that are specifically tailored towards us. In this century it's impossible to stay off the Internet. And avoiding Google is not the correct solution either. If not Google, someone else will collect the data; this is the trend we are heading towards. But we all are entitled to know what else is going on with our personal data. Are the laws regulating the Internet strong enough to stop someone from abusing us with the data they have at hand? We all are entitled to private space.

Dave

Wow -And Google still can't show my business in the right location on its maps, despite numerous requests for corrections. Boycott Google as best you can.

Anon

use Scroogle Scraper, go to Scroogle.org. It doesn't track you.

Bob Gilley

“If you have something that you don’t want anyone to know, maybe you shouldn’t be doing it in the first place.” Welcome to the "Brave New World" in which freedom and choice have become quaint relics of past democratic idealists. Big brother has been watching for decades.

sars rocks

Google's split personality is showing: Google of old had a persona of "do no harm"; the new persona is one of "I want control". Example: Google now seems to be using its following to collect cell phone numbers. Recently I needed to access my Google maps and found that I needed a Gmail account. When I tried to get one I found that I had to give them a cell phone number for texting me the access. Well, I don't have a cell phone with texting capabilities and wouldn't give the number to anyone if I did. The suggested workaround of using a friend's phone was even worse. I wouldn't give even my best friend (and Google isn't one) access to my personal information. I couldn't even give any feedback to Google because the only way to contact them seems to be via a Google group and, you guessed it, you need a Gmail account to participate. I may be suffering from paranoia, but why, other than for some yet to be defined commercial exploitation, would Google want my personal information? I guess Google is taking the attitude that there is no more free lunch (as is their right) without giving them a marketing channel direct to your phone. I will not be using Google for any services unless I can get my investment in My Maps back. Sorry Google: You can't have my "EVERYTHING"

Bob Hazard

I for one welcome our new Google overlords. I read what you say, and it all makes sense. It is not some conspiracy theory, but for some reason I feel comfortable with Google, whereas I have serious worries over Microsoft and Facebook. Whether it is their support for open source and open web standards, or just the fact that they are pretty upfront about their model of being an advertising company that gives everything else away, for some irrational reason I gave my digital life over to them. Fear of the HD ticking in the background drove me away from Windows to Linux many years ago: what is it doing? Is it phoning home, or is it just a virus? I should feel the same about Google, but I don't. I guess it is because I know Google doesn't care about me, just about trends on a large scale, whereas MS is looking for DRM licences on my disk, etc.

Albin

The article leaves out (or I missed it) reference to Google's new privacy dashboard, which permits disabling a lot of the data collection. Use Google to find it. Personally, I find some Google services irresistible, so I silo their use in the very fast Chrome browser, while using Firefox with security extensions as my default browser for other services such as personal email, banking, etc. In that way Google gets to know only what I use Google's own services for, as it would anyway.

Giri Alam

I have to disagree with all of your points 1-7. You give your information to Google in exchange for free services like web search and email, and every search engine does the same thing. Chrome OS is only available for specific netbooks. Read http://tekno.kompas.com/read/xml/2009/12/30/10014897/Bocor..Detail.Netbook.Chrome.OS. And buying a PC with Windows 7 bundled might not be a wise choice; please consider viruses and malware. Google Docs is a pioneer of cloud computing; even Microsoft Office 2010 will have this feature. Read http://en.wikipedia.org/wiki/Office_Web_Apps

Offbeatmammal

Don't forget that they are also expanding beyond the obvious ... Google Health and their new role as an electricity broker open up new places for their tendrils to reach. All this in a decade... from a simple search engine to knowing more about you than your own government...

Stupidscript

You showed your hand by including Apps and Finance in your list, and by your comments included with the inclusion of Contacts. While it is true that those are stored on Google's servers and run Google's software, NONE of the data in any of those services is available to Google, neither through programming nor human snooping. In addition, the contents of Gmail messages are simply parsed for ad correlation, so the only data being mined in that app is the same as what is being mined in the search engine: word popularity and ad serving/clicks. And if you believe that your Contact List would be available to Google in any form, you need a bigger tin foil hat. The information in all of those sections is very strictly kept private, and only a court order could be successful in revealing their contents, and then only to law enforcement in pursuit of a specific goal, never to Google. Let's talk about the access your operating system manufacturer has to your personal stuff, shall we? It puts whatever Google might be able to put together on you to shame, because it doesn't care whether you use Google or not. Every mouse click, every keystroke, every password ... all available to the operating system. What? You say that I have no proof that Microsoft is checking in on my activities? Ignoring "Automatic Updates" and "Windows Authenticity Verification", your claims about Google hold even less water. Let's talk about the phone companies, shall we? Every time you pick up the phone and call someone, they don't even need to inform you that your call is being spied on! It's the law of the land! So before you get all hot and bothered by the data mining that Google is doing on your searches and whatnot, try to keep it real and remind yourself that with very few exceptions, EVERYTHING you do and EVERYWHERE you go, you are being shadowed by some company or another. If not Google, then AT&T, Microsoft or the U.S. Government. It's all part of the fun of living in the 21st Century. 
Get over it.

fjpoblam

Ditto James Foster very much *and* Marah Marie! (Avoid GOOG, but remember GOOG search tracking is opt-in by default). Choose another SE on all yer browsers. Install Glims on Safari so's you can get another SE there, too. Use yauba or scroogle.org (note, *not* scroogle.com which is a porn site) for safe search. It's blamed hard to get an online email service. The safest is to get yer own domain and either use webmail via cpanel or download to yer laptop via pop or imap. Yahoo! "promises" in TOS not to scan your email for targeted advertising, BUT Yahoo! recently acquired xoopit for email image processing, and xoopit TOS do *not* make such a promise! So, while it would be nice to have online services provide the same sort of privacy as we expect from the mail we transmit via USPS, the truth is still, "never post online anything you wouldn't be willing to see displayed on a public billboard."

James Foster

OK for the paranoid I think freenet is worth trying.

Marah Marie

The trouble with James' "easy solution" is, Google monitors and stores searches by IP address. If you use a proxy or dynamic IP, no problem. If you have a static IP that you normally don't have a problem surfing the Web with, Google will capture, store, and base your search results on everything you search for with that IP. It's called your "Web history" and you can find a link for it in the upper right-hand corner of the results page - but you can't see your own history, not even once you click the link, unless you sign in. You can only modify or turn it off if you're signed in. If you're not signed in, you can still turn it off, but it's cookie-based, so every time you clear the cookie/restart your browser (depending on exactly what your cookie settings are) Google starts storing your web history to "tailor" results to you all over again. And we are all automatically "opted-in" to this so-called "service", like it or not. It's not just "opt-out", either, it's "constantly opt-out, each time I toss Google's cookie". It's gotten to the point where I think a permanent connection to TOR might be the only answer, and even that has its own obvious limitations and shortcomings (including slowness).

MaikR

Nice article but... the somewhat populist "big brother is watching you" and the overworked metaphor of an "octopus" could imply that there's something that we should be afraid of. So: What else would Google actually DO (so not COULD do, but DO) with aaaaaaalllll that information about you? They use it to make money on trend reports and advertising... but what else? Why is Google offering something like the Data Liberation Front? Goal: "Users should be able to control the data they store in any of Google's products. Our team's goal is to make it easier to move data in and out." You may visit the FAQ at http://www.dataliberation.org/ and the corresponding blog at http://dataliberation.blogspot.com/ and read more on Data Liberation. Of course, if Google were to misuse our information (as in "insider trading") they should be brought to their knees. But I don't think they would do such a thing. Why Open Source, Open Standards and Data Liberation? And, why throw away a good reputation? Another quote from CEO Eric Schmidt: "How do you be big without being evil? We don't trap end users. So if you don't like Google, if for whatever reason we do a bad job for you, we make it easy for you to move to our competitor." Google wants to become the number one choice in Cloud Computing for the Enterprise. There's a billion dollar market at their fingertips and they have all the tools in place (hardware and software). They have to be a dependable, secure, rock solid... etcetera... 200% reliable partner. As long as Google can make a good profit on 'scanning' and using my "raw data" and surfing behavior for minimal adverts (no popups, screaming banners or spam email) and market reports, then that's fine with me. Or, if they use it to provide a virus and spam free platform for their (business) users, that of course is fine too. As long as Google uses me for beta testing release candidates of 'free' products and services, then that's also fine with me. 
Otherwise I also could pay 50 dollars a year for using the full Google Apps Premium platform. Just my 2 cents...

kate

On #6 in James' comment - to avoid Google Search, you can substitute Ixquick search - it doesn't store any IP data. Also, don't forget...if you use Blogger.com...it's Google.

James Foster

There is an easy solution to prevent Google from harvesting your data: 1. Use Adblock Plus and NoScript on Firefox. 2. Do not create a Google account or use Gmail. Use your ISP or Yahoo mail or Hotmail. 3. Do not buy an Android phone. Buy the iPhone, a Blackberry (but only if e-mail is your top priority), HTC HD2, or Nokia N900 (maemo), Nokia E71, Nokia N97, Nokia 5800. 4. Do not even consider a Chrome OS PC. Buy a Windows 7 PC if you want the most choice and best value. Buy a Mac if you are a poser or you can afford one. Install Fedora, Ubuntu or Suse only if you are an ubergeek. 5. Do not use Google Apps. It is shit plus Google mines your data. If you want online Apps try Office Live, Zoho and Thinkfree. Or just use Microsoft Office, iWork, Openoffice whatever on the hard drive. 6. If you use Google search which I do as about 80 of you do then make sure you do not have a Google Account or just delete your existing Google Account. The point of Gmail and now Calendar and Apps is to force you to sign up for a Google Account and make it harder for you to migrate to a different search engine when one eventually comes along. 7. In conclusion only use Google Search and Youtube but don't sign in.

Travel

yes same with google checkout...

Russell Heimlich

If Google is sooo smart, they would realize I bought a Nexus One and stop showing me ads touting the Nexus One. After all, why would I buy another one?