Exploring the software behind Facebook, the world’s largest site

At the scale that Facebook operates, a lot of traditional approaches to serving web content break down or simply aren’t practical. The challenge for Facebook’s engineers has been to keep the site up and running smoothly in spite of handling close to half a billion active users. This article takes a look at some of the software and techniques they use to accomplish that.

Facebook’s scaling challenge

Before we get into the details, here are a few factoids to give you an idea of the scaling challenge that Facebook has to deal with:

  • Facebook serves 570 billion page views per month (according to Google Ad Planner).
  • There are more photos on Facebook than all other photo sites combined (including sites like Flickr).
  • More than 3 billion photos are uploaded every month.
  • Facebook’s systems serve 1.2 million photos per second. This doesn’t include the images served by Facebook’s CDN.
  • More than 25 billion pieces of content (status updates, comments, etc.) are shared every month.
  • Facebook has more than 30,000 servers (and this number is from last year!)
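
To put the first figure in per-second terms, here is a rough back-of-the-envelope calculation. It assumes a 30-day month and, unrealistically, perfectly even traffic around the clock. Peak traffic would be higher than this average.

```latex
\frac{570 \times 10^{9}\ \text{page views/month}}{30 \times 86\,400\ \text{s/month}}
  = \frac{570 \times 10^{9}}{2.592 \times 10^{6}}\ \text{page views/s}
  \approx 2.2 \times 10^{5}\ \text{page views/s}
```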

Software that helps Facebook scale

In some ways Facebook is still a LAMP site, but it has had to change and extend its operation to incorporate many other elements and services, and to modify its approach to existing ones.

For example:

  • Facebook still uses PHP, but it has built a compiler for it so it can be turned into native code on its web servers, thus boosting performance.
  • Facebook uses Linux, but has optimized it for its own purposes (especially in terms of network throughput).
  • Facebook uses MySQL, but primarily as a key-value persistent storage, moving joins and logic onto the web servers since optimizations are easier to perform there (on the “other side” of the Memcached layer).
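
The third point is using MySQL mainly as key-value storage and doing the joins on the web servers. A rough sketch of that approach follows. It is a minimal illustration under several assumptions: a plain dict stands in for the MySQL tables, records are stored as JSON blobs, and the key format (`user:<id>`) and function names are hypothetical. None of it is Facebook’s actual schema or code.

```python
import json

# Hypothetical stand-in for MySQL used purely as key-value storage:
# each key maps to a serialized blob, and no joins are run in SQL.
users_kv = {
    "user:1": json.dumps({"name": "Alice", "friend_ids": [2, 3]}),
    "user:2": json.dumps({"name": "Bob", "friend_ids": [1]}),
    "user:3": json.dumps({"name": "Carol", "friend_ids": [1]}),
}

def multi_get(store, keys):
    """Fetch several keys at once, like a batched primary-key lookup."""
    return {k: json.loads(store[k]) for k in keys if k in store}

def friend_names(user_id):
    """Do the 'join' (user -> friends -> names) in application code on the web tier."""
    key = f"user:{user_id}"
    user = multi_get(users_kv, [key])[key]
    friends = multi_get(users_kv, [f"user:{fid}" for fid in user["friend_ids"]])
    return sorted(f["name"] for f in friends.values())

print(friend_names(1))  # ['Bob', 'Carol']
```

Because the database only ever answers simple lookups by key, the lookups are easy to put behind a cache such as Memcached. The join logic can then change without touching the storage layer.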

Then there are the custom-written systems, like Haystack, a highly scalable object store used to serve Facebook’s immense number of photos, or Scribe, a logging system that can operate at the scale of Facebook (which is far from trivial).

But enough of that. Let’s present (some of) the software that Facebook uses to provide us all with the world’s largest social network site.

Memcached

Memcached is by now one of the most famous pieces of software on the internet. It’s a distributed memory caching system that Facebook (and a ton of other sites) uses as a caching layer between the web servers and MySQL servers (since database access is relatively slow). Through the years, Facebook has made a ton of optimizations to Memcached and the surrounding software (like optimizing the network stack).

Facebook runs thousands of Memcached servers with tens of terabytes of cached data at any one point in time. It is likely the world’s largest Memcached installation.

HipHop for PHP

PHP, being a scripting language, is relatively slow when compared to code that runs natively on a server. HipHop converts PHP into C++ code, which can then be compiled for better performance. Since Facebook relies heavily on PHP to serve content, this has allowed it to get much more out of its web servers.

A small team of engineers (initially just three of them) at Facebook spent 18 months developing HipHop, and it is now live in production.

Haystack

Haystack is Facebook’s high-performance photo storage/retrieval system (strictly speaking, Haystack is an object store, so it doesn’t necessarily have to store photos). It has a ton of work to do; there are more than 20 billion uploaded photos on Facebook, and each one is saved in four different resolutions, resulting in more than 80 billion photos.

And it’s not just about being able to handle billions of photos; performance is critical. As we mentioned previously, Facebook serves around 1.2 million photos per second, a number that doesn’t include images served by Facebook’s CDN. That’s a staggering number.
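
Haystack’s internals aren’t covered here. The core idea usually described for it, though, is to pack many photos into a few very large files and keep a small in-memory index of where each one lives. A read then needs no per-file filesystem metadata lookups. Below is a toy sketch of that idea, assuming an in-memory `io.BytesIO` buffer stands in for a large volume file on disk. The class name and the four resolution labels are hypothetical. Each upload is stored once per resolution, mirroring the four copies mentioned above.

```python
import io

RESOLUTIONS = ("thumbnail", "small", "medium", "large")  # hypothetical labels for the four sizes

class HaystackVolume:
    """Toy append-only volume: many photos packed into one file, located via an in-memory index."""

    def __init__(self):
        self._file = io.BytesIO()  # stands in for one large file on disk
        self._index = {}           # (photo_id, resolution) -> (offset, size)

    def put(self, photo_id, resolution, data: bytes):
        offset = self._file.seek(0, io.SEEK_END)  # always append at the end
        self._file.write(data)
        self._index[(photo_id, resolution)] = (offset, len(data))

    def get(self, photo_id, resolution) -> bytes:
        offset, size = self._index[(photo_id, resolution)]  # pure memory lookup
        self._file.seek(offset)
        return self._file.read(size)                         # one positioned read

vol = HaystackVolume()
for res in RESOLUTIONS:
    vol.put(1001, res, f"<{res} jpeg bytes>".encode())

print(vol.get(1001, "medium"))  # b'<medium jpeg bytes>'
print(len(vol._index))          # 4 stored objects for one uploaded photo
```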

BigPipe

BigPipe is a dynamic web page serving system that Facebook has developed. Facebook uses it to serve each web page in sections (called “pagelets”) for optimal performance.

For example, the chat window is retrieved separately, the news feed is retrieved separately, and so on. These pagelets can be retrieved in parallel, which is where the performance gain comes in. This also gives users a site that keeps working even if some part of it is deactivated or broken.
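
The pagelet idea can be sketched with a thread pool:

1. Send the page skeleton first.
2. Render the pagelets in parallel.
3. Emit each pagelet as soon as it’s ready.
4. Replace any pagelet that fails with a placeholder.

This is a simplified illustration in which the hypothetical `render_*` functions stand in for real backends. As Facebook has described it, the real BigPipe streams chunks to the browser and uses JavaScript to slot each pagelet into place. This sketch just collects the chunks in a list.

```python
import concurrent.futures as cf
import time

def render_chat():
    time.sleep(0.05)  # pretend this backend is slow
    return "<div id='chat'>...</div>"

def render_news_feed():
    time.sleep(0.02)
    return "<div id='feed'>...</div>"

def render_ads():
    raise RuntimeError("ads backend down")  # simulate a broken pagelet

PAGELETS = {"chat": render_chat, "feed": render_news_feed, "ads": render_ads}

def serve_page(pagelets):
    chunks = ["<html><body>"]  # the page skeleton goes out first
    with cf.ThreadPoolExecutor() as pool:
        futures = {pool.submit(fn): name for name, fn in pagelets.items()}
        for fut in cf.as_completed(futures):  # emit pagelets in completion order
            name = futures[fut]
            try:
                chunks.append(fut.result())
            except Exception:
                # One broken pagelet doesn't take the whole page down.
                chunks.append(f"<!-- pagelet {name} unavailable -->")
    chunks.append("</body></html>")
    return chunks

for chunk in serve_page(PAGELETS):
    print(chunk)
```

With this design, the total page time is bounded by the slowest pagelet rather than the sum of all of them.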

Cassandra

Cassandra is a distributed storage system with no single point of failure. It’s one of the poster children for the NoSQL movement and has been made open source (it’s even become an Apache project). Facebook uses it for its Inbox search.

Other than Facebook, a number of other services use it, for example Digg. We’re even considering some uses for it here at Pingdom.
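
One ingredient behind "no single point of failure" is that Cassandra spreads data around a ring of nodes using consistent hashing and stores each row on several replicas. Below is a simplified sketch of placing a key on a ring with a replication factor of 3. The MD5-based token is similar in spirit to Cassandra’s RandomPartitioner. The node names and the key are hypothetical, and real Cassandra adds much more (virtual nodes, replication strategies, and so on).

```python
import bisect
import hashlib

def token(key: str) -> int:
    """Map a string onto the ring using an MD5 hash."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, replication_factor=3):
        self.rf = replication_factor
        self.ring = sorted((token(n), n) for n in nodes)  # node positions on the ring
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, key):
        """Walk clockwise from the key's token and take the next rf nodes."""
        start = bisect.bisect(self.tokens, token(key)) % len(self.ring)
        count = min(self.rf, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]

ring = Ring(["node-a", "node-b", "node-c", "node-d", "node-e"])
owners = ring.replicas("inbox:alice")
print(owners)  # three distinct nodes; any one of them can serve the data
```

If one of the three owners is down, reads and writes can still go to the others. Adding a node only moves the keys adjacent to its position on the ring.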

Scribe

Scribe is a flexible logging system that Facebook uses for a multitude of purposes internally. It’s been built to be able to handle logging at the scale of Facebook, and automatically handles new logging categories as they show up (Facebook has hundreds).
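
The category-based model can be illustrated with a small buffered logger. Every message carries a category, messages are buffered per category and flushed in batches, and a category that has never been seen before simply gets a new buffer with no configuration step. This is a toy sketch, not Scribe’s API. The class name, the flush threshold and the category names are hypothetical, and a dict stands in for the downstream storage Scribe would forward to.

```python
from collections import defaultdict

class CategoryLogger:
    """Toy Scribe-style logger: messages are buffered per category and flushed in batches."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold
        self.buffers = defaultdict(list)  # category -> pending messages
        self.sink = defaultdict(list)     # stands in for downstream storage

    def log(self, category, message):
        buf = self.buffers[category]      # unknown categories are created on first use
        buf.append(message)
        if len(buf) >= self.flush_threshold:
            self.flush(category)

    def flush(self, category):
        self.sink[category].extend(self.buffers.pop(category, []))

    def flush_all(self):
        for category in list(self.buffers):
            self.flush(category)

log = CategoryLogger(flush_threshold=3)
for i in range(4):
    log.log("web_clicks", f"click {i}")
log.log("photo_uploads", "upload 1")  # a brand-new category, no setup needed
print(dict(log.sink))  # {'web_clicks': ['click 0', 'click 1', 'click 2']}
log.flush_all()
print(dict(log.sink))
```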

Hadoop and Hive

Hadoop is an open source map-reduce implementation that makes it possible to perform calculations on massive amounts of data. Facebook uses it for data analysis (and as we all know, Facebook has massive amounts of data). Hive originated within Facebook and makes it possible to run SQL-like queries against Hadoop, making it easier for non-programmers to use.

Both Hadoop and Hive are open source (Apache projects) and are used by a number of big services, for example Yahoo and Twitter.
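
The map-reduce model can be illustrated with the classic word count, run here in a single process:

1. A map step emits `(key, 1)` pairs.
2. A shuffle step groups the pairs by key.
3. A reduce step sums each group.

Hadoop runs these phases distributed across many machines. This sketch only shows the shape of the computation. A Hive query along the lines of `SELECT word, COUNT(*) FROM words GROUP BY word` (over a hypothetical `words` table) would be compiled into map and reduce jobs of roughly this kind.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group values by key, as Hadoop does between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: combine all values for one key."""
    return key, sum(values)

def word_count(lines):
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())

status_updates = ["Hello world", "hello Facebook", "World of photos"]
print(word_count(status_updates))
# {'hello': 2, 'world': 2, 'facebook': 1, 'of': 1, 'photos': 1}
```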

Thrift

Facebook uses several different languages for its different services. PHP is used for the front-end, Erlang is used for Chat, Java and C++ are also used in several places (and perhaps other languages as well). Thrift is an internally developed cross-language framework that ties all of these different languages together, making it possible for them to talk to each other. This has made it much easier for Facebook to keep up its cross-language development.

Facebook has made Thrift open source and support for even more languages has been added.
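
With Thrift, a data type is described once in an interface definition file, and code for each language is generated from it. Every language then agrees on the same wire format. The sketch below hand-encodes a hypothetical `struct User { 1: i32 id, 2: string name }` as tagged fields: a type byte, a 16-bit field ID and then the value, with the struct ending in a STOP byte. This follows, as far as we know, the layout of Thrift’s binary protocol. It is not the Thrift library; in practice the Thrift compiler generates this code for you.

```python
import struct

# Type tags as used by Thrift's binary protocol: STOP=0, I32=8, STRING=11.
T_STOP, T_I32, T_STRING = 0, 8, 11

def encode_user(user_id: int, name: str) -> bytes:
    """Encode fields as [type byte][i16 field id][value]..., ending with STOP (big-endian)."""
    out = bytearray()
    out += struct.pack(">bh", T_I32, 1) + struct.pack(">i", user_id)
    encoded = name.encode("utf-8")
    out += struct.pack(">bh", T_STRING, 2) + struct.pack(">i", len(encoded)) + encoded
    out += struct.pack(">b", T_STOP)
    return bytes(out)

def decode_user(data: bytes) -> dict:
    """Read tagged fields until STOP, returning {field_id: value}."""
    pos, fields = 0, {}
    while True:
        (ftype,) = struct.unpack_from(">b", data, pos)
        pos += 1
        if ftype == T_STOP:
            return fields
        (fid,) = struct.unpack_from(">h", data, pos)
        pos += 2
        if ftype == T_I32:
            (fields[fid],) = struct.unpack_from(">i", data, pos)
            pos += 4
        elif ftype == T_STRING:
            (length,) = struct.unpack_from(">i", data, pos)
            pos += 4
            fields[fid] = data[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError(f"unsupported type tag {ftype}")

data = encode_user(42, "Alice")
print(data.hex())
print(decode_user(data))  # {1: 42, 2: 'Alice'}
```

Because every field is tagged with its ID and type, a PHP front end and a C++ or Java service can exchange the same bytes, each through its own generated code.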

Varnish

Varnish is an HTTP accelerator that can act as a load balancer and also cache content, which can then be served lightning-fast.

Facebook uses Varnish to serve photos and profile pictures, handling billions of requests every day. Like almost everything Facebook uses, Varnish is open source.
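
The basic job of an HTTP accelerator like Varnish can be sketched as follows. It keeps backend responses keyed by URL and serves them from memory until they expire, using the `max-age` the backend put in its `Cache-Control` header. This is a toy illustration, not Varnish’s configuration language or behavior in detail. The class, the fake backend and the injectable clock are hypothetical.

```python
import re
import time

class TinyHttpCache:
    """Toy Varnish-like cache: serve stored responses until the backend's max-age expires."""

    def __init__(self, backend, clock=time.monotonic):
        self.backend = backend
        self.clock = clock
        self.store = {}  # url -> (expires_at, body)
        self.hits = 0
        self.misses = 0

    def fetch(self, url):
        now = self.clock()
        entry = self.store.get(url)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        self.misses += 1
        headers, body = self.backend(url)
        match = re.search(r"max-age=(\d+)", headers.get("Cache-Control", ""))
        if match:  # only cache what the backend marked as cacheable
            self.store[url] = (now + int(match.group(1)), body)
        return body

def photo_backend(url):
    """Hypothetical origin server for photos."""
    return {"Cache-Control": "public, max-age=3600"}, b"jpeg:" + url.encode()

fake_now = [0.0]
cache = TinyHttpCache(photo_backend, clock=lambda: fake_now[0])
cache.fetch("/photos/1.jpg")  # miss: goes to the backend
cache.fetch("/photos/1.jpg")  # hit: served from memory
fake_now[0] = 4000            # an hour and change later, past max-age
cache.fetch("/photos/1.jpg")  # miss again: refetched
print(cache.hits, cache.misses)  # 1 2
```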

Other things that help Facebook run smoothly

We have mentioned some of the software that makes up Facebook’s system(s) and helps the service scale properly. But handling such a large system is a complex task, so we thought we would list a few more things that Facebook does to keep its service running smoothly.

Gradual releases and dark launches

Facebook has a system they call Gatekeeper that lets them run different code for different sets of users (it basically introduces different conditions in the code base). This lets Facebook do gradual releases of new features, A/B testing, activate certain features only for Facebook employees, etc.

Gatekeeper also lets Facebook do something called “dark launches”, which means activating elements of a certain feature behind the scenes before it goes live (without users noticing, since there are no corresponding UI elements). This acts as a real-world stress test and helps expose bottlenecks and other problem areas before a feature is officially launched. Dark launches are usually done two weeks before the actual launch.
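
Gatekeeper itself isn’t public in this article, but a common way to build this kind of gate is to hash the feature name together with the user ID into a bucket from 0 to 99. That way each user gets a stable answer, and raising a percentage gradually lets more people in. The sketch below shows that technique plus a dark launch, where the new backend work runs for a slice of users while the page still renders the old UI. The function names, parameters and the `new_timeline_backend` feature are hypothetical.

```python
import hashlib

def bucket(user_id: int, feature: str) -> int:
    """Deterministically map (feature, user) to a bucket 0..99."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100

def gate(feature, user_id, rollout_percent, is_employee=False, employees_only=False):
    """Return True if this user should get the feature."""
    if employees_only:
        return is_employee
    return bucket(user_id, feature) < rollout_percent

def render_profile(user_id, backend_calls):
    # Dark launch: exercise the new backend for about 5% of users, but show no UI for it yet.
    if gate("new_timeline_backend", user_id, rollout_percent=5):
        backend_calls.append(user_id)  # stands in for real, load-generating backend work
    return "<old profile UI>"

calls = []
for uid in range(1000):
    render_profile(uid, calls)
print(len(calls))  # roughly 50 users exercised the new backend without seeing anything new
```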

Profiling of the live system

Facebook carefully monitors its systems (something we here at Pingdom of course approve of), and interestingly enough it also monitors the performance of every single PHP function in the live production environment. This profiling of the live PHP environment is done using an open source tool called XHProf.
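
XHProf is a PHP extension, so the following is only an analogous sketch in Python. It records the call count and inclusive wall time of every instrumented function, the kind of per-function data a profiler collects. It does not reflect XHProf’s API or output format, and the profiled functions are hypothetical. A production profiler would typically sample a fraction of requests to keep the overhead low.

```python
import functools
import time
from collections import defaultdict

STATS = defaultdict(lambda: {"calls": 0, "wall_s": 0.0})

def profiled(fn):
    """Record call count and inclusive wall time for fn."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            stats = STATS[fn.__qualname__]
            stats["calls"] += 1
            stats["wall_s"] += time.perf_counter() - start
    return wrapper

@profiled
def load_friends(user_id):
    time.sleep(0.01)  # pretend this hits a backend
    return [1, 2, 3]

@profiled
def render_home(user_id):
    return len(load_friends(user_id))

for uid in range(3):
    render_home(uid)

# Slowest functions first, as a profiling report would list them.
for name, s in sorted(STATS.items(), key=lambda kv: -kv[1]["wall_s"]):
    print(f"{name}: {s['calls']} calls, {s['wall_s'] * 1000:.1f} ms")
```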

Gradual feature disabling for added performance

If Facebook runs into performance issues, there are a large number of levers that let them gradually disable less important features to boost performance of Facebook’s core features.
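
One simple way to model such levers is a priority-ordered list of non-essential features. The more the load rises above a threshold, the more of them are switched off, while core features always stay on. This is a hypothetical sketch: the feature names, the 0.7 threshold and the linear shedding rule are assumptions for illustration, not how Facebook’s levers actually work.

```python
# Hypothetical features, ordered from least to most important; core features are never shed.
SHEDDABLE = ["chat_typing_indicators", "friend_suggestions", "photo_tag_suggestions"]
CORE = ["news_feed", "profile", "messages"]

def enabled_features(load: float, start_shedding_at: float = 0.7):
    """Disable more low-priority features as load (0.0 = idle, 1.0 = full) climbs."""
    steps = len(SHEDDABLE)
    if load <= start_shedding_at:
        disabled = 0
    else:
        # Spread the range from the threshold up to full load evenly across the sheddable features.
        fraction = min((load - start_shedding_at) / (1.0 - start_shedding_at), 1.0)
        disabled = min(steps, int(fraction * steps) + 1)
    return CORE + SHEDDABLE[disabled:]

print(enabled_features(0.5))   # everything on
print(enabled_features(0.75))  # the least important feature is switched off
print(enabled_features(0.95))  # only the core features remain
```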

The things we didn’t mention

We didn’t go much into the hardware side in this article, but of course that is also an important aspect when it comes to scalability. For example, like many other big sites, Facebook uses a CDN to help serve static content. And then of course there is the huge data center Facebook is building in Oregon to help it scale out with even more servers.

And aside from what we have already mentioned, there is of course a ton of other software involved. However, we hope we were able to highlight some of the more interesting choices Facebook has made.

Facebook’s love affair with open source

We can’t complete this article without mentioning how much Facebook likes open source. Or perhaps we should say, “loves”.

Not only is Facebook using (and contributing to) open source software such as Linux, Memcached, MySQL, Hadoop, and many others, it has also made much of its internally developed software available as open source.

Examples of open source projects that originated from inside Facebook include HipHop, Cassandra, Thrift and Scribe. Facebook has also open-sourced Tornado, a high-performance web server framework developed by the team behind FriendFeed (which Facebook bought in August 2009).

(A list of open source software that Facebook is involved with can be found on Facebook’s Open Source page.)

More scaling challenges to come

Facebook has been growing at an incredible pace. Its user base is increasing almost exponentially and is now close to half a billion active users, and who knows what it will be by the end of the year. The site seems to be growing by about 100 million users every six months or so.

Facebook even has a dedicated “growth team” that constantly tries to figure out how to make people use and interact with the site even more.

This rapid growth means that Facebook will keep running into various performance bottlenecks as it’s challenged by more and more page views, searches, uploaded images, status messages, and all the other ways that Facebook users interact with the site and each other.

But this is just a fact of life for a service like Facebook. Facebook’s engineers will keep iterating and coming up with new ways to scale (it’s not just about adding more servers). For example, Facebook’s photo storage system has already been completely rewritten several times as the site has grown.

So, we’ll see what the engineers at Facebook come up with next. We bet it’s something interesting. After all, they are scaling a mountain that most of us can only dream of; a site with more users than most countries. When you do that, you better get creative.

Data sources: Various presentations by Facebook engineers, as well as the always informative Facebook engineering blog.



60 comments
sancoLgates

Great article. Facebook is sharing its technologies, and some of them are open source. Facebook proves that science is a common and must be propagated!!! "HeLL Yeah Mark"

YugShende

Can we get an update on this article? Revised statistics, software additions, etc.?

MosherCharles

Hi! Facebook is sharing some of its technologies. One particularly interesting example is HipHop. This software takes the PHP code used by many modern websites (including those running on WordPress and MediaWiki) and compiles it into highly optimized C++ which servers process more quickly. While I don’t know much about it, it seems like something that WordPress might be able to usefully incorporate or reproduce in a future version. It is nice to see Facebook providing potentially useful code to the wider web community.

CharlieGilichibi

Facebook never envisioned it could scale to its current mammoth size. Starting off with PHP was a good choice, as it is an open scripting language acting as glue between the HTML front end and C++ application-logic back ends. With photos and videos growing and monetisation not growing as fast, the business model might not be as future-proof as it should be. That is to say, Facebook is not making money fast enough compared to its rate of content accumulation. eMarketer research indicated that Facebook revenue peaked (in percentage terms) in 2011 and is now declining. So Facebook is facing not only technological challenges but shareholder value creation challenges as well. The click-through rate (CTR) on Facebook is 0.05% while on Google it is 0.4% (comScore, 2011).

Alfieman

Massive numbers. I have worked with 1,000 servers and a 700-terabyte SAN environment. Doesn't scale anywhere near. I am particularly impressed with the software launch and update principle. Very robust and secure. Top marks, Facebook.

Doctor Fox

This article is really very interesting in that it exposed quite a handful of the most important aspects of running a social media site like Facebook. I have seen one article on carsonified by Steve Huffman, but that is not as detailed as this. I love this article for its definiteness and straight to the point approach. Well done!

Webmaster

This is a brilliant article; I would suggest you keep it updated with current numbers. I had no idea that FB is this huge. So many people are hooked on it and play games, etc. It would be great to see what FB plans for its future. Ten years from now they will have grown; how will they scale their infrastructure?

aditia

yay open source rules

Rohit Prakash

I always wanted to know that. Excellent stuff. Seeing the figures, I am a bit worried. It is clear from the figures above that facebook is spending excessively on quality and technology. Will it remain FREE?

Jim Isaacs

With a site as massive as FB, for some reason I always thought they had written, or at least started to write, their own kernel where the only processes are the common FB actions we know. Is this what you meant by your Linux comment? Even though I can understand the software and the number of machines it takes to process, what I don't quite understand, or in better words, can't seem to imagine, is the networking power needed for a site like this. I wonder: if they have 30 to 40k servers, does this mean they are essentially using 30 to 40k network addresses, or maybe half of that? How many of those servers are devoted to networking, how many are devoted to processing, and how many are devoted to storage? Basically, how does their massive internal network connect and serve the half a billion users all over the world? It's just staggering to think that FB is running on CDN capacity, and still using a CDN.

Sava

I wish you could write the same kind of post about Twitter; that would be awesome.

Roel Berger

Very interesting post, thanks for this one. I already did a lot of research on this and most seems to match up. One question I have left from personal research: How do they store their PHP session data? Because with all existing PHP session options, you will always have a single point of failure for session data, or performance issues in my opinion, speaking worldwide. Even with memcache you can't fill all requirements. Do they have a custom session object they use that is stored in cassandra or something else that is blazingly fast to read/write for every pageview? A beer for those that enlighten me on this topic :)

Ejaz

Very informative and insightful article. It is good that Facebook is paying back the open source community by enhancing the products it uses.

Sunil Chauraha

It's really making me think about the technology.

Peter A.

not a word about facebook using ejabberd for xmpp webchat...

Kimse

@OS Advocate Think it's easier to just buy Microsoft :P But it would be cool, to see some commercial products listed side-by-side by the open source products in this article.

Gadelkareem

Facebook has made it clear that PHP is the best for web applications, and developing a new compiler for PHP is a big payback to the PHP community.

vinay

Really awesome facts, and very nice to know about Facebook. Thanks for sharing.

RandomDude

@MS Architect: In reality it is not as simple as just a "price in money" on implementing Facebook on a Microsoft stack. Sure, in theory they could throw out the web servers and replace them with IIS and Windows 2008R2 Server, replace MySQL with MSSQL, and replace the development stack with .NET and Visual C++. But when you are working at that scale it is not as simple as replacing A with B. Sure, in theory it would be possible, but then we are talking about a shitload of problems that are not similar to the ones they have now, and the solutions would be very, very different. And then we have the interesting question of estimating the cost. It is not only license costs, but also costs for development, maintenance, etc. So in reality it would not be estimable, as Facebook is unique in so many ways; the only way would probably be to replace Facebook with IronFacebook (pardon the pun) and find out, and that will not happen anytime soon. Also, money is not the only factor. Sure, the development I do for a living would be much cheaper with a FOSS stack than with the current .NET stack with MSSQL, but there the money is a small factor compared with the advantages of using that platform. Everything has advantages and disadvantages, and many people tend to forget that.

Schnäppchen

I'm REALLY astonished by how Facebook deals with its success. This is much more complicated than I thought it was. It's also very much appreciated that they distribute their software as open source; however, only they have the money and manpower for that. They are like Google: they have a good product used by billions of people, which enables them to do good things other companies do not have the money for.

Jason

@ProPuke The technologies--open source or not--are nowhere near free when there is this much development going into them from the organization utilizing them. For just one piece of the stack above: Through the years, Facebook has made a ton of optimizations to Memcached and the surrounding software (like optimizing the network stack). If we use the development of their custom compiler as a baseline because they have done deep customization to the OSS stuff: "A small team of engineers (initially just three of them) at Facebook spent 18 months developing HipHop, and it is now live in production." 1.5 years * 3+ developers = not anywhere near free

ProPuke

PaPa Big Don "Where does the money come from? They would not do it for free." Money? They use a load of technologies which are (mostly) free, anyway, then release their inhouse technology as more opensource tech so that the interwebs can scrutinize & improve them, for free. It's win-win.

Muhammad Ghazali

Wow, it's pretty cool to know the software behind the facebook and it's great to know too that it's open source. Great post btw. Thank you.

Amit Jain

Really amazed to see the statistics Facebook is working with. There are many government portals in my country (India) that need such technologies to operate smoothly with a large user base, instead of just going down. I hope they have gone through this article too.

Iwani

Wonderful and insightful post!!

Eric

What does mysql@facebook offer over a traditional mysql install?

PaPa Big Don

Where does the money come from? They would not do it for free.

dstudio101

"quote -- PAPAMIKE" This is a very interesting article! However, like PAPAMIKE said, how do the photo management numbers add up? More power!

Ranjith Kumar K

Awesome and interesting post... It's very helpful for understanding the key features that keep the giant running smoothly. Thanks a lot to the publishers. Cheers

Agus Dwi Basuki

Wow... This is a nice article. So, we can conclude that Facebook has enhanced open source technology to earn a lot of money. Right? adb.

Pingdom

@ Jim Isaacs: Might be worth pointing out regarding the number of servers... Apparently FB may now have as many as 60k servers.

  17. [...] pingdom.com — which is an uptime and performance monitoring service, they actually wrote a really, really interesting article on some of the software behind Facebook that allows a site as massive as Facebook to run at lightspeed, and it’s pretty neat. First, [...]