Hadoop – Weekend must-read articles #20
Dealing with “big data” is a headache many companies and organizations are facing. That we have large amounts of data to process and store is perhaps not something new, but the sheer volume of data we now face is unprecedented. That’s where Apache Hadoop comes in.
Every Friday we bring you a collection of links to places on the web that we find particularly newsworthy, interesting, entertaining, and topical. We try to focus on some particular area or topic each week, but in general we will cover Internet, web development, networking, performance, security, and other geeky topics.
This week’s suggested reading
It is time for your kids to start learning about Hadoop, the formless data repository that is the current favorite of many dot-coms and the darling of the data nerds. Indeed, the younger the better. The Hadoop ecosystem is a big tent and getting bigger. To grok it, you have to cast aside several long-held tech assumptions.
Datameer and Karmasphere, two competing upstart vendors offering reporting, data-visualization, and data-analysis capabilities on top of Hadoop, released new versions of their software on Monday. Both talked up the need for next-generation tools. It’s not that old-school business intelligence software tools are going away, these upstarts grant. But both portray batch-oriented extract-transform-load (ETL) data integration, relational data warehousing, and old-school analytics as too slow, rigid, and expensive to keep up in the big-data era.
Hadoop is quickly becoming essential infrastructure for enterprises hoping to glean insights from the massive quantities of data they collect. The problem is that relatively few enterprises have the competence needed to make effective use of the still-complex open-source project. While Hadoop vendors like Cloudera, Hortonworks, EMC, and MapR are doing their part to simplify Hadoop, the real breakthrough may come from the applications that run on it rather than from improvements to the infrastructure…
Running what they believe is the world’s largest Hadoop-based collection of data, Facebook engineers have developed a way to circumvent a core weakness of the data analysis platform: its reliance on a single name server (the NameNode) to coordinate all operations… Facebook has what it believes is the world’s largest collection of data on the Hadoop Distributed File System (HDFS), over 100 PB spread across 100 different clusters in its data centers.
Hadoop remains a difficult platform for most enterprises to master. For now, skills are still hard to come by, both for data architects and engineers and especially for data scientists. It still takes too much skill, tape, and baling wire to get a Hadoop cluster together. Not every enterprise is Google or Facebook, with armies of software engineers to throw at a problem. With some exceptions, most enterprises don’t deal with data on the scale of Google or Facebook either, but the bar is rising.
Now six years old, the Apache Hadoop platform for storing and processing huge amounts of data — perhaps the catalyst of the current big data movement — appears ready for its closeup. According to the companies leading the Hadoop charge, they’re already beating away customers with a stick. Continual improvements to make Hadoop consumable by mainstream business users and applications are only going to make things better.
This is a big-data week in Silicon Valley, kicking off last night with a Churchill Club event here called “The Elephant in the Enterprise: What Role will Hadoop Play?” and featuring a high-powered group of big-data executives.
Hadoop, the open-source software that has emerged as the de facto standard for big data processing, may be what tips enterprises in favor of open source. The desire to collect more data and extract value from it has become a business priority, and Hadoop is playing a major role in making sense of that data.
The lure of using big data for your business is a strong one, and there is no brighter lure these days than Apache Hadoop, the scalable data storage platform that lies at the heart of many big data solutions. But as attractive as Hadoop is, there is still a steep learning curve involved in understanding what role Hadoop can play for an organization, and how best to deploy it.
How do you pronounce Hadoop?
Watch this video with Doug Cutting, the creator of Hadoop.
You can also subscribe to these articles
You can also subscribe to these weekly articles and receive them in your email inbox each week.
Image (top) from Shutterstock.