ArchiTECHt Daily: Facebook exposes the limits of Hadoop (again)

By ARCHITECHT • Issue #11
This is an edited excerpt of a post that originally appeared on the website on Saturday.
On Friday, Facebook published a blog post detailing a new open source system called Beringei. It’s a time-series storage engine that the company uses to serve real-time data about system health to both the humans and automated systems tasked with keeping Facebook online. Because of the scale and real-time nature of Facebook’s operations ("Beringei currently stores up to 10 billion unique time series and serves 18 million queries per minute"), the company had to replace its previous HBase data store for this workload.
Beringei signifies yet another step in the evolution of how the company is using Hadoop. And where Facebook goes, the industry tends to follow.
In part, this is because when Facebook starts outgrowing a technology, it’s a pretty good indicator that other large users of that technology might soon start running into similar issues. For example, Facebook created the Hive data warehouse system for running SQL queries on Hadoop data, only to replace it with a much faster system called Presto in 2013.
Since then, Presto has amassed an impressive list of users. What’s more, Hadoop vendors Cloudera, Hortonworks and MapR—plus any number of startups, and even users such as Salesforce—have developed their own low-latency Hive alternatives.
Recently (and somewhat ironically), Facebook jumped onto the Apache Spark bandwagon in a big way, after putting Spark through its paces to ensure it could handle Facebook’s giant batch-processing workloads. Spark was created, and became hugely popular, as a simpler, faster alternative to Hadoop MapReduce.
But Beringei is different from previous Facebook creations such as Hive, Presto or Corona because it doesn’t require Hadoop at all. While it’s a narrow-enough use case that it alone won’t likely have a material effect on HBase usage, or certainly on the overall market for Hadoop software, Beringei might play a small role in a future scenario where Hadoop just isn’t on the radar for many companies.
Already, startups not yet dealing with big data (at least by today’s definition) can likely afford to bypass Hadoop and its relative complexity altogether, opting instead to build around more modern and right-scale technologies from the start. Beringei and Spark are two examples among a sea of open source databases, data stores and real-time processing engines now available.
And as engineers leave Facebook and other early adopters to start their own companies or join others, we might expect them to become software Johnny Appleseeds, spreading the seeds of these technologies and practices across new lands. You frequently hear about ex-Google engineers missing its famous Borg system once they leave, for example, or about engineers having to rebuild the same thing over and over as they move from job to job. But the beautiful thing about the open source era is that it’s much easier now to take your favorite tools with you.

Source: Facebook

ARCHITECHT delivers the most interesting news and information about the business impacts of cloud computing, artificial intelligence, and other trends reshaping enterprise IT. Curated by Derrick Harris.
