How many times have you used the term “Big Data”? What is your definition of “Big Data”? How “big” is your “Big Data”?
I guess we all have different definitions; for me, if it fits in RAM it is not “Big Data”. AWS already offers machines with up to 4TB of RAM and is working on new ones with up to 16TB, so you can fit a lot of your data in RAM.
This article explains why you don’t need complex clusters and tools just to train a logistic regression model. The author shows how to do it on a single machine, and how many requests per second a single machine can serve. It also explains how you can cope even when your data doesn’t fit in RAM: a single machine still offers enough performance for 95% of your company’s needs.
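To make the point concrete, here is a minimal sketch (pure standard library, toy data invented here for illustration) of training a logistic regression model entirely in memory on one machine. A real workload would use a library such as scikit-learn, but the principle is the same: if the data fits in RAM, a plain loop is all you need.

```python
# Minimal single-machine logistic regression via batch gradient descent.
# Data and dimensions are toy values, not from the article.
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(rows, labels, lr=0.1, epochs=200):
    """Batch gradient descent on the logistic loss, all in RAM."""
    n_features = len(rows[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # derivative of the log-loss w.r.t. the logit
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        m = len(rows)
        w = [wi - lr * gw / m for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b

def predict(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Toy, linearly separable data: label is 1 when x0 + x1 > 0.
random.seed(0)
X = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(500)]
y = [1 if x[0] + x[1] > 0 else 0 for x in X]
w, b = train_logreg(X, y)
accuracy = sum((predict(w, b, x) > 0.5) == bool(t) for x, t in zip(X, y)) / len(X)
print(round(accuracy, 2))
```

Even in interpreted Python this trains in well under a second; with vectorized code or a compiled library, a single machine scales to datasets many orders of magnitude larger before a cluster becomes necessary.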
The same author also wrote this article:
He had read an article about using Hadoop and AWS EMR for some analytics. The EMR cluster, with 7 c1.medium instances, took 26 minutes; he reproduced the same experiment using only command-line tools on his laptop, and it took 12 seconds instead. Quite an impressive performance improvement!
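The trick behind that speedup is streaming: read each record once and keep only a small set of counters in memory, which is essentially what a `cat | awk` pipeline does. Here is a minimal sketch of the same idea in Python (the record format and sample data are hypothetical, for illustration only):

```python
# Single-pass, single-machine aggregation: memory use is proportional to
# the number of distinct keys, not the size of the input.
from collections import Counter
import io

def count_results(lines):
    """Tally the first whitespace-separated field of each line."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if fields:
            counts[fields[0]] += 1
    return counts

# Works the same whether `lines` is a small in-memory sample, as here,
# or a file far larger than RAM opened with open(...).
sample = io.StringIO("win 34\nloss 12\nwin 7\ndraw 3\n")
print(count_results(sample))
```

Because the input is never loaded whole, a laptop can churn through files much larger than its RAM at disk speed, with none of the coordination overhead a cluster adds.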
It’s important to know the right tool for the job, and not to pick one just because some articles mention it or some big companies use it.