If you recall the early days of big data, there was a lot of talk about quantity versus quality. On the one hand, you had Peter Norvig and Google talking about the unreasonable effectiveness of data, which makes a lot of sense at Google's scale, and especially given the relatively limited types of data and scope of problems it was trying to analyze at the time. On the other hand, you had pragmatists reiterating the old mantra of "garbage in, garbage out."
Untold millions of MapReduce jobs later, I think most people can agree that both things are true. It's true for any sort of data science or predictive analytics process, and it's especially true as more people and organizations start experimenting with artificial intelligence, and with deep learning specifically.
I came across three completely different types of content—a blog post, a research paper, and a Quora answer—this week that help drive this point home. Enjoy:
Is your data holding you back? This is a really good overview on data gaps from Silicon Valley Data Science: essentially, the process of figuring out what data you have, what it's good for, and what needs to be done to make it good for the thing you actually want to use it for.
—
If you're wondering why Intel is investing so many resources into AI, look no further than those TPUs at Google. Beyond wanting to own on-device processing for things like computer vision, I think Intel is also banking on the possibility that GPUs might not be the long-term answer for mainstream AI workloads, even though they are today (and, by the way, IBM's cloud now offers the newest, most powerful Nvidia GPUs as a service). If Google's TPUs really are 15-30 times faster than contemporary GPUs and CPUs, and 30-80 times more efficient per watt, you can bet other cloud providers, web companies, and large enterprises doing AI are going to want that type of performance for themselves.