So, I recorded a video podcast type of thing this morning about big data, AI and the like, and, upon hanging up, immediately thought of a great point that I should have made. (This is par for the course pretty much anytime I’m answering questions rather than asking them: I think of something I should have said and I think it’s a great point.) Because I couldn’t make it there, I’ll make it here instead. Basically, it’s that while the previous era of big data was driven by the technology and encouraged companies to hoard as much data as possible, the current era of AI is actually driven by applications and encourages companies to get the right data.
I could of course be entirely wrong, but it seems like the general message behind technologies like Hadoop and its ecosystem was that because you could now store so much data, and so many types of data, for so cheap, you should obviously do that. Once it’s all captured and stored, you can then figure out how to process it, blend it and use it to improve analytics. If it worked for companies like Yahoo and Google, it should work for everyone else, too.
It’s a logical enough argument in theory, but I think most people would agree it hasn’t always worked out as planned. Companies are almost certainly more data savvy than they were a decade or two ago, but I’m not sure how much closer we are to any sort of data-driven nirvana. Maybe that’s not such a bad thing.
After all, the concept of big data did at least get companies thinking about data, and it probably got many of them investing in the infrastructure to deal with it. Now AI rolls around promising much better predictive capabilities in many situations, and everyone is that much closer to capitalizing on it. Better yet, the conventional wisdom around AI (and by AI here I probably mean deep learning) is not “capture everything” but, rather, “capture lots of what you need for the task you’re trying to solve.”
Really, this has probably been the motto of many data scientists for years. If you’re trying to perform a specific task, capturing lots of relevant, high-quality data will always be a better option than capturing everything and trying to wade through it. Even the NSA, some experts have argued, suffered from too much data, to the point that it actually made it more difficult to analyze criminal behavior and networks.
So perhaps it’s possible that AI, given its current status as panacea to all things that ail business, will effect a shift in thinking about data – something more along the lines of “What do we want to do, and what are the data and tools we need (and already have) to do it?” Yes, one successful project will beget more projects and data volumes will continue to rise, but that seems like a preferable approach compared with collecting first and asking questions later.
And, hey, with growing concerns about security and privacy, and regulations like the GDPR set to kick off, it’s probably a good time to start bringing some real order to data strategies anyhow.