Facebook published a blog post on Monday highlighting how it’s able to analyze, classify and serve up its users’ billions of photos via a system called Photo Search. On the surface, the post highlights Facebook’s prowess across several big data domains, including deep learning, graph analysis and indexing. Being able to return relevant image results to a user’s query is hard enough, but doing so with acceptable latency requires some particularly clever thinking about how to design the system.
However, below the surface (or perhaps between the lines) lies the ultimate truth about Facebook’s artificial intelligence engineering: None of it would matter very much if Facebook didn’t have so much valuable data. While the sheer scale of its photo data might present some challenges, the real complexity occurs because Facebook wants to do more than just show a user all of her cat photos. It wants to show her friends’ cat photos, as well, and ads and businesses and whatever related content it can serve up.
Our photos, connections, wall updates, likes, etc., etc., etc., all provide Facebook with valuable data about who we are, who we know and what we like. Its investments in AI, big data infrastructure, network engineering and everything else are essentially a means to get us to share more data (because, hey, it’s free and fast), and then to analyze it and sell ads against it.
Think about trying to build a successful photo-search service today: something worthy of VC investment, or probably even profitable enough to justify bootstrapping for a prolonged period. It would be a tall order. Even if you built the best photo-search experience ever, you’d still need to convince customers to upload their photos to yet another service, and then to actually pay for it. If you wanted to make money beyond subscriptions, you would need some other type of data against which you could sell ads, recommendations, or whatever else you aimed to productize.
The big problem, of course, is that Facebook, Apple, Google and Flickr already have all our photos and offer search for free or next to free. Facebook and Google, in particular, also have petabytes upon petabytes of other data they can use to target users with ads, recommendations, new products, you name it. The big data infrastructure, the deep learning models and everything else they do exist to serve the data, not the other way around.
AI presents an enormous opportunity to make money and change the world (if you’re into that sort of thing). AI technologies are also readily available today and will only become more commonplace in the years to come. But everyone can have those: Facebook and Google are literally giving away some of their technologies, in part because they already have such a big head start with so many types of data that open-sourcing some tooling doesn’t really matter much.
AI hasn’t changed the lessons of the past decade of big data, except for providing some powerful new means by which to process it and analyze it. The data is still the thing that matters most.