Tuesday’s news that Apache Kafka-based startup Confluent raised another $50 million (for a total of more than $80 million since launching in 2014) should help its open source big data peers, and the general community of open source startups, sleep a little easier at night. That holds even as yet another big cloud computing conference kicks off on Wednesday, surely promising a slew of new competitive managed services and features from Google.
What Confluent’s big new round says to me, and something I saw playing out firsthand during my stint at Mesosphere, is that many (broadly defined) big data projects are just getting underway in earnest and that enterprises, especially, want to build them with mature open source technologies. Kafka and Spark, in particular, are remarkably popular among companies building real-time applications. Elasticsearch (whose parent company, Elastic, is holding its annual user conference this week) seems quite popular as well.
Growing competition from massive cloud providers like Amazon Web Services, Google and Microsoft should give the companies backing these open source projects pause, perhaps followed by a feeling of gratitude. More mature cloud platforms attract bigger, more risk-averse users that also happen to be more concerned with issues like lock-in and knowing that the technologies they choose will be around for a while. And we’re at a point right now where an established open source project can offer stronger assurances than most cloud services on those issues.
So, as I wrote last week while explaining why AWS might see its market share slip in the coming years, the result could be massive adoption of cloud computing infrastructure with open source technologies running on top of it. This should be a golden opportunity for open source infrastructure companies, and not just those in the big data space, to ride the cloud wave and hopefully attract customers that will stick with them for a long time.
Over time, cloud services will get better and more appealing, especially to younger companies, and newer, shinier open source projects will pop up. But by that time, technologies like Kafka and companies like Confluent will be the incumbents likely powering some strategically important workloads, and will not be easily replaced.
P.S. I intentionally did not address the current NoSQL database or Hadoop markets here. I think they have their own unique sets of challenges, and opportunities, that don’t necessarily apply to newer, more focused open source startups.