According to journalist Timothy Prickett Morgan at the Next Platform, it would have been a bigger deal if Google’s Valentine’s Day announcement had been an open source release of Spanner, or perhaps support for CockroachDB (an open source Spanner clone), rather than the proprietary Cloud Spanner service. The argument certainly has some merit from a user point of view (who doesn’t like open source?), but the economics of cloud computing suggest it’s probably not the best decision financially.
The biggest reason is that the business of open source databases has not been a particularly lucrative one thus far. As the recent RethinkDB saga illustrates, it’s often a dog-eat-dog world where open source projects fight over precious users but there’s no guarantee of big money. By all accounts, the revenues at MongoDB, the current big dog in that world (or even MySQL before it was twice-acquired), would barely move the needle at a company like Oracle.
The thing with databases is that if they’re fundamentally important to businesses, like Google expects Cloud Spanner to be, companies will often pay for them right up front, absent some proven, viable open source alternative. An example from the startup world might be MemSQL, which, although I don’t know for sure, appears to be doing all right despite not being open source.
Amazon DynamoDB highlights another factor to consider: For some companies, computing in the cloud is as much about operational simplicity as it is about anything else. They want a database that works, that’s scalable and that they don’t have to invest manpower in maintaining. For cloud providers, it’s almost certainly easier to build these services from scratch using code they’ve developed in-house than it is to turn someone else’s project into a managed service.
When Amazon Web Services launched its DynamoDB service some years ago, word began to spread that it was the fastest-growing AWS service ever. Yes, it could have instead open sourced Dynamo or built a managed version of MongoDB or something. But why?
A counterpoint to all of this might be that cloud providers, including Google, were quick to offer managed versions of open source projects such as Hadoop and Spark. However, unlike pretty much any open source database you can name, those were already very popular projects with no real peers. Cloud providers had the choice of offering managed versions with a built-in price premium, or having users install the Apache versions and only pay for cloud resources. But not running Hadoop or Spark was not really an option.
However, even offering Hadoop and/or Spark services hasn’t stopped cloud providers from also offering their own proprietary options. In the cloud, business is all about providing users with what they want, but also having an opinion. Google, for example, has Cloud Dataflow as a Spark alternative based on its own technology, which Google argues is the superior method for programming and running data-processing jobs.
Google wants its customers to use Cloud Dataflow, but it’s fighting an uphill battle against the entrenched options. So it released a lot of Dataflow as part of the Apache Beam project, hoping to spur adoption from the bottom up. (I discuss that effort here.)
Google might eventually open source Spanner, or decide to support CockroachDB in some fashion, but it will require a lot of pressure, primarily in the form of users voting with their feet. Until then, Google is hoping Cloud Spanner can be a cash cow, and the competitive advantage that helps it make a serious run at AWS and Microsoft up ahead.