🔮 What: Confluent has a nice, short explanation of the principle of “federated computational governance — govern data wherever it is”, together with a few examples. While federated computational governance is a concept that stems from the data mesh paradigm, the idea is pretty simple and extends beyond any specific paradigm: once we decentralize data ownership, we still need to govern the data. But centralized governance will not work with decentralized ownership, and completely decentralized governance defeats the purpose, so we end up with federated governance.
Since doing federated governance “manually” sounds neither feasible nor scalable, it should be computational — or automatic. And that’s it.
🐰 My perspective: For some reason, if you google “federated computational governance” you get a quote defining it as a “team of data product owners”, which neither sounds very general nor like a “must be”.
Let’s take a small example to truly understand this concept, which I think is the hardest one in the data mesh paradigm. Let us build a super small data mesh: a small git repository containing CSVs.
Let’s decentralize ownership: Give every autonomous team push rights to their respective “folder” inside the git repository. So now, teams can completely autonomously push data into the data mesh.
But wait! Someone just pushed lots of personal information, and that’s not good! Let’s introduce some governance. Put up a small “BestPractices.MD” with a section “How to handle data sensitively”, and invite the teams to submit pull requests to that.
Perfect, we just federated the data governance.
The only question is, how do we truly control this? How about, instead of giving teams push rights, we let them create pull requests? Does that sound like a solution? I hope not, because it would take away a lot of autonomy and pretty much destroy the network effects we would gain by running a data mesh.
So? The only good option here is to have some automatic way of checking the pushed data, like a git hook scanning for X@Y.Z patterns, or something like that. So let’s implement that, and congratulations: we just added the “computational” to the federated governance!
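As a minimal sketch of such a check, here is what a Python hook script could look like. The naive email regex and the function names are just illustrative assumptions; a real pre-commit or pre-receive hook would wire this up to the files changed in the push.

```python
import re
import sys

# Naive "X@Y.Z" email pattern -- illustrative only, real PII scanning is harder.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def scan_file(path):
    """Return the line numbers in `path` that contain an email-like string."""
    with open(path, encoding="utf-8", errors="ignore") as f:
        return [no for no, line in enumerate(f, start=1) if EMAIL_PATTERN.search(line)]

def run_hook(paths):
    """Hook body: report suspicious lines, return a git-style exit code (0 = ok)."""
    failed = False
    for path in paths:
        hits = scan_file(path)
        if hits:
            failed = True
            print(f"{path}: possible personal data on lines {hits}")
    return 1 if failed else 0

# In an actual hook script you would end with: sys.exit(run_hook(sys.argv[1:]))
```

A non-zero exit code is all git needs to reject the push, so the governance rule is enforced without any human in the loop.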
At our destination: So we quickly arrived at our destination, federated computational governance. I hope the trade-off and the key questions have become apparent to you: How decentralized should this be? How much do we want to control? How much freedom do we have to take away, and what kind of value do we gain, by making the mesh work together?
Can I ignore this if I am small? The problem with governance is that, for a data mesh, it is what keeps it “meshy”. If your mesh is small, just a handful of CSVs, but the CSVs have no way of joining together because people keep messing up the conformed dimensions used for joining, your data mesh becomes pretty worthless. So there is no way around federated computational governance if you’re implementing a data mesh.
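To make the conformed-dimensions point concrete, a check like the following could run as another automated gate in our tiny CSV mesh. The file layout and the `customer_id` join key are hypothetical, purely for illustration:

```python
import csv

def missing_dimension_keys(fact_path, dim_path, key):
    """Return join keys used in the fact CSV that do not exist in the dimension CSV.

    A non-empty result means someone broke the conformed dimension:
    their rows can no longer be joined to the shared dimension table.
    """
    with open(dim_path, newline="", encoding="utf-8") as f:
        dim_keys = {row[key] for row in csv.DictReader(f)}
    with open(fact_path, newline="", encoding="utf-8") as f:
        fact_keys = {row[key] for row in csv.DictReader(f)}
    return sorted(fact_keys - dim_keys)
```

Rejecting pushes whenever this returns a non-empty list is exactly the kind of small, automatic rule that keeps the CSVs joinable, and the mesh “meshy”.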
Now, this might sound like something that is only important in the data mesh context, but I think it is already alive in a lot of companies extracting data at scale without specifically following a data mesh paradigm. Most machine learning systems in fact run in a mostly decentralized way, exposing data through APIs, just like most dashboards created by decentralized data analysts, and both all too often produce data governance issues.
Additionally, federated computational governance opens up a lot of business opportunities, because automatic data classification, identification, and access-right monitoring aren’t covered by many solutions out there, but they should be.