We will add a second architectural plane to our data platform, a “data storage plane”. This concept will allow anyone to:
- store the location of a data product
- get a Terraform template for launching storage, in the form of an AWS S3 bucket, anywhere, and have it registered automatically.
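To make the two capabilities above concrete, here is a minimal sketch of what such a storage plane could look like. The class `StoragePlane`, its method names, and the bucket naming are hypothetical, and the sketch simplifies "automatic registration" by recording the location at the moment the template is handed out rather than when the bucket is actually created.

```python
from dataclasses import dataclass, field


@dataclass
class StoragePlane:
    """Hypothetical in-memory registry: data product name -> storage location."""

    locations: dict[str, str] = field(default_factory=dict)

    def register(self, product: str, location: str) -> None:
        # Store (or overwrite) the location of a data product.
        self.locations[product] = location

    def locate(self, product: str) -> str:
        # Look up where a data product lives; raises KeyError if unknown.
        return self.locations[product]

    def bucket_template(self, product: str, bucket: str) -> str:
        # Assumption: we register the bucket as soon as the template is
        # requested. A real setup would more likely register on apply.
        self.register(product, f"s3://{bucket}/")
        return (
            f'resource "aws_s3_bucket" "{product}" {{\n'
            f'  bucket = "{bucket}"\n'
            f"}}\n"
        )


plane = StoragePlane()
template = plane.bucket_template("sales_orders", "acme-sales-orders")
print(template)
print(plane.locate("sales_orders"))
```

A producer would paste the returned snippet into their own Terraform configuration, while the platform already knows where the data product will live.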
Additionally, we modify the “data product plane” internally so that it uses the data storage plane to store the locations. This way, we are creating optionality! Basically, we’re making a non-breaking change to our interface, allowing:
- people who don’t have their own storage or enough Terraform skills to simply use the CSV storage by default
- people with advanced knowledge to use the advanced functionality if they want to.
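The non-breaking nature of this change can be sketched with an optional parameter. The class `DataProductPlane`, the `publish` method, and the default location `repo://data-products` are all hypothetical names chosen for illustration; the internal dictionary stands in for the data storage plane.

```python
from typing import Optional

# Assumption: a made-up URI scheme for the default CSV storage.
DEFAULT_CSV_ROOT = "repo://data-products"


class DataProductPlane:
    def __init__(self) -> None:
        # Stand-in for the data storage plane: product name -> location.
        self._locations: dict[str, str] = {}

    def publish(self, name: str, location: Optional[str] = None) -> str:
        # Existing callers pass only a name and keep getting CSV storage.
        # Advanced callers may pass the location of their own storage.
        if location is None:
            location = f"{DEFAULT_CSV_ROOT}/{name}.csv"
        self._locations[name] = location
        return location

    def locate(self, name: str) -> str:
        return self._locations[name]


plane = DataProductPlane()
print(plane.publish("sales_orders"))            # default CSV storage
print(plane.publish("clicks", "s3://acme-clicks/"))  # producer-owned bucket
```

Because the new parameter is optional, nothing changes for producers who never touch it, which is what makes the change non-breaking.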
Great! We’ve just taken a big leap for the data producers: we’ve taken a large burden away from them, which enables many more teams of data producers to use the platform.
But wait, what about the consumers?
The Catch: Data Consumers Just Lost Value!
You probably already noticed that we just made the data consumers worse off. Why? They still have one single point of truth for the metadata, the data catalog.
But they lost their single point of access, the repository.
Luckily, there is some smart technology that lets us change that. We can use a query engine like Trino to access both the data located in AWS S3 and the CSV files.
Note: …with some modifications. There is a CSV connector, but I’m not sure about its maintenance level. The easiest solution is probably to just dump the CSVs into another data store that has a well-maintained Trino connector.
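As a rough sketch of how a query engine restores the single point of access: Trino addresses tables as `catalog.schema.table`, so data products in different stores can be joined in one query. The mapping below, including the catalog, schema, and table names, is an assumption for illustration, as are the helper functions `trino_table` and `join_query`.

```python
# Hypothetical mapping from data product name to a Trino table.
# All catalog, schema, and table names are assumptions for illustration.
TRINO_TABLES = {
    "sales_orders": "hive.sales.orders",          # data living in an S3 bucket
    "customers": "postgresql.public.customers",   # CSVs dumped into another store
}


def trino_table(product: str) -> str:
    # Resolve a data product to its fully qualified Trino table name.
    return TRINO_TABLES[product]


def join_query(left: str, right: str, key: str) -> str:
    # Build one SQL statement that spans two different storage locations.
    return (
        f"SELECT * FROM {trino_table(left)} AS l "
        f"JOIN {trino_table(right)} AS r ON l.{key} = r.{key}"
    )


print(join_query("sales_orders", "customers", "customer_id"))
```

The consumer only needs the data product names; where the data physically sits stays hidden behind the query engine.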
Now, let us take this a step further. The data consumers currently have a “surplus”: they get more value than they had before the platform, whereas the data producers are still only at “roughly 0 additional value”. Let’s see what we can do to change that.