I just watched JPMC's talk about their data mesh journey. It's an interesting watch, and there are definitely some important parts in it, but I'd like to highlight four things:
- JPMC is moving from an on-premises network to the cloud and calls it "Data Lake via Data Mesh", so they are doing both a cloud migration and a data mesh transformation at once.
- They seem to lack a connection to company strategy. I really like the HelloFresh approach I shared in a previous newsletter, which derives the need for a data mesh from their flywheel (which is essential to the company strategy). In the JPMC case, their stated "goals" actually seem to contradict the data mesh paradigm: they appear to aim for lower cost (in terms of storage?), whereas a data mesh is, of course, more expensive to run than a centralized data lake.
- They seem to be building a very AWS-centric data mesh, using a lot of AWS products. I usually warn against that level of coupling, because they are tying themselves both to individual components and to a single cloud provider. That will make it very hard to swap out any component at all, should a better solution emerge.
- I still think that, at their scale, they will very likely need a data mesh; they might just end up building the wrong one on their first try.
JPMC is building the data mesh on top of AWS, starting from a common infrastructure that also handles data ingestion, and placing each domain/data product inside its own AWS account. They use AWS Lake Formation to manage access rights (among other things) for their new data lake, while also keeping the door open for a hybrid-cloud strategy for quite a while. They will use Amazon Athena for SQL-based access and GraphQL for application-level access.
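To make the setup a bit more concrete, here is a minimal sketch of what a cross-account Lake Formation grant for one data product might look like. This is not from the talk; the account IDs, database, table, and role names are all hypothetical, and in practice the payload would be passed to boto3's `lakeformation` client via `grant_permissions`.

```python
# Sketch: grant a consuming domain read access to a producing domain's
# data product table via Lake Formation. All identifiers are hypothetical.
# With boto3 this would be:
#   boto3.client("lakeformation").grant_permissions(**grant)

grant = {
    # IAM role of the consuming domain (lives in its own AWS account)
    "Principal": {
        "DataLakePrincipalIdentifier": "arn:aws:iam::111122223333:role/consumer-domain"
    },
    # The data product's table, owned by the producing domain's account
    "Resource": {
        "Table": {
            "CatalogId": "444455556666",          # producing account
            "DatabaseName": "orders_domain",
            "Name": "orders_curated",
        }
    },
    # Read-only: consumers can query and inspect, not modify
    "Permissions": ["SELECT", "DESCRIBE"],
}

# Once granted, the consumer could query the shared table via Athena, e.g.:
athena_sql = (
    "SELECT order_id, status "
    "FROM orders_domain.orders_curated "
    "WHERE order_date >= DATE '2021-01-01'"
)
```

The appeal of this model is that the producing domain keeps ownership of the data in its own account and only hands out scoped, revocable permissions, which fits the data-product thinking discussed in the talk.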
One thing they figured out early on was that "data product thinking" is the hardest part of all of this. They put quite some thought into it, so be sure to listen to those segments.