DAGs suck; FAIR Data; E(t)LT(P); ThDPTh #48


Three Data Point Thursday
By Sven Balnojan  • Issue #53 • View online
Hey everyone, I just opened up a new section in this newsletter called “Notes from the ThDPTh community”. If you are deeply involved in the data space and feel like sharing something interesting, go ahead and just ping me!
This week, my thoughts were all over the place. I got caught up in a conversation about how to properly handle communication in data stacks. And no, in my opinion a data orchestrator is usually not the right tool for that.
I also talked to someone about the FAIR data framework and got pulled into a weird concept by Meltano, which immediately sparked my thoughts on E(t)LT(P).
See below…
I’m Sven. I collect “Data Points” to help understand & shape the future, one powered by data.

Sven’s Thoughts
If you only have 30 seconds to spare, here are what I would consider actionable insights for investors, data leaders, and data company founders.
- Data orchestrators are overrated. Data developers spend a lot of time creating “DAGs”, large graphs of dependencies. Guess what: software developers spend a lot of time breaking up dependencies and reducing them.
- DAGs in general are a bad thing. DAGs are dependencies, and dependencies cause a lot of problems, simply because they cascade breakdowns, errors, and bugs throughout the DAG.
- Data systems can use fault-tolerant communication too! It is not necessary to actually build a huge DAG. Instead, we can use the common communication patterns from software engineering to, at the very least, increase the fault tolerance and reduce the coupling of the dependent components.
- E(t)LT(P) might become a thing. The name is way too long for an architecture pattern, and I don’t like it. But that doesn’t mean that adding the “P”, for publishing data sets to another place, isn’t something that is happening. Meltano’s AJ Steers rightly points out that publishing data sets to someplace else happens all over the place and is pretty important.
- Data should be FAIR (findable, accessible, interoperable, reusable). The FAIR framework is already 5 years old, but I find it easier to handle than the Data Mesh list of attributes. I think that to make data usable at a company, we will have to make it at least FAIR, even in the smallest start-ups out there.
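To make the point about cascading failures a bit more concrete, here is a minimal Python sketch. It is not taken from any orchestrator or tool. Every name in it (`run_chain`, `SnapshotStore`, `flaky_extract`, `decoupled_summary`) is hypothetical and only there for illustration. It contrasts a tightly coupled chain, where one failing step takes everything downstream with it, with a consumer that reads the last successfully published snapshot instead.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


class UpstreamError(Exception):
    """Raised by a failing step (hypothetical, for illustration only)."""


def run_chain(steps: List[Callable], seed=None):
    """Tightly coupled: each step directly consumes the previous step's output."""
    value = seed
    for step in steps:
        value = step(value)  # an exception here aborts every later step
    return value


@dataclass
class SnapshotStore:
    """Hypothetical stand-in for a table, bucket, or topic holding published data."""
    _snapshots: Dict[str, List[int]] = field(default_factory=dict)

    def publish(self, name: str, rows: List[int]) -> None:
        self._snapshots[name] = rows

    def latest(self, name: str) -> Optional[List[int]]:
        return self._snapshots.get(name)


def flaky_extract(_=None) -> List[int]:
    """Simulates an upstream source that is currently unavailable."""
    raise UpstreamError("source API is down")


def summarize(rows: List[int]) -> int:
    return sum(rows)


def decoupled_summary(store: SnapshotStore) -> Optional[int]:
    """Loosely coupled: reads the last published snapshot, whatever its age."""
    rows = store.latest("orders")
    return None if rows is None else summarize(rows)


if __name__ == "__main__":
    store = SnapshotStore()
    store.publish("orders", [10, 20, 30])  # an earlier, successful run

    # Coupled chain: the extract failure prevents the summary from running.
    try:
        run_chain([flaky_extract, summarize])
    except UpstreamError as err:
        print("chain failed:", err)

    # Decoupled: the producer fails, but the old snapshot is still readable.
    try:
        store.publish("orders", flaky_extract())
    except UpstreamError:
        pass
    print("decoupled summary:", decoupled_summary(store))
```

The trade-off is that the decoupled consumer may work on stale data, so a real setup would likely also track how old a snapshot is and decide how much staleness is acceptable.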
🎁 Notes from the ThDPTh community
Nothing new this week! If you have something, just ping me by replying to this email!
🎄 Thanks => Feedback!
Thanks for reading this far! I’d also love it if you shared this newsletter with people whom you think might be interested in it.
Data will power every piece of our existence in the near future. I collect “Data Points” to help understand & shape this future.
If you want to support this, please share it on Twitter, LinkedIn, or Facebook.
And of course, leave feedback if you have a strong opinion about the newsletter! So?
P.S.: I share things that matter, not the most recent ones. I share books, research papers, and tools. I try to provide a simple way of understanding all these things. I tend to be opinionated. You can always hit the unsubscribe button!
Sven Balnojan

Data; Business Intelligence; Machine Learning, Artificial Intelligence; Everything about what powers our future.
