I’ve been playing around with Dagster lately, comparing it to Prefect and Airflow, and I’ve come to like it. Two things make Dagster so much fun. Of the data orchestrators currently in vogue, it’s the one that:
- Has the most compelling “vision”: being a true orchestrator that abstracts away the tools below it
- Is the most fun to develop!
What’s the vision? To orchestrate: to build one overarching “DAG” regardless of which tools you choose. You can use a Jupyter notebook, Spark, SQL, whatever; Dagster doesn’t care. That matches what is happening in the typical data team, which already mixes several such tools, and it will very likely keep happening in most teams.
Why is it fun to develop? First and foremost, because Dagster makes it easy to write tests! Tests for the smallest units, tests for the whole flow. You can mock data and run things quickly on your laptop, and you can easily swap environments to run against either an integration or a production environment. That’s made possible by explicit inputs and outputs and a stronger system around the “metadata of the flow”.
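The testing property described above can be sketched without Dagster itself, so the sketch below is plain Python and deliberately not Dagster’s API. It assumes the shape Dagster encourages: each step takes inputs and returns an output, and the environment-specific part (here a hypothetical “warehouse”) is passed in rather than hard-coded. All names (`Warehouse`, `FakeWarehouse`, `clean_orders`, `daily_revenue`, `run_flow`) are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import Protocol


class Warehouse(Protocol):
    """Hypothetical environment-specific resource (integration or production DB)."""

    def load_orders(self) -> list[dict]: ...

    def write_table(self, name: str, rows: list[dict]) -> None: ...


@dataclass
class FakeWarehouse:
    """In-memory stand-in with mock data, for runs on a laptop or in tests."""

    orders: list[dict]
    written: dict[str, list[dict]] = field(default_factory=dict)

    def load_orders(self) -> list[dict]:
        return list(self.orders)

    def write_table(self, name: str, rows: list[dict]) -> None:
        self.written[name] = rows


# Smallest unit: a pure step whose output depends only on its input.
def clean_orders(orders: list[dict]) -> list[dict]:
    """Drop orders without an amount and convert the amount to float."""
    return [{**o, "amount": float(o["amount"])} for o in orders if o.get("amount") is not None]


def daily_revenue(orders: list[dict]) -> dict[str, float]:
    """Sum order amounts per day."""
    totals: dict[str, float] = {}
    for o in orders:
        totals[o["day"]] = totals.get(o["day"], 0.0) + o["amount"]
    return totals


# Whole flow: wires the steps together; the warehouse is injected, never hard-coded.
def run_flow(warehouse: Warehouse) -> dict[str, float]:
    revenue = daily_revenue(clean_orders(warehouse.load_orders()))
    rows = [{"day": d, "revenue": r} for d, r in sorted(revenue.items())]
    warehouse.write_table("daily_revenue", rows)
    return revenue


# Test for the smallest unit.
def test_clean_orders() -> None:
    result = clean_orders([{"day": "mon", "amount": "2"}, {"day": "mon", "amount": None}])
    assert result == [{"day": "mon", "amount": 2.0}]


# Test for the whole flow, using mock data instead of a real environment.
def test_flow_with_mock_data() -> None:
    wh = FakeWarehouse(orders=[
        {"day": "mon", "amount": 2},
        {"day": "mon", "amount": 3},
        {"day": "tue", "amount": 1},
    ])
    assert run_flow(wh) == {"mon": 5.0, "tue": 1.0}
    assert wh.written["daily_revenue"][0] == {"day": "mon", "revenue": 5.0}
```

In Dagster the same split roughly maps to ops or assets (the steps) and resources (the warehouse), so switching from a laptop run to an integration or production run means swapping the resource, not rewriting the steps.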
Until I get around to writing a “test-driven Dagster” tutorial, I recommend simply taking a look at Mapbox’s journey.