I've been playing around with Dagster lately, comparing it to Prefect and Airflow, and I've come to like it. Two things make Dagster so much fun for me. Of the data orchestrators currently in vogue, it's the one that:
- Has the most compelling "vision": it focuses on being a true orchestrator and abstracts away the tools below it
- Is the most fun to develop with!
What's the vision? To orchestrate: to build one overarching "DAG" regardless of your tool choice. You can use a Jupyter notebook, Spark, SQL, whatever; Dagster doesn't care. That resonates very well with how the typical data team works today, with several tools side by side, and that will very likely remain true for most teams.
Why is it fun to develop with? First and foremost, because Dagster makes it easy to write tests: tests for the smallest units and tests for the whole flow. You can mock data and run things quickly on your laptop, and you can easily swap environments to run against either an integration or a production environment. That's made possible by explicit inputs & outputs and a stronger system around the "metadata of the flow".
For now, as a resource, I recommend simply taking a look at Mapbox's journey, until I get around to writing a "test-driven Dagster" tutorial.