Oliver Molander did a nice analysis of dbt and analytics engineering in general. I leave it to you to read the piece, but I cannot let it pass without commenting on one quote:
“Bob Muglia, former CEO of Snowflake and investor & board member of Fivetran, sees that in a long-term perspective, data lakes won’t have a place in the modern data stack. Muglia underlines that you have to look at the evolution of how infrastructure changes over time to take on new capabilities. He predicts that five years from now, data is going to sit behind a SQL prompt by and large, and then over time evolve into relational. He predicts that relational will dominate and SQL data warehouses will replace data lakes.”
I think Bob Muglia is completely wrong. I think data lakes and data warehouses will converge in the future. Data lakes will keep the separation of compute and storage, which is the new and shiny thing in Redshift and Snowflake but has always been there on the data lake side. I think about 98% of future data will be unstructured, which means SQL will play a much less important role. It will be there to model the few key structured data sets, but a lot of the emerging data is real-time and diverse in structure, so SQL as we know it will not be the key player in the future of data.
This still leaves a good spot for the analytics engineer, the one who takes the few most important data sets and models them, but it also means that the rest has to be dealt with!
So no, I don’t think dbt will be bigger than Spark. Some new players will very likely enter this space, but dbt, with its clear-cut vision, will stay focused on its niche.