What: G. Costa & R. Souza explain how Miro's data team grew from two data engineers to 15. They share their journey and why they focused on data monitoring.
The team starts out with a typical stack: most components run in the cloud, and Grafana plus Prometheus handle general monitoring, with Slack alerts notifying the team so they can manually fix problems with data and infrastructure.
They then set out to automate and standardize their monitoring and incident management.
My perspective: A few interesting things about the article: Miro uses a model very similar to the one we’ve been employing at Unite: a more platform-like data team that serves data and infrastructure to analytics engineering “teams” as well as “analyst teams”.
This model is focused on providing data, in raw or modeled form, to others, which makes the service level of the data a prime topic. That is exactly what the Miro team identifies as its key challenge.
If your data team is more cross-functional and also builds models, maintains a BI tool, and does data science projects, your prime topics will also be more diversified, and the quality of data won’t make up all of the service level you’re providing.
I really like and recommend the structured approach the team takes: they classify incidents by severity and then map those severities onto their components. They do so by thinking through the value the components provide to the end-users. They don’t just focus on the technical components but rather split the data into business domains and identify which domains are important and which are less so.
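
To make the idea concrete, here is a minimal sketch of what such a mapping could look like. It is not Miro's actual implementation: the domain names, the importance labels, the four severity levels, and the classification rule are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Hypothetical severity scale; the article's actual levels may differ.
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Hypothetical business domains, rated by the value they provide to end-users.
DOMAIN_IMPORTANCE = {
    "billing": "important",
    "product_usage": "important",
    "internal_experiments": "less_important",
}


@dataclass(frozen=True)
class Incident:
    domain: str
    data_unavailable: bool  # True if end-users cannot get the data at all


def classify(incident: Incident) -> Severity:
    """Derive severity from domain importance and user impact (assumed rule)."""
    # Unknown domains are treated as less important in this sketch.
    importance = DOMAIN_IMPORTANCE.get(incident.domain, "less_important")
    if importance == "important":
        return Severity.CRITICAL if incident.data_unavailable else Severity.HIGH
    return Severity.MEDIUM if incident.data_unavailable else Severity.LOW
```

The point of the design is that severity is derived from the business domain and the impact on end-users rather than from the technical component that failed, so the same pipeline failure can rank differently depending on whose data it affects.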