Real data behaves in many unexpected ways that can break even the most well-engineered data pipelines. To catch as much of this weird behaviour as possible before users are affected, the ING Wholesale Banking Advanced Analytics team has created 7 layers of data testing that they use in their CI setup and Apache Airflow pipelines to stay in control of their data.
Batch data processing, historically known as ETL, is extremely challenging. It’s time-consuming, brittle, and often unrewarding. How can the process be simplified? Use the right tools and follow best practices.