Jarkko Moilanen

#76 What is Data Observability and why does it matter?

Data observability will be key to the corporate shift toward a data-driven approach, and it is therefore expected to grow substantially. Observability is no longer just for software engineering. With the rise of data downtime and the increasing complexity of the data stack, it has become a critical concern for data teams, too.


Data observability rests on five pillars. Each pillar encapsulates a series of questions that, in aggregate, provide a holistic view of data health.

  • Freshness: is the data recent? When was the last time it was generated? What upstream data is included/omitted?

  • Distribution: is the data within accepted ranges? Is it properly formatted? Is it complete?

  • Volume: has all the data arrived?

  • Schema: what is the schema, and how has it changed? Who has made these changes and for what reasons?

  • Lineage: for a given data asset, what are its upstream sources, and which downstream assets does it affect? Who are the people generating this data, and who is relying on it for decision-making?
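
To make the five pillars more concrete, here is a minimal sketch of how each question could be turned into a simple check. Everything in it is an assumption made for illustration: the "orders" table, the column names, the thresholds, and the lineage map are hypothetical and do not come from any particular observability tool.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical expectations for an "orders" table (all values are assumptions).
EXPECTED_SCHEMA = {"order_id": int, "amount": float, "created_at": datetime}
MAX_AGE = timedelta(hours=24)      # freshness threshold
MIN_ROWS = 3                       # volume threshold
AMOUNT_RANGE = (0.0, 10_000.0)     # accepted range for the distribution check

# Hypothetical lineage map: asset -> assets that read directly from it.
LINEAGE = {
    "raw_orders": ["orders"],
    "orders": ["revenue_dashboard", "churn_model"],
}


def check_freshness(rows, now):
    """Freshness: was the newest record generated recently enough?"""
    if not rows:
        return False
    newest = max(row["created_at"] for row in rows)
    return now - newest <= MAX_AGE


def check_volume(rows):
    """Volume: did at least the expected number of rows arrive?"""
    return len(rows) >= MIN_ROWS


def check_distribution(rows):
    """Distribution: is every amount present and within the accepted range?"""
    low, high = AMOUNT_RANGE
    return all(
        row.get("amount") is not None and low <= row["amount"] <= high
        for row in rows
    )


def check_schema(rows):
    """Schema: does every row have exactly the expected columns and types?"""
    return all(
        row.keys() == EXPECTED_SCHEMA.keys()
        and all(isinstance(row[name], kind) for name, kind in EXPECTED_SCHEMA.items())
        for row in rows
    )


def downstream_of(asset):
    """Lineage: list every asset affected by `asset`, directly or indirectly."""
    seen, queue = [], list(LINEAGE.get(asset, []))
    while queue:
        current = queue.pop(0)
        if current not in seen:
            seen.append(current)
            queue.extend(LINEAGE.get(current, []))
    return seen


def run_checks(rows, now):
    """Run the four row-level pillar checks and return one result per pillar."""
    return {
        "freshness": check_freshness(rows, now),
        "volume": check_volume(rows),
        "distribution": check_distribution(rows),
        "schema": check_schema(rows),
    }


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
rows = [
    {"order_id": 1, "amount": 19.9, "created_at": now - timedelta(hours=30)},
    {"order_id": 2, "amount": 250.0, "created_at": now - timedelta(hours=2)},
    {"order_id": 3, "amount": 42.0, "created_at": now - timedelta(hours=1)},
]
report = run_checks(rows, now)
affected = downstream_of("raw_orders")
```

Here `report` holds one pass/fail result per pillar, and `affected` lists the assets that would need attention if `raw_orders` broke. A real setup would typically derive thresholds from historical data rather than hard-coding them.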

Data observability is the blanket term for monitoring and improving the health of data within applications and systems like data pipelines.

Data observability vs. monitoring?

“Data monitoring” lets you know the current state of your data pipeline or your data. It tells you whether the data is complete, accurate, and fresh. It tells you whether your pipelines have succeeded or failed. Data monitoring can show you whether things are working or broken, but it doesn’t give you much context beyond that.


As such, monitoring is only one function of observability. “Data observability” is an umbrella term that includes:

  • Monitoring—a dashboard that provides an operational view of your pipeline or system

  • Alerting—both for expected events and anomalies

  • Tracking—ability to set and track specific events

  • Comparisons—monitoring over time, with alerts for anomalies

  • Analysis—automated issue detection that adapts to your pipeline and data health

  • Next best action—recommended actions to fix errors

By encompassing a basket of activities rather than monitoring alone, observability is much more useful to engineers. Data observability doesn’t stop at describing the problem. It provides context and suggestions to help solve it.
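
The difference between the two can be sketched in a few lines. The example below is only an illustration under stated assumptions: the `monitor` and `observe` functions, the z-score threshold, and the suggested actions in `ACTIONS` are all hypothetical. The monitoring function reports working or broken, while the observability function compares today's row count with recent history, flags an anomaly, and attaches context and a next best action.

```python
from statistics import mean, stdev

# Hypothetical suggestions for each kind of anomaly (assumptions for illustration).
ACTIONS = {
    "drop": "Check whether the upstream export finished and re-run the load.",
    "spike": "Check for duplicate loads before downstream jobs consume the data.",
}


def monitor(row_count):
    """Plain monitoring: only says whether the load looks working or broken."""
    return "ok" if row_count > 0 else "failed"


def observe(history, row_count, z_threshold=3.0):
    """Compare today's row count with recent history and explain any anomaly.

    `history` needs at least two past row counts so a spread can be computed.
    """
    baseline = mean(history)
    spread = stdev(history)
    # Simplification: with no variation in history, the z-score is treated as 0.
    z_score = (row_count - baseline) / spread if spread else 0.0
    if abs(z_score) <= z_threshold:
        return {"status": "ok", "z_score": round(z_score, 2)}
    direction = "drop" if z_score < 0 else "spike"
    return {
        "status": "anomaly",
        "z_score": round(z_score, 2),
        "context": f"row count {row_count} vs. recent average {baseline:.0f}",
        "next_best_action": ACTIONS[direction],
    }


history = [1000, 1020, 980, 1010, 990]   # row counts from previous days
today = 400

basic = monitor(today)            # the load ran and produced rows
detailed = observe(history, today)
```

With these numbers, `monitor` still reports the load as fine because rows did arrive, whereas `observe` flags the sharp drop and suggests where to look first.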
