by Wolfram Wingerath
Since data processing is at the core of many businesses today, good data quality is often required for smooth operations and valid business decisions. But what does "good" data quality actually mean? When is it "good enough"? And how do you make sure it stays "good enough" in the face of growing data volumes and evolving business processes?
In this presentation, we will introduce you to challenges and best practices for data validation in data-intensive domains, highlighting its critical impact on everything from product optimization and customer reporting to machine learning use cases. We will start with the dimensions along which data quality can be quantified, before we explore concrete strategies for ensuring quality along each of them. We will discuss how requirements can be specified using data constraints and will illustrate this with a practical example. Finally, we will highlight the inherent challenges of handling data validation in Big Data applications and share our experience from having done this for more than 10 years.
By the end of the talk, you will not only understand the significance of data validation in a data-centric world, but you will also have a grasp of why this is a complex task and how it can be accomplished at scale.
18 Nov 2023