Why "bad data quality" is the supervillain you can never beat

 

Author: Johan Nodén, Data Architect and Advisor @ Random Forest

 

...unless you use the (not so) secret weapon

There is a lot of talk about data quality these days, and "bad data quality" often gets to play the "bad guy" in stories about failed projects, missed business opportunities or spectacular errors where customers have been treated badly. But who is this mysterious villain we call "bad data quality"?

A quick search for data quality turns up a lot of articles about common data quality issues such as duplicates, missing values and outdated data, followed by detailed descriptions of quality dimensions such as consistency, integrity, accuracy and so on. While these are a great help in working with data quality, I believe that most of us miss the real point.

What is quality?

According to ISO 9000, the vocabulary standard behind ISO 9001, the general term quality is the “degree to which a set of inherent characteristics [or distinguishing features] of an object fulfills requirements”. So, quality is all about being fit for purpose, about fulfilling requirements. If something has quality, it fits its purpose. For example, if the purpose is to conveniently transport a person from one place to another, a luxurious sports car will probably be considered to have higher quality than a dull truck. But if the purpose is to transport a pile of dirt, the truck will probably be considered to have higher quality.

How we measure the quality of a car depends on what we want to use it for. So what would happen if we just went off and bought a car without considering any requirements for the purpose we want to use it for? Unless we are very lucky, we would probably end up with a low-quality car.

Stop focusing on data quality

Data quality is no different from quality in general. Quality is the degree to which an object (or data) fits its purpose and fulfills its requirements. If we do not know the requirements, we cannot get any quality. Or, put the other way around: we cannot know what data quality is if we do not know the data requirements. Still, we are always eager to paint the picture of a villain and name it "bad data quality" when we trace a failure to data. Trying to fight bad data without figuring out our data requirements is like fighting an invisible ghost. We are never going to beat it!

Start focusing on information requirements

The first reaction when trying to beat bad data quality is to start chasing "good data quality". Working with data quality dimensions, policies and data classification can give a lot of value, but it doesn't matter how much work we put into it if we don't know what we are aiming for. "Good data quality" can only be defined based on information requirements. We must start with the information we need in our business. First, start with the business goals and business requirements. Then, figure out what information we need to meet those business goals and requirements. Only after that, with our information requirements documented, can we start talking about data quality.

Prepare your (secret) weapon

Bad data quality is simply when the data we have deviates from the information we need. And only when we know what we need can we start talking about the deviation. Only when we know what we need can we start measuring how much we deviate from the requirements. Only when we know what we need can we define guidelines and policies for data to meet the requirements. Only when we know what we need can we beat the bad guy "bad data quality".
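To make "measuring the deviation" a bit more concrete, here is a minimal sketch in Python. It is only an illustration under assumptions of my own: the Requirement class, the deviation_report function, the two example requirements and the customer records are all hypothetical, not taken from any particular tool or project. The point is simply that each check is written down as an information requirement first, and the data is measured against it afterwards.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Requirement:
    """A documented information requirement (hypothetical structure)."""
    name: str
    check: Callable[[dict], bool]  # returns True when a record meets the requirement


def deviation_report(records, requirements):
    """Return, per requirement, the share of records that do not meet it."""
    total = len(records)
    report = {}
    for req in requirements:
        failures = sum(1 for record in records if not req.check(record))
        report[req.name] = failures / total if total else 0.0
    return report


# Assumed business need: invoices are sent by email and taxed per country,
# so every customer needs an email address and a two-letter country code.
requirements = [
    Requirement("customer has an email address",
                lambda r: bool(r.get("email"))),
    Requirement("customer has a two-letter country code",
                lambda r: len(r.get("country") or "") == 2),
]

# Made-up sample data, for illustration only.
customers = [
    {"id": 1, "email": "a@example.com", "country": "SE"},
    {"id": 2, "email": "", "country": "SE"},
    {"id": 3, "email": "c@example.com", "country": None},
    {"id": 4, "email": "d@example.com", "country": "Sweden"},
]

report = deviation_report(customers, requirements)
for name, share in report.items():
    print(f"{name}: {share:.0%} of records deviate")
```

Without the two requirements at the top there would be nothing to measure against: the same customer records could be perfectly fine for one purpose and useless for another.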

So, instead of chasing after the bad data quality ghost, we must start working with information. Treat information (and data) as a necessity for the business. Define and document the information we need to run and enhance the business. Start working with information requirements when we digitalize our business. Instead of only focusing on functional and non-functional requirements in IT projects, start focusing on information requirements.

Simply put, start doing Information Management (IM)! That is the (secret) weapon to beat the unbeatable "bad data quality" villain.

