Data quality is a buzzword in the digital age.
What is data quality and why is it so important?
“Data quality” is a term that often stays in the background, yet it plays an important role in many fields. Data is vital to winning a place in the market, especially in enterprise data management.
Data Quality Examples
The following examples illustrate the need for data quality.
- A customer should not be able to enter an age in a field meant for marital status.
- When a customer visits a store, there is a high chance that some details on the forms will be filled in incorrectly; for example, a customer in a hurry may not give a correct phone number.
- Billing staff may also wrongly enter the store address as a default in place of the customer address, which adds bad-quality data that persists in the system.
This data may be crucial: the customer may be more than a one-time guest, and without accurate details the customer's actual interest in the store becomes obscured.
This blog post covers data quality, its significance, its business impacts, best practices to follow, and Mastech InfoTrellis’ specialization in data validation.
Business Impact of Data Quality
Recent research from Gartner indicates that poor data quality is a primary reason for about 40% of failing business initiatives.
Low-quality data costs American businesses alone around $600 billion, which in turn contributes to the failure of advanced data and technology initiatives.
Significance of Data Quality
Successful large businesses clearly understand the importance of quality data.
The quality of data directly affects:
- The cost of marketing campaigns and the identification of the right audience
- Knowledge of the customers' interests
- The conversion of prospects into sales
- The turnaround time for converting a prospect into a sale
- The accuracy of the business decisions that are made
Quality assurance consultants play an integral part in improving the data and ensuring that the data consumed by upstream and downstream systems is credible.
Data Quality Techniques
Before a system consumes any data, the data needs to be cleansed, which also helps in understanding the customer's data model. After cleansing, the data needs to be profiled for a deeper understanding of the data model and the patterns in which the data has accumulated.
Figure 1: Data Quality Techniques
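As a rough illustration of the profiling step, the sketch below summarizes a single column of records: how many rows it has, how many are blank, how many distinct values appear, and which values are most frequent. The function name `profile_column`, the field names, and the sample rows are all hypothetical and are not taken from any client implementation.

```python
from collections import Counter


def profile_column(records, column):
    """Summarize one column: row count, blank count, distinct count, top values."""
    values = [rec.get(column) for rec in records]
    # Treat None and whitespace-only strings as blank.
    non_blank = [
        str(v).strip() for v in values if v is not None and str(v).strip() != ""
    ]
    counts = Counter(non_blank)
    return {
        "rows": len(values),
        "blank": len(values) - len(non_blank),
        "distinct": len(counts),
        "most_common": counts.most_common(3),
    }


# Hypothetical sample rows, for illustration only.
sample = [
    {"country": "US", "postal_code": "10001"},
    {"country": None, "postal_code": "TBD"},
    {"country": "US", "postal_code": ""},
    {"country": "CA", "postal_code": "K1A 0B1"},
]

country_profile = profile_column(sample, "country")
postal_profile = profile_column(sample, "postal_code")
```

A profile like this is usually a starting point: a high blank count or an unexpected frequent value (such as "TBD") points to where cleansing rules are needed.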
One of our clients had issues providing quality data to the subscribing source systems. The existing implementation did not achieve the goal of providing quality data, and therefore required production fixes by the customer's business or IT team.
There were several issues with the current implementation that hindered the business from achieving its goal of providing good quality data to subscribing source systems. A large proportion of these issues had to do with adding and updating customer information.
Mastech InfoTrellis Solution Expertise
As data consultants, we followed the data validation cycle and analyzed the “address” data to identify its patterns. Since the customer had reported bad data quality, analyzing the data patterns was the first step.
Sampling Example of address data
- Invalid address – records containing duplicate addresses
- Store address provided as customer address
- Address line one, two, or three – null or blank
- Unknown/ TBD values provided in the attributes
- Country value as Null
- Zip postal code with invalid value
These patterns were analyzed and presented to the client for further evaluation. The database was queried, and samples were provided to the client. Once the client had evaluated the patterns, the solution was designed.
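The address patterns listed above can be sketched as simple checks over address records. This is only an illustrative sketch: the field names (`line1`, `city`, `country`, `postal_code`), the placeholder set, and the store address list are assumptions, and only address line one is checked for blanks to keep the example short.

```python
from collections import Counter

# Assumed placeholder values; a real list would come from profiling the data.
PLACEHOLDERS = {"unknown", "tbd"}
KEY_FIELDS = ("line1", "city", "country", "postal_code")


def normalize(value):
    """Lower-case, trim, and collapse whitespace; None becomes an empty string."""
    return " ".join(str(value or "").lower().split())


def flag_address(record, store_addresses):
    """Return the list of issue labels found in one address record."""
    issues = []
    line1 = normalize(record.get("line1"))
    if not line1:
        issues.append("blank_address_line")
    if not normalize(record.get("country")):
        issues.append("null_country")
    if any(normalize(v) in PLACEHOLDERS for v in record.values()):
        issues.append("placeholder_value")
    if line1 and line1 in store_addresses:
        issues.append("store_address")
    return issues


def duplicate_keys(records):
    """Return normalized address keys that occur more than once."""
    counts = Counter(
        tuple(normalize(r.get(f)) for f in KEY_FIELDS) for r in records
    )
    return [key for key, n in counts.items() if n > 1]


# Hypothetical sample data, for illustration only.
STORE_ADDRESSES = {"1 store plaza"}
records = [
    {"line1": "12 Main St", "city": "Toronto", "country": "CA", "postal_code": "M5V 2T6"},
    {"line1": "12  main st", "city": "Toronto", "country": "CA", "postal_code": "M5V 2T6"},
    {"line1": "1 Store Plaza", "city": "Austin", "country": None, "postal_code": "TBD"},
]

issues_by_record = [flag_address(r, STORE_ADDRESSES) for r in records]
duplicates = duplicate_keys(records)
```

Counting how often each label occurs across a sample gives the kind of pattern summary that can be reviewed before any cleansing rule is designed.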
Sample Business Rule Validation
Postal Code Validation
Disallow entry of an invalid postal code, and disallow entry of a US postal code for a Canadian address.
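A minimal sketch of such a rule, assuming the country is stored as a two-letter code ("US" or "CA") alongside the postal code. The function name `is_valid_postal_code` is hypothetical, and the patterns check format only: they are simplified (for example, they do not exclude the letters that Canadian postal codes never use) and do not confirm that a code actually exists.

```python
import re

# Simplified format patterns; real postal rules have further exceptions.
POSTAL_PATTERNS = {
    "US": re.compile(r"\d{5}(-\d{4})?"),           # 12345 or 12345-6789
    "CA": re.compile(r"[A-Z]\d[A-Z] ?\d[A-Z]\d"),  # A1A 1A1 or A1A1A1
}


def is_valid_postal_code(country, postal_code):
    """Check that the postal code matches the format of the given country."""
    pattern = POSTAL_PATTERNS.get((country or "").strip().upper())
    if pattern is None or postal_code is None:
        # Unknown country or missing code: reject rather than guess.
        return False
    return pattern.fullmatch(postal_code.strip().upper()) is not None


us_ok = is_valid_postal_code("US", "10001")
ca_ok = is_valid_postal_code("CA", "k1a 0b1")
us_code_for_ca = is_valid_postal_code("CA", "10001")
placeholder = is_valid_postal_code("CA", "TBD")
```

Because the pattern is chosen by country, a US-style code entered for a Canadian address fails the check, as does a placeholder such as "TBD".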
Figure 2: Best Practices
Our Specialization in Data Validation
As data management consultants, each of us needs to understand the verticals or domains in which we specialize. These domains include:
- Health care
The data can be customer-specific, contract, or product data. As data scientists, we have handled data from all these domains and from all geographical regions.
For example, the name John may be common in the USA but not in South Africa. Analyzing data well therefore comes with experience. We guide customers and provide insight into the data patterns.
We have profiled and cleansed data and identified duplicate records for various clients operating across different geographies.
Solutions are therefore best designed by analyzing the customer's problems. Data plays a vital role in capitalizing on the market, and the major players in the market have already started eyeing it. Consumers, along with the product owners, should be aware of the pattern in which the data should be segregated and displayed, and the methodologies above give a bird's-eye view of some of the validation techniques.
About the Author
Narayan is an Associate Architect at Mastech InfoTrellis with around 6.5 years of overall experience in certifying IBM Master Data Management Advanced Edition, Collaborative Edition, Standard Edition, the Probabilistic Matching Engine, and ETL solutions.