Data Quality Evaluation Plan


Table of Contents

Introduction
Duplication of efforts and overall quality of data
Determining the quality of redundant data elements
Factors to influence the quality of the data
Resolving data differences between systems
How to address missing data
Conclusion
References

Introduction

This is a plan for evaluating data quality at GlaxoSmithKline plc (GSK), a multinational pharmaceutical company headquartered in London. GSK's master data management problems stem from the 2000 merger of Glaxo Wellcome and SmithKline Beecham. Because the two companies operated in the same market and served many of the same customers, merging their previously independent databases is expected to have produced duplicate customer records. Data quality is a central concern in any organization, because poor-quality data lead management to wrong conclusions (Loshin, 2008).

This plan will help GSK launch and implement a master data project to support enterprise-level planning and reporting. Master data will be especially valuable to middle and top management for understanding the business's past performance and predicting its future. The plan builds on the preliminary plan, which covered all existing data stored in the inventory and identified which of those data will form the master data.

Duplication of efforts and overall quality of data

Data are duplicated when the same object is entered more than once in a system without a unique identifier to distinguish it. For instance, at GSK, 10 customers in Africa are recorded as 16 different customers, which means that six of the records are redundant duplicates. To achieve overall data quality, this plan recommends that every data object be uniquely identified in the inventory database.
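To make the arithmetic concrete, the sketch below assumes each record carries a hypothetical `customer_id` that identifies the real-world customer (the very identifier the plan recommends). The number of redundant records is then the total number of records minus the number of distinct identifiers. The sample IDs are illustrative, not GSK data.

```python
from collections import Counter


def redundancy_report(customer_ids):
    """Return (distinct customers, redundant records, IDs entered more than once)."""
    counts = Counter(customer_ids)
    distinct = len(counts)
    # Every record beyond the first one per customer is redundant.
    redundant = len(customer_ids) - distinct
    repeated = {cid: n for cid, n in counts.items() if n > 1}
    return distinct, redundant, repeated


# Hypothetical sample: 16 records that refer to only 10 real customers.
ids = ["C01", "C02", "C02", "C03", "C04", "C04", "C04", "C05",
       "C06", "C07", "C07", "C08", "C09", "C09", "C10", "C10"]

distinct, redundant, repeated = redundancy_report(ids)
print(distinct, redundant)  # 10 6
print(repeated)             # {'C02': 2, 'C04': 3, 'C07': 2, 'C09': 2, 'C10': 2}
```

In this sample only five customers are duplicated, yet six records are redundant, because one customer was entered three times. Counting redundant records rather than duplicated customers gives the figure that matters for cleaning.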

Two methods are used to detect duplicate entries: exact matching and fuzzy matching. Exact matching flags entries only when they are identical in the database, whereas fuzzy matching flags entries that are nearly identical, such as the same customer name spelled or punctuated slightly differently. Master data should be free of duplicates, missing entries, redundant data, and erroneous entries; master data without these defects is of high quality (Otto, Hüner, & Österle, 2012). Such a database gives management a reliable basis for decisions that benefit GSK.
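The contrast between the two methods can be sketched with Python's standard `difflib.SequenceMatcher`, which scores string similarity from 0.0 to 1.0. This is one simple way to do fuzzy matching, not a method prescribed by the plan; the 0.9 threshold and the customer names are assumptions for illustration.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    # Lower-case and collapse whitespace so trivial formatting differences are ignored.
    return " ".join(name.lower().split())


def exact_match(a: str, b: str) -> bool:
    # Exact matching: the two entries must be identical.
    return a == b


def fuzzy_match(a: str, b: str, threshold: float = 0.9) -> bool:
    # Fuzzy matching: entries count as duplicates when their similarity ratio
    # reaches the threshold. The 0.9 default is an assumed, tunable value.
    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold


a, b = "NAIROBI Pharmacy Ltd", "Nairobi Pharmacy Ltd."  # hypothetical entries
print(exact_match(a, b))  # False
print(fuzzy_match(a, b))  # True
```

Exact matching misses this pair because of the capitalization and the trailing full stop, while fuzzy matching catches it. A threshold set too low would start merging genuinely different customers, so it would need tuning against known duplicates.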

Determining the quality of redundant data elements

The overall purpose of data cleaning algorithms is to ensure that identical fields are not entered in more than one table in the master database (Loshin, 2008). These programs go through three phases: field description, duplicate detection, and the actual detection. All three phases rely on sorting and blocking candidate records in the database. After this process, individual data values will be searched and transformed. Finally, data standardization will ensure that the data elements conform to the rules set for the database (Loshin, 2008).
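Blocking keeps duplicate detection manageable: records are grouped by a cheap key, and only records within the same group are compared in detail. The sketch below standardizes names first and then blocks on a hypothetical key (country plus the first three characters of the standardized name). The key, field names, and records are assumptions chosen for illustration; the plan does not prescribe them.

```python
import re
from collections import defaultdict
from itertools import combinations


def standardize(name: str) -> str:
    # Standardization: lower-case, drop punctuation, collapse whitespace.
    cleaned = re.sub(r"[^\w\s]", "", name.lower())
    return " ".join(cleaned.split())


def blocking_key(record: dict) -> tuple:
    # Hypothetical blocking key: country plus first three characters of the name.
    return (record["country"], standardize(record["name"])[:3])


def candidate_pairs(records: list[dict]) -> list[tuple[int, int]]:
    # Group record indices by blocking key.
    blocks = defaultdict(list)
    for i, rec in enumerate(records):
        blocks[blocking_key(rec)].append(i)
    # Within each block, every pair becomes a candidate for detailed comparison.
    pairs = []
    for members in blocks.values():
        pairs.extend(combinations(sorted(members), 2))
    return sorted(pairs)


records = [
    {"name": "Nairobi Pharmacy Ltd", "country": "KE"},
    {"name": "NAIROBI PHARMACY LTD.", "country": "KE"},
    {"name": "Kampala Clinic", "country": "UG"},
    {"name": "Nairobi Health Centre", "country": "KE"},
]
print(candidate_pairs(records))  # [(0, 1), (0, 3), (1, 3)]
```

Here four records yield three candidate pairs instead of the six a full pairwise comparison would need, and the saving grows quickly with table size. Each candidate pair would then go to an exact or fuzzy comparison. Sorting-based variants (sorting on the key and comparing records within a sliding window) follow the same idea.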

Factors to influence the quality of the data

High-quality data are a cornerstone of GSK, which relies on them for its day-to-day operations. However, several factors affect data quality. First, human factors are common in large organizations such as GSK: employees may not understand the purpose of good data or appreciate how the organization's master data can support business growth, and they may be poorly trained or unmotivated. Second, the wrong methods may be used to collect or record data.

GSK is a global firm whose operations span several continents, so many people across these regions collect data for the firm in their areas of operation. If they apply the wrong recording procedures, the quality of the data is compromised. Third, faulty algorithms may be used for data cleaning, duplicate detection, and final data standardization (Otto et al., 2012).

Resolving data differences between systems

GSK uses many systems, including accounting, marketing, production, distribution, and research. High-quality master data should not differ between systems (Silvola, Jaaskelainen, Kropsu-Vehkapera, & Haapasalo, 2011). For instance, the distribution and marketing systems should have identical fields and entries for customers, because both serve the same clients. The master database should be cleaned of any redundant data using appropriate algorithms.
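A basic reconciliation between two systems can be sketched as follows, assuming both systems already share a customer identifier (where they do not, the records would first need to be matched). The system names, fields, and values are hypothetical. The function reports customers present in only one system and, for shared customers, every field whose value differs.

```python
def reconcile(system_a: dict, system_b: dict) -> dict:
    """Compare two systems' customer records keyed by a shared customer ID."""
    only_a = sorted(system_a.keys() - system_b.keys())
    only_b = sorted(system_b.keys() - system_a.keys())
    conflicts = {}
    for cid in sorted(system_a.keys() & system_b.keys()):
        a, b = system_a[cid], system_b[cid]
        # Collect every field whose value differs between the two systems.
        diff = {f: (a.get(f), b.get(f))
                for f in sorted(a.keys() | b.keys())
                if a.get(f) != b.get(f)}
        if diff:
            conflicts[cid] = diff
    return {"only_in_a": only_a, "only_in_b": only_b, "conflicts": conflicts}


# Hypothetical extracts from a distribution system and a marketing system.
distribution = {
    "C01": {"name": "Nairobi Pharmacy Ltd", "city": "Nairobi"},
    "C02": {"name": "Kampala Clinic", "city": "Kampala"},
}
marketing = {
    "C01": {"name": "Nairobi Pharmacy Ltd", "city": "Mombasa"},
    "C03": {"name": "Accra Chemists", "city": "Accra"},
}
print(reconcile(distribution, marketing))
```

The report does not decide which system is right; it lists the gaps and conflicts that data stewards would resolve before the corrected values are written to the master database.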

How to address missing data

Missing data are detected by querying the database for completeness. Several approaches may be used to handle missing data: deletion (case-wise or pairwise), substitution (full imputation and partial imputation), and interpolation (Silvola et al., 2011). Case-wise deletion drops every record that has a missing value, leaving a smaller but complete dataset, while pairwise deletion excludes a record only from analyses that need its missing field. Substitution replaces missing entries with values derived from known entries in the master data. Interpolation estimates missing entries from the neighbouring available values. All of these approaches are statistical.
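The sketch below illustrates a completeness check and three of the approaches on small hypothetical records, using `None` to mark a missing value. Mean substitution is used as one simple form of substitution, and linear interpolation as one form of interpolation; pairwise deletion is not shown. Field names and values are assumptions.

```python
from statistics import mean


def missing_counts(rows, fields):
    # Completeness query: count missing (None) values per field.
    return {f: sum(1 for r in rows if r.get(f) is None) for f in fields}


def casewise_delete(rows, fields):
    # Case-wise deletion: keep only records with every field present.
    return [r for r in rows if all(r.get(f) is not None for f in fields)]


def mean_substitute(rows, field):
    # Substitution: replace missing values with the mean of the known values.
    fill = mean(r[field] for r in rows if r.get(field) is not None)
    return [{**r, field: fill if r.get(field) is None else r[field]} for r in rows]


def linear_interpolate(values):
    # Interpolation: fill each interior gap on a straight line between its
    # nearest known neighbours. Gaps at either end are left as None.
    out = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (out[right] - out[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = out[left] + step * (i - left)
    return out


rows = [
    {"customer_id": "C01", "region": "Africa", "orders": 12},
    {"customer_id": "C02", "region": None, "orders": 8},
    {"customer_id": "C03", "region": "Europe", "orders": None},
]
print(missing_counts(rows, ["region", "orders"]))    # {'region': 1, 'orders': 1}
print(len(casewise_delete(rows, ["region", "orders"])))  # 1
print(mean_substitute(rows, "orders")[2]["orders"])  # 10
print(linear_interpolate([120, None, 140, None, None, 170]))
```

Deletion is the simplest option but discards two of the three records here; substitution and interpolation keep every record at the cost of introducing estimated values, which should be flagged as such in the master data.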

Conclusion

This plan is designed to help implement a master data project for enterprise-level planning and reporting. It has addressed data duplication and overall data quality, explained how to determine the quality of redundant data elements, and discussed the factors that influence data quality. It has also described approaches for resolving differences between systems and for handling missing data.

References

Loshin, D. (2008). Master data management. Burlington, MA: Morgan Kaufmann.

Otto, B., Hüner, K. M., & Österle, H. (2012). Toward a functional reference model for master data quality management. Information Systems and e-Business Management, 10(3), 395-425.

Silvola, R., Jaaskelainen, O., Kropsu-Vehkapera, H., & Haapasalo, H. (2011). Managing one master data–challenges and preconditions. Industrial Management & Data Systems, 111(1), 146-162.