HR Data Quality

“70% of organizations are increasing investments in talent analytics, but only 12% feel like they’re getting results”



Data quality is the ability of a given data set to serve its intended uses in operations, decision making and planning. Data quality testing is an important part of any data analysis. In an HR context, maintaining data quality means maintaining a complete employee profile: personal data for regulatory and well-being purposes, and professional data aligned to organisation structure, skill sets, and training and development. Because data are a significant resource in every organization, their quality is critical for managers and operating processes to identify related performance issues. Moreover, high-quality data can increase an organization's chances of delivering top-quality services. However, improving data first requires understanding the various aspects of data quality: its definition, dimensions, types, strategies and techniques.



If data quality (DQ) is not managed properly, it may lead to loss of opportunity, misleading insights that cause ineffective decisions and erode the trust of consumers and leaders, and reputational damage.

“Friedman & Smith (2011), Measuring the business value of data quality, a Gartner Publication, stated that ‘research shows that 40% of the anticipated value of all business initiatives is never achieved. Poor data quality in both the planning and execution phases of these initiatives is a primary cause.’ For HR staff involved in change management, this should be a significant concern. It was also reported that ‘data quality affects overall labour productivity by as much as 20%.’ Even if we moderate that number to just 5%, it is a material amount of lost productivity.”

  1. Loss of Opportunity: If your competitors are gaining more insights from data than you are, that might mean your company misses a critical opportunity for new product development or talent need that a competitor with a more mature understanding of data may capitalize upon.
  2. Misleading insights causing lack of trust: 84% of CEOs are concerned about the quality of the data they are basing decisions on, according to KPMG’s “2016 Global CEO Outlook.” When there is a lack of trust in data quality, confidence in the results it provides is quickly eroded. That can cause obstacles to gaining executive buy-in, dampening enthusiasm for further investment in data and quality improvement initiatives.
  3. Reputational Damage: Recall Apple’s widely panned Maps rollout in 2012. Immediately it became clear that much of the underlying data was inaccurate or missing, resulting in a product that TechCrunch later called “barely usable”. Even after 8 years, many Apple device users rely on Google Maps rather than Apple Maps.



Data quality is commonly tested along several dimensions, namely:

  1. Completeness: The proportion of stored data against the potential of “100% complete”.
  2. Uniqueness: Nothing will be recorded more than once based upon how that thing is identified.
  3. Timeliness: The degree to which data represent reality from the required point in time.
  4. Validity: Data is valid if it conforms to the syntax (format, type, range) of its definition.
  5. Accuracy: The degree to which data correctly describes the “real world” object or event being described.
  6. Consistency: The absence of difference, when comparing two or more representations of a thing against a definition.
  7. Relevance: The extent to which data is applicable and helpful for the task at hand.

Here we expand on only three basic tests, which should be run at the beginning of any project or whenever we encounter new data.

  1. Completeness: It is a measure of the absence of blank (null or empty string) values, or equivalently the presence of non-blank values. For example, in an employee database:
    • All the employees must be present in the dataset.
    • All the mandatory and regulatory fields, such as employee name and email ID, must be filled for each employee.

Completeness is tested for each field/variable using a mathematical formula, typically the number of non-blank values divided by the total number of records, expressed as a percentage.

If a data item/field is mandatory, 100% completeness can be achieved. However, relevance and accuracy checks would still need to be performed to determine whether the field passes all the quality parameters.
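
A per-field completeness check can be sketched as follows, assuming completeness is measured as the percentage of non-blank values in each field. The sample records, field names and the helper functions `is_blank` and `completeness` are hypothetical and only illustrate the idea.

```python
def is_blank(value):
    """Treat None and empty or whitespace-only strings as blank."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def completeness(records, fields):
    """Return the percentage of non-blank values for each field."""
    total = len(records)
    result = {}
    for field in fields:
        # A field missing from a record counts as blank.
        filled = sum(1 for rec in records if not is_blank(rec.get(field)))
        result[field] = 100.0 * filled / total if total else 0.0
    return result


# Hypothetical employee records, for illustration only.
employees = [
    {"name": "Asha", "email": "asha@example.com", "department": "HR"},
    {"name": "Ravi", "email": "", "department": "Finance"},
    {"name": "Meena", "email": None, "department": "IT"},
    {"name": "John", "email": "john@example.com", "department": ""},
]

scores = completeness(employees, ["name", "email", "department"])
for field, pct in scores.items():
    print(f"{field}: {pct:.1f}% complete")
```

A mandatory field scoring below 100% points to records that need follow-up. A score of 100% only shows that values are present, not that they are accurate or relevant.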

  2. Accuracy: It is the degree to which data correctly describes the “real world” object or event being described. Ideally, the “real world” truth is established through primary research. However, as this is often not practical, it is common to use third-party reference data from sources that are deemed trustworthy and that cover the same time period.

The data should also be accurate in terms of format: for example, a PIN code in India must be 6 digits, an Aadhaar number must be 12 digits, and a PAN should contain alphanumeric characters. To ascertain the accuracy of a given data value, it should be compared to a known reference value. For example, an employee's Aadhaar number can be verified with the UID portal in India; similarly, a driving licence can be verified with the transport authorities.
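
The format rules above can be sketched with regular expressions. This is a minimal illustration, not a complete validator: the PIN code and Aadhaar patterns follow the digit counts stated above, while the PAN pattern (five letters, four digits, one letter) is an assumption that goes beyond the text. The names `PATTERNS` and `has_valid_format` are hypothetical.

```python
import re

# Format rules only; passing them does not prove the value is accurate.
PATTERNS = {
    "pin_code": re.compile(r"[0-9]{6}"),    # 6 digits, as stated in the text
    "aadhaar": re.compile(r"[0-9]{12}"),    # 12 digits, as stated in the text
    # Assumption: PAN is 5 uppercase letters, 4 digits, 1 uppercase letter.
    "pan": re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]"),
}


def has_valid_format(field, value):
    """Return True if the whole value matches the pattern for the field."""
    return bool(PATTERNS[field].fullmatch(value.strip()))


# Hypothetical sample values, for illustration only.
samples = [
    ("pin_code", "560001"),
    ("pin_code", "56001"),
    ("aadhaar", "123456789012"),
    ("pan", "ABCDE1234F"),
    ("pan", "ABCDE12345"),
]
for field, value in samples:
    status = "ok" if has_valid_format(field, value) else "bad format"
    print(f"{field} {value}: {status}")
```

A value that passes these checks is only well formed. Confirming that it belongs to the right employee still requires comparison with a trusted reference source, as described above.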


  3. Relevance: All the data collected must be relevant to the study being undertaken. We must ask why we really need this information; gathering irrelevant information is simply a waste of time and resources. For example, in a case study testing hypotheses about employee attrition drivers, there is no need to collect any information beyond what the established hypotheses require. Collecting more also makes the analysis less valuable, because the data may then contain more noise than relevant information.


In today’s business environment, data quality dimensions ensure that you get the most out of your information. When your information does not meet these standards, it is not valuable.

There are many elements that determine data quality, and each can be prioritized differently by different organizations. The prioritization could change depending on the stage of growth of an organization or even its current business cycle. The key is to define what is most important for the organization when evaluating data, and then use these characteristics to define the criteria for high-quality, accurate data. Once these criteria are defined, the organization gains a better understanding of its data insights and is better positioned to achieve its goals.



  4. Judoo S., Carlisle G., Duquenoy P., Windridge D., Data Governance in the Health Industry: Investigating Data Quality Dimensions within a Big Data Context. Appl. Syst. Innov. 2018, 1, 43.





Web Master