The first thing a statistician, data scientist, medical researcher, engineer, social scientist, or anyone else who depends on data must do is assess its quality.
As of this writing, the recent release of the Durham Report suggests that the FBI was lax in its assessment of the alleged Russian influence in the 2016 presidential election cycle. This resulted in reporting to the nation and the rest of the globe what appears to be fake information. Did it tip the scales of the election and subsequent events? That is not for this pundit to say, although it will be a subject of discussion (including some conspiracy theories) for years to come.
Our case is more straightforward. Organizations of all types, private and public, routinely make critical decisions based on poor-quality data (including gaps in the data).
The first tenet of quality is to assess the validity and reliability of the data. Validity refers to the accuracy of the measurement, but it does NOT determine whether the right process was evaluated. Reliability is a function of data consistency (can the measurement be reproduced?).
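As a minimal sketch of a reliability check (the readings below are hypothetical and not drawn from any case discussed here), two measurement passes over the same items can be compared in a few lines of Python:

```python
# Minimal sketch: a test-retest reliability check on hypothetical readings.
# A high correlation between two measurement passes of the same items suggests
# the data can be reproduced; it says nothing about validity.
import numpy as np

run_1 = np.array([10.1, 12.3, 9.8, 11.5, 10.9])  # first measurement pass
run_2 = np.array([10.0, 12.5, 9.9, 11.3, 11.0])  # repeat measurement pass

reliability = np.corrcoef(run_1, run_2)[0, 1]    # Pearson correlation coefficient
print(f"Test-retest reliability: {reliability:.2f}")
```

A result near 1.0 indicates consistent measurement; validity still has to be judged separately against what was supposed to be measured.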
Error Management
There are two types of data errors, not surprisingly labeled Type I and Type II. Statisticians define a Type I error as a ‘false positive’ and a Type II error as a ‘false negative.’ One way to assess data is hypothesis testing. Assessment often begins with a hypothesis about a set of data:
- A ‘null hypothesis’ assumes that the data is a function of pure chance.
- The ‘alternative hypothesis’ assumes the data set is impacted by a non-random cause.
Moreover, a data set can have multiple hypotheses.
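By way of a hedged illustration (the sample values, the target mean of 100, and the 0.05 significance level are assumptions made for the example, not figures from any source above), a one-sample t-test in Python shows how the null hypothesis, the alternative hypothesis, and the Type I error rate fit together:

```python
# Minimal sketch: a one-sample t-test on hypothetical data.
# Null hypothesis:        the true mean is 100 (any deviation is pure chance).
# Alternative hypothesis: the true mean is not 100 (a non-random cause is at work).
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)
sample = rng.normal(loc=102, scale=5, size=30)  # hypothetical measurements

alpha = 0.05  # the Type I (false positive) risk we are willing to accept
t_stat, p_value = stats.ttest_1samp(sample, popmean=100)

if p_value < alpha:
    print(f"Reject the null hypothesis (p = {p_value:.3f})")
else:
    print(f"Fail to reject the null hypothesis (p = {p_value:.3f})")
```

Choosing alpha is, in effect, deciding how much Type I risk is acceptable; limiting Type II (false negative) risk requires a separate power calculation.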
Data is also classified as Primary or Secondary. Primary data is that which was collected directly by the data scientist or organization. Secondary data is that which was obtained from a third party.
This researcher considers Secondary data more likely to contain errors and in need of additional scrutiny. It appears the FBI data on election interference was Secondary data.
According to a 2022 Harvard Business Review article, “It costs 10 times as much to complete a unit of work when the data is flawed in any way as it does when the data is good.” In 2016, HBR reported that, according to IBM, decisions made on poor-quality data cost $3.1 trillion. In 2021, the research firm Gartner reported that poor data quality costs the average firm $12.9 million per year. For the Fortune 500 alone, that works out to roughly $6.45 billion (500 × $12.9 million).
For interested readers, the cited Gartner article provides a set of 12 actions organizations can take to improve their data quality.
Decision Support
We must recognize that data quality is an issue and that, while we can take steps to improve it, the problem is ubiquitous and most likely growing. We must assume that ALL data contains either Type I or Type II errors and act accordingly.
One approach is the use of the Scientific Method. This model is accessible to the average layperson and can be used for business decisions as well as in everyday life.
Moreover, ALL data sets will be incomplete or have gaps. Statistical and other decision support tools can deal with this issue. Finally, humans inject bias into the process as well.
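As one hedged sketch of such a tool (the column names and the median-fill strategy are illustrative assumptions, not a recommendation for any particular data set), pandas can flag gaps and apply a simple imputation before the data feeds a decision:

```python
# Minimal sketch: finding and filling gaps in a hypothetical data set with pandas.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "sensor_reading": [10.2, np.nan, 9.7, 11.1, np.nan],  # hypothetical column with gaps
    "operator_score": [3.0, 4.0, np.nan, 5.0, 4.0],
})

print(df.isna().sum())  # how many gaps exist in each column

# One simple strategy: fill each gap with the column median.
df_filled = df.fillna(df.median(numeric_only=True))
print(df_filled)
```

Whether to impute, drop, or model the gaps is itself a decision worth documenting, since it can introduce the very bias noted above.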
Coda
The running joke, “If it is on the Internet, it must be true,” is widely known as satire. That said, we often trust data because of its source or because a so-called expert is the author, commentator, or recommender. As with many things in life, a healthy dose of ‘data skepticism’ is in order.
What steps is your organization taking to ensure its decision-making processes are based on high-quality data?
For More Information
Please note that RRI does not endorse or advocate any third-party materials linked herein. They are provided for education and entertainment only.
See our Economic Value Proposition Matrix® (EVPM) for additional information and a free version to build your own EVPM.
The author’s credentials in this field are available on his LinkedIn page. Moreover, Dr. Shemwell is a coauthor of the just-published book, “Smart Manufacturing: Integrating Transformational Technologies for Competitiveness and Sustainability.” His focus is on Operational Technologies.
“People fail to get along because they fear each other; they fear each other because they don’t know each other; they don’t know each other because they have not communicated with each other.” (Martin Luther King speech at Cornell College, 1962). For more information on Cross Cultural Engagement, check out our Cross Cultural Serious Game. You can contact this author as well.
For more details regarding climate change models, check out Bjorn Lomborg and his latest book, False Alarm: How Climate Change Panic Costs Us Trillions, Hurts the Poor, and Fails to Fix the Planet.
Regarding the economics of Climate Change, check out our recent blog, Crippling Green.
For those start-up firms addressing energy (including renewables) challenges, the author can put you in touch with Global Energy Mentors, which provides no-cost mentoring services from energy experts. If interested, check it out and give me a shout.