Data Bias: The Latent or Unobserved

[Image: World War II aircraft survivorship bias diagram, after Abraham Wald]

In statistics, a Latent Variable can be defined as ‘a variable inferred from observed or measured data’ rather than measured directly. Its analysis is often used in psychology, economics, and predictive modeling. This author used Structural Equation Models (SEM) in his 1996 doctoral dissertation, Cross Cultural Negotiations Between Japanese and American Businessmen: A Systems Analysis (Exploratory Study).

From that abstract, “The use of sophisticated statistical techniques such as structural equation modeling and game theory is becoming increasing more important.  Traditional techniques are known to be limited, particularly in the context of cross-cultural behavioral studies.”

Survival Bias

A recent LinkedIn post alerted this writer to the inimitable perspective that statistician Abraham Wald brought to the assessment of damage to World War II Allied bombers returning from missions. He argued that the observed anti-aircraft damage was non-crippling, since those aircraft remained airworthy and were able to return.

He surmised that the planes that did not come home may have suffered damage to other areas that made them unairworthy, and hence their data went unobserved. Based on this analysis, the U.S. Navy beefed up armor in the less-damaged or undamaged areas, and this was credited with saving lives and aircraft.
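Wald's reasoning can be illustrated with a small simulation. The sketch below is not his actual method or data: the section names, the per-hit loss probabilities, and the number of hits per plane are all hypothetical values chosen for illustration. Hits land uniformly across the aircraft, but only planes that survive every hit return to be inspected.

```python
import random

SECTIONS = ["engine", "cockpit", "fuselage", "wings"]
# Hypothetical chance that a single hit to each section downs the plane.
LOSS_PROB = {"engine": 0.6, "cockpit": 0.5, "fuselage": 0.1, "wings": 0.1}


def simulate(n_planes=10_000, hits_per_plane=3, seed=0):
    """Return (all hits, hits seen on returning planes) per section."""
    rng = random.Random(seed)
    all_hits = {s: 0 for s in SECTIONS}
    observed_hits = {s: 0 for s in SECTIONS}
    for _ in range(n_planes):
        # Every section is equally likely to be hit.
        hits = [rng.choice(SECTIONS) for _ in range(hits_per_plane)]
        # The plane returns only if it survives each individual hit.
        survived = all(rng.random() >= LOSS_PROB[s] for s in hits)
        for s in hits:
            all_hits[s] += 1
            if survived:
                observed_hits[s] += 1
    return all_hits, observed_hits


all_hits, observed_hits = simulate()
for s in SECTIONS:
    share_all = all_hits[s] / sum(all_hits.values())
    share_obs = observed_hits[s] / sum(observed_hits.values())
    print(f"{s:9s} true share {share_all:.2f}  observed share {share_obs:.2f}")
```

Under these assumptions, every section receives roughly the same share of hits, yet the returning planes show far fewer engine and cockpit hits than fuselage and wing hits. The sections that look least damaged in the observed data are the ones where damage was most often fatal.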

This type of analysis came to be known as Survival Bias (more commonly called survivorship bias), which has its proponents and detractors. On the surface, it had seemed intuitively obvious that the areas showing damage needed addressing, rather than those that statistically showed less damage.

It is not our intent herein to assess its merits and applicability, but rather to help readers better understand the very nature of big data and its use, especially in predictive and behavioral models.

Covid-19

Today, policy makers and other decision makers are tasked with dealing with a deadly global pathogen that apparently developed quickly and is spreading exponentially, a super-spreading event. As of this writing, it has afflicted millions in 188 countries and regions in much less than 12 months.

In this pundit’s opinion, much of the concern, confusion, and clearly wrong information regarding this disease and its mitigation protocols can be traced to data collection and analysis. By now, most readers will have some familiarity with the chaos associated with the predictive models built on that data.

For example, according to the US National Library of Medicine, National Institutes of Health, “A key fact for us all to remember is that, for the majority of countries, we’re not actually counting how many people have the virus; instead we’re counting the reports of how many people have the virus, and, like all metrics, those numbers vary according to how they’re measured. An increase in the number of tests being carried out will result in an increase in the number of infections detected.”
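The point about counting reports rather than infections can be shown with simple arithmetic. The sketch below is a deliberately simplified illustration, not an epidemiological model: the population size and the number of truly infected people are hypothetical, and it assumes tests are given to a random sample of the population and are perfectly accurate.

```python
POPULATION = 1_000_000
TRUE_INFECTED = 50_000  # hypothetical and, in practice, never directly observed


def expected_reported(tests_run):
    """Expected positive reports if tests go to a random sample of the population."""
    return tests_run * TRUE_INFECTED // POPULATION


for tests in (10_000, 50_000, 200_000):
    print(f"{tests:>7,} tests -> {expected_reported(tests):>6,} reported cases")
```

The true number of infections is the same in every row, yet the reported count rises from 500 to 2,500 to 10,000 purely because more tests were run. The true count behaves like a latent variable: it has to be inferred from the reports together with knowledge of how the measuring was done.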

In addition to the Herculean efforts across medicine, science, technology, and many other disciplines to tame this tiger, Structural Equation Modeling is being used to shed additional light on the latent variables.

Final Thoughts

The 2020 Coronavirus is an early test of Big Data analysis in support of decision makers in both public policy and non-governmental organizations. While performance so far has been weak, this pundit believes great value can come from this effort.

Data must be highly reliable and valid. Moreover, models must account for what is not seen: the latent variables, such as those behind Survival Bias. These two aspects of strong decision support models are crucial. These are lessons for all of us.

Where Didn’t the Bullets Hit Your Business Model?

For More Information

Please note, RRI does not endorse or advocate the links to third-party materials.  They are provided for education and entertainment only.

For more information on Cross Cultural Engagement, check out our Cross Cultural Serious Game (https://rri-ccgame.com/).

We are presenting “Should Cross Cultural Serious Games Be Included in Your Diversity Program: Best Practices and Lessons Learned” at the online conference New Diversity Summit 2020 during the week of September 14, 2020. Check out this timely conference!

You can contact the author as well.
