Correlation and Causation in Statistics

Table of Contents

Introduction
Basic statistical issues
Typical examples
Causation interpretation
References

Introduction

When starting work on a study, it is necessary to choose an appropriate method of statistical data analysis. Unfortunately, scientists often treat some essential elements and concepts of statistical analysis as interchangeable. This paper aims to discuss the fundamental statistical issues, present five typical examples that apply statistical knowledge, and comment on approaches that permit progress from a simple correlation to an actual causal interpretation.

Basic statistical issues

Correlation is a synchronicity that suggests a possible, but not inevitable, interdependence between phenomena. This synchronicity is often called association, and such associations can become the basis or prerequisite for scientific research. According to Didelez², correlation is often regarded as an association that describes situations “where phenomena occur more often together (or not together) than would be expected under independence.” The scholar mentions that, in a purely statistical sense, associations do not need to be meaningful since they only express “the expectation that they reflect a causal relation”.² An observed correlation usually motivates scientists to look for, and try to prove, a causal connection between the given events. Causation, by contrast, means that one phenomenon actually brings about the other, and this has to be established in the research process.
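
The idea of phenomena occurring together "more often than would be expected under independence" can be made concrete with a small calculation. The sketch below is only an illustration: the function name association_ratio and the 100 observations are invented for this example and do not come from Didelez. It compares the observed share of cases in which both phenomena occur with the share expected if they were independent, which is the product of their individual shares.

```python
def association_ratio(pairs):
    """Observed joint frequency divided by the frequency expected under independence.

    pairs: list of (a, b) booleans, one tuple per observation.
    A ratio above 1 means the two phenomena co-occur more often than
    independence would predict; a ratio near 1 means no association.
    """
    n = len(pairs)
    p_a = sum(1 for a, _ in pairs if a) / n          # share of cases with A
    p_b = sum(1 for _, b in pairs if b) / n          # share of cases with B
    p_ab = sum(1 for a, b in pairs if a and b) / n   # share of cases with both
    return p_ab / (p_a * p_b)


# Invented data (an assumption for illustration): 100 yes/no observations.
observations = (
    [(True, True)] * 40
    + [(True, False)] * 10
    + [(False, True)] * 10
    + [(False, False)] * 40
)

ratio = association_ratio(observations)
print(f"observed / expected under independence: {ratio:.2f}")
```

A ratio above 1 indicates association only. It says nothing about whether one phenomenon causes the other.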

A confounding factor is another essential statistical concept. It is a factor that influences both of the phenomena between which a correlation exists. The search for confounding factors is critical in observational research because their presence usually determines whether a correlation can be read as a causal connection between phenomena.³ Hence, identifying a confounding factor can support or undermine the claim of a causal connection. Interestingly, when conducting experiments, scientists sometimes identify confounding factors in advance so that particular variables can be excluded from the study. Researchers can also use confounding factors to decide which variables to include.³ A few examples will illustrate the basic statistical concepts outlined above.

Typical examples

Firstly, the role of correlation, causation, and confounding factors should be considered. To explain what ‘correlation’ means, Didelez² chooses an example in which scientists compare the number of newborns with the number of storks in the same area. The two phenomena are correlated, yet a larger number of storks does not cause a larger number of newborns. Instead, both share a confounding factor: the size of the community. In particular, the higher number of newborns is explained by the larger population of the village. A larger population also means there are more roofs, where storks usually nest.
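
The stork example can be imitated with simulated data. The following is a minimal sketch under stated assumptions, not an analysis from Didelez: the village sizes, the rates per inhabitant, and the noise levels are all invented, and the helper names (pearson, partial_corr, simulate_villages) are hypothetical. Storks never influence births in the simulation; village size drives both counts. The raw correlation is nevertheless high, and it largely disappears once both counts are linearly adjusted for village size using the standard first-order partial correlation formula.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after linearly adjusting both for zs."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def simulate_villages(n_villages=500, seed=0):
    """Invented data: village size drives both counts; storks never affect births."""
    rng = random.Random(seed)
    sizes, storks, newborns = [], [], []
    for _ in range(n_villages):
        size = rng.uniform(100, 5000)  # inhabitants (assumed range)
        sizes.append(size)
        # More inhabitants -> more roofs -> more nesting storks (assumed rate).
        storks.append(0.01 * size + rng.gauss(0, 5))
        # More inhabitants -> more births (assumed rate).
        newborns.append(0.02 * size + rng.gauss(0, 10))
    return sizes, storks, newborns


sizes, storks, newborns = simulate_villages()
raw = pearson(storks, newborns)
adjusted = partial_corr(storks, newborns, sizes)
print(f"raw correlation of storks and newborns: {raw:.2f}")
print(f"correlation adjusted for village size:  {adjusted:.2f}")
```

Adjusting for a measured confounder in this way only works when the confounder is known and recorded, which is why the search for confounding factors matters so much in observational research.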

Another example mentioned by Didelez² is the correlation between the rising tides in Venice and the price of bread in the UK. In this example, the phenomena under consideration are not causally connected; therefore, scientists regard the two trends as unrelated despite the correlation. Didelez² also provides an example of the correlation between atmospheric emissions of chlorofluorocarbons (CFC) and the growth of the ozone hole. Here, the two phenomena are interconnected, and there is a proven causal connection between them.

Causation interpretation

It is imperative to use fundamental statistical concepts accurately when moving from a simple correlation to an actual causal interpretation. In particular, causal interpretation requires a proper understanding of the difference between a predictor and a determinant. This difference can be illustrated with the following typical example. There is a study in which scientists examined the correlation between conceptions and the abundance of light.¹ It was found that women who participated in the experiment and were exposed to artificial light daily became more fertile. However, this factor did not have a decisive influence on the statistics of conceptions, since most conceptions in the US occur in the fall, the season when daylight hours are decreasing. Therefore, in this example, light is a predictor, not a determinant, of the outcome.

In another study, scientists explored the correlation between IQ and myopia in children. According to the study, children who read more books were more likely to have myopia.⁴ At the same time, a larger number of books read was associated with higher performance on IQ tests. Thus, a higher IQ was recognized as a risk factor for myopia, whereas a greater amount of time spent reading was identified as a confounding factor for this correlation.

Most scientists agree that statistical concepts must be used correctly when making a causal interpretation. In particular, Kramer³ notes that many researchers confuse basic statistical concepts such as predictor, risk factor, determinant, and cause, and use them interchangeably. For example, a predictor can be confused with a determinant in the first stages of a study. Causal interpretation allows significant conclusions to be drawn from experimental or observational studies. Therefore, it is essential to understand the differences between its components.

Thus, the fundamental statistical issues were discussed, typical examples were presented, and the approaches used to progress from correlation to causal interpretation were commented on. To summarize, when conducting research, it is necessary to understand the difference between correlation and causation. In addition, when performing a causal interpretation, it is essential to distinguish statistical concepts correctly, since this allows more accurate and thorough conclusions to be drawn.

References

1. Cummings DR. Human birth seasonality and sunshine. American Journal of Human Biology. 2010;22(3):316-324.
2. Didelez V. Statistical causality. Consilience Interdisciplinary Communications. 2005;2996:114-120.
3. Kramer MS. Uses and misuses of causal language. BJOG: An International Journal of Obstetrics & Gynaecology. 2015;122(4):462-463.
4. Saw SM, Tan SB, Fung D, et al. IQ and the association with myopia in children. Investigative Ophthalmology & Visual Science. 2004;45(9):2943-2948.