Correlation and regression: how they are similar, how they are different, and what each is used for
Correlation is a statistical technique used to evaluate the relationship between two sets of numbers. It asks how an increase or decrease in one set is accompanied by a change in the other. Correlation is also used to test a hypothesis about a relationship, to check the assumption of independence between samples, and to establish interactions between variables (Godfrey, 1985). The relationship between two variables can be either direct (positive) or inverse (negative). In a direct relationship, a decrease in one variable is accompanied by a decrease in the other, and an increase by an increase; in an inverse relationship, one variable rises as the other falls. The correlation coefficient measures the extent to which two samples are related. Its value ranges between -1.00 and +1.00. Perfect correlation occurs when the value is -1.00 or +1.00. At 0, on the other hand, the two samples are uncorrelated, so there is no linear relationship between them. For the Pearson correlation coefficient, the variables should be continuous and approximately normally distributed. A coefficient close to 0.00 indicates a weak relationship, while one close to -1.00 or +1.00 indicates a strong relationship between the variables being tested. Correlation thus enables the researcher to establish the strength of the relationship between two variables. Statistical significance is conventionally tested at the 0.05 level: a p-value greater than 0.05 means the correlation is not considered statistically significant.
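The behavior of the correlation coefficient described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis carried out in this paper: the function name pearson_r and all of the numbers are made up for the example, and the three samples are chosen to show a perfect direct relationship, a perfect inverse relationship, and an imperfect one.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a sample has zero variance")
    return sxy / math.sqrt(sxx * syy)

# Made-up numbers for illustration only (not the data used in this paper).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
direct = [2.0, 4.0, 6.0, 8.0, 10.0]    # rises exactly in step with x
inverse = [10.0, 8.0, 6.0, 4.0, 2.0]   # falls exactly as x rises
noisy = [2.0, 1.0, 4.0, 3.0, 5.0]      # tends to rise with x, but not perfectly

r_direct = pearson_r(x, direct)
r_inverse = pearson_r(x, inverse)
r_noisy = pearson_r(x, noisy)

print("direct :", round(r_direct, 3))
print("inverse:", round(r_inverse, 3))
print("noisy  :", round(r_noisy, 3))
```

The sign of the coefficient gives the direction of the relationship and its distance from 0 gives the strength, which is why both -1.00 and +1.00 count as perfect correlation.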
Regression analysis, like correlation, examines the relationship between two variables. However, regression goes further by fitting a line that describes one variable in terms of the other. There must be one dependent variable and one or more independent variables (O’Brien & Shampo, 1981). The main aim of the analysis is to predict how the dependent variable will behave given the values of the independent variable(s).
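As a rough sketch of what fitting a line means, the least-squares slope and intercept for one independent variable can be computed directly from the data. The function names fit_line and predict and the data below are hypothetical, chosen only to illustrate the idea; they are not the sodium and blood pressure data analysed later.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for the line y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("the independent variable must vary")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x_new):
    """Predicted value of the dependent variable at x_new."""
    return intercept + slope * x_new

# Made-up data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]      # independent variable
y = [3.1, 4.9, 7.2, 8.8, 11.0]     # dependent variable

slope, intercept = fit_line(x, y)
y_at_6 = predict(slope, intercept, 6.0)

print("slope:", round(slope, 3))
print("intercept:", round(intercept, 3))
print("prediction at x = 6:", round(y_at_6, 3))
```

The prediction step is what separates regression from correlation: once the line is fitted, a value of the dependent variable can be estimated for a value of the independent variable that was not in the sample.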
Summary of a public health example
This paper uses correlation and simple linear regression to test the relationship between sodium (Na) and blood pressure (BP). Blood pressure is the dependent variable and sodium is the independent variable. This assignment matters for the regression analysis, which needs a dependent and an independent variable in order to fit a directional line.
Regression analysis in SPSS
Steps
Go to Analyze on the menu bar, move to Regression, then select Linear.
Move BP to the Dependent box (y) and Na to the Independent(s) box (x), then click OK.
Interpretation of the coefficient R
The multiple correlation coefficient is R = 0.920 (table 1a, appendix), which indicates a strong relationship between the two variables, with a coefficient of determination R² = 0.847. This shows that the regression on sodium explains 84.7% of the variability in BP; that is, the level of sodium accounts for most of the differences in blood pressure observed between persons. The y-intercept is -280.920 and the slope is 64.010 (table 1b, appendix), which give the equation Y = -280.92 + 64.01x. The equation implies that each one-unit increase in sodium (1 g, as reported here) is associated with an increase of 64.01 mmHg in predicted blood pressure. The p-value for R = 0.920 is < 0.001 (table 2b, appendix), hence the relationship between the two variables is statistically significant. The correlation analysis table likewise shows a Pearson correlation of 0.920 with Sig. (2-tailed) reported as 0.000, that is, p < 0.001 (table 2a, appendix).
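The arithmetic behind this interpretation can be checked with a short sketch that uses only the values reported above (R = 0.920, intercept -280.920, slope 64.010). The two sodium values are hypothetical, picked purely to show what the slope means, and the function name predicted_bp is likewise an assumption made for the example.

```python
# Values reported in the text (tables 1a and 1b of the appendix).
r = 0.920
intercept = -280.920
slope = 64.010

# Coefficient of determination: share of BP variability explained by sodium.
r_squared = r ** 2

def predicted_bp(sodium):
    """Predicted blood pressure from the fitted line for a given sodium value."""
    return intercept + slope * sodium

# Hypothetical sodium values one unit apart (not taken from the data).
na_low = 6.0
na_high = 7.0
difference = predicted_bp(na_high) - predicted_bp(na_low)

print("R squared:", round(r_squared, 4))
print("predicted BP at", na_low, ":", round(predicted_bp(na_low), 2))
print("predicted BP at", na_high, ":", round(predicted_bp(na_high), 2))
print("change in predicted BP per one-unit rise in sodium:", round(difference, 2))
```

Squaring the rounded R gives about 0.846, slightly below the reported 0.847; the small gap presumably comes from R having been rounded to three decimals. Whatever pair of sodium values is chosen, a one-unit difference in sodium always changes the predicted blood pressure by the slope.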
Comparison between correlation and regression using an example from public health
Regression analysis requires the researcher to define dependent and independent variables, unlike correlation, which treats the two variables symmetrically. Regression results are interpreted through the coefficient of the independent variable, its significance, and the overall model fit, while correlation results are interpreted through the correlation coefficient, its significance, and the coefficient of determination (Wassertheil-Smoller, 1990). While correlation only indicates how strongly two variables are related, regression goes further and predicts the behavior of the dependent variable based on the independent ones.
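The difference in direction can be illustrated with a small sketch. Swapping the two variables leaves the correlation coefficient unchanged, but it changes the regression slope, because regression depends on which variable is treated as dependent. The helper names and the data are made up for the illustration.

```python
import math

def deviations(x, y):
    """Return (sxx, syy, sxy): sums of squares and cross-products about the means."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxx, syy, sxy

def pearson_r(x, y):
    """Correlation coefficient: symmetric in x and y."""
    sxx, syy, sxy = deviations(x, y)
    return sxy / math.sqrt(sxx * syy)

def slope_of(dependent, independent):
    """Least-squares slope when regressing `dependent` on `independent`."""
    sxx, _, sxy = deviations(independent, dependent)
    return sxy / sxx

# Made-up data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 10.0]

r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)      # same value: correlation has no direction
b_y_on_x = slope_of(y, x)   # y treated as the dependent variable
b_x_on_y = slope_of(x, y)   # x treated as the dependent variable

print("r(x, y):", round(r_xy, 3), " r(y, x):", round(r_yx, 3))
print("slope of y on x:", round(b_y_on_x, 3))
print("slope of x on y:", round(b_x_on_y, 3))
```

The two slopes differ, yet their product equals the square of the correlation coefficient, which shows how the two techniques remain linked even though only regression has a direction.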
“Correlation is not causation” implies that the strong association found here does not by itself prove that sodium causes changes in blood pressure, and sodium should not be taken as their only cause. Other factors can also cause blood pressure to change (Holland, 1986).
References
Godfrey, K. (1985). Simple linear regression in medical research. N Engl J Med, 313(26): 1629-36.
Holland, P. W. (1986). Statistics and Causal Inference. Journal of the American Statistical Association, 81(396): 945-960.
O’Brien, P. C., & Shampo, M. A. (1981). Statistics for clinicians. 7. Regression. Mayo Clin Proc, 56(7): 452-4.
Wassertheil-Smoller, S. (1990). Biostatistics and epidemiology: a primer for health professionals. New York: Springer-Verlag.
Appendix
Table 1a. Regression analysis for BP and Na.
Table 1b. Regression analysis for BP and Na.
Table 2a. Correlation analysis for BP and Na.
Table 2b. ANOVA table for regression analysis for BP and Na.