Statistics: Dummy and Orthogonal-Coded Regression


Table of Contents

Introduction
Data File Description
Testing Assumptions
Research Question, Hypothesis, and Alpha Level
Interpretation
Dummy-Coded Regression Results
Orthogonal-Coded Regression Results
Conclusion
References

Introduction

The current paper reports the results of two multiple regressions run on the same data, with the categorical predictor coded in two different ways: dummy coding and orthogonal coding. After the data file is described and the regression assumptions are tested, the research questions, hypotheses, and alpha level are specified, and the results of the statistical tests are presented. The paper concludes with an analysis of the strengths and limitations of the two coding schemes.

Data File Description

The data set contains the results of a survey assessing the impact of anxiety on exam performance. The outcome variable, Performance, is measured on an interval/ratio scale. The predictor, Anxiety, is an ordinal variable with three levels (low, medium, and high); it was dummy coded using the dichotomous variables D1 and D2 and orthogonally coded using the contrast-coded variables O1 and O2. The sample size is N = 15.

Testing Assumptions

Visual inspection of the histogram in Figure 1 below suggests that Performance is approximately normally distributed, so the normality assumption does not appear to be seriously violated.

Figure 1. The histogram for the Performance variable.

Research Question, Hypothesis, and Alpha Level

For the dummy-coded regression, the research question is: “Do levels of anxiety predict exam performance?” The null hypothesis for the overall regression is that anxiety level does not predict exam performance (i.e., mean performance is the same in all three anxiety groups). The alternative hypothesis is that anxiety level predicts exam performance (i.e., at least two group means differ). For D1, the null hypothesis is that there is no difference in mean exam performance between the low- and medium-anxiety groups; the alternative hypothesis is that there is such a difference. For D2, the null hypothesis is that there is no difference in mean exam performance between the high- and medium-anxiety groups; the alternative hypothesis is that there is such a difference.

For the orthogonal-coded regression, the research question and the null and alternative hypotheses for the overall regression are the same as for the dummy-coded regression. For O1, however, the null hypothesis is that there is no difference in mean exam performance between the low- and high-anxiety groups; the alternative hypothesis is that there is such a difference. For O2, the null hypothesis is that the mean of the medium-anxiety group does not differ from the average of the low- and high-anxiety group means; the alternative hypothesis is that it does.

Because no rationale is provided for choosing a particular α level, the conventional α = .05 is used for all tests.

Interpretation

As noted above, the Performance variable was judged to be approximately normal, so no transformations were needed.

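For the dummy-coded regression, the medium-anxiety group served as the reference group: D1 = 1 for the low-anxiety group and 0 for the other groups, and D2 = 1 for the high-anxiety group and 0 for the other groups.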
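As a minimal sketch of this coding scheme, assuming the three anxiety levels are labelled "low", "medium", and "high" (the mapping `DUMMY_CODES` and the helper `dummy_code` are hypothetical and not part of the original SPSS analysis), each case's anxiety level maps to a (D1, D2) pair, with the medium-anxiety reference group receiving zeros on both:

```python
# Hypothetical mapping from anxiety level to (D1, D2) dummy codes.
# Medium anxiety is the reference group, so it gets 0 on both dummies.
DUMMY_CODES = {
    "low": (1, 0),     # D1 = 1 marks the low-anxiety group
    "medium": (0, 0),  # reference group
    "high": (0, 1),    # D2 = 1 marks the high-anxiety group
}


def dummy_code(level):
    """Return the (D1, D2) pair for 'low', 'medium', or 'high'."""
    if level not in DUMMY_CODES:
        raise ValueError(f"unknown anxiety level: {level!r}")
    return DUMMY_CODES[level]


print([dummy_code(lv) for lv in ["low", "medium", "high"]])
# [(1, 0), (0, 0), (0, 1)]
```

With k = 3 groups, only k - 1 = 2 dummy variables are needed; a third indicator for the medium group would be perfectly collinear with the constant.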

For the orthogonal-coded regression, the dummy variables were coded as shown in Table 1 below:

Table 1. Orthogonal coding of dummy variables for the orthogonal-coded regression.

	Low Anxiety	Medium Anxiety	High Anxiety
O1	-1	0	+1
O2	+1	-2	+1
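The codes in Table 1 can be checked with a short sketch (the function names are illustrative only): the codes in each row sum to zero, which makes each row a contrast, and the products of the O1 and O2 codes sum to zero, which makes the two contrasts orthogonal. The orthogonality check assumes equal group sizes, the usual setting for these codes.

```python
# Contrast codes from Table 1, in the order (low, medium, high).
O1 = (-1, 0, 1)   # high vs. low
O2 = (1, -2, 1)   # medium vs. the average of low and high


def is_contrast(codes):
    """A set of contrast codes must sum to zero."""
    return sum(codes) == 0


def are_orthogonal(a, b):
    """With equal group sizes, two contrasts are orthogonal when the
    products of their codes sum to zero."""
    return sum(x * y for x, y in zip(a, b)) == 0


print(is_contrast(O1), is_contrast(O2), are_orthogonal(O1, O2))
# True True True
```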

Both regressions were conducted using the method of forced entry (“Enter”) (Field, 2013).

Dummy-Coded Regression Results

Table 2. Model summary output for the dummy-coded regression.

Model Summary
Model	R	R Square	Adjusted R Square	Std. Error of the Estimate	Change Statistics
					R Square Change	F Change	df1	df2	Sig. F Change
1	.738^a	.544	.468	7.512	.544	7.164	2	12	.009
a. Predictors: (Constant), High Anxiety Group, Low Anxiety Group

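Table 2 above supplies the model summary. The multiple correlation coefficient, R = .738, indicates a strong association between the anxiety-group predictors and Performance. R² = .544, meaning that the model explains approximately 54.4% of the variance in Performance; the adjusted R² = .468 corrects this estimate for the number of predictors relative to the small sample.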
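The figures in the model summary follow from the sums of squares in the ANOVA output (Table 3). A minimal sketch of the arithmetic, with the values typed in from the SPSS output, N = 15, and k = 2 predictors:

```python
import math

# Sums of squares copied from the SPSS ANOVA output (Table 3).
ss_regression = 808.533
ss_residual = 677.200
ss_total = 1485.733
n = 15  # sample size
k = 2   # number of predictors (D1, D2)

r_squared = ss_regression / ss_total
# Adjusted R^2 penalizes R^2 for the number of predictors.
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
# Standard error of the estimate = square root of the residual mean square.
see = math.sqrt(ss_residual / (n - k - 1))

print(f"R = {math.sqrt(r_squared):.3f}, R^2 = {r_squared:.3f}, "
      f"adj. R^2 = {adj_r_squared:.3f}, SEE = {see:.3f}")
# R = 0.738, R^2 = 0.544, adj. R^2 = 0.468, SEE = 7.512
```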

Table 3. The SPSS ANOVA output for the dummy-coded regression.

ANOVA^a
Model		Sum of Squares	df	Mean Square	F	Sig.
1	Regression	808.533	2	404.267	7.164	.009^b
	Residual	677.200	12	56.433
	Total	1485.733	14
a. Dependent Variable: Performance
b. Predictors: (Constant), High Anxiety Group, Low Anxiety Group

Table 3 above provides the ANOVA output for the regression. The overall model was statistically significant, F(2, 12) = 7.164, p = .009. Therefore, the null hypothesis for the overall dummy-coded regression can be rejected at α = .05.
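The F ratio and its p value can be reproduced from the same sums of squares. The sketch below relies on the fact that, when the numerator has 2 degrees of freedom, the upper-tail probability of F(2, df2) has the closed form (1 + 2F/df2)^(-df2/2). This shortcut, and the helper name `f_sf_df1_2`, are illustrative assumptions that do not carry over to other numerator df, where a statistics library would be needed:

```python
# Sums of squares and degrees of freedom from Table 3.
ss_regression, df_regression = 808.533, 2
ss_residual, df_residual = 677.200, 12

ms_regression = ss_regression / df_regression  # 404.267
ms_residual = ss_residual / df_residual        # 56.433
f_stat = ms_regression / ms_residual


def f_sf_df1_2(f, df2):
    """Upper-tail p value of an F(2, df2) statistic.
    Closed form valid ONLY when the numerator df equals 2."""
    return (1 + 2 * f / df2) ** (-df2 / 2)


p_value = f_sf_df1_2(f_stat, df_residual)
print(f"F({df_regression}, {df_residual}) = {f_stat:.3f}, p = {p_value:.3f}")
# F(2, 12) = 7.164, p = 0.009
```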

Table 4. The SPSS Coefficients output for the dummy-coded regression.

Coefficients^a
Model		Unstandardized Coefficients		Standardized Coefficients	t	Sig.	Correlations
		B	Std. Error	Beta			Zero-order	Partial	Part
1	(Constant)	86.400	3.360		25.718	.000
	Low Anxiety Group	-17.600	4.751	-.834	-3.704	.003	-.549	-.730	-.722
	High Anxiety Group	-12.000	4.751	-.568	-2.526	.027	-.152	-.589	-.492
a. Dependent Variable: Performance

Table 4 above presents the Coefficients output. The unstandardized b values give the following prediction equation for Performance (Warner, 2013):

$$\widehat{Performance} = b_{Constant} + b_{LowAnxietyGroup} \cdot D1 + b_{HighAnxietyGroup} \cdot D2 = 86.400 - 17.600 \cdot D1 - 12.000 \cdot D2$$

The b_{LowAnxietyGroup} and b_{HighAnxietyGroup} coefficients are the mean differences between the respective group and the medium-anxiety (reference) group, whose mean is given by b_{Constant} = 86.400.
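Because each coefficient is a difference from the reference group, the predicted value for each group equals that group's sample mean. A sketch using the Table 4 coefficients (the constant names and the `predict` helper are hypothetical):

```python
# Unstandardized coefficients from Table 4 (medium anxiety = reference group).
B_CONSTANT = 86.400  # mean of the medium-anxiety group
B_LOW = -17.600      # low-anxiety mean minus medium-anxiety mean
B_HIGH = -12.000     # high-anxiety mean minus medium-anxiety mean


def predict(d1, d2):
    """Predicted Performance for a case with dummy codes D1 and D2."""
    return B_CONSTANT + B_LOW * d1 + B_HIGH * d2


group_means = {
    "low": predict(1, 0),
    "medium": predict(0, 0),
    "high": predict(0, 1),
}
print({group: round(mean, 1) for group, mean in group_means.items()})
# {'low': 68.8, 'medium': 86.4, 'high': 74.4}
```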

Both b values were statistically significant:

b_{LowAnxietyGroup} = -17.600, t(12) = -3.704, p = .003: the low-anxiety group scored on average 17.6 points lower than the medium-anxiety group. Therefore, the null hypothesis for D1 was rejected, and evidence was found to support the alternative hypothesis. The effect size, measured by the squared semipartial correlation, was sr²_D1 = .52 (large).
b_{HighAnxietyGroup} = -12.000, t(12) = -2.526, p = .027: the high-anxiety group scored on average 12.0 points lower than the medium-anxiety group. Thus, the null hypothesis for D2 was rejected, and evidence was found to support the alternative hypothesis. The effect size, measured by the squared semipartial correlation, was sr²_D2 = .24 (medium).
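The t statistics and effect sizes above can be recomputed from Table 4: t = b / SE, tested on N - k - 1 = 12 residual degrees of freedom, and sr² is the square of the part correlation. A minimal sketch (the dictionary layout is an illustrative choice):

```python
# (b, SE, part correlation) for each dummy variable, from Table 4.
COEFFICIENTS = {
    "D1": (-17.600, 4.751, -0.722),  # low vs. medium anxiety
    "D2": (-12.000, 4.751, -0.492),  # high vs. medium anxiety
}
N, K = 15, 2
df_residual = N - K - 1  # df for each coefficient's t test (= 12)

results = {}
for name, (b, se, part_r) in COEFFICIENTS.items():
    t = b / se         # t statistic for H0: b = 0
    sr2 = part_r ** 2  # squared semipartial correlation
    results[name] = (t, sr2)
    print(f"{name}: t({df_residual}) = {t:.3f}, sr^2 = {sr2:.2f}")
# D1: t(12) = -3.704, sr^2 = 0.52
# D2: t(12) = -2.526, sr^2 = 0.24
```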

Orthogonal-Coded Regression Results

Table 5. Model summary output for the orthogonal-coded regression.

Model Summary
Model	R	R Square	Adjusted R Square	Std. Error of the Estimate	Change Statistics
					R Square Change	F Change	df1	df2	Sig. F Change
1	.738^a	.544	.468	7.512	.544	7.164	2	12	.009
a. Predictors: (Constant), Orthogonal Curvilinear Trend, Orthogonal Positive Linear Trend

Table 5 above provides the model summary, which is identical to that of the dummy-coded model. The multiple correlation coefficient, R = .738, indicates a strong association, and R² = .544, so the model explains approximately 54.4% of the variance in Performance (adjusted R² = .468).

Table 6. The SPSS ANOVA output for the orthogonal-coded regression.

ANOVA^a
Model		Sum of Squares	df	Mean Square	F	Sig.
1	Regression	808.533	2	404.267	7.164	.009^b
	Residual	677.200	12	56.433
	Total	1485.733	14
a. Dependent Variable: Performance
b. Predictors: (Constant), Orthogonal Curvilinear Trend, Orthogonal Positive Linear Trend

Table 6 above provides the ANOVA output for the regression. The overall model was statistically significant, F(2, 12) = 7.164, p = .009. Thus, the null hypothesis for the overall orthogonal-coded regression can be rejected at α = .05.

Table 7. The SPSS Coefficients output for the orthogonal-coded regression.

Coefficients^a
Model		Unstandardized Coefficients		Standardized Coefficients	t	Sig.	Correlations
		B	Std. Error	Beta			Zero-order	Partial	Part
1	(Constant)	76.533	1.940		39.457	.000
	Orthogonal Positive Linear Trend	2.800	2.376	.230	1.179	.261	.230	.322	.230
	Orthogonal Curvilinear Trend	-4.933	1.372	-.701	-3.597	.004	-.701	-.720	-.701
a. Dependent Variable: Performance

Table 7 above supplies the Coefficients output. Here, b_{Constant} = 76.533 is the grand mean of Performance, whereas the other b values represent contrasts among the group means (Warner, 2013):

b_{OrthogonalPositiveLinearTrend} = 2.800; it represents the contrast between the low- and high-anxiety groups. Here, t(12) = 1.179, p = .261. Therefore, the null hypothesis for O1 was not rejected; the difference between the two groups was not statistically significant. The effect size, measured by the squared semipartial correlation, was sr²_O1 = .053 (small).
b_{OrthogonalCurvilinearTrend} = -4.933; it represents the contrast between the mean of the medium-anxiety group and the average of the low- and high-anxiety group means. Because the medium group is coded -2, the negative sign indicates that the medium-anxiety group performed better than the average of the other two groups. In this case, t(12) = -3.597, p = .004; thus, the null hypothesis for O2 was rejected, and evidence was found to support the alternative hypothesis. The effect size, measured by the squared semipartial correlation, was sr²_O2 = .491 (large).
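With equal group sizes (an assumption here: the source gives only N = 15 across three groups, although equal groups of five appear consistent with the reported standard errors), each orthogonal weight equals the sum of code × group mean divided by the sum of squared codes, and the constant is the mean of the three group means. The sketch below takes the group means implied by Table 4 (86.400 for medium, 86.400 - 17.600 = 68.8 for low, 86.400 - 12.000 = 74.4 for high) and reproduces the Table 7 coefficients. It also checks that, because the contrasts are orthogonal, their squared part correlations add up to R²:

```python
# Group means (low, medium, high) implied by the dummy-coded coefficients in Table 4.
means = (68.8, 86.4, 74.4)
O1 = (-1, 0, 1)   # linear contrast codes from Table 1
O2 = (1, -2, 1)   # curvilinear contrast codes from Table 1


def contrast_b(codes, group_means):
    """Regression weight for an orthogonal contrast code, assuming equal
    group sizes: sum(c * M) / sum(c ** 2)."""
    num = sum(c * m for c, m in zip(codes, group_means))
    den = sum(c * c for c in codes)
    return num / den


b_constant = sum(means) / len(means)  # unweighted mean of the group means
print(f"b_Constant = {b_constant:.3f}")           # 76.533
print(f"b_O1 = {contrast_b(O1, means):.3f}")      # (high - low) / 2 = 2.800
print(f"b_O2 = {contrast_b(O2, means):.3f}")      # (low - 2*medium + high) / 6 = -4.933

# Squared part correlations from Table 7 (.230 and -.701) sum to R^2 = .544.
sr2_sum = 0.230 ** 2 + (-0.701) ** 2
print(f"sr^2(O1) + sr^2(O2) = {sr2_sum:.3f}")     # 0.544
```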

Conclusion

In sum, the dummy-coded and the orthogonal-coded multiple regressions gave the same answer to the overall research question (both overall models were significant). The Model Summary outputs (Tables 2 and 5) and the ANOVA outputs (Tables 3 and 6) were identical for the two regressions, which shows that both tested the same overall hypothesis. However, the Coefficients outputs (Tables 4 and 7) differed because the predictor was coded differently, so the two regressions tested different null hypotheses for the individual coded variables.
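One way to see that the two codings describe the same model is to compute the predicted score for each group from both sets of coefficients. A sketch using the values from Tables 4 and 7 and the codes in Table 1 (the constant names and the `predict` helper are illustrative); the two sets of predictions agree up to rounding of the reported coefficients:

```python
DUMMY_B = (86.400, -17.600, -12.000)  # constant, D1, D2 (Table 4)
ORTHO_B = (76.533, 2.800, -4.933)     # constant, O1, O2 (Table 7)

# Codes for each group: ((D1, D2), (O1, O2)).
CODES = {
    "low": ((1, 0), (-1, 1)),
    "medium": ((0, 0), (0, -2)),
    "high": ((0, 1), (1, 1)),
}


def predict(b, x):
    """Linear prediction b0 + b1 * x1 + b2 * x2."""
    b0, b1, b2 = b
    x1, x2 = x
    return b0 + b1 * x1 + b2 * x2


for group, (dummy_x, ortho_x) in CODES.items():
    print(f"{group:>6}: dummy = {predict(DUMMY_B, dummy_x):.2f}, "
          f"orthogonal = {predict(ORTHO_B, ortho_x):.2f}")
#    low: dummy = 68.80, orthogonal = 68.80
# medium: dummy = 86.40, orthogonal = 86.40
#   high: dummy = 74.40, orthogonal = 74.40
```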

A strength of dummy coding is that it allows direct comparisons between each group and a reference group; in the current regression, the medium-anxiety group was compared directly with the low-anxiety group and with the high-anxiety group. In addition, dummy coding makes the group means of the dependent variable easy to obtain: the constant is the reference-group mean, and each other group's mean is the constant plus that group's coefficient. A limitation is that only comparisons with the reference group are tested; comparing two non-reference groups (here, low vs. high anxiety) requires rerunning the regression with a different reference group. An advantage of orthogonal coding is that it makes other contrasts easier to test, such as comparing two groups with each other or comparing one group with the average of the remaining groups. A disadvantage, however, is that it is somewhat more difficult to calculate the group means of the dependent variable from the coefficients.

References

Field, A. (2013). Discovering statistics using IBM SPSS Statistics (4th ed.). Thousand Oaks, CA: SAGE Publications.

Warner, R. M. (2013). Applied statistics: From bivariate through multivariate techniques (2nd ed.). Thousand Oaks, CA: SAGE Publications.