
# Logistic Regression Results for Data Analysis


## Introduction

This paper provides the results of a logistic regression that was run to analyze the data contained in the file “helping3.sav.” The data file is described; the assumptions for the logistic regression are articulated and tested; the research question, hypotheses, and alpha level are stated; and the results of the test are reported and explained.

## Data File Description

The data were taken from the file “helping3.sav,” available at https://s3.amazonaws.com/documents.routledge-interactive/9780134320250/eResources.zip. The file comprises real research data assessing people’s helpfulness (George & Mallery, 2016). The present paper uses logistic regression to predict helpers’ helpfulness, as assessed by the friend they tried to help (cathelp, dichotomous: 0 = helpful, 1 = not helpful, after modification for analysis in SPSS), from the helpers’ sympathy toward the friend (sympathy, interval/ratio), their anger toward the friend (angert, interval/ratio), their self-efficacy (effect, interval/ratio), and their ethnicity (ethnic, categorical; dummy-coded in the regression; see Appendix). For all interval/ratio variables, lower numbers represent a lower intensity of the feeling (George & Mallery, 2016). The sample size is N = 537. The demographic variables in the data set suggest that the sample covers a rather broad population.

## Assumptions, Data Screening, and Verification of Assumptions

### Assumptions

Logistic regression requires that certain assumptions be satisfied (Field, 2013; Laerd Statistics, n.d.):

1. The dependent variable is dichotomous and comprises mutually exclusive, exhaustive categories, and the independent variables are interval/ratio or categorical;
2. The observations are independent;
3. There are linear relationships between the continuous independent variables and their logit transformations;
4. There is no multicollinearity in the data.

### Verifying Assumptions

#### Assumption 1

As follows from the “Data File Description” section, the assumption is met.

#### Assumption 2

The observations are independent: each case represents a different person helping a different friend.

#### Assumption 3

To test this assumption, it is advised to run a logistic regression using the “Enter” method that includes interaction terms between each continuous independent variable and its logit transformation (for instance, its natural logarithm), a procedure known as the Box-Tidwell approach (Field, 2013, sec. 19.4.1, 19.8.1).

Accordingly, three variables were created for this purpose: ln_sympatht, ln_effict, and ln_angert. The results of the analysis are supplied in Table 1 below. The assumption is violated for the variable angert, because its interaction term is significant (p = .015).

**Variables in the Equation**

| Step 1ᵃ | B | S.E. | Wald | df | Sig. | Exp(B) |
|---|---|---|---|---|---|---|
| sympathy | .978 | 1.313 | .555 | 1 | .456 | 2.659 |
| angert | 1.341 | .541 | 6.153 | 1 | .013 | 3.823 |
| effect | 4.121 | 2.834 | 2.114 | 1 | .146 | 61.637 |
| ethnic | | | 5.036 | 4 | .284 | |
| ethnic(1) | -.330 | .369 | .801 | 1 | .371 | .719 |
| ethnic(2) | .204 | .486 | .176 | 1 | .675 | 1.226 |
| ethnic(3) | -.105 | .436 | .058 | 1 | .810 | .900 |
| ethnic(4) | -.675 | .451 | 2.239 | 1 | .135 | .509 |
| ln_sympatht by sympathy | -.189 | .521 | .132 | 1 | .716 | .828 |
| angert by ln_angert | -.630 | .260 | 5.882 | 1 | .015 | .533 |
| effect by ln_effict | -1.166 | 1.110 | 1.103 | 1 | .294 | .312 |
| Constant | -15.355 | 5.553 | 7.646 | 1 | .006 | .000 |

a. Variable(s) entered on step 1: sympatht, angert, effict, ethnic, ln_sympatht * sympatht, angert * ln_angert, effict * ln_effict.

Table 1. Testing the assumption of the linear relationship between the independent variables and their logit transformations.

To address this problem, data transformation procedures can be employed to adjust the independent variables (Warner, 2013). However, this will not be done in the current paper due to the need to follow specific instructions from George and Mallery (2016).
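The transformation terms used in this linearity test can be sketched in a few lines (an illustrative Python snippet; the function name and sample values are invented and are not the helping3.sav data):

```python
import math

def box_tidwell_terms(x, name):
    """Return the ln(x) transform and the x * ln(x) interaction term
    for a continuous predictor. The x * ln(x) term is what enters the
    test regression; a significant coefficient on it signals that the
    predictor's relationship with the logit is not linear. Values must
    be strictly positive (shift the variable first if necessary)."""
    ln_x = [math.log(v) for v in x]
    interaction = [v * lv for v, lv in zip(x, ln_x)]
    return {f"ln_{name}": ln_x, f"{name}_by_ln_{name}": interaction}

# Hypothetical anger scores (1 = low anger), purely for illustration
angert = [1.0, 2.5, 4.0, 6.0]
terms = box_tidwell_terms(angert, "angert")
```

In SPSS the same terms are created with COMPUTE statements before running the test regression.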

#### Assumption 4

To test the assumption of non-multicollinearity, it is possible to run a linear regression with the same variables as the main logistic regression while using the SPSS option “collinearity diagnostics” (Field, 2013, sec. 19.8.2). Tolerance values below 0.1 and VIF (variance inflation factor) values greater than 10 indicate multicollinearity. As Table 2 below shows, the analyzed data exhibit no such values. Therefore, the assumption is met.

**Coefficientsᵃ**

| Model 1 | Tolerance | VIF |
|---|---|---|
| sympathy | .945 | 1.058 |
| effect | .963 | 1.038 |
| angert | .982 | 1.018 |
| ethnic | .996 | 1.004 |

a. Dependent Variable: cathelp

Table 2. Coefficients for collinearity diagnostics.
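Since tolerance is defined as 1 - R²(j) (from regressing predictor j on the other predictors) and VIF is its reciprocal, the VIF column of Table 2 can be reproduced directly from the tolerance column (a small Python check):

```python
def vif_from_tolerance(tolerance):
    """VIF is the reciprocal of tolerance (tolerance = 1 - R_j^2)."""
    return 1.0 / tolerance

# Tolerance values reported in Table 2
tolerances = {"sympathy": .945, "effect": .963, "angert": .982, "ethnic": .996}
vifs = {k: round(vif_from_tolerance(t), 3) for k, t in tolerances.items()}
# No tolerance is below 0.1 and no VIF exceeds 10, so the cut-offs are met
```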

## Screening the Data

Figures 1-3 below provide histograms for the sympathy, effect, and angert variables. Sympathy and effect are approximately normally distributed, with no apparent considerable outliers. The angert variable, however, is not normally distributed, owing to a large number of values close to 1 (about 220 cases, or nearly 40% of the sample), and this skew produces some outliers. Nevertheless, the data will not be transformed and no outliers will be excluded, because the analysis follows the specific instructions of George and Mallery (2016); moreover, it is very improbable that such a distribution results from sampling error.

As for out-of-bounds values, the histograms show that there are no such values in the data.

## Inferential Procedure, Hypotheses, Alpha Level

The research question for the given analysis is: “Do any of the following variables predict helpfulness as perceived by those who received help: the sympathy, anger, self-efficacy, or ethnicity of helpers?” The null hypothesis is: “None of the variables (the sympathy, anger, self-efficacy, and ethnicity of helpers) predicts helpfulness as perceived by those who received help.” The alternative hypothesis is: “At least one of the variables (the sympathy, anger, self-efficacy, and ethnicity of helpers) predicts helpfulness as perceived by those who received help.” The alpha level is the standard α = .05. The hypotheses will be evaluated using the model χ² and its significance value.

## Interpretation

A logistic regression was conducted using the variables sympathy, effect, angert, and ethnic as predictors and the variable cathelp as the outcome. As noted, the variable ethnic was recoded by SPSS using the dummy coding technique (see Appendix). The forward stepwise method of entry based on the likelihood ratio was used; the likelihood ratio for variable entry was set at .05, and a likelihood ratio of .10 was used for variable removal.

There were two steps of variable entry. Table 3 below shows how each entered variable improved the model: the first variable added χ2(1) = 114.843, p < .001, and the second variable added χ2(1) = 29.792, p < .001, resulting in a total model χ2(2) = 144.635, p < .001.

**Omnibus Tests of Model Coefficients**

| Step | | Chi-square | df | Sig. |
|---|---|---|---|---|
| Step 1 | Step | 114.843 | 1 | .000 |
| | Block | 114.843 | 1 | .000 |
| | Model | 114.843 | 1 | .000 |
| Step 2 | Step | 29.792 | 1 | .000 |
| | Block | 144.635 | 2 | .000 |
| | Model | 144.635 | 2 | .000 |

Table 3. The variables included in the model.
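The reported significance levels can be sanity-checked against the χ² values, since the chi-square survival function has closed forms for df = 1 and df = 2 (a standalone Python sketch using only the standard library; the χ² inputs are those reported in Table 3):

```python
import math

def chi2_sf(x, df):
    """Right-tail p-value of the chi-square distribution, using the
    closed forms available for df = 1 and df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("closed form implemented only for df in {1, 2}")

# Model chi-squares from Table 3
p_step1 = chi2_sf(114.843, 1)   # first predictor entered
p_step2 = chi2_sf(29.792, 1)    # improvement from the second predictor
p_model = chi2_sf(144.635, 2)   # full model
# all three are far below the alpha level of .05
```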

As a result, the variables angert and ethnic(1) through ethnic(4) (the dummy variables representing the variable ethnic) were not included in the model (see Table 4 below).

**Variables not in the Equation**

| Step | Variable | Score | df | Sig. |
|---|---|---|---|---|
| Step 1 | sympathy | 29.135 | 1 | .000 |
| | angert | .019 | 1 | .891 |
| | ethnic | 4.379 | 4 | .357 |
| | ethnic(1) | .353 | 1 | .553 |
| | ethnic(2) | 2.681 | 1 | .102 |
| | ethnic(3) | .246 | 1 | .620 |
| | ethnic(4) | 1.709 | 1 | .191 |
| | Overall Statistics | 34.113 | 6 | .000 |
| Step 2 | angert | .640 | 1 | .424 |
| | ethnic | 4.892 | 4 | .299 |
| | ethnic(1) | .464 | 1 | .496 |
| | ethnic(2) | 2.304 | 1 | .129 |
| | ethnic(3) | .357 | 1 | .550 |
| | ethnic(4) | 2.169 | 1 | .141 |
| | Overall Statistics | 5.404 | 5 | .369 |

Table 4. The variables which were not included in the model (the final step).

From Table 5 below, it can be seen that the final model predicted the outcome variable better than the first one, as its -2 log-likelihood was smaller (George & Mallery, 2016). The final model explained roughly 23.6-31.5% of the variance in the data (Cox & Snell R2 = .236, Nagelkerke R2 = .315). Table 6 below shows that the final model correctly classified nearly 70.4% of cases.

**Model Summary**

| Step | -2 Log likelihood | Cox & Snell R Square | Nagelkerke R Square |
|---|---|---|---|
| 1 | 629.506ᵃ | .193 | .257 |
| 2 | 599.713ᵇ | .236 | .315 |

a. Estimation terminated at iteration number 4 because parameter estimates changed by less than .001.
b. Estimation terminated at iteration number 5 because parameter estimates changed by less than .001.

Table 5. Model summary for the final step.
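As a cross-check, the two pseudo-R² values in Table 5 can be reproduced from the reported model χ² and -2 log-likelihood (a minimal Python sketch; the formulas are the standard Cox & Snell and Nagelkerke definitions, not taken from the source):

```python
import math

# Values reported in Tables 3 and 5; N = 537
n = 537
chi2_model = 144.635                       # model chi-square at the final step
neg2ll_model = 599.713                     # -2 log-likelihood at the final step
neg2ll_null = neg2ll_model + chi2_model    # -2LL of the intercept-only model

# Cox & Snell R^2 = 1 - (L0 / L1)^(2/n) = 1 - exp(-chi2 / n)
cox_snell = 1.0 - math.exp(-chi2_model / n)

# Nagelkerke R^2 rescales Cox & Snell so that its maximum is 1
nagelkerke = cox_snell / (1.0 - math.exp(-neg2ll_null / n))
```

Both computed values match the SPSS output (.236 and .315) to three decimals.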

**Classification Tableᵃ**

| Step | Observed (cathelp) | Predicted: NOT HELPFUL | Predicted: HELPFUL | Percentage Correct |
|---|---|---|---|---|
| Step 1 | NOT HELPFUL | 176 | 89 | 66.4 |
| | HELPFUL | 79 | 193 | 71.0 |
| | Overall Percentage | | | 68.7 |
| Step 2 | NOT HELPFUL | 181 | 84 | 68.3 |
| | HELPFUL | 75 | 197 | 72.4 |
| | Overall Percentage | | | 70.4 |

a. The cut value is .500

Table 6. Classification table for the final step.
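The overall percentage correct can be reproduced directly from the step 2 cell counts in Table 6 (a minimal Python sketch):

```python
# Step 2 classification counts from Table 6
# keys are (observed, predicted) at the .500 cut value
confusion = {
    ("not helpful", "not helpful"): 181,
    ("not helpful", "helpful"): 84,
    ("helpful", "not helpful"): 75,
    ("helpful", "helpful"): 197,
}
total = sum(confusion.values())            # should equal N = 537
correct = (confusion[("not helpful", "not helpful")]
           + confusion[("helpful", "helpful")])
accuracy = 100.0 * correct / total         # overall percentage correct
```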

Only the variables effect and sympathy were retained in the final model, as can be seen from Table 7 below. Effect significantly predicted helpfulness: Exp(B)=3.046, Wald’s test statistic=76.197, df=1, p<.001. Sympathy also significantly predicted helpfulness: Exp(B)=1.596, Wald’s test statistic=27.459, df=1, p<.001. The Constant coefficient for the model was Exp(B)=.001; it had Wald’s test statistic=93.006, df=1, p<.001.

**Variables in the Equation**

| Step | Variable | B | S.E. | Wald | df | Sig. | Exp(B) |
|---|---|---|---|---|---|---|---|
| Step 1ᵃ | effect | 1.130 | .122 | 85.356 | 1 | .000 | 3.094 |
| | Constant | -5.303 | .585 | 82.030 | 1 | .000 | .005 |
| Step 2ᵇ | sympathy | .467 | .089 | 27.459 | 1 | .000 | 1.596 |
| | effect | 1.114 | .128 | 76.197 | 1 | .000 | 3.046 |
| | Constant | -7.471 | .775 | 93.006 | 1 | .000 | .001 |

a. Variable(s) entered on step 1: effict.
b. Variable(s) entered on step 2: sympathy.

Table 7. Variables in the regression equation at the final step.

Thus, the equation for the regression model at the final step was as follows (George & Mallery, 2016):

ln(Odds(helping)) = ln(P(helping) / P(not helping)) = -7.471 + 0.467 × sympatht + 1.114 × effict,

or, equivalently:

Odds(helping) = P(helping) / P(not helping) = e^(-7.471) × e^(0.467 × sympatht) × e^(1.114 × effict).
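The fitted equation converts to a probability through the logistic transform (a Python sketch using the step 2 coefficients from Table 7; the predictor values below are purely illustrative inputs, not cases from the data set):

```python
import math

def predicted_prob(sympatht, effict):
    """Probability implied by the final regression equation:
    p = 1 / (1 + e^(-logit)), with the logit from Table 7, step 2."""
    logit = -7.471 + 0.467 * sympatht + 1.114 * effict
    return 1.0 / (1.0 + math.exp(-logit))

# Illustrative (invented) values: because both B coefficients are
# positive, higher sympathy and self-efficacy raise the predicted odds
low = predicted_prob(1.0, 1.0)
high = predicted_prob(5.0, 6.0)
```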

Therefore, the null hypothesis was rejected, and evidence was found to support the alternative hypothesis.

One limitation worth noting is that the given equations do not take into account the standard errors of the B coefficients displayed in Table 7 above. Another limitation concerns the entry method: the forward stepwise method based on the likelihood ratio selects variables for the final model on purely mathematical grounds, a practice strongly advised against by Field (2013).

## Conclusion

Thus, logistic regression was run to find out whether the sympathy, anger, self-efficacy, and ethnicity of helpers could predict whether they would be assessed as helpful or non-helpful by friends for whom they provided the aid. It was found that sympathy and self-efficacy could predict their helpfulness, whereas the rest of the independent variables could not. Therefore, the null hypothesis was rejected, and evidence was found to support the alternative hypothesis that at least some of the independent variables could predict the helpfulness of helpers.

## References

Field, A. (2013). Discovering statistics using IBM SPSS Statistics (4th ed.). Thousand Oaks, CA: SAGE Publications.

George, D., & Mallery, P. (2016). IBM SPSS Statistics 23 step by step: A simple guide and reference (14th ed.). New York, NY: Routledge.

Laerd Statistics. (n.d.). Binomial logistic regression using SPSS Statistics. Web.

Warner, R. M. (2013). Applied statistics: From bivariate through multivariate techniques (2nd ed.). Thousand Oaks, CA: SAGE Publications.

## Appendix

Dummy coding of the independent variable ethnic as carried out by SPSS:

**Categorical Variables Codings**

| ethnic | Frequency | (1) | (2) | (3) | (4) |
|---|---|---|---|---|---|
| WHITE | 293 | 1.000 | .000 | .000 | .000 |
| BLACK | 50 | .000 | 1.000 | .000 | .000 |
| HISPANIC | 80 | .000 | .000 | 1.000 | .000 |
| ASIAN | 70 | .000 | .000 | .000 | 1.000 |
| OTHER/DTS | 44 | .000 | .000 | .000 | .000 |
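The coding scheme above can be mimicked in a few lines (a hypothetical Python sketch; the sample list is invented for illustration, and the category ordering here is alphabetical rather than SPSS's frequency-based ordering):

```python
def dummy_code(values, reference):
    """Indicator (dummy) coding with a chosen reference category:
    each non-reference category gets its own 0/1 column, and the
    reference category is coded as all zeros."""
    categories = [c for c in sorted(set(values)) if c != reference]
    return [
        {cat: (1.0 if v == cat else 0.0) for cat in categories}
        for v in values
    ]

# Hypothetical mini-sample; OTHER/DTS is the all-zero reference category
ethnic = ["WHITE", "BLACK", "OTHER/DTS", "HISPANIC", "ASIAN"]
coded = dummy_code(ethnic, reference="OTHER/DTS")
```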