Logistic Regression: Research Methods


Logistic regression is a method used to model dichotomous outcome variables. It takes two forms: simple logistic regression and multiple logistic regression. Simple logistic regression is used when there is one nominal variable with two values and one measurable variable. In that case the nominal variable is the dependent variable and the measurable variable is the independent variable. When there are two or more independent variables and the dependent variable is nominal, multiple logistic regression is used instead. Simple logistic regression resembles linear regression in a number of ways; the difference is that in simple logistic regression the dependent variable is nominal. The major hurdle is predicting the value of the nominal variable when the value of the measurable variable is given (Ahuja, 2010).

Logistic regression makes it possible to predict the value of the nominal variable. The model is highly relevant to nursing because it is used to estimate the probability that a certain medical condition will occur. One may, for instance, set out to determine the effect of cholesterol level on heart attack. In such a case, the statistician might take a number of women who are 50 years old and record their cholesterol levels. Fifteen years later, the statistician follows up to determine how many of them had a heart attack and compares this with the recorded cholesterol levels. Based on the findings, one can deduce whether or not cholesterol level affects the probability of having a heart attack. Logistic regression is also used when the measurable variable has been set by the statistician while the nominal variable is free to vary. Logistic regression can also be done when there are two nominal variables (Brockopp, 2003).

How it Works

In simple logistic regression, an equation is used to find the Y value for every value of the X variable. In linear regression, Y is a measured quantity that the equation predicts directly. In logistic regression, Y is the probability of getting a given value of the nominal variable. In the example of cholesterol level and heart attack, Y would be the probability of having a heart attack. This value ranges from 0 to 1. Because the value is bounded, it is not used directly; it is first converted to the odds, Y/(1−Y). If the probability of a patient suffering a heart attack is 0.25, the odds are 0.25/(1−0.25), that is, 1/3. The natural log of the odds is then modeled as a straight line: ln[Y/(1−Y)] = a + bX. The intercept a and the slope b define the line of best fit. They are found by the maximum likelihood method, which chooses the parameter values under which the observed results are most probable. This method is computer intensive, unlike the least-squares method employed in linear regression. The P value can be determined by a number of methods, although the most recommended is the likelihood ratio method (Brockopp, 2003).
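The relationship between probability, odds and log odds can be sketched in a few lines of Python. The sketch reads the odds as Y/(1−Y), which is the form that yields the 1/3 quoted for a probability of 0.25. The cholesterol readings and outcomes are invented purely for illustration, and the Newton-Raphson loop is only one common way of carrying out maximum likelihood; it is an assumption here, not necessarily what any particular statistics package does.

```python
import math

def odds(p):
    """Convert a probability Y into odds, Y / (1 - Y)."""
    return p / (1.0 - p)

def logit(p):
    """Natural log of the odds, ln[Y / (1 - Y)]."""
    return math.log(odds(p))

def inv_logit(z):
    """Map a log-odds value a + b*X back to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_simple_logistic(xs, ys, iterations=25):
    """Estimate intercept a and slope b by maximum likelihood (Newton-Raphson)."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # entries of the (negated) Hessian
        for x, y in zip(xs, ys):
            p = inv_logit(a + b * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = g
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Hypothetical data: cholesterol level (X) and heart attack within 15 years (1 = yes)
cholesterol = [4.2, 4.8, 5.1, 5.5, 5.9, 6.3, 6.6, 7.0, 7.4, 7.9]
heart_attack = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

a, b = fit_simple_logistic(cholesterol, heart_attack)
predicted = inv_logit(a + b * 6.0)  # estimated probability at X = 6.0

print("odds at Y = 0.25:", odds(0.25))
print("intercept a:", a, "slope b:", b)
print("estimated probability at X = 6.0:", predicted)
```

A positive slope b in this sketch would mean that the estimated probability of a heart attack rises with cholesterol level; with real data the P value would still have to be checked before drawing that conclusion.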

Logistic regression is often preferred for analyzing dichotomous outcomes because its results are relatively easy to interpret. Multiple logistic regression is used more often, as it allows researchers to examine the relationship between the dependent variable and several independent (predictor) variables at once. The results of a logistic regression usually include the odds ratio, which makes the data easier to interpret. The odds ratio is widely used because it expresses how the odds of an occurrence change with the predictor. The odds ratio can also be used to estimate the relative risk. If there is only one observation of the nominal variable for each value of the measurable variable, a scatter graph is of little use, as every Y value will be at either 0 or 1 (Ahuja, 2010).
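The link between the regression coefficient, the odds ratio and the relative risk can be illustrated with a short Python sketch. The counts in the 2x2 table are hypothetical, and the conversion from odds ratio to relative risk assumes that the risk in the unexposed (baseline) group is known; without that baseline risk the relative risk cannot be recovered from the odds ratio alone.

```python
import math

def odds_ratio_from_coefficient(b):
    """Exp(B): the multiplicative change in the odds per one-unit rise in X."""
    return math.exp(b)

def odds_ratio_2x2(exp_cases, exp_noncases, unexp_cases, unexp_noncases):
    """Odds of the outcome among the exposed divided by odds among the unexposed."""
    return (exp_cases * unexp_noncases) / (exp_noncases * unexp_cases)

def relative_risk_2x2(exp_cases, exp_noncases, unexp_cases, unexp_noncases):
    """Risk of the outcome among the exposed divided by risk among the unexposed."""
    risk_exposed = exp_cases / (exp_cases + exp_noncases)
    risk_unexposed = unexp_cases / (unexp_cases + unexp_noncases)
    return risk_exposed / risk_unexposed

def relative_risk_from_odds_ratio(odds_ratio, baseline_risk):
    """Convert an odds ratio to a relative risk, given the unexposed group's risk."""
    return odds_ratio / (1.0 - baseline_risk + baseline_risk * odds_ratio)

# Hypothetical counts: high cholesterol 30 heart attacks / 70 none,
# low cholesterol 10 heart attacks / 90 none
table = (30, 70, 10, 90)

odds_ratio = odds_ratio_2x2(*table)
relative_risk = relative_risk_2x2(*table)
converted = relative_risk_from_odds_ratio(odds_ratio, baseline_risk=10 / 100)

print("odds ratio:", odds_ratio)
print("relative risk from the table:", relative_risk)
print("relative risk converted from the odds ratio:", converted)
```

In this invented table the odds ratio is larger than the relative risk, which is the usual pattern when the outcome is not rare; the two measures only come close together when the baseline risk is small.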

The Use of Spreadsheets/SPSS

Spreadsheets are useful for organizing data and checking it for errors before analysis. SPSS distinguishes between nominal, ordinal and scale data, and its variables need short labels (Baker, 1988).

In SPSS, the Omnibus Tests of Model Coefficients table is especially important in complex models; it tests whether the covariates in the model jointly predict the outcome. The Model Summary table shows how well the model fits the data, which allows different models to be compared. The Classification Table supports diagnostic testing: it sets the observed outcomes against the predicted outcomes and reports the percentage classified correctly. It is often necessary to set the cut-off to a value other than the default. The percentage correct for each observed category corresponds to the sensitivity or the specificity. Another well-known table is Variables in the Equation, which has a B column for the estimated log odds ratio, a Sig. column for the p-value and an Exp(B) column for the odds ratio. The Risk Estimate table gives the odds ratio together with the related risk ratio information. The Case Processing Summary table reports the number of missing cases and of cases that were not selected, which guards against unnoticed loss of data. The Dependent Variable Encoding table shows which categories have been coded as 0 and 1. If the observed results diverge from the expected ones, this is the table the statistician should check (Grove, 2005).
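What a classification table reports, and why the cut-off matters, can be shown with a small Python sketch. It does not reproduce SPSS output; the observed outcomes and predicted probabilities are hypothetical, and the 0.5 default cut-off is an assumption made for the illustration.

```python
def classification_table(observed, predicted_probs, cutoff=0.5):
    """Cross-tabulate observed outcomes against predictions at a given cut-off."""
    tp = fp = tn = fn = 0
    for y, p in zip(observed, predicted_probs):
        predicted = 1 if p >= cutoff else 0
        if y == 1 and predicted == 1:
            tp += 1      # heart attack observed and predicted
        elif y == 0 and predicted == 1:
            fp += 1      # predicted but not observed
        elif y == 0 and predicted == 0:
            tn += 1      # neither observed nor predicted
        else:
            fn += 1      # observed but not predicted
    return {
        "true_pos": tp, "false_pos": fp, "true_neg": tn, "false_neg": fn,
        # Percentage correct among observed 1s and observed 0s respectively
        # (this sketch assumes both categories occur in the data)
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "percent_correct": 100.0 * (tp + tn) / (tp + fp + tn + fn),
    }

# Hypothetical observed outcomes (1 = heart attack) and model probabilities
observed = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
probabilities = [0.05, 0.12, 0.20, 0.35, 0.45, 0.55, 0.62, 0.75, 0.85, 0.93]

default_table = classification_table(observed, probabilities)             # cut-off 0.5
lowered_table = classification_table(observed, probabilities, cutoff=0.3)

print(default_table)
print(lowered_table)
```

Lowering the cut-off in this example raises the sensitivity at the expense of the specificity, which is the trade-off a researcher weighs when moving away from the default value.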

In the cholesterol example, the category coded 1 would represent the women who had a heart attack within the 15-year period. The marginal percentage is the proportion of valid observations that fall in a particular group. In the given case, it can be found by taking the proportion of the women in the group who had a heart attack. The parameter referred to as ‘Valid’ stands for those observations in which the outcome and the predictor variables are all present. The ‘Missing’ parameter stands for observations in which the outcome or a predictor is missing. ‘Total’ is the sum of the valid and missing observations in a given set of data (Grove, 2005).

A subpopulation is a particular combination of values of the predictor variables. B, the regression coefficient, is the change in the dependent variable (in logistic regression, the log odds) for a one-unit increase in the predictor variable. Another statistic is the t-test, a statistical comparison between two groups. It helps to determine whether the two groups are similar or different; their rates of change might, for instance, be identical. In the given example, if the population examined included both sexes (male and female), the t-test would determine whether cholesterol level had the same effect on the probability of having a heart attack in both sexes or whether there was a difference (Grove, 2005). The F-statistic is vital in the analysis of variance as well as in linear regression. When only two groups are compared, or a single coefficient is tested, it equals the square of the t-value. The null hypothesis, on the other hand, is the default position, usually that there is no effect or no difference, which the research findings either lead one to reject or fail to reject (Munro, 2000). In the mentioned case, the null hypothesis could be: ‘Cholesterol level has no effect on the probability of having a heart attack.’ The claim that people with high cholesterol levels are more susceptible to heart attacks than those with lower levels would then be the alternative hypothesis.
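The statement that the F-statistic is the square of the t-value can be checked numerically for the simplest case, a comparison of two groups. The Python sketch below assumes a pooled-variance (equal variance) two-sample t-test and a one-way analysis of variance with two groups; the cholesterol readings for women and men are hypothetical.

```python
import math
from statistics import mean

def sum_of_squares(values):
    """Sum of squared deviations from the group mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def pooled_t(group_a, group_b):
    """Two-sample t statistic with a pooled variance estimate."""
    na, nb = len(group_a), len(group_b)
    pooled_var = (sum_of_squares(group_a) + sum_of_squares(group_b)) / (na + nb - 2)
    standard_error = math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return (mean(group_a) - mean(group_b)) / standard_error

def anova_f(group_a, group_b):
    """One-way ANOVA F statistic for two groups."""
    na, nb = len(group_a), len(group_b)
    grand_mean = mean(group_a + group_b)
    ss_between = (na * (mean(group_a) - grand_mean) ** 2
                  + nb * (mean(group_b) - grand_mean) ** 2)
    ss_within = sum_of_squares(group_a) + sum_of_squares(group_b)
    # Between-groups df is 1 (two groups); within-groups df is na + nb - 2
    return (ss_between / 1) / (ss_within / (na + nb - 2))

# Hypothetical cholesterol readings
women = [5.1, 5.6, 6.0, 6.4, 5.8]
men = [5.9, 6.3, 6.8, 6.1, 7.0, 6.5]

t_value = pooled_t(women, men)
f_value = anova_f(women, men)

print("t:", t_value)
print("t squared:", t_value ** 2)
print("F:", f_value)
```

The two printed quantities, t squared and F, agree up to rounding error. With more than two groups the F-statistic no longer reduces to a single squared t-value.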

Regression is a technique that estimates how strongly the dependent variable is related to the independent variables. As the given case shows, logistic regression is very useful for determining the probability that a condition will or will not occur. Based on such deductions, one can give advice on a number of conditions, for instance in the medical field. In the given case, if it turns out that those with high cholesterol levels have a greater chance of having a heart attack, the doctor might advise his or her patients to reduce their cholesterol consumption so as to lower the risk of a heart attack (Munro, 2000).

References

Ahuja, R. (2010). Research Methods. New Delhi: Rawat Publications.

Baker, T. (1988). Doing Social Research. New York: McGraw Hill Book Co.

Brockopp, D. (2003). Fundamentals of Nursing Research. Boston: Jones and Bartlett.

Grove, N. B. (2005). The Practice of Nursing Research: Appraisal, Synthesis, and Generation of Evidence. Texas: Saunders.

Munro, B. H. (2000). Statistical methods for health care research. Michigan: Lippincott Williams & Wilkins.