Split Variables in IBM SPSS Statistical Software

IBM SPSS provides an option to split a data file into groups. Each case is assigned to a group according to its value on a split variable. Virtually any variable can serve as a split variable, but only certain types of variables produce a useful split (Warner, 2013).

For the division into groups to be meaningful, the split variable usually needs to be categorical rather than continuous. Splitting a file on a continuous variable often produces nearly as many subgroups as there are cases, which defeats the purpose of the division. A categorical variable, by contrast, typically splits the file into a manageable number of groups.

In addition, the split variable needs to divide the file into meaningful subsets for which it makes sense to calculate statistics separately. For example, it makes sense to divide a sample of participants into groups according to their gender or race, or according to their level of income when income is measured categorically, for instance as “low,” “medium,” and “high,” or as “below $20,000,” “$20,000-50,000,” and “above $50,000.” Another example is the participants’ level of education: 1 – high school or lower, 2 – some college, 3 – college degree, 4 – Master’s degree, 5 – Ph.D. or a similar degree.
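
The idea of calculating statistics separately for each income group can be illustrated outside SPSS with a minimal Python sketch. It is not SPSS syntax; the sample data, the test scores, and the helper names `income_bracket` and `split_by` are all hypothetical and are used only to show how a continuous income is first turned into three categories and then used as a split variable.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample: (annual income in dollars, test score) per participant.
participants = [
    (15_000, 61), (18_500, 67), (32_000, 70),
    (47_000, 74), (58_000, 80), (91_000, 84),
]


def income_bracket(income):
    """Map a continuous income onto one of three income categories."""
    if income < 20_000:
        return "below $20,000"
    if income <= 50_000:
        return "$20,000-50,000"
    return "above $50,000"


def split_by(cases, key):
    """Group cases by the value of a split variable computed by `key`."""
    groups = defaultdict(list)
    for case in cases:
        groups[key(case)].append(case)
    return dict(groups)


# Split on the categorical bracket, then compute a statistic per group.
groups = split_by(participants, lambda case: income_bracket(case[0]))
mean_scores = {
    name: mean(score for _, score in members)
    for name, members in groups.items()
}

for name, value in mean_scores.items():
    print(f"{name}: n={len(groups[name])}, mean score={value}")
```

Splitting on the raw income values instead of the brackets would put every participant in this sample into a group of their own, so no group mean would summarize more than one case.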

Using test scores as a split variable usually does not make sense because there would be nearly as many groups as participants; most of these “groups” would contain a single participant and would represent nothing beyond that individual. Similarly, it does not make sense to use ID numbers as split variables, because every participant would form a separate group.
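
One rough way to check a candidate split variable before using it is to count how many distinct values it takes, since each distinct value becomes one group. The sketch below does this in plain Python for a small, made-up data set; the variable names and the helper `group_count` are assumptions for illustration and do not correspond to any SPSS feature.

```python
# Hypothetical data set: one dict per participant.
cases = [
    {"id": 1, "gender": "F", "score": 71.5},
    {"id": 2, "gender": "M", "score": 64.0},
    {"id": 3, "gender": "F", "score": 88.5},
    {"id": 4, "gender": "M", "score": 71.5},
    {"id": 5, "gender": "F", "score": 93.0},
    {"id": 6, "gender": "M", "score": 79.5},
]


def group_count(cases, variable):
    """Number of groups that a split on `variable` would produce."""
    return len({case[variable] for case in cases})


# Compare the candidate split variables against the number of cases.
counts = {name: group_count(cases, name) for name in ("id", "gender", "score")}

for name, n_groups in counts.items():
    print(f"{name}: {n_groups} groups for {len(cases)} cases")
```

In this made-up sample the ID number yields one group per case, the test score yields almost as many, and only gender yields a small number of groups that could sensibly be compared.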

Splitting the sample into groups ought to create a meaningful division, so that calculating statistics for each group and comparing them yields interpretable results. In addition, the number of groups resulting from the split should be manageable. For example, it is easy to compare two or three groups, but a comparison of 50 different groups would often be too unwieldy to interpret (Field, 2013).

References

Field, A. (2013). Discovering statistics using IBM SPSS statistics (4th ed.). Thousand Oaks, CA: SAGE Publications.

Warner, R. M. (2013). Applied statistics: From bivariate through multivariate techniques (2nd ed.). Thousand Oaks, CA: SAGE Publications.