Data Suitability
Although the structure of the data appears normal (each observation is stored in a single record; replications have separate records; each variable has its own field, which is always in the same column, etc.), not all the variables from the file “Data 02 (1).sav” are suitable for analysis. The data in several variables is missing or appears corrupted. For example, the values 7777 and 9999 in the variable HEIGHT3 cannot be interpreted without further clarification. The variable ORACE2 has only 58 valid values; the remaining 7631 values are missing. The data in the variable IDATE is corrupt because the last digit of the year is missing, although this can be mitigated because the year is available in the IYEAR variable. The variable HEIGHT3 is difficult to use in its original form, in which it can only be treated as categorical; it should be recoded into a continuous variable (this is done below).
The variable CHILDREN, which is supposed to reflect the number of children in a household, has the value 88 in most cases (5728 out of 7689 values). For certain variables, it might be possible to address the problem. For instance, the value “88” in CHILDREN might mean that the household has no children. Similarly, the missing values in the variable PREGNANT may denote “not pregnant,” although it is impossible to differentiate these from data that is simply missing.
On the whole, some of the data is usable, at least for certain purposes; some is corrupt, but the problem can be mitigated; and some is corrupt and cannot be repaired without additional information.
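The screening described above can also be sketched outside SPSS. The following minimal Python example counts valid, suspect, and missing entries in a single column. The suspect codes are taken from the text; their actual meaning is unconfirmed, and the sample values are made up for illustration rather than read from the data file.

```python
# Suspect codes mentioned in the text; what they really mean is an open question.
HEIGHT3_SUSPECT = {7777, 9999}
CHILDREN_SUSPECT = {88}


def summarize(values, suspect_codes):
    """Count valid, suspect, and missing (None) entries in one column."""
    counts = {"valid": 0, "suspect": 0, "missing": 0}
    for value in values:
        if value is None:
            counts["missing"] += 1
        elif value in suspect_codes:
            counts["suspect"] += 1
        else:
            counts["valid"] += 1
    return counts


# Made-up sample, not values from "Data 02 (1).sav".
children = [88, 2, 88, None, 1]
print(summarize(children, CHILDREN_SUSPECT))
```

A summary of this kind does not resolve what a suspect code means; it only shows how much of a variable would be affected by each possible interpretation.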
Converting and Combining Variables
Converting
Table 1 below shows the frequencies for height in feet and inches (file “Data 02 (1).sav”), whereas Table 2 displays the frequencies for a new variable, cat_height, obtained via Transform → Recode into different variables (Field, 2013). This variable reflects the categories into which the participants were sorted according to their height: 1 is ≤ 4 feet 11 inches; 2 is 5 feet – 5 feet 11 inches; 3 is 6 feet – 7 feet 8 inches; 0 is ≥ 7 feet 9 inches. The last category (0) contains the original values 7777 and 9999, which are not plausible heights and should be excluded from the analysis (e.g., by filtering these cases out).
Table 1. Height in feet and inches – frequencies (file “Data 02 (1).sav”).
Table 2. Height categories – frequencies (file “Data 02 (1).sav”).
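The recoding rule for cat_height can be written down compactly in code. The sketch below is in Python rather than SPSS and assumes that HEIGHT3 is coded as feet × 100 + inches (so that 511 means 5 feet 11 inches), which is inferred from the category boundaries above rather than confirmed by a codebook.

```python
def cat_height(height3):
    """Map a HEIGHT3 code (assumed: feet * 100 + inches) to a height category."""
    if height3 is None:
        return None  # keep missing values missing
    if height3 <= 411:   # up to 4 feet 11 inches
        return 1
    if height3 <= 511:   # 5 feet to 5 feet 11 inches
        return 2
    if height3 <= 708:   # 6 feet to 7 feet 8 inches
        return 3
    return 0             # everything above, including 7777 and 9999


# Made-up sample codes, not values from the data file.
for code in (411, 506, 602, 7777):
    print(code, cat_height(code))
```

Category 0 acts here as a flag for implausible codes, so cases with cat_height equal to 0 are the ones to exclude before analysis.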
Combining
Tables 3 and 4 below provide the frequencies for the numbers of men and women in households, respectively (file “Data 02 (1).sav”).
Table 3. Frequencies for men (file “Data 02 (1).sav”).
Table 4. Frequencies for women (file “Data 02 (1).sav”).
Table 5 provides the frequencies for the total number of men and women in a household; this new variable, males_plus_females, was created via Transform → Compute variable (Warner, 2013).
Table 5. Frequencies for men and women (file “Data 02 (1).sav”).
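The combination step amounts to adding two counts per household. A minimal Python sketch is given below; the field names num_men and num_women are hypothetical placeholders, because the names of the source variables are not stated here, and the households are made up. The function returns a missing value when either count is missing, which is assumed to match the behavior of a plain sum in Compute variable.

```python
from collections import Counter


def males_plus_females(men, women):
    """Sum the two counts; return None if either count is missing (None)."""
    if men is None or women is None:
        return None
    return men + women


# Made-up households, not records from the data file.
households = [
    {"num_men": 1, "num_women": 1},
    {"num_men": 2, "num_women": 0},
    {"num_men": None, "num_women": 1},
    {"num_men": 0, "num_women": 2},
]

totals = [males_plus_females(h["num_men"], h["num_women"]) for h in households]
frequencies = Counter(totals)  # frequency table of the totals, missing included
print(frequencies)
```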
Further Data Manipulations
Merging
To combine the files “Data 02 (1).sav” and “Data 03 (1).sav,” which contain the same variables, a variable id_merge was created to enumerate the cases and keep track of them. Cases were enumerated 1 through 7689, and 7690 through 12466 for the named files, respectively.
For the file “Data 02 (1).sav,” the descriptives for the variable WEIGHT2 are shown in Table 6 below. The descriptives for the same variable from the file “Data 03 (1).sav” are shown in Table 7. The descriptives for the same variable from the merged file “02_03_merged.sav” are shown in Table 8.
Table 6. Descriptives for WEIGHT2 in “Data 02 (1).sav.”
Table 7. Descriptives for WEIGHT2 in “Data 03 (1).sav.”
Table 8. Descriptives for WEIGHT2 in “02_03_merged.sav.”
Merging these two files combines their samples. In other words, because both files contain the same variables, merging simply adds the cases from the second data set to the first data set.
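The merge described above is an append of cases, not a match on a key. The following Python sketch illustrates the idea with plain lists of records: each file is numbered with id_merge first, with the second file continuing where the first one ends, and the cases are then stacked. The helper names number_cases and add_cases are hypothetical, and the records are made up.

```python
def number_cases(rows, start):
    """Return copies of the rows with an id_merge field counting up from start."""
    return [{"id_merge": start + i, **row} for i, row in enumerate(rows)]


def add_cases(first, second):
    """Append the cases of the second file to those of the first."""
    first_numbered = number_cases(first, start=1)
    second_numbered = number_cases(second, start=len(first) + 1)
    return first_numbered + second_numbered


# Made-up WEIGHT2 values, not records from the data files.
file_02 = [{"WEIGHT2": 150}, {"WEIGHT2": 180}, {"WEIGHT2": 135}]
file_03 = [{"WEIGHT2": 200}, {"WEIGHT2": 165}]

merged = add_cases(file_02, file_03)
print([row["id_merge"] for row in merged])
```

With the real files, the same numbering gives 1 through 7689 for the first file and 7690 through 12466 for the second, so the origin of every case remains recoverable from id_merge.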
Manipulating the Data to Create a New Variable
A new variable in the merged file “02_03_merged.sav” will be created by using the command Transform → Compute variable to multiply the existing variable WEIGHT2 by 0.453592 (George & Mallery, 2016). The resulting variable, weight_kg, denotes the weight of the participants in kilograms. Such a variable will be useful if the body mass index of the participants needs to be calculated (BMI = weight / height², where weight is in kilograms and height is in meters), which will make it possible to assess whether the participants are underweight, of normal weight, overweight, or obese.
The descriptives for WEIGHT2 in this data set can be found in Table 8 above. The descriptives for weight_kg can be found in Table 9 below.
Table 9. Descriptives for weight_kg in “02_03_merged.sav.”
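The conversion and the BMI formula can be illustrated with a short Python sketch. It assumes that WEIGHT2 holds a plain weight in pounds and that height is already available in meters. The category cutoffs (18.5, 25, and 30) are the standard adult BMI thresholds and do not come from the data files; the example person is made up.

```python
LB_TO_KG = 0.453592  # conversion factor used in the text


def weight_kg(weight_lb):
    """Convert a weight in pounds to kilograms."""
    return weight_lb * LB_TO_KG


def bmi(kg, height_m):
    """Body mass index: weight in kilograms divided by squared height in meters."""
    return kg / height_m ** 2


def bmi_category(value):
    """Classify a BMI value using the standard adult cutoffs."""
    if value < 18.5:
        return "underweight"
    if value < 25:
        return "normal weight"
    if value < 30:
        return "overweight"
    return "obese"


# Made-up example: 154 lb and 1.75 m, not a record from the merged file.
example_bmi = bmi(weight_kg(154), 1.75)
print(round(example_bmi, 1), bmi_category(example_bmi))
```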
Manipulating the Data Structure
Manipulating the data structure means changing variables so that they are measured in different units (DeCoster, 2001). In effect, this was already done in the previous subsection, when weight in pounds was transformed into weight in kilograms. The same can be done with the variable HEIGHT3 to make it usable in the analysis (file “Data 02 (1).sav”). First, it is possible to create a new variable in which the height is measured in inches only. This can be done by using the command Transform → Recode into different variables and manually setting a new value for each value of height (from 400 to 708). The resulting variable is height_inches.
Descriptives for HEIGHT3 would not be meaningful because, in its original coding, the variable can only be treated as categorical. However, its frequencies are given in Table 1 above. For the data in inches only (height_inches), descriptives are provided in Table 10 below.
It should be noted that the syntax does not contain transformation instructions for the values from 700 through 707 because there are no such values in the data, as can be seen from the frequencies for HEIGHT3 in Table 1. Also, the values 7777 and 9999 (which are not plausible heights) were left unchanged during the transformation. They can be filtered out by using the command Data → Select cases, for example.
Table 10. Descriptives for height_inches in “Data 02 (1).sav.”
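Instead of recoding every height value by hand, the same transformation can be expressed as arithmetic. The Python sketch below assumes that HEIGHT3 is coded as feet × 100 + inches, which is inferred from the value range 400 to 708 and not confirmed by a codebook. Codes outside that range, such as 7777 and 9999, are returned unchanged, which mirrors the transformation described above.

```python
def height_inches(height3):
    """Convert a HEIGHT3 code (assumed: feet * 100 + inches) to total inches."""
    feet, inches = divmod(height3, 100)
    if 400 <= height3 <= 708 and inches <= 11:
        return feet * 12 + inches
    return height3  # implausible codes (e.g., 7777, 9999) are left as they are


# Made-up sample codes, not values from the data file.
for code in (400, 511, 708, 7777):
    print(code, height_inches(code))
```

Because the implausible codes pass through unchanged, they still have to be filtered out before descriptives are computed; otherwise they would distort the mean and the maximum.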
References
DeCoster, J. (2001). Transforming and restructuring data. Web.
Field, A. (2013). Discovering statistics using IBM SPSS Statistics (4th ed.). Thousand Oaks, CA: SAGE Publications.
George, D., & Mallery, P. (2016). IBM SPSS Statistics 23 step by step: A simple guide and reference (14th ed.). New York, NY: Routledge.
Warner, R. M. (2013). Applied statistics: From bivariate through multivariate techniques (2nd ed.). Thousand Oaks, CA: SAGE Publications.