Data Structure: Statistical Analysis

Is the Data Suitable for Analysis?

Although the structure of the data appears normal (each observation is stored in its own record; replications have separate records; each variable has its own field, which is always in the same column, etc.), not all the variables from the file “Data 02 (1).sav” are suitable for analysis. The data in several variables is missing or appears corrupted. For example, the values 7777 and 9999 in the variable HEIGHT3 cannot be explained without further clarification. The variable ORACE2 has only 58 valid values; the remaining 7631 values are missing. The data in the variable IDATE is corrupt because the last digit of the year is missing, although this can be mitigated because the IYEAR variable is available. The variable HEIGHT3 is very difficult to use in its current form: it stores feet and inches in a single code (for example, 506 for 5 feet 6 inches) and therefore has to be treated as categorical; it should be recoded into a continuous variable (this is done below).

The variable CHILDREN, which is supposed to reflect the number of children in a household, often has the value of 88 (5728 out of 7689 values). For certain variables, it might be possible to address the problem; for instance, the value “88” in CHILDREN might mean that the household has no children; the missing values in the variable PREGNANT may denote the value “not pregnant,” although it is impossible to differentiate these from simply missing data.

On the whole, some of the data is usable, at least for certain purposes; some is corrupt, but can be mitigated; some is corrupt, and cannot be mitigated without additional information.
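
One way to keep such codes out of later calculations is to declare them as missing before any analysis. The following is a minimal sketch in Python rather than SPSS syntax; the names MISSING_CODES, RECODE_TO_ZERO, and clean_value are hypothetical, and reading 88 in CHILDREN as “no children” is only the conjecture made above, not a documented coding.

```python
from typing import Optional

# Assumption: 7777 and 9999 in HEIGHT3 are non-response codes, not heights.
MISSING_CODES = {"HEIGHT3": {7777, 9999}}
# Assumption: 88 in CHILDREN means "no children" (a conjecture, not documented).
RECODE_TO_ZERO = {"CHILDREN": {88}}


def clean_value(variable: str, value: Optional[int]) -> Optional[int]:
    """Return a usable value, 0 for an assumed 'none' code, or None if missing."""
    if value is None:
        return None
    if value in MISSING_CODES.get(variable, set()):
        return None
    if value in RECODE_TO_ZERO.get(variable, set()):
        return 0
    return value
```

Variables without a known rule, such as PREGNANT, pass through unchanged, so a missing value there stays missing rather than being read as “not pregnant.”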

Converting and Combining Variables

Converting

Table 1 below is a frequency table for height in feet and inches (file “Data 02 (1).sav”), whereas Table 2 displays frequencies for a new variable, cat_height, obtained via Transform → Recode into different variables (Field, 2013). The new variable reflects the categories into which the participants were sorted according to their height: 1 is ≤ 4 feet 11 inches; 2 is 5 feet to 5 feet 11 inches; 3 is 6 feet to 7 feet 8 inches; 0 is ≥ 7 feet 9 inches. The last category (0) contains the original values 7777 and 9999, which make no sense as heights and should be excluded from the analysis (e.g., via the Filter Data procedure).

HEIGHT3
                 Frequency  Percent  Valid Percent  Cumulative Percent
Valid    400             1       .0             .0                  .0
         402             1       .0             .0                  .0
         404             1       .0             .0                  .0
         405             1       .0             .0                  .1
         406             2       .0             .0                  .1
         407             3       .0             .0                  .1
         408             9       .1             .1                  .2
         409            12       .2             .2                  .4
         410            21       .3             .3                  .7
         411            83      1.1            1.1                 1.7
         500           231      3.0            3.0                 4.7
         501           264      3.4            3.4                 8.2
         502           600      7.8            7.8                16.0
         503           631      8.2            8.2                24.2
         504           873     11.4           11.4                35.5
         505           703      9.1            9.1                44.7
         506           740      9.6            9.6                54.3
         507           603      7.8            7.8                62.2
         508           512      6.7            6.7                68.8
         509           475      6.2            6.2                75.0
         510           419      5.4            5.4                80.4
         511           406      5.3            5.3                85.7
         600           420      5.5            5.5                91.2
         601           247      3.2            3.2                94.4
         602           145      1.9            1.9                96.3
         603           100      1.3            1.3                97.6
         604            55       .7             .7                98.3
         605            22       .3             .3                98.6
         606            17       .2             .2                98.8
         607             8       .1             .1                98.9
         608             5       .1             .1                99.0
         609             4       .1             .1                99.0
         611             1       .0             .0                99.0
         708             1       .0             .0                99.1
         7777           56       .7             .7                99.8
         9999           17       .2             .2               100.0
         Total        7689    100.0          100.0

Table 1. Height in feet and inches – frequencies (file “Data 02 (1).sav”).

cat_height
                 Frequency  Percent  Valid Percent  Cumulative Percent
Valid    0              73       .9             .9                  .9
         1             134      1.7            1.7                 2.7
         2            6457     84.0           84.0                86.7
         3            1025     13.3           13.3               100.0
         Total        7689    100.0          100.0

Table 2. Height categories – frequencies (file “Data 02 (1).sav”).
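
The recoding rule behind Table 2 can be checked against the frequencies in Table 1. The sketch below is in Python rather than SPSS syntax and is only an illustration; the function cat_height and the dictionary HEIGHT3_FREQ are hypothetical names, and the rule assumes that only valid feet-and-inches codes (or 7777 and 9999) occur in the data.

```python
from collections import Counter

# HEIGHT3 frequencies copied from Table 1 (value -> count).
HEIGHT3_FREQ = {
    400: 1, 402: 1, 404: 1, 405: 1, 406: 2, 407: 3, 408: 9, 409: 12,
    410: 21, 411: 83, 500: 231, 501: 264, 502: 600, 503: 631, 504: 873,
    505: 703, 506: 740, 507: 603, 508: 512, 509: 475, 510: 419, 511: 406,
    600: 420, 601: 247, 602: 145, 603: 100, 604: 55, 605: 22, 606: 17,
    607: 8, 608: 5, 609: 4, 611: 1, 708: 1, 7777: 56, 9999: 17,
}


def cat_height(height3: int) -> int:
    """Map a feet-and-inches code (e.g. 506 = 5 ft 6 in) to a height category."""
    if height3 <= 411:
        return 1  # up to 4 ft 11 in
    if height3 <= 511:
        return 2  # 5 ft to 5 ft 11 in
    if height3 <= 708:
        return 3  # 6 ft to 7 ft 8 in
    return 0      # 7 ft 9 in and above, which covers the codes 7777 and 9999


# Add up the Table 1 counts within each category.
category_counts = Counter()
for value, count in HEIGHT3_FREQ.items():
    category_counts[cat_height(value)] += count

print(sorted(category_counts.items()))
# [(0, 73), (1, 134), (2, 6457), (3, 1025)]
```

The four totals agree with the frequencies reported in Table 2 and sum to the 7689 cases in the file.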

Combining

Tables 3 and 4 below provide the frequencies for the numbers of men and women in households, respectively (file “Data 02 (1).sav”).

Table 5 provides frequencies for the total number of men and women in a household; this new variable, males_plus_females, was obtained via Transform → Compute variable (Warner, 2013).

NUMMEN
                 Frequency  Percent  Valid Percent  Cumulative Percent
Valid    0            2237     29.1           34.0                34.0
         1            3872     50.4           58.9                92.9
         2             407      5.3            6.2                99.1
         3              50       .7             .8                99.8
         4               9       .1             .1               100.0
         5               1       .0             .0               100.0
         Total        6576     85.5          100.0
Missing  System       1113     14.5
Total                 7689    100.0

Table 3. Frequencies for men (file “Data 02 (1).sav”).

NUMWOMEN
                 Frequency  Percent  Valid Percent  Cumulative Percent
Valid    0             736      9.6           11.2                11.2
         1            5109     66.4           77.7                88.9
         2             634      8.2            9.6                98.5
         3              84      1.1            1.3                99.8
         4               8       .1             .1                99.9
         5               4       .1             .1               100.0
         6               1       .0             .0               100.0
         Total        6576     85.5          100.0
Missing  System       1113     14.5
Total                 7689    100.0

Table 4. Frequencies for women (file “Data 02 (1).sav”).

males_plus_females
                 Frequency  Percent  Valid Percent  Cumulative Percent
Valid    1.00         2662     34.6           40.5                40.5
         2.00         3112     40.5           47.3                87.8
         3.00          584      7.6            8.9                96.7
         4.00          181      2.4            2.8                99.4
         5.00           27       .4             .4                99.8
         6.00            6       .1             .1                99.9
         7.00            3       .0             .0               100.0
         10.00           1       .0             .0               100.0
         Total        6576     85.5          100.0
Missing  System       1113     14.5
Total                 7689    100.0

Table 5. Frequencies for men and women (file “Data 02 (1).sav”).
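
The computation behind males_plus_females is a plain sum that stays missing whenever one of its inputs is missing, which is consistent with the 1113 system-missing cases that appear in each of Tables 3 to 5. A minimal Python sketch of that rule follows (not SPSS syntax; the function name simply mirrors the variable and is otherwise hypothetical).

```python
from typing import Optional


def males_plus_females(nummen: Optional[int],
                       numwomen: Optional[int]) -> Optional[int]:
    """Add the men and women in a household; stay missing if either is missing."""
    if nummen is None or numwomen is None:
        return None
    return nummen + numwomen
```

For instance, a household with one man and one woman yields 2, while a household with a missing NUMMEN value yields a missing total rather than a partial one.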

Further Data Manipulations

Merging

To combine the files “Data 02 (1).sav” and “Data 03 (1).sav,” which contain the same variables, a variable id_merge was created to enumerate the cases and keep track of them. Cases were enumerated 1 through 7689, and 7690 through 12466 for the named files, respectively.

For the file “Data 02 (1).sav,” the descriptives for the variable WEIGHT2 are shown in Table 6 below. The descriptives for the same variable from the file “Data 03 (1).sav” are shown in Table 7. The descriptives for the same variable from the merged file “02_03_merged.sav” are shown in Table 8.

Because both files contain the same variables, merging them simply appends the cases from the second data set to the first, so that the two samples can be analyzed as one.

Descriptive Statistics
                         N       Mean  Std. Deviation
WEIGHT2               7689     522.08        1711.739
Valid N (listwise)    7689

Table 6. Descriptives for WEIGHT2 in “Data 02 (1).sav.”

Descriptive Statistics
                         N       Mean  Std. Deviation
WEIGHT2               4777     615.26        1960.493
Valid N (listwise)    4777

Table 7. Descriptives for WEIGHT2 in “Data 03 (1).sav.”

Descriptive Statistics
                         N       Mean  Std. Deviation
WEIGHT2              12466     557.78        1811.593
Valid N (listwise)   12466

Table 8. Descriptives for WEIGHT2 in “02_03_merged.sav.”
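
Since merging only stacks the cases, the mean in the merged file should equal the size-weighted average of the two separate means, and the id_merge numbers of the second file should continue where the first file ends. The following Python sketch (not SPSS syntax; pooled_mean and id_merge_ranges are hypothetical helper names) checks both using the figures from Tables 6 and 7.

```python
from typing import Tuple


def pooled_mean(n1: int, mean1: float, n2: int, mean2: float) -> float:
    """Mean of two stacked samples, computed from their sizes and means."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)


def id_merge_ranges(n1: int, n2: int) -> Tuple[range, range]:
    """Case numbers for the first and the second file after stacking."""
    return range(1, n1 + 1), range(n1 + 1, n1 + n2 + 1)


# Sample sizes and means of WEIGHT2 taken from Tables 6 and 7.
merged_mean = pooled_mean(7689, 522.08, 4777, 615.26)
ids_02, ids_03 = id_merge_ranges(7689, 4777)

print(round(merged_mean, 1))   # 557.8
print(ids_03[0], ids_03[-1])   # 7690 12466
```

The pooled value agrees with the mean of 557.78 reported in Table 8 up to the rounding of the two input means.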

Manipulating the Data to Create a New Variable

A new variable in the merged file “02_03_merged.sav” is created by using the command Transform → Compute variable to multiply the existing variable WEIGHT2 by 0.453592 (George & Mallery, 2016). The resulting variable, weight_kg, denotes the weight of the participants in kilograms. Such a variable is useful for calculating the body mass index of the participants (BMI = weight / height², where weight is in kilograms and height is in meters), which makes it possible to assess whether the participants are underweight, of normal weight, overweight, or obese.

The descriptives for WEIGHT2 in this data set can be found in Table 8 above. The descriptives for weight_kg can be found in Table 9 below.

Descriptive Statistics
                         N       Mean  Std. Deviation
weight_kg            12466   253.0065       821.72429
Valid N (listwise)   12466

Table 9. Descriptives for weight_kg in “02_03_merged.sav.”
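
Multiplying a variable by a constant multiplies both its mean and its standard deviation by that constant, so the weight_kg descriptives can be cross-checked from the WEIGHT2 descriptives in Table 8. The sketch below is in Python rather than SPSS syntax; the helper names pounds_to_kg and bmi are hypothetical, and the BMI helper assumes that height is supplied in inches and converted to meters.

```python
LB_TO_KG = 0.453592   # conversion factor used in the text
INCH_TO_M = 0.0254    # one inch is exactly 0.0254 meters


def pounds_to_kg(pounds: float) -> float:
    """Convert a weight in pounds to kilograms."""
    return pounds * LB_TO_KG


def bmi(weight_kg: float, height_inches: float) -> float:
    """Body mass index: weight in kilograms over squared height in meters."""
    height_m = height_inches * INCH_TO_M
    return weight_kg / height_m ** 2


# Mean and standard deviation of WEIGHT2 from Table 8, rescaled to kilograms.
mean_kg = pounds_to_kg(557.78)
sd_kg = pounds_to_kg(1811.593)

print(round(mean_kg, 2), round(sd_kg, 2))   # 253.0 821.72
```

Both figures match the weight_kg descriptives up to rounding. As a usage example, bmi(pounds_to_kg(150), 66) gives roughly 24.2 for a hypothetical person weighing 150 pounds at 5 feet 6 inches.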

Manipulating the Data Structure

Manipulating the data structure means changing variables so that they are measured in different units (DeCoster, 2001). In effect, this was done in the previous subsection, when weight in pounds was transformed into weight in kilograms. The same can be done with the variable HEIGHT3 to make it usable in the analysis (file “Data 02 (1).sav”). First, a new variable can be created in which height is measured in inches only, either by using Transform → Recode into different variables and manually setting the new value for each value of height (from 400 to 708), or by using the syntax in the Appendix. The resulting variable is height_inches.

There is no point in creating the descriptives for HEIGHT3 because the data is categorical. However, the frequencies are given in Table 1 above. For the data in inches only (height_inches), descriptives are provided in Table 10 below.

It should be noted that the syntax in the Appendix does not contain transformation instructions for the values from 700 through 707 because there are no such values in the data, as can be seen from the frequencies for HEIGHT3 in Table 1. Also, the values 7777 and 9999 (which make no sense as heights) were left unchanged during the transformation. They can be filtered out by using the command Data → Select cases, for example.

Descriptive Statistics
                         N       Mean  Std. Deviation
height_inches         7689     144.62         803.189
Valid N (listwise)    7689

Table 10. Descriptives for height_inches in “Data 02 (1).sav.”
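
The value-by-value recode in the Appendix follows a simple arithmetic rule: the leading digit of HEIGHT3 gives the feet and the last two digits give the inches, so the height in inches is feet × 12 + inches. A minimal Python sketch of that rule follows (not SPSS syntax; the function name height3_to_inches is hypothetical), with the codes 7777 and 9999 passed through unchanged as in the Appendix.

```python
def height3_to_inches(height3: int) -> int:
    """Convert a feet-and-inches code (e.g. 506 = 5 ft 6 in) to total inches."""
    if height3 in (7777, 9999):
        return height3  # left unchanged, as in the Appendix recode
    feet, inches = divmod(height3, 100)
    return feet * 12 + inches
```

For example, 400 maps to 48, 511 maps to 71, and 708 maps to 92, the same values that the Appendix assigns by hand.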

References

DeCoster, J. (2001). Transforming and restructuring data. Web.

Field, A. (2013). Discovering statistics using IBM SPSS Statistics (4th ed.). Thousand Oaks, CA: SAGE Publications.

George, D., & Mallery, P. (2016). IBM SPSS Statistics 23 step by step: A simple guide and reference (14th ed.). New York, NY: Routledge.

Warner, R. M. (2013). Applied statistics: From bivariate through multivariate techniques (2nd ed.). Thousand Oaks, CA: SAGE Publications.

Appendix

* Recode HEIGHT3 codes such as 506 for 5 ft 6 in into total inches.
* The codes 7777 and 9999 are copied unchanged.
DATASET ACTIVATE DataSet1.
RECODE HEIGHT3 (400=48) (401=49) (402=50) (403=51) (404=52) (405=53) (406=54) (407=55) (408=56)
  (409=57) (410=58) (411=59) (500=60) (501=61) (502=62) (503=63) (504=64) (505=65) (506=66) (507=67)
  (508=68) (509=69) (510=70) (511=71) (600=72) (601=73) (602=74) (603=75) (604=76) (605=77) (606=78)
  (607=79) (608=80) (609=81) (610=82) (611=83) (708=92) (7777=7777) (9999=9999) INTO height_inches.
VARIABLE LABELS height_inches 'height in inches'.
EXECUTE.

Cite this paper

StudyCorgi. (2022, June 5). Data Structure: Statistical Analysis. https://studycorgi.com/data-structure-statistical-analysis/
