Cluster-Based Analysis of Accredited Investors Using US Census Data

Study Background

Clustering a large data set becomes important when the collected information is distributed across many levels, contains many records, and cannot be summarized reliably with descriptive statistics alone. Clustering divides a large data set into distinct groups (clusters) whose members are as similar as possible within a group and differ between groups. In other words, clustering identifies which cohorts are present in the data set, so its goals are close to those of classification tasks. The approach also makes it possible to assign a particular region to a cluster and thereby characterize how the variables of interest behave in that region.

This study used 2019 data published by the US Census (2023), covering demographic, social, and economic characteristics of the population. The objective was to examine the data on accredited investors, identify their regions of residence, and compute statistics on income and benefits.

Data Filtration

The clustering in this study was carried out by filtering the data. The source file contained records for 282 unique ZIP codes in Connecticut. Only regions whose median annual household income was above $250,000 were selected for the final cluster; for the purposes of this paper, this category of households is referred to as accredited investors. After filtering by household income, the accredited investor cluster included only 13 records; their geographic location is shown on the map in Figure 1. Table 1 lists the ZIP code and corresponding household income for each element of the cluster.

Figure 1 — Map showing only those ZIP codes that were assigned to the accredited investor cluster.
ZIP      Mean Household Income
06793    $260,159
06820    $365,528
06830    $253,080
06831    $319,816
06840    $336,642
06853    $309,832
06870    $303,203
06878    $343,115
06880    $319,050
06883    $300,718
06890    $283,703
06897    $261,380
06903    $280,191

Table 1 — Elements included in the cluster created.
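The filtering step described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's actual procedure: the income values are copied from Table 1, while the function name `filter_cluster` and the two below-threshold records are hypothetical and exist only to show the filter rejecting something.

```python
# Mean household income per ZIP code, copied from Table 1.
TABLE_1 = {
    "06793": 260_159, "06820": 365_528, "06830": 253_080,
    "06831": 319_816, "06840": 336_642, "06853": 309_832,
    "06870": 303_203, "06878": 343_115, "06880": 319_050,
    "06883": 300_718, "06890": 283_703, "06897": 261_380,
    "06903": 280_191,
}

# Hypothetical records, invented for illustration only.
# They are NOT taken from the Census file.
HYPOTHETICAL_OTHERS = {"00001": 95_000, "00002": 180_000}

THRESHOLD = 250_000  # income cutoff stated in the text


def filter_cluster(income_by_zip, threshold=THRESHOLD):
    """Keep only ZIP codes whose income is strictly above the threshold."""
    return {z: inc for z, inc in income_by_zip.items() if inc > threshold}


all_records = {**TABLE_1, **HYPOTHETICAL_OTHERS}
cluster = filter_cluster(all_records)

print(len(cluster))                   # 13
print(max(cluster, key=cluster.get))  # 06820
print(min(cluster, key=cluster.get))  # 06830
```

With the real source file, `all_records` would hold all 282 Connecticut ZIP codes rather than the 13 cluster members plus placeholders.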

Data Clustering

An additional interest was the structure of the elements in the cluster, in particular the share of the employed population in each ZIP code. Figure 2 shows the distribution of these shares: the four regions with the highest shares of employed residents were 06853 (67.6%), 06883 (66.9%), 06897 (62.9%), and 06903 (62.3%). Regions 06840 (53.3%) and 06878 (55.2%) had the lowest shares, although both were still above half of the region's population.

Figure 2 — Distribution of the proportion of the employed population to the total population for ZIPs.
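As a small sketch, the ranking quoted from Figure 2 can be reproduced by sorting the reported shares. Only the six values given in the text are available, so the other seven ZIP codes are left out; the variable names are illustrative.

```python
# Employed share of total population (%), as quoted in the text for Figure 2.
# Only six of the 13 cluster ZIP codes are reported there.
EMPLOYED_SHARE = {
    "06903": 62.3, "06897": 62.9, "06883": 66.9,
    "06853": 67.6, "06840": 53.3, "06878": 55.2,
}

# Sort from highest to lowest share.
ranked = sorted(EMPLOYED_SHARE.items(), key=lambda kv: kv[1], reverse=True)

# Check the claim that even the lowest shares exceed half of the population.
all_above_half = all(share > 50 for share in EMPLOYED_SHARE.values())

for zip_code, share in ranked:
    print(f"{zip_code}: {share}%")
print("All above 50%:", all_above_half)
```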

Data Analysis

Table 2 shows the percentage structure for each ZIP code: the proportions of family households, non-family households, and households with retirement income, relative to the total employed population. Several conclusions can be drawn from the data. The region with the highest share of households with retirement income is 06840 (15.0%), and the region with the lowest share is 06883 (9.7%). Of all the cluster regions, 06840 also had the highest proportion of family households (67.3%), while 06793 had the lowest (44.1%). For non-family households, 06883 had the lowest proportion (10.0%) and 06793 the highest (30.7%).

ZIP      With retirement income (%)   Families (%)   Non-family households (%)
06793    13.9                         44.1           30.7
06820    10.9                         62.9           13.0
06830    11.9                         53.0           28.4
06831    13.7                         57.4           15.1
06840    15.0                         67.3           20.8
06853    13.3                         47.6           26.4
06870    14.3                         63.0           15.5
06878    12.5                         63.1           15.9
06880    13.7                         59.7           18.7
06883    9.7                          56.1           10.0
06890    14.3                         56.7           22.4
06897    12.8                         56.9           12.5
06903    13.7                         57.5           10.1

Table 2 — Percentage structure of cluster elements.
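The highest and lowest values discussed for Table 2 can be checked with a short script. The sketch below uses only the figures printed in the table; the helper name `extremes` is illustrative.

```python
# ZIP codes and percentage rows copied from Table 2.
ZIPS = ["06793", "06820", "06830", "06831", "06840", "06853", "06870",
        "06878", "06880", "06883", "06890", "06897", "06903"]

TABLE_2 = {
    "With retirement income": [13.9, 10.9, 11.9, 13.7, 15.0, 13.3, 14.3,
                               12.5, 13.7, 9.7, 14.3, 12.8, 13.7],
    "Families": [44.1, 62.9, 53.0, 57.4, 67.3, 47.6, 63.0,
                 63.1, 59.7, 56.1, 56.7, 56.9, 57.5],
    "Non-family households": [30.7, 13.0, 28.4, 15.1, 20.8, 26.4, 15.5,
                              15.9, 18.7, 10.0, 22.4, 12.5, 10.1],
}


def extremes(values, labels=ZIPS):
    """Return ((zip, max_value), (zip, min_value)) for one table row."""
    pairs = list(zip(labels, values))
    highest = max(pairs, key=lambda p: p[1])
    lowest = min(pairs, key=lambda p: p[1])
    return highest, lowest


summary = {row: extremes(vals) for row, vals in TABLE_2.items()}

for row, (high, low) in summary.items():
    print(f"{row}: max {high[0]} ({high[1]}%), min {low[0]} ({low[1]}%)")
```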

Following these findings, it was of interest to examine how family and non-family household income relate to total accredited investor income. As shown in Figure 3, higher family income is associated with higher investor income. In contrast, the relationship between non-family household income and accredited investor income (Figure 4) appears to be either non-existent (R² = 0.006) or negative for the cluster.

Figure 3 — Scatter plot with regression analysis results for the relationship between investor income and household income.
Figure 4 — Scatter plot displaying the regression analysis outcomes for the correlation between investor income and income from non-family sources.
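For readers who want to see how a coefficient of determination like the one quoted above is obtained, the following is a minimal ordinary least squares sketch. The income series behind Figures 3 and 4 are not printed in the paper, so the example uses a stand-in pair of variables (family-household share from Table 2 against mean household income from Table 1). It therefore does not reproduce the reported regressions, and the function name `fit_line` is illustrative.

```python
def fit_line(x, y):
    """Simple least-squares fit y = intercept + slope * x; returns R^2 too."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


# Stand-in data (NOT the series plotted in Figures 3 and 4):
# family-household share (%) from Table 2, in ZIP order.
family_share = [44.1, 62.9, 53.0, 57.4, 67.3, 47.6, 63.0,
                63.1, 59.7, 56.1, 56.7, 56.9, 57.5]
# Mean household income ($) from Table 1, in the same ZIP order.
mean_income = [260_159, 365_528, 253_080, 319_816, 336_642, 309_832, 303_203,
               343_115, 319_050, 300_718, 283_703, 261_380, 280_191]

slope, intercept, r_squared = fit_line(family_share, mean_income)
print(f"slope={slope:.1f}, intercept={intercept:.1f}, R^2={r_squared:.3f}")
```

An R² near zero, such as the 0.006 reported for non-family household income, means the fitted line explains almost none of the variation in the dependent variable.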

Findings

The clustering results and data analysis indicate that the 13 ZIP codes in the cluster are the most appropriate candidate locations for a brick-and-mortar office for accredited investors. A more detailed analysis of the cluster showed that investor income tends to rise with family household income. Of all 13 regions in the cluster, only 06820, 06840, 06870, and 06878 had family-household shares above 60% of the employed population (Table 2). In other words, building a brick-and-mortar office in any of these four regions is a viable strategy for accredited investors.
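The shortlist of four regions can be reproduced from Table 2 with a simple threshold. The 60% cutoff used here is an assumption read off the table values rather than a rule given in the source, and the names below are illustrative.

```python
# Family-household share (%) per ZIP code, copied from Table 2.
FAMILY_SHARE = {
    "06793": 44.1, "06820": 62.9, "06830": 53.0, "06831": 57.4,
    "06840": 67.3, "06853": 47.6, "06870": 63.0, "06878": 63.1,
    "06880": 59.7, "06883": 56.1, "06890": 56.7, "06897": 56.9,
    "06903": 57.5,
}

CUTOFF = 60.0  # assumed cutoff for an "elevated" share; not stated in the source

# ZIP codes whose family-household share exceeds the cutoff.
shortlist = sorted(z for z, share in FAMILY_SHARE.items() if share > CUTOFF)

print(shortlist)  # ['06820', '06840', '06870', '06878']
```

The next-highest share is 59.7% (06880), so any cutoff between 59.7% and 62.9% would give the same four regions.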

Reference

US Census. (2023). DP03: Selected economic characteristics. United States Census Bureau.

Cite this paper

StudyCorgi. (2025, September 21). Cluster-Based Analysis of Accredited Investors Using US Census Data. https://studycorgi.com/cluster-based-analysis-of-accredited-investors-using-us-census-data/
