Overfitting: Cross-Validation, Dropping Out, and Feature Selection

In most cases, overfitting happens when a model performs very well on the training data but generalizes poorly to unseen data. It is a common problem in machine learning, and numerous studies have identified methods that can prevent it (Delua, 2021). Some of the most potent of these methods are cross-validation, dropping out, and feature selection (Zhou et al., 2018), and this essay critically discusses all three.

Cross-validation involves splitting the dataset into k groups, an approach referred to as k-fold cross-validation. One of the groups serves as the testing set, and the remaining groups form the training set. This process is repeated until each group has been used as the testing set (Provost and Fawcett, 2013). The repetition ensures that every observation is used for both training and testing, which helps prevent the model from overfitting. Although the method prevents overfitting, its limitation is that the training algorithm must be rerun from scratch k times (McAfee et al., 2012), meaning a complete evaluation requires roughly k times as much computation.
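
As a minimal sketch of the procedure described above, the following Python snippet runs five-fold cross-validation; it assumes scikit-learn is installed, and the iris dataset and logistic regression model are illustrative choices that are not part of the original discussion.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Split the data into k = 5 folds; each fold serves once as the testing set
# while the remaining folds form the training set.
kfold = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=kfold)

# The cost noted above: the model is retrained from scratch k times.
print("Per-fold accuracy:", scores)
print("Mean accuracy:", scores.mean())
```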

Dropping out is usually done by ignoring subsets of network units with a set probability. It prevents overfitting by reducing the interdependent learning that occurs between units. When performing dropout, more training epochs are required to ensure that the model converges without overfitting (Adhikari et al., 2019). A limitation arises in layers whose feature maps encode spatial relationships: neighbouring activations become highly correlated, which reduces the benefit of dropping individual units (Petrescu and Krishen, 2020). This makes the method less effective in such settings, but it remains a compelling way to prevent overfitting in general.
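
A hypothetical NumPy illustration of this mechanism (not drawn from the cited sources) is sketched below: each unit is ignored with probability p during training, and the surviving activations are rescaled so the expected output is unchanged at inference time.

```python
import numpy as np

def dropout(activations, p=0.5, training=True, rng=np.random.default_rng(0)):
    """Inverted dropout: ignore each unit with probability p during training."""
    if not training or p == 0.0:
        return activations
    # Bernoulli mask: True keeps a unit, False drops (ignores) it.
    mask = rng.random(activations.shape) >= p
    # Rescale by 1 / (1 - p) so the expected activation stays the same.
    return activations * mask / (1.0 - p)

layer_output = np.array([0.8, 1.5, 0.3, 2.1, 0.9])
print(dropout(layer_output, training=True))   # some units zeroed out
print(dropout(layer_output, training=False))  # unchanged at inference
```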

Feature selection is performed by limiting the set of features used for training. The selected data should contain only the essential features, which prevents the model from learning numerous irrelevant features that lead to overfitting. The method involves testing different features, training individual models on those features, and evaluating their generalization abilities (Batistič and van der Laken, 2019). Some widely used feature selection techniques are random forest importance, Fisher’s score, the chi-square test, and the correlation coefficient. The main limitations of feature selection are an increased risk of selecting spurious features when there are insufficient observations and significant computation time when the number of variables is large (Müller, Fay and vom Brocke, 2018). Despite these two limitations, the method is an effective safeguard against overfitting.
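
As a brief illustration of one of the listed techniques, the sketch below scores features with the chi-square test and keeps only the two highest-scoring ones; it assumes scikit-learn, and the iris dataset is an illustrative choice not referenced in the essay.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)

# Keep the two features with the highest chi-square scores.
selector = SelectKBest(score_func=chi2, k=2)
X_selected = selector.fit_transform(X, y)

print("Chi-square score per feature:", selector.scores_)
print("Original shape:", X.shape, "-> reduced shape:", X_selected.shape)
```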

Bias and variance have an inverse relationship, and it is almost impossible to build a machine learning model with both low bias and low variance. This is why bias is reduced by changing the method used to create the model, whereas high variance is reduced by training multiple models on the data (Gupta et al., 2018). Cross-validation, dropping out, and feature selection are therefore among the techniques that can be used to reduce high variance (Belkin et al., 2019). These three methods should be used in cases of low bias but high variance to ensure that the resulting model runs effectively. One example of overcoming the bias-variance tradeoff is a machine learning-powered chatbot that provides real-time, client-oriented, human-like assistance in online banking services to enhance the user experience and save the organization’s resources.
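
To illustrate the variance side of this tradeoff, the sketch below (assuming scikit-learn; the dataset and models are illustrative, not drawn from the cited sources) compares an unconstrained decision tree with a depth-limited one: a large gap between training accuracy and cross-validated accuracy signals high variance, and constraining the model narrows that gap.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# max_depth=None grows a full tree (low bias, high variance);
# max_depth=3 constrains the model to reduce variance.
for depth in (None, 3):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    train_acc = tree.fit(X, y).score(X, y)
    cv_acc = cross_val_score(tree, X, y, cv=5).mean()
    print(f"max_depth={depth}: train accuracy={train_acc:.3f}, "
          f"cross-validated accuracy={cv_acc:.3f}")
```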

Reference List

Adhikari, L., Ozrazgat-Baslanti, T., Ruppert, M., Madushani, R., Paliwal, S., Hashemighouchani, H., Zheng, F., Tao, M., Lopes, J., Li, X., Rashidi, P. and Bihorac, A., 2019. ‘Improved predictive models for acute kidney injury with IDEA: Intraoperative Data Embedded Analytics’, PLOS ONE, 14(4), p. e0214904.

Batistič, S. and van der Laken, P., 2019. ‘History, evolution and future of big data and analytics: a bibliometric analysis of its relationship to performance in organizations’, British Journal of Management, 30(2), pp. 229-251.

Belkin, M., Hsu, D., Ma, S. and Mandal, S., 2019. ‘Reconciling modern machine-learning practice and the bias-variance trade-off’, Proceedings of the National Academy of Sciences, 116(32), pp. 15849-15854.

Delua, J., 2021. Supervised vs. unsupervised learning: what’s the difference? IBM.

Gupta, A., Deokar, A., Iyer, L., Sharda, R. and Schrader, D., 2018. ‘Big data & analytics for societal impact: recent research and trends’, Information Systems Frontiers, 20(2), pp. 185-194.

McAfee, A., Brynjolfsson, E., Davenport, T.H., Patil, D.J. and Barton, D., 2012. ‘Big data: the management revolution’, Harvard Business Review, 90(10), pp. 60-68.

Müller, O., Fay, M. and vom Brocke, J., 2018. ‘The effect of big data and analytics on firm performance: an econometric analysis considering industry characteristics’, Journal of Management Information Systems, 35(2), pp. 488-509.

Petrescu, M. and Krishen, A., 2020. ‘The importance of high-quality data and analytics during the pandemic’, Journal of Marketing Analytics, 8(2), pp. 43-44.

Provost, F. and Fawcett, T., 2013. Data science for business: what you need to know about data mining and data-analytic thinking. O’Reilly Media.

Zhou, S., Qiao, Z., Du, Q., Wang, G., Fan, W. and Yan, X., 2018. ‘Measuring customer agility from online reviews using big data text analytics’, Journal of Management Information Systems, 35(2), pp. 510-539.
