Confirming the statistically significant superiority of tree-based machine learning algorithms over their counterparts for tabular data.

Bibliographic Details
Title: Confirming the statistically significant superiority of tree-based machine learning algorithms over their counterparts for tabular data.
Authors: Uddin, Shahadat (AUTHOR) shahadat.uddin@sydney.edu.au; Lu, Haohui (AUTHOR)
Source: PLoS ONE. 4/18/2024, Vol. 19 Issue 4, p1-12. 12p.
Subject Terms: *MACHINE learning, *REGRESSION trees, *RANDOM forest algorithms, *SUPPORT vector machines, *DECISION trees
Abstract: Many individual studies in the literature observed the superiority of tree-based machine learning (ML) algorithms. However, the current body of literature lacks statistical validation of this superiority. This study addresses this gap by employing five ML algorithms on 200 open-access datasets from a wide range of research contexts to statistically confirm the superiority of tree-based ML algorithms over their counterparts. Specifically, it examines two tree-based ML (Decision tree and Random forest) and three non-tree-based ML (Support vector machine, Logistic regression and k-nearest neighbour) algorithms. Results from paired-sample t-tests show that both tree-based ML algorithms reveal better performance than each non-tree-based ML algorithm for the four ML performance measures (accuracy, precision, recall and F1 score) considered in this study, each at p<0.001 significance level. This performance superiority is consistent across both the model development and test phases. This study also used paired-sample t-tests for the subsets of the research datasets from disease prediction (66) and university-ranking (50) research contexts for further validation. The observed superiority of the tree-based ML algorithms remains valid for these subsets. Tree-based ML algorithms significantly outperformed non-tree-based algorithms for these two research contexts for all four performance measures. We discuss the research implications of these findings in detail in this article. [ABSTRACT FROM AUTHOR]
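Illustrative note: the paired-sample t-test named in the abstract compares two algorithms on the same datasets by testing whether the mean of the per-dataset score differences is zero. The sketch below is a minimal standard-library version of that statistic, not the authors' code. The function name `paired_t_statistic` and the accuracy values are hypothetical and chosen only for illustration; they are not results from the study.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for a paired-sample t-test.

    scores_a[i] and scores_b[i] must be measured on the same dataset i.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("differences have zero variance; t is undefined")
    # t = mean difference divided by its standard error
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical accuracies on six datasets (illustrative, not from the study).
random_forest_acc = [0.91, 0.84, 0.78, 0.88, 0.95, 0.81]
logistic_acc = [0.87, 0.80, 0.79, 0.83, 0.90, 0.78]

t_stat, dof = paired_t_statistic(random_forest_acc, logistic_acc)
print(f"t = {t_stat:.3f}, df = {dof}")
```

A positive t here means the first algorithm scored higher on average. Turning t into a p-value requires the t-distribution with the returned degrees of freedom; a library routine such as `scipy.stats.ttest_rel` computes both in one call, assuming SciPy is available.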
Database: Academic Search Premier