Comparison of coronary heart disease prediction using basic model and ensemble learning
Abstract
Coronary heart disease (CHD) remains one of the leading causes of death worldwide, which makes accurate early detection an urgent need. This study compares six machine learning models (Decision Tree, K-Nearest Neighbor (KNN), Logistic Regression, Random Forest, XGBoost, and a Stacking Ensemble) for predicting CHD on the UCI Heart Disease Dataset. The models were evaluated with five metrics: accuracy, precision, recall, F1-score, and area under the ROC curve (AUC). Stacking and Logistic Regression achieved the highest AUC (0.80), while XGBoost obtained the best F1-score (0.40). The simpler Decision Tree and KNN models performed relatively worse. In addition, permutation feature importance analysis showed that the number of major vessels (ca), thalassemia (thal), ST depression (oldpeak), and age contribute most to predictive performance. These findings suggest that ensemble learning approaches, especially Stacking and XGBoost, can improve diagnostic performance and offer potential for clinical decision support systems (CDSS) aimed at the early detection of coronary heart disease.
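The five evaluation metrics named in the abstract can be made concrete with a short, dependency-free sketch. It assumes 0/1 labels (1 = CHD present) and class-1 probability scores from any of the compared models; the 0.5 decision threshold and the toy numbers are illustrative assumptions, not values from the study. One detail it makes visible: AUC is computed from the raw scores and does not depend on a threshold, whereas precision, recall, and F1 do, which is one possible reason a model can lead on AUC without also leading on F1.

```python
# Sketch (assumption): labels are 0/1 with 1 = CHD present, and `scores`
# are predicted class-1 probabilities from any of the compared models.
from typing import Dict, List


def confusion_counts(y_true: List[int], y_pred: List[int]) -> Dict[str, int]:
    """Count true/false positives and negatives for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(1 for t, p in pairs if t == 1 and p == 1),
        "fp": sum(1 for t, p in pairs if t == 0 and p == 1),
        "fn": sum(1 for t, p in pairs if t == 1 and p == 0),
        "tn": sum(1 for t, p in pairs if t == 0 and p == 0),
    }


def auc(y_true: List[int], scores: List[float]) -> float:
    """ROC AUC as the chance a random positive outscores a random negative
    (ties count as half); this pairwise form equals the area under the ROC curve."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true: List[int], scores: List[float], threshold: float = 0.5) -> Dict[str, float]:
    """Threshold the scores for the first four metrics; AUC uses raw scores."""
    c = confusion_counts(y_true, [1 if s >= threshold else 0 for s in scores])
    precision = c["tp"] / (c["tp"] + c["fp"]) if c["tp"] + c["fp"] else 0.0
    recall = c["tp"] / (c["tp"] + c["fn"]) if c["tp"] + c["fn"] else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (c["tp"] + c["tn"]) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "auc": auc(y_true, scores),
    }


# Toy usage with made-up scores (not the study's data).
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.6]
print(evaluate(y_true, scores))  # accuracy, precision, recall, F1 = 2/3; AUC = 7/9
```

Returning 0.0 when a denominator is zero is a convention chosen for this sketch; libraries such as scikit-learn make this behavior configurable.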
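Permutation feature importance, the method the abstract uses to rank ca, thal, oldpeak, and age, scores a feature by how much a fitted model's performance drops when that feature's values are shuffled across rows, which breaks the feature's link to the label. The pure-Python sketch below assumes a hypothetical `predict` callable returning class-1 probabilities and uses 0.5-thresholded accuracy as the score. The abstract does not state which score or how many shuffles were used, so both choices are assumptions. scikit-learn offers the same idea as `sklearn.inspection.permutation_importance`.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]
Predictor = Callable[[Sequence[Row]], List[float]]  # rows -> class-1 probabilities


def accuracy(y_true: List[int], probs: List[float]) -> float:
    """Share of rows whose 0.5-thresholded probability matches the label."""
    hits = sum(1 for t, p in zip(y_true, probs) if (p >= 0.5) == (t == 1))
    return hits / len(y_true)


def permutation_importance(
    predict: Predictor,
    X: List[Row],
    y: List[int],
    score_fn: Callable[[List[int], List[float]], float] = accuracy,
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Mean drop in score_fn when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = score_fn(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Copy each row, replacing only feature j with its shuffled value.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - score_fn(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy usage: a hypothetical model that only looks at feature 0.
def toy_model(rows: Sequence[Row]) -> List[float]:
    return [1.0 if r[0] > 0.5 else 0.0 for r in rows]


X = [[0.9, 0.1], [0.8, 0.7], [0.7, 0.3], [0.6, 0.9],
     [0.4, 0.2], [0.3, 0.8], [0.2, 0.5], [0.1, 0.6]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
print(permutation_importance(toy_model, X, y))
```

Because the toy model ignores feature 1, shuffling that column leaves its predictions unchanged and its importance is exactly 0.0. Importances are usually computed on held-out rows, since on training rows they can reflect what the model memorized rather than what generalizes.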
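Stacking, the ensemble the abstract ranks highest on AUC, trains a meta-learner on the predictions of several base models. To keep the meta-learner from seeing predictions a base model made on its own training rows, those predictions are generated out-of-fold. The sketch below illustrates this under stated assumptions: the base learners (a KNN and a gradient-descent logistic regression) and the logistic meta-learner are hypothetical choices made for illustration, since the abstract does not list the study's actual stacking configuration. In practice, scikit-learn's `sklearn.ensemble.StackingClassifier` (with its `estimators`, `final_estimator`, and `cv` parameters) implements the same scheme.

```python
import math
import random
from typing import Callable, List, Sequence

Row = List[float]
Predictor = Callable[[Sequence[Row]], List[float]]   # rows -> class-1 probabilities
Learner = Callable[[List[Row], List[int]], Predictor]  # (X, y) -> fitted predictor


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def knn_learner(k: int = 3) -> Learner:
    """Hypothetical base learner: share of positives among the k nearest rows."""
    def fit(X: List[Row], y: List[int]) -> Predictor:
        def predict(rows: Sequence[Row]) -> List[float]:
            out = []
            for r in rows:
                nearest = sorted(range(len(X)), key=lambda i: math.dist(r, X[i]))[:k]
                out.append(sum(y[i] for i in nearest) / len(nearest))
            return out
        return predict
    return fit


def logistic_learner(lr: float = 0.5, epochs: int = 500) -> Learner:
    """Hypothetical learner: logistic regression via batch gradient descent."""
    def fit(X: List[Row], y: List[int]) -> Predictor:
        w, b, n = [0.0] * len(X[0]), 0.0, len(X)
        for _ in range(epochs):
            gw, gb = [0.0] * len(w), 0.0
            for r, t in zip(X, y):
                err = sigmoid(sum(wi * xi for wi, xi in zip(w, r)) + b) - t
                gw = [g + err * xi for g, xi in zip(gw, r)]
                gb += err
            w = [wi - lr * g / n for wi, g in zip(w, gw)]
            b -= lr * gb / n

        def predict(rows: Sequence[Row]) -> List[float]:
            return [sigmoid(sum(wi * xi for wi, xi in zip(w, r)) + b) for r in rows]
        return predict
    return fit


def stacking_fit(X: List[Row], y: List[int], base: List[Learner],
                 meta: Learner, n_folds: int = 5, seed: int = 0) -> Predictor:
    """Fit a stacking ensemble whose meta-learner sees out-of-fold base outputs."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    meta_X = [[0.0] * len(base) for _ in X]
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        for m, learner in enumerate(base):
            fitted = learner([X[i] for i in train], [y[i] for i in train])
            for i, p in zip(fold, fitted([X[i] for i in fold])):
                meta_X[i][m] = p  # row i was not in this base model's training set
    meta_predict = meta(meta_X, y)
    full = [learner(X, y) for learner in base]  # refit base models on all rows

    def predict(rows: Sequence[Row]) -> List[float]:
        columns = [f(rows) for f in full]
        return meta_predict([list(feats) for feats in zip(*columns)])
    return predict


# Toy usage: one made-up feature, positives near +2.5, negatives near -2.5.
X = [[v] for v in (2.0, 2.2, 2.4, 2.6, 2.8, 3.0, -2.0, -2.2, -2.4, -2.6, -2.8, -3.0)]
y = [1] * 6 + [0] * 6
model = stacking_fit(X, y, base=[knn_learner(3), logistic_learner()], meta=logistic_learner())
print(model([[2.5], [-2.5]]))
```

Refitting the base models on all rows after building the out-of-fold meta-features mirrors the usual stacking recipe: the meta-learner is trained on honest base predictions, while predictions on new data use base models that have seen the full training set.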

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.