SHAP-Enhanced Multi-Class XG Boost for Early Detection and Risk Assessment of Cardiovascular Diseases
Keywords:
Cardiovascular Disease Prediction, Multi-Class XG Boost, SHAP (SHapley Additive exPlanations), Risk Stratification, Healthcare, Clinical Decision Support.
Abstract
Cardiovascular diseases (CVDs) are a group of disorders affecting the heart and blood vessels, including coronary heart disease, cerebrovascular disease, and rheumatic heart disease. These conditions are among the leading causes of death globally. Traditional diagnostic methods depend heavily on clinical expertise and can be time-consuming, which makes machine learning a promising aid for faster and more accurate decision-making. This study employs a Multi-Class XG Boost model integrated with SHAP (SHapley Additive exPlanations) to predict the likelihood of cardiovascular disease from a dataset of key demographic, clinical, and diagnostic features. The model achieved an accuracy of 86.59% on training data and 83.75% on testing data; the gap of under three percentage points suggests that the model generalizes to unseen data with limited overfitting. The multi-class formulation classifies patients into several risk categories (such as low, medium, and high risk) rather than a binary outcome (disease/no disease), which improves clinical utility by giving a more granular estimate of disease severity. The integration of SHAP adds interpretability: it quantifies how each feature (such as age, cholesterol level, blood pressure, or smoking status) contributes to the final prediction, making the model's decisions more transparent to clinicians. The results suggest that the Multi-Class XG Boost model with SHAP combines high accuracy with explainable, actionable insights for early detection and risk assessment of cardiovascular diseases. Future research could focus on expanding the dataset, exploring deep learning-based architectures, and enhancing model interpretability for real-time clinical applications.
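The idea of attributing a multi-class risk prediction to individual features can be illustrated with a minimal sketch. It is not the study's model or data: `toy_risk_model`, its coefficients, the baseline patient, and the example patient are all hypothetical, and the Shapley values are computed by brute force from their definition rather than with the `shap` library, whose `TreeExplainer` would normally be used for tree ensembles.

```python
from itertools import combinations
from math import exp, factorial

FEATURES = ["age", "cholesterol", "systolic_bp", "smoker"]
CLASSES = ["low", "medium", "high"]


def toy_risk_model(x):
    """Hypothetical stand-in for a trained multi-class classifier.

    Returns softmax probabilities for (low, medium, high) risk.
    The coefficients are invented for illustration, not fitted to data.
    """
    score = (0.04 * (x["age"] - 50)
             + 0.01 * (x["cholesterol"] - 200)
             + 0.02 * (x["systolic_bp"] - 120)
             + 0.8 * x["smoker"])
    logits = [-score, 0.0, score]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def shapley_values(model, patient, baseline, class_index):
    """Exact Shapley value of each feature for one class probability.

    Features outside a coalition are set to their baseline value.
    """
    n = len(FEATURES)

    def value(coalition):
        mixed = {f: patient[f] if f in coalition else baseline[f]
                 for f in FEATURES}
        return model(mixed)[class_index]

    phi = {}
    for feature in FEATURES:
        others = [g for g in FEATURES if g != feature]
        contribution = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without = set(subset)
                contribution += weight * (value(without | {feature})
                                          - value(without))
        phi[feature] = contribution
    return phi


# Hypothetical reference patient and patient to explain
baseline = {"age": 50, "cholesterol": 200, "systolic_bp": 120, "smoker": 0}
patient = {"age": 67, "cholesterol": 260, "systolic_bp": 150, "smoker": 1}

probs = toy_risk_model(patient)
predicted = CLASSES[probs.index(max(probs))]
phi_high = shapley_values(toy_risk_model, patient, baseline,
                          CLASSES.index("high"))

print("predicted risk class:", predicted)
for name, contribution in sorted(phi_high.items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} {contribution:+.4f}")
```

The per-feature contributions sum to the difference between the patient's high-risk probability and the baseline's, which is the additivity property that lets a clinician read each value as that feature's share of the prediction. Brute-force enumeration is only practical for a handful of features.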
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
You are free to:
- Share — copy and redistribute the material in any medium or format
- Adapt — remix, transform, and build upon the material for any purpose, even commercially.
Terms:
- Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.