Predicting Maternal Health Risks Using Nutritional Data and Machine Learning

Authors

  • Ruyi Zhang
  • Zaliha Harun
  • Linjun Liu

Abstract

Introduction: Maternal health remains a critical global issue, with high mortality rates in low-resource areas due to delayed risk detection and limited healthcare access. Despite medical progress, preventable conditions such as hypertensive disorders and gestational diabetes persist, highlighting the need for early diagnostic tools aligned with Sustainable Development Goal 3 (SDG 3). This study develops machine learning models that use clinical data (e.g., blood pressure, glucose) to predict maternal risks, aiming to (1) identify key predictors, (2) evaluate model performance, and (3) support clinical decisions. Challenges include data privacy and data quality. The methodology emphasizes preprocessing, model training (XGBoost, KNN, Random Forest), and interpretability for practical deployment.

Objectives: This study develops machine learning models to predict maternal health risks from clinical indicators such as blood pressure and glucose levels. It compares the XGBoost, KNN, and Random Forest algorithms, evaluating their performance with accuracy, precision, and recall. The research identifies key predictive features and examines how data preprocessing affects results. The goal is an interpretable risk prediction tool that balances accuracy with clinical usability, particularly in low-resource settings. Implementation addresses data privacy compliance and EHR integration to support healthcare decision-making and improve maternal outcomes.

Methods: The study used the Maternal Health Risk Dataset, comprising 1,014 entries with features such as age, blood pressure, and blood sugar level. Data preprocessing included outlier removal, encoding, and scaling. Three models, XGBoost, K-Nearest Neighbors (KNN), and Random Forest, were trained and evaluated using accuracy, precision, recall, and F1-score. Hyperparameter tuning was performed via GridSearchCV.
Results: The Random Forest model outperformed the others, achieving 86.70% accuracy with standardized full features. It excelled at identifying high-risk cases (96% precision, 95% recall). XGBoost followed closely (86.21% accuracy), while KNN lagged (80.30%). Partial feature sets reduced performance across all models.

Conclusions: The Random Forest model is recommended for deployment due to its high accuracy and interpretability. Future work includes expanding datasets and integrating real-time EHR systems to enhance predictive capabilities and maternal healthcare outcomes.
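
The workflow described under Methods (scaling, model training, GridSearchCV tuning, and evaluation by accuracy, precision, recall, and F1-score) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the data are synthetic (generated with scikit-learn's make_classification as a stand-in for the 1,014-entry dataset), the hyperparameter grids and the 80/20 split are hypothetical because the abstract does not report them, outlier removal and encoding are omitted, and XGBoost is left out to avoid a dependency beyond scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in (assumption): 1,014 rows, 6 numeric features, 3 risk classes.
X, y = make_classification(
    n_samples=1014, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical search grids; the abstract does not list the ones actually used.
candidates = {
    "knn": (KNeighborsClassifier(), {"model__n_neighbors": [3, 5, 7]}),
    "random_forest": (
        RandomForestClassifier(random_state=0),
        {"model__n_estimators": [50, 100], "model__max_depth": [None, 5]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # Scaling sits inside the pipeline so each CV fold is standardized
    # using statistics from its own training portion only.
    pipe = Pipeline([("scale", StandardScaler()), ("model", estimator)])
    search = GridSearchCV(pipe, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)

    # Evaluate the refit best estimator on the held-out test split.
    y_pred = search.predict(X_test)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average="macro", zero_division=0
    )
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "best_params": search.best_params_,
    }

for name, scores in results.items():
    print(name, scores)
```

The scores printed here come from synthetic data and say nothing about the reported results. Per-class figures, such as the high-risk precision and recall quoted above, would come from calling precision_recall_fscore_support with average=None instead of the macro average.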

Published

2025-07-07

How to Cite

1. Zhang R, Harun Z, Liu L. Predicting Maternal Health Risks Using Nutritional Data and Machine Learning. J Neonatal Surg [Internet]. 2025 Jul 7 [cited 2025 Oct 4];14(32S). Available from: https://jneonatalsurg.com/index.php/jns/article/view/7929