Background and Objectives: Identifying pregnant women who are at risk of premature birth and determining its risk factors is essential because it affects their health. This study aimed to use an interpretable machine-learning model to predict premature birth.
Methods: In this study, data from 149,350 births in Tehran in 2019 were utilized from the Iranian Mothers and Babies Network (IMaN) dataset. Various factors related to the mother and the fetus, such as the mother's demographic variables and health status, medical history, pregnancy conditions, childbirth, and associated risks, were considered. The machine learning models, including multilayer neural networks, random forest, and XGBoost, were employed to predict the occurrence of preterm birth after data preprocessing. The models were evaluated based on accuracy, sensitivity, specificity, and area under the ROC curve. The Python programming language version 3.10.0 was applied to analyze the data.
Results: About 8.67% of births were premature. The XGBoost algorithm achieved the highest prediction accuracy (90%). According to the model output, multiple births, which account for 46% of pregnant women's births, had the highest importance score. Delivery risk factors had a score of 41%, and other variables, including neurological and mental illness, preeclampsia, and cardiovascular disease, were subsequently ranked in order of importance for this particular individual.
Conclusion: Using an interpretable machine learning method could predict the occurrence of premature birth. Based on risk factors, the interpretable machine learning method can provide personalized preventive recommendations for every pregnant woman, aiming to reduce the risk of preterm birth.