Fig. 4 | BMC Medicine

Fig. 4

From: Dense phenotyping from electronic health records enables machine learning-based prediction of preterm birth


Billing code-based model outperforms a model based on clinical risk factors.

A We compared the performance of boosted decision trees trained using either billing codes (ICD-9 and CPT) present before 28 weeks of gestation (purple) or known clinical risk factors (gray) to predict preterm delivery. Clinical risk factors (the “Methods” section) included self- or third-party-reported race (Black, Asian, or Hispanic), age at delivery (> 34 or < 18 years old), non-gestational diabetes, gestational diabetes, sickle cell disease, presence of fetal abnormalities, pre-pregnancy BMI > 35, pre-pregnancy hypertension (> 120/80), gestational hypertension, preeclampsia, eclampsia, and cervical abnormalities. Both models were trained and evaluated on the same cohort of women (n = 21,099).

B Precision-recall and C ROC curves for the model using billing codes (purple line) or clinical risk factors (gray line). Preterm births are predicted more accurately by models using billing codes at 28 weeks of gestation (PR-AUC = 0.40, ROC-AUC = 0.75) than using clinical risk factors as features (PR-AUC = 0.25, ROC-AUC = 0.65). For the precision-recall curves, chance performance is determined by the preterm birth prevalence (dashed black line).

D Billing code-based prediction model performance stratified by the number of risk factors for an individual. The billing code-based model detects more preterm cases and has higher precision (dark purple) across all numbers of risk factors compared to preterm (PTB) prevalence (light purple).

E The model using billing codes also performs well at predicting the subset of spontaneous preterm births in the held-out set (recall = 0.48) compared to risk factors (recall = 0.35).
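The two summary metrics in the caption, and the reason the chance line on a precision-recall plot sits at the outcome prevalence, can be illustrated with a minimal standard-library sketch. The labels and scores below are made-up toy values, not data from the study, and the helper names (`prevalence`, `roc_auc`, `average_precision`) are hypothetical. Average precision is used here as one common step-wise estimate of PR-AUC; the caption does not say which estimator the authors used.

```python
from typing import Sequence


def prevalence(labels: Sequence[int]) -> float:
    """Fraction of positive labels; the expected precision of a chance-level model."""
    return sum(labels) / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of the precision values at the rank of each positive.

    Assumes no tied scores; with ties the result depends on sort order.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` predictions
    return total / hits


# Toy example (hypothetical values): 1 = preterm, 0 = term.
toy_labels = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

print("prevalence (PR chance level):", prevalence(toy_labels))
print("ROC-AUC:", roc_auc(toy_labels, toy_scores))
print("average precision:", average_precision(toy_labels, toy_scores))
# An uninformative model that gives everyone the same score has ROC-AUC 0.5.
print("constant-score ROC-AUC:", roc_auc(toy_labels, [0.5] * len(toy_labels)))
```

Because ROC-AUC has a fixed chance level of 0.5 while the precision-recall chance level equals the prevalence, a PR-AUC value is best read against the outcome rate in the evaluated cohort.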
