Fig. 2 | BMC Medicine

Fig. 2

From: Dense phenotyping from electronic health records enables machine learning-based prediction of preterm birth

Machine learning accurately predicts preterm birth using billing codes present before 28 weeks of gestation.

A Machine learning framework for training and evaluating all models. We train models (boosted decision trees) on 80% of each cohort to predict whether each delivery is preterm or not preterm. EHR features used to ascertain delivery type are excluded from training. Performance is reported on the held-out cohort, consisting of 20% of deliveries, using the area under the ROC and precision-recall curves (ROC-AUC, PR-AUC).

B We trained models using billing codes (ICD-9 and CPT) present before each of the following time points during pregnancy: 0, 13, and 28 weeks of gestation. These time points were selected to approximate gestational trimesters. Women who had already delivered were excluded at each time point. To facilitate comparison across time points, we downsampled the available cohorts so that the models were trained on cohorts with similar numbers of women (n = 11,227 to 11,474).

C The ROC-AUC increased from 0.63 at conception (0 weeks, dark blue line) to 0.72 at 28 weeks of gestation (green line); chance (black dashed line) corresponds to an AUC of 0.5.

D The model at 28 weeks of gestation achieved the highest PR-AUC (0.33). This underestimates the achievable performance: when all women with data available at 28 weeks are considered, performance improves further (ROC-AUC 0.75 and PR-AUC 0.40). Chance (dashed lines) represents the preterm birth prevalence in each cohort.
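The evaluation described in panels A, C, and D can be illustrated with a minimal sketch. It is not the authors' code: no boosted decision trees are trained here, and the cohort size, the 10% prevalence, and both sets of scores are synthetic values chosen for illustration. The helper names (`split_80_20`, `roc_auc`, `average_precision`) are hypothetical, and PR-AUC is approximated by average precision, which may differ from the interpolation used in the paper. The sketch shows why an uninformative score sits near an ROC-AUC of 0.5 and a PR-AUC close to the prevalence of the positive class.

```python
import random


def split_80_20(n, seed=0):
    """Shuffle indices 0..n-1 and return (train, held-out) index lists, 80% / 20%."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(0.8 * n)
    return idx[:cut], idx[cut:]


def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: mean of precision@k over the ranks k of the positives.

    Used here as a stand-in for PR-AUC (an assumption, see the lead-in).
    """
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Synthetic cohort: all numbers below are illustrative assumptions.
rng = random.Random(0)
n = 5000
labels = [1 if rng.random() < 0.10 else 0 for _ in range(n)]  # 1 = preterm
noise_scores = [rng.random() for _ in range(n)]               # uninformative "model"
signal_scores = [rng.gauss(0.0, 1.0) + y for y in labels]     # informative "model"

# Metrics are computed only on the held-out 20%, as in panel A.
train_idx, test_idx = split_80_20(n, seed=1)
y_test = [labels[i] for i in test_idx]
prevalence = sum(y_test) / len(y_test)  # chance level for the PR curve

for name, scores in [("noise", noise_scores), ("signal", signal_scores)]:
    s_test = [scores[i] for i in test_idx]
    print(name,
          "ROC-AUC=%.2f" % roc_auc(y_test, s_test),
          "AP=%.2f" % average_precision(y_test, s_test),
          "prevalence=%.2f" % prevalence)
```

Because the PR baseline depends on prevalence, PR-AUC values are only comparable between cohorts with similar preterm birth rates, whereas the ROC-AUC chance level stays at 0.5 regardless of class balance.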
