Fig. 1 | BMC Medicine

Fig. 1

From: Dense phenotyping from electronic health records enables machine learning-based prediction of preterm birth

Definition and attributes of the Vanderbilt delivery cohort. A Schematic overview of the assembly of the delivery cohort from electronic health records (EHRs). Using billing codes, women with at least one delivery were extracted from the EHR database (n = 35,282). B Delivery date and type were ascertained using ICD-9, CPT, and/or estimated gestational age (EGA) from each woman’s EHR (see the “Methods” section). From this cohort, 104 randomly selected EHRs were chart reviewed to validate the preterm birth label for the first recorded delivery. C Number of women in the billing code cohort with estimated gestational age (+EGA), demographics (+Age, self- or third-party-reported race), clinical labs (+Labs), clinical obstetric notes (+Obstetric notes), patient clinical history (+Clinical History), and genetic data (+Genetics). D The EGA distribution at delivery (mean 38.5 weeks (red line); 38.0–40.3 weeks, 25th and 75th percentiles). Less than 0.015% (n = 49) of deliveries have EGA below 20 weeks. E The concordance between EGA recorded within 3 days of delivery and ICD-9-based delivery type for the 15,041 women with sufficient data for both. Precision and recall values were > 93% across labels except for preterm precision (85%). F Accuracy of delivery type phenotyping. The phenotyping algorithm was evaluated by chart review of 104 randomly selected women. The approach has high precision and recall for binary classification of “preterm” or “not preterm”.
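
The per-label precision and recall reported in panels E and F can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration: the function name `precision_recall`, the label names other than “preterm”, the choice of EGA-based labels as the reference, and the toy label lists, which are not data from the study.

```python
from collections import Counter


def precision_recall(reference, predicted):
    """Return {label: (precision, recall)} of `predicted` against `reference`.

    Hypothetical helper for illustration; not the authors' code.
    """
    if len(reference) != len(predicted):
        raise ValueError("label lists must have the same length")

    ref_count = Counter(reference)    # how often each label is the reference label
    pred_count = Counter(predicted)   # how often each label is assigned
    true_pos = Counter(r for r, p in zip(reference, predicted) if r == p)

    scores = {}
    for label in sorted(set(reference) | set(predicted)):
        # Precision: of the records assigned this label, the fraction that match.
        precision = (true_pos[label] / pred_count[label]
                     if pred_count[label] else float("nan"))
        # Recall: of the records with this reference label, the fraction recovered.
        recall = (true_pos[label] / ref_count[label]
                  if ref_count[label] else float("nan"))
        scores[label] = (precision, recall)
    return scores


# Toy example (assumed labels, not study data): EGA-derived labels are treated
# as the reference and billing-code-derived labels as the prediction.
ega_labels = ["preterm", "term", "term", "postterm", "preterm", "term"]
icd_labels = ["preterm", "term", "preterm", "postterm", "preterm", "term"]

scores = precision_recall(ega_labels, icd_labels)
for label, (precision, recall) in scores.items():
    print(f"{label}: precision={precision:.2f}, recall={recall:.2f}")
```

In this toy case one term delivery is labeled preterm by the billing codes, which lowers preterm precision while leaving preterm recall intact. That is the same direction of asymmetry the caption describes for panel E, although the numbers here are purely illustrative.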