Development of models for cervical cancer screening: construction in a cross-sectional population and validation in two screening cohorts in China

Background Current methods for cervical cancer screening result in an increased number of referrals and unnecessary diagnostic procedures. This study aimed to develop and evaluate a more accurate model for cervical cancer screening.
Methods Multiple predictors including age, cytology, high-risk human papillomavirus (hrHPV) DNA/mRNA, E6 oncoprotein, HPV genotyping, and p16/Ki-67 were used for model construction in a cross-sectional population including women with normal cervix (N = 1085), cervical intraepithelial neoplasia (CIN, N = 279), and cervical cancer (N = 551) to predict CIN2+ or CIN3+. A base model using age, cytology, and hrHPV was constructed, and extended versions with additional biomarkers were considered. External validation was then conducted in two screening cohorts with 3-year follow-up (N = 3179 in cohort I and N = 3082 in cohort II).
Results For CIN2+ prediction in the cross-sectional population, the base model increased the area under the curve (AUC 0.91, 95% confidence interval [CI] = 0.88–0.93) and reduced the colposcopy referral rate (42.76%, 95% CI = 38.67–46.92) compared to hrHPV and cytology co-testing (AUC 0.80, 95% CI = 0.79–0.82; referral rate 61.62%, 95% CI = 59.4–63.8). The AUC further improved when HPV genotyping and/or E6 oncoprotein were included in the base model. External validation in two screening cohorts further demonstrated that our models had better clinical performance than routine screening methods, yielding AUCs of 0.92 (95% CI = 0.91–0.93) and 0.94 (95% CI = 0.91–0.97) for CIN2+ prediction and referral rates of 17.55% (95% CI = 16.24–18.92) and 7.40% (95% CI = 6.50–8.38) in screening cohorts I and II, respectively. Similar results were observed for CIN3+ prediction.
Conclusions Compared to routine screening methods, our model using current cervical screening indicators can improve clinical performance and reduce referral rates.
Supplementary Information The online version contains supplementary material available at 10.1186/s12916-021-02078-2.


Background
Cervical cancer is the fourth most frequently diagnosed cancer and the fourth leading cause of cancer death in women, with an estimated 570,000 new cases and 311,000 deaths in 2018 worldwide [1]. Cancer morbidity and mortality have decreased in developed countries due to the implementation of routine cervical cancer screening [2], and testing for high-risk human papillomavirus (hrHPV) has improved cervical cancer prevention efforts [3].
Decisions in modern cervical cancer screening programs are often based on age, cytology, and hrHPV testing results. For example, the USA has different cervical cancer screening guidelines for women in different age groups. For women aged 30 to 65 years, guidelines recommend cytology and hrHPV co-testing because of its high sensitivity. HPV testing has not been recommended in women aged 21-29 without atypical squamous cells of undetermined significance (ASC-US) because of its low specificity [4]. However, cytology and hrHPV co-testing increases the number of referrals, unnecessary diagnostic procedures, and costs to the health care system [5,6].
Cervical intraepithelial neoplasia grade 2/3 (CIN2/3) can progress to cervical cancer if left untreated, and therefore identifying women who would benefit from further monitoring and/or treatment is important [7]. However, we need to ensure that unnecessary referrals are avoided. To retain the high sensitivity of current primary screening tests while improving specificity, additional screening biomarkers such as HPV genotyping [8,9], E6 oncoprotein [10], and p16/Ki-67 dual staining [11] have been developed, and each has been shown to have high specificity as a triage test for HPV-positive women. However, decision-making in routine cancer screening may become more complicated as additional biomarkers are added and screening algorithms become increasingly complex. These complex screening algorithms may have limited practical applications.
In recent years, machine learning methods have played an important role in selecting an appropriate combination of multiple biomarkers. With the increasing availability of large national databases and computing power, the use of machine learning methods in medical science and health care has been growing rapidly [12,13]. Studies have shown that machine learning methods such as logistic regression and support vector machine (SVM) can enhance prediction performance by providing clinicians with valuable evidence-based prognostic information. By using machine learning methods, we may substantially improve the sensitivity and specificity of cervical cancer screening, avoid unnecessary colposcopy referrals, and simplify decision-making in clinical practice.
The aim of this study was to develop models that better predict CIN2+ by using age, cytology, and hrHPV testing, with or without other biomarkers. Models were constructed and evaluated in a cross-sectional population enriched with CIN and cervical cancer. External validation in two screening cohorts is further presented to demonstrate the usefulness of our methods.

Study population
This study included three populations: one cross-sectional population and two screening cohorts. Women were eligible if they had an intact cervix and no prior history of CIN. Women eligible for the screening cohorts were additionally aged 25 to 65. Women who were pregnant, had a hysterectomy, or received treatment for cervical diseases were excluded. Further details on the study design are provided in Fig. 1. Institutional review board (IRB) approval was provided by the Ethics Committee of the Cancer Hospital, Chinese Academy of Medical Sciences. All participants agreed to the study protocol and provided informed consent.

Cross-sectional population
Participants were recruited from five hospitals in China between 2014 and 2015 and included women attending routine cervical cancer screening programs, outpatients referred for colposcopy, and inpatients planning treatment for CIN2+. A questionnaire was used to collect information on demographic factors and obstetrics and gynecology history. Two cervical exfoliated cell samples were collected: one was kept in PreservCyt Solution (Hologic) and aliquoted for cobas HPV (Roche), Aptima HPV (Hologic), and Onclarity HPV (BD Diagnostics) testing, p16/Ki-67 dual staining (Roche), and liquid-based cytology (LBC) assessment; the other was collected with a Dacron swab for HPV16/18 E6 protein detection (Arbor Vita Corporation). Cervical biopsies were conducted using a protocol as previously described [14]. Local pathologists provided the primary diagnosis, and a panel of five pathologists from each center performed a blinded diagnostic review to reach consensus.
For both screening cohorts, women who were HPV positive or had ASC-US+ cytology continued to annual follow-up visits, and all women, regardless of their baseline results, returned in the third year for a final visit. At each visit, an LBC specimen was obtained and women with ASC-US+ were referred for colposcopy. Women diagnosed with CIN2+ at baseline or follow-up exited the study after the colposcopy visit and were referred for treatment.

Laboratory tests
The Onclarity HPV is a PCR assay for the detection of six individual HPV genotypes (16, 18,   the manufacturer's instructions. The OncoE6 cervical test is an immunochromatographic test for the detection of HPV16/18 E6 oncoprotein. The operation procedures were described previously [15].
Cytology slides were first evaluated by junior cytologists and then diagnosed by senior cytologists. Results were reported using the Bethesda 2014 nomenclature. A second cytology slide was prepared from the residual PreservCyt Solution for p16/Ki-67 dual staining using the CINtecPLUS Cytology kit according to the manufacturer's instructions for the cross-sectional samples. Technicians were blinded to each other's findings to minimize bias.

Statistical analyses

Model development
Base models using logistic regression and SVM were implemented in R (version 3.5.2). Model construction and internal validation were performed in the cross-sectional population, which was randomly split into a training set (70%) and a testing set (30%).
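The random 70/30 split can be illustrated with a minimal sketch. The study itself used R; the Python below is only an illustration, and the function name `split_train_test`, the seed, and the use of integer identifiers in place of participant records are assumptions, not details from the study.

```python
import random


def split_train_test(records, train_fraction=0.7, seed=0):
    """Randomly split records into a training set and a testing set."""
    shuffled = list(records)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical participant identifiers. 1915 = 1085 + 279 + 551, the group
# sizes reported for the cross-sectional population; the number of women
# actually entering the models may differ.
participants = list(range(1915))
train, test = split_train_test(participants)
print(len(train), len(test))
```

A fixed seed makes the split reproducible, which matters when several candidate models are compared on the same testing set.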
Logistic regression or SVM using age, cytology, and hrHPV as predictors was set as the base model. Among the predictors, age was a continuous covariate; hrHPV testing was dichotomous (positive for any of the 14 hrHPV types vs. negative for all 14 hrHPV types); and cytology was a seven-level covariate: negative for intraepithelial lesion or malignancy (NILM), ASC-US, low-grade squamous intraepithelial lesion (LSIL), atypical squamous cells cannot exclude high-grade lesion (ASC-H), atypical glandular cell (AGC), high-grade squamous intraepithelial lesion/adenocarcinoma in situ (HSIL/AIS), and squamous cell carcinoma/adenocarcinoma (SCC/ADC). HSIL and AIS were combined into one level, as were SCC and ADC, because few cases were available for these levels. The outcome of interest, CIN2+ or CIN3+, was dichotomous. Receiver operating characteristic (ROC) curves (sensitivity versus 1 - specificity) and the area under the curve (AUC) were used to assess predictive accuracy. Sensitivity, specificity, and colposcopy referral rate were also calculated for current screening methods and for the models, using the thresholds with the largest Youden Index.
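The threshold selection described above can be sketched as follows. This is a minimal illustration in Python rather than the R code used in the study; the function names and the toy scores are hypothetical, a woman is called positive when her predicted score is at or above the threshold, and the referral rate is taken to be the fraction of women called positive. The Youden Index is sensitivity + specificity - 1.

```python
def confusion_counts(scores, labels, threshold):
    """Return (tp, fp, tn, fn), calling a woman positive when score >= threshold."""
    tp = fp = tn = fn = 0
    for score, has_disease in zip(scores, labels):
        if score >= threshold:
            if has_disease:
                tp += 1
            else:
                fp += 1
        elif has_disease:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def best_youden_threshold(scores, labels):
    """Scan observed scores as cutoffs and keep the one with the largest Youden Index.

    Assumes both outcome classes are present. Ties keep the lowest cutoff.
    """
    best = None
    for threshold in sorted(set(scores)):
        tp, fp, tn, fn = confusion_counts(scores, labels, threshold)
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        youden = sensitivity + specificity - 1
        if best is None or youden > best["youden"]:
            best = {
                "threshold": threshold,
                "sensitivity": sensitivity,
                "specificity": specificity,
                "referral_rate": (tp + fp) / len(scores),
                "youden": youden,
            }
    return best


def auc(scores, labels):
    """AUC as the probability that a case outscores a non-case (ties count half)."""
    cases = [s for s, y in zip(scores, labels) if y]
    controls = [s for s, y in zip(scores, labels) if not y]
    wins = sum((c > n) + 0.5 * (c == n) for c in cases for n in controls)
    return wins / (len(cases) * len(controls))


# Hypothetical predicted probabilities and CIN2+ status (1 = CIN2+), for illustration only
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.80, 0.90]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
print(best_youden_threshold(scores, labels))
print(auc(scores, labels))
```

Reporting the referral rate alongside sensitivity and specificity at the chosen cutoff mirrors how the models are compared with co-testing in the Results.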

External validation in screening cohorts
The base model and the extended versions with HPV genotyping were applied to both screening cohorts (SC-I and SC-II). The extended models with E6 oncoprotein were applied to SC-I only because swab samples were not collected in SC-II. Models were also evaluated using the cytology results reported by junior cytologists and by senior cytologists. Three-year cumulative risks of CIN2+ were estimated for women who were negative on hrHPV and cytology co-testing and for women who were predicted negative by the models.
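As a rough illustration of how a 3-year risk and its confidence interval might be computed, the sketch below treats the cumulative risk as a crude proportion (cases during follow-up divided by women in the group) and uses an exact (Clopper-Pearson) binomial interval. Both choices are assumptions: the source does not state the estimation or interval method. The counts in the example are hypothetical, and the Python function names are illustrative.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p). Intended for small k (rare events)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def solve_decreasing(f, target, iterations=100):
    """Bisection on (0, 1) for f(p) = target, where f decreases in p."""
    low, high = 0.0, 1.0
    for _ in range(iterations):
        mid = (low + high) / 2
        if f(mid) > target:
            low = mid    # f is still too large, so the root lies to the right
        else:
            high = mid
    return (low + high) / 2


def crude_risk_with_exact_ci(cases, n, alpha=0.05):
    """Crude risk cases/n with a Clopper-Pearson (exact binomial) interval."""
    risk = cases / n
    # Lower bound solves P(X >= cases | p) = alpha/2
    lower = 0.0 if cases == 0 else solve_decreasing(
        lambda p: binom_cdf(cases - 1, n, p), 1 - alpha / 2)
    # Upper bound solves P(X <= cases | p) = alpha/2
    upper = 1.0 if cases == n else solve_decreasing(
        lambda p: binom_cdf(cases, n, p), alpha / 2)
    return risk, lower, upper


# Hypothetical example: 5 CIN2+ cases among 2500 predicted-negative women
risk, lower, upper = crude_risk_with_exact_ci(5, 2500)
print(f"{risk:.2%} (95% CI = {lower:.2%}-{upper:.2%})")
```

An exact interval is a common choice when the number of events is small, as it is for women who screen negative; other interval methods would give slightly different bounds.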

Model development
Results for the current screening methods and the proposed models for CIN2+ prediction are presented in Table 2. Statistical comparisons showed that logistic regression had a slightly higher AUC than SVM, so logistic regression was chosen for further analysis (parameters of the models are shown in Additional file 1: Tables S1-S2). In the testing set of the cross-sectional population, the logistic regression base model had a sensitivity of 92.00% (95% confidence interval [CI] = 88.00-95.11%), specificity of 89.08% (95% CI = 85.63-92.24%), and AUC of 0.91 (95% CI = 0.88-0.93). The AUC increased slightly when p16/Ki-67 dual staining was added to the base model, whereas larger AUC improvements were obtained when HPV genotyping or E6 oncoprotein was included (Fig. 2). Results of cobas, Aptima, and Onclarity showed no significant changes.
For the current screening methods, the largest AUCs were obtained by ASC-US+ (AUC = 0.85, 95% CI = 0.84-0.87) and hrHPV mRNA (AUC = 0.87, 95% CI = 0.86-0.89). For co-testing, the AUC was 0.80 (95% CI = 0.79-0.82). The base model had a slightly higher AUC than current screening methods using either ASC-US+ or hrHPV mRNA. In addition, the base model reduced the number of colposcopy referrals, with a referral rate of 42.76% (95% CI = 38.67-46.92%) compared to 61.62% (95% CI = 59.4-63.8%) for hrHPV and cytology co-testing (Table 2). Similar results for the current screening methods and the proposed models for CIN3+ prediction are presented in Additional file 1: Table S3.

External validation in screening cohorts at follow-up
During the 3-year follow-up, 42 CIN2+ cases were diagnosed in SC-I, of which 37 had been predicted positive and 5 predicted negative by the base model at baseline. All 5 predicted-negative cases were hrHPV negative with normal cytology at baseline. Women with predicted-negative findings had slightly lower 3-year risks of CIN2+ than women with negative hrHPV and cytology co-testing results (0.19%, 95% CI = 0.06-0.44% vs 0.20%, 95% CI = 0.06-0.46%). In SC-II, 28 CIN2+ cases were diagnosed during follow-up, of which 11 had been predicted positive and 17 predicted negative at baseline. Women with predicted-negative findings had higher 3-year risks of CIN2+ than women with negative co-testing results (0.70%, 95% CI = 0.43-1.08% vs 0.09%, 95% CI = 0.01-0.32%).
Fig. 2 Area under the receiver operating characteristic curve (AUC) and 95% confidence interval (CI) of the base model with or without additional biomarkers
Since the 3-year risk of CIN2+ was higher among women with negative results of the predictive model than among women with negative co-testing results, we changed the threshold for base model prediction from the one with the highest Youden Index to the one with the highest sensitivity. Results for SC-I did not change, whereas SC-II yielded a sensitivity of 100%, specificity of 84.13% (95% CI = 82.88-85.42%), and colposcopy referral rate of 17.23% (95% CI = 15.91-18.61%), that is, a higher specificity and AUC, the same sensitivity, and a lower colposcopy referral rate than co-testing at baseline. With this threshold, 26 of the 28 follow-up CIN2+ cases were predicted to be positive at baseline and 2 cases were predicted to be negative. Women with predicted-negative findings had slightly lower 3-year risks of CIN2+ (0.08%, 95% CI = 0.01-0.28%) than women with negative co-testing results (0.09%, 95% CI = 0.01-0.32%).
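One way to read "the highest sensitivity" threshold is as the largest cutoff at which every known case is still called positive, which keeps specificity as high as possible without losing sensitivity. The sketch below illustrates that reading only; the source does not describe the exact procedure, and the function names and toy data are hypothetical.

```python
def highest_sensitivity_threshold(scores, labels):
    """Largest cutoff that still calls every known case positive (score >= cutoff)."""
    case_scores = [s for s, has_disease in zip(scores, labels) if has_disease]
    return min(case_scores)


def summarize(scores, labels, threshold):
    """Sensitivity, specificity, and referral rate at a given cutoff."""
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s < threshold)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "referral_rate": (tp + fp) / len(scores),
    }


# Hypothetical predicted probabilities and CIN2+ status (1 = CIN2+), for illustration only
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.80, 0.90]
labels = [0, 0, 1, 0, 0, 1, 1, 1]
cutoff = highest_sensitivity_threshold(scores, labels)
print(cutoff, summarize(scores, labels, cutoff))
```

Lowering the cutoff in this way trades a higher referral rate for fewer missed cases, which is the trade-off reported for SC-II above.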

Discussion
In this study, we developed and evaluated machine learning-based models to predict CIN2+ or CIN3+ for cervical cancer screening. A logistic regression model using hrHPV, cytology, and age was set as the base model because of its superior predictive performance and its reduction of colposcopy referral rates. The clinical performance of the base model can be improved by incorporating E6 oncoprotein and/or HPV genotyping information. External validation in two screening cohorts further demonstrated that our models had better clinical performance than routine screening methods. The 3-year risks of CIN2+ for predicted-negative women depended on the threshold of the model, but clinical performance at baseline improved whichever threshold was chosen.
Different models have been used for cervical cancer screening in previous studies. Karakitsos et al. used the learning vector quantizer neural network classifier on cytological diagnosis, HPV DNA test, E6/E7 HPV mRNA test, and p16 immunostaining to build an algorithm to facilitate the classification of CIN2+. This model improved the AUC (0.916) significantly compared to cytology diagnosis alone (0.866) [16]. In the study conducted by Branca et al., comprehensive multivariate models were constructed by a panel of 13 biomarkers to  [17]. A Korean study developed a web-based tool using age, cytology, and presence of 15 hrHPV genotypes in an SVM model to identify the patient features that maximally contributed to progression to cervical lesions, which obtained an accuracy of 74.41%. However, this model was not developed for cancer screening, and its result was highly dependent on the proportion of positive and negative individuals selected [18]. Several studies used logistic regression to establish predictors for histologic grade or risk stratification based on epidemiologic risk factors and molecular markers. In a large study of around 100,000 women using race, smoking status, insurance, marital status, median income, and previous HPV test result as predictors, the model obtained an AUC of only 0.81 for CIN2+ [19]. Another study of 1,477 women reported that the most predictive factors were mRNA level, DNA index, parity, and age, and the AUC was 0.99 for HSIL and 0.81 for LSIL [20]. However, findings from previous studies may not be replicable across studies due to differences in adjustment factors, sample size, and degrees of diagnoses.
The extended models that included HPV genotyping and/or E6 oncoprotein performed better than the base model in each study population assessed, but at a slight increase in cost. HPV genotyping can be a byproduct of HPV testing that adds little cost but provides additional information. E6 oncoprotein is pivotal in the initiation and maintenance of oncogenic transformation by HPV [21] and is associated with viral persistence [22,23]. The protein test is a lateral flow immunoassay designed for low- and middle-income countries [15]. When conducting the study in SC-I, we assumed that only women with a positive HPV mRNA result could express E6 oncoprotein. Therefore, protein testing was performed only in the HPV16/18/45 mRNA-positive participants (N = 126). These results showed that additionally testing for the E6 oncoprotein in a limited group of women could yield better screening performance than the base model. In addition, the models recommended fewer women for immediate colposcopy than HPV testing alone, cytology alone, or co-testing, while the real-world cost of collecting HPV testing and cytology information is the same as for co-testing; the models could therefore reduce unnecessary diagnostic procedures and costs.
Cytology diagnosis is subjective in nature, and its reproducibility and accuracy are affected by the cytologist's skill [24]. In our study, the cytology diagnoses by senior cytologists were made in a high-quality laboratory in Beijing, so the results may not be generalizable to all cytology laboratories [25]. For example, the cytology diagnosis in SC-II was conducted by the best cytologist in China, whose sensitivity (0.961) was higher than that of HPV testing (0.902). To improve generalizability, models based on the cytology results from junior cytologists were also evaluated. Although model performance was affected by the skill level of the cytologists, the clinical performance of our models was still better than that of hrHPV testing and cytology co-testing at the same cytologist skill level.

Conclusions
Our study demonstrated that machine learning can incorporate multiple screening methods into one algorithm and build models from current cervical cancer screening indicators. Such models have the potential to be a reliable screening method given their better clinical performance and lower referral rates.