
Development of models for cervical cancer screening: construction in a cross-sectional population and validation in two screening cohorts in China



Current methods for cervical cancer screening result in an increased number of referrals and unnecessary diagnostic procedures. This study aimed to develop and evaluate a more accurate model for cervical cancer screening.


Multiple predictors including age, cytology, high-risk human papillomavirus (hrHPV) DNA/mRNA, E6 oncoprotein, HPV genotyping, and p16/Ki-67 were used for model construction in a cross-sectional population including women with normal cervix (N = 1085), cervical intraepithelial neoplasia (CIN, N = 279), and cervical cancer (N = 551) to predict CIN2+ or CIN3+. A base model using age, cytology, and hrHPV was constructed, and extended versions with additional biomarkers were considered. The models were then externally validated in two screening cohorts with 3-year follow-up (cohort I, N = 3179; cohort II, N = 3082).


In the cross-sectional population, the base model predicted CIN2+ with a higher area under the curve (AUC 0.91, 95% confidence interval [CI] = 0.88–0.93) and a lower colposcopy referral rate (42.76%, 95% CI = 38.67–46.92%) than hrHPV and cytology co-testing (AUC 0.80, 95% CI = 0.79–0.82; referral rate 61.62%, 95% CI = 59.40–63.80%). The AUC improved further when HPV genotyping and/or E6 oncoprotein were added to the base model. External validation in two screening cohorts further demonstrated that our models had better clinical performance than routine screening methods, yielding AUCs of 0.92 (95% CI = 0.91–0.93) and 0.94 (95% CI = 0.91–0.97) for CIN2+ prediction and referral rates of 17.55% (95% CI = 16.24–18.92%) and 7.40% (95% CI = 6.50–8.38%) in screening cohorts I and II, respectively. Similar results were observed for CIN3+ prediction.


Compared to routine screening methods, our model using current cervical screening indicators can improve the clinical performance and reduce referral rates.



Cervical cancer is the fourth most frequently diagnosed cancer and the fourth leading cause of cancer death in women, with an estimated 570,000 new cases and 311,000 deaths in 2018 worldwide [1]. Cancer morbidity and mortality have decreased in developed countries due to the implementation of routine cervical cancer screening [2], and testing for high-risk human papillomavirus (hrHPV) has improved cervical cancer prevention efforts [3].

Decisions in modern cervical cancer screening programs are often based on age, cytology, and hrHPV testing results. For example, the USA has different cervical cancer screening guidelines for women in different age groups. For women aged 30 to 65 years, guidelines recommend cytology and hrHPV co-testing because of its high sensitivity. HPV testing has not been recommended in women aged 21–29 without atypical squamous cells of undetermined significance (ASC-US) because of its low specificity [4]. However, cytology and hrHPV co-testing increases the number of referrals, unnecessary diagnostic procedures, and costs to the health care system [5, 6].

Cervical intraepithelial neoplasia grade 2/3 (CIN2/3) can progress to cervical cancer if left untreated, and therefore identifying women who would benefit from further monitoring and/or treatment is important [7]. However, we need to ensure that unnecessary referrals are avoided. To retain the high sensitivity of current primary screening tests while improving specificity, additional screening biomarkers such as HPV genotyping [8, 9], E6 oncoprotein [10], and p16/Ki-67 dual staining [11] have been developed, and each has shown high specificity as a triage test for HPV-positive women. However, decision-making in routine cancer screening may become more complicated as additional biomarkers are added and screening algorithms become increasingly complex. These complex screening algorithms may have limited practical applications.

In recent years, machine learning methods have played an important role in selecting an appropriate combination of multiple biomarkers. With the increasing availability of large national databases and computing power, the use of machine learning methods in medical science and health care has been growing rapidly [12, 13]. Studies have shown that machine learning methods such as logistic regression and support vector machine (SVM) can enhance prediction performance by providing clinicians with valuable evidence-based prognostic information. Using machine learning methods, we may substantially improve the sensitivity and specificity of cervical cancer screening, avoid unnecessary colposcopy referrals, and simplify decision-making in clinical practice.

The aim of this study was to develop models that have better prediction of CIN2+ by using age, cytology, and hrHPV testing, with or without other biomarkers. Models were constructed and evaluated in a cross-sectional population enriched with CIN and cervical cancer. External validation in two screening cohorts was further presented to demonstrate the usefulness of our methods.


Study population

This study included three populations: one cross-sectional population and two screening cohorts. Women were eligible if they had an intact cervix and no prior history of CIN. Women eligible for the screening cohorts were additionally aged 25 to 65. Women who were pregnant, had a hysterectomy, or had received treatment for cervical diseases were excluded. Further details on the study design are provided in Fig. 1. Institutional review board (IRB) approval was provided by the Ethics Committee from Cancer Hospital, Chinese Academy of Medical Sciences. All participants agreed to the study protocol and provided informed consent.

Fig. 1

Flow chart of the study

Cross-sectional population

Participants were recruited from five hospitals in China between 2014 and 2015 and included women attending routine cervical cancer screening programs, outpatients referred for colposcopy, and inpatients planning treatment for CIN2+. A questionnaire was used to collect information on demographic factors and obstetrics and gynecology history. Two cervical exfoliated cell samples were collected. One was kept in PreservCyt Solution (Hologic) and aliquoted for cobas HPV (Roche), Aptima HPV (Hologic), and Onclarity HPV (BD Diagnostics) testing, p16/Ki-67 dual staining (Roche), and liquid-based cytology (LBC) assessment. The other was collected on a Dacron swab for HPV16/18 E6 protein detection (Arbor Vita Corporation). Cervical biopsies were conducted using a protocol as previously described [14]. Local pathologists provided the primary diagnosis, and a panel of five pathologists from each center performed a blinded diagnostic review to reach consensus.

Screening cohorts

Both screening cohorts included a baseline phase and a 3-year follow-up phase. Participants in screening cohort I (SC-I) were recruited from Shanxi Province of China between 2017 and 2020. At baseline, all participants received Aptima HPV, INNO-LiPA HPV genotyping (Innogenetics), and LBC. Aptima HPV-positive samples were tested by Aptima HPV16/18/45. Women with HPV16/18/45-positive results or abnormal cervical cytology (ASC-US+) were referred for colposcopy, and women with HPV16/18/45-positive results had an additional swab collected for the E6 oncoprotein test before colposcopy.

Participants in the screening cohort II (SC-II) were recruited from the Inner Mongolia Autonomous Region of China between 2016 and 2019. At baseline, all participants received cobas HPV, INNO-LiPA HPV genotyping, and LBC. Women with HPV16/18 positive or ASC-US+ were referred for colposcopy.

For both screening cohorts, women who were HPV positive or had ASC-US+ cytology continued with annual follow-up visits, and all women, regardless of their baseline results, returned in the third year for a final visit. At each visit, an LBC specimen was obtained and women with ASC-US+ were referred for colposcopy. Women diagnosed with CIN2+ at baseline or follow-up exited the study after the colposcopy visit and were referred for treatment.

Laboratory tests

The Onclarity HPV is a PCR assay for the detection of six individual HPV genotypes (16, 18, 31, 45, 51, and 52) and three groups of types (33/58, 59/56/66, and 39/68/35). The cobas HPV is another PCR assay for the detection of viral DNA of the 14 hrHPV types, which simultaneously differentiates HPV16 and HPV18. The Aptima HPV is based on the qualitative detection of E6/E7 mRNA of 14 hrHPV types. The Aptima HPV16/18/45 uses the same technology as Aptima HPV for detection of E6/E7 mRNA from HPV16/18/45; the assay differentiates genotype 16 from 18 and 45 but does not differentiate between 18 and 45. The INNO-LiPA HPV genotyping assay allows simultaneous and separate detection of 25 different HPV genotypes (14 hrHPV and HPV6, 11, 34, 40, 42, 43, 44, 53, 54, 70, and 74). All HPV tests were performed on fully automated systems according to the manufacturers' instructions. The OncoE6 cervical test is an immunochromatographic test for the detection of HPV16/18 E6 oncoprotein. The operating procedures were described previously [15].

Cytology slides were first evaluated by junior cytologists and then diagnosed by senior cytologists. Results were reported using the Bethesda 2014 nomenclature. A second cytology slide was prepared from the residual PreservCyt Solution for p16/Ki-67 dual staining using the CINtecPLUS Cytology kit according to the manufacturer’s instructions for the cross-sectional samples. Technicians were blinded to each other’s findings to minimize bias.

Statistical analyses

Model development

Base models using logistic regression and SVM were implemented in R (Version 3.5.2). Model construction and internal validation were performed in the cross-sectional population, which was randomly split into a training set (70%) and a testing set (30%).
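The random 70/30 split can be sketched as follows. This is a minimal illustration in Python rather than the authors' R code; the function name `split_train_test`, the fixed seed, and the use of a simple unstratified shuffle are all assumptions, since the paper does not describe how the split was drawn.

```python
import random

def split_train_test(ids, train_fraction=0.7, seed=0):
    """Shuffle participant IDs and split them into training and testing sets."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = list(ids)
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical integer IDs standing in for the 1915 cross-sectional participants.
train_ids, test_ids = split_train_test(range(1915))
print(len(train_ids), len(test_ids))
```

A stratified split (preserving the CIN2+ proportion in both sets) would be a reasonable alternative when the outcome is less common than in this enriched population.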

Logistic regression or SVM using age, cytology, and hrHPV as predictors was set as the base model. Among the predictors, age was a continuous covariate; hrHPV testing was dichotomous (any type of the 14 hrHPV types positive vs. all of the 14 hrHPV types negative); and cytology was a seven-level covariate: negative for intraepithelial lesion or malignancy (NILM), ASC-US, low-grade squamous intraepithelial lesion (LSIL), atypical squamous cells cannot exclude high-grade lesion (ASC-H), atypical glandular cell (AGC), high-grade squamous intraepithelial lesion/adenocarcinoma in situ (HSIL/AIS), and squamous cell carcinoma/adenocarcinoma (SCC/ADC). HSIL and AIS, as well as SCC and ADC, were separately combined because limited cases were available for these levels. CIN2+ or CIN3+, the outcome of interest, was dichotomous. Receiver operating characteristic (ROC) curve (sensitivity and 1-specificity) and the area under the curve (AUC) were used to assess predictive accuracy. Sensitivity, specificity, and colposcopy referral rate were also calculated for current screening methods and models based on the thresholds with the largest Youden Index.
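To make the evaluation metrics concrete, the sketch below computes the AUC, the threshold with the largest Youden Index (J = sensitivity + specificity − 1), and the resulting colposcopy referral rate from a vector of predicted probabilities. It is plain Python rather than the R code used in the study, the function names are hypothetical, and the scores and labels are toy values, not study data.

```python
def roc_points(scores, labels):
    """Return (threshold, sensitivity, specificity) for each distinct score.
    A woman is called positive when her score is >= the threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        points.append((t, tp / pos, tn / neg))
    return points

def auc(scores, labels):
    """AUC as the probability that a random case outscores a random non-case
    (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def youden_threshold(scores, labels):
    """Point on the ROC curve that maximises J = sensitivity + specificity - 1."""
    return max(roc_points(scores, labels), key=lambda p: p[1] + p[2] - 1)

# Toy predicted probabilities (hypothetical); label 1 = CIN2+, 0 = <CIN2.
scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.80, 0.90]
labels = [0, 0, 0, 0, 1, 0, 1, 1]

t, sens, spec = youden_threshold(scores, labels)
referral_rate = sum(s >= t for s in scores) / len(scores)
print(auc(scores, labels), t, sens, spec, referral_rate)
```

The referral rate here is simply the share of all screened women whose score reaches the threshold, which is why a more specific model refers fewer women at the same sensitivity.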

The base model was extended by substituting the hrHPV result from different detection methods, i.e., the cobas result was replaced by the Aptima or Onclarity result. Additional covariates were also added to the base model, including E6 oncoprotein (dichotomous, HPV16 and/or HPV18 positive vs. both HPV16 and HPV18 negative), p16/Ki-67 (dichotomous, positive vs. negative), and HPV genotyping (nine dummy variables: HPV16, 18, 31, 45, 51, 52, 33/58, 59/56/66, and 39/68/35, each positive vs. negative). AUCs were compared using the “pROC” package in R. Whichever of logistic regression and SVM showed better clinical performance was chosen for further analysis. Statistical significance was assessed by two-tailed tests with an α level of 0.05.
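One way to picture the base and extended predictor sets is as rows of a design matrix. The sketch below is an illustration in Python, not the authors' implementation: the function `encode`, the choice of NILM as the reference cytology level, and the column order are assumptions; only the covariate definitions come from the text.

```python
CYTOLOGY_LEVELS = ["NILM", "ASC-US", "LSIL", "ASC-H", "AGC", "HSIL/AIS", "SCC/ADC"]
GENOTYPE_GROUPS = ["16", "18", "31", "45", "51", "52", "33/58", "59/56/66", "39/68/35"]

def encode(age, cytology, hrhpv_positive, genotypes=None, e6_positive=None):
    """Build one design-matrix row.

    Base model: continuous age, dichotomous hrHPV, and six cytology dummies
    (NILM is taken as the reference level, which is an assumption).
    Extended models append nine genotype dummies and/or one E6 indicator.
    """
    if cytology not in CYTOLOGY_LEVELS:
        raise ValueError(f"unknown cytology level: {cytology!r}")
    row = [float(age), 1.0 if hrhpv_positive else 0.0]
    row += [1.0 if cytology == level else 0.0 for level in CYTOLOGY_LEVELS[1:]]
    if genotypes is not None:
        row += [1.0 if g in genotypes else 0.0 for g in GENOTYPE_GROUPS]
    if e6_positive is not None:
        row.append(1.0 if e6_positive else 0.0)
    return row

# Hypothetical participant: 45 years old, LSIL cytology, hrHPV positive.
base_row = encode(45, "LSIL", True)
extended_row = encode(45, "LSIL", True, genotypes={"16", "33/58"}, e6_positive=True)
print(len(base_row), len(extended_row))
```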

External validation in screening cohorts

The base model and extended versions with HPV genotyping were applied to both screening cohorts. The extended models with E6 oncoprotein were applied to SC-I only because swab samples were not collected in SC-II. Models using the cytology results of junior and of senior cytologists were both evaluated. Three-year cumulative risks of CIN2+ were estimated for women who were negative on hrHPV and cytology co-testing and for women predicted negative by the model.


Study population characteristics

Table 1 shows the characteristics of the study populations. A total of 1915, 3179, and 3082 women were eligible in the cross-sectional population, SC-I, and SC-II, respectively. The average ages (years ± standard deviation) of women were 47.79±9.78, 45.22±7.76, and 42.80±8.85; the positivity rates of HPV were 50.81%, 13.90%, and 17.07%; the abnormal cytology proportions were 53.16%, 10.47%, and 17.46%; and the CIN2+ percentages were 39.06%, 2.45%, and 1.65%, respectively.

Table 1 Characteristics of three study populations at baseline

Model development

Results for the current screening methods and the proposed models for CIN2+ prediction are presented in Table 2. Statistical comparisons showed that logistic regression had a slightly higher AUC than SVM, so logistic regression was chosen for further analysis (parameters of the models are shown in Additional file 1: Table S1-S2). In the testing set of the cross-sectional population, the logistic regression base model had a sensitivity of 92.00% (95% confidence interval [CI] = 88.00–95.11%), specificity of 89.08% (95% CI = 85.63–92.24%), and AUC of 0.91 (95% CI = 0.88–0.93). The AUC increased slightly when p16/Ki-67 dual staining was added to the base model, whereas larger AUC improvements were obtained when HPV genotyping or E6 oncoprotein was included (Fig. 2). Substituting the cobas, Aptima, or Onclarity result as the hrHPV predictor produced no significant changes.

Table 2 Clinical performance of current screening methods and models for cross-sectional population (CIN2+)
Fig. 2

Area under the receiver operating characteristic curve (AUC) and 95% confidence interval (CI) of the base model with or without additional biomarkers

For the current screening methods, the largest AUCs were obtained by ASC-US+ (AUC = 0.85, 95% CI = 0.84–0.87) and HPV mRNA (AUC = 0.87, 95% CI = 0.86–0.89). For co-testing, the AUC was 0.80 (95% CI = 0.79–0.82). The base model had a slightly higher AUC than current screening methods using either ASC-US+ or hrHPV mRNA. In addition, the base model reduced the number of colposcopy referrals, with a referral rate of 42.76% (95% CI = 38.67–46.92%) compared to 61.62% (95% CI = 59.40–63.80%) by hrHPV and cytology co-testing, 48.25% (95% CI = 45.98–50.53%) by hrHPV mRNA, and 53.16% (95% CI = 50.89–55.41%) by ASC-US+. The referral rates of the base model were further reduced when additional predictors were used (Table 2).
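As a rough check on how such referral-rate intervals arise, the sketch below computes a 95% Wilson score interval for a proportion. The paper does not state which interval method was used, so Wilson is an illustrative choice, and the count of 1180 referrals is back-calculated from the reported 61.62% of 1915 women rather than taken from the paper.

```python
import math

def wilson_interval(successes, n, z=1.959964):
    """95% Wilson score interval for a binomial proportion (default z)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

# Assumed counts: 1180 of 1915 women referred under co-testing (inferred, see above).
lo, hi = wilson_interval(1180, 1915)
print(f"referral rate {1180 / 1915:.2%}, 95% CI {lo:.2%} to {hi:.2%}")
```

The result lands close to the reported interval; small differences in the second decimal would be expected if the authors used an exact (Clopper-Pearson) or other method.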

Similar results for the current screening methods and the proposed models for CIN3+ prediction are presented in Additional file 1: Table S3.

External validation in screening cohorts at baseline

The models were further applied to the baseline data of the two screening cohorts, with or without E6 oncoprotein and/or HPV genotyping, for CIN2+ (Table 3) and CIN3+ (Additional file 1: Table S3) prediction. For CIN2+ prediction, the base models of SC-I and SC-II yielded sensitivities of 100% and 94.12% (95% CI = 86.27–100.00%), specificities of 84.49% (95% CI = 83.23–85.75%) and 94.06% (95% CI = 93.20–94.89%), and AUCs of 0.92 (95% CI = 0.91–0.93) and 0.94 (95% CI = 0.91–0.97), respectively, better than hrHPV and cytology co-testing. The base model also had lower colposcopy referral rates than co-testing (17.55%, 95% CI = 16.24–18.92%, versus 20.64%, 95% CI = 19.24–22.08%, in SC-I; and 7.40%, 95% CI = 6.50–8.38%, versus 26.83%, 95% CI = 25.28–28.44%, in SC-II). Although the models based on the diagnoses of junior cytologists did not perform as well as those using the diagnoses of senior cytologists, their AUCs were still higher than those of the corresponding hrHPV and cytology co-testing in both cohorts. The inclusion of E6 oncoprotein and/or HPV genotyping in the base model slightly increased AUCs in the baseline data. Similar results were observed using CIN3+ as the outcome.

Table 3 Clinical performance of current screening methods and models for screening cohorts at baseline (CIN2+)

External validation in screening cohorts at follow-up

During the 3-year follow-up, 42 CIN2+ cases were diagnosed in SC-I, of which 37 had been predicted positive and 5 predicted negative by the base model at baseline. All 5 of these cases were hrHPV negative with normal cytology at baseline. Women with predicted-negative findings had a slightly lower 3-year risk of CIN2+ than women who were negative on hrHPV and cytology co-testing (0.19%, 95% CI = 0.06–0.44% vs 0.20%, 95% CI = 0.06–0.46%). In SC-II, 28 CIN2+ cases were diagnosed during follow-up, of which 11 had been predicted positive and 17 predicted negative at baseline. Women with predicted-negative findings had a higher 3-year risk of CIN2+ than women who were co-test negative (0.70%, 95% CI = 0.43–1.08% vs 0.09%, 95% CI = 0.01–0.32%).

Since the 3-year risk of CIN2+ in SC-II was higher among women with negative model predictions than among co-test-negative women, we changed the threshold for base model prediction from the one with the highest Youden Index to the one with the highest sensitivity. Results for SC-I did not change, whereas SC-II yielded a sensitivity of 100%, specificity of 84.13% (95% CI = 82.88–85.42%), and colposcopy referral rate of 17.23% (95% CI = 15.91–18.61%); compared to co-testing at baseline, this still represents a higher specificity and AUC, the same sensitivity, and a lower colposcopy referral rate. Using this threshold, 26 of the 28 follow-up CIN2+ cases were predicted positive at baseline and 2 were predicted negative. Women with predicted-negative findings had a slightly lower 3-year risk of CIN2+ (0.08%, 95% CI = 0.01–0.28%) than co-test-negative women (0.09%, 95% CI = 0.01–0.32%).
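The switch in threshold rule can be illustrated with a small sketch. The paper does not spell out how the "highest sensitivity" threshold was located, so the rule below (take the largest cut-off at which every known case is still called positive, which keeps specificity as high as possible at 100% sensitivity) is one plausible reading, labelled as an assumption. The function name and toy data are hypothetical.

```python
def max_sensitivity_threshold(scores, labels):
    """Largest cut-off that still calls every case positive (score >= cut-off).

    Returns (threshold, specificity, referral_rate) on the supplied data.
    """
    case_scores = [s for s, y in zip(scores, labels) if y == 1]
    noncase_scores = [s for s, y in zip(scores, labels) if y == 0]
    if not case_scores or not noncase_scores:
        raise ValueError("need at least one case and one non-case")
    t = min(case_scores)  # lowering the cut-off further cannot raise sensitivity
    specificity = sum(s < t for s in noncase_scores) / len(noncase_scores)
    referral_rate = sum(s >= t for s in scores) / len(scores)
    return t, specificity, referral_rate

# Toy predicted probabilities (hypothetical); label 1 = CIN2+.
toy_scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.80, 0.90]
toy_labels = [0, 0, 1, 0, 0, 1, 1, 1]
thr, toy_spec, toy_referral = max_sensitivity_threshold(toy_scores, toy_labels)
print(thr, toy_spec, toy_referral)
```

The trade-off is visible even in the toy data: catching the lowest-scoring case forces the cut-off down, which raises the referral rate and lowers specificity relative to a Youden-optimal cut-off.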


In this study, we developed and evaluated machine learning-based models to predict CIN2+ or CIN3+ for cervical cancer screening. A logistic regression model using hrHPV, cytology, and age was set as the base model due to its superior performances in prediction and colposcopy referral rates reduction. Improved clinical performance of the base model can be gained by incorporating E6 oncoprotein and/or HPV genotyping information. External validation in two screening cohorts further demonstrated that our models had better clinical performances than routine screening methods. The 3-year risks of CIN2+ for the predicted-negative women depended on the thresholds of the model, but the improvement of clinical performance at baseline can be obtained whichever threshold was chosen.

Different models were used for cervical cancer screening in previous studies. Karakitsos et al. used the learning vector quantizer neural network classifier on cytological diagnosis, HPV DNA test, E6/E7 HPV mRNA test, and p16 immunostaining to build an algorithm to facilitate the classification of CIN2+. This model improved the AUC (0.916) significantly compared to cytology diagnosis alone (0.866) [16]. In the study conducted by Branca et al., comprehensive multivariate models were constructed by a panel of 13 biomarkers to predict CIN2+, giving the AUC of 0.897 [17]. A Korean study developed a web-based tool on age, cytology and presence of 15 hrHPV genotypes in a SVM model to identify the patient features that maximally contributed to progression to cervical lesions, which obtained an accuracy of 74.41%. However, this model was not developed for cancer screening and their result was highly dependent on the proportion of positive and negative individuals they selected [18]. Several studies used logistic regression to establish predictors for histologic grade or risk stratification based on the epidemiologic risk factors and the molecular markers. In a large study of around 100,000 women using race, smoking status, insurance, marital status, median income, and previous HPV test result as predictors, their model only obtained an AUC of 0.81 for CIN2+ [19]. Another study of 1,477 women reported that the most predictive factors were mRNA level, DNA index, parity, and age, and the AUC was 0.99 for HSIL and 0.81 for LSIL [20]. However, findings from previous studies may not be replicable across studies due to differences in adjustment factors, sample size, and degrees of diagnoses.

The clinical performance of our extended models that included HPV genotyping and/or E6 oncoprotein was better than that of the base model in each study population assessed, at a slight increase in cost. HPV genotyping can be a byproduct of HPV testing that adds little cost but provides additional information. E6 oncoprotein is pivotal in the initiation and maintenance of oncogenic transformation by HPV [21] and is associated with viral persistence [22, 23]. The protein test is a lateral flow immunoassay designed for low- and middle-income countries [15]. When conducting the study in SC-I, we assumed that only women with a positive HPV mRNA result could express E6 oncoprotein. Therefore, our protein testing was performed only in the HPV16/18/45 mRNA-positive participants (N = 126). These results showed that additionally testing for the E6 oncoprotein in a limited group of women could yield better screening performance than the base model. In addition, the models recommended fewer women for immediate colposcopy than HPV testing alone, cytology alone, or co-testing, while requiring the same HPV and cytology information, and therefore the same cost, as co-testing in a real-world setting; hence they could reduce unnecessary diagnostic procedures and costs.

Cytology diagnosis is subjective in nature, and its reproducibility and accuracy are affected by the cytologist’s skill [24]. In our study, the cytology results diagnosed by senior cytologists were performed in a high-quality laboratory in Beijing, which may not be generalizable to all the cytology laboratories [25]. For example, the cytology diagnosis in SC-II was conducted by the best cytologist in China, whose sensitivity (0.961) was higher than HPV testing (0.902). In order to better extrapolate, models based on the cytology results from junior cytologists were also evaluated. Although we found that model performances were affected by the skill level of the cytologists, the clinical performances of our models were still increased, compared to hrHPV testing and cytology co-testing within the same cytologist’s skill level.


Our study demonstrated that machine learning could incorporate multiple screening methods into one algorithm and develop models by the current cervical cancer screening indicators, which has the potential to be a reliable screening method considering its better clinical performance and lower referral rate.

Availability of data and materials

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.





AGC: Atypical glandular cell
AIS: Adenocarcinoma in situ
ASC-H: Atypical squamous cells cannot exclude high-grade lesion
ASC-US: Atypical squamous cells of undetermined significance
AUC: The area under the curve
CI: Confidence interval
CIN: Cervical intraepithelial neoplasia
HPV: Human papillomavirus
HSIL: High-grade squamous intraepithelial lesion
LBC: Liquid-based cytology
LSIL: Low-grade squamous intraepithelial lesion
NILM: Negative for intraepithelial lesion or malignancy
ROC: Receiver operating characteristic
SCC: Squamous cell carcinoma
SC-I: Screening cohort I
SC-II: Screening cohort II
SVM: Support vector machine


  1. Bray F, Ferlay J, Soerjomataram I, Siegel RL, Torre LA, Jemal A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2018;68(6):394–424.
  2. Vizcaino AP, Moreno V, Bosch FX, Munoz N, Barros-Dios XM, Borras J, et al. International trends in incidence of cervical cancer: II. Squamous-cell carcinoma. Int J Cancer. 2000;86(3):429–35.
  3. Walboomers JM, Jacobs MV, Manos MM, Bosch FX, Kummer JA, Shah KV, et al. Human papillomavirus is a necessary cause of invasive cervical cancer worldwide. J Pathol. 1999;189(1):12–9.
  4. Saslow D, Solomon D, Lawson HW, Killackey M, Kulasingam SL, Cain J, et al. American Cancer Society, American Society for Colposcopy and Cervical Pathology, and American Society for Clinical Pathology screening guidelines for the prevention and early detection of cervical cancer. Am J Clin Pathol. 2012;137(4):516–42.
  5. Mayrand MH, Duarte-Franco E, Rodrigues I, Walter SD, Hanley J, Ferenczy A, et al.; Canadian Cervical Cancer Screening Trial Study Group. Human papillomavirus DNA versus Papanicolaou screening tests for cervical cancer. N Engl J Med. 2007;357(16):1579–88.
  6. Ronco G, Giorgi-Rossi P, Carozzi F, Confortini M, Dalla Palma P, Del Mistro A, et al. Results at recruitment from a randomized controlled trial comparing human papillomavirus testing alone with conventional cytology as the primary cervical cancer screening test. J Natl Cancer Inst. 2008;100(7):492–501.
  7. WHO. WHO Guidelines for Screening and Treatment of Precancerous Lesions for Cervical Cancer Prevention. Geneva; 2013.
  8. Schiffman M, Hyun N, Raine-Bennett TR, Katki H, Fetterman B, Gage JC, et al. A cohort study of cervical screening using partial HPV typing and cytology triage. Int J Cancer. 2016;139(11):2606–15.
  9. Ejegod D, Bottari F, Pedersen H, Sandri MT, Bonde J. The BD Onclarity HPV Assay on Samples Collected in SurePath Medium Meets the International Guidelines for Human Papillomavirus Test Requirements for Cervical Screening. J Clin Microbiol. 2016;54(9):2267–72.
  10. Yu L, Jiang M, Qu P, Wu Z, Sun P, Xi M, et al. Clinical evaluation of human papillomavirus 16/18 oncoprotein test for cervical cancer screening and HPV positive women triage. Int J Cancer. 2018;143(4):813–22.
  11. Wentzensen N, Fetterman B, Castle PE, Schiffman M, Wood SN, Stiemerling E, et al. p16/Ki-67 Dual Stain Cytology for Detection of Cervical Precancer in HPV-Positive Women. J Natl Cancer Inst. 2015;107(12):djv257.
  12. Deo RC. Machine Learning in Medicine. Circulation. 2015;132(20):1920–30.
  13. Erickson BJ, Korfiatis P, Akkus Z, Kline TL. Machine Learning for Medical Imaging. Radiographics. 2017;37(2):505–15.
  14. Pretorius RG, Zhang WH, Belinson JL, Huang MN, Wu LY, Zhang X, et al. Colposcopically directed biopsy, random cervical biopsy, and endocervical curettage in the diagnosis of cervical intraepithelial neoplasia II or worse. Am J Obstet Gynecol. 2004;191(2):430–4.
  15. Zhao FH, Jeronimo J, Qiao YL, Schweizer J, Chen W, Valdez M, et al. An evaluation of novel, lower-cost molecular screening tests for human papillomavirus in rural China. Cancer Prev Res (Phila). 2013;6(9):938–48.
  16. Karakitsos P, Chrelias C, Pouliakis A, Koliopoulos G, Spathis A, Kyrgiou M, et al. Identification of women for referral to colposcopy by neural networks: a preliminary study based on LBC and molecular biomarkers. J Biomed Biotechnol. 2012;2012:303192.
  17. Branca M, Ciotti M, Giorgi C, Santini D, Di Bonito L, Costa S, et al. Predicting high-risk human papillomavirus infection, progression of cervical intraepithelial neoplasia, and prognosis of cervical cancer with a panel of 13 biomarkers tested in multivariate modeling. Int J Gynecol Pathol. 2008;27(2):265–73.
  18. Kahng J, Kim EH, Kim HG, Lee W. Development of a cervical cancer progress prediction tool for human papillomavirus-positive Koreans: A support vector machine-based approach. J Int Med Res. 2015;43(4):518–25.
  19. Rothberg MB, Hu B, Lipold L, Schramm S, Jin XW, Sikon A, et al. A risk prediction model to allow personalized screening for cervical cancer. Cancer Causes Control. 2018;29(3):297–304.
  20. Scheurer ME, Guillaud M, Tortolero-Luna G, Follen M, Adler-Storthz K. Epidemiologic modeling of cervical dysplasia with molecular and cytopathological markers. Gynecol Oncol. 2007;107(1 Suppl 1):S163–9.
  21. Ghittoni R, Accardi R, Hasan U, Gheit T, Sylla B, Tommasino M. The biological properties of E6 and E7 oncoproteins from human papillomaviruses. Virus Genes. 2010;40(1):1–13.
  22. Yu LL, Kang LN, Zhao FH, Lei XQ, Qin Y, Wu ZN, et al. Elevated Expression of Human Papillomavirus-16/18 E6 Oncoprotein Associates with Persistence of Viral Infection: A 3-Year Prospective Study in China. Cancer Epidemiol Biomark Prev. 2016;25(7):1167–74.
  23. Zhang Q, Dong L, Hu S, Feng R, Zhang X, Pan Q, et al. Risk stratification and long-term risk prediction of E6 oncoprotein in a prospective screening cohort in China. Int J Cancer. 2017;141(6):1110–9.
  24. Koliopoulos G, Nyaga VN, Santesso N, Bryant A, Martin-Hirsch PP, Mustafa RA, et al. Cytology versus HPV testing for cervical cancer screening in the general population. Cochrane Database Syst Rev. 2017;8:CD008587.
  25. Pan QJ, Hu SY, Zhang X, Ci PW, Zhang WH, Guo HQ, et al. Pooled analysis of the performance of liquid-based cytology in population-based cervical cancer screening studies in China. Cancer Cytopathol. 2013;121(9):473–82.



We thank all the local doctors from Beijing, Inner Mongolia Autonomous Region, Shanxi, Tianjin, Sichuan, and Henan Province, the members of the pathology review group, and the other colleagues who assisted in conducting this work.


This work was supported by the National Natural Science Foundation of China (Grant number: 81272337 and 81973136).

Author information




ZW and WC had full access to all the data in the study and take responsibility for the integrity of the data and accuracy of the data analysis. YQ and WC were responsible for the concept and design of the original individual studies. ZW, TL, MJ, YY, HX, LY, JC, BL, FC, DL, JY, XZ, and QP acquired the raw data. ZW and YH analyzed and interpreted the data. ZW and TL drafted the paper. YH and WC revised the paper. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Wen Chen.

Ethics declarations

Ethics approval and consent to participate

Institutional review board (IRB) approval was provided by the Ethics Committee from Cancer Hospital, Chinese Academy of Medical Sciences. The ethical permission numbers are 13-104/780, 16-111/1190, and 15-118/1045 for the cross-sectional study, screening cohort I, and screening cohort II, respectively. All participants agreed to the study protocol and provided informed consent.

Consent for publication

Not applicable

Competing interests

YQ has received funding from Qiagen, Roche, Hologic, and Daltonbio for registration trials of HPV test kits. WC has received consulting fees and payment for lectures from Roche, Hologic, and BD Diagnostics. The other authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional File 1:

Table S1-S3. Table S1. – Logistic Regression Parameters. Table S2. – SVM Parameters. Table S3. – Clinical performance of current screening methods and models for cross-sectional population and screening cohorts at baseline (CIN3+)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article


Cite this article

Wu, Z., Li, T., Han, Y. et al. Development of models for cervical cancer screening: construction in a cross-sectional population and validation in two screening cohorts in China. BMC Med 19, 197 (2021).



  • Cervical cancer
  • Human papillomavirus virus (HPV)
  • Screening