Development and validation of a 25-Gene Panel urine test for prostate cancer diagnosis and potential treatment follow-up

Background: Heterogeneity of prostate cancer (PCa) contributes to inaccurate cancer screening and diagnosis, unnecessary biopsies, and overtreatment. We intended to develop non-invasive urine tests for accurate PCa diagnosis to avoid unnecessary biopsies.

Methods: Using a machine learning program, we identified a 25-Gene Panel classifier for distinguishing PCa and benign prostate. A non-invasive test using pre-biopsy urine samples collected without digital rectal examination (DRE) was used to measure gene expression of the panel using cDNA preamplification followed by real-time qRT-PCR. The 25-Gene Panel urine test was validated in independent multi-center retrospective and prospective studies. The diagnostic performance of the test was assessed against the pathological diagnosis from biopsy by discriminant analysis. Uni- and multivariate logistic regression analysis was performed to assess its diagnostic improvement over PSA and risk factors. In addition, the 25-Gene Panel urine test was used to identify clinically significant PCa. Furthermore, the 25-Gene Panel urine test was assessed in a subset of patients to examine if cancer was detected after prostatectomy.

Results: The 25-Gene Panel urine test accurately detected cancer and benign prostate with AUC of 0.946 (95% CI 0.963–0.929) in the retrospective cohort (n = 614), AUC of 0.901 (0.929–0.873) in the prospective cohort (n = 396), and AUC of 0.936 (0.956–0.916) in the large combination cohort (n = 1010). It greatly improved diagnostic accuracy over PSA and risk factors (p < 0.0001). When it was combined with PSA, the AUC increased to 0.961 (0.980–0.942). Importantly, the 25-Gene Panel urine test was able to accurately identify clinically significant and insignificant PCa with AUC of 0.928 (95% CI 0.947–0.909) in the combination cohort (n = 727). In addition, it was able to show the absence of cancer after prostatectomy with high accuracy.

Conclusions: The 25-Gene Panel urine test is the first highly accurate and non-invasive liquid biopsy method without DRE for PCa diagnosis. In clinical practice, it may be used for identifying patients in need of biopsy for cancer diagnosis and patients with clinically significant cancer for immediate treatment, and potentially assisting cancer treatment follow-up.

Supplementary Information: The online version contains supplementary material available at 10.1186/s12916-020-01834-0.


Background
Prostate cancer (PCa) is the second most prevalent cancer and a leading cause of cancer-related death [1]. Needle biopsy is a standard method for PCa diagnosis, yet it is invasive and associated with complications and missing lesions [2]. Prostate-specific antigen (PSA) is a widely used PCa screening test, yet with moderate sensitivity and very low specificity (< 30%), resulting in > 70% false positive rate and many unnecessary biopsies [2]. Although tests using PSA isoforms/analogs have been developed, their improvement on accuracy is limited [2, 3]. For clinically meaningful PCa diagnosis, it is important to identify patients with clinically significant cancer. Although the new tools such as magnetic resonance imaging (MRI) and multiparametric MRI targeted biopsy have been used to identify patients with clinically significant PCa, these methods have limited accuracy [4-6].
During tumorigenesis, PCa cells are exfoliated from the prostate and released into the urine [7], making urine a readily available source to detect prostate-specific biomarkers for diagnosis and prognosis. Although many urine biomarkers have been identified and used individually or in combination for diagnosis, none of them has sensitivity and specificity both above 90% and AUC above 0.9. Most studies tested fewer than 300 samples. All of them use urine collected after digital rectal examination (DRE), which is invasive and uncomfortable for patients [2, 6, 8-12]. In addition, because of the very low specificity of the PSA test for cancer diagnosis and the limited ability of imaging technologies to identify residual cancer lesions after treatment, no accurate test is available to assess the efficacy and outcome of PCa treatment such as prostatectomy. Yet it is crucial to accurately measure treatment outcome to assist treatment decision-making, such as assessing whether a residual cancer lesion remains after prostatectomy to determine the necessity of subsequent treatment, leading to improved cancer treatment and reduced mortality [13, 14]. Therefore, it is of great clinical significance to develop better tests for these unmet medical needs.
PCa is a cancer with a high degree of heterogeneity. Many gene alterations contribute to cancer tumorigenesis, progression, recurrence, and metastasis [15]. Thus, it is necessary to combine multiple biomarkers involved in these processes.
We therefore developed a novel 25-Gene Panel urine test for PCa diagnosis and potential treatment follow-up. We showed that the test was robust with high accuracy in two independent, multi-center studies.

Retrospective and prospective studies
A multi-center retrospective study was approved by the Institutional Review Board (IRB) of San Francisco General Hospital (San Francisco, USA) (IRB # 15-15816) to collect and test archived urine sediments to identify and validate urine biomarkers for PCa diagnosis. The prospectively designed, retrospectively collected pre-biopsy urine samples were randomly picked from sample archives at Cooperative Human Tissue Network (CHTN) Southern Division (patients in the USA) and Indivumed GmbH (patients in Germany). The urine samples were collected from patients with elevated PSA scheduled for biopsy for cancer diagnosis from July 2004 to November 2014 with prior ethical approval and patient consent for future studies. A multi-center prospective study for urine biomarkers was approved by the IRB of Shenzhen People's Hospital (Shenzhen, China) (Study Number P2014-006) to collect pre-biopsy fresh urine samples, with patient consent, from patients treated at seven collaborating hospitals: Shenzhen People's Hospital, The First Affiliated Hospital of Sun Yat-Sen University, Peking University First Hospital, Foshan First People's Hospital, Nanfang Hospital at Southern Medical University, Peking University Shenzhen Hospital, and The Second People's Hospital of Shenzhen. The urine samples were collected consecutively from patients with elevated PSA scheduled for biopsy in the participating hospitals. Both studies used the same patient inclusion criteria: age of 18-85 years, a histopathological diagnosis of PCa, BPH, or prostatitis from biopsy, and no treatment with PCa drugs or 5-alpha reductase inhibitors prior to urine collection. The exclusion criteria included having prostatectomy or treatment with PCa drugs or 5-alpha reductase inhibitors before urine collection. In addition, ten patients undergoing prostatectomy were recruited to collect urine samples several days before and after surgery.
The pathological diagnosis of PCa in both retrospective and prospective studies was performed by using standard needle biopsy with consistent procedures. The pathological diagnosis of clinically significant or insignificant PCa was defined based on PCa risk stratification guidelines from the National Comprehensive Cancer Network (NCCN) with modifications. The clinically significant PCa patients were classified as meeting any of the following criteria: Gleason score > 7, Gleason score 4 + 3 = 7, cancer staging ≥ T3, PSA > 20 ng/mL at diagnosis, biochemical recurrence after prostatectomy during the follow-up period, or cancer metastasis at diagnosis or during the follow-up period. The rest of the patients were classified as clinically insignificant PCa. All samples were de-identified and coded with patient numbers to protect patient privacy following the Health Insurance Portability and Accountability Act guidelines. Urine samples from 665 patients were received with 51 excluded in the retrospective cohort and urine samples from 411 patients were received with 15 excluded in the prospective cohort respectively, due to the lack of pathology report, diagnosis uncertainty, or low/no gene expression detected.
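The classification rule for clinically significant PCa described above can be sketched in a few lines of Python. This is only an illustration of the listed criteria, not the study's own software: the `PCaCase` record, its field names, and the reduction of the T stage to its numeric part are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class PCaCase:
    """Hypothetical record of one biopsy-confirmed PCa patient."""
    gleason_primary: int            # primary Gleason pattern, e.g. 4 in "4 + 3"
    gleason_secondary: int          # secondary Gleason pattern, e.g. 3 in "4 + 3"
    t_stage: int                    # numeric part of the T stage (T2c -> 2); an assumption
    psa_ng_ml: float                # PSA at diagnosis, in ng/mL
    biochemical_recurrence: bool = False  # after prostatectomy, during follow-up
    metastasis: bool = False        # at diagnosis or during follow-up


def is_clinically_significant(case: PCaCase) -> bool:
    """Return True if the case meets ANY criterion listed in the text."""
    gleason_sum = case.gleason_primary + case.gleason_secondary
    return (
        gleason_sum > 7
        or (case.gleason_primary == 4 and case.gleason_secondary == 3)
        or case.t_stage >= 3
        or case.psa_ng_ml > 20
        or case.biochemical_recurrence
        or case.metastasis
    )


if __name__ == "__main__":
    # Gleason 3 + 4 = 7, T2, PSA 8 ng/mL: none of the criteria is met.
    print(is_clinically_significant(PCaCase(3, 4, 2, 8.0)))
    # Gleason 4 + 3 = 7 alone is sufficient.
    print(is_clinically_significant(PCaCase(4, 3, 2, 8.0)))
```

Because the criteria are combined with a logical OR, a case that fails every test falls into the clinically insignificant group, matching the "rest of the patients" wording above.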

Urine processing and quantification of gene expression
For the retrospective study, 10-15 mL urine samples were collected without digital rectal examination (DRE) and the urine pellet was flash-frozen and stored at −80°C. For the prospective study, 15-45 mL urine without DRE was collected in the presence of 5 mL DNA/RNA preservative AssayAssure (Thermo Fisher Scientific, Waltham, MA, USA) or U-Preserve (Hao Rui Jia Biotech Ltd., Beijing, China), stored at 4°C, and processed within 7 days. The urine pellet obtained after centrifugation at 1000×g for 10 min was washed with phosphate-buffered saline followed by a second centrifugation at 1000×g for 10 min. The cell pellet was processed for RNA purification or immediately frozen on dry ice and stored at −80°C. A detailed procedure of gene expression quantification is listed in Additional file 1: Methods.

Prostate tissue specimen cohort
The GSE17951 prostate tissue specimen cohort includes quantitative mRNA expression data of PCa and benign prostate specimens obtained from Affymetrix U133Plus2 array [16,17]. The PCa tissues (n = 56) in the cohort were collected from patient biopsy specimens, and the benign prostate tissues (n = 98) were obtained from prostate autopsy specimens of patients with benign disease. The gene expression levels of the 25 genes in the panel were obtained from the database and normalized with beta-actin expression level.

Data analysis and algorithm for cancer diagnosis
All data analysis and diagnosis by the 25-Gene Panel were performed blindly without prior knowledge of patient information. The gene expression data was downloaded and first analyzed with ABI Quantstudio 6 software (Thermo Fisher Scientific, Waltham, MA, USA). The mRNA expression level of the housekeeping gene beta-actin was measured in each urine sample and used for gene expression normalization to control variation of cDNA quantity in the urine samples. The cycle threshold (Ct) value of each gene in the panel was divided by the Ct value of the beta-actin and then multiplied by 1000 as the normalized gene expression value (CtS = Ct (sample)/Ct (actin) × 1000). For each gene, the average Ct value from triplicate PCR was used. For the diagnosis of cancer by the 25-Gene Panel, the relative Ct (CtS) values of the 25 genes in the panel were used to generate a classification score (diagnostic D score).
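The normalization step above (CtS = Ct (sample)/Ct (actin) × 1000, with triplicate Ct values averaged first) can be illustrated with a minimal Python sketch. The function name and the example Ct values are hypothetical, and averaging the beta-actin triplicate in the same way as the target gene is an assumption.

```python
from statistics import mean
from typing import Sequence


def normalized_cts(gene_ct_triplicate: Sequence[float],
                   actin_ct_triplicate: Sequence[float]) -> float:
    """Return CtS = mean Ct(gene) / mean Ct(beta-actin) * 1000.

    Both inputs are the Ct values from triplicate PCR reactions.
    """
    gene_ct = mean(gene_ct_triplicate)
    actin_ct = mean(actin_ct_triplicate)
    if actin_ct <= 0:
        raise ValueError("beta-actin Ct must be positive")
    return gene_ct / actin_ct * 1000


if __name__ == "__main__":
    # Hypothetical triplicates: mean gene Ct is about 28, mean actin Ct about 20,
    # so the normalized value comes out close to 1400.
    print(normalized_cts([28.1, 27.9, 28.0], [20.0, 20.1, 19.9]))
```

Note that the ratio is taken on the Ct values themselves rather than on the difference (ΔCt), as stated in the text.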
For cancer diagnosis in both retrospective and prospective cohorts, each sample was diagnosed using the Diagnosis Algorithm as shown below:

$$\text{Diagnostic D score} = C_{PCa} - C_{Non}$$

$$C_{PCa} = A_{PCa} + \sum_{i=1}^{25} X_{i}\,CtS_{i} + \sum_{i=1}^{25}\sum_{j=i}^{25} X_{i*j}\,CtS_{i}\,CtS_{j}$$

$$C_{Non} = B_{Non} + \sum_{i=1}^{25} Y_{i}\,CtS_{i} + \sum_{i=1}^{25}\sum_{j=i}^{25} Y_{i*j}\,CtS_{i}\,CtS_{j}$$

where $A_{PCa}$ is the PCa constant, $B_{Non}$ is the non-PCa constant, $CtS_{1}$ through $CtS_{25}$ are the CtS values of gene 1 through gene 25, $X_{1}$ through $X_{25}$ are the PCa regression coefficients of gene 1 through gene 25, $X_{1*1}$ through $X_{25*25}$ are the cross PCa regression coefficients (gene 1 with gene 1 through gene 25 with gene 25), $Y_{1}$ through $Y_{25}$ are the non-PCa regression coefficients of gene 1 through gene 25, and $Y_{1*1}$ through $Y_{25*25}$ are the corresponding cross non-PCa regression coefficients. The sample was diagnosed to be PCa when the diagnostic D score was > 0, whereas the sample was diagnosed to be benign prostate (non-PCa) when the diagnostic D score was ≤ 0.
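To make the decision rule concrete, the following Python sketch evaluates two classification functions and takes their difference as the D score. It assumes a quadratic form (a constant, linear terms, and cross terms over gene pairs), which is how the constants and coefficients listed above are read here; the function names and all coefficient values are hypothetical, and a two-gene toy model stands in for the 25-gene panel.

```python
from typing import Dict, Sequence, Tuple

Pair = Tuple[int, int]  # zero-based gene indices (i, j) of a cross term


def classification_function(cts: Sequence[float],
                            constant: float,
                            linear: Sequence[float],
                            cross: Dict[Pair, float]) -> float:
    """constant + sum(linear[i] * CtS_i) + sum(cross[i, j] * CtS_i * CtS_j)."""
    score = constant
    for coeff, value in zip(linear, cts):
        score += coeff * value
    for (i, j), coeff in cross.items():
        score += coeff * cts[i] * cts[j]
    return score


def diagnose(cts, pca_model, non_model):
    """Return (D score, label); PCa only when D > 0, non-PCa when D <= 0."""
    d_score = (classification_function(cts, *pca_model)
               - classification_function(cts, *non_model))
    return d_score, ("PCa" if d_score > 0 else "non-PCa")


if __name__ == "__main__":
    # Toy two-gene example with made-up coefficients (not from the study).
    cts = [2.0, 3.0]
    pca_model = (1.0, [0.5, -0.25], {(0, 0): 0.1, (0, 1): 0.2, (1, 1): 0.0})
    non_model = (2.0, [0.25, 0.25], {(0, 0): 0.0, (0, 1): 0.1, (1, 1): 0.0})
    print(diagnose(cts, pca_model, non_model))
```

In the toy example the PCa function evaluates to about 2.85 and the non-PCa function to about 3.85, so the D score is negative and the sample is labeled non-PCa. A D score of exactly 0 is also labeled non-PCa, following the ≤ 0 rule in the text.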

Statistical analysis
To generate an algorithm for diagnosing urine samples as PCa or benign prostate (Diagnosis Algorithm), discriminant analysis was performed to test the association between pathological diagnosis and CtS values of the 25 genes in the panel using a statistical software program XLSTAT (Addinsoft, Paris, France). The diagnosis of all the samples by the algorithm was compared to their pathological diagnosis to assess diagnostic performance by calculating sensitivity, specificity, positive predictive value, negative predictive value, odds ratio, and their respective 95% confidence intervals. The receiver operating characteristic curve was plotted and the area under the curve (AUC) with its 95% confidence interval was calculated. To further validate the 25-Gene Panel in the combination cohort, the leave-one-out cross-validation analysis was performed to generate regression coefficients to determine the classification of each sample by the 25-Gene Panel, which was then compared with the pathological diagnosis of each sample to calculate the diagnostic performance of cross-validation using XLSTAT. In addition, univariate and multivariate logistic regression analyses were conducted to compare the diagnostic performance of pre-biopsy PSA, pre-biopsy PSA at the cutoff value of 4 ng/mL, patient age, PCa family history, the 25-Gene Panel urine test, and their combinations.
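The diagnostic measures named above can be computed from a 2 × 2 table of test result against pathological diagnosis. The sketch below is an illustration rather than the XLSTAT procedure: the Wald interval on the log odds ratio is an assumed method, and the example counts (657 true positives, 70 false negatives, 265 true negatives, 18 false positives) are back-calculated from the reported sensitivity and specificity of the combination cohort, not taken from the paper's tables.

```python
import math


def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int, z: float = 1.96) -> dict:
    """Sensitivity, specificity, PPV, NPV, and odds ratio with a Wald 95% CI."""
    odds_ratio = (tp * tn) / (fp * fn)
    # Standard error of ln(OR) for a 2 x 2 table (Wald method, an assumption).
    se_log_or = math.sqrt(1 / tp + 1 / fn + 1 / tn + 1 / fp)
    log_or = math.log(odds_ratio)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "odds_ratio": odds_ratio,
        "odds_ratio_ci": (math.exp(log_or - z * se_log_or),
                          math.exp(log_or + z * se_log_or)),
    }


if __name__ == "__main__":
    # Back-calculated counts for 727 PCa and 283 benign samples (assumption).
    for name, value in diagnostic_metrics(tp=657, fn=70, tn=265, fp=18).items():
        print(name, value)
```

With these assumed counts the sketch gives a sensitivity near 90.4%, a specificity near 93.6%, and an odds ratio near 138, which is in line with the values reported for the combination cohort.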

Non-invasive urine test
Current urine tests for PCa diagnosis and prognosis rely on DRE before urine collection to enrich prostate cells in the urine, yet the procedure is uncomfortable and invasive for patients and requires a physician to perform. To develop a non-invasive urine test to measure gene expression of biomarkers, we employed a modified method of cDNA preamplification before real-time qRT-PCR [18] and showed that it improved quantification of gene expression in urine collected without DRE that contained fewer prostate cells. We detected mRNA expression of the genes with significantly increased sensitivity by ~10 Ct units without changing the relative gene expression values (ΔCt) (Additional file 2: Table S1). The ΔCt values were similar in the urine samples collected from the same patients with and without DRE (Additional file 2: Table S2), the urine with and without DRE had similar diagnostic D score, and the diagnosis of the urine with or without DRE was the same (Table 1). With the help of DNA/RNA preservative, urine can be collected without DRE or physician's involvement and stored or shipped at room temperature within a week. Our data demonstrated that the new method developed in the study is robust and can be used to quantify biomarker gene expression in urine samples without DRE, making it a valid and much improved liquid biopsy method in clinical practice.

Development of the 25-Gene Panel classifier
In a previous study, we identified a series of biomarker candidates involved in cell proliferation, survival, migration, tumorigenesis, cancer invasion, and metastasis with differential gene expression in PCa and benign prostate tissue specimens [19,20]. To develop a gene panel for cancer diagnosis with high diagnostic accuracy, we used a random forest machine learning program [21,22] combined with a discriminant analysis classification test to screen mRNA expression profiles of the biomarker candidates in PCa and benign prostate specimens in large cohorts obtained from Gene Expression Omnibus (GEO) database. The diagnosis of the specimens by various panels combining the candidate biomarkers was compared to the pathological diagnosis of the specimens to assess the diagnostic performance of the panels to distinguish PCa and benign prostate, which included diagnostic parameters of sensitivity, specificity, odds ratio, and AUC. A 25-Gene Panel consisting of HIF1A, FGFR1, BIRC5, AMACR, CRISP3, FN1, HPN, MYO6, PSCA, PMP22, GOLM1, LMTK2, EZH2, GSTP1, PCA3, VEGFA, CST3, PTEN, PIP5K1A, CDK1, TMPRSS2, ANXA3, CCNA1, CCND1, and KLK3 was discovered to have the highest diagnostic accuracy to distinguish cancer lesions from benign prostate (Additional file 2: Table S3). We found that subtracting any one or more genes from the panel would lower the diagnostic accuracy, such as lowered sensitivity, specificity, and AUC. This showed that all genes in the panel contribute significantly to the diagnostic algorithm.

The 25-Gene Panel urine test for cancer diagnosis
We examined whether the 25-Gene Panel identified above can be used for cancer diagnosis using urine samples collected without DRE (Fig. 1). We conducted independent, multi-center retrospective and multi-center prospective studies to collect pre-biopsy urine samples and used the 25-Gene Panel as a classifier to distinguish PCa and benign prostate for diagnosis. The study population in both cohorts represents patients in real clinical practice as they are patients who underwent routine cancer diagnosis using standard PSA and biopsy in the participating hospitals. The end point of the study was to assess the diagnostic performance of the 25-Gene Panel urine test and its improvement over the known clinical parameters for PCa diagnosis. The patient characteristics and clinical parameters are illustrated based on the standard clinical practice [23] as shown in Table 2.

Table 1 abbreviations: DRE, digital rectal examination; D score-DRE-urine, diagnostic D score of the urine sample collected without DRE; D score-DRE+urine, diagnostic D score of the urine sample collected after DRE; diagnosis-DRE-urine, diagnosis of the urine sample collected without DRE; diagnosis-DRE+urine, diagnosis of the urine sample collected after DRE.
We successfully quantified mRNA expression of each biomarker in the 25-Gene Panel using preamplification of cDNA purified from urine pellets followed by real-time qRT-PCR. The retrospective cohort (n = 614) was used as a training set to create the Diagnosis Algorithm, which combined the mRNA expression quantity of the biomarkers in the panel for classification of the urine sample as PCa or benign prostate. Such diagnosis was then compared to the pathological diagnosis from biopsy to calculate the diagnostic performance of the 25-Gene Panel urine test. As shown in Table 3 and Fig. 2a, the 25-Gene Panel was capable of distinguishing PCa from benign prostate (non-PCa) with high sensitivity of 92.5% (95% CI 94.8-90.2%), specificity of 91.5% (95% CI 97.1-85.9%), odds ratio of 132.6 (95% CI 293.5-59.9), and AUC of 0.946 (95% CI 0.963-0.929).
We then used an independent multi-center prospective cohort (n = 396) as a validation set to assess the diagnostic accuracy of the 25-Gene Panel urine test. The result showed sensitivity of 85.0% (95% CI 89.9-80.2%), specificity of 94.7% (95% CI 97.9-91.5%), odds ratio of 101.6 (95% CI 213.5-48.4), and AUC of 0.901 (95% CI 0.929-0.873) (Table 3 and Fig. 2b). The diagnostic performance was further validated by combining the retrospective (n = 614) and prospective (n = 396) cohorts, which used the same inclusion and exclusion criteria to enroll patients and collected urine samples without DRE, to form a combination cohort of 1010 patients with 283 benign prostate (28.0%) and 727 PCa (72.0%). The 25-Gene Panel showed high sensitivity of 90.4% (95% CI 92.5-88.2%), specificity of 93.6% (95% CI 96.5-90.8%), odds ratio of 138.2 (95% CI 236.5-80.8), and AUC of 0.936 (95% CI 0.956-0.916) (Table 3 and Fig. 2c). Cross-validation of the 25-Gene Panel urine test in the combination cohort generated similarly accurate diagnostic [...] (Table 5).

This result demonstrated that the 25-Gene Panel urine test had diagnostic performance superior to that of PSA at 4 ng/mL, with greatly improved diagnostic specificity. Each year, more than 700,000 unnecessary negative biopsies are performed in the USA due to the ~70% false positive rate of PSA at 4 ng/mL in the cancer screening test [24]. If the 25-Gene Panel urine test was used after the PSA [...] (Table 4).

Furthermore, important diagnostic measures including sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of the 25-Gene Panel urine test combined with PSA, PSA plus age, PSA-4, and PSA-4 plus age in the PSA Cohort were compared. As shown in Table 5 [...]

Identification of clinically significant cancer
It is important to develop accurate tests to identify and subtype clinically significant and insignificant PCa. We examined whether the 25-Gene Panel urine test could be used to identify clinically significant PCa. In the retrospective and prospective cohorts, 727 patients were diagnosed to have PCa by routine biopsy. Using the 25-Gene Panel urine test with a Stratification Algorithm (Additional file 1: Methods), clinically significant and insignificant PCa were identified with high accuracy as shown by AUC of 0.928 (95% CI 0.947-0.909) (Fig. 3). Such an accurate and convenient urine test may be used [...]

To examine whether the 25-Gene Panel urine test could be used to show the absence of PCa after the tumors had been surgically removed by radical prostatectomy (RP), we collected urine from ten patients before and after RP and performed diagnosis using the 25-Gene Panel. As shown in Table 6, nine out of ten urine samples (90%) were diagnosed to be non-PCa after RP, which was consistent with successful RP in most patients. The one patient diagnosed to be PCa may still have a residual cancer lesion after the surgery and need additional treatment. The result of this preliminary study in a small patient cohort suggests that the 25-Gene Panel urine test has the potential to be used as an accurate and simple method to measure the efficacy of RP for treatment follow-up.

The 25-Gene Panel urine test is PCa-specific
In the urine cohorts, some patients had other types of cancers in addition to PCa or benign prostate (Table 2), especially urinary tract cancers such as bladder cancer, which might affect PCa diagnosis since cells of other cancers could be released into the urine. We have not [...]

The study population in the retrospective and prospective cohorts represented patients in real clinical practice as they were from the clinical cases obtained from the participating hospitals. These patients with elevated PSA underwent scheduled biopsy for cancer diagnosis/treatment. AUC analysis is an important tool to assess the diagnostic performance of the 25-Gene Panel. In addition, other important parameters including sensitivity, specificity, positive predictive value, negative predictive value, and odds ratio were used to assess the 25-Gene Panel. Thus, combining these measurements provided valid assessment of the 25-Gene Panel urine test.
Currently, none of the clinical parameters (e.g., PSA and its derivatives such as PHI), biomarkers (e.g., PCA3), or combinations of biomarkers or clinical parameters (e.g., PCA3 combined with TMPRSS2:ERG, microRNA signatures, metabolomic biomarkers) used in clinical practice or reported in publications was able to diagnose PCa or stratify cancer risk with > 90% sensitivity and specificity, and AUC over 0.9, as shown in several recent reviews [2, 4-6, 8-10, 25-27]. Our 25-Gene Panel urine test was validated for accurate cancer diagnosis by two independent multi-center study cohorts as well as the large combination cohort with uniformly high diagnostic sensitivity and specificity above 90% and AUC exceeding 0.9. In statistics, AUC of the ROC curve is an important measure of how accurately a classifier can predict future classification, and AUC over 0.9 indicates an accurate classifier [28]. The fact that the AUC values of the 25-Gene Panel urine test in all cohorts were above 0.9 suggests it may be a more accurate PCa diagnostic tool than PSA, clinical parameters, existing biomarkers, and their combinations. Our study found that the 25-Gene Panel urine test could be combined with PSA to provide exceptionally accurate diagnosis. In clinical practice, it may be combined with PSA, multiparametric MRI, and biopsy to greatly improve diagnostic accuracy and avoid unnecessary biopsy and overdiagnosis.
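As a reminder of what the AUC measures, the short Python sketch below computes it directly as the probability that a randomly chosen PCa sample receives a higher score than a randomly chosen benign sample (the Mann-Whitney formulation, with ties counted as one half). The function name and the D scores in the example are hypothetical and are not study data.

```python
from typing import Sequence


def auc_from_scores(pos_scores: Sequence[float],
                    neg_scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    if not pos_scores or not neg_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0      # positive sample ranked above negative sample
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical D scores for four PCa and three benign samples.
    pca_scores = [2.1, 0.4, 1.3, -0.2]
    benign_scores = [-1.5, -0.3, 0.5]
    print(auc_from_scores(pca_scores, benign_scores))
```

In this toy example 10 of the 12 possible pairs are ordered correctly, giving an AUC of about 0.83. An AUC of 1.0 would mean every PCa sample scores above every benign sample, and 0.5 corresponds to chance-level ranking.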
For cancer diagnosis and treatment, it is important to identify clinically significant and insignificant cancer so patients with clinically significant cancer are given immediate treatment while clinically insignificant cancer patients are placed under active surveillance. In our study, we found that the 25-Gene Panel was able to accurately identify clinically significant and insignificant cancers. Thus, the 25-Gene Panel has great potential to improve cancer diagnosis and treatment.
In this study, the diagnostic performance of the 25-Gene Panel in the retrospective and prospective cohorts was similar, regardless of whether freshly collected urine or frozen urine pellets stored long term were used. In addition, the PCa patients in the retrospective cohort had a mean PSA level of 6.1 ng/mL, while the patients in the prospective cohort had a high average PSA level of 67.9 ng/mL (Table 1). This showed that the diagnostic performance of the 25-Gene Panel was not affected by high PSA levels.
The similar diagnostic performance obtained in the cohorts consisting of patients with different ethnic background (Caucasians in the retrospective cohort and