  • Research article
  • Open access

Construction of a risk stratification model integrating ctDNA to predict response and survival in neoadjuvant-treated breast cancer

Abstract

Background

The pathological complete response (pCR) to neoadjuvant chemotherapy (NAC) of breast cancer is closely related to a better prognosis. However, there are no reliable indicators to accurately identify which patients will achieve pCR before surgery, and a model for predicting pCR to NAC is required.

Methods

A total of 269 breast cancer patients at Shandong Cancer Hospital and Liaocheng People’s Hospital receiving anthracycline and taxane-based NAC were prospectively enrolled. Mutation profiling using a 457 cancer-related gene sequencing panel (DNA sequencing) covering genes recurrently mutated in breast cancer was carried out on 243 formalin-fixed paraffin-embedded tumor biopsy samples collected before NAC from 243 patients. The unique personalized panel of nine individual somatic mutation genes from the constructed model was used to detect and analyze ctDNA in 216 blood samples. Blood samples were collected at four time points: before chemotherapy initiation, after the 1st NAC cycle and before the 2nd NAC cycle, during intermediate evaluation, and prior to surgery. We characterized the value of the gene mutation profile and circulating tumor DNA (ctDNA), in combination with clinical characteristics, in predicting pCR before surgery and in predicting prognosis. The median follow-up time for survival analysis was 898 days.

Results

First, we constructed a predictive NAC response model including five single nucleotide variant (SNV) mutations (TP53, SETBP1, PIK3CA, NOTCH4 and MSH2) and four copy number variation (CNV) mutations (FOXP1-gain, EGFR-gain, IL7R-gain, and NFKB1A-gain) in the breast tumor, combined with three clinical factors (luminal A, Her2 and Ki67 status). The tumor prediction model showed good discrimination between pCR and non-pCR, with an AUC of 0.871 (95% CI, 0.797–0.927) in the training set, 0.771 (95% CI, 0.649–0.883) in the test set, and 0.726 (95% CI, 0.556–0.865) in an extra test set. This tumor prediction model also predicted disease-free survival (DFS), with an AUC of 0.749 at 1 year and 0.830 at 3 years. We further screened the genes from the tumor prediction model to establish a unique personalized panel consisting of 9 individual somatic mutation genes to detect and analyze ctDNA. ctDNA positivity decreased over the course of NAC, and ctDNA status predicted NAC response as well as metastasis and recurrence. Finally, we constructed a chemotherapy prediction model combining the tumor prediction model with pretreatment ctDNA levels, which predicted pCR better, with an AUC of 0.961.

Conclusions

In this study, we established a chemotherapy predictive model with a non-invasive tool that is built based on genomic features, ctDNA status, as well as clinical characteristics for predicting pCR to recognize the responders and non-responders to NAC, and also predicting prognosis for DFS in breast cancer. Adding pretreatment ctDNA levels to a model containing gene profile mutation and clinical characteristics significantly improves stratification over the clinical variables alone.


Background

Breast cancer is the most commonly diagnosed cancer in females [1,2,3]. Neoadjuvant chemotherapy (NAC) has long been considered the preferred treatment approach for locally advanced breast cancer to downstage the tumor while concurrently allowing for in vivo assessment of tumor response to NAC [4]. Despite impressive successes, approximately 10% of breast cancer patients show no response and fail to benefit from NAC [5]. These patients could benefit from stopping NAC and proceeding directly to surgery or switching to a different treatment. Therefore, assessing the sensitivity of patients to NAC is an important task in clinical practice.

Pathological complete response (pCR), determined by examination of surgical specimens after NAC, is associated with long-term survival and has been used as the primary endpoint of neoadjuvant trials [6, 7]. A treatment monitoring biomarker that can accurately predict pCR before surgery is required. Clinically, imaging evaluations, particularly ultrasound, mammography, and breast magnetic resonance imaging (MRI), are mainly used to evaluate the extent of the mass, by which point disease progression has already occurred [8, 9]. Therefore, more sensitive biomarkers are needed that allow earlier detection of progression and identification of the response to NAC.

Some polygenic predictor panels have been developed to predict pCR to NAC and guide physicians in making adjuvant treatment decisions [10, 11]. Huang Liang et al. developed a predictor of pCR to NAC in triple-negative breast cancer (TNBC) patients based on DNA repair genes [12]. Masanori Osh et al. developed a five-gene score model to predict pCR to NAC for estrogen receptor-positive/Her2-negative breast cancer and a novel three-gene score as a predictive biomarker for pCR after NAC in TNBC [10, 13]. However, these single-platform profiling markers fail to accurately predict the response to NAC, given the complexity of the tumor ecosystem and the dynamic variability of treatment. As a result, clinicians continue to select NAC patients by clinical experience.

Circulating tumor DNA (ctDNA) consists of DNA fragments released into the blood by the tumor, and using ctDNA to track tumor progression has great potential for clinical treatment [14]. ctDNA levels have been shown to correlate with tumor burden and to predict therapeutic prognosis and survival across cancer types [15, 16]. In lung cancer, Chaudhuri et al. reported that ctDNA was detected in patients with disease recurrence 5.2 months earlier than by imaging evaluations, offering opportunities to treat patients while tumor burden and heterogeneity are at their lowest [17]. Continuous measurement of ctDNA is a potential method for early tumor surveillance, and it serves as an objective parameter for treatment response and earlier relapse detection.

Therefore, we conducted a study to establish a risk model that integrates the gene mutation profile, clinical characteristics, and dynamic ctDNA monitoring of breast cancer patients treated with NAC, in order to predict their response to NAC.

Methods

Patient eligibility

Approval for this prospective study was obtained from the Human Research Ethics Committee of Shandong Cancer Hospital (SDTHEC201802002), and the study was registered at ClinicalTrials.gov (clinical trial No. NCT03688035). All patients provided written informed consent. From January 2016 to April 2020, a total of 269 female breast cancer patients who received NAC were enrolled, of whom 246 were eligible. Patients were excluded (n = 23) if they had distant metastasis before NAC, failed to complete the chemotherapy regimen, or had missing clinical data. All patients were staged according to the American Joint Committee on Cancer (AJCC) 8th edition TNM staging system. All patients received 4–8 cycles of NAC with anthracycline and taxane-based regimens, and all Her2-positive patients received anti-Her2 regimens including trastuzumab or dual Her2 blockade with trastuzumab plus pertuzumab before surgical resection of the tumor.

Miller-Payne (MP) grading and residual cancer burden (RCB) index were used to validate the response to NAC. The RCB evaluation system (www.mdanderson.org/breastCancer_RCB) was used to calculate the RCB index [18]. pCR (stage yp-T0/is, ypN0) was defined as RCB = 0, and residual disease (no-pCR) was placed into three predefined subgroups (RCB-I, RCB-II, and RCB-III) on the basis of predefined cut-off points of 1.36 and 3.28 index scores. RCB and pCR status were evaluated by two independent pathologists. If their conclusions were inconsistent, a third pathologist reassessed the situation.
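The RCB grouping described above can be sketched as a small lookup. This is only an illustration: the function name is hypothetical, and the handling of values that fall exactly on a cut-off (treated here as belonging to the lower class) is an assumption, since the text gives only the cut-off points 1.36 and 3.28.

```python
def rcb_class(rcb_index: float) -> str:
    """Map an RCB index to pCR, RCB-I, RCB-II or RCB-III.

    Cut-offs 1.36 and 3.28 are taken from the text; assigning a value
    that equals a cut-off to the lower class is an assumption.
    """
    if rcb_index < 0:
        raise ValueError("RCB index cannot be negative")
    if rcb_index == 0:
        return "pCR"
    if rcb_index <= 1.36:
        return "RCB-I"
    if rcb_index <= 3.28:
        return "RCB-II"
    return "RCB-III"


# Hypothetical index values, one per class
for value in (0.0, 1.0, 2.5, 4.0):
    print(value, rcb_class(value))
```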

Sample collection

A total of 243 formalin-fixed paraffin-embedded (FFPE) tissue samples were obtained by core needle biopsy from the 246 patients before the initiation of chemotherapy. Among the 246 patients, 56 underwent ctDNA testing of blood samples. Blood samples were collected from each patient over the course of NAC at four time points: before NAC (T0), after the 1st NAC cycle and before the 2nd NAC cycle (T1), during intermediate evaluation (T2), and after the end of NAC but before surgery (T3) (Fig. 1A). Baseline plasma samples were collected from all 56 patients; six patients did not complete sample collection at all four time points, and the remaining 50 patients completed the entire plasma sample collection process (Fig. 1B).

Fig. 1

The study design, sample collections, and patients’ response. A Sample collection at different points. B Condition of enrolled patients. C Number of patients in different groups responding to pCR

Tissue and plasma DNA preparation and genomic DNA extraction

A GeneRead DNA FFPE kit (Qiagen, USA) was used to extract genomic DNA (gDNA) from FFPE and fresh tissue samples, and a DNA blood Midi/Mini kit (Qiagen, USA) was used to extract genomic DNA from white blood cell samples according to the manufacturer's instructions. The MagMAX cell-free DNA Isolation Kit (Thermo Company) was used to isolate plasma cell-free DNA (cfDNA). The quality and quantity of purified DNA were assessed by gel electrophoresis and with a Qubit® 4.0 fluorometer (Life Technologies, USA).

Construction of the NGS gene panel sequencing library based on gDNA and cfDNA

First, the purified gDNA was fragmented to approximately 200 bp by enzymatic digestion (5X WGS Fragmentation Mix, Qiagen, USA). After end repair and A-tailing, T-adaptors were ligated to both ends, and the fragments were PCR amplified to form a prelibrary. The purified prelibrary was hybridized with a customized biotinylated probe pool (457 gene panel, Berry Oncology, Beijing, China) to capture the target fragments. According to the manufacturer’s instructions, the 96 rxn xGen Exome Research Group v1.0 (Integrated DNA Technologies, USA) was used to prepare the final sequencing library.

For targeted sequencing of cfDNA, a prelibrary was prepared according to the method described above. Internally designed panels were used to capture cfDNA fragments and generate sequencing libraries. The sequencing library was applied to the NovaSeq 6000 platform (Illumina, San Diego, USA) in 150PE mode.

The generated sequences were trimmed, filtered for low quality, and subjected to variant calling. Variants were filtered to retain nonsynonymous SNPs, indels, and splicing variants. For gDNA, somatic mutations were retained at variant allele frequencies (VAF) ≥ 3%, and cancer hotspots were retained at VAF ≥ 1%. For cfDNA, a VAF cut-off of ≥ 0.5% was used to identify somatic mutations and a VAF cut-off of ≥ 0.1% was used for cancer hotspots, and at least 20 high-quality reads were required.
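A minimal sketch of the VAF filtering described above, under stated assumptions. The `Variant` record, its field names, and the function name are hypothetical. The cfDNA hotspot cut-off is read here as a minimum VAF of 0.1%, in line with the bioinformatics section, and the 20-read requirement is assumed to be a minimum read count applied to cfDNA calls.

```python
from dataclasses import dataclass

# Minimum VAF (as fractions) per sample type and variant class, from the text.
MIN_VAF = {
    "gDNA": {"somatic": 0.03, "hotspot": 0.01},
    "cfDNA": {"somatic": 0.005, "hotspot": 0.001},
}
CFDNA_MIN_READS = 20  # assumed to apply to cfDNA calls only


@dataclass
class Variant:
    gene: str
    vaf: float        # variant allele frequency as a fraction (0.03 == 3%)
    is_hotspot: bool  # True if the site is a known cancer hotspot
    reads: int        # number of high-quality reads


def passes_filter(variant: Variant, sample_type: str) -> bool:
    """Return True if the variant meets the VAF and read-count cut-offs."""
    variant_class = "hotspot" if variant.is_hotspot else "somatic"
    if sample_type == "cfDNA" and variant.reads < CFDNA_MIN_READS:
        return False
    return variant.vaf >= MIN_VAF[sample_type][variant_class]


# Hypothetical call: 2% VAF fails the 3% gDNA cut-off unless it is a hotspot.
print(passes_filter(Variant("TP53", 0.02, False, 100), "gDNA"))
print(passes_filter(Variant("TP53", 0.02, True, 100), "gDNA"))
```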

Bioinformatics analysis of gene mutations

FASTP [19] was used to trim adapters and remove low-quality sequences to obtain clean reads. The clean reads were aligned to the Ensembl GRCh37/hg19 reference genome using BWA. PCR duplicates were collapsed into consensus sequences by GenCore [20], and SAMtools [21] was then used to detect somatic single nucleotide variants (SNVs) and insertions and deletions (InDels). False-positive variants were filtered by methods such as VAF cut-offs, paired control samples, and a negative background database. HGVS variant descriptions were annotated with ANNOVAR [22]. After annotation, SNVs and InDels with PopFreqMax > 0.05 were removed, and nonsynonymous SNVs and InDels with VAF > 0.5%, or VAF > 0.1% for cancer hotspots in the patient database, were retained for further analysis. Somatic copy number variants (CNVs) were called by CNVkit [23] through several steps, such as depth normalization and GC correction. A target with a copy number > 3 was marked as a gain, and a target with a copy number < 1 was marked as a loss. The tumor mutation burden (TMB) was defined as the number of all nonsynonymous mutations and indel variants per megabase of coding regions.
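The gain/loss rule and the TMB definition above reduce to simple arithmetic, sketched below. The function names, the "neutral" label for copy numbers between 1 and 3, and the example coding-region size are assumptions made for illustration and are not taken from the study.

```python
def cnv_label(copy_number: float) -> str:
    """Label a target by copy number: > 3 is a gain, < 1 is a loss.

    The "neutral" label for everything in between is an assumption.
    """
    if copy_number > 3:
        return "gain"
    if copy_number < 1:
        return "loss"
    return "neutral"


def tumor_mutation_burden(n_mutations: int, coding_region_bp: int) -> float:
    """Nonsynonymous mutations plus indels per megabase of coding region."""
    if coding_region_bp <= 0:
        raise ValueError("coding region size must be positive")
    return n_mutations / (coding_region_bp / 1_000_000)


# Hypothetical example: 12 qualifying variants over a 1.5 Mb coding region
print(cnv_label(4.2), cnv_label(0.5), cnv_label(2.0))
print(tumor_mutation_burden(12, 1_500_000))
```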

Statistical analysis

Baseline characteristics analysis was performed on all 246 patients, including the distribution of patients with baseline genomic or clinical characteristics in the pCR/non-pCR group and the correlation between baseline characteristics and pCR status or patient prognosis. The association analysis between mutation detection and prognosis was performed on 56 patients with continuous ctDNA test data (that is, the completion of the entire study).

Fisher’s exact test (two-sided) was used to analyze the association of baseline pathological characteristics (such as breast cancer type, Ki67, age, sex, and disease stage) and mutation characteristics with non-pCR. Features related to pCR/non-pCR status were selected to construct a predictive model. In general, features with P < 0.05 in Fisher's exact test were carried forward to the next step of the analysis. A total of 192 patients with tissue samples from Shandong Cancer Hospital were randomized into a training cohort (n = 128) and a testing cohort (n = 64) using the “caret” R package. The training cohort was used to identify a meaningful signature, and the testing cohort and the patients from Liaocheng People’s Hospital (n = 51) were used to validate its performance.
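The feature screening step above relies on a two-sided Fisher's exact test on a 2×2 table (feature present or absent versus pCR or non-pCR). The sketch below shows that calculation in plain Python for illustration only; it is not the study's R code, the function name is hypothetical, and the counts in the usage example are invented.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test P value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def prob(x: int) -> float:
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    lowest = max(0, row1 - (n - col1))
    highest = min(row1, col1)
    p_observed = prob(a)
    total = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)  # small tolerance for float ties
    )
    return min(1.0, total)


# Hypothetical counts: mutated/wild-type among pCR vs non-pCR patients
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(round(p_value, 4), "carried forward" if p_value < 0.05 else "dropped")
```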

In the training cohort, significant SNVs and CNVs with a mutation frequency greater than 10%, comprising five SNV and 10 CNV mutations, were subjected to multivariate stepwise logistic regression analysis. Nine mutated genes, comprising five SNV mutations and four CNV mutations, were identified by the stepwise logistic regression analysis. Then, multiple response models were built using only clinical characteristics, only SNV characteristics, only CNV characteristics, both SNV and CNV characteristics, both SNV and clinical characteristics, both CNV and clinical characteristics, and SNV, CNV, and clinical characteristics together, for a total of seven response models. All models were based on the random forest method and developed in a tenfold cross-validation (CV) scheme. Performance was assessed in terms of accuracy (ACC), sensitivity, and specificity by receiver operating characteristic (ROC) curve analysis. A nomogram was also used to assess the performance of the signature. ROC analysis was performed using the “caret” and “ROCR” R packages, and the nomogram was generated using the “rms” R package.
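For reference, the performance measures named above follow directly from a confusion matrix. The sketch below assumes pCR is the positive class; the function name and the counts are hypothetical and do not come from the study.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, sensitivity and specificity from confusion-matrix counts.

    tp/fn: pCR patients predicted as pCR / non-pCR (pCR assumed positive)
    tn/fp: non-pCR patients predicted as non-pCR / pCR
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical predictions for a 64-patient test cohort
metrics = classification_metrics(tp=12, fp=8, tn=40, fn=4)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```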

We defined disease-free survival (DFS) as the time from breast surgery until disease progression (including local or distant recurrence) or death. We constructed a DFS prediction model based on the signature of the best model among the multiple response models for the 192 patients from Shandong Cancer Hospital in the training set and test set. Then, a risk score was calculated for each patient, and patients were divided into high- and low-risk groups based on the median risk score. The ROC analysis was performed using the “timeROC” R package. Kaplan–Meier curves were generated for survival analysis, and the log-rank test was used for comparisons.
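The median-based risk grouping above can be illustrated as follows. The function name, patient identifiers, and scores are hypothetical, and placing a score exactly equal to the median in the low-risk group is an assumption, since the text does not say how ties are handled.

```python
from statistics import median


def split_by_median(risk_scores: dict) -> dict:
    """Assign each patient to "high" or "low" risk relative to the median.

    Scores equal to the median go to the low-risk group (an assumption).
    """
    cutoff = median(risk_scores.values())
    return {
        patient: "high" if score > cutoff else "low"
        for patient, score in risk_scores.items()
    }


# Hypothetical risk scores for four patients
groups = split_by_median({"P1": 0.1, "P2": 0.4, "P3": 0.7, "P4": 0.9})
print(groups)
```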

The threshold for ctDNA positivity was determined by the number of mutations among the nine genes included in the constructed tumor prediction model. If there were more than two mutations among these nine genes, the sample was considered ctDNA positive [24]. ctDNA fractions were calculated as 2/(1/Max(VAF) + 1) [25]. One-way analysis of variance (ANOVA) was applied to compare ctDNA fractions between groups. The random forest model was used to obtain the probability of treatment response or non-response for each sample, and the ctDNA status at different time points was combined with the random forest model to construct a combined tissue and blood efficacy prediction model.
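The ctDNA fraction formula above, 2/(1/Max(VAF) + 1), is algebraically the same as 2·VAF/(1 + VAF) evaluated at the largest VAF in the sample. A minimal sketch follows; the function name is hypothetical, and returning 0 for a sample with no detected variants is an assumption.

```python
def ctdna_fraction(vafs: list) -> float:
    """ctDNA fraction computed as 2 / (1 / max(VAF) + 1).

    VAFs are fractions (0.05 == 5%). A sample with no detected variants
    is given a fraction of 0.0, which is an assumption.
    """
    if not vafs:
        return 0.0
    max_vaf = max(vafs)
    if max_vaf <= 0:
        return 0.0
    return 2 / (1 / max_vaf + 1)


# Hypothetical sample with two variants at 1% and 5% VAF
print(round(ctdna_fraction([0.01, 0.05]), 4))
```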

Multivariable Cox regression analysis was conducted using the mutation levels of the nine genes at the tissue level, clinical features, and the ctDNA status at each time point to obtain a risk score for each sample and to construct a DFS prediction model. Similarly, the samples were divided into high- and low-risk groups based on the median risk score.

A flowchart for the algorithm is shown in sFig. 4. All statistical analyses were performed using R software, version 3.6.3 (www.r-project.org). For all statistical analyses, p < 0.05 was considered statistically significant.

Results

Patient characteristics

A total of 192 patients who had tissue samples at Shandong Cancer Hospital and 51 patients at Liaocheng People’s Hospital were used to predict the residual cancer burden status of patients receiving NAC. The specific clinical information is shown in Table 1. The median age was 50 years, and 59.3% were younger than 50 years. Most patients were in stage III (51.9%). The Her2-positive, luminal A, luminal B (Her2-), luminal B (Her2 +), and TNBC subtypes accounted for 19.3% (47/243), 21.4% (52/243), 23.5% (57/243), 18.1% (44/243), and 17.7% (43/243), respectively. The pCR rate in the different molecular subtypes is shown in Additional file 1: Table S1. Patients with Ki67 expression > 20% accounted for the majority of the population (67.8%). Postoperative pathological examination showed that 61 patients (25.1%) had achieved pCR by RCB index. Among the non-pCR patients, 11.9% (29/243), 35.8% (87/243) and 27.2% (66/243) were classified as RCB-I, RCB-II and RCB-III, respectively. In total, 87.2% of patients received anthracycline plus taxane-based chemotherapy, 5.8% received an anthracycline-containing regimen only, and 7% received a taxane-containing regimen alone (Additional file 1: Table S2 and Table S3). An overview of the study design is shown in Fig. 1.

Table 1 Patient clinical characteristics

Somatic mutation detection in tissue samples

To discover somatic variations in the tissue samples used for model construction, we extracted DNA from FFPE core needle biopsy samples and performed NGS-based 457 gene panel testing. We analyzed and summarized the somatic mutations of 192 samples with a high mutation frequency (≥ 10%). A total of 425 unique genes were identified, and the top five highly mutated genes were TP53, KMT2C, PIK3CA, EPHA1 and EPPK1, with mutation frequencies of 64% (123/192), 47% (90/192), 44% (84/192), 42% (80/192), and 36% (69/192), respectively (Fig. 2A and Additional file 1: Table S4). The somatic CNV results showed that 114 genes were amplified in tumor tissues to at least three times the level in normal tissues. Forty-three genes were deleted, and 200 genes were both amplified and deleted (Additional file 1: Table S5).

Fig. 2

The landscape of clinical and mutational characteristics. A The landscape of highly mutated genes. B Significantly different SNV included in the model in different chemotherapy responses. C Significantly different CNV included in the model in different chemotherapy responses. D Comparing the differences in clinical characteristics of different chemotherapy responses. P values were calculated using Fisher’s exact test. The size of the white dots in B and C represents the size of the samples

Identification of features significantly associated with pCR

We investigated the relationship between pCR and gene mutation characteristics and clinical phenotypes. All 192 patients were randomly divided in a 2:1 ratio into a training set (n = 128) and a test set (n = 64). In the training set, we first performed differential mutation gene analysis between pCR and no-pCR (RCB-I, RCB-II, and RCB-III) patients. Using Fisher’s exact test, we found five differentially mutated SNV genes (TP53, SETBP1, PIK3CA, NOTCH4, and MSH2) and ten differentially mutated CNV genes (B4GALT3-gain, CDK12-gain, EGFR-gain, PRDM1-gain, GATA2-loss, GNA11-loss, NOTCH3-loss, FOXP1-gain, IL7R-gain, and NFKB1A-gain) with a mutation frequency greater than 10% (Additional file 1: Table S6). To further select the factors used to predict pCR status, stepwise logistic regression was applied to the differentially mutated genes, and nine genes that were significantly related to pCR status (MSH2, NOTCH4, PIK3CA, SETBP1, TP53, EGFR, FOXP1, IL7R, NFKB1A) were retained (Fig. 2B, C). Given the potential importance of clinical factors in NAC, we screened the clinical factors luminal A, Her2+, and Ki67 for their relation to NAC response using Fisher’s exact test. Patients with luminal A disease were less sensitive to NAC than patients with other subtypes (P < 0.001), and no luminal A patients achieved pCR. Among Her2+ patients, the proportion who achieved pCR after NAC was significantly higher than among non-Her2+ patients (43.2% vs 20.8%, P = 0.01). In addition, patients with higher pretherapy Ki67 expression had a higher pCR rate (31.39% vs 9.26%, P = 0.001) (Fig. 2D).

Machine learning integrates the gene mutation status and clinical factors to build a tumor prediction model to predict the NAC response

As described above, nine mutant genes (five SNV mutations and four CNV mutations) and three clinical factors were identified as associated with response to NAC. This motivated the use of a machine learning framework to integrate these factors into a predictive model of pCR. We investigated a number of prediction models, including gene mutation information alone, clinical factors alone, and the combination of mutation information and clinical factors. The model combining the nine mutant genes and three clinical factors had higher sensitivity and specificity than the other combinations (Additional file 2: Fig. S1) in the training set (AUC: 0.871, 95% CI: 0.797–0.927), in the test set (AUC: 0.771, 95% CI: 0.649–0.883) and in the extra test set (AUC: 0.726, 95% CI: 0.556–0.865) (Fig. 3A). We performed a multivariate logistic analysis of NAC response in the training cohort to generate a nomogram predicting pCR according to the RCB index after NAC. Among the nine mutant genes and three clinical factors, MSH2, FOXP1 and luminal A were the three most important factors (Fig. 3B). Because MP scoring is widely used in the clinic, we also tested the applicability of the model to MP scores and found that our model also predicted pCR and non-pCR well under the MP score classification (Fig. 3C), and the nomogram showed that MSH2 still had the greatest importance (Fig. 3D).

Fig. 3

Predicting response and DFS combining mutation characteristics and clinical characteristics. A The ROC curve of predictive model in RCB index system. B Nomogram from stepwise logistic regression for predicting pCR in RCB index system. C The ROC curve of predictive model in MP scoring system. D Nomogram from stepwise logistic regression for predicting pCR in MP scoring system. E Predicting DFS combining important mutation and clinical characteristics. F Kaplan–Meier curves for patients in high- and low-risk groups. Response rate refers to the probability of a patient responding to treatment

Given that patients who achieved a pCR had better OS than patients who did not, we tested the predictive ability of the model on the prognosis. In total, 23 patients were lost to follow-up in the 192 enrolled patients from Shandong Cancer Hospital. The median follow-up time after surgery was 898 days (97 to 1765 days), and 28 of the 169 patients (17.6%) progressed. The results showed that the model also has a good predictive effect on the prognosis of NAC patients, and the predictive effect of the long-term prognosis was better than the short-term predictive effect (AUC at 1 year = 0.749 vs. AUC at 3 years = 0.830) (Fig. 3E). According to the median risk score, patients were divided into high-risk groups and low-risk groups. The DFS of the low-risk group was significantly higher than that of the high-risk group (P < 0.0001) (Fig. 3F).

The role of ctDNA used to predict NAC response in dynamic monitoring

A personalized panel consisting of nine somatic mutation genes, selected from the factors of the tumor model for predicting NAC response, was used to detect and analyze ctDNA. Fifty-six of the 246 patients underwent ctDNA testing. Two hundred sixteen blood samples were collected over the course of NAC (plasma samples were collected from 56 patients at T0 and T1, 54 patients at T2, and 50 patients at T3). A sample with at least two detectable somatic variations was considered positive for ctDNA (Additional file 1: Table S7). Before treatment (T0), 46% of patients were ctDNA positive (Fig. 4A). Patients with TNBC had a higher ctDNA positivity rate (80%) than patients with other subtypes, while luminal A and luminal B patients were mainly ctDNA negative (Fig. 4B). In addition, most patients with low Ki67 status were ctDNA negative (70%) (Fig. 4B).

Fig. 4

Mutation landscape of ctDNA. A Overview of ctDNA status, clinical characters, and response at baseline (T0). B Proportion of ctDNA-positive and ctDNA-negative patients at baseline (T0) according to clinical characteristics. C Proportion of ctDNA-positive and ctDNA-negative patients at different time points. D Comparing the difference of ctDNA fraction at different time points. P values were calculated using one-way analysis of variance

The ctDNA positivity rate decreased with the passage of time during NAC. In the entire population, ctDNA positivity gradually declined during NAC, from 46% before treatment (T0) to 14% before the 2nd NAC cycle (T1), and it was 13% during intermediate evaluation (T2), and after NAC (T3), it dropped to 10% (Fig. 4C). Similarly, the ctDNA fraction also decreased with the passage of NAC time (Fig. 4D).

The clearance dynamics of ctDNA reflected the NAC response

To investigate whether the dynamic change of ctDNA was related to the NAC response, we defined five patterns based on the clearance dynamics of ctDNA in the 50 patients who had complete data at all four time points: ctDNA negative at all time points (T0, n = 26, 52%), cleared at T1 (n = 13, 26%), cleared at T2 (n = 4, 8%), cleared at T3 (n = 2, 4%), and remaining ctDNA positive after NAC (T3) (n = 5, 10%) (Fig. 5A). We identified 45 patients who had both survival data and ctDNA status at all four time points. The ctDNA detection rate decreased during NAC, from 44.4% (T0, 20/45) to 11.1% (T3, 5/45) (Fig. 5B). All patients with pCR had undetectable ctDNA at T2 and T3 and no disease progression (Additional file 2: Fig. S2A, Fig. 5B). In contrast, patients who were ctDNA positive at T2 and T3 did not achieve pCR (Fig. 5B). In addition, the patients with disease progression were mainly RCB-III (75%, 6/8) and RCB-II (25%, 2/8) (Fig. 5B).

Fig. 5

The association between the dynamic changes of ctDNA and DFS or response in the course of NAC. A Patients with complete ctDNA data for four time points (n = 50) were grouped according to the different patterns of ctDNA clearance or non-clearance. B Sankey plot showing the dynamic changes of patients with complete ctDNA data and DFS data (n = 45). C Sankey plot showing ctDNA dynamics in ctDNA-positive patients at T0. D DFS in ctDNA-cleared patients and non-cleared patients during NAC. E Kaplan–Meier analysis of DFS stratified based on ctDNA status after NAC (T3) and response to treatment, RCB means no-pCR

To follow ctDNA dynamically in patients whose ctDNA was detected before NAC (T0), we monitored ctDNA status at T1, T2, and T3. The ctDNA positivity rate gradually decreased during NAC treatment, to 35% (7/20) at T1 and 20% (4/20) at T2 and T3 (Fig. 5C). Among patients who did not clear ctDNA at T1, 85.7% had residual disease at the time of surgery (6/7 non-pCR), while 69% of patients who cleared ctDNA at T1 had residual disease (9/13 non-pCR). At T2 and T3, 100% of patients who did not clear ctDNA had residual disease at surgery (4/4 non-pCR), while 69% of patients who cleared ctDNA (11/16 non-pCR) had residual disease. The positive predictive value of ctDNA increased with treatment time (Additional file 2: Fig. S2B).
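The positive predictive value of persistent ctDNA for residual disease can be recomputed from the counts reported above. The sketch below only repeats that arithmetic: the dictionary layout and function name are hypothetical, while the counts are those given in the text for the 20 patients who were ctDNA positive at T0.

```python
# (non-pCR among ctDNA-positive, ctDNA-positive, non-pCR among cleared, cleared)
COUNTS = {
    "T1": (6, 7, 9, 13),
    "T2/T3": (4, 4, 11, 16),
}


def residual_disease_rates(counts: dict) -> dict:
    """Share of patients with residual disease (non-pCR) by ctDNA status."""
    rates = {}
    for time_point, (pos_non_pcr, pos_total, neg_non_pcr, neg_total) in counts.items():
        rates[time_point] = {
            "ppv_ctdna_positive": pos_non_pcr / pos_total,
            "non_pcr_rate_if_cleared": neg_non_pcr / neg_total,
        }
    return rates


for time_point, values in residual_disease_rates(COUNTS).items():
    print(time_point, {k: round(v, 3) for k, v in values.items()})
```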

Dynamic changes in ctDNA are significantly related to metastasis and recurrence

To assess whether ctDNA status was related to metastasis and recurrence, we analyzed the association between the ctDNA dynamic pattern and DFS. Patients who did not clear ctDNA at T3 (n = 5) had a significantly higher risk of metastasis and recurrence than patients who were ctDNA negative at T0 or cleared ctDNA at T1, T2, or T3 (Fig. 5D, HR 4.61; 95% CI, 1.05–20.19, P = 0.027). Compared with patients who were ctDNA negative at T0 (n = 22), patients who became ctDNA negative at T1, T2, or T3 (n = 18) had a similar risk of metastasis and recurrence (Additional file 2: Fig. S3). Patients who had cleared ctDNA at T1, T2, or T3 had longer DFS than patients who had not cleared ctDNA at T3 (Additional file 2: Fig. S3).

The clearance of ctDNA after NAC (T3) was related to improved survival. After NAC, patients were stratified according to pCR and ctDNA status (n = 45). All seven patients with pCR were ctDNA negative and showed good DFS. Among patients who did not achieve pCR (n = 38), ctDNA positivity (n = 5) was related to worse DFS. Among patients who failed to achieve pCR, the probability of recurrence was numerically greater in the RCB/ctDNA+ group than in the RCB/ctDNA- group, although the difference did not reach statistical significance (Fig. 5E, HR 3.92; 95% CI, 0.9–17.02, P = 0.061).

The chemotherapy prediction model integrating ctDNA status before NAC has a better prediction effect

To further improve the accuracy of predicting the pCR status of breast cancer patients after NAC, we calculated the probability of pCR and non-pCR for each sample with the established tumor prediction model, combined it with the negative or positive ctDNA status at different time points, and used random forest to construct a chemotherapy prediction model. First, combining the ctDNA status before NAC with the tumor prediction model predicted pCR better, with an AUC of 0.961 (Fig. 6A). To assess the impact of dynamic changes in ctDNA on prediction, we then constructed chemotherapy prediction models by combining the tumor prediction model with ctDNA status at different time points (T0, T1, T2, T0 and T1, T0 and T1 and T2). The prediction model was compatible with ctDNA, and combining it with ctDNA status at different time points gave similar AUCs (Fig. 6B, T0 = 0.961, T1 = 0.951, T2 = 0.92, T0/T1 = 0.961, T0/T1/T2 = 0.961). To clarify the value of the chemotherapy prediction model for patients' prognosis, we analyzed its predictive effect on DFS, which was better than that of the tumor prediction model (AUC at 1 year = 1.000, AUC at 2 years = 0.941) (Fig. 6C). The patients were divided into high-risk and low-risk groups according to the median risk score. The DFS of the low-risk group was significantly higher than that of the high-risk group (P = 0.0031) (Fig. 6D).

Fig. 6

The prediction effect of pCR and the prognosis by a combination of the prediction model and ctDNA monitoring. A The pCR prediction determined by the chemotherapy predictive model constructed by combining the information from the established tumor prediction model (including DNA mutations and clinical factors), along with the information from ctDNA status. B Different chemotherapy predictive models are established using random forest based on the expression status of ctDNA at different time points of T0, T1, and T2. C The predictive effect for the chemotherapy predictive model on the prognosis of NAC patients. D Kaplan–Meier curves for patients in high- and low-risk group

Discussion

In this study, we developed two prediction models for predicting sensitivity to NAC. First, we constructed a tumor prediction model for predicting pCR based on the DNA mutations of tumor tissue and clinical information. Then, we analyzed the relation between dynamically monitored ctDNA and the NAC response. Finally, we constructed the chemotherapy prediction model by integrating ctDNA status before NAC with the tumor prediction model, so that it comprises the nine-gene mutation signature of the tumor, the clinical factors, and the ctDNA status. The chemotherapy prediction model not only predicts the efficacy of NAC to guide therapy but also predicts prognosis in terms of DFS (Fig. 7).

Fig. 7

Schematic diagram of the clinical application of the therapeutic efficacy prediction model

The Miller-Payne (MP) scoring system and the residual cancer burden (RCB) index are commonly recommended for pathological assessment after NAC for breast cancer. Compared with MP scoring, the RCB index, first described in 2007, provides a more comprehensive assessment because it evaluates the axillary lymph nodes as well as cell density in the primary tumor [18]. The RCB index has become widely accepted as a replacement for MP scoring in evaluating tumor regression because of its role in predicting long-term survival after NAC in breast cancer [26, 27]. Consequently, we used the RCB index to assess chemotherapeutic response in the prediction models. In addition, our model based on the RCB index retained good sensitivity and specificity under the MP scoring system in the training set, validation set, and external validation set.

A growing number of studies have suggested that patients with pCR after NAC have a better prognosis than those with residual disease [28,29,30]. The NSABP B-18 and NSABP B-27 trials reported that pCR was associated with improved prognosis after NAC in breast cancer [31, 32], which was confirmed in subsequent cohort studies and meta-analyses [28, 33, 34]. Although patients who achieve pCR after NAC have a better prognosis, the current pCR rate of patients receiving NAC is less than 50% [28, 35]. Our study is in line with these data, with a pCR rate of 25.1% (61/243) according to the RCB index and 33.3% (81/243) according to MP scores among all 243 patients. This suggests that the majority of patients (those without pCR) might not gain a survival benefit from the routine use of NAC and might experience unnecessary exposure to chemotherapy and delayed surgery. Therefore, sensitive and specific markers are needed to distinguish pCR from non-pCR patients after NAC.
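The pCR rates quoted above follow directly from the reported counts. As a quick arithmetic check (the helper name `rate_percent` is introduced here only for illustration):

```python
def rate_percent(events, total):
    """Return events/total as a percentage rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * events / total, 1)


# Counts as reported in the text: 61 and 81 pCR cases among 243 patients.
rcb_pcr_rate = rate_percent(61, 243)        # pCR by the RCB index
mp_pcr_rate = rate_percent(81, 243)         # pCR by MP scores
rcb_non_pcr_rate = rate_percent(243 - 61, 243)  # non-pCR by the RCB index

print(rcb_pcr_rate, mp_pcr_rate, rcb_non_pcr_rate)
```

By the RCB index, roughly three in four patients in this cohort did not reach pCR, which is the group the prediction models aim to identify in advance.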

We established a tumor prediction model by combining clinical factors (including molecular subtype and Ki67 status) and gene mutation information (MSH2, NOTCH4, PIK3CA, SETBP1, TP53, EGFR, FOXP1, IL7R, NFKB1), which had good sensitivity and specificity for predicting pCR in the training set, validation set, and external validation set. In our study, MSH2 and FOXP1 gene mutations and the luminal A subtype ranked in the top three model predictors by importance, consistent with previous studies. The MSH2 gene is part of the DNA mismatch repair (MMR) system, which binds to DNA mismatches to initiate DNA repair. Previous studies have reported that MSH2 plays an important role in resistance to chemotherapeutic drugs and in progression to advanced stages in patients receiving NAC [36, 37]. FOXP1 is a member of the FOX transcription factor family, which is associated with tumor development and prognosis. A cross-sectional study of stage I to III breast cancer patients who received NAC from 2018 to 2019 found a significant association between complete response to treatment and FOXP1 (p = 0.01) [38]. The insensitivity of the luminal A subtype of breast cancer to NAC has also been reported. In the I-SPY 1 trial, the subtype least sensitive to adjuvant chemotherapy or NAC was luminal A, with a pCR rate of only 9% [39]. Our results provide more evidence for the predictive value of incorporating gene mutation signatures into clinical risk stratification.

Additionally, considering the high concordance rate of somatic mutations between ctDNA and tumor DNA, we analyzed ctDNA using a unique personalized panel derived from the constructed model [40]. Our approach tracks up to nine patient-specific somatic variants simultaneously in order to capture the heterogeneity of a patient’s tumor, as previously reported [41]. Nevertheless, this method has several limitations. For example, newly emergent somatic variants that arise during tumor evolution in response to NAC are usually not detectable.

Our data showed that a significant reduction in pre-operative ctDNA level during NAC could predict pCR, suggesting that dynamic ctDNA monitoring may be helpful in tailoring the NAC regimen. Among all patients who were ctDNA positive at T0, the ctDNA-positive rate gradually decreased during NAC, to 35% at T1 and 20% at T2 and T3. All patients with pCR after NAC tested negative for ctDNA at T2 and T3. At T2 and T3, no ctDNA-positive patient achieved pCR, and the predominant grade among these patients was RCB-III. These results suggest that the ctDNA status of patients who have received at least one chemotherapy cycle may predict the response to NAC. Furthermore, the prediction of NAC response became more accurate with increasing numbers of chemotherapy cycles, so T2 (during intermediate evaluation) and T3 (after the end of NAC but before surgery) may be suitable time points for obtaining blood samples for ctDNA analysis.
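How a positivity rate per time point can be tabulated among baseline-positive patients is sketched below. The record layout, the function name `positive_rate_by_timepoint`, the toy records, and the choice to drop patients without a sample at a given time point from that denominator are assumptions for illustration; the study does not describe how missing samples were handled.

```python
def positive_rate_by_timepoint(records, timepoints=("T1", "T2", "T3")):
    """Among patients who were ctDNA positive at T0, return the fraction
    still ctDNA positive at each later time point.

    records: list of dicts mapping a time point to True (positive),
    False (negative) or None (no sample). Patients without a sample at a
    time point are excluded from that time point's denominator (assumption).
    """
    baseline_positive = [r for r in records if r.get("T0") is True]
    rates = {}
    for tp in timepoints:
        tested = [r[tp] for r in baseline_positive if r.get(tp) is not None]
        rates[tp] = sum(tested) / len(tested) if tested else None
    return rates


# Made-up toy records for illustration only (not study data).
toy_records = [
    {"T0": True, "T1": True, "T2": True, "T3": True},
    {"T0": True, "T1": True, "T2": False, "T3": False},
    {"T0": True, "T1": False, "T2": False, "T3": None},
    {"T0": True, "T1": False, "T2": False, "T3": False},
    {"T0": False, "T1": False, "T2": False, "T3": False},
]

toy_rates = positive_rate_by_timepoint(toy_records)
print(toy_rates)
```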

Interestingly, we demonstrated the prognostic value of ctDNA status, which might therefore serve as a promising predictor of metastasis and recurrence in breast cancer patients receiving NAC. Patients who were ctDNA positive at T3 had significantly worse DFS than patients who were ctDNA negative. We observed similar results after stratifying by pCR and ctDNA status after NAC. Patients who achieved pCR and were ctDNA negative had the best outcomes, and patients who failed to achieve pCR and were ctDNA positive had the worst outcomes. Among patients who did not achieve pCR, ctDNA negativity was significantly associated with better DFS than ctDNA positivity. Our results are consistent with recent studies [41]. The presence of ctDNA reflects metastatic tumor burden, and elevated ctDNA levels predict disease progression [42, 43].
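DFS comparisons of this kind are usually summarized with Kaplan–Meier curves. A minimal pure-Python sketch of the product-limit estimator is given below; the function name `kaplan_meier` and the follow-up values are made up for illustration, and a real analysis would normally rely on an established survival-analysis package together with a log-rank test.

```python
def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) estimate of the survival function.

    times: follow-up time per patient (e.g. days to recurrence or last visit)
    events: 1 if the event (recurrence/metastasis) was observed, 0 if censored
    Returns a list of (time, survival_probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        # Patients with an event or censored at t leave the risk set afterwards.
        at_risk -= n_removed
    return curve


# Made-up toy follow-up data for illustration only (not study data).
toy_times = [100, 200, 200, 300, 400]
toy_events = [1, 0, 1, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
print(toy_curve)
```

Running the estimator separately for ctDNA-positive and ctDNA-negative patients would give the two curves that are then compared.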

Considering the importance of ctDNA in predicting patient response to NAC [44], ctDNA status was added as a categorical factor to refine the tumor predictive model [45]. We constructed the chemotherapy predictive model by combining the ctDNA status before NAC with the tumor predictive model. This new chemotherapy predictive model effectively reflected sensitivity to NAC and predicted patient prognosis, distinguishing high-risk from low-risk patients. Predicting and assessing the response to NAC is vital for subsequent treatment strategies. Research has demonstrated that a high residual cancer burden after NAC in breast cancer signifies a poor prognosis. Additionally, these non-responders can benefit from additional adjuvant chemotherapy. As shown in the CREATE-X trial, capecitabine given for 6 months to TNBC patients who did not achieve pCR led to improved overall survival and DFS [46, 47]. Our results may have important implications for predicting the response to NAC and for rational treatment guidance.

This study has some limitations. First, the tumor model constructed from tumor DNA mutations and clinical information has been validated both internally and externally, whereas the chemotherapy model incorporating ctDNA status has not been further validated because of the limited number of patients: only 56 patients had ctDNA analysis, and there was no external independent validation. Second, the follow-up period was limited to 2–3 years, and long-term follow-up is needed to confirm our results. Therefore, the results will be further verified in a prospective cohort with a larger sample size and longer follow-up time.

Conclusions

The focus of this study was to construct a prediction model in the neoadjuvant setting. A multiscale approach integrating clinical factors and tumor DNA mutations was used to construct a model that predicts response to NAC and prognosis. In addition, given the prognostic value of ctDNA in breast cancer patients, we included pretreatment ctDNA levels in the predictive model. The model integrating ctDNA may have more reliable predictive efficacy, but a larger cohort of patients is needed to validate our findings.

Availability of data and materials

The raw data of the datasets generated during the current study can be obtained from the public GSA-Human database (accession HRA004909), and the processed data are available from the corresponding author on reasonable request.

Abbreviations

ACC: Accuracy

AJCC: American Joint Committee on Cancer

CNV: Copy number variation

ctDNA: Circulating tumor DNA

DFS: Disease-free survival

FFPE: Formalin fixation and paraffin embedding

MP: Miller-Payne

MRI: Magnetic resonance imaging

NAC: Neoadjuvant chemotherapy

pCR: Pathological complete response

ROC: Receiver operating characteristic

SNV: Single nucleotide variant

TMB: Tumor mutation burden

TNBC: Triple-negative breast cancer

VAF: Variant allele frequency

References

  1. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, Bray F. Global Cancer Statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2021;71(3):209–49.

  2. Giaquinto AN, Sung H, Miller KD, Kramer JL, Newman LA, Minihan A, Jemal A, Siegel RL. Breast cancer statistics, 2022. CA Cancer J Clin. 2022;72:524.

  3. Chai C, Wu HH, Abuetabh Y, Sergi C, Leng R. Regulation of the tumor suppressor PTEN in triple-negative breast cancer. Cancer Lett. 2022;527:41–8.

  4. Pusztai L, Foldi J, Dhawan A, DiGiovanna MP, Mamounas EP. Changing frameworks in treatment sequencing of triple-negative and HER2-positive, early-stage breast cancers. Lancet Oncol. 2019;20(7):e390–6.

  5. Caudle AS, Gonzalez-Angulo AM, Hunt KK, Liu P, Pusztai L, Symmans WF, Kuerer HM, Mittendorf EA, Hortobagyi GN, Meric-Bernstam F. Predictors of tumor progression during neoadjuvant chemotherapy in breast cancer. J Clin Oncol. 2010;28(11):1821–8.

  6. Symmans WF, Yau C, Chen YY, Balassanian R, Klein ME, Pusztai L, Nanda R, Parker BA, Datnow B, Krings G, et al. Assessment of residual cancer burden and event-free survival in neoadjuvant treatment for high-risk breast cancer: an analysis of data from the I-SPY2 randomized clinical trial. JAMA Oncol. 2021;7:1654.

  7. Sardesai SD, Thomas A, Gallagher C, Lynce F, Ottaviano YL, Ballinger TJ, Schneider BP, Storniolo AM, Bauchle A, Althouse SK, et al. Inhibiting fatty acid synthase with omeprazole to improve efficacy of neoadjuvant chemotherapy in patients with operable TNBC. Clin Cancer Res. 2021;27:5810.

  8. Steinhof-Radwanska K, Grazynska A, Lorek A, Gisterek I, Barczyk-Gutowska A, Bobola A, Okas K, Lelek Z, Morawska I, Potoczny J, et al. Contrast-enhanced spectral mammography assessment of patients treated with neoadjuvant chemotherapy for breast cancer. Curr Oncol. 2021;28(5):3448–62.

  9. Woitek R, McLean MA, Ursprung S, Rueda OM, Manzano Garcia R, Locke MJ, Beer L, Baxter G, Rundo L, Provenzano E, et al. Hyperpolarized carbon-13 MRI for early response assessment of neoadjuvant chemotherapy in breast cancer patients. Cancer Res. 2021;81:6004.

  10. Oshi M, Gandhi S, Angarita FA, Kim TH, Tokumaru Y, Yan L, Matsuyama R, Endo I, Takabe K. A novel five-gene score to predict complete pathological response to neoadjuvant chemotherapy in ER-positive/HER2-negative breast cancer. Am J Cancer Res. 2021;11(7):3611–27.

  11. Liu XY, Jiang W, Ma D, Ge LP, Yang YS, Gou ZC, Xu XE, Shao ZM, Jiang YZ. SYTL4 downregulates microtubule stability and confers paclitaxel resistance in triple-negative breast cancer. Theranostics. 2020;10(24):10940–56.

  12. Huang L, Lang GT, Liu Q, Shi JX, Shao ZM, Cao AY. A predictor of pathological complete response to neoadjuvant chemotherapy in triple-negative breast cancer patients with the DNA repair genes. Ann Transl Med. 2021;9(4):301.

  13. Oshi M, Angarita FA, Tokumaru Y, Yan L, Matsuyama R, Endo I, Takabe K. A novel three-gene score as a predictive biomarker for pathologically complete response after neoadjuvant chemotherapy in triple-negative breast cancer. Cancers (Basel). 2021;13(10):2401.

  14. Liu C, Xiang X, Han S, Lim HY, Li L, Zhang X, Ma Z, Yang L, Guo S, Soo R, et al. Blood-based liquid biopsy: insights into early detection and clinical management of lung cancer. Cancer Lett. 2022;524:91–102.

  15. Zhang Y, Yao Y, Xu Y, Li L, Gong Y, Zhang K, Zhang M, Guan Y, Chang L, Xia X, et al. Pan-cancer circulating tumor DNA detection in over 10,000 Chinese patients. Nat Commun. 2021;12(1):11.

  16. Lee RJ, Gremel G, Marshall A, Myers KA, Fisher N, Dunn JA, Dhomen N, Corrie PG, Middleton MR, Lorigan P, et al. Circulating tumor DNA predicts survival in patients with resected high-risk stage II/III melanoma. Ann Oncol. 2018;29(2):490–6.

  17. Chaudhuri AA, Chabon JJ, Lovejoy AF, Newman AM, Stehr H, Azad TD, Khodadoust MS, Esfahani MS, Liu CL, Zhou L, et al. Early detection of molecular residual disease in localized lung cancer by circulating tumor DNA profiling. Cancer Discov. 2017;7(12):1394–403.

  18. Symmans WF, Peintinger F, Hatzis C, Rajan R, Kuerer H, Valero V, Assad L, Poniecka A, Hennessy B, Green M, et al. Measurement of residual breast cancer burden to predict survival after neoadjuvant chemotherapy. J Clin Oncol. 2007;25(28):4414–22.

  19. Chen S, Zhou Y, Chen Y, Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018;34(17):i884–90.

  20. Chen SF, Zhou YQ, Chen YR, Huang TX, Liao WT, Xu Y, Li ZC, Gu J. Gencore: an efficient tool to generate consensus reads for error suppressing and duplicate removing of NGS data. BMC Bioinformatics. 2019;20(1):1.

  21. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R; 1000 Genome Project Data Processing Subgroup. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25(16):2078–9.

  22. Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 2010;38(16): e164.

  23. Talevich E, Shain AH, Botton T, Bastian BC. CNVkit: genome-wide copy number detection and visualization from targeted DNA sequencing. PLoS Comput Biol. 2016;12(4):e1004873.

  24. Bratman SV, Yang SYC, Iafolla MAJ, Liu ZH, Hansen AR, Bedard PL, Lheureux S, Spreafico A, Razak AA, Shchegrova S, et al. Personalized circulating tumor DNA analysis as a predictive biomarker in solid tumor patients treated with pembrolizumab. Nat Cancer. 2020;1(9):873–81.

  25. Annala M, Vandekerkhove G, Khalaf D, Taavitsainen S, Beja K, Warner EW, Sunderland K, Kollmannsberger C, Eigl BJ, Finch D, et al. Circulating tumor DNA genomics correlate with resistance to abiraterone and enzalutamide in prostate cancer. Cancer Discov. 2018;8(4):444–57.

  26. Symmans WF, Wei C, Gould R, Yu X, Zhang Y, Liu M, Walls A, Bousamra A, Ramineni M, Sinn B, et al. Long-term prognostic risk after neoadjuvant chemotherapy associated with residual cancer burden and breast cancer subtype. J Clin Oncol. 2017;35(10):1049–60.

  27. Yau C, Osdoit M, van der Noordaa M, Shad S, Wei J, de Croze D, Hamy AS, Lae M, Reyal F, Sonke GS, et al. Residual cancer burden after neoadjuvant chemotherapy and long-term survival outcomes in breast cancer: a multicentre pooled analysis of 5161 patients. Lancet Oncol. 2022;23(1):149–60.

  28. Cortazar P, Zhang L, Untch M, Mehta K, Costantino JP, Wolmark N, Bonnefoi H, Cameron D, Gianni L, Valagussa P, et al. Pathological complete response and long-term clinical benefit in breast cancer: the CTNeoBC pooled analysis. Lancet. 2014;384(9938):164–72.

  29. Spring L, Greenup R, Niemierko A, Schapira L, Haddad S, Jimenez R, Coopey S, Taghian A, Hughes KS, Isakoff SJ, et al. Pathologic complete response after neoadjuvant chemotherapy and long-term outcomes among young women with breast cancer. J Natl Compr Canc Netw. 2017;15(10):1216–23.

  30. Rodrigues-Ferreira S, Nahmias C. Predictive biomarkers for personalized medicine in breast cancer. Cancer Lett. 2022;545: 215828.

  31. Bear HD, Anderson S, Brown A, Smith R, Mamounas EP, Fisher B, Margolese R, Theoret H, Soran A, Wickerham DL, et al. The effect on tumor response of adding sequential preoperative docetaxel to preoperative doxorubicin and cyclophosphamide: preliminary results from National surgical adjuvant breast and bowel project protocol B-27. J Clin Oncol. 2003;21(22):4165–74.

  32. Rastogi P, Anderson SJ, Bear HD, Geyer CE, Kahlenberg MS, Robidoux A, Margolese RG, Hoehn JL, Vogel VG, Dakhil SR, et al. Preoperative chemotherapy: updates of National surgical adjuvant breast and bowel project protocols B-18 and B-27. J Clin Oncol. 2008;26(5):778–85.

  33. Spring LM, Fell G, Arfe A, Sharma C, Greenup R, Reynolds KL, Smith BL, Alexander B, Moy B, Isakoff SJ, et al. Pathologic complete response after neoadjuvant chemotherapy and impact on breast cancer recurrence and survival: a comprehensive meta-analysis. Clin Cancer Res. 2020;26(12):2838–48.

  34. I-SPY2 Trial Consortium, Yee D, DeMichele AM, Yau C, Isaacs C, Symmans WF, Albain KS, Chen YY, Krings G, Wei S, et al. Association of event-free and distant recurrence-free survival with individual-level pathologic complete response in neoadjuvant treatment of stages 2 and 3 breast cancer: three-year follow-up analysis for the I-SPY2 adaptively randomized clinical trial. JAMA Oncol. 2020;6(9):1355–62.

  35. Li S, Zhang Y, Zhang P, Xue S, Chen Y, Sun L, Yang R. Predictive and prognostic values of tumor infiltrating lymphocytes in breast cancers treated with neoadjuvant chemotherapy: A meta-analysis. Breast. 2022;66:97–109.

  36. Malik SS, Masood N, Asif M, Ahmed P, Shah ZU, Khan JS. Expressional analysis of MLH1 and MSH2 in breast cancer. Curr Probl Cancer. 2019;43(2):97–105.

  37. Dasgupta H, Islam S, Alam N, Roy A, Roychoudhury S, Panda CK. Hypomethylation of mismatch repair genes MLH1 and MSH2 is associated with chemotolerance of breast carcinoma: clinical significance. J Surg Oncol. 2019;119(1):88–100.

  38. Su H, Liu Y, Zhang C, Yu T, Niu Y. PRMT5 and FOXP1 expression profile in invasive breast cancer patients undergoing neoadjuvant chemotherapy. Cell Mol Biol (Noisy-le-grand). 2020;66(2):142–5.

  39. Esserman LJ, Berry DA, DeMichele A, Carey L, Davis SE, Buxton M, Hudis C, Gray JW, Perou C, Yau C, et al. Pathologic complete response predicts recurrence-free survival more effectively by cancer subset: results from the I-SPY 1 TRIAL–CALGB 150007/150012, ACRIN 6657. J Clin Oncol. 2012;30(26):3242–9.

  40. Zhang J, Dai D, Tian J, Li L, Bai J, Xu Y, Wang Z, Tang A. Circulating tumor DNA analyses predict disease recurrence in non-muscle-invasive bladder cancer. Front Oncol. 2021;11: 657483.

  41. Magbanua MJM, Swigart LB, Wu HT, Hirst GL, Yau C, Wolf DM, Tin A, Salari R, Shchegrova S, Pawar H, et al. Circulating tumor DNA in neoadjuvant-treated breast cancer reflects response and survival. Ann Oncol. 2021;32(2):229–39.

  42. Fribbens C, Garcia Murillas I, Beaney M, Hrebien S, O’Leary B, Kilburn L, Howarth K, Epstein M, Green E, Rosenfeld N, et al. Tracking evolution of aromatase inhibitor resistance with circulating tumour DNA analysis in metastatic breast cancer. Ann Oncol. 2018;29(1):145–53.

  43. Garcia-Murillas I, Schiavon G, Weigelt B, Ng C, Hrebien S, Cutts RJ, Cheang M, Osin P, Nerurkar A, Kozarewa I, et al. Mutation tracking in circulating tumor DNA predicts relapse in early breast cancer. Sci Transl Med. 2015;7(302):302ra133.

  44. Kingston B, Cutts RJ, Bye H, Beaney M, Walsh-Crestani G, Hrebien S, Swift C, Kilburn LS, Kernaghan S, Moretti L, et al. Genomic profile of advanced breast cancer in circulating tumour DNA. Nat Commun. 2021;12(1):2423.

  45. Olivier T, Prasad V. Molecular testing to deliver personalized chemotherapy recommendations: risking over and undertreatment. BMC Med. 2022;20(1):1.

  46. Masuda N, Lee SJ, Ohtani S, Im YH, Lee ES, Yokota I, Kuroi K, Im SA, Park BW, Kim SB, et al. Adjuvant capecitabine for breast cancer after preoperative chemotherapy. N Engl J Med. 2017;376(22):2147–59.

  47. Liu Z, Shan J, Yu Q, Wang X, Song X, Wang F, Li C, Yu Z, Yu J. Real-world data on apatinib efficacy - results of a retrospective study in metastatic breast cancer patients pretreated with multiline treatment. Front Oncol. 2021;11: 643654.

Acknowledgements

We would like to thank all the patients and family members who consented to the presentation of their data in this study, as well as the investigators and research staff involved.

Funding

The study was supported by the Academic Promotion Program of Shandong First Medical University (2019ZL002), the Research Unit of Radiation Oncology, Chinese Academy of Medical Sciences (2019RU071), the National Natural Science Foundation of China (81627901, 81972863 and 82030082), the Natural Science Foundation of Shandong (ZR2023QH187), the China Postdoctoral Science Foundation (2023M732024), and the Jinan clinical medical science and technology innovation plan (202328076).

Author information

Contributions

DC, JY, and ZY: scientific guidance and resources. ZL: designed the manuscript. BY and MS: data preprocessing and technical implementation. ZL, CY, CL1, XW, XS, CL2, and FW acquired clinical data and performed patient follow-up. JM and MW acquired tissue samples for genomic profiling. All authors participated in data interpretation. ZL wrote the manuscript with the help of all of the authors. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Dawei Chen, Jinming Yu or Zhiyong Yu.

Ethics declarations

Ethics approval and consent to participate

Approval for this prospective study was obtained from the Human Research Ethics Committee of Shandong Cancer Hospital (SDTHEC201802002), and the study was registered at ClinicalTrials.gov (clinical trial No. NCT03688035). All patients provided written informed consent.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: sTable 1.

The pCR rate in different molecular subtypes. sTable 2. Patient Clinical Characteristics (Train and Test Datasets). sTable 3. Patient Clinical Characteristics (Extra Test Datasets). sTable 4. Patient SNV Characteristics (Train and Test Datasets). sTable 5. Patient CNV Characteristics (Train and Test Datasets). sTable 6. Significant SNV and CNV. sTable 7. Patient ctDNA Characteristics.

Additional file 2: sFig. 1.

Comparison of the performance of predicting pCR with different clinical factors and SNV and CNV features. sFig. 2. (A) Sankey plot showing the differences in patients whose ctDNA cleared at different time points (T1, T2, T3). (B) Sankey plot showing the differences in patients with positive ctDNA at different time points (T1, T2, T3). sFig. 3. Kaplan–Meier analysis of DFS stratified by ctDNA status during NAC. sFig. 4. Overall algorithm flowchart for the predictive model construction.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Liu, Z., Yu, B., Su, M. et al. Construction of a risk stratification model integrating ctDNA to predict response and survival in neoadjuvant-treated breast cancer. BMC Med 21, 493 (2023). https://doi.org/10.1186/s12916-023-03163-4

  • DOI: https://doi.org/10.1186/s12916-023-03163-4

Keywords