Computer-assisted quantification of tumor-associated collagen signatures to improve the prognosis prediction of breast cancer

Background Collagen fibers play an important role in tumor initiation, progression, and invasion. Our previous research has already shown that large-scale tumor-associated collagen signatures (TACS) are powerful prognostic biomarkers independent of clinicopathological factors in invasive breast cancer. However, they are observed on a macroscale and are more suitable for identifying high-risk patients. It is necessary to investigate the effect of the corresponding microscopic features of TACS so as to more accurately and comprehensively predict the prognosis of breast cancer patients. Methods In this retrospective and multicenter study, we included 942 invasive breast cancer patients in both a training cohort (n = 355) and an internal validation cohort (n = 334) from one clinical center and in an external validation cohort (n = 253) from a different clinical center. TACS corresponding microscopic features (TCMFs) were firstly extracted from multiphoton images for each patient, and then least absolute shrinkage and selection operator (LASSO) regression was applied to select the most robust features to build a TCMF-score. Finally, the Cox proportional hazard regression analysis was used to evaluate the association of TCMF-score with disease-free survival (DFS). Results TCMF-score is significantly associated with DFS in univariate Cox proportional hazard regression analysis. After adjusting for clinical variables by multivariate Cox regression analysis, the TCMF-score remains an independent prognostic indicator. Remarkably, the TCMF model performs better than the clinical (CLI) model in the three cohorts and is particularly outstanding in the ER-positive and lower-risk subgroups. By contrast, the TACS model is more suitable for the ER-negative and higher-risk subgroups. When the TACS and TCMF are combined, they could complement each other and perform well in all patients. As expected, the full model (CLI+TCMF+TACS) achieves the best performance (AUC 0.905, [0.873–0.938]; 0.896, [0.860–0.931]; 0.882, [0.840–0.925] in the three cohorts). Conclusion These results demonstrate that the TCMF-score is an independent prognostic factor for breast cancer, and the increased prognostic performance (TCMF+TACS-score) may help us develop more appropriate treatment protocols. Supplementary Information The online version contains supplementary material available at 10.1186/s12916-021-02146-7.

Conclusion: These results demonstrate that the TCMF-score is an independent prognostic factor for breast cancer, and the increased prognostic performance (TCMF+TACS-score) may help us develop more appropriate treatment protocols.

Background
Most of cancer-related mortality is directly caused by the cancer cell metastasis from the primary tumor to distant sites [1]. The migration of cancer cells is a multistep process and begins with a remodeling of the local tumor microenvironment (TME) including changes in the extracellular matrix (ECM) [2]. The ECM provides structural and mechanical support for cells and tissues, influences cell migration through its physical properties [3,4], and is mainly made up of a complex meshwork of collagen fibers, glycoproteins, and proteoglycans [5]. Collagen fibers, which are the important component of ECM and may promote or inhibit cell motion, either impede tumor invasion via acting as a barrier against migration [6] or facilitate invasion through providing highspeed "highways" according to their orientation [7]. The role of neighboring stroma in mediating breast tumor initiation, progression, and invasion to surrounding tissue has been broadly researching [8,9].
There has a famous prognostic factor associated with collagen orientation, known as TACS1-3, which predicts the behavior of cancer cells according to the mode of collagen alignment [10]. In vitro research showed that collagen orientations are crucial to tumor cell invasion [11], and in vivo study also proven that collagen signature is an important prognostic factor [12]. Especially, TACS3, which was identified via the presence of linear collagen structure perpendicular to tumor border, shows an important prognostic value and is related to reducing the survival rate of patients [12]. In our previous research, we expanded the TACS1-3 by recognizing TACS4-8 at a large scale and at the invasion front of the primary tumor. TACS1-8 could describe the changes of collagen distribution patterns caused by the interaction between tumors and surrounding collagen fibers during the process of tumorigenesis, development, and invasion. We found that TACS-score (combining TACS1-8) was a strong and general prognostic indicator for disease-free survival (DFS) of breast cancer patients [13]. However, the microscopic features of TACS have not been systematically investigated.
Many new computational techniques were developed to quantify the collagen microstructures and provided robust and informative features within a heterogeneous collection of collagen patterns [14]. For example, Bredfeldt et al. have applied the curvelet-denoising filter following by the FIRE (CT-FIRE) to track collagen shape changes over time in an in vivo mouse model [15]. Falzon et al. have used the elliptical Fourier analysis to compare the differences in collagen shape between normal, benign, and malignant breast tissues [16], and Hu et al. suggested the orientation-dependent gray-level cooccurrence method (OD-GLCM) for distinguishing the different texture patterns of collagen fibers in rat tendons and human pancreatic tissue [17]. Multiphoton microscopy (MPM) has the talent to provide detailed tissue architecture information of unprocessed specimens through a combination of two-photon excitation fluorescence (TPEF) and second harmonic generation (SHG), where SHG imaging could provide a direct and labelfree approach for observing collagen structures. Recently, the feature extraction of SHG images is a rapidly growing field, which involves in extracting numerous quantitative features to determine the relationships between the microscopic features and potential pathology [18,19]. Nevertheless, the association between collagen microstructure characteristics and breast cancer prognosis has not been fully explored.
In this study, we first converted TACS1-8-related SHG images into high-dimensional minable data and extracted TACS corresponding microscopic features (TCMFs). Then, the least absolute shrinkage and selection operator (LASSO) regression was used for choosing the most robust features to build a TCMF-score. In contrast to the results from TACS, the TCMF-score is more suitable for identifying low-risk patients. Furthermore, when the TACS and TCMF models were combined, they could complement each other and perform well in all patients. Therefore, this strong and general imaging prognosticator (TCMF-score) may convince pathologists to adopt it as a prognostic histopathology tool, thereby broadly altering therapeutic options.

Patients and clinicopathological information
The institutional review board at Fujian Medical University Union Hospital and Harbin Medical University Cancer Hospital provided ethical approval for the use of patient material in this study. We retrospectively collected 1223 formalin-fixed paraffin-embedded (FFPE) breast cancer tissue samples from 1223 patients (age 21-87 years), where 281 samples were excluded according to the exclusion criteria and 942 passed quality control for the final analysis. Samples were included in terms of a consecutive series of predefined inclusion criteria: histologic diagnosis of invasive breast cancer without distant metastasis, more than 5 years of follow-up except for patients who developed distant relapse within 5 years. The exclusion criteria were as follows: patients were treated with preoperative therapy (neoadjuvant chemotherapy or radiotherapy); patients had missed relevant clinicopathological characteristics or follow-up data; some samples were not suitable for analysis due to the damage, tumor-free, or poor section quality; some samples' collagen fibers are too sparse to extract their microscopic features.
For this study, 689 patients were from the Fujian Medical University Union Hospital and were divided randomly into training (355 cases) and internal validation (334 cases) cohorts. The external validation cohort was comprised of 253 patients from Harbin Medical University Cancer Hospital. Table 1 shows the clinical characteristics of the patients in the three cohorts. The patient recruitment pathway is shown in Additional file 1: Fig.  S1. Patients in the three cohorts are balanced for disease-free survival (DFS), with the median DFS time (IQR) of 71.0 (38.0-84.0) months for the training cohort, 70.0 (35.8-83.0) months for the internal validation cohort, and 70.0 months (40.5-80.0) for the external validation cohort. Baseline clinical characteristics were collected, including age at surgical intervention, molecular subtype (luminal A and B, HER2-enriched as well as triple-negative), tumor size (T1, T2, and T3), lymph node metastasis (N0, N1, and N2), clinical stage (I, II, and III), histological grade (G1, G2, and G3), chemotherapy (yes or no), radiation therapy (yes or no), endocrine therapy (yes or no), and targeted therapy (yes or no).

Sample preparation and computer-based quantitation of TACS corresponding microscopic features (TCMFs)
Two serial sections of 5-μm thickness were cut from archived paraffin block via a semiautomatic microtome in the pathology department. One slice was deparaffinized by alcohol and xylene and stained with H&E for whole slide imaging, and the other adjacent section was simply deparaffinized for stain-free MPM imaging.
The protocol for TACS1-8 quantification has been described in detail in a previously published paper [13]. For each case, the FFPE tissue block with tumor boundary was selected by a pathologist for microscopic analysis of collagen features. Throughout the whole tissue slice, several (7-20) discrete~2.8-mm squared nonoverlapping regions across the invasive margin and adjacent tumor areas, which depended on the size of samples, were firstly numbered in H&E images. Then, MPM imaging was performed on another section based on all the numbered regions. The large-scale TACS were determined on the MPM images by three independent reviewers who did not know the pathological outcomes.
Subsequently, for each patient, a region of interest (ROI) with a field of view of 150μm × 150μm was identified from each non-overlapping large-scale MPM image with TACS patterns. A total of 142 TACS corresponding microscopic features (TCMFs), including 8 morphologic features and 134 textural features, were first extracted from each ROI using Matlab 2016b (Additional file 1: Supplementary Methods [20][21][22][23][24][25][26][27][28], Table S8-S11, Fig. S8, Fig. S10) and averaged over all ROIs from each patient, and then all 142 TCMFs were normalized using Z-score transformation. The average time of MPM imaging on a slice (a patient) was about 60 min, and the examination time for a trained reviewer to extract TACS was about 10 min/section. The average time of the intercepting small-scale image and running Matlab program was about 6 min/section.

Statistical analysis
The statistical analysis was performed on R 3.5.2 and IBM SPSS Statistics 24. The clinical endpoint of our study was disease-free survival (DFS) that was defined as the time from the date of diagnosis to that of the first recurrence of the disease, date of death, date last known to have no evidence of disease, or date of the most recent follow-up. All statistical tests were two-sided, and a P < 0.05 was considered statistically significant.
The least absolute shrinkage and selection operator (LASSO) regression was applied to choose the most robust features to establish a TCMF-score. LASSO is a popular regression method for the high-dimensional predictors [29] and has been expanded to the Cox proportional hazard regression model for survival analysis [30]. The R package "glmnet" was selected to implement the LASSO Cox regression model analysis [31]. We applied the LASSO algorithm jointly with the Cox survival model to perform a nested feature selection scheme to analyze the association between each TCMF feature and DFS in the training cohort. A formula was obtained through a linear combination of the selected features weighted by their respective LASSO coefficients and then was applied to acquire a TCMF-score for each patient. We used multivariate Cox regression analysis to calculate the relative weight of each score (TCMF-score, TACS-score, CLI-score) and then used a linear combination of each score and its relative weight to establish a comprehensive prognosis score (TCMF+TACS or CLI+ TCMF+TACS).
We performed a receiver operating characteristic (ROC) curve analysis to obtain the areas under the curves (AUCs) for estimating prognostic accuracy and to determine the optimal cutoff value by maximizing the Youden index in the training cohort. Then, the cutoff value was applied to classify the patients into the lowand high-risk groups. The Kaplan-Meier survival curves were used for evaluating the correlation between variables and disease-free survival, and the log-rank test was used to analyze the differences in survival between the two groups. Univariate and multivariate Cox proportional hazard regression analysis was used for choosing independent predictors by likelihood ratio test [32], and then we made use of the independent predictors to establish the nomogram and generate a comprehensive indicator for assessing DFS. The performance of the nomogram was evaluated via discrimination and calibration [33]. A concordance index (C-index) which ranged from 0.5 (no discrimination at all) to 1.0 (perfect discrimination) was used for estimating the discrimination, and a calibration plot, which was a graphic representation of the relationship between the actual incidence and predicted probabilities, was used for evaluating the calibration. A calibration curve would be close to the 45°diagonal line for a well-calibrated model [34].

Results
TACS corresponding microscopic feature score (TCMFscore) Our previous results suggested that the TACS-score is a key determinant of invasive breast cancer prognosis. We recognized 8 major TACSs in the large-scale MPM images, in a way similar to identify biomarkers of histopathological subtypes from H&E images. TACS1-8 were mainly based on the macroscopic appearance of collagen morphological changes in the tumor microenvironment. Details of the TACS-score for each patient have been presented previously [13]. The calculation formula of the TACS-score was in Additional file 1: Supplementary Methods. In this study, the TACS-score is a comprehensive prognostic manifestation of TACS1-8, namely TACS1-8-score. All TACS-score description in the text refers to the TACS1-8-score and TACS is the abbreviation for TACS1-8. After obtaining the macrostructural information by MPM images, we extracted the corresponding microscopic features (TCMF) of these 8 patterns. The SHG image was segmented and 142 features were extracted. The flowchart of this study is shown in Fig. 1 Fig. 2A, the color bar of the heatmap indicates the relationship between the risk scores and DFS. A lower score (TCMF-score, TACS-score, or TCMF+ TACS-score) is associated with better prognosis (higher 5-year DFS), while a higher score is associated with worse prognosis (lower 5-year DFS). Figure 2A also shows that there is a relatively apparent demarcation line at 5 years for the three scores in the three cohorts. Most of the bars larger than 5 years are blue and have a good prognosis, and most of the bars lower than 5 years are red and have a poor prognosis. In the training cohort, the lower TCMF-score, TACS-score, and TCMF+TACSscore (<−1) predict higher 5-year DFS rate (93.3%, CI, 92.9-93.7%; 90.3%, CI, 89.9-90.8%; 91.1%, CI, 90.6-91.7%), and the higher score (>1) indicated lower 5-year DFS rate (16.5%, CI, 12.6-20.4%; 12.2%, CI, 8.8-15.5%; 11.2%, CI, 7.1-15.3%). Similar findings were also obtained in the validation cohorts. The probability of 5year DFS predicted by the scores is presented in Fig. 2B. The histograms of the three scores can be seen in the supplementary material (Additional file 1: Fig. S2) and are evenly distributed in the three cohorts. This scoring system may act as an auxiliary tool for pathologists to estimate patients' survival.

Predictive performance of the TCMF-score
The TCMF-score is significantly associated with DFS in univariate Cox proportional hazard regression analysis (HR 3.667, [2.716-4.953], P < 0.0001). After adjusting for clinical variables by multivariate Cox regression analysis, the TCMF-score remains an independent prognostic indicator for predicting DFS (HR 1.911, [1.407-2.595], P < 0.0001) (Additional file 1: Table S1) in the training cohort. Noteworthy, collagen signatures remain Model performance was primarily evaluated using receiver operating characteristic (ROC) analysis in the training, internal validation, and external validation cohorts. The TCMF model (AUC at 5-year DFS, 0.781; 95% CI, 0.731 to 0.831) performs better than the CLI model (0.749; 95% CI, 0.694 to 0.803), but is inferior to the TACS model (0.845, 95% CI, 0.802 to 0.889) in the training cohort (Fig. 3A). The input into the CLI model are clinical risk factors including age, molecular subtype, tumor size, nodal status, histological grade, clinical stage, chemotherapy, and radiation therapy, and learning procedure is based on Cox proportional hazard regression. We included all clinical parameters to show that TACSscore or TMCF-score has a good prediction performance. These models were also applied to the internal validation and external validation cohorts to examine their generalizability. We found that the prediction performance of these models is generally stable on the two validation cohorts.
Moreover, patients were divided into low-risk and high-risk groups according to the cutoff value at the 5year time point. The optimal cutoff value was Fig. 1 The flowchart of this study. Large-scale TACSs were visually examined on SHG images by three independent reviewers. A total of 142 TACS corresponding microscopic features (TCMFs) were extracted from SHG images. Ridge regression and LASSO regression were used to calculate the TACS-and TCFM-score, and then the two scores were combined for a series of prognostic analysis. TACS1, curved collagen fibers wrap around the emergent tumor foci; TACS2, collagen fibers are stretched due to tumor growth and align more parallel to tumor boundary; TACS3, collagen fibers align perpendicular to the tumor boundary in a radiation pattern to facilitate tumor cell migration; TACS4, reticular distribution of collagen fibers adjacent to expanding tumor that leads to a clear tumor boundary; TACS5, directionally distributed collagen fibers that enable unidirectional tumor cell migration without a clear tumor boundary; TACS6, chaotically aligned collagen fibers that enable multidirectional tumor cell migration without a clear tumor boundary; TACS7, densely distributed collagen fibers at the tumor invasion front largely free of tumor cells; TACS8, sparsely distributed collagen fibers at the tumor invasion front largely free of tumor cells determined by the maximum Youden index in the training cohort obtained by the ROC curve analysis. Survival differences between the low-risk and high-risk groups in each cohort were evaluated by Kaplan-Meier survival analysis. The predicted high-risk group has worse DFS than the low-risk group as shown in Fig. 4

Difference between TCMF and TACS models
In order to elucidate the predictive performance of these models among different subgroups, we conducted a number of subgroup analyses classified by the clinical variables. We evaluated the ability of risk stratification and prediction of 5-year DFS of different models in different subgroups ( Table 2). The CLI model demonstrates poor In contrast, as can be seen in Table 2, the TACS model appears to be particularly prominent in high-risk patients as discussed extensively in our previous publication [13]. When the two models (TCMF+TACS) are combined, they could complement each other and perform well in all patients, highlighting its general applicability (Additional file 1: Table S4, Fig. S3), and the full model (CLI+ TCMF+TACS) achieves the best performance and further stratifies the low-and high-risk patients with outstanding values of HR (Additional file 1: Table S5). To further examine whether there was a specific benefit of these models, we stratified the patients into various subgroups (ER-positive group: luminal, ER+/N−, ER+/ HER2−, ER+/N−/HER2−, ER+/PR+; ER-negative group:  Table S6). Surprisingly, for both risk stratification capability (HR) and prediction accuracy (AUC), the TCMF model is particularly outstanding in the ER-positive and lower-risk combination subgroup, and the TACS model is counter-matched in the ER-negative and higher-risk combination subgroup (Additional file 1: Table S6). Similarly, the TCMF+TACS model is quite robust in all ER-positive and ER-negative subgroups (HR, AUC in the Additional file 1: Table S6, Cindex in the Additional file 1: Fig. S4). The differential benefit of the TCMF/TACS model is also revealed by the increases in HR, AUC from the CLI model to the full model (Additional file 1: Table S6).

Combination prediction of TCMF-and TACS-score
When TCMF is combined with TACS, the AUC increases to 0.876 (95% CI, 0.838 to 0.913) (Fig. 3A). The accuracy, sensitivity, and specificity of TCMF+TACS are also universally better than other single models. The full model achieves more improved AUC, accuracy, and specificity, but reduces sensitivity, which may be due to the low sensitivity of the CLI model (Table 3). High specificity would help the model avoid misidentifying low-risk patients as high risk, i.e., few low-risk individuals are diagnosed as a high-risk group, while good sensitivity allows the model to not miss high-risk patients, i.e., most highrisk patients are correctly diagnosed as a high-risk group. The TCMF+TACS model predicts 5-year DFS for all categories by six clinical variables (Fig. 5). Patients with a low-risk score have less frequent recurrence at 5 years than these with a high-risk score. The personalized prognostic potential of the TCMF+TACS-score implies that we could triage all patients by the six clinical variables into low-and high-risk groups with diverged stratification of a 5-year DFS rate. Moreover, not all patients with small tumors (patients with a tumor 2 cm in diameter or smaller) and luminal A patients are at low risk. Those patients with a high-risk TCMF+TACS-score have a 54.1 and 37.0% risk of recurrence at 5 years, respectively.
To further evaluate the prognostic performance of different models, we calculated an integrated cumulative/dynamic area under the curve (iAUC) for the five models, respectively, being 70.6%, 71.7%, 77.1%, 79.8%, and 83.3% in the training cohort (Fig. 3B). Applying the models to the internal and external validation cohorts' results in iAUC values of 71.4%, 70.9%, 75.9%, 80.5%, 83.6%, and 68.7%, 74.5%, 74.4%, 77.9%, 81.8%, respectively. Figure 3C shows the AUC for time-dependent ROC performance, and the performance of these models in predicting recurrence risk remains stable with only small fluctuation in time-dependent AUC values as we move away from the time point of diagnosis. These results highlight the importance of tumor-stroma spatial patterns.
As illustrated in Fig. 4A (Figs. 3 and 4A). To confirm that the TCMF+TACS model has an outstanding prognostic value in different populations, we further extended it to the internal and external validation cohorts and acquired the similar results (Figs. 3 and 4B, C). Thus, high discriminatory accuracy and super risk stratification ability validate TCMF+TACS-score as a strong DFS prognosticator.
Moreover, we propose that the combined analysis of CLI, TCMF, and TACS can improve the prognostic stratification of breast cancer patients by evaluating the high and low risks of different models. The risk groups of CLI low /TCMF low /TACS low represent that primary breast cancer was controlled by standard treatment with improved DFS. In contrast, the risk groups of CLI high / TCMF high /TACS high would represent a population of patients at risk for recurrence. Therefore, as can be seen from the Kaplan-Meier curves (Additional file 1: Fig. S5, upper left), low-risk groups of TCMF and TACS are associated with a higher DFS rate regardless of the CLI risk category. On the contrary, there is a poorer DFS rate when TCMF and TACS are both high (Additional file 1: Fig. S5, bottom right).
In addition, TCMF and TACS-score remain as independent prognostic factors by the multivariate analysis of DFS (Additional file 1: Table S1), along with the clinicopathologic biomarkers such as molecular subtype, tumor size, nodal status, and chemotherapy. Other clinical factors, such as age, clinical stage, histological grade, and radiation therapy, are excluded because they are not significantly associated with DFS in the univariate and multivariate Cox proportional hazard regression analyses (P > 0.05). Then, a clinically applicable nomogram integrating these independent prognostic factors was developed to predict 1-year, 3-year, and 5-year DFS in the training cohort (Additional file 1: Fig. S6A). Calibration plots (Additional file 1: Fig. S6B) show that nomogram performs well (C-index 0.88, 0.86-0.91 for the training cohort; 0.85, 0.82-0.89 for the internal validation cohort; and 0.85, 0.81-0.88 for the external validation cohort) and displays excellent agreement between the observed and predicted rates in the three cohorts (proximity to 45°diagonal line).

Clinical applications
Finally, we used the CLI, TCMF, TACS, and CLI+ TCMF+TACS models to evaluate the effect of postoperative adjuvant treatment on different recurrence risk groups stratified via the treatment guideline in China, which agrees with the 10th St Gallen expert consensus [35]. This guideline allowed 590 patients to be classified as minimum risk (n = 28) or moderate risk (n = 562) (defined together as low risk) for less aggressive treatment, and 352 patients as high risk for more aggressive treatment (Additional file 1:  Table S7). Therefore, there may be inadequate treatment or overtreatment for some patients after standard treatment.

Discussion
At present, conventional clinical management of breast cancer mainly depends on the traditional prognostic factors, including patient's age, molecular subtype, tumor size, lymph node metastasis, clinical stage, and histological grade, in addition to estrogen receptor (ER), progesterone receptor (PR), and HER2 [36,37]. These biomarkers are valuable for predicting local recurrence and distant metastasis [38]. However, patients with the same pathological features often have different outcomes, suggesting that the current clinical factors are inadequate for predicting prognosis [39]. Therefore, the search for new prognostic and predictive biomarkers is motivated to promote the prognostic evaluation and improve patient outcome. In our previous study, we found that TACS1-8 emerge as a tumor microenvironmentbased structural prognosticator, but these large-scale collagen patterns are macroscopic morphologies observed from multiphoton images [13].
High-throughput collagen microscopic features including morphological and textural features can be extracted through image processing to quantify the differences between tissues. The morphological features could quantify shape-related properties, such as collagen area, number, length, width, straightness, crosslink density, crosslink space, and orientation. Recently, Sprague et al. found that multiple morphological features of collagen fibers around DCIS were associated with DCIS recurrence risk [40]: patients with greater collagen width and density had a lower risk of recurrence, while patients with higher fiber straightness and distance to the nearest two fibers had a higher risk of recurrence. However, Case et al. reported that collagen width is associated with poor outcome by multivariate analysis [41]. The texture features, including histogram, gray-level co-occurrence matrix (GLCM), and Gabor wavelet transformation feature, have been proved to be popular features in biomedical image analysis [19,22,[42][43][44]. Texture analysis, which is based on first-order and second-order statistics, can be used to extract the image features associated with the structural and biochemical changes in collagen networks and quantitatively track the alterations correlated with collagen remodeling [45]. GLCM is a second-order statistical texture feature that reflects the spatial heterogeneity of collagen fibers with five different displacements of pixels and four different directions, and Gabor wavelet transformation is also a type of textural analysis that could reflect the spatial relationship of images in different scales and orientations after the convolution of images. Texture analysis has been employed extensively in magnetic resonance imaging (MRI) for distinguishing between breast cancer subtypes [46], differentiating between benign and malignant lesions [47], and evaluating the relationship between textural features and survival outcomes of patients with breast cancer [48], but not for evaluating the collagen spatial structure because of the limitation in resolution of MRI [49]. In this study, morphological and textural analyses were used for extracting the microscopic features of collagen fibers in high spatial resolution MPM images.
We used computer-assisted image feature processing to automatically extract the corresponding microscopic feature of each TACS (TCMF), and then made use of the least absolute shrinkage and selection operator (LASSO) regression to choose the most robust features to acquire a TCMF-score. Statistical analyses show that TCMF-score is a strong independent prognostic indicator for predicting DFS. Interestingly, we found different prognostic values for subgroups when the TACS and TCMF were used for evaluating the accuracy (AUC) of diagnosing 5-year DFS and risk stratification ability. TACS-score would be helpful to recognize a significant percentage of high-risk patients classified through the clinicopathological factors. By contrast, TCMF-score performs better for the low-risk patients, especially for hormone-positive breast cancer, which may compete with the popular multigene assays on prognosis [50,51]. As we all know, microstructural changes will precede the changes of macrostructure. The changes in microstructure may occur in the early stage of diseases, but cannot be observed visually. It is these microstructural changes, rather than the macrostructural changes, that may have the sensitivity to signify important features for detecting early lesions [52]. This may be the reason why the microscopic collagen features are more suitable for identifying low-risk patients, while the macroscopic collagen patterns are more suitable for identifying high-risk patients. It is a promising method for the early detection of lesions by microstructural changes. What is more, the TCMF+TACS-score is far better than the wellestablished clinicopathological factors in prognostic performance, highlighting the obbligato role of the tumor microenvironment in cancer progression. This study would extend our previous research on TACS and highlight the importance of microscopic collagen morphology changes.

Conclusions
In summary, we used computer-assisted image processing to automatically extract the corresponding microscopic features of each TACS (TCMF) and then utilized the LASSO regression to screen out the most robust features to establish a TCMF-score. TCMF and TACS would reflect the microscopic and macroscopic morphology changes of collagen fibers in the breast tumor microenvironment. A combination of TCMF-score with TACS-score could be used for predicting individual disease-free survival rate with good statistical significance, high discriminatory accuracy, and superior risk stratification ability. With the increasing automation of image processing, TCMF+TACS screening has great potential to become a clinical diagnostic tool, providing more accurate prognosis information.
Additional file 1: Supplementary Methods: 1.1 Multiphoton imaging system. 1.2 TACS-score calculation formula. 1.3 Morphological features. 1.4 Texture features. 1.5 TCMF-score calculation formula. Figures S1-S10: Fig. S1. Recruitment pathways for patients. Fig. S2. Distribution histogram of TCMF-score, TACS-score and TCMF+TACS-score. Fig. S3. Hazard ratio of 5-year recurrence for all patients according to the TCMF+TACSscore classifier in different subgroups stratified by clinical parameters. Fig.  S4. Forest plot shows the concordance index (C-index) of different prediction models (TCMF, TACS, and TCMF+TACS) in the different risk combination patients. Fig. S5. Kaplan-Meier survival curves for comparing the risk stratification ability of the CLI-score under four different situations (TACS-low risk and TCMF-low risk, TACS-high risk and TCMF-low risk, TACS-low risk and TCMF-high risk, TACS-high risk and TCMF-high risk). Fig. S6. Nomogram and Calibration curves. Fig. S7. TCMF selection using LASSO cox regression analysis. Fig. S8. Schematic of extracting TACS corresponding microscopy features (TCMF). Fig. S9. Kaplan-Meier curves of DFS according to the CLI, TCMF, TACS, TCMF+TACS, and CLI+TCMF+TACS models for the low risk and high risk patients classified by the treatment guideline. Fig. S10. Representative MPM images of three patients to illustrate the extraction of TCMF. Tables S1-S11: Table S1. Univariate and multivariate Cox proportional hazards regression analyses of the association of variables with DFS in the training cohort. Table S2. Univariate and multivariate Cox proportional hazards regression analyses of the association of variables with DFS in the internal validation cohort. Table  S3. Univariate and multivariate Cox proportional hazards regression analyses of the association of variables with DFS in the external validation cohort. Table S4. Risk stratification ability of TCMF+TACS-score in subgroups of the three cohorts. Table S5. Risk stratification ability of the five prediction models in three cohorts. Table S6. Prognosis performance of the five models at 5-year DFS in patients stratified by the combination of different risk factors. Table S7. The number of likely undertreated (orange-highlighted) and overtreated (purple-highlighted) patients according to the Chinese treatment guideline and four models. Table S8. Morphological features. Table S9. First order intensity histogram features. Table S10. Grey-level co-occurrence matrix-based features. Table S11. Gabor wavelet transform features.