A compact VEGF signature associated with distant metastases and poor outcomes

Background Tumor metastases pose the greatest threat to a patient's survival, and thus, understanding the biology of disseminated cancer cells is critical for developing effective therapies. Methods Microarrays and immunohistochemistry were used to analyze primary breast tumors, regional (lymph node) metastases, and distant metastases in order to identify biological features associated with distant metastases. Results When compared with each other, primary tumors and regional metastases showed statistically indistinguishable gene expression patterns. Supervised analyses comparing patients with distant metastases versus primary tumors or regional metastases showed that the distant metastases were distinct and distinguished by the lack of expression of fibroblast/mesenchymal genes, and by the high expression of a 13-gene profile (that is, the 'vascular endothelial growth factor (VEGF) profile') that included VEGF, ANGPTL4, ADM and the monocarboxylic acid transporter SLC16A3. At least 8 out of 13 of these genes contained HIF1α binding sites, many are known to be HIF1α-regulated, and expression of the VEGF profile correlated with HIF1α IHC positivity. The VEGF profile also showed prognostic significance on tests of sets of patients with breast and lung cancer and glioblastomas, and was an independent predictor of outcomes in primary breast cancers when tested in models that contained other prognostic gene expression profiles and clinical variables. Conclusion These data identify a compact in vivo hypoxia signature that tends to be present in distant metastasis samples, and which portends a poor outcome in multiple tumor types. This signature suggests that the response to hypoxia includes the ability to promote new blood and lymphatic vessel formation, and that the dual targeting of multiple cell types and pathways will be needed to prevent metastatic spread.


Background
Metastases are the main cause of mortality for patients with breast cancer. The molecular biology behind metastasis is complex and likely requires changes in cell cycle regulation [1], the repertoire of expressed proteases and protease inhibitors [2], proteins that promote autocrine growth loops, and/or proteins that cause an epithelial-tomesenchymal transition [3]. To make matters more complicated, it is clear that metastasis biology is in part governed by non-tumor cells including fibroblasts [4], endothelial cells [5], and myoepithelial cells [6]. For example, recent evidence suggests that tumor endothelial cell interactions are important for determining patient outcomes as evidenced by the promising results from clinical trials that use bevacizumab, a monoclonal antibody directed against vascular endothelial growth factor (VEGF) [7,8].
Genomic profiling of human tumors and model systems has identified important features concerning metastasis biology. First, it has been shown that the expression profile of primary tumors without metastases can be highly predictive of the development of future metastases [9][10][11][12][13]. Second, cell lines can be selected that have specific endorgan tropisms with distinct expression profiles [14,15]. Finally, cell line and murine models have demonstrated many different genes as being important for breast tumor metastasis, including Twist [16], Snail [3], and CXCL12 [17]. In this paper, we compare primary breast tumors, regional metastases, and distant metastases with each other and show that distant metastasis samples are distinct and provide unique signatures that predict poor outcomes in primary tumors.

Tissue samples and microarray protocols
One hundred and forty-six patients represented by 161 breast tumor specimens (with 23 paired tumor samples) and 10 normal breast samples (195 total microarrays) were profiled. Most of these samples appeared in previous publications [18][19][20], with 39 being new to this study, and all of which were collected using institutional review board-approved protocols. The clinical information for all samples is in the table in Additional file 1. Included within the 161 profiled tumors were 134 primary tumors, nine regional metastases and 18 distant metastases. Patients were heterogeneously treated in accordance with the standard of care dictated by their disease stage, estrogen receptor (ER) and HER2 status.
Total RNA isolation and microarray protocols are described in Hu et al [21]. Each sample was assayed versus a common reference sample [22]. The microarrays used were Agilent Human oligonucleotide microarrays that were scanned on an Axon GenePix 4000B, analyzed with GenePix Pro 4.1, and Lowess normalized. All microarray data have been deposited into the GEO under the accession number of GSE3521.

Supervised microarray data analysis
The background-subtracted, Lowess-normalized log 2 ratio of Cy5 over Cy3 intensity values were filtered to select genes that had a signal intensity of > 30 units in both the Cy5 and Cy3 channels. Only genes that met these criteria in at least 70% of the 195 microarrays were included for subsequent analysis. Next, each patient was classified according to the following metastasis scoring system (MetScore): MetScore = 1 were patients that had a primary tumor and were clinically node negative (N = 0) and distant metastasis negative (M = 0); MetScore = 2 were patients that had a regional metastasis (N = 1-3) and no distant metastasis (M = 0); MetScore = 3 were patients with confirmed distant disease at the time of diagnosis (M = 1 and any N) or that were represented by an actual distant metastasis sample. We next performed a multi-class significance analysis of microarrays (SAM) using a single sample from each patient, biasing the sample selection to use the actual regional or distant metastasis samples (146 arrays, see Additional file 1). We identified the gene set that was associated with the MetScore 1-2-3 distinction, which gave 1195 genes at a false discovery rate of 5%. This gene set was next used in a one-way average linkage hierarchical cluster using the program 'Cluster' [23], with the data being displayed relative to the median expression for each gene using 'Java Treeview' [24].

Cross-validation analyses
Relationships between the gene expression data and the MetScore classification was further examined using a 10fold cross-validation (CV) analysis to identify a set of genes that might distinguish a MetScore group from the others. 10-fold CV using five different statistical predictors including PAM [25], a k-nearest neighbor classifier with either Euclidean distance or one-minus-Spearman-correlation as the distance function, and a class nearest centroid metric with either Euclidean distance or one-minus-Spearman-correlation as the distance function, were used as described in Chung et al [26]. We performed 10-fold CV using the five different statistical predictors with the reported CV prediction accuracies being the average of the five predictors (Tables 1, 2, 3 and 4). For the VEGF profile, an average expression value across  all 13 genes (RRAGD, FABP5, UCHL1, GAL, PLOD,  DDIT4, VEGF, ADM, ANGPTL4, NDRG1, NP, SLC16A3 and C14ORF58) was determined and the patients were placed into a three-group classification based their 13gene average log 2 expression ratio from the University of North Carolina (UNC) training data set and using the cut off values (-0.63/0.08) that were identified using X-tile [27] and relapse-free survival as the endpoint. Analyses using the VEGF profile and the training set cutoffs were also applied to an independent test set of 295 patients assayed on Agilent microarrays (that is, NKI295 [28]), to a set of lung carcinoma samples from Bhattacharjee et al [29], and to the glioblastoma sample set from Nutt et al [30]. To perform these across-data set analyses, for the NKI295 dataset we used the log ratio of red channel intensity versus green channel intensity and the data was median centered for every gene across the 295 arrays. The Netherlands Cancer Institute (NKI) dataset was then distant weight discrimination (DWD) normalized [31] with the UNC training dataset after collapsing by NCBI Entrez GeneID; after DWD normalization, the NKI data was also column standardized. For the Affymetrix datasets the probe level intensity .CEL files were processed by robust multi-chip average. The probe sets' log intensity was median centered for every gene across all the arrays. The Affymetrix datasets were also DWD normalized relative to the UNC training data after collapsing by NCBI Entrez GeneID, and were column standardized.

Multiple expression predictor analyses
First, each sample was assigned an 'intrinsic subtype' as described in Hu et al [18], where a centroid was created for each of the following intrinsic subtypes: Basal-like, Luminal A, Luminal B, HER2-enriched and Normal-like. Next, we tested for associations between a tumor's intrinsic subtype, the VEGF profile and other published expression profiles implicated in metastasis biology that included a) the 70-gene outcome predictor developed by van't Veer et al [10,11], b) the 'wound-response' profile [32], c) the hypoxia-induced cell line signature [33], d) the 11-gene BMI/stem cell signature [34], e) a bone metastasis signature [14], f) a lung metastasis signature [15], and g) the expression profiles of HIF1α, Snail [3] and Twist [16]; we extracted as many genes as was possible from our micro-  arrays for each predictor and followed the classification scheme described by the authors. For the bone metastasis signature [14], we created an average value for each patient using the 43 genes that were highly expressed in the cell line derivatives that metastasized to the bone; we performed a similar analysis for the lung metastasis signature [15].
Lastly, for the 11-gene stem cell signature [34], we created an average value across all 11 genes. We also created a 'glycolysis-profile' by starting with the nine glycolysis genes/ probes present on the array, then filtering for probes that showed > 30 intensity units in both channels, and then selecting for 70% good data across all samples; next we selected the subset of glycolysis gene probes that passed filtering and showed a Pearson correlation of greater than 0.4, which resulted in the selection of six out of nine glycolysis genes (GPI, PKM2, PFKP, PGK1, GAPD, ENO1), which were then used to create an average profile for each patient.
We examined correlations between profiles using multiple methods (Additional file 1): for quantized profile testing, Chi-squared analysis and Fischer's exact test were used. For continuous variable testing, ANOVA analyses were performed. Finally, we also performed a calculation of the Cramer's V statistic for the evaluation of the strength of association between two quantized variables (see Oh et al [19]).

Survival analyses
Univariate Kaplan-Meier analysis was performed with a log-rank test using WinSTAT for excel. Multivariate analysis of the NKI295 test set using Cox proportional hazards modeling was conducted in SAS version 9.1; a Cox hazard model was tested that included estrogen receptor status (coded as positive vs. negative), tumor size (coded as ≤ 2 cm vs. > 2 cm), lymph node status (codes as 0, 1-3, > 3 positive nodes or M = 1), age (continuous variable, formatted in decades), grade (coded as grade 1 vs. 2, and grade 1 vs. 3), and treatment (coded as yes if treatment with chemo and/or hormonal therapy, no if no adjuvant therapy was given), and the VEGF profile of low, intermediate or high as a single categorical variable. Another Cox model was also tested that included all the clinical variables, the VEGF profile, and other expression predictors [11,13,18,19,28,35].

In situ hybridization and immunohistochemistry
In situ hybridization (ISH) on tissue microarrays containing 250 different human breast tumors (not related to the 146 patients used for microarray analysis) was performed as previously described [36]. Briefly, digoxigenin (DIG)labeled sense and anti-sense RNA probes are generated by  PCR amplification of approximately 450 bp products with the T7 promoter incorporated into the primers; the primer sequences used for amplification were VEGF (Forwardtctccctgatcggtgacagt, Reverse-tcgaaaaactgcactagagacaa), ANGPTL4 (Forward: gggaatcttctggaagacctg, Reversetacacacaacagcaccagca) and ADM (Forward-gtgtttgccaggcttaagga, Reverse-tcggtgtttccttcttccac). In vitro transcription was performed with a DIG RNA-labeling kit and T7 polymerase according to the manufacturer's protocol (Roche Diagnostics, Indianapolis, IN). Immunohistochemistry (IHC) was performed for HIF1α using Mouse Anti-Human HIF1α (BD Biosciences #610958) according to the protocol from Vleugel et al [37]; the tumors were scored for perinecrotic and diffuse staining as described in Vleugel et al.

Expression patterns associated with metastases
To identify gene expression patterns associated with breast cancer metastases, we performed 195 microarrays repre-senting 134 primary tumors, nine regional metastases and 18 distant metastasis specimens (146 different patients and 10 normal breast tissues). Each patient was classified according to a MetScore, which is roughly analogous to stage except that tumor size was not considered (see Methods). As expected, this scoring system was highly predictive of patient outcomes ( Figure 1A and 1B). Using the MetScore classifications, we performed CV analyses to determine if any MetScore group might be distinct relative to the others. Low accuracy rates (56% to 65%) for the prediction of MetScore 1 vs. MetScore 2 specimens were observed; however, when MetScore 1 vs. MetScore 3 (80% to 85%) or MetScore 2 vs. MetScore 3 samples (81% to 83%) were compared, high accuracy rates were obtained, which suggests that MetScore 3 was the most distinct group.
Next, we performed a multi-class significance analysis of microarray [38] analysis using a single sample from each of the 146 patients and the MetScore 1-2-3 grouping and Kaplan-Meier survival plots  where the samples were first ordered according to Met-Score, and then according to their correlation to the average profile (that is, centroid) of the MetScore 3 class. This analysis demonstrates that some MetScore 1 and 2 samples actually have a MetScore 3 profile; a similar result has been shown before by Ramaswamy et al [9].
The gene expression patterns from this SAM analysis were complex and there were few, if any, that directly correlated with a simple progression from MetScore 1 to 2 to 3. Included within this gene set were many clusters and/or gene sets that have been identified previously, including a luminal/ER+ pattern [11,39,40] and a proliferation signature [41,42], both of which are integral parts of a gene expression assay that predicts the likelihood of recurrence in ER+ and patients treated with tamoxifen [13]. In addition, many other biologically important gene sets were identified, including an 'immediate early' gene cluster containing c-FOS and JUNB (Figure 2A) [43], a set of fibroblast genes containing PLAU, THSB2 and multiple collagen genes ( Figure 2B), a set of immune cell genes ( Figure 2D), and a gene set containing CXCL12 ( Figure  2C); CXCL12 was the top-ranked gene from this SAM analysis and was recently identified as a chemokine whose high expression promotes tumor cell proliferation, migration and invasion [17]. Analysis of these individual clusters by EASE [44], with both EASE score and Bonferroni < One-way average linkage hierarchical cluster analysis  0.05 used as the cut off, identified many significant gene ontology categories that included 'transcription regulation' and 'DNA/nucleic acid binding' for the FOS-JUN cluster, while the fibroblast cluster was over-represented for 'extracellular matrix', 'cell adhesion and communication', 'organogenesis', 'development', and 'regulation of protease activity'. The CXCL12 cluster was over-represented for 'cell adhesion', 'cell migration' and 'extracellular matrix'. Lastly, a small 13-gene cluster containing VEGF, Adrenomedulin (ADM) and Angiopoietin-like 4 (ANGPTL4) was identified as the 'VEGF-profile' ( Figure  2E), which is discussed below in greater detail.
Our previous work identified five 'intrinsic' subtypes of breast cancer that are of prognostic and predictive value [18,41,45]. Subtype classification of the tumors using the centroid predictor from Hu et al [18] showed significant outcome predictions ( Figure 1C and 1D). A Chi-squared test (p = 0.0006) showed that intrinsic subtype was significantly correlated with MetScore, with the Basal-like and HER2-enriched groups being the most frequent in Met-Score 3 and with no Luminal A samples being in the Met-Score 3 group. Correlations between tumor subtype and stage have been described [46,47], and were recapitulated here.

Analysis of the VEGF profile
A small cluster of genes containing VEGF was identified ( Figure 2E) that showed high expression in MetScore 3 tumors. This gene cluster contained several secreted proteins that have been implicated in endothelial cell (VEGF and ANGPTL4), lymphatic cell (ADM) and smooth muscle cell (GAL) dynamics. As a step in evaluating this profile, we performed ISH to determine what cell type was producing VEGF, ANGPTL4 and ADM. In the vast majority of cases that showed strong ISH positivity (which totaled approximately 10% of the 250 tumors tested), it was the tumor cells themselves that produced the mRNA for these three genes, and typically all three were produced (Figure 3). In a few cases, both tumor and fibroblasts showed ISH positivity, but this was rare.
As a second step in the evaluation of the VEGF profile, we created an average expression ratio of the 13 genes for each patient and looked for correlations with outcome. By dividing the patients into low, intermediate and highexpression groups using relapse-free survival (RFS) and cutoffs determined by X-tile [27], we saw that the VEGF profile was prognostic of RFS ( Figure 4A) and overall survival (data not shown) with the high expression portending a poor outcome. Rank order expression classifications (two or three groups) were also robust methods of predicting outcomes (Additional file 3). Applying the VEGF profile classification rules to an independent test set of 295 patients (that is, NKI295) [10,28] also significantly In situ hybridization

ANGPTL4
VEGF predicted outcomes ( Figure 4B), as did rank order classifications (Additional file 3). This classification rule was also of prognostic value on a set of lung carcinoma samples ( Figure 4C and Additional file 3E and 3F), although there were too few 'low' samples to be included into the Kaplan-Meier plot analysis, and on a set of patients with glioblastoma ( Figure 4D and Additional file 3G and 3H); we noted that two genes (ANGPTL4 and C14ORF58) were not found on the Affymetrix platform for lung and glioblastoma test data sets. However, the Pearson correlation is 0.992 (UNC training dataset) and 0.986 (NKI295 test dataset) respectively between the average of the 13 genes and that of the 11 genes (omitting ANGPTL4 and C14ORF58). We repeated the survival analysis for the UNC dataset and NKI test set again using the 11 genes and the results were very similar (data not shown).
A multivariate Cox proportional hazards analysis on the NKI295 test set using RFS was performed using clinical variables and the VEGF profile, and it was determined that the VEGF profile was a significant predictor of RFS (Table  1). In Fan et al [48], we evaluated the prognostic powers and concordance across multiple expression predictors including the intrinsic subtypes, the NKI 70 gene signature, a microarray-based version of the Genomic Health Inc. Recurrence Score, and the wound-response profile using this same NKI patient data set, and we have also identified other profiles of prognostic significance includ-Univariate Kaplan-Meier survival plots Figure 4 Univariate Kaplan-Meier survival plots. Univariate Kaplan-Meier survival plots of survival for patients stratified using the VEGF profile on the A) UNC training data set, B) NKI test data set, C) Bhattacharjee et al lung carcinoma data set [29], and D) Nutt et al glioblastoma data set [30]. Note: two genes ANGPTL4 and C14ORF58 were not found on Affymetrix platforms for C and D. ing an estrogen pathway [19] and p53 mutation profiles [35]; therefore, we performed a Cox proportional hazards analysis (Table 2) with backwards variable selection  (Table 3) to evaluate a model that contained all of the aforementioned gene expression predictors and clinical variables. The final model contained both clinical parameters and multiple gene expression predictors including the VEGF profile (Table 3). Similar results were obtained when using time to distant metastasis formation, or overall survival (data not shown), or when treating the VEGF profile as a continuous variable ( Table 4).

Analysis of a glycolysis-profile and HIF1α expression
A biological implication of the VEGF profile is that it is related to a tumor's response to hypoxic conditions, which historically has been referred to as the Warburg effect [49,50]. A central tenant of the Warburg effect is that a tumor's metabolism becomes more dependent upon glycolysis due to anaerobic conditions. To examine glycolysis using a genomic approach, we created a 'glycolysisprofile' using the six most highly correlated glycolytic enzyme probes (GPI, PKM2, PFKP, PGK1, GAPD, ENO1, Figure 5A); the VEGF profile and the six best glycolysis probes were highly correlated (p < 0.001, Table 5).
HIF1α is a known regulator of VEGF expression, and therefore we determined that HIF1α mRNA gene expression was correlated with the VEGF profile (p = 0.0004; Table 5); in addition, 'perinecrotic' HIF1α IHC staining as defined by Vleugel et al [37] was also assayed on a subset of 66 of these tumors and was correlated with expression of the VEGF profile (ANOVA p-value = 0.018, data not shown), while a 'diffuse' HIF1α IHC profile was not. Next, the promoter region of each of the genes in the VEGF profile was examined using the program rVISTA [51] and showed that DDIT4, VEGF, NDRG1, SLC16A3, PLOD, ADM, ANGPTL4 and C14ORF58 all had hypoxia response elements within 2000 bp upstream of their start codons; it is already known that many of these genes including VEGF [52], ADM [53], and DDIT4 [54] are HIF1α-regu-lated. Nearly identical genomic results were also obtained from the NKI295 test set ( Figure 5B).

Fibroblast signature
A fibroblast/mesenchymal signature was another profile that changed with MetScore ( Figure 2B), and thus to examine the potential fibroblast cell content present within each MetScore group we determined each patient's average expression value of the genes contained with the cluster presented in Figure 2B. This gene set contains fibrillin, fibroblast activation protein alpha, six collagen VEGF profile, glycolysis and HIF1α gene expression analyses  protein subunits and versican, which are genes and/or proteins that are typically produced by fibroblast and/or mesenchymal cells [55]. This analysis shows that the fibroblast profile is correlated with intrinsic subtype ( Table 5, p = 0.012) and that the MetScore 3 samples had the lowest average expression compared with the Met-Score 1 and 2 samples (ANOVA p-value = 0.005, data not shown). Pathological examination of H&E sections of the distant metastasis samples also supports this conclusion and shows scant admixed mesenchymal cells in the distant metastasis samples versus their primaries that show abundant admixed mesenchymal cells ( Figure 6).

Correlations between multiple metastasis associated profiles
We examined whether the intrinsic subtypes, the Met-Score classification, and the VEGF signature correlated with any of the following expression profiles that have been associated with metastatic potential: a) the NKI 70gene predictor [10,11], b) the 'wound-response' profile [32], c) a cell line-derived hypoxia profile [33], d) an 11gene BMI/stem cell signature [34], e) a bone metastasis signature [14], f) a lung metastasis signature [15], g) a hypoxia metagene [56], and h) the expression profile of three individual genes (HIF1α, Snail [3], and Twist [16]). These analyses identified a large amount of concordance across profiles (Table 5). For example, the breast tumor subtype was significantly correlated with the bone and lung profiles, Snail expression, and the 11-gene stem cell signature; in particular, the bone and lung profiles were associated with both ER-negative subtypes (Basal-like and HER2-enriched), and Snail expression and the 11-gene stem cell signature were the highest within the Basal-like subtype. Similar results were also observed when the VEGF profile was compared with the other profiles. Two 'hypoxia signatures' have been described and shown to be of prognostic value across a variety of tumor types including breast [33,56]; the large signature of Chi et al [38] showed a four-gene overlap with the VEGF profile (ADM, NDRG1, DDIT4 and ANGPLT4) while the 'hypoxia-metagene' of Winter et al [56] showed a three-gene overlap (VEGF, NDRG1 and ANGPLT4); as might be expected, all three of these profiles were correlated (Table 5, p < 0.0001).

Discussion
We took a genomics approach to study metastasis biology and classified patients with breast cancer according to the presence and location of their metastases (that is, Met-Score). The resulting analyses showed that the most distinct group with the most distinguishing features were the distant metastases; few differences were seen between primary tumors and regional metastases, as has been shown before [57]. When the set of genes that were correlated with MetScore was determined, many previously known H&E images of a primary breast tumor taken from a Met-Score 3 patient C gene sets were identified including proliferation [58], ER status [11,39,40], and fibroblast and/or mesenchymal genes [55,59]. Notable distant metastasis features included the low expression of fibroblast genes (and a corresponding paucity of fibroblasts as defined by histological examination) and the high expression of the VEGF profile. The VEGF profile represents a in vivo defined gene expression program that includes a combination of cellintrinsic and cell-extrinsic factors. The cell-extrinsic factors have known roles as inducers of endothelial cell growth (VEGF and ANGPTL4), inducers of lymphatic vessel growth (ADM) [60], and smooth muscle cell dynamics (GAL); thus, the expression of this gene set would appear to increase the likelihood of tumor survival by causing de novo vessel formation and providing a dual conduit for metastatic spread. The cell-intrinsic factors include the high expression of SLC16A3, whose function is to efflux the lactic acid out of the cell that occurs during high glycolytic activity, and the expression of NDRG1, which is a known hypoxia-inducible gene [61,62]. In addition, the tumors that highly express the VEGF profile also highly express glycolytic enzymes. In total, our data suggests poor-outcome distant metastasis samples have the intrinsic ability to promote vessel formation, the intrinsic ability to live under anaerobic conditions, and have lost dependence upon fibroblasts.
Many genomic profiles for breast tumor metastasis biology have been identified, and we therefore compared them with each other and determined that significant correlations exist. In particular, all metastasis profiles tested correlated with 'intrinsic subtype'. For example, the Basallike subtype showed significant correlation with the 11gene stem cell profile, the lung and the bone metastasis profiles (consistent with these observations, one of the MetScore 3 Basal-like patients had distant metastases present in the bone, lung and liver). The Basal-like subtype also showed high expression of Snail, and Basal-like tumors have been shown to have other features of epithelial-mesenchymal transition [63] including vimentin expression [64].

Conclusion
The VEGF profile showed very significant prognostic value when using primary tumors, even when tested in models that contained many other expression predictors and clinical variables. We also believe it possible that the VEGF profile may have predictive value for angiogenesis inhibitors because it contains VEGF and ANGPTL4, which are inducers of angiogenesis. How, or if, the VEGF profile is correlated with response to angiogenesis inhibitors remains to be determined; however, our profile does suggests that effective anti-angiogenesis therapies for patients who express this profile may need to extend beyond VEGF to include the simultaneous targeting of ANGPTL4 and/or ADM.

Additional file 1
Supplementary