- Research article
- Open Access
- Open Peer Review
Improved survival among colon cancer patients with increased differentially expressed pathways
BMC Medicinevolume 13, Article number: 75 (2015)
Studies of colorectal cancer (CRC) have shown that hundreds to thousands of genes are differentially expressed in tumors when compared to normal tissue samples. In this study, we evaluate how genes that are differentially expressed in colon versus normal tissue influence survival.
We performed RNA-seq on tumor/normal paired samples from 175 colon cancer patients. We implemented a cross validation strategy to determine genes that were significantly differentially expressed between tumor and normal samples. Differentially expressed genes were evaluated with Ingenuity Pathway Analysis to identify key pathways that were de-regulated. A summary differential pathway expression score (DPES) was developed to summarize hazard of dying while adjusting for age, American Joint Committee on Cancer (AJCC) stage, sex, and tumor molecular phenotype, i.e., MSI, TP53, KRAS, and CIMP.
A total of 1,138 genes were up-regulated and 695 were down-regulated. These de-regulated genes were enriched for 19 Ingenuity Canonical Pathways, with the most significant pathways involving cell signaling and growth. Of the enriched pathways, 16 were significantly associated with CRC-specific mortality, including 1 metabolic pathway and 15 signaling pathways. In all instances, having a higher DPES (i.e., more de-regulated genes) was associated with better survival. Further assessment showed that individuals diagnosed at AJCC Stage 1 had more de-regulated genes than individuals diagnosed at AJCC Stage 4.
Our data suggest that having more de-regulated pathways is associated with a good prognosis and may be a reaction to key events that are disabling to tumor progression.
Please see related article: http://dx.doi.org/10.1186/s12916-015-0307-6.
Cancer is a multifaceted disease, characterized by dysregulation of multiple genes in multiple pathways. Gene expression studies have demonstrated the extent to which genes are altered in tumors. Studies of colorectal cancer (CRC) have shown that hundreds to thousands of genes are differentially expressed in tumors when compared to normal tissue samples . While gene expression studies are limited in their ability to distinguish the importance of individual genes that are differentially expressed in tumors; assessment of unique features of these genes and their associated pathways has shed light on important molecular differences between tumors. Studies have used gene expression data to classify tumor phenotypes as well as evaluate tumors with microsatellite instability [2-4]. Nannini et al.  summarized the utility of gene expression profiling into three categories: molecular diagnosis and disease classification; molecular characterization, including molecular staging, treatment prediction, and prognosis prediction; and target discovery that can lead to new treatment options.
In this study, we perform gene expression analysis with the goal of molecular characterization and prognosis prediction focused on identifying molecular pathways that are associated with outcome.
Our analysis takes a pathway approach with the goal of improving our knowledge of molecular diagnosis and prognosis prediction. We classify genes that are significantly differentially expressed into pathways relevant to the carcinogenic process for colon cancer. We evaluate the impact of these significant pathways on survival and disease stage. We believe that our sequential analytic approach will provide insight into the carcinogenic process and provide a better understanding of the biological significance of these pathways in colon cancer as well as insight into therapeutic possibilities.
We used RNA from 175 tumor and normal pairs who were part of the Diet, Activity, and Lifestyle study, which is an incident, population-based, case-control study of colon cancer conducted in Utah, the Kaiser Permanente Medical Research Program (KPMRP), and the Twin Cities Metropolitan area. Normal tissue was taken from tissue adjacent to the tumor and was determined to be free of any tumor cells by the study pathologist. Tumor and non-tumor colonic tissue (subsequently referred to this as ‘normal’) for RNA extraction were available from the Utah and KPMRP sites. Cases had to have tumor registry verification of a first primary adenocarcinoma of the colon and diagnosed between October 1991 and September 1994. Tumor tissue was obtained for 97% of all Utah cases diagnosed and for 85% of all KPMRP study participants , including those who signed informed consent and those retrieved by local tumor registries and sent to study investigators without personal identifiers. The study was approved by the Institutional Review Board of the University of Utah and at KPMRP.
We have previously assessed these tumor samples for Tumor protein p53 (TP53) and KRAS mutations, the CpG island methylator phenotype (CIMP) using the classic panel , and MSI based on the mononucleotides BAT26 and TGFβRII and a panel of 10 tetranucleotide repeats that were correlated highly with the Bethesda Panel ; our study was carried out prior to the Bethesda Panel development. We consider tumor molecular phenotype in our evaluation of survival since we have shown their association with survival [9,10]. Samples were selected to maximize numbers based on tumor molecular phenotype or TP53, KRAS, CIMP, and MSI tumor status.
RNA was extracted from formalin-fixed paraffin-embedded tissues. We assessed slides and tumor blocks that were prepared over the duration of the study prior to the time of RNA isolation to determine their suitability. Older slides produced comparable RNA quality as more recent slides; RNA quality was not correlated with time lapse between slide preparation and mRNA preparation. The study pathologist reviewed slides to delineate cancer and normal tissue. Cells were dissected from 1 to 4 sequential sections on aniline blue stained slides using a hematoxylin and eosin slide for reference. Total RNA was extracted, isolated, and purified using the RecoverAll Total Nucleic Acid isolation kit (Ambion). RNA yields were determined using a NanoDrop spectrophotometer.
Sequencing library preparation
Library construction was performed using the Illumina TruSeq Stranded Total RNA Sample Preparation Kit with Ribo-Zero. Briefly, ribosomal RNA was removed from 100 ng total RNA using biotinylated Ribo-Zero oligos attached to magnetic beads that are complimentary to cytoplasmic rRNA. Following purification, the rRNA-depleted sample is fragmented with divalent cations under elevated temperatures and primed with random hexamers in preparation for cDNA synthesis. First strand reverse transcription is accomplished using Superscript II Reverse Transcriptase (Invitrogen). Second strand cDNA synthesis is accomplished using DNA polymerase I and Rnase H under conditions in which dUTP is substituted for dTTP, yielding blunt-ended cDNA fragments in which the second strand contains dUTP. An A-base is added to the blunt ends as a means to prepare the cDNA fragments for adapter ligation and block concatemer formation during the ligation step. Adapters containing a T-base overhang were ligated to the A-tailed DNA fragments. Ligated fragments were PCR-amplified (13 cycles) under conditions in which the PCR reaction enables amplification of the first strand cDNA product, whereas attempted amplification of the second strand product stalls at dUTP bases and is therefore not represented in the amplified library. The PCR-amplified library was purified using Agencourt AMPure XP beads (Beckman Coulter Genomics). The concentration of the amplified library was measured with a NanoDrop spectrophotometer and an aliquot of the library was resolved on an Agilent 2200 Tape Station to define the size distribution of the sequencing library.
Sequencing and data processing
Sequencing libraries (18 pM) were chemically denatured and applied to an Illumina TruSeq v3 single read flow cell using an Illumina cBot. Hybridized molecules were clonally amplified and annealed to sequencing primers with reagents from an Illumina TruSeq SR Cluster Kit v3-cBot-HS. Following transfer of the flowcell to an Illumina HiSeq instrument, a 50 cycle single-read sequence run was performed using TruSeq SBS v3 sequencing reagents. The single-end 50-base reads from the Illumina HiSeq2500 were aligned to a sequence database containing the human genome chromosomes (build GRCh37/hg19, February 2009, from UCSC Genome Bioinformatics ) plus all splice junctions generated using the USeq MakeTranscriptome application (version 8.8.1, ). Alignment was performed using Novoalign (version 2.08.01, ), which also trimmed any adapter sequence. Following alignment, genome alignments to splice junctions were translated back to genomic coordinates using the USeq SamTranscriptomeParser application. The resulting alignments were sorted and indexed using the Picard SortSam application (version 1.100, ). Aligned read counts for each gene were calculated using the pysam  and SAMtools . A python script using the pysam library was given a list of the genome coordinates for each gene, and counts to the exons and UTRs of those genes were calculated. Gene coordinates were downloaded from the UCSC Genome Bioinformatics website .
Our data were compared to 51,041 molecular features in the gene table. Of these, 33,876 were excluded because of low to no expression in colon tissue or because they were non-coding or had no known function. We used the BioMart tool on the Ensembl website , to create a list of known regions linked to protein-coding genes from the human GRCh38 gene annotation dataset. We included the 17,165 features on 17,141 genes involved in protein coding for data analysis.
Of the 197 initial tumor/normal pairs, 5 subjects failed quality control based on the low number of sequence counts for both tumor and normal, and 17 were dropped because either the normal or tumor pair failed quality control, leaving 175 subjects with high quality data for inclusion in the analysis. From this pool of subjects, we randomly assigned people to group “A” or “B” to cross-validate findings pertaining to differentially expressed genes. To prevent biasing the data towards those genes differentially expressed among people who were alive since our population had more alive individuals than those who had died, we used balanced groups of people who died and matched people who were alive to those who died based on age category and sex. To assess differences in overall tumor vs. normal tissue expression level, we performed a paired comparison, resampling the data 20 times, and permuting the data 1,000 times using the program SAMseq implemented in the ‘samr’ package of R  for each group [19,20]. Fold-change was calculated as the ratio of the means of tumor expression to the means of normal expression. Further bioinformatics analysis and survival and stage analysis included the entire sample of eligible participants, analyzing only those features that were significantly differentially expressed in both groups A and B with a P <0.05 and over a two-fold change in expression level between normal and tumor tissue.
Bioinformatics analysis was performed on the list of Ensemble IDs found to be significantly differentially expressed between tumor and normal tissue at a P value of <0.05 with a two-fold change in both Group A and Group B. Our goal in the bioinformatics analysis was to identify key pathways that were deregulated in colon cancer and assess the potential impact of those pathways on survival. We utilized QIAGEN’s Ingenuity Pathway Analysis (IPA)  with the following criteria: A total of 1,138 unregulated features and 695 down-regulated feature Ensembl IDs were uploaded to IPA, and all but five were successfully mapped to Ingenuity. The five unmapped IDs were: ENSG00000184682, ENSG00000214999, ENSG00000251184, ENSG00000244255, ENSG00000167046; the first two IDs are for genes that were down-regulated and the last three are IDs for genes that were up-regulated. The IPA settings were as follows for General Settings: only genes from Ingenuity Knowledge Base were used, and both indirect and direct relationships were considered; for Network: both causal and interaction networks were included, for Interaction we included endogenous chemicals and we used the defaults set by Ingenuity, 35 molecules per network and 25 networks per analysis; for Data Sources: all data sources were used; for Confidence: only experimentally observed relationships were considered; for Species: all species were included; for Tissue: no specific tissue was selected; for Mutations: all mutations were included. For Species and Tissue selections, the ‘stringent filter’ option was selected. For the Canonical Pathways Analysis the selected scoring method was the B-H Multiple Testing Correction P value and for this method all pathways that score between 0 and 6.47 are displayed.
Genes that were significantly differentially expressed were grouped into pathways based on IPA summary data. This involved several steps. First, individuals were given a score for each differentially expressed gene depending on their level of differential expression: 1 was assigned to individuals in the bottom quartile of the distribution of differential expression (i.e., closer to normal), 2 was assigned to individuals whose differential expression for the gene was between the 25th and 75th percentile of the population differential expression, and 3 was assigned to those whose tumors were in the top (>75th percentile) level of differential expression. Next, individual differential gene expression scores (DGES) were summed to obtain a differential pathway expression score (DPES) for genes in the pathways that were statistically significant after adjusting for multiple comparisons as described by Benjamini and Hochberg . A higher DPES correlates with more genes being differentially expressed. DPES were categorized into tertiles for survival analysis using a Cox Proportional Hazard model, adjusting for age, sex, American Joint Committee on Cancer (AJCC) stage, and tumor molecular phenotype (TP53, KRAS, MSI, and CIMP) using SAS 9.4 (SAS Institute, Cary, NC, USA). We report hazard ratios (HR) and 95% confidence intervals (CI) associated with survival. Survival data were obtained from local tumor registries and reported as months survived from date of diagnosis to date of last contact or lost to follow-up. We report HR associated with CRC death where other causes of death were censored. Similar categories were used to evaluate mean DPES expression levels across AJCC stages of 1 through 4.
We analyzed gene expression in tumor/normal paired samples from 175 colon cancer patients using RNA-seq. Of these tumors, 47.9% were proximal and 52.1% were distal colon and were similar for both Group A and B (Table 1). Evaluation of tumor molecular phenotype showed that 25.7% were CIMP high, 18.3% were MSI, 27.4% were KRAS mutated, and 44% were TP53 mutated. The average age of the study participants included in these analysis was 65.2 years. Groups A and B were used to determine if significant differentially expressed genes were similar for most variables.
Of the 17,141 genes evaluated, using the parameter of two-fold change in addition to a P <0.05 for both groups, 1,138 were significantly up-regulated and 695 were significantly down-regulated between tumor and normal tissue. Of the 1,833 genes identified as having significant differential expression, 1,567 were linked to Cancer, while 1,290 were linked to Gastrointestinal Diseases in IPA. The main molecular and cellular functions that these genes contributed to were cellular growth and proliferation (715 genes with gene enrichment P values of 1.89 × 10-43 to 2.01 × 10-05), cell death and survival (632 genes with gene enrichment P values from 5.39 × 10-33 to 20.4 × 10-05), cell cycle (316 genes with enrichment P values from 6.25 × 10-26 to 2.19 × 10-05), cellular movement (415 genes with enrichment P values of 1.51 × 10-21 to 2.22 × 10-05), and cellular assembly and organization (275 genes with enrichment P values of 2.79 × 10-20 to 7.75 × 10-06). Additional file 1 has a complete list of differentially expressed genes analyzed and their level of expression.
We also linked these genes to major canonical pathways in IPA, summarized in Figure 1; green refers to down-regulated genes and red to up-regulated genes within the pathway. Our significant differentially expressed genes were significantly enriched in 30 pathways (Additional file 2 shows genes in our data that were associated with these pathways). For the most part, the pathways with the majority of genes being significantly down-regulated were in metabolic pathways (Thyroid Hormone Metabolism, Malatonin Degredation I, Seratonin Degradation, Superpathway of Melatonin Degradiation, and Nicotine Degradation III and II). The other two metabolic pathways, Superpathway of Serine and Glycine Biosynthesis and Purine Nucleotides De Novo Biosynthesis, were only up-regulated. The other 22 pathways that were differentially expressed were signaling pathways, where the majority of genes were up-regulated. Exceptions to this were Complement System and Eicosanoid Signaling where the majority of de-regulated genes were down-regulated.
In order to analyze a pathway’s prognostic value, we constructed a DPES that captures the extent to which a pathway is de-regulated in a given individual. Evaluation of DPES with CRC-specific mortality showed significant reduced mortality as the number of differentially expressed genes increased for several signaling pathways (Table 2). Of the pathways significantly enriched for genes that were differentially expressed between tumor and normal tissue in our data, 16 were significantly associated with CRC-specific mortality. One of these, Purine Nucleotides de Novo Biosynthesis II was a metabolic pathway that was up-regulated, while the other 15 were signaling pathways. In all instances, a higher DPES (T3) was associated with better survival. Similar associations were observed when looking at expression in tumors only. Additional file 3: Figures S1, Additional file 4: Figure S2, and Additional file 5: Figure S3 show Kaplan-Meier curves for the first three pathways listed in Table 2.
We further assessed mean levels of DPES by stage (Table 3) and observed that those diagnosed at AJCC Stage 1 had more differentially expressed genes than individuals who were diagnosed at AJCC Stage 4. This trend was present for most pathways, although the majority did not reach statistical significance, which may be indicative of limited power from the few individuals with an AJCC Stage 4 tumor.
To help interpret these results, we assessed upstream regulators using IPA. The top upstream of molecules associated with the de-regulated genes in our data were TGFB1 (P = 8.14 × 10-46), beta-estradiol (P = 1.21 × 10-41), TP53 (P = 1.90 × 10-38), CDKN1A (P = 1.41 × 10-37), and MYC (P = 4.59 × 10-36) (Table 4). It is interesting to note that TP53, which was predicted to be inhibited in our data, was actually up-regulated, although only significant in Group A and therefore not included in the analysis. This suggests that an indirect relationship with other molecules could have resulted in this shift in expected activation. The major network regulators regulated by the TGFB1, TP53, and MYC are shown in Figure 2a, b, and c, respectively. There were 20 mechanistic network regulators for TGFB1, 20 for TP53, and 22 for MYC that were significantly enriched in our data based on the number of differentially expressed genes belonging directly and indirectly to these networks (Additional file 6: Table S3 shows genes in our data that were directly or indirectly regulated by these pathways).
Our data illustrate the complexity of colon cancer and the extent to which genes are significantly differentially expressed in tumors. These differentially expressed genes are associated with many pathways and functions, many of which are associated with survival and provide insight into the broader carcinogenic process. Our data suggest that tumors with the most differentially expressed genes in key pathways are associated with better survival and less advanced disease stage. The differentially expressed genes were associated with upstream regulators that have previously been associated with colon cancer, such as TGFB1, MYC, and TP53. In our data, these genes were activated and influenced the downstream genes and pathways they regulate. It appears that activation of these pathways improves prognosis. This information could potentially be utilized to help determine biomarkers for treatment modalities that may influence survival.
Disruption of multiple biological pathways is a hallmark feature of the tumors. Our data illustrates the number of key pathways involved in the carcinogenic process and the number of genes showing significant differential gene expression in both test and validation data sets, with 1,138 features being significantly up-regulated and 695 being significantly down-regulated in both groups. Many of these pathways are comprised of genes involved in cell growth, differentiation, and apoptosis. One of the top pathways that showed significant differential expression in our data was Cell Cycle Control of Chromosomal Replication (55.6% of genes in pathway were significantly differentially expressed in our data). Additionally, in the Cell Cycle: G1/S Checkpoint Regulation pathway almost 30% of genes in the pathway were de-regulated in our data. These observations were further supported by the number of genes involved in key molecular and cellular functions of cellular growth and proliferation (715 genes), cell death and survival (632 genes), cell cycle (316 genes), cellular assembly and organization (275 genes), and cellular movement (415 genes). Also of interest was the observation that 54.2% of genes that showed significant differential expression were in the Estrogen-mediated S-phase Entry Canonical Pathway, which was the second most enriched pathway from our data. Estrogen status and hormone therapy have been shown to reduce risk of developing colon cancer and have been associated with better survival after diagnosis with colon cancer [23,24].
The major finding from this study is that better survival is seen in patients with more differential gene expression and DPES after adjusting for tumor stage and tumor molecular phenotype of TP53, KRAS, CIMP, and MSI. This observation was further supported by the higher DPES being observed for those who were diagnosed at AJCC Stage 1 vs. Stage 4. This implies that genes are activated as part of a cell response mechanism potentially to promote apoptosis and decrease tumor growth. Individuals who are able to initiate this response have better survival and tumors that are less likely to advance. This observation also supports the concept that genes downstream respond to upstream events that may be driving the carcinogenic process. While the phenomena of having more differentially expressed genes being associated with better survival could seem counterintuitive, given disrupted genes and gene regulation are a hallmark feature of tumors, others have noted similar observations that tumors from patients who live longer after cancer diagnosis have more differentially expressed genes . However, replication of these findings in other similar datasets is needed.
To help interpret the results, we further evaluated the top upstream regulators of pathways where we observed significant enrichment of differentially expressed genes in our data. Three of the top upstream regulators were TGFB1, MYC, and TP53; these regulators are frequently associated with colon cancer. TGFB1 is a growth factor that has been linked to apoptosis through multiple mechanisms [26,27] and is required to maintain homeostasis between apoptosis and cell growth. Multiple pathways, including MAPK signaling, SMAD, and JNK are linked to TFGB1 and its role on apoptosis. If TGFB1 is up-regulated, as it was in our data, it is possible it activates signaling cascades that lead to cell death, which in turn would improve survival. MYC regulates many functions, some that promote tumor growth, while others promote apoptosis . MYC expression has been associated with improved survival in the absence of TP53 mutations in one study . While we attempted to confirm this association, there were too few deaths among those with TP53 mutations to estimate the association; however, an inverse association was observed for higher MYC differential expression among those with non-TP53 mutated tumors. In our data, MYC primarily regulated transcription factors, playing a key role in downstream gene regulation, through both direct and indirect gene targeting. The importance of MYC as a regulator of transcriptional activation and repression in CRC also was noted by the Cancer Genome Atlas Network . Likewise, TP53, a frequently mutated gene in colon cancer , was significantly up-regulated in our data, and genes for which TP53 was an upstream regulator were differentially expressed in our data. As a tumor suppressor gene, TP53 is involved in apoptosis, so it is reasonable that enrichment of differentially expressed genes in this pathway could significantly have a favorable effect on survival.
There are several possible explanations for our observations that pathways with increased differential are associated with better survival. Large gene expression changes could be due to cell response to over-proliferation and attempt to shut down. Increased gene expression changes could be destabilizing for the tumor and lead to better overall survival. Alternatively, expression changes could be the reaction to immune signaling and infiltration.
This study has several strengths and limitations. First, we have a rich dataset in which to examine gene expression profiles of colon tumors. Because we had tumor and normal paired samples, we were able to evaluate differential gene expression. Although our normal tissue was from colonic tissue adjacent to the tumor, it could have undergone gene changes and therefore not truly ‘normal’. However, it is the only practical tissue available for comparison. Our sample was large enough that we were able to use both a test and retest set of data to validate findings and only evaluate those differentially expressed genes with survival that were differentially expressed in both groups. Our pathway approach enabled us to group genes together based on canonical pathways rather than evaluate genes individually. Other pathways and genes that were not identified in IPA could be important. Additionally, it is important to recognize that other genes or specific pathways could detrimentally influence survival. While a linear trend with increasing de-regulated genes would be expected, frequently this was not the case. Often, the largest drop in risk was going from tertile 1 to tertile 2. Additionally, we are only able to look at gene expression and not actual protein expression.
In summary, our data suggest that having more genes differentially expressed in colon tumors compared to normal tissues improves survival and the likelihood of being diagnosed at a less advanced disease stage. This may be the signature of a cellular response mechanism and an ongoing challenge is to identify the key factors that stimulate the activation of important upstream genes that are required to mount a cellular response to the initial drivers in the carcinogenic process and to understand the cellular response to those initiating events.
Our data suggest that having more de-regulated pathways is associated with a good prognosis and may be a reaction to key events that are disabling to tumor progression. This observation is re-enforced by the observation that people diagnosed at AJCC Stage 1 had more de-regulated genes than those diagnosed at AJCC Stage 4. These findings need confirmation in other studies.
American joint committee on cancer
CpG Island Methylator Phenotype
Differential gene expression score
Differential pathway expression score
Ingenuity pathway analysis
Kaiser Permanente Medical Research Program
v-myc avian myelocytomatosis viral oncogene homolog
- TGFB1 :
Transforming growth factor beta 1
- TP53 :
Tumor protein p53
Birkenkamp-Demtroder K, Christensen LL, Olesen SH, Frederiksen CM, Laiho P, Aaltonen LA, et al. Gene expression in colorectal cancer. Cancer Res. 2002;62:4352–63.
Budinska E, Popovici V, Tejpar S, D’Ario G, Lapique N, Sikora KO, et al. Gene expression patterns unveil a new level of molecular heterogeneity in colorectal cancer. J Pathol. 2013;231:63–76.
Burgess DJ. Gene expression: colorectal cancer classifications. Nat Rev Cancer. 2013;13:380–1.
Sanz-Pamplona R, Berenguer A, Cordero D, Riccadonna S, Sole X, Crous-Bou M, et al. Clinical value of prognosis gene expression signatures in colorectal cancer: a systematic review. PLoS One. 2012;7:e48877.
Nannini M, Pantaleo MA, Maleddu A, Astolfi A, Formica S, Biasco G. Gene expression profiling in colorectal cancer using microarray technologies: results and perspectives. Cancer Treat Rev. 2009;35:201–9.
Slattery ML, Edwards SL, Palmer L, Curtin K, Morse J, Anderson K, et al. Use of archival tissue in epidemiologic studies: collection procedures and assessment of potential sources of bias. Mutat Res. 2000;432:7–14.
Samowitz WS, Albertsen H, Herrick J, Levin TR, Sweeney C, Murtaugh MA, et al. Evaluation of a large, population-based sample supports a CpG island methylator phenotype in colon cancer. Gastroenterology. 2005;129:837–45.
Slattery ML, Curtin K, Anderson K, Ma KN, Ballard L, Edwards S, et al. Associations between cigarette smoking, lifestyle factors, and microsatellite instability in colon tumors. J Natl Cancer Inst. 2000;92:1831–6.
Samowitz WS, Curtin K, Ma KN, Edwards S, Schaffer D, Leppert MF, et al. Prognostic significance of p53 mutations in colon cancer at the population level. Int J Cancer. 2002;99:597–602.
Samowitz WS, Curtin K, Ma KN, Schaffer D, Coleman LW, Leppert M, et al. Microsatellite instability in sporadic colon cancer is associated with an improved prognosis at the population level. Cancer Epidemiol Biomarkers Prev. 2001;10:917–23.
UCSC Genome Bioinformatics. http://genome.ucsc.edu.
Novocraft technologies. www.novocraft.com/products/novoalign/.
Ensembl project. http://www.ensembl.org.
The Comprehensive R Archive Network. http://cran.r-project.org.
Tusher VG, Tibshirani R, Chu G. Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci U S A. 2001;98:5116–21.
Li J, Tibshirani R. Finding consistent patterns: a nonparametric approach for identifying differential expression in RNA-Seq data. Stat Methods Med Res. 2013;22:519–36.
QIAGEN’s Ingenuity Pathway Analysis. http://www.ingenuity.com/.
Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc. 1995;57:289–300.
Slattery ML, Anderson K, Samowitz W, Edwards SL, Curtin K, Caan B, et al. Hormone replacement therapy and improved survival among postmenopausal women diagnosed with colon cancer (USA). Cancer Causes Control. 1999;10:467–73.
Kampman E, Potter JD, Slattery ML, Caan BJ, Edwards S. Hormone replacement therapy, reproductive history, and colon cancer: a multicenter, case–control study in the United States. Cancer Causes Control. 1997;8:146–58.
Zhao H, Ljungberg B, Grankvist K, Rasmuson T, Tibshirani R, Brooks JD. Gene expression profiling predicts survival in conventional renal cell carcinoma. PLoS Med. 2006;3:e13.
Sulkowska M, Wincewicz A, Sulkowski S, Koda M, Kanczuga-Koda L. Relations of TGF-beta1 with HIF-1 alpha, GLUT-1 and longer survival of colorectal cancer patients. Pathology. 2009;41:254–60.
Sanchez-Capelo A. Dual role for TGF-beta1 in apoptosis. Cytokine Growth Factor Rev. 2005;16:15–34.
Dang CV. c-Myc target genes involved in cell growth, apoptosis, and metabolism. Mol Cell Biol. 1999;19:1–11.
Smith DR, Goh HS. Overexpression of the c-myc proto-oncogene in colorectal carcinoma is associated with a reduced mortality that is abrogated by point mutation of the p53 tumor suppressor gene. Clin Cancer Res. 1996;2:1049–53.
Cancer Genome Atlas Network. Comprehensive molecular characterization of human colon and rectal cancer. Nature. 2012;487:330–7.
Slattery ML, Curtin K, Wolff RK, Boucher KM, Sweeney C, Edwards S, et al. A comparison of colon and rectal somatic DNA alterations. Dis Colon Rectum. 2009;52:1304–11.
This study was funded by NCI CA48998. This research also was supported by the Utah Cancer Registry, which is funded by Contract #N01-PC-67000 from the National Cancer Institute, with additional support from the State of Utah Department of Health, the Northern California Cancer Registry, and the Sacramento Tumor Registry. The contents of this manuscript are solely the responsibility of the authors and do not necessarily represent the official view of the National Cancer Institute. We would like to acknowledge the contributions of Dr. Bette Caan, Donna Schaffer, and Judy Morse at the Kaiser Permanente Medical Care Program in Oakland, CA; Sandra Edwards at the University of Utah for data collection and oversight; Wade Samowitz for slide review; and Brett Milash at the University of Utah Bioinformatics Core Facility for bioinformatics support.
The authors declare that they have no competing interests.
MLS obtained funding, designed study, oversaw data collection, oversaw statistical analysis, and wrote the manuscript. JSH conducted statistical analysis and approved final paper. LEM conducted bioinformatics analysis and approved the final paper. JG provided insight into study design, assisted with data interpretation, and approved final manuscript. RKW communicated with core laboratory for RNAseq analysis and approved the final manuscript. All authors read and approved the final manuscript.
Genes up and down regulated in both Group A and Group B.
Major Canonical Pathway and related gene enrichment and key molecules; pathways have a significant gene enrichment after adjustment for multiple comparisons.
Kaplan-Meier curve for de-regulated genes in the Thyroid Hormone Metabolism II in the IPA Canonical Pathway.
Kaplan-Meier curve for de-regulated genes in the Cell Cycle Control of Chromosomal Replication Pathway IPA Canonical Pathway.
Kaplan-Meier curve for de-regulated genes in the Estrogen-mediated S-phase Entry IPA Canonical Pathway.
Description of major upstream regulators and their targets in our data.