Circadian pathway genetic variation and cancer risk: evidence from genome-wide association studies

Background: Dysfunction of the circadian clock and single polymorphisms of some circadian genes have been linked to cancer susceptibility, although data are scarce and findings inconsistent. We aimed to investigate the association between circadian pathway genetic variation and risk of developing common cancers based on the findings of genome-wide association studies (GWASs).
Methods: Single nucleotide polymorphisms (SNPs) of 17 circadian genes reported by three GWAS meta-analyses dedicated to breast (Discovery, Biology, and Risk of Inherited Variants in Breast Cancer (DRIVE) Consortium; cases, n = 15,748; controls, n = 18,084), prostate (Elucidating Loci Involved in Prostate Cancer Susceptibility (ELLIPSE) Consortium; cases, n = 14,160; controls, n = 12,724) and lung carcinoma (Transdisciplinary Research In Cancer of the Lung (TRICL) Consortium; cases, n = 12,160; controls, n = 16,838) in patients of European ancestry were utilized to perform pathway analysis by means of the adaptive rank truncated product (ARTP) method. Data were also available for the following subgroups: estrogen receptor negative breast cancer, aggressive prostate cancer, squamous lung carcinoma and lung adenocarcinoma.
Results: We found a highly statistically significant association between circadian pathway genetic variation and the risk of breast (pathway P value = 1.9 × 10⁻⁶; top gene RORA, gene P value = 0.0003), prostate (pathway P value = 4.1 × 10⁻⁶; top gene ARNTL, gene P value = 0.0002) and lung cancer (pathway P value = 6.9 × 10⁻⁷; top gene RORA, gene P value = 2.0 × 10⁻⁶), as well as all their subgroups. Out of 17 genes investigated, 15 were found to be significantly associated with the risk of cancer: four genes were shared by all three malignancies (ARNTL, CLOCK, RORA and RORB), two by breast and lung cancer (CRY1 and CRY2) and three by prostate and lung cancer (NPAS2, NR1D1 and PER3), whereas four genes were specific for lung cancer (ARNTL2, CSNK1E, NR1D2 and PER2) and two for breast cancer (PER1, RORC).
Conclusions: Our findings, based on the largest series ever utilized for ARTP-based gene and pathway analysis, support the hypothesis that circadian pathway genetic variation is involved in cancer predisposition.
Electronic supplementary material: The online version of this article (10.1186/s12916-018-1010-1) contains supplementary material, which is available to authorized users.


Background
The circadian clock is a time-tracking rhythmic biological system (internal timing machine) with a periodicity of about 24 h that enables organisms to anticipate environmental changes (such as food availability) and allows them to modify their behaviour and physiological functions (e.g. sleep and wakefulness, basal metabolism, body temperature, blood pressure, hormone production and immunity) in the most efficient way [1]. This system consists of two components: the central clock, located in the suprachiasmatic nucleus of the brain, and the peripheral clocks, which are present in virtually all body tissues. Circadian rhythms are controlled by what are called circadian pathway genes [2], which have been discovered in all studied species. Remarkably, the disruption of these rhythms has been linked to the risk of different diseases such as insomnia, depression, jet lag, stomach ailments, heart attack and cancer [3]. As regards the latter, a growing wealth of evidence supports the potential tumour suppressor role of the biological clock [4]. In particular, single germline variations of circadian genes have been associated with the predisposition to some tumour types such as breast carcinoma [5], although the evidence is not conclusive due to the scarcity of data in this recent field of research.
Germline DNA variation has long been recognized as a key component of the individual risk of developing cancer, and recently the discovery rate of susceptibility loci has been greatly accelerated by genome-wide association studies (GWASs), which can test up to one million single nucleotide polymorphisms (SNPs) in thousands of subjects at a time [6]. However, the proportion of genetic susceptibility to complex traits (such as cancer) explained by single locus analysis still remains small, whereas it is increasingly recognized that multiple locus analysis, such as gene and gene set (or pathway) analysis, is more powerful for dissecting the genetic architecture of complex diseases according to the principles of systems genetics [7]. In fact, a single SNP can have an effect too small to be detected by the single locus approach, whereas gene/pathway analysis, which jointly tests multiple SNPs from the same gene/pathway, is more likely to identify the association between the outcome and the basic functional unit involved in disease development [8][9][10].
In this work, we intended to investigate whether germline genetic variation of the circadian pathway is associated with the risk of cancer by analysing publicly available data from GWASs. To this aim we chose to focus on the three most common tumour types, i.e. lung, breast and prostate carcinomas, which account for up to 40% of all cancer incident cases and observed deaths [11].

Study design
We conducted this study to test the hypothesis that germline DNA variation of the circadian pathway might be associated with the risk of cancer. To this aim we followed the principles described in the Strengthening the Reporting of Genetic Association Studies (STREGA) statement, an extension of the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement (www.strobe-statement.org) [12].
Briefly, the study was composed of three phases: (1) identification of circadian genes; (2) collection of single nucleotide variants of these genes that have been associated with the risk of the three most common malignancies; (3) conduction of adaptive rank truncated product (ARTP)-based gene and pathway analysis based on the P values of circadian gene SNPs retrieved from GWASs.
To this aim, we first defined the core circadian genes by querying the publicly available Molecular Signatures Database (MSigDB), which includes compiled gene sets from a variety of resources, such as the Kyoto Encyclopedia of Genes and Genomes (KEGG, www.genome.jp/kegg), Gene Ontology (GO, www.geneontology.org) and others [13]. We also screened previously published literature dedicated to the circadian clock [1,2].
The physical position of these genes (including 3000 bp upstream and 1000 bp downstream), which was needed to retrieve the relevant SNPs, was assessed using the National Center for Biotechnology Information (NCBI) Gene database (https://www.ncbi.nlm.nih.gov/gene).
Then we searched the NCBI database of Genotypes and Phenotypes (dbGaP) data repository (https://www.ncbi.nlm.nih.gov/gap) as the source of publicly available GWAS findings on the three most frequently occurring tumour types: breast, prostate and lung carcinomas.
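As an illustration of the gene-region definition above, the following minimal Python sketch extends a gene by 3000 bp upstream and 1000 bp downstream and keeps the SNPs falling inside that window. The text does not state whether the flanks were applied in a strand-aware manner, so the strand handling is an assumption; the `Gene` record, the function names and all coordinates are hypothetical and are not taken from the NCBI Gene database.

```python
from dataclasses import dataclass

UPSTREAM_BP = 3000    # flank size taken from the text
DOWNSTREAM_BP = 1000  # flank size taken from the text


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; coordinates are 1-based and inclusive."""
    name: str
    chrom: str
    start: int   # lowest genomic coordinate of the gene
    end: int     # highest genomic coordinate of the gene
    strand: str  # "+" or "-"


def gene_window(gene):
    """Return (lo, hi) of the gene extended by the flanks.

    Assumption: "upstream" is relative to the direction of transcription,
    so on the minus strand the 3000 bp flank lies at the high-coordinate end.
    """
    if gene.strand == "+":
        lo, hi = gene.start - UPSTREAM_BP, gene.end + DOWNSTREAM_BP
    elif gene.strand == "-":
        lo, hi = gene.start - DOWNSTREAM_BP, gene.end + UPSTREAM_BP
    else:
        raise ValueError("strand must be '+' or '-'")
    return max(1, lo), hi


def snps_in_window(gene, snps):
    """Keep the IDs of SNPs, given as (id, chrom, position), inside the window."""
    lo, hi = gene_window(gene)
    return [sid for sid, chrom, pos in snps
            if chrom == gene.chrom and lo <= pos <= hi]


# Made-up example: one gene and three SNPs.
example_gene = Gene("GENE_A", "1", 10_000, 20_000, "+")
example_snps = [("snp_1", "1", 7_500), ("snp_2", "1", 21_500), ("snp_3", "2", 15_000)]
print(snps_in_window(example_gene, example_snps))
```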
To be eligible, the data had to be from a GWAS and include the following information: (1) variant ID of the SNP, which allows one to know the variant physical position (with special regard to the relationship with the gene of interest) as well as its effect and reference alleles; (2) strength of association as expressed by the odds ratio (OR); (3) P value of the association test.
GWAS meta-analyses were deemed to be more informative than single GWASs due to the larger sample sizes achieved by consortia pooling the findings of multiple GWASs.

ARTP-based gene and pathway analysis
Gene and pathway analysis was carried out using the ARTP method (which was originally designed to analyse individual-level genotype data) extended to accept input from SNP-level summary statistics (summary-based ARTP, sARTP), as performed by the ARTP2 (version 0.9.22) R package [14,15].
Briefly, the ARTP method was developed to overcome the major limitations of other existing P-value-combining approaches (such as Fisher's product method and the rank truncated product), which do not take into consideration the organization of the DNA into functional elements (that is, genes), ignore the linkage disequilibrium patterns between SNPs within the same gene/gene region, and arbitrarily specify a K rank truncation point so as to combine the K smallest P values as the summary statistic. Instead, ARTP takes into consideration the gene-based structure of biological pathways as well as the correlation between P values (which is estimated using an external panel of reference samples such as the 1000 Genomes Project), selects the optimal rank truncation point among a set of candidates and then adjusts the generated P value for multiple testing using a permutation procedure.
This gene-based pathway analysis first obtains a summary statistic for the association between each gene and the phenotype and then combines gene-level evidence using the ARTP method. A challenge for this approach is that it requires a multiple layer resampling procedure to calculate the significance of the pathway-level test statistic. In fact, a first layer of permutation is needed to generate the gene-level summary of association, a second layer is required to yield the P value associated with the pathway-level statistic for each truncation point and a third layer is necessary to assess the significance of the ARTP statistic after adjusting for multiple testing across different truncation points. Since this multi-level permutation procedure can become computationally intensive, the ARTP2 package implements an efficient algorithm using a single level of permutation iterations to achieve the goal of a multiple-level permutation procedure.
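The logic of the adaptive rank truncated product with a single shared set of resampling replicates can be sketched in a few lines of Python. This is a simplified illustration, not the ARTP2 implementation: it handles one level only (SNPs within a gene, or genes within a pathway), and the null replicates are drawn as independent uniform P values, which ignores the linkage disequilibrium that the real method models from a reference panel. The function names and the observed P values are hypothetical.

```python
import bisect
import math
import random


def rtp_log_stat(pvals, k):
    """Log of the product of the k smallest P values (rank truncated product)."""
    return sum(math.log(p) for p in sorted(pvals)[:k])


def artp_pvalue(observed, null_replicates, trunc_points):
    """Adaptive rank truncated product P value from one set of replicates.

    The same replicates are used twice: first to turn each truncated-product
    statistic into an empirical P value, then to calibrate the minimum of
    those P values across the candidate truncation points.
    """
    rows = [observed] + list(null_replicates)  # row 0 holds the observed data
    n = len(rows)
    min_p = [1.0] * n
    for k in trunc_points:
        stats = [rtp_log_stat(row, k) for row in rows]
        ordered = sorted(stats)
        for i, s in enumerate(stats):
            # Share of rows whose statistic is at least as small (as extreme).
            p_k = bisect.bisect_right(ordered, s) / n
            min_p[i] = min(min_p[i], p_k)
    # Adjust the best truncation point for having been selected adaptively.
    return sum(1 for m in min_p if m <= min_p[0]) / n


def simulate_null(n_tests, n_replicates):
    """Assumption: independent uniform null P values in (0, 1], i.e. no LD."""
    return [[1.0 - random.random() for _ in range(n_tests)]
            for _ in range(n_replicates)]


random.seed(0)
null = simulate_null(n_tests=10, n_replicates=999)
hypothetical_pvals = [0.0004, 0.003, 0.02, 0.11, 0.25, 0.31, 0.48, 0.62, 0.77, 0.93]
print(artp_pvalue(hypothetical_pvals, null, trunc_points=[2, 4, 6, 8, 10]))
```

With 999 replicates the smallest attainable P value is 1/1000, which is why very small pathway P values require a large number of resampling steps.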
For the SNP selection process, we used a minor allele frequency (MAF) equal to or greater than 1% and a linkage disequilibrium r-squared lower than 0.9.
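The two selection thresholds above can be illustrated with a small greedy filter. This is only a sketch: the text gives the thresholds (MAF of at least 1% and r-squared below 0.9) but not the rule for choosing which member of a highly correlated pair is dropped, so keeping the first SNP encountered is an assumption. The function name, the SNP labels and all numbers are hypothetical.

```python
MAF_MIN = 0.01  # minor allele frequency threshold from the text
R2_MAX = 0.9    # linkage disequilibrium r-squared threshold from the text


def select_snps(maf, r2):
    """Return the SNPs passing the MAF filter and greedy LD pruning.

    maf: dict mapping SNP id to minor allele frequency.
    r2:  dict mapping frozenset({snp_a, snp_b}) to pairwise r-squared;
         pairs that are absent are treated as uncorrelated.
    Assumption: SNPs are visited in input order and the first member of a
    correlated pair is the one that is kept.
    """
    kept = []
    for snp, freq in maf.items():
        if freq < MAF_MIN:
            continue  # too rare
        if any(r2.get(frozenset((snp, other)), 0.0) >= R2_MAX for other in kept):
            continue  # redundant with a SNP that was already kept
        kept.append(snp)
    return kept


# Made-up example data.
example_maf = {"snp_a": 0.20, "snp_b": 0.005, "snp_c": 0.15, "snp_d": 0.30}
example_r2 = {
    frozenset(("snp_a", "snp_c")): 0.95,
    frozenset(("snp_a", "snp_d")): 0.10,
}
print(select_snps(example_maf, example_r2))
```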
Fig. 1 Schematic view of the circadian pathway. CLOCK and NPAS2 form heterodimers with ARNTL (also known as BMAL1) or ARNTL2 (BMAL2); these heterodimers act as transcription factors binding to enhancer box (E-box) elements upstream of target genes. Besides the clock-controlled genes (CCGs), which mediate the circadian pathway physiological functions, CLOCK and NPAS2 activate the transcription of other core circadian genes such as PER1, PER2, PER3 and CRY1, CRY2. PER and CRY proteins heterodimerize and activate a negative feedback loop acting directly on CLOCK and NPAS2. The activity of PER and CRY proteins is also regulated by additional proteins such as CSNK1E and CSNK1D (inhibition) and TIMELESS (unclear effect), respectively. CLOCK and NPAS2 also transactivate the expression of other pathway components such as NR1D1, NR1D2 (also known as REV-ERBs) and RORA, RORB and RORC (which are transcription factors acting through ROR/REV-ERB elements): these proteins can inhibit or enhance ARNTL transcription, respectively, which adds a further level of modulation of CLOCK/NPAS2 activity. Green lines, stimulatory effect (positive loop); red lines, inhibitory effect (negative loop)
The number of candidate truncation points to inspect the top SNPs in a gene (or top genes in a pathway) was set at five, a truncation point being defined at every 20% of the top SNPs (or genes). In other words, considering the case of a gene (or pathway) represented by 100 SNPs (or genes), the five truncation points will be the following: 20, 40, 60, 80 and 100. The P values were estimated by 1,000,000 resampling steps. Since some degree of genomic over-dispersion is often observed under a polygenic model (even in the absence of population stratification and other technical artifacts) [16], the results were adjusted by the lambda inflation factor reported by each eligible GWAS.
For each analysis, we reported the following information: (1) the pathway P value, with the number of SNPs and genes contributing to the pathway-level analysis; (2) the gene P value of each gene contributing to the pathway analysis, along with the number of SNPs contributing to the gene-level analysis; (3) the top gene and SNP, defined as the gene and the SNP with the lowest P value from the gene-level analysis and the original GWAS (or GWAS meta-analysis), respectively.
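The truncation-point rule and the inflation adjustment described above can be sketched in Python as follows. Both functions are illustrative assumptions rather than the ARTP2 code: the text only gives the 100-item example, so rounding up for counts not divisible by five is assumed, and the adjustment shown is the standard genomic-control rescaling of a 1-degree-of-freedom chi-square statistic by lambda, with lambda values below 1 treated as 1 (a common convention, assumed here).

```python
import math
from statistics import NormalDist


def truncation_points(n_items, n_points=5):
    """Candidate truncation points at every 1/n_points of the ranked items.

    Assumption: fractional positions are rounded up, and duplicates that
    arise for very small n_items are merged.
    """
    points = {max(1, -(-n_items * j // n_points)) for j in range(1, n_points + 1)}
    return sorted(points)


def genomic_control(p, lam):
    """Rescale a two-sided P value (0 < p <= 1) by the inflation factor lam.

    The P value is converted to a 1-df chi-square statistic, divided by
    lam, and converted back to a P value.
    """
    lam = max(lam, 1.0)                # assumed convention: never deflate
    z = NormalDist().inv_cdf(p / 2.0)  # lower-tail normal quantile
    chi2 = z * z / lam
    return math.erfc(math.sqrt(chi2 / 2.0))


print(truncation_points(100))
print(genomic_control(0.001, 1.05))
```

For a gene represented by 100 SNPs, `truncation_points(100)` reproduces the five points given in the text (20, 40, 60, 80 and 100).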

GWAS
For each one of the three tumour types considered in this study, we found (and chose as the most informative data source) a meta-analysis of multiple GWASs: those of the DRIVE (breast cancer), ELLIPSE (prostate cancer) and TRICL (lung cancer) consortia [17][18][19]. These data sources provided information on 15 out of 17 selected clock genes, as no SNPs were available for CSNK1D and TIMELESS. Overall, data on 181 SNPs were available (see Additional file 1: Table S1) and were utilized for ARTP-based pathway analysis, as described in the following section.

ARTP-based gene and pathway analysis
As regards breast cancer (all cases), we found a highly significant association between circadian pathway variation and risk of developing this tumour (circadian pathway P value 1.9 × 10⁻⁶). This result was based on the data regarding 20 SNPs located in eight genes (Table 1). The top gene and SNP were RORA (eight SNPs, circadian gene P value 0.0003) and RORB rs1018584 (GWAS meta-analysis P value 0.0007), respectively.
Upon subgroup analysis, the risk of estrogen receptor negative carcinoma was also associated with circadian pathway variation (circadian pathway P value 2.4 × 10⁻⁶), the finding being based on 15 SNPs located in seven genes (Table 2). The top gene and SNP were RORA (seven SNPs, circadian gene P value 0.0002) and PER3 rs77404158 (GWAS meta-analysis P value 0.0003), respectively.
As for prostate cancer (all cases), there was a highly significant association between genetic variation of the circadian pathway and the susceptibility to this malignancy (circadian pathway P value 4.1 × 10⁻⁶). This result was based on the data regarding 17 SNPs located in seven genes (Table 1). The top gene and SNP were ARNTL (one SNP, circadian gene P value 0.0002) and ARNTL rs142435152 (GWAS meta-analysis P value 0.0002), respectively.
Subgroup analysis showed that the risk of aggressive prostate cancer was also associated with circadian pathway variation (circadian pathway P value 1.49 × 10⁻⁶), the finding being based on 28 SNPs located in seven genes (Table 2). The top gene and SNP were RORA (12 SNPs, circadian gene P value 4.49 × 10⁻⁶) and RORA rs17191414 (GWAS meta-analysis P value 0.000069), respectively.
As regards lung cancer (all cases), we found a highly significant association between genetic variation of the circadian pathway and the risk of developing this tumour (circadian pathway P value 6.9 × 10⁻⁷). This result was based on the data regarding 79 SNPs located in 13 genes (Table 1). The top gene and SNP were RORA (27 SNPs, circadian gene P value 2.0 × 10⁻⁶) and RORB rs77599950 (GWAS meta-analysis P value 0.0015), respectively.
The genes statistically significantly linked to the risk of one, two or all three tumour types (breast, prostate and lung carcinoma) are illustrated in the Venn diagram of Fig. 2.
The details of ARTP-based gene analysis are reported in Additional file 2: Table S2 (primary analysis, all cases included) and Additional file 3: Table S3 (subgroup analysis by histological subtype).

Discussion
In this study we found that germline genetic variation in the circadian pathway is associated with the risk of developing breast, prostate and lung carcinoma in a large cohort of cases (n = 42,068) and controls (n = 47,646). This association was also maintained in subgroup analyses for estrogen receptor negative breast cancer, aggressive prostate cancer and both squamous carcinoma and adenocarcinoma lung cancer. To the best of our knowledge, this is the first time that ARTP-based gene and pathway analysis has been applied to the relationship between circadian genes' germline variation and cancer susceptibility. Thus far, molecular epidemiology studies have investigated only single variants of single circadian genes in relationship with some tumour types (such as breast, pancreatic and prostate carcinomas) [5,20]. In particular, according to our recent systematic review and meta-analysis of the published literature on the subject [5], out of 687 SNPs (located in 14 circadian genes) only 10 SNPs located in five genes (NPAS2 rs10165970, rs895520, rs17024869 and rs7581886; CLOCK rs3749474 and rs11943456; RORA rs7164773 and rs10519097; RORB rs7867494; and PER3 rs1012477) were significantly associated with the predisposition to only one tumour type, that is, breast carcinoma. Moreover, none of the SNPs investigated in the three GWAS meta-analyses included in the present study reached statistical significance after adjustment for multiple testing [17][18][19]. In contrast, pathway analysis enabled us to link with high statistical significance (pathway P values always lower than 1 × 10⁻⁵) the circadian pathway variation to the susceptibility not only to breast cancer but also to two other common malignancies, prostate and lung carcinoma. This relationship was sustained by 15 statistically significant genes out of 17 genes investigated, with only CSNK1D and TIMELESS (for which no SNPs were available) being excluded from the association (see Tables 1 and 2).
The implication of most circadian genes in all three tumour types (as well as all their subtypes) indicates that variation of this pathway could actually be involved in the predisposition to cancer in general, which still requires more investigation to be demonstrated in patients affected with malignancies other than those considered in this study. On the other hand, our results point out that the germline variation of some genes (ARNTL, CLOCK, RORA and RORB) is shared by all three tumour types, whereas the polymorphisms of other genes might be more specific to one or two malignancies (see the Venn diagram in Fig. 2). This finding suggests that some circadian genes might be more relevant than others in terms of cancer predisposition. In particular, it is noteworthy that all the above-mentioned four shared genes belong to the positive loop of the circadian pathway (that is, the stimulatory component of the biological clock circuit; see Fig. 1) and that RORA is the most significant gene associated with all tumour types (except for prostate carcinoma, where it ranks second) and subtypes (see Tables 1 and 2). However, the biological meaning of these observations requires dedicated studies to be elucidated. For instance, it is known that the CLOCK gene product activity can affect both estrogen [21] and androgen pathways [22], which is concordant with the relationship between circadian pathway perturbation and the risk of hormone-driven malignancies such as breast and prostate cancer, respectively; however, the association with lung carcinoma remains less intuitive and warrants further investigation on the cascade of molecular events underlying the link between the biological clock and this type of tumour.
Fig. 2 Venn diagram showing the genes selected by pathway analysis as statistically significantly associated with the risk of one, two or three types of cancer considered in this study
Overall, our data underscore the fact that a biological relationship undetected by single polymorphisms can be unveiled by pathway analysis, confirming the power of this multi-SNP and multi-gene approach [8][9][10]23].
In particular, our results support the pre-clinical evidence regarding the candidate role of the circadian pathway as a tumour suppressor circuit acting through the transcriptional control of (or the direct interaction with) key regulators of cell proliferation, apoptosis and DNA repair (and thus genomic stability) and metabolism, such as Cyclin-D1, c-Myc, Mdm2, p53, Gadd45alpha, Atm, Chk1, Nampt and Sirt-1 [4,24,25], which are well known to play a pivotal role in carcinogenesis.
From a Mendelian randomization perspective (that is, using variation in genes of known function to examine the causal effect of a given environmental exposure/behaviour on disease, reasonably assuming that genes are not themselves associated with any confounding factors) [26,27], our data also support the hypothesis that the disruption of the physiological internal clock (as in sleep deprivation, insomnia, shift work and jet lag) might lead to an increased risk of cancer, as suggested by some classical epidemiology studies [28][29][30][31] and contradicted by others [32][33][34][35], with most of them focusing on breast cancer.
Certainly, we cannot draw any definitive conclusion on this subject, as dedicated studies of fine mapping are needed to systematically investigate the relationship between germline variation of the circadian pathway molecular components and cancer risk. Moreover, functional experiments are required to fully dissect the actual link between circadian pathway polymorphisms and the molecular mechanisms underlying cancer development. Finally, a pathway-based polygenic risk score [36,37] should be tested to translate genetic information into clinically valuable risk prediction. In fact, pathway analysis only provides evidence of association between a given biological circuit and the predisposition to a studied disease; it does not provide any clue to the magnitude of the risk linked to a specific (that is, individual) genetic signature. We hope that this study can represent a decisive step forward towards the personalization of cancer risk prediction, with potentially important implications in terms of screening programs [38].