Pan-cancer analysis of pre-diagnostic blood metabolite concentrations in the European Prospective Investigation into Cancer and Nutrition
BMC Medicine volume 20, Article number: 351 (2022)
Epidemiological studies of associations between metabolites and cancer risk have typically focused on specific cancer types separately. Here, we designed a multivariate pan-cancer analysis to identify metabolites potentially associated with multiple cancer types, while also allowing the investigation of cancer type-specific associations.
We analysed targeted metabolomics data available for 5828 matched case-control pairs from cancer-specific case-control studies on breast, colorectal, endometrial, gallbladder, kidney, localized and advanced prostate cancer, and hepatocellular carcinoma nested within the European Prospective Investigation into Cancer and Nutrition (EPIC) cohort. From pre-diagnostic blood levels of an initial set of 117 metabolites, 33 cluster representatives of strongly correlated metabolites and 17 single metabolites were derived by hierarchical clustering. The mutually adjusted associations of the resulting 50 metabolites with cancer risk were examined in penalized conditional logistic regression models adjusted for body mass index, using the data-shared lasso penalty.
Out of the 50 studied metabolites, (i) six were inversely associated with the risk of most cancer types: glutamine, butyrylcarnitine, lysophosphatidylcholine a C18:2, and three clusters of phosphatidylcholines (PCs); (ii) three were positively associated with most cancer types: proline, decanoylcarnitine, and one cluster of PCs; and (iii) 10 were specifically associated with particular cancer types, including histidine that was inversely associated with colorectal cancer risk and one cluster of sphingomyelins that was inversely associated with risk of hepatocellular carcinoma and positively with endometrial cancer risk.
These results could provide novel insights for the identification of pathways for cancer development, in particular those shared across different cancer types.
Metabolomics allows the simultaneous measurement of a large variety of compounds present in biological samples, such as human blood [1, 2]. Circulating metabolite levels can reflect both endogenous and exogenous processes, providing a snapshot of biological activity [3, 4]. As a result, metabolomics may facilitate the identification of biological mechanisms involved in the development of chronic diseases. For example, prior metabolomics studies have identified metabolites associated with the risk of various chronic conditions, including type 2 diabetes (T2D) [5,6,7], cardiovascular diseases (CVD) [8,9,10], and different site-specific cancers, including cancers of the breast, prostate [12, 13], endometrium, kidney, colorectum [16,17,18], hepatocellular carcinoma (HCC), and others [20, 21].
Several shared biological mechanisms are known to underlie multiple chronic diseases. Obesity, physical inactivity, and adherence to a Western-type diet, as well as chronic inflammation and insulin resistance, are recognized risk factors for cardio-metabolic diseases, including T2D, CVD, and several site-specific cancers [22,23,24]. Metabolomics may help uncover novel etiological mechanisms that are common to several chronic diseases as well as those that are disease-specific. One recent study identified metabolites associated with the risk of multimorbidity, defined as the simultaneous presence of multiple chronic conditions within one individual. Focusing on a pre-defined panel of metabolites, a targeted metabolomics study of breast, prostate, and colorectal cancers in a German population found that circulating levels of the phosphatidylcholine PC ae C30:0 and several lysophosphatidylcholines, including lysoPC a C18:0, were predictive of the development of any of these three cancers, suggesting that some etiological mechanisms could be shared across multiple cancer types.
In this work, we extended this concept by leveraging targeted metabolomics data available within nested case-control studies on eight cancer types (breast, colorectal, endometrial, gallbladder and biliary tract, kidney, localized prostate and advanced prostate cancers, and HCC) previously acquired in the European Prospective Investigation into Cancer and Nutrition (EPIC) [11, 12, 14, 15, 19]. The data-shared lasso [27,28,29], a penalized multivariate approach specifically designed for the investigation of a set of shared risk factors across different disease outcomes, was used to carry out a multivariate pan-cancer analysis to identify mutually adjusted metabolites associated with cancer risk and to identify those metabolites with consistent or heterogeneous patterns of associations across the eight cancer types.
EPIC is an ongoing multicentric prospective study with over 500,000 men and women recruited between 1992 and 2000 from 23 centres in 10 European countries, originally designed to study the relationship between diet and cancer risk. Incident cancer cases were identified through a combination of methods, including health insurance records, cancer and pathology registries, and active follow-up through study participants and their next-of-kin. At recruitment, information on diet and lifestyle was collected via self-administered questionnaires. Blood samples were collected from around 386,000 participants according to a standardized protocol. In France, Germany, Greece, Italy, the Netherlands, Norway, Spain, and the UK, serum (except in Norway), plasma, erythrocytes, and buffy coat aliquots were stored in liquid nitrogen (−196 °C) in a centralized biobank at the International Agency for Research on Cancer (IARC). In Denmark, blood fractions were stored locally in the vapour phase of liquid nitrogen containers (−150 °C), and in Sweden, they were stored locally at −80 °C in standard freezers. Fasting was not required.
Our analyses used a set of metabolomics measurements from 15,948 EPIC participants from seven cancer-specific matched case-control studies nested within EPIC (Table 1). In each study, each case was matched to one control selected among cancer-free participants (other than non-melanoma skin cancer) by risk set sampling, using matching factors that included study centre, sex, age at blood collection, time of the day of blood collection, fasting status, and use of exogenous hormones for women.
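The principle of 1:1 risk-set sampling used to select controls can be illustrated with a short Python sketch. This is a simplified illustration under stated assumptions, not the EPIC selection procedure: the `Participant` record, the `risk_set_sample` function and its age tolerance `age_tol` are hypothetical, time is measured in years since blood collection, and only centre, sex, and age at blood collection are used as matching factors (the EPIC studies also matched on time of day of blood collection, fasting status, and exogenous hormone use).

```python
import random
from dataclasses import dataclass


@dataclass
class Participant:
    """Hypothetical participant record (not an EPIC data structure)."""
    pid: str
    centre: str
    sex: str
    age_at_blood: float   # age at blood collection, in years
    exit_time: float      # years from blood collection to diagnosis (cases) or censoring
    is_case: bool


def risk_set_sample(people, age_tol=1.0, seed=0):
    """1:1 risk-set sampling: for each case, draw one control among participants
    still cancer-free when the case was diagnosed and matched on centre, sex, and age."""
    rng = random.Random(seed)
    pairs = []
    cases = sorted((p for p in people if p.is_case), key=lambda p: p.exit_time)
    for case in cases:
        risk_set = [
            p for p in people
            if p.pid != case.pid
            and p.exit_time > case.exit_time      # still at risk at the case's diagnosis
            and p.centre == case.centre
            and p.sex == case.sex
            and abs(p.age_at_blood - case.age_at_blood) <= age_tol
        ]
        if risk_set:                              # cases without an eligible control stay unmatched
            pairs.append((case.pid, rng.choice(risk_set).pid))
    return pairs
```

Under risk-set sampling as sketched here, a participant who later becomes a case can still serve as a control for an earlier case, because they were cancer-free at that earlier time.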
All participants provided written informed consent to participate in the EPIC study. The cancer-specific case-control studies were all approved by the ethics committee of IARC and participating EPIC centres.
As summarized in Table 1, pre-diagnostic blood samples were assayed at the Helmholtz Zentrum (München, Germany) for the second colorectal cancer study, at Imperial College London (UK) for the endometrial cancer study, and at IARC for all other studies. Data for a total of 171 metabolites were acquired by tandem mass spectrometry using either the AbsoluteIDQ p150 (for the second colorectal cancer study) or the AbsoluteIDQ p180 commercial kit (Biocrates Life Science AG, Innsbruck, Austria). Two successive assays were used: liquid chromatography-tandem mass spectrometry (LC-MS/MS) for amino acids and biogenic amines, and flow injection analysis-tandem mass spectrometry (FIA-MS/MS) for the other metabolites. Samples were either serum or citrate plasma, and samples within each study were all from the same type of blood matrix, except for the breast cancer study (Table 1). Samples of each case-control pair were assayed in the same batch (and in the same laboratory).
Selection of the metabolites and data pre-processing
Data were pre-processed following an established procedure. Briefly, metabolites with more than 25% missing values in any study were excluded. Samples with more than 25% missing values overall were excluded, as were those detected as outliers by a principal component analysis (PCA)-based approach applied within each study separately. Then, for all metabolites measured by FIA with a semi-quantitative method (acylcarnitines, glycerophospholipids, sphingolipids, hexoses), measurements below the batch-specific limit of detection (LOD) were imputed to half the LOD. When the batch-specific LOD was unknown, it was set to the study-specific median of the known batch-specific LODs. For the metabolites measured with a fully quantitative approach (amino acids and biogenic amines), measurements below the lower limit of quantification (LLOQ) or above the upper limit of quantification (ULOQ) were imputed to half the LLOQ or to the ULOQ, respectively. For all metabolites, other missing values were imputed to the batch-specific median of the non-missing measurements. The resulting measurements were then log-transformed to improve symmetry.
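The imputation rules for the semi-quantitative (FIA) metabolites can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the `BELOW_LOD` flag and the function name are hypothetical, the function is applied to one metabolite within one study, and the batch-specific medians are computed from measured values only (whether values below the LOD count as "non-missing" is an assumption). The LLOQ/ULOQ rules for fully quantitative metabolites would be handled analogously, and the missingness filters and PCA-based outlier detection are not shown.

```python
import math
from statistics import median

BELOW_LOD = "<LOD"   # hypothetical flag for a measurement reported below the limit of detection


def preprocess_fia_metabolite(values, batches, lod_by_batch):
    """Impute and log-transform one semi-quantitative (FIA) metabolite within one study.

    values       : per-sample measurement: a number, None (missing), or BELOW_LOD
    batches      : per-sample batch identifier
    lod_by_batch : batch -> known LOD; batches with an unknown LOD are simply absent
    """
    # unknown batch-specific LODs fall back to the median of the known LODs in this study
    fallback_lod = median(lod_by_batch.values())
    # batch-specific medians of the measured (non-missing, above-LOD) values
    measured = {}
    for v, b in zip(values, batches):
        if v is not None and v != BELOW_LOD:
            measured.setdefault(b, []).append(v)
    out = []
    for v, b in zip(values, batches):
        if v == BELOW_LOD:
            x = lod_by_batch.get(b, fallback_lod) / 2   # half the LOD
        elif v is None:
            x = median(measured[b])                     # assumes the batch has measured values
        else:
            x = v
        out.append(math.log(x))                         # log-transform to improve symmetry
    return out
```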
Cancer types and exclusion criteria
We focused on eight cancer types, namely breast, colorectal, endometrial, kidney, and gallbladder and biliary tract cancers, HCC, and advanced and localized prostate cancers. As detailed in the Supplementary Material (Additional file 1: Section 1 [12, 19]), matched case-control pairs for HCC and gallbladder and biliary tract cancer were extracted from the liver cancer study, while matched case-control pairs for advanced and localized prostate cancer were extracted from the prostate cancer study. Since hormones could affect metabolite levels and their association with cancer risk, case-control pairs including at least one woman using exogenous hormones (either hormone replacement therapy or oral contraceptives) at baseline were excluded.
All analyses were performed using R software. Characteristics of cases and controls for the eight studied cancer types were described using means and standard deviations or frequencies. Pearson correlations between the metabolites were computed in controls only to reduce collider bias.
Clustering of metabolites
The most strongly correlated metabolites were grouped together by applying the hierarchical clustering approach implemented in the ClustOfVar R package to the control samples. For each cluster, the method defined its representative as the first principal component in the PCA of the metabolites grouped into that cluster. In our figures and tables, cluster representatives are labelled “xxx_clus”, with “xxx” representing one particular metabolite of that cluster. We retained the partition with the lowest number of clusters such that the representatives explained at least 80% of the total variation in each cluster. Cluster representatives and metabolites left isolated after the clustering are simply referred to as metabolites hereafter.
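The 80% criterion can be illustrated with a small Python sketch. ClustOfVar itself is an R package and its hierarchical clustering is not reproduced here. The sketch assumes, as for quantitative variables in ClustOfVar, that a representative is the first principal component of the standardized metabolites of its cluster, so that the share of variation it explains equals the largest eigenvalue of the cluster's correlation matrix divided by the number of metabolites. All function names are hypothetical.

```python
import math


def correlation_matrix(cols):
    """Pearson correlation matrix of equal-length columns (one column per metabolite)."""
    def standardize(c):
        m = sum(c) / len(c)
        sd = math.sqrt(sum((x - m) ** 2 for x in c) / len(c))
        return [(x - m) / sd for x in c]
    z = [standardize(c) for c in cols]
    n = len(cols[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]


def first_pc_share(cols, iters=500):
    """Share of the total (standardized) variation of a cluster explained by its first PC."""
    r = correlation_matrix(cols)
    p = len(r)
    v = [1.0 + 0.1 * i for i in range(p)]       # asymmetric start for the power iteration
    for _ in range(iters):
        w = [sum(r[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    top_eigenvalue = sum(v[i] * sum(r[i][j] * v[j] for j in range(p)) for i in range(p))
    return top_eigenvalue / p


def clusters_meet_threshold(clusters, threshold=0.80):
    """True if every cluster's representative explains at least `threshold` of its variation."""
    return all(first_pc_share(cols) >= threshold for cols in clusters)
```

Under this criterion, the retained partition would be the one with the fewest clusters for which `clusters_meet_threshold` holds when the dendrogram is cut at each candidate number of clusters.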
Given the number of studied metabolites, penalized conditional logistic regression models were used to estimate mutually adjusted associations with cancer risk. Since body mass index (BMI) could be a strong confounder of the relationship between several of the examined metabolites [33, 34] and cancers [35,36,37,38,39], metabolite-specific linear models were used to compute the residuals of each metabolite after regression on BMI. To account for the large number of metabolites and leverage possible commonalities among the metabolic disorders preceding cancer development for different cancer types, estimation was based on the data-shared lasso [27,28,29], an extension of the lasso allowing the analysis of case-control studies with multiple disease types. For each metabolite, the data-shared lasso decomposes its type-specific odds ratio as the product of (i) an overall odds ratio capturing the overall association with cancer and (ii) type-specific deviations from this overall odds ratio (equivalently, the type-specific log odds ratio is the sum of an overall log odds ratio and a type-specific deviation). The method then identifies whether the overall (mutually adjusted) association of each metabolite with cancer is null or not, and also whether some of its type-specific associations deviate from its (possibly null) overall association with cancer. Compared to more standard approaches, the data-shared lasso was shown to perform particularly well for the identification of features with a consistent non-null association with multiple disease types, while also allowing for the identification of type-specific associations. The data-shared lasso and its implementation are described further in the Supplementary Material (Additional file 1: Section 2 [27,28,29, 41,42,43,44,45]).
To assess the robustness of the identified associations, the data-shared lasso was applied to 100 bootstrap samples generated from the original sample. Moreover, following the rationale of the lasso-OLS hybrid, associations identified by the data-shared lasso were further inspected using unpenalized conditional logistic regression models, (i) to quantify their strength and investigate possible heterogeneity among the type-specific associations beyond those identified by the data-shared lasso (see Additional file 1: Section 3 [47, 48] for details); (ii) to assess possible departure from linearity by comparing models with natural cubic splines to models with linear terms only; and (iii) to assess possible attenuation after excluding, in turn, the first 2 and the first 7 years of follow-up (to examine potential reverse causation and, more generally, assess the impact of time to diagnosis on our findings), after adjustment for additional factors (education level, waist circumference, height, physical activity, smoking status, alcohol intake, use of non-steroidal anti-inflammatory drugs, and, for women, menopausal status and, for premenopausal women, phase of the menstrual cycle), and after reintegrating the pairs comprising at least one hormone user. Effect modification by BMI was assessed under standard (i.e., non-conditional) logistic regression models after breaking the matching and correcting metabolite measurements for batch and study effects. Finally, to assess the impact of the exclusion of pairs with missing information on tumour stage in the prostate study, the data-shared lasso was applied to 100 bootstrap samples generated from the sample comprising all pairs from the prostate study, after considering an additional subtype (“unknown stage”) for prostate cancer.
For comparison, non-mutually adjusted associations with cancer risk were estimated for each metabolite in conditional logistic regression models adjusted for BMI. These analyses and their results are presented in the Supplementary Material (Additional file 1: Section 4).
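The decomposition can be made concrete with a small Python sketch; it is not the authors' implementation (which used R, see Additional file 1) and rests on assumptions stated here. With 1:1 matching, the conditional likelihood of a pair reduces to a logistic function of the case-minus-control difference in metabolite levels, so conditional logistic regression can be fitted as a no-intercept logistic regression on these differences. Writing the log odds ratio for cancer type k as β + Δ_k, the data-shared lasso then becomes an ordinary lasso on an augmented design with one shared block and one block per cancer type, penalized by λ(‖β‖₁ + Σ_k r_k‖Δ_k‖₁). The per-type weights `r`, the penalty `lam`, the plain proximal-gradient (ISTA) solver, and all function names are illustrative choices; BMI residualization and the choice of λ are not shown.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def augmented_row(diff, k, n_types):
    """Data-shared design row: the shared block, then one block per cancer type,
    of which only the block of this pair's type k is non-zero."""
    row = list(diff)                                          # shared (overall) coefficients
    for t in range(n_types):
        row += list(diff) if t == k else [0.0] * len(diff)    # type-specific deviations
    return row


def fit_data_shared_lasso(diffs, types, n_types, lam, r=None, iters=2000):
    """Penalized 1:1 conditional logistic regression with a data-shared lasso penalty.

    diffs : case-minus-control metabolite vectors, one per matched pair
    types : cancer type index (0..n_types-1) of each pair
    Returns (beta, deltas): overall log-ORs and per-type deviations, so that the
    log-OR for type k is beta[j] + deltas[k][j].
    """
    p = len(diffs[0])
    r = r if r is not None else [1.0] * n_types
    weights = [1.0] * p + [r[t] for t in range(n_types) for _ in range(p)]
    Z = [augmented_row(d, k, n_types) for d, k in zip(diffs, types)]
    n, q = len(Z), len(weights)
    # step size 1/L, with L bounded by ||Z||_F^2 / (4n) for the average logistic loss
    step = 4.0 * n / sum(z * z for row in Z for z in row)
    theta = [0.0] * q
    for _ in range(iters):
        grad = [0.0] * q
        for row in Z:
            # each pair contributes -log sigmoid(z'theta), whose gradient is -sigmoid(-z'theta) z
            s = sigmoid(-sum(a * b for a, b in zip(row, theta)))
            for j, a in enumerate(row):
                grad[j] -= s * a / n
        for j in range(q):
            u = theta[j] - step * grad[j]                              # gradient step
            shrink = step * lam * weights[j]
            theta[j] = math.copysign(max(abs(u) - shrink, 0.0), u)   # soft-thresholding
    beta = theta[:p]
    deltas = [theta[p * (t + 1):p * (t + 2)] for t in range(n_types)]
    return beta, deltas
```

When two cancer types carry the same signal, the penalty makes it cheaper to encode it once in β than twice in the Δ_k, so the deviations stay at zero; a signal present in only one type is instead absorbed by that type's Δ_k.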
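The bootstrap replication frequencies can be computed along the lines of the following Python sketch. It assumes that whole matched pairs are resampled with replacement, which keeps the matching intact; whether resampling was stratified by cancer type is not stated here. The `select` callable is a hypothetical stand-in for a full data-shared lasso fit that returns the set of selected associations.

```python
import random
from collections import Counter


def bootstrap_selection_frequency(pairs, select, n_boot=100, seed=1):
    """Proportion of bootstrap samples in which each association is selected.

    pairs  : matched case-control pairs; whole pairs are resampled so matching is preserved
    select : hypothetical callable mapping a list of pairs to the set of selected associations
             (for instance, the non-zero coefficients of a data-shared lasso fit)
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_boot):
        sample = [rng.choice(pairs) for _ in range(len(pairs))]   # resample pairs with replacement
        counts.update(set(select(sample)))                         # count each association once per sample
    return {assoc: c / n_boot for assoc, c in counts.items()}
```

Associations with a frequency above a chosen threshold (such as the 50% and 55% thresholds used in the Results) would then be regarded as replicated.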
Analysis of additional metabolites
The 16 metabolites (Additional file 2: Table S1) that were not acquired in the second colorectal cancer study (AbsoluteIDQ p150 kit) were not included in our main analysis and were examined in a reduced sample, using the methods described above.
Among the 118 metabolites that were measured in all cancer type-specific studies of the main analysis, the acylcarnitine C4-OH (C3-DC) was the only one with missing values in more than 25% of the samples of at least one study (prostate) and was excluded. Exclusions of subjects are detailed in Fig. 1. Briefly, 44 samples were initially excluded because they were assayed in batches with fewer than 10 samples (6 samples), identified as outlying samples (2 samples), or unmatched to either a case or a control sample (36 samples). Seventy-nine pairs from the liver study were also excluded because the case had developed a liver cancer other than HCC or gallbladder and biliary tract cancer, along with 1164 pairs from the prostate study for which no information on the tumour stage was available for the case. Finally, 881 pairs including at least one exogenous hormone user at blood collection were excluded.
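As a consistency check, assume that the 15,948 participants described in Table 1 are the starting point and that all samples remaining after the 44 initial exclusions form complete case-control pairs (the 36 unmatched samples having been removed). Under these assumptions, the exclusions reconcile with the size of the analysed sample:

```latex
\begin{aligned}
\frac{15\,948 - 44}{2} &= 7\,952 \ \text{pairs},\\
7\,952 - 79 - 1\,164 - 881 &= 5\,828 \ \text{pairs} \;=\; 11\,656 \ \text{participants}.
\end{aligned}
```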
Description of the study population
A total of 11,656 EPIC participants were included in the analysis comprising 5828 matched case-control pairs. Cases were diagnosed at an average age of 64.4 years, 8.4 years after blood collection. The main characteristics of cases and controls in each study are displayed in Table 2.
The main analysis focused on 117 metabolites that were retained after the pre-processing step (Additional file 2: Table S1). As displayed in Additional file 2: Figure S1, strong positive correlations were observed between some metabolites, particularly between some of the glycerophospholipids (phosphatidylcholines, PCs, and lysophosphatidylcholines, lysoPCs) and sphingomyelins (SMs).
Clustering of metabolites
The hierarchical clustering applied to controls grouped 100 metabolites into 33 clusters of size ranging from 2 to 6 metabolites per cluster, while 17 metabolites remained isolated. As displayed in Fig. 2, clusters comprised metabolites of the same chemical class, and correlations between metabolites and their representative were consistently greater than 0.83. On average, clusters’ representatives explained 86% of the total variation of their cluster (range: 80–95%), and the 33 + 17 = 50 studied metabolites together explained more than 88% of the total variation of the original 117 metabolites.
As displayed in Figs. 3 and 4, the data-shared lasso identified nine metabolites with a non-null overall association with cancer: butyrylcarnitine (acylcarnitine C4), glutamine, lysoPC a C18:2, and three clusters of PCs (those containing PC aa C32:2, PC aa C36:0, and PC aa C36:1, respectively), with an inverse overall association with cancer risk, and decanoylcarnitine (acylcarnitine C10), proline, and the cluster of PCs that included PC aa C28:1, with a positive overall association. Cancer type-specific deviations from the overall association with cancer risk were identified for three of these metabolites: the association between proline and breast cancer risk was inverse or null, while the associations of lysoPC a C18:2 and of the cluster containing PC aa C36:0 with localized prostate cancer risk were positive or null.
Several cancer type-specific associations were identified among the remaining 41 metabolites. Specifically, positive associations were observed between breast cancer risk and two clusters that included tetradecenoylcarnitine (acylcarnitine C14:1) and PC aa C36:5, respectively. The risk of colorectal cancer was positively associated with arginine and PC ae C36:0 and inversely associated with the cluster that included histidine. The risk of HCC was positively associated with the cluster containing PC aa C40:2 and inversely associated with the two clusters that included lysoPC a C20:3 and SM C16:0, respectively. This latter cluster was also positively associated with endometrial cancer risk. The cluster that included octadecenoylcarnitine (acylcarnitine C18:1) was inversely associated with the risk of advanced prostate cancer. Finally, the risk of localized prostate cancer was inversely associated with hexoses (H1).
The strength of the associations identified by the data-shared lasso was similar after excluding, in turn, the first 2 and the first 7 years of follow-up (Additional file 2: Fig. S2), and after reintegrating the 881 pairs comprising at least one hormone user (Additional file 2: Fig. S3). Likewise, models adjusted for additional factors produced similar associations (Additional file 2: Fig. S2), except for the overall association with cancer for the cluster that included PC aa C28:1, whose odds ratio (OR) was attenuated from 1.09 (95% confidence interval: 1.01–1.17) to 1.04 (0.98–1.12), and for the association between endometrial cancer risk and the cluster that included SM C16:0, whose OR decreased from 1.51 (1.19–1.93) to 1.20 (0.97–1.47). For each overall association and type-specific deviation identified by the data-shared lasso, linearity and absence of effect modification by BMI were compatible with our data (Additional file 2: Fig. S4). Focusing on the nine metabolites that had a non-null overall association with cancer, the analysis presented in Additional file 2: Fig. S5 suggested possible cancer type-specific deviations from the overall associations beyond the three identified by the data-shared lasso, in particular for HCC (with acylcarnitine C4, proline, and the cluster that comprises PC aa C36:1) and for kidney cancer (with acylcarnitines C10 and C4 and the cluster that comprises PC aa C36:1). However, none of the comparisons between the models identified by the data-shared lasso and the nine “extended” models used to derive these fully cancer type-specific associations reached statistical significance (Additional file 2: Fig. S5).
As displayed in Table 3 (third column), 15 out of the 22 associations identified by the data-shared lasso were replicated in more than 50% of the bootstrap samples. As displayed in Table 4, three inverse cancer type-specific associations that were not identified by the data-shared lasso on the original sample were identified in more than 55% of the bootstrap samples: the cluster comprising glycine with endometrial cancer risk (identified in 65% of the bootstrap samples), the cluster containing decenoylcarnitine (acylcarnitine C10:1) with kidney cancer risk (56%), and lysoPC a C16:1 with localized prostate cancer risk (84%). Positive associations between arginine and kidney cancer risk (74%) and between the cluster containing lysoPC a C16:0 and localized prostate cancer risk (86%) were also observed in more than 55% of the bootstrap samples.
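The odds ratios above are reported with 95% confidence intervals. Assuming these are Wald intervals from the unpenalized conditional logistic regression models (typical, but not stated in this section), each interval is symmetric on the log-odds scale. The small Python sketch below, with hypothetical helper names, converts between the two representations, which can help when comparing attenuated and unattenuated estimates.

```python
import math

Z_975 = 1.959963984540054   # 97.5th percentile of the standard normal distribution


def or_with_wald_ci(log_or, se):
    """Odds ratio with its 95% Wald confidence interval, from a log-OR and its standard error."""
    return (math.exp(log_or),
            math.exp(log_or - Z_975 * se),
            math.exp(log_or + Z_975 * se))


def log_or_and_se_from_ci(lower, upper):
    """Recover the log-OR and its standard error from a reported 95% Wald interval.

    Wald intervals are symmetric on the log scale, so the log-OR is the midpoint of the
    log bounds (the OR is their geometric mean) and the SE is the half-width divided by Z_975.
    """
    lo, hi = math.log(lower), math.log(upper)
    return (lo + hi) / 2, (hi - lo) / (2 * Z_975)
```

For example, `log_or_and_se_from_ci(1.01, 1.17)` gives a log-OR of about 0.083 (an OR of about 1.087, consistent with the reported 1.09 given the rounding of the bounds) and a standard error of about 0.038.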
Results obtained on the bootstrap samples generated from the extended sample comprising all the pairs from the prostate study are presented in Additional file 2: Tables S2 and S3. Fifteen associations out of the 22 identified in our main analysis were replicated in more than 50% of these bootstrap samples. A few additional overall and type-specific associations were identified in a large proportion of the bootstrap samples (see Additional file 2: Table S3). In particular, an inverse association between acylcarnitine C10 and unknown stage prostate cancer was observed in 80% of the samples.
Analysis of the extended list of metabolites
After excluding 2134 samples from the second colorectal cancer study which used a different platform that measured a lower number of metabolites, 16 additional metabolites could be evaluated (Additional file 2: Table S1). Among them, the clustering step grouped leucine and isoleucine together. The analysis of this extended list of metabolites then focused on 65 metabolites (31 isolated metabolites and 34 cluster representatives), measured in 9522 participants. As displayed in Table 3, 11 out of the 22 associations identified in the main analysis presented above were again replicated in more than 50% of the bootstrap samples generated from this reduced sample. Four associations that were not identified in our previous analyses were identified in more than 55% of these new bootstrap samples (Table 4): an overall positive association between cancer risk and glutamate (55% of the bootstrap samples), an overall inverse association between cancer risk with spermine (78%), and two cancer type-specific associations between glutamate with breast cancer risk (inverse, 56%) and between serotonin and colorectal cancer risk (positive, 84%).
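The sample size and number of variables of this extended analysis follow from the main analysis by simple bookkeeping. Assuming the 50 metabolites of the main analysis kept the same clustering, the merging of leucine and isoleucine into one new cluster accounts for the difference of one:

```latex
\begin{aligned}
11\,656 - 2\,134 &= 9\,522 \ \text{participants},\\
\underbrace{(33 + 1)}_{\text{clusters}} + \underbrace{(17 + 16 - 2)}_{\text{isolated}} &= 34 + 31 = 65 = 50 + 16 - 1 \ \text{metabolites}.
\end{aligned}
```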
Using available metabolomics data from eight cancer-specific matched case-control studies nested within the EPIC cohort, we investigated the relationship between pre-diagnostic blood levels of over one hundred metabolites and risks of breast cancer, colorectal cancer, endometrial cancer, gallbladder and biliary tract cancer, HCC, kidney cancer, and localized and advanced prostate cancers. In our main analysis, we found nine metabolites associated with cancer risk across different cancer types, suggesting the existence of shared metabolic pathways, as well as fourteen cancer type-specific associations. These identified associations were found to be robust after extensive sensitivity analyses: in particular, they were not attenuated after exclusion of the first years of follow-up, hence were less likely to be due to reverse causality, were not attenuated after adjustment for relevant cancer risk factors, were not modified by BMI, and did not deviate significantly from linearity. In additional analyses, in particular those based on bootstrap samples, we identified several additional metabolites possibly associated with the risk of specific cancer types or with cancer risk across different cancer types.
Our results suggested that concentrations of glycerophospholipids (phosphatidylcholines and lysophosphatidylcholines) could be linked to the risk of cancer overall as well as to specific cancer types. The role of glycerophospholipids in carcinogenesis is not fully understood but could be related to their documented anti-inflammatory properties, protection from oxidative stress, inhibition of cell proliferation, and induction of apoptosis [50,51,52]. We observed a consistent inverse association between cancer risk with lysoPC a C18:2 as well as three clusters of phosphatidylcholines across all studied cancer types, except localized prostate cancer for which the association with lysoPC a C18:2 and one cluster of phosphatidylcholines was absent, or positive. An inverse association was previously reported between lysoPC a C18:2 with T2D in different studies [7, 53] as well as with risks of breast, colorectal, and prostate cancers in the pan-cancer analysis conducted in the EPIC Heidelberg study . Our results regarding the three clusters of phosphatidylcholines were in line with many previously reported inverse associations between cancer and phosphatidylcholines [11, 12, 15, 16, 20, 54]. Besides, we identified a positive association between the cluster that included PC aa C28:1 and cancer risk across all studied cancer types. This cluster also comprised PC ae C30:0, for which a positive association was reported with risks of breast, colorectal, and prostate cancers in the EPIC Heidelberg study . Cancer type-specific positive associations were found for the cluster containing PC aa C36:5 with breast cancer, PC ae C36:0 with colorectal cancer, and the cluster containing PC aa C40:2 with HCC. These three clusters were correlated with one another (Pearson correlation greater than 0.48), indicating that higher levels of these phosphatidylcholines might contribute to the development of these three cancer types.
We also observed robust associations between specific circulating amino acids and cancer risk. Our results suggested that proline was positively related to cancer risk across all studied cancer types, except breast cancer and possibly HCC (see Additional file 2: Fig. S5). A positive association between proline and prostate cancer risk was previously reported in EPIC . In addition, a drosophila model of high-sugar diet  recently highlighted the possible role of proline in tumour growth, and proline was also found to distinguish colorectal cancer patients from those with adenomas  and to be associated with metastasis formation . In the body, proline is generally synthesized via the glutamate/pyrroline 5-carboxylate pathway . Glutamate was also found to be positively related to the risk of all cancer types except for breast cancer in our analysis. Moreover, glutamate is formed from the degradation of glutamine, which was inversely associated with overall cancer risk. Although prior studies of the French E3N and SU.VI.MAX cohorts reported a positive association between glutamine and premenopausal breast cancer [59, 60], our results regarding glutamine and glutamate were consistent with those of many previous studies that reported inverse associations between glutamine and risk of colorectal cancer , HCC [19, 61], and T2D [7, 25] and positive associations between glutamate and risk of premenopausal breast cancer , kidney cancer , HCC [19, 61], and T2D . Lower serum levels of glutamine were also observed in kidney cancer  and ovarian cancer  cases compared to controls. Glutamine is an energy substrate for cancer cells and makes a major contribution to nitrogen metabolism. Alterations in glutamine-glutamate equilibrium often reflect energetic processes related to cancer metabolism . 
It is possible that altered levels of glutamine and glutamate in individuals subsequently diagnosed with cancer may reflect ongoing metabolic processes related to cancer development and as such may serve as an early biomarker of cancer risk. However, the inverse association between glutamine levels and overall cancer risk observed in our analysis was only slightly attenuated after excluding, in turn, the first 2 and the first 7 years of follow-up suggesting that changes in the glutamine-glutamate may precede cancer development.
Our analysis additionally identified two positive and two inverse cancer type-specific associations with circulating amino acids. We observed an inverse association between colorectal risk and the cluster containing histidine, for which previous studies reported inverse associations with risks of colorectal cancer and T2D , while a positive association was reported with breast cancer . Also, lower serum levels of histidine were previously reported in ovarian cancer cases compared to controls . Our results further suggested an inverse association between endometrial cancer risk and the cluster composed of glycine and serine, in line with previous results from the EPIC cancer-specific study of endometrial cancer . Previous studies also reported inverse associations between glycine and/or serine with risks of T2D . Finally, our analysis suggested a positive association between arginine with risks of colorectal and kidney cancers (Table 4). Arginine plays a key role in nitric oxide production and polyamines synthesis . Both have been found to be associated with tumour growth, with polyamines enhancing it and nitric oxide inhibiting it. Arginine’s influence on tumour growth thus might be related to the relative activity of those two pathways. For instance, arginine was previously found to be positively associated with breast cancer in the E3N cohort , while an inverse association with breast cancer was reported in EPIC .
Regarding the biogenic amines, we found a positive association between serotonin levels and colorectal cancer risk, consistent with previous results from the CORSA case-control study and a previous EPIC analysis of colon cancer . We also found a consistent inverse association between spermine and the risk of the eight studied cancer types. Like other polyamines, spermine is involved in cell proliferation and differentiation and has antioxidant properties , and dysregulation of polyamine metabolism is characteristic of multiple types of tumours . It was previously reported that polyamine supplementation, in particular spermidine, which acts as an intermediate in the conversion of putrescine to spermine, could be related to reduced overall and cancer-specific mortality [70,71,72].
In our analysis, localized and advanced prostate cancers were considered as two different outcomes as previous results suggested that metabolic dysregulation might be predictive of advanced or aggressive prostate cancers only . In fact, we observed some differences between the metabolites associated with risks of localized and advanced prostate cancers, respectively. Specifically, and as previously reported [12, 13], our results suggested that hexoses, glycerophospholipids, octadecenoylcarnitine (acylcarnitine C18:1), and/or octadecadienylcarnitine (acylcarnitine C18:2) could help differentiate the respective mechanisms involved in the development of aggressive and localized prostate tumours. On the other hand, the positive association with decanoylcarnitine (acylcarnitine C10), which was observed with risk of all cancer types, and in particular with both localized and advanced prostate cancer risk, was notably attenuated when including the unknown stage prostate cancer pairs: it was only detected in 44% of the bootstrap samples generated from that extended sample (see Additional file 2: Table S2), in line with the inverse association between decanoylcarnitine and unknown stage prostate cancer that was observed in 80% of the samples (Additional file 2: Table S3). Overall, these results suggested that the positive association between decanoylcarnitine and prostate cancer identified in our main analysis might not be real and might be due to an association between decanoylcarnitine and cancer stage missingness in our prostate cancer study.
Some metabolites identified in our study have previously been associated with established cancer risk factors such as obesity [33, 34]. In particular, a recent metabolomics study of BMI reported inverse associations with glutamine, lysophosphatidylcholine a C18:2, and phosphatidylcholine PC aa C38:0 (which was clustered with PC aa C36:0 in our analysis), and a positive association with glutamate. The directions of these associations with BMI were consistent with those of the associations with cancer risk identified in our study after adjustment for BMI, suggesting that these metabolites might be mediators of the obesity-cancer relationship.
Our study has several strengths. First, it relied on a large sample of pre-diagnostic metabolomics data from 5828 case-control pairs in studies of eight cancer types nested within a large prospective cohort, with blood collected on average 6.4 years before the cases were diagnosed. Second, in a context where some metabolites might be predictive of risk for multiple cancer types, the data-shared lasso automatically accounted for or ignored cancer type when assessing the association between each metabolic feature and cancer risk, depending on whether the data supported heterogeneity among the cancer type-specific associations for that particular feature. The comparison of the results produced by the standard univariate analyses and by the data-shared lasso illustrated the advantages of the latter. On the one hand, the data-shared lasso benefited from the increased statistical power of the pooled analysis to identify metabolites that could be involved in the development of multiple cancer types: for example, butyrylcarnitine (acylcarnitine C4) was not associated with cancer risk in any of the cancer type-specific univariate analyses, but it was in the univariate pooled analysis and in the data-shared lasso analysis. On the other hand, unlike the simple pooled analysis, the data-shared lasso does not necessarily mask cancer type-specific associations: for example, it identified a positive association between breast cancer risk and the cluster containing tetradecenoylcarnitine (acylcarnitine C14:1), as did the univariate analysis of the breast cancer study, whereas the univariate pooled analysis did not.
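The pooling mechanism of the data-shared lasso (Gross and Tibshirani; see also Ollier and Viallon, and Ballout et al. for case-control studies with multiple disease subtypes) can be made concrete. The coefficient vector for cancer type k is written as β_k = β + Δ_k, and the penalty λ(‖β‖₁ + Σ_k r_k‖Δ_k‖₁) shrinks each deviation Δ_k to zero unless the data support it. The Python sketch below is an illustration, not the authors' implementation. It relies on the standard simplification of conditional logistic regression for 1:1 matched pairs, in which the probability that the observed case is the case equals sigmoid(d_iᵀβ_k), with d_i the case-minus-control difference. It then builds an augmented design on which a plain L1 penalty reproduces the data-shared penalty. The weights `r` are treated as user-supplied, and adjustment for BMI is omitted for brevity.

```python
import numpy as np


def data_shared_design(D, group, r):
    """Augmented design matrix for the data-shared lasso (illustrative sketch).

    D     : (n, p) within-pair differences x_case - x_control
    group : length-n integer array with the cancer type k (0..K-1) of each pair
    r     : length-K positive weights r_k on the type-specific deviations

    The first p columns carry the shared coefficients beta; block k holds
    D / r_k on the rows of type k (zeros elsewhere) and carries
    gamma_k = r_k * delta_k, so a plain L1 penalty on (beta, gamma)
    equals ||beta||_1 + sum_k r_k ||delta_k||_1.
    """
    n, p = D.shape
    K = len(r)
    Z = np.zeros((n, p * (K + 1)))
    Z[:, :p] = D
    for k in range(K):
        rows = group == k
        Z[rows, p * (k + 1):p * (k + 2)] = D[rows] / r[k]
    return Z


def pair_loglik(Z, theta):
    """Conditional log-likelihood for 1:1 matched pairs,
    sum_i log sigmoid(z_i' theta), computed in a numerically stable way."""
    return -np.sum(np.logaddexp(0.0, -(Z @ theta)))


# Check the reparameterisation on random toy values.
rng = np.random.default_rng(0)
n, p, K = 12, 3, 2
D = rng.normal(size=(n, p))
group = np.arange(n) % K
r = np.array([0.5, 2.0])
beta = rng.normal(size=p)
delta = rng.normal(size=(K, p))
theta = np.concatenate([beta] + [r[k] * delta[k] for k in range(K)])

Z = data_shared_design(D, group, r)
direct = np.einsum("ij,ij->i", D, beta + delta[group])
print(np.allclose(Z @ theta, direct))  # True
```

Maximizing `pair_loglik` on `Z` under an L1 penalty therefore fits shared and type-specific effects jointly. A metabolite whose deviations are all shrunk to zero is reported with a single association common to all cancer types.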
Another key difference between the standard univariate analyses and the data-shared lasso is that the latter allowed the investigation of mutually adjusted associations, and hence the identification of metabolites or clusters of metabolites whose association with cancer risk could not be explained away by the other metabolites included in our analysis. Mutual adjustment also revealed associations that could not be detected in minimally adjusted models, such as the association between arginine and colorectal cancer risk, which was not apparent in models not adjusted for glutamine and histidine. Finally, our study benefited from extensive sensitivity analyses.
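Why mutual adjustment can reveal an association that minimally adjusted models miss can be illustrated with a toy simulation. It uses a linear model and simulated values, not the study's conditional logistic models or data, and the variables `x1` and `x2` are generic hypothetical metabolites. When two positively correlated variables act in opposite directions, their effects cancel in the marginal model and reappear only once both are included.

```python
import numpy as np

# Simulated data (assumption): x2 is strongly correlated with x1,
# and the outcome depends on both with opposite signs.
rng = np.random.default_rng(42)
n = 5000
x1 = rng.normal(size=n)
x2 = x1 + 0.5 * rng.normal(size=n)
y = 1.0 * x1 - 1.0 * x2 + rng.normal(size=n)


def ols(y, *cols):
    """Ordinary least squares with an intercept; returns the slopes only."""
    X = np.column_stack([np.ones_like(y), *cols])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]


marginal = ols(y, x1)[0]       # near 0: the two effects cancel
adjusted = ols(y, x1, x2)[0]   # near 1: recovered after mutual adjustment
print(f"marginal={marginal:.2f}, adjusted={adjusted:.2f}")
```

Here the marginal slope of x1 equals its own effect plus the effect of x2 scaled by cov(x1, x2)/var(x1), which is 1 - 1 = 0 by construction.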
On the other hand, identifying cancer risk factors is particularly challenging when candidate risk factors are strongly correlated with one another. Here, we clustered the most strongly correlated metabolites together before applying the data-shared lasso. In a sensitivity analysis, the data-shared lasso was applied to the original set of 117 metabolites, thus skipping the clustering step, and the results were largely consistent with those of our main analysis (Additional file 2: Fig. S7). Moreover, because strong correlations remained among some of the features produced by the hierarchical clustering (Additional file 2: Figs. S8 and S9), we applied the data-shared lasso to multiple bootstrap samples to gauge the robustness and specificity of the associations identified in our main analysis. Although most of the identified associations were replicated in a large proportion of bootstrap samples, a few were less robust and hence more questionable. For example, the inverse association between HCC risk and the cluster that included lysoPC a C20:3 was replicated in only 32% of the bootstrap samples. This lack of robustness could be due to the strong correlation between this cluster and the other three studied features related to lysoPCs (Pearson correlation greater than 0.65; Additional file 2: Fig. S8). Indeed, an inverse association between HCC risk and at least one of the four features related to lysoPCs was identified in 78% of the bootstrap samples. Overall, these results suggest a stronger inverse association with lysoPC-related features for HCC than for the other cancer types, but our analysis could not unambiguously identify which specific lysoPCs underlie this stronger inverse association.
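The clustering of strongly correlated metabolites can be sketched as follows. This is an assumed, simplified procedure and not the ClustOfVar-based pipeline of the study. It applies complete-linkage agglomeration on absolute Pearson correlations and summarizes each cluster by the first principal component of its standardized members, and the threshold `min_abs_corr` is hypothetical. Complete linkage only guarantees high correlations within clusters, so representatives of different clusters can remain correlated, as observed in our data.

```python
import numpy as np


def cluster_correlated(X, min_abs_corr=0.8):
    """Complete-linkage clustering of the columns of X: two clusters merge
    only if every cross pair has |Pearson r| >= min_abs_corr (assumed rule)."""
    r = np.abs(np.corrcoef(X, rowvar=False))
    clusters = [[j] for j in range(X.shape[1])]
    while True:
        best, best_pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: the weakest cross-correlation decides.
                link = r[np.ix_(clusters[a], clusters[b])].min()
                if link > best:
                    best, best_pair = link, (a, b)
        if best_pair is None or best < min_abs_corr:
            break
        a, b = best_pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def cluster_representative(X, cols):
    """First principal component score of the standardized cluster members,
    with its sign aligned to the first member for interpretability."""
    Z = X[:, cols]
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0, ddof=1)
    if len(cols) == 1:
        return Z[:, 0]
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    score = Z @ Vt[0]
    if np.corrcoef(score, Z[:, 0])[0, 1] < 0:
        score = -score
    return score


# Toy data: columns 0-2 share factor f1, columns 3-4 share f2, column 5 is alone.
rng = np.random.default_rng(0)
n = 500
f1, f2 = rng.normal(size=(2, n))
X = np.column_stack(
    [f1 + 0.2 * rng.normal(size=n) for _ in range(3)]
    + [f2 + 0.2 * rng.normal(size=n) for _ in range(2)]
    + [rng.normal(size=n)]
)
clusters = cluster_correlated(X, min_abs_corr=0.8)
reps = np.column_stack([cluster_representative(X, c) for c in clusters])
print(sorted(sorted(c) for c in clusters))  # [[0, 1, 2], [3, 4], [5]]
```

The resulting columns of `reps` would then replace the original metabolites as input features, which is the role the 33 cluster representatives and 17 single metabolites play in our analysis.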
An additional limitation when interpreting the lipid results is the lack of specificity of lipid measurements obtained by flow injection analysis (FIA) with the AbsoluteIDQ p180/p150 kits [73, 74]: this method does not allow unambiguous identification of the compounds measured, since the observed signal could correspond to several compounds. Moreover, the limited sample size for some of the studied cancer types (in particular, gallbladder and biliary tract cancer and HCC) restricted our ability to identify cancer type-specific deviations. In this respect, we complemented our analysis by inspecting estimates from models that extended the one selected by the data-shared lasso to allow fully type-specific associations (Additional file 2: Fig. S5). Another potential limitation of our study was the lack of repeated measurements, although previous studies suggested that blood metabolite levels are relatively stable and that a single measurement might be sufficient to capture medium-term exposure [75,76,77].
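Part of this ambiguity stems from the fact that a feature such as "PC aa C36:2" reports only the total number of carbons and double bonds across the two fatty acyl chains. The Python sketch below enumerates the chain combinations compatible with a given sum composition. It assumes an illustrative and deliberately incomplete list of common fatty acids, and it ignores other sources of ambiguity such as isobaric species from other lipid classes.

```python
from itertools import combinations_with_replacement

# Illustrative subset of common fatty acyl chains as (carbons, double bonds);
# an assumption for this sketch, not a complete catalogue.
COMMON_FATTY_ACIDS = [
    (14, 0), (16, 0), (16, 1), (18, 0), (18, 1), (18, 2), (18, 3),
    (20, 3), (20, 4), (20, 5), (22, 6),
]


def compatible_chain_pairs(total_carbons, total_double_bonds,
                           fatty_acids=COMMON_FATTY_ACIDS):
    """Unordered pairs of acyl chains whose combined carbons and double
    bonds match a diacyl sum composition such as PC aa C36:2."""
    return [
        (a, b)
        for a, b in combinations_with_replacement(fatty_acids, 2)
        if a[0] + b[0] == total_carbons and a[1] + b[1] == total_double_bonds
    ]


for a, b in compatible_chain_pairs(36, 2):
    print(f"PC {a[0]}:{a[1]}_{b[0]}:{b[1]}")
# PC 18:0_18:2
# PC 18:1_18:1
```

Even with this short list, a single measured signal maps to two distinct molecular species, and the list grows with a more complete fatty acid catalogue.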
Our results confirmed the complex link between metabolism and cancer risk and highlighted the potential of metabolomics to identify informative markers of cancer risk and to provide insights into the biological mechanisms leading to cancer development. Our study indicated that specific metabolite families might be related to the risk of multiple cancer types. Some of these metabolites could reflect biological mechanisms underlying the carcinogenic effects of established cancer risk factors, including obesity.
Availability of data and materials
The R scripts developed to implement the analyses will be made available on GitHub for all interested scientists. The EPIC data are not publicly available, but access requests can be submitted to the Steering Committee (https://epic.iarc.fr/access/submit_appl_access.php).
Abbreviations
Advanced prostate cancer
Body mass index
European Prospective Investigation into Cancer and Nutrition
False discovery rate
Flow injection analysis
Gallbladder and biliary tract cancer
International Agency for Research on Cancer
Imperial College London
Least absolute shrinkage and selection operator
Lower limit of quantification
Localized prostate cancer
Limit of detection
Tandem mass spectrometry
Ordinary least squares regression
Principal component analysis
Type 2 diabetes
Upper limit of quantification
References
Beger RD. A review of applications of metabolomics in cancer. Metabolites. 2013;3(3):552–74. https://doi.org/10.3390/metabo3030552.
Scalbert A, Huybrechts I, Gunter MJ. The food exposome. In: Dagnino S, Macherone A, editors. Unraveling the exposome: Springer International Publishing. 2019. p. 217–45. https://doi.org/10.1007/978-3-319-89321-1_8.
Rappaport SM, Barupal DK, Wishart D, Vineis P, Scalbert A. The blood exposome and its role in discovering causes of disease. Environ Health Perspect. 2014;122(8):769–74. https://doi.org/10.1289/ehp.1308015.
González-Domínguez R, Jáuregui O, Queipo-Ortuño MI, Andrés-Lacueva C. Characterization of the human exposome by a comprehensive and quantitative large-scale multianalyte metabolomics platform. Anal Chem. 2020;92(20):13767–75. https://doi.org/10.1021/acs.analchem.0c02008.
Gonzalez-Franquesa A, Burkart AM, Isganaitis E, Patti ME. What have metabolomics approaches taught us about type 2 diabetes? Curr Diab Rep. 2016;16(8):74. https://doi.org/10.1007/s11892-016-0763-1.
Ahola-Olli AV, Mustelin L, Kalimeri M, et al. Circulating metabolites and the risk of type 2 diabetes: a prospective study of 11,896 young adults from four Finnish cohorts. Diabetologia. 2019;62(12):2298–309. https://doi.org/10.1007/s00125-019-05001-w.
Sun Y, Gao HY, Fan ZY, He Y, Yan YX. Metabolomics signatures in type 2 diabetes: a systematic review and integrative analysis. J Clin Endocrinol Metab. 2020;105(4):1000–8. https://doi.org/10.1210/clinem/dgz240.
McGarrah RW, Crown SB, Zhang GF, Shah SH, Newgard CB. Cardiovascular metabolomics. Circ Res. 2018;122(9):1238–58. https://doi.org/10.1161/CIRCRESAHA.117.311002.
Cavus E, Karakas M, Ojeda FM, et al. Association of circulating metabolites with risk of coronary heart disease in a European population: results from the biomarkers for cardiovascular risk assessment in Europe (BiomarCaRE) Consortium. JAMA Cardiol. 2019;4(12):1270–9. https://doi.org/10.1001/jamacardio.2019.4130.
Müller J, Bertsch T, Volke J, et al. Narrative review of metabolomics in cardiovascular disease. J Thorac Dis. 2021;13(4):2532–50. https://doi.org/10.21037/jtd-21-22.
His M, Viallon V, Dossus L, et al. Prospective analysis of circulating metabolites and breast cancer in EPIC. BMC Med. 2019;17(1):178. https://doi.org/10.1186/s12916-019-1408-4.
Schmidt JA, Fensom GK, Rinaldi S, et al. Pre-diagnostic metabolite concentrations and prostate cancer risk in 1077 cases and 1077 matched controls in the European Prospective Investigation into Cancer and Nutrition. BMC Med. 2017;15(1):122. https://doi.org/10.1186/s12916-017-0885-6.
Schmidt JA, Fensom GK, Rinaldi S, et al. Patterns in metabolite profile are associated with risk of more aggressive prostate cancer: a prospective study of 3,057 matched case-control sets from EPIC. Int J Cancer. 2020;146(3):720–30. https://doi.org/10.1002/ijc.32314.
Dossus L, Kouloura E, Biessy C, et al. Prospective analysis of circulating metabolites and endometrial cancer risk. Gynecol Oncol. 2021. https://doi.org/10.1016/j.ygyno.2021.06.001.
Guida F, Tan VY, Corbin LJ, et al. The blood metabolome of incident kidney cancer: a case–control study nested within the MetKid consortium. PLOS Med. 2021;18(9):e1003786. https://doi.org/10.1371/journal.pmed.1003786.
Shu X, Xiang YB, Rothman N, et al. Prospective study of blood metabolites associated with colorectal cancer risk. Int J Cancer. 2018;143(3):527–34. https://doi.org/10.1002/ijc.31341.
Harlid S, Gunter MJ, Van Guelpen B. Risk-predictive and diagnostic biomarkers for colorectal cancer; a systematic review of studies using pre-diagnostic blood samples collected in prospective cohorts and screening settings. Cancers. 2021;13(17):4406. https://doi.org/10.3390/cancers13174406.
Rothwell JA, Bešević J, Dimou N, et al. Circulating amino acid levels and colorectal cancer risk in the European Prospective Investigation into Cancer and Nutrition and UK Biobank cohorts (In preparation).
Stepien M, Duarte-Salles T, Fedirko V, et al. Alteration of amino acid and biogenic amine metabolism in hepatobiliary cancers: findings from a prospective cohort study. Int J Cancer. 2016;138(2):348–60. https://doi.org/10.1002/ijc.29718.
Shu X, Zheng W, Yu D, et al. Prospective metabolomics study identifies potential novel blood metabolites associated with pancreatic cancer risk. Int J Cancer. 2018;143(9):2161–7. https://doi.org/10.1002/ijc.31574.
Zeleznik OA, Clish CB, Kraft P, Avila-Pacheco J, Eliassen AH, Tworoger SS. Circulating lysophosphatidylcholines, phosphatidylcholines, ceramides, and sphingomyelins and ovarian cancer risk: a 23-year prospective study. J Natl Cancer Inst. 2020;112(6):628–36. https://doi.org/10.1093/jnci/djz195.
Deng T, Lyon CJ, Bergin S, Caligiuri MA, Hsueh WA. Obesity, inflammation, and cancer. Annu Rev Pathol. 2016;11:421–49. https://doi.org/10.1146/annurev-pathol-012615-044359.
Wiebe N, Stenvinkel P, Tonelli M. Associations of chronic inflammation, insulin resistance, and severe obesity with mortality, myocardial infarction, cancer, and chronic pulmonary disease. JAMA Netw Open. 2019;2(8):e1910456. https://doi.org/10.1001/jamanetworkopen.2019.10456.
Li Y, Schoufour J, Wang DD, et al. Healthy lifestyle and life expectancy free of cancer, cardiovascular disease, and type 2 diabetes: prospective cohort study. BMJ. 2020:l6669. https://doi.org/10.1136/bmj.l6669.
Pietzner M, Stewart ID, Raffler J, et al. Plasma metabolites to profile pathways in noncommunicable disease multimorbidity. Nat Med. 2021:1–9. https://doi.org/10.1038/s41591-021-01266-0.
Kühn T, Floegel A, Sookthai D, et al. Higher plasma levels of lysophosphatidylcholine 18:0 are related to a lower risk of common cancers in a prospective metabolomics study. BMC Med. 2016;14:13. https://doi.org/10.1186/s12916-016-0552-3.
Gross SM, Tibshirani R. Data shared lasso: a novel tool to discover uplift. Comput Stat Data Anal. 2016;101:226–35. https://doi.org/10.1016/j.csda.2016.02.015.
Ollier E, Viallon V. Regression modelling on stratified data with the lasso. Biometrika. 2017;104(1):83–96. https://doi.org/10.1093/biomet/asw065.
Ballout N, Garcia C, Viallon V. Sparse estimation for case-control studies with multiple disease subtypes. Biostatistics. 2021;22(4):738–55. https://doi.org/10.1093/biostatistics/kxz063.
Riboli E, Hunt KJ, Slimani N, et al. European Prospective Investigation into Cancer and Nutrition (EPIC): study populations and data collection. Public Health Nutr. 2002;5(6B):1113–24. https://doi.org/10.1079/PHN2002394.
Viallon V, His M, Rinaldi S, et al. A new pipeline for the normalization and pooling of metabolomics data. Metabolites. 2021;11(9):631. https://doi.org/10.3390/metabo11090631.
Chavent M, Kuentz-Simonet V, Liquet B, Saracco J. ClustOfVar: an R package for the clustering of variables. J Stat Softw. 2012;50:1–16. https://doi.org/10.18637/jss.v050.i13.
Carayol M, Leitzmann MF, Ferrari P, et al. Blood metabolic signatures of body mass index: a targeted metabolomics study in the EPIC cohort. J Proteome Res. 2017;16(9):3137–46. https://doi.org/10.1021/acs.jproteome.6b01062.
Kliemann N, Viallon V, Murphy N, et al. Metabolic signatures of greater body size and their associations with risk of colorectal and endometrial cancers in the European Prospective Investigation into Cancer and Nutrition. BMC Med. 2021;19(1):101. https://doi.org/10.1186/s12916-021-01970-1.
Pischon T, Nimptsch K. Obesity and cancer. Recent Results in Cancer Research. Cham: Springer; 2016. https://doi.org/10.1007/978-3-319-42542-9.
Fortner RT, Katzke V, Kühn T, Kaaks R. Obesity and breast cancer. Recent Results Cancer Res. 2016;208:43–65. https://doi.org/10.1007/978-3-319-42542-9_3.
Keum N, Giovannucci E. Global burden of colorectal cancer: emerging trends, risk factors and prevention strategies. Nat Rev Gastroenterol Hepatol. 2019;16(12):713–32. https://doi.org/10.1038/s41575-019-0189-8.
Capitanio U, Bensalah K, Bex A, et al. Epidemiology of renal cell carcinoma. Eur Urol. 2019;75(1):74–84. https://doi.org/10.1016/j.eururo.2018.08.036.
Dashti SG, English DR, Simpson JA, et al. Adiposity and endometrial cancer risk in postmenopausal women: a sequential causal mediation analysis. Cancer Epidemiol Biomarkers Prev. 2021;30(1):104–13. https://doi.org/10.1158/1055-9965.EPI-20-0965.
Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc Series B (Methodological). 1996;58(1):267–88. https://doi.org/10.1111/j.2517-6161.1996.tb02080.x.
Zou H. The adaptive lasso and its oracle properties. J Am Stat Assoc. 2006;101(476):1418–29. https://doi.org/10.1198/016214506000000735.
Krämer N, Schäfer J, Boulesteix AL. Regularized estimation of large-scale gene association networks using graphical Gaussian models. BMC Bioinformatics. 2009;10(1):384. https://doi.org/10.1186/1471-2105-10-384.
He K, Wang Y, Zhou X, Xu H, Huang C. An improved variable selection procedure for adaptive Lasso in high-dimensional survival analysis. Lifetime Data Anal. 2019;25(3):569–85. https://doi.org/10.1007/s10985-018-9455-2.
Ballout N, Etievant L, Viallon V. On the use of cross-validation for the calibration of the adaptive lasso. arXiv:2005.10119. Published online 15 July 2021. Accessed 1 Dec 2021. http://arxiv.org/abs/2005.10119.
Chen Y, Yang Y. The one standard error rule for model selection: does it work? Stats. 2021;4(4):868–92. https://doi.org/10.3390/stats4040051.
Bach FR. Bolasso: model consistent Lasso estimation through the bootstrap. In: Proceedings of the 25th International Conference on Machine Learning. ICML ’08: Association for Computing Machinery. 2008. p. 33–40. https://doi.org/10.1145/1390156.1390161.
Efron B, Hastie T, Johnstone I, Tibshirani R. Least angle regression. Ann Stat. 2004;32(2):407–99. https://doi.org/10.1214/009053604000000067.
Taylor J, Tibshirani R. Post-selection inference for ℓ1-penalized likelihood models. Can J Stat. 2018;46(1):41–61. https://doi.org/10.1002/cjs.11313.
Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Series B (Methodological). 1995;57(1):289–300.
Treede I, Braun A, Sparla R, et al. Anti-inflammatory effects of phosphatidylcholine. J Biol Chem. 2007;282(37):27155–64. https://doi.org/10.1074/jbc.M704408200.
Hannun YA, Obeid LM. Principles of bioactive lipid signalling: lessons from sphingolipids. Nat Rev Mol Cell Biol. 2008;9(2):139–50. https://doi.org/10.1038/nrm2329.
Beloribi-Djefaflia S, Vasseur S, Guillaumond F. Lipid metabolic reprogramming in cancer cells. Oncogenesis. 2016;5:e189. https://doi.org/10.1038/oncsis.2015.49.
Klein MS, Shearer J. Metabolomics and type 2 diabetes: translating basic research into clinical application. J Diabetes Res. 2016;2016:3898502. https://doi.org/10.1155/2016/3898502.
Stepien M, Keski-Rahkonen P, Kiss A, et al. Metabolic perturbations prior to hepatocellular carcinoma diagnosis: findings from a prospective observational cohort study. Int J Cancer. 2021;148(3):609–25. https://doi.org/10.1002/ijc.33236.
Newton H, Wang YF, Camplese L, et al. Systemic muscle wasting and coordinated tumour response drive tumourigenesis. Nat Commun. 2020;11:4653. https://doi.org/10.1038/s41467-020-18502-9.
Gumpenberger T, Brezina S, Keski-Rahkonen P, et al. Untargeted metabolomics reveals major differences in the plasma metabolome between colorectal cancer and colorectal adenomas. Metabolites. 2021;11(2):119. https://doi.org/10.3390/metabo11020119.
Elia I, Broekaert D, Christen S, et al. Proline metabolism supports metastasis formation and could be inhibited to selectively target metastasizing cancer cells. Nat Commun. 2017;8(1):15267. https://doi.org/10.1038/ncomms15267.
Watford M. Glutamine metabolism and function in relation to proline synthesis and the safety of glutamine and proline supplementation. J Nutr. 2008;138(10):2003S–7S. https://doi.org/10.1093/jn/138.10.2003S.
Lécuyer L, Dalle C, Lyan B, et al. Plasma metabolomic signatures associated with long-term breast cancer risk in the SU.VI.MAX prospective cohort. Cancer Epidemiol Biomarkers Prev. 2019;28(8):1300–7. https://doi.org/10.1158/1055-9965.EPI-19-0154.
Jobard E, Dossus L, Baglietto L, et al. Investigation of circulating metabolites associated with breast cancer risk by untargeted metabolomics: a case-control study nested within the French E3N cohort. Br J Cancer. 2021;124(10):1734–43. https://doi.org/10.1038/s41416-021-01304-1.
Fages A, Duarte-Salles T, Stepien M, et al. Metabolomic profiles of hepatocellular carcinoma in a European prospective cohort. BMC Med. 2015;13:242. https://doi.org/10.1186/s12916-015-0462-9.
Gao H, Dong B, Liu X, Xuan H, Huang Y, Lin D. Metabonomic profiling of renal cell carcinoma: high-resolution proton nuclear magnetic resonance spectroscopy of human serum with multivariate data analysis. Anal Chim Acta. 2008;624(2):269–77. https://doi.org/10.1016/j.aca.2008.06.051.
Plewa S, Horała A, Dereziński P, et al. Usefulness of amino acid profiling in ovarian cancer screening with special emphasis on their role in cancerogenesis. Int J Mol Sci. 2017;18(12):E2727. https://doi.org/10.3390/ijms18122727.
Yi H, Talmon G, Wang J. Glutamate in cancers: from metabolism to signaling. J Biomed Res. 2019;34(4):260–70. https://doi.org/10.7555/JBR.34.20190037.
Plewa S, Horała A, Dereziński P, Nowak-Markwitz E, Matysiak J, Kokot ZJ. Wide spectrum targeted metabolomics identifies potential ovarian cancer biomarkers. Life Sci. 2019;222:235–44. https://doi.org/10.1016/j.lfs.2019.03.004.
Wu G, Bazer FW, Davis TA, et al. Arginine metabolism and nutrition in growth, health and disease. Amino Acids. 2009;37(1):153–68. https://doi.org/10.1007/s00726-008-0210-y.
Papadimitriou N, Gunter MJ, Murphy N, et al. Circulating tryptophan metabolites and risk of colon cancer: results from case-control and prospective cohort studies. Int J Cancer. 2021;149(9):1659–69. https://doi.org/10.1002/ijc.33725.
Muñoz-Esparza NC, Latorre-Moratalla ML, Comas-Basté O, Toro-Funes N, Veciana-Nogués MT, Vidal-Carou MC. Polyamines in food. Front Nutr. 2019;6:108. https://doi.org/10.3389/fnut.2019.00108.
Moinard C, Cynober L, de Bandt JP. Polyamines: metabolism and implications in human diseases. Clin Nutr. 2005;24(2):184–97. https://doi.org/10.1016/j.clnu.2004.11.001.
Vargas AJ, Ashbeck EL, Wertheim BC, et al. Dietary polyamine intake and colorectal cancer risk in postmenopausal women. Am J Clin Nutr. 2015;102(2):411–9. https://doi.org/10.3945/ajcn.114.103895.
Pietrocola F, Castoldi F, Kepp O, Carmona-Gutierrez D, Madeo F, Kroemer G. Spermidine reduces cancer-related mortality in humans. Autophagy. 2018;15(2):362–5. https://doi.org/10.1080/15548627.2018.1539592.
Fan J, Feng Z, Chen N. Spermidine as a target for cancer therapy. Pharmacol Res. 2020;159:104943. https://doi.org/10.1016/j.phrs.2020.104943.
Koelmel JP, Ulmer CZ, Jones CM, Yost RA, Bowden JA. Common cases of improper lipid annotation using high-resolution tandem mass spectrometry data and corresponding limitations in biological interpretation. Biochim Biophys Acta. 2017;1862(8):766–70. https://doi.org/10.1016/j.bbalip.2017.02.016.
Köfeler HC, Ahrends R, Baker ES, et al. Recommendations for good practice in MS-based lipidomics. J Lipid Res. 2021;62:100138. https://doi.org/10.1016/j.jlr.2021.100138.
Floegel A, Drogan D, Wang-Sattler R, et al. Reliability of serum metabolite concentrations over a 4-month period using a targeted metabolomic approach. PLoS One. 2011;6(6):e21103. https://doi.org/10.1371/journal.pone.0021103.
Townsend MK, Clish CB, Kraft P, et al. Reproducibility of metabolomic profiles among men and women in 2 large cohort studies. Clin Chem. 2013;59(11):1657–67. https://doi.org/10.1373/clinchem.2012.199133.
Carayol M, Licaj I, Achaintre D, et al. Reliability of serum metabolites over a two-year period: a targeted metabolomic approach in fasting and non-fasting samples from EPIC. PLoS One. 2015;10(8):e0135437. https://doi.org/10.1371/journal.pone.0135437.
This paper is dedicated to the memory of our colleague Dr. Bas Bueno-de-Mesquita.
Where authors are identified as personnel of the International Agency for Research on Cancer/World Health Organization, the authors alone are responsible for the views expressed in this article and they do not necessarily represent the decisions, policy, or views of the International Agency for Research on Cancer/World Health Organization.
The coordination of EPIC is financially supported by the International Agency for Research on Cancer (IARC) and by the Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, which has additional infrastructure support provided by the NIHR Imperial Biomedical Research Centre (BRC).
The national cohorts are supported by Danish Cancer Society (Denmark); Ligue Contre le Cancer, Institut Gustave Roussy, Mutuelle Générale de l’Education Nationale, Institut National de la Santé et de la Recherche Médicale (INSERM) (France); German Cancer Aid, German Cancer Research Center (DKFZ), German Institute of Human Nutrition Potsdam-Rehbruecke (DIfE), Federal Ministry of Education and Research (BMBF) (Germany); Associazione Italiana per la Ricerca sul Cancro-AIRC-Italy, Compagnia di SanPaolo and National Research Council (Italy); Dutch Ministry of Public Health, Welfare and Sports (VWS), Netherlands Cancer Registry (NKR), LK Research Funds, Dutch Prevention Funds, Dutch ZON (Zorg Onderzoek Nederland), World Cancer Research Fund (WCRF), Statistics Netherlands (The Netherlands); Health Research Fund (FIS) - Instituto de Salud Carlos III (ISCIII), Regional Governments of Andalucía, Asturias, Basque Country, Murcia and Navarra, and the Catalan Institute of Oncology - ICO (Spain); Swedish Cancer Society, Swedish Research Council and County Councils of Skåne and Västerbotten (Sweden); Cancer Research UK (14136 to EPIC-Norfolk; C8221/A29017 to EPIC-Oxford), Medical Research Council (1000143 to EPIC-Norfolk; MR/M012190/1 to EPIC-Oxford) (UK). IDIBELL acknowledges support from the Generalitat de Catalunya through the CERCA Program. The breast cancer study was funded by the French National Cancer Institute (grant number 2015-166). The colorectal cancer studies were funded by World Cancer Research Fund (reference: 2013/1002; www.wcrf.org/) and the European Commission (FP7: BBMRI-LPC; reference: 313010; https://ec.europa.eu/). The endometrial cancer study was funded by Cancer Research UK (grant number C19335/A21351). The kidney study was funded by the World Cancer Research Fund (MJ; reference: 2014/1193; www.wcrf.org/) and the European Commission (FP7: BBMRI-LPC; reference: 313010; https://ec.europa.eu/). 
The liver cancer study was supported in part by the French National Cancer Institute (L’Institut National du Cancer; INCa; grant numbers 2009-139 and 2014-1-RT-02-CIRC-1) and by internal funds of the IARC. For the participants in the prostate cancer study, sample retrieval and preparation, and assays of metabolites were supported by Cancer Research UK (C8221/A19170), and funding for grant 2014/1183 was obtained from the World Cancer Research Fund (WCRF UK), as part of the World Cancer Research Fund International grant programme. Mathilde His’ work reported here was undertaken during the tenure of a postdoctoral fellowship awarded by the International Agency for Research on Cancer, financed by the Fondation ARC. The funders were not involved in designing the study; collecting, analysing, and interpreting results; or writing and submitting the manuscript for publication.
Ethics approval and consent to participate
The EPIC study, and in particular the seven case-control studies nested within EPIC, were conducted according to the Declaration of Helsinki and approved by the ethics committee at the International Agency for Research on Cancer (IARC): on 10 April 2008 (IEC 08-06) and on 11 February 2016 (IEC 16-06) for the liver cancer study, on 7 April 2014 (IEC 14-07) for the breast cancer study, on 7 April 2014 (IEC 14-08) for the two colorectal cancer studies, on 7 April 2014 (IEC 14-09) for the prostate cancer study, on 25 February 2015 (IEC 15-06) for the kidney cancer study, and on 28 April 2016 (IEC 16-20) for the endometrial cancer study. Written informed consent was obtained from all subjects involved in the study.
Consent for publication
Competing interests
The authors declare that they have no competing interests.
Supplementary material regarding (i) the definition of cancer cases for HCC, GBC, Adv.PrC and Loc.PrC; (ii) the definition and implementation of the data-shared lasso; (iii) the models used to derive point estimates and confidence intervals from the model selected by the data-shared lasso; and (iv) the univariate analysis conducted for comparison.
Figure S1. Pearson correlation between the 117 original metabolites.
Figure S2. Sensitivity analyses of mutually adjusted ORs for the overall associations and cancer type-specific deviations.
Figure S3. Sensitivity analysis of mutually adjusted ORs for the overall associations and cancer type-specific deviations with or without excluding hormone users.
Figure S4. p-values of tests for departure from linearity and effect modification by BMI.
Figure S5. ORs for the overall associations identified by the data-shared lasso with (i) the original model and (ii) the extended type-specific model.
Figure S6. Results from the univariate analyses.
Figure S7. Comparison of the associations identified by the data-shared lasso when working with the 50 features (as in our main analysis) or with the original 117 metabolites.
Figure S8. Pearson correlation between the 50 clusters.
Figure S9. Pearson correlation between the 19 features related to at least one cancer site in our main analysis.
Table S1. List of the 117 metabolites studied in the main analysis, and of the 16 additional metabolites studied when excluding the second colorectal study.
Table S2. Robustness of the associations identified in the main analysis when including all the pairs from the prostate cancer study.
Table S3. Other associations identified in a large proportion of bootstrap samples when including all the pairs from the prostate cancer study.
Cite this article
Breeur, M., Ferrari, P., Dossus, L. et al. Pan-cancer analysis of pre-diagnostic blood metabolite concentrations in the European Prospective Investigation into Cancer and Nutrition. BMC Med 20, 351 (2022). https://doi.org/10.1186/s12916-022-02553-4