Metabolomic profiles of hepatocellular carcinoma in a European prospective cohort

Background: Hepatocellular carcinoma (HCC), the most prevalent form of liver cancer, is difficult to diagnose and has limited treatment options with a low survival rate. Aside from a few key risk factors, such as hepatitis, high alcohol consumption, smoking, obesity, and diabetes, there is incomplete etiologic understanding of the disease and little progress in identification of early risk biomarkers.
Methods: To address these aspects, an untargeted nuclear magnetic resonance metabolomic approach was applied to pre-diagnostic serum samples obtained from first incident, primary HCC cases (n = 114) and matched controls (n = 222) identified from amongst the participants of a large European prospective cohort.
Results: A metabolic pattern associated with HCC risk, comprising perturbations in fatty acid oxidation and amino acid, lipid, and carbohydrate metabolism, was observed. Sixteen metabolites of either endogenous or exogenous origin were found to be significantly associated with HCC risk. The influence of hepatitis infection and potential liver damage was assessed, and further analyses were made to distinguish patterns of early or later diagnosis.
Conclusion: Our results show clear metabolic alterations from early stages of HCC development with application for better etiologic understanding, prevention, and early detection of this increasingly common cancer.
Electronic supplementary material: The online version of this article (doi:10.1186/s12916-015-0462-9) contains supplementary material, which is available to authorized users.

incidence. A valuable tool toward these goals is the analysis of bio-samples from prospective cohort studies, where healthy participants are enrolled and followed over time for the appearance of various diseases. Since HCC development implies alterations in the metabolic functions of the liver and, in a majority of cases, progresses from pre-cancerous lesions through to cirrhosis and cancer, it is conceivable that metabolic changes may be detected from the very early stages of the disease, long prior to clinical diagnosis. Thus, metabolomics may serve as a valuable tool for the identification of biomarkers for early detection of HCC.
Metabolomics is a powerful high-throughput approach that relies on state-of-the-art analytical methods, such as nuclear magnetic resonance (NMR), to identify metabolic signatures or biomarkers associated with homeostasis perturbations [7]. Metabolomic strategies play an increasingly important role in clinical and observational studies, in the hope that they will offer new perspectives not only in understanding the processes of disease development, but also for identification of diagnostic/prognostic markers and targeted healthcare [8]. Indeed, several recent studies have leveraged metabolite profiling to provide new insights into pathological processes pertaining to cancer, heart disease, or diabetes mellitus [9][10][11][12][13][14][15]. Although a number of metabolomics-based approaches have been applied to HCC, they have largely been based on traditional case-control designs, high-risk patient groups (e.g. hepatitis infection, cirrhosis, or other chronic liver diseases), non-Western populations where traditional HCC risk factors predominate, or tumor tissues [16][17][18][19][20][21][22][23][24][25][26][27][28][29][30][31]. There is currently very little information derived from prospective settings where biological samples have been collected prior to disease diagnosis [32][33][34].
In this study, we investigated whether metabolic differences could be detected between HCC cases and matched controls derived from a prospective cohort study using serum samples collected prior to diagnosis. An NMR-based metabolomic approach was applied to a case-control study nested within a large, multi-center prospective cohort.

Study design
The present study is based on a case-control study nested within the European Prospective Investigation into Cancer and Nutrition (EPIC) cohort, a multicenter prospective study designed to investigate the association between diet, lifestyle, and environmental factors and the incidence of various types of cancer and other chronic diseases. The rationale, study design, and methods have been described in detail previously [35]. Briefly, diet and lifestyle data were collected at recruitment from approximately 520,000 men and women aged 35-85 years enrolled between 1992 and 2000 in 23 centers from 10 Western European countries (Denmark, France, Germany, Greece, Italy, Norway, Spain, Sweden, the Netherlands, and the United Kingdom) [35]. The study subjects were recruited from the general population, except for France (women who were members of a health insurance scheme for state school employees), Naples and Norway (women only), Utrecht and Florence (women attending breast cancer screening), and subsamples of the Oxford "Health Conscious" sub-cohort (vegetarians) and the Italian and Spanish cohorts (mainly members of blood donor associations).

Ethics
The EPIC cohort in general, and this study in particular, have received approval from the Ethics Committee of the International Agency for Research on Cancer as well as the ethics review boards of individual EPIC centers. EPIC participants provided written consent for the use of their blood samples and all data.

Blood sample collection
Blood samples were collected using standardized methods at recruitment from most participants and are stored at IARC (Lyon, France) in liquid nitrogen at −196°C for all countries except Denmark (−150°C, nitrogen vapor) and Sweden (−80°C, freezers), where samples are stored locally [35].

Cancer and vital status assessment
Vital status during follow-up (98.5 % complete) was assessed by record linkage with regional and/or national mortality registries in all countries except Germany and Greece, where follow-up was actively reported by study subjects or their next-of-kin. Cancer incidence was determined through record linkage with population-based regional cancer registries (Denmark, Italy, the Netherlands, Norway, Spain, Sweden, and the United Kingdom) or via a combination of methods, including the use of health insurance records, contacts with cancer and pathology registries, and active follow-up through study subjects and their next-of-kin (France, Germany, Greece). For the present study, the dates of follow-up for cancer incidence and vital status are complete up to the end of 2006.

The HCC nested case-control study

Ascertainment of cases
HCC cases were defined as tumor in the liver (C22.0) according to the 10th Revision of the International Statistical Classification of Diseases, Injury and Causes of Death. For each HCC case identified, the histology, methods used to diagnose the cancer, and α-fetoprotein (AFP) levels were reviewed to exclude metastatic cases or other types of primary liver cancers.

The nested case-control study
The design of the nested case-control study has been previously described in detail [36]. Briefly, 125 HCC cases with available blood samples at baseline were identified between participants' recruitment and 2006. For each case, two controls were selected by incidence density sampling from all cohort members alive and free of cancer (except non-melanoma skin cancer), and matched by age at blood collection (±1 year), sex, study center, date (±2 months) and time of the day at blood collection (±3 h), and fasting status at blood collection (<3, 3-6, or >6 h). Women were additionally matched by menopausal status (pre-/peri-/postmenopausal) and hormone replacement therapy use at time of blood collection (yes/no). Participants with insufficient remaining blood sample for NMR analyses were excluded (N cases = 11). For six cases, only one eligible control was available. Therefore, the final sample size for the present analysis included 114 HCC cases and 222 matched controls.

Serum sample analysis
Laboratory assays: HBV/HCV infection, biomarkers of liver function and AFP
HBV and HCV seropositivity were detected in serum samples using the ARCHITECT HBsAg and anti-HCV chemiluminescent microparticle immunoassays (CMIAs; Abbott Diagnostics, France): HBsAg-positive when ≥0.05 IU/mL and HCV-positive when the ratio of sample relative light units to cutoff relative light units was ≥1 in two measurements [36]. Biochemical markers of hepatic injury, including albumin, total bilirubin, alanine aminotransferase (ALT), aspartate aminotransferase (AST), gamma-glutamyltransferase (GGT), and liver-specific alkaline phosphatase (AP) were measured on the ARCHITECT c Systems™ (Abbott Diagnostics) using standard protocols. The normal ranges were: albumin, 35-50 g/L; total bilirubin, 3.4-20.5 μmol/L; ALT, <55 U/L; AST, 5-34 U/L; GGT, 12-64 U/L (men) and 9-36 U/L (women); and AP, 40-150 U/L. A liver function score was calculated from concentrations of albumin, total bilirubin, ALT, AST, GGT, and AP, each contributing 1 point when outside of the normal range [37]. The liver score was categorized as no liver damage (liver score 0), probable liver damage (liver score 1-2), and likely liver damage (liver score ≥3). Additionally, the concentration of serum AFP, which is currently a pre-diagnostic biomarker for HCC, was measured using the ARCHITECT AFP kit. The laboratory analyses were performed at the Centre de Biologie République, Lyon, France [36].
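The scoring rule described above can be illustrated with a minimal Python sketch. The function and dictionary names are hypothetical, the handling of values exactly at a range limit is an assumption (the text does not specify it), and missing measurements are not handled. It is an illustration of the rule, not the code used in the study.

```python
# Normal ranges as quoted in the text (bilirubin in the units given there).
NORMAL_RANGES = {
    "albumin": (35.0, 50.0),    # g/L
    "bilirubin": (3.4, 20.5),   # total bilirubin
    "ast": (5.0, 34.0),         # U/L
    "ap": (40.0, 150.0),        # U/L
}
GGT_RANGES = {"men": (12.0, 64.0), "women": (9.0, 36.0)}  # U/L, sex-specific
ALT_UPPER = 55.0                                          # U/L, text gives "<55"


def is_abnormal(marker, value, sex):
    """True when a value lies outside the quoted normal range.

    Assumption: range limits are treated as normal, except ALT, where the
    text gives a strict upper limit (<55 U/L).
    """
    if marker == "alt":
        return value >= ALT_UPPER
    low, high = GGT_RANGES[sex] if marker == "ggt" else NORMAL_RANGES[marker]
    return value < low or value > high


def liver_score(measurements, sex):
    """One point per marker outside its normal range (0 to 6)."""
    return sum(is_abnormal(m, v, sex) for m, v in measurements.items())


def liver_category(score):
    """Map the score to the three categories used in the text."""
    if score == 0:
        return "no liver damage"
    if score <= 2:
        return "probable liver damage"
    return "likely liver damage"
```

For example, a participant with ALT, AST, and GGT above range and the other three markers within range would score 3 and fall in the "likely liver damage" category.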

NMR metabolomic data acquisition
Serum samples (200 μL) were processed according to standard procedures for NMR metabolomic measurement [38]. One-dimensional ¹H Carr-Purcell-Meiboom-Gill (CPMG) and nuclear Overhauser effect spectroscopy (NOESY) NMR spectra were recorded for each serum sample on a Bruker Avance III spectrometer operating at 800.15 MHz ¹H NMR frequency. Additional two-dimensional NMR spectra were recorded on a set of representative samples (one control and one case) to assign the NMR signals observed in the one-dimensional ¹H fingerprints to metabolites. The measured chemical shifts were compared to reference shifts of pure compounds using the HMDB [39], MMCD [40], and Chenomx NMR Suite (Chenomx Inc., Edmonton, Canada) databases. Figure 1 shows the mean CPMG spectrum with metabolite assignments. The detailed list of the 44 annotated metabolites is provided in Additional file 1: Table S1. NMR signals arising from lipids enabled the quantification of unsaturated lipids in the serum (signal at 5.28 ppm, resonance of -CH=CH- from unsaturated lipids) as well as terminal lipid methyls corresponding to several classes of lipoproteins: very-low-density lipoproteins (VLDL; δ 0.86 ppm), low-density lipoproteins (LDL; δ 0.84 ppm), and high-density lipoproteins (HDL; δ 0.82 ppm). After processing and calibration, each 1D NMR spectrum was reduced into bins of 0.001 ppm width over a chemical shift range of 0.5-9 ppm using the AMIX software (Bruker GmbH, Rheinstetten, Germany), giving a total of 8,500 NMR variables.
All NMR analyses were performed blindly with respect to case/control status. Further details on sample preparation, NMR data acquisition, and spectra processing are available in Additional file 1.
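The data reduction step can be pictured with a short Python sketch: a range of 0.5-9 ppm divided into 0.001 ppm bins gives (9 − 0.5) / 0.001 = 8,500 variables. The function name `bin_spectrum` is hypothetical, and summing the intensities that fall into each bin is an assumption made for illustration; the exact integration performed by the AMIX software is not described here.

```python
def bin_spectrum(ppm, intensity, lo=0.5, hi=9.0, width=0.001):
    """Reduce a 1D spectrum to fixed-width bins over [lo, hi) ppm.

    ppm and intensity are parallel sequences (chemical shift, signal).
    Points outside the range are ignored. Assumption: each bin holds the
    sum of the intensities of the points that fall inside it.
    """
    n_bins = round((hi - lo) / width)  # 8,500 with the default settings
    bins = [0.0] * n_bins
    for shift, value in zip(ppm, intensity):
        if lo <= shift < hi:
            # Clamp to guard against floating-point rounding at the upper edge.
            idx = min(int((shift - lo) / width), n_bins - 1)
            bins[idx] += value
    return bins
```

Each serum spectrum then becomes one row of 8,500 values in the data matrix passed to the multivariate analysis.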

Statistical analysis
Orthogonal partial least-square (O-PLS)
O-PLS [41] analyses were conducted to build predictive sample classification models based on whole CPMG or NOESY NMR spectra to discriminate between HCC cases and controls, by relating the 8,500 NMR variables to case/control status. Results were visualized on score plots corresponding to sample projection onto the predictive axis and the first orthogonal component of the model. The metabolic signature discriminating HCC cases from controls was visualized by the corresponding loading plot. The optimal number of orthogonal components for building O-PLS models was selected using a 7-fold cross-validation procedure. The associated R² and Q² parameters were calculated as measures of goodness of fit and goodness of prediction, i.e. the explained and predicted variances, respectively. The robustness of O-PLS models was further validated using permutations (1000 times) under the null hypothesis; for each set of permuted case/control labels, R² and Q² values were obtained and compared to the original ones, a decrease indicating good model quality [42].
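The validation logic (R² from the fitted model, Q² from 7-fold cross-validation, and a comparison against permuted labels) can be sketched in Python. To keep the example self-contained, a single-predictor least-squares line is used as a stand-in for the O-PLS model, which is a deliberate simplification and not the method of the study. All function names are hypothetical, and the fold assignment by index is an assumption.

```python
import random


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx if sxx else 0.0
    return slope, my - slope * mx


def r2_q2(x, y, folds=7):
    """R2 = 1 - RSS/TSS on the full fit; Q2 = 1 - PRESS/TSS from k-fold CV."""
    n = len(y)
    my = sum(y) / n
    tss = sum((b - my) ** 2 for b in y)
    slope, icpt = fit_line(x, y)
    rss = sum((b - (slope * a + icpt)) ** 2 for a, b in zip(x, y))
    press = 0.0
    for k in range(folds):
        train = [i for i in range(n) if i % folds != k]
        test = [i for i in range(n) if i % folds == k]
        s, c = fit_line([x[i] for i in train], [y[i] for i in train])
        press += sum((y[i] - (s * x[i] + c)) ** 2 for i in test)
    return 1.0 - rss / tss, 1.0 - press / tss


def permutation_test(x, y, n_perm=1000, seed=0):
    """Share of label permutations whose Q2 reaches the observed Q2."""
    rng = random.Random(seed)
    _, q2_obs = r2_q2(x, y)
    hits = 0
    for _ in range(n_perm):
        y_perm = list(y)
        rng.shuffle(y_perm)
        if r2_q2(x, y_perm)[1] >= q2_obs:
            hits += 1
    return q2_obs, (hits + 1) / (n_perm + 1)
```

With a real O-PLS implementation in place of `fit_line`, the same loop applies: a model is considered robust when Q² values obtained from permuted labels are clearly lower than the Q² of the original model.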

Metabolite paired difference analysis
The statistical recoupling of variables [43] procedure was first applied to reduce the 8,500 NMR variables into 285 intelligent buckets, or clusters of NMR variables, that correspond to reconstructions of peak entities. ANOVA mixed-effect models were then fitted to each of the 285 clusters of variables, with the matched case-control set modelled as a random effect to account for the matching design of the study. To correct for multiple testing, q values were determined using the Benjamini-Hochberg procedure [44] to control the false discovery rate with a threshold of 0.05. In this way, 96 clusters of NMR variables were found to be significantly associated with HCC outcome. Significant clusters of variables corresponding to different peaks of the same metabolite (based on the metabolite identification reported above) were combined into a single variable by summing the bin intensities, taking into consideration the number of equivalent protons contributing to each resonance. This procedure resulted in a list of 23 combined clusters of variables, 16 of which corresponded to distinct metabolites or lipid classes and were retained for further analyses, while five corresponded to other signals from mixed classes of lipids and two corresponded to the superimposition of signals from different metabolites.
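The multiple-testing correction can be illustrated with a minimal Python sketch of the Benjamini-Hochberg step-up procedure. The function name `bh_qvalues` and the example P values are illustrative only and are not taken from the study.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted P values (q values).

    For the P value of rank r among m tests, q = min over ranks >= r of
    p * m / rank, which enforces monotonicity of the adjusted values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


# Illustrative P values for five hypothetical clusters of NMR variables.
example_p = [0.001, 0.008, 0.039, 0.041, 0.20]
example_q = bh_qvalues(example_p)
significant = [i for i, qv in enumerate(example_q) if qv < 0.05]
```

In the study, the same rule applied to the 285 clusters with a threshold of 0.05 retained 96 of them.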

Conditional logistic regression (CLR)
CLR models were used to quantify the associations between the 16 metabolites selected as described above and HCC risk by computing odds ratios (OR) and 95 % confidence intervals (95 % CIs). The metabolites were modeled as continuous variables with the OR corresponding to one standard deviation increase in metabolic intensity. CLR models were run conditioned on the matching factors (referred to as crude), and after adjustment for potential confounding variables (referred to as multivariable), i.e. body mass index (continuous), smoking status (current smokers, non-smokers, former smokers, unknown), lifetime alcohol drinking pattern (never drinkers, former drinkers, drinkers only at recruitment, lifetime drinkers), level of alcohol consumption at recruitment (g/d; continuous), serum-clot contact time (≤1 d or >1 d; a value that corresponds to the time between blood collection and blood centrifugation [45]), physical activity (inactive, moderately inactive, moderately active, active, missing), educational status (primary school, secondary school, professional school, longer education, unknown; as a proxy variable for socioeconomic status), and waist circumference (cm). The multivariable models for serum ethanol concentration were not adjusted for level of alcohol consumption at recruitment. For all metabolites, an additional CLR model with further adjustment for liver function score was also run.
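For a single standardized metabolite, the crude conditional model can be sketched in Python. Each matched set contributes exp(βx_case) / Σ exp(βx_j) to the conditional likelihood, and β is found by Newton-Raphson. This is a minimal one-covariate illustration with hypothetical names and made-up toy data; it does not reproduce the multivariable adjustment, and the study's software is not stated here.

```python
import math


def fit_clr(matched_sets, n_iter=50, tol=1e-10):
    """Fit a one-covariate conditional logistic regression.

    matched_sets: list of (case_value, [control_values]); values are
    assumed to be standardized so that beta is per 1 SD increase.
    Returns (beta, standard_error).
    """
    beta = 0.0
    info = 0.0
    for _ in range(n_iter):
        grad, info = 0.0, 0.0
        for case_x, controls in matched_sets:
            xs = [case_x] + list(controls)
            shift = max(beta * x for x in xs)  # numerical stability
            w = [math.exp(beta * x - shift) for x in xs]
            total = sum(w)
            mean = sum(wi * x for wi, x in zip(w, xs)) / total
            second = sum(wi * x * x for wi, x in zip(w, xs)) / total
            grad += case_x - mean          # score contribution of the set
            info += second - mean ** 2     # information contribution
        step = grad / info
        beta += step
        if abs(step) < tol:
            break
    # The information comes from the last iteration, which is effectively
    # at the estimate once the step size is below tol.
    return beta, 1.0 / math.sqrt(info)


def odds_ratio(beta, se, z=1.96):
    """Odds ratio per 1 SD with a Wald 95 % confidence interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Toy data (not from the study): one case and two controls per set.
toy_sets = [
    (1.2, [0.3, -0.5]), (0.1, [0.4, -0.2]), (0.8, [-0.1, 0.9]),
    (-0.3, [0.2, -0.6]), (1.5, [0.7, 0.1]), (0.0, [-0.4, 0.5]),
]
toy_or, toy_low, toy_high = odds_ratio(*fit_clr(toy_sets))
```

An OR above 1 indicates that higher metabolite levels are associated with higher HCC risk; a confidence interval that excludes 1 corresponds to a significant association at the 5 % level.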

Receiver operating characteristics (ROC)
ROC curves and the corresponding areas under the curve (AUC) were generated for several models: the AFP concentration, the liver function score, the multivariate metabolic profile using both the score values from the O-PLS classification model (referred to as O-PLS score) and the cross-validated predicted-Y values (referred to as O-PLS CV status), and a combination of the O-PLS CV status with AFP or the liver score. Combinations of the variables were obtained by summing the O-PLS CV status with either AFP or the liver score after normalization of each variable to unit variance. The specificity, sensitivity, and accuracy were obtained from the optimal cut-off point, defined as the point on the ROC curve with the minimal distance to the ideal point.
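The ROC calculations can be sketched in Python as follows. The function names are hypothetical, the ideal point is taken to be 100 % sensitivity and 100 % specificity (false positive rate 0, true positive rate 1), and the sample standard deviation is used for unit-variance scaling; these are assumptions for illustration, as is the toy data.

```python
import math


def auc(scores, labels):
    """AUC as the probability that a case outscores a control (ties = 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def roc_points(scores, labels):
    """(false positive rate, true positive rate, threshold) per threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= thr and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= thr and l == 0)
        points.append((fp / n_neg, tp / n_pos, thr))
    return points


def optimal_cutoff(scores, labels):
    """Threshold closest to the ideal point (FPR = 0, TPR = 1)."""
    fpr, tpr, thr = min(roc_points(scores, labels),
                        key=lambda p: math.hypot(p[0], 1.0 - p[1]))
    return thr, tpr, 1.0 - fpr  # threshold, sensitivity, specificity


def unit_variance(values):
    """Scale a variable to unit variance (sample standard deviation)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [v / sd for v in values]


def combine(a, b):
    """Sum two markers after scaling each to unit variance."""
    return [x + y for x, y in zip(unit_variance(a), unit_variance(b))]
```

Centering the variables before summing would shift every combined score by the same constant and therefore would not change the ROC curve, which is why only the scaling is shown.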

Subgroup analyses
Analyses stratified by hepatitis infection status (37 HCC cases Hep+, 77 HCC cases Hep−), by liver function score (34 HCC cases with no liver damage, 80 HCC cases with probable or likely liver damage), and by years between blood collection and cancer diagnosis with a cut-off at 2 years (22 HCC cases diagnosed <2 years, 92 HCC cases diagnosed ≥2 years from blood collection) were also conducted. In the subgroup of cases diagnosed <2 years after blood collection, the small sample size prevented model stability upon multivariable adjustment. Thus, only crude CLR models were run for this subgroup.

Results
Baseline characteristics of the study participants are summarized in Table 1. The median follow-up time between blood collection and HCC diagnosis (lag time) was 4.8 years. Serum samples of HCC cases were more likely to test positive for HBV or HCV infections (32.5 % vs. 3.2 % in the controls), and to show altered liver function as indicated by a high liver function score (36.8 % vs. 14.4 % for probable liver damage and 33.3 % vs. 0.9 % for likely liver damage for cases vs. controls, respectively).
The O-PLS analysis presented in Fig. 2a shows a metabolic profile discriminating between HCC cases and the matched controls (R² = 35 %, Q² = 21 %). The metabolic signature (Fig. 2b) associated with HCC occurrence presented (1) higher levels of the aromatic amino acids (AAA) tyrosine and phenylalanine, glutamate, acetate, citrate, glucose, propylene glycol, and ethanol; and (2) lower levels of unsaturated lipids and VLDL, N-acetyl glycoproteins, choline, glutamine, acetone, mannose, and the branched-chain amino acids (BCAA) valine, leucine, and isoleucine, compared to the control group. The corresponding P values, q values, and fold changes of the metabolites are presented in Table 2. The ROC analyses (Fig. 2c) of the metabolic signature (O-PLS score) and of the cross-validated data (O-PLS CV status) presented an AUC of 85 % and 74 %, respectively (Table 4).
The O-PLS analyses stratified by hepatitis infection status of the cases (Fig. 3a,b) presented distinct metabolic signatures for hepatitis-infected HCC cases (R² = 45 %, Q² = 34 %) and hepatitis-free HCC cases (R² = 28 %, Q² = 12 %). Hepatitis-infected HCC cases presented (1) higher levels of AAA, glucose, and citrate and (2) lower levels of VLDL and unsaturated lipids, whereas hepatitis-free HCC cases were characterized by (1) higher levels of ethanol and glutamate and (2) lower levels of glutamine, BCAA, and choline. In hepatitis-free HCC cases, the risk associations of glutamine (OR = 0.56; 95 % CI, 0.34-0.92) and glutamate (OR = 2.06; 95 % CI, 1.18-3.61) were statistically significant (Table 4).
Figure 3c shows the O-PLS subgroup analysis of HCC cases with abnormal liver function (score ≥1). A robust model was obtained (R² = 58 %, Q² = 43 %) and the metabolic signature was similar to that including all samples (Fig. 2b). However, no significant model was obtained from HCC cases with normal liver function (score = 0) only (data not shown). Table 4 shows results of multivariable CLR additionally adjusted for liver function score, for which only citrate (OR = 1.88; 95 % CI, 1.14-3.11) and phenylalanine (OR = 1.75; 95 % CI, 1.04-2.94) remained significantly associated with HCC risk.
Figure 4 presents the O-PLS and ROC analyses stratified by lag time between blood collection and diagnosis. The metabolic signature of HCC cases diagnosed within 2 years after blood collection was characterized by (1) higher levels of AAA and glutamate and (2) lower levels of unsaturated lipids and choline. The metabolic signature of HCC cases diagnosed later (≥2 years) additionally presented (1) higher levels of glucose, ethanol, and propylene glycol and (2) lower levels of BCAA and N-acetyl glycoproteins.
Among the cases diagnosed <2 years from recruitment, the AUC of ROC curves from the O-PLS metabolic signature and from O-PLS CV data were 93 % and 82 %, respectively (Fig. 4c).

Discussion
This study is, to the best of our knowledge, the first NMR metabolomic analysis based on subjects from a prospective cohort study on Western European populations for epidemiology of liver cancer. We have identified a number of metabolites that differed between HCC cases and corresponding matched controls. As concerns the specificity of these associations, we note that an analogous study was conducted in parallel on extrahepatic/intrahepatic bile duct carcinomas without providing any significant results (data not shown). We also note that the impact of long-term storage of EPIC samples as well as other potential sources of systematic variations of the metabolic profiles has been thoroughly detailed earlier [45].
O-PLS analysis showed a clear discrimination between cases and controls with somewhat different metabolomic profiles with respect to the length of time from blood collection to diagnosis, hepatitis infection status, and liver function. Importantly, this study showed that consideration of metabolomic profiles can improve HCC diagnosis beyond that provided by AFP and liver enzyme levels, which are currently the HCC biomarkers most commonly applied in clinical practice.
The liver is central for the metabolism of carbohydrates, fats and proteins, and also plays key roles in detoxification and hormone production. Thus, a degree of metabolic dysregulation would be expected with liver diseases, particularly HCC. For this reason, the application of metabolomic technologies may be able to provide some insight into the etiology and mechanisms of HCC and, possibly, the identification of early diagnostic biomarkers or biomarker patterns characteristic of cancer at this anatomical site.
To date, three NMR or NMR/mass spectrometry, serum-based metabolomic studies have been conducted looking specifically at HCC [20,23,24]. All three case-control studies were based on sera collected from HCC cases post-diagnosis. The comparison group in one of the studies was hepatitis-infected subjects [20], while that of the others were cirrhotic patients [23,24]. The studies identified potential (1) impairment of the tricarboxylic acid cycle, increased lipid catabolism, and elevation of essential amino acids [20], and (2) defects in ammonium detoxification and increased fatty acid beta-oxidation [24] in HCC. The fundamental design differences with the present study are that the latter is based on prospectively identified HCC cases, such that metabolomic profiles are likely indicative of pre-diagnostic changes, and that the matched control subjects were cancer-free cohort participants. The key metabolic alterations observed are related to changes in amino acid, polyunsaturated lipid, acetate, and citrate metabolism, among the 16 individual metabolites highlighted here. Because our study is nested within the prospective EPIC cohort, which has detailed information on dietary and lifestyle factors and measured anthropometry, we were able to make statistical adjustments for many important confounding variables such as smoking status, alcohol consumption and habits, physical activity, educational attainment (as a proxy marker for socioeconomic status), body mass index, and waist circumference.
Of particular note is our observation of a 0.82-fold reduction in choline in HCC cases (Table 2), meaning a significant inverse HCC risk association for this compound (Table 4; OR = 0.45; 95 % CI, 0.31-0.65). In animal studies, choline deficiency has been shown to cause liver damage, oxidative stress, and spontaneous liver cancer [47][48][49]. In human studies, HCC has been associated with a down regulation in choline metabolism [25].
Also interesting is our identification of circulating ethanol as a strong HCC risk factor, alcohol being a major lifestyle risk factor for this disease. We also observed a shift, in terms of fold difference between HCC cases and their matched controls, from glutamine to glutamate, indicating a possible defect in ammonium detoxification [50], as also observed in the study by Nahon et al. [24]. It is of interest that Nahon et al. [24] observed this shift comparing HCC cases to cirrhotic controls, while our findings indicate that this important change may actually be present for some time prior to diagnosis. In the study by Gao et al. [20], higher levels of AAA were associated with liver cirrhosis and HCC, together with lower levels of BCAA, choline, and unsaturated lipids. The same changes were observed suggesting

Fig. 4 Analyses stratified by the interval between recruitment into the EPIC cohort and clinical diagnosis of HCC (<2 years after recruitment vs. ≥2 years after recruitment). (a) O-PLS score plot including HCC cases that were diagnosed <2 years (n = 22) after blood collection and matched controls (n = 43), R² = 45 % and Q² = 33 %, and the metabolic signature colored for correlation after significance to ANOVA tests (Benjamini-Hochberg multiple corrected). (b) O-PLS score plot including HCC cases that were diagnosed ≥2 years after blood collection (n = 92) and their matched controls (n = 179), R² = 27 % and Q² = 16 %, and the metabolic signature colored for correlation after significance to ANOVA tests (Benjamini-Hochberg multiple corrected). (c) ROC analyses for each stratified group including AFP, liver function score, O-PLS score, O-PLS cross-validated (CV) status, and a combination of O-PLS CV status and AFP or liver function score. The ROC curves of the O-PLS CV status and the O-PLS CV status + AFP almost overlap for the ROC analysis performed on cases diagnosed <2 years. The characteristics of each model are presented in Table 3.
The validations of the O-PLS models are presented in Additional file 1: Figure S3.

An interesting observation in the present study was a strong, significant positive HCC risk association for the exogenous metabolite propylene glycol. Identification of propylene glycol in human serum is not uncommon [51], and it is thought to derive largely from pharmaceutical use since it is widely used as a solvent in many intravenous, oral, and topical pharmaceutical preparations (as well as in other general products including cosmetics, food, and toothpastes). An adult with normal liver and kidney function will metabolize propylene glycol into lactate, acetate, and pyruvate within several hours [52]. Therefore, high levels of propylene glycol could reflect medication use, possibly in participants with liver damage, or simple accumulation resulting from impaired liver function. Despite the prospective nature of our study, some HCC cases may have experienced symptoms before diagnosis, which may have prompted medical surveillance and/or alteration of dietary/lifestyle habits (e.g. reduced alcohol intake or smoking cessation). Yet, such changes would likely bias risk estimates towards the null or be unrelated to the disease outcome.
In addition to its prospective design, availability of detailed pre-diagnostic lifestyle/dietary data, and anthropometric measures, additional strengths of our study include the ability to consider liver function parameters based on a score developed from clinically relevant liver enzyme concentrations. The assumption is that decreased liver function is associated with a greater degree of liver damage. From our findings, it is apparent that the metabolic pattern associated with HCC may be reflective of liver dysfunction, as suggested by the stratified analysis on the liver function score. These results support the fact that HCC largely arises from a background of increasingly severe liver damage. Indeed, the process ending with HCC is considered to be gradual, involving infection by hepatitis viruses or the development of fatty liver diseases or cirrhosis [53]. Each part of the process may be characterized by alterations in metabolic factors, which may be detectable by metabolomic approaches [54][55][56][57][58]. Due to this gradual process, we note that longer follow-up time would be required in order to thoroughly assess, prior to any liver damage, the specificity of the identified HCC risk associations. Our study was composed of a large number of HCC cases that were not infected with either hepatitis B or C. Thus, we attempted to determine whether metabolomic differences could be observed in the absence of these predominant HCC risk factors. Although exclusion of hepatitis-positive cases attenuated some of our findings and resulted in loss of significance for specific metabolites, strong associations were observed for glutamate and glutamine. This is indicative of a potential defect in ammonium detoxification in non-hepatitis HCC. This observation deserves further in-depth investigation.
In our study, we were also interested in comparing metabolic changes preceding cancer diagnosis by several years. Thus, we conducted stratified analysis by lag time between blood collection and diagnosis, which showed specific metabolic changes according to follow-up time. However, a key limitation of the present study is the lack of any clinical data, assessment of any medication usage, or subgroup analyses based on pathways of HCC development. The metabolite changes associated with the later cases are more likely to be informative on the etiology and/or risk exposure (e.g. dietary components, environmental, lifestyle, and pollutants), while metabolic changes in cases diagnosed <2 years after recruitment likely reflect a direct influence of the tumor.

Conclusion
For the first time, a metabolic pattern based on serum samples was identified to be associated with HCC risk within a large prospective study. Several metabolites associated with either an increased or decreased HCC risk have been highlighted. The majority of associations remained significant after controlling for potential confounders and consideration of correction for multiple testing. The results suggest that metabolic patterns can provide meaningful etiologic insight into HCC development and can potentially be used to detect this cancer in its early stages, even several years prior to clinical diagnosis.

Additional file
Additional file 1: Supplementary methods for NMR metabolomics data acquisition, additional table (Table S1. Metabolites identified in serum samples) and additional figures (Figure S1. Validation (1000 resampling) of the O-PLS model based on ¹H CPMG spectra and O-PLS metabolic signature obtained from the analysis of ¹H NOESY NMR spectra. Figure S2. Validation (1000 resampling) of the O-PLS models stratified by hepatitis infection status of the cases, and liver function score.

Competing interests
The authors declare that they have no competing interests.