Reduced levels of hydroxylated, polyunsaturated ultra long-chain fatty acids in the serum of colorectal cancer patients: implications for early screening and detection

Background There are currently no accurate serum markers for detecting early risk of colorectal cancer (CRC). We therefore developed a non-targeted metabolomics technology to analyse the serum of pre-treatment CRC patients in order to discover putative metabolic markers associated with CRC. Using tandem-mass spectrometry (MS/MS) high throughput MS technology we evaluated the utility of selected markers and this technology for discriminating between CRC and healthy subjects. Methods Biomarker discovery was performed using Fourier transform ion cyclotron resonance mass spectrometry (FTICR-MS). Comprehensive metabolic profiles of CRC patients and controls from three independent populations from different continents (USA and Japan; total n = 222) were obtained and the best inter-study biomarkers determined. The structural characterization of these and related markers was performed using liquid chromatography (LC) MS/MS and nuclear magnetic resonance technologies. Clinical utility evaluations were performed using a targeted high-throughput triple-quadrupole multiple reaction monitoring (TQ-MRM) method for three biomarkers in two further independent populations from the USA and Japan (total n = 220). Results Comprehensive metabolomic analyses revealed significantly reduced levels of 28-36 carbon-containing hydroxylated polyunsaturated ultra long-chain fatty-acids in all three independent cohorts of CRC patient samples relative to controls. Structure elucidation studies on the C28 molecules revealed two families harbouring specifically two or three hydroxyl substitutions and varying degrees of unsaturation. The TQ-MRM method successfully validated the FTICR-MS results in two further independent studies. In total, biomarkers in five independent populations across two continental regions were evaluated (three populations by FTICR-MS and two by TQ-MRM). The resultant receiver-operator characteristic curve AUCs ranged from 0.85 to 0.98 (average = 0.91 ± 0.04). 
Conclusions A novel comprehensive metabolomics technology was used to identify a systemic metabolic dysregulation comprising previously unknown hydroxylated polyunsaturated ultra-long chain fatty acid metabolites in CRC patients. These metabolites are easily measurable in serum and a decrease in their concentration appears to be highly sensitive and specific for the presence of CRC, regardless of ethnic or geographic background. The measurement of these metabolites may represent an additional tool for the early detection and screening of CRC.


Background
Colorectal cancer (CRC) mortality remains one of the highest among all cancers, second only to lung cancer (Canadian Cancer Statistics, 2008). Despite the known benefits of early detection, screening programmes based on colonoscopy and fecal occult blood testing have been plagued with challenges such as public acceptance, cost, limited resources, accuracy and standardization. There is consensus in the field that the use of colonoscopy alone for CRC screening is not practical [1], and that a minimally-invasive serum-based test capable of accurately identifying subjects who are at high risk for the development of CRC would result in a higher screening compliance than current approaches and better utilization of existing endoscopy resources [1][2][3]. Although there have been multiple reports of altered transcript levels [4][5][6][7][8][9][10][11], aberrantly methylated gene products [12][13][14] and proteomic patterns [15][16][17][18] associated with biological samples from CRC patients, few if any have advanced into clinically useful tests. This may be due to a number of reasons including technical hurdles in assay design, challenges obtaining reproducible results, costs and lengthy regulatory processes. Furthermore, most of the tests currently used or in development are based upon the detection of tumour-specific markers and have poor sensitivity for identifying subjects who are either at a very early stage, or are predisposed to risk but show no clinical presentation of disease.
Although causal genetic alterations for CRC have been well characterized, the number of cases due to adenomatous polyposis coli (APC) and hereditary nonpolyposis colorectal cancer is less than 5% of the total, with approximately 15% claimed to be attributable to inheritable family risk, likely due to complex patterns of low penetrance mutations which have yet to be delineated [19]. The fact remains that approximately 80% of CRC cases are thought to arise sporadically, with diet and lifestyle as key risk factors [20,21]. In addition, an individual's microbiome is intricately linked to their gastrointestinal physiological status and may itself be involved as a risk factor [22]. Given that metabolism is heavily influenced by both diet and lifestyle and that the microbiome contributes its own metabolic processes, it is surprising that there has been little effort aimed at identifying metabolic markers as risk indicators of CRC. This may, in part, have been due to the lack of platform technologies and informatics approaches capable of comprehensively characterizing metabolites in a similar way that DNA microarrays or surface-enhanced laser desorption ionization can characterize transcripts or proteins, respectively.
Recently, however, there have been rapid advances made in mass spectrometric-based systems which can identify large numbers of metabolic components within samples in a parallel manner [23][24][25]. Fourier transform ion cyclotron resonance mass spectrometry (FTICR-MS) is based upon the principle that charged particles exhibit cyclotron motion in a magnetic field, where the cyclotron frequency is inversely proportional to the mass-to-charge ratio [26]. FTICR-MS is known for its high resolving power and capability of detecting ions with mass accuracy below 1 part per million (ppm). Liquid sample extracts can be directly infused using electrospray ionization (ESI) and atmospheric pressure chemical ionization (APCI) without chromatographic separation [23], where ions with differing mass to charge (m/z) ratios can be simultaneously resolved using a Fourier transformation. Using informatics approaches, spectral files from multiple samples can be accurately aligned and peak intensities across the samples compared [23]. High resolution also enables the prediction of elemental composition of all ions detected in a sample, providing a solid foundation for metabolite classification and identification, as well as the ability to construct de novo metabolic networks [23,27]. The combination of liquid extraction, flow injection, high resolution and informatics affords a unique opportunity to broadly characterize the biochemical composition of samples, with no a priori knowledge about the sample itself, to a degree which was not previously possible. This 'non-targeted' approach has the advantage of detecting novel compounds and is therefore ideally suited for biomarker-driven discovery applications. Using a MS-based discovery platform for metabolic biomarker identification also has the added advantage of straightforward translation into a quantitative method based upon triple-quadrupole multiple-reaction-monitoring (TQ-MRM), similar to the clinical methods used to screen for inborn errors of metabolism [28].
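As a rough illustration of the FTICR principle mentioned above, the unperturbed cyclotron frequency of an ion can be written as f = zeB / (2πm). The sketch below evaluates this textbook relation for a singly charged ion in a 7.0 T field; the function name and the example m/z value are illustrative assumptions, and the calculation ignores trapping-field and space-charge effects that a real instrument calibration accounts for.

```python
import math

ELEMENTARY_CHARGE_C = 1.602176634e-19   # coulombs
DALTON_KG = 1.66053906660e-27           # kilograms per dalton


def cyclotron_frequency_hz(mz, field_tesla, charge=1):
    """Unperturbed cyclotron frequency f = z*e*B / (2*pi*m).

    mz is the mass-to-charge ratio in Da per elementary charge
    (hypothetical helper, not part of any instrument software).
    """
    mass_kg = mz * charge * DALTON_KG
    return charge * ELEMENTARY_CHARGE_C * field_tesla / (2 * math.pi * mass_kg)


if __name__ == "__main__":
    # Illustrative ion near the low end of the 440-600 Da region, 7.0 T magnet.
    f = cyclotron_frequency_hz(445.0, 7.0)
    print(f"{f / 1e3:.1f} kHz")
```

Because frequency scales as 1/(m/z), ions of different m/z trapped together produce distinct frequencies in the same transient, which is why a single Fourier transformation can resolve them simultaneously.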
Here we report on the application of this technology for characterizing the serum metabolomes of treatment-naive CRC patients and healthy asymptomatic subjects. A specific metabolic perturbation was discovered in the serum of CRC patients compared to controls in three independent and unrelated sets of samples (total n of 222). We further verify the perturbation using a tandem MS (MS/MS) approach in two additional independent case-control populations totalling 220 subjects. Implications of the findings for CRC screening are discussed.

Patient sample selection
Clinical samples used for the first discovery project were obtained from Genomics Collaborative, Inc (GCI, MA, USA), while samples for the second discovery project and one validation project were obtained from Seracare Lifesciences (MA, USA). These companies specialize in the collection and storage of serum and tissue samples specifically for research purposes. Samples were collected, processed and stored in a consistent manner by teams of physicians as part of a global initiative using standardized protocols and operating procedures. Collection protocols for GCI and Seracare Lifesciences were approved by the Western Institutional Review Board and all samples were properly consented. The inclusion criterion for patient sample selection from the GCI and Seracare biobanks for both the discovery and validation cohorts was that the serum be taken prior to any form of treatment, including surgery, chemotherapy or radiation therapy. All samples were accompanied by detailed pathology reports which were independently verified by certified pathologists at GCI and Seracare. The GCI discovery sample set included serum samples from 40 pretreatment CRC patients and 50 matched controls; the Seracare discovery set included samples from 26 pretreatment CRC patients and 25 matched controls, and the validation Seracare set included 70 pretreatment CRC patients and 70 matched controls. The discovery samples provided by Osaka Medical University (Osaka, Japan) included 46 pre-surgery CRC patients and 35 matched controls, which were prospectively collected according to the standard collection protocol of the institution and were properly consented. Study protocols were performed according to the ethical guidelines set by the committee of the three ministries of the Japanese Government.
The samples for the Chiba, Japan, validation population, which included 40 pre-surgery CRC patients and 40 matched controls, were also prospectively consented and collected under an ethics reviewed protocol approved by the Institutional Review Board of Graduate School of Medicine, Chiba University. A summary of the populations including disease staging is shown in Table 1. All samples were processed and analysed in a randomized manner and the results unblinded following analysis.

Sample extraction
Serum samples were stored at -80°C until thawed for analysis and were only thawed once. All extractions were performed on ice. Serum samples were prepared for FTICR-MS analysis by first sequentially extracting equal volumes of serum with 1% ammonium hydroxide and ethyl acetate (EtOAc) three times. Samples were centrifuged between extractions at 4°C for 10 min at 3500 rpm and the organic layer removed and transferred to a new tube (extract A). A 1:5 ratio of EtOAc (extract A) to butanol (BuOH) was then evaporated under nitrogen to the original BuOH starting volume (extract B). All extracts were stored at -80°C until FTICR-MS analysis.

FTICR-MS analysis
For analysis under negative ESI conditions, sample extract B was diluted 10-fold in methanol:0.1% (v/v) ammonium hydroxide (50:50, v/v) prior to direct infusion. For APCI, extract A was directly injected without diluting. All analyses were performed on a Bruker Daltonics APEX III FTICR-MS equipped with a 7.0 T actively shielded superconducting magnet (Bruker Daltonics, MA, USA). Samples were directly injected using ESI and APCI at a flow rate of 600 μL per hour. Ion transfer/detection parameters were optimized using a standard mix of serine, tetra-alanine, reserpine, Hewlett-Packard tuning mix and the adrenocorticotrophic hormone fragment 4-10. In addition, the instrument conditions were tuned to optimize ion intensity and broadband accumulation over the mass range of 100-1000 atomic mass units (amu) according to the instrument manufacturer's recommendations. A mixture of the above mentioned standards was used to internally calibrate each sample spectrum for mass accuracy over the acquisition range of 100-1000 amu. FTICR data were analysed using a linear least-squares regression line; mass axis values were calibrated such that each internal standard mass peak had a mass error of < 1 ppm compared with its theoretical mass. Using XMASS software from Bruker Daltonics Inc (CA, USA), data file sizes of one megaword were acquired and zero-filled to two megawords. A SINm data transformation was performed prior to Fourier transform and magnitude calculations. The mass spectra from each analysis were integrated, creating a peak list that contained the accurate mass and absolute intensity of each peak. Compounds in the range of 100-1000 m/z were analysed. In order to compare and summarize the data, all detected mass peaks were converted to their corresponding neutral masses, assuming hydrogen adduct formation.
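The conversion of detected peaks to neutral masses and the < 1 ppm calibration criterion can be illustrated with a minimal sketch. It assumes that "hydrogen adduct formation" means singly charged [M-H]- ions in negative mode and [M+H]+ ions in positive mode; the function names are hypothetical and do not correspond to the XMASS or DISCOVAmetrics software.

```python
PROTON_MASS_DA = 1.007276466  # mass of a proton in daltons


def neutral_mass(mz, mode):
    """Convert a singly charged ion m/z to its neutral monoisotopic mass.

    Assumption: negative-mode ions are [M-H]- and positive-mode ions are [M+H]+.
    """
    if mode == "negative":
        return mz + PROTON_MASS_DA
    if mode == "positive":
        return mz - PROTON_MASS_DA
    raise ValueError("mode must be 'negative' or 'positive'")


def ppm_error(measured, theoretical):
    """Signed mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


if __name__ == "__main__":
    # Illustrative values only, not measurements from the study.
    print(neutral_mass(445.3323, "negative"))
    print(ppm_error(446.3400, 446.3396))
```

At a mass of about 446 Da, a 1 ppm tolerance corresponds to roughly 0.0004 Da, which gives a sense of the accuracy required of the internal calibration.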
A self-generated two-dimensional (mass versus sample intensity) array was then created using DISCOVAmetrics™ software (Phenomenome Discoveries Inc, Saskatoon, Canada). The data from multiple files were integrated and this combined file was then processed in order to determine all of the unique masses. The average of each unique mass was determined, representing the y-axis. A column was created for each file that was originally selected to be analysed, representing the x-axis. The intensity for each mass found in each of the files selected was then filled into its representative x,y coordinate. Coordinates that did not contain an intensity value were left blank. Each of the spectra was then peak-picked in order to obtain the mass and intensity of all metabolites detected. The data from all modes were then merged to create one data file per sample. The data from all 90 discovery serum samples were then merged and aligned to create a two-dimensional metabolite array in which each sample is represented by a column, each unique metabolite is represented by a single row and each cell in the array corresponds to a metabolite intensity for a given sample. The array tables were then used for statistical analysis described in 'statistical analyses' (see Additional File 1).
Ethyl acetate extracts of commercial serum (180 mL serum, 500 mg extract) were subjected to reverse phase flash column chromatography (FCC) with a step gradient elution; acetonitrile-water 25:75 to 100% acetonitrile. The fractions collected were analysed by LC/MS and MS/MS. The fractions containing the CRC biomarkers were pooled (12.5 mg). This procedure was repeated several times to obtain about 60 mg of CRC biomarker rich fraction. This combined sample was then subjected to FCC with a step gradient elution; hexane-chloroform-methanol and the fractions collected subjected to LC/MS and MS/MS analysis. The biomarker rich fraction labelled sample A (5.4 mg, about 65%) was analysed by NMR.
Sample A (3 mg) was then treated with excess ethereal diazomethane and kept overnight at room temperature. After the removal of solvent, the sample was analyzed by NMR.
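The mass-by-sample array described earlier in this section (unique masses as rows, samples as columns, blanks where no peak was detected) can be sketched as follows. This is a simplified greedy grouping under an assumed 1 ppm tolerance, not the DISCOVAmetrics alignment algorithm, whose details are not given here; the function name, tolerance parameter and example peaks are illustrative.

```python
def build_mass_array(peak_lists, tol_ppm=1.0):
    """Align per-sample peak lists into a mass-by-sample table.

    peak_lists maps a sample name to a list of (neutral_mass, intensity).
    Returns (row_masses, rows): row_masses[i] is the average mass of row i,
    and rows[i] maps each sample name to an intensity or None (blank).
    """
    samples = list(peak_lists)
    pooled = sorted(
        (mass, intensity, sample)
        for sample, peaks in peak_lists.items()
        for mass, intensity in peaks
    )

    groups = []  # each group collects peaks judged to be the same mass
    for peak in pooled:
        if groups:
            mean_mass = sum(p[0] for p in groups[-1]) / len(groups[-1])
            if abs(peak[0] - mean_mass) / mean_mass * 1e6 <= tol_ppm:
                groups[-1].append(peak)
                continue
        groups.append([peak])

    row_masses, rows = [], []
    for group in groups:
        row_masses.append(sum(p[0] for p in group) / len(group))
        row = {sample: None for sample in samples}  # None marks a blank cell
        for _, intensity, sample in group:
            row[sample] = intensity
        rows.append(row)
    return row_masses, rows


if __name__ == "__main__":
    # Hypothetical peaks for two samples.
    demo = {
        "sample_1": [(446.3396, 10.0), (448.3553, 5.0)],
        "sample_2": [(446.3397, 8.0)],
    }
    masses, table = build_mass_array(demo)
    for mass, row in zip(masses, table):
        print(round(mass, 4), row)
```

Leaving undetected cells as blanks rather than zeros keeps "not detected" distinguishable from "measured at zero intensity" in later statistics.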

TQ-MRM methodology
Serum samples were extracted as described for non-targeted FTICR-MS analysis, with the addition of 10 μg/mL [13C1]cholic acid to the serum prior to extraction (resulting in a final ethyl acetate concentration of [13C1]cholic acid of 36 nM). The ethyl acetate organic fraction was used for the analysis of each sample. A series of [13C1]cholic acid dilutions in ethyl acetate from Randox serum extracts was used to generate a standard curve ranging between 0.00022 μg/mL and 0.222 μg/mL. 100 μL of sample were injected by flow-injection analysis into the 4000QTRAP™ equipped with a TurboV™ source with an APCI probe. The carrier solvent was 90% methanol:10% ethyl acetate, with a flow rate of 360 μL/min into the APCI source. The source gas parameters were as follows: CUR: 10

Statistical analysis
FTICR-MS accurate mass array alignments were performed using DISCOVAmetrics™ version 3.0 (Phenomenome Discoveries Inc, Saskatoon, Canada). Statistical analysis and graphing of FTICR-MS data were carried out using Microsoft Office Excel 2007, and distribution analysis of TQ-MRM data was carried out using JMP version 8.0.1. Meta-analysis (Fisher's inverse chi-square method) was carried out using SAS 9.2 and R 2.9.0. Two-tailed unpaired Student's t-tests were used for determination of significance between CRC and controls. P-values of less than 0.05 were considered significant. Receiver operating characteristic (ROC) curves were generated using the continuous data mode of JROCFIT http://www.jrocfit.org.
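To make the two core statistics concrete, the following minimal sketch computes an unpaired Student's t statistic (pooled variance) and an empirical ROC AUC for a marker that is lower in cases than in controls. It is an illustration only: the study used Excel, JMP and JROCFIT, and JROCFIT fits a parametric ROC curve, whereas the AUC below is the simple rank-based (Mann-Whitney) estimate. Function names and data are hypothetical.

```python
import math
from statistics import mean, variance


def student_t(group_a, group_b):
    """Unpaired Student's t statistic with pooled variance, and its degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    t = (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


def empirical_auc(controls, cases):
    """Probability that a random control exceeds a random case (ties count 0.5)."""
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in controls
        for k in cases
    )
    return wins / (len(controls) * len(cases))


if __name__ == "__main__":
    # Made-up intensities, not data from the study.
    controls = [5.0, 6.0, 7.0, 8.0]
    cases = [1.0, 2.0, 3.0, 6.0]
    print(student_t(controls, cases))
    print(empirical_auc(controls, cases))
```

The t statistic would then be referred to a t distribution with the returned degrees of freedom to obtain a two-tailed P-value, a step omitted here because it needs a statistics library.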

FTICR metabolomic profiling
The experimental workflow for the studies described in this paper is summarized in Figure 1. Non-targeted metabolomic profiles of sera from three independent populations of treatment-naive CRC patients and geographically and ethnically matched healthy controls (summarized in Table 1) were generated over a 24-month period (that is, each study was separated by approximately 12 months). The first study comprised 40 CRC patients and 50 control subjects acquired from Genomics Collaborative, Inc (GCI); the second study comprised 26 CRC subjects and 25 controls acquired from Seracare Lifesciences Inc; and the third study included 46 CRC and 35 controls prospectively collected in Osaka, Japan. In all cases, serum metabolites were captured through a liquid extraction process (see Methods), followed by direct infusion of the extracts using negative electrospray ionization (nESI) and negative atmospheric pressure chemical ionization (nAPCI) on an FTICR mass spectrometer. The resulting spectral data of all the subjects for each study were aligned within 1 ppm mass accuracy, background peaks were subtracted, and a two-dimensional array table comprising the intensities of each of the sample-specific spectral peaks was created using custom informatics software (see Methods). Metabolic differences between CRC patient and control profiles for the three independent studies were visualized by plotting the control mean-normalized log ratio peak intensities across the detected mass range as shown in Figures 2A to 2C. In each independent study, a region of spectra between approximately 440 and 600 Da showed peaks consistently reduced in intensity in CRC patients relative to controls (green, yellow, orange and red points in Figure 2). On average, this cluster of masses showed between 50% and 75% reduction in CRC patient serum compared to the respective controls, with p-values of 1 × 10^-5 or lower in each study.
The overlap between each of the discovery studies was further investigated by ranking the top 50 masses based upon P-value from each study and comparing them with masses showing a significant difference (P < 0.05 between CRC and controls) in the other studies as shown in Table 2. For example, 46 of the top 50 metabolites (92%) with the lowest p-values in the GCI discovery set were also significant (P < 0.05) in the Seracare 1 dataset, while 31 out of the 50 GCI masses were significant (P < 0.05) in the Osaka dataset. Likewise, the top 50 metabolites in the Osaka study showed 88% and 94% redundancy with metabolites showing P < 0.05 in the GCI and Seracare 1 studies, respectively. These results indicated a very high degree of commonality among significantly differentiated masses across the three studies, and in fact, 63% of the top 50 masses in each study were also present within the top 50 of at least one of the other two studies (see Additional File 2). Of the top 50 rank-ordered masses, only those identified in more than one study were found to exist within the 440 to 600 Da mass range highlighted above and there was not a single peak detected outside this region which was significantly different between CRCs and controls in any two of the studies. Filtering for metabolic differences detected exclusively in all three studies (as well as removal of 13C isotopic peaks and redundant masses detected in both ESI and APCI) resulted in 13 masses representing individual 12C metabolites as shown in Table 3. These represented the most statistically significant and robust discriminators among the three studies. Subsequent molecular formula assignments, as discussed further below, as well as related expression profiles, suggested that the metabolites belonged to a related chemical family. The relative intensities of the two lowest molecular weight molecules with nominal masses of 446 and 448 are shown in Figure 3A.
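The cross-study comparison described above (rank the top masses by P-value in one study, then count how many are significant in another) can be sketched as below. The sketch assumes that masses have already been matched across studies and are used as dictionary keys; the function name and the example P-values are hypothetical.

```python
def top_n_overlap(pvalues_a, pvalues_b, n=50, alpha=0.05):
    """Count how many of the n lowest-P masses in study A are significant in study B.

    pvalues_a and pvalues_b map a mass identifier to its P-value in each study.
    Masses absent from study B are treated as not significant.
    Returns (count, fraction of the top-n list).
    """
    top = sorted(pvalues_a, key=pvalues_a.get)[:n]
    shared = [mass for mass in top if pvalues_b.get(mass, 1.0) < alpha]
    return len(shared), len(shared) / len(top)


if __name__ == "__main__":
    # Made-up P-values keyed by nominal mass.
    study_a = {446: 1e-9, 448: 1e-8, 450: 1e-3, 300: 0.5}
    study_b = {446: 1e-4, 448: 0.2, 450: 0.01}
    print(top_n_overlap(study_a, study_b, n=3))
```

Applied to the figures quoted above, 46 shared masses out of a top-50 list corresponds to a fraction of 0.92.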
We observed little to no correlation between the reduction of the metabolites and disease stage (Figures 3A and 3B), and ROC curve analysis resulted in an average area under the curve (AUC) of 0.91 ± 0.03 (Figure 3C; individual AUCs shown) across all three studies for all stages combined.
Figure 1 Study design. The study comprised three phases: Fourier transform ion cyclotron resonance mass spectrometry metabolomic discovery in three independent sample sets, structural investigation and determination of metabolic biomarkers as hydroxylated polyunsaturated ultra long-chain fatty acids and validation using a triple-quadrupole multiple reaction monitoring targeted assay.
Computational assignments of reasonable molecular formulas were then carried out for the 13 masses identified above, as well as the top 50 for each discovery set shown in Additional File 2. The assignments were based on a series of mathematical and chemometric rules as described previously [23], which are reliant on high mass accuracy for precise prediction. The algorithm computes the number of carbons, hydrogens, oxygens and other elements, based on their exact mass, which can be assigned to a detected accurate mass within defined constraints. Logical putative molecular formulas were computed for masses in Table 3 (and Additional File 2), resulting in elemental compositions containing either 28, 30, 32 or 36 carbons and four to six oxygens. We used this information in the subsequent section to select appropriate molecules for structural comparison studies. Collectively, the results indicated a consistent 50% to 75% reduction of organically soluble oxygenated metabolites ranging between 28 and 36 carbons in length, in the serum of CRC patients compared to controls.
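The idea of assigning elemental compositions from an accurate mass can be illustrated with a brute-force sketch restricted to C, H and O. This is not the rule set of reference [23]; the element limits, the hydrogen ceiling of 2C + 2 and the 1 ppm tolerance are simplifying assumptions chosen for the example, and the function name is hypothetical.

```python
# Monoisotopic masses in daltons.
MASS_C = 12.0
MASS_H = 1.00782503207
MASS_O = 15.99491461956


def candidate_formulas(neutral_mass, tol_ppm=1.0, max_c=40, max_o=8):
    """Enumerate CxHyOz compositions whose exact mass lies within tol_ppm.

    Returns a list of (carbons, hydrogens, oxygens, ppm_error) tuples.
    """
    hits = []
    for c in range(1, max_c + 1):
        for o in range(0, max_o + 1):
            remainder = neutral_mass - c * MASS_C - o * MASS_O
            if remainder < 0:
                continue
            h = round(remainder / MASS_H)
            # Assumed constraint: no more hydrogens than a saturated acyclic skeleton.
            if h > 2 * c + 2:
                continue
            exact = c * MASS_C + h * MASS_H + o * MASS_O
            error_ppm = (exact - neutral_mass) / neutral_mass * 1e6
            if abs(error_ppm) <= tol_ppm:
                hits.append((c, h, o, error_ppm))
    return hits


if __name__ == "__main__":
    # Illustrative neutral mass close to nominal mass 446.
    for c, h, o, err in candidate_formulas(446.3396):
        print(f"C{c}H{h}O{o}  {err:+.2f} ppm")
```

The narrower the mass tolerance, the fewer compositions survive, which is why sub-ppm accuracy is emphasized as the basis for the formula predictions.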

Structural elucidation
Selected ethyl acetate extracts of serum from the GCI cohort used in the FTICR-MS work described above were re-analysed using HPLC coupled to a quadrupole time-of-flight (Q-TOF) MS in full-scan APCI negative ion mode. Consistent with the FTICR-MS results, a cluster of peaks between approximately 440 and 600 Da at a retention time of between 16 and 18 min following reverse-phase HPLC was detected in asymptomatic control sera, but was absent from CRC patient serum (Figure 4). Molecular ions from all six C28 biomarkers (m/z 446, m/z 448, m/z 450, m/z 464, m/z 466 and m/z 468) as well as many of the remaining C32 and C36 markers were easily detectable within normal serum. Extracted masses up to 400 Da within the 16-18 min retention time showed similar peak intensities in both populations (Figure 4, region to the right of the box), as did extracted mass spectra at other retention times (not shown), reinforcing the specificity of this depleted metabolic region for CRC patient serum.
(Table 4 and Additional File 9). Collectively, these deductions indicated that the metabolomic markers were not analogues of vitamins A, D, E, K and steroids, but rather long-chain hydroxy fatty acids containing varying degrees of unsaturation. We collectively refer to these metabolites as hydroxylated polyunsaturated ultra long-chain fatty acids (hPULCFAs; where the term 'ultra' has been used to refer to C30 and longer chain fatty acids [34]). Next, an enrichment strategy using bulk serum extracts and a two-stage flash column chromatography approach followed by nuclear magnetic resonance (NMR) analysis was carried out to provide further structural verification of the hPULCFAs. First, reverse phase FCC using a water-acetonitrile solvent gradient was performed and the resulting fractions analysed by LC/MS. Fractions containing the hPULCFAs (fraction 9, Additional Files 10 and 11) were pooled and subjected to normal phase FCC using chloroform-methanol mixtures to obtain an approximately 65% rich semi-purified fraction labelled sample A (Additional File 12). LC and MS/MS analyses (MS2 and MS3) data on sample A were used to track and confirm enrichment of the markers. NMR (1H, 13C and 2D) analyses on sample A and its methyl esters revealed resonances and correlations (Table 6) consistent with very long chain polyunsaturated hydroxy fatty acids with observance of some suppression of resonances for hydrogen atoms attached to sp2 carbons.

Independent validation using MRM methodology
Reduced levels of hPULCFAs in the blood of CRC patients were further confirmed using a MS/MS approach (see Methods) in two more independent populations. The approach is based upon the measurement of parent-daughter fragment ion combinations (referred to as MRM) for quantifying analytes [28,35]. We developed an assay to specifically measure semi-quantitatively three of the 28 Figure 5A. Significantly lower levels (P < 0.001, actual values shown in Figure 5A) of each of the metabolites were observed in treatment-naive CRC-positive subjects compared to controls. ROC analysis resulted in AUCs of 0.87 ± 0.005 for each of the 28-carbon containing hPULCFAs (Figure 5B). Plotting patients by disease stage showed a slight (but not significant) reduction between stage I and III, with stage IV subjects showing the least reduction (Figures 5C and 5D), albeit with only seven subjects. The corresponding average AUCs of the 28-carbon pool by stage were 0.87 for stage I, 0.88 for stage II, 0.94 for stage III and 0.66 for stage IV. We next used the MRM method to characterize another independent population of CRC and control subjects from Chiba, Japan. Serum samples from 40 pre-treatment CRC subjects and 40 controls were analysed and a significant reduction was again observed in the CRC-positive group (Figure 6A). The corresponding average AUC for the three metabolites was 0.97 ± 0.014 (Figure 6B). In this study, a significant correlation with stage was observed (P < 0.05) for all comparisons between stages I, II and III/IV (Figures 6C and 6D). The AUCs by stage were 0.93 for stage I, 0.97 for stage II and 1.0 for stage III/IV (two stage IVs were grouped with stage III; Figure 6D).

Discussion
We report here on the discovery of novel hydroxylated polyunsaturated ultra long-chain fatty acids containing between 28 and 36 carbons reduced in the serum of CRC patients compared to healthy asymptomatic controls. The utility of non-targeted metabolomics using high resolution FTICR-MS coupled with flow injection technology for biomarker discovery was demonstrated by applying the technology to three independent test populations. In contrast to the 'training/test-set' approach often used by splitting a single sample set in half to validate the performance of biomarkers [36][37][38], which often relies on complex algorithms (see review [39]) and can result in bias [40], we carried out fully independent discovery analyses on three separate sample sets of matched cases and controls of different ethnic backgrounds collected from multiple sites around the world to ensure a high degree of robustness and minimal chance of sampling bias. Of the top 50 metabolic discriminators discovered in the Osaka set, 44 and 47 of these were also significantly changed in the GCI and Seracare sets, respectively. This remarkable inter-study agreement indicates that not only is non-targeted FTICR-MS technology a reproducible biomarker discovery engine, but that disease-related metabolomic changes can be highly conserved across geographic locations and races. The reduction of hPULCFAs in the serum of CRC patients was further validated by translation of the non-targeted FTICR-MS discovery into a simple targeted TQ-MRM method for three hPULCFAs, which was used on two further independent and ethnically diverse case-control test populations. ROC AUCs generated from the TQ-MRM method on the two validation studies were consistent with those based upon the same fatty acids detected in the three FTICR-MS discovery studies (Figures 3, 5 and  6). 
In total, five independent study populations collectively comprising 222 treatment-naive CRC patient samples and 220 disease-free asymptomatic controls were evaluated using two different analytical methods. Indeed, the likelihood of the reported association between the reduction of hPULCFAs and CRC being a false positive result across the five independent sets of samples is astronomically low. Meta-analysis was performed on the false positive rates using Fisher's inverse chi-square statistic, $X^2 = -2\sum_{i=1}^{k}\ln p_i$, which is compared with $C$ ($p_i$ = P-values of the five independent samples; $k$ = 5, the number of different samples; $C$ = upper tail of the chi-square distribution with $2k$ degrees of freedom, $\chi^2_{0.05,10} = 18.31$) [41,42]. Based upon the meta-analysis, the resulting P-values for markers 446 and 448 were more significant than the individual P-values, at 2.96 × 10^-47 and 8.11 × 10^-49, respectively. Although there were differences in the median ages between the CRC and control cohorts in two of the studies, there was no statistically significant trend between age and hPULCFA levels within the individual cohorts and we observed no significant difference between hPULCFA concentrations among the controls from the different populations (not shown). We also observed no differences between genders, and although there were slightly higher BMI levels in the control cohorts for the GCI and Seracare 1 cohorts, the BMIs were matched in the second Seracare validation population suggesting the markers are not related to BMI. A prospective analysis of disease-free subjects equally distributed across various age groups is underway specifically to address any potential age or BMI effects in more detail. Overall our results indicate with a high degree of confidence that a reduction in these metabolites is correlated with the presence of CRC.
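Fisher's inverse chi-square method, as used for the meta-analysis above, combines k independent P-values into the statistic -2 Σ ln(p_i), which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. The sketch below is a minimal standard-library version (the study itself used SAS and R); the function names and the example P-values are illustrative, and the closed-form upper tail applies only because the degrees of freedom are even.

```python
import math


def chi2_upper_tail_even_df(x, k):
    """Upper tail P(X >= x) of a chi-square distribution with 2k degrees of freedom.

    For even degrees of freedom this equals exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def fisher_combined(p_values):
    """Return Fisher's statistic -2*sum(ln p) and the combined P-value."""
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    return statistic, chi2_upper_tail_even_df(statistic, k)


if __name__ == "__main__":
    # Made-up P-values for five independent studies.
    stat, p = fisher_combined([1e-5, 1e-6, 1e-4, 1e-8, 1e-7])
    print(stat, p)
```

With five studies the statistic is referred to a chi-square distribution with 10 degrees of freedom, for which the 0.05 critical value is the 18.31 quoted above.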
The FTICR-MS provided resolution sufficient for confident molecular formula predictions based upon accurate mass in conjunction with extraction, ionization, and statistical correlative information. Although multiple elemental compositions were theoretically assignable to given biomarker masses, only formulas having 28 to 32 carbons, and four to six oxygens were consistently assignable to common masses detected in two or three of the discovery sets. Given a high degree of statistical interaction between the sample-to-sample expression profiles of the hPULCFAs (that is, a high degree of correlation between the relative intensities of the markers across subjects) we suspected they were all part of the same metabolic system and should therefore show related compositions. Detection in negative ionization mode also reduced the likelihood that nitrogen was present in any of the compositions. This information in conjunction with tandem mass spectrometry showing prominent losses of water and carbon dioxide enabled the determination of molecular formulas as shown in Table 3 and Additional File 2. A number of candidate classes of molecules theoretically fitting the molecular formula class were easily excluded using tandem MS. For example, we observed no fragments indicative of condensed ring systems such as those in steroids or vitamin D, and no fragments indicative of chroman ring systems such as those observed in the vitamin E tocopherols. Several other classes of molecules including vitamin K and retinol, and bile acids such as cholic acid and 3β,7α-dihydroxy-5-cholestenoic acid also did not
Table 5 Tandem mass spectrometric results of various standards
preparatory HPLC and chemical synthesis is in progress and will be reported in subsequent publications. Interestingly, the metabolite markers reported in this study represent a human-specific metabolic system.
We analysed serum samples from multiple species, including rat, mouse and bovine, as well as multiple different sample sources including numerous cell lines, conditioned media, tumour and normal colonic tissue from patients in the GCI discovery set, and brain, liver, adipose and other tissues from various species, all of which failed to show any detectable levels of these hPULCFAs (results not shown). We also could not detect these molecules in various plant tissues or grains, including policosanol extracts which are rich in saturated C28 and longer-chain fatty acids [43,44]. This suggests that the molecules may originate from human-specific metabolic processes, such as specific p450-mediated and/or microbiotic processes. The lack of detection in tumour or normal colonic tissue suggests that the metabolites are not 'tumour-derived markers' and, combined with the high rate of association in stage I cancer, it is not likely that the reduction is the result of tumour burden. Analysis of post-surgery samples is currently in progress to address this question. However, the further reduction of levels observed in some late stage Japanese cases (Figure 6) could be explained if lower levels of the hPULCFAs were indeed indicative of progression rate in this group. It is also important to note that in all control groups reported in this paper, subjects were not colonoscopy-confirmed to be free of tumours or advanced neoplasia. Based upon colonoscopy results by Collins et al in average-risk subjects, up to 10% of an asymptomatic population can be positive for advanced neoplasia [45]. Therefore, the ability of these metabolites to discriminate between subjects at risk and not at risk for CRC is likely under-estimated in our results.
Studies are currently in progress to evaluate endoscopy-confirmed controls, to assess the effect of treatment on the markers, and to investigate any possible association with various grades of colon pathologies and non-malignant GI disorders as well as other cancers.
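The under-estimation argument above can be made concrete with a small back-of-the-envelope calculation. The sketch below is illustrative only and is not an analysis performed in this study: it assumes that a fraction of the nominal controls actually carry undetected advanced neoplasia and, as a strong simplification, that their marker levels are distributed like those of the CRC cases. Under that assumption the observed AUC is a mixture, AUC_obs = (1 - c) * AUC_true + 0.5 * c, which can be inverted. The function name and the use of the 10% upper bound as the contamination fraction are assumptions made for illustration.

```python
def deattenuated_auc(observed_auc: float, contamination: float) -> float:
    """Estimate the AUC against truly disease-free controls.

    Assumption (not from the paper): a fraction `contamination` of the
    nominal controls have marker levels distributed like the cases, so
    they contribute an AUC of 0.5 to the observed value:
        observed = (1 - c) * true + 0.5 * c
    """
    if not 0.0 <= contamination < 1.0:
        raise ValueError("contamination must be in [0, 1)")
    return (observed_auc - 0.5 * contamination) / (1.0 - contamination)


if __name__ == "__main__":
    # 0.91 is the average AUC reported in this paper; 0.10 is the upper
    # bound on advanced neoplasia in asymptomatic subjects cited above.
    print(round(deattenuated_auc(0.91, 0.10), 3))
```

With no contamination the function returns the observed AUC unchanged; any positive contamination fraction pushes the estimate upward, which is the direction of bias described in the text.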
Although fatty acids of this length containing hydroxyl groups have never been reported as far as we are aware, they appear to resemble a class of hydroxylated very long-chain fatty acids known as the resolvins and protectins that originate from the n3 essential fatty acids EPA and DHA, respectively, which are critical in promoting the resolution of acute inflammation. The inability to sufficiently 'resolve' acute inflammation is the leading theory behind the establishment of chronic inflammatory states which underlie multiple conditions including cancer [46] and Alzheimer's disease [47]. Of particular relevance is the effect of pro-resolution long-chain hydroxy fatty acid mediators on intestinal inflammatory conditions such as inflammatory bowel disease (IBD), Crohn's disease, colitis and colon cancer. Both Resolvin E1 (RvE1) and Lipoxin A4 (LXA4) have been implicated in protective effects against colonic inflammation. RvE1 was shown to protect against the development of 2,4,6-trinitrobenzene sulphonic acid-induced colitis in mice, accompanied by a block in leukocyte infiltration, decreased proinflammatory gene expression, induced nitric oxide synthase, with improvements in survival rates and sustained body weight [48]. Similarly, LXA4 analogues have been shown to attenuate chemokine secretion in human colon ex vivo [49], and attenuated 50% of the genes, particularly those regulated by NF-κB, that were induced in response to pathogenically induced gastroenteritis [50]. In vivo, LXA4 analogues reduced intestinal inflammation in DSS-induced inflammatory colitis, resulting in significantly reduced weight loss, haematochezia and mortality [50]. Structurally, resolvins and protectins (as well as the n6 lipoxins) comprise mono-, di- and tri-hydroxylated products of the parent VLCFAs, catalyzed by various lipoxygenases, cyclooxygenases and p450 enzymes [51-55]. The possibility that the hPULCFAs reported here represent elongation products of these molecules cannot be excluded.
Future studies will be required to address the origin, as well as the biological role, if any, that these molecules may play in defending the body against CRC development.
Although we report results from multiple case-control cohorts each having a limited sample size, the average AUC across all the samples reported here was 0.91 ± 0.04, which translates into approximately 75% sensitivity at 90% specificity with little to no disease-stage bias. The real-world screening performance is currently being evaluated through two large ethically approved prospective clinical trials, one in collaboration with the Saskatchewan Cancer Agency and the Saskatchewan Provincial Government (PDI-CT-1; n = 5000), and the other with the University of Calgary (PDI-CT-3; n = 1500). Clinically relevant questions are being addressed, including the correlation between hPULCFAs and CRC in a prospective hospital screening environment, the correlation with other non-malignant gastrointestinal disorders (such as IBD, Crohn's disease and colitis), whether there is any correlation with various stages of neoplasia or polyps and family history, and whether subjects with low hPULCFA levels show higher incidence rates of CRC than subjects with 'normal' levels over time.
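As a rough plausibility check on the relationship between an AUC of 0.91 and the quoted sensitivity at 90% specificity, one can use an equal-variance binormal model. This is a minimal sketch under that assumption, not the procedure used to derive the figures in this paper (which were presumably read from the empirical ROC curves); the function name is hypothetical. In the binormal model, AUC = Φ(d/√2), where d is the separation between case and control distributions in standard-deviation units, and sensitivity at a given specificity is Φ(d − Φ⁻¹(specificity)).

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def sensitivity_at_specificity(auc: float, specificity: float) -> float:
    """Sensitivity implied by an AUC under an equal-variance binormal model.

    Assumption: marker values in cases and controls are normally
    distributed with the same variance. Real ROC curves can deviate.
    """
    # Separation between the two distributions, in SD units.
    d = math.sqrt(2.0) * _STD_NORMAL.inv_cdf(auc)
    # Threshold placed so that `specificity` of controls fall below it.
    threshold = _STD_NORMAL.inv_cdf(specificity)
    return _STD_NORMAL.cdf(d - threshold)


if __name__ == "__main__":
    print(round(sensitivity_at_specificity(0.91, 0.90), 2))
```

Under this simplified model the implied sensitivity comes out a little above 0.7, which is broadly in line with the approximately 75% stated above; the gap is expected because the empirical curves need not follow a binormal shape.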
In summary, we have identified a consistent reduction of novel circulating hPULCFAs in CRC patients which could have considerable implications for CRC diagnosis and screening and possibly prevention and treatment. Adherence to currently recommended screening modalities, namely faecal occult blood testing and colonoscopy, is poor due to a number of factors including public acceptance, risk, cost and available resources. The use of a serum-based test to screen the population for subjects who are high risk would focus endoscopy resources on subjects who need it the most, resulting in a higher detection rate, particularly in early stages of the disease. Given the positive prognosis of early-stage therapeutic intervention, it is tempting to speculate that hPULCFA-based screening could one day result in decreased CRC mortality.

Conclusions
We have shown that comprehensive non-targeted metabolomics technology based upon high-resolution FTICR mass spectrometry represents a powerful and robust approach for small-molecule biomarker-driven discovery. Accurate mass measurements combined with conventional MS/MS resulted in the rapid identification of key structural characteristics of the novel metabolites discovered and the assignment of putative chemical structures. The subsequent translation of these metabolite biomarker discoveries into an efficient and clinically viable high-throughput semiquantitative triple-quadrupole platform represents a significant advancement in the clinical implementation of biomarker discoveries. The reduction of systemic hydroxylated ultra-long chain fatty acids in CRC patients raises intriguing biological and aetiological questions given the large numbers of sporadic CRC cases and the heavy influence of lifestyle and diet on risk. Further research is ongoing regarding the potential role(s) these novel molecules play in CRC progression and whether they have any association with previously established risk factors.
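To illustrate how accurate mass constrains elemental composition, the following minimal sketch enumerates CHO-only formulas within the carbon and oxygen ranges discussed in this paper (28 to 32 carbons, four to six oxygens) that match a query mass within a ppm tolerance. It is not the formula-assignment software used in this study; the function names, the 1 ppm tolerance and the query mass are illustrative assumptions, and the query is a neutral monoisotopic mass rather than the [M-H]- ion observed in negative mode.

```python
# Monoisotopic masses of the most abundant isotopes (unified atomic mass units).
MASS_C = 12.0
MASS_H = 1.00782503223
MASS_O = 15.99491461957


def neutral_mass(c: int, h: int, o: int) -> float:
    """Monoisotopic mass of the neutral molecule CcHhOo."""
    return c * MASS_C + h * MASS_H + o * MASS_O


def candidate_formulas(target_mass, ppm=1.0,
                       carbons=range(28, 33), oxygens=range(4, 7)):
    """List (formula, ppm error) pairs matching target_mass within ppm.

    Nitrogen is deliberately excluded, mirroring the reasoning in the
    text; hydrogen counts are restricted to even values up to 2C + 2,
    as required for neutral molecules containing only C, H and O.
    """
    hits = []
    for c in carbons:
        for o in oxygens:
            for h in range(2, 2 * c + 3, 2):
                error_ppm = (neutral_mass(c, h, o) - target_mass) / target_mass * 1e6
                if abs(error_ppm) <= ppm:
                    hits.append((f"C{c}H{h}O{o}", round(error_ppm, 3)))
    return hits


if __name__ == "__main__":
    # Illustrative query mass chosen for this example, not taken from the tables.
    for formula, err in candidate_formulas(446.3396):
        print(formula, err)
```

Tightening the element ranges and excluding nitrogen, as the text describes, is what reduces the candidate list to a small number of compositions; tandem MS evidence is then still needed to distinguish structural classes sharing a formula.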
Additional file 1: Fourier transform ion cyclotron resonance mass spectrometry feature data. [http://www.biomedcentral.com/content/supplementary/1741-7015-8-13-S1.XLSX]
Surgery and was involved in the experimental design of the Osaka discovery sample set, data analysis/interpretation and unblinding of data. HM was the head of surgery (for CRC population, Chiba) responsible for the patient enrolment and selection for the Chiba samples. FN was the group leader at Chiba overseeing the entire project, including protocol design and approvals, data analysis and unblinding. DBG was the President and CEO of Phenomenome Discoveries Inc, and oversaw most of the efforts at PDI, was integrally involved in the interpretation of MS/MS data, the development of FTICR methodology, the experimental designs and was also a significant contributor to the format and direction of the manuscript.