Urine metabolome profiling of immune-mediated inflammatory diseases

Background Immune-mediated inflammatory diseases (IMIDs) are a group of complex and prevalent diseases where disease diagnostic and activity monitoring is highly challenging. The determination of the metabolite profiles of biological samples is becoming a powerful approach to identify new biomarkers of clinical utility. In order to identify new metabolite biomarkers of diagnosis and disease activity, we have performed the first large-scale profiling of the urine metabolome of the six most prevalent IMIDs: rheumatoid arthritis, psoriatic arthritis, psoriasis, systemic lupus erythematosus, Crohn’s disease, and ulcerative colitis. Methods Using nuclear magnetic resonance, we analyzed the urine metabolome in a discovery cohort of 1210 patients and 100 controls. Within each IMID, two patient subgroups were recruited representing extreme disease activity (very high vs. very low). Metabolite association analysis with disease diagnosis and disease activity was performed using multivariate linear regression in order to control for the effects of clinical, epidemiological, or technical variability. After multiple test correction, the most significant metabolite biomarkers were validated in an independent cohort of 1200 patients and 200 controls. Results In the discovery cohort, we identified 28 significant associations between urine metabolite levels and disease diagnosis and three significant metabolite associations with disease activity (PFDR < 0.05). Using the validation cohort, we validated 26 of the diagnostic associations and all three metabolite associations with disease activity (PFDR < 0.05). Combining all diagnostic biomarkers using multivariate classifiers we obtained a good disease prediction accuracy in all IMIDs and particularly high in inflammatory bowel diseases. Several of the associated metabolites were found to be commonly altered in multiple IMIDs, some of which can be considered as hub biomarkers. The analysis of the metabolic reactions connecting the IMID-associated metabolites showed an over-representation of citric acid cycle, phenylalanine, and glycine-serine metabolism pathways. Conclusions This study shows that urine is a source of biomarkers of clinical utility in IMIDs. We have found that IMIDs show similar metabolic changes, particularly between clinically similar diseases and we have found, for the first time, the presence of hub metabolites. These findings represent an important step in the development of more efficient and less invasive diagnostic and disease monitoring methods in IMIDs. Electronic supplementary material The online version of this article (doi:10.1186/s12916-016-0681-8) contains supplementary material, which is available to authorized users.

The IMIDC is a network of clinical and biomedical researchers investigating the molecular basis of immune-mediated inflammatory diseases. Informed consent was obtained from all participants, and protocols were reviewed and approved by local institutional review boards.
This study was conducted in accordance with the Declaration of Helsinki principles.
The inclusion and exclusion criteria for each IMID disease were as follows: • RA: diagnosed according to the revised American College of Rheumatology (ACR) 1987 diagnostic criteria for RA 4 , and with >2 years of disease evolution. All patients had joint erosions in either hands or feet. Concomitant cutaneous psoriasis was an exclusion criterion for RA patients.
• PsA: diagnosed according to the CASPAR diagnostic criteria for PsA 5 with >1 year of disease evolution. Exclusion criteria for PA included the presence of any other form of inflammatory arthritis, or rheumatoid factor levels greater than twice the normality threshold.
• CD: diagnosed according to the standard Lennard-Jones diagnostic criteria for CD 6 .
Concomitance of any other IMID was an exclusion criterion for CD patients.
• UC: diagnosed according to the Lennard-Jones diagnostic criteria for UC 6 . Concomitance of any other IMID was an exclusion criterion for UC patients.
• SLE: diagnosed according to the ACR diagnostic criteria for SLE 7 and with ≥3 years of disease evolution. Concomitance of any other rheumatic disease, cutaneous psoriasis or inflammatory bowel disease was an exclusion criterion for SLE patients.
• Ps: All eligible PS patients had to have chronic plaque type of PS affecting torso and/or extremities with at least 1 year of duration at the time of recruitment.
All IMID patients were >18 years old at the time of sample collection and were born in Spain.
Also, all patients were Caucasian and with all grandparents and parents born in Spain.
Control individuals were recruited in collaboration with the Spanish National DNA Bank from blood bank donors in 13 Spanish hospitals. All the controls were screened for the presence of any autoimmune disorder, as well as for first-degree family occurrence of autoimmune diseases.
If positive, the individuals were excluded. All control individuals had also to be Caucasian with all four grandparents born in Spain. Most of the control individuals (i.e. 98%) were ≥40 years old at the time of recruitment.
Study design. The urine metabolomics study in IMIDs was designed following a two-stage approach [8][9][10][11] . In the first stage (i.e. discovery stage), we analyzed the urine metabolome of In both the discovery and validation stages, patients within each IMID disease were selected from the IMIDC sample biobank (IMID-Biobank) in order to represent the two extremes of disease activity (i.e. high and low disease activity; Supplementary Figure 1). For each IMID, established scores were used to measure the disease activity at the time of sample collection ( Table 1). Within the SLE cohort we used the maximum of the BILAG and SELENA-SLEDAI indices to select the patients according to disease activity. Importantly, patients and controls selection was performed in order to minimize the differences in potential confounding epidemiological variables like gender, age or body mass index, as well as technical variables   13 C, 31 P) gradient cryoprobe. One-dimensional 1 H, 13 C-HSQC 14 databases were used for 2D structural confirmation of those 1D assignments.
Spectral processing, metabolite identification, normalization and scaling. In the discovery stage, the raw spectral 1 H-NMR profiles were processed using FOCUS software 12 . FOCUS is a complete workflow for NMR data processing that efficiently integrates the different analytical steps, including peak identification, alignment and quantification. One of the main advantages of this method is that it is able to efficiently process large numbers of samples like in the present study. Also, FOCUS includes the RUNAS algorithm, which performs robust peak alignment without the need of a reference spectrum. The RUNAS algorithm has shown to have much better performance than other algorithms, particularly, in metabolomic studies with significantly unaligned spectra.
In this study, spectral processing was performed using spectral windows with a 50% of overlap and a length of 0.077 ppms corresponding to 256 spectral data points. The minimum peak width was set to 0.01 ppms, and the peak frequency threshold was set to 5%. After running FOCUS, we applied several quality control filters at the peak level in order to guarantee the quality of the final set of peaks. First, NMR peaks from windows showing a high degree of sample unalignment were discarded. Significant unalignment was defined when the standard deviation of the shift corrections applied to the spectra was higher than one third of the window length.
Second, spectral peaks showing strong evidences of overlapping with other peaks were also discarded.
After applying the quality control filters, we performed the metabolite identification in order to assign to each spectral peak or set of spectral peaks their corresponding metabolite (Table S1).
In those cases when multiple spectral peaks represent one same metabolite, we selected the peak showing the highest intensity levels as well as the lowest degree of overlapping (if any).
Importantly, exogenous metabolites generated from the metabolism of drugs (e.g. ibuprofen, acetaminophen and 5-aminosalicylic acid) or metabolites related with sample processing (e.g. methanol and ethanol) were identified and subsequently removed from the analysis. Finally, those metabolites that had a high missingness rate (>15% missingness, n=2), were also discarded from the analysis.
In the validation stage, the spectral 1 H-NMR profiles from the case and control urine samples were also processed using FOCUS software. In this case, and in order to improve the accuracy of the data analysis, the NMR signal processing was restricted to the metabolites identified in the discovery stage. Consequently, the parameters used in FOCUS NMR signal processing were set to accurately analyze the spectral regions that were informative in the discovery stage.
Intensity normalization is an important step in 1 H-NMR spectral processing and it can be critical when analyzing urine samples 9,[15][16][17] . The objective of this analytical step is to correct the variability introduced by the dilution of metabolites naturally present in urine samples.
Normalization to creatinine area was discarded, since the urine concentrations of this metabolite have been associated to gender, muscle mass or dietary habits 9,18 . We therefore chose normalization to the total spectral area (i.e. area under the curve; AUC), a robust method that has been extensively used in multiple urine metabolomics studies [19][20][21][22] . In order to avoid the bias introduced by metabolites present in high concentrations in urine, we excluded the following regions from the total spectral calculation: creatinine (4.00 to 4.10 and 3.00 to 3.10 ppm), water

Epidemiological and sample collection variables.
There is substantial evidence that metabolite levels in human body fluids are influenced by many different factors that can act as confounders in disease association analyses 8,10,11,25 . In order to control for these factors, we Three types of clinical association analyses of urine metabolites were performed: • Diagnostic biomarkers: the urine metabolite levels of each IMID disease were compared against the levels in the control cohort. In this analysis approach we used all the patients from each IMID and we also performed an analysis using only the patients showing high disease activity levels.
• Differential diagnostic biomarkers: the metabolite levels were compared between IMID diseases that have the most similar clinical features (i.e. CD vs. UC, RA vs. PsA, Ps vs. PsA, and RA vs. SLE).
• Activity biomarkers: within each IMID disease, we compared the urine metabolite levels between high and low disease activity groups of patients.
Correction for multiple testing was performed using the false discovery rate (FDR) method 27 .
After multiple test correction, a significant metabolite association (i.e. FDR-adjusted P-Value <0.05) was subsequently selected for replication in the independent validation dataset. In the validation analysis, multiple test correction was also performed using the FDR approach.
Combined P-Values of the discovery and validation datasets were computed using Fisher's method 28 . IMID diseases and metabolites were clustered according to the replicated metabolite-disease associations using hierarchical clustering (i.e. function "heatmap.2" of the R package "Gplots" 29 ). The input data for the clustering was a matrix (metabolites x disease), containing the association -log 10 (P-Values) between the corresponding row metabolite and the corresponding column disease.
The receiver operating characteristic (ROC) curve analysis for metabolomic diagnostic biomarkers was performed to evaluate the prediction power of the urine metabolome to distinguish between healthy subjects and IMID patients as described previously 23,30 . The area under de curve (AUC) metric was used to assess the performance of the classifiers. The ROC curves were computed using the "pROC" R package 31 . The optimal threshold was determined by maximizing the sum of the sensitivity and specificity parameters across de ROC curve. The prediction models were built using only those metabolites associated in the discovery cohort for each IMID disease. The concentrations of each cohort (i.e. discovery and validation) were scaled using Pareto scaling based on the mean and variance of the corresponding control cohorts. The classification models were based on the PLS-DA algorithm included in the "mixOmics" R package 32 . In this analysis, two latent variables and the entire set of metabolites associated at the single level in the discovery cohort were used. We used the validation dataset to assess the performance of the PLS-DA model built with the entire discovery dataset. This approach corresponds to a true validation on a completely independent dataset.