Skip to main content
  • Research article
  • Open access
  • Published:

Integrated epigenome, whole genome sequence and metabolome analyses identify novel multi-omics pathways in type 2 diabetes: a Middle Eastern study



T2D is of high prevalence in the middle east and thus studying its mechanisms is of a significant importance. Using 1026 Qatar BioBank samples, epigenetics, whole genome sequencing and metabolomics were combined to further elucidate the biological mechanisms of T2D in a population with a high prevalence of T2D.


An epigenome-wide association study (EWAS) with T2D was performed using the Infinium 850K EPIC array, followed by whole genome-wide sequencing SNP-CpG association analysis (> 5.5 million SNPs) and a methylome-metabolome (CpG-metabolite) analysis of the identified T2D sites.


A total of 66 T2D-CpG associations were identified, including 63 novel sites in pathways of fructose and mannose metabolism, insulin signaling, galactose, starch and sucrose metabolism, and carbohydrate absorption and digestion. Whole genome SNP associations with the 66 CpGs resulted in 688 significant CpG-SNP associations comprising 22 unique CpGs (33% of the 66 CPGs) and included 181 novel pairs or pairs in novel loci. Fourteen of the loci overlapped published GWAS loci for diabetes related traits and were used to identify causal associations of HK1 and PFKFB2 with HbA1c. Methylome-metabolome analysis identified 66 significant CpG-metabolite pairs among which 61 pairs were novel. Using the identified methylome-metabolome associations, methylation QTLs, and metabolic networks, a multi-omics network was constructed which suggested a number of metabolic mechanisms underlying T2D methylated genes. 1-palmitoyl-2-oleoyl-GPE (16:0/18:1) – a triglyceride-associated metabolite, shared a common network with 13 methylated CpGs, including TXNIP, PFKFB2, OCIAD1, and BLCAP. Mannonate – a food component/plant shared a common network with 6 methylated genes, including TXNIP, BLCAP, THBS4 and PEF1, pointing to a common possible cause of methylation in those genes. A subnetwork with alanine, glutamine, urea cycle (citrulline, arginine), and 1-carboxyethylvaline linked to PFKFB2 and TXNIP revealed associations with kidney function, hypertension and triglyceride metabolism. The pathway containing STYXL1-POR was associated with a sphingosine-ceramides subnetwork associated with HDL-C and LDL-C and point to steroid perturbations in T2D.


This study revealed several novel methylated genes in T2D, with their genomic variants and associated metabolic pathways with several implications for future clinical use of multi-omics associations in disease and for studying therapeutic targets.

Peer Review reports


The worldwide prevalence of Type 2 diabetes (T2D) in 2021 was estimated to be 9.8% [1]. However, the prevalence in countries from the Middle East and North Africa was 18.1%. The prevalence of T2D in Qataris has been estimated to be 14–17% [2], similar to other Middle Eastern populations. Such high prevalence of T2D in Qatar compared to western populations increases the risk of diabetes complications, including cardiovascular disease, retinopathy, nephropathy and early mortality. More than 400 independent genetic loci have been found to be associated with T2D across different ethnicities and yet explain only a modest portion of the prevalence [3,4,5]. T2D is caused by complex interactions of multiple factors including genetic, epigenetic and environmental influences. Current evidence indicates that lifestyle and environmental factors can influence gene expression and clinical phenotype through epigenetic mechanisms such as changes in DNA methylation. Identifying the epigenetic factors driving T2D are thus important to our understanding of T2D etiology.

DNA methylation in blood or pancreatic beta cells has been associated with T2D in multiple populations [6,7,8,9,10,11,12,13]. Most used a previous version of the Infinium EPIC array utilizing 450K CpGs (rather than the newer 850K array) and involved western populations. A previous study done on Qataris (n = 123) demonstrated heterogeneity in methylation between Qatari and UK populations, indicating the importance of studying multiple world populations with varying prevalence of T2D [14].

Genetic variants have also been associated with methylation sites, referred to as methylation quantitative trait loci (meQTLs), and have further elucidated factors affecting methylation. For example, 4.7 million cis- and 630,000 trans-meQTL variants were identified in one study using the 450K array [15]. Using the 850K EPIC arrays, another study identified a large number of significant SNP-CpG pairs related to various features of diabetic kidney disease [16]. Those, and other previous studies, used imputed genotypes from available SNP arrays.

Circulating metabolites have also been associated with T2D and its complications [17, 18]. Since metabolite levels are influenced by both genetics [19] and epigenetics, and methylation affects gene expression which in turn affects metabolic pathways, T2D methylation would be expected to be associated with metabolites and metabolic pathways related to T2D.

This study focuses on the epigenetic associations with T2D in a large number of subjects from the Qatari population to highlight the similarities and differences in T2D methylation compared to other populations. The integration of genomics and metabolomics with epigenetics was used to better understand the biological mechanisms underlying methylation changes associated with diabetes. First, the larger Infinium 850K EPIC array was used to identify novel methylation associations with T2D using Qatari samples. Second, whole genome-wide sequencing (> 5.5M SNPs) was used to identify methylation quantitative trait loci (meQTLs) for the T2D-associated CpGs to highlight genetically driven CpGs. Third, an untargeted panel of metabolites (Metabolon) was used to identify the associations of the T2D CpGs with the metabolome. Finally, combining these results with metabolic correlations, multi-omics networks were constructed to find functional metabolic pathways connecting the methylated genes and to understand their biological relation to T2D in this high-risk Middle Eastern population.



The Qatar Biobank (QBB) is Qatar's National Repository Centre for biological samples and health information [2]. T2D patients were selected based on the following criteria: a) The patient replied “Yes” to the question “Have you ever been diagnosed by a doctor as diabetic;” or b) HbA1c was ≥ 6.5); and c) The self-reported age of diabetes onset was above 30 years or missing. Controls were selected to exclude any diabetics (T2D or T1D) according to the above criteria.

Samples were delivered to Weill Cornell Medicine’s genomics core for methylation in three batches as kits became available. Subjects in cohorts 1 (454 samples after QC, 39% with T2D) and 2 (381 samples after QC, 48% with T2D) were selected randomly within disease category with a target to select approximately 40% of the total sample (actual final percent was 43%) with T2D to increase statistical power. A third cohort, Cohort 3 (191 samples after QC, 33% with T2D), was selected after the T2D EWAS analyses of cohorts 1 and 2 were completed (batch 3) and was combined with these cohorts to increase the power of the methylation-SNP (meQTL) association analyses.

DNA methylation profiling, quality control and statistical methods

A total of 1056 blood samples were collected and profiled using an Infinium Methylation EPIC 850K beadchip (Illumina, Inc). The R package and library “minfi” were used for quality control. Samples that failed the “qc” function and sex assignment were removed. Samples or sites with a median detection p value < 0.05 were kept. ~ 30,000 CpG sites were removed after dropping CpGs in a SNP locus (“droploci” function). Normalization was done using “Funnorm” [20] and beta values were used for the analysis. A total of 813,660 CpGs in autosomal chromosomes, and 1026 samples remained after QC (details in Additional file 1).

For the T2D EWAS, CpGs associated with T2D in cohort 1 were tested for replication in cohort 2 and those discovered in cohort 2 were tested for replication in cohort 1. Each time, the discovery p-value of 5.8 × 10–8 and the replication p-value of 0.05/(#EWAS significant CpGs identified in the discovery cohort) were used. This experiment was repeated two times, with and without adjusting for BMI, and all CpGs with significant replication in both experiments were combined.

Linear regression models were used for T2D EWAS associations, with CpG as the dependent variable and T2D the independent variable. Covariates included were sex, two principal components of the cell counts measured in the lab (neutrophils, basophils, eosinophils, monocytes, and lymphocytes), plate number, batch effects, sample well position, smoking surrogate (AHRR CpG site cg05575921) and three whole genome principal components to correct for population stratification [21] and relatedness. Batch effects were determined according to groups of samples that were profiled together, and included differences in recording the gender code in some batches, which was included as a batch-gender interaction effect. Because the random selection of subjects for cohorts 1 and 2 resulted in different distributions of BMI among cases and controls, we used both BMI-adjusted and BMI-unadjusted association models. As age was found to be collinear with T2D in our cohorts, an age residual model was used for age adjustment in the T2D EWAS, where the CpGs were first regressed with age in controls and then CpGs in all samples were adjusted using the β0 (intercept) and β1 coefficients resulting from regression. Moreover, the CpGs that were identified as associated with T2D in the EWAS were further tested for association with age at a later stage where a larger set of controls was available. Only two CpGs were found to be associated with age as indicated in Additional file 2: Table S1.

Finally, pathway analysis was done using the Enrichr online tool [22, 23], where all genes of the 66 identified CpGs were submitted and KEGG pathways were selected. All KEGG results were used to assign the genes to pathways in order to discuss their biological relevance to diabetes.

Whole genome sequence, quality control and statistical methods (methylation quantitative trait loci)

A total of 5 million variants remained for analysis after quality control (Additional file 1) for 1026 samples. The subjects were divided into discovery (n = 703) and replication cohorts (n = 323) according to the genomic profiling batches. For the meQTL analysis (SNP-CpG), mixed models were used and included kinship, age, sex, T2D status, BMI, two principal components of the actual cell counts (neutrophils, basophils, eosinophils, monocytes, and lymphocytes), plate number, batch effects, sample well position, smoking surrogate (AHRR CpG site cg05575921) and three whole genome principal components as covariates. This analysis was done using the “Genabel” package [24] in the R statistical package, specifically the “polygenic” and “mmscore” functions [25].

Metabolomics, quality control and statistical methods (methylation-metabolite association analysis)

A total of 1160 serum metabolites were measured using an untargeted metabolomics platform by Metabolon Inc. in 2985 samples. Data was quality controlled by log transforming the values, removing outliers (above or below 3 standard deviations from the mean over all samples for each metabolite) and z-scoring the metabolite values. A total of 936 metabolites remained for association with T2D after quality control. For the T2D MWAS analyses, the samples were randomly divided into a discovery (n = 1791, 70%) and a replication cohort (n = 1194, 30%). Additional file 2: Table S12 shows the distribution of gender, age and BMI in the discovery and replication cohorts.

For CpG-metabolite association analyses, 708 samples from combined cohorts 1, 2 and 3, that had both metabolomics and methylation were divided by batch into two sets, and METAL software [26] was used for the meta-analysis. Associations with heterogeneity at p < 0.05 were excluded and corresponded to Isq > 74.

First, a T2D metabolome wide association study (MWAS) was performed to identify associations between metabolites and T2D, using linear regression models, with the metabolite as the dependent variable and T2D as the independent variable, including age, sex, BMI, and three whole genome principal components as covariates. Second, using the metabolites identified as significant from the MWAS and the CpGs significant from the T2D EWAS, CpG-metabolite associations were computed using the same model and covariates (excluding BMI) from the T2D EWAS analyses, replacing T2D status with T2D metabolites.

Mendelian randomization

A two sample mendelian randomization (2SMR) analysis was used where CpG-HbA1c associations were computed in our cohort for CpGs that had meQTL SNP-HbA1c association statistics reported in the GWAS catalogue or Stanford Bio Bank. Linear regression models for CpG-HbA1c associations included all covariates considered in the EWAS analysis except for T2D. CpGs in HK1 and PFKFB2 were found significantly associated with HbA1c using cohort 1 and cohort 2 as discovery and replication cohorts, respectively. The causal associations were then tested for those genes with HbA1c. 2SMR statistics were computed using “mr_maxlik” (maximum likelihood) and “ivw” (inverse variance weighted) functions in “MendelianRandomization” package in R.

Network analysis

Metabolites were corrected for all covariates mentioned previously and residuals were used for the partial correlation analysis, using “GeneNet” package in R, and utilizing a Bonferroni p-value threshold. Pairs of all identified CpG associations and significantly correlated metabolites were combined and visualized using Cytoscape software.


Figure 1 summarizes the overall study design. Three cohorts of samples were obtained from the Qatar Biobank over three years with 43.5% being T2D patients (Table 1 shows the cohort characteristics). Cohorts 1 and 2 were used for the T2D Epigenome Wide Association Study (EWAS) (n = 835) and cohort 3 was added to improve the statistical power of SNP-CpG association analysis (n = 1026). Samples from the 3 cohorts that had both metabolomics and methylation were used for the CpG-metabolite association analyses (n = 708).

Fig. 1
figure 1

Detailed flow of the study of T2D using data from epigenetic, whole genome and metabolomics data

Table 1 Sample characteristics for 3 cohorts selected from the Qatar Biobank

66 CpG sites in 48 genes were associated with T2D

A T2D EWAS analysis was performed using cohorts 1 and 2, in a two-way discovery and replication analysis. A total of 47 CpGs were significant at an EWAS significance threshold of p-value p < 5.8 × 10–8 in cohort 1 and replicated in cohort 2. An additional 10 CpGs were significant when using cohort 2 as the discovery cohort and cohort 1 as the replication cohort. When including BMI as a covariate, 33 CpG sites were identified in cohort 1 and replicated in cohort 2 and 14 CpG sites were identified as significant in cohort 2 and replicated in cohort 1. In total, 74 CpGs from models with or without adjustment for BMI were identified and replicated, out of which 66 CpGs in 48 genes had the same direction of association in both cohorts (Table 2, Additional file 2: Table S1, Fig. 2). Twenty-seven of the 66 CpGs were significant both with and without BMI adjustment. Of the 66 CpGs, only two CpGs in TXNIP were lower in T2D patients compared to nondiabetics; all other CpGs were higher in T2D compared to nondiabetics.

Table 2 66 CpG sites associated with T2D from two regression models (with and without BMI adjustment)
Fig. 2
figure 2

Manhattan plot of 66 CpGs associated with T2D. The red line indicates the Bonferroni p-value threshold for significance

Three of the 66 CpGs (TXNIP, POR, and DQX1) were previously reported at an EWAS significance, and 63 CpGs are thus novel. Thirty one of the 66 identified CpGs have been previously reported to be methylated in T2D or related traits at a nominal p-value (p < 0.05), including fasting glucose, HOMA IR, HbA1c [6, 16], or BMI (Additional file 2: Table S2). An additional 15 T2D CpGs were reported to be associated with kidney function, albuminuria and kidney function decline in [16] (Additional file 2: Table S2). We also replicated T2D methylation associations [6,7,8, 27, 28] at a replication p-value in the larger cohort using the “no-BMI” model (Additional file 2: Table S3).

Five of the 48 methylated genes, namely TCF3, OCIAD1, SPRED2, DOCK10 and LMTK2 were found to be associated with T2D in the GWAS catalogue [29], (Additional file 2: Table S2). The Biobank-based Integrative Omics Studies website (BIOS QTL) [30], showed that 7 of the 66 CpGs, namely LRFN1, RP11-629O1.2, PABPC4, CD81, COL7A1, NLGN2 and HCG18 have expression-methylation associations at FDR < 0.05 (Additional file 2: Table S4).

KEGG pathways containing the genes associated with the significant CpGs included fructose and mannose metabolism (PFKFB2, HK1) and insulin signaling (PPP1R3E, HK1, PRKAR2A). HK1 is linked to galactose, starch and sucrose metabolism, carbohydrate absorption and digestion, glycolysis and gluconeogenesis, and amino sugar and nucleotide sugar metabolism (Additional file 2: Table S5).

688 whole genome meQTLs in 27 loci were identified for T2D CpGs

The 66 significant CpG sites were tested for association with whole genome sequence SNPs (> 5.5 million SNPs) in 703 samples and tested for replication in 308 samples (see Methods). A total of 688 associations were significant in the discovery cohort at p < 8.7 × 10–10 (Fig. 3, Additional file 2: Table S6) and constituted 22 unique CpG sites that were associated with 27 unique genes, in 27 unique genic and intergenic loci. Out of 688 SNP-CpG pairs (27 loci), 222 pairs (7 loci) were either replicated at the exact SNP position or in the same locus (within 500 kb of the SNP position) in the Qatari replication cohort, and 369 pairs confirmed previous pairs or loci in Sheng et al. [16] or in BIOS QTL [30]. Collectively, 591 pairs (86% of pairs) in 19 loci (75% of loci) were either replicated in the Qatari replication cohort or confirmed in previous studies with an exact pair match or in the same locus. A total of 130 pairs were considered novel CpG-SNP associations located in previously published loci and replicated in Qataris (Additional file 2: Table S6), and 51 pairs were in novel loci, replicated in Qataris (Table 3, Additional file 2: Table S6). These loci were in TXNIP-SLC2A1, OCIAD1, and SERPINF1 genes. A total of 542 (i.e., 92% of 591) pairs were cis pairs (defined as less than 1 Mb distance between the SNP and the CpG; however, all identified cis pairs were within < 134 kb distance, with an average of 32 kb) and 49 pairs were trans associations (24 pairs were on the same chromosome at > 100 Mb distance, and 25 pairs were on another chromosome) in TXNIP-SLC2A1 and OCIAD1. SNPs in the 591 pairs are annotated as follows: 416 in introns, 84 intergenic, 48 in a downstream/upstream gene, 14 in 3’UTR or 5’UTR, 14 non-coding exons, 6 synonymous, 3 intron-near-splice, and 6 missense and stop-gained SNPs.

Fig. 3
figure 3

Manhattan plot of 688 CpG-SNP significant pairs. Red line is the Bonferroni p-value threshold for significance. (p = 8.6 × 10–10). “x” indicates an undefined gene or an intergenic region. “*” indicates a trans meQTL

Table 3 Fifty-one novel SNP-CpG pairs in novel loci

Three meQTLs harbored nonsynonymous and 5’UTR variants, specifically in SERPINF1, DOCK10 and SLC2A1-TXNIP (Fig. 4). Four missense SNPs in the SERPINF1-SMYD4 locus were associated with cg11692409 (chromosome 17:1665181), among which rs1136287 (17:1673276) and rs1804145 (17:1674434) associated with this CpG at p = 4.0 × 10–22 and p = 5.4 × 10–17 respectively. The former (cg11692409-rs1136287) was also reported in [15] at p = 3.9 × 10–165 and in [16] at p = 3.8 × 10–12. Moreover, rs1136287 had been associated with pigment epithelium-derived factor (PEDF, p = 3.9 × 10−35), where PEDF was associated with diabetic nephropathy (p < 0.001) and sight threatening diabetic retinopathy (p < 0.001) [31]. Additionally, [32] showed an inverse correlation between cg11692409 and the mRNA expression of its gene (PEDF) (correlation coefficient =  − 0.38, p < 0.001). Another gene, DOCK10, harbored a stop-gained (rs12328236), a missense (rs4674940) and an intron-near-splice (rs77952666) variant associated with cg12331557, which were considered novel associations compared to previously known meQTLs [16]. It is worth noting that rs80176144 (~ 20 kb away from this locus) was associated with coronary artery calcified atherosclerotic plaque in T2D at p = 7 × 10–6 [33] and is ~ 50 kb away from the identified meQTL SNP. The third gene, SLC2A1, harbored a 5’UTR SNP (rs11537640) that associated with cg19693031 in TXNIP, two genes that are biologically linked [34, 35]. Located 6738 bp upstream of rs11537640, rs12407920 had been associated with diabetic nephropathy (DN) [36], whereas cg19693031, besides its association with T2D, has been associated with kidney function [37].

Fig. 4
figure 4

Regional plots and summary of biological functions of meQTLs in SLC2A1, DOCK10 and SERPINF1

meQTLs with GWAS associations and causal relationships

To investigate the biological functions of meQTL genes and identify whether meQTL SNPs are in loci that harbor GWAS associations with T2D relevant traits, we searched the GWAS catalogue [29] and the Stanford Biobank [38] for phenotypic associations with the meQTL genes. From the GWAS catalogue, 64 unique SNPs were identified at < 1kb away from meQTL SNPs, and were associated with several traits, some of which were relevant to T2D. Among those, 29 SNPs were exact matches of meQTL SNPs associated with 10 T2D CpGs (Fig. 5, Table 4, Additional file 2: Table S7). Among the most significant GWAS associations were meQTL SNPs in HK1 with HbA1c. Using the Stanford portal, we identified 20 unique SNPs located at < 1kb away from meQTL SNPs, that were associated with several traits (Table 4, Additional file 2: Table S8). Among those, 7 were exact matches of meQTL SNPs associated with 3 T2D CpGs. Both HK1 and PFKFB2 GWAS associations with HbA1c were significant in this database.

Fig. 5
figure 5

Bonferroni significant GWAS associations for traits associated with SNPs within 1kb of the meQTL SNPs. Traits are clustered according to the phenotype in > 9 clusters, indicated on the horizontal axis, that could be relevant to T2D and the blue bars are irrelevant to T2D. The marked clusters (*) have at least one exact matching meQTL and GWAS SNPs. 47 associations are exact matches, with further details shown in Additional file 2: Table S7

Table 4 Significant GWAS associations with HbA1c and exact meQTL SNP matches

For meQTL SNPs in HK1 and PFKFB2 that had associations with HbA1c, causal relationships of methylated genes with HbA1c as a biomarker of T2D were studied. The associations of methylation sites in those genes with HbA1c were first confirmed to be significant in our cohort. cg08992189 in HK1 was associated with HbA1c at a discovery p-value of p = 2.39 × 10–5 and replicated at a p-value of p = 1.6 × 10–4, whereas cg22699725 in PFKFB2 was associated with HbA1c at a discovery p-value of p = 2.68 × 10–6 and replicated at a p-value of p = 2.53 × 10–5. Afterwards, a two-Sample Mendelian Randomization (2SMR) analysis was conducted using CpG-SNP statistics from our cohort (meQTL results) and SNP-HbA1c statistics from the GWAS catalogue and Stanford Biobank. Table 5 shows the 2SMR results for PFKFB2 and HK1 with HbA1c (See Methods).

Table 5 Mendelian randomization results for associations of HK1 and PFKB2 with HbA1c

66 CpG-metabolite associations were identified from T2D CpGs and T2D metabolites

We ran a T2D metabolome wide association study (MWAS) using 2985 individuals with 936 plasma metabolites, and identified 112 metabolites that were significantly associated with T2D at a Bonferroni p-value threshold of (p < 0.05/936) in the discovery cohort (n = 1791), of which 75 metabolites were replicated in the replication cohort (n = 1194) (p < 0.05/112) (Additional file 2: Table S9). All 66 T2D CpGs identified in this study were tested for associations with the 75 T2D metabolites in 708 individuals. Those were divided into two cohorts of 364 and 344 individuals which were used for meta-analysis of the CpG-metabolite associations obtained from each. We identified 77 significant CpG-metabolite associations (p < 0.05/66*75), out of which 11 associations were removed due to a high heterogeneity, resulting in 66 significant associations in 24 unique CpGs (19 genes) and 25 unique metabolites (Table 6, Fig. 6, Additional file 2: Table S10). Of these, 61 were considered novel associations.

Table 6 66 significant CpG-metabolite associations sorted by gene and significance
Fig. 6
figure 6

Volcano plot showing the significant CpG-metabolite associations, indicating the gene and metabolite of the top 5 significant associations (top). Metabolic pathways associated with T2D CpGs, with significant associations are above the threshold red line (bottom). The top most significant associations are labelled 1–10 and they are all TXNIP associations with (1) 1,5 AG, (2) X – 24295, (3) pyruvate, (4) 2-methyl-2-oxobutyrate, (5) X – 24334, (6) mannonate, (7) glucose, (8) 1-palmitoyl-2-arachidonoyl-GPE (16:0/20:4), (9) 2-hydroxybutyrate/2-hydroxyisobutyrate, (10) alanine

TXNIP (mainly cg1969303), was associated with 25 metabolites including 1,5 anhydroglucitol and pyruvate from the glycolysis pathway, and metabolites from fructose/mannose metabolism, alanine and aspartate metabolism, aminosugars, pentose metabolism, phenylalanine metabolism, branched chain amino acids, glutamate metabolism, glutathione metabolism, phosphatidylethanolamines and metabolites classified as food components. Among the remaining 41 associations, were DQX1 (DEAQ Box Polypeptide 1) with phosphatidylethanolamines and ribitol, BLCAP (Bladder Cancer Associated Protein) with ribitol, mannonate, glucose, and phosphatidylethanolamines, DAZAP1 (Deleted In Azoospermia-Associated Protein 1) with mannonate, OCIAD1 (Ovarian Carcinoma Immunoreactive Antigen) with phosphatidylethanolamines, PFKFB2 (6-Phosphofructo-2-Kinase/Fructose-2,6-Biphosphatase 2) with 1-carboxyethylphenylalanine, alanine, erythronate and phosphatidylethanolamines, and POR (P450 Cytochrome Oxidoreductase) with 1,5 anhydroglucitol and a hexosylceramide among others.

Multi-omics networks in T2D

A multi-omics network was constructed from all associations obtained and metabolic networks to identify pathways combining methylation and metabolism. Integrating the 66 CpG-metabolite pairs with 9 meQTL SNPs of the CpGs in those pairs and the T2D metabolites that had significant partial correlations with the metabolites involved in those 66 CpG-metabolite pairs, we obtained a network with 105 nodes (SNP/CpG/metabolite) and 165 edges (associations or correlations). The largest subnetwork had 75 nodes and 120 edges and harbored all CpGs that had metabolic associations (Fig. 7, Additional file 3: Figure S1).

Fig. 7
figure 7

Multi-omics network of T2D methylated genes associations with metabolites, and their meQTLs integrated with metabolic networks. Oval nodes are CpGs, diamonds are SNPs, and hexagons are metabolites

The methylated genes interconnect together through three major pathways: amino acids (11 metabolites), carbohydrates (8 metabolites) and lipids (13 metabolites). Amino acids formed a subnetwork that linked TXNIP, PFKFB2, THBS4, and POR with alanine and aspartate metabolism, urea cycle metabolites, branched chain amino acid metabolites, glutathione metabolism, phenylalanine metabolism, and glutamate metabolism. The carbohydrates formed a subnetwork that linked TXNIP, PFKFB2, PPP1R3E, BLCAP, DQX1, RPRD1B, and POR to glycolysis metabolism (1,5 anhydroglucitol, glucose, pyruvate), fructose, mannose, pentose metabolism (ribitol, ribulonate), and the aminosugar erythronate. A group of 8 lipids formed a subnetwork of ceramides and sphingosines linked to POR and BAIAP2-AS1. Other major connectivity is through hub metabolites (with many edges to genes) as 1-palmitoyl-2-oleoyl-GPE (16:0/18:1) connection to 13 CpGs in 11 genes and mannonate’s (xenobiotic) connection to six genes.

To enrich the metabolic pathway information, the associations of metabolites with BMI, lipoprotein cholesterols and triglycerides were identified. Of the 43 T2D metabolites in this network, 16 showed higher significance in association with BMI, LDL-C, HDL-C or triglycerides compared to their association with T2D (Additional file 2: Table S11), among which 1-palmitoyl-2-oleoyl-GPE (16:0/18:1) was associated with triglycerides and linked to 13 CpGs.


This study used a large number of subjects from the Qatari population that has a high prevalence of diabetes and used the largest methylation EPIC array. Whole genome sequencing and metabolomics data were incorporated to enrich the identification of biological pathways associated with methylation and diabetes.

A total of 66 CpG sites were significantly associated with T2D, of which 63 CpGs were novel in the insulin signaling pathway, fructose and mannose pathway and other T2D relevant pathways. Twenty-two of the identified CpGs had meQTLs in 688 CpG-SNP associations, among which 130 were novel associations and 51 pairs were in novel loci. Novel nonsynonymous and 5’UTR variants were identified in T2D methylated sites of SERPINF1, DOCK10, and TXNIP. Several meQTLs SNPs had GWAS associations with T2D related traits, and causal relationships were statistically inferred between novel CpG sites in HK1 and PFKFB2 and HbA1c (p < 0.0001). Finally, 61 CpG-metabolite pairs out of all 66 identified pairs were novel, among which TXNIP, DAZAP1, BLCAP, and OCIAD1 were found associated with various carbohydrates, lipids, amino acids and xenobiotics. The constructed multi-omics network revealed several methylation-metabolism pathways that related to T2D risks or complications.

The most significantly-associated T2D methylated gene, thioredoxin-interacting protein (TXNIP), plays a role in insulin sensitivity, is a tumor suppressor that is upregulated in diabetes, is induced when glucose levels are elevated, and its deficiency improves glucose tolerance and increases insulin sensitivity in high fat diet-induced obesity [34, 39]. TXNIP suppresses glucose uptake by directly binding and suppressing the glucose transporter, GLUT1 (SLC2A1), by facilitating its clathrin-mediated endocytosis [34, 40]. When elevated, TXNIP induces β-cell apoptosis, while its deficiency protects against type I and type II diabetes by promoting β-cell survival [41, 42]. It is also known to be involved in kidney injury and with IL-1 \(\upbeta\) in the pathogenesis of T2D [43, 44]. Thus, findings from the meQTL analysis, linking SLC2A1 to TXNIP and its several metabolic associations support and add to those known functions of the gene. Besides TXNIP, pathway analysis and the known biology of the novel methylated genes support their relevance to T2D. For example, PPP1R3E, PRKAR2A and HK1 are involved in insulin signaling and resistance (Additional file 2: Table S5). Together with the several T2D-related pathways reported for HK1, further evidence of its relevance to T2D has been provided by GWAS associations of HK1 genetic variants with HbA1c [45], with the strongest signals reported in [46]. HK1 is in the family of hexokinases, which are known to phosphorylate glucose to produce glucose-6-phosphate, the first step in most glucose metabolism pathways [47]. Its association with the insulin signaling pathway and hyperinsulinemia [48], along with being part of the carbohydrate absorption, glycolysis, fructose and mannose metabolism pathways further emphasizes the statistically inferred causality relation. Other methylated genes have been associated with T2D incidence [49] and cardiovascular disease [50]. For example, CACNA2D2 is involved in pathways related to T2D-associated coronary heart disease, cardiomyopathy, cardiac muscle contraction [51]. Variants of DOCK6 have been associated with GWAS studies on coronary heart disease and total and HDL cholesterol levels [52, 53]. THBS4 has been associated with more advanced peripheral artery disease and T2D [54]. It has also been reported to mediate breast cancer inflammation and growth in mouse models in response to hyperglycemia and TGF-beta [55]. IDO2, one of the isoforms of IDO (Indoleamine 2, 3-dioxygenase) which forms part of the tryptophan metabolism and catalyzes the conversion of tryptophan to kynurenine, is upregulated in T2D patients with baseline tryptophan being associated with higher risk of incident T2D [56].Variants of PBRM1 have been reported to be associated with BMI-related traits and diabetes pathways [57, 58]. While the pathways and biological function of the identified genes could partially be explained from previous findings, our study reveals other biological functions through linking methylation with genomic variants and metabolic pathways.

The meQTL analysis indicated the effect of genomic variation on methylation levels of 22 T2D CpGs, whereas the remaining 44 T2D CpGs are more likely affected by environment/lifestyle. The association of the missense SNP rs1136287 in the SERPINF1/SMYD4 locus with cg11692409 aligns with the previously reported GWAS association of SMYD4 with T2D that is harbored in the SRR locus [4]. The DOCK10 association with diabetic atherosclerosis [33] and its gene expression association with glucose in pancreatic islets [59] suggest that the nonsynonymous meQTL SNPs in DOCK10 may link the gene’s methylation to T2D genetic factors, diabetes complications or mechanisms in islet cells. The presence of a 5’UTR variant in SLC2A1 that codes the GLUT1 protein and that controls methylation in TXNIP is another novel finding that confirms previous biological links found between TXNIP and SLC2A1 [34, 35, 40] as discussed above.

GWAS associations of meQTL SNPs with T2D-related traits such as HbA1c (HK1 and PFKFB2), BMI (EFEMP2, CFL1), eGFR and creatinine (PFKFB2), diabetic nephropathy, urate levels (EFEMP2), bilirubin (HK1), and white blood cell counts (DOCK10, EFEMP2, SLC2A1, CFL1, PFKFB2) [51, 60, 61], further emphasize the involvement of the meQTL genes in the pathogenesis of T2D. In other words, SNP variation may alter methylation levels in T2D CpGs inducing various biological perturbations and complications in T2D patients. Furthermore, the mendelian randomization analysis suggested a causal relationship of DNA methylation in HK1 with HbA1c levels, supporting previous findings of HK1’s GWAS associations with HbA1c and HK1 pathways involving glucose [47], insulin signaling and hyperinsulinemia [48].

This study reports novel T2D methylation-metabolite associations. Compared to a few previously reported metabolic associations with TXNIP CpG cg19693031, this study reveals a much larger role of TXNIP in T2D metabolic pathways that span several pathways of interest, including branched chain amino acids, alanine, lipid metabolism, and sugars among others. DQX1, previously identified to be methylated with T2D in an Arab population, has no meQTL associated with it, suggesting that the metabolic associations of DQX1 with lipid metabolism and ribitol could be largely affected by lifestyle factors. Metabolic associations with genes known to associate with bladder cancer (BLCAP), ovarian cancer (OCIAD1), and infertility caused by azoospermia (DAZAP1) could highlight the importance of integrating both methylation and metabolomics for assessing T2D complications.

Integrating methylation with both genomics and metabolomics has an important role in understanding the cause-effect direction of methylation-metabolism in T2D, i.e., whether methylation drives metabolic changes through gene expression or changes in metabolites cause changes in methylation. For example, methylation of TXNIP is controlled by SNPs in SLC2A1, known to be involved in glucose metabolism, and thus TXNIP association with the glycolysis pathway could be through this meQTL. Similarly, we showed that the meQTL of PFKFB2 had SNPs associated with HbA1c, eGFR, and creatinine, and at the same time the gene had associations with carbohydrates, lipids, phenylalanine and alanine that are involved in both T2D [17, 18] and kidney function [62, 63], suggesting that the metabolic perturbations could be an end effect of the meQTL. Linking methylated genes to metabolism may help future investigation of the population-specificity of some methylated genes to Qataris. Perturbation of the methylation-metabolic pathway by changing lifestyle factors or nutrition in Qataris may result in different methylation patterns in Qataris compared to the other populations and thus may help explain T2D mechanisms in Qataris.

Multi-omics networks have facilitated the identification of metabolic pathways linked to the methylated genes, thus revealing their functionality through metabolites that were associated with T2D risks/complications. One example is the alanine subnetwork that links PFKFB2 and TXNIP to the urea cycle and to amino acids, where amino acids were associated with triglycerides (alanine association with triglycerides: discovery p-value of p = 6.4 × 10–12, replication p-value of p = 9.3 × 10–13) and where both urea cycle and alanine have been associated with kidney failure [64]. Moreover, 1-carboxyethylphenylalanine in the alanine subnetwork was associated with triglycerides (p-value discovery: 1.42 × 10–34, p-value replication: 4.6 × 10–21) and is known to be associated with hypertension [65]. Both links extend the GWAS findings for PFKFB2 described above and reveal the involvement of triglycerides in T2D methylation. The association of 1-palmitoyl-2-oleoyl-GPE(16:0/18:1) with triglycerides (discovery p-value p = 6.47 × 10–100, replication p-value p = 1.39 × 10–80) and its link to 11 methylated genes suggests a possible cause of methylation linked to a diabetes risk/complication that is common between those genes. Mannonate, a xenobiotic (food component/plant) is another example that shares a network with six genes, namely TXNIP, BLCAP, THBS4, PEF1, DAZAP1, and KIAA1211L and may suggest a possible xenobiotic effect on methylation. The networks also highlighted a STYXL1–POR–sphingosines/ceramides–LDL,HDL–steroids connection, where the unique association of POR with a lipid subnetwork of sphingosines, GPCs, and ceramides is supported by the fact that gene’s production of the enzyme cytochrome P450 oxidoreductase is required for the synthesis of cholesterol and steroid hormones (, and references therein). Moreover, since reproductive and fertility complications may arise from diabetes [66], it is interesting to note that STYXL1, the meQTL gene of the cg01676795 (POR) is known to be associated with seminal vesicle tumor and male reproductive organ benign neoplasm [67], that may help in understanding the link of POR to a network of sphingosines and ceramides which play a role in forming steroids [68].

Limitations of the study include measuring methylation from whole blood rather than pancreatic islets as that was the only available tissue from the Qatar Bio Bank. Another limitation was the inability to correct for medication as that information was not yet annotated by the provider at the time of the study.


A total of 66 CpG sites were identified to be associated with T2D, of which 63 were novel and linked to biological pathways of T2D. Genomic drivers of the CpG sites were identified in 688 significant CpG-SNP pairs, among which 181 CpG-SNP pairs were novel associations or in novel loci. Several GWAS associations in meQTLs’ genes revealed various underlying factors in pathways linked to T2D. The study identified novel nonsynonymous and 5’UTR variants associated with T2D methylation in SERPINF1, DOCK10, and TXNIP, as well as the statistically-inferred causal relation between HbA1c and each of the HK1 and PFKFB2 methylation sites. A total of 61 novel methylation-metabolite associations revealed association of several methylated genes with T2D metabolic pathways including population specific genes such as DQX1, among others. The multi-omics network suggested a number of T2D biological mechanisms associated with methylation, including the association of triglyceride metabolism and xenobiotics to 11 and 6 methylated genes respectively, indicating a common factor among those methylated genes and may have a role in their methylation. The pathways identified linked to steroid metabolism, and to metabolites associated with BMI, lipoprotein cholesterols and kidney function, namely STYXL1-POR, SLC2A1-TXNIP, and PFKFB2, may reveal the association of T2D risk factors or complications with methylation mechanisms. Finally, this study revealed several novel methylated genes related to T2D, with their associated genomic variants and metabolic pathways.

Availability of data and materials

All data is provided in the manuscript and in the supplementary tables.



5'—C—phosphate—G—3', Cytosine and guanine separated by a phosphate group


Epigenome Wide Association Study


Genome Wide Association Study


Qatar Bio Bank


Quantitative Trait Loci


methylation Quantitative Trait Loci


Single Nucleotide Polymorphism


Type 2 Diabetes


Metabolome Wide Association Study


  1. Sun H, Saeedi P, Karuranga S, Pinkepank M, Ogurtsova K, Duncan BB, et al. IDF diabetes atlas: Global, regional and country-level diabetes prevalence estimates for 2021 and projections for 2045. Diabetes Res Clin Pract. 2022;183:109119.

    PubMed  Google Scholar 

  2. Al Thani A, Fthenou E, Paparrodopoulos S, Al Marri A, Shi Z, Qafoud F, et al. Qatar biobank cohort study: study design and first results. Am J Epidemiol. 2019;188(8):1420–33.

    PubMed  Google Scholar 

  3. Kwak SH, Park KS. Recent progress in genetic and epigenetic research on type 2 diabetes. Experiment Mol Med. 2016;48(3):e220-e.

    Google Scholar 

  4. Fuchsberger C, Flannick J, Teslovich TM, Mahajan A, Agarwala V, Gaulton KJ, et al. The genetic architecture of type 2 diabetes. Nature. 2016;536(7614):41–7.

    CAS  PubMed Central  PubMed  Google Scholar 

  5. Mahajan A, Taliun D, Thurner M, Robertson NR, Torres JM, Rayner NW, et al. Fine-mapping type 2 diabetes loci to single-variant resolution using high-density imputation and islet-specific epigenome maps. Nat Genet. 2018;50(11):1505–13.

    CAS  PubMed Central  PubMed  Google Scholar 

  6. Kulkarni H, Kos MZ, Neary J, Dyer TD, Kent JW Jr, Göring HH, et al. Novel epigenetic determinants of type 2 diabetes in Mexican-American families. Hum Mol Genet. 2015;24(18):5330–44.

    CAS  PubMed Central  PubMed  Google Scholar 

  7. Chambers JC, Loh M, Lehne B, Drong A, Kriebel J, Motta V, et al. Epigenome-wide association of DNA methylation markers in peripheral blood from Indian Asians and Europeans with incident type 2 diabetes: a nested case-control study. Lancet Diabetes Endocrinol. 2015;3(7):526–34.

    CAS  PubMed Central  PubMed  Google Scholar 

  8. Kriebel J, Herder C, Rathmann W, Wahl S, Kunze S, Molnos S, et al. Association between DNA methylation in whole blood and measures of glucose metabolism: KORA F4 study. PLoS ONE. 2016;11(3):e0152314.

    PubMed Central  PubMed  Google Scholar 

  9. Dayeh T, Volkov P, Salo S, Hall E, Nilsson E, Olsson AH, et al. Genome-wide DNA methylation analysis of human pancreatic islets from type 2 diabetic and non-diabetic donors identifies candidate genes that influence insulin secretion. PLoS Genet. 2014;10(3):e1004160.

    PubMed Central  PubMed  Google Scholar 

  10. Olsson AH, Volkov P, Bacos K, Dayeh T, Hall E, Nilsson EA, et al. Genome-wide associations between genetic and epigenetic variation influence mRNA expression and insulin secretion in human pancreatic islets. PLoS Genet. 2014;10(11):e1004735.

    PubMed Central  PubMed  Google Scholar 

  11. Wang Z, Qiu C, Lin X, Zhao LJ, Liu Y, Wu X, et al. Identification of novel functional CpG-SNPs associated with type 2 diabetes and coronary artery disease. Mol Genet Genomics. 2020;295(3):607–19.

    CAS  PubMed Central  PubMed  Google Scholar 

  12. Juvinao-Quintero DL, Marioni RE, Ochoa-Rosales C, Russ TC, Deary IJ, van Meurs JBJ, et al. DNA methylation of blood cells is associated with prevalent type 2 diabetes in a meta-analysis of four European cohorts. Clin Epigenetics. 2021;13(1):40.

    CAS  PubMed Central  PubMed  Google Scholar 

  13. Agarwal P, Wicklow BA, Dart AB, Hizon NA, Sellers EAC, McGavock JM, et al. Integrative analysis reveals novel associations between DNA methylation and the serum metabolome of adolescents with type 2 diabetes: a cross-sectional study. Front Endocrinol (Lausanne). 2022;13:934706.

    PubMed  Google Scholar 

  14. Al Muftah WA, Al-Shafai M, Zaghlool SB, Visconti A, Tsai PC, Kumar P, et al. Epigenetic associations of type 2 diabetes and BMI in an Arab population. Clin Epigenetics. 2016;8:13.

    PubMed Central  PubMed  Google Scholar 

  15. Huan T, Joehanes R, Song C, Peng F, Guo Y, Mendelson M, et al. Genome-wide identification of DNA methylation QTLs in whole blood highlights pathways for cardiovascular disease. Nat Commun. 2019;10(1):4267.

    PubMed Central  PubMed  Google Scholar 

  16. Sheng X, Qiu C, Liu H, Gluck C, Hsu JY, He J, et al. Systematic integrated analysis of genetic and epigenetic variation in diabetic kidney disease. Proc Natl Acad Sci U S A. 2020;117(46):29013–24.

    CAS  PubMed Central  PubMed  Google Scholar 

  17. Yousri NA, Suhre K, Yassin E, Al-Shakaki A, Robay A, Elshafei M, et al. Metabolic and metabo-clinical signatures of type 2 diabetes, obesity, retinopathy, and dyslipidemia. Diabetes. 2021;71(2):184–205.

    PubMed Central  Google Scholar 

  18. Yousri NA, Mook-Kanamori DO, Selim MME-D, Takiddin AH, Al-Homsi H, Al-Mahmoud KAS, et al. A systems view of type 2 diabetes-associated metabolic perturbations in saliva, blood and urine at different timescales of glycaemic control. Diabetologia. 2015;58(8):1855–67.

    CAS  PubMed Central  PubMed  Google Scholar 

  19. Yousri NA, Fakhro KA, Robay A, Rodriguez-Flores JL, Mohney RP, Zeriri H, et al. Whole-exome sequencing identifies common and rare variant metabolic QTLs in a middle Eastern population. Nat Commun. 2018;9(1):333.

    PubMed Central  PubMed  Google Scholar 

  20. Fortin J-P, Labbe A, Lemire M, Zanke BW, Hudson TJ, Fertig EJ, et al. Functional normalization of 450k methylation array data improves replication in large cancer studies. Genome Biol. 2014;15(11):503.

    PubMed Central  PubMed  Google Scholar 

  21. Hunter-Zinck H, Musharoff S, Salit J, Al-Ali KA, Chouchane L, Gohar A, et al. Population genetic structure of the people of Qatar. Am J Hum Genet. 2010;87(1):17–25.

    CAS  PubMed Central  PubMed  Google Scholar 

  22. Chen EY, Tan CM, Kou Y, Duan Q, Wang Z, Meirelles GV, et al. Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinformatics. 2013;14:128.

    PubMed Central  PubMed  Google Scholar 

  23. Kuleshov MV, Jones MR, Rouillard AD, Fernandez NF, Duan Q, Wang Z, et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 2016;44(W1):W90–7.

    CAS  PubMed Central  PubMed  Google Scholar 

  24. Aulchenko YS, Ripke S, Isaacs A, van Duijn CM. GenABEL: an R library for genome-wide association analysis. Bioinformatics. 2007;23(10):1294–6.

    CAS  PubMed  Google Scholar 

  25. Chen WM, Abecasis GR. Family-based association tests for genomewide association scans. Am J Hum Genet. 2007;81(5):913–26.

    CAS  PubMed Central  PubMed  Google Scholar 

  26. Willer CJ, Li Y, Abecasis GR. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics. 2010;26(17):2190–1.

    CAS  PubMed Central  PubMed  Google Scholar 

  27. Toperoff G, Aran D, Kark JD, Rosenberg M, Dubnikov T, Nissan B, et al. Genome-wide survey reveals predisposing diabetes type 2-related DNA methylation variations in human peripheral blood. Hum Mol Genet. 2012;21(2):371–83.

    CAS  PubMed  Google Scholar 

  28. Walaszczyk E, Luijten M, Spijkerman AMW, Bonder MJ, Lutgers HL, Snieder H, et al. DNA methylation markers associated with type 2 diabetes, fasting glucose and HbA1c levels: a systematic review and replication in a case–control sample of the Lifelines study. Diabetologia. 2018;61(2):354–68.

    CAS  PubMed  Google Scholar 

  29. Buniello A, MacArthur JAL, Cerezo M, Harris LW, Hayhurst J, Malangone C, et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 2019;47(D1):D1005–12.

    CAS  PubMed  Google Scholar 

  30. Zhernakova DV, Deelen P, Vermaat M, van Iterson M, van Galen M, Arindrarto W, et al. Identification of context-dependent expression quantitative trait loci in whole blood. Nat Genet. 2017;49(1):139–45.

    CAS  PubMed  Google Scholar 

  31. Cheung CYY, Lee C-H, Tang CS, Xu A, Au K-W, Fong CHY, et al. Genetic regulation of pigment epithelium-derived factor (PEDF): an exome-chip association analysis in chinese subjects with type 2 diabetes. Diabetes. 2019;68(1):198–206.

    CAS  PubMed  Google Scholar 

  32. Yi H, Ji D, Zhan T, Yao Y, Li M, Jia J, et al. Prognostic value of pigment epithelium-derived factor for neoadjuvant radiation therapy in patients with locally advanced rectal carcinoma. Int J Oncol. 2016;49(4):1415–26.

    CAS  PubMed  Google Scholar 

  33. Divers J, Palmer ND, Langefeld CD, Brown WM, Lu L, Hicks PJ, et al. Genome-wide association study of coronary artery calcified atherosclerotic plaque in African Americans with type 2 diabetes. BMC Genet. 2017;18(1):105.

    PubMed Central  PubMed  Google Scholar 

  34. Wu N, Zheng B, Shaywitz A, Dagon Y, Tower C, Bellinger G, et al. AMPK-dependent degradation of TXNIP upon energy stress leads to enhanced glucose uptake via GLUT1. Mol Cell. 2013;49(6):1167–75.

    CAS  PubMed Central  PubMed  Google Scholar 

  35. Jandova J, Wondrak GT. Genomic GLO1 deletion modulates TXNIP expression, glucose metabolism, and redox homeostasis while accelerating human A375 malignant melanoma tumor growth. Redox Biol. 2021;39:101838.

    CAS  PubMed  Google Scholar 

  36. Stefanidis I, Tziastoudi M, Tsironi EE, Dardiotis E, Tachmitzi SV, Fotiadou A, et al. The contribution of genetic variants of SLC2A1 gene in T2DM and T2DM-nephropathy: association study and meta-analysis. Ren Fail. 2018;40(1):561–76.

    CAS  PubMed Central  PubMed  Google Scholar 

  37. Kho M, Zhao W, Ratliff SM, Ammous F, Mosley TH, Shang L, et al. Epigenetic loci for blood pressure are associated with hypertensive target organ damage in older African Americans from the genetic epidemiology network of Arteriopathy (GENOA) study. BMC Med Genomics. 2020;13(1):131.

    CAS  PubMed Central  PubMed  Google Scholar 

  38. McInnes G, Tanigawa Y, DeBoever C, Lavertu A, Olivieri JE, Aguirre M, et al. Global Biobank Engine: enabling genotype-phenotype browsing for biobank summary statistics. Bioinformatics. 2018;35(14):2495–7.

    PubMed Central  Google Scholar 

  39. Lei Z, Chen Y, Wang J, Zhang Y, Shi W, Wang X, et al. Txnip deficiency promotes β-cell proliferation in the HFD-induced obesity mouse model. Endocr Connect. 2022;11(4):e210641.

    CAS  PubMed Central  PubMed  Google Scholar 

  40. Dykstra H, LaRose C, Fisk C, Waldhart A, Meng X, Zhao G, et al. TXNIP interaction with GLUT1 depends on PI(4,5)P(2). Biochim Biophys Acta Biomembr. 2021;1863(12):183757.

    CAS  PubMed Central  PubMed  Google Scholar 

  41. Shalev A. Minireview: thioredoxin-interacting protein: regulation and function in the pancreatic β-cell. Mol Endocrinol. 2014;28(8):1211–20.

    PubMed Central  PubMed  Google Scholar 

  42. Wondafrash DZ, Nire’a AT, Tafere GG, Desta DM, Berhe DA, Zewdie KA. Thioredoxin-interacting protein as a novel potential therapeutic target in diabetes mellitus and its underlying complications. Diabetes Metab Syndr Obes. 2020;13:43–51.

    CAS  PubMed Central  PubMed  Google Scholar 

  43. Zhou R, Tardivel A, Thorens B, Choi I, Tschopp J. Thioredoxin-interacting protein links oxidative stress to inflammasome activation. Nat Immunol. 2010;11(2):136–40.

    CAS  PubMed  Google Scholar 

  44. Abais JM, Xia M, Li G, Chen Y, Conley SM, Gehr TWB, et al. Nod-like receptor protein 3 (NLRP3) inflammasome activation and podocyte injury via thioredoxin-interacting protein (TXNIP) during hyperhomocysteinemia. J Biol Chem. 2014;289(39):27159–68.

    CAS  PubMed Central  PubMed  Google Scholar 

  45. Gjesing AP, Nielsen AA, Brandslund I, Christensen C, Sandbaek A, Jorgensen T, et al. Studies of a genetic variant in HK1 in relation to quantitative metabolic traits and to the prevalence of type 2 diabetes. BMC Med Genet. 2011;12:99.

    CAS  PubMed Central  PubMed  Google Scholar 

  46. Soranzo N, Sanna S, Wheeler E, Gieger C, Radke D, Dupuis J, et al. Common variants at 10 genomic loci influence hemoglobin A(1)(C) levels via glycemic and nonglycemic pathways. Diabetes. 2010;59(12):3229–39.

    CAS  PubMed Central  PubMed  Google Scholar 

  47. Safran M, Rosen N, Twik M, BarShir R, Stein TI, Dahary D, et al. The GeneCards Suite. In: Abugessaisa I, Kasukawa T, editors., et al., Practical Guide to Life Science Databases. Singapore: Springer Singapore; 2021. p. 27–56.

    Google Scholar 

  48. Pinney SE, Ganapathy K, Bradfield J, Stokes D, Sasson A, Mackiewicz K, et al. Dominant form of congenital hyperinsulinism maps to HK1 region on 10q. Horm Res Paediatr. 2013;80(1):18–27.

    CAS  PubMed  Google Scholar 

  49. Cheng Y, Gadd DA, Gieger C, Monterrubio-Gomez K, Zhang Y, Berta I, et al. Development and validation of DNA methylation scores in two European cohorts augment 10-year risk prediction of type 2 diabetes. Nat Aging. 2023;3(4):450–8.

    CAS  PubMed  Google Scholar 

  50. Krolevets M, Cate VT, Prochaska JH, Schulz A, Rapp S, Tenzer S, et al. DNA methylation and cardiovascular disease in humans: a systematic review and database of known CpG methylation sites. Clin Epigenetics. 2023;15(1):56.

    CAS  PubMed Central  PubMed  Google Scholar 

  51. Hu X, Rong S, Wang Q, Sun T, Bao W, Chen L, et al. Association between plasma uric acid and insulin resistance in type 2 diabetes: a Mendelian randomization analysis. Diabetes Res Clin Pract. 2021;171:108542.

    CAS  PubMed  Google Scholar 

  52. Hoffmann TJ, Theusch E, Haldar T, Ranatunga DK, Jorgenson E, Medina MW, et al. A large electronic-health-record-based genome-wide study of serum lipids. Nat Genet. 2018;50(3):401–13.

    CAS  PubMed Central  PubMed  Google Scholar 

  53. Koyama S, Ito K, Terao C, Akiyama M, Horikoshi M, Momozawa Y, et al. Population-specific and trans-ancestry genome-wide analyses identify distinct and shared genetic risk loci for coronary artery disease. Nat Genet. 2020;52(11):1169–77.

    CAS  PubMed  Google Scholar 

  54. Zierfuss B, Hobaus C, Herz CT, Pesau G, Koppensteiner R, Schernthaner GH. Thrombospondin-4 increases with the severity of peripheral arterial disease and is associated with diabetes. Heart Vessels. 2020;35(1):52–8.

    PubMed  Google Scholar 

  55. Muppala S, Xiao R, Gajeton J, Krukovets I, Verbovetskiy D, Stenina-Adognravi O. Thrombospondin-4 mediates hyperglycemia- and TGF-beta-induced inflammation in breast cancer. Int J Cancer. 2021;148(8):2010–22.

    CAS  PubMed  Google Scholar 

  56. Yu E, Papandreou C, Ruiz-Canela M, Guasch-Ferre M, Clish CB, Dennis C, et al. Association of tryptophan metabolites with incident type 2 diabetes in the PREDIMED trial: a case-cohort study. Clin Chem. 2018;64(8):1211–20.

    CAS  PubMed Central  PubMed  Google Scholar 

  57. Locke AE, Kahali B, Berndt SI, Justice AE, Pers TH, Day FR, et al. Genetic studies of body mass index yield new insights for obesity biology. Nature. 2015;518(7538):197–206.

    CAS  PubMed Central  PubMed  Google Scholar 

  58. Shungin D, Winkler TW, Croteau-Chonka DC, Ferreira T, Locke AE, Magi R, et al. New genetic loci link adipose and insulin biology to body fat distribution. Nature. 2015;518(7538):187–96.

    CAS  PubMed Central  PubMed  Google Scholar 

  59. Ottosson-Laakso E, Krus U, Storm P, Prasad RB, Oskolkov N, Ahlqvist E, et al. Glucose-induced changes in gene expression in human pancreatic islets: causes or consequences of chronic hyperglycemia. Diabetes. 2017;66(12):3013–28.

    CAS  PubMed  Google Scholar 

  60. Abbasi A, Deetman PE, Corpeleijn E, Gansevoort RT, Gans ROB, Hillege HL, et al. Bilirubin as a potential causal factor in type 2 diabetes risk: a mendelian randomization study. Diabetes. 2014;64(4):1459–69.

    PubMed  Google Scholar 

  61. Gkrania-Klotsas E, Ye Z, Cooper AJ, Sharp SJ, Luben R, Biggs ML, et al. Differential white blood cell count and type 2 diabetes: systematic review and meta-analysis of cross-sectional and prospective studies. PLoS ONE. 2010;5(10):e13405.

    PubMed Central  PubMed  Google Scholar 

  62. Gai Z, Wang T, Visentin M, Kullak-Ublick GA, Fu X, Wang Z. Lipid accumulation and chronic kidney disease. Nutrients. 2019;11(4):722.

    CAS  PubMed Central  PubMed  Google Scholar 

  63. Kopple JD. Phenylalanine and tyrosine metabolism in chronic kidney failure. J Nutr. 2007;137(6 Suppl 1):1586S–90S (discussion 97S-98S).

    CAS  PubMed  Google Scholar 

  64. Darshi M, Van Espen B, Sharma K. Metabolomics in diabetic kidney disease: unraveling the biochemistry of a silent killer. Am J Nephrol. 2016;44(2):92–103.

    CAS  PubMed  Google Scholar 

  65. Shi M, He J, Li C, Lu X, He WJ, Cao J, et al. Metabolomics study of blood pressure salt-sensitivity and hypertension. Nutr Metab Cardiovasc Dis. 2022;32(7):1681–92.

    CAS  PubMed Central  PubMed  Google Scholar 

  66. Maresch CC, Stute DC, Alves MG, Oliveira PF, de Kretser DM, Linn T. Diabetes-induced hyperglycemia impairs male reproductive function: a systematic review. Hum Reprod Update. 2017;24(1):86–105.

    Google Scholar 

  67. Stelzer G, Rosen N, Plaschkes I, Zimmerman S, Twik M, Fishilevich S, et al. The GeneCards suite: from gene data mining to disease genome sequence analyses. Curr Protocols Bioinform. 2016;54(1):1.30.1-1..3.

    Google Scholar 

  68. Lucki NC, Sewer MB. Multiple roles for sphingolipids in steroid hormone biosynthesis. Subcell Biochem. 2008;49:387–412.

    PubMed Central  PubMed  Google Scholar 

Download references


The authors would like to acknowledge the Qatar BioBank, the Qatar Genome Project and the Genomics Core at Weill Cornell Medicine-Qatar.

Author’s information

Authors’ Twitter handle: @ny99.


This work was made possible by PPM2 grant #PPM2-0226–170020 from the Qatar National Research Fund (QNRF) and Qatar Genome Program (QGP) (members of Qatar Foundation). This work was also made possible by NPRP-S11 grant #NPRP11S-0114–180299 from the Qatar National Research Fund (a member of Qatar Foundation). The findings herein reflect the work and are solely the responsibility of the authors. O.M.E.A. is also supported by an independent QNRF grant# NPRP11C-0115–180010. Noha A Yousri is the guarantor of this work.

Author information

Authors and Affiliations



NAY is the lead PI of two QNRF grants supporting this study; she designed the study, performed all computational analysis, literature review and interpretation of findings, and prepared the manuscript. OMEA and SCH helped in the study design, revised and edited the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Noha A. Yousri.

Ethics declarations

Ethics approval and consent to participate

All participants enrolled in the Qatar Biobank provided informed consent for the collection of their data and specimens. Approval of the study was provided by the IRB of the Qatar Biobank and the approval reference numbers were [E-2020-QF-QBB-RES-ACC-0104–0109, E -2017-QF-QBB-RES-ACC-0058–0025].

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.


Additional file 2:

Table S1. Results of two way discovery replication T2D EWAS using cohorts 1 and 2. Table S2. Results of lookup of the identified 66 T2D CpGs in databases and previous literature. Table S3. Replication of literature findings. Table S4. eQTM at FDR < 0.05 from BIOS QTL site for all 66 T2D CpGs. Table S5. KEGG pathway (Pathways relevant to diabetes or complications are highlighted). Table S6. All 688 CpG-SNP significant associations in 27 loci. Table S7. GWAS associations for the SNPs in the neighborhood of meQTL SNPs as reported in the GWAS catalogue. Table S8. GWAS associations for the SNPs in the neighborhood of meQTL SNPs as reported in the Stanford bio bank. Table S9. A)Discovery and Replication cohorts for T2D MWAS, B)T2D significant metabolites with the discovery and replication statistics. Table S10. 77 CpG-metabolite association results from meta-analysis (11 colored rows are associations with high heterogeneity (p<0.05)). Table S11. Associations of 16 metabolites from the multi-omics network with BMI, lipoproteins and triglycerides. Table S12. Characteristics of metabolomics samples used for the T2D MWAS.

Additional file 3: Figure S1.

Full omics network of CpG-metabolites and associated SNP-CpGs and associated metabolite-metabolite pairs.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Yousri, N.A., Albagha, O.M.E. & Hunt, S.C. Integrated epigenome, whole genome sequence and metabolome analyses identify novel multi-omics pathways in type 2 diabetes: a Middle Eastern study. BMC Med 21, 347 (2023).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: