Investigating causal relationships between exposome and human longevity: a Mendelian randomization analysis

Background: Environmental factors are associated with human longevity, but their specificity and causality remain mostly unclear. By integrating the innovative “exposome” concept developed in the field of environmental epidemiology, this study aims to determine the components of the exposome causally linked to longevity using a Mendelian randomization (MR) approach.

Methods: A total of 4587 environmental exposures in the exogenous and endogenous domains of the exposome, extracted from 361,194 individuals in the UK Biobank, were assessed. We examined the relationship between each environmental factor and two longevity outcomes (i.e., surviving to the 90th or 99th percentile age) from various cohorts of European ancestry. Significant results after false discovery rate correction underwent validation using an independent exposure dataset.

Results: Out of all the environmental exposures, eight age-related diseases and pathological conditions were causally associated with lower odds of longevity, including coronary atherosclerosis (odds ratio = 0.77, 95% confidence interval [0.70, 0.84], P = 4.2 × 10^-8), ischemic heart disease (0.66, [0.51, 0.87], P = 0.0029), angina (0.73, [0.65, 0.83], P = 5.4 × 10^-7), Alzheimer’s disease (0.80, [0.72, 0.89], P = 3.0 × 10^-5), hypertension (0.70, [0.64, 0.77], P = 4.5 × 10^-14), type 2 diabetes (0.88, [0.80, 0.96], P = 0.004), high cholesterol (0.81, [0.72, 0.91], P = 0.0003), and venous thromboembolism (0.92, [0.87, 0.97], P = 0.0028). After adjusting for genetic correlation between different types of blood lipids, higher levels of low-density lipoprotein cholesterol (0.72, [0.64, 0.80], P = 2.3 × 10^-9) were associated with lower odds of longevity, while high-density lipoprotein cholesterol (1.36, [1.13, 1.62], P = 0.001) showed the opposite. Genetically predicted sitting/standing height was unrelated to longevity, while higher comparative height size at age 10 was negatively associated with longevity. Greater body fat, especially trunk fat mass, and never eating sugar or foods/drinks containing sugar were adversely associated with longevity, while educational attainment showed the opposite.

Conclusions: The present study supports that some age-related diseases as well as education are causally related to longevity and highlights several new targets for achieving longevity, including management of venous thromboembolism, appropriate intake of sugar, and control of body fat. Our results warrant further studies to elucidate the underlying mechanisms of these reported causal associations.

Supplementary Information: The online version contains supplementary material available at 10.1186/s12916-021-02030-4.



Keywords: Longevity, Mendelian randomization, Exposome
Background
Longevity is defined as the length or duration of life or viability, typically referring to age at death or to survival to 90-100 years or older [1]. It is a heterogeneous trait that is susceptible to genetic and environmental factors. Previous genome-wide association studies (GWASs) have revealed genetic loci associated with human longevity or parental lifespan [2,3], while environmental factors, including socio-economic status, smoking, gender, and lifestyle, are considered determinants [1]. Observational studies have also reported associations with various risk factors, in which predicted longevity was significantly reduced by cardiovascular disease (CVD), diabetes, hypertension, and tobacco smoking [4,5]. However, because they are vulnerable to reverse causation and confounding bias, most epidemiological studies are insufficient to draw definite conclusions on causality.
Mendelian randomization (MR) is an analytical approach that can overcome such limitations by using genetic variants as instrumental variables (IVs) to evaluate the causal effect of an exposure on an outcome. Since genotypes are randomly allocated from parents to offspring [6], the MR method is less likely to be affected by reverse causality and measurement error in the absence of pleiotropy, making causal inference more feasible than with conventional study designs. Although several MR analyses have identified a subset of environmental factors that are causally associated with longevity [7][8][9], the exploration of causal exposures is still at a relatively early stage. By applying the "exposome" concept proposed in the field of environmental epidemiology, we are able, for the first time, to investigate the totality of environmental exposures that affect an individual from conception until death [10]. Using the MR approach, our study aims to identify the potential components of the exposome that are causally linked to longevity.

Exposure data
UK Biobank (UKB) is a large-scale, long-term biobank with information on both genetics and a broad range of environmental exposures collected over 10 years (www.ukbiobank.ac.uk). Over 500,000 individuals aged 40-69 years were recruited from across the UK between 2006 and 2010. The exposome data used in our MR analysis originated from UKB. GWAS summary statistics for 4587 environmental exposures were obtained from the Neale Lab (http://www.nealelab.is/uk-biobank), based on 361,194 participants [11]. Categorical exposures with fewer than 250 cases and duplicated exposures were excluded [12]. Exposures with fewer than three independent single nucleotide polymorphisms (SNPs) at P < 5 × 10^-8 were also excluded (Fig. 1). Finally, a total of 704 exposures were included in the primary analysis, and 663 exposures were included in the secondary analysis. We classified all available exposures into three main domains: endogenous, exogenous individual, and exogenous macro-level [13]. Exposures in each domain were then classified into different categories, mainly according to information in UKB.

Outcome data
We used two sets of summary statistics from the largest meta-analysis of human longevity GWAS of European ancestry [3]. Longevity was defined as two dichotomous phenotypes [3]. Cases were individuals who lived beyond the 90th (N = 11,262) or 99th (N = 3484) percentile age. Controls (N = 25,483) were individuals who died at or before the 60th percentile age or whose age at the last follow-up visit was at or before the 60th percentile age. To mitigate heterogeneity, life tables specific to country, sex, and birth cohort were used to identify the age thresholds for cases and controls in the original GWAS [3]. Hence, the number of selected cases and controls is independent of the study population used. The 90th percentile longevity data were used in the primary analysis because of the larger sample size, while the 99th percentile data were used in the secondary analysis. The mean age of 90th percentile cases was 97 years, ranging from 87 to 122. The mean age of 99th percentile cases was 101 years, ranging from 90 to 122. The mean age of the control group was 55 years, ranging from 0 to 88. All participants provided written informed consent in the original GWAS [3].

Two-sample MR design
We inferred causal relationships between each environmental exposure and longevity using two-sample MR, in which IVs are selected using GWAS summary statistics generated from different, non-overlapping samples. To obtain unbiased estimates of the causal effects, MR analysis rests on three assumptions [6]: (i) the genetic variants are associated with the exposure, (ii) the genetic variants are independent of confounders of the risk factor-outcome association, and (iii) the genetic variants influence the outcome only through the exposure.

Selection of instrumental variables
For each exposure, single nucleotide polymorphisms (SNPs) associated at P < 5 × 10^-8 with a minor allele frequency greater than 0.01 were considered potential instruments. We used MR-Base (http://www.mrbase.org) to select independent SNPs at a linkage disequilibrium threshold of r^2 < 0.001, and retained the SNPs with the strongest effect on the associated trait. For palindromic SNPs, we aligned strands using allele frequency and discarded those with a minor allele frequency above 0.42. The exposure and outcome datasets were then harmonized, and we checked the original datasets to avoid reversed effect directions.
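The filters described above can be illustrated with a minimal Python sketch. The function and parameter names are hypothetical and are not part of MR-Base or TwoSampleMR; only the thresholds (P < 5 × 10^-8, minor allele frequency > 0.01, palindromic SNPs dropped above 0.42) come from the text, and linkage disequilibrium clumping is not reproduced.

```python
# Minimal sketch of the instrument filters described above. Function and
# parameter names are hypothetical; LD clumping is not reproduced here.
PALINDROMIC_PAIRS = {frozenset("AT"), frozenset("CG")}


def is_palindromic(effect_allele: str, other_allele: str) -> bool:
    """Return True for A/T or C/G SNPs, whose strand cannot be read from the alleles."""
    pair = frozenset((effect_allele.upper(), other_allele.upper()))
    return pair in PALINDROMIC_PAIRS


def keep_instrument(effect_allele: str, other_allele: str,
                    effect_allele_freq: float, p_value: float,
                    p_threshold: float = 5e-8, min_maf: float = 0.01,
                    max_palindromic_maf: float = 0.42) -> bool:
    """Apply the significance, frequency, and palindromic-SNP filters."""
    maf = min(effect_allele_freq, 1.0 - effect_allele_freq)
    # Genome-wide significance and minimum minor allele frequency.
    if p_value >= p_threshold or maf <= min_maf:
        return False
    # Palindromic SNPs near 50% frequency cannot be aligned reliably.
    if is_palindromic(effect_allele, other_allele) and maf > max_palindromic_maf:
        return False
    return True
```

Under these assumptions, an A/T SNP with an effect allele frequency of 0.45 would be discarded, whereas an A/G SNP at the same frequency would be kept.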
We computed the F-statistic of each exposure to judge the strength of the IVs. The bias from weak instruments depends on instrument strength as measured by the F-statistic, which is related to the proportion of variance in the phenotype explained by the IVs ($R^2$), the sample size ($n$), and the number of instruments ($k$) by the formula $F = \left(\frac{n-k-1}{k}\right)\left(\frac{R^2}{1-R^2}\right)$ [14]. Typically, a strong instrument was defined as an F-statistic > 10 [14]. We estimated statistical power with a false positive rate α = 0.05 using R code provided by Burgess S [15]. Details of the genetic instruments are presented in Additional file 1: Table S1.
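As a worked illustration of the F-statistic formula, the following Python sketch evaluates it for given values of R^2, n, and k. The function name and the example inputs are assumptions chosen for illustration and are not values from this study.

```python
def f_statistic(r2: float, n: int, k: int) -> float:
    """F = ((n - k - 1) / k) * (R^2 / (1 - R^2)) for k instruments and n samples."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("r2 must lie in [0, 1)")
    if k < 1 or n <= k + 1:
        raise ValueError("need k >= 1 and n > k + 1")
    return ((n - k - 1) / k) * (r2 / (1.0 - r2))


# Illustrative inputs only: 10 instruments explaining 1% of the variance
# in a sample of 361,194 participants.
example_f = f_statistic(r2=0.01, n=361194, k=10)
is_strong = example_f > 10  # conventional threshold for a strong instrument
```

With these illustrative inputs the statistic is far above 10, which is what one would expect when a few genome-wide significant variants are measured in a very large sample.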

Statistical analysis
We used the inverse variance weighted (IVW) method as our principal MR analytical approach. This method returns an unbiased estimate in the absence of horizontal pleiotropy or when horizontal pleiotropy is balanced. Because the outcome was dichotomous, results are presented as odds ratios (ORs) per standard deviation (SD) increase in each genetically determined exposure. Because the Neale Lab GWAS used a linear model (rather than a logistic model) when analyzing case-control traits, we applied a transformation according to the BOLT-LMM manual (https://alkesgroup.broadinstitute.org/BOLT-LMM/downloads/BOLT-LMM_v2.3.4_manual.pdf) to convert SNP effect estimates ("betas") on the quantitative scale to traditional ORs. This approximate transformation is log OR = β/(μ × (1 − μ)), where μ = case fraction. Standard errors of SNP effect size estimates are also divided by (μ × (1 − μ)) when applying this transformation to obtain log ORs.
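The two calculations in this paragraph can be sketched in Python as follows. This is a minimal illustration, not the TwoSampleMR implementation: the function names are hypothetical, and the IVW estimator shown is the standard fixed-effect form with first-order weights, which is assumed here for simplicity.

```python
import math


def linear_beta_to_log_or(beta: float, se: float, case_fraction: float):
    """Approximate conversion: log OR = beta / (mu * (1 - mu)); the SE is scaled alike."""
    scale = case_fraction * (1.0 - case_fraction)
    return beta / scale, se / scale


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate from per-SNP summary statistics.

    estimate = sum(bx * by / sy^2) / sum(bx^2 / sy^2)
    se       = sqrt(1 / sum(bx^2 / sy^2))
    """
    numerator = sum(bx * by / sy ** 2
                    for bx, by, sy in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx ** 2 / sy ** 2
                      for bx, sy in zip(beta_exposure, se_outcome))
    return numerator / denominator, math.sqrt(1.0 / denominator)


def to_odds_ratio(log_or: float, se: float, z: float = 1.96):
    """Odds ratio with a 95% confidence interval from a log-odds estimate."""
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)
```

For example, a linear-model beta of 0.01 for a trait with a case fraction of 0.1 would correspond to an approximate log OR of 0.01 / 0.09 ≈ 0.11 under this transformation.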
Sensitivity analyses were conducted using the weighted median [16], MR-Egger regression [17], and Mendelian randomization pleiotropy residual sum and outlier (MR-PRESSO) [18] methods. These methods rely on different assumptions at the cost of reduced statistical power. The weighted median allows up to 50% of the IVs to be invalid or pleiotropic [16]. MR-Egger regression allows > 50% of the variants to be invalid [17]. Heterogeneity in the IVW estimates was examined with Cochran's Q test. Furthermore, the MR-Egger intercept and the MR-PRESSO global test were used to check for the presence of pleiotropy. In the case of horizontal pleiotropy, the MR-PRESSO outlier test compares the observed and expected distributions of the tested variants to identify outlier variants. If significant outliers (P < 0.05) were detected, they were removed from the analysis to return an unbiased causal estimate [18].
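Two of the diagnostics named above, Cochran's Q and the MR-Egger intercept, can be sketched from summary statistics as follows. This is a simplified illustration with hypothetical function names; it uses first-order weights, omits standard errors and P-values, and is not the code used in the study.

```python
def cochran_q(beta_exposure, beta_outcome, se_outcome):
    """Cochran's Q of the per-SNP ratio estimates around the IVW estimate.

    Returns (Q, degrees of freedom); under homogeneity Q is approximately
    chi-square distributed with k - 1 degrees of freedom.
    """
    weights = [bx ** 2 / sy ** 2 for bx, sy in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    ivw = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    q = sum(w * (r - ivw) ** 2 for w, r in zip(weights, ratios))
    return q, len(ratios) - 1


def mr_egger(beta_exposure, beta_outcome, se_outcome):
    """Weighted regression of outcome betas on exposure betas with an intercept.

    SNPs are first oriented so that every exposure beta is positive. Returns
    (intercept, slope); an intercept away from zero suggests directional
    pleiotropy.
    """
    xs = [abs(bx) for bx in beta_exposure]
    ys = [-by if bx < 0 else by for bx, by in zip(beta_exposure, beta_outcome)]
    ws = [1.0 / sy ** 2 for sy in se_outcome]
    total = sum(ws)
    x_mean = sum(w * x for w, x in zip(ws, xs)) / total
    y_mean = sum(w * y for w, y in zip(ws, ys)) / total
    sxx = sum(w * (x - x_mean) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_mean) * (y - y_mean) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope
```

In this sketch the IVW estimate is the special case of the same weighted regression with the intercept constrained to zero, which is why a non-zero MR-Egger intercept is read as evidence that the IVW assumption of balanced pleiotropy may not hold.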
To correct for multiple comparisons, we applied false discovery rate (FDR) correction to the IVW results. An FDR-corrected P-value < 0.05 was considered significant, and an unadjusted P-value < 0.05 was considered evidence of a suggestive association. Significant traits with consistent point estimates across the sensitivity analyses and IVW were selected in the screening phase as the most robust causal exposures. Analyses were conducted using R version 3.6.3, with the MR analysis performed using the "TwoSampleMR" package version 0.5.2 [19].
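The significance rule above can be sketched in Python. The text does not state which FDR procedure was used, so the Benjamini-Hochberg step-up procedure is assumed here; the function names and category labels are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order.

    Assumption: the FDR procedure is not named in the text; BH is used here
    only as the most common choice.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def classify(p_value: float, fdr_p_value: float, alpha: float = 0.05) -> str:
    """Label an association following the thresholds stated in the text."""
    if fdr_p_value < alpha:
        return "significant"
    if p_value < alpha:
        return "suggestive"
    return "not significant"
```

For instance, an exposure with an unadjusted P-value of 0.03 and an adjusted P-value of 0.20 would be labeled suggestive rather than significant.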

Validation
For the significant exposures identified, we used non-UKB GWAS to validate our MR results. A total of 20 independent GWAS datasets were publicly available as part of the MR-Base package [19]. If more than one GWAS was available for a given trait, the optimal one was selected based on large sample size, sufficient available SNPs, inclusion of both sexes, and European or mixed descent. Details of the independent exposure GWAS are presented in Additional file 1: Table S2. For each trait in the validation, IVs were constructed starting from all SNPs with P < 5 × 10^-8. In the validation analysis, the IVs for low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), and triglycerides partly overlapped [20]. Thus, we used multivariable MR to adjust for the genetic correlation [21]. Validation analyses were not conducted for significant exposures without eligible data.

Screening results
Of all analyzed exposures, 110 and 73 showed associations with longevity at P < 0.05 in the primary and secondary analyses, respectively. We found that 53 exposures showed significant associations with either or both of the 90th and 99th percentile longevity outcomes after FDR correction and sensitivity analysis (Fig. 1). For these 53 screened exposures, the sensitivity analyses showed point estimates consistent with IVW in the primary stage (Table 1). These exposures were classified into eight categories: disease, physical measures, family history, medication, early life factors, education, lifestyle, and diet (Fig. 2, and Additional file 1: Table S3-S5). The SNPs used as IVs for each screened exposure are listed in Additional file 1: Table S6. MR analyses were repeated using non-UKB exposure datasets (Fig. 3, and Additional file 1: Table S7-S9). An overview of the results of the present study is shown in Additional file 1: Table S10. All significant and suggestive causal exposures from the two longevity datasets are presented in Additional file 1: Table S11-S12. MR results for all traits are presented in Additional file 2: Table S13-S14.
Among the exposures reported in Fig. 2, forty-two traits were associated with both the 90th and 99th percentile longevity outcomes. In the disease category, diseases of the circulatory system (OR_90 = 0.43 [0.32, 0.59], P_90 = 1.0 × 10^-7) were causally associated with lower odds of 90th and 99th percentile longevity. Ischemic heart disease (OR_90 = 0.66 [0.51, 0.87], P_90 = 0.0029) was also causally linked to both longevity outcomes. The MR-PRESSO global test and Q test showed substantial pleiotropy among the SNPs used as IVs for these two exposures (P < 0.05; Table 2). However, after removal of potential outlying SNPs, the corrected MR-PRESSO results remained significant. For other heart disease-related traits, coronary atherosclerosis (OR_90 [...] were also associated with higher odds of 90th and 99th percentile longevity. After removal of the potential outlying SNP through the MR-PRESSO outlier test, significant effects remained for the two traits (Table 2). We also noted an association between self-reported high cholesterol (OR_90 = 0.81 [0.72, 0.91], P_90 = 0.0003) and the two longevity outcomes. After removal of the outlying SNP identified by the MR-PRESSO outlier test, the significant effect on 90th percentile longevity remained ( [...] = 0.0016) was also associated with lower odds of both longevity outcomes without any evidence of heterogeneity or pleiotropy.
In the physical measures category, seven exposures related to body morphology showed hazardous effects on both longevity outcomes, including arm fat mass [...] [1.71, 6.24], P_90 = 0.0003). Regular use of blood pressure medication, cholesterol-lowering medication, metformin, and aspirin was also significantly associated with lower odds of 90th and 99th percentile longevity. In addition, a greater number of medications taken (OR_90 = 0.37 [0.18, 0.77], P_90 = 0.008) was suggestively associated with lower odds of surviving to the 90th and 99th percentile age (see Additional file 1: Table S11-S12), while taking no medications showed the opposite (Fig. 2). In the medication and family history categories, results either showed no potential pleiotropy in the MR-Egger intercept test or remained significant for 90th percentile longevity after removal of outlying SNPs (Table 2).
Fig. 2 Mendelian randomization estimates for the association between genetically predicted exposures and longevity in the primary analysis. The estimates presented here were calculated by the IVW method. *Group 1: heart disease, stroke, high blood pressure, chronic bronchitis/emphysema, Alzheimer's disease/dementia, and diabetes. AD, Alzheimer's disease; CI, confidence interval; COPD, chronic obstructive pulmonary diseases; DVT, deep venous thrombosis; FDR, false discovery rates; N, number or sample size; OR, odds ratio; SNPs, single nucleotide polymorphisms
Additionally, we found that comparative height size at age 10 (OR_90 = 0.77 [0.65, 0.92], P_90 = 0.0035) and never eating sugar or foods/drinks containing sugar (OR_90 = 0.59 [0.44, 0.80], P_90 = 0.0005) were associated with lower odds of 90th and 99th percentile longevity. Substantial pleiotropy was detected only for comparative height size at age 10 (MR-Egger intercept P = 0.004; global test P = 0.006), but the corrected MR-PRESSO result remained significant after removal of outlying variants.
Comparing the results for the 53 reported exposures between the primary and secondary analyses, four traits in the disease category [...] ) were associated with higher odds of longevity only in the 90th percentile data. All significant exposures identified in the secondary analysis also showed significant results in the primary stage.

Validation
In the validation, the results for myocardial infarction (P_90 = 0.036), coronary artery disease (P_90 = 0.004), VTE (P_90 = 0.017), AD (P_90 = 3.0 × 10^-5), trunk fat mass (P_90 = 0.039), and educational attainment (i.e., the number of years of schooling completed; P_90 = 0.020) supported our MR estimates from screening. [...] Table S10-S11). SBP and body fat mass were non-significant in the validation, but statistical power was insufficient to rule out the effects observed in the primary analysis. For all exposures in the validation, the Egger intercept test showed no pleiotropy. After adjusting for genetic correlation between different types of blood lipids, the association between HDL-C and longevity was partially attenuated (OR_90 [...]

Potential components of human longevity exposome
After screening and validation, robust exposures were considered components of the longevity exposome. These included 39 exposures that were associated with both 90th and 99th percentile longevity at significant or suggestive levels in screening (see Additional file 1: Table S3), as well as VTE, AD, trunk fat mass, and educational attainment, which were significant in the validation (Figs. 3 and 4). Of note, malignant neoplasm of prostate, BMI, and waist circumference were excluded because their validation results were non-significant despite high power. Atrial fibrillation and flutter and age at first sexual intercourse were not considered components of the longevity exposome because these two results could be verified neither in the secondary analysis nor in the validation.

Discussion
This is the first study to use the MR approach to reveal causal components of the longevity exposome. We found evidence that some heart diseases, metabolic syndromes, AD, VTE, greater body fat, higher comparative height size at age 10, and never eating sugar or foods/drinks containing sugar have adverse effects on longevity, whereas higher HDL-C levels and higher educational attainment have protective effects.
Our findings suggest that susceptibility to age-related diseases may significantly affect human longevity, and they are consistent with previous investigations. A progressive delay in the onset of age-related diseases, including ischemic heart disease, coronary atherosclerosis, angina, and AD, has been found to accompany increasing survival age [22]. Notably, GWAS have found that human longevity shares genetic correlations with CVD [3]. However, previous studies did not investigate the potential association using robust genetic analyses. By using the MR method, we strengthen the evidence for causal effects of cardiovascular diseases on human longevity. Our MR study also demonstrated that hypertension, T2D, and higher LDL-C levels were associated with lower odds of longevity, which strongly confirms previous observational studies [2,5,8,23]. The development of metabolic syndrome is believed to involve genomic instability, telomere attrition, epigenetic alterations, and loss of proteostasis [24], thus reducing survival age. A healthy metabolic profile that avoids or delays the occurrence of metabolic syndrome may prolong longevity, as our results show a positive association between age at diagnosis of high blood pressure and longevity at a suggestive level (see Additional file 1: Table S10-S12). Previous studies have also shown correlations between an exceptionally healthy metabolic profile and human longevity [5,25], offering new insights into the complexity of longevity. Furthermore, it is well known that many of these metabolic factors act as risk factors for CVD, metabolic syndrome, and AD. As these exposures interact and are intertwined, further studies are needed to decipher the pathways underlying these causal associations.
The protective effect of HDL-C was still significant in our study even after adjusting for LDL-C and triglycerides. As the genetically predicted HDL-C is not causally associated with CVD [26,27], the relationship between HDL-C and longevity is unexpected and the underlying mechanism is not clear. HDL-C levels may affect longevity through complex relationships involving diverse factors [28]. Future studies focusing on the quality and components of HDL rather than the simple measurement of HDL may help to clarify the underlying mechanisms behind this relationship.
Although some published studies have indicated an association between BMI and human lifespan [2], our results for BMI conflicted between the screening and validation stages. The conflicting results may be attributable to a non-linear relation between BMI and longevity. As previous MR and observational studies showed, the relation between BMI and all-cause mortality is J-shaped [29,30], and underweight is also correlated with a higher risk of mortality. In addition, the relation between BMI and mortality is affected by smoking status and age [29]. Thus, it is reasonable not to simply include higher BMI among the hazardous components of the longevity exposome. However, some body fat traits showed robust associations with longevity, while body fat-free mass and weight were unrelated to longevity (see Additional file 1: Table S10). Based on these results, a practical recommendation for longevity is to reduce body fat rather than focus on body fat-free mass or weight. In particular, higher trunk fat mass was adversely associated with human longevity. As a marker of central adiposity, it has been linked with an increased risk of CVD and metabolic diseases [31], which may be one of the potential mechanisms.
Fig. 4 Components of the longevity exposome. COPD, chronic obstructive pulmonary diseases; DBP, diastolic blood pressure; HTN, hypertension; HDL-C, high-density lipoprotein cholesterol; LDL-C, low-density lipoprotein cholesterol; SBP, systolic blood pressure; T2D, type 2 diabetes; TC, total cholesterol
Height in adulthood is believed to be linked with health and longevity, but evidence on the exact effect of height on human longevity is conflicting [32][33][34][35]. Our study clarified that standing height and sitting height were not associated with longevity, even at a suggestive level (see Additional file 1: Table S10-S12). However, higher comparative height size at age 10 was negatively associated with human longevity. This result provides a different research perspective for investigating the relation between height and longevity.
Our results indicated a protective effect of higher educational attainment, especially gaining a college or university degree, on longevity. This is supported by previous evidence that higher life expectancies are associated with greater educational levels [36][37][38]. Education has also been proposed as a protective factor for both AD and CVD outcomes [39,40]. Whether the protective effect of education on longevity is achieved by reducing the risk of CVD or AD needs further investigation.
Strengths of the study include the adoption of the MR approach to assess the causal effects of a wide array of factors, making the most of large datasets and reducing selection bias. Our study identified some exposures that have never been investigated in MR frameworks of longevity, such as VTE, family history, body fat, diet, and comparative height in early life. Furthermore, the careful definition of the longevity phenotype allowed us to propose components of the exposome causally linked to longevity more precisely, since the outcome in previous MR studies was limited to mortality or parental lifespan [2,8]. Meanwhile, as with all MR studies, the exclusion of pleiotropy or alternative direct causal pathways is a conspicuous challenge. Although all the reported causal exposures in this study showed no pleiotropy in the Egger intercept test, significant Q tests for some traits indicated substantial heterogeneity in the analysis. To guard against violations of the MR assumptions, we conducted sensitivity analyses with the weighted median, MR-Egger, and MR-PRESSO methods. These methods can provide unbiased causal effect estimates at the cost of reduced power when invalid IVs exist [16,17], and the MR-PRESSO outlier test can return an unbiased result by removing potential outlying SNPs [18]. For each significant causal exposure in screening, the point estimates in the sensitivity analyses were consistent with IVW, enhancing the robustness of our results [12]. Moreover, confidence was increased by the validation using independent exposure datasets. For exposures with vague phenotype descriptions in UKB, more detailed causal traits such as LDL-C and T2D were included in the validation analysis using non-UKB exposure data.
There are some limitations to the present study. First, although we used the largest longevity dataset [3], the power for some exposures was below 80%. For example, smoking-related traits showed non-significant effects on longevity; however, because of the limited power (Additional file 2: Table S13-S14), we cannot rule out that they affect longevity. Second, not all significant exposures could be validated, owing to the lack of appropriate non-UKB data. It is important to note that the absence of a validation result does not disconfirm the robustness of a causal factor, but it points to the need for further studies with more comprehensive exposure phenotypes and larger sample sizes. Third, some of the exposures from UKB are ordinal variables but were treated as continuous when calculating betas for the effect allele at each SNP, which makes the estimates difficult to interpret quantitatively in the subsequent MR analysis. In addition, the findings were derived from participants of European ancestry who were recruited between the ages of 40 and 69 and may not be generalizable to other populations [11]. Another limitation is that, for some exposure GWASs, only sex-stratified data were available in UK Biobank, whereas the outcome dataset combines men and women. However, the effect estimates were very similar between men and women (Fig. 2), indicating that the results were reliable. Moreover, a few SNPs overlapped among some exposures, which may suggest that these exposures affect longevity through an interaction. Further studies are required to clarify the mechanisms underpinning these causal associations.
Based on our findings, it is clear that interventions targeting cardiovascular disease, metabolic syndrome, AD, and VTE are needed for the overall benefit of human longevity. Several prevention strategies have been proposed in the published literature and should be widely publicized [24,39,40]. We recommend reducing body fat mass, especially trunk fat mass, rather than simply focusing on losing weight. In the long term, receiving higher-level education, at least gaining a college or university degree, can generate persistent benefits for longevity. Moreover, appropriate intake of sugar or foods/drinks containing sugar is recommended for the general population.