Adult height and risk of 50 diseases: a combined epidemiological and genetic analysis

Background Adult height is associated with risk of several diseases, but the breadth of such associations, and whether they are primary or due to confounding, is unclear. We examined the association of adult height with 50 diseases spanning multiple body systems using both epidemiological and genetic approaches, the latter to identify unconfounded associations and possible underlying mechanisms. Methods We examined the associations for adult height (using logistic regression adjusted for potential confounders) and genetically determined height (using a two-sample Mendelian randomisation approach with height-associated genetic variants as instrumental variables) in 417,434 individuals of white ethnic background participating in the UK Biobank. We undertook pathway analysis of height-associated genes to identify biological processes that could link height and specific diseases. Results Height was associated with 32 diseases and genetically determined height with 12 diseases. 
Of these, 11 diseases showed a concordant association in both analyses, with taller height associated with reduced risks of coronary artery disease (odds ratio per standard deviation (SD) increase in height ORepi = 0.80, 95% CI 0.78–0.81; OR per SD increase in genetically determined height ORgen = 0.86, 95% CI 0.82–0.90), hypertension (ORepi = 0.83, 95% CI 0.82–0.84; ORgen = 0.88, 95% CI 0.85–0.91), gastro-oesophageal reflux disease (ORepi = 0.85, 95% CI 0.84–0.86; ORgen = 0.94, 95% CI 0.92–0.97) and diaphragmatic hernia (ORepi = 0.81, 95% CI 0.79–0.82; ORgen = 0.91, 95% CI 0.88–0.94), but increased risks of atrial fibrillation (ORepi = 1.42, 95% CI 1.38–1.45; ORgen = 1.33, 95% CI 1.26–1.40), venous thromboembolism (ORepi = 1.18, 95% CI 1.16–1.21; ORgen = 1.15, 95% CI 1.11–1.19), intervertebral disc disorder (ORepi = 1.15, 95% CI 1.13–1.18; ORgen = 1.14, 95% CI 1.09–1.20), hip fracture (ORepi = 1.19, 95% CI 1.12–1.26; ORgen = 1.27, 95% CI 1.17–1.39), vasculitis (ORepi = 1.15, 95% CI 1.11–1.19; ORgen = 1.20, 95% CI 1.14–1.28), cancer overall (ORepi = 1.09, 95% CI 1.08–1.11; ORgen = 1.06, 95% CI 1.04–1.08) and breast cancer (ORepi = 1.08, 95% CI 1.06–1.10; ORgen = 1.07, 95% CI 1.03–1.11). Pathway analysis showed multiple height-associated pathways associated with individual diseases. Conclusions Adult height is associated with risk of a range of diseases. We confirmed previously reported height associations for coronary artery disease, atrial fibrillation, venous thromboembolism, intervertebral disc disorder, hip fracture and cancer, and identified potential novel associations for gastro-oesophageal reflux disease, diaphragmatic hernia and vasculitis. Multiple biological mechanisms affecting height may affect the risks of these diseases. Electronic supplementary material The online version of this article (10.1186/s12916-018-1175-7) contains supplementary material, which is available to authorized users.


Background
Epidemiological studies have associated greater adult height with lower risk of mortality from coronary artery disease (CAD) and respiratory diseases [1][2][3][4][5][6][7] and with increased risk of atrial fibrillation (AF) [8,9], venous thromboembolism (VTE) [9][10][11], and cancer overall and at specific sites [1,7,12-16]. Although such studies have typically adjusted for age, sex and some socio-economic and behavioural risk factors, the observed associations may still be due to unmeasured confounding. Height itself is determined by genetics and by early-life factors such as nutrient availability, socio-economic circumstances and disease [17][18][19], and some of these can themselves affect the risk of disease in later life. The precise reasons for the association of adult height with risk of these diseases are not known. In particular, it is unclear whether the associations are primary (due to shared biological pathways affecting both adult height and disease risk) rather than due to confounding.
Genetic approaches (Mendelian randomisation) that use genetic variants as instrumental variables have been used to test for causal relationships between exposures and disease outcomes [20]. Genotypes are fixed at conception, with alleles randomly passed on from each parent, and are therefore largely independent of the environmental and lifestyle factors that can confound epidemiological analyses. They also cannot be altered by disease, which removes the possibility of reverse causality. Several recent studies have used a Mendelian randomisation approach to assess the relationship of height with selected diseases, including CAD [21,22], stroke [22], VTE [23] and cancers [16,24-28].
A recent genome-wide association study (GWAS) meta-analysis by the GIANT consortium, involving 253,288 individuals of European ancestry, identified 697 height-associated variants that explain ~20% of the heritability of adult height [29]. Here, using this set of genetic variants and the breadth and scale of the UK Biobank [30], we comprehensively evaluated the associations of adult height with 50 diseases in multiple body systems in the same population, using both traditional epidemiological and genetic approaches. We additionally undertook pathway analysis of the height-associated genes to explore potential biological processes underlying the associations.

Study design and setting
Details of the design of the UK Biobank have been reported elsewhere [30]. Briefly, the UK Biobank is a population-based longitudinal study that recruited 500,000 participants aged 40-69 years during 2006-2010 from throughout the United Kingdom (UK). Participants were recruited by inviting just over 9 million individuals in the appropriate age group, identified from central NHS registration databases and living within around 20-25 miles of one of 22 recruitment centres in England, Scotland and Wales; the eventual participation rate was around 5%. Detailed data on sociodemographic characteristics, health status, family history and lifestyle were collected via questionnaires. At the assessment centres, standing height was measured using a Seca 202 device, other physical measurements such as weight and blood pressure were taken, and biological samples were collected for further analysis.
The UK Biobank data have been linked to Hospital Episode Statistics (HES), as well as to national death and cancer registries. HES data cover admissions to NHS hospitals in the UK between April 1997 and March 2015, with the Scottish data dating back as early as 1981. HES uses the International Classification of Diseases (ICD-9 and ICD-10) to record diagnoses, and OPCS-4 (Office of Population, Censuses and Surveys: Classification of Interventions and Procedures, version 4) to code operative procedures. Death registries include all deaths in the UK up to mid-2015, with both primary and contributory causes of death coded in ICD-10. Cancer registries cover registrations across the UK from the 1970s to 2014, with diagnoses (coded in ICD-9 and ICD-10) acquired from a variety of sources, including NHS and private hospitals, cancer screening programmes, cancer centres, hospices and nursing homes, general practices, HES, death certificates and Cancer Waiting Time (CWT) data.
The UK Biobank performed genome-wide genotyping using the Affymetrix UK BiLEVE Axiom array on the first 50,000 participants as part of the BiLEVE study [31], and subsequently using the Affymetrix UK Biobank Axiom® array for the remaining cohort. The two arrays are very similar, with over 95% overlapping content. Details on genotyping, quality control and imputation methodology have been described elsewhere [32]. The assayed genotype data from both stages have been jointly imputed, providing genome-wide genotypes on over 70 million SNPs for each participant.

Study participants
We included 459,324 UK Biobank participants with genotype data who self-reported a white ethnic background. We excluded 985 participants with missing height and 22 participants whose height was more than 4 standard deviations (SD) from the mean (< 131.6 cm or > 205.6 cm). On the basis of the genetic data, we further excluded participants because of uncertain gender (n = 354), poor genotype data quality (high missingness, QC failure, etc.) (n = 7025) and relatedness (kinship coefficient > 0.088) (n = 33,504). Altogether, we investigated 417,434 unrelated individuals of white ethnic background with valid height measures and genotype data.
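As a quick consistency check, the exclusions listed above can be replayed in order. This is a minimal sketch that uses only the counts reported in the text; the dictionary labels are our own shorthand.

```python
# Participant flow, using only the counts reported in the paragraph above.
start = 459_324  # genotyped participants reporting a white ethnic background
exclusions = {
    "missing height": 985,
    "height > 4 SD from the mean": 22,
    "uncertain gender": 354,
    "genotype quality (missingness, QC failure, etc.)": 7_025,
    "relatedness (kinship coefficient > 0.088)": 33_504,
}

remaining = start
for reason, n in exclusions.items():
    remaining -= n
    print(f"after excluding {reason:<50} {remaining:>9,}")

analysed = remaining  # expected to match the analytic sample of 417,434
```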

Disease definition
We examined 50 diseases covering six broad categories: (i) cardiovascular diseases: coronary artery disease (CAD), hypertension, peripheral vascular disease (PVD), heart failure (HF), atrial fibrillation (AF), venous thromboembolism (VTE), aortic valve stenosis (AS) and stroke; (ii) musculoskeletal diseases: osteoarthritis, osteoporosis, gout, sciatica, intervertebral disc disorder (IDD) and hip fracture; (iii) digestive diseases: liver cirrhosis, peptic ulcer, gastro-oesophageal reflux disease (GORD), irritable bowel syndrome (IBS), inflammatory bowel disease (IBD), gallstones, appendicitis, diaphragmatic hernia and inguinal hernia; (iv) psychiatric and neurological diseases: anxiety disorder, depression, bipolar disorder, multiple sclerosis (MS), epilepsy, dementia and Parkinson's disease; (v) other non-neoplastic diseases: chronic obstructive pulmonary disease (COPD), asthma, diabetes, hyperthyroidism, hypothyroidism, glaucoma, cataract and vasculitis; and (vi) cancer, including cancer overall and 11 specific sites: lung, colorectum, prostate, female breast, uterus, ovary, kidney, bladder, melanoma, non-Hodgkin lymphoma and leukemia. We defined cases using both self-reported and registry data and included both prevalent and incident cases. Case definitions and the proportion of cases that were self-reported are shown in Additional file 1: Table S1. All diseases had a minimum of 1000 cases among the 417,434 participants, providing 80% power to detect a relative 15% difference in disease risk (i.e. an odds ratio of at least 1.15 or at most 0.85) per one SD change in height at α = 0.001.
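The stated power can be roughly reproduced with a standard closed-form approximation. The sketch below assumes Hsieh's sample-size approximation for logistic regression with a single normally distributed covariate (height in SD units), rearranged to solve for power, and ignores the other covariates. The text does not say which power method was used, so this is an illustrative check rather than the authors' calculation; `approx_power` is a hypothetical helper name.

```python
from math import log, sqrt
from statistics import NormalDist


def approx_power(n_total: int, n_cases: int, odds_ratio: float, alpha: float) -> float:
    """Approximate power of a two-sided Wald test for an OR per 1 SD of a normal covariate.

    Rearranges Hsieh's approximation for logistic regression,
    n = (z_{1-alpha/2} + z_{1-beta})^2 / (p (1 - p) b^2), to solve for power.
    Other covariates are ignored (no variance-inflation correction).
    """
    std_normal = NormalDist()
    p = n_cases / n_total               # overall proportion of cases
    b = abs(log(odds_ratio))            # |log OR| per 1 SD of height
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = sqrt(n_total * p * (1 - p)) * b - z_alpha
    return std_normal.cdf(z_power)


# Smallest diseases in the study: about 1000 cases among 417,434 participants.
for target_or in (1.15, 0.85):
    power = approx_power(417_434, 1_000, target_or, alpha=0.001)
    print(f"OR {target_or}: approximate power {power:.2f}")
```

Under this approximation both target ORs reach more than 80% power with 1000 cases, and an OR of 0.85 is slightly easier to detect than 1.15 because |log 0.85| > |log 1.15|.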

Statistical analysis
Epidemiological analysis
We used logistic regression to estimate the association of height with risk of each disease, adjusted for age, sex, obesity (BMI ≥ 30 kg/m²), socio-economic status (based on quintiles of the Townsend Deprivation Index [33], an area-based measure of material deprivation calculated from census household data, with a higher index indicating greater deprivation), smoking status (ever smoker, exposed to environmental tobacco smoke, none) and physical activity. For individual diseases, models were additionally adjusted, where appropriate, for other relevant factors: waist-hip ratio (WHR), systolic blood pressure (SBP), use of insulin, presence of hay fever/eczema and family history of relevant diseases, and, for female diseases, parity (nulliparity), ever use of contraceptive pills and ever use of hormone replacement therapy. Further details of the adjustments are given in the legend to Fig. 1.
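To make the per-SD scaling concrete, the sketch below fits a covariate-adjusted logistic regression on simulated data and reports the OR per 1 SD of height with a Wald 95% CI. It is a minimal illustration under stated assumptions: the cohort, the effect sizes and the reduced covariate set (age, sex, obesity) are hypothetical, and the model is fitted with a hand-written Newton-Raphson (IRLS) routine in NumPy, not with the software the authors used.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Newton-Raphson (IRLS) logistic regression. X must include an intercept column.

    Returns (coefficients, standard errors) on the log-odds scale.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))          # fitted probabilities
        hessian = X.T @ (X * (mu * (1.0 - mu))[:, None])
        step = np.linalg.solve(hessian, X.T @ (y - mu))
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    cov = np.linalg.inv(X.T @ (X * (mu * (1.0 - mu))[:, None]))
    return beta, np.sqrt(np.diag(cov))


# Hypothetical simulated cohort (NOT UK Biobank data).
rng = np.random.default_rng(2018)
n = 20_000
male = rng.integers(0, 2, n)
age = rng.uniform(40, 69, n)
obese = (rng.random(n) < 0.24).astype(float)
height_cm = 162.0 + 13.0 * male + rng.normal(0.0, 6.5, n)
height_sd = (height_cm - height_cm.mean()) / height_cm.std()   # height in SD units

true_log_or = np.log(0.80)  # assumed true OR per 1 SD of height
logit = -2.0 + true_log_or * height_sd + 0.03 * (age - 55) + 0.3 * obese + 0.2 * male
disease = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

# Adjusting for sex matters here because height and sex are correlated.
X = np.column_stack([np.ones(n), height_sd, age, male, obese])
coef, se = fit_logistic(X, disease)
or_per_sd = np.exp(coef[1])
ci = np.exp(coef[1] + np.array([-1.96, 1.96]) * se[1])
print(f"OR per SD = {or_per_sd:.2f} (95% CI {ci[0]:.2f}-{ci[1]:.2f})")
```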

Genetic analysis
We used a two-sample Mendelian randomisation (MR) approach to assess the association of genetically determined height with each disease. Summary statistics for the effect sizes of the height-associated SNPs were extracted from Wood et al. [29]. Of the 697 height-associated SNPs, 6 were unavailable in the genotype data imputed from the Haplotype Reference Consortium panel. Of the remaining 691 variants serving as instrumental variables, 146 were genotyped and 545 were imputed with excellent imputation quality (mean 0.99, lowest 0.84). Effects of these height-associated SNPs on the risk of each disease were estimated in the UK Biobank data under an additive model adjusting for age, sex, array type (BiLEVE or main study) and five principal components. We calculated the ratio estimate for each height-associated variant [20] and then combined the estimates across all variants using meta-analysis [34]. For each height-associated variant, we extracted β1 (the effect size of the association between the variant and height) from the published table [29], estimated β2 (the effect size of the association between the variant and the disease) under an additive mode of inheritance, and computed β3 (the putative association between height and risk of disease mediated through that variant) using the equation $\beta_3 = \beta_2 / \beta_1$, with standard error $\mathrm{SE}(\beta_3) = s_2 / |\beta_1|$, where $s_2$ is the standard error of β2. Estimates of β3 across all the height-associated SNPs were then pooled using inverse-variance-weighted fixed-effect meta-analysis to obtain the overall association of genetically determined height with the risk of the disease [34]. In cases of high heterogeneity (I² > 40%), we conducted a random-effects meta-analysis. The estimated β3 reflects the logarithm of the odds ratio of developing the disease per one SD increase in genetically determined height. We adjusted the p values for testing 50 diseases using Bonferroni correction in both the epidemiological and genetic analyses.
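The ratio-estimate and pooling steps described above can be sketched as follows. This is an illustrative implementation that assumes first-order standard errors for the ratio estimates (ignoring uncertainty in β1) and the DerSimonian-Laird estimator for the random-effects model; the text does not name the random-effects estimator. The function names and the five-variant input are hypothetical.

```python
from math import exp, log, sqrt


def wald_ratios(beta1, beta2, se2):
    """Per-variant ratio estimates beta3 = beta2 / beta1 and first-order SEs s2 / |beta1|."""
    beta3 = [b2 / b1 for b1, b2 in zip(beta1, beta2)]
    se3 = [s2 / abs(b1) for b1, s2 in zip(beta1, se2)]
    return beta3, se3


def pool(est, se, i2_threshold=0.40):
    """Inverse-variance-weighted fixed-effect pooling, switching to a
    DerSimonian-Laird random-effects model when I^2 exceeds the threshold."""
    w = [1 / s**2 for s in se]
    sum_w = sum(w)
    fixed = sum(wi * b for wi, b in zip(w, est)) / sum_w
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, est))   # Cochran's Q
    df = len(est) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    if i2 <= i2_threshold:
        return fixed, sqrt(1 / sum_w), i2, "fixed"
    # Between-variant variance (DerSimonian-Laird moment estimator)
    tau2 = max(0.0, (q - df) / (sum_w - sum(wi**2 for wi in w) / sum_w))
    w_re = [1 / (s**2 + tau2) for s in se]
    pooled = sum(wi * b for wi, b in zip(w_re, est)) / sum(w_re)
    return pooled, sqrt(1 / sum(w_re)), i2, "random"


# Hypothetical inputs for five variants (NOT values from Wood et al. or UK Biobank).
beta1 = [0.05, 0.08, 0.03, 0.06, 0.04]      # per-allele effect on height (SD units)
beta2 = [b * log(0.86) for b in beta1]      # per-allele log-OR for the disease
se2 = [0.010, 0.012, 0.009, 0.011, 0.010]

beta3, se3 = wald_ratios(beta1, beta2, se2)
est, se, i2, model = pool(beta3, se3)
print(f"{model}-effect OR per SD = {exp(est):.2f}, "
      f"95% CI {exp(est - 1.96 * se):.2f}-{exp(est + 1.96 * se):.2f}, I2 = {i2:.0%}")
```

Because the toy β2 values are exactly proportional to β1, every ratio estimate is identical, I² is zero and the fixed-effect model is kept; heterogeneous inputs would trigger the random-effects branch.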

Sensitivity analysis
The MR analysis assumes a linear relationship between height and risk of disease. It also relies on three assumptions about the SNPs selected as instruments [20]: they are (1) associated with height, (2) not associated with confounding factors and (3) associated with the disease only through their effect on height. We assessed the potential impact of violations of these assumptions using MR-Egger regression [35] and median-based methods [36] as sensitivity analyses. Standard MR analysis is equivalent to performing a weighted regression of the variant-disease effect sizes (β2) on the variant-height effect sizes (β1) with no intercept term, whereas MR-Egger performs this regression with an unconstrained intercept. The estimated intercept in MR-Egger regression [35] can be interpreted as an estimate of the average pleiotropic effect across the genetic variants. A non-zero intercept is indicative of directional pleiotropy; that is, the pleiotropic effects do not cancel out, resulting in a biased MR estimate. We also applied three median-based methods [36]: (1) the simple median method, which takes the median β3 estimate, giving all variants equal weight; (2) the weighted median method, which gives more weight to variants with more precise estimates; and (3) the penalised weighted median method, which down-weights the contribution of heterogeneous variants. MR-Egger regression can give consistent estimates even when 100% of variants are invalid, but requires the variants to satisfy a weaker assumption (the InSIDE assumption), whilst the weighted median methods can give consistent estimates as long as at least 50% of the weight comes from variants without pleiotropic effects. Additionally, we examined the association of genetically determined height with risk factors and repeated the genetic analysis excluding SNPs that showed a nominal association (p < 0.05) with suggested confounding factors.
We restricted these sensitivity analyses to diseases showing evidence of association in primary genetic analysis (Bonferroni adjusted p < 0.05).
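The MR-Egger regression and the three median-based estimators described above can be sketched in plain Python. The sketch returns point estimates only (no standard errors or bootstrap confidence intervals), uses inverse-variance weights of the ratio estimates for the weighted median, and applies the min(1, 20q) penalty commonly used for the penalised weighted median; these are assumptions, not confirmed details of the authors' software. Function names and inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mr_egger(beta1, beta2, se2):
    """MR-Egger point estimates: weighted regression of beta2 on beta1 WITH an intercept.

    Variants are first oriented so every beta1 is positive; weights are 1 / se2^2.
    Returns (intercept, slope): intercept ~ average directional pleiotropy,
    slope ~ pleiotropy-adjusted causal estimate.
    """
    pts = [(abs(b1), b2 if b1 > 0 else -b2, 1 / s**2) for b1, b2, s in zip(beta1, beta2, se2)]
    sw = sum(w for _, _, w in pts)
    x_bar = sum(w * x for x, _, w in pts) / sw
    y_bar = sum(w * y for _, y, w in pts) / sw
    sxy = sum(w * (x - x_bar) * (y - y_bar) for x, y, w in pts)
    sxx = sum(w * (x - x_bar) ** 2 for x, _, w in pts)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def weighted_median(est, weights):
    """Weighted median with linear interpolation between neighbouring estimates.
    Equal weights give the simple median."""
    pairs = sorted(zip(est, weights))
    b = [e for e, _ in pairs]
    total = sum(w for _, w in pairs)
    cum, running = [], 0.0
    for _, w in pairs:
        running += w / total
        cum.append(running - (w / total) / 2)   # p_j = sum_{k<=j} w_k - w_j / 2
    if cum[0] >= 0.5:
        return b[0]
    j = max(i for i, p in enumerate(cum) if p < 0.5)
    if j == len(b) - 1:
        return b[-1]
    return b[j] + (b[j + 1] - b[j]) * (0.5 - cum[j]) / (cum[j + 1] - cum[j])


def penalised_weights(est, se):
    """Inverse-variance weights multiplied by min(1, 20 * q_j), where q_j is the
    chi-square(1) p-value of variant j's contribution to Cochran's Q."""
    w = [1 / s**2 for s in se]
    ivw = sum(wi * e for wi, e in zip(w, est)) / sum(w)
    out = []
    for wi, e in zip(w, est):
        q_stat = wi * (e - ivw) ** 2
        p_val = 2 * (1 - NormalDist().cdf(sqrt(q_stat)))  # P(chi2_1 > q_stat)
        out.append(wi * min(1.0, 20 * p_val))
    return out


# Hypothetical summary statistics (NOT study data): beta2 = 0.01 + 0.5 * beta1 exactly,
# i.e. a directional pleiotropic intercept of 0.01 and a causal slope of 0.5.
beta1 = [0.02, 0.04, 0.06, 0.08, 0.10]
beta2 = [0.01 + 0.5 * b for b in beta1]
se2 = [0.010, 0.012, 0.011, 0.013, 0.010]

intercept, slope = mr_egger(beta1, beta2, se2)
ratio = [b2 / b1 for b1, b2 in zip(beta1, beta2)]
ratio_se = [s / abs(b1) for b1, s in zip(beta1, se2)]
simple = weighted_median(ratio, [1.0] * len(ratio))
weighted = weighted_median(ratio, [1 / s**2 for s in ratio_se])
penalised = weighted_median(ratio, penalised_weights(ratio, ratio_se))
print(f"Egger intercept {intercept:.3f}, slope {slope:.3f}; "
      f"medians: simple {simple:.3f}, weighted {weighted:.3f}, penalised {penalised:.3f}")
```

In this toy example every variant carries the same pleiotropic shift, so the ratio estimates are all biased upwards and the median-based estimates exceed 0.5, whereas MR-Egger recovers the slope of 0.5 and flags the non-zero intercept. This is the InSIDE-type scenario in which the two families of methods can disagree.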
In this study, we defined cases using both self-reported and registry (hospital episodes, cancer and death registries) information. To assess any impact of including self-reported data, we repeated the epidemiological and genetic analyses defining cases based on registry data only. Furthermore, because we included both prevalent and incident cases, the epidemiological analysis assumed that exposure to risk factors changed little over time. To assess any impact of this assumption, we repeated the epidemiological analysis defining cases based on incident cases only.

Genetic score for height
We calculated a genetic score for height to evaluate the effect of inheriting height-increasing alleles on risk of disease. For each variant, we computed a score from the posterior genotype probabilities for the height-increasing allele, which was then multiplied by the effect size for height estimated by Wood et al. [29]. The genetic score is the sum of these weighted values across all 691 height-associated variants. We used regression analysis to assess the proportion of variance in height explained by the genetic score. We divided the study participants into quartiles of genetic score, with quartile 1 (Q1) carrying the fewest and Q4 the most height-increasing alleles. We then used the Cochran-Armitage trend test to test for a trend in disease prevalence across quartiles, and logistic regression to estimate the ORs for each quartile relative to Q1.
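The weighted score, quartile split and trend test can be sketched as below. Assumptions: the per-variant "dosage" is computed as the expected count of the height-increasing allele from the genotype posterior probabilities, quartile cut points come from Python's `statistics.quantiles`, and the trend test uses equally spaced group scores 1-4. These are illustrative choices, not confirmed details of the authors' pipeline, and all inputs are hypothetical.

```python
from bisect import bisect_right
from math import sqrt
from statistics import NormalDist, quantiles


def dosage(p_het, p_hom_increasing):
    """Expected number (0-2) of height-increasing alleles from genotype posterior probabilities."""
    return p_het + 2 * p_hom_increasing


def weighted_score(dosages, effect_sizes):
    """Weighted genetic score: sum over variants of dosage x per-allele effect on height."""
    return sum(d * b for d, b in zip(dosages, effect_sizes))


def assign_quartiles(scores):
    """Quartile labels 1-4 (1 = lowest score, i.e. fewest height-increasing alleles)."""
    cuts = quantiles(scores, n=4)          # three cut points
    return [bisect_right(cuts, s) + 1 for s in scores]


def cochran_armitage(cases, totals, weights=(1, 2, 3, 4)):
    """Two-sided Cochran-Armitage test for a linear trend in proportions across ordered groups."""
    n_all = sum(totals)
    p_bar = sum(cases) / n_all
    t = sum(w * (r - n * p_bar) for w, r, n in zip(weights, cases, totals))
    s1 = sum(n * w for w, n in zip(weights, totals))
    s2 = sum(n * w**2 for w, n in zip(weights, totals))
    var_t = p_bar * (1 - p_bar) * (s2 - s1**2 / n_all)
    z = t / sqrt(var_t)
    return z, 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical example (NOT study data): one participant's dosages at three variants,
# and disease counts across score quartiles of 10,000 people each.
person_score = weighted_score([dosage(0.1, 0.9), dosage(0.0, 0.0), dosage(1.0, 0.0)],
                              [0.06, 0.03, 0.05])
z, p = cochran_armitage(cases=[300, 280, 260, 240], totals=[10_000] * 4)
print(f"score = {person_score:.3f}; trend z = {z:.2f}, p = {p:.4f}")
```

A negative z indicates risk falling from Q1 to Q4 (as for CAD); a positive z indicates rising risk.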

Pathway analysis
To identify possible shared biological processes between height and diseases, we identified pathways represented by the 691 height-associated SNPs using the Ingenuity Pathway Analysis Software (analysis performed on September 19, 2017). This analysis requires the assignment of each height-associated SNP to a specific gene; we used the genes identified by Wood et al. [29] through their extensive bioinformatics analysis of each locus. From the pathways identified, we selected those that contained at least five of the height-related genes, and tested the association of height with risk of each disease through a specific pathway by combining the disease-specific β3 estimates for all height variants in the pathway using meta-analysis. The results were not adjusted for multiple testing, as this analysis was exploratory.
Fig. 1 Epidemiological and genetic associations of height with diseases. Legend: Odds ratios (OR) and 95% confidence intervals per one standard deviation (SD) increase in height, based on observed height (epidemiological model) and genetically determined height (genetic model), are shown for a cardiovascular diseases (coronary artery disease (CAD), peripheral vascular disease (PVD), stroke, hypertension, aortic valve stenosis (AS), heart failure (HF), venous thromboembolism (VTE) and atrial fibrillation (AF)), b musculoskeletal diseases (osteoporosis, osteoarthritis, gout, sciatica, intervertebral disc disorder (IDD) and hip fracture), c digestive disorders (liver cirrhosis, peptic ulcer, diaphragmatic hernia, inguinal hernia, gastro-oesophageal reflux disease (GORD), irritable bowel syndrome (IBS), inflammatory bowel disease (IBD), gallstones and appendicitis), d psychiatric and neurological disorders (dementia, epilepsy, anxiety disorder, depression, bipolar disorder, Parkinson's disease and multiple sclerosis (MS)), e other non-neoplastic diseases (chronic obstructive pulmonary disease (COPD), asthma, diabetes, glaucoma, cataract, hypothyroidism, hyperthyroidism and vasculitis) and f cancer overall and at various sites. One SD is 9.2 cm; for sex-specific diseases in men and women, 1 SD corresponds to 6.8 cm and 6.2 cm, respectively. All epidemiological models were adjusted for age, sex, obesity (BMI ≥ 30 kg/m²), socio-economic status (Townsend deprivation index in highest quintile), smoking status (ever smoker, exposed to environmental tobacco smoke, none), physical activity (vigorous exercise at least once a week) and other relevant disease-specific risk factors, as follows. Models for CAD: waist-hip ratio, systolic blood pressure, use of insulin and family history of heart disease; models for AF, VTE, PVD and heart failure: systolic blood pressure, use of insulin and family history of heart disease; model for hypertension: use of insulin and family history of hypertension; model for stroke: waist-hip ratio, systolic blood pressure, use of insulin and family history of stroke; model for COPD: family history of COPD; model for asthma: presence of hay fever or eczema; model for dementia: family history of dementia; model for depression: family history of depression; model for Parkinson's disease: family history of Parkinson's disease; model for glaucoma: systolic blood pressure and use of insulin; model for diabetes: waist-hip ratio, systolic blood pressure and family history of diabetes; model for cataract: use of insulin; model for cancer overall: family history of lung/breast/prostate/bowel cancer; model for breast cancer: nulliparity, ever use of contraceptive pills, ever use of hormone replacement therapy and family history of breast cancer; models for lung, prostate and colorectal cancers: family history of the respective cancer. *p < 0.05, **p < 0.01, ***p < 0.001 after Bonferroni correction for 50 tests.

Results
Of the 417,434 people included in the study, 54.0% were women, the mean age at recruitment was 56.8 years (range 38-73) and the mean height was 168.7 cm (SD 9.2). Taller people were younger and more likely to have a lower BMI, a lower WHR and lower blood pressure (Table 1). They were also less likely to have ever smoked or to be socio-economically deprived, and more likely to be physically active. Taller women were more likely to be nulliparous and to have ever used the oral contraceptive pill, and less likely to have ever been on hormone replacement therapy. Given that taller women were younger, these observations are likely to be confounded by age.
As expected, people carrying more height-increasing alleles were taller (Table 2). Regression analyses showed that the weighted genetic score explained 16.7% and 16.5% of the variation in height for women and men, respectively. Individuals in the upper quartiles were marginally older, with similar sex composition across quartiles, and appeared to have slightly lower BMI, blood pressure and Townsend Deprivation Index, but did not differ in smoking history or vigorous physical activity. Women with a higher genetic score were more likely to be nulliparous and to have ever been on hormone replacement therapy.

Association of height with diseases based on epidemiological and genetic analyses
The estimated odds ratios (OR) per one SD (9.2 cm) increase in height (epidemiological analysis) and in genetically determined height (genetic analysis) are shown in Fig. 1. For sex-specific diseases in men and women, 1 SD of height corresponds to 6.8 cm and 6.2 cm, respectively. Overall, 39 and 23 diseases showed evidence suggestive of epidemiological and genetic associations with height (p < 0.05), respectively. After adjusting for multiple testing, the association remained for 32 and 12 diseases, respectively (11 in common).
Among psychiatric/neurological diseases, we observed inverse epidemiological associations for dementia, epilepsy, anxiety and depression; however, the corresponding genetic associations were much weaker (Fig. 1d). Similarly, the inverse associations observed for COPD, asthma and diabetes in the epidemiological analysis were not found in the genetic analysis (Fig. 1e). However, taller height was associated with an increased risk of vasculitis in both the epidemiological (OR = 1.15, 95% CI 1.11-1.19, p < 0.0001) and genetic analyses (OR = 1.20, 95% CI 1.14-1.28, p < 0.0001).
Both epidemiological (OR = 1.09, 95% CI 1.08-1.11, p < 0.0001) and genetic analyses (OR = 1.06, 95% CI 1.04-1.08, p < 0.0001) showed evidence of a positive association of height with overall cancer (Fig. 1f). Of the 11 sites, height was associated with 8 and 5 sites at p < 0.05 (unadjusted) in the epidemiological and genetic analyses, respectively. After adjusting for multiple testing, the epidemiological association remained for four sites: female breast (OR = 1.08, 95% CI 1.06-1.10, p < 0.0001), kidney (OR = 1.17, 95% CI 1.08-1.27, p = 0.0053), non-Hodgkin lymphoma (OR = 1.19, 95% CI 1.12-1.27, p < 0.0001) and melanoma (OR = 1.21, 95% CI 1.16-1.26, p < 0.0001), and the genetic association …
Table 3 Legend: Association is expressed as odds ratios per 1 standard deviation increase in genetically determined height, with 95% confidence intervals. CAD coronary artery disease, AF atrial fibrillation, VTE venous thromboembolism, GORD gastro-oesophageal reflux disease. The intercept term in MR-Egger regression can be interpreted as an estimate of the average pleiotropic effect across the genetic variants, with a non-zero intercept indicative of directional pleiotropy. MR-Egger uses standard regression, whilst robust MR-Egger uses robust regression that down-weights the influence of outliers. The median-based method calculates a median of the causal estimates across all SNPs.

Sensitivity analysis
For the 12 diseases showing an association with genetically determined height, the intercept tests from MR-Egger regression revealed little evidence of pleiotropy (Table 3). The associations estimated by MR-Egger regression and median-based methods were broadly similar to those from the primary genetic analyses for the nine non-neoplastic diseases. For cancer overall, the OR estimates from the MR-Egger and weighted median methods remained similar to the primary genetic analysis. For breast cancer and colorectal cancer, the sensitivity analyses appeared to suggest a weaker or null genetic association. We found that genetically determined height was associated with obesity, blood pressure, Townsend Deprivation Index and nulliparity (Additional file 2: Table S2). We therefore repeated the genetic analyses excluding SNPs that were associated with these factors. This had little impact on the estimates of the genetic associations, apart from the anticipated effect of excluding SBP-associated SNPs, which weakened the genetic association for hypertension (Additional file 2: Table S3).
The estimates obtained using registry-based cases only were similar to those using both self-reported and registry-based cases (Additional file 2: Table S4), suggesting that the use of self-reported data had little impact. In addition, we repeated the epidemiological analysis excluding cases prevalent at baseline. There were no significant changes in the estimates, except for IDD, for which the association appeared much weaker (Additional file 2: Table S5). As expected, the confidence intervals became wider owing to reduced statistical power.

Genetic score of height and odds ratios of diseases
For the 12 diseases associated with genetically determined height, all exhibited a trend (p < 0.05) of increasing or decreasing risk with carriage of more height-raising alleles (Fig. 2), and the directions were compatible with the findings from the primary genetic analysis. Compared with participants in Q1, those in Q4 had a decreased risk of CAD (OR = 0.87, 95% CI 0.84-0.91, p < 0.001), …
Fig. 2 Risk of disease by quartiles of weighted genetic score for height. Legend: CAD coronary artery disease, VTE venous thromboembolism, AF atrial fibrillation, IDD intervertebral disc disorder and GORD gastro-oesophageal reflux disease. Associations by quartile of weighted genetic score for height are shown for the 12 diseases that showed an association with genetically determined height (Bonferroni p value < 0.05). Individuals in quartile 1 (Q1) (reference quartile) carry the fewest, and those in Q4 the most, height-increasing alleles. p values for trend: GORD P trend = 0.003, colorectal cancer P trend = 0.003, breast cancer P trend = 0.010, all other diseases P trend < 0.001.

Biological pathways
We identified 202 Ingenuity pathways that included five or more genes from among those assigned to the 691 height-associated variants (Additional file 1: Table S6). Table 4 shows the top five pathways associated with each disease. There was little overlap in the pathways showing the strongest associations with individual diseases. Furthermore, no individual pathway explained the majority of the association with any disease.

Discussion
Our combined epidemiological and genetic analysis showed that adult height is associated with risk of many diseases affecting multiple body systems. We observed concordance between the epidemiological and genetic analyses for 11 diseases, suggesting a primary association between height and risk of these diseases. For colorectal cancer, the genetic analyses suggested a strong association with height, but the epidemiological association was slightly weaker. For some diseases (e.g. HF, COPD), we observed an epidemiological association but a much weaker or null genetic association, suggesting that the epidemiological associations, despite adjustment for potential confounders, are likely to remain subject to residual confounding. We used a large number of genetic variants as instruments in our analysis. It is plausible that some of these variants have effects on disease development that are not mediated by their effects on height (pleiotropy). However, among diseases that showed an association with genetically determined height, we observed trends between carrying more height-raising alleles and disease risk (Fig. 2), consistent with a dose-response relationship that supports a role for height, or for shared biological mechanisms related to both height and the development of disease.
Our analysis does not exclude the possibility that height itself induces behaviours or reflects circumstances that affect disease risk. For example, genetically determined height appeared to be associated with a lower Townsend Deprivation Index, suggesting a potential impact of genetically determined traits on socio-economic outcomes, which could in turn affect disease risk (Additional file 2: Table S2). However, excluding height-related variants that were also associated with the Townsend Deprivation Index did not attenuate the observed associations (Additional file 2: Table S3). In the sensitivity analysis, the penalised weighted median produced a weaker association than the primary genetic analysis for breast and colorectal cancer, suggesting possible heterogeneity of the causal effect across variants. Whilst this may lead to an incorrect estimate and affect the interpretation of the results, it may not lead to inappropriate inferences [37]. For most diseases in this study, the MR-Egger and weighted median analyses provided estimates consistent with the standard genetic analysis; together with the MR-Egger intercept tests, this supports the validity of the selected genetic instruments (Table 3).
Turning to individual sets of diseases, the concordant inverse associations between height and risk of CAD in both epidemiological and genetic analyses, together with the absence of any attenuation of the genetic association on exclusion of lipid-related variants [21], suggest a primary impact of shorter height on the vasculature that predisposes to atherosclerotic disease. Our estimate of an OR of 0.86 per 1 SD (9.2 cm) increase in genetically determined height agrees with previous reports [21,22], which showed an equivalent OR of 0.83-0.86. Shorter people have smaller-calibre vessels, which could cause symptomatic disease despite similar plaque burden [38,39]. Height also affects pulsatile arterial haemodynamics, with increased augmentation of central systolic pressure in shorter people [40], which could influence disease risk in multiple vascular beds.
Our study found evidence of a possible primary association between taller height and risk of AF. Increased atrial size has been recognised as a risk factor for AF [41]. Given the association between body size and left atrial size [42], it is plausible that the height-AF association is mediated through atrial size. Previous epidemiological studies have reported a positive association between height and VTE [9][10][11], and our study supports a genetic role of height. One recent study showed an OR of 1.34 per 10 cm increase in genetically determined height [23], which appears stronger than our estimate of OR = 1.15 per 9.2 cm increase; however, that study used the genetic risk score (GRS) as a single instrumental variable, whereas we used the variants that comprise the GRS as the instrumental variables, and this may account for the difference in the effect estimates [35]. One possible explanation for the association of taller height with increased VTE risk is that the greater venous surface area in taller people increases the risk of VTE.
Extending previous epidemiological studies [43,44], we found a positive relationship between height and risk of IDD. Although the mechanisms remain to be identified, one possibility is facet tropism (asymmetry between the left and right facet joint angles of the lumbar spine) [45]. Our results also agree with a recent meta-analysis of prospective cohort studies that concluded there is a positive association between height and risk of hip fracture [46]. This association might plausibly be mediated by hip axis length, because height is positively associated with hip axis length [47], which has been reported as a risk factor for hip fracture [48]. Our study also found evidence of an inverse association between height and risks of diaphragmatic hernia and GORD. Hiatal hernia is the most common type of diaphragmatic hernia and plays an important role in the pathogenesis of GORD [49], so it is not surprising that height affects the risks of these two diseases in the same direction. Although the mechanisms remain unclear, one previously proposed hypothesis is that shorter people have greater intra-abdominal pressure, which increases the risk of hiatal hernia and subsequent reflux symptoms [50]. Our study also found strong evidence of a positive association between adult height and vasculitis. This suggests a possible link between height and some aspects of immune function, although this requires further investigation. To our knowledge, this is the first analysis to evaluate the association of height with risks of diaphragmatic hernia, GORD and vasculitis.
Consistent with previous epidemiological reports [1,7,12-15], we found that taller height was associated with a higher overall risk of cancer. One recent study [27] using the UK Biobank reported an OR of 1.10 (95% CI 1.07-1.13) per 1-SD increase in genetically determined height. This appears slightly stronger than our estimate (OR = 1.06, 95% CI 1.04-1.08 per 1-SD increase in genetically determined height), although the study designs differ. Ong et al. [27] excluded self-reported cancer cases, used 2059 genetic variants as instrumental variables and derived both the SNP-height and SNP-cancer effects within the UK Biobank. Sample overlap in two-sample MR is known to induce bias [51], so our estimate is potentially more accurate. Nonetheless, both studies strongly suggest that the positive association between height and cancer is primary (not due to confounding), potentially reflecting multiple shared mechanisms influencing cellular growth (Table 4). There were concordant trends towards higher risk with both observed and genetically determined height for various site-specific cancers (Fig. 1f). The diversity of cancer types associated with height, and the differing magnitudes of these associations, suggest that height may affect cancer risk through different biological mechanisms. We observed concordant epidemiological and genetic evidence for breast cancer. Our genetic estimate of OR = 1.07 per 6.2 cm increase (equivalent to OR = 1.12 per 10 cm increase) is similar to previous reports of OR = 1.19 and 1.22 per 10 cm increase in genetically determined height [16,24]. Our study also agrees with previous studies that found evidence of an association between genetically determined height and risk of colorectal cancer [26], and little evidence of an association with prostate cancer [16,25].
For lung cancer, the genetic and epidemiological associations were in opposite directions. Neither association was statistically significant after adjustment for multiple testing, however, which suggests that both could be due to chance. It is also possible that the epidemiological finding for lung cancer remains subject to confounding. Our genetic estimate of OR = 1.15 per 9.2 cm increase is consistent with a previous report of OR = 1.10 per 10 cm increase in genetically determined height [16]. A previous epidemiological report in women suggested possible effect modification by smoking status for smoking-related cancers [14]. Our epidemiological analysis, however, showed no evidence of a difference in the height-lung cancer association between ever and never smokers (p = 0.723 for the interaction of height with smoking status).
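The effect modification by smoking discussed above is commonly tested with a logistic model that includes a height-by-smoking product term. The model below is a sketch under assumptions not spelled out in this section: $h_i$ is height in SD units, $s_i$ is an indicator for ever smoking, and $\mathbf{z}_i$ collects the adjustment covariates. If the analysis used this form, the reported p = 0.723 would correspond to the test of $H_0: \beta_3 = 0$, that is, the same OR per SD of height in ever and never smokers.

```latex
\begin{aligned}
\operatorname{logit}\Pr(Y_i = 1) &= \beta_0 + \beta_1 h_i + \beta_2 s_i + \beta_3\, h_i s_i + \mathbf{z}_i^{\top}\boldsymbol{\gamma},\\
\mathrm{OR}_{\text{never}} &= e^{\beta_1}, \qquad
\mathrm{OR}_{\text{ever}} = e^{\beta_1 + \beta_3}, \qquad
H_0:\ \beta_3 = 0 .
\end{aligned}
```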
Our pathway analysis showed that different height-associated pathways were associated with the risks of individual diseases (Table 4). Several of these pathways have been linked to diseases or disease risk in their respective categories, although in many cases the relationship between the pathways and disease risk is not well understood. Nitric oxide signalling has a known relationship with CAD risk [52], and Wnt signalling has been linked to AF [53]. We also found a link between Wnt signalling and IDD, but its role in the disease remains to be investigated. In addition, we observed links of the 'glioma signalling' and 'role of tissue factor in cancer' pathways with VTE. Several studies have suggested a possible link of VTE with malignant glioma [54,55] and other forms of cancer [56]. We also noted an association of sphingosine-1-phosphate (S1P) signalling with both the 'cancer overall' and colorectal cancer disease categories. S1P is known to have a role in tumorigenesis and tumour growth [57], has been linked with multiple cancer types and is associated with intestinal inflammation and tumorigenesis [58]. The role of Rho GTPases in the development of colorectal cancer has been reported [59], which is consistent with our finding of a link between the 'signalling by Rho family GTPases' pathway and colorectal cancer.
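The paragraph above reports pathways linked to height-associated genes, but this section does not state the enrichment statistic used. Over-representation analyses commonly use a right-tailed hypergeometric (one-sided Fisher's exact) test. The sketch below assumes that form, which may differ from the tool actually used. The function name and the example counts are hypothetical, and no correction for testing many pathways is applied.

```python
from math import comb


def enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """Right-tailed hypergeometric p-value (one-sided Fisher's exact test).

    k: height-associated genes that fall in the pathway
    n: height-associated genes in total
    K: genes annotated to the pathway
    N: genes in the background set
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    # math.comb returns 0 when the lower index exceeds the upper, so impossible
    # configurations contribute nothing to the sum.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)


if __name__ == "__main__":
    # Hypothetical counts: 6 of 150 height genes fall in a 40-gene pathway,
    # against a background of 18,000 genes (about 0.33 expected by chance).
    print(enrichment_p(6, 150, 40, 18000))
```

A small p-value indicates only that the pathway contains more height-associated genes than expected by chance. It does not show that the pathway mediates the height-disease association.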

Limitations of study
Whilst the scale and breadth of the UK Biobank and the ability to examine and directly compare both epidemiological and genetic associations in the same population are particular strengths of our analysis, some limitations need to be highlighted. Although large, the UK Biobank may not be representative of the UK population, with a skew towards individuals in higher socio-economic groups [30]. Despite the low response rate of the UK Biobank, it is reassuring that the associations reported in this paper largely agree with those of other studies. We included both prevalent and incident cases in the primary epidemiological analysis and assumed that exposure to the risk factors recorded at baseline remained constant. Our sensitivity analysis showed that this design generally made little difference compared with a prospective design including incident cases only. Our design allowed a consistent case definition for both the genetic and epidemiological analyses and maximised the statistical power to detect associations. Finally, non-White participants formed only a small minority of the UK Biobank. We therefore restricted our analysis to individuals of a White ethnic background, and it remains to be shown whether the height-related associations apply to other ethnic groups.

Conclusion
Adult height is associated with risks of diseases in multiple body systems. Our study used both epidemiological and genetic approaches. It not only confirmed previously reported height associations for CAD, AF, VTE, IDD, hip fracture and cancer, but also identified potential novel associations for GORD, diaphragmatic hernia and vasculitis. These findings suggest a complex relationship between adult height and disease risk, with shared biological mechanisms underpinning many of the observed height-disease associations.

Additional files
Additional file 1: Table S1. Case definition and data coverage. Table S6. Pathways identified by height-associated variants. (XLSX 45 kb) Additional file 2: Table S2. Association of genetically determined height and disease risk factors. Table S3. Association of genetically determined height and risks of diseases excluding SNPs with potential pleiotropic effects. Table S4. Sensitivity analysis for impact of self-reported cases. Table S5