Study design and participants
Potential participants for the UK Biobank study were first identified using National Health Service (NHS) records, and 9.2 million eligible individuals, aged 40–70 and living within 25 miles of one of the assessment centres in the UK, were invited to participate. Over 500,000 participants (5.5% response rate) consented to participate between 2006 and 2010 [27] and visited one of 22 assessment centres across England, Wales, and Scotland. A full description of the study protocol can be found on the UK Biobank website [27].
The UK Biobank was approved by the NHS North West Multicentre Research Ethics Committee (21/NW/0157). All participants provided informed consent at recruitment, allowing for follow-up using data-linkage to health records. The study was performed in accordance with the Declaration of Helsinki.
Exclusions
Participants were excluded from this analysis if they withdrew consent over the study period (n = 871), had a prevalent cancer diagnosis at recruitment (excluding non-melanoma skin cancer; International Statistical Classification of Diseases (ICD-10) code C44; n = 29,504), had a genetic sex that differed from their reported sex (n = 321), or did not contribute any follow-up time (n = 2; Additional File 1 Fig. S1). Participants who responded ‘do not know’ or ‘prefer not to say’ to all dietary questions regarding meat intake were also excluded from the analyses (n = 282). This left a total of 472,337 participants, of whom 217,937 were males and 254,400 were females. For prostate cancer analyses, women were excluded. For postmenopausal breast cancer analyses, men were excluded, as were women who were premenopausal at recruitment and did not reach the age of 55 over the follow-up time (n = 16,222).
Diet group classification
Diet groups were categorised using the touchscreen questionnaire completed at recruitment, which asked participants about their frequency of consumption of processed meat, beef, lamb or mutton, pork, chicken, turkey or other poultry, and oily and non-oily fish. Participants chose a frequency of intake ranging from “Never” to “Once or more daily”. From these responses, participants were categorised into four diet groups (regular meat-eaters; low meat-eaters; fish-eaters; and vegetarians). Regular meat-eaters were participants who reported consuming processed meat, red meat (beef, pork, lamb), or poultry > 5 times a week. Low meat-eaters were participants who reported consuming processed meat, red meat, or poultry ≤ 5 times a week. Fish-eaters were participants who reported that they never consumed red meat, processed meat, or poultry but ate oily and/or non-oily fish. Vegetarians were defined as participants who reported that they never consumed any meat or fish. The vegetarian group also included vegans, who reported not consuming any meat, fish, dairy, or eggs (n = 446).
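The classification rules above can be sketched in a few lines of Python. This is an illustrative sketch only, not the code used in the analysis: the function name, the numeric "times per week" inputs, and the idea of combining processed meat, red meat, and poultry into a single weekly frequency are all assumptions, since the conversion of questionnaire response categories into weekly frequencies is not described here.

```python
def classify_diet_group(meat_per_week: float, fish_per_week: float) -> str:
    """Assign one of the four diet groups from weekly intake frequencies.

    meat_per_week: ASSUMED combined weekly frequency of processed meat,
        red meat (beef, pork, lamb), and poultry (hypothetical coding).
    fish_per_week: ASSUMED combined weekly frequency of oily and
        non-oily fish (hypothetical coding).
    """
    if meat_per_week > 5:
        return "regular meat-eater"   # > 5 times a week
    if meat_per_week > 0:
        return "low meat-eater"       # some meat, but <= 5 times a week
    if fish_per_week > 0:
        return "fish-eater"           # no meat or poultry, some fish
    return "vegetarian"               # no meat or fish (includes vegans)


if __name__ == "__main__":
    for meat, fish in [(7, 1), (3, 0), (0, 2), (0, 0)]:
        print(meat, fish, classify_diet_group(meat, fish))
```

The checks are ordered so that any meat intake takes precedence over fish intake, which mirrors the definitions of fish-eaters and vegetarians as groups that never consume meat.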
Covariates and biomarkers
The baseline touchscreen questionnaire also asked participants about sociodemographic, reproductive, and lifestyle factors. In addition, all participants had their blood drawn and anthropometric measurements, including height and weight, taken by a trained professional. Further information on covariate data collection and classification can be found in the Additional File 1 Supplementary Methods.
Non-fasting blood samples were provided by 99.7% of participants at recruitment and were shipped to the central processing laboratory at 4 °C prior to serum preparation, aliquoting, and cryopreservation in the central working archive. The biochemistry markers measured included insulin-like growth factor-I (IGF-I) and testosterone, as well as sex hormone-binding globulin, which we used to calculate an estimate of free testosterone [28]. A further description of the UK Biobank biomarker measurements can be found online [29].
Follow-up and outcome ascertainment
Data on cancer diagnoses were ascertained using a combination of records from NHS Digital (cancer registry) and Public Health England for participants from England and Wales and the NHS Central Register for participants from Scotland [30], as well as Hospital Episode Statistics (HES) data for English participants and Scottish Morbidity Records (SMR) for Scottish participants (see the Additional File 1 Supplementary Methods for details). Using the World Health Organization’s ICD-10 codes, participants were classified as having an event if they had an incident diagnosis of cancer recorded as all cancer (C00–C97, excluding non-melanoma skin cancer: C44), colorectal cancer (C18–C20), breast cancer (C50), or prostate cancer (C61), or, if no prior incident diagnosis was reported, if their primary underlying cause of death was the respective cancer. Participants contributed follow-up time from the date of recruitment until the date of the first cancer registration or cancer first recorded on death certificate, date of death, or last day of follow-up available from HES and SMR data (28 February 2021 for England and Scotland), whichever came first. Cancer registry data were available until 31 July 2019 for England and Wales, and 31 October 2015 for Scotland; after this time, only HES and SMR data were used for the follow-up of participants. For Welsh participants, hospital episode data did not extend past the cancer registry censoring date and therefore were not used. For breast cancer, analyses were restricted to postmenopausal breast cancer, and women contributed follow-up time beginning when they turned 55 years of age, or from their date of recruitment if they were categorised as postmenopausal from questions asked at baseline (see Additional File 1 Supplementary Methods for further details) [31].
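As an illustration of the follow-up rule described above, the following Python sketch derives an exit date and event indicator for one participant. It is a simplified sketch under stated assumptions, not the study code: the function and variable names are hypothetical, the input dates are assumed to be already cleaned and linked, and the special entry rule for postmenopausal breast cancer is not handled.

```python
from datetime import date
from typing import Optional, Tuple

# Administrative end of follow-up by country, taken from the dates in the text.
# Welsh participants are censored at the cancer registry date because hospital
# episode data were not used for them.
END_OF_FOLLOW_UP = {
    "England": date(2021, 2, 28),
    "Scotland": date(2021, 2, 28),
    "Wales": date(2019, 7, 31),
}


def exit_date_and_event(
    country: str,
    cancer_date: Optional[date],
    death_date: Optional[date],
) -> Tuple[date, bool]:
    """Return (exit date, event flag) as the earliest of first cancer
    record, death, and administrative censoring (hypothetical helper)."""
    candidates = [END_OF_FOLLOW_UP[country]]
    if cancer_date is not None:
        candidates.append(cancer_date)
    if death_date is not None:
        candidates.append(death_date)
    exit_date = min(candidates)
    # An event is counted only if the cancer record is what ended follow-up;
    # this includes cancers first recorded on the death certificate.
    is_event = cancer_date is not None and cancer_date == exit_date
    return exit_date, is_event


def follow_up_years(recruitment: date, exit_date: date) -> float:
    """Follow-up time in years (365.25-day years, an assumed convention)."""
    return (exit_date - recruitment).days / 365.25
```

A cancer recorded after the administrative censoring date is therefore treated as censored rather than as an event.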
Statistical analyses
Baseline characteristics of UK Biobank participants were summarised across diet groups for all participants, and separately for men and women.
Cox proportional hazards regressions were used, with age as the underlying time variable, to estimate hazard ratios (HR) and 95% confidence intervals (CI). Minimally adjusted models were stratified by sex (for all cancer and colorectal cancer analyses only) and age at recruitment (< 45, 45–49, 50–54, 55–59, 60–64, ≥ 65 years) and adjusted for region at recruitment (North-West England, North-Eastern England, Yorkshire & the Humber, West Midlands, East Midlands, South-East England, South-West England, London, Wales, and Scotland).
Multivariable-adjusted Cox regression models for all analyses were further adjusted for height (eight sex-specific categories increasing by 5 cm, and unknown/missing (0.51%)), physical activity (low: 0–9.99, medium: 10–49.99, high: ≥ 50 metabolic equivalent of task-hours/week, and unknown/missing (4.04%)), Townsend deprivation index (quintiles from most deprived to least deprived, and unknown/missing (0.13%)), education (completion of national exam at age 16, completion of national exam at age 17–18, college or university degree, or other/unknown/missing (18.7%)), employment status (employed, retired, not in paid employment, or unknown (1.15%)), smoking status (never, former, light smoker: ≤ 15 cigarettes/day, medium smoker: 16–29 cigarettes/day, heavy smoker: ≥ 30 cigarettes/day, or missing/unknown (0.65%)), alcohol consumption (non-drinkers, < 1, 1–9.99, 10–19.99, ≥ 20 g/day, or unknown/missing (0.73%)), ethnicity (White, Mixed race or other, Asian or British Asian, and Black or Black British, or missing/unknown (0.56%)), and diabetes status (no, yes, or unknown (0.53%)).
For colorectal cancer and for all cancer sites, multivariable models were further adjusted for female-specific covariates: menopausal hormone therapy (MHT) use (no, former, current, or unknown (0.58%)) and menopausal status at recruitment (premenopausal, postmenopausal, or unknown (9.0%)). Moreover, for colorectal cancer, multivariable models were adjusted for non-steroidal anti-inflammatory drug use (NSAID; no reported use, irregular use, regular use of aspirin/ibuprofen). For prostate cancer, models were additionally adjusted for marital status (not living with a partner, living with a partner) [32]. For postmenopausal breast cancer, models were additionally adjusted for MHT use (same as above), age at menarche (≤ 12 years, 13 years old, ≥ 14 years, or unknown (22.5%)), parity and age at first birth (nulliparous, 1–2 children < 25 years old, 3+ children < 25 years old, 1–2 children 25–29.9 years old, 3+ children 25–29.9 years old, 1–2 children 30+ years old, 3+ children 30+ years old, or missing (0.3%)). Further information on covariate classification can be found in Additional File 1 Supplementary Methods. In all models, the proportional hazards assumption was evaluated using Schoenfeld residuals, and no violations were observed.
We considered BMI as a potential confounder as well as a mediator. When BMI was considered as a potential confounder, BMI measured at recruitment was added to multivariable models (multivariable adjusted + BMI; < 20, 20–22.49, 22.5–24.9, 25.0–27.49, 27.5–29.9, 30–32.49, 32.5–34.9, ≥ 35 kg/m², or unknown/missing (0.57%)). Models assessing BMI as a mediator are explained below in the mediation analyses section.
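The BMI categories listed above can be expressed as a small lookup. The following Python sketch is illustrative only: the names are hypothetical, and it assumes that each category is a half-open interval that includes its lower bound (so a value such as 24.95 falls in the 22.5–24.9 category), which the text does not state explicitly.

```python
from bisect import bisect_right
from typing import Optional

# Lower bounds of the second to eighth categories, in kg/m^2.
BMI_CUTS = [20.0, 22.5, 25.0, 27.5, 30.0, 32.5, 35.0]
BMI_LABELS = [
    "<20", "20-22.49", "22.5-24.9", "25.0-27.49",
    "27.5-29.9", "30-32.49", "32.5-34.9", ">=35",
]


def bmi_category(bmi: Optional[float]) -> str:
    """Map a BMI value to its adjustment category; None is kept as its
    own 'unknown/missing' category rather than being dropped."""
    if bmi is None:
        return "unknown/missing"
    # bisect_right counts how many cut points are <= bmi, which is the
    # index of the category whose lower bound was most recently reached.
    return BMI_LABELS[bisect_right(BMI_CUTS, bmi)]
```

Keeping missing values as a separate category, as described in the text, means that participants with unknown BMI remain in the model.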
To determine if there was heterogeneity in the associations of diet groups with cancer risk, and to assess the influence of confounder adjustments [33, 34], χ² statistics and p-values for including the diet group in the model were estimated using likelihood ratio tests (LRT) comparing a model without the diet groups variable to the model with the diet groups variable.
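To make the likelihood ratio test concrete, the sketch below computes the χ² statistic and p-value from the log-likelihoods of two nested models. It is a minimal illustration in Python rather than the Stata code used for the analyses; the function names are hypothetical, and the log-likelihood values in the usage example are made up. With four diet groups, the diet variable contributes three indicator terms, so the test has 3 degrees of freedom.

```python
from math import erfc, exp, factorial, gamma, sqrt


def chi2_sf(x: float, df: int) -> float:
    """Upper tail probability of a chi-squared distribution with integer
    df, using the closed forms for the regularised incomplete gamma."""
    y = x / 2.0
    if df % 2 == 0:
        return exp(-y) * sum(y ** j / factorial(j) for j in range(df // 2))
    tail = sum(y ** (j - 0.5) / gamma(j + 0.5) for j in range(1, (df - 1) // 2 + 1))
    return erfc(sqrt(y)) + exp(-y) * tail


def likelihood_ratio_test(loglik_without: float, loglik_with: float, df: int):
    """Compare a model without the diet group variable to one with it.
    Returns (chi-squared statistic, p-value)."""
    stat = max(2.0 * (loglik_with - loglik_without), 0.0)
    return stat, chi2_sf(stat, df)


if __name__ == "__main__":
    # Hypothetical (partial) log-likelihoods, for illustration only.
    stat, p = likelihood_ratio_test(-1000.0, -996.0, df=3)
    print(f"chi2 = {stat:.2f}, p = {p:.4f}")
```

The same function applies to the interaction tests in the subgroup analyses, with the degrees of freedom set to the number of interaction terms added.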
Subgroup and sensitivity analyses
For all analyses, we assessed heterogeneity by subgroups of BMI (median: < 27.5 and ≥ 27.5 kg/m²) and smoking status (ever and never) by using an LRT comparing the main model to a model including an interaction term between diet groups and the subgroup variable (BMI or smoking status). For colorectal cancer, we further assessed heterogeneity by sex. For all cancer sites combined, we additionally explored heterogeneity by smoking status, censoring participants at baseline who were diagnosed with lung cancer.
In sensitivity analyses, we excluded cases and participants who had less than 2 years of follow-up and all participants with missing data on covariates. We also examined associations separately in White participants because a large proportion of the vegetarians in this cohort are of South Asian ethnicity (~ 17.5%). We additionally adjusted for fruit and vegetable intake in the multivariable adjusted model (< 3 servings/day, 3–3.99 servings/day, 4–5.99 servings/day, ≥ 6 servings/day, unknown) to control for this component of dietary intake as a proxy for a healthy diet. For prostate cancer analyses, we included in the multivariable adjusted model prostate-specific antigen (PSA) testing (no PSA testing, had PSA test, or unknown), as reported at baseline in all men and during follow-up from general practice records in a subsample (n = 99,412 males; records available for participants until 31 May 2016 for England, 31 March 2017 for Scotland, and 31 August 2017 for Wales).
Mediation analyses
If a significant association was observed between a diet group and a cancer outcome in the main analyses, we then further explored potential mediators that have been shown to be associated or possibly associated with diet groups [19, 21] and were previously related to the cancer site of interest (BMI, IGF-I, and free testosterone) [25, 26]. To determine if differences in mediators were observed by diet group, we used multivariable linear regression to compare the selected biomarker measurements (IGF-I and free testosterone [28]) and BMI across dietary groups, adjusting for potential confounders (see Additional File 1 Supplementary Methods). We did not explore mediation if there was no significant difference in cancer risk between each diet group and regular meat-eaters or if the biomarker concentrations were not significantly different between diet groups. We explored mediation via BMI for all cancer, colorectal cancer, and postmenopausal breast cancer risk [8], but not for prostate cancer due to its heterogeneous association with risk by stage and grade [35] and as these data are not available in this cohort. For prostate cancer and postmenopausal breast cancer, we also explored potential mediation via circulating concentrations of IGF-I and calculated free testosterone [25, 26, 28]. We did not explore biomarker mediation for the all cancer–diet group associations as these biomarkers have not been associated with all cancer risk.
To assess mediation, we used the inverse odds ratio weighting (IORW) method [36, 37]. This method aims to decompose the association between diet group and cancer risk into the part mediated by the potential mediator (natural indirect effect [NIE]) and the part not mediated by baseline BMI or biomarkers (natural direct effect [NDE]). The term “effect” is used here in concordance with the causal mediation literature but should not be interpreted as implying causality. To determine the proportion of the association between diet groups and cancer outcome mediated by the mediator of interest (e.g. BMI), we took the log of the indirect effect HR and divided it by the log of the total effect HR. Further details of the mediation analyses can be found in the Additional File 1 Supplementary Methods [38, 39].
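The proportion-mediated calculation can be illustrated with a short Python sketch. It assumes the usual IORW decomposition on the log scale, in which the indirect effect HR equals the total effect HR divided by the direct effect HR; the function name and the HR values in the usage example are hypothetical and are not results from this study.

```python
from math import log


def proportion_mediated(hr_total: float, hr_direct: float) -> float:
    """Proportion of the total association attributed to the mediator.

    Assumes log(HR_total) = log(HR_direct) + log(HR_indirect), so the
    indirect effect HR is the ratio of the total to the direct HR.
    """
    if hr_total == 1.0:
        raise ValueError("proportion mediated is undefined when the total HR is 1")
    hr_indirect = hr_total / hr_direct
    return log(hr_indirect) / log(hr_total)


if __name__ == "__main__":
    # Hypothetical values: total HR 0.90, direct HR 0.95.
    print(f"{proportion_mediated(0.90, 0.95):.1%}")
```

Because the ratio is taken on the log scale, it becomes unstable when the total effect HR is close to 1, which is consistent with exploring mediation only where an association was observed in the main analyses.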
All analyses were conducted using Stata version 17.0 (StataCorp LLC, College Station, TX). P-values were two-sided, with p < 0.05 considered statistically significant.