Data source and participants
For this community-based cohort study, data were extracted from the public UK Biobank Resource [25]. The UK Biobank is a prospective cohort study with over 500,000 community-dwelling participants across the UK aged 37–73 years when recruited between 2006 and 2010 [26].
Participants who reported being in paid employment or self-employed at baseline were included in our study. We excluded those who (1) reported previous cognitive impairment or dementia, (2) lacked information on shift work or night shift work status, or (3) had no genetic data.
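The inclusion and exclusion rules above can be written as a single eligibility check. The following is a minimal sketch only: the `Participant` record and its field names are hypothetical and do not correspond to UK Biobank field identifiers.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # All field names are illustrative assumptions, not UK Biobank fields.
    employed_at_baseline: bool          # paid employment or self-employed
    prior_cognitive_impairment: bool    # previous cognitive impairment or dementia
    shift_status_known: bool            # shift / night shift work status available
    has_genetic_data: bool


def is_eligible(p: Participant) -> bool:
    """Return True if the participant meets the inclusion criterion
    and none of the three exclusion criteria."""
    if not p.employed_at_baseline:
        return False
    if p.prior_cognitive_impairment:    # exclusion (1)
        return False
    if not p.shift_status_known:        # exclusion (2)
        return False
    if not p.has_genetic_data:          # exclusion (3)
        return False
    return True
```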
Shift work definition
The definition of shift work in UK Biobank was “a schedule falling outside of 9 am to 5 pm; by definition, such schedules involved afternoon, evening, or night shifts or rotating through these shifts,” while night shift work was defined as “a work schedule that involves working through the normal sleeping hours, for instance, working through the hours from 12 to 6 am.”
The UK Biobank first asked participants employed at baseline to report whether their current main job involved shift work; if so, participants were further asked whether night shifts were involved. For both questions, the response options were never/rarely, sometimes, usually, or always. We derived each participant’s current shift work status from the responses to the two questions and categorized participants as “non-shift workers” or “shift workers,” with “non-shift workers” defined as those working between 9 am and 5 pm. Shift workers were categorized as “night shift workers” or “shift but non-night shift workers,” with the latter defined as those working between 5 pm and 12 am. Night shift workers were further categorized as “some night shift workers” or “usual/permanent night shift workers.”
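The two-question derivation can be illustrated with a short sketch. The mapping from response options to categories is an assumption made for illustration (for example, that "never/rarely" on the first question denotes a non-shift worker, and that "usually" or "always" on the second denotes a usual/permanent night shift worker); the text does not spell out these cut-offs. The function name is hypothetical.

```python
from typing import Optional

# Response options shared by both questionnaire items (from the text).
RESPONSES = ("never/rarely", "sometimes", "usually", "always")


def classify_shift_status(shift_response: str,
                          night_response: Optional[str] = None) -> str:
    """Map the two questionnaire responses to a shift work category.

    The response-to-category cut-offs are assumed, not taken from the source.
    """
    if shift_response not in RESPONSES:
        raise ValueError(f"unknown shift response: {shift_response!r}")
    if shift_response == "never/rarely":
        # The night shift question is only asked of shift workers.
        return "non-shift worker"
    if night_response not in RESPONSES:
        raise ValueError(f"unknown night shift response: {night_response!r}")
    if night_response == "never/rarely":
        return "shift but non-night shift worker"
    if night_response == "sometimes":
        return "some night shift worker"
    return "usual/permanent night shift worker"
```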
Outcomes
The primary outcome was all-cause dementia in a time-to-event analysis, and the secondary outcomes included AD, VD, and other types of dementia. The electronic health records (EHRs), a data linkage to hospital inpatient admissions and death registries, include primary or secondary events in England, Scotland, and Wales. A previous comparison between EHRs and expert clinical adjudicators in the UK Biobank showed that the overall positive predictive value for dementia diagnosis was 82.5% [27], suggesting that the EHRs are adequate for assessing the association between risk factors and dementia. We used the algorithms provided by UK Biobank to identify dementia cases, which were generated from EHRs using ICD-9 and ICD-10 codes (Additional file 1: Table S1). In the time-to-event analysis, the date of incident dementia during follow-up was set as the earliest date on which a dementia code was recorded, regardless of the source. At the time of analysis, hospital admission data were available until 30 June 2021; we therefore censored the disease-specific outcome analysis at this date or at the date of first disease incidence or death, whichever occurred first. Mortality data were available for participants until 31 May 2021.
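As an illustration of the censoring rule, the sketch below derives follow-up time and an event indicator for one participant. It assumes a single administrative censoring date of 30 June 2021 and ignores the earlier availability limit of the mortality data; the function name, argument names, and the 365.25-day year are choices made for this example.

```python
from datetime import date
from typing import Optional, Tuple

# End of hospital admission data, as stated in the text.
ADMIN_CENSOR_DATE = date(2021, 6, 30)


def follow_up(baseline: date,
              dementia_date: Optional[date],
              death_date: Optional[date],
              censor_date: date = ADMIN_CENSOR_DATE) -> Tuple[float, bool]:
    """Return (follow-up in years, dementia event indicator).

    Follow-up ends at the earliest of incident dementia, death, or the
    administrative censoring date.
    """
    candidates = [censor_date]
    if dementia_date is not None:
        candidates.append(dementia_date)
    if death_date is not None:
        candidates.append(death_date)
    end = min(candidates)
    # An event counts only if dementia is recorded on or before the end date.
    event = dementia_date is not None and dementia_date <= end
    years = (end - baseline).days / 365.25
    return years, event
```

A participant diagnosed after the censoring date, or who died before any dementia code was recorded, contributes follow-up time but no event.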
Polygenic risk score for dementia
We developed a polygenic risk score (PRS) to quantify the genetic predisposition to dementia using single-nucleotide polymorphisms (SNPs) associated with dementia in previous genome-wide association studies that did not include UK Biobank participants [28]. Information on the 23 selected SNPs is listed in Additional file 1: Table S2. Individual SNPs were coded as 0, 1, or 2 according to the number of risk alleles. The PRS was calculated as the sum of the number of risk alleles at each locus multiplied by the respective regression coefficient, divided by the number of SNPs, using PRSice-2 [29, 30]. The PRS was then divided into quartiles and categorized as low (quartiles 1 to 2), intermediate (quartile 3), or high (quartile 4) genetic predisposition to dementia (Additional file 1: Table S3).
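The scoring formula and the quartile grouping can be sketched in a few lines. This is not PRSice-2 itself; it only reproduces the arithmetic described above. The function names are hypothetical, and both the quartile method (the default of `statistics.quantiles`) and the handling of scores that fall exactly on a cut point are assumptions.

```python
from statistics import quantiles
from typing import List, Sequence


def polygenic_risk_score(dosages: Sequence[int],
                         betas: Sequence[float]) -> float:
    """Weighted sum of risk-allele counts divided by the number of SNPs."""
    if not dosages or len(dosages) != len(betas):
        raise ValueError("dosages and betas must be non-empty and equal in length")
    if any(d not in (0, 1, 2) for d in dosages):
        raise ValueError("each SNP must be coded as 0, 1, or 2 risk alleles")
    return sum(d * b for d, b in zip(dosages, betas)) / len(dosages)


def categorize_by_quartile(scores: Sequence[float]) -> List[str]:
    """Label each score low (Q1 to Q2), intermediate (Q3), or high (Q4)."""
    _, q2, q3 = quantiles(scores, n=4)  # three cut points; the first is unused
    labels = []
    for s in scores:
        if s <= q2:
            labels.append("low")
        elif s <= q3:
            labels.append("intermediate")
        else:
            labels.append("high")
    return labels
```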
Covariates
Possible confounding variables included: age; sex; ethnicity (white/not white); education, categorized as higher (college/university degree or other professional qualification), upper secondary (second/final stage of secondary education), lower secondary (first stage of secondary education), vocational (work-related practical qualifications), or other; socioeconomic status, with categories derived from Townsend deprivation index quartiles 1 (low), 2 to 3 (intermediate), and 4 (high); diabetes mellitus (DM); hypertension (HTN); stroke; coronary heart disease (CHD); cholesterol-lowering medication; antihypertensives; aspirin; body mass index (BMI); systolic blood pressure (SBP); total cholesterol (TC); triglycerides (TG); high-density lipoprotein (HDL); low-density lipoprotein (LDL); glycated hemoglobin (HbA1c); smoking status (current or non-current smoking); alcohol consumption; healthy diet, based on consumption of at least 4 of 7 commonly eaten food groups following recommendations on dietary priorities [31]; regular physical activity, defined as meeting the 2017 UK physical activity guidelines of 150 min of moderate activity or 75 min of vigorous activity per week; years of work; sleep duration, categorized as ≤ 6, 7–8, and ≥ 9 h/day; and chronotype preference (definitely a “morning” person, more a “morning” than an “evening” person, more an “evening” than a “morning” person, or definitely an “evening” person).
Statistical analysis
For baseline characteristics, normally distributed continuous variables were described by means and standard deviations, while non-normally distributed continuous variables were described by medians and interquartile ranges. Categorical variables were described by counts and percentages. Univariate comparisons between groups were performed using Student’s t, Mann–Whitney, or χ2 tests according to the type and distribution of the variables.
In the primary analysis, time-to-event analysis for all-cause dementia was performed using the Cox proportional hazards regression model, and we constructed several models with different sets of covariates to estimate hazard ratios (HRs) and their 95% confidence intervals (95% CIs). Model 1 was adjusted for age at baseline and sex. Model 2 was adjusted for the terms in model 1 plus ethnicity, education, and socioeconomic status. Model 2 was chosen as the primary model.
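For reference, the standard form of the Cox proportional hazards model and the resulting hazard ratio can be written as follows. The symbols are notation introduced here for illustration: h_0(t) is the unspecified baseline hazard, x_1 is the exposure indicator (for example, shift worker versus non-shift worker), x_2 to x_p are the adjustment covariates of the chosen model, and the interval is the usual Wald interval.

```latex
h(t \mid x) = h_0(t)\,\exp\!\left(\beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p\right),
\qquad
\mathrm{HR} = \exp(\hat\beta_1),
\qquad
95\%\ \mathrm{CI} = \exp\!\left(\hat\beta_1 \pm 1.96\,\mathrm{SE}(\hat\beta_1)\right)
```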
We used a fixed-sequence procedure for multiple comparisons, which does not inflate the type I error. We sequentially compared the incidence of dementia between shift workers and non-shift workers, between night shift workers and shift but non-night shift workers, and between some night shift workers and usual/permanent night shift workers. The subgroup analysis explored whether the impact of shift work on dementia varied across subgroups defined by age at baseline (≤ 60, > 60 years), ethnicity, sex, socioeconomic status, sleep duration, and genetic predisposition to dementia by PRS; the P value for interaction was calculated from tests of exposure-by-covariate interaction in the Cox models. The secondary outcomes of dementia subtypes were analyzed using the same Cox models as the primary analysis.
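The logic of a fixed-sequence procedure is that each comparison is tested at the full significance level, but only while every earlier comparison in the prespecified order has been significant. The sketch below is a generic illustration with a hypothetical function name; the P values in the usage example are invented placeholders, not study results.

```python
from typing import List, Sequence, Tuple


def fixed_sequence(tests: Sequence[Tuple[str, float]],
                   alpha: float = 0.05) -> List[Tuple[str, bool]]:
    """Apply a fixed-sequence test to (name, p_value) pairs given in their
    prespecified order.

    A hypothesis can be declared significant only if its own P value is
    below alpha and all earlier hypotheses were significant.
    """
    results = []
    gate_open = True
    for name, p in tests:
        significant = gate_open and p < alpha
        results.append((name, significant))
        if not significant:
            gate_open = False  # all later comparisons are no longer confirmatory
    return results


# Usage with placeholder P values (illustrative only).
example = fixed_sequence([
    ("shift vs non-shift", 0.01),
    ("night shift vs non-night shift", 0.20),
    ("night shift frequency groups", 0.03),
])
```

In this example the third comparison is not declared significant despite a P value below 0.05, because the second comparison closed the sequence.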
We conducted several sensitivity analyses. First, we adjusted for additional covariates. Model 3 was adjusted for the terms in model 2 plus DM, HTN, stroke, CHD, cholesterol-lowering medication, antihypertensives, aspirin, BMI, SBP, TC, TG, HDL, LDL, HbA1c, smoking status, alcohol consumption, healthy diet, and regular physical activity. Model 4 was adjusted for the terms in model 3 plus genetic predisposition to dementia by PRS category. Model 5 was adjusted for the terms in model 4 plus years of work. Model 6 was adjusted for the terms in model 5 plus sleep duration. Model 7 was adjusted for the terms in model 6 plus chronotype preference. Second, we analyzed the impact of shift work on dementia using the Fine–Gray method, accounting for death as a competing risk, to assess the robustness of our findings [32]. Third, we repeated the analysis after excluding participants with follow-up time < 1 year or incident dementia < 1 year from baseline. Fourth, we performed the same analysis in the dataset containing 278,270 participants, using multiple imputation by chained equations with 5 imputations to impute missing values.
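Because each model adds terms to the previous one, the seven covariate sets can be generated cumulatively rather than listed by hand. The sketch below shows that nesting; the snake_case variable names are hypothetical identifiers chosen for this example, and the helper does not fit any model.

```python
from typing import Dict, List, Sequence, Tuple

# Covariates added at each step, following the model descriptions in the text.
MODEL_ADDITIONS: List[Tuple[str, List[str]]] = [
    ("model 1", ["age", "sex"]),
    ("model 2", ["ethnicity", "education", "socioeconomic_status"]),
    ("model 3", ["dm", "htn", "stroke", "chd", "cholesterol_lowering_medication",
                 "antihypertensives", "aspirin", "bmi", "sbp", "tc", "tg", "hdl",
                 "ldl", "hba1c", "smoking_status", "alcohol_consumption",
                 "healthy_diet", "regular_physical_activity"]),
    ("model 4", ["prs_category"]),
    ("model 5", ["years_of_work"]),
    ("model 6", ["sleep_duration"]),
    ("model 7", ["chronotype"]),
]


def build_nested_models(
        additions: Sequence[Tuple[str, List[str]]]) -> Dict[str, List[str]]:
    """Return the full covariate list for each model, where every model
    contains all terms of the preceding model plus its own additions."""
    models: Dict[str, List[str]] = {}
    cumulative: List[str] = []
    for name, new_terms in additions:
        cumulative = cumulative + new_terms  # new list, so earlier models stay intact
        models[name] = cumulative
    return models


models = build_nested_models(MODEL_ADDITIONS)
```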
All P values were two-sided, with significance defined as P < 0.05. Statistical analyses were performed in R (version 4.0.3, R Core Team, https://www.r-project.org).