Risk prediction models for dementia: role of age and cardiometabolic risk factors

Background The Cardiovascular Risk Factors, Aging, and Incidence of Dementia (CAIDE) risk score is the only currently available midlife risk score for dementia. We compared CAIDE to the Framingham cardiovascular Risk Score (FRS) and the FINDRISC diabetes score as predictors of dementia and assessed the role of age in their associations with dementia. We then examined whether these risk scores were associated with dementia in those free of cardiometabolic disease over the follow-up.
Methods A total of 7553 participants, aged 39–63 years in 1991–1993, were followed for cardiometabolic disease (diabetes, coronary heart disease, stroke) and dementia (N = 318) for a mean 23.5 years. Cox regression was used to model associations of age at baseline, CAIDE, FRS, and FINDRISC risk scores with incident dementia. Predictive performance was assessed using Royston’s R², Harrell’s C-index, Akaike’s information criterion (AIC), the Greenwood-Nam-D’Agostino (GND) test, and calibration-in-the-large. Age effect was also assessed by stratifying analyses by age group. Finally, in multistate models, we examined whether cardiometabolic risk scores were associated with incidence of dementia in persons who remained free of cardiometabolic disease over the follow-up.
Results Among the risk scores, the predictive performance of CAIDE (C-statistic = 0.714; 95% CI 0.690–0.739) and FRS (C-statistic = 0.719; 95% CI 0.693–0.745) was better than that of FINDRISC (C-statistic = 0.630; 95% CI 0.602–0.659; p < 0.001; AIC difference > 3); R² was 32.5%, 32.0%, and 12.5%, respectively. When the effect of age in these risk scores was removed by drawing data on risk scores at age 55, 60, and 65 years, the association with dementia in all age groups remained for FRS and FINDRISC, but not for CAIDE. Only FRS at age 55 was associated with dementia in persons who remained free of cardiometabolic diseases prior to dementia diagnosis, while no such association was observed at older ages for any risk score.
Conclusions Our analyses of CAIDE, FRS, and FINDRISC show the FRS in midlife to predict dementia as well as the CAIDE risk score, its predictive value being also evident among individuals who did not develop cardiometabolic events. The importance of age in the predictive performance of all three risk scores highlights the need for the development of multivariable risk scores in midlife for primary prevention of dementia.


Background
There is considerable evidence of the importance of vascular pathways to cognitive impairment and dementia [1][2][3]. The brain's need for a constant supply of oxygen and glucose for maintenance of physiological function makes it vulnerable to vascular dysfunction [4]. Current understanding of Alzheimer's disease, the primary cause of dementia, suggests that changes in biomarkers are present 15-20 years before the appearance of clinical symptoms [5]. Accordingly, there is emerging research on the association between midlife cardiometabolic risk factors and dementia although a wide age range, 35 to 68 years [6][7][8][9], is used to characterize midlife. Studies that have defined midlife with more precision have examined individual risk factors [10,11], rather than the overall risk burden.
Several multivariable risk scores have been developed for cardiovascular disease and diabetes, but the Cardiovascular Risk Factors, Aging, and Incidence of Dementia (CAIDE) risk score [12] is the only currently available midlife risk score specifically developed for dementia prediction. It is composed of sociodemographic and vascular risk factors: age, education, sex, systolic blood pressure, body mass index (BMI), total cholesterol, and physical activity. Age is the strongest known risk factor for dementia, and a previous paper found additional adjustment for age to attenuate the association between the Framingham cardiovascular Risk Score (FRS) [13], which already contains an age component, and dementia [14]. Furthermore, previous studies have not considered the fact that the association of cardiovascular risk factors with dementia depends on the age at assessment of cardiovascular risk burden, and mid- rather than late-life exposure has been shown to be important [10,11,15].
Our objective was to compare the association of CAIDE, FRS, and the FINDRISC diabetes [16] risk score with incidence of dementia. To address the effects of age in the risk scores, we examined the association of risk scores drawn from baseline (when participants were 39 to 63 years), adjusted for age, and then at ages 55, 60, and 65 years. This allowed us to test the hypothesis that when a wide age range is used at risk score assessment, the age component of the risk score is the prime driver of reported findings. We then examined whether the association of risk scores with dementia was mediated by clinical cardiovascular disease and diabetes.

Methods
Data are drawn from the Whitehall II study, an ongoing prospective cohort study established in 1985 on 6895 men and 3413 women, aged 35 to 55 years at recruitment [17]. The study design consists of a self-administered questionnaire and a clinical examination every 4 to 5 years (1991-1993, 1997-1999, 2003-2004, 2007-2009, 2012-2013, and 2015-2016) which includes anthropometry, cardiovascular and metabolic risk factors, biochemical measures, and chronic diseases. Risk factors included in the analysis were incorporated in the study starting in 1991-1993; data for the construction of risk scores for each participant were drawn from 1991 to 1993 and from the closest wave of clinical examination when participants were 55, 60, and 65 years over the follow-up.

Risk scores
CAIDE [12], FRS [13], and FINDRISC [18] risk scores were calculated using the original scoring methods (Additional file 1: Tables S1, S2, S3) at study baseline (age 39-63 years) and ages 55, 60, and 65 years for each participant. Venous blood was taken in the morning after ≥ 8 h of fasting or at least 5 h after a light, fat-free breakfast. Serum for lipid analyses was refrigerated at 4°C and assayed within 72 h. Total cholesterol was measured using a Cobas Fara centrifugal analyzer (Roche Diagnostics System), and HDL cholesterol was measured by precipitating non-HDL cholesterol with dextran sulfate-magnesium chloride, centrifuging, and measuring cholesterol in the supernatant fluid. Systolic blood pressure (mmHg) was taken as the average of two measurements (Hawksley random-zero sphygmomanometer) with the participant in a sitting position after 5 min rest. Treated hypertension was determined using chapters 2.2 to 2.6 of the British National Formulary. Diabetes was defined by a fasting glucose ≥ 7.0 mmol/L or reported doctor-diagnosed diabetes or the use of diabetes medication. Weight was measured in underwear to the nearest 0.1 kg on digital Soehnle electronic scales (Leifheit AS, Nassau, Germany). With the participant standing erect in bare feet with head in the Frankfurt plane, height was measured to the nearest 1 mm using a stadiometer. BMI (kg/m²) was calculated by dividing weight (in kilograms) by height squared (in meters squared). Waist circumference, the smallest circumference at or below the costal margin, was measured with subjects in the standing position in light clothing, using a fiberglass tape measure at 600 g tension.
Data on smoking status (current or never/ex-smoker), frequency of fruit and vegetables (8-point scale categorized as "less than daily" or "daily"), and education (years in full-time education) were reported by participants. Family history of diabetes was reported by the participants and personal history of diabetes ascertained from their clinical records in the study. Physical activity for the CAIDE was measured as engagement in activity causing sweating and breathlessness, at least twice a week, for a total weekly duration of 1 h or more. For the FINDRISC, it was duration in moderate to vigorous physical activity, at least 4 h a week [19].
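The BMI and diabetes definitions given above can be written as a short sketch. The function names and argument layout are illustrative assumptions and are not taken from the study's own code.

```python
from typing import Optional


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height squared (m^2)."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def has_diabetes(
    fasting_glucose_mmol_l: Optional[float],
    doctor_diagnosed: bool,
    on_diabetes_medication: bool,
) -> bool:
    """Diabetes as defined in the text: fasting glucose >= 7.0 mmol/L,
    or reported doctor-diagnosed diabetes, or use of diabetes medication.
    A missing glucose value (None) is treated as not meeting the glucose
    criterion; that handling is an assumption of this sketch."""
    high_glucose = (
        fasting_glucose_mmol_l is not None and fasting_glucose_mmol_l >= 7.0
    )
    return bool(high_glucose or doctor_diagnosed or on_diabetes_medication)
```

For example, `bmi(81.0, 1.80)` gives 25.0 kg/m², and a participant with a fasting glucose of 6.9 mmol/L, no diagnosis, and no medication is not classified as diabetic.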

Dementia ascertainment
All residents in the UK have a unique National Health Service (NHS) identification number which was used to link all participants to electronic health records. Three registers (Hospital Episode Statistics (HES), the Mental Health Services Data Set, and the mortality register) were used for dementia ascertainment using ICD-10 codes F00-F03, F05.1, G30, and G31. Record linkage was available until 31 March 2017. The NHS provides most of the health care, including out- and in-patient care. The sensitivity and specificity of dementia ascertainment in the NHS HES data are 78.0% and 92.0%, respectively [21]. In addition, we used the Mental Health Services Data Set, a national database which contains information on dementia for persons in contact with mental health services in hospitals, out-patient clinics, and the community.

Statistical analyses
The TRIPOD checklist is included in Additional file 1: Table S4.

Risk scores and incidence of dementia
As there was no suggestion of deviations from linearity (Additional file 1: Fig. S1), all three risk scores were standardized (mean = 0, SD = 1) to allow comparison between them; standardization was sex-specific for the FRS and FINDRISC in accordance with the original scoring. We used Cox proportional hazards regression to examine the predictive performance of the three risk scores, drawn from baseline assessment in 1991-1993. Participants were followed to the date of record of dementia, death, or 31 March 2017, whichever came first. Censoring participants who died over the follow-up at the date of death allowed us to account for competing risk of death using cause-specific hazard models [22].
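A minimal sketch of the standardization step, so that a hazard ratio can be read as "per 1 SD higher score". The helper names are hypothetical, and the use of the sample standard deviation is an assumption; the text does not say which estimator was used.

```python
from statistics import mean, stdev
from typing import Hashable, List, Sequence


def standardize(values: Sequence[float]) -> List[float]:
    """Rescale values to mean 0 and SD 1 (sample SD, an assumption)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def standardize_by_group(
    scores: Sequence[float], groups: Sequence[Hashable]
) -> List[float]:
    """Standardize within each group (for example, within each sex),
    returning z-scores in the original order."""
    result = [0.0] * len(scores)
    for g in set(groups):
        idx = [i for i, label in enumerate(groups) if label == g]
        for i, z in zip(idx, standardize([scores[i] for i in idx])):
            result[i] = z
    return result
```

Standardizing within sex means that a score of 0 corresponds to the average man or the average woman respectively, which matters for scores such as the FRS that use sex-specific point tables.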
Assumptions of proportional hazards and log-linearity were checked using Schoenfeld and martingale residuals and found not to be violated. For each risk score, we estimated Royston's modified R² for survival data as a measure of overall performance (higher values indicate a greater proportion of variation explained), with confidence intervals calculated using 2000 bootstrap replications [23]; Harrell's C-statistic for discrimination [24], with C-statistics formally compared using a nonparametric approach [25]; the Akaike information criterion (AIC) for relative goodness-of-fit, where lower values indicate better model fit and differences in AIC of 3 or more are considered meaningful; calibration, the agreement between observed and predicted risk, tested using the Greenwood-Nam-D'Agostino (GND) test, an extension of the Hosmer-Lemeshow test in which p < 0.05 indicates a lack of fit [26]; and calibration-in-the-large, shown in plots of observed and predicted dementia rates per 1000 person-years in deciles of the predictor. Subsequent analyses were stratified by age at exposure (ages 55, 60, and 65 years) and the predictive performance of CAIDE was compared to the FRS and FINDRISC. The follow-up time in these analyses was calculated from age at exposure (ages 55, 60, and 65 years) to the record of dementia, death, or March 31, 2017, whichever came first. In a complementary approach to assessing the role of age, we compared the performance of each risk score with that of age alone and then examined modified versions of the risk scores with the age component removed.
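To make the discrimination measure concrete, the following is a simplified sketch of Harrell's C-statistic for right-censored data. It is not the software used in the study: the function name is hypothetical, pairs with tied follow-up times are skipped for simplicity, and ties in the risk score count as half-concordant.

```python
from typing import Sequence


def harrell_c(
    times: Sequence[float],
    events: Sequence[bool],
    risk_scores: Sequence[float],
) -> float:
    """Proportion of comparable pairs in which the participant with the
    earlier observed event also has the higher risk score.

    A pair (i, j) is comparable when i had the event and j was still
    under follow-up after i's event time (times[j] > times[i])."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored participant cannot anchor a pair
        for j in range(n):
            if times[j] > times[i]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to discrimination no better than chance and 1.0 to perfect ranking, which is the scale on which the C-statistics reported in this paper should be read.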

Role of cardiometabolic disease (CHD, stroke, diabetes) in the association between risk scores and dementia
In participants free of cardiometabolic disease at the assessment of risk scores, we examined the role of cardiometabolic disease over the follow-up in the association between risk scores and incidence of dementia using multistate models (Additional file 1: Fig. S2). These models allow simultaneous estimation of the risk associated with the risk scores (CAIDE, FRS, FINDRISC) in three transitions (or change in health states) over the follow-up: (1) from a healthy state to incidence of cardiometabolic disease, (2) from cardiometabolic disease over the follow-up to incidence of dementia, and (3) from a healthy state to incidence of dementia in those free of cardiometabolic disease over the follow-up. Participants who died over the follow-up were censored at date of death in order to take the competing risk of death into account [27]. Age was used as the timescale, and analyses were undertaken using R (mstate). These analyses were undertaken using risk scores from 1991 to 1993 and at ages 55, 60, and 65.
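The three transitions can be illustrated by laying out one participant's follow-up in the long format that illness-death models typically use, with age as the timescale. This is a sketch under stated assumptions: the function, its arguments, and the row layout are hypothetical and do not reproduce the mstate code used in the study.

```python
from typing import List, Optional, Tuple

# (transition label, age at entry into the state, age at exit, event indicator)
Row = Tuple[str, float, float, int]


def transition_rows(
    entry_age: float,
    exit_age: float,
    dementia: bool,
    cmd_age: Optional[float] = None,
) -> List[Row]:
    """Build at-risk intervals for one participant.

    exit_age is the age at dementia, death, or end of follow-up, whichever
    came first; dementia says whether follow-up ended with a dementia
    record (death and end of follow-up are both censoring); cmd_age is the
    age at first cardiometabolic disease, if any, before exit_age."""
    if cmd_age is not None and entry_age < cmd_age < exit_age:
        return [
            # Left the healthy state through cardiometabolic disease.
            ("healthy to cardiometabolic disease", entry_age, cmd_age, 1),
            ("healthy to dementia", entry_age, cmd_age, 0),
            ("cardiometabolic disease to dementia", cmd_age, exit_age, int(dementia)),
        ]
    return [
        # Remained free of cardiometabolic disease throughout follow-up.
        ("healthy to cardiometabolic disease", entry_age, exit_age, 0),
        ("healthy to dementia", entry_age, exit_age, int(dementia)),
    ]
```

A participant who develops dementia at 75 without prior cardiometabolic disease contributes an event only to the "healthy to dementia" transition, which is the transition of interest for the question of whether risk scores predict dementia independently of clinical cardiometabolic disease.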

Sensitivity analysis
Two sets of analyses were carried out. As information on dementia subtype was not available for all cases, we used data on the history of cardiovascular disease (myocardial infarction or stroke) over the follow-up to create a proxy indicator for Alzheimer's disease dementia defined as dementia without a history of cardiovascular disease. We also undertook analysis using Fine and Gray subdistribution hazard models to assess whether the results differed using an alternative method of taking the competing risk of mortality into account. This method is recommended when the focus is on quantifying an individual's absolute risk [22]; although this was not our focus, we examined whether results using this approach were broadly consistent with the main findings.

Results
Of the 10,308 participants recruited to the Whitehall II study in 1985-1988, a total of 8814 participated at the 1991-1993 wave (Additional file 1: Fig. S3). The analyses using risk scores from 1991 to 1993 were based on 7553 participants, followed for incidence of dementia. Table 1 presents characteristics of this study population in 1991-1993; the mean age of participants was 50 (range 39 to 63) years, and 318 cases of dementia were recorded over a mean follow-up of 23.5 (SD = 4.0) years. As expected, there was accelerated cognitive decline in the years leading to dementia diagnosis (Additional file 1: Fig. S4), supporting the validity of dementia ascertainment.
The CAIDE study used logistic regression to calculate predictive performance as data on dementia status were not available throughout the follow-up. This was not the case in our study as incidence of dementia and date of death (to take competing risk into account) were available over the entire follow-up, leading us to use survival analyses. The C-statistic obtained using logistic regression and a 20-year follow-up in our study (0.80; 95% CI 0.78, 0.82) is comparable to that in the CAIDE study (0.77; 95% CI 0.71, 0.83).
The predictive performance of all three risk scores, drawn from baseline in 1991-1993, for dementia is presented in Table 2. FINDRISC had the weakest association with dementia (HR = 1.52; 95% CI 1.38, 1.67), and its C-statistic was also lower than that of CAIDE (0.630 compared to 0.714, p < 0.001, Table 2). CAIDE and FRS had similar discrimination (C-statistic of 0.714 and 0.719, respectively, p = 0.727) but CAIDE had a slightly better fit (Δ AIC = 3.3). Calibration, reflecting the agreement between observed outcomes and prediction, shows age on its own to do better than the risk scores (Fig. 1); GND values suggest poor calibration for CAIDE and FINDRISC.
Analyses stratified by age were based on risk scores for each participant using data from waves closest to when participants were 55 (mean = 55.6; SD = 2.3), 60 (mean = 59.9; SD = 2.0), and 65 (mean = 64.6; SD = 2.1) years; exact age was used in the calculation of the risk scores. The performance indicators for the three risk scores were considerably poorer in these analyses (Table 2), with poor discrimination (C-statistic lower than 0.60) and variance explained (R² less than 10%). CAIDE was associated with dementia when assessed at age 55 (HR = 1.22; 95% CI 1.09, 1.38) but not at age 60 (HR = 1.03; 95% CI 0.92, 1.16) or 65 (HR = 1.05; 95% CI 0.93, 1.18). Although all scores performed poorly in these analyses stratified by age at risk assessment, FRS and FINDRISC had better predictive ability than the CAIDE as assessed by R² (CAIDE always the lowest), C-statistic (CAIDE always the lowest, albeit not significantly), and AIC (Δ AIC > 3). Further complementary analyses on the role of age show that age on its own, ranging from 39 to 63 years, had the strongest association with dementia and better predictive performance than all three risk scores (Additional file 1: Table S5).
In sensitivity analyses, we repeated these analyses using a proxy for Alzheimer's disease dementia (dementia cases without a history of cardiovascular disease); findings were broadly similar to those in the main analyses (Additional file 1: Table S6). Results from analyses using subdistribution hazard models for competing risk of death were also similar to those obtained using cause-specific hazard models (Additional file 1: Table S7). Table 3 presents results from the multistate models. In analyses using risk scores from 1991 to 1993, all three risk scores were associated with higher risk of dementia in those who remained free of cardiometabolic disease over the follow-up (transition "healthy to dementia", Table 3). In analyses stratified by age at risk factor assessment, only FRS at age 55 was associated with risk of dementia in those free of clinical cardiometabolic disease over the mean follow-up of 17.8 years (HR = 1.33; 95% CI 1.11, 1.60). As expected, all three risk scores were associated with incidence of cardiometabolic disease, irrespective of the age at which risk factors were assessed (transition "healthy to cardiometabolic disease", Table 3).

Discussion
The long course of dementia makes midlife an important target for dementia prevention. Given the multifactorial etiology of dementia, risk scores represent an effective prevention tool but CAIDE is the only existing midlife risk score constructed specifically for dementia. We compared its predictive performance for dementia to risk scores developed for cardiovascular disease (FRS) and diabetes (FINDRISC) and found it not to be better at predicting dementia. We also found all three risk scores to have poor discrimination (C-statistic < 0.60) for dementia when the effect of age was neutralized by assessing risk factors at ages 55, 60, and 65 years. Finally, analyses using multistate models showed only the Framingham cardiovascular score at age 55 to be associated with risk of dementia in those who remained free of cardiovascular disease and diabetes until dementia diagnosis.
Most existing risk scores for dementia are based on predictors assessed at older ages, with reviews concluding that their predictive accuracy is poor [28][29][30]. We did not assess the predictive performance of risk scores constructed for use at older ages; our focus was midlife as the pathophysiological processes underlying dementia unfold over many years, perhaps decades [31], making it important to consider the age at assessment of risk factors. CAIDE, based on midlife predictors, was reported as having a C-statistic of 0.77 in the derivation cohort [12]. A study using Kaiser Permanente data on adults aged 40-55 years at risk factor assessment, followed for a mean 36.1 years, reported a C-statistic of 0.75 [32] for CAIDE, but analyses of individual components revealed only age and sex of the seven components in CAIDE to be associated with risk of dementia. Given that 4 of the 15 points in the score are due to age (age < 47 scored 0, 47-53 scored 3, and > 53 years scored 4), it is likely to be an important driver of its predictive ability. In the Rotterdam study where all participants were older than 53 at the start of follow-up (hence, had a score of 4 for age), the C-statistic for CAIDE after a 15-year follow-up was only 0.55 (95% CI 0.53, 0.58) [33]. In our data using risk factors from 1991 to 1993, the C-statistic was 0.71 but it fell to 0.55 (95% CI 0.51, 0.59) when risk factors for all participants were assessed at age 55, matching results obtained in the Rotterdam study. Thus, the predictive ability of the CAIDE dementia risk score beyond age was poor at best.
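The role of the age component can be made concrete with a small sketch of the age bands of the original CAIDE score (under 47 years scored 0, 47 to 53 scored 3, over 53 scored 4). The function name is hypothetical, and treating age as completed years is an assumption of this illustration.

```python
def caide_age_points(age_years: float) -> int:
    """Points contributed by age in the CAIDE score.

    Age is truncated to completed years (an assumption of this sketch):
    under 47 -> 0 points, 47 to 53 -> 3 points, over 53 -> 4 points."""
    age = int(age_years)
    if age < 47:
        return 0
    if age <= 53:
        return 3
    return 4


# In a cohort where everyone is older than 53, every participant receives
# the same 4 points, so the age component cannot help rank participants.
older_cohort_points = {caide_age_points(a) for a in (55, 60, 65)}
```

In a sample spanning 39 to 63 years, by contrast, the age component alone separates participants into three groups, which is consistent with the drop in discrimination seen once age at assessment is held fixed.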
As cardiometabolic risk factors feature prominently in dementia prevention guidelines [2], we examined whether risk profiles developed for cardiovascular disease and diabetes are useful in predicting dementia. Our results do not provide strong evidence for their utility in predicting dementia with the caveat that risk factors included in these algorithms and their categorization were not optimized for dementia prediction. Two results are particularly noteworthy. First, analysis using risk scores constructed at ages 55, 60, and 65 years to remove the effect of age showed the cardiovascular disease and diabetes risk scores to be associated (HR had p < 0.05) with late-life dementia, in contrast to the CAIDE where associations were found only when risk factors were assessed at age 55. Second, only FRS at age 55 was associated with risk of dementia in persons free from clinical cardiometabolic disease at dementia diagnosis.
The long preclinical phase of dementia and the absence of effective disease modification have led to an interest in prevention. Furthermore, several risk factors have an age-dependent association with dementia, particularly cardiovascular risk factors where the risk of dementia is shaped by mid- rather than later-life exposure. This is reflected in our findings for all three risk scores as their predictive performance is systematically better when assessed at age 55 than at age 65 years. The risk score approach can be useful for multifactorial conditions, as demonstrated by the example of cardiovascular disease [34,35]. Better understanding of risk factors has led to the development of both therapeutic strategies and public health campaigns that targeted major risk factors, leading to declines in cardiovascular disease. A similar approach for dementia would be valuable, but it requires consideration of a larger set of predictors (e.g., smoking, cardiovascular disease, glucose, insulin, and inflammatory markers) with careful categorization to best reflect the continuum of risk. For risk factors such as systolic blood pressure, there is now evidence that the 140-mmHg threshold might not adequately capture risk [10]. Cardiovascular risk scores were developed, and continue to be modified, to better take into account key risk factors along with appropriate categorization of risk factors to improve the predictive ability of risk scores. A similar effort is now needed for dementia prevention, and assessment of the predictive ability of existing risk scores is the first step in that process.
Our findings need to be considered in light of the study's strengths and limitations. Strengths include the longitudinal design and repeat risk factor assessments allowing age-specific analyses of dementia prediction and the relatively large population-based sample with the main analysis on 318 cases of dementia compared to 61 in the study population used to develop the CAIDE score. A limitation of the study is the ascertainment of dementia being based on linkage to electronic health records. In the Mayo Clinic Study of Aging and the Adult Changes study, a comparison of passive case finding to active approach showed the passive approach to have high specificity, approximately 70% sensitivity, and to miss mostly milder cases of dementia [36]. A similar pattern is likely in our study as health coverage is universal in the UK, and electronic health records have been shown to be reliable for the ascertainment of dementia status [37]. As dementia ascertainment in our study is independent from the assessment of risk factors, major bias is unlikely. Furthermore, we were able to undertake analyses on everyone with data on risk factors rather than only those who were alive 20 years later and participated in an in-person assessment of dementia status. Finally, we were unable to examine the subtypes of dementia due to small numbers. However, our analysis of dementia without a history of cardiovascular disease as a proxy for Alzheimer's disease suggests that the findings are likely to be generalizable to all major types of dementia.

Conclusions
Dementia is a worldwide health, economic, and social care priority. The latest systematic review of global prevalence estimates the number of people living with dementia at 46.8 million with this number expected to double every 20 years until 2050 [38]. Even a 1-year delay in dementia onset is projected to lead to 9.2 million fewer cases worldwide by 2050 [39]. However, the manner in which this can be achieved remains unclear.