The UK Biobank is a large-scale, population-based prospective cohort study of 502,656 UK residents aged 40 to 69 years who were registered with the National Health Service (NHS). The overall study protocols and data are available elsewhere. Briefly, baseline assessments were performed between 2006 and 2010 in 22 assessment centres across the UK. Participants completed electronic questionnaires to provide information on socio-demographics, lifestyle, environmental exposures, medical history and cognitive function. Physical examinations, including blood pressure, heart rate, grip strength, anthropometrics and spirometry, were performed for all participants. Biological samples, including blood, urine and saliva, were collected and stored. Follow-up of medical conditions was performed mainly through data linkage to hospital records and mortality registries.
This study was reviewed and approved by the National Information Governance Board for Health and Social Care, the NHS North West Multicenter Research Ethics Committee (11/NW/0382) and the Biobank consortium (application no. 62489). Because we used de-identified data from a public dataset, the Medical Research Ethics Committee of Guangdong Provincial People’s Hospital waived the requirement for ethical approval. The study was performed in accordance with the Declaration of Helsinki. All participants provided informed consent.
Between 2009 and 2010, ophthalmic examinations were introduced at six assessment centres across the UK. For each eye, 45° non-mydriatic retinal fundus images and optical coherence tomography (OCT) images of the optic disc and macula were captured using a spectral-domain OCT device (Topcon 3D OCT 1000 Mk2, Topcon Corp, Tokyo, Japan). At baseline, ophthalmic examinations were completed in 66,500 participants, resulting in a total of 131,238 fundus images.
Deep learning model for age prediction
A total of 80,169 images from 46,969 participants passed the image quality check and were included in the analysis. The characteristics of the participants, stratified by the number of images passing the quality check, are described in detail in Additional file 1: Table S1. Among the 46,969 participants, 11,052 did not report any previous disease at baseline. The DL model for age prediction was constructed based on fundus images of these disease-free participants. To maximize the data available, images from both eyes, if available, were used for training and validation. The association between the retinal age gap and stroke was investigated using images from the remaining 35,304 participants who had no history of stroke at baseline. In this test set, images from the right eye were used to predict retinal age, and images from the left eye were used only if images from the right eye were not available.
The methods of retinal age prediction using DL models, and the performance of the resulting model, were described in detail in a previous study. The DL model accurately predicted retinal age, as reflected by a strong correlation of 0.80 (P<0.001) between predicted retinal age and chronological age, as well as an overall mean absolute error (MAE) of 3.55 years. Attention maps retrieved from the DL model for age prediction mainly highlighted areas around the retinal vessels in the fundus images.
Retinal age gap definition
The retinal age gap was defined as the difference between the retinal age predicted by the DL model based on fundus images and the chronological age. A positive retinal age gap indicated an ‘older’-appearing retina, while a negative one indicated a ‘younger’-appearing retina.
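As an illustration of this definition, the sign convention can be sketched as follows. The function names are hypothetical and were not part of the original analysis; ages are assumed to be in years.

```python
def retinal_age_gap(predicted_retinal_age, chronological_age):
    """Retinal age gap in years: predicted retinal age minus chronological age."""
    return predicted_retinal_age - chronological_age


def describe_gap(gap):
    """Label the gap following the sign convention stated in the text."""
    if gap > 0:
        return "older-appearing"
    if gap < 0:
        return "younger-appearing"
    return "no gap"


if __name__ == "__main__":
    # Hypothetical participant: predicted retinal age 63.5, chronological age 60.0.
    gap = retinal_age_gap(63.5, 60.0)
    print(gap, describe_gap(gap))
```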
Stroke was ascertained by the UK Biobank Outcome Adjudication Group, and was defined by codes 430.X, 431.X, 433.X1, 434.X1, 436.X in the 9th edition of the International Classification of Diseases (ICD-9) and ICD-10 codes I60, I61, I63, and I64. Stroke events were derived from linked electronic health records, including hospital records on admissions and diagnoses from hospitals in England, Scotland and Wales, as well as cause of death obtained from national death registers. The date of the first known stroke after baseline assessment was recorded. The follow-up period for each participant was defined from the recruitment date of the UK Biobank study to 28th February 2018 (the last follow-up date), or to the date of the first known stroke, whichever came first.
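The follow-up rule above can be sketched as a small helper. This is a minimal illustration and not the code used in the study: the function name and return format are assumptions, and it assumes that participants with a stroke before baseline have already been excluded.

```python
from datetime import date

# Last follow-up date stated in the text.
LAST_FOLLOW_UP = date(2018, 2, 28)


def follow_up(recruitment_date, first_stroke_date=None, last_follow_up=LAST_FOLLOW_UP):
    """Return (follow_up_days, event).

    event is 1 if a first stroke was recorded on or before the last
    follow-up date, otherwise 0 (censored at the last follow-up date).
    Assumes prevalent strokes were excluded beforehand.
    """
    if first_stroke_date is not None and first_stroke_date <= last_follow_up:
        return (first_stroke_date - recruitment_date).days, 1
    return (last_follow_up - recruitment_date).days, 0


if __name__ == "__main__":
    # Hypothetical participants.
    print(follow_up(date(2010, 1, 1), date(2015, 1, 1)))  # incident stroke
    print(follow_up(date(2010, 1, 1)))                    # censored
```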
Covariates in the present analyses included socio-demographic factors (baseline age, sex, ethnicity, Townsend deprivation index [TDI], educational attainment), lifestyle factors (smoking status, drinking status, physical activity level), and general health status. Baseline age and sex were obtained from the central registry or self-reported questionnaires. Self-reported ethnicity was categorized as white or non-white. The TDI is a proxy measure of socioeconomic status based on postcode. Educational attainment was classified as college/university degree or above, or other. Smoking and drinking status were categorized as current/previous or never. Physical activity level was categorized as meeting the moderate/vigorous/walking recommendation or not. General health status was classified as excellent/good or fair/poor. Body mass index (BMI) was calculated as weight in kilograms divided by height in metres squared. Obesity was defined as a BMI of 30 kg/m² or above. Diabetes mellitus was defined as any record of self-reported or doctor-diagnosed diabetes mellitus, or the use of anti-hyperglycaemic medications or insulin. Hypertension was defined as self-reported or doctor-diagnosed hypertension, the use of antihypertensive drugs, an average systolic blood pressure ≥ 130 mmHg, or an average diastolic blood pressure ≥ 80 mmHg.
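The derived covariates for obesity and hypertension can be expressed as simple rules. The sketch below is illustrative only; the function and argument names are hypothetical, and only the thresholds are taken from the definitions above.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


def is_obese(weight_kg, height_m):
    """Obesity: BMI of 30 kg/m^2 or above."""
    return bmi(weight_kg, height_m) >= 30.0


def has_hypertension(self_reported, doctor_diagnosed, on_antihypertensives,
                     mean_sbp_mmhg, mean_dbp_mmhg):
    """Hypertension if any one of the listed criteria is met."""
    return bool(
        self_reported
        or doctor_diagnosed
        or on_antihypertensives
        or mean_sbp_mmhg >= 130
        or mean_dbp_mmhg >= 80
    )


if __name__ == "__main__":
    # Hypothetical participant: 96 kg, 1.75 m, no diagnosis, BP 128/82 mmHg.
    print(round(bmi(96, 1.75), 1), is_obese(96, 1.75))
    print(has_hypertension(False, False, False, 128, 82))
```

Note that under this definition a participant with no diagnosis and no medication is still classified as hypertensive if either averaged blood pressure reading meets its threshold.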
Continuous variables were reported as means and standard deviations (SDs), while categorical variables were reported as numbers and percentages. Unpaired t-tests and Chi-square tests were performed to examine differences in continuous and categorical variables, respectively. The log-rank test was used to compare the survival distributions among different retinal age gap groups. Cox proportional hazards regression models were used to estimate the effect of the retinal age gap on the risk of stroke. Each variable was assessed for the proportional hazards assumption, and all of them met the assumption. The retinal age gap was introduced into the models either as a continuous variable (per one-year increase) or as a categorical variable (quintiles). Model I adjusted for baseline age, sex, and ethnicity. Model II adjusted for all variables in model I plus TDI, educational level, smoking status, drinking status, physical activity level, diabetes mellitus, hypertension, obesity and general health status. Logistic regression models were used to estimate the predictive value of the well-established conventional risk factor-based model (including age, gender, smoking status, history of diabetes, systolic blood pressure, and total cholesterol to HDL-cholesterol ratio) and the retinal age-based model for 10-year stroke risk. The area under the receiver operating characteristic curve (AUC) was used to describe the discrimination of the models in predicting 10-year stroke risk.
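To make the discrimination measure concrete, the AUC can be read as the probability that a randomly chosen participant who had a stroke received a higher predicted risk than a randomly chosen participant who did not, with ties counted as one half. The following pure-Python sketch computes it by pairwise comparison. It is an illustration of the metric under that interpretation, not the R or Stata routine used in the analysis, and the function name is hypothetical.

```python
def auc(predicted_risks, outcomes):
    """Area under the ROC curve via pairwise comparison.

    predicted_risks: predicted probabilities (e.g. from a logistic model).
    outcomes: 1 for an event (stroke within 10 years), 0 otherwise.
    """
    cases = [r for r, y in zip(predicted_risks, outcomes) if y == 1]
    controls = [r for r, y in zip(predicted_risks, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one control")

    concordant = 0.0
    for case_risk in cases:
        for control_risk in controls:
            if case_risk > control_risk:
                concordant += 1.0
            elif case_risk == control_risk:
                concordant += 0.5  # ties count as half
    return concordant / (len(cases) * len(controls))


if __name__ == "__main__":
    # Hypothetical predicted risks and outcomes for four participants.
    print(auc([0.10, 0.40, 0.35, 0.80], [0, 0, 1, 1]))
```

The pairwise loop is quadratic in the number of participants, so for a cohort of this size a rank-based implementation from a statistics package would be the practical choice.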
In the first sensitivity analysis, the final models were additionally adjusted for an age-squared term. In the second sensitivity analysis, we investigated whether the retinal age acceleration residual (defined as the residual from regressing predicted retinal age on chronological age) was a biomarker of stroke.
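The retinal age acceleration residual can be sketched with an ordinary least-squares fit of predicted retinal age on chronological age. The code below is a minimal illustration that assumes a simple linear regression with an intercept; the function name is hypothetical and the study's exact model specification may differ.

```python
def age_acceleration_residuals(predicted_ages, chronological_ages):
    """Residuals from a simple linear regression of predicted retinal age
    (outcome) on chronological age (predictor)."""
    n = len(predicted_ages)
    if n != len(chronological_ages) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    mean_x = sum(chronological_ages) / n
    mean_y = sum(predicted_ages) / n
    sxx = sum((x - mean_x) ** 2 for x in chronological_ages)
    if sxx == 0:
        raise ValueError("chronological ages must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(chronological_ages, predicted_ages))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual: observed predicted age minus the age-expected predicted age.
    return [y - (intercept + slope * x)
            for x, y in zip(chronological_ages, predicted_ages)]


if __name__ == "__main__":
    # Hypothetical data: predictions compressed towards the mean age.
    print(age_acceleration_residuals([52.0, 60.0, 68.0], [50.0, 60.0, 70.0]))
```

In the hypothetical data above, the raw retinal age gap is +2 years at age 50 and -2 years at age 70, yet all residuals are approximately zero, because the residual removes the part of the gap that is explained by chronological age alone.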
A two-sided P value of < 0.05 indicated statistical significance. All analyses were performed using R (version 3.3.0, R Foundation for Statistical Computing, www.R-project.org, Vienna, Austria) and Stata (version 13, StataCorp, TX, USA).