Study design and participants
The full study cohort consisted of 2951 consecutive admissions to the acute geriatrics service at University College London Hospital (UCLH) between March 2015 and March 2017, evaluated in the course of an unselected audit of the service. The cohort was drawn from all patients over the age of 70 admitted with any acute general medical problem: the indication for entry into the acute geriatrics service at UCLH. The cohort excluded patients whose admission diagnosis was surgical and those admitted directly to the intensive care unit. Each patient was reviewed by a consultant geriatrician within 24 h of hospital admission and clinically classified, from the medical notes and bedside clinical assessment, as having (i) delirium only; (ii) dementia only; (iii) delirium superimposed on dementia; or (iv) no or minimal cognitive impairment. Admissions were treated as a single episode if the patient was readmitted within 28 days of the prior discharge date. We linked contemporaneous admission information, laboratory investigations, and imaging investigations to this clinical dataset, each corresponding as closely as possible to the index admission (laboratory results within 48 h of admission; non-contrast CT head imaging performed within 4 weeks of the admission date). The full complement of variables was available for 804 admission episodes involving 616 unique patients (see Fig. 1). The primary diagnosis of each patient was coded as a chapter header of the International Classification of Diseases, 10th revision (ICD-10). Each patient’s mortality status and date of death were recorded on 24 December 2018 through the hospital vital statistics database (Carecast, GE Healthcare). The study has ethical permission for the analysis of irrevocably anonymised data gathered in the course of routine clinical care. Our reporting adopts the TRIPOD reporting framework.
Haematological and biochemical investigations
Routinely performed blood tests with coverage of at least 75% of the population—full blood count differentials, red cell distribution width, urea, creatinine, glomerular filtration rate, alanine transaminase, alkaline phosphatase, bilirubin, albumin, potassium, C-reactive protein—were linked for each admission. Where there were multiple values, we used both the chronologically indexed first value and the mean and standard deviation for the rest of the admission. Where only one test was performed, first and mean were identical and standard deviation was zero. This procedure yielded a set of 78 variables capturing both static and dynamic changes in each test. Distributions were visually examined, transformed where appropriate, and clipped to enclose values within 99% of the density of the underlying distribution. Missingness is reported in Additional file 1: Table S1.
Clinical investigations are generally guided by prior, clinically informed belief. To capture the effect of such ‘intention to investigate’, five levels of investigative intention combined with obtained values were defined for each test: (1) investigation performed or not performed (one binary variable); (2) count of investigations performed over the first 48 hours (one real-numbered variable); (3) investigative intention level 1 and the first test value (two variables); (4) investigative intention level 2 and the mean test value (two variables); and (5) the first test value, mean, and standard deviation (three variables).
Data were modelled at different investigative intention levels to quantify the relative predictive content of the intention to investigate vs the actual test values thereby obtained.
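The five levels can be illustrated with a minimal sketch. It is not the study code: the function name `intention_features` and the dictionary keys are hypothetical, and it assumes that the mean and standard deviation are taken over all values in the window and that the population standard deviation is used, which is consistent with a single result giving a standard deviation of zero.

```python
from statistics import mean, pstdev
from typing import Dict, Optional, Sequence


def intention_features(values: Sequence[float], level: int) -> Dict[str, Optional[float]]:
    """Features for one blood test at a given investigative intention level (1-5).

    `values` holds the results of one test in chronological order; an empty
    sequence means the test was not performed. None marks a missing value.
    """
    performed = len(values) > 0
    first = values[0] if performed else None
    avg = mean(values) if performed else None
    # Assumption: population SD, so that a single result yields exactly zero.
    sd = pstdev(values) if performed else None

    if level == 1:  # performed or not (one binary variable)
        return {"performed": int(performed)}
    if level == 2:  # number of times the test was performed
        return {"count": len(values)}
    if level == 3:  # level 1 plus the first value
        return {"performed": int(performed), "first": first}
    if level == 4:  # level 2 plus the mean value
        return {"count": len(values), "mean": avg}
    if level == 5:  # first value, mean, and standard deviation
        return {"first": first, "mean": avg, "sd": sd}
    raise ValueError("level must be between 1 and 5")
```

For example, `intention_features([4.0, 6.0], 5)` yields the first value, mean, and standard deviation of the two results, whereas `intention_features([], 3)` records only that the test was not performed and leaves the value missing.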
Cranial imaging
Non-contrast CT imaging of the head performed within 4 weeks of admission for any indication was linked to each patient episode. Each image was processed within an SPM-based (https://www.fil.ion.ucl.ac.uk/spm/) pipeline that included, in order, rigid-body realignment to Montreal Neurological Institute (MNI) space, resampling to 1 mm³ isotropic resolution, and non-linear unified spatial segmentation and normalisation to MNI space based on a CT-optimised extension of SPM’s unified segmentation and normalisation routine [13], employing a custom, CT-specific atlas of both intensity and spatial distributions [14] (https://github.com/WCHN/CTseg). The unified segmentation and normalisation approach enables robust segmentation of tissues even in the presence of focal pathological changes, which are implicitly modelled as outliers. The presence and nature of any pathology were not explicitly modelled, for the following reasons. First, the diversity of pathological appearances in this population (spanning chronic vascular, degenerative, benign neoplastic, metabolic, and traumatic changes commonly comorbid with acute medical admission) is too wide to be successfully captured at moderate data-scales. Second, leaving diverse variation unmodelled can only reduce predictive performance, our primary task, not spuriously enhance it. Third, our objective is not to create an optimal predictive model but to enable a meaningful comparison of model flexibility and input dimensionality. Fourth, deploying an array of disease-specific models would greatly complicate the image analysis, introducing potential dependence on methodological specificities that would limit generalisability.
The output of the pipeline for each patient was two sets of probabilistic tissue segmentation maps of grey matter, white matter, cerebrospinal fluid, skull, and meninges/soft tissue: one native and one non-linearly registered to MNI.
Summary statistics of the volumes of each tissue compartment were derived by thresholding each native-space compartment at > 0.5 and summing the result. Total intracranial volume was quantified as the sum of white matter, grey matter, and cerebrospinal fluid volumes; degree of atrophy, as the sum of grey and white matter divided by total intracranial volume.
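The summary statistics above can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline itself: the function names are hypothetical, each compartment is passed as a flat sequence of voxel probabilities, thresholding is taken to mean binarising at > 0.5 before summing, and each voxel is assumed to be 1 mm³ (0.001 ml).

```python
from typing import Dict, Iterable


def compartment_volume(probabilities: Iterable[float], voxel_volume_ml: float = 0.001) -> float:
    """Volume of one tissue compartment: voxels with probability > 0.5, in millilitres."""
    n_voxels = sum(1 for p in probabilities if p > 0.5)
    return n_voxels * voxel_volume_ml


def summary_statistics(
    grey: Iterable[float],
    white: Iterable[float],
    csf: Iterable[float],
    voxel_volume_ml: float = 0.001,
) -> Dict[str, float]:
    """Total intracranial volume and the atrophy index as defined in the text."""
    gm = compartment_volume(grey, voxel_volume_ml)
    wm = compartment_volume(white, voxel_volume_ml)
    cf = compartment_volume(csf, voxel_volume_ml)
    tiv = gm + wm + cf  # total intracranial volume
    if tiv == 0:
        raise ValueError("no voxel exceeds the threshold in any compartment")
    # Degree of atrophy as defined in the text: (grey + white) / TIV.
    return {"tiv": tiv, "atrophy": (gm + wm) / tiv}
```

Under this definition the index is the fraction of the intracranial volume occupied by brain tissue, so lower values correspond to greater atrophy.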
Sets of downsampled, signal-optimised, high-dimensional representations of each non-linearly registered compartment were created by cubic resampling of each compartmental image at 5 mm isotropic resolution and extracting all voxels meeting the following criteria: tissue probability > 0.5 and voxel-wise probability variance across the cohort > 0.01. These representations were used as input to the predictive models.
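The voxel selection criteria can be sketched as below. The text does not state how the tissue probability criterion is aggregated across patients, so this sketch assumes the cohort-mean probability is compared with 0.5 and that the population variance across patients is compared with 0.01; the function name `select_voxels` is hypothetical, and the resampling step is omitted.

```python
from statistics import mean, pvariance
from typing import List, Sequence


def select_voxels(
    cohort: Sequence[Sequence[float]],
    prob_threshold: float = 0.5,
    var_threshold: float = 0.01,
) -> List[int]:
    """Indices of voxels retained as model inputs.

    cohort[i][j] is the tissue probability of voxel j in patient i, with all
    images already registered to MNI space and resampled to a common grid.
    """
    n_voxels = len(cohort[0])
    kept = []
    for j in range(n_voxels):
        column = [patient[j] for patient in cohort]
        # Assumption: probability criterion applied to the cohort mean.
        likely_tissue = mean(column) > prob_threshold
        # Keep only voxels that vary enough across patients to be informative.
        variable = pvariance(column) > var_threshold
        if likely_tissue and variable:
            kept.append(j)
    return kept
```

Voxels that are almost always tissue, or almost never tissue, carry little discriminative signal, which is why both criteria must be met.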
Predictive modelling
Predictive models for 600-day post-admission mortality were constructed with the gradient boosting machine algorithm XGBoost [15]. The choice of algorithm was motivated by the combination of robustness, flexibility, data efficiency, and optimisability given the scale of available data. To quantify the value of increased dimensionality, we estimated an array of models incrementally increasing in number and range of input variables: (1) age and sex (two variables); (2) primary diagnosis, age, and sex (17 variables); (3) cognitive status, age, and sex (four variables); (4) primary diagnosis, cognitive status, age, and sex (19 variables); (5) bloods, primary diagnosis, cognitive status, age, and sex (91 variables); (6) CT intracranial, primary diagnosis, cognitive status, age, and sex (5367 variables); (7) CT extracranial, primary diagnosis, cognitive status, age, and sex (12989 variables); (8) CT whole brain, age, and sex (18399 variables); (9) CT whole brain, bloods, primary diagnosis, cognitive status, age, and sex (18494 variables); (10) CT whole brain, primary diagnosis, cognitive status, age, and sex (18422 variables). The target outcome for all models was survival at 600 days from admission.
The data were randomly split into training (70%) and testing (30%) partitions, stratified by 600-day mortality outcome. Where multiple CT images were obtained in the same admission episode, the first image was always used. The test partition contained unique patients only. XGBoost models were trained and optimised using tenfold cross-validation within the training partition only, with 600-day mortality as the target and the area under the receiver operating characteristic curve (AUROC) as the evaluation metric. A random initial parameter search was followed by a manually targeted grid search to optimise model hyperparameters (number of estimators, maximum depth, minimum child weight, learning rate, gamma, subsample, column sample by tree) (Additional file 1: Table S2). The best performing fold hyperparameters, as defined by maximal AUROC, were used to quantify performance on held-out test data, evaluated through tenfold cross-validation of the test set only. For completeness, the area under the precision-recall curve (AUPRC) is also provided, without model retuning to that objective. SHapley Additive exPlanations (SHAP) values for the top twenty most contributory non-imaging features were derived from the best performing model to illustrate comparative non-anatomical feature weighting in the fitted model. Model calibration, decision threshold curves, and performance variation with age and sex were evaluated for the best model. No imputation of any missing data was necessary: XGBoost allows missing values to be modelled explicitly.
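The partitioning and the evaluation metric can be illustrated with a minimal standard-library sketch. It is a stand-in, not the study code: the function names, the seed, and the rounding rule are assumptions, the split is made at the level of episodes without enforcing unique patients in the test partition, and model fitting and hyperparameter search with XGBoost are not shown.

```python
import random
from typing import List, Sequence, Tuple


def stratified_split(
    labels: Sequence[int], test_fraction: float = 0.3, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Random train/test split of indices, preserving the outcome proportions."""
    rng = random.Random(seed)
    train: List[int] = []
    test: List[int] = []
    for outcome in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == outcome]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)  # per-outcome test quota
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative case count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise formulation is quadratic in the number of cases, which is acceptable at the scale of a few hundred episodes; a rank-based implementation would be preferable for larger cohorts.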
Anatomical inference
To understand the anatomical patterns driving the imaging contribution to model fidelity, we sought to identify linear and non-linear voxel-wise associations with the target outcome. To identify linear relations, we performed standard voxel-based brain morphometry of grey matter, white matter, soft tissue, and bone compartments across separate models, all implemented in SPM. At each voxel, the corresponding tissue concentration—the dependent variable—was entered into a multiple regression with survival, age, sex, delirium status, dementia status, degree of global atrophy, and total intracranial volume as independent variables. After model estimation, one-tailed t-tests were performed on the regression coefficients with the resultant SPMs interpreted at a p < 0.05 family-wise error corrected threshold and displayed at p < 0.001 uncorrected to convey the full spatial extent of anatomical association. To identify potentially non-linear associations captured by the XGBoost model, its feature importances, indexed by ranked Gini impurity, were projected back into MNI space for anatomical visualisation.
Code and data availability
The code employed in this study and derived imaging maps are available from the corresponding authors on request by email. The source data is not available for dissemination under the terms of ethically approved access owing to concerns about potential reidentification in the context of high-dimensional data.