High-risk multimorbidity patterns on the road to cardiovascular mortality

Background Multimorbidity, the co-occurrence of two or more diseases in one patient, is a frequent phenomenon. Understanding how different diseases condition each other over the lifetime of a patient could significantly contribute to personalised prevention efforts. However, most of our current knowledge on the long-term development of the health of patients (their disease trajectories) is confined either to narrow time spans or to specific (sets of) diseases. Here, we aim to identify decisive events that potentially determine the future disease progression of patients.
Methods Health states of patients are described by algorithmically identified multimorbidity patterns (groups of included or excluded diseases) in a population-wide analysis of 9,000,000 patient histories of hospital diagnoses observed over 17 years. Over time, patients might acquire new diagnoses that change their health state; the resulting sequence of health states describes a disease trajectory. We measure the age- and sex-specific risks that patients will acquire certain sets of diseases in the future, depending on their current health state.
Results In the present analysis, the population is described by a set of 132 different multimorbidity patterns. For elderly patients, we find three groups of multimorbidity patterns associated with low (yearly in-hospital mortality of 0.2–0.3%), medium (0.3–1%) and high in-hospital mortality (2–11%). We identify combinations of diseases that significantly increase the risk of reaching the high-mortality health states in later life. For instance, in men (women) aged 50–59 diagnosed with diabetes and hypertension, the risk of moving into the high-mortality region within 1 year is increased by a factor of 1.96 ± 0.11 (2.60 ± 0.18) compared with all patients of the same age and sex, and by a factor of 2.09 ± 0.12 (3.04 ± 0.18) if additionally diagnosed with metabolic disorders.
Conclusions Our approach can be used both to forecast future disease burdens and to identify the critical events in the careers of patients which strongly determine their disease progression and therefore constitute targets for efficient prevention measures. We show that the risk for cardiovascular diseases increases significantly more in females than in males when diagnosed with diabetes, hypertension and metabolic disorders.

is an ancestor of node $j$; the set $S_0$ is represented by the root node. To each node $k$ is further associated a vector $p^{(k)} = (p^{(k)}_1, \dots, p^{(k)}_N)$ with $p^{(k)}_j = \frac{1}{|S_k|} \sum_{i \in S_k} X_{i,j}$. Geometrically, $p^{(k)}$ is the centroid of the $|S_k|$ binary vectors represented by node $k$, embedded into $\mathbb{R}^N$. For $1 \le j \le N$, $p^{(k)}_j$ is the probability that an observation, randomly selected from $S_k$, has feature $j$.
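The centroid of a node can be illustrated with a few lines of Python. This is only a minimal sketch: the function name `centroid` and the toy node `S_k` are hypothetical and do not come from the original analysis; observations are assumed to be equal-length lists of 0/1 entries.

```python
from typing import List, Sequence


def centroid(rows: Sequence[Sequence[int]]) -> List[float]:
    """Feature-wise mean of a non-empty collection of equal-length 0/1 rows.

    Entry j is the fraction of observations in the node that have feature j.
    """
    if not rows:
        raise ValueError("a node must represent at least one observation")
    size = len(rows)
    n_features = len(rows[0])
    return [sum(row[j] for row in rows) / size for j in range(n_features)]


# Hypothetical node with four observations and three features.
S_k = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 0, 1],
    [1, 0, 1],
]
print(centroid(S_k))  # [0.75, 0.0, 0.75]
```

Each entry of the result can be read directly as the probability that a randomly drawn observation of the node has the corresponding feature.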

The inertia of a node $k$ is defined as […], where '•' denotes the coefficientwise product, '·' is the Euclidean scalar product, $\mathbf{1} = (1, \dots, 1)$ and $\lVert \cdot \rVert_2$ denotes the Euclidean norm. For a given feature $1 \le j \le N$, the value $p$ […] according to the entries of $w$. Note that our use of the letter $w$ differs from [1], where weights were assigned to observations.
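The exact inertia formula is not preserved in the text above, so the following is a sketch under explicit assumptions rather than the authors' definition: inertia is taken to be the sum of squared distances of the observations to the node centroid, and the feature weights `w` are assumed to multiply each squared coordinate difference. Under those assumptions, binary data admit a closed form in terms of the centroid alone, which the sketch checks numerically. All function names are hypothetical.

```python
from typing import List, Optional, Sequence


def centroid(rows: Sequence[Sequence[int]]) -> List[float]:
    """Feature-wise mean of a non-empty collection of 0/1 rows."""
    size = len(rows)
    return [sum(row[j] for row in rows) / size for j in range(len(rows[0]))]


def inertia_direct(rows: Sequence[Sequence[int]],
                   weights: Optional[Sequence[float]] = None) -> float:
    """Assumed definition: weighted sum of squared distances to the centroid."""
    n = len(rows[0])
    w = list(weights) if weights is not None else [1.0] * n
    p = centroid(rows)
    return sum(w[j] * (row[j] - p[j]) ** 2 for row in rows for j in range(n))


def inertia_closed_form(rows: Sequence[Sequence[int]],
                        weights: Optional[Sequence[float]] = None) -> float:
    """For 0/1 data the same quantity equals |S| * sum_j w_j * p_j * (1 - p_j)."""
    n = len(rows[0])
    w = list(weights) if weights is not None else [1.0] * n
    p = centroid(rows)
    return len(rows) * sum(w[j] * p[j] * (1.0 - p[j]) for j in range(n))


# Hypothetical node with four observations and three features.
S_k = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 0, 1],
    [1, 0, 1],
]
print(inertia_direct(S_k), inertia_closed_form(S_k))  # 1.5 1.5
```

The closed form only needs the centroid and the node size, which is convenient when many candidate splits have to be scored.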

Suppose that, after any number of iterations of the algorithm, node $k$ is a leaf node of the tree constructed thus far.

To simplify the notation, we drop the index k and let S be the set of observations represented by that node.

Define for a given feature $1 \le j \le N$ the subsets $S^j_0 = \{i \in S \mid X_{i,j} = 0\}$ and $S^j_1 = \{i \in S \mid X_{i,j} = 1\}$. Hence, the sets $S^j_1$ and $S^j_0$ partition the set $S$ into those observations which do or do not have feature $j$, respectively.
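The partition by a single feature, and its use for choosing a split, can be sketched as follows. Two points are assumptions, not statements from the source: the split criterion is taken to be the summed (unweighted) inertia of the two candidate children, and candidate splits that leave one side empty are skipped. The names `partition`, `inertia` and `best_split` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Row = Sequence[int]


def partition(rows: Sequence[Row], j: int) -> Tuple[List[Row], List[Row]]:
    """Split observations into those without (first) and with (second) feature j."""
    without_j = [row for row in rows if row[j] == 0]
    with_j = [row for row in rows if row[j] == 1]
    return without_j, with_j


def inertia(rows: Sequence[Row]) -> float:
    """Assumed unweighted inertia: sum of squared distances to the centroid."""
    size = len(rows)
    n = len(rows[0])
    p = [sum(row[j] for row in rows) / size for j in range(n)]
    return sum((row[j] - p[j]) ** 2 for row in rows for j in range(n))


def best_split(rows: Sequence[Row], allowed: Sequence[int]) -> Optional[int]:
    """Return the allowed feature whose split minimises the summed child inertia."""
    best_j, best_cost = None, None
    for j, flag in enumerate(allowed):
        if flag != 1:
            continue  # feature excluded by the binary vector v
        s0, s1 = partition(rows, j)
        if not s0 or not s1:
            continue  # assumption: a split must produce two non-empty children
        cost = inertia(s0) + inertia(s1)
        if best_cost is None or cost < best_cost:
            best_j, best_cost = j, cost
    return best_j


# Hypothetical leaf with four observations and three features.
S = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 0, 0]]
v = [1, 1, 1]
print(best_split(S, v))  # 0
```

In this toy example features 0 and 1 induce the same partition and tie; the sketch keeps the first of them.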

Among the features $1 \le j \le N$ for which $v_j = 1$, the algorithm now selects the feature $j_0 = \operatorname{argmin}$ […], where $I^j_0$ and $I^j_1$ are the inertias of $S^j_0$ and $S^j_1$, respectively, and […]. Node $k$ now becomes the parent node of two newly created leaf nodes representing the subsets $S^{j_0}_0$ and $S^{j_0}_1$, respectively. The feature $j_0$ is hence the feature used to split up node $k$; the binary vector $v$ specifies […]

[…] where $X_1, \dots, X_s$ are iid random variables with $X_1 \sim \mathrm{Bernoulli}(p)$. It follows from the central limit theorem that for $s \to \infty$, […], where $\xrightarrow{d}$ denotes convergence in distribution. This means that for $\delta > 0$, […], where $C(\cdot)$ is the cumulative distribution function of the standard normal distribution. For sufficiently large $s$, the condition that […] is equivalent to $p \ge \delta^2/(s\lambda^2)$.
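The two quantities that survive in this passage can be evaluated numerically: the central probability mass $2C(\delta) - 1$ of the standard normal distribution and the threshold $\delta^2/(s\lambda^2)$. The sketch below uses only the Python standard library. The function names are hypothetical, and the example values for $s$ and $\lambda$ are illustrative assumptions that do not come from the source.

```python
import math


def standard_normal_cdf(x: float) -> float:
    """C(x): cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def central_coverage(delta: float) -> float:
    """Probability mass of a standard normal variable within [-delta, delta]."""
    return 2.0 * standard_normal_cdf(delta) - 1.0


def min_probability(delta: float, s: int, lam: float) -> float:
    """Threshold delta^2 / (s * lambda^2) that p must reach, as stated in the text."""
    return delta ** 2 / (s * lam ** 2)


print(round(central_coverage(1.96), 4))  # 0.95

# Illustrative values only: s = 10,000 trials and lambda = 0.1 are assumptions.
threshold = min_probability(1.96, 10_000, 0.1)
print(f"{threshold:.4f}")  # 0.0384
```

Because the threshold scales as $1/s$, larger samples allow smaller probabilities to satisfy the condition.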

For $\delta \approx 1.96$, $2C(\delta) - 1 = 0.95$. Our condition for an edge to be robust is therefore that […]

S5 Performance of the method and comparison with benchmarks

To quantify how well the cluster transitions shown in Fig. 6 of the main text characterize the progression of actual multimorbid health states, we evaluate the performance of our method for simulating patient trajectories in terms of several metrics and compare these metrics to two benchmark models that replace DIVCLUS-T by a clustering scheme which assigns patients to clusters based on a single disease.

We denote for $1 \le d \le N$ by $p_d$ the probability that a randomly selected patient of the study cohort has been assigned a diagnosis from the ICD-10 code block $d$ until the end of the observation period. The total disease burden of the study cohort at the end of the observation period is given by […], where $M = 5{,}112{,}811$ is the size of the cohort and $p = (p_1, \dots, p_N)$. Here, for $p \ge 1$, $\lVert \cdot \rVert_p$ denotes the $L^p$ norm, which, for a vector $x = (x_1, \dots, x_L)$, is defined as $\lVert x \rVert_p = \left( \sum_{i=1}^{L} |x_i|^p \right)^{1/p}$.

Let $\hat{p}_d$ be the marginal probability for a randomly selected patient to have been diagnosed with disease $d$ until the end of the observation period according to our random walk model. The vector $\hat{p} = (\hat{p}_1, \dots, \hat{p}_N)$ is computed as […], where $p^{(k)} = (p$ […]

Figure S2: Visualisation of the transition rates between each pair of clusters. The image consists of 132 × 132 pixels, each representing one pair of clusters. The pixel in the jth row and kth column is coloured according to the rate at which patients transition between cluster j and cluster k. For better visibility, the pixels on the diagonal from the upper left to the lower right corner of the image, which represent self-loops, have been shaded dark. Axes: cluster ID; colour scale: ln(transition rate).
Figure S3: Visualisation of the number of patients transitioning between each pair of clusters. The image consists of 132 × 132 pixels, each representing one pair of clusters. The pixel in the jth row and kth column is coloured according to the number of patients who step from cluster j to cluster k or from cluster k to cluster j. For better visibility, the pixels on the diagonal from the upper left to the lower right corner of the image, which represent self-loops, have been shaded dark. Axes: cluster ID; colour scale: ln(step count).

Figure S5: Two-dimensional histogram of the number of distinct diagnoses acquired by patients of the study cohort in the observation period from 2003 to 2014, depending on the 5-year age group they belong to in 2014. The data is normalized such that values sum to one separately for each age group.
[Figure; axis: date of birth, 1900 to 2010]

Tables

The following table lists […]

Table S61: Inclusion and exclusion criteria for cluster 60. Female ratio: 0%, mean age of patients: 54, mortality: 1%.

Hypertensive diseases (I10-I15); Other forms of heart disease (I30-I52); Diseases of arteries, arterioles and capillaries (I70-I79); Renal failure (N17-N19)

[…] 2.30 ± 0.24, 2.21 ± 0.14, 1.73 ± 0.11

Cluster | Sex | Compared with all patients (left) | Compared with cluster 0 (right)
120 | m | 2.09 ± 0.12, 1.58 ± 0.08, 1.47 ± 0.07 | 1.68 ± 0.12, 1.35 ± 0.08, 1.28 ± 0.07
 | f | 3.04 ± 0.18, 2.17 ± 0.10, 1.59 ± 0.07 | 2.39 ± 0.18, 1.86 ± 0.10, 1.48 ± 0.07

Table S134: Relative risks for patients in clusters 112, 119 and 120 to step into the high cardiovascular mortality region within one year, compared with all patients of the same sex and age group who have not been assigned to the high cardiovascular mortality region (left) and patients of the same sex and age group in cluster 0 where patients have not been assigned any hospital diagnoses yet (right).

Table S137: Cluster transitions which are significantly overrepresented in males compared with females. The columns 'Source' and 'Target' give the labels of the source and target cluster of the corresponding transition, 'Rate male>' gives the lower bound of the 95% confidence interval for the rate at which males of the corresponding age group step from the source cluster to the target cluster; 'Rate female<' gives the upper bound of the 95% confidence interval of the same rate for females. The column 'Inclusion diag. source' gives the inclusion criteria of the source cluster, 'New diag.' denotes the diagnoses which patients acquire when stepping from the source to the target cluster. The column 'Baseline<' gives the upper boundary of the 95% confidence interval of the incidence of the diagnosis block in the column 'New diag.' for men of the corresponding age group.
If two blocks are listed in the column 'New diag.', the smaller incidence value is given. To exclude trivial results, only transitions made by at least 20 female patients are listed.