Characteristics, service use and mortality of clusters of multimorbid patients in England: a population-based study

Background Multimorbidity is associated with mortality and service use, with specific types of multimorbidity having differential effects. Additionally, multimorbidity is often negatively associated with participation in research cohorts. Therefore, we set out to identify clusters of multimorbid patients and how they are differentially associated with mortality and service use across age groups in a population-representative sample.
Methods Linked primary and secondary care electronic health records contributed by 382 general practices in England to the Clinical Practice Research Datalink (CPRD) were used. The study included a representative set of multimorbid adults (18 years old or more, N = 113,211) with two or more long-term conditions (a total of 38 conditions were included). A random set of 80% of the multimorbid patients (N = 90,571) were stratified by age groups and clustered using latent class analysis. Consistency between obtained multimorbidity phenotypes, classification quality and associations with demographic characteristics and primary outcomes (GP consultations, hospitalisations, regular medications and mortality) were validated in the remaining 20% of multimorbid patients (N = 22,640).
Results We identified 20 patient clusters across four age strata. The clusters with the highest mortality comprised psychoactive substance and alcohol misuse (aged 18–64); coronary heart disease, depression and pain (aged 65–84); and coronary heart disease, heart failure and atrial fibrillation (aged 85+). The clusters with the highest service use coincided with those with the highest mortality for people aged over 65. For people aged 18–64, the cluster with the highest service use comprised depression, anxiety and pain. The majority of 85+-year-old multimorbid patients belonged to the cluster with the lowest service use and mortality for that age range. Pain featured in 13 clusters.
Conclusions This work has highlighted patterns of multimorbidity that have implications for health services. These include the importance of psychoactive substance and alcohol misuse in people under the age of 65, of co-morbid depression and coronary heart disease in people aged 65–84 and of cardiovascular disease in people aged 85+.


Background
As a result of improved life expectancy and ageing populations, a growing number of individuals are living with multimorbidity, i.e. more than one long-term condition [1,2]. Multimorbidity has been recognised as a global challenge for health care management [3], and it is estimated by the Health Foundation that 14 million individuals in England have multimorbidity, with over a third of these having more than four long-term conditions [4]. Patients with multimorbidity also account for the majority of primary care consultations, prescriptions and hospitalisations in the UK [5]. However, current clinical specialities, guidelines, quality improvement strategies and quality of care metrics are organised around single diseases [6], and treatments of multiple conditions are rarely coordinated, resulting in insufficient or even conflicting care [6,7].
Patients with multimorbidity have a diverse range of diseases, needs and outcomes [4,5,8]. Identifying and characterising groups of multimorbid patients that share similar patterns of long-term conditions might facilitate an improvement in their healthcare. For example, such an approach might aid the development of effective strategies for early diagnosis and prevention of multimorbidity and allow for a better design and delivery of targeted interventions [1,9]. Several systematic reviews have found common multimorbidity clusters involving cardiovascular-metabolic conditions, mental health and musculoskeletal disorders [10,11]. However, existing evidence has important limitations. First, the two largest studies of specific multimorbidity groups in the UK have used UK Biobank data, where participants are healthier (i.e. less multimorbid), smoke and drink less and are from less socioeconomically deprived areas than the overall population [12-14]. Second, most previous studies have focused on older populations (aged 60+); few have provided age-stratified clusters [10,11,15], leaving scarce evidence for the younger multimorbid population. Third, multimorbidity clusters composed of more than two conditions have not been well profiled, mostly due to non-representative and smaller samples [1,10,11]. Fourth, there is substantial heterogeneity in the number of conditions considered (often fewer than 20) and in the statistical methods. Most studies focused on grouping diseases rather than patients. In that setting each disease can only go into one cluster, so it is not straightforward to relate patients to outcomes in order to facilitate patient-centred policy-making [16]. Commonly used clustering methods were exploratory approaches such as factor analysis and hierarchical clustering [10,17], where results were highly sensitive to the subjective choice of metrics [18].
Finally, the validity and generalisability of cluster solutions in new samples is important for decision-making but is often ignored in the current literature [10,11].
This study aims to identify, validate and study the outcomes of age-stratified clusters of multimorbid adult patients in a large representative sample of UK patients. Towards this end, we used a comprehensive list of 38 long-term conditions [5] and a robust model-based probabilistic approach, latent class analysis [18].

Data source
Our analysis used the Clinical Practice Research Datalink (CPRD)-GOLD database, in which anonymised and longitudinal primary care clinical data are contributed by UK general (family) practices (GP) that use the Vision health record system [19]. CPRD has been validated to be representative of the UK population for age, sex and ethnicity [19,20]. Patients' GP records were linked with hospitalisation data (Hospital Episode Statistics, HES), all-cause mortality data (Office for National Statistics, ONS) and area-based (1000-1500 people per area around patients' home) socioeconomic deprivation data (Index of Multiple Deprivation, IMD); these linked data were available for approximately 75% of English practices in CPRD. The protocol for this study (16_057RA2) was approved scientifically and ethically by the CPRD Independent Scientific Advisory Committee.

Study population
Data on a random selection of individuals were acquired from CPRD (the same individuals studied in Cassell et al. [5]). Patients aged 18 years and above with valid registered status in a practice with data classified by CPRD as "up-to-standard" in January 2012 [19] were included in the study. We chose the year 2012 to allow complete ascertainment of 5-year mortality. Additionally, we required that their practice allowed linkage to ONS, IMD and HES, resulting in the inclusion of only English practices.

Patient and public involvement
There was no patient or public involvement in this study.

Statistical software
Data analysis was performed in R 3.4.4. R package names are given in the following sections where appropriate (in brackets and italics); for example, memory-efficient packages were used to extract data for analysis in R (ff, CALIBERdatamanage). For transparency and reproducibility, all analysis scripts and code lists are available from https://github.com/Kiddle-group.

Definition of patient characteristics, morbidities and outcomes
Morbidities in this study were defined as binary variables (present or not) based on the classification of long-term conditions (LTCs) in primary care developed by Barnett et al. [8]. This taxonomy attempted to include all conditions "likely to be chronic (defined as having significant impact over at least the most recent year) and with significant impact on patients in terms of need for chronic treatment, reduced function, reduced quality of life and risk of future morbidity and mortality". It was developed for use in UK primary care electronic health record research and has been adapted for use in CPRD [5]. The specific definitions for each LTC are based on the UK Read code system and electronic prescription data coded using CPRD's prodcode, giving a total of 38 LTCs (https://www.phpc.cam.ac.uk/pcu/cprd_cam/codelists/v11/). The LTCs used in this study largely match those of the only other large sample size UK multimorbidity cluster study [13].
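As an illustration of the binary coding described above, the following minimal Python sketch turns a patient's recorded conditions into present/absent flags and applies the two-or-more threshold used to define multimorbidity. The function names and the five-condition list are hypothetical stand-ins for the 38 LTCs and are not taken from the study's R scripts.

```python
from typing import Dict, Iterable, List

# Illustrative subset only; the study used 38 long-term conditions.
CONDITIONS: List[str] = ["depression", "anxiety", "pain", "asthma", "hypertension"]


def morbidity_indicators(recorded: Iterable[str], conditions: List[str]) -> Dict[str, int]:
    """Return a 0/1 flag per condition: 1 if it is recorded for the patient, else 0."""
    present = set(recorded)
    return {condition: int(condition in present) for condition in conditions}


def is_multimorbid(flags: Dict[str, int], threshold: int = 2) -> bool:
    """A patient is multimorbid when at least `threshold` conditions are present."""
    return sum(flags.values()) >= threshold


patient = morbidity_indicators(["pain", "depression"], CONDITIONS)
```

Because absence of a record is coded as 0, every patient has a complete indicator vector under this scheme.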
Two sets of outcome variables related to service use and mortality were defined. NHS service utilisation or treatment burden was measured by three variables over the 12-month period after January 2012: primary care consultations (consultations with any clinician in the primary care team), the number of all-type hospitalisation spells (defined by discharge dates) and the count of regular medications (at least four prescriptions in a year by counting the unique British National Formulary (BNF) codes). All-cause mortality at 2 and 5 years was extracted from ONS data.
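One possible reading of the regular medication count is sketched below in Python: a BNF code counts as a regular medication when it appears on at least four prescriptions in the year, and the outcome is the number of such codes. This interpretation, the function name and the example codes are assumptions made for illustration; the study's own scripts may operationalise the variable differently.

```python
from collections import Counter
from typing import Iterable


def regular_medication_count(bnf_codes: Iterable[str], min_prescriptions: int = 4) -> int:
    """Count unique BNF codes prescribed at least `min_prescriptions` times.

    `bnf_codes` holds one entry per prescription issued in the 12-month window.
    """
    per_code = Counter(bnf_codes)
    return sum(1 for n in per_code.values() if n >= min_prescriptions)


# Hypothetical codes: "0403" x4 and "0407" x5 qualify, "0212" x2 does not.
prescriptions = ["0403"] * 4 + ["0212"] * 2 + ["0407"] * 5
n_regular = regular_medication_count(prescriptions)
```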
Patient characteristics that were considered in this study include gender, age groups (stratified into 18-44, 45-64, 65-84 and 85+ years) in 2012, last recorded pre-2012 body mass index (BMI), last recorded pre-2012 smoking status (current, never and ex-smokers) and socioeconomic deprivation measured by quintiles of IMD across the UK (1 for the least socioeconomically deprived quintile of areas and 5 for the most). Gender and age were determined in a straightforward manner from the CPRD-GOLD patient table. BMI and smoking status were extracted from the CPRD-GOLD clinical and additional tables using CPRD entity type 13 (BMI), CPRD entity type 4 (smoking status) and a smoking status Read code list from Jennifer Quint (Imperial College London) which is available at https://github.com/Kiddle-group.

Statistical analysis
This study aims to identify clusters of multimorbid patients using patterns of co-existing long-term conditions. We used latent class analysis (LCA) (poLCA) to assign all patients to non-overlapping clusters (i.e. each patient is assigned to only one cluster) in a data-driven fashion [21,22]. Compared to other exploratory clustering methods (e.g. factor analysis, hierarchical clustering [11,23]), LCA is a model-based probabilistic clustering approach that is not sensitive to the rotation of factors and does not require any subjective choice of "distance measures" for multimorbidity patterns [18,24]. This greatly enhances the reproducibility and stability of the latent class solutions. Clustering patients rather than diseases allows diseases to belong to multiple clusters and more naturally allows the characteristics and outcomes of clusters to be studied. As a result, each derived patient cluster has a unique and probabilistic multimorbidity phenotype profile where members do not necessarily need to have all conditions.
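To make the model concrete, the sketch below fits a latent class model with binary indicators by expectation-maximisation in plain Python. It is an illustrative reimplementation under simple assumptions (random starting values, a fixed number of iterations, a single start) and is not the poLCA code used in this study. Each class has its own probability for every condition, which is why members of a cluster need not share all conditions.

```python
import math
import random
from typing import List, Tuple


def _e_step(data, weights, probs):
    """Posterior class probabilities per patient and the total log-likelihood."""
    post, loglik = [], 0.0
    for row in data:
        logs = []
        for w, class_probs in zip(weights, probs):
            lp = math.log(w)
            for x, p in zip(row, class_probs):
                lp += math.log(p if x else 1.0 - p)
            logs.append(lp)
        mx = max(logs)  # log-sum-exp for numerical stability
        total = sum(math.exp(v - mx) for v in logs)
        loglik += mx + math.log(total)
        post.append([math.exp(v - mx) / total for v in logs])
    return post, loglik


def fit_lca(data: List[List[int]], n_classes: int, n_iter: int = 200, seed: int = 0
            ) -> Tuple[List[float], List[List[float]], List[List[float]], float]:
    """Return (class weights, per-class condition probabilities, posteriors, log-likelihood)."""
    rng = random.Random(seed)
    n, m = len(data), len(data[0])
    eps = 1e-6  # keeps probabilities away from 0 and 1 so logs stay finite
    weights = [1.0 / n_classes] * n_classes
    probs = [[rng.uniform(0.25, 0.75) for _ in range(m)] for _ in range(n_classes)]
    for _ in range(n_iter):
        post, _loglik = _e_step(data, weights, probs)
        for k in range(n_classes):  # M-step: posterior-weighted means
            nk = sum(r[k] for r in post)
            weights[k] = max(nk / n, eps)
            for j in range(m):
                num = sum(r[k] * row[j] for r, row in zip(post, data))
                probs[k][j] = min(max(num / max(nk, eps), eps), 1.0 - eps)
        total_w = sum(weights)
        weights = [w / total_w for w in weights]
    post, loglik = _e_step(data, weights, probs)
    return weights, probs, post, loglik


def assign_clusters(post: List[List[float]]) -> List[int]:
    """Assign each patient to the single class with the highest posterior probability."""
    return [max(range(len(r)), key=lambda k: r[k]) for r in post]
```

In practice several random starts would be compared, since expectation-maximisation can converge to a local optimum.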
Guided by simulation studies [25], the optimal number of latent classes was decided using a combination of statistics (Bayesian Information Criterion (BIC), sample size-adjusted BIC, log-likelihood ratio test, entropy for classification quality) and clinical judgement. Within our datasets, conditions are present (i.e. recorded) or not by definition, and so missing data methods were not needed for cluster analysis. More details on a technical review of commonly used clustering methods, the LCA methodology and application of selection statistics are provided in Additional file 2: section 3.
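The selection statistics named above can be sketched as follows, assuming their commonly used textbook definitions: BIC = -2 log L + p ln(n), sample size-adjusted BIC with n replaced by (n + 2)/24, and relative entropy scaled to lie between 0 and 1. The exact formulas applied in this study are given in Additional file 2, so the functions below are illustrative only.

```python
import math
from typing import List


def n_free_parameters(n_classes: int, n_conditions: int) -> int:
    """Free parameters of a latent class model with binary indicators."""
    return (n_classes - 1) + n_classes * n_conditions


def bic(loglik: float, n_params: int, n: int) -> float:
    return -2.0 * loglik + n_params * math.log(n)


def sabic(loglik: float, n_params: int, n: int) -> float:
    """Sample size-adjusted BIC: n is replaced by (n + 2) / 24."""
    return -2.0 * loglik + n_params * math.log((n + 2) / 24.0)


def relative_entropy(post: List[List[float]]) -> float:
    """1 means perfectly separated classes; 0 means uninformative posteriors."""
    n, k = len(post), len(post[0])
    if k == 1:
        return 1.0
    h = -sum(p * math.log(p) for row in post for p in row if p > 0)
    return 1.0 - h / (n * math.log(k))
```

Lower BIC values indicate a better trade-off between fit and complexity, while higher entropy indicates cleaner classification.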
To account for the different nature of multimorbidity clusters at different ages, four age strata (18-44, 45-64, 65-84, 85+ years) were chosen. We derived the cluster solution and performed post hoc statistical tests in a stratified (by age strata) random sample of the multimorbid population that contained 80% of the patients (i.e. training set). Separate LCAs were performed for each stratum, and each patient was allocated to a single multimorbidity cluster. For ease of interpretation, clusters were labelled by their three most distinctive conditions, i.e. those whose difference in prevalence between cluster and age strata was the highest (see Additional file 2: table 1 for full details of conditions). To quantify the association between outcomes, multimorbidity clusters and patient demographics, generalised linear models were fitted (see Additional file 2: section 5). In the models for service use and health outcomes, the multimorbidity cluster with the lowest impact on the outcomes was taken as reference. For these models, individuals with missing data for last pre-2012 recording of smoking status or BMI represented a small percentage (< 5%) of the population, and so were excluded from the generalised linear models. Additional sensitivity analyses also using complete-case analysis were performed for the entire sample where the non-multimorbid patient group (which had higher missingness of smoking and BMI) was taken as reference (see Additional file 2: tables 16-19).
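The age-stratified 80/20 split and the labelling rule can be sketched in Python as below. The helper names, the random seed and the stratum-level prevalences in the example are hypothetical; only the age bands and the rule of ranking conditions by the difference between within-cluster and within-stratum prevalence come from the text.

```python
import random
from typing import Dict, List, Sequence, Tuple


def age_stratum(age: int) -> str:
    """Map an adult's age in years to one of the four strata."""
    if age < 18:
        raise ValueError("study population is restricted to adults")
    if age <= 44:
        return "18-44"
    if age <= 64:
        return "45-64"
    if age <= 84:
        return "65-84"
    return "85+"


def stratified_split(ages: Sequence[int], train_fraction: float = 0.8, seed: int = 0
                     ) -> Tuple[List[int], List[int]]:
    """Split patient indices into training and test sets within each age stratum."""
    rng = random.Random(seed)
    by_stratum: Dict[str, List[int]] = {}
    for idx, age in enumerate(ages):
        by_stratum.setdefault(age_stratum(age), []).append(idx)
    train, test = [], []
    for members in by_stratum.values():
        rng.shuffle(members)
        cut = round(train_fraction * len(members))
        train += members[:cut]
        test += members[cut:]
    return sorted(train), sorted(test)


def distinctive_conditions(cluster_prev: Dict[str, float], stratum_prev: Dict[str, float],
                           top: int = 3) -> List[str]:
    """Conditions with the largest excess prevalence in the cluster over its stratum."""
    excess = {c: cluster_prev[c] - stratum_prev[c] for c in cluster_prev}
    return sorted(excess, key=excess.get, reverse=True)[:top]
```

For example, with within-cluster prevalences of 100% depression, 41% anxiety and 31% pain and lower stratum-wide prevalences, `distinctive_conditions` would return those three conditions in that order.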

Assessment of the stability of morbidity clusters
To assess the stability of age-stratified multimorbidity clusters, LCAs were repeated in the remaining 20% of the population (i.e. test set), fixing the number of clusters to match that learned from the training set [26]. We employed three methods to indirectly validate our cluster solutions (a direct approach was not possible as clusters were unobserved). First, to check the consistency between disease profiles for 38 LTCs in the training and test sets, each cluster in the test set was matched (using two criteria for robustness) with a corresponding cluster in the training set. Matched cluster pairs were selected such that Jensen-Shannon distance [27] (JSD; a measure of the divergence between disease profiles) is the smallest and the bivariate Pearson's correlation coefficient [28] (the degree to which two disease profiles co-vary) is the highest (Additional file 2: tables 4a, b). Second, entropy measures [25] (for classification quality) computed in the training and test sets were expected to be similar. Finally, stability was further assessed by observing in the training and test sets similar associations (in terms of size, direction and statistical significance) between clusters, patient demographics and outcome variables. For more details, see Additional file 2: section 4.
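The two matching criteria can be illustrated with the Python sketch below. It assumes that each disease profile (a vector of condition prevalences) is normalised to sum to one before the Jensen-Shannon distance is computed, and that the distance is the square root of the base-2 Jensen-Shannon divergence; these choices, like the function names, are assumptions and may differ from the procedure in Additional file 2: section 4.

```python
import math
from typing import List, Sequence, Tuple


def _normalise(v: Sequence[float]) -> List[float]:
    total = sum(v)
    return [x / total for x in v]


def js_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon distance (0 = identical profiles, 1 = no overlap)."""
    p, q = _normalise(p), _normalise(q)
    mid = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return math.sqrt(max(0.0, 0.5 * kl(p, mid) + 0.5 * kl(q, mid)))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; undefined (division by zero) for constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def match_clusters(test_profiles: List[List[float]], train_profiles: List[List[float]]
                   ) -> List[Tuple[int, int]]:
    """For each test cluster, the training cluster chosen by each criterion."""
    idx = range(len(train_profiles))
    return [
        (min(idx, key=lambda i: js_distance(t, train_profiles[i])),
         max(idx, key=lambda i: pearson(t, train_profiles[i])))
        for t in test_profiles
    ]
```

A match is most convincing when both criteria select the same training cluster, i.e. when the two indices in a returned pair agree.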

Characteristics of the study population
A total of 391,669 patients were included in the study, of whom 49% and 22% had none or only one long-term condition, respectively (see Table 1 for patient demographics). Females, older individuals and those from areas of greater socioeconomic deprivation had a higher prevalence of multimorbidity.
Among the multimorbid patients (i.e. those with more than one long-term condition, N = 113,211, 29%), all unique combinations of conditions were less than 1% prevalent in the total population with the most prevalent 20 containing only pairs of conditions (Additional file 2: table 2). This together with the large number of unique combinations of conditions (Additional file 2: table 3) indicated that multimorbidity patterns were highly heterogeneous. Stratified by age strata, Table 2 shows that multimorbidity in the younger population (18-44) was more common in areas with greater socioeconomic deprivation while for older groups, the pattern is reversed.
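The tabulation of unique condition combinations can be sketched in Python as follows. Each patient's set of conditions is treated as an unordered combination and its prevalence is expressed relative to the total population; the function name and the toy data are hypothetical.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List, Tuple


def combination_prevalence(patients: Iterable[Iterable[str]], population_size: int
                           ) -> List[Tuple[FrozenSet[str], float]]:
    """Unique condition combinations with their prevalence, most common first."""
    counts = Counter(frozenset(conditions) for conditions in patients)
    return [(combo, n / population_size) for combo, n in counts.most_common()]


# Toy example: four multimorbid patients in a population of ten.
toy = [
    ["pain", "depression"],
    ["depression", "pain"],
    ["asthma", "hypertension"],
    ["pain", "depression", "anxiety"],
]
table = combination_prevalence(toy, population_size=10)
```

A long list of combinations that are each rare is the pattern the text describes as highly heterogeneous.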

Multimorbidity clusters and outcomes
For ease of reference, we refer to each cluster by its lead or key conditions (i.e. one or three conditions, respectively, whose cluster-specific prevalence is highest, and higher than their overall prevalence in their respective age group). These clusters differ across age strata, both in terms of the number of clusters per strata and main components within each cluster (Table 3 and Additional file 2: figures 1-4). The association between multimorbidity clusters and outcomes (service use and mortality) remained significant (p < 0.01 in almost all clusters) after stratifying by age strata, controlling for socioeconomic deprivation, BMI and smoking behaviour (Additional file 2: tables 8-11). Results for the distribution of outcomes (i.e. median, interquartile range (IQR)) are shown in Table 4. Covariate-adjusted incidence rate ratios (aIRRs) for service utilisation and odds ratios (OR) for mortality derived from generalised linear models are available in Additional file 2: tables 8-15 (adjusted covariates were gender, socioeconomic deprivation, smoking status, BMI and age).

Age strata: 18-44 years old
Five clusters were uncovered in the 18-44 age strata (Additional file 2: figures 1 & 5), whose lead conditions were depression (the most common cluster, 32% of patients in strata), pain (23%), asthma (20%), irritable bowel syndrome (18%) and psychoactive substance misuse (7%). Those in the cluster whose three key conditions were depression (within-cluster prevalence 100%), anxiety (41%) and pain (31%) were found to have the highest use of primary care consultations (median 12 in a year). This cluster had an aIRR of primary care consultations of 1.35 (95% confidence interval (CI) 1.28-1.43) in comparison with the cluster with the lowest service use and mortality (whose lead condition was irritable bowel syndrome).
Those in the cluster whose three key conditions were pain (36%), hearing loss (30%) and hypertension (23%) were found to have the highest hospital admission rates (an average of 0.6 visits in a year) and the highest count of regular medicines (median 1 [IQR 0-4] unique drug classes in a year). This corresponded to an aIRR for hospitalisations of 1.04 [95% CI 0.90-1.20] and an aIRR for regular medicines of 1.87 [95% CI 1.74-2.02] relative to the cluster with the lowest service use and mortality.
The highest mortality in this age range was found in the least prevalent (7%) multimorbidity cluster whose three key conditions were psychoactive substance misuse (75%), alcohol problems (42%) and depression (24%) (3.9% mortality in 5 years). This level of mortality was 18 times higher than that of individuals in the same age range without multimorbidity (0.2%). This cluster was predominantly male (72%), came from socioeconomically deprived areas (63% from the most deprived 40% of UK areas) and had high smoking rates (76% current smokers).

Age strata: 45-64 years old
In the 45-64 age strata, LCA revealed five clusters (Additional file 2: figures 2 & 6), whose lead conditions were hypertension (the most common cluster, 37% of patients in strata), irritable bowel syndrome (24%), depression (22%), asthma (12%) and alcohol problems (4%). Those [...]; its key conditions were alcohol problems (62%), psychoactive substance misuse (42%) and pain (28%). Again, this cluster was characterised by being typically male smokers from areas of high socioeconomic deprivation. Pain as a comorbidity was represented in all the clusters in this age group.

Age strata: 65-84 years old
Six clusters were found in the 65-84 age strata (Additional file 2: figures 3 & 7), whose lead conditions were hypertension (the most common cluster, 41% of patients in strata), hearing loss (22%), depression (14%), coronary heart disease (11%), chronic obstructive pulmonary disease (8%) and pain (5%). The least prevalent multimorbidity cluster (5%), whose key conditions were pain (81%), coronary heart disease (53%) and depression (45%), had the highest use of primary care consultations (median 23 [...]) [...]

Age strata: 85+ years old
Four clusters were found in the 85+ age strata, whose lead conditions were hypertension (the most common, 58% of patients in strata), pain (23%), heart failure (11%) and asthma (8%). The majority of patients (58%) fitted within a cluster whose key conditions were hypertension (72%), hearing loss (39%) and diabetes (18%). The cluster with the majority of patients had the lowest mortality (50% 5-year mortality), as well as the least number of conditions (median 3 [IQR 2-4] morbidities), and the least health care utilisation (roughly half the GP contacts, hospitalisations and regular medicines of the cluster whose lead condition was "coronary heart disease"). The cluster with the highest mortality, GP contact, hospitalisations and repeat prescriptions comprised a trio of cardiac conditions: coronary heart disease, atrial fibrillation and heart failure.
Abbreviations: IBS, irritable bowel syndrome; PSM, psychoactive substance misuse (not alcohol); CHD, coronary heart disease; COPD, chronic obstructive pulmonary disease.

Validation of cluster morbidity profiles
As well as validating the clusters by their association with patient characteristics and outcomes, the similarity of multimorbidity clusters was compared between the training set (80% of patients, N = 90,571) and the test set (the other 20% of patients, N = 22,640). Results are summarised below and given in full in Additional file 2: section 4. Measures of cluster quality (i.e. entropy) were found to be consistent between the training and test sets.
As the training set contained more disease patterns, the derived clusters were more comprehensive. The test set (with fewer patients) contained fewer disease patterns, and therefore, we expected the derived clusters to be a subset of those in the training set. Indeed, validation of cluster profiles showed that every cluster in the test set found a match in the training set. Some clusters were particularly robust (had the smallest JSD and the highest Pearson's correlation coefficient), for instance, those in the largest age strata (65-84 age strata, N = 49,494), and clusters whose lead condition was depression, psychoactive substance misuse or alcohol problems. A cluster with a less clear match had the lead condition asthma in the 18-44 age strata.

Summary of results and comparison with other studies
This study identified and validated clusters of multimorbid patients using a novel patient-centred approach. In summary, we identified 20 patient clusters across four age strata. In the younger age strata (18-44; 45-64), the clusters with the highest mortality (18 times higher than the non-multimorbid group in 18-44-year olds) comprised psychoactive substance misuse in combination with alcohol problems. The clusters with the most contact with general practice in people aged under 65 comprised depression, anxiety and pain. In 65-84-year olds, the cluster with the highest mortality and highest health service use (GP contact, hospitalisations, repeat prescriptions) comprised pain, coronary heart disease and depression, and in people aged 85 or over, it comprised heart failure, coronary heart disease and atrial fibrillation. The most common cluster in 18-44-year olds was centred around depression, but in all other age groups, it was centred around hypertension. In the oldest age group, this hypertension-centred cluster was associated with the best survival and lowest health service use among multimorbid patients. Pain featured in 13 of the clusters.
In this study, unlike most previous analyses of multimorbidity, we have defined novel clusters in terms of patients rather than diseases [10,11]. These clusters have practical implications for service delivery, by supporting a move away from healthcare for individual diseases towards the care of individuals with clusters of conditions, and thus helping to reduce treatment burden [29]. The high mortality of the cluster of psychoactive substance and alcohol misuse warrants attention. The descriptive epidemiology of this cluster (male, smoker, under age 65, relatively low service use, from areas of greater socioeconomic deprivation) supports the need for better integration of substance misuse services within primary care and need to provide improved access to holistic management including for physical health [30]. Conversely, we found that the commonest cluster in people aged 85 and over (58% of patients with multimorbidity in this age group) was associated with the least health service use and the lowest mortality (Table 4). This gives a more nuanced perspective on the association of multimorbidity with age that has been widely reported [8], in that it suggests that in the oldest age group, multimorbidity per se may be less important, although numerically most common. We hypothesise that this is due to a survivor effect, i.e. individuals with more fatal multimorbidity often do not reach this age. Our age-stratified approach also enables different patterns of co-morbidity to be identified. Thus, in younger age groups, clusters focused around mental health are associated with most GP contact; in people aged 65-84, a cluster of mental health and coronary heart disease is associated with most GP contact, and other indicators of health service use, whereas in people aged over 85, the cluster representing most health service contact is dominated by cardiovascular conditions.
In terms of relative importance of single conditions within multimorbid clusters, the predominance of mental health conditions and hypertension has been identified in previous work [10,11,13,31]. Mental health conditions are recognised as having a major impact on health service use over and above physical problems [32], and our findings emphasise the importance of ensuring psychological needs are given equal priority to physical ones in those clusters of patients where mental health problems are prevalent [33]. A novel finding in our work is the inclusion of pain in many of the clusters we identified. This aligns with studies that confirm chronic pain is widespread and emphasises the need to provide integrated pain management services to address the potential adverse impact on health service use and both physical and social functioning [34,35].

Strengths and limitations
The robust identification of such clusters would not have been possible without the novel use of representative data reflecting real-world patterns of multimorbidity, age stratification, patient-level clustering (not requiring all patients to have identical lists of conditions) and validation with held-out data. This is the largest-scale application of age-stratified latent class analysis to multimorbidity, both in patient numbers (above 100,000) and the number of conditions (38) [11]. By including younger patients and stratifying by age, we see how multimorbidity clusters differ over the lifecourse. Combining this with the release of reproducible analysis scripts is an approach which we recommend for future multimorbidity clustering efforts. Our systematic approach including all 38 conditions from Barnett et al. [8], age stratification, clustering and outcomes, was necessary to handle the complexity of multimorbidity in healthcare.
This study shares the typical limitations of electronic health record research, which relies on routine coding within the healthcare system; these include residual confounding and variable CPRD data quality. Wherever practically feasible we have taken steps to address these, e.g. the careful design of codelists, relying on variables with low missingness and adjusting for key variables. Some relevant information, such as disease severity, was not available for the majority of diseases and so was not modelled. This may affect the association of disease with characteristics and outcomes. Given the observational nature of these data, some residual confounding such as this is inevitable, and so we caution that the relationship between clusters, patient demographics and outcomes should not be interpreted causally.
While the clustering approach used (LCA) is a robust probabilistic approach, results may differ subtly if other approaches are used. Validation of latent clusters also requires further research where a larger sample size for the test set, perhaps from another database or country, can strengthen the validation. We note that in every age strata, there was a cluster whose lead condition (pain, irritable bowel syndrome, hearing loss and asthma respectively) had a within-cluster prevalence of less than 50%, suggesting that these clusters are less distinctive than the others. It is also interesting that they are often the clusters with the lowest mortality. While these were validated in the test set, it may be that bigger datasets are required to split these into more distinct and interpretable clusters. Despite this, given the large and representative sample and the consistency of results internally, across age strata and with the existing literature, we are confident in our main results. Finally, multimorbidity evolves over time, but we only use longitudinal data to extract conditions in 2012 and service use and mortality outcomes.

Conclusion
These multimorbidity clusters highlight major targets for public health and healthcare, giving a more nuanced understanding of multimorbidity than the work of Barnett et al. [8], which relies more on simple disease counts. The 18-fold higher mortality of younger multimorbid patients with psychoactive substance misuse is a clear case of an unmet need. Improving outcomes for this neglected patient group is likely to be feasible given that their risk factors (drug use, smoking, deprivation) are potentially amenable to intervention. Conversely, the fact that the majority of older multimorbid patients have relatively low service use and mortality has implications for the design of health services. More generally, the fact that chronic pain is a key feature of many multimorbidity clusters suggests that it is important to manage pain within the context of multimorbidity rather than in its own right. Similarly, our findings add to the evidence showing the importance of mental health in multimorbid patients, justifying the push for parity of physical and mental health within the healthcare system.
While patients with multimorbidity account for an ever-increasing proportion of healthcare need and provision [1,4,5,7,8], no existing interventions have shown convincing evidence of benefit in improving important outcomes [9,36]. Our findings fit with the suggestion from Salisbury et al. [9] that one reason for the failure of previous interventions is that multimorbidity is heterogeneous, with very different diseases, needs and outcomes in different groups of patients. Our findings support the proposal that interventions to improve outcomes in multimorbidity may be more appropriately targeted on distinct types, and we have systematically highlighted groups of patients where tailored approaches could be attempted.

Additional file 2. Supplementary
Additional file 3. STROBE checklist