Impact of clinical phenotypes on management and outcomes in European atrial fibrillation patients: a report from the ESC-EHRA EURObservational Research Programme in AF (EORP-AF) General Long-Term Registry

Background: Epidemiological studies in atrial fibrillation (AF) illustrate that clinical complexity increases the risk of major adverse outcomes. We aimed to describe European AF patients’ clinical phenotypes and analyse the differential clinical course.
Methods: We performed a hierarchical cluster analysis based on Ward’s Method and Squared Euclidean Distance using 22 clinical binary variables, identifying the optimal number of clusters. We investigated differences in clinical management, use of healthcare resources and outcomes in a cohort of European AF patients from a Europe-wide observational registry.
Results: A total of 9363 patients were available for this analysis. We identified three clusters: Cluster 1 (n = 3634; 38.8%) characterized by older patients and prevalent non-cardiac comorbidities; Cluster 2 (n = 2774; 29.6%) characterized by younger patients with low prevalence of comorbidities; Cluster 3 (n = 2955; 31.6%) characterized by patients with prevalent cardiovascular risk factors/comorbidities. Over a mean follow-up of 22.5 months, Cluster 3 had the highest rate of cardiovascular events, all-cause death, and the composite outcome (combining the previous two) compared to Cluster 1 and Cluster 2 (all P < .001). An adjusted Cox regression showed that compared to Cluster 2, Cluster 3 (hazard ratio (HR) 2.87, 95% confidence interval (CI) 2.27–3.62; HR 3.42, 95%CI 2.72–4.31; HR 2.79, 95%CI 2.32–3.35), and Cluster 1 (HR 1.88, 95%CI 1.48–2.38; HR 2.50, 95%CI 1.98–3.15; HR 2.09, 95%CI 1.74–2.51) reported a higher risk for the three outcomes respectively.
Conclusions: In European AF patients, three main clusters were identified, differentiated by the differential presence of comorbidities. Both non-cardiac and cardiac comorbidities clusters were found to be associated with an increased risk of major adverse outcomes.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12916-021-02120-3.


Keywords: Atrial fibrillation, Clinical phenotypes, Cluster analysis, Clinical management, Major adverse outcomes

Background
Atrial fibrillation (AF) is a cardiovascular condition which has a multifactorial origin, with several cardiovascular (CV) and non-CV risk factors and comorbidities significantly contributing to the development of incident AF cases [1][2][3]. Indeed, epidemiological data clearly demonstrate that the concomitant presence of multiple risk factors/comorbidities increases the risk of developing AF [2,4]. Moreover, patients with AF have a high prevalence of various CV (i.e. heart failure, stroke, coronary artery disease, peripheral artery disease) and non-CV comorbidities, intended as non-cardiac or vascular related (i.e. diabetes mellitus, chronic kidney disease, gastric diseases, chronic obstructive pulmonary disease), as well as a high rate of multi-morbidity [4][5][6][7].
Cluster analysis is a data-driven approach that helps to improve clinical phenotype identification and classification and has been applied to the study of several CV diseases [8][9][10][11]. It helps to identify the relevant clinical phenotypes, but has been applied to AF in relatively few studies [10][11][12][13]. In those studies, cluster analysis identified groups of patients with similar clinical characteristics that differed between groups (higher or lower prevalence of combined risk factors and comorbidities, i.e. 'clinical phenotypes'), entailing differential management approaches and differential risk of adverse outcomes, hence demonstrating that AF can have a different clinical course in groups of patients with different clinical characteristics [10][11][12][13].
Thus far, no large European AF cohort has been investigated to elucidate the most common clinical phenotypes in patients presenting with AF. Indeed, as demonstrated by previous literature, most of the cohorts examined so far were from North America and Asia [10][11][12][13]. In addition, a systematic review of machine-learning-based studies on disease definition and risk prediction (a more sophisticated family of methods, of which cluster analysis is a basic form) found that most of the studies were based on North American cohorts [14].
The European Society of Cardiology (ESC)-European Heart Rhythm Association (EHRA) EURObservational Research Programme in AF (EORP-AF) General Long-Term Registry is the largest contemporary observational non-industry-sponsored study on European AF patients presenting to cardiology centers. The first aim of this report from the EORP-AF is to study the most relevant clinical phenotypes in terms of multi-morbidity clusters among European AF patients. Second, we aimed to analyse their impact in terms of clinical management, healthcare resource use, and major adverse outcomes.

Methods
The ESC-EHRA EORP Atrial Fibrillation General Long-Term Registry is a multicenter observational registry held by the ESC and endorsed by the EHRA. The General Long-Term Registry was preceded by the General Pilot Registry [15][16][17][18]. The EORP-AF General Long-Term Registry is a prospective, observational, multicenter registry established by the ESC in 27 participating countries. The study enrolled consecutive AF patients presenting in 250 cardiology practices, in both in- and outpatient settings. The detailed description of the study design, baseline characteristics and 1-year follow-up results has been provided previously [19,20]. Briefly, all AF patients enrolled had AF documented within 12 months before enrollment, based on electrocardiographic proof. All patients were ≥ 18 years old and provided written informed consent. Enrollment was undertaken from October 2013 to September 2016, with 1-year follow-up performed until September 2018. Baseline and follow-up data were entered into a centralized electronic case report form (eCRF) by each investigator. Given the observational nature of the registry, only a limited set of variables, related to baseline thromboembolic and bleeding risk, baseline comorbidities and pharmacological therapy, was compulsory. An 'Unknown' value, when provided, was treated as missing and excluded. As reported in this study, more than 80% of patients had valid data for the core variables (Table 1 and Table 2). Patient data were obtained after each patient signed a written informed consent, following approval of the study protocol by an Institutional Review Board/Ethics Committee. The study was first approved by the National Coordinators' main institutions (listed in Additional file 1) and subsequently authorized by each peripheral site under the responsibility of the lead contact and study team (all listed in Additional file 1), according to the specific national and local regulations.
Any details regarding approval numbers for the study protocol regarding any specific site could be obtained from the  [1]. Thromboembolic risk was defined according to the CHA₂DS₂-VASc score [1]. Bleeding risk was defined according to the HAS-BLED score [1]. Both CHA₂DS₂-VASc and HAS-BLED scores were computed according to the original schemes. High thromboembolic risk was defined as CHA₂DS₂-VASc ≥ 2 in males and ≥ 3 in females. High bleeding risk was defined as HAS-BLED ≥ 3. We also examined the distribution of CHA₂DS₂-VASc quartiles across the clusters. Multi-morbidity was defined as the concomitant presence of at least 2 different comorbidities [21]. Frailty was defined based on a 40-item frailty index ≥ 0.25 built according to Rockwood and Mitnitski [22]. Polypharmacy was defined as the concomitant use of ≥ 5 drugs [23]. Additionally, we examined the distribution of comorbidities and concomitant drugs. Adherence to the Atrial fibrillation Better Care (ABC) pathway was defined according to a previously published study [24]. Briefly, the ABC pathway has been proposed to streamline integrated care and holistic management in AF patients and is based on the following: (i) avoid stroke with anticoagulation; (ii) better symptom management with patient-centred symptom-directed decisions on rate or rhythm control; (iii) cardiovascular risk factor and comorbidity optimization including lifestyle changes [25]. Adherence to the ABC pathway has consistently been associated with a reduction in the risk of major clinical outcomes associated with AF [26]. According to the eCRF, the rate/rhythm control strategy, as well as the use of specific medical techniques (electrical or pharmacological cardioversion, catheter ablation), was evaluated both prior to admission/consultation and during admission/consultation.
All the baseline variables were established regardless of the clustering process and according to previous international definitions; hence, no a priori difference can be determined according to the various clusters.
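As an illustration only, the cut-offs stated above can be written as a minimal Python sketch. The record layout and field names below are hypothetical and are not taken from the eCRF; only the thresholds come from the text.

```python
from dataclasses import dataclass


@dataclass
class Baseline:
    """Hypothetical per-patient record; field names are illustrative only."""
    male: bool
    cha2ds2_vasc: int
    has_bled: int
    n_comorbidities: int
    n_drugs: int
    frailty_index: float  # 40-item index, between 0 and 1


def derive_flags(p: Baseline) -> dict:
    """Apply the cut-offs stated in the Methods to one patient."""
    return {
        # CHA2DS2-VASc >= 2 in males, >= 3 in females
        "high_te_risk": p.cha2ds2_vasc >= (2 if p.male else 3),
        # HAS-BLED >= 3
        "high_bleeding_risk": p.has_bled >= 3,
        # at least 2 concomitant comorbidities
        "multimorbidity": p.n_comorbidities >= 2,
        # frailty index >= 0.25
        "frail": p.frailty_index >= 0.25,
        # 5 or more concomitant drugs
        "polypharmacy": p.n_drugs >= 5,
    }


# Invented example patient (not registry data)
example = Baseline(male=False, cha2ds2_vasc=2, has_bled=3,
                   n_comorbidities=1, n_drugs=5, frailty_index=0.20)
flags = derive_flags(example)
```

In this invented example a female patient with a CHA₂DS₂-VASc score of 2 is not flagged as high thromboembolic risk, because the threshold for females is 3.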

Clustering process
We performed an agglomerative hierarchical cluster analysis based on Ward's Minimum Variance Method, which minimizes the total within-cluster variance, and selected the squared Euclidean distance as the measure of distance or dissimilarity. The squared Euclidean distance was used since only dichotomous variables were selected. The aim of the analysis was to identify the optimal number of clusters that were homogeneous and indicative of clinically relevant phenotypic subgroups of AF patients, without a priori knowledge of the outcomes. We selected 22 clinical variables a priori, as follows: age, sex, heart failure, coronary artery disease, valvular disease, hypertension, diabetes mellitus, ischemic stroke, peripheral ischemic events, liver disease, chronic obstructive pulmonary disease, anaemia, dementia, any cardiomyopathy, hyperthyroidism, hypothyroidism, chronic kidney disease, obstructive sleep apnoea syndrome, malignancy, body mass index. All variables were considered as categorical, either present or absent. Age and body mass index were dichotomized, according to usual clinical practice, as age < 75 and ≥ 75 years and body mass index < 25 kg/m² (normal BMI) and ≥ 25 kg/m² (overweight/obese). From the EORP-AF dataset, a total of 9363 patients (84.4%) were found to have all data available for the 22 variables and were included in the analysis. The clustering algorithm begins with each element (i.e. patient) in a separate cluster and then proceeds with a 'bottom-up' approach, merging each cluster with the most similar one until all clusters become one.
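To make the encoding concrete, the following minimal Python sketch dichotomizes age and body mass index with the thresholds given above and computes the squared Euclidean distance between two invented patients. The four-variable layout is a hypothetical simplification of the 22 variables. For 0/1 variables, this distance equals the number of variables on which two patients differ.

```python
def dichotomize(age_years, bmi):
    """Return (age >= 75, BMI >= 25) as 0/1 indicators."""
    return int(age_years >= 75), int(bmi >= 25)


def squared_euclidean(a, b):
    """Sum of squared coordinate differences between two patients."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Invented patients, layout: [age >= 75, BMI >= 25, hypertension, diabetes]
p1 = [*dichotomize(80, 23.0), 1, 0]
p2 = [*dichotomize(62, 27.5), 1, 1]

d = squared_euclidean(p1, p2)
# For binary vectors the same value is obtained by counting mismatches.
mismatches = sum(x != y for x, y in zip(p1, p2))
```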
The hierarchical clustering process is visually represented by a dendrogram in which vertical lines represent clusters that are joined together and the position of a horizontal line on the scale indicates the rescaled distance at which clusters were joined (the higher the rescaled distance at which clusters combine on the y-axis, the greater the dissimilarity between them, since they joined nearer to the final point of the dendrogram, at which all clusters become one). Ward linkage coefficients provided a means to determine the heterogeneity between clusters by giving the difference in Euclidean distances over which clusters are joined (i.e. the difference between subsequent horizontal lines on the dendrogram), with larger distances indicating greater heterogeneity between the clusters joined at that step. By examining the dendrogram produced by the clustering process and considering the Ward linkage coefficients, we found that the distance between the points at which the elements grouped together (between 10 and 15 on the y-axis) became larger, and consequently the groupings became more heterogeneous, after being expanded to 3 clusters. Therefore, the 3-cluster model was used in this study.
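The agglomeration and the inspection of merge distances described above can be sketched in plain Python. This is a naive illustration on invented toy data, not the SPSS procedure used in the study: merge costs are raw increases in within-cluster sum of squares rather than SPSS rescaled distances, and the "largest jump" rule in `suggest_k` is an assumed stand-in for the visual reading of the dendrogram.

```python
def ward_cost(ca, na, cb, nb):
    """Increase in total within-cluster sum of squares if a and b merge."""
    sq = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return na * nb / (na + nb) * sq


def ward_linkage(rows):
    """Agglomerate rows bottom-up; return [(merge cost, merged members)]."""
    # Each cluster is (centroid, size, member indices).
    clusters = [(list(map(float, r)), 1, [i]) for i, r in enumerate(rows)]
    history = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost(clusters[i][0], clusters[i][1],
                                 clusters[j][0], clusters[j][1])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        cost, i, j = best
        (ci, ni, mi), (cj, nj, mj) = clusters[i], clusters[j]
        centroid = [(ni * x + nj * y) / (ni + nj) for x, y in zip(ci, cj)]
        merged = (centroid, ni + nj, sorted(mi + mj))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        history.append((cost, merged[2]))
    return history


def suggest_k(history):
    """Number of clusters left just before the largest jump in merge cost."""
    costs = [c for c, _ in history]
    jumps = [costs[t + 1] - costs[t] for t in range(len(costs) - 1)]
    t = max(range(len(jumps)), key=jumps.__getitem__)
    n = len(costs) + 1              # n patients give n - 1 merges
    return n - (t + 1)              # clusters remaining after merge t


# Invented binary profiles forming three obvious groups (not registry data)
rows = [
    [1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0], [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1], [0, 0, 0, 0, 1, 1],
]
history = ward_linkage(rows)
k = suggest_k(history)
```

The pairwise search makes this sketch cubic in the number of patients, so it is suitable only for small illustrations; a registry-sized analysis would rely on an optimized implementation.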

Outcomes
To evaluate the comprehensive impact of different AF clinical phenotypes, we considered a large set of outcomes. First, we examined the differential use of healthcare resources according to the three clusters. In patients enrolled during hospitalization, we evaluated the overall length of stay. Further, we recorded and analysed the occurrence of cardiology and internal medicine/general practitioner visits, as well as emergency room admissions (all intended as a binary variable, 'at least one visit/admission'), in the three clusters identified, separately for the first and second year of follow-up. Second, we considered several major clinical adverse events throughout the follow-up period. The primary clinical outcomes were as follows: (i) cardiovascular events including the occurrence of any thromboembolic event (including stroke, transient ischemic attack and any peripheral embolism), any acute coronary syndrome and CV death; (ii) all-cause death; and (iii) a final composite outcome of CV events and/or all-cause death. All primary outcomes were analysed with a time-to-event and intention-to-treat approach, with observation censored after the first event occurred. Additionally, we evaluated the occurrence of several secondary clinical outcomes: (i) any bleeding; and hospital readmission for (ii) any cause; (iii) CV-related; (iv) AF; (v) cardiovascular but non-AF related; and (vi) any non-CV cause. These secondary outcomes, given the lack of dates, were not analysed with a time-to-event approach. All outcomes were assessed by clinical visit or telephone interview with the patient or next of kin and reported by investigators. Events were not centrally adjudicated but categorized according to the investigator's clinical evaluation. All data regarding outcomes were collected before the analysis was planned and performed; hence, no difference in assessment of outcomes exists according to the clusters.
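The time-to-event handling of the composite outcome can be illustrated with a short Python sketch. The function and argument names are hypothetical; the only rule taken from the text is that observation stops at the first qualifying event, and patients without an event are censored at the end of their follow-up.

```python
from typing import Optional


def time_to_composite(follow_up_days: int,
                      cv_event_day: Optional[int],
                      death_day: Optional[int]):
    """Return (time, event) for the composite of CV events and/or death.

    time  -- days from baseline to the first qualifying event, or to the
             end of follow-up when no event occurred
    event -- 1 if a CV event or death occurred, 0 if censored
    """
    days = [d for d in (cv_event_day, death_day) if d is not None]
    if days:
        return min(days), 1       # observation stops at the first event
    return follow_up_days, 0      # censored at last contact


# Invented patients (not registry data)
a = time_to_composite(730, cv_event_day=200, death_day=540)
b = time_to_composite(730, None, None)
```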

Statistical analysis
Continuous variables were expressed as mean and standard deviation or median and interquartile range (IQR), and differences across the clusters were evaluated using one-way analysis of variance (ANOVA) and Kruskal-Wallis one-way ANOVA, respectively. Categorical variables were expressed as counts and percentages, and differences across clusters were evaluated using the chi-squared test. A logistic regression model, adjusted for type of AF and EHRA score, was fitted to examine the association between clusters and use of oral anticoagulant (OAC) therapy.
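For a binary characteristic compared across three clusters, the chi-squared test has 2 degrees of freedom. The sketch below computes Pearson's statistic on hypothetical counts, which are not registry data. It uses the fact that, for 2 degrees of freedom only, the upper-tail probability has the closed form exp(-x/2); other table sizes would need a statistics library.

```python
import math


def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical counts: rows = clusters, columns = characteristic present / absent
table = [[30, 70],
         [20, 80],
         [50, 50]]
stat, df = chi_squared(table)

# Valid only because df == 2: the chi-squared tail probability is exp(-x / 2).
p_value = math.exp(-stat / 2)
```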
To evaluate the differences in length of hospital stay between the three clusters, a one-way analysis of covariance (ANCOVA) model, adjusted for type of AF and EHRA score, was used. To analyse the association between clusters and the use of other healthcare resources, a logistic regression model was used, adjusted for type of AF, EHRA score and use of OAC.
Differences in cumulative risk for the three main study outcomes were evaluated using the log-rank test and plotted as Kaplan-Meier curves. To investigate the association between the three clusters and the study primary clinical outcomes, a Cox regression analysis, adjusted for type of AF, EHRA score and use of OAC, was performed. The association between the three clusters and the secondary clinical outcomes was examined using a logistic regression model, adjusted for type of AF, EHRA score and use of OAC.
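As a reminder of how a Kaplan-Meier curve is built from censored follow-up data, a minimal product-limit estimator is sketched below on invented data. It does not reproduce the SPSS output, and the log-rank comparison between clusters is not shown.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  -- follow-up time per patient
    events -- 1 if the outcome occurred at that time, 0 if censored
    """
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(set(times)):
        d = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)
        if d:
            surv *= 1 - d / at_risk
            curve.append((t, surv))
        at_risk -= leaving   # events and censored patients both leave the risk set
    return curve


# Invented follow-up in days (not registry data)
times = [100, 200, 200, 300, 400]
events = [1, 0, 1, 1, 0]
curve = kaplan_meier(times, events)
```

Cumulative risk, as plotted in a cumulative-incidence style figure, is one minus the survival estimate at each time point.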
Finally, to understand whether the application of a more comprehensive and integrated clinical management could have had an impact on the occurrence of clinical outcomes, we performed an analysis to assess the impact of adherence to the ABC pathway on the composite outcome of CV events and all-cause death, according to the three clusters. A Cox regression model for ABC vs. non-ABC and each ABC criterion, adjusted for type of AF, EHRA score and use of OAC, was performed. All logistic regression analysis results were reported as odds ratio (OR) and 95% confidence interval (CI). All Cox regression analysis results were reported as hazard ratio (HR) and 95% CI. No formal interaction analysis was performed, and missing data were just considered as missing with no imputation analysis performed. A two-sided p < 0.05 was considered statistically significant. All analyses were performed using SPSS statistical software version 25.0.0.1 (IBM, NY, USA) for MacOS.
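Odds ratios and hazard ratios are exponentiated regression coefficients, and their 95% CIs are usually symmetric on the log scale. The sketch below shows the conversion in both directions, assuming Wald-type intervals (an assumption, since the interval type is not stated in the text). It is checked against one estimate reported in the abstract.

```python
import math

Z95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def ratio_with_ci(beta, se):
    """Convert a log-scale coefficient and its SE to (ratio, lower, upper)."""
    return (math.exp(beta),
            math.exp(beta - Z95 * se),
            math.exp(beta + Z95 * se))


def log_scale_from_ci(lower, upper):
    """Recover (beta, se) from a reported 95% CI, assuming a Wald interval."""
    beta = (math.log(lower) + math.log(upper)) / 2
    se = (math.log(upper) - math.log(lower)) / (2 * Z95)
    return beta, se


# Reported in the abstract: Cluster 3 vs Cluster 2, cardiovascular events,
# HR 2.87 (95% CI 2.27-3.62).
beta, se = log_scale_from_ci(2.27, 3.62)
hr, lo, hi = ratio_with_ci(beta, se)
```

The point estimate recovered from the interval limits agrees with the reported HR to two decimals, which is consistent with an interval that is symmetric on the log scale.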

Results
Among the overall 11,096 patients originally enrolled in the study, a total of 9363 (84.4%) were included in this analysis.  (Table 1). Accordingly, these patients reported the highest prevalence of previous cardiac implantable electronic device. Additionally, the prevalence of multi-morbidity, frailty and polypharmacy was highest in Cluster 3 compared to the other clusters (all P < .001). When examining the median number of comorbidities and concomitant drugs, both were highest in Cluster 3 and progressively lower in Cluster 1 followed by Cluster 2. When looking at quartiles of comorbidities and concomitant drugs, patients in Q4 (respectively comorbidities ≥ 6 and concomitant drugs ≥ 7) were more commonly found in Cluster 3, and less commonly in Clusters 1 and 2 (Table 1). Based on baseline characteristics, we can 'label' the three clusters (Fig. 1) as follows: (i) Cluster 1: older patients with prevalent non-cardiac comorbidities; (ii) Cluster 2: younger patients with an overall low thromboembolic risk and low comorbidity burden; and (iii) Cluster 3: patients with prevalent cardiovascular risk factors and comorbidities, at highest risk of adverse events.

Management of AF
Analysis of the management of AF according to the three clusters is reported in Table 2. Use of antiplatelet drugs was highest in Cluster 3 (P < .001), while use of OAC was lowest in Cluster 2 (P < .001). Among OAC, vitamin K antagonists were more commonly used in Cluster 3, while non-vitamin K antagonist OACs were more prevalent in Cluster 2 (both P < .001). Dual antithrombotic therapy was more commonly used in Cluster 3 (P < .001). When considering only those patients eligible for OAC treatment (male patients with CHA₂DS₂-VASc ≥ 1 or female patients with CHA₂DS₂-VASc ≥ 2), we found a substantially similar prevalence of treatments, with the only exception of the prevalence of OAC, which was higher in Cluster 2 than in the others (P = .004) (Additional file 1, Table S1).
After adjustment for type of AF and EHRA score, compared to those in Cluster 2, patients in both Cluster 1 and Cluster 3 were more likely to be prescribed OAC (OR 1.20, 95% CI 1.05-1.39 and OR 1.17, 95% CI 1.01-1.36, respectively). Among OAC users, Cluster 1 and Cluster 3 were significantly associated with greater use of vitamin K antagonists rather than non-vitamin K antagonist OACs, when compared to Cluster 2 (adjusted OR 1.21, 95% CI 1.08-1.36 and OR 1.45, 95% CI 1.29-1.63, respectively).
Prior to admission/consultation, a rate control strategy was more often used among patients in Cluster 3, while a rhythm control strategy was more often used in Cluster 2 (P < .001). All rhythm control strategies, except for pharmacological cardioversion, were more prevalent in Cluster 2 than in the other clusters (P < .001, P = .003 and P < .001, respectively).
During the index admission/consultation, a rhythm control intervention was planned/performed more frequently in Cluster 2 (P < .001). Electrical cardioversion and catheter ablation were more commonly used in Cluster 2, while pharmacological cardioversion was more common in Clusters 1 and 3 (all P < .001). At discharge, a rate control strategy was more commonly used in Cluster 3, and patients in Cluster 3 were less likely to be managed in adherence with the ABC pathway (P < .001).

Use of healthcare resources
Among the 4694 patients enrolled during a hospital admission, mean [standard deviation] length of stay was progressively lower in patients in Cluster 3 (8.07 [8.50] days), Cluster 1 (6.52 [7.29] days) and Cluster 2 (4.36 [6.33] days) (P < .001 for the overall model and for differences between each cluster). After adjustment for EHRA score and type of AF, differences in overall length of stay remained significant (F = 72.215, P < .001).
During follow-up, use of healthcare resources (Table 3) differed significantly among the three clusters. Patients in Cluster 1 and Cluster 3 were more likely to have at least one internal medicine/general practitioner visit both at 1 year and 2 years, even after adjustments (see Table 3). Patients in Cluster 3 were more likely to have at least one emergency

Major adverse events
Outcomes were assessed over a median [IQR] 731 [701-749] days of follow-up (Table 4). A progressively higher rate of events was found from Cluster 2 to Cluster 1 and Cluster 3 for the occurrence of cardiovascular events, all-cause death, composite outcomes and any cardiovascular non-AF-related hospital readmission (all P < .001) (Table 4). Occurrence of any bleeding and any non-CV-related hospital readmission was significantly lower in Cluster 2, while a higher rate of AF-related readmission was found in this cluster. A non-significant trend for a higher rate of any readmission and any cardiovascular readmission was evident for Cluster 3 (Table 4). Adjusted logistic regression analyses (Fig. 2) found a higher risk for all the secondary outcomes in Cluster 1 and Cluster 3, except the risk for any AF-related readmission, which was lower for both these clusters.
Regarding the main clinical study outcomes, Kaplan-Meier curves (Fig. 3) show a progressively higher cumulative risk across the three clusters for all the main study outcomes. Adjusted Cox regression analyses (Table 5) found that compared to Cluster 2, Cluster 1 and Cluster 3 were associated with a progressively higher risk for all three study primary outcomes.

Adherence to ABC pathway and outcomes according to clusters
In Cluster 1, we found that while adherence to the overall ABC pathway was not significantly associated with a lower risk of the composite outcome, the 'B' criterion showed a non-significant trend towards an inverse association with the risk of event occurrence (Table 6). In Cluster 2, which was at a generally low thromboembolic risk, adherence to the ABC pathway was associated with a lower risk of the composite outcome, with no single criterion being independently associated with lower risk. In Cluster 3, full adherence to the ABC pathway was strongly associated with a significant reduction in the risk of adverse outcomes, but the risk reduction was mainly associated with adherence to the 'C' criterion (Table 6).

Discussion
In this cluster analysis derived from the ESC-EHRA EORP-AF General Long-Term Registry, we showed that three main clinical phenotypes can be identified among European AF patients. The first cluster was characterized by older patients with a prevalent high burden of non-cardiac comorbidities (Cluster 1); the second cluster was associated with a younger age, with a low burden of comorbidities and an overall low thromboembolic risk (Cluster 2); in the third cluster, we observed older AF patients with a high burden of CV risk factors and comorbidities, with an overall high burden of multi-morbidity and frailty, and the highest thromboembolic risk (Cluster 3). The three clusters showed clear differences in terms of OAC therapy and clinical management, with a differential risk in long-term major adverse events. Both Cluster 1 and Cluster 3 showed an overall higher use of healthcare resources during follow-up and a higher risk of major adverse events, particularly those patients in Cluster 3.
Recently, machine-learning-based data analysis has been increasingly applied to biomedical scientific research, including cardiovascular health [27]. Among the more basic machine-learning analyses, unsupervised cluster analysis has garnered attention, with studies in hypertension and heart failure cohorts [8,28]. Use of this analytic technique allows insightful epidemiological analysis and better risk stratification, which could lead to more focused management and treatment [8]. Thus far, cluster analysis has been applied to AF patients in only two large observational studies, the 'Outcomes Registry for Better Informed Treatment of Atrial Fibrillation' and the 'Keio Interhospital Cardiovascular Studies for AF' registry, which are US and Japanese cohorts, respectively [10,11]. More recently, two other cluster analyses of large AF datasets were published [12,13]. In this context, our study provides novel evidence, representing the first large cluster analysis focused on European AF patients.
The current analyses demonstrate that the level and the type of comorbidities are key elements in differentiating AF patients with distinctive clinical needs and long-term risks. Previous studies investigating cluster analysis in AF patients have shown how specific clusters characterized by a higher burden of comorbidities and risk factors are associated with higher risk of major adverse events during long-term follow-up [10][11][12][13]. The results we provide not only underline the importance of comorbidities in determining the occurrence of major adverse events, but also highlight the differential impact of non-CV and CV comorbidities. Some previous analyses of machine-learning systems, of which cluster analysis is a basic form, have underlined that several methodological issues can limit the reliability and reproducibility of such data. Nevertheless, in our opinion, placing our data in the context of previous literature stresses an important concept: not only are comorbidities crucial in determining the risk of outcomes, but how they associate with each other and influence the natural history of the disease as a whole is also important.
The Framingham Heart Study previously showed that AF patients with comorbidities have a consistently increased risk for cardiovascular events and all-cause death compared to those without [29]. In an analysis from the 'Apixaban for Reduction in Stroke and Other Thromboembolic Events in Atrial Fibrillation' trial, multi-morbidity was found to be associated with an increased risk of adverse outcomes [7]. Furthermore, in a registry-based analysis of hospitalized AF patients, an increasing Charlson Comorbidity Index, a validated tool to evaluate the level of multi-morbidity, was directly associated with the occurrence of stroke, major bleeding and all-cause death [6]. Our cluster analysis demonstrates that not all comorbidities carry the same risk. Indeed, while Cluster 1 showed an increased risk, Cluster 3 showed a significantly greater risk compared to Cluster 1.
The 2020 ESC guidelines on the management of AF [1] introduce a paradigm shift promoting a more integrated and holistic approach to AF diagnosis, characterization and management, summarized as 'CC to ABC'. While the first 'C' relates to confirmation of the AF diagnosis, the second 'C' focuses specifically on the need to properly evaluate and characterize each AF patient, in order to appropriately stratify their individual risk and plan the best diagnostic and therapeutic pathway. Regarding the 'characterization' of AF patients, the ESC guidelines recommend the '4S-AF scheme' to provide a 'structured characterization of AF and to streamline the assessment of AF patients at different healthcare levels, inform treatment decision-making and facilitate optimal management of AF patients' [30]. The 4S-AF scheme suggests evaluating patients as follows: (i) stroke risk; (ii) symptom severity; (iii) severity of AF burden; (iv) substrate severity. Fully supporting the 4S-AF approach, our analysis demonstrates how comprehensive clinical characterization provides important information, delineating a clear profile for each cluster, which also differentiated patients in terms of healthcare needs and long-term risks. Hence, the information gathered through the clustering process can help in defining the patients' risks and determining strategies to improve their care and management.
The differential impact of the ABC pathway underlines how the same treatment strategies can have distinct effects according to the clinical profile of the patient. Also, the evidence that the ABC pathway is more effective in the cluster with the greatest multi-morbidity (Cluster 3), and that the effectiveness is driven largely by the management of comorbidities, emphasizes the prominent role of CV risk factors/comorbidities in determining adverse outcomes in AF patients, and shows that a comprehensive management plan, together with proper evaluation and characterization, is needed to improve patient care.
Our data are also in line with more recent research in the area of multi-morbidity, which now distinguishes patterns/clusters of conditions, clearly defined in terms of sociodemographic, clinical and functional characteristics, beyond the mere presence of multiple conditions [31]. This analysis represents the first application of this approach to a large European AF population.
Results from the cluster analysis in AF patients underline the importance of stratifying patients' characteristics and identifying those clinical phenotypes more prone to adverse events, beyond the mere focus on thromboembolic risk, in order to properly address patients' care needs and healthcare management plans. Although the clustering process is not easily applicable in daily clinical practice, information gathered from this analysis reinforces the concept that the presence of comorbidities increases risk. Thus, we further reinforce the need to implement the 4S-AF scheme to characterize AF, which offers an easy and straightforward tool for everyday clinical use and also supports the use and evaluation of quality indicators [32]. We believe it is important to underline that our data are generated from a large Europe-wide cohort. Even though patients were gathered principally from third-level cardiology practices, they were enrolled consecutively over the enrolment period with a minimal set of inclusion/exclusion criteria, which supports the representativeness of our data. Even though we based our clustering analysis on a lower number of variables, we think that the superimposable results, in particular regarding the main characteristics of the clusters and their impact on the risk of outcomes, support the reliability of our analysis and the generalisability of the results provided.

Limitations
The main limitation of the study relates to its observational nature, with limited power to detect differences in subgroups that were not pre-specified in the study design. Moreover, as in any observational registry, completeness of data is not as high as in a clinical trial; hence, this aspect may have partially limited our analytical capability. Notwithstanding, several papers already published on the EORP-AF General Long-Term Registry showed findings similar to those of other contemporary registries held in other geographical locations, in terms of both baseline characteristics and follow-up data [19,20]. Second, the data presented do not imply causality; rather, they describe a statistical association. Furthermore, identified clusters may vary according to patients' characteristics and available data and may change over time, since risk is dynamic (changing with ageing and incident risk factors [33][34][35]) and not a 'one-off' assessment. Moreover, no formal interaction analysis was performed, and missing data were treated as missing, with no imputation performed. Finally, the optimal number of clusters can be difficult to determine, since different statistical algorithms may generate different results and the final selection of clusters was based in part on investigator discretion; in addition, no analysis of between-cluster heterogeneity could be performed. The extent of these limitations suggests caution in interpreting our findings. Notwithstanding, we believe that the large cohort presented, drawn from across Europe, confers significant generalizability, even though it is necessary to keep in mind the possible limitations of such analyses [14].

Conclusions
In European AF patients, three main clinical clusters were identified: older patients with non-cardiac comorbidities, a younger, low-risk group, and older patients with cardiac comorbidities. Both non-cardiac and cardiac comorbidities clusters were found to be associated with an increased risk of major adverse outcomes.

Abbreviations
ABC: Atrial fibrillation Better Care; AF: Atrial fibrillation; CI: Confidence interval; CV: Cardiovascular; EHRA: European Heart Rhythm Association; EORP: EURObservational Research Programme; ESC: European Society of Cardiology; HR: Hazard ratio; OAC: Oral anticoagulant; OR: Odds ratio

Any details regarding approval numbers for the study protocol regarding any specific site could be obtained from the Corresponding Authors, upon reasonable request. The study was performed according to the European Union Note for Guidance on Good Clinical Practice CPMP/ICH/135/95 and the Declaration of Helsinki.

Consent for publication
Not applicable.