Development and description of measurement properties of an instrument to assess treatment burden among patients with multiple chronic conditions

Background Patients experience an increasing treatment burden related to everything they do to take care of their health: visits to the doctor, medical tests, treatment management and lifestyle changes. This treatment burden could affect treatment adherence, quality of life and outcomes. We aimed to develop and validate an instrument for measuring treatment burden for patients with multiple chronic conditions. Methods Items were derived from a literature review and qualitative semistructured interviews with patients. The instrument was then validated in a sample of patients with chronic conditions recruited in hospitals and general practitioner clinics in France. Factor analysis was used to examine the questionnaire structure. Construct validity was studied by the relationships between the instrument's global score, the Treatment Satisfaction Questionnaire for Medication (TSQM) scores and the complexity of treatment as assessed by patients and physicians. Agreement between patients and physicians was appraised. Reliability was determined by a test-retest method. Results A sample of 502 patients completed the Treatment Burden Questionnaire (TBQ), which consisted of 7 items (2 of which had 4 subitems) defined after 22 interviews with patients. The questionnaire showed a unidimensional structure. The Cronbach's α was 0.89. The instrument's global score was negatively correlated with TSQM scores (rs = -0.41 to -0.53) and positively correlated with the complexity of treatment (rs = 0.16 to 0.40). Agreement between patients and physicians (n = 396) was weak (intraclass correlation coefficient 0.38 (95% confidence interval 0.29 to 0.47)). Reliability of the retest (n = 211 patients) was 0.76 (0.67 to 0.83). Conclusions This study provides the first valid and reliable instrument assessing the treatment burden for patients across any disease or treatment context. 
This instrument could help in the development of treatment strategies that are both efficient and acceptable for patients.


Background
Chronic diseases are the leading cause of mortality in the world, representing more than 36 million deaths in 2008 [1]. About 45% of the population and 88% of people older than 65 years have at least one chronic condition. The prevalence of chronic diseases continues to increase: in 2020, nearly 50% of the US population will have at least one chronic condition [2]. Therefore, the challenge for physicians has switched from curing acute illnesses to managing multiple chronic conditions. However, illnesses are still the primary focus of medical care [3] and many clinical practice guidelines focus on single conditions. For example, a physician following extant guidelines could prescribe up to 12 medications for a patient with osteoporosis, osteoarthritis, type 2 diabetes mellitus, hypertension, and chronic obstructive pulmonary disease [4].
Being a patient implies more investment of time and effort than just taking medicines. It also involves drug management, self-monitoring, visits to the doctor, laboratory tests and changes of lifestyle. For example, patients with type 2 diabetes controlled by oral agents could spend 143 minutes daily in recommended self-care [5]. This workload can affect quality of life as severely as the illness itself, and patients rate this treatment burden equal to that of diabetic neuropathy or nephropathy [6].
Treatment burden can be defined as the impact of health care on patients' functioning and well-being, apart from specific treatment side effects [7,8]. It takes into account everything patients do to take care of their health: visits to the doctor, medical tests, treatment management, and lifestyle changes. Treatment burden is associated, independently of illnesses, with adherence to therapeutic care [9,10] and could affect hospitalization [11] and survival rates [12].
Minimally disruptive medicine seeks to tailor treatment to the contexts of patients by integrating the notion of treatment burden in their care [13]. Therefore, caregivers need tools to establish the weight of the treatment burden. Many instruments assess treatment burden for specific conditions [14][15][16][17][18], but none has been developed to assess this burden globally across multiple chronic diseases. Because the treatment burden grows from the combination of chronic diseases, only an instrument that assesses it globally could help clinicians and researchers develop effective therapeutic programs that minimize the treatment workload [13].
In the present work, we aimed to develop a measure of treatment burden for patients with at least one chronic condition. This measure should be of use in daily clinical practice and in clinical research.

Methods
We used a multistep method to develop a tool to measure the treatment burden of chronic diseases [19,20] following the quality criteria proposed in the literature [21].

Stage 1: elaboration of the questionnaire
The objective of the instrument was to capture the perception of treatment burden of patients as 'the work of being a patient' dealing with increasingly complex treatment regimens [13], that is, the impact of the workload of healthcare on a patient's well-being and functioning.
We searched MEDLINE via PubMed for literature on treatment burden and existing questionnaires assessing it in specific diseases. We found no instrument appraising the treatment burden globally. Treatment burden was often assessed only as a subscale of specific disease scales [14][15][16][17] and thus was considered only for the regimen associated with a particular condition. Items often focused on drug intake, adherence to care and convenience of use.
Using this literature review, three members of the team who had experience in the care of patients with chronic diseases (V-TT, BF, PR) highlighted possible relevant topics to capture the aspects of the workload of healthcare that could affect a patient's life. These topics were the burden associated with taking medicines, self-surveillance, laboratory tests, doctor visits, need for organization, administrative tasks, following advice on diet and physical exercise and social impact of the treatment. According to the conceptual model of our instrument, we chose not to include other consequences of the treatment such as treatment side effects.
In addition, because our instrument was elaborated in France and administered to French patients, we did not take into account the financial burden of treatment, because our national public health insurance program guarantees healthcare free of charge for patients with chronic conditions.
We recruited a convenience sample of 22 patients with at least 1 chronic condition from the department of internal medicine of Hospital Pitié-Salpetrière and a general practitioner clinic in Paris in April 2011 (Additional file 1, Appendix 1). These two settings involved patients with various chronic conditions, requiring primary, secondary and tertiary care. During semistructured interviews, we presented the concept of treatment burden to patients and asked them about their diseases, their treatment and the burden of treatment, with open-ended questions: 'Could you tell us about your health problems?' 'Could you tell us about what you have to do to take care of your health?' 'What aspects of your care have the most impact on your life?' Then, we asked them about the burden associated with the different topics highlighted earlier by asking them (1) to rate each of these items, (2) to explain why they would rate it like that and (3) to say whether they found the item relevant in the assessment of treatment burden generally. Finally, we asked patients whether other aspects of the workload of healthcare bothered them. As a result of these interviews, examples were added to the items, and we added one item, 'Frequent healthcare reminds me of my health problems', to the questionnaire.
The resulting questionnaire consisted of seven items (two of which had four subitems), formed by an introductory sentence with examples, followed by a rating scale ranging from 0 to 10 with numbers placed under boxes and labeled end anchors ('No burden' and 'Considerable burden') [22][23][24].
A group of ten physicians (two methodologists, three general practitioners, two internists, one cardiologist, one pneumologist, one diabetologist) with experience in the care of patients with chronic conditions, some of whom had experience in questionnaire development, reviewed the clarity and wording of the items. All physicians agreed that, on the surface, the items appeared to measure what they were intended to measure and that the instrument achieved face validity.

Stage 2: measurement properties of the instrument
The measurement properties of the questionnaire were assessed by four steps: (1) reduction of the number of items, (2) assessment of factorial validity, (3) assessment of construct validity and (4) assessment of reliability.
We recruited consecutive patients from six teaching hospitals of the Assistance-Publique Hôpitaux de Paris and eight general practitioner clinics in Paris to validate the questionnaire. Patients were eligible if they were 18 years or older, were able to complete a consent form and had at least one condition requiring medical follow-up for at least 6 months. Patients with cognitive impairment that could interfere with understanding the questionnaire were excluded. All patients provided written informed consent to be in the study.
Reducing the number of items was based on (1) a floor effect, considered present if more than 15% of respondents had the lowest score [21]; (2) the relevance of the items, assessed by the number of answers for which patients checked 'Does not apply'; and (3) item redundancy, suspected when interitem correlations by Spearman's correlation coefficient were > 0.80 [19]. Items were eliminated after discussion among three investigators (V-TT, BF, PR).
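The first two screening criteria can be sketched as follows. This is only an illustration: the encoding (None for 'Does not apply'), the function names, the sample answers and the choice to count 'Does not apply' as the lowest score when computing the floor effect are assumptions made for the example, not details reported for the study.

```python
from typing import Optional, Sequence

# Hypothetical encoding: ratings are integers 0-10; None marks 'Does not apply'.
Response = Optional[int]


def not_applicable_rate(responses: Sequence[Response]) -> float:
    """Proportion of respondents who checked 'Does not apply'."""
    return sum(r is None for r in responses) / len(responses)


def floor_rate(responses: Sequence[Response], na_as_lowest: bool = True) -> float:
    """Proportion of respondents at the lowest score (0).

    Assumption: 'Does not apply' is counted as the lowest score by default.
    """
    lowest = sum(r == 0 for r in responses)
    if na_as_lowest:
        lowest += sum(r is None for r in responses)
    return lowest / len(responses)


def has_floor_effect(responses: Sequence[Response], threshold: float = 0.15) -> bool:
    """Flag a floor effect when more than 15% of respondents are at the floor."""
    return floor_rate(responses) > threshold


# Illustrative answers from ten hypothetical respondents to a single item.
item = [0, 0, None, 5, 10, 3, 7, 8, 2, 1]
print(not_applicable_rate(item), floor_rate(item), has_floor_effect(item))
```

A flag from such a screen would only mark an item for discussion; in the study, the elimination decision was made by the three investigators.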
Answers to the questionnaire were aggregated in a global score by summing the item responses. 'Does not apply' or missing answers were considered the lowest possible score (0) because we considered that a patient not concerned by a domain of the treatment burden had no burden for that domain.
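The scoring rule above can be written as a short sketch. The function name and the use of None to represent both 'Does not apply' and missing answers are assumptions made for illustration.

```python
from typing import Optional, Sequence


def global_score(answers: Sequence[Optional[int]]) -> int:
    """Sum the item ratings, scoring 'Does not apply' or missing answers (None) as 0."""
    total = 0
    for answer in answers:
        if answer is None:
            continue  # treated as no burden for this domain
        if not 0 <= answer <= 10:
            raise ValueError(f"rating out of range: {answer}")
        total += answer
    return total


# Hypothetical patient: three rated components and one 'Does not apply'.
print(global_score([3, None, 10, 0]))  # 13
```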
Factorial validity was assessed by determining the dimensional structure of the questionnaire by use of factor analysis. Scree plots were used to visualize a break between factors with large and small Eigenvalues. Factors that appeared before the horizontal break were assumed to be meaningful. Internal consistency was assessed by Cronbach's α [25] and was considered acceptable between 0.70 and 0.95 [26].
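For readers who want to reproduce the internal consistency check, a minimal sketch of Cronbach's α is given below. It uses the usual formula α = k/(k − 1) × (1 − Σ item variances / variance of the total score); the function names and the sample data are hypothetical, and this is not the SAS or R routine used in the study.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items table of scores."""
    k = len(rows[0])  # number of items
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def is_acceptable(alpha: float) -> bool:
    """Acceptability range stated in the text: 0.70 to 0.95."""
    return 0.70 <= alpha <= 0.95


# Hypothetical scores: five respondents, three items rated 0-10.
scores = [[2, 3, 3], [4, 4, 5], [6, 8, 7], [1, 2, 1], [9, 9, 10]]
print(cronbach_alpha(scores))
```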
Construct validity was obtained by confirming two constructs theorized on the treatment burden [27]. First, we hypothesized a negative correlation between treatment burden, defined as the work of dealing with complex treatment regimens, and treatment satisfaction, defined as the balance between expectations about the treatment, side effects, convenience of use, and perceived efficacy. Treatment satisfaction was assessed by the Treatment Satisfaction Questionnaire for Medication (TSQM), an 11-item questionnaire validated in a population with diverse chronic conditions, measuring patient satisfaction with various medications designed to treat, control or prevent a wide variety of medical conditions [28,29]. TSQM scores range from 0 to 100 and measure patient satisfaction with the treatment's effectiveness, side effects and convenience, as well as global satisfaction. Correlations were expected to be higher between our instrument and the TSQM convenience score because some items overlapped. Second, we assumed a positive correlation between the patient evaluation of the treatment burden and treatment workload evaluated by items on (1) drug intake (number of tablets, injections and intakes per day); (2) medical follow-up (number of different physicians, medical appointments per month and hospitalizations per year); and (3) daily time spent on self-care. The correlations between the global questionnaire score, the TSQM scores and treatment workload variables were assessed by Spearman correlation coefficient (rs) and considered high with rs > 0.50 and moderate with rs of 0.35 to 0.50 [30]. Wilcoxon and Kruskal-Wallis tests were used to compare measurements for qualitative variables across groups. A P value < 0.05 was considered statistically significant. We used linear regression analyses to examine variables that predicted the global questionnaire score. Relationships were characterized with beta coefficients, standard errors, and percent variance explained (adjusted R²) within these models. Heteroskedasticity was corrected by the method described by Greene et al. [31].
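The Spearman coefficient and the interpretation thresholds described above can be illustrated with a small sketch. The function names are hypothetical; applying the thresholds to the absolute value of the coefficient and labeling values below 0.35 as "weak" are assumptions made for this example.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean rank of the tied block
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman coefficient computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


def strength(rs: float) -> str:
    """Interpretation thresholds from the text, applied to |rs| (assumption)."""
    size = abs(rs)
    if size > 0.50:
        return "high"
    if size >= 0.35:
        return "moderate"
    return "weak"


# Hypothetical data: global burden scores and number of tablets per day.
burden = [12, 30, 45, 8, 60, 30]
tablets = [1, 4, 6, 2, 9, 3]
rs = spearman(burden, tablets)
print(rs, strength(rs))
```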
Description of our sample was completed by clustering homogeneous groups of patients according to the similarity of their response patterns to the Treatment Burden Questionnaire and by analyzing treatment workload variables in each cluster of patients. Clustering involved a hierarchical ascendant classification with Ward's distance method [32]. The number of clusters was determined so as to have a minimal sample of 100 patients. Stability of clustering was assessed by a twofold cross-validation method.
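To make the clustering step concrete, the sketch below shows a naive agglomerative procedure with Ward's criterion: at each step it merges the two clusters whose fusion gives the smallest increase in within-cluster sum of squares. The function name and the toy data are hypothetical, the loop is cubic in the number of patients, and this is not the routine used in the study; the rule of at least 100 patients per cluster and the cross-validation are not reproduced.

```python
from typing import List, Sequence


def ward_clusters(points: Sequence[Sequence[float]], n_clusters: int) -> List[List[int]]:
    """Group point indices into n_clusters by repeated Ward merges."""
    clusters: List[List[int]] = [[i] for i in range(len(points))]
    dim = len(points[0])

    def centroid(members: List[int]) -> List[float]:
        return [sum(points[m][d] for m in members) / len(members) for d in range(dim)]

    def merge_cost(a: List[int], b: List[int]) -> float:
        # Increase in within-cluster sum of squares if a and b are merged.
        ca, cb = centroid(a), centroid(b)
        dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * dist2

    while len(clusters) > n_clusters:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical response patterns: six patients, two item scores each.
patients = [[0, 1], [1, 0], [5, 5], [6, 5], [10, 9], [9, 10]]
print(ward_clusters(patients, 3))
```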
We compared the patient's self-evaluation of treatment burden with an evaluation by their physician and by an informal caregiver using the same questionnaire adapted for heteroevaluation. Physicians and informal caregivers were asked to make the best estimate of the patient's treatment burden from their perspective.
Reliability of the instrument was determined by a test-retest method. Patients completed the new instrument twice: at baseline and after 2 weeks or 1 month. Reliability was assessed by the intraclass correlation coefficient (ICC) for agreement [33]. The 95% confidence intervals (95% CIs) were determined by a bootstrap method. Agreement was considered acceptable with ICC > 0.60 [27,34]. Agreement was represented by Bland and Altman plots, which represent the differences between two measurements against the means of the two measurements [35].
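The agreement statistics can be sketched as below for paired scores (test and retest, or patient and physician). Several points are assumptions made for illustration, because the text does not specify them: the ICC form (two-way random effects, absolute agreement, single measurement), the percentile bootstrap, the 1.96 × SD limits of agreement, the direction of the differences (first minus second score) and all function names and data.

```python
import random
from statistics import mean, stdev
from typing import Callable, Sequence, Tuple

Pair = Tuple[float, float]


def icc_agreement(pairs: Sequence[Pair]) -> float:
    """ICC for absolute agreement, two-way random effects, single measurement."""
    n, k = len(pairs), 2
    grand = mean(v for pair in pairs for v in pair)
    row_means = [mean(pair) for pair in pairs]
    col_means = [mean(pair[j] for pair in pairs) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for pair in pairs for v in pair)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k / n * (ms_cols - ms_error)
    )


def bland_altman_limits(pairs: Sequence[Pair]) -> Tuple[float, float, float]:
    """Mean difference and 95% limits of agreement (mean +/- 1.96 SD)."""
    diffs = [a - b for a, b in pairs]
    centre = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return centre, centre - spread, centre + spread


def bootstrap_ci(
    pairs: Sequence[Pair],
    stat: Callable[[Sequence[Pair]], float],
    n_boot: int = 1000,
    level: float = 0.95,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval, resampling subjects with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        sample = [rng.choice(pairs) for _ in pairs]
        try:
            estimates.append(stat(sample))
        except ZeroDivisionError:
            continue  # degenerate resample with no variance; skipped
    estimates.sort()
    lo_idx = int((1 - level) / 2 * len(estimates))
    hi_idx = int((1 + level) / 2 * len(estimates)) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Hypothetical test and retest global scores for eight patients.
scores = [(10, 12), (20, 18), (35, 30), (40, 44), (55, 50), (60, 65), (5, 9), (80, 70)]
print(icc_agreement(scores), bland_altman_limits(scores), bootstrap_ci(scores, icc_agreement))
```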
Statistical analyses involved use of SAS v. 9.2 (SAS Institute, Cary, NC, USA) and R v. 2.13.1 (http://www.r-project.org/). This study was approved by the Institutional Review Board of Hospital Bichat (IRB: 00006477).

Results
During item reduction, we eliminated the subitem 'The conditions to store your medications (in your refrigerator etc.)' because a large number of patients responded 'Does not apply' (51.6%) and it had a large floor effect (64.0%) (Additional file 2, Appendix 2). Therefore, the final version of the questionnaire, the Treatment Burden Questionnaire (TBQ), consisted of seven items (two of which had four subitems) (Table 2). Factorial validity, assessed by scree plots, favored a unidimensional instrument because 91% of the variance was explained by the first principal factor (Figure 1 and Additional file 3, Appendix 3). Cronbach's α was 0.89. The global score of the Treatment Burden Questionnaire was the sum of the answers to each item and ranged from 0 to 130. It was highly correlated with every item of the questionnaire (rs = 0.47 to 0.68) (Additional file 4, Appendix 4). Construct validity analysis showed (1) a moderate negative correlation of the Treatment Burden Questionnaire score with the TSQM global and convenience scores (rs = -0.41 and rs = -0.53) and a weak negative correlation with the TSQM efficacy score (rs = -0.26) (Table 3) and (2) a significant association of scores for variables used to describe treatment workload and the Treatment Burden Questionnaire global score (Table 4).
Using hierarchical ascendant classification, we clustered our sample in three homogeneous groups of patients by the answers to the Treatment Burden Questionnaire (Additional file 5, Appendix 5). Twofold cross validation showed stable clustering results. The global score was 11.3 (± 9.2) in the first cluster, 34.6 (± 11.1) in the second cluster and 65.8 (± 18.1) in the third cluster. Therefore, we defined the clusters as patients with low, moderate and high burden of treatment. Descriptive analysis of the treatment workload items within the three clusters showed that scores for these variables were significantly higher for patients with high treatment burden (Table 5). Treatment workload variables could explain up to 69% of the variability in the patient's score. Prediction of global score with these variables was more accurate with high than low treatment burden (R² = 0.86 vs R² = 0.62) (Additional file 6, Appendix 6). Treatment burden score was significantly higher when patients experienced medication side effects (P < 0.0001) and for patients whose treatment did not relieve their symptoms (P < 0.0001). We found a moderate agreement (ICC 0.60 (0.28 to 0.79)) between patient and informal caregiver global scores (39 informal caregivers (7.8%) completed the questionnaire) (Additional file 7, Appendix 7a). Bland and Altman plots showed a mean difference of -8.7; 95% limits of agreement were -58.0 and 40.7 (Additional file 7, Appendix 7b). Agreement between patient and physician global scores was weak (ICC 0.38 (0.29 to 0.47)) (396 physicians (78.9%) completed the questionnaire) (Additional file 8, Appendix 8). The Bland and Altman plot showed a mean difference of -7.6; 95% limits of agreement were -60.7 and 45.4 (Figure 2). Agreement between patient and general practitioner (n = 209) evaluations was ICC = 0.42 (0.27 to 0.54). Agreement between patient and hospital specialist (n = 187) evaluations was ICC = 0.29 (0.14 to 0.42) (Additional file 9, Appendix 9).
Treatment workload variables could explain up to 76% of the variation in physician evaluations, and prediction was more accurate for patients with high than low treatment burden (R² = 0.82 vs R² = 0.72) (Additional file 6, Appendix 6). Retests were obtained for 211 patients (42.0%). For the global score, the ICC for all retests was 0.76 (0.67 to 0.83).

How would you rate the constraints associated with your diet (for example, not being allowed to eat certain foods)?
5 How would you rate the burden associated with the recommendations from your doctors to practice regular physical exercises?
6 What is the impact of your healthcare on your social relationships (for example, need for assistance, being ashamed to take your medication in front of people)?

Discussion
In this study, we presented a unidimensional valid and reliable instrument assessing the treatment burden of chronic diseases for patients with multiple chronic conditions. This patient-reported measure took into account the burden associated with drug intake, surveillance, lifestyle changes and the impact of healthcare on social relationships. The instrument could help in clinical research for developing clinical practice guidelines adapted to the realities of patient lives. In addition, it could be used in clinical practice as a validated global score that is easy to calculate to identify patients overwhelmed by their treatment to help begin conversations about treatment burden with these patients.
We highlighted a negative correlation between treatment burden and treatment satisfaction: the more satisfied patients were with their treatment, the lower their treatment burden. We expected that our scale score would correlate highly with the TSQM convenience score because some items overlapped. However, patients who experienced side effects and who found the treatment ineffective would be less willing to integrate the treatment into their lives.
Treatment burden did not concern only patients taking a lot of medications: 25% of patients in our sample took < 3 medications a day and still had a median treatment burden score of 17 (Q1 to Q3: 6 to 36). Therefore, treatment burden should be taken into account for every patient, because it could be associated with adherence to care [9] and thus could affect hospitalization and survival rates [11]. However, physicians were often not fully aware of their patients' investment of time and effort to comply with every prescription: we found only weak agreement between patients' and physicians' evaluations of treatment burden. Even for specific domains such as self-monitoring or the prescription of a diet, physicians could not predict their patient's evaluation. General practitioners, who are coordinators of care in France, have better knowledge than hospital specialists of how patients cope with everything they do to take care of their health (ICC = 0.42 for general practitioners and 0.29 for hospital specialists) but still fail to assess patients' treatment burden accurately. This finding is not unexpected, because treatment burden is a relatively new concept to physicians [13] and expresses a patient experience that is not shared in depth during consultations [36].
Figure 1 Eigenvalue diagram of the factor analysis of the questionnaire for treatment burden. The scree plot shows a break before factor 2, which suggests a unidimensional solution. 'Does not apply' was considered the lowest possible score (0).
The TSQM assesses satisfaction with medication. Scores range from 0 to 100. A high score indicates high satisfaction with the medication. Negative coefficients indicate a decrease in the TSQM score associated with an increase in treatment burden. (a) TSQM side effects score was calculated only for patients who declared experiencing side effects.
In existing questionnaires, treatment burden was often considered only as a subscale for larger disease-specific scales [16,17] and focused on a single treatment regimen. Given the increasing number of patients with multiple chronic diseases and complex treatment regimens, measuring global treatment burden seems increasingly important. As Gallacher et al. have shown for chronic heart failure, treatment burden relates to how patients cope with their treatment [37]: (1) learning about treatments and their consequences, (2) monitoring the treatment, (3) adhering to treatment and lifestyle changes and (4) engaging with others. During our study, we asked patients about aspects of their healthcare that were not mentioned in our questionnaire but had an impact on their lives. We found the same domains of treatment burden as Gallacher et al., with the exception of gaining an understanding about illness and treatments. Nevertheless, acquiring this knowledge is an important burden in the management of chronic conditions, especially when patients have to make sense of the disparate and conflicting information they gather from different sources. However, because we recruited patients whose illnesses had lasted at least 6 months, they might have already coped with this particular burden, adapted to it, and therefore did not mention it. The strengths of this study included field testing the instrument in a large sample of both inpatients and outpatients with different conditions and treatment regimens, which ensured that our instrument was flexible enough for assessing the treatment burden across any disease or context. However, we found a significant floor effect and a large proportion of 'Does not apply' responses for all of our scales.
This result was expected because treatment burden depends on how patients cope with their treatment regimens. Therefore, patients could have no burden in aspects of their care they have integrated in their lives. As well, patients with similar treatment regimens could have very different treatment burdens. Still, domains not included in this instrument may be critical to some of these patients. During the validation study, we systematically searched for other aspects of treatment burden that could have an impact on patients' quality of life but found no preeminent domain. More work in measuring treatment burden is needed. Because treatment burden depends on the context of patients (social or family structure, care delivery system) [13] and because our instrument was developed in France, we could not exclude that different domains could arise in other settings. As an example, the financial burden of the treatment did not arise from our qualitative interviews because the public health insurance program in France guarantees healthcare free of charge for patients with chronic conditions. In addition, depending on the social or family structure, the treatment burden may be shared by the patient with one or more informal caregivers, thus affecting the validity of the measure when only reported by the patient.
Patients were clustered in three groups depending on the similarity of their responses to the instrument. Global score was 11.3 (± 9.2) in the first cluster, 34.6 (± 11.1) in the second cluster and 65.8 (± 18.1) in the third cluster. Therefore, we defined the clusters as patients with low, moderate and high burden of treatment. Continuous variables are presented as mean ± SE. Categorical variables are presented as proportion of the corresponding subgroup. Associations between continuous variables among different classes were determined by Wilcoxon test. Qualitative variables are presented by their frequency in the whole sample. Associations between qualitative variables among different classes were determined by the χ² test. Global score is the sum of all items scores of the questionnaire with 'Does not apply' and missing answers considered the lowest possible score (0). *Time needed for patients who did not require specific organization for daily care or who had no self-monitoring was considered 0.
important intellectual content. DB and PR provided administrative, technical, and material support. All authors saw and approved the final manuscript. PR is the guarantor, had full access to the data in the study, and takes responsibility for the integrity of the data and the accuracy of the data analysis.