We used a multistep method to develop a tool to measure the treatment burden of chronic diseases [19, 20], following the quality criteria proposed in the literature.
Stage 1: elaboration of the questionnaire
The objective of the instrument was to capture patients' perception of treatment burden as 'the work of being a patient' dealing with increasingly complex treatment regimens, that is, the impact of the workload of healthcare on a patient's well-being and functioning.
We searched MEDLINE via PubMed for literature on treatment burden and existing questionnaires assessing it in specific diseases. We found no instrument appraising the treatment burden globally. Treatment burden was often assessed only as a subscale of specific disease scales [14–17] and thus was considered only for the regimen associated with a particular condition. Items often focused on drug intake, adherence to care and convenience of use.
Using this literature review, three members of the team who had experience in the care of patients with chronic diseases (V-TT, BF, PR) highlighted possible relevant topics to capture the aspects of the workload of healthcare that could affect a patient's life. These topics were the burden associated with taking medicines, self-surveillance, laboratory tests, doctor visits, need for organization, administrative tasks, following advice on diet and physical exercise and social impact of the treatment. According to the conceptual model of our instrument, we chose not to include other consequences of the treatment such as treatment side effects.
In addition, because our instrument was elaborated in France and administered to French patients, we did not take into account the financial burden of treatment, because our national public health insurance program guarantees healthcare free of charge for patients with chronic conditions.
We recruited a convenience sample of 22 patients with at least 1 chronic condition from the department of internal medicine of Hospital Pitié-Salpêtrière and a general practitioner clinic in Paris in April 2011 (Additional file 1, Appendix 1). These two settings involved patients with various chronic conditions, requiring primary, secondary and tertiary care. During semistructured interviews, we presented the concept of treatment burden to patients and asked them about their diseases, their treatment and the burden of treatment, with open-ended questions: 'Could you tell us about your health problems?' 'Could you tell us about what you have to do to take care of your health?' 'What aspects of your care have the most impact on your life?' Then, we asked them about the burden associated with the different topics highlighted earlier by asking them (1) to rate each of these items, (2) to explain why they rated it that way and (3) to say whether they found the item relevant to the assessment of treatment burden in general. Finally, we asked patients whether other aspects of the workload of healthcare bothered them. As a result of these interviews, examples were added to the items, and we added one item, 'Frequent healthcare reminds me of my health problems', to the questionnaire.
The resulting questionnaire consisted of seven items (two of which had four subitems), formed by an introductory sentence with examples, followed by a rating scale ranging from 0 to 10 with numbers placed under boxes and labeled end anchors ('No burden' and 'Considerable burden') [22–24].
A group of ten physicians (two methodologists, three general practitioners, two internists, one cardiologist, one pneumologist, one diabetologist) with experience in the care of patients with chronic conditions, some of whom had experience in questionnaire development, reviewed the clarity and wording of the items. All physicians agreed that, on the surface, the items appeared to measure what they were intended to measure and that the instrument achieved face validity.
Stage 2: measurement properties of the instrument
The measurement properties of the questionnaire were assessed in four steps: (1) reduction of the number of items, (2) assessment of factorial validity, (3) assessment of construct validity and (4) assessment of reliability.
We recruited consecutive patients from six teaching hospitals of the Assistance-Publique Hôpitaux de Paris and eight general practitioner clinics in Paris to validate the questionnaire. Patients were eligible if they were 18 years or older, were able to complete a consent form and had at least one condition requiring medical follow-up for at least 6 months. Patients with cognitive impairment that could interfere with understanding the questionnaire were excluded. All patients provided written informed consent to be in the study.
Reducing the number of items was based on (1) a floor effect, considered present if more than 15% of respondents had the lowest score; (2) the relevance of the items, assessed by the number of answers for which patients checked 'Does not apply'; and (3) item redundancy, suspected when interitem correlations by Spearman's correlation coefficient were > 0.80. Items were eliminated after discussion among three investigators (V-TT, BF, PR).
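The floor-effect and redundancy screens described above can be sketched as follows. This is a minimal illustration in Python, not the code used in the study (the analyses were run in SAS and R); the function names, the data layout (one list of 0 to 10 ratings per item, complete responses) and the item labels in the usage note are assumptions made for the example.

```python
from itertools import combinations


def floor_effect(scores, lowest=0, threshold=0.15):
    """True if more than `threshold` of respondents gave the lowest score."""
    share = sum(1 for s in scores if s == lowest) / len(scores)
    return share > threshold


def _ranks(values):
    """Average ranks: tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's coefficient: Pearson correlation computed on the ranks."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def redundant_pairs(items, cutoff=0.80):
    """Item pairs whose interitem Spearman correlation exceeds `cutoff`."""
    return [
        (a, b)
        for a, b in combinations(sorted(items), 2)
        if spearman(items[a], items[b]) > cutoff
    ]
```

For example, `redundant_pairs({"medication": [0, 2, 5, 7, 9, 10], "injections": [1, 3, 5, 8, 9, 10], "diet": [9, 1, 6, 0, 4, 2]})` flags only the hypothetical medication and injections items. In the study, flagged items were candidates for discussion among investigators and were not removed automatically.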
Answers to the questionnaire were aggregated into a global score by summing the item responses. 'Does not apply' or missing answers were scored as the lowest possible value (0) because we considered that a patient not concerned by a domain of the treatment burden had no burden for that domain.
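As a small illustration of this scoring rule, the sketch below sums one respondent's ratings and treats anything that is not a number as 0. The encoding of missing answers as `None` and of 'Does not apply' as a string is an assumption made for the example, not the study's data format.

```python
def global_score(responses):
    """Sum item ratings; non-numeric entries (missing, 'Does not apply') count as 0."""
    return sum(r if isinstance(r, (int, float)) else 0 for r in responses)
```

With this encoding, `global_score([3, None, "Does not apply", 10, 0])` sums only the three numeric ratings.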
Factorial validity was assessed by determining the dimensional structure of the questionnaire by use of factor analysis. Scree plots were used to visualize a break between factors with large and small eigenvalues. Factors that appeared before the horizontal break were assumed to be meaningful. Internal consistency was assessed by Cronbach's α and was considered acceptable between 0.70 and 0.95.
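For reference, Cronbach's α for k items is k/(k − 1) × (1 − sum of item variances / variance of the total score). The sketch below computes it from a respondents-by-items table, assuming complete data and sample variances; the function names are illustrative, and treating the 0.70 and 0.95 bounds as inclusive is an assumption.

```python
from statistics import variance


def cronbach_alpha(rows):
    """rows: one list of item scores per respondent (complete data assumed)."""
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def is_acceptable(alpha, low=0.70, high=0.95):
    """Acceptability range from the text; inclusive bounds are an assumption."""
    return low <= alpha <= high
```

Values above the upper bound fail the check as well, because very high internal consistency suggests that some items are redundant.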
Construct validity was assessed by testing two theorized relationships involving treatment burden. First, we hypothesized a negative correlation between treatment burden, defined as the work of dealing with complex treatment regimens, and treatment satisfaction, defined as the balance between expectations about the treatment, side effects, convenience of use, and perceived efficacy. Treatment satisfaction was assessed by the Treatment Satisfaction Questionnaire for Medication (TSQM), an 11-item questionnaire validated in a population with diverse chronic conditions, measuring patient satisfaction with various medications designed to treat, control or prevent a wide variety of medical conditions [28, 29]. TSQM scores range from 0 to 100 and measure patient satisfaction with the treatment's effectiveness, side effects and convenience, as well as global satisfaction. Correlations were expected to be higher between our instrument and the TSQM convenience score because some items overlapped. Second, we assumed a positive correlation between the patient evaluation of the treatment burden and treatment workload evaluated by items on (1) drug intake (number of tablets, injections and intakes per day); (2) medical follow-up (number of different physicians, medical appointments per month and hospitalizations per year); and (3) daily time spent on self-care. The correlations between the global questionnaire score, the TSQM scores and treatment workload variables were assessed by Spearman correlation coefficient (rs) and considered high with rs > 0.50 and moderate with rs 0.35 to 0.50. Wilcoxon and Kruskal-Wallis tests were used to compare measurements for qualitative variables across groups. A P value < 0.05 was considered statistically significant. We used linear regression analyses to examine variables that predicted the global questionnaire score. Relationships were characterized with beta coefficients, standard errors, and percent variance explained (adjusted R²) within these models.
Heteroskedasticity was corrected by the method described by Greene et al.
Description of our sample was completed by clustering homogeneous groups of patients depending on the similarity of their response patterns to the treatment burden questionnaire and analysis of treatment workload variables in each cluster of patients. Clustering involved a hierarchical ascendant classification with a Ward's distance method. The number of clusters was determined so as to have a minimal sample of 100 patients. Stability of clustering was assessed by a twofold crossvalidation method.
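To make the clustering step concrete, the following is a minimal sketch of agglomerative (ascendant) clustering with Ward's criterion: at each step, the two clusters whose merger least increases the within-cluster sum of squares are joined. It assumes squared Euclidean distances on the item responses and a preset number of clusters; the study's actual software settings, its rule for choosing the number of clusters and its crossvalidation procedure are not reproduced here, and all names are illustrative.

```python
def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    na, nb = len(a["members"]), len(b["members"])
    d2 = sum((x - y) ** 2 for x, y in zip(a["centroid"], b["centroid"]))
    return na * nb / (na + nb) * d2


def merge(a, b):
    """Pool the members and take the size-weighted mean of the centroids."""
    na, nb = len(a["members"]), len(b["members"])
    centroid = [
        (na * x + nb * y) / (na + nb) for x, y in zip(a["centroid"], b["centroid"])
    ]
    return {"members": a["members"] + b["members"], "centroid": centroid}


def ward_clusters(points, n_clusters):
    """Return clusters as sorted lists of row indices into `points`."""
    clusters = [{"members": [i], "centroid": list(p)} for i, p in enumerate(points)]
    while len(clusters) > n_clusters:
        # Exhaustive search over all pairs: simple, but cubic in sample size.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = merge(clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [sorted(c["members"]) for c in clusters]
```

Each point would be one patient's vector of item ratings. For a real sample, an optimized library routine would be preferable to this pairwise search.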
We compared the patient's self-evaluation of treatment burden with an evaluation by their physician and by an informal caregiver using the same questionnaire adapted for heteroevaluation. Physicians and informal caregivers were asked to make the best estimate of the patient's treatment burden from their perspective.
Reliability of the instrument was determined by a test-retest method. Patients completed the new instrument twice: at baseline and after 2 weeks or 1 month. Reliability was assessed by the intraclass correlation coefficient (ICC) for agreement. The 95% confidence intervals (95% CIs) were determined by a bootstrap method. Agreement was considered acceptable with ICC > 0.60 [27, 34]. Agreement was represented by Bland and Altman plots, which represent the differences between two measurements against the means of the two measurements.
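The two agreement summaries can be sketched as follows. The text does not specify which ICC form was used, so the sketch assumes the two-way, absolute-agreement, single-measurement form for two administrations; the Bland and Altman summary returns the mean difference with the conventional limits of agreement (mean ± 1.96 SD of the differences). Function names and the (test, retest) pair layout are illustrative, and the bootstrap confidence interval is not included.

```python
from statistics import mean, stdev


def icc_agreement(pairs):
    """Two-way, absolute-agreement, single-measure ICC for (test, retest) pairs."""
    n, k = len(pairs), 2
    grand = mean(v for p in pairs for v in p)
    row_means = [mean(p) for p in pairs]
    col_means = [mean(p[j] for p in pairs) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between patients
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between occasions
    ss_total = sum((v - grand) ** 2 for p in pairs for v in p)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k / n * (ms_cols - ms_err)
    )


def bland_altman(pairs):
    """Mean difference (retest minus test) and 95% limits of agreement."""
    diffs = [b - a for a, b in pairs]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)
    return bias, bias - half_width, bias + half_width
```

Because this ICC form penalizes systematic shifts between administrations, a constant increase in every patient's retest score lowers the coefficient even when the ranking of patients is unchanged.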
Statistical analyses involved use of SAS v. 9.2 (SAS Institute, Cary, NC, USA) and R v. 2.13.1 (http://www.r-project.org/). This study was approved by the Institutional Review Board of Hospital Bichat (IRB: 00006477).