Adaptation and validation of the TBQ in English followed a multi-step approach recommended in the literature [12–14]: 1) review the conceptual evidence about treatment burden in English-speaking countries; 2) translate the TBQ into English; 3) pretest the instrument with patients to assess the relevance, clarity, and wording of items; and 4) assess validity and reliability by a test–retest of the adapted instrument.
Conceptual evidence of treatment burden in English-speaking countries
In a literature review of MEDLINE via PubMed, we identified several articles describing the concept of treatment burden in English-speaking countries. In the USA, Eton and colleagues identified three broad themes for treatment burden: 1) work patients must do to take care of their health, 2) problem-focused strategies to facilitate self-care, and 3) factors that exacerbate the perceived burden [2]. In the UK, Gallacher and associates found four similar themes in patients with chronic heart failure and stroke: 1) learning about treatments and their consequences, 2) engaging with others, 3) adhering to treatment and lifestyle changes, and 4) monitoring their treatments [15, 16]. Finally, in Australia, Sav and associates identified four inter-related components of treatment burden: 1) financial, 2) time and travel, 3) medication, and 4) healthcare access burdens [17].
The original TBQ encompassed all of these domains, with the exception of the financial treatment burden, because in France, the public health insurance program guarantees healthcare free of charge for patients with chronic conditions. Therefore, we added a new item in the English adaptation of the TBQ: ‘How would you rate the financial burden associated with your healthcare (e.g., out-of-pocket expenses or expenses not covered by insurance)?’
Translation of the TBQ into English
Translation of the TBQ into English involved a classic ‘forward-backward’ translation method [13]. First, the original instrument in French was translated into English by two bilingual translators; one (CB) had a medical background and was familiar with the concept of treatment burden. Second, the two translations were synthesized and reviewed by a committee, which included the authors of the original questionnaire. Third, two different translators, blinded to the original version, back-translated the questionnaire into French. Finally, the committee reviewed and synthesized all translations to produce English items that were faithful to the original items and easy to answer.
Pre-testing the instrument
To assess the relevance of items, clarity, and wording, we pre-tested the obtained instrument with a convenience sample of 200 participants in August 2013 (see Additional file 1). We used an internet platform, the Open Research Exchange (ORE) [18, 19], to recruit patients on PatientsLikeMe (PLM) [20], an online network where 200,000 voluntary participants with chronic conditions share data about their treatments, conditions, and symptoms. Members of PLM join the site with the expectation that they will be participating in research. To participate, patients had to have at least one chronic condition (defined as requiring ongoing healthcare for at least 6 months). After answering the questionnaire, they provided open-ended feedback about 1) the clarity, wording, and relevance of the items, and 2) any burden that they felt was not covered or was insufficiently covered in the questionnaire. Their answers were categorized and discussed by two of the authors (VT-T and CB).
Concerning the wording of items, 10 patients (5.0%) felt that the word ‘constraints’ was confusing, so we replaced it with the word ‘problem’, as suggested by the patients. Patients were also asked whether any important elements of treatment burden were missing from the questions: 15 patients (7.5%) thought that relationships between patients and healthcare providers were insufficiently covered by the original items. Other suggestions were either specific to a particular condition, related to the burden of disease, or already covered by the existing items. Thus, we added a new item for testing: ‘How would you rate the difficulties you could have in your relationships with healthcare providers (e.g., feeling not listened to enough or not taken seriously)?’ After the pretest, the English TBQ was therefore composed of 15 items, with rating scales ranging from 0 to 10 and labeled anchors (‘not a problem’ and ‘large problem’).
Assessment of validity and reliability of the English TBQ
We studied the measurement properties of the instrument by 1) describing the item properties, 2) assessing factor structure, 3) assessing construct validity, and 4) assessing reliability by test–retest.
We recruited a convenience sample of patients via the aforementioned internet platform. Patients were eligible if they were 18 years or older and had at least one condition that had required ongoing health care for at least 6 months. We sent email invitations to a random sample of 3,000 members of the internet platform who did not participate in the pretest and who met the eligibility criteria, encouraging them to connect to the website and complete the questionnaire. To increase the number of respondents, an email reminder was sent after 2 months. Patients consented electronically to participate in the study. The recruitment message outlined the purpose of the study and reminded patients that they were under no obligation to participate and that their aggregated results might be published. Because there were no anticipated adverse consequences of participation, institutional review board (IRB) approval was not sought for this project.
Item properties were described using three criteria: 1) proportion of missing answers, 2) relevance of items assessed by the proportion of ‘does not apply,’ and 3) score distributions.
Factor structure was investigated by exploratory factor analysis. Scree plots were used to visualize the break between factors with large eigenvalues and those with smaller eigenvalues; factors appearing before the break were assumed to be meaningful. Internal consistency was assessed by Cronbach’s α [21] and was considered acceptable between 0.70 and 0.95 [22].
The global score of the TBQ (TBQ global score) was the sum of the answers to all items. ‘Does not apply’ and missing answers were assigned the lowest possible score (0), because we considered that a patient not concerned by a domain of treatment burden had no burden for that domain.
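This scoring rule can be stated concretely. In the minimal sketch below (a hypothetical helper, not the authors’ code), `None` stands for a missing or ‘does not apply’ answer:

```python
def tbq_global_score(answers):
    """Sum the 15 item ratings (each 0-10); missing or 'does not apply'
    answers (None) contribute 0, so the global score ranges from 0 to 150."""
    return sum(a if a is not None else 0 for a in answers)
```

For example, on a three-item fragment, `tbq_global_score([5, None, 8])` returns 13.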
Construct validity was tested by confirming four pre-specified hypotheses. First, we expected a negative correlation between treatment burden (as measured by the TBQ global score) and quality of life. Quality of life was measured by the PatientsLikeMe Quality of Life (PLMQOL) scale, a validated 24-item questionnaire assessing physical, mental, and social quality of life. PLMQOL scores range from 0 to 100 for each domain (higher scores indicating better quality of life) and are summed for a global assessment of quality of life [23]. Second, we predicted an association between the TBQ global score and adherence to medication: the greater the treatment burden, the lower the adherence to treatment. Adherence to medical treatment was measured by Morisky’s Medication Adherence Scale 8 (MMAS-8) [24, 25], a validated eight-item questionnaire, with scores ranging from 0 to 8. High adherence is a score of 8; medium adherence, 6 to 7; and low adherence, less than 6 [24]. Third, we hypothesized that patients with better knowledge of their conditions and treatments would have a lower treatment burden. Confidence in patients’ knowledge about their treatments and conditions was assessed by two questions: ‘Do you think you have sufficient knowledge about your conditions (e.g., symptoms, disease progression)?’ and ‘Do you think you have sufficient knowledge about your treatments (e.g., possible side effects, expected benefits, other treatment options)?’. Answers were rated on a five-point scale: ‘very sufficient’, ‘sufficient’, ‘average’, ‘insufficient’, and ‘very insufficient’. Finally, we assumed a positive correlation between the TBQ global score and the following clinical variables: 1) number of conditions, 2) drug administration (number of tablets, injections, and administrations per day), and 3) medical follow-up (number of different physicians, medical appointments per month, and hospitalizations per year).
To elicit the chronic conditions a patient had, we asked the patient to self-identify the condition(s) from a list recommended as core for any measure of multimorbidity [26]. Options were presented as categories illustrated by common conditions; for example: ‘Rheumatologic disease (e.g. osteoporosis, arthritis, or inflammatory polyarthropathies)’. Patients were encouraged to complete their answer with free text. The text was analyzed, and the condition was assigned to the appropriate category by a single investigator (VTT).
The association between the TBQ global score, quality of life score, and clinical variables was assessed by the Spearman correlation coefficient (rs), which was considered high when greater than 0.50 and moderate when 0.35 to 0.50 [27]. Wilcoxon and Kruskal-Wallis tests were used to compare scores across groups defined by qualitative variables. P < 0.05 was considered statistically significant.
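The Spearman coefficient used here is the Pearson correlation of the rank-transformed values; when neither variable contains ties, it reduces to rs = 1 − 6Σd²/(n(n² − 1)), where d is the per-patient difference in ranks. A stdlib sketch of that special case (the data are hypothetical; real analyses would use a statistical package with tie correction):

```python
def spearman_rs(x, y):
    """Spearman rank correlation via rs = 1 - 6*sum(d^2)/(n*(n^2-1)).
    Valid only when x and y contain no tied values."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank  # rank 1 = smallest value
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))
```

By the thresholds above, an rs of, say, 0.83 between burden score and number of conditions would count as a high correlation.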
Reliability of the instrument was determined by a test–retest method. Patients were asked to complete the new instrument twice: at baseline and again after 2 weeks, when they received an email reminder. Reliability was assessed by the intraclass correlation coefficient (ICC) for agreement [28], defined as the ratio of the subject variance to the sum of the subject variance, the rater variance, and the residual variance. The 95% confidence interval (CI) was determined by a bootstrap method. Agreement was considered acceptable when the ICC was greater than 0.60 [29]. Agreement was represented by Bland-Altman plots, which plot the differences between two measurements against the means of the two measurements [30].
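The agreement ICC defined above (subject variance over subject + rater + residual variance) is conventionally estimated as ICC(A,1) from two-way ANOVA mean squares. A sketch for the two-session test–retest case, with hypothetical scores (not the study’s estimator, which also used bootstrap CIs):

```python
def icc_agreement(test, retest):
    """ICC(A,1), absolute agreement between two sessions, estimated
    from two-way ANOVA mean squares (n subjects x k=2 sessions)."""
    n, k = len(test), 2
    grand = sum(test + retest) / (n * k)
    subj_means = [(t + r) / 2 for t, r in zip(test, retest)]
    sess_means = [sum(test) / n, sum(retest) / n]
    msr = k * sum((m - grand) ** 2 for m in subj_means) / (n - 1)  # subjects
    msc = n * sum((m - grand) ** 2 for m in sess_means) / (k - 1)  # sessions
    sse = sum((x - subj_means[i] - sess_means[j] + grand) ** 2
              for j, col in enumerate((test, retest))
              for i, x in enumerate(col))
    mse = sse / ((n - 1) * (k - 1))                                # residual
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

Identical test and retest scores give an ICC of 1; a systematic shift between sessions lowers the agreement ICC even when the rank order of patients is preserved.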
Statistical analyses were performed using SAS (version 9.3; SAS Inst., Cary, NC, USA) and R (version 3.0 [31], the R Foundation for Statistical Computing, Vienna, Austria).