
Archived Comments for: Fatigue Intervention by Nurses Evaluation – The FINE Trial. A randomised controlled trial of nurse led self-help treatment for patients in primary care with chronic fatigue syndrome: study protocol. [ISRCTN74156610]


  1. Why is the 11-item bimodal Chalder Fatigue Scale being used as a primary outcome measure?

    Tom Kindlon, Irish ME/CFS Association - for Information, Support & Research

    17 June 2008

    Stouten [1] has analysed some commonly used fatigue scales in the CFS area: the Chalder Fatigue Scale, the Checklist Individual Strength and the Krupp Fatigue Severity Scale. He calculated the lower bound for the number and percentage of items with the maximum score for several studies and found that "extreme scoring" was common in studies in the field using these instruments.

    What is clear from this analysis is that the bimodal Chalder Fatigue Scale comes out worst in this regard.

    Two of the authors of the current study should have been "intimately" aware of this problem as they were involved in one [2] of the two studies examined by Stouten [1] that used the 11-item bimodal Chalder Fatigue Scale.

    Here is the data from that study[2]:

    Baseline assessment for four intervention groups:

    Mean (95% Confidence Interval):

    10.6 (10.4 to 10.9)

    10.4 (10.0 to 10.7)

    9.9 (9.2 to 10.6)

    10.2 (9.9 to 10.6)

    In percentage terms, this means that the lower bounds for the number of items with the maximum score for intervention groups were:





    One can also calculate a lower bound for the percentage of patients who scored a maximum of 11 out of 11 for the items in the four intervention groups:






    It should be remembered that these are lower bounds and the actual figures are likely to be higher (unless the authors report this data, one cannot calculate the actual figures from the mean).
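    The two lower bounds described above follow directly from the group means. The sketch below is an illustration rather than Stouten's own procedure: it assumes every patient's total is a whole number from 0 to 11, and the function names are hypothetical.

```python
# Sketch: lower bounds implied by a group mean on the 11-item bimodal scale.
# Assumption: each patient's total is an integer from 0 to 11.


def mean_share_of_items_at_max(mean_score, n_items=11):
    """Average share of items scored 1 (the item maximum under bimodal scoring)."""
    return mean_score / n_items


def min_share_of_patients_at_ceiling(mean_score, max_total=11):
    """Smallest share of patients who must have scored max_total.

    If a share p scores max_total and everyone else scores at most
    max_total - 1, then mean <= (max_total - 1) + p,
    so p >= mean - (max_total - 1).
    """
    return max(0.0, mean_score - (max_total - 1))


# Baseline group means quoted above from study [2].
baseline_means = [10.6, 10.4, 9.9, 10.2]

for m in baseline_means:
    items = 100 * mean_share_of_items_at_max(m)
    patients = 100 * min_share_of_patients_at_ceiling(m)
    print(f"mean {m}: items at maximum {items:.1f}%, "
          f"patients at 11/11 at least {patients:.0f}%")
```

    A mean of 9.9 gives a patient-level lower bound of 0% because every patient could have scored 10 or less; the bound only becomes informative once the mean exceeds 10.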

    This can be illustrated by calculating the lower bound for the percentage of maximum (11 out of 11) scores for one study of an outpatient clinic in London[3] and comparing it to the actual percentage with the maximum score that was given.

    The mean score on the 11-item bimodal Chalder Fatigue Scale was 9.9. This translates to a lower bound of 0% for the percentage of patients who scored a maximum of 11 out of 11 (this could be achieved, for example, by 90% of the patients scoring 10/11 and 10% scoring 9/11). However, the actual percentage of patients who scored the maximum (11/11) was 58%.

    This shows that the 11-item bimodal Chalder Fatigue Scale doesn't just have a low ceiling for each individual question but also for the total score when applied to Chronic Fatigue Syndrome patients.

    Why is this important? Well, as the authors point out, some surveys of patient groups have found patients reporting being made worse by interventions such as Graded Exercise Therapy and Cognitive Behavioural Therapy (CBT).

    For example, the results of a large survey with 2338 respondents were published in a report for the Chief Medical Officer[4]: it found that, of 285 who had done Cognitive Behavioural Therapy, 26% reported being made worse by the program, compared to 7% who said they were helped and 67% who said it made no difference. Of 1214 people who had done a graded exercise program, 50% had been made worse by it (compared to 34% who said it helped and 16% who said it made no difference).

    These are not once-off results. For example, a recently published report on a survey of 2763 patients with ME or CFS in the UK[5], which asked about people's experiences of treatments over the last three years, found that of 699 who said they'd tried Graded Exercise Therapy, 34% said they'd been made worse by it compared to 45% who said they'd been helped and 21% who said it made no difference.

    The contention that people would not have been made worse by a treatment if they had done it under specialist supervision is not backed up by the data from this study[5]. Patients were asked who provided the GET treatment. Of the 567 who answered this question, 181 (31.92%) said it had made them worse compared to 276 (48.68%) who said it helped and 110 (19.40%) who said it made no difference; these are very similar percentages to those for the subgroup of 355 patients who had done the management strategy under an "NHS specialist": 111 (31.27%) of this group said they'd been made worse compared to 162 (45.63%) who said they'd been helped and 82 (23.10%) who said it made no difference.

    Again, these don't appear to be once-off figures. In 2003, Action for ME did a smaller survey of 550 members asking about their experiences of treatments[6]. 354 (64%) replied. The numbers for this study were small, but if one combines the data from those who had done GET under a Physiotherapist, Occupational Therapist, Doctor or Behavioural Therapist, the results are: Negative 22 (56.41%), Neutral 2 (5.13%), Positive 15 (38.46%). [These don't compare favourably to the small group who did GET under no professional: Negative 1 (8.33%), Neutral 4 (33%) and Positive 7 (58%)]. A large percentage of the patients also reported being made worse by CBT in this study. Of 55 who had done CBT under a CBT Therapist/psychologist, Doctor/Psychiatrist, CPN/Other Mental Health, Counsellor/Psychotherapist, OT or Nurse specialist, 19 (34.55%) said it had made them worse compared to 24 (43.64%) who had been helped and 12 (21.82%) who said it made no difference.

    The reason this is important is that if somebody already has a score of 11/11, they can't show up on the 11-item bimodal Chalder Fatigue Scale as having been made worse by the treatment. Indeed, once they improved on one item, it would be marked as an improvement overall even if they actually felt worse on one or more of the other 10 items. Of course, this doesn't just apply to patients who score 11 out of 11; a patient could score 10/11 (say), feel worse on several items on which they'd already scored the maximum, but come out as an improver once they improved on one item. That said, it would be good if the authors reported how many patients in each branch of the study scored the maximum.
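    The masking effect described above can be shown with a hypothetical patient. The sketch assumes the usual coding of the Chalder items, with four response options scored 0 to 3 under the Likert method and collapsed to 0, 0, 1, 1 under the bimodal method; the response values themselves are invented for illustration.

```python
# Sketch: one hypothetical patient scored with both methods.
# Assumption: each item has four response options coded 0, 1, 2, 3;
# bimodal scoring counts options 2 and 3 as 1 and options 0 and 1 as 0.


def likert_total(responses):
    """Sum of the raw 0-3 item codes (range 0-33 for 11 items)."""
    return sum(responses)


def bimodal_total(responses):
    """Number of items answered with option 2 or 3 (range 0-11)."""
    return sum(1 for r in responses if r >= 2)


# Invented data: at baseline every item is "worse than usual" (code 2).
baseline = [2] * 11
# At follow-up five items have moved to "much worse than usual" (code 3)
# and a single item has improved to code 1.
follow_up = [3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 1]

print("bimodal:", bimodal_total(baseline), "->", bimodal_total(follow_up))
print("Likert: ", likert_total(baseline), "->", likert_total(follow_up))
```

    Under bimodal scoring this patient moves from 11 to 10 and is counted as improved, while the Likert total rises from 22 to 26, which records the deterioration on the other items.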

    Some of the items on the 11-item Chalder Fatigue Scale may also not be good as a measure of severity of CFS or ME. For example, somebody could be severe but not answer positively to the question, 'do you feel sleepy or drowsy'. Similarly, a patient could disimprove and still not answer positively to this question. So a patient who scores 11 may not necessarily be more severely affected than a patient scoring 10. That said, the generally very high scores (i.e. close to 11 on average) found in previous studies in the field suggest that the Likert method (0,1,2,3) of scoring is preferable. However, it too suffers from a ceiling effect, although not to the same extent[1]. Also, as previously pointed out, because of questions such as 'do you feel sleepy or drowsy' (which many if not most would feel are not intrinsic to Chronic Fatigue Syndrome or ME), even the Likert version of the Chalder Fatigue Scale is not ideal for measuring severity of the condition.

    [1] Stouten B: Identification of ambiguities in the 1994 chronic fatigue syndrome research case definition and recommendations for resolution. BMC Health Serv Res. 2005 May 13;5:37.

    [2] Powell P, Bentall RP, Nye FJ, Edwards RHT: Randomised controlled trial of patient education to encourage graded exercise in chronic fatigue syndrome. BMJ 2001, 322:387-390.

    [3] Jenkins M, Rayman M. Nutrient intake is unrelated to nutrient status in patients with chronic fatigue syndrome. Journal of Nutritional & Environmental Medicine, 2005, 15, 4, 177-189.

    [4] Independent Working Group: A Report of the CFS/ME working group. Report to the Chief Medical Officer of an Independent Working Group. London: Department of Health; 2002.

    [5] Action for ME & Association of Youth with ME Survey 2008. (Accessed 31st May 2008)

    [6] Action for ME Survey 2003. (Accessed 31st May 2008)

    Competing interests

    No competing interests

  2. Study confirms problems of using the 11-item bimodal Chalder Fatigue Scale as an outcome measure

    Tom Kindlon, Irish ME/CFS Association - for Information, Support & Research

    23 October 2008

    A study[1] has recently been published which confirms the problems with the 11-item bimodal Chalder Fatigue Scale that I highlighted in my first comment[2].

    In a group of 26 people with ME recruited from a local support group in England, it found a mean bimodal score of 9.81 (SD 2.04). Fifty per cent of the patients recorded the maximum score using the bimodal method and 77% recorded the two highest scores.

    The two questions which attracted the lowest scores and which were responsible for most of the variance were ‘do you feel sleepy or drowsy’ and ‘do you have problems starting things’. These are hardly crucial elements of CFS/ME.

    [1] Goudsmit EM, Stouten B, Howes S: Fatigue in Myalgic Encephalomyelitis. Bulletin of the IACFS/ME, Volume 16, Issue 3.

    [2] Kindlon Tom: Why is the 11-item bimodal Chalder Fatigue Scale being used as a primary outcome measure?

    Competing interests

    No competing interests

  3. Further comments on the outcome measures being used and suggestions for other outcome measures that could be useful in such trials

    Tom Kindlon, Irish ME/CFS Association - for Information, Support & Research

    23 January 2009

    In the protocol, the authors say the following:

    "A 2001 systematic review of all treatments for CFS/ME concluded that cognitive behaviour therapy (CBT) and graded exercise therapy (GET) were the most promising treatments for CFS/ME, but that owing to the small number of studies available for review, the generalisability of these results could not be assured [1]*. The authors recommended that further studies be carried out using standardised outcome measures."

    The authors neglect to say that the authors of that review also recommended the use of more objective outcome measures:

    e.g. "Outcomes such as 'improvement,' in which participants were asked to rate themselves as better or worse than they were before the intervention began, were frequently reported. However, the person may feel better able to cope with daily activities because they have reduced their expectations of what they should achieve, rather than because they have made any recovery as a result of the intervention. A more objective measure of the effect of any intervention would be whether participants have increased their working hours, returned to work or school, or increased their physical activities."

    It is very disappointing that the organisers of this trial have not taken this on board with this study. Given the cost of the trial (over £1m), the cost of some actometers (for example) would have been virtually negligible.

    Existing research has some interesting findings on the issue.

    For example, one study (on a single patient)[2] found "using a 26-session graded activity intervention involved gradual increases in physical activity" that "from baseline to treatment termination, the patient’s self-reported increase in walk time from 0 to 155 min a week contrasted with a surprising 10.6% decrease in mean weekly step counts."

    The authors of the current trial refer to the Prins (2001) study[3] as an example of a study which supposedly found that hospital-based CBT was an "effective treatment" for CFS. Judging by some of the questionnaire data, it does look like CBT has had an effect. However, the actometer data from this study subsequently became available[4] and the increases in activity were minimal. For instance, the baseline average for the group which received CBT was 67.9, which increased to 68.8 after treatment and to 72.2 at follow-up: an increase of about 4 points. This is not unlike the medical care controls, who went from 64.9 to 68.7 in the same period.

    One of the aims of CBT (for CFS) has been said to be "increased confidence in exercise and physical activity"[5].

    Thus it may be the case that when asked questions about one's ability to do things, such as in the physical functioning subscale of SF-36 (one of the three primary outcome measures in the FINE Trial), the patients might say that they are "Limited A Little" or "Not Limited At All" but may be just as limited as patients in other arms of the study who say "Limited A Lot".

    The physical functioning subscale is the primary outcome measure that is also being used to measure "clinically significant improvement" ("An improvement of 50% or more on the SF-36 physical functioning scale, or a score of 75% or more on that scale, will be considered a clinically significant improvement"). It is not an objective instrument, particularly in a psychosocial trial.

    In my two previous comments, I have criticised the use of the bimodal Chalder Fatigue Scale as an outcome measure in a trial of patients with "CFS/ME". Recently another trial[6] was published involving CFS patients in the UK. The mean score was not given but the median mark was 11. That is to say, at least 50% of the people scored the maximum mark before the intervention. These people cannot "get worse" on the scale even if they feel worse.

    The third and final primary outcome measure being used is a quality of life measure. Although it may be useful to measure quality of life, the findings of a recent study[7] make for interesting reading. It used 73 patients, also with a diagnosis of CFS according to the Oxford criteria, from UK clinics. It used principal-component analysis to analyse the scores from various questionnaires. The principal-component analysis of all scale scores revealed 2 distinct components, explaining 53% of the total variance. The results are summarised in the following extract: "The perceived incapacity in fulfilling social and physical roles may be best captured by the subscales of the SF-36 on social and physical functioning. The scores on these subscales are associated with vitality and inversely with one of the defining symptoms of CFS, i.e. fatigue (Chalder Fatigue Scale, Fatigue Visual Analogue Scale). They are also associated with other physical symptoms (SDQ, SCL-90-R subscale ‘somatization’), but not with psychological symptoms such as depression (Beck Hopelessness Scale, SCL-90-R subscale ‘depression’) and anxiety (Spielberger Trait Anxiety Questionnaire, SCL-90-R subscale ‘anxiety’). These psychological symptoms are linked to a generic measure of quality of life (MANSA), reflecting satisfaction with life in general and life domains, and to emotional role functioning and mental health (SF-36, subscale)."

    Of course, the instrument used to measure quality of life in that study is different, so its relevance is unclear at this time. But like the other two outcome measures (Chalder Fatigue Scale and SF-36 PF), the EuroQol is subjective.

    It is disappointing that, apart from checking for the presence or absence of the CDC criteria, there appears to be no measurement of symptoms other than fatigue. (And of course I've already pointed out the problems with using the bimodal Chalder Fatigue Scale, e.g. it's hard for some patients to score "worse" scores using the scale). But most researchers do not think CFS = fatigue. Even if some patients have few other symptoms because the Oxford criteria are being used, this should not have mattered.

    There are numerous pre-existing instruments out there that measure other symptoms associated with CFS. Off the top of my head, two that come to mind are the Chronic Fatigue Syndrome Symptom List and the CFS CDC Symptom Inventory. The former encompasses the 19 most frequently reported symptoms in a sample of 1578 chronic fatigue syndrome patients[8]. In order to assess the severity of the symptoms included in the Chronic Fatigue Syndrome Symptom List, visual analogue scales (100 mm) are used. The Symptom Inventory "collects information about the presence, frequency, and intensity of 19 fatigue and illness-related symptoms" including the 8 CDC criteria symptoms. Perceived frequency of each symptom is rated on a four-point scale (1=a little of the time, 2=some of the time, 3=most of the time, 4=all of the time), and severity or intensity of symptoms is measured on a three-point scale (1=mild, 2=moderate, 3=severe). To summarize the degree of distress associated with each symptom, individual symptom scores are calculated by multiplying the frequency score by the intensity score. The scoring would not have to be done like this: for example, in the same paper I quoted from for the method of scoring above, the CDC team[9] used the following method: they "transformed the intensity scores into equidistant scores before multiplication (i.e., 0 = symptom not reported 1 = mild, 2.5 = moderate, 4 = severe) resulting in range 0–16 for each symptom." A total score for each person can be calculated by summing the 19 individual symptom scores (possible range from 0 to 304). A Case Definition score can be calculated as the sum of the 8 individual CFS case-definition symptom scores, and an Other Symptoms score by considering only the 11 symptoms outside the case definition.
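    The Symptom Inventory arithmetic described above can be sketched as follows. This is an illustration under stated assumptions, not the CDC team's code: the symptom labels are placeholders, the function names are hypothetical, and a symptom that is not reported is entered with frequency 0 and intensity 0.

```python
# Sketch: Symptom Inventory scoring with the equidistant intensity transform.
# Assumptions: frequency is coded 1-4 and intensity 1-3 as described in the
# text; an unreported symptom is entered as (0, 0). Labels are placeholders.

INTENSITY_EQUIDISTANT = {0: 0.0, 1: 1.0, 2: 2.5, 3: 4.0}


def symptom_score(frequency, intensity):
    """Frequency (0-4) times transformed intensity (0, 1, 2.5 or 4): range 0-16."""
    return frequency * INTENSITY_EQUIDISTANT[intensity]


def inventory_scores(responses, case_definition_symptoms):
    """Return the Total, Case Definition and Other Symptoms scores.

    responses maps a symptom label to a (frequency, intensity) pair.
    """
    per_symptom = {name: symptom_score(f, i) for name, (f, i) in responses.items()}
    total = sum(per_symptom.values())
    case_def = sum(score for name, score in per_symptom.items()
                   if name in case_definition_symptoms)
    return {"total": total, "case_definition": case_def, "other": total - case_def}


# Placeholder labels: 8 case-definition symptoms and 11 others.
case_definition = {f"cd_{k}" for k in range(1, 9)}
others = {f"other_{k}" for k in range(1, 12)}

# A patient reporting every symptom "all of the time" (4) and "severe" (3)
# reaches the ceiling of each score.
worst_case = {name: (4, 3) for name in case_definition | others}
print(inventory_scores(worst_case, case_definition))
```

    Unlike a bimodal fatigue total, a score built this way leaves room to record both worsening and improvement for most patients, since few would report every symptom as constant and severe at baseline.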

    Calculating levels of various symptoms like this would have given a better overall idea of the health of the patients and how badly affected they were by "CFS/ME".

    They could also have been used before and after the exercise testing. In most management strategies these days, whether they're based on a graded exercise/activity model or a pacing model, patients are discouraged from "boom and bust", i.e. doing too much or pushing themselves and then crashing with lots of symptoms. Faced with the exercise testing, a patient who is good at avoiding "booming and busting" may not push themselves as hard as another patient. That does not mean they are not as well or do not manage their illness as well as another patient. One way of checking whether this occurred would have been to use symptom measures before and after the exercise testing. Given the post-exertional nature of many of the symptoms of "CFS/ME", it can be good not to restrict testing just to the day of exercise testing.

    There are some examples in the literature of patients being followed up after exercise testing. For example, Nijs[10] had patients perform a gentle walking exercise in which they walked on average 558m (+/-340) (range: 120-1620) at a speed of 0.9m/s (+/-0.2) (range: 0.6-1.1). This resulted in a statistically significant (p<0.05) worsening of scores in the following areas when comparing pre-exercise, post-exercise and 24 hour post-exercise scores using ANOVA: VAS fatigue, VAS musculoskeletal pain, VAS sore throat, SF-36 bodily pain and SF-36 general health perception. 14 out of 24 subjects experienced a clinically meaningful change (worsening) in bodily pain (i.e. a change in the SF-36 bodily pain subscale score of at least 10).

    In another study, Lapp [11] reported on 31 patients from his practice who were asked to monitor their symptoms from three weeks before to 12 days after a maximal exercise test. 74% of the patients experienced worsening fatigue and 26% stayed the same. None improved. The average relapse lasted 8.82 days, although 22% were still in relapse when the study ended at 12 days. There were similar changes with exercise in lymph pain, depression, abdominal pain, sleep quality, joint and muscle pain and sore throat.

    Actometers could also have been used in the period before and after testing to see whether there was "booming and busting" and so to see whether the exercise testing alone is useful or not.

    I am unsure whether much of what I've written can be used at this stage for this study but it may be useful for people interpreting the results as well as for others designing further trials.

    * When quoting from the paper's text, I have changed the reference numbers of papers to ones I've used.

    1. Whiting P, Bagnall A-M, Sowden A, Cornell J, Mulrow C, Ramirez G: Interventions for the treatment and management of chronic fatigue syndrome. A systematic review. JAMA 2001, 286:1360-1368.

    2. Friedberg, F. Does graded activity increase activity? A case study of chronic fatigue syndrome. Journal of Behavior Therapy and Experimental Psychiatry, 2002, 33, 3-4, 203-215

    3. Prins JB, Bleijenberg G, Bazelmans E, et al. Cognitive behaviour therapy for chronic fatigue syndrome: a multicentre randomised controlled trial. Lancet 2001; 357: 841-47.

    4. Van Essen, M and de Winter, LJM. Cognitieve gedragstherapie bij het vermoeidheidssyndroom (cognitive behaviour therapy for chronic fatigue syndrome). Report from the College voor Zorgverzekeringen. Amstelveen: Holland. June 27th, 2002. Bijlage B. Table 2.

    5. O'Dowd, H., Gladwell, P., Rogers, CA., Hollinghurst, S and Gregory, A. Cognitive behavioural therapy in chronic fatigue syndrome: a randomised controlled trial of an outpatient group programme. Health Technology Assessment, 2006, 10, 37, 1-140.

    6. Roberts AD, Papadopoulos AS, Wessely S, Chalder T, Cleare AJ. Salivary cortisol output before and after cognitive behavioural therapy for chronic fatigue syndrome. J Affect Disord. 2008 Oct 18.

    7. Priebe S, Fakhoury WK, Henningsen P. Functional incapacity and physical and psychological symptoms: how they interconnect in chronic fatigue syndrome. Psychopathology. 2008;41(6):339-45.

    8. De Becker P, McGregor N, De Meirleir K. A definition-based analysis of symptoms in a large cohort of patients with chronic fatigue syndrome. J Intern Med 2001; 250: 234–40.

    9. Wagner D, Nisenbaum R, Heim C, Jones JF, Unger ER, Reeves WC. Psychometric properties of the CDC Symptom Inventory for assessment of chronic fatigue syndrome. Popul Health Metr. 2005 Jul 22;3:8.

    10. Nijs J, Almond F, De Becker P, Truijen S, Paul L. Can exercise limits prevent post-exertional malaise in chronic fatigue syndrome? An uncontrolled clinical trial. Clin Rehabil. 2008 May;22(5):426-35.

    11. Lapp, C (1997). Exercise limits in chronic fatigue syndrome. Am J Med, 103: 83-84.

    Competing interests

    No competing interests

  4. Comment on the discussion of the effectiveness of interventions for CFS

    Tom Kindlon, Irish ME/CFS Association - for Information, Support & Research

    11 February 2009

    The authors state that, in this study, "effect sizes and confidence intervals will be reported for each group", which is to be welcomed.

    As well as giving the protocol for the FINE Trial, this paper also gives information and data from some previous studies in the area including saying some treatments have been shown to be "effective". However, it does not report effect sizes.

    Readers of this paper may be interested to know about a recent meta-analysis of the efficacy of CBT for CFS[1]. The studies it covered involved a total of 1371 patients. The analysis involved calculating an effect size measure, Cohen's d.

    They calculated d using the following method:

    "Separate mean effect sizes were calculated for each category of outcome variable (e.g., fatigue self- rating) and for each type of outcome variable (mental, physical, and mixed mental and physical). Studies generally included multiple outcome measures. For all analyses except those that compared different categories or types of outcome variables, we used the mean effect size of all the relevant outcome variables of the study."

    d was calculated to be 0.48.

    For anyone unfamiliar with Cohen's d values, they are not bounded by 1; also, the higher the score, the bigger the "effect size" i.e. the more "effective" a treatment was found to be. Cohen's d values are considered to be a small effect size at 0.2, a moderate effect size at 0.5, and a large effect size at 0.8[2].
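    For readers who want to see how such a value arises, the sketch below uses the common pooled-standard-deviation form of Cohen's d for two groups. It is an assumption that this form is comparable to the per-outcome calculation in the meta-analysis, which is not described in detail in the quotation above; the trial figures are invented and the function names are hypothetical.

```python
# Sketch: Cohen's d for two independent groups using a pooled standard deviation.
# Assumption: this common textbook form is used for illustration only; the
# meta-analysis averaged effect sizes over several outcome measures per study.
import math


def cohens_d(mean_treat, sd_treat, n_treat, mean_ctrl, sd_ctrl, n_ctrl):
    """Difference in group means divided by the pooled standard deviation."""
    pooled_var = ((n_treat - 1) * sd_treat ** 2 + (n_ctrl - 1) * sd_ctrl ** 2) / (
        n_treat + n_ctrl - 2
    )
    return (mean_treat - mean_ctrl) / math.sqrt(pooled_var)


def benchmark_reached(d):
    """Highest of Cohen's conventional benchmarks (0.2, 0.5, 0.8) that |d| reaches."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "below small"


# Invented example: outcome scores where higher is better.
d = cohens_d(mean_treat=60, sd_treat=20, n_treat=30,
             mean_ctrl=50, sd_ctrl=20, n_ctrl=30)
print(round(d, 2), benchmark_reached(d))
print(0.48, benchmark_reached(0.48))
```

    Treating the benchmarks as hard cut-offs is itself a simplification: a value of 0.48 reaches the "small" benchmark and sits just under the "moderate" one.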

    CBT had a more general definition in this paper and included some papers on GET. For example, the current paper says that, "Graded exercise therapy has also been shown to be effective in randomised controlled trials with selected hospital patients[3] and in our own previous study with a more general sample of hospital patients[4]." Malouff et al[1] calculated the d value for these studies as 0.46 (95% CI: 0.03 to 0.95) and 0.17 (95% CI: -0.30 to +0.65) respectively (for the latter study, Malouff et al calculated the figures by comparing drug (i.e. antidepressant) treatment plus CBT to drug treatment without CBT).


    [1] Malouff, J. M., et al., Efficacy of cognitive behavioral therapy for chronic fatigue syndrome: A meta-analysis. Clinical Psychology Review (2007), doi:10.1016/j.cpr.2007.10.004

    [2] Cohen J: Statistical power analysis for the behavioural sciences. 2nd edition. New Jersey: Lawrence Erlbaum; 1988.

    [3] Fulcher KY, White PD: Randomised controlled trial of graded exercise in patients with the chronic fatigue syndrome. BMJ 1997, 314:1647-1652.

    [4] Wearden, A. J., Morriss, R. K., Mullis, R., Strickland, P. L., Pearson, D. J., Appleby, L., et al. (1998). Randomised, double-blind, placebo-controlled treatment trial of fluoxetine and graded exercise for chronic fatigue syndrome. British Journal of Psychiatry, 172, 485-490.

    Competing interests

    No Competing Interests