Pain relief that matters to patients: systematic review of empirical studies assessing the minimum clinically important difference in acute pain

Background The minimum clinically important difference (MCID) is used to interpret the clinical relevance of results reported by trials and meta-analyses as well as to plan sample sizes in new studies. However, there is a lack of consensus about the size of MCID in acute pain, which is a core symptom affecting patients across many clinical conditions. Methods We identified and systematically reviewed empirical studies of MCID in acute pain. We searched PubMed, EMBASE and Cochrane Library, and included prospective studies determining MCID using a patient-reported anchor and a one-dimensional pain scale (e.g. 100 mm visual analogue scale). We summarised results and explored reasons for heterogeneity applying meta-regression, subgroup analyses and individual patient data meta-analyses. Results We included 37 studies (8479 patients). Thirty-five studies used a mean change approach, i.e. MCID was assessed as the mean difference in pain score among patients who reported a minimum degree of improvement, while seven studies used a threshold approach, i.e. MCID was assessed as the threshold in pain reduction associated with the best accuracy (sensitivity and specificity) for identifying improved patients. Meta-analyses found considerable heterogeneity between studies (absolute MCID: I2 = 93%, relative MCID: I2 = 75%) and results were therefore presented qualitatively, while analyses focused on exploring reasons for heterogeneity. The reported absolute MCID values ranged widely from 8 to 40 mm (standardised to a 100 mm scale) and the relative MCID values from 13% to 85%. From analyses of individual patient data (seven studies, 918 patients), we found baseline pain strongly associated with absolute, but not relative, MCID as patients with higher baseline pain needed larger pain reduction to perceive relief. Subgroup analyses showed that the definition of improved patients (one or several categories improvement or meaningful change) and the design of studies (single or multiple measurements) also influenced MCID values. Conclusions The MCID in acute pain varied greatly between studies and was influenced by baseline pain, definitions of improved patients and study design. MCID is context-specific and potentially misguiding if determined, applied or interpreted inappropriately. Explicit and conscientious reflections on the choice of a reference value are required when using MCID to classify research results as clinically important or trivial.


Background
It can be challenging to decide whether a modest effect in a randomised clinical trial, or a meta-analysis of several such trials, is clinically relevant. Statistical tests inform on the probability of a result being a chance finding; however, they convey no information on whether a given effect will be experienced as important by patients. The degree of pain reduction that is considered clinically relevant has an impact on which analgesic interventions are regarded clinically useful. This interpretation problem for clinical relevance has been at the core of debates of the importance of several types of interventions intended for reducing acute pain, for example, paracetamol [1][2][3], non-steroidal anti-inflammatory drugs [4,5], morphine or synthetic opiates [6], corticosteroids [7], muscle relaxants [4], laser therapy [8], transcranial direct current stimulation [9], EMLA cream [10], and acupuncture [11]. A related challenge involves the calculation of sample sizes for clinical trials, where researchers need to know the smallest clinically important effect that the trial should not miss to be able to determine an adequate sample size.
Jaeschke et al. [12] characterised the concept of minimum clinically relevant difference in 1989 as "the smallest difference in score in the domain of interest which participants perceive as beneficial and which would mandate, in the absence of troublesome side effects and costs, a change in the patient's management". The strength of the concept is that it defines a relevant effect size based on clinical considerations and not only statistical significance [13,14]. It has subsequently been supplemented by a related conceptthe substantial (and not only minimum) clinically relevant difference [15].
The minimum clinically important difference (MCID) is sometimes chosen on the basis of expert consensus judgement [16], statistical models [17] or objective criteria [18]. However, in acute pain, it seems reasonable to anchor clinical relevance to the patients' experience. This approach is in accordance with the increasing awareness of the relevance of patient-reported outcomes in clinical research [19]. Several such empirical studies have been conducted to determine the MCID in acute pain, but they differ with regard to methodology, clinical condition and findings, and have not yet been systematically reviewed. Since acute pain is a core symptom in healthcare, an assessment of the MCID and a clarification of the causes for its variation will have broad interest. It has been suggested that baseline pain influences absolute values of MCID, but study reports have been conflicting [20,21], and it remains unclear which other clinical or methodological factors are of importance.
We therefore decided to systematically review empirical studies of the MCID in acute pain relief and to examine possible causes for variation between study results, especially their likely dependency on baseline pain levels. We also reviewed studies of the substantial clinically important difference in acute pain relief as well as clinically important differences for worsening of pain.

Eligibility criteria
We included prospective studies of patients with acute pain, regardless of age, clinical condition, and intervention, in which pain intensity was assessed on a onedimensional scale, e.g. a 100 mm visual analogue scale (VAS) or a 0-10 point numerical rating scale (NRS), and in which the MCID was determined using an anchorbased method using patients' perception of change to determine clinical importance. Pain was considered acute when its duration was less than 1 month or, if duration was not indicated, when it was described as such in a study report.
Studies were excluded if they were not clinical (i.e. used healthy volunteers) or determined the MCID from objective criteria (e.g. return to work), the distribution of data (e.g. the minimum detectable difference) or expert consensus.
A typical eligible study would ask patients to score their pain intensity, e.g. using a VAS, at baseline and follow-up. At follow-up, patients were also asked to categorise their change in pain intensity using response options such as 'no change' , 'a little better'/'somewhat better' , and 'a lot better'/'much better'. The MCID was then determined from the change in scores on the pain scale among patients having categorised their change as 'a little better' (or a similar expression indicating a minimum clinically important improvement).
We included studies with two types of analytical approaches (1) the 'mean change approach' , i.e. the mean difference in pain scores among patients who reported a minimum degree of pain relief [22]; or (2) the 'threshold approach' , i.e. the threshold value for pain score change which most accurately (yielding best sensitivity and specificity) identified patients experiencing relevant pain relief in analogy with a diagnostic test where the gold standard is patients' perception of change [23].

Search strategy
We searched PubMed, EMBASE and Cochrane Library until August 2016 with no language restrictions. The core search string was: (minimal OR minimally OR minimum OR 'clinically significant' OR 'clinically important' OR 'clinically meaningful' OR 'clinically relevant') AND (difference OR change OR relief OR reduction) AND ('pain measurement*' OR 'visual analog scale' OR 'numeric rating scale') AND (pain) with variations according to the specific database (Appendix 1). The reference lists of all included studies and relevant review papers were read systematically to identify further studies.
Screening of titles and abstracts to determine the eligibility of studies was done by the primary author (MFO), while the selected full-text records were examined by two researchers independently (MFO and either EB, NEL, BT, or MDH). Any disagreement was solved by discussion.

Data extraction and retrieval
Data extraction was conducted by two researchers independently (MFO and EB, BT, or NEL) using pretested data extraction forms generated in EpiData (EpiData Association, Odense, Denmark). Any disagreements were solved by discussion.
For each study, we extracted descriptive data including publication year, study design, setting, clinical condition, type of intervention, sampling method, sample size, and definition of patients with relevant change (see Appendix 2 for complete list). For studies using a mean change approach, we extracted the following outcome data: the MCID for pain relief (absolute values in mm or points and relative value in percent change from baseline) and for pain worsening (absolute and relative values), as well as the substantial clinically important difference for relief and worsening of pain (absolute and relative values). We extracted MCIDs as the mean change in pain score among patients who indicated a one-category improvement (e.g. 'a little better'). If unavailable, we extracted the mean change among patients who were minimally improved by authors' definition (e.g. some authors defined minimum important change as the mean change in pain score among patients with either a one-or two-category improvement). Similarly, we extracted the substantial clinically important differences as the mean change among patients with a two-category improvement or used the authors' definition. We extracted the point estimate of outcomes with their corresponding standard error or, if unavailable, other measures of variation such as standard deviation or 95% confidence interval.
For studies using a threshold approach, we extracted information about definition of responders (i.e. patients with relevant change) and non-responders and the cut-off point with its corresponding sensitivity (i.e. percentage of responders correctly classified as such) and specificity (i.e. percentage of non-responders correctly classified as such). If studies reported pain scores from several concurrent pain assessments (e.g. back pain and leg pain), we extracted the assessment where more data were available or, if no difference found, we randomly selected which to extract. All scales were standardised to a 0-100 mm scale. When studies reported pain assessments based on both VAS and NRS, we used the assessment based on VAS.
If the primary outcome or other key variables were unclear or incompletely reported from a study, we contacted the corresponding author. In cases where authors provided individual patient data, we first checked whether we could replicate a main result of the published paper. We then calculated estimates of absolute and relative MCIDs and their corresponding standard error.
For each study, we assessed risk of attrition bias (studies were considered low risk when attrition < 10%) and risk of non-representative study sample (studies were considered low risk if using consecutive or random sampling).

Data synthesis and analysis
For each study, we extracted or calculated the MCID for pain relief (absolute and relative change), and noted results of any study-based exploration of causes for variation, e.g. baseline pain.
We then summarised results qualitatively as there was considerable clinical and methodological variation between studies and heterogeneity in their results. To provide an overview, we first reported the range of results for all studies and then the range and median results with interquartile ranges (IQR) of studies according to analytical strategy (mean change or threshold approach). To facilitate exploration of reasons for heterogeneity, we then pooled results of studies using mean change approach with inverse-variance meta-analysis using random effects models. We studied the association with baseline pain scores in three different analyses. First, we explored the impact of the average population baseline pain in a mixed-effects meta-regression (acknowledging the limitations of aggregated data-level analysis [24]). Second, we pooled individual patient data to estimate the MCID (absolute and relative changes) using a twostage individual participant data meta-analysis. In these models, MCIDs were first estimated in each individual study using a mixed model based on all repeated measurements and participant specific random effects to capture the serial correlation within each participant. Results were then pooled using random-effects inversevariance meta-analysis. Third, we used the same model to derive outcomes measured at different time points in studies with multiple measurements per patients (i.e. the use of a 'moving baseline').
We furthermore used subgroup analyses to explore whether between-study variation was explained by differences in other clinical and methodological factors including clinical condition, type of pain scale (VAS vs. NRS), directionality of global transition scale (one-sided vs. two-sided scale), definition of minimum clinically important change (one category vs. several categories improvement vs. distinction between meaningful and non-meaningful change), number of pain assessments per patients (single vs. multiple assessments adjusted for correlation vs. multiple unadjusted assessments), risk of attrition bias (low vs. high or unclear) and risk of nonrepresentative sample (low vs. high or unclear).
Finally, we used the same analysis strategy for the substantial clinically important difference for pain relief (absolute and relative change) and the minimum and substantial clinically important differences for worsening of pain (absolute and relative change).
All data analyses were done using Stata/IC version 13.

Characteristics of included studies
The majority of studies were based in emergency departments and included a mix of patients with acute pain of both traumatic and non-traumatic origin (10 studies) or unspecified pain (7 studies). Additional studies included patients with post-operative pain, cancer-related pain, sickle cell crisis, rheumatic pain, abdominal pain, low back pain, or headache (Tables 1 and 2). All studies were published in English. Twenty studies assessed pain using a 100 mm VAS (or the similar Color Analog Scale), and 12 studies used an 11-point NRS (0-10), while five studies used both scales. In 32 studies, patients compared their current pain intensity with pain at their previous assessment, while they were asked to assess the effect of their treatment in five studies [25,49,58]. Transition scales were either two-sided (29 studies) with 3-15 response categories for both improvement and deterioration, or one-sided (8 studies) with five response categories addressing only improvement.   (12)(13)(14)(15)(16)(17)(18)(19)(20) 34 (27)(28)(29)(30)(31)(32)(33)(34)(35)(36)(37)(38)(39)(40)(41) Kelly [44] Kelly [45,46]   For studies using the mean change approach, the majority defined the minimum clinically important improvement as a one-category improvement on the transition scale (31 studies). The response categories were similar with wordings such as 'a little less pain' , 'a bit better' , 'slightly improved' , or 'slight relief'. In four studies, the MCID was defined as the mean change in pain score among patients with a one-or two-category improvement, thereby combing patients answering 'some relief' and 'partial relief ' [55], or 'much improved' and 'best ever' [49][50][51]. Finally, two studies differentiated between non-important and important change, using the categories 'inadequate relief ' and 'moderate relief ' in one study [40], and 'poor' and 'less good' efficiency of treatment in another [25] (Table 1). Studies using the threshold approach had large variation in transition scales and definitions of responders versus non-responders; patients were regarded as importantly improved if they indicated a one-category relief in two studies [41,57], while they needed a five-category improvement in another [58] ( Table 2).
Pain intensity was assessed at baseline and a single follow-up measurement in 14 studies, while it was assessed at multiple (from 2 to 16) follow-ups with intervals between 10 and 45 minutes in 23 studies. The latter group then derived their outcome as the summarised mean difference in pain score from the patients' previous pain assessment when they reported minimum relief (i.e. a series of 'moving baselines'). In eight of these studies, the P values of analyses were adjusted for correlation between estimates, for example, with Generalised Estimation Equation, while the remaining studies either made no adjustment or did not report this. Access to individual patient data increased the number of studies with adjusted estimates to 11. In 10 studies, the MCID was defined as a numeric change for all patients with minimum change, regardless of whether pain had improved or worsened. After contacting authors, separate estimates for pain relief were available from eight of these.

MCID regardless of analytical approach
Standardised to a 100-mm scale, the absolute MCID in 30 studies ranged from 8 to 40 mm, and the relative difference in 15 studies ranged from 13% to 85%.

MCIDs in studies using the mean change approach
The determination of the MCID was based on a mean change approach in 35 studies, of which 30 (6598 patients) were included in our analyses and five were disregarded (see below). Twenty-nine studies (6517 patients) reported absolute values ranging from 8 to 40 mm, with a median of 17 mm (IQR 14-23 mm) (Fig. 2a). Only nine of the 30 studies reported relative MCIDs, but access to individual patient data made relative values available from 14 studies (1617 patients) ranging from 13% to 85%, with a median of 23% (IQR 18-36%) (Fig. 2b). For data syntheses, we did not include results from five of the 35 studies (1567 patients) as they did not differentiate between pain relief and pain worsening [21,48], because median (and not mean) differences in pain were reported [26,33], or because outcomes were reported for subgroups and no overall estimate could be derived [36]. The range of MCID in these studies was comparable to the included studies: 10-19 mm. An additional six studies (493 patients) were not included in meta-analysis since information about standard error of estimates was unavailable [40,49,53,54]. The results from these studies ranged from 11 to 40 mm.
We had data usable for meta-analyses from 23 studies (6024 patients) reporting absolute values and 11 studies (1397 patients) reporting relative values. Meta-analyses of both the absolute and relative values showed considerable heterogeneity: I 2 = 93%, P < 0.001 and I 2 = 75%, P < 0.001 (Table 3). We present the meta-analyses for completeness and as a basis for exploring reasons for the heterogeneity, but stress that medians and interquartile ranges are more appropriate descriptors of the results.

MCID in studies using the threshold approach
Seven of the 37 included studies (2602 patients) determined clinically important differences as a threshold to differentiate between patients with or without relevant pain relief. Absolute thresholds ranged from 10 to 35 mm in six studies (2331 patients) with a median of 10 mm, and the relative threshold ranged from 15% to 50% in four studies (534 patients) ( Table 3). In one additional study [58], patients were defined as responders if they indicated at least a five-category improvement. The corresponding clinically important differences were thus higher (34 to 63 mm depending on baseline pain) than in studies where patients only needed a one- [41,57], two- [49], or three-category improvement [25], respectively, to be defined as responders (Table 2). Total number of patients in the included studies c I 2 is the percentage of the variability in results that is due to heterogeneity rather than sampling error (chance); I 2 of 0% to 40% might not be important, 30% to 60% may represent moderate heterogeneity, 50% to 90% may represent substantial heterogeneity, and 75% to 100% represents considerable heterogeneity

Impact of baseline pain scores on the MCID
Eleven studies had assessed the possible influence of baseline pain on minimum clinically improvement (Appendix 3). Of nine studies assessing absolute change, seven reported an association [31,33,36,43,58,59]. The two remaining studies found no association, but these were disregarded since they determined the MCID without differentiating between pain relief and pain worsening [21,45]. Six studies assessed the association between baseline pain and relative change and either found the association to be non-significant or found it to be weaker than for absolute change.
Based on meta-regression, we found no association between baseline pain and either absolute (20 studies, P = 0.70) nor relative (9 studies, P = 0.83) estimates of MCIDs.
From individual patient data, we also found that the MCID decreased with increasing time from baseline, from 17 (12 to 21) mm at 30 minutes to 11 (8 to 14) mm at 120 minutes. However, patients' pain level declined accordingly during the multiple follow-ups and estimates expressed as a relative change from the previous assessment therefore did not decline.
The MCIDs of subgroups are presented as medians and pooled averages, respectively (Table 4). For most clinical and methodological factors, the number of studies in each subgroup was too small to ensure detection of relevant differences between them. Nevertheless, although only few studies defined the MCID as the mean pain reduction among patients with several categories of improvement, or patients with 'meaningful' (and not just 'minimum') change, it was clear that these studies found higher MCIDs (medians 25 (IQR [23][24][25][26][27][28][29] and [34][35][36][37][38][39][40], respectively) than studies where it was  IQR 13-21)). It was also clear that the MCID was higher when based on a single assessment involving one fixed baseline value (median 25 (IQR 23-29)), than when summarised from multiple assessments with the previous assessment applied as a 'moving baseline' (medians 15 (IQR 13-16) and 16 (IQR 10-21), respectively). Subgroups of studies among patients with various clinical conditions were underpowered to detect relevant differences. The comparison of one-and two-sided transition scales was also underpowered, but the difference in scales did not seem to affect study results, while the comparison of VAS and NRS included enough studies to conclude that type of pain scale did not influence the MCID. Finally, we did not find differences in outcomes relating to the risk of attrition bias or risk of non-representative samples.

Supplementary outcomes
The supplementary outcomes for pain relief and worsening were only reported from studies using the mean change The median is based on studies included in the pooled average as well as studies with unavailable standard errors b Total number of patients in studies c I 2 is the percentage of the variability in results that is due to heterogeneity rather than sampling error (chance); I2 of 0% to 40% might not be important, 30% to 60% may represent moderate heterogeneity, 50% to 90% may represent substantial heterogeneity, and 75% to 100% represents considerable heterogeneity d Includes knee surgery, laparotomy, third molar extraction and mixed surgery (e.g. spinal fusion and splenectomy) e Includes headache, low back pain, sickle cell crisis, rheumatic condition, and pain flare after external beam radiotherapy for bone metastases approach ( Table 3). The results showed similarly high heterogeneity. The substantial clinically important difference for pain relief ranged from 18 to 54 mm (23 studies), while minimum and substantial clinically important differences for worsening of pain ranged from 8 to 21 mm increase (18 studies) and from 0 to 66 mm increase (16 studies), respectively.

Discussion
We included 37 studies (8479 patients) assessing the MCID in acute pain, of which 35 used the mean change approach and seven used the threshold approach. Meta-analyses found considerable heterogeneity between studies and, consequently, no single value of minimum clinical important difference could be meaningfully determined. Study results ranged widely both when they were reported as absolute change (from 8 to 40 mm) and as relative change from baseline (from 13 to 85%). The median of study results based on the mean change approach was 17 (IQR 14 to 23) mm and 23 (IQR 18 to 36) % for absolute and relative values, respectively. Reasons for heterogeneity were explored and baseline pain was identified as a cause of variation in absolute, but not relative, outcomes. In addition, the definition of minimum clinically important change and the use of multiple assessments per patient influenced study results. High heterogeneity was also found for assessments of substantial clinically important difference as well as for worsening of pain.

Strengths and limitations
To our knowledge, this is the first systematic review of MCIDs in acute pain. We identified 37 studies involving over 8000 patients and a broad range of clinical conditions, study approaches and pain scales. We gained access to unpublished data from 10 studies, including individual patient data from seven studies (918 patients). This ensured high data quality and uniform analysis, and enabled an adequate assessment of the association with baseline pain avoiding the risk of ecological fallacy [24] inherent to aggregated studylevel data. The median results of studies providing individual patient data was comparable with the remaining studies and we have no reason to believe these studies were not representative. Association with baseline pain has been reported from individual studies [31,33,36,43,[57][58][59], but the present review is the first comprehensive assessment of the impact of baseline pain across studies. In addition, we identified variation in study designs (single or multiple assessments) and definitions of patients with minimum relief as factors influencing the MCID. However, we were not able to fully explain the large heterogeneity among studies. We found no effects of pain scale, but for the comparison of clinical conditions and directionality of transition scale, subgroups involved too few studies to ensure detection of all relevant associations. In addition, our ability to assess clinical condition was limited by the fact that many studies included a mixed patient group and we did not have access to individual patient diagnoses. Similarly, the studies included a variety of analgesia and other treatments which did not allow an assessment of potential impact of interventions. Regarding the risk of attrition bias and non-representative sampling, the majority of studies were categorised as unclear and potential impact could therefore not be assessed. Most importantly, acknowledging the association with baseline pain, it would have been more accurate to base subgroup analyses on relative outcomes, but the data available only allowed comparison of absolute outcomes. We could not assess impact of various description of pain (e.g. "intensity") or the follow-up time between measurements since there was not enough variation between studies. Furthermore, the available data did not allow an assessment of the potential influence of pre-existing pain level (e.g. if patients are affected by chronic pain in addition to their current episode of acute pain), pre-existing use of pain relief or the psychological state of patients, since this was not reported by any of the studies. Finally, we cannot dismiss the risk of recall bias in studies where patients simultaneously assess their pain status and perceived change [60]. Differences in baseline pain or other methodological or clinical factors may influence subgroup analyses of study-level data. Thus, better access to individual patient data would greatly improve the chances of identifying causes of heterogeneity.

Other studies
Only few systematic reviews of the minimum clinically relevant change have been published despite a vast number of primary studies. Stauffer [20] and Erdogan [61] reviewed studies of minimum clinically relevant change in pain scales used for chronic rheumatologic conditions, but we have not identified any systematic reviews focusing on acute pain.
The problem of variability in results of studies of minimum clinically relevant change has previously been addressed primarily when attempting to reconcile dissimilar results from different approaches, e.g. anchor-based and distribution-based studies [62]. Our study demonstrates considerable unexplained variation also within anchor-based approaches. In line with our findings, Terwee [63] found variability between results of five studies of minimum clinically important change on the Western Ontario and McMaster University pain subscale for osteoarthritis. In a systematic review of the minimum clinical important difference in chronic pain, we have found similar issues of high study variability (manuscript in preparation).

Mechanisms and perspectives
We included studies with a patient-reported anchor. While some find that using a patient-reported criterion as an anchor for a patient-reported score is circular and basically flawed [64], we would argue that pain intensity is essentially a subjective experience best expressed by and anchored to those experiencing it. Other observerbased anchors may be used when the outcome of interest is return to work or daily level of activity [65]. The varying content of the patient-reported anchors is, however, problematic. The applied transition scales were either one-or two-sided, allowing patients to report their degree of change (or only relief) by choosing between anywhere from three to 15 response categories. The majority of studies then determined the MCID as a mean change in pain score among everyone reporting a one-category pain relief. However, this value does not apply to all individuals in the group, since their differences in pain are distributed around the mean [14]. In contrast, MCIDs expressed as threshold values are derived with the intention of obtaining the best possible discrimination between patients with and without relevant relief. The frequency of false-positive and false-negative results may be reduced but is not eliminated by this approach. Therefore, caution is always merited when bringing an overall estimate of important change to the level of interpretation for an individual patient [66,67].
The studies we included varied considerably both in methods and analytical approaches. As would be expected, differences in the definition of patients with minimum important change impacted study results. We also found that the use of multiple measurements per patient resulted in lower outcomes. This corresponded with the finding that outcomes decreased during several follow-ups as patients' pain declined over time. Furthermore, one in four of the reviewed studies did not differentiate between minimum relief and minimum worsening of pain in their original study reports. The practice of combining groups with minimum change, regardless of its direction, is sometimes based on a seemingly similar distribution of data in the two groups [44]. However, although they may be similar at one point, the MCID for pain relief and worsening will change in opposite directions with variations of baseline pain (since patients with higher baseline pain require larger pain reduction to perceive relief, but smaller increase to perceive worsening of their condition).
The association between MCID and baseline pain may to some extent be explained by 'regression towards the mean' as patients are likely to score closer to the mean if their initial scores were more extreme because of chance [68]. However, it is also very plausible that patients with higher pain require greater decrease to perceive relief. Relative changes are therefore more stable indicators of clinically important differences, although they lack interval scale properties at the scale extremes, e.g. when baseline values are close to zero and small degrees of pain change result in very large relative changes [69]. From this review, however, it is clear that the advantage of relative values is largely overlooked, since only 10 of 37 studies (27%) reported relative change.
This review included studies that determined MCID from an anchor-based method using patients' perception of change to determine clinical importance. Although this is the most common approach, it is just one among a wide range of alternative methods. Revicki noted that retrospective self-reports of pain relief tend to correlate more strongly with the endlevel of pain than the start level, implying that the current status matters to patients more than the degree of improvement [70]. This has led to the development of the concept of 'patient acceptable symptom state' , defined as the level of symptoms which patients feel is acceptable [71,72]. Patient acceptable symptom state corresponds with the dominant aim of clinical patient care to reduce pain to an acceptable level [73] and could be a strong candidate for an alternative to MCID. Other promising approaches have integrated intervention costs and side effects [74][75][76][77].
It is likely that the challenges of MCID, apparent for acute pain, may not be isolated to that specific research area. Acute pain stands out because of the many studies that have been conducted, reflecting the status of acute pain as a core symptom in clinical practice. Our study can thus be seen as a model for a more general challenge with empirical assessments of the MCID.
The methodological challenges embedded in the empirical assessment of MCID are of such an extent that merits caution for its use and interpretation. It is clearly inappropriate to use and interpret MCID as a kind of clinical scale constanta characteristic which, once empirically determined, is universally valid. This is, however, often the practice seen [78]. Nevertheless, there is a strong and reasonable demand for a structured approach for evaluating whether effects of interventions are clinically meaningful to patients.

Implications
The choice of a reference value has large consequences for the number of patients needed in a trial, e.g. four times as many patients will be included, if researchers accept a MCID value of 12 mm as compared to 24 mm. Further, the conclusion about the clinical relevance of a trial result is often based on whether a mean difference exceeds a chosen reference value, but with the large span of MCIDs available in the literature, it is highly problematic to randomly pick one or a few single assessments for guidance. The considerable variation means that it is necessary to conscientiously and explicitly reflect on the span of results in relation to context-specific clinical and methodological factors, as presented in this review, with a particular focus of the baseline pain of patients, whether repeated measurements were used, and how minimum relief was defined. A starting point for such an exercise by individual clinicians or researchers, or by consensus building committees, could very well be our overview of studies and their results.
In future studies there is a clear need for uniform guidelines for standardised conduct, analyses and reporting of the MCID, especially for how transition scales and questions are structured and how data are analysed. We strongly encourage using values relative to baseline painalso for multiple measurements where the patient's last assessment should be applied as a 'moving baseline' , standardising the definition of relevant pain relief, and distinguishing clearly between improvement and worsening of pain. In addition, since the influence of clinical and methodological factors is difficult to identify from aggregated data, we encourage improved access to individual patient data to enable further exploration of the causes of heterogeneity.

Conclusion
The MCID in acute pain varied greatly between studies. Absolute MCID ranged from 8 to 40 mm in 29 studies, and relative values ranged from 13% to 85% in 14 studies. Baseline pain was strongly associated with absolute, but not relative, values and variation in definitions of minimum relief and study designs influenced study results. Due to the heterogeneity between study results, no meaningful overall value of minimum clinically important change can be concluded. Instead, we recommend that MCIDs are considered context-specific and take account of baseline pain. The MCID in acute pain is central for the interpretation of results of randomised trials and meta-analyses and for determining appropriate sample sizes for new trials, but it is potentially misguiding if determined, applied or interpreted inappropriately. Explicit and conscientious reflections on the choice of a MCID value are required, when using it to classify research results as clinically important or trivial.

Data extraction
Descriptive data For each study we extracted the following data: Publication year, study design, setting, type of intervention, sample size, sex, age and clinical condition of patients, pain level at baseline, type of pain scale (VAS, NRS or other) and scale dimensions, type of anchor including wording of the transition question, concept (e.g. perception of pain relief, overall status or treatment effect), directionality (one-or twosided), number of discriminating subcategories, wording of items used to obtain minimal clinically important difference, method of data collection including follow-up time, number of follow-up measurements per participant, whether analysis was adjusted for dependency between measurements and applied method, sampling method (e.g. consecutive, random or convenience), attrition rate, and information about any subgroup analysis that had been undertaken in the studies.
Outcome data For studies assessing clinically important differences using the mean difference approach, we extracted the following outcomes as mean change scores with corresponding standard error or, if unavailable, other measures of variation such as standard deviation or 95% confidence interval: The minimal clinically important difference for pain relief (mean absolute and relative change) The substantial clinically important difference for pain relief (mean absolute and relative change) The minimal clinically important difference for pain worsening (mean absolute and relative change) The substantial clinically important difference for pain relief (mean absolute and relative change) For studies assessing clinically important differences using the threshold approach, we extracted information about the definition of responders (patients with relevant pain relief ) and non-responders and the cut-off point that differentiated between them with its corresponding sensitivity and specificity. Two additional studies are not included, since they did not differentiate between pain relief and worsening when assessing the relationship between baseline pain and MCID (Kelly [46] and Lopez [21]) b Patients answering 'some' or 'partial' relief c Data is 75th percentile among patients answering 'a great deal better' (13) or 'a good deal better' (14) on a 15 category scale MCID minimum clinically important difference, NR not reported, CI confidence interval Appendix 3