Our analysis of the diagnostic accuracy of individual signs and symptoms shows that every sign and symptom included has only a modest ability to discriminate patients with GABHS pharyngitis from those without it (+LR range 1.45 to 2.20; −LR range 0.53 to 0.71). Therefore no sign or symptom on its own is powerful enough to rule in or rule out a diagnosis of GABHS pharyngitis. Fever and 'any exudates' have higher specificity than sensitivity and are more useful for ruling in a diagnosis of GABHS pharyngitis when present. Absence of cough and tender anterior cervical adenopathy have higher sensitivity than specificity and are more useful for ruling out GABHS pharyngitis when the finding is not present. On the basis of our analysis, it could be argued that the signs and symptoms in the Centor score could be weighted differently depending on whether the physician aims to rule in or rule out a diagnosis of GABHS pharyngitis. However, it is highly unlikely that the benefit would outweigh the cost of complicating such a simple score.
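The link between a finding's sensitivity/specificity balance and its usefulness for ruling in or ruling out can be made concrete with the standard likelihood ratio formulas. The sketch below is illustrative only: the function name is hypothetical and the two example findings use made-up values, not the pooled estimates in Table 2.

```python
from typing import Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Return (+LR, -LR) for a single dichotomous sign or symptom.

    +LR = sensitivity / (1 - specificity): factor by which a positive finding raises the odds.
    -LR = (1 - sensitivity) / specificity: factor by which a negative finding lowers the odds.
    """
    # Strict bounds avoid division by zero when specificity is exactly 0 or 1.
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# Hypothetical findings (illustrative values, not taken from Table 2).
specific_sign = likelihood_ratios(sensitivity=0.4, specificity=0.8)   # +LR about 2.0, -LR about 0.75
sensitive_sign = likelihood_ratios(sensitivity=0.8, specificity=0.4)  # +LR about 1.33, -LR about 0.5
```

The more specific finding has the larger +LR (better for ruling in when present), while the more sensitive finding has the smaller −LR (better for ruling out when absent).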
In terms of diagnostic accuracy, our analysis of the Centor score as a decision aid for antibiotic prescribing suggests that although the score is reasonably specific when ≥ 3 signs or symptoms are present (0.82) and very specific when 4 are present (0.95), the post-test probability of GABHS pharyngitis remains relatively low (for example, with a prevalence of 15% and a score of ≥ 3, the post-test probability is 32%; Table 4). Therefore, although the Centor score can enhance appropriate antibiotic prescribing, it should be used with caution: treating every patient who presents with a sore throat and a score of ≥ 3 may lead to many patients receiving antibiotics inappropriately (Table 4).
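Why a specific score can still yield a low post-test probability follows from Bayes' theorem in odds form: post-test odds = pre-test odds × LR. The minimal sketch below assumes that form; the function names are hypothetical, and the usage applies it to the pooled sensitivity and specificity for a cut-off of ≥ 2 reported in this review (79% and 55%) at an assumed prevalence of 15%. It is our illustrative arithmetic, not a value from Table 4.

```python
def post_test_probability(pre_test_prob: float, likelihood_ratio: float) -> float:
    """Bayes' theorem in odds form: post-test odds = pre-test odds x LR."""
    if not 0.0 < pre_test_prob < 1.0:
        raise ValueError("pre-test probability must lie strictly between 0 and 1")
    pre_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


def positive_post_test_probability(prevalence: float, sensitivity: float,
                                   specificity: float) -> float:
    """Probability of GABHS pharyngitis given a positive score (positive predictive value)."""
    lr_pos = sensitivity / (1.0 - specificity)
    return post_test_probability(prevalence, lr_pos)


# Pooled cut-off >= 2 values from this review, at an assumed prevalence of 15%.
p = positive_post_test_probability(prevalence=0.15, sensitivity=0.79, specificity=0.55)
print(f"{p:.2f}")  # 0.24
```

Even with a positive result, roughly three in four patients at this cut-off and prevalence would not have GABHS pharyngitis, which is the reasoning behind the caution above.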
In terms of calibration, the Centor score shows consistent agreement between observed and predicted values across all risk strata in different populations (Figure 4). This indicates that the score is well calibrated and suggests that the rule is generalisable across settings and countries.
Findings in the context of other studies
The findings of this systematic review on the diagnostic accuracy of signs and symptoms are consistent with a previous review of GABHS pharyngitis, which concluded that no sign or symptom on its own is powerful enough to rule in or rule out the diagnosis. Not all studies found the same signs or symptoms to have similar predictive value. For example, Lindbaek et al. and Llor et al. found that, of the four Centor criteria, only cervical adenitis and absence of cough were significantly more frequent in patients with GABHS pharyngitis than in those with negative cultures [33, 39], whereas Meland et al. found that tonsillar exudate had no predictive ability. Our meta-analysis shows that each of the individual signs and symptoms that make up the Centor score has modest discriminatory power, with 'any exudates' being the strongest (Table 2).
To the best of our knowledge, this is the first diagnostic test accuracy review of the Centor score. Wigton et al. reported that a cut-off of ≥ 2 signs or symptoms in their patient cohort produced a sensitivity of 86% and a specificity of 42%, similar to our pooled results (79% and 55%, respectively). The most appropriate cut-off for antibiotic treatment when using the Centor score depends on the clinician's aim. Adults in Western societies rarely develop complications such as rheumatic fever, so clinicians may prefer a high specificity, which would lower antibiotic prescription rates at the cost of missed cases of GABHS pharyngitis. By contrast, a clinician in a developing country with a high rate of rheumatic fever and no access to other diagnostic tests may consider high sensitivity more important.
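The sensitivity/specificity trade-off across cut-offs can be seen by dichotomising the score at "≥ k" for each k. In the sketch below the function name and the per-score patient counts are hypothetical and serve only to illustrate the mechanics; they are not data from the included studies.

```python
from typing import Dict, List, Tuple


def sens_spec_by_cutoff(pos_counts: List[int],
                        neg_counts: List[int]) -> Dict[int, Tuple[float, float]]:
    """Sensitivity and specificity when the score is dichotomised at '>= k'.

    pos_counts[s] / neg_counts[s]: number of culture-positive / culture-negative
    patients with score s (s = 0..4 for the Centor score).
    """
    n_pos, n_neg = sum(pos_counts), sum(neg_counts)
    results = {}
    for k in range(1, len(pos_counts)):
        true_pos = sum(pos_counts[k:])  # culture-positive patients scoring >= k
        true_neg = sum(neg_counts[:k])  # culture-negative patients scoring < k
        results[k] = (true_pos / n_pos, true_neg / n_neg)
    return results


# Hypothetical counts for scores 0..4 (illustrative only).
positives = [5, 15, 30, 30, 20]
negatives = [100, 120, 100, 60, 20]
for k, (sens, spec) in sens_spec_by_cutoff(positives, negatives).items():
    print(f">= {k}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

With these made-up counts, sensitivity falls from 0.95 to 0.20 and specificity rises from 0.25 to 0.95 as the cut-off moves from ≥ 1 to ≥ 4, so choosing a cut-off amounts to choosing which kind of error to tolerate.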
Strengths and weaknesses
The strengths of this study include the incorporation of additional data supplied by study authors and the pooling of validation studies of the Centor score, which allows a formal quantitative validation of the score.
We acknowledge that our review has several limitations. There is moderate heterogeneity in the Centor score calibration analysis (I² = 11% to 49%). Heterogeneity between studies could be due to a variety of factors: chance; a threshold effect caused by observer variation in the measurement of signs and symptoms; variation in the pre-test probability of GABHS pharyngitis; or other unanticipated factors. The prevalence of GABHS pharyngitis varied widely between studies (Additional file 1). We addressed study prevalence as a source of heterogeneity in our calibration analysis.
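For readers less familiar with I², the sketch below computes it from Cochran's Q using inverse-variance weights, following the usual Higgins–Thompson definition I² = max(0, (Q − df)/Q). This section does not state exactly which per-study estimate entered the calibration analysis, so the function name and inputs are assumptions for illustration.

```python
from typing import Sequence


def i_squared(estimates: Sequence[float], variances: Sequence[float]) -> float:
    """Higgins' I^2 (as a percentage) from study estimates and within-study variances."""
    if len(estimates) != len(variances) or len(estimates) < 2:
        raise ValueError("need at least two studies with matching variances")
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))  # Cochran's Q
    df = len(estimates) - 1
    if q <= 0.0:
        return 0.0  # identical estimates: no heterogeneity
    # Share of total variation beyond what chance (df) would explain, floored at 0.
    return max(0.0, (q - df) / q) * 100.0
```

For example, three hypothetical studies with estimates 0, 2 and 4 and unit variances give Q = 8 on 2 degrees of freedom, so I² = 75%.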
Although we used a systematic search strategy, we acknowledge that it was not exhaustive and that we may have missed relevant articles. In particular, the use of search filters in systematic reviews is debated and not always recommended.
The use of throat culture as the reference standard for diagnosing GABHS pharyngitis is open to some debate. To date, throat culture is still considered by most to be the reference standard of choice for diagnosing GABHS pharyngitis [3, 8]. Newly developed rapid antigen detection tests (RADTs) can be used in ambulatory care settings, with results available within minutes [51, 52]. However, neither throat cultures nor RADTs can distinguish between active infection and carriage, which can lead to inappropriate antibiotic prescribing for carriers [10, 53]. In addition, many argue that the lower sensitivity and lack of cost-effectiveness of RADTs in primary care will limit their use, and that signs and symptoms will always be valuable [54, 55].
The method used to pool the individual Centor score studies (calibration analysis) is based on the comparative approach that Bont et al. used to validate the CRB-65 CPR in a single validation study. Our method extends this approach, using the absolute risk from the derivation study as a model to generate predicted values in subsequent validation studies. The absolute risk is presented by CPR risk stratum so that the clinical value of the CPR across these strata can be assessed. Our method is further supported by an exploratory analysis (unpublished results) comparing it with a validated, published method for comparing predicted and observed values. No statistically significant difference was found between the events predicted by the two methods (P > 0.05). A limitation of this method is that it compares the proportions of patients predicted and observed to have GABHS pharyngitis; without patient-level data, it is not possible to determine whether the patients predicted to be positive by the Centor score are the same patients who are positive on throat swab.
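The core step of the calibration analysis described above, applying derivation-study risks to each stratum of a validation study and comparing predicted with observed cases, can be sketched as follows. The class and function names are hypothetical, and both the derivation risks and the validation counts are placeholders, not the values from the original Centor derivation study or from any included study.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Stratum:
    score: int           # Centor score defining this risk stratum
    n_patients: int      # validation-study patients with this score
    observed_cases: int  # of those, number culture-positive for GABHS


def observed_vs_predicted(strata: List[Stratum],
                          derivation_risk: Dict[int, float]) -> List[Tuple[int, float, int, float]]:
    """Apply derivation-study risks stratum by stratum; return (score, predicted, observed, O:P)."""
    rows = []
    for s in strata:
        predicted = s.n_patients * derivation_risk[s.score]  # expected cases under the model
        ratio = s.observed_cases / predicted if predicted > 0 else float("nan")
        rows.append((s.score, predicted, s.observed_cases, ratio))
    return rows


# Placeholder derivation risks and validation counts (illustrative only).
risk = {0: 0.02, 1: 0.05, 2: 0.10, 3: 0.25, 4: 0.50}
study = [Stratum(0, 100, 3), Stratum(1, 150, 6), Stratum(2, 120, 15),
         Stratum(3, 80, 20), Stratum(4, 40, 22)]
for score, pred, obs, ratio in observed_vs_predicted(study, risk):
    print(f"score {score}: predicted {pred:.1f}, observed {obs}, O:P {ratio:.2f}")
```

An O:P ratio near 1 in every stratum indicates good calibration. Because only stratum totals are compared, the sketch shares the limitation noted above: it cannot tell whether the predicted and observed positives are the same patients.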
Implications for practice
Our meta-analysis of the Centor score suggests that it transfers well to other populations and can help clinicians make informed decisions (Table 4 and Figure 4). However, the relatively low post-test probability of GABHS pharyngitis, even in areas of high prevalence (Table 4), suggests that clinicians should use the score with caution as a decision aid for antibiotic prescribing. Some studies have shown that the use of scores can improve antibiotic prescribing, while others have found them to be no better than clinician judgement.
A barrier to introducing CPRs such as the Centor score into practice is that clinicians often fail to apply them [59, 60]. One community-based study, which used repeated clinical prompts for the modified Centor score to try to influence physicians' antibiotic prescribing for sore throat, found no significant change in physician behaviour. However, the authors had difficulty retaining community-based physicians for the duration of the study and believe their results may have been biased by these losses.
The formal incorporation of CPRs into practice can be facilitated by computer-based clinical decision support systems (CDSSs) that quantify diagnostic and prognostic information to provide physicians with patient-specific recommendations; such systems have been shown to reduce antibiotic prescribing for respiratory tract infections in children in primary care settings [61, 62].
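At its simplest, a CDSS embedding the Centor score needs the rule's scoring logic. The sketch below implements the original four-item score (one point each for tonsillar exudate, tender anterior cervical adenopathy, fever and absence of cough). The function and parameter names are hypothetical. The modified Centor score mentioned above also adjusts for age, which this sketch omits, and mapping the score to a treatment recommendation is deliberately left out.

```python
def centor_score(tonsillar_exudate: bool,
                 tender_anterior_cervical_nodes: bool,
                 fever: bool,
                 cough_present: bool) -> int:
    """Original (unmodified) Centor score, 0 to 4: one point per criterion met."""
    criteria = [
        tonsillar_exudate,
        tender_anterior_cervical_nodes,
        fever,
        not cough_present,  # the criterion is the *absence* of cough
    ]
    return sum(1 for met in criteria if met)


# Exudate and tender nodes, no fever, no cough: three criteria met.
print(centor_score(True, True, False, False))  # 3
```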