The present study provides an overview of the use of the NNT in medical research during the last decade. The adherence of selected studies to basic methodological recommendations was reviewed. This topic is particularly relevant given that the NNT concept has been extended to derive related metrics with potential for use in benefit-risk assessments, namely for clinical decision making or drug regulatory purposes. An example is provided by impact numbers, which give a population perspective to the NNT [45, 46]. Impact numbers are useful to describe the public health burden of a disease and the potential impact of a treatment [6]. Two measures of impact numbers are particularly interesting: the number of events prevented in the population (NEPP) and the population impact number of eliminating a risk factor over time \( t \) (PIN-ER-\( t \)) [6, 47, 48].
Clinicians and other investigators should be aware that the calculation and interpretation of NNTs depend on specific study characteristics, particularly the design and outcome variables. The use of inadequate calculating methods may lead to biased results and misleading conclusions [22, 29, 35, 49].
The majority of studies included in the present review aimed primarily at assessing the efficacy of medical interventions. The NNT was used more often to assess only benefits (51.9%) than to assess only harms (21.2%). This finding was expected, considering what is commonly seen in the medical literature. A previous systematic review including meta-analyses published over a 5-year period found that only 14% of studies were designed to investigate drug safety as the primary outcome [38]. Another study, comprising systematic reviews with absolute effect estimates, found that the NNT was mostly used to assess beneficial outcomes rather than harmful events [14].
Overall, included studies reported more frequently results for binary outcomes than for time-to-event outcomes. This finding contrasts with the results of a previous review in which nearly 55% of included studies reported NNTs for time-to-event outcomes [12]. However, that review included only RCTs [12], while the present study included several research designs.
Relative measures of effect were used to express treatment differences in the majority of included studies (82.4%). These findings are in line with the conclusions of a recent survey of 202 systematic reviews [14]. Of those, the majority included meta-analyses with estimation of relative effects (92.1%), while absolute effect estimates were provided in 36.1% [14].
As previously mentioned, the concept of NNT requires the description of a defined period of time and varies with baseline risk (also called CER). Nevertheless, the time horizon was lacking in more than one fourth (25.5%) of studies. The NNT is uninterpretable if the time of follow-up during which cumulative outcome incidences are measured is not provided [34]. In addition, baseline risks could not be ascertained in nearly 28% of studies. Previous findings indicate that 56.2% of studies reporting absolute risks do not present the source of baseline risk estimates [14]. Lastly, more than one third (37.3%) of studies included in the present review did not report the CI for the point-estimate NNT. This result is in line with previous findings [12]. Thus, a moderately high proportion of papers published in journals with high impact factor in the category of “General and/or Internal Medicine” misuse the NNT metric.
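To make the point about confidence intervals concrete, the following is a minimal sketch of one common way to attach a CI to an NNT: compute a Wald interval for the absolute risk reduction (ARR) and take the reciprocals of its limits. The event counts are hypothetical and chosen only for illustration, the function names are not taken from any reviewed study, and the Wald interval is an assumption of this sketch rather than the method used in the studies discussed here.

```python
import math


def arr_with_wald_ci(events_control, n_control, events_treated, n_treated, z=1.96):
    """Absolute risk reduction (control minus treated) with a Wald CI."""
    p_c = events_control / n_control
    p_t = events_treated / n_treated
    arr = p_c - p_t
    se = math.sqrt(p_c * (1 - p_c) / n_control + p_t * (1 - p_t) / n_treated)
    return arr, arr - z * se, arr + z * se


def nnt_with_ci(arr, arr_low, arr_high):
    """Invert the ARR and its CI limits to obtain the NNT and its CI."""
    if arr_low <= 0 <= arr_high:
        # When the ARR interval includes zero, the NNT interval passes
        # through infinity and cannot be reported as a simple range.
        raise ValueError("ARR interval includes zero: NNT CI is not a simple range")
    # Taking reciprocals reverses the order of the limits.
    return 1 / arr, 1 / arr_high, 1 / arr_low


# Hypothetical counts: 150/1000 events in controls, 100/1000 in treated patients.
arr, arr_low, arr_high = arr_with_wald_ci(150, 1000, 100, 1000)
nnt, nnt_low, nnt_high = nnt_with_ci(arr, arr_low, arr_high)
print(f"ARR = {arr:.3f} (95% CI {arr_low:.3f} to {arr_high:.3f})")
print(f"NNT = {nnt:.1f} (95% CI {nnt_low:.1f} to {nnt_high:.1f})")
```

With these hypothetical counts the point estimate is an NNT of 20, with limits of roughly 13 and 47. Reporting the point estimate alone would hide that uncertainty, and the time horizon over which the events were counted would still need to be stated.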
As seen across the articles reviewed here, several approaches have been used to derive NNTs from meta-analyses. However, in 13 out of 23 meta-analyses (56.5%) the approach was considered inadequate. Of these meta-analyses, one calculated the reciprocal of simple proportions (using total numbers of both patients with outcome and exposed patients coming from all included studies). Using simple proportions, i.e., treating the data as if they all come from a single trial, to calculate NNTs is not correct, as this method is prone to bias due to Simpson’s paradox [35, 50]. The other 12 meta-analyses inverted pooled RDs, but this method should also be avoided [19, 31, 36, 51]. Absolute RDs are usually not constant and homogeneous across different baseline event rates; therefore, they are rarely appropriate for calculating NNTs from meta-analyses [19, 31, 36, 51]. Moreover, the effects of secular trends on disease risk and time horizon preclude the use of pooled RDs, as they can result in misleading NNTs [36, 51]. Relative effect measures (such as RR and OR) are usually more stable across risk groups than are absolute differences. Thus, pooled estimates of relative effect measures should be used rather than absolute RDs to derive NNTs from meta-analyses [19, 31, 36]. Clinicians should preferably use fixed effects OR, random effects OR or RR, and the patient expected event rate (PEER) to individualize NNT when applying results from meta-analyses in clinical practice [4, 19].
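The recommendation to combine a pooled relative effect with the patient expected event rate (PEER) can be sketched as follows. This is an illustrative example under stated assumptions: the pooled RR of 0.80 and the PEER values are hypothetical, and the function names are introduced here for illustration only. The treated risk is derived from the RR or OR, and the NNT is the reciprocal of the difference between PEER and that risk.

```python
def treated_risk_from_rr(peer, rr):
    """Expected risk under treatment, given baseline risk (PEER) and a pooled RR."""
    return peer * rr


def treated_risk_from_or(peer, odds_ratio):
    """Expected risk under treatment, given baseline risk (PEER) and a pooled OR."""
    treated_odds = odds_ratio * peer / (1 - peer)
    return treated_odds / (1 + treated_odds)


def nnt_from_risks(peer, treated_risk):
    """NNT as the reciprocal of the absolute risk difference."""
    risk_difference = peer - treated_risk
    if risk_difference == 0:
        raise ValueError("no risk difference: NNT is undefined")
    return 1 / risk_difference


# Hypothetical pooled RR applied to patients with different baseline risks.
pooled_rr = 0.80
for peer in (0.05, 0.10, 0.30):
    nnt = nnt_from_risks(peer, treated_risk_from_rr(peer, pooled_rr))
    print(f"PEER = {peer:.2f}: NNT = {nnt:.1f}")

# The same idea with a hypothetical pooled OR of 0.50 and a PEER of 0.20.
nnt_or = nnt_from_risks(0.20, treated_risk_from_or(0.20, 0.50))
print(f"OR-based NNT = {nnt_or:.2f}")
```

The same relative effect yields an NNT of about 100 at a baseline risk of 5% and about 17 at 30%, which illustrates why a single NNT obtained by inverting a pooled RD can mislead when baseline risks differ across the included studies.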
Most RCTs (94.1%) followed basic methodological recommendations to calculate NNTs. It is noteworthy that the majority of included RCTs (13 out of 17) analyzed binary outcomes. Studies with fixed times of follow-up are usually not prone to miscalculation of NNT because cumulative incidences equal simple proportions at the study end [29]. However, previous studies suggested that NNTs are miscalculated in at least half of RCTs with time-to-event outcomes [12, 29].
In the present review, one out of four RCTs with varying follow-up times applied a non-recommended method to calculate NNT [52]. In that RCT, the effect of two doses of atorvastatin (80 mg or 10 mg daily) was tested, for the first occurrence of a major cardiovascular event (i.e., a time-to-event outcome), in patients with coronary artery disease (CAD) and type 2 diabetes, with and without chronic kidney disease [52]. Patients were followed for varying times (median, 4.8 years). Although Kaplan-Meier curves were estimated, the authors used simple proportions of patients with the outcome to compute NNT (e.g., for patients with diabetes without CAD, 1/([62/441] – [57/444]) = 82) and concluded that 82 patients needed to be treated with 80 mg/day versus 10 mg/day to prevent one major cardiovascular event over 4.8 years [52]. Using the cumulative incidences provided in Kaplan-Meier curves (12.5% for 80 mg and 13.3% for 10 mg), NNT would have been estimated at 125 over the same time horizon. This example illustrates how the use of simple proportions can lead to misleading values of NNT. Simple proportions should be used only if all patients are followed for the entire study period, because only then do they equal the cumulative incidences estimated by the Kaplan-Meier approach [30]. Since follow-up times usually vary in RCTs, simple proportions are generally not valid estimates of cumulative incidences. In cases where follow-up is short and mostly complete, simple proportions and Kaplan-Meier incidences are nearly identical [30].
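The arithmetic of this example can be reproduced with a short sketch. The first part uses only the figures quoted above. The second part is a toy illustration with made-up follow-up data (not from [52]) showing why a simple proportion can differ from a Kaplan-Meier cumulative incidence when some patients are censored early. The helper name km_cumulative_incidence is hypothetical.

```python
# Part 1: figures quoted in the text for the example from [52].
# Simple proportions, in the order they appear in the quoted formula.
nnt_simple = 1 / (62 / 441 - 57 / 444)
# Kaplan-Meier cumulative incidences: 13.3% (10 mg) and 12.5% (80 mg).
nnt_km = 1 / (0.133 - 0.125)
print(f"NNT from simple proportions: {round(nnt_simple)}")
print(f"NNT from Kaplan-Meier incidences: {round(nnt_km)}")


# Part 2: toy data showing how censoring separates the two estimates.
def km_cumulative_incidence(records, horizon):
    """Kaplan-Meier cumulative incidence at `horizon`.

    `records` is a list of (time, event) pairs; event is True if the
    outcome occurred at `time` and False if the patient was censored.
    """
    survival = 1.0
    event_times = sorted({t for t, event in records if event and t <= horizon})
    for t in event_times:
        at_risk = sum(1 for time, _ in records if time >= t)
        events = sum(1 for time, event in records if event and time == t)
        survival *= 1 - events / at_risk
    return 1 - survival


# Hypothetical arm of 10 patients followed for up to 5 years:
# 2 events, 5 patients censored early, 3 event-free at 5 years.
toy_arm = [(1, True), (4, True),
           (2, False), (2, False), (2, False),
           (3, False), (3, False),
           (5, False), (5, False), (5, False)]
simple_proportion = sum(1 for _, event in toy_arm if event) / len(toy_arm)
km_incidence = km_cumulative_incidence(toy_arm, horizon=5)
print(f"Simple proportion: {simple_proportion:.3f}")
print(f"Kaplan-Meier cumulative incidence at 5 years: {km_incidence:.3f}")
```

In the toy arm the simple proportion is 0.200, whereas the Kaplan-Meier estimate is 0.325, because patients censored early no longer count in the risk set when the later event occurs. Differences of this kind between arms carry through to the risk difference and therefore to the NNT.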
As the present study assessed results from research published since 2006, two different methodologies were considered adequate for calculating NNT from RCTs where the outcome is time to an event [26, 53, 54]. More recently, however, the authors of a study comparing the risk difference approach (reciprocal of risk differences estimated by survival time methods) and the incidence difference approach (reciprocal of incidence rate differences) concluded that methods based on incidence rates often lead to misleading NNT estimates and recommended the use of survival time methods to estimate NNTs in RCTs with time-to-event outcomes [28]. The incidence difference approach can still be used in the case of small baseline risks, strong treatment effects, and exponentially distributed survival times [28]. Nevertheless, Girerd et al. argued that the two methods measure different things but are both valid and provide complementary information regarding the absolute effect of an intervention, highlighting that the incidence rate approach assesses person-years rather than persons [55]. The incidence rate approach estimates the number of person-times (e.g., patient-years), not the absolute number of persons, needed to observe one fewer (or one more) event in the treatment group than in the control group [28, 29, 54, 55, 56]. This estimate is different from the “classical” person-based NNT and may therefore be difficult to interpret [56]. For example, 100 patient-years do not necessarily mean 100 individual patients treated over 1 year (or 50 patients treated for 2 years). A thorough explanation of person-based NNT, person-time-based NNT, and event-based NNT (for multiple recurrent outcome events) is provided elsewhere [29, 57].
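The difference between the two approaches can be illustrated with a small numerical sketch. It assumes exponentially distributed survival times with constant hazards, so that the cumulative incidence at time t equals 1 - exp(-rate × t). The incidence rates (0.10 and 0.08 events per patient-year) and the 5-year horizon are hypothetical values chosen for illustration, not data from any of the cited studies.

```python
import math

# Hypothetical constant incidence rates (events per patient-year).
rate_control = 0.10
rate_treated = 0.08
horizon_years = 5


def cumulative_incidence(rate, years):
    """Cumulative incidence under an exponential survival model (an assumption)."""
    return 1 - math.exp(-rate * years)


# Incidence difference approach: reciprocal of the rate difference,
# expressed in patient-years rather than patients.
nnt_person_time = 1 / (rate_control - rate_treated)

# Risk difference approach: reciprocal of the difference in cumulative
# incidences at the chosen time horizon, expressed in patients.
risk_difference = (cumulative_incidence(rate_control, horizon_years)
                   - cumulative_incidence(rate_treated, horizon_years))
nnt_person = 1 / risk_difference

# Naive conversion of patient-years into patients treated for 5 years.
naive_patients = nnt_person_time / horizon_years

print(f"Person-time NNT: {nnt_person_time:.1f} patient-years")
print(f"Person-based NNT over {horizon_years} years: {nnt_person:.1f} patients")
print(f"Naive conversion (patient-years / {horizon_years}): {naive_patients:.1f} patients")
```

Under these assumptions, about 50 patient-years correspond to one event prevented, yet dividing by 5 years gives 10 patients, whereas the person-based NNT over 5 years is about 15.7. Patient-years therefore cannot simply be read as patients treated for a given number of years.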
With regard to observational studies, one cohort study did not follow methodological recommendations [58]. In that study, Kaplan-Meier curves and Cox proportional HRs for time to event, adjusted for confounding factors, with pioglitazone as reference, were used to test the effect of rosiglitazone on several cardiovascular adverse events [58]. However, the authors applied unadjusted incidence rate differences to calculate NNTs, instead of using adjusted data. For example, at 1 year of follow-up, the NNT for a composite cardiovascular endpoint would be 92 from Kaplan-Meier curves rather than the 60 person-years obtained by the authors. Further, the authors interpreted person-years as the number of persons treated over 1 year, which is not equivalent. A detailed review and discussion of methods used to calculate NNTs from observational studies is provided elsewhere [21, 22, 23].
The present study was not primarily aimed at identifying all papers with methodological recommendations for calculating NNTs. For this reason, a systematic literature review was not performed to identify such papers. This is a potential limitation of the study. Nevertheless, the literature used as the source of evidence was probably adequate for the complexity of the assessment. The study focused on the adherence of calculation methods to basic methodological recommendations rather than on more complex methodological and statistical issues. Therefore, estimates of NNT reported by studies that followed basic methodological recommendations are not necessarily correct. Other sources of bias may exist that could not be assessed with acceptable effort. In addition, the magnitude of error produced in studies that did not follow basic methodological recommendations to calculate NNTs was not tested. Aside from some examples provided in the discussion, the calculation of correct NNTs was not sought for studies that did not follow recommendations. Lastly, the study was limited to the top 25 high-impact-factor journals in the “General and/or Internal Medicine” category. Whether studies in other fields would show similar results deserves further testing.
The present results illustrate that these metrics have not always been adequately calculated. From the clinicians’ point of view, this may cause some concerns, since these metrics can be used to support clinical decision-making processes, including the prescription of medicines. Therefore, clinicians need to rely on the methodological appropriateness of such calculations.