Reproducibility of clinical research in critical care: a scoping review
BMC Medicine volume 16, Article number: 26 (2018)
The ability to reproduce experiments is a defining principle of science. Reproducibility of clinical research has received relatively little scientific attention. However, it is important as it may inform clinical practice, research agendas, and the design of future studies.
We used scoping review methods to examine reproducibility within a cohort of randomized trials examining clinical critical care research and published in the top general medical and critical care journals. To identify relevant clinical practices, we searched the New England Journal of Medicine, The Lancet, and JAMA for randomized trials published up to April 2016. To identify a comprehensive set of studies for these practices, included articles informed secondary searches within other high-impact medical and specialty journals. We included late-phase randomized controlled trials examining therapeutic clinical practices in adults admitted to general medical-surgical or specialty intensive care units (ICUs). Included articles were classified using a reproducibility framework. An original study was the first to evaluate a clinical practice. A reproduction attempt re-evaluated that practice in a new set of participants.
Overall, 158 practices were examined in 275 included articles. A reproduction attempt was identified for 66 practices (42%, 95% CI 33–50%). Original studies reported larger effects than reproduction attempts (primary endpoint, risk difference 16.0%, 95% CI 11.6–20.5% vs. 8.4%, 95% CI 6.0–10.8%, P = 0.003). More than half of clinical practices with a reproduction attempt demonstrated effects that were inconsistent with the original study (56%, 95% CI 42–68%), among which a large number were reported to be efficacious in the original study and to lack efficacy in the reproduction attempt (34%, 95% CI 19–52%). Two practices reported to be efficacious in the original study were found to be harmful in the reproduction attempt.
A minority of critical care practices with research published in high-profile journals were evaluated for reproducibility; less than half had reproducible effects.
Owing to harms associated with early acceptance of scientific claims that are subsequently not reproducible, the reproducibility of science has garnered attention from high-profile journals [2,3,4,5,6] and mainstream media [7,8,9]. Most research pertaining to scientific reproducibility is concentrated within the biomedical sciences, and suggests that 10–25% of the findings from biomedical research are reproducible [5, 6, 10]. Reproducibility within clinical research has received less scientific attention, despite being equally important, as it may inform clinical practice, research agendas, and the design of future studies.
In biomedical research, it is common to evaluate an experiment’s ‘methodological reproducibility’ through repeating previously performed experiments using exactly the same methods, data, and tools as the original experiment. Assessing methodological reproducibility requires accurate reporting of methods in the original study, and an experimental population that can be easily accessed or recreated. Clinical research is typically evaluated for results or inferential reproducibility, wherein ‘results reproducibility’ refers to corroborating the results of an original study by repeating the original methods in a new set of participants and ‘inferential reproducibility’ refers to the ability of independent analyses to draw the same conclusions from a given dataset. Clinical studies examining results reproducibility of an original study may be further described as a retest (direct) or an approximate (conceptual) reproduction attempt [12, 13]. A retest reproduction attempt repeats exactly the methodology of the original study in another group of participants, whereas an approximate reproduction attempt may deviate slightly from the methodology employed in the original study [12, 13].
Most studies that have examined reproducibility within clinical research assessed results reproducibility. Estimates from these studies suggest that less than half of reproduction attempts report results that are consistent with the original study [14,15,16,17,18]. However, most of these studies did not employ systematic review methodology, and/or employed definitions of reproducibility that are difficult to reliably operationalize [14,15,16,17,18]. We used scoping review methodology to systematically examine results reproducibility (inclusive of both retest and approximate subtypes) of clinical research. Scoping reviews are a type of knowledge synthesis designed to provide a broad perspective of the literature, set research agendas, and provide high-level information for decision-makers [19,20,21], and represent an ideal means of systematically studying reproducibility. Similar to a recent study examining reproducibility in psychological science, and for reasons of feasibility, we focused our study on one test clinical discipline, namely adult critical care medicine.
We used two phases of electronic database searching to identify the target cohort of articles. To identify clinical practices relevant to a broad audience of critical care providers, and which were the subject of potentially high-profile research, our primary search involved randomized controlled trials (RCTs) examining the efficacy, effectiveness, or safety of therapeutic clinical practices among adults admitted to intensive care units (ICUs) published in the three medical journals with the highest impact factors, namely the New England Journal of Medicine, The Lancet, and JAMA. To identify a comprehensive set of studies for the clinical practices identified in the primary search, we conducted a secondary search for articles examining these practices published in other high-profile general medical or critical care specialty journals (Annals of Internal Medicine, BMJ, American Journal of Respiratory and Critical Care Medicine, Chest, Critical Care Medicine, Intensive Care Medicine, and Critical Care). Results from the two sets of searches established the target ‘cohort’ of articles that was subsequently analyzed within a framework to describe reproducibility of experimental clinical research (Table 1). Our methods are outlined in a detailed, published protocol and depicted within Additional file 1: Figure S1. The published protocol indicates intention to include systematic reviews, systematic reviews with meta-analyses, and studies examining the clinical effects of diagnostic interventions within the target cohort of articles; however, at the request of the reviewers, these studies were removed from this manuscript.
For the primary search, studies were retained if (1) study design was a late-phase RCT, (2) the study population included adults (mean age ≥ 18 years) admitted to general medical-surgical or specialty ICUs, and (3) the effect of a therapeutic clinical practice was reported. Late-phase RCTs were phase III or IV studies that examined the efficacy, effectiveness, or safety of a given therapy. Studies were excluded if (1) study participants were primarily admitted to coronary care units, (2) the clinical practice was provided exclusively in the pre-hospital setting, or (3) the study examined diagnostic accuracy or outcomes associated with the use of a diagnostic intervention. For the secondary searches, studies were retained if they fit the primary search eligibility criteria AND represented an ‘original study’ OR a ‘reproduction attempt’ of a study identified in the primary search (Table 1).
Search strategy and data sources
For the primary search, we used MEDLINE, the Cochrane Central Register of Controlled Trials, and the American College of Physicians (ACP) Journal Club to search for relevant articles published in the three highest-impact medical journals from database inception (1946) to April 4, 2016. The MEDLINE search (available in Additional file 1: Online Appendix) was peer-reviewed by an experienced librarian using the Peer Review of Electronic Search Strategies (PRESS) checklist.
For secondary searches, the PubMed ‘related articles’ feature was used to conduct targeted searches for articles related to those included from the primary search, published in the other aforementioned general medical and critical care journals (Additional file 1: Figure S1). Additional sources of articles included bibliographies of included articles, and international clinical trial registries [30, 31].
A screening form was independently calibrated by three team members with a random sample of 50 articles. Once consistent selection was achieved (κ ≥ 0.8), a two-stage process was used to independently and in duplicate screen all articles identified by the searches. First, titles and abstracts were reviewed to determine whether the studies met inclusion or exclusion criteria. Second, the full text of any study classified as ‘include’ or ‘unclear’ after title and abstract review was assessed to determine whether it met inclusion criteria. Eligibility disagreements were resolved by consensus or arbitration by an additional reviewer. Agreement was quantified with the κ statistic.
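As an illustration of the agreement threshold described above, the following is a minimal sketch of Cohen's κ for two reviewers. It assumes pairwise (two-rater) agreement; the article does not state which κ variant was used for the three-member calibration, and the function name and screening decisions below are hypothetical, not study data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical title/abstract screening decisions for 10 articles.
reviewer_1 = ["include", "exclude", "exclude", "include", "unclear",
              "exclude", "exclude", "include", "exclude", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "include", "exclude",
              "exclude", "exclude", "include", "exclude", "exclude"]

kappa = cohen_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.2f}")
```

Because κ corrects for chance agreement, a single disagreement in ten decisions already pulls the statistic noticeably below the raw 90% agreement, which is why a calibration sample larger than a handful of articles is useful.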
Data extraction and analysis
Data were extracted independently and in duplicate using a predesigned electronic form, which was pilot tested with a random sample of 10 articles. Once data were consistently abstracted (κ ≥ 0.8), reviewers proceeded with data extraction for the full set of included articles. Extracted data were related to the study itself, the study participants, the practice under investigation, and the primary outcome.
Included articles were analyzed using a framework to describe reproducibility of experimental clinical research (Table 1). The framework was developed using approaches outlined in previous research [4, 12, 14,15,16]. First, included articles were categorized according to the unique clinical practice they examined (e.g., therapeutic hypothermia for anoxic brain injury). Second, data for a study’s primary outcome and any secondary safety outcomes were used to classify the effect of each unique practice reported in each article as efficacy, lack of efficacy, or harm. Where there was a significant positive effect reported for the primary outcome, and a significant negative effect reported for a safety outcome, practice classification was based on the relative importance of each outcome. For example, if survival was improved, but there was an increased incidence of adverse drug reaction, the practice was classified as having efficacy. Third, within each unique clinical practice, relevant articles were classified as an ‘original study’ or a ‘reproduction attempt’. An original study was chronologically the first experimental study to examine the effects of a clinical practice. A reproduction attempt was any subsequent article that (intentionally or unintentionally) endeavored to re-examine the results of the original by repeating the methodology in another group of participants. To be considered a reproduction attempt, a study’s sample size had to be at least 90% of that of the original RCT. Finally, using the effect reported for each practice, original studies and reproduction attempts were further classified according to whether they demonstrated ‘consistent effect estimates’ (e.g., efficacy in original study and reproduction attempt) or ‘inconsistent effect estimates’ (e.g., efficacy in original study and lack of efficacy in reproduction attempt).
Practices with ‘consistent effect estimates’ denoted those with reproducible results, whereas practices with ‘inconsistent effect estimates’ denoted those with non-reproducible results.
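The classification steps above can be sketched in code. This is an illustrative simplification under stated assumptions, not the authors' implementation: the `Study` record, the `classify_practice` function, and the use of publication year alone to pick the original study are all hypothetical, and the judgment involved in weighing efficacy against safety outcomes is taken as already done (each study arrives with a single effect label).

```python
from dataclasses import dataclass
from typing import List

EFFECTS = {"efficacy", "lack of efficacy", "harm"}


@dataclass
class Study:
    year: int          # publication year (assumed proxy for chronology)
    sample_size: int   # number of randomized participants
    effect: str        # one of EFFECTS, assigned during data extraction


def classify_practice(studies: List[Study], min_size_ratio: float = 0.9) -> str:
    """Label one clinical practice from all studies that examined it."""
    if not studies:
        raise ValueError("at least one study is required")
    for s in studies:
        if s.effect not in EFFECTS:
            raise ValueError(f"unknown effect: {s.effect}")
    # The chronologically first study is treated as the original study.
    ordered = sorted(studies, key=lambda s: s.year)
    original, later = ordered[0], ordered[1:]
    # Later studies count as reproduction attempts only if their sample
    # size is at least 90% of the original's (the criterion in the text).
    attempts = [s for s in later
                if s.sample_size >= min_size_ratio * original.sample_size]
    if not attempts:
        return "no reproduction attempt"
    # Consistent only if every attempt reports the same effect label.
    if all(s.effect == original.effect for s in attempts):
        return "consistent"
    return "inconsistent"


# Hypothetical practice: efficacy first, lack of efficacy in a larger trial.
example = [Study(2001, 100, "efficacy"), Study(2009, 600, "lack of efficacy")]
print(classify_practice(example))
```

Treating any disagreement among attempts as 'inconsistent' mirrors how the article groups practices with multiple different effect estimates among those with inconsistent effects.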
Normally distributed data were reported as mean and 95% confidence interval (CI). Skewed data were transformed using logarithms and reported as geometric mean and 95% CI. Nominal data were summarized using counts with percentages, or percentages with 95% CI where appropriate. Statistical comparisons between original studies and reproduction attempts were performed using mixed effects logistic regression with clustering at the level of the individual clinical practice. For all other comparisons, Fisher’s exact test, χ2, or Student’s t test were used, as appropriate. All analyses were conducted using Stata version 14.2 (Stata Corp, College Station, TX, USA) and statistical significance was set at P < 0.05.
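The log-transform approach for skewed data can be sketched as follows. This is a minimal illustration, assuming a normal-approximation interval computed on the log scale and back-transformed; Stata would typically use a t quantile, which gives a wider interval in small samples. The function name and the input values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def geometric_mean_ci(values, level=0.95):
    """Geometric mean with a back-transformed confidence interval."""
    if len(values) < 2 or any(v <= 0 for v in values):
        raise ValueError("need at least two strictly positive values")
    logs = [math.log(v) for v in values]
    centre = mean(logs)
    std_err = stdev(logs) / math.sqrt(len(logs))
    # Normal quantile (assumption); swap in a t quantile for small samples.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (math.exp(centre),
            math.exp(centre - z * std_err),
            math.exp(centre + z * std_err))


# Hypothetical times (years) from original study to first reproduction attempt.
gm, lower, upper = geometric_mean_ci([2.0, 3.5, 4.0, 6.0, 12.0])
print(f"geometric mean {gm:.1f} (95% CI {lower:.1f}-{upper:.1f})")
```

Because the interval is symmetric on the log scale, it is asymmetric around the geometric mean after back-transformation, as with the time-to-reproduction estimate reported in the results.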
From 2636 unique articles, 275 relevant articles were identified that reported on 158 unique clinical practices in 283 studies (Fig. 1). Because one article could report on the effects of more than one practice (e.g., factorial RCT), we used the term ‘study’ to refer to any comparison of an intervention to a control. Accordingly, there were more studies than articles because seven factorial RCTs reported results for two clinical practices in the same article [34,35,36,37,38,39,40], and one article reported on the results of two separate RCTs. Most included studies were published after 1990 (n = 259, 92%), and examined the effects of drugs (n = 134, 47%) or devices (n = 95, 34%) in patients with respiratory failure (n = 102, 36%). Characteristics of the included studies are described in Table 2, and bibliographic details appear in Additional file 1: Tables S1–S5.
Clinical practices without a reproduction attempt
Agreement for classification within our reproducibility framework was excellent (κ = 0.9). For 92 practices (58%, 95% CI 50–66%) a reproduction attempt could not be found (Fig. 2). Of these 92 practices, 31 (34%, 95% CI 24–44%) were reported to be efficacious, 50 (54%, 95% CI 43–65%) reported lack of efficacy, and 11 (12%, 95% CI 6–20%) reported harm. Practices with studies that reported efficacy commonly targeted patients with respiratory failure (n = 10, 29%), practices with studies that reported lack of efficacy commonly targeted patients with sepsis (n = 12, 22%), and harmful practices commonly targeted patients with neurological conditions (n = 3, 27%) (Additional file 1: Table S1).
Clinical practices with a reproduction attempt
In total, 66 clinical practices (42%, 95% CI 33–50%) had one or more reproduction attempts identified. The geometric mean time from publication of the original study to publication of the first reproduction attempt was 4.6 (95% CI 3.7–5.7) years (Additional file 1: Figure S2). Original studies reported a larger effect estimate for the primary endpoint than the corresponding reproduction attempt (mean absolute risk difference 16.0%, 95% CI 11.6–20.5% vs. 8.4%, 95% CI 6.0–10.8%, P = 0.003). Twenty-seven of the 66 practices had at least two reproduction attempts (41%, 95% CI 28–54%). All reproduction attempts were an approximate reproduction of the corresponding original study. For three practices, the reproduction attempt was in progress [38, 42, 43]. Of the remaining 63 practices, the original study and reproduction attempt demonstrated consistent effect estimates (i.e., reproducible results) for 28 practices (44%, 95% CI 31–58%), and inconsistent effect estimates (i.e., non-reproducible results) for 35 practices (56%, 95% CI 42–68%) (Fig. 2). Practices with consistent effects had a smaller number of reproduction attempts per original study than those with inconsistent effects (geometric mean 1.3, 95% CI 1.0–1.6 vs. 1.9, 95% CI 1.4–2.4, P = 0.03).
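For readers who want to see how a percentage with a 95% CI is obtained from counts such as these, the sketch below computes a Wilson score interval. The article does not state which interval method was used, so the choice of Wilson is an assumption and the bounds may differ slightly from the published ones; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def wilson_interval(successes, total, level=0.95):
    """Proportion with a Wilson score confidence interval."""
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("require 0 <= successes <= total and total > 0")
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total
                               + z * z / (4 * total * total)) / denom
    return p, centre - half_width, centre + half_width


# Counts taken from the text: 66 of 158 practices had a reproduction attempt.
p, lower, upper = wilson_interval(66, 158)
print(f"{p:.0%} (95% CI {lower:.0%}-{upper:.0%})")
```

The same function applies to the other proportions in this section, for example 28 of 63 practices with consistent effect estimates.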
Practices with consistent effects
Among 28 practices with consistent effects, most reported lack of efficacy (n = 14, 50%, 95% CI 30–69%), with a minority reporting efficacy (n = 11, 39%, 95% CI 21–59%) or harm (n = 3, 11%, 95% CI 2–28%). Practices consistently reported to be efficacious included lung protective ventilation for acute respiratory distress syndrome (ARDS) and non-invasive ventilation for cardiogenic pulmonary edema (Additional file 1: Table S2). Practices that consistently reported lack of efficacy included immune-modulating therapies for sepsis and continuous (compared with intermittent) renal replacement therapy (Additional file 1: Table S3). The clinical practice with the most consistent evidence of harm was fluid resuscitation with hydroxyethyl starches (Additional file 1: Table S4).
Practices with inconsistent effects
For 11 of the 35 practices with inconsistent effects (31%, 95% CI 16–49%), there were multiple different estimates of effect among the reproduction attempts (e.g., original study reports efficacy and some reproduction attempts report lack of efficacy, while others report efficacy) (Additional file 1: Table S5). Of the remaining 24 practices that had one change in the direction of effect between the original study and reproduction attempt, the most common change in effect was from efficacy in the original study to either lack of efficacy or harm in the reproduction attempt (n = 14, 58%, 95% CI 36–78%). For four practices, a reproduction attempt reported efficacy after an original study reported lack of efficacy. No reproduction attempt found efficacy for any practice originally found to be harmful.
We used a rigorous knowledge synthesis method to analyze results reproducibility within a cohort of clinical critical care research published in high-profile journals. The main findings of our study add novel information to this important and evolving scientific area. First, the effects of fewer than half of clinical practices evaluated were assessed for their reproducibility and, of these, less than half had effects that were consistent across original studies and reproduction attempts. Second, slight methodological differences between the original study and corresponding reproduction attempt created challenges in reporting reproducibility for certain practices and resulted in most reproduction attempts being approximate reproductions of the corresponding original. Finally, studying results reproducibility within critical care enabled the creation of a map of clinical critical care practices with reproducible evidence (Fig. 3).
Our results compare favorably with prior research [4, 14,15,16,17,18]. Four previous studies examined reproducibility by comparing original studies and reproduction attempts within existing published literature [14,15,16,17]. Ioannidis found that 20 (44%) of 45 highly cited studies (at least 1000 indexed citations) claiming a practice to be beneficial reported results that were consistent with a subsequent reproduction attempt. In two distinct but similar studies, Prasad et al. [15, 16] found that approximately 27% of original research publications in the New England Journal of Medicine reported reproduction attempts and, of these, 38–46% found effects that were consistent with the original study. Makel et al. found that 79% of reproduction attempts within published psychology literature reported effects that were consistent with the original study. This estimate decreased to 65% if the authors of the reproduction attempt differed from those of the original study. Two studies examined reproducibility by conducting reproduction attempts for several published original studies. The Open Science Collaboration conducted reproduction attempts for 100 studies published in the psychology literature and found that, depending on the definition of reproducibility, between 36% and 47% of reproduction attempts reported results consistent with the original study. Using a similar approach, Camerer et al. found that, for 18 experimental economic studies, 11 (61%) reproduction attempts found a significant effect in the same direction as the original study.
In conjunction with these previous studies, our study highlights challenges associated with studying reproducibility. The first is the systematic and efficient identification of relevant articles within the vast landscape of published literature. To manage the breadth of the critical care literature, we restricted the primary search to the three general medical journals with the highest impact factors. This was done to reduce the number of early-phase RCTs, which are inherently at higher risk for bias, are less relevant to discussions of reproducibility, are more likely to be published in lower-impact journals, and are less likely to influence clinical practice. This restriction may have missed potentially relevant studies. However, articles included in our study are comparable to other reviews of important clinical critical care research [24, 44, 45]. Restricting the primary search to high-profile literature may have overestimated the number of practices with a reproduction attempt. However, through identification of 158 clinical critical care practices, and reporting the estimate of reproduction attempts at the level of the practice rather than the individual original study, it is less likely that inclusion of potentially lower-profile literature within the primary search would considerably alter this estimate. The second challenge associated with examining reproducibility is determining what constitutes a reproduction attempt. There is no consensus definition of a reproduction attempt. Among previous similar studies, definitions are not consistent and are difficult to reliably operationalize [14,15,16,17]. In comparison, our definition required greater similarity between original studies and reproduction attempts, with strict criteria pertaining to study design and sample size, and minor latitude given to study population, nature of the intervention and/or control, and primary outcome measure.
It is possible that this relatively stricter definition excluded potential reproduction attempts and resulted in a lower estimate of the number of practices with a reproduction attempt. However, by employing a strict definition, our study endeavored to include reproduction attempts that were methodologically similar to the original study and reduced the likelihood that inconsistent results were due to differences in methodological quality. This identifies the third challenge associated with studying reproducibility, which is determining what constitutes a consistent reproduction attempt. Previous studies used conclusions reported by authors to determine whether the results of a reproduction attempt were consistent with the original study [14,15,16,17]. We employed a more objective approach that classified the primary efficacy outcome and any pre-specified secondary safety outcome to derive our own assessment of the efficacy of each practice, and used this to determine whether original studies and reproduction attempts reported consistent effects. Accepting the limitations of this approach, it is congruent with that employed in previous reproducibility research [14,15,16,17], and resulted in a rate of reproducible research that compares favorably with much of the existing clinical literature [4, 14,15,16].
Our study has implications for clinicians, scientists, and funding agencies. From a clinical perspective, our study may help clinicians interpret the implementation ramifications of experimental critical care research published in high-profile journals. Our results suggest (1) that adoption of practices with one study claiming efficacy should wait until confirmed through a reproduction attempt (e.g., tight glycemic control), (2) that hope not be lost after publication of one study demonstrating lack of efficacy (e.g., prone ventilation), and (3) that clinicians need not wait for a reproduction attempt before deciding against adoption of practices shown to be harmful (e.g., hydroxyethyl starches). Examining reproducibility also enabled the creation of a map of clinical critical care practices with consistent evidence that could broadly inform quality improvement initiatives, such as the Choosing Wisely campaign, in deciding what to promote as best practice. The strength of this approach is that it not only includes practices known to have strong reproducible evidence that should be universally adopted (e.g., lung protective ventilation among patients with ARDS) or de-adopted (e.g., hydroxyethyl starch fluid resuscitation), but also less well recognized practices with reproducible evidence that should be adopted (e.g., central venous catheterization via the subclavian compared to jugular or femoral sites) or de-adopted (e.g., high positive end-expiratory pressure in ARDS).
From a scientific perspective, our study demonstrates that understanding which experimental clinical studies require a reproduction attempt, as well as the number of reproduction attempts required for a given clinical practice, requires more study. Due to the risks and costs associated with conducting experimental clinical research, identifying which studies require a reproduction attempt necessitates a thoughtful approach that integrates findings from the original study and factors related to the clinical practice. It also requires a general acceptance within the scientific community of the merit of conducting and publishing the results of reproduction attempts. With regard to findings from the original study, as suggested by our data, wherein no clinical practice found to be harmful in an original study was found to have efficacy in a reproduction attempt, any clinical practice shown to be harmful in a phase III RCT should generally not be examined in additional RCTs. However, among studies reporting efficacy or lack of efficacy, the assessment of whether a reproduction attempt is necessary requires deeper understanding of the likelihood that a reproduction attempt will provide valuable information. If the reproduction attempt is likely to produce consistent results, it is arguably not required, especially if the practice in question is complex and the cost of doing a follow-up RCT is high. On the other hand, if the reproduction attempt is predicted to produce findings that differ from the original study, a reproduction attempt is vitally important. Knowing which studies need a reproduction attempt requires additional understanding of study factors that predict when a reproduction attempt will be consistent with the original study. 
Such factors include but are not limited to potential small differences in study protocols (i.e., retest versus approximate reproduction attempt), a low fragility index in original studies, delta inflation bias in power calculations in reproduction attempts, or heterogeneity of treatment effects and the reporting of one effect estimate for a population of patients at differential risk for the outcome. The number of reproduction attempts is also likely an important determinant of consistency, in that as more reproduction attempts are conducted, the likelihood of obtaining a result that differs from the original study increases. The optimal number of reproduction attempts is not clear. When the first reproduction attempt reports findings consistent with the original study, this is likely adequate to assess the efficacy of a given clinical practice, especially if there are no signals from secondary analyses that additional patient subgroups and/or outcomes should be examined. In this case, additional reproduction attempts may result in patients not receiving beneficial practices (or unnecessarily experiencing ineffective practices), and waste of valuable healthcare and scientific resources. When the findings from a first reproduction attempt are not consistent with the original study, clinicians and scientists should view that inconsistency as an opportunity to pause and re-examine each component of the clinical question (i.e., population, intervention, etc.) before moving forward with any additional experimental research. Additional understanding pertaining to rates and predictors of reproducibility will help scientists decide which practices warrant repeat examination through a reproduction attempt, and may help design studies that are less susceptible to non-reproducibility.
Similarly, funding agencies may be better positioned to weigh the relative importance and methodological strength of a proposed reproduction attempt, which may help inform the controversial balance between funding science that intends to examine existing concepts and science that intends to discover new concepts.
Fewer than half of clinical critical care practices with research published in high-profile journals were evaluated for reproducibility and, of these, less than half had reproducible effects. Heterogeneity within study populations and delivery of interventions presents challenges to studying reproducibility within clinical research. These challenges notwithstanding, implications of our work include that caution is warranted when interpreting initial reports of clinical research; specialty societies should consider waiting for evidence of reproducibility before defining best practices given the potential broad impact of their recommendations. Further, researchers and funding agencies should increase efforts to evaluate the reproducibility of clinical experiments, with examination of scientific reproducibility being an accepted and required part of scientific discourse.
ACP: American College of Physicians
AKI: acute kidney injury
ARDS: acute respiratory distress syndrome
COPD: chronic obstructive pulmonary disease
CRRT: continuous renal replacement therapy
CVC: central venous catheter
ICU: intensive care unit
IRRT: intermittent renal replacement therapy
PEEP: positive end-expiratory pressure
PRESS: peer review of electronic search strategies
RCT: randomized controlled trial
Le Noury J, Nardo JM, Healy D, Jureidini J, Raven M, Tufanaru C, et al. Restoring Study 329: efficacy and harms of paroxetine and imipramine in treatment of major depression in adolescence. BMJ. 2015;351:h4320.
McNutt M. Reproducibility. Science. 2014;343:229.
McNutt M. Journals unite for reproducibility. Science. 2014;346:679.
Open Science Collaboration. Estimating the reproducibility of psychological science. Science. 2015;349:aac4716.
Baker M. 1,500 scientists lift the lid on reproducibility. Nature. 2016;533:452–4.
Ioannidis JP. Acknowledging and overcoming nonreproducibility in basic and preclinical research. JAMA. 2017;317:1019–20.
The Scientific Method: Let’s Just Try That Again. The Economist. 2016. https://www.economist.com/news/science-and-technology/21690020-reproducibility-should-be-sciences-heart-it-isnt-may-soon. Accessed 24 Nov 2016.
BMJ Editor Fiona Godlee Takes on Corruption in Science. http://www.cbc.ca/news/health/bmj-fiona-godlee-science-1.3541769. Accessed 28 Nov 2016.
Carroll AE. Science Needs a Solution for the Temptation of Positive Results. New York Times. 2017; https://www.nytimes.com/2017/05/29/upshot/science-needs-a-solution-for-the-temptation-of-positive-results.html. Accessed 20 June 2017.
Begley CG, Ioannidis JP. Reproducibility in science: improving the standard for basic and preclinical research. Circ Res. 2015;116:116–26.
Goodman SN, Fanelli D, Ioannidis JP. What does research reproducibility mean? Sci Transl Med. 2016;8:341ps312.
Curran J, Vachon B, Grimshaw J. Is replication research informing the results of systematic reviews in knowledge translation research? 21st Cochrane Colloquium Abstract. 2013. https://abstracts.cochrane.org/2013-qu%C3%A9bec-city/replication-research-informing-results-systematic-reviews-knowledge-translation. Accessed 15 Sept 2014.
Zwaan RA, Etz A, Lucas RE, Donnellan MB. Making replication mainstream. Behav Brain Sci. 2017; https://doi.org/10.1017/S0140525X17001972.
Ioannidis JPA. Contradicted and initially stronger effects in highly cited clinical research. JAMA. 2005;294:218–28.
Prasad V, Gall V, Cifu A. The frequency of medical reversal. Arch Intern Med. 2011;171:1675–6.
Prasad V, Vandross A, Toomey C, Cheung M, Rho J, Quinn S, et al. A decade of reversal: an analysis of 146 contradicted medical practices. Mayo Clinic Proc. 2013;88:790–8.
Makel MC, Plucker JA, Hegarty B. Replications in psychology research: how often do they really occur? Perspect Psychol Sci. 2012;7:537–42.
Camerer CF, Dreber A, Forsell E, Ho TH, Huber J, Johannesson M, et al. Evaluating replicability of laboratory experiments in economics. Science. 2016;351:1433–6.
Arksey H, O’Malley L. Scoping studies: towards a methodological framework. Int J Soc Res Methodol. 2005;8:19–32.
The Joanna Briggs Institute. The Joanna Briggs Institute Reviewers' Manual 2015: Methodology for JBI Scoping Reviews. Australia: The Joanna Briggs Institute; 2015.
Tricco AC, Lillie E, Zarin W, O'Brien K, Colquhoun H, Kastner M, et al. A scoping review on the conduct and reporting of scoping reviews. BMC Med Res Methodol. 2016;16:15.
McKibbon KA, Haynes RB, McKinlay RJ, Lokker C. Which journals do primary care physicians and specialists access from an online service? J Med Libr Assoc. 2007;95:246–54.
McKibbon KA, Wilczynski NL, Haynes RB. What do evidence-based secondary journals tell us about the publication of clinically important articles in primary healthcare journals? BMC Med. 2004;2:33.
Harhay MO, Wagner J, Ratcliffe SJ, Bronheim RS, Gopal A, Green S, et al. Outcomes and statistical power in adult critical care randomized trials. Am J Respir Crit Care Med. 2014;189:1469–78.
Niven DJ, McCormick TJ, Straus SE, Hemmelgarn BR, Jeffs LP, Stelfox HT. Identifying low-value practices in critical care medicine: protocol for a scoping review. BMJ Open. 2015;5:e008244.
Simchen E, Sprung CL, Galai N, Zitser-Gurevich Y, Bar-Lavi Y, Gurman G, et al. Survival of critically ill patients hospitalized in and out of intensive care units under paucity of intensive care unit beds. Crit Care Med. 2004;32:1654–61.
Friedman LM, Furberg CD, DeMets DL. Introduction to Clinical Trials. In: Fundamentals of Clinical Trials. 4th ed. New York: Springer; 2010. p. 3–8.
Walker DM, West NE, Ray SG, British Cardiovascular Society Working Group on Acute Cardiac C. From coronary care unit to acute cardiac care unit: the evolving role of specialist cardiac care. Heart. 2012;98:350–2.
Sampson M, McGowan J, Cogo E, Grimshaw J, Moher D, Lefebvre C. An evidence-based practice guideline for the peer review of electronic search strategies. J Clin Epidemiol. 2009;62:944–52.
Clinicaltrials.gov. http://www.clinicaltrials.gov/. Accessed 5 Apr 2016.
Current Controlled Trials. http://www.controlled-trials.com/. Accessed 5 Apr 2016.
Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33:159–74.
Pocock SJ, Stone GW. The primary outcome is positive – is that good enough? N Engl J Med. 2016;375:971–9.
Cobb DK, High KP, Sawyer RG, Sable CA, Adams RB, Lindley DA, et al. A controlled trial of scheduled replacement of central venous and pulmonary-artery catheters. N Engl J Med. 1992;327:1062–8.
Brunkhorst FM, Engel C, Bloos F, Meier-Hellmann A, Ragaller M, Weiler N, et al. Intensive insulin therapy and pentastarch resuscitation in severe sepsis. N Engl J Med. 2008;358:125–39.
Timsit JF, Schwebel C, Bouadma L, Geffroy A, Garrouste-Orgeas M, Pease S, et al. Chlorhexidine-impregnated sponges and less frequent dressing changes for prevention of catheter-related infections in critically ill adults: a randomized controlled trial. JAMA. 2009;301:1231–41.
Annane D, Cariou A, Maxime V, Azoulay E, D'Honneur G, et al. Corticosteroid treatment and intensive insulin therapy for septic shock in adults: a randomized controlled trial. JAMA. 2010;303:341–8. [Erratum appears in JAMA. 2010 May 5;303(17):1698]
Jakob SM, Ruokonen E, Grounds RM, Sarapohja T, Garratt C, Pocock SJ, et al. Dexmedetomidine vs midazolam or propofol for sedation during prolonged mechanical ventilation: two randomized controlled trials. JAMA. 2012;307:1151–60.
Heyland D, Muscedere J, Wischmeyer PE, Cook D, Jones G, Albert M, et al. A randomized trial of glutamine and antioxidants in critically ill patients. N Engl J Med. 2013;368:1489–97. [Erratum appears in N Engl J Med. 2013 May 9;368(19):1853 Note: Dosage error in article text.]
Robertson CS, Hannay HJ, Yamal JM, Gopinath S, Goodman JC, Tilley BC, et al. Effect of erythropoietin and transfusion threshold on neurological recovery after traumatic brain injury: a randomized clinical trial. JAMA. 2014;312:36–47.
Takala J, Ruokonen E, Webster NR, Nielsen MS, Zandstra DF, Vundelinckx G, et al. Increased mortality associated with growth hormone treatment in critically ill adults. N Engl J Med. 1999;341:785–92.
Cooper DJ, Rosenfeld JV, Murray L, Arabi YM, Davies AR, D'Urso P, et al. Decompressive craniectomy in diffuse traumatic brain injury. N Engl J Med. 2011;364:1493–502. [Erratum appears in N Engl J Med. 2011 Nov 24;365(21):2040]
Papazian L, Forel JM, Gacouin A, Penot-Ragon C, Perrin G, Loundou A, et al. Neuromuscular blockers in early acute respiratory distress syndrome. N Engl J Med. 2010;363:1107–16.
Ospina-Tascon GA, Buchele GL, Vincent JL. Multicenter, randomized, controlled trials evaluating mortality in intensive care: doomed to fail? Crit Care Med. 2008;36:1311–22.
Landoni G, Comis M, Conte M, Finco G, Mucchetti M, Paternoster G, et al. Mortality in multicenter critical care trials: an analysis of interventions with a significant effect. Crit Care Med. 2015;43:1559–68.
Finfer S, Chittock DR, Su SY, Blair D, Foster D, Dhingra V, et al. Intensive versus conventional glucose control in critically ill patients. N Engl J Med. 2009;360:1283–97.
Guerin C, Reignier J, Richard JC, Beuret P, Gacouin A, Boulain T, et al. Prone positioning in severe acute respiratory distress syndrome. N Engl J Med. 2013;368:2159–68.
Zarychanski R, Abou-Setta AM, Turgeon AF, Houston BL, McIntyre L, Marshall JC, et al. Association of hydroxyethyl starch administration with mortality and acute kidney injury in critically ill patients requiring volume resuscitation: a systematic review and meta-analysis. JAMA. 2013;309:678–88.
Cassel CK, Guest JA. Choosing wisely: helping physicians and patients make smart decisions about their care. JAMA. 2012;307:1801–2.
Ridgeon EE, Young PJ, Bellomo R, Mucchetti M, Lembo R, Landoni G. The Fragility Index in multicenter randomized controlled critical care trials. Crit Care Med. 2016;44:1278–84.
Aberegg SK, Richards DR, O'Brien JM. Delta inflation: a bias in the design of randomized controlled trials in critical care medicine. Crit Care. 2010;14:R77.
Iwashyna TJ, Burke JF, Sussman JB, Prescott HC, Hayward RA, Angus DC. Implications of heterogeneity of treatment effect for reporting and analysis of randomized trials in critical care. Am J Respir Crit Care Med. 2015;192:1045–51.
Loscalzo J. Pilot trials in clinical research: of what value are they? Circulation. 2009;119:1694–6.
Acknowledgements
We would like to acknowledge Becky Skidmore (Independent Information Specialist Consultant, Ottawa, Ontario) for peer review of the literature search strategy, and Peggy Robinson (Independent Medical Editor, Ottawa, Ontario) and Dr. Kirsten Fiest (Assistant Professor, University of Calgary) for comments on an earlier version of this manuscript.
Funding
During the time this work was conducted, Dr. Niven was funded through a Clinician Fellowship Award from Alberta Innovates – Health Solutions, a Knowledge Translation Canada Student Fellowship and Training Program grant, and a Knowledge Translation Canada Research Stipend. Dr. Stelfox was supported by a Population Health Investigator Award from Alberta Innovates – Health Solutions. Dr. Straus was funded by a Tier 1 Canada Research Chair. Dr. Hemmelgarn was supported by the Roy and Vi Baay Chair in Kidney Research. The funding agencies did not contribute to the design and conduct of the study; the collection, management, analysis, or interpretation of the data; or the preparation, review, or approval of the final manuscript.
Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Ethics approval and consent to participate
Consent for publication
Competing interests
The authors declare that they have no competing interests.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Table S1. Clinical practices without a reproduction attempt.
Table S2. Clinical practices with consistent estimates of efficacy between original studies and reproduction attempts.
Table S3. Clinical practices with consistent estimates of lack of efficacy between original studies and reproduction attempts.
Table S4. Clinical practices with consistent estimates of harm between original studies and reproduction attempts.
Table S5. Clinical practices with inconsistent effect estimates between original studies and reproduction attempts.
Figure S1. Flow diagram showing study design including electronic search strategy, article eligibility criteria, and reproducibility classification.
Figure S2. The relationship between time since publication of the original study and the occurrence of a first reproduction attempt.
Online Appendix. MEDLINE Search Strategy (April 4, 2016).
(DOCX 1175 kb)