Evaluating agreement between bodies of evidence from randomized controlled trials and cohort studies in medical research: a meta-epidemiological study

Abstract

Background

Randomized controlled trials (RCTs) and cohort studies are the most common study design types used to assess the treatment effects of medical interventions. We aimed to evaluate the agreement of effect estimates between bodies of evidence (BoE) from RCTs and cohort studies and to identify factors associated with disagreement.

Methods

Systematic reviews published in the 13 medical journals with the highest impact factor were identified through a MEDLINE search. BoE-pairs from RCTs and cohort studies with the same medical research question were included. We rated the similarity of PI/ECO (Population, Intervention/Exposure, Comparison, Outcome) between BoE from RCTs and cohort studies. The agreement of effect estimates across BoE was analyzed by pooling ratio of ratios (RoR) for binary outcomes and difference of mean differences for continuous outcomes. We performed subgroup analyses to explore factors associated with disagreements.

Results

One hundred twenty-nine BoE pairs from 64 systematic reviews were included. PI/ECO-similarity degree was moderate: two BoE pairs were rated as “more or less identical”; 90 were rated as “similar but not identical” and 37 as only “broadly similar”. For binary outcomes, the pooled RoR was 1.04 (95% CI 0.97–1.11) with considerable statistical heterogeneity. For continuous outcomes, differences were small. In subgroup analyses by degree of PI/ECO-similarity, type of intervention, and type of outcome, the pooled RoR indicated that, on average, differences between both BoE were small. Subgroup analysis by degree of PI/ECO-similarity revealed high statistical heterogeneity and wide prediction intervals across PI/ECO-dissimilar BoE pairs.

Conclusions

On average, the pooled effect estimates between RCTs and cohort studies did not differ. Statistical heterogeneity and wide prediction intervals were mainly driven by PI/ECO-dissimilarities (i.e., clinical heterogeneity) and cohort studies. The potential influence of risk of bias and certainty of the evidence on differences of effect estimates between RCTs and cohort studies needs to be explored in upcoming meta-epidemiological studies.

Background

Randomized controlled trials (RCTs) and cohort studies are the most common study design types used to assess the treatment effects of medical interventions [1, 2]. RCTs are considered the gold standard in medical research to assess benefits and harms of treatments [13]. Randomization allows causal inference [4]. However, RCTs may not be available for certain research questions due to ethical reasons [5] or they may suffer from low external validity [6–9], too short follow-up duration to assess late adverse events [5], or low adherence [10]. In contrast to RCTs, large cohort studies may often have higher external validity [6], e.g., when including diverse populations [8, 9]. Cohort studies can complement information from RCTs or might even serve as a replacement [11] and enlarge the available body of evidence (BoE: all studies available for a given research question, e.g., all RCTs/cohort studies investigating the impact of oral contraception on breast cancer), or they may be useful to identify relevant subgroups for subsequent RCTs [12]. However, there is an ongoing debate about the trustworthiness of results from cohort studies, mainly fuelled by their susceptibility to risk of bias by confounding [8, 13]. For example, systematic reviews from the Cochrane Collaboration impose high thresholds on the inclusion of cohort studies [5]. Several studies have investigated whether the susceptibility to bias in different types of observational studies indeed leads to disagreement of effect estimates [14–17]; the largest study so far, a meta-methodological study comparing health care outcomes from RCTs to observational studies (including case-control and cohort studies), concluded that results were mainly concordant [18]. The authors suggested that, in the case of disagreement of results, factors other than study design alone should be investigated.
However, the study lacked an empirical investigation of factors such as PI/ECO (population, intervention/exposure, comparator, outcome)-differences (for example, differences between the interventions tested in RCTs and cohort studies) that potentially account for disagreement of study results, and little is known about this topic so far. Therefore, in the present meta-epidemiological study, we not only evaluate the agreement of effect estimates between BoE from RCTs and cohort studies from the general medical field but also investigate whether factors such as PI/ECO-differences between BoE are associated with disagreement. This also allows us to explore and better understand potential reasons for statistical heterogeneity. Factors associated with disagreement would require special attention in future health-care evidence syntheses integrating both BoE.

Methods

This meta-epidemiological study was planned, written, and reported in adherence to guidelines for reporting meta-epidemiological research [19]. The detailed inclusion criteria are described in Table 1.

Table 1 Detailed description of inclusion and exclusion criteria

Literature search

The search was conducted in MEDLINE (via PubMed.gov) on June 05, 2020, for the period from January 01, 2010, to December 31, 2019, in the 13 medical journals with the highest impact factor (according to the Journal Citation Report [JCR] 2018; category: general and internal medicine). This cut-off was chosen to cover a 10-year period in line with a recent meta-epidemiological study in nutrition research [20]. Initially, we planned to include the 10 journals with the highest impact factor, but three journals (New England Journal of Medicine, Nature Reviews Disease Primers, and Journal of Cachexia, Sarcopenia and Muscle) did not publish any systematic review with an eligible BoE-pair (see inclusion criteria in Table 1). We therefore included the subsequent three journals according to the JCR 2018 (Cochrane Database of Systematic Reviews, Mayo Clinic Proceedings, Canadian Medical Association Journal). The search strategy is given in Additional file 1 (Appendix S1). The title and abstract screening was conducted by one reviewer (NB), and potentially relevant full texts were screened by two reviewers independently (NB, LS). Any discrepancy was resolved by a third reviewer (JJM). Supplementary hand searches identified three additional systematic reviews [21–23]. For each included BoE from a systematic review, we included a maximum of three patient-relevant outcomes (e.g., mortality, cardiovascular disease (CVD)), and a maximum of three intermediate disease markers (e.g., blood lipids). If more than three outcomes were available for a given systematic review, we included the primary outcomes first and then selected further outcomes in the order in which they were mentioned (top-down approach).

Evaluating similarity between BoE from RCTs and cohort studies

We evaluated the similarity of PI/ECO between BoE from RCTs and cohort studies. In accordance with a previous meta-epidemiological study [20], the acronym PI/ECO instead of PICO was used, to better represent exposures in cohort studies (e.g., serum vitamin D status) and to distinguish them from interventions in RCTs (e.g., vitamin D supplementation). For each BoE-pair, the similarity of each PI/ECO-domain was rated as “more or less identical,” “similar but not identical,” or “broadly similar.” Overall, the similarity of each BoE-pair was then determined according to the domain with the lowest degree of similarity. For example, when the PI/ECO-rating for the domain “population” was rated as “broadly similar” the overall similarity of this BoE-pair was also rated as “broadly similar.” The PI/ECO-similarity rating was conducted by two reviewers independently (NB, JB) using pre-specified criteria (Additional file 1: Table S1). Categorization of interventions and outcomes was conducted by two reviewers (NB, LH). Discrepancies of PI/ECO-similarity rating or categorizations were resolved through discussion with experts.
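
The rule described above, in which the overall rating of a BoE-pair equals the lowest rating across its PI/ECO-domains, can be sketched in a few lines of Python. This is only an illustration: the enum, the function name `overall_similarity`, and the example ratings are hypothetical and are not taken from the study's materials.

```python
from enum import IntEnum


class Similarity(IntEnum):
    """Ordered similarity ratings; a lower value means less similar."""
    BROADLY_SIMILAR = 1
    SIMILAR_BUT_NOT_IDENTICAL = 2
    MORE_OR_LESS_IDENTICAL = 3


def overall_similarity(domain_ratings):
    """Return the lowest rating found across the PI/ECO-domains."""
    return min(domain_ratings.values())


# Hypothetical ratings for one BoE-pair (illustrative values only).
ratings = {
    "population": Similarity.BROADLY_SIMILAR,
    "intervention_exposure": Similarity.MORE_OR_LESS_IDENTICAL,
    "comparator": Similarity.SIMILAR_BUT_NOT_IDENTICAL,
    "outcome": Similarity.MORE_OR_LESS_IDENTICAL,
}
overall = overall_similarity(ratings)
print(overall.name)
```

Encoding the ratings as an ordered enum keeps the "lowest degree of similarity" rule a single `min` call, which mirrors the example in the text where a “broadly similar” population determines the overall rating.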

Data extraction

Data extraction was performed by two reviewers independently (NB, LH). The following data were extracted for each BoE: effect estimates, type of effect measure, 95% confidence interval (CI), number of studies, number of participants, number of events, and certainty of the evidence. Further, we extracted information on study characteristics of primary studies for each BoE: description of the study population, intervention/exposure, comparator, design of the primary study, intervention duration, and follow-up and risk of bias/study quality.

If RCTs were pooled with other types of studies (e.g., quasi-experimental RCTs), we performed a meta-analysis excluding these other study types. The rationale for this approach was the suggestion in the new Cochrane handbook to classify quasi-experimental RCTs as non-randomized studies of interventions (NRSI) [5]. This was the case for three BoE from RCTs [24–26]. Accordingly, meta-analyses of cohort studies were recalculated if they included other study types (e.g., case-control studies); this was the case for 35 BoE from cohort studies [25, 27–42]. If RCTs and cohort studies were pooled without subgroup analysis by study type, we performed separate meta-analyses; this was the case for nine BoE-pairs [37, 40, 43–45]. Upon request, authors from one systematic review [45] provided data to perform separate meta-analyses. In two BoE-pairs from one systematic review evaluating infection outcomes of influenza vaccines [46], RCTs with different populations (community-dwelling and institutionalized) were combined in a single meta-analysis; we pooled respective cohort studies that were initially not combined. For ten BoE pairs [38, 42, 47, 48], we pooled different types of cohort studies (e.g., clinical cohorts, population-based cohorts) that were not pooled in the corresponding systematic review. If there was a meta-analysis for the BoE from one study type (e.g., RCTs) and a corresponding BoE from the other study type (e.g., cohort studies) was not pooled but relevant data were available, we pooled the respective primary studies: cohort studies for nine BoE pairs [49–55] and primary RCTs for one BoE pair [56].

Statistical analysis

If the summary effect measure for binary or continuous outcomes was not the same for BoE from RCTs and BoE from cohort studies, we used the appropriate conversion formulas in order to have the two estimates expressed in the same measure: risk ratio (RR), odds ratio (OR), or hazard ratio (HR) for binary outcomes and mean difference (MD) for continuous outcomes.

If effect measures (RR, OR, HR) for binary outcomes were not the same within a BoE pair, they were converted to an identical effect measure (RR) using an assumed control risk (ACR); \(\mathrm{RR}=\frac{\mathrm{OR}}{1-\mathrm{ACR}\times\left(1-\mathrm{OR}\right)}\) [13, 57]. If either a RR, OR, or HR was used for both BoE, we did not convert summary effect estimates. We converted effect measures for binary outcomes for 16 BoE pairs [22, 23, 44, 52–54, 56, 58–60] and for continuous outcomes for one BoE pair [61]. Detailed descriptions of the conversions can be found in Additional file 1 (Table S2 [62–66]). We standardized the direction of effect of the outcomes so that summary effect estimates (HR/OR/RR) <1 always express a beneficial effect. We revised the direction of effect for three outcomes from the systematic reviews by Hüpfl et al. [67] (survival to all-cause mortality) and Alipanah et al. [24] (treatment success/completion to low treatment success, low treatment completion) (see Table 2). To quantify differences of effect estimates, we computed a ratio of ratios (RoR) [68] for each BoE pair with a binary outcome. For continuous outcomes, we computed a difference of mean differences (DMD). For the assessment of binary and continuous outcomes, cohort studies served as the reference group. We pooled the RoRs across BoE-pairs using a random-effects model [69] to assess whether, overall, effect estimates of BoE from RCTs are larger or smaller in relation to those of BoE from cohort studies. The RoR does not indicate larger or smaller treatment effects in one of the BoE, but only differences between the two BoEs. The direction of difference depends on the direction of effect of the underlying BoEs. For example, a risk ratio from RCTs of 0.8 and a risk ratio from cohort studies of 1 would yield a RoR of 0.8, whereas a risk ratio of 1.00 in RCTs compared with a risk ratio of 1.25 in cohort studies would also yield a RoR of 0.8.
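
As a minimal sketch of the two calculations described above, the following Python snippet converts an OR to an RR with an assumed control risk and computes a RoR with cohort studies as the reference group. The function names, the example ACR of 0.2, and all confidence limits are illustrative assumptions. Deriving standard errors from 95% confidence limits and combining them as the square root of the sum of squares assumes log-scale normality and independence of the two BoE, which the text does not state explicitly.

```python
import math


def or_to_rr(odds_ratio, acr):
    """Convert an odds ratio to a risk ratio using an assumed control risk."""
    return odds_ratio / (1 - acr * (1 - odds_ratio))


def log_se_from_ci(lower, upper, z=1.96):
    """Approximate the log-scale standard error from a 95% CI (assumption)."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def ratio_of_ratios(rct, rct_ci, cohort, cohort_ci, z=1.96):
    """RoR = RCT estimate / cohort estimate, with an approximate 95% CI."""
    log_ror = math.log(rct) - math.log(cohort)
    se = math.sqrt(
        log_se_from_ci(*rct_ci, z) ** 2 + log_se_from_ci(*cohort_ci, z) ** 2
    )
    ci = (math.exp(log_ror - z * se), math.exp(log_ror + z * se))
    return math.exp(log_ror), ci


# OR of 0.5 with a hypothetical assumed control risk of 20%.
rr_from_or = or_to_rr(0.5, acr=0.2)

# The two scenarios from the text; the confidence limits are made up.
ror_a, ci_a = ratio_of_ratios(0.80, (0.70, 0.91), 1.00, (0.90, 1.11))
ror_b, ci_b = ratio_of_ratios(1.00, (0.90, 1.11), 1.25, (1.10, 1.42))
print(round(rr_from_or, 3), round(ror_a, 2), round(ror_b, 2))
```

Both scenarios give the same RoR even though the underlying effects differ, which is the reason the RoR alone cannot show which BoE has the larger treatment effect.
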
We pooled DMDs for the same continuous outcomes using a random-effects model [69]. We evaluated the statistical heterogeneity of effect estimates across all BoE-pairs with binary outcomes and across BoE pairs using the same continuous outcomes with the I2 and τ2 statistics [69, 70]. To estimate τ2, we used the Paule and Mandel method [71, 72]. We computed 95% prediction intervals (PIs) to estimate the extent of differences between results of BoE from RCTs and BoE from cohort studies likely to occur in future comparisons. Meta-analyses were performed with the R package meta [73] using random-effects models [69].
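
To make the pooling step more concrete, here is a rough Python sketch of an inverse-variance random-effects model with a Paule and Mandel estimate of τ2 and the I2 statistic. It is not the R package meta and is not the authors' code: the function names, the bisection search used to solve the Paule and Mandel equation, and the example log RoRs and variances are all assumptions chosen for illustration.

```python
import math


def pooled(y, v, tau2):
    """Inverse-variance pooled estimate, its SE, and generalized Q at tau2."""
    w = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y))
    return mu, math.sqrt(1.0 / sum(w)), q


def paule_mandel_tau2(y, v, tol=1e-10):
    """Find tau2 such that the generalized Q equals k - 1 (bisection)."""
    k = len(y)
    if pooled(y, v, 0.0)[2] <= k - 1:
        return 0.0  # no excess heterogeneity
    lo, hi = 0.0, 1.0
    while pooled(y, v, hi)[2] > k - 1:  # Q decreases as tau2 grows
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if pooled(y, v, mid)[2] > k - 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def random_effects(y, v):
    """Return pooled log estimate, SE, tau2, and I2 (as a proportion)."""
    tau2 = paule_mandel_tau2(y, v)
    mu, se, _ = pooled(y, v, tau2)
    q0 = pooled(y, v, 0.0)[2]
    i2 = max(0.0, (q0 - (len(y) - 1)) / q0) if q0 > 0 else 0.0
    return mu, se, tau2, i2


# Hypothetical log RoRs and their variances (not data from the study).
log_rors = [math.log(0.8), math.log(1.0), math.log(1.3), math.log(1.1)]
variances = [0.01, 0.02, 0.015, 0.03]
mu, se, tau2, i2 = random_effects(log_rors, variances)
print(round(math.exp(mu), 2), round(tau2, 3), round(100 * i2))
```

The pooled RoR is obtained by exponentiating the pooled log estimate, and I2 is computed here from the fixed-effect Q statistic, which is one common definition.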

Table 2 Effect estimates and overall PI/ECO-similarity degree for each included body of evidence-pair

Subgroup and sensitivity analyses

We performed pre-specified and post hoc subgroup analyses to explore factors potentially related to the disagreement of effect estimates. The study protocol specified subgroup analysis by degree of PI/ECO-similarity and intervention type (drug, invasive procedure, nutrient, vaccine). Post hoc subgroup analyses were performed by the type of binary effect estimate (RR, OR, HR), type of intervention stratified by degree of PI/ECO-similarity, and type of outcome (e.g., CVD outcomes, cancer outcomes). We performed a post hoc multivariable meta-regression among “similar but not identical” BoE pairs with binary outcomes. For each PI/ECO-domain, the average effect on the pooled RoR of the category “similar but not identical” was evaluated as compared to the reference category “more or less identical.” We performed two post hoc sensitivity analyses: First, by including only the BoE pair from each systematic review with the highest number of RCTs (if the number of RCTs was equal, we primarily included the BoE with the highest number of participants, followed by the highest number of events, followed by the highest number of cohort studies) and second, by direction of cohort study summary effect estimate (HR, OR, RR <1 vs. HR, OR, RR ≥1).
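
The selection rule of the first sensitivity analysis (one BoE pair per systematic review, chosen by number of RCTs with successive tie-breakers) can be expressed as a simple ordering. The sketch below is hypothetical: the `BoEPair` fields, the review identifiers, and the counts are invented for illustration, and it assumes that "number of participants" and "number of events" are single counts per BoE pair.

```python
from dataclasses import dataclass


@dataclass
class BoEPair:
    review_id: str
    outcome: str
    n_rcts: int
    n_participants: int
    n_events: int
    n_cohort_studies: int


def _rank(pair):
    """Tuple ordering: RCTs first, then participants, events, cohort studies."""
    return (pair.n_rcts, pair.n_participants, pair.n_events, pair.n_cohort_studies)


def select_one_pair_per_review(pairs):
    """Keep, for each systematic review, the pair with the highest rank."""
    selected = {}
    for pair in pairs:
        current = selected.get(pair.review_id)
        if current is None or _rank(pair) > _rank(current):
            selected[pair.review_id] = pair
    return selected


# Invented example data.
pairs = [
    BoEPair("SR1", "mortality", 5, 1000, 50, 3),
    BoEPair("SR1", "stroke", 5, 1200, 40, 2),
    BoEPair("SR1", "myocardial infarction", 3, 9000, 400, 9),
    BoEPair("SR2", "fracture", 2, 300, 10, 4),
]
selected = select_one_pair_per_review(pairs)
for review_id, pair in selected.items():
    print(review_id, pair.outcome)
```

Python compares tuples element by element, so the tie-breakers are applied in the stated order without any explicit branching.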

Patient involvement

No patients were involved in setting the research question or the outcome measures, nor were they involved in developing plans for the design or implementation of the study. No patients were asked for advice on interpretation or writing up of results. There are no plans to disseminate the results of the research to study participants or the relevant patient community.

Results

The literature search identified 1362 records of which 234 full texts were assessed for inclusion and 64 systematic reviews were included in this study (Additional file 1: Fig. S1 and Table S3). Overall, we included 129 BoE pairs [21–56, 58–61, 67, 74–96] (Table 2). Three journals contributed a major part of systematic reviews (n = 51; 80%): the BMJ (n = 22), Annals of Internal Medicine (n = 15), and the Cochrane Database of Systematic Reviews (n = 14). The number of studies in BoE from RCTs ranged from 1 to 41 (median: 4) and from 1 to 68 (median: 5) in BoE from cohort studies. The range of participants was 99 to 437,600 (median: 3541) in BoE from RCTs and 162 to 1,934,183 (median: 12,850) in BoE from cohort studies. We performed re-analyses for 70 BoE pairs from 38 systematic reviews [22–25, 27–56, 58–61].

Interventions in BoE pairs (n = 129) consisted of invasive procedures (n = 44), drugs (n = 40), nutrition (n = 32), vaccines (n = 9), birth assistance (n = 2), blood transfusions (n = 1), and cardiopulmonary resuscitation (n = 1). The outcomes of the 129 BoE pairs were categorised as follows: all-cause mortality (n = 28), CVD outcomes (n = 27), drug safety outcomes including adherence outcomes (n = 20), infection outcomes (n = 14), orthopedic outcomes (n = 13), obstetrical outcomes (n = 10), oncological outcomes (n = 9), metabolic outcomes (n = 3), urological outcomes (n = 3), and neurological outcomes (n = 2).

The most frequently used tools for risk of bias assessment were the Cochrane risk of bias tool [97] for 94 (73%) BoE from RCTs and the Newcastle Ottawa scale [98] for 61 (47%) BoE from cohort studies. Certainty of the evidence ratings using GRADE [99] or Agency for Healthcare Research and Quality criteria [100] were available for 38 BoE from RCTs and 31 BoE from cohort studies. Study characteristics for each BoE including effect estimates, detailed descriptions of PI/ECO, the certainty of the evidence ratings, and study quality/risk of bias ratings of primary studies are depicted in Additional file 1 (Tables S4-S7); Additional file 1 (Table S8) shows an overview of the instruments that were used for risk of bias assessment.

Similarity degree

Two (1.5%) BoE pairs were rated as “more or less identical”; 90 (69.8%) were rated as “similar but not identical” and 37 (28.7%) as “broadly similar”. The rating “broadly similar” was due to differences of study populations (n = 16), interventions and comparators (n = 20), and both population and outcome (n = 1) (Table 3, Additional file 1: Table S9).

Table 3 Ratings of PI/ECO-similarity degree for the included body of evidence-pairs by each PI/ECO-element

Statistical heterogeneity of included individual comparisons

Median I2 across meta-analyses of RCTs was 8% and 46% across meta-analyses of cohort studies. For binary outcomes, median I2 was 4% for meta-analyses of RCTs and 44% for meta-analyses of cohort studies. For continuous outcomes, I2 was 9% across meta-analyses of RCTs and 69% across meta-analyses of cohort studies. Median I2 across meta-analyses with binary outcomes stratified by PI/ECO-similarity degree indicated higher statistical heterogeneity for “broadly similar” BoE: I2 was 23% for meta-analyses from RCTs and I2 was 62% for meta-analyses from cohort studies, whereas for “more or less identical” BoE, I2 was 0% for meta-analyses of RCTs and I2 was 34% for meta-analyses of cohort studies (Additional file 1: Table S10).

Meta-epidemiological analysis

Pooling RoRs across BoE pairs with binary outcomes resulted in a pooled RoR of 1.04 (95% CI 0.97 to 1.11; n = 120) with considerable statistical heterogeneity (I2 = 69%; τ2 = 0.061; 95% PI 0.63 to 1.71) (Fig. 1 and Table 4). Differences of MDs in continuous outcomes (n = 9) were mostly small, with the exception of operation duration for two types of knee prostheses where clear disagreement was shown [42] (Fig. 2).
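
As a plausibility check on the reported prediction interval, the sketch below back-calculates it from the rounded summary values given above. It assumes the commonly used formula in which the interval is the pooled log estimate plus or minus a t quantile with k − 2 degrees of freedom times the square root of τ2 plus the squared standard error. It further assumes that the standard error can be recovered from the 95% CI with a normal quantile of 1.96, and that 1.98 is an adequate approximation of the 97.5% t quantile for 118 degrees of freedom. The function name is hypothetical, and the source does not state which formula was used.

```python
import math


def prediction_interval(ratio, ci, tau2, t_crit, z=1.96):
    """Approximate 95% prediction interval on the ratio scale."""
    log_mu = math.log(ratio)
    se = (math.log(ci[1]) - math.log(ci[0])) / (2 * z)  # SE from the 95% CI
    half_width = t_crit * math.sqrt(tau2 + se ** 2)
    return math.exp(log_mu - half_width), math.exp(log_mu + half_width)


# Rounded values reported for the main analysis (n = 120 BoE pairs).
low, high = prediction_interval(1.04, (0.97, 1.11), tau2=0.061, t_crit=1.98)
print(round(low, 2), round(high, 2))
```

Under these assumptions the result lies close to the reported interval of 0.63 to 1.71; small deviations are expected because the inputs are rounded.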

Fig. 1 Forest plot for binary outcomes, pooled ratio of ratios (RoR) for bodies of evidence from randomized controlled trials vs. cohort studies stratified by type of effect measure. CSs cohort studies, DDP-4 dipeptidyl peptidase 4, DHA docosahexaenoic acid, EPA eicosapentaenoic acid, HR hazard ratio, NSTE-ACS non-ST elevation acute coronary syndrome, OR odds ratio, RCTs randomized controlled trials, RHR ratio of hazard ratios, ROR ratio of odds ratios, RR risk ratio, RRR ratio of risk ratios, SGLT-2 sodium glucose transporter 2

Table 4 Overview of main results for binary outcomes (n=120)

Fig. 2 Forest plot for continuous outcomes, pooled difference of mean differences (DMD) for bodies of evidence from randomized controlled trials vs. cohort studies. CSs cohort studies, DMD difference of mean differences, MD mean difference, RCTs randomized controlled trials

Subgroup analyses

For BoE pairs using RRs as the summary effect estimate, the pooled RoR was 1.02 (95% CI 0.94 to 1.11; I2=73%; τ2=0.072; 95% PI 0.60 to 1.75; n=85). The pooled RoR was 1.11 (95% CI 0.98 to 1.25; I2=48%; τ2=0.039; 95% PI 0.72 to 1.70; n=30) for ORs and 1.01 (95% CI 0.78 to 1.30; I2=31%; τ2=0.026; 95% PI 0.52 to 1.95; n=5) for HRs (Fig. 1 and Table 4).

Analysis by overall PI/ECO-similarity degree of BoE-pairs showed a pooled RoR of 1.17 (95% CI 0.90 to 1.51; I2=0%; τ2=0.00; n=2) across “more or less identical,” 1.06 (95% CI 0.99 to 1.14; I2=54%; τ2=0.034; 95% PI 0.73 to 1.54; n=81) across “similar but not identical,” and 0.99 (95% CI 0.85 to 1.16; I2=82%; τ2=0.149; 95% PI 0.45 to 2.21; n=37) across “broadly similar” BoE-pairs (Fig. 3 and Table 4). Results of analyses by similarity of each PI/ECO-domain are depicted in Additional file 1 (Fig. S2a-d); in BoE-pairs with “broadly similar” intervention, the pooled RoR indicated the largest disagreement and statistical heterogeneity was highest (RoR: 1.14, 95% CI 0.87 to 1.49; I2=86%; τ2=0.194; 95% PI 0.42 to 3.08; n=15) (Additional file 1: Fig. S2b). In the multivariable meta-regression among the 81 BoE-pairs with binary outcomes rated as “similar but not identical,” each PI/ECO-domain rated as “similar but not identical” was compared with the reference category “more or less identical.” On average, the pooled RoR changed by a factor of 1.14 for populations, 0.89 for interventions, 1.12 for comparators, and 1.02 for outcomes. The results of the meta-regression were not statistically significant (Table 5).

Fig. 3 Forest plot for binary outcomes, pooled ratio of ratios (RoR) for bodies of evidence from randomized controlled trials vs. cohort studies stratified by overall PI/ECO*-similarity degree. *PI/ECO population, intervention/exposure, comparator, outcome, CSs cohort studies, DDP-4 dipeptidyl peptidase 4, DHA docosahexaenoic acid, EPA eicosapentaenoic acid, HR hazard ratio, NSTE-ACS non-ST elevation acute coronary syndrome, OR odds ratio, RCTs randomized controlled trials, RHR ratio of hazard ratios, ROR ratio of odds ratios, RR risk ratio, RRR ratio of risk ratios, SGLT-2 sodium glucose transporter 2

Table 5 Multivariable meta-regression for each PI/ECO-domain across body of evidence-pairs with binary outcomes within the category “similar but not identical”

Our analyses stratified by type of intervention showed the following: The pooled RoR was 1.04 (95% CI 0.89 to 1.21; I2= 76%; τ2= 0.139; 95% PI 0.48 to 2.24; n=40) for drugs, 1.00 (95% CI 0.91 to 1.10; I2= 25%; τ2= 0.011; 95% PI 0.79 to 1.26; n=39) for invasive procedures, 1.07 (95% CI 0.98 to 1.16; I2= 71%; τ2= 0.023; 95% PI 0.77 to 1.48; n=28) for nutrition-interventions, 1.24 (95% CI 0.87 to 1.75; I2= 80%; τ2= 0.177; 95% PI 0.42 to 3.63; n=9) for vaccines, 0.97 (95% CI 0.62 to 1.52; I2= 0%; τ2= 0; n=2) for birth assistance, 0.38 (95% CI 0.18 to 0.77; n=1) for blood transfusion, and 0.79 (95% CI 0.62 to 1.00; n=1) for cardiopulmonary resuscitation (Table 4, Additional file 1: Fig. S3). Exploratory analyses with stratification by PI/ECO-similarity degree within subgroups of interventions (Additional file 1: Fig. S3a-e) showed disagreement between both BoE for drugs with divergence between BoE-pairs rated as “broadly similar” (RoR: 0.79, 95% CI 0.56 to 1.11; I2= 69%; τ2=0.290; 95% PI 0.23 to 2.71; n=14) and BoE-pairs rated as “similar but not identical” (RoR: 1.20, 95% CI 1.05 to 1.37; I2=67%; τ2=0.050; 95% PI 0.74 to 1.94; n=26) (Additional file 1: Fig. S3b). For “broadly similar” BoE pairs from nutrition research, differences in effect estimates between both BoE were observed (RoR: 1.17, 95% CI 1.03 to 1.33; n=11) (Additional file 1: Fig. S3c). Exploratory analysis excluding BoE-pairs evaluating effects of vitamin D or calcium (n=8) resulted in estimates that were more in agreement (RoR: 1.09, 95% CI 1.04 to 1.14; I2=0%; τ2=0.00; 95% PI 1.04 to 1.15; n=20) and statistical heterogeneity disappeared (Additional file 1: Fig. S4). 
Analysis of BoE pairs evaluating vaccines indicated a higher extent of disagreement for “broadly similar” BoE-pairs (RoR: 1.37, 95% CI 0.86 to 2.17; I2=90%; τ2=0.177; 95% PI 0.17 to 10.88; n=4) compared to “similar but not identical” BoE-pairs (RoR: 1.09, 95% CI 0.62 to 1.92; I2=58%; τ2=0.177; 95% PI 0.19 to 6.45; n=5) (Additional file 1: Fig. S3d).

Stratified analyses by outcome-category are shown in Additional file 1 (Fig. S5) and Table 4. The pooled RoR was 0.94 (95% CI 0.82 to 1.09; I2=80%; τ2=0.075; 95% PI 0.53 to 1.69; n=28) for BoE pairs reporting all-cause mortality, 1.12 (95% CI 1.02 to 1.23; I2=43%; τ2=0.022; 95% PI 0.81 to 1.55; n=26) for CVD outcomes, and 1.06 (95% CI 0.89 to 1.26; I2=67%; τ2=0.068; 95% PI 0.60 to 1.90; n=20) for drug safety outcomes.

The results of the sensitivity analysis where only one outcome (with the largest number of RCTs) was chosen from each systematic review confirmed findings from the main analysis (RoR: 1.08, 95% CI 0.97 to 1.20; I2=76%; τ2=0.097; 95% PI 0.57 to 2.03; n=60) (Additional file 1: Fig. S6). Sensitivity analysis by direction of effect yielded a pooled RoR of 1.18 (95% CI 1.10 to 1.27; I2=61%; τ2=0.046; 95% PI 0.77 to 1.82; n=79) and 0.81 (95% CI 0.76 to 0.87; I2=16%; τ2=0.005; 95% PI 0.69 to 0.95; n=41) for BoE pairs where the cohort study effect estimate was <1 and ≥1, respectively (Additional file 1: Fig. S7).

Discussion

Summary of findings

This large meta-epidemiological study identified and compared empirical data investigating the same medical research question to determine the extent to which estimates of BoE from RCTs and cohort studies are in agreement. Overall, 129 BoE pairs derived from 64 systematic reviews were included in the analyses. Only two BoE pairs were rated as “more or less identical” according to PI/ECO-similarity. For binary outcomes, the pooled RoR showed that on average, the extent of deviations towards larger and smaller effect estimates in BoE from RCTs versus cohort studies was almost identical. Differences of effect estimates between the two BoE for continuous outcomes were mostly small. Subgroup analyses by intervention type, type of effect measure, and outcome category showed that on average, there was little indication of overall differences between both BoE (with the exception of subgroups for ORs and CVD outcomes). Even though the pooled RoR showed that on average effect estimates did not differ, this does not preclude important differences in individual comparisons and/or studies.

Pooling RoRs from BoE-pairs with pharmacological interventions resulted in high statistical heterogeneity. The pooled RoR was similar to the main analysis in BoE pairs with a higher and lower degree of PI/ECO-similarity between both BoE. However, when pooling RoRs, statistical heterogeneity was highest across BoE pairs with the most dissimilar PI/ECO and PIs were substantially wider. Analysis of the pooled RoR by direction of effect in cohort studies indicated differences between both study types. Post hoc analyses revealed that statistical heterogeneity was higher across meta-analyses from “broadly similar” than “similar but not identical” BoE pairs, and higher across cohort studies compared to RCTs.

Comparison with other studies

General medical field

The Cochrane review by Anglemyer et al. [18] evaluated the agreement of effect estimates between RCTs and observational studies in a sample of methodological reviews. Across nine reviews with specific estimates for RCTs versus cohort studies, they computed a pooled RoR of 1.04 (95% CI 0.89 to 1.21), which was nearly identical to our pooled RoR of 1.04 (95% CI 0.97 to 1.11). In the RCT versus cohort analysis, the overall difference of effect estimates was small for seven of nine studies; two studies [101, 102] showed discordance in different directions with a RoR of 0.71 and 3.58, respectively. Anglemyer et al. [18] concluded that on average, the difference of effect estimates between observational studies and RCTs is negligible and proposed that future work should explore factors other than study design alone that could explain differences of effect estimates. In contrast to Anglemyer et al. [18], we performed more detailed data extraction, investigated PI/ECO-similarity degree, and calculated PIs. This allowed us to better understand potential differences. We evaluated statistical heterogeneity on different levels and showed that across the included meta-analyses as well as within the pooled RoR, median statistical heterogeneity and PI were highest across PI/ECO-dissimilar BoE-pairs, and higher across cohort studies compared to RCTs. Further, analysis by each PI/ECO-domain showed that differences of interventions were the main drivers towards disagreement; within the category “similar but not identical,” meta-regression showed that the average effects on the pooled RoR resulting from differences in populations, interventions, and comparators were comparably large, albeit not statistically significant.

Other research fields

Hong et al. [103] conducted a meta-epidemiological study comparing 74 pairs of summary effect estimates from RCTs and observational studies in the field of pharmacology. On average, differences were small albeit with considerable between-study variability, which is in line with our findings. Anglemyer et al. [18] showed differences between RCTs and all observational BoE for pharmacological studies (RoR: 1.17, 95% CI 0.95 to 1.43). In contrast, in our analysis, the pooled RoR for pharmacological BoE pairs was similar to the main analysis (RoR: 1.04, 95% CI 0.89 to 1.21). However, in stratified analyses, PI/ECO-similarity degree was an important driver for discordance across pharmacological BoE pairs: for “similar but not identical” BoE-pairs, the RoR was 1.20 and for “broadly similar” BoE-pairs, the RoR was 0.79, with considerable statistical heterogeneity (I2=67% and 69%, respectively). We found important differences of interventions in “broadly similar” BoE pairs. For example, early interventions at high CD4-cell counts with antiretroviral therapy in RCTs may be more likely to prevent human immunodeficiency virus infection than interventions at various disease stages in cohort studies [77]. Also, exposure to digoxin after myocardial infarction (MI) can increase mortality whereas in chronic heart failure (CHF) with sinus rhythm the effect on mortality is known to be more neutral [104, 105]. Hence, RCTs can show lower mortality when including populations with CHF and sinus rhythm than cohort studies that include MI survivors [96]. From BoE pairs rated as “similar but not identical,” many were from the cardiovascular field [40, 47, 48, 53, 96]. BoE from both RCTs and cohort studies often included mixed populations with acute and non-acute CVD [40, 47, 48]; this drives PI/ECO-dissimilarity and may increase statistical heterogeneity.
A recent meta-epidemiological study has shown that differences in effect estimates between nutrition RCTs and cohort studies were mainly driven by dissimilarities in population, intervention or exposure, comparator, and outcome [20]. Franklin et al. [106] emulated ten selected pharmacological RCTs using observational data sets. For nine included RCT emulations, differences of effect estimates were within the range of random variation. Disagreement was largest in comparisons with active comparators in observational data and placebo in RCTs. The authors concluded that similar active comparators in RCTs and observational studies increase the probability of agreement and stressed that different methods have a substantial impact on the finding of agreement.

Potential implications

RCTs are considered the gold standard to evaluate causal inference for medical interventions [13]. Due to a variety of reasons such as low external validity [7, 9] and limited availability of RCTs [5], health care professionals and other decision-makers increasingly rely on results from observational studies. However, results from RCTs and observational studies can differ [15, 18, 107] and efforts to understand under which circumstances this occurs are ongoing [106]. Our study provides valuable insights into the field of general and internal medicine, but also into other important research fields such as public health. We showed that BoE from RCTs and cohort studies included in systematic reviews from high-impact factor medical journals often differ in terms of study populations (e.g., different disease status), interventions and comparators (e.g., different intervention-timing, different drugs of the same class), or outcomes (e.g., late-stage disease versus any disease). Our data highlight the importance of PI/ECO-differences, especially those of interventions, in explaining differences of effect estimates. In future work, evaluating differences in factors such as study size, follow-up time, or publication date may serve to further explore disagreement between the two study design types. However, other factors require equal attention. Appropriate adjustment for confounding is a necessary precondition to consider results from observational studies and residual confounding remains a major concern [108]. To deal with these uncertainties, evaluating the risk of bias is of great importance to assess the trustworthiness of findings. In our sample, the Cochrane risk of bias tool [97] for RCTs and the Newcastle Ottawa scale (NOS) [98] for cohort studies were mainly used, along with a variety of other instruments to rate the risk of bias/study quality.
We expect that increased use of the ROBINS-I tool [109] will make it easier to integrate both BoE in evidence syntheses and to analyze them by risk of bias and certainty of the evidence in methodological studies. The ROBINS-I tool is based on the target trial approach [110] and allows a better comparison of evidence from RCTs and observational studies. This will be useful to investigate the influence of bias on differences between findings from RCTs and cohort studies. In general, cohort studies may serve as a source of complementary or sequential information, or may even replace findings from RCTs [11]. In evidence synthesis, cohort studies are sometimes included as a complementary source of evidence to increase the precision and/or generalizability of findings [12]. However, caution is warranted when pooling both BoE since, as shown in our study, PI/ECO-differences are common between both BoE, and cohort studies showed higher statistical heterogeneity.

Strengths and limitations

Our study has several strengths. First, we included a large sample of BoE-pairs (n=129) derived from 64 systematic reviews with a high number of RCTs and cohort studies. The BoE pairs covered a broad range of medical topics and came from high-impact factor medical journals. Second, extensive data extraction, conducted by two reviewers independently and including a detailed description of the population, intervention, comparator, outcome, risk of bias ratings, and length of follow-up, allowed us to rigorously explore the clinical and design features of the included BoE. Third, our analysis included an evaluation of agreement of effect estimates across the included BoE-pairs for both binary and continuous effect estimates. We stratified the analyses by type of binary effect measure, intervention-type, and outcome category. For the first time in the general medical field, we implemented an approach that allowed us to explore the influence of PI/ECO-differences on the disagreement of effect estimates.

Several limitations should be considered as well. First, meta-epidemiologic studies such as ours are based on an observational analysis and therefore show only non-causal associations [111, 112]. Factors such as publication date can act as meta-confounders. Further, we did not incorporate the risk of bias/study quality and certainty of the evidence into the quantitative analysis, since the tools used by the systematic review authors were highly heterogeneous and the corresponding information was often not reported sufficiently in the systematic reviews. However, our sample offers some indirect evidence on bias: we showed that on average the effect estimates were in agreement (as shown by the pooled RoR), making systematic bias towards smaller or larger effect estimates unlikely. Potential bias may also exist in individual BoE pairs and influence the RoRs in addition to PI/ECO-differences. However, we showed that PI/ECO-dissimilarities were important drivers of statistical heterogeneity and wide PIs. Further, bias may affect individual cohort studies, causing higher statistical heterogeneity in meta-analyses [13]. Accordingly, in our sample statistical heterogeneity in meta-analyses of cohort studies (median: I² = 46%) was higher than in meta-analyses of RCTs (median: I² = 8%). We did not explore whether disagreement with RCTs differed between prospective and retrospective cohort studies. The corresponding information was poorly reported, and researchers may use inconsistent nomenclature [113, 114]. Second, we did not evaluate the methodological quality of the included systematic reviews, but given that we focused on high-impact journals, we assumed that published systematic reviews are of reasonably high methodological quality.
Third, even though rating the degree of PI/ECO-similarity was performed by two reviewers using predefined criteria, this process is still partly subjective, and ratings may be too strict since only two BoE pairs were judged as “more or less identical.” Further, PI/ECO-dissimilarities in BoE pairs were usually present in more than one PI/ECO-domain; this complicates drawing conclusions about the difference of effect estimates that results from a given PI/ECO-dissimilarity in one domain (e.g., from a difference of interventions). Fourth, performing several subgroup analyses might increase the likelihood of chance findings. However, most of these analyses did not find any subgroup differences, thereby increasing our confidence in the findings of the main analysis. Further, with the exception of analysis by PI/ECO-similarity degree and intervention type, subgroup analyses were performed post hoc. However, analyses by type of effect estimate and outcome category were planned before the main analysis was conducted. Fifth, some degree of overlap between BoE cannot be ruled out since some primary studies contributed to more than one included BoE. This might have increased the precision of our findings. However, a sensitivity analysis of only one outcome per systematic review showed similar findings to the main analysis. Sixth, with regard to the search strategy, choosing another time frame may yield different results; however, we chose the dates to cover a 10-year period (January 01, 2010, to December 31, 2019). Further, the restriction to BoE pairs from the same systematic review may limit the representativeness of the sample. However, the main alternative, i.e., the inclusion of BoE from matched systematic reviews of RCTs and cohort studies, may have other drawbacks, such as impaired comparability of systematic review methodology.
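To make the agreement metrics discussed above concrete, the following minimal Python sketch shows how a ratio of ratios for a single BoE pair and the I² statistic could be computed. It is an illustration under stated assumptions, not the analysis code used in this study: the function names, the direction of the ratio (RCT estimate divided by cohort estimate), the recovery of standard errors from 95% CI limits, and the example numbers are all hypothetical.

```python
import math

# Standard normal quantile for a two-sided 95% confidence interval.
Z_95 = 1.959963984540054


def se_from_ci(lower, upper, z=Z_95):
    """Standard error of a log ratio, recovered from its CI limits."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def ratio_of_ratios(rct, cohort, z=Z_95):
    """Ratio of ratios (RoR) with CI for one BoE pair.

    Each argument is a tuple (estimate, ci_lower, ci_upper) on the ratio scale.
    Assumption for illustration: RoR = RCT estimate / cohort estimate, and the
    two pooled estimates are treated as independent.
    """
    rct_est, rct_lo, rct_hi = rct
    coh_est, coh_lo, coh_hi = cohort
    log_ror = math.log(rct_est) - math.log(coh_est)
    # Variances of independent log ratios add up.
    se = math.sqrt(se_from_ci(rct_lo, rct_hi, z) ** 2
                   + se_from_ci(coh_lo, coh_hi, z) ** 2)
    return (math.exp(log_ror),
            math.exp(log_ror - z * se),
            math.exp(log_ror + z * se))


def i_squared(q, k):
    """I² in percent from Cochran's Q and the number of studies k."""
    if q <= 0 or k < 2:
        return 0.0
    return max(0.0, (q - (k - 1)) / q) * 100


if __name__ == "__main__":
    # Hypothetical pooled risk ratios with 95% CIs (not data from this study).
    rct_boe = (0.80, 0.70, 0.92)
    cohort_boe = (0.75, 0.68, 0.83)
    ror, lo, hi = ratio_of_ratios(rct_boe, cohort_boe)
    print(f"RoR = {ror:.2f} (95% CI {lo:.2f} to {hi:.2f})")
    print(f"I² = {i_squared(20.0, 11):.0f}%")
```

An RoR whose confidence interval includes 1 would be read as no detectable difference between the two BoE for that pair; pooling such RoRs across pairs would additionally require a random-effects model, which this sketch does not cover.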

Conclusions

On average, the pooled effect estimates between RCTs and cohort studies did not differ. Statistical heterogeneity and wide PIs were mainly driven by PI/ECO-dissimilarities (i.e., clinical heterogeneity) and cohort studies. Differences of interventions were the main drivers towards disagreement; however, when focusing on BoE-pairs with at least moderate similarity (“similar but not identical” or “more or less identical”), the degree of similarity in populations, interventions, or comparators affected the average effect more than the degree of similarity in outcomes, although this difference was not statistically significant. The quantitative analysis did not assess how the risk of bias and certainty of the evidence influenced disagreement in addition to PI/ECO-dissimilarities. Upcoming meta-epidemiological studies may further explore the impact of risk of bias, certainty of the evidence, and residual confounding on differences of effect estimates between RCTs and cohort studies.

Availability of data and materials

Data are based on published meta-analyses.

Abbreviations

ACR:

Assumed control risk

BoE:

Body of evidence

CHF:

Chronic heart failure

CI:

Confidence interval

CSs:

Cohort studies

CVD:

Cardiovascular disease

DPP-4:

Dipeptidyl peptidase 4

DHA:

Docosahexaenoic acid

DMD:

Difference of mean differences

EPA:

Eicosapentaenoic acid

HR:

Hazard ratio

LDL:

Low-density lipoprotein

MD:

Mean difference

MI:

Myocardial infarction

NRSI:

Non-randomized studies of interventions

NSTE-ACS:

Non-ST elevation acute coronary syndrome

OR:

Odds ratio

PI:

Prediction interval

PI/ECO:

Population, intervention/exposure, comparator, outcome

RCT:

Randomized controlled trial

RHR:

Ratio of hazard ratios

RR:

Risk ratio

RoR:

Ratio of ratios

ROR:

Ratio of odds ratios

RRR:

Ratio of risk ratios

SGLT-2:

Sodium-glucose transporter 2

References

  1. Murad MH, Asi N, Alsawas M, Alahdab F. New evidence pyramid. Evid Based Med. 2016;21(4):125.

  2. OCEBM Levels of Evidence Working Group, Howick J, Chalmers I. In: Glasziou P, Greenhalgh T, Heneghan C, Liberati A, Moschetti I, Phillips B, Thornton H, Goddard O, Hodgkinson M, editors. The Oxford 2011 levels of evidence: Oxford Centre for Evidence-Based Medicine; 2011.

  3. Kabisch M, Ruckes C, Seibert-Grafe M, Blettner M. Randomized controlled trials: part 17 of a series on evaluation of scientific publications. Dtsch Arztebl Int. 2011;108(39):663–8.

  4. Rothman KJ, Greenland S, Lash TL. Modern epidemiology: Lippincott Williams & Wilkins; 2008.

  5. Reeves BC, Deeks JJ, Higgins JP, Shea B, Tugwell P, Wells GA, et al. Including non-randomized studies on intervention effects. In: Cochrane handbook for systematic reviews of interventions; 2019. p. 595–620.

  6. Carlson MDA, Morrison RS. Study design, precision, and validity in observational studies. J Palliat Med. 2009;12(1):77–82.

  7. Kennedy-Martin T, Curtis S, Faries D, Robinson S, Johnston J. A literature review on the representativeness of randomized controlled trial samples and implications for the external validity of trial results. Trials. 2015;16(1):495.

  8. Rochon PA, Gurwitz JH, Sykora K, Mamdani M, Streiner DL, Garfinkel S, et al. Reader's guide to critical appraisal of cohort studies: 1. Role and design. BMJ. 2005;330(7496):895–7.

  9. Averitt AJ, Weng C, Ryan P, Perotte A. Translating evidence into practice: eligibility criteria fail to eliminate clinically significant differences between real-world and study populations. NPJ Digit Med. 2020;3(1):67.

  10. Tierney JF, Stewart LA. Investigating patient exclusion bias in meta-analysis. Int J Epidemiol. 2004;34(1):79–87.

  11. Schünemann HJ, Tugwell P, Reeves BC, Akl EA, Santesso N, Spencer FA, et al. Non-randomized studies as a source of complementary, sequential or replacement evidence for randomized controlled trials in systematic reviews on the effects of interventions. Res Synth Methods. 2013;4(1):49–62.

  12. Cuello-Garcia CA, Morgan RL, Brozek J, Santesso N, Verbeek J, Thayer K, et al. A scoping review and survey provides the rationale, perceptions, and preferences for the integration of randomized and nonrandomized studies in evidence syntheses and GRADE assessments. J Clin Epidemiol. 2018;98:33–40.

  13. Higgins JP, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, et al. Cochrane handbook for systematic reviews of interventions: Wiley; 2019.

  14. Benson K, Hartz AJ. A comparison of observational studies and randomized, controlled trials. N Engl J Med. 2000;342(25):1878–86.

  15. Ioannidis JP, Haidich AB, Pappa M, Pantazis N, Kokori SI, Tektonidou MG, et al. Comparison of evidence of treatment effects in randomized and nonrandomized studies. JAMA. 2001;286(7):821–30.

  16. Lonjon G, Boutron I, Trinquart L, Ahmad N, Aim F, Nizard R, et al. Comparison of treatment effect estimates from prospective nonrandomized studies with propensity score analysis and randomized controlled trials of surgical procedures. Ann Surg. 2014;259(1):18–25.

  17. Kuss O, Legler T, Börgermann J. Treatments effects from randomized trials and propensity score analyses were similar in similar populations in an example from cardiac surgery. J Clin Epidemiol. 2011;64(10):1076–84.

  18. Anglemyer A, Horvath HT, Bero L. Healthcare outcomes assessed with observational study designs compared with those assessed in randomized trials. Cochrane Database Syst Rev. 2014;2014(4):MR000034.

  19. Murad MH, Wang Z. Guidelines for reporting meta-epidemiological methodology research. Evid Based Med. 2017;22(4):139–42.

  20. Schwingshackl L, Balduzzi S, Beyerbach J, Bröckelmann N, Werner SS, Zähringer J, et al. Evaluating agreement between bodies of evidence from randomised controlled trials and cohort studies in nutrition research: meta-epidemiological study. BMJ. 2021;374:n1864.

  21. Azad MB, Abou-Setta AM, Chauhan BF, Rabbani R, Lys J, Copstein L, et al. Nonnutritive sweeteners and cardiometabolic health: a systematic review and meta-analysis of randomized controlled trials and prospective cohort studies. Can Med Assoc J. 2017;189(28):E929.

  22. Bloomfield HE, Koeller E, Greer N, MacDonald R, Kane R, Wilt TJ. Effects on health outcomes of a mediterranean diet with no restriction on fat intake: a systematic review and meta-analysis. Ann Intern Med. 2016;165(7):491–500.

  23. Johnston BC, Zeraatkar D, Han MA, Vernooij RWM, Valli C, El Dib R, et al. Unprocessed red meat and processed meat consumption: dietary guideline recommendations from the Nutritional Recommendations (NutriRECS) Consortium. Ann Intern Med. 2019;171(10):756–64.

  24. Alipanah N, Jarlsberg L, Miller C, Linh NN, Falzon D, Jaramillo E, et al. Adherence interventions and outcomes of tuberculosis treatment: a systematic review and meta-analysis of trials and observational studies. PLoS Med. 2018;15(7):e1002595.

  25. Higgins JP, Soares-Weiser K, Lopez-Lopez JA, Kakourou A, Chaplin K, Christensen H, et al. Association of BCG, DTP, and measles containing vaccines with childhood mortality: systematic review. BMJ. 2016;355:i5170.

  26. Suthar AB, Lawn SD, del Amo J, Getahun H, Dye C, Sculier D, et al. Antiretroviral therapy for prevention of tuberculosis in adults with HIV: a systematic review and meta-analysis. PLoS Med. 2012;9(7):e1001270.

  27. Ahmad Y, Sen S, Shun-Shin MJ, Ouyang J, Finegold JA, Al-Lamee RK, et al. Intra-aortic balloon pump therapy for acute myocardial infarction: a meta-analysis. JAMA Intern Med. 2015;175(6):931–9.

  28. Barnard S, Kim C, Park MH, Ngo TD. Doctors or mid-level providers for abortion. Cochrane Database Syst Rev. 2015;(7):Cd011242. https://community.cochrane.org/stylemanual/references/reference-types/cochrane-publications.

  29. Brenner H, Stock C, Hoffmeister M. Effect of screening sigmoidoscopy and screening colonoscopy on colorectal cancer incidence and mortality: systematic review and meta-analysis of randomised controlled trials and observational studies. BMJ. 2014;348:g2467.

  30. Fenton JJ, Weyrich MS, Durbin S, Liu Y, Bang H, Melnikow J. Prostate-specific antigen-based screening for prostate cancer: evidence report and systematic review for the US preventive services task force. JAMA. 2018;319(18):1914–31.

  31. Fluri F, Engelter S, Lyrer P. Extracranial-intracranial arterial bypass surgery for occlusive carotid artery disease. Cochrane Database Syst Rev. 2010;(2):Cd005953.

  32. Gargiulo G, Sannino A, Capodanno D, Barbanti M, Buccheri S, Perrino C, et al. Transcatheter aortic valve implantation versus surgical aortic valve replacement: a systematic review and meta-analysis. Ann Intern Med. 2016;165(5):334–44.

  33. Hopley C, Stengel D, Ekkernkamp A, Wich M. Primary total hip arthroplasty versus hemiarthroplasty for displaced intracapsular hip fractures in older patients: systematic review. BMJ. 2010;340:c2332.

  34. Jefferson T, Rivetti A, Di Pietrantonj C, Demicheli V, Ferroni E. Vaccines for preventing influenza in healthy children. Cochrane Database Syst Rev. 2012;(8):Cd004879.

  35. Molnar AO, Fergusson D, Tsampalieros AK, Bennett A, Fergusson N, Ramsay T, et al. Generic immunosuppression in solid organ transplantation: systematic review and meta-analysis. BMJ. 2015;350:h3163.

  36. Nelson RL, Furner SE, Westercamp M, Farquhar C. Cesarean delivery for the prevention of anal incontinence. Cochrane Database Syst Rev. 2010;(2):Cd006756.

  37. Nieuwenhuijse MJ, Nelissen RG, Schoones JW, Sedrakyan A. Appraisal of evidence base for introduction of new implants in hip and knee replacement: a systematic review of five widely used device technologies. BMJ. 2014;349:g5133.

  38. Raman G, Moorthy D, Hadar N, Dahabreh IJ, O'Donnell TF, Thaler DE, et al. Management strategies for asymptomatic carotid stenosis: a systematic review and meta-analysis. Ann Intern Med. 2013;158(9):676–85.

  39. Schweizer M, Perencevich E, McDanel J, Carson J, Formanek M, Hafner J, et al. Effectiveness of a bundled intervention of decolonization and prophylaxis to decrease Gram positive surgical site infections after cardiac or orthopedic surgery: systematic review and meta-analysis. BMJ. 2013;346:f2743.

  40. Silvain J, Beygui F, Barthelemy O, Pollack C, Cohen M, Zeymer U, et al. Efficacy and safety of enoxaparin versus unfractionated heparin during percutaneous coronary intervention: systematic review and meta-analysis. BMJ. 2012;344:e553.

  41. Wilson A, Gallos ID, Plana N, Lissauer D, Khan KS, Zamora J, et al. Effectiveness of strategies incorporating training and support of traditional birth attendants on perinatal and maternal mortality: meta-analysis. BMJ. 2011;343:d7102.

  42. Wilson HA, Middleton R, Abram SGF, Smith S, Alvand A, Jackson WF, et al. Patient relevant outcomes of unicompartmental versus total knee replacement: systematic review and meta-analysis. BMJ. 2019;364:l352.

  43. Filippini G, Del Giovane C, Clerico M, Beiki O, Mattoscio M, Piazza F, et al. Treatment with disease-modifying drugs for people with a first clinical attack suggestive of multiple sclerosis. Cochrane Database Syst Rev. 2017;4:Cd012200.

  44. Yank V, Tuohy CV, Logan AC, Bravata DM, Staudenmayer K, Eisenhut R, et al. Systematic review: benefits and harms of in-hospital use of recombinant factor VIIa for off-label indications. Ann Intern Med. 2011;154(8):529–40.

  45. Tricco AC, Zarin W, Cardoso R, Veroniki AA, Khan PA, Nincic V, et al. Efficacy, effectiveness, and safety of herpes zoster vaccines in adults aged 50 and older: systematic review and network meta-analysis. BMJ. 2018;363:k4029.

  46. Jefferson T, Di Pietrantonj C, Al-Ansary LA, Ferroni E, Thorning S, Thomas RE. Vaccines for preventing influenza in the elderly. Cochrane Database Syst Rev. 2010;(2):Cd004876.

  47. Bellemain-Appaix A, Kerneis M, O'Connor SA, Silvain J, Cucherat M, Beygui F, et al. Reappraisal of thienopyridine pretreatment in patients with non-ST elevation acute coronary syndrome: a systematic review and meta-analysis. BMJ. 2014;349:g6269.

  48. Bellemain-Appaix A, O'Connor SA, Silvain J, Cucherat M, Beygui F, Barthelemy O, et al. Association of clopidogrel pretreatment with mortality, cardiovascular events, and major bleeding among patients undergoing percutaneous coronary intervention: a systematic review and meta-analysis. JAMA. 2012;308(23):2507–16.

  49. Bolland MJ, Leung W, Tai V, Bastin S, Gamble GD, Grey A, et al. Calcium intake and risk of fracture: systematic review. BMJ. 2015;351:h4580.

  50. Hartling L, Dryden DM, Guthrie A, Muise M, Vandermeer B, Donovan L. Benefits and harms of treating gestational diabetes mellitus: a systematic review and meta-analysis for the U.S. Preventive Services Task Force and the National Institutes of Health Office of Medical Applications of Research. Ann Intern Med. 2013;159(2):123–9.

  51. Henderson JT, Webber EM, Bean SI. Screening for asymptomatic bacteriuria in adults: updated evidence report and systematic review for the US preventive services task force. JAMA. 2019;322(12):1195–205.

  52. Kansagara D, Dyer E, Englander H, Fu R, Freeman M, Kagen D. Treatment of anemia in patients with heart disease: a systematic review. Ann Intern Med. 2013;159(11):746–57.

  53. Li L, Li S, Deng K, Liu J, Vandvik PO, Zhao P, et al. Dipeptidyl peptidase-4 inhibitors and risk of heart failure in type 2 diabetes: systematic review and meta-analysis of randomised and observational studies. BMJ. 2016;352:i610.

  54. Li L, Shen J, Bala MM, Busse JW, Ebrahim S, Vandvik PO, et al. Incretin treatment and risk of pancreatitis in patients with type 2 diabetes mellitus: systematic review and meta-analysis of randomised and non-randomised studies. BMJ. 2014;348:g2366.

  55. Nikooie R, Neufeld KJ, Oh ES, Wilson LM, Zhang A, Robinson KA, et al. Antipsychotics for treating delirium in hospitalized adults: a systematic review. Ann Intern Med. 2019;171(7):485–95.

  56. Chung M, Tang AM, Fu Z, Wang DD, Newberry SJ. Calcium intake and cardiovascular disease risk: an updated systematic review and meta-analysis. Ann Intern Med. 2016;165(12):856–66.

  57. Grant RL. Converting an odds ratio to a range of plausible relative risks for better communication of research findings. BMJ. 2014;348:f7450.

  58. Chung M, Lee J, Terasawa T, Lau J, Trikalinos TA. Vitamin D with or without calcium supplementation for prevention of cancer and fractures: an updated meta-analysis for the U.S. Preventive Services Task Force. Ann Intern Med. 2011;155(12):827–38.

  59. Vinceti M, Filippini T, Del Giovane C, Dennert G, Zwahlen M, Brinkman M, et al. Selenium for preventing cancer. Cochrane Database Syst Rev. 2018;1:Cd005195.

  60. Pittas AG, Chung M, Trikalinos T, Mitri J, Brendel M, Patel K, et al. Systematic review: vitamin D and cardiometabolic outcomes. Ann Intern Med. 2010;152(5):307–14.

  61. Te Morenga L, Mallard S, Mann J. Dietary sugars and body weight: systematic review and meta-analyses of randomised controlled trials and cohort studies. BMJ. 2013;346:e7492.

  62. Toledo E, Salas-Salvadó J, Donat-Vargas C, Buil-Cosiales P, Estruch R, Ros E, et al. Mediterranean diet and invasive breast cancer risk among women at high cardiovascular risk in the PREDIMED trial: a randomized clinical trial. JAMA Intern Med. 2015;175(11):1752–60.

  63. Thomson CA, Van Horn L, Caan BJ, Aragaki AK, Chlebowski RT, Manson JE, et al. Cancer incidence and mortality during the intervention and postintervention periods of the women’s health initiative dietary modification trial. Cancer Epidemiol Biomark Prev. 2014;23(12):2924–35.

  64. Howard BV, Van Horn L, Hsia J, Manson JE, Stefanick ML, Wassertheil-Smoller S, et al. Low-fat dietary pattern and risk of cardiovascular disease: the women’s health initiative randomized controlled dietary modification trial. JAMA. 2006;295(6):655–66.

  65. Margolis KL, Ray RM, Horn LV, Manson JE, Allison MA, Black HR, et al. Effect of calcium and vitamin D supplementation on blood pressure. Hypertension. 2008;52(5):847–55.

  66. Trivedi DP, Doll R, Khaw KT. Effect of four monthly oral vitamin D3 (cholecalciferol) supplementation on fractures and mortality in men and women living in the community: randomised double blind controlled trial. BMJ. 2003;326(7387):469.

  67. Hüpfl M, Selig HF, Nagele P. Chest-compression-only versus standard cardiopulmonary resuscitation: a meta-analysis. Lancet. 2010;376(9752):1552–7.

  68. Altman DG, Bland JM. Interaction revisited: the difference between two estimates. BMJ. 2003;326(7382):219.

  69. Riley RD, Higgins JP, Deeks JJ. Interpretation of random effects meta-analyses. BMJ. 2011;342:d549.

  70. Higgins JP, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analyses. BMJ. 2003;327(7414):557–60.

  71. Veroniki AA, Jackson D, Viechtbauer W, Bender R, Bowden J, Knapp G, et al. Methods to estimate the between-study variance and its uncertainty in meta-analysis. Res Synth Methods. 2016;7(1):55–79.

  72. Paule RC, Mandel J. Consensus values and weighting factors. J Res Natl Bur Stand. 1982;87(5):377.

  73. Balduzzi S, Rücker G, Schwarzer G. How to perform a meta-analysis with R: a practical tutorial. Evid Based Ment Health. 2019;22(4):153–60.

  74. Abou-Setta AM, Beaupre LA, Rashiq S, Dryden DM, Hamm MP, Sadowski CA, et al. Comparative effectiveness of pain management interventions for hip fracture: a systematic review. Ann Intern Med. 2011;155(4):234–45.

  75. Aburto NJ, Ziolkovska A, Hooper L, Elliott P, Cappuccio FP, Meerpohl JJ. Effect of lower sodium intake on health: systematic review and meta-analyses. BMJ. 2013;346:f1326.

  76. Alexander DD, Miller PE, Van Elswyk ME, Kuratko CN, Bylsma LC. A meta-analysis of randomized controlled trials and prospective cohort studies of eicosapentaenoic and docosahexaenoic long-chain Omega-3 fatty acids and coronary heart disease risk. Mayo Clin Proc. 2017;92(1):15–29.

  77. Anglemyer A, Rutherford GW, Horvath T, Baggaley RC, Egger M, Siegfried N. Antiretroviral therapy for prevention of HIV transmission in HIV-discordant couples. Cochrane Database Syst Rev. 2013;(4):Cd009153.

  78. Chowdhury R, Stevens S, Gorman D, Pan A, Warnakula S, Chowdhury S, et al. Association between fish consumption, long chain omega 3 fatty acids, and risk of cerebrovascular disease: systematic review and meta-analysis. BMJ. 2012;345:e6698.

  79. Chowdhury R, Warnakula S, Kunutsor S, Crowe F, Ward HA, Johnson L, et al. Association of dietary, circulating, and supplement fatty acids with coronary risk: a systematic review and meta-analysis. Ann Intern Med. 2014;160(6):398–406.

  80. Chowdhury R, Kunutsor S, Vitezova A, Oliver-Williams C, Chowdhury S, Kiefte-de-Jong JC, et al. Vitamin D and risk of cause specific death: systematic review and meta-analysis of observational cohort and randomised intervention studies. BMJ. 2014;348:g1903.

  81. Ding M, Huang T, Bergholdt HK, Nordestgaard BG, Ellervik C, Qi L. Dairy consumption, systolic blood pressure, and risk of hypertension: Mendelian randomization study. BMJ. 2017;356:j1000.

  82. Jamal SA, Vandermeer B, Raggi P, Mendelssohn DC, Chatterley T, Dorgan M, et al. Effect of calcium-based versus non-calcium-based phosphate binders on mortality in patients with chronic kidney disease: an updated systematic review and meta-analysis. Lancet. 2013;382(9900):1268–77.

  83. Jin H, Leng Q, Li C. Dietary flavonoid for preventing colorectal neoplasms. Cochrane Database Syst Rev. 2012;(8):Cd009350.

  84. Keag OE, Norman JE, Stock SJ. Long-term risks and benefits associated with cesarean delivery for mother, baby, and subsequent pregnancies: systematic review and meta-analysis. PLoS Med. 2018;15(1):e1002494.

  85. Kredo T, Adeniyi FB, Bateganya M, Pienaar ED. Task shifting from doctors to non-doctors for initiation and maintenance of antiretroviral therapy. Cochrane Database Syst Rev. 2014;(7):Cd007331.

  86. Matthews A, Stanway S, Farmer RE, Strongman H, Thomas S, Lyon AR, et al. Long term adjuvant endocrine therapy and risk of cardiovascular disease in female breast cancer survivors: systematic review. BMJ. 2018;363:k3845.

  87. Menne J, Dumann E, Haller H, Schmidt BMW. Acute kidney injury and adverse renal events in patients receiving SGLT2-inhibitors: a systematic review and meta-analysis. PLoS Med. 2019;16(12):e1002983.

  88. Mesgarpour B, Heidinger BH, Roth D, Schmitz S, Walsh CD, Herkner H. Harms of off-label erythropoiesis-stimulating agents for critically ill people. Cochrane Database Syst Rev. 2017;8:Cd010969.

  89. Moberley S, Holden J, Tatham DP, Andrews RM. Vaccines for preventing pneumococcal infection in adults. Cochrane Database Syst Rev. 2013;(1):Cd000422.

  90. Navarese EP, Gurbel PA, Andreotti F, Tantry U, Jeong YH, Kozinski M, et al. Optimal timing of coronary invasive strategy in non-ST-segment elevation acute coronary syndromes: a systematic review and meta-analysis. Ann Intern Med. 2013;158(4):261–70.

  91. Ochen Y, Beks RB, van Heijl M, Hietbrink F, Leenen LPH, van der Velde D, et al. Operative treatment versus nonoperative treatment of Achilles tendon ruptures: systematic review and meta-analysis. BMJ. 2019;364:k5120.

  92. Thomas RE, Jefferson T, Lasserson TJ. Influenza vaccination for healthcare workers who work with the elderly. Cochrane Database Syst Rev. 2010;(2):Cd005187.

  93. Tickell-Painter M, Maayan N, Saunders R, Pace C, Sinclair D. Mefloquine for preventing malaria during travel to endemic areas. Cochrane Database Syst Rev. 2017;10:Cd006491.

  94. Zhang XL, Zhu L, Wei ZH, Zhu QQ, Qiao JZ, Dai Q, et al. Comparative efficacy and safety of everolimus-eluting bioresorbable scaffold versus everolimus-eluting metallic stents: a systematic review and meta-analysis. Ann Intern Med. 2016;164(11):752–63.

  95. Zhang XL, Zhu QQ, Yang JJ, Chen YH, Li Y, Zhu SH, et al. Percutaneous intervention versus coronary artery bypass graft surgery in left main coronary artery stenosis: a systematic review and meta-analysis. BMC Med. 2017;15(1):84.

  96. Ziff OJ, Lane DA, Samra M, Griffith M, Kirchhof P, Lip GY, et al. Safety and efficacy of digoxin: systematic review and meta-analysis of observational and controlled trial data. BMJ. 2015;351:h4451.

  97. Sterne JAC, Savović J, Page MJ, Elbers RG, Blencowe NS, Boutron I, et al. RoB 2: a revised tool for assessing risk of bias in randomised trials. BMJ. 2019;366:l4898.

  98. Wells GA, Shea B, O’Connell D, Peterson J, Welch V, Losos M, et al. The Newcastle-Ottawa Scale (NOS) for assessing the quality of nonrandomised studies in meta-analyses. Oxford: University of Ottawa; 2000.

  99. Guyatt G, Oxman AD, Akl EA, Kunz R, Vist G, Brozek J, et al. GRADE guidelines: 1. Introduction-GRADE evidence profiles and summary of findings tables. J Clin Epidemiol. 2011;64(4):383–94.

  100. Owens DK, Lohr KN, Atkins D, Treadwell JR, Reston JT, Bass EB, et al. AHRQ Series Paper 5: Grading the strength of a body of evidence when comparing medical interventions: Agency for Healthcare Research and Quality and the Effective Health-Care Program. J Clin Epidemiol. 2010;63(5):513–23.

  101. Bhandari M, Tornetta P, Ellis T, Audige L, Sprague S, Kuo JC, et al. Hierarchy of evidence: differences in results between non-randomized studies and randomized trials in patients with femoral neck fractures. Arch Orthop Trauma Surg. 2004;124(1):10–6.

  102. Naudet F, Maria AS, Falissard B. Antidepressant response in major depressive disorder: a meta-regression comparison of randomized controlled trials and observational studies. PLoS One. 2011;6(6):e20811. https://doi.org/10.1371/journal.pone.0020811

  103. Hong YD, Jansen JP, Guerino J, Berger ML, Crown W, Goettsch WG, et al. Comparative effectiveness and safety of pharmaceuticals assessed in observational studies compared with randomized controlled trials. BMC Med. 2021;19(1):307.

  104. Virgadamo S, Charnigo R, Darrat Y, Morales G, Elayi CS. Digoxin: a systematic review in atrial fibrillation, congestive heart failure and post myocardial infarction. World J Cardiol. 2015;7(11):808–16.

  105. Vamos M, Erath JW, Hohnloser SH. Digoxin-associated mortality: a systematic review and meta-analysis of the literature. Eur Heart J. 2015;36(28):1831–8.

  106. Franklin JM, Patorno E, Desai RJ, Glynn RJ, Martin D, Quinto K, et al. Emulating randomized clinical trials with nonrandomized real-world evidence studies. Circulation. 2021;143(10):1002–13.

  107. Peinemann F, Tushabe DA, Kleijnen J. Using multiple types of studies in systematic reviews of health care interventions – a systematic review. PLoS One. 2013;8(12):e85035.

  108. Schünemann HJ, Cuello C, Akl EA, Mustafa RA, Meerpohl JJ, Thayer K, et al. GRADE guidelines: 18. How ROBINS-I and other tools to assess risk of bias in nonrandomized studies should be used to rate the certainty of a body of evidence. J Clin Epidemiol. 2019;111:105–14.

  109. Sterne JA, Hernán MA, Reeves BC, Savović J, Berkman ND, Viswanathan M, et al. ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions. BMJ. 2016;355:i4919.

  110. Hernán MA, Robins JM. Using big data to emulate a target trial when a randomized trial is not available. Am J Epidemiol. 2016;183(8):758–64.

  111. Herbert RD. Controversy and debate on meta-epidemiology. Paper 2: meta-epidemiological studies of bias may themselves be biased. J Clin Epidemiol. 2020;123:127–30.

  112. Page MJ. Controversy and debate on meta-epidemiology. Paper 4: confounding and other concerns in meta-epidemiological studies of bias. J Clin Epidemiol. 2020;123:133–4.

  113. Vandenbroucke JP. Prospective or retrospective: what’s in a name? BMJ. 1991;302(6771):249.

  114. Bashir MM, Maskari FA, Ahmed L, Al-Rifai RH. Prospective vs retrospective cohort studies: is a consensus needed? Int J Epidemiol. 2021;50(Supplement_1):dyab168.063.

Acknowledgements

None.

Funding

Open Access funding enabled and organized by Projekt DEAL. Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)—Projektnummer 459430615.

Author information

Contributions

NB, SB, LH, JB, CK, KG, MW, JJM, and LS designed the research. NB, LH, SB, MP, and LS analyzed the data and wrote the first draft of the paper. NB, SB, LH, JB, CK, MP, MW, JJM, and LS interpreted the data, read the manuscript, and approved the final version. NB and LS are guarantors. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.

Corresponding author

Correspondence to Lukas Schwingshackl.

Ethics declarations

Ethics approval and consent to participate

Not applicable, since this study did not include human subjects.

Consent for publication

Not applicable, since this study did not include human subjects.

Competing interests

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1:

  • Appendix S1. Search strategy for systematic reviews.
  • Figure S1. Flow diagram, identification of systematic reviews.
  • Table S1. Criteria for rating PI/ECO-similarity degree.
  • Table S2. Transformations made to the original data extraction.
  • Table S3. Reasons for exclusion of systematic reviews.
  • Table S4. Characteristics of included BoE from RCTs.
  • Table S5. Certainty of the evidence and risk of bias for BoE from RCTs.
  • Table S6. Characteristics of BoE from cohort studies.
  • Table S7. Risk of bias and certainty of the evidence for BoE from cohort studies.
  • Table S8. Heat map: instruments used for the assessment of risk of bias for BoE from RCTs and cohort studies.
  • Table S9. Ratings of PI/ECO-similarity degree for included BoE-pairs.
  • Table S10. Effect estimates and statistical heterogeneity for meta-analyses of RCTs and cohort studies.
  • Figure S2a. Forest plot, analysis by population similarity degree.
  • Figure S2b. Forest plot, analysis by intervention/exposure similarity degree.
  • Figure S2c. Forest plot, analysis by comparator similarity degree.
  • Figure S2d. Forest plot, analysis by outcome similarity degree.
  • Figure S3. Forest plot, analysis by intervention-type.
  • Figure S3a. Forest plot, analysis of invasive procedures, stratified by PI/ECO-similarity degree.
  • Figure S3b. Forest plot, analysis of drugs as intervention, stratified by PI/ECO-similarity degree.
  • Figure S3c. Forest plot, analysis of nutrition as intervention, stratified by PI/ECO-similarity degree.
  • Figure S3d. Forest plot, analysis of vaccines as intervention, stratified by PI/ECO-similarity degree.
  • Figure S3e. Forest plot, analysis of birth assistance as intervention, stratified by PI/ECO-similarity degree.
  • Figure S4. Forest plot, analysis of nutrition as intervention: vitamin D/calcium as intervention vs. other nutrition-interventions.
  • Figure S5. Forest plot, analysis by outcome-category.
  • Figure S6. Sensitivity analysis: one BoE-pair per systematic review.
  • Figure S7. Sensitivity analysis by direction of cohort study summary effect estimate.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Bröckelmann, N., Balduzzi, S., Harms, L. et al. Evaluating agreement between bodies of evidence from randomized controlled trials and cohort studies in medical research: a meta-epidemiological study. BMC Med 20, 174 (2022). https://doi.org/10.1186/s12916-022-02369-2

  • DOI: https://doi.org/10.1186/s12916-022-02369-2

Keywords