- Research article
- Open Access
Systematic reviews: a cross-sectional study of location and citation counts
© Montori et al; licensee BioMed Central Ltd. 2003
Received: 22 July 2003
Accepted: 24 November 2003
Published: 24 November 2003
Systematic reviews summarize all pertinent evidence on a defined health question. They help clinical scientists to direct their research and clinicians to keep updated. Our objective was to determine the extent to which systematic reviews are clustered in a large collection of clinical journals and whether review type (narrative or systematic) affects citation counts.
We used hand searches of 170 clinical journals in the fields of general internal medicine, primary medical care, nursing, and mental health to identify review articles (year 2000). We defined 'review' as any full text article that was bannered as a review, overview, or meta-analysis in the title or in a section heading, or that indicated in the text that the intention of the authors was to review or summarize the literature on a particular topic. We obtained citation counts for review articles in the five journals that published the most systematic reviews.
Eleven percent of the journals published 80% of all systematic reviews. Impact factors were weakly correlated with the publication of systematic reviews (R² = 0.075, P = 0.0035). There were more citations for systematic reviews (median 26.5, IQR 12 – 56.5) than for narrative reviews (median 8, IQR 3 – 20; P < .0001 for the difference). Systematic reviews had twice as many citations as narrative reviews published in the same journal (relative citation rate 2.0, 95% confidence interval 1.5 – 2.7).
A few clinical journals published most systematic reviews. Authors cited systematic reviews more often than narrative reviews, an indirect endorsement of the 'hierarchy of evidence'.
Evidence-based medicine (EBM) is the judicious and conscientious incorporation of the best available evidence into clinical decision making while considering the values and preferences of the patient. This definition of EBM invokes a hierarchy of evidence.
Systematic reviews of the literature occupy the highest position in currently proposed hierarchies of evidence. It is argued that systematic reviews should occupy this top position because of two fundamental premises. First, clinical reviews systematically search, identify, and summarize the available evidence that answers a focused clinical question, with particular attention to methodological quality. Second, reviews that include a meta-analysis provide precise estimates of the association or the treatment effect. Clinicians can apply the results of meta-analyses to a wide array of patients – certainly wider than those included in each of the primary studies – that do not differ importantly from those enrolled in the primary studies.
Narrative reviews are summaries of research that lack an explicit description of a systematic approach. Despite the emerging dominance of systematic reviews, narrative reviews persist. A study by Antman et al. found that narrative reviews, which frequently reflected the opinion of a single expert, lagged behind the evidence, disagreed with the existing evidence, and disagreed with other published expert opinions. Mulrow [4, 5] and later McAlister et al. [4, 5] found that these reviews lacked methods to limit the intrusion of bias in the summary or the conclusions.
Because of the importance of systematic reviews in summarizing the advances of health care knowledge, their number is growing rapidly. The Cochrane Collaboration, a world-wide enterprise to produce and disseminate systematic reviews of effectiveness, has published in excess of 1000 systematic reviews since its inception [6, 7]. Collectively, other groups and individuals have likely contributed three to five times that number in the past 20 years and these reviews are dispersed throughout the medical literature. Researchers wanting to define the frontier of current research and clinicians wanting to practice EBM should be able to reliably and quickly find all valid systematic reviews of the literature. Nonetheless, researchers have reported difficulty finding systematic reviews within the mass of biomedical literature represented in large bibliographic databases such as MEDLINE [9–12].
If systematic reviews in fact represent the best available evidence, they are likely to have great clinical importance. It follows that they should be cited often in the literature. The Institute for Scientific Information (ISI) impact factors reflect (albeit far from ideally) the prestige of a journal and the importance of the articles it publishes. The impact factor for a journal is the number of citations to articles published in the journal in the past two years, divided by the number of articles published during that period. ISI also reports the number of citations that individual articles receive. Since the impact factor relates to the overall citation performance of the articles a journal publishes and not to any individual article type, and since systematic reviews are a relatively small fraction of all articles published in journals, we did not expect a strong association between impact factors and the frequency of publication of systematic reviews. However, we hypothesized that the number of citations for systematic reviews would be greater than the number of citations for a "look alike" article, in this case, a narrative review published in the same journal.
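The impact factor definition above, and the reason a small number of well-cited systematic reviews would barely move it, can be illustrated with a minimal sketch. All counts below are hypothetical, and the function is a simplification that treats the impact factor as total citations divided by the number of articles in the two-year window.

```python
def impact_factor(citations_per_article):
    """Total citations divided by the number of articles in the window."""
    if not citations_per_article:
        raise ValueError("at least one article is required")
    return sum(citations_per_article) / len(citations_per_article)


# Hypothetical journal: 390 articles, each cited 3 times in the window.
other_articles = [3] * 390
# Hypothetical: 10 systematic reviews, each cited twice as often (6 times).
systematic_reviews = [6] * 10

# Same journal if those 10 slots held articles cited at the usual rate.
baseline = impact_factor(other_articles + [3] * 10)
with_reviews = impact_factor(other_articles + systematic_reviews)

print(f"baseline impact factor: {baseline:.3f}")
print(f"with systematic reviews: {with_reviews:.3f}")
```

Under these assumed numbers, doubling the citation rate of 2.5% of the articles changes the journal-level figure only slightly, which is consistent with expecting a weak association at the journal level.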
Thus, we sought to answer the following research questions: (1) Where are systematic reviews published? (2) What is the relation between journal impact factor and journal yield of systematic reviews? (3) Do systematic reviews receive more citations than narrative reviews?
Answers to our first question may lead to definition of journal subsets in MEDLINE within which most systematic reviews will reside. Answers to our second and third questions will indicate whether the literature reflects the hierarchy of evidence, one of the basic tenets of EBM.
The Hedges Team of the Health Information Research Unit (HIRU) at McMaster University is conducting an expansion and update of our 1991 work on search filters or 'hedges' to help clinicians, researchers, and policymakers harness high-quality and relevant information from MEDLINE. We planned to conduct the present work within the larger context of the Hedges Project prior to the onset of data collection and analyses.
The editorial group at HIRU prepares four evidence-based medical journals, the ACP Journal Club, Evidence-based Medicine, Evidence-based Nursing, and, up to 2003, Evidence-based Mental Health. These journals help keep healthcare providers up-to-date. To produce these secondary journals, the editorial staff has identified 170 journals that regularly publish clinically-relevant research in the areas of focus of these evidence-based journals (i.e., general internal medicine, family practice, nursing, and mental health). The journals evaluated for inclusion in this set were those with the highest Science Citation Index Impact Factors in each field and those that clinicians and librarians who collaborate with HIRU recommended based on their perceived yield of important papers. The editorial staff then monitors the yield of original studies and reviews of scientific merit and clinical relevance (criteria below) for each of these journals, to determine if they should be kept on the list or replaced with higher yielding nominated journals.
Study identification and classification
On an ongoing basis, six research associates review each of these journals and apply methodological criteria to each item to determine if the article is eligible for inclusion in the evidence-based publications. For the purpose of the Hedges Project (i.e., to develop search strategies for large bibliographic databases such as MEDLINE), we expanded the data collection effort and began intensive training and calibration of the research staff in 1999. In this manuscript, we report the κ statistic measuring chance-adjusted agreement between the six research assistants for each classification procedure.
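As an illustration of what chance-adjusted agreement means, the following is a minimal sketch of Cohen's κ for two raters. It is an assumption for illustration only: the text does not state which κ variant was used for the six raters, and the labels below are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Proportion of items on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical classifications: S = systematic review, N = narrative review.
rater_1 = ["S", "S", "N", "N", "S", "N", "N", "N", "S", "N"]
rater_2 = ["S", "S", "N", "N", "S", "N", "N", "S", "S", "N"]

kappa = cohen_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.2f}")
```

Here the raters agree on 9 of 10 items (0.90) while chance alone would give 0.50, so agreement is 80% of the way from chance to perfect.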
Table 1. Purpose categories: definitions and criteria of methodological rigor
Etiology (causation and safety)
Definition: Content pertains directly to determining if there is an association between an exposure and a disease or condition. The question is "What causes people to get a disease or condition?"
Methodological rigor: Observations concerned with the relationship between exposures and putative clinical outcomes; data collection is prospective; clearly identified comparison group(s); blinding of observers of outcome to exposure.
Prognosis
Definition: Content pertains directly to the prediction of the clinical course or the natural history of a disease or condition with the disease or condition existing at the beginning of the study.
Methodological rigor: Inception cohort of individuals all initially free of the outcome of interest; follow-up of at least 80% of patients until occurrence of a major study end point or to the end of the study; analysis consistent with study design.
Diagnosis
Definition: Content pertains directly to using a tool to arrive at a diagnosis of a disease or condition.
Methodological rigor: Inclusion of a spectrum of participants; objective diagnostic reference standard OR current clinical standard for diagnosis; participants received both the new test and some form of the diagnostic standard; interpretation of the diagnostic standard without knowledge of test result and vice versa; analysis consistent with study design.
Treatment
Definition: Content pertains directly to an intervention for therapy (including adverse effects studies), prevention, rehabilitation, quality improvement, or continuing medical education.
Methodological rigor: Random allocation of participants to comparison groups; outcome assessment of at least 80% of those entering the investigation accounted for in 1 major analysis at any given follow-up assessment; analysis consistent with study design.
Economics
Definition: Content pertains directly to the economics of a healthcare issue with the economic question addressed being based on the comparison of alternatives.
Methodological rigor: Question is a comparison of the alternatives; alternative services or activities compared on outcomes produced (effectiveness) and resources consumed (costs); evidence of effectiveness must be from a study of real patients that meets the above-noted criteria for diagnosis, treatment, quality improvement, or a systematic review article; effectiveness and cost estimates based on individual patient data (micro-economics); results presented in terms of the incremental or additional costs and outcomes of one intervention over another; sensitivity analysis if there is uncertainty.
Clinical prediction guide
Definition: Content pertains directly to the prediction of some aspect of a disease or condition.
Methodological rigor: Guide is generated in one or more sets of real patients (training set); guide is validated in another set of real patients (test set).
For the purposes of the Hedges Project, we defined review as any full text article that was bannered as a review, overview, or meta-analysis in the title or in a section heading, or that indicated in the text that the intention of the authors was to review or summarize the literature on a particular topic. To be considered a systematic review, the authors had to clearly state the clinical topic of the review, how the evidence was retrieved and from what sources (i.e., name the databases), and provide explicit inclusion and exclusion criteria. The absence of any one of these 3 characteristics would classify a review as a narrative review. The inter-rater agreement for this classification was almost perfect (κ = 0.92, 95% confidence interval 0.89 – 0.95).
Then, we classified all reviews by whether they were concerned with the understanding of healthcare in humans. Examples of studies that would not have a direct effect on patients or participants (and, thus, are excluded from analysis) include studies that describe the normal development of people; basic science; gender and equality studies in the health profession; or studies looking at research methodology issues. The inter-rater agreement for this classification was almost perfect (κ = 0.87, 95% confidence interval 0.89 – 0.96).
A third level of classification placed reviews in purpose categories (i.e., what question(s) are the investigators addressing) that we defined for the Hedges Project and included etiology (causation and safety), prognosis, diagnosis, treatment, economics, clinical prediction guides, and qualitative (Table 1). The inter-rater agreement for this classification was 81% beyond chance (κ = 0.81, 95% confidence interval 0.79 – 0.84).
A fourth level of classification graded reviews for methodological rigor placing them in pass and fail categories. To pass, the review should include a statement of the clinical topic (i.e., a focused review question); explicit statements of the inclusion and exclusion criteria; a description of the search strategy and study sources (i.e., a list of the databases); and at least 1 included study that satisfied methodological rigor criteria for the purpose category (Table 1). For example, reviews of treatment interventions had to have at least one study with random allocation of participants to comparison groups and assessment of at least one clinical outcome. All narrative reviews were included in the fail category. We refer to systematic reviews that passed this methodological rigor evaluation as rigorous systematic reviews. Again, the inter-rater agreement for this classification was almost perfect (κ = 0.89, 95% confidence interval 0.78 – 0.99).
For this report, we retrieved data on review articles including a complete bibliographic citation (including journal title), the pass/fail methodological grade, and the review type (narrative or systematic review).
Impact factor and citation counts
To collect impact factor data for all 170 journals in the database, we used the ISI Journal Citation Reports (http://isiknowledge.com). We also queried the ISI Web of Science database to ascertain, as of February 2003, the number of citations to each one of the reviews in an arbitrary subset of five journals that published the most systematic reviews and are indexed journals in the ISI database.
Data were arrayed in frequency tables. We conducted nonparametric univariate analysis (Kruskal-Wallis) to assess the relationship between the number of citations and the type of review. We assessed the correlation between journal impact factor and citation counts. Then, using multiple linear regression, we determined the ability of the independent variables – methodological quality of the reviews and journal source – to predict the dependent variable, the number of citations (after log transformation). Thus, this analysis was stratified by journal to adjust not only for impact factor, but also for other journal-specific factors not captured by this measure.
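The univariate comparison described above can be sketched as follows. This is a minimal pure-Python illustration of the Kruskal-Wallis H statistic with the usual tie correction, not the software or code used in the study; the citation counts are hypothetical, and the P value helper applies only to the two-group case (chi-square approximation with 1 degree of freedom).

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic, corrected for ties."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


def p_value_two_groups(h):
    """Upper tail of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(h / 2))


# Hypothetical citation counts for two review types.
narrative = [1, 2, 3]
systematic = [4, 5, 6]
h_stat = kruskal_wallis_h(narrative, systematic)
p_val = p_value_two_groups(h_stat)
print(f"H = {h_stat:.3f}, P = {p_val:.3f}")
```

A rank-based test suits citation counts because they are typically right-skewed, which is also why the regression step uses log-transformed counts.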
What journals publish systematic reviews?
The 20 clinical journals that published the most systematic reviews in 2000. Columns: journal; No. reviews (% of all original and review articles); No. systematic reviews (% of all reviews). Journals named in the surviving rows: ARCH INTERN MED; ANN INTERN MED; J FAM PRACT; J CLIN ONCOL; J ADV NURS; J GEN INTERN MED; AM J MED.
The five clinical journals that published the most systematic reviews by purpose category in 2000. Columns: journal; No. systematic reviews; No. therapy systematic reviews (%*). Journals named in the surviving rows: ARCH INTERN MED; ANN INTERN MED; J GEN INTERN MED.
The five clinical journals that published the most systematic reviews by audience in 2000. Columns: journal; No. reviews (% of all original and review articles); No. systematic reviews (% of all reviews). Journals named in the surviving rows: J ADV NURS; PATIENT EDUC COUNS; J CLIN NURS; J PEDIATR ONCOL NURS; J NURS SCHOLAR; ARCH INTERN MED; ANN INTERN MED.
The relationship between journal impact factor and publication of systematic reviews
In the subset of 99 journals for which impact factor data were available, impact factor was significantly and weakly associated with the number of rigorous reviews published (R² = 0.075, P = 0.0035). The association was also significant and somewhat stronger in the subset of general medicine journals (No. rigorous systematic reviews = 2. + • impact factor; R² = 0.257, P = 0.0156) with all other clinical topic subsets being not significant (P > 0.05).
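To make the R² values above concrete: for a single predictor, R² is the square of the Pearson correlation and gives the share of variance in the outcome explained by a straight-line fit, so R² = 0.075 corresponds to about 7.5% of the variance. The sketch below shows the calculation on hypothetical data; the variable names and values are assumptions for illustration.

```python
def r_squared(x, y):
    """R² of a simple linear regression of y on x (squared Pearson r)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    return sxy ** 2 / (sxx * syy)


# Hypothetical journals: impact factor and number of rigorous reviews.
impact_factors = [1, 2, 3, 4]
rigorous_reviews = [1, 3, 2, 4]

r2 = r_squared(impact_factors, rigorous_reviews)
print(f"R² = {r2:.2f}")
```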
To conduct citation analysis we identified the top five journals that published the most systematic reviews (Table 2). The Cochrane Library was excluded because ISI does not track citations for Cochrane reviews. In this subset, there were 172 narrative reviews and 99 systematic reviews of which 82 were rigorous systematic reviews. For the rest of the analyses we considered the systematic reviews that did not meet methodological criteria (n = 17) in the same group as narrative reviews.
Rigorous systematic reviews were cited significantly (P < 0.0001) more often (median 26.5, IQR 12 – 56.5) than narrative reviews (median 8, IQR 3 – 20). After stratifying for journal source, review type (narrative vs. rigorous systematic review) was an independent predictor of citation counts (R² = 0.257, P < 0.0001): a rigorous systematic review had, on average, twice the number of citations as a narrative review published in the same journal (relative citation rate 2.0, 95% confidence interval 1.5 – 2.7). There was no significant interaction between journal and review type.
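Because the regression was fitted to log-transformed citation counts, a coefficient for review type becomes a ratio of citation counts once it is exponentiated. The sketch below shows that back-transformation under stated assumptions: natural logarithms, a normal-approximation confidence interval, and a hypothetical standard error of 0.15 chosen only because it gives an interval close to the one reported. None of these details are stated in the text.

```python
import math


def ratio_from_log_coefficient(beta, standard_error, z=1.96):
    """Exponentiate a log-scale coefficient and its confidence limits."""
    ratio = math.exp(beta)
    lower = math.exp(beta - z * standard_error)
    upper = math.exp(beta + z * standard_error)
    return ratio, lower, upper


# A log-scale coefficient of ln(2) corresponds to twice as many citations.
beta = math.log(2.0)
# Hypothetical standard error (an assumption, not a reported value).
standard_error = 0.15

ratio, lower, upper = ratio_from_log_coefficient(beta, standard_error)
print(f"relative citation rate {ratio:.1f} (95% CI {lower:.1f} to {upper:.1f})")
```

The interval is symmetric on the log scale and therefore asymmetric around 2.0 on the original scale, as the reported interval of 1.5 – 2.7 also is.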
Our study indicates that 11% of the 170 clinical journals we reviewed published more than 80% of all systematic reviews. Impact factor was a weaker predictor of citations than the methodological quality of the review. Among the five journals publishing most systematic reviews, and after stratifying by journal, the type of review (rigorous systematic vs. narrative) was independently associated with the number of citations. Thus, our findings are consistent with the priority given to systematic reviews in the hierarchy of evidence to support evidence-based clinical decisions.
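A concentration figure of this kind can be derived from per-journal counts by ranking journals from highest to lowest yield and finding how many are needed to reach a given share of all systematic reviews. The sketch below illustrates the calculation; the counts are hypothetical and are not the study data.

```python
def journals_needed(review_counts, share):
    """Smallest number of top-yield journals covering `share` of all reviews."""
    if not 0 < share <= 1:
        raise ValueError("share must be in (0, 1]")
    total = sum(review_counts)
    if total == 0:
        raise ValueError("no reviews to distribute")
    cumulative = 0
    # Walk journals from highest to lowest yield.
    for rank, count in enumerate(sorted(review_counts, reverse=True), start=1):
        cumulative += count
        if cumulative / total >= share:
            return rank
    return len(review_counts)


# Hypothetical systematic review counts for 10 journals.
counts = [40, 25, 20, 5, 5, 5, 0, 0, 0, 0]

needed = journals_needed(counts, 0.80)
fraction_of_journals = needed / len(counts)
print(f"{needed} of {len(counts)} journals ({fraction_of_journals:.0%}) "
      f"publish at least 80% of the reviews")
```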
Limitations and strengths of the research
Our research has some limitations. First, we did not determine the nature of the citations. That is, it is possible that certain citations pointed out a fatal flaw in the index paper. Second, of all the journals, the Cochrane Library provides the largest number of reviews. Unfortunately, the Cochrane Library is not an ISI indexed resource. Third, the New England Journal of Medicine had the highest impact factor, but no systematic reviews in 2000. Nevertheless, our results were statistically significant and did not lack statistical power. Furthermore, our results apply to most medical journals that publish systematic reviews (unlike the New England Journal of Medicine) in addition to reports using other study designs (unlike the Cochrane Library). We did not set out to evaluate the impact of these reviews on clinical practice.
Our research has several strengths. The methods we used to ascertain the database and classify the records involved highly trained personnel, independent assessments, explicit definitions, third-party arbitration of differences between reviewers, and a large and complete database. To our knowledge, this is the first paper to describe where systematic reviews are most often published in a broad range of clinical journals. Also for the first time, we evaluated and demonstrated that rigorous systematic reviews were cited more often than less rigorous and narrative reviews in the subset of journals that publish most systematic reviews, even after adjusting for journal of publication (e.g., journal impact factor). Our results are consistent with another study that also documented a weak association between journal impact factor and the methodological quality of published studies.
Meaning of the research
We can only speculate about the causes of the maldistribution of rigorous systematic reviews among a few journals, since exploration of such causes was not an objective of our study. Journal policy and author preferences may contribute to this maldistribution. The lack of systematic reviews and meta-analysis published in the New England Journal of Medicine is evidence of the effect of journal policy. Other journals, such as The Journal of the American Medical Association, Lancet, The British Medical Journal, and Annals of Internal Medicine, have published articles about systematic review methodology and reporting, and enthusiastically publish rigorous reviews of clinical importance. Authors of such reviews, naturally, may prefer to submit their reviews to journals with large circulation and impact. The relative contributions of these sources to the observed maldistribution constitute hypotheses that remain to be tested.
Given that our research design does not support causal inferences, it is unwise to derive recommendations to journal editors based on our findings. We think that journal editors interested in publishing rigorous research should prefer systematic reviews over narrative reviews. Furthermore, our research generates the hypothesis that a choice of systematic over narrative reviews may help increase a journal's impact factor. However, editors of traditional journals have other competing priorities that rely less on citation counts and more on popularity (e.g., attracting and maintaining readership, attracting advertising, and generating revenue), which may direct their choice of reviews to publish (i.e., if they perceive narrative reviews as easier to read and more attractive to their readership than systematic reviews and meta-analyses).
Future research may refine citation counting to ascertain whether the citation is positive or negative. This work will also inform our development of MEDLINE search filters for identifying systematic reviews in that database, particularly through the generation of journal subsets within the database to expedite the search.
In summary, our report identifies for researchers and clinicians the journals that are likely to publish rigorous reviews. Furthermore, rigorous systematic reviews are cited more often than narrative ones, an indirect endorsement of the hierarchy of evidence.
The National Library of Medicine, USA funded this research. In addition to the authors of this report, the personnel that contributed to this research are Angela Eady, Susan Marks, Ann McKibbon, Cindy Walker-Dilks, Stephen Walter, and Sharon Wong.
- Guyatt GH, Haynes B, Jaeschke R, Cook D, Greenhalgh T, Meade M, Green L, Naylor C, Wilson M, McAlister FA, Richardson W, Montori V, Bucher H: Introduction: The philosophy of evidence-based medicine. In: Users' Guides to the Medical Literature: A Manual of Evidence-Based Clinical Practice. Edited by: Guyatt GH, Rennie D. 2002, Chicago: American Medical Association, 121-140.
- Oxman A, Guyatt GH, Cook D, Montori V: Summarizing the evidence. In: Users' Guides to the Medical Literature: A Manual for Evidence-Based Clinical Practice. Edited by: Guyatt GH, Rennie D. 2002, Chicago: AMA Press, 155-173.
- Antman EM, Lau J, Kupelnick B, Mosteller F, Chalmers TC: A comparison of results of meta-analyses of randomized control trials and recommendations of clinical experts. Treatments for myocardial infarction. JAMA. 1992, 268 (2): 240-248. doi:10.1001/jama.268.2.240.
- Mulrow CD: The medical review article: state of the science. Ann Intern Med. 1987, 106 (3): 485-488.
- McAlister FA, Clark HD, van Walraven C, Straus SE, Lawson FM, Moher D, Mulrow CD: The medical review article revisited: has the science improved?. Ann Intern Med. 1999, 131 (12): 947-951.
- Chalmers I: The Cochrane collaboration: preparing, maintaining, and disseminating systematic reviews of the effects of health care. Ann N Y Acad Sci. 1993, 703: 156-163. discussion 163–155.
- Clarke M: The Cochrane Collaboration: providing and obtaining the best evidence about the effects of health care. Eval Health Prof. 2002, 25 (1): 8-11. doi:10.1177/0163278702025001002.
- Montori VM, Smieja M, Guyatt GH: Publication bias: a brief review for clinicians. Mayo Clin Proc. 2000, 75 (12): 1284-1288.
- Dickersin K, Higgins K, Meinert CL: Identification of meta-analyses. The need for standard terminology. Control Clin Trials. 1990, 11 (1): 52-66. doi:10.1016/0197-2456(90)90032-W.
- Hunt DL, McKibbon KA: Locating and appraising systematic reviews. Ann Intern Med. 1997, 126 (7): 532-538.
- Shojania KG, Bero LA: Taking advantage of the explosion of systematic reviews: an efficient MEDLINE search strategy. Eff Clin Pract. 2001, 4 (4): 157-162.
- White V, Glanville J, Lefebvre C, Sheldon T: A statistical approach to designing search filters to find systematic reviews: objectivity enhances accuracy. J Information Sci. 2001, 27: 357-370. doi:10.1177/0165551014233798.
- Seglen PO: Why the impact factor of journals should not be used for evaluating research. BMJ. 1997, 314 (7079): 498-502.
- Haynes RB, Wilczynski N, McKibbon KA, Walker CJ, Sinclair JC: Developing optimal search strategies for detecting clinically sound studies in MEDLINE. J Am Med Inform Assoc. 1994, 1 (6): 447-458.
- Wilczynski NL, McKibbon KA, Haynes RB: Enhancing retrieval of best evidence for health care from bibliographic databases: calibration of the hand search of the literature. Medinfo. 2001, 10 (Pt 1): 390-393.
- Lee KP, Schotland M, Bacchetti P, Bero LA: Association of journal quality indicators with methodological quality of clinical research articles. JAMA. 2002, 287 (21): 2805-2808. doi:10.1001/jama.287.21.2805.
- The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1741-7015/1/2/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.