The most important tasks for peer reviewers evaluating a randomized controlled trial are not congruent with the tasks most often requested by journal editors

Background The peer review process is a cornerstone of biomedical research publication. However, it does not always ensure the publication of high-quality articles. We aimed to identify and sort, according to their importance, all tasks that are expected of peer reviewers when evaluating a manuscript reporting the results of a randomized controlled trial (RCT) and to determine which of these tasks are clearly requested by editors in their recommendations to peer reviewers. Methods We identified the tasks expected of peer reviewers from 1) a systematic review of the published literature and 2) the recommendations to peer reviewers of 171 journals (i.e., the 10 highest-impact-factor journals in each of 14 medical areas and all journals indexed in PubMed that published more than 15 RCTs over 3 months, regardless of medical area). Participants who had peer-reviewed at least one report of an RCT classified the importance of each task relative to the other tasks using a Q-sort technique. Finally, we evaluated editors' recommendations to peer reviewers to determine which tasks were clearly requested. Results The Q-sort survey was completed by 203 participants: 93 (46 %) with clinical expertise, 72 (36 %) with methodological/statistical expertise, 17 (8 %) with expertise in both areas, and 21 (10 %) with other expertise. The task rated most important by participants (evaluating the risk of bias) was clearly requested by only 5 % of editors. In contrast, the task most frequently requested by editors (providing a recommendation regarding publication) was rated in the first tertile by only 21 % of participants. Conclusions The most important tasks for peer reviewers were not congruent with the tasks most often requested by journal editors in their guidelines to reviewers.
Electronic supplementary material The online version of this article (doi:10.1186/s12916-015-0395-3) contains supplementary material, which is available to authorized users.


Background
The peer review process is a cornerstone of biomedical research publication [1]. In editorial peer review, journal editors rely on the views of independent experts when making decisions on, for example, the publication of submitted manuscripts or the presentation of reports at meetings [2]. The peer review system is considered the best available method for evaluating publications of health research [3]. Nevertheless, its effectiveness is questioned. In 2010, a report commissioned by the UK House of Commons showed that the cost to UK Higher Education Institutions in staff time was £110 to £165 million per year for peer review and up to £30 million per year for the work done by editors and editorial boards [4]. Worldwide, peer review costs an estimated £1.9 billion annually and accounts for about one-quarter of the overall costs of scholarly publishing and distribution [5]. The human cost was estimated at 15 million hours by 2010 [6]. Furthermore, the peer review system may fail to identify important flaws in the quality of manuscripts and published articles [7,8]. A recent study by Hopewell et al. [9] showed that peer reviewers could not detect important deficiencies in the reporting of the methods and results of randomized trials; the modifications requested by peer reviewers were relatively minor. For example, reviewers added information about concealment of the allocation sequence for 10 % of manuscripts, although it was not reported in more than half of the articles. Although most requested changes had a positive impact, some were inappropriate and could have had a negative impact on reporting in the final publication. Peer reviewers' assessments can also be affected by study results: peer reviewers tend to rate the methodological quality of a study with positive results higher than that of the same study reported with negative results [10,11]. Moreover, agreement between reviewers is questionable [12-14].
Finally, peer reviewers often fail to detect fraud [15] and mistakes [16].
Our hypothesis is that several tasks can be expected of peer reviewers and that peer reviewers and editors may have different views on which tasks are important.
We aimed to 1) identify all tasks that are expected of peer reviewers when evaluating a manuscript reporting a randomized controlled trial (RCT), 2) sort these tasks according to their importance, and 3) determine which tasks are clearly requested by editors in their recommendations to reviewers. We focused on the peer review of RCTs because this design is the cornerstone of therapeutic evaluation.

Methods
We proceeded in three steps: first, we identified and combined the tasks expected of peer reviewers; second, we surveyed peer reviewers of RCT reports to sort the combined tasks; and third, we assessed editors' recommendations to peer reviewers to determine which of the identified tasks are clearly requested.

Identification of tasks expected from peer reviewers
To identify the tasks expected of peer reviewers, we systematically reviewed the published literature and searched for the recommendations to reviewers in a sample of journals.

Systematic review of the published literature
We systematically searched MEDLINE via PubMed, Google Scholar, and EMBASE for all articles related to the peer review process and reporting the tasks expected of peer reviewers, published in English during the last 10 years with an abstract available. The search strategy used the keywords "peer review" OR "peer-review" (search date: March 18, 2014). One researcher (AC) read all titles and abstracts and all text fragments from full-text articles. If a study was potentially eligible, the full-text article was retrieved. We included articles in the field of biomedical research dedicated to the peer review process and specifically pertaining to the tasks expected of peer reviewers.

Journal recommendations to reviewers
We selected a sample of journals publishing full-text articles in English, aiming for a large variety of journals publishing RCT results. For this purpose, we proceeded in two steps. 1) We searched PubMed with the keywords "randomized controlled trial" and the following filters: study type RCT, English language, human studies, published between January 1 and March 31, 2013, and with an abstract available (search date February 2, 2014). All citations retrieved were exported to an EndNote file. We selected all journals, whatever the category (Additional file 1), that had published at least 15 reports of RCTs indexed in PubMed during the 3-month period. 2) We arbitrarily selected, in the Web of Science (Journal Citation Reports), 14 categories among the 111 existing categories in the field of (bio)medicine. We chose these categories to obtain a sample of journals covering a large variety of patients (children, adults) and treatments (drugs, surgery, rehabilitation, psychotherapy, radiotherapy, devices, etc.). The selected categories were emergency medicine, surgery, internal medicine, anesthesiology, cardiac and vascular disease, critical care medicine, hematology, infectious diseases, neurosciences, oncology, pediatrics, psychiatry, rheumatology, and clinical neurology. Then, for each category, we selected the 10 highest-impact-factor journals (Journal Citation Reports 2012) that published RCTs (Additional file 1). For this purpose, we hand-searched the last three issues of each journal (as of 2014/04/30) and retained only journals that had published at least one RCT report in those issues.
For each selected journal, one researcher (AC) searched the journal website for the recommendations posted for peer reviewers, and all editors and managing editors with an available email address were emailed to request the recommendations they provide to their peer reviewers. When recommendations were provided on both the journal website and the journal publisher's website, we combined them and considered that peer reviewers had to follow both sets of recommendations. One reminder was sent each week, up to a total of three emails.

Extraction and condensation of tasks
From all sources (articles, websites, journal editors' responses), one researcher (AC) systematically recorded all peer review tasks relevant to the assessment of an RCT report. Duplicates were deleted, and tasks were classified into the following categories used in previous work [17]: 1) logistics (i.e., how to use the manuscript review system), 2) etiquette/objectives (i.e., how to behave during the peer review process), 3) abstract (i.e., form and content), 4) introduction (i.e., context of the study), 5) rationale (i.e., interest of the study), 6) methods (i.e., quality of study methods), 7) statistics (i.e., sample size calculation or proposition of a specific statistical review), 8) results (i.e., presentation and content of results), 9) discussion (i.e., interpretation of results and limitations), 10) conclusion (i.e., whether the conclusion is supported by the results), 11) references (i.e., pertinence and presentation of references), 12) overall presentation (i.e., language, length, and organization of the article), 13) figures/tables (i.e., presentation, size, and number of figures), 14) reporting guidelines (i.e., reporting guideline checklists such as CONSORT), 15) ethics (i.e., reference to an ethics review board or respect of patient rights), and 16) fraud (i.e., risk of plagiarism or conflict of interest) (Additional file 2).
All tasks were then combined to create principal task combinations [18,19] (Fig. 1). Indeed, some of the tasks collected were defined very precisely and related to the same domain. For example, evaluating the risk of bias can imply several tasks: evaluating the quality of the randomization procedure; the blinding of patients, care providers, and outcome assessors; and the rating and handling of missing data. Similarly, assessing the quality of the randomization procedure can itself be divided into tasks such as evaluating the allocation sequence generation and the allocation concealment, which imply checking who generated the random allocation sequence, who enrolled participants, and who assigned participants to interventions; evaluating the mechanism used to implement the random allocation sequence (such as sequentially numbered containers); and assessing any steps taken to conceal the sequence until interventions were assigned. We did not want to define tasks at such a level of detail, so we condensed all tasks related to the assessment of the risk of bias into a single task. We proceeded similarly for all tasks that needed condensation.
Two researchers (AC and CB) independently combined the tasks; in case of disagreement, consensus was reached by discussion. If no consensus was reached, a third researcher (IB) helped resolve the disagreement.

Survey of peer reviewers to sort the statements related to tasks by importance
To sort the tasks expected from peer reviewers, we used Q-sort [20,21], which asks participants to sort combined tasks in relation to each other according to a predetermined distribution.

Participants
We selected participants who had peer-reviewed at least one RCT report, aiming to obtain the participation of peer reviewers with different expertise (methodological and clinical) and backgrounds. For this purpose, we used a broad recruitment strategy and contacted 1) a convenience sample of participants listed as peer reviewers by the New England Journal of Medicine, the Journal of the American Medical Association, and the Annals of the Rheumatic Diseases in the last journal issues for 2010 and 2011, who had been selected in a previous study [22], and 2) the editors of the journals selected by the website search described previously.
All participants were invited via a standardized, personalized email to participate in a survey to determine which tasks are most important when reviewing the report of an RCT. No reminder was sent after the first email. We included only participants who reported in the survey that they had peer-reviewed at least one RCT report.

Sorting tasks
With Q-sort, participants classify the importance of each task relative to other tasks [23]. We pilot-tested the survey with 10 researchers (two statisticians, six methodologists, and two clinicians) and improved the wording of some items. Sorting involved three steps. First, participants read a list of tasks identified and combined in a previous step that could be expected of a peer reviewer when evaluating the report of an RCT. Second, participants sorted the combined tasks into three categories according to their evaluation of the task's importance: 1) less important, 2) neutral, or 3) more important. Third, participants were asked to read the sorted tasks and place them on a score sheet representing a normal distribution ranging from −5, less important, to +5, more important. An example of sorting is given in Additional file 3. Finally, participants could re-evaluate their distribution and shift items. We used FlashQ Software 1.0, the Flash application for performing Q-sort online [24].
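The forced-distribution step above can be sketched in code. This is a minimal illustration, not the actual FlashQ implementation; the bin capacities below are an assumption (a symmetric, quasi-normal shape whose slots sum to the 36 tasks), used only to show how a fixed score sheet constrains the sort.

```python
# Minimal sketch of a forced Q-sort distribution check. NOT the actual
# FlashQ implementation: the bin capacities are an assumed symmetric,
# quasi-normal shape whose slots sum to the 36 tasks.
from collections import Counter

# Score-sheet positions run from -5 (less important) to +5 (more important).
BIN_CAPACITY = {-5: 1, -4: 2, -3: 3, -2: 4, -1: 5, 0: 6,
                1: 5, 2: 4, 3: 3, 4: 2, 5: 1}   # capacities sum to 36

def is_valid_sort(placements):
    """placements maps each task to its position on the -5..+5 sheet.
    A sort is valid only when every bin holds exactly its capacity,
    which is what forces the predetermined distribution."""
    counts = Counter(placements.values())
    return all(counts.get(pos, 0) == cap for pos, cap in BIN_CAPACITY.items())
```

Because every bin must be filled exactly, a participant cannot rate all tasks as "more important"; placing an extra task at +5 necessarily leaves another bin short and invalidates the sort.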

Assessment of tasks requested in editors' recommendations to peer reviewers
For all tasks identified and sorted, we systematically determined whether the tasks were clearly recommended by journal editors. One of us (AC) screened all journal websites and documents sent by editors to determine whether editors specifically request these tasks. If a request was not clear, we considered that editors did not request the task. For example, if editors requested "to evaluate the quality of the study," we considered the request vague and the task not clearly requested. However, if editors clearly requested assessment of the randomization procedure, of the strategy used to account for missing data, or of the blinding procedure, we considered the request clear, in this case a request to evaluate the risk of bias.
We also systematically recorded whether the guidelines were easily accessible (i.e., information accessible on the main page of the website), if the journal's website had a specific peer review section, and if the website included a description of the peer review process.

Statistical analysis
Quantitative data are described with means and SDs and categorical data with frequencies and percentages. Participant characteristics and classifications of tasks expected of peer reviewers are described overall and by participant expertise (i.e., clinician, methodologist, or both). The classification was established by the mean ranking and the proportion of participants who ranked the task in the first tertile of importance (i.e., ≥2 on the −5 to +5 scale). We also performed a sensitivity analysis excluding the subgroup of peer reviewers who were first contacted because they were editors; these participants were first asked to provide details of their journal's recommendations to reviewers and only then asked to rank the importance of the different tasks (Additional file 4). Statistical analysis involved use of SAS 9.3 (SAS Institute Inc., Cary, NC).
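The two summary statistics described above can be sketched as follows. This is an illustrative function with made-up scores, not the study's actual SAS code: it computes a task's mean ranking across participants and the proportion of participants placing it in the first tertile (≥2 on the −5 to +5 scale).

```python
# Illustrative sketch (made-up scores, not the study's SAS code) of the two
# summary statistics used to classify tasks: the mean ranking and the
# proportion of participants rating the task >= 2 on the -5..+5 scale.
from statistics import mean

def summarize_task(scores, tertile_cutoff=2):
    """scores: one Q-sort position per participant for a single task."""
    return {
        "mean_ranking": mean(scores),
        "first_tertile_pct": 100 * sum(s >= tertile_cutoff for s in scores) / len(scores),
    }

# Hypothetical example: eight participants scoring one task.
example = summarize_task([5, 4, 3, 2, 2, 1, 0, -1])
```

With these made-up scores, the task has a mean ranking of 2.0 and is placed in the first tertile by 62.5 % of the eight simulated participants.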

Results

Identification of tasks expected from peer reviewers
We retrieved 2,117 citations from our search for articles related to peer reviewer tasks; 98 full-text articles were selected, of which 90 were evaluated. Of these 90 articles, 48 were secondarily excluded because they did not report any recommendations or tasks dedicated to peer reviewers, 34 reported specific recommendations and tasks, and 8 referred to other sources (such as online training resources, websites, etc.) from which further data were extracted (Fig. 1). The selected articles are listed in Additional file 5.
Overall, we selected 171 journals, all of which had a website. In total, 95 (56 %) journals had no recommendations for peer reviewers available on their website or publisher interface; 34 (20 %) had recommendations only on their website, 32 (18 %) only on the publisher interface, and 10 (6 %) on both. Only 52 journals (30 %) had a specific peer reviewer section. Of the 171 journal editors contacted, 77 (45 %) responded; 69 (40 %) agreed to send us the documents they send to reviewers (Fig. 1), whereas three refused because they considered this information confidential. Among those who sent us documents, 44 (57 %) provided specific documents to their peer reviewers that were not available on their website; of these specific documents, only 22 (50 %) gave more information than the website.

Survey of peer reviewers to sort the statements related to tasks by importance

Participants
We identified 7,996 email addresses useful for inviting participants to sort tasks; 229 participants completed the Q-sort survey, of whom 26 were excluded because they had not peer-reviewed at least one RCT report. Therefore, 203 participants were included in the analysis. Participant characteristics are in Table 2. Participants were mainly located in Europe (49 %) and the USA/Canada (32.5 %) but also in South America (9.6 %) and Oceania (8.9 %). In total, 93 (45.8 %) had clinical expertise, 72 (35.5 %) were methodologists or statisticians, 17 (8.4 %) had both clinical and methodological expertise, and 21 (10.3 %) had expertise in another area (i.e., researchers, engineers, economists). Most participants (92 %) were investigators of at least one RCT. Overall, 150 participants (73.9 %) reviewed one to five RCT reports per year. One third had not received any form of training in peer review. More than half of the participants (61.1 %, n = 124) spent more than 2 h peer-reviewing a manuscript. Half regularly agreed to disclose their name when proposed or requested by editors.

Table 3 reports the tasks as sorted by participants. The nine tasks rated as most important were to 1) evaluate the risk of bias of the trial, 2) determine whether the manuscript conclusion is consistent with the results, 3) evaluate the adequacy of the statistical analyses, 4) evaluate whether the control group is appropriate, 5) check whether all outcomes are adequately reported, 6) evaluate the relevance of the primary outcome(s), 7) search for any attempt to distort the presentation or interpretation of results, 8) evaluate the reliability and validity of the outcome measures, and 9) evaluate the importance of the study. For clinicians, the 2nd and 3rd most important tasks were to evaluate the importance of the study and the reliability and validity of the outcome measures, whereas methodologists ranked these tasks 14th and 11th, respectively.
The task to evaluate the adequacy of statistical analyses was ranked 2nd by methodologists but 6th by clinicians. The sensitivity analysis excluding the answers of the 21 participants invited because they were editors showed consistent results (see Additional file 4).

Assessment of tasks requested in editors' recommendations to peer reviewers
The tasks rated in the first tertile of importance (≥2 on the −5 to +5 scale) were not completely congruent with the tasks most frequently requested by editors (Fig. 2). For example, evaluating the risk of bias was rated in the first tertile by 63 % of participants but clearly requested by only 5 % of editors. In contrast, the task most frequently requested by editors was the recommendation for publication (76 %), but this task was classified in the first tertile by only 21 % of all participants.
Among the 171 websites studied, only 76 (44 %) had recommendations for peer reviewers. These recommendations were easily accessible for only half of these websites (n = 38). One third had a specific peer review section. However, 70 % (n = 124) included a description of the peer review process.

Discussion
This study involved the identification and sorting of all tasks expected from peer reviewers when evaluating the report of an RCT. We performed a large systematic review of the literature and editors' guidelines and surveyed peer reviewers from various backgrounds who evaluated peer reviewer tasks by the Q-sort method.
Overall, we identified 221 different tasks that could be expected from peer reviewers when evaluating an RCT report. These tasks involved different levels of expertise. For example, assessing the adequacy of statistical analysis involves methodological and statistical expertise, whereas assessing the adequacy of the selection of participants and clinical setting involves content expertise. The tasks rated as most important were related to the methodology (evaluating the risk of bias of the trial, whether the control group is appropriate, and the reliability and validity of the outcome measures), statistics (evaluating the adequacy of statistical analyses), and results (determining whether the manuscript conclusions are consistent with the results and whether all outcomes are adequately reported, and searching for any attempt to distort the presentation or interpretation of results). We found some differences in how tasks were sorted according to the peer reviewers' expertise (clinicians, methodologists, or both). Therefore, peer reviewers will place different values on different tasks according to their expertise.
We found some differences between the tasks clearly requested by editors in their recommendations and the tasks rated as important by reviewers. Editors frequently ask whether an article is suitable for publication, but reviewers ranked this task 22nd among the 36 tasks. Editors gave more recommendations about the format (tables, figures, presentation) and the importance of the study than about the methodology. In contrast, the tasks considered least important by peer reviewers mainly related to the format (evaluating whether authors respect the requested format for references and the adequacy of the language). Our findings also suggest that current recommendations by editors may not be sufficiently explicit. Editors' recommendations were mainly generic and not tailored to a specific type of study design. This is probably because editors usually rely on the expertise of reviewers and expect reviewers to know which aspects are important and need to be evaluated. However, some evidence in the literature shows that the impact of peer reviewers is questionable [9, 24-29]. This situation is problematic because peer reviewers are usually not specifically trained; they have limited time for peer review and receive minimal rewards for this task. Further, they may misunderstand what they are expected to do. Moreover, there may be ambiguity regarding who should perform a specific task: some tasks, such as checking the quality of reporting or checking trial registration, may be considered by editors as a reviewer task and by reviewers as an editorial task. Editors could try to achieve consensus on what they expect from peer reviewers when evaluating the report of an RCT.
Furthermore, important and simple tasks that could easily increase a paper's quality were rated poorly by peer reviewers and rarely requested by editors. In fact, the task dedicated to trial registration (i.e., comparing the information recorded in a clinical trial register such as ClinicalTrials.gov with that reported in the manuscript) was classified 32nd among the 36 condensed items and never clearly requested by editors. This situation raises some concern because trial registration may limit the risk of selective reporting bias [30]. However, these results are consistent with other studies. In fact, Mathieu et al. [22] showed that only one third of peer reviewers examined registered trial information and reported any discrepancies to journal editors. These results could be linked to ambiguity over who, editors or reviewers, is responsible for this task. The medical community should be aware of the importance of trial registration and ensure that proper information is submitted to registries. There may be legitimate discrepancies between the manuscript and the registry record, but these should be transparently reported in the paper or in the registry. Similarly, the tasks dedicated to reporting guidelines (i.e., checking whether the items requested by the CONSORT Statement are adequately reported by authors and whether the items requested by the CONSORT Statement extensions are adequately reported when appropriate) were classified 28th and 31st, respectively, among the 36 items and requested by 29 % and 1 % of editors, respectively. Some studies have demonstrated the positive impact of reporting guidelines, such as the CONSORT guidelines, on reporting quality [31].
The ranking of tasks concerning the CONSORT Statement could explain in part why reporting according to the CONSORT guidelines is poor [7,32].
Our study has some limitations. First, we cannot know whether the peer reviewers who participated are representative of all peer reviewers. In fact, we cannot estimate the response rate because the emails sent may have been identified as spam or sent to junk mail folders. However, this survey focused on qualitative input, and it was mainly important to cover a variety of expertise. Second, our sample included only 171 journals, with a response rate of journal editors of 45 %. Third, only one author (AC) extracted data on the tasks expected of peer reviewers. However, two reviewers achieved consensus on the classification of these tasks. Fourth, our study was a snapshot of editors' recommendations, and some journals may have since updated their websites and their instructions to peer reviewers (as of 30 July 2014). Finally, although we performed a systematic methodological review, we cannot exclude that we missed some reports on this topic.

Fig. 2 Representation, for each task, of the proportion of participants rating the task in the first tertile (i.e., ≥2 on the −5 to +5 scale) and the proportion of editors requesting the task in their recommendations to peer reviewers. Tasks are sorted according to participants' mean ranking

Conclusions
The tasks considered most important by peer reviewers (i.e., those related to the methodology, statistics, and results) were frequently not clearly requested by journal editors.