Search strategy
We searched MEDLINE (via PubMed) to identify reports of RCTs indexed between January and December 2009 and published in the core clinical journals defined by the US National Library of Medicine and the National Institutes of Health. This subset of 119 widely read English-language journals covers all specialties of clinical medicine and the public-health sciences, includes all major medical journals, was previously known as the Abridged Index Medicus, and is listed at http://www.nlm.nih.gov/bsd/aim.html. The search strategy used the following limits: ‘randomized controlled trial’, ‘publication date from 2009/01/01 to 2009/12/31’, and ‘core clinical journals’. The search was run on 13 January 2010.
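For reproducibility, these limits correspond roughly to the following PubMed query, sketched here in R with the rentrez client (a hypothetical choice of interface, not the one used for the search; jsubsetaim is the PubMed tag for the core clinical journals subset):

    # Illustrative only: the search limits expressed as a PubMed query string.
    library(rentrez)
    query <- paste(
      "randomized controlled trial[pt]",    # publication type limit
      "AND 2009/01/01:2009/12/31[dp]",      # publication date limit
      "AND jsubsetaim"                      # core clinical journals subset
    )
    res <- entrez_search(db = "pubmed", term = query, retmax = 0)
    res$count                               # number of matching records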
Eligibility criteria and screening process
One researcher (GB) screened the titles and abstracts of the retrieved articles to identify potentially relevant articles, obtained the full texts of those articles, and assessed each full text against the inclusion criteria; a second reviewer (IB or EP) was consulted when needed. We considered only articles that were the first report of the trial results, and we excluded sub-studies of an original publication (for example, follow-up studies, trial extensions, ancillary studies, post hoc analyses, exploratory analyses, secondary analyses, reanalyses of a trial, and pooled analyses of trials).
Data collection
A standardized data-extraction form (available from the corresponding author) was generated from a review of the literature and a priori discussion. Before data extraction, the form was tested independently, as a calibration exercise, by two of the authors (GB, EP) on a separate random set of 20 articles. The ratings were reviewed and any disagreements were resolved by consensus.
The two reviewers, who were not blinded to journal name, authors, author affiliations, or funding sources, then retrieved and extracted data from the published articles. A random sample of 30 articles was reviewed for quality assurance. Inter-observer agreement in data extraction was good: the median kappa value across items was 0.68 (range 0.30 to 1.00) (see Additional file 1). For articles that raised uncertainty, for items with poor agreement, and for items related to trial design, the data were independently checked by the second reader, and discrepancies were resolved by discussion.
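For reference, item-level agreement of this kind can be quantified as Cohen's kappa. The R sketch below shows one way to compute it for a single binary extraction item; the ratings are invented, and the actual item-level values appear in Additional file 1:

    # Illustrative only: Cohen's kappa for one binary extraction item
    # rated by two readers on a sample of articles (invented ratings).
    library(irr)                                  # provides kappa2()
    reader1 <- c(1, 1, 0, 1, 0, 1, 1, 0, 1, 1)
    reader2 <- c(1, 0, 0, 1, 0, 1, 1, 1, 1, 1)
    kappa2(cbind(reader1, reader2))               # unweighted Cohen's kappa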
We extracted data on the general characteristics of the study: number of randomized groups, study design, medical area, nature of the intervention group(s), number of centers, total number of randomized participants, randomization design, funding sources, and whether the trial was registered. We also extracted methodological items: definition of the study hypothesis (the comparisons planned in the Methods section); whether baseline characteristics and outcomes were reported per group (details that would allow future meta-analyses); whether a sample-size calculation was reported; whether the sample-size calculation accounted for the multiple-arm design (either through a global sample-size calculation or through an adjustment method for multiple testing); whether an adjustment method for statistical comparisons was planned or used (either for the sample-size calculation or for the statistical analysis); and whether the title identified the trial as a multiple-arm trial. Finally, we systematically assessed selective reporting by comparing the planned comparisons (those reported in the Methods section) with the reported comparisons (those reported in the Results section) for global comparison tests (which assess differences across all groups simultaneously), pairwise comparison tests (which compare data between two groups), and pooled group analyses (which assess combined data for two or more groups).
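To make these distinctions concrete, the following R sketch on simulated data illustrates a global comparison test, Bonferroni-adjusted pairwise comparison tests, a pooled group analysis, and a pairwise sample-size calculation with a multiplicity-adjusted alpha (all values are invented):

    # Illustrative only: the comparison types assessed, on simulated data.
    set.seed(1)
    y   <- c(rnorm(30, 0), rnorm(30, 0.3), rnorm(30, 0.6))   # outcome
    arm <- factor(rep(c("A", "B", "C"), each = 30))          # three arms

    anova(lm(y ~ arm))                                # global test across all groups
    pairwise.t.test(y, arm,
                    p.adjust.method = "bonferroni")   # pairwise tests, adjusted
    pooled <- factor(ifelse(arm == "A", "A", "B+C"))  # pooled group analysis:
    t.test(y ~ pooled)                                #   arms B and C combined vs A

    # Sample size per group for one pairwise comparison, splitting the
    # overall alpha over the three pairwise tests (Bonferroni):
    power.t.test(delta = 0.5, sd = 1, power = 0.80, sig.level = 0.05 / 3)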
Statistical analysis
Because we chose a convenience sample of RCTs, we did not calculate a required sample size. Our planned analysis was descriptive and stratified by study design (parallel-arm, factorial, or crossover). Categorical variables are presented as frequencies and percentages, and quantitative variables as medians (10th and 90th percentiles). We specifically investigated comparisons that were reported as planned but were not performed, which could suggest selective reporting, and reported comparisons that were not planned, which could suggest post hoc comparisons. Data were analyzed with SAS (version 9.3 for Windows; SAS Institute, Cary, NC, USA) and R (version 2.15.1; R Foundation for Statistical Computing, Vienna, Austria).
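As an indication of how such descriptive summaries are produced, the R fragment below computes them for invented values; it is a sketch under assumed data, not the study code:

    # Illustrative only: descriptive summaries of the kind reported here.
    n_randomized <- c(45, 120, 310, 88, 1500, 64, 230, 410, 95, 760)  # invented
    quantile(n_randomized, probs = c(0.10, 0.50, 0.90))  # 10th pctl, median, 90th pctl
    design <- factor(c("parallel", "parallel", "factorial",
                       "crossover", "parallel"))          # invented
    cbind(n = table(design),
          pct = round(100 * prop.table(table(design)), 1))  # frequencies and percentages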