This meta-research study is ancillary to the Cochrane living systematic review and network meta-analysis (NMA) on COVID-19 (hereafter COVID-NMA) (https://covid-nma.com/, PROSPERO CRD42020182600). COVID-NMA is a living systematic review in which all available evidence related to COVID-19 is continuously collected, critically appraised, and synthesized using pairwise comparison and NMA methods. The protocol for the present study is available from the corresponding author upon request.
Our sample comprises all studies assessing preventive, therapeutic, or post-acute care interventions for COVID-19 that were published as preprints or journal articles up to 15 August 2020.
To identify eligible studies, we screened the COVID-NMA database, which provides an exhaustive archive of COVID-19 studies assessing the effect of interventions. As part of the COVID-NMA project, medRxiv and PubMed are searched daily by two independent reviewers to identify eligible studies. Secondary sources, such as the Living OVerview of Evidence (L·OVE) database, are searched as a quality control. The search strategy is provided in Additional file 1: Methods S1.
Eligibility criteria and study selection
We included all preprints and journal articles indexed in the COVID-NMA database, from inception to 15 August 2020, with the following study designs: randomized controlled trials (RCTs) and observational studies (i.e., case series, case-control studies, cohort studies, interrupted time series, and studies modeling the effects of population-level interventions). We excluded case series with < 5 participants, systematic reviews, prognosis and diagnosis studies, and modeling studies using simulation methods. We included reports on preventive, therapeutic, and post-acute care interventions, for healthy individuals, patients with COVID-19, or recovered individuals, respectively. Because the scope of the COVID-NMA is already very broad, studies assessing Traditional Chinese Medicine and other types of Traditional, Complementary and Alternative Medicine (TM/CAM) interventions were excluded. Studies not performed with humans were also excluded.
Two reviewers (T.O. and O.P.) independently screened all records. Disagreements were resolved by consensus or by consultation with a senior reviewer (I.B.).
Linking all evidence sources for each study
We sought to link preprints and journal articles reporting results from the same study. For this purpose, we used a four-step approach:
We performed a systematic search of preprints and published articles as part of the COVID-NMA (as described above). We used the same eligibility criteria for preprints and articles. One author (T.O.) downloaded these data in Excel format and used three ways to identify potential preprint-article pairs: (a) by using the Excel search function to identify articles for which the name of the article author matched that of the first author of an included preprint, and comparing the titles of potential matches to verify that they reported the same study; (b) by extracting the intervention type assessed in each preprint and article and comparing the titles of preprints and articles that assessed the same intervention to identify pairs that reported the same study; and (c) by comparing the trial registration number for preprints and articles reporting RCTs.
We performed additional searches of the Dimensions academic search engine (https://app.dimensions.ai). We used the name of the first author of each preprint to identify articles by the same author in any authorship position. We did this by downloading the Dimensions dataset in Excel format, filtering out entries that were not journal articles, and using the search function to look for the name of the preprint’s first author in the column containing all article author names. When a potential preprint–article pair was identified in this manner, we compared the titles and the names of the remaining authors to decide whether the two sources reported findings from the same study.
One author (G.C.) designed an algorithm that relies on preprint metadata obtained via the Crossref API (https://github.com/CrossRef/rest-api-doc) (e.g., author ORCIDs, author names, sequence of coauthors listed in the bylines, title words, preprint/publication timelines) to identify journal articles reporting on the same study as the included preprints. All links identified by the algorithm were then reviewed by one author (T.O.) to remove false positives. Furthermore, the accuracy of the algorithm was validated by comparing its results for 740 preprint–journal article pairs against all links established by the medRxiv platform as of 14 July 2020. The algorithm correctly linked 99.73% of the 740 pairs, with only 2 false negatives. Detailed results on the accuracy of the algorithm will be reported in a validation study [Cabanac et al., unpublished data].
Finally, we contacted by email the corresponding author of each preprint that was classified as unpublished in the preceding steps (see Additional file 1: Methods S2 for a template) to ask whether the preprint had been accepted for journal publication. For preprints that had not been accepted, we asked whether they had been submitted to a journal or whether the authors intended to submit them in the future. We sent a reminder email to authors who did not respond to our first email. We received 123 responses from the 272 authors contacted (45% response rate). No new preprint–journal article links were identified in this step.
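The matching logic behind the first steps above (first-author name, title comparison, and trial registration number) can be illustrated with a minimal sketch. It is not the algorithm used in the study: the record fields (`id`, `title`, `authors`, `registration`), the function names, and the 0.6 title-similarity cut-off are all assumptions made for illustration, and any flagged pair would still need manual review.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse non-alphanumeric runs so strings compare fairly."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def title_similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized titles."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def candidate_pairs(preprints, articles, min_title_similarity=0.6):
    """Flag preprint-article pairs for manual review.

    A pair is flagged when the trial registration numbers match, or when the
    preprint's first author appears anywhere in the article byline and the
    titles are reasonably similar. The cut-off is an illustrative assumption.
    """
    pairs = []
    for p in preprints:
        first_author = normalize(p["authors"][0])
        for a in articles:
            same_registration = (
                p.get("registration") is not None
                and p.get("registration") == a.get("registration")
            )
            author_match = first_author in {normalize(n) for n in a["authors"]}
            similar = (
                title_similarity(p["title"], a["title"]) >= min_title_similarity
            )
            if same_registration or (author_match and similar):
                pairs.append((p["id"], a["id"]))
    return pairs
```

Matching on the registration number alone is allowed here because it identifies a trial directly, whereas a shared author name is only treated as a signal when the titles also agree.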
In this report, we include preprint–journal article links identified up to 2 September 2020.
We extracted the following general characteristics of studies: study design, intervention, country of affiliation for the corresponding author, and data sharing (i.e., whether the authors were willing to share the dataset used in the study or not, from the data sharing statement).
Characteristics of evidence sources
We classified each source as a preprint or journal article. For journal articles, we manually extracted whether the journal had issued corrections or a retraction notice by searching the journal website and Retraction Watch (https://retractionwatch.com/retracted-coronavirus-covid-19-papers/). We used an algorithm to automatically extract the number of versions on the preprint server for each preprint and the date of online publication for each preprint and journal article.
Changes between and within evidence sources
We identified all studies in our sample for which there was more than one evidence source (i.e., a preprint and a journal article) or more than one version of the same evidence source (i.e., > 1 preprint version).
First, we sought to compare important evidence components between the following evidence sources:
Second, we sought to compare evidence components within evidence sources (the first preprint version versus the latest preprint version, if > 1 version was available).
To perform these comparisons, we downloaded all abovementioned sources for each study in portable document format (PDF). We entered each pair of files in PDF Converter (Enterprise 8). This software allows users to compare documents by automatically detecting and highlighting changes in text (words added or deleted between versions). We reviewed the highlighted text to identify changes in the following evidence components, which may affect systematic reviewers’ appraisal of the effect of the intervention assessed in the study:
Change in any study result. We searched for numeric changes in at least one of the following effect-size metrics: hazard ratio, odds ratio, relative risk, event rate, correlation or regression coefficient (for country-level studies), or in the statistical significance (i.e., p value changed from > to < 0.05, or vice versa), for any outcome. To characterize the magnitude of change in results, we considered the change to be important if (1) it represented an increase or decrease by ≥ 10% of the initial value in any effect estimate and/or (2) it led to a change in the p value crossing the threshold of 0.05. For evidence source pairs that had a change in results, we additionally extracted whether the sample size had changed within or between evidence sources. The sample size was defined as the number of individuals enrolled in the study or the number of countries/regions analyzed (for population-level studies of policy interventions).
Change in the study conclusion reported in the abstract. First, we assessed the conclusion to determine two aspects:
Whether the abstract conclusion was positive, neutral, or negative regarding the effect of the intervention (i.e., whether the authors focused on improvement in any efficacy outcome or reduction in harms; versus lack of impact on any outcome; versus focus on deterioration of efficacy or safety outcomes).
Whether the abstract conclusion reported uncertainty in the findings (i.e., whether the authors emphasized the need for additional studies to confirm the findings and/or used mild phrasing such as “might be effective” versus strong phrasing such as “These results prove the efficacy of the intervention”).
We considered a change in the conclusion to be any change among positive, neutral, and negative, or any change between reporting and not reporting uncertainty.
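The two decision rules above (an important change in results, and a change in the abstract conclusion) can be written out as a short sketch. The function and class names are hypothetical. Two details are assumptions not stated in the text: a p value of exactly 0.05 is treated as non-significant, and an initial estimate of zero is treated as changed whenever the new estimate differs from zero.

```python
from dataclasses import dataclass


def is_important_change(old_estimate, new_estimate, old_p=None, new_p=None,
                        rel_threshold=0.10, alpha=0.05):
    """Return True if either criterion for an important change is met.

    (1) The effect estimate moved by at least rel_threshold (10%) of its
        initial value, or
    (2) the p value crossed the alpha (0.05) threshold in either direction.
    """
    if old_estimate == 0:
        # Assumption: relative change is undefined, so any move counts.
        large_shift = new_estimate != 0
    else:
        relative_change = abs(new_estimate - old_estimate) / abs(old_estimate)
        large_shift = relative_change >= rel_threshold

    crossed = False
    if old_p is not None and new_p is not None:
        # Assumption: p < alpha is "significant"; p == alpha is not.
        crossed = (old_p < alpha) != (new_p < alpha)

    return large_shift or crossed


@dataclass(frozen=True)
class Conclusion:
    direction: str             # "positive", "neutral", or "negative"
    reports_uncertainty: bool  # True if the abstract hedges its findings


def conclusion_changed(before, after):
    """A change in direction or in the reporting of uncertainty counts."""
    return (before.direction != after.direction
            or before.reports_uncertainty != after.reports_uncertainty)
```

For example, a hazard ratio moving from 0.80 to 0.95 is a relative change of about 19% and would count as important under criterion (1), whereas a move from 0.80 to 0.82 would count only if the p value also crossed 0.05.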
Dissemination of evidence sources
To describe the dissemination of the different evidence sources, we used an algorithm designed by one of the authors (G.C.) to automatically extract the following usage data for each preprint–journal article pair: citation count, Altmetric Attention Score (extracted from the Dimensions database), and the number of PubPeer comments (extracted from the PubPeer Application Programming Interface [API]). These metrics were selected because they represent evidence dissemination by different end-user groups. Citations reflect use of an evidence source in academic communications. The Altmetric Attention Score tracks mentions of a source in the media and in online bibliographic reference managers. Finally, PubPeer data reflect attention by researchers in the form of crowdsourced peer review: commenters can write comments or ask for clarifications about the study, and the study authors can post a reply. Usage data were last updated on 21 October 2020.
We sought to identify changes in evidence in a subgroup of studies currently being used for quantitative evidence synthesis in the COVID-NMA. These are RCTs and non-randomized studies that are interrupted time series or non-randomized studies using causal inference analysis or multivariable regression adjustment, with ≥ 150 incident users.
In addition to the study results and conclusion, we searched for changes in the following methodologic components that could affect the appraisal of risk of bias [8, 9]: blinding of participants, clinicians, or outcome assessors; amount and handling of missing data; randomization process and allocation concealment (in RCTs); inclusion of participants in the study or the analysis; and statistical method used to adjust for confounders and confounders adjusted for (in non-randomized studies).
One reviewer (T.O.) extracted data for all included studies by using a structured, piloted form (see Additional file 1: Methods S3). A second reviewer (O.P.) independently extracted data from all sources for 20% of the studies for all variables except for assessment of the conclusion, which was duplicated in full. Disagreements were resolved by consensus or by consulting a senior reviewer (I.B.). The agreement between reviewers was > 80% for all variables.
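The text does not name the agreement statistic. Assuming simple percent agreement (an assumption, since a chance-corrected measure such as kappa would be computed differently), the calculation for one variable could be sketched as follows; `percent_agreement` is a hypothetical helper name.

```python
def percent_agreement(ratings_a, ratings_b):
    """Percentage of items on which two reviewers recorded the same value."""
    if len(ratings_a) != len(ratings_b):
        raise ValueError("Both reviewers must rate the same set of items.")
    if not ratings_a:
        raise ValueError("At least one rated item is required.")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return 100.0 * matches / len(ratings_a)
```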
We used descriptive statistics to describe study characteristics and changes in evidence. We summarized usage data (e.g., citation count) as median and interquartile range (IQR). Because our sample included evidence sources published over an 8-month period, we accounted for the difference in exposure time by dividing each usage metric by the number of days since the date of first online publication.
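The exposure-time adjustment and summary statistics can be sketched as below. This is an illustration under stated assumptions rather than the study's code: the helper names are hypothetical, the end of the exposure window is taken to be the date on which usage data were extracted, and the quartile method ("inclusive") is one of several conventions that the text does not specify.

```python
from datetime import date
from statistics import median, quantiles


def per_day(metric_value, published, extracted):
    """Usage metric divided by days since first online publication."""
    days = (extracted - published).days
    if days <= 0:
        raise ValueError("Extraction date must be after the publication date.")
    return metric_value / days


def median_iqr(values):
    """Return (median, (Q1, Q3)); needs at least two values."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), (q1, q3)
```

As a usage example, a source first published online on 11 October 2020 with 50 citations by 21 October 2020 has had 10 days of exposure, so `per_day(50, date(2020, 10, 11), date(2020, 10, 21))` gives 5 citations per day.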