Potential predatory and legitimate biomedical journals: can you tell the difference? A cross-sectional comparison

Background: The Internet has transformed scholarly publishing, most notably, by the introduction of open access publishing. Recently, there has been a rise of online journals characterized as ‘predatory’, which actively solicit manuscripts and charge publication fees without providing robust peer review and editorial services. We carried out a cross-sectional comparison of characteristics of potential predatory, legitimate open access, and legitimate subscription-based biomedical journals.
Methods: On July 10, 2014, scholarly journals from each of the following groups were identified – potential predatory journals (source: Beall’s List), presumed legitimate, fully open access journals (source: PubMed Central), and presumed legitimate subscription-based (including hybrid) journals (source: Abridged Index Medicus). MEDLINE journal inclusion criteria were used to screen and identify biomedical journals from within the potential predatory journals group. One hundred journals from each group were randomly selected. Journal characteristics (e.g., website integrity, look and feel, editors and staff, editorial/peer review process, instructions to authors, publication model, copyright and licensing, journal location, and contact) were collected by one assessor and verified by a second. Summary statistics were calculated.
Results: Ninety-three predatory journals, 99 open access, and 100 subscription-based journals were analyzed; exclusions were due to website unavailability. Many more predatory journals’ homepages contained spelling errors (61/93, 66%) and distorted or potentially unauthorized images (59/93, 63%) compared to open access journals (6/99, 6% and 5/99, 5%, respectively) and subscription-based journals (3/100, 3% and 1/100, 1%, respectively). Thirty-one (33%) predatory journals promoted a bogus impact metric – the Index Copernicus Value – versus three (3%) open access journals and no subscription-based journals. Nearly three quarters (n = 66, 73%) of predatory journals had editors or editorial board members whose affiliation with the journal was unverified versus two (2%) open access journals and one (1%) subscription-based journal in which this was the case. Predatory journals charge a considerably smaller publication fee (median $100 USD, IQR $63–$150) than open access journals ($1865 USD, IQR $800–$2205) and subscription-based hybrid journals ($3000 USD, IQR $2500–$3000).
Conclusions: We identified 13 evidence-based characteristics by which predatory journals may potentially be distinguished from presumed legitimate journals. These may be useful for authors who are assessing journals for possible submission or for others, such as universities evaluating candidates’ publications as part of the hiring process.


Background
The Internet has transformed scholarly publishing. It has allowed for the digitalization of content and subsequent online experimentation by publishers, enabling print journals to host content online, and set the course for online open-access publishing. Nevertheless, an unwelcome consequence of the Internet age of publishing has been the rise of so-called predatory publishing.
In the traditional subscription model of publishing, journals typically require transfer of copyright from authors for articles they publish and their primary revenue stream is through fees charged to readers to access journal content, typically subscription fees or pay-per-article charges. Open access publishing, in contrast, typically allows for authors to retain copyright, and is combined with a license (often from Creative Commons), which enables free and immediate access to published content coupled with rights of reuse [1]. Some open access journals [2] and many hybrid journals (i.e., those with some open access content and also with non-open access content) [3] use a business model that relies upon publication charges (often called article publication or processing charges, or APC) to the author or funder of the research to permit immediate and free access.
Predatory publishing is a relatively recent phenomenon that seems to be exploiting some key features of the open access publishing model. It is sustained by collecting APCs that are far lower than those found in presumably legitimate open access journals and which are not always apparent to authors prior to article submission. Jeffrey Beall, a librarian at the University of Colorado in Denver, first sounded the alarm about 'predatory journals' and coined the term. He initiated and maintains a listing of journals and publishers that he deems to be potentially, possibly, or probably predatory, called Beall's List [4] (content unavailable at the time of publishing). Their status is determined by a single person (Jeffrey Beall), against a set of evolving criteria (in its 3rd edition at the time of writing) that Beall has based largely on the Committee on Publication Ethics (COPE) Code of Conduct for Journal Editors and membership criteria of the Open Access Scholarly Publishers Association (OASPA) [5][6][7]. Others have suggested similar criteria for defining predatory journals [8,9].
The phenomenon of predatory publishing is growing and opinions on its effects are divided. Critics say that it is extremely damaging to the scientific record and must be stopped [10,11]. Others feel that, while problematic, predatory publishing is a transient state in publishing and will disappear or become obvious over time [12]. A fundamental problem of predatory journals seems to be that they collect an APC from authors without offering concomitant scholarly peer review (although many claim to [13]) that is typical of legitimate journals [14]. Additionally, they do not appear to provide typical publishing services such as quality control, licensing, indexing, and perpetual content preservation and may not even be fully open access. They tend to solicit manuscripts from authors through repeated email invitations (i.e., spam) boasting open access, rapid peer review, and praising potential authors as experts or opinion leaders [13]. These invitations may seem attractive or an easy solution to inexperienced or early career researchers who need to publish in order to advance their career, or to those desperate to get a publication accepted after a number of rejections, or to those simply not paying attention. Predatory journals may also be a particular problem in emerging markets of scientific research, where researchers face the same pressure to publish, but lack the skills and awareness to discern legitimate journals from predatory ones.
Still, many researchers and potential authors are not aware of the problem of predatory journals and may not be able to detect a predatory journal or distinguish one from a legitimate journal. In order to assist readers, potential authors, and others in discerning legitimate journals from predatory journals, it would be useful to compare characteristics from both predatory and nonpredatory journals to see how they differ.
We undertook a cross-sectional study comparing the characteristics of three types of biomedical journals, namely (1) potential predatory journals, (2) presumed legitimate, fully open access journals, and (3) presumed legitimate subscription-based biomedical journals that may have open access content (e.g., hybrid).

Design
This was a cross-sectional study.

Journal identification and selection
We searched for journals on July 10, 2014. For feasibility, only journals with English-language websites were considered for inclusion, and we set out to randomly select 100 journals within each comparison group. The following selection procedures were used to identify journals within each comparison group:
Potential predatory journals ('Predatory'): We considered all journals named on Beall's List of single publishers for potential inclusion. We applied the MEDLINE Journal Selection criteria [15]: "[Journals] predominantly devoted to reporting original investigations in the biomedical and health sciences, including research in the basic sciences; clinical trials of therapeutic agents; effectiveness of diagnostic or therapeutic techniques; or studies relating to the behavioural, epidemiological, or educational aspects of medicine." Three independent assessors (OM, DM, LS) carried out screening in duplicate. From the identified biomedical journals, a computer-generated random sample of 100 journals was selected for inclusion. Journals that were excluded during data extraction were not replaced.
Presumed legitimate fully open-access journals ('Open Access'): A computer-generated, random sample of 95 journals from those listed on PubMed Central as being full, immediate open access was included. In addition, five well-established open access journals were purposefully included: PLOS Medicine, PLOS One, PLOS Biology, BMC Medicine, and BMC Biology.
Presumed legitimate subscription-based journals ('Subscription-based'): A computer-generated, random sample of 100 journals from those listed in the Abridged Index Medicus (AIM) was included. AIM was initiated in 1970 containing a selection of articles from 100 (now 119) English-language journals, as a source of relevant literature for practicing clinicians [16]. AIM was used here since all journals in this group were initiated prior to the digital era and presumed to have maintained a partially or fully subscription-based publishing model [confirmed by us].
For all journals, their names and URLs were automatically obtained during the journal selection process and collected in Microsoft Excel. Screening and data extraction were carried out in the online study management software, Distiller SR (Evidence Partners, Ottawa, Canada). Journals with non-functioning websites at the time of data extraction or verification were excluded and not replaced.
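The computer-generated random sampling described above can be illustrated with a minimal Python sketch. It is not the authors' actual procedure: the function name, the seed, and the placeholder journal names are assumptions made for illustration, and the frame size of 156 is the number of screened biomedical journals reported for the predatory group in the Results.

```python
import random


def sample_journals(journals, k, seed):
    """Return a reproducible simple random sample of k journals, without replacement."""
    rng = random.Random(seed)  # fixed seed so the draw can be repeated (assumed value)
    return rng.sample(journals, k)


# Hypothetical sampling frame: placeholder names standing in for 156 screened journals.
frame = [f"journal_{i:03d}" for i in range(1, 157)]
selected = sample_journals(frame, k=100, seed=2014)

# Number of journals drawn and number of distinct journals among them.
print(len(selected), len(set(selected)))
```

Fixing the seed makes the selection auditable: anyone holding the same sampling frame can regenerate the identical sample.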

Data extraction process
Data were extracted by a single assessor (OM) between October 2014 and February 2015. An independent audit (done by LS) of a random 10% of the sample showed discrepancies in 34/56 items (61%) on at least one occasion. As such, we proceeded to have the entire sample verified by a second assessor. Verification was carried out in April 2015 by one of eight assessors (RB, JC, JG, DM, JR, LS, BJS, LT) with experience and expertise in various aspects of the biomedical publishing process. Any disagreements that arose during the verification process were resolved by third party arbitration (by LS or LT). It was not possible to fully blind assessors to study groups due to involvement in the journal selection process (OM, DM, LS).
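As a rough illustration of how an item-level discrepancy rate such as 34/56 might be tallied, the sketch below flags every item on which the primary extraction and the audit disagree for at least one audited journal. The data structure, item names, and values are hypothetical; the study's actual extraction was carried out in Distiller SR.

```python
def items_with_discrepancy(primary, audit):
    """Return the set of items on which the two extractions disagree at least once.

    primary, audit: dict mapping journal -> dict mapping item -> extracted value.
    Only journals present in the audit are compared.
    """
    flagged = set()
    for journal, audit_items in audit.items():
        for item, audit_value in audit_items.items():
            if primary[journal].get(item) != audit_value:
                flagged.add(item)
    return flagged


# Hypothetical toy data: three items, two audited journals.
primary = {
    "J1": {"has_issn": True, "names_eic": True, "apc_usd": 100},
    "J2": {"has_issn": False, "names_eic": True, "apc_usd": 150},
}
audit = {
    "J1": {"has_issn": True, "names_eic": False, "apc_usd": 100},
    "J2": {"has_issn": False, "names_eic": True, "apc_usd": 200},
}

flagged = items_with_discrepancy(primary, audit)
print(sorted(flagged), f"{len(flagged) / 3:.0%}")

# The reported audit result, 34 of 56 items, expressed as a percentage.
print(f"{34 / 56:.0%}")
```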

Data extraction items
Items for which data were extracted were based on a combination of items from Beall's criteria (version 2, December 2012) for determining predatory open-access publishers [6], the COPE Code of Conduct for Journal Publishers (http://publicationethics.org/resources/code-conduct), and the OASPA Membership criteria (http://oaspa.org/membership/membership-criteria/). Data for 56 items were extracted in the following nine categories: aims and scope, journal name and publisher, homepage integrity (look and feel), indexing and impact factor, editors and staff, editorial process and peer review, publication ethics and policies, publication model and copyright, and journal location and contact.

Data analysis
Data were descriptively summarized within each arm. Continuous data were summarized by medians and interquartile range (IQR); dichotomous data were summarized using proportions.
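A minimal sketch of these descriptive summaries in Python follows. The APC values are hypothetical and chosen only to exercise the function; the text does not state which quantile convention was used for the IQR, so the "inclusive" method is an assumption here. The proportions use the homepage spelling-error counts reported in this article (61/93, 6/99, 3/100).

```python
import statistics


def median_iqr(values):
    """Return (median, Q1, Q3) using the 'inclusive' quartile method (an assumption)."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q2, q1, q3


def proportion_pct(count, total):
    """Return count/total as a percentage rounded to two decimals."""
    return round(100 * count / total, 2)


# Hypothetical APC values in USD, for illustration only (not study data).
apcs = [50, 63, 80, 100, 100, 120, 150, 200]
median, q1, q3 = median_iqr(apcs)
print(median, q1, q3)

# Counts of journals with homepage spelling errors, as reported in the article.
spelling_errors = {"predatory": (61, 93), "open_access": (6, 99), "subscription": (3, 100)}
for group, (count, total) in spelling_errors.items():
    print(group, proportion_pct(count, total))
```

Different quartile conventions can shift Q1 and Q3 slightly for small samples, so the method should be stated whenever IQRs are compared across studies.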

Results
Ninety-three potential predatory journals, 99 open access journals, and 100 subscription-based journals were included in the analysis. The process of journal identification, inclusion, and exclusions within each study group is outlined in Fig. 1; 397 journals were identified as potential predatory journals. After de-duplication and screening for journals publishing biomedical content, 156 journals were identified, from which a random sample of 100 were chosen. Seven journals from the predatory group and one from the legitimate open access group were excluded during data extraction due to nonfunctional websites. No journal appeared in more than one study group.
There were four unanticipated journal exclusions during data extraction in the presumed legitimate open access and subscription-based groups for which randomly selected replacement journals were used. One journal was listed twice in the open access group and was deemed to be a magazine rather than a scientific journal. Two journals in the subscription-based journal group were deemed to be a magazine and a newsletter, respectively. The decision to exclude and replace these was made post-hoc, by agreement between LS and DM.

Homepage and general characteristics
About half of the predatory journals in our sample indicated interest in publishing non-biomedical topics (e.g., agriculture, geography, astronomy, nuclear physics) alongside biomedical topics in the stated scope of the journal and seemed to publish on a larger number of topics than non-predatory journals (Table 1). Predatory journals included pharmacology and toxicology (n = 59) in the scope of their journal four and a half times more often than open access journals (n = 13) and almost 30 times more than subscription-based journals (n = 2).
When we examined the similarity of the journal name to other existing journals (e.g., one or two words different on the first page of Google search results), we found that over half of predatory journals (n = 51, 54.84%) had names that were similar to an existing journal compared to only 17 open access journals (17.17%) and 22 subscription-based journals (22.00%) (Table 2). In all study groups, the journal name was well reflected in the website URL. For journals that named a country in the journal title, some journals named a different country in the journal contact information (11/21 (52.38%) predatory; 4/13 (30.77%) open access; 1/31 (3.23%) subscription-based) (Table 3).
There was a high prevalence of predatory journals from low- or low- to middle-income countries (LMICs) (48/64, 75.00%) compared to open access journals (18/92, 19.56%); none of the subscription-based journals listed LMIC addresses.
We assessed the integrity of the homepage by examining the content for errors (Table 4). Spelling and grammatical errors were more prevalent in predatory journals (n = 61, 65.59%) than in open access (n = 6, 6.06%) and subscription-based journals (n = 3, 3.00%). In addition, we found a higher frequency of distorted or potentially unauthorized image use (e.g., company logos

Metrics and indexing
Most subscription-based journals indicated having a journal impact factor (assumed 2-year Thomson Reuters JIF unless otherwise indicated) (n = 80, median 4.275 (IQR 2.469-6.239)) compared to less than half of open access journals (n = 38, 1.750 (1.330-2.853)) and fewer predatory journals (n = 21, 2.958 (0.500-3.742)) (Table 5). More than half of predatory journals (n = 54, 58.06%) and subscription-based journals (n = 62, 62%) mentioned other metrics.
Table excerpt: other metrics, top 5 listed (n). Predatory: Index Copernicus Value (31), Global Impact Factor (9), Scientific Journal Impact Factor (9), Scientific Journal Rankings/SciMago/Scopus (7), total citations (7). Open access: Scientific Journal Rankings/SciMago/Scopus (6), total citations (5), Index Copernicus Value (3), h-index (2), 5-year impact factor (2). Subscription-based: 5-year impact factor (27).
Table footnotes: Assessors were asked to perform a Google search of the Editor-in-Chief and two other randomly selected editors/staff/board members along with their affiliation (if provided) and make a subjective assessment of whether the names appear to be legitimate, false/made up, or used without permission. Assessments were based on searches through online profiles (e.g., LinkedIn, faculty bio) for mention of journal affiliation; categories are not distinct since judgments were based on multiple editors. The denominator of fractions indicates the number of journals where the variable concerned was relevant (Table 7).

Publication ethics and policies
We examined journals' promotion and practices around publication ethics (

Publication model, fees, and copyright
We assessed whether journals made any indication about accessibility, fees, and copyright (Table 9). Forty-

Discussion
This study demonstrates that our sample of potential predatory journals is distinct in some key areas from presumed legitimate journals and provides evidence of how they differ. While criteria have been proposed previously to characterize potential predatory journals [7], measuring each journal against a long list of criteria is not practical for the average researcher. It can be time-consuming and some criteria are not straightforward to apply, as we have learned during this study. For instance, whether or not the listed editors of a journal are real people or have real affiliations with a journal is quite subjective to assess. Another example pertains to preservation and permanent access to electronic journal content. We found that not all presumed legitimate journals made explicit statements about this; however, we know that in order to be indexed in MEDLINE, a journal must "Have an acceptable arrangement for permanent preservation of, and access to, the content" [17]. From our findings, we have developed a list of evidence-based, salient features of suspected predatory journals (Table 10) that are straightforward to assess; we describe them further below. We recognize that these criteria are likely not sensitive enough to detect all potentially illegitimate, predatory journals. However, we feel they are a good starting point.

Non-biomedical scope of interest
We found that predatory journals tend to indicate interest in publishing research that was both biomedical and non-biomedical (e.g., agriculture, geography, astrophysics) within their remit, presumably to avoid limiting submissions and increase potential revenues. While legitimate journals may do this periodically (we did not assess the scope of presumed legitimate biomedical journals), the topics usually have some relationship between them and represent a subgroup of a larger medical specialty (e.g., Law and Medicine). Authors should examine the scope and content (e.g., actual research) of the journals they intend to publish in to determine whether it is in line with what they plan to publish.

Spelling and grammar
The homepage of a journal's website may be a good initial indicator of its legitimacy. We found several homepage indicators that may be helpful in assessing a journal's legitimacy and quality. The homepages of potential predatory journals' websites contained at least 10 times more spelling and grammar errors than those of presumed legitimate journals. Such errors may be an artefact of foreign language translation into English, as the majority of predatory journals were based in countries where a non-English language is dominant. Further, legitimate publishers and journals may be more careful about such errors to maintain professionalism and a good reputation.

Fuzzy, distorted, or potentially unauthorized image
Potential predatory journals appeared to have images that were low-resolution (e.g., fuzzy around the edges) or distorted 'knock-off' versions of legitimate logos or images.

Language directed at authors
Another homepage check authors can do is to examine the actual written text to gauge the intended audience. We found that presumed legitimate journals appear to target readers with their language and content (e.g., highlighting new content), whereas potential predatory journals seem to target prospective authors by inviting submissions, promising rapid publication, and promoting different metrics (including the Index Copernicus Value).

Manuscript submission and editorial process/policies
Authors should be able to find information about what happens to their article after it is submitted. Potential predatory journals do not seem to provide much information about their operations compared to presumed legitimate journals. Furthermore, most potential predatory journals request that articles be submitted via email rather than through a submission system (e.g., Editorial Manager, ScholarOne), as presumed legitimate journals do. Typically, journals have requirements that must be met or checked by authors or the journal during submission (e.g., declaration of conflicts of interest, agreement that the manuscript adheres to authorship standards and other journal policies, plagiarism detection). When a manuscript is submitted via email, these checks are not automatic and may not ever occur. Authors should be cautious of publishing in journals that only take submissions via email and that do not appear to check manuscripts against journal policies, as such journals are likely of low quality. In addition, the email address provided by a journal seems to be a good indicator of its legitimacy. Predatory journals seem to provide non-professional or non-academic email addresses such as from providers with non-secured servers like Gmail or Yahoo.
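The contact-email check lends itself to a simple screening heuristic, sketched below in Python. This is an illustration only, not a validated classifier and not part of the study's method: the function name and the sample addresses are hypothetical, and the domain list contains just the two providers named in the text.

```python
# Free webmail domains named in the text; the list is an assumption and can be extended.
FREE_WEBMAIL_DOMAINS = {"gmail.com", "yahoo.com"}


def uses_free_webmail(email):
    """Return True if the address's domain is a listed free webmail provider."""
    domain = email.rsplit("@", 1)[-1].strip().lower()
    return domain in FREE_WEBMAIL_DOMAINS


# Hypothetical contact addresses for illustration.
for address in ["editor@gmail.com", "Editor@Yahoo.COM", "office@journal.example.org"]:
    print(address, uses_free_webmail(address))
```

A positive result is only one signal. Small or under-resourced legitimate journals may also use free email providers, so the check is best combined with the other characteristics discussed in this section.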

Very low APC and inappropriate copyright
Finally, authors should be cautious when the listed APC of a biomedical journal is under $150 USD. This is very low in comparison to presumed legitimate, fully open access biomedical journals, for which the median APC is at least 18 times more. Hybrid subscription journals charge 30 times the amount of potential predatory journals to publish and make research openly accessible. It has been suggested that hybrid journals charge a higher fee in order to maintain their 'prestige' (e.g., journals can be more selective about their content based on who is willing to pay the high fee) [18]. By contrast, extremely low APCs may simply be a way for potential predatory journals to attract as many submissions as possible in order to generate revenue and presumably to build their content and reputation. Evidently, the APC varies widely across journals, perhaps more than any other characteristic we measured. Journal APCs are constantly evolving and increasing requirements by funders to make research open access may have a drastic impact on APCs as we know them over the coming years.
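The fold differences quoted above follow directly from the median APCs reported in this article ($100 USD predatory, $1865 USD open access, $3000 USD hybrid subscription). A short Python check, with variable names chosen here for illustration:

```python
# Median APCs in USD as reported in this article.
median_apc_usd = {
    "predatory": 100,
    "open_access": 1865,
    "hybrid_subscription": 3000,
}

# Ratio of each group's median APC to the predatory median.
baseline = median_apc_usd["predatory"]
fold_vs_predatory = {group: apc / baseline for group, apc in median_apc_usd.items()}

for group, fold in fold_vs_predatory.items():
    print(f"{group}: {fold:.2f}x the predatory median")
```

The open access ratio comes to 18.65, consistent with "at least 18 times", and the hybrid ratio to 30.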
Researchers should be trained on author responsibilities, including how to make decisions about where to publish their research. Ideally, authors should start with a validated or 'white' list of acceptable journals. In addition to considering the items listed in Table 10 in their decision-making, tools to guide authors through the journal selection process have started to emerge, such as ThinkCheckSubmit (http://thinkchecksubmit.org/). Recently, COPE, OASPA, DOAJ, and WAME produced principles of transparency against which, among other measures, DOAJ assesses journals in part, before they can be listed in the database (https://doaj.org/bestpractice). We also encourage researchers to examine all journals for quality and legitimacy using the characteristics in Table 10 when making a decision on where to submit their research. As the journal landscape changes, it is no longer sufficient for authors to make assumptions about the quality of journals based on arbitrary measures, such as perceived reputation, impact factor, or other metrics, particularly in an era where bogus metrics abound or legitimate ones are being imitated.
This study examined most of Beall's criteria for identification of predatory publishers and journals together with items from COPE and OASPA. While many of the characteristics we examined were useful to distinguish predatory journals from presumed legitimate journals, there were many that do not apply or that are not unique to predatory journals. For instance, defining criteria of predatory journals [4] suggest that no single individual is named as an editor and that such journals do not list an editorial board. We found that this was not the case in over two thirds of predatory journals and, in fact, a named EIC could not be identified for 26 (13.07%) of the presumed legitimate journals in our sample. Such non-evidence-based criteria for defining journals may introduce confusion rather than clarity and distinction.
The existing designation of journals and publishers as predatory may be confusing for other reasons. For instance, more than one presumed-legitimate publisher has appeared on Beall's list [19]. In October 2015, Frontiers Media, a well-known Lausanne-based open access publisher, appeared on Beall's List [20]. Small, new, or under-resourced journals may appear to have the look and feel of a potential predatory journal because they do not have affiliations with large publishers or technologies (e.g., manuscript submission systems) or mature systems and the features of a legitimate journal. This is in line with our findings that journals from low-resourced countries (LMICs) were more often in the potentially predatory group of journals than either of the presumed-legitimate journal arms. However, this does not imply that they are necessarily predatory journals. Another limitation is that the majority of the open access biomedical journals in our sample (95%) charged an APC, while generally many open access journals do not. May 2015 was the last time that the DOAJ provided complete information regarding APCs of journals that it indexes (fully open access, excluding delayed or partial open access). At that time, approximately 32% of journals charged an APC. At the time of writing this article, approximately 40% of medical journals in DOAJ appear to charge an APC. However, these figures do not account for the hybrid-subscription journals that have made accommodations in response to open access, many of which are included in our sample of subscription-based journals. For such journals, our data and that of others [21] show that their fees appear to be substantially higher than either potential predatory or fully open access journals.
Table 10 (excerpt): The contact email address is non-professional and non-journal affiliated (e.g., @gmail.com or @yahoo.com).

In context of other research
To the best of our knowledge, this is the first comparative study of predatory journal publishing and legitimate publishing models aimed at determining how they are different and similar. Previously, Shen and Björk [22] examined a sample of about 5% of journals listed on Beall's List for a number of characteristics, including three that overlap with items for which we collected data: APC, country of publisher, and rapidity of (submission to) publishing. In large part, for the characteristics examined, our findings within the predatory journal group are very similar. For example, Shen and Björk [22] found the average APC for single publisher journals to be $98 USD, which is very similar to our results ($100 USD). They also found that 42% of single predatory journal publishers were located in India, whereas our estimates were closer to 62%. Differences between their study and ours may exist because we focused on biomedical journals while they included all subject areas.

Limitations
It was not possible to fully blind assessors to study groups since, given the expertise of team members, a minimum knowledge of non-predatory publishers was expected. In addition, we could only include items that could be assessed superficially rather than those requiring in-depth investigations for each journal. Many items can and should be investigated further.
Since some characteristics are likely purposely similar between journals (e.g., journals from all groups claim to be open access and indicate carrying out peer review) [14], and it was difficult to anticipate which, we did not carry out a logistic regression to determine whether characteristics were likely to be associated with predatory or presumed legitimate journals.

Conclusions
This research initiates the evidence-base illuminating the difference between major publishing models and, moreover, unique characteristics of potential predatory (or illegitimate) journals (Table 10).
The possibility that some journals are predatory is problematic for many stakeholders involved in research publication. Most researchers are not formally trained on publication skills and ethics, and as such may not be able to discern whether a journal is running legitimate operations or not. For early career researchers or for those who are unaware of the existence or characteristics of predatory journals, they can be difficult to distinguish from legitimate journals. However, this study indicates that predatory journals are offering at least 18-fold lower APCs than non-predatory journals, which may be attractive to uninformed authors and those with limited fiscal resources. Assuming that each journal publishes 100 articles annually, the revenues across all predatory journals would amount to at least a $100 million USD enterprise. This is a substantial amount of money being forfeited by authors, and potentially by funders and institutions, for publications that have not received legitimate professional editorial and publishing services, including indexing in databases.
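The revenue figure above is a back-of-envelope estimate, and its arithmetic can be made explicit. The sketch below assumes the median predatory APC of $100 USD reported in this article and the stated 100 articles per journal per year; the total number of predatory journals is not given in the text, so the code derives the journal count that a $100 million total would imply rather than treating it as a reported value.

```python
def annual_revenue_usd(n_journals, articles_per_journal, apc_usd):
    """Total yearly APC revenue under a flat-fee, fixed-output assumption."""
    return n_journals * articles_per_journal * apc_usd


def implied_journal_count(target_revenue_usd, articles_per_journal, apc_usd):
    """Number of journals needed to reach the target revenue."""
    return target_revenue_usd / (articles_per_journal * apc_usd)


ARTICLES_PER_JOURNAL = 100        # assumption stated in the text
MEDIAN_APC_USD = 100              # median predatory APC reported in this article
TARGET_REVENUE_USD = 100_000_000  # "at least" figure quoted in the text

implied = implied_journal_count(TARGET_REVENUE_USD, ARTICLES_PER_JOURNAL, MEDIAN_APC_USD)
print(f"Implied number of journals: {implied:,.0f}")
print(annual_revenue_usd(int(implied), ARTICLES_PER_JOURNAL, MEDIAN_APC_USD))
```

Under these assumptions, the estimate corresponds to roughly 10,000 journals each collecting about $10,000 USD per year; the total scales linearly with any of the three inputs.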
Established researchers should beware of predatory journals as well. There are numerous anecdotes about researchers (even deceased researchers [23]) who have been put on a journal's editorial board or named as an editor, who did not wish to be and who were unable to get their names delisted [24]. Aside from this potentially compromising the reputation of an individual that finds him or herself on the board, their affiliation with a potential predatory journal may confer legitimacy to the journal that is not deserved and that has the potential to confuse a naïve reader or author. As our findings indicate, this phenomenon appears to be a clear feature of predatory journals.
In addition to the costs and potential fiscal waste on publication in predatory journals, these journals do not appear to be indexed in appropriate databases to enable future researchers and other readers to consistently identify and access the research published within them. The majority of predatory journals indicated being 'indexed' in Google Scholar, which is not an indexing database. Google does not search pre-selected journals (as is the case with databases such as Medline, Web of Science, and Scopus), rather it searches the Internet for scholarly content. Some potentially predatory journals indicate being indexed in well-known biomedical databases; however, we have not verified the truthfulness of these claims by checking the databases. Nonetheless, if legitimate clinical research is being published in predatory journals and cannot be discovered, this is wasteful [25], in particular when it may impact systematic reviews. Equally, if non-peer reviewed, low quality research in predatory journals is discovered and included in a systematic review, it may pollute the scientific record. In biomedicine, this may have detrimental outcomes on patient care.

Future research
What is contained (i.e., 'published') within potential predatory journals is still unclear. To date, there has not been a large-scale evaluation of the content of predatory journals to determine whether research is being published, what types of studies predominate, and whether or not data (if any) are legitimate. In addition, we have little understanding of who is publishing in predatory journals (e.g., experience of author, geographic location) and why. Presumably, the low APC is an attractive feature; however, whether or not authors are intentionally or unintentionally publishing within these journals is critical to understanding the publishing landscape and anticipating future potential directions and considerations.
The findings presented here can facilitate education on how to differentiate between presumed legitimate journals and potential predatory journals.

Availability of data and materials
The screening and data extraction forms used, and the data generated, in this study are available from the authors on request.

Authors' contributions
DM and LS conceived of this project and drafted the protocol, with revisions by VB. RB, JC, JG, OM, DM, JR, LS, BJS, and LT were involved in the conduct of this project. LS and LT performed analysis of data. LS drafted the manuscript. All authors provided feedback on this manuscript and approved the final version for publication.

Competing interests
VB is the Chair of COPE and the Executive Director of the Australasian Open Access Strategy Group.

Consent for publication
Not applicable.

Ethics approval and consent to participate
Not applicable.

Transparency declaration
David Moher affirms that this manuscript is an honest, accurate, and transparent account of the study being reported, that no important aspects of the study have been omitted, and that any discrepancies from the study as planned (and, if relevant, registered) have been explained.
Author details