Fifteen years of epidemiology in BMC Medicine


BMC Medicine was launched in November 2003 as an open access, open peer-reviewed general medical journal that has a broad remit to publish “outstanding and influential research in all areas of clinical practice, translational medicine, medical and health advances, public health, global health, policy, and general topics of interest to the biomedical and sociomedical professional communities”. Here, I discuss the last 15 years of epidemiological research published by BMC Medicine, with a specific focus on how this reflects changes occurring in the field of epidemiology over this period: the impact of ‘Big Data’; the reinvigoration of debates about causality; and, as we increasingly work across and with many diverse disciplines, the use of the name ‘population health science’. Reviewing all publications from the first volume to the end of 2018, I show that most BMC Medicine papers are epidemiological in nature, and the majority of them are applied epidemiology, with few methodological papers. Good research must address important translational questions that should not be driven by the increasing availability of data, but should take appropriate advantage of it. Over the next 15 years it would be good to see more publications that integrate results from several different methods, each with different sources of bias, in a triangulation framework.


In the 15 years since BMC Medicine was launched in November 2003, epidemiology has led the challenge of ‘Big Data’ science [1], reinvigorated debates about what can legitimately be considered causes of diseases and what methods should be used to determine causality (e.g., [2, 3]), and become increasingly known as ‘population health science’ [4]. These three changes are related to each other and to broader changes in science and society, as well as being rooted in a much longer history going back decades if not centuries. I thought it would be interesting to consider how these recent changes are reflected in the last 15 years of BMC Medicine. To do this, I undertook a review of the types of studies published by BMC Medicine in the last 15 years (see Fig. 1 and Additional file 1 for the methodology used to prepare this figure). I was pleased to see that most of the published research articles were epidemiology studies (Fig. 1a; 981/1334; 74%). Most of the epidemiology papers were applied studies (Fig. 1a; 946/981; 96%). This is a common finding in general medical journals, despite the existence of several specific epidemiology journals [5]. The few papers that I considered to be methodological (Fig. 1b; 35/981; 4%) were largely concerned with methods for developing or refining tools to measure risk factors or disease outcomes (e.g., [6, 7]), rather than research into analytical or study design methods. There was little evidence that authors were using directed acyclic graphs (DAGs) to demonstrate statistical assumptions [8].
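The proportions quoted above follow directly from the counts shown in Fig. 1. As a quick arithmetic check, the minimal Python sketch below recomputes them. The variable and function names are illustrative only and are not part of the review methodology described in Additional file 1.

```python
# Counts reported in the text (Fig. 1); the names are illustrative only.
all_research_articles = 1334
epidemiology_papers = 981
applied_papers = 946
methodological_papers = 35


def percent(numerator, denominator):
    """Return a proportion as a whole-number percentage."""
    return round(100 * numerator / denominator)


# Applied and methodological papers together make up all epidemiology papers.
assert applied_papers + methodological_papers == epidemiology_papers

epidemiology_share = percent(epidemiology_papers, all_research_articles)
applied_share = percent(applied_papers, epidemiology_papers)
methodological_share = percent(methodological_papers, epidemiology_papers)

print(epidemiology_share, applied_share, methodological_share)
```

Run as written, this reproduces the 74%, 96% and 4% given in the text.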

Fig. 1

Research articles and ‘epidemiology’ research articles published in BMC Medicine, 2003–2018. (a) The proportion of all research articles that were epidemiology studies, by year. (b) The proportion of epidemiology study papers that were methodological or included any ‘omics measurements

Big data

‘Big Data’ has no clear definition, but the term can be used to refer to datasets with many participants and/or many variables. The former category includes large-scale record linkage studies; the latter includes the integration of multiple ‘omics data with socioeconomic, environmental, lifestyle and clinical data in epidemiological studies and the collection of intense, continuously measured data, such as glucose levels collected by sensors at short, regular intervals. The current BMC Medicine call for papers in this area notes: “Big Data in Medicine can be used to provide health profiles and predictive models around individual patients. The use of high-throughput data to integrate genetic and clinical inter-relationships; real-world data to infer biological principles as well as associations, trajectories and stratifications of patients; data-driven approaches for patients and digital platforms are the hope for medical problems and evidence-based medicine” [9].

However, as Saracci has eloquently highlighted, excessive claims for ‘Big Data’, such as those made in this statement, can result in ‘bigness’ overriding the key principles of epidemiology and good science. These principles include, for example, the need for data (and software) validity, replication or validation of results in independent studies and, importantly, using data to address the most relevant questions rather than ‘blind [big] data dredging’ [1]. As with other journals, BMC Medicine has published a small proportion of ‘omics studies (Fig. 1b; 77/981, or 8%, of the epidemiology papers included some ‘omics measurements), and most of these were small and had no independent replication or validation (e.g., [10,11,12]). Larger studies that did include replication (e.g., [13, 14]) have been published more recently.

Population health science

The increasing use of the term ‘population health science’ in part reflects the potential for epidemiologists to undertake population level physiology and embed this in what was previously called ‘social medicine’. This is enabled by the integration of multiple ‘omics data with socioeconomic, lifestyle and clinical data in large cohort studies. Multidisciplinary (i.e., people or groups from different disciplines working together on research projects by drawing on their specific disciplinary knowledge) and interdisciplinary (i.e., synthesising methods and knowledge from different disciplines to answer research questions) approaches are needed to realise the full potential of these data [4]. Thus, over the last 15 years, epidemiologists have increasingly learnt the theories and language of colleagues from diverse fundamental and emerging disciplines, including mathematics, biology, chemistry, data and computer science and (bio)informatics [15,16,17]. We have worked in large collaborations with these disciplines, as well as with social and clinical scientists, with whom we have a long tradition of working. This multidisciplinary and interdisciplinary work with population data has been called ‘population health science’ [4].

Causality, Mendelian randomisation and triangulation

One of the most notable changes in epidemiology in the last 15 years has been the increased use of Mendelian randomisation (MR) [18]. MR is the use of genetic data to explore causal effects of modifiable (non-genetic) risk factors. The first formal proposal of this method (as used over the last 15 years) was published in February 2003 [18], just 9 months before the first volume of BMC Medicine was published. Notably, in that original paper – and particularly in a subsequent paper – George Davey Smith acknowledges a long history of others who have suggested the use of genetic variants in this way, including Fisher, who made the link between randomised trials and the random segregation of genetic variants in 1951 [19]. MR and other new methods have stimulated debates about causality, the underlying assumptions of different analytical methods and the importance of acknowledging and exploring these [8]. This has resulted in epidemiologists increasingly using DAGs to demonstrate their causal analysis assumptions, particularly for new methods or causal frameworks, such as MR. Over the last 15 years, MR has been increasingly used to improve causal understanding of the effects of lifestyle risk factors and pathophysiological targets on human health and disease [20,21,22,23,24]. Alongside these applications, considerable efforts have been made to develop methods to explore the validity of the genetic instruments used in MR studies and the robustness of their results [25,26,27,28,29,30,31,32,33,34]. The availability of summary results from large numbers of genome-wide association studies (GWAS) that can be used for two-sample MR [29], together with automated tools (such as MR-Base [35]) for analysing these data and performing sensitivity analyses, has contributed to recent increases in the use of (two-sample) MR. This shift is reflected in the results of my review of BMC Medicine publications: just one MR study was published before 2018. This paper, published in 2004, did not use the term MR, but used MTHFR genetic variants to explore the role of homocysteine in migraine [36]. By contrast, six MR studies were published in BMC Medicine in 2018 [37,38,39,40,41,42], five of which used two-sample MR.
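To make the two-sample MR approach described above more concrete, the following minimal Python sketch computes per-variant Wald ratios and a fixed-effect inverse-variance weighted (IVW) estimate from summary statistics. It is an illustration under stated assumptions, not the method used in any of the cited studies or in MR-Base: the function names are hypothetical, the effect sizes are made-up numbers, first-order weights are used (uncertainty in the variant-exposure associations is ignored), and the variants are assumed to be independent, valid instruments that have already been harmonised across the two samples.

```python
import math


def wald_ratio(beta_exposure, beta_outcome):
    """Causal estimate from one variant: outcome effect per unit exposure effect."""
    return beta_outcome / beta_exposure


def ivw_estimate(variants):
    """Fixed-effect IVW estimate and standard error.

    `variants` is a list of (beta_exposure, beta_outcome, se_outcome) tuples,
    one per genetic variant, taken from two separate GWAS samples.
    """
    numerator = sum(bx * by / se ** 2 for bx, by, se in variants)
    denominator = sum(bx ** 2 / se ** 2 for bx, _, se in variants)
    return numerator / denominator, math.sqrt(1 / denominator)


# Hypothetical summary statistics (not real GWAS results).
variants = [
    (0.10, 0.020, 0.010),
    (0.20, 0.040, 0.010),
    (0.05, 0.010, 0.010),
]

ratios = [wald_ratio(bx, by) for bx, by, _ in variants]
estimate, standard_error = ivw_estimate(variants)
print(ratios, estimate, standard_error)
```

In this toy example every variant gives the same Wald ratio, so the IVW estimate equals that common value. With real data the ratios differ, and the sensitivity analyses cited above [25,26,27,28,29,30,31,32,33,34] exist precisely because some instruments may be invalid.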

The ease with which two-sample MR can be undertaken means that some authors can complete analyses in minutes without giving sufficient thought to the importance or relevance of the research question being explored. They may also fail to consider or discuss key methodological issues (even when using automated systems developed specifically for two-sample MR). These include whether the two samples are from the same underlying population and whether the GWAS population used is relevant for the research question. In addition, replication of two-sample MR findings, and their triangulation with results from other methods that have different underlying assumptions, should be explored [29]. One notable example of the poor science that can result from the rush to an ‘easy publication’ is provided by two studies published in 2016. Both applied two-sample MR to the same publicly available data, but reported diametrically opposed conclusions (one reported that higher circulating C-reactive protein concentration increased the risk of schizophrenia, while the other concluded that it decreased the risk) [28]. Hartwig and colleagues demonstrated that one of the two had not harmonised summary data across the two samples (Table 3 in [28]); that paper has subsequently been retracted [43].
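The harmonisation step mentioned above can be sketched as follows. This is a simplified, hypothetical illustration in Python (the function name, dictionary keys and numbers are assumptions, and it is not a reconstruction of the analyses discussed in [28]). It aligns the outcome effect to the exposure effect allele and shows how skipping that step reverses the sign of a ratio estimate. Strand ambiguity and palindromic variants, which real harmonisation must handle, are deliberately ignored.

```python
def harmonise(exposure, outcome):
    """Return the outcome effect expressed for the exposure's effect allele.

    Each argument is a dict with keys 'effect_allele', 'other_allele', 'beta'.
    Returns None when the allele pairs cannot be reconciled.
    """
    same = (exposure["effect_allele"] == outcome["effect_allele"]
            and exposure["other_allele"] == outcome["other_allele"])
    swapped = (exposure["effect_allele"] == outcome["other_allele"]
               and exposure["other_allele"] == outcome["effect_allele"])
    if same:
        return outcome["beta"]
    if swapped:
        # The outcome GWAS coded the opposite allele, so flip the sign.
        return -outcome["beta"]
    return None  # incompatible alleles: drop the variant


# Hypothetical variant: the two GWAS report effects for opposite alleles.
exposure = {"effect_allele": "A", "other_allele": "G", "beta": 0.10}
outcome = {"effect_allele": "G", "other_allele": "A", "beta": -0.02}

naive_ratio = outcome["beta"] / exposure["beta"]
harmonised_ratio = harmonise(exposure, outcome) / exposure["beta"]
print(naive_ratio, harmonised_ratio)
```

Here the unharmonised ratio is negative and the harmonised ratio is positive, which is the kind of reversal that can turn an apparent protective effect into an apparent harmful one.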

The use of triangulation is increasingly recognised as key to exploring causal effects [44]. In this approach, results are compared from several different epidemiological methods, each of which has different, unrelated, key sources of bias. The idea is that if each of these methods suggests that a risk factor is causally related to an outcome, despite their different sources of bias, confidence that the results reflect a true causal effect increases. If results differ, being explicit from the outset about each method’s sources of bias makes it possible to determine what further studies would be needed to obtain a robust causal answer [44]. Going forwards, the potential for further extending this approach in a truly interdisciplinary way – including integrating data from (bio)informatics and laboratory science – is an exciting possibility for the next 15 or more years.

Data sharing and supporting team science

Changes in epidemiology over the last 15 years have coincided with debates about data use and sharing [45]. In cohort studies, there is no equivalent of the randomised trial register that provides a means of exploring ‘data dredging’ and publication bias. In a 2007 commentary, I noted that with the increasing number of cohorts and data within them that are, rightly, shared across the global scientific community to investigate many different hypotheses, it was nearly impossible to judge contributions to publication bias from observational epidemiology [46]. I suggested then that this situation might be improved by changing the journal publication process so that authors submitted only the introduction and methods of their study. In this way, decisions to publish would not be dependent on the results (and whether or not they reached some arbitrary P-value threshold). This opinion had no influence on journal editors or researchers and, in fact, my thoughts have changed since then. I now think that access to cohort data would benefit from a requirement to submit a brief ‘protocol’ of planned analyses that could serve as a ‘register’. These protocols should be kept as simple as possible and made public. They should neither be used to judge (scientifically) whether data are shared, nor to reject access on the basis of overlap with other proposals. Two UK examples of this process are the UK Biobank and the Avon Longitudinal Study of Parents and Children (ALSPAC) [47, 48] (for transparency, I acknowledge that I have had a leading scientific role in ALSPAC for the last 15 years). Debates about the pros and cons of this approach versus access that does not require registration are likely to continue, but I hope over the coming years that more researchers, funders, academic institutions and journal editors will insist on clear policies for sharing of hypotheses, data and analysis code between researchers. In addition, they should push for ‘team science’, with recognition of all who contribute (including those who recruit participants and collect and process data).


As a new member of the BMC Medicine Editorial Board, I am pleased to see that a consistently high proportion of applied epidemiology papers have been published over the last 15 years (Fig. 1a). As I read through the titles and abstracts of each paper, I also sensed that a high proportion of this research is from low- and middle-income countries, which I am also pleased about. In the next 15 years, it would be nice to see the advice to researchers from a recent Nature editorial reflected in published BMC Medicine research: ‘In short, be sceptical, pick a good question, and try to answer it in many ways. It takes many numbers to get close to the truth’ [49].

Availability of data and materials

Not applicable.



Abbreviations

GWAS: Genome-wide association study

MR: Mendelian randomisation


  1. Saracci R. Epidemiology in wonderland: big data and precision medicine. Eur J Epidemiol. 2018;33(3):245–57.

  2. Davey Smith G. Post-“modern epidemiology”: when methods meet matter. Am J Epidemiol. 2019.

  3. Pearce N, Vandenbroucke JP, Lawlor DA. Causal inference in environmental epidemiology: old and new approaches. Epidemiology. 2019;30(3):311–6.

  4. Keyes KM, Galea S. Setting the agenda for a new discipline: population health science. Am J Public Health. 2016;106(4):633–4.

  5. Saracci R. Epidemiology, the international epidemiological association and the international journal of epidemiology: a personal chronicle. Int J Epidemiol. 2016;45(6):1727–32.

  6. Zhang J, Mikolajczyk R, Lei X, Sun L, Yu H, Cheng W. An adjustable fetal weight standard for twins: a statistical modeling study. BMC Med. 2015;13:159.

  7. Ndila C, Bauni E, Nyirongo V, Mochamah G, Makazi A, Kosgei P, et al. Verbal autopsy as a tool for identifying children dying of sickle cell disease: a validation study conducted in Kilifi district, Kenya. BMC Med. 2014;12:65.

  8. Pearce N, Lawlor DA. Causal inference – so much more than statistics. Int J Epidemiol. 2016;45(6):1895–903.

  9. Beyond Big Data to new biomedical and health data science: moving to next century precision health. BMC Med. 2019. Accessed 15 May 2019.

  10. Agardh E, Lundstig A, Perfilyev A, Volkov P, Freiburghaus T, Lindholm E, et al. Genome-wide analysis of DNA methylation in subjects with type 1 diabetes identifies epigenetic modifications associated with proliferative diabetic retinopathy. BMC Med. 2015;13:182.

  11. Lin X, Lim IY, Wu Y, Teh AL, Chen L, Aris IM, et al. Developmental pathways to adiposity begin before birth and are influenced by genotype, prenatal environment and epigenome. BMC Med. 2017;15(1):50.

  12. Maitre L, Fthenou E, Athersuch T, Coen M, Toledano MB, Holmes E, et al. Urinary metabolic profiles in early pregnancy are associated with preterm birth and fetal growth restriction in the Rhea mother-child cohort study. BMC Med. 2014;12:110.

  13. Wang Q, Ferreira DLS, Nelson SM, Sattar N, Ala-Korpela M, Lawlor DA. Metabolic characterization of menopause: cross-sectional and longitudinal evidence. BMC Med. 2018;16(1):17.

  14. Lau CE, Siskos AP, Maitre L, Robinson O, Athersuch TJ, Want EJ, et al. Determinants of the urinary and serum metabolome in children from six European populations. BMC Med. 2018;16(1):202.

  15. Felix JF, Joubert BR, Baccarelli AA, Sharp GC, Almqvist C, Annesi-Maesano I, et al. Cohort Profile: Pregnancy And Childhood Epigenetics (PACE) Consortium. Int J Epidemiol. 2018;47(1):22–23u.

  16. Santos Ferreira DL, Williams DM, Kangas AJ, Soininen P, Ala-Korpela M, Davey Smith G, et al. Association of pre-pregnancy body mass index with offspring metabolic profile: analyses of 3 European prospective birth cohorts. PLoS Med. 2017;14(8):e1002376.

  17. Millard LAC, Tilling K, Lawlor DA, Flach PA, Gaunt TR. Physical activity phenotyping with activity bigrams, and their association with BMI. Int J Epidemiol. 2017;46:1857–70.

  18. Davey Smith G, Ebrahim S. “Mendelian randomisation”: can genetic epidemiology contribute to understanding environmental determinants of disease? Int J Epidemiol. 2003;32:1–22.

  19. Davey Smith G. Capitalising on Mendelian randomization to assess the effects of treatments. JLL Bulletin: Commentaries on the history of treatment evaluation. 2006.

  20. Borges MC, Barros AJD, Ferreira DLS, Casas JP, Horta BL, Kivimaki M, et al. Metabolic profiling of adiponectin levels in adults: Mendelian randomization analysis. Circ Cardiovasc Genet. 2017;10(6):e001837.

  21. Borges MC, Lawlor DA, de Oliveira C, White J, Horta BL, Barros AJD. The role of adiponectin in coronary heart disease risk: a Mendelian randomization study. Circ Res. 2016;119:491–9.

  22. Carter AR, Borges MC, Benn M, Tybjaerg-Hansen A, Davey Smith G, Nordestgaard BG, et al. Combined association of body mass index and alcohol consumption with biomarkers for liver injury and incidence of liver disease: a Mendelian randomization study. JAMA Netw Open. 2019;2(3):e190305.

  23. Freathy RM, Kazeem GR, Morris RW, Johnson PC, Paternoster L, Ebrahim S, et al. Genetic variation at CHRNA5-CHRNA3-CHRNB4 interacts with smoking status to influence body mass index. Int J Epidemiol. 2011;40(6):1617–28.

  24. Tyrrell J, Richmond RC, Palmer TM, Feenstra B, Rangarajan J, Metrustry S, et al. Genetic evidence for causal relationships between maternal obesity-related traits and birth weight. JAMA. 2016;315(11):1129–40.

  25. Bowden J, Davey Smith G, Haycock PC, Burgess S. Consistent estimation in Mendelian randomization with some invalid instruments using a weighted median estimator. Genet Epidemiol. 2016;40(4):304–14.

  26. Bowden J, Del Greco MF, Minelli C, Davey Smith G, Sheehan NA, Thompson JR. Assessing the suitability of summary data for two-sample Mendelian randomization analyses using MR-egger regression: the role of the I2 statistic. Int J Epidemiol. 2016;45(6):1961–74.

  27. Bowden J, Del Greco MF, Minelli C, Zhao Q, Lawlor DA, Sheehan NA, et al. Improving the accuracy of two-sample summary-data Mendelian randomization: moving beyond the NOME assumption. Int J Epidemiol. 2018.

  28. Hartwig FP, Davies NM, Hemani G, Davey Smith G. Two-sample Mendelian randomization: avoiding the downsides of a powerful, widely applicable but potentially fallible technique. Int J Epidemiol. 2016;45(6):1717–26.

  29. Lawlor DA. Two-sample Mendelian randomization: opportunities and challenges. Int J Epidemiol. 2016;45(3):908–15.

  30. Slichter D. Testing instrument validity and identification with invalid instruments. Society of Labor Economics Journal of labor economics. Chicago, IL: University of Chicago Press; 2014. Accessed 15 May 2019

  31. Spiller W, Slichter D, Bowden J, Davey Smith G. Detecting and correcting for bias in Mendelian randomization analyses using gene-by-environment interactions. Int J Epidemiol. 2018.

  32. Guo Z, Kang H, Cai TT, Small DS. Confidence intervals for causal effects with invalid instruments using two-stage hard thresholding with voting. arXiv:1603.05224v3 [math.ST] 2017.

  33. Kang H, Zhang A, Cai TT, Small DS. Instrumental variables estimation with some invalid instruments and its application to Mendelian randomization. J Am Stat Assoc. 2016;111:132–42.

  34. van Kippersluis H, Rietveld CA. Pleiotropy-robust Mendelian randomization. Int J Epidemiol. 2018;47(4):1279–88.

  35. Hemani G, Zheng J, Elsworth B, Wade KH, Haberland V, Baird D, et al. The MR-base platform supports systematic causal inference across the human phenome. Elife. 2018;7:e34408.

  36. Lea RA, Ovcaric M, Sundholm J, MacMillan J, Griffiths LR. The methylenetetrahydrofolate reductase gene variant C677T influences susceptibility to migraine with aura. BMC Med. 2004;2:3.

  37. Disney-Hogg L, Cornish AJ, Sud A, Law PJ, Kinnersley B, Jacobs DI, et al. Impact of atopy on risk of glioma: a Mendelian randomisation study. BMC Med. 2018;16(1):42.

  38. Lai FY, Nath M, Hamby SE, Thompson JR, Nelson CP, Samani NJ. Adult height and risk of 50 diseases: a combined epidemiological and genetic analysis. BMC Med. 2018;16(1):187.

  39. He Y, Timofeeva M, Farrington SM, Vaughan-Shaw P, Svinti V, Walker M, et al. Exploring causality in the association between circulating 25-hydroxyvitamin D and colorectal cancer risk: a large Mendelian randomisation study. BMC Med. 2018;16(1):142.

  40. Larsson SC, Burgess S, Michaelsson K. Serum magnesium levels and risk of coronary artery disease: Mendelian randomisation study. BMC Med. 2018;16(1):68.

  41. Mocellin S, Tropea S, Benna C, Rossi CR. Circadian pathway genetic variation and cancer risk: evidence from genome-wide association studies. BMC Med. 2018;16(1):20.

  42. Nordestgaard LT, Tybjaerg-Hansen A, Rasmussen KL, Nordestgaard BG, Frikke-Schmidt R. Genetic variation in clusterin and risk of dementia and ischemic vascular disease in the general population: cohort studies and meta-analyses of 362,338 individuals. BMC Med. 2018;16(1):39.

  43. Inoshita M, Numata S, Tajima A, Kinoshita M, Umehara H, Nakataki M, et al. Retraction: a significant causal association between C-reactive protein levels and schizophrenia. Sci Rep. 2018;8:46947.

  44. Lawlor DA, Tilling K, Davey Smith G. Triangulation in aetiological epidemiology. Int J Epidemiol. 2016;45(6):1866–86.

  45. Barbui C. Sharing all types of clinical data and harmonizing journal standards. BMC Med. 2016;14:63.

  46. Lawlor DA. Quality in epidemiological research: should we be submitting papers before we have the results and submitting more hypothesis-generating research? Int J Epidemiol. 2007;36(5):940–3.

  47. Allen NE, Sudlow C, Peakman T, Collins R, UK Biobank. UK biobank data: come and get it. Sci Transl Med. 2014;6(224):224ed224.

  48. Lawlor DA, Lewcock M, Rena-Jones L, Rollings C, Yip V, Smith D, et al. The second generation of the Avon longitudinal study of parents and children (ALSPAC-G2): a cohort profile. Wellcome Open Res. 2019;4:36.

  49. It’s time to talk about ditching statistical significance. Looking beyond a much used and abused measure would make science harder, but better. Nature. 2019;567:283.


Professor George Davey Smith (University of Bristol) and Professor Neil Pearce (London School of Hygiene & Tropical Medicine) provided useful comments on an earlier draft of this editorial.


DAL works in a unit that receives support from the University of Bristol and UK Medical Research Council (grant number MC_UU_00011/6). The views presented in this paper are those of the author and not necessarily those of any acknowledged funding body or person.

Author information


All results are the responsibility of the author. DAL wrote and approved the final version of the manuscript for publication.

Corresponding author

Correspondence to Deborah A. Lawlor.

Ethics declarations

Ethics approval

Not applicable.

Consent for publication

Not applicable.

Competing interests

DAL receives (or has received during the last 10 years) support from the UK National Institute for Health Research, British Heart Foundation, European Research Council, US National Institutes of Health, Medtronic Ltd. and Roche Diagnostics for research that is unrelated to this article.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional file

Additional file 1:

Details of how the articles included in Fig. 1 were classified. (DOCX 15 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Lawlor, D.A. Fifteen years of epidemiology in BMC Medicine. BMC Med 17, 177 (2019).
