Fifteen years of epidemiology in BMC Medicine
BMC Medicine volume 17, Article number: 177 (2019)
BMC Medicine was launched in November 2003 as an open access, open peer-reviewed general medical journal that has a broad remit to publish “outstanding and influential research in all areas of clinical practice, translational medicine, medical and health advances, public health, global health, policy, and general topics of interest to the biomedical and sociomedical professional communities”. Here, I discuss the last 15 years of epidemiological research published by BMC Medicine, with a specific focus on how this reflects changes occurring in the field of epidemiology over this period; the impact of ‘Big Data’; the reinvigoration of debates about causality; and, as we increasingly work across and with many diverse disciplines, the use of the name ‘population health science’. Reviewing all publications from the first volume to the end of 2018, I show that most BMC Medicine papers are epidemiological in nature, and the majority of them are applied epidemiology, with few methodological papers. Good research must address important translational questions that should not be driven by the increasing availability of data, but should take appropriate advantage of it. Over the next 15 years it would be good to see more publications that integrate results from several different methods, each with different sources of bias, in a triangulation framework.
In the 15 years since BMC Medicine was launched in November 2003, epidemiology has led the challenge of ‘Big Data’ science, reinvigorated debates about what can legitimately be considered causes of diseases and what methods should be used to determine causality (e.g., [2, 3]), and become increasingly known as ‘population health science’. These three changes are related to each other and to broader changes in science and society, as well as being rooted in a much longer history going back decades if not centuries. I thought it would be interesting to consider how these recent changes are reflected in the last 15 years of BMC Medicine. To do this, I undertook a review of the types of studies published by BMC Medicine in the last 15 years (see Fig. 1 and Additional file 1 for the methodology used to prepare this figure). I was pleased to see that most of the published research articles were epidemiology studies (Fig. 1a; 981/1334; 74%). Most of the epidemiology papers were applied studies (Fig. 1a; 946/981; 96%). This is a common finding in general medical journals, despite the existence of several specific epidemiology journals. The few papers that I considered to be methodological (Fig. 1b; 35/981; 4%) were largely concerned with methods for developing or refining tools to measure risk factors or disease outcomes (e.g., [6, 7]), rather than research into analytical or study design methods. There was little evidence that authors were using directed acyclic graphs (DAGs) to demonstrate statistical assumptions.
‘Big Data’ has no clear definition, but the term can be used to refer to datasets with many participants and/or many variables. The former category includes large-scale record linkage studies; the latter includes the integration of multiple ‘omics data with socioeconomic, environmental, lifestyle and clinical data in epidemiological studies and the collection of intense, continuously measured data, such as glucose levels collected by sensors at short, regular intervals. The current BMC Medicine call for papers in this area notes: “Big Data in Medicine can be used to provide health profiles and predictive models around individual patients. The use of high-throughput data to integrate genetic and clinical inter-relationships; real-world data to infer biological principles as well as associations, trajectories and stratifications of patients; data-driven approaches for patients and digital platforms are the hope for medical problems and evidence-based medicine”.
However, as Saracci has eloquently highlighted, excessive claims for ‘Big Data’, such as those made in this statement, can result in ‘bigness’ overriding the key principles of epidemiology and good science. These principles include, for example, the need for data (and software) validity, replication or validation of results in independent studies and, importantly, using data to address the most relevant questions rather than ‘blind [big] data dredging’. As with other journals, BMC Medicine has published a small proportion of ‘omics studies (Fig. 1b; 77/981 (8%) of the epidemiology papers included some ‘omics measurements), and most of these were small and had no independent replication or validation (e.g., [10,11,12]). Larger studies that did include replication (e.g., [13, 14]) have been published more recently.
Population health science
The increasing use of the term ‘population health science’ in part reflects the potential for epidemiologists to undertake population level physiology and embed this in what was previously called ‘social medicine’. This is enabled by the integration of multiple ‘omics data with socioeconomic, lifestyle and clinical data in large cohort studies. Multidisciplinary (i.e., people or groups from different disciplines working together on research projects by drawing on their specific disciplinary knowledge) and interdisciplinary (i.e., synthesising methods and knowledge from different disciplines to answer research questions) approaches are needed to realise the full potential of these data. Thus, over the last 15 years, epidemiologists have increasingly learnt the theories and language of colleagues from diverse fundamental and emerging disciplines, including mathematics, biology, chemistry, data and computer science and (bio)informatics [15,16,17]. We have worked in large collaborations with these disciplines, as well as with social and clinical scientists, with whom we have a long tradition of working. This multidisciplinary and interdisciplinary work with population data has been called ‘population health science’.
Causality, Mendelian randomisation and triangulation
One of the most notable changes in epidemiology in the last 15 years has been the increased use of Mendelian randomisation (MR). MR is the use of genetic data to explore causal effects of modifiable (non-genetic) risk factors. The first formal proposal of this method (as used over the last 15 years) was published in February 2003, just 9 months before the first volume of BMC Medicine was published. Notably, in that original paper, and particularly in a subsequent paper, George Davey Smith acknowledges a long history of others who have suggested the use of genetic variants in this way, including Fisher, who made the link between randomised trials and the random segregation of genetic variants in 1951. MR and other new methods have stimulated debates about causality, the underlying assumptions of different analytical methods and the importance of acknowledging and exploring these. This has resulted in epidemiologists increasingly using DAGs to demonstrate their causal analysis assumptions, particularly for new methods or causal frameworks, such as MR. Over the last 15 years, MR has been increasingly used to improve causal understanding of the effects of lifestyle risk factors and pathophysiological targets on human health and disease [20,21,22,23,24]. Alongside these applications, considerable efforts have been made to develop methods to explore the validity of the genetic instruments used in MR studies and the robustness of their results [25,26,27,28,29,30,31,32,33,34]. The availability of summary results from large numbers of genome-wide association studies (GWAS) that can be used for two-sample MR, together with automated tools (such as MR-Base) for analysing these data and performing sensitivity analyses, has contributed to recent increases in the use of (two-sample) MR. This shift is reflected in the results of my review of BMC Medicine publications: just one MR study was published before 2018. This paper, published in 2004, did not use the term MR, but used MTHFR genetic variants to explore the role of homocysteine in migraine. By contrast, six MR studies were published in BMC Medicine in 2018 [37,38,39,40,41,42], five of which used two-sample MR.
The ease with which two-sample MR can be undertaken means that some authors can complete analyses in minutes without giving sufficient thought to the importance or relevance of the research question being explored. They may also fail to consider or discuss key methodological issues (even when using automated systems developed specifically for two-sample MR). These include whether the two samples are from the same underlying population and whether the GWAS population used is relevant for the research question. In addition, replication of these two-sample MR findings, and their triangulation with results from other methods with different underlying assumptions, should be explored. One notable example of the poor science that can result from the rush to an ‘easy publication’ is provided by two studies published in 2016. Both studies applied two-sample MR to the same publicly available data, but reported diametrically opposed conclusions (one reported that higher circulating C-reactive protein concentration increased risk of schizophrenia, while the other concluded that it decreased schizophrenia risk). Hartwig and colleagues demonstrated that one of the two had not harmonised summary data across the two samples (Table 3 of their paper); that paper has subsequently been retracted.
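The harmonisation problem described above can be illustrated with a minimal sketch. It assumes invented summary statistics for three hypothetical variants (rs1 to rs3), a simple allele-matching rule that ignores strand flips and palindromic variants, and a fixed-effect inverse-variance weighted (IVW) estimator. None of the numbers or function names come from the studies discussed, and a real analysis would normally rely on an established tool rather than code like this.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SnpAssoc:
    """Summary association of one variant with a trait (hypothetical structure)."""
    snp: str
    effect_allele: str
    other_allele: str
    beta: float  # per-allele effect on the trait
    se: float    # standard error of beta


Pair = Tuple[SnpAssoc, SnpAssoc]


def harmonise(exposure: List[SnpAssoc], outcome: List[SnpAssoc]) -> List[Pair]:
    """Align each outcome association to the exposure effect allele.

    Simplification: only exact or swapped allele pairs are handled;
    variants whose alleles do not match either way are dropped.
    """
    outcome_by_snp = {o.snp: o for o in outcome}
    pairs: List[Pair] = []
    for e in exposure:
        o = outcome_by_snp.get(e.snp)
        if o is None:
            continue
        if (o.effect_allele, o.other_allele) == (e.effect_allele, e.other_allele):
            pairs.append((e, o))
        elif (o.effect_allele, o.other_allele) == (e.other_allele, e.effect_allele):
            # Same variant, opposite coding: flip the sign of the outcome effect.
            flipped = SnpAssoc(o.snp, e.effect_allele, e.other_allele, -o.beta, o.se)
            pairs.append((e, flipped))
    return pairs


def naive_pairs(exposure: List[SnpAssoc], outcome: List[SnpAssoc]) -> List[Pair]:
    """Match on variant name only, with no allele check (the error to avoid)."""
    outcome_by_snp = {o.snp: o for o in outcome}
    return [(e, outcome_by_snp[e.snp]) for e in exposure if e.snp in outcome_by_snp]


def ivw(pairs: List[Pair]) -> Tuple[float, float]:
    """Fixed-effect IVW estimate and its standard error."""
    num = sum(e.beta * o.beta / o.se ** 2 for e, o in pairs)
    den = sum(e.beta ** 2 / o.se ** 2 for e, o in pairs)
    return num / den, (1.0 / den) ** 0.5


# Invented toy data: the outcome study codes every variant on the opposite allele.
exposure = [
    SnpAssoc("rs1", "A", "G", 0.10, 0.01),
    SnpAssoc("rs2", "C", "T", 0.20, 0.02),
    SnpAssoc("rs3", "G", "A", 0.15, 0.01),
]
outcome = [
    SnpAssoc("rs1", "G", "A", -0.050, 0.02),
    SnpAssoc("rs2", "T", "C", -0.100, 0.03),
    SnpAssoc("rs3", "A", "G", -0.075, 0.02),
]

harmonised_estimate, harmonised_se = ivw(harmonise(exposure, outcome))
naive_estimate, naive_se = ivw(naive_pairs(exposure, outcome))
print(f"harmonised: {harmonised_estimate:.3f}  naive: {naive_estimate:.3f}")
```

With these toy numbers the harmonised estimate is 0.5 and the unharmonised estimate is -0.5. The same data therefore yield opposite signs depending only on whether effect alleles were aligned, which is the kind of discrepancy described in the paragraph above.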
The use of triangulation is increasingly recognised as key to exploring causal effects. In this approach, results are compared from several different epidemiological methods, each of which has different, unrelated, key sources of bias. The idea is that if each of these methods suggests that a risk factor is causally related to an outcome, despite their different sources of bias, confidence that the results reflect a true causal effect increases. If results differ, being explicit in the first instance about their different sources of bias makes it possible to determine what further studies would be needed to obtain a robust causal answer. Going forwards, the potential for further extending this approach in a truly interdisciplinary way, including integrating data from (bio)informatics and laboratory science, is an exciting possibility for the next 15 or more years.
Data sharing and supporting team science
Changes in epidemiology over the last 15 years have coincided with debates about data-use and sharing. In cohort studies, there is no equivalent of the randomised trial register that provides a means of exploring ‘data dredging’ and publication bias. In a 2007 commentary, I noted that with the increasing number of cohorts and data within them that are, rightly, shared across the global scientific community to investigate many different hypotheses, it was nearly impossible to judge contributions to publication bias from observational epidemiology. I suggested then that this situation might be improved by changing the journal publication process so that authors submitted only the introduction and methods of their study. In this way, decisions to publish would not be dependent on the results (and whether or not they reached some arbitrary P-value threshold). This opinion had no influence on journal editors or researchers and, in fact, my thoughts have changed since then. I think accessing cohort data would benefit from the requirement to submit a brief ‘protocol’ of planned analyses that could serve as a ‘register’. These should be kept as simple as possible and made public. They should neither be used to judge (scientifically) whether data are shared, nor to reject access on the basis of overlap with other proposals. Two UK examples of this process are the UK Biobank and the Avon Longitudinal Study of Parents and Children (ALSPAC) [47, 48] (for transparency, I acknowledge that I have had a leading scientific role in ALSPAC for the last 15 years). Debates about the pros and cons of this approach versus access that does not require registration are likely to continue, but I hope over the coming years that more researchers, funders, academic institutions and journal editors will insist on clear policies for sharing of hypotheses, data and analysis code between researchers. In addition, they should push for ‘team science’, with recognition of all who contribute (including those who recruit participants and collect and process data).
As a new member of the BMC Medicine Editorial Board, I am pleased to see that a consistently high proportion of applied epidemiology papers have been published over the last 15 years (Fig. 1a). As I read through the titles and abstracts of each paper, I also sensed that a high proportion of this research is from low and middle income countries, which I am also pleased about. In the next 15 years, it would be nice to see the advice to researchers from a recent Nature editorial reflected in published BMC Medicine research: ‘In short, be sceptical, pick a good question, and try to answer it in many ways. It takes many numbers to get close to the truth’.
References
Saracci R. Epidemiology in wonderland: big data and precision medicine. Eur J Epidemiol. 2018;33(3):245–57.
Davey Smith G. Post-“modern epidemiology”: when methods meet matter. Am J Epidemiol. 2019. https://doi.org/10.1093/aje/kwz064.
Pearce N, Vandenbroucke JP, Lawlor DA. Causal inference in environmental epidemiology: old and new approaches. Epidemiology. 2019;30(3):311–6.
Keyes KM, Galea S. Setting the agenda for a new discipline: population health science. Am J Public Health. 2016;106(4):633–4.
Saracci R. Epidemiology, the international epidemiological association and the international journal of epidemiology: a personal chronicle. Int J Epidemiol. 2016;45(6):1727–32.
Zhang J, Mikolajczyk R, Lei X, Sun L, Yu H, Cheng W. An adjustable fetal weight standard for twins: a statistical modeling study. BMC Med. 2015;13:159.
Ndila C, Bauni E, Nyirongo V, Mochamah G, Makazi A, Kosgei P, et al. Verbal autopsy as a tool for identifying children dying of sickle cell disease: a validation study conducted in Kilifi district, Kenya. BMC Med. 2014;12:65.
Pearce N, Lawlor DA. Causal inference – so much more than statistics. Int J Epidemiol. 2016;45(6):1895–903.
Beyond Big Data to new biomedical and health data science: moving to next century precision health. BMC Med. 2019. https://bmcmedicine.biomedcentral.com/articles/collections/bigdata. Accessed 15 May 2019.
Agardh E, Lundstig A, Perfilyev A, Volkov P, Freiburghaus T, Lindholm E, et al. Genome-wide analysis of DNA methylation in subjects with type 1 diabetes identifies epigenetic modifications associated with proliferative diabetic retinopathy. BMC Med. 2015;13:182.
Lin X, Lim IY, Wu Y, Teh AL, Chen L, Aris IM, et al. Developmental pathways to adiposity begin before birth and are influenced by genotype, prenatal environment and epigenome. BMC Med. 2017;15(1):50.
Maitre L, Fthenou E, Athersuch T, Coen M, Toledano MB, Holmes E, et al. Urinary metabolic profiles in early pregnancy are associated with preterm birth and fetal growth restriction in the Rhea mother-child cohort study. BMC Med. 2014;12:110.
Wang Q, Ferreira DLS, Nelson SM, Sattar N, Ala-Korpela M, Lawlor DA. Metabolic characterization of menopause: cross-sectional and longitudinal evidence. BMC Med. 2018;16(1):17.
Lau CE, Siskos AP, Maitre L, Robinson O, Athersuch TJ, Want EJ, et al. Determinants of the urinary and serum metabolome in children from six European populations. BMC Med. 2018;16(1):202.
Felix JF, Joubert BR, Baccarelli AA, Sharp GC, Almqvist C, Annesi-Maesano I, et al. Cohort Profile: Pregnancy And Childhood Epigenetics (PACE) Consortium. Int J Epidemiol. 2018;47(1):22–23u.
Santos Ferreira DL, Williams DM, Kangas AJ, Soininen P, Ala-Korpela M, Davey Smith G, et al. Association of pre-pregnancy body mass index with offspring metabolic profile: analyses of 3 European prospective birth cohorts. PLoS Med. 2017;14(8):e1002376.
Millard LAC, Tilling K, Lawlor DA, Flach PA, Gaunt TR. Physical activity phenotyping with activity bigrams, and their association with BMI. Int J Epidemiol. 2017;46:1857–70.
Davey Smith G, Ebrahim S. “Mendelian randomisation”: can genetic epidemiology contribute to understanding environmental determinants of disease? Int J Epidemiol. 2003;32:1–22.
Davey Smith G. Capitalising on Mendelian randomization to assess the effects of treatments. JLL Bulletin: Commentaries on the history of treatment evaluation. 2006. https://www.jameslindlibrary.org/articles/capitalising-on-mendelian-randomization-to-assess-the-effects-of-treatments/.
Borges MC, Barros AJD, Ferreira DLS, Casas JP, Horta BL, Kivimaki M, et al. Metabolic profiling of adiponectin levels in adults: Mendelian randomization analysis. Circ Cardiovasc Genet. 2017;10(6):e001837.
Borges MC, Lawlor DA, de Oliveira C, White J, Horta BL, Barros AJD. The role of adiponectin in coronary heart disease risk: a Mendelian randomization study. Circ Res. 2016;119:491–9.
Carter AR, Borges MC, Benn M, Tybjaerg-Hansen A, Davey Smith G, Nordestgaard BG, et al. Combined association of body mass index and alcohol consumption with biomarkers for liver injury and incidence of liver disease: a Mendelian randomization study. JAMA Netw Open. 2019;2(3):e190305.
Freathy RM, Kazeem GR, Morris RW, Johnson PC, Paternoster L, Ebrahim S, et al. Genetic variation at CHRNA5-CHRNA3-CHRNB4 interacts with smoking status to influence body mass index. Int J Epidemiol. 2011;40(6):1617–28.
Tyrrell J, Richmond RC, Palmer TM, Feenstra B, Rangarajan J, Metrustry S, et al. Genetic evidence for causal relationships between maternal obesity-related traits and birth weight. JAMA. 2016;315(11):1129–40.
Bowden J, Davey Smith G, Haycock PC, Burgess S. Consistent estimation in Mendelian randomization with some invalid instruments using a weighted median estimator. Genet Epidemiol. 2016;40(4):304–14.
Bowden J, Del Greco MF, Minelli C, Davey Smith G, Sheehan NA, Thompson JR. Assessing the suitability of summary data for two-sample Mendelian randomization analyses using MR-egger regression: the role of the I2 statistic. Int J Epidemiol. 2016;45(6):1961–74.
Bowden J, Del Greco MF, Minelli C, Zhao Q, Lawlor DA, Sheehan NA, et al. Improving the accuracy of two-sample summary-data Mendelian randomization: moving beyond the NOME assumption. Int J Epidemiol. 2018. https://doi.org/10.1093/ije/dyy258.
Hartwig FP, Davies NM, Hemani G, Davey Smith G. Two-sample Mendelian randomization: avoiding the downsides of a powerful, widely applicable but potentially fallible technique. Int J Epidemiol. 2016;45(6):1717–26.
Lawlor DA. Two-sample Mendelian randomization: opportunities and challenges. Int J Epidemiol. 2016;45(3):908–15.
Slichter D. Testing instrument validity and identification with invalid instruments. Society of Labor Economics Journal of labor economics. Chicago, IL: University of Chicago Press; 2014. http://www.sole-jole.org/14436.pdf. Accessed 15 May 2019.
Spiller W, Slichter D, Bowden J, Davey Smith G. Detecting and correcting for bias in Mendelian randomization analyses using gene-by-environment interactions. Int J Epidemiol. 2018. https://doi.org/10.1093/ije/dyy204.
Guo Z, Kang H, Cai TT, Small DS. Confidence intervals for causal effects with invalid instruments using two-stage hard thresholding with voting. arXiv:1603.05224v3 [math.ST] 2017.
Kang H, Zhang A, Cai TT, Small DS. Instrumental variables estimation with some invalid instruments and its application to Mendelian randomization. J Am Stat Assoc. 2016;111:132–42.
van Kippersluis H, Rietveld CA. Pleiotropy-robust Mendelian randomization. Int J Epidemiol. 2018;47(4):1279–88.
Hemani G, Zheng J, Elsworth B, Wade KH, Haberland V, Baird D, et al. The MR-base platform supports systematic causal inference across the human phenome. Elife. 2018;7:e34408.
Lea RA, Ovcaric M, Sundholm J, MacMillan J, Griffiths LR. The methylenetetrahydrofolate reductase gene variant C677T influences susceptibility to migraine with aura. BMC Med. 2004;2:3.
Disney-Hogg L, Cornish AJ, Sud A, Law PJ, Kinnersley B, Jacobs DI, et al. Impact of atopy on risk of glioma: a Mendelian randomisation study. BMC Med. 2018;16(1):42.
Lai FY, Nath M, Hamby SE, Thompson JR, Nelson CP, Samani NJ. Adult height and risk of 50 diseases: a combined epidemiological and genetic analysis. BMC Med. 2018;16(1):187.
He Y, Timofeeva M, Farrington SM, Vaughan-Shaw P, Svinti V, Walker M, et al. Exploring causality in the association between circulating 25-hydroxyvitamin D and colorectal cancer risk: a large Mendelian randomisation study. BMC Med. 2018;16(1):142.
Larsson SC, Burgess S, Michaelsson K. Serum magnesium levels and risk of coronary artery disease: Mendelian randomisation study. BMC Med. 2018;16(1):68.
Mocellin S, Tropea S, Benna C, Rossi CR. Circadian pathway genetic variation and cancer risk: evidence from genome-wide association studies. BMC Med. 2018;16(1):20.
Nordestgaard LT, Tybjaerg-Hansen A, Rasmussen KL, Nordestgaard BG, Frikke-Schmidt R. Genetic variation in clusterin and risk of dementia and ischemic vascular disease in the general population: cohort studies and meta-analyses of 362,338 individuals. BMC Med. 2018;16(1):39.
Inoshita M, Numata S, Tajima A, Kinoshita M, Umehara H, Nakataki M, et al. Retraction: a significant causal association between C-reactive protein levels and schizophrenia. Sci Rep. 2018;8:46947.
Lawlor DA, Tilling K, Davey Smith G. Triangulation in aetiological epidemiology. Int J Epidemiol. 2016;45(6):1866–86.
Barbui C. Sharing all types of clinical data and harmonizing journal standards. BMC Med. 2016;14:63.
Lawlor DA. Quality in epidemiological research: should we be submitting papers before we have the results and submitting more hypothesis-generating research? Int J Epidemiol. 2007;36(5):940–3.
Allen NE, Sudlow C, Peakman T, Collins R, UK Biobank. UK biobank data: come and get it. Sci Transl Med. 2014;6(224):224ed224.
Lawlor DA, Lewcock M, Rena-Jones L, Rollings C, Yip V, Smith D, et al. The second generation of the Avon longitudinal study of parents and children (ALSPAC-G2): a cohort profile. Wellcome Open Res. 2019;4:36. https://doi.org/10.12688/wellcomeopenres.15087.1.
It’s time to talk about ditching statistical significance. Looking beyond a much used and abused measure would make science harder, but better. Nature. 2019;567:283.
Professor George Davey Smith (University of Bristol) and Professor Neil Pearce (London School of Hygiene & Tropical Medicine) provided useful comments on an earlier draft of this editorial.
DAL works in a unit that receives support from the University of Bristol and UK Medical Research Council (grant number MC_UU_00011/6). The views presented in this paper are those of the author and not necessarily any acknowledged funding body or person.
DAL receives (or has received during the last 10 years) support from the UK National Institute for Health Research, British Heart Foundation, European Research Council, US National Institute of Health, Medtronic Ltd. and Roche Diagnostics for research that is unrelated to this article.
Lawlor, D.A. Fifteen years of epidemiology in BMC Medicine. BMC Med 17, 177 (2019) doi:10.1186/s12916-019-1407-5