
Development and validation of a multivariable prediction model for infection-related complications in patients with common infections in UK primary care and the extent of risk-based prescribing of antibiotics



Antimicrobial resistance is driven by the overuse of antibiotics. This study aimed to develop and validate clinical prediction models for the risk of infection-related hospital admission in patients presenting with upper respiratory tract infection (URTI), lower respiratory tract infection (LRTI) or urinary tract infection (UTI). These models were then used to investigate whether the risk of an infection-related complication is associated with the probability of receiving an antibiotic prescription.


The study used electronic health record data from general practices contributing to the Clinical Practice Research Datalink (CPRD GOLD) and the Welsh Secure Anonymised Information Linkage (SAIL), both linked to hospital records. Patients who visited their general practitioner with an incidental URTI, LRTI or UTI were included and followed for 30 days for hospitalisation due to infection-related complications. Predictors included age, gender, clinical and medication risk factors, ethnicity and socioeconomic status. Cox proportional hazards regression models were developed in CPRD, and their predicted risks were independently validated in SAIL.


The derivation and validation cohorts included 8.1 and 2.7 million patients in CPRD and SAIL, respectively. A total of 7125 (0.09%) hospital admissions occurred in CPRD and 7685 (0.28%) in SAIL. Important predictors included age and measures of comorbidity. Initial attempts at validating in SAIL (i.e. transporting the models with no adjustment) indicated the need to recalibrate the models for age and the underlying incidence of infections. After recalibration, calibration improved markedly, and internal bootstrap validation of the updated models yielded optimism-corrected C-statistics (a measure of discrimination) of 0.63 (LRTI), 0.69 (URTI) and 0.73 (UTI). For all three infection types, the rate of antibiotic prescribing was not associated with patients’ risk of infection-related hospital admission.


The risk for infection-related hospital admissions varied substantially between patients, but antibiotic prescribing in primary care was not associated with the risk of hospitalisation due to infection-related complications. Our findings highlight the potential role of clinical prediction models in informing antibiotic prescribing decisions in primary care.



Antimicrobial resistance (AMR) is one of the biggest global threats facing modern healthcare and medicine [1, 2]. A recent World Health Organization report highlighted the urgency of this problem, identifying that drug-resistant infections cause at least 700,000 deaths globally a year [3]. This number could rise to 10 million per year by 2050 if no action is taken [4,5,6]. One driving factor behind the emergence and persistence of AMR is the overuse and misuse of antibiotics [7]. It is not purely a contemporary issue, as government committees in the UK discussed strategies to optimise antibiotic usage more than 20 years ago [8, 9]. Despite this, the way physicians make the decision on whether to prescribe has changed little in that time and is still largely reliant on their immediate assessment of a patient’s symptoms.

Primary care was responsible for prescribing over 80% of all antibiotics in the National Health Service (NHS) in 2017 [10]. Earlier research has examined antibiotic prescribing patterns in UK primary care and found that prescribing is heterogeneous both regionally and nationally [11,12,13]. A recent study highlighted that substantial variability exists both within and between general practices and that there are multiple drivers behind the decision to prescribe [14]. Together, these findings suggest that a more evidence-based approach to antibiotic prescribing decisions is needed to improve patient care. Prescribing based on an objective evaluation of a patient’s risk is a relatively new concept but is gaining popularity. For example, statin prescribing is now guided by the QRISK algorithm [15], which estimates a patient’s risk of developing cardiovascular disease over the following 10 years. Applying a similar approach to antibiotic prescribing could facilitate more targeted use of medicines that are becoming increasingly ineffective. However, to date, there are no validated risk models for this purpose.

The aim of this study was twofold: first, to develop and validate prognostic models that predict the risk of developing infection-related complications in patients who consult their general practitioner (GP) for a common infection; second, to use these models to investigate whether there is an association between the risk of an infection-related complication and the rate of receiving an antibiotic prescription. Three common infections were investigated: lower respiratory tract infections (LRTI), urinary tract infections (UTI) and upper respiratory tract infections (URTI).


This was a retrospective cohort study using data from two sources, the Clinical Practice Research Datalink (CPRD GOLD) and the Secure Anonymised Information Linkage (SAIL) databases, which provided the derivation and validation cohorts, respectively. CPRD GOLD contains longitudinal, anonymised, patient-level electronic health records (EHRs) from general practices in the UK, with more than 5 million active patient records representing about 8% of the UK population [16]. SAIL contains data from approximately 80% of general practices in Wales and covers around 75% of the Welsh population of about 3 million [17,18,19].

The EHRs included clinical diagnoses, prescribed medication, vaccination history, diagnostic testing, lifestyle information and clinical referrals, as well as each patient’s age, gender, ethnicity, smoking history and body mass index (BMI). Patient-level socioeconomic information was obtained by linking the postcode of each patient’s residence to the Index of Multiple Deprivation (IMD) [20], grouped into quintiles. Antibiotic prescriptions were identified using the British National Formulary.

The derivation dataset from CPRD comprised routinely collected data from 587 general practices in England from 1 January 2000 to 31 December 2015; the validation cohort (SAIL) covered 338 general practices in Wales from 1 January 2000 to 15 March 2017. Patient-level data from the general practices were linked to hospitalisation data (Hospital Episode Statistics (HES) for CPRD GOLD and the Patient Episode Database for Wales (PEDW) for SAIL), which record the date of hospital admission and the clinical diagnoses established at and during admission, coded using ICD10. Linked data were available for about half of CPRD practices, all located in England, and for all SAIL practices. Patients were followed for 30 days after their initial GP consultation to determine whether they experienced further complications as a result of their infection.

Study population

The study population included patients consulting their GP for one of three infections: LRTI, UTI and URTI (including coughs, colds and sore throats). READ codes (version 2 for CPRD and version 3 for SAIL) were used to extract EHRs for each infection-related consultation; the code lists used in this study are available on the Clinical Codes Repository [21]. In both datasets, we restricted the study population to incidental consultations, i.e. those with no record of a consultation for these infections or of an antibiotic prescription in the previous 3 months. Because of the long study period, patients could appear in the dataset multiple times, as separate records. For the development of the clinical prediction models (CPMs), we excluded all patients who were prescribed an antibiotic on the day of their consultation.
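The consultation-level eligibility rules described above can be sketched as two small predicates. This is a minimal illustration, not the authors' extraction code: the function names are hypothetical, the 3-month look-back is approximated as 91 days, and the list of prior consultations is assumed to cover any of the three infections.

```python
from datetime import date, timedelta
from typing import Iterable

# Assumption: the 3-month look-back window is approximated as 91 days.
LOOKBACK = timedelta(days=91)


def is_incidental(consultation: date,
                  infection_consultations: Iterable[date],
                  antibiotic_dates: Iterable[date]) -> bool:
    """True if no infection consultation or antibiotic prescription falls in
    the look-back window strictly before this consultation (hypothetical helper)."""
    window_start = consultation - LOOKBACK

    def in_window(d: date) -> bool:
        return window_start <= d < consultation

    return (not any(in_window(d) for d in infection_consultations)
            and not any(in_window(d) for d in antibiotic_dates))


def eligible_for_cpm_development(consultation: date,
                                 infection_consultations: Iterable[date],
                                 antibiotic_dates: Iterable[date]) -> bool:
    """Incidental consultation with no antibiotic prescribed on the same day."""
    antibiotic_dates = list(antibiotic_dates)  # iterated twice below
    if consultation in antibiotic_dates:
        return False  # same-day antibiotic: excluded from model development
    return is_incidental(consultation, infection_consultations, antibiotic_dates)
```

For a consultation on 1 June 2015, a previous infection consultation on 15 April 2015 (47 days earlier) makes it non-incidental, whereas one on 1 January 2015 does not.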


The primary outcome was the time between a patient’s GP consultation and hospital admission due to infection-related complications, with censoring at 30 days. Hospital admissions due to infection-related complications were identified from the ICD10 code of the primary admission diagnosis, using a broad set of infections (e.g. LRTI, pneumonia and sepsis); the full list is also available at [21]. Hospital admission for any reason was used as an additional outcome.
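A minimal sketch of how the 30-day time-to-event outcome could be derived for one consultation, assuming the date of the first infection-related admission (if any) has already been extracted from the linked hospital data. The function and its optional `end_of_records` argument are hypothetical, and counting same-day admissions as events is an assumption.

```python
from datetime import date
from typing import Optional, Tuple

FOLLOW_UP_DAYS = 30  # outcome censored 30 days after the consultation


def thirty_day_outcome(consultation: date,
                       first_admission: Optional[date],
                       end_of_records: Optional[date] = None) -> Tuple[int, int]:
    """Return (time in days, event indicator) for one consultation.

    first_admission: first infection-related admission on or after the
    consultation, or None. end_of_records (hypothetical) allows earlier
    censoring, e.g. when a patient leaves the practice.
    """
    censor_day = FOLLOW_UP_DAYS
    if end_of_records is not None:
        censor_day = min(censor_day, (end_of_records - consultation).days)
    if first_admission is not None:
        delay = (first_admission - consultation).days
        # Assumption: a same-day admission (delay 0) counts as an event.
        if 0 <= delay <= censor_day:
            return delay, 1
    return censor_day, 0
```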

Predictor variables

The full list of potential predictors was derived based on a literature review and discussions with clinical experts; this list is outlined in Table 1.

Table 1 List of potential predictors considered for the risk prediction models

For all infections, smoking status and BMI were missing for over 50% of patients, so these variables were not considered at the modelling stage; imputation would not have been feasible without introducing unnecessary bias [25]. Patients for whom IMD information was not available (0.12%) were removed from the derivation dataset; this step was not required for the validation dataset, as IMD linkage was complete. For patient ethnicity, the white and unknown categories were combined (following the approach taken by Hippisley-Cox et al. [26]), and the remaining ethnicities formed a category labelled combined minorities.

Statistical methods

Cox proportional hazards regression models were fitted to the derivation cohort. Patients entered the study at a GP consultation for one of the three common infections and were followed for the next 30 days. Age was categorised into 11 groups; modelling age with cubic splines was investigated initially but proved problematic because of the sharp increase in incidence rates in patients aged > 50 years, which caused the models to overestimate risk in older patients.
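In symbols, a Cox model with categorical age of the kind described above can be written as follows. The notation is ours and is a sketch under stated assumptions: age enters through indicators for 10 of the 11 groups relative to a reference group (age 20–30 is described as the baseline group in the Discussion), and $\mathbf{z}$ collects the remaining predictors from Table 1.

```latex
h(t \mid \mathbf{x}) = h_0(t)\,\exp\!\Big(\sum_{k=1}^{10} \beta_k\,\mathbf{1}[\text{age group} = k] + \boldsymbol{\gamma}^{\top}\mathbf{z}\Big),
\qquad 0 \le t \le 30 \text{ days},
\qquad \mathrm{HR}_k = e^{\beta_k}
```

Because each age group has its own coefficient, the fitted hazard can rise sharply at older ages without the smoothness constraint imposed by a spline.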

To assess geographical external validity, the models developed in CPRD GOLD were applied to the SAIL dataset. Predictive performance was assessed in terms of discrimination (the ability of the models to distinguish patients who experienced the outcome from those who did not), measured by the concordance statistic (C-statistic), and explained variation, measured by the R2 statistic for survival data. Model calibration was assessed by comparing observed and predicted risks within deciles of predicted risk.
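The concordance statistic can be illustrated with a plain-Python version of Harrell's C for right-censored outcomes. This is a teaching sketch with a hypothetical function name; the study itself used R packages, and this quadratic-time loop would be far too slow for millions of records.

```python
from typing import Sequence


def harrell_c(time: Sequence[float],
              event: Sequence[int],
              risk: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is usable when subject i has an event and a strictly
    shorter follow-up time than subject j. The pair is concordant when the
    model gives i the higher predicted risk; tied risks count as half.
    """
    concordant = 0.0
    usable = 0
    for i in range(len(time)):
        if not event[i]:
            continue  # a censored subject cannot be the earlier event in a pair
        for j in range(len(time)):
            if time[i] < time[j]:  # j was still event-free when i had the event
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable
```

For example, `harrell_c([5, 12, 30, 30], [1, 1, 0, 0], [0.08, 0.02, 0.05, 0.01])` returns 0.8: four of the five usable pairs are ranked correctly.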

Model updating methods (see Supplement 1) were applied to CPMs that were miscalibrated in the validation cohort (SAIL). Because no further independent dataset was available for another geographical external validation, these updated models were internally validated using bootstrap resampling to correct for in-sample optimism. After model derivation and updating, there were six CPMs: one for each of the three infections in each of CPRD and SAIL. To investigate antibiotic prescribing according to predicted risk, the models were applied to all patients (antibiotic and non-antibiotic users) in the relevant dataset (e.g. the CPRD LRTI model to all LRTI patients in the CPRD dataset). Patients were divided into deciles of predicted risk, and the prescribing rate was compared across deciles.
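Internal validation with bootstrap optimism correction, as implemented in R by `rms`, follows the recipe sketched below in plain Python. The `fit` and `score` callables are placeholders: in this study they would correspond to refitting the updated Cox model and computing its C-statistic, whereas the toy demo uses a sample mean scored by negative mean squared error purely so that the code runs on its own.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, TypeVar

Row = TypeVar("Row")
Model = TypeVar("Model")


def optimism_corrected(data: Sequence[Row],
                       fit: Callable[[Sequence[Row]], Model],
                       score: Callable[[Model, Sequence[Row]], float],
                       n_boot: int = 200,
                       seed: int = 0) -> float:
    """Apparent performance minus the mean bootstrap optimism."""
    rng = random.Random(seed)
    apparent = score(fit(data), data)  # fitted and scored on the same data
    optimism: List[float] = []
    for _ in range(n_boot):
        boot = [rng.choice(data) for _ in range(len(data))]  # resample with replacement
        model = fit(boot)  # repeat the whole fitting procedure
        # Optimism: performance on its own resample minus performance on the original data
        optimism.append(score(model, boot) - score(model, data))
    return apparent - mean(optimism)


# Toy demo only: the "model" is a sample mean, scored by negative mean squared error.
def fit_mean(xs: Sequence[float]) -> float:
    return mean(xs)


def neg_mse(m: float, xs: Sequence[float]) -> float:
    return -mean((x - m) ** 2 for x in xs)
```

Replacing `fit_mean` and `neg_mse` with a Cox fit and a C-statistic gives the procedure described in the text.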

The extent of risk-based antibiotic prescribing was evaluated in the CPRD cohort, which contained both antibiotic and non-antibiotic users with an incidental common infection. First, each patient’s risk of infection-related hospital admission was estimated using the three derivation models. The probability that a patient received an antibiotic prescription was then modelled for each of the three common infections. To allow for non-linear relationships between receipt of an antibiotic and the predicted risk $x$ of infection-related hospital admission, fractional polynomial models were used [26]. The final fractional polynomial model was selected by the Akaike Information Criterion (AIC) from all combinations of two power terms of $x$, with powers chosen from $x^{-2}$, $x^{-1}$, $x^{-0.5}$, $\log x$ (power 0), $x^{0.5}$, $x$, $x^{2}$ and $x^{3}$. The models were adjusted for calendar year.
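The candidate two-term (FP2) transformations described above can be generated as follows. This is a sketch of the standard fractional polynomial conventions (power 0 means $\log x$; a repeated power $p$ gives $x^p$ and $x^p \log x$), not the `mfp` implementation; the helper names are hypothetical, and the model fitting that supplies each candidate's log-likelihood is left out.

```python
import math
from itertools import combinations_with_replacement
from typing import Dict, List, Tuple

# Powers listed in the text; by convention power 0 denotes log(x).
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)

# All FP2 candidates: unordered pairs of powers, repeats allowed (36 pairs).
FP2_CANDIDATES: List[Tuple[float, float]] = list(combinations_with_replacement(FP_POWERS, 2))


def fp_term(x: float, p: float) -> float:
    """Single fractional polynomial term; x must be positive (e.g. a predicted risk)."""
    if x <= 0:
        raise ValueError("fractional polynomials require x > 0")
    return math.log(x) if p == 0 else x ** p


def fp2_terms(x: float, p1: float, p2: float) -> Tuple[float, float]:
    """Two FP2 covariates; a repeated power p gives x**p and x**p * log(x)."""
    first = fp_term(x, p1)
    second = first * math.log(x) if p1 == p2 else fp_term(x, p2)
    return first, second


def aic(log_likelihood: float, n_params: int) -> float:
    return 2 * n_params - 2 * log_likelihood


def select_by_aic(log_lik_by_powers: Dict[Tuple[float, float], float],
                  n_params: int) -> Tuple[float, float]:
    """Pick the power pair with the smallest AIC from already-fitted candidates."""
    return min(log_lik_by_powers,
               key=lambda powers: aic(log_lik_by_powers[powers], n_params))
```

Because every FP2 candidate has the same number of parameters, ranking FP2 models by AIC is equivalent to ranking them by log-likelihood; the penalty term matters when comparing models of different complexity, such as FP1 against FP2.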

The analysis was done using R versions 3.3.3 to 3.5 [27], depending on the analysis environment available for each dataset. The ‘survival’ package [28] was used to fit the survival models and estimate hazard ratios (HRs) with 95% confidence intervals (CIs). Other packages included ‘pec’ [29] to calculate survival probabilities, ‘survminer’ [30] to check the proportional hazards assumption and ‘rms’ [31] for the bootstrap validation. Fractional polynomial models were fitted using the R package ‘mfp’ [32].


A total of 10.8 million incidental consultations for URTI, LRTI and UTI were analysed in this study: 8,110,530 from CPRD and 2,727,646 from SAIL. There were 6,311,321 antibiotic users (58.23%) and 4,526,855 non-antibiotic users (41.77%), with 33,067 events recorded in CPRD and SAIL combined. The validation cohort was much younger than the derivation cohort (mean age 29.80 years in SAIL versus 38.98 years in CPRD), although most other covariates had broadly similar values (Table 2). This difference in mean age reflects the much larger number of patients under 6 years of age in SAIL (e.g. the prevalence of URTIs in this age group was 18.6% in CPRD and 32.0% in SAIL). The high proportion of patients in the combined white and unknown ethnicity category is also striking; this is primarily due to the large amount of unrecorded ethnicity data, and the values are in line with other similar studies [33].

Table 2 Baseline characteristics of the derivation and validation cohorts (i.e. patients with an incidental infection and no antibiotic prescription on the date of consultation or in the previous 3 months)

The incidence rate of events among non-antibiotic users was low: 1.71 cases per 1000 person-months in the derivation cohort and 7.49 in the validation cohort. In both datasets, the incidence rate increased with age (in adults) and with increasing Charlson Comorbidity Index (Table 3). The incidence of hospital admission for infection-related complications was highest for LRTI (CPRD, 10.28 cases per 1000 person-months; SAIL, 23.73), followed by UTI (2.09; 7.00) and then URTI (1.23; 6.42).
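The incidence rates above are events divided by total person-time. A minimal sketch, assuming a month of 365.25/12 days (the study's exact convention is not stated) and a hypothetical function name:

```python
from typing import Sequence

DAYS_PER_MONTH = 365.25 / 12  # assumption: average calendar month (30.4375 days)


def rate_per_1000_person_months(n_events: int, follow_up_days: Sequence[float]) -> float:
    """Crude incidence rate: events divided by total person-time, per 1000 person-months."""
    person_months = sum(follow_up_days) / DAYS_PER_MONTH
    if person_months <= 0:
        raise ValueError("total follow-up must be positive")
    return 1000.0 * n_events / person_months
```

As a rough reading, with about one month of follow-up per consultation, a rate of 10.28 per 1000 person-months (LRTI in CPRD) corresponds to roughly one admission per 100 consultations.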

Table 3 Counts and incidence rates for events of hospitalisation due to infection-related complications for the non-antibiotic users in both the validation and derivation cohorts

CPM derivation

In the CPMs developed in CPRD, age was the most influential predictor of risk across all infections (Table 4). HRs were highest for the youngest and oldest patients: 2.43 (LRTI), 2.20 (URTI) and 10.48 (UTI) for the < 5 category, and 5.76 (LRTI), 4.82 (URTI) and as high as 15.23 (UTI) for the 80+ category. Other predictors with a strong effect on risk described the patient’s past medical history, such as the Charlson Comorbidity Index and previous hospitalisation. As expected, the models were well calibrated within the development cohort (CPRD) (Fig. 1), with concordance values ranging from 0.71 to 0.82 (Table 5).
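A Cox model turns hazard ratios into an absolute 30-day risk through the baseline survival $S_0(30)$: risk $= 1 - S_0(30)^{\exp(\eta)}$, where $\eta$ is the sum of log-HRs for the patient's characteristics. The sketch below uses the LRTI 80+ HR of 5.76 quoted above; the baseline survival of 0.999, the second HR and the function name are hypothetical, chosen only for illustration.

```python
import math
from typing import Dict

HAZARD_RATIOS = {
    "age_80_plus": 5.76,               # LRTI, 80+ vs reference age group (Table 4)
    "previous_hospitalisation": 2.0,   # hypothetical value for illustration
}


def predicted_30_day_risk(baseline_survival_30: float,
                          hazard_ratios: Dict[str, float],
                          indicators: Dict[str, int]) -> float:
    """30-day risk from a Cox model: 1 - S0(30) ** exp(linear predictor).

    baseline_survival_30 is the day-30 survival for a patient in every
    reference category (all indicators 0); the linear predictor is the sum
    of log-HRs for the indicators that are switched on.
    """
    linear_predictor = sum(math.log(hazard_ratios[name]) * value
                           for name, value in indicators.items())
    return 1.0 - baseline_survival_30 ** math.exp(linear_predictor)
```

With these inputs, an 80+ patient with no other risk factors has a predicted risk of about 0.57%, close to 5.76 times the 0.1% baseline risk, as expected for a rare outcome.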

Table 4 HRs for the incidence of hospital admission due to infection-related complications in the derivation cohort (CPRD GOLD)
Fig. 1

Predicted against observed risks for non-antibiotic users in the derivation cohort (CPRD GOLD) for each decile (stratified by risk level). x-axis: predicted risk (N events per 1000 person-months). y-axis: observed risk (N events per 1000 person-months)

Table 5 Performance metrics for the prediction models fitted to the derivation cohort (CPRD GOLD)

Model external validation

During the first attempts at validation in the SAIL cohort (i.e. geographical validation), the concordance values were 0.61 (LRTI), 0.68 (URTI) and 0.73 (UTI), but calibration was poor: the Brier score (averaged over 10 risk groups) was 13.17 cases per 1000 person-months for LRTI (URTI, 5.25; UTI, 4.87). Age was the parameter causing the most divergence when transporting the models, so we updated all models with an additional age factor to adjust for these differences (further justification is provided in Supplement 1). After refitting (overall age adjustment shown in Table 6), calibration was much better (Fig. 2), with Brier scores of 3.78 cases per 1000 person-months for LRTI (URTI, 0.92; UTI, 1.76). Bootstrap validation of these models in the validation cohort yielded optimism-corrected concordance values of 0.63 (LRTI), 0.69 (URTI) and 0.73 (UTI). Supplement 2 provides the TRIPOD checklist for prediction model development.
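Supplement 1, which describes the authors' actual updating method, is not reproduced here. As a simpler stand-in that captures the idea of correcting for age, the sketch below rescales the hazard within each age group so that the expected number of events matches the observed number in the validation data; this is equivalent to adding a group-specific offset $\log c$ to the linear predictor. Group labels and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def recalibrate_by_group(risks: Sequence[float],
                         events: Sequence[int],
                         groups: Sequence[Hashable]) -> Tuple[List[float], Dict[Hashable, float]]:
    """Rescale the hazard within each group so expected events match observed.

    Multiplying the hazard by c turns survival S into S ** c, so a 30-day
    risk r becomes 1 - (1 - r) ** c. Taking c = observed / expected, with
    expected = sum of predicted risks, matches the totals closely when risks
    are small. Predicted risks are assumed to be positive.
    """
    observed: Dict[Hashable, float] = defaultdict(float)
    expected: Dict[Hashable, float] = defaultdict(float)
    for r, e, g in zip(risks, events, groups):
        observed[g] += e
        expected[g] += r
    factor = {g: observed[g] / expected[g] for g in expected}
    updated = [1.0 - (1.0 - r) ** factor[g] for r, g in zip(risks, groups)]
    return updated, factor
```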

Table 6 Adjusted age HRs following model adjustment in the validation cohort (SAIL)
Fig. 2

Predicted against observed risks for non-antibiotic users in the validation cohort (SAIL) for each decile (stratified by risk level); the models developed in the derivation cohort were adjusted for the validation cohort by adding an extra age predictor. x-axis: predicted risk (N events per 1000 person-months). y-axis: observed risk (N events per 1000 person-months)

Antibiotic prescribing rates

The probability of antibiotic prescribing was plotted for each of the 10 stratified groups of predicted risk for each infection and dataset (Fig. 3). Prescribing rates remained relatively constant across all levels of predicted risk. For all three infection types, patients with very low risks of being hospitalised for infection-related complications were as likely to be prescribed an antibiotic as those with much higher risks.
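The decile summaries behind this prescribing comparison (and the predicted-versus-observed calibration plots) can be sketched as below. This is an illustration with a hypothetical function name; for simplicity the observed outcome is a crude 30-day proportion rather than the rate per 1000 person-months reported in the figures.

```python
from statistics import mean
from typing import Dict, List, Sequence


def summarise_by_risk_group(risks: Sequence[float],
                            events: Sequence[int],
                            prescribed: Sequence[int],
                            n_groups: int = 10) -> List[Dict[str, float]]:
    """Split patients into n_groups equal-sized groups by predicted risk and
    report, per group, the mean predicted risk, the observed event proportion
    and the proportion prescribed an antibiotic."""
    order = sorted(range(len(risks)), key=lambda i: risks[i])
    n = len(order)
    summary = []
    for g in range(n_groups):
        idx = order[g * n // n_groups:(g + 1) * n // n_groups]
        if not idx:
            continue  # fewer patients than groups
        summary.append({
            "group": g + 1,
            "n": len(idx),
            "mean_predicted": mean(risks[i] for i in idx),
            "observed": mean(events[i] for i in idx),
            "prop_prescribed": mean(prescribed[i] for i in idx),
        })
    return summary
```

A flat `prop_prescribed` column across groups is the pattern reported in Fig. 3.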

Fig. 3

Probability of antibiotic prescribing stratified by predicted risk level for both the derivation and validation cohorts (o = LRTI – derivation cohort; Δ = LRTI – validation cohort;  = URTI – derivation cohort; + = URTI – validation cohort; □ = UTI – derivation cohort; * = UTI – validation cohort). x-axis: decile of predicted risk. y-axis: probability of antibiotic prescribing

An additional sensitivity analysis used hospitalisation for any reason as the outcome. Applying these models to all patients reiterated the main finding: prescribing rates were relatively uniform across all risk groups.


This study developed three clinical prediction models to predict the risk of infection-related hospitalisation following a GP consultation for URTI, LRTI or UTI, using the population-based CPRD and SAIL datasets. The models were adjusted and updated to generalise across datasets covering England and Wales. When the models were applied to datasets containing both antibiotic and non-antibiotic users, the decision to prescribe an antibiotic was not associated with the risk of hospitalisation due to infection-related complications. Furthermore, patients’ risk levels varied substantially, both across the cohort and by indication, which suggests that risk scores can discriminate between patients well enough to offer a viable alternative to traditional approaches to prescribing based largely on symptoms alone. Together, these two observations suggest a way to further optimise antibiotic use in primary care.

There have been very few previous attempts to apply clinical risk prediction modelling to the management of infectious conditions. The prediction models developed in this area have focused on specific resistant strains [34, 35] or specific age groups [36] and have used much smaller patient cohorts than those considered in this study. The major strength of this work is the use of two large, independent, population-based EHR datasets (large in both volume and timespan) for model development and validation. Moreover, a separate risk model was developed for each infectious condition, rather than combining multiple conditions in a single model (as others have done [15, 37]).

In 2016, the King’s Fund in the UK examined the pressures on general practice and highlighted the issues faced by practitioners, such as increasing workload, greater complexity of work and pressure to meet strict deadlines [38]. In this context, a clinical risk prediction model that objectively assesses a patient’s risk of hospitalisation may be welcome. Estimated risk scores could be presented to the patient to support shared decision-making during the consultation. Further work is needed to validate the clinical risk prediction models before they could be used in clinical practice, but this work is a first step towards changing the way GPs assess and treat patients with common infections.

Some may argue that the link between antibiotic prescribing and infection-related hospitalisation is not necessarily causal; nevertheless, antibiotics are the main large-scale treatment for bacterial infections, so ensuring their efficient use in primary care is a key lever for managing infection-related hospitalisation downstream.

Simplified versions of the models presented here have been made available to medical professionals as an educational resource through a risk calculator tool (Fig. 4), part of the BRIT Antibiotic Prescribing Dashboard hosted on the HSCN (Health and Social Care Network). Whilst the models can indicate which patients are at highest risk, they do not explicitly identify which patients should or should not be offered treatment. Because antibiotics are relatively cheap and very effective, and failing to treat a serious bacterial infection costs the individual far more than over-treatment, physicians often prescribe “just in case” [39]. Patients with infections can also deteriorate quickly (possibly within hours), whereas other clinical prediction models address conditions (e.g. cardiovascular disease or types of cancer [33, 40, 41]) that develop over a much longer period and so allow more time to intervene.

Fig. 4

The risk calculator available through the BRIT antibiotic prescribing dashboard

Conversely, it cannot be assumed that antibiotics should be given only to patients at high risk of infection-related hospitalisation: clinical assessment is a crucial part of the process, and a risk-based judgement is never complete on its own. Many additional factors can complicate the decision, including whether the infection is genuine, whether symptoms will improve with treatment and the potential for infection-related complications. Nevertheless, the calculators provide a complementary tool and could help counteract individual prescribing tendencies.

Model transportability was a problem during the validation phase, showing that it was difficult to find a model that generalised well to both populations. Domain validation (as done in this study) is notoriously difficult [42] and rarely attempted [43, 44] because it is hard to account for demographic differences and, in this case, local infectious factors. Here, the validation cohort (SAIL) showed a much higher overall incidence of complications, particularly among young children (age < 5), which may be partly attributable to the measles epidemics in South Wales in recent years [45]. Wide CIs were also observed for some groups (e.g. UTI patients aged 80+); this is likely due to the low number of events for this indication, particularly in the baseline group (age 20–30), leading to low statistical power. Despite this difficulty, standard model-updating approaches allowed us to account for these differences and to develop CPRD-specific and SAIL-specific models, with the former serving as the foundation for the latter.

The main limitation of a study of this type is that EHRs provide only a static snapshot of a patient’s consultation, so there is no way to fully understand the severity or type of symptoms seen by the practitioner. Here, each of the three infections was defined by a single umbrella term covering many different READ codes, which implies that all cases of a given infection are equally serious when, in reality, they are not. The vast array of available codes, and the variety of ways in which the same condition can be coded, add to the complexity of this type of analysis [46]. Nevertheless, this was the most practical approach, because the incidence of events for individual READ codes would have been too low to build separate prediction models. Our models were also limited by the choice of predictors, and some factors were not included. For instance, despite clear regional variation in primary care antibiotic prescribing [14, 47], region was not included as an explanatory variable, because the intention was for the models to be applicable across regions and ultimately to be used in clinical settings across the entire UK. Smoking status and BMI, which may be important predictors, were also omitted because of large amounts of missing information in the EHRs. Finally, the data did not capture delayed prescribing or cases where patients did not take (or complete) their prescribed course. These are both interesting subgroups of the main population that may warrant investigation in their own right.

A recent study investigating the drivers of antibiotic prescribing found that prescribing guidelines alone do not change prescribing behaviour and suggested that more targeted interventions are needed [14]. To achieve the ambitious government target of reducing antibiotic use in the community by 15% by 2024 [48], innovative approaches are required in primary care. Whilst this study is a start towards that goal, further work is needed, including further validation of the models in new datasets and the development of models for other common infections. Beyond the research itself, it is crucial that this intelligence is available to practitioners to inform their decision-making. An antibiotic prescribing dashboard containing this information is being developed, with the aim of working with general practices to integrate it dynamically into point-of-care decision support [49].


Three clinical risk prediction models were presented that estimate a patient’s risk of developing further complications as a result of a common infection. The models were fitted and validated using two large national datasets. Examining prescribing by risk stratum highlighted the lack of a relationship between a patient’s risk level and their chance of being prescribed an antibiotic; this likely reflects practitioners prescribing according to symptoms, but it reveals a substantial opportunity to tackle the overprescribing of antibiotics.

Availability of data and materials

The anonymised patient-level data used for this project cannot be shared for reasons of information governance. However, data can be obtained by application to CPRD and SAIL.



Abbreviations

AMR: Antimicrobial resistance
BMI: Body mass index
CI: Confidence interval
CPM: Clinical prediction model
CPRD: Clinical Practice Research Datalink (data provider for English practices)
GP: General practitioner
HES: Hospital Episode Statistics (hospitalisation data for CPRD)
HR: Hazard ratio
ICD10: International Statistical Classification of Diseases (v10)
IMD: Index of Multiple Deprivation
LRTI: Lower respiratory tract infection
NHS: National Health Service
PEDW: Patient Episode Database for Wales (hospitalisation data for SAIL)
SAIL: Secure Anonymised Information Linkage (data provider for Welsh practices)
URTI: Upper respiratory tract infection
UTI: Urinary tract infection


  1. Public Health England P. English Surveillance Programme for Antimicrobial Utilisation and Resistance (ESPAUR) report 2018. 2018.

    Google Scholar 

  2. Sugden R, Kelly R, Davies S. Combatting antimicrobial resistance globally. Nat Microbiol. 2016;1(10):16187.

    Article  CAS  PubMed  Google Scholar 

  3. Resistance ICGaA. No time to wait: securing the future from drug-resistant infections - report to the Secratary-General of the United Nations. Geneva: World Health Organisation; 2019.

  4. O’Neill J. Tackling drug-resistant infections globally: final report and recommendations. London: Review on Antimicrobial Resistance; 2016.

    Google Scholar 

  5. Adeyi OOB, Enis J, Olga B, Irwin A, Berthe FCJ, Le Gall FG, Marquez PV, Nikolic IA, Plante CA, Schneidman M, Shriber DE, Thiebaud A. Drug-resistant infections: a threat to our economic future (vol. 2): final report (English). Washington, DC: The World Bank; 2017.

  6. Committee HoCHaSC. Antimicrobial resistance: eleventh report of session 2017–19. London: House of Commons Health and Social Care Committee; 2018.

  7. Llor C, Bjerrum L. Antimicrobial resistance: risk associated with antibiotic overuse and initiatives to reduce the problem. Ther Adv Drug Saf. 2014;5(6):229–41.

    Article  PubMed  PubMed Central  Google Scholar 

  8. Smith RD, Coast J. Antimicrobial resistance: a global response. Bull World Health Organ. 2002;80(2):126–33.

    PubMed  PubMed Central  Google Scholar 

  9. Health Do. The path of least resistance. Standing Medical Advisory Committee: sub-group on antimicrobial resistance; 1998.

    Google Scholar 

  10. England PH. English Surveillance Programme for Antimicrobial Utilisation and Resistance (ESPAUR). 2018.

    Google Scholar 

  11. Mölter A, Belmonte M, Palin V, Mistry C, Sperrin M, White A, et al. Antibiotic prescribing patterns in general medical practices in England: does area matter? Health Place. 2018;7:10–6.

    Article  Google Scholar 

  12. Pouwels KB, Dolk FCK, Smith DRM, Robotham JV, Smieszek T. Actual versus ‘ideal’ antibiotic prescribing for common conditions in English primary care. J Antimicrob Chemother. 2018;73(suppl_2):19–26.

    Article  PubMed  PubMed Central  Google Scholar 

  13. Li Y, Mölter A, White A, Welfare W, Palin V, Belmonte M, et al. Relationship between prescribing of antibiotics and other medicines in primary care: a cross-sectional study. Br J Gen Pract. 2019;69(678):e42–51.

    Article  PubMed  Google Scholar 

  14. Palin V, Mölter A, Belmonte M, Ashcroft DM, White A, Welfare W, et al. Antibiotic prescribing for common infections in UK general practice: variability and drivers. 2019.

    Google Scholar 

  15. Hippisley-Cox J, Coupland C, Vinogradova Y, Robson J, May M, Brindle P. Derivation and validation of QRISK, a new cardiovascular disease risk score for the United Kingdom: prospective open cohort study. Br Med J. 2007;335(7611):136.

  16. Herrett E, Gallagher AM, Bhaskaran K, Forbes H, Mathur R, van Staa T, et al. Data resource profile: Clinical Practice Research Datalink (CPRD). Int J Epidemiol. 2015;44(3):827–36.

    Article  PubMed  PubMed Central  Google Scholar 

  17. Jones KHFD, Lyons RA. The SAIL Databank: 10 years of spearheading data privacy and research utility. Swansea: Swansea University; 2017.

  18. Ford DV, Jones KH, Verplancke JP, Lyons RA, John G, Brown G, et al. The SAIL Databank: building a national architecture for e-health research and evaluation. BMC Health Serv Res. 2009;9:157.

    Article  PubMed  PubMed Central  Google Scholar 

  19. Jones KH, Ford DV, Jones C, Dsilva R, Thompson S, Brooks CJ, et al. A case study of the Secure Anonymous Information Linkage (SAIL) Gateway: a privacy-protecting remote access system for health-related research and evaluation. J Biomed Inform. 2014;50:196–204.

    Article  PubMed  PubMed Central  Google Scholar 

  20. GOV.UK. English indices of deprivation 2019 [updated 17-Jun-19. Available from: Accessed 18 Apr 2020.

  21. ClinicalCodes. 2019. Accessed 18 Apr 2020.

  22. Charlson ME, Pompei P, Ales KL, MacKenzie CR. A new method of classifying prognostic comorbidity in longitudinal studies: development and validation. J Chronic Dis. 1987;40(5):373–83.

  23. Charlson M. Charlson Comorbidity Index (CCI) [updated 17 Jun 2019]. Accessed 18 Apr 2020.

  24. Ministry of Housing, Communities and Local Government. National Statistics: English indices of deprivation 2015. 2015 [updated 30 Sep 2015].

  25. Pedersen AB, Mikkelsen EM, Cronin-Fenton D, Kristensen NR, Pham TM, Pedersen L, et al. Missing data and multiple imputation in clinical epidemiological research. Clin Epidemiol. 2017;9:157–66.

  26. Steyerberg EW. Clinical prediction models: a practical approach to development, validation, and updating. New York: Springer; 2009.

  27. R Core Team. R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing; 2017.

  28. Therneau TM, Grambsch PM. Modeling survival data: extending the Cox model. New York: Springer; 2000.

  29. Mogensen UB, Ishwaran H, Gerds TA. Evaluating random forests for survival analysis using prediction error curves. J Stat Softw. 2012;50(11):1–23.

  30. Harrell FE Jr. Regression modeling strategies: with applications to linear models, logistic and ordinal regression, and survival analysis. Springer International Publishing; 2018.

  31. Harrell FE Jr. rms: regression modeling strategies. R package version 5.1-2; 2018.

  32. Stephan Luecke M. mfp: multivariable fractional polynomials; 2015.

  33. Hippisley-Cox J, Coupland C, Robson J, Brindle P. Derivation, validation, and evaluation of a new QRISK model to estimate lifetime risk of cardiovascular disease: cohort study using QResearch database. BMJ. 2010;341:c6624.

  34. Sabino S, Soares S, Ramos F, Moretti M, Zavascki AP, Rigatto MH. A cohort study of the impact of carbapenem-resistant Enterobacteriaceae infections on mortality of patients presenting with sepsis. mSphere. 2019;4(2).

  35. Weinstein EJ, Han JH, Lautenbach E, Nachamkin I, Garrigan C, Bilker WB, et al. A clinical prediction tool for extended-spectrum cephalosporin resistance in community-onset Enterobacterales urinary tract infection. Open Forum Infect Dis. 2019;6(4):ofz164.

  36. Figueroa-Phillips LM, Bonafide CP, Coffin SE, Ross ME, Guevara JP. Development of a clinical prediction model for central line-associated bloodstream infection in children presenting to the emergency department. Pediatr Emerg Care. 2019.

  37. Chiang PPC, Glance D, Walker J, Walter FM, Emery JD. Implementing a QCancer risk tool into general practice consultations: an exploratory study using simulated consultations with Australian general practitioners. Br J Cancer. 2015;112:S77.

  38. Baird B, Charles A, Honeyman M, Maguire D, Das P. Understanding pressures in general practice. London: The King’s Fund; 2016.

  39. Lucas PJ, Cabral C, Hay AD, Horwood J. A systematic review of parent and clinician views and perceptions that influence prescribing decisions in relation to acute childhood infections in primary care. Scand J Prim Health Care. 2015;33(1):11–20.

  40. Cassidy A, Myles JP, van Tongeren M, Page RD, Liloglou T, Duffy SW, et al. The LLP risk model: an individual risk prediction model for lung cancer. Br J Cancer. 2008;98(2):270–6.

  41. Rockhill B, Spiegelman D, Byrne C, Hunter DJ, Colditz GA. Validation of the Gail et al. model of breast cancer risk prediction and implications for chemoprevention. J Natl Cancer Inst. 2001;93(5):358–66.

  42. Toll DB, Janssen KJ, Vergouwe Y, Moons KG. Validation, updating and impact of clinical prediction rules: a review. J Clin Epidemiol. 2008;61(11):1085–94.

  43. Siontis GC, Tzoulaki I, Castaldi PJ, Ioannidis JP. External validation of new risk prediction models is infrequent and reveals worse prognostic discrimination. J Clin Epidemiol. 2015;68(1):25–34.

  44. Tugwell P, Knottnerus JA. Clinical prediction models are not being validated. J Clin Epidemiol. 2015;68(1):1–2.

  45. Public Health Wales. Measles outbreak: data. 2015 [updated 19 Feb 2015]. Accessed 18 Apr 2020.

  46. Williams R. A Christmas guide to clinical coding. BMJ. 2018;363:k5209.

  47. Mölter A, Belmonte M, Palin V, Mistry C, Sperrin M, White A, et al. Antibiotic prescribing patterns in general medical practices in England: does area matter? Health Place. 2018;53:10–6.

  48. Department of Health and Social Care. Tackling antimicrobial resistance 2019–2024: the UK’s five-year national action plan [updated 24 Jan 2019]. Accessed 18 Apr 2020.

  49. Palin V, Tempest E, Mistry C, van Staa T. Developing the infrastructure to support the optimisation of antibiotic prescribing using the learning healthcare system (LHS) to improve healthcare services in the provision of primary care in England. BMJ Health Care Inform. 2020; bmjhci-2020-100147.R1 (in press).

Acknowledgements

Not applicable.

Funding

This study was funded by Connected Health Cities. Connected Health Cities is a Northern Health Science Alliance-led programme funded by the Department of Health and delivered by a consortium of academic and NHS organisations across the north of England.

Author information

Contributions

TvS and CM designed the study and prepared the data for analysis. CM did the systematic literature search for the introduction, carried out the statistical analysis, prepared the supplementary material and wrote the first draft. TvS, GM, DJ, YL and VP provided statistical insight and guidance. In particular, YL and VP helped update the manuscript to address feedback from reviewers. VP, WW and DA contributed clinical guidance. All authors contributed to further drafts and approved the final manuscript.

Corresponding author

Correspondence to Tjeerd van Staa.

Ethics declarations

Ethics approval and consent to participate

This study is based in part on the data from the Clinical Practice Research Datalink obtained under licence from the UK Medicines and Healthcare products Regulatory Agency. The data is provided by patients and collected by the NHS as part of their care and support. Hospital Episode Statistics (HES) data is subject to Crown copyright (2018) protection, re-used with the permission of the Health and Social Care Information Centre, all rights reserved. This study also used anonymised data held in the Secure Anonymised Information Linkage (SAIL) System, which is part of the national e-health records infrastructure for Wales.

The interpretation and conclusions contained in this study are those of the authors alone and not necessarily those of the SAIL, MHRA, NHSA, NHS or the Department of Health. The study protocol was approved by the Independent Scientific Advisory Committee for CPRD research (protocol number 16_153) and SAIL’s Information Governance Protocol Review Panel (protocol number 0693). We would like to acknowledge all the data providers who make anonymised data available for research.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1.

Supplement 1: Details on model validation.

Additional file 2.

Supplement 2: TRIPOD Checklist for prediction model development.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit the Creative Commons website. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Mistry, C., Palin, V., Li, Y. et al. Development and validation of a multivariable prediction model for infection-related complications in patients with common infections in UK primary care and the extent of risk-based prescribing of antibiotics. BMC Med 18, 118 (2020).
