Blood transcriptomic discrimination of bacterial and viral infections in the emergency department: a multi-cohort observational validation study
BMC Medicine volume 18, Article number: 185 (2020)
There is an urgent need to develop biomarkers that stratify risk of bacterial infection in order to support antimicrobial stewardship in emergency hospital admissions.
We used computational machine learning to derive a rule-out blood transcriptomic signature of bacterial infection (SeptiCyte™ TRIAGE) from eight published case-control studies. We then validated this signature by itself in independent case-control data from more than 1500 samples in total, and in combination with our previously published signature for viral infections (SeptiCyte™ VIRUS) using pooled data from a further 1088 samples. Finally, we tested the performance of these signatures in a prospective observational cohort of emergency department (ED) patients with fever, and we used the combined SeptiCyte™ signature in a mixture modelling approach to estimate the prevalence of bacterial and viral infections in febrile ED patients without microbiological diagnoses.
The combination of SeptiCyte™ TRIAGE with our published signature for viral infections (SeptiCyte™ VIRUS) discriminated bacterial and viral infections in febrile ED patients, with a receiver operating characteristic area under the curve of 0.95 (95% confidence interval 0.90–1), compared to 0.79 (0.68–0.91) for WCC and 0.73 (0.61–0.86) for CRP. At pre-test probabilities 0.35 and 0.72, the combined SeptiCyte™ score achieved a negative predictive value for bacterial infection of 0.97 (0.90–0.99) and 0.86 (0.64–0.96), compared to 0.90 (0.80–0.94) and 0.66 (0.48–0.79) for WCC and 0.88 (0.69–0.95) and 0.60 (0.31–0.72) for CRP. In a mixture modelling approach, the combined SeptiCyte™ score estimated that 24% of febrile ED cases receiving antibacterials without a microbiological diagnosis were due to viral infections. Our analysis also suggested that a proportion of patients with bacterial infection recovered without antibacterials.
Blood transcriptional biomarkers offer exciting opportunities to support precision antibacterial prescribing in ED and improve diagnostic classification of patients without microbiologically confirmed infections.
There is an urgent need to improve precision use of antibacterial drugs in order to minimise unnecessary prescribing. Unnecessary prescribing has a disproportionate impact within hospitals. In this setting, antibacterial overuse selects for drug-resistant bacteria and disrupts host-protective microbiota among individuals with increased risk of infection due to comorbidities, invasive procedures or instrumentation. All of this is compounded by exposure to drug-resistant pathogens from other hospital inpatients or the hospital environment [2,3,4].
Precision use of antibacterials is most challenging in emergency departments (ED), where assessments are based on a single time point with limited microbiological and laboratory data. Clinical features of severe sepsis unequivocally demand empirical antibacterials. However, in patients who do not present with severe sepsis, better stratification of the risk of bacterial infection is expected to reduce antibacterial prescribing and may inform decisions about hospital admission, infection control practice and the choice of diagnostic investigations. These objectives have fuelled extensive efforts to identify biomarkers which discriminate bacterial and viral infections. Importantly, routine diagnostic microbiology may provide inaccurate estimates of the true incidence of bacterial and viral infections in an ED setting. For example, in a prospective observational study, approximately 50% of suspected bacterial infections and 30% of suspected viral infections were not confirmed. Accurate estimates of prior probability, needed to evaluate the predictive value of tests, are lacking. We hypothesise that molecular biomarkers of bacterial and viral infections may be used to obtain better estimates of the incidence of these infections in ED.
Blood leucocyte counts, C-reactive protein (CRP) and procalcitonin (PCT) are the most widely used biomarkers of infection in current practice. Blood neutrophilia is associated with bacterial infection, but also occurs in response to trauma, seizures and vomiting. Deficient neutrophil leucocytosis or leucopaenia is recognised in elderly patients with infection and in severe sepsis. Lymphopaenia, sometimes associated with viral illnesses, is also reported as a correlate of bacteraemia. Therefore, differential blood leucocyte counts have limited value as a biomarker to guide antibacterial use. In a multivariate analysis of clinical and laboratory parameters in febrile ED patients, elevated serum CRP and history of rigors were significantly associated with bacterial infection. These were used in combination with serum PCT levels to develop a diagnostic risk score for bacterial infection, with a receiver operating characteristic (ROC) area under the curve (AUC) of 0.83. At a sensitivity of 95% and specificity of 32%, this risk score achieved a negative predictive value (NPV) of 73% compared to physician’s judgement which achieved 96% sensitivity, 50% specificity and 85% NPV. Even with suboptimal tests, the potential for biomarkers such as PCT to safely reduce initiation and continuation of antibacterial treatment has been demonstrated in selected ED patients. In unselected adult ED patients with fever, a trial of PCT-guided treatment did not reduce antibacterial prescribing. This was partly attributed to physician non-adherence, vindicated by the fact that PCT only identified confirmed bacterial infections with ROC AUC of 0.68, underscoring the need for more accurate biomarkers.
In recent years, blood transcriptional profiling has emerged as a potentially powerful tool for diagnostic biomarker discovery in infectious diseases. We and others have focused this approach on identifying transcriptional signatures that discriminate between infective and non-infective inflammatory syndromes [14,15,16], and on discriminating between bacterial and viral infections [17,18,19]. Validation of these transcriptional signatures in prospective unselected ED cohorts is limited to two case-control studies: one of febrile children, in which a single gene-pair ratio achieved ROC AUC 0.97 for 28 confirmed bacterial infections compared to 23 confirmed viral infections, and our previously published validation of a transcriptional signature for viral infection (SeptiCyte™ VIRUS), in which the sum of two gene-pair ratios achieved ROC AUC 0.93 for 54 confirmed bacterial infections compared to 14 confirmed viral infections among febrile adults. None has sought to compare the performance of transcriptional biomarkers to that of the existing biomarkers used almost ubiquitously in routine practice.
A key utility of a biomarker to support clinical decisions in ED is its potential use as a triage test to determine the risk of bacterial infection. In the present study, we describe the discovery and multi-cohort validation of a new blood transcriptomic signature (SeptiCyte™ TRIAGE) designed to be a “rule-out” test for bacterial infection. We then sought to benchmark the application of SeptiCyte™ TRIAGE, by itself and in combination with SeptiCyte™ VIRUS, against the performance of peripheral blood leukocytes and CRP to discriminate between confirmed bacterial and viral infections in unselected adults presenting to ED with fever. Finally, we used the combined signatures in a mixture modelling approach to estimate the incidence of bacterial and viral infections in patients from the same cohort with no microbiological diagnosis.
Selection of published data sets for discovery and validation of blood transcriptional signatures
We used four mutually exclusive groups of publicly available case-control data sets from GEO and ArrayExpress repositories that were of human origin and involved transcriptional profiling of whole blood or peripheral blood mononuclear cells without culture or stimulation. In the first group, we identified data sets derived from ED studies that included proven bacterial infections compared to uninfected healthy or virally infected controls (Additional Table S1). In the second group, we used data sets originally identified in our previous publication describing derivation and validation of the SeptiCyte™ VIRUS signature, in which neither cases nor controls included bacterial infection (Additional Table S2). In the third group, we identified all data sets that included proven bacterial infection cases and controls comprising healthy volunteers or patients with non-infective systemic inflammatory response syndrome (Additional Table S3). In the fourth group, we identified all remaining data sets, not included in any other group, that included proven bacterial infection cases and viral infection controls (Additional Table S5). The first two groups were identified by searches on 20 January 2015. The third and fourth groups were identified by searches on 17 May 2017.
Study approval for prospective ED cohort
This study was approved by the UK National Research Ethics Service (reference: 10/H0713/51).
ED study population and sampling
Consecutive adult patients presenting to University College London Hospitals Emergency Department service with a core temperature of > 37.5 °C were invited to participate (Table 1). Recruitment took place in 2010–2013, subject to availability of the recruitment team within regular working hours. All participants provided written informed consent. Where patients were unable to give consent directly, assent for their participation was sought from accompanying persons. In these cases, the patients’ consent to participate in the study was confirmed when patients were able to do so. Tempus™ tube (Fisher Scientific) blood samples were collected alongside routine blood tests in ED, within 4 h of presentation to hospital. Demographic data, clinical laboratory results and clinical outcome data were obtained from the hospital electronic data repository. Blood RNA samples were not available for downstream analysis for a subset of the cohort, either because the sample was not obtained at the time of recruitment or because the subsequent RNA extraction did not yield an adequate concentration of high-quality RNA (see Fig. 2 and Table 1).
Clinical case definitions
Patients were classified into five separate groups based on laboratory microbiology and whether they received antimicrobial treatment during their hospital stay (Fig. 2, Table 1). Confirmed bacterial infection required culture of pathogenic bacteria from a sterile site (triggering initiation or continuation of antibacterial treatment). Confirmed viral infection required a positive viral PCR from a clinical specimen or serological evidence of acute infection. Those who had no positive microbiology were divided into two further groups on the basis of whether or not they received antimicrobial treatment. The final group consisted of microbiologically proven infections not due to bacterial or viral pathogens.
Blood transcriptomic profiling
Samples from a subset of this cohort had previously been subjected to RNA sequencing (RNAseq) for validation of our previously published SeptiCyte™ VIRUS signature. We complemented these data with targeted transcriptional profiling of all remaining samples from the study cohort for which adequate RNA was available, using customised NanoString nCounter assays (NanoString Technologies). Briefly, total blood RNA was extracted using the Tempus Spin RNA Isolation Kit (Ambion; Life Technologies). Sample signal values from this assay were background subtracted, normalised to the positive control (GAPDH expression value) in each run and log2-transformed. To ensure that we could pool RNAseq and NanoString data, we undertook NanoString profiling of a subset of samples already subjected to RNAseq and made direct pairwise comparisons of the gene expression signatures used in the present study. This analysis showed high concordance, with a correlation coefficient of 0.99 (Fig. S1). Gene expression data used to calculate the blood transcriptional signature scores for the ED cohort are provided in Additional File 1.
A blood transcriptional signature for bacterial infections (SeptiCyte™ TRIAGE) was derived from separate discovery and validation microarray data sets (Additional Tables S1–3) using linear models of gene-pair ratios as described previously [14, 19] and in the “Results” section. The SeptiCyte™ scores were calculated from log2-transformed gene expression values. For SeptiCyte™ VIRUS, the calculation comprised (ISG15 + OASL) − (IL16 + ADGRE5). For SeptiCyte™ TRIAGE, the calculation comprised (DIAPH2 + GBP2 + TLR5) − (IL7R + GIMAP4 + FGL2). The combined SeptiCyte™ score was calculated by subtracting the SeptiCyte™ VIRUS score from the SeptiCyte™ TRIAGE score.
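The score definitions above can be written out as a short Python sketch. The gene lists and the sign conventions are taken from the text; the function names and the example expression values are hypothetical and serve only to illustrate the arithmetic.

```python
# Genes whose log2 expression is added or subtracted, as stated in the text.
VIRUS_UP = ("ISG15", "OASL")
VIRUS_DOWN = ("IL16", "ADGRE5")
TRIAGE_UP = ("DIAPH2", "GBP2", "TLR5")
TRIAGE_DOWN = ("IL7R", "GIMAP4", "FGL2")


def signature_score(log2_expr, up, down):
    """Sum of log2 values for `up` genes minus the sum for `down` genes."""
    return sum(log2_expr[g] for g in up) - sum(log2_expr[g] for g in down)


def septicyte_scores(log2_expr):
    """Return VIRUS, TRIAGE and combined (TRIAGE minus VIRUS) scores."""
    virus = signature_score(log2_expr, VIRUS_UP, VIRUS_DOWN)
    triage = signature_score(log2_expr, TRIAGE_UP, TRIAGE_DOWN)
    return {"virus": virus, "triage": triage, "combined": triage - virus}


# Hypothetical log2 expression values for one sample (not study data).
example_sample = {
    "ISG15": 8.0, "OASL": 7.0, "IL16": 9.0, "ADGRE5": 10.0,
    "DIAPH2": 9.0, "GBP2": 10.0, "TLR5": 8.0,
    "IL7R": 7.0, "GIMAP4": 8.0, "FGL2": 9.0,
}
example_scores = septicyte_scores(example_sample)
```

Because the inputs are log2-transformed, each difference of two genes corresponds to the log of a gene-pair ratio, which is why the signatures are described as sums of gene-pair ratios.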
Unit variance scaling of gene expression in multi-cohort data (Table S5) was performed by subtracting the mean and dividing by the standard deviation in each data set. Mann-Whitney and receiver operating characteristic (ROC) analyses were performed in GraphPad Prism v6. The Youden index of ROC curves was calculated from the sum of sensitivity and specificity − 1. Bayesian conditional probabilities were calculated as previously described. Ninety-five per cent confidence intervals are provided for each measure of test performance. We used mixture modelling to estimate the proportions of bacterial and viral infections in patients recruited to the ED fever cohort. The frequency distributions for SeptiCyte™ scores for cases of proven bacterial and viral infections were fitted to two normal distributions using maximum likelihood. The posterior probabilities for these two classes were used to estimate the relative likelihood of a bacterial or viral diagnosis for a given value of SeptiCyte™ score. New distributions were then constructed by mixing the two distinct normal distributions in different proportions of viral to bacterial cases (ranging 0.1 to 10 in steps of 5 × 10⁻⁴) using the R function rnorm() to generate the appropriate set of random deviates. Each mixed distribution was compared to the empirical distributions for cases of unknown aetiology. The difference between predicted and observed distribution was measured with the Jensen-Shannon divergence using CalcJSDivergence() in the R package textmineR. The proportion giving the minimum divergence was chosen as the best fit.
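The mixture-modelling step can be illustrated with a minimal Python sketch. It is not the authors' R code: it swaps the random deviates generated with rnorm() for analytic bin probabilities of the fitted normals, uses a coarser grid step (0.01 rather than 5 × 10⁻⁴), and the bin count, function names and simulated data are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def fit_normal(values):
    """Maximum-likelihood normal fit: sample mean and population SD."""
    return NormalDist(fmean(values), pstdev(values))


def bin_probabilities(dist, edges):
    """Normalised probability mass of `dist` in each bin defined by `edges`."""
    cdf = [dist.cdf(e) for e in edges]
    mass = [cdf[i + 1] - cdf[i] for i in range(len(edges) - 1)]
    total = sum(mass)
    return [m / total for m in mass]


def histogram(values, edges):
    """Empirical bin frequencies; `edges` must be equally spaced and span all values."""
    counts = [0] * (len(edges) - 1)
    width = edges[1] - edges[0]
    for v in values:
        i = min(int((v - edges[0]) / width), len(counts) - 1)
        counts[i] += 1
    return [c / len(values) for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return (kl(p, m) + kl(q, m)) / 2


def estimate_viral_ratio(unknown, bacterial, viral, n_bins=20, step=0.01):
    """Grid-search the viral:bacterial ratio (0.1 to 10) minimising the divergence."""
    b_fit, v_fit = fit_normal(bacterial), fit_normal(viral)
    everything = list(unknown) + list(bacterial) + list(viral)
    lo, hi = min(everything) - 1.0, max(everything) + 1.0
    edges = [lo + (hi - lo) * i / n_bins for i in range(n_bins + 1)]
    observed = histogram(unknown, edges)
    p_b, p_v = bin_probabilities(b_fit, edges), bin_probabilities(v_fit, edges)
    best_ratio, best_div = None, math.inf
    for k in range(round((10 - 0.1) / step) + 1):
        ratio = 0.1 + k * step
        w = ratio / (1 + ratio)  # viral fraction implied by the ratio
        mixed = [w * x + (1 - w) * y for x, y in zip(p_v, p_b)]
        d = js_divergence(observed, mixed)
        if d < best_div:
            best_ratio, best_div = ratio, d
    return best_ratio, best_ratio / (1 + best_ratio)


# Simulated demonstration only (hypothetical score distributions, not study data).
random.seed(1)
demo_bacterial = [random.gauss(5.0, 1.5) for _ in range(500)]
demo_viral = [random.gauss(-3.0, 1.5) for _ in range(500)]
demo_unknown = ([random.gauss(-3.0, 1.5) for _ in range(300)]
                + [random.gauss(5.0, 1.5) for _ in range(700)])
demo_ratio, demo_fraction = estimate_viral_ratio(demo_unknown, demo_bacterial, demo_viral)
```

In the simulated demonstration, 30% of the "unknown" cases are drawn from the viral distribution, so the fitted viral fraction should land near 0.3. The analytic bin probabilities were chosen to keep the sketch deterministic; sampling random deviates, as described in the text, should converge to the same answer as the number of deviates grows.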
In silico discovery of the SeptiCyte™ signatures was performed by Immunexpress. No a priori sample size calculation was performed for recruitment of the ED cohort. The evaluation of the performance of this signature in the prospective ED cohort was conducted entirely by independent investigators at UCL, with no commercial interest in Immunexpress. This includes the design of the inclusion criteria for the cohort study, participant recruitment, clinical data collection and case ascertainment, sample collection, measurement of the RNA signatures and evaluation of the performance of these signatures. The Standards for Reporting Diagnostic accuracy studies (STARD) checklist is available as an online supplement.
Discovery and in silico validation of a blood transcriptional signature associated with bacterial infections (SeptiCyte™ TRIAGE)
We sought to identify a parsimonious blood transcriptional signature for bacterial infection using computational approaches similar to those used to derive gene signatures for sepsis in the ICU setting and for viral infections in general. Eight public microarray data sets comparing patients with bacterial infections to controls (Additional Table S1) were used to discover gene-pair ratios that were differentially expressed (with false discovery rate < 0.01) and that discriminated between bacterial and control cases with ROC AUC > 0.7 in each data set. We then sought to exclude non-specific biomarkers of disease, by identifying and excluding all gene-pair ratios that discriminated non-infective diseases from their controls with ROC AUC > 0.8 (in blood transcriptomic data from eight published data sets of non-infective diseases; Additional Table S2). Finally, in the pool of eight normalised discovery data sets, we used stepwise addition of the remaining gene-pair ratios ranked by greedy forward selection to maximise the mean ROC AUC between bacterial cases and controls. This approach identified a blood transcriptional signature, SeptiCyte™ TRIAGE, based on the sum of three gene-pair ratios (DIAPH2/IL7R, GBP2/GIMAP4, TLR5/FGL2), which differentiated bacterial infections from viral infections and healthy controls in the discovery data sets (Table S1) with a ROC AUC range of 0.77–1 (Fig. 1a). We then sought to validate this signature in independent published data sets derived from patients with bacterial infection and healthy volunteers, or non-infective diseases (Fig. 1b and Additional Table S3). The SeptiCyte™ TRIAGE signature discriminated bacterial infection cases from healthy volunteers with a ROC AUC range of 0.70–1.
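The final selection step can be sketched as a greedy forward search over candidate gene-pair ratios. This is an illustrative Python sketch rather than the pipeline actually used: the earlier filtering steps (false discovery rate and AUC thresholds) are assumed to have been applied already, and the stopping rule, the `max_pairs` limit, the data layout and the toy values are assumptions.

```python
def roc_auc(case_scores, control_scores):
    """ROC AUC as the Mann-Whitney probability that a case outscores a control."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def signature(sample, pairs):
    """Sum of log2 gene-pair ratios, i.e. numerator minus denominator per pair."""
    return sum(sample[num] - sample[den] for num, den in pairs)


def mean_auc(datasets, pairs):
    """Mean ROC AUC of the summed signature across (cases, controls) data sets."""
    aucs = [
        roc_auc([signature(s, pairs) for s in cases],
                [signature(s, pairs) for s in controls])
        for cases, controls in datasets
    ]
    return sum(aucs) / len(aucs)


def greedy_forward_selection(datasets, candidates, max_pairs=3):
    """Add the pair giving the largest gain in mean AUC until no pair improves it."""
    selected, best = [], 0.0
    remaining = list(candidates)
    while remaining and len(selected) < max_pairs:
        scored = [(mean_auc(datasets, selected + [p]), p) for p in remaining]
        top_auc, top_pair = max(scored, key=lambda t: t[0])
        if top_auc <= best:
            break
        selected.append(top_pair)
        remaining.remove(top_pair)
        best = top_auc
    return selected, best


# Toy data set with hypothetical genes A-D (log2 values), not study data.
toy_cases = [{"A": 10, "B": 5, "C": 7, "D": 7}, {"A": 9, "B": 5, "C": 6, "D": 7}]
toy_controls = [{"A": 6, "B": 6, "C": 7, "D": 7}, {"A": 5, "B": 6, "C": 7, "D": 6}]
toy_selected, toy_auc = greedy_forward_selection(
    [(toy_cases, toy_controls)], [("A", "B"), ("C", "D")]
)
```

In the toy example the A/B ratio already separates cases from controls, so the search stops after one pair. Greedy selection does not guarantee the globally best subset; it trades optimality for a small, interpretable signature.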
Using SeptiCyte™ TRIAGE to discriminate bacterial and viral infections in adult ED patients with a fever
Next, we sought to validate the SeptiCyte™ TRIAGE signature in the ED setting, and to benchmark its performance against peripheral blood leukocyte counts and CRP, used almost universally in ED. We recruited an observational cohort of 332 consecutive patients presenting to the ED in a large UK teaching hospital with a temperature of > 37.5 °C, for whom we were able to obtain consent from the patient or where necessary the next of kin (Table 1, Fig. 2, Additional Fig. S2). The patients ranged from 17 to 99 years of age and 48% were male. No predefined risk factors (Additional Table S4) for infection were evident in 147 (44%) patients in the cohort. Of the remainder, most had one risk factor (Additional Fig. S2C-D). Recruitment to the study did not affect the diagnostic investigations or management of the participants in any way. Hence, the diagnostic yield and use of antimicrobial treatment in this cohort reflected routine practice in a UK setting. Confirmatory microbiological diagnosis became available for 124 patients (38%), including 102 bacterial and 16 viral infections, four cases of malaria, one attributed to fungal infection and one to Entamoeba histolytica (Additional Fig. S2E). Of the 208 cases with no positive microbiology, 32 recovered without receiving antimicrobials. The remaining 176 received empirical antibacterial treatment (Fig. 2). Blood transcriptomic data were available on 68 patients with proven bacterial infection, 14 patients with proven viral infection and 118 patients with no confirmed laboratory diagnosis of infection of whom 93 received empirical antibacterial treatment (Fig. 2).
Within this ED fever cohort, the SeptiCyte™ TRIAGE score for 68 patients with confirmed bacterial infection and 14 with confirmed viral infection was derived from RNAseq data. The SeptiCyte™ TRIAGE score was significantly higher in bacterial infection compared to viral infection (Fig. 3a) and achieved a ROC AUC of 0.88 (0.81–0.97) (Fig. 3e). We used the ROC curve Youden index to identify the threshold value providing the maximum classification accuracy for a given test. At this threshold, the SeptiCyte™ TRIAGE score achieved a sensitivity of 0.87 (0.76–0.94) and specificity of 0.79 (0.5–0.95), which we then used to calculate the positive and negative predictive values for this test, across a range of pre-test probabilities (Fig. 3i). Assuming prior probabilities of 72% or 35% for upper bound and lower bound of bacterial infection in febrile ED patients, the NPV of the SeptiCyte™ TRIAGE score at its Youden index was calculated to be 0.70 (0.45–0.80) and 0.92 (0.79–0.95) respectively (Fig. S3).
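The dependence of predictive values on the pre-test probability follows from Bayes' theorem. The Python sketch below is a generic calculation, not the authors' code (the function name is an assumption); applied to the reported sensitivity (0.87) and specificity (0.79), it reproduces the NPV point estimates of 0.70 and 0.92 at prior probabilities of 72% and 35%. It does not attempt the confidence intervals.

```python
def predictive_values(sensitivity, specificity, prior):
    """Return (PPV, NPV) for a test applied at a given pre-test probability."""
    true_pos = sensitivity * prior
    false_neg = (1 - sensitivity) * prior
    true_neg = specificity * (1 - prior)
    false_pos = (1 - specificity) * (1 - prior)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity and specificity of the TRIAGE score at its Youden index (from the text).
_, npv_upper_prior = predictive_values(0.87, 0.79, 0.72)
_, npv_lower_prior = predictive_values(0.87, 0.79, 0.35)
```

The same function illustrates why a rule-out test needs high sensitivity: as the prior rises, false negatives make up a growing share of all negative results and the NPV falls.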
Combining SeptiCyte™ TRIAGE and SeptiCyte™ VIRUS to obtain greater discrimination
Next, we tested the hypothesis that combining our tests for viral (SeptiCyte™ VIRUS) and bacterial (SeptiCyte™ TRIAGE) infection would achieve better discrimination between cases of bacterial and viral infection. This hypothesis was based on the premise that because each signature was independently derived to discriminate between different classes (bacterial infection from controls in the case of SeptiCyte™ TRIAGE, and viral infections from controls in the case of SeptiCyte™ VIRUS), each signature would reflect different or orthogonal features of the cases and hence, in combination, would offer better discrimination between bacterial and viral infections than either signature alone. In order to test this assumption, we first pooled publicly available data (following unit variance scaling), from 1088 bacterial and viral infections in twelve case-control studies that had not contributed to any of the discovery data for either signature (Additional Table S5). Comparison of the two scores in these data revealed a statistically significant inverse correlation, but an R² coefficient of only 0.13, indicating that the majority of the signal from each score was orthogonal (Fig. S3A). Consistent with our hypothesis, a combined score, generated by subtracting the viral score from the bacterial score, was found to discriminate bacterial and viral infections in these pooled data with a ROC AUC of 0.88 (0.86–0.9), compared to the SeptiCyte™ TRIAGE alone (ROC AUC 0.76, 0.73–0.79) or SeptiCyte™ VIRUS score alone (ROC AUC of 0.84, 0.82–0.87) (Fig. S3B).
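Pooling data from different studies relies on the unit variance scaling described in the methods: within each data set, each gene is centred on its mean and divided by its standard deviation. A minimal Python sketch follows; the data layout (a dict of gene to per-sample values) and the choice of the sample standard deviation are assumptions, since the text does not specify them.

```python
from statistics import fmean, stdev


def unit_variance_scale(values):
    """Centre on the mean and divide by the (sample) standard deviation."""
    mu, sd = fmean(values), stdev(values)
    return [(v - mu) / sd for v in values]


def pool_datasets(datasets):
    """Scale every gene within each data set, then concatenate across data sets."""
    pooled = {}
    for expr_by_gene in datasets:
        for gene, values in expr_by_gene.items():
            pooled.setdefault(gene, []).extend(unit_variance_scale(values))
    return pooled


# Two hypothetical data sets measured on very different scales (not study data).
pooled_example = pool_datasets([
    {"ISG15": [1.0, 2.0, 3.0]},
    {"ISG15": [10.0, 20.0, 30.0]},
])
```

After scaling, both toy data sets contribute identical values, which shows how the step removes platform-level differences in scale before scores are compared in the pooled data.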
This analysis provided independent validation of the combined SeptiCyte™ score in case-control data, but our primary aim was to test its performance in the observational ED fever cohort. In this setting, the distribution of values for the combined score was significantly higher in bacterial infections compared to viral infections (Fig. 3b) and discriminated between the two groups with a ROC AUC of 0.95 (0.90–1) (Fig. 3f). At the Youden index of the ROC curve, the combined score achieved a sensitivity of 0.94 (0.86–0.98) and specificity of 0.93 (0.66–0.99) for bacterial infections. At this threshold, the PPV and NPV of the combined score are shown across the range of pre-test probabilities in Fig. 3j. Assuming prior probabilities of 72% or 35% for upper bound and lower bound of bacterial infection in febrile ED patients [as estimated in reference # 7], the NPV of the combined SeptiCyte™ score at its Youden index was calculated to be 0.86 (0.64–0.96) and 0.97 (0.90–0.99) respectively (Additional Fig. S4).
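The Youden-index threshold used for each test can be located by scanning candidate cut-offs and keeping the one that maximises sensitivity + specificity − 1. The Python sketch below is an illustration under stated assumptions: scores at or above the threshold are called bacterial, candidate thresholds are the observed score values, and the example scores are hypothetical.

```python
def youden_threshold(case_scores, control_scores):
    """Return (J, threshold, sensitivity, specificity) at the maximum Youden index."""
    best = (-1.0, None, None, None)
    for t in sorted(set(case_scores) | set(control_scores)):
        # Scores >= t are classified as positive (bacterial) in this sketch.
        sensitivity = sum(s >= t for s in case_scores) / len(case_scores)
        specificity = sum(s < t for s in control_scores) / len(control_scores)
        j = sensitivity + specificity - 1
        if j > best[0]:
            best = (j, t, sensitivity, specificity)
    return best


# Hypothetical scores for bacterial cases and viral controls (not study data).
j_max, cutoff, sens_at_cutoff, spec_at_cutoff = youden_threshold(
    [3, 4, 5, 6], [1, 2, 3.5]
)
```

The Youden index weights sensitivity and specificity equally. For a rule-out test, a lower threshold that favours sensitivity may be preferred, which is consistent with the later point that clinicians may wish to consider different thresholds.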
Peripheral blood leucocytosis and high CRP are frequently used as biomarkers of bacterial infection in routine clinical practice in the ED. Although there were statistically significant correlations between the combined SeptiCyte™ scores and leukocyte count or CRP, the correlations were weak (R² of 0.3 for leukocyte count and 0.07 for CRP, Additional Fig. S5). Leucocyte counts and CRP measurements were significantly higher in patients with bacterial infections compared to those with viral infections. Discrimination of these cases yielded ROC AUC of 0.79 (0.68–0.91) for WCC and 0.73 (0.61–0.86) for CRP. At the Youden index of these ROC curves, the PPV and NPV of each test are shown across the range of pre-test probabilities in Fig. 3k and l. At their Youden indices, the NPV of these measurements at an estimated prior probability of bacterial infection of 35% was calculated to be 0.90 (0.80–0.94) for WCC and 0.88 (0.69–0.95) for CRP. At an estimated prior probability of 72%, the NPV reduced to 0.66 (0.46–0.79) for WCC and 0.60 (0.32–0.72) for CRP (Fig. S4). On the basis that a test used to rule out bacterial infection must achieve high NPV even at relatively high prior probability of bacterial infection, our data show that the combined SeptiCyte™ score outperforms WCC and CRP.
In a sensitivity analysis, we also calculated the NPV for each of these tests using the Youden index thresholds to exclude bacteraemia within our ED cohort. The combined SeptiCyte™ score achieved a NPV of 1.0 (0.94–1.0) compared to 0.94 (0.86–0.97) for CRP and 0.89 (0.80–0.94) for WCC. The performance metrics of all these tests are presented side-by-side in Table 2.
Estimating the true prevalence of bacterial and viral infections in the ED
The study by Limper et al. highlights that even when microbiological investigations are optimised, estimates of the prevalence of bacterial infection ranged from 35 to 72%. In this setting, diagnostic biomarkers may offer more accurate estimates of the prevalence of bacterial and viral infections and consequently more accurate estimates of the predictive value of a test.
We used the combined SeptiCyte™ score to infer the classification of cases within our ED fever cohort which did not yield positive microbiological results, using Gaussian mixture modelling. We divided the 119 available whole blood RNA samples from these cases into 93 cases that received empirical antibacterials (group A) and 26 cases that experienced self-limiting illnesses without any antibacterials (group B). The frequency distribution of the combined SeptiCyte™ score for both these groups was compared to that of proven bacterial and viral infections from the same cohort (Fig. 4a). We fitted a normal distribution to each of the known bacterial and viral distributions, and then calculated predicted frequency distributions which would be observed for cohorts containing different proportions of bacterial and viral cases (Fig. 4b). We compared the observed distribution of scores in groups A and B to the predicted distributions and estimated the proportion of viral infection cases which showed the best fit to the data by minimising the Jensen-Shannon divergence between predicted and observed distributions (Fig. 4c). This analysis estimated that 37% of patients who received empirical antibacterials were classified by the combined SeptiCyte™ score as viral infections, compared to 47% of patients who did not receive antibacterials. Assuming all 208 febrile ED patients without microbiological diagnosis had either bacterial or viral infection, our analysis suggests that 229 (69%) of the total cohort of 332 had bacterial infections of which 45% were microbiologically proven, and 97 (30%) had viral infections of which 16.5% had laboratory confirmation. Under the reasonable assumption that not all febrile illnesses will be exclusively due to bacterial and viral infections, these estimates represent the upper limits of the prevalence of bacterial and viral infections in ED.
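The cohort-level extrapolation can be retraced with simple arithmetic, shown here as a hedged Python sketch. It assumes that the fitted viral proportions (37% and 47%) are applied to all 176 treated and 32 untreated patients without a microbiological diagnosis, and it uses the rounded percentages quoted in the text. On that basis it gives roughly 230 bacterial and 96 viral cases, within one patient of the reported 229 and 97; the small difference presumably reflects rounding of the fitted proportions.

```python
def estimated_totals(confirmed_bacterial, confirmed_viral, groups):
    """Add mixture-model estimates for undiagnosed groups to the confirmed counts.

    `groups` is a list of (group_size, estimated_viral_fraction) pairs.
    """
    bacterial, viral = float(confirmed_bacterial), float(confirmed_viral)
    for size, viral_fraction in groups:
        viral += size * viral_fraction
        bacterial += size * (1 - viral_fraction)
    return bacterial, viral


# 102 confirmed bacterial and 16 confirmed viral infections; 176 undiagnosed
# patients received antibacterials (37% viral) and 32 did not (47% viral).
est_bacterial, est_viral = estimated_totals(102, 16, [(176, 0.37), (32, 0.47)])
bacterial_share_of_cohort = est_bacterial / 332
```

The two totals sum to 326 rather than 332 because the six patients with malaria, fungal infection or Entamoeba histolytica are counted in neither class.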
Finally, we estimated the relative likelihood of having a bacterial versus viral infection in individual cases. We used the fitted distributions for the cohorts of proven bacterial and viral infections to estimate the posterior probability for each data point in patients without positive microbiology that either received empirical antibacterial treatment (group A) or recovered without antibacterials (group B). This value provided an estimate of the relative likelihood of having a bacterial or viral infection for a given combined SeptiCyte™ score. Approximately 70% of group A patients could be classified as bacterial or viral infection with greater than a two-fold likelihood ratio, and about 60% of group B patients could be classified as bacterial or viral infection with greater than a two-fold likelihood ratio (Fig. 4d). In this analysis, 24% of patients who received empirical antibacterials had greater than two-fold likelihood of having had a viral infection and 40% of patients who recovered without receiving antibacterials had more than two-fold likelihood of having had a bacterial infection.
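The per-patient classification can be sketched as a likelihood ratio between the two fitted normal distributions. In the Python sketch below, the distribution parameters are hypothetical placeholders rather than the fitted study values, equal prior weights for the two classes are assumed (so the posterior odds equal the likelihood ratio), and the function names are illustrative.

```python
import math
from statistics import NormalDist


def log_pdf(x, dist):
    """Log density of a normal distribution, computed directly to avoid underflow."""
    z = (x - dist.mean) / dist.stdev
    return -math.log(dist.stdev) - 0.5 * z * z - 0.5 * math.log(2 * math.pi)


def log2_likelihood_ratio(score, bacterial, viral):
    """log2 of the bacterial:viral likelihood ratio for a combined score."""
    return (log_pdf(score, bacterial) - log_pdf(score, viral)) / math.log(2)


def classify(score, bacterial, viral, fold=2.0):
    """Label a score only if one class is more than `fold` times as likely."""
    llr = log2_likelihood_ratio(score, bacterial, viral)
    if llr > math.log2(fold):
        return "bacterial"
    if llr < -math.log2(fold):
        return "viral"
    return "unclassified"


# Hypothetical fitted distributions of the combined score (not study values).
bacterial_fit = NormalDist(5.0, 2.0)
viral_fit = NormalDist(-3.0, 2.0)
labels = [classify(s, bacterial_fit, viral_fit) for s in (5.0, 1.0, -3.0)]
```

Scores near the crossover of the two densities remain unclassified, which mirrors the proportion of patients in groups A and B that could not be assigned to either class with more than a two-fold likelihood ratio.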
We describe a novel blood transcriptomic signature specific for bacterial infection (SeptiCyte™ TRIAGE), which we validate using data from 1575 samples in a multi-cohort analysis of published case-control studies. We combined this signature with our previously published transcriptomic signature for viral infections (SeptiCyte™ VIRUS) and validated the application of the combined signature in published case-control data from 1088 samples and in a further independently derived cohort of emergency adult admissions to hospital. In this cohort, the combined signature score achieved a ROC AUC of 0.95 for discriminating between proven bacterial and viral infections. Peripheral blood leukocyte and CRP measurements, which remain the cornerstone of early diagnostic biomarkers to guide the use of antibacterial drugs, only achieved ROC AUCs of 0.79 and 0.73 respectively. In the present study, we were not able to make a comparison with PCT because this is not used routinely in adult ED settings in the UK.
In our ED cohort, the combined SeptiCyte™ score achieved an NPV of 0.86–0.97 across the range of prior probabilities for bacterial infection within febrile ED patients. On the basis that prolonged delay in antimicrobial treatment for severe sepsis is associated with increased mortality [25, 26], we propose that the imperfect sensitivity of any such biomarker means that its application as a rule-out test, to reduce empirical antimicrobial prescriptions in ED, will be restricted to patients with non-severe illness. Even so, the effectiveness of this application may be sensitive to heterogeneity in clinician assessments of risk/benefit ratio for individual patients. Of note, within the present data set, the combined SeptiCyte™ score achieved 100% NPV for bacteraemia, suggesting that such an approach may in fact provide an effective rule-out test for severe bacterial infection. In addition, as a quantitative test, the combined SeptiCyte™ score does not necessarily require a specific threshold to provide a binary result. Clinicians may wish to consider the sensitivity, specificity and predictive value of different test thresholds, depending on their tolerance for false positive or false negative results.
The major limitation of our study is the relatively small sample size of proven bacterial and viral infections in our ED cohort. Notwithstanding the need for extended validation in larger sample sizes, our data encourage the further development of blood transcriptomic signatures for rapid near-patient tests to support the differential diagnosis of bacterial and viral infections. In addition to the technological development required to realise their potential, further evaluation of factors that may confound the performance of gene expression signatures is necessary. Specifically, the range of non-infectious diseases, or non-viral and non-bacterial infections that may modulate these transcripts, and the impact of time on antibacterial or antiviral treatment should be examined further. It is particularly important to establish the window of opportunity in which these measurements can be used to reliably distinguish between infections, or to evaluate their potential role in monitoring the response to treatment. Also of note, our approach to discovery of the most concise biosignature is agnostic to the biological function of the genes and precludes confident inferences about the functional pathways represented by these signatures. Such inferences are statistically dependent on identification of multiple components of a pathway, and our statistical power to identify the associated pathway is substantially reduced by selecting the minimum number of genes required to achieve the maximum classification accuracy.
We hypothesised that blood transcriptional biomarkers that accurately discriminate between bacterial and viral infections may offer an opportunity to obtain better estimates of the true incidence of these two classes of infectious disease. Such epidemiological data are critical to our ability to incorporate prior probabilities in clinical assessments, and our interpretation of diagnostic laboratory tests. In the cohort of adult ED fever patients recruited in this study, we estimated the upper limit of the prevalence of bacterial infection to be 69%, in keeping with the total proportion of cases attributed to bacterial infection in a similar ED cohort. However, we estimated a higher proportion of viral infections, suggesting that viral illnesses are substantially underdiagnosed. In our study, 206 (62%) had no microbiological diagnosis. Of those that received empirical antibacterial treatment, the application of mixture modelling using the combined SeptiCyte™ score estimated that 24% had more than two-fold likelihood of being due to a viral infection, suggesting that these patients may not have needed antibacterial drugs. These patients may also be targeted for enhanced virological testing and may merit source isolation to mitigate against onward transmission of viral infection.
Interestingly, in ED fever patients who had no positive microbiology, but fully recovered without antibacterial treatment, mixture modelling using the combined SeptiCyte™ score estimated that as many as 40% had a more than two-fold likelihood of being due to bacterial infection. These data support the concept that some bacterial infections may be self-limiting and do not necessarily need antibacterial treatment. Hence, any policy for antibacterial prescribing triggered exclusively by diagnostic biomarkers for bacterial infection may inadvertently increase unnecessary antibacterial use.
Importantly, 30% of microbiologically undiagnosed cases that received empirical antibacterials and 40% of those that did not receive antibacterial treatment could not be classified using the combined SeptiCyte™ score with a likelihood ratio greater than two-fold. A plausible explanation in some cases may be the presence of co-infection or a non-infective cause of fever, but there may be many additional potential confounders. Further investigation is required using larger studies with sufficient power to evaluate the possible effects of age, gender, ethnicity, comorbidities and immunomodulatory drugs. Ultimately, integration of clinical and laboratory data will be required to derive models that quantify the risk of bacterial infections that do or do not require antibacterial treatment.
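The two-fold likelihood rule discussed above can be sketched as follows. This is a simplified illustration rather than the fitted model used in this study: it assumes that scores in each class follow a normal distribution, the component means and standard deviations are hypothetical, and the ratio compares component densities only, without mixing weights.

```python
import math


def normal_log_pdf(x, mean, sd):
    """Log density of a normal distribution, used to avoid numerical underflow."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def classify_score(score, bacterial=(2.0, 1.0), viral=(-2.0, 1.0), fold=2.0):
    """Assign a class only when one component is more than `fold` times as likely.

    `bacterial` and `viral` are hypothetical (mean, sd) pairs, NOT fitted values.
    """
    log_ratio = normal_log_pdf(score, *bacterial) - normal_log_pdf(score, *viral)
    if log_ratio > math.log(fold):
        return "bacterial"
    if log_ratio < -math.log(fold):
        return "viral"
    # Neither component is favoured by more than the required fold difference.
    return "unclassified"


for score in (-1.0, 0.0, 1.0):
    print(score, classify_score(score))
```

Scores that fall where the two component distributions overlap yield a likelihood ratio close to one and are therefore left unclassified, which corresponds to the group of cases that could not be assigned to either class.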
Our study supports the development of blood transcriptomic signatures for rapid near-patient tests to discriminate between bacterial and viral infections in the ED. We expect this approach may inform not only precision use of antibacterials, but also infection control practice and better use of targeted diagnostic tests for bacterial and viral infection.
Availability of data and materials
All data generated by this study are included in this published article and its supplementary information files, or available in public repositories under the accession numbers provided.
Abbreviations
- GEO: Gene Expression Omnibus
- NPV: Negative predictive value
- PPV: Positive predictive value
- ROC AUC: Receiver operating characteristic area under the curve
Holmes AH, Moore LSP, Sundsfjord A, Steinbakk M, Regmi S, Karkey A, et al. Understanding the mechanisms and drivers of antimicrobial resistance. Lancet Lond Engl. 2016;387:176–87.
Blaser M. Antibiotic overuse: stop the killing of beneficial bacteria. Nature. 2011;476:393–4.
Baggs J, Jernigan JA, Halpin AL, Epstein L, Hatfield KM, McDonald LC. Risk of subsequent sepsis within 90 days after a hospital stay by type of antibiotic exposure. Clin Infect Dis Off Publ Infect Dis Soc Am. 2018;66:1004–12.
Gould IM. Antibiotic policies to control hospital-acquired infection. J Antimicrob Chemother. 2008;61:763–5.
Singer M, Deutschman CS, Seymour CW, Shankar-Hari M, Annane D, Bauer M, et al. The third international consensus definitions for sepsis and septic shock (Sepsis-3). JAMA. 2016;315:801–10.
Kapasi AJ, Dittrich S, González IJ, Rodwell TC. Host biomarkers for distinguishing bacterial from non-bacterial causes of acute febrile illness: a comprehensive review. PLoS One. 2016;11:e0160278.
Limper M, Eeftinck Schattenkerk D, de Kruif MD, van Wissen M, Brandjes DPM, Duits AJ, et al. One-year epidemiology of fever at the emergency department. Neth J Med. 2011;69:124–8.
Lowsby R, Gomes C, Jarman I, Lisboa P, Nee PA, Vardhan M, et al. Neutrophil to lymphocyte count ratio as an early indicator of blood stream infection in the emergency department. Emerg Med J EMJ. 2015;32:531–4.
Kreger BE, Craven DE, McCabe WR. Gram-negative bacteremia. IV. Re-evaluation of clinical features and treatment in 612 patients. Am J Med. 1980;68:344–55.
Wyllie DH, Bowler IC, Peto TE. Relation between lymphopenia and bacteraemia in UK adults with medical emergencies. J Clin Pathol. 2004;57:950–5.
de Kruif MD, Limper M, Gerritsen H, Spek CA, Brandjes DPM, ten Cate H, et al. Additional value of procalcitonin for diagnosis of infection in patients with fever at the emergency department. Crit Care Med. 2010;38:457–63.
Christ-Crain M, Stolz D, Bingisser R, Müller C, Miedinger D, Huber PR, et al. Procalcitonin guidance of antibiotic therapy in community-acquired pneumonia: a randomized trial. Am J Respir Crit Care Med. 2006;174:84–93.
van der Does Y, Limper M, Jie KE, Schuit SCE, Jansen H, Pernot N, et al. Procalcitonin-guided antibiotic therapy in patients with fever in a general emergency department population: a multicentre non-inferiority randomized clinical trial (HiTEMP study). Clin Microbiol Infect. 2018;24(12):1282–9.
McHugh L, Seldon TA, Brandon RA, Kirk JT, Rapisarda A, Sutherland AJ, et al. A molecular host response assay to discriminate between sepsis and infection-negative systemic inflammation in critically ill patients: discovery and validation in independent cohorts. PLoS Med. 2015;12:e1001916.
Sweeney TE, Shidham A, Wong HR, Khatri P. A comprehensive time-course-based multicohort analysis of sepsis and sterile inflammation reveals a robust diagnostic gene set. Sci Transl Med. 2015;7:287ra71.
Miller RR, Lopansri BK, Burke JP, Levy M, Opal S, Rothman RE, et al. Validation of a host response assay, SeptiCyte LAB, for discriminating sepsis from systemic inflammatory response syndrome in the ICU. Am J Respir Crit Care Med. 2018;198:903–13.
Sweeney TE, Wong HR, Khatri P. Robust classification of bacterial and viral infections via integrated host gene expression diagnostics. Sci Transl Med. 2016;8:346ra91.
Herberg JA, Kaforou M, Wright VJ, Shailes H, Eleftherohorinou H, Hoggart CJ, et al. Diagnostic test accuracy of a 2-transcript host RNA signature for discriminating bacterial vs viral infection in febrile children. JAMA. 2016;316:835–45.
Sampson DL, Fox BA, Yager TD, Bhide S, Cermelli S, McHugh LC, et al. A four-biomarker blood signature discriminates systemic inflammation due to viral infection versus other etiologies. Sci Rep. 2017;7:2914.
Geiss GK, Bumgarner RE, Birditt B, Dahl T, Dowidar N, Dunaway DL, et al. Direct multiplexed measurement of gene expression with color-coded probe pairs. Nat Biotechnol. 2008;26:317–25.
van den Berg RA, Hoefsloot HCJ, Westerhuis JA, Smilde AK, van der Werf MJ. Centering, scaling, and transformations: improving the biological information content of metabolomics data. BMC Genomics. 2006;7:142.
López Puga J, Krzywinski M, Altman N. Points of significance: Bayes’ theorem. Nat Methods. 2015;12:277–8.
McLachlan G, Basford K. Mixture models: inference and applications to clustering. New York: Marcel Dekker; 1988.
Majtey AP, Lamberti PW, Prato DP. Jensen-Shannon divergence as a measure of distinguishability between mixed quantum states. Phys Rev A. 2005;72:052310.
Kumar A, Roberts D, Wood KE, Light B, Parrillo JE, Sharma S, et al. Duration of hypotension before initiation of effective antimicrobial therapy is the critical determinant of survival in human septic shock. Crit Care Med. 2006;34:1589–96.
Whiles BB, Deis AS, Simpson SQ. Increased time to initial antimicrobial administration is associated with progression to septic shock in severe sepsis patients. Crit Care Med. 2017;45:623–9.
This work was supported by Immunexpress, the National Institute for Health Research University College London Hospitals Biomedical Research Centre, NIHR fellowship award to LS (CS_2016_16_007), Wellcome Trust awards to MN (207511/Z/17/Z) and EG (107311/Z/15/Z) and Medical Research Council award to JR (MR/L001756/1).
Ethics approval and consent to participate
This study was approved by the UK National Research Ethics Service (reference: 10/H0713/51). All participants provided written informed consent.
Consent for publication
Competing interests
Dayle Sampson, Thomas Yager, Brian Fox, Leo McHugh, Therese Seldon, Antony Rapisarda, Richard B. Brandon, and Krupa Navalkar state that they are present or past employees and/or shareholders of Immunexpress, Inc. In silico discovery of the SeptiCyte™ signatures was performed by Immunexpress. The evaluation of the performance of this signature in the prospective ED cohort was conducted entirely by independent investigators at UCL, with no commercial interest in Immunexpress. This includes the design of the inclusion criteria for the cohort study, participant recruitment, clinical data collection and case ascertainment, sample collection, measurement of the RNA signatures and evaluation of the performance of these signatures.
The original version of this article was revised to add author Roslyn A. Hendriks to the authorship and 'Author contributions' section, from which she had originally been omitted.
Figure S1. Comparison of Nanostring and RNAseq derived blood transcriptional signature scores. Figure S2. Demographic and microbiological summary of the ED fever cohort. Figure S3. Comparison of SeptiCyte™ TRIAGE, SeptiCyte™ VIRUS and combined SeptiCyte™ scores in pooled case-control data of bacterial and viral infections. Figure S4. Negative predictive value of different biomarkers for identification of ED patients with proven bacterial infection. Figure S5. Comparison of peripheral blood leukocyte count and C reactive protein levels with blood transcriptional biomarkers.
Table S1. GEO datasets used for discovery of SeptiCyte™ TRIAGE. Table S2. GEO datasets used to test specificity of differentially expressed gene pair ratios for discovery of the SeptiCyte™ TRIAGE signature. Table S3. GEO datasets used to validate the SeptiCyte™ TRIAGE signature. Table S4. A priori defined list of risk factors for infection. Table S5. GEO datasets used to validate the combined SeptiCyte™ signature.
Spreadsheet of anonymised clinical metadata, SeptiCyte™ scores, WCC and CRP measurements for all study participants.
About this article
Cite this article
Sampson, D., Yager, T.D., Fox, B. et al. Blood transcriptomic discrimination of bacterial and viral infections in the emergency department: a multi-cohort observational validation study. BMC Med 18, 185 (2020). https://doi.org/10.1186/s12916-020-01653-3