This systematic review of the predictive value of serum markers of inflammation and infection in children presenting with febrile neutropenia found 25 studies reporting 14 different markers. Of these, CRP, PCT, IL6 and IL8 were most commonly examined. The finding of a diverse range of potentially useful markers, but so little consistency across studies, is unfortunately common in such research, and may reflect the relative lack of coordination in supportive care studies.
The studies presented similar challenges in reporting, methodology and analysis. Whether the test was interpreted 'blind' to the results of the outcome analysis, and vice versa, was very poorly reported. Many studies failed to assess whether the marker had supplementary value above the simple admission data collected by clinicians at every encounter: age, malignancy, temperature, vital signs and blood count. Analysis of the data was frequently undertaken by episode, with no account taken of multiple admissions for the same patient. Such an analysis ignores the variation which may be expected from genetic polymorphisms for the production of the biomarker under investigation, or in individual genetic susceptibility to infection [41, 42]. The biomarker cut-off values reported were frequently derived from the dataset to which they were then applied, which is likely to produce significant overestimations of accuracy. The data were sometimes presented as mean and standard deviation estimates, from which measures of test accuracy were derived. Although this may raise concerns because of the assumption of a normal distribution, there is some empirical justification for this procedure.
Quantitative meta-analysis using three approaches demonstrated how the commonly used, simple techniques may fail to reflect inconsistencies in the whole data set and so produce misleadingly precise results. The example of this review is important to recall when appraising other reviews where inconsistencies may not have been as extensively investigated.
The analysis undertaken using only the most commonly reported cut-off in a restricted number of studies produced excessively precise results which did not reflect the uncertainty of the whole data set, and so should be rejected. A similar problem was found with the use of data points with different thresholds to produce a hierarchical summary receiver operating characteristic (HSROC) curve. The HSROC curve modelled by these techniques does not take into account the actual value of the thresholds. This is frequently reasonable: it is impossible to quantify the thresholds used by different radiologists to call a radiograph 'positive' for pneumonia. In cases where the values are known, it should be possible to determine an ordered relationship, flowing from high to low cut-offs from left to right on the curve. This ordered relationship did not hold true for the analyses of CRP and PCT, which should call into question analyses in other studies that do not assess whether thresholds vary according to the implicit structure of the model.
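The ordering argument above can be illustrated with a minimal sketch. It assumes a marker for which higher values count as 'positive', so that lowering the cut-off should never decrease sensitivity or increase specificity. The function name `threshold_order_violations` and the example data points are hypothetical and are not taken from the included studies; the check is deliberately crude and ignores sampling error within each study.

```python
from typing import List, Tuple

# Each point is (cut-off, sensitivity, specificity) for one study estimate.
Point = Tuple[float, float, float]


def threshold_order_violations(points: List[Point]) -> List[Tuple[float, float]]:
    """Return pairs of adjacent cut-offs (higher, lower) that break the expected order.

    Moving from a higher to a lower cut-off should not reduce sensitivity
    and should not raise specificity. Any adjacent pair that does either
    is reported as a violation.
    """
    # Sort from high to low cut-off, i.e. left to right along the ROC curve.
    ordered = sorted(points, key=lambda p: p[0], reverse=True)
    violations = []
    for (c_hi, se_hi, sp_hi), (c_lo, se_lo, sp_lo) in zip(ordered, ordered[1:]):
        if c_hi == c_lo:
            # Equal cut-offs carry no ordering expectation.
            continue
        if se_lo < se_hi or sp_lo > sp_hi:
            violations.append((c_hi, c_lo))
    return violations


# Hypothetical illustration only: these values are invented for the sketch.
example = [(90.0, 0.50, 0.85), (50.0, 0.65, 0.73), (20.0, 0.60, 0.60)]
print(threshold_order_violations(example))
```

In this invented example the step from a cut-off of 50 to a cut-off of 20 lowers sensitivity, so that pair is flagged. A set of studies with many flagged pairs would suggest that a summary curve fitted without regard to the threshold values may not be interpretable as a single underlying ROC curve.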
A previously developed technique to undertake the ordered pooling of all the results was used to attempt to overcome these difficulties of only selective use of the data, and of incorrect relationships between test thresholds. This approach failed to produce meaningful results for the ability of PCT and CRP to identify patients who developed a documented infection, reflecting the inconsistencies and great heterogeneity of the data.
Some of the observed heterogeneity may be due to differences in measurement between apparently similar outcomes. While bacteremia is likely to be similarly reported across the studies, the diagnosis of a soft-tissue infection may vary between clinicians and centers. Very few studies reported in detail the exact definitions of the outcomes they reported. Further variation may have been introduced by the varying definitions of fever and neutropenia. In this review, 20 different combinations of criteria were used to define febrile neutropenia. These data could not be directly assessed to explore their relationship with the diagnostic value of the biomarkers, but as the depth of neutropenia and the peak and duration of temperature may affect the generation of biomarkers, the variation may further account for some of the heterogeneity. Additionally, although the assay techniques used in the studies were reported to be similar, there was no calibration of assays across the various studies. Other differences in the populations studied, such as the nature of the malignancies, recent surgical interventions and duration of therapy, may also add heterogeneity to the interpretation of markers which are themselves affected by a malignant disease. A more prosaic reason for heterogeneity may be publication bias: the tendency for reports demonstrating good predictive value to be published more often than those showing poor discrimination [45–47].
In order to interpret the information from this review in a clinically meaningful way, both the estimates of predictive effectiveness and the uncertainty that surrounds these estimates need to be taken into account. CRP has been most extensively studied in this setting; it is a ubiquitous test and the only one which has been shown to add to the predictive ability of clinically-based decision rules [26, 34]. These studies chose two differing cut-offs (> 50 mg/dl or > 90 mg/dl). CRP is at best only moderately discriminatory in the setting of detecting documented infection (sensitivity 0.65; 95% CI 0.41 to 0.84, specificity 0.73; 95% CI 0.63 to 0.82), which is in keeping with estimates drawn from its value in the detection of serious bacterial infection in non-neutropenic children, and may be a significant overestimation of its value. The clinical role of CRP as a screening tool may be limited, however, if another biomarker is shown to be a more discriminatory test.
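To make 'moderately discriminatory' more concrete, the pooled point estimates for CRP quoted above can be converted into likelihood ratios, which come to roughly 2.4 for a positive test and roughly 0.48 for a negative test. The sketch below shows the arithmetic. The pre-test probability of 20% is an assumption chosen purely for illustration and is not a figure from this review, and the calculation uses point estimates only, so it does not carry through the wide confidence intervals.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple:
    """Return (LR+, LR-) from sensitivity and specificity."""
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Update a pre-test probability with a likelihood ratio via odds."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Pooled point estimates for CRP reported in the text.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.65, specificity=0.73)

# ASSUMPTION: an illustrative pre-test probability, not taken from the review.
assumed_pre_test = 0.20

print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
print(f"Post-test probability if positive: {post_test_probability(assumed_pre_test, lr_pos):.2f}")
print(f"Post-test probability if negative: {post_test_probability(assumed_pre_test, lr_neg):.2f}")
```

Under the assumed pre-test probability, neither a positive nor a negative result shifts the probability of documented infection far enough to rule it in or out on its own, which is consistent with the description of CRP as only moderately discriminatory.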
Data from this review and meta-analytic comparisons of CRP and PCT in the non-neutropenic population are suggestive of the improved predictive value of PCT over CRP. This has a strong pathophysiological basis, as PCT levels are reported to rise within 3 to 4 hours in response to infection as compared with the 24 to 48 hours required for CRP. However, the data for the improved predictive value of PCT are quite varied (see Additional file 3 and previously published reviews). This may be related to the degree of neutropenia, as reports from the post-transplant setting have shown disappointingly poor discrimination, or this again may be due to small studies and publication bias [47, 51]. Based on the data from this review, procalcitonin cannot yet be recommended for use in routine clinical practice.
Similar pathophysiological claims for improved predictive ability can be advanced for IL6 and IL8. In this review, IL6 level shows potential to be a better discriminator than CRP of those children who will develop a serious infectious complication. IL8 also appears to have moderate discriminatory ability and has been used in combination with clinical data in a small pilot study to withhold antibiotics from a highly selected group of patients with febrile neutropenia. Both of these cytokines show promise, and should be subject to further investigation.
Given the very limited data available for other potential biomarkers of infection in the setting of pediatric febrile neutropenia identified by this review, no strong clinical conclusions for their use can be reached without further studies.
These conclusions are drawn from an extensive and detailed systematic review of the available evidence using advanced techniques of meta-analysis, supplemented by rational clinical and pathophysiological reasoning. It should be clearly understood that they are uncertain and unstable, as even small amounts of new data may substantially alter these findings.