Natriuretic peptides for the detection of diastolic dysfunction and heart failure with preserved ejection fraction—a systematic review and meta-analysis

Background An overview of the diagnostic performance of natriuretic peptides (NPs) for the detection of diastolic dysfunction (DD) and heart failure with preserved ejection fraction (HFpEF), in a non-acute setting, is currently lacking. Methods We performed a systematic literature search in PubMed and Embase.com (May 13, 2019). Studies were included when they (1) reported diagnostic performance measures, (2) addressed the detection of DD or HFpEF in a non-acute setting, (3) compared patients with a control group without DD or HFpEF or with patients with heart failure with reduced ejection fraction and (4) used a cross-sectional design. Two investigators independently assessed the risk of bias of the included studies according to the QUADAS-2 checklist. Results were meta-analysed when three or more studies reported a similar diagnostic measure. Results From 11,728 titles/abstracts, we included 51 studies. The meta-analysis indicated a reasonable diagnostic performance for both NPs for the detection of DD and HFpEF based on AUC values of approximately 0.80 (0.73–0.87; I² = 86%). For both NPs, sensitivity was lower than specificity for the detection of DD and HFpEF: approximately 65% (51–85%; I² = 95%) versus 80% (70–90%; I² = 97%), respectively. Both NPs have adequate ability to rule out DD: negative predictive value of approximately 85% (78–93%; I² = 95%). The ability of both NPs to prove DD is lower: positive predictive value of approximately 60% (30–90%; I² = 99%). Conclusion The diagnostic performance of NPs for the detection of DD and HFpEF is reasonable. However, they may be used to rule out DD or HFpEF, but not to diagnose DD or HFpEF.


Background
Heart failure (HF) is a major public health problem with high morbidity and mortality [1,2]. Over half of HF patients have heart failure with preserved ejection fraction (HFpEF), which is characterized by elevated left ventricular filling pressure during exercise despite a normal ejection fraction [3]. As HFpEF is a syndrome with different underlying pathophysiologic mechanisms, detection is difficult with current guidelines [3,4].
The use of natriuretic peptides (NPs), specifically brain natriuretic peptide (BNP) and N-terminal prohormone of brain natriuretic peptide (NT-proBNP), is advised in HFpEF guidelines [3,5,6]. Detection of HFpEF in the stable outpatient population is difficult since levels of NPs are usually low, in contrast to those of patients with heart failure with reduced ejection fraction (HFrEF) [7]. Compared with cut-offs for NPs in current guidelines [3,5,6], up to one-third of all HFpEF outpatients have NP levels below the typical diagnostic thresholds and will be missed [8,9]. This is alarming as obesity is a common comorbidity in HFpEF patients and is associated with lower NT-proBNP levels, which could lead to underdiagnosis of HFpEF [10-12].
Although the exact underlying mechanism of HFpEF is still unclear, diastolic dysfunction (DD) is an established precursor in the development of HFpEF [13,14]. NPs are not included in the current guidelines for the detection of DD [15]. However, emerging evidence suggests that BNP is higher in patients with DD than in adults without DD [7,16,17]. Moreover, it has been demonstrated that levels gradually rise in parallel with the severity of diastolic abnormalities as assessed by echocardiography [18,19]. As NPs are secreted by the ventricular walls as a result of abnormal preload and afterload, and systemic inflammation is apparent in patients with DD, NPs could be useful in detecting DD [20,21]. Nonetheless, the performance of NPs for the detection of early subclinical DD is not as good as for symptomatic DD [22].
Since early detection of DD and HFpEF in the non-acute setting is important for prevention and treatment strategies, a good diagnostic marker, such as NPs, is needed. However, a clear overview of the diagnostic performance of NPs for the detection of DD and HFpEF, in a non-acute setting, is currently lacking.
Therefore, this study aimed to systematically review and meta-analyse studies investigating the diagnostic performance of NPs for the detection of DD and HFpEF.

Data search
We performed a systematic review of PubMed and Embase.com from their inception to May 13, 2019 (SR and LS), according to the PRISMA-DTA statement [23].
The search terms 'heart failure' or 'diastolic dysfunction' were combined with general search terms 'diagnostic performance' or 'markers', as this broad search string was used for a set of systematic reviews describing a range of diagnostic markers (NPs, echo markers or biomarkers) (see Additional file 1). Reference lists of the identified articles were hand-searched for relevant publications. The protocol and search strategy were preregistered on PROSPERO (Registration number CRD42018065018). Because the protocol and search strategy were focussed on a broader research question, the identified studies were reported in three manuscripts focussed on echo parameters [24], biomarkers [25] and natriuretic peptides in this manuscript.

Patient and public involvement
Patients were not involved in the generation of this meta-analysis.

Study selection
Two reviewers independently screened titles, abstracts and full texts (SR/AJvB/MLH/JWJB). Studies were included if they (i) studied a diagnostic performance measure, (ii) studied the performance of NPs for the detection of DD and/or HFpEF, (iii) included a control population without DD or HFpEF or with HFrEF, (iv) had a cross-sectional study design (maximum follow-up 2 years) and (v) were written in English or Dutch. We excluded studies if they (i) studied the performance of the diagnostic marker for the detection of acute HF, (ii) were conducted in rare patient populations (e.g. beta-thalassemia, hypertrophic cardiomyopathy or infiltrative disorders) or (iii) used a single echo marker as reference standard. Inconsistencies in study selection were resolved through consensus with a third reviewer (AJvB/JWJB). The mean positive and negative proportion of agreement between the reviewers for title/abstract screening was 40 and 96%, respectively. For full-text screening, the mean positive and negative proportion of agreement was 55 and 60%, respectively. For further details on the literature search and inclusion and exclusion criteria, see eMethods in Additional file 1.
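The positive and negative proportions of agreement reported above can be illustrated with a minimal sketch. It assumes the usual definition of proportions of specific agreement for two raters; the function name and the counts are hypothetical, chosen only to give values of a similar size to those for title/abstract screening, and are not data from this review.

```python
def specific_agreement(both_include, only_a, only_b, both_exclude):
    """Positive and negative proportions of specific agreement for two raters.

    both_include: records both reviewers included
    only_a, only_b: records only one of the reviewers included
    both_exclude: records both reviewers excluded
    """
    disagreements = only_a + only_b
    positive = 2 * both_include / (2 * both_include + disagreements)
    negative = 2 * both_exclude / (2 * both_exclude + disagreements)
    return positive, negative


# Hypothetical screening counts (assumption, not extracted from the review)
pa, na = specific_agreement(both_include=20, only_a=30, only_b=30, both_exclude=720)
print(f"positive agreement: {pa:.2f}, negative agreement: {na:.2f}")
```

When inclusions are rare, as in title/abstract screening, a low positive agreement can coexist with a high negative agreement, which is why the two are reported separately.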

Data extraction
One reviewer (SR) extracted the data, including measures of study design, study population, number of participants, markers, diagnostic performance measures and demographics, which was appraised by a second reviewer (AJvB/JWJB).

Quality assessment
Two reviewers (SR/AJvB/JWJB) independently evaluated the quality of the included studies using the QUADAS-2 checklist [26]. This checklist provides a quality score on four domains: patient selection, index test, reference standard and patient flow and timing. Each domain received a low, high or unclear risk of bias or concerns regarding applicability. A domain was rated as high risk of bias when one of the two or two of the three support questions were answered in a negative manner. A domain was rated as low risk of bias when two of the three or all support questions were answered in a positive manner. A domain was rated as unclear when one of the support questions could not be answered due to lack of information in the study. Inconsistencies in the quality assessment were resolved through consensus with a third reviewer (AJvB/JWJB). The mean positive and negative proportion of agreement between reviewers was 91 and 79%, respectively. For further details on the quality assessment, see eMethods in Additional file 1.

Data synthesis
Studies were meta-analysed using a random-effects model when two or more studies investigated the same diagnostic measure in similar study populations with similar control populations and reported similar diagnostic performance measures. The studies had to provide confidence intervals of this diagnostic performance measure or sufficient information (2 × 2 table) to compute these confidence intervals. Random-effects meta-analysis models were fitted to AUCs, or to sensitivities and specificities, and presented in forest plots for all studies and stratified for cross-sectional versus case-control studies. In subgroup analyses, we examined differences by study design (cross-sectional versus case-control), geographic location (European versus other studies), assays (Roche versus other for NT-proBNP and FEIA versus RIA for BNP) and decade of publication (2000-2009 versus 2010-2019). The subgroup effect statistic was calculated by means of a Wald test to determine differences between the respective subgroups if both subgroups included a minimum of two studies [27]. In sensitivity analyses, we further examined heterogeneity by geographic location by including European studies only, or by study population by excluding studies with hospitalized patients. Heterogeneity was tested using I², with I² > 50% considered substantial. Publication bias was evaluated by visual inspection of funnel plots. All analyses and plots were performed in RStudio 3.4.2 using the metafor package [28]. Trivariate generalized linear mixed models (GLMM) were fitted in SAS Studio to meta-analyse PPVs and NPVs for all cross-sectional studies to account for differences in the prevalence of DD [29]. Percentage positive and negative agreement was calculated for title/abstract and full-text screening and quality assessment to determine the inter-rater reliability [30].
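To make the random-effects pooling and the I² statistic concrete, the following is a minimal sketch in Python. It is not the analysis code of this review, which used the metafor package in R. It assumes the DerSimonian-Laird estimator for the between-study variance, chosen here for simplicity, and the AUCs and standard errors are hypothetical.

```python
import math


def random_effects_pool(estimates, variances):
    """DerSimonian-Laird random-effects pooling; needs at least two studies.

    Returns the pooled estimate, its 95% confidence interval,
    the between-study variance tau^2 and I^2 in percent.
    """
    k = len(estimates)
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights add tau^2 to each within-study variance
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2, i2


# Hypothetical AUCs and standard errors (assumption, not data from the review)
aucs = [0.70, 0.80, 0.90]
ses = [0.03, 0.03, 0.03]
pooled, ci, tau2, i2 = random_effects_pool(aucs, [s ** 2 for s in ses])
print(f"pooled AUC {pooled:.2f} ({ci[0]:.2f}-{ci[1]:.2f}), I2 = {i2:.0f}%")
```

In this sketch, I² is the share of the total variation in Q that exceeds what sampling error alone would produce, so values above 50% correspond to the threshold for substantial heterogeneity used here.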

Search results
From 11,728 titles/abstracts, 352 full-text articles were screened and 51 studies were included in the data extraction (Additional file 3: Figure S1). Twenty-three studies reported the diagnostic performance for the detection of DD and 27 studies for the detection of HFpEF and one study for both. Two studies reported diagnostic performance of ANP [20,31].

Meta-analyses
Sensitivity and specificity were reported for NT-proBNP for detection of HFpEF in ten studies with a mean of 69% (56-81%; I² = 96.9%) and a mean of 85% (76-91%; I² = 98.9%), respectively. Four studies reported sensitivity and specificity for BNP for detection of HFpEF with a mean of 68% (44-93%; I² = 92.8%) and a mean of 78% (61-95%; I² = 92.2%), respectively (Fig. 3). With only one cross-sectional study for NT-proBNP, summary estimates could not be computed. Subgroup analyses did not show differences between cross-sectional and case-control studies for BNP for the detection of HFpEF (p values > 0.1).
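Each study-level sensitivity and specificity entering such a meta-analysis derives from a 2 × 2 table of index test result against reference diagnosis. The sketch below shows that computation with a confidence interval. The Wilson score interval is an assumption made for illustration, since the interval method is not stated here, and the counts are hypothetical.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score confidence interval for a binomial proportion."""
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def sensitivity_specificity(tp, fp, fn, tn):
    """Sensitivity and specificity, each with a 95% Wilson interval."""
    sens = tp / (tp + fn)  # diseased patients with a positive test
    spec = tn / (tn + fp)  # non-diseased patients with a negative test
    return {
        "sensitivity": (sens, wilson_interval(tp, tp + fn)),
        "specificity": (spec, wilson_interval(tn, tn + fp)),
    }


# Hypothetical 2 x 2 table (assumption, not data from an included study)
result = sensitivity_specificity(tp=65, fp=40, fn=35, tn=160)
for name, (value, (low, high)) in result.items():
    print(f"{name}: {value:.2f} ({low:.2f}-{high:.2f})")
```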

Positive and negative predictive values
For detection of DD, eight cross-sectional studies reported PPV and NPV for NT-proBNP with a mean of 63% (34-92%) and a mean of 81% (74-88%), respectively (Fig. 4). Seven cross-sectional studies reported this for BNP with a mean PPV of 54% (23-85%) and a mean NPV of 90% (82-98%). With only two cross-sectional studies for both NPs, summary estimates for detection of HFpEF could not be computed. The two NT-proBNP studies for the detection of HFpEF showed inconsistent results, but remarkably, for BNP, the PPV was higher than NPV: around 90 and 70%, respectively.
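Predictive values depend on disease prevalence as well as on sensitivity and specificity, which is why they were pooled only from cross-sectional studies. The following sketch applies Bayes' rule to show that dependence. The sensitivity of 65% and specificity of 80% are the approximate summary values of this review, while the prevalences are hypothetical and the function name is introduced for illustration only.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive value for a given prevalence."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    ppv = tp / (tp + fp)  # probability of disease given a positive test
    npv = tn / (tn + fn)  # probability of no disease given a negative test
    return ppv, npv


# Hypothetical prevalences (assumption): low, intermediate and high
for prevalence in (0.10, 0.30, 0.50):
    ppv, npv = predictive_values(0.65, 0.80, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.2f}, NPV {npv:.2f}")
```

With fixed sensitivity and specificity, the NPV rises and the PPV falls as prevalence decreases, so the same test is better suited to ruling out disease in low-prevalence settings.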

Incremental diagnostic performance
Three studies reported incremental values of NPs on top of clinical models, but NPs did not improve the diagnostic performance of the model [34,36,72]. Two studies reported the diagnostic performance of clinical models including NPs, but the diagnostic performance of the model without NPs was not reported [48,56].

Discussion
Our study is the first systematic review and meta-analysis of NPs as a diagnostic marker for the detection of DD and HFpEF. The meta-analysis indicates a reasonable diagnostic performance for both NPs for detection of DD and HFpEF with AUC values around 0.80, although heterogeneity between studies was high. Heterogeneity was partly explained by the case-control design of half of the BNP studies for detection of HFpEF.
For both NPs, sensitivity was lower than specificity: approximately 65% versus 80%, respectively. Both NPs have adequate ability to rule out DD with an NPV of approximately 80%. The ability of both NPs to prove DD is lower with a PPV of approximately 60%. The risk of bias was generally high for three of the four domains.
Our systematic review and meta-analysis has several strengths. Our study provides a comprehensive overview of the diagnostic performance of NPs for detection of HFpEF and DD. The systematic review and meta-analysis were performed according to the PRISMA-DTA statement and included a quality assessment [23,26]. However, this systematic review also has some limitations, such as substantial heterogeneity due to the quality of the included studies. The heterogeneity of included studies can be due to spectrum bias, which is a bias introduced by different inclusion criteria resulting in different study populations. We therefore performed sensitivity analyses excluding studies with hospitalized patients, but this did not affect our results. We also performed subgroup analyses for other characteristics such as geographical region, assay and decade of publication, but we did not detect any differences. However, study settings also differed across included studies in other aspects such as sex and age of the study population for which we could not perform a meta-analysis as the groups became too small. The effect of spectrum bias on the reported diagnostic measures is hard to quantify as it is unclear if this would lead to over- or underestimation of the diagnostic measures. Another explanation for the heterogeneity of included studies could be the heterogeneous nature of the HFpEF syndrome.
Fig. 4 Meta-analysis of positive and negative predictive value of NT-proBNP and BNP for the detection of DD with controls without DD
Overall, included studies had a high risk of bias with half of the studies using a case-control design. This results in an overestimation of diagnostic performance, as the contrast in clinical characteristics between patient and control population is large. Moreover, in case-control studies, the control population does not accurately reflect the population suspected to have DD or HFpEF, for whom NPs will be used in clinical practice. This limits the applicability of the results to other studies and clinical practice. Restricting the meta-analyses to cross-sectional studies substantially reduced heterogeneity (I² = 51.1%) in studies for BNP for detection of HFpEF, but the summary estimates remained similar. For the meta-analyses for PPV and NPV, case-control studies were already excluded because of the use of prevalence estimates in the trivariate GLMM.
This study showed that the diagnostic performance of NPs for detection of DD and HFpEF, versus no DD or HFpEF, is reasonable with summary AUC values around 0.80 for both NT-proBNP and BNP, while the diagnostic performance of NT-proBNP for detection of HFpEF versus HFrEF was lower: around 0.70. For both NPs for the detection of DD and HFpEF, specificity (~80%) was higher than sensitivity (~60-70%). The overall performance persisted in analyses excluding case-control studies. Therefore, both for detection of DD versus no DD or HFpEF versus no HFpEF, our results indicate that these measures seem to perform better for ruling out of DD or HFpEF than for making the diagnosis. However, the specific performance of NPs in clinical practice also depends on the clinical setting. The high percentage of false-negatives might be more severe for secondary or tertiary care, while in primary care, NPs may be more important to rule out DD or HFpEF, for which other diagnostic characteristics, such as NPV, are important. Our results indicate that NPs are useful to rule out DD or HFpEF in primary care with a low prevalence of these conditions but are less suitable to use to differentiate HFpEF from HFrEF.
NT-proBNP and BNP have equal capability for ruling out or ruling in DD, but NT-proBNP has a higher specificity for detection of HFpEF. In general, based on this meta-analysis, one NP is not clearly preferred over the other for the detection of DD and HFpEF. The ranges of cut-off values used in the included studies were wide. In comparison to the current guidelines, only six studies used the same cut-off value for NT-proBNP as the new HFA-PEFF diagnostic algorithm [3]. Therefore, we recommend using the cut-off values as proposed by the current guidelines [3,6].
For detection of DD, both NPs have a substantially lower PPV (~60%) than NPV (~85%). This means that both NPs are potentially better in ruling out DD than proving DD. Current guidelines for detection of DD do not include NPs as a diagnostic marker [15]. NPs are released as a consequence of volume overload, a characteristic that is absent in (asymptomatic) DD. Consequently, the positive predictive value for the detection of DD is low and will result in misclassification, and therefore, our findings are in line with guidelines for the diagnosis of DD, as NPs are not advised to diagnose DD. However, the guidelines could provide room for the use of NPs to rule out DD in certain settings [15]. For example, in patients with exertional dyspnoea, NPs have a very good ability to diagnose DD [18]. Our study also provides evidence that NPs might be useful to rule out DD in specific settings wherein a low prevalence of DD occurs, such as primary care. This approach could be suitable for screening patients at high risk of HFpEF in primary care such as diabetes patients.
For the detection of HFpEF, a trend towards a higher PPV than NPV is observed, suggesting that BNP could be useful for diagnosing HFpEF instead of ruling out HFpEF. This is in contrast with guidelines that propose to use NPs only for ruling out HFpEF [5,6]. In acute settings, the diagnostic performance of NPs is good for distinguishing acute HF from non-cardiac dyspnoea, as NP levels are higher in patients with acute HF [79,80]. In non-acute HFpEF patients, NP levels can be closer to normal than in the acute setting. This makes it more difficult to distinguish HFpEF from non-HFpEF patients based on NPs, especially in combination with common comorbidities that complicate the diagnosis further [10,81]. Therefore, NPs should be used in combination with echocardiography for initial diagnosis of HFpEF, as guidelines recommend [3]. Evidence of the incremental value of NPs on top of clinical characteristics or echocardiography measures is limited, but important. We therefore recommend that future studies compute incremental values of NPs on top of clinical or echocardiographic diagnostic models for the detection of HFpEF, confirmed by catheterization. Furthermore, future studies should aim to reduce bias by using cross-sectional designs with pre-specified cut-off values and correct reference diagnoses with transparent patient population selection procedures. As our study shows the ability of NPs to rule out DD and as earlier recognition of HFpEF is key to prevent late diagnosis, future studies should focus on the possibilities of NPs in screening programmes in patients at risk in primary care such as patients with type 2 diabetes, for detection of DD as precursor of HFpEF.

Conclusion
This systematic review and meta-analysis of 51 studies shows that NPs have reasonable diagnostic performance for the detection of DD and HFpEF in a non-acute setting. NPs are useful to rule out DD but are not suitable to rule in DD. NPs have value in the diagnosis of HFpEF, but not for ruling out HFpEF. However, NPs should be used in combination with echocardiography. As the risk of bias of the included studies is high and the sensitivity of NPs for detection of DD or HFpEF is low compared to specificity, the use of NPs alone to detect DD or HFpEF should be discouraged, as recommended by current guidelines. Nonetheless, the high NPV observed for both DD and HFpEF indicates they might be useful for screening of high-risk patients in primary care such as those with diabetes. For future research and guidelines, well-performed cross-sectional studies with pre-specified cut-off values for NPs are needed for unbiased estimates of diagnostic performance measures of NPs, especially for use in primary care.