Diagnostic indicators of non-cardiovascular chest pain: a systematic review and meta-analysis
BMC Medicine volume 11, Article number: 239 (2013)
Non-cardiovascular chest pain (NCCP) has a high healthcare cost, but insufficient guidelines exist for its diagnostic investigation. The objective of the present work was to identify important diagnostic indicators and their accuracy for specific and non-specific conditions underlying NCCP.
A systematic review and meta-analysis were performed. In May 2012, six databases were searched. Hand and bibliography searches were also conducted. Studies evaluating a diagnostic test against a reference test in patients with NCCP were included. Exclusion criteria were having <30 patients per group, and evaluating diagnostic tests for acute cardiovascular disease. Diagnostic accuracy is given in likelihood ratios (LR): very good (LR+ >10, LR- <0.1); good (LR+ 5 to 10, LR- 0.1 to 0.2); fair (LR+ 2 to 5, LR- 0.2 to 0.5); or poor (LR+ 1 to 2, LR- 0.5 to 1). Joint meta-analysis of the diagnostic test sensitivity and specificity was performed by applying a hierarchical Bayesian model.
Out of 6,316 records, 260 were reviewed in full text, and 28 were included: 20 investigating gastroesophageal reflux disorders (GERD), 3 musculoskeletal chest pain, and 5 psychiatric conditions. Study quality was good in 15 studies and moderate in 13. GERD diagnosis was more likely with typical GERD symptoms (LR+ 2.70 and 2.75, LR- 0.42 and 0.78) than atypical GERD symptoms (LR+ 0.49, LR- 2.71). GERD was also more likely with a positive response to a proton pump inhibitor (PPI) test (LR+ 5.48, 7.13, and 8.56; LR- 0.24, 0.25, and 0.28); the posterior mean sensitivity and specificity of six studies were 0.89 (95% credible interval, 0.28 to 1) and 0.88 (95% credible interval, 0.26 to 1), respectively. Panic and anxiety screening scores can identify individuals requiring further testing for anxiety or panic disorders. Clinical findings in musculoskeletal pain either had a fair to moderate LR+ and a poor LR- or vice versa.
In patients with NCCP, thorough clinical evaluation of the patient’s history, symptoms, and clinical findings can indicate the most appropriate diagnostic tests. Treatment response to high-dose PPI treatment provides important information regarding GERD, and should be considered early. Panic and anxiety disorders are often undiagnosed and should be considered in the differential diagnosis of chest pain.
In the USA, 6 million patients present to emergency departments with chest pain each year, at an annual cost of $8 billion [1, 2]. In emergency departments, roughly 60% to 90% of the patients presenting with chest pain have no underlying cardiovascular disease [3–6]. The proportion of patients with cardiovascular disease may be higher in specialized units (cardiology emergency departments, cardiac care units (CCUs), intensive care units (ICUs)) and lower in the primary care setting [6, 8–10]. Physicians generally assume that patients with non-cardiovascular chest pain (NCCP) have an excellent prognosis once serious diseases have been ruled out. However, patients with NCCP have a high disease burden; most patients who seek care for NCCP complain of persisting symptoms on 4-year follow-up. Furthermore, compared to patients with cardiac pain, patients with non-cardiac chest pain have a similarly impaired quality of life and similar numbers of doctor visits.
In patients with chest pain, the diagnostic investigation focuses primarily on cardiovascular disease diagnosis and is often performed by cardiologists. Once cardiovascular disease has been ruled out, only vague recommendations exist for further diagnostic investigation, which often delays diagnosis and appropriate treatment and causes uncertainty for patients. Limited data are available regarding efficient diagnostic investigations for patients with NCCP. Most studies investigate gastrointestinal diseases, and extensive provocation testing has been proposed. Some report that almost half of the patients with NCCP will have gastrointestinal disorders, while others attribute more than a third of cases to psychiatric disorders, as diagnosed by the Diagnostic and Statistical Manual of Mental Disorders, fourth edition (DSM-IV). Referred pain from the spine and the chest wall is also likely a substantial contributor to NCCP. Information is scarce regarding the appropriate diagnostic tests, and their sensitivity and specificity to discriminate different non-cardiac diseases.
The present systematic review aimed to identify relevant diagnostic tests for patients with NCCP, and to summarize their positive and negative likelihood ratios for underlying disease identification.
Literature search and study selection
This review, conducted in May 2012, followed the QUADAS quality assessment checklist for diagnostic accuracy studies. We searched six databases (PubMed/Medline, Biosis/Biological Abstracts (Web of Knowledge), Embase (OvidSP), INSPEC (Web of Knowledge), PsycInfo (OvidSP), and Web of Science (Web of Knowledge)) using the following search terms as medical subject headings (MeSH) and other subject headings: thoracic pain, chest pain, non-cardiac chest pain, atypical chest pain, musculoskeletal chest pain, esophageal chest pain, and thoracic spine pain. The findings were limited to studies investigating ‘diagnosis’, ‘sensitivity and specificity’, ‘sensitivity specificity’, or chest pain/diagnosis. We applied no limits for study setting or language; however, one potentially eligible Russian language article was excluded due to lack of language proficiency. Appendix 1 depicts three detailed search strategies.
To ensure search completeness, one reviewer (MW) conducted a hand search of the last 5 years in the four journals that published most articles about patients with NCCP (Gastroenterology, Chest, Journal of the American College of Cardiology and American Journal of Cardiology). Potentially eligible references not retrieved by the systematic search in the six databases were added. Bibliographies of included studies were also searched, and potentially eligible references were included in the full-text review.
Eligible studies included non-screening studies on diagnostic accuracy published between 1992 and May 2012. Inclusion criteria were studies reporting on patients aged 18 years and older seeking care for NCCP. NCCP was defined as chest pain after cardiac or other vascular disease (that is, cardiovascular disease, aortic dissection, pulmonary embolism) had been ruled out. Studies with <30 patients per group were excluded due to concerns about sample size. This group size was arbitrarily chosen to exclude studies with the highest risk of bias, while allowing a comprehensive literature overview. Based on the nomogram proposed by Carley et al., a sample size of >150 patients is needed to accurately assess a diagnostic test; however, with this sample size cut-off, very few studies (mainly retrospective data analyses) would have been eligible.
Study selection, data extraction, and synthesis
Two reviewers (MW and KR) independently screened 6,380 references by title and abstract. Both reviewers independently reviewed the full text of 260 studies meeting the eligibility criteria. Disagreements were discussed and resolved by consensus or third party arbitration (JS). Researchers with specific language proficiencies reviewed non-English language references. When the same study was included in several publications without change in diagnostic measure, the most recent publication was chosen and missing information was added from previous publications.
All information regarding the diagnostic test, reference test, and considered differential diagnosis was extracted and grouped according to the disease investigated. The methods used to assess accuracy, sensitivity, and specificity were also extracted.
Study quality was assessed using the Scottish Intercollegiate Guidelines Network (SIGN) methodology checklist for diagnostic studies. Overall bias risk and study quality were rated according to the SIGN recommendations. The ratings included high quality (++; most criteria fulfilled and if not fulfilled, the study conclusions are very unlikely to be altered), moderate quality (+; some criteria fulfilled and if not fulfilled, the study conclusions are unlikely to be altered), and low quality (−; few or no criteria fulfilled, conclusions likely to be altered). Studies rated as low quality by both reviewers were excluded from further analysis.
Reference standards and test evaluation
Information about method validity, reliability, practicability and value for clinical practice of the reference and the standard test was extracted and critically assessed. When several reference standards were used, all measurements were extracted and used for further analysis.
Descriptive statistics were used to summarize findings across all diagnostic studies. Sensitivity, specificity, positive and negative predictive values (PPV and NPV, respectively), and positive and negative likelihood ratios (LR+ and LR-, respectively) were calculated based on a 2 × 2 table (true/false positives, true/false negatives). Pretest probabilities (prevalence) and the positive and negative post-test probability of the disease were calculated. If one field contained the value 0, 0.5 was added to each field to enable value calculation. Test diagnostic accuracy was assessed as follows: very good (LR+ >10, LR- <0.1); good (LR+ 5 to 10, LR- 0.1 to 0.2); fair (LR+ 2 to 5, LR- 0.2 to 0.5); or poor (LR+ 1 to 2, LR- 0.5 to 1).
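The calculations described above can be illustrated with a minimal Python sketch. The function names and the example counts are hypothetical and not taken from the included studies; the handling of values that fall exactly on a category boundary, and of LR+ below 1 or LR- above 1, is an assumption, because the text does not specify it.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Accuracy measures from a 2 x 2 table (index test vs reference test)."""
    # Correction described in the text: if any field is 0, add 0.5 to each field.
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "prevalence": (tp + fn) / (tp + fp + fn + tn),
        "lr_pos": sensitivity / (1 - specificity),
        "lr_neg": (1 - sensitivity) / specificity,
    }


def rate_lr_pos(lr):
    """Accuracy category for LR+ (boundary handling is an assumption)."""
    if lr > 10:
        return "very good"
    if lr >= 5:
        return "good"
    if lr >= 2:
        return "fair"
    return "poor"  # also returned for LR+ < 1, which the text leaves undefined


def rate_lr_neg(lr):
    """Accuracy category for LR- (boundary handling is an assumption)."""
    if lr < 0.1:
        return "very good"
    if lr <= 0.2:
        return "good"
    if lr <= 0.5:
        return "fair"
    return "poor"  # also returned for LR- > 1, which the text leaves undefined


# Hypothetical counts, for illustration only.
result = diagnostic_accuracy(tp=40, fp=10, fn=10, tn=40)
print(rate_lr_pos(result["lr_pos"]), rate_lr_neg(result["lr_neg"]))
```

With these hypothetical counts, sensitivity and specificity are both 0.8, so both likelihood ratios fall in the fair category.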
When more than four unbiased studies were available in clinically similar populations and with comparable index and reference tests, we performed joint meta-analysis of the diagnostic test sensitivity and specificity. We used a hierarchical Bayesian model, as proposed by Rutter and Gatsonis, which accounts for the within-study and between-study variability and the potentially imperfect nature of the reference test. The hierarchical Bayesian model was set up as follows: we assumed J diagnostic studies in the meta-analysis, with crosstabulation between index test (T1) and reference test (T2) available for each study, and both tests assumed to be dichotomous (1 = positive test result, 0 = negative test result). Each study was assumed to use a different cut-off value (θj) to define a positive test result. The diagnostic accuracy of each study was denoted by αj. The model structure implied a within-study level for study-specific parameters (θj and αj), and a between-study level for parameters common among all studies. The model could theoretically be extended to include study-specific covariates such as percentage of female patients or mean age to reduce heterogeneity on study level.
Appendix 2 gives details of the model set up and prior distributions. The results of the Bayesian analysis are samples from the posterior distribution of the parameters, and estimates are presented as posterior means (50% quantile), and lower (2.5% quantile) and upper (97.5% quantile) bounds, resulting in a 95% credible region. Analyses were performed using R statistical software and the ‘HSROC’ package [22, 23].
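How posterior samples are reduced to a point estimate and a 95% credible region can be sketched as follows. This is a minimal illustration only: the samples are hypothetical draws from an arbitrary beta distribution, not output of the model described above, and the function name is an assumption.

```python
import random
import statistics


def summarize_posterior(samples):
    """Return the 2.5%, 50% and 97.5% quantiles of a list of posterior samples."""
    # With n=40, statistics.quantiles returns 39 cut points in steps of 2.5%.
    cuts = statistics.quantiles(samples, n=40, method="inclusive")
    return {"lower": cuts[0], "point": cuts[19], "upper": cuts[38]}


# Hypothetical posterior samples of a sensitivity, for illustration only.
random.seed(1)
samples = [random.betavariate(8, 2) for _ in range(5000)]
print(summarize_posterior(samples))
```

The interval between the lower and upper quantiles contains 95% of the sampled values, which is what the credible region reports.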
For this study no ethical approval was required. No protocol was published or registered. All methods were determined a priori.
Figure 1 summarizes the search and inclusion process. Out of 6380 records, 260 were reviewed in full text, resulting in exclusion of 232 studies. In total, the analysis included 28 studies. The reasons for exclusion of the 232 studies are given in Figure 1, and an overview of the excluded studies reviewed in full text is given in Appendix 3.
Table 1 presents the study characteristics, and included patients. In all, 20 studies (71%) evaluated diagnostic tests to identify gastrointestinal disease, mainly gastroesophageal reflux disorders (GERD), underlying NCCP. Musculoskeletal chest pain was investigated in three studies (11%), and psychiatric conditions in five studies (18%). Study quality was good in 15 studies (54%) and moderate in 13 (46%; Appendix 4). No study had to be excluded because of poor study quality.
Accuracy of symptoms for the diagnosis of GERD
Table 2 summarizes the diagnostic accuracy of the diagnostic tests relevant for clinical practice. A comprehensive overview of all evaluated diagnostic tests is provided in Appendix 5. For diagnosis of GERD, the most common reference tests (endoscopy and/or 24-h pH-metry) are reported.
Patients with the main complaint of NCCP were less likely to have GERD (LR+ 0.83, 0.43; LR- 1.13, 1.23) compared to patients with the main complaint of dysphagia (LR+ 1.27, 1.16; LR- 0.97, 0.97) or GERD typical symptoms without chest pain (LR+ 1.26, 1.53; LR- 0.93, 0.74) in two studies [25, 26]. Two further studies compared the accuracy of NCCP and typical GERD symptoms (LR+ 2.70, 2.75; LR- 0.42, 0.78) with NCCP without GERD symptoms (LR+ 0.49; LR- 2.71) or with NCCP and a history of heartburn (LR+ 2.15; LR- 0.74).
Accuracy of response to proton pump inhibitor (PPI) treatment for diagnosis of GERD in NCCP
The effect of treatment with PPI was measured by using a symptom intensity score (SIS) at baseline and follow-up. The SIS was calculated by adding the reported daily severity (mild = 1; moderate = 2; severe = 3; and disabling = 4) multiplied by the reported daily frequency values obtained during each week of symptom recording.
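The SIS and the response criterion can be sketched in Python as follows. This reflects one reading of the description above (severity multiplied by frequency for each day, summed over one week); the function names and the example diary entries are hypothetical.

```python
def weekly_sis(daily_records):
    """Symptom intensity score for one week.

    daily_records: list of (severity, frequency) pairs, one per day, with
    severity coded mild = 1, moderate = 2, severe = 3, disabling = 4.
    """
    return sum(severity * frequency for severity, frequency in daily_records)


def is_responder(baseline_sis, followup_sis, threshold=0.5):
    """True if the relative SIS reduction reaches the threshold (0.5 = 50%)."""
    if baseline_sis <= 0:
        raise ValueError("baseline SIS must be positive")
    reduction = (baseline_sis - followup_sis) / baseline_sis
    return reduction >= threshold


# Hypothetical diary: 3 moderate episodes per day at baseline,
# 2 mild episodes per day at follow-up.
baseline = weekly_sis([(2, 3)] * 7)
followup = weekly_sis([(1, 2)] * 7)
print(baseline, followup, is_responder(baseline, followup))
```

In this hypothetical diary the score falls from 42 to 14, a reduction of about 67%, which meets both the ≥50% and the ≥65% criteria.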
Table 2 summarizes the results. Three studies compared the treatment response after high doses of PPI (rabeprazole, lansoprazole, omeprazole) for 1 week to placebo. A reduction of the SIS of ≥50% was associated with a good LR+ and a fair LR- (LR+ 5.48, 7.13, 8.56; LR- 0.24, 0.25, 0.28) for the presence or absence of GERD. The likelihood ratios in the placebo groups with a reduction of the SIS of ≥50% were: LR+ 0.89, 0.61, 3.04; LR- 1.03, 1.22, 0.84. A reduction of the SIS of ≥65% resulted in a very good LR+ (18.33), and a good LR- (0.17). A treatment duration of 4 weeks (lansoprazole) resulted in a better LR- (LR+ 2.75; LR- 0.13) when compared to 2 weeks (omeprazole, LR+ 2.7; LR- 0.15).
For joint meta-analysis, only studies with a similar study design were considered. Therefore, the active treatment arms of six studies were available for further analysis [31–36]. The model could be extended to include study-specific covariates such as the percentage of female patients or mean age to reduce unexplained heterogeneity on study level. However, due to the small number of studies available for pooling we refrained from including covariates. Figure 2 shows the summary receiver operating characteristic (ROC) curve. Considering the GERD prevalence and the fact that no perfect reference test is available for GERD (sensitivity of the 24-h pH-metry in endoscopy-negative patients <71%), the posterior mean sensitivity was 0.89 (95% credible interval, 0.28 to 1) and the posterior mean specificity was 0.88 (95% credible interval, 0.26 to 1).
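To show what likelihood ratios of this size mean for an individual patient, the following minimal sketch converts a pretest probability into a post-test probability via odds. The pretest probability of 0.5 is an assumption chosen for illustration, and the first listed LR+ and LR- are used as example inputs; the function name is hypothetical.

```python
def post_test_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a post-test probability using an LR."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Assumed pretest probability of GERD (illustrative, not a study result).
pretest = 0.5
after_positive = post_test_probability(pretest, 5.48)  # positive PPI response
after_negative = post_test_probability(pretest, 0.24)  # no PPI response
print(round(after_positive, 2), round(after_negative, 2))
```

Under this assumed pretest probability, a positive response raises the probability of GERD to roughly 0.85 and a negative response lowers it to roughly 0.19.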
Accuracy of provocation tests for GERD diagnosis
A treadmill test performed during 24-h pH-metry (reference test) showed the highest LR+ for GERD when chest pain was provoked by exercise (LR+ 14.4; LR- 0.79). Among all patients who underwent the treadmill test, a high number of false negative results was observed.
Again, only studies with a similar study design were considered for joint meta-analysis. Five patient groups from four original studies were included in the analysis [38–41]. Figure 3 shows the summary ROC curve. Considering the prevalence and imperfect nature of the reference test, posterior mean sensitivity and specificity were 0.53 (95% credible interval, 0.02 to 1) and 0.93 (95% credible interval, 0.25 to 1), respectively. For all provocation tests (Tensilon test, Bernstein test, or balloon distension test) high numbers of false negative results were found [29, 42].
Accuracy of patient characteristics for eosinophilic esophagitis diagnosis
Eosinophilic esophagitis is a rare but important differential diagnosis for NCCP. In a retrospective analysis, the likelihood for histologically proven eosinophilic esophagitis (reference test) was fair when current GERD symptoms were present (LR+ 2.36, LR- 0.71 (poor)). Male gender or the presence of typical endoscopic findings for eosinophilic esophagitis were associated with a poor LR+ (1.78) but a very good LR- (0.09). No information was available about eosinophilia that responds to PPI treatment compared to eosinophilic esophagitis.
Accuracy of clinical signs for musculoskeletal chest pain diagnosis
In one study in a cardiology emergency department, specific clinical signs or symptoms compared to a standardized examination protocol showed either a fair LR+ and a poor LR- (for example, pain worse with movement of the torso, pain relief on pain medication, no sudden pain start, age ≤49 years) or a poor LR+ and a very good LR- (for example, anterior chest wall tenderness, biomechanical dysfunction). A score of 3 or more points in a sum score (1 point for each of five palpation findings: restriction in C4 to 7/Th1 to Th8 when sitting; prone restriction Th1 to 8; paraspinal tenderness; anterior chest wall tenderness; costosternal junction tenderness) showed an LR+ of 1.52 and a very good LR- of 0.03. A score of 1 or more points in a sum score for the diagnosis of a chest wall syndrome (CWS) in the GP setting (1 point for each positive finding: localized muscle tension; stinging pain; pain reproducible by palpation; absence of cough) showed an LR+ of 1.82 and an LR- of 0.20. A score of 2 or more points in the sum score showed an LR+ of 3.02 and an LR- of 0.47.
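The two sum scores described above follow the same pattern: one point per positive finding, compared against a cut-off. A minimal sketch, in which the identifiers are hypothetical labels for the findings named in the text and the example patient is invented:

```python
# Hypothetical identifiers for the findings named in the text.
PALPATION_FINDINGS = (
    "sitting_restriction_c4_th8",
    "prone_restriction_th1_th8",
    "paraspinal_tenderness",
    "anterior_chest_wall_tenderness",
    "costosternal_junction_tenderness",
)
CWS_FINDINGS = (
    "localized_muscle_tension",
    "stinging_pain",
    "pain_reproducible_by_palpation",
    "absence_of_cough",
)


def sum_score(findings, items):
    """Count 1 point for each listed item recorded as present."""
    return sum(1 for item in items if findings.get(item, False))


def score_positive(findings, items, cutoff):
    """True if the sum score reaches the cut-off."""
    return sum_score(findings, items) >= cutoff


# Invented example patient, for illustration only.
patient = {"stinging_pain": True, "absence_of_cough": True}
print(sum_score(patient, CWS_FINDINGS))
print(score_positive(patient, CWS_FINDINGS, cutoff=2))
print(score_positive(patient, PALPATION_FINDINGS, cutoff=3))
```

Raising the cut-off trades a better LR+ for a worse LR-, as the reported values for the CWS score at 1 versus 2 points show.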
Accuracy of patient characteristics for psychiatric disease diagnosis
For the diagnosis of an anxiety disorder, the anxiety subscale of the Hospital Anxiety and Depression Scale (HADS-A, cut-off ≥8) compared to a neuropsychiatric interview (reference test) showed a very good LR- (0.03) and a fair LR+ (2.03). In further studies the HADS-A was used as reference test for the diagnosis of anxiety disorder. Specific symptoms showed a fair LR+ and a poor LR-: fear of dying (LR+ 4.04; LR- 0.82); light-headedness, dizziness, or faintness (LR+ 3.03; LR- 0.84); diaphoresis (LR+ 3.49; LR- 0.69); and chills or hot flushes (LR+ 4.85; LR- 0.81).
For panic disorders, a four-item panic screening score validated in patients presenting to an ER showed a fair LR+ (3.44 and 3.89) and a poor-to-fair LR- (0.44 and 0.55). A combination of different questionnaires and pain patterns (Agoraphobia Cognitions Questionnaire; Mobility Inventory for Agoraphobia; McGill Pain Questionnaire sensory) showed a fair LR+ (2.6) and a fair LR- (0.46). In patients presenting to their primary care physician with NCCP, panic disorder was rarely diagnosed. Clinician consultations in this setting had poor accuracy for panic disorder diagnosis (LR+ 0.8; LR- 1.02).
Most of the included studies investigated tests for gastroesophageal reflux disease (GERD) as the underlying disease in non-cardiovascular chest pain (NCCP). Few studies investigated diagnostic tests for other illnesses. The diagnostic value of a PPI treatment test was confirmed, with a ≥50% symptom reduction under PPI treatment showing posterior sensitivity and specificity of almost 90%. Together with the favorable adverse effect profile of PPIs, a high dose (double reference dose, twice daily) can quickly provide important diagnostic information in patients with unexplained chest pain. History or presence of typical GERD-associated symptoms increases the likelihood of GERD.
Only limited evidence was available for other prevalent illnesses manifesting with chest pain. Screening tools for panic and anxiety disorders are valuable for identifying patients requiring further diagnostic evaluation. The likelihood for musculoskeletal chest pain increased when the pain was reproducible or relieved by pain medication. Among studies investigating musculoskeletal disease, the major limitation was the lack of a reference test (‘gold standard’).
Results in light of existing literature
To the best of our knowledge, this is the first systematic review summarizing the current evidence on the accuracy of diagnostic tests in patients with NCCP. Several non-systematic reviews have suggested various diagnostic and therapeutic approaches [14, 58–61], often with algorithms focused on gastrointestinal diseases [14, 58, 59], sometimes recommending extensive testing, such as provocation tests. Here, we found no additional value of provocation testing for diagnosing underlying gastroesophageal conditions, as provocation tests failed to identify many patients that would have reflux during a 24-h pH measurement period. While meta-analyses of PPI treatment studies compared to placebo have been previously conducted [62, 63], in contrast to those analyses we excluded studies of poor quality and small sample size [64–67]. Our study is the first to assess study quality and to use a hierarchical Bayesian approach that accounts for within-study and between-study variability and the imperfect nature of the reference test.
Cremonini et al. previously used a bivariate model, and found a lower pooled sensitivity and specificity (sensitivity 80% vs 89%, specificity 74% vs 88%) of a positive PPI treatment response for the diagnosis of GERD. Harbord et al. showed that the likelihood functions of the two model formulations are algebraically identical in the absence of covariates. However, for assessing a summary ROC curve, the hierarchical Bayesian model is more natural than the model for pooled sensitivity and specificity. Without a broadly accepted standard reference test, it is important to adjust for conditional dependence between multiple tests (index test and reference test) carried out in the same subjects. The hierarchical Bayesian model can be adapted to this situation by introducing covariance terms between the sensitivities and specificities of the index and reference tests. A previous simulation study demonstrated that if a model does not address an imperfect reference test, bias will be around 0.15 in overall sensitivity and specificity. No systematic review has examined diagnostic studies of musculoskeletal chest pain or chest pain as part of a psychiatric disease.
Strengths and limitations
This review comprehensively evaluates the currently available studies. The search was inclusive, no language restrictions were applied, and a thorough bibliographic search was conducted to identify all relevant studies. The extraction process was performed in accordance with current guidelines and supported by an experienced statistician. Potential factors influencing diagnostic test accuracy were identified by a multidisciplinary team (an internist, general practitioner, statistician, and methodologist).
The study was limited by the small number of studies available for most diseases presenting with NCCP. Furthermore, many studies were only of moderate quality and most cross-sectional or prospective studies did not meet the required sample size criterion for reliable estimates of sensitivity and specificity. Small studies on diagnostic accuracy are often imprecise, with wide confidence intervals, making it difficult to assess test informativeness. The lack of a gold standard reference test is another limitation, which we addressed within the Bayesian model formulation; however, the resulting posterior credible intervals for overall sensitivity and specificity of the index test are wider than they would be with a perfect reference test. Further, NCCP is a collective term covering potentially different underlying diseases, and it might therefore present differently. Diagnostic accuracy in a population with a high prevalence of one disease might be entirely different in another population. Therefore, for most studies no joint meta-analysis could be conducted and results have to be interpreted on a single study level within the context of the study population. We have tried to balance this by providing a thorough description of the studies’ inclusion and exclusion criteria and the study setting. This will allow readers to judge to whom study results apply. In studies included in the joint meta-analyses, we intended to include study-specific covariates such as the percentage of female patients or mean age in the Bayesian model. The inclusion of covariates can reduce unexplained heterogeneity. However, this was not feasible because of the small number of studies available for meta-analysis.
Further research should investigate the combined value of symptoms, clinical findings, and diagnostic tests, including multidisciplinary research aimed at increasing our knowledge about diagnostic processes and making recommendations for diagnostic tests and treatments in patients with NCCP. Most patients with chest pain consult primary care physicians, but few studies are performed in this setting. Further research is needed to strengthen the evidence in a primary care setting. The value of screening questionnaires for panic and anxiety disorders should be further evaluated and investigated in clinical practice. The use of a flag system, as successfully applied in back and neck pain, could facilitate the diagnostic process allowing systematic assessment of first red flags (acute disease requiring immediate diagnosis and care), then green flags (identifiable diseases), and yellow flags (psychological diseases).
Implication for practice
Patients with NCCP incur high healthcare costs due to the extensive and often invasive diagnostic testing, and NCCP’s impact on quality of life. Early identification of underlying diseases is essential to avoid delayed treatment and chronicity of complaints. Symptoms and clinical findings may provide important information to guide treatment of an underlying illness. In patients with typical GERD symptoms, twice-daily high-dose PPI treatment is the most efficient diagnostic approach. GERD is very likely if a positive treatment response occurs after 1 week, while GERD is unlikely if there is no response after 4 weeks of PPI treatment. In patients not responding to PPI, if an endoscopy shows no pathological findings, other illnesses should be considered before initiating further gastrointestinal testing.
Panic and anxiety disorders are often missed in clinical practice. Symptoms such as expressing ‘fear of dying’, ‘light headedness, dizziness, faintness’, ‘diaphoresis’ and ‘chills or hot flushes’ are associated with anxiety disorders. Screening tests are valuable to rule out panic or anxiety disorders, and a positive finding should lead to further investigation.
In patients with NCCP, timely diagnostic evaluation and treatment of the underlying disease is important. A thorough history of symptoms and clinical examination findings can inform clinicians which diagnostic tests are most appropriate. Response to high-dose PPI treatment can indicate whether GERD is the underlying disease and should be considered as an early test. Panic and anxiety disorders are often not diagnosed and should be considered in the differential diagnosis of chest pain.
Appendix 1: Search Strategy May Week 4 2012
Biological Abstracts/BIOSIS, INSPEC and Web of Science (Web of Knowledge)
Topic = (‘thoracic pain’ OR ‘chest pain’ OR ‘noncardiac chest pain’ OR ‘non cardiac chest pain’ OR ‘atypical chest pain’ OR ‘musculoskeletal chest pain’ OR ‘esophageal chest pain’ OR ‘thoracic spine pain’ OR ‘chest wall’) AND Topic = (sensitivity OR specificity OR diagnostic tests) NOT Topic = (coronary artery disease OR cardiac disease OR coronary heart disease OR coronary thrombosis OR coronary occlusion)
Refined by: Topic = (human*)
Timespan = 1992 to 2012.
Appendix 2: Set up of the hierarchical Bayesian models for the summary receiver operating characteristic (ROC) curves
Model 1: proton pump inhibitor (PPI) studies
Assumption: imperfect reference standard
Prior of prevalence (pi) is beta (12, 12), <=> pi in [0.3, 0.7]
Prior of beta is uniform (−0.75, 0.75)
Prior of THETA is uniform (−1.5, 1.5)
Prior of LAMBDA is uniform (−3, 3)
Prior of sigma_alpha is uniform (0, 2)
Prior of sigma_theta is uniform (0, 2)
Prior of S2 (sensitivity of reference test) is:
Study(ies) 1 to 7 beta (172.55, 30.45), <=> S2 in [0.8, 0.9]
Prior of C2 (specificity of reference test) is:
Study(ies) 1 to 7 beta (50.4, 12.6), <=> C2 in [0.7, 0.9]
Model 2: exertional 24 h pH-metry
Assumption: imperfect reference standard
Prior of prevalence (pi) is beta (5.2318, 6.0194), <=> pi in [0.18, 0.75]
Prior of beta is uniform (−0.75, 0.75)
Prior of THETA is uniform (−1.5, 1.5)
Prior of LAMBDA is uniform (−3, 3)
Prior of sigma_alpha is uniform (0, 2)
Prior of sigma_theta is uniform (0, 2)
Prior of S2 (sensitivity of reference test) is:
Study(ies) 1 to 5 beta (172.55, 30.45), <=> S2 in [0.8, 0.9]
Prior of C2 (specificity of reference test) is:
Study(ies) 1 to 5 beta (50.4, 12.6), <=> C2 in [0.7, 0.9]
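The relation between each beta prior and the interval listed next to it can be checked numerically. The sketch below computes the mean and standard deviation of a beta distribution and the range of mean ± 2 standard deviations; that the listed intervals correspond to this range is an observation about the numbers above, not a statement of how the priors were derived. Function names are hypothetical.

```python
import math


def beta_mean_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, sd


def two_sd_range(a, b):
    """Interval of mean - 2 SD to mean + 2 SD for a Beta(a, b) distribution."""
    mean, sd = beta_mean_sd(a, b)
    return mean - 2 * sd, mean + 2 * sd


# Beta parameters as listed in Models 1 and 2.
for a, b in [(12, 12), (5.2318, 6.0194), (172.55, 30.45), (50.4, 12.6)]:
    low, high = two_sd_range(a, b)
    print(f"beta({a}, {b}): {low:.2f} to {high:.2f}")
```

For all four parameter pairs, the computed range agrees with the listed interval to two decimal places.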
Appendix 3: Summary of excluded studies during full-text review
Table 5 summarizes the studies reviewed in full text and excluded from the systematic review. For each study the reason for exclusion is provided.
Appendix 4: Summary of the Scottish Intercollegiate Guidelines Network (SIGN) quality assessment 
Appendix 5: Summary of all tests evaluated
Table 7 provides a detailed description of all tests and reference tests investigated. Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), prevalence, and post test prevalence are given.
Sensitivity calculated by TP/(TP + FN); specificity calculated by TN/(FP + TN). Biomechanical dysfunction defined as chest pain presumably caused by mechanical joint and muscle dysfunction related to C4 to T8 somatic structures of the spine and chest wall established by means of joint-play and/or end-play palpation.
24-h pH-metry: 24-h pH monitoring measures with a single sensor located above the lower esophageal sphincter (LES) a reflux event and the association of the reflux event with symptoms can also be ascertained from the tracing; manometry: esophageal manometry measures mean pressure of the lower esophageal sphincter and any degree of hypomotility and dysmotility in the esophagus.
LR+, positive likelihood ratio calculated sensitivity/1 - specificity; LR-, negative likelihood ratio calculated 1 - sensitivity/specificity; diagnostic test accuracy. Very good: LR + >10, LR- <0.1; good: LR + 5 to 10, LR- 0.1 to 0.2; fair: LR + 2 to 5, LR- 0.2 to 0.5; poor: LR + 1 to 2, LR- 0.5 to 1.
Reference tests are as follows. Endoscopic classification: LA classification: grade A, ≥1 mucosal break ≤5 mm, that does not extend between the tops of two mucosal folds; grade B, ≥1 mucosal break >5 mm long that does not extend between the tops of two mucosal folds; grade C, ≥1 mucosal break that is continuous between the tops of two or more mucosal folds but which involves <75% of the circumference; grade D, ≥1 mucosal break which involves at least 75% of the esophageal circumference . Savary-Miller System: grade I, single or isolated erosive lesion(s) affecting only one longitudinal fold; grade II, multiple erosive lesions, non-circumferential, affecting more than one longitudinal fold, with or without confluence; grade III, circumferential erosive lesions; grade IV, chronic lesions: ulcer(s), stricture(s) and/or short esophagus. Alone or associated with lesions of grades 1 to 3; grade V, columnar epithelium in continuity with the Z line, non-circular, star-shaped, or circumferential. Alone or associated with lesions of grades 1 to 4 . Hentzel-Dent grades: grade 0, no mucosal abnormalities; grade 1, no macroscopic lesions but erythema, hyperemia, or mucosal friability; grade 2, superficial erosions involving <10% of mucosal surface of the last 5 cm of esophageal squamous mucosa; grade 3, superficial erosions or ulceration involving 10% to 50% of the mucosal surface of the last 5 cm of esophageal squamous mucosa; grade 4, deep peptide ulceration anywhere in the esophagus or confluent erosion of >50% of the mucosal surface of the last 5 cm of esophageal squamous mucosa . pH-metry: De Meester criteria: (1) total number of reflux episodes; (2) number of reflux episodes with pH <4 for more than 5 minutes; (3) duration of the longest episode; (4) percentage total time pH <4; (5) percentage upright time pH <4; and 6) percentage recumbent time pH <4. . 
Manometry: Spechler criteria: diagnosis of ineffective esophageal motility, nutcracker esophagus, spasm, or achalasia based on basal lower esophageal sphincter pressure, relaxation, wave progression, and distal wave amplitude.
ACQ, Agoraphobia Cognitions Questionnaire; ADIS-R, structured interview by psychologist, recommended interview protocol for panic research; CVD, cardiovascular disease; DSM-IV (ADIS-IV), Anxiety Disorders Interview Schedule; GERD, gastroesophageal reflux disease; GP, general practitioner; GPE, Ghillebert probability estimate (sum of partial probabilities for exact numbers of reflux-associated symptoms within the context of the total number of symptoms); HDR, high-degree response; EoE, eosinophilic esophagitis (typical abnormal EoE endoscopic findings: rings or furrows); IHD, ischemic heart disease; McGill sensory, McGill Pain Questionnaire sensory; MIA, Mobility Inventory for Agoraphobia; MINI, Mini International Neuropsychiatric Interview (gold standard for anxiety disorders); NPV, negative predictive value; PPV, positive predictive value; SI, symptom index (calculated as the number of chest pain episodes associated with reflux (pH <4) divided by the total number of chest pain episodes recorded, expressed as a percentage); SIS, symptom index score (calculated for each week by adding the reported daily severity (mild = 1; moderate = 2; severe = 3; disabling = 4) multiplied by the reported daily frequency).
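The two symptom-based measures defined above, SI and SIS, are simple arithmetic and can be sketched as follows. This is an illustrative reading of the definitions in the abbreviation list, not code from any of the included studies: the function names, the input format (one severity label and one frequency count per day), and the example values are all hypothetical.

```python
# Severity weights as given in the SIS definition.
SEVERITY = {"mild": 1, "moderate": 2, "severe": 3, "disabling": 4}


def symptom_index(reflux_related_episodes, total_episodes):
    """SI: percentage of recorded chest pain episodes associated with pH <4."""
    if total_episodes <= 0:
        raise ValueError("at least one chest pain episode must be recorded")
    if not 0 <= reflux_related_episodes <= total_episodes:
        raise ValueError("reflux-related episodes must lie between 0 and the total")
    return 100.0 * reflux_related_episodes / total_episodes


def symptom_index_score(daily_reports):
    """SIS for one week: sum over days of severity weight times daily frequency.

    daily_reports is an iterable of (severity_label, frequency) pairs,
    one pair per day (hypothetical input format).
    """
    return sum(SEVERITY[severity] * frequency for severity, frequency in daily_reports)


# Illustrative values only.
print(symptom_index(3, 4))
week = [("mild", 2), ("moderate", 1), ("severe", 1)]
print(symptom_index_score(week))
```

Days without symptoms can simply be omitted from `daily_reports` or entered with a frequency of 0; both give the same weekly score.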
Eslick GD, Coulshed DS, Talley NJ: Review article: the burden of illness of non-cardiac chest pain. Aliment Pharmacol Ther. 2002, 16: 1217-1223. 10.1046/j.1365-2036.2002.01296.x.
Kahn SE: The challenge of evaluating the patient with chest pain. Arch Pathol Lab Med. 2000, 124: 1418-1419.
Cayley WE: Diagnosing the cause of chest pain. Am Fam Physician. 2005, 72: 2012-2021.
Blatchford O, Capewell S, Murray S, Blatchford M: Emergency medical admissions in Glasgow: general practices vary despite adjustment for age, sex, and deprivation. Brit J Gen Pract. 1999, 49: 551-554.
Pope JH, Aufderheide TP, Ruthazer R, Woolard RH, Feldman JA, Beshansky JR, Griffith JL, Selker HP: Missed diagnoses of acute cardiac ischemia in the emergency department. N Engl J Med. 2000, 342: 1163-1170. 10.1056/NEJM200004203421603.
Buntinx F, Knockaert D, Bruyninckx R, De Blaey N, Aerts M, Knottnerus JA, Delooz H: Chest pain in general practice or in the hospital emergency department: is it the same?. Fam Pract. 2001, 18: 586-589. 10.1093/fampra/18.6.586.
Ruddox V, Mathisen M, Otterstad JE: Prevalence and prognosis of non-specific chest pain among patients hospitalized for suspected acute coronary syndrome - a systematic literature search. BMC Med. 2012, 10: 58. 10.1186/1741-7015-10-58.
Nilsson S, Scheike M, Engblom D, Karlsson LG, Molstad S, Akerlind I, Ortoft K, Nylander E: Chest pain and ischaemic heart disease in primary care. Brit J Gen Pract. 2003, 53: 378-382.
Verdon F, Burnand B, Herzig L, Junod M, Pecoud A, Favrat B: Chest wall syndrome among primary care patients: a cohort study. BMC Fam Pract. 2007, 8: 51. 10.1186/1471-2296-8-51.
Svavarsdottir AE, Jonasson MR, Gudmundsson GH, Fjeldsted K: Chest pain in family practice. Diagnosis and long-term outcome in a community setting. Can Fam Physician. 1996, 42: 1122-1128.
Eslick GD, Talley NJ: Natural history and predictors of outcome for non-cardiac chest pain: a prospective 4-year cohort study. Neurogastroenterol Motil. 2008, 20: 989-997. 10.1111/j.1365-2982.2008.01133.x.
Eslick GD, Talley NJ: Non-cardiac chest pain: predictors of health care seeking, the types of health care professional consulted, work absenteeism and interruption of daily activities. Aliment Pharmacol Ther. 2004, 20: 909-915. 10.1111/j.1365-2036.2004.02175.x.
Stochkendahl MJ, Christensen HW: Chest pain in focal musculoskeletal disorders. Med Clin North Am. 2010, 94: 259-273. 10.1016/j.mcna.2010.01.007.
Katz PO: Approach to the patient with unexplained chest pain. Semin Gastrointest Dis. 2001, 12: 38-45.
Whiting P, Rutjes AW, Dinnes J, Reitsma J, Bossuyt PM, Kleijnen J: Development and validation of methods for assessing the quality of diagnostic accuracy studies. Health Technol Assess. 2004, 8: iii, 1-234.
Maev IV, Iurenev GL, Burkov SG, V’Iuchnova ES: Rabeprazole test and comparison of the effectiveness of course treatment with rabeprazole in patients with gastroesophageal reflux disease and non-coronary chest pain. Klin Med. 2007, 85: 45-51.
Bachmann LM, Puhan MA, Riet G, Bossuyt PM: Sample sizes of studies on diagnostic accuracy: literature survey. BMJ Clin Res Ed. 2006, 332: 1127-1129. 10.1136/bmj.38793.637789.2F.
Carley S, Dosman S, Jones SR, Harrison M: Simple nomograms to calculate sample size in diagnostic studies. Emerg Med J. 2005, 22: 180-181. 10.1136/emj.2003.011148.
Scottish Intercollegiate Guidelines Network: Methodology checklist 5: diagnostic studies. http://www.sign.ac.uk/methodology/checklists.html.
Jaeschke R: Users’ guides to the medical literature. III. How to use an article about a diagnostic test. B. What are the results and will they help me in caring for my patients? The Evidence-Based Medicine Working Group. JAMA. 1994, 271: 703-707. 10.1001/jama.1994.03510330081039.
Rutter CM, Gatsonis CA: A hierarchical regression approach to meta-analysis of diagnostic test accuracy evaluations. Stat Med. 2001, 20: 2865-2884. 10.1002/sim.942.
R Project: R: A language and environment for statistical computing. http://www.r-project.org/.
Schiller I, Dendukuri N: HSROC: Joint meta-analysis of diagnostic test sensitivity and specificity with or without a gold standard reference test. R package version 2.1.6. 2013.
Kim JH, Rhee PL, Park EH, Son HJ, Kim JJ, Rhee JC: Clinical usefulness of subgrouping of patients with non-cardiac chest pain according to characteristic symptoms in Korea. J Gastroenterol Hepatol. 2007, 22: 320-325. 10.1111/j.1440-1746.2006.04264.x.
Hong SN, Rhee PL, Kim JH, Lee JH, Kim YH, Kim JJ, Rhee JC: Does this patient have oesophageal motility abnormality or pathological acid reflux?. Dig Liver Dis. 2005, 37: 475-484. 10.1016/j.dld.2005.01.018.
Netzer P, Gut A, Heer R, Gries N, Pfister M, Halter F, Inauen W: Five-year audit of ambulatory 24-hour esophageal pH-manometry in clinical practice. Scand J Gastroenterol. 1999, 34: 676-682. 10.1080/003655299750025877.
Mousavi S, Tosi J, Eskandarian R, Zahmatkesh M: Role of clinical presentation in diagnosing reflux-related non-cardiac chest pain. J Gastroenterol Hepatol. 2007, 22: 218-221. 10.1111/j.1440-1746.2006.04416.x.
Singh S, Richter JE, Bradley LA, Haile JM: The symptom index. Differential usefulness in suspected acid-related complaints of heartburn and chest pain. Dig Dis Sci. 1993, 38: 1402-1408. 10.1007/BF01308595.
Ho K: Gastroesophageal reflux disease is a common cause of non-cardiac chest pain in a country with a low prevalence of reflux esophagitis. Dig Dis Sci. 1998, 43: 1991-1997. 10.1023/A:1018842811123.
Lam HGT: Acute non-cardiac chest pain in a coronary care unit: evaluation by 24-hour pressure and pH recording of the esophagus. Gastroenterology. 1992, 102: 453-460.
Dickman R, Emmons S, Cui H, Sewell J, Hernandez D, Esquivel RF, Fass R: The effect of a therapeutic trial of high-dose rabeprazole on symptom response of patients with non-cardiac chest pain: a randomized, double-blind, placebo-controlled, crossover trial. Aliment Pharmacol Ther. 2005, 22: 547-555. 10.1111/j.1365-2036.2005.02620.x.
Bautista J, Fullerton H, Briseno M, Cui H, Fass R: The effect of an empirical trial of high-dose lansoprazole on symptom response of patients with non-cardiac chest pain - a randomized, double-blind, placebo-controlled, crossover trial. Aliment Pharmacol Ther. 2004, 19: 1123-1130. 10.1111/j.1365-2036.2004.01941.x.
Fass R, Fennerty MB, Ofman JJ, Gralnek IM, Johnson C, Camargo E, Sampliner RE: The clinical and economic value of a short course of omeprazole in patients with noncardiac chest pain. Gastroenterology. 1998, 115: 42-49. 10.1016/S0016-5085(98)70363-4.
Pandak WM, Arezo S, Everett S, Jesse R, DeCosta G, Crofts T, Gennings C, Siuta M, Zfass A: Short course of omeprazole - A better first diagnostic approach to noncardiac chest pain than endoscopy, manometry, or 24-hour esophageal pH monitoring. J Clin Gastroenterol. 2002, 35: 307-314. 10.1097/00004836-200210000-00006.
Kim JH, Sinn DH, Son HJ, Kim JJ, Rhee JC, Rhee PL: Comparison of one-week and two-week empirical trial with a high-dose rabeprazole in non-cardiac chest pain patients. J Gastroenterol Hepatol. 2009, 24: 1504-1509. 10.1111/j.1440-1746.2009.05859.x.
Xia HH, Lai KC, Lam SK, Hu WH, Wong NY, Hui WM, Lau CP, Chen WH, Chan CK, Wong WM, Wong BC: Symptomatic response to lansoprazole predicts abnormal acid reflux in endoscopy-negative patients with non-cardiac chest pain. Aliment Pharmacol Ther. 2003, 17: 369-377. 10.1046/j.1365-2036.2003.01436.x.
Kushnir VM, Sayuk GS, Gyawali CP: Abnormal GERD parameters on ambulatory pH monitoring predict therapeutic success in noncardiac chest pain. Am J Gastroenterol. 2010, 105: 1032-1038. 10.1038/ajg.2009.646.
Lacima G, Grande L, Pera M, Francino A, Ros E: Utility of ambulatory 24-hour esophageal pH and motility monitoring in noncardiac chest pain: report of 90 patients and review of the literature. Dig Dis Sci. 2003, 48: 952-961. 10.1023/A:1023011931955.
Cooke RA, Anggiansah A, Smeeton NC, Owen WJ, Chambers JB: Gastroesophageal reflux in patients with angiographically normal coronary-arteries - an uncommon cause of exertional chest pain. Br Heart J. 1994, 72: 231-236. 10.1136/hrt.72.3.231.
Bovero E, Torre F, Poletti M, Faveto M, De Iaco F: Exertional gastroesophageal pH-metry: a new provocative physiological test in the diagnosis of chest pain. Gastroenterol Clin Biol. 1993, 17: 4-8.
Romand F, Vincent E, Potier V, Claudel N, Galoo E, Desbaumes J: Angina-like chest pain and exertional esophageal pH monitoring. Gastroenterol Clin Biol. 1999, 23: 313-318.
Abrahao LJ, Lemme EMO: Role of esophageal provocative tests in the investigation of patients with chest pain of undetermined origin [in Portuguese]. Arq Gastroenterol. 2005, 42: 139-145.
Achem SR, Almansa C, Krishna M, Heckman MG, Wolfsen HC, Talley NJ, DeVault KR: Oesophageal eosinophilic infiltration in patients with noncardiac chest pain. Aliment Pharmacol Ther. 2011, 33: 1194-1201. 10.1111/j.1365-2036.2011.04652.x.
Stochkendahl MJ, Vach W, Hartvigsen J, Hoilund-Carlsen PF, Haghfelt T, Christensen HW: Reconstruction of the decision-making process in assessing musculoskeletal chest pain: an exploratory study using recursive partitioning. J Manipulative Physiol Ther. 2012, 35: 184-195. 10.1016/j.jmpt.2012.01.009.
Bosner S, Becker A, Hani MA, Keller H, Sonnichsen AC, Karatolios K, Schaefer JR, Haasenritter J, Baum E, Donner-Banzhoff N: Chest wall syndrome in primary care patients with chest pain: presentation, associated features and diagnosis. Fam Pract. 2010, 27: 363-369. 10.1093/fampra/cmq024.
Manchikanti L, Singh V, Pampati V, Beyer CD, Damron KS: Evaluation of the prevalence of facet joint pain in chronic thoracic pain. Pain Physician. 2002, 5: 354-359.
Kuijpers PM, Denollet J, Lousberg R, Wellens HJ, Crijns H, Honig A: Validity of the hospital anxiety and depression scale for use with patients with noncardiac chest pain. Psychosomatics. 2003, 44: 329-335. 10.1176/appi.psy.44.4.329.
Demiryoguran NS, Karcioglu O, Topacoglu H, Kiyan S, Ozbay D, Onur E, Korkmaz T, Demir OF: Anxiety disorder in patients with non-specific chest pain in the emergency setting. Emerg Med J. 2006, 23: 99-102. 10.1136/emj.2005.025163.
Foldes-Busque G, Fleet R, Poitras J, Chauny JM, Belleville G, Denis I, Diodati JG, Pelland ME, Lessard MJ, Marchand A: Preliminary investigation of the Panic Screening Score for emergency department patients with unexplained chest pain. Acad Emerg Med. 2011, 18: 322-325. 10.1111/j.1553-2712.2011.01009.x.
Fleet RP, Dupuis G, Marchand A, Burelle D, Beitman BD: Detecting panic disorder in emergency department chest pain patients: a validated model to improve recognition. Ann Behav Med. 1997, 19: 124-131. 10.1007/BF02883329.
Katerndahl DA, Trammell C: Prevalence and recognition of panic states in STARNET patients presenting with chest pain. J Fam Pract. 1997, 45: 54-63.
Lundell LR, Dent J, Bennett JR, Blum AL, Armstrong D, Galmiche JP, Johnson F, Hongo M, Richter JE, Spechler SJ, Tytgat GN, Wallin L: Endoscopic assessment of oesophagitis: clinical and functional correlates and further validation of the Los Angeles classification. Gut. 1999, 45: 172-180. 10.1136/gut.45.2.172.
Genta RM, Spechler SJ, Kielhorn AF: The Los Angeles and Savary-Miller systems for grading esophagitis: utilization and correlation with histology. Dis Esophagus. 2011, 24: 10-17. 10.1111/j.1442-2050.2010.01092.x.
Pandolfino JE, Vakil NB, Kahrilas PJ: Comparison of inter- and intraobserver consistency for grading of esophagitis by expert and trainee endoscopists. Gastrointest Endosc. 2002, 56: 639-643. 10.1016/S0016-5107(02)70110-7.
De Meester TR, Wang CI, Wernly JA, Pellegrini CA, Little AG, Klementschitsch P, Bermudez G, Johnson LF, Skinner DB: Technique, indications, and clinical use of 24 hour esophageal pH monitoring. J Thorac Cardiovasc Surg. 1980, 79: 656-670.
Spechler SJ, Castell DO: Classification of oesophageal motility abnormalities. Gut. 2001, 49: 145-151. 10.1136/gut.49.1.145.
Katz P, Gerson L, Vela M: Guidelines for the diagnosis and management of gastroesophageal reflux disease. Am J Gastroenterol. 2012, 108: 308-328.
Eslick GD, Fass R: Noncardiac chest pain: evaluation and treatment. Gastroenterol Clin North Am. 2003, 32: 531. 10.1016/S0889-8553(03)00029-3.
Achem SR, DeVault KR: Recent developments in chest pain of undetermined origin. Curr Gastroenterol Rep. 2000, 2: 201-209. 10.1007/s11894-000-0062-4.
Chambers J, Bass C: Atypical chest pain: looking beyond the heart. QJM. 1998, 91: 239-244. 10.1093/qjmed/91.3.239.
Yelland M, Cayley WE, Vach W: An algorithm for the diagnosis and management of chest pain in primary care. Med Clin North Am. 2010, 94: 349-374. 10.1016/j.mcna.2010.01.011.
Cremonini F, Wise J, Moayyedi P, Talley NJ: Diagnostic and therapeutic use of proton pump inhibitors in non-cardiac chest pain: a metaanalysis. Am J Gastroenterol. 2005, 100: 1226-1232. 10.1111/j.1572-0241.2005.41657.x.
Kahrilas PJ, Hughes N, Howden CW: Response of unexplained chest pain to proton pump inhibitor treatment in patients with and without objective evidence of gastro-oesophageal reflux disease. Gut. 2011, 60: 1473-1478. 10.1136/gut.2011.241307.
Chambers J, Cooke R, Anggiansah A, Owen W: Effect of omeprazole in patients with chest pain and normal coronary anatomy: initial experience. Int J Cardiol. 1998, 65: 51-55. 10.1016/S0167-5273(98)00093-X.
Dekel R, Martinez-Hawthorne SD, Guillen RJ, Fass R: Evaluation of symptom index in identifying gastroesophageal reflux disease-related noncardiac chest pain. J Clin Gastroenterol. 2004, 38: 24-29. 10.1097/00004836-200401000-00007.
Adamek RJ, Wegener M, Wienbeck M, Pulte T: Esophageal motility disorders and their coexistence with pathological acid reflux in patients with noncardiac chest pain. Scand J Gastroenterol. 1995, 30: 833-838. 10.3109/00365529509101588.
Achem SR, Kolts BE, MacMath T, Richter J, Mohr D, Burton L, Castell DO: Effects of omeprazole versus placebo in treatment of noncardiac chest pain and gastroesophageal reflux. Dig Dis Sci. 1997, 42: 2138-2145. 10.1023/A:1018843223263.
Harbord RM, Deeks JJ, Egger M, Whiting P, Sterne JA: A unification of models for meta-analysis of diagnostic accuracy studies. Biostatistics. 2007, 8: 239-251. 10.1093/biostatistics/kxl004.
Dendukuri N, Schiller I, Joseph L, Pai M: Bayesian meta-analysis of the accuracy of a test for tuberculous pleuritis in the absence of a gold standard reference. Biometrics. 2012, 68: 1285-1293. 10.1111/j.1541-0420.2012.01773.x.
Leeflang MMG, Rutjes AWS, Reitsma JB, Hooft L, Bossuyt PMM: Variation of a test’s sensitivity and specificity with disease prevalence. Can Med Assoc J. 2013, 185: E537-E544. 10.1503/cmaj.121286.
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1741-7015/11/239/prepub
We thank Dr Martina Gosteli, University of Zurich, for conducting the literature search.
The authors declare that they have no competing interests. This study was not funded.
MW and KR carried out data extraction, participated in the analysis and drafted the manuscript. UH participated in the design of the study and performed the statistical analysis. JS conceived the study, and participated in its design and coordination and helped to draft the manuscript. All authors read and approved the final manuscript.