Bias associated with delayed verification in test accuracy studies: accuracy of tests for endometrial hyperplasia may be much higher than we think!
BMC Medicine volume 2, Article number: 18 (2004)
To empirically evaluate bias in estimation of accuracy associated with delay in verification of diagnosis among studies evaluating tests for predicting endometrial hyperplasia.
Systematic reviews of all published research on accuracy of miniature endometrial biopsy and endometrial ultrasonography for diagnosing endometrial hyperplasia identified 27 test accuracy studies (2,982 subjects). Of these, 16 had immediate histological verification of diagnosis while 11 had verification delayed by more than 24 hours after testing. The effect of delay in verification of diagnosis on estimates of accuracy was evaluated using meta-regression with diagnostic odds ratio (dOR) as the accuracy measure. This analysis was adjusted for study quality and type of test (miniature endometrial biopsy or endometrial ultrasound).
Compared to studies with immediate verification of diagnosis (dOR 67.2, 95% CI 21.7–208.8), those with delayed verification (dOR 16.2, 95% CI 8.6–30.5) underestimated the diagnostic accuracy by 74% (95% CI 7%–99%; P value = 0.048).
Among studies of miniature endometrial biopsy and endometrial ultrasound, diagnostic accuracy is considerably underestimated if there is a delay in histological verification of diagnosis.
The natural history of endometrial hyperplasia is not fully understood. What is known is that a proportion of simple and complex hyperplastic processes will regress without treatment, although the time scale over which such regression may occur is unclear. Similarly, the time scale over which benign endometrium progresses to hyperplasia is also unknown. Among studies evaluating accuracy of tests for diagnosis of hyperplasia (miniature biopsy or ultrasonography), it has previously been hypothesised that if histological verification of diagnosis after performing the test is delayed, the estimation of test accuracy may be influenced by the phenomena of disease regression or progression. For instance, false positive diagnoses of endometrial hyperplasia may occur due to natural disease regression during the time interval between testing and verification of diagnosis. Similarly, false negative diagnoses may also result from progression of benign functional or atrophic endometrium.
To obtain accurate estimates of test accuracy in studies of hyperplasia, an immediate comparison of the test under scrutiny with a reference standard that verifies the diagnosis will be essential [4–6]. When accuracy studies suffer from a delay in performance of the reference standard, the resultant false positives and false negatives will be expected to lead to an underestimation of test accuracy. In systematic reviews, when studies of various designs are collated, the extent of underestimation that arises from delay is important in obtaining an unbiased pooled accuracy estimate. To our knowledge, the extent of underestimation of accuracy due to a delay in verification of diagnosis has not been evaluated empirically in studies of endometrial hyperplasia. We undertook this analysis to examine formally how inaccurate the estimation of accuracy can be in studies evaluating miniature endometrial biopsy devices and endometrial thickness measurement by pelvic ultrasonography for predicting endometrial hyperplasia when there are delays in histological verification of diagnosis.
To test our hypothesis, a data set of all the published studies reporting the accuracy of miniature endometrial biopsy devices and endometrial ultrasonography for predicting endometrial hyperplasia was obtained from systematic reviews [7, 8]. The reviews focused on test accuracy studies in which the results of the test were compared with the results of a reference standard. The targeted population was women with abnormal pre- or postmenopausal uterine bleeding. The diagnostic tests of interest were miniature endometrial biopsy devices (for example, pipelle® endometrial suction curette, Unimar, Wilton, CT, USA) and endometrial thickness measurement by pelvic ultrasonography. The reference standard was endometrial histology obtained by an independent endometrial sampling technique, for example, inpatient curettage (with hysteroscopy) or hysterectomy.
Identification of studies
Two independent electronic searches of MEDLINE and EMBASE were conducted to identify relevant citations on endometrial biopsy (1980–1999) and ultrasonography (1966–2000). The search term combination for endometrial biopsy was diagnosis (MeSH) AND endometrial biopsy (textword), while that for studies on ultrasonography was ultrasound AND endometrial thickness AND sonography (textwords). The searches were limited to human studies, but there were no language restrictions. Relevant studies were identified by examining all the retrieved citations, reference lists of all known reviews and primary studies, and direct contact with manufacturers. Details of the search and selection processes can be found in the published reports of the reviews [7, 8].
Study quality assessment
All selected studies were assessed for their methodological quality defined as the confidence that study design, conduct and analysis minimize bias in the estimation of diagnostic accuracy [9–11]. We considered the following features in quality assessment: method of recruitment of sample, appropriateness of patient spectrum, and blinding of comparison between test and reference standard. Recruitment was considered to be adequate if patient selection was consecutive or a random sample was obtained. Patient spectrum was considered to be appropriate if both pre- and postmenopausal women were included. Blinding was considered to be present if it was clearly reported that the pathologists providing histological reports were kept unaware of the results of miniature endometrial biopsy or endometrial ultrasonography. If the results of the diagnostic tests were divulged to the pathologists or in the absence of any such reporting, blinding was categorised as absent. For the purpose of our analysis, studies were classified into two quality categories: Category I studies had any one of the following features: adequate recruitment, appropriate spectrum, and blinding; category II studies had none of the above quality features.
In addition to assessment of methodological quality, data were extracted to allow classification of studies into one of two groups: i) immediate verification – reference standard performed within 24 hours of testing, and ii) delayed verification – reference standard performed more than 24 hours after testing. Any studies that could not be categorised in this way due to lack of reporting were excluded. Data were then abstracted as 2 × 2 tables and estimates of diagnostic accuracy were derived for each individual study. A correction factor of 0.5 was used when cells of the 2 × 2 tables included zero values. True positive rates (sensitivity), false positive rates (1-specificity) and diagnostic odds ratios (dORs) were calculated for each primary evaluation. The dOR represents a ratio of the positive and negative likelihood ratios and it can be mathematically summarised as:
dOR = [sensitivity/(1-specificity)] / [(1-sensitivity)/specificity]
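As an illustration of the calculation above, the following is a minimal Python sketch that derives sensitivity, false positive rate and dOR from a single 2 × 2 table. The function name and the example counts are hypothetical, and the sketch assumes that the 0.5 correction is added to all four cells whenever any cell is zero; the text does not state exactly how the correction was applied.

```python
def diagnostic_odds_ratio(tp, fp, fn, tn, correction=0.5):
    """Return (sensitivity, false positive rate, dOR) for one 2 x 2 table."""
    cells = [tp, fp, fn, tn]
    # Assumption: if any cell is zero, add the correction to all four cells.
    if any(c == 0 for c in cells):
        tp, fp, fn, tn = (c + correction for c in cells)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    positive_lr = sensitivity / (1 - specificity)
    negative_lr = (1 - sensitivity) / specificity
    # dOR is the ratio of the positive to the negative likelihood ratio.
    return sensitivity, 1 - specificity, positive_lr / negative_lr


# Hypothetical table: 18 true positives, 5 false positives,
# 2 false negatives, 75 true negatives.
sens, fpr, dor = diagnostic_odds_ratio(18, 5, 2, 75)
print(round(sens, 2), round(fpr, 4), round(dor, 1))
```

The same dOR can be obtained directly as (TP × TN)/(FP × FN), which is why a zero in any cell makes the uncorrected ratio either zero or undefined.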
Pooled dORs were generated as the principal measures of diagnostic accuracy. Meta-analyses to produce summary estimates of accuracy were performed separately for subgroups of studies reporting immediate and delayed verification. To delineate the impact of delay in verification of diagnosis, we used meta-regression analysis [13, 14] with the log of dOR as the accuracy measure. This technique fitted a multivariable linear regression model for examining the influence of delay, quality and test type on the estimation of accuracy observed among studies included in the analysis (random effects model). In this way the analysis was adjusted for the confounding effects of study quality (two quality categories) and type of test (miniature endometrial biopsy or endometrial ultrasound).
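The idea behind regressing log dOR on a delay indicator can be sketched in Python as follows. This is a deliberately simplified illustration and not the model fitted in this study: it uses fixed-effect inverse-variance weights rather than a random effects model, includes a single covariate (delay) without adjustment for quality or test type, and runs on hypothetical 2 × 2 counts. The function names are also hypothetical.

```python
import math


def log_dor_and_variance(tp, fp, fn, tn):
    """Log dOR and its approximate variance (sum of reciprocal cell counts)."""
    log_dor = math.log((tp * tn) / (fp * fn))
    variance = 1 / tp + 1 / fp + 1 / fn + 1 / tn
    return log_dor, variance


def relative_dor_for_delay(studies):
    """Weighted least squares slope of log dOR on a 0/1 delay indicator.

    studies: iterable of (tp, fp, fn, tn, delayed) with non-zero cells.
    Returns exp(slope), the relative dOR (delayed versus immediate).
    """
    x, y, w = [], [], []
    for tp, fp, fn, tn, delayed in studies:
        log_dor, variance = log_dor_and_variance(tp, fp, fn, tn)
        x.append(delayed)
        y.append(log_dor)
        w.append(1 / variance)  # inverse-variance weight
    total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    return math.exp(sxy / sxx)


# Hypothetical evaluations: last element is 1 for delayed verification.
example = [
    (18, 5, 2, 75, 0),
    (30, 4, 3, 90, 0),
    (15, 10, 5, 70, 1),
    (20, 12, 8, 60, 1),
]
rdor = relative_dor_for_delay(example)
print(f"relative dOR = {rdor:.2f}, underestimation = {1 - rdor:.0%}")
```

A relative dOR below 1 indicates lower accuracy estimates among studies with delayed verification, and one minus the relative dOR gives the proportional underestimation.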
Selection of studies
The study selection process is shown in Figure 1. In total there were 2,982 subjects in 27 diagnostic evaluations reported in the 24 eligible primary studies. Eleven evaluations delayed verification of the diagnosis by more than 24 hours; the delay was up to six months in one study, up to four weeks in four studies, up to three weeks in one study and up to one week in the remaining three studies. Three of these studies were rated as category I for methodological quality, and eight as category II. Sixteen evaluations verified the diagnosis within 24 hours of the test. Among these, seven studies were rated as category I for quality, and nine as category II (Table 1).
Table 2 shows the diagnostic accuracy results for individual studies according to test type and verification status in terms of delay. The summary statistics for the various subgroups showed that the dOR for studies with immediate verification was 67.2 (21.7–208.8) while that for studies with delayed verification was 16.2 (8.6–30.5) as shown in Figure 2. Meta-regression analysis for bias due to delay in verification of diagnosis, adjusted for study quality and test type, showed that the underestimation of test accuracy among studies with delayed verification was 74% (95% CI 7%–99%; P = 0.048) on average compared to studies with immediate verification (Table 3).
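As a rough check on scale, the two pooled subgroup dORs can be compared directly. The short Python calculation below uses only the figures quoted above; the resulting crude (unadjusted) ratio is derived here for illustration and is not a result reported by the meta-regression.

```python
immediate_dor = 67.2  # pooled dOR, immediate verification (from the text)
delayed_dor = 16.2    # pooled dOR, delayed verification (from the text)

# Crude relative dOR: delayed versus immediate, with no adjustment
# for study quality or test type.
relative_dor = delayed_dor / immediate_dor
crude_underestimation = 1 - relative_dor

print(f"crude relative dOR = {relative_dor:.2f}")
print(f"crude underestimation = {crude_underestimation:.0%}")
```

The crude figure comes to about 76%, close to the adjusted estimate of 74%, which suggests that adjustment for quality and test type changed the point estimate only slightly.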
Our study shows empirically the magnitude of bias associated with delay in verification of diagnosis in test accuracy studies. Delay in verification of more than 24 hours was associated with a considerable underestimation of accuracy of miniature biopsy and endometrial ultrasonography in diagnosing endometrial hyperplasia. This supports the premise that the reported limited accuracy of miniature endometrial biopsy devices and endometrial ultrasonography in diagnosing hyperplasia is due, in part, to the natural history of the disease rather than resulting entirely from intrinsic problems with performance of the diagnostic tools.
We posed our hypothesis a priori and tested it in as rigorous a manner as possible. Our literature search was without language restriction, facilitating retrieval of many relevant test accuracy studies. However, due to poor reporting, many critical pieces of information were missing in the available literature, restricting the number of studies that could be included in our analysis (for example, 31 studies were ineligible for inclusion because explicit information about time before verification was omitted). Our examination of delays in verification was also restricted; just two time categories were discernible (delay < 24 hours or > 24 hours). Immediate verification (reference standard to be performed straight after the index test) was not achievable in some studies because the reference test (inpatient endometrial sampling) necessitated use of general anaesthesia. A practical cut-off of 24 hours was taken to allow time for reference testing to be undertaken when the preceding index tests (miniature endometrial biopsy and ultrasound) were performed in the conscious outpatient. Although the natural history of endometrial hyperplasia is unclear, it is unlikely that biological alteration would have occurred within 24 hours. To study the rate of disease progression or regression would require repeated testing over time, but such a study is unlikely to be ethically justifiable, given that most clinicians will institute treatment following initial diagnosis. Such a study would then become one of prognosis under treatment rather than a natural history study.
We also evaluated other features of methodological quality and, in general, found the quality of studies to be poor. For example, only three studies reported that interpretation of the reference test was blinded to the results of the index test. A lack of blinding can introduce bias and overestimation of diagnostic accuracy. Pathological interpretation of endometrial hyperplasia is open to a varying degree of subjectivity, especially at extreme ends of the spectrum, where overlap with benign functional endometrium (simple hyperplasia) and cancer (complex hyperplasia with cytological atypia) is more likely. Absence (or lack of explicit reporting) of blinding is thus associated with poorer methodological quality and this feature was incorporated in our quality assessment. Our analysis adjusted for the confounding effects of quality but our inferences should be interpreted with caution due to relative scarcity of good quality studies.
Our findings have implications for research into new diagnostic interventions. Our results demonstrate that test evaluation with robust study design (immediate verification) showed good test performance but evaluation in poor designs (delayed verification) showed poor performance. Poor designs may reflect the situation prevalent in routine clinical practice where test results may not be immediately confirmed due to resource and other implications. Thus diagnostic evaluations carried out in routine practice may mask the accuracy of tests.
Fox H: Endometrial hyperplasia: a conceptual and practical approach. Gynaecology Forum. 1996, 1: 7-9.
Kurman RJ, Kaminski PF, Norris HJ: The behaviour of endometrial hyperplasia: a long-term study of "untreated" hyperplasia in 170 patients. Cancer. 1985, 56: 403-412.
Krampl E, Soby B, Istre O: How representative are pipelle endometrial biopsies? A retrospective analysis of 324 biopsies followed by transcervical resection of the endometrium or hysterectomy. Gynaecol Endosc. 1997, 6: 277-281. 10.1046/j.1365-2508.1997.1210540.x.
Lijmer JG, Mol BW, Heisterkamp S, Bonsel GJ, Prins MH, Van der Meulen JHP, Bossuyt PM: Empirical evidence of design-related bias in studies of diagnostic tests. JAMA. 1999, 282: 1061-1066. 10.1001/jama.282.11.1061.
Reid MC, Lachs MS, Feinstein AR: Use of methodological standards in diagnostic test research. Getting better but still not good. JAMA. 1995, 274: 645-651. 10.1001/jama.274.8.645.
Sheps SB, Schechter MT: The assessment of diagnostic tests: a survey of current medical research. JAMA. 1984, 252: 2418-2422. 10.1001/jama.252.17.2418.
Gupta JK, Chien PF, Voit D, Clark TJ, Khan KS: Ultrasonographic endometrial thickness for diagnosing endometrial pathology in women with postmenopausal bleeding: a meta-analysis. Acta Obstet Gynecol Scand. 2002, 81: 799-816. 10.1034/j.1600-0412.2001.810902.x.
Clark TJ, Mann CH, Shah N, Song F, Khan KS, Gupta JK: Accuracy of outpatient endometrial biopsy in the diagnosis of endometrial hyperplasia: a systematic quantitative review. Acta Obstet Gynecol Scand. 2001, 80: 784-793. 10.1034/j.1600-0412.2001.080009784.x.
Cochrane Methods Working Group on Systematic Reviews of Screening and Diagnostic Tests: Recommended Methods. [http://www.cochrane.org/cochrane/sadtdoc1.htm]
Dunn G, Everitt B: Clinical Biostatistics. London: Edward Arnold. 1995
Khan KS, ter Riet G, Popay J, Nixon J, Kleijnen J: Study quality assessment. In Undertaking Systematic Reviews of Research on Effectiveness. CRD's Guidance for Carrying Out or Commissioning Reviews. Edited by: Khan KS, Ter Riet G, Glanville J, Sowden AJ, Kleijnen J. 2001, York: NHS Centre for Reviews and Dissemination (CRD), University of York, 2
Sankey S, Weisfiels L, Fine M, Kapoor W: An assessment of the use of the continuity correction for sparse data in meta analysis. Comm Stat Sim Comput. 1996, 25: 1031-1056.
Sterne JAC, Egger M, Davey Smith G: Investigating and dealing with publication and other biases. In Systematic Reviews in Health Care: Meta-analysis in Context. Edited by: Egger M, Davey Smith G, Altman DG. 2001, London: BMJ Publishing Group, 189-208.
Song F, Sheldon T, Sutton A, Abrams K, Jones D: Methods for exploring heterogeneity in meta-analysis. Eval Health Prof. 2001, 24: 126-151. 10.1177/01632780122034849.
Sun-Kuie T, Sian-Ann T, Ka-Mui C, Soo-Kim L: The diagnostic value and patient acceptability of outpatient endometrial sampling with Gynoscann. Aust NZ J Obstet Gynaecol. 1992, 32: 73-76.
Goldberg GL, Tsalacopoulos GDDA: A comparison of the Accurette and Vabra aspirator and uterine curettage. S Afr Med J. 1982, 61: 114-116.
Sonnendecker EW, Sevitz H, Hofmeyr GJ: Diagnostic accuracy of the accurette endometrial sampler. S Afr Med J. 1982, 61: 109-113.
Kufahl J, Pederson I, Eriksen PS, Helkjaer PE, Larsen LG, Jensen KL, de Nully P, Philipsen T, Wahlin A: Transvaginal ultrasound, endometrial cytology sampled by Gynoscann and histology obtained by Uterine Explora Curette compared to the histology of the uterine specimen. A prospective study in pre- and postmenopausal women undergoing elective hysterectomy. Acta Obstet Gynecol Scand. 1997, 76: 790-796.
Goldchmit R, Katz Z, Blickstein I, Caspi B, Dgani R: The accuracy of endometrial Pipelle sampling with and without sonographic measurement of endometrial thickness. Obstet Gynecol. 1993, 82: 727-730.
Kavak Z, Ceyhan N, Pekin S: Combination of vaginal ultrasonography and pipelle sampling in the diagnosis of endometrial disease. Aust N Z J Obstet Gynaecol. 1996, 36: 63-66.
Botsis D, Kassanos D, Pyrgiotis E, Zourlas PA: Vaginal sonography of the endometrium in postmenopausal women. Clin Exp Obstet Gynecol. 1992, 19: 189-192.
Garuti G, Sambruni I, Cellani F, Garzia D, Alleva P, Luerti M: Hysteroscopy and transvaginal ultrasonography in postmenopausal women with uterine bleeding. Int J Gynaecol Obstet. 1999, 65: 25-33. 10.1016/S0020-7292(98)00224-0.
Haller H, Matejcic N, Rukavina B, Krasevic M, Rupcic S, Mozetic D: Transvaginal sonography and hysteroscopy in women with postmenopausal bleeding. Int J Gynecol Obstet. 1996, 54: 155-159. 10.1016/0020-7292(96)02677-X.
Grigoriou O, Kalovidouros A, Papadias C, Antoniou G, Antonaki V, Giannikos L: Transvaginal sonography of the endometrium in women with postmenopausal bleeding. Maturitas. 1996, 23: 9-14. 10.1016/0378-5122(95)00945-0.
Karlsson B, Granberg S, Wikland M, Ryd W, Norstrom A: Endovaginal scanning of the endometrium compared to cytology and histology in women with postmenopausal bleeding. Gynecol Oncol. 1993, 50: 173-178. 10.1006/gyno.1993.1188.
Malinova M, Pehlivanov B: Transvaginal sonography and progesterone challenge for identifying endometrial pathology in postmenopausal women. Int J Gynecol Obstet. 1996, 52: 49-53. 10.1016/0020-7292(95)02554-5.
Wolman I, Sagi J, Ginat S, Jaffa AJ, Hartoov J, Jedwab G: The sensitivity and specificity of vaginal sonography in detecting endometrial abnormalities in women with postmenopausal bleeding. J Clin Ultrasound. 1996, 24: 79-82. 10.1002/(SICI)1097-0096(199602)24:2<79::AID-JCU5>3.3.CO;2-6.
Malinova M, Pehlivanov B: Transvaginal sonography and endometrial thickness in patients with postmenopausal uterine bleeding. Eur J Obstet Gynecol Reprod Biol. 1995, 58: 161-165. 10.1016/0028-2243(94)01968-1.
Stovall TG, Solomon SK, Ling FW: Endometrial sampling prior to hysterectomy. Obstet Gynecol. 1989, 73: 405-409.
Gupta JK, Wilson S, Desai P, Hau C: How should we investigate women with postmenopausal bleeding? Acta Obstet Gynecol Scand. 1996, 75: 475-479.
Guner H, Tiras MB, Karabacak O, Sarikaya H, Erdem M, Yildirim M: Endometrial assessment by vaginal ultrasonography might reduce endometrial sampling in patients with postmenopausal bleeding: A prospective study. Aust N Z J Obstet Gynaecol. 1996, 36: 175-178.
Abu-Ghazzeh Y, Shakoury WA, Barqawi R: Comparative study of transvaginal hysterosonography and biopsy for the evaluation of post-menopausal bleeding. Ann Saudi Med. 1999, 19: 116-119.
De Silva BY, Stewart K, Steven JD, Sathanandan M: Transvaginal ultrasound measurement of endometrial thickness and endometrial pipelle sampling as an alternative diagnostic procedure to hysteroscopy and dilatation and curettage in the management of post-menopausal bleeding. J Obstet Gynaecol. 1997, 17: 399-402. 10.1080/01443619750112989.
Taviani A, Braccini S, Toniazzi P, Pantani P, Costamagna V, Gambini G, Pancanti V: Transvaginal echography in patients with postmenopausal metrorrhagia. Minerva Ginecol. 1995, 47: 369-372.
Morales FJ, Dualde D, Marinaro A: Value of vaginal ultrasound in the diagnosis of postmenopausal metrorrhagia. Radiologia. 1998, 40: 255-262.
Mortakis AE, Mavrelos K: Transvaginal ultrasonography and hysteroscopy in the diagnosis of endometrial abnormalities. J Am Assoc Gynecol Laparosc. 1997, 4: 449-452.
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1741-7015/2/18/prepub
TJC and KSK conceived and designed the study with input from GtR. TJC conducted the systematic review and acquired all data. AC conducted the statistical analyses with input from KSK and GtR. TJC wrote all versions of the manuscript. KSK and GtR critically revised the manuscript for important intellectual content. All authors read and approved the final manuscript.
Clark, T.J., ter Riet, G., Coomarasamy, A. et al. Bias associated with delayed verification in test accuracy studies: accuracy of tests for endometrial hyperplasia may be much higher than we think!. BMC Med 2, 18 (2004). https://doi.org/10.1186/1741-7015-2-18