Effectiveness of training interventions to improve quality of medical certification of cause of death: systematic review and meta-analysis

Background
Valid cause of death data are essential for health policy formation. The quality of medical certification of cause of death (MCCOD) by physicians directly affects the utility of cause of death data for public policy and hospital management. Whilst training in correct certification has been provided for physicians and medical students, the impact of training is often unknown. This study was conducted to systematically review and meta-analyse the effectiveness of training interventions to improve the quality of MCCOD.

Methods
This review was registered in the International Prospective Register of Systematic Reviews (PROSPERO; Registration ID: CRD42020172547) and followed Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. CENTRAL, Ovid MEDLINE and Ovid EMBASE databases were searched using pre-defined search strategies covering the eligibility criteria. Studies were selected against four screening questions using DistillerSR software. Risk of bias assessments were conducted with GRADE recommendations and ROBINS-I criteria for randomised and non-randomised interventions, respectively. Study selection, data extraction and bias assessments were performed independently by two reviewers, with a third reviewer to resolve conflicts. Clinical, methodological and statistical heterogeneity assessments were conducted. Meta-analyses were performed with Review Manager 5.4 software using the ‘generic inverse variance method’ with risk difference as the pooled estimate. A ‘summary of findings’ table was prepared using the ‘GRADEproGDT’ online tool. Sensitivity analyses and narrative synthesis of the findings were also performed.

Results
After de-duplication, 616 articles were identified and 21 subsequently selected for synthesis of findings; four underwent meta-analysis. The meta-analyses indicated that selected training interventions significantly reduced error rates among participants, with pooled risk differences of 15–33%. Sensitivity analyses confirmed the robustness of these estimates. The findings of the narrative synthesis were similarly suggestive of favourable outcomes for both physicians and medical trainees.

Conclusions
Training physicians in correct certification improves the accuracy and policy utility of cause of death data. Investment in MCCOD training activities should be considered as a key component of strategies to improve vital registration systems, given the potential of such training to substantially improve the quality of cause of death data.

Supplementary Information
The online version contains supplementary material available at 10.1186/s12916-020-01840-2.


Medical certification of cause of death
The Medical Certificate of Cause of Death (Fig. 1) is a standardised universal form recommended by the WHO for international use, which has been adopted by most WHO member states [6]. The WHO also provides instructions on correct cause of death reporting to improve the quality of medical certification and subsequent data [7].
When a single cause of death is reported on the death certificate, this becomes the underlying cause of death used for tabulation. When more than one cause of death is reported, the disease or injury which initiated the sequence of events that produced the fatal event becomes the underlying cause of death [6].
Despite the availability of guidance, errors in cause of death certification have been observed across all geographical regions, with inadequate certification by doctors remaining the principal reason for inaccurate death data [8,9]. Over the past few decades, therefore, training medical doctors in death certification has become a key intervention employed by health services and national governments to improve mortality statistics. Interventions have included improvements in death certificate formats, training programmes on completion of death certificates, development of self-learning educational materials, implementation of cause of death query systems, periodic peer auditing of death certificates and increasing autopsy rates [10-12].

Intervention studies on death certification
Several studies have investigated the effectiveness of interventions to improve the quality of death certification [13-15]. Whilst improvement in death certification accuracy is often reported, negative findings have also been published [16]. Moreover, there are few randomised controlled trials (RCTs) or similar studies that have produced high-quality evidence. A 2010 literature review identified 129 studies on the effectiveness of educational interventions for death certification, ultimately reviewing 14, including three RCTs [8]. All educational interventions identified in the review improved certain aspects of death certification, although the statistical significance of evaluation results varied with the type of intervention. Given the absence of any systematic review and meta-analysis of death certification training interventions, as well as the increase in experimental data produced in the past decade and the need, made even more urgent by the COVID-19 pandemic, to strengthen national vital registration and cause of death data systems, further evaluation is essential. In this study, we systematically review and meta-analyse the effectiveness of training interventions for improving the quality of medical certification of cause of death (MCCOD). To our knowledge, no study has specifically investigated interventions intended to reduce errors in MCCOD in a systematic review.

Preparation and search strategy
This review was registered in the International Prospective Register of Systematic Reviews (PROSPERO; Registration ID: CRD42020172547). Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed throughout the review process [17].
A comprehensive literature search was conducted to identify published articles investigating the effectiveness of training and education interventions to improve death certification (additional file 1: Fig. S1). The search was conducted on the CENTRAL, Ovid MEDLINE and Ovid EMBASE electronic databases, and returned 1060 results, which were exported to EndNote X9 citation manager and deduplicated. The remaining 676 studies were then limited to those published from 1994 onwards (where 1994 is the year ICD-10 was implemented) resulting in 616 studies for screening.

Eligibility criteria and study selection
This study aimed to assess the effectiveness of training interventions in improving the quality of MCCOD compared with either generic academic training in the curricula of current and prospective physicians (in randomised studies) or pre-intervention quality parameters (in non-randomised studies) [8]. Two reviewers (BPK and JS) independently reviewed each study against inclusion/exclusion criteria (additional file 2: Fig. S2). Studies were screened by titles and abstracts using DistillerSR online screening software. Full texts of 44 records were then reviewed, as well as an additional eight records that were identified from the study reference lists. All disputes were resolved by an expert third reviewer (LM). Researchers were blinded to each other's decisions. A total of 21 studies were included for data extraction and final analysis (Fig. 2). One reviewer (BPK) extracted data from the selected studies, with findings then reviewed by a second reviewer (JS). Disputes were resolved independently by the third reviewer (LM).

Risk of bias, meta-analysis and narrative synthesis
Selected studies were categorised under 'randomised' and 'non-randomised', and risk of bias was assessed by two reviewers (BPK and JS) with disputes resolved by the third reviewer (LM). Randomised trials were assessed using the seven domains of the GRADE recommendations, and non-randomised studies were assessed using the seven domains of ROBINS-I criteria [18,19].
All studies were initially assessed for clinical and methodological heterogeneity [20]. Four interventions were eligible to undergo meta-analysis in relation to five outcomes. As these were before-and-after studies without control groups, the 'generic inverse variance method' was used in pooling [21]. Review Manager 5.4 software was used in the meta-analysis and the effect measure was 'risk difference' (i.e. percentage of death certificates with each error). Statistical heterogeneity was assessed using the I-square statistic and chi-square test. When potential outliers were removed in dealing with statistical heterogeneity, sensitivity analyses were performed with and without excluded studies [22]. Robustness of the effect measures was explored further using a sensitivity analysis with both fixed and random effect assumptions [22]. Potential publication bias was explored with the generation of funnel plots.
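To make the pooling step concrete, the generic inverse-variance approach described above, together with the I² statistic used to assess statistical heterogeneity, can be sketched in a few lines. This is an illustrative sketch only: the function name and the input risk differences and standard errors are hypothetical, not values from the reviewed studies or from Review Manager.

```python
import math

def pool_inverse_variance(estimates, std_errors):
    """Fixed-effect generic inverse-variance pooling.

    Each study contributes its effect estimate (here, a risk difference)
    and its standard error; the study weight is 1 / SE^2. Returns the
    pooled estimate, its standard error, and the I-squared statistic.
    """
    weights = [1.0 / se**2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    # Cochran's Q and I-squared quantify between-study heterogeneity
    q = sum(w * (y - pooled)**2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, pooled_se, i_squared

# Hypothetical risk differences from three before-and-after studies:
# error rates reduced by 20, 25 and 30 percentage points.
pooled, pooled_se, i2 = pool_inverse_variance([0.20, 0.25, 0.30],
                                              [0.04, 0.05, 0.06])
```

The pooled risk difference lands between the individual study estimates, weighted towards the most precise study.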
The meta-analysis findings were imported into the ‘GRADEproGDT’ online tool. A ‘summary of findings’ table was prepared, and related narrative components added to the table [23]. The certainty assessments considered study design together with eight criteria: risk of bias, potential publication bias, imprecision, inconsistency, indirectness, magnitude of effect, dose-response gradient and effect of plausible confounders [24]. Studies or subgroups that were not included in the meta-analysis were included in a narrative synthesis of findings.
Seminars, interactive workshops, teaching programmes and training sessions were the most common terms used in introducing the interventions. These ranged in duration from 45 min [13] to 5 h [27], and some interventions included subsequent sessions on additional days [36]. Other descriptions included 'training of trainers' (Philippines, Myanmar, Sri Lanka) [30], a video (UK) [35] and web-based or online training (USA, Fiji) [14,15,31]. In Peru, training was complementary to an online death certification system [32].
For the majority of interventions, a comparison of certification errors pre- and post-intervention was used as the measure of impact, although some studies developed a special knowledge test or used a quality index. These included the Mid-America-Heart Institute (MAHI) Death-Certificate-Scoring System (two interventions) [13,14], knowledge assessment tests developed by the investigators (three interventions) [31,35,37], and quality indexes providing numerical scores based on ICD volume 2 best-practice certification guidelines [15].

Risk of bias assessments
The risk of bias assessments for the randomised studies [13,35,37] are shown in Fig. 3a, and those for the non-randomised studies in Fig. 3b. For all randomised studies, ‘blinding of participants and personnel’ was assessed as high-risk given the difficulty of maintaining blinding for training interventions. All three studies had pre-determined outcomes and were rated low risk for ‘selective reporting’.
All but one of the non-randomised studies were before-after studies without a separate control group. Due to the method of recruitment, none of the studies was characterised as low-risk in relation to confounding and selection bias. However, since the intervention periods were clearly defined, all studies were characterised as low-risk for ‘bias in measurement classification of interventions’.

Meta-analysis
Since the interventions targeting medical students were found to be clinically heterogeneous, potential meta-analyses were restricted to those targeting physicians. In anticipation of substantial methodological heterogeneity, the meta-analysis was planned separately for non-randomised studies. Findings of the studies and subgroups initially entered into the meta-analysis are summarised in additional file 3: Tables S1-S5.
As the initial meta-analyses showed statistical heterogeneity, sensitivity analyses were performed after excluding a potential outlier in each comparison, with both fixed and random effect assumptions (Table 2). Except for 'ill-defined underlying cause of death' [43], the direction and significance of the estimates did not change with these sensitivity analyses.
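The fixed- versus random-effects comparison used in these sensitivity analyses can be illustrated with the DerSimonian-Laird estimator, a standard random-effects approach. The sketch below is hypothetical (illustrative inputs, not the Review Manager implementation): when between-study heterogeneity is small, the random-effects estimate stays close to the fixed-effect one, which is the behaviour the sensitivity analyses probe.

```python
import math

def dl_random_effects(estimates, std_errors):
    """Illustrative DerSimonian-Laird random-effects pooling.

    Starts from fixed-effect inverse-variance weights, estimates the
    between-study variance (tau^2) from Cochran's Q, then re-pools with
    weights 1 / (SE^2 + tau^2).
    """
    w = [1.0 / se**2 for se in std_errors]
    fixed = sum(wi * y for wi, y in zip(w, estimates)) / sum(w)
    q = sum(wi * (y - fixed)**2 for wi, y in zip(w, estimates))
    df = len(estimates) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-study variance estimate
    w_re = [1.0 / (se**2 + tau2) for se in std_errors]
    pooled = sum(wi * y for wi, y in zip(w_re, estimates)) / sum(w_re)
    return pooled, math.sqrt(1.0 / sum(w_re)), tau2

# Same hypothetical risk differences as a fixed-effect analysis would use
pooled, se, tau2 = dl_random_effects([0.20, 0.25, 0.30], [0.04, 0.05, 0.06])
```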
The forest plots of the five outcomes (i.e. after excluding the outliers) included in the meta-analyses are shown in Fig. 4a-e. Three interventions were included in each meta-analysis [30].
The lowest pooled risk difference (15%) was observed for 'multiple causes per line' and 'ill-defined underlying cause of death' whereas the highest was for 'no disease time interval' (33%).
Funnel plots exploring potential publication bias are shown in Fig. 5a-e.
All funnel plots were generally symmetrical. A cautious interpretation of these is included in the "Discussion" section.
In the 'summary of findings' table (Table 3), the certainty assessments of these five outcomes are presented. 'Moderate certainty' was assigned to four outcomes and 'low certainty' to one. Findings of related additional studies have also been summarised as comments in Table 3.

Narrative synthesis of other findings
Findings of randomised studies
In two of the three randomised studies conducted on medical interns, overall scores improved with the intervention (p < 0.05) [13,37]. In the third study, which was conducted on medical students, there was weak evidence for an improvement in the overall performance score (p = 0.046), as well as a 'skill score' (p = 0.066) [35]. In one study, 'correct identification of the COD' improved more in the intervention group (15% to 91%) compared to the control group (16% to 55%), and 'erroneous identification of cardiac deaths' decreased more with the intervention (56% to 6%) compared to the controls (64% to 43%) [13]. In a South African study, three errors ('mechanism only', 'improper sequence' and 'absence of time interval') were significantly reduced in the intervention group only, whereas 'competing causes' and 'abbreviations' were reduced in both groups [37].

Other comparisons
Case-wise comparisons with a set of errors were conducted in two studies [25,27]. Most errors decreased following the intervention. In one non-randomised controlled study, a custom performance score increased post-intervention [31]. One study in England explored 'mentioning consultant's name' and 'completion by a non-involved doctor', both of which improved following the intervention [38]. In a Canadian study, 'increased use of specific diseases as UCOD' and 'being more knowledgeable on not using conditions like 'old age'' improved in the intervention group [33]. 'Competing causes' were less common post-intervention in two Indian studies, with varying strength of evidence (p = 0.001 and p = 0.069) [28,36], but not in a Canadian study (p = 0.81) [34]. 'Mechanism of death followed by a legitimate UCOD' showed non-significant reductions in three studies (45.9% to 36.1%, 13.5% to 7.8% and 16% to 6.6%) [28,34,36]. Other studies that assessed 'presence of at least one-major error' and 'keeping blank lines' in the sequence generally showed a reduction following the intervention [30,34].

Discussion
We conducted a systematic review of the impact of 24 selected interventions to improve the quality of MCCOD. Our meta-analysis suggests that selected training interventions significantly reduced error rates amongst participants, with moderate certainty (four outcomes) and low certainty (one outcome). Similarly, the findings of the narrative synthesis suggest a positive impact on both physicians and medical trainees. These findings highlight the feasibility and importance of strengthening the training of current and prospective physicians in correct MCCOD, which will in turn increase the quality and policy utility of data routinely produced by vital statistics systems in countries.
The systematic approach we followed distinguishes this study from the more common ‘narrative reviews’, whilst the meta-analysis provides pooled and precise estimates of training impact [44]. Rigorous heterogeneity and ‘certainty of evidence’ assessments were performed. To enable a better comparison of the quality of the selected studies, risk of bias assessments were performed using different criteria for randomised and non-randomised studies [18,19]. Given the controversy surrounding conventional direct comparison methods for before-after studies in the literature, owing to these methods' non-independent nature [45], the less controversial ‘generic inverse variance method’ was used in this review.
Irrespective of the study design (i.e. randomised or not) and population (i.e. physicians or medical students), training interventions were shown to reduce diagnostic errors, either in relative terms or through an increase in scaled scores. Risk differences were used as pooled effect measures and typically suggested that certification errors decreased by between 15 and 33 percentage points as a result of the training. Our findings also suggest that refresher training and regular dissemination of MCCOD quality assessment findings can further reduce diagnostic errors. However, due to the inherent limitations of using ‘absolute risk estimates’ like risk differences, we place greater emphasis on the direction of the effect measure and not on its size [46].
The pre-intervention percentages of all error categories selected for meta-analyses were below 51%, except for the category ‘absence of time intervals’, which ranged from 37 to 93% [30]. Based on post-intervention percentages, we therefore conclude that the intervention had a markedly favourable impact. For example, post-intervention errors were reduced to between 6.0 and 20.8% for ‘multiple causes in a single line’ and between 5.8 and 20.3% for ‘improper sequence’. For all interventions reviewed under the meta-analysis, post-training assessments were conducted between 6 months and 2 years after the intervention. Hence, the observed risk differences reflect the impact of the intervention over a longer time period, which is likely to be a more useful measure of the sustainability and effectiveness of training interventions than the more commonly used immediate post-training assessments.
The classification of errors into ‘minor’ or ‘major’ varies between studies. For example, ‘absence of time intervals’ was considered a major error in one study [32], but minor in several others [28,30,34,36]. Some studies, although not all, classified ‘mechanism of death followed by a legitimate UCOD’ as an error [26,28,34,36,40]; furthermore, the scoring method and content of the assessment varied between studies [13,14,31,35,37]. Given this heterogeneity, it is important to focus on the patterns of individual errors and to be clear about how errors are defined before comparing results across studies.
Interestingly, we found greater variation across studies for post-intervention composite error indicators than for specific errors. Across the six interventions considered, post-intervention measures of 'at least one major error' ranged from 3.75 to 44.8% [30,34,40] whilst the fraction of cases with 'at least one error' ranged from 9 to 74.8% [30,38,41]. It is also interesting to note that doctors appeared to benefit less from the interventions compared to interns. This may in part reflect lower priority given by doctors to certification compared to patient management, possibly due to limited understanding of the public policy utility of data derived from individual death certificates.
In some studies, it is possible that a small proportion of post-intervention death certificates were actually completed by doctors who had not undergone training. This would have the effect of diluting the impact estimates of the training interventions. Further, constructing the causal sequence on the death certificate may involve a degree of public health and epidemiological consideration, in addition to clinical reasoning, which may be challenging for some doctors to incorporate into the certification process. This could explain the generally lower improvement scores reported for the causal sequence. Finally, correct certification practices are heavily dependent on the attitudes of doctors towards the process, as well as the level of monitoring, accountability and feedback related to their certification performance.
Most interventions were conducted as interactive workshops that enabled participants to undergo ‘on-the-spot’ training [13, 25-30, 33, 34, 36, 37, 41]. There is a paucity of studies with control groups that compare different interventions. One study concluded that a ‘face-to-face’ intervention was more effective than ‘printed instructions’ [13]. However, another concluded that an added ‘teaching session’ did not improve performance compared to an ‘education handout’, although both strategies were independently effective [13,37]. More research is required to test the relative effectiveness of training methods, such as online interventions, compared to those requiring face-to-face interaction.
Our analysis suggests several cost-effective options for improving the quality of medical certification. To the extent that individual-level training of doctors in correct medical certification is costly, strengthening the curricula in medical schools designed to teach medical students how to correctly certify causes of death, and ensuring that these curricula are universally applied, is likely to be the most economical and sustainable way to improve the quality of medical certification. How and when this training is applied prior to completion of medical training is likely to vary from one context to another and will depend on local requirements for internship training. Training smaller groups of physicians as master trainers in medical certification and subsequently rolling out the training in provincial and district hospitals is likely to be an effective and economical interim measure to improve certification accuracy, as has been demonstrated in a number of countries [30].
In some countries, electronic death certification has been used as a means to standardise and improve the quality of cause of death data [32]. Electronic death certification can be helpful in avoiding certain errors such as illegible handwriting and reporting multiple causes on a single line (by not allowing the certifier to report more than one condition per line) [47]. An electronic certification system can also generate pop-up messages to remind the certifier not to report modes of dying, or symptoms and signs, as the underlying cause. However, electronic certification cannot improve the accuracy of the causal sequence or alleviate the reporting of competing causes, unspecified neoplasms or non-reporting of external causes.
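The kinds of automated checks such systems can apply might be sketched as follows. The rules, names and messages below are illustrative assumptions, not the logic of any specific system described in this review.

```python
# Hypothetical sketch of automated checks in an electronic
# death-certification system: reject multiple conditions on one
# Part I line, and flag a mode of dying as the underlying cause.

MODES_OF_DYING = {"cardiac arrest", "respiratory failure", "old age"}

def check_certificate(part1_lines):
    """Return warnings for a Part I cause-of-death sequence.

    part1_lines: one reported condition per entry, ordered from the
    immediate cause (line a) down to the underlying cause (last line).
    """
    warnings = []
    for i, line in enumerate(part1_lines):
        # More than one condition per line is a common certification
        # error noted in this review; a crude comma/'and' check suffices
        # for illustration.
        if "," in line or " and " in line.lower():
            warnings.append(f"Line {i + 1}: more than one condition reported")
    # The last completed line is taken as the underlying cause; flag
    # modes of dying reported there.
    underlying = part1_lines[-1].strip().lower()
    if underlying in MODES_OF_DYING:
        warnings.append("Underlying cause is a mode of dying, not a disease")
    return warnings
```

For example, a certificate ending in "old age" would be flagged, whereas a well-formed sequence such as myocardial infarction due to coronary atherosclerosis would pass without warnings.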
Furthermore, whilst cause of death data entered in free-text format could improve the quality of medical certification [48], enhancing electronic certification with suggested text options and ‘pick’ lists can lead to systematic errors in medical certification.
This review has several limitations. The studies examined in this review included a diverse range of participants and intervention methods and were conducted in various cultural settings. The duration and modality of the training interventions varied substantially across studies. Only three interventions were randomised, and due to the diversity in non-randomised studies, the potential influence of confounding factors on the quality parameters assessed cannot be excluded. These factors were, however, considered in risk of bias and heterogeneity assessments.
There is also considerable subjectivity in the assessment of some criteria, including ‘legibility’ and ‘incorrect sequence’, that could lead to bias in the assessments. Despite outcomes usually being pre-defined, adherence to risk-lowering strategies, such as ‘blinding the assessor’, was often not described [14, 15, 25, 26, 28-33, 36, 38-42]. Although only three interventions were included, each meta-analysis comprised an adequate sample of at least 1500 observations per group. Even though funnel plots were presented for gross exploration of publication bias, their interpretation is generally recommended only for meta-analyses with more than 10 comparisons. Furthermore, little evidence is available on the appropriateness of funnel plots drawn with risk differences [49].