An integrated approach to processing WHO-2016 verbal autopsy data: the InterVA-5 model

Background: Verbal autopsy is an increasingly important methodology for assigning causes to otherwise uncertified deaths, which amount to around 50% of global mortality and cause much uncertainty for health planning. The World Health Organization sets international standards for the structure of verbal autopsy interviews and for the cause categories that can reasonably be derived from verbal autopsy data. In addition, computer models are needed to process large quantities of verbal autopsy interviews efficiently and to assign causes of death in a standardised manner. Here, we present the InterVA-5 model, developed to align with the WHO-2016 verbal autopsy standard. This is a harmonising model that can process input data in the WHO-2016 format, as well as the earlier WHO-2012 and Tariff-2 formats, to generate standardised cause-specific mortality profiles for diverse contexts. The software development built on the earlier InterVA-4 model, and the expanded knowledge base required for InterVA-5 was informed by expert input and by analyses of a training dataset drawn from the Population Health Metrics Research Consortium verbal autopsy reference dataset.
Results: The new model was evaluated against a test dataset of 6130 cases from the Population Health Metrics Research Consortium and 4009 cases from the Afghanistan National Mortality Survey dataset. Both of these sources contained around three quarters of the input items from the WHO-2016, WHO-2012 and Tariff-2 formats. In the test dataset, cause-specific mortality fractions across all applicable WHO cause categories were compared between causes assigned in participating tertiary hospitals and those assigned by InterVA-5, giving concordance correlation coefficients of 0.92 for children and 0.86 for adults. The InterVA-5 model's capacity to handle different input formats was evaluated in the Afghanistan dataset, with concordance correlation coefficients between the WHO-2016 and WHO-2012 formats of 0.97 for children and 0.96 for adults, and between the WHO-2016 and Tariff-2 formats of 0.92 and 0.87 respectively.
Conclusions: Despite the inherent difficulties of determining “truth” in assigning cause of death, these findings suggest that the InterVA-5 model performs well and succeeds in harmonising across a range of input formats. As more primary data collected under WHO-2016 become available, InterVA-5 is likely to undergo minor re-versioning in the light of practical experience. The model is an important resource for measuring and evaluating cause-specific mortality globally.


Background
The quality and performance of national health information systems vary widely around the world, correlating strongly with economic and infrastructural development. Countries that currently operate efficient and detailed health information systems, based on complete individual data, typically started from nothing 200 to 300 years ago, and began with basic registration of deaths and their causes. If the major causes of death in a population can be characterised, this leads to considerable insights in terms of health priorities and the implementation of appropriate interventions and services. However, the World Health Organization (WHO) estimated that around 50% of 56 million deaths worldwide in 2015 were not registered with information on cause [1]. Therefore, there is a great need for cost-effective, rapid and consistent tools to address this gap in the medium term.
Verbal autopsy (VA) has become an increasingly important approach for documenting deaths that otherwise pass without registration or certification, typically in lower-income countries and particularly in Africa and Asia. The basic principle of VA is that a standardised interview is conducted with family members or others having detailed knowledge of the circumstances, signs and symptoms leading to the death, and the interview data are processed into likely medical causes of death.
Necessary tools for large-scale implementation of VA comprise several essential components, which can be used in conjunction with each other to achieve the over-arching objective of making step-changes in the proportion of deaths worldwide that are appropriately registered by cause. Part of WHO's normative global role is to develop and update standard protocols for VA interviews and cause of death reporting categories, of which the most recent version is the WHO 2016 verbal autopsy instrument (WHO-2016) [2]. This new standard, taken as a given starting point for developing InterVA-5, was primarily intended to achieve harmonisation between earlier WHO standards and the Tariff-2 system [3], which inevitably led to a larger number of interview items.
Additionally, because VA interviews typically involve multiple complex skip patterns (for example where particular interview items relate to specific age/sex groups), there are considerable efficiency gains to be made by handling VA interviews with portable data capture tools, typically implemented on smartphones or tablets. This has previously been shown to be an effective, cost-effective and acceptable approach [4]. However, the InterVA-5 software does not provide data capture functions but is designed to post-process VA interview data gathered by various means.
Although physicians have been widely used to assign individual causes of death using VA data after interviews have been conducted, that approach can be costly, slow and not always consistent between practitioners and contexts [5]. Thus, it has become more common to apply automated computerised models to VA data, which are much cheaper, faster and more consistent. It can be argued that physicians may be able to bring additional nuances to assigning causes to individual cases compared with automated models, particularly in specific research settings. Additionally, careful physician review may play a role in quality control and VA model development. Nevertheless, making any significant future impact on categorising the over 20 million uncertified deaths every year using VA will necessarily depend on using automated methods.
There are currently three families of automated VA models of relevance to the WHO-2016 standard, namely InterVA, InSilicoVA and Tariff [3,6,7]. Initial work on InterVA models dates from 2003 [8] and has passed through a number of iterations since. InSilicoVA built on the foundations of InterVA, aiming to achieve higher precision and measures of uncertainty by, among other things, simultaneously estimating distributions of individual cause-assignment probabilities and cause-specific mortality fractions, and differentiating between negative and unknown responses to VA questions. InSilicoVA is closely related to InterVA, using the same probability base to relate indicators and causes, and thus uses the same interview items. Tariff was first proposed in 2011 [9] and has subsequently been revised and shortened to Tariff-2, as implemented in the SmartVA-Analyze software [10].
Thus the aim of this paper is to present the development and evaluation of InterVA-5, the latest product in the InterVA family, designed to correspond to the WHO-2016 standard [11]. This builds substantially on the InterVA-4 model [6], which corresponded to the WHO 2012 verbal autopsy instrument (WHO-2012), but InterVA-5 also includes significant new concepts as well as updates based on the experience of processing hundreds of thousands of VA cases using InterVA-4. The harmonising concept behind WHO-2016 was carried forward into the design of InterVA-5, which not only corresponds directly to WHO-2016 but also incorporates backward compatibility with WHO-2012 and InterVA-4 [6], as well as coherence with Tariff-2 and the associated SmartVA-Analyze model [10]. Since WHO-2012 and Tariff-2 content are by definition both subsets of WHO-2016, it was feasible to design InterVA-5 as a harmonising model that could handle WHO-2016, WHO-2012 or Tariff-2 datasets, in the interests of achieving wider comparability and consistency in processing existing data.
In addition, InterVA-5 incorporates a novel concept of Circumstances Of Mortality CATegories (COMCAT) as a tool that complements medical causes of death by assigning circumstantial categories to deaths, related to critical limiting factors for care seeking and utilisation processes at and around the time of death, as they occur in any specific health system and social context. For example, for a woman whose medical cause of death is assigned as obstetric haemorrhage, her death might have occurred at home because she had no means or resources to call for help or get to a health facility; another woman with the same medical cause of death might have been inadequately managed during her delivery despite getting to a health facility. The intention of COMCAT is to make distinctions between important circumstances around a death, particularly where these may not be reflected in medical causes. The conceptual basis of COMCAT is described elsewhere [12], and a detailed operational evaluation of its implementation within InterVA-5 will follow as a separate paper.

Implementation
The overall architecture of the InterVA-5 software follows the same general pattern as was implemented in the InterVA-4 software [6], involving the following major components:
1. System initiation: reading the knowledge base and accepting user input parameters
2. Reading the input data file and checking its format
3. Checking data consistency, excluding errors and generating warnings
4. Processing likelihoods for each pregnancy status category, for each case
5. Processing likelihoods for each cause of death category, for each case
6. Processing likelihoods for each COMCAT, for each case
7. Post-processing the output file with pregnancy status, up to three causes and COMCAT for each case
In line with the established principle that InterVA products are made available on an open-source basis, the InterVA-5 software is issued under the GNU General Public License Version 3 (GPL3), and the accompanying knowledge base that drives the system is also freely available. In the same spirit, the specifications for the input and output files are defined in the non-proprietary comma-separated values (CSV) format. The executable software, code and full user documentation are included in the download (see linked GitHub repository) [11].
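Steps 2 and 3 of this architecture (reading the input file, checking its format, and checking data consistency) can be sketched as follows. This is a minimal illustration, not the InterVA-5 code: the column names (`ID`, `i_male`, `i_female`, `i_pregnant`, `i_fever`), the assumed `y`/`n`/blank cell coding and the single consistency rule are all hypothetical stand-ins for the 353 indicators and the checks defined in the InterVA-5 documentation.

```python
import csv
import io

# Hypothetical indicator columns; real InterVA-5 input has an ID plus 353 indicators.
INDICATORS = ["i_male", "i_female", "i_pregnant", "i_fever"]
# Assumed coding in this sketch: "y", "n", or blank for missing.
VALID = {"y", "n", ""}


def read_input(text, indicators=INDICATORS):
    """Step 2: parse CSV text, check the header, and exclude rows with invalid cells."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in ["ID", *indicators] if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"input file is missing columns: {missing}")
    records, errors = [], []
    for row in reader:
        # Normalise each cell; short rows yield None, treated here as blank.
        cells = {c: (row[c] or "").strip().lower() for c in indicators}
        bad = [c for c, v in cells.items() if v not in VALID]
        if bad:
            errors.append((row["ID"], bad))  # excluded from further processing
        else:
            records.append({"ID": row["ID"], **cells})
    return records, errors


def check_consistency(record):
    """Step 3: apply one hypothetical rule, blanking contradictory answers with a warning."""
    warnings = []
    if record["i_male"] == "y" and record["i_pregnant"] == "y":
        warnings.append(f"{record['ID']}: pregnancy reported for a male; set to missing")
        record["i_pregnant"] = ""
    return warnings
```

Keeping format errors (which exclude a record) separate from consistency warnings (which repair a record and flag it) mirrors the distinction between steps 2 and 3 in the list above.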
For historical reasons, the InterVA-5 software was first implemented and compiled as a run-time version in Microsoft Visual FoxPro 9.0, the same programming environment as has been used for earlier versions of InterVA. In order to co-validate the software, a parallel implementation in R was undertaken by a separate software team at another institution, and test outputs from the two separate implementations carefully checked for any discrepancies or errors. The R implementation of InterVA-5 is available via the openVA repository for open-access VA resources as open source software under GPL3 (see linked GitHub repository) [13]. The Windows and R software versions are kept synchronised and produce the same results.
All of the InterVA family of models have used a simple input format of binary questions. Up until InterVA-4, the response of interest was always defined as "yes", even though that sometimes made the wording of questions awkward. Therefore, InterVA-5 uses a data-driven concept of a substantive response for each item, which may be "yes" (e.g. "Did (s)he have a fever?") or "no" (e.g. "Was the placenta completely delivered?"), and the probabilistic modelling updates likelihoods for each cause category on the basis of substantive responses recorded in the VA data.
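The role of substantive responses can be illustrated with a simplified, naive-Bayes-style sketch: each cause starts from a prior probability, is multiplied by the conditional probability of the indicator given that cause only when the recorded answer matches the indicator's substantive response, and the results are normalised. The causes, indicator names and probability values below are invented for illustration; the real knowledge base, the pregnancy-status and COMCAT steps, and InterVA-5's rules for reporting up to three causes and indeterminate results are not reproduced here.

```python
# Invented two-cause knowledge base, for illustration only.
PRIORS = {"haemorrhage": 0.5, "sepsis": 0.5}
# P(substantive response observed | cause) for each hypothetical indicator.
COND_PROB = {
    "fever": {"haemorrhage": 0.2, "sepsis": 0.8},
    "placenta_complete": {"haemorrhage": 0.6, "sepsis": 0.3},
}
# The substantive response may be "y" or "n" depending on the question wording.
SUBSTANTIVE = {"fever": "y", "placenta_complete": "n"}


def cause_likelihoods(record, priors=PRIORS, cond_prob=COND_PROB, substantive=SUBSTANTIVE):
    """Update each cause's prior using only substantive responses, then normalise."""
    scores = dict(priors)
    for item, answer in record.items():
        # Non-substantive, missing or unknown items leave the scores unchanged.
        if substantive.get(item) == answer:
            for cause in scores:
                scores[cause] *= cond_prob[item][cause]
    total = sum(scores.values())
    return {cause: score / total for cause, score in scores.items()}


def top_causes(likelihoods, n=3):
    """Rank causes by likelihood and keep at most n of them."""
    return sorted(likelihoods.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

For a record with `fever = "y"` and `placenta_complete = "n"`, both answers are substantive, so haemorrhage scores 0.5 × 0.2 × 0.6 = 0.06 and sepsis 0.5 × 0.8 × 0.3 = 0.12, which normalise to 1/3 and 2/3. Answering "n" to fever and "y" to the placenta question triggers no updates and leaves the priors unchanged.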
Where WHO VA items are specified in other ways (e.g. as continuous variables for duration of symptoms), InterVA takes pre-determined categories and implements each category as a binary variable. The detailed specification of WHO-2016 [14] also includes a substantial preamble of civil registration parameters, such as civil identity numbers and residential addresses, which are not intended to elucidate cause of death and are not relevant to InterVA-5. Overall, the 305 items in WHO-2016 that are relevant to assigning cause of death correspond to 353 binary indicators in the InterVA-5 data input format, plus an individual identifier field. InterVA-5 input data can therefore be prepared from complete WHO-2016 data records, using a suitable script to convert to the 353 variables plus identifier required in the CSV input file. Alternatively, if there is a prior decision to use InterVA-5 as the interpretation tool, a tablet data collection tool designed directly for the InterVA-5 format can be used for the VA interview and the data transferred directly (for example, the MIVA utilities included in the linked GitHub repository). Since WHO-2012/InterVA-4 and Tariff-2/SmartVA-Analyze are both subsets of the WHO-2016 standard [2], it is also relatively straightforward to run conversion scripts from those data formats to the InterVA-5 input format. Figure 1 shows the combinations of input indicators for the three data formats (353 indicators for InterVA-5, 245 for InterVA-4 and 241 for Tariff-2).
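One part of such a conversion script, turning a continuous WHO item into pre-determined binary categories, might look like the sketch below. The indicator names and day cut-points are hypothetical, loosely modelled on the fever-duration items discussed in this paper; the actual InterVA-5 categories are defined in its input specification.

```python
import math


def duration_to_indicators(days, bins):
    """Map a duration in days to y/n flags, one per half-open [lower, upper) bin.

    A missing duration (None) leaves every derived indicator blank (missing).
    """
    if days is None:
        return {name: "" for name, _, _ in bins}
    return {name: ("y" if lower <= days < upper else "n") for name, lower, upper in bins}


# Hypothetical fever-duration categories, in days.
FEVER_BINS = [
    ("fever_lt_1wk", 0, 7),
    ("fever_1wk_to_2wk", 7, 14),
    ("fever_2wk_plus", 14, math.inf),
]
```

For example, `duration_to_indicators(10, FEVER_BINS)` gives `{"fever_lt_1wk": "n", "fever_1wk_to_2wk": "y", "fever_2wk_plus": "n"}`. Using half-open intervals guarantees that any non-negative duration sets exactly one category to "y", so the derived binary indicators cannot contradict each other.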
The established knowledge base that drove InterVA-4 (version 4.04) was used as the basis for the InterVA-5 knowledge base. As has always been the case with the InterVA family of models, this knowledge base is an accumulated resource, based both on such data sources as are available and on syntheses of expert opinion, as previously described [15]. To move from this InterVA-4 resource to a revised version for InterVA-5, we needed to do four things:
1. Split the haemorrhagic fever category. The only change in mortality cause categories from WHO-2012 to WHO-2016 was the redefinition of WHO-2012 category 01.11 (haemorrhagic fever) into two separate categories: 01.11 (haemorrhagic fever excluding dengue fever) and 01.12 (dengue fever). Revised probabilities for these two categories were reviewed and derived on the basis of available evidence and expert input.
2. Add probabilities for the additional WHO-2016 items. The additional items in WHO-2016 compared with WHO-2012 were almost all contained in the Population Health Metrics Research Consortium (PHMRC) reference dataset [16], which was a longer precursor of the Tariff-2 format. The PHMRC reference dataset was randomly divided into equal training and test datasets for revising and testing the InterVA-5 model. The training dataset was used primarily to inform conditional probability assignments in InterVA-5 for the 89 indicators (Fig. 1) present in the Tariff-2 indicator subset but not in the WHO-2012 indicator subset, while the other half was retained as a test dataset for the new model.
3. Revise the knowledge base for new or changed items. A few new or revised items required revisions to the knowledge base on the basis of expert opinion. Examples include the new WHO-2016 item "Did (s)he receive (or need) antiretroviral therapy (ART)?", and the splitting of the InterVA-4 item "Did (s)he have fever for less than 2 weeks before death?" into "Did the fever last less than a week before death?" and "Did the fever last at least one week, but less than 2 weeks before death?", which was specifically relevant to the additional WHO VA cause category for dengue fever. The complete conditional probability matrix that InterVA-5 uses is included as a spreadsheet in the download of the model [11].
4. Address reported issues with InterVA-4. A few reported issues with InterVA-4, such as implausible over-attribution of WHO cause category 06.01 (acute abdomen) and under-attribution of 04.01 (acute cardiac), an incorrect balance between fresh and macerated stillbirths (11.01 and 11.02) and over-attribution of 01.03 (HIV/AIDS related death) in young children, were addressed within the overall process of revising the knowledge base.
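The random division of the PHMRC reference dataset into equal training and test halves, described above, can be sketched as below. The paper does not report the randomisation procedure, so this is an assumed approach; the function name and the optional `seed` parameter are hypothetical.

```python
import random


def split_in_half(case_ids, seed=None):
    """Randomly divide case IDs into two equal halves (training, test).

    With an odd number of cases the test half receives the extra case.
    Passing a fixed seed makes the split reproducible.
    """
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)  # private generator; global random state untouched
    half = len(ids) // 2
    return ids[:half], ids[half:]
```

Recording the seed alongside the split makes it possible to regenerate exactly the same training and test halves later, which matters when one half is used to tune a knowledge base and the other is reserved for evaluation.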
Social scientists contributed to estimating conditional probabilities for the COMCAT factors, on the same principles as the estimation of probabilities for causes of death. This was an inherently different exercise, in that no absolute reference data existed, nor was there any sense in which COMCAT outputs could be considered fundamentally correct or incorrect. This area will be revisited as experience of its use grows, but the current InterVA-5 knowledge base constitutes a starting point for this novel concept.
Thus, overall, the implementation of InterVA-5 constitutes a cause of death model which is fully compatible with the WHO-2016 instrument, which can also process WHO-2012 and Tariff-2 datasets, and which can assign circumstances of mortality categories (COMCAT) alongside medical causes of death.

Results
Testing the new InterVA-5 software has been an important part of the development process. As with any software update, evaluating continuity with the previous version is important, as well as the overall performance of the new version. Evaluating assignment of cause of death in any context is notoriously difficult because of a lack of any absolute comparator [17]. InterVA-4 has previously been extensively compared with the same PHMRC dataset as used here [17] and with physician-assigned causes of death [18], co-validated with Global Burden of Disease mortality estimates [19] and deployed in large-scale mortality analyses [20]. For evaluating agreement between different approaches to modelling the same set of VA cases, the concordance correlation coefficient (CCC), as implemented in the Stata concord command, is a useful measure of equivalence because, unlike a simple correlation, it also penalises systematic differences between the two sets of values.
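For reference, Lin's concordance correlation coefficient between paired values x and y (here, CSMFs for the same cause categories from two sources) is 2 s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2). A minimal Python sketch follows. It uses 1/n moments as in Lin's original estimator and does not compute confidence intervals, so exact numerical agreement with the Stata concord command is an assumption that has not been checked here.

```python
from statistics import fmean


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient for paired sequences x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired sequences of length >= 2")
    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)
    # Population (1/n) variances and covariance.
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # The squared difference of means penalises systematic bias between the sources.
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)
```

Identical sequences give a CCC of 1, whereas shifting every value by a constant (for example `lin_ccc([1, 2, 3], [2, 3, 4])`, which equals 4/7, about 0.571) lowers the CCC even though the Pearson correlation would still be exactly 1. That sensitivity to offsets is why the CCC suits comparisons of equivalence rather than mere association.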
Since the WHO-2016 instrument is relatively new, there are not yet any extensive VA data sources specifically collected under that protocol available for evaluation. However, some earlier VA archives do contain data including a substantial proportion of WHO-2016 items, which therefore for the present have to suffice as material for evaluating InterVA-5. There are two major objectives: firstly to compare the InterVA-5 cause of death assignments with an established, best available, reference source (even though no perfect reference source exists) and secondly to compare the performance of InterVA-5 when processing data aligned with WHO-2016, WHO-2012 and Tariff-2 input formats.
Firstly, the 6130 VA records in the PHMRC test dataset were used, which covered 248/353 (70.3%) of the InterVA-5 input indicators. The strengths of the PHMRC dataset are that it includes causes of death attributed by tertiary hospitals (though not all WHO-2016 cause of death categories are represented), and that its verbal autopsy data were not used in assigning the hospital causes of death. The PHMRC causes did not differentiate between fresh and macerated stillbirths, nor between different haemorrhagic fevers, so these were amalgamated into single stillbirth and haemorrhagic fever categories for this comparison. Because the hospital and VA processes leading to the attribution of indeterminate cause to some deaths were very different, indeterminate outcomes (1.4% for hospital and 11.1% for VA) were excluded from this comparison by redistributing them proportionally over all other causes. Cause-specific mortality fractions (CSMFs) from the hospital causes and from InterVA-5 are shown in Table 1 for the 5-plus and under-5 age groups, by WHO-2016 cause category and broad group. InterVA-5 CSMFs were derived by summing the individually assigned likelihoods for each cause and dividing by the total number of deaths. Figure 2 shows the agreement between the two sources for deaths under 5 years and those 5 years and older, with different colours corresponding to the broad cause groups shown in Table 1. The points near the axes reflect rare causes that were either unrepresented or not directly comparable between the two sources, such as childhood cancers, amounting to 3.1% of the total deaths under 5 years and 1.0% of those 5 years and older. Nevertheless, we retained these points in the overall comparisons so as to take a conservative approach to assessing concordance. The CCC was 0.922 (95% CI 0.871 to 0.974) for the younger age group and 0.858 (95% CI 0.786 to 0.930) for the older age group.
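The two CSMF calculations described above, summing per-case likelihoods over all deaths and then excluding the indeterminate category by proportional redistribution, can be sketched as follows. The cause labels (including `indeterminate`) and the per-case likelihood values are hypothetical; the sketch assumes each case's output has already been reduced to a dictionary of cause likelihoods.

```python
def csmf_from_likelihoods(cases):
    """Sum each cause's individually assigned likelihoods and divide by total deaths."""
    totals = {}
    for likelihoods in cases:
        for cause, p in likelihoods.items():
            totals[cause] = totals.get(cause, 0.0) + p
    return {cause: t / len(cases) for cause, t in totals.items()}


def redistribute(fractions, excluded="indeterminate"):
    """Drop one category and rescale the others proportionally so they sum to 1."""
    kept = {c: f for c, f in fractions.items() if c != excluded}
    total = sum(kept.values())
    return {c: f / total for c, f in kept.items()}
```

For two hypothetical deaths with likelihoods `{"malaria": 0.6, "pneumonia": 0.2, "indeterminate": 0.2}` and `{"malaria": 0.5, "indeterminate": 0.5}`, the CSMFs are 0.55, 0.10 and 0.35. Redistributing the indeterminate share gives malaria 11/13 (about 0.846) and pneumonia 2/13 (about 0.154), so the relative sizes of the remaining causes are preserved.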
For the second objective of testing the performance of the new InterVA-5 software when confronted by different subsets of input indicators, the Afghanistan 2010 national mortality survey dataset [21] was used, being a national all-age population-based dataset that was collected independently of any of the WHO-2016, WHO-2012 or Tariff-2 protocols, but included 257/353 (72.8%) of the InterVA-5 items. When reduced to the InterVA-4 and Tariff-2 subsets of the InterVA-5 items, 202/245 (82.4%) and 188/241 (78.0%) respectively of those subsets were available, as shown in Fig. 1. Table 2 shows the InterVA-5 outputs for the three datasets based on the WHO-2016, WHO-2012 and Tariff-2 standards for the under-5 and 5-plus age groups, by WHO-2016 cause categories and broad groups. Figure 3 shows the agreement between the outputs using the InterVA-5 and InterVA-4 datasets, and Fig. 4 the InterVA-5 and Tariff-2 datasets. CCCs for InterVA-4 were 0.968 (95% CI 0.947 to 0.988) for the under-5 age group, and 0.961 (95% CI 0.940 to 0.983) for the 5-plus age group; for Tariff-2, the CCCs were 0.918 (95% CI 0.869 to 0.968) for the under-5 age group and 0.871 (95% CI 0.806 to 0.936) for the 5-plus age group. Points near the axes in these comparisons reflect very rare causes that were barely measurable from this dataset.
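Reducing a dataset to the indicator subset of an older format and quantifying coverage, as reported above for the Afghanistan data, could be sketched as below. The indicator names are synthetic placeholders, and blanking out-of-subset items is an assumed way of simulating the reduced WHO-2012 and Tariff-2 inputs rather than a documented InterVA-5 procedure.

```python
def coverage(available, subset):
    """Return (present, total, percentage) of a format's indicators found in a dataset."""
    subset = set(subset)
    present = len(subset & set(available))
    return present, len(subset), round(100 * present / len(subset), 1)


def restrict_to_subset(record, subset, id_field="ID"):
    """Blank out indicators outside a format's subset, keeping the identifier."""
    return {k: (v if k == id_field or k in subset else "") for k, v in record.items()}
```

With 245 placeholder indicators of which 202 are available, `coverage` returns `(202, 245, 82.4)`; the same arithmetic reproduces the 78.0% (188/241) and 72.8% (257/353) figures quoted above.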
Finally, as with any software update, it is important to demonstrate version continuity together with the effects of intentional changes as part of the update process. Figure 5 shows the Afghanistan dataset as processed by InterVA-4 (version 4.04), compared with the new InterVA-5 software processing the InterVA-4 subset of inputs. Excluding the intentional changes (shown as diamond-shaped markers in Fig. 5), CCC was 0.909 (95% CI 0.860 to 0.958).

Discussion
The development of the InterVA-5 model follows our established practice of providing analytical models for verbal autopsy data that correspond to international WHO VA standards. WHO-2016 was specifically developed as a harmonisation of various existing VA standards, and accordingly InterVA-5 was developed to be, as far as technically possible, a unifying and updated model capable of handling a range of input formats corresponding to those standards. One might expect InterVA-5 to perform most robustly with data meeting the full WHO-2016 specification, since the maximum amount of information is then available. However, it is important, as demonstrated here, that it also performs reasonably comparably with the WHO-2012 and Tariff-2 input formats, even though those do not fully meet current standards. Tracking mortality patterns consistently over time and place is critical for evaluating health and development policy, and therefore the ability to process earlier VA data collected under previous standards is strategically important.
The absolute accuracy of VA in general, and of specific models for assigning cause of death from VA data in particular, raises difficult questions which have been extensively explored in various settings. In many ways, the performance of VA methods has received more scientific scrutiny than the sometimes serendipitous process of individual physicians' certification of deaths. No process for cause of death attribution leads to absolute "truth" for every case, and the lack of precise comparators often makes assessments of various VA methods contentious. Here we have made use of the interesting, though by no means perfect, PHMRC reference dataset [16]. This at least provides cause of death as clinically assigned by the tertiary facilities in which the deaths occurred, backed up by laboratory and diagnostic evidence.
Nevertheless, one can find cases where correspondence between the clinical cause of death and responses to questions in the VA interview was not obviously congruent. However, as evident in Fig. 2, the overall similar patterns of mortality between InterVA-5 and the PHMRC data, albeit in a tertiary hospital population unrepresentative of more usual VA applications, are an encouraging starting point. The comparison of broad cause categories presented at the end of Table 1 also suggests that at an overall level there are not major differences that would give rise to public health concerns.
Earlier versions of InterVA models have been used extensively and have been seen to deliver largely plausible findings over a wide range of settings and mortality patterns [20]. Nevertheless, as with any modelling exercise, there are always possibilities for improvement, with the caveat that a so-called improvement in one respect must not lead to deterioration in other respects. The detailed evaluations reported here, comparing the new InterVA-5 model with its antecedents on the Afghan VA dataset, are therefore very important. Although it may be difficult to compare performance on very rare causes of death, Figs. 3, 4 and 5 clearly demonstrate that on a population basis there is strong overall consistency between InterVA-5 and earlier models and standards. Demonstrating this continuity between models is important for long-term studies of population mortality.
As yet, very few primary data have been collected under the WHO-2016 standard, which limits the field applications of InterVA-5 to date, and hence the source material for evaluating InterVA-5. InterVA-4 underwent a series of minor modifications in response to feedback, issued as new versions of the public software over the past 5 years, and it is anticipated that InterVA-5 will follow a similar software life cycle as experience of its use extends. We therefore particularly welcome feedback from InterVA-5 users.

Conclusions
At present, InterVA-5 and the related InSilicoVA model are the only tools for analysing VA data that are fully compatible with the WHO-2016 standard (in terms of VA interview input items and deriving all of the WHO VA cause of death categories as outputs). The InterVA-5 model brings the additional advantage of being able to handle data from the earlier WHO-2012 and Tariff-2 standards reasonably well, thus bringing a helpful degree of harmonisation across the interpretation of various VA data formats. This harmonisation is important for monitoring long-term trends over periods when different VA standards have been used. As with any VA model, the usefulness of the outputs depends on good quality source material from VA interviews, careful preparation of input data, and appropriate processing and interpretation of outputs. It is likely that widespread use of the model will lead to future minor refinements. The free availability of InterVA-5 means that large quantities of VA data, even into the millions of cases which could be generated in national civil registration processes, can now be processed cheaply, feasibly and consistently. Current measurement needs for the United Nations' Sustainable Development Goals, as well as monitoring and evaluating progress towards WHO's visions for Universal Health Coverage and non-communicable disease control, make standardised cause-specific mortality measurement techniques, as implemented in InterVA-5, an essential part of the global toolkit [22]. In addition, InterVA-5 is a tool that can readily be used by national or regional health services to track local mortality patterns.

Availability and requirements
Project name: InterVA-5
Project home page: www.interva.net
Operating system(s): runs in a DOS window on a personal computer; platform independent
Programming language: FoxPro (compiled into a runtime format)
Other requirements: runs directly from the folder into which it is downloaded
Licence: GNU General Public Licence Version 3
Any restrictions to use by non-academics: none