Improving performance of the Tariff Method for assigning causes of death to verbal autopsies

Background: Reliable data on the distribution of causes of death (COD) in a population are fundamental to good public health practice. In the absence of comprehensive medical certification of deaths, the only feasible way to collect essential mortality data is verbal autopsy (VA). The Tariff Method was developed by the Population Health Metrics Research Consortium (PHMRC) to ascertain COD from VA information. Given its potential for improving information about COD, there is interest in refining the method. We describe the further development of the Tariff Method.
Methods: This study uses data from the PHMRC study and from a study funded by the National Health and Medical Research Council (NHMRC) of Australia. Gold standard clinical diagnostic criteria for hospital deaths were specified for a target cause list. VAs were collected from families using the PHMRC verbal autopsy instrument, including health care experience (HCE). The original Tariff Method (Tariff 1.0) was trained using the validated PHMRC database, for which VAs had been collected for deaths with hospital records fulfilling the gold standard criteria (validated VAs). In this study, the performance of Tariff 1.0 was tested using VAs from household surveys (community VAs) collected for the PHMRC and NHMRC studies. We then corrected the model to account for the previously observed biases, and Tariff 2.0 was developed. The performance of Tariff 2.0 was measured at individual and population levels using the validated PHMRC database.
Results: For median chance-corrected concordance (CCC) and mean cause-specific mortality fraction (CSMF) accuracy, and for each of three modules with and without HCE, Tariff 2.0 performs significantly better than Tariff 1.0, especially in children and neonates. Improvement in CSMF accuracy with HCE was 2.5 %, 7.4 %, and 14.9 % for adults, children, and neonates, respectively, and for median CCC with HCE it was 6.0 %, 13.5 %, and 21.2 %, respectively.
Similar levels of improvement are seen in analyses without HCE.
Conclusions: Tariff 2.0 addresses the main shortcomings of the application of the Tariff Method to the analysis of VA data from community settings. It estimates COD from VAs with better performance at the individual and population levels than the previous version of this method, and it is publicly available for use.
Electronic supplementary material: The online version of this article (doi:10.1186/s12916-015-0527-9) contains supplementary material, which is available to authorized users.


Keywords: Verbal autopsy questionnaire, Mortality surveillance, Causes of death

Background
Reliable data on the distribution of causes of death (COD) in a population are fundamental to good public health practice [1]. Ideally, COD data are based on accurate medical certification and registration of all deaths [2]. However, many, if not most, resource-poor countries lack adequate systems for the collection, tabulation, and dissemination of vital statistics on causes of death in their populations [3]. In the absence of comprehensive medical certification of deaths, the only feasible way to collect essential mortality data is verbal autopsy (VA), whereby relatives of the deceased respond to a questionnaire about the medical history of the decedent and of the terminal illness (the illness that led directly to death).
Methods for assigning the COD to VAs can be separated into two broad groups: those based on the expert judgment of physicians and empirical, data-driven methods. The first group includes physician-coded VAs (PCVA) [4] and InterVA, a computer program based on expert judgment [5]. The second group uses a data-driven approach, examining patterns in actual responses to verbal autopsies to ascertain the cause of death. This group includes methods such as King-Lu [6], the Tariff Method [7], and Random Forest [8]. The last two were developed as part of the Population Health Metrics Research Consortium (PHMRC) gold standard verbal autopsy validation study [9].
With most analytic methods it is not possible to systematically scrutinize the relationships between responses to individual items in the VA questionnaire and the different causes of death. The Tariff Method, on the other hand, is a simple additive algorithm based on a score, or tariff, for each question item-COD pair, and it performs as well as or better than other analytic methods when validated against "gold standard" deaths for which the cause has been reliably established [7].
The PHMRC study [9] selected hospital deaths that met gold standard clinical criteria and compared diagnoses from the decedents' medical records with VAs obtained from the families of the deceased. Necessarily, all decedents in the PHMRC database had had contact with health services that had the appropriate facilities and were otherwise capable of making reliable diagnoses. The PHMRC study assumes that the attributes of specific diseases leading to death in a hospital are sufficiently similar to those of the same diseases leading to death in the community to allow conclusions to be drawn about causes of death in the community. The principal potential application of VA is in community or population studies, where decedents can be expected to have had a range of experiences with health services. It is possible that exposure to the health care system may have influenced either the course of the illness itself or the responses to items in the questionnaire. The Tariff Method [7] addressed this limitation by classifying responses to questions in the PHMRC verbal autopsy instrument (VAI) according to whether they did or did not depend on the contact that relatives of the deceased person may have had with health services, namely the health care experience (HCE), so that performance could be reported as being with or without HCE.
Tariff Method was included in a recently published study of the comparative performance of six different methods for assigning COD to VAs [10]. Although the performance of the method in this comparative study was far superior to other diagnostic procedures commonly used, questions have been raised about the external validity of empirical methods developed from the PHMRC database [11]. Here we describe in detail the development of the updated Tariff Method we refer to as Tariff 2.0 and address issues of external validity. In our view, the processes of development and validation of empirical methods have been poorly understood by certain commentators and it is important that this be corrected if the full potential of well-performing automated VA diagnostic methods for reducing ignorance about causes of death is to be realized.
Steps in the development of Tariff 2.0 were: 1) testing Tariff 1.0 by using it to assign CODs to VAs collected in household surveys (community VAs); 2) revision and retraining of Tariff 1.0 using validated VAs (VAs that had been collected for deaths with hospital records fulfilling the gold standard criteria from the PHMRC gold standard database); 3) retesting Tariff 1.0 using community VAs and further revising the Tariff 1.0 to create Tariff 2.0; and 4) assessing the performance of Tariff 2.0 using the validation database at individual and population levels using as metrics chance-corrected concordance (CCC) and cause-specific mortality fraction (CSMF) accuracy.

PHMRC gold standard validation study database
The general methodology of the PHMRC study has been described in detail elsewhere [9] and is summarized here for convenience. VAs were collected from six sites in four countries: Andhra Pradesh and Uttar Pradesh in India; Bohol in the Philippines; Mexico City in Mexico; and Dar es Salaam and Pemba Island in Tanzania. Gold standard clinical diagnostic criteria for hospital deaths were specified for a target cause list of 53 adult, 27 child, and 13 neonatal causes, including stillbirths. Deaths with hospital records fulfilling the gold standard criteria were identified in each of the sites. Families were then interviewed about the events leading to each of these deaths using the PHMRC VAI [9]. Interviewers were blinded to the COD assigned in the hospital. The PHMRC database contains 12,501 verbal autopsies with gold standard diagnoses (7,846 adults, 2,064 children, 1,586 neonates, and 1,005 stillbirths). All data collection procedures were approved by the Institutional Review Boards of the University of Washington, Seattle, WA, USA; the School of Public Health, University of Queensland; the George Institute for Global Health, Hyderabad, India; the National Institute of Public Health, Mexico; the Research Institute for Tropical Medicine, Alabang, Metro Manila, Philippines; Muhimbili University, Tanzania; the Public Health Laboratory Ivo de Carneri, Tanzania; and CSM Medical University, India. All information on VAs was collected after obtaining signed consent from the informants.
The target cause list was developed from World Health Organization (WHO) estimates of the leading CODs in developing countries in 2004 [12]. COD categories were based on the International Classification of Diseases (ICD) and are mutually exclusive and collectively exhaustive. The original cause list for the validation study was 53 for adults, 27 for children, and 13 for neonates (plus stillbirths). The number of causes in the target list was reduced: firstly, because there were insufficient cases for certain causes and, secondly, because analytic methods were unable to discriminate between causes. The first reduction created an analysis cause list, which was used to test diagnostic algorithms, and the second, a reporting cause list containing 34 adult, 21 child, and 11 neonatal causes (including stillbirths) for output from Tariff 1.0 [9]. The number of neonatal causes was further reduced from 11 to 6 for the updated version of Tariff [10] because of the use of combinations of causes that did not map to the ICD. In the further development of Tariff 2.0 it was realized that neonatal deaths with sepsis had been wrongly recoded in the reduction from 11 to 6 causes. The result was a change in the number of neonatal deaths by COD. Because preterm deliveries with both sepsis and birth asphyxia could not be recoded to a list with a single COD, 34 deaths were dropped from the test/training analyses. The COD lists are shown in Additional file 1. Reductions of the cause list preceded any development of the item-reduced instrument.
Changes to the categorization of neonatal causes and the further accumulation of community deaths mean that the neonatal performance metrics reported in this paper differ in detail from those in the comparison of methods for cause assignment published in 2014. None of these changes is substantial, however, and none affects the conclusions we draw from this analysis.
The PHMRC VAI includes both closed-ended questions and an open-ended narrative. Question items were based on the closed-ended questions and cover: 1) symptoms of the terminal illness; 2) diagnoses of chronic illnesses obtained from health service providers (as reported by the respondent as communicated to them by the health service provider, not obtained through record linkage); 3) risk behaviors (tobacco and alcohol); and 4) details of any interactions with health services. Text items were created from the open-ended narrative using a text mining procedure that identifies key words and groups words with the same or similar meanings. Performance was reported as being 1) with HCE and 2) without HCE. The former was based on analysis of all question and text items, whereas the latter was based on an analysis of question items on symptoms and risk behaviors only.

Community VA data
The development of Tariff 1.0 had been based on the PHMRC validation database and thus all deaths had occurred in hospital. Our initial aim in developing Tariff 2.0 was to review the cause distributions of deaths in community VAs using Tariff 1.0 and to see whether these distributions were plausible. This review was based on the examination of 12,528 VAs, not linked to gold standard hospital data, collected from community samples using the PHMRC VAI. VAs of 3,067 deaths, occurring within 5 years of interview, were collected from household surveys in Mexico City in Mexico, Andhra Pradesh in India, Pemba in Tanzania, and Bohol in the Philippines, as part of the PHMRC study [13]. A further 9,461 VAs were collected in Chandpur and Comilla Districts in Bangladesh, in Central and Eastern Highlands Provinces in Papua New Guinea, and in Bohol Province in the Philippines, as part of a study funded by the National Health and Medical Research Council (NHMRC) of Australia. The age-site distribution of these deaths is shown in Table 2. The performance of Tariff 2.0 could only be compared with that of Tariff 1.0 by using the PHMRC gold standard database.

Tariff method
The premise of the Tariff Method is that individual question and text items are consistently associated with particular causes of death. In the Tariff Method, the association between each item-cause pair is quantified. The first step in quantification is to develop a matrix of endorsement rates for item-cause pairs based on the analysis cause list. An item in the VAI is said to have been endorsed if the response was "yes". The tariff itself reflects the relationship between the endorsement rate for a particular item (i) and a particular cause of death (j) and the distribution of the endorsement rate for item i among all other causes in the analysis cause list:

Tariff_ij = (x_ij − median_j(x_ij)) / IQR_j(x_ij)

where x_ij is the endorsement rate of item i for cause j, and the median and interquartile range (IQR) are taken across all causes for item i. To assign a cause to a death, we compute summed tariff scores for each cause in the analysis cause list based on the distribution of endorsed items for that death:

Score_kj = Σ_{r ∈ R_j} Tariff_rj × x_kr

where k is the given decedent, i is the item, and j is the cause of death; x_ki is the response for decedent k on item i, with a value of 1 for a positive response and 0 for a negative response; and R_j is the set of 40 items with the highest absolute tariffs for cause j. Tariff scores for a given decedent are computed for every possible COD.
Therefore, the tariff score of an item for a given cause will depend on its endorsement rate, and some causes will have inherently high tariffs. For example, the item "Decedent suffered poisoning" has a strong association with a few causes of death (poisoning and suicide) and carries high tariffs for those causes. On the other hand, the item "Decedent had a rash" is associated with many different causes of death and carries low tariffs for the causes it is associated with.
A tariff score is calculated for all causes for a given decedent. The most obvious way to assign cause of death would be to select the one that carries the highest (summed) tariff score. However, some causes carry inherently higher tariffs than do others. Therefore to make the tariff scores for different causes comparable, all deaths in the training dataset were ranked by their tariff scores from highest to lowest, and the tariff score for a decedent was compared with these ranks. The cause with the highest ranked tariff score was assigned to the decedent; this makes use of all the information in the training dataset to normalize tariff scores.
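The tariff calculation and score computation described above can be sketched as follows. This is a minimal illustration, not the published implementation; the endorsement-rate matrix layout, the IQR guard for degenerate items, and the function names are our own assumptions.

```python
import numpy as np

def compute_tariffs(endorsement, top_n=40):
    """Tariff for item i and cause j: the deviation of the item's endorsement
    rate for that cause from the item's median endorsement rate across all
    causes, scaled by the interquartile range (IQR) across causes.

    endorsement: array of shape (n_items, n_causes) of endorsement rates.
    """
    med = np.median(endorsement, axis=1, keepdims=True)   # per-item median
    q75, q25 = np.percentile(endorsement, [75, 25], axis=1)
    iqr = (q75 - q25)[:, None]
    iqr[iqr == 0] = 1.0          # guard: items endorsed equally for every cause
    tariffs = (endorsement - med) / iqr
    # For each cause, keep only the top_n items with the highest absolute tariff.
    kept = np.zeros_like(tariffs, dtype=bool)
    for j in range(tariffs.shape[1]):
        kept[np.argsort(-np.abs(tariffs[:, j]))[:top_n], j] = True
    return np.where(kept, tariffs, 0.0)

def score_death(responses, tariffs):
    """Summed tariff score per cause for one death (responses: 0/1 per item)."""
    return responses @ tariffs
```

In the full method, a decedent's score for each cause is then compared against the ranked scores of the training deaths for that cause, rather than causes being assigned from raw scores directly.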
The Tariff Method (both Tariff 1.0 and Tariff 2.0) is trained using the validated PHMRC database for which VAs were collected for deaths with hospital records fulfilling the gold standard criteria (validated VAs). During the development of both Tariff 1.0 and 2.0, however, the PHMRC gold standard dataset was repeatedly divided into a training dataset (from which methods were developed) and a testing dataset (used to test the performance of the methods). Tariff 2.0 follows the same process as described above in assigning CODs, but improves on Tariff 1.0 in four important ways.
1. Significance testing for each tariff

One limitation of Tariff 1.0 is that items that are strongly associated with a small number of deaths in the PHMRC database can drive COD assignments.
To address this issue, we created 500 bootstrapped samples of the dataset by resampling deaths with replacement within each cause, up to the original sample size. We then used the 500 samples to generate a 95 % uncertainty interval (UI) around each tariff estimate and removed tariffs whose uncertainty intervals included zero.
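The bootstrap significance test can be sketched as follows. This is illustrative only; the resampling-within-cause scheme and all function names are assumptions based on the description above.

```python
import numpy as np

rng = np.random.default_rng(0)

def tariff_matrix(rates):
    """rates: (n_items, n_causes) endorsement rates -> tariff matrix."""
    med = np.median(rates, axis=1, keepdims=True)
    q75, q25 = np.percentile(rates, [75, 25], axis=1)
    iqr = np.where(q75 - q25 == 0, 1.0, q75 - q25)[:, None]
    return (rates - med) / iqr

def bootstrap_significant_tariffs(responses, causes, n_boot=500, alpha=0.05):
    """responses: (n_deaths, n_items) 0/1 matrix; causes: cause label per death.
    Resample deaths with replacement within each cause, recompute tariffs in
    each bootstrap sample, and zero out any tariff whose (1 - alpha)
    uncertainty interval includes zero."""
    cause_ids = np.unique(causes)

    def endorsement(idx_sets):
        return np.column_stack([responses[idx].mean(axis=0) for idx in idx_sets])

    boots = []
    for _ in range(n_boot):
        samples = [rng.choice(np.where(causes == c)[0],
                              size=(causes == c).sum(), replace=True)
                   for c in cause_ids]
        boots.append(tariff_matrix(endorsement(samples)))
    lo, hi = np.percentile(np.stack(boots),
                           [100 * alpha / 2, 100 * (1 - alpha / 2)], axis=0)
    point = tariff_matrix(endorsement([np.where(causes == c)[0]
                                       for c in cause_ids]))
    return np.where((lo > 0) | (hi < 0), point, 0.0)  # keep UIs excluding zero
```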

Standardization of text mining
Standardization of text mining is an iterative process that involves making changes to data preparation and empirically testing how these changes affect model performance. For text analysis, all text was translated to English before data mining began. We first identified key words that appeared at least 50 times within the open-ended narratives using the Text Mining package in R (version 2.14.0) [14]. Second, we grouped words to form items by stemming (e.g. "injuries" and "injured" formed an item, "injuri") and also grouped words with similar meanings (e.g. "fire" and "burn"). We calculated tariffs for each of these text items. A physician then reviewed text items with statistically significant tariffs for clinical plausibility. These belonged, broadly, to three groups: obvious symptom items; items that appeared to be based on HCE; and other items, often with high tariffs, but with no obvious biological association. For example, the text item "road" had a tariff of 6.5 for road traffic accidents but also had a tariff of 3.0 or more for a number of cancers. The spurious association between "road" and "cancer" arose because respondents mentioned the Ocean Road Cancer Institute in Dar es Salaam. Tariffs based on text items that were clinically implausible were removed from the analysis.
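A minimal sketch of the text-item extraction follows. The study used the Text Mining package in R; this Python illustration, including the tiny synonym/stem map, is hypothetical and shows only the tokenize-group-threshold pattern.

```python
import re
from collections import Counter

# Illustrative synonym/stem groupings (hypothetical, not the study's actual map).
GROUPS = {"injuries": "injuri", "injured": "injuri", "injury": "injuri",
          "fire": "burn", "burns": "burn", "burned": "burn"}

def extract_text_items(narratives, min_count=50):
    """Tokenize free-text narratives, map words to grouped items, and keep
    items appearing in at least min_count narratives overall. Returns the
    per-death item endorsements and the retained item vocabulary."""
    counts = Counter()
    per_death = []
    for text in narratives:
        words = re.findall(r"[a-z]+", text.lower())
        items = {GROUPS.get(w, w) for w in words}   # one endorsement per death
        per_death.append(items)
        counts.update(items)
    kept = {item for item, n in counts.items() if n >= min_count}
    return [sorted(items & kept) for items in per_death], sorted(kept)
```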

Biologically and epidemiologically implausible cause assignments
We examined cause assignments at both individual and population levels. We disallowed biologically impossible cause assignments such as males with cervical cancer as well as highly unlikely assignments such as males with breast cancer. At the population level, we censored unlikely assignments such as malaria deaths in non-endemic regions. Additional file 2 lists the full set of exclusion criteria.
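A rule-based censor of implausible assignments might look like the following sketch. The restriction-table entries are illustrative only, drawn from the examples above; the study's full set of exclusion criteria is in Additional file 2.

```python
# Hypothetical restriction table: cause -> predicate on decedent attributes.
RESTRICTIONS = {
    "Cervical cancer": lambda d: d["sex"] == "female",
    "Breast cancer":   lambda d: d["sex"] == "female",
    "Malaria":         lambda d: d["region_malaria_endemic"],
}

def censor_causes(decedent, ranked_causes):
    """Drop biologically or epidemiologically implausible causes from a
    decedent's ranked cause list before final assignment."""
    return [c for c in ranked_causes
            if RESTRICTIONS.get(c, lambda d: True)(decedent)]
```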
We made very few changes to question items. We excluded a number of items, particularly those associated with health-seeking behavior, that had implausible associations with COD as a consequence of the original dataset being hospital-based. For example, in the PHMRC validation dataset some gold standard deaths were obtained from police reports and coronial inquiries rather than from hospitals. As a result, when analyzing datasets of community deaths, an implausibly high percentage of deaths was attributed to drowning simply because the decedents had not been taken to hospital.

Indeterminate cause of death
Gold standard deaths were selected because they met predetermined criteria. It is probable that more information will be available about such cases than will be available for home deaths or, indeed, for other hospital deaths. An extreme example is that of a 90-year-old woman whose relatives endorsed only a single question item: "Had her periods stopped naturally because of menopause?" Tariff 1.0 would assign to such a case causes that had few symptoms or low average tariff scores. Because endorsement for gold standard drowning deaths was driven largely by the single item "Did the decedent suffer from drowning?", 29 of the 40 items for drowning carried negative tariffs. Cases with little information, i.e., with multiple negative responses to question items, were thus attracted to drowning as a COD, and the woman was initially assigned drowning as the COD.
To address this problem, using the training dataset, which was sampled with replacement to create a uniform cause distribution, we developed a method for identifying deaths where there was insufficient information from the VA interview to assign a COD and coded such deaths as indeterminate. At the ranking stage of analysis we stipulated that tariff scores for a given decedent needed to be above both cause-specific and absolute thresholds. If a tariff score was below either the cause-specific or the absolute threshold, that cause was disallowed for that decedent. If all causes were disallowed, the decedent was classified as indeterminate. We reallocated indeterminate deaths at the population level so that the sum of the CSMFs from all causes of death was 1.0. We did so based on 1) a Tariff model performance weight, equal to the probability of a death from a given cause being assigned as indeterminate by the Tariff Method, averaged with 2) a Global Burden of Disease (GBD) weight, equal to the estimated distribution of cause-specific mortality by age and sex for a country in the GBD study 2010 [15]. The combined weight is used to calculate the fraction of an indeterminate death that is allocated to each COD; weights sum to one. To illustrate this process, Tariff and GBD weights for a 45-year-old male in the Philippines are shown in Additional file 3.
In this example, for cirrhosis the average of the GBD weight (0.054) and the Tariff weight (0.026) is used to generate an overall weight for cirrhosis (0.039). If a 45-year-old male decedent from the Philippines were classified as indeterminate, 0.039 would be added to the number of cirrhosis deaths when generating the population-level cause of death distribution. The same would be done using the weights for the other causes. Thus, in Tariff 2.0, an indeterminate VA is partially reallocated to multiple causes of death to create population-level cause of death estimates that are representative of the population from which they came. We did not reallocate indeterminate deaths at the individual level.
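The thresholding and reallocation logic can be sketched as follows. This is a simplified illustration; the exact combination of weights (here an average normalized to sum to one) is inferred from the worked example above, and all names are our own.

```python
def assign_or_indeterminate(scores, cause_thresholds, absolute_threshold):
    """Disallow any cause whose summed tariff score falls below either its
    cause-specific threshold or the absolute threshold; if every cause is
    disallowed, the death is classified as indeterminate (returns None)."""
    allowed = {c: s for c, s in scores.items()
               if s >= cause_thresholds[c] and s >= absolute_threshold}
    return max(allowed, key=allowed.get) if allowed else None

def reallocate_indeterminate(gbd_weights, tariff_weights):
    """Fraction of one indeterminate death allocated to each cause: the
    average of the GBD and Tariff weights, normalized to sum to one."""
    avg = {c: (gbd_weights[c] + tariff_weights[c]) / 2 for c in gbd_weights}
    total = sum(avg.values())
    return {c: w / total for c, w in avg.items()}
```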

Performance metrics
The performance of methods for assigning COD is a function of the true COD composition in a study population [16]. The PHMRC study developed methods to assess performance independently of COD composition and, at the same time, account for the effects of random chance on COD composition [16]. Five hundred train-test dataset pairs, each with a different COD composition, were generated by holding 75 % of the dataset as "training" data and 25 % as "test" data. Each test dataset was sampled with replacement using a Dirichlet distribution to provide a new CSMF composition; there was no correlation between the COD compositions of the training and test sets. Additional file 4 illustrates how the validation data were used to generate each train-test pair. A detailed account of this procedure is given elsewhere [16]. We use two metrics to assess the performance of a method: median chance-corrected concordance (CCC) and cause-specific mortality fraction (CSMF) accuracy [16]. The first quantifies performance in correctly predicting the COD for an individual and the second in predicting the COD composition of populations. Analysis of the 500 test datasets results in a distribution from which we calculate the two metrics and their uncertainty intervals. Results are therefore not biased by the particular cause composition of the dataset.
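The Dirichlet resampling of the test set's cause composition can be sketched as follows (illustrative; function and variable names are our own assumptions):

```python
import numpy as np

rng = np.random.default_rng(42)

def make_test_split(causes, test_idx, n_causes):
    """Resample a held-out test set so that its cause composition follows a
    random draw from an uninformative Dirichlet distribution, breaking any
    correlation with the training set's composition.

    causes: integer cause label per death; test_idx: indices of test deaths.
    """
    target = rng.dirichlet(np.ones(n_causes))       # new CSMF composition
    draws = rng.multinomial(len(test_idx), target)  # deaths to draw per cause
    by_cause = [test_idx[causes[test_idx] == j] for j in range(n_causes)]
    resampled = []
    for j, k in enumerate(draws):
        if k and len(by_cause[j]):
            resampled.extend(rng.choice(by_cause[j], size=k, replace=True))
    return np.array(resampled)
```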
We assessed the performance of the Tariff Method in correctly assigning a COD to an individual VA using CCC. CCC adjusts sensitivity for chance so that a prediction without error would equal 1 and random allocation would equal 0. CCC for cause j is calculated as:

CCC_j = ( TP_j / (TP_j + FN_j) − 1/N ) / ( 1 − 1/N )

where TP_j is the number of true positives (decedents with gold standard cause j correctly assigned to cause j), FN_j is the number of false negatives (decedents with gold standard cause j incorrectly assigned to another cause), and N is the number of causes analyzed. TP_j plus FN_j equals the true number of deaths due to cause j. Performance was also measured at the population level using the mean CSMF accuracy across the 500 cause compositions:

CSMF accuracy = 1 − Σ_j |CSMF_j^true − CSMF_j^estimated| / ( 2 (1 − min_j CSMF_j^true) )

where the numerator is the sum of the absolute errors between the true and estimated CSMFs for all k causes and the denominator is the maximum possible error across all causes. A prediction without error would result in CSMF accuracy = 1, whereas a totally erroneous prediction would result in CSMF accuracy = 0. In a further development, we also estimated CSMF accuracy corrected for chance, namely chance-corrected CSMF (CCCSMF) accuracy [17].
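The two metrics translate directly into code; the following is a minimal sketch (function names are our own):

```python
import numpy as np

def ccc(true, pred, cause_j, n_causes):
    """Chance-corrected concordance for cause j: sensitivity rescaled so that
    chance assignment scores 0 and perfect assignment scores 1."""
    tp = np.sum((true == cause_j) & (pred == cause_j))   # true positives
    fn = np.sum((true == cause_j) & (pred != cause_j))   # false negatives
    sensitivity = tp / (tp + fn)
    return (sensitivity - 1.0 / n_causes) / (1.0 - 1.0 / n_causes)

def csmf_accuracy(true_csmf, est_csmf):
    """1 minus total absolute CSMF error over the maximum possible error."""
    t, e = np.asarray(true_csmf), np.asarray(est_csmf)
    return 1.0 - np.abs(t - e).sum() / (2.0 * (1.0 - t.min()))
```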

Validation of Tariff 2.0
Although the most important practical application of VAs lies in predicting the cause composition of mortality at the population level (CSMFs), the focus of this paper is on an analysis of the effects of revisions to the Tariff Method on the different causes of death at the level of the individual person (median CCC). Such a detailed analysis is not possible at the population level. Tables 3 and 4 provide an overview of results by CSMF accuracy and median CCC, respectively. For both metrics, and for each of three modules with and without HCE, Tariff 2.0 performs significantly better than Tariff 1.0. Improvements were most notable in children and neonates but were also statistically significant in adults. Improvement in CSMF accuracy with HCE was 2.5 %, 7.4 %, and 14.9 % for adults, children, and neonates, respectively, and for median CCC with HCE it was 6.0 %, 13.5 %, and 21.2 %, respectively. Similar levels of improvement are seen in results without HCE. Differences in improvement between CSMF accuracy and median CCC are more apparent than real: if CSMF accuracy is corrected to take random allocation of COD into account (CCCSMF accuracy, with HCE), improvements are 6.8 %, 20.1 %, and 40.3 % for adults, children, and neonates, respectively.
Median CCC for Tariff 1.0 and 2.0 with and without HCE is shown in Table 5 for adults and in Additional files 5 and 6 for children and neonates, respectively. It should be noted that the allocation of deaths to an indeterminate category will reduce median CCC but increase the accuracy of CSMFs in Tariff 2.0. We describe here the results for adult causes of death in detail. In general, median CCC is higher in children and neonates because fewer causes of death are reported. Group A contained only six specific causes and a residual group; all these specific causes are associated with global programs for their control. Other cancers could not be reported separately because the model was unable to distinguish between other cancers and other non-communicable diseases.
Group C contained external causes of death. Six causes were due to accidents and two (homicide and suicide) to intentional acts.  Tables 6 and 7 provide more information about endorsement rates and tariffs for gold standard maternal deaths. Table 6 shows endorsement rates for five key questions that define maternal death; in 20.1 % of cases respondents gave a negative response to all five. Specificity for maternal death was 99.3 %. Table 7 shows how tariffs distinguish maternal causes from cervical cancer but do not discriminate among maternal causes.

Community VAs
All community VA data were analyzed by site and module. The age and sex distributions of decedents in the community dataset were comparable to those of the gold standard dataset, although adults and neonates were slightly older. The percentage of decedents who sought care outside of the home was lower for all modules (see Additional file 7).

Discussion
It is essential to recognize that the Tariff Method has been formally validated against the PHMRC gold standard database. Through validation it has been possible to compare the accuracy of different analytic methods for assigning COD and, in this paper, to assess in detail the effect of revisions to the Tariff Method. We have demonstrated increased CSMF accuracy of the Tariff Method of 2.2 % and 2.5 % for adult modules without and with HCE, of 10.2 % and 7.4 % for child modules, and of 15.0 % and 14.9 % for neonatal modules. We have also shown increased accuracy for median CCC of 3.5 % and 6.0 % for adult modules without and with HCE and of 20.7 % and 21.2 % for neonatal modules.
Random allocation of deaths to different causes would result in CSMF accuracy of 0.632. These results were obtained by randomly assigning CODs from the reporting cause lists to 500 simulated populations with different cause compositions. If CSMF accuracy with HCE shown in Table 3 is adjusted to show improvement over random chance (CCCSMF accuracy), then adjusted accuracy would be 37.6 %, 41.1 %, and 53.1 % for adults, children, and neonates, respectively. To put these results into perspective, the reported CSMF accuracy of medical certification of adult deaths in Mexican teaching hospitals was 82 % [18]; this is equivalent to an adjusted accuracy of 50 %.
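The chance correction described here is a simple rescaling, with 0.632 being the CSMF accuracy expected from random allocation; as a sketch:

```python
def cccsmf_accuracy(csmf_acc, chance=0.632):
    """Rescale CSMF accuracy so that random allocation (~0.632 under
    Dirichlet-sampled cause compositions) scores 0 and perfection scores 1."""
    return (csmf_acc - chance) / (1.0 - chance)
```

With the Mexican teaching hospital figure, cccsmf_accuracy(0.82) gives roughly 0.51, matching the adjusted accuracy of about 50 % quoted above.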
Creation of the validation dataset has also made it possible to make objective judgments about the capacity of Tariff Method to discriminate between different CODs. The outcome has been the reporting cause list (Additional file 1). It has also been possible to identify those question and text items that contribute significantly to cause assignment and those that do not. We will be presenting details of a validated item-reduced instrument in a future communication.
In a recent article, Byass drew attention to some of the shortcomings of the PHMRC gold standard database [11]. He argued that although the internal validity of the dataset has been demonstrated, its external validity is suspect and in consequence there has been "over-fitting" of the empirical methods to the dataset. This argument was based in part on an earlier publication that pointed to the effects of small sample size (796 cases) on external validity [19]. There are no absolute criteria for external validity. The PHMRC dataset contained over 12,500 cases from four different countries. The first step in establishing external validity was the "out-of-sample" analyses involving the development of 500 datasets with stochastically determined distribution of causes. The second step was taken with the research described in this paper in which the revision of Tariff Method was, in the main, based on two sets of community VAs, but the validation was dependent on the original PHMRC database. A third step will be to add new gold standard hospital deaths to the PHMRC database.
Cases in the gold standard database serve to establish defining characteristics of a disease for the subsequent assigning of COD from verbal autopsies. To an extent they are the equivalent of type specimens in biology. Signs and symptoms featured prominently in the diagnostic criteria for many of the target diseases. Cases were selected because they met predetermined criteria; many cases were rejected.
A key issue in developing Tariff 2.0 was to establish the minimum set of information that would define death by a particular cause. Although endorsement rates in the community VA datasets were comparable with the gold standard validation data, endorsement rates for some individual community deaths were very low. This led to the development of ranking cutoffs which allowed us to identify deaths where there was too little information for diagnosis and classify these as "indeterminate".
Maternal deaths are a case in point. In 20 % of verbal autopsies for gold standard maternal deaths in the PHMRC database there was no positive response to any of the five key questions that depend on knowledge of the pregnancy status of the decedent and serve to define a maternal death (Table 6). Byass has suggested that empirical methods can "learn" wrong conclusions: in this case, that non-pregnant women can die from maternal causes [11]. High specificity for maternal deaths indicates that this did not happen. Tariff-assigned CODs are the result of an additive process. The problem appears to be one of paucity of information in many verbal autopsies, possibly due to respondents' lack of familiarity with the symptoms of the terminal illness or, in the case of maternal deaths, with the fact that the decedent was pregnant. Such problems are more likely to arise in civil registration systems than in longitudinal population studies, where the fact and outcome of pregnancy can be determined by other means. However, a comparison of tariffs between maternal causes and cervical cancer shows little room for confusion in assigning COD (Table 7): the problem for the Tariff Method was that VA symptoms were distributed among a range of maternal causes and Tariff was unable to distinguish among them. The use of gold standards makes this problem explicit.
Byass also raises the question of whether hospital deaths provide a valid basis for the development of empirical methods to assign COD to verbal autopsies taken from open populations [11]. The present study was a response to just this situation. We have argued above that the principal problem was one of paucity of information and that Tariff 2.0 copes well under these circumstances. However, two other factors may come into play. The first is that the characteristics of a terminal illness, in particular its duration, may be altered through hospitalization. This would be truer for acute illnesses of childhood, when disease characteristics do not have time to develop, than for chronic illnesses, where respondents would have had time to become familiar with long-standing symptoms of the underlying cause of death. We can only point out that in children the median CCC for infections (mostly acute) showed about a 10 % increase between Tariff 1.0 and Tariff 2.0, and that the gold standards were heavily influenced by Integrated Management of Childhood Illness criteria. Certainly, CCC for deaths from respiratory causes in both adults and children was lower than hoped. Byass attributes this to procedures during hospitalization resulting in generally high endorsement rates for respiratory symptoms.