The epidemiological and clinical characteristics of SARS-CoV-2 mean that a large proportion of infections may go undetected [13, 20]. In the absence of serological data, the ratio of deaths to cases, adjusted for the delay from confirmation to outcome, can be used to estimate the proportion of symptomatic cases that are reported. Using this approach, we estimated that case ascertainment dropped substantially in many countries during the peak of their first epidemic wave. Although serological surveys are beginning to emerge [20], many countries do not have such data available, or have results from only a single cross-sectional survey. The methods and estimates presented here can therefore provide an ongoing picture of the underlying epidemics, including local-level dynamics as fine-scale surveillance data become available [21, 22].
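The delay-adjusted ratio described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a lognormal confirmation-to-death delay with illustrative parameters (mean 13 days, SD 12.7 days, both assumptions for this sketch), divides cumulative deaths by the number of cases whose outcome would be known by now, and takes the proportion of symptomatic cases reported as the baseline CFR (1.4%) divided by this delay-adjusted CFR, capped at 1. The function names are hypothetical.

```python
import math

def lognormal_cdf(x, mean, sd):
    """CDF of a lognormal distribution given its natural-scale mean and sd."""
    if x <= 0:
        return 0.0
    sigma2 = math.log(1 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2
    return 0.5 * (1 + math.erf((math.log(x) - mu) / math.sqrt(2 * sigma2)))

def delay_pmf(max_days, mean=13.0, sd=12.7):
    """Probability that the confirmation-to-death delay falls on day d.

    mean=13 and sd=12.7 days are illustrative assumptions for this sketch.
    """
    return [lognormal_cdf(d + 1, mean, sd) - lognormal_cdf(d, mean, sd)
            for d in range(max_days)]

def delay_adjusted_cfr(cases, deaths, pmf):
    """Cumulative deaths divided by cumulative cases with a known outcome.

    A case confirmed on day t - j contributes pmf[j] of a resolved case on
    day t, so recent cases whose outcome is still pending are down-weighted.
    """
    known_outcomes = 0.0
    for t in range(len(cases)):
        for j in range(min(t + 1, len(pmf))):
            known_outcomes += cases[t - j] * pmf[j]
    return sum(deaths) / known_outcomes

def proportion_symptomatic_reported(cases, deaths, baseline_cfr=0.014):
    """Baseline CFR over delay-adjusted CFR, capped at 1."""
    pmf = delay_pmf(len(cases))
    return min(1.0, baseline_cfr / delay_adjusted_cfr(cases, deaths, pmf))

# Synthetic example: deaths generated with a 14% CFR among resolved cases
cases = [1000] * 60
pmf = delay_pmf(60)
deaths = [0.14 * sum(cases[t - j] * pmf[j] for j in range(t + 1)) for t in range(60)]
print(round(proportion_symptomatic_reported(cases, deaths), 3))  # 0.1
```

With an apparent (delay-adjusted) CFR ten times the baseline, the sketch infers that roughly one in ten symptomatic cases was reported.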
Our analysis has some limitations. We assumed the age-adjusted baseline CFR was 1.4% (95% CrI 1.2–1.5%) [4], which is broadly consistent with other published estimates [5, 23, 24], and that 10–70% of infections were asymptomatic [20, 25, 26], with a mean of 50% [12]. Given the uncertainty in these estimates, we propagated the variance in the baseline CFR and the range of the proportion asymptomatic through the inference process, so that the final 95% credible interval reported for under-ascertainment reflects the underlying uncertainty in the model parameters. We also assumed that deaths from COVID-19 are accurately reported. If local testing capacity is limited, or if testing policy affects the attribution of deaths (for example, evidence for the efficacy of post-mortem swabbing is lacking), deaths may be attributed to a cause other than COVID-19. In that case, the observed CFR would be biased downwards and the estimated ascertainment correspondingly upwards, so our model may underestimate the true burden of infection. For example, in Peru between 1 April and 1 July 2020, there were 690% excess deaths compared with confirmed COVID-19 deaths and 3396 reported COVID-19 deaths per 100,000 cases, whereas in the UK there were 199% excess deaths and 23,642 reported COVID-19 deaths per 100,000 cases. Data reporting issues have also been reported for several countries [27]. Additionally, if a large proportion of transmission is concentrated within specific age groups, the effective CFR may be higher or lower than the assumed baseline; better age-stratified temporal data on cases and deaths would make it possible to explore this effect in more detail. However, our estimates were generally consistent with published serological data, where available, suggesting that our method was robust, at least for these countries.
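The effect of uncertainty in the baseline CFR and in the proportion asymptomatic can be illustrated with a simple Monte Carlo sketch. This is a swapped-in technique for illustration, not the Bayesian inference used in the study, and the distributions are assumptions: the baseline CFR is drawn from a symmetric normal approximation to 1.4% (1.2–1.5%), and the proportion asymptomatic from a rescaled Beta(4, 2) on 10–70%, which has mean 50%. The observed (delay-adjusted) CFR is treated as a fixed input, and `infections_per_reported_case` is a hypothetical name.

```python
import random
import statistics

def infections_per_reported_case(observed_cfr, n_draws=20_000, seed=42):
    """Median and 95% interval for infections per reported case.

    Assumptions for this sketch (not the study's exact priors):
      * baseline CFR ~ Normal(0.014, 0.00075), roughly matching 1.2-1.5%
      * proportion asymptomatic = 0.1 + 0.6 * Beta(4, 2): range 10-70%, mean 50%
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        baseline_cfr = rng.gauss(0.014, 0.00075)
        p_asym = 0.1 + 0.6 * rng.betavariate(4, 2)
        # Proportion of symptomatic cases reported, capped at 1
        ascertainment = min(1.0, baseline_cfr / observed_cfr)
        # Reported -> symptomatic cases, then symptomatic -> all infections
        draws.append(1.0 / (ascertainment * (1.0 - p_asym)))
    cuts = statistics.quantiles(draws, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return statistics.median(draws), cuts[0], cuts[-1]

median, low, high = infections_per_reported_case(observed_cfr=0.14)
print(f"{median:.1f} ({low:.1f}-{high:.1f}) infections per reported case")
```

Because both parameters are sampled jointly in every draw, the resulting interval widens to reflect uncertainty in each, which is the intent of propagating parameter uncertainty rather than fixing point values.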
To compare our estimates against seroprevalence studies, and consistent with the other simplifying assumptions made across countries in this study, we assume little or no variation in accuracy between the serological studies included. Including the confidence interval of each seroprevalence estimate in the comparison captures some of this variation quantitatively, but most of it will be missed. However, because the comparison is already crude for a number of reasons, we believe the additional error introduced by this assumption is minimal. Further, our estimates of under-ascertainment in many countries suggest that the number of symptomatic infections at the peak of the outbreak was one to two orders of magnitude larger than the number of reported cases. Even if deaths are under-reported, our estimates are therefore still likely to be much closer to the true burden than locally reported cases imply.
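A crude agreement check of the kind described above, comparing the model's 95% credible interval for the proportion infected with each survey's confidence interval, could be written as an interval-overlap test. This is a sketch under the assumption that "agreement" means the two intervals intersect; that criterion, the function names and the example numbers are hypothetical and are not data from the study.

```python
def intervals_overlap(a, b):
    """True if closed intervals a = (low, high) and b = (low, high) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]

def agreement_rate(comparisons):
    """Fraction of (model CrI, serosurvey CI) pairs whose intervals overlap."""
    matches = sum(intervals_overlap(model, survey) for model, survey in comparisons)
    return matches / len(comparisons)

# Hypothetical proportions infected, for illustration only (not study data)
comparisons = [
    ((0.05, 0.12), (0.10, 0.15)),  # overlapping intervals
    ((0.01, 0.02), (0.03, 0.04)),  # disjoint intervals
]
print(agreement_rate(comparisons))  # 0.5
```

An overlap test like this ignores differences in assay accuracy between surveys, mirroring the simplifying assumption stated in the text.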
Our estimates of under-ascertainment over time require a time series of COVID-19 deaths as input, and this data source may also exhibit reporting variation; one notable example was Spain in June 2020 (Supplementary Appendix: Figure S1). However, because our Gaussian process model quantifies time-varying case ascertainment, it can account for positive or negative spikes in reporting [13] (see the Estimating under-ascertainment rates section in the Supplementary Appendix for more details). Finally, our results are limited by the quality of the input data, which is likely to vary in accuracy between countries. Nevertheless, the good agreement between the 95% credible intervals of our estimates and seroprevalence studies suggests that our model captures at least some of this variation.
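To illustrate how a Gaussian process can damp one-off reporting spikes, the sketch below fits GP regression with a squared-exponential kernel to noisy daily ascertainment values on the logit scale and returns the posterior mean. It is a simplified stand-in, not the study's model: the hyperparameters (7-day length scale, unit kernel variance, observation noise SD of 0.5 on the logit scale) are fixed assumptions rather than inferred, and the function names are hypothetical.

```python
import numpy as np

def squared_exponential(t1, t2, variance=1.0, length_scale=7.0):
    """Squared-exponential kernel; length_scale is in days (assumed value)."""
    diff = t1[:, None] - t2[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_smooth_logit(days, ascertainment, noise_sd=0.5, **kernel_args):
    """Posterior mean of a GP fitted to logit-transformed ascertainment.

    Working on the logit scale keeps the smoothed curve inside (0, 1);
    the noise term absorbs isolated spikes instead of following them.
    """
    days = np.asarray(days, dtype=float)
    p = np.clip(np.asarray(ascertainment, dtype=float), 1e-6, 1 - 1e-6)
    y = np.log(p / (1 - p))
    y_mean = y.mean()  # constant prior mean taken from the data
    K = squared_exponential(days, days, **kernel_args)
    K_noisy = K + noise_sd ** 2 * np.eye(len(days))
    alpha = np.linalg.solve(K_noisy, y - y_mean)
    f = y_mean + K @ alpha  # posterior mean at the observed days
    return 1.0 / (1.0 + np.exp(-f))

days = np.arange(60)
raw = np.full(60, 0.2)
raw[30] = 0.9  # a one-day reporting spike
smoothed = gp_smooth_logit(days, raw)
print(smoothed[28:33].round(2))
```

The spike on day 30 is pulled most of the way back towards the surrounding level, while days far from it stay close to 20%.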
Because the temporal trend in under-ascertainment does not necessarily reflect trends in reported cases or testing effort, evidence synthesis methods such as the one presented here can provide additional insight into whether observed case patterns reflect the underlying epidemic dynamics. In the early stages of an outbreak, this method can indicate whether a large proportion of cases are being detected (and hence whether transmission may be containable with targeted measures such as isolation and contact tracing) or whether transmission is more widespread and a more extensive response is required. Such estimates can also provide insight in the later stages of an outbreak, as they can indicate high levels of detection in countries that have achieved control. For example, in Australia, an adapted version of our model estimated that 80% (95% CrI 55–100%) of cases had likely been ascertained during the outbreak [22]. By adjusting for under-ascertainment, it is also possible to reconstruct the temporal dynamics of SARS-CoV-2 internationally. During February and early March 2020, importations of SARS-CoV-2 into the UK came primarily from Italy, Spain and France [28]. This is consistent with the progression of infection that our model inferred for this period: we estimated that Italy, Spain, France and Belgium each had more than 6.5% of their population infected by 31 March 2020 [28].
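A reconstruction of the cumulative proportion infected from reported cases can be sketched by scaling each day's cases up by the estimated ascertainment and then by the symptomatic fraction of infections. This is a minimal sketch assuming a fixed proportion asymptomatic of 50% (the mean used in the text) and ignoring the delay from infection to report; the function name and inputs are hypothetical, and the published reconstruction may handle timing differently.

```python
def cumulative_infected_fraction(daily_cases, daily_ascertainment, population, p_asym=0.5):
    """Running fraction of the population infected, reconstructed from reported cases.

    Each day's reported cases are divided by the proportion of symptomatic
    cases reported (ascertainment) and then by the symptomatic fraction of
    infections (1 - p_asym). Reporting delays are ignored in this sketch.
    """
    cumulative = 0.0
    fractions = []
    for cases, ascertainment in zip(daily_cases, daily_ascertainment):
        if not 0 < ascertainment <= 1:
            raise ValueError("ascertainment must lie in (0, 1]")
        symptomatic = cases / ascertainment
        cumulative += symptomatic / (1.0 - p_asym)
        fractions.append(cumulative / population)
    return fractions

# Hypothetical inputs: 100 then 200 reported cases, 10% then 20% ascertainment
print(cumulative_infected_fraction([100, 200], [0.1, 0.2], population=100_000))
```

Comparing such a running fraction across countries is one simple way to see which epidemics were most advanced at a given date.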