Epidemiological data
The present study uses published secondary data on the confirmed MERS cases arising from the outbreak in the Republic of Korea [15–17]. As of July 31, 2015, a total of 185 cases had been diagnosed (excluding one case diagnosed and recovered in China), including 36 deaths. During the course of this outbreak, a detailed line list of cases was made publicly available [15]. Data on individual cases include (1) date of illness onset, (2) age-group, (3) gender, and (4) background health status, i.e. whether the case was an outpatient or inpatient of a specific healthcare facility. Since the dates of illness onset among the most recently reported cases were not yet available, we used the dates of confirmatory diagnosis for those cases instead. We believe this approximation is reasonable in this setting because, from the middle of the outbreak onward, all suspected contacts were regularly monitored and subjected to laboratory testing regardless of clinical signs and symptoms. We restricted the analysis to covariates (2) to (4) and did not examine individual comorbidities, because it is not clear whether the presence of all common underlying comorbidities was routinely evaluated and consistently documented in the case list; identifying the specific comorbidities most closely associated with the risk of death may be the subject of future clinical studies.
Statistical model
Herein, we develop an estimation model that combines a survival model with a logistic regression model. Let S(τ) and p be the survival probability at time τ since illness onset and the CFR, respectively. The two are related by
$$ p=1-S\left(\infty \right), $$
(i)
where S(τ) is given by
$$ S\left(\tau; p\right)=1-p\int_{0}^{\tau }f(x)\,dx, $$
(ii)
where f(τ) is the conditional probability density function of the time from illness onset to death given a fatal outcome [4, 18]. In most of the following analyses, f(τ) was assumed to be known and was based on data from the first 10 reported cases in South Korea who eventually died, using moment-based estimates of the mean (13.2 days) and standard deviation (7.1 days). Nevertheless, we also examined the heterogeneous risk of death by jointly estimating the risk-factor coefficients and the parameters of f(τ), and verified that the estimates of the risk of death from the joint estimation did not differ greatly from those obtained with the assumed known f(τ).
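As an illustration of equation (ii), the following minimal sketch builds f(τ) by moment matching and evaluates S(τ; p). It assumes f is a gamma density (the family named in the estimation settings) and uses a simple Simpson's-rule integral so that only the Python standard library is needed; the function names are hypothetical and are not taken from the original analysis.

```python
import math

def gamma_from_moments(mean, sd):
    """Moment matching: mean = shape * scale, variance = shape * scale**2."""
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    return shape, scale

def gamma_pdf(x, shape, scale):
    """Gamma density f(x); returns 0 for x <= 0 (adequate when shape > 1)."""
    if x <= 0:
        return 0.0
    log_pdf = ((shape - 1) * math.log(x) - x / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)

def gamma_cdf(t, shape, scale, steps=2000):
    """Integral of f from 0 to t by composite Simpson's rule (steps must be even)."""
    if t <= 0:
        return 0.0
    h = t / steps
    total = gamma_pdf(0.0, shape, scale) + gamma_pdf(t, shape, scale)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * gamma_pdf(j * h, shape, scale)
    return total * h / 3

def survival(t, p, shape, scale):
    """Equation (ii): S(t; p) = 1 - p * integral_0^t f(x) dx."""
    return 1.0 - p * gamma_cdf(t, shape, scale)

# Mean and SD of the onset-to-death delay quoted in the text (days).
shape, scale = gamma_from_moments(13.2, 7.1)
# Survival probability 14 days after onset for an illustrative CFR of 0.2.
print(survival(14.0, 0.2, shape, scale))
```

As τ grows, the integral approaches 1 and S(τ; p) approaches 1 − p, which is relationship (i).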
Since we aim to identify risk factors (explanatory variables) for p, we allow p to vary by individual i and write it as p_i. This individual variation is modelled using the logit model:
$$ \ln \left(\frac{p_i}{1-{p}_i}\right)={a}_0+{\displaystyle \sum_{k=1}^N{a}_k{x}_{k,i}}, $$
(iii)
where a_0 is the intercept, a_k the coefficient of variable k, x_{k,i} the value of the k-th variable of the linear predictor for individual i, and N the total number of independent variables. Let A and B denote the groups of cases who have survived and died, respectively, by the most recent calendar time t_m. The likelihood function used to estimate the parameters of the linear predictor in (iii) is
$$ L\left(\mathbf{a};\boldsymbol{\upalpha}, \boldsymbol{\upbeta}, {t}_m\right)=\prod_{i\in A}S\left({t}_m-{\alpha}_i;{p}_i\right)\prod_{i\in B}\left[{p}_if\left({\beta}_i-{\alpha}_i\right)\right], $$
(iv)
where α_i and β_i represent the observed dates of illness onset and death of individual i, respectively, and a = (a_0, a_1, ⋯, a_N) is the coefficient vector.
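The likelihood (iv) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the original implementation: f is taken as the gamma density moment-matched to the quoted mean and standard deviation, its integral is computed by Simpson's rule, and the line-list layout (dictionaries with hypothetical keys `x`, `onset`, `death`) and the toy cases are invented for the example.

```python
import math

# Assumed-known gamma delay distribution, moment-matched to the mean (13.2 d)
# and SD (7.1 d) quoted in the text.
SHAPE = (13.2 / 7.1) ** 2
SCALE = 7.1 ** 2 / 13.2

def f(x):
    """Density of the time from illness onset to death, given death."""
    if x <= 0:
        return 0.0
    return math.exp((SHAPE - 1) * math.log(x) - x / SCALE
                    - math.lgamma(SHAPE) - SHAPE * math.log(SCALE))

def F(t, steps=400):
    """Integral of f from 0 to t (composite Simpson's rule, even steps)."""
    if t <= 0:
        return 0.0
    h = t / steps
    total = f(0.0) + f(t)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * f(j * h)
    return total * h / 3

def cfr(a, x):
    """Equation (iii): inverse logit of the linear predictor a_0 + sum a_k x_k."""
    eta = a[0] + sum(a_k * x_k for a_k, x_k in zip(a[1:], x))
    return 1.0 / (1.0 + math.exp(-eta))

def log_likelihood(a, cases, t_m):
    """Logarithm of (iv). Deaths must occur strictly after onset, else f = 0."""
    ll = 0.0
    for case in cases:
        p_i = cfr(a, case["x"])
        if case["death"] is None:
            # Group A: still alive at t_m, contributes S(t_m - alpha_i; p_i).
            ll += math.log(1.0 - p_i * F(t_m - case["onset"]))
        else:
            # Group B: died, contributes p_i * f(beta_i - alpha_i).
            ll += math.log(p_i) + math.log(f(case["death"] - case["onset"]))
    return ll

# Hypothetical toy line list; dates are days from an arbitrary origin.
toy_cases = [
    {"x": [1, 0], "onset": 3.0, "death": 15.0},
    {"x": [0, 1], "onset": 5.0, "death": None},
    {"x": [1, 1], "onset": 10.0, "death": None},
]
print(log_likelihood([-1.5, 0.8, 0.3], toy_cases, t_m=30.0))
```

Working on the log scale turns the two products in (iv) into sums, which is numerically safer when the line list contains many cases.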
Estimation settings
In our real-time analysis of the Korean MERS data, univariate analyses using the abovementioned logistic regression were conducted to detect any variable associated with the risk of death. The dichotomous explanatory variables were age-group (below or above 60 years old), gender, and whether the case was a patient under treatment. Patients under treatment include both outpatients and inpatients, while non-patient cases comprise all other cases, including healthcare workers and visitors. The univariate test was performed by estimating the CFR at the most recent time t_m for each subgroup, using p_i obtained from (iii), and calculating the difference in CFR between the subgroups. After identifying variables that were significantly associated with MERS death in the univariate analyses, a multivariate version of model (iii) was fitted to adjust for confounding factors and to identify epidemiological factors that were significantly associated with death.
Subsequently, we used model (iii) as the basis for testing whether there was any time-dependent change in the risk of death, possibly due to increased ascertainment involving the diagnosis of a substantial number of mild and asymptomatic cases. Time dependence in the CFR was modelled by introducing a parameter δ, a constant factor multiplying the original logit model, i.e.
$$ {p}_i=\frac{\delta \exp \left({a}_0+{\displaystyle \sum_{k=1}^N{a}_k{x}_{k,i}}\right)}{1+ \exp \left({a}_0+{\displaystyle \sum_{k=1}^N{a}_k{x}_{k,i}}\right)}, $$
(v)
where δ is defined as
$$ \delta =\begin{cases}1, & \text{for } t<{t}_0\\ \varepsilon, & \text{for } {t}_0\le t\end{cases}, $$
(vi)
where t_0 represents the first day on which the ascertainment rate increased and ε scales the extent of ascertainment (ε is expected to be less than 1); t_0 was selected objectively by comparing the Akaike Information Criterion (AIC) among models with different values of t_0.
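The scaled logit model (v)–(vi) and the AIC-based choice of the change point can be sketched as follows. Two assumptions are made here: t is taken to be the calendar date attached to a case (onset or diagnosis), which the text does not spell out, and the maximized log-likelihoods in `placeholder_fits` are made-up numbers used only to show the selection step, not results from the data. All function names are hypothetical.

```python
import math

def cfr_time_dependent(a, x, t, t0, eps):
    """Equations (v)-(vi): logistic CFR multiplied by delta (1 before t0, eps after)."""
    eta = a[0] + sum(a_k * x_k for a_k, x_k in zip(a[1:], x))
    delta = 1.0 if t < t0 else eps
    return delta * math.exp(eta) / (1.0 + math.exp(eta))

def aic(max_log_lik, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L_max)."""
    return 2.0 * n_params - 2.0 * max_log_lik

def select_t0(fits, n_params):
    """fits maps each candidate t0 to its maximized log-likelihood; pick the lowest AIC."""
    return min(fits, key=lambda t0: aic(fits[t0], n_params))

# Placeholder values only: candidate t0 (day of outbreak) -> maximized log-likelihood.
placeholder_fits = {10: -120.4, 15: -118.9, 20: -119.7}
print(select_t0(placeholder_fits, n_params=5))
```

Because every candidate model here has the same number of parameters, the lowest AIC coincides with the highest maximized likelihood; the AIC penalty matters when the time-dependent model is compared against the model without δ.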
When the joint estimation was conducted, we estimated not only the coefficients of the linear predictor in the likelihood L but also the parameters of f(τ), i.e. the mean and standard deviation of the gamma distribution. Parameters were estimated by maximum likelihood (i.e. by minimizing the negative logarithm of (iv)). The 95 % confidence intervals (CI) were derived from the profile likelihood. Different models were compared using the AIC.
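To illustrate how a likelihood-based 95 % CI is read off, the sketch below uses a deliberately simplified one-parameter stand-in: an intercept-only model with complete follow-up, in which case (iv) is proportional to p^deaths (1 − p)^survivors. It uses the crude counts quoted in the data description (36 deaths among 185 cases) and therefore ignores the onset-to-death delay; it is not the estimate reported by the study. The interval keeps all values of p whose log-likelihood lies within half the 0.95 quantile of the chi-square distribution with one degree of freedom (3.841 / 2) of the maximum. The grid search and function names are assumptions made for the example.

```python
import math

def log_lik(p, deaths, survivors):
    """Log of p**deaths * (1 - p)**survivors (constants dropped)."""
    return deaths * math.log(p) + survivors * math.log(1.0 - p)

def mle_and_ci(deaths, survivors, grid_size=999):
    """Grid-search MLE of p and likelihood-based 95 % interval."""
    threshold = 3.841 / 2.0  # half the 0.95 quantile of chi-square, 1 d.f.
    grid = [k / (grid_size + 1) for k in range(1, grid_size + 1)]
    values = [log_lik(p, deaths, survivors) for p in grid]
    best = max(values)
    p_hat = grid[values.index(best)]
    inside = [p for p, v in zip(grid, values) if best - v <= threshold]
    return p_hat, min(inside), max(inside)

# Crude counts from the data description: 36 deaths, 185 - 36 other cases.
p_hat, lower, upper = mle_and_ci(deaths=36, survivors=185 - 36)
print(p_hat, lower, upper)
```

With several parameters, the same cut-off is applied to the profile of the parameter of interest: it is fixed at each grid value and the likelihood is re-maximized over all remaining parameters.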
Ethical considerations
The present study reanalyzed publicly available secondary data from the Korean Government and WHO, which collected the notification data with ethical approval and written consent from patients, adhering to the International Health Regulations. The secondary data were de-identified by these organizations before we accessed them. As such, the datasets used in our study were deemed exempt from ethical approval.
Availability of supporting data
The present study rests entirely on published data, and the essential components of the data, consisting of dates of illness onset and death, are downloadable from the WHO website [15].