Data
We obtained the daily series of 247 confirmed COVID-19 cases in Singapore between January 23 and March 17, 2020, from public records of the Ministry of Health, Singapore, as of March 17, 2020 [27]. Individual-level case details, including the date of symptom onset, the date of reporting, and whether the case is autochthonous (local transmission) or imported, are publicly available. Clusters consisting of two or more cases grouped according to the infection source were also assembled from case descriptions obtained from field investigations conducted by the Ministry of Health, Singapore [27]. Single imported cases were analyzed as clusters of size 1, whereas unlinked cases were excluded from the cluster analysis.
Transmission clusters
As of March 17, 2020, 18 different clusters of COVID-19 cases with 2–48 cases per cluster have been reported in Singapore. A schematic diagram and characteristics of the COVID-19 clusters in Singapore are given in Fig. 1 and Table 1. The geographic location of the six clusters accounting for 45.3% of the total cases is shown in Fig. 2 whereas the corresponding distribution of cluster sizes is shown in Fig. 3.
Yong Thai Hang cluster
This cluster with 9 cases was the first to be reported in Singapore. It has nine traceable links, including eight Chinese and one Indonesian national associated with the visit of Chinese tourists to the Yong Thai Hang health products store, a shop that primarily serves the Chinese population, on January 23, 2020. Four shop employees and the tour guide were first identified as a cluster on February 4, 2020 [12, 28, 29]. The tour guide subsequently infected her husband, a newborn, and the domestic helper [29]. No further cases have been added to this cluster as of February 8, 2020.
Grand Hyatt hotel
This cluster with 3 local cases was the second cluster to receive international attention, as it originated from a business meeting held at the Grand Hyatt hotel attended by Singaporean locals and Chinese visitors from Hubei [30]. Four international cases associated with this cluster had left Singapore before the onset of symptoms. All Singaporean residents associated with this cluster have recovered as of February 19, 2020 [30]. No additional cases have been added to this cluster as of February 8, 2020.
Seletar Aerospace Heights cluster
This cluster with 5 Bangladeshi work pass holders was identified on February 9, 2020. No further cases have been added to this cluster as of February 15, 2020.
The Life Church and Missions and The Grace Assembly of God cluster
This Singaporean cluster is composed of 33 cases, including two imported cases and 31 local cases. The cluster started during The Life Church and Missions service event in Paya Lebar on January 19, 2020. This event was apparently seeded by two visitors from Wuhan, China, who infected a couple with SARS-CoV-2 at the church. The infected couple likely passed the infection to another case during a Lunar New Year’s celebration on January 25, 2020. This case subsequently infected Grace Assembly of God church staff at the Tanglin branch, generating secondary cases by the time he was reported on February 14, 2020. Two branches of the Grace Assembly of God church, at Tanglin and Bukit Batok, have been included in this cluster [28, 31]. The church has an average weekend attendance of 4800 people. While the church has closed temporarily, field investigations have not led to conclusive evidence regarding superspreading transmission. No further cases have been added to this cluster as of March 9, 2020.
SAFRA Jurong cluster
The largest cluster, composed of 48 local cases, is linked to a private dinner function at SAFRA Jurong restaurant on February 15, 2020. The restaurant was closed for cleaning from February 16 to February 19, 2020, following the dinner function. The latest case was added to this cluster on March 16, 2020.
Wizlearn Technologies cluster
This cluster, which comprises 14 cases, was identified on February 26, 2020. Wizlearn Technologies is an e-learning solutions company. The latest case was added to this cluster on March 3, 2020.
Church of Singapore cluster
The first case of this cluster was identified on March 14, 2020, originating as a secondary case from a case in the SAFRA Jurong cluster. This cluster is composed of 3 local cases. No further cases have been added to this cluster since March 16, 2020.
Boulder gym cluster
The first case of this cluster was identified on March 8, 2020, also linked to the SAFRA Jurong cluster. This cluster is composed of 3 local cases. No further cases have been added to this cluster since March 10, 2020.
Cluster A
The first case of this cluster was identified on February 14, 2020. The cluster comprises 3 local cases. No further cases have been added to this cluster as of February 18, 2020.
Cluster B
The first case of this cluster was identified on February 19, 2020. This cluster is composed of 2 local cases. No further cases have been added to this cluster since February 21, 2020.
Cluster C
The first case of this cluster was identified on March 3, 2020. This cluster is composed of 2 local cases. No further cases have been added to this cluster since March 6, 2020.
Cluster D
The first case of this cluster was identified on March 7, 2020. The two cases (one imported and one local) in this cluster are related to each other. No further cases have been added to this cluster since March 8, 2020.
Cluster E
The two cases (one imported and one local) of this cluster were identified on March 11, 2020. No further cases have been added to this cluster since March 11, 2020.
Cluster F
The first case of this cluster was identified on March 11, 2020. This cluster is composed of 3 local cases. No further cases have been added to this cluster since March 13, 2020.
Cluster G
This cluster is composed of 5 cases, including 4 imported cases. The first case of this cluster was identified on March 10, 2020. No further cases have been added to this cluster since March 13, 2020.
Cluster H
The first case of this cluster was identified on March 14, 2020, as a secondary case generated from a case in the SAFRA Jurong cluster. This cluster is composed of 3 local cases. No further cases have been added to this cluster since March 16, 2020.
Cluster I
The first case of this cluster was identified on March 14, 2020. This cluster is composed of one local and one imported case. No further cases have been added to this cluster since March 15, 2020.
Cluster J
The first case of this cluster was identified on March 15, 2020. This cluster is composed of one imported and one local case. No further cases have been added to this cluster since March 16, 2020.
Adjusting for reporting delays
As an outbreak progresses in real time, epidemiological curves can be distorted by reporting delays arising from several factors, including (i) delays in case detection during field investigations, (ii) delays in symptom onset after infection, (iii) delays in seeking medical care, (iv) delays in diagnostics, and (v) delays in processing data in surveillance systems [32]. However, it is possible to generate reporting-delay-adjusted incidence curves using standard statistical methods [33]. Briefly, the reporting delay for a case is defined as the time lag in days between the date of onset and the date of reporting. Here we adjusted the COVID-19 epidemic curve of local cases for reporting delays using a non-parametric survival analysis method for right-truncated data, known as the Actuaries method, which employs reverse-time hazards as described in previous publications [34,35,36]. The 95% prediction limits are derived according to Lawless and Kalbfleisch [37]. For this analysis, we exclude 7 imported cases and 5 local cases for which dates of symptom onset are unavailable.
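The reverse-time hazard idea can be illustrated with a minimal sketch. It is a simplified estimator for right-truncated delays, not the implementation used for this analysis, and it omits the prediction limits. The function name, the integer day indexing, and the `max_delay` cap are assumptions made for illustration.

```python
from collections import defaultdict

def delay_adjusted_incidence(cases, last_day, max_delay):
    """Inflate observed onset counts for cases not yet reported by last_day.

    cases: list of (onset_day, report_day) integer pairs (hypothetical input).
    Returns {onset_day: adjusted count}.
    """
    # counts[(t, d)]: cases with onset on day t reported d days later
    counts = defaultdict(int)
    for onset, report in cases:
        delay = report - onset
        if 0 <= delay <= max_delay and report <= last_day:
            counts[(onset, delay)] += 1
    onset_days = sorted({t for t, _ in counts})

    # Reverse-time hazard: hazard[d] = P(delay = d | delay <= d),
    # estimated only from onset days for which delay d is observable.
    hazard = [0.0] * (max_delay + 1)
    for d in range(1, max_delay + 1):
        eligible = [t for t in onset_days if t + d <= last_day]
        num = sum(counts.get((t, d), 0) for t in eligible)
        den = sum(counts.get((t, dd), 0) for t in eligible for dd in range(d + 1))
        hazard[d] = num / den if den > 0 else 0.0

    # cdf[d] = P(delay <= d | delay <= max_delay)
    cdf = [1.0] * (max_delay + 1)
    for d in range(max_delay - 1, -1, -1):
        cdf[d] = cdf[d + 1] * (1.0 - hazard[d + 1])

    adjusted = {}
    for t in onset_days:
        observed = sum(counts.get((t, d), 0) for d in range(max_delay + 1))
        elapsed = min(last_day - t, max_delay)
        adjusted[t] = observed / cdf[elapsed] if cdf[elapsed] > 0 else float("nan")
    return adjusted

# Hypothetical data: two cases per onset day, one reported the same day
# and one reported a day later; the late report for day 3 is not yet seen.
example = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 2), (2, 3), (3, 3)]
adjusted = delay_adjusted_incidence(example, last_day=3, max_delay=1)
```

In this toy example, the single case observed with onset on the final day is scaled up because only half of the cases are estimated to be reported with zero delay.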
Effective reproduction number from case incidence
We assess the effective reproduction number over the course of the outbreak, Rt, which quantifies the temporal variation in the average number of secondary cases generated per case during the course of an outbreak after considering multiple factors including behavior changes, cultural factors, and the implementation of public health measures [16, 26, 38]. Estimates of Rt > 1 indicate sustained transmission, whereas Rt < 1 implies that the outbreak is slowing down and the incidence trend is declining. Hence, maintaining Rt < 1 is required to bring an outbreak under control. Using the reporting-delay-adjusted incidence curve, we obtain the most recent estimate of Rt for COVID-19 in Singapore by characterizing the early transmission phase using a phenomenological growth model as described in previous publications [39,40,41,42]. Specifically, we first characterize the daily incidence of local cases for the first transmission wave (January 21–February 14, 2020) using the generalized logistic growth model (GLM) after adjusting for imported cases. This model characterizes the growth profile via three parameters: the growth rate (r), the scaling of the growth parameter (p), and the final epidemic size (K). The GLM can reproduce a range of early growth dynamics, including constant growth (p = 0), sub-exponential or polynomial growth (0 < p < 1), and exponential growth (p = 1) [40, 42]. We denote the local incidence at calendar time ti by Ii, the raw incidence of imported cases at calendar time ti by Ji, and the discretized probability distribution of the generation interval by ρi. The generation interval is assumed to follow a gamma distribution with a mean of 4.41 days and a standard deviation of 3.17 days based on refs. [43, 44]. Then, we can estimate the effective reproduction number by employing the renewal equation given by [45, 46]
$$ {R}_{t_i}=\frac{I_i}{\sum_{j=0}^i\left({I}_{i-j}+\alpha\ {J}_{i-j}\right){\rho}_j} $$
In this equation, the numerator represents the new cases Ii, and the denominator represents the total number of cases that contribute to the new cases Ii at time ti. Parameter 0 ≤ α ≤ 1 represents the relative contribution of imported cases to secondary disease transmission. We perform a sensitivity analysis by setting α = 0.15 and α = 1.0 [47]. Next, in order to derive the uncertainty bounds around the curve of Rt directly from the uncertainty associated with the parameter estimates (r, p, K), we estimate Rt for 300 simulated curves assuming a Poisson error structure [48].
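As a minimal sketch, the renewal equation can be evaluated as follows. The daily discretization of the gamma generation interval (differences of the cumulative distribution, with zero weight at lag 0) is an assumption, since the discretization scheme is not specified here. The function names and the toy incidence series are hypothetical.

```python
import math

def gamma_cdf(x, shape, scale):
    """Gamma CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    z = x / scale
    term = 1.0 / shape
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (shape + n)
        total += term
        n += 1
    return total * math.exp(-z + shape * math.log(z) - math.lgamma(shape))

def discretized_generation_interval(mean, sd, max_days):
    """rho[j] = F(j) - F(j - 1) for j >= 1, and rho[0] = 0 (assumption)."""
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    rho = [0.0]
    for j in range(1, max_days + 1):
        rho.append(gamma_cdf(j, shape, scale) - gamma_cdf(j - 1, shape, scale))
    return rho

def effective_reproduction_number(local, imported, rho, alpha):
    """R at each day i: local[i] over the lag-weighted sum of past cases."""
    rt = []
    for i in range(len(local)):
        denom = sum(
            (local[i - j] + alpha * imported[i - j]) * rho[j]
            for j in range(min(i, len(rho) - 1) + 1)
        )
        rt.append(local[i] / denom if denom > 0 else float("nan"))
    return rt

rho = discretized_generation_interval(mean=4.41, sd=3.17, max_days=60)
# Hypothetical incidence series, for illustration only.
local = [1, 1, 2, 3, 3, 5, 6, 8]
imported = [2, 1, 0, 1, 0, 0, 1, 0]
rt_low = effective_reproduction_number(local, imported, rho, alpha=0.15)
rt_high = effective_reproduction_number(local, imported, rho, alpha=1.0)
```

Giving imported cases a larger weight α enlarges the denominator, so the estimate with α = 1.0 is never above the one with α = 0.15 for the same series.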
Reproduction number (R) from the analysis of cluster sizes
A second method of inferring the reproduction number applies branching process theory to cluster size data to infer the degree of transmission heterogeneity [49, 50]. Simultaneous inference of heterogeneity and the reproduction number has been shown to improve the reliability of confidence intervals for the reproduction number [51]. In the branching process analysis, the number of transmissions caused by each new infection is modeled as a negative binomial distribution. This is parameterized by the effective reproduction number, R, and the dispersion parameter, k. The reproduction number provides the average number of secondary cases per index case, and the dispersion parameter varies inversely with the heterogeneity of the infectious disease. In this parameterization, a lower dispersion parameter indicates higher transmission heterogeneity.
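The role of the dispersion parameter can be illustrated with two standard properties of a negative binomial offspring distribution with mean R and dispersion k: the variance is R(1 + R/k), and the probability that a case causes no secondary infections is (1 + R/k)^(-k). The sketch below uses R = 0.7 as a purely illustrative value, not an estimate from this study.

```python
def prob_no_transmission(r, k):
    """P(0 secondary cases) for a negative binomial with mean r, dispersion k."""
    return (1.0 + r / k) ** (-k)

def offspring_variance(r, k):
    """Variance of the number of secondary cases per case."""
    return r * (1.0 + r / k)

# Smaller k means more heterogeneity: more cases transmit to nobody,
# while a few cases account for most of the transmission.
for k in (0.1, 1.0, 10.0):
    print(k, round(prob_no_transmission(0.7, k), 3), round(offspring_variance(0.7, k), 3))
```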
Branching process theory provides an analytic representation of the distribution of cluster sizes as a function of R, k, and the number of primary infections in a cluster (as represented in Equation 6 of the supplement of [52]). This permits direct inference of the maximum likelihood estimate and confidence interval for R and k. In this manuscript, we modify the calculation of the likelihood of a cluster size to account for the possibility that truncation of case counts at a specific time point (i.e., March 17, 2020) may result in some infections being unobserved. This is accomplished by denoting x as the sum of the observed number of serial intervals in a cluster. Then the likelihood that an observed cluster of size j containing m imported cases is generated by x infectious intervals is given by:
$$ {l}_{m\to j}^C\left({R}_{\mathrm{eff}},k,x\right)=\frac{m}{j}{l}_{x\to \left(j-m\right)}\left({R}_{\mathrm{eff}},k\right) $$
(1)
where the likelihood of i infections causing j infections is given by:
$$ {l}_{i\to j}\left({R}_{\mathrm{eff}},k\right)=\frac{\Gamma \left(j+ ki\right)}{\Gamma \left(j+1\right)\Gamma (ki)}\ {\left(\frac{k}{R_{\mathrm{eff}}+k}\right)}^{ki}{\left(\frac{R_{\mathrm{eff}}}{R_{\mathrm{eff}}+k}\right)}^j $$
(2)
where Γ is the gamma function.
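Equations (1) and (2) can be evaluated on the log scale to avoid overflow in the gamma functions. The following is a minimal sketch, with function names chosen for illustration; it assumes x > 0 and 1 ≤ m ≤ j.

```python
import math

def log_l(i, j, r_eff, k):
    """Log of Eq. (2): i infections (or infectious intervals) causing j infections."""
    return (math.lgamma(j + k * i) - math.lgamma(j + 1) - math.lgamma(k * i)
            + k * i * math.log(k / (r_eff + k))
            + j * math.log(r_eff / (r_eff + k)))

def log_cluster_likelihood(j, m, x, r_eff, k):
    """Log of Eq. (1): cluster of size j with m imported cases and
    x observed infectious intervals (x may be fractional)."""
    return math.log(m / j) + log_l(x, j - m, r_eff, k)

# With i = 1, Eq. (2) is the negative binomial offspring distribution,
# so its probabilities over j = 0, 1, 2, ... sum to one.
total = sum(math.exp(log_l(1, j, 0.7, 0.4)) for j in range(2000))
```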
To determine the number of observed serial intervals in each cluster, we first estimate the cumulative probability distribution of the serial interval. We assume the serial interval follows a gamma distribution, with a mean of 4.7 days and a standard deviation of 2.9 days [43]. This translates to a shape parameter of 2.63 and a scale parameter of 1.79. We then use the difference between the onset date and the end of our study (March 17, 2020) to determine how much of the infectious period was observed. For cases that only have a report date, but no onset date, we assume an onset date that is 6 days earlier than the reporting date. This is based on the average duration between onset date and report date that was observed in the data. When applied to the case series, we are able to assign a total size, the number of imported cases, and the observed number of infectious periods for each cluster in the case series. When no imported cases are known to be in a cluster, we assign the number of imported cases to be one as the cluster must have been initiated by someone (e.g., the index case had contact with a foreign visitor).
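The parameter conversion and the per-cluster sum of observed intervals can be sketched as follows. The shape is (mean/sd)² and the scale is sd²/mean. Treating each case's contribution as the serial interval CDF evaluated at the number of days between onset and March 17, 2020 is one reading of the description above, and the example cluster is hypothetical.

```python
import math
from datetime import date, timedelta

STUDY_END = date(2020, 3, 17)
ASSUMED_DELAY = timedelta(days=6)  # used when only a report date is known

def gamma_params(mean, sd):
    """Return (shape, scale) of a gamma distribution with given mean and sd."""
    return (mean / sd) ** 2, sd ** 2 / mean

def gamma_cdf(x, shape, scale):
    """Gamma CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    z = x / scale
    term = 1.0 / shape
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (shape + n)
        total += term
        n += 1
    return total * math.exp(-z + shape * math.log(z) - math.lgamma(shape))

def observed_intervals(cases, shape, scale):
    """Sum over cases of the fraction of the serial interval seen by STUDY_END.

    cases: list of (onset_date or None, report_date) pairs.
    """
    x = 0.0
    for onset, report in cases:
        if onset is None:
            onset = report - ASSUMED_DELAY
        x += gamma_cdf((STUDY_END - onset).days, shape, scale)
    return x

shape, scale = gamma_params(4.7, 2.9)
# Hypothetical two-case cluster; the second case lacks an onset date.
example_cluster = [(date(2020, 3, 1), date(2020, 3, 6)), (None, date(2020, 3, 17))]
x = observed_intervals(example_cluster, shape, scale)
```

A case with onset well before the end of the study contributes close to one full interval, whereas a recent case contributes only a fraction.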
When Eq. (1) is applied to the table of cluster size characteristics, the likelihood of the data can be calculated as a function of R and k. Maximizing the likelihood produces the maximum likelihood estimates of R and k. Applying the likelihood ratio test by profiling R and k produces confidence intervals [53]. Code was run in R version 3.6.1.
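The estimation step can be sketched as a grid search with a profile likelihood interval. This sketch is written in Python for illustration and is not the R code used for the analysis. The cluster table, the grids, and the function names are hypothetical, and the 1.92 cutoff is half of the 95% chi-square quantile with one degree of freedom.

```python
import math

def log_l(i, j, r_eff, k):
    """Log of Eq. (2)."""
    return (math.lgamma(j + k * i) - math.lgamma(j + 1) - math.lgamma(k * i)
            + k * i * math.log(k / (r_eff + k))
            + j * math.log(r_eff / (r_eff + k)))

def log_likelihood(clusters, r_eff, k):
    """Sum of the log of Eq. (1) over clusters given as (j, m, x) tuples."""
    return sum(math.log(m / j) + log_l(x, j - m, r_eff, k) for j, m, x in clusters)

def fit(clusters, r_grid, k_grid):
    """Grid-search maximum likelihood estimate and profile interval for R."""
    # Profile over k: for each R keep the best log-likelihood across k.
    profile = {r: max(log_likelihood(clusters, r, k) for k in k_grid) for r in r_grid}
    r_hat = max(profile, key=profile.get)
    k_hat = max(k_grid, key=lambda k: log_likelihood(clusters, r_hat, k))
    # Likelihood ratio interval: R values within 1.92 log units of the maximum.
    cutoff = profile[r_hat] - 1.92
    inside = [r for r in r_grid if profile[r] >= cutoff]
    return r_hat, k_hat, (min(inside), max(inside))

# Hypothetical clusters: (size j, imported cases m, observed intervals x).
clusters = [(9, 1, 9.0), (3, 1, 3.0), (5, 1, 5.0), (2, 1, 2.0), (1, 1, 1.0)]
r_grid = [i / 100 for i in range(1, 301)]
k_grid = [0.05, 0.1, 0.2, 0.5, 1.0, 2.0, 5.0, 10.0, 100.0]
r_hat, k_hat, r_ci = fit(clusters, r_grid, k_grid)
```

For this toy table, setting the derivative of the log-likelihood with respect to R to zero gives R = (Σj − Σm)/Σx = 15/20 = 0.75 for any k, which provides a check on the grid search. An interval for k can be obtained in the same way by profiling over R.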