Mapping the cryptic spread of the 2015–2016 global Zika virus epidemic

Background Zika virus (ZIKV) emerged as a global epidemic in 2015–2016 from Latin America with its true geographical extent remaining unclear due to widely presumed underreporting. The identification of locations with potential and unknown spread of ZIKV is a key yet understudied component for outbreak preparedness. Here, we aim to identify locations at a high risk of cryptic ZIKV spread during 2015–2016 to further the understanding of the global ZIKV epidemiology, which is critical for the mitigation of the risk of future epidemics. Methods We developed an importation simulation model to estimate the weekly number of ZIKV infections imported in each susceptible spatial unit (i.e. location that did not report any autochthonous Zika cases during 2015–2016), integrating epidemiological, demographic, and travel data as model inputs. Thereafter, a global risk model was applied to estimate the weekly ZIKV transmissibility during 2015–2016 for each location. Finally, we assessed the risk of onward ZIKV spread following importation in each susceptible spatial unit to identify locations with a high potential for cryptic ZIKV spread during 2015–2016. Results We have found 24 susceptible spatial units that were likely to have experienced cryptic ZIKV spread during 2015–2016, of which 10 continue to have a high risk estimate within a highly conservative scenario, namely, Luanda in Angola, Banten in Indonesia, Maharashtra in India, Lagos in Nigeria, Taiwan and Guangdong in China, Dakar in Senegal, Maputo in Mozambique, Kinshasa in Congo DRC, and Pool in Congo. Notably, among the 24 susceptible spatial units identified, some have reported their first ZIKV outbreaks since 2017, thus adding to the credibility of our results (derived using 2015–2016 data only). Conclusion Our study has provided valuable insights into the potentially high-risk locations for cryptic ZIKV circulation during the 2015–2016 pandemic and has also laid a foundation for future studies that attempt to further narrow this key knowledge gap. Our modelling framework can be adapted to identify areas with likely unknown spread of other emerging vector-borne diseases, which has important implications for public health readiness especially in resource-limited settings.

1. Additional information on the collection and pre-processing of data
Cases were reported monthly at a country level. We assumed the daily case count to be uniform within each month to calculate the weekly incidence. If the monthly number of reported cases was below four, we randomly assigned each case to a Eweek within that month.
[Group 2]: Anguilla, Antigua and Barbuda, Argentina, Aruba, Bahamas, Barbados, Belize, Bolivia, British Virgin Islands, Cayman Islands, Costa Rica, Cuba, Curacao, Dominica, Dominican Republic, Ecuador, El Salvador, French Guiana, Grenada, Guadeloupe, Guatemala, Guyana, Haiti, Honduras, Jamaica, Martinique, Mexico, Nicaragua, Panama, Paraguay, Peru, Puerto Rico, Saint Barthelemy, Saint Kitts and Nevis, Saint Vincent  We digitized the weekly country-level autochthonous case counts from published bar charts, except for Tonga, for which data were directly obtained from the table published by the ReliefWeb. Ideally, we would want to obtain case data reported at a subdivision level, given that Brazil reported the largest number of autochthonous Zika cases during 2015-6 and these cases were not evenly distributed across space. We were unable to obtain the subdivision-level data however and therefore ran the analysis at a country level.

[Group 5]: Colombia
For each first-level country subdivision except San Andrés, weekly cumulative case counts between Eweek 37 and Eweek 52 of 2016 were obtained from the Colombian National Institute of Health's website, and each time series was then differenced to derive the weekly case counts. The weekly ZIKV incidence data reported prior to Eweek 38 of 2016 were collected and made publicly available by Siraj et al. (2018).
For San Andrés, weekly incidence data were not published by Siraj et al. (2018). Hence, we obtained daily incidence data prior to Eweek 5 of 2016 from Rojas et al. (2016), which were subsequently aggregated within each Eweek. As before, weekly incidence data from Eweek 5 of 2016 onwards were derived from the weekly cumulative case counts published by the Colombian National Institute of Health.
These countries were excluded from our analyses, because either the autochthonous case counts were not publicly available, or the time of each reported case could not even be narrowed down to a specific month. Overall, the number of autochthonous Zika cases reported by these countries only accounted for less than 0.09% of the total case count reported worldwide during 2015-6.
The total number of autochthonous cases reported by each of these countries during 2015-6 was relatively small (less than ten per country). The Eweek during which each case was reported can be found using HealthMap. We located each autochthonous case reported by Indonesia, Malaysia, and Myanmar to a first-level country subdivision, and data for the rest of the countries were publicly available at a country level only.
We digitized the daily country-level autochthonous case counts from published bar charts, which were subsequently aggregated within each Eweek.

[Group 9] Saint Lucia
The PAHO published the total number of autochthonous Zika cases reported at a country level by the end of 2016, and the time when the first autochthonous case was identified. Since the total number of cases was reasonably large (872), we did not exclude Saint Lucia from our analysis, and assumed the shape of the epidemiological curve (post-standardization) was equal to that of Saint Vincent and the Grenadines, which is situated closed to Saint Lucia.

Merging of subdivisions
For each first-level country subdivision with no incoming air passengers based on the 2015-6 OAG data, we performed an estimation of the most likely airport * (within the same country) that the population of subdivision would rely on when they returned home. In the equation below, the distance between the population-weighted centroid coordinates of subdivision and an airport (denoted by ( , )) was adjusted by the total number of incoming air passengers during 2015-6 ( ) using Holling's type 2 function, so that major airports would not be overemphasized. We used the notation " within " to restrict ourselves to all airports within the country , to which subdivision belonged (similarly for " ′ within "): * = argmin within ( , ) ( ) Consequently, subdivision was merged with the one where airport * was located, and they were modelled as a single unit from then on. However, if the reported autochthonous cases cannot be located to the subdivision level, the entire country was treated as a single unit of analysis. Hence, the spatial unit of analysis in our study can be a single first-level country subdivision, a combination of subdivisions, or an entire country (hereinafter referred to as "spatial unit").

Countries or territories with zero incoming air passenger during 2015-6, and hence were excluded from the analyses.
Andorra, Bouvet Island, British Indian Ocean Territory, Clipperton, French Southern Territories, Heard Island and McDonald Islands, Liechtenstein, Palestinian Territory, Pitcairn, Saint Helena, San Marino, South Georgia, Tokelau, Vatican City.

Model validation
We took the list of 0 estimates that were obtained by Armstrong et al. (2020) via a systematic search within Web of Science and PubMed, and applied the following selection criteria: 1) The 0 value was derived by estimating the early growth rate of the epidemic based on the reported case data (i.e. using a simple exponential growth model, or Richards' growth model).
In other words, we did not include estimates derived from mechanistic models, which could vary substantially in terms of the functional forms as well as the input parameter values used. In fact, even for the same location and time window, there can be very different literature estimates of 0 obtained by these models, where the differences in model specifications played an important role. This is not to say that these model results were all invalid, but the complex nature of the models has made it challenging to standardize the results, as well as to examine the validity of the methods given the large number of studies included in the list. Hence, we only included estimates that were directly driven by the reported case data as described in the beginning.
2) The time period of the reported case data that were used to estimate 0 was clearly specified.
This would allow us to average our 0 estimates across the corresponding time window to be compared with the literature estimate.
3) The spatial resolution of the location of the reported case data that were used to estimate 0 should not be higher than the spatial unit of analysis in our study.
4) The width of the uncertainty interval of the 0 estimate should not exceed 10. Note that these estimates did not incorporate local evidence of Aedes-borne disease transmission potential or thermal restrictions for ZIKV transmission under different scenarios. Additional adjustment steps were implemented in the analysis of onward ZIKV spread to minimize false positive rates (Refer to the Methods section for more details).

Visualizations of the global risk model outputs
[See Figure S1.tif in the attachment] Figure S2: Estimated ZIKV presented as a function of temperature and vector-to-host ratio * .
For demonstration purposes, we assumed an equal vector-to-host ratio for Ae. aegypti and Ae. albopictus, and the y-axis value refers to the vector-to-host ratio for each species. Note that these estimates were preliminary results only, which did not incorporate the thermal restrictions for ZIKV transmission under different scenarios. An additional adjustment step was implemented in the analysis of onward ZIKV spread to minimize false positive rates (Refer to the Methods section for more details).