Quantifying the risk of local Zika virus transmission in the contiguous US during the 2015–2016 ZIKV epidemic

Background Local mosquito-borne Zika virus (ZIKV) transmission has been reported in two counties in the contiguous United States (US), prompting the issuance of travel, prevention, and testing guidance across the contiguous US. Large uncertainty, however, surrounds the quantification of the actual risk of ZIKV introduction and autochthonous transmission across different areas of the US. Methods We present a framework for the projection of ZIKV autochthonous transmission in the contiguous US during the 2015–2016 epidemic using a data-driven stochastic and spatial epidemic model accounting for seasonal, environmental, and detailed population data. The model generates an ensemble of travel-related case counts and simulates their potential to have triggered local transmission at the individual level in the 2015–2016 ZIKV epidemic. Results We estimate the risk of ZIKV introduction and local transmission at the county level and at the 0.025° × 0.025° cell level across the contiguous US. We provide a risk measure based on the probability of observing local transmission in a specific location during a ZIKV epidemic modeled after the epidemic observed during the years 2015–2016. The high spatial and temporal resolution of the model allows us to generate statistical estimates of the number of ZIKV introductions leading to local transmission in each location. We find that the risk was spatially heterogeneously distributed and concentrated in a few specific areas that account for less than 1% of the contiguous US population. Locations in Texas and Florida that have actually experienced local ZIKV transmission were among the places at highest risk according to our results. We also provide an analysis of the key determinants for local transmission and identify the key introduction routes and their contributions to ZIKV transmission in the contiguous US. Conclusions This framework provides quantitative risk estimates, fully captures the stochasticity of ZIKV introduction events, and is not biased by the under-ascertainment of cases due to asymptomatic cases. It provides general information on key risk determinants and data with potential uses in defining public health recommendations and guidance about ZIKV risk in the US. Electronic supplementary material The online version of this article (10.1186/s12916-018-1185-5) contains supplementary material, which is available to authorized users.

1 Simulating the spatiotemporal dynamics of the 2015-2016 ZIKV epidemic in the Americas

Overview
To quantify the risk of local ZIKV transmission in the contiguous US, we utilize the simulation results from the study by Zhang et. al. [28] to estimate the volume of travel-related ZIKV cases (including both symptomatic and asymptomatic infections) entering the US. In addition, to evaluate the risk of triggering local transmission in the contiguous US by travel-related cases, we adopt the ZIKV transmission model (Section 1.2, 1.3) as well as the quantification of socioeconomic risk of exposure to mosquitoes (Sections 1.5) from ref. [28]. In this section, we briefly reiterate how the global epidemic and mobility model (GLEAM) was adopted to simulate the 2015-2016 ZIKV epidemic in the Americas, while details of the model can be found in ref. [28]. Figure S1 describes the compartmental classifications used to simulate ZIKV transmission dynamics. The humanvector chain-binomial model is based on a SEIR compartmentalization of human populations and a SEI compartmentalization of vector populations [15]. Humans can be in one of four compartments: susceptible individuals (S H ) who lack immunity against infection, exposed individuals (E H ) who have acquired infection but are not yet infectious, infected individuals (I H ) who can transmit infection (and may or may not display symptoms), and removed individuals (R H ) who are no longer infected. People in the final compartment may recover and gain immunity or die. We treat the total human population size as constant, i.e., S H + E H + I H + R H = N H .

ZIKV transmission dynamics
The transition of people between compartments is performed stochastically, based on various biological factors. Following ref. [12], susceptible humans transit to the exposed compartment under the force of infection (λ H ), which is proportional to the rate at which a particular human is bitten by infected mosquitoes (I V /N V ), the parameter β that accounts for the daily mosquito biting rate and the specific transmissibility of ZIKV, and the temperature dependence [17] of the mosquito-to-human probability of transmission (T V H ). The temperature dependency of T V H is given by the following equation: The factor k expressing the number of mosquitoes per person, and we have N H = N V /k, yielding a force of infection On average, individuals stay in the exposed or infectious state for the duration of the mean intrinsic latent period −1 H = 7 days or the mean infectious period µ −1 H = 5 days, respectively. The vector population is divided into three compartments: susceptible (S V ), exposed (E V ), and infectious (I V ). The force of infection (λ V ) governing the transition rate from susceptible to exposed individuals in the vector population is proportional to the density of infectious humans (I H /N H ). On average, mosquitoes are in the exposed state for the duration of the mean extrinsic latent period −1 V = 8 days. The average lifetime of mosquitoes µ −1 V is temperature dependent and varies across spatial locations and time of the year [11]. The explicit temperature dependency of µ −1 V is given by the following equation: The overall mosquito population is rescaled every day based on the local temperature. The explicit temperature dependency [28] is given by the equation: is the average temperature of the previous 79 days. Similar to the force of infection from vector to human, the force of infection from human to the vector, λ V , is a function of β, the temperature dependence of human-to-mosquito transmission (T HV ), and the density of infectious humans ( The coupled population equations describing the epidemic time evolution read as: and In the above expressions each term ∆ X→Y represents the number of human or vector individuals transitioning from state X to state Y . Transitions are calculated according to chain binomial processes ∆ X→Y = Binomial(X, p X→Y ), and p X→Y are transition probabilities determined by the force of infection and the average lifetime of individuals in each compartment. We assume memoryless discrete stochastic transition processes. It is worth stressing that the terms ∆ I V →S V , ∆ E V →S V are introduced to model the replenishment of mosquitoes after death.
By using the standard approach of Ref. [12], the basic reproduction number can be expressed as: It is worth remarking that the basic reproduction number varies spatiotemporally according to the temperature and mosquito abundance. S t

The calibration of the ZIKV transmission model
The calibration of the disease dynamic model is performed by a Markov chain Monte Carlo (MCMC) analysis of data reported from the 2013 ZIKV epidemic in French Polynesia [16]. Setting the extrinsic and intrinsic latent periods and the human infectious period to reference values and using average daily temperatures of French Polynesia, we estimate a basic reproduction number at the temperature T = 25 C • from French Polynesia as R F P 0 = 2.75 (95% CI [2.53-2.98]), which is consistent with other ZIKV outbreak analyses [16,5]. It is worth recalling that the reproduction number depends on the location and time of the year through seasonal temperature changes. The calibration in French Polynesia provides the basic transmissibility of ZIKV.

The projected seasonal abundance of the mosquitoes in the contiguous United States
To model the seasonal variation of the mosquito abundance across the contiguous United States, we consider a mechanistic model of mosquito population dynamics driven by local temperature [28]. The model is informed by the mosquito presence data created by Kraemer et. al [13] and the global air temperature data [27] and projects the daily abundance of Aedes mosquitoes at the spatial resolution of 0.25 • × 0.25 • across the contiguous US. Figure S2 provide the projected monthly average Ae. aegypti abundance of 50 major cities in the contiguous US comparable to the projection by Monaghan et al. [18]. Here we consider the maximum monthly abundance in Miami as a reference point, we define abundance category "None to low" as less than 10% of the reference point, category "Low to moderate" as 10% − 50% of the reference point, "Moderate to high" as 50% − 90% of the reference point, and "High" as larger than 90% of the reference point.

The socioeconomic risk of exposure to mosquitoes
Studies suggest that even if the environmental conditions are suitable for arbovirus transmission, population's risk of exposure to mosquitoes may still vary due to socioeconomic disparities [26,23,24]. Following ref. [28], we estimate the socioeconomic risk, p e throughout the contiguous US. The premise of the method is to identify the fraction of the population exposed to mosquito biting (i.e., p e ) given the presence of mosquitoes due to socioeconomic disparities.
To accomplish this, we consider 9 previous outbreaks of chikungunya that occurred in confined populations (either island populations or isolated villages). We estimate the socioeconomic risk of exposure p e for each of the 9 outbreaks considered utilizing seroprevalence surveys and environmental settings related to chikungunya transmission. We consider the logarithm of Gross Domestic Product per capita in Purchasing Power Parity (GDP per capita, PPP) as the socioeconomic indicator and model the relationship between p e and log(GDP per capita, P P P ) by considering a linear regression model:p e =α +βlog(GDP per capita, P P P ) whereα andβ were estimated through least squares regression. To project the socioeconomic risk of exposurep e across the contiguous US, we consider the geophysically scaled economic dataset (G-Econ), developed by Nordhaus et al. [21,20]. This data maps the GDP per capita in PPP at 1 • × 1 • resolution. We spatially interpolate the data into a resolution of 0.25 • × 0.25 • , matching the spatial resolution of ZIKV transmission model (Section 3). Additionally, the data is rescaled to reflect the 2015 GDP per capita, PPP estimate.

Global model for the spread of vector-borne diseases
The GLEAM model is a fully stochastic epidemic modeling platform that utilizes real world data in order to perform in silico simulations of the spatial spread of infectious diseases at the global level. GLEAM uses population information obtained from the high-resolution population database of the Gridded Population of the World project from the Socioeconomic Data and Application Center at Columbia University (SEDAC) [6]. Within each subpopulation, a compartmental model is used to simulate the disease of interest. The model uses an individual dynamic where discrete, stochastic transitions are mathematically defined by chain binomial and multinomial processes. Subpopulations interact through the mechanistically simulated mobility and commuting patterns of disease carriers. Mobility includes global air travel [22], and GLEAM simulates the number of passengers traveling daily worldwide using available data on the origin-destination flows among indexed subpopulations.
The transmissibility of vector-borne diseases is associated with strong spatial heterogeneity, driven by variability and seasonality in vector abundance, the temperature dependence modulating the vector competence, and the characteristics of the exposed populations. Many locations, such as those at high elevation, are not at risk of autochthonous ZIKV transmission simply because the vector is absent. In other locations, the vector may be present but sustained transmission is not possible because of environmental factors that affect the vector's population dynamics, such as temperature or precipitation. Housing conditions, availability of air-conditioning, and socioeconomic factors also contribute significantly to determining the fraction of the population likely exposed to the vector. To extend the GLEAM model to simulate vector-borne diseases, a number of new datasets with high spatial resolution are integrated, including the following: • Global terrestrial air temperature data: The global air temperature dataset [27] containing monthly mean temperatures at a spatial resolution of 0.5 • × 0.5 • . To match the spatial resolution of GLEAM's gridded population density map, the temperature for each population cell is extracted from the nearest available point in the temperature dataset. Daily average temperatures are linearly interpolated from each population's monthly averages.
• Global Ae. aegypti and Ae. albopictus distribution: The Global Ae. aegypti and Ae. albopictus distribution database provides uncertainty estimates for the vector's distribution at a spatial resolution of 5km × 5km [13].
• Geolocalized economic data: The geophysically scaled economic dataset (G-Econ), developed by Nordhaus et.al. [21,20], maps the per capita Gross Domestic Product (computed at Purchasing Power Parity exchange rates) at a 1 • × 1 • resolution. To estimate the per capita Gross Cell Product (GCP) at Purchasing Power Parity (PPP) rates, the amount is distributed across GLEAM cells proportionally to each cell's population size. The data have also been rescaled to reflect 2015 GDP per capita (PPP) estimates.
These databases are combined to model the key drivers of ZIKV transmission. Temperature impacts many important disease parameters, including the spatiotemporally specific values of R 0 , whose variations induce seasonality and spatial heterogeneity in the model. Temperature data is also used together with the mosquito presence distribution data to define the daily mosquito abundance (number of mosquitoes per human) in each cell. Data on mosquito abundance and temperature are used to identify cells where ZIKV outbreaks are not possible because of environmental factors. The human populations in these cells are thus considered unexposed to ZIKV. Finally, we use historical data and G-Econ to provide a socioeconomic rescaling factor, reflecting how exposure to the vector is impacted by socioeconomic variables such as the availability of air-conditioning.

The calibration of the global model
Once the data layers and parameters have been defined, the model runs using discrete time steps of one full day to simulate the transmission dynamic model, incorporating human mobility between subpopulations. The model is fully stochastic and from any nominally identical initialization (initial conditions and disease model), generates an ensemble of possible epidemics, as described by newly generated infections, time of arrival of the infection in each subpopulation, and the number of traveling carriers. The Latin square sampling of the initial introduction of ZIKV in the Americas, and the ensuing statistical analysis is performed on 150,000 stochastic epidemic realizations. From those realizations we find the probability p(x) and p(x|θ), defined as the probability of the evidence (the epidemic peak in Colombia, as from surveillance data) and the likelihood of the evidence given the parameters θ specifying the date and location of introduction of ZIKV in Brazil. From those distributions we can calculate the posterior probabilities of interest. In total, 1736 out of the 150,000 simulations match the epidemic peak in Colombia and the posterior estimate on the time of introduction into Brazil matches phylogenetic evidence [8,9]. We consider the ensemble of the 1736 simulations as the output of the global model to characterize the spatiotemporal dynamics of ZIKV transmission in the Americas and the global dissemination of the travel related ZIKV infections through human mobility.
2 Challenges on using surveillance data to directly estimate the travelassociated ZIKV infections entering the contiguous US and the creation of the TCC database Evaluating the risk of ZIKV introduction in the contiguous US requires a quantitative estimate of the intensity of travel-associated ZIKV infections entering the contiguous US. In the case of Zika, direct estimates from the surveillance data prove challenging due to the following reasons. First, the US CDC issued a health advisory for ZIKV on January 15th, 2016, before which travel associated ZIKV infections were not actively monitored in the US. However, in 2015, several countries including Brazil were likely experiencing large scale ZIKV outbreaks [28] and exporting ZIKV infectedindividuals to the contiguous US which were missing from the surveillance systems. Second, a high asymptomatic rate of ZIKV infection along with mild or inapparent dengue-like disease symptoms result in low detection rate of ZIKV through surveillance system. Analysis suggests that even in the United States, the detection rate was likely lower than 5% [25], indicating that the surveillance system only revealed the tip of the iceberg of the total travel-associated ZIKV infections entering the contiguous US. Third, the publicly available data on travel-associated ZIKV infections in the contiguous US are spatially aggregated at state level and are not temporally resolved. However, since ZIKV transmission, like other vector transmitted diseases, is strongly modulated by seasonality [19], both the ZIKV infected individual's time of arrival and location of residence are important to assess his/her risk of triggering local ZIKV transmissions.
To overcome the challenge mentioned above and unveil the full scope of the ZIKV infection importations through the surveillance system, we consider the output of the computational model developed by Zhang et al. [28], and briefly reiterated in Section 1. Specifically, for each of the 1736 simulated ZIKV epidemics, we extract all the travel-associated ZIKV infections entering the US whose time and airport of entry, stage of infection (exposed or infectious), and country of origin are known. We further divide the contiguous US into census areas (referred to as airport basins hereafter) centered on airport transportation hubs using a Voronoi tessellation (details of the method can be found in [1]). For each travel-associated ZIKV infection entering the same airport basin in the US, we randomly assign his/her location of residence within the airport basin at a resolution of 2.5km × 2.5km. The likelihood of assignment is proportional to the population density [6] of the geographical location. This allows us to create the simulated travel associated case counts (TCC) for each of the 1736 simulations.

The environmental suitability of ZIKV transmission in the United States
The risk of a travel-associated ZIKV infection to trigger local transmission also varies with time and location of entry into the contiguous US. A major driver of this spatiotemporal modulation is the environmental suitability of ZIKV transmission. Previous studies [19] suggest that only limited areas surrounding the gulf coast of the contiguous US are suitable for ZIKV transmission all year long. In contrast, a large fraction of the contiguous US is not suitable for ZIKV transmission due to the limited range of the Aedes mosquitoes' spatial distribution [4,14]. Transmission suitability in the rest of the contiguous US varies with seasonality, resulting in a transmission-allowed time window that in general decreases in length with the yearly average temperature [19,2,3]. When a travel-associated ZIKV infection enters the contiguous US, he/she has the potential to trigger ZIKV local transmission only during the "transmission-allowed" time window of his/her location of residence. The ZIKV transmission model described in Sections 1.2 and 1.3 explicitly considers the environmental determinants of ZIKV transmission including temperature and vector distribution. Here, we inform this model with high definition data on temperature [27] and Aedes mosquito distribution [13] in the contiguous US to obtain the spatiotemporally resolved environmental suitability of ZIKV transmission. Figure S3 shows the basic reproduction number R 0 at four different seasons in the contiguous US. From the figure, we can clearly observe the strong spatiotemporal heterogeneity of the Zika virus environmental suitability in the contiguous US.

Sensitivity analysis and counterfactual scenarios
To illustrate that the spatiotemporal variations of the three main drivers for ZIKV transmission (i.e., environmental suitability of ZIKV transmission, the socioeconomic risk of exposure to mosquitos, and the intensity of ZIKV infection importations) give rise to the spatial heterogeneity of the risk of local ZIKV transmission, we create three counterfactual scenarios. In each scenario, we modify one of the three drivers across the contiguous United States to uniformly mimic the settings in Miami-Dade, Florida while holding the two other drivers the same as in the main text analysis (referred to as the reference scenario hereafter).
Specifically, in Counterfactual Scenario 1, the environmental factor (the temperature and thus all temperaturemodulated disease parameters) and socioeconomic risk of exposure to mosquitoes remain the same as the reference scenario in the main manuscript, while for all airports in the US, the ZIKV infection importations are set to be the same as those of the airport in Miami-Dade, Florida. By repeating the same analysis performed with the actual data for the reference scenario, the county-level risk map of local ZIKV transmission in this scenario is shown in Figure S4.
Since the airport of Miami-Dade, Florida has one of the highest volumes of ZIKV infection importation, for areas that are environmentally suitable for ZIKV transmission, and socioeconomically at high risk of exposure to mosquitoes, the cumulative risk of local ZIKV transmission appears to be much higher when compared to the reference scenario presented in the manuscript. Figure S4: The cumulative risk map of local ZIKV transmission for each county in the contiguous US of Counterfactual Scenario 1, where the ZIKV infection importations are set to be the same as those of the airport in Miami-Dade, Florida across all airports in the contiguous US. The color scale indicates for any given county the probability of experiencing at least one ZIKV outbreak with more than 20 infections.
In Counterfactual Scenario 2, the level of ZIKV infection importation and the socioeconomic risk of exposure to mosquitoes remains the same as reference scenario. However, the temperature and consequently all temperaturemodulated parameters of ZIKV transmission model across the contiguous US are set to be the same as that in Miami-Dade, Florida. The county risk map of local ZIKV transmission during the year 2015-2016 is shown in Figure  S5.
In Miami-Dade, Florida, the environmental conditions favor ZIKV transmission, with the basic reproduction number R 0 greater than 1 all year long. Therefore, in this counterfactual scenario we observe that the regions at risk of local ZIKV transmission extend to multiple geographical regions previously not environmentally suitable for local ZIKV transmission (R 0 < 1). For instance we observe substantial risk of ZIKV local transmission in states such as Maine, New York, Illinois, Oregon, and Idaho.
In Counterfactual Scenario 3, the level of ZIKV infection importations and the environmental suitability (temperature and consequently temperature-modulated disease parameters) remain the same in the reference scenario. However, in this scenario, the socioeconomic risks of exposure to mosquitoes across the contiguous US are set to be the same as that in Miami-Dade, Florida. In this case the county risk map of local ZIKV transmission during the year 2015-2016 is shown in Figure S6.
Due to the low socioeconomic risk of exposure to mosquitoes in Miami-Dade, Florida in the reference scenario, in the Counterfactual Scenario 3 the only geographical location that shows substantial risk is southern Florida. In contrast, in the reference scenario areas in southern Florida, southern Texas and densely populated regions along the Gulf Coast are all areas at risk.
In summary, the simulations of all three counterfactual scenarios generate unrealistic risk maps of ZIKV local transmission across the contiguous United States. This sensitivity analysis confirms the importance of considering all the main drivers of ZIKV transmission, i.e., environmental suitability, socioeconomic risk of exposure to mosquitoes, and intensity of case importations, to quantitatively assess the risk of local ZIKV transmission in the contiguous US. Figure S5: The cumulative risk map of local ZIKV transmission for each county in the contiguous US of counterfactual scenario 2, where the temperature settings across the contiguous US are set to be the same as that in Miami-Dade, Florida. The color scale indicates for any given county the probability of experiencing at least one ZIKV outbreak with more than 20 infections. Figure S6: The cumulative risk map of local ZIKV transmission for each county in the contiguous US of counterfactual scenario 3, where the socioeconomic risks of exposure to mosquitos across the contiguous US are set to be the same as that in Miami-Dade, Florida. The color scale indicates for any given county the probability of experiencing at least one ZIKV outbreak with more than 20 infections.