Predictability and epidemic pathways in global outbreaks of infectious diseases: the SARS case study
© Colizza et al. 2007
Received: 10 October 2007
Accepted: 21 November 2007
Published: 21 November 2007
The global spread of the severe acute respiratory syndrome (SARS) epidemic has clearly shown the importance of considering long-range transportation networks in the understanding of emerging disease outbreaks. The introduction of extensive transportation data sets is therefore an important step toward developing realistic epidemic models.
We develop a general stochastic meta-population model that incorporates actual travel and census data among 3 100 urban areas in 220 countries. The model allows probabilistic predictions on the likelihood of country outbreaks and their magnitude. The level of predictability offered by the model can be quantitatively analyzed and related to the appearance of robust epidemic pathways that represent the most probable routes for the spread of the disease.
In order to assess the predictive power of the model, the case study of the global spread of SARS is considered. The disease parameter values and initial conditions used in the model are evaluated from empirical data for Hong Kong. The outbreak likelihood for specific countries is evaluated along with the emerging epidemic pathways. Simulation results are in agreement with the empirical data of the SARS worldwide epidemic.
The computational approach presented here shows that the integration of long-range mobility and demographic data provides epidemic models with a predictive power that can be consistently tested and theoretically motivated. This computational strategy can therefore be considered a general tool for analyzing and forecasting the global spread of emerging diseases and for defining containment policies aimed at reducing the effects of potentially catastrophic outbreaks.
The outbreak of severe acute respiratory syndrome (SARS) in 2002–2003 represented a serious public health threat to the international community. Its rapid spread to regions far away from the initial outbreak created great concern about the potential ability of the virus to affect a large number of countries and required a coordinated effort aimed at its containment. Most importantly, it clearly showed that people's mobility along commercial airline routes is the major channel for the propagation of emerging diseases at the global scale. Spatio-temporal structures of human movements thus need to be considered in a global analysis of epidemic outbreaks, as was done, for example, in an earlier study that incorporates the airline network structure of the 500 largest airports in the world.
In this article, we present a stochastic meta-population epidemic model, based on the extension of the deterministic modeling approach to global epidemic diffusion [4, 5], for the study of the worldwide spread of emerging diseases that includes the complete International Air Transport Association (IATA) commercial airline traffic database associated with urban areas census information [6, 7]. Once the disease parameters are determined, no free adjustable parameters are left in the model. A toolkit of specific indicators that consider the stochastic nature of the process is introduced to provide risk analysis scenarios and to assess the reliability of epidemic forecasts. In particular, the predictive power of the model is linked to the emergence of epidemic propagation pathways related to the complex properties of the transportation network. The SARS epidemic is used as a case study to assess the model effectiveness and accuracy against real data. The model considers disease parameters estimated from the Hong Kong outbreak in a way consistent with the global nature of the meta-population model by including the impact of infectious individuals traveling in and out of the city. The temporal and geographic pattern of the disease is analyzed, and the proposed toolkit of epidemic indicators is tested against empirical data.
We adopt a global stochastic meta-population model that considers a set of coupled epidemic transmission models. The approach is in the same spirit as the deterministic models used for the global spread of infectious diseases and their successive stochastic generalizations [3, 6, 7], where each compartmental model represents the evolution of the epidemic within one urban area, and the models are coupled by air travel. The air travel data from the IATA database are included in the model and determine the traveling probabilities. The database includes the 3 100 largest commercial airports around the globe and 17 182 connections among them, accounting for more than 99% of the total worldwide traffic. Each airport is surrounded by the corresponding urban area, whose population is assumed to be homogeneously mixed for the disease dynamics. Population data are collected from several census databases (see [6, 7] for more specific details). The model is fully stochastic and takes into account the discrete nature of individuals both in the travel coupling and in the compartment transitions.
The transmission model within each urban area follows a compartmentalization specific to the disease under study. For instance, in the case of a simple Susceptible-Infected-Recovered (SIR) model, the population $N_j$ of a city $j$ is subdivided into susceptible, infectious and recovered individuals, so that $N_j = S_j(t) + I_j(t) + R_j(t)$, where $S_j(t)$, $I_j(t)$ and $R_j(t)$ represent the number of individuals in the corresponding compartments at time $t$. To account for the discrete nature of individuals in the stochastic evolution of the infection dynamics, we describe the disease propagation inside each urban area by means of binomial and multinomial processes. Two kinds of processes are considered in the infection dynamics: the contagion process (i.e. the generation of new infectious individuals through the transmission of the disease from infectious individuals to susceptibles) and the transition of individuals from one compartment to another (e.g. from infectious to recovered). In the first class of processes it is assumed that each susceptible in city $j$ is infected through contact with an infectious individual at rate $\beta I_j(t)/N_j$, where $\beta$ is the transmission rate of the disease. The number of new infections generated in city $j$ is extracted from a binomial distribution with probability $\beta I_j(t)\Delta t/N_j$ and number of trials $S_j(t)$, where $\Delta t$ is the considered time interval. The second class describes a transition process (e.g. $I_j \to R_j$ with rate $\mu$ in the SIR model for city $j$), where the number of individuals changing compartment is extracted from a binomial distribution with probability given by the transition rate over the interval $\Delta t$ ($\mu\Delta t$ in the previous example) and number of trials given by the number of individuals in the compartment at time $t$ ($I_j(t)$ in the previous example).
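The two binomial processes described above can be illustrated for a single city with a minimal sketch. It assumes a plain SIR compartmentalization, takes the recovery probability per step as $\mu\Delta t$, and uses illustrative parameter values that are not the SARS estimates; the function names `binomial` and `sir_step` are hypothetical and not taken from the original model code.

```python
import random

def binomial(n, p, rng):
    """Draw from Binomial(n, p) by summing n Bernoulli trials (adequate for small n)."""
    return sum(1 for _ in range(n) if rng.random() < p)

def sir_step(S, I, R, beta, mu, dt, rng):
    """Advance one city's SIR compartments by one time step of length dt."""
    N = S + I + R
    # Contagion: each susceptible is infected with probability beta * I * dt / N.
    p_inf = min(1.0, beta * I * dt / N) if N > 0 else 0.0
    # Transition I -> R: each infectious individual recovers with probability mu * dt
    # (assumption: rate multiplied by the time step).
    p_rec = min(1.0, mu * dt)
    new_inf = binomial(S, p_inf, rng)
    new_rec = binomial(I, p_rec, rng)
    return S - new_inf, I + new_inf - new_rec, R + new_rec

# Illustrative run: 1000 inhabitants, 10 initially infectious, daily steps.
rng = random.Random(42)
S, I, R = 990, 10, 0
for day in range(60):
    S, I, R = sir_step(S, I, R, beta=0.5, mu=0.2, dt=1.0, rng=rng)
print(S, I, R)
```

Because every draw is an integer count, the population is conserved exactly at each step and a city with few infectious individuals can lose the infection by chance, which a deterministic model cannot reproduce.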
Each compartmental model in a given urban area is then coupled to the compartmental models of other urban areas via a stochastic travel operator that identifies the number of individuals in each compartment traveling from urban area $i$ to urban area $j$. The number of passengers in compartment $X$ traveling from a city $i$ to a city $j$ is an integer random variable, in that each of the $X_i$ potential travelers has a probability $p_{ij} = w_{ij}/N_i$ of going from $i$ to $j$, where $w_{ij}$ is the traffic, according to the data, on the given connection in the considered time interval and $N_i$ is the urban area population. In each city $i$, the numbers of passengers traveling on each connection at time $t$ define a set of stochastic variables that follows a multinomial distribution. In addition, other routing constraints and two-leg travel can be considered. A detailed mathematical description of the travel coupling is reported in [6, 7, 12].
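The travel coupling can be sketched as a multinomial draw over the outgoing connections of a city. The sketch below assigns each individual of a compartment independently to a destination with probability $p_{ij} = w_{ij}/N_i$, which is equivalent to sampling the multinomial; the function name `sample_travelers`, the destination labels and the traffic figures are hypothetical, and routing constraints and two-leg travel are not modeled.

```python
import random

def sample_travelers(x_i, traffic_out, n_i, rng):
    """Split x_i individuals of one compartment in city i among destinations.

    traffic_out maps destination -> w_ij (passengers per time step).
    Each individual travels to j with probability p_ij = w_ij / N_i and
    stays in city i with probability 1 - sum_j p_ij.
    """
    dests = list(traffic_out)
    probs = [traffic_out[j] / n_i for j in dests]
    assert sum(probs) <= 1.0, "outgoing traffic cannot exceed the population"
    counts = {j: 0 for j in dests}
    for _ in range(x_i):
        u = rng.random()
        cumulative = 0.0
        for j, p in zip(dests, probs):
            cumulative += p
            if u < cumulative:
                counts[j] += 1
                break  # individual assigned; otherwise it stays in city i
    return counts

# Illustrative values: 50 infectious individuals in a city of 100 000 inhabitants.
rng = random.Random(0)
moved = sample_travelers(x_i=50, traffic_out={"A": 2000, "B": 500},
                         n_i=100_000, rng=rng)
staying = 50 - sum(moved.values())
print(moved, staying)
```

Applying the same draw to every compartment of every city, and then adding the arrivals to the destination compartments, gives one step of the coupling under these assumptions.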
The model defined above considers stochastic fluctuations both in the individual compartmental transitions and in the traveling events. This implies that, in principle, each model realization may differ from all the others, even with the same initial conditions. In this context, the comparison of a single realization of the model with the real evolution of the disease may be very misleading. Similarly, the mere comparison of the number of cases obtained in each country, averaged over several realizations, with the number of cases that actually occurred is a poor indicator of the reliability of the prediction. Indeed, in many cases the average would include a large number of realizations with no outbreak in a variety of countries. It is therefore crucial to distinguish in each country (or, at a higher resolution, in each urban area) the non-outbreak from the outbreak realizations and to evaluate the number of cases conditional on the occurrence of the latter. For this reason, we define in the following a set of indicators and analysis tools that can be used to provide scenario forecasts and comparisons with real-world data.
The likelihood of experiencing an outbreak can be estimated by analyzing different stochastic realizations of the epidemic with the same initial conditions and by evaluating the probability that the infection will reach a given country. In the following we consider statistics over $10^3$ different realizations of the stochastic noise and define the probability of outbreak in each country as the fraction of realizations that produced a positive number of cases within the country. This allows for the identification of areas at risk of infection, with a corresponding quantitative measure expressed by the outbreak probability. A more quantitative analysis is obtained by inspecting the predicted cumulative number of cases for each country, conditional on the occurrence of an outbreak in the country. The outbreak likelihood and magnitude analysis can be broken down to the level of single urban areas. In the following section we present an example of the results available at this resolution scale.
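As a minimal sketch of these two indicators, the following computes, for each country, the fraction of realizations with a positive number of cases and the mean cumulative number of cases conditional on an outbreak. The input layout (one dictionary of cumulative cases per realization), the function name `outbreak_statistics` and the country labels are assumptions made for illustration.

```python
def outbreak_statistics(realizations):
    """realizations: list of dicts mapping country -> cumulative number of cases.

    Returns country -> (outbreak probability, mean cases given an outbreak).
    """
    n = len(realizations)
    countries = set().union(*realizations)
    stats = {}
    for country in sorted(countries):
        # Keep only the realizations in which the country had at least one case.
        hits = [r.get(country, 0) for r in realizations if r.get(country, 0) > 0]
        probability = len(hits) / n
        mean_if_outbreak = sum(hits) / len(hits) if hits else 0.0
        stats[country] = (probability, mean_if_outbreak)
    return stats

# Four toy realizations with hypothetical countries "X" and "Y".
runs = [{"X": 12, "Y": 0}, {"X": 8}, {"X": 0, "Y": 4}, {"X": 20, "Y": 0}]
stats = outbreak_statistics(runs)
print(stats)
```

In this toy input, country "Y" has an outbreak in one realization out of four with 4 cases, so its conditional mean is 4 whereas the unconditional average over all realizations would be 1, which illustrates why averages that include non-outbreak realizations understate the expected magnitude.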
The overlap $\Theta(t)$ between two realizations of the epidemic that start from the same initial conditions assumes values between 0 and 1: it equals 0 if at time $t$ the two epidemic patterns do not share any infected city, and 1 if at time $t$ the two realizations are identical. The more predictable an outbreak is, the more similar two realizations are likely to be, leading to a high value of the overlap function. In view of the strong fluctuations inherent to the infection process and the movement of individuals, an appreciable overlap is possible only in the presence of a robust mechanism driving the disease propagation and leading to the emergence of epidemic pathways, i.e. preferential channels along which the epidemic will more likely spread [6, 7]. These pathways may in turn originate from the large heterogeneities in the traffic volume associated with the air travel connections, which ranges from a few passengers to $10^6$ passengers per year. To pinpoint the presence of epidemic pathways, one can simulate, starting from identical initial conditions, different outbreaks subject to different realizations of the stochastic noise and obtain the time evolution of the epidemic in each urban area as described in the main text. During the simulations, one observes the propagation of the virus from one country to another by means of air travel and thus monitors the path followed by the infection at the country level. For each outbreak realization, it is possible to identify for each country $C_i$ the country $C_j$ that is the origin of the infection and to construct the graph of virus propagation: if a latent or infectious individual travels from $C_j$ to $C_i$ and causes an outbreak in the not yet infected country $C_i$, a directed link from $C_j$ to $C_i$ is created with weight equal to 1.
Once the origin of infection for $C_i$ has been identified, subsequent introductions into $C_i$ are not considered, as we are only interested in the path followed by the disease in infecting a geographical region not yet infected. After a statistically significant number of realizations, a directed weighted network is obtained in which the direction of a link indicates the direction of the virus diffusion and the weight represents the number of times this flow has been observed out of $n$ realizations. For each country $C_i$ we normalize to 1 the sum of the weights on all incoming links, in order to define the probability of infection along each flow. The network of epidemic pathways is then pruned by deleting all directed links with an occurrence probability below a given threshold, in order to clearly identify the major pathways along which the epidemic will spread. This information identifies for each country the possible origins of infection and provides a quantitative estimate of the probability of receiving the infection from each identified origin. It is therefore of crucial importance for the development and assessment of the preparedness plans of single countries. Travel advisories or limitations and medical screenings at the ports of entry, such as those put in place during the SARS epidemic, might strongly benefit from the analysis and identification of such epidemic pathways.
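The construction of the pathway network (counting first introductions, normalizing the incoming weights of each country, pruning by threshold) can be sketched as follows. The input format, with one dictionary per realization mapping each infected country to its origin of infection, is an assumption, as are the function name `epidemic_pathways`, the country labels and the threshold value.

```python
from collections import defaultdict

def epidemic_pathways(origin_records, threshold):
    """origin_records: one dict per realization, mapping each newly infected
    country to the country that seeded it (first introduction only).

    Returns (source, target) -> probability for links at or above threshold.
    """
    # weights[target][source] = number of realizations with link source -> target
    weights = defaultdict(lambda: defaultdict(int))
    for record in origin_records:
        for target, source in record.items():
            weights[target][source] += 1
    pathways = {}
    for target, incoming in weights.items():
        total = sum(incoming.values())  # normalize incoming weights to 1
        for source, weight in incoming.items():
            probability = weight / total
            if probability >= threshold:  # prune links below the threshold
                pathways[(source, target)] = probability
    return pathways

# Four toy realizations with hypothetical countries A, B and C.
records = [
    {"B": "A", "C": "B"},
    {"B": "A", "C": "A"},
    {"B": "A", "C": "B"},
    {"B": "A"},
]
paths = epidemic_pathways(records, threshold=0.5)
print(paths)
```

Here country C is infected from B in two realizations and from A in one, so the link from B to C has probability 2/3 and is kept, while the link from A to C has probability 1/3 and is pruned.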
$T_0$: Initial offset from 21 February (days)
$\beta$: Rate of transmission
$L(t = 0)$: Number of initial latent individuals
$s_f(t)$: Scaling factor for the rate of transmission (time periods: 21 February + $T_0$ to 20 March; 21 March to 9 April; 10 April to 11 July)
Relative infectiousness of patients at the hospital
Average latency period (days)
Average period from onset of symptoms to admission (days) (time periods: 21 February + $T_0$ to 25 March; 25 March to 1 April; 2 April to 11 July)
$\mu_R^{-1}$: Average period from admission to recovery (days)
$\mu_D^{-1}$: Average period from admission to death (days)
Case fatality rate
Initial conditions are based on available evidence on the early stages of the outbreak and assume as index patient the first case detected outside mainland China, who arrived in Hong Kong on 21 February 2003. Simulations are seeded in Hong Kong $T_0$ days after 21 February with the index patient and $L(t = 0)$ initial latent individuals. This effectively accounts for the observed super-spreading events [31–34] and for multiple transmissions before the index patient was hospitalized. The complete time frame under study runs from $T_0$ days after 21 February to 11 July 2003, the date of the last daily update by the World Health Organization (WHO) on the cumulative number of reported probable cases of SARS.
The values of the transmission rate $\beta$, of the initial number of latent individuals $L(t = 0)$, and of the initial delay of $T_0$ days are determined through a least-squares fit that optimizes the agreement of the stochastic simulation results with the Hong Kong data. The advantage with respect to previous approaches is that no closed boundaries are imposed on Hong Kong, which allows for the mobility of individuals traveling into the city and for a decrease of the pool of infectious individuals as some leave the city by air travel. The optimization gives the following baseline values: $\beta$ = 0.57 [0.56–0.59], $L(t = 0)$ = 10 [8–11], $T_0$ = 3 days, where the errors reported for $\beta$ and $L(t = 0)$ correspond to a relative variation of 10% in the least-squares value from the minimum, once the offset is set to its optimal value. The obtained value of the reproductive number, $R_0$ = 2.76, is in agreement with previous estimates. We also tested different initial conditions that do not effectively incorporate super-spreading events, with no substantial changes in the results.
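The fitting logic, including the way the reported intervals follow from a 10% relative variation of the least-squares value, can be illustrated with a small grid search. This is only a sketch under strong assumptions: `toy_curve` is a hypothetical exponential stand-in for the stochastic simulator, the data are synthetic, the offset $T_0$ is held fixed and omitted, and the parameter grids are arbitrary.

```python
import math

def sse(model, observed):
    """Sum of squared residuals between a model curve and the observations."""
    return sum((m - o) ** 2 for m, o in zip(model, observed))

def toy_curve(beta, l0, days):
    """Hypothetical stand-in for the simulator: exponential early growth."""
    return [l0 * math.exp(beta * t) for t in range(days)]

def fit(observed, betas, l0s, tolerance=0.10):
    """Grid search for the least-squares optimum.

    Also returns every parameter pair whose least-squares value lies within
    a relative variation `tolerance` of the minimum.
    """
    scores = {(b, l0): sse(toy_curve(b, l0, len(observed)), observed)
              for b in betas for l0 in l0s}
    best = min(scores, key=scores.get)
    s_min = scores[best]
    near = [k for k, s in scores.items() if s <= s_min * (1 + tolerance)]
    return best, near

# Synthetic observations: the toy curve shifted by one case per day.
observed = [v + 1.0 for v in toy_curve(0.3, 5, 10)]
best, near = fit(observed, betas=[0.1, 0.2, 0.3, 0.4], l0s=[3, 5, 7])
print(best, near)
```

With a finer grid, the range of parameter values appearing in `near` plays the role of the bracketed intervals quoted above; in the actual procedure the model curve would come from the stochastic simulations rather than from a closed-form expression.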
Forecasted number of cases for the countries with an incorrect prediction of outbreak
United Arab Emirates
While the results shown in Figures 2, 3, 4, 5 indicate that computational models can attain predictive power, the present case study of the SARS epidemic is still subject to approximations and assumptions, and very accurate predictions require the introduction of much more detail into the computational scheme. Taiwan is the prominent example in which the magnitude of the outbreak is not well predicted. Indeed, the model still lacks many features that are likely responsible for the deviations of the simulation results from the official reports. Population heterogeneity in terms of travel frequency, obviously related to wealth distribution, is not considered. Specific features of single countries (such as health care systems, specific control strategies, travel screening, etc.) are not taken into account by our stochastic model. We also do not consider variations in the virus transmissibility among infected individuals in each compartment. Similarly, we account only in an effective way for the occurrence of super-spreading events and outbreak diversity in the initial stage of SARS transmission [33, 34, 36]. By contrast, the model readily allows the integration of the heterogeneity shown by SARS in the way it affected different countries, which represented a very peculiar feature of the virus. Finally, the inclusion of other transportation systems is likely to have an impact in contiguous geographical regions.
It is also worth noting, however, that the countries for which the forecasts underestimate the empirical data showed some peculiarities in the evolution of the SARS spread. Taiwan, for instance, experienced an anomalous outbreak explosion after a temporary failure of the infection containment procedures in a single hospital. Moreover, the situation in mainland China is not trivially reproducible, owing to the lack of available information on the actual initial conditions of the spread. Results for China are therefore not reported in the charts of Figure 5, as numerical simulations seeded in Hong Kong are unlikely to describe the outbreak that occurred in that country. This fact is expected to have an impact especially on South-East Asian countries, such as Taiwan, Singapore and Vietnam, that have airline connections with large traffic towards China.
The computational approach presented here is the largest-scale epidemic model at the worldwide level. Its good agreement with historical data of the SARS epidemic suggests that the transportation and census data used here are the basic ingredients for the forecast and analysis of emerging disease spreading at the global level. A more detailed version of the model, including the interplay of different transportation systems, information about the specific conditions experienced by each country and a refined compartmentalization that accounts for variations in susceptibility and heterogeneity in infectiousness, would clearly represent a further improvement in the a posteriori analysis of epidemic outbreaks. In the case of a newly emerging global epidemic, the computational approach could be useful in drawing possible scenarios for the epidemic evolution. Though the initial conditions and the disease parameters will be unknown until the disease has already spread to a few countries, the computational approach would nevertheless allow the rapid exploration of a wide range of values for the basic parameters and initial conditions, providing extensive data on the worst- and best-case scenarios as well as likelihood intervals, to the benefit of decision makers. In general, the encouraging results achieved with the present level of detail in the modeling scheme suggest that large-scale computational approaches can be a useful predictive tool for assessing risk management and preparedness plans for future emerging diseases and for understanding space-time variations of outbreak occurrences.
The authors thank the International Air Transport Association for making the commercial airline database available. AB and AV are partially funded by the European Commission-contract 001907 (DELIS). AV is partially funded by the NSF award IIS-0513650.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.