The ethics committee of Lyon University Hospital approved this study, and all participants gave signed, written informed consent. The data were collected anonymously.
Data collection platform
Contact network measurements are based on the SocioPatterns RFID platform (http://www.sociopatterns.org) [28, 30]. With this method, subjects wear a badge equipped with an active radiofrequency identification (RFID) device (tag). RFID devices engage in bidirectional radio communication at multiple power levels, exchanging packets that contain a device-specific identifier. At low power level, packets can only be exchanged between tags within a radius of 1 to 2 meters [28, 30]. This threshold is set to allow detection of a close-contact situation, during which a communicable disease infection can be transmitted, either by airborne transmission through coughing or sneezing, or directly by physical contact. Subjects wear the RFID badges on their chest, so that contacts are recorded only when participants face each other, as the body acts as a shield for the proximity-sensing RF signals. In addition to sensing nearby devices, RFID tags send the locally collected contact information to a number of receivers installed in the environment, which relay this information over a local area network to a computer system used for monitoring and data storage. Proximity scans are performed at random times, and each tag dispatches information to the receivers every few seconds. Time is then coarse-grained over 20 second intervals, during which face-to-face proximity can be assessed with a confidence in excess of 99% [28, 30]. This time scale is also adequate to follow the dynamics of social interaction.
All communication (from tag to tag, from tags to receivers, and from receivers to the data storage system) is encrypted. Contact data are stored in encrypted form, and all data management is completely anonymous. Other details on the data-collection infrastructure can be found elsewhere [28, 30].
Data collection in this study
Participants attending the 2009 Annual French Conference on Nosocomial Infections (http://www.sf2h.net/) were asked to wear RFID tags; of the 1,200 attendees, 405 volunteers wore the tags. Face-to-face interactions between these 405 volunteers were collected during 2 days of the conference (3rd and 4th of June 2009). The data were collected from 9 am to 9 pm on the first day and from 8.30 am to 4.30 pm on the second day (periods defined as 'day' in the following text). Contacts were not recorded outside of these time periods (periods defined as 'nights').
Empirical contact networks
To assess the role of the dynamic nature of the network of contacts in the dynamics of disease spread, we compared a network built on the explicit representation of the dynamic interactions between individuals (referred to as the dynamic network; DYN) at the shortest available temporal resolution (20 seconds) against two benchmark networks that are built on progressively lower amounts of information available on the interactions, referred to as the heterogeneous (HET) and homogeneous (HOM) networks, respectively.
Firstly, taking advantage of the full spatial and temporal resolution, DYN considered the empirical sequence of successive contact events collected during the congress. Each contact was identified by the RFID identification numbers of the two individuals involved, and by its starting and ending times. The resulting network was a dynamic object encoding the actual chronology and duration of contacts, therefore preserving heterogeneity in the duration of contacts and the causality constraints between events. The latter is particularly important for disease spread, as it may prevent propagation along certain sequences of interactions that would otherwise be allowed in an aggregated static representation of the contact patterns. For example, if a susceptible individual A interacts first with an infectious individual B and then with a susceptible individual C, disease transmission can occur from B to A and then from A to C. If instead, A meets first C and later B, A can become infected from B, but the propagation from B to A and then to C is no longer possible.
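The causality constraint in the A, B, C example can be illustrated with a minimal sketch. It assumes a worst case in which every contact with an infected individual transmits immediately, with no latent period; the function name `reachable` and the `(start_time, node_a, node_b)` tuple layout are hypothetical and serve this illustration only.

```python
def reachable(contacts, seed):
    """Return the nodes reachable from `seed` along time-respecting paths.

    `contacts` is a list of (start_time, node_a, node_b) tuples. This layout
    is an assumption for the illustration, not the study's data format.
    Every contact is assumed to transmit (worst case, no latency).
    """
    infected = {seed}
    for _, a, b in sorted(contacts):  # process contacts in chronological order
        if a in infected or b in infected:
            infected.update((a, b))
    return infected

# B is the infectious seed. A meets B first and then C: B -> A -> C is allowed.
order_1 = [(0, "A", "B"), (20, "A", "C")]
# A meets C first and then B: C can no longer be reached through A.
order_2 = [(0, "A", "C"), (20, "A", "B")]

print(reachable(order_1, "B"))
print(reachable(order_2, "B"))
```

In the second ordering only A and B end up infected, whereas a static aggregation of the same two contacts would connect B to C through A.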
The benchmark networks correspond to coarse-graining of the data on a daily scale. The first one, HET, was produced for each conference day by connecting individuals who came in contact during this conference day, thus aggregating all daily dynamic information in a single snapshot, and weighting each link by the total time the two individuals spent in face-to-face presence during the considered day. Therefore, HET included information on the actual contacts between individuals (who has met whom) and on the total duration of these contacts (how long A was in contact with B during the whole day), but disregarded information about the temporal order of contacts. In the previous example, the transmission from A to C could take place in both situations, representing the different sequences of the events. HET was therefore a daily aggregated network in which contacts were aggregated over a day, but the whole neighborhood structure between individuals was kept. As the conference lasted 2 days, the aggregation procedure produced two such networks, one for each day.
By contrast, the HOM network was constructed for each day by connecting individuals who were in face-to-face contact during the conference day, again aggregating all daily dynamic information in a single snapshot, but weighting each link with equal weight, corresponding to the mean duration of contacts between two individuals who have met each other on the same day in the HET network. The HOM construction may correspond to networks constructed by asking each participant to report with whom they have been in contact during the conference day, and then estimating for how long on average this contact lasted. For each conference day, HET and HOM have exactly the same structure of interactions from a topological point of view, but they differ by the assignments of weights on the links.
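As a sketch of the two aggregation steps described above, the following code builds HET and HOM link weights for one day from a list of contact events. The function name `daily_networks` and the `(node_a, node_b, start_s, end_s)` tuple layout are assumptions made for the example, and the input is assumed to contain at least one contact.

```python
from collections import defaultdict

def daily_networks(contacts):
    """Build HET and HOM link weights for a single conference day.

    `contacts` holds (node_a, node_b, start_s, end_s) tuples; this layout is
    hypothetical. Links are undirected, so pairs are stored as frozensets.
    """
    het = defaultdict(float)
    for a, b, start, end in contacts:
        het[frozenset((a, b))] += end - start  # cumulative contact time
    mean_w = sum(het.values()) / len(het)      # mean weight over HET links
    hom = {pair: mean_w for pair in het}       # same links, equal weights
    return dict(het), hom

# Toy input: A and B meet twice (40 s + 20 s), B and C meet once (20 s).
day = [("A", "B", 0, 40), ("A", "B", 100, 120), ("B", "C", 60, 80)]
het, hom = daily_networks(day)
print(het)
print(hom)
```

Both dictionaries share the same keys, which reflects the statement that HET and HOM have the same topology and differ only in the link weights.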
Generation of contact networks on longer timescales
Because we simulated the spread of a realistic infectious disease, which would be characterized by longer timescales than the data collection period, we introduced three different procedures to longitudinally extend the data-driven network while preserving some of its features. The simplest procedure consisted of repeating the 2-day recordings. This repetition procedure, denoted as REP, was performed both for the dynamic sequence of contacts (DYN) and consistently for the set of daily HET and HOM networks. In this simple procedure, the same contacts were repeated for each attendee for each simulated sequence of 2 days; that is, the assumption was made that the same attendee always met the same set of other attendees, in the same order, and for the same duration. Although this procedure yields a realistic contact pattern for each single day, because it uses only empirical data, such a 'deterministic' repetition becomes increasingly unrealistic as time goes on. We therefore considered two additional procedures that might address this limitation.
The first one, random shuffling (RAND-SH), consisted of producing 2-day sequences by randomly reshuffling the participants' identities, as given by their tag IDs. The overall sequence of contacts was preserved, but each contact was set as occurring between different attendees from one 2-day sequence to the next. DYN networks were then constructed as before, taking into account the 20-second temporal resolution, and the HET and HOM networks were obtained by aggregating the data for each day, as explained above. This method yields more realistic contact patterns and avoids the unrealistic repetition of interactions between individuals. However, the RAND-SH procedure completely erases any correlations between the contact patterns of an attendee in successive 2-day sequences, which is also unrealistic. Analysis of the empirical contact networks shows that in fact a correlation did exist between the number of contacts of an attendee in the first and second conference days, and also that a fraction of contacts were repeated from one day to the next.
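The identity reshuffling behind RAND-SH can be sketched as a random permutation of tag IDs applied to an unchanged sequence of contact events. The function name `shuffle_identities` and the tuple layout, with two tag IDs followed by any timing fields, are assumptions for this illustration; the study's actual implementation is not described at this level of detail.

```python
import random

def shuffle_identities(contacts, rng):
    """Return a copy of `contacts` with tag IDs randomly permuted.

    `contacts` holds tuples whose first two fields are tag IDs and whose
    remaining fields (for example start and end times) are left untouched.
    """
    ids = sorted({n for a, b, *_ in contacts for n in (a, b)})
    permuted = ids[:]
    rng.shuffle(permuted)                 # random one-to-one relabelling
    mapping = dict(zip(ids, permuted))
    return [(mapping[a], mapping[b], *rest) for a, b, *rest in contacts]

two_days = [("A", "B", 0, 40), ("B", "C", 60, 80), ("A", "C", 100, 120)]
next_two_days = shuffle_identities(two_days, random.Random(0))
print(next_two_days)
```

Because only the labels change, the timing and duration of every contact are preserved, while the pairing of attendees differs from one 2-day sequence to the next.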
Therefore, we designed a third procedure (constrained shuffling; CONSTR-SH) for the generation of synthetic contact patterns starting from the 2-day sequence, which constrained the reshuffling to preserve the correlations between the attendees' social activity and the same fraction of repeated contacts during successive days (see Additional file 1).
It is important to note that in all cases we preserved the time frame during which data were collected, because no collection occurred outside the conference premises. For this reason, each individual was considered as isolated during the 'night' periods in the DYN network. We therefore also introduced such 'nights' in the HET and HOM networks by 'switching off' the links (that is, considering individuals as isolated) during these periods, thus resembling the circadian pattern encoded by the empirical data.
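The 'switching off' of links during nights can be sketched as a simple time test. The recording windows below come from the collection times stated earlier (9 am to 9 pm on the first day, 8.30 am to 4.30 pm on the second); the function name `links_active`, the time origin at midnight of the first day, and the periodic repetition every 2 days are assumptions of this sketch.

```python
DAY_S = 24 * 3600

# Recording windows in seconds since midnight of day 1:
# day 1 from 9:00 to 21:00, day 2 from 8:30 to 16:30.
WINDOWS = [
    (9 * 3600, 21 * 3600),
    (DAY_S + 8 * 3600 + 1800, DAY_S + 16 * 3600 + 1800),
]

def links_active(t):
    """Return True if time `t` (in seconds) falls in a 'day' period."""
    t_mod = t % (2 * DAY_S)  # the 2-day sequence is assumed to repeat
    return any(start <= t_mod < end for start, end in WINDOWS)

print(links_active(10 * 3600))          # 10:00 on day 1
print(links_active(22 * 3600))          # 22:00 on day 1
print(links_active(DAY_S + 8 * 3600))   # 8:00 on day 2
```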
Epidemiological model
We considered a simple SEIR epidemic model for the simulation of the infectious-disease spread in the population under study, in which no births, deaths or introduction of new individuals occurred. Individuals were each assigned to one of the following disease states: Susceptible (S), Exposed (E), Infectious (I) or Recovered (R).
The model is individual-based and stochastic. Susceptible individuals may contract the disease with a given rate when in contact with an infectious individual, and enter the exposed disease state when they become infected but are not yet infectious themselves. These exposed individuals become infectious at a rate $\sigma$, with $\sigma^{-1}$ representing the mean latent period of the disease. Infectious individuals can transmit the disease during their infectious period, whose mean duration is equal to $\nu^{-1}$. After this period, they enter the recovered phase, acquiring permanent immunity to the disease.
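The spontaneous transitions of the model (exposed to infectious, infectious to recovered) can be sketched for one individual and one time step. This is a minimal illustration and not the study's simulation code: the function name `advance`, the state labels, and the use of rate × dt as a per-step transition probability (a small-dt approximation) are assumptions, with `nu` standing for the inverse of the mean infectious period. Infection of susceptible individuals depends on contacts and is not handled here.

```python
import random

def advance(state, sigma, nu, dt, rng):
    """Advance one individual by one time step of length `dt`.

    `sigma` and `nu` are rates in 1/s; rate * dt is used as the transition
    probability per step, which is a reasonable approximation for small dt.
    """
    if state == "E" and rng.random() < sigma * dt:
        return "I"   # end of the latent period
    if state == "I" and rng.random() < nu * dt:
        return "R"   # recovery with permanent immunity
    return state     # S and R do not change spontaneously

rng = random.Random(1)
sigma = 1 / 86400        # mean latent period of 1 day
nu = 1 / (2 * 86400)     # mean infectious period of 2 days
print(advance("E", sigma, nu, 20, rng))
```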
To compare simulation results obtained from the three different networks, we needed to adequately define the rate of infection for a given infectious-susceptible pair, depending on the definition of the networks themselves. $\beta$ was defined as the constant rate of infection from an infected individual to one of their susceptible contacts on the unitary time step $dt$ of the process. Given two people, an infectious individual A and a susceptible individual B, who are in contact during the unitary time step, the probability of B becoming infected during this period was given by $\beta\,dt$. To obtain the same mean infection probability in the HET and HOM networks over an entire 24-hour period (day and night), the weights on such networks needed to be rescaled by $W_{AB}/\Delta T$, defined as the ratio between the total sum of the duration of all contacts between A and B in a day, and the effective duration of the day (that is, the total time during which the links in the daily networks were considered active, discarding the 'nights'). Therefore, the probability of infection between A and B during the time step $dt$ was $\beta W_{AB}\,dt/\Delta T$ for the HET network, and $\beta \langle W \rangle\,dt/\Delta T$ for the HOM network (with $\langle W \rangle$ being the mean weight of the links in the HET network).
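A short numerical sketch shows why this rescaling equalizes the infection pressure across network representations. The function names and the example values for the contact duration (600 s) are assumptions chosen for illustration; the 12-hour effective day corresponds to the first recording day, and the transmission rate is that of the first scenario considered in this study.

```python
def p_infect_dyn(beta, dt):
    """Per-step infection probability while A and B are in contact (DYN)."""
    return beta * dt

def p_infect_het(beta, dt, w_ab, delta_t):
    """Per-step probability on a HET link with cumulative duration `w_ab`."""
    return beta * w_ab * dt / delta_t

def p_infect_hom(beta, dt, mean_w, delta_t):
    """Per-step probability on a HOM link with the mean weight `mean_w`."""
    return beta * mean_w * dt / delta_t

beta = 3e-4            # 1/s
dt = 20                # s, the temporal resolution of the data
w_ab = 600             # s, hypothetical total contact time of A and B
delta_t = 12 * 3600    # s, effective duration of the day

# DYN: the link is active only during the w_ab / dt steps of actual contact.
pressure_dyn = (w_ab / dt) * p_infect_dyn(beta, dt)
# HET: the link is active during all delta_t / dt steps of the day.
pressure_het = (delta_t / dt) * p_infect_het(beta, dt, w_ab, delta_t)
print(pressure_dyn, pressure_het)
```

Both sums equal the transmission rate multiplied by the total contact time, so the daily infection pressure on the pair is the same in DYN and HET; in HOM it matches on average, because the mean link weight replaces the pair-specific one.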
We considered two different disease scenarios for the simulations of disease spread on all networks under study. In particular, the following values were assumed for the duration of the mean latency period ($\sigma^{-1}$), mean infectious period ($\nu^{-1}$) and transmission rate ($\beta$): (i) $\sigma^{-1} = 1$ day, $\nu^{-1} = 2$ days and $\beta = 3 \times 10^{-4}\,\mathrm{s}^{-1}$ (very short incubation and infectious periods); and (ii) $\sigma^{-1} = 2$ days, $\nu^{-1} = 4$ days and $\beta = 15 \times 10^{-5}\,\mathrm{s}^{-1}$ (short incubation and infectious periods). These sets of parameter values were chosen to maintain the same value of $\beta/\nu$, which is the biologic factor responsible for the rate of increase of cases during the epidemic outbreak, while changing the global timescales of incubation and infectious periods, and assessing the role played by the social factors embedded in the contact patterns. Short incubation and infectious periods were used so as to minimize the consequences of the arbitrariness in the construction procedures of long datasets as described above. Each simulation started with a single randomly chosen infectious individual, with the rest of the population being in the susceptible state.
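The statement that both scenarios share the same ratio of transmission rate to recovery rate can be checked with a few lines of arithmetic. The dictionary layout and names below are hypothetical; the parameter values are those listed above.

```python
from math import isclose

DAY_S = 86400  # seconds per day

scenarios = {
    "i": {"latent_days": 1, "infectious_days": 2, "beta_per_s": 3e-4},
    "ii": {"latent_days": 2, "infectious_days": 4, "beta_per_s": 15e-5},
}

def beta_over_nu(s):
    # The recovery rate is 1 / (infectious period in seconds), so dividing
    # beta by it equals beta multiplied by the infectious period.
    return s["beta_per_s"] * s["infectious_days"] * DAY_S

ratios = {name: beta_over_nu(s) for name, s in scenarios.items()}
print(ratios)
print(isclose(ratios["i"], ratios["ii"]))
```

Halving the transmission rate while doubling the infectious period leaves the ratio unchanged at about 51.84 (dimensionless), so the two scenarios differ only in their timescales.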
Analysis of the empirical contact networks and of the simulation results
To describe the empirical contact networks, we calculated the number of contacts, the mean duration of contacts, the mean degree of a node (defined as the number of distinct individuals encountered by the individual under scrutiny), the mean clustering coefficient (which describes the local cohesiveness), the mean shortest path (defined as the mean number of links to cross to go from one node to another), and the correlation between the properties of the nodes in the aggregated networks of the first and second conference day. For this correlation analysis, we measured the Pearson correlation coefficients between the degree of an individual in the first and second day, and between the time spent in interaction in the first and second day.
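The day-to-day degree correlation can be sketched as follows. The helper names `degrees` and `pearson`, the toy link lists, and the convention that an individual absent on a given day has degree 0 are assumptions for the illustration; each link list is assumed to contain distinct pairs.

```python
from math import sqrt

def degrees(links):
    """Count the distinct neighbours of each node in a list of (a, b) links."""
    deg = {}
    for a, b in links:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy aggregated networks for two days (not data from the study).
day1 = [("A", "B"), ("A", "C"), ("A", "D")]
day2 = [("A", "B"), ("A", "C"), ("B", "C")]

d1, d2 = degrees(day1), degrees(day2)
nodes = sorted(set(d1) | set(d2))
r = pearson([d1.get(n, 0) for n in nodes], [d2.get(n, 0) for n in nodes])
print(r)
```

The same `pearson` helper applies to the time spent in interaction, using per-node sums of link weights instead of degrees.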
The comparison of the epidemic outbreaks in the three networks under study was performed by analyzing several parameters, namely the final size of the epidemic, the number of infectious individuals during the epidemic peak, the time of the peak, and the duration of the epidemic.
Since we aim at assessing the impact on spreading phenomena of the contact patterns, of their dynamic nature, and of the available amount of details on their dynamics, we also estimated the reproductive number $R_0$, defined as the expected number of secondary infections from an initial infected individual in a completely susceptible host population [1]. Several methods can be used to compute $R_0$ [40, 41], possibly yielding different estimates [42] for the same epidemiological parameters. In this study, we computed the value of $R_0$ as the mean, over different realizations, of the number of secondary cases from the single initial randomly chosen infectious individual. Mean $R_0$ values and variances were then compared for the three networks (DYN, HET and HOM) and the three data-extension procedures (REP, RAND-SH and CONSTR-SH) under study.
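The estimator described above reduces to a mean and a variance over realizations. In the sketch below, the function name `summarize_secondary_cases` is hypothetical and the input counts are placeholder values for illustration, not results from this study; the use of the sample variance (rather than the population variance) is also an assumption.

```python
from statistics import mean, variance

def summarize_secondary_cases(counts):
    """Return the mean and sample variance of secondary-case counts.

    `counts` holds, for each stochastic realization, the number of
    individuals infected directly by the single initial seed.
    """
    return mean(counts), variance(counts)

# Placeholder counts from five imaginary realizations (not study results).
toy_counts = [0, 2, 1, 3, 4]
r0_mean, r0_var = summarize_secondary_cases(toy_counts)
print(r0_mean, r0_var)
```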