Simulation of an SEIR infectious disease model on the dynamic contact network of conference attendees

Background: The spread of infectious diseases crucially depends on the pattern of contacts between individuals. Knowledge of these patterns is thus essential to inform models and computational efforts. However, there are few empirical studies available that provide estimates of the number and duration of contacts between social groups. Moreover, their space and time resolutions are limited, so that data are not explicit at the person-to-person level, and the dynamic nature of the contacts is disregarded. In this study, we aimed to assess the role of data-driven dynamic contact patterns between individuals, and in particular of their temporal aspects, in shaping the spread of a simulated epidemic in the population.
Methods: We considered high-resolution data about face-to-face interactions between the attendees at a conference, obtained from the deployment of an infrastructure based on radiofrequency identification (RFID) devices that assessed mutual face-to-face proximity. The spread of epidemics along these interactions was simulated using an SEIR (Susceptible, Exposed, Infectious, Recovered) model, using both the dynamic network of contacts defined by the collected data, and two aggregated versions of such networks, to assess the role of the temporal aspects of the data.
Results: We show that, on the timescales considered, an aggregated network taking into account the daily duration of contacts is a good approximation to the full resolution network, whereas a homogeneous representation that retains only the topology of the contact network fails to reproduce the size of the epidemic.
Conclusions: These results have important implications for understanding the level of detail needed to correctly inform computational models for the study and management of real epidemics.
Please see related article BMC Medicine, 2011, 9:88


Background
The pattern of contacts between individuals is a crucial determinant for the spread of infectious diseases in a population [1]. The topological structure of the contact network of the population, the presence of people with a much larger number of contacts than the mean value [2][3][4][5], the clustering and presence of well-identified communities of people [6][7][8][9][10], and the frequency and duration of contacts [11][12][13] all have important implications for the spread and control of epidemics. Knowledge of contact patterns is crucial for building and informing computational models of infectious disease transmission [14][15][16][17][18][19][20][21][22][23]. Although some of the properties of contact patterns can dramatically affect the model predictions [3][4][5], little is known about their empirical characteristics, and few experiments have been conducted to collect data on how individuals mix and interact.
The starting point of most modeling approaches is the assumption of homogeneous mixing, which assumes that every individual has an equal probability of contacting other individuals in the population [1]. No heterogeneity in the mixing pattern or in the duration or frequency of the contact is considered, and the dynamic nature of the contacts is disregarded. Going beyond this approximation, various approaches have been proposed to estimate mixing properties between classes of people (for example, social or age classes) using indirect [1] and, more recently, direct [11][24][25][26][27] methods. Indirect methods are based on estimating the elements of a 'who acquires infection from whom' (WAIFW) matrix using observed seroprevalence data. In direct methods, each element of a contact matrix is estimated independently from the epidemiologic data. Direct methods rely on data collection about at-risk events via diaries [11,12] or time-use data [2,27]. To date, research on human social interaction has been mainly based on self-reported data. Despite a real improvement in the description of potential contacts with respect to a homogeneous mixing approach, self-report methods involve a limited number of people who provide information on a limited number of snapshots in time (usually 1 day). The obtained data may be subject to uncontrolled bias and a lack of representativeness, because they are not based on objective reports, and because the data collection is performed on a random day and is not longitudinal. These limitations become particularly relevant in the case of contact patterns and infectious diseases transmitted by the respiratory or close-contact routes. For these diseases, all types of social encounters, even random contacts of very short duration (for example, on public transport), may be important for transmission, but are rather difficult to report objectively and exhaustively through a diary method.
New technologies are now available that allow the tracking of proximity to and interactions between individuals [28][29][30][31][32][33][34][35][36][37], greatly transforming our ability to understand and characterize social behavior [38]. Detection of contact patterns can rely on objective and unsupervised measures of proximity behavior that can be extended to a large number of people, with high temporal and spatial resolution [28,30], thus overcoming the limitations of self-reported data. Departing from the typical static representation of a network of contacts between individuals [39], it is now possible to describe the dynamic nature of the interactions. Analysis of the dynamics of a contact network needs to incorporate two essential features: (i) variations in the duration and frequency of the contacts between individuals, and (ii) the existence of causality constraints in the possible chains of transmission.
Finally, little is known about the level of detail that should be incorporated in the modeling effort to perform in practice realistic simulations of epidemics spreading in a population. Very coarse descriptions of human behavior, such as the homogeneous mixing hypothesis, leave out crucial elements. Conversely, extremely detailed information may yield a lack of transparency in the models, making it difficult to discriminate the effect of any particular modeling assumption or component.
The aim of this study was to assess the role of the temporal aspects, heterogeneities and constraints of dynamic contact patterns in shaping the dynamics of an infectious disease in a population using data collected during a 2-day medical conference. In this study, we capitalized on the recent development of a data-collection infrastructure that allows the tracking of face-to-face proximity of individuals at a high temporal resolution [28,30]. We used the data collected during a scientific conference to provide temporal information on individual contact events. Such data can be mapped onto a dynamic network of contacts, in which all information on interactions between pairs of individuals, time of occurrence and duration are explicit in the network representation. Along with the explicit dynamic network of contacts, we considered two different projections of the data, defining two types of daily networks that aggregate the empirical data in different ways, which reflect different amounts of available knowledge about the contacts between individuals. We then simulated the spread of an infectious disease over these networks, and highlighted the role that different features of contact patterns and their dynamic aspects played during the course of the simulated outbreak. The results have important implications for identification of the level of detail needed for contact data to adequately and realistically inform modeling approaches applied to public health problems.

Methods
The ethics committee of Lyon University Hospital approved this study, and all participants gave signed, written informed consent. The data were collected anonymously.

Data collection platform
Contact network measurements are based on the SocioPatterns RFID platform (http://www.sociopatterns.org) [28,30]. With this method, subjects wear a badge equipped with an active radiofrequency identification (RFID) device (tag). RFID devices engage in bidirectional radio communication at multiple power levels, exchanging packets that contain a device-specific identifier. At low power level, packets can only be exchanged between tags within a radius of 1 to 2 meters [28,30]. This threshold is set to allow detection of a close-contact situation, during which a communicable disease infection can be transmitted, either by airborne transmission through coughing or sneezing, or directly by physical contact. Subjects wear the RFID badges on their chest, so that contacts are recorded only when participants face each other, as the body acts as a shield for the proximity-sensing RF signals. In addition to sensing nearby devices, RFID tags send the locally collected contact information to a number of receivers installed in the environment, which relay this information over a local area network to a computer system used for monitoring and data storage. Proximity scans are performed at random times, and each tag dispatches information to the receivers every few seconds. Time is then coarse-grained over 20-second intervals, during which face-to-face proximity can be assessed with a confidence in excess of 99% [28,30]. This time scale is also adequate to follow the dynamics of social interaction.
All communication (from tag to tag, from tags to receivers, and from receivers to the data storage system) is encrypted. Contact data are stored in encrypted form, and all data management is completely anonymous. Other details on the data-collection infrastructure can be found elsewhere [28,30].

Data collection in this study
Participants attending the 2009 Annual French Conference on Nosocomial Infections (http://www.sf2h.net/) were asked to wear RFID tags; of the 1,200 attendees, 405 volunteers wore the tags. Face-to-face interactions between these 405 volunteers were collected during 2 days of the conference (3rd and 4th of June 2009). The data were collected from 9 am to 9 pm on the first day and from 8.30 am to 4.30 pm on the second day (periods defined as 'day' in the following text). Contacts were not recorded outside of these time periods (periods defined as 'nights').

Empirical contact networks
To assess the role of the dynamic nature of the network of contacts in the dynamics of disease spread, we compared a network built on the explicit representation of the dynamic interactions between individuals (referred to as the dynamic network; DYN) at the shortest available temporal resolution (20 seconds) against two benchmark networks that are built on progressively lower amounts of information available on the interactions, referred to as the heterogeneous (HET) and homogeneous (HOM) networks, respectively.
Firstly, taking advantage of the full spatial and temporal resolution, DYN considered the empirical sequence of successive contact events collected during the conference. Each contact was identified by the RFID identification numbers of the two individuals involved, and by its starting and ending times. The resulting network was a dynamic object encoding the actual chronology and duration of contacts, therefore preserving heterogeneity in the duration of contacts and the causality constraints between events. The latter is particularly important for disease spread, as it may prevent propagation along certain sequences of interactions that would otherwise be allowed in an aggregated static representation of the contact patterns. For example, if a susceptible individual A interacts first with an infectious individual B and then with a susceptible individual C, disease transmission can occur from B to A and then from A to C. If instead A first meets C and later B, A can become infected from B, but the propagation from B to A and then to C is no longer possible.
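The A, B, C example can be made concrete with a minimal sketch. It assumes, purely for illustration, that transmission is certain and instantaneous at every contact, so that only the ordering of contacts matters; the function name `reachable` and the `(time, a, b)` tuple layout are hypothetical and not part of the study's software.

```python
def reachable(contacts, seed):
    """Return everyone the infection can reach from `seed` along
    time-respecting paths.

    `contacts` is an iterable of (time, a, b) tuples describing undirected
    contacts. Assumption for illustration only: every contact with an
    infected individual transmits with certainty.
    """
    infected = {seed}
    # Process contacts in chronological order so that causality is respected.
    for _, a, b in sorted(contacts):
        if a in infected or b in infected:
            infected.update((a, b))
    return infected


# A meets B (infectious) first, then C: the chain B -> A -> C is possible.
b_then_c = [(1, "A", "B"), (2, "A", "C")]
# A meets C first, then B: C can no longer be reached from B.
c_then_b = [(1, "A", "C"), (2, "A", "B")]

print(reachable(b_then_c, "B"))
print(reachable(c_then_b, "B"))
```

An aggregated static network contains the same two links (A-B and A-C) in both cases, so it cannot distinguish the two orderings.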
The benchmark networks correspond to coarse-graining of the data on a daily scale. The first one, HET, was produced for each conference day by connecting individuals who came in contact during this conference day, thus aggregating all daily dynamic information in a single snapshot, and weighting each link by the total time the two individuals spent in face-to-face presence during the considered day. Therefore, HET included information on the actual contacts between individuals (who has met whom) and on the total duration of these contacts (how long A was in contact with B during the whole day), but disregarded information about the temporal order of contacts. In the previous example, the transmission from A to C could take place in both situations, representing the different sequences of the events. HET was therefore a daily aggregated network in which contacts were aggregated over a day, but the whole neighborhood structure between individuals was kept. As the conference lasted 2 days, the aggregation procedure produced two such networks, one for each day.
By contrast, the HOM network was constructed for each day by connecting individuals who were in face-to-face contact during the conference day, again aggregating all daily dynamic information in a single snapshot, but weighting each link with equal weight, corresponding to the mean duration of contacts between two individuals who have met each other on the same day in the HET network. The HOM construction may correspond to networks constructed by asking each participant to report with whom they have been in contact during the conference day, and then estimating for how long on average this contact lasted. For each conference day, HET and HOM have exactly the same structure of interactions from a topological point of view, but they differ by the assignments of weights on the links.
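The two daily aggregations described above can be sketched as follows. The input layout (a list of `(i, j, start, end)` contact events in seconds for a single day) and the function names `build_het` and `build_hom` are assumptions made for this illustration.

```python
from collections import defaultdict


def build_het(contacts):
    """Aggregate one day of contact events into the HET network.

    `contacts` is an iterable of (i, j, start, end) with times in seconds.
    Returns {(i, j): total contact time}, with i < j so that each
    undirected pair appears once.
    """
    weights = defaultdict(float)
    for i, j, start, end in contacts:
        pair = (min(i, j), max(i, j))
        weights[pair] += end - start
    return dict(weights)


def build_hom(het):
    """Keep the HET topology but give every link the mean HET weight."""
    mean_weight = sum(het.values()) / len(het)
    return {pair: mean_weight for pair in het}


day = [(1, 2, 0, 20), (2, 1, 40, 100), (2, 3, 0, 40)]
het = build_het(day)
hom = build_hom(het)
print(het)  # {(1, 2): 80.0, (2, 3): 40.0}
print(hom)  # {(1, 2): 60.0, (2, 3): 60.0}
```

Both dictionaries share the same keys, which reflects the statement that HET and HOM have the same topology and differ only in link weights.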

Generation of contact networks on longer timescales
Because we simulated the spread of a realistic infectious disease, which would be characterized by longer timescales than the data collection period, we introduced three different procedures to longitudinally extend the data-driven network while preserving some of its features. The simplest procedure consisted of repeating the 2-day recordings. This repetition procedure, denoted as REP, was performed both for the dynamic sequence of contacts (DYN) and consistently for the set of daily HET and HOM networks. In this simple procedure, the same contacts were repeated for each attendee for each simulated sequence of 2 days; that is, the assumption was made that the same attendee always met the same set of other attendees, in the same order, and for the same duration. Although this procedure yields a realistic contact pattern for each single day, because it uses only empirical data, such a 'deterministic' repetition becomes rather unrealistic as time goes on. We therefore considered two additional procedures that might mitigate this limitation.
The first one, random shuffling (RAND-SH), consisted of producing 2-day sequences by randomly reshuffling the participants' identities, as given by their tag IDs. The overall sequence of contacts was preserved, but each contact was set as occurring between different attendees from one 2-day sequence to the next. DYN networks were then constructed as before, taking into account the 20-second temporal resolution, and the HET and HOM networks were obtained by aggregating the data for each day, as explained above. This method results in more realistic contact patterns being obtained, and avoids the unrealistic repetition of interactions between individuals. However, the RAND-SH procedure completely erases any correlations between the contact patterns of an attendee in successive 2-day sequences, which is also unrealistic. Analysis of the empirical contact networks shows that in fact a correlation did exist between the number of contacts of an attendee in the first and second conference days, and also that a fraction of contacts were repeated from one day to the next.
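A minimal sketch of the identity reshuffling behind RAND-SH is given below. It assumes contact events stored as `(i, j, start, end)` tuples and applies one random permutation of tag IDs per generated 2-day sequence; the function name `shuffle_identities` is hypothetical, and the constrained variant (CONSTR-SH) is not covered here.

```python
import random


def shuffle_identities(contacts, tag_ids, rng):
    """Relabel the participants of a 2-day contact sequence at random.

    The timing and duration of every contact are preserved; only the
    identities attached to each contact change, through a random
    permutation of `tag_ids`.
    """
    shuffled = list(tag_ids)
    rng.shuffle(shuffled)
    mapping = dict(zip(tag_ids, shuffled))
    return [(mapping[i], mapping[j], start, end)
            for i, j, start, end in contacts]


two_days = [(1, 2, 0, 20), (2, 3, 20, 60), (1, 3, 60, 80)]
rng = random.Random(42)  # fixed seed so the sketch is reproducible
extended = list(two_days)
for _ in range(2):  # append two reshuffled 2-day sequences
    extended += shuffle_identities(two_days, [1, 2, 3, 4], rng)
print(len(extended))  # 9
```

In a real extension, the start and end times of each appended sequence would also be offset by the elapsed simulated time; that bookkeeping is omitted to keep the sketch short.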
Therefore, we designed a third procedure (constrained shuffling; CONSTR-SH) for the generation of synthetic contact patterns starting from the 2-day sequence, which constrained the reshuffling to preserve the correlations between the attendees' social activity and the same fraction of repeated contacts during successive days (see Additional file 1).
It is important to note that in all cases we preserved the time frame during which data were collected, because no collection occurred outside the conference premises. For this reason, each individual was considered as isolated during the 'night' periods in the DYN network. We therefore also introduced such 'nights' in the HET and HOM networks by 'switching off' the links (that is, considering individuals as isolated) during these periods, thus resembling the circadian pattern encoded by the empirical data.

Epidemiological model
We considered a simple SEIR epidemic model for the simulation of the infectious-disease spread in the population under study, in which no births, deaths or introduction of new individuals occurred. Individuals were each assigned to one of the following disease states: Susceptible (S), Exposed (E), Infectious (I) or Recovered (R).
The model is individual-based and stochastic. Susceptible individuals may contract the disease at a given rate when in contact with an infectious individual, and enter the exposed disease state when they become infected but are not yet infectious themselves. These exposed individuals become infectious at a rate $\sigma$, with $\sigma^{-1}$ representing the mean latent period of the disease. Infectious individuals can transmit the disease during their infectious period, whose mean duration is equal to $\nu^{-1}$. After this period, they enter the recovered phase, acquiring permanent immunity to the disease.
To compare simulation results obtained from the three different networks, we needed to adequately define the rate of infection for a given infectious-susceptible pair, depending on the definition of the networks themselves. $\beta$ was defined as the constant rate of infection from an infected individual to one of their susceptible contacts on the unitary time step $dt$ of the process. Given two people, an infectious individual A and a susceptible individual B, who are in contact during the unitary time step, the probability of B becoming infected during this period was given by $\beta\,dt$. To obtain the same mean infection probability in the HET and HOM networks over an entire 24-hour period (day and night), the weights on such networks needed to be rescaled by $W_{AB}/\Delta T$, defined as the ratio between the total sum of the duration of all contacts between A and B in a day, and the effective duration of the day (that is, the total time during which the links in the daily networks were considered active, discarding the 'nights'). Therefore, the probability of infection between A and B during the time step $dt$ was $\beta W_{AB}\,dt/\Delta T$ for the HET network, and $\beta \langle W \rangle\,dt/\Delta T$ for the HOM network (with $\langle W \rangle$ being the mean weight of the links in the HET network).
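The effect of the rescaling can be checked numerically with a short sketch. The function names and the example values for the pair's contact time (120 s) and the effective day length (12 hours) are assumptions chosen for illustration; the sketch sums the per-step infection probabilities over one day, which is a first-order measure of the daily infection pressure on a pair.

```python
import math


def p_step_dyn(beta, dt):
    """Per-step infection probability while a contact is ongoing (DYN)."""
    return beta * dt


def p_step_aggregated(beta, weight, delta_t, dt):
    """Per-step infection probability on an always-on daily link.

    `weight` is W_AB for HET, or the mean weight <W> for HOM.
    """
    return beta * weight * dt / delta_t


def daily_pressure_dyn(beta, w_ab, dt):
    """Sum of per-step probabilities over the W_AB/dt steps in contact."""
    return (w_ab / dt) * p_step_dyn(beta, dt)


def daily_pressure_aggregated(beta, weight, delta_t, dt):
    """Sum of per-step probabilities over the delta_t/dt active steps."""
    return (delta_t / dt) * p_step_aggregated(beta, weight, delta_t, dt)


beta = 3e-4         # per second, scenario (i)
w_ab = 120.0        # seconds in contact during the day (assumed value)
delta_t = 12 * 3600.0  # effective day length in seconds (assumed value)
dt = 20.0           # time step in seconds

dyn = daily_pressure_dyn(beta, w_ab, dt)
het = daily_pressure_aggregated(beta, w_ab, delta_t, dt)
print(math.isclose(dyn, het))  # both equal beta * W_AB
```

Both sums reduce to beta times W_AB, which is why the aggregated networks reproduce the same mean daily infection probability for each pair.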
We considered two different disease scenarios for the simulations of disease spread on all networks under study. In particular, the following values were assumed for the duration of the mean latency period ($\sigma^{-1}$), mean infectious period ($\nu^{-1}$) and transmission rate ($\beta$): (i) $\sigma^{-1} = 1$ day, $\nu^{-1} = 2$ days and $\beta = 3 \times 10^{-4}\,\mathrm{s}^{-1}$ (very short incubation and infectious periods); and (ii) $\sigma^{-1} = 2$ days, $\nu^{-1} = 4$ days and $\beta = 15 \times 10^{-5}\,\mathrm{s}^{-1}$ (short incubation and infectious periods). These sets of parameter values were chosen to maintain the same value of $\beta/\nu$, which is the biologic factor responsible for the rate of increase of cases during the epidemic outbreak, while changing the global timescales of incubation and infectious periods, and assessing the role played by the social factors embedded in the contact patterns. Short incubation and infectious periods were used so as to minimize the consequences of the arbitrariness in the construction procedures of long datasets as described above. Each simulation started with a single randomly chosen infectious individual, with the rest of the population being in the susceptible state.
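As a rough illustration of how such a stochastic, individual-based SEIR process can be run on a daily aggregated network, the sketch below advances all individuals in discrete time steps. It is not the authors' implementation: the function name `simulate_seir`, the synchronous update, and the simplification that links stay active at all times (no 'nights') are assumptions of this sketch. Rates are expressed per second.

```python
import random


def simulate_seir(weights, nodes, beta, sigma, nu, delta_t, dt,
                  n_steps, seed_node, rng):
    """Run a discrete-time stochastic SEIR process on a weighted network.

    weights: {(a, b): W_ab} daily contact time in seconds per pair.
    Per-step probabilities: infection beta*W_ab*dt/delta_t along a link,
    E -> I with sigma*dt, I -> R with nu*dt.
    Simplification (assumption): links are active at every step.
    """
    state = {n: "S" for n in nodes}
    state[seed_node] = "I"
    for _ in range(n_steps):
        new_state = dict(state)  # synchronous update
        for (a, b), w in weights.items():
            for src, dst in ((a, b), (b, a)):
                if state[src] == "I" and state[dst] == "S":
                    if rng.random() < beta * w * dt / delta_t:
                        new_state[dst] = "E"
        for n, s in state.items():
            if s == "E" and rng.random() < sigma * dt:
                new_state[n] = "I"
            elif s == "I" and rng.random() < nu * dt:
                new_state[n] = "R"
        state = new_state
    return state


day = 86400.0
nodes = ["A", "B", "C"]
weights = {("A", "B"): 600.0, ("B", "C"): 300.0}
final = simulate_seir(weights, nodes, beta=3e-4, sigma=1 / day,
                      nu=1 / (2 * day), delta_t=12 * 3600.0, dt=20.0,
                      n_steps=1000, seed_node="A", rng=random.Random(1))
print(final)
```

Passing `W_ab` weights gives a HET-like run; replacing every weight by the mean weight gives the HOM counterpart with the same topology.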

Analysis of the empirical contact networks and of the simulation results
To describe the empirical contact networks, we calculated the number of contacts, the mean duration of contacts, the mean degree of a node (defined as the number of distinct individuals encountered by the individual under scrutiny), the mean clustering coefficient (which describes the local cohesiveness), the mean shortest path (defined as the mean number of links to cross to go from one node to another), and the correlation between the properties of the nodes in the aggregated networks of the first and second conference day. For this correlation analysis, we measured the Pearson correlation coefficients between the degree of an individual in the first and second day, and between the time spent in interaction in the first and second day.
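Some of these descriptors can be computed with a few lines of standard-library code. The sketch below covers the degree, the local clustering coefficient and the Pearson correlation; the helper names and the toy four-node network are illustrative assumptions, and each undirected link is assumed to be listed once.

```python
from math import sqrt


def adjacency(edges, nodes):
    """Build {node: set of neighbours} from a list of undirected links."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def clustering(adj, node):
    """Fraction of pairs of neighbours of `node` that are themselves linked."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i, a in enumerate(nbrs)
                for b in nbrs[i + 1:] if b in adj[a])
    return links / (k * (k - 1) / 2)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


nodes = ["A", "B", "C", "D"]
adj = adjacency([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")], nodes)
degree = {n: len(adj[n]) for n in nodes}
mean_clustering = sum(clustering(adj, n) for n in nodes) / len(nodes)
print(degree, mean_clustering)
```

The day-to-day correlation would then be `pearson(deg_day1, deg_day2)`, with the two lists ordered by the same attendees.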
The comparison of the epidemic outbreaks in the three networks under study was performed by analyzing several parameters, namely the final size of the epidemic, the number of infectious individuals during the epidemic peak, the time of the peak, and the duration of the epidemic.
Because we aimed to assess how spreading phenomena are affected by the contact patterns, by their dynamic nature, and by the amount of detail available on their dynamics, we also estimated the reproductive number $R_0$, defined as the expected number of secondary infections from an initial infected individual in a completely susceptible host population [1]. Several methods can be used to compute $R_0$ [40,41], possibly yielding different estimates [42] for the same epidemiological parameters. In this study, we computed the value of $R_0$ as the mean, over different realizations, of the number of secondary cases from the single initial randomly chosen infectious individual. Mean $R_0$ values and variances were then compared for the three networks (DYN, HET and HOM) and the three data-extension procedures (REP, RAND-SH and CONSTR-SH) under study.
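The estimator described here (the mean number of secondary cases caused directly by the seed, averaged over realizations) can be sketched as below for a daily aggregated network. The function names, the input layout (`weights_by_node` mapping each attendee to the list of their link weights), and the simplification that links are always active are assumptions of this sketch rather than details taken from the study.

```python
import random
from statistics import mean, pvariance


def secondary_cases(seed_weights, beta, nu, delta_t, dt, rng):
    """Count neighbours infected directly by the seed before it recovers.

    seed_weights: list of link weights W (seconds per day) of the seed.
    Each step, every still-susceptible neighbour is infected with
    probability beta*W*dt/delta_t; the seed then recovers with
    probability nu*dt. Requires nu*dt > 0 so that the loop terminates.
    """
    susceptible = list(seed_weights)
    cases = 0
    while True:
        remaining = []
        for w in susceptible:
            if rng.random() < beta * w * dt / delta_t:
                cases += 1
            else:
                remaining.append(w)
        susceptible = remaining
        if rng.random() < nu * dt:
            return cases


def estimate_r0(weights_by_node, beta, nu, delta_t, dt, n_runs, rng):
    """Mean and variance of secondary cases over randomly chosen seeds."""
    nodes = list(weights_by_node)
    counts = [secondary_cases(weights_by_node[rng.choice(nodes)],
                              beta, nu, delta_t, dt, rng)
              for _ in range(n_runs)]
    return mean(counts), pvariance(counts)


toy = {"A": [600.0, 120.0], "B": [600.0], "C": [120.0]}
r0, var = estimate_r0(toy, beta=3e-4, nu=1 / 172800, delta_t=43200.0,
                      dt=600.0, n_runs=200, rng=random.Random(0))
print(r0, var)
```

A coarse time step (`dt=600.0`) is used in the usage example only to keep the run short; the study's temporal resolution is 20 seconds.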

Results
In total, 28,540 face-to-face contacts between 405 attendees at a 2-day conference were recorded, and the probability distribution of the duration of these contacts was plotted (Figure 1). The mean duration was 49 seconds, with large variations (SD 112 seconds), meaning a large number of contacts of brief duration, a few contacts of long duration, and a broad tail, suggesting that no typical contact duration could be defined. Statistical distributions of the number and duration of contacts and of the link weights were similar from one day to the next, although the two daily contact networks were obviously not identical.
In the daily contact networks, the mean degree of a node was close to 30, with a distribution decaying exponentially for large numbers. The mean clustering coefficient was 0.28, much larger than the mean value of 0.07 obtained for a random network of the same size and mean degree. The network was also a small world, with a mean shortest path of 2.2 (snapshots of the network of the first conference day are shown; see Additional File 2).
The link weights, by contrast, had a broad distribution, with a mean cumulated duration of the interaction between two attendees of 2 minutes. The total duration spent in contact by any attendee also had a broad distribution, with a mean of 75 minutes. The Pearson correlation coefficient between the degree of an individual in the first and second day was 0.37, and that between the total time spent in interaction in the first and second day was 0.52. The fraction of repeated contacts in the second day with respect to the first was 12%, and was independent of the degree.
The distributions of $R_0$ for the three networks using the REP procedure were also plotted (Figure 2). In all cases, the number of secondary cases from the initial seed of the single infectious individual ranged from 0, corresponding to the most probable event of no outbreak, to around 20 to 25 individuals (the mean values and the variances obtained for the estimation of $R_0$, depending on the scenarios and the network type, are shown: Figure 3; see Additional file 3). In all scenarios, higher values of $R_0$, together with larger variances, were observed in the HOM network compared with the HET and DYN networks.
The distribution of the final number of cases for the three networks and the REP data-extension procedure are also shown (Figure 4). In this plot, a high probability of rapid extinction of the pathogen spread was seen, corresponding to a small number of infected individuals. This was slightly smaller in the HOM case compared with the HET and DYN networks. By contrast, when the epidemic started, the final number of cases was high, and it was larger in the HOM network than in the HET and DYN networks. Intermediate cases with limited propagation were rare.
The distribution of the final number of cases for the three networks was analyzed for the various parameters of the SEIR model and for the various extrapolation scenarios (Table 1; see Additional file 4). In all cases, and independently from the procedure adopted for extending the 2-day dataset, the probability of extinction for the HOM network was lower than for the HET and DYN networks. In the case of large outbreaks, the final number of cases was higher in the HOM network than in the HET and DYN networks. Propagation over the HET and DYN networks led to a similar extinction probability and to a similar final number of cases. The final numbers of cases for both disease scenarios (i.e., short and very short latency and infectious periods) were also fairly close.
Regarding the peak times of disease spread in the various cases (Figure 5; see Additional file 5), we found that in most cases, the peak of the epidemic was reached first on average for spread within the HOM network. However, the differences between the peak times were small, and even the simulations on the network with the least information gave a good estimate of the peak time obtained when the full information on the contact patterns was included.
Using the evolution in time of the number of infectious and recovered individuals for the different data-extension procedures and for the two sets of SEIR parameters, the temporal behavior of disease spread was analyzed (Figure 6; Figure 7). Symbols represent the median values, and lines represent the fifth and ninety-fifth percentiles of the number of infectious and recovered individuals. In all cases, disease spread on the HOM network evolved slightly faster and reached a significantly larger number of individuals, compared with the HET and DYN networks, which had very similar characteristics to each other.
Interesting differences were seen in the results of simulations on datasets extended with different procedures (Figure 5, Figure 6, Figure 7). The spread was slightly slower in the RAND-SH case, but lasted longer, and consequently the final number of cases $R_\infty$ was larger. In fact, we systematically found $R_\infty$(REP) < $R_\infty$(CONSTR-SH) < $R_\infty$(RAND-SH), and the more the identities of the tags were shuffled, the more efficient was the spread.

Discussion
Using a recently developed data collection technique deployed during a 2-day conference involving 405 volunteers, we measured the dynamics of contact (face-to-face) interactions between individuals during such a social event. We used the data to compare the simulated spread of communicable diseases on this dynamic network (DYN) and on two networks, one heterogeneous (HET) and one homogeneous (HOM), obtained by aggregating the dynamic network at two distinct levels of precision. To compensate for the relatively short duration of the observation period (2 days), we designed two different models to construct dynamical contact networks spanning an extended time period during which the spread of an infectious disease could be simulated. The broad distributions of the various network characteristics reported in this study were consistent with those seen in other contexts [30,36,37]. Our results are also similar to those reported previously for interaction networks at conferences [30,36], in which the resulting picture was not characterized by the presence of 'superspreaders', when they were defined in terms of the number of distinct individuals contacted. This was, however, less clear when the cumulated interaction time was taken into account.
In the three networks, disease extinction occurred as frequently (between 36% and 47%) as large outbreaks (between 34% and 49%). Outbreaks tended to be explosive (attack rate between 51% and 80%), consistent with previous work [4]. A large difference in the process of disease spread was apparent between the HOM network (which did not include any information on the heterogeneity of contact durations or on the dynamic aspect) and the two other networks; for the HOM network there was a systematically larger number of infected individuals. This result implies that heterogeneity in the contact durations between individuals is associated with a lower spread of transmission, suggesting that a single individual who does not spend their time equally between their contacts effectively reduces the routes of disease spread [12,15]. Disregarding the heterogeneity of contact durations can lead to large differences in the estimated number of cases, suggesting that information on the daily cumulated contact time between individuals gives crucial information for correct modeling of disease spread. Interestingly, however, the peak time was only slightly changed in the HOM network, showing that even rather limited information can yield good estimates of the epidemic timescales.
The comparison between disease spread in the HET and DYN networks provides insights into whether temporal constraints due to the precise sequence of the contacts might affect the propagation of disease. Given two individuals, the overall expected probability of a transmission occurring during the interval $\Delta T$ is the same in both cases (that is, $\beta W_{AB}$), so the only difference is that the contact is not continuously present in the DYN network, but it may be intermittent and repeated only during the actual recorded contacts. This introduces time constraints on the paths that the infectious agent can follow between individuals in the DYN network, which may slow down disease spread on the DYN network compared with the HET network. However, this slowing down of infection and the differences in the final number of cases between the HET and DYN networks were too small to be relevant for the simulations investigated here. The similarity between the spreading behaviors in the HET and DYN networks was independent of the different procedures used to extend the initial 2-day dataset. These procedures created successive artificial 'days' which differed from each other by various amounts, that is, with a different level of repetition of contacts from one day to the next. The robustness of the comparison between HET and DYN therefore indicates that the observed similarity between the spreading on the HET and DYN networks is due to the discrepancy between the timescales considered for propagation (of the order of days), and the temporal resolution and the contact durations (of 20 seconds and of the order of minutes up to a few hours, respectively). The total time spent in contact by each pair of individuals was in this context sufficient to describe precisely the propagation pattern, as shown by the peak time and the final number of cases.
Therefore, for the simulation of diseases such as those considered in this study, contact information at a daily resolution might be enough to characterize disease spread, and the precise order of the sequence of contacts might not be needed. However, this would not be the case for extremely fast-spreading processes, as shown in previous work [36]. This implies that there is a crossover between the two regimes, which will be the subject of future investigations.
Table 1 Distribution of the final number of cases for the three network types according to the four scenarios (5000 runs, dynamic contact network of 405 participating attendees)
Finally, the difference between the results obtained for the different procedures REP, RAND-SH and CONSTR-SH shows the importance of knowledge of the respective fractions of repeated and new contacts between successive days [8,12,43]. Repeated encounters favor propagation, so that the REP procedure led to an initially faster spread, but contacts between different individuals from one day to the next favor propagation across the network, so that the RAND-SH procedure led in the end to a larger attack rate.
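The contrast between repeated and new contacts can be made concrete with a short Python sketch. It rests on assumptions not stated in this section: that a REP-like extension repeats the recorded sequence unchanged, and that a RAND-SH-like extension randomly relabels attendees in each additional copy. CONSTR-SH is omitted because its constraints are not described here. All function names and the toy records are hypothetical.

```python
import random


def extend_rep(contacts, period_s, n_copies):
    """REP-like extension: repeat the recorded sequence unchanged."""
    return [(t + k * period_s, a, b)
            for k in range(n_copies)
            for t, a, b in contacts]


def extend_rand_sh(contacts, period_s, n_copies, rng):
    """RAND-SH-like extension: relabel attendees at random in each new copy."""
    ids = sorted({x for _t, a, b in contacts for x in (a, b)})
    extended = []
    for k in range(n_copies):
        shuffled = ids[:]
        if k > 0:  # the first copy keeps the recorded identities
            rng.shuffle(shuffled)
        relabel = dict(zip(ids, shuffled))
        extended.extend((t + k * period_s, relabel[a], relabel[b])
                        for t, a, b in contacts)
    return extended


def repeated_pair_fraction(contacts, period_s, k):
    """Fraction of pairs in contact in copy k that also met in copy k - 1."""
    def pairs(i):
        return {frozenset((a, b)) for t, a, b in contacts
                if i * period_s <= t < (i + 1) * period_s}
    current, previous = pairs(k), pairs(k - 1)
    return len(current & previous) / len(current) if current else 0.0


# Hypothetical recorded segment: (timestamp_s, a, b) records.
recorded = [(0, 1, 2), (20, 1, 2), (40, 2, 3), (60, 4, 5)]
PERIOD_S = 2 * 24 * 3600  # length of the recorded segment (2 days)

rep = extend_rep(recorded, PERIOD_S, 3)
rand_sh = extend_rand_sh(recorded, PERIOD_S, 3, random.Random(0))
print(repeated_pair_fraction(rep, PERIOD_S, 1))
print(repeated_pair_fraction(rand_sh, PERIOD_S, 1))
```

In this sketch, the fraction of repeated pairs is 1 by construction for the REP-like extension, while for the relabeled extension it depends on the random permutation drawn and is typically lower. This is the quantity that the paragraph above identifies as important for the spreading outcome.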
Compared with other approaches [11,26,27], the data collection method used in this study makes it possible to gather information on actual face-to-face contacts, with high temporal and spatial resolution [28,30,36]. It allows access to the precise durations, time and order of the successive contacts between individuals, fully representing the corresponding heterogeneity and the causality constraints in the chain of transmission.

Limitations
Unsupervised data-collection systems based on RFID infrastructures, such as the one presented here [28,30,37], carry some caveats that need to be discussed. First, individuals are not followed outside the zone covered by the RFID readers, so that contacts between participants that occur during the day outside this area are not monitored. This results in an underestimation of the number of contacts, and therefore of the possibilities for disease spread. Moreover, in this study, the 'night' periods, during which individuals were assumed to be isolated, represented 56% of each 24-hour period. This may artificially increase the probability of extinction if the infectious period of an infected individual ends during these periods, precluding further transmission. This issue may be solved by upcoming technological improvements that will allow operation of the RFID sensing layer in a fully distributed fashion with on-board storage on the devices themselves; that is, such RFID tags will register and store contacts even if they are not close to RFID readers.
Another issue, well known in the field of social networks, is due to the partial sampling of the population. Of the 1,200 attendees at this conference, 405 (34%) participated in the data collection. Consequently, only these attendees were taken into account in the model of disease spread, whereas they were in fact also in contact with the non-participating attendees. Previous investigation [30] has shown that for a wide variety of real-world deployments of the RFID proximity-sensing platform used in this study, the behavior of the statistical distributions of quantities such as contact durations is not altered by unbiased sampling of individuals. However, paths of disease spread between sampled attendees that also involved unsampled attendees may have existed, but were not taken into account. This effect may lead to an underestimation of disease spread, and future work will focus on quantification of such possible biases, for instance through bootstrapping procedures. In addition, it is possible that the volunteering participants themselves introduced a systematic bias into the sampled population concerning their interaction behavior, as they self-selected to participate in the experiment. However, assessment of this effect would require independent data sources for monitoring unsampled individuals, inevitably limiting the size of populations and settings because of logistical constraints. Although interesting for the understanding of social behavior, such a study would need to be specifically designed and tailored to the research question, thus going beyond the aim of the present study. Another interesting perspective would be to compare and integrate the results of unsupervised contact measurements with the results of simultaneously performed survey- or diary-based inquiries.
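One way to probe the effect of partial sampling is to sub-sample the participants further and observe how a summary statistic changes. The Python sketch below is only an illustration of that idea, not the bootstrapping procedure envisaged for future work, whose details are not specified here. It uses random sub-sampling without replacement (a relative of the bootstrap), and the record format, function names and choice of statistic are all assumptions.

```python
import random


def induced_contacts(contacts, kept):
    """Keep only records whose two participants are both in `kept`."""
    return [(t, a, b) for t, a, b in contacts if a in kept and b in kept]


def contact_time_per_person(contacts, resolution_s=20):
    """Mean contact time (seconds) over individuals with at least one record."""
    people = {x for _t, a, b in contacts for x in (a, b)}
    if not people:
        return 0.0
    # Each record adds resolution_s seconds to both participants.
    return 2 * resolution_s * len(contacts) / len(people)


def subsample_statistic(contacts, fraction, n_rep, statistic, rng):
    """Evaluate `statistic` on n_rep random sub-samples of the individuals."""
    ids = sorted({x for _t, a, b in contacts for x in (a, b)})
    k = max(1, round(fraction * len(ids)))
    values = []
    for _ in range(n_rep):
        kept = set(rng.sample(ids, k))  # sampling without replacement
        values.append(statistic(induced_contacts(contacts, kept)))
    return values


# Hypothetical (timestamp_s, a, b) records at 20-second resolution.
records = [(0, 1, 2), (20, 1, 2), (40, 2, 3), (60, 4, 5)]
rng = random.Random(1)
print(subsample_statistic(records, 0.6, 5, contact_time_per_person, rng))
```

The spread of the returned values across sub-samples gives a rough indication of how sensitive the chosen statistic is to which individuals happen to be sampled. It cannot reveal transmission paths through individuals who were never sampled at all.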
Finally, the limited period (2 days) of data collection made it necessary to generate artificially longer datasets by different procedures in order to model the spread of pathogens on realistic timescales. Deployment of the measuring infrastructure on much longer timescales is planned so as to validate such generation procedures and to measure their effect.