Conceived and designed the experiments: AB CC LI BL J-FP JS WVdB PV NV. Performed the experiments: AB CC J-FP CR NV. Analyzed the data: AB CC LI MQ JS WVdB NV. Wrote the paper: AB CC J-FP JS NV.
The authors have declared that no competing interests exist.
Little quantitative information is available on the mixing patterns of children in school environments. Describing and understanding contacts between children at school would help quantify the transmission opportunities of respiratory infections and identify situations within schools where the risk of transmission is higher. We report on measurements carried out in a French school (children aged 6–12 years), where we collected data on the time-resolved face-to-face proximity of children and teachers using a proximity-sensing infrastructure based on radio frequency identification devices.
Data on face-to-face interactions were collected on Thursday, October 1st and Friday, October 2nd 2009. We recorded 77,602 contact events between 242 individuals (232 children and 10 teachers). In this setting, each child has on average 323 contacts per day with 47 other children, leading to an average daily interaction time of 176 minutes. Most contacts are brief, but long contacts are also observed. Contacts occur mostly within each class, and each child spends on average three times more time in contact with classmates than with children of other classes. We describe the temporal evolution of the contact network and the trajectories followed by the children in the school, which constrain the contact patterns. We determine an exposure matrix aimed at informing mathematical models. This matrix exhibits a class and age structure which is very different from the homogeneous mixing hypothesis.
We report on important properties of the contact patterns between school children that are relevant for modeling the propagation of diseases and for evaluating control measures. We discuss public health implications related to the management of schools in case of epidemics and pandemics. Our results can help prioritize control measures (preventive measures, case isolation, class and school closures) so as to reduce the disruption to education during epidemics.
The role of children in the community spread of respiratory infections such as influenza is a challenging epidemiological issue
However, little is known about how children actually mix in a school environment
In order to reduce this knowledge gap, the research priorities comprise collecting data on activities and interactions of children, in particular in schools. Until recently, most empirical studies have relied on self-reported information such as questionnaire-based surveys to determine mixing patterns
We deployed a proximity-sensing infrastructure based on radio-frequency identification devices (RFID) in a French primary school, and used it to collect, in an unsupervised manner
The study took place in a primary school in Lyon, France, during two days in October 2009. The students (elementary cycle, composed of 5 grades) are between 6 and 12 years old. In this school, each of the 5 grades is divided into two classes, for a total of 10 classes. Each class has an assigned room and an assigned teacher. The smallest class has 22 children and the largest 26, for a total of 241 children and 10 teachers. 232 children and all teachers participated in the data collection. The school day runs from 8:30 am to 4:30 pm, with a lunch break from 12 pm to 2 pm, and two breaks of 20–25 min around 10:30 am and 3:30 pm. Lunches are served in a common canteen, and a shared playground is located outside the main building. As the playground and the canteen do not have enough capacity to host all the students at the same time, only two or three classes have breaks at the same time, and lunches are taken in two consecutive turns.
The French national bodies responsible for ethics and privacy, the Commission Nationale de l'Informatique et des Libertés (CNIL,
The measurement infrastructure, developed in the context of the SocioPatterns project
Data on face-to-face interactions between 232 children (96% coverage) and 10 teachers (100% coverage) across 10 classes were collected over two days (Thursday, October 1st 2009 and Friday, October 2nd 2009, see
Grade | Class name | Number of children or teachers | Participants, day 1 | Participants, day 2 | Participation rate, day 1 (%) | Participation rate, day 2 (%) |
1 | Class 1A | 24 | 22 | 23 | 91.7 | 95.8 |
1 | Class 1B | 25 | 25 | 25 | 100 | 100 |
2 | Class 2A | 25 | 22 | 23 | 88.0 | 92.0 |
2 | Class 2B | 26 | 25 | 26 | 96.2 | 100 |
3 | Class 3A | 24 | 23 | 23 | 95.8 | 95.8 |
3 | Class 3B | 22 | 21 | 21 | 95.5 | 95.5 |
4 | Class 4A | 23 | 21 | 21 | 91.3 | 91.3 |
4 | Class 4B | 24 | 22 | 22 | 91.7 | 91.7 |
5 | Class 5A | 24 | 22 | 21 | 91.7 | 87.5 |
5 | Class 5B | 24 | 23 | 23 | 95.8 | 95.8 |
- | Teachers | 10 | 10 | 10 | 100 | 100 |
The patterns of contacts between children, and the corresponding mixing patterns between classes and age groups are analyzed through several quantities describing the number of contacts between individuals, the duration of these contacts, and the cumulated time spent in contact by each pair of individuals, as well as their statistical distributions characterized in particular by average and coefficient of variation squared (
More precisely, we define the following weights quantifying the proximity relations of a pair of individuals
the occurrence
the frequency
the cumulative duration
As children are grouped into classes, we also compute:
the total number of contacts between children of classes A and B,
the total time spent in contact between children of classes A and B
the total number of contacts of children of class A,
the total contact time of children of class A
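The pair-level and class-level quantities listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: contact events are assumed to be available as hypothetical `(i, j, duration_s)` tuples, "occurrence" is taken to be 1 when a pair has at least one contact, "frequency" is the number of contact events of the pair, and "cumulative duration" is the summed contact time. The names `pair_weights`, `class_totals` and `class_of` are introduced here for illustration only.

```python
from collections import defaultdict

def pair_weights(events):
    """Return {pair: (occurrence, frequency, cumulative_duration_s)}.

    events: iterable of (i, j, duration_s); the pair (i, j) is undirected.
    """
    freq = defaultdict(int)
    dur = defaultdict(float)
    for i, j, d in events:
        pair = (min(i, j), max(i, j))  # same key for (i, j) and (j, i)
        freq[pair] += 1
        dur[pair] += d
    # Occurrence is 1 for every pair that appears at least once (assumption).
    return {p: (1, freq[p], dur[p]) for p in freq}

def class_totals(weights, class_of):
    """Sum pair frequencies and durations over each unordered pair of classes."""
    n_contacts = defaultdict(int)
    time_s = defaultdict(float)
    for (i, j), (_, f, d) in weights.items():
        key = tuple(sorted((class_of[i], class_of[j])))
        n_contacts[key] += f
        time_s[key] += d
    return dict(n_contacts), dict(time_s)
```

Keying on the unordered class pair makes the class-level totals symmetric by construction, so one entry serves both the (A, B) and (B, A) cells of a contact matrix.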
The quantities
In order to study the temporal structure of the interaction network, we also measure for each child
We also build contact networks aggregated on a 20-minute timescale: each day is divided into sliding windows of 20 minutes, starting at intervals of 5 minutes and, for each 20-minute period, edges are drawn between those pairs of individuals for which at least one contact was recorded during this period. The average degree of each 20-minute network gives the average number of individuals with whom a given individual has been in contact during the corresponding time window. By using these 20-minute sliding windows we filter out the fast fluctuations of the dynamical contact network and only retain the slowly-varying information on the network evolution.
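The sliding-window aggregation can be illustrated with the following sketch. It assumes contact events given as hypothetical `(t, i, j)` tuples with `t` in seconds, and it averages the degree over individuals with at least one contact in the window; whether isolated individuals should be included in the average is not specified here, so that choice is an assumption.

```python
def sliding_window_degrees(events, day_start, day_end, width=1200, step=300):
    """Average degree of the contact network in each sliding window.

    events: iterable of (t, i, j), t being the contact time in seconds.
    width:  window length in seconds (20 minutes by default).
    step:   shift between successive windows in seconds (5 minutes by default).
    Returns a list of (window_start, average_degree).
    """
    events = list(events)
    result = []
    start = day_start
    while start + width <= day_end:
        # One edge per pair with at least one contact inside the window.
        edges = {frozenset((i, j)) for t, i, j in events
                 if start <= t < start + width}
        nodes = {n for e in edges for n in e}
        # Each edge contributes to the degree of two nodes.
        avg = 2 * len(edges) / len(nodes) if nodes else 0.0
        result.append((start, avg))
        start += step
    return result
```

Using a set of `frozenset` pairs means that repeated contacts of the same pair within a window count as a single edge, which matches the "at least one contact" rule.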
As a summary of the contacts of each day, we additionally build two daily aggregated networks in which edges are drawn between a pair of individuals whenever at least one contact was recorded for that pair during the considered day. Each edge is weighted by the total time the corresponding individuals spent in contact during that day.
The two daily aggregated networks are compared using various measures. We compute the Pearson correlation coefficients between the characteristic parameters (number of contacts, total time spent in contact, etc.) measured for each individual in day 1 and in day 2. We also compare the network structures at a more detailed level, measuring the similarity between the neighborhoods of each node across the two days. A simple measure of similarity is given by the respective numbers of new and repeated distinct persons contacted in day 2 with respect to day 1. This can be further refined by specifying if these new and repeated contacts occur within the same class or with individuals of other classes. Moreover, as each link
This quantity is 1 if
Finally, by measuring the relative rates at which the RFID readers receive the packets emitted by individual badges, it is possible to perform approximate localization of the badges, and tell which RFID reader is closest to any given badge. Since the readers were installed in the classrooms, in the canteen, and in the courtyard, it is possible to detect in which of these areas each badge was situated at any point in time. This makes it possible to reconstruct the trajectories that children followed in space as they moved within the school premises.
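A rough sketch of this localization step is given below. It is an assumption-laden illustration rather than the deployed algorithm: packet receptions are assumed to be logged as hypothetical `(t, badge, reader)` tuples, the "relative rate" is approximated by the packet count per reader within a fixed time window, and the window length of 20 s is an arbitrary choice made for the example.

```python
from collections import Counter

def closest_reader(packets, window=20):
    """Assign each badge, per time window, to the reader that heard it most.

    packets: iterable of (t, badge, reader), t in seconds.
    window:  window length in seconds (hypothetical value).
    Returns {(window_index, badge): reader}.
    """
    counts = {}
    for t, badge, reader in packets:
        key = (int(t // window), badge)
        counts.setdefault(key, Counter())[reader] += 1
    # The reader with the highest packet count is taken as the closest one.
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}
```

Reading off the assigned reader window by window yields an approximate trajectory of each badge across classrooms, canteen and courtyard.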
We recorded a total of 77,602 contact events involving 242 individuals (37,414 contacts on day 1 and 40,188 on day 2), with an average of about 317 contacts per individual on the first day (coefficient of variation squared
Panel C gives the distributions of the number of distinct individuals with whom an individual of each class has had at least one contact. In each boxplot, the horizontal line gives the median, the box extremities are the 25th and 75th percentiles, and the whiskers correspond to the 5th and 95th percentiles.
Each individual, on average, was in contact with 50 distinct individuals (
Most contacts are of short duration, but contacts of very different durations are observed, including rather long ones.
The heterogeneity of contact patterns is also observed for cumulated contact durations.
When considering single individuals, the distribution of total time spent by an individual in face-to-face proximity with other individuals is more homogeneous, with an average of 10,340 seconds (2 h 52 min) for day 1 and 11,000 seconds (3 h 03 min) for day 2, with
The matrix entry for row A and column B gives the number of contacts (
The matrix entry for row A and column B gives the cumulated duration (
Class | 1st A | 1st B | 2nd A | 2nd B | 3rd A | 3rd B | 4th A | 4th B | 5th A | 5th B | teachers |
1st A | 4505 | 1051 | 594 | 625 | 560 | 286 | 83 | 160 | 57 | 105 | 149 |
1st B | 1051 | 9756 | 502 | 632 | 269 | 207 | 551 | 161 | 448 | 386 | 1084 |
2nd A | 594 | 502 | 5401 | 1583 | 657 | 360 | 77 | 56 | 76 | 30 | 586 |
2nd B | 625 | 632 | 1583 | 6270 | 712 | 373 | 119 | 36 | 41 | 54 | 508 |
3rd A | 560 | 269 | 657 | 712 | 5537 | 2076 | 77 | 163 | 109 | 82 | 414 |
3rd B | 286 | 207 | 360 | 373 | 2076 | 5926 | 248 | 193 | 154 | 219 | 282 |
4th A | 83 | 551 | 77 | 119 | 77 | 248 | 4496 | 828 | 351 | 745 | 382 |
4th B | 160 | 161 | 56 | 36 | 163 | 193 | 828 | 2843 | 119 | 346 | 168 |
5th A | 57 | 448 | 76 | 41 | 109 | 154 | 351 | 119 | 4913 | 1968 | 372 |
5th B | 105 | 386 | 30 | 54 | 82 | 219 | 745 | 119 | 1968 | 5025 | 273 |
teachers | 149 | 1084 | 586 | 508 | 414 | 282 | 382 | 168 | 372 | 273 | 101 |
The matrix entry for row A and column B gives the total number of contacts
Class | 1st A | 1st B | 2nd A | 2nd B | 3rd A | 3rd B | 4th A | 4th B | 5th A | 5th B | teachers |
1st A | 2242.3 | 582.7 | 315.3 | 340.0 | 260.7 | 126.3 | 30.3 | 61.3 | 20.0 | 37.0 | 61.7 |
1st B | 582.7 | 5611.0 | 234.3 | 367.0 | 119.3 | 83.7 | 271.0 | 84.3 | 197.0 | 169.0 | 459.7 |
2nd A | 315.3 | 234.3 | 3055.3 | 1068.3 | 339.3 | 219.0 | 30.0 | 19.3 | 25.7 | 11.7 | 331.7 |
2nd B | 340.0 | 367.0 | 1068.3 | 3723.0 | 365.7 | 179.7 | 53.3 | 16.0 | 14.7 | 20.0 | 247.3 |
3rd A | 260.7 | 119.3 | 339.3 | 365.7 | 2839.7 | 1105.3 | 29.7 | 75.3 | 40.3 | 30.0 | 201.7 |
3rd B | 126.3 | 83.7 | 219.0 | 179.7 | 1105.3 | 3436.3 | 117.7 | 85.7 | 56.0 | 85.0 | 147.3 |
4th A | 30.3 | 271.0 | 30.0 | 53.3 | 29.7 | 117.7 | 2421.7 | 439.3 | 163.0 | 373.0 | 179.7 |
4th B | 61.3 | 84.3 | 19.3 | 16.0 | 75.3 | 85.7 | 439.3 | 1600.0 | 46.0 | 207.3 | 68.3 |
5th A | 20.0 | 197.0 | 25.7 | 14.7 | 40.3 | 56.0 | 163.0 | 46.0 | 2671.0 | 966.7 | 188.3 |
5th B | 37.0 | 169.0 | 11.7 | 20.0 | 30.0 | 85.0 | 163.0 | 207.3 | 966.7 | 2752.7 | 134.7 |
teachers | 61.7 | 459.7 | 331.7 | 247.3 | 201.7 | 147.3 | 179.7 | 68.3 | 188.3 | 134.7 | 65.0 |
The matrix entry for row A and column B gives the cumulated duration
Grade | 1st grade | 2nd grade | 3rd grade | 4th grade | 5th grade | Teachers |
1st grade | 322.6 (177.7) | 24.8 (13.3) | 13.9 (6.2) | 10.0 (4.7) | 10.4 (4.4) | 13.0 (5.5) |
2nd grade | 24.6 (13.2) | 274.8 (162.7) | 21.7 (11.4) | 3.0 (1.2) | 2.1 (0.7) | 11.4 (6.0) |
3rd grade | 15.0 (6.7) | 23.9 (12.5) | 307.7 (167.8) | 7.7 (3.5) | 6.4 (2.4) | 7.9 (4.0) |
4th grade | 11.1 (5.2) | 3.3 (1.4) | 7.9 (3.6) | 189.9 (103.7) | 18.2 (9.2) | 6.4 (2.9) |
5th grade | 11.3 (4.8) | 2.3 (0.8) | 6.3 (2.4) | 17.5 (8.9) | 269.3 (148.6) | 7.3 (3.6) |
Teachers | 61.7 (26.1) | 54.8 (29.0) | 34.8 (17.5) | 27.5 (12.4) | 32.4 (16.2) | 10.5 (6.8) |
The cell of row A and column B of the matrix gives the average number (and, in parentheses, the duration in minutes) of contacts involving an individual of grade A with any individual of grade B, per day.
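A per-capita matrix of this kind could be computed along the following lines. The sketch assumes hypothetical `(i, j, duration_s)` contact events pooled over `n_days` days and a hypothetical `grade_of` mapping from individual to grade; normalizing by the number of participating individuals in the row grade is an assumption about how the averages are taken.

```python
from collections import Counter, defaultdict

def exposure_matrix(events, grade_of, n_days):
    """Average daily contacts per individual of grade A with grade B.

    Returns {(A, B): (contacts_per_person_per_day, minutes_per_person_per_day)}.
    """
    count = defaultdict(int)
    minutes = defaultdict(float)
    for i, j, d in events:
        # Each contact is experienced by both participants, so it is
        # credited once from i's point of view and once from j's.
        for a, b in ((grade_of[i], grade_of[j]), (grade_of[j], grade_of[i])):
            count[(a, b)] += 1
            minutes[(a, b)] += d / 60
    size = Counter(grade_of.values())  # individuals per grade
    return {
        (a, b): (count[(a, b)] / (size[a] * n_days),
                 minutes[(a, b)] / (size[a] * n_days))
        for (a, b) in count
    }
```

Because rows are normalized by the size of the row group, the matrix is not symmetric in general: a small group such as the teachers can have many contacts per person with a grade whose members each have few contacts with teachers.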
The median value is represented with a black line, the 95% confidence interval is shown in gray and the number of individuals over which the statistics are calculated is shown in red dashes. Breaks and beginning and end of lunch are characterized by a sudden increase of the degree, showing the occurrence of large numbers of contact events.
The average total number is displayed in black, the average number of children of the same class in red, and the average number of children of other classes in blue.
The average total time is displayed in black, the average time spent with children of the same class in red, and the average time spent with children of other classes in blue.
Edges between individuals who interacted for less than 2 minutes have been removed, thus keeping only the strongest links. The width of each link corresponds to the cumulative duration of contacts, and nodes with more edges are drawn larger. Colors correspond to classes; teachers are shown in grey. Figure created using the Gephi software,
A comparison between the characteristics of the overall face-to-face contact patterns in the two days of the deployment is reported in
Quantity | Day 1 | Day 2 |
Number of individuals | 236 | 238 |
Average number of contacts of an individual (CV²) | 317 (0.22) | 338 (0.27) |
Average total time in contact of an individual, in minutes (CV²) | 172 (0.25) | 183 (0.33) |
Average number of distinct persons contacted (CV²) | 50 (0.14) | 46.5 (0.18) |
Average cumulated time spent in contact by two persons, in seconds (CV²) | 207 (5.4) | 236 (4.7) |
Average duration of a contact, in seconds (CV²) | 32.6 (1.2) | 32.6 (1.1) |
Average clustering coefficient | 0.5 | 0.56 |
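The squared coefficient of variation reported in parentheses is the variance divided by the squared mean. A minimal sketch follows, assuming the population variance is used (the choice between population and sample variance is not stated in the text); `cv_squared` is a name introduced here for illustration.

```python
from statistics import mean, pvariance

def cv_squared(values):
    """Squared coefficient of variation: variance / mean**2."""
    m = mean(values)
    # pvariance accepts the precomputed mean as its second argument.
    return pvariance(values, m) / m ** 2
```

Values well below 1 indicate a distribution concentrated around its mean, while values above 1 indicate a broad, heterogeneous distribution.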
At a more detailed level, the Pearson correlation coefficient between the number of contacts of an individual on the first and second day is 0.53; for the time spent in contact, it is 0.54; for the number of distinct persons contacted, it is 0.53. These values show an overall strong correlation between the behavior of individuals from one day to the next.
Moreover, each child, on average, has 26 repeated contacts on the second day with children met during the first day (19 in the same class and 7 in a different class), and new contacts with 20 other children (1.4 in the same class, 18.4 in a different class). The average cosine similarity between a child's neighborhoods across the two days is 0.67 (0.74 for the neighborhood restricted to the child's own class, 0.2 for the neighborhood restricted to children in a different class). This indicates a repetitive pattern inside each class but a non-negligible renewal of the contacts between classes across consecutive days.
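The day-to-day comparison of an individual's neighborhood can be sketched as follows. The example assumes the standard cosine similarity between two weight vectors and represents each neighborhood as a hypothetical `{neighbor: weight}` dictionary, with weights taken to be cumulative contact durations; both choices are assumptions made for illustration.

```python
import math

def cosine_similarity(w1, w2):
    """Cosine similarity between an individual's neighborhoods on two days.

    w1, w2: {neighbor: weight} for day 1 and day 2.
    """
    dot = sum(w * w2.get(n, 0.0) for n, w in w1.items())
    norm1 = math.sqrt(sum(w * w for w in w1.values()))
    norm2 = math.sqrt(sum(w * w for w in w2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # convention chosen here for an empty neighborhood
    return dot / (norm1 * norm2)

def repeated_and_new(w1, w2):
    """Number of day-2 neighbors already met on day 1, and of new ones."""
    repeated = set(w1) & set(w2)
    new = set(w2) - set(w1)
    return len(repeated), len(new)
```

Restricting the dictionaries to neighbors of the same class, or of other classes, before calling these functions gives the within-class and between-class variants.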
Each row corresponds to a particular place in the school (classroom, canteen, courtyard) where an RFID reader was situated, and each colored line corresponds to the spatio-temporal trajectory of the children of a class (only 5 classes are shown for clarity). Line widths correspond to the number of children whose approximate position corresponds to the row area. A line can become thinner if children leave the school (for instance during the lunch break, to have lunch at home) or split into two thinner lines if two groups of children of the same class follow distinct paths in the school. The trajectories highlight how mixing between classes, shown by the overlap of the colored lines, occurs during the breaks and is strongly constrained by the school schedule.
To our knowledge, this is the first study presenting detailed measures of close (face-to-face) proximity interactions between children in a primary school (see however
A number of other studies describe or estimate social contact numbers and durations
To allow a more informed comparison between studies based on different methodologies, we compute for each child or for each pair of individuals the number and total duration of contacts lasting longer than a given threshold. The results are summarized in
A. Filtering procedure: only contacts of duration at least T | Average daily number of distinct other children in contact | Average daily cumulated duration of contacts with other children, in minutes |
T = 0 | 47.4 | 176 |
T = 40 s | 20.8 | 100 |
T = 1 min | 11.8 | 65 |
T = 2 min | 4.1 | 28 |
T = 3 min | 2.2 | 19 |
B. Filtering procedure: only cumulated contacts of duration at least W | Average daily number of distinct other children in contact | Average daily cumulated duration of contacts with other children, in minutes |
W = 0 | 47.4 | 176 |
W = 1 min | 21.4 | 163 |
W = 2 min | 15.2 | 153 |
W = 5 min | 8.1 | 129 |
W = 7 min | 6.1 | 117 |
W = 10 min | 4.3 | 102 |
W = 12 min | 3.5 | 93 |
W = 15 min | 2.7 | 81 |
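The two filtering procedures differ in what is thresholded: procedure A discards individual contact events shorter than T, whereas procedure B discards pairs whose cumulated contact time is below W. A minimal sketch is given below, assuming hypothetical `(i, j, duration_s)` contact events for one day and thresholds expressed in seconds; the function names are introduced here for illustration.

```python
from collections import defaultdict

def filter_by_contact_duration(events, T):
    """Procedure A: keep only contact events lasting at least T seconds."""
    return [(i, j, d) for i, j, d in events if d >= T]

def filter_by_cumulated_duration(events, W):
    """Procedure B: keep only pairs whose cumulated time is at least W seconds."""
    total = defaultdict(float)
    for i, j, d in events:
        total[frozenset((i, j))] += d
    return [(i, j, d) for i, j, d in events
            if total[frozenset((i, j))] >= W]

def summarize(events, child):
    """Distinct partners and cumulated contact time (minutes) of one child."""
    partners = set()
    seconds = 0.0
    for i, j, d in events:
        if child in (i, j):
            partners.add(j if i == child else i)
            seconds += d
    return len(partners), seconds / 60
```

Under procedure B a pair that accumulates many short contacts is retained, which is consistent with the cumulated duration decreasing more slowly with W than with T in the table above.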
Our results show that children mix preferentially with children within their age group. This effect, known as age homophily, is largely due to the fact that children study together and have the same schedule, and represents a general feature studied in various contexts by sociologists
These results may help to advise public decision-makers on interventions aimed at containing or mitigating the propagation of communicable diseases at the level of schools, in particular in case of an epidemic or a pandemic. School closure has been proposed as an effective physical intervention to reduce transmission of respiratory pathogens, especially influenza
The development of mathematical models that aim at describing the spread of the infection and its prevention and control is hindered by the lack of information on the contact patterns between individuals. Epidemiological models of disease transmissions in structured populations depend heavily on the knowledge of the amount and duration of contacts between individuals of different age groups. To reduce this knowledge gap, we provide the exposure matrix of
These results highlight important properties of the contact patterns between school children that need to be taken into account when modeling the propagation of diseases and when evaluating control measures. On the one hand, our results tend to indicate that assumptions such as the homogeneity of contact durations, or a homogeneous mixing between classes, may yield misleading results. On the other hand,
In the following, we discuss some limitations of the present study, and point to strategies for moving forward.
First of all, the deployed infrastructure only measured contacts between children while they were in the school building or in the playground. Badges were not worn during sport activities, which often involve close proximity situations and physical contacts. Moreover, even though the children would not be in school during a school closure, they would still mix with other children and adults in the community and spread the virus through these contacts. It would be interesting to use the data collection infrastructure to combine school data with household data and data on contact patterns during school closure
Another potential issue concerns the possibility that children changed their behavior because they were wearing badges and knew they were participating in a scientific measurement. According to observers familiar with the environment (teachers and staff), however, no significant change could be detected in the children's behavior, and the children seemed to rapidly forget about the badges. In addition, while detailed explanations were given to the parents about the study and the badges, details on the role of the RFID badges (e.g., their detection range) were not given to the children.
From a public health perspective, it has to be emphasized that the collected data provide information on the mutual proximity of badges (and therefore of the persons wearing the badges), but not on the occurrence of physical contacts. Our measurements may thus be used in the context of, e.g., respiratory-spread pathogens but not for infectious agents transmitted by skin contact. Note however that physical contact can only occur between persons who are already in spatial proximity. Therefore, it would be very interesting to study the fraction of close encounters that result in a physical contact. In the future, the use of devices that can directly sense physical contact (e.g., body-area networks) may be explored.
The short period of time (two days) of data collection also limits the ability to draw conclusions on what happens at longer time scales. Deployment of the sensing infrastructure over much longer timescales is needed in order to confirm the present results.
Finally, the data presented in this study depend on the school schedule and spatial structure, and the generalization of our results to other schools should be carried out with caution. Some properties are however expected to be rather general, such as the heterogeneity of the contact durations and of the cumulated contact durations, that has been observed in several other settings
Further research will use the gathered data to simulate the transmission of infectious agents (e.g., respiratory or gastro-enteric viruses) inside a school, to evaluate the role of the index case, and to assess the impact of various containment measures (e.g., class closure, or homogeneous partial vaccination vs. vaccination of whole classes at fixed coverage). Further deployments in other schools with different schedules, in other countries, and possibly for longer periods, will also be very useful to cross-validate our findings.
Movie representing the dynamical evolution of the contacts during the first day of the deployment. Each dot represents an individual, and an edge is drawn when a contact between two individuals occurs. Only contacts lasting at least 40 s are retained. Each frame corresponds to an aggregation of the contact network over a time window of 20 min, and successive frames correspond to aggregation time windows shifted by 10 s; the movie is then built using 20 frames per second. Nodes are arranged in circles corresponding to the various classes, with the teacher at the center, and color-coded according to the grade (teachers are shown in black).
(AVI)
Comparison of the measured average numbers and durations of contacts across several studies.
(DOC)
Cumulative contact network for day 1, in gml format; each edge is weighted by the total time in face-to-face proximity (“duration”) and the number of events of face-to-face proximity (“count”) detected between the two corresponding RFID badges.
(GML)
Cumulative contact network for day 2, in gml format; each edge is weighted by the total time in face-to-face proximity (“duration”) and the number of events of face-to-face proximity (“count”) detected between the two corresponding RFID badges.
(GML)
List of tags and corresponding class of the person who wore it.
(TXT)
We warmly thank Bitmanufaktur, the Openbeacon project and Truelite for their technical support. We are particularly grateful to all children, their parents and the school staff who volunteered to participate in the data collection.