An analysis of school absences in England during the COVID-19 pandemic

Background The introduction of SARS-CoV-2, the virus that causes COVID-19 infection, in the UK in early 2020, resulted in the introduction of several control policies to reduce disease spread. As part of these restrictions, schools were closed to all pupils in March (except for vulnerable and key worker children), before re-opening to certain year groups in June. Finally, all school children returned to the classroom in September.
Methods Here, we analyse data on school absences in late 2020 as a result of COVID-19 infection and how that varied through time as other measures in the community were introduced. We utilise data from the Department for Education Educational Settings database and examine how pupil and teacher absences change in both primary and secondary schools.
Results Our results show that absences as a result of COVID-19 infection rose steadily following the re-opening of schools in September. Cases in teachers declined during the November lockdown, particularly in regions previously in tier 3, the highest level of control at the time. Cases in secondary school pupils increased for the first 2 weeks of the November lockdown, before decreasing. Since the introduction of the tier system, the number of absences with confirmed infection in primary schools was observed to be (markedly) lower than that in secondary schools. In December, we observed a large rise in the number of absences per school in secondary school settings in the South East and London, but such rises were not observed in other regions or in primary school settings. We conjecture that the increased transmissibility of the new variant in these regions may have contributed to this rise in secondary school cases. Finally, we observe a positive correlation between cases in the community and cases in schools in most regions, with weak evidence suggesting that cases in schools lag behind cases in the surrounding community.
Conclusions We conclude that there is no significant evidence to suggest that schools are playing a substantial role in driving spread in the community and that careful monitoring may be required as schools re-open to determine the effect associated with open schools upon community incidence.


Introduction
In late 2019, a novel strain of coronavirus, now known as SARS-CoV-2, emerged in Wuhan, China [1,2]. Over the next few months, this virus spread around the world, with the World Health Organization declaring a global pandemic on 11th March 2020 [3]. Upon infection with the virus, individuals can develop COVID-19 disease. In some instances, infected people do not develop symptoms or are only mildly affected, with symptoms including a dry cough, a fever, shortness of breath and a loss of taste and smell [4][5][6]. However, in more serious cases, predominantly in the elderly and those with underlying health conditions, hospitalisation and admission to intensive care may be required, with many of these individuals dying as a result of their infection [7][8][9][10].
In the UK, the government began to introduce control policies in March 2020 in order to prevent the spread of infection. These included the closing of all pubs, restaurants and non-essential shops on 20th March, as well as the closing of schools [11] to all pupils except for vulnerable children or those with key worker parents. On 23rd March, the UK entered full lockdown, whereby people were only allowed out of their house for essential shopping, medical treatment, essential work and one form of exercise per day [12].
All children returned to school in England from 1st September, but with non-pharmaceutical interventions (NPIs) in place [14]. In many instances, this involved staggered drop off and pick up times for children, mandatory wearing of masks in playgrounds for parents and in indoor settings excluding classrooms for secondary school children, as well as advice to parents not to congregate outside the school gates. Additionally, children and staff were placed into "bubbles" in order to minimise the risk of large-scale spread. In primary schools, these bubbles were introduced either at the level of the class or the year group, whilst in most secondary schools, the entire year group would typically form a bubble, owing to substantial mixing across different academic subjects [15]. If a pupil or staff member within a bubble tested positive for COVID-19, all others within the bubble were required to isolate for 14 days and were advised to seek a test if they started to develop symptoms.
During October 2020, as cases began to rise, the government introduced a regional three-tiered system in order to control disease spread [16,17]. Each region of the country would be placed into a tier dependent upon the local incidence, the effective reproduction number and the local hospital occupancy. In all tiers, the "rule of six" was in place, which prohibited mixing in groups of more than six people, but as regions escalated through the tiers, the settings in which the individuals could meet outside their household became more restricted in an attempt to curb the spread of infection. When the tiers were introduced on 14th October 2020, the majority of the South of England and the Midlands were in tier 1, the lowest level of control, whilst many parts of the North were first placed into tier 2, with some regions escalating further into tier 3 a few days later. By the end of October 2020, cases of COVID-19 were rising across the country in a concerning way [18,19] and the government announced that a 4-week lockdown would be introduced in England from 5th November 2020. However, schools remained open during this lockdown, as it was decided that the need for children to remain in education outweighed the risks associated with schools remaining open.
The emergence of a new, more transmissible variant, B.1.1.7, in the South East of England during this period contributed to a marked increase in spread towards the end of the year, particularly in the South East, Greater London and the East of England [20]. It became apparent that the November lockdown in England had not been sufficient to bring the reproduction number (R) below 1 and that more action would be needed urgently to avoid the National Health Service becoming overwhelmed in the new year. Over the Christmas period, there was significant debate over the need for another national lockdown and whether or not this lockdown should include the closure of schools. Finally, on 4th January, the government announced that a new lockdown would be introduced in England with immediate effect and that schools would again be closed to all children except vulnerable children and those with key worker parents [21].
The closing of schools was brought in as a necessary measure on 20th March 2020 and again on 4th January 2021, owing to the need for a substantial tightening of restrictions in order to bring the R number below 1. However, decisions to close schools need to balance the risk associated with transmission of SARS-CoV-2 with the negative impact of school closures upon children's educational needs and their health and well-being. The evidence to date strongly indicates that the vast majority of children are only mildly affected by the disease and the mortality rates are extremely low [22,23]. There has been a higher level of uncertainty regarding children's role in transmission of the virus [24,25] and the effect that the closing of schools will have upon the reproduction number, though when schools are open there is the potential for increased mixing of parents which may contribute to transmission. In order to ensure that children can return to the classroom as quickly but safely as possible, it is important to understand the risk in both primary and secondary settings and how that risk may vary depending on the incidence of COVID-19 in the local community.
In this paper, we analyse data from the Department for Education on school absences in primary and secondary settings throughout the pandemic, and how the level of absences was dependent upon the current state of the pandemic and the level of controls that were in place in the wider community at the time. Owing to the restrictions that were in place before the summer vacation and the fact that schools were only open to key worker children, vulnerable children and specific year groups at that time, the overall attendance at school showed substantial variability during 2020 (Fig. 1). We focus here upon the autumn term, from September to December 2020, when all children were in school, in order to provide evidence regarding the risk associated with schools remaining open during this time.

Data sources
The Department for Education: Educational Setting Status data [26] were extracted over four time periods: 22nd March 2020 to 30th May 2020, 1st June 2020 to 16th July 2020, 1st September 2020 to 10th October 2020 and 12th October 2020 to 17th December 2020. A database for each extraction contains the status of each school on each day throughout the extracted period, including quantitative and qualitative records for attendance and absences. Available in all records were the school ID number, time stamp and the number of pupils in attendance. Details summarising the changes made to the databases throughout the time period we analysed are given in Table 1.
In this paper, we primarily consider the data collected in the autumn term (1st September 2020 to 17th December 2020) when all pupils on roll were eligible to attend school. It is important to note that during the first half of this time frame (1st September 2020 to 10th October 2020), records of teacher absences were limited, with less than 1% of schools submitting this information.
From the school Unique Reference Number (URN; the school ID), the location of each school, including postcode, Lower Tier Local Authority (LTLA) and regional information, was extracted from the government database [27]. Additionally, phase of education (e.g. primary or secondary) and establishment type (e.g. state, private, academy) were accessible for each school. LTLAs in the UK are a lower level of local government, typically managed by a district, borough or city council, and during the COVID-19 pandemic they have tended to be the finest scale at which data have been reported.

Data processing
[Table 1: changes made to the databases over the analysed time period. Fields listed: eligibility; total number of pupils; eligible pupils absent due to suspected or confirmed COVID-19; teachers absent due to suspected or confirmed COVID-19.]

We cleaned the data by removing any rows with missing date values and smoothing the pupil roll and teacher roll over the term 1 period. We approximated the total number of pupils on roll at each school by taking the maximum of the daily number of pupils on roll at each school over term 1. This was performed to smooth out any small changes over the term, particularly to remove the drop in eligibility during half term and the usual staggered opening of early years schooling in September, when reception children typically first go to school in a phased manner over the first 2 weeks of term. Teacher roll was not available until 12th October 2020 (as described in Table 1). We approximated the teacher roll prior to 12th October 2020 by taking the maximum recorded in each school between the 12th October 2020 and 17th December 2020 period, assuming that the teacher roll is constant over a school term.
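The roll-smoothing step can be illustrated with a minimal Python sketch. This is not the code used for the analysis: the record layout `(school_id, date, roll)`, the function name `smooth_roll` and the example values are all hypothetical, and skipping rows with a missing roll value is an added assumption beyond the missing-date rule stated in the text.

```python
from collections import defaultdict


def smooth_roll(records):
    """Return {school_id: maximum daily roll over the term}.

    `records` is an iterable of (school_id, date, roll) tuples
    (a hypothetical layout, not the DfE schema).
    """
    max_roll = defaultdict(int)
    for school_id, date, roll in records:
        # Rows with a missing date are removed, as in the cleaning step.
        # Dropping rows with a missing roll is an assumption of this sketch.
        if date is None or roll is None:
            continue
        # Taking the maximum over the term ignores temporary dips such as
        # half term or the phased start of reception classes.
        max_roll[school_id] = max(max_roll[school_id], roll)
    return dict(max_roll)


# Hypothetical example records.
records = [
    ("A", "2020-09-01", 180),  # phased start, not all pupils eligible yet
    ("A", "2020-09-14", 210),
    ("A", "2020-10-26", 0),    # half term
    ("A", None, 500),          # missing date, dropped
    ("B", "2020-09-01", 95),
]
term_roll = smooth_roll(records)
```

The same function could be applied to the teacher roll by restricting the records to 12th October 2020 onwards and using the result for the earlier part of the term, under the constant-roll assumption described above.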

Data analysis methods
The data were aggregated spatially by summing over each LTLA or region. It is these aggregated values that we used for our spatial analyses, rather than considering each school individually within a region. We also grouped the data by the control policies that were in place in the corresponding LTLA: each school was categorised by the tier of its LTLA on each day, and we then aggregated over all schools under each tier allocation over time. To study the spatial-temporal patterns, we computed and mapped community case percentages as the proportion of positive tests from the Pillar 2 polymerase chain reaction (PCR) test data (testing done in the community and not in hospitals) for each LTLA, averaged over the 5-day school week; teacher and pupil percentages were calculated based upon the proportion of teachers and pupils who were absent from school due to a positive test in each LTLA, averaged over the 5-day school week. Relative incidence thresholds were calculated in the first week of November by assigning the top 10% of LTLAs to the 'Very high' category, the 75th-90th percentile as 'High', the 50th-75th percentile as 'Medium' and the remainder 'Low'. We then used the same threshold values in all subsequent weeks, to allow for comparison across weeks.
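As an illustration of the relative incidence categories, the following Python sketch fixes the thresholds from a reference week and reuses them for later weeks. It is a sketch under stated assumptions rather than the procedure actually used: the function names, the interpolation rule for percentiles (the `inclusive` method of `statistics.quantiles`), the treatment of values exactly on a threshold, and the choice to average daily percentages over the week are all assumptions, and the example values are made up.

```python
import statistics

LABELS = ("Low", "Medium", "High", "Very high")


def weekly_percentage(daily_absent, daily_roll):
    """Mean of the daily percentages absent over one school week (assumed definition)."""
    daily = [100.0 * a / r for a, r in zip(daily_absent, daily_roll)]
    return sum(daily) / len(daily)


def reference_thresholds(values):
    """Return the 50th, 75th and 90th percentiles of the reference-week values."""
    # n=20 gives cut points at 5%, 10%, ..., 95%; indices 9, 14 and 17
    # correspond to 50%, 75% and 90%.
    cuts = statistics.quantiles(values, n=20, method="inclusive")
    return cuts[9], cuts[14], cuts[17]


def categorise(value, thresholds):
    """Label a value using fixed thresholds; a value must exceed a cut to move up."""
    return LABELS[sum(value > t for t in thresholds)]


# Hypothetical reference week: one value per LTLA.
reference_week = list(range(101))
thresholds = reference_thresholds(reference_week)

# The same thresholds are reused for a later week to allow comparison.
later_week_labels = [categorise(v, thresholds) for v in (10, 60, 80, 95)]
```

Because the thresholds are computed once and then held fixed, the share of LTLAs in each category can change from week to week, which is what makes the weekly maps comparable.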
To assess the impact of the new variant, B.1.1.7, we considered specific regions which, at the time, had been differentially impacted by the new variant. London and Kent were chosen due to the high number of new variant compatible cases reported, whilst Devon and the West Midlands were chosen due to having had fewer reported cases compatible with the new variant. Additionally, Devon and West Midlands had differing tier statuses on the 3rd December 2020, with Devon in tier 2 and the West Midlands in tier 3.
We applied two approaches to assess the possible impact of the B.1.1.7 variant on school absences: (i) inspect the distribution of student absences due to a confirmed COVID-19 case on a day in November 2020 and a day in December 2020 and (ii) analyse lagged correlations between absences and community cases.
We inspected the distribution of student absences due to a confirmed COVID-19 case on 4th November 2020 and 16th December 2020 in each of the four regions. Explicitly, for each region, we identified the number of students absent in each school due to a confirmed positive test on the specified days and observed how these absences per school were distributed on each day.
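A minimal Python sketch of this per-school distribution is given below. The row layout `(school_id, date, n_confirmed)` and the function name are hypothetical, and counting only schools that report at least one confirmed absence is an assumption of the sketch.

```python
from collections import Counter


def absence_distribution(rows, day):
    """Map 'number of pupils absent with a confirmed case' to 'number of schools'.

    Only schools reporting at least one confirmed absence on `day` are
    counted (an assumption of this sketch).
    """
    counts = Counter()
    for school_id, date, n_confirmed in rows:
        if date == day and n_confirmed > 0:
            counts[n_confirmed] += 1
    return dict(sorted(counts.items()))


# Hypothetical rows for one region.
rows = [
    ("A", "2020-11-04", 1),
    ("B", "2020-11-04", 1),
    ("C", "2020-11-04", 0),
    ("A", "2020-12-16", 4),
    ("B", "2020-12-16", 1),
    ("C", "2020-12-16", 4),
]
november = absence_distribution(rows, "2020-11-04")
december = absence_distribution(rows, "2020-12-16")
```

Comparing the two dictionaries for a region shows whether more schools report larger numbers of confirmed absences on the later date.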
To study lagged correlations between absences and community cases, we calculated Pearson's correlation coefficient between the number of community cases in an LTLA on one day and the number of pupil absences due to a confirmed positive COVID-19 test on another day. We considered discrete daily lags from −10 to +16 days, with a lag of +k referring to the correlation between cases in the community on day t − k and absences in school due to a confirmed COVID-19 test on day t. We considered primary and secondary schools separately, with a single school on a single day within the specified region corresponding to a single data point.
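The lag convention can be made concrete with a short Python sketch. It is illustrative only: the dictionary layouts, the integer day index, the function names and the synthetic data are hypothetical. The confidence interval uses the Fisher transformation, which the Fig. 8 caption names as the method for the 95% intervals; the standard error of 1/sqrt(n − 3) is the usual textbook form and is assumed here.

```python
import math


def pearson(xs, ys):
    """Pearson's correlation coefficient (undefined if either series is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% confidence interval for r via the Fisher transformation."""
    z = math.atanh(r)  # requires -1 < r < 1
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def lagged_correlation(community, absences, school_ltla, lag):
    """Correlate community cases on day t - lag with school absences on day t.

    community:   {(ltla, day): cases}
    absences:    {(school, day): confirmed absences}
    school_ltla: {school: ltla}
    Each school-day with a matching community value is one data point.
    """
    xs, ys = [], []
    for (school, day), n_absent in absences.items():
        key = (school_ltla[school], day - lag)
        if key in community:
            xs.append(community[key])
            ys.append(n_absent)
    return pearson(xs, ys), len(xs)


# Synthetic example: absences follow community cases exactly 3 days later.
community = {("ltla1", d): (7 * d) % 11 for d in range(30)}
absences = {("s1", d): community[("ltla1", d - 3)] for d in range(3, 30)}
school_ltla = {"s1": "ltla1"}

r_lag3, n_lag3 = lagged_correlation(community, absences, school_ltla, lag=3)
r_lag0, n_lag0 = lagged_correlation(community, absences, school_ltla, lag=0)
ci_low, ci_high = fisher_ci(0.5, 103)
```

In the synthetic data the correlation is highest at a lag of +3, matching the way the school series was constructed; in the analysis the same calculation would be repeated for every lag from −10 to +16 and for primary and secondary schools separately.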

Temporal variation in absences in schools by region
We analysed the total number of confirmed cases in schools in all regions (Fig. 2). Cases in pupils steadily increased in all regions following the return to school in early September. We note that, owing to a change in the data recording system in mid-October, a distinct increase in the number of secondary school pupils absent is observed across all regions. We believe that this is an artefact of an alteration in the data recording system rather than a true rise in absences in that week. Following half term in late October, confirmed cases in pupils continued to rise, noticeably in secondary schools. Throughout this period, the percentage of confirmed cases in secondary school pupils was much higher than that in primary schools. Cases were seen to reduce in all regions 2 weeks after the introduction of lockdown in November. In December, cases in secondary school students in Greater London increased markedly (Fig. 2, light green line), but in other regions, particularly those in tier 3 such as the West Midlands and the North West (Fig. 2, orange and pink lines), cases continued to decrease, indicating that a reduction of spread in the community may have resulted in a reduction of cases in schools.
Confirmed cases in teachers declined throughout November in regions under greater restrictions prior to lockdown (North West, North East, West Midlands), compared to a slight increase in lower tier control regions. We did not observe a marked difference between the percentage of confirmed cases in teachers in primary and secondary schools (Fig. 2c, d). The number of cases in teachers increased in Greater London and the East of England in December, but at a lower rate than in secondary school pupils. For all regions, in both primary and secondary schools, we find a strong correlation between cases in pupils and teachers, with a larger number of cases in students in secondary schools but no evidence of increased risk to teachers in this setting (Fig. 3).
Fig. 2 Percentage of study population recorded as a confirmed case, stratified by region. For each panel, we display the number of cases by date and by region, from 1st September 2020 to 17th December 2020. Cases in teachers were not recorded in the data prior to 12th October 2020, when the data outputs from DfE were updated (the date after which the data changed is indicated by the vertical dashed lines). The half term week for most of England is shown by the dark grey shaded region whilst the light grey shaded region represents the national lockdown in England which commenced on 5th November. a Pupils in primary schools. b Pupils in secondary schools. c Teachers in primary schools. d Teachers in secondary schools. In b, we observe a spike in cases in Greater London on a single day in late September that is a result of a single school reporting a high number of absences on that day. Given that this only occurs for 1 day, we believe that this is a data entry error.

Analysis of absences owing to cases of COVID-19 in school children and teachers by tier status of LTLA of school location
Fig. 3 Confirmed cases in teaching staff (by percentage per region) against confirmed cases in pupils (by percentage per region) by day for all regions. For all panels, the circle for each region indicates the earliest date in this data set (12th October 2020) whilst the square indicates the latest date (18th December 2020). The correlation coefficient for each region is given in the legend. Cases are shown for: a all schools; b primary schools only; and c secondary schools only.
We also examined the number of absences as a result of confirmed cases in pupils and teachers, stratified by tier status of the relevant local authority (Fig. 4). We observe a marked difference between students and teachers when stratified by tier status. In primary schools, cases in students increased slightly in tiers 1 and 2 for the first 2 weeks of the national lockdown in November, though remained relatively static in tier 3 (Fig. 4a). Cases then began to marginally reduce across all tiers. In secondary schools, confirmed cases in students increased across all tiers for the first 2 weeks of lockdown before decreasing (Fig. 4b).
In tier 3 regions, cases continued to decline after lockdown, whilst there was a marginal increase in cases in tier 2 regions. We observe a different pattern of behaviour in teachers: confirmed cases in regions previously in tier 3 declined throughout the lockdown in both primary and secondary schools, whilst there was a marginal increase in confirmed cases in tier 2 and tier 1 regions during this same period (Fig. 4c, d). Cases in teachers increased slightly in tier 2 regions in the second week of December in both settings, whilst they continued to decline in tier 3 regions.

Spatiotemporal analysis of community cases and cases in schools in November and December
We investigated the spatiotemporal behaviour of cases at the lower tier local authority (LTLA) level in the community, in school teachers and school pupils from early November to the end of term.
Fig. 4 Percentage of study population recorded as a confirmed case, stratified by intervention tier status. For each panel, we display the number of cases by date and by intervention tier status, from 12th October 2020 to 17th December 2020. The half term week for most of England is shown by the dark grey shaded region, with the period corresponding to the national lockdown shown by the light grey shaded region. The faded dots indicate the tier status prior to the national lockdown that was introduced on Thursday 5th November 2020. a Pupils in primary schools. b Pupils in secondary schools. c Teachers in primary schools. d Teachers in secondary schools. It should be noted that several regions changed tiers when lockdown was lifted on 2nd December, leading to an observed discontinuity in the data displayed in the figure on this date.
Community cases were highest in the North, the Midlands and Greater London at the start of the November lockdown. Cases were observed to decrease in these regions during lockdown, with a new cluster emerging in the South East and London in the last week of November (Figs. 5 and 6, left columns). Cases in teachers (Figs. 5 and 6, middle columns) and pupils (Figs. 5 and 6, right columns) did reduce during lockdown, but at a slower rate. A new cluster of cases emerged in school teachers and pupils in late November, whilst the country was under lockdown, and a similar increase in cases in the community was observed in December. At LTLA level, there is some slight variation observed between local authorities reporting very high numbers of community cases and very high numbers of cases in schools.
Figs. 5 and 6 Relative incidence at LTLA level by week for England. For the week commencing 10th November, the top 10% of LTLAs are designated 'Very high', the 75th-90th percentile 'High', the 50th-75th percentile 'Medium' and the remainder 'Low'. Cases are grouped into (left column) community, (middle column) school teachers and (right column) school pupils. Cases in the community are calculated as the percentage of swab tests administered in each LTLA which returned a positive result; cases in teachers and pupils as the percentage which are absent from school due to a positive test.
Given this observed increase in community cases in the South East during November and December, we investigated whether there was any signal indicating an increase in clusters of cases in schools during this period. We studied the frequency distribution of the number of confirmed cases by school on 4th November, the day before lockdown was introduced, and on 16th December, 2 days before the end of term (Fig. 7). We observe that, in Greater London and Kent, more secondary schools reported a greater number of students absent with confirmed infection in the last week of term compared to early November. However, we do not observe the same effect in primary schools in these regions: when absences with infection are reported in primary schools, in the majority of cases, there is a single child absent with confirmed infection (Fig. 7, first row). When we compare this result with other regions such as Devon (Fig. 7, bottom left panel) and the West Midlands (Fig. 7, bottom right panel), we note that secondary school absences in the last week of term follow a similar distribution to early November.
On each subplot of Fig. 7, we display the distributions for 4th November 2020 (blue) and 16th December 2020 (orange) for primary schools (first two plots) and secondary schools (second two plots).
Finally, we examined the temporal correlation at the LTLA level between cases in the community and cases in primary and secondary schools (Fig. 8). We varied the lag time between school and community cases to explore whether there was any signal that increased cases in schools resulted in increased cases in the community at a later time, or whether the opposite was the case. In London, Kent and the West Midlands, we observe a weak correlation between cases in secondary school pupils and community cases that increases with lag time, peaking at a lag of around 5 days in London and 13 days in Kent and the West Midlands, indicating that an increase in community cases is most positively correlated with an increase in school cases in pupils at a later date. We observe the same result for primary school pupils in Kent and the West Midlands, but noticeably observe a negligible correlation between community cases and cases in primary school children in London across all time lags. We observe a much weaker correlation in Devon, possibly owing to the relatively low number of cases observed in children in the county.

Discussion
In this paper, we present a set of analyses of the Department for Education data on Educational Settings recording school attendance, in order to investigate the impact of the pandemic upon schools and the potential role of school children in transmission in the wider community. We observe that cases in schools increased throughout September and October 2020, mirroring the increases reported in the local community. The percentage of students with confirmed infection was found to be higher in secondary schools than in primary schools throughout this period. Notably, this was not the case with teachers: the percentage of teachers reporting infection appeared to be of a similar magnitude in both primary and secondary schools. This suggests that teachers are not exposed to increased risk in school environments where more children are infected, perhaps suggesting that the background incidence in the community plays a greater role in determining the risk to teachers. We can also infer that teachers are not at greater risk in primary schools than in secondary schools.
Fig. 8 Correlation between cases in the community and pupils in November and December. In these panels, a positive lag indicates that the correlation is calculated between schools on the current date and community cases that have been reported at an earlier date up to a maximum lag of 16 days. The correlation is calculated for all LTLAs in each region, rather than calculated individually for each LTLA. For varying time lag applied to data from November and December, the LTLAs presented are a Greater London, b Kent, c Devon and d West Midlands. Secondary schools are depicted by dashed red lines and primary schools by solid blue lines. For each line, the shaded regions indicate the 95% confidence intervals, which were calculated using the Fisher transformation.
During the November 2020 lockdown, schools remained open and the observed rise in cases in younger people led to suggestions that schools were playing a major role in spreading the virus. However, the subsequent confirmation of the emergence of the more transmissible B.1.1.7 variant provided evidence to suggest that this may not be the case; the increase in cases in secondary school-aged children in London, the South East and the East of England throughout late November and early December was not observed in the North West, the North East and the Midlands, where this new variant was not widely circulating at that point in time.
We seek to understand whether cases in schools are driving an increase in cases in the community, or whether an increase in incidence in the local area leads to increased infection rates in school-aged children and hence more cases reported in schools. Some insights can be gained by examining cases in schools stratified by the tier status of the relevant local authority. Notably, the increase in cases in students observed across all tiers during the first 2 weeks of the November lockdown, particularly in secondary schools, was not reflected in a rise in cases in teachers during this same period. If schools were exposing teachers to increased risk during this lockdown, we might expect that, as cases started to rise amongst secondary school children, a similar rise may be observed, following a time lag, in cases in teachers. Given that this is not the case, this may suggest that teachers are more at risk of infection in the community than in the school environment and the decreased community mixing due to the national lockdown led to the drop in cases in teachers during this period.
During December, we observed a distinct increase in the number of confirmed cases in students and teachers in the South East of England. However, this increase mirrored that seen in the local community. As the new variant B.1.1.7 became more prevalent, community cases increased more rapidly in the South East. We did observe some spatial variability at the LTLA level between areas of high incidence in the community and in schools. From our analysis, it appears that during December there was an increase in clusters of cases in secondary schools in those parts of the country that were most affected by the new variant. Kent in particular reported more schools with large numbers of students absent with confirmed infection in mid-December compared with before the start of the November lockdown. Noticeably, we did not observe a marked increase in the number of students absent per school in primary schools in Kent. There has been much debate around the relative role of primary and secondary schools during the pandemic and this analysis at least suggests that primary school children do not appear to be as affected as secondary school children by the emergence of the new variant.
When we examined the relationship between community and school cases in more depth, we observed a correlation between cases in the community and cases in schools in most regions, with the strongest correlation between current cases in schools and community cases reported several days previously. From this analysis, we conclude that there is not sufficient evidence to suggest that outbreaks in schools are driving an increase in community cases, with the calculated correlations providing weak evidence that the opposite may be true: that an increase in incidence in the community leads to more cases in schools. As schools re-open, careful monitoring may be required in order to determine the risk associated with open schools upon community incidence.
It is important to note that all of the data analysed here refer to absences in schools as a result of confirmed cases of COVID-19 in pupils and teachers, but they do not necessarily imply that these individuals were infected within schools. The data do not record location of infection, and therefore, we cannot provide conclusive evidence of the presence or absence of spread within a school.

Conclusions
At the time of writing at the end of January 2021, there have been almost 3.8 million confirmed cases of COVID-19 in the UK and around 105,000 deaths [28]. Hospital occupancy is reaching capacity in many parts of the country, daily deaths are still above 1000 per day and schools remain closed except for children of key workers and vulnerable children. It is clear that the longer that children remain out of school the greater the risk of many children suffering long term from a lack of access to face-to-face teaching and socialisation, with a resulting negative impact upon their mental health and education. It is vital that processes are put in place to ensure that children get back to school as rapidly but as safely as possible. Our work suggests that this can be achieved by ensuring that community incidence is as low as possible when schools re-open. However, further measures, such as ensuring parents do not mix at pick up and drop off and a reinforcement of the need for people to work from home if they can, may be needed in order for children to return to school safely in the near future.