Ethnic and socioeconomic differences in SARS-CoV-2 infection in the UK Biobank cohort study

Background
Understanding of the role of ethnicity and socioeconomic position in the risk of developing SARS-CoV-2 infection is limited. We investigated this in the UK Biobank study.
Methods
The UK Biobank study recruited people aged 40-70 years from the general population in 2006-2010, collecting information about self-defined ethnicity and socioeconomic variables (including the Townsend deprivation index and educational attainment). SARS-CoV-2 test results from Public Health England were linked to baseline UK Biobank data. Poisson regression with robust standard errors was used to estimate risk ratios (RRs) for the associations between the exposures and three binary outcomes: being tested, having a positive test and testing positive in hospital. We also investigated whether ethnicity and socioeconomic position were associated with having a positive test amongst those tested. We adjusted for covariates including age, sex, social variables (including healthcare work and household size), behavioural risk factors and baseline health.
Findings
Among 428,225 participants, 1,474 had been tested and 669 had tested positive between 16 March and 13 April 2020. Black, south Asian and white Irish people were more likely than white British people to have confirmed infection (RR 4.01 (95%CI 2.92-5.52); RR 2.11 (95%CI 1.43-3.10); and RR 1.60 (95%CI 1.08-2.38), respectively) and to test positive in hospital. Although they were more likely to be tested, those tested were also more likely to test positive. Adjustment for baseline health and behavioural risk factors led to little change, with only modest attenuation when accounting for socioeconomic variables. Area socioeconomic deprivation and having no qualifications were consistently associated with a higher risk of confirmed infection (RR 1.91 (95%CI 1.53-2.38) and RR 2.26 (95%CI 1.76-2.90), respectively).
Interpretation
Some minority ethnic groups had a higher risk of confirmed SARS-CoV-2 infection in the UK Biobank study, which was not accounted for by differences in socioeconomic conditions, measured baseline health or behavioural risk factors. An urgent response to address these elevated risks is required.
Funding
Medical Research Council, Chief Scientist Office.


Research in Context
Evidence before the study
Previous pandemics have often affected specific ethnic and socioeconomically disadvantaged groups disproportionately. On 13 April 2020, we searched the Cochrane COVID-19 study register, the National Library of Medicine's LitCovid database, medRxiv and bioRxiv for epidemiological studies of the predictors of developing SARS-CoV-2 infection and of the prognosis of COVID-19. A preprint ecological study of US counties suggested that areas with greater socioeconomic disadvantage and higher proportions of ethnic minority residents tended to have higher COVID-19 case fatality. Audit data from critical care units in the UK and administrative data from the US Centers for Disease Control and Prevention found a higher-than-expected proportion of ethnic minorities among people diagnosed with SARS-CoV-2 infection. However, we found no previous studies that accounted for potential differences in prior health, behavioural risk factors or social circumstances. We also found a lack of studies examining differences in the risk of SARS-CoV-2 infection or prognosis across socioeconomic groups.

Added value of the study
In a large population-based cohort study in the UK, we found an increased risk of developing confirmed SARS-CoV-2 infection in Black, South Asian and White Irish ethnic groups. The risk of confirmed infection was also higher with socioeconomic disadvantage (as assessed by both Townsend deprivation quartile and education level). Adjustment for potential confounding and mediating variables did not fully account for the differences in risk for either ethnicity or socioeconomic position. We also investigated whether differences in testing practice could be responsible for these findings (because of differential ascertainment) but found no evidence of this.

Implications of all the available evidence
There is increasing evidence that some ethnic minority groups (particularly black, south Asian and white Irish people) experience an increased risk of SARS-CoV-2 infection, and that risk is also higher amongst more socioeconomically disadvantaged groups. While socioeconomic position, country of birth, behavioural risk factors and prior health might account for some of the differences between ethnic groups, they do not fully explain this risk.
Policy interventions designed to contain transmission and shield high-risk groups need to take account of the higher risk of SARS-CoV-2 infection and worse prognosis experienced by specific ethnic groups and more socioeconomically disadvantaged populations. Monitoring the impacts of the pandemic across different social groups is warranted, so that targeted interventions and a responsive policy approach can be pursued. Further research is needed to understand the mechanisms by which these excess risks arise.

Background
Severe Acute Respiratory Syndrome coronavirus 2 (SARS-CoV-2) and the disease it causes, COVID-19, are spreading rapidly worldwide. 1 A better understanding of the predictors of developing infection is essential for health service planning (e.g. ensuring adequate facilities for those most at risk), for targeting prevention efforts (e.g. targeted shielding or surveillance) and for informing future modelling. Age, male sex and pre-existing medical conditions are established predictors of adverse COVID-19 outcomes, as is excess adiposity, 2 but the role of social determinants is poorly understood. 3,4 Ethnicity and socioeconomic position strongly influence health outcomes for both infectious and non-communicable diseases, and previous pandemics have often disproportionately affected ethnic minorities and socioeconomically disadvantaged populations. 5,6 Early evidence suggests that the same may be occurring in the current pandemic, but empirical research remains very limited. 7 It is highly plausible that infection risk will vary across these social groups. For example, socioeconomic disadvantage is linked to living in overcrowded housing, and some ethnic groups are more likely to live in larger households; 8 both could predispose to a higher risk of infection and to greater viral load.
Establishing the risk of developing infection across different social groups is challenging. A major issue is that information about ethnicity and socioeconomic position is often poorly recorded in routine health data. Furthermore, the size of the different social groups in the general population is often not accurately known, so the denominators needed to estimate risk are uncertain. The ideal approach to estimating infection risk across different social groups is to analyse data from a cohort study, but most existing cohort studies with detailed information about ethnicity and socioeconomic position are subject to long delays before data become available for analysis, or are too small to provide useful estimates of infection risk.
The UK Biobank study has carried out data linkage between its study participants and SARS-CoV-2 test results held by Public Health England. We therefore aimed to investigate the relationship between ethnicity, socioeconomic position and the risk of having confirmed SARS-CoV-2 infection in the population-based UK Biobank study.
This preprint (doi: https://doi.org/10.1101/2020.04.22.20075663) was posted on medRxiv on April 27, 2020 and has not been certified by peer review. The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Study design and participants
Data were obtained from UK Biobank (https://www.ukbiobank.ac.uk/); the methods have been described in detail previously. 9 In brief, over 502,000 community-dwelling individuals aged 37 to 73 years were recruited to the study between 2006 and 2010. Participants attended one of 22 assessment centres across England, Scotland and Wales. Data were collected on a range of topics, including social and demographic factors and health and behavioural risk factors, using standardised computer-based self-completion questionnaires and interviews conducted by trained staff.
Results of SARS-CoV-2 tests for UK Biobank participants, including confirmed cases, were provided from the Public Health England (PHE) microbiology database, the Second Generation Surveillance System, and linked to UK Biobank baseline data. 10 The PHE data included the specimen date, specimen type (e.g. upper respiratory tract), laboratory, origin (whether the microbiological record indicated that the participant was an in-patient) and result (positive or negative).
Data were available for the period 16 March 2020 to 14 April 2020.
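The linked records described above are test-level (one row per specimen), whereas the analyses use one row per participant. A minimal sketch of that collapse step follows, assuming hypothetical field names (`participant_id`, `result`, `inpatient`) rather than the actual PHE or UK Biobank schema. The rule that a hospitalised case is a positive specimen whose origin indicates in-patient status follows the outcome definitions used in this study, and cohort members with no linked record are treated as untested.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecimenRecord:
    """One linked test result. Field names are hypothetical, not the PHE schema."""
    participant_id: int
    result: str       # "positive" or "negative"
    inpatient: bool   # True if the microbiological record indicates in-patient origin


def derive_outcomes(cohort_ids, records):
    """Collapse specimen-level records into binary outcomes, one dict per cohort member."""
    cohort = set(cohort_ids)
    outcomes = {
        pid: {"tested": False, "positive": False, "hospitalised_case": False}
        for pid in cohort_ids
    }
    for rec in records:
        if rec.participant_id not in cohort:
            continue  # ignore records for people outside the analysis cohort
        o = outcomes[rec.participant_id]
        o["tested"] = True
        if rec.result == "positive":
            o["positive"] = True
            if rec.inpatient:
                # hospitalised case: a positive specimen taken while an in-patient
                o["hospitalised_case"] = True
    return outcomes


records = [
    SpecimenRecord(1, "negative", False),
    SpecimenRecord(1, "positive", True),
    SpecimenRecord(2, "negative", False),
    SpecimenRecord(3, "positive", False),
    SpecimenRecord(99, "positive", True),  # not in the cohort below
]
outcomes = derive_outcomes([1, 2, 3, 4], records)
```

Positivity amongst the tested would then be computed only over participants whose `tested` flag is set, which is the restricted denominator used to check for differential testing.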
Since test result data were only available for England, we restricted the study population to people who attended UK Biobank baseline assessment centres in England. Participants identified from linked mortality records (provided by the NHS Information Centre) as having died before 14 February 2018, and those who had requested to withdraw from the study (N=26), were also excluded from the analysis. In addition to analysing the overall population, we investigated positive test results among only those who had been tested. This allowed us to assess potential bias due to differential testing between ethnic and socioeconomic groups.
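The exclusions above amount to a row filter applied before any outcomes are counted. The sketch below assumes a hypothetical participant dictionary with `centre`, `withdrawn` and `death_date` keys and a caller-supplied set of English assessment centres; the centre names in the example are placeholders, and only the 14 February 2018 cut-off comes from the text.

```python
from datetime import date

# Cut-off stated in the text: participants recorded as dying before it are excluded.
DEATH_CUTOFF = date(2018, 2, 14)


def in_analysis_cohort(participant, england_centres):
    """Apply the stated exclusions to one participant record (hypothetical keys)."""
    if participant["centre"] not in england_centres:
        return False  # test results were only available for England
    if participant["withdrawn"]:
        return False  # requested to withdraw from the study
    death = participant.get("death_date")
    if death is not None and death < DEATH_CUTOFF:
        return False  # died before the cut-off according to linked mortality records
    return True


england = {"Centre A", "Centre B"}  # placeholder names, not real centre labels
participants = [
    {"id": 1, "centre": "Centre A", "withdrawn": False, "death_date": None},
    {"id": 2, "centre": "Centre Z", "withdrawn": False, "death_date": None},
    {"id": 3, "centre": "Centre B", "withdrawn": True, "death_date": None},
    {"id": 4, "centre": "Centre B", "withdrawn": False, "death_date": date(2017, 5, 1)},
    {"id": 5, "centre": "Centre A", "withdrawn": False, "death_date": date(2019, 1, 1)},
]
kept = [p["id"] for p in participants if in_analysis_cohort(p, england)]
```

Participant 5, who died after the cut-off, is retained by this rule, which mirrors the limitation that not all deaths occurring before the pandemic could be excluded.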
UK Biobank received ethical approval from the NHS National Research Ethics Service North West (11/NW/0382). This research has been conducted using the UK Biobank resource under Application 41286.

Assessment of ethnicity and socioeconomic position
All exposures were derived from data collected at the baseline assessment centre visit. Ethnicity was self-reported using pre-defined categories: white British, white Irish, other white background, south Asian, black (Caribbean or African), Chinese, mixed or other. Owing to small numbers, analyses of the mixed and Chinese groups were limited. In line with previous research, we do not report results for the other group, because this highly heterogeneous group is difficult to interpret. 11
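The Townsend deprivation index used in this study is an area-based score. In its standard form it sums z-scores of four census percentages (economically active residents who are unemployed, households without a car, households not owner-occupied and overcrowded households), with unemployment and overcrowding first transformed as ln(x + 1); higher scores indicate greater deprivation. The sketch below illustrates that construction and a rank-based split into quartiles. UK Biobank supplies a precomputed score for each participant, the area values here are invented, and rank-based quartiles within the sample are an assumption, so this is not the study's own code.

```python
import math
from statistics import mean, pstdev


def townsend_scores(areas):
    """Townsend-style score per area: the sum of z-scores of four census percentages.

    Each area is a dict with keys 'unemployed', 'no_car', 'not_owner_occupied'
    and 'overcrowded' (all percentages). Higher scores mean greater deprivation.
    """
    columns = {
        "unemployed": [math.log(a["unemployed"] + 1) for a in areas],
        "overcrowded": [math.log(a["overcrowded"] + 1) for a in areas],
        "no_car": [a["no_car"] for a in areas],
        "not_owner_occupied": [a["not_owner_occupied"] for a in areas],
    }
    z = {}
    for name, values in columns.items():
        m, sd = mean(values), pstdev(values)
        z[name] = [(v - m) / sd for v in values]
    return [sum(z[name][i] for name in z) for i in range(len(areas))]


def quartiles(values):
    """Assign quartile 1 (least deprived) to 4 (most deprived) by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order):
        result[i] = rank * 4 // len(values) + 1
    return result


areas = [
    {"unemployed": 2.0, "no_car": 10.0, "not_owner_occupied": 15.0, "overcrowded": 1.0},
    {"unemployed": 5.0, "no_car": 25.0, "not_owner_occupied": 30.0, "overcrowded": 3.0},
    {"unemployed": 9.0, "no_car": 45.0, "not_owner_occupied": 55.0, "overcrowded": 6.0},
    {"unemployed": 4.0, "no_car": 20.0, "not_owner_occupied": 25.0, "overcrowded": 2.0},
]
scores = townsend_scores(areas)
```

Because each component is standardised, the scores sum to zero across areas, and an area that is highest on all four components always receives the highest score.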

Ascertainment of SARS-CoV-2 outcomes
We defined our primary outcome as having a positive test within the Public Health England database available through linkage. 10 This reflects confirmed infection but does not include symptomatic individuals who have not presented to the health service or have not been tested, or asymptomatic cases. Some systematic differences exist in testing thresholds. For example, healthcare workers may be more likely to be tested, so observed differences may partly reflect differences in testing practice. To investigate whether differential ascertainment was biasing our results, we studied three further outcomes. First, we identified positive cases whose test was taken while they were an in-patient (hereafter referred to as hospitalised cases); this group is likely to reflect more severe illness and is therefore less likely to be subject to ascertainment bias. Second and third, we investigated outcomes related to testing practice by assessing the risk of being tested in the overall population and of testing positive amongst only those who had been tested. Higher levels of confirmed SARS-CoV-2 infection could arise from higher rates of testing amongst some population subgroups. If this were the case, however, the proportion testing positive would be expected to be lower amongst groups with high rates of testing, because more people with a low probability of infection would be included among those tested.
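The logic of these testing checks can be written out explicitly. Because only tested people can have a confirmed infection, the risk of confirmed infection in group $g$ factorises into the probability of being tested and the positivity among those tested, so the crude risk ratios (relative to a reference group, e.g. white British) multiply. This identity holds exactly for unadjusted proportions and only approximately for covariate-adjusted estimates:

```latex
\begin{aligned}
\Pr(\text{confirmed}\mid g) &= \Pr(\text{tested}\mid g)\,\Pr(\text{positive}\mid \text{tested}, g),\\[4pt]
\mathrm{RR}_{\text{confirmed}} = \frac{\Pr(\text{confirmed}\mid g)}{\Pr(\text{confirmed}\mid \text{ref})}
  &= \mathrm{RR}_{\text{tested}} \times \mathrm{RR}_{\text{positive}\mid\text{tested}}.
\end{aligned}
```

If an elevated $\mathrm{RR}_{\text{confirmed}}$ arose only from more intensive testing, one would expect $\mathrm{RR}_{\text{tested}} > 1$ together with $\mathrm{RR}_{\text{positive}\mid\text{tested}} \le 1$; a value of $\mathrm{RR}_{\text{positive}\mid\text{tested}}$ above 1 points away from that explanation.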

Potential confounders and mediators
Age group (5-year age bands), sex and assessment centre were included as potential confounders in all statistical models. Country of birth (UK or Ireland versus elsewhere) was also included, given its influence on cultural practices. 14 We also included several variables that could reflect potential confounding or mediation. Participants were asked the title of their current or most recent job at baseline, and these titles were converted to the Standard Occupational Classification. Baseline health status was assessed using self-reported long-standing illness, disability or infirmity (yes or no) and the number of chronic health conditions self-reported from a pre-defined list of 43 conditions, top-coded at 4 or more, based on a previously published approach. 15 Lifestyle factors included smoking (never, previous, current); body mass index (BMI; weight/height², derived from physical measurements and classified as underweight, normal weight, overweight or obese); and alcohol consumption (daily or almost daily, 3-4 times a week, once or twice a week, 1-3 times per month, special occasions, former drinker or never).
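Two of the derived variables above involve simple arithmetic: BMI as weight divided by height squared, and the count of chronic conditions top-coded at 4. The sketch below assumes the conventional WHO adult BMI cut-offs (18.5, 25 and 30 kg/m²), which the text does not state, and assumes the condition names passed in have already been restricted to the pre-defined list.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def bmi_category(value):
    """Classify BMI using assumed WHO adult cut-offs (not stated in the text)."""
    if value < 18.5:
        return "underweight"
    if value < 25:
        return "normal weight"
    if value < 30:
        return "overweight"
    return "obese"


def chronic_condition_count(conditions, cap=4):
    """Count distinct self-reported conditions, top-coded so that `cap` means 'cap or more'."""
    return min(len(set(conditions)), cap)
```

For example, 70 kg at 1.75 m gives a BMI of about 22.9 ("normal weight"), and six distinct conditions are recorded as 4.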
Other social variables were also considered. Employment status distinguished those in paid employment or self-employment, retired, looking after home and/or family, unable to work because of sickness or disability, unemployed or other. For those in work, manual versus non-manual occupation was assessed by asking participants whether their job involved heavy manual or physical work (never/rarely/sometimes versus usually/always). Housing tenure was categorised as owner-occupier or renter/other (including those living in accommodation rent free, in a care home or in sheltered accommodation). Urban/rural status was derived from home-area population density, which UK Biobank obtained by combining each participant's home postcode with 2001 census data from the Office for National Statistics. The number of people in the household was categorised into three groups: single person, two people and three or more people (the last including those living in institutions, such as care homes).
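Several of the social variables above are collapses of questionnaire answers into a few groups. A minimal sketch of three of them follows; the raw answer strings (`"usually"`, `"own"` and so on) are hypothetical stand-ins for the actual UK Biobank response codes, while the groupings follow the text.

```python
def household_size_group(n_people, in_institution=False):
    """Group household size as single / two / three or more (institutions go with 3+)."""
    if in_institution or n_people >= 3:
        return "three or more"
    if n_people == 2:
        return "two"
    return "single"


def heavy_manual_work(frequency):
    """Dichotomise heavy manual or physical work: usually/always versus never/rarely/sometimes."""
    return frequency in {"usually", "always"}


def tenure_group(tenure):
    """Owner-occupier versus renter/other (e.g. rent free, care home, sheltered accommodation)."""
    return "owner-occupier" if tenure == "own" else "renter/other"
```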

Statistical analyses
The associations between the exposures (ethnicity and socioeconomic position) and the outcomes of interest (confirmed infection, hospitalised case, being tested and having a positive test amongst those tested) were explored using Poisson regression. Poisson regression was preferred over logistic regression so that risk ratios could be presented rather than odds ratios, which are often misinterpreted. 16 Robust standard errors were used to ensure accurate estimation of 95% confidence intervals and p values, because the Poisson variance assumption does not hold for a binary outcome. Statistical analysis was conducted using Stata/MP 15.1. To investigate ethnicity, we initially adjusted for age, sex and assessment centre (model 1) and then added country of birth (model 2). Subsequent models additionally adjusted for variables that we hypothesised were likely to be at least partially mediating rather than confounding variables. Model 3 adjusted for model 2 variables and for being a healthcare worker. Model 4 additionally adjusted for social variables (namely urbanicity, number of people per household, highest education level, …

We followed a similar approach to explore the role of deprivation and education level. Model 1 was adjusted for age, sex and assessment centre; model 2 added ethnicity and country of birth; model 3 also adjusted for the social variables (as above); model 4 adjusted for model 2 variables plus health status variables; model 5 adjusted for model 2 variables plus behavioural risk factors; and model 6 adjusted for all previous covariates.
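For intuition about the risk ratios reported in the Results, the simplest case is a single binary exposure with no covariates. There, a log-link Poisson model fits the two group proportions exactly, so the estimated RR is the crude ratio of proportions, and the robust (HC0 sandwich) standard error of log(RR) reduces to the delta-method formula sqrt(1/a - 1/n1 + 1/c - 1/n0). The stdlib-only sketch below computes that crude RR and its 95% CI from invented counts. The adjusted models described above would instead need regression software (for example Stata's `poisson` command with the `irr` and `vce(robust)` options, or a Poisson GLM with a robust covariance in statsmodels); packages may also apply a small-sample correction, so results can differ slightly.

```python
import math


def crude_risk_ratio(cases_exposed, n_exposed, cases_ref, n_ref, z=1.96):
    """Crude risk ratio with a 95% CI computed on the log scale.

    For one binary exposure and no covariates, this matches a log-link Poisson
    model with a robust (HC0) sandwich variance.
    """
    risk_exposed = cases_exposed / n_exposed
    risk_ref = cases_ref / n_ref
    rr = risk_exposed / risk_ref
    se_log_rr = math.sqrt(1 / cases_exposed - 1 / n_exposed + 1 / cases_ref - 1 / n_ref)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Invented counts: 20 cases among 1,000 exposed versus 10 among 1,000 in the reference group.
rr, lower, upper = crude_risk_ratio(20, 1000, 10, 1000)
```

In this invented example the interval (about 0.94 to 4.25) crosses 1 despite a doubling of risk, which illustrates how strongly precision depends on the number of cases in each group.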

Role of the funding source
The funder of the study had no role in study design, data collection, data analysis, data interpretation, or writing of the report. The corresponding authors (SVK and CLN) had full access to all the data in the study and had final responsibility for the decision to submit for publication.

Results
Most of the baseline UK Biobank sample in England was white British, with the next largest groups being other white, white Irish, and then south Asian and black (Table 1). In comparison with the white British majority ethnic group, several ethnic minority groups had a higher risk of testing positive for SARS-CoV-2 and of testing positive while an in-patient (Figure 1 and Appendix). Black participants had the highest risk (RR 4.01 (95%CI 2.92-5.52)), with adjustment for country of birth resulting in little attenuation (RR 3.70 (95%CI 2.50-5.48)); adjustment for having been a healthcare worker (RR 3.35 (95%CI 2.24-5.00)) and for social factors (including measures of socioeconomic position) further attenuated the risk (RR 2.45 (95%CI 1.57-3.81)).
South Asians also had an elevated risk (RR 2.11 (95%CI 1.43-3.10) in model 1), with a pattern of attenuation similar to that for the black ethnic group. In contrast, the white Irish group had a consistent, modestly elevated risk of having a positive test (RR 1.60 (95%CI 1.08-2.38)), which did not change with adjustment for covariates. Risk ratios for the Chinese group were imprecisely estimated because of smaller numbers. The pattern of findings for hospitalised cases was similar, suggesting that higher community testing rates amongst certain ethnic groups were not skewing the results.
Similarly, the likelihood of testing positive amongst those who had been tested was often higher in these ethnic groups, or similar (Table 2), whereas differentially high testing would have produced a lower likelihood.
In comparison with the most socioeconomically advantaged quartile, living in a more disadvantaged area (according to the Townsend deprivation score) was associated with a higher risk of confirmed infection, particularly for the most disadvantaged quartile (RR 2.48 (95%CI 1.95-3.16)) (Figure 2 and Appendix). Adjustment for ethnicity and country of birth, social factors, baseline health and behavioural risk factors each moderately attenuated the association in the most disadvantaged quartile. Socioeconomic deprivation was also associated with being a hospitalised case. Although testing was again more likely in more deprived groups, the risk of testing positive amongst those tested also tended to be higher rather than lower (Table 2).
Analyses by education also showed a higher risk of confirmed SARS-CoV-2 infection with lower levels of education (RR 1.95 (95%CI 1.56-2.43) for no qualifications compared with degree-level education) (Figure 3 and Appendix). While adjustment for ethnicity and country of birth made little difference to the association, adjustment for social factors, baseline health and behavioural risk factors each attenuated it somewhat (RR 1.41 (95%CI 1.09-1.82) in the fully adjusted model). We again observed a similar pattern for hospitalised cases and found little evidence of increased testing amongst the less educated groups (Figure 3 and Table 2).
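Because each RR in this section is reported with a 95% CI computed on the log scale, the standard error and an approximate p-value can be recovered from the interval alone, and the geometric mean of the two limits should reproduce the point estimate, which gives a quick check for transcription errors. The sketch below assumes a symmetric Wald interval on the log scale with z = 1.96 and uses the black versus white British estimate reported above (RR 4.01, 95%CI 2.92-5.52) as input.

```python
import math


def se_log_from_ci(lower, upper, z=1.96):
    """Standard error of log(RR) implied by a symmetric 95% CI on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def implied_point_estimate(lower, upper):
    """Geometric mean of the CI limits; should be close to the reported RR."""
    return math.sqrt(lower * upper)


def two_sided_p(rr, se_log):
    """Approximate two-sided Wald p-value for log(RR) = 0 under a normal approximation."""
    z_stat = abs(math.log(rr)) / se_log
    return math.erfc(z_stat / math.sqrt(2))


se = se_log_from_ci(2.92, 5.52)
estimate = implied_point_estimate(2.92, 5.52)
p = two_sided_p(4.01, se)
```

Here the implied standard error is about 0.16 and the geometric mean of the limits is about 4.01, matching the reported estimate; an interval whose geometric mean falls well away from its stated RR is a likely typographical error.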

Discussion
Several ethnic minority groups had a higher risk both of laboratory-confirmed SARS-CoV-2 infection and of testing positive as an in-patient in the UK Biobank study, with the black, south Asian and white Irish ethnic groups at greatest risk. Similarly, measures of socioeconomic disadvantage (area-based deprivation and lower education) were associated with an increased risk of confirmed infection and of being a hospitalised case. For both ethnicity and socioeconomic position, we found no evidence that these patterns were likely to be due to differential ascertainment: although the likelihood of being tested was increased, the likelihood of a positive test among those who had been tested was, if anything, higher. Ethnic differences in infection risk did not appear to be fully accounted for by differences in pre-existing health, behavioural risk factors or country of birth measured at baseline, and socioeconomic differences appeared to make only a modest contribution to these ethnic differences.
Our study has several important strengths. First, by using a well-characterised cohort study, we could identify a clearly defined population at risk of SARS-CoV-2 infection. Combining data linkage with a large sample size has allowed us to provide timely empirical data on this crucial policy priority. Ethnicity was self-reported, which is widely considered the gold-standard approach, 17 and the large dataset has allowed a more nuanced appreciation of the risks of infection within different groups of the white majority population as well as minority ethnic populations. Our investigation of socioeconomic position has similarly benefited from the availability of different measures, allowing us to assess the consistency of findings across them. The detailed data collected in this cohort have also allowed us to investigate the extent to which observed inequalities are potentially mediated by a wide range of factors, including behavioural risk factors, pre-existing health status and other social variables.
However, several potential limitations should be noted. Ascertainment bias is potentially problematic and could arise in several ways, including through differential healthcare seeking, differential testing and differential prognosis. We found no evidence to suggest that differential healthcare seeking or testing explains the observed pattern of findings: increased ascertainment amongst ethnic minorities would be expected to produce a lower proportion of confirmed cases amongst those tested, whereas we observed the opposite. One remaining possibility is that some ethnic and socioeconomic groups have a poorer prognosis and are therefore more likely to be admitted to hospital, and hence to be tested. Even if this were the case, more adverse outcomes among these groups would remain a concern. Other limitations include the non-representativeness of the UK Biobank study population: more advantaged people were more likely to participate, and ethnic minorities are under-represented. Our findings may therefore not reflect the broader UK population, 18 although empirical research has found that this does not result in substantial bias in measures of association in the UK Biobank study. 19 We have also been unable to exclude all deaths that occurred before the pandemic, because up-to-date linkage to mortality records is not yet available.
Our exposure data were collected some years ago, so pre-existing health, risk factors and some social variables are likely to have changed, although most risk factors generally track throughout life. Being a healthcare worker was also ascertained at baseline, although many people who had left healthcare work have now returned to it. Lastly, we have not explored the role of specific health conditions such as asthma, diabetes and high blood pressure, which have been shown to be associated with a higher risk of severe outcomes 3,20 and are more prevalent amongst socioeconomically disadvantaged groups and some ethnic minority groups. 21,22 However, these conditions are likely to operate as mediators rather than confounders.
Administrative data from health services have recently suggested an increased risk of severe COVID-19 within ethnic minority groups. The UK's Intensive Care National Audit & Research Centre (ICNARC) analysed data on 5,578 patients admitted to critical care up to 16 April 2020 and found that black and Asian people comprised a high proportion of all patients (11.2% and 14.9%, respectively), although it was unclear whether these higher percentages were biased by most cases initially being seen in areas with a high proportion of black and minority ethnic (BME) residents. 23 Similarly, data from the US Centers for Disease Control and Prevention suggest a higher risk amongst Black or African American people, but information on race was missing for approximately two-thirds of those diagnosed. 24 Academic research on this topic has been limited to date. An ecological study of US counties has suggested that more socially vulnerable areas (which included greater numbers of people with socioeconomic disadvantage and ethnic minorities) had higher COVID-19 case fatality rates. 25 Our study adds substantially to this evidence by finding that ethnicity appears to be an important predictor of laboratory-confirmed SARS-CoV-2 infection, one that is only partly attenuated by a large range of potential mediators (such as socioeconomic position), and by addressing concerns about numerator-denominator bias.
Our results suggest there is an urgent need for further research on how SARS-CoV-2 infection affects different ethnic and socioeconomic groups. Our findings warrant replication in other datasets, ideally including representative samples and across different countries. As the pandemic evolves, there is a need to monitor infection and disease outcomes by ethnicity and socioeconomic position.
However, data to allow this disaggregation are often not available; record linkage could potentially help address this gap, particularly in settings where administrative register data are available. Given differences in health risks across occupational groups, 26 the risks experienced by the full range of key workers also need to be understood. Lastly, other social groups, such as homeless people, prisoners and undocumented migrants, experience severe disadvantage, and research is also needed on these highly vulnerable populations. 27,28
The limited evidence available suggests that some ethnic minority groups, particularly black and south Asian people, are especially vulnerable to SARS-CoV-2 infection. Socioeconomic disadvantage and poorer pre-existing health do not explain all of this elevated risk, so there is a need to determine exactly why it occurs. An immediate policy response is required to ensure the health system is responsive to the needs of ethnic minority groups. This should include ensuring that health and care workforces, which often rely on workers from minority ethnic populations, have access to the necessary personal protective equipment (PPE) so that they can work safely. Timely communication, in a range of languages, of guidelines to reduce the risk of exposure to the virus is also required. 29 Previous evidence suggests that ethnic minorities in the UK tend to receive reasonably equitable care in many, but not all, areas. 30 However, this is not the case in many other countries (such as the US), where the adverse consequences of SARS-CoV-2 infection may be even worse. SARS-CoV-2 therefore has the potential to substantially exacerbate ethnic and socioeconomic inequalities in health 31 unless steps are taken to mitigate them. The data from this study may help inform the allocation of more aggressive therapies for people with severe disease, or the targeting of preventive vaccination to at-risk groups, once evidence for such approaches becomes available.

Figure 1: Risk ratios for the association between ethnicity (White British as reference category) and: a) being tested for SARS-CoV-2; b) testing positive and c) testing positive as an inpatient amongst participants in UK Biobank
Model 1: Age, sex and assessment centre
