  • Research article
  • Open access

Predictive model for long COVID in children 3 months after a SARS-CoV-2 PCR test



To update and internally validate a model to predict children and young people (CYP) most likely to experience long COVID (i.e. at least one impairing symptom) 3 months after SARS-CoV-2 PCR testing and to determine whether the impact of predictors differed by SARS-CoV-2 status.


Data from a nationally matched cohort of SARS-CoV-2 test-positive and test-negative CYP aged 11–17 years were used. The main outcome measure, long COVID, was defined as one or more impairing symptoms 3 months after PCR testing. Potential pre-specified predictors included SARS-CoV-2 status, sex, age, ethnicity, deprivation, quality of life/functioning (five EQ-5D-Y items), physical and mental health and loneliness (prior to testing) and number of symptoms at testing. The model was developed using logistic regression; performance was assessed using calibration and discrimination measures; internal validation was performed via bootstrapping and the final model was adjusted for overfitting.


A total of 7139 (3246 test-positives, 3893 test-negatives) completing a questionnaire 3 months post-test were included. 25.2% (817/3246) of SARS-CoV-2 PCR-positives and 18.5% (719/3893) of SARS-CoV-2 PCR-negatives had one or more impairing symptoms 3 months post-test. The final model contained SARS-CoV-2 status, number of symptoms at testing, sex, age, ethnicity, physical and mental health, loneliness and four EQ-5D-Y items before testing. Internal validation showed minimal overfitting with excellent calibration and discrimination measures (optimism-adjusted calibration slope: 0.96575; C-statistic: 0.83130).


We updated a risk prediction equation to identify those most at risk of long COVID 3 months after a SARS-CoV-2 PCR test which could serve as a useful triage and management tool for CYP during the ongoing pandemic. External validation is required before large-scale implementation.



Children and young people (CYP) testing positive for SARS-CoV-2 are usually asymptomatic or have a low symptom burden at the time of infection compared to adults [1, 2]. Recent studies on post-COVID sequelae (also known as ‘long COVID’), however, have shown some adults and children can have persistent symptoms for months after acute infection [3, 4]. A recent systematic review of persistent symptoms following SARS-CoV-2 infection found most reported persistent symptoms were no more common in SARS-CoV-2-positive than in SARS-CoV-2-negative CYP, with only small increases in cognitive difficulties, headache, loss of smell, sore throat and sore eyes [5]. Similar to the successful use of predictive models for cardiovascular disease, e.g. in the United Kingdom (UK) [6] and the United States (US) [7], predictive models can help identify CYP at highest risk of experiencing persistent symptoms and direct them towards relevant care. This is particularly important during the pandemic when health services are under increased pressure [8]. A systematic review identified over 100 diagnostic and prognostic models for SARS-CoV-2, mainly relating to acute outcomes, e.g. mortality, intensive care unit (ICU) admission and length of hospital stay [9]. With the exception of two studies, however, most were considered low quality due to non-representative selection of controls, inadequate exclusions, high risk of model overfitting and unclear reporting [9]. Based on predictive model quality assessment tools [10] and model development guidelines [11], the two models mentioned above (the Jehi diagnostic model [12] and 4C mortality score [13]) and a third model (QCOVID [14]) are considered as higher quality predictive models for SARS-CoV-2 because of large sample sizes [15], appropriate modelling techniques [16] and suitable internal validation and reporting [11]. 
Of these three models, the 4C and QCOVID models were developed in adult populations (age≥18 years) whereas the Jehi model was developed in all patients who were tested for SARS-CoV-2 at all Cleveland Clinic locations in Ohio and Florida, US, regardless of age and included 11,672 patients (median age: 46.89 years among SARS-CoV-2 negatives; 54.23 years among SARS-CoV-2 positives).

There are very few predictive models for the potential long-term effects of SARS-CoV-2 infection, and those that exist have focused mostly on adults. Sudre and colleagues focused on identifying the characteristics and predictors of post-COVID sequelae in a sample of 4182 adults who reported testing positive for SARS-CoV-2 and found those experiencing more than five symptoms during the first week of illness were more likely to report ‘long COVID’ [17]. Recent large national cohort studies of CYP are consistent with the abovementioned systematic review [5], finding little difference in ‘long COVID’ symptom prevalence between SARS-CoV-2-positive and SARS-CoV-2 control CYP who either tested negative or did not have a test [4, 18]. As acute SARS-CoV-2 infection remains predominantly a mild infection in CYP and the cumulative incidence of infection increases, the incidence of post-COVID sequelae and the extent to which it is distinct from pandemic-related symptoms resulting from national lockdowns, school closures and social isolation is a critical factor in health policy decisions. We previously presented a model that predicted impairing symptoms in CYP [19], and here we aimed to update and internally validate the prediction model in CYP 3 months after a PCR test and to determine whether the impact of these predictors differed by SARS-CoV-2 infection status. The outcome examined here aligns with our previously described Delphi definition of long COVID [20].


We use data from the Children and young people with Long Covid (CLoCk) study: a national cohort study of SARS-CoV-2 PCR-positive CYP aged 11–17 years living in England who were matched at study invitation, on month of test, age, sex and geographical area, to SARS-CoV-2 test-negative CYP selected from the national testing database at Public Health England (now UK Health Security Agency (UKHSA)) [21]. Test-negative CYP who self-reported subsequently testing positive for SARS-CoV-2 were excluded [4].

Here we examine a previously described study subset that is broadly representative of the target population in terms of age, sex, geographical region and socio-economic status [4]. Briefly, from a total of 50,836 CYP who were approached, 7139 (3246 SARS-CoV-2 positive, 3893 SARS-CoV-2 negative) who completed the CLoCk questionnaire sent to them 3 months after their PCR test during January–March 2021 (median time between testing and questionnaire: 14.9 weeks [25th, 75th centiles: 13.1, 18.9]) were included. The questionnaire included demographic characteristics, elements of the International Severe Acute Respiratory and emerging Infection Consortium (ISARIC) Paediatric COVID-19 follow-up questionnaire [22] and the recent Mental Health of Children and Young people in England surveys [23]. CYP responded to 21 questions on physical symptoms at the time of testing (e.g. cough, tiredness, etc.). They rated their general physical and mental health before SARS-CoV-2 testing in two separate questions using a 5-category Likert scale. The prevalence of ‘very poor’ was low; therefore, for analysis, we recoded these variables into four categories (very poor/poor to very good). Quality of life/functioning before testing was measured via the EQ-5D-Y scale [24], and feelings of loneliness by the UCLA Loneliness scale [25]. The Index of Multiple Deprivation (IMD) was calculated from the CYP’s small local area level-based geographic hierarchy (lower super output area) at the time of the questionnaire and used as a proxy for socio-economic status. We examine IMD quintiles from most (quintile 1) to least (quintile 5) deprived (Table 1).

Table 1 Baseline characteristics (frequencies and percentages) of participants who completed the 3-month questionnaire, overall and stratified by SARS-CoV-2 status

Outcome: long COVID (experiencing at least one impairing symptom)

We operationalized the Delphi research definition of long COVID [20] as having at least one of the 21 reported physical symptoms and experiencing more than minimal problems on any one of the five EQ-5D-Y questions at the time of the questionnaire, i.e. approximately 3 months after the PCR test (see Table 2). The published Delphi research definition requires laboratory confirmation of SARS-CoV-2 infection but of course that was not required when assessing how many test-negatives would also have met this definition.

Table 2 Prevalence (frequencies and percentages) of long COVID 3 months after a PCR test and related variables, overall and stratified by SARS-CoV-2 status

Potential predictors

Pre-specified potential predictors were chosen based on their distribution in the dataset and their association with the outcome. In addition to SARS-CoV-2 status, we considered 13 predictors including demographics (sex, age, ethnicity and IMD), prior quality of life/functioning (assessed by 5 items from the EQ-5D-Y scale), prior physical and mental health and feelings of loneliness prior to the CYP’s PCR test. We also included the number of physical symptoms experienced at testing (details in Table 1).

Sample size and missing data

The sample size was pre-defined by the study design. We, therefore, assessed whether our study was sufficiently powered to estimate the overall outcome risk, and how many predictor parameters could be considered before overfitting/precision becomes a concern [15]. Using the pmsampsize STATA package [15], we considered (i) small overfitting (i.e. a shrinkage factor of predictor effects ≤10%), (ii) small absolute difference of 0.05 in the model’s apparent and adjusted Nagelkerke’s R-squared value and (iii) precise estimation within ±0.05 of the average outcome risk in the population. We also assumed an outcome prevalence of 21.5%, C-statistic of 0.80 and 61 parameters. Accordingly, the minimum sample size required was 2557 (actual sample=7139); the events-per-candidate predictor parameter value was 9.01. The dataset had no missing data.
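As a rough check on the figures above, the events-per-parameter value and criterion (iii) can be reproduced by hand. The sketch below is not the pmsampsize implementation: the function names are illustrative, and criterion (iii) is assumed to follow the usual normal-approximation formula for estimating a proportion within a given margin of error.

```python
import math


def events_per_parameter(n: int, prevalence: float, n_parameters: int) -> float:
    """Expected number of outcome events per candidate predictor parameter."""
    return n * prevalence / n_parameters


def n_for_overall_risk(prevalence: float, margin: float, z: float = 1.96) -> int:
    """Smallest n that estimates the overall outcome risk within +/- margin
    (normal approximation; an assumption about how criterion (iii) is computed)."""
    return math.ceil((z / margin) ** 2 * prevalence * (1 - prevalence))


# Inputs reported in the text: prevalence 21.5%, 61 parameters,
# minimum sample size 2557, actual sample size 7139.
epp_minimum = events_per_parameter(2557, 0.215, 61)
epp_actual = events_per_parameter(7139, 0.215, 61)
n_risk = n_for_overall_risk(0.215, 0.05)

print(f"EPP at the minimum sample size: {epp_minimum:.2f}")
print(f"EPP at the actual sample size:  {epp_actual:.2f}")
print(f"n needed for criterion (iii) alone: {n_risk}")
```

Under these assumptions the events-per-parameter value at the minimum sample size agrees with the 9.01 reported above, and criterion (iii) on its own is far less demanding than the overfitting criteria, which is why the overall minimum is driven by the latter.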

Statistical analysis

We assessed the extent to which SARS-CoV-2 status and our 13 potential predictors were correlated by considering pairwise Cramer’s V correlation coefficients. All potential predictors were categorical variables, with the exception of age and number of symptoms at testing. We determined the appropriate functional form for the relationship between age and the log odds of the probability of the outcome by modelling the relationship (i) linearly, (ii) categorically (11–13, 14–15, 16–17 years), (iii) with linear and quadratic terms and (iv) using fractional polynomials with up to 2 degrees. Similarly, we examined the most appropriate functional form for the number of symptoms. The functional form with the lowest Akaike information criterion (i.e., the best fitting model) was used in building our prediction model.
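For readers unfamiliar with Cramer’s V, the sketch below computes it from a contingency table of counts using the Pearson chi-squared statistic. The function name is illustrative. Because the cross-tabulations between predictors are not reproduced in the text, the example table uses the SARS-CoV-2 status by long COVID counts reported in the Results, purely to exercise the function; it is not one of the pairwise predictor comparisons described above.

```python
import math


def cramers_v(table: list[list[int]]) -> float:
    """Cramer's V for an r x c contingency table of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1  # smaller dimension minus one
    return math.sqrt(chi2 / (n * k))


# Rows: test-positive, test-negative; columns: long COVID yes, no.
status_by_outcome = [[817, 3246 - 817], [719, 3893 - 719]]
v = cramers_v(status_by_outcome)
print(f"Cramer's V = {v:.3f}")
```

Values near 0 indicate little association and values near 1 indicate near-perfect association, so a threshold such as 0.50 flags pairs of categorical variables that carry largely overlapping information.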

We used logistic regression to address our aim of predicting long COVID in CYP 3 months after their PCR test, allowing for an interaction between each potential predictor and SARS-CoV-2 status to determine whether the relationship between the potential predictor and outcome differed by SARS-CoV-2 status. We first examined univariable associations between each predictor and long COVID, in the total population and stratified by SARS-CoV-2 status. Next, we built a multivariable prediction model using a stepwise backward (p<0.200) and forward (p<0.157) selection procedure [26]. The stepwise procedure considered all potential predictors, SARS-CoV-2 status and interaction terms between potential predictors and SARS-CoV-2 status (61 potential parameters in total). The above steps were used in developing our initial model for predicting long COVID [19], and here we present an update using a larger sample size and a refined definition of long COVID (see Table 2). The model was updated with an adjustment of the intercept to account for the difference in the outcome prevalence and all the regression coefficients were re-estimated based on the larger sample size of 7139 CYP.

Model performance was measured using calibration and discrimination measures. Calibration (i.e. agreement between observed and predicted probabilities of our outcome) was assessed using calibration plots, calibration-in-the-large and calibration slope statistics [16, 27]. Model discrimination (i.e. the ability of our model to differentiate between CYP who had long COVID 3 months post-test and those who did not) was quantified using the C-statistic (values ≥ 0.7 indicate strong discrimination). The internal validity of our final model was assessed using 100 bootstrap samples which were drawn with replacement [16]. We estimated the level of model overfitting (optimism) in our dataset using the bootstrap samples and adjusted for optimism using a uniform shrinkage factor (the average calibration slope from each of the bootstrap samples). The original β coefficients were multiplied by the shrinkage factor to obtain the optimism-adjusted coefficients; the model intercept was re-estimated based on these shrunken model coefficients generating the final model [11, 27].
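The two performance measures described above can be sketched in a few lines. The code below is an illustration only, not the analysis code used in the study (which was run in STATA): the function names and the small toy dataset are hypothetical. The C-statistic is computed as the proportion of case/non-case pairs in which the case has the higher predicted risk, and the calibration table groups participants into equally sized risk groups, as in a calibration plot.

```python
def c_statistic(y: list[int], p: list[float]) -> float:
    """Proportion of (case, non-case) pairs ranked correctly; ties count as 0.5."""
    cases = [pi for yi, pi in zip(y, p) if yi == 1]
    controls = [pi for yi, pi in zip(y, p) if yi == 0]
    concordant = 0.0
    for pc in cases:
        for pn in controls:
            if pc > pn:
                concordant += 1.0
            elif pc == pn:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


def calibration_groups(y: list[int], p: list[float], n_groups: int = 10):
    """(mean predicted risk, observed proportion) for equally sized risk groups."""
    ranked = sorted(zip(p, y))
    size = len(ranked) / n_groups
    rows = []
    for g in range(n_groups):
        chunk = ranked[round(g * size): round((g + 1) * size)]
        if chunk:
            mean_pred = sum(pi for pi, _ in chunk) / len(chunk)
            observed = sum(yi for _, yi in chunk) / len(chunk)
            rows.append((mean_pred, observed))
    return rows


# Hypothetical toy data: observed outcomes and predicted risks.
y_toy = [0, 0, 1, 0, 1, 1, 0, 1]
p_toy = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.70, 0.90]

c_toy = c_statistic(y_toy, p_toy)
groups_toy = calibration_groups(y_toy, p_toy, n_groups=4)
print(f"C-statistic: {c_toy:.2f}")
for mean_pred, observed in groups_toy:
    print(f"mean predicted {mean_pred:.3f}  observed {observed:.3f}")
```

In a bootstrap validation, measures like these would be recomputed in each resample and in the original data using the resample's model, and the average difference taken as the optimism.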

Data management and analysis were performed using STATA16. We followed guidelines by the Prognosis Research Strategy (PROGRESS) [28,29,30,31] Group; the model development and validation phases particularly followed the suggested methods [27, 30,31,32]. The study is reported according to the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) statement (Additional file 1: Table 1) [11]. The study was approved by Yorkshire and the Humber–South Yorkshire Research Ethics Committee (REC reference: 21/YH/0060).


Of the 7139 CYP (3893 SARS-CoV-2 negative, 3246 SARS-CoV-2 positive) in our analytic sample, 26% (1860/7139) were of non-European origin, 62.9% (4493/7139) were female and there were more older than younger CYP (42.6% 16–17-year-olds vs. 31.0% 11–13-year-olds) (Table 1). Three months after their PCR test, 65.5% (2126/3246) of SARS-CoV-2 PCR-positives had at least one physical symptom (Table 2) and 25.2% (817/3246) had long COVID (i.e. at least one impairing symptom). This compares with 52.5% (2045/3893) and 18.5% (719/3893), respectively, in test-negative CYP.

Univariable associations

SARS-CoV-2 status and the 13 potential predictors were not strongly correlated (Cramer’s V < 0.50 for all possible pairwise correlations). Ethnicity did not predict the outcome (Table 3). The predictive effect of self-rated physical and mental health, feelings of loneliness, problems with mobility, doing usual activities, having pain and feeling worried/sad before testing differed by SARS-CoV-2 status, with a general pattern of higher odds among test-negatives (Table 3, stratified associations).

Table 3 Odds ratios (95% CIs) of univariable associations between potential predictors and long COVID 3 months after a PCR test, overall and stratified by SARS-CoV-2 status

Multivariable predictive model

In the final model (Additional file 1: Table 2), SARS-CoV-2 status, number of symptoms at testing, sex, age, ethnicity, self-rated physical and mental health, feelings of loneliness and four items from the EQ-5D-Y scale (problems looking after self, doing usual activities, having pain, feeling worried/sad) before testing predicted the outcome. The impact of some predictors differed by SARS-CoV-2 status: interactions between SARS-CoV-2 status and age, ethnicity, self-rated mental health, feelings of loneliness and problems doing usual activities were retained. Additional file 1: Fig. 1 shows graphs from the final model, for all included predictors, of the probability of having the outcome.

Model performance

The model showed excellent calibration and discrimination. It was perfectly calibrated in the model development data with an apparent slope of 1 and an apparent calibration-in-the-large of 0 (Additional file 1: Table 3). Good overall model calibration was further confirmed by the calibration plot (Fig. 1), with narrow confidence intervals and closely aligned predicted and observed probabilities for 10 equally sized risk groups. The predictive model showed strong discrimination with a C-statistic of 0.838 (95% CI: 0.827, 0.849). Bootstrap internal validation showed little model overfitting, with an optimism-corrected calibration slope close to one. The bootstrapping approach provided a shrinkage factor of 0.965752; we also generated the heuristic shrinkage factor (again close to one: 0.979196). We chose the bootstrap shrinkage factor as it was slightly smaller, and applied it to the original β coefficients to obtain the optimism-adjusted coefficients before re-estimating the intercept for the final model given in Box 1 (Additional file 1) and Additional file 1: Table 2.
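The shrinkage step can be illustrated as follows. This is a minimal sketch under stated assumptions, not the study's code: the predictor values, outcomes and β coefficients are hypothetical, and only the shrinkage factor (0.965752) is taken from the text. The intercept is re-estimated by maximum likelihood with the shrunken linear predictor held fixed, which amounts to finding the intercept at which the sum of predicted risks equals the number of observed events.

```python
import math


def expit(x: float) -> float:
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def shrink_and_recalibrate(X, y, betas, shrinkage):
    """Multiply slopes by a uniform shrinkage factor, then re-estimate the intercept."""
    shrunk = [shrinkage * b for b in betas]
    offsets = [sum(b * x for b, x in zip(shrunk, row)) for row in X]
    events = sum(y)
    lo, hi = -30.0, 30.0
    for _ in range(200):  # bisection on the intercept score equation
        mid = (lo + hi) / 2
        expected = sum(expit(mid + o) for o in offsets)
        if expected < events:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2, shrunk


# Hypothetical data and coefficients, for illustration only.
X_toy = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [3, 0]]
y_toy = [0, 0, 0, 1, 1, 1]
betas_toy = [0.8, 0.5]

intercept, shrunk_betas = shrink_and_recalibrate(X_toy, y_toy, betas_toy, 0.965752)
predicted = [
    expit(intercept + sum(b * x for b, x in zip(shrunk_betas, row))) for row in X_toy
]
print(f"re-estimated intercept: {intercept:.4f}")
print(f"shrunken coefficients:  {[round(b, 4) for b in shrunk_betas]}")
```

Shrinking all coefficients by the same factor pulls extreme predictions towards the average risk, and re-estimating the intercept afterwards keeps the mean predicted risk equal to the observed outcome proportion.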

Fig. 1

Observed and predicted risk of long COVID 3 months after a PCR test. This graph shows the mean predicted probability (hollow dots) and 95% confidence intervals of long COVID 3 months after a PCR test plotted against the observed proportion of the same outcome for 10 equally sized groups. The dashed line represents the line of equality and perfect calibration. The blue solid line is a locally weighted scatterplot smoothing (Lowess) regression line.

Worked examples

Box 1 (Additional file 1) shows the prediction equation for estimating the risk of long COVID 3 months post-PCR test in 11-to-17-year-old CYP. Table 4 gives hypothetical examples of the predicted risk of long COVID 3 months post-test. A calculator is provided in Additional file 2.

Table 4 Hypothetical examples of predicted risk of long COVID 3 months after a PCR test, using our prediction model

As an example, the predicted risk of outcome for a hypothetical 14-year-old, white male, with no symptoms at testing, very good physical health, never feeling lonely, no problems on all included EQ-5D-Y items and poor/very poor mental health before testing, would be 0.11 if he tested positive and 0.04 if negative; if he had very good mental health before testing, the risk would be 0.07 if positive and 0.03 if negative.
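Because the model is a logistic regression, each predicted risk corresponds to a value on the log-odds scale, and differences between profiles are most naturally read as odds ratios. The sketch below converts the four rounded risks from the example above into implied odds ratios for SARS-CoV-2 status. The model coefficients themselves are in Additional file 1 and are not reproduced here, so the results are approximations derived from risks rounded to two decimal places; the function names are illustrative.

```python
import math


def odds(risk: float) -> float:
    """Convert a probability to odds."""
    return risk / (1 - risk)


def risk_from_linear_predictor(lp: float) -> float:
    """Inverse logit: the form in which a logistic model returns a predicted risk."""
    return 1.0 / (1.0 + math.exp(-lp))


def implied_odds_ratio(risk_positive: float, risk_negative: float) -> float:
    """Odds ratio (test-positive vs test-negative) implied by two predicted risks."""
    return odds(risk_positive) / odds(risk_negative)


# Rounded predicted risks quoted in the worked example.
or_poor_mental_health = implied_odds_ratio(0.11, 0.04)
or_very_good_mental_health = implied_odds_ratio(0.07, 0.03)

print(f"poor/very poor mental health: implied OR {or_poor_mental_health:.2f}")
print(f"very good mental health:      implied OR {or_very_good_mental_health:.2f}")
```

The two implied odds ratios differ, which is what one would expect when the model retains an interaction between SARS-CoV-2 status and self-rated mental health, although rounding of the quoted risks means the exact values should not be over-interpreted.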


To our knowledge, we have developed [19] and updated the first risk prediction model that uses self-reported information from CYP to estimate their probability of experiencing long COVID 3 months after SARS-CoV-2 testing. SARS-CoV-2 status, number of physical symptoms at testing, sex, age, ethnicity, self-rated physical and mental health, feelings of loneliness and four items from the EQ-5D-Y scale (all before testing) predicted long COVID 3 months later, with the impact of some predictors differing by SARS-CoV-2 status. We provide a risk calculator to predict CYP most likely to experience long COVID, to triage those who need support and for whom early intervention might be of greatest benefit. Importantly, our model has excellent predictive ability, calibration and discrimination. It enables us to answer important clinical questions such as ‘are those who have many symptoms during acute SARS-CoV-2 infection at greater risk of long COVID than those without?’. The answer is ‘yes’ but our model provides a more nuanced answer by considering other factors.

Our goal was to provide a model that utilizes multiple factors (i.e. predictors) in combination, to accurately predict long COVID 3 months post-test. Importantly, our focus was not on whether included predictors are causal or not. Instead, the focus was the overall predictive performance of the model [33]. As such, we followed the guidelines for model building [27]. The large sample allowed flexible examination of the potential for relationships to differ by SARS-CoV-2 status and by the shape of the association without considerable concerns about overfitting. Model fitting statistics were extremely favourable and the use of a matched national cohort sample of test-positive and test-negative CYP is unique. Despite its internal validation, the model needs to be externally validated on other independent datasets and in different populations and settings prior to its wider application. Additionally, the model needs to be reassessed for experiencing long COVID beyond 3 months. It is possible that many of the predictors stay the same, but we acknowledge there may be differences as the disease profile (and, therefore, predictors) changes over the course of the illness.

We acknowledge study limitations. The CLoCk study response rate (13.9%) is typical of surveys of this type [34] and is in line with other COVID-19-related studies [35, 36]. Importantly, the examined CYP are broadly representative of the target population in terms of important demographics such as age, sex and socio-economic status [4] as well as more generally of CYP aged 11–17 years living in England [37]. Baseline measures (at or before testing) were subject to recall bias because they were not taken at the time of acute infection, and we were unable to assess whether symptoms waxed and waned between testing and the questionnaire. In addition, the possibility of selection bias in both directions (CYP more likely to participate if they have persistent symptoms, or less likely to participate if too unwell) among respondents cannot be ruled out. Furthermore, as the background epidemiological situation in relation to SARS-CoV-2 infection prevalence changes, there is a need to reassess possible differences in our model’s predictive value over time. Finally, caution is required for predictions based on data extrapolation or on situations where there are only a very small number of observations for different predictor combinations.

To our knowledge, no other study has explicitly aimed to present a risk prediction model for long COVID in CYP [5, 38]. Moreover, the majority of previous studies lack a SARS-CoV-2 test-negative comparison group and so distinguishing long-term symptoms predicted by SARS-CoV-2 infection from background rates or pandemic-related effects remains a challenge [5]. More recent studies include control groups and, thus, broad comparisons can be made. Our finding that the odds of experiencing long COVID 3 months post-test were 1.48 times higher in SARS-CoV-2-positive compared to SARS-CoV-2-negative CYP is in line with findings from the LongCOVIDKidsDK study, where the SARS-CoV-2 test-positive group had 1.22 times higher odds of having at least one ‘Long COVID’ symptom lasting at least 2 months compared with the SARS-CoV-2 control group who either tested negative or never had a test [39]. We found both test-positive and test-negative CYP met the Delphi consensus definition of long COVID 3 months post-test with a difference of 6.7% between these groups. In contrast, in Borch et al., the prevalence of reported symptoms in CYP aged 6–17 years lasting more than 4 weeks was similar regardless of SARS-CoV-2 status (28% test-positives; 27.2% test-negatives/never had a test) [18]. Discrepancies in findings could be due to several reasons including differences in the symptom questions asked of the test-positive and test-negative/never been tested groups, timing of outcome (>4 weeks vs ~3 months), recruitment methodology, recruitment rates between test-positives and test-negatives and/or underlying prevalence levels in the countries at the time of the study. Our results are consistent with findings in adults, where the number of symptoms at onset [40] and female sex [41] were associated with ‘Long COVID’ and pre-existing diagnosis of depression/anxiety is over-represented in those with fatigue after SARS-CoV-2 infection [41].
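The odds ratio of 1.48 and the 6.7% difference quoted above can be checked against the counts reported in the Results (817/3246 test-positives and 719/3893 test-negatives with long COVID). The sketch below computes the unadjusted odds ratio with a Wald 95% confidence interval and the absolute risk difference. The function name is illustrative, and the Wald interval is a standard approximation chosen here for illustration rather than a method stated in the text.

```python
import math


def odds_ratio_with_ci(events_a, n_a, events_b, n_b, z=1.96):
    """Unadjusted odds ratio (group a vs group b) with a Wald confidence interval."""
    a, b = events_a, n_a - events_a  # group a: events, non-events
    c, d = events_b, n_b - events_b  # group b: events, non-events
    log_or = math.log((a / b) / (c / d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (
        math.exp(log_or),
        math.exp(log_or - z * se),
        math.exp(log_or + z * se),
    )


# Counts reported in the Results section.
or_unadjusted, ci_lower, ci_upper = odds_ratio_with_ci(817, 3246, 719, 3893)
risk_difference = 817 / 3246 - 719 / 3893

print(f"unadjusted OR: {or_unadjusted:.2f} ({ci_lower:.2f}, {ci_upper:.2f})")
print(f"risk difference: {100 * risk_difference:.1f} percentage points")
```

Both quoted figures are reproduced by the raw counts, which suggests the 1.48 is an unadjusted (univariable) estimate rather than a coefficient from the final multivariable model.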


Understanding which CYP are at risk of experiencing long COVID is important for individuals (e.g. in decision-making about whether to receive COVID-19 vaccination) and health service provision (e.g. for careful monitoring, early intervention and hopefully reduction in the burden of prolonged health problems). Using data from a large national matched cohort study, we updated our previously developed prediction model for experiencing long COVID 3 months after SARS-CoV-2 testing in CYP. Our model has excellent performance, and we hope it will serve as a useful tool for the early identification and management of CYP at risk of long COVID in the context of the current pandemic.

Availability of data and materials

Data are not publicly available. All requests for data will be reviewed by the Children & young people with Long Covid (CLoCk) study team, to verify whether the request is subject to any intellectual property or confidentiality obligations. Requests for access to the participant-level data from this study can be submitted via email to with detailed proposals for approval. A signed data access agreement with the CLoCk team is required before accessing shared data. Code is not made available as we have not used custom code or algorithms central to our conclusions.



CLoCk: Children and young people with Long Covid
CYP: Children and young people
ICU: Intensive care unit
IMD: Index of Multiple Deprivation
ISARIC: International Severe Acute Respiratory and emerging Infection Consortium
UK: United Kingdom
UKHSA: UK Health Security Agency
US: United States

  1. Molteni E, Sudre CH, Canas LS, Bhopal SS, Hughes RC, Antonelli M, et al. Illness duration and symptom profile in symptomatic UK school-aged children tested for SARS-CoV-2. Lancet Child Adolesc Health. 2021;5(10):708–18.

  2. Castagnoli R, Votto M, Licari A, Brambilla I, Bruno R, Perlini S, et al. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection in children and adolescents: a systematic review. JAMA Pediatr. 2020;174(9):882–9.

  3. Lopez-Leon S, Wegman-Ostrosky T, Perelman C, Sepulveda R, Rebolledo PA, Cuapio A, et al. More than 50 long-term effects of COVID-19: a systematic review and meta-analysis. Sci Rep. 2021;11(1):16144.

  4. Stephenson T, Pinto Pereira SM, Shafran R, de Stavola BL, Rojas N, McOwat K, et al. Physical and mental health 3 months after SARS-CoV-2 infection (long COVID) among adolescents in England (CLoCk): a national matched cohort study. Lancet Child Adolesc Health. 2022.

  5. Behnood SA, Shafran R, Bennett SD, Zhang AXD, O'Mahoney LL, Stephenson TJ, et al. Persistent symptoms following SARS-CoV-2 infection amongst children and young people: a meta-analysis of controlled and uncontrolled studies. J Infect. 2021.

  6. Hippisley-Cox J, Coupland C, Brindle P. Development and validation of QRISK3 risk prediction algorithms to estimate future risk of cardiovascular disease: prospective cohort study. BMJ. 2017;357:j2099.

  7. Goff DC Jr, Lloyd-Jones DM, Bennett G, Coady S, D'Agostino RB, Gibbons R, et al. 2013 ACC/AHA guideline on the assessment of cardiovascular risk: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines. Circulation. 2014;129(25 Suppl 2):S49–73.

  8. House of Commons Health and Social Care Committee. Clearing the backlog caused by the pandemic. 2021. [updated 14/12/2021]. Available from:

  9. Wynants L, Van Calster B, Collins GS, Riley RD, Heinze G, Schuit E, et al. Prediction models for diagnosis and prognosis of covid-19: systematic review and critical appraisal. BMJ. 2020;369:m1328.

  10. Wolff RF, Moons KGM, Riley RD, Whiting PF, Westwood M, Collins GS, et al. PROBAST: a tool to assess the risk of bias and applicability of prediction model studies. Ann Intern Med. 2019;170(1):51–8.

  11. Moons KG, Altman DG, Reitsma JB, Ioannidis JP, Macaskill P, Steyerberg EW, et al. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): explanation and elaboration. Ann Intern Med. 2015;162(1):W1–73.

  12. Jehi L, Ji X, Milinovich A, Erzurum S, Rubin BP, Gordon S, et al. Individualizing risk prediction for positive coronavirus disease 2019 testing: results from 11,672 patients. Chest. 2020;158(4):1364–75.

  13. Knight SR, Ho A, Pius R, Buchan I, Carson G, Drake TM, et al. Risk stratification of patients admitted to hospital with covid-19 using the ISARIC WHO Clinical Characterisation Protocol: development and validation of the 4C Mortality Score. BMJ. 2020;370:m3339.

  14. Clift AK, Coupland CAC, Keogh RH, Diaz-Ordaz K, Williamson E, Harrison EM, et al. Living risk prediction algorithm (QCOVID) for risk of hospital admission and mortality from coronavirus 19 in adults: national derivation and validation cohort study. BMJ. 2020;371:m3731.

  15. Riley RD, Ensor J, Snell KIE, Harrell FE Jr, Martin GP, Reitsma JB, et al. Calculating the sample size required for developing a clinical prediction model. BMJ. 2020;368:m441.

  16. Steyerberg EW, Vickers AJ, Cook NR, Gerds T, Gonen M, Obuchowski N, et al. Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology. 2010;21(1):128–38.

  17. Sudre CH, Murray B, Varsavsky T, Graham MS, Penfold RS, Bowyer RC, et al. Attributes and predictors of long COVID. Nat Med. 2021;27(4):626–31.

  18. Borch L, Holm M, Knudsen M, Ellermann-Eriksen S, Hagstroem S. Long COVID symptoms and duration in SARS-CoV-2 positive children - a nationwide cohort study. Eur J Pediatr. 2022.

  19. Nugawela MD, Stephenson T, Shafran R, de Stavola BL, Ladhani SN, Simmons R, et al. Developing a model for predicting impairing physical symptoms in children 3 months after a SARS-CoV-2 PCR-test: the CLoCk Study. medRxiv. 2022.

  20. Stephenson T, Allin B, Nugawela MD, Rojas N, Dalrymple E, Pinto Pereira S, et al. Long COVID (post-COVID-19 condition) in children: a modified Delphi process. Arch Dis Child. 2022;107(7):674–80.

  21. Stephenson T, Shafran R, De Stavola B, Rojas N, Aiano F, Amin-Chowdhury Z, et al. Long COVID and the mental and physical health of children and young people: national matched cohort study protocol (the CLoCk study). BMJ Open. 2021;11(8):e052838.

  22. ISARIC Global Covid-19 Paediatric Follow Up Working Group. ISARIC Global COVID-19 paediatric follow-up – ISARIC. 2021. Available from:

  23. NHS Digital. Mental health of children and young people in England, 2020: wave 1 follow up to the 2017 survey. 2020. Available from:

  24. Wille N, Badia X, Bonsel G, Burstrom K, Cavrini G, Devlin N, et al. Development of the EQ-5D-Y: a child-friendly version of the EQ-5D. Qual Life Res. 2010;19(6):875–86.

  25. Office of National Statistics. Children’s and young people’s experiences of loneliness: 2018. 2018. Available from:

  26. Heinze G, Wallisch C, Dunkler D. Variable selection - a review and recommendations for the practicing statistician. Biom J. 2018;60(3):431–49.

  27. Steyerberg EW. Clinical prediction models: a practical approach to development, validation and updating. New York: Springer; 2009.

  28. Steyerberg EW, Moons KG, van der Windt DA, Hayden JA, Perel P, Schroter S, et al. Prognosis Research Strategy (PROGRESS) 3: prognostic model research. PLoS Med. 2013;10(2):e1001381.

  29. Hingorani AD, van der Windt DA, Riley RD, Abrams K, Moons KG, Steyerberg EW, et al. Prognosis research strategy (PROGRESS) 4: stratified medicine research. BMJ. 2013;346:e5793.

  30. Royston P, Moons KG, Altman DG, Vergouwe Y. Prognosis and prognostic research: developing a prognostic model. BMJ. 2009;338:b604.

  31. Altman DG, Vergouwe Y, Royston P, Moons KG. Prognosis and prognostic research: validating a prognostic model. BMJ. 2009;338:b605.

  32. Moons KG, Kengne AP, Woodward M, Royston P, Vergouwe Y, Altman DG, et al. Risk prediction models: I. Development, internal validation, and assessing the incremental value of a new (bio)marker. Heart. 2012;98(9):683–90.

  33. Ramspek CL, Steyerberg EW, Riley RD, Rosendaal FR, Dekkers OM, Dekker FW, et al. Prediction or causality? A scoping review of their conflation within current observational research. Eur J Epidemiol. 2021;36(9):889–98.

  34. Steel K, Yapp R. Coronavirus (COVID-19) Infection survey: methods and further information: Office for National Statistics; 2022. Available from:

  35. Yapp R, Bracher M. Coronavirus (COVID-19) infection survey: technical data: Office for National Statistics; 2022. Available from:

  36. Ward H, Cooke GS, Atchison C, Whitaker M, Elliott J, Moshe M, et al. Prevalence of antibody positivity to SARS-CoV-2 following the first peak of infection in England: serial cross-sectional studies of 365,000 adults. Lancet Reg Health Eur. 2021;4:100098.

  37. Office for National Statistics. 2011 Census data. 2011. Available from:

  38. Stephenson T, Shafran R, Ladhani SN. Long COVID in children and adolescents. Curr Opin Infect Dis. 2022.

  39. Kikkenborg Berg S, Dam Nielsen S, Nygaard U, Bundgaard H, Palm P, Rotvig C, et al. Long COVID symptoms in SARS-CoV-2-positive adolescents and matched controls (LongCOVIDKidsDK): a national, cross-sectional study. Lancet Child Adolesc Health. 2022;6(4):240–8.

  40. Righi E, Mirandola M, Mazzaferri F, Dossi G, Razzaboni E, Zaffagnini A, et al. Determinants of persistence of symptoms and impact on physical and mental wellbeing in Long COVID: a prospective cohort study. J Infect. 2022.

  41. Bai F, Tomasoni D, Falcinella C, Barbanotti D, Castoldi R, Mulè G, et al. Female gender is associated with long COVID syndrome: a prospective cohort study. Clin Microbiol Infect. 2021.


Acknowledgements

Michael Lattimore, Public Health England, as Project Officer for the CLoCk study

Jake Dudley, UCL Great Ormond Street Institute of Child Health, London, UK

Role of funder/sponsor (if any)

None of the funders was involved in study design, data collection, analysis or writing.


Funding

This work is independent research jointly funded by the National Institute for Health and Care Research (NIHR) and UK Research and Innovation (UKRI) (COVLT0022). All research at Great Ormond Street Hospital NHS Foundation Trust and UCL Great Ormond Street Institute of Child Health is made possible by the NIHR Great Ormond Street Hospital Biomedical Research Centre. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR, the UKRI or the Department of Health and Social Care. SMPP is supported by a UK Medical Research Council Career Development Award (ref: MR/P020372/1).

Author information

Authors and Affiliations



Contributions

MDN conceived the study, conducted the statistical analyses, accessed and verified the data and drafted the manuscript. TS and RS conceived the study and supported the drafting of the manuscript. SMPP and BLdS conceived the study, provided statistical input to the design, accessed and verified the data and drafted the manuscript. SNL, RS, KM, NR, ED, EYC, TF, IH and EC critically revised the manuscript for important intellectual content and gave final approval for the version to be published. All authors read and approved the final manuscript and agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Corresponding author

Correspondence to Snehal M. Pinto Pereira.

Ethics declarations

Ethics approval and consent to participate

The study was approved by Yorkshire and the Humber–South Yorkshire Research Ethics Committee (REC reference: 21/YH/0060; IRAS project ID: 293495).

Consent for publication

Not applicable.

Competing interests

Sir Professor Stephenson is the Chair of the Health Research Authority and therefore recused himself from the research ethics application. All other authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Table 1. TRIPOD checklist for prognostic model development and validation studies. Table 2. Final multivariable analysis developed model and optimism adjusted β coefficients. Table 3. Model performance statistics based on internal validation. Figure 1. Probability of long COVID for each predictor (from the developed model), when all other predictive variables are at their reference value. Box 1. Final equation for experiencing long COVID 3 months after a PCR-test in children aged 11 to 17 years.

Additional file 2: Risk calculator for experiencing long COVID 3 months after a PCR test.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Nugawela, M.D., Stephenson, T., Shafran, R. et al. Predictive model for long COVID in children 3 months after a SARS-CoV-2 PCR test. BMC Med 20, 465 (2022).
