At what times during infection is SARS-CoV-2 detectable and no longer detectable using RT-PCR-based tests? A systematic review of individual participant data

Background Tests for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) viral ribonucleic acid (RNA) using reverse transcription polymerase chain reaction (RT-PCR) are pivotal to detecting current coronavirus disease (COVID-19) and duration of detectable virus indicating potential for infectivity. Methods We conducted an individual participant data (IPD) systematic review of longitudinal studies of RT-PCR test results in symptomatic SARS-CoV-2. We searched PubMed, LitCOVID, medRxiv, and COVID-19 Living Evidence databases. We assessed risk of bias using a QUADAS-2 adaptation. Outcomes were the percentage of positive test results by time and the duration of detectable virus, by anatomical sampling sites. Results Of 5078 studies screened, we included 32 studies with 1023 SARS-CoV-2 infected participants and 1619 test results, from −6 to 66 days post-symptom onset and hospitalisation. The highest percentage virus detection was from nasopharyngeal sampling between 0 and 4 days post-symptom onset at 89% (95% confidence interval (CI) 83 to 93) dropping to 54% (95% CI 47 to 61) after 10 to 14 days. On average, duration of detectable virus was longer with lower respiratory tract (LRT) sampling than upper respiratory tract (URT). Duration of faecal and respiratory tract virus detection varied greatly within individual participants. In some participants, virus was still detectable at 46 days post-symptom onset. Conclusions RT-PCR misses detection of people with SARS-CoV-2 infection; early sampling minimises false negative diagnoses. Beyond 10 days post-symptom onset, lower RT or faecal testing may be preferred sampling sites. The included studies are open to substantial risk of bias, so the positivity rates are probably overestimated.


Background
Accurate testing is pivotal to controlling severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus that causes coronavirus disease 2019 (COVID-19).
Considerable political and medical emphasis has been placed on rapid access to testing, both to identify infected individuals so as to direct appropriate therapy and appropriate return to work, and to implement containment measures to limit the spread of disease. However, success depends heavily on test accuracy. Understanding when in the disease course the virus is detectable is important for two purposes: firstly, to understand when and how to detect SARS-CoV-2, and secondly, to understand how long individuals are likely to remain infective, posing a risk to others.
The success of COVID-19 testing depends heavily on the use of accurate tests at the appropriate time. Testing for active virus infection relies predominantly on reverse transcription polymerase chain reaction (RT-PCR), which detects viral ribonucleic acid (RNA) that is shed in varying amounts from different anatomical sites and at different times during the disease course. It is increasingly understood that differences in virus load impact directly on diagnostic accuracy, notably giving rise to negative tests in disease-positive individuals [1,2].
Positivity is contingent upon sufficient virus being present to trigger a positive test which may depend on test site, sampling methods, and timing [3]. For example, it is believed that positive nasopharyngeal RT-PCR declines within a week of symptoms so that a positive test later in the disease course is more likely from sputum, bronchoalveolar lavage fluid, or stool [4]. Nomenclature for anatomical site is also unclear, with a wide variety of overlapping terms used such as "oral", "throat", "nasal", "pharyngeal", and "nasopharyngeal".
Because testing is pivotal to management and containment of COVID-19, we performed an individual participant data (IPD) systematic review of emerging evidence about test accuracy by anatomical sampling site to inform optimal sampling strategies for SARS-CoV-2. We aimed to examine at what time points during SARS-CoV-2 infection it is detectable at different anatomical sites using RT-PCR-based tests.

Methods
This IPD systematic review followed the recommendations of the PRISMA-IPD checklist [5].

Eligibility
Eligible articles were any case series or longitudinal studies that reported participants with confirmed COVID-19 tested at multiple times during their infection and that provided IPD for RT-PCR test results at these times. We stipulated that test timings were linked to index dates of time since symptom onset or time since hospital admission, and that COVID-19 diagnosis was by positive RT-PCR and/or suggestive clinical criteria, for example World Health Organization (WHO) guidelines [6].

Data extraction
Data were extracted into pre-specified forms. We did not contact authors for additional information. Study characteristics, participant characteristics, and risk of bias (ROB) were extracted in Microsoft Excel (KG, JS, SG, JA, AW, SM). Data included country, setting, date, number of participants and IPD participants, inclusion criteria, IPD selection, participant age, sample types, RT-PCR test type and equipment, and primers. RT-PCR test results were extracted using Microsoft Access (SM, BS, JP, ZZ, CH).

Risk of bias
We could not identify an ideal risk of bias (ROB) tool for longitudinal studies of diagnostic tests, so we adapted the risk of bias tool for diagnostic accuracy studies QUADAS-2 [7] to include additional signalling questions to cover anticipated issues. ROB signalling questions, evaluation criteria, and domain assessment of potential bias are reported (Additional file 1: Table S2).

Sampling method and grouping
Details of sampling sites and methods, including location of the sampling site(s) and any sample grouping (for example, if combined throat and nasal swabs), were extracted from full texts by a clinician (NS) with queries referred to a second clinician (ST). If stated, details of sampling methodology were recorded, including who collected samples, information regarding anatomical location (e.g. how the nasopharynx was identified), and sample storage (Additional file 1: Table S3).

RT-PCR test result conversion to binary results
IPD RT-PCR results were extracted from each article and converted to binary results ("positive" or "negative"). Data from Kaplan-Meier (KM) curves were extracted using Web digitizer [8] (Additional file 1: Table S3).
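The conversion to binary results can be illustrated with a minimal Python sketch. The review's actual conversion rules are given in its supplementary material and are not reproduced here, so the function name `to_binary`, the accepted text labels, and the cycle threshold cutoff of 40 are all hypothetical choices made only for illustration.

```python
def to_binary(raw, ct_cutoff=40.0):
    """Convert one reported RT-PCR result to 'positive' or 'negative'.

    raw is either a text label or a cycle threshold (Ct) value.
    ct_cutoff is a hypothetical threshold, NOT taken from the review.
    """
    if isinstance(raw, str):
        text = raw.strip().lower()
        if text in {"positive", "pos", "+", "detected"}:
            return "positive"
        if text in {"negative", "neg", "-", "undetected", "not detected"}:
            return "negative"
        # Any other string is assumed to hold a numeric Ct value;
        # float() raises ValueError if it does not.
        raw = float(text)
    # Lower Ct means more viral RNA, so values below the cutoff count as positive.
    return "positive" if raw < ct_cutoff else "negative"
```

For example, `to_binary("Not detected")` gives "negative" and `to_binary(31.5)` gives "positive" under the assumed cutoff. Articles that report results in other units would need their own rule.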

Data analysis
Days since symptom onset and days since hospital admission were calculated from reported IPD. Data were presented collated across 5-day time intervals for each sample method, with longer times grouped within the longest time interval, and 95% CI was calculated for proportions. For comparison of duration of positive RT-PCR from respiratory tract (RT) and faecal samples, analysis and graphical presentation were restricted to participants sampled by both methods. Data analysis used STATA (14.2 StataCorp LP, Texas, USA) (Additional file 1: Table S3).
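The collation into 5-day intervals with 95% CIs for proportions can be sketched as follows. This is an illustrative Python sketch, not the Stata code used in the review. The Wilson score interval is an assumed choice, because the text does not state which CI method was applied. The cap `last_start=35` for the final open-ended interval and the handling of negative days by floor division are likewise assumptions.

```python
from math import sqrt


def wilson_ci(positives, total, z=1.96):
    """Wilson score interval for a proportion (about 95% when z = 1.96)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = positives / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


def summarise(results, width=5, last_start=35):
    """Collate (day, is_positive) pairs for one sampling site into intervals.

    Days at or beyond last_start (a hypothetical cap) are grouped into the
    last interval. Returns rows of
    (interval_start, positives, total, proportion, ci_low, ci_high).
    """
    counts = {}
    for day, is_positive in results:
        # Floor division maps days 0-4 to 0, 5-9 to 5, and so on.
        start = min((day // width) * width, last_start)
        pos, tot = counts.get(start, (0, 0))
        counts[start] = (pos + int(is_positive), tot + 1)
    rows = []
    for start in sorted(counts):
        pos, tot = counts[start]
        low, high = wilson_ci(pos, tot)
        rows.append((start, pos, tot, pos / tot, low, high))
    return rows
```

Calling `summarise` once per sampling site and index date (symptom onset or hospital admission) would give one table of proportions per site, which is the shape of summary the results section reports.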

Included studies
A total of 5078 articles were identified, 116 full text articles were screened, and 32 articles were included (Fig. 1). Most articles were from China, in hospitalised adult participants (Table 1). Articles reported on a total of 1023 participants and 1619 test results.
Twenty-six (81%) articles reported data on test results since the start of symptoms, and 23 (72%) since hospital admission. Sixteen studies including 22% (229/1023) of the participants reported both these time points: The median time between symptom onset and hospitalisation was 5 days (interquartile range (IQR) 2 to 7 days). The median number of participants per study was 22 (IQR 9 to 56, range 5 to 232), and the median number of RT-PCR test results per participant was 4 (IQR 2 to 9) (Table 2).

Sampling site reporting
Articles variably specified sampling sites according to anatomical location, or grouped more than one site for analysis, for example as upper RT (Additional file 1: Table S4). The most frequent sample sites were faeces (n = 13), nasopharyngeal (n = 10), and throat (n = 9), although there was a range of other sites including blood, urine, semen, and conjunctival swabs (Table 2). Details of sampling method were generally absent. Two studies specified the person taking the samples. One study described how the nasopharynx was identified and the swab technique (length of contact time with the nasopharynx and twisting). Five studies specified sample storage and transport details.

Sampling site positivity over time
We present RT-PCR test results for 11 different sampling sites at different times during SARS-CoV-2 infection. Figures 2 and 3 show the number of positive and negative RT-PCR results for 5-day time intervals since symptom onset and time from hospital admission, respectively.
The sampling sites yielding the greatest proportion of positive tests were nasopharyngeal, throat, sputum, or faeces. Insufficient data were available to evaluate saliva and semen. Only 33% of participants who were tested with blood samples had detectable virus (44/133; 6 articles [20,26,27,31,35,38]), and almost no samples tested from urine or conjunctival sampling detected virus presence.

Upper and lower respiratory tract sampling
We further grouped sites into upper (URT) and lower (LRT) respiratory tract. The rate of sample positivity reduced faster from URT sites compared to LRT sites (Fig. 4a). Given that analysis across all participants is likely to be influenced by preferential URT sampling of participants with less severe disease, we also analysed participants who underwent both URT and LRT sampling. Again, URT sites on average cleared faster (median 12 days, 95% CI 8 to 15 days) than LRT sites (median 28 days, 95% CI 20 to not estimable; Fig. 4b); the majority of participants cleared virus from URT sites before LRT sites (Fig. 4c). Data based on time since hospital admission are consistent with data for time since symptom onset.
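A median time to clearance with a bound that is "not estimable" is typical of a time-to-event analysis with censoring. The Python sketch below shows how such a median could be obtained with a Kaplan-Meier estimate. The text does not name the estimator, so the use of Kaplan-Meier here is an assumption, and the function name `km_median` and the input layout are hypothetical.

```python
def km_median(durations, observed):
    """Kaplan-Meier median time to viral clearance.

    durations: days from the index date to clearance, or to the last test
               if clearance was never observed.
    observed:  True if clearance was observed, False if censored.
    Returns the smallest time at which the estimated probability of still
    having detectable virus is <= 0.5, or None if that level is never
    reached (median not estimable).
    """
    data = sorted(zip(durations, observed))
    at_risk = len(data)
    still_detectable = 1.0
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        leaving = 0
        # Handle all participants who share the same time together.
        while i < len(data) and data[i][0] == t:
            events += int(data[i][1])
            leaving += 1
            i += 1
        if events:
            still_detectable *= 1 - events / at_risk
            if still_detectable <= 0.5:
                return t
        at_risk -= leaving
    return None
```

With heavy censoring the estimated curve may never fall to 0.5, in which case the function returns `None`. This mirrors how an upper confidence bound can be reported as not estimable when too few participants are followed until clearance.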

Faecal vs. respiratory tract sampling
Across participants sampled by both RT and faecal sampling since hospital admission, 29% of participants were

Intermittent false negative results
Many articles reported intermittent false negative RT-PCR test results for participants within the monitoring time span. Where participant viral loads were reported, several different profiles were distinguished; two examples are shown in Fig. 6 [14,15]. Intermittent false negative results were reported either where the level of virus is close to the limit of detection, or in participants with high viral load but for unclear reasons.

Risk of bias
The proportion of studies with high, low, or unclear ROB for each domain is shown in Fig. 7, and ROB for individual studies is shown in Additional file 1: Table S5. All studies were judged at high ROB. All but one were judged at high ROB for the participant selection domain [17], mainly as they only included participants with confirmed SARS-CoV-2 infection based on at least one positive PCR test. Studies also frequently selected a subset of the participant cohort for longitudinal RT-PCR testing, and only results for these participants were included in the study. Ten studies were judged at unclear ROB for the index test domain as the schedule of testing was based on clinician choice rather than being pre-specified by the study or clinical guidelines, or because the samples used for PCR testing were not pre-specified. Eleven studies were judged at high ROB for the flow and timing domain mainly because continued testing was influenced by easy access to participants, such as by continued hospitalisation.

Key findings
Negative RT-PCR test results were common in people with SARS-CoV-2 infection confirming that RT-PCR testing misses identification of people with disease. Our IPD systematic review has established that sampling site and time of testing are key determinants of whether SARS-CoV-2 infected individuals are identified by RT-PCR.
We found that nasopharyngeal sampling was positive in approximately 89% (95% CI 83 to 93) of tests within 4 days of symptom onset. Sampling 10 days after symptom onset greatly reduced the chance of a positive test result.
There were limited data on new methods of sample collection like saliva in these longitudinal studies. Sputum samples have similar or higher levels of detection to nasopharyngeal sampling, although this may be influenced by preferential sputum sampling in severely ill participants. Although based on few participants tested at both sampling sites, URT sites have faster viral clearance than LRT in most of these participants; virus was undetectable at URT sites in 50% of participants by 12 days after symptom onset, compared to 28 days for LRT. We found that faecal sampling is not suitable for initial detection of disease, as up to 30% of participants detected using respiratory sampling are not detected using faecal sampling. Viral detection in faecal samples may be useful to establish virus clearance, although as noted, whether RT or faecal samples have longer duration of viral detection varies between participants.
All included studies were judged at high ROB, so results of this review should be interpreted with caution. Table 3 provides an overview of the major methodological limitations and their potential impact on study results. A major source of bias is that all but one study [19] restricted inclusion to participants with confirmed SARS-CoV-2 infection based on at least one positive RT-PCR test, meaning that the percentage of positive RT-PCR testing is likely to be overestimated.
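The direction of this bias can be made concrete with a simplified calculation, offered only as an illustration. Suppose each participant has k tests, each independently positive with probability p, and only participants with at least one positive test are included. The expected share of positive tests among those included is then p / (1 − (1 − p)^k), which always exceeds p when p < 1. The independence assumption, the constant p, and the function name `observed_positivity` are illustrative and are not part of the review's analysis.

```python
def observed_positivity(p, k):
    """Expected share of positive tests among included participants.

    Assumes k independent tests per participant, each positive with
    probability p, and inclusion only if at least one test is positive.
    Derived from E[X | X >= 1] / k for X ~ Binomial(k, p).
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    if k < 1:
        raise ValueError("k must be at least 1")
    return p / (1 - (1 - p) ** k)


if __name__ == "__main__":
    # k = 4 echoes the median number of tests per participant in the review.
    for p in (0.3, 0.5, 0.7):
        print(p, round(observed_positivity(p, 4), 3))
```

Under these assumptions the inflation is largest when true positivity is low and participants have few tests. This is consistent with the concern that the reported percentages are optimistic.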
Lack of technical details, for example of how samples are taken and RT-PCR tests performed, limits the applicability of findings to current testing. Compared to real life, studies were likely to use more invasive sampling methods, use experienced staff to obtain samples, and sample participants in hospital settings where sample handling could be standardised. Consequently, estimates of test performance are likely to be overestimated compared to real-world clinical use and in community population testing including self-test kits.
These limitations have important implications for how testing strategies should be implemented and in

Putting the findings into context of literature
The accuracy of RT-PCR testing is limited by sampling sites used, methods, and the need to test as soon as possible from symptom onset in order to detect the virus. Previous studies have established that in COVID-19 infection, viral loads typically peak just before symptoms and at symptom onset [4], and have estimated false negative test results over time since exposure from upper respiratory tract samples [2]. To our knowledge, there has been no prior systematic review of RT-PCR using IPD to quantify the percentage of persons tested who are positive and how this varies by time and sampling site.
Understanding the distribution of anatomical sites with detectable virus is clinically relevant, especially given independent viral replication sites in nose and throat using distinct and separate genetic colonies [17]. Understanding of different patterns of detection and duration of virus detection at different body sites is essential when designing strategies of testing to contain virus spread. Notably, it is unclear if detection of virus in faeces is important in disease transmission, although faecal infection was shown in SARS and MERS [41].

Strengths of study
This review uses robust systematic review methods to synthesise published literature and identifies overall patterns not possible from individual articles. Using IPD, we examined data across studies and avoided study-level ecological biases present when using overall study estimates. IPD regarding sample site at different time points during infection is vital because it provides an overview of test performance impossible from individual studies alone. Synthesised IPD can also substantiate or reject patterns appearing within individual studies. Within-participant paired comparisons of sampling sites also become possible with sufficient data.

Limitations of study
The main limitation is the risk of bias in the included studies. Although constraints were understandable given the circumstances in which the studies were done, the consequences for validity need to be highlighted. The percentage of positive RT-PCR testing is likely to be overestimated, because inclusion was restricted to participants with confirmed SARS-CoV-2 infection based on at least one positive RT-PCR test in all but one study [19]. This means that people who had a COVID-19 infection but never tested positive on at least one RT-PCR test would not have been included. This could arise if SARS-CoV-2 is not present at easily sampled sites or at the time participants were tested. This makes it impossible to determine the true false negative rate of the test, that is, the proportion of people who actually have SARS-CoV-2 but would receive a negative RT-PCR test result. It is possible that only half of persons infected by SARS-CoV-2 may test positive, as a community surveillance study in Italy found that only 53% (80/152) of persons tested RT-PCR positive in households quarantined for 18 days with persons who tested PCR positive [39]. The same study also identified households where no one tested RT-PCR positive, but where there were clusters of persons with symptoms typical of COVID. Poor reporting of sampling methods and sites impaired our ability to distinguish between and report on variability between them. For some sampling methods such as saliva and throat swabs, more studies are needed. There were also sparse data on sampling methods that are becoming more widespread, such as participant self-sampling [42] and short nasal swab sampling (anterior nares/mid turbinate) [43]. Our index times may be subject to bias as symptom onset is somewhat subjective and hospital admission practices vary by country, pandemic stage, and hospital role (i.e. healthcare vs. isolation).
The results presented do not follow the same participants across time but reflect testing at clinically relevant time snapshots reported from individual studies, so participants tested at later time points are likely to have more severe disease; this does not limit the interpretation of results in understanding testing of participants in most clinical contexts. Comparisons of sampling sites should be restricted to participants tested at the relevant sites.
We have used analysis methods that do not include clustering within studies, to keep analyses simple to understand and present, and to avoid complications of fitting models where the number of participants in each cluster varies. Ultimately, many potentially eligible studies did not report IPD which led to their exclusion, or only reported IPD for a subset of participants in the study. We would welcome contact and data sharing with clinicians and authors to rectify this.

Implications for policy/practice/future research
To avoid the consequences of missed infection and to prevent ongoing transmission, samples for RT-PCR testing need to be taken as soon as symptoms start.
Even within 4 days of symptom onset, some participants infected with SARS-CoV-2 will receive negative test results. Testing at later times will result in a higher percentage of false negative tests in people with SARS-CoV-2, particularly at upper RT sampling sites. After 10 days post-symptoms, it may be important to use lower RT or faecal sampling. Valid estimates are essential for clinicians interpreting RT-PCR results. However, ROB considerations suggest that the positive percentage rates we have estimated may be optimistic, possibly considerably so. Participants can have detectable virus in different body compartments, so virus may not be detected if samples are only taken from a single site. Some hospitals in the UK now routinely take RT-PCR samples from multiple sites, such as the nose and throat. More studies are urgently needed on evolving sampling strategies such as self-collected samples which include saliva and short nasal swabs. Future studies should avoid the risks of bias we have identified by precisely reporting the anatomical sampling sites with a detailed methodology on sample collection. Table 4 details example studies helpful for future study design.
Further sharing of IPD will be important, and we would welcome contact from groups with IPD data we can include in ongoing research.

Conclusions
RT-PCR misses detection of people with SARS-CoV-2 infection; early sampling minimises false negative diagnoses. Beyond 10 days post-symptom onset, lower RT or faecal testing may be preferred sampling sites. The included studies are open to substantial risk of bias, so the positivity rates are probably overestimated.

Fig. 6 Example participants with intermittent false negative results. a An example of a participant with high viral load, but where alternate RT-PCR test results report high viral load or undetectable virus. b A participant where virus levels have reduced over time to a level around the limit of viral detection, and at these low levels of virus, intermittent negative results will occur due to differences in the location or amount of sample

Fig. 7 … Table S2). For each domain, the percentage of studies by concern for potential risk of bias is shown: low (green), unclear (yellow), and high (red)

Table 3 Biases and issues in interpretation

Domain: Participants (source of bias)
Details of bias and applicability issues: In these studies, the reference test usually incorporates RT-PCR (index test).
• RT-PCR testing is usually a key component of identifying people with SARS-CoV-2 infection.
• Participants will not be detected or included in these studies when SARS-CoV-2 is not present at easily sampled sites and at the time that participants were available for testing.
Impact on interpretation of study data: Unclear how many and what severity of participants with SARS-CoV-2 are not included in studies. People who do not have a positive RT-PCR test at some point are excluded. This could lead to overestimation of positivity. Rates of positivity will be inflated as only people with virus accessible for sampling for RT-PCR tests will be included in studies.

Domain: Participants (source of bias)
Details of bias and applicability issues: Most participants are identified or present based on respiratory tract symptoms such as cough or respiratory distress.
• Participants will not be detected or included in these studies when they have less common symptoms or are asymptomatic.
• Participants included will be biased to over-represent people with detectable virus in respiratory tract sampling sites and at times frequently used for testing (post symptom onset or at admission to hospital).
Impact on interpretation of study data: Unclear how many and what severity of participants with SARS-CoV-2 are not included in studies. Studies will inflate positivity for sampling sites that overlap with sampling sites used in RT-PCR reference testing.
• For example, we identified 30% of participants with RT positivity but with negative results from faecal sampling. However, if participants had only faecal virus, would they have been included in the studies?

Domain: Index test: RT-PCR (applicability)
Details of bias and applicability issues:
• Studies included are likely to use more invasive sampling methods than acceptable in widespread population testing. For example, nasopharyngeal testing is likely in many current studies to be based on long swabs and self test kits.
• Studies will use experienced staff to obtain samples, handle, process, and conduct tests.
• Studies are mostly sampling participants in hospital settings or in specialised community testing research where sample handling, transport, and storage have been standardised.
• Variation in RT-PCR kits is minimised as studies are based in few hospitals or limited to a research setting.
Impact on interpretation of study data: Percentage of people with detectable virus may be overestimated when testing is applied in real-world clinical use and in population testing.
Details of bias and applicability issues: Reporting of sampling sites and methods is poor.
• Poor reporting may have led to less ideal grouping of sampling in analysis.
• Some studies are likely to use a variety of nasopharyngeal sampling methods depending on the individual participants, but the type of sampling is typically reported at a study level for a particular sampling site.
Impact on interpretation of study data: Percentage of people with detectable virus may be over- or underestimated.

Domain: Flow and timing
Details of bias and applicability issues: Uncertainty and inconsistencies in time of sampling.
• Time of symptom onset can be subjective unless based on fever, but some participants do not have fever.
• Time of symptom onset may be different if asked of participants in ICU setting.
• Time of hospitalisation and discharge may be affected by the function hospitalisation serves in containment of disease spread. In some studies, the hospitals were also quarantine centres, so participants were hospitalised immediately at onset of mild symptoms rather than restricted to patients needing oxygen.
Impact on interpretation of study data: Percentage of people with detectable virus may be over- or underestimated at particular times.

Domain: Flow and timing
Details of bias and applicability issues: Clinical cohort within studies changes across time points.
• Participants who have recovered from COVID-19 in most studies are typically not tested after 2 negative tests 24 h apart.
• Many studies only test inpatients at the hospital, so the participants sampled between 0 and 14 days typically have less severe disease than those tested longer.
Impact on interpretation of study data: Percentage of people with detectable virus may be overestimated at particular later time points as these correspond to participants who were severely ill.

Domain: Flow and timing (selective outcome reporting)
Details of bias and applicability issues: Some studies only publish IPD data for a selection of people.
Impact on interpretation of study data: Available IPD data may not represent a typical spectrum of participants in the different settings (community setting, hospital, ICU, nursing home, prison).

Domain: Publication bias
Details of bias and applicability issues: Published data is likely to be biased towards publication of research active groups which may not represent typical real world.
Impact on interpretation of study data: Percentage of people with detectable virus may be overestimated.