Skip to main content
  • Research article
  • Open access
  • Published:

Use of whole-genome sequencing to distinguish relapse from reinfection in a completed tuberculosis clinical trial



RIFAQUIN was a tuberculosis chemotherapy trial in southern Africa including regimens with high-dose rifapentine with moxifloxacin. Here, the application of whole-genome sequencing (WGS) is evaluated within RIFAQUIN for identifying new infections in treated patients as either relapses or reinfections. WGS is further compared with mycobacterial interspersed repetitive units-variable number tandem repeats (MIRU-VNTR) typing. This is the first report of WGS being used to evaluate new infections in a completed clinical trial for which all treatment and epidemiological data are available for analysis.


DNA from 36 paired samples of Mycobacterium tuberculosis cultured from patients before and after treatment was typed using 24-loci MIRU-VNTR, in silico spoligotyping and WGS. Following WGS, the sequences were mapped against the reference strain H37Rv, the single-nucleotide polymorphism (SNP) differences between pairs were identified, and a phylogenetic reconstruction was performed.


WGS indicated that 32 of the paired samples had a very low number of SNP differences (0–5; likely relapses). One pair had an intermediate number of SNP differences, and was likely the result of a mixed infection with a pre-treatment minor genotype that was highly related to the post-treatment genotype; this was reclassified as a relapse, in contrast to the MIRU-VNTR result. The remaining three pairs had very high SNP differences (>750; likely reinfections).


WGS and MIRU-VNTR both similarly differentiated relapses and reinfections, but WGS provided significant extra information. The low proportion of reinfections seen suggests that in standard chemotherapy trials with up to 24 months of follow-up, typing the strains brings little benefit to an analysis of the trial outcome in terms of differentiating relapse and reinfection. However, there is a benefit to using WGS as compared to MIRU-VNTR in terms of the additional genotype information obtained, in particular for defining the presence of mixed infections and the potential to identify known and novel drug-resistance markers.

Peer Review reports


Evaluations of drug trials for tuberculosis (TB) are complicated by the fact that a recurrence of disease can either be due to endogenous relapse of disease or to subsequent exogenous infection with a new strain (reinfection). Historically, during the major TB chemotherapy trials of the 1960s to 1980s (reviewed by Fox et al. [1]), it was not possible to differentiate isolates, and all new infections that occurred after the trial conclusion were labelled as relapses.

From the 1980s, a series of genomic-based methods for typing strains of Mycobacterium tuberculosis were developed, in particular IS6110 restriction fragment length polymorphism (RFLP), spoligotyping and mycobacterial interspersed repetitive units-variable number tandem repeats (MIRU-VNTR) typing [24]. Some trials therefore began to use molecular methods to differentiate relapses from reinfections. This was initially through IS6110 RFLP typing [57] and then through MIRU-VNTR typing [8], while other trials continued without any differentiation [9].

MIRU-VNTR became the favoured typing approach because it combined reasonable discrimination with a readout that could both be easily measured and be described in a digital form [3]. More recently, whole-genome sequencing (WGS) has enabled the identification of single-nucleotide polymorphism (SNP) differences, thus leading to far greater discrimination in TB epidemiological studies [1013].

Two groups have recently used WGS to evaluate paired samples, comparing SNP differences between the original infections and new infections following treatment [14, 15]. The study by Bryant et al. [14] was based on an ongoing clinical trial [16] that was being carried out in sub-Saharan Africa, south and east Asia, and central America. Of the 36 paired samples, 33 were found to be highly similar (≤6 SNPs; classed as relapses) and three were highly divergent (≥1306 SNPs; classed as reinfections).

The report by Guerra-Assunção et al. [15] was not based on a clinical trial, but was taken from the Karonga Prevention Study, a long-term population-based programme in Malawi. In this programme, 60 paired samples collected over a 15-year time period were sequenced, and while the authors also found a clear division in SNP numbers between relapses and reinfections, it was not as marked as in the Bryant study. Thus, they classed 46 samples with 0–8 SNP differences as relapses, and 14 with >100 SNP differences as reinfections.

In this study, we performed WGS and analysed SNPs to compare pre- and post-treatment isolates from the completed RIFAQUIN clinical trial [17], a study evaluating high-dose rifapentine with moxifloxacin, carried out in sub-Saharan Africa. Successful sequencing was carried out on 36 pairs of samples of M. tuberculosis recovered before treatment and from those patients showing positive cultures at 6 months, and results were compared with MIRU-VNTR data. Our results agree with the general findings from the two studies referred to above, in that the overwhelming majority of secondary cases were classified as relapses. Importantly, WGS was further able to monitor possible epidemiological connections and sample errors during the trial, which were not detected using MIRU-VNTR. Given the added benefit of WGS in this context, we suggest that WGS should be routinely used as the method of choice in such trials.



The RIFAQUIN chemotherapy trial, in collaboration with six institutions in southern Africa, has been previously described [17]. Between August 2008 and August 2011, patients with newly diagnosed smear-positive drug-sensitive TB were randomly assigned to one of the following:

Control regimen: 2 months of daily ethambutol, isoniazid, rifampicin and pyrazinamide followed by 4 months of daily isoniazid and rifampicin;

4-month regimen: Isoniazid replaced by moxifloxacin daily for 2 months followed by 2 months of twice-weekly moxifloxacin and 900 mg rifapentine; or

6-month regimen: Isoniazid replaced by moxifloxacin daily for 2 months followed by 4 months of once-weekly moxifloxacin and 1200 mg rifapentine.

Sputum was examined by microscopy and culture at regular intervals for treatment failure or relapse. Patients had up to 18 months of follow-up post randomisation, with the patients recruited last having 12 months of follow-up post-randomisation. Samples from patients with two or more consecutive M. tuberculosis-positive cultures after 6 months (or at the end of treatment) were selected for WGS.

MIRU-VNTR determination and assignment

The 24-loci MIRU-VNTR typing of these isolates was previously described [17]. Briefly, a 10 μL loop was used to pick up a sample of M. tuberculosis colonies by sweeping across growth on a Lowenstein–Jenson (LJ) slope. Bacteria were heat-killed and DNA extraction performed using lysozyme and proteinase K digestion followed by phenol-chloroform extraction and ethanol precipitation [18]. The 24 MIRU-VNTR loci were amplified in eight labelled multiplex PCR reactions, and the amplicons sized, with MapMarker 1000 standard (BioVentures, Murfreesboro, TN, USAs), by capillary electrophoresis on the sequencer (3130 Genetic Analyzer, Applied Biosystems, Waltham, MA, USA). Analysis was carried out using the GeneMapper software (Applied Biosystems, Waltham, MA, USA), which assigns alleles based on the customised bin-sets (fragment sizes and dyes) used to define each allele. For some samples there was variable coverage across the MIRU-VNTR loci using the sequencer, so, where possible, any missing loci were confirmed by single-plex PCR with products sized by standard agarose gel electrophoresis. Where possible, paired samples (pre- and post-treatment) from a given patient were run in parallel.

Whole-genome sequencing

For the WGS, 50 μL, containing at least 250 ng, of genomic DNA from each sample was sheared using the Covaris E220 for a target size of 200 bp (Peak Incident Power: 175; duty factor: 10%; cycle/burst: 200; temperature: <8 °C; time: 120 s). Libraries were prepared from sheared DNA using the NEB DNA Ultra kit in accordance with standard protocol (New England Biolabs, Hitchin, UK). The NEB adapters were substituted for the set described by Kozarewa and Turner [19]. Libraries were quantified using the Qubit High Sensitivity DNA assay and pooled equimolarly (Invitrogen, UK). The pools were subjected to paired-end sequencing carried out on a single lane of the Illumina HiSeq 2500 (v3 chemistry, read length 100 bp). Samples which produced a low yield were re-pooled and sequenced on a single MiSeq run (v2 chemistry, read length 250 bp).

Sequence analyses

Sequence reads were mapped to the H37Rv reference genome (RefSeq accession: NC_000962) using bwa mem v0.7.3a-r367 [20], alignments were sorted, and duplicates were removed with samtools v0.1.19 [21]. Site statistics were generated using samtools mpileup and variant sites were filtered based on the following criteria: mapping quality above 30, site quality score above 30, at least four reads covering each site with at least two reads mapping to each strand, at least 75% of reads supporting site (DP4) and an allelic frequency of 1. Sites that failed these criteria in any isolate were removed from the analysis. Phylogenetic reconstruction was performed using RAxML v8.2.3 [22] with a General Time Reversible (GTR) model of nucleotide substitution and a Gamma model of rate heterogeneity; branch support values were determined using 1000 bootstrap replicates. Relapse or reinfection calls were made by applying the above filtering criteria to the individual patients’ paired samples. INDELS were identified using samtools mpileup as above, but setting the minimum fraction of gapped reads for candidates to 0.05.

In silico spoligotyping and sub-lineage typing

Spoligotypes were generated using SpolPred [23]. Sub-lineages were further determined using the presence or absence of a set of 62 lineage-defining SNPs as derived by Coll et al. [24].

Mixed infections

For each isolate sequence a count of the percentage of reads supporting a variant base at each genome position was plotted. Mixed isolates can be identified by the presence of an extra peak, suggesting the presence of two genotype populations in the sequenced sample. Base calls for the majority and minority strains were separated based on the per cent reads and pseudo-sequences were generated and subsequently included in the phylogenetic reconstruction as above.


Samples studied

Figure 1 shows a flowchart of the samples studied. A total of 827 patients, with newly diagnosed, microscopy-positive pulmonary TB were enrolled in South Africa, Zimbabwe, Botswana and Zambia in the trial. Fifty-one patients had positive cultures in post-treatment follow-up and therefore required genotyping to distinguish relapse from reinfection (as per the RIFAQUIN protocol [17]). DNA was available to generate MIRU-VNTR data for 44 pairs of samples (pre- and post-treatment). The remaining DNA was passed for WGS, and good-quality sequences (>20× coverage) were generated for both pre- and post-treatment samples of 36 patients.

Fig. 1
figure 1

Flowchart of pairs of samples studied. MIRU mycobacterial interspersed repetitive units, WGS whole-genome sequencing

SNP differences were determined between the pairs of isolates, and a comparison with MIRU-VNTR differences is shown in Table 1. Two main groups can be identified: 32 pairs of isolates had five or fewer SNP differences, and four pairs of samples had a much higher number of SNP differences (range 737–1329). An additional single pair of isolates differed by 57 SNPs, but this was probably because the pre-treatment isolate contained a mixed infection, as discussed below.

Table 1 Comparison between the differences in single-nucleotide polymorphisms and mycobacterial interspersed repetitive units

Phylogenetic reconstruction of SNPs

Phylogenetic reconstruction of variant SNPs (Fig. 2a) showed that the majority (32 out of 36) of the isolate pairs had low numbers of SNP differences and were therefore clearly determined as cases of relapse. One isolate pair was identified as a mixed infection (see below). The remaining three isolate pairs that had high numbers of SNP differences appear quite divergent on the tree (marked in green) and were determined as likely reinfections.

Fig. 2
figure 2

a Phylogenetic reconstruction of 36 pairs of isolates. These were inferred using 5132 high-quality single-nucleotide polymorphisms (SNPs) following the removal of 661,083 low-quality sites and the remaining invariant sites. The tree was rooted using the H37Rv reference strain sequence. Relapse, reinfection and mixed are denoted with black/blue, green and red tips respectively. Blue tip labels are further shown in panels be. be Branches have been amplified where unexpected similarity was seen; the numbers of SNPs between the most divergent samples are given

There were also isolates that mapped closely to other patient isolates on the tree, and these merited closer attention to see if there were genuine connections or unexpected problems caused by possible laboratory handling errors.

Panels b and c in Fig. 2 show one class of pattern that was observed with clustered isolate pairs, in which there were no SNP differences between each member of a pair, but each pair was very closely related to another pair. In both panels, the two pairs of samples came from different centres (panel b: 005 and 014, Harare and Marondera, both in Zimbabwe; panel c: 008 Harare, Zimbabwe, and 001 Francistown, Botswana, on the borders of Zimbabwe; Table 2), suggesting that a laboratory processing error was unlikely. An alternative explanation is that highly similar local strains were circulating in the two relatively close regions and had evolved independently over time.

Table 2 Relationship between single-nucleotide polymorphisms and mycobacterial interspersed repetitive units-variable number tandem repeat differences

Panels d and e in Fig. 2 show a different type of pattern, in which a pair of isolates from one patient clustered together, as expected for relapses, but was also identical to a single isolate from another pair, suggesting a possible transmission event. In Fig. 2d, a post-treatment sequence for isolate 009 was identical to isolate pair 012; the two 009 isolates differed by 1233 SNPs. In Fig. 2e, a pre-treatment isolate 004-1 was identical in sequence to both isolates of patient 003; the two 004 isolates differed by 737 SNPs. All four patients received treatment in the same city, Harare (Table 2). While it is not impossible that these genotypes were genuinely isolated from the two patients, 009 and 004, another possible explanation is some form of laboratory processing error. Indeed, in one case the patients visited the hospital on the same day, and in the other results were reported at the same time. This combined with their geographical co-location would further support the possible processing error interpretation. It is also worth noting that if these are indeed errors, they would normally be invisible to the analysis without the resolution of WGS.

Mixed infections

One patient’s pair of samples (035) displayed 57 SNPs between the pre- (035-1) and post-treatment (035-2) isolates and was therefore initially classified as a reinfection. However, further analysis of the WGS data showed evidence of a mixed infection in the pre-treatment isolate (035-1; Fig. 3a) corresponding to an approximately 75% to 25% combination of two genotypes. Using this majority/minority ratio of read coverage, it was possible to separate the two genotypes and further phylogenetic reconstruction suggested that it was likely that the minority genotype (035-1-min) was closely related to the post-treatment isolate (035-2; Figs. 2a and 3b). This suggests that this was in fact a relapse of a previously unidentified minority genotype, rather than a reinfection as previously assigned.

Fig. 3
figure 3

Identification of mixed infection. a Counts of genome sites which were called as a reference base but showed a significant proportion of sequence reads also supporting a variant base call (035-1); b the equivalent plot for an isolate with no mixed infection (035-2). The presence of a second peak in a is suggestive of a mixture with a minority genotype

Initially there appeared to be 57 SNP differences between the pre- and post-treatment isolates (035-1, 035-2), which would have been an unusual result given that the previous studies had only identified reinfections with very high SNP differences, and nothing at an intermediary level. The observation of mixed genotypes would explain this discrepancy because one of the main filtering criteria in the site-calling algorithm is to remove sites with mixed genotype calls (<75% read support for the call), so the real number of SNP differences between the isolates is likely to be higher. After separating the genotypes, it was estimated that the number of SNP differences between the pre-treatment minority genotype and the post-treatment isolates was 869 SNPs. The pre-treatment minority genotype and the post-treatment isolate appeared to differ by 245 SNPs; however, the genotype separation algorithm used was relatively crude, with filtering based on parameter cut-offs, so it was not possible to completely separate the genotypes at all mixed genome sites, reflecting the overlapping shape of the two distributions (Fig. 3a). However, the proximity of their placement on the tree (Fig. 2a) suggests they are highly related and thus this patient’s disease was likely a relapse.

Comparing WGS with MIRU-VNTR data

Figure 4a shows there is a stark difference in the number of SNP differences between cases of relapse and reinfection, an observation also made by Bryant et al. [14]. Table 1 and Fig. 4b show the distribution of MIRU-VNTR differences. The majority of pairs had no MIRU-VNTR differences (out of up to 21 loci determined), but some had a maximum of seven loci different. We experienced technical difficulties which meant that the number of loci amplified varied (Table 2; see Discussion).

Fig. 4
figure 4

Analysis of single-nucleotide polymorphism (SNP) and mycobacterial interspersed repetitive units-variable number tandem repeat (MIRU-VNTR) differences between pairs of isolates. Data are summarised from Tables 1 and 2. a Number of SNP differences detected between paired isolates; b number of MIRU-VNTR differences detected between paired isolates; c correlation between SNP and MIRU differences; d number of informative MIRU loci on which differences were based (for each pair of samples, the lower number is shown)

The relationship between SNP and MIRU-VNTR differences is shown in Table 2 and Fig. 4c. There was a clear MIRU-VNTR difference between those labelled as relapses using WGS (zero to two MIRU-VNTR differences) and those labelled as reinfections (seven to eight MIRU-VNTR differences). However, within the relapse group, there was no obvious relationship between these two measures: all samples with two to five SNPs had no MIRU-VNTR differences, whereas there were four with no SNP differences and one MIRU-VNTR difference. Overall, WGS largely agreed with MIRU-VNTR (Table 3), with only the likely mixed infection causing a possible discrepancy. That was based on a decision in the trial to classify pairs with two or more MIRU-VNTR differences as reinfections.

Table 3 Comparison of the use of whole-genome sequencing with mycobacterial interspersed repetitive units-variable number tandem repeats for calling relapse or reinfection

In silico spoligotyping and sub-lineages

Human M. tuberculosis strains have been divided into six global lineages, and further into sub-lineages, some of which may have distinct infection phenotypes [24]. In addition to the whole-genome SNP-based methodology used above, analysis using a set of 62 lineage-defining SNPs [24] was also used to assign sub-lineages (Additional file 1: Table S1). The three reinfections observed all involved different sub-lineages in the pair (patient 004: Euro-American LAM → Euro-American S type; patient 009: Euro-American S-type → East Asian; patient 015: Euro-American T → East Asian).

In silico spoligotyping was also performed (Additional file 1: Table S1). Of the 32 relapse pairs, 24 had identical spoligotypes and the remaining eight had one to seven spacer differences; all three reinfections had different spoligotypes (9–29 spacer differences).

Antimicrobial resistance

Drug susceptibility testing showed that only one post-treatment isolate (004-2) had a drug-resistance phenotype, confirmed by genotyping (RIFR: rpoB S450L; INHR: katG S315T; EMBR: embB M306V), while its pre-treatment isolate partner (004-1) was susceptible to all drugs tested. Therefore, there was no evidence of any acquisition of antibiotic resistance during the trial in the samples that were tested with WGS.

SNPs in relapse isolates

While most SNPs that arise in a strain between treatment and relapse would be expected to be random, as long as they are not deleterious, it would be a reasonable hypothesis that some SNPs may actively help the bacteria survive. Comparing the relapse pairs, 18 out of 30 SNPs were synonymous and 12 out of 30 were non-synonymous (Table 4). Of the 12 non-synonymous SNPs and two INDELs, none were in a gene associated with antibiotic resistance, in accord with the fact that no phenotypic resistance was seen. However, two SNPs lay in genes that are implicated in pathogenesis, both associated with esx Type 7 secretion systems (T7SSs) [25] (discussed below).

Table 4 Variants identified in relapse pairs


Relapse versus reinfection

In this study, high-quality genome sequence was generated for 36 pairs of isolates. The majority of pairs (32 of 36) were shown to have very few SNPs (≤5) between pre- and post-treatment M. tuberculosis isolates, suggestive of relapse and thus treatment failure.

On initial inspection, the other four pairs (4 of 36) had significant SNP differences between samples (57, 737, and two >1000), indicative of reinfection. However, phylogenetic analyses cast doubt on two pairs, in which a single isolate of each pair was highly related to another patient’s isolate in the study. While it is possible that these reflect transmission events, it is difficult to rule out some form of laboratory processing error; indeed, a transmission event so similar to another pair of samples in the trial (in one case the pre-treatment and in the other the post-treatment samples) would be relatively uncommon though not impossible, but such a pattern would be expected if there were a sample processing error and patient samples were swapped. A similar event was suggested by Casali et al. [26]. Indeed, trials inserting negative samples into the TB diagnostic process showed that errors can occur [27], but strain-typing methods allowed actual contamination to be detected. A review by Burman et al. [28] indicated a median false-positive rate of 3.1% in published studies. WGS can thus help identify when processing errors have occurred, thereby improving overall trial data quality and acting as a quality control measure of trial procedures.

The case with 57 SNP differences between the isolate pair was probably a mixed infection, and while accurate SNP figures could not be obtained, the data were consistent with a relapse from one of the two pre-existing strains. These are described as a major/minor strain within the sequencing data, but that may not accurately reflect the relative levels in the patient; these levels could, for example, be affected by colony size on the LJ slopes, and the actual loop sample taken for DNA preparation. The isolate pair were initially identified as being different from each other by a higher number of SNP differences (57) than would be expected for a relapse, but at an unusually low level of SNPs for a reinfection compared to other reported examples. This is likely to be due to the mixed infection causing many genuine SNPs to be discarded as uncertain by the site-calling algorithm. Reports of similar cases of mixed infections in previous studies [14, 15, 29] support the likelihood that this interpretation may be genuine, thus suggesting that it is important to assess isolates for evidence of mixed infections before calling relapse/reinfection.

Therefore, from the 36 pairs of isolates sequenced, there was strong evidence that 32 were relapses, one was a mixed infection masking a likely relapse, and three were reinfections, although two of these may have been the result of laboratory processing errors. This proportion (32 of 35 (91%) relapse: 3 of 35 (9%) reinfection; excluding the possible mixed infection) can be compared with previously reported relapse to reinfection proportions of 92:8, also in a chemotherapy trial [14], and 73:27 in the rather different situation of a long-term study with longer post-treatment follow-up (over 12 years in some cases) [15]. This latter study indicated that relapses occurred towards the start of the follow-up, and particularly within the first 2 years, and therefore is consistent with the study reported here.

SNP differences in this and previous studies

The number of SNP differences in the relapse and reinfection groups was comparable to previous pre- and post-treatment studies (Table 5). Casali et al. [26] also found up to four SNP differences over 4 years in intra-patient studies. In each of the previous relapse studies, there was a large gap between the number of SNPs found in presumed relapses and in reinfections. This both lends support to the definition used to identify relapse versus reinfection, and also gives weight to the suggestion by Bryant et al. [14] that there is some immunity to reinfection by very similar strains. The same pattern was observed in this study, even though the phylogenetic tree showed that highly similar strains were circulating. Guerra-Assunção et al. [15] showed less SNP diversity in reinfections (100 rather than 1000 SNPs), and it would be interesting to determine if there is an effect of time, with similar strains only reinfecting after a longer passage of time. Casali et al. [26] demonstrated that there is strain diversity within a single sputum specimen, with up to 10 SNP differences seen when individual colonies were sequenced. The methodology described in this study deliberately took a sweep of colonies, which meant that much of this strain diversity within a single specimen would not be seen in WGS at the depth of coverage used.

Table 5 Number of single-nucleotide polymorphism differences between relapse and reinfection paired samples in different studies

SNPs seen in relapse isolates

For 16 of the 32 relapse pairs sequenced, SNPs were identified between the isolates (Table 4; excluding the mixed infection). While it is likely that many or most of these will not be advantageous to the bacteria, it is a plausible hypothesis that some of them might have a survival advantage.

Of the 12 non-synonymous SNPs observed in relapse isolate pairs, two were in gene systems that have proven involvement with pathogenesis: the two T7SSs esx1 and esx3. One lay in eccB3, which is a gene in the ESX3 T7SS, which is essential for growth. This system is involved in pathogenesis, partly through the control of iron acquisition, which appears to have a role in metal homeostasis [30]. The other was located in mce1B, which is a gene in the ESX1 T7SS, which is essential for virulence and exports the well-characterised ESAT-6/CFP10 complex [25]. Bryant et al. [14] reported that two genes with SNPs had functions associated with oxidative stress, and Guerra-Assunção et al. [15] reported an association with katG, well known for being involved in resistance to both oxidative stress and isoniazid. Clearly these may just be chance associations, but they also indicate potential avenues for studying bacterial survival during chemotherapy. The scale of investment in phase 2 and 3 trials is such that there is an obligation to extract as much information as possible from the study and the contribution of WGS is fundamental to understanding the bacteriology under treatment.

Mixed infections

A potential confounder in differentiating relapse from reinfection is that of mixed infections. If either the initial or subsequent infection is mixed, then sampling just one isolate could give a misleading designation. One likely mixed infection was identified with a 75:25 genotype ratio, although this ratio may not represent the ratio of the mixture in the bacterial population in vivo.

Of course, these methods would only reveal mixed infections with significant proportions of each strain, and it cannot formally exclude the possibility that other infections were also mixed, but at a very low levels. Bryant et al. [14], Guerra-Assunção et al. [15], Casali et al. [26] and Köser et al. [29] all identified mixed infections using WGS. Other studies have demonstrated them using alternative techniques, including MIRU-VNTR [3134], but WGS is more powerful, and Bryant et al. [14] found that WGS detected more mixed infections than MIRU-VNTR.

The definition of a mixed infection is made less clear by the finding that at least 10 SNP differences can be found within a single sputum sample [26], and the observation that very similar strains circulate in high-prevalence settings (e.g. Fig. 2). However, the data here and in the previous relapse studies [14] suggest that some sort of immunological protection might exist that makes successful co-infection with a similar strain less likely.

Comparing WGS to MIRU-VNTR and spoligotyping

Previously, owing to its speed and digital output, MIRU-VNTR has been preferred to the earlier IS6110 profiling as a means of typing M. tuberculosis isolates; indeed, it was only recently described as “the new reference standard for molecular epidemiological studies” [35].

In this study, there was a correlation between SNP and MIRU-VNTR differences for isolates predicted to be cases of relapse (0–5 SNP; 0–2 MIRU-VNTR loci) and reinfection (SNP > 1000; MIRU-VNTR loci ≥7). This is in contrast with the study of Bryant et al. [14] who reported that three reinfection pairs had 1–13 different loci, although that study was an interim analysis performed prior to final data resolution and unbinding, which may have impacted on the ultimate assignment of the patients. Furthermore, Casali et al. [26] found that two MIRU-VNTR differences could correspond to a significant number of SNP differences. A transmission study by Walker et al. [11] only examined isolates with successful 24-loci MIRU-VNTR data, showing that, up to a difference of 100 SNPs, isolates could have 1–3 MIRU-VNTR locus differences, while above 100 SNP differences, the number of MIRU-VNTR changes increased.

Achieving consistent results with MIRU-VNTR, which involves 24 multiplexed PCRs, is known to be technically challenging [14, 26, 36, 37]. Indeed, there was significant variation in the number of loci amplified in this study (Table 2, Fig. 4d), which we attribute to a combination of DNA quantity and quality, and the technical difficulties referred to above. Furthermore, other limitations and issues with MIRU-VNTR in relation to the study setting have been discussed in a systematic review [38].

WGS is technically more straightforward and was comparable in cost in our hands (~£100 per sample), but with reducing costs and whole-genome resolution, it is clearly a superior, more robust method then MIRU-VNTR for strain typing. In addition, WGS can provide additional information by identifying markers associated with drug resistance, which could be useful in the context of relapsing cases in a clinical trial. Sequence data is also more amenable to incorporation into other studies and will provide further information on TB evolution as global databases of genome information grow.

Spoligotyping has been widely used for robust division of M. tuberculosis into different sub-types [4], but we found that SNPs were not only far more sensitive for determining relapses and reinfections, but also more useful for assigning sub-lineages.

Value of WGS in chemotherapy trials

The data from this study in combination with the previous two relapse studies [14, 15] allow an evaluation of the relative benefit of using WGS or MIRU-VNTR as a means of determining relapses from reinfections in chemotherapy trials.

The RIFAQUIN trial was in an area of high endemicity, suggesting that reinfections are not likely to be higher elsewhere due to disease prevalence. Thus, the data presented in this study and previously [14, 15] indicate that the proportion of reinfections is very low compared to relapse, although Guerra-Assunção et al. [15] suggest that reinfections may rise at later time points after completion of therapy. Furthermore, cases in which isolates are identified as reinfections are more likely to be wrong, because the possible errors observed here (processing errors, unrecognized mixed infections) are more likely to suggest a reinfection.


In the pre-genomic era, all post-treatment infections were presumed to be relapses, and it could be argued that, due to the low reinfection rates and the increased cost and time required to perform the sequencing, WGS provides only modest gains for the analysis of the primary outcome in a chemotherapy clinical trial of this nature.

Nevertheless, in addition to robust genomic evidence for treatment outcome, the added information that WGS provides is scientifically valuable and will become of greater value as more genome sequence data and more information about the genotype–phenotype correlation and its impact on disease and transmission becomes available. Furthermore, future trials for new TB drugs in the development pipeline or novel combination regimens may be held in areas of high TB prevalence where re-infection or mixed infections are more likely, thus making accurate strain discrimination imperative; in these instances, WGS should be the method of choice.


  1. Fox W, Ellard GA, Mitchison DA. Studies on the treatment of tuberculosis undertaken by the British Medical Research Council tuberculosis units, 1946-1986, with relevant subsequent publications. Int J Tuberc Lung Dis. 1999;3(10 Suppl 2):S231–279.

    CAS  PubMed  Google Scholar 

  2. Hermans PW, van Soolingen D, Dale JW, Schuitema AR, McAdam RA, Catty D, et al. Insertion element IS986 from Mycobacterium tuberculosis: a useful tool for diagnosis and epidemiology of tuberculosis. J Clin Microbiol. 1990;28(9):2051–8.

    CAS  PubMed  PubMed Central  Google Scholar 

  3. Supply P, Allix C, Lesjean S, Cardoso-Oelemann M, Rüsch-Gerdes S, Willery E, et al. Proposal for standardization of optimized mycobacterial interspersed repetitive unit-variable-number tandem repeat typing of Mycobacterium tuberculosis. J Clin Microbiol. 2006;44(12):4498–510.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  4. Kamerbeek J, Schouls L, Kolk A, van Agterveld M, van Soolingen D, Kuijper S, et al. Simultaneous detection and strain differentiation of Mycobacterium tuberculosis for diagnosis and epidemiology. J Clin Microbiol. 1997;35(4):907–14.

    CAS  PubMed  PubMed Central  Google Scholar 

  5. Das S, Chan SL, Allen BW, Mitchison DA, Lowrie DB. Application of DNA fingerprinting with IS986 to sequential mycobacterial isolates obtained from pulmonary tuberculosis patients in Hong Kong before, during and after short-course chemotherapy. Tuber Lung Dis. 1993;74(1):47–51.

    Article  CAS  PubMed  Google Scholar 

  6. Tam CM, Chan SL, Kam KM, Sim E, Staples D, Sole KM, et al. Rifapentine and isoniazid in the continuation phase of a 6-month regimen. Interim report: no activity of isoniazid in the continuation phase. Int J Tuberc Lung Dis. 2000;4(3):262–7.

    CAS  PubMed  Google Scholar 

  7. Benator D, Bhattacharya M, Bozeman L, Burman W, Cantazaro A, Chaisson R, et al. Rifapentine and isoniazid once a week versus rifampicin and isoniazid twice a week for treatment of drug-susceptible pulmonary tuberculosis in HIV-negative patients: a randomised clinical trial. Lancet. 2002;360(9332):528–34.

    Article  PubMed  Google Scholar 

  8. Lienhardt C, Cook SV, Burgos M, Yorke-Edwards V, Rigouts L, Anyo G, et al. Efficacy and safety of a 4-drug fixed-dose combination regimen compared with separate drugs for treatment of pulmonary tuberculosis: the Study C randomized controlled trial. JAMA. 2011;305(14):1415–23.

    Article  CAS  PubMed  Google Scholar 

  9. Jindani A, Nunn AJ, Enarson DA. Two 8-month regimens of chemotherapy for treatment of newly diagnosed pulmonary tuberculosis: international multicentre randomised trial. Lancet. 2004;364(9441):1244–51.

    Article  CAS  PubMed  Google Scholar 

  10. Walker TM, Monk P, Grace Smith E, Peto TEA. Contact investigations for outbreaks of Mycobacterium tuberculosis: advances through whole genome sequencing. Clin Microbiol Infect. 2013;19(9):796–802.

    Article  CAS  PubMed  Google Scholar 

  11. Walker TM, Lalor MK, Broda A, Saldana Ortega L, Morgan M, Parker L, et al. Assessment of Mycobacterium tuberculosis transmission in Oxfordshire, UK, 2007-12, with whole pathogen genome sequences: an observational study. Lancet Respir Med. 2014;2(4):285–92.

    Article  PubMed  PubMed Central  Google Scholar 

  12. Walker TM, Ip CL, Harrell RH, Evans JT, Kapatai G, Dedicoat MJ, et al. Whole-genome sequencing to delineate Mycobacterium tuberculosis outbreaks: a retrospective observational study. Lancet Infect Dis. 2013;13(2):137–46.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  13. Satta G, Witney AA, Shorten RJ, Karlikowska M, Lipman M, McHugh TD. Genetic variation in Mycobacterium tuberculosis isolates from a London outbreak associated with isoniazid resistance. BMC Med. 2016;14(1):117.

    Article  PubMed  PubMed Central  Google Scholar 

  14. Bryant JM, Harris SR, Parkhill J, Dawson R, Diacon AH, van Helden P, et al. Whole-genome sequencing to establish relapse or re-infection with Mycobacterium tuberculosis: a retrospective observational study. Lancet Respir Med. 2013;1(10):786–92.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  15. Guerra-Assunção J, Crampin A, Houben R, Mzembe T, Mallard K, Coll F, et al. Large-scale whole genome sequencing of M. tuberculosis provides insights into transmission in a high prevalence area. eLife. 2015. doi: 10.7554/eLife.05166

  16. Gillespie SH, Crook AM, McHugh TD, Mendel CM, Meredith SK, Murray SR, et al. Four-month moxifloxacin-based regimens for drug-sensitive tuberculosis. N Engl J Med. 2014;371(17):1577–87.

    Article  PubMed  PubMed Central  Google Scholar 

  17. Jindani A, Harrison TS, Nunn AJ, Phillips PPJ, Churchyard GJ, Charalambous S, et al. High-dose rifapentine with moxifloxacin for pulmonary tuberculosis. N Engl J Med. 2014;371(17):1599–608.

    Article  PubMed  PubMed Central  Google Scholar 

  18. Kent L, McHugh TD, Billington O, Dale JW, Gillespie SH. Demonstration of homology between IS6110 of Mycobacterium tuberculosis and DNAs of other Mycobacterium spp.? J Clin Microbiol. 1995;33(9):2290–3.

    CAS  PubMed  PubMed Central  Google Scholar 

  19. Kozarewa I, Turner DJ. 96-plex molecular barcoding for the Illumina Genome Analyzer. Methods Mol Biol Clifton NJ. 2011;733:279–98.

    Article  CAS  Google Scholar 

  20. Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. ArXiv13033997 Q-Bio. 2013. Cited 7 July 2014.

  21. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25(16):2078–9.

    Article  PubMed  PubMed Central  Google Scholar 

  22. Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30(9):1312–3.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  23. Coll F, Mallard K, Preston MD, Bentley S, Parkhill J, McNerney R, et al. SpolPred: rapid and accurate prediction of Mycobacterium tuberculosis spoligotypes from short genomic sequences. Bioinformatics. 2012;28(22):2991–3.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  24. Coll F, McNerney R, Guerra-Assunção JA, Glynn JR, Perdigão J, Viveiros M, et al. A robust SNP barcode for typing Mycobacterium tuberculosis complex strains. Nat Commun. 2014;5:4812.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  25. Majlessi L, Prados-Rosales R, Casadevall A, Brosch R. Release of mycobacterial antigens. Immunol Rev. 2015;264(1):25–45.

    Article  CAS  PubMed  Google Scholar 

  26. Casali N, Broda A, Harris SR, Parkhill J, Brown T, Drobniewski F. Whole genome sequence analysis of a large isoniazid-resistant tuberculosis outbreak in London: a retrospective observational study. PLoS Med. 2016;13(10):e1002137.

    Article  PubMed  PubMed Central  Google Scholar 

  27. Aber VR, Allen BW, Mitchison DA, Ayuma P, Edwards EA, Keyes AB. Quality control in tuberculosis bacteriology. 1. Laboratory studies on isolated positive cultures and the efficiency of direct smear examination. Tubercle. 1980;61(3):123–33.

    Article  CAS  PubMed  Google Scholar 

  28. Burman WJ, Reves RR. Review of false-positive cultures for Mycobacterium tuberculosis and recommendations for avoiding unnecessary treatment. Clin Infect Dis. 2000;31(6):1390–5.

    Article  CAS  PubMed  Google Scholar 

  29. Köser C, Bryant JM, Becq J, Torok ME, Ellington MJ, Marti-Renom MA, et al. Whole-genome sequencing for rapid susceptibility testing of M. tuberculosis. N Engl J Med. 2013;369(3):290–2.

    Article  PubMed  Google Scholar 

  30. Tufariello JM, Chapman JR, Kerantzas CA, Wong K-W, Vilchèze C, Jones CM, et al. Separable roles for Mycobacterium tuberculosis ESX-3 effectors in iron acquisition and virulence. Proc Natl Acad Sci. 2016;113(3):E348–57.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  31. Hanekom M, Streicher EM, de Berg DV, Cox H, McDermid C, Bosman M, et al. Population structure of mixed Mycobacterium tuberculosis infection is strain genotype and culture medium dependent. PLoS One. 2013;8(7):e70178.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  32. Fang R, Li X, Li J, Wu J, Shen X, Gui X, et al. Mixed infections of Mycobacterium tuberculosis in tuberculosis patients in Shanghai, China. Ann Tuberc. 2008;88(5):469–73.

    Article  Google Scholar 

  33. Mallard K, McNerney R, Crampin AC, Houben R, Ndlovu R, Munthali L, et al. Molecular detection of mixed infections of Mycobacterium tuberculosis strains in sputum samples from patients in Karonga District, Malawi. J Clin Microbiol. 2010;48(12):4512–8.

    Article  PubMed  PubMed Central  Google Scholar 

  34. Cohen T, Wilson D, Wallengren K, Samuel EY, Murray M. Mixed-strain Mycobacterium tuberculosis infections among patients dying in a hospital in KwaZulu-Natal, South Africa. J Clin Microbiol. 2011;49(1):385–8.

    Article  PubMed  Google Scholar 

  35. Brossier F, Sola C, Millot G, Jarlier V, Veziris N, Sougakoff W. Comparison of a semiautomated commercial repetitive-sequence-based PCR method with spoligotyping, 24-locus mycobacterial interspersed repetitive-unit-variable-number tandem-repeat typing, and restriction fragment length polymorphism-based analysis of IS6110 for Mycobacterium tuberculosis typing. J Clin Microbiol. 2014;52(11):4082–6.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  36. Cowan LS, Mosher L, Diem L, Massey JP, Crawford JT. Variable-number tandem repeat typing of Mycobacterium tuberculosis isolates with low copy numbers of IS6110 by using mycobacterial interspersed repetitive units. J Clin Microbiol. 2002;40(5):1592–602.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  37. Chatterjee A, Mistry N. MIRU–VNTR profiles of three major Mycobacterium tuberculosis spoligotypes found in western India. Tuberculosis. 2013;93(2):250–6.

    Article  CAS  PubMed  Google Scholar 

  38. Mears J, Abubakar I, Cohen T, McHugh TD, Sonnenberg P. Effect of study design and setting on tuberculosis clustering estimates using mycobacterial interspersed repetitive units-variable number tandem repeats (MIRU-VNTR): a systematic review. BMJ Open. 2015;5(1):e005636.

    Article  PubMed  PubMed Central  Google Scholar 

Download references


MIRU-VNTR analyses were carried out by Selina Bannoo, Emma Cunningham, Alice Morgan, Solomon Mwaigwishya and Laura Wright. Sequencing was carried out at UCL Genomics by Tony Brooks and Nipurna Jina.


The RIFAQUIN trial was funded by the European and Developing Countries Clinical Trials Partnership and the Wellcome Trust; RIFAQUIN Current Controlled Trials number, ISRCTN44153044.

Availability of data and materials

Sequence data has been submitted to the ENA database with accession number PRJEB18529. The full analysis pipeline can be downloaded and run from

Authors’ contributions

AW, AB, PP and NS performed the data analysis; DC cultured the isolates; AJ, PB and TM designed the study. All authors contributed to the writing of the manuscript. All authors read and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.

Consent for publication

Not applicable.

Ethics approval and consent to participate

The study protocol was reviewed and approved by the ethics review committee at St. George’s by medical ethics and regulatory committees representing each of the participating countries, and by the institutional review board of the Centers for Disease Control and Prevention operating in Botswana.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author information

Authors and Affiliations



Corresponding author

Correspondence to Adam A. Witney.

Additional file

Additional file 1: Table S1.

Spoligotype and sub-lineage information for isolates. (DOCX 23 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Witney, A.A., Bateson, A.L.E., Jindani, A. et al. Use of whole-genome sequencing to distinguish relapse from reinfection in a completed tuberculosis clinical trial. BMC Med 15, 71 (2017).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: