Use of whole-genome sequencing to distinguish relapse from reinfection in a completed tuberculosis clinical trial
BMC Medicine volume 15, Article number: 71 (2017)
RIFAQUIN was a tuberculosis chemotherapy trial in southern Africa including regimens with high-dose rifapentine with moxifloxacin. Here, the application of whole-genome sequencing (WGS) is evaluated within RIFAQUIN for identifying new infections in treated patients as either relapses or reinfections. WGS is further compared with mycobacterial interspersed repetitive units-variable number tandem repeats (MIRU-VNTR) typing. This is the first report of WGS being used to evaluate new infections in a completed clinical trial for which all treatment and epidemiological data are available for analysis.
DNA from 36 paired samples of Mycobacterium tuberculosis cultured from patients before and after treatment was typed using 24-loci MIRU-VNTR, in silico spoligotyping and WGS. Following WGS, the sequences were mapped against the reference strain H37Rv, the single-nucleotide polymorphism (SNP) differences between pairs were identified, and a phylogenetic reconstruction was performed.
WGS indicated that 32 of the paired samples had a very low number of SNP differences (0–5; likely relapses). One pair had an intermediate number of SNP differences, and was likely the result of a mixed infection with a pre-treatment minor genotype that was highly related to the post-treatment genotype; this was reclassified as a relapse, in contrast to the MIRU-VNTR result. The remaining three pairs had very high SNP differences (>750; likely reinfections).
WGS and MIRU-VNTR both similarly differentiated relapses and reinfections, but WGS provided significant extra information. The low proportion of reinfections seen suggests that in standard chemotherapy trials with up to 24 months of follow-up, typing the strains brings little benefit to an analysis of the trial outcome in terms of differentiating relapse and reinfection. However, there is a benefit to using WGS as compared to MIRU-VNTR in terms of the additional genotype information obtained, in particular for defining the presence of mixed infections and the potential to identify known and novel drug-resistance markers.
Evaluations of drug trials for tuberculosis (TB) are complicated by the fact that a recurrence of disease can be due either to endogenous relapse or to subsequent exogenous infection with a new strain (reinfection). Historically, during the major TB chemotherapy trials of the 1960s to 1980s (reviewed by Fox et al.), it was not possible to differentiate isolates, and all new infections that occurred after the trial conclusion were labelled as relapses.
From the 1980s, a series of genomic-based methods for typing strains of Mycobacterium tuberculosis were developed, in particular IS6110 restriction fragment length polymorphism (RFLP), spoligotyping and mycobacterial interspersed repetitive units-variable number tandem repeats (MIRU-VNTR) typing [2–4]. Some trials therefore began to use molecular methods to differentiate relapses from reinfections. This was initially through IS6110 RFLP typing [5–7] and then through MIRU-VNTR typing, while other trials continued without any differentiation.
MIRU-VNTR became the favoured typing approach because it combined reasonable discrimination with a readout that could be both easily measured and described in a digital form. More recently, whole-genome sequencing (WGS) has enabled the identification of single-nucleotide polymorphism (SNP) differences, thus leading to far greater discrimination in TB epidemiological studies [10–13].
Two groups have recently used WGS to evaluate paired samples, comparing SNP differences between the original infections and new infections following treatment [14, 15]. The study by Bryant et al. was based on an ongoing clinical trial that was being carried out in sub-Saharan Africa, south and east Asia, and central America. Of the 36 paired samples, 33 were found to be highly similar (≤6 SNPs; classed as relapses) and three were highly divergent (≥1306 SNPs; classed as reinfections).
The report by Guerra-Assunção et al. was not based on a clinical trial, but was taken from the Karonga Prevention Study, a long-term population-based programme in Malawi. In this programme, 60 paired samples collected over a 15-year time period were sequenced, and while the authors also found a clear division in SNP numbers between relapses and reinfections, it was not as marked as in the Bryant study. Thus, they classed 46 samples with 0–8 SNP differences as relapses, and 14 with >100 SNP differences as reinfections.
In this study, we performed WGS and analysed SNPs to compare pre- and post-treatment isolates from the completed RIFAQUIN clinical trial , a study evaluating high-dose rifapentine with moxifloxacin, carried out in sub-Saharan Africa. Successful sequencing was carried out on 36 pairs of samples of M. tuberculosis recovered before treatment and from those patients showing positive cultures at 6 months, and results were compared with MIRU-VNTR data. Our results agree with the general findings from the two studies referred to above, in that the overwhelming majority of secondary cases were classified as relapses. Importantly, WGS was further able to monitor possible epidemiological connections and sample errors during the trial, which were not detected using MIRU-VNTR. Given the added benefit of WGS in this context, we suggest that WGS should be routinely used as the method of choice in such trials.
The RIFAQUIN chemotherapy trial, in collaboration with six institutions in southern Africa, has been previously described. Between August 2008 and August 2011, patients with newly diagnosed smear-positive drug-sensitive TB were randomly assigned to one of the following:
Control regimen: 2 months of daily ethambutol, isoniazid, rifampicin and pyrazinamide followed by 4 months of daily isoniazid and rifampicin;
4-month regimen: Isoniazid replaced by moxifloxacin daily for 2 months followed by 2 months of twice-weekly moxifloxacin and 900 mg rifapentine; or
6-month regimen: Isoniazid replaced by moxifloxacin daily for 2 months followed by 4 months of once-weekly moxifloxacin and 1200 mg rifapentine.
Sputum was examined by microscopy and culture at regular intervals for treatment failure or relapse. Patients had up to 18 months of follow-up post randomisation, with the patients recruited last having 12 months of follow-up post-randomisation. Samples from patients with two or more consecutive M. tuberculosis-positive cultures after 6 months (or at the end of treatment) were selected for WGS.
MIRU-VNTR determination and assignment
The 24-loci MIRU-VNTR typing of these isolates was previously described. Briefly, a 10 μL loop was used to pick up a sample of M. tuberculosis colonies by sweeping across growth on a Löwenstein–Jensen (LJ) slope. Bacteria were heat-killed and DNA extraction performed using lysozyme and proteinase K digestion followed by phenol-chloroform extraction and ethanol precipitation. The 24 MIRU-VNTR loci were amplified in eight labelled multiplex PCR reactions, and the amplicons sized, with MapMarker 1000 standard (BioVentures, Murfreesboro, TN, USA), by capillary electrophoresis on the sequencer (3130 Genetic Analyzer, Applied Biosystems, Waltham, MA, USA). Analysis was carried out using the GeneMapper software (Applied Biosystems, Waltham, MA, USA), which assigns alleles based on the customised bin-sets (fragment sizes and dyes) used to define each allele. For some samples there was variable coverage across the MIRU-VNTR loci using the sequencer, so, where possible, any missing loci were confirmed by single-plex PCR with products sized by standard agarose gel electrophoresis. Where possible, paired samples (pre- and post-treatment) from a given patient were run in parallel.
For the WGS, 50 μL, containing at least 250 ng, of genomic DNA from each sample was sheared using the Covaris E220 for a target size of 200 bp (Peak Incident Power: 175; duty factor: 10%; cycle/burst: 200; temperature: <8 °C; time: 120 s). Libraries were prepared from sheared DNA using the NEB DNA Ultra kit in accordance with standard protocol (New England Biolabs, Hitchin, UK). The NEB adapters were substituted for the set described by Kozarewa and Turner . Libraries were quantified using the Qubit High Sensitivity DNA assay and pooled equimolarly (Invitrogen, UK). The pools were subjected to paired-end sequencing carried out on a single lane of the Illumina HiSeq 2500 (v3 chemistry, read length 100 bp). Samples which produced a low yield were re-pooled and sequenced on a single MiSeq run (v2 chemistry, read length 250 bp).
Sequence reads were mapped to the H37Rv reference genome (RefSeq accession: NC_000962) using bwa mem v0.7.3a-r367; alignments were sorted and duplicates were removed with samtools v0.1.19. Site statistics were generated using samtools mpileup, and variant sites were filtered based on the following criteria: mapping quality above 30, site quality score above 30, at least four reads covering each site with at least two reads mapping to each strand, at least 75% of reads supporting the site (DP4) and an allelic frequency of 1. Sites that failed these criteria in any isolate were removed from the analysis. Phylogenetic reconstruction was performed using RAxML v8.2.3 with a General Time Reversible (GTR) model of nucleotide substitution and a Gamma model of rate heterogeneity; branch support values were determined using 1000 bootstrap replicates. Relapse or reinfection calls were made by applying the above filtering criteria to the individual patients’ paired samples. INDELs were identified using samtools mpileup as above, but setting the minimum fraction of gapped reads for candidates to 0.05.
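The site filters described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the `SiteCall` record, its field names and the reading of the strand criterion (at least two variant-supporting reads on each strand) are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class SiteCall:
    """Hypothetical per-site summary for one candidate variant site."""
    mapping_quality: float  # mapping quality of the covering reads
    site_quality: float     # site quality score
    ref_fwd: int            # DP4: reference-supporting reads, forward strand
    ref_rev: int            # DP4: reference-supporting reads, reverse strand
    alt_fwd: int            # DP4: variant-supporting reads, forward strand
    alt_rev: int            # DP4: variant-supporting reads, reverse strand


def passes_filters(site: SiteCall) -> bool:
    """Return True if a variant site meets all of the stated criteria."""
    depth = site.ref_fwd + site.ref_rev + site.alt_fwd + site.alt_rev
    if depth < 4:
        return False  # fewer than four reads cover the site
    if site.mapping_quality <= 30 or site.site_quality <= 30:
        return False  # both quality values must be above 30
    if site.alt_fwd < 2 or site.alt_rev < 2:
        return False  # assumed reading: two supporting reads per strand
    # At least 75% of the reads must support the variant call.
    return (site.alt_fwd + site.alt_rev) / depth >= 0.75
```

Under these assumptions, a site with 22 variant-supporting reads split across both strands passes, whereas a site where only 10 of 16 reads support the variant is rejected, which is how sites in mixed samples come to be dropped.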
In silico spoligotyping and sub-lineage typing
For each isolate sequence a count of the percentage of reads supporting a variant base at each genome position was plotted. Mixed isolates can be identified by the presence of an extra peak, suggesting the presence of two genotype populations in the sequenced sample. Base calls for the majority and minority strains were separated based on the per cent reads and pseudo-sequences were generated and subsequently included in the phylogenetic reconstruction as above.
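As a rough illustration of how base calls might be split by the percentage of supporting reads, the Python sketch below assigns each variant site to a majority pseudo-sequence, a minority pseudo-sequence, or both. The function name, the input layout and all cut-off values are hypothetical; the exact cut-offs used in the study are not stated here.

```python
def split_genotypes(sites, minor_lo=0.10, minor_hi=0.40, fixed=0.90):
    """Split variant sites between majority and minority pseudo-genotypes.

    sites: dict mapping genome position -> (ref_base, alt_base, alt_fraction),
    where alt_fraction is the fraction of reads supporting the variant base.
    All cut-offs are illustrative assumptions, not values from the study.
    Returns two dicts (majority, minority) mapping position -> base call.
    """
    majority, minority = {}, {}
    for pos, (ref, alt, frac) in sites.items():
        if frac >= fixed:
            # Variant carried by essentially all reads: shared by both strains.
            majority[pos] = alt
            minority[pos] = alt
        elif minor_lo <= frac <= minor_hi:
            # Variant seen only in the smaller read population.
            majority[pos] = ref
            minority[pos] = alt
        elif (1 - minor_hi) <= frac <= (1 - minor_lo):
            # Variant seen only in the larger read population.
            majority[pos] = alt
            minority[pos] = ref
        # Sites outside these windows are ambiguous and left uncalled.
    return majority, minority
```

Fixed windows of this kind cannot resolve sites whose read fractions fall where the two distributions overlap, so some sites remain uncalled in both pseudo-sequences.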
Figure 1 shows a flowchart of the samples studied. A total of 827 patients with newly diagnosed, microscopy-positive pulmonary TB were enrolled in the trial in South Africa, Zimbabwe, Botswana and Zambia. Fifty-one patients had positive cultures in post-treatment follow-up and therefore required genotyping to distinguish relapse from reinfection (as per the RIFAQUIN protocol). DNA was available to generate MIRU-VNTR data for 44 pairs of samples (pre- and post-treatment). The remaining DNA was passed for WGS, and good-quality sequences (>20× coverage) were generated for both pre- and post-treatment samples of 36 patients.
SNP differences were determined between the pairs of isolates, and a comparison with MIRU-VNTR differences is shown in Table 1. Two main groups can be identified: 32 pairs of isolates had five or fewer SNP differences, and three pairs of samples had a much higher number of SNP differences (range 737–1329). An additional single pair of isolates differed by 57 SNPs, but this was probably because the pre-treatment isolate contained a mixed infection, as discussed below.
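The grouping above amounts to a simple decision rule on pairwise SNP distance, sketched below in Python. The relapse cut-off of five SNPs reflects the groups observed here; the reinfection cut-off of 100 SNPs is an assumption borrowed from the threshold reported by Guerra-Assunção et al., and the function names and labels are hypothetical.

```python
def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count differing positions between two aligned sequences.

    Positions where either sequence has an uncalled base ("N") are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1 for a, b in zip(seq_a, seq_b)
        if a != b and "N" not in (a, b)
    )


def classify_pair(n_snps: int, relapse_max: int = 5,
                  reinfection_min: int = 100) -> str:
    """Label a pre-/post-treatment pair from its SNP distance.

    relapse_max and reinfection_min are illustrative thresholds.
    """
    if n_snps <= relapse_max:
        return "likely relapse"
    if n_snps > reinfection_min:
        return "likely reinfection"
    # Intermediate distances are not called automatically.
    return "intermediate: check for mixed infection"
```

With these thresholds, a pair differing by 57 SNPs is flagged for inspection rather than called, while pairs differing by 737 or more SNPs are labelled as likely reinfections.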
Phylogenetic reconstruction of SNPs
Phylogenetic reconstruction of variant SNPs (Fig. 2a) showed that the majority (32 out of 36) of the isolate pairs had low numbers of SNP differences and were therefore clearly determined as cases of relapse. One isolate pair was identified as a mixed infection (see below). The remaining three isolate pairs that had high numbers of SNP differences appear quite divergent on the tree (marked in green) and were determined as likely reinfections.
There were also isolates that mapped closely to other patient isolates on the tree, and these merited closer attention to see if there were genuine connections or unexpected problems caused by possible laboratory handling errors.
Panels b and c in Fig. 2 show one class of pattern that was observed with clustered isolate pairs, in which there were no SNP differences between each member of a pair, but each pair was very closely related to another pair. In both panels, the two pairs of samples came from different centres (panel b: 005 and 014, Harare and Marondera, both in Zimbabwe; panel c: 008 Harare, Zimbabwe, and 001 Francistown, Botswana, on the borders of Zimbabwe; Table 2), suggesting that a laboratory processing error was unlikely. An alternative explanation is that highly similar local strains were circulating in the two relatively close regions and had evolved independently over time.
Panels d and e in Fig. 2 show a different type of pattern, in which a pair of isolates from one patient clustered together, as expected for relapses, but was also identical to a single isolate from another pair, suggesting a possible transmission event. In Fig. 2d, a post-treatment sequence for isolate 009 was identical to isolate pair 012; the two 009 isolates differed by 1233 SNPs. In Fig. 2e, a pre-treatment isolate 004-1 was identical in sequence to both isolates of patient 003; the two 004 isolates differed by 737 SNPs. All four patients received treatment in the same city, Harare (Table 2). While it is not impossible that these genotypes were genuinely isolated from the two patients, 009 and 004, another possible explanation is some form of laboratory processing error. Indeed, in one case the patients visited the hospital on the same day, and in the other results were reported at the same time. This combined with their geographical co-location would further support the possible processing error interpretation. It is also worth noting that if these are indeed errors, they would normally be invisible to the analysis without the resolution of WGS.
One patient’s pair of samples (035) displayed 57 SNPs between the pre- (035-1) and post-treatment (035-2) isolates and was therefore initially classified as a reinfection. However, further analysis of the WGS data showed evidence of a mixed infection in the pre-treatment isolate (035-1; Fig. 3a) corresponding to an approximately 75% to 25% combination of two genotypes. Using this majority/minority ratio of read coverage, it was possible to separate the two genotypes and further phylogenetic reconstruction suggested that it was likely that the minority genotype (035-1-min) was closely related to the post-treatment isolate (035-2; Figs. 2a and 3b). This suggests that this was in fact a relapse of a previously unidentified minority genotype, rather than a reinfection as previously assigned.
Initially there appeared to be 57 SNP differences between the pre- and post-treatment isolates (035-1, 035-2), which would have been an unusual result given that the previous studies had only identified reinfections with very high SNP differences, and nothing at an intermediate level. The observation of mixed genotypes would explain this discrepancy because one of the main filtering criteria in the site-calling algorithm is to remove sites with mixed genotype calls (<75% read support for the call), so the real number of SNP differences between the isolates is likely to be higher. After separating the genotypes, it was estimated that the number of SNP differences between the pre-treatment majority genotype and the post-treatment isolate was 869 SNPs. The pre-treatment minority genotype and the post-treatment isolate appeared to differ by 245 SNPs; however, the genotype separation algorithm used was relatively crude, with filtering based on parameter cut-offs, so it was not possible to completely separate the genotypes at all mixed genome sites, reflecting the overlapping shape of the two distributions (Fig. 3a). Nevertheless, the proximity of their placement on the tree (Fig. 2a) suggests they are highly related and thus this patient’s disease was likely a relapse.
Comparing WGS with MIRU-VNTR data
Figure 4a shows there is a stark difference in the number of SNP differences between cases of relapse and reinfection, an observation also made by Bryant et al. Table 1 and Fig. 4b show the distribution of MIRU-VNTR differences. The majority of pairs had no MIRU-VNTR differences (out of up to 21 loci determined), but some had a maximum of eight loci different. We experienced technical difficulties which meant that the number of loci amplified varied (Table 2; see Discussion).
The relationship between SNP and MIRU-VNTR differences is shown in Table 2 and Fig. 4c. There was a clear MIRU-VNTR difference between those labelled as relapses using WGS (zero to two MIRU-VNTR differences) and those labelled as reinfections (seven to eight MIRU-VNTR differences). However, within the relapse group, there was no obvious relationship between these two measures: all samples with two to five SNPs had no MIRU-VNTR differences, whereas there were four with no SNP differences and one MIRU-VNTR difference. Overall, WGS largely agreed with MIRU-VNTR (Table 3), with only the likely mixed infection causing a possible discrepancy. That was based on a decision in the trial to classify pairs with two or more MIRU-VNTR differences as reinfections.
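The trial's MIRU-VNTR rule (two or more differing loci indicating reinfection) can be sketched in Python as follows. Comparing only the loci typed in both isolates is an assumption made here to handle the incomplete profiles mentioned above, and the locus names used in the test data are placeholders.

```python
from typing import Dict, Optional

Profile = Dict[str, Optional[int]]  # locus name -> repeat count, None if untyped


def miru_differences(pre: Profile, post: Profile) -> int:
    """Count loci with different repeat numbers, using shared typed loci only."""
    shared = [
        locus for locus in pre
        if locus in post and pre[locus] is not None and post[locus] is not None
    ]
    return sum(1 for locus in shared if pre[locus] != post[locus])


def miru_call(pre: Profile, post: Profile, reinfection_min: int = 2) -> str:
    """Apply the rule: reinfection_min or more differing loci means reinfection."""
    if miru_differences(pre, post) >= reinfection_min:
        return "reinfection"
    return "relapse"
```

A design consequence of counting only shared loci is that a pair with many untyped loci has fewer chances to show a difference, so its call is less secure than one based on a full 24-locus profile.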
In silico spoligotyping and sub-lineages
Human M. tuberculosis strains have been divided into six global lineages, and further into sub-lineages, some of which may have distinct infection phenotypes . In addition to the whole-genome SNP-based methodology used above, analysis using a set of 62 lineage-defining SNPs  was also used to assign sub-lineages (Additional file 1: Table S1). The three reinfections observed all involved different sub-lineages in the pair (patient 004: Euro-American LAM → Euro-American S type; patient 009: Euro-American S-type → East Asian; patient 015: Euro-American T → East Asian).
In silico spoligotyping was also performed (Additional file 1: Table S1). Of the 32 relapse pairs, 24 had identical spoligotypes and the remaining eight had one to seven spacer differences; all three reinfections had different spoligotypes (9–29 spacer differences).
Drug susceptibility testing showed that only one post-treatment isolate (004-2) had a drug-resistance phenotype, confirmed by genotyping (RIFR: rpoB S450L; INHR: katG S315T; EMBR: embB M306V), while its pre-treatment isolate partner (004-1) was susceptible to all drugs tested. Therefore, there was no evidence of any acquisition of antibiotic resistance during the trial in the samples that were tested with WGS.
SNPs in relapse isolates
While most SNPs that arise in a strain between treatment and relapse would be expected to be random, as long as they are not deleterious, it would be a reasonable hypothesis that some SNPs may actively help the bacteria survive. Comparing the relapse pairs, 18 out of 30 SNPs were synonymous and 12 out of 30 were non-synonymous (Table 4). Of the 12 non-synonymous SNPs and two INDELs, none were in a gene associated with antibiotic resistance, in accord with the fact that no phenotypic resistance was seen. However, two SNPs lay in genes that are implicated in pathogenesis, both associated with esx Type 7 secretion systems (T7SSs)  (discussed below).
Relapse versus reinfection
In this study, high-quality genome sequence was generated for 36 pairs of isolates. The majority of pairs (32 of 36) were shown to have very few SNPs (≤5) between pre- and post-treatment M. tuberculosis isolates, suggestive of relapse and thus treatment failure.
On initial inspection, the other four pairs (4 of 36) had significant SNP differences between samples (57, 737, and two >1000), indicative of reinfection. However, phylogenetic analyses cast doubt on two pairs, in which a single isolate of each pair was highly related to another patient’s isolate in the study. While it is possible that these reflect transmission events, it is difficult to rule out some form of laboratory processing error; indeed, a transmission event so similar to another pair of samples in the trial (in one case the pre-treatment and in the other the post-treatment samples) would be relatively uncommon though not impossible, but such a pattern would be expected if there were a sample processing error and patient samples were swapped. A similar event was suggested by Casali et al. . Indeed, trials inserting negative samples into the TB diagnostic process showed that errors can occur , but strain-typing methods allowed actual contamination to be detected. A review by Burman et al.  indicated a median false-positive rate of 3.1% in published studies. WGS can thus help identify when processing errors have occurred, thereby improving overall trial data quality and acting as a quality control measure of trial procedures.
The case with 57 SNP differences between the isolate pair was probably a mixed infection, and while accurate SNP figures could not be obtained, the data were consistent with a relapse from one of the two pre-existing strains. These are described as a major/minor strain within the sequencing data, but that may not accurately reflect the relative levels in the patient; these levels could, for example, be affected by colony size on the LJ slopes, and the actual loop sample taken for DNA preparation. The isolate pair were initially identified as being different from each other by a higher number of SNP differences (57) than would be expected for a relapse, but at an unusually low level of SNPs for a reinfection compared to other reported examples. This is likely to be due to the mixed infection causing many genuine SNPs to be discarded as uncertain by the site-calling algorithm. Reports of similar cases of mixed infections in previous studies [14, 15, 29] support the likelihood that this interpretation may be genuine, thus suggesting that it is important to assess isolates for evidence of mixed infections before calling relapse/reinfection.
Therefore, from the 36 pairs of isolates sequenced, there was strong evidence that 32 were relapses, one was a mixed infection masking a likely relapse, and three were reinfections, although two of these may have been the result of laboratory processing errors. This proportion (32 of 35 (91%) relapse: 3 of 35 (9%) reinfection; excluding the possible mixed infection) can be compared with previously reported relapse to reinfection proportions of 92:8, also in a chemotherapy trial , and 73:27 in the rather different situation of a long-term study with longer post-treatment follow-up (over 12 years in some cases) . This latter study indicated that relapses occurred towards the start of the follow-up, and particularly within the first 2 years, and therefore is consistent with the study reported here.
SNP differences in this and previous studies
The number of SNP differences in the relapse and reinfection groups was comparable to previous pre- and post-treatment studies (Table 5). Casali et al.  also found up to four SNP differences over 4 years in intra-patient studies. In each of the previous relapse studies, there was a large gap between the number of SNPs found in presumed relapses and in reinfections. This both lends support to the definition used to identify relapse versus reinfection, and also gives weight to the suggestion by Bryant et al.  that there is some immunity to reinfection by very similar strains. The same pattern was observed in this study, even though the phylogenetic tree showed that highly similar strains were circulating. Guerra-Assunção et al.  showed less SNP diversity in reinfections (100 rather than 1000 SNPs), and it would be interesting to determine if there is an effect of time, with similar strains only reinfecting after a longer passage of time. Casali et al.  demonstrated that there is strain diversity within a single sputum specimen, with up to 10 SNP differences seen when individual colonies were sequenced. The methodology described in this study deliberately took a sweep of colonies, which meant that much of this strain diversity within a single specimen would not be seen in WGS at the depth of coverage used.
SNPs seen in relapse isolates
For 16 of the 32 relapse pairs sequenced, SNPs were identified between the isolates (Table 4; excluding the mixed infection). While it is likely that many or most of these will not be advantageous to the bacteria, it is a plausible hypothesis that some of them might have a survival advantage.
Of the 12 non-synonymous SNPs observed in relapse isolate pairs, two were in gene systems that have proven involvement with pathogenesis: the two T7SSs esx1 and esx3. One lay in eccB3, which is a gene in the ESX3 T7SS, which is essential for growth. This system is involved in pathogenesis, partly through the control of iron acquisition, which appears to have a role in metal homeostasis . The other was located in mce1B, which is a gene in the ESX1 T7SS, which is essential for virulence and exports the well-characterised ESAT-6/CFP10 complex . Bryant et al.  reported that two genes with SNPs had functions associated with oxidative stress, and Guerra-Assunção et al.  reported an association with katG, well known for being involved in resistance to both oxidative stress and isoniazid. Clearly these may just be chance associations, but they also indicate potential avenues for studying bacterial survival during chemotherapy. The scale of investment in phase 2 and 3 trials is such that there is an obligation to extract as much information as possible from the study and the contribution of WGS is fundamental to understanding the bacteriology under treatment.
A potential confounder in differentiating relapse from reinfection is that of mixed infections. If either the initial or subsequent infection is mixed, then sampling just one isolate could give a misleading designation. One likely mixed infection was identified with a 75:25 genotype ratio, although this ratio may not represent the ratio of the mixture in the bacterial population in vivo.
Of course, these methods would only reveal mixed infections with significant proportions of each strain, and they cannot formally exclude the possibility that other infections were also mixed, but at very low levels. Bryant et al., Guerra-Assunção et al., Casali et al. and Köser et al. all identified mixed infections using WGS. Other studies have demonstrated them using alternative techniques, including MIRU-VNTR [31–34], but WGS is more powerful, and Bryant et al. found that WGS detected more mixed infections than MIRU-VNTR.
The definition of a mixed infection is made less clear by the finding that up to 10 SNP differences can be found within a single sputum sample, and the observation that very similar strains circulate in high-prevalence settings (e.g. Fig. 2). However, the data here and in the previous relapse studies suggest that some sort of immunological protection might exist that makes successful co-infection with a similar strain less likely.
Comparing WGS to MIRU-VNTR and spoligotyping
Previously, owing to its speed and digital output, MIRU-VNTR has been preferred to the earlier IS6110 profiling as a means of typing M. tuberculosis isolates; indeed, it was only recently described as “the new reference standard for molecular epidemiological studies”.
In this study, there was a correlation between SNP and MIRU-VNTR differences for isolates predicted to be cases of relapse (0–5 SNP; 0–2 MIRU-VNTR loci) and reinfection (SNP > 1000; MIRU-VNTR loci ≥7). This is in contrast with the study of Bryant et al., who reported that three reinfection pairs had 1–13 different loci, although that study was an interim analysis performed prior to final data resolution and unblinding, which may have impacted on the ultimate assignment of the patients. Furthermore, Casali et al. found that two MIRU-VNTR differences could correspond to a significant number of SNP differences. A transmission study by Walker et al. only examined isolates with successful 24-loci MIRU-VNTR data, showing that, up to a difference of 100 SNPs, isolates could have 1–3 MIRU-VNTR locus differences, while above 100 SNP differences, the number of MIRU-VNTR changes increased.
Achieving consistent results with MIRU-VNTR, which involves amplifying 24 loci in multiplexed PCRs, is known to be technically challenging [14, 26, 36, 37]. Indeed, there was significant variation in the number of loci amplified in this study (Table 2, Fig. 4d), which we attribute to a combination of DNA quantity and quality, and the technical difficulties referred to above. Furthermore, other limitations and issues with MIRU-VNTR in relation to the study setting have been discussed in a systematic review.
WGS is technically more straightforward and was comparable in cost in our hands (~£100 per sample); with reducing costs and whole-genome resolution, it is clearly a superior, more robust method than MIRU-VNTR for strain typing. In addition, WGS can provide additional information by identifying markers associated with drug resistance, which could be useful in the context of relapsing cases in a clinical trial. Sequence data is also more amenable to incorporation into other studies and will provide further information on TB evolution as global databases of genome information grow.
Spoligotyping has been widely used for robust division of M. tuberculosis into different sub-types, but we found that SNPs were not only far more sensitive for determining relapses and reinfections, but also more useful for assigning sub-lineages.
Value of WGS in chemotherapy trials
The data from this study in combination with the previous two relapse studies [14, 15] allow an evaluation of the relative benefit of using WGS or MIRU-VNTR as a means of determining relapses from reinfections in chemotherapy trials.
The RIFAQUIN trial was in an area of high endemicity, suggesting that reinfections are not likely to be higher elsewhere due to disease prevalence. Thus, the data presented in this study and previously [14, 15] indicate that the proportion of reinfections is very low compared to relapse, although Guerra-Assunção et al.  suggest that reinfections may rise at later time points after completion of therapy. Furthermore, cases in which isolates are identified as reinfections are more likely to be wrong, because the possible errors observed here (processing errors, unrecognized mixed infections) are more likely to suggest a reinfection.
In the pre-genomic era, all post-treatment infections were presumed to be relapses, and it could be argued that, due to the low reinfection rates and the increased cost and time required to perform the sequencing, WGS provides only modest gains for the analysis of the primary outcome in a chemotherapy clinical trial of this nature.
Nevertheless, in addition to robust genomic evidence for treatment outcome, the added information that WGS provides is scientifically valuable and will become of greater value as more genome sequence data and more information about the genotype–phenotype correlation and its impact on disease and transmission become available. Furthermore, future trials for new TB drugs in the development pipeline or novel combination regimens may be held in areas of high TB prevalence where reinfection or mixed infections are more likely, thus making accurate strain discrimination imperative; in these instances, WGS should be the method of choice.
Fox W, Ellard GA, Mitchison DA. Studies on the treatment of tuberculosis undertaken by the British Medical Research Council tuberculosis units, 1946-1986, with relevant subsequent publications. Int J Tuberc Lung Dis. 1999;3(10 Suppl 2):S231–279.
Hermans PW, van Soolingen D, Dale JW, Schuitema AR, McAdam RA, Catty D, et al. Insertion element IS986 from Mycobacterium tuberculosis: a useful tool for diagnosis and epidemiology of tuberculosis. J Clin Microbiol. 1990;28(9):2051–8.
Supply P, Allix C, Lesjean S, Cardoso-Oelemann M, Rüsch-Gerdes S, Willery E, et al. Proposal for standardization of optimized mycobacterial interspersed repetitive unit-variable-number tandem repeat typing of Mycobacterium tuberculosis. J Clin Microbiol. 2006;44(12):4498–510.
Kamerbeek J, Schouls L, Kolk A, van Agterveld M, van Soolingen D, Kuijper S, et al. Simultaneous detection and strain differentiation of Mycobacterium tuberculosis for diagnosis and epidemiology. J Clin Microbiol. 1997;35(4):907–14.
Das S, Chan SL, Allen BW, Mitchison DA, Lowrie DB. Application of DNA fingerprinting with IS986 to sequential mycobacterial isolates obtained from pulmonary tuberculosis patients in Hong Kong before, during and after short-course chemotherapy. Tuber Lung Dis. 1993;74(1):47–51.
Tam CM, Chan SL, Kam KM, Sim E, Staples D, Sole KM, et al. Rifapentine and isoniazid in the continuation phase of a 6-month regimen. Interim report: no activity of isoniazid in the continuation phase. Int J Tuberc Lung Dis. 2000;4(3):262–7.
Benator D, Bhattacharya M, Bozeman L, Burman W, Cantazaro A, Chaisson R, et al. Rifapentine and isoniazid once a week versus rifampicin and isoniazid twice a week for treatment of drug-susceptible pulmonary tuberculosis in HIV-negative patients: a randomised clinical trial. Lancet. 2002;360(9332):528–34.
Lienhardt C, Cook SV, Burgos M, Yorke-Edwards V, Rigouts L, Anyo G, et al. Efficacy and safety of a 4-drug fixed-dose combination regimen compared with separate drugs for treatment of pulmonary tuberculosis: the Study C randomized controlled trial. JAMA. 2011;305(14):1415–23.
Jindani A, Nunn AJ, Enarson DA. Two 8-month regimens of chemotherapy for treatment of newly diagnosed pulmonary tuberculosis: international multicentre randomised trial. Lancet. 2004;364(9441):1244–51.
Walker TM, Monk P, Grace Smith E, Peto TEA. Contact investigations for outbreaks of Mycobacterium tuberculosis: advances through whole genome sequencing. Clin Microbiol Infect. 2013;19(9):796–802.
Walker TM, Lalor MK, Broda A, Saldana Ortega L, Morgan M, Parker L, et al. Assessment of Mycobacterium tuberculosis transmission in Oxfordshire, UK, 2007-12, with whole pathogen genome sequences: an observational study. Lancet Respir Med. 2014;2(4):285–92.
Walker TM, Ip CL, Harrell RH, Evans JT, Kapatai G, Dedicoat MJ, et al. Whole-genome sequencing to delineate Mycobacterium tuberculosis outbreaks: a retrospective observational study. Lancet Infect Dis. 2013;13(2):137–46.
Satta G, Witney AA, Shorten RJ, Karlikowska M, Lipman M, McHugh TD. Genetic variation in Mycobacterium tuberculosis isolates from a London outbreak associated with isoniazid resistance. BMC Med. 2016;14(1):117.
Bryant JM, Harris SR, Parkhill J, Dawson R, Diacon AH, van Helden P, et al. Whole-genome sequencing to establish relapse or re-infection with Mycobacterium tuberculosis: a retrospective observational study. Lancet Respir Med. 2013;1(10):786–92.
Guerra-Assunção J, Crampin A, Houben R, Mzembe T, Mallard K, Coll F, et al. Large-scale whole genome sequencing of M. tuberculosis provides insights into transmission in a high prevalence area. eLife. 2015. doi: 10.7554/eLife.05166
Gillespie SH, Crook AM, McHugh TD, Mendel CM, Meredith SK, Murray SR, et al. Four-month moxifloxacin-based regimens for drug-sensitive tuberculosis. N Engl J Med. 2014;371(17):1577–87.
Jindani A, Harrison TS, Nunn AJ, Phillips PPJ, Churchyard GJ, Charalambous S, et al. High-dose rifapentine with moxifloxacin for pulmonary tuberculosis. N Engl J Med. 2014;371(17):1599–608.
Kent L, McHugh TD, Billington O, Dale JW, Gillespie SH. Demonstration of homology between IS6110 of Mycobacterium tuberculosis and DNAs of other Mycobacterium spp.? J Clin Microbiol. 1995;33(9):2290–3.
Kozarewa I, Turner DJ. 96-plex molecular barcoding for the Illumina Genome Analyzer. Methods Mol Biol Clifton NJ. 2011;733:279–98.
Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997 [q-bio]. 2013. http://arxiv.org/abs/1303.3997. Cited 7 July 2014.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25(16):2078–9.
Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30(9):1312–3.
Coll F, Mallard K, Preston MD, Bentley S, Parkhill J, McNerney R, et al. SpolPred: rapid and accurate prediction of Mycobacterium tuberculosis spoligotypes from short genomic sequences. Bioinformatics. 2012;28(22):2991–3.
Coll F, McNerney R, Guerra-Assunção JA, Glynn JR, Perdigão J, Viveiros M, et al. A robust SNP barcode for typing Mycobacterium tuberculosis complex strains. Nat Commun. 2014;5:4812.
Majlessi L, Prados-Rosales R, Casadevall A, Brosch R. Release of mycobacterial antigens. Immunol Rev. 2015;264(1):25–45.
Casali N, Broda A, Harris SR, Parkhill J, Brown T, Drobniewski F. Whole genome sequence analysis of a large isoniazid-resistant tuberculosis outbreak in London: a retrospective observational study. PLoS Med. 2016;13(10):e1002137.
Aber VR, Allen BW, Mitchison DA, Ayuma P, Edwards EA, Keyes AB. Quality control in tuberculosis bacteriology. 1. Laboratory studies on isolated positive cultures and the efficiency of direct smear examination. Tubercle. 1980;61(3):123–33.
Burman WJ, Reves RR. Review of false-positive cultures for Mycobacterium tuberculosis and recommendations for avoiding unnecessary treatment. Clin Infect Dis. 2000;31(6):1390–5.
Köser C, Bryant JM, Becq J, Torok ME, Ellington MJ, Marti-Renom MA, et al. Whole-genome sequencing for rapid susceptibility testing of M. tuberculosis. N Engl J Med. 2013;369(3):290–2.
Tufariello JM, Chapman JR, Kerantzas CA, Wong K-W, Vilchèze C, Jones CM, et al. Separable roles for Mycobacterium tuberculosis ESX-3 effectors in iron acquisition and virulence. Proc Natl Acad Sci. 2016;113(3):E348–57.
Hanekom M, Streicher EM, de Berg DV, Cox H, McDermid C, Bosman M, et al. Population structure of mixed Mycobacterium tuberculosis infection is strain genotype and culture medium dependent. PLoS One. 2013;8(7):e70178.
Fang R, Li X, Li J, Wu J, Shen X, Gui X, et al. Mixed infections of Mycobacterium tuberculosis in tuberculosis patients in Shanghai, China. Ann Tuberc. 2008;88(5):469–73.
Mallard K, McNerney R, Crampin AC, Houben R, Ndlovu R, Munthali L, et al. Molecular detection of mixed infections of Mycobacterium tuberculosis strains in sputum samples from patients in Karonga District, Malawi. J Clin Microbiol. 2010;48(12):4512–8.
Cohen T, Wilson D, Wallengren K, Samuel EY, Murray M. Mixed-strain Mycobacterium tuberculosis infections among patients dying in a hospital in KwaZulu-Natal, South Africa. J Clin Microbiol. 2011;49(1):385–8.
Brossier F, Sola C, Millot G, Jarlier V, Veziris N, Sougakoff W. Comparison of a semiautomated commercial repetitive-sequence-based PCR method with spoligotyping, 24-locus mycobacterial interspersed repetitive-unit-variable-number tandem-repeat typing, and restriction fragment length polymorphism-based analysis of IS6110 for Mycobacterium tuberculosis typing. J Clin Microbiol. 2014;52(11):4082–6.
Cowan LS, Mosher L, Diem L, Massey JP, Crawford JT. Variable-number tandem repeat typing of Mycobacterium tuberculosis isolates with low copy numbers of IS6110 by using mycobacterial interspersed repetitive units. J Clin Microbiol. 2002;40(5):1592–602.
Chatterjee A, Mistry N. MIRU–VNTR profiles of three major Mycobacterium tuberculosis spoligotypes found in western India. Tuberculosis. 2013;93(2):250–6.
Mears J, Abubakar I, Cohen T, McHugh TD, Sonnenberg P. Effect of study design and setting on tuberculosis clustering estimates using mycobacterial interspersed repetitive units-variable number tandem repeats (MIRU-VNTR): a systematic review. BMJ Open. 2015;5(1):e005636.
Acknowledgements
MIRU-VNTR analyses were carried out by Selina Bannoo, Emma Cunningham, Alice Morgan, Solomon Mwaigwishya and Laura Wright. Sequencing was carried out at UCL Genomics by Tony Brooks and Nipurna Jina.
Funding
The RIFAQUIN trial was funded by the European and Developing Countries Clinical Trials Partnership and the Wellcome Trust; RIFAQUIN Current Controlled Trials number, ISRCTN44153044.
Availability of data and materials
Sequence data have been submitted to the European Nucleotide Archive (ENA) under accession number PRJEB18529. The full analysis pipeline can be downloaded and run from http://github.com/bugs-bioinf/rifaquin-2016.
Authors’ contributions
AW, AB, PP and NS performed the data analysis; DC cultured the isolates; AJ, PB and TM designed the study. All authors contributed to the writing of the manuscript. All authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
The study protocol was reviewed and approved by the ethics review committee at St. George’s, by medical ethics and regulatory committees representing each of the participating countries, and by the institutional review board of the Centers for Disease Control and Prevention operating in Botswana.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Spoligotype and sub-lineage information for isolates. (DOCX 23 kb)