Whole genome sequencing distinguishes between relapse and reinfection in recurrent leprosy cases

Background
Since leprosy is both treated and controlled by multidrug therapy (MDT), it is important to monitor recurrent cases for drug resistance and to distinguish between relapse and reinfection as a means of assessing therapeutic efficacy. All three objectives can be reached with single nucleotide resolution using next generation sequencing and bioinformatics analysis of Mycobacterium leprae DNA present in human skin.


Methodology
DNA was isolated by means of optimized extraction and enrichment methods from samples from three recurrent cases in leprosy patients participating in an open-label, randomized, controlled clinical trial of uniform MDT in Brazil (U-MDT/CT-BR). Genome-wide sequencing of M. leprae was performed and the resultant sequence assemblies analyzed in silico.

Principal findings
In all three cases, no mutations responsible for resistance to rifampicin, dapsone and ofloxacin were found, thus eliminating drug resistance as a possible cause of disease recurrence.
However, sequence differences were detected between the strains from the first and second disease episodes in all three patients. In one case, clear evidence was obtained for reinfection with an unrelated strain whereas in the other two cases, relapse appeared more probable.

Conclusions/Significance
This is the first report of using M. leprae whole genome sequencing to reveal that treated and cured leprosy patients who remain in endemic areas can be reinfected by another strain. Next generation sequencing can be applied reliably to M. leprae DNA extracted from biopsies to discriminate between cases of relapse and reinfection, thereby providing a powerful tool for evaluating different outcomes of therapeutic regimens and for following disease transmission.

Author summary
Leprosy, one of the most ancient human infectious diseases, affects skin and nerves and is caused by Mycobacterium leprae infection. Despite the effective use of multidrug therapy (MDT) since the 1980s, over 200,000 new cases are reported yearly, indicating active transmission, especially in India and Brazil. Although rare, recurrent clinical manifestations after MDT can occur due to leprosy reactions, relapse caused by drug resistance or insufficient treatment, or reinfection. Relapse and reinfection cannot be differentiated clinically, and molecular genotyping of a predefined set of loci has limited resolution due to exceptional M. leprae genome conservation and low sequence diversity between strains from the same geographical area. This is the first report that has compared whole-genome sequences of M. leprae strains from original and recurrent leprosy episodes. M. leprae genome differences were detected between the strains from the first and second episodes in the three patients. In one patient, there was clear evidence for reinfection with an unrelated strain, whereas the other two were considered true relapses because the strains differed only minimally. No known drug resistance mutations were detected, excluding drug resistance as the cause of recurrence. Next generation sequencing of M. leprae DNA discriminates relapse from reinfection, representing a powerful tool for evaluating different disease outcomes and transmission.

Introduction
Leprosy is a complex dermato-neurologic and systemic disease [1] primarily caused by Mycobacterium leprae or, to a much lesser extent, by Mycobacterium lepromatosis. [2] Despite a strong decrease in leprosy prevalence since the systematic implementation of multidrug therapy (MDT) in the 1980s, the incidence of disease, the major indicator of active transmission, remains high in many countries, especially in India and Brazil, showing that transmission continues unabated. [3] Overall, more than 200,000 new leprosy cases are reported each year worldwide. [3] The MDT regimen for leprosy consists of different antibiotic combinations that are prescribed based on the number of skin lesions: a six-month regimen of rifampicin and dapsone for paucibacillary (PB) patients (up to five skin lesions) and a twelve-month regimen of rifampicin, dapsone and clofazimine for multibacillary (MB) patients (more than five skin lesions). [4] In 2002, WHO proposed that a uniform MDT regimen (U-MDT) should be considered to treat all types of leprosy in order to facilitate leprosy control. In 2007, an open-label, randomized and controlled clinical trial (uniform multidrug therapy for leprosy patients in Brazil, U-MDT/CT-BR) was initiated to compare U-MDT with the regular MDT for PB and MB patients. [5,6] Clinical monitoring is still taking place, with special emphasis on disease recurrence and leprosy type 1 and type 2 reactions (T1R/T2R).
An increased relapse rate and the possible emergence of drug resistance are major concerns for the shortened MDT proposal for MB patients. It is therefore important to address this issue by analyzing in depth all recurrent cases from the U-MDT/CT-BR trial. Molecular genotyping techniques, such as typing selected single nucleotide polymorphisms (SNPs) or counting variable number tandem repeats (VNTRs), have been used to differentiate reinfection from relapse. [7][8][9][10][11] However, the resolution of such techniques is often limited because of the exceptional level of genome conservation in M. leprae and, in particular, the limited sequence diversity between strains from the same geographical area. [12] In contrast, genome-wide approaches provide higher resolution and accuracy compared to genotyping based on a predefined set of loci, but are technically more complex. High throughput sequencing is becoming increasingly efficient and cost-effective with purified DNA but is more challenging with clinical specimens such as DNA extracted directly from skin biopsies, especially from formalin-fixed paraffin-embedded (FFPE) samples.
In this study, we investigated three recurrent cases of leprosy from the U-MDT/CT-BR trial to determine whether recurrence was due to drug resistance, bacterial persistence or reinfection. To achieve this, we compared whole-genome sequences of M. leprae collected from skin lesions at the initial diagnosis and at recurrence of the disease, and correlated the sequence data with the clinical, microbiologic and serologic findings.

Ethics statement
This study was approved by the regional research ethical committees, by the National Committee for Ethics in Research (CONEP, National Health Council/Ministry of Health, Brazil, protocol # 001/06) and by the human and animal research ethics committee from the Federal University of Goiás (CEMHA/HC/UFG protocol # 166/2011). Written informed consent was obtained from all adult subjects, and a parent or guardian of participants under the age of 18 years provided informed consent on their behalf prior to inclusion in the study (ClinicalTrials.gov identifier: NCT00669643).

Study design
Three recurrent cases of leprosy identified in the U-MDT/CT-BR trial were investigated (Table 1). Clinical diagnosis and monitoring were carried out at the National Reference Center in Ceará state, Northeast Brazil. Leprosy diagnosis was confirmed by bacteriological analysis of slit skin smears and by histopathological examination of biopsies taken from active skin lesions. [6] At the first visit, patients had a complete dermato-neurological examination by a dermatologist with expertise in leprosy diagnosis, during which the number and body distribution of skin lesions and affected nerves were recorded. A biopsy of a skin lesion, venous blood and skin smear material from six sites for bacilloscopy were collected. During the clinical monitoring, patients followed the established schedule for clinical/laboratory monitoring (monthly appointments during the first year and yearly thereafter). All patients were advised to return for an urgent appointment at the reference center if any discomfort or new clinical manifestation appeared. In this study, the following case definitions for leprosy reactions were employed: T1R was defined as an acute clinical manifestation, usually characterized by the exacerbation of pre-existing lesions or the appearance of new lesions. T2R was characterized by the sudden appearance of tender erythematous skin nodules (erythema nodosum leprosum/ENL) mainly accompanied by fever and other systemic symptoms such as joint pain, bone tenderness, neuritis, edema, malaise and anorexia, with or without lymphadenopathy. In the clinical diagnosis of reactions, skin signs were obligatory, nerve and systemic signs were noncompulsory, while neuritis, malaise, and fever could be present in both types of reaction. Treatment for leprosy reactions followed the guidelines from the Brazilian Ministry of Health.
Patients with clinical manifestations not fulfilling these previously described criteria were considered suspected cases of relapses and were clinically examined by the assistant dermatologist, by the PI (GOP) and by an expert member of the independent steering committee (Dr. Sinesio Talhari). Additionally, in these patients skin smears and biopsies were collected from new lesions and used to investigate drug susceptibility (inoculation in BALB/c mice, sequencing of the rpoB, folP1, gyrA and gyrB genes and whole genome sequencing).
As part of the U-MDT/CT-BR trial, a well-prepared biobank of biopsies from leprosy skin lesions and serum samples, collected at various time points during treatment and monitoring, was assembled and has been properly maintained at recruitment sites, and an extra back-up has been kept at the coordination center. For this study, we used skin biopsies from the first episode that were formalin-fixed and paraffin-embedded to allow long-term storage and serum samples collected at diagnosis and at various time-points during and after treatment (Table 1 and S1 Table). Serum IgM antibodies to M. leprae-specific PGL-1 antigen (0.01 μg/mL NT-P-BSA) and serum IgG antibodies to the synthetic LID-1 (1 μg/mL LID-1) antigen were detected by enzyme-linked immunosorbent assay (ELISA). [13,14] Patients showing [...] Table 1 and S1 Table), which were used as the source of M. leprae for drug susceptibility testing in BALB/c mice [15] (treated with dapsone, rifampicin or no drug) and for partial [16] and whole genome sequencing.

DNA extraction from tissue
A truXTRAC™ FFPE DNA kit (Covaris) was used following the manufacturer's recommendation with some optimization. Briefly, ten 20 μm FFPE tissue sections for each sample were pooled in a screw-cap microTUBE in duplicate or triplicate. Paraffin was removed and the tissue rehydrated with 100 μl of tissue SDS buffer using a focused-ultrasonicator series S2 with the following settings: intensity = 5, cycles per burst = 200, time = 300 s, temperature = 20˚C. Digestion was done using a 40 μl mixture of proteinase K (20 mg/ml) and lysozyme (10 mg/ml) using a focused-ultrasonicator with the same settings as above except for the time, which was set at 10 s. Digestion occurred at 56˚C overnight followed by 1 h at 80˚C to reverse the formaldehyde crosslinks. Finally, DNA was isolated from lysates using the columns of the truXTRAC FFPE DNA kit and eluted in 50 μl of Covaris BE buffer. DNA was quantified using a Qubit fluorometer (ThermoFisher). For samples 1126-2011 and 2188-2014, which had been passaged in mice, DNA was extracted from mouse footpad suspensions, then sheared to ~600 bp by ultrasonication and purified with AMPure beads before library preparation. The quantity of DNA was assessed after each critical step, i.e., DNA extraction, library preparation and amplification post-array capture (S2 Table). Since DNA extracted from FFPE tissue is known to be of low quality and already fragmented, we did not fragment it further with the Covaris method, nor did we size select our libraries, to avoid losing too much DNA.

Library preparation and sequencing
DNA from each extract was used to prepare Illumina libraries using a Kapa Hyper Prep kit (Kapa Biosystems) as described elsewhere. [17] To remove host DNA from the libraries, we used a custom-synthesized oligonucleotide array (Agilent) spanning the entire M. leprae genome. [18] Quality of the captured and re-amplified library was assessed using the Fragment Analyzer system (Advanced Analytical Technologies, Inc.). The size of the captured library was 180 bp and the concentration 52 ng/μl.
Sequencing was performed on an Illumina HiSeq 2500 instrument.

Sequence analyses
Raw reads from the same sample were merged and processed as described elsewhere [17] by adapter- and quality-trimming and alignment with the M. leprae TN reference genome (NCBI accession no. AL450380.1). To avoid false positive SNP calls, the following cutoffs were applied: minimum overall coverage of 5 non-duplicated reads, minimum of 3 non-duplicated reads supporting the SNP, mapping quality score greater than 8, base quality score greater than 15 and a SNP frequency above 80%. SNPs and short insertions and deletions (InDels) were compared between index and second episodes for each recurrent case. Unique sets of SNPs for each genome were established by comparison with the list of SNPs from 20 M. leprae genomes published elsewhere (S3 Table). [9,18,19] All unique and/or discriminatory variants were manually visualized using the IGV browser [20] to check for possible alignment inconsistencies. We additionally genotyped all samples using the SNP model described in Monot et al. and inferred in silico the VNTR copy number for 33 out of 44 known VNTR loci (11 loci were too large to be spanned with Illumina reads). [9,11,21]
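The cutoffs and the episode-to-episode comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the `SnpCall` record, the function names and the idea that mapping and base quality are available as one summary value per site are all hypothetical, and "SNP frequency" is read here as the fraction of non-duplicated reads supporting the alternative base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnpCall:
    """Hypothetical per-site summary; field names are illustrative only."""
    position: int           # 1-based coordinate on the reference genome
    ref: str                # reference base
    alt: str                # alternative base
    depth: int              # non-duplicated reads covering the site
    alt_reads: int          # non-duplicated reads supporting the alt base
    mapping_quality: float  # assumed per-site summary of mapping quality
    base_quality: float     # assumed per-site summary of base quality


def passes_cutoffs(call: SnpCall) -> bool:
    """Apply the five cutoffs listed in the text to one candidate SNP."""
    if call.depth < 5:              # minimum overall coverage of 5 reads
        return False
    if call.alt_reads < 3:          # at least 3 reads supporting the SNP
        return False
    if call.mapping_quality <= 8:   # mapping quality greater than 8
        return False
    if call.base_quality <= 15:     # base quality greater than 15
        return False
    return call.alt_reads / call.depth > 0.80  # frequency above 80%


def snp_keys(calls):
    """Reduce filtered calls to (position, ref, alt) keys for set algebra."""
    return {(c.position, c.ref, c.alt) for c in calls if passes_cutoffs(c)}


def compare_episodes(first, second, published):
    """Compare filtered SNPs from two episodes of the same patient.

    `published` is a set of (position, ref, alt) keys from previously
    described genomes; shared SNPs absent from it are unique to the pair.
    """
    a, b = snp_keys(first), snp_keys(second)
    return {
        "only_first": a - b,
        "only_second": b - a,
        "shared_unique": (a & b) - published,
    }
```

A SNP that appears in only one episode under this scheme may simply have failed a cutoff in the other sample, for example because of low coverage, which is one reason the discriminatory variants still need to be inspected manually in a genome browser as described above.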

Demographics and diagnosis
The U-MDT/CT-BR study initially enrolled 858 patients of whom 78.4% were classified as MB. During follow-up, four of the treated patients presented with new symptoms between four and eight years after completion of U-MDT and three of these were re-investigated in this study. These participants were three young male leprosy patients (# 1126, 2188 and 3208) from Fortaleza, Ceará, Northeast Brazil, an endemic city for leprosy. The main clinical and laboratory characteristics of these three patients with recurrent signs of leprosy after U-MDT are shown in Table 1. In all three cases, leprosy was first diagnosed in 2007 but the patients displayed new clinical signs, which were not associated with leprosy reactions, between 2011 and 2015.
In these three patients, the original leprosy skin lesions detected at diagnosis disappeared after specific treatment and, upon suspicion of relapse/reinfection, new skin lesions were observed in previously unaffected body areas. The timelines of clinical events presented by these patients during follow up (S1 Fig) highlight their high propensity to develop leprosy reactions, especially T2R, although all of them also developed T1R. These records also demonstrate that leprosy reactions and relapse/reinfection occurred at different time points. The timelines also illustrate the evolution of the bacilloscopic index (BI) during follow up. In one case, the BI at the second episode was higher than the BI at the first episode.
In addition, the first diagnosis revealed that the three MB patients showed high IgM and IgG antibody levels to PGL-1 and LID-1 antigens, respectively (S1 Fig). Since these biomarkers have been used to monitor the disease state, we measured antibody levels by ELISA before, during and after U-MDT. Overall, the antibody titers gradually declined but remained above the threshold for positivity for at least one of the antigens during the study period except for patient 1126. This patient showed an antibody titer below the threshold just before the recurrence of the disease (39 months after U-MDT) and then both antibody titers increased by the time of recurrent disease. By contrast, despite oscillating levels of PGL-1 antibody for 3208, antibody titers, especially to LID-1, remained high for 3208 and 2188 during the entire study period.

Drug susceptibility results
M. leprae from the recurrent lesions (1126-2011, 2188-2014) was inoculated into mice and only multiplied in the untreated animals, indicating that the bacilli were viable but susceptible to dapsone and rifampicin. It was not possible to inoculate mice with the sample from 3208-2015. Analysis of the rpoB, folP1, gyrA and gyrB genes revealed a wild-type sequence in all six strains, confirming susceptibility to rifampicin, dapsone, and fluoroquinolones, respectively, in all cases.

Whole-genome analysis
Sufficient whole genome read coverage was obtained from the six M. leprae samples for genotyping and comparative genomic analyses (S4 Table).
Strains 3208-2007 and 3208-2015 differed in only two SNPs and one VNTR locus (Fig 1 and S5 Table). Both SNPs (T1740863C in an intergenic region and C1803024T in a pseudogene) were present in 3208-2015, indicating that 3208-2015 was certainly the direct progeny of 3208-2007. In addition, eight unique variant nucleotides were restricted to these two samples (compared to the SNPs from 20 previously published M. leprae genomes [9,18,19] and those from this study), confirming the identity of the strains (S6 Table). Interestingly, a cluster of three SNPs leads to missense mutations in codons 495 and 496 of asn1, encoding an L-asparagine permease, which contributes to virulence in Mycobacterium tuberculosis [22].
Analysis of 2188-2007 and 2188-2014 revealed identical genome sequences at the SNP level (Fig 1). Curiously, both strains belong to a new SNP subtype intermediate between subtypes 4N and 4O. The only difference between the two genomes was found in the (GTA)9 VNTR locus (S5 Table), which harbored 11 repeats in 2188-2007 and 12 repeats in 2188-2014. Genome comparisons revealed that both strains share 28 unique variant nucleotides (S7 Table). Among them are two missense mutations, one in ML0411, encoding a PPE protein, and one in ribD (ML1340), encoding the riboflavin biosynthesis protein. An insertion of 9 nucleotides (GGACATCTA at position 1,219,061) was found in ML1052, a putative PucR-like transcriptional regulator, which leads to a modification of the protein. Interestingly, this mutation was present at only 30% frequency in 2188-2007, while it was fixed in 2188-2014.
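To make the VNTR comparison concrete, the following Python sketch counts the copies of a repeat unit such as GTA in a single read and refuses to report a number when the repeat run touches either end of the read, since the read then does not span the locus. It is only an illustration: the function name and the flanking sequences in the usage example are invented, and the in silico inference used in the study may work differently.

```python
import re


def spanned_copy_number(read: str, unit: str):
    """Return the copy number of the longest tandem run of `unit` in `read`.

    Returns 0 if the unit is absent, and None if the longest run touches
    the start or end of the read, because the run may then be truncated
    and the read cannot be said to span the locus.
    """
    read, unit = read.upper(), unit.upper()
    # The lookahead lets a run be tested from every start position, so the
    # longest run is found even when candidate runs overlap.
    pattern = re.compile(f"(?=((?:{re.escape(unit)})+))")
    best_len, best_start = 0, -1
    for match in pattern.finditer(read):
        run_len = len(match.group(1))
        if run_len > best_len:
            best_len, best_start = run_len, match.start()
    if best_len == 0:
        return 0
    if best_start == 0 or best_start + best_len == len(read):
        return None  # no flanking sequence on at least one side
    return best_len // len(unit)


# Usage with invented flanks: the two episodes of a patient could be
# compared as
#   spanned_copy_number("CCATG" + "GTA" * 11 + "CCTGA", "GTA")
#   spanned_copy_number("CCATG" + "GTA" * 12 + "CCTGA", "GTA")
```

Requiring flanking sequence on both sides is also why long loci cannot be typed this way: when the repeat array is longer than the read, no read can contain both flanks.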
A frame-shift arising from a dinucleotide insertion was also found in ML0825c, the ortholog of rv2358 in M. tuberculosis that codes for the protein SmtB, a zinc-sensing transcriptional regulator and member of the ArsR/SmtB family. [23,24] The C-terminal part of SmtB is essential for protein dimerization, zinc binding and DNA recognition. Furthermore, a specific histidine residue (H138 in ML0825c and H117 in Synechococcus SmtB) is important for the allosteric coupling of the zinc and DNA binding sites in the protein. [25] Modeling of M. leprae SmtB in silico (S2 Fig) showed that the frame-shift leads to loss of H138 and should thus impair protein function.
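The different consequences of the two insertions described above follow from simple codon arithmetic: an insertion whose length is a multiple of three adds codons but leaves the downstream reading frame intact, whereas a dinucleotide insertion changes every downstream codon. The sketch below illustrates this on an invented toy coding sequence; the function names, the toy sequence and the inserted dinucleotide "GG" are assumptions made for illustration, and only the 9-nucleotide sequence GGACATCTA is taken from the text.

```python
def split_codons(cds: str):
    """Split a coding sequence into complete codons, dropping any remainder."""
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def downstream_frame_preserved(cds: str, offset: int, inserted: str) -> bool:
    """Check whether codons downstream of an insertion are left unchanged.

    Assumes len(cds) is a multiple of 3 and that `offset` is the 0-based
    position in `cds` before which `inserted` is placed.
    """
    original = split_codons(cds)
    # Index of the first codon that starts at or after the insertion point.
    first_downstream = (offset + 2) // 3
    tail = original[first_downstream:]
    mutated = split_codons(cds[:offset] + inserted + cds[offset:])
    return len(tail) > 0 and mutated[-len(tail):] == tail


# Invented toy gene: ATG GCT CAC GGT TTC AAA TGA
TOY_CDS = "ATGGCTCACGGTTTCAAATGA"

in_frame = downstream_frame_preserved(TOY_CDS, 6, "GGACATCTA")  # 9 nt
shifted = downstream_frame_preserved(TOY_CDS, 6, "GG")          # 2 nt
```

In the toy example the histidine codon CAC lies downstream of the insertion point, so it survives the 9-nucleotide insertion but is lost after the dinucleotide insertion, mirroring in miniature how a frame-shift can remove a functionally important residue.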

Discussion
The relapse rate is considered to be the most important indicator of the efficacy of a given MDT. On the other hand, reinfection is an indicator of active transmission and of the susceptibility of leprosy convalescents to new infections. This investigation provided a unique opportunity to apply high-resolution whole-genome tools to differentiate relapse from reinfection and to evaluate the impact of U-MDT on antibody levels to two M. leprae antigens. MDT affects both cellular and humoral M. leprae-specific immunity. In MB patients, there is a decline in antibody levels during MDT and patients remain unable to mount a protective Th1-type immunity to M. leprae after treatment. [26] Levels of antibodies to PGL-1 and LID-1 were high in all three MB cases at diagnosis, then declined during and after treatment but nonetheless remained above the cut-off point for positivity, especially antibodies to LID-1. Our data are in accordance with previous studies showing decay in antibody titers while sero-reversion is rare in leprosy patients after regular MDT [26,27]. By the time of recurrent disease, the antibody titers to at least one of the antigens had risen. In our study, patients were carefully monitored for treatment compliance and all completed the U-MDT treatment.
In our study the three MB patients had several episodes of leprosy reactions during follow up including T1R and mainly T2R, in accordance with the reports showing increased propensity of MB patients to develop reactions. [28,29] In fact, several studies have shown that in some endemic areas the occurrence of T1R in BL/LL patients is higher than T2R. A study about risk factors for leprosy reactions in patients from three endemic countries (Philippines, Nepal, Brazil) showed that among all LL and BL patients, T1R was more frequent than T2R. [30] Another study from Thailand showed that T2R was slightly more prevalent than T1R in lepromatous patients. [31] T1R primarily affects immunologically unstable borderline patients (BL, BT, BB), while although sporadic, it also occurs in LL patients. T1R is characterized by an increased inflammatory Th1-type cell-mediated immunity in pre-existing skin lesions and systemically, in serum and in circulating leukocytes. The capacity of BL/LL patients, who have a predominant Th2 response, to develop T1R was elucidated by studies showing leukocytes with a Th0 profile that produce IFNγ, IL2 and IL4 or a polarized shift to Th1 type response with IFNγ and IL-12p40 mRNA in lesional skin and in leukocytes. [32,33] In both leprosy and tuberculosis, host genetic factors and immunological mechanisms determine the outcome of infection so that susceptibility varies among individuals. Case 1126 was unambiguously identified as reinfection because of the extensive polymorphisms between the two strains. Reinfection has long been suspected as a cause of new leprosy episodes and it has been suggested that individuals who have already had leprosy are more likely to be reinfected after treatment due to their inherent immunogenetic susceptibility. [34][35][36] Around 30% of relapse cases in Recife, northeast Brazil, were reported to be in contact with other leprosy patients and more often from the same family or household. [8]
Leprosy case 1126 is an example of "family disease", because both of the patient's parents had leprosy around five years before his diagnosis, his daughter and partner had PB leprosy and the partner's cousin, who lives in the same household, was diagnosed with MB leprosy but failed to complete MDT due to alcohol addiction.
The extremely limited genomic variability detected between strains from the same geographical origin poses a challenge in distinguishing between relapse or reinfection with a closely related strain. In a recent paper, Avanzi et al. showed that a strain infecting three patients in the same region of Guinea Conakry differed in only two SNPs [17] and four VNTRs. In our study, two SNPs and one polymorphic VNTR were found. While individual VNTRs carry virtually no ancestral information due to the risk of homoplasy and mutation reversion, the fact that only two SNPs were found strongly indicates that the recurrent strain was directly derived from the original infection. It should be recalled that in our study skin biopsies were taken from two different lesions in different body areas. Likewise, in the case of 2188 only one polymorphic VNTR locus distinguished between the first and the recurrent infection, and the absence of SNPs confirms the strain's identity. Furthermore, there was no history of leprosy in either patient 3208's or 2188's households, and both patients had high antibody titers during the study period suggesting continued immunological stimulation by bacterial antigens after treatment. Therefore, based on the genomic analysis, the patients' epidemiologic history and serological data, we consider that the recurrence of leprosy in both patients 3208 and 2188 was due to relapse.
Leprosy has a variable incubation period, which can range from 2 to 15 years. Although more prevalent in adults, leprosy also occurs in children under 15 years of age, with reports of cases in patients younger than 1 year of age [37], indicating that, at least in children, the incubation period can be short. However, nothing is known about the incubation period of reinfection, especially in genetically susceptible individuals who remain exposed to the bacilli in endemic areas. In this study, the reinfection case was observed 4 years after the conclusion of treatment, indicating a relatively short incubation period that is nonetheless in accordance with the reported range for the disease. Wider availability and use of whole genome sequencing of M. leprae in recurrent leprosy and leprosy reinfection can clarify the duration of the incubation period in such cases. Further investigations of other such cases will give us a more definitive picture of the characteristics of reinfection.
It is theoretically possible that the original infection in leprosy could involve more than one strain of M. leprae and that the recurrence could be a relapse due to the regrowth of one of the sub-populations of M. leprae that had been under-treated by the first course of MDT. However, although possible, in our study this scenario was implausible, since in all three patients investigated, including the reinfection case, genomic sequences of the M. leprae strains responsible for the original infections showed no mutation associated with drug resistance. Therefore, even if the original infection had involved more than one strain of M. leprae, these strains were MDT susceptible.
To conclude, this study is the first to demonstrate that it is possible to differentiate reinfection from relapse in leprosy in a field setting with a follow up period extended to eight years. This provides a proof-of-concept and emphasizes the value of whole genome sequencing in clinical follow up of leprosy. Importantly, the extended observation period allowed identification of relapses/reinfection. M. leprae grows very slowly and has a relatively long incubation time, so shorter periods of monitoring would be unlikely to provide sufficient clinical evidence to suspect relapse or reinfection. Also, the two relapse cases in this study exemplify the superiority of whole-genome sequencing over genotyping a limited subset of loci or VNTR typing. For instance, the current SNP genotyping scheme can only detect distinct M. leprae lineages [9], which is not useful for analyzing closely related strains. VNTRs can distinguish such strains but neither reflect the overall genetic distance (Fig 1) nor convey information about strain ancestry. Improvements in sample preparation have made whole-genome sequencing more applicable routinely and we expect that recent technological advances will culminate in sequencing platforms that can be used to deliver whole genome coverage at the point of diagnosis within days of seeing the patient. [38] All raw sequence read files have been deposited in the trace archive of the National Center for Biotechnology Information Sequence Read Archive under accession no. SRP078228.
Supporting information
S1 [...] In red is the mutated sequence of SmtB. Panel B shows the structure of the CzrA dimer from Staphylococcus aureus. Zn, in orange, binds at the interface between the two monomers. Panel C shows a model of the effect of the mutation in ML0825 on the dimer, which compromises the binding of Zn ions. The mutated part is represented in red lines. The protein was modeled using the homology modeling webserver SWISS-MODEL and the structure of the transcriptional repressor CzrA from Staphylococcus aureus (PDB code 1R1V) as template. (DOCX)
S1 Reference list. (DOCX)