New Short Tandem Repeat-Based Molecular Typing Method for Pneumocystis jirovecii Reveals Intrahospital Transmission between Patients from Different Wards

Pneumocystis pneumonia is a severe opportunistic infection in immunocompromised patients caused by the unusual fungus Pneumocystis jirovecii. Transmission is airborne, with both immunocompromised and immunocompetent individuals acting as a reservoir for the fungus. Numerous reports of outbreaks in renal transplant units demonstrate the need for valid genotyping methods to detect transmission of a given genotype. Here, we developed a short tandem repeat (STR)-based molecular typing method for P. jirovecii. We analyzed the P. jirovecii genome and selected six genomic STR markers located on different contigs of the genome. We then tested these markers in 106 P. jirovecii PCR-positive respiratory samples collected between October 2010 and November 2013 from 91 patients with various underlying medical conditions. Unique (one allele per marker) and multiple (more than one allele per marker) genotypes were observed in 34 (32%) and 72 (68%) samples, respectively. A genotype could be assigned to 55 samples (54 patients) and 61 different genotypes were identified in total with a discriminatory power of 0.992. Analysis of the allelic distribution of the six markers and minimum spanning tree analysis of the 61 genotypes identified a specific genotype (Gt21) in our hospital, which may have been transmitted between 10 patients including six renal transplant recipients. Our STR-based molecular typing method is a quick, cheap and reliable approach to genotype Pneumocystis jirovecii in hospital settings and is sensitive enough to detect minor genotypes, thus enabling the study of the transmission and pathophysiology of Pneumocystis pneumonia.

A maximum period size (repeat unit) of 5 nucleotides was used, and gave 179 putative results. Di- and tri-nucleotide repeats were then selected from the loci with the highest repeat numbers, and loci containing mixed or partial repeat sequences were rejected.
The 10 markers with the most repeat units, distributed in different regions of the genome (different contigs) and in different locations relative to the coding sequence, were selected in silico. These regions were then amplified with primers located in the 5' and 3' flanking regions of the repeats and designed with Primer3 software (http://primer3.ut.ee), with the aim of obtaining amplicons of various sizes for each marker with no overlap between markers. An initial test on 10 randomly selected DNA samples (10 patients) showed correct amplification for six of the 10 selected markers in all samples. These six markers were retained for further investigation (Table 1). Four of these markers were tri-nucleotide repeats, located in contigs 022 (STRPj_3_022), 108 (STRPj_3_108), 138 (STRPj_3_138) and 279 (STRPj_3_279), and the other two were di-nucleotide repeats, located in contigs 189 (STRPj_2_189) and 278 (STRPj_2_278). Two were intronic (located in an intron; STRPj_3_022 and _279), two were exonic (located in an exon; STRPj_3_108 and _138) and two were extragenic (located in the 5' or 3' flanking region of a gene; STRPj_2_189 and _278) (Table 1).
The 2,500 bp sequences upstream and downstream of each marker were aligned to the Pneumocystis murina genome (available at http://www.broadinstitute.org/annotation/genome/Pneumocystis_group.2/MultiHome.html). The markers were broadly distributed in the genome and were probably located on different P. jirovecii chromosomes (Table 1). However, because P. jirovecii contigs, unlike the P. murina genome, have not yet been assembled into chromosomes, the precise location of these markers in the P. jirovecii genome remains uncertain.

PCR amplification and genotyping
The six selected STRs were amplified separately by PCR because multiplex PCR was less sensitive than single PCR (data not shown). The forward primers were tagged with fluorophores (FAM, HEX or ATTO565). All PCR reactions were performed on a GeneAmp PCR System 9700 Thermocycler (Applied Biosystems) in a final volume of 20 μL containing 1X AmpliTaq Gold buffer (Life Technologies) with 0.25 μM of each primer, 2.5 mM of MgCl2, 0.8 μM of dNTPs, 0.25 U of AmpliTaq Gold polymerase (Life Technologies) and 2 μL of DNA. The reaction consisted of 10 min at 95°C, followed by 35 cycles of 30 s at 95°C (denaturation), 30 s at 56°C (primer annealing) and 60 s at 72°C (extension), followed by a final extension of 10 min at 72°C. A sample with a mixed genotype at one locus (sample181) was used in each PCR run as an internal control and to measure reproducibility.

Fragment processing and analysis
After amplification, 2 μL of PCR product was prepared for fragment analysis by the addition of 18 μL of formamide (3700 formamide, Life Technologies) and 1 μL of GeneScan-500 LIZ Size Standard (Life Technologies). Capillary electrophoresis was performed with the denaturing polymer POP-7 (Life Technologies) in a 50 mm capillary tube at 60°C. The lengths of the PCR fragments were determined on an ABI 3500 genetic analyzer with ABI GeneMapper v4.1 software (Life Technologies).

Minority allele detection
To test the limit of detection of multiple genotypes, two samples were selected that each contained a single allele per locus and had the same fungal load (same Cq ± 1), so that their peak intensities were in a 1:1 ratio. Various serial dilutions (1:1000, 1:100, 1:50, 1:20, 1:10 and 1:2) of each DNA sample were then tested, and the presence of the expected amplicons was analyzed in each mixture.
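The detection limit can be illustrated with a minimal Python sketch that expresses each peak as a ratio to the major peak of the same marker and keeps only alleles at or above a cutoff. The function names, the example peak heights and the use of 1:50 as a calling cutoff are assumptions made for illustration; they are not taken from the GeneMapper workflow used in the study.

```python
def minor_to_major_ratios(peaks):
    """Map each allele size (bp) to its peak height divided by the major peak height."""
    if not peaks:
        raise ValueError("no peaks for this marker")
    major_height = max(peaks.values())
    if major_height <= 0:
        raise ValueError("peak heights must be positive")
    return {size: height / major_height for size, height in peaks.items()}


def call_alleles(peaks, min_ratio=1 / 50):
    """Return allele sizes whose ratio to the major peak is >= min_ratio, major allele first.

    min_ratio defaults to 1:50 (2%); using it as a hard cutoff is an assumption.
    """
    ratios = minor_to_major_ratios(peaks)
    kept = [size for size, ratio in ratios.items() if ratio >= min_ratio]
    return sorted(kept, key=lambda size: peaks[size], reverse=True)


# Hypothetical peak heights (arbitrary fluorescence units) for one marker.
example_peaks = {144: 5000, 147: 250, 150: 40}
print(call_alleles(example_peaks))  # [144, 147]
```

In this hypothetical example the 147 bp peak (5% of the major peak) is retained, whereas the 150 bp peak (0.8%) falls below the 1:50 ratio and is ignored.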

Genotyping
Genotypes were determined if (i) each of the six markers was pure (no additional peak corresponding to a smaller or larger amplicon); or (ii) only one marker had multiple amplicons (one to three additional detectable peaks corresponding to a smaller or larger amplicon). In samples harboring mixtures at more than one marker, the genotypes already identified in the other samples (see above) (n = 61) were searched for. Deduced genotypes that were detected only in those samples were considered unsuitable for global analysis. In samples harboring several alleles at one locus, the major allele was recorded (entered first in the Tables).
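The assignment rule above can be sketched in Python as follows. This is one possible reading of the rule, assuming that a sample with a single mixed marker yields one genotype per allele detected at that marker; the data layout, the function name and the non-Gt21 allele size (221) are hypothetical.

```python
# Marker order used throughout this sketch (contig numbers of the six STRs).
MARKERS = ("022", "108", "138", "189", "278", "279")


def assign_genotypes(sample):
    """Return the genotypes that can be assigned to one sample.

    `sample` maps each marker to its list of allele sizes, major allele first.
    - all markers pure: one genotype
    - exactly one mixed marker: one genotype per allele at that marker
    - more than one mixed marker: no genotype assigned (empty list)
    """
    mixed = [m for m in MARKERS if len(sample[m]) > 1]
    if len(mixed) > 1:
        return []
    if not mixed:
        return [tuple(sample[m][0] for m in MARKERS)]
    locus = mixed[0]
    return [
        tuple(allele if m == locus else sample[m][0] for m in MARKERS)
        for allele in sample[locus]
    ]


# Pure sample carrying the Gt21 alleles reported in the Results.
pure = {"022": [144], "108": [138], "138": [169],
        "189": [219], "278": [189], "279": [190]}
# Same sample with a hypothetical second allele (221) at marker 189.
one_mixed = dict(pure, **{"189": [219, 221]})

print(assign_genotypes(pure))
print(assign_genotypes(one_mixed))
```

Under this reading, a sample can contribute more than one genotype, which is consistent with more genotypes than genotyped samples being reported.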

Samples and patients
A total of 106 respiratory samples defined as positive for P. jirovecii by quantitative real-time PCR [41] and harboring a high fungal load (mean quantification cycle = 25.46 ± 4.25) were selected from those collected between 1 January 2011 and 31 December 2013 [41]. DNA was extracted as described previously [41] and stored at -20°C. All samples were processed for diagnostic procedures with the patient's informed consent. The patients were cared for in three hospitals in the north of Paris. Demographic data and clinical variables including age, sex, underlying disease at the time of the BAL procedure and outcome at the last follow-up visit were recorded retrospectively from the electronic medical file. Underlying diseases were divided into four categories (HIV positivity, hematological malignancies, solid organ transplantation, others). Each hospital stay, as an inpatient or an outpatient, in addition to radiological and medical consultations, was recorded with electronic medical file software.

Ethics statement
This study was a non-interventional study with no change in the usual procedures. Biological material and clinical data were obtained only for standard diagnostic following physicians' prescriptions with no specific sampling. According to the French Health Public Law (CSP Art

Data analysis
Relatedness between the different genotypes was investigated by comparing allelic profiles with the minimum spanning tree (MStree) method (BioNumerics software v6.5, Applied Maths Inc., Austin, TX). Briefly, STRs were treated as multistate categories based on an infinite allele model (i.e., all changes are equally likely). Singletons were defined as genotypes that were not grouped into clonal complexes, i.e. they had at least two allelic mismatches with any other genotype. The number of repeat differences between ancestral and derived alleles was computed for each link of one mismatch along the MStree. The classical criterion of one allelic mismatch to group genotypes in clonal complexes was used [42]. Discriminatory power was calculated with Simpson's diversity index (D) as described previously [31,43], taking into account only one sample per genotype per patient (samples harboring a genotype already recovered once in a given patient were excluded) such that only independent samples were retained [43].
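The grouping criterion described above (one allelic mismatch to join a clonal complex, at least two mismatches with every other genotype for a singleton) can be sketched in Python. This is not the BioNumerics minimum spanning tree algorithm; it is a simplified single-linkage grouping with a union-find structure, and all names and example genotypes other than the Gt21 alleles are hypothetical.

```python
from itertools import combinations


def mismatches(a, b):
    """Number of markers at which two genotypes carry different alleles
    (categorical comparison: any size difference counts as one mismatch)."""
    return sum(x != y for x, y in zip(a, b))


def clonal_complexes(genotypes):
    """Group distinct genotypes linked by chains of single allelic mismatches.

    Returns (complexes, singletons): complexes are lists of two or more genotypes,
    singletons differ from every other genotype at two or more markers.
    """
    parent = list(range(len(genotypes)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(genotypes)), 2):
        if mismatches(genotypes[i], genotypes[j]) == 1:
            parent[find(i)] = find(j)

    groups = {}
    for i, genotype in enumerate(genotypes):
        groups.setdefault(find(i), []).append(genotype)
    complexes = [g for g in groups.values() if len(g) > 1]
    singletons = [g[0] for g in groups.values() if len(g) == 1]
    return complexes, singletons


gt_a = (144, 138, 169, 219, 189, 190)  # Gt21 alleles from the Results
gt_b = (144, 138, 169, 221, 189, 190)  # hypothetical single-locus variant of gt_a
gt_c = (144, 138, 172, 221, 189, 190)  # hypothetical single-locus variant of gt_b
gt_d = (141, 135, 166, 215, 187, 184)  # hypothetical unrelated genotype
print(clonal_complexes([gt_a, gt_b, gt_c, gt_d]))
```

Note that gt_a and gt_c differ at two markers but still fall in the same complex, because each has a single mismatch with gt_b.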
Correlation with clinical data was performed with only one sample per patient. Statistical analysis was performed with Prism v5.0 (GraphPad Software, San Diego, CA).

STR-based method
A total of 106 P. jirovecii PCR-positive samples from 91 patients were genotyped at our six loci. Allelic diversity varied between markers in our population: STRPj_3_022, STRPj_3_108 and STRPj_2_278 showed low diversity in allele size, whereas STRPj_3_138, STRPj_2_189 and STRPj_3_279 showed high diversity (Fig 1, Table 1). We tested the ability of our assay to detect multiple alleles for each marker in one sample. The lowest ratio at which the minor allele was still detected was 1:50 (2%) (data not shown). In the internal control sample tested nine times in nine PCR runs, the percentage of the minority allele compared to the major allele varied from 2.3 to 9.4% (S1 File).
The P. jirovecii fungal load determined from the quantification cycle (Cq) using mtLSU diagnostic qPCR was not significantly different in unique and multiple genotype samples (p = 0.45), suggesting that the ability to detect mixtures was not dependent on fungal load.

Analysis of iterative samples in patients
Thirty repeat respiratory samples (BAL and/or induced sputum) from 15 patients (two per patient) were available. These samples were taken at intervals ranging from 0 to 37 days (Table 3). At least one allele was shared by the two samples in all patients and for all markers, except in two patients who had one marker that showed a contraction or expansion of one repeat between samples (Patient 21, marker#189; Patient 23, marker#278).

Correlation with patient data
Among the 91 patients, 41 (45.1%) were HIV-positive, 25 (27.5%) had hematological malignancies, 14 (15.4%) were renal transplant recipients, and 11 (12.1%) had other immunodeficiencies (solid tumors, immunosuppressive therapy or constitutive immune defects). All patients were cared for in three healthcare facilities: Saint-Louis (n = 77, 84.6%), Lariboisière (n = 11, 12.1%), and Robert Debré (n = 3, 3.3%) hospitals. The male:female ratio was 2.3 and the median age was 52 years [range .
Samples with multiple alleles at several loci were excluded from genotyping to avoid misassigning a combination of alleles to a given genotype, which would lead to artifactual diversity. A total of 61 genotypes were detected in 55 (51.9%) samples (54
We analyzed the distribution of each allele of the six markers in each disease group, taking into account every allele detected in all samples (unique or multiple alleles for each marker) and the proportion of each allele for each marker in each disease group. The distribution of marker#022, #108 (p = 0.002), #138 (p = 0.040), #189 (p = 0.001), and #279 (p<0.0001) differed significantly between groups (S1 Table). Allele 144 in marker#022 (p<0.0001), allele 138 in marker#108 (p = 0.049), allele 169 in marker#138 (p = 0.015), allele 219 in marker#189 (p<0.0001) and allele 190 in marker#279 (p<0.0001) were more frequently observed in renal transplant patients than in other patients. Of note, allele 144 in marker#022 was less frequently observed in HIV patients (p = 0.0003) than in other patients. Kidney transplant patients tended
When multiple alleles were detected, the major allele is written first.
* gain or loss of allele between the two samples.
** allele replacement between the two samples.
Sex and hospital were not associated with the five genotype categories obtained in the MStree (clusters 1, 2, 3, 4, and the singletons) (data not shown). However, underlying diseases were not evenly distributed in the five genotype categories (p = 0.0008). A higher proportion of samples from kidney transplant recipients (9/
Genotype 21 (Gt21), composed of alleles 144, 138, 169, 219, 189, 190 in marker#022, #108, #138, #189, #278 and #279, respectively, was found in 8/54 (14.8%) patients of this dataset. In addition, the corresponding alleles were found in three samples (two patients) harboring multiple genotypes, suggesting transmission to these patients (Patients07 and 09). We investigated these 10 patients (six renal transplant recipients, two with hematological malignancies and two others) to search for potential transmission events and recorded their presence in the hospital before and at the time of PCP diagnosis (Table 4).
We discovered several instances where patients could have met, potentially leading to transmission. In particular, three groups of patients were temporally linked (Patients01-03, Patients04-06 and Patients07-10). Many patients were present in the same place on several occasions before and during the PCP episode (Fig 3). Patient01 was the index case that initiated the transmission chain. Patient01 and Patient03 had the first recorded cases of PCP among these 10 patients. Patient03 also potentially played a central role in the transmission, with links to Patient04 and Patient07 (red bars). Patient04 was a long-term carrier (705 days between exposure and PCP) with putative transmission to Patient05. The median time between putative exposure and PCP was 197 days (range: 42-705 days).

Discriminatory power of the assay
Simpson's index of diversity (D) was calculated for marker sets ranging from the three markers harboring the most allelic diversity to the entire set of six markers. Given the potential transmission of Gt21 in cluster 2, calculations were performed with the whole data set but also after the removal of potentially linked cases, i.e. cluster 2 (43 patients) and renal transplant samples from cluster 2 (45 patients) (Table 5). Six markers gave a better D index than three to five markers (Table 5). The D index was 0.985 for the six markers calculated from the whole population (54 patients) (Table 5). A higher D index (D = 0.992) was obtained with the six markers when epidemiologically linked samples were excluded (Table 5).
Minimum spanning tree analysis of 61 genotypes from 55 samples harboring a unique genotype (one allele per marker) or multiple genotypes (multiple alleles in one marker). The number of allelic mismatches among STR profiles was used as distance. Each circle corresponds to one genotype (Gt), with its arbitrary number indicated next to it. The size of the circle is correlated with the number of isolates possessing the corresponding Gt, from one (smallest circle) to nine (Gt21). Dark, dashed and thin connecting bars correspond to one, two or more than two markers differing between linked genotypes. Gray zones surrounding some groups of circles indicate that these profiles belong to the same genetic cluster, meaning that they have a single allelic mismatch with at least one other member of the group. Cluster 2, which was significantly associated with renal transplant recipients, is shown by a dashed line. The color of the circles indicates the underlying disease of the patient in whom this specific genotype was recovered (green, HIV patient; red, hematology patient; purple, renal transplant recipient; yellow, other cause of immunosuppression).
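As a rough illustration of how the discriminatory power reported in this section can be computed, the Python sketch below assumes the Hunter-Gaston formulation of Simpson's index of diversity commonly used for typing methods, D = 1 - [Σ n_j(n_j - 1)] / [N(N - 1)], where N is the number of independent samples and n_j the number of samples carrying genotype j. The function name and the example labels are hypothetical, and the study's own calculation may differ in detail.

```python
from collections import Counter


def simpson_diversity(genotypes):
    """Hunter-Gaston form of Simpson's index of diversity.

    `genotypes` holds one genotype label (or allele tuple) per independent sample.
    Returns the probability that two samples drawn at random without replacement
    carry different genotypes.
    """
    n = len(genotypes)
    if n < 2:
        raise ValueError("at least two samples are required")
    counts = Counter(genotypes)
    same_pairs = sum(c * (c - 1) for c in counts.values())
    return 1 - same_pairs / (n * (n - 1))


# Hypothetical toy data: four samples, two of which share a genotype.
print(round(simpson_diversity(["Gt21", "Gt21", "Gt1", "Gt3"]), 3))  # 0.833
```

Restricting each genotype tuple to a subset of markers before calling the function would give the corresponding index for three, four or five markers.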

Discussion
Here, we described a short tandem repeat (STR)-based genotyping assay including six markers. One marker (STRPj_2_278) has already been used (as PjMS5) in a recent publication [44]. The six markers are located in several contigs within genes (exonic, intronic) or at intergenic locations. Alignment to the P. murina genome suggests that our markers are located on six different chromosomes (Table 1). Exonic and intronic markers are expected to have lower allelic variability than intergenic markers because these regions are under constraint [45]. However, in our case, each marker harbored allelic variability regardless of its location with respect to a gene. The assay is convenient and is based on a single PCR run. It also has high discriminatory power and enables the detection of multiple genotypes down to a ratio of 1:50 (2%), which is more sensitive than Sanger sequencing [29] and SSCP [46].
The major allele is written first for samples with multiple alleles. Bold numbers show alleles from Gt21. 21*, 21** are samples harboring multiple genotypes in which all alleles from Gt21 were present in addition to other deduced genotypes. In samples 109 and 110 (21*), alleles corresponding to Gt21 were minority alleles (lower intensity of the peak of the Gt21 allele compared to the peak of the other allele) in all mixed markers. In samples 231 and 240 (21**), alleles corresponding to Gt21 were major alleles (higher intensity of the peak of the Gt21 allele compared to the peak of the other allele). doi:10.1371/journal.pone.0125763.t004
We established the stability of our six markers by analyzing samples taken from the same patient: in 13/15 patients, the same alleles were repeatedly found in 6/6 markers in samples taken between 0 and 14 days apart and, in all 15 patients, in 5/6 markers in samples taken between 0 and 37 days apart. The analysis revealed the occurrence of allele replacement or modification (contraction or extension of 1 to 6 repeats for a given marker at 1 or 2 loci) for 4/15 patients, suggesting microevolution of P. jirovecii during infection (over a short period of time, up to 37 days). This reveals rapid variation of these parts of the genome due to their repetitive nature. Indeed, infection is associated with fungal proliferation and modification, and the number of repeat units could be easily altered (gain or loss). Unfortunately, the stability of each selected marker, as was reported for STRAf in Aspergillus fumigatus [31], cannot be assessed experimentally in humans during infection or in vitro with long-term cultures of P. jirovecii. However, the culture system published recently [24] or a murine model of Pneumocystis pneumonia with P. murina could be used to test the stability of the markers and to study the plasticity of P. jirovecii nuclear and mitochondrial genomes. Comparison of STR-based with sequence-based typing methods will also bring new arguments to assess the level of stability of these STR markers.
In our study, about 70% of samples harbored multiple genotypes. Sanger sequencing-based methods are less sensitive and detected multiple genotypes in only about 30% of samples [28,47]. A high number of multiple genotypes was also observed with SSCP studies in HIV patients [48] or with an STR-based method (70% of patients) [44], suggesting that different strains of P. jirovecii are constantly transmitted between humans throughout life [1,2]. Recovery of unique versus multiple genotypes was not significantly associated with underlying disease, although samples from renal transplant patients tended to harbor a unique genotype, thus reinforcing the hypothesis of transmission between these patients, as already described in the literature [18,19,22].
The ability to detect multiple genotypes makes the data difficult to interpret but enables the identification of minor alleles. Association of major and minor alleles has been used to determine genotypes using microsatellite [44] and SSCP [46] typing methods.
From a technical point of view, we think that PCR amplification of STR markers could introduce variation in the allele ratio in mixed samples. Indeed, a variation (2.3 to 9.4%) was observed in the proportion of the minority allele compared to the major allele. Consequently, it is difficult to be confident in the results when determining the different genotypes in complex allele mixtures (multiple alleles at more than one marker) including various minority alleles, unless one knows precisely what to look for, as was the case for Gt21. We searched for each of the 61 genotypes, determined in samples that were pure or mixed at one locus, in the mixed samples harboring two or three alleles at two to six loci. The Gt21 alleles were observed in four samples containing multiple genotypes, thus demonstrating potential transmission because the patients formed part of a transmission route. In our case, Gt21 alleles represent either the minority allele in all markers (sample109 and 110) or the majority allele in all markers (sample231 and 140), as indicated in Table 4. The alleles of the Gt1 and Gt3 genotypes were found in 8 and 9 mixed samples, respectively, but no epidemiological link was ultimately found between the corresponding patients. In addition, during the analysis of repeat samples from patients harboring multiple genotypes (Table 3), a change in the ratio of the alleles was observed (depicted as ***) in samples obtained 1 day apart (Patient7 and 21). Therefore, we decided not to determine genotypes in samples harboring multiple alleles at more than one marker but only to search for and analyze already described genotypes. The distribution of the alleles of each marker differed according to the patients' underlying diseases. Notably, one allele of 5/6 markers of Gt21 was associated with renal transplant patients. The same analysis, after the removal of samples harboring Gt21, showed that one allele of only 2/6 markers was significantly associated with renal transplant patients.
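The search for an already described genotype in a complex mixture can be sketched as follows in Python. The sketch assumes that a known genotype is considered present when its allele is detected at every marker, and it reports whether those alleles are consistently the major or the minority allele at the mixed markers. The data layout, the function names and the non-Gt21 allele sizes are hypothetical.

```python
# Gt21 alleles as reported in the Results, keyed by marker (contig number).
GT21 = {"022": 144, "108": 138, "138": 169, "189": 219, "278": 189, "279": 190}


def genotype_present(genotype, sample):
    """True if every allele of a known genotype is detected in the sample.

    `sample` maps each marker to its detected allele sizes, major allele first.
    """
    return all(genotype[marker] in sample[marker] for marker in genotype)


def allele_role(genotype, sample):
    """Classify the known genotype within a mixed sample.

    Returns None if the genotype is absent, 'major' if its allele is the major
    one at every mixed marker, 'minor' if it is never the major one at mixed
    markers, and 'inconsistent' otherwise.
    """
    if not genotype_present(genotype, sample):
        return None
    mixed = [m for m in genotype if len(sample[m]) > 1]
    is_major = [sample[m][0] == genotype[m] for m in mixed]
    if all(is_major):
        return "major"
    if not any(is_major):
        return "minor"
    return "inconsistent"


# Hypothetical mixed sample: Gt21 alleles are the minority allele at three markers.
mixed_sample = {"022": [147, 144], "108": [138], "138": [172, 169],
                "189": [219], "278": [189], "279": [187, 190]}
print(genotype_present(GT21, mixed_sample), allele_role(GT21, mixed_sample))
```

A consistent role across markers, as described for Gt21 in the text, makes the deduced genotype more plausible than a match in which the alleles alternate between major and minority peaks.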
We found Gt21 in 10 patients and built a putative transmission map. Other studies have demonstrated transmission within particular hospital environments (pediatric transplant unit, waiting room for renal transplant patients) [18,19,22]. Previous outbreaks have been suspected in our hospital [21,49], but this particular one was clinically unrecognized. However, the concomitant presence of some patients in the hospital on the same day prompted us to draw a transmission map suggesting possible transmission between these patients. Transmission could have occurred through encounters in various places (radiology room, cafeteria, corridors, hall) or through a third party (medical staff). Indeed, the possible role in transmission of an immunocompetent third party, such as carrier patients who did not develop PCP [16], family members or medical staff, has already been suggested in other outbreaks [14,20,50].
Among the five genotype categories defined for the MStree analysis, the major one, Cluster 1, included patients from all underlying disease groups and all geographical origins. Branches corresponding to patients with the same underlying disease (i.e. transplant recipients, hematological malignancy) were observed, suggesting an epidemiological link between the corresponding patients, which we failed to uncover. Cluster 2 was mostly composed of samples from renal transplant patients of various geographical origins. Based on the assumption that isolates from different origins exhibit different genotypes, our results confirm those of previous studies [2] suggesting that P. jirovecii does not reactivate the strain acquired during a primary infection, in contrast with cryptococcosis for example [51]. However, Parobek et al. have shown that Pneumocystis isolates were genetically close, despite having been recovered from highly distant geographical areas [44], suggesting a limited diversity of the alleles of their markers. The easiest way to demonstrate reactivation is to study the genotype of patients who migrated from their geographical origin to another country. However, if allele diversity is low even across various geographical places, typing methods based on STRs, in particular the assay published by Parobek et al., would not be suitable for investigating the reactivation hypothesis in this way. In addition, BAL fluid samples would not be the best specimen to investigate reactivation since various genotypes could be recovered from various areas of the lungs. Induced sputa could better reflect the diversity of the genotypes that could be recovered from the whole lungs [52].
Singletons, i.e. samples with no genetic link with other genotypes recovered in the hospital were mostly composed of samples from HIV-positive patients from various geographical regions. PCP accounted for 32% of opportunistic infections in HIV-positive individuals in 2009 [53]; therefore, it is possible that these patients were not epidemiologically linked to the hospital before they were diagnosed with PCP. However, clinical criteria like (i) pneumocystosis, (ii) the number of episodes of PCP or (iii) previous visits as an outpatient before PCP, were not associated with the genotype categories or with specific alleles.
It is not possible to assess the discriminatory power of an assay with samples potentially involved in transmission, i.e. non-independent strains. Indeed, Simpson's index of diversity (D) cannot be calculated with samples that are linked [43]. However, transmission is especially difficult to rule out with P. jirovecii strains within a hospital given the continual flux of chronically ill patients with a high risk of transmission. The genotype involved in the putative transmission in our hospital aggregated in cluster 2; therefore, we calculated the D index after the exclusion of samples from renal transplant recipients recovered from cluster 2. The resulting D index was then higher than that calculated with the whole population, but was still lower than that obtained by Parobek et al. (D ≥ 0.999). This could be due to other unrecognized transmission events in our hospital, the limited diversity of P. jirovecii recovered from a relatively small area (Paris), or the use of six markers instead of nine. Alternatively, the diversity calculated by Parobek et al. may have been artificially high because of the potentially incorrect genotype assignment of samples harboring mixed alleles.
In conclusion, the assay described here is easy to perform. It should prove useful to investigate outbreaks in a hospital setting and should improve the understanding of the pathophysiology of Pneumocystis infections.
Supporting Information
S1 File. Intensity of the peaks for each marker tested for the internal control run nine times (Pj_SLS_181). (XLSX)
S2 File. Alleles of the 61 genotypes recovered in pure or mixed (more than one allele in one marker) samples. (XLSX)
S3 File. Alleles detected for each marker for the 48 samples harboring more than one allele in more than one locus. The 61 already determined genotypes were reported and counted in these samples. (XLSX)
S1 Table. Distribution of the different alleles of each marker in the different groups of diseases. (DOCX)