The Conservation and Application of Three Hypothetical Protein Coding Gene for Direct Detection of Mycobacterium tuberculosis in Sputum Specimens

Background Accurate and early diagnosis of tuberculosis (TB) is of major importance in the control of TB. One of the most important technical advances in TB diagnosis is the development of nucleic acid amplification (NAA) tests. However, the choice of target sequence for NAA tests remains controversial. Recently, interesting alternatives have been found among the hypothetical protein coding sequences of the mycobacterial genome.
Methodology/Principal Findings To obtain a rational biomarker for TB diagnosis, the conservation of three hypothetical genes was first evaluated in 714 mycobacterial strains. SCAR1 (Sequence Characterized Amplified Region), based on the Rv0264c coding gene, showed the highest conservation (99.8%) in M. tuberculosis (MTB) strains, and SCAR2, based on the Rv1508c gene, showed the second highest (99.7%). SCAR3, based on the Rv2135c gene, and IS6110 showed relatively high deletion rates in MTB strains (3.2% and 8%, respectively). Second, the three SCAR markers were evaluated in 307 clinical sputum specimens from patients in whom TB was suspected or patients with diseases other than TB. Amplification of the IS6110 and 16S rRNA sequences, together with clinical and bacteriological identification, served as the reference protocol for evaluating the efficacy of the SCAR markers. The sensitivities, specificities, positive predictive values (PPV) and negative predictive values (NPV) of all NAA tests were higher than those of bacteriological detection. Among the four NAA tests, IS6110 and SCAR3 showed the highest PPV (100%) but low NPV (70% and 68.8%, respectively), whereas SCAR1 and SCAR2 showed relatively high PPV and NPV (97% and 82.6% for SCAR1; 95.6% and 88.8% for SCAR2).
Conclusions/Significance Our results indicate that SCAR1 and SCAR2, with their high degree of sequence conservation, represent efficient and promising alternatives as NAA test targets for the identification of MTB. Moreover, the targets developed in this study may provide additional options for the development of a multisite system to detect MTB effectively in samples.


Introduction
Tuberculosis has remained a significant threat to human health for thousands of years. Even today, infection with the etiologic agent, Mycobacterium tuberculosis (MTB), is associated with extraordinarily high rates of morbidity and mortality worldwide. Accurate and early detection of pathogenic mycobacteria in clinical samples is of major importance in the control of human tuberculosis. Delay in diagnosis results in late initiation of therapy and prolonged transmission. Conventional diagnosis of tuberculosis relies mainly on acid-fast staining of bacilli smears and mycobacterial culture of sputa from TB patients. However, bacteriological diagnosis is time-consuming, because culturing M. tuberculosis can take 4 to 8 weeks, and direct staining and microscopy of clinical specimens lack sensitivity and specificity [1].
More recently, nucleic acid amplification (NAA) tests based on different PCR technologies have been reported for the successful identification of MTB directly from clinical samples such as sputum, blood, urine, and cerebrospinal or pleural fluid [2][3][4]. These nucleic acid-based methods have been found to be more sensitive than conventional methods and are able to detect even small numbers of MTB cells [1,5]. At present, a number of NAA tests have been designed to detect MTB [6][7][8][9][10]. However, in-house tests vary widely in their accuracy, and clinical studies evaluating the various amplification assays have reported divergent specificities and sensitivities [7,8]. Meta-analyses of the factors affecting NAA test accuracy showed that both the amplified target and the protocol used to amplify mycobacterial DNA from samples are significantly associated with accuracy [11,12]. While protocols have been greatly optimized, the choice of target sequence remains controversial. The absence of the target from some strains may be one of the main reasons for the lower and highly variable detection accuracy seen in clinical diagnosis [13,14]. A possible solution is to identify more effective targets or to use more than one target to increase detection accuracy.
The genome of MTB (H37Rv) contains 4,019 protein coding genes, of which more than a thousand have been categorized as being of unknown function or are classified as conserved hypothetical proteins [15]. Although these hypothetical protein coding genes have been little studied in TB diagnosis, some have been used as DNA target sequences for the detection of MTB [8,16]. In this study, we first evaluated the conservation of three hypothetical genes, which code for the conserved hypothetical proteins Rv0264c, Rv1508c, and Rv2135c, in clinical strains of MTB, and then tested their application to the direct detection of MTB in sputum specimens.

Ethics Statement
All patients were treated in accordance with the Helsinki Declaration on the participation of human subjects in medical research. Ethics approval for this study was obtained from the Shanghai Pulmonary Hospital Ethics Committee (permit number: 2011-fk-03). Written informed consent was obtained from each participant.

[...], and 102 NTM isolates (isolated from Guangzhou Chest Hospital). All strains were grown in Sauton culture medium (0.5 g/L KH2PO4, 0.5 g/L MgSO4·7H2O, 2 g/L citric acid, 0.05 g/L ferric ammonium citrate, 4.0 g/L L-asparagine, 6% glycerol and 0.02% Tween 80). Cells were sterilized at 80 °C for 30 min and harvested by centrifugation (12,000 × g for 5 min). The bacterial pellet was washed three times with sterilized saline and re-collected by centrifugation (12,000 × g for 10 min each time). Aliquots of the bacterial pellet were frozen at −20 °C until they were processed for PCR.

Sputum Samples and Preparation
A total of 307 clinical sputum specimens were obtained from patients with respiratory symptoms at Shanghai Pulmonary Hospital (Shanghai, PR China) to test the clinical applicability of the three coding genes as target DNA in TB diagnosis. These comprised 203 samples from patients in whom TB was suspected and 104 specimens from patients with diseases other than TB. The diagnosis of TB was made on the basis of (i) TB symptoms and signs; (ii) TB lesions detected by chest X-ray or computed tomography scan; (iii) tuberculin skin test (0.1 to 100 tuberculin units); (iv) symptom relief following anti-TB treatment. Sputum samples were decontaminated by the standard protocol with N-acetyl-L-cysteine-2% NaOH and were concentrated by centrifugation at 3,000 × g for 20-30 min [17]. The sediment was resuspended in 2 mL of supernatant.

Conventional Identification
All sputum specimens were processed for direct smear examination by fluorescence microscopy after auramine-rhodamine staining and were cultured on three Loewenstein-Jensen slants. Mycobacterial cultures were incubated at 37 °C for 8 weeks and were examined weekly for growth. Positive slides were confirmed by Ziehl-Neelsen staining. Bacterial colonies were identified as M. tuberculosis or other species of mycobacteria both by 16S rRNA sequencing [18] and by conventional methods, including growth characteristics, pigment production, colony morphology, and growth in the presence of PNB (p-nitrobenzoic acid) and TCH (thiophene-2-carboxylic acid hydrazide). The methods used in our laboratory were in accordance with those described in the Chinese Laboratory Science Procedure of Diagnostic Bacteriology in Tuberculosis. Aliquots of the resuspended sputum sediments were frozen at −20 °C until they were processed for PCR.

Extraction of DNA
A total of 50 µL of resuspended bacterial pellet or 100 µL of resuspended sputum sediment was treated with DNA lysis buffer (10 mmol/L NaCl, 1 mg/mL SDS, 15% Chelex-100, 1% Tween 20). The mixture was incubated at 50 °C for 1 h, followed by 100 °C for 10 min, and then centrifuged (5,000 × g for 10 min) to obtain the aqueous phase containing genomic DNA, which was used for PCR amplification.
The PCR reaction mixtures (20 µL) consisted of 1× Taq PCR buffer, 2.5 mmol/L MgCl2, 0.2 mmol/L dNTP, 0.2 µmol/L primer 1, 0.2 µmol/L primer 2, 1 U Taq DNA polymerase, and 10 ng of template DNA. The reaction conditions were as follows: 94 °C for 5 min, followed by 35 cycles of 94 °C for 1 min, the annealing temperature (Table 2) for 30 s, and 72 °C for 1 min, with a final extension at 72 °C for 10 min. All PCR products from the 714 mycobacterial strains were electrophoresed on 2% agarose gels with ethidium bromide staining. PCR products of the SCAR markers from randomly selected strains were sequenced to exclude false positives. For clinical strains in which the NAA test results and the traditional bacteriological examination disagreed, the PCR products of the 16S rRNA gene were sequenced. The 570 bp 16S rRNA sequence was compared with those deposited in the GenBank database using the Basic Local Alignment Search Tool (http://www.ncbi.nlm.nih.gov/BLAST/). An identity of 99% was used to define a specific species.

Real-time PCR in Sputum Specimens
The application of the three coding genes to the direct detection of MTB in sputum specimens was analyzed on a 7500 real-time PCR system (Applied Biosystems). To 1 µL of DNA, we added 12.5 µL of PCR master mix (QuantiTect SYBR green PCR buffer (Tris-Cl, KCl, (NH4)2SO4, 5 mmol/L MgCl2 (pH 8.7)), dNTP, SYBR green I, and ROX), 0.5 µL of forward primer (10 pmol/µL), 0.5 µL of reverse primer (10 pmol/µL), and 10.5 µL of ddH2O. Standard curves of quantified and diluted PCR products, as well as negative controls, were included in each PCR run.
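The recipe above lends itself to a small scaling calculation when many sputum samples are run together. The sketch below is only illustrative: it assumes the listed volumes are in microlitres (the usual scale for a real-time PCR reaction), that template DNA is added to each well separately, and that a 10% pipetting overage is wanted. The overage and the function names are assumptions, not part of the published protocol.

```python
# Per-reaction volumes in microlitres, as listed in the text
# (assumed to be microlitres; template DNA is pipetted per well).
PER_REACTION_UL = {
    "SYBR green PCR master mix": 12.5,
    "forward primer (10 pmol/uL)": 0.5,
    "reverse primer (10 pmol/uL)": 0.5,
    "ddH2O": 10.5,
}
TEMPLATE_UL = 1.0


def final_reaction_volume():
    """Total volume of one reaction, including the template DNA."""
    return sum(PER_REACTION_UL.values()) + TEMPLATE_UL


def pooled_master_mix(n_reactions, overage=0.10):
    """Volumes (uL) to pool for n reactions plus a fractional overage.

    The 10% default overage is a hypothetical lab convention.
    """
    if n_reactions < 1 or overage < 0:
        raise ValueError("need at least one reaction and a non-negative overage")
    factor = n_reactions * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in PER_REACTION_UL.items()}


if __name__ == "__main__":
    print("final volume per reaction (uL):", final_reaction_volume())
    for name, vol in pooled_master_mix(10).items():
        print(f"{name}: {vol} uL")
```

With these assumptions each reaction has a final volume of 25 µL, of which 24 µL is dispensed from the pooled mix before the template is added.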

Amplification of DNA from Mycobacterium Strains by PCR
Chromosomal DNA extracted from all 714 mycobacterial strains, comprising 21 reference strains and 693 clinical strains identified by traditional bacteriological tests, was amplified with Mycobacterium genus-specific primers (F1, R1) targeting a 16S rRNA gene fragment, MTC-specific primers (IS52, IS57) derived from the IS6110 insertion sequence, and the SCAR markers developed from the three coding genes Rv0264c, Rv1508c and Rv2135c. The results are summarized in Table 2. All 714 mycobacterial strains produced the 570 bp 16S rRNA amplicon. All 121 NTM strains (19 NTM reference strains and 102 NTM clinical isolates) were negative by IS6110-PCR and by amplification of all three SCAR markers. In the 590 MTB clinical strains, the positive rates of the IS6110, SCAR1, SCAR2 and SCAR3 markers were 92% (543/590), 99.8% (589/590), 99.7% (588/590), and 96.8% (571/590), respectively. The deletion rate of the IS6110 sequence was the highest among the four NAA targets. SCAR1 and SCAR2 showed relatively high conservation in MTB strains. Although SCAR3 showed a relatively high deletion rate, its conservation in MTB strains was still higher than that of the IS6110 sequence. The other three MTC strains were all positive by IS6110, SCAR1 and SCAR3 amplification. SCAR2-PCR was positive in M. africanum but negative in M. bovis and BCG, suggesting that Rv1508c may be absent from M. bovis.
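The conservation figures quoted above are simple proportions of the 590 MTB clinical strains, and the deletion rate is the complement of the positive rate. The following sketch recomputes them. The counts for IS6110, SCAR2 and SCAR3 are those given in the text; the SCAR1 count is taken as 589, the value consistent with the reported 99.8%, which is an assumption of this example. The function names are illustrative.

```python
N_MTB_STRAINS = 590

# Number of MTB clinical strains giving a positive PCR for each target.
# SCAR1 = 589 is assumed here because it matches the reported 99.8%.
POSITIVE_COUNTS = {
    "IS6110": 543,
    "SCAR1": 589,
    "SCAR2": 588,
    "SCAR3": 571,
}


def positive_rate(n_positive, n_tested):
    """Percentage of tested strains in which the target was amplified."""
    if n_tested <= 0 or not 0 <= n_positive <= n_tested:
        raise ValueError("counts must satisfy 0 <= n_positive <= n_tested, n_tested > 0")
    return 100.0 * n_positive / n_tested


def deletion_rate(n_positive, n_tested):
    """Percentage of tested strains in which the target was not amplified."""
    return 100.0 - positive_rate(n_positive, n_tested)


if __name__ == "__main__":
    for target, count in POSITIVE_COUNTS.items():
        pos = positive_rate(count, N_MTB_STRAINS)
        dele = deletion_rate(count, N_MTB_STRAINS)
        print(f"{target}: positive {pos:.1f}%, deletion {dele:.1f}%")
```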

Culture and Direct Microscopy
All 307 samples were examined by Ziehl-Neelsen staining and by culture on Loewenstein-Jensen medium. Acid-fast bacilli were detected in 138 samples after Ziehl-Neelsen staining. Growth on Loewenstein-Jensen medium was detected in 151 samples, of which 129 were identified as M. tuberculosis, 15 were identified as nontuberculous mycobacterial species (2 M. kansasii, 2 M. chelonae-abscessus, 10 M. intracellulare, 1 M. avium), and 7 were identified as pathogenic bacteria other than mycobacteria (Table 3).

Comparison of Real-time PCR in Sputum Specimens with Bacteriological and Clinical Data
A total of 307 samples were examined by routine microbiology laboratory techniques and by PCR using the IS6110 sequence and the Rv0264c, Rv1508c and Rv2135c genomic fragments as NAA test targets. The results are summarized in Table 4.
In groups I, II, and III, a definitive diagnosis of TB was made on the basis of clinical data. In group I, 101 samples were both smear-positive and culture-positive, and all were also positive in SCAR1 and SCAR2 amplification; the positive rates of the IS6110 and SCAR3 markers in these 101 samples were 95% (96/101) and 90.1% (91/101), respectively. Another 44 samples from group I were either smear-positive but culture-negative (n = 23) or smear-negative but culture-positive (n = 21); the positive rates in these samples were 56.8%, 80%, 88.6%, and 59.1% for IS6110, SCAR1, SCAR2 and SCAR3 PCR amplification, respectively. The samples from group II (n = 43) were negative in both smear and culture, and their positive rates in PCR amplification were relatively low (37.2% in IS6110-PCR, 65.1% in SCAR1-PCR, 76.7% in SCAR2-PCR, 40% in SCAR3-PCR).
Group III (Table 3) included 15 specimens from patients showing clinical manifestations of a mycobacterial infection. All of these specimens were culture-positive, and 66.7% (10/15) were smear-positive. However, all of them were negative in both IS6110 and SCAR3 amplification; two samples were positive in SCAR1-PCR and one sample was positive in SCAR1-PCR. These samples were diagnosed as infections with mycobacteria other than M. tuberculosis by conventional methods and 16S rRNA sequencing.
Finally, of the 104 samples from patients without mycobacterial infection, 93 were negative in smear, culture, and PCR amplification, as expected (group IV, Table 3). Of the 11 remaining specimens, 7 were culture-positive but smear- and PCR-negative (group V, Table 3) and were identified by 16S rRNA sequencing as pathogenic bacteria other than mycobacteria, and 4 were SCAR1-PCR or SCAR2-PCR positive but negative in 16S rRNA-PCR/IS6110-PCR and in culture/smear (group VI, Table 3).

Sensitivity and Specificity
After resolution of discrepant results by comparison of the clinical and bacteriological criteria and the patient history, a total of 188 specimens were considered true positives for TB infection (groups I and II) and 119 samples were considered true negatives for TB disease (the 104 specimens in groups IV, V and VI, and the 15 specimens in group III that contained M. kansasii, M. chelonae-abscessus, M. intracellulare, or M. avium). The sensitivities and specificities of the PCR assays, staining, and culture were evaluated against the clinical assessment of the patients and are summarized in Table 4. Among the four NAA tests, the positive predictive value (PPV) and negative predictive value (NPV) were 100% and 70% for IS6110, 97% and 82.6% for SCAR1, 95.6% and 88.8% for SCAR2, and 100% and 68.8% for SCAR3. Although IS6110 and SCAR3 showed the highest PPV, they showed the lowest NPV of the four PCR assays.
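For readers who want to retrace how the predictive values follow from the resolved classification (188 true TB specimens, 119 non-TB specimens), the sketch below computes sensitivity, specificity, PPV and NPV from a 2×2 table. The per-marker counts are not stated directly in the text: they are reconstructed here from the reported group-level rates (for IS6110, 96/101, about 56.8% of 44, and about 37.2% of 43; for SCAR3, 91/101, about 59.1% of 44, and about 40% of 43), with zero false positives inferred from the reported PPV of 100%. They should therefore be read as assumptions, with Table 4 remaining the authoritative source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts of a diagnostic test against the resolved reference diagnosis."""
    tp: int  # test positive, TB
    fp: int  # test positive, not TB
    fn: int  # test negative, TB
    tn: int  # test negative, not TB

    @staticmethod
    def _pct(numerator, denominator):
        if denominator == 0:
            raise ValueError("undefined: denominator is zero")
        return 100.0 * numerator / denominator

    def sensitivity(self):
        return self._pct(self.tp, self.tp + self.fn)

    def specificity(self):
        return self._pct(self.tn, self.tn + self.fp)

    def ppv(self):
        return self._pct(self.tp, self.tp + self.fp)

    def npv(self):
        return self._pct(self.tn, self.tn + self.fn)


# Reconstructed counts (assumptions, see the paragraph above).
IS6110 = TwoByTwo(tp=96 + 25 + 16, fp=0, fn=188 - (96 + 25 + 16), tn=119)
SCAR3 = TwoByTwo(tp=91 + 26 + 17, fp=0, fn=188 - (91 + 26 + 17), tn=119)

if __name__ == "__main__":
    for name, table in (("IS6110", IS6110), ("SCAR3", SCAR3)):
        print(
            f"{name}: sensitivity {table.sensitivity():.1f}%, "
            f"specificity {table.specificity():.1f}%, "
            f"PPV {table.ppv():.1f}%, NPV {table.npv():.1f}%"
        )
```

Under these reconstructed counts the NPV comes out at 70.0% for IS6110 and 68.8% for SCAR3, in line with the values reported above. The low NPV reflects the TB specimens that each marker missed.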

Discussion
Acid-fast staining microscopy of specimens combined with isolation and culture of the bacilli remains the "gold standard" method for the specific identification of mycobacteria. However, diagnosis of TB by bacteriological methods is difficult because it requires M. tuberculosis to be identified in patient specimens. One of the most important technical advances in the diagnosis of tuberculosis is the development of nucleic acid amplification (NAA) tests, which have emerged as promising alternatives for the early detection of M. tuberculosis in clinical samples.
Studies reporting the successful identification of M. tuberculosis by NAA tests based on different PCR technologies have been published [6][7][8][9][10,13,14]. The 16S rRNA gene is the target sequence amplified by commercial kits, and the performance of these tests is excellent. However, since the target gene is present in all mycobacterial species, discrimination between mycobacteria must rely on hybridization with specific DNA probes [18,19]. The most popular target sequence is the insertion sequence IS6110. Although the high copy number of the IS6110 element in the genome provides good sensitivity for detecting MTB, as shown in the meta-analyses by Flores [11,12], the specificity of IS6110-based assays is controversial. One potential problem with this target is that some strains lack the sequence in their genomes, which seriously compromises the use of IS6110-based PCR for tuberculosis diagnosis [14]. Other target sequences have also been developed to identify TB infection by PCR, including sequences encoding mycobacterial antigens and intergenic regions [20]. However, while the protocols used to amplify DNA from mycobacteria present in a wide range of samples have been greatly optimized, the choice of target sequence remains controversial. Recently, interesting alternatives have been found among hypothetical protein coding sequences. Genome sequencing revealed more than 4000 protein coding genes in the genome of M. tuberculosis (H37Rv), of which a considerable fraction of open reading frames is still labeled as conserved hypothetical proteins. Besides the use of their coding sequences as targets for PCR diagnosis, some proteins of unknown function have also been identified as B cell and/or T cell antigens for TB diagnosis [21,22]. These results imply that rational biomarkers for TB diagnosis may be obtained from these conserved hypothetical proteins or their coding sequences.
In this study, we first evaluated the conservation of three hypothetical genes, which code for the conserved hypothetical proteins Rv0264c, Rv1508c, and Rv2135c, in clinical strains of M. tuberculosis. The results showed that all three hypothetical protein coding sequences were more highly conserved than the insertion sequence IS6110 in the MTB strains tested. SCAR1, based on the Rv0264c coding gene, showed the highest positive rate (99.8%), and SCAR2, based on the Rv1508c coding gene, showed the second highest (99.7%). SCAR3, based on the Rv2135c coding gene, showed the highest deletion rate of the three SCAR markers (3.2%), but this was still lower than the deletion rate of IS6110 (8%).
The three SCAR markers were also evaluated in clinical sputum specimens. After resolution of discrepant results by comparing real-time PCR in sputum specimens with the bacteriological and clinical data, the sensitivities, specificities, positive predictive values (PPV) and negative predictive values (NPV) of all NAA tests were higher than those of bacteriological detection (acid-fast staining of bacilli smears and mycobacterial culture). Among the four NAA tests, IS6110 and SCAR3 showed the highest PPV (100%) but low NPV (70% and 68.8%, respectively), whereas SCAR1 and SCAR2 showed relatively high PPV and NPV (97% and 82.6% for SCAR1; 95.6% and 88.8% for SCAR2).
The ideal target sequence should be present in all strains of the M. tuberculosis complex, to avoid false-negative reactions, and absent from all other mycobacterial species, to eliminate the chance of false-positive results. Where no such target exists, it becomes necessary either to choose targets with a high degree of sequence conservation or to use a multisite system based on several targets. The SCAR1 and SCAR2 markers, with their high degree of sequence conservation, may represent efficient and promising alternatives as NAA test targets for the identification of MTB. SCAR3 and IS6110 showed relatively high deletion rates but had high PPV, so they may serve as additional targets in the development of a multisite system to detect MTB effectively in samples.
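One way to picture a multisite system is as a rule that combines the per-target PCR results for a sample into a single call. The sketch below is purely illustrative: the study does not specify a decision rule, so the "positive if at least k targets amplify" rule, the function name, and the example sample are all hypothetical.

```python
def multisite_call(marker_results, min_positive=1):
    """Combine per-target PCR results into one MTB call.

    marker_results: dict mapping target name -> True/False (amplified or not).
    min_positive:   number of positive targets required (hypothetical rule).
    """
    if not marker_results:
        raise ValueError("at least one target result is required")
    if not 1 <= min_positive <= len(marker_results):
        raise ValueError("min_positive must be between 1 and the number of targets")
    n_positive = sum(1 for amplified in marker_results.values() if amplified)
    return n_positive >= min_positive


if __name__ == "__main__":
    # Hypothetical sample from a strain lacking IS6110 and the SCAR3 region.
    sample = {"IS6110": False, "SCAR1": True, "SCAR2": True, "SCAR3": False}
    print("any-target rule:", multisite_call(sample, min_positive=1))
    print("all-target rule:", multisite_call(sample, min_positive=len(sample)))
```

A lenient rule (any one target) guards against strains that have lost a single target, at the cost of inheriting the false positives of every marker; a stricter rule trades in the opposite direction. Which threshold is appropriate would have to be established on clinical data.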