Diversity and microevolution of CRISPR loci in Helicobacter cinaedi

Helicobacter cinaedi is associated with nosocomial infections. The CRISPR-Cas system provides adaptive immunity against foreign genetic elements. We investigated the CRISPR-Cas system in H. cinaedi to assess the potential of the CRISPR-based microevolution of H. cinaedi strains. A genotyping method based on CRISPR spacer organization was carried out using 42 H. cinaedi strains. Sequence analysis showed that the H. cinaedi strains used in this study had two CRISPR loci (CRISPR1 and CRISPR2). The consensus direct repeat sequences in CRISPR1 and CRISPR2 were both 36 bp long, and 224 spacers were found in the 42 H. cinaedi strains. Analysis of the organization and sequence similarity of the spacers of the H. cinaedi strains showed that CRISPR arrays could be divided into 7 different genotypes. Each genotype had a different ancestral spacer, and spacer acquisition/deletion events occurred while isolates were spreading. Spacer polymorphisms of conserved arrays across the strains were instrumental for differentiating closely related strains collected from the same hospital. MLST had little variability, while the CRISPR sequences showed remarkable diversity. Our data revealed the structural features of H. cinaedi CRISPR loci for the first time. CRISPR sequences constitute a valuable basis for genotyping, provide insights into the divergence and relatedness between closely related strains, and reflect the microevolutionary process of H. cinaedi.


Introduction
Helicobacter cinaedi is a gram-negative, motile, spiral, and microaerophilic bacterium belonging to the family Helicobacteraceae. It was first isolated from rectal swabs obtained from homosexual men in the 1980s [1]. Since 2000, the number of reports of H. cinaedi infections has been increasing. Examples of the diverse range of infections caused by H. cinaedi include proctocolitis, gastroenteritis, neonatal meningitis, localized pain, rash, and bacteremia [2]. This organism is difficult to culture and therefore difficult to isolate compared with other Helicobacter spp., and as a result its biological and clinical characteristics are less well understood [3]. [...] and diversity has not been explored in this species. The CRISPR-Cas system should provide useful information about strain characterization, lineage identification, and epidemiology.
Multilocus sequence typing (MLST) is a genotyping method based on the nucleotide sequences of seven housekeeping genes, which are used to assign different alleles to sequence types (ST) and clonal complexes. MLST has been widely used in molecular epidemiology and population biology in Helicobacter species [29,30,31], and has been proven useful for typing other H. cinaedi strains [32].
Genotyping analysis is crucial in terms of understanding the epidemiology of transmission; thus, the aim of the present work was to systematically investigate the prevalence and diversity of CRISPR loci in H. cinaedi. In this study, we developed a CRISPR sequence analysis for H. cinaedi and compared the results with MLST analysis.

CRISPR loci analysis
Primer pairs were designed to amplify the full CRISPR loci: CRISPR1_Forward (5'-CAATTTAGAAAACGCAGAGCC-3') and CRISPR1_Reverse (5'-GATATGATTTACCCTGCGGAAG-3') for CRISPR1, and CRISPR2_Forward (5'-TGTCATACTGAGACTTTTGCC-3') and CRISPR2_Reverse (5'-GCTACCCAAAGTCGCCAAAAC-3') for CRISPR2. Other primers used for sequencing are listed in S2 Table. Amplification parameters consisted of 35 cycles of denaturation at 94˚C for 15 s, annealing at 55˚C for 15 s, and extension at 72˚C for 2 min. PCR products were sequenced using the PCR primers and additional sequencing primers designed based on the spacer sequences. Sequence assembly and editing were performed with DNASIS Pro Version 3.02 (Hitachi Solutions) and MEGA 6. Information pertaining to each CRISPR locus, including position, length, and content, was acquired from the CRISPR web server (http://crispr.i2bc.paris-saclay.fr/) [35]. Clustal X software was used to investigate the homology of the sequences of the CRISPR region possessed by each strain. The aligned sequences were compared by detecting identical spacers.
Visual representation of the CRISPR arrays was performed as previously described [21,36]. The repeat sequences were removed from each array, and the list of spacers was oriented with the ancestral spacer on the left-hand side. Each spacer within the array was visually represented by a box. This allowed a comparison of conserved arrays by aligning spacers from the ancestral end. Spacer genotyping was based on common ancestral spacers. A matrix of zeros and ones was constructed according to the presence or absence of spacer arrays for every strain. The dendrogram was derived from the matrix of distances based on the Jaccard similarity coefficient, using the UPGMA method as implemented in the DendroUPGMA dendrogram construction utility (http://genomes.urv.cat/UPGMA/index.php) [37]. CRISPRTarget (http://bioanalysis.otago.ac.nz/CRISPRTarget/crispr_analysis.html) [38] was used to predict the presence of possible protospacers. All spacer sequences were used for homology searching to find potential protospacers with >90% sequence identity [21].
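The clustering step described above can be sketched in a few lines of Python. This is only an illustration, not the DendroUPGMA implementation: the strain and spacer names are hypothetical, each matrix column is assumed to be a single spacer, and the functions presence_matrix, jaccard_distance, and upgma are names introduced here for the example.

```python
from itertools import combinations

def presence_matrix(profiles):
    """Build a 0/1 row per strain from {strain: set of spacer names}."""
    spacers = sorted(set().union(*profiles.values()))
    rows = {strain: [1 if sp in have else 0 for sp in spacers]
            for strain, have in profiles.items()}
    return spacers, rows

def jaccard_distance(row_a, row_b):
    """1 - (spacers present in both) / (spacers present in either)."""
    both = sum(1 for a, b in zip(row_a, row_b) if a and b)
    either = sum(1 for a, b in zip(row_a, row_b) if a or b)
    return 0.0 if either == 0 else 1.0 - both / either

def upgma(rows):
    """Average-linkage clustering; returns the tree as nested tuples."""
    # Cluster label (strain name or nested tuple) -> member strain names.
    clusters = {name: [name] for name in rows}

    def mean_distance(c1, c2):
        dists = [jaccard_distance(rows[x], rows[y])
                 for x in clusters[c1] for y in clusters[c2]]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        # Merge the pair of clusters with the smallest mean pairwise distance.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: mean_distance(*pair))
        clusters[(c1, c2)] = clusters.pop(c1) + clusters.pop(c2)
    return next(iter(clusters))

# Hypothetical spacer content for three strains (not data from this study).
profiles = {
    "strain_A": {"sp1", "sp2", "sp3"},
    "strain_B": {"sp1", "sp2", "sp4"},
    "strain_C": {"sp8", "sp9"},
}
spacers, rows = presence_matrix(profiles)
tree = upgma(rows)
print(tree)
```

In this toy input, strain_A and strain_B share two of four spacers and therefore cluster together before strain_C, which shares none; branch lengths are omitted for brevity.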

Multilocus sequence typing
Primers and PCR conditions for the seven housekeeping genes were as described in a previous report [32]. After confirming the single amplification products on 1% agarose gels, sequences were determined using a BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems) and an automatic DNA sequencer (3130 Genetic Analyzer, Applied Biosystems). Allelic MLST sequences were analyzed using the PubMLST website (http://pubmlst.org/). Different STs and CCs were assigned using the H. cinaedi MLST database (http://pubmlst.org/hcinaedi/). The phylogeny for the 42 isolates was estimated from concatenated sequences using the neighbor-joining method [39]. Clustal X software was used to align the sequences [40] and calculate the genetic distances. The dendrogram was constructed using the NJplot program [41] and MEGA 6 [42].

The CRISPR loci structure in H. cinaedi
Based on genomic analysis [27,28], CRISPR loci are flanked by cas genes encoding Cas proteins (Fig 1). Three cas genes (cas2, cas1 and cas9, in this order) were located upstream of CRISPR1, which is consistent with a type II system [43]. Cas1 and Cas2 are the core proteins of the CRISPR-Cas system [15]. Cas9 protein sequence analysis is consistent with the classification of type II systems characterized to date [44]. To determine the type of the H. cinaedi CRISPR-Cas system, we obtained Cas9 amino acid sequences from Gram-negative type II system-containing bacteria, as previously described [17], and compared them with the Cas9 sequences of H. cinaedi strains PAGU597 T and PAGU611. We constructed a multiple sequence alignment and phylogenetic tree for Cas9 (S1 Fig). The phylogenetic tree showed that the Cas9 sequences from the two H. cinaedi strains were closely related to those of Campylobacter jejuni subsp. jejuni NCTC11168, and formed part of the subtype II-C subcluster.
A RAMP gene was located downstream of the CRISPR2 locus. The cas1 and cas2 genes were not found in CRISPR2, and the predicted length of the ORF for RAMP was 1782 bp; RAMP is a signature gene of the type III system [15].
The two CRISPR loci, CRISPR1 and CRISPR2, were identified in all H. cinaedi strains by CRISPR PCR and sequencing analysis. An average of 32 spacers (range, 4 to 63) was identified in CRISPR1 loci, while CRISPR2 loci had an average of 6 spacers (range, 2 to 10). It has been reported that CRISPR repeats are composed of exact repeat sequences ranging from 24 to 48 bases long [45]. These sequences have also been shown to contain palindromes. The 5' terminal portion of a repeat is normally composed of the sequence GTTT (G) and the 3' terminus contains GAAA (C/G) [17,46]. Generally, repeats associated with the type II system are weakly palindromic and typically 36 bp in length [43]. CRISPR1 and CRISPR2 in H. cinaedi strains both retained a 36-bp repeat sequence. The consensus direct repeat associated with CRISPR1 was 5'-GTTTTAGTCCCTTCTTAAACTTCTATATGCTAGAAT-3', and that associated with CRISPR2 was 5'-GTTTTAGTGGGACCCGATTTAAGGGGATTTGTATCA-3'.
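To make the array structure concrete, the following Python sketch splits a sequenced array into its spacers using the consensus repeats reported above. It assumes every repeat in the array matches the consensus exactly, which real arrays need not do (terminal repeats are often degenerate); the function name extract_spacers and the toy flanking and spacer sequences are hypothetical.

```python
# Consensus direct repeats as reported in the text.
CRISPR1_REPEAT = "GTTTTAGTCCCTTCTTAAACTTCTATATGCTAGAAT"
CRISPR2_REPEAT = "GTTTTAGTGGGACCCGATTTAAGGGGATTTGTATCA"

def extract_spacers(array_seq, repeat):
    """Return the sequences lying between consecutive exact copies of `repeat`.

    The segments before the first repeat and after the last repeat are
    treated as flanking sequence and discarded.
    """
    parts = array_seq.upper().split(repeat)
    return [segment for segment in parts[1:-1] if segment]

# Toy array: flank + repeat + spacer + repeat + spacer + repeat + flank.
toy_array = ("AAAA" + CRISPR1_REPEAT + "ACGTACGT"
             + CRISPR1_REPEAT + "TTGGCCAA" + CRISPR1_REPEAT + "CC")
toy_spacers = extract_spacers(toy_array, CRISPR1_REPEAT)
print(len(CRISPR1_REPEAT), len(CRISPR2_REPEAT), toy_spacers)
```

Listing the spacers in array order is what allows arrays to be aligned from the ancestral end when comparing strains.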

Distribution and conservation of CRISPR spacer arrays
Identification of the spacer sequences from the CRISPR loci was conducted to evaluate the extent of genotypic diversity among the H. cinaedi isolates. We applied an approach to outline the distribution of conserved CRISPR arrays, identified by their ancestral spacer content, in all 42 strains. A conserved ancestral spacer implies commonality among the strains, whereas spacers acquired later may differ between related strains due to different exposures to foreign invasive DNA. The distribution of the identified ancestral spacers enabled the CRISPR arrays to be grouped by spacer organization. The spacer composition of the CRISPR1 and CRISPR2 loci is shown in Figs 2 and 3. We found 20 unique CRISPR1 patterns (CRISPR1 patterns A to T, Fig 2) and 16 unique CRISPR2 patterns (CRISPR2 patterns a to p, Fig 3). The 42 H. cinaedi strains were grouped into 7 different genotypes (G1-G7), according to the spacer arrays (presence or absence) and ancestral spacers (Figs 2-4).
The six reference strains had different spacer arrays compared to the Japanese isolates. PAGU 640, 1749, 1752, and 1753 (from outside Japan) had the same ancestral spacer as genotype G1 and shared conserved spacers, in addition to unique spacers (spacers 1R, 2I, and 2W of CRISPR1; spacers 1i, 1j, 1k, and 1l of CRISPR2). Unique spacers (spacers 3S, 5I, and 5J of CRISPR1) were also present in PAGU 597 T isolated in the USA, which shared conserved spacers with genotype G2. In this study, the predecessor of genotype G2 was not identified. It was predicted that a large spacer deletion occurred during expansion of the ancestral lineage, resulting in the PAGU 597 strain. The spacer organization of PAGU 1744 from the USA was distinctive, and all spacers of the two loci were composed of unique nucleotide sequences.
[...] 611, 1294, 1703, 1708 and 1811), the spacer distribution of CRISPR1 differed, even though that of CRISPR2 was the same, and vice versa.

MLST typing
A total of 12 different sequence types (STs) were identified among the 42 isolates (Table 1). PAGU 1922 had a different allelic profile and did not correspond with any ST belonging to CC1. Among the STs, 11 were assigned to 6 known CCs, while ST-18 was unassigned. Based on the phylogenetic tree of MLST, the 36 clinical isolates from Japan were classified into 6 clusters: CC1 (4 isolates), CC4 (14 isolates), CC8 (4 isolates), CC9 (8 isolates), CC16 (7 isolates), and unassigned CC (ST-18, 3 isolates) (Fig 5). These H. cinaedi isolates were collected from 5 hospitals in Japan, and the distributions within each hospital were compared. Hospital A obtained 24 isolates over 11 years, which were subsequently divided into 5 clusters (CCs 1, 4, 8, 16, and ST-18). The reference strains (PAGU 597 T and 1744) had slightly different sequences compared to the Japanese isolates, while PAGU 640, 1749, 1752, and 1753, which were classified as ST-4, had the same ST as the isolates from hospital A.

Comparison of CRISPR analysis and MLST
Twelve MLST STs were identified among the 42 isolates, whereas a greater number of CRISPR patterns was found (20 CRISPR1 patterns and 16 CRISPR2 patterns; Table 1 and Fig 4), which indicated that CRISPR analysis has greater discriminatory power than MLST. Isolates assigned to ST-4 diversified into six CRISPR1 patterns (B, C, D, E, F and G) and six CRISPR2 patterns (b, c, d, e, f and g). Similarly, the strains assigned to ST-3, ST-8, and ST-16 differentiated into separate CRISPR1 patterns (ST-3: H and I; ST-8: L and M; ST-16: P, Q, R, and S). The PAGU611 and PAGU1496 strains, both belonging to ST-8, had identical sequences at all seven housekeeping loci. However, these strains were isolated at different times and, according to the distribution of CRISPR1 loci, it appeared that a spacer deletion in PAGU1496 occurred between 2004 and 2010 in hospital A (spacer 6K, Fig 2). Thus, determining the CRISPR sequences revealed diversity among strains assigned to the same ST in MLST analysis.
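The subdivision of single STs into several CRISPR1 patterns can be tallied directly from the assignments stated above. The short Python sketch below uses only those four STs, since the per-isolate data of Table 1 are not reproduced here; the variable and function names are illustrative.

```python
# CRISPR1 patterns observed within each ST, as listed in the text.
crispr1_patterns_by_st = {
    "ST-3": ["H", "I"],
    "ST-4": ["B", "C", "D", "E", "F", "G"],
    "ST-8": ["L", "M"],
    "ST-16": ["P", "Q", "R", "S"],
}

def pattern_counts(patterns_by_st):
    """Number of distinct CRISPR patterns found within each ST."""
    return {st: len(set(patterns)) for st, patterns in patterns_by_st.items()}

counts = pattern_counts(crispr1_patterns_by_st)
for st, n in counts.items():
    print(f"{st}: {n} CRISPR1 patterns")
```

Each of these four STs, indistinguishable at the seven MLST loci, resolves into at least two CRISPR1 patterns.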

Discussion
The number of reported H. cinaedi infections has been steadily growing, and the association of this bacterium with a variety of human infections and atherosclerotic diseases has received increasing attention in recent years [3,10]. H. cinaedi is currently the most commonly reported enterohepatic Helicobacter isolated in humans. Kitamura et al. previously documented an outbreak of nosocomial H. cinaedi infections caused by direct person-to-person spread [6]. We have also received reports of a growing number of cases of nosocomial H. cinaedi infections in Japan. Indeed, this microorganism is recognized as a causative agent of nosocomial infections [4]. H. cinaedi strains were isolated from men and women of a broad age range (from neonates to the elderly). Some patients had immunocompromised conditions, while others were apparently immunocompetent [47]. H. cinaedi infections have been detected in hospitals throughout Japan, and we hypothesize that they are more common in Japanese hospitals than is currently recognized. This study attempted to compare CRISPR arrays to gain an understanding of the diversity of H. cinaedi. The CRISPR1-cas locus possessed the minimum number of cas genes required to form the cas operon, a characteristic of subtype II-C [15]. The repeats of H. cinaedi CRISPR1 were 36 bp in length, which corresponded with type II systems. The cas components suggested that CRISPR2 of H. cinaedi strains resembles type III systems. The cas1 and cas2 genes were not found in the CRISPR2 loci, but in many organisms, the type III CRISPR-cas operons lack the cas1-cas2 gene pair [15].
Hospital A has been isolating H. cinaedi strains since 2004. In a comparison of the evolution of spacer organization over time, two genotypes of H. cinaedi (G1 and G3) were found in 2004. Genotypes G1 and G3 were distinguished by the presence of different ancestral spacers. Genotype G1 strains shared the ancestral spacers 1A and 1a. The ancestral spacers 6A and 1p were present in genotype G3. In a previous analysis using pulsed-field gel electrophoresis typing [6], the strains isolated from 2004 to 2005 in hospital A could be divided into two clusters (initial outbreak strain and subsequent outbreak strain). This clustering pattern was also supported by the phylogenetic tree of the hsp gene, as well as the RAPD pattern. These findings are consistent with our results showing genotypes G1 and G3 by CRISPR analysis (Fig 4).
Our results not only provide information about the homology of the sequences in the CRISPR region, but also enable the process of spread to be traced via CRISPR arrays by showing the acquisition and deletion of spacers. Although genotype G1 strains have been circulating in hospital A since 2004, the arrangement of the spacers has frequently changed. These strains were subsequently isolated in the same hospital in 2008, 2009, and 2010.
Based on CRISPR distribution, genotype G1 isolates obtained in hospital A were further divided into three subtypes (genotype G1-I: PAGU 617 and 627; genotype G1-II: PAGU 1024, 1123, 1124, and 1125; genotype G1-III: PAGU 1411, 1459, 1500, and 1513). The predecessor of these subtypes was probably not identified in this study, and spacer deletions appear to have occurred while the genotype G1 isolates were spreading. These data show that CRISPR patterns can systematically distinguish closely related strains and reflect the microevolution of strains, which is particularly relevant among strains of the same genotype.
Strains classified as genotype G3 were isolated for the first time in 2004 (PAGU611, PAGU612, and PAGU614), circulated without elimination for several years at the same hospital, and were again detected in patients in 2010 (PAGU1496). In addition to the two major genotypes G1 and G3, genotypes G2, G5, and G6 have also circulated since 2011 at hospital A.
Based on CRISPR analysis, all strains from hospital B except one (PAGU 1294) were classified as genotype G4. However, six of the isolated strains were grouped into two STs (ST-10 and ST-11) via MLST. The strains were divided into ST-10 and ST-11 because of differences at two bases of the 23S rRNA sequence. Alignments of the 23S rRNA gene sequences showed that the nucleotides at positions 547659, 547760, and 548262 (numbered according to the genomic sequence of H. cinaedi PAGU597 T, AP012492) were G-T-T in ST-10 and G-C-C in ST-11. The nucleotides at these positions in the strain classified as ST-9 from hospital C are G-T-C. Thus, the distinction between the three STs classified as CC-9 is derived from only two base differences in the nucleotide sequence of the 23S rRNA gene. In the 8 strains classified as CC-9, the nucleotide sequences of the other 6 genes were identical by MLST. A comparison of the 23S rRNA gene sequences of the strains isolated in hospital B has been reported previously [48]. These ST-10 and ST-11 strains were isolated from female and male patients, respectively, and it was reported that nosocomial infections could have occurred in these cases via the female or male toilets, respectively. Although the efficacy of sequencing analysis of the 23S rRNA gene has been described [49], the sequences of the 23S rRNA gene of the 4 strains assigned to ST-1 and ST-3 appeared identical, as did those of the 20 ST-4, ST-5, ST-8, and ST-9 strains in this study (Table 1). Therefore, the discriminatory value of 23S rRNA gene sequencing analysis of H. cinaedi strains is low.
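The 23S rRNA differences among the three CC-9 STs can be checked with a small pairwise comparison. The Python sketch below encodes only the three positions and base calls given above; the names SNP_POSITIONS, SIGNATURES, and differing_positions are introduced here for illustration.

```python
# Genome positions (PAGU597 T, AP012492) and base calls as given in the text.
SNP_POSITIONS = (547659, 547760, 548262)
SIGNATURES = {"ST-9": "GTC", "ST-10": "GTT", "ST-11": "GCC"}

def differing_positions(st_a, st_b):
    """Return the positions at which two STs carry different bases."""
    return [pos
            for pos, a, b in zip(SNP_POSITIONS, SIGNATURES[st_a], SIGNATURES[st_b])
            if a != b]

diff_10_11 = differing_positions("ST-10", "ST-11")
diff_9_10 = differing_positions("ST-9", "ST-10")
diff_9_11 = differing_positions("ST-9", "ST-11")
print(diff_10_11, diff_9_10, diff_9_11)
```

ST-10 and ST-11 differ at two of the three positions, whereas ST-9 differs from each of them at a single position, and the first position is invariant across all three STs.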
The gyrA sequence is an appropriate marker with a high discrimination rate for the phylogenetic analysis of the Helicobacter genus [50]. We evaluated the genetic relationships of our isolates using gyrA and 16S rRNA gene sequences, which are the gold standard for phylogenetic analysis. The gyrA sequences of H. cinaedi isolates showed low diversity (S2 Fig), which led us to conclude that these sequences were useful for analysis within the genus, but not within the species. The 16S rRNA gene was further investigated for the analysis of H. cinaedi isolates within the species (S3 Fig). It was generally thought that the 16S rRNA gene was insufficient for identification at the species level as a stand-alone technique in phylogenetic analysis, but the 16S rRNA phylogenetic tree yielded the same topology as MLST in H. cinaedi. Thus, the phylogenetic analysis of the 16S rRNA gene was considered reliable for H. cinaedi, contrary to its use for other bacterial species.
In MLST analysis, the strains assigned to ST-3, ST-4, ST-8, and ST-16 had seven genes showing identical nucleotide sequences within each ST group, and no diversity was observed. Meanwhile, in CRISPR analysis, these strains had different spacer distributions, even within the same ST via MLST, and the strains belonging to one ST were divided into two or more CRISPR patterns (Table 1).
The 12 STs were divided into 20 CRISPR patterns, and CRISPR typing is considered to have higher discriminatory power than MLST. In addition, the spacer array of CRISPR does not only distinguish between strains, but also provides useful background information about the evolution of the strains. We can predict the relevance of isolates depending on whether they have a common ancestral spacer. For these reasons, CRISPR analysis is thought to be efficient and provide more information than other genotyping methods.
We have described the epidemiological analysis of H. cinaedi isolates using CRISPR arrays. The polymorphisms among the organization of spacers reflect the adaptation process of H. cinaedi. Thus, the distribution of CRISPR spacers may assist in the study of nosocomial H. cinaedi infections, and may be useful for typing H. cinaedi isolates and elucidating how they spread. CRISPR-Cas system data will contribute to a better understanding of the origins and microevolution of this microorganism.
Supporting information
S1 Fig. Cas9 proteins from Gram-negative type II system-containing bacteria are referenced [17]. A phylogenetic tree based on Cas9 proteins was constructed by the neighbor-joining method. H. cinaedi strains PAGU597 T and PAGU611 are shown in red. Two Cas9 protein sequences were obtained from the DDBJ (Accession Nos. AP012492 and AP012344, respectively). (PDF)