The Use of High-Throughput DNA Sequencing in the Investigation of Antigenic Variation: Application to Neisseria Species

Antigenic variation occurs in a broad range of species. This process resembles gene conversion in that variant DNA is unidirectionally transferred from partial gene copies (or silent loci) into an expression locus. Previous studies of antigenic variation have involved the amplification and sequencing of individual genes from hundreds of colonies. Using the pilE gene from Neisseria gonorrhoeae we have demonstrated that it is possible to use PCR amplification, followed by high-throughput DNA sequencing and a novel assembly process, to detect individual antigenic variation events. The ability to detect these events was much greater than has previously been possible. In N. gonorrhoeae most silent loci contain multiple partial gene copies. Here we show that there is a bias towards using the copy at the 3′ end of the silent loci (copy 1) as the donor sequence. The pilE gene of N. gonorrhoeae and some strains of Neisseria meningitidis encode class I pilin, but strains of N. meningitidis from clonal complexes 8 and 11 encode a class II pilin. We have confirmed that the class II pili of meningococcal strain FAM18 (clonal complex 11) are non-variable, and this is also true for the class II pili of strain NMB from clonal complex 8. In addition when a gene encoding class I pilin was moved into the meningococcal strain NMB background there was no evidence of antigenic variation. Finally we investigated several members of the opa gene family of N. gonorrhoeae, where it has been suggested that limited variation occurs. Variation was detected in the opaK gene that is located close to pilE, but not at the opaJ gene located elsewhere on the genome. The approach described here promises to dramatically improve studies of the extent and nature of antigenic variation systems in a variety of species.


Introduction
Antigenic variation is a genetic process that leads to high-frequency changes in cell surface components in a wide range of species. To evaluate the experimental approach described here we initially investigated antigenic variation in a gene where that process has been well studied, the pilE gene of Neisseria gonorrhoeae. N. gonorrhoeae (the gonococcus) and the closely related N. meningitidis (the meningococcus) are both strict human pathogens and must continually evade the human immune system. The immune system is partly confounded by antigenic variation of the PilE pilin subunit that assembles into the type IV pilus. In this system the expressed gene (pilE) changes, whereas the so-called silent loci (pilS) donate variant genetic information but remain unchanged in the process. The variant sequences recombine into the distal two thirds of the pilE gene [1], and the process is dependent on the presence of the RecA protein [2]. Mosaic proteins can sometimes be formed as multiple gene segments from the pilS loci are recombined into the expressed copy to yield a variant gene [1].
Antigenic variation in this gene is a high-frequency event, with approximately 12% of randomly selected colonies containing a variant gene sequence [1]. Two distinct classes of type IV pili are found in N. meningitidis: class I pili (also found in N. gonorrhoeae) antigenically vary, whereas at least some class II pili do not [3].
In addition to the type IV pilin system, other potential antigenic variation systems have been identified in the genome sequences of Neisseria species. The opa genes are a gene family encoding surface-exposed proteins, with the number of genes varying between strains. For N. gonorrhoeae strain MS11, nine intact opa genes have been cloned and sequenced, allowing alignment of their gene sequences [4]. From these alignments two hyper-variable regions can be observed in the middle of the genes, as well as a semi-variable region located towards the 5′ end. Several studies have shown DNA transformation-mediated horizontal transmission of chromosomal DNA where hyper-variable opa segments are exchanged between strains [5][6][7]. However, few data have been accrued as to whether opa genes also engage in antigenic variation as observed with the pil system. Anecdotal evidence suggests that limited variation does occur in at least some opa genes [4].
Various methods have been used to determine the extent and nature of antigenic variation within a population [1,8]. Recent studies have involved the amplification of the pilE gene from hundreds of randomly selected colonies, followed by sequencing of each amplicon using Sanger sequencing technology to detect the subset that contained variant sequences [1,3,9]. Because of the cost and time involved, the number of genes that were eventually sequenced was limited, leading to conclusions being based on small data sets. For example, the extent of pilE gene variation in N. gonorrhoeae was based on analysis of 497 amplicons, and that for N. meningitidis on 260 amplicons [1,3]. We reasoned that high-throughput sequencing of DNA extracted from a varying culture could be used to detect multiple gene variants simultaneously, especially those that might be occurring at a low frequency. The more cells in the culture that are producing a variant sequence, the more variant DNA will be found in the DNA preparation. Comparing the sequence depth of a variant sequence with the sequence depth of all DNA in the sample would therefore enable an estimation of the frequency at which variant cells were present in the culture. This approach has dramatically increased the size of the data sets, allowing a more detailed picture to emerge of the extent and nature of antigenic variation.

Bacterial Strains and Growth Conditions
Initial experiments used gonococcal strains FA1090 and MS11 containing the recA6 allele [10]. This allele was first introduced into strain MS11, and subsequently genomic DNA from MS11 recA6 was used to transform strain FA1090. In these strains the recA gene is transcribed from a promoter that is only operational in the presence of the inducer isopropyl thiogalactopyranoside (IPTG). Growing the strain in the absence of IPTG halts RecA production, and effectively "freezes" antigenic variation. The N. meningitidis strains used were NMB and FAM18, both of which produce class II pilin. The details of the construction of N. meningitidis strain CKNM397 have been described elsewhere [11]. Briefly, CKNM397 is N. meningitidis strain NMB producing the class I pilin from N. meningitidis strain MC58. The pilE gene and upstream sequence, containing the native promoter and a G4 sequence important for antigenic variation [12], were PCR amplified from strain MC58 and integrated into the iga gene of strain NMB. Subsequently the NMB pilE gene encoding class II pilin was insertionally inactivated. The solid and liquid media, and the growth conditions used, have been described previously [13].

Amplification and Sequencing of Genes of Interest
A single colony of the relevant strain was picked from a GC agar plate and resuspended in 60-100 µl of GC broth. Neither the agar plate nor the broth contained IPTG, so in those strains containing the IPTG-inducible recA6 allele the recA gene was not being expressed, and antigenic variation was absent. A portion of this suspension was plated onto GC agar and incubated overnight. Strains that contained the recA6 allele were plated onto both GC agar and GC agar containing 2 mM IPTG. The remainder of the original suspension was retained and used to generate a reference sequence by Sanger sequencing. The cells were harvested from the agar plates, and genomic DNA was extracted from both these samples and the reference sample using the GenElute™ Bacterial Genomic DNA Kit (Sigma). The genomic DNA was used as template in PCR with KOD Hot Start DNA Polymerase (Merck). The oligonucleotide primers used to PCR amplify the genes of interest are listed in Table S1 in the Supporting Information. These were designed such that they bound approximately 300 bp upstream and downstream of the gene of interest. The PCR conditions were as follows: 95 °C for 2 min followed by 35 cycles of 95 °C for 20 seconds, 55 °C for 10 seconds and 70 °C for 20 seconds. The PCR products were then gel-purified with the QIAquick Gel Extraction Kit (QIAGEN). The PCR product from the reference sample was then subjected to Sanger sequencing using the same primers that were used for amplification.
The amplicons were prepared for high-throughput sequencing using the Illumina Genomic DNA Sample Prep Kit as per the manufacturer's instructions. Sequence data were obtained from an Illumina Genome Analyzer II using 36-cycle (Illumina 36C Sequencing Kit V2), or 72-cycle when it became available, paired-end chemistry. The raw sequence data from each sample will be deposited in the NCBI Sequence Read Archive, and the relevant sequence quality data are shown in Table S2 of the Supporting Information.

Assembly of Sequence Reads
Existing assembly programs, designed to handle moderate coverage of whole genome sequences, proved inadequate when faced with the extensive coverage of relatively short amplicon sequences used here. A novel assembly procedure was therefore developed to deal with the unusual nature of the high-throughput sequencing data produced. The procedure has three stages: an assembly stage to produce a large number of candidate sequences (500 per sample, or per RecA−/RecA+ pair of samples), a selection stage to pick a smaller number of sequences that explain as much of the read data as possible (50 per sample, or per RecA−/RecA+ pair of samples), and finally a manual validation stage to identify and remove any mis-assemblies.
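The selection stage is described here only in outline, with the details given in Text File S1. As a rough illustration of what picking sequences that "explain as much of the read data as possible" could look like, the Python sketch below uses a generic greedy coverage heuristic. It is not the published procedure: the function name, the `explained_reads` mapping and the read identifiers are all hypothetical.

```python
def select_candidates(explained_reads, limit):
    """Greedily pick up to `limit` candidate sequences.

    explained_reads: dict mapping a candidate name to the set of read
    identifiers that candidate accounts for (hypothetical input).
    Returns the chosen names, in order of selection, and the reads covered.
    """
    chosen = []
    covered = set()
    remaining = dict(explained_reads)
    while remaining and len(chosen) < limit:
        # Pick the candidate that explains the most reads not yet covered.
        best = max(remaining, key=lambda name: len(remaining[name] - covered))
        if not remaining[best] - covered:
            break  # no remaining candidate explains anything new
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered


candidates = {"cand_a": {1, 2, 3}, "cand_b": {3, 4}, "cand_c": {1, 2}}
chosen, covered = select_candidates(candidates, limit=3)
print(chosen, sorted(covered))
```

In this toy input the third candidate adds no unexplained reads, so it is never selected even though the limit would allow it.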
The assembly procedure is a variant on k-mer based assembly [14]. A k-mer is a string of k contiguous nucleotides, and a sequence of nucleotides of length n can be viewed as a sequence of n−k+1 overlapping k-mers. K-mer based assembly is usually based on statistics on the number of times each k-mer occurs in the read set. To this we added statistics on the number of k-mer pairs: all ordered k-mer pairs observed in the read pairs were counted, that is, each pair of k-mers occurring one after the other in a read, or one k-mer in the first read and one k-mer in the paired read. K-mer pair counts were used both in the assembly stage, as part of a seed-and-extend algorithm, and in the validation stage, where visualization of k-mer pair counts of an assembled sequence allowed manual validation of the correctness of assembled sequences. The details are discussed in a supplementary text file (Text File S1) in the Supporting Information, and the software is available for download at http://www.vicbioinformatics.com/software.assemblet.shtml.
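The two kinds of statistics described above can be illustrated with a short Python sketch. It is an assumption-laden illustration rather than the published software: the function names are hypothetical, "one after the other in a read" is interpreted as any earlier/later pair of k-mers within the same read, and the second read of each pair is assumed to be already oriented to the same strand as the first.

```python
from collections import Counter
from itertools import combinations


def kmers(seq, k):
    """Return the n - k + 1 overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_kmers_and_pairs(read_pairs, k):
    """Count single k-mers and ordered k-mer pairs across paired reads.

    read_pairs: iterable of (read1, read2) strings; read2 is assumed to be
    on the same strand as read1 (an assumption of this sketch).
    """
    kmer_counts = Counter()
    pair_counts = Counter()
    for read1, read2 in read_pairs:
        k1, k2 = kmers(read1, k), kmers(read2, k)
        kmer_counts.update(k1)
        kmer_counts.update(k2)
        # Pairs within a single read: the earlier k-mer comes first.
        for ks in (k1, k2):
            pair_counts.update(combinations(ks, 2))
        # Pairs spanning the read pair: first-read k-mer, then paired-read k-mer.
        pair_counts.update((a, b) for a in k1 for b in k2)
    return kmer_counts, pair_counts


kmer_counts, pair_counts = count_kmers_and_pairs([("ACGT", "GTAC")], k=3)
print(kmer_counts)
print(pair_counts)
```

Counting every within-read pair grows quadratically with read length; with 36- or 72-cycle reads this stays small, but a real implementation might restrict which pairs are counted.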

Nomenclature of pilS Silent Loci
We have continued the use of the system adopted by others to name the individual partial gene sequences located in the various silent loci [1,3,9]. Gonococcal strain FA1090 contains 19 partial gene copies [15]. All but one of these are found in five pilS loci distributed around the genome. Here, for instance, pilS1c3 refers to the third partial gene copy in the silent locus pilS1. One additional partial gene copy is found just upstream of the pilE gene, and has been designated pilEc2 [15]. Gonococcal strain MS11 also has 5 pilS loci, containing 13 partial gene copies [16]. An additional 2 partial gene copies are found upstream of the pilE gene. In some variants of this strain the pilE gene and upstream copies are duplicated. Only 11 of these 17 possible partial gene copies have been sequenced [16]. A summary of the pilS loci and the partial gene copies they contain, along with the relevant accession numbers, is shown in Table 1.

Analysis of Sequence Assemblies
Each of the 50 sequence assemblies emerging from the selection stage was used as a query sequence in a BLASTn search of the databases, and any assembly that did not relate to the gene and strain in question was removed from further consideration. These were mainly short assemblies with very low sequence coverage. The remaining assemblies were aligned to the reference sequence using ClustalW2. This, along with heat maps (see Text File S1 in the Supporting Information), occasionally identified additional mis-assemblies arising from the short Illumina sequence reads and the presence of direct or inverted repeats in the amplified region. These were also removed from the analysis. The remaining assemblies were then manually searched for those that differed from the reference sequence by just one nucleotide, such as a one-nucleotide insertion or deletion, or a nucleotide change. For the reasons outlined below, these assemblies were also removed from further analysis. As expected, prominent amongst the remaining assemblies was an assembly (allele 1) with high k-mer depth that was identical to the reference sequence obtained by Sanger sequencing of the amplicon. The remaining assemblies were again aligned with the reference sequence using ClustalW2. In most cases the average k-mer depth exceeded 10⁵, and in some cases 10⁶ (Table 2). The frequency with which a particular variant appeared was estimated by dividing the average k-mer depth of the variant portion of the assembly by the average k-mer depth of same-sized regions immediately on either side of the variant sequence. This is illustrated for a particular assembly containing a variant sequence in Figure S1, Panel A. In some cases regions containing variant sequences were interrupted by region(s) of conserved sequence. An example of this is shown in Figure S1, Panel B.
In these cases the spike in k-mer depth resulting from the presence of the internal conserved region would inflate the average k-mer depth if this was measured across the entire region, resulting in an artificially high frequency calculation. Therefore for these assemblies the frequency was calculated by dividing the k-mer depth of each individual variable segment by the average k-mer depth of same-sized regions immediately on either side of the entire region.
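The frequency calculation described above can be written out as a minimal Python sketch. The function name and the per-position `depth` list are hypothetical, and the sketch assumes that each flank is the same size as the variable segment being scored, which is one reading of "same-sized regions". For an interrupted region, each variable segment is scored separately against flanks taken outside the entire region.

```python
def variant_frequency(depth, seg_start, seg_end, region_start=None, region_end=None):
    """Estimate the frequency of one variable segment, depth[seg_start:seg_end].

    depth: list of per-position k-mer depths along one assembly.
    region_start/region_end: bounds of the entire variant region when it is
    interrupted by conserved sequence; they default to the segment itself.
    Flank size equals the segment size (an assumption of this sketch).
    """
    region_start = seg_start if region_start is None else region_start
    region_end = seg_end if region_end is None else region_end
    size = seg_end - seg_start
    if size <= 0 or region_start - size < 0 or region_end + size > len(depth):
        raise ValueError("segment needs same-sized flanks on both sides")
    segment = depth[seg_start:seg_end]
    flanks = depth[region_start - size:region_start] + depth[region_end:region_end + size]
    return (sum(segment) / len(segment)) / (sum(flanks) / len(flanks))


# Single variant segment (toy numbers, not data from this study).
simple = [1000] * 10 + [100] * 10 + [1000] * 10
print(variant_frequency(simple, 10, 20))

# Variant region interrupted by three conserved positions.
interrupted = [1000] * 5 + [100] * 5 + [1000] * 3 + [100] * 5 + [1000] * 5
print(variant_frequency(interrupted, 5, 10, region_start=5, region_end=18))
print(variant_frequency(interrupted, 13, 18, region_start=5, region_end=18))
```

In the interrupted toy example, averaging across all 13 positions of the region would give roughly 0.31 rather than 0.1, which is the inflation the per-segment calculation is meant to avoid.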

Variation at the pilE Gene Encoding Class I Pilin in N. gonorrhoeae Strain FA1090
The pilE gene of strain FA1090 recA6 grown in the absence of IPTG was amplified by PCR and sequenced using Sanger chemistry. An alignment with the FA1090 genome sequence (accession number AE004969) revealed sequence variation in the 3′ part of the gene (Figure S2). This variation can be explained by gene conversion using sequence from a specific silent locus, pilS1c1. In two independent experiments a culture grown in the absence of IPTG was split, with one half cultured in the absence of IPTG, whilst the other half was grown in the presence of IPTG, allowing antigenic variation. The pilE gene from both cultures was amplified by PCR, and subjected to high-throughput sequencing.
In both experiments there were multiple assemblies that differed from the reference sequence by just one nucleotide. Unlike the variant sequences described below, the single nucleotide changes were not concentrated in the 3′ end of the pilE gene, but were scattered throughout the pilE gene and the flanking sequences. In both experiments, only one assembly involved a single nucleotide change that occurred at higher frequency in the presence of RecA. The same change was detected in both experiments, and involved a single nucleotide change downstream of the pilE gene. It therefore seems that almost all of the assemblies involving single nucleotide changes were not the result of antigenic variation. Others have reported that antigenic variation can result in single nucleotide changes [1], but in our hands it seems more likely that these are the result of low-frequency mutations occurring during amplification of the gene or (less likely) sequencing errors. Such changes were therefore not considered further.
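The screen for assemblies differing from the reference by a single nucleotide was carried out manually. A minimal Python sketch of an equivalent automated check is given below, assuming that each assembly and the reference are plain strings covering the same amplicon; the function name is hypothetical.

```python
def differs_by_one_edit(reference, assembly):
    """Return True if the two sequences differ by exactly one substitution,
    one single-nucleotide insertion, or one single-nucleotide deletion."""
    if reference == assembly or abs(len(reference) - len(assembly)) > 1:
        return False
    # Make `short` the shorter (or equal-length) sequence.
    short, long_ = sorted((reference, assembly), key=len)
    # Walk to the first position where the sequences disagree.
    i = 0
    while i < len(short) and short[i] == long_[i]:
        i += 1
    if len(short) == len(long_):
        # Substitution: everything after the mismatch must agree.
        return short[i + 1:] == long_[i + 1:]
    # Insertion/deletion: skip one base in the longer sequence.
    return short[i:] == long_[i + 1:]


print(differs_by_one_edit("ACGT", "AGGT"))  # substitution
print(differs_by_one_edit("ACGT", "ACT"))   # deletion
print(differs_by_one_edit("ACGT", "AGCT"))  # two changes
```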
In the first experiment 29 assemblies passed the assembly and screening process described above, while in the second experiment (possibly because of a much lower average k-mer depth in the RecA+ sample; Table 2) just six assemblies were detected. The results are shown in Table 3 and Table 4, for the first and second experiments, respectively. All of the variants were present at basal levels in the absence of RecA, and at a much higher frequency in the presence of RecA, suggesting active gene conversion during the experiment. In the absence of RecA the average k-mer depth across the variant portion of each assembly was less than 10, compared with 10⁵ to 10⁶ for the rest of the assembly. In each case the k-mer depth fell to zero for a portion of that variant sequence. An example of this can be seen in Figure S1, Panel A. We used the average k-mer depth across the entire variant sequence to calculate the frequency, rather than the minimal k-mer depth (zero) for just a portion of the variant sequence. In agreement with the suggestion that this represents antigenic variation, alignments of the assembled sequences (Figures S3 and S4) demonstrate that sequence variation was confined to the 3′ end of the pilE gene. Of the 28 variant sequences detected in the first experiment, 27 were identical to part of a pilS locus in the FA1090 genome sequence (Table 3). In the remaining case (allele 25) a mosaic sequence was present, derived from multiple silent loci. This is possibly derived from two separate recombination events involving pilS2c3 and pilS2c4. In two cases the variant sequence was identical to more than one silent locus, so it was not possible to unambiguously map the donor sequence. It was also evident that a particular silent locus can be involved in generating multiple variants. For instance, different portions of pilS3c3 were involved in generating alleles 4, 16, 19 and 26 (Table 3, Figure S3).
In the repeat experiment all five variant sequences were identical to part of a silent locus (Table 4). Two variant sequences (alleles 3 and 4) appeared in both experiments. A variety of silent loci served as the source of donor sequences, with 12 of the 19 silent copies potentially involved.

Variation at the pilE Gene Encoding Class I Pilin in N. gonorrhoeae Strain MS11
The pilE gene of gonococcal strain MS11 is also antigenically variable, but reportedly at a lower frequency than in strain FA1090 [9]. In order to determine whether the approach outlined above could detect such differences, we again conducted two separate experiments, using strain MS11 recA6. In the first experiment 15 assemblies passed the screening procedure (Table 5, Figure S5), while the second experiment yielded 8 assemblies (Table 6, Figure S6). In both experiments allele 1 was identical to both the reference sequence and the MS11 pilE sequence deposited in the databases (accession number K02078). In agreement with the earlier report [9] the frequency of antigenic variation observed was lower than that seen in strain FA1090 (Table 5 and Table 6). Rather than being barely detectable in the RecA− culture, the variant sequences were present at a higher frequency than seen in FA1090, suggesting that despite single colony isolations a variant subpopulation was present. This occurred in both experiments and might suggest that the recA promoter is not as tightly controlled in the MS11 genetic background. As a result the presence of RecA only boosted the variant frequency approximately 10-fold in MS11, compared with approximately 100-fold in FA1090 (Table 3, Table 4). Not all silent loci in strain MS11 have been sequenced and annotated. As a result, in three cases in both experiments it was not possible to assign a specific silent locus as the source of the donor sequence. Alleles 9 and 11 appeared in both experiments.

Lack of Variation in the pilE Gene of Strains of N. meningitidis that Express Class II Pilin
Meningococci also express type IV pili, with at least some of the class I pilin subunits also varying antigenically [17]. However the class II pili of N. meningitidis FAM18 appear not to vary [3]. We therefore searched for evidence of pilE variation in FAM18 and another meningococcal strain, NMB, which also produces class II pilin. In neither strain were we able to detect antigenic variation despite adequate k-mer depth (Table 2). We also looked for antigenic variation in meningococcal strain CKNM397. This strain is derived from strain NMB but produces the class I pilin from strain MC58 instead of the native class II pilin [11]. In two separate experiments we were unable to detect any antigenic variants despite adequate k-mer depth (Table 2).

Variation at opa Genes
For the reasons outlined above, we also investigated whether there was any evidence of antigenic variation in the opa genes of N. gonorrhoeae strain FA1090. In an initial experiment we were unable to detect any variation in the FA1090 opaJ gene (NGO1922) (data not shown). However it has been reported that the opaK gene, situated close to the pilE gene, is more variable than other opa loci [7]. In two separate experiments, we were able to detect variant sequences in opaK (NGO2132) (Table 7, Figure S7 and Figure S8). In each case the variant portion of the assembly was identical to part of the opaD gene (NGO1513), which appeared to act as the donor sequence in these experiments. However in the first experiment the variant sequences were clearly present in the RecA− sample and the presence of RecA boosted their levels only marginally (Table 7). In the second experiment both variants displayed a probably unrelated change in a CTCTT-repeat region known to be subject to slipped-strand mis-pairing. Other than this change, allele 2 in the first experiment is identical to allele 5 in the second experiment. In this second experiment both variants appeared much more frequently in the presence of RecA (Table 7).

Discussion
Previous investigations of the extent and nature of antigenic variation in Neisseria have involved the amplification of genes from hundreds of individual colonies that were then sequenced by conventional Sanger sequencing technology in order to detect the subset that contained variant sequences [1,3,8]. Such an approach was both time-consuming and expensive. The advent of affordable deep sequencing platforms has enabled an alternative approach to such studies. Here we have used PCR amplification of the genes of interest, followed by high-throughput DNA sequencing, to detect variant sequences. This involved a single PCR amplification rather than hundreds, one sequencing reaction rather than hundreds, and an improved ability to detect low-frequency variants. The data shown in Table 3 and Table 4 suggest that this approach is indeed capable of detecting examples of antigenic variation, in that the variant sequences detected in these experiments are typical of antigenic variation events. They are (a) restricted to the 3′ end of the pilE gene, (b) detected at a high frequency only in the presence of RecA, and (c) in almost all cases identical to a portion of a pilS copy. We were also able to detect antigenic variation in the pilE gene of N. gonorrhoeae strain MS11, and in agreement with an earlier report [9] this appeared to be occurring at a frequency lower than that observed for strain FA1090.
In agreement with earlier studies [1,9] there are aspects of the antigenic variation that are difficult to explain. In the cases where it was possible to unambiguously identify the source of the donor variant sequence, it is clear that all silent loci can act in this capacity, and the number of variants generated was roughly proportional to the number of partial gene copies within the individual pilS loci. This was true for both strains FA1090 and MS11, and would seem to imply an underlying stochastic process. However we also observed two alleles arising in two separate experiments, again for both FA1090 and MS11. Similar results have been observed before [1,9]; they exceed what might be expected by chance, and suggest some bias in the process of selection of donor sequences. There was also some bias evident in the partial gene copy, within a silent locus, that was used as a donor sequence. The copy at the 3′ end of the silent loci (copy 1) appears to be over-represented. For strain FA1090, 5 of the 19 partial gene copies are designated as copy 1, so if all copies were used equally they would be expected to make up 26% of the donor sequences. In fact they make up 40% of the donor sequences. For strain MS11 this bias is even more pronounced. The use of copy 1 as a donor would be expected 36% of the time, whereas this was observed in 69% of cases.
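The expected proportion quoted above for FA1090 follows from simple arithmetic, reproduced in the Python sketch below. The function names are hypothetical, and the calculation assumes, as the text does, that every partial gene copy is equally likely to act as donor. Only the FA1090 figures are used, since the copy counts behind the MS11 percentages are not given here.

```python
def expected_share(copy1_copies, total_copies):
    """Share of donor events expected from copy 1 under equal usage of all copies."""
    return copy1_copies / total_copies


def enrichment(observed_share, expected):
    """Fold over-representation of the observed share relative to expectation."""
    return observed_share / expected


# FA1090: 5 of the 19 partial gene copies are copy 1; 40% of donors observed.
fa1090_expected = expected_share(5, 19)
fa1090_fold = enrichment(0.40, fa1090_expected)
print(f"expected {fa1090_expected:.0%}, fold over-representation {fa1090_fold:.2f}")
```

On these figures copy 1 donors are about 1.5-fold over-represented in FA1090. Whether that excess is statistically significant would depend on the number of unambiguous donor assignments, which this sketch does not model.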
A similar theme emerges when the variant frequency, rather than the number of variants, is examined. We summed the RecA+ frequencies of the individual variants to provide an overall frequency for all variants. For FA1090 this was 15.1%, and for MS11 3.6%. This is in reasonable agreement with previous studies using different methods where the variant frequency was estimated to be 12.9-13% for FA1090 [1,9] and 5.7% for MS11 [9]. Again leaving aside those cases where a donor sequence could not be unambiguously identified, it seemed that partial gene copies 1 from the various pilS loci were again over-represented. For FA1090 they make up 53%, and for MS11 78%, of the unambiguous variant frequency.
The partial gene copies designated copy 1 differ from other silent copies in that they contain a sequence of approximately 250 nucleotides that has been designated the Pilus Associated Repeat (PAR) [15]. PAR is found immediately downstream of the end of the pilE gene, and each copy 1 of the pilS loci. Within the PAR in FA1090 are two previously described repeats, RS4 (32 nt) and the Sma/Cla repeat (65 nt) [15]. PAR is identically located in strain MS11, although some copies lack the RS4 repeat [16]. Previous work suggested that deletion of the Sma/Cla repeat downstream of the pilE gene results in a decrease in the amount of antigenic variation [18]. Our results suggest that PAR sequences associated with copy 1 in the various pilS loci influence both the source of the donor sequence and the frequency of antigenic variation associated with these specific partial gene copies.
It has previously been reported that antigenic variation is undetectable in the pilE gene of N. meningitidis strain FAM18 expressing class II pilin [3]. Despite potentially having the ability to detect variants at a much lower frequency than in the previous report, we were also unable to detect variation in this gene, or in the equivalent gene from N. meningitidis strain NMB. Strains producing class II pili have only two silent loci, as opposed to eight pilS copies in other meningococcal strains. Also the pilE gene expressing class II pilin is located elsewhere on the chromosome, whereas in those meningococcal strains that produce class I pili the silent loci are adjacent to pilE. Both of these factors might adversely affect the ability of the pilE gene from class II-producing strains to undergo antigenic variation. In addition, a DNA structure in the pilE promoter region of N. gonorrhoeae that is necessary for pilin antigenic variation has recently been described [12]. This guanine quartet structure is degenerate in meningococcal strains that produce class II pili, and this would also adversely affect the frequency of antigenic variation. However CKNM397 contains the pilE gene, and its native promoter, from strain MC58. The promoter region includes the sequences from MC58 that can form the guanine quartet structure. The absence of variation in this strain suggests that although this DNA structure is necessary for antigenic variation [12], by itself it is not sufficient to allow this process, at least in this genetic background. It seems that additional factor(s), present in class I-producing but not class II-producing strains, are needed for antigenic variation. Both FAM18 and NMB are disease-causing, rather than carriage, isolates. It therefore seems that antigenic variation of pilE is not essential for virulence, despite the fact that this process can be observed in all N. gonorrhoeae and many N. meningitidis strains.
We have also investigated another gene family where it has been suggested that antigenic variation might be occurring. In initial experiments with the opaJ gene of N. gonorrhoeae strain FA1090, we were unable to detect any variation. However it has been reported that the opaK gene, located close to pilE, is more variable than other opa genes [7]. In two separate experiments with opaK we were able to detect sequence variants. However only in the second experiment were they clearly more frequent when the RecA protein was present. There were also distinct differences from the variation observed in pilE. Firstly the number of different sequence variants detected was much smaller than seen in pilE. Secondly, compared with pilE, there was a more distinct bias in the source of the donor sequences. In every case the variant portion of the sequence was identical to part of the opaD gene, suggesting that this gene alone was donating sequence to yield variants of opaK.
Here we have demonstrated that PCR amplification of the genes of interest, followed by high throughput DNA sequencing, can be used to investigate antigenic variation. This approach could therefore be applied to a wide range of antigenic variation systems at a level that has not previously been possible. For example Borrelia burgdorferi, the causative agent of Lyme disease has a surface-exposed lipoprotein, VlsE, that undergoes antigenic variation [19,20]. The Msp2 and Msp3 systems of Anaplasma marginale [21] and the VlhA system of Mycoplasma synoviae [22] are additional examples of antigenic variation of surface components that could be explored using this approach. These genes all contain both conserved and variable segments. Given the data depth that can be achieved using this approach, it should therefore be possible to undertake more systematic searches for conserved segments in these antigenically variable genes.

Supporting Information
Figure S1 K-mer depth across assemblies containing variant sequence segments. The green trace depicts the k-mer depth in an amplicon derived from a culture grown in the presence of RecA, and therefore antigenic variation. The blue trace shows the k-mer depth for an amplicon obtained from a culture grown in the absence of RecA and therefore no antigenic variation. The vertical pale green bar shows the variant segment of the assembly across which the k-mer depth was averaged, and the vertical grey bars the same-sized conserved sequences on either side across which the k-mer depth was averaged, for the frequency calculation. A. An assembly containing a single variant segment. B. An assembly where a conserved region of more than k bases interrupts a variant segment, resulting in a spike of k-mer depth (marked with a downward arrow) that would result in an artificially high frequency calculation if the k-mer depth was averaged across the entire region. (TIF)
Figure S2 Alignment of the sequence of the pilE gene from the stock of N. gonorrhoeae strain FA1090 used in these experiments (top), and the FA1090 genome sequence (bottom). Blue text indicates sequence flanking the pilE gene (black text). Sequence differences are highlighted in yellow. The grey shading highlights the extent of the sequence identity between the pilE genome sequence and pilS1c1, flanking the variant sequence. (DOC)
Figure S3 Alignment of the variant sequences detected in the first experiment with pilE in N. gonorrhoeae FA1090. The allele 1 assembly is identical to the reference sequence obtained by Sanger sequencing of the amplicon. Blue text indicates sequence flanking the pilE gene (black text). Sequence differences are highlighted in yellow. The grey shading highlights the extent of the sequence identity between the pilE sequence and the various silent copies, flanking the variant sequence. Where the variant sequence was identical to part of two silent copies, the larger of the two regions of sequence identity is shown. (DOC)
Figure S4 Alignment of the variant sequences detected in the repeat experiment with pilE in N. gonorrhoeae FA1090. The seq1 assembly is identical to the reference sequence obtained by Sanger sequencing of the amplicon. Blue text indicates sequence flanking the pilE gene (black text). Sequence differences are highlighted in yellow. The grey shading highlights the extent of the sequence identity between the pilE sequence and the various silent copies, flanking the variant sequence.
Text File S1 Assembly of sequence reads. (DOC)