Characterization of the Candida orthopsilosis agglutinin-like sequence (ALS) genes

  • Lisa Lombardi,

    Roles Conceptualization, Formal analysis, Investigation, Methodology, Writing – original draft

    Current Address: School of Biomolecular and Biomedical Science, Conway Institute, University College Dublin, Belfield, Dublin, Ireland

    Affiliation Department of Biology, University of Pisa, Pisa, Italy

  • Marina Zoppo,

    Roles Formal analysis, Investigation, Methodology

    Affiliation Department of Biology, University of Pisa, Pisa, Italy

  • Cosmeri Rizzato,

    Roles Formal analysis, Investigation, Methodology

    Current Address: Department of Translational Research and New Technologies in Medicine and Surgery, University of Pisa, Pisa, Italy

    Affiliation Department of Biology, University of Pisa, Pisa, Italy

  • Daria Bottai,

    Roles Formal analysis, Investigation, Methodology

    Affiliation Department of Biology, University of Pisa, Pisa, Italy

  • Alvaro G. Hernandez,

    Roles Formal analysis, Investigation, Methodology

    Affiliation Roy J. Carver Biotechnology Center, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America

  • Lois L. Hoyer,

    Contributed equally to this work with: Lois L. Hoyer, Arianna Tavanti

    Roles Conceptualization, Formal analysis, Funding acquisition, Investigation, Supervision, Writing – original draft

    Affiliation Department of Pathobiology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America

  • Arianna Tavanti

    Contributed equally to this work with: Lois L. Hoyer, Arianna Tavanti

    Roles Conceptualization, Funding acquisition, Investigation, Supervision, Writing – original draft

    Affiliation Department of Biology, University of Pisa, Pisa, Italy

Abstract


Agglutinin-like sequence (Als) cell-wall proteins play a key role in adhesion and virulence of Candida species. Compared to the well-characterized Candida albicans ALS genes, little is known about ALS genes in the Candida parapsilosis species complex. Three incomplete ALS genes were identified in the genome sequence for Candida orthopsilosis strain 90–125 (GenBank assembly ASM31587v1): CORT0C04210 (named CoALS4210), CORT0C04220 (CoALS4220) and CORT0B00800 (CoALS800). To complete the gene sequences, new data were derived from strain 90–125 using Illumina (short-read) and Oxford Nanopore (long-read) methods. Long-read sequencing analysis confirmed the presence of 3 ALS genes in C. orthopsilosis 90–125 and resolved the gaps located in repetitive regions of CoALS800 and CoALS4220. In the new genome assembly (GenBank PQBP00000000), the CoALS4210 sequence was slightly longer than in the original assembly. C. orthopsilosis Als proteins had features well known in C. albicans Als proteins such as a secretory signal peptide, N-terminal domain with a peptide-binding cavity, amyloid-forming region, repeated sequences, and a C-terminal site for glycosylphosphatidylinositol anchor addition that, in yeast, suggest localization of the proteins in the cell wall. CoAls4210 and CoAls800 lacked the classic C. albicans Als tandem repeats, instead featuring short, imperfect repeats with consensus motifs such as SSSEPP and GSGN. Quantitative RT-PCR showed differential regulation of CoALS genes by growth stage in six genetically diverse C. orthopsilosis clinical isolates, which also exhibited length variation in the ALS alleles and strain-specific gene expression patterns. Overall, long-read DNA sequencing methodology was instrumental in generating an accurate assembly of CoALS genes, thus revealing their unconventional features and first insights into their allelic variability within C. orthopsilosis clinical isolates.

Introduction


Fungal adhesion is essential for stable colonization of host surfaces and subsequent disease development. Adhesion is mediated by molecules exposed on the fungal cells and accessible ligands on the host surface. Fungal adhesins include glycosylphosphatidylinositol (GPI)-modified cell wall proteins that mediate interactions with host cells, resident microbiota, and abiotic surfaces (reviewed in de Groot et al., [1]).

Proteins in the Agglutinin-Like Sequence (ALS) family of Candida albicans are among the best-characterized fungal adhesins [2]. In this species, the ALS family includes eight genes that encode large, cell-surface glycoproteins that share a similar basic organization including an N-terminal domain with adhesive function (NT-Als), a central domain of tandemly repeated sequences, and a Ser/Thr-rich C-terminal domain. The presence of a secretory signal sequence at the N terminus of the protein and a GPI anchor addition site at the C terminus are consistent with protein entry into the secretory pathway, and final localization linked to β-1,6-glucan in the fungal cell wall [3].

The molecular basis for adhesive function was first described for C. albicans Als9 by solving its NT-Als structure [4]. The NT-Als structure shows two immunoglobulin-like domains that form a peptide-binding cavity that can contain up to 6 amino acids. The flexible C-terminal ends of peptide ligands make natural binding partners due to an invariant Lys at the bottom of the binding cavity that can form an ionic pair with the C-terminal carboxyl group of the incoming peptide. Further functional characterization was pursued using C. albicans Als3 because it makes the largest contribution to C. albicans adhesion to host cells (reviewed in [5] and [6]). Mutagenesis of key amino acids within the peptide-binding cavity did not alter NT-Als surface topography. When the mutant construct was introduced into C. albicans under control of the ALS3 promoter, the resulting strain produced Als3 on the cell surface in quantities similar to wild-type, but had the adhesive capacity of a Δals3/Δals3 null mutant that had no surface Als3 [7]. This work demonstrated the importance of the peptide-binding cavity in Als-mediated adhesion. Als proteins also have a short sequence with amyloid-forming potential [8] that contributes to the aggregative properties of the protein. Overall, these features promote C. albicans interaction with complex surfaces including host cells, other microbes, and protein-coated abiotic materials [7].

Despite the clinical relevance of species in the Candida parapsilosis sensu lato complex [9, 10], including C. parapsilosis, Candida orthopsilosis, and Candida metapsilosis, little is known about the Als proteins in these species. Previous data indicate that C. parapsilosis and C. orthopsilosis have comparable adhesive properties, while C. metapsilosis is the least adhesive species of the complex [11]. Over the past few years, the availability of genome sequences and annotations for C. parapsilosis and C. orthopsilosis led to identification of ALS-like genes in these opportunistic pathogens [12, 13]. The ALS gene composition of C. parapsilosis is variable depending on the strain examined [14]. Strain CDC 317, for which the annotation is featured in the Candida Genome Database, has five ALS genes whereas other C. parapsilosis strains have one or three. Disruption of CPAR2_404800 reduced C. parapsilosis adhesion to human buccal epithelial cells [15], suggesting that C. parapsilosis Als proteins function in adhesion like their C. albicans orthologs. The goal of this work was to identify and characterize the ALS gene family of C. orthopsilosis using genome sequence data. Because of the presence of extensive tracts of repeated sequences, ALS genes are often incomplete in genome assemblies. Here, we describe the process of completing the sequences of the C. orthopsilosis ALS genes and their encoded proteins. A set of C. orthopsilosis clinical isolates was used to examine sequence variation and relative ALS gene expression patterns. The resulting data provide insight into the ALS gene family in C. orthopsilosis and the basis for functional characterization.

Materials and methods

Fungal strains and growth conditions

The C. orthopsilosis type strain ATCC 96139 [16] and the genome sequencing strain 90–125 [13] were included in this study, along with 4 clinical isolates (124, 85, 331, and 488) that were part of a strain collection deposited at the Department of Biology, University of Pisa. C. orthopsilosis strains were maintained as 30% glycerol frozen stocks at -20°C or -80°C and cultured on YPD agar plates (per liter: 10 g yeast extract, 20 g peptone, 20 g dextrose, 15 g agar). YPD liquid medium was used for routine growth at 30°C with shaking.

Genomic DNA preparation

C. orthopsilosis genomic DNA for PCR amplification was extracted after an overnight incubation at 30°C in YPD medium with shaking. Cells were resuspended in a lysis buffer, broken with glass beads, and the resulting suspension extracted with phenol:chloroform:isoamyl alcohol (25:24:1) as described previously [16]. Following RNase treatment, DNA was precipitated with 2 volumes of isopropanol and 10 μl of 4 M ammonium acetate. The pellet was dried and dissolved in 50 μl of TE (pH 8.0).

C. orthopsilosis genomic DNA for long-read sequencing was extracted from cells that were grown for 16 h at 37°C in YPD medium with 200 rpm shaking. Cells were treated with zymolyase to form spheroplasts that were lysed with sodium dodecyl sulfate. Spheroplasts were handled with gentle mixing by inversion, which was also used during phenol extractions and isopropyl alcohol precipitation of DNA [17].

Genome sequence data generation and assembly

New genome data were derived from strain 90–125 using Illumina (short-read) and Oxford Nanopore (long-read) methods. MiSeq shotgun genomic libraries were prepared with the Hyper Library construction kit (Kapa Biosystems). The library was quantitated by qPCR and sequenced on one MiSeq flowcell for 151 cycles from each end of the fragment using a MiSeq 300-cycle sequencing kit (version 2). FASTQ files were generated and demultiplexed with the bcl2fastq Conversion Software (Illumina). MiSeq reads were quality trimmed using Trimmomatic [18] with the parameters “LEADING:30 TRAILING:30” prior to assembly. MiSeq yielded 2,281,330 reads of 150 nt each.
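
The effect of the LEADING:30 and TRAILING:30 settings can be illustrated with a minimal Python sketch. This is not Trimmomatic's own code. It assumes Phred+33 quality encoding, and the function name and the example read are hypothetical.

```python
def trim_leading_trailing(seq, qual, leading=30, trailing=30, offset=33):
    """Drop bases from each end of a read while their Phred score is below the threshold."""
    scores = [ord(c) - offset for c in qual]
    start = 0
    # Walk in from the 5' end past low-quality bases.
    while start < len(scores) and scores[start] < leading:
        start += 1
    end = len(scores)
    # Walk in from the 3' end past low-quality bases.
    while end > start and scores[end - 1] < trailing:
        end -= 1
    return seq[start:end], qual[start:end]


# Hypothetical read: '#' is Phred 2 and 'I' is Phred 40 under Phred+33 encoding.
trimmed_seq, trimmed_qual = trim_leading_trailing("ACGTACGT", "##IIIII#")
```

Only the read ends are affected, so a low-quality base in the middle of a read is retained under these two settings.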

For Oxford Nanopore long-read sequencing, 1 μg of genomic DNA was sheared in a gTube (Covaris, Woburn, MA) for 1 min at 6,000 rpm in a MiniSpin plus microcentrifuge (Eppendorf, Hauppauge, NY). The sheared DNA was converted into a Nanopore library with the Nanopore Sequencing kit (LSK-108) with the Expansion barcoding kit (EXP-NBD103; Oxford Nanopore, UK). The library was sequenced on a SpotON Flowcell MK I (R9.4) for 48 h using a MinION MK 1B sequencer. Base calling and demultiplexing were performed in real time with the Metrichor Agent V2.45.3 using the 1D Base Calling plus Barcoding for FLO-MIN_106D 450 bp workflow. Sixty nucleotides (nt) were removed from both ends of each Oxford Nanopore read. Reads longer than 1,000 nt were used in the final assembly. The Oxford Nanopore (ONP) flow cell yielded 40,744 reads for a total of 364,246,709 bp. The mean and median ONP read lengths were 8,940 and 8,754 bp, respectively, with a minimum of 114 bp and a maximum of 98,108 bp.
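
The end-clipping and length filter described above can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes the length cutoff was applied after clipping, which the text does not specify. The last line checks that the reported mean read length follows from the reported totals.

```python
def trim_and_filter(reads, clip=60, min_len=1000):
    """Clip `clip` nt from both ends of each read, then keep reads longer than `min_len`."""
    kept = []
    for seq in reads:
        # Reads too short to clip on both ends are discarded outright.
        trimmed = seq[clip:len(seq) - clip] if len(seq) > 2 * clip else ""
        if len(trimmed) > min_len:
            kept.append(trimmed)
    return kept


# Total yield divided by read count gives the mean read length.
mean_len = 364_246_709 / 40_744  # rounds to the reported 8,940 bp
```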

Genome assembly was performed with Canu v1.4 [19] and default parameters, using the command ‘canu -p asm -d orthopsilosis genomeSize=14m useGrid=false -nanopore-raw C_orthopsilosis_trimmed.fastq’ on the trimmed ONP FASTQ reads. ONP reads were then aligned against the assembly using bwa [20], and the alignment was used to polish the assembly with nanopolish v 0.6.0 [21]. Finally, the trimmed MiSeq data were used to further polish the assembly with Pilon v1.21 [22].

Identification and in silico analysis of ALS genes

C. orthopsilosis ALS genes were identified by command-line BLAST of the entire genome sequence using known ALS sequences as the query. C. albicans ALS3 (GenBank accession number AY223552) provided a baseline query because of its prototypical N-terminal (NT-Als) domain sequence for which the three-dimensional structure is known [7], as well as the central tandem-repeat domain (TR; head-to-tail copies of a 36-amino acid repeated sequence) and a Ser/Thr-rich C-terminal (CT) domain present in the translated protein. Other query sequences included C. albicans ALS1 (L25902), ALS2 (AH006927), ALS4 (AH006929), ALS5 (AY227440), ALS6 (AY225310), ALS7 (AF201684), ALS9-1 (AY269423), and ALS9-2 (AY269422). Once identified, CoALS genes were also used as BLAST queries. The putative cleavage site of the N-terminal signal peptide was predicted using SignalP 4.1 Server [23]. Tango in silico analysis [24] was used to identify the amyloid-forming region (AFR). The hypothetical position of the ω site to which the GPI moiety is attached after proteolytic cleavage was predicted by using PredGPI [25]. Tandem repeat units were detected with T-Reks [26].

Analysis of ALS allelic variation

PCR was used to amplify various CoALS fragments to detect allelic size variation. Primers were designed according to the genomic sequence of strain 90–125 available in the Candida Gene Order Browser database (CGOB3; [27, 28]; Table 1). PCRs used DreamTaq DNA Polymerase (Thermo Fisher Scientific); primers were synthesized by Sigma Genosys or Integrated DNA Technologies. Amplification of entire CoALS genes used Q5 High-Fidelity DNA polymerase (New England Biolabs, NEB). PCRs were heated at 98°C for 30 s followed by 30 cycles of 98°C (10 s), 68°C (30 s), and 72°C (2.5 min for CoALS800 and CoALS4210, and 3.5 min for CoALS4220). A final 10-min extension at 72°C was performed. PCR products were separated on a 0.8% agarose gel in Tris Acetate EDTA buffer (TAE). Molecular sizes were calculated in silico using Gel Analyzer 2010 software and either the GeneRuler 1 kb DNA ladder (Thermo Fisher Scientific) or 100 bp DNA ladder (NEB).

Quantitation of relative gene expression levels

Relative expression of the CoALS genes was determined by real-time reverse transcription (RT)-PCR starting from total RNA of C. orthopsilosis isolates. Each strain was inoculated in 10 ml of YPD and grown for 16 h at 30°C with shaking. An aliquot (500 μl) of the pre-inoculum was then inoculated in 20 ml of fresh YPD broth and incubated for 1 h and 24 h at 30°C. Total RNA was extracted using Nucleospin RNA (Macherey Nagel, Düren, Germany) according to the manufacturer’s instructions and treated with DNase (Macherey Nagel) to remove DNA contamination. RNA was eluted in 60 μl of RNase-free water and stored at -80°C. The quality and quantity of the extracted RNA were determined spectrophotometrically in a UVette 220–1600 (10 mm path length, 100 μl of sample volume, Eppendorf, Milan, Italy). One μg of total RNA in a 20-μl reaction volume was converted into cDNA with random primers, using the Reverse Transcription System kit (Promega), following the manufacturer’s instructions. An RT-negative control was included to ensure lack of genomic DNA contamination.

Primer sequences for real-time PCR are shown in Table 1. Each PCR mixture (20 μl) contained 1 μl of cDNA, 10 μl of Sso Advanced universal SYBR Green supermix, 1 μl each of primers (final concentration 0.2 μM) and 7 μl of sterile MilliQ water. Real-time PCR was performed in 96-well plates on a CFX96 Touch Real-Time PCR Detection System (BioRad) (95°C incubation for 60 s, followed by 40 cycles of 95°C incubation for 5 s and 58°C for 15 s). C. orthopsilosis ACT1 was used as the reference gene (Table 1). The transcription level of ALS genes was calculated using the 2^-ΔCt method [29]. RT-PCR results were evaluated by Repeated Measures ANOVA test, followed by Dunnett’s Multiple Comparison Test. A P value <0.05 was considered statistically significant.
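
The ΔCt-based calculation can be written as a short Python sketch, where ΔCt is the threshold cycle of the target gene minus that of the ACT1 reference. The function name and the Ct values are hypothetical and are not data from this study.

```python
def relative_expression(ct_target, ct_reference):
    """Return 2^-ΔCt, where ΔCt = Ct(target) - Ct(reference)."""
    delta_ct = ct_target - ct_reference
    return 2.0 ** (-delta_ct)


# Hypothetical Ct values, for illustration only (ΔCt = 5).
example = relative_expression(ct_target=25.0, ct_reference=20.0)
```

A smaller ΔCt means the target transcript crosses the detection threshold closer to the reference gene, which corresponds to higher relative expression.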


Results

Identification and DNA sequence of C. orthopsilosis ALS genes

The C. orthopsilosis strain 90–125 genome sequence initially was accessed using CGOB3 [27, 28] and three putative ALS genes were located. Subsequently, publicly available data were used to more carefully describe the ALS genes in the reference genome assembly (ASM31587v1). One ALS gene was located on chromosome 2 (CORT_0B00800) and two more in tandem on chromosome 3 (CORT_0C04210 and CORT_0C04220; Fig 1). For simplicity, the gene names were abbreviated here as CoALS800, CoALS4210, and CoALS4220, respectively.

Fig 1. Schematic of CoALS genes.

Analysis reported here indicated that C. orthopsilosis strain 90–125 encoded three ALS genes (in green), namely CoALS800 (2499 bp) located on chromosome 2, and CoALS4210 and CoALS4220 (2457 bp and 6078 bp, respectively), which were contiguous on chromosome 3 and separated by 6321 nt. The scheme was drawn to scale. An arrow below each gene indicates the orientation of the ORF.

In silico analysis revealed that the sequences of CoALS800 and CoALS4220 were incomplete due to mis-assembly of repeated DNA sequences in the coding region (Table 2).

Table 2. Comparison between ALS genes in C. orthopsilosis strain 90–125 genome assemblies ASM31587v1 and PQBP00000000.

The genome assembly was generated from short-read sequences (454 Life Sciences and Illumina) with the aid of paired-end Sanger sequence reads from a fosmid library [13]. Because fungal species tend to encode multiple ALS genes, each containing long stretches of repeated DNA, ALS genes are very difficult to assemble from short-read sequence data. The recent development of long-read DNA sequencing methodology provided the potential to produce sequence reads that span entire repeat regions. One drawback of the long-read technology is reduced accuracy of base calling [30], so Illumina data were also generated and incorporated into the genome assembly. The assembled genome was deposited in GenBank with the accession number PQBP00000000. The genome assembled into 10 contigs that mapped to the 8 chromosomal sequences defined by the reference genome assembly (ASM31587v1; Table 3). Long-read sequence data contributed to an improved assembly. For example, assembly ASM31587v1 had 242 contigs in 8 scaffolds, an N50 of 120 kb, and an L50 of 36. Assembly PQBP00000000 had no added Ns, an N50 of 1.59 Mb, and an L50 of 4.
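
For readers unfamiliar with the contiguity statistics quoted above, the following Python sketch shows how N50 and L50 are conventionally computed from a list of contig lengths. The function name and the example lengths are hypothetical and do not come from either assembly.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # N50 is the length of the contig that brings the running total to half the assembly;
        # L50 is the number of contigs needed to get there.
        if running >= half:
            return length, count
    raise ValueError("no contigs supplied")


# Hypothetical contig lengths in bp.
example = n50_l50([80, 70, 50, 40, 30, 20])
```

A larger N50 together with a smaller L50 indicates that fewer, longer contigs account for half of the assembled sequence.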

Table 3. Comparison between chromosomes/contigs from the ASM31587v1 (454/Illumina) and PQBP00000000 (ONP/Illumina) assemblies.

The new genome assembly was searched using the Basic Local Alignment Search Tool (BLAST) with C. albicans Als3 (CaAls3) as the query (translated from GenBank accession number AY223552). BLAST results revealed the same three genes discussed above (CoALS4210, CoALS4220, CoALS800). Additional BLAST, using the CoALS sequences and other parts of known ALS genes as queries, failed to reveal additional genes, suggesting that strain 90–125 encoded three ALS genes. The schematic of the chromosomal arrangement of the C. orthopsilosis ALS genes (Fig 1) accurately depicts both genome assemblies. Final sequences for the CoALS genes were deposited in GenBank under accession numbers MG799557 (CoALS800, 2499 bp), MG799558 (CoALS4210, 2457 bp), and MG799559 (CoALS4220, 6078 bp).

The information above was the most concise description of the path toward identifying the CoALS genes and validating their DNA sequences. Prior to generating the new genome assembly, CoALS sequence assembly was attempted by subcloning and PCR amplification of various gene fragments, and Sanger sequencing of the resulting constructs and products. Other GenBank deposits of strain 90–125 sequences were made during the course of the study and are listed here for the sake of completeness. These included KJ679579 (which was identical to MG799557), KX961387 (a partial sequence including the 5’ domain of CoALS4210, which was 100% identical to MG799558 in the region of overlap), and KY211672 (a partial CoALS4220 sequence, which was assembled using Xs to indicate unknown nucleotides within the tandem repeat region).

Comparisons between data from the different approaches suggested only minor differences. For example, CoALS4210 was predicted to be shorter in the ASM31587v1 assembly than the PQBP00000000 assembly. Validation methods pointed to the MG799558 sequence as the correct, final version. For CoALS4220, the long-read sequence technology provided an accurately sized template for assembly of the tandem repeat sequences. Sanger sequencing results for subcloned fragments and PCR products from the different laboratories contributing to this manuscript were in agreement, with the exception of 6 nucleotides in tandem repeat unit 14 (TR14); the shorter version was reported in MG799559 and featured in this manuscript. The 90–125 isolate used in all work originated in the Tavanti laboratory.

Features of C. orthopsilosis Als proteins

C. orthopsilosis ALS genes were translated to visualize and compare the CoAls proteins (Fig 2). Protein features were compared to the well-characterized C. albicans proteins [2]. Each CoAls protein encoded a secretory signal sequence of 22 amino acids followed by an N-terminal (NT) domain of 312 or 313 amino acids. The CoAls NT domains were 81–87% identical, and shared 45–47% identity with NT-Als3 from C. albicans. Alignment of the NT-CoAls amino acid sequences with NT-Als3 for which the three-dimensional structure is known [7] showed conservation of the eight Cys that provide the NT-Als3 fold (Fig 3). This sequence similarity suggested conservation of adhesive function in the CoAls proteins. The NT domain was followed by a short sequence (AFR) that had amyloid-forming potential as defined by Tango [24]. The aggregative function of this sequence was demonstrated previously in C. albicans Als proteins [7, 8].
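
Percent identity values such as those above are derived from pairwise alignments. The Python sketch below shows one common definition, counting identical residues over alignment columns that contain no gap. The text does not state which definition was used, so this convention is an assumption, and the function name and the short aligned sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identical residues over alignment columns with no gap in either sequence."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue ('-' marks a gap).
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# Hypothetical aligned fragments: 5 identical residues over 6 ungapped columns.
example = percent_identity("MKTL-SA", "MKSLQSA")
```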

Fig 2. Domain architecture of CoAls proteins.

The three CoAls protein sequences were drawn to scale, with domains represented by different colors. Each protein encoded a secretory signal peptide followed by the N-terminal domain (NT-Als) where adhesive function resides in the well-characterized C. albicans orthologs [4, 7]. Each CoAls protein also had a short amyloid-forming region (AFR; [8]) and a Thr-rich (T) domain. Like Als proteins in C. albicans [2], CoAls4220 included a central region of conserved tandemly repeated sequences. Most units were 36 aa, but two (TR14 and TR20) had 34 aa and one (TR17) had 35 aa. CoAls4210 and CoAls800 lacked the central tandem-repeat region. Instead, they had a short, imperfect repeated sequence (SSSEPP consensus), a Ser/Thr/Pro-rich region, and another short, imperfect repeat (GSGN consensus). Each CoAls protein had a C-terminal signal for GPI anchor addition, predicting its localization in the fungal cell wall [3].

Fig 3. Conserved NT-Als features suggested adhesive activity for CoAls proteins.

The mature (signal peptide removed) NT domains of CoAls800, CoAls4210, and CoAls4220 contained 8 Cys residues in conserved positions (highlighted in yellow), which are essential for the folding of C. albicans (Ca) NT-Als3 for which the three-dimensional structure was solved [7]. Conservation of NT-Als3 adhesive function in the CoAls proteins was also suggested by the presence of the invariant Lys (K59) located in the CaNT-Als3 binding cavity (highlighted in blue). The amino acid alignment was produced using Clustal Omega. Identical (*), conserved (:), and semi-conserved (.) amino acids are indicated below the alignment. Dashes in the sequence indicate gaps. The sequence of C. albicans Als3 (CaAls3; GenBank accession number AY223552) was used as a reference.

Like CaAls3, the CoAls proteins had a Thr-rich region (T domain; 32–34% Thr) that followed the NT and AFR sequences. The boundaries of the T domain were based on evaluations of sequence data, rather than on functional data. Currently the T domain is bounded in C. albicans Als proteins by the end of the AFR and the start of the tandemly repeated sequences [7]. Of the newly described CoAls proteins, only CoAls4220 had tandemly repeated copies of a 36-aa sequence. Unlike C. albicans Als proteins, however, the length of selected repeat units in CoAls4220 was variable, with some repeat units lacking one or two amino acids. The region C-terminal to the tandem repeats in CoAls4220 was rich in Ser (30%) and Thr (15%) similar to C-terminal regions in C. albicans Als proteins.

Regions following the T domain in CoAls800 and CoAls4210 were different from those observed in other Als proteins. Compositions of the two proteins were very similar in this region (Fig 2). Both encoded two different short, imperfect repeated sequences. The motif SSSEPP was found in the region proximal to the T domain. Following a Ser/Thr/Pro-rich (58–62%) region, a GSGN motif was present. Each CoAls protein had a C-terminal sequence with hallmarks of a GPI anchor addition site. The C-terminal 20 aa were predicted to be cleaved in this process.
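
One simple way to locate short, imperfect repeats of this kind is to scan a protein sequence for windows that differ from a consensus motif by at most one substitution. The Python sketch below illustrates the idea. It is not the algorithm used by T-Reks, and the function name and the toy sequence are hypothetical; the sequence was invented for illustration and is not taken from any CoAls protein.

```python
def find_imperfect_repeats(protein, motif, max_mismatch=1):
    """Return 0-based start positions of windows within `max_mismatch` substitutions of `motif`."""
    k = len(motif)
    hits = []
    for i in range(len(protein) - k + 1):
        window = protein[i:i + k]
        # Hamming distance between the window and the consensus motif.
        mismatches = sum(1 for a, b in zip(window, motif) if a != b)
        if mismatches <= max_mismatch:
            hits.append(i)
    return hits


# Toy sequence: one perfect copy and two single-substitution copies of SSSEPP.
toy = "SSSEPPSSSEAPTTSSTEPP"
hits = find_imperfect_repeats(toy, "SSSEPP")
```

Substitution-only matching misses repeat copies that carry insertions or deletions, so a dedicated repeat finder remains the better choice for real sequences.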

Allelic variation in C. orthopsilosis ALS genes

C. albicans ALS genes are marked by considerable variation that exists between strains and between alleles in the diploid species [31, 32]. Allelic variation is notable in small nucleotide sequence changes, as well as large differences in gene length, mainly due to expansion and contraction of repeated sequences within the coding region. These observations provide the foundation for evaluation of allelic variation in the CoALS genes.

PCR primers were designed to amplify and sequence the 5’ end of each CoALS gene in the strains used in the study (Table 1). Resulting sequences were deposited in GenBank. For CoALS800, accession numbers included KM506766 (ATCC 96139), KM506767 (85), KJ855317 (124), KJ855318 (488), and KJ855319 (331). CoALS4210 accession numbers were KX961388 (ATCC 96139), KX961391 (85), KX961389 (124), KX961390 (488), and KX961392 (331). CoALS4220 accession numbers included KX961393 (ATCC 96139), KX961394 (85), KX961395 (124), KX961396 (488), and KX961397 (331). Translation of these nucleotide sequences provided 396 aa from the N-terminal end of CoAls800, 340 aa from CoAls4210, and 337 aa from CoAls4220, including the signal peptide for each protein. Alignment of all sequences for the same protein showed >98% identity, suggesting little variation in the adhesive domain across strains. None of the altered amino acids in any of the proteins was located within the peptide-binding cavity.

PCR was used to assess length variation among the CoALS genes. Length variation was apparent from amplification of the entire CoALS coding region (Fig 4A). Targeted PCR primers were designed to attribute this length variation to specific regions of each gene. Length variation was present in sequences 3’ of the AFR-encoding region (Fig 4B, 4C and 4D). Variation between the diploid alleles was obvious in some strains within the CoALS4220 tandem repeat domain (Fig 4D).

Fig 4. Source of allelic variability in CoALS genes and their encoded proteins.

A PCR-based strategy was used to evaluate the presence of allelic variation among CoALS genes from C. orthopsilosis strains 90–125 (1); ATCC 96139 (2); 85 (3), 124 (4), 488 (5), and 331 (6). Each subfigure shows a schematic of the CoALS gene or its encoded protein and corresponding PCR products that were analyzed on ethidium-bromide-stained agarose gels. Flanking gray rectangles represent the position of PCR primers outside of the coding region. (A). Overall size differences of the CoALS genes in each strain were demonstrated using primers 5’ and 3’ of the coding region (depicted as arrows; primer sequences are detailed in Table 1). Size markers (in kb) are indicated on the left of each gel image. Experiments used either GeneRuler 1 kb DNA ladder (Thermo Fisher Scientific) or 100 bp DNA ladder (NEB). Dissection of the source of the allelic variation in genes CoALS800 (B), CoALS4210 (C), and CoALS4220 (D) indicated variability in the sequences encoding the C-terminal regions of each protein and the tandem repeats in CoALS4220. Primers are labeled with lowercase letters that correspond to the labels on the agarose gel images. Sizes of fragments encoding the AFR and T domains were not detectably different between strains.

Additional primers were designed to further dissect the location of the observed length variation (Fig 5). Strain and/or allelic variability was noted in the SSSEPP-encoding sequences of CoALS800 and CoALS4210. The GSGN-encoding sequences in these two genes were homogeneous in CoALS800, but variable in CoALS4210. Variability was observed in the 3’ end of the CT-encoding domain of CoALS4220. These sequence differences suggest that mature CoAls proteins will be different sizes across strains, and that within a strain, alleles may produce proteins of different lengths.

Fig 5. Dissection of allelic variability in CT-encoding regions of CoALS genes.

Each panel contains the schematic of a CoAls protein (A = CoAls800; B = CoAls4210; C = CoAls4220) carried forward from Figs 2 and 4, agarose gel images that reveal PCR product sizes from amplification with different primer pairs (labeled in lowercase letters), and a Dotmatcher output that compares each amino acid sequence to itself to reveal repeated sequences. The analyzed region is indicated by a red arrow in each panel. Strain numbers are the same as for Fig 4. Primer sequences are shown in Table 1. Molecular sizes (in kb) are shown at the left of each gel image. GeneRuler 1 kb DNA ladder (Thermo Fisher Scientific) was used in all experiments. Variability in the CT region of CoAls800 was located in the SSSEPP region (A), while in CoAls4210, the GSGN region was also variable in size (B). The CT region of CoAls4220 was also variable in size, due to sequence differences that encode the 3’ half of the CT region (C).

Real-time PCR analysis of C. orthopsilosis ALS gene expression

Quantitative expression of CoALS genes was measured in the clinical isolates and reference strains grown in YPD medium for 1 h and 24 h. Data were displayed as a heat map (Fig 6). Transcription levels for the three CoALS genes varied based on stage of growth. CoALS800 showed the lowest expression level at 1 h incubation in all the strains tested (P < 0.0001). Conversely, CoALS4220 was expressed more highly than the other two genes (P < 0.0001 at 1 h, P < 0.001 at 24 h), although its transcriptional level was lower at 24 h compared to 1 h. Strain differences in expression were observed for all CoALS genes.

Fig 6. Strain- and growth-stage differences in CoALS gene expression.

Real-time RT-PCR was used to quantify relative expression levels for the three CoALS genes in the six C. orthopsilosis strains grown for either 1 h or 24 h at 30°C in YPD liquid medium. Lower numbers indicate a smaller difference between expression of the gene and the ACT1 control, suggesting higher overall relative expression. Gray-scale coding indicates higher (darker shading) and lower (lighter shading) expression. CoALS800 showed the lowest expression level at 1 h incubation in all the strains tested (P < 0.0001). CoALS4220 was expressed more highly than the other genes (P < 0.0001 at 1 h, P < 0.001 at 24 h), although its transcription level was lower at 24 h compared to 1 h.

Discussion


The study of microbial pathogenesis has been revolutionized by the availability of genome sequences for many species. Although sequence data can be generated and assembled into draft genome files at a rapid pace, some genes have features that defy accurate representation in these resources. Examples include genes that belong to families with many similar loci, and open reading frames that encode multiple copies of repetitive sequences. Genes in the ALS family possess both features, and as such, are often mis-assembled in available genome sequences.

Long-read DNA sequence methodology is one answer to this problem. Although long-read methods produce data with a lower accuracy in base calling [30], the method is attractive for studying the ALS family since the long-read sequence can provide a template upon which shorter-read data (i.e. Illumina) can be assembled. Work presented here demonstrates the utility of this approach. The combination of methods provided an accurate and complete assembly for two of the three CoALS genes. Data for the third gene were sufficiently complete that primers could be designed for PCR amplification and Sanger sequencing of the product, delivering a final gene sequence. Overall, combining long- and short-read approaches generated a more complete picture of the ALS family than was evident in the previous genome sequence that was assembled without the benefit of the long-read data.

Among the newly characterized CoALS genes, CoALS4220 looks most like the ALS genes that were described in C. albicans because of the presence of multiple copies of a tandemly repeated sequence in the center of the gene. In CoALS4220, however, this sequence includes repeat copies that are missing 1 or 2 amino acids, a feature that was not observed in any of the C. albicans proteins. The CoALS800 and CoALS4210 genes are unique among currently characterized ALS genes because they do not possess a tandemly repeated sequence and, therefore, encode proteins that are shorter than most Als proteins for which adhesive function has been demonstrated. For example, C. albicans Als3 is produced from two alleles in strain SC5314: one protein is 1155 amino acids and the other is 1047 amino acids, due to the presence of three fewer copies of the tandemly repeated sequence. The shorter protein contributes less than the larger protein to C. albicans adhesion, presumably because the longer protein is better able to project the NT-Als adhesive domain away from the C. albicans cell surface [33]. These CaAls3 sequences are over 300 amino acids longer than the CoAls800 and CoAls4210 sequences described here. However, recent work shows that CoALS4210 contributes to C. orthopsilosis adhesion because deletion of the gene results in reduced adhesion to HBECs [34].

The current study is also unique in that it examines ALS gene expression in multiple clinical isolates using a quantitative method and demonstrates notable strain-specific gene expression patterns. Overall, C. orthopsilosis shows differential regulation of its ALS genes by growth stage, a theme that was also found in C. albicans. C. albicans ALS4 is up-regulated in cells from a saturated culture [35] whereas C. albicans ALS1 is highly expressed in cells that are transferred to fresh growth medium [36]. In C. orthopsilosis, CoALS800 was more highly expressed as a culture aged, while CoALS4220 was more highly expressed in a 1-h culture. CoALS4210 expression patterns varied by strain, with some strains showing higher relative expression in a young culture and others exhibiting little gene expression difference regardless of which growth stage was examined.

Future studies will be aimed at associating gene expression data with adhesive function in different experimental models. Previously characterized strain 124, described as highly adhesive to exfoliated buccal cells [11], shows a strong relative expression of CoALS4220 (Fig 6), which could be responsible for its higher relative adhesion [11]. This interpretation is consistent with observations in C. albicans, in which ALS gene expression levels are positively correlated with Als protein abundance.

C. orthopsilosis is closely related to Candida parapsilosis; it has been less than 15 years since the species were recognized as distinct [16]. Publications describe C. parapsilosis ALS gene content as highly divergent by strain. Pryszcz et al. [14] examined whole genome sequences from a variety of C. parapsilosis isolates and noted one strain with 5 genes, while others encoded only 1. In our current study, PCR primers designed to recognize CoALS800, CoALS4210 and CoALS4220 amplified the predicted products from each of 6 C. orthopsilosis isolates. Amplified Fragment Length Polymorphism analysis of four of the isolates (ATCC 96139, 85, 124, and 331) was reported previously [37]. UPGMA analysis showed that these strains belong to different clusters, indicating genetic diversity among the isolates used in the current study. These observations suggest broad conservation of the three CoALS genes within the species. Recently, it has been shown that the majority of C. orthopsilosis strains are hybrids between a Parental Species A (non-hybrid, of which the homozygous isolate 90–125 is representative) and a Parental Species B, which has not been isolated in non-hybrid form [38, 39]. Interestingly, three of the 4 clinical isolates used in this study are known to be hybrids belonging to different clades, namely strains Co85 (Clade 1), Co331 (Clade 2), and Co124 (Clade 4.1) [39]. It has been suggested that C. orthopsilosis and C. metapsilosis hybrid formation may have facilitated a change in pathogenicity to humans [39, 40]. Further analyses will be required to investigate potential association between adhesion ability and hybrid genomes.

Although Co85, Co124, Co331 and 90–125 all have diploid genomes, evaluation of copy number variation in 1-kb windows across the genomes revealed a single copy of CoALS4210 in strain Co85 [39]. This result did not seem to affect ALS mRNA levels detected by RT-PCR in strain Co85. We can conclude with certainty that strain 90–125 does not have additional ALS genes, but we cannot exclude the presence of other ALS genes in the remaining strains. Ongoing work in C. orthopsilosis will continue to characterize the ALS family and adhesive function of the Als proteins in this species.
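A window-based copy number estimate of the kind mentioned above can be sketched from read depth. The following is a generic illustration under stated assumptions and is not the procedure used in [39]: per-base depth is assumed to be available as a simple list, the genome-wide median window depth is taken to represent the diploid state, and the function names and depth values are hypothetical.

```python
from statistics import median


def window_means(depth, window=1000):
    """Mean read depth in consecutive, non-overlapping windows (default 1 kb)."""
    means = []
    for start in range(0, len(depth), window):
        chunk = depth[start:start + window]
        means.append(sum(chunk) / len(chunk))
    return means


def estimate_copy_number(depth, ploidy=2, window=1000):
    """Scale each window's depth to the median window depth, assumed to equal `ploidy` copies."""
    means = window_means(depth, window)
    baseline = median(means)
    if baseline == 0:
        raise ValueError("median window depth is zero; cannot normalize")
    return [round(ploidy * m / baseline) for m in means]


# Hypothetical per-base depth: six 1-kb windows, one of which has half the typical coverage.
depth = [40] * 3000 + [20] * 1000 + [40] * 2000
print(estimate_copy_number(depth))
```

In this toy input, the window with half the typical depth is assigned one copy while the remaining windows are assigned two, which is the pattern expected for a single-copy locus in an otherwise diploid genome.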


We thank the staff of the Roy J. Carver Biotechnology Center for their support with DNA sequencing and genome data assembly.


  1. de Groot PW, Bader O, de Boer AD, Weig M, Chauhan N. Adhesins in human fungal pathogens: glue with plenty of stick. Eukaryot Cell. 2013;12(4):470–81. pmid:23397570
  2. Hoyer LL, Green CB, Oh SH, Zhao X. Discovering the secrets of the Candida albicans agglutinin-like sequence (ALS) gene family—a sticky pursuit. Med Mycol. 2008;46(1):1–15.
  3. Lu CF, Kurjan J, Lipke PN. A pathway for cell wall anchorage of Saccharomyces cerevisiae alpha-agglutinin. Mol Cell Biol. 1994;14(7):4825–33. pmid:8007981
  4. Salgado PS, Yan R, Taylor JD, Burchell L, Jones R, Hoyer LL, et al. Structural basis for the broad specificity to host-cell ligands by the pathogenic fungus Candida albicans. Proc Natl Acad Sci U S A. 2011;108(38):15775–9. pmid:21896717
  5. Cota E, Hoyer LL. The Candida albicans agglutinin-like sequence family of adhesins: functional insights gained from structural analysis. Future Microbiol. 2015;10(10):1635–48. pmid:26438189
  6. Hoyer LL, Cota E. Candida albicans agglutinin-like sequence (Als) family vignettes: a review of Als protein structure and function. Front Microbiol. 2016;7:280. pmid:27014205
  7. Lin J, Oh SH, Jones R, Garnett JA, Salgado PS, Rusnakova S, et al. The peptide-binding cavity is essential for Als3-mediated adhesion of Candida albicans to human cells. J Biol Chem. 2014;289(26):18401–12. pmid:24802757
  8. Garcia MC, Lee JT, Ramsook CB, Alsteens D, Dufrêne YF, Lipke PN. A role for amyloid in cell aggregation and biofilm formation. PLoS One. 2011;6(3):e17632. pmid:21408122
  9. Barchiesi F, Orsetti E, Osimani P, Catassi C, Santelli F, Manso E. Factors related to outcome of bloodstream infections due to Candida parapsilosis complex. BMC Infect Dis. 2016;16:387. pmid:27507170
  10. Pfaller MA, Messer SA, Jones RN, Castanheira M. Antifungal susceptibilities of Candida, Cryptococcus neoformans and Aspergillus fumigatus from the Asia and Western Pacific region: data from the SENTRY antifungal surveillance program (2010–2012). J Antibiot (Tokyo). 2015;68(9):556–61.
  11. Bertini A, De Bernardis F, Hensgens LA, Sandini S, Senesi S, Tavanti A. Comparison of Candida parapsilosis, Candida orthopsilosis, and Candida metapsilosis adhesive properties and pathogenicity. Int J Med Microbiol. 2013;303(2):98–103. pmid:23403338
  12. Butler G, Rasmussen MD, Lin MF, Santos MA, Sakthikumar S, Munro CA, et al. Evolution of pathogenicity and sexual reproduction in eight Candida genomes. Nature. 2009;459(7247):657–62. pmid:19465905
  13. Riccombeni A, Vidanes G, Proux-Wera E, Wolfe KH, Butler G. Sequence and analysis of the genome of the pathogenic yeast Candida orthopsilosis. PLoS One. 2012;7(4):e35750. pmid:22563396
  14. Pryszcz LP, Nemeth T, Gacser A, Gabaldon T. Unexpected genomic variability in clinical and environmental strains of the pathogenic yeast Candida parapsilosis. Genome Biol Evol. 2013;5(12):2382–92. pmid:24259314
  15. Bertini A, Zoppo M, Lombardi L, Rizzato C, De Carolis E, Vella A, et al. Targeted gene disruption in Candida parapsilosis demonstrates a role for CPAR2_404800 in adhesion to a biotic surface and in a murine model of ascending urinary tract infection. Virulence. 2016;7(2):85–97. pmid:26632333
  16. Tavanti A, Davidson AD, Gow NA, Maiden MC, Odds FC. Candida orthopsilosis and Candida metapsilosis spp. nov. to replace Candida parapsilosis groups II and III. J Clin Microbiol. 2005;43(1):284–92. pmid:15634984
  17. Sherman F, Fink GR, Hicks JB. Methods in yeast genetics. Cold Spring Harbor Laboratory, New York. 1986:9.
  18. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15):2114–20. pmid:24695404
  19. Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 2017;27(5):722–36. pmid:28298431
  20. Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. ArXiv e-prints [Internet]. 2013 March 1, 2013; 1303.
  21. Senol Cali D, Kim JS, Ghose S, Alkan C, Mutlu O. Nanopore sequencing technology and tools for genome assembly: computational analysis of the current state, bottlenecks and future directions. Brief Bioinform. 2018. pmid:29617724
  22. Walker BJ, Abeel T, Shea T, Priest M, Abouelliel A, Sakthikumar S, et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One. 2014;9(11):e112963. pmid:25409509
  23. Petersen TN, Brunak S, von Heijne G, Nielsen H. SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat Methods. 2011;8(10):785–6. pmid:21959131
  24. Rousseau F, Serrano L, Schymkowitz JW. How evolutionary pressure against protein aggregation shaped chaperone specificity. J Mol Biol. 2006;355(5):1037–47. pmid:16359707
  25. Pierleoni A, Martelli PL, Casadio R. PredGPI: a GPI-anchor predictor. BMC Bioinformatics. 2008;9:392. pmid:18811934
  26. Jorda J, Kajava AV. T-REKS: identification of Tandem REpeats in sequences with a K-meanS based algorithm. Bioinformatics. 2009;25(20):2632–8. pmid:19671691
  27. Fitzpatrick DA, O’Gaora P, Byrne KP, Butler G. Analysis of gene evolution and metabolic pathways using the Candida Gene Order Browser. BMC Genomics. 2010;11:290. pmid:20459735
  28. Maguire SL, OhEigeartaigh SS, Byrne KP, Schroder MS, O’Gaora P, Wolfe KH, et al. Comparative genome analysis and gene finding in Candida species using CGOB. Mol Biol Evol. 2013;30(6):1281–91. pmid:23486613
  29. Schmittgen TD, Livak KJ. Analyzing real-time PCR data by the comparative C(T) method. Nat Protoc. 2008;3(6):1101–8. pmid:18546601
  30. Madoui MA, Engelen S, Cruaud C, Belser C, Bertrand L, Alberti A, et al. Genome assembly using Nanopore-guided long and error-free DNA reads. BMC Genomics. 2015;16:327. pmid:25927464
  31. Hoyer LL. The ALS gene family of Candida albicans. Trends Microbiol. 2001;9(4):176–80. pmid:11286882
  32. Zhang N, Harrex AL, Holland BR, Fenton LE, Cannon RD, Schmid J. Sixty alleles of the ALS7 open reading frame in Candida albicans: ALS7 is a hypermutable contingency locus. Genome Res. 2003;13(9):2005–17. pmid:12952872
  33. Oh SH, Cheng G, Nuessen JA, Jajko R, Yeater KM, Zhao X, et al. Functional specificity of Candida albicans Als3p proteins and clade specificity of ALS3 alleles discriminated by the number of copies of the tandem repeat sequence in the central domain. Microbiology. 2005;151(Pt 3):673–81. pmid:15758214
  34. Zoppo M, Lombardi L, Rizzato C, Lupetti A, Bottai D, Papp C, et al. CORT0C04210 is required for Candida orthopsilosis adhesion to human buccal cells. Fungal Genet Biol. 2018;120:19–29. pmid:30205198
  35. Coleman DA, Oh SH, Manfra-Maretta SL, Hoyer LL. A monoclonal antibody specific for Candida albicans Als4 demonstrates overlapping localization of Als family proteins on the fungal cell surface and highlights differences between Als localization in vitro and in vivo. FEMS Immunol Med Microbiol. 2012;64(3):321–33. pmid:22106872
  36. Coleman DA, Oh SH, Zhao X, Hoyer LL. Heterogeneous distribution of Candida albicans cell-surface antigens demonstrated with an Als1-specific monoclonal antibody. Microbiology. 2010;156(Pt 12):3645–59. pmid:20705663
  37. Tavanti A, Hensgens LA, Ghelardi E, Campa M, Senesi S. Genotyping of Candida orthopsilosis clinical isolates by amplification fragment length polymorphism reveals genetic diversity among independent isolates and strain maintenance within patients. J Clin Microbiol. 2007;45(5):1455–62. pmid:17329454
  38. Pryszcz LP, Nemeth T, Gacser A, Gabaldon T. Genome comparison of Candida orthopsilosis clinical strains reveals the existence of hybrids between two distinct subspecies. Genome Biol Evol. 2014;6(5):1069–78. pmid:24747362
  39. Schroder MS, Martinez de San Vicente K, Prandini TH, Hammel S, Higgins DG, Bagagli E, et al. Multiple origins of the pathogenic yeast Candida orthopsilosis by separate hybridizations between two parental species. PLoS Genet. 2016;12(11):e1006404. pmid:27806045
  40. Pryszcz LP, Nemeth T, Saus E, Ksiezopolska E, Hegedusova E, Nosek J, et al. The genomic aftermath of hybridization in the opportunistic pathogen Candida metapsilosis. PLoS Genet. 2015;11(10):e1005626. pmid:26517373