Characterization and Phylogenetic Analysis of the Mitochondrial Genome of Glarea lozoyensis Indicates High Diversity within the Order Helotiales

Background Glarea lozoyensis is a filamentous fungus used for the industrial production of the non-ribosomal peptide pneumocandin B0. In the course of whole-genome sequencing, the complete mitochondrial genome of the fungus was assembled and annotated. It is the first mitochondrial genome reported from the large, polyphyletic family Helotiaceae. A phylogenetic analysis was performed based on conserved proteins of the oxidative phosphorylation system in mitochondrial genomes.
Results The total size of the mitochondrial genome is 45,038 bp. It contains the expected 14 genes coding for proteins related to oxidative phosphorylation, two rRNA genes, six hypothetical proteins, and three intronic genes, of which two encode homing endonucleases and one encodes the ribosomal protein rps3. Additionally, there is a set of 33 tRNA genes. All genes are located on the same strand. Phylogenetic analyses based on concatenated mitochondrial protein sequences confirmed that G. lozoyensis belongs to the order Helotiales and that it is most closely related to Phialocephala subalpina. However, a comparison with the three other mitochondrial genomes known from Helotialean species revealed remarkable differences in size, gene content and sequence. Moreover, the gene order found in P. subalpina and Sclerotinia sclerotiorum is not conserved in G. lozoyensis.
Conclusion The arrangement of genes and other differences found between the mitochondrial genome of G. lozoyensis and those of other Helotiales indicate a broad genetic diversity within this large order. Further mitochondrial genomes are required to determine whether there is a continuous transition between the different forms of mitochondrial genomes or whether G. lozoyensis belongs to a distinct subgroup within Helotiales.


Introduction
The filamentous fungus Glarea lozoyensis (ATCC 20868) was originally isolated by plating filtrates of pond water near Madrid, Spain [1]. It came to be known as a producer of pneumocandins, non-ribosomal peptides with a strong inhibitory effect on fungal glucan biosynthesis. G. lozoyensis ATCC 74030 is a mutant strain of the wild type used for the production of the antimycotic drug caspofungin from pneumocandin B0 [2,3]. The taxonomy of G. lozoyensis has been revised several times [4]. It was first designated Zalerion arboricola [1]; however, after thorough analysis of morphological and molecular data, it was classified as a new anamorphic genus in the order Helotiales [4]. Sequence analysis of the ITS region of a broad array of species in Helotiales narrowed the closest relatives of G. lozoyensis to some Cyathicula De Not species (Helotiaceae) [5].
Since mitochondrial (mt) genomes often evolve faster than nuclear genomes [6,7,8], they have been successfully applied as markers in evolutionary biology [9,10]. With the emergence of next-generation sequencing in recent years, access to whole genomes has become easy and affordable. The number of completely sequenced mt genomes of filamentous fungi has increased dramatically [11,12], so that it is now possible to use them for phylogenetic studies [13,14,15,16,17,18]. The Fungal Mitochondrial Genome Project was launched over a decade ago [19], and more than 80 fungal mitogenomes are now available. Nevertheless, only four mt genomes from the order Helotiales, those of Phialocephala subalpina, Sclerotinia sclerotiorum, Botrytis cinerea and Marssonina brunnea (incomplete annotation), have been sequenced so far (Table 1).
A fungal mt genome typically contains 14 conserved protein-coding genes, 22-26 tRNA genes, and 2 rRNA genes, most likely arranged in circular form [20,21,22,23]. The mtDNA divergence between different fungal species is characterized by variations in intergenic regions, intronic sequences, and the order of genes [24,25].
We have identified the mitochondrial genome of G. lozoyensis ATCC 74030 within the whole genome of the fungus, which was assembled by combining data from Illumina mate-pair (MP) and paired-end (PE) sequencing [26]. As only very few mt genomes are known from species of the large Helotiales order, we decided to annotate this genome in detail and to investigate its relation to the known mt genomes of Helotiales. These were also used as the main reference for a thorough manual revision of the automatic annotation obtained from diverse bioinformatic tools.

Cultivation and DNA Preparation
G. lozoyensis ATCC 74030 was grown for 20 days at 25 °C on YM agar plates (yeast extract 0.1%, malt extract 1.0%, agar 1.5% in H2O). The mycelium was scraped off the plate, freeze-dried and ground to a fine powder. Total DNA was isolated with a DNeasy Plant Mini Kit (Qiagen) according to the manufacturer's instructions.

Sequencing and Assembly
Whole genome shotgun sequencing of G. lozoyensis ATCC 74030 was performed by sequencing a paired-end library and an additional mate-pair library with an Illumina HiSeq 2000 sequencer. About 38 Gbp of reads with an average length of 75 bp were assembled into contigs using the CLC Genomics Workbench (CLCbio). The scaffold N50 was estimated at 870,933 bp. A single 45,038 bp scaffold representing the complete mtDNA was identified by sequence similarity search against known fungal mt genomes.
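For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of how it is commonly computed from a list of scaffold lengths. The function name and the example lengths are illustrative assumptions; they are not values from this assembly, and the sketch does not reproduce how the CLC Genomics Workbench reports the figure.

```python
def n50(lengths):
    """Return the N50: the length L such that scaffolds of length >= L
    together cover at least half of the total assembly size."""
    if not lengths:
        raise ValueError("no scaffold lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold downwards until half the assembly is covered
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length

# Hypothetical scaffold lengths in bp (illustration only)
example_lengths = [100, 80, 60, 40, 20]
print(n50(example_lengths))
```

In this toy case the total is 300 bp, and the two longest scaffolds together exceed half of it, so the second-longest length is returned.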

Phylogenetic Analysis
To determine the evolutionary background of G. lozoyensis, a concatenation of 12 OXPHOS genes (atp6, cob, cox1, cox2, cox3, nad1, nad2, nad3, nad4, nad4L, nad5 and nad6) was compared with analogous sets of genes from 24 mt genomes published in GenBank. Protein sequence alignment was carried out for each protein using ClustalW with default options through a Galaxy server [37,28,29]. The aligned protein sequences were used to construct a maximum likelihood tree with PhyML 3.0 using LG as the evolutionary model [38]. The reliability of internal branches was assessed using the aLRT test (SH-like) as recommended in PhyML. The tree was drawn with Treeview (http://taxonomy.zoology.gla.ac.uk/rod/rod.html) and edited manually.
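The concatenation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name, the dictionary layout and the toy residues are hypothetical, and padding missing genes with gap characters is one common convention rather than something the text specifies.

```python
def concatenate_alignments(alignments, gene_order):
    """Join per-gene protein alignments into one supermatrix.

    alignments: dict gene -> dict taxon -> aligned sequence (equal length per gene)
    gene_order: list of gene names giving the concatenation order
    Taxa missing from a gene are padded with gap characters (an assumption).
    """
    taxa = sorted({t for gene in gene_order for t in alignments[gene]})
    matrix = {t: [] for t in taxa}
    for gene in gene_order:
        seqs = alignments[gene]
        lengths = {len(s) for s in seqs.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to equal length")
        width = lengths.pop()
        for t in taxa:
            matrix[t].append(seqs.get(t, "-" * width))
    return {t: "".join(parts) for t, parts in matrix.items()}

# Toy alignments with invented residues (illustration only)
toy = {
    "atp6": {"G_lozoyensis": "MLS-P", "P_subalpina": "MLSAP"},
    "cob": {"G_lozoyensis": "MRI", "P_subalpina": "MRL", "S_sclerotiorum": "MKL"},
}
supermatrix = concatenate_alignments(toy, ["atp6", "cob"])
for taxon, seq in supermatrix.items():
    print(taxon, seq)
```

The resulting supermatrix has one row per taxon and the same length for every row, which is the form a maximum likelihood program expects as input.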

GenBank Accession Number
The G. lozoyensis ATCC 74030 mt genome sequence is deposited in GenBank under accession number KF169905.

General Features
The mitochondrial genome of G. lozoyensis ATCC 74030 comprises 45,038 bp, which is in the same range as several other mt genomes in Pezizomycotina (Table 1). Intergenic spacers of 3-1131 bp cover 31.8% of the genome, and seven introns cover 15.04% (Table 2). Among the five Helotialean mt genomes reported, three are relatively large, and only that of P. subalpina (43.7 kbp) is similar in size to that of G. lozoyensis (Table 1). The differences in genome size are due to a multitude of introns and endonucleases in B. cinerea and S. sclerotiorum and a large intergenic region in M. brunnea. With 58.1% of the genome encoding structural genes, the mt genome of G. lozoyensis is rather compact (Table 2). It encodes the large and the small ribosomal RNA subunits (rnl and rns), 33 tRNAs, 14 putative proteins of the oxidative phosphorylation system (OXPHOS), 6 hypothetical proteins and 3 intronic proteins, of which one is ribosomal protein RPS3 and two are homing endonucleases (HE) (Figure 1, Table 2). As in mitochondria of most other ascomycetes, all genes and tRNAs are found on the plus strand [13,14,15,16,17,18]. The overall G+C content of the mt genome is 29.8% (Table 2), consistent with the characteristic AT-rich nature of fungal mt genomes. The G+C content of regions encoding RNA genes is usually higher than the genome-wide average. In G. lozoyensis it is 31% (Table 2), which is similar to other fungal mitochondria [39].
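As a small illustration of the G+C figures quoted above, the sketch below computes G+C content as a percentage of unambiguous bases. The function name and the test fragment are hypothetical; the fragment is invented and is not taken from accession KF169905. Ignoring ambiguous bases such as N is an assumption of this sketch.

```python
def gc_content(seq):
    """Percentage of G and C among unambiguous bases (A, C, G, T/U)."""
    seq = seq.upper()
    gc = sum(seq.count(b) for b in "GC")
    at = sum(seq.count(b) for b in "ATU")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / (gc + at)

# Invented 20-nt fragment (illustration only)
fragment = "ATTAGCATTAAATGCTTAAT"
print(f"{gc_content(fragment):.1f}% G+C")
```

Applying the same function separately to the whole genome and to the RNA-coding regions would give the two kinds of values compared in the text.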
The mt genome of G. lozoyensis was compared with the fully annotated mt genomes of the Helotiales order, i.e., those of P. subalpina and S. sclerotiorum. BLASTn alignment [40] gave a sequence identity of 27% with the mt genome of P. subalpina and 19% with that of S. sclerotiorum. Moreover, DNA dot plot analysis showed that the mt genome of G. lozoyensis is colinear with that of P. subalpina ( Figure S1) [40]. Considering also the similar size, the mt genome of P. subalpina was chosen as the main reference for the manual annotation of the mt genome of G. lozoyensis.

Protein Coding Genes
The following genes encoding proteins involved in respiratory chain complexes (OXPHOS) were found in the G. lozoyensis mt genome: atp6, atp8, atp9, cox1-3, cob, nad1-6 and nad4L (Figure 1, Table S1). Moreover, there are three intronic proteins, of which one is eroded (see section Introns), and six hypothetical proteins (ORF1-6). All 14 OXPHOS proteins are highly conserved in P. subalpina (Table S1). Sequence differences mainly concern intronic proteins, which were not found in P. subalpina [13]. For instance, the gene for the large ribosomal subunit (rnl) in G. lozoyensis shares a sequence identity of only 45% with the corresponding feature in P. subalpina. This is mainly due to the presence of intron IE (2250 bp) in G. lozoyensis rnl, which also includes the gene for ribosomal protein S3 (rps3). In P. subalpina IE does not exist and rps3 forms a separate ORF [13]. ORFs included in introns are presumably an excellent criterion for inferring phylogenetic relationships of fungi. In Pezizomycotina, the intronic rps-like protein may even play a role in maintaining the integrity of the mt genome [41].
The length of the cox1 gene varies widely among the investigated species. In G. lozoyensis it spans 5,412 bp, in S. sclerotiorum 12,458 bp and in P. subalpina only 1,720 bp. In contrast to the latter, S. sclerotiorum and G. lozoyensis cox1 contain several intronic proteins such as GIY-YIG and LAGLIDADG. For that reason, S. sclerotiorum cox1 was chosen as an additional reference for intronic gene annotation in G. lozoyensis cox1. The G. lozoyensis GIY-YIG (842 bp) shares 87% sequence identity with a GIY-YIG of S. sclerotiorum (SS1G_20030.1; 816 bp). Since it is an eroded ORF, the DNA sequence was used for comparison instead of the protein sequence. The LAGLIDADG gene of G. lozoyensis shows only 17% identity to a LAGLIDADG from S. sclerotiorum (SS1G_20022; 323 aa, Broad Institute). However, a BLASTp search against the NCBI nr database yielded a maximum identity of 76% with a LAGLIDADG of the Pezizomycotina species Ajellomyces dermatitidis SLH14081. Of the six hypothetical proteins (ORF1-6), only ORF2 and ORF6 share similarities with known sequences. A part of ORF2 (132 aa) is similar to cytochrome c oxidase subunit I (cox1) of G. lozoyensis, even though only 39% of ORF2 is covered (e-value 3e-15). The sequence identity between both is 78% at the amino acid level. The occurrence of this fragment may be explained by partial gene duplication due to the presence of a transposon. In general, the evolution of gene orders in Pezizomycotina is mainly characterized by transpositions [13]. For example, in eight mt genomes of Phialocephala species, a duplication of the region around atp9 was found [13]. However, to confirm the transposon hypothesis in G. lozoyensis, more mt genomes of the same genus are required. ORF6 was found to be similar to a putative protein (ORF2) in the mt genome of P. subalpina (e-value 7e-39, 46% identity and 90% coverage). No significant sequence

Introns
A total of eight introns were identified in the coding genes of the G. lozoyensis mt genome by BLASTx search and sequence alignment (nr-database) [30]. Four of them are located in cox1 (coxIA, coxIB, coxIC and coxID). The others were found in the large ribosomal subunit rnl (intron IE), cox3, nad2, and nad5 (Figure 1, Table S1). coxIA and IE belong to the group I introns, which dominate in fungal mt genes, while in plant mt genes group II introns are found more frequently [43]. It is characteristic of group I introns that all upstream exons end with a ''T'' and all introns end with a ''G''. The conserved stems [44] were found in both genes (data not shown). Group I introns are considered to be mobile genetic elements interrupting protein-coding and structural RNA genes [45]. Most of them carry a ''homing endonuclease gene'' (heg) encoding a DNA endonuclease (HE), which catalyzes the transfer and site-specific integration (''homing'') of the intron [46,47,48]. There are four families of DNA endonucleases (HEs) [49], denoted by the presence of conserved amino acid sequence motifs: GIY-YIG, HC-box, HNH and LAGLIDADG [50,51]. Among these, the LAGLIDADG endonucleases form the largest group. They are encountered in some bacteria and bacteriophages as well as in organelle genomes of protozoans, fungi, plants, and sometimes in early branching Metazoans [52]. Two forms can be distinguished: proteins with a single LAGLIDADG motif, which dimerize, and double-motif forms derived from a gene fusion of two monomeric forms [53]. In the G. lozoyensis mt genome, an intact ORF encoding a putatively functional HE of the LAGLIDADG family (342 aa) was found in cox1 intron coxIA. A Pfam analysis revealed only one LAGLIDADG motif (Table S1), so that it can be considered the single-motif form of LAGLIDADG. Moreover, in coxIB a frameshifted and inactive (''eroded'') heg of the GIY-YIG family (835 bp) with several stop codons in the sequence was found.
Despite being inactive, the conserved domain of GIY-YIG was still identified by Pfam (see section Protein Coding Genes and Table S1). Eroded hegs are characterized by several point and length mutations resulting in frameshifts and stop codons interrupting the ORF [54]. In species from the basal fungal lineages, many introns in the long cox1 gene carry eroded hegs [55]. The erosion is generally regarded as a preliminary step before complete elimination of the intron [49]. The presence of intact and eroded hegs strengthens the hypothesis that numerous events of loss and gain have occurred during evolution. Besides the heg, coxID also contains a DUF3839 (PF12943) conserved domain, whose function is unknown. Furthermore, in the intron of the large ribosomal subunit (IE) we found a gene encoding ribosomal protein S3, known as rps3 (519 aa) (Table S1).
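The boundary rule for group I introns mentioned in this section (upstream exon ending in T, intron ending in G) can be expressed as a simple check. The sketch below is a hypothetical helper, not part of the annotation workflow described here; the coordinates convention and the toy sequences are assumptions, and passing the check is a necessary hint only, since the conserved stems also have to be present.

```python
def has_group_i_boundaries(seq, intron_start, intron_end):
    """Check the two boundary nucleotides typical of group I introns.

    seq: DNA (or RNA) sequence of the gene, exons plus intron
    intron_start, intron_end: 0-based, end-exclusive coordinates of the intron
    Returns True if the last exon base before the intron is T (U in RNA)
    and the last intron base is G.
    """
    if not 0 < intron_start < intron_end <= len(seq):
        raise ValueError("intron coordinates out of range")
    seq = seq.upper()
    return seq[intron_start - 1] in "TU" and seq[intron_end - 1] == "G"

# Invented toy gene: exon "ATGCT", intron "AAACCCG", exon "GGTAA"
toy_gene = "ATGCT" + "AAACCCG" + "GGTAA"
print(has_group_i_boundaries(toy_gene, 5, 12))
```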

Genetic Code and Codon Usage
The codon usage of the G. lozoyensis mitochondrial ORFs was analyzed using genetic code four, which is common for Pezizomycotina mtDNA [56]. Most protein-coding ORFs start with the orthodox translation initiation codon ATG. Exceptions are nad2 and cox3 with GTG, cox2 with TTA and cox1 with TTG. Six genes end with the stop codon TAG: cob, orf3, orf2, nad3, atp6, and nad5; all others end with TAA, which is the preferred termination codon for fungal mt genes [22,57]. The intronic proteins, ribosomal protein S3 (rps3) and the putative LAGLIDADG endonuclease, start with ATG, but the stop codon is TAA for rps3 and TAG for LAGLIDADG. The most commonly used amino acid in the 22 protein genes is leucine, followed by isoleucine (Table 3). Similar results are reported for other fungi [14,18]. As expected from the high AT content (79%), the most frequently used codons are composed exclusively of ''U'' and ''A'', e.g. UUA (2.78%), AUA (1.88%), AAU (1.38%), UUU (1.82%), AAA (1.27%), UAU (1.32%) and AUU (1.20%) (Table 3). The codons UGC, CGC, CUC, CGA, CGG, UCG are underrepresented, being used one to twenty times less than the GC-rich codons (Table 3).

tRNA Genes
33 tRNA genes were identified in the G. lozoyensis mt genome, which is more than the 22-26 tRNA genes typically found in fungal mt genomes (Table 4, Figure 1) [20,21,22,23,32,33,36]. However, in Helotiales the number of tRNA genes appears to be generally higher, e.g. in S. sclerotiorum there are 33 tRNAs, in B. cinerea and M. brunnea 31 tRNAs (predicted by tRNAscan tool [32]). In contrast, only 27 tRNAs were found in P. subalpina. Further mt genome sequences are required to confirm this tendency. The majority of G. lozoyensis mitochondrial tRNA genes are organized into two dense clusters. The set of 33 tRNA genes is sufficient to decode all codons in the predicted ORFs, lessening the need for tRNA import from the cytoplasm into the mitochondrion [58].
For lysine, asparagine and glycine two tRNA genes were found with the same anticodon. Furthermore, two tRNAs with different anticodons were identified for arginine, serine, phenylalanine and leucine. For methionine and isoleucine there are three tRNA genes with identical anticodons.
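The codon percentages reported in the Genetic Code and Codon Usage section can be thought of as a simple tally over in-frame triplets. The following is a minimal sketch of such a tally; the function name and the toy ORF are hypothetical and the sketch does not reproduce the tool used to generate Table 3. Note that under genetic code four (NCBI translation table 4) UGA is read as tryptophan rather than as a stop codon, so it is counted like any other sense codon here.

```python
from collections import Counter

def codon_usage(cds):
    """Return {codon: percentage} for an in-frame coding sequence.

    The sequence is read in non-overlapping triplets from position 0;
    a trailing partial codon is ignored. T is reported as U.
    """
    rna = cds.upper().replace("T", "U")
    codons = [rna[i:i + 3] for i in range(0, len(rna) - len(rna) % 3, 3)]
    if not codons:
        raise ValueError("sequence shorter than one codon")
    counts = Counter(codons)
    return {codon: 100.0 * n / len(codons) for codon, n in counts.items()}

# Invented ORF (illustration only): ATG TTA TTA ATA TGA TAA
usage = codon_usage("ATGTTATTAATATGATAA")
for codon, pct in sorted(usage.items(), key=lambda kv: -kv[1]):
    print(codon, f"{pct:.1f}%")
```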

Phylogeny and Comparative Genomics
Since mt genomes often evolve faster than nuclear genomes, especially in intergenic regions [59,60], mitochondrial markers have been successfully applied in evolutionary biology [61,62,63]. In Pezizomycotina, completely sequenced and annotated mt genomes are available for members of Eurotiomycetes and Sordariomycetes [11,12,64]. However, only three genomes with complete draft annotations are available for Helotialean species (Table 1). The Helotiales is one of the most diverse fungal orders, with more than 350 genera and over 2,000 species including many important plant pathogens [65]. In order to gain additional evidence for the classification of G. lozoyensis, we compared the amino acid sequences of 12 OXPHOS proteins (atp6, cox1-3, cob, nad1-6, nad4L) with those from 24 other fungi to build a phylogenetic tree (Figure 2). Most nodes in this tree have high bootstrap values, which indicates the robustness of the computed tree. Five classes of filamentous ascomycetes are clearly distinguished: Dothideomycetes, Leotiomycetes, Sordariomycetes, Eurotiomycetes and Saccharomycetes (Figure 2). As already found in other studies [44], the mt genomes of yeast species cluster apart from those of filamentous fungi. G. lozoyensis is found amongst other species of the Helotiales order with high bootstrap support. This placement is in line with previous observations based on nuclear ribosomal internal transcribed spacers (ITS) [8]. The closest relative of G. lozoyensis was found to be P. subalpina. This is consistent with the high similarities already found in BLASTp analyses (see above and Table S1).
A close relation between protein sequence similarity and a uniform organization of mt genomes has been found for the orders Onygenales [39] and Sordariales [18]. The arrangement of mt genes might even be used as a reference to derive a common evolutionary route in fungi. We compared the mt gene order in G. lozoyensis with that of the Helotialean species with sequenced and annotated mt genomes, i.e. P. subalpina and S. sclerotiorum (Figure 3). The gene organization in G. lozoyensis deviates significantly from that of both fungi (Figure 3). The mt genomes of P. subalpina and S. sclerotiorum already differ considerably from each other [15]. Anchored genome alignments of the two closely related species G. lozoyensis and P. subalpina show no conserved gene arrangement (Figure S2A). A similar result was obtained when S. sclerotiorum was included in the analysis (Figure S2B). In contrast, there is complete synteny in gene order between S. sclerotiorum and B. cinerea [66]. Additional mt genomes of Helotiales species are required to allow further conclusions about the considerably diverse mt gene arrangements in this order.

Conclusion
We have identified the complete mt genome of G. lozoyensis on a scaffold obtained by whole-genome sequencing. Previous studies based on RAPD, microsatellite-primed PCR and nuclear ribosomal internal transcribed spacers (ITS) suggest that G. lozoyensis belongs to the order Helotiales [5]. Our mt genome analysis clearly confirms this. BLASTp analysis of G. lozoyensis mt protein sequences yielded the corresponding genes from P. subalpina (Helotiales) as closest homologs (identities 70-100%), even though a homolog in P. subalpina was not found for every gene. In a phylogenetic analysis based on the mt proteins of 24 fungi, G. lozoyensis was clearly classified as a member of Helotiales with P. subalpina as its closest relative. However, an anchored genome alignment of the Helotialean species G. lozoyensis, P. subalpina and S. sclerotiorum revealed that there is no synteny between these three apparently closely related species. This clearly demonstrates that species within the large Helotiales order can be highly diverse. More mt genomes of the order Helotiales are required to find out whether the gaps between the differently ordered mt genomes can be fully closed or whether gene arrangements are very diverse overall within this order.

Supporting Information
Figure S1 G. lozoyensis mt genome is colinear with that of P. subalpina. Dotplot of mt genomes based on BLASTn analysis (http://blast.ncbi.nlm.nih.gov/Blast.cgi) with an e-value cutoff of 1e-10. Sequence lengths are given along the axes in kbp. The shaded cells in the matrix indicate identical residues. a) G. lozoyensis and P. subalpina. b) G. lozoyensis and S. sclerotiorum. (PDF)
Figure S2 Mauve genome comparison. Multiple alignments of four Helotialean species, G. lozoyensis, P. subalpina, S. sclerotiorum and B. cinerea, were performed with the Mauve software package [67]. Locally collinear blocks (LCB) of the genome sequences are shown in identical colors and are connected with lines. For the genome of G. lozoyensis the annotation is displayed to allow the assignment of genes to LCBs. a) Alignment of G. lozoyensis and P. subalpina. b) Alignment of G. lozoyensis, P. subalpina and S. sclerotiorum. (PDF)
Table S1 Annotation of mitochondrial genes in Glarea lozoyensis and comparison with the corresponding genes/proteins in Phialocephala subalpina. (XLS)