Mitochondrial Genome of Phlebia radiata Is the Second Largest (156 kbp) among Fungi and Features Signs of Genome Flexibility and Recent Recombination Events

Mitochondria are eukaryotic organelles supporting individual life-style via generation of proton motive force and cellular energy, and indispensable metabolic pathways. As part of genome sequencing of the white rot Basidiomycota species Phlebia radiata, we first assembled its mitochondrial genome (mtDNA). So far, the 156 348 bp mtDNA is the second largest described for fungi, and of considerable size among eukaryotes. The P. radiata mtDNA assembled as single circular dsDNA molecule containing genes for the large and small ribosomal RNAs, 28 transfer RNAs, and over 100 open reading frames encoding the 14 fungal conserved protein subunits of the mitochondrial complexes I, III, IV, and V. Two genes (atp6 and tRNA-IleGAU) were duplicated within 6.1 kbp inverted region, which is a unique feature of the genome. The large mtDNA size, however, is explained by the dominance of intronic and intergenic regions (sum 80% of mtDNA sequence). The intergenic DNA stretches harness short (≤200 nt) repetitive, dispersed and overlapping sequence elements in abundance. Long self-splicing introns of types I and II interrupt eleven of the conserved genes (cox1,2,3; cob; nad1,2,4,4L,5; rnl; rns). The introns embrace a total of 57 homing endonucleases with LAGLIDADGD and GYI-YIG core motifs, which makes P. radiata mtDNA to one of the largest known reservoirs of intron-homing endonucleases. The inverted duplication, intergenic stretches, and intronic features are indications of dynamics and genetic flexibility of the mtDNA, not fully recognized to this extent in fungal mitochondrial genomes previously, thus giving new insights for the evolution of organelle genomes in eukaryotes.


Introduction
Phlebia radiata Fr. is a saprobic, wood-colonizing and white-rot type of wood decay causing polypore fungal species of the class Agaricomycetes, phylum Basidiomycota, and is encountered in Eurasian and North-American forests generally on dead angiosperm wood [1,2]. We initiated de novo whole genome sequencing of P. radiata due to its notable biotechnological abilities in decomposition of wood components and lignocelluloses, and in oxidation and conversion of synthetic and milled wood lignin, and lignin model compounds [3][4][5]. The fungus is also efficient in degradation of xenobiotics and production of lignin-converting oxidoreductases like lignin peroxidases and manganese peroxidases, and laccase [3][4][5][6].
The draft assembly of 454-sequenced P. radiata genome resulted first with ca. 300x coverage of a single scaffold and circular dsDNA molecule of over 156 kbp in size, which turned out to be the mitochondrial genome. Mitochondria are cellular organelles of eukaryotes which support individual life-style and generate proton motive force for production of ATP and energy via respiration [7][8][9]. Mitochondria are also known to participate in many other indispensable cellular processes such as calcium homeostasis, cell aging and apoptosis, iron metabolism, and synthesis of ironsulphur clusters for oxidoreductive enzymes [9][10][11][12].
Essence of mitochondria is accepted to arise from endosymbiosis [13,14], most reliably of the SAR11 clade ancestor marine bacterium (pelagibacteria) [15]. Adaptation to the host organism has resulted with co-evolution of the mitochondrial genome and gene flow to the host genome [7,8,16,17]. It was previously considered that mitochondrial genomes are small and compact, according to information mostly achieved from metazoa, such as the only 16 kbp-size human mitochondrial genome [9]. This notion has, together with the retarded mtDNA sizes, previously led to the proposal of the ''vanishing mitochondria'', especially in fungi [8].
Complete genome sequencing on eukaryotic micro-and macroorganisms has, however, demonstrated a higher degree of mitochondrial genome structural complexity, and variation in the mtDNA size than was previously realized. Complicated network of mini-circle mtDNAs are present in the basal body mitochondrion of the Kinetoplastida protozoa [18], when the largest mt genomes are described for Embryophyta and Charophyta [19,20], i.e. for land plants and green algae. In angiosperm flowering plants, the mtDNA varies highly in size (200 kbp to 11 Mbp) and may be organized to multiple chromosomes [20][21][22]. So far, the largest plant mt genomes were recently sequenced for Silene species as complex entities with up to 128 circular-mapping chromosomes [22].
Together with our study, recent genome sequencing reports indicate that fungal mitochondrial genomes have a much higher degree of variation in size, gene content, genomic organization and gene order, and gene intron-exon construction than has been realized previously. We acknowledge that the high number of intron-homing endonucleases (HEs) recognized in the P. radiata mtDNA may play an editing role, both in genome replication and gene transcription, as well as an integrating role for intron and gene transposition in the mtDNA. Another unique feature is the duplicated ''mirror'' region in the genome, which together with the repetitive-element dense sections may promote both DNA recombination and gene transcription. We also discuss mtDNAencoded proteome phylogeny in relation to tRNA evolution and ORF codon usage, in regard to the currently accepted concept of fungal systematics.

Fungal Isolate and Cultivation
Phlebia radiata Fr. strain 79 (FBCC0043) was originally isolated from a distinguishable fruiting body found in South Finland on white-rot decayed alder (Alnus incana), and maintained in the Fungal Biotechnology Culture Collection at the Department of Food and Environmental Sciences, University of Helsinki, as living mycelium on 2% (wt/vol) malt extract, 1.5% (wt/vol) agar slants under paraffin oil at 12uC. Species identification is based on both macroscopic features of the original fruiting body and mycelium, as well as at molecular level on ribosomal 18S rRNA gene and ITS1-5.8S-ITS2 bar coding sequences [33]. For isolation of total DNA, the fungus was cultivated in liquid 2% (wt/vol) malt extract broth for 10 days at 28uC in the dark. After cultivation, the mycelial mats were harvested and washed with cold ultrapure water, frozen to 220uC, and lyophilized.

DNA Isolation
Dry mycelium was quickly ground in acid-washed and autoclaved mortar. DNA was isolated using a modified version of the hot-CTAB extraction at 65uC [34], followed by phenolchloroform and 3x chloroform-isoamyl alcohol extractions, and incubation with 0.1 mg/ml Proteinase K (Fermentas) for 30 min at 55uC. Total DNA was precipitated overnight with isopropanol at 4uC, centrifuged at 6500 g 30 min at 4uC, washed twice with 70% ethanol, and subjected to 50 U/ml of RNAseA (Fermentas) treatment at 37uC overnight. After chloroform-isoamyl alcohol extraction, and re-precipitation with ice-cold 94% ethanol overnight at 220uC, DNA was dissolved in sterile TE (10 mM Tris-HCl buffer with 1 mM EDTA, pH 7.5) solution. Integrity an d amount of the isolated total DNA was examined by 1.5% (wt/ vol) agarose gel electrophoresis, and using the NanoDrop 1000 Spectrophotometer (Thermo Scientific).

Sequencing and mt Genome Assembly
Single-stranded template DNA (sstDNA) was sequenced using the 454 sequencing technology with GS FLX Titanium chemistry (Roche, 454 Life Sciences). Number of obtained reads was 1 876 081 containing 752 Mbp of both genomic DNA (gDNA) and mitochondrial DNA (mtDNA). All reads were assembled using Newbler (Roche, 454 Life Sciences) software. Mitochondrial contigs containing high average sequence coverage (approximately 300x) were placed in proper order, resulting with single scaffold, and a finished mtDNA circular genome was defined being 156 348 bp in length. Circularity and sequence orientation, in particular for the large duplicated region, was verified with genome-walking PCR.

Gene Annotation and Bioinformatic Analyses
The Mold, Protozoan, and Coelenterate Mitochondrial Code and the Mycoplasma/Spiroplasma Code (NCBI translation table 4) was at first assumed for ORF detection. Protein-coding and rRNA genes were annotated by blastp and blastn queries against non-redundant NCBI databases [35][36][37], and localised and annotated in the mtDNA sequence using Artemis [38] software. Intron-exon boundaries of the conserved genes were adjusted manually on the basis of ClustalX [39] multiple Basidiomycota mt coding sequence alignments. Transfer-RNAs were identified with tRNAscan-SE [40]. HEs were recognized by Pfam 26.0 database [41] queries. Protein domain images were generated with ExPASy PROSITE MyDomains Image Creator (http://prosite.expasy.org/mydomains/) and edited in Inkscape version 0.48.2 (http://inkscape.org/). Intron types were determined with RNAweasel algorithm [42]. Nucleotide sequence repeat elements were identified and analysed with the EMBOSS package Nucleic repeats group tools [43], and by performing a local blastn [35] query of the complete mtDNA sequence against itself. The hits were clustered as a function of similarity in CD-HIT Suite [44], and the h-cd-hit-est algorithm was run with consecutive 0.75, 0.80, and 0.90 cut-off values, using the sequence set that returned ,0.001 blastn E-values in the 1 vs. 1 search.

Phylogenetic Analyses
Genome accessions of completely sequenced fungal mtDNAs were retrieved from NCBI Organelle Genome Resources website [19], and linked to corresponding proteomes through GenBank [45] queries. Subsequently, super alignments were generated from USEARCH [46] de-replicated proteomes with the core of Hal pipeline [47], allowing 50% of missing data. Phylogenetic trees were constructed from 44 fungal taxa and 2 019 aa remgaps super alignment first with RAxML 8.0.0 [48] with 100 rapid bootstrap repetitions and automatic model selection (-f a -d -m PROT-GAMMAAUTO) using Blastocladiomycota as outgroup (bestscoring aa model was MTZOA), and with PhyloBayes 3.3f [49] using default options, 2 parallel chains were run until maxdiff was ,0.1, first 100 trees were discarded as burn-in, and one in ten remaining trees were sampled for posterior consensus. The tree was rooted from mid-point. Nodes receiving #0.8 posterior consensus (Bayesian) or #80 bootstrap support (ML) were collapsed to polytomies with TreeCollapseCL4 (http:// emmahodcroft.com/TreeCollapseCL.html). The trees were edited in FigTree (http://tree.bio.ed.ac.uk/software/figtree/).

Correlation Analyses
Sequence similarity of the core domain aa-sequences from 57 HEs in the P. radiata mtDNA were analyzed by generating aasequence pairwise distance matrix of the LAGLIDADG 1 and 2, and GIY-YIG catalytic ORFs using Geneious 5.5.5 software. In addition, pairwise distance matrices of the HE domain loci were calculated using the R environment 2.14.1 package for Windows (http://www.r-project.org/) in order to test correlation of the locus distance to the sequence-similarity based (evolutionary) distance. The data matrices were tested for being parametric or nonparametric. LAGLIDADG 1 aa-sequence similarity scores were normalized with logarithmic transformation. Parametric Pearson correlation in PASW Statistics 18, release 18.0.0 (SPSS Inc., Chicago, IL, USA) was used for LAGLIDADG 1, and GIY-YIG type HE domains, when the non-parametric Spearman's correlation test was applied to LAGLIDADG 2.

P. radiata mtDNA Genome Structure and Conserved Genes
The mitochondrial genome (mtDNA) of Phlebia radiata isolate 79 was achieved by de novo 454 sequencing of total DNA using Titanium chemistry, and the final assembly resulted in a single 156 348 bp scaffold with a sequence coverage of over 300x, representing one circular dsDNA molecule ( Figure 1) with a GC percentage of 31.1. The genome contains the 14 protein-coding genes typical to fungal mtDNA, which are related to the mitochondrial inner membrane Complexes I, III, IV and V of the respiratory chain, i.e. cox1, atp6, cox2, cox3, nad4L, nad5, atp8, nad2, nad3, atp9, cob, nad4, nad6, nad1, in clockwise order of the mtDNA ( Figure 1). Additionally, 31 conserved genes related to information transfer (28 tRNAs, rnl, rns, and rps3) were identified (Tables 1, 2).
Identified protein (sum 68 953 bp), rRNA (sum 13 606 bp), and tRNA (sum 2 070 bp) genes including introns cover 55% of the P. radiata mtDNA. However, only about 15% (25 045 bp) of the genome refers to conserved coding sequences, when intergenic regions (in total 44% of the genome) and coding-sequence splicing introns (sum 59 584 bp, 38%) dominate the sequence space ( Figure 2). The majority of the conserved protein-coding genes were split by long introns into multiple short exons ( Figure 1, Table 1). The highest number of introns (13) was in the cox1 gene, which covered ca. 21 kbp (14%) of the genome.
Notably, the genes encoding atp6, tRNA-Ile GAU and tRNA-Phe GAA were present in two identical copies. The duplicate atp6 and tRNA-Ile GAU genes were identified in the ''mirror'' region, which comprised an inverted and almost identical 6.1 kbp region in the genome (Figure 1). On the basis of multiple sequence alignments, cob and nad6 ORFs had C-terminal fused extensions. Moreover, alternative 39-ends were found for atp6 and cox2 (Figure 3 A, B).

Open Reading Frames with Unknown or Non-conserved Function
In total, 108 ORFs in addition to the conserved genes met our initial search criteria (Table S1). From these, 39 produced significant (E-value #0.001) blastp hits against the nr database, with a Codon Adaptation Index (CAI) range of 0.299-0.800 in reference to the conserved protein-coding genes. The majority of these ORFs were intronic and were associated with HE domains ( Table 3). A notable exception was ORF793 within the long group II intron in the middle of cob gene ( Figure 1). This intronic ORF was associated with identified RNA-dependent DNA polymerase domain (annotated locus PRA_mt0165, reverse transcriptase) and had particularly low CAI-value of 0.299 (Table S1), which indicates relatively recent horizontal gene transfer from a genetically distant source, most probably of viral origin.
Due to annotated genome sequence submission requirements, ORFs that continued from undisrupted exon reading frames into putative intronic regions were 59-truncated to their first Met codons, which shortened eight annotated ORFs, and excluded five ORFs that returned significant E-values (Table S2). These ORFs may represent inteins (''protein introns'').
Freestanding P. radiata mtDNA ORFs that returned significant blastp hits were ORF588, ORF319, ORF314, ORF273 and ORF90, with respective CAI values of 0.624, 0.577, 0.747, 0.633, and 0.515 (Table S1), indicative of fungal mitochondrial origin. Two of these, ORF588 (PRA_mt0150) and ORF273 (PRA_mt0076), were the most similar to putative DNA polymerases of Pleurotus ostreatus mt genome (Table S1). Two coding sequences, ORF319 and ORF314, were the most similar to hypothetical proteins annotated in Moniliophthora roreri mtDNA as orf2 and hyp11, respectively. Notably, the 59-end of ORF319 is similar to that of the P. radiata mt nad6 gene (36/37 nt identities), as it is located at the edge of the mirror region ( Figure 1). The best hit for ORF90 in turn was a hypothetical protein annotated in the mtDNA of the Ascomycota species Ajellomyces dermatitidis.

Transfer RNAs and Codon Usage
The tRNAscan-SE algorithm identified 28 tRNAs (Table 2). This tRNA set is likely able to sense all the codons of the P. radiata mtDNA-encoded proteome. With the exception of anticodons of Trp and Ile tRNAs, where possible, U was always the anticodon wobble position base. In the remaining tRNAs, G was always used over A at the anticodon wobble position. For the tRNA-Cys, the tRNAscan-SE Cove algorithm predicted probability for a gene match score below the threshold value of 20.0. However, Cove scores were low as well for other Basidiomycota tRNA-Cys genes, e.g. in Phakopsora meibomiae (cove: 19.75), P. ostreatus (cove: 19.67), and Schizophyllum commune (cove: 22.36). This indicates that the P.radiata mtDNA tRNA-Cys gene is real despite the low Cove score obtained.
The GC-content of P. radiata mtDNA ORFs was 26.83% (1st letter GC: 34.14%, 2nd letter GC: 33.33%, 3rd letter GC: 13.00%), with no obvious bias observed in codon usage between the leading and the lagging strand encoded ORFs. With the exception of Trp, W-base (A or T/U) ending codons were preferred over S-ending (C or G) codons across all codon families. Cys codons showed the smallest bias with 72% UGU over 28% UGC. For Ala, Phe, His, Ile, Asn, Pro and Tyr the same percentage was $80%, and for the rest $90%. Some codons UGA (1), AAG (9), CGC (1), AGG (5), and CGG (2) may be unassigned, as they were only present in non-conserved regions, mainly in the putative non-translated C-terminal fused extensions of cob and nad6.

Phylogeny
The first phylogenetic analysis was established on the similarity and variations of codon usage in fungal mtDNA-protein coding sequence ORFs ( Figure 4). The restricted amount of species included in the analysis, however, already grouped P. radiata  Table 1. Annotated conserved protein-coding and rRNA genes, their characteristics, location and intron types in P. radiata mtDNA. cob [3] 119244 126306  TAA [1] nad5 starts from the adjacent in frame codon to nad4L stop. [2] nad3 uses the last A of nad2 stop codon for initiation Met's first nt. [3] Based on multiple sequence alignment of Basidiomycota cob genes the last conserved aa of P. radiata cob is 43 aa before the stop codon. [4] Based on multiple sequence alignment of Basidiomycota nad6 genes the last conserved aa of P. radiata nad6 is 122 aa before the stop codon. [5] The codon after the putative TAG stop codon is TAA. Empty cell, not present or observed, or unable to calculate. doi:10.1371/journal.pone.0097141.t001 mtDNA-proteome with Agaricomycotina. Exceptional was the positioning of C. neoformans far out from the other Agaricomycotina species.
Our protein phylogenetic approaches, the maximum-likelihood based RAxML, and the Bayesian Monte Carlo Markov Chain sampler PhyloBayes, reconstructed the recognized fungal phyla as monophyletic clades. However, in RAxML the branching of Chytridiomycota, Glomeromycota, and Dikarya (Ascomycota and Basidiomycota) was polytomic, whereas in the PhyloBayes derived tree, Chytridiomycota and Monoblepharidomycota were a sister lineage to Glomeromycota and Dikarya, and Glomeromycota were a sister lineage to Dikarya ( Figure 5). Further, in the Bayesian phylogeny, the incertae sedis species (previous Zygomycota) R. oryzae and M. verticillata were within the Glomeromycota/Dikarya branch.
Basidiomycota subphyla were resolved by both methods to current fungal taxonomy with the single exception of the Agaricomycotina classified C. neofarmans node (two strains, Figure 5). As with higher-level taxonomy, PhyloBayes seemingly solved Basidiomycota subphylum level phylogeny with less polytomies, placing Agaricomycotina as a sister lineage to the Pucciniomycotina/Ustilaginomycotina group. P. radiata positioned nearest to Trametes cingulata and two Ganoderma species ( Figure 5).

Introns and Intron Homing Endonucleases
Nine of the 16 fungal mitochondrial conserved genes in P. radiata mtDNA were interrupted by over 1 000 bp long introns (Table 3). RNAweasel [42] algorithm detected 29 group I and two group II intron structures, out of which all but one were located within regions that were determined to be intronic also by our manual approach (blastp, blastn, Pfam queries, ClustalX alignments). Our semi-manual approach (blastp, blastn, Pfam queries, ClustalX alignments) predicted seven additional introns. Four of these were associated with core catalytic HE domains within the nad1, nad2 and rnl gene regions. Despite the lack of sequence homology, three shorter introns were inferred to reside within the rns gene (Table 3). In total, 29 introns were associated with HE domains. Introns varied in length from 201 bp (intron 2 in rns) to 3 420 bp (intron 5 in cox1), with an average length of about 1.5 kbp for the protein-coding gene splicing introns (Table 3).  [1] tRNAscan predicts the tRNA type from the anticodon. [2] This tRNA was determined to be Ile through comparative means (see below). [3] The bit score of tRNA-Cys was below 20, which is a typical cut-off value for a pseudogene. The gene was predicted with exceptionally low score from all Basidiomycota mtDNAs. doi:10.1371/journal.pone.0097141.t002 Based on Pfam queries, the P. radiata mtDNA contained 57 characteristic protein domains encoded by HE genes (HEGs, Pfam family PF05204; Table 4) belonging to three different structural families: LAGLIDADG (47) with subtypes 1 (28) and 2 (19), and GIY-YIG (10). The catalytic HE domains expanded from 21 to 181 aa (59 to 542 bp), and their aa pairwise sequence similarities varied from 3.2% to 32% for LAGLIDADG 1, from no identity to 67% similarity for LAGLIDADG 2, and from 12% to 52% for GIY-YIG type of domains (Table 4, Table S5). The HE motifs were predominantly (45/57) situated within group I type introns. One single GIY-YIG domain located within a group II intron, and 10 motifs located in regions of unrecognized intron type ( Table 3).
Eight of the HE catalytic domains were exceptional in appearing as free-standing after the last putative coding sequence exon and stop codon in the genes atp6 and cox2. Notably, alternative C-termini were annotated for both of these genes ( Figure 3A, B). The parametric Pearson's correlation test of pairwise aa-similarity and locus distance for the identified HE domain types (Table S1, S2, S5) resulted in correlation value of 20,166 with a statistically significant p-value (0.031) for LAGLIDADG type 1, and correlation value of 20.256 with, however, a statistically insignificant p-value (0.089) for GIY-YIG type. The non-parametric Spearman's correlation test for LAGLIDADG type 2 (19) resulted in a test value of 20.040 but with statistically insignificant p-value (0.607). These results infer  that genome (intron) location is to some extent related to the degree of HE sequence similarity, at least for the LAGLIDADG 1 HEs. However, it may also be concluded that HE motif transposition to more distant locations are equally allowed, as is observed for LAGLIDADG 2 and GIY-YG domains.

Inverted Duplication and Other Repeated Elements
A distinguishing feature was the inverted duplication region of 6 075 bp in size that accounted for 3.9% of the P. radiata mtDNA. One region (ID1) expanded from nt position 140 421 to 134 346 and the other (ID2) from nt position 36 285 to 42 360 (Figure 1), which is named ''mirror region'' in our EMBL submitted and annotated P. radiata mt genome. Both regions harboured the two genes atp6 and tRNA-Ile GAU , and differed only by 3 nt -in the plus DNA strand at nt position 40 066 with an additional A, at nucleotide position 41 338 with absence of T, and at position 42 353, T instead of G.
A major difference between the ID1 and ID2 regions was the start of the coding sequence of the single-copy gene nad6 within the 39 end of the ID1 region ( Figure 1). Another difference was the occurrence of a single-copy, functionally unknown ORF319 (PRA_mt0074) that was only recognized in ID2. However, the ORF was only 1-nt different in the 59 end sequence (first 37 nt) compared to nad6 in ID1. In addition to the mirror region, the P. radiata mtDNA is frequent with short (#200 nt), dispersed, and partially overlapping tandem repeat sequence motifs ( Figure 6A, B), in particular between positions 85 000 to 100 000 nt, where also tRNA encoding genes were clustered (Figure 1). The most abundant repeat sequence types were dispersed and inverted repeat sequences, which were almost exclusively localized into intronic, and especially into intergenic regions ( Figure 6A). These repeats were often overlapping, and covered as much as 15% of Table 3. Introns, their lengths and types in the conserved P. radiata mtDNA-encoded genes.   the P. radiata mtDNA. Subsets of these sequences shared high sequence similarity ( Figure 6B).

Origin of Replication
According to G/C skew analysis, origin of replication (oriC) of the P. radiata mtDNA may be located around 11:30 to 00:30 o'clock (position 153 000 to position 7 nt) regarding to the largest bias of G over C (Figure 1). The lowest G/C skew ratio in turn is located around 7:00 to 8:30 o'clock (positions 88 000 to 109 000 nt), in the about 21 kb size intergenic region indicative of a putative mtDNA replication termination site, which is supported by switching of the coding strand (orientation of transcription) at this site, around position 103 000 nt.

Sequence Accession
The complete and annotated Phlebia radiata 79 mitochondrial genome sequence is available under the accession codes [EMBL: HE613568] and [NCBI: NC_020148].

mtDNA Size and Genome Organization
To our knowledge, at 156 348 bp, the mitochondrial genome of P. radiata described in our current study is the second largest completely sequenced, gene annotated and located mtDNA among fungi, and presents specific features as signs of genetic flexibility, recombination history and active editing process of the genome. Our findings on the size and original features of the P. radiata mtDNA, together with other recent Basidiomycota mitochondrial genome studies are thereby not explicitly supporting the previous conclusions for the rather small sized and disappearing fungal mitochondrial genomes.
On the contrary, fungal mt genomes apparently vary greatly in size, from ca. 12 kb kbp of the Cryptomycota parasite species Rozella allomycis [19] to over 235 kbp (165 kbp main mtDNA [50]) of the Basidiomycota Agaricomycotina species Rhizoctonia solani strain AG3 Rhs1AP ( Table 5). Evidence of large variations in the Figure 5. Phylogeny of fungal mitochondrial proteomes. The statistically most likely tree was derived by Bayesian inference from a multi-gene superalignment of mtDNA-encoded proteins (2 019 aa positions, 44 taxa). Posterior consensus support values are depicted for branching, and nodes receiving #0.8 support were collapsed into polytomies. The tree was rooted from mid-point. Colours refer to phyla or sub-phyla: turquoise, Chytridiomycota; yellow, Monoblepharidomycota; orange brown, Glomeromycota; dark blue, Ascomycota; pink, Ustilaginomycotina (Basidiomycota); red, Pucciniomycotina (Basidiomycota); bright green, Agaricomycotina (Basidiomycota); Blastocladiomycota node as outgroup. Scale bar indicates amino-acid substitutions per site. For species and mtDNA accession, see Table 5. doi:10.1371/journal.pone.0097141.g005 genome size and a high degree of mtDNA structural complexity between eukaryotic organism lineages and species is currently accumulating through sequencing projects. For fungi, such extreme variations in the mtDNA size or multi-chromosomal organization have not, however, been noticed than is updated for plant (from 0.2 to 10 Mbp mtDNAs), algae and protozoa mitochondria [18][19][20][21][22]. Fungal mitochondrial genomes are usually mapped as single circular dsDNA molecules [7,8,[23][24][25][26][27][28][29]. We likewise assume a similar configuration for the P. radiata mtDNA on the basis of our sequence assembly, bioinformatic analyses and PCR. Reports on more linear than circular chromosomal structure of the mtDNA in the Chytridiomycota species Hyaloraphidium curvatum [30], in the Ascomycota yeasts Candida albicans [31] and Saccharomyces cerevisiae [32,51,52] indicate that the possibility for a partial linear or linear-circular chromosomal organization for the P. radiata mtDNA cannot be completely ruled out, despite the convincing circular assembly which was obtained from the careful study of our sequence data.
With fungal mt genomes of less than 30 kb in size, usually all the 14 fungal mtDNA-conserved, mitochondrial inner-membrane protein complex I, III, IV and V protein subunit-coding genes are present. Examples of these compact mt genomes within Basidiomycota are species of the animal-pathogenic genus Cryptococcus, with some variation of the mtDNA size (24-34.7 kb), gene order and intronic and ORF coding sequence between species and variants [53,54]. The large fungal mitochondrial genomes, alike P. radiata mtDNA, expand over 100 kb in size and may include over 50 protein-coding ORFs ( Table 5). The tendency for larger fungal mt genomes (over 90 kbp) in the Basidiomycota subphylum Agaricomycotina is pinpointed by recent reports on Moniliopthora perniciosa and M. roreri [24,28], T. cingulata (over 90 kb) [26], A. bisporus (135 kbp) [23] and R. solani (165 kbp/235 kbp) [50]. This tendency is furthermore confirmed by our study on the 156 kbp P. radiata mtDNA showing up to 126 predicted protein-coding ORFs and 30 RNA genes, which are the second highest numbers reported for the fungal mitochondrial genomes.
The largest mtDNAs within Ascomycota are those of the filamentous species Podospora anserina (over 100 kbp) and Chaetomium thermophilum var. thermophilum (over 120 kbp), similar to many of the Agaricomycotina basidiomycete mtDNAs, when the smallest (from Ascomycota yeasts) mt genomes are reduced to about 20 kbp in size (Table 5) with only 10 protein-coding genes [19]. Among Ascomycota fungi, however, there are yet large variations at the genus level -between species -of both mtDNA size and gene content. In the yeast S. cerevisiae, the 85.8 kbp mtDNA harnesses 19 protein-coding genes when only the mitochondrial Complex I NADH dehydrogenase subunit encoding genes are absent [19,29,51]. In another species of Saccharomyces, S. castellii, the mtDNA is reduced to 1/3 in size (25.7 kb) and contains only 9 protein-coding genes [51].
Only slight differences, however, in mtDNA size, gene number and organisation have been observed in the Basidiomycota genera Moniliophthora (Agaricomycotina) [24,28], Tilletia (Ustilaginomycotina) [19], Phakopsora (Pucciniomycotina) [27], and Cryptococcus (Agaricomycotina) [53,54], thus indicating genus-level conservation of the mitochondrial genomes in the phylum Basidiomycota. The large differences in gene order and location (loss of synteny) between the Basidiomycota mitochondrial genomes at higher taxon levels, as was reported for Ganoderma lucidum (Agaricomycotina) mtDNA [55], and is observed in this study for P. radiata mtDNA, may thus indicate frequent recombination events and flexibility of fungal organelle genomes. Table 4. Intron-homing endonuclease domains and their location in the P. radiata mtDNA.

Intergenic and Repetitive Sequences
In the large angiosperm plant mt genomes, the genome size is mainly due to long intergenic regions and non-coding sequence (gene spacers, introns, and pseudogenes) [20,22]. Accordingly in the large P. radiata mtDNA, over 80% of the mt genome is either intergenic or intronic sequence (Figure 2). Our analyses show that most of the intergenic sequence space in the P. radiata mtDNA is filled with repeated sequences, in particular between genome nt positions from 90 000 to 110 000 surrounding several tRNAcoding gene loci. Accumulation of polymorphic microsatellite repeated elements (1-6 nt in length) were reported for species of the Agaricomycotina genus Agrocybe [56], and already in S. cerevisiae mtDNA, long AT rich stretches were identified [29]. Putative roles in splicing of mitochondrial polycistronic transcripts may be proposed for the intergenic regions, as well as action as potential promoters with regulative element motifs.
Surprisingly, the mtDNA of P. radiata contains a large (6.1 kbp) inverted duplication segment. This is another similar feature to the angiosperm plant mitochondrial genomes, where large duplicated sequences in the mtDNA apparently function as co-linear recombination sites to aid genome organization [21,22]. In the Ascomycota species C. albicans, the very similar in size (7 kbp) inverted repeat regions in the apparently linear mtDNA are directing genome replication via homologous recombination at the site [31]. Interestingly, the C. albicans mtDNA inverted repeat harnesses duplication of the gene cox3, whereas in the P. radiata mtDNA mirror site, we recognized duplication of the genes atp6 and tRNA-Ile GAU . Two inverted repeats of over 4 kbp in size were reported for another Agaricomycotina species, A. bisporus, devoid of protein-coding ORFs but containing duplicated sets of tRNA genes [23]. At the moment, we may suggest a replication-directing recombination function for the inverted duplication in the P. radiata mt genome, similar to that observed in C. albicans mtDNA [31].

Introns and Homing Endonucleases
Together with the high portion of the intergenic regions, notable in P. radiata mtDNA is the degree of invasion by mobile DNA-elements, which were recognized as long type I and II selfsplicing introns (in total over 30 long introns), and including up to 57 HE domain-encoding ORFs. Nine of the 15 conserved proteincoding genes, and the rnl gene in P. radiata mtDNA are invaded by long introns carrying HE motifs of LAGLIDADG types I and II, when the ten recognized GIY-YIG motifs correspond a minority of the HE domain types.
Homing endonuclease genes were previously identified within mtDNAs of other Basidiomycota species, such as M. perniciosa and M. roreri [24,28] and A. bisporus [23]. However, only four of the protein-coding genes (cob, cox1,2, and nad5), and the rns and rnl genes were previously reported to contain self-splicing introns with HE motifs in these fungal mtDNAs, when in the P. radiata mtDNA, nine of the 11 intron-containing conserved genes contained intronic HE domains. In the elongated cox1 gene of P. radiata, eleven LAGLIDADG and four GIY-YIG motifs were recognized in 13 long, self-splicing type I (12) and type II (1) introns. In regard to this, even up to 18 introns, both type I and II, were recognized in the almost 30 kb-size cox1 gene of A. bisporus [57].
A few reports enlighten the enzymatic and molecular functions of intronic HEs in the fungal mtDNAs. In gene transcription, the HE domains are apparently removed from the transcribed pre-mRNA resulting in a contiguous RNA transcript [58][59][60]. It is likely that existence of intron-homing endonucleases within fungal mtDNA genes is one apparatus for promoting genetic diversity and adaptive response for the mitochondrial genome, when the allelic recombination events may be impossible or rare due to the mainly uniparental nature of mtDNA inheritance in fungi [10,17].
In the Ascomycota species Aspergillus nidulans, C-terminal fragment of the mtDNA cob gene intron-homing translated LAGLIDADG type endonuclease (I-AniI) is involved in splicing  Table 5. Fungal mitochondrial genomes, representatives of Basidiomycota and other phyla. Phlebia radiata mtDNA, this study; *R. solani 162 751 kbp [50], 235 849 kbp [19]. of the intronic pre-mRNA, while the second, N-terminal endonuclease motif was not essential for intron splicing [58]. Instead, the N-terminal motif functioned in cleavage of the DNA target site to initiate the HE intron mobilization. These reports on well-ordered and bi-functional effects of the intronic HEs imply their active involvement in supporting genetic flexibility in the fungal mitochondrial genomes. The HEs recognize longer DNA target regions (14-40 bp) than common DNA-endonucleases and tolerate more sequence variation, which assists in interrupting and introducing new genetic elements (usually introns) and ORFs (mainly intronic HE domains) in their target sites [59,60]. In P. radiata mtDNA, two potential examples (genes atp6 and cox2) of HE-transmitted introns and alternative coding sequences, both C-terminal, are observed ( Figure 3A, B). Unusual splicing of the mtDNA atp6 C-terminus was early on reported for the Blastocladiomycota species Allomyces macrogynus [61]. Additional intron with HE was found as an insertion including foreign C-terminal atp6 sequence fused inframe, which was explained by horizontal gene transfer [61]. However, a more likely explanation is that intronic HEs may have transposed to a new site downstream of the stop codon, as has apparently occurred in P. radiata atp6 leading to alternative Ctermini, but in the case of A. macrogynus atp6, introducing somewhat altered homolog of the new C-terminus.
Additional C-terminal coding sequence was likewise inserted into the mitochondrial intronic rps3 gene of the Ascomycota Ophiostoma novo-ulmi subsp. americana [62]. In the latter case, the HE insertion apparently shifted a small portion of the rps3 coding region downstream and disrupted the ORF with a premature stop codon. Accordingly, in the P. radiata cox2 gene, the duplicated Cterminus is intervened by a 2.5 kb sequence containing catalytic HE domains, and is fused after a premature stop codon in the first C-terminus.
The group I and II type self-splicing introns and HEs are interlinked since the self-splicing introns need endonuclease activity to assist splicing of the transcribed intronic RNA [58][59][60]. If the numerous HE motifs in the P. radiata mtDNA are active, the homing processes may modify their target genes, and even the genome size and structure considerably -in the course of either a long or short evolutionary time period -as is seen in the mt genomes of the animal-pathogenic Cryptococcus spp. [53,54]. Another function for the multiple HE domains could be regulation of transcription of their target genes, which is observed for bacterial viruses [60].
Considering the density of group I type introns (26) and their high frequency of HE domains in the core genes of P. radiata mtDNA, it is well established to expect that if functional, the HEs could assist in intronic RNA splicing and thereby affect transcription of their target genes, with a possibility for alternative intron splicing. The mitochondrial Complex IV cytochrome oxidase subunit 1 encoding cox1 gene of P. radiata is particularly interrupted by long self-splicing introns (13) containing multiple HEs, as seemingly is general in Agaricomycota mtDNAs [24][25][26][27][28]57]. In P. radiata mtDNA, cox1 is apparently the first replicated and transcribed gene in sense (leading strand) orientation ( Figure 1). Whether regulation of P. radiata mtDNA gene expression is mediated by the multiple introns and their identified HEs will be of our future concern.
Plasmid-originating Features Although mitochondrial plasmids have been sequenced, and plasmid-originating genes are identified in fungal mtDNAs [8,[23][24][25][26][27][28]50,63,64], our sequence analyses supported no individual plasmid dsDNA features in the P. radiata mtDNA. We were also unable to detect integrated-plasmid like sequence regions with inverted repeat ends as has been reported for other Agaricomycota species, i.e. A. bisporus [23], M. perniciosa [24] and A. aegerita [64]. However, putative and degenerative plasmid and viral-originating features, such as the cob gene intron-located reverse transcriptase (RT, RNA-directed DNA polymerase), and DNA polymerase B encoding dpob genes were identified in the P. radiata mtDNA, thereby possibly indicating previous plasmid-transmitted DNA integration events. The mitochondrial type II intron-homing and retrovirus-related reverse transcriptase [65,66] may function in plasmid integration to the mtDNA, and due to template-switching capacity, intron loss and gain to the mitochondrial genomes may occur.

tRNA Assembly, Codon Usage and Phylogeny
Fungal and animal mitochondrial genomes generally have a single tRNA gene for each synonymous protein-coding codon [67]. This also applies to the mitochondria of Basidiomycota, and implies extensive codon third nucleotide (wobble) pairing, similar to that observed for the tRNAs of the bacterium Mycoplasma capricolum [68], i.e. the tRNA anticodon first nucleotide (anticodon wobble) U pairs with all four third position nucleotides, and the first anticodon G pairs with third codon position C or U nucleotides.
Assuming that these codon/anticodon recognition rules apply, the predicted tRNA set of P. radiata mtDNA is sufficient to translate its conserved mitochondrial proteome, except for the codons Ile AUA (274) and Trp UGA (1), which would require unusual ANG and ANC wobble-pairing to their respected tRNA anticodons. However, we infer from Basidiomycota tRNA-Met and tRNA-Ile multiple sequence alignments that one of the P. radiata mtDNA tRNAs with CAU anticodon is in fact a tRNA-Ile gene, and the predicted anticodon is likely edited. Likewise, based on the lack of UGA codons in P. radiata mtDNA conserved ORFs, presence of the canonical CCA anticodon in tRNA-Trp, and the high bias towards low GC-content, we infer that P. radiata mitochondrial genome does not utilize the Mold, Protozoan, and Coelenterate Mitochondrial Code (NCBI translation table 4) in which UGA encodes Trp.
From the frequency of UGA codons in fungal mtDNA-encoded proteomes, we infer that UGA has been assigned to Trp multiple times in the evolution of Basidiomycota mitochondrial genomes, i.e. in the lineage leading to the Pucciniomycotina genus Phakopsora, and in the lineage leading to the Agaricomycotina genus Moniliopthora, but not in the lineage leading to Phlebia. Likewise, we infer from sequence alignments of Basidiomycota cox3 genes that in C. neoformans, UGA induces a +1 nt frameshift, which restores sequence homology for the last 16 codons of the gene in reference to other Basidiomycota cox3 genes. We conclude that with this repertoire of mtDNA-encoded tRNAs, the mitochondria of P. radiata do not require tRNA import from the cytosol.

mtDNA Proteome Phylogeny
Maximum-likelihood and Bayesian inference approaches of the Basidiomycota mtDNA-encoded proteomes resulted with wellsupported and systematically consistent evolutionary trees in line with current multigene-based fungal taxonomies [69,70], both in respect to fungal phyla (-mycota) and within subphyla (-mycotina) ( Figure 5). P. radiata mt proteome grouped together with other Agaricomycotina species, nearest to Ganoderma spp. and T. cingulata, which also belong to the same taxonomic class (Agaricomycetes) and share similar, wood-decaying white-rot saprobic lifestyle with P. radiata. The opportunistic pathogen C. neoformans was the only exception in protein phylogeny by falling outside the subphylum Agaricomycotina, which is consistent with our mtDNA proteome ORF codon usage evolutionary analysis ( Figure 4). Multi-protein Bayesian evolutionary analysis positioned the yet insertae sedis subphyla Kixcellomycotina (Zancudomyces (Smittium) culisetae) nearest to Glomeromycota, and Mucoromycotina (Rhizopus oryzae) and Mortierellomycotina (Mortierella verticillata) together between Glomeromycota and Dikarya ( Figure 5). Otherwise the relationships between extant taxons were well resolved, thus indicating a strong signal for a single common origin of the Basidiomycota and fungal mt genomes.

Conclusions
Mitochondria are numerous in eukaryotic cells and thereby, mitochondrial genomes as well have high cellular copy numbers. Our study confirms the high degree of variety of fungal mtDNAs in genome structure and size, gene order and location, and exonintron structure of the protein-coding genes. This indicates that for mtDNA, continuous and adaptive modifications are allowed, including mobile genetic elements and signs of recombination events. Several features in the P. radiata mtDNA support such genetic flexibility and repair mechanisms, and regulation of transcription. Existence of the long inverted-duplicated region, frequency of repetitive sequence motifs, and especially the abundance of intron-homing endonucleases support these conclusions. Surprisingly, these features of P. radiata mtDNA, together with the large genome size, are shared with fungal, plant and algae mtDNAs.
Accurately characterized reference genomes including the mtDNAs are currently needed to aid in de novo sequencing and evolutionary studies of fungi. The novel P. radiata mtDNA features observed in our research indicate a general phenomenon for evolutionary pressure and genome diversity in mitochondrial genomes, not being as stable and compact integrities as previously considered. The fungal mtDNAs could thus serve as sources for evolutionary and biochemical studies of genetic mobile elements, intron loss and gain, virulence and adaptation, and targeted genetic engineering by the use of homing endonucleases.

Supporting Information
Table S1 Intronic and additional ORFs annotated in the P. radiata mtDNA. (XLSX)