Mitochondrial genome sequence of Phytophthora sansomeana and comparative analysis of Phytophthora mitochondrial genomes.

Phytophthora sansomeana infects soybean and causes root rot. It was recently separated from the species complex P. megasperma sensu lato. In this study, we sequenced and annotated its complete mitochondrial genome and compared it to those of nine other Phytophthora species. The genome was assembled into a circular molecule of 39,618 bp with a 22.03% G+C content. Forty-two protein coding genes, 25 tRNA genes and two rRNA genes were annotated in this genome. The protein coding genes include 14 genes in the respiratory complexes, four ATP synthase genes, 16 ribosomal protein genes, a tatC translocase gene, six conserved ORFs and a unique orf402. The tRNA genes encode tRNAs for 19 amino acids. Comparison among mitochondrial genomes of 10 Phytophthora species revealed three inversions, each covering multiple genes. These genomes were conserved in gene content with few exceptions. A 3' truncated atp9 gene was found in P. nicotianae. All 10 Phytophthora species, as well as other oomycetes and stramenopiles, lacked tRNA genes for threonine in their mitochondria. Phylogenomic analysis using the mitochondrial genomes supported or enhanced previous findings of the phylogeny of Phytophthora spp.


Introduction
The genus Phytophthora includes many devastating pathogens infecting economically important crops [1]. Perhaps the best known and most historically significant among those is P. infestans, the pathogen that causes potato late blight and the culprit behind the Irish potato famine [2]. During the past 20 years, the number of species described in the Phytophthora genus has expanded significantly. From approximately 55 species described in 1999, the genus grew to 105 species by 2007 [3] and 117 species by 2012 [4]. Currently, at least 185 formally described species and provisional species have been reported [5][6][7]. Phytophthora is a genus in the Peronosporales of oomycetes. Oomycetes produce hyphae and are morphologically similar to fungi, but the two groups are phylogenetically distant. Oomycetes belong to the major group Stramenopila, which also includes diatoms and brown algae [8][9][10].
Phylogenetic analysis grouped Phytophthora species into 10 clades [6,11,12]. P. sansomeana is one of the recently described species in clade 8 [13]. Previously included in the species complex P. megasperma sensu lato, it is morphologically similar to but phylogenetically distinct from P. megasperma sensu stricto. P. sansomeana was isolated from soybean in Indiana in 1990 [14] and has now been reported in China [15], Canada and multiple Midwest states in the USA [16,17]. It lives in the soil, infects soybean roots, and causes discoloration and rotting of lateral roots and internal discoloration and rotting of the taproot. Above ground, it can cause yellowing and stunting, or whole plant wilting, but the chocolate-colored stem discoloration that is typically associated with late-season infection by another soybean root rot pathogen, P. sojae, is usually absent in P. sansomeana-infected soybean plants. This pathogen overwinters as oospores in the soil and in soybean debris. Also included in this species are isolates from Douglas-fir and alfalfa [13].
Interest in the genus Phytophthora has been increasing due to its economic impact. Molecular data serve as the foundation of many studies. Genome sequences are available for many Phytophthora species. Complete mitochondrial genomes from eight species have been published. These include P. infestans [18,19], P. andina, P. ipomoeae, P. mirabilis, P. phaseoli [20], P. nicotianae [21], P. sojae and P. ramorum [22]. Another species, P. polonica, also has its complete mitochondrial genome publicly available in GenBank (Table 1). Inconsistencies in annotation have been observed (see "Results and discussion" for details). Mitochondrial genomes have been used to study populations of P. infestans [23][24][25][26] and to resolve the phylogenetic relationships of Phytophthora species [12,19]. Molecular information from P. sansomeana is limited. In this study, we sequenced and annotated its complete mitochondrial genome, compared it to those of other Phytophthora species currently publicly available, and examined the phylogenetic relationships among these species using their mitochondrial genomes.

Nucleic acids extraction and sequencing
The type strain of P. sansomeana, 1819b, was obtained from the American Type Culture Collection (ATCC #: MYA-4455). It was maintained on lima bean agar. For nucleic acid extraction, it was grown in half-strength lima bean broth on a lab bench for 8 days. DNA was extracted from harvested mycelium using the Gentra Puregene Yeast/Bact kit (Qiagen). DNA was sequenced using both PacBio and Illumina technologies. For PacBio sequencing, a 20-kb insert library was constructed and then run on two SMRT cells on the RS II sequencing platform using the P6-C4 chemistry. For Illumina sequencing, a TruSeq DNA PCR-free library with a mean insert size of approximately 440 bp was constructed according to the manufacturer's instructions. It was sequenced on the HiSeq 2500 platform.

Assembly, annotation and comparative genomics
The PacBio reads were assembled using HGAP (RS_HGAP_Assembly.3 protocol implemented in PacBio smrtanalysis 2.3.0 software package) and polished using Quiver [27]. Illumina read-pairs were mapped onto the assembly using Bowtie2 [28], and Pilon [29] was used to further improve the assembly by utilizing the aligned Illumina reads.
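The short-read polishing step described above could be scripted roughly as follows. This is only a sketch under stated assumptions: the file names and the `polishing_commands` helper are hypothetical, the use of samtools for sorting and indexing is an assumption (the text names only Bowtie2 and Pilon), and a `pilon` wrapper executable is assumed to be on the PATH. The function only builds the command lines; it does not run them.

```python
def polishing_commands(assembly, reads_1, reads_2, prefix="polish"):
    """Build (but do not run) the commands for one round of Illumina polishing.

    All file names are placeholders. samtools is an assumed intermediate
    step: Pilon needs a sorted, indexed BAM file as input.
    """
    index = f"{prefix}_index"
    sam = f"{prefix}.sam"
    bam = f"{prefix}.sorted.bam"
    return [
        # Index the PacBio assembly for Bowtie2.
        ["bowtie2-build", assembly, index],
        # Map the paired-end Illumina reads onto the assembly.
        ["bowtie2", "-x", index, "-1", reads_1, "-2", reads_2, "-S", sam],
        # Sort and index the alignments.
        ["samtools", "sort", "-o", bam, sam],
        ["samtools", "index", bam],
        # Correct the assembly using the aligned read pairs.
        ["pilon", "--genome", assembly, "--frags", bam,
         "--output", f"{prefix}_pilon"],
    ]


commands = polishing_commands("mito_contig.fasta", "reads_R1.fastq", "reads_R2.fastq")
```

Each inner list can be passed to `subprocess.run(cmd, check=True)` in order once the tools are installed.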
The mitochondrial genome of P. sansomeana was first annotated with MFannot (http://megasun.bch.umontreal.ca/cgi-bin/mfannot/mfannotInterface.pl) using the standard genetic code. Like plant mitochondria, but unlike those of animals and fungi, oomycete mitochondria use the standard code [30]. The automatic annotation was then verified manually. Mitochondrial genomes of nine other Phytophthora species were downloaded from GenBank (Table 1). Protein coding genes among the species were compared. Genes identified in one species but not in another were manually verified or corrected. For this purpose, ORFfinder (https://www.ncbi.nlm.nih.gov/orffinder/) was used to find open reading frames (ORFs) and BLAST searches were used to identify similarity. ORFfinder was also used to find ORFs encoding at least 100 aa in the intergenic regions of all Phytophthora mitochondrial genomes. Ribosomal RNA (rRNA) gene sequences were used to search the NCBI non-redundant database and the SILVA rRNA database [31]. Transfer RNA (tRNA) genes were analyzed using tRNAscan-SE 2.0 [32] in the organellar mode. A circular map of the mitochondrial genome of P. sansomeana was generated using OGDRAW [33]. Its genome assembly and annotation were submitted to GenBank under accession number MH936679.

Phylogenomic analysis
To construct a phylogenetic tree based on the mitochondrial genomes, individual protein coding genes and rRNA genes were aligned with Muscle [34]. The alignments were visually inspected to remove gaps and regions deemed ambiguously aligned (usually between closely adjacent gaps). The alignments were concatenated and a maximum-likelihood tree based on the Tamura-Nei nucleotide model [35] was constructed using MEGA7 [36]. One thousand bootstrap replicates were used to test the support of individual branches. Pythium ultimum [37] and Pythium insidiosum [38] were used as outgroups.
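The gap removal and concatenation steps can be illustrated with a minimal sketch. The function names and the toy alignments are hypothetical, and the sketch removes only columns that contain a gap character; the manual removal of ambiguously aligned regions described above is not automated here.

```python
def strip_gap_columns(alignment):
    """Drop every column in which at least one sequence has a gap ('-').

    `alignment` maps taxon name -> aligned sequence (all of equal length).
    """
    rows = list(alignment.values())
    if len({len(r) for r in rows}) != 1:
        raise ValueError("aligned sequences must have equal length")
    keep = [i for i in range(len(rows[0])) if all(r[i] != "-" for r in rows)]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


def concatenate(alignments):
    """Join per-gene alignments end to end, taxon by taxon."""
    taxa = list(alignments[0])
    for aln in alignments:
        if set(aln) != set(taxa):
            raise ValueError("every alignment must contain the same taxa")
    return {t: "".join(aln[t] for aln in alignments) for t in taxa}


# Toy data: two short "genes" aligned across three placeholder taxa.
gene1 = {"sp1": "ATG-CA", "sp2": "ATGACA", "sp3": "AT-ACA"}
gene2 = {"sp1": "GGC", "sp2": "GGT", "sp3": "G-T"}
supermatrix = concatenate([strip_gap_columns(gene1), strip_gap_columns(gene2)])
```

The resulting dictionary can be written out as FASTA and loaded into a tree-building program.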

Assembly and annotation of P. sansomeana mitochondrial genome
PacBio sequencing generated 383,775 subreads totaling 2.57 Gb after filtering and adapter removal. The N50 of subread length was 9,846 bp. The mitochondrial genome of P. sansomeana was assembled into a single contig of 55.5 kb using PacBio reads, with a 15.8 kb direct duplicated region at both ends, indicating a circular topology. To verify this, the contig was broken at a random position in the non-duplicated region and the two segments were reconnected by merging the duplicated regions. PacBio reads were then mapped to this rearranged contig. Visual inspection of read alignments and coverage confirmed its circular topology. After polishing, the PacBio assembly resulted in a molecule of 39,546 bp. Illumina sequencing generated 43.3 million read-pairs of 2 x 151 bp. A total of 2,249,965 read-pairs were mapped to the mitochondrial genome. Based on the alignments of Illumina reads, 72 single nucleotide deletions were corrected, giving the final genome size of 39,618 bp.
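The size bookkeeping in this paragraph can be checked with a short sketch. The helper names and the toy sequence are hypothetical and this is not the procedure used in the study (which relied on read mapping and visual inspection); the code simply finds the longest direct repeat shared by the two ends of a linear contig and removes one copy.

```python
def terminal_repeat_length(contig):
    """Length of the longest non-overlapping prefix that is also a suffix."""
    for k in range(len(contig) // 2, 0, -1):
        if contig[:k] == contig[-k:]:
            return k
    return 0


def circularize(contig):
    """Remove one copy of the terminal direct repeat from a linear contig."""
    return contig[terminal_repeat_length(contig):]


# Toy example: a 10-base "circle" read 14 bases around, so 4 bases repeat.
unit = "ACGTTGCAAG"
linear = unit + unit[:4]
circle = circularize(linear)

# Figures from the text: 55.5 kb contig minus the 15.8 kb duplicated end is
# about 39.7 kb; polishing gave 39,546 bp, and restoring 72 deleted bases
# (assuming one base per corrected deletion) gives the reported 39,618 bp.
approx_unit_kb = 55.5 - 15.8
final_length = 39_546 + 72
```

The trimmed sequence is a rotation of the original circle, which is all that can be expected because a circular molecule has no defined start.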
There was one large subunit rRNA gene (rnl) and one small subunit rRNA gene (rns). The 25 tRNA genes encoded tRNAs for 19 amino acids. There were two tRNA genes each for arginine, glycine, leucine, methionine and serine, and one each for the other 14 amino acids; no tRNA gene for threonine was found (Fig 1). A third tRNA gene with a CAU anticodon was located between rpl16 and ymf99 (Fig 1), but we interpreted it as trnI-CAU rather than trnM-CAU based on its similarity to a homologous tRNA gene at the same location in P. infestans. The cytosine in the trnI-CAU anticodon was assumed to be post-transcriptionally modified to lysidine, which would enable it to recognize the AUA codon for isoleucine; this modification has been confirmed experimentally in Escherichia coli [39] and inferred in P. infestans [18].
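The decoding logic behind the trnI-CAU interpretation can be sketched as follows. The function names are hypothetical; the sketch assumes standard Watson-Crick pairing for an unmodified anticodon and models lysidine at the first anticodon position as pairing with A in the third codon position.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codon_read_by(anticodon):
    """Codon (5'->3') paired by an unmodified anticodon (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(anticodon.upper()))


def codon_read_with_lysidine(anticodon):
    """Codon read when the first anticodon base (a C) is modified to lysidine.

    Lysidine pairs with A rather than G, so only the third codon
    position changes.
    """
    if anticodon[0].upper() != "C":
        raise ValueError("lysidine is a modified cytidine")
    return codon_read_by(anticodon)[:2] + "A"


unmodified = codon_read_by("CAU")            # AUG, the methionine codon
modified = codon_read_with_lysidine("CAU")   # AUA, read as isoleucine
```

This is why an anticodon that looks like that of a methionine tRNA can belong to an isoleucine tRNA.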
The genome was compact. Coding regions comprised approximately 90.1% of the genome. No introns were found in any of the genes. Of the 69 intergenic regions, 53 were 50 bp or less in length and 41 were less than 20 bp. There were three overlaps: between rps7 and rps12 (26 bp), between nad1 and nad11 (4 bp), and between tatC and ymf101 (4 bp).
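On a circular molecule the number of junctions between neighbouring genes equals the number of genes, which is consistent with 69 intergenic regions for 42 protein coding, 25 tRNA and two rRNA genes. The sketch below shows one way to compute spacer and overlap lengths; the function name, gene names and coordinates are hypothetical toy values, not the P. sansomeana annotation.

```python
def circular_gaps(genes, genome_length):
    """Spacer length between each gene and the next one around a circle.

    `genes` is a list of (name, start, end) with 1-based inclusive
    coordinates; no gene may span the origin. A negative value is an overlap.
    """
    ordered = sorted(genes, key=lambda g: g[1])
    gaps = []
    for i, (name, _start, end) in enumerate(ordered):
        next_name, next_start, _ = ordered[(i + 1) % len(ordered)]
        if i == len(ordered) - 1:
            # Wrap around the origin from the last gene back to the first.
            next_start += genome_length
        gaps.append((name, next_name, next_start - end - 1))
    return gaps


# Toy 100-bp circle with three placeholder genes; geneA and geneB overlap by 4 bp.
toy = [("geneA", 1, 30), ("geneB", 27, 50), ("geneC", 61, 90)]
gaps = circular_gaps(toy, 100)

# Gene counts reported for P. sansomeana.
junctions = 42 + 25 + 2
```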

Protein coding gene contents in the mitochondrial genomes of 10 Phytophthora species
Mitochondrial genomes of nine other Phytophthora species were publicly available and were downloaded from GenBank (Table 1). Table 1 lists the references and GenBank accessions used throughout the analysis below. These 10 genomes are similar in size, ranging from 37,561 bp in P. nicotianae to 42,977 bp in P. sojae. G+C contents range from 21.57% in P. polonica to 22.39% in P. ipomoeae. Based on seven nuclear loci and four mitochondrial loci, Phytophthora species were grouped into 10 clades [11,12]. The 10 species included in our analysis belong to clades 1, 7, 8 and 9 (Table 1).
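For reference, G+C content is the share of G and C among the unambiguous bases of a sequence. A minimal sketch follows; the function name and the toy sequence are hypothetical.

```python
def gc_percent(sequence):
    """G+C content in percent, ignoring ambiguous bases such as N."""
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


toy_gc = gc_percent("ATGCATAT")  # 2 of 8 bases are G or C
```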
All 10 genomes share the 34 genes with known functions, including the 14 genes in respiratory complexes, four ATP synthase genes and 16 ribosomal protein genes as described in P. sansomeana. The gene atp9 was not reported in P. nicotianae. Our analysis of the submitted sequence in GenBank unambiguously identified the atp9 gene in this genome at the expected location (Fig 2), but its 3' end was truncated. DNA sequence alignment (not shown) revealed that this was caused by a deletion from the 3' end of the atp9 gene that extended into the intergenic region between the atp9 and nad9 genes. This deletion removed the last 13 amino acids of the atp9 protein and replaced them with five non-homologous amino acids (Fig 3).

PLOS ONE
Fig 2 legend. RNA genes are in white letters with rRNA genes in dark grey background and tRNA genes in grey background. Protein coding genes are in dark letters with the following backgrounds: cytochrome c reductase and oxidases, green; ATP synthases, blue; NADH dehydrogenases, yellow; ribosomal protein genes, brown; twin-arginine translocase, purple; and conserved ORFs with unknown function, light grey. The diamond symbols represent regions with one or more ORFs unique to that particular species. In species other than P. sansomeana, protein coding genes with an asterisk are those identified in this study but not present in GenBank annotations. tRNA genes with an asterisk are those whose annotations are modified in this study, either in coding strand or in identity, compared with GenBank annotations. The tatC gene was previously given the gene symbol ymf16 (ymf is used to designate a conserved ORF in mitochondria without annotated function) or SecY in species other than P. sansomeana. Our analysis showed that it should be named tatC (see text for detail). IR, inverted repeat. a Clade designation is based on [11,12]. https://doi.org/10.1371/journal.pone.0231296.g002

In addition to the 34 genes described above, another gene shared by these mitochondrial genomes was annotated with a function, but its annotation was inconsistent among genomes. ymf16 (= orf244) is a conserved ORF in all 10 Phytophthora species that encodes a protein of approximately 244 amino acids. In the 13 mitochondrial genomes of nine Phytophthora species available in GenBank (Table 1), this gene was given the name ymf16 in most cases and its product was annotated as SecY-independent transporter protein. However, in P. ramorum and P. sojae, this gene was named SecY, and in P. nicotianae, its product was named SecY. The name "SecY" is not compatible with the meaning of "SecY-independent".
There are three translocation systems in bacteria that transport distinct subsets of proteins into or across the inner membrane. SecYEG is the main translocase that transports unfolded peptides [40,41], of which SecY is the main transmembrane subunit. YidC functions as an insertase and also assists SecYEG in the assembly of membrane proteins [42]. The twin-arginine translocation (Tat) system is capable of translocating folded or even multi-subunit protein complexes across the inner membrane [43]. Proteins translocated by the Tat system have a twin-arginine motif in their signal peptides. The TatC protein is an essential component of the Tat system.
In a previous study, a tatC gene was identified in the mitochondrial genomes of oomycetes [44]. To determine the function of the ymf16 gene, its predicted protein sequences from the 10 Phytophthora species were used to search the Pfam database version 32 [45]. The only hit was TatC (sec-independent translocase protein, E values < 1e-20). We therefore annotated this gene as tatC (Fig 2).
Six conserved ORFs have been reported in the mitochondrial genomes of these 10 species. Of these, orf32, ymf98, ymf99 and ymf100 were found in all 10 species. orf32 was not reported in P. nicotianae, but our analysis found this ORF in the expected location (nucleotide 7530..7628) (Fig 2). ymf96 was not found in P. nicotianae. Of the five species in subclade 1c, ymf101, a small ORF encoding approximately 64 aa, was found in P. infestans but was missing in three species, P. ipomoeae, P. mirabilis and P. phaseoli (Fig 2). Interestingly, in P. andina, ymf101 was missing in haplotype Ic but our analysis identified it in haplotype Ia (nucleotide 30270..30476). ymf101 was not reported in P. nicotianae, but our analysis identified it at nucleotide 30206..30015.
Unique ORFs have been reported in several species. In P. infestans haplotypes IIa and IIb, an insertion of approximately 2 kb encodes multiple ORFs [19]. Also reported were orf183 in P. nicotianae, two inverted copies of orf176 in P. ramorum, and six ORFs at two locations in P. sojae. Additionally, we identified orf402 in P. sansomeana and also found an ORF with 173 codons (excluding the stop codon) in P. polonica at nucleotide 540..1061 (Fig 2).
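The relationship between the nucleotide coordinates quoted above and ORF size can be checked with a small sketch. The function name is hypothetical, and the sketch assumes that each coordinate pair is inclusive and covers the stop codon; a start greater than the end is taken to indicate the complementary strand.

```python
def orf_codons(start, end):
    """Number of codons, excluding the stop codon, in an ORF at start..end."""
    length = abs(end - start) + 1
    if length % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return length // 3 - 1


orf32_codons = orf_codons(7530, 7628)      # orf32 in P. nicotianae
polonica_codons = orf_codons(540, 1061)    # unique ORF in P. polonica
```

Under these assumptions the first interval gives 32 codons, matching the name orf32, and the second gives 173 codons, matching the count stated in the text.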

Organization of the mitochondrial genomes of 10 Phytophthora species
As discussed above, the mitochondrial genomes in these 10 species were conserved in gene content with few exceptions. These genomes were also conserved in gene order and coding strand except for three inversions. One inversion event occurred within clade 1. P. nicotianae belongs to clade 1 and is basal to subclades 1b and 1c. Compared with P. nicotianae and species in clades 7, 8 and 9, a large section between the atp1 gene and the nad3 gene was inverted in the five species in subclade 1c: P. andina, P. infestans, P. ipomoeae, P. mirabilis and P. phaseoli. This section contained 11 protein coding genes and eight tRNA genes. A second inversion occurred between species in clade 1 and species in clades 7, 8 and 9. This inversion covered three protein coding genes: atp9, nad9 and cob. A third inversion occurred within clade 8. In P. sansomeana (subclade 8a), the section covering nad5, nad6 and trnR-UCU was inverted compared with P. ramorum (subclade 8c) and species in other clades (Fig 2).
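An inversion reverses the order of the affected genes and flips their coding strand. The sketch below illustrates this on a signed gene order. The function name, the flanking genes geneX and geneY, and all strand assignments are hypothetical and do not reproduce the arrangement shown in Fig 2; only the three gene names in the inverted block are taken from the text.

```python
def invert_segment(order, first, last):
    """Invert the block from gene `first` to gene `last` (inclusive).

    `order` is a list of (gene, strand) pairs with strand +1 or -1.
    """
    names = [gene for gene, _ in order]
    i, j = names.index(first), names.index(last)
    if i > j:
        raise ValueError("`first` must come before `last`")
    flipped = [(gene, -strand) for gene, strand in reversed(order[i:j + 1])]
    return order[:i] + flipped + order[j + 1:]


# Placeholder arrangement with hypothetical strands.
reference = [("geneX", 1), ("atp9", 1), ("nad9", 1), ("cob", 1), ("geneY", -1)]
inverted = invert_segment(reference, "atp9", "cob")
restored = invert_segment(inverted, "cob", "atp9")
```

Applying the same inversion twice restores the original arrangement, which is a quick sanity check when comparing two gene orders.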

Phylogeny of Phytophthora species based on mitochondrial genomes
All protein coding genes with annotated functions except atp9 (34 genes) and the two rRNA genes, rnl and rns, were used to construct the phylogenetic tree (Fig 4). Previously, Phytophthora species were grouped into 10 clades [11,12]. Our analysis included species from four clades and the result was in agreement with previous studies. Species from individual clades were grouped together with bootstrap support of 86% or higher. Within clade 1, previous studies showed moderate support that P. nicotianae was basal to subclades 1b and 1c based on seven nuclear genes [11] and four mitochondrial genes [12]. Our analysis supported the conclusion that P. nicotianae was basal to subclade 1c (100% bootstrap support). Species in subclade 1b were not included in this study. Within subclade 1c, P. phaseoli diverged first and was basal to the other species. P. andina haplotype Ia was grouped with haplotype I of P. infestans, while haplotype Ic was grouped with P. ipomoeae and P. mirabilis. These findings were in agreement with previous reports [20,53]. The phylogenetic analysis was dominated by clade 1 species. Decreasing the number of species in clade 1 did not change the topology of the tree (not shown).
In summary, we sequenced and annotated the complete mitochondrial genome of the soybean pathogen P. sansomeana and compared it to those of nine other Phytophthora species. Inconsistencies in annotation among these mitochondrial genomes were corrected. These genomes were found to be conserved in gene content with few exceptions. Three inversion events, each covering multiple genes, were observed among these genomes. Phylogenomic analysis using the mitochondrial genomes supported or enhanced previous findings.
Mention of trade names or commercial products in this publication is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the U.S. Department of Agriculture. USDA is an equal opportunity provider and employer.