Complete chloroplast genomes of three important species, Abelmoschus moschatus, A. manihot and A. sagittifolius: Genome structures, mutational hotspots, comparative and phylogenetic analysis in Malvaceae

Abelmoschus is an economically and phylogenetically valuable genus in the family Malvaceae. Owing to coexistence of wild and cultivated form and interspecific hybridization, this genus is controversial in systematics and taxonomy and requires detailed investigation. Here, we present whole chloroplast genome sequences and annotation of three important species: A. moschatus, A. manihot and A. sagittifolius, and compared with A. esculentus published previously. These chloroplast genome sequences ranged from 163121 bp to 163453 bp in length and contained 132 genes with 87 protein-coding genes, 37 transfer RNA and 8 ribosomal RNA genes. Comparative analyses revealed that amino acid frequency and codon usage had similarity among four species, while the number of repeat sequences in A. esculentus were much lower than other three species. Six categories of simple sequence repeats (SSRs) were detected, but A. moschatus and A. manihot did not contain hexanucleotide SSRs. Single nucleotide polymorphisms (SNPs) of A/T, T/A and C/T were the largest number type, and the ratio of transition to transversion was from 0.37 to 0.55. Abelmoschus species showed relatively independent inverted-repeats (IR) boundary traits with different boundary genes compared with the other related Malvaceae species. The intergenic spacer regions had more polymorphic than protein-coding regions and intronic regions, and thirty mutational hotpots (≥200 bp) were identified in Abelmoschus, such as start-psbA, atpB-rbcL, petD-exon2-rpoA, clpP-intron1 and clpP-exon2.These mutational hotpots could be used as polymorphic markers to resolve taxonomic discrepancies and biogeographical origin in genus Abelmoschus. Moreover, phylogenetic analysis of 33 Malvaceae species indicated that they were well divided into six subfamilies, and genus Abelmoschus was a well-supported clade within genus Hibiscus.

Introduction Family Malvaceae consists of 244 genera and over 4200 species, and most of them are widely distributed in tropics and temperate regions [1]. According to the diverse morphological characteristics, this family could be divided into nine subfamilies, including Sterculioideae, Tilioideae, Malvoideae, Helicteroideae, Grewioideae, Dombeyoideae, Byttnerioideae, Brownlowioideae and Bombacoideae [2]. Abelmoschus is one of important genera in subfamily Malvoideae of family Malvaceae. This genus was previously placed within Hibiscus, and subsequently isolated by taxonomists due to genetic differences [3]. As currently defined, genus Abelmoschus contains 11 species, 4 subspecies and 5 varieties [4], and displays a variable habit, from annual to perennial, herbs to shrubs, and is distributed in Asia, Australia and southwestern Africa [5]. Most members of this genus are economically important plants, and used in agriculture, food and medicines. A. esculentus (okra) and A. caillei are widely cultivated as vegetables due to their tender pods [6][7][8]. A. manihot is a popular green leafy vegetable and its flowers have been applied in clinical treatment of burns, chronic kidney disease and oral ulcers owing to the flavonoids [9,10]. A. moschatus, as an aromatic plant, could be suitable for medical or food uses to improve insulin sensitivity [11]. A. sagittifolius also has a long history of medicinal usage, and cadinane sesquiterpenoid glucoside extracted from the stem tubers exhibited antitumor activity [12]. Moreover, antioxidant, antimicrobial, wound healing, antiinflammatory and immunomodulatory activities were also found in Abelmoschus species [13][14][15][16]. Seed oil and levels of oleic acid have also been reported in Abelmoschus [3].
Due to coexistence of wild and cultivated form and interspecific hybridization, genus Abelmoschus is controversial in systematics and taxonomy, such as taxonomic position of some Abelmoschus species and the relationships between Abelmoschus species and part of Hibiscus species [8]. In terms of morphological and cytological features, highly variable root, flower and fruit characters of Abelmoschus have been used extensively in classification system [17,18]. Patil et al. found seed coat sculpturing and seed trichomes could be used as the diagnostic characters for many morphologically closely related species of Abelmoschus [5]. Fluorochrome-binding pattern of nine Abelmoschus species showed polyploidy was an important factor in the chromosome number variation and evolution in this genus [4]. Some researchers also used molecular markers to analyze genetic relationships of Abelmoschus, but most studied focused on genetic diversity within A. esculentus and A. manihot [2,7,9,18,19], molecular markers were relatively lacking in other species. Thus, new molecular tools were necessary to study the accurate phylogeny in Abelmoschus.
Chloroplast is characteristic organelle in plant cells, and crucial in the photosynthesis and biosynthesis of pigments, amino acids, starch and fatty acids [20,21]. The chloroplast genome generally has a circular structure with a pair of inverted-repeats (IR) regions (further called IRa and IRb), a large single copy (LSC) region and a small single copy (SSC) region. Due to the small size, conserved structure and gene content, it has been applied for resolving phylogenetics, evolution, taxonomic issues, population genetics and environmental adaptability [22]. Although chloroplast genome sequences of Abelmoschus esculentus has been deposited in Gen-Bank (NC_035234.1) [23], there are no systematic, comprehensive and comparative studies of chloroplast genome in Abelmoschus.
In this study, three chloroplast genomes of A. moschatus, A. manihot and A. sagittifolius were sequenced and compared with the chloroplast genomes of A. esculentus (NC_035234.1) and related species in Malvaceae. Apart from gene content and structure organization, comparative studies were conducted to identify mutational hotspots in Abelmoschus, and a phylogenetic tree of 33 species in family Malvaceae were constructed. These results will be useful in developing molecular markers for resolving taxonomic issues of Abelmoschus, and elucidating the evolutionary and phylogenetic relationships in the family of Malvaceae.

Plant material, DNA isolation and sequencing
The fresh leaves of A. moschatus, A. manihot and A. sagittifolius were collected from the experimental field of Guangdong academy of agricultural sciences (Guangzhou, China). All samples were frozen in liquid nitrogen immediately and stored at −80˚C. Total DNA was extracted by Plant DNA Isolation Kit (Tiangen, Beijing, China). Paired-end (PE) library was constructed according to protocol of Illumina manual (San Diego, CA, USA), and then it was run on an Illumina NovaSeq platform (Genepioneer Biotechnologies, Nanjing, China) with PE150 sequencing strategy and 350 bp insert size.

Chloroplast genome assembly and annotation
Raw reads of three Abelmoschus species were filtered using the software NGSQCToolkit V2.3.3. In order to reduce the complexity of sequence assembly, filtered reads were compared with the chloroplast genome database built by Genepioneer Biotechnologies (Nanjing, China) using Bowtie2 V2.2.4, and sequences on the alignment was used as the chloroplast genome sequence of samples [24]. Seed sequence was obtained by software SPAdes v3.10.1, and contigs was acquired by kmer iterative extend seed. Then, the contigs were connected as scaffolds by SSPACE v 2.0, and gaps were filled using Gapfiller v2.1.1 until the complete chloroplast genome sequence was recovered. Finally, quality control was adopted to ensure the accuracy of assembly results with the reference genome of A. esculentus (NC_035234.1).
The coding sequences and ribosomal RNA (rRNA) were obtained using software BLAST V 2.2.25 and HMMER V3.1 b2 after compared with the chloroplast genome database in National Center of Biotechnology Information (NCBI). Aragorn V1.2.38 software was used for transfer RNA (tRNA) prediction, then tRNA annotation information of chloroplast genome was obtained. Chloroplast genome maps were made by OrganellarGenomeDRAW (OGDRAW).

Relative Synonymous Codon Usage (RSCU) and RNA editing sites
RSCU analysis of A. moschatus, A. manihot, A. sagittifolius and A. esculentus (NC_035234.1) was determined using MEGA v7.0, and value of RSCU greater than one was considered to be a higher codon frequency. The putative RNA editing sites were analyzed by PREP-cp (http:// prep.unl.edu/) with default parameters [25].

Simple Sequence Repeats (SSRs) and repeat sequences
The comparison of SSRs within four Abelmoschus species were identified using MISA (MIcroSAtellite identification tool) v1.0 with 8 for mononucleotide repeats, 5 for di-and 3 each for tri-, tetra-, penta-and hexanucleotide repeats. Software vmatch v2.3.0 was used to identify forward (F), reverse (R), palindromic (P), and complementary (C) repeats with minimum repeats size �30 bp and sequence similarity of 90%.
With the reference genome of A. moschatus, different types of single nucleotide polymorphisms (SNPs) and Indels were determined in Abelmoschus using MAFFT program.
The contraction and expansion of the IR boundaries among the above seven species in Malvoideae were visualized between the four regions of the chloroplast genome (LSC/IRb/SSC/ IRa) by Geneious R8.1. Meanwhile, the analysis of chloroplast sequence homology and collinearity was performed by Mauve software.

Phylogenetic analysis
Phylogenetic analysis was performed using the chloroplast genomes of A. moschatus, A. manihot, A. sagittifolius and A. esculentus, along with related 29 species within the same family of Malvaceae. Their accession numbers were listed in S1 Table. All chloroplast genome sequences were aligned through MAFFT V7.427, and Indels were removed by TrimAl (V1.4.rev15), then phylogenetic tree was constructed under maximum composite likelihood method (GTRGAMMA model and bootstrap = 1000) using RAxML v8.2.10.

Characterization of chloroplast genomes in Abelmoschus species
Illumina Novaseq 6000 produced a total of 25,192,038, 19,864,607 and 21,300,029 paired-end (150bp) clean reads for A. moschatus, A. manihot and A. sagittifolius, with average organelle coverage 4470, 1888 and 3194, respectively. Chloroplast genome size was~163 kb in Abelmoschus species, including a pair of IR regions separated by a LSC region and a SSC region (Fig 1 and Table 1). The GC content of Abelmoschus chloroplast genomes was~36%, and the LSC, SSC and IR regions had similar content in four species, with~34%,~31% and~41%, respectively.

Amino acid frequency, codon usage and RNA editing sites
Four Abelmoschus species showed similarity in amino acids frequency and codon usage. Protein-coding genes comprised 26713, 26705, 26714 and 26717 codons in A. moschatus, A. manihot, A. sagittifolius and A. esculentus, respectively (S2 Table and S1 Fig). Among those amino acids, Leucine was the most encoded amino acid followed by Isoleucine and Serine, while the Cysteine was the least abundant in chloroplast genomes. The use of the codons ATG and TGG, encoding Methionine and Tryptophan, exhibited no bias (RSCU = 1.00) in Abelmoschus. The findings also revealed that most of the amino acids preferred synonymous codons (RSCU >1.00) having A/T at 3 0 end, except ATA and CTA encoding for Isoleucine and Leucine, respectively.
Putative RNA editing sites were also determined in four Abelmoschus species. PREP predicted 55 putative RNA editing sites in 24 genes of A. moschatus and A. sagittifolius, 56 putative RNA editing sites in 24 genes of A. manihot, and 62 putative RNA editing sites in 24 genes

PLOS ONE
Complete chloroplast genomes of Abelmoschus moschatus, A. manihot and A. sagittifolius of A. esculentus (S3 Table). Similar RNA editing sites were found in most genes, however, gene ycf3 was unique to A. esculentus and gene clpP was unique to A. moschatus, A. manihot and A. sagittifolius. The highest number of editing sites were determined in ndhB (12), ndhD (7), Subunits of RNA polymerase (4) rpoA, rpoB, rpoC1 � , rpoC2

SSRs and repeat sequences
SSRs were detected by MISA software in Abelmoschus (Fig 2A and 2B) Table). About 67% SSRs repeats were found in LSC, 13% in SSC, and 19% in IR. The IGS regions contained the most SSRs, and comprised approximately 58% of the total SSRs. Four categories of repeat sequences were also found in Abelmoschus, and there were 486 repeats were present in the chloroplast genomes of four species, 122 in A. moschatus, 137 in A. manihot, 136 in A. sagittifolius and 91 in A. esculentus (Fig 2C-2F). Types of repeats (P, F and R) had similar numbers in each species, but the number of type C is relatively small. The size of repeats was mainly 30-54 bp in four Abelmoschus, and all contained one repeats above 55

PLOS ONE
Complete chloroplast genomes of Abelmoschus moschatus, A. manihot and A. sagittifolius esculentus, respectively. We also found some shared sequences in LSC/SSC (all 1), SSC/IR (all 2), and LSC/IR (2)(3)(4) in four species. The complete details of repeat sequences in four Abelmoschus species were also listed in S5 Table.

SNPs and Indels in Abelmoschus
Diverse types of SNPs were determined in four Abelmoschus species using A. moschatus as reference. A. manihot, A. sagittifolius and A. esculentus showed 166, 79 and 262 SNPs in complete chloroplast genome, respectively. SNPs of A/T, T/A and C/T were the largest number type among the 12 substitutions in Abelmoschus (Fig 3A and 3B), and most SNPs were located in LSC regions followed by SSC regions. The ratio of transition to transversion was 0.37 for A. manihot, 0.55 for A. sagittifolius and 0.51 for A. esculentus. Furthermore, Indels were also detected in different regions of chloroplast genomes. A total of 120, 84 and 177 Indels were found in A. manihot, A. sagittifolius and A. esculentus, and most of them existed in the LSC regions, but IR regions had the longest Indel average length in A. manihot and A. sagittifolius (Fig 3C and 3E).

Mutational hotspots in Abelmoschus
Comparative analysis was conducted to identify mutational hotspots of protein-coding genes, IGS and intron regions of chloroplast genome among four Abelmoschus species. The IGS regions had more polymorphic (average π = 0.00432) compared to protein-coding regions (average π = 0.00285) and intronic regions (average π = 0.00269). The nucleotide diversity was ranged from 0.00013 (rps12-exon2-ndhF) to 0.02113 (clpP-exon3) in all the polymorphic containing regions (Fig 4). A total of thirty highly diverse regions (region length �200 bp) were  Table 4. Most of these mutational hotspots belong to IGS regions, such as start-psbA, atpB-rbcL and petD-exon2-rpoA. Higher nucleotide diversity was also observed for protein coding genes, including 1 st and 2 nd exon of clpP, 1 st intron of clpP, 2 nd intron of ycf3, matK, ndhF and 1 st exon of atpF. A. moschatus, A. manihot, A. sagittifolius and A.

IR boundary and collinearity
The IR regions were compared among four Abelmoschus species and three closely related species in family Malvaceae (Fig 5).  aligned by Mauve software, and no rearrangement occurred in gene organization (Fig 6), but the gene layouts within SSC regions of A. officinalis and G. hirsutum were in the opposite orientations compared with H. rosa-sinensis and four Abelmoschus species.

Ka/Ks substitution rate
In this study, we analyzed Ka/Ks rate of A. moschatus compared with to three species in the same genus and three closely related species in family Malvaceae (S6 Table). Eighty-five protein-coding genes were analyzed and thirty-eight of them had an average Ka/Ks rate between 0 to 0.1 in seven species, which indicated these genes were under strong purifying selection pressure in family Malvaceae. In contrast, three genes showed Ka/

Phylogenetic analysis
Maximum likelihood phylogenetic tree of 33 species in family Malvaceae were constructed based on complete chloroplast genomes after removing the Indels. Phylogenetic analysis indicated that A. moschatus sister to A. sagittifolius, four Abelmoschus species shared a common node with H. taiwanensis and H. mutabilis, and then they came together with other Hibiscus species to form a large group. The species of six different subfamilies were well distinguished with bootstrap values about 100. However, Sterculioideae subfamily was divided into two groups because Heritiera elata did not share a same node with other species in the same genus (Fig 7).

Genome characteristics of Abelmoschus species and comparison with other species in Malvaceae
Most species in Abelmoschus were economically important plants, but the chloroplast genomes remained relatively limited, with only A. esculentus was sequenced [23]. In this study, three chloroplast genomes of A. moschatus, A. manihot and A. sagittifolius were sequenced and compares with A. esculentus. The sizes of chloroplast genomes ranged narrowly from 163121 to 163453 in four Abelmoschus species, and comparative analyses revealed highly conserved structure and gene. Most angiosperms typically contained 74 to 79 protein-coding genes in chloroplast genomes [27]. In this study, four Abelmoschus species all encoded 78 unique protein-coding genes, and was different with previously reported species of Hibiscus cannabinus and three Firmiana speies in Malvaceae, which contain 79 protein-coding genes [22,28]. Rabah et al. [23] reported A. esculentus had 29 unique tRNA genes, but the gene trnH-GUG, located at LSC/IRa boundary, was not annotated, so we reannotated this gene for this species. Within the same subfamily, previous studies reported 17, 19 and 18 intron-containing genes in H. cannabinus, H. rosa-sinensis and 12 species of Gossypium, respectively [2,28,29], while Abelmoschus harbored 18 intron-containing genes, thirteen out of them had intron length differences among 4 species, and gene trnK-UUU had the longest intron with 2563-2576 bp. The chloroplast genomes had well collinearity relationship among four Abelmoschus species and three closely related species in Malvaceae, but some differences were detected in terms of the direction of SSC, gene miss and IR expansion and contraction. Gene layouts within SSC region had the same orientations between four Abelmoschus species and H. rosa-sinensis, but A. officinalis and G. hirsutum had the opposite orientations compared with them, and similar

PLOS ONE
Complete chloroplast genomes of Abelmoschus moschatus, A. manihot and A. sagittifolius phenomenon with different inversions in the LSC region was also found in Chenopodium quinoa and Mangifera indica [23]. The infA gene as a translation initiation factor has been independently lost many times during the evolution of land plants [27,30], and it also missed in Abelmoschus, but infA showed functional or non-functional in different Malvaceae species, such as H. rosa-sinensis [2].
The border of IR was highly variable region with many nucleotide changes in chloroplast genomes of closely related species. Among four Abelmoschus species, the genes of rpl16, ndhF and ycf1 showed different fragment sizes in the IR boundaries, and the IRb/SSC border was crossed by the ndhF except A. esculentus, in which ndhF had larger gene size and all located in SSC region, this indicated that the relationships among A.moschatus, A. manihot and A. sagittifolius were closer than A. esculentus. Moreover, Abelmoschus species showed relatively independent boundary traits compared with the other Malvaceae species. Gene rpl16 was located at the junction of IRb/LSC in Abelmoschus, whereas A. officinalis and G. hirsutum presented rps19 gene crossing the boundary or locating in LSC region, and ten species in different genus of Malvaceae also showed rps19 gene in IRb/LSC [2]. In the IRb region, rps3 was the closest gene to the IRb/LSC boundary in Abelmoschus, but this gene was replaced by the rpl2 in other Malvaceae species. Durio zibethinus was a Malvaceae species with another boundary characteristic, rpl23 (in LSC) and trnI-CAU (in IRb) were the closest genes to the IRb/LSC boundary, and rpl23 and rpl2 had only one copy due to IR expansion and contraction [31]. These results seem to be line with phylogenetic analysis, which indicated that species with more similar boundary traits had closer phylogenetic relationship in Malvaceae.

SSRs and repeat sequences in Abelmoschus
Owing to the advantages of non-recombination, haploidy, uniparental inheritance and low nucleotide substitution rate, chloroplast SSRs markers can be considered as an excellent tool in population genetics and phylogeny analysis [32]. In the current study, mononucleotide SSR in four Abelmoschus species varied in size from 8 to 18 nucleotides, which was different from related species in Hibiscus (7 to 15 nucleotides) and Firmiana (7 to 22 nucleotides) [2,22]. Both of A. sagittifolius and A. esculentus had six types SSRs, but A. moschatus and A. manihot did not contain hexanucleotides. Most SSRs were distributed in LSC region and intergenic region, and the identified SSRs in Abelmoschus revealed that A/T and AT/TA were the most abundant in mononucleotide and dinucleotide SSRs respectively, which agreed with the majority of plant family [24]. Moreover, repeat sequences was lower in A. esculentus compared

PLOS ONE
Complete chloroplast genomes of Abelmoschus moschatus, A. manihot and A. sagittifolius with A. moschatus, A. manihot and A. sagittifolius, but they shared similar distribution regions. Abundant repeats were found in the intergenic spacer regions (IGS), followed by intronic region and coding sequences, and the same distribution pattern of repeat sequences were also reported in Hibiscus [2]. These repeat sequences were also crucial in chloroplast genome arrangement and sequence variation of Abelmoschus.

Taxonomic discrepancies and hotspots in Abelmoschus
Previous studies reported that seed shape and trichome structure had major taxonomic importance and proved to be valuable characters for separating taxa of Abelmoschus [5]. SSR markers (mainly in A. esculentus) were also developed from transcriptome data and genomic DNA to investigate genetic relatedness and cross-species transferability [7,19]. Pfeil et al. analyzed the phylogeny of Hibiscus and the Tribe Hibisceae using chloroplast DNA sequences of ndhF and rp116 intron, and found two tested Abelmoschus species were embedded within Hibiscus [33]. Werner et al. [8] used nuclear internal transcribed spacer (ITS) and chloroplast rpl16 sequences to construct phylogenetic relationships within Abelmoschus, and its relationship with the genus Hibiscus and other related species in Malvaceae, but A. esculentus and A. caillei cannot be distinguished from each other, and genetic diversity within A. esculentus and A. caillei was low. In this study, we listed thirty highly mutational hotspots (�200 bp) after comparing nucleotide diversity of protein-coding genes, IGS, and intron regions among Abelmoschus, and these hotspots could be used to solve taxonomic discrepancies for genus Abelmoschus. Most mutational hotspots belong to IGS regions, and some hotspots in protein-coding genes had also been commonly used for barcoding markers in related genera, such as matK, rbcL and ndhF [2,33]. The nucleotide diversity of rpl16 intron was 0.0029, while the thirty hotspots identified in Abelmoschus had nucleotide diversity from 0.0024 to 0.0142, and 23 regions had higher polymorphic than previously reported sequence of rpl16. Interestingly, three exons and one intron of clpP gene all showed high nucleotide diversity, especially the third exon (π = 0.02113, region length = 71bp), and polymorphic region of this gene had been proved to be effective in evaluating the crop types and biogeographical origin of Cannabis sativa [34]. Therefore, all these mutational hotspots provided useful information for subsequent development of chloroplast markers, evolutionary relationships and biogeographical origin.

Phylogenetic relationship in Malvaceae
Phylogenetic tree of 33 species in family Malvaceae were reconstructed using chloroplast genomes without Indels in this study, and they were well divided into six subfamilies, except Heritiera elata which did not share a node with other species in the same genus. Few previous studies referred to the taxonomic position of A. sagittifolius, our phylogenetic tree suggested that A. sagittifolius was closer to A. moschatus than to A. esculentus, which was consistent with the morphological characteristics of pod and flower [12]. Furthermore, genus Abelmoschus was previously included in the genus Hibiscus and later isolated from it [3]. Four Abelmoschus species shared a common node with H. taiwanensis and H. mutabilis, and then they formed a large group with other Hibiscus species located in different branches. These results indicated that Abelmoschus was a well-supported clade within Hibiscus, and agreed with the viewpoint of Werner et al. [8]. Thus, taxonomic treatment of Abelmoschus is an issue that required further discussion. Abelmoschus could be merged with Hibiscus to form a broad genus Hibiscus, or it maintains the taxonomic position of Abelmoschus, but some Hibiscus species need to change their taxonomic position. As more complete chloroplast genomes are sequenced, the chloroplast genome data could be expected to help resolve the deeper branches of phylogeny and complex evolutionary histories in Malvaceae [35].

Conclusions
Three chloroplast genomes of A. moschatus, A. manihot and A. sagittifolius were sequenced and annotated in the present study, and compared with the chloroplast genomes of A. esculentus and related species in Malvaceae. The results revealed the gene number and order, amino acid frequency, and codon usage were similar in Abelmoschus. However, the differences were also found in IR boundaries, intron-containing genes and the number of repeat sequences and SNPs. Abelmoschus species also showed relatively independent IR boundary traits compared with related species in Malvaceae, and identified thirty mutational hotpots might be useful for developing molecular markers and resolving taxonomic discrepancies and biogeographical origin both at genus Abelmoschus and family Malvaceae levels.
Supporting information S1 Fig. Amino acids frequency in A. moschatus, A. manihot, A. sagittifolius and A.