Abstract
Studies on the molecular characteristics of chloroplast genomes are generally important for clarifying the evolutionary processes of plant species. The base composition, the effective number of codons, the relative synonymous codon usage, the codon bias index, and their correlation coefficients were investigated for a total of 41 genes in 21 chloroplast genomes of the genus Arachis, followed by correspondence and clustering analyses, revealing significantly higher variations in genomes of wild species than those of the cultivated taxa. The codon usage patterns of all 41 genes in the genus Arachis were AT-rich, suggesting that the natural selection was the main factor affecting the evolutionary history of these genomes. Five genes (i.e., ndhC, petD, atpF, rpl14, and rps11) and five genes (i.e., atpE, psbD, psaB, ycf2, and rps12) showed higher and lower base usage divergences, respectively. This study provided novel insights into our understanding of the molecular evolution of chloroplast genomes in the genus Arachis.
Citation: Yang S, Li G, Li H (2023) Molecular characterizations of genes in chloroplast genomes of the genus Arachis L. (Fabaceae) based on the codon usage divergence. PLoS ONE 18(3): e0281843. https://doi.org/10.1371/journal.pone.0281843
Editor: Branislav T. Šiler, Institute for Biological Research, University of Belgrade, SERBIA
Received: August 18, 2022; Accepted: February 1, 2023; Published: March 14, 2023
Copyright: © 2023 Yang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript and its Supporting Information files.
Funding: This study was funded by the Science and Technology Department of Shaanxi Province (grant 2023-YBGY-261) and Xi'an Peihua University (grant PHKT2250).
Competing interests: The authors have declared that no competing interests exist.
Introduction
As one of the most important economic crops, peanut (Arachis hypogaea L.) is an annual crop in the legume family (Fabaceae), cultivated for edible oil and food in more than 100 countries worldwide. To date, a large number of germplasm resources of the genus Arachis are maintained in China, India, and the United States. It is well known that genetic diversity declines in proportion to the severity of the genetic bottleneck. Due to the significant genetic bottleneck in the cultivated taxa of peanuts, these plants generally show a narrow genetic base [1]. Although the cultivated taxa of peanuts are classified via morphological characters (i.e., the presence or absence of axles on the main stem), genetic variation is generally considered the fundamental element of species diversity. It is therefore necessary to study the genetic divergence of the genus Arachis, given its agricultural significance [2,3]. Advances in novel genomic tools are helpful for illuminating the evolution of cultivated and wild taxa of the genus Arachis [4], while the genomic information is important for studying the molecular characteristics of peanuts based on selected genes.
The genetic codons are closely linked to nucleic acids and proteins. Therefore, the codon usage patterns of a selected gene are important for exploring its molecular functions [5], i.e., predicting its degree of inheritance as well as its adaptiveness during evolution. Studies have investigated the significance of the molecular composition and the codon usage pattern at the genomic and genetic levels based on chloroplast genomes for exploring the genetic diversity within plants [6,7]. For example, studies have revealed that the codon usage patterns in chloroplast genomes reflect the degrees of genetic variations under the evolutionary pressure [8,9]. The chloroplast genomes contain molecular characteristics important for both clarification of the evolutionary history and improvement of crop plants. Therefore, comparative analyses based on codon usage patterns of chloroplast genomes have been widely used to evaluate the genetic correlation among groups of plants [10,11]. Further, the relationships among the compositions, such as the relationship between GC12 and GC3 of chloroplast genomes, could also be used to distinguish the subgenera of a plant [12].
Peanuts provide a large portion of nutrients for human populations in China, India, and many countries in sub-Saharan Africa [13]. Peanut is an important source of high-quality cooking oil and is also appreciated worldwide as a type of affordable and flavorful food [14]. For a long time, peanuts have been used either as a whole or as an ingredient in food to provide the highest protein contents among many commonly consumed snack nuts, and have served as a rich source of heart-healthy, monounsaturated lipids [15]. Studies have shown that molecular characteristics of plant genomes are generally influenced by many human or natural factors [16]. Furthermore, plant biodiversity is important for the ecological investigations and is determined by many factors, such as the geographical locations [17,18], genome and gene structures [19,20], and the temporal factors [21]. A large number of studies have provided a solid foundation for understanding not only the chemical compositions of peanuts, but also the breeding methods to improve the quality of peanuts. For example, the plant chloroplast genomes are important for photosynthesis and have often been used as molecular systems to investigate gene expression [22]. Furthermore, the codon usage patterns in plant genomes have been used as the evolutionary characteristics to perform phylogenetic analysis. For example, studies have explored the genomic evolution and phylogenetic development in plant chloroplasts based on the variations in their compositions and the maximum likelihood of sequences [23–26]. Moreover, the molecular and genetic analyses of the chloroplast genomes have provided solid experimental evidence to facilitate the improvements in crop plants [27,28].
The chloroplast genome is composed of a single circular double stranded DNA molecule, capable of independent replication and transcription. Compared with the nuclear genome, the chloroplast genome shows unique characteristics in its base ratio, nucleotide sequence, and gene structure. Many factors, such as the environment and the cultivation by humans, affect the evolutionary characteristics of the chloroplast genome. Due to their relatively small sizes, it is generally cost effective and convenient to obtain and analyze the chloroplast genomes compared to the nuclear genomes [29]. However, studies on the diversity degree of both genes and the overall molecular characterizations of chloroplast genomes in the genus Arachis are sparse [30]. Despite the well-established taxonomic groupings of the genus Arachis, the evolutionary characteristics and genetic diversity of chloroplast genomes in the genus Arachis are not clear due to their relatively conserved genomes and lack of appropriate data. For example, the codon usage bias and divergences of genes in chloroplast genomes of the genus Arachis would be imperative data for studying their variations of molecular adaptation as well as their molecular diversities.
As the main food production systems, crops play an important role in nutritional security. With the rapidly increasing population worldwide, there is an urgent need to increase the crop production in order to ensure the food security in the near future [31]. Molecular studies enhanced by the next generation DNA sequencing technology allow the extensive exploration of the structural and functional features of plant genomes, which are expected to show significant impact on not only the fundamental biological studies of plants but also the genetic improvement of crops [32]. For example, comprehensive studies on the codon usage patterns of a plant could reveal the key factors in codon choice and its molecular evolution [33]. It is well known that the genes in chloroplast genomes are generally highly conserved, with some of them commonly used as the molecular markers for taxonomic identification [34]. In the present study, the codon usage divergences and their potential taxonomic applications were explicitly investigated based on a total of 21 chloroplast genomes of the genus Arachis available at the National Center for Biotechnology Information (NCBI, https://www.ncbi.nlm.nih.gov/) database. The codon usage patterns of these genomes were explored based on several molecular parameters, including the basic composition, the effective number of codons (ENCs), the codon bias index (CBI), and the relative synonymous codon usage (RSCU) of a total of 41 genes shared in these chloroplast genomes. The codon usage variations among these genes were further assessed to evaluate the factors affecting the evolutionary history of these chloroplast genomes. Genes with varied codon usage divergence and base usage divergence were identified. This study provided novel evidence to support the further investigations of the molecular evolution of chloroplast genomes of Arachis species.
Materials & methods
Selection of chloroplast genomes and genes
A total of 21 completely sequenced chloroplast genomes of 14 species of the genus Arachis, including A. batizocoi, A. cardenasii, A. correntina, A. diogoi, A. duranensis, A. helodes, A. hoehnei, A. ipaensis, A. paraguariensis, A. pintoi, A. stenosperma, A. villosa, A. hypogaea (with 8 accessions), and A. monticola, were retrieved from the NCBI database. A total of 41 genes shared in all 21 chloroplast genomes (i.e., accD, atpA, atpB, atpE, atpF, atpI, ccsA, cemA, clpP, matK, ndhA, ndhC, ndhE, ndhF, ndhG, ndhH, ndhJ, petA, petD, psaA, psaB, psbA, psbB, psbD, rbcL, rpl14, rpl20, rpoA, rpoB, rpoC1, rps2, rps3, rps4, rps7, rps8, rps11, rps12, rps14, ycf2, ycf3, and ycf4) were selected for further analyses. To facilitate the analyses performed in this study, the selection of these genes was based on the following criteria: (1) the gene sequences were longer than 300 bp, (2) the starting codon of these genes was ATG, and (3) the total number of the bases was divisible by 3. Further, to standardize the gene set across genomes, genes for which fewer than 21 sequences (or fewer than 42 sequences for double-copy genes) were available were excluded.
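The three sequence-level criteria above can be expressed as a simple filter. The Python sketch below is illustrative only: the function name `passes_selection` and the toy sequences are hypothetical, and it does not reproduce the scripts used in the study.

```python
def passes_selection(cds: str, min_length: int = 300) -> bool:
    """Return True if a coding sequence meets the three stated criteria."""
    seq = cds.upper()
    return (
        len(seq) > min_length        # (1) longer than 300 bp
        and seq.startswith("ATG")    # (2) start codon is ATG
        and len(seq) % 3 == 0        # (3) length divisible by 3
    )


# Hypothetical toy sequences, not real Arachis genes.
long_enough = "ATG" + "GCT" * 100   # 303 bp, starts with ATG
too_short = "ATG" + "GCT" * 10      # 33 bp
print(passes_selection(long_enough), passes_selection(too_short))
```

The completeness check (21 or 42 sequences per gene) would be applied afterwards, once the sequences of each gene have been collected across all genomes.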
Statistics of the molecular characteristics in gene sequences
In order to investigate the codon usage patterns, the basic molecular components of the gene sequences, including the occurrences of the adenine (A), the thymine (T), the cytosine (C), and the guanine (G), the contents of the third bases, the GC12, the GC3, as well as the overall GC contents and the number of codons were calculated based on the Matlab 2010b platform using the in-house scripts.
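As a rough illustration of how these composition statistics might be computed (the in-house Matlab scripts are not reproduced here), the Python sketch below derives the overall GC, GC12, and GC3 fractions and the codon count of one coding sequence. The function name `gc_profile` is hypothetical, and GC3 is taken over all codons rather than only over synonymous codons (GC3s), which is a simplifying assumption.

```python
def gc_profile(cds: str) -> dict:
    """Overall GC, GC12 and GC3 fractions plus the codon count of one CDS."""
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3          # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]

    def gc_fraction(bases) -> float:
        return sum(b in "GC" for b in bases) / len(bases)

    first_and_second = [c[0] for c in codons] + [c[1] for c in codons]
    third = [c[2] for c in codons]
    return {
        "GC": gc_fraction("".join(codons)),
        "GC12": gc_fraction(first_and_second),
        "GC3": gc_fraction(third),
        "codons": len(codons),
    }


# Hypothetical toy sequence: ATG GCC AAA TAA
print(gc_profile("ATGGCCAAATAA"))
```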
Calculation of the codon usage pattern
The effective number of codons (ENC) of each of the 41 genes in the genus Arachis was used to quantify its degree of codon usage bias. A lower ENC value indicated that the codon usage within the gene was more biased, with 35 generally regarded as the bias threshold of the codon usage pattern [35]. Based on the codon counts of the gene sequences, the ENC value of each gene was calculated by using the following Eq (1) [36,37]:
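$$\mathrm{ENC} = 2 + \frac{9}{\bar{F}_2} + \frac{1}{\bar{F}_3} + \frac{5}{\bar{F}_4} + \frac{3}{\bar{F}_6} \tag{1}$$
where $\bar{F}_k$ (k = 2, 3, 4, and 6) was the mean of the $f_k$ values. The $f_k$ value was calculated by the following Formula (2) for the k-fold degenerate amino acids:
$$f_k = \frac{n \sum_{i=1}^{k} (n_i/n)^2 - 1}{n - 1} \tag{2}$$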
where n was the total number of occurrences of the codons for that amino acid, and $n_i$ was the total number of occurrences of the i-th codon for that amino acid. The relationship between the ENC values and the GC3s ratios was generally used to evaluate the homogeneous property of codon usage in the genes. The observed ENC values were compared with the expected values calculated by $\mathrm{ENC}_{\mathrm{expected}} = 2 + s + 29/[s^2 + (1 - s)^2]$ [38], with the parameter s representing the given composition of GC3s, to evaluate the evolutionary pressure in the genes.
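The ENC calculation and the expected-ENC curve can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in the study: it assumes the standard genetic code, skips amino acids observed fewer than two times, drops non-positive homozygosity values to avoid division by zero, and caps the result at 61. The names `enc` and `enc_expected` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (codon order T, C, A, G at each position).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Number of amino acids in each degeneracy class: the numerators 9, 1, 5, 3.
CLASS_SIZES = {2: 9, 3: 1, 4: 5, 6: 3}


def enc(cds: str) -> float:
    """Effective number of codons of one coding sequence (simplified)."""
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))

    # Collect the counts of the synonymous codons of every amino acid.
    per_amino_acid = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            per_amino_acid[aa].append(counts[codon])

    # Homozygosity f_k of each amino acid, grouped by degeneracy k.
    f_values = defaultdict(list)
    for codon_counts in per_amino_acid.values():
        k, n = len(codon_counts), sum(codon_counts)
        if k == 1 or n < 2:
            continue  # Met/Trp, or too few observations
        f = (n * sum((c / n) ** 2 for c in codon_counts) - 1) / (n - 1)
        if f > 0:  # assumption: drop non-positive values
            f_values[k].append(f)

    total = 2.0
    for k, size in CLASS_SIZES.items():
        if not f_values[k]:
            raise ValueError(f"no usable {k}-fold degenerate amino acid")
        total += size / (sum(f_values[k]) / len(f_values[k]))
    return min(total, 61.0)  # assumption: cap at 61 sense codons


def enc_expected(s: float) -> float:
    """Expected ENC for a GC3s fraction s, using the expression in the text."""
    return 2 + s + 29 / (s ** 2 + (1 - s) ** 2)
```

Under these assumptions, a sequence that uses a single codon for every amino acid gives the minimum value of 20, and the expected curve peaks at s = 0.5.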
The bias of base content within each gene of the chloroplast genomes of the genus Arachis was evaluated by the PR2 plot, with the AT-bias [A/(A + T)] plotted as the Y-axis and the GC-bias [G/(G + C)] as the X-axis [39]. The neutrality plot with the scatter diagram based on GC12 against GC3 of the gene sequences was used to identify the factors, i.e., the mutation pressure and the natural selection during the evolutionary history, influencing the evolutionary pressure in the genes. The relative synonymous codon usage (RSCU) values of genes of the genus Arachis in chloroplast genomes were calculated by the following Formula (3) [40]:
$$\mathrm{RSCU}_{ij} = \frac{g_{ij}}{\frac{1}{n_i}\sum_{j=1}^{n_i} g_{ij}} \tag{3}$$
where the parameter $g_{ij}$ denoted the observed number of the j-th codon for the i-th amino acid, and $n_i$ represented the number of types of synonymous codons for that amino acid. The RSCU values of genomes have been generally used for evaluating the bias of the synonymous codons [41]. The ideal RSCU value of a codon is equal to 1 if only mutation affects the codon usage pattern [42]. A higher RSCU value indicates that the corresponding codon is used more frequently in the gene. A codon is defined as more-abundant if its RSCU value is larger than 1.0, suggesting that the codon is favored over the other codons, whereas a codon is considered less-abundant if its RSCU value is less than 1.0 [43,44].
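The RSCU definition can be illustrated with a short Python sketch: each codon count is divided by the mean count of the synonymous codons of its amino acid. The function name `rscu` and the toy sequence are hypothetical, the standard genetic code is assumed, and amino acids absent from the sequence are left out of the result.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (codon order T, C, A, G at each position).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds: str) -> dict:
    """RSCU value of every codon whose amino acid occurs in the sequence."""
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))

    # Group codons into synonymous families (stop codons form one family).
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from this sequence
        for c in codons:
            # observed count divided by the mean count of the family
            values[c] = counts[c] * len(codons) / total
    return values


# Hypothetical toy sequence with four alanine codons: GCT x3, GCC x1.
print(rscu("GCTGCTGCTGCC"))
```

In this toy case GCT is used three times as often as expected under equal usage, GCC exactly as often, and GCA and GCG not at all.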
As commonly used to describe the foreign gene expression in the host, the codon bias index (CBI) of the genes in the chloroplast genomes of the genus Arachis was calculated by the following Formula (4) [45]:
$$\mathrm{CBI} = \frac{N_{opt} - N_{ran}}{N_{tot} - N_{ran}} \tag{4}$$
where $N_{opt}$ represented the total number of occurrences of the superior (optimal) codons in the gene, $N_{ran}$ represented the expected number of occurrences of the superior codons if all the synonymous codons were used randomly, and $N_{tot}$ represented the total number of occurrences of the amino acids corresponding to the superior codons in the gene.
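A minimal Python sketch of the CBI calculation is given below. The set of superior codons must be supplied by the user; the codon families, the superior codons, and the counts in the example are hypothetical and chosen only to show the two boundary cases (exclusive use of superior codons and uniform use).

```python
def cbi(codon_counts: dict, families: dict, optimal: set) -> float:
    """Codon bias index from codon counts and a set of superior codons.

    codon_counts: codon -> number of occurrences in the gene
    families:     amino acid -> list of its synonymous codons
    optimal:      codons treated as superior (an input, not derived here)
    """
    n_opt = n_ran = n_tot = 0.0
    for codons in families.values():
        preferred = [c for c in codons if c in optimal]
        if not preferred:
            continue  # amino acids without a superior codon are ignored
        aa_total = sum(codon_counts.get(c, 0) for c in codons)
        n_opt += sum(codon_counts.get(c, 0) for c in preferred)
        # expected superior-codon count under random synonymous usage
        n_ran += aa_total * len(preferred) / len(codons)
        n_tot += aa_total
    # Note: undefined (division by zero) if no relevant codon is observed.
    return (n_opt - n_ran) / (n_tot - n_ran)


# Hypothetical example with two amino acids.
families = {"Lys": ["AAA", "AAG"], "Ala": ["GCT", "GCC", "GCA", "GCG"]}
optimal = {"AAA", "GCT"}
only_optimal = {"AAA": 10, "GCT": 8}
uniform = {"AAA": 5, "AAG": 5, "GCT": 2, "GCC": 2, "GCA": 2, "GCG": 2}
print(cbi(only_optimal, families, optimal), cbi(uniform, families, optimal))
```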
Codon usage divergence analysis
The protein length vs. GC ratio of each gene sequence was calculated to explore the influence of the sequence length on the GC ratio. Similarly, the influence of the ENC on the CBI in all gene sequences was assessed to study the codon usage pattern of all 41 genes. The correspondence analysis and the clustering analyses were further performed to investigate the evolutionary distances among the 21 chloroplast genomes of the genus Arachis based on their RSCU values. The divergences of codon usage in all 41 gene sequences were calculated by summing the standard deviations of all their codon usage parameters.
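The divergence measure described above (a sum of standard deviations over the parameters of one gene) can be sketched as follows. The text does not state whether the sample or the population standard deviation was used; the sample form (`statistics.stdev`) is assumed here, and the function name and parameter values are hypothetical.

```python
from statistics import stdev


def divergence(rows: list) -> float:
    """Sum of per-parameter standard deviations across genomes.

    rows: one dict per genome, mapping parameter name -> value for one gene.
    """
    parameters = rows[0].keys()
    return sum(stdev([row[p] for row in rows]) for p in parameters)


# Hypothetical values of one gene in three genomes.
gene_rows = [
    {"GC": 0.30, "GC3": 0.20},
    {"GC": 0.30, "GC3": 0.24},
    {"GC": 0.30, "GC3": 0.22},
]
print(divergence(gene_rows))
```

A gene whose sequence is identical in every genome has identical parameter values in every row and therefore a divergence of zero.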
Results
Base usage of chloroplast genes in the genus Arachis
The contents of GC12, GC3s, and the overall GC of the 924 gene sequences (S1 Table in S1 File), including the 41 genes (3 with two copies) in 21 chloroplast genomes of the genus Arachis, were shown in the area graph (Fig 1A). The results showed that all 3 types of GC contents of these genes were less than 50%, revealing evidently that these gene sequences were AT-biased. The results of the PR2 bias plot with G3/(G3+C3) as X-axis and A3/(A3+T3) as Y-axis for these gene sequences showed no evident bias within the usage of the third bases of the codons (Fig 1B). The results of the neutrality plot based on the relationship between GC12/(GC12+AT12) and GC3/(GC3+AT3) for all gene sequences revealed that the ratios of GC3 ranged largely from 20% to 35%, while the ratios of GC12 mainly ranged from 35% to 50%, with both roughly showing normally distributed patterns (Fig 1C). The compositions for each of the three positions in codons were calculated to explore the overall base usage (Fig 1D). The results showed that the compositions of A and T varied over a larger range than that of the compositions of G and C at both the 3-base level and the third base level.
(A) Distributions of overall GC, GC12, and GC3s contents. (B) The PR2 plot. (C) The neutrality plot. (D) The base ratios of overall 41 genes in 21 chloroplast genomes.
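For reference, the coordinates of one gene in a PR2 plot of the kind shown in panel B can be computed from its third codon positions. The Python sketch below is illustrative only; the function name `pr2_point` and the toy sequence are hypothetical, and it uses the third position of every codon.

```python
def pr2_point(cds: str) -> tuple:
    """Return (G3/(G3+C3), A3/(A3+T3)) for one coding sequence."""
    seq = cds.upper().replace("U", "T")
    third = seq[2::3]  # third base of every complete codon
    a, t, g, c = (third.count(base) for base in "ATGC")
    # Note: undefined (division by zero) if G3+C3 or A3+T3 is zero.
    return g / (g + c), a / (a + t)


# Hypothetical toy sequence: ATG GCC AAA TAA, third bases G, C, A, A.
print(pr2_point("ATGGCCAAATAA"))
```

A point at (0.5, 0.5) would indicate no bias between G and C or between A and T at the third position.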
Codon usage within chloroplast genes of the genus Arachis
As usually used to evaluate the evolutionary pressure of a gene, the ENC values of the 41 genes in 21 chloroplast genomes of the genus Arachis were calculated and presented by the ENC-GC3s plot (Fig 2A). The results showed that most of the data points were below the standard curve, suggesting that the corresponding genes were more biased and under both evolutionary pressures (i.e., natural selection pressure and mutation pressure). The ENC values were also displayed separately for these genes (Fig 2B). The results showed that the standard deviations for different genes were significantly different, revealing that the ENC values for certain genes (i.e., the rps8, ndhG, and atpF) were variable. The ENC values were further normalized to the range of the CBI values, with the fitting results showing evidently the negative correlations between the codon bias and the ENC values, as indicated by the correlation coefficient of -0.26 (Fig 2C).
(A) Effective number of codon (ENC) vs. GC3s plot. The blue curve denotes the expected theoretical ENC values with no evolutionary constraint, the dots under the blue line indicate bias in the genes, and the dots above the line denote little or no bias but homogeneous codon usage in these gene sequences. (B) The ENC values for individual genes. (C) Relationships between the ENC values (normalized to the range of the CBI distributions) and the CBI values of 41 genes in 21 chloroplast genomes of the genus Arachis.
The correlation analysis was performed to identify the relationships among the codon usage parameters, i.e., the ENC, the compositions of G, C, A, and U, and the length of protein encoded by genes, of the 41 genes in 21 chloroplast genomes of the genus Arachis (Fig 3). The results revealed higher GC3s rate (correlation coefficient = 0.480) and higher ENC values (correlation coefficient = 0.321) in the longer genes, while the ENC values showed positive correlation with the GC3s (correlation coefficient = 0.486).
“Ami No” represents the total number of amino acids based on the gene.
To reveal the proportion of codons encoding the identical amino acid in the 41 genes of 21 chloroplast genomes of the genus Arachis, the RSCU values and the codon quantity of these genes, including the three terminal codons (UAA, UAG, and UGA) and the non-degenerate codons (AUG and UGG), were calculated (Fig 4). The results showed that the abundant codons with RSCU values higher than 1.5 included UUA, GUU, UCU, ACU, GCU, UAU, CAU, CAA, AAU, GAU, AGA, and GGA, while the less-abundant codons with RSCU values less than 0.5 included CUC, CUG, AGC, ACG, GCG, UAC, CAC, CAG, AAC, GAC, CGC, AGG, and GGC. It was noted that about two-thirds of the terminal codons were UAA.
The names of amino acids are abbreviated and listed along the Y-axis. “Stop” represents the terminal codons of UAA, UAG, and UGA. The CQ values represent the total codons in the 924 coding sequences.
Codon usage divergences of chloroplast genes of the genus Arachis
It was noted that the codon usage divergences of individual genes could not be evaluated appropriately by the overall codon usage patterns (Figs 1–4). Therefore, the RSCU values for the 41 genes in 21 chloroplast genomes of the genus Arachis were calculated separately (Fig 5). The distributions of RSCU values varied greatly among these genes: when the genes were considered individually, the RSCU values differed among the chloroplast genomes of the genus Arachis, and the codon usage preferences differed markedly among the genes. The results also showed that not all genetic codons were detected for some amino acids. For example, no codons for tryptophan (Trp) were revealed in the genes atpA, atpB, ndhE, rpl1, and rps1 in all 21 chloroplast genomes. Similarly, no corresponding codons were found for histidine (His), asparagine (Asn), and cysteine (Cys) in gene ndhC of all 21 chloroplast genomes.
The blank boxes indicate the absence of corresponding codons for certain amino acids.
The variations of the RSCU values of the 41 genes among the 21 chloroplast genomes of the genus Arachis were evaluated by their Euclidean distances to reveal the differences in the relative codon usage for certain amino acids (Fig 6). The correspondence analysis of these chloroplast genomes was conducted based on their RSCU values with the exclusion of the three terminal codons and the codons for Met and Trp (the inset graph of Fig 6). Both analyses revealed largely congruent groupings among the 21 chloroplast genomes of the genus Arachis, which were resolved into three groups, with the chloroplast genome of A. pintoi recognized in one group, three species (i.e., A. ipaensis, A. helodes, and A. cardenasii) revealed in another group, and the other 17 chloroplast genomes in the third group. All eight accessions of A. hypogaea were revealed in one clade with the inclusion of A. monticola as well. The Euclidean distances among the RSCU values revealed significantly higher variations in genomes of wild species of Arachis than those of the cultivated taxa (i.e., A. hypogaea).
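The pairwise Euclidean distances among RSCU profiles can be sketched in Python as follows. The genome labels and RSCU vectors are hypothetical placeholders, not values from this study, and the sketch covers only the distance step, not the subsequent clustering or correspondence analysis.

```python
from math import sqrt


def euclidean(u: list, v: list) -> float:
    """Euclidean distance between two equally long RSCU vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(profiles: dict) -> dict:
    """Map every ordered pair of genome labels to its Euclidean distance."""
    names = list(profiles)
    return {(p, q): euclidean(profiles[p], profiles[q])
            for p in names for q in names}


# Hypothetical two-codon RSCU profiles for three genomes.
profiles = {
    "genome_1": [1.0, 1.0],
    "genome_2": [1.0, 1.0],
    "genome_3": [4.0, 5.0],
}
distances = distance_matrix(profiles)
print(distances[("genome_1", "genome_2")], distances[("genome_1", "genome_3")])
```

Genomes with identical RSCU profiles have a distance of zero, which is the pattern expected for closely related accessions with conserved chloroplast genes.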
The codon usage divergences of the 41 genes in 21 chloroplast genomes of the genus Arachis were also evaluated by their base usage diversity. In order to further explore the divergences of different genes at the base usage level, the 41 genes in 21 chloroplast genomes of the genus Arachis were considered simultaneously to calculate their base usage (Table 1). The GC3 contents for all genes were lower than 50%, while the GC12 contents in some genes (i.e., psbB, rbcL, and rps11) were higher than 50%. These results showed that although the GC content for all genes could be plotted (Fig 1A), it is still necessary to explicitly identify the GC contents for individual genes. Based on the standard deviations, the GC12 contents of all 41 genes were more consistent than the GC3 contents.
Data are presented as mean ± standard deviation. Two copies of each of the three genes (i.e., rps12, rps7, and ycf2) are detected in each of the 21 chloroplast genomes.
To explore the degree of the base usage diversity in the 41 genes, the standard deviation values of base component parameters, including percentages of A, G, C, T, G3, C3, A3, T3, GC12, GC3, and overall GC of each gene, were calculated and summed (Fig 7). The results showed that the diversities of these genes were affected not only by the mutations in the genes, but also by the sequence lengths of these genes. The mutations in shorter genes showed evidently larger impact on the composition than those in the longer genes, while some longer genes, such as ycf2, psbB, atpE and psbD, were generally of lower base usage diversity. Among the 41 genes, rps12 contained the most stable sequence (base usage divergence = 0), showing no variation among its sequences in the 21 chloroplast genomes. Further, double-copy genes showed relatively more stable characteristics, such as the genes with longer sequences ycf2 and rps7, and the gene with shorter sequence rps12.
The X-axis represents the length of gene sequences.
Discussion
In our study, a total of 21 chloroplast genomes of the genus Arachis were retrieved from the NCBI database to explore the applications of their codon usage divergences in the characterizations of several molecular parameters. It was expected that our findings based on a total of 41 genes shared among these chloroplast genomes of the genus Arachis would greatly facilitate the study of molecular characteristics in the genus Arachis based on selected genes.
Several molecular parameters, including the codon usage patterns, the ENC values, the RSCU values, the PR2 plot, the neutrality plot, and the GC contents, were calculated for a total of 41 genes shared among a total of 21 chloroplast genomes of the genus Arachis. Furthermore, the standard deviations of both the ENC values of each gene and their compositions were calculated to reveal their codon usage divergences (Figs 1D and 2B). The results showed that five genes, including rps8 (ENC value of 41.413±2.443), atpF (43.976±0.623), ndhE (46.558±0.659), ndhG (50.349±0.683), and ycf3 (53.625±0.496), showed higher codon usage divergences, while other genes, i.e., atpB (ENC value of 47.928±0.049), ndhF (42.690±0.089), psaB (50.414±0.027), psbD (45.887±0.006), rbcL (47.370±0.075), rpoB (48.100±0.041), rps2 (46.661±0.096), and ycf2 (53.269±0.036), showed lower codon usage divergences. These results were consistent with those reported previously, showing broad distributions of ENC values and suggesting that the codon usage bias in chloroplast genomes was influenced by the combined effects of both mutation pressure and natural selection [46]. For example, the gene sequences of rpl20 were under stronger mutation pressure as suggested by their homogeneous codon usage, while some other genes, e.g., rps14, rps8, and petD, were under both mutation pressure and natural selection pressure as revealed by their uneven and more biased codon usage. To date, genetic engineering based on chloroplast genomes has been commonly investigated [47], and a phylogenetic study of some plants suggested typical relationships among chloroplast genomes from different areas [48]. The chloroplast genomes have been established as ideal markers for phylogenetic studies [49] and as significant contributors to the enhanced resistance of host plants to environmental stresses [50].
It was speculated that our study provided novel insights into the evolutionary characteristics of the genus Arachis based on selected genes.
In order to study the base usage diversity in genes of the genus Arachis chloroplast genomes, the base usage patterns of 41 genes were evaluated with the mean and the standard deviations of the basic base compositions calculated (Table 1). The base usage diversity of these genes was further assessed by their divergences (Fig 7). The results showed that five genes, i.e., ndhC (with the degree of base usage divergence of 0.035), petD (0.028), atpF (0.026), rpl14 (0.021), and rps11 (0.020), were of significantly higher diversity as indicated by their high degrees of base usage divergence. Although the base usage in genes based on the overall compositions was usually considered as the indicator for evaluating the diversity of the gene sequences, this method was not appropriate for evaluating the base mutations [51]. This was because the diversity of nucleotides in the gene sequences may not lead to any functional changes in the genes. However, the distances among the basic functional units, e.g., the codon usage pattern of sequences determined by the RSCU values, would be more reliable for clustering analysis [52]. Therefore, the RSCU values of the 41 genes were further calculated to explicitly identify the potential variations in their biological functions based on the base mutations in the 21 chloroplast genomes of the genus Arachis. Previous studies revealed that the cultivated taxa of A. hypogaea contained the morphological characters (i.e., runner type habit without floral spikes) similar to those of the wild species of Arachis [53]. However, the results in the present study showed that all the chloroplast genomes in the cultivated peanuts (A. hypogaea) showed comparable molecular characteristics (Fig 6). For some organisms, the location of the operon affects the efficiency of protein translation [26,54]. The codon usage characteristics of the genes considered in this paper did not show dependence on their location.
The chloroplast genomes of Arachis have been characterized [55,56]. Furthermore, the classification of the genus Arachis has also been studied based on the characteristics of its chloroplast genomes. For example, the population of A. duranensis distributed in Salta, Argentina was identified by the combined analysis of chloroplast DNA and non-transcribed spacer 5S rDNA sequences [57]. Our results revealed the narrow genetic base in the cultivated peanuts, which was likely caused by a single polyploidization event isolating the cultivated taxa from the wild species of Arachis [58]. Our study identified the evolutionarily conserved characteristics of chloroplast genes of the genus Arachis, demonstrating their general applicability in evolutionary and molecular investigations.
Conclusions
Advanced molecular techniques have been constantly developed to enhance our understanding of the functions of genes and genomes in peanuts in order to facilitate the study of the evolution of peanuts based on molecular characteristics. In this study, the patterns of base usage and codon usage based on several molecular characteristics (i.e., the base composition, the ENC, the RSCU, the CBI, and their correlation coefficients) of a total of 41 genes in 21 chloroplast genomes in the genus Arachis were investigated to further perform the correspondence and clustering analyses among these genomes. The results revealed significantly higher variations in genomes of wild species than those of the cultivated taxa in the genus Arachis, while the codon usage patterns of all 41 genes in the genus Arachis were AT-rich with five genes (i.e., ndhC, petD, atpF, rpl14, and rps11) of higher base usage divergences, suggesting that the natural selection was the main factor affecting the evolutionary history of these genomes. Furthermore, five genes (i.e., ndhE, ndhG, atpF, rps8, and ycf3) and eight genes (i.e., atpB, ndhF, psaB, psbD, rbcL, rpoB, ycf2, and rps2) showed higher and lower codon usage divergences, respectively. This study provided novel evidence based on the codon usage patterns to enhance our understanding of the molecular evolution of chloroplast genomes of the genus Arachis, facilitating the technical improvement of molecular phylogenetic investigation and evaluation in the genus Arachis based on selected genes.
Supporting information
S1 File. S1 Table Compositional results of 924 concerned sequences.
https://doi.org/10.1371/journal.pone.0281843.s001
(XLS)
References
- 1. Osman ER, Talaat AA, Steven JK. Phylogenetic analyses of peanut resistance gene candidates and screening of different genotypes for polymorphic markers. Saudi J Biol Sci. 2010; 17: 43–49. pmid:23961057
- 2. Zhuang W, Chen H, Yang M, Wang J, Manish KP, Zhang C. The genome of cultivated peanut provides insight into legume karyotypes, polyploid evolution and crop domestication. Nat Genet. 2019; 51: 865–876. pmid:31043757
- 3. Wang X, Liu Y, Huai D, Chen Y, Jiang Y, Ding Y. Genome-wide identification of peanut PIF family genes and their potential roles in early pod development. Gene. 2021; 781: 145539. pmid:33631242
- 4. He M, Yang X, Cui S, Mu GJ, Hou M, Chen H. Molecular cloning and characterization of annexin genes in peanut (Arachis hypogaea L.). Gene. 2015; 568: 40–49.
- 5. Zhang T, Hu S, Yan C, Li C, Zhao X, Wan S. Mining, identification and function analysis of microRNAs and target genes in peanut (Arachis hypogaea L.). Plant Physiol Bioch. 2017; 111: 85–96. pmid:27915176
- 6. Liu H, Lu Y, Lan B, Xu J. Codon usage by chloroplast gene is bias in Hemiptelea davidii. Journal of Genetics. 2020; 99(1):1–11.
- 7. Zhang X, Wan Q, Liu F, Zhang K, Sun A, Luo B, et al. Molecular analysis of the chloroplast Cu/Zn-SOD gene (AhCSD2) in peanut. Crop J. 2015; 3: 246–257.
- 8. Wang J, Li Y, Li C, Yan C, Shan S, Shi C, et al. Twelve complete chloroplast genomes of wild peanuts: great genetic resources and a better understanding of Arachis phylogeny. BMC Plant Biol. 2019; 19: 504.
- 9. Yin D, Wang Y, Zhang X, Ma X, He X, Zhang J. Development of chloroplast genome resources for peanut (Arachis hypogaea L.) and other species of Arachis. Sci Rep. 2017; 7: 11649.
- 10. Chakraborty S, Yengkhom S, Uddin DA. Analysis of codon usage bias of chloroplast genes in Oryza species. Planta. 2020; 252: 67.
- 11. Li G, Pan Z, Gao S, He Y, Xia Q, Jin Y, et al. Analysis of synonymous codon usage of chloroplast genome in Porphyra umbilicalis. Genes Genom. 2019; 41: 1173–1181.
- 12. Li G, Zhang L, Xue P, Zhu M. Comparative Analysis on the Codon Usage Pattern of the Chloroplast Genomes in Malus Species. Biochemical Genetics, 2022 (Online ahead of print). pmid:36414922
- 13. Bishi SK, Kumar L, Mahatma MK, Khatediya N, Chauhan SM, Misra JB. Quality traits of Indian peanut cultivars and their utility as nutritional and functional food. Food Chem. 2015; 167: 107–114. pmid:25148966
- 14. Zhou LZ, Chen FS, Hao LH, Du Y; Liu C. Peanut Oil Body Composition and Stability. J. Food Sci. 2019; 84: 2812–2819. pmid:31546282
- 15. Toomer OT. Nutritional chemistry of the peanut (Arachis hypogaea). Crit Rev Food Sci Nutr. 2018; 58: 3042–3053.
- 16. Pandey S, Muthamilarasan M, Sharma N, Chaudhry V, Dulani P, Shweta S, et al. Characterization of DEAD-box family of RNA helicases in tomato provides insights into their roles in biotic and abiotic stresses. Environ Exp Bot. 2019; 158: 107–116.
- 17. Jalil N, Pirani A, Moazzeni H, Mahmoodi M, Zare G, Noormohammadi A, et al. The new locally endemic genus Yazdana (Caryophyllaceae) and patterns of endemism highlight the high conservation priority of the poorly studied Shirkuh Mountains (central Iran). J Syst Evol. 2020; 58: 339–353. pmid:32612642
- 18. Da P, Hülber K, Willner W, Schneeweiss MG. An explicit test of Pleistocene survival in peripheral versus nunatak refugia in two high mountain plant species. Mol Ecol. 2020; 29: 172–183. pmid:31765501
- 19. Alex H, Huang K, Hulst van R, Tissen B, Caplan LJ, Koppula A, et al. Sex Determination by Two Y-Linked Genes in Garden Asparagus. Plant Cell. 2020; 32: 1790–1796. pmid:32220850
- 20. Wong KS, Soltis DE, Leebens-Mack J, Wickett NJ, Barker SW, Peer VY, et al. Sequencing and Analyzing the Transcriptomes of a Thousand Species Across the Tree of Life for Green Plants. Annu Rev Plant Biol. 2020; 71: 741–765. pmid:31851546
- 21. Bell DC, Gonzalez AL. Historical Biogeography and Temporal Diversification in Symphoricarpos (Caprifolieae, Caprifoliaceae, Dipsacales). Syst Bot. 2019; 44: 83–89.
- 22. Pandey S, Prasad A, Sharma N, Prasad M. Linking the plant stress responses with RNA helicases. Plant Sci. 2020; 299: 110607. pmid:32900445
- 23. Huang X, Tan W, Li F, Liao R, Guo Z, Shi T, et al. The chloroplast genome of Prunus zhengheensis: Genome comparative and phylogenetic relationships analysis. Gene. 2021; 793: 145751.
- 24. Bahiri-Elitzur S, Tuller T. Codon-based indices for modeling gene expression and transcript evolution. Comput Struct Biotechnol J. 2021; 19: 2646–2663. pmid:34025951
- 25. Henriquez LC, Abdullahb , Ahmed I, Carlsen MM, Zuluaga A, Croat BT, et al. Evolutionary dynamics of chloroplast genomes in subfamily Aroideae (Araceae). Genomics. 2020; 112: 2349–2360.
- 26. Li G, Zhang L, Xue P. Codon usage pattern and genetic diversity in chloroplast genomes of Panicum species. Gene, 2021; 802: 145866.
- 27. Alan C, Cibrián-Jaramillo A, Karremans PA, Martinez MD, Hernandez-Hernandez J, Brym M, et al. Genotyping-By-Sequencing diversity analysis of international Vanilla collections uncovers hidden diversity and enables plant improvement. Plant Sci. 2021; 311: 111019. pmid:34482920
- 28. Dodsworth S, Guignard MS, Christenhusz MJM, Cowan RS, Knapp S, Maurin O, et al. Potential of Herbariomics for Studying Repetitive DNA in Angiosperms. Front Ecol Evol. 2018; 6: 174.
- 29. Daniell H, Lin CS, Yu M, Chang WJ. Chloroplast genomes: diversity, evolution, and applications in genetic engineering. Genome Biol. 2016; 17: 134. pmid:27339192
- 30. Prabhudas SK, Sowjanya P, Parani M, Purushothaman N. Shallow Whole Genome Sequencing for the Assembly of Complete Chloroplast Genome Sequence of Arachis hypogaea L. Front Plant Sci. 2016; 27: 1106.
- 31. Savadi S, Mangalassery S, Sandesh MS. Review Advances in genomics and genome editing for breeding next generation of fruit and nut crops. Genomics. 2021; 113: 3718–3734.
- 32. Almutairi MM. Analysis of chromosomes and nucleotides in rice to predict gene expression through codon usage pattern. Saudi J Biol Sci. 2021; 28: 4569–4574. pmid:34354442
- 33. Li N, Sun M, Jiang Z, Shu H, Zhang S. Genome-wide analysis of the synonymous codon usage patterns in apple. J Integr Agr. 2016; 15: 983–991.
- 34. Chakraborty S, Sophiarani Y, Uddin A. Free energy of mRNA positively correlates with GC content in chloroplast transcriptomes of edible legumes. Genomics. 2021; 113: 2826–2838. pmid:34147635
- 35. Liu H, Lu Y, Lan B, Xu J. Codon usage by chloroplast gene is bias in Hemiptelea davidii. Journal of Genetics. 2020; 99: 8.
- 36. The Wright F. ‘effective number of codons’ used in a gene. Gene. 1990; 87: 23–29.
- 37. Fuglsang A. Estimating the ‘‘Effective Number of Codons”: The Wright Way of Determining Codon Homozygosity Leads to Superior Estimates. Genetics. 2006; 172: 1301–1307. pmid:16299393
- 38. Gajbhiye S, Patra PK, Yadav MK. New insights into the factors affecting synonymous codon usage in human infecting Plasmodium species, Acta Trop. 2017; 176: 29–33. pmid:28751162
- 39. Sueoka N. Translation-coupled violation of Parity Rule 2 in human genes is not the cause of heterogeneity of the DNA G+C content of third codon position. Gene. 1999; 238: 53–58. pmid:10570983
- 40. Yengkhom S, Uddin A, Chakraborty S. Deciphering codon usage patterns and evolutionary forces in chloroplast genes of Camellia sinensis var. assamica and Camellia sinensis var. sinensis in comparison to Camellia pubicosta. J Integr Agr. 2019; 18: 2771–2785.
- 41. LaBella LA, Opulente AD, Steenwyk LJ, Hittinger TC, Rokas A. Variation and selection on codon usage bias across an entire subphylum. Plos Genet. 2019; 15: e1008304. pmid:31365533
- 42. Hui S, Jing L, Tao C, Nan Z. Synonymous codon usage pattern in model legume Medicago truncatula. J Integr Agr. 2018, 17: 2074–2081.
- 43. Nakamura M, Sugiura M. Translation efficiencies of synonymous codons for arginine differ dramatically and are not correlated with codon usage in chloroplasts. Gene. 2011; 472: 50–54. pmid:20950677
- 44. Barbhuiya AP, Uddin A, Chakraborty S. Analysis of compositional properties and codon usage bias of mitochondrial CYB gene in anura, urodela and gymnophiona. Gene. 2020; 751: 144762.
- 45. Choudhury NM, Uddin A, Chalkraborty S. Codon usage bias and its influencing factors for Y-linked genes in Human. Comput Biol Chem. 2017; 69: 77–86. pmid:28587988
- 46. Wang Z, Xu B, Li B, Zhou Q, Wang G, Jiang X, et al. Comparative analysis of codon usage patterns in chloroplast genomes of six Euphorbiaceae species. PeerJ. 2020; 8: e8251. pmid:31934501
- 47. Daniell H, Lin CS, Yu M, Chang WJ. Chloroplast genomes: diversity, evolution, and applications in genetic engineering. Genome Biol. 2016; 17: 134. pmid:27339192
- 48. Roman M. G., & Houston R. Investigation of chloroplast regions rps16 and clpP for determination of Cannabis sativa crop type and biogeographical origin. Legal medicine (Tokyo, Japan), 2020; 47: 101759. pmid:32711370
- 49. Wu ZQ, Ge S. Phylogeny of the BEP clade in grasses revisited: evidence from whole genome sequences of chloroplast. Mol Phylogenet Evol. 2012; 62: 578–578.
- 50. Wang J, Li CJ, Yan CX, Zhao XB, Shan SH. A comparative analysis of the complete chloroplast genome sequences of four peanut botanical varieties. PeerJ. 2018; 6: e5349. pmid:30083466
- 51. Zhang Y, Wang ZF, Guo YA, Chen S, Xu XY, Wang RJ. Complete chloroplast genomes of Leptodermis scabrida complex: Comparative genomic analyses and phylogenetic relationships. Gene. 2021; 791: 145715.
- 52. Niu Y, Luo YY, Wang CL, Liao WB. Deciphering Codon Usage Patterns in Genome of Cucumis sativus in Comparison with Nine Species of Cucurbitaceae. Agronomy. 2021; 11: 2289.
- 53. Stalker HT, Simpson CE. Germplasm resources in Arachis. In Advances in Peanut Science. Pattee HE, Stalker HT (Eds.); Am. Peanut Res. and Educ. Soc., Stillwater, OK, 1995: 14–53.
- 54. Tessa E.F. Q, Yuri I. W, Jasper J. K, Omri W, Richard van der O, Wenqi R. Differential Translation Tunes Uneven Production of Operon-Encoded Proteins. Cell Reports, 2013; 4: 938–944. pmid:24012761
- 55. Prabhudas SK, Prayaga S, Madasamy P, Natarajan P. Shallow Whole Genome Sequencing for the Assembly of Complete Chloroplast Genome Sequence of Arachis hypogaea L. Front Plant Sci. 2016; 7: 1106.
- 56. Zhang XL, Zhu LL, Song DL, Li FZ. Characterization of the complete chloroplast genome of Arachis pintoi Krapov. & WCGreg., a perennial leguminous forage. Mitochondrial DNA B. 2021; 6: 3452–3453.
- 57. Grabiele M, Chalup L, Robledo R, Seijo G. Genetic and geographic origin of domesticated peanut as evidenced by 5S rDNA and chloroplast DNA sequences. Plant Syst Evol. 2012; 298: 1151–1165.
- 58. Kochert G, Stalker HT, Gimenes M, Galgaro L, Romero LC, Moore K. RFLP and cytogenetic evidence on the origin and evolution of the allotetraploid domesticated peanut, Arachis hypogaea (Leguminosae). Am J Bot 1996; 83: 1282–1291.