A Global Analysis of the Polygalacturonase Gene Family in Soybean (Glycine max)

Polygalacturonase is one of the pectin hydrolytic enzymes involved in various developmental and physiological processes such as seed germination, organ abscission, pod and anther dehiscence, and xylem cell formation. To date, no systematic analysis of polygalacturonase incorporating genome organization, gene structure, and expression profiling has been conducted in soybean (Glycine max var. Williams 82). In this study, we identified 112 GmPG genes from the soybean Wm82.a2v1 genome. These genes were classified into three groups: group I (105 genes), group II (5 genes), and group III (2 genes). Fifty-four pairs of duplicate paralogous genes were preferentially identified from duplicated regions of the soybean genome, which implied that long segmental duplications significantly contributed to the expansion of the GmPG gene family. Moreover, GmPG transcripts were analyzed in various tissues using RNA-seq data. The results showed differential expression of 64 GmPGs across tissues and partially redundant expression of some duplicate genes, whereas others showed functional diversity. These findings suggested that the GmPGs were retained by substantial subfunctionalization during the soybean evolutionary processes. Finally, evolutionary analysis based on single nucleotide polymorphisms (SNPs) in wild and cultivated soybeans revealed that 107 GmPGs had selected site(s), which indicated that these genes may have undergone strong selection during soybean domestication. Among them, one non-synonymous SNP of GmPG031 affected floral development during selection, which was consistent with the results of RNA-seq and evolutionary analyses. Thus, our results contribute to the functional characterization of GmPG genes in soybean.


Introduction
The plant cell wall is involved in many essential biological processes such as cell elongation, sloughing of cells at the root tip, fruit softening, fruit decay, pollen dehiscence and abscission of organs including leaves, floral parts, and fruits [1][2][3]. Therefore, the function and regulation of the cell wall have always piqued the interest of researchers [4]. Cell wall network assembly occurs through the action of cell wall hydrolytic enzymes, including polygalacturonase (PG), β-1,4-endoglucanases, pectate lyase, and pectin methyl esterase, which cleave the bonds between the polymers that make up the cell wall [5][6].
Among these hydrolytic enzymes, PG belongs to one of the largest hydrolase families; its members catalyze the hydrolysis of α-(1→4) linkages between D-galacturonic acid residues in homogalacturonan, causing cell separation [7]. Thus, PG activities are associated with a wide range of plant developmental programs such as seed germination, embryo development, organ abscission, pod and anther dehiscence, pollen grain maturation, xylem cell formation, and pollen tube growth [8][9]. Previous studies reported that PG was present in the endosperm cap of tomato seeds, and the activity of PG increased during seed germination [10][11]. The Arabidopsis exo-polygalacturonase gene NIMNA causes cell elongation defects in the early embryo and markedly reduces suspensor length [12]. Knocking out three Arabidopsis PGs led to the failure of pollen grain separation and of silique and anther dehiscence [13][14]. Over-expression of a PG in transgenic apple trees (Malus domestica) alters leaf morphology and causes premature leaf shedding [15]. In tomato fruit, a correlation between endo-PG activity and softening was observed in some cultivars [16]. Moreover, the functions of PGs are not restricted to plant developmental processes but also include wound responses and host-parasite interactions [17][18]. Thus, these findings illustrate that plant PG genes have undergone extensive functional divergence.
Plant PGs are multifunctional proteins encoded by a large gene family. To date, genome-wide analyses of the PG gene family have focused mainly on annual herbaceous plants such as Arabidopsis and rice. The Arabidopsis and rice genomes contain 66 and 42 PG members, respectively, which are divided into three distinct groups [19]. Comparative analyses will help clarify the expansion and functional diversification of this large gene family. Soybean (Glycine max) is planted worldwide as an essential protein and oil crop; however, the functional characterization of PG genes in soybean has rarely been reported. In the present study, we conducted a detailed analysis of GmPG genes based on the Wm82.a2v1 genome, including genome organization, gene structure, expression compendium, and selective effects on GmPG genes during soybean domestication. Our results may provide a subset of potential candidate PG genes for future engineering modification.

Sequence retrieval and phylogenetic analysis
To identify soybean PG gene family members, 66 Arabidopsis PG protein sequences were used to search the soybean genome database version 2 (http://www.phytozome.net/) using the TBLASTP program. The E-value cut-off was 1e-10, and the score threshold was 40%. Previous studies have shown that all PG proteins contain glycosyl hydrolase family 28 (GH28) domains [19]. Thus, in addition to sequence similarity, all collected soybean PG candidates were analyzed using the protein families database (Pfam) to confirm the presence of GH28 domains in their protein structures. Multiple sequence alignments of the full-length protein sequences were performed with the Clustal X program (version 1.83) [20]. The unrooted phylogenetic trees were constructed with MEGA 5.0 using the maximum-likelihood (ML) method, and the bootstrap test was carried out with 1,000 replicates [21]. The program MEME version 4.11.1 was used to identify motifs in the 112 deduced soybean PG protein sequences (http://meme.sdsc.edu) [22].
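The two-step candidate filter described above (similarity search, then domain confirmation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline actually used: the hit tuples, the sequence identifiers, and the helper `select_candidates` are hypothetical, the Pfam family name `Glyco_hydro_28` is assumed to label the GH28 domain, and the 40% score threshold is not modeled.

```python
# Minimal sketch of the candidate filter; all data below are made up.
EVALUE_CUTOFF = 1e-10            # E-value cut-off stated in the text
GH28_DOMAIN = "Glyco_hydro_28"   # assumed Pfam family name for the GH28 domain


def select_candidates(blast_hits, pfam_domains):
    """Return subjects that pass the E-value cut-off and carry a GH28 domain.

    blast_hits   -- iterable of (query_id, subject_id, evalue) tuples
    pfam_domains -- dict mapping subject_id to a set of Pfam domain names
    """
    passing = {
        subject
        for _query, subject, evalue in blast_hits
        if evalue <= EVALUE_CUTOFF
    }
    return sorted(
        subject
        for subject in passing
        if GH28_DOMAIN in pfam_domains.get(subject, set())
    )


# Hypothetical search results and domain annotations.
hits = [
    ("AtPG_query_1", "candidate_1", 1e-50),
    ("AtPG_query_2", "candidate_1", 1e-30),  # same subject hit twice
    ("AtPG_query_1", "candidate_2", 1e-5),   # fails the E-value cut-off
    ("AtPG_query_3", "candidate_3", 1e-20),  # passes, but lacks GH28
]
domains = {
    "candidate_1": {"Glyco_hydro_28"},
    "candidate_2": {"Glyco_hydro_28"},
    "candidate_3": {"Pectinesterase"},
}

candidates = select_candidates(hits, domains)
print(candidates)
```

Collecting subjects in a set before the domain check keeps a candidate from being reported twice when several Arabidopsis queries hit the same soybean sequence.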

Genomic structure and gene duplication
The Gene Structure Display Server (GSDS) program was used to illustrate the exon/intron organization of individual PG genes by comparing each cDNA with its corresponding genomic DNA sequence from Phytozome (http://www.phytozome.net/) [23]. Homologous chromosome segments resulting from whole-genome duplication events were identified as described in Schmutz et al. (2010) [24].

Expression analysis of GmPG genes
Transcript data of the GmPG genes were downloaded from the Soybase database (http://soybase.org/). These were obtained from various tissues and developmental stages, including vegetative tissues (e.g., young leaf, root, and nodule), reproductive tissues (e.g., flower, one-cm pod, pod shell at 10 and 14 days after flowering), and seeds from seven developmental stages (10, 14, 21, 25, 28, 35, and 42 days after flowering). All transcript data were analyzed with Cluster 3.0 [25], and the heat map was viewed using Java Treeview [26].

Evolutionary analysis of GmPG genes
SNPs of the GmPG genes were downloaded from the NCBI dbSNP database; they derive from the resequencing of 302 wild and cultivated soybean genomes [27]. For each SNP, we calculated the allele ratio in the wild and cultivated soybean populations. An SNP site whose allele distribution ratio was reversed between the different types of soybean population was defined as a putative selected site during domestication.
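The "reverse distribution" criterion can be illustrated with a short sketch. The text does not give an exact numerical rule, so the version below assumes that a site counts as reversed when the major allele differs between the wild and cultivated populations; the allele counts and the helper names (`allele_ratio`, `major_allele`, `is_reversed`) are hypothetical.

```python
# Minimal sketch, assuming "reverse distribution" means that the major
# allele switches between populations. All counts are invented.

def allele_ratio(counts):
    """Convert a dict of allele -> count into allele -> frequency."""
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def major_allele(counts):
    """Return the most frequent allele in a population."""
    return max(counts, key=counts.get)


def is_reversed(wild_counts, cultivated_counts):
    """True when the major allele differs between the two populations."""
    return major_allele(wild_counts) != major_allele(cultivated_counts)


# Hypothetical SNP whose major allele switches from C (wild) to T (cultivated).
snp_a_wild = {"C": 60, "T": 2}
snp_a_cultivated = {"C": 40, "T": 200}

# Hypothetical SNP with the same major allele in both populations.
snp_b_wild = {"A": 50, "G": 12}
snp_b_cultivated = {"A": 180, "G": 60}

print(allele_ratio(snp_a_wild))
print(is_reversed(snp_a_wild, snp_a_cultivated))
print(is_reversed(snp_b_wild, snp_b_cultivated))
```

A stricter rule, for example requiring a minimum frequency difference, could replace `is_reversed` without changing the rest of the sketch.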

DNA polymorphism analysis
A candidate SNP of the soybean PG031 gene was analyzed with a cleaved amplified polymorphic sequence (CAPS) marker as follows. PCR using primers 5'-CTGTATCTCATTGGGTGATGGTAAC-3' and 5'-CCTGTTATTACGGGCTTGACG-3' amplified a 623-bp fragment from the genomic DNA. The amplified fragment from the PG031 289H allele had an Nsi I site containing the SNP; thus, it was digested into 217-bp and 406-bp fragments by this enzyme, whereas the PG031 289Y allele remained undigested. All genomic DNA samples of wild soybeans and cultivars were provided by the Key Laboratory of Soybean Molecular Design Breeding of Northeast Institute of Geography and Agroecology.
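The expected CAPS banding pattern can be checked with a short simulation. The sketch below is illustrative only: the sequences are synthetic placeholders (not the real PG031 amplicon), the position and orientation of the site within the fragment and the exact base change (C to T) are assumptions, and `digest` is a hypothetical helper. It relies only on the Nsi I recognition sequence ATGCAT, which is cut after ATGCA.

```python
# Minimal sketch of a CAPS digest on synthetic sequences.
NSI_I_SITE = "ATGCAT"  # Nsi I recognition sequence
CUT_OFFSET = 5         # Nsi I cuts after ATGCA, i.e. 5 bases into the site


def digest(seq, site=NSI_I_SITE, offset=CUT_OFFSET):
    """Return the fragment lengths obtained by cutting seq at every site."""
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + offset
        fragments.append(cut - start)
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(len(seq) - start)
    return fragments


# Synthetic 623-bp amplicons: filler bases around one (assumed) site position.
allele_289h = "G" * 212 + "ATGCAT" + "G" * 405  # site intact
allele_289y = "G" * 212 + "ATGTAT" + "G" * 405  # assumed C-to-T change removes the site

print(len(allele_289h), digest(allele_289h))
print(len(allele_289y), digest(allele_289y))
```

The two fragment lengths of the cut allele add up to the amplicon length (217 + 406 = 623), which is a quick consistency check on any CAPS design.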

Gene expression model
A 1,786-bp fragment upstream of the PG031 start codon was PCR amplified from soybean genomic DNA using the primers 5'-TAAAGTTCAAGGTGTTAGGAAGGTG-3' and 5'-ATTGTTTTTGTTTTTGTTTGTGGCA-3' to investigate PG031 gene expression patterns. The PCR product was cloned into the Pst I/Nco I-digested pMDC1001G vector to generate the PG031pro:GUS expression construct. Plants were transformed with Agrobacterium tumefaciens strain LBA4404 using the floral dip method [28]. Positive transformants were selected on 1/2 MS plates containing 50 mg·L⁻¹ kanamycin. Flowers and siliques of T2 transgenic plants were subjected to GUS staining. GUS histochemical staining was performed using 5-bromo-4-chloro-3-indolyl-β-D-glucuronide as the substrate [29]. The tissues were decolored in 75% ethanol, and images of GUS staining were recorded using a VHX digital microscope (Japan) or a Canon camera (Japan).

Generation of the GmPG031 transgenic Arabidopsis thaliana
To obtain the transgenic Arabidopsis lines, the full-length CDSs of the GmPG031 289H and GmPG031 289Y genes were each amplified with the gene-specific primers 5'-ATGAAGTTCACTATAATCACAATAT-3' and 5'-CTAGGCTGCACAAGTAGGAG-3' and then ligated into an expression vector under the control of the strong constitutive CaMV 35S promoter. The recombinant construct was transformed into Col-0 Arabidopsis as above, and transgenic lines were identified by RT-PCR; the T3 lines were used for phenotypic analysis. Seedlings were transferred from 1/2 MS plates to soil and grown in a greenhouse under controlled environmental conditions (21-23°C, 200 μmol photons m⁻² s⁻¹, 70% relative humidity, 16 h light/8 h dark cycles). Siliques were measured for at least three plants from each transgenic line.

Sequence and phylogenetic relationships of soybean PG genes
We identified 112 genes encoding putative PG proteins in the soybean genome Wm82.a2v1 using the Phytozome database (S1 File). The detailed information of PG family genes in soybean including gene locus, location, and similarities to their Arabidopsis orthologs as well as amino acids are listed in Table 1. The 112 GmPG genes were distributed throughout the 20 soybean chromosomes and were numbered from GmPG001 to GmPG112 according to their localization. These identified PG genes in soybean encode proteins ranging from 67 to 882 amino acids (aa) with an average of 398 aa. Remarkably, in most cases, two or more soybean PG genes were found for every ortholog in Arabidopsis. We speculate that the presence of more GmPG genes may reflect a great need for complicated transcriptional regulation in this leguminous plant.
Subsequently, we constructed an unrooted tree to examine the phylogenetic relationships among GmPG genes using alignments of the full-length amino acid sequences of the encoded PG proteins (Fig 1A). The phylogenetic tree showed that the PGs formed three distinct clades (red, green, and blue boxes) with 100% bootstrap support. In the tree, the PG genes in the red, blue, and green clades were termed cluster I, II, and III PGs, respectively. Clusters I, II, and III contained 105, 5, and 2 members, respectively. Cluster I was divided into subfamilies CI 1 to CI 13 according to the most recent common ancestor (MRCA) of soybean. The tree topology revealed that 36 GmPG gene pairs located at the terminal nodes shared a sequence similarity of 52% to 99%. This implied that these genes were homologous genes that diverged by gene duplication.
Additionally, through multiple alignment analysis, we characterized the homologous domain sequences and the frequency of the amino acids at each position of the GmPG domains. As shown in S1 Fig, three distinct motifs, which are the main domains of the PG family, were identified. Among the 112 GmPG genes, motifs 1 and 3 were present in most of the GmPG family members (Fig 1B). Notably, some PGs carried specific motif combinations; for instance, domains 1 and 3 in subfamilies CI 10 to CI 13, domains 1 and 2 in Cluster II, and domains 2 and 3 in Cluster III. However, some sequences, such as GmPG062, GmPG069, GmPG092, and GmPG018, were PG fragments that had no domains and were considered to be pseudogenes in this study. These results suggested that these motifs might confer unique functional roles to soybean PG proteins.

Gene structure and gene duplication of soybean PG genes
To determine the numbers and positions of exons and introns within each soybean PG gene, we compared the full-length cDNA sequences with the corresponding genomic DNA sequences. We observed that introns disrupted most of the coding sequences of the PGs. By contrast, two genes (GmPG062 and GmPG018) had no introns in their coding region (Fig 1C). The remaining genes had up to 11 introns based on their relative positions. These results supported the argument that gene structural diversity is a possible explanation for the evolution of multigene families [30]. However, most of the closely related GmPG members at the same node not only showed highly similar gene structures, but also had almost completely conserved intron positions and lengths. This high level of similarity suggested that these genes arose from a recent duplication event. Furthermore, 26 GmPG genes contained two to six alternative structures that had undergone alternative splicing (AS) and thus produced a variety of transcripts from a single gene (Fig 2).
Among these genes, 21 underwent extension, shortening, or deletion of exon sequences, three underwent 5'-UTR events, and three had competing 5'/3'-UTR events. Interestingly, GmPG073 and GmPG080 exhibited six alternative splicing types arising from alternative 5'/3' splice sites and exon extension or shortening. The remaining three alternative splice events (GmPG040, GmPG082, and GmPG091) occurred in the 5'-UTR region without affecting the coding frame, which indicated that they created a variety of UTRs that may play a key role in gene regulation. In addition, some of the AS events resulted in domain insertions and/or deletions in the corresponding coding region. For instance, the exon deletion in GmPG035 resulted in the deletion of domain 1. The AS events enriched gene structures and might be associated with functional diversity.
Moreover, gene duplication occurs throughout plant evolution, thereby contributing to gene-family expansion and the establishment of new gene functions [31][32]. Paralogous segments created by whole-genome duplication events were identified in previous analyses of the soybean genome [24]. Fifty-four duplicate pairs and their corresponding duplicate blocks are illustrated in Fig 3 and S1 Table, including 53 segmental duplications and one tandem duplication. Twenty-three PG pairs were clearly located in collinear regions and formed eight blocks. For example, GmPG004, 005, and 006 showed extensive collinearity with the duplicated region containing GmPG021, 020, and 019; GmPG045, 046, 047, and 048 were collinear with their duplicates GmPG081, 082, 083, 084, and 086. These results indicated that these eight blocks were derived from large-scale duplications of their associated blocks. Moreover, GmPG033 and GmPG039 were flanked by GmPG054, whereas GmPG056, 083, and 094 were flanked by GmPG047, indicating that these are products of a block duplication. All of these findings clearly suggest that several members of the PG family were derived from large-scale duplication events.

Differential expression profile of soybean PG genes
Gene expression patterns may provide important clues to gene function [33]. We therefore obtained publicly available RNA-seq data across six soybean tissues and seven seed developmental stages [34]. RNA-seq data analysis showed that 64 GmPG genes had sequence reads in at least one tissue (Fig 4). These genes were clustered into five groups (A-E) based on their expression profiles in the soybean tissues (except seeds) and into four groups (I-IV) based on their expression patterns during the seven seed developmental stages (Fig 5). The genes in clusters A-E were mainly expressed in flower, root/flower/pod, root, nodule, and pod/leaf/root/flower, respectively. Further, most genes of cluster II were highly expressed at the earlier stages of seed development, and most genes of cluster III were expressed throughout seed development. Thus, the wide expression of these genes illustrated that soybean PG genes had extensive functional divergence. Moreover, many genes showed a distinct tissue-specific expression pattern, suggesting specific roles in particular stages of development. For instance, six genes (GmPG038, 022, 007, 031, 023, and 025) displayed flower-specific expression. Three genes (GmPG026, 072, and 086) showed significant transcript accumulation in the root. GmPG034 and GmPG112 were highly expressed in the nodule. Four genes (GmPG021, GmPG039, GmPG063, and GmPG102) were primarily expressed during seed development. In addition, most members of subfamilies CI 10 to CI 13 accumulated transcripts in various tissues, whereas subfamily CI 9 GmPGs were mainly expressed in the flower. Some GmPG genes whose proteins lacked domain 1 showed relatively low expression levels.
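The two summaries used above (genes with reads in at least one tissue, and genes dominated by a single tissue) can be expressed as a small sketch. The expression values, gene names, and the 80% dominance threshold are assumptions chosen for illustration; they are not taken from the RNA-seq data, and the clustering itself (Cluster 3.0) is not reproduced here.

```python
# Minimal sketch on a made-up expression table (gene -> tissue -> read count).

def expressed_genes(expression):
    """Return genes with a non-zero value in at least one tissue."""
    return sorted(
        gene
        for gene, by_tissue in expression.items()
        if any(value > 0 for value in by_tissue.values())
    )


def dominant_tissue(by_tissue, min_fraction=0.8):
    """Return the tissue holding at least min_fraction of the total signal.

    Returns None for unexpressed genes or genes without a dominant tissue.
    The 0.8 threshold is an arbitrary illustrative choice.
    """
    total = sum(by_tissue.values())
    if total == 0:
        return None
    tissue, value = max(by_tissue.items(), key=lambda item: item[1])
    return tissue if value / total >= min_fraction else None


# Hypothetical read counts.
expression = {
    "gene_a": {"flower": 90, "root": 2, "leaf": 0},   # flower-dominated
    "gene_b": {"flower": 10, "root": 12, "leaf": 9},  # broadly expressed
    "gene_c": {"flower": 0, "root": 0, "leaf": 0},    # no reads
}

print(expressed_genes(expression))
for gene, by_tissue in expression.items():
    print(gene, dominant_tissue(by_tissue))
```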
Duplicate genes may have different evolutionary fates: nonfunctionalization, neofunctionalization, or subfunctionalization, which may be indicated by divergence in expression patterns [35]. In this study, we investigated the functional redundancy of the GmPG genes, which have a high proportion of segmental/tandem duplications. Of the 54 homologous pairs of GmPG genes, 16 paralogous pairs shared similar expression patterns; for example, GmPG001/008, GmPG004/021, GmPG043/101, GmPG028/054, and GmPG048/084. In contrast, the expression patterns of 35 duplicate pairs were partially redundant, with distinct pattern shifts. For example, GmPG112 was mainly expressed in the nodule, whereas its duplicate counterpart GmPG002 was hardly expressed in any tissue. GmPG071 showed a high expression level in seeds, but its duplicate counterpart GmPG049 showed a relatively low expression level. GmPG006 had a broad expression pattern across tissues, whereas its duplicate counterpart GmPG019 had no expression. The remaining three gene pairs had almost no expression data in any tissue. These findings suggested that expression profiles had diverged substantially after gene duplication. Consequently, we speculate that GmPGs have been retained by substantial subfunctionalization during the soybean evolutionary processes.
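One way to make the comparison of duplicate pairs concrete is to correlate the expression vectors of the two paralogs. The sketch below is an assumption-laden illustration rather than the method used in this study: it applies a Pearson correlation with an arbitrary threshold of 0.8, and the expression vectors and the helper `classify_pair` are hypothetical.

```python
from math import sqrt

# Minimal sketch: classify a duplicate pair from two expression vectors
# (same tissue order in both). Threshold and data are illustrative only.


def pearson(x, y):
    """Pearson correlation of two equal-length vectors, or None if undefined."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None  # a constant vector has no defined correlation
    return cov / sqrt(var_x * var_y)


def classify_pair(x, y, threshold=0.8):
    """Label a pair as 'similar', 'diverged', or 'no data'."""
    if not any(x) and not any(y):
        return "no data"
    r = pearson(x, y)
    if r is not None and r >= threshold:
        return "similar"
    return "diverged"


# Hypothetical vectors over four tissues (flower, root, leaf, nodule).
pair_similar = ([10, 0, 2, 1], [20, 1, 4, 2])
pair_diverged = ([0, 0, 0, 30], [0, 0, 0, 0])  # one copy silent
pair_no_data = ([0, 0, 0, 0], [0, 0, 0, 0])

for pair in (pair_similar, pair_diverged, pair_no_data):
    print(classify_pair(*pair))
```

Treating a pair with one silent copy as diverged mirrors cases such as a nodule-expressed gene whose duplicate shows almost no expression.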

Artificial selection analysis for GmPGs during soybean domestication
Cultivated soybean was domesticated from wild soybean (Glycine soja) in China 5,000 years ago. During this process, large numbers of protein-coding genes underwent selection. Here, we present a survey of selection effects on the 112 soybean PG genes during soybean domestication based on sequence diversity analysis in soybean populations including 62 G. soja, 130 landraces, and 110 improved cultivars [27]. We determined that 1,726 selected SNPs existed in 107 soybean PG genes, including 452, 1,110, 88, and 66 SNPs in exons, introns, 5'UTRs, and 3'UTRs, respectively (S2 Table). Moreover, most of these genes had more than one selected site. These results suggested that the sites had experienced selection during soybean domestication and improvement. The SNPs were distributed throughout the 20 soybean chromosomes, mainly on Chr09, Chr14, Chr15, and Chr19 (S2 Fig). Among these, the genetic diversity of 549 sites decreased significantly from wild soybeans to landraces and then to cultivars. In contrast, the genetic diversity of 542 sites increased sharply in cultivars compared with that of landraces or wild soybeans. These results suggested that many sites might have been artificially selected to meet human needs or to adapt to the environment. In addition, an SNP with a reverse distribution among the different evolutionary types of soybean was defined as a strongly selected site [30]. In total, 594 strongly selected sites were identified in 91 GmPGs (S3 Table). Thus, the PG genes with one or more types of reverse distribution were assumed to have undergone artificial selection during soybean domestication.
Domestication involves the genetic modification of functional units. Accordingly, we examined the coding sequences and found that 86 GmPGs had nonsynonymous selected site(s) in their coding sequence (CDS), and 42 of them had sequence reads in at least one tissue (S4 Table). Interestingly, the nonsynonymous selected site(s) of six PGs (PG007, 011, 022, 031, 034, and 107) had a single haplotype in wild soybeans but multiple haplotypes in cultivars. These genes were mainly expressed in the flower, flower/leaf/pod, flower, flower, nodule, and seed, respectively. Moreover, the nonsynonymous selected site(s) of three PGs (PG039, 054, and 074) had a single haplotype in landraces but multiple haplotypes in wild soybeans. These genes had a high transcript accumulation in various tissues. These selected sites may have caused functional changes in the corresponding GmPGs during soybean domestication.
Recently, increasing evidence has demonstrated the significance of PG genes in flower development [13][14]. To address whether the nonsynonymous selected site(s) may influence flower development, the above-mentioned PG007, 011, 022, 031, 039, 054, and 074 were selected as candidates. Because more than two selected sites were identified in PG007, 011, 039, 054, and 074, a single haplotype could hardly be obtained using one SNP. Moreover, the nonsynonymous SNP of PG022 showed lower diversity than that of PG031 between 17 wild soybeans and 14 cultivars according to the resequencing data of Lam et al. [36]. Finally, we selected a specific SNP from GmPG031 to explore the preliminary role of soybean PG genes during soybean domestication.

Allelic variation of soybean PG031 gene in wild and cultivated soybean populations
Compared to the Williams 82 reference sequence, the SNP above was detected at the 867th nucleotide from the start codon of the PG031 gene. Moreover, the SNP caused an amino acid substitution from histidine (H) in wild soybean to tyrosine (Y) in cultivars at the 289th amino acid of the PG031 protein.
The two alleles were distinguished with CAPS-specific primers, which amplified a 623-bp fragment containing the SNP; Nsi I digested the PG031 289H-type fragment into 217-bp and 406-bp fragments but did not cut the PG031 289Y-type fragment (Fig 6A).
To elucidate the polymorphism of the SNP alleles, we conducted genotype analysis using the mini-core collection of Chinese soybean landraces. Our results showed that the SNP locus was PG031 289H in all of the G. soja population; in cultivars, the SNP was mainly PG031 289H, whereas PG031 289Y accounted for 26% (Fig 6B and S5 Table). This was consistent with the results presented in S4 Table. Therefore, we speculated that the SNP substitution of PG031 might have resulted in a loss or gain of function.

PG031 is involved in flower development
To elucidate the role of the soybean PG031 gene in the flower, we investigated the tissue specificity of PG031 expression. For this, we fused a 1,786-bp fragment (upstream of the GmPG031 translation initiation site) to the GUS reporter gene and introduced it into Arabidopsis via Agrobacterium-mediated transformation. The transgenic seedlings of the T3 generation were subjected to GUS staining for activity analysis. We observed strong GUS activity in the flowers of PG031-GUS plants (Fig 7A). Specifically, strong activity appeared in both the pollen and the pollen tube but not in the buds (Fig 7B-7E). Moreover, RT-PCR amplification from DN50 (PG031 289H-type) and TK780 (PG031 289Y-type) showed strong accumulation of PG031 transcripts in the flower, whereas no transcripts were detected in the leaf, stem, or seed (Fig 7F). The PG031 gene thus showed a flower-specific expression pattern, which was in agreement with the RNA-seq profile (Fig 4). We therefore speculate that the PG031 gene might play a role in flower development.

GmPG031 affected flower and silique development
To understand the biological role of soybean PG031 in the flower, the GmPG031 289H and GmPG031 289Y genes were introduced into Arabidopsis under the control of the CaMV 35S promoter. As shown in Fig 8A, growth status appeared similar between GmPG031 289H and GmPG031 289Y transgenic plants. At the flowering stage, some of the inflorescences died in 35S::GmPG031 289Y plants but survived in WT and 35S::GmPG031 289H plants. In addition, 35S::GmPG031 289Y siliques survived but appeared less full, curved, and shorter, and they matured early, whereas the WT and 35S::GmPG031 289H siliques developed at a normal rate (Fig 8B and 8C). The rate of full siliques in 35S::GmPG031 289H plants was significantly higher than that in 35S::GmPG031 289Y plants (Fig 9A). The siliques of 35S::GmPG031 289H plants were significantly longer than those of 35S::GmPG031 289Y and wild-type plants (Fig 9B). In addition, the selected SNP of the GmPG031 gene affected seed number and weight. Because the siliques were less full in the GmPG031 289Y transgenic Arabidopsis, the number of seeds was lower than in GmPG031 289H plants. However, the 1,000-seed weight of 35S::GmPG031 289Y was significantly higher than that of 35S::GmPG031 289H (Fig 9C, P < 0.05).
Seed weight is an important trait in soybean domestication. Our phenotypic observations on seed weight in transgenic Arabidopsis indicate that the detected SNP, which arose during the domestication of wild soybean, contributes to the difference in seed weight between wild and cultivated soybeans.

Discussion
PG is one of the major enzymes involved in pectin disassembly, catalyzing the hydrolytic cleavage of α-(1→4) galacturonan [37][38]. Pectin is one of the major components of the primary cell wall, and its disassembly leads to cell wall separation [39][40]. Therefore, PG was suggested to play important roles in many stages of plant development, particularly in various cell separation processes [41][42].
PGs belong to a large gene family in plant genomes. A recent study identified 66, 75, and 44 PG genes from the Arabidopsis, Populus, and rice genomes, respectively, which were divided into three classes [43]. In this study, we found that the 112 PGs from the soybean genome could be classified into three distinct groups (Fig 1). Compared to Arabidopsis, Populus, and rice, the soybean genome contained the highest number of PG members. Multiple soybean PG genes may therefore be required to maintain biological functions in more complex organ systems and structures. Furthermore, RNA-seq showed that 64 GmPGs had sequence reads in at least one tissue. Approximately 48 of the 112 (43%) GmPGs showed transcript accumulation in flower tissues, 42 (38%) in roots, 30 (27%) in leaves, and 34 (30%) in pods, and 37 (33%) showed the highest transcript accumulation in seeds (Figs 4 and 5). Variations in the expression levels of GmPGs indicate that multiple GmPGs are necessary for the complicated transcriptional regulation during the development of all organs or tissues in soybean.
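The tissue percentages quoted above follow directly from the gene counts and the family size of 112. The short sketch below reproduces that arithmetic; the dictionary layout is only an illustrative way to hold the counts given in the text.

```python
# Recompute the per-tissue percentages from the counts stated in the text.
TOTAL_GMPG = 112

tissue_counts = {"flower": 48, "root": 42, "leaf": 30, "pod": 34, "seed": 37}

percentages = {
    tissue: round(100 * count / TOTAL_GMPG)
    for tissue, count in tissue_counts.items()
}

for tissue, percent in percentages.items():
    print(f"{tissue}: {tissue_counts[tissue]}/{TOTAL_GMPG} = {percent}%")
```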
To our knowledge, the primary causes of gene-family expansion include segmental duplication, tandem duplication, and transposition events. Our analysis of the soybean genome indicated that GmPG genes were generated mainly through segmental duplication events and were non-randomly distributed across all 20 chromosomes. Moreover, one gene of some duplicate pairs showed a relatively low expression level, whereas the other showed functional diversity, which may lead to neofunctionalization or subfunctionalization. These results support the theory that segmental duplication events may widely distribute duplicated genes across the genome and could lead to the loss of many functionally redundant genes to avoid fitness costs [44][45]. Moreover, AS contributes to gene-family diversity by generating various gene isoforms for differential expression.
With the rapid development of next-generation sequencing technology, powerful genomic approaches have been used to screen for selective sweeps or selected genes at a genome-wide level [36,46]. Thus far, large numbers of protein-coding genes that underwent selection during domestication have been identified in soybean, such as GmCupin and GmPLC genes [30,47]. PGs are likewise involved in diverse biological processes. Therefore, we comprehensively investigated SNPs in soybean PG genes using 302 resequenced soybean accessions in this study. Our results revealed that most of the GmPGs underwent strong selection during soybean domestication, with some exhibiting a certain degree of variation. Whether these SNPs confer unique functional roles remains to be further investigated.
To determine whether the identified SNP plays a putative functional role in plant development, one SNP of the GmPG031 gene was selected as a preliminary candidate according to the RNA-seq data and evolutionary analysis results. Furthermore, we performed genotype, promoter pattern, and transgene expression analyses. Genotype analysis indicated that the SNP of the GmPG031 gene underwent strong selection during soybean domestication. Promoter and over-expression analyses indicated that the selected SNP apparently affected floral development (Figs 6 and 7). Thus, our study suggested that the differential selection patterns may be associated with their functions. Although there is a lack of experimental evidence for the involvement of PG genes in soybean development or organogenesis, understanding the role of PGs in domestication may help answer fundamental biological questions and enhance our ability to engineer crops.