A Global Analysis of the Polygalacturonase Gene Family in Soybean (Glycine max)

  • Feifei Wang,

    Affiliations Northeast Institute of Geography and Agroecology, Key Laboratory of Soybean Molecular Design Breeding, the Chinese Academy of Sciences, Harbin, 150081, China, University of Chinese Academy of Sciences, Beijing, 100049, China

  • Xia Sun,

    Affiliation Northeast Institute of Geography and Agroecology, Key Laboratory of Soybean Molecular Design Breeding, the Chinese Academy of Sciences, Harbin, 150081, China

  • Xinyi Shi,

    Affiliation School of Computer Science and Technology, Heilongjiang University, Harbin, 150080, China

  • Hong Zhai,

    Affiliation Northeast Institute of Geography and Agroecology, Key Laboratory of Soybean Molecular Design Breeding, the Chinese Academy of Sciences, Harbin, 150081, China

  • Changen Tian,

    Affiliation School of Life Sciences, Guangzhou University, Guangzhou, 510006, China

  • Fanjiang Kong,

    Affiliation Northeast Institute of Geography and Agroecology, Key Laboratory of Soybean Molecular Design Breeding, the Chinese Academy of Sciences, Harbin, 150081, China

  • Baohui Liu,

    yuanxh@iga.ac.cn (XY); liubh@iga.ac.cn (BL)

    Affiliation Northeast Institute of Geography and Agroecology, Key Laboratory of Soybean Molecular Design Breeding, the Chinese Academy of Sciences, Harbin, 150081, China

  • Xiaohui Yuan

    Affiliation Northeast Institute of Geography and Agroecology, Key Laboratory of Soybean Molecular Design Breeding, the Chinese Academy of Sciences, Harbin, 150081, China

Abstract

Polygalacturonase is one of the pectin hydrolytic enzymes involved in various developmental and physiological processes such as seed germination, organ abscission, pod and anther dehiscence, and xylem cell formation. To date, no systematic analysis of polygalacturonase incorporating genome organization, gene structure, and expression profiling has been conducted in soybean (Glycine max var. Williams 82). In this study, we identified 112 GmPG genes from the soybean Wm82.a2v1 genome. These genes were classified into three groups: group I (105 genes), group II (5 genes), and group III (2 genes). Fifty-four pairs of duplicate paralogous genes were preferentially identified from duplicated regions of the soybean genome, which implied that long segmental duplications significantly contributed to the expansion of the GmPG gene family. Moreover, GmPG transcripts were analyzed in various tissues using RNA-seq data. The results showed differential expression of 64 GmPGs across tissues and partially redundant expression of some duplicate genes, while others showed functional diversity. These findings suggested that the GmPGs were retained by substantial subfunctionalization during the soybean evolutionary processes. Finally, evolutionary analysis based on single nucleotide polymorphisms (SNPs) in wild and cultivated soybeans revealed that 107 GmPGs had selected site(s), which indicated that these genes may have undergone strong selection during soybean domestication. Among them, one non-synonymous SNP of GmPG031 affected floral development during selection, which was consistent with the results of RNA-seq and evolutionary analyses. Thus, our results contribute to the functional characterization of GmPG genes in soybean.

Introduction

The plant cell wall is involved in many essential biological processes such as cell elongation, sloughing of cells at the root tip, fruit softening, fruit decay, pollen dehiscence and abscission of organs including leaves, floral parts, and fruits [1–3]. Therefore, the function and regulation of the cell wall have long interested researchers [4]. Cell wall network disassembly occurs through the action of cell wall hydrolytic enzymes, including polygalacturonase (PG), β-1,4-endoglucanases, pectate lyase, and pectin methyl esterase, which cleave the bonds between the polymers that make up the cell wall [5,6].

Among these hydrolytic enzymes, PG belongs to one of the largest hydrolase families, whose members catalyze the hydrolysis of α-(1–4) linkages between D-galacturonic acid residues in homogalacturonan, causing cell separation [7]. Thus, PG activities are associated with a wide range of plant developmental programs such as seed germination, embryo development, organ abscission, pod and anther dehiscence, pollen grain maturation, xylem cell formation, and pollen tube growth [8,9]. Previous studies reported that PG was present in the endosperm cap of tomato seeds, and the activity of PG increased during seed germination [10,11]. Mutation of the Arabidopsis exo-polygalacturonase gene NIMNA causes cell elongation defects in the early embryo and markedly reduces suspensor length [12]. Knocking out three Arabidopsis PGs led to the failure of pollen grain separation, and of silique and anther dehiscence [13,14]. Over-expression of a PG in transgenic apple trees (Malus domestica) alters the leaf morphology and causes premature leaf shedding [15]. In tomato fruit, a correlation between endo-PG activity and softening was observed in some cultivars [16]. Moreover, the functions of PGs are not restricted to plant developmental processes but also include wound responses and host-parasite interactions [17,18]. Thus, these findings illustrate that plant PG genes have extensive functional divergence.

Plant PGs are multifunctional proteins encoded by a large gene family. To date, genome-wide analyses of the PG gene family have focused mainly on annual herbaceous plants such as Arabidopsis and rice. The Arabidopsis and rice genomes contain 66 and 42 PG members, respectively, which are divided into three distinct groups [19]. Comparative analyses of this gene family will help understand the expansion and functional diversification of this large gene family. Soybean (Glycine max) is planted worldwide as an essential protein and oil crop; however, the functional characterization of PG genes in soybean has rarely been reported. In the present study, we conducted a detailed analysis of GmPG genes based on the genome Wm82.a2v1 including genome organization, gene structure, expression compendium, and selective effects of GmPG genes during soybean domestication. Our results may provide a subset of potential candidate PG genes for future engineering modification.

Materials and Methods

Sequence retrieval and phylogenetic analysis

To identify soybean PG gene family members, 66 Arabidopsis PG protein sequences were used to search the soybean genome database version 2 (http://www.phytozome.net/) using the TBLASTP program. The E-value cut-off was 1e-10, and the score cut-off was 40%. Previous studies have shown that all PG proteins contain glycosyl hydrolase family 28 (GH28) domains (Kim et al., 2006) [19]. Thus, in addition to sequence similarity, all collected soybean PG candidates were analyzed using the protein families database (Pfam) to confirm the presence of GH28 domains in their protein structures. Multiple sequence alignments of the full-length protein sequences were performed with the Clustal X (version 1.83) program [20]. The unrooted phylogenetic trees were constructed with MEGA 5.0 using the maximum-likelihood (ML) method, and the bootstrap test was carried out with 1000 iterations [21]. The program MEME version 4.11.1 was used to identify motifs in the 112 deduced soybean PG protein sequences (http://meme.sdsc.edu) [22].
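The E-value filtering step described above can be sketched as follows. This is a minimal illustration only, assuming the search results were exported in the standard 12-column tabular BLAST format; the function name `filter_hits`, the identifiers, and the example rows are hypothetical, the use of "less than or equal to" at the cut-off is an assumption, and the subsequent Pfam GH28 check is not shown.

```python
import csv
import io

def filter_hits(tabular_text, max_evalue=1e-10):
    """Return the set of subject IDs with at least one hit at or below max_evalue.

    Assumes 12-column tabular BLAST output:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    candidates = set()
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if float(row[10]) <= max_evalue:  # column 11 holds the E-value
            candidates.add(row[1])        # column 2 holds the subject ID
    return candidates

# Hypothetical example rows (identifiers and values are made up for illustration).
example = "\n".join([
    "AT1G01\tGlyma.A\t62.5\t400\t150\t3\t1\t400\t5\t404\t1e-120\t430",
    "AT1G01\tGlyma.B\t31.0\t90\t60\t2\t10\t99\t20\t109\t0.002\t38.1",
    "AT2G02\tGlyma.A\t58.0\t380\t160\t4\t1\t380\t8\t387\t3e-98\t350",
])
print(sorted(filter_hits(example)))
```

Collecting subject IDs in a set means that a soybean gene hit by several Arabidopsis queries is counted once as a candidate.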

Genomic structure and gene duplication

The Gene Structure Display Server (GSDS) was used to illustrate the exon/intron organization of individual PG genes by comparing each cDNA with its corresponding genomic DNA sequence from Phytozome (http://www.phytozome.net/) [23]. Homologous chromosome segments resulting from whole-genome duplication events were identified as described in Schmutz et al. (2010) [24].

Expression analysis of GmPG genes

Transcript data of the GmPG genes were downloaded from the Soybase database (http://soybase.org/). These were obtained from various tissues and developmental stages, including vegetative tissues (e.g., young leaf, root, and nodule), reproductive tissues (e.g., flower, one cm pod, pod shell of 10 and 14 days after flowering), and seeds from seven developmental stages (10, 14, 21, 25, 28, 35, and 42 days after flowering). All transcript data were analyzed with Cluster 3.0 [25], and the heat map was viewed using Java Treeview [26].

Evolutionary analysis of GmPG genes

SNPs of the GmPG genes were downloaded from the NCBI dbSNP database based on the resequencing of 302 wild and cultivated soybean genomes [27]. We then analyzed the ratio of each SNP in wild and cultivated soybean populations. A SNP site whose distribution ratio was reversed between the different types of soybean population was defined as a putative site of selection during domestication.
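One possible reading of the "reverse distribution" criterion is that the majority allele at a site differs between two populations. The sketch below illustrates that reading only; the function names, the 0.5 boundary, and the allele counts are assumptions made for illustration and are not taken from the study.

```python
def allele_ratio(ref_count, alt_count):
    """Fraction of alternate alleles among all called alleles."""
    total = ref_count + alt_count
    if total == 0:
        raise ValueError("no called alleles")
    return alt_count / total

def is_reversed(wild, cultivated):
    """True when the majority allele differs between the two populations.

    `wild` and `cultivated` are (ref_count, alt_count) tuples.
    An exact 50/50 split in either population is treated as not reversed.
    """
    w = allele_ratio(*wild)
    c = allele_ratio(*cultivated)
    return (w - 0.5) * (c - 0.5) < 0

# Made-up counts: the alternate allele is rare in wild accessions.
print(is_reversed(wild=(58, 4), cultivated=(60, 180)))  # majority allele flips
print(is_reversed(wild=(58, 4), cultivated=(200, 40)))  # same majority allele
```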

DNA polymorphism analysis

The candidate SNP of the soybean PG031 gene was analyzed with a cleaved amplified polymorphic sequence (CAPS) marker, as follows. PCR using primers 5'-CTGTATCTCATTGGGTGATGGTAAC-3' and 5'-CCTGTTATTACGGGCTTGACG-3' amplified a 623-bp fragment from the genomic DNA. The amplified fragment from the PG031^289H allele had an Nsi I site containing the SNP; thus, it was digested into 217-bp and 406-bp fragments using this enzyme, whereas the fragment from the PG031^289Y allele remained undigested. All genomic DNA of wild soybeans and cultivars was provided by the Key Laboratory of Soybean Molecular Design Breeding of Northeast Institute of Geography and Agroecology.
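The logic of the CAPS assay can be illustrated with a small in-silico digest. This is a sketch under stated assumptions: the amplicon is a synthetic placeholder, not the real PG031 sequence; the Nsi I site (ATGCAT, cut after the fifth base) is placed so that the fragment sizes reported above are reproduced; and the undigested allele is represented by a site in which the C is replaced by T.

```python
def digest_lengths(seq, site="ATGCAT", cut_offset=5):
    """Return fragment lengths after cutting seq at every occurrence of site.

    cut_offset is the number of bases of the site left on the upstream fragment.
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]

def synthetic_amplicon(site):
    """623-bp placeholder amplicon with a 6-bp site starting at 0-based index 212."""
    return "G" * 212 + site + "G" * (623 - 212 - 6)

print(digest_lengths(synthetic_amplicon("ATGCAT")))  # site present: two fragments
print(digest_lengths(synthetic_amplicon("ATGTAT")))  # site lost: one intact fragment
```

On a gel, the first case would appear as two bands and the second as a single full-length band, which is what distinguishes the two alleles.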

Gene expression model

A 1,786-bp fragment upstream of the PG031 start codon was PCR amplified from soybean genomic DNA using the primers 5'-TAAAGTTCAAGGTGTTAGGAAGGTG-3' and 5'-ATTGTTTTTGTTTTTGTTTGTGGCA-3' to investigate PG031 gene expression patterns. The PCR product was cloned into the Pst I/Nco I-digested pMDC1001G vector to generate the PG031pro:GUS expression construct. Arabidopsis plants were transformed with Agrobacterium tumefaciens strain LBA4404 using the floral dip method [28]. Positive transformants were selected on 1/2 MS plates containing 50 mg·L⁻¹ kanamycin. Flowers and siliques of T2 transgenic plants were subjected to GUS staining. GUS histochemical staining was performed using 5-bromo-4-chloro-3-indolyl-β-D-glucuronide as the substrate [29]. The tissues were decolorized in 75% ethanol, and images of GUS staining were recorded using a VHX digital microscope (Japan) or a Canon camera (Japan).

Semi-quantitative RT-PCR analysis was performed to further characterize the expression of soybean PG031. Total RNA was extracted from various tissues of cultivars DongNong50 (DN50, PG031^289H type) and Tokei 780 (TK780, PG031^289Y type) using the TRIzol method, and then subjected to reverse transcription using the SuperScript™ III Reverse Transcriptase kit (Invitrogen, Carlsbad, CA, USA). The RT-PCR specific primers were F: 5'-CTGTATCTCATTGGGTGATGGTAAC-3' and R: 5'-TTCAACGGCCTCTTCATTATC-3'. The β-tubulin gene (Glyma.05G157300.1) was used as an internal control to normalize all data and was amplified with the primers F: 5'-TCTTGGACAACGAAGCCATCT-3' and R: 5'-TGGTGAGGGACGAAATGATCT-3'.

Generation of the GmPG031 transgenic Arabidopsis thaliana

To obtain the transgenic Arabidopsis lines, the full-length CDSs of the GmPG031^289H and GmPG031^289Y genes were each amplified with the gene-specific primers 5'-ATGAAGTTCACTATAATCACAATAT-3' and 5'-CTAGGCTGCACAAGTAGGAG-3', and then cloned into an expression vector under the control of the strong constitutive CaMV 35S promoter. The recombinant constructs were transformed into Col-0 Arabidopsis as above, and transgenic lines were confirmed by RT-PCR; the T3 lines were used for phenotypic analysis. Seedlings were transferred from 1/2 MS plates to soil and grown in a greenhouse under controlled environmental conditions (21–23°C, 200 μmol photons m⁻² s⁻¹, 70% relative humidity, 16 h light/8 h dark cycles). Siliques were measured for at least three plants from each transgenic line.

Results

Sequence and phylogenetic relationships of soybean PG genes

We identified 112 genes encoding putative PG proteins in the soybean genome Wm82.a2v1 using the Phytozome database (S1 File). Detailed information on the PG family genes in soybean, including gene locus, location, similarity to their Arabidopsis orthologs, and number of amino acids, is listed in Table 1. The 112 GmPG genes were distributed throughout the 20 soybean chromosomes and were numbered from GmPG001 to GmPG112 according to their localization. The identified soybean PG genes encode proteins ranging from 67 to 882 amino acids (aa), with an average of 398 aa. Remarkably, in most cases, two or more soybean PG genes were found for every ortholog in Arabidopsis. We speculate that the presence of more GmPG genes may reflect a greater need for complicated transcriptional regulation in this leguminous plant.

Subsequently, we constructed an unrooted tree to examine the phylogenetic relationships among GmPG genes using alignments of the full-length amino-acid sequences of the encoded PG proteins (Fig 1A). The phylogenetic tree showed that the PGs formed three distinct clades (red, green, and blue boxes) with 100% bootstrap support. In the tree, the PG genes in the red, blue, and green clades were termed cluster I, II, and III PGs, respectively. Soybean cluster I, II, and III PG genes contained 105, 5, and 2 members, respectively. Cluster I was divided into subfamilies CI1 to CI13 according to the most recent common ancestor (MRCA) of soybean. Phylogenetic tree topology revealed that 36 GmPG gene pairs located at the terminal nodes shared a sequence similarity of 52–99%. This implied that these genes were homologous genes that diverged by gene duplication.

Fig 1. Phylogenetic relationships and gene structures of the GmPG genes.

(A) Multiple alignments of the 112 full-length amino acid sequences of PG genes from soybean were executed by Clustal X ver. 1.83, and the phylogenetic tree was constructed using MEGA 5.0 by the maximum-likelihood (ML) method with 1,000 bootstrap replicates. Three distinct clusters (I to III) formed by the PGs are marked by red, green, and blue frames, respectively. (B) The main domains are highlighted by colored boxes. Introns are shown as lines. The sequences of the domains are shown in S1 Fig. (C) Exons and introns are represented by yellow boxes and gray lines, respectively. The sizes of exons and introns can be estimated using the scale at the bottom.

https://doi.org/10.1371/journal.pone.0163012.g001

Additionally, through multiple alignment analysis, we characterized the homologous domain sequences and the frequency of the amino acids at each position of the GmPG domains. As shown in S1 Fig, three distinct motifs, which are the main domains of the PG family, were identified. Among the 112 GmPG genes, motifs 1 and 3 were present in most of the GmPG family members (Fig 1B). Notably, some specific motif combinations were present in PGs; for instance, domains 1 and 3 in subfamilies CI10-CI13, domains 1 and 2 in Cluster II, and domains 2 and 3 in Cluster III. However, some members, such as GmPG062, GmPG069, GmPG092, and GmPG018, were PG fragments that had no domains and were considered to be pseudogenes in this study. These results suggested that these motifs might confer unique functional roles to soybean PG proteins.

Gene structure and gene duplication of soybean PG genes

To determine the numbers and positions of exons and introns within each soybean PG gene, we compared the full-length cDNA sequences with the corresponding genomic DNA sequences. We observed that introns disrupted most of the coding sequences of the PGs. By contrast, two genes (GmPG062 and GmPG018) had no introns in their coding region (Fig 1C). The remaining genes had up to 11 introns. These results supported the argument that gene structural diversity is a possible explanation for the evolution of multigene families [30]. However, most of the closely related GmPG members at the same node not only showed extremely similar gene structures, but the position and length of their introns were also almost completely conserved. This high level of similarity suggested that these genes arose from a recent duplication event.

Furthermore, 26 GmPG genes contained two to six alternative structures that had undergone alternative splicing (AS) and thus produced a variety of transcripts from a single gene (Fig 2). Among these genes, 21 genes underwent extension, shortening or deletion of exon sequences, three underwent 5'-UTR events, and three had competing 5'/3'-UTR events. Interestingly, GmPG073 and GmPG080 each exhibited six alternative splicing types through alternative 5'/3' splice sites and extension or shortening of exons. Three alternative splice events (GmPG040, GmPG082, and GmPG091) occurred in the 5'-UTR region without affecting the coding frame, which indicated that they created a variety of UTRs that may play a key role in gene regulation. In addition, some of the AS events resulted in a variety of domain insertions and/or deletions in the corresponding coding region. For instance, the exon deletion in GmPG035 resulted in the deletion of domain 1. The AS events enriched gene structures and might contribute to functional diversity.

Fig 2. Alternative splicing of GmPG genes in the soybean genome.

(A) The 26 identified GmPGs contained alternative structures. (B) Among these genes, 21 genes underwent extension, shortening or deletion of exon sequences, three underwent 5'-UTR events, and three had competing 5'/3'-UTR events. (C) The main domains are indicated by colored boxes.

https://doi.org/10.1371/journal.pone.0163012.g002

Moreover, gene duplication occurs throughout plant evolution, thereby contributing to gene-family expansion and the establishment of new gene functions [31,32]. Paralogous segments created by whole-genome duplication events were identified in previous analyses of the soybean genome [24]. Fifty-four duplicate pairs relative to the corresponding duplicate blocks are illustrated in Fig 3 and S1 Table, including 53 segmental duplications and one tandem duplication. Twenty-three PG pairs were clearly located in collinear regions and formed eight blocks. For example, GmPG004, 005, 006 showed extensive collinearity corresponding to the duplicated regions GmPG021, 020 and 019; GmPG045, 046, 047, 048 were duplicated by GmPG081, 082, 083, 084, and 086, which were collinearly arranged. These results indicated that these eight blocks were derived from large-scale duplications of their associated blocks. Moreover, GmPG033 and GmPG039 were flanked by GmPG054, whereas GmPG056, 083, and 094 were flanked by GmPG047, indicating that these are products of a block duplication. All of these findings clearly suggest that several members of the PG family were derived from large-scale duplication events.

Fig 3. Chromosomal locations and region duplication of GmPG genes.

A total of 112 GmPG genes are mapped to the 20 chromosomes (Chr) on the basis of JGI soybean Genome version 7.0. Each pair of duplicated PG genes is connected with a red line, generating a total of 54 gene pairs. Segmentally duplicated homologous blocks are indicated with bars of the same color, for a total of 8 predicted duplication regions. Tandemly duplicated genes are indicated with a yellow box. The chromosome number is indicated above each chromosome. The scale is in megabases (Mb); the scale bar represents a 3.5 Mb chromosomal distance.

https://doi.org/10.1371/journal.pone.0163012.g003

Differential expression profile of soybean PG genes

Gene expression patterns may provide important clues to gene function [33]. We therefore obtained publicly available RNA-seq data across six soybean tissues and seven seed developmental stages [34]. RNA-seq data analysis showed that 64 GmPG genes had sequence reads in at least one tissue (Fig 4). These genes were clustered into five groups (A-E) based on their expression profiles in the soybean tissues (except seeds) and into four groups (I-IV) based on their expression patterns during the seven soybean seed development stages (Fig 5). The genes in clusters A-E were mainly expressed in flower, root/flower/pod, root, nodule, and pod/leaf/root/flower, respectively. Further, most genes of cluster II were highly expressed at the earlier stages of seed development. Most genes of cluster III were expressed during the whole soybean seed development process. Thus, the wide expression of these genes illustrated that soybean PG genes had extensive functional divergence. Moreover, many genes showed a distinct tissue-specific expression pattern, suggesting specific roles in particular stages of development. For instance, six genes (GmPG038, 022, 007, 031, 023, and 025) displayed specific expression in the flower of soybean. Three genes (GmPG026, 072, and 086) had significant transcript accumulation in the root. GmPG034 and GmPG112 were highly expressed in the nodule. Four genes (GmPG021, GmPG039, GmPG063, and GmPG102) were primarily expressed at the seed development stage. In addition, most members of subfamilies CI10-CI13 accumulated transcripts in various tissues. Subfamily CI9 GmPGs were mainly expressed in the flower. Some GmPG genes whose proteins lacked domain 1 showed a relatively low expression level.
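The criterion "sequence reads in at least one tissue" and the idea of a predominant tissue can be sketched as below. This is an illustrative sketch only; the gene names, tissues, and read counts are made up, and the helper names `expressed_genes` and `top_tissue` are hypothetical rather than part of the published analysis, which used Cluster 3.0 for hierarchical clustering.

```python
def expressed_genes(table):
    """Names of genes with a read count above zero in at least one tissue."""
    return sorted(g for g, counts in table.items()
                  if any(v > 0 for v in counts.values()))

def top_tissue(counts):
    """Tissue with the highest value; ties resolve to the first tissue listed."""
    return max(counts, key=counts.get)

# Made-up normalized read counts for three placeholder genes.
table = {
    "geneA": {"flower": 120, "root": 0, "nodule": 0},
    "geneB": {"flower": 0, "root": 0, "nodule": 0},
    "geneC": {"flower": 3, "root": 45, "nodule": 9},
}
expressed = expressed_genes(table)
print(expressed)
print({g: top_tissue(table[g]) for g in expressed})
```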

Fig 4. Expression profile of GmPG genes in different tissues.

The numbers in the expression profile are normalized read counts derived from the raw data. All data were obtained from the SoyBase database.

https://doi.org/10.1371/journal.pone.0163012.g004

Fig 5. Heat map of 64 expressed GmPG genes in different tissues.

(A) Heat map showing hierarchical clustering of 61 expressed GmPG genes among various tissues analyzed. (B) Heat map showing hierarchical clustering of 37 expressed GmPG genes during the development of soybean seeds.

https://doi.org/10.1371/journal.pone.0163012.g005

Duplicate genes may have different evolutionary fates: nonfunctionalization, neofunctionalization, or subfunctionalization, which may be indicated by divergence in expression patterns [35]. In this study, we investigated the functional redundancy of the GmPG genes, which have a high proportion of segmental/tandem duplications. Of the 54 homologous pairs of GmPG genes, 16 paralogous pairs shared similar expression patterns; for example, GmPG001/008, GmPG004/021, GmPG043/101, GmPG028/054, and GmPG048/084. In contrast, the expression patterns of 35 duplicate pairs were only partially redundant, and distinct pattern shifts could be discerned. For example, GmPG112 was mainly expressed in the nodule, whereas its duplicate counterpart GmPG002 was hardly expressed in any tissue. GmPG071 showed a high expression level in seeds, but its duplicate counterpart GmPG049 showed a relatively low expression level. GmPG006 had a broader expression pattern across tissues, while its duplicate counterpart GmPG019 had no expression. The other three gene pairs had almost no corresponding data in the various tissues. These findings suggested that expression profiles had diverged substantially after gene duplication. Consequently, we speculate that GmPGs have been retained by substantial subfunctionalization during the soybean evolutionary processes.
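The comparison of expression patterns between duplicates can be made concrete with a correlation-based sketch. The text does not state how similarity was scored, so the Pearson correlation, the 0.8 threshold, and the three labels below are assumptions chosen for illustration; the expression vectors are made up.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length vectors; None if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sxx * syy)

def classify_pair(x, y, threshold=0.8):
    """Label a duplicate pair from its two per-tissue expression vectors."""
    if not any(x) and not any(y):
        return "no data"        # neither copy has reads in any tissue
    r = pearson(x, y)
    if r is not None and r >= threshold:
        return "similar"
    return "diverged"           # includes pairs where one copy is silent

# Made-up expression vectors over the same four tissues.
print(classify_pair([10, 0, 2, 0], [20, 1, 5, 0]))
print(classify_pair([10, 0, 2, 0], [0, 0, 0, 0]))
print(classify_pair([0, 0, 0, 0], [0, 0, 0, 0]))
```

Under this scheme a pair in which one copy is silent is counted as diverged, which mirrors the GmPG006/GmPG019 example in the text.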

Artificial selection analysis for GmPGs during soybean domestication

Cultivated soybean was domesticated from wild soybean (Glycine soja) in China 5,000 years ago, and large numbers of protein-coding genes underwent selection during soybean domestication. Here, we present a survey of selection effects on the 112 soybean PG genes during soybean domestication based on sequence diversity analysis in soybean populations including 62 G. soja, 130 landraces, and 110 improved cultivars [27]. We determined that 1,726 selected SNPs existed in 107 soybean PG genes, including 452, 1,110, 88, and 66 SNPs in exon, intron, 5'UTR, and 3'UTR, respectively (S2 Table). Moreover, most of these genes had more than one selected site. These results suggested that the sites had experienced selection during soybean domestication and improvement. The SNPs were distributed throughout the 20 soybean chromosomes, mainly on Chr09, Chr14, Chr15, and Chr19 (S2 Fig). Among these, the genetic diversity of 549 sites decreased significantly from wild soybeans to landraces and to cultivars. In contrast, the genetic diversity of 542 sites increased sharply in cultivars compared with that of landraces or wild soybeans. These results suggested that many sites may have been artificially selected to meet human needs or to adapt to the environment. In addition, a SNP with a reverse distribution in different evolutionary types of soybean was defined as a strongly selected site [30]. A total of 594 strongly selected sites were identified and located in 91 GmPGs (S3 Table). The PG genes with one or more types of reverse distribution were therefore assumed to have undergone artificial selection during soybean domestication.

Domestication involves the genetic modification of functional units. Accordingly, we found that 86 GmPGs had nonsynonymous selected site(s) in their coding sequence (CDS), and 42 of them had sequence reads in at least one tissue (S4 Table). Interestingly, the nonsynonymous selected site(s) of six PGs (PG007, 011, 022, 031, 034, and 107) had a single haplotype in wild soybeans and multiple haplotypes in cultivars. These genes were mainly expressed in the flower, flower/leaf/pod, flower, flower, nodule, and seed, respectively. Moreover, the nonsynonymous selected site(s) of three PGs (PG039, 054, and 074) had a single haplotype in landraces and multiple haplotypes in wild soybeans. These genes had high transcript accumulation in various tissues. These selected sites may have caused functional changes in the corresponding GmPGs during soybean domestication.

Recently, more evidence has demonstrated the significance of PG genes in flower development [13,14]. To address whether the nonsynonymous selected site(s) may influence flower development, PG007, 011, 022, 031, 039, 054, and 074 mentioned above were selected as study candidates. Because more than two selected sites were identified in PG007, 011, 039, 054, and 074, a single haplotype could hardly be resolved using one SNP. Moreover, the nonsynonymous SNP of PG022 showed lower diversity than the nonsynonymous SNP of PG031 between 17 wild soybeans and 14 cultivars according to Lam's resequencing data [36]. Finally, we selected a specific SNP from GmPG031 to explore the preliminary role of soybean PG genes during soybean domestication.

Allelic variation of soybean PG031 gene in wild and cultivated soybean populations

Compared to the Williams 82 reference sequence, the SNP described above was detected at the 867th nucleotide from the start codon of the PG031 gene. The SNP caused an amino acid substitution from histidine (H) in wild soybean to tyrosine (Y) in cultivars at the 289th amino acid of the PG031 protein. The SNP was genotyped with CAPS-specific primers that amplified a 623-bp fragment containing the SNP; Nsi I digested this fragment into 217-bp and 406-bp fragments in the PG031^289H type, but not in the PG031^289Y type (Fig 6A).
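The correspondence between the nucleotide position and the amino acid position quoted above follows from reading the coding sequence in triplets. The short sketch below shows the arithmetic, assuming 1-based positions counted from the first base of the start codon and an uninterrupted coding sequence; the helper name `codon_index` is hypothetical.

```python
def codon_index(nt_pos):
    """1-based codon (amino acid) index for a 1-based coding-sequence position."""
    if nt_pos < 1:
        raise ValueError("positions are 1-based")
    return (nt_pos - 1) // 3 + 1

# Nucleotides 865-867 all belong to the same codon.
print([codon_index(p) for p in (865, 866, 867)])
```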

Fig 6. CAPS marker used to detect the SNP in PG031.

(A) A fragment of 623 bp harboring the SNP can be digested by Nsi I in PG031^289H, but not in PG031^289Y. (B) Genotypic constitution at the CAPS marker for the soybean population (n = 815).

https://doi.org/10.1371/journal.pone.0163012.g006

To elucidate the polymorphism of the SNP alleles, we conducted genotype analysis using the minicore collection of the Chinese soybean landraces. Our results showed that the SNP locus was PG031^289H in the entire G. soja population; in cultivars, the SNP was mainly PG031^289H, whereas PG031^289Y accounted for 26% (Fig 6B and S5 Table). This was consistent with the results presented in S4 Table. Therefore, we speculated that the SNP substitution of PG031 might have resulted in a loss or gain of function.

PG031 is involved in flower development

To elucidate the role of the soybean PG031 gene in the flower, we investigated the tissue specificity of PG031 expression. For this, we fused a 1,786-bp fragment (upstream of the GmPG031 translation initiation site) to the GUS reporter gene and introduced it into Arabidopsis via Agrobacterium-mediated transformation. The transgenic seedlings of the T3 generation were subjected to GUS staining for activity analysis. We observed strong GUS activity in the flowers of PG031-GUS plants (Fig 7A). Specifically, strong activity appeared in both pollen and pollen tubes but not in the buds (Fig 7B–7E). Moreover, RT-PCR amplification from DN50 (PG031^289H type) and TK780 (PG031^289Y type) showed a significant increase in PG031 transcripts in the flower, but there were no transcripts in the leaf/stem/seed (Fig 7F). The PG031 gene thus showed a flower-specific expression pattern, which was in agreement with the RNA-seq profile (Fig 4). We therefore speculate that the PG031 gene might play a role in flower development.

Fig 7. GmPG031 expressed in flowers and siliques.

(A) GUS staining of flowers of GmPG031pro::GUS transgenic plants. (B-E) Close-up images of the transgenic plant flowers in A. (F) RT-PCR identification of GmPG031 in different tissues including 15-d-old leaf (15L), 60-d-old leaf (60L), stems (S), stem tip (ST), roots (R), flowers (F), pod (P) and immature embryos (IC) in soybean DN50 and TK780. β-tubulin was used as an internal control.

https://doi.org/10.1371/journal.pone.0163012.g007

GmPG031 affected flower and silique development

To understand the biological role of soybean PG031 in the flower, the GmPG031^289H and GmPG031^289Y genes were introduced into Arabidopsis under the control of the CaMV 35S promoter. As shown in Fig 8A, growth status seemed similar between GmPG031^289H and GmPG031^289Y transgenic plants. When they entered the flowering stage, some of the inflorescences died in 35S::GmPG031^289Y plants, but survived in WT and 35S::GmPG031^289H plants. In addition, 35S::GmPG031^289Y siliques survived but appeared less full, curved, and shorter, and attained early maturity, whereas the WT and 35S::GmPG031^289H siliques developed at a normal rate (Fig 8B and 8C). The rate of full siliques in 35S::GmPG031^289H plants was significantly higher than that of 35S::GmPG031^289Y plants (Fig 9A). The silique length of 35S::GmPG031^289H plants was significantly longer than that of 35S::GmPG031^289Y and wild-type plants (Fig 9B). In addition, the selected SNP of the GmPG031 gene affected seed number and weight. Because the siliques were less full in the GmPG031^289Y transgenic Arabidopsis, the number of seeds was lower than that of GmPG031^289H plants. However, the 1,000-seed weight of 35S::GmPG031^289Y was significantly higher than that of 35S::GmPG031^289H (Fig 9C, P-value<0.05).

Fig 8. GmPG031 gene influences flower and silique development.

(A) Phenotypes of WT, 35S::GmPG031^289Y and 35S::GmPG031^289H plants. Some of the 35S::GmPG031^289Y transgenic plants have dead inflorescences. (B) 35S::GmPG031^289Y siliques appeared less full, whereas WT and 35S::GmPG031^289H siliques appeared relatively full. (C) A series of siliques at different positions were compared between transgenic and wild-type plants (Stage 15). Different positions are shown in A.

https://doi.org/10.1371/journal.pone.0163012.g008

Fig 9. Identification of siliques in GmPG031 transgenic and wild-type plants.

(A) The rate of full siliques in GmPG031^289H plants was higher than that in GmPG031^289Y transgenic Arabidopsis. (B) Silique length of WT, 35S::GmPG031^289Y and 35S::GmPG031^289H plants. (C) The 1,000-seed weight of 35S::GmPG031^289Y was significantly higher than that of 35S::GmPG031^289H plants. Stars indicate significantly different groups (P < 0.05, Tukey test).

https://doi.org/10.1371/journal.pone.0163012.g009

Seed weight is an important trait of soybean domestication. Our phenotypic observations on seed weight in transgenic Arabidopsis have proven that the detected SNP occurred during the domestication of wild soybean and is responsible for the difference in seed weight between wild and cultivated soybeans.

Discussion

PG is one of the major enzymes involved in pectin disassembly, catalyzing the hydrolytic cleavage of α-(1–4)-galacturonan [37, 38]. Pectin is one of the major components of the primary cell wall, and its disassembly leads to cell wall separation [39, 40]. Therefore, PG has been suggested to play important roles in many stages of plant development, particularly in various cell separation processes [41, 42].

PGs belong to a large gene family in plant genomes. A recent study identified 66, 75, and 44 PG genes from the Arabidopsis, Populus, and rice genomes, respectively, which were divided into three classes [43]. In this study, we found that the 112 PGs from the soybean genome could be classified into three distinct groups (Fig 1). Compared with Arabidopsis, Populus, and rice, the soybean genome contained the highest number of PG members. This suggests that multiple GmPG genes may be required to support the more complex organ systems and structures of soybean. Furthermore, RNA-seq showed that 64 GmPGs had sequence reads in at least one tissue. Of the 112 GmPGs, 48 (43%) showed transcript accumulation in flower tissues, 42 (38%) in roots, 30 (27%) in leaves, and 34 (30%) in pods, and 37 (33%) showed the highest transcript accumulation in seeds (Figs 4 and 5). Variations in the expression levels of GmPGs indicate that multiple GmPGs are necessary for the complicated transcriptional regulation that occurs during the development of all organs and tissues in soybean.
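The tissue percentages quoted above follow directly from the reported gene counts. The short Python sketch below reproduces that arithmetic as a check. The counts are taken from the text; the variable and function names are illustrative only and are not part of the authors' analysis pipeline.

```python
# Counts of GmPGs with transcript accumulation per tissue, as reported in the text.
TOTAL_GMPGS = 112
tissue_counts = {"flower": 48, "root": 42, "leaf": 30, "pod": 34, "seed": 37}


def percent_of_family(count, total=TOTAL_GMPGS):
    """Return count as a whole-number percentage of the family (halves round up)."""
    return int(100 * count / total + 0.5)


# Hypothetical helper structure: tissue name -> rounded percentage of all GmPGs.
percentages = {tissue: percent_of_family(n) for tissue, n in tissue_counts.items()}

for tissue, pct in percentages.items():
    print(f"{tissue}: {tissue_counts[tissue]}/{TOTAL_GMPGS} = {pct}%")
```

Because a single gene can be expressed in several tissues, the per-tissue counts overlap, so they do not sum to the 64 genes detected in at least one tissue.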

To our knowledge, the primary causes of gene-family expansion include segmental duplication, tandem duplication, and transposition events. When mapped onto the soybean genome, GmPG genes appear to have been generated mainly through segmental duplication events and are non-randomly distributed across all 20 chromosomes. Moreover, in some duplicate pairs, one gene showed a relatively low expression level whereas the other showed functional diversity, which may lead to neofunctionalization or subfunctionalization. These results support the theory that segmental duplication events may distribute duplicated genes widely across the genome and could lead to the loss of many functionally redundant genes to avoid a fitness cost [44, 45]. Moreover, alternative splicing (AS) contributes to gene-family diversity by generating various gene isoforms for differential expression.

With the rapid development of next-generation sequencing technology, powerful genomic approaches have been used to screen for selective sweeps or selected genes at a genome-wide level [36, 46]. Thus far, large numbers of protein-coding genes that underwent selection during domestication have been identified in soybean, such as the GmCupin and GmPLC genes [30, 47]. Similarly, PGs are involved in diverse biological processes. Therefore, we comprehensively investigated SNPs in soybean PG genes using 302 resequenced soybean accessions in this study. Our results revealed that most of the GmPGs underwent strong natural selection during soybean domestication, with some exhibiting a certain degree of variation. Whether these SNPs confer unique functional roles remains to be investigated.

To determine whether the identified SNPs play a putative functional role in plant development, one SNP of the GmPG031 gene was selected as a preliminary candidate according to the RNA-seq data and evolutionary analysis results. We then performed genotype, promoter pattern, and transgene expression analyses. Genotype analysis indicated that the SNP of the GmPG031 gene underwent strong selection during soybean domestication. Promoter and over-expression analyses indicated that the selected SNP apparently affected floral development (Figs 6 and 7). Thus, our study suggests that the differential selection patterns of GmPG genes may be associated with their functions. Although experimental evidence for the involvement of PG genes in soybean development or organogenesis is lacking, understanding the role of PGs in domestication may help answer fundamental biological questions and enhance our ability to engineer crops.

Supporting Information

S1 Fig. Conserved domains across PG proteins in soybean.

https://doi.org/10.1371/journal.pone.0163012.s001

(TIF)

S2 Fig. Distribution of the selected SNPs in soybean chromosomes.

https://doi.org/10.1371/journal.pone.0163012.s002

(TIF)

S1 File. Transcript sequences of soybean PG genes.

https://doi.org/10.1371/journal.pone.0163012.s003

(TXT)

S1 Table. Gene duplication and gene blocks of the GmPG genes.

https://doi.org/10.1371/journal.pone.0163012.s004

(XLSX)

S2 Table. Selected sites of soybean PG genes during soybean domestication.

https://doi.org/10.1371/journal.pone.0163012.s005

(XLSX)

S3 Table. Strong selected sites of GmPGs during soybean domestication.

https://doi.org/10.1371/journal.pone.0163012.s006

(XLSX)

S4 Table. Nonsynonymous selected sites of soybean PG genes during soybean domestication.

https://doi.org/10.1371/journal.pone.0163012.s007

(XLSX)

S5 Table. Polymorphism analysis of candidate SNP in soybean PG031 gene.

https://doi.org/10.1371/journal.pone.0163012.s008

(XLS)

Acknowledgments

The authors are grateful to the providers who submitted the RNA-seq and resequencing data to the public databases. We would also like to thank LetPub (www.letpub.com) for its linguistic assistance in the preparation of this manuscript.

Author Contributions

  1. Conceptualization: FW XY.
  2. Data curation: FW XS.
  3. Formal analysis: FW HZ.
  4. Funding acquisition: FW.
  5. Investigation: FW XyS.
  6. Methodology: FW XS.
  7. Project administration: CT.
  8. Resources: BL.
  9. Software: XyS.
  10. Supervision: XY BL FK.
  11. Validation: HZ.
  12. Visualization: FW FK.
  13. Writing – original draft: FW.
  14. Writing – review & editing: FW XY.

References

  1. Rose JK, Bennett AB. Cooperative disassembly of the cellulose-xyloglucan network of plant cell walls: parallels between cell expansion and fruit ripening. Trends Plant Sci. 1999; 4: 176–183. pmid:10322557
  2. Cosgrove DJ. Expansive growth of plant cell walls. Plant Physiol Biochem. 2000; 38: 109–124. pmid:11543185
  3. Roberts JA, Elliott KA, Gonzalez-Carranza ZH. Abscission, dehiscence, and other cell separation processes. Annu Rev Plant Biol. 2002; 53: 131–158. pmid:12221970
  4. Popper ZA. Evolution and diversity of green plant cell walls. Curr Opin Plant Biol. 2008; 11: 286–292. pmid:18406657
  5. Bosch M, Hepler PK. Pectin methylesterases and pectin dynamics in pollen tubes. Plant Cell. 2005; 17: 3219–3226. pmid:16322606
  6. Patterson SE. Cutting loose, Abscission and dehiscence in Arabidopsis. Plant Physiol. 2001; 126: 494–500. pmid:11402180
  7. Markovic O, Janecek S. Pectin degrading glycoside hydrolases of family 28: sequence-structural features, specificities and evolution. Protein Eng. 2001; 14: 615–631.
  8. Hadfield KA, Bennett AB. Polygalacturonases: many genes in search of a function. Plant Physiol. 1998; 117: 337–343. pmid:9625687
  9. Torki M, Mandaron P, Thomas F, Quigley F, Mache R, Falconet D. Differential expression of a polygalacturonase gene family in Arabidopsis thaliana. Mol Gen Genet. 1999; 261: 948–952. pmid:10485285
  10. Sitrit Y, Downie B, Bennett AB, Bradford KJ. A novel exo-polygalacturonase is associated with radicle protrusion in tomato (Lycopersicon esculentum) seeds. Plant Physiol. 1996; 111: 155–161.
  11. Sitrit Y, Hadfield KA, Bennett AB, Bradford KJ, Downie AB. Expression of a polygalacturonase associated with tomato seed germination. Plant Physiol. 1999; 121: 419–428. pmid:10517833
  12. Babu Y, Musielak T, Henschen A, Bayer M. Suspensor length determines developmental progression of the embryo in Arabidopsis. Plant Physiol. 2013; 162: 1448–1458. pmid:23709666
  13. Ogawa M, Kay P, Wilson S, Swain SM. ARABIDOPSIS DEHISCENCE ZONE POLYGALACTURONASE1 (ADPG1), ADPG2, and QUARTET2 are polygalacturonases required for cell separation during reproductive development in Arabidopsis. Plant Cell. 2009; 21: 216–233. pmid:19168715
  14. Xiao C, Somerville C, Anderson CT. POLYGALACTURONASE INVOLVED IN EXPANSION1 functions in cell elongation and flower development in Arabidopsis. Plant Cell. 2014; 26: 1018–1035. pmid:24681615
  15. Atkinson RG, Schroder R, Hallett IC, Cohen D, MacRae EA. Overexpression of polygalacturonase in transgenic apple trees leads to a range of novel phenotypes involving changes in cell adhesion. Plant Physiol. 2002; 129: 122–133. pmid:12011344
  16. Kramer MG, Redenbaugh K. Commercialization of a tomato with an antisense polygalacturonase gene: The FLAVR SAVR tomato story. Euphytica. 1994; 79: 293–297.
  17. Orozco-Cardenas ML, Ryan CA. Polygalacturonase β-subunit antisense gene expression in tomato plants leads to a progressive enhanced wound response and necrosis in leaves and abscission of developing flowers. Plant Physiol. 2003; 133: 693–701. pmid:12972668
  18. Cheng Q, Cao Y, Pan H, Wang M, Huang M. Isolation and characterization of two genes encoding polygalacturonase-inhibiting protein from Populus deltoides. J Genet Genomics. 2008; 35: 631–8. pmid:18937920
  19. Kim J, Shiu SH, Thoma S, Li WH, Patterson SE. Patterns of expansion and expression divergence in the plant polygalacturonase gene family. Genome Biol. 2006; 7: R87. pmid:17010199
  20. Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 1997; 25: 4876–4882. pmid:9396791
  21. Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol. 2011; 28: 2731–2739. pmid:21546353
  22. Bailey TL, Johnson J, Grant CE, Noble WS. The MEME Suite. Nucleic Acids Res. 2015; 43:39–49.
  23. Guo AY, Zhu QH, Chen X, Luo JC. GSDS: a gene structure display server. Yi chuan. 2007; 29: 1023–1026. pmid:17681935
  24. Schmutz J, Cannon SB, Schlueter J, Ma J, Mitros T, Nelson W, et al. Genome sequence of the palaeopolyploid soybean. Nature. 2010; 463: 178–183. pmid:20075913
  25. de Hoon MJ, Imoto S, Nolan J, Miyano S. Open source clustering software. Bioinformatics. 2004; 20: 1453–1454. pmid:14871861
  26. Page RD. TreeView: an application to display phylogenetic trees on personal computers. Computer applications in the biosciences: CABIOS. 1996; 12: 357–358. pmid:8902363
  27. Zhou Z, Jiang Y, Jiang Y, Wang Z, Gou Z, Lyu J, et al. Resequencing 302 wild and cultivated accessions identifies genes related to domestication and improvement in soybean. Nat Biotechnol. 2015; 33: 408–414. pmid:25643055
  28. Clough SJ, Bent AF. Floral dip: a simplified method for Agrobacterium-mediated transformation of Arabidopsis thaliana. Plant J. 1998; 16: 735–743. pmid:10069079
  29. Jefferson RA, Kavanagh TA, Bevan MW. GUS fusions: β-glucuronidase as a sensitive and versatile gene fusion marker in higher plants. EMBO J. 1987; 6: 3901–3907. pmid:3327686
  30. Wang X, Zhang H, Gao Y, Sun G, Zhang W, Qiu L. A comprehensive analysis of the Cupin gene family in soybean (Glycine max). PLoS One. 2014; 9: 10.
  31. Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution’s cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci. 2003; 100: 11484–11489. pmid:14500911
  32. Cannon SB, Mitra A, Baumgarten A, Young ND, May G. The roles of segmental and tandem gene duplication in the evolution of large gene families in Arabidopsis thaliana. BMC Plant Biol. 2004; 4: 10. pmid:15171794
  33. Du H, Yang SS, Liang Z, Feng BR, Liu L, Huang YB, et al. Genome-wide analysis of the MYB transcription factor superfamily in soybean. BMC Plant Biol. 2012; 12: 106. pmid:22776508
  34. Severin AJ, Woody JL, Bolon YT, Joseph B, Diers BW, Farmer AD, et al. RNA-Seq Atlas of Glycine max: a guide to the soybean transcriptome. BMC Plant Biol. 2010; 10: 160. pmid:20687943
  35. Prince VE, Pickett FB. Splitting pairs: the diverging fates of duplicated genes. Nat Rev Genet. 2002; 3: 827–837. pmid:12415313
  36. Lam HM, Xu X, Liu X, Chen W, Yang G, Wong FL, et al. Resequencing of 31 wild and cultivated soybean genomes identifies patterns of genetic diversity and selection. Nat Genet. 2010; 42: 1053–1059. pmid:21076406
  37. Lyu M, Yu Y, Jiang J, Song L, Liang Y, Ma Z, et al. BcMF26a and BcMF26b are duplicated polygalacturonase genes with divergent expression patterns and functions in pollen development and pollen tube formation in Brassica campestris. PLoS One. 2015; 10(7):e0131173. pmid:26153985
  38. Li X, Chen GH, Zhang WY, Zhang X. Genome-wide transcriptional analysis of maize endosperm in response to ae wx double mutations. J Genet Genomics. 2010; 37: 749–62. pmid:21115169
  39. Willats WG, McCartney L, Mackie W, Knox JP. Pectin: cell biology and prospects for functional analysis. Plant Mol Biol. 2001; 47: 9–27. pmid:11554482
  40. Daher FB, Braybrook SA. How to let go: pectin and plant cell adhesion. Front Plant Sci. 2015; 6: 523. pmid:26236321
  41. Rodriguez-Gacio MC, Nicolas C, Matilla AJ. Cloning and analysis of a cDNA encoding an endo-polygalacturonase expressed during the desiccation period of the silique-valves of turnip-tops (Brassica rapa L. cv. Rapa). J Plant Physiol. 2004; 161: 219–227. pmid:15022837
  42. Park J, Cui Y, Kang BH. AtPGL3 is an Arabidopsis BURP domain protein that is located to the cell wall and promotes cell enlargement. Front Plant Sci. 2015; 6: 412. pmid:26106400
  43. Yang ZL, Liu HJ, Wang XR, Zeng QY. Molecular evolution and expression divergence of the Populus polygalacturonase supergene family shed light on the evolution of increasingly complex organs in plants. New Phytol. 2013; 197: 1353–1365. pmid:23346984
  44. Song H, Wang P, Hou L, Zhao S, Zhao C, Xia H, et al. Global analysis of WRKY genes and their response to dehydration and salt stress in soybean. Front Plant Sci. 2016; 7: 9. pmid:26870047
  45. Cannon SB, Mitra A, Baumgarten A, Young ND, May G. The roles of segmental and tandem gene duplication in the evolution of large gene families in Arabidopsis thaliana. BMC Plant Biol. 2004; 4: 10. pmid:15171794
  46. Liu T, Fang C, Ma Y, Shen Y, Li C, Li Q, et al. Global investigation of the co-evolution of MIRNA genes and microRNA targets during soybean domestication. Plant J. 2016; 85: 396–409.
  47. Wang F, Deng Y, Zhou Y, Dong J, Chen H, Dong Y, et al. Genome-Wide Analysis and Expression Profiling of the Phospholipase C Gene Family in Soybean (Glycine max). PLoS One. 2015; 10: e0138467. pmid:26421918