Genome-Wide Expression Analysis of Soybean MADS Genes Showing Potential Function in the Seed Development

The MADS family is an ancient and best-studied transcription factor and plays fundamental roles in almost every developmental process in plants. In the plant evolutionary history, the whole genome duplication (WGD) events are important not only to the plant species evolution, but to expansion of members of the gene families. Soybean as a model legume crop has experience three rounds of WGD events. Members of some MIKCC subfamilies, such as SOC, AGL6, SQUA, SVP, AGL17 and DEF/GLO, were expanded after soybean three rounds of WGD events. And some MIKCC subfamilies, MIKC* and type I MADS families had experienced faster birth-and-death evolution and their traces before the Glycine WGD event were not found. Transposed duplication played important roles in tandem arrangements among the members of different subfamilies. According to the expression profiles of type I and MIKC paralog pair genes, the fates of MIKC paralog gene pairs were subfunctionalization, and the fates of type I MADS paralog gene pairs were nonfunctionalization. 137 out of 163 MADS genes were close to 186 loci within 2 Mb genomic regions associated with seed-relative QTLs, among which 115 genes expressed during the seed development. Although MIKCC genes kept the important and conserved functions of the flower development, most MIKCC genes showed potentially essential roles in the seed development as well as the type I MADS.


Introduction
The MADS family, found in fungi [1], animals [2] and plants [3] [4], possesses a highly conserved N-terminal with a DNAbinding domain named MADS. Based on the phylogenetic analysis, MADS gene family is divided into two large lineages, type I and type II, which was created through a gene duplication occurred before the divergence of plants (and fungi) and animals [5][6][7].
In plant, the typical difference between type II MADS genes and type I MADS genes is that the plant type II, but not type I, has a K-domain [5,6,8]. The plant-special type II MADS is also named as MIKC MADS due to their four domains, MADS domain, I domain, K domain, and C domain [8,9]. and except MADS-domain and K-box domain, I-domain and C-domain are not conserved although they have important functions for the MADS family [9,10]. MIKC type can be further divided into MIKC C and MIKC* clade, both of which were present in a common ancestor of mosses and vascular plants, suggesting they are an ancestral kind of genes [11]. About 13 subfamilies compose of the MIKC C clade and most of them originate from ancestral seed plants and are often characterized by distinct sequence motifs in their C-terminal domains [5,12]. And based on the phylogenic tree, AG-, AGL6-, AGL12-, DEF+GLO-(B), GGM13-(B(s)), STMADS11-and TM3-like genes very likely existed already in the most recent common ancestor of angiosperms and gymnosperms and AGL2-, AGL17-, and SQUA-like genes, existed at least already in the most recent common ancestor of monocots and eudicots [5]. MIKC* clade is characterized by an altered protein domain structure, probably evolved from an ancestral MIKC C gene by a duplication in the Keratin-like region and composed of S-clade, P-clade, lycophyte-clade and Bryophyte-clade [7,13]. Heterogeneous type I MADS can be subgrouped into Ma, Mb, and Mc based on the sequence of the MADS domain and the presence of additional motifs [5,[14][15][16].
Since the plant MADS genes, AGAMOUS (AG) from Arabidopsis thaliana [3] and DEFICIENS (DEF) from Antirrhinum majus [4] are first discovered as regulators of floral organ identity, a lot of plant MIKC C genes have been isolated from various plant species and demonstrate their essential roles in almost all developmental processes in plants, such as the control of flower identity, root architecture, gametophyte development, fruit ripening, the regulation of flowering time ( [17][18][19][20]). By contrast, less attention is paid for type I MADS genes. Recent studies indicate a key regulatory role for type I MADS genes in plant specifying female gametophyte, embryo, and endosperm development [20,21]. MIKC* genes retained a conserved role in the gametophyte during land plant evolution [13]. And five Arabidopsis MIKC* genes (AGL30, AGL65, AGL66, AGL94, and AGL104) are expressed in pollen and regulate pollen development by repressing immature pollen genes and activating mature pollen genes [22,23].
Extensive duplications in the angiosperms have resulted in the expansion of members of the gene families and gene diversifications. And duplications in plant MADS transcription factors have been studied to understand the origins and evolution of plant developmental mechanisms [24,25], and the results demonstrated that duplicated genes either swapped roles, acquired novel roles or retained ancestral roles in different plant species [26]. The modern soybean genome has apparently undergone one whole-genome triplication (WGT) and two whole genome duplication (WGD) events (Legume WGD and Glycine WGD), and about 75% of genes have multiple paralogs [27,28]. Among them, ,50% of paralogous genes display expression subfunctionalization [29], which may contribute to phenotypic variation in polyploids [30]. In addition to WGD duplication, tandem duplication generates gene copies that are consecutive in the genome and is presumed to arise through unequal chromosomal crossing over [31] and may contribute to the expansion of some large gene families [32]. Dispersed duplicates are neither adjacent to each other in the genome nor within homeologous chromosome segments and may result from transposition events [31,33]. Such distantly transposed duplications may occur by DNA-based or RNA-based mechanisms [34]. A total of 32,552 retrotransposons (Class I) and 6,029 DNA transposons (Class II) are found in the soybean genome [35]. And transposed duplication may play significant roles in shaping and reshaping of their host genomes regulating gene expression, altering gene function, and creating new genes [36,37].
MADS families are identified and classified in many flowering plants such as Arabidopsis [14], petunia [38], tomato [39], poplar [40], rice [41], grapevine [42], maize and sorghum [43], cucumber [44]. But only genome-wide expression profiles of all the MADS genes have been reported in Arabidopsis and rice [14,15,41], and a comprehensive plant protein-protein interactome map of nearly all members of the Arabidopsis MADS family has been constructed to investigate the essential roles of Arabidopsis biological processes [45]. Expression profiles of MIKC C genes of conserved known biological functions are systematically analyzed in some plants [39,42].These accomplishments benefit us to comprehensively understand the plant MADS family and their functions in plant development.
Soybean is one of the most important crop plants for seed protein and oil content, which are associated with the seed development, and genetic control of agronomic traits, such as seed constituents and yield, is inherited in a quantitative manner. Based on the soybase database (http://www.soybase.org) [46], about 2200 soybean quantitative trait loci (QTLs) were associated with about 134 soybean agronomic traits, among which about 1000 QTLs were relative to the seed or pod development. Furthermore, the soybean genome has been completely sequenced [47]. Therefore, QTLs can be mapped to the genome and provide the primary insight for understanding potential functions of colocalized genes in the soybean development.
The MADS family has obviously functional roles in the plant development. Soybean MADS genes and the characters of their evolution in the soybean genome evolution process were identified through the bioinformatic analysis. And based on the genome-wide expression patterns through in silico expressions and RT-qPCR, more attentions were paid to soybean MADS expressions in the seed development. In addition, co-localizations of MADS genes with QTLs relative to seed traits were investigated. According to our results, not only did type I MADS highly express in the seed development [21,48,49], but also MIKC C genes had important functions in the seed development, except the conserved and best-studied function of the flower development.
GmMADS genes were distributed on 20 chromosomes, especially in GM8 (18 MADS genes), GM10 (16 MADS genes) and GM18 (18 MADS genes) ( Figure 1). And according to the characters of MADS gene distributions, the location of MIKC and Mc genes were not biased in the chromosomes, but Ma genes mainly located in the end of GM10 through tandem or proximal duplicates, as well as Mb genes in GM11 and GM18.
Based on the motif organization of 163 soybean MADS proteins, all soybean MADS proteins almost had three motifs, motif 1, 2 and 6 ( Figure 2 and Figure S2). And motif 1 (29 aa, Table S2) and 2 (14 aa, Table S2)  In term of the gene structure, the fist exon (about 180 bp) conservatively coded the MADS domain in the MADS genes with more than one exons (Table S1). 7 MIKC* genes had 9-11 exons, and covered about 4.7 kb in length in the genome. For the MIKC C genes, the number of exons were 4-9, and average number was about 7, and covered about average about 7.4 kb in length from 0.75 kb to 18.6 kb, and genome sequences of about 82.7% MIKC C genes were longer than 5 kb. About 44/75 type I MADS genes had only one exon and 12/75 type I MADS genes had more than 4 exons, and the average exon number of Ma, Mb, and Mc genes was 1.6, 2.4, and 2.1 respectively, and the average genome sequences were about 0.6, 1.2, and 0.8 kb respectively.

Different Expansion Patterns of Two Types of MADS within the Soybean Genome
To investigate GmMADS gene evolution in soybean, the syntenic relationships among G.max, M.truncatula, experiencing the WGT and Legume WGD events [50] and V.vinifera, experiencing the WGT event [51], were computed through MCScanX [52]. Based on our results, the blocks containing some MIKC genes, which  (Table S1). The short black lines in the green arcs showed the markers associated with non-seed QTLs and those in the red short blocks showed the markers associated with seed QTL (Table S4). The red blocks showed regions of QTLs relative to the seed traits according to the markers (Table S4). The light blue rainbows showed collinear relationships among the blocks containing MADS genes according to the MCScanX results (Table  S3) (Table S3). But for the type I genes, the inter-species syntenic relationships can hardly been found. That indicated gene orders of the blocks containing MIKC genes were more conserved than that containing type I MADS genes during the soybean evolution process.
WGD events expanded the members of the MIKC family, and blocks containing about 85% (75/88) MIKC genes experienced WGD events (Figure 1, 3 and Table S3). Based on syntenic blocks, all the genes of subfamily AG (10 genes), AGL12 (2 genes), ABS (2 genes), FLC (2 genes), and SOC (8 genes) were originated from 3, 1, 1, 1, and 1 different ancestor sites before the Gamma WGT events respectively. In addition to the WGD duplication, other gene duplication patterns were found in some MIKC C subfamilies. In SQUA subfamily, six paralog genes (GmMADS28/32, 29/30, and 31/159) were the resultants of a common ancestor sites experiencing three WGD events, and one paralog gene pair (GmMADS24/26) were diverged after the Glycine WGD event, while GmMADS25 and GmMADS27 were the results of the transposed duplication (Table S3- 4). Seven members of AGL17 subfamily were originated from an ancestor sites before the Gamma WGT events, and GmMADS84 was tandem connection to GmMADS160. In SVP subfamily, six paralog genes were derivates of a common ancestor, while GmMADS124 and 162 through the tandem duplication and a dispersed gene (GmMADS51) through the transposed duplication (Table S3-4). Eight DEF/GLO genes were originated from two different ancestor sites, and the transposable elements occurred at the up/dwonstrean of GmMADS121, 133, and 147, which were dispersed in the genome (Table S-3). The origins of 10/11 SEP genes were 3 different ancestor sites before the Gamma WGT events, and a transposed duplication member, GmMADS71, was proximal to GmMADS23 (AGL6) (Table S3-4). Three paralog gene pairs of the AGL6 subfamily GmMADS21/91, GmMADS22/23 and GmMADS34/36 were from the common ancestor site before three rounds of WGD events, but GmMADS69 located among two transposable elements had the tandem relationship with GmMADS159 (SQUA) ( Table  S3-4).
Only AGL15 subfamily had not any paralog genes and was composed of two dispersed genes, GmMADS122 (GM12) and GmMADS43 (GM2), which were located among two transposable elements (Table S4). In the syntenic blocks, some MIKC C genes of different families showed the tandem relationship. For example, paralog SEP gene pairs GmMADS33/37 and GmMADS17/19 displayed tandem relationship with GmMADS15/73 (FLC subfamily) and GmMADS28/32 (SQUA subfamily) respectively ( Figure 1). That may result from the transposable duplications (Table S4).
For MIKC* gene family, the members of MIKC*-S (GmMADS54/55 and GmMADS68) and -P (GmMADS70/134 and GmMADS56) experienced the same evolution processes. The paralog pair genes were diverged after the Glycine WGD event (Figure 1, 3 and Table S3). And GmMADS56 and GmMADS68 did not have the corresponding gene pairs in the soybean syntenic blocks, but their homologous blocks and collinear pairs were found in Medicago (two MIKC*genes, MtMADS52 and MtMADS23 respectively) ( Table S3), suggesting that their collinear genes in their soybean syntenic blocks were not retentive after the Legume WGD events.
Multiple duplications made the number of the type I GmMADS  Table S4). That suggested that type I genes had experienced a higher rate of birth-and-death evolution than type II genes.

Soybean MADS Genes Co-localized with QTLs for the Seed-relative Features
Based on the soybean QTL database (http://www.soybase.org/ search/qtllist.php), about 269 loci associated with 807 QTLs for 112 traits were found within 2-Mb genomic regions surrounding 148 soybean MADS genes ( Figure 1 and Table S4). And 186 out of 269 loci were associated with 372 QTLs for 59 seed-relative traits, containing 295 QTLs relative to seed traits (constituent or size), 32 to pod maturity date, one to R3 beginning pod, one to R8 full maturity and 43 to the yield (Table S5) Table S5).

Soybean MADS Genes Showing Highly Expression in the Seed Development through in silico Transcriptome
For the whole expression analyses of the soybean MADS family, the RPKM method was employed to correct biases in total gene exon size and to normalize for the total short read sequences obtained in 17 tissue libraries [53,54]. And then relative RPKM values represented the relative expression of each MADS genes (Figure 3, 4 and Table S6). From the online database, 25 GmMADS genes (19 for type I and 6 for type II), were undetectable at the transcription level in all 17 tissues; 29 genes (22 for Ma, 1 for Mb and 6 for Mc, were detected only in seed tissues; 9 genes, (5 for MIKC c , 2 for MIKC*, and 2 for Ma, were detected only in non-seed tissues; 37 genes (34 for MIKC and 4 for type I), had no biased tissues and wide expression with fluctuant levels.
A hierarchical clustering analysis of transcription profiles in 17 tissues based on a Pearson correlation displayed 13 clusters for 138 GmMADS genes ( Figure 5 and S3). Most of them (78 genes, 43 for type I and 35 for MIKC) highly expressed in the seed tissues and fell into Cluster I to IX. The genes in different clusters had their own abundant transcripts in different tissues, such as Cluster I and IX in suspensors and embryos at globular stage, respectively, both Cluster IV and V in seed coat parenchymas at the early stage of seed maturation, Cluster VII in seeds at the globular embryo stage, Cluster VIII in seeds at both the globular and heart embryo stages, both Cluster IV and Cluster VI at the seed developing stages, Cluster II in developing cotyledons or in dry seeds, and Cluster III in seeds at the early stage of seed maturation.
There were 36 GmMADS genes of Cluster X and XI highly expressed in flowers. And Cluster X genes also highly expressed in leaves. Cluster XII expressed mainly in roots, while Cluster XIII in both roots and stems ( Figure 5 and S3).
By and large, the expression level of type I GmMADS genes was much lower than that of MIKC, and they were mainly in the seeds as the previous report in other species [49]. The function of MIKC genes is famous as regulators in the floral organ development, but our analysis showed that high expression of some MIKC genes were detected in roots, stems, leaves, and seeds besides flowers in

GmMADS Gene Expression Peaks in the Soybean Seeds
Some GmMADS genes strongly accumulated in the seed in different seed developmental stages ( Figure 6). One expression tendency of them was that the transcription occurred at high level in the seed at the early stage, and then progressively downregulated along with seed development; their transcripts were detected at very low level in the flowers and hardly in the dry seeds. Such a kind of genes included 7 AG genes (GmMADS3, 11, 12, 13, 20, 38, 16, 17, 18, 19, 33, and 71), AG genes (GmMAD-S1and 2), DEF/GLO (GmMADS5 and 10), SVP (GmMADS124) strongly expressed in the seed development ( Figure 8). That indicated these MIKC C genes played important roles in reproductive tissues.
There were also some genes having relative high abundance in multiple tissues besides the flowers (Figure 9). Two FLC genes (GmMADS73 and 15) and two SVP genes (GmMADS50 and 128) showed relatively high expression in the leaves at the seedling stage and at the flowering time respectively. The high level of transcripts of GmMADS69 (AGL6) and GmMADS84 (AGL17) were in roots, besides in flowers.   (Figure 11). The relative expressions of GmMADS31 were strongly detected in the seedling stage stems and lower in other tissues. GmMADS130 highly expressed in the seedling stage stems, but high in the seedling stage roots and leaves and flowering stage roots. The highest transcriptions of GmMADS75 and GmMADS105were in the stems at the flowering time, and they showed relatively high expressions in the roots. And Low transcriptions of all the four genes can also be detected in flower and seeds.  Figure 12). According to the transcription patterns, eight MIKC C genes expressed more highly in the root at the flowering times than at the seedling stage, and 4 MIKC C gene and one Mc gene more highly expressed in the seedling stage roots. Expressions of two AGL6 genes were abundant in the roots and other tissues at the flowering time. And GmMADS67 expressed only in the roots at the seedlings. That indicated 16 genes were more importance of the roots than the flowers and seeds.

Expression Divergences of the Paralog Gene Pairs
From the results above, there were 46 paralog gene pairs found in the soybean MADS gene family. Based on in silico expression data of these pairs in 17 soybean tissues, expression divergence in a sample was obviously evidenced. For example, 236 (about 30.2%) of gene pairs showed only one gene expressed while the other was undetectable; 113 (about 14.5%) of gene pairs showed the ratios between paralog gene pairs were between 1 and 2, 83 (about 10.6%) between 2 and 10, 38 (about 4.9%) above 10 ( Figure 13). The MIKC paralog pair gene expressed high or low in same tissues, whereas most one of type I MADS paralog gene pair did not expressed ( Figure 13 and Table S3). Our results indicated that, during evolutional progress, MIKC paralog gene pairs underwent sub-funtionalization and type I MADS paralog gene pairs underwent non-functionalization.

Soybean MADS Subfamilies Experiencing Different Selection Pressure
At least one ancestral MADS-box gene was present in the common ancestor of plants, animals, and fungi, and probably the duplication that gave rise to the animal MEF2and SRF-like genes occurred after animals diverged from plants but before fungi diverged from animals about 1000 million year ago (MYA) [6,8]. Plant MIKC-type genes and animal MEF2-like genes are monophyletic, not so as plant type I and animal SRF-like genes do [6,8,16]. Subfamilies of MIKC, such as AG-, AGL6-, AGL12-, DEF/ GLO, GGM13-(B (s) ), STMADS11and TM3-like genes, very likely existed already in the most recent common ancestor of angiosperms and gymnosperms about 300 MYA, and AGL2-, AGL17-, and SQUA-like genes, existed at least already in the most recent common ancestor of monocots and eudicots about 200 MYA [5]. So some important events during the phylogeny of species, especially spermatophyte, can be shown through the evolution of MADS gene family [55]. And the soybean genome has experienced two WGD events, the legume WGD and Glycine WGD, after the Gamma WGT event, when the monocots and eudicots diverged [27,28]. So the trace of all soybean MADS genes experienced the three rounds of the WGD events should been found in the soybean genome during the soybean evolution process. But some evidences were found only in the MIKC gene evolutions. The homologous blocks containing MADS genes showed that ancestors of 6 soybean MIKC C subfamilies, SOC (TM3-like), SQUA, AGL6, SVP (STMADS11-like), AGL17, DEF/GLO, existed at least before the Gamma WGT event ( Figure S5), and that AG and SEP (AGL2-like) at least before the legume WGD event (Figure 3). The evolution traces of blocks embodying 3 subfamilies of MIKC (AGL12, FLC and ABS), MIKC* family and all the type I MADS genes were only found after the Glycine WGD event (Figure 3 and 4 and Table S3). That indicated that in the soybean genome evolution, type I MADS-box genes and three MIKC subfamilies and MIKC* family had experienced faster birth-and-death evolution than some MIKC subfamilies.

More Early Divergence of Paraog genes, More Significant Difference of Expressions
In the soybean, duplication events result in a genome with approximately 46,430 'high-confidence' genes, of which 75% are present as more than one copy [28]. And approximately 50% of paralogs from the recent WGD event differentially expressed and thus had undergone expression sub-functionalization through RNA-seq expressions of 7 tissues [29]. Based on our results, most MIKC paralogs from the recent WGD event showed the similar expression profiles, and one of the paralog pair expressed more highly in the most tissue than the other ( Figure 13). But for the paralogs from much deeper duplications, the detached transcriptions were more obvious. For example, three paralog gene pairs of AGL6 subfamily diverged into two clades after the Gamma WGT event, one clade experienced the two WGD events, and then formed one paralog gene pair, GmMADS22/23, and the other was composed of GmMADS34/36 and GmMADS21/91 (Figure 3). The former paralog gene pair strongly expressed in the flower and seeds (Figure 3 and 7), but the two latter paralog genes detached after the legume WGD event highly expressed in the roots at the flowering time (Figure 3 and 12). And other paralog genes of MIKC C subfamilies from different divergence events show similar expression divergences to AGL6 subfamily. That indicated the Gamma WGT event had important roles in vaiable functions of paralogs.

MADS Genes have Important Potential Roles in the Seed Development
Compared with type II MADS, very little is known about type I MADS, but recent studies display a key regulatory role for type I MADS factors in plant reproduction, in particular in specifying female gametophyte, embryo, and endosperm development [21,48,49,[56][57][58]. The rice and Arabidopsis have 28 and 61 type Figure 5. Expression cluster analysis based on in silico expression of 138 MADS genes. The samples were designed as Figure 3. The gray line shows expression profiles of each genes, and the green line is the average expression and indicates the expression pattern of one cluster. For simplicity, gene names display the corresponding code numbers of every subfamily ( Figure S1 and Table S6). doi:10.1371/journal.pone.0062288.g005    (Table S7). According to microarray expressions of MADS genes in rice or Arabidopsis ( Figure S6 and S7), 10 and 32 type I MADS genes highly express in the rice or Arabidopsis seed development processes respectively. In addition, activities of promoters of 38 type I MADS genes are detected in the female gametophyte and seed development processes through their own promoters and 20 type I MADS genes are not detected [49]. In the soybean, 45 out of 75 type I MADS genes expressed in the seed tissues and transcriptions of 22 genes were not detected ( Figure 4).
Since the first MADS genes are found in Arabidopsis and Antirrhinum respectively [3,4], the importance of best-studied MIKC C genes is well known for floral homeotic functions during the ontogeny of flowers. Floral organ identity genes have been subdivided into five different classes, termed as class A (e.g. AP2 and AP1), B (e.g. PI, AP3, GLO, DEF), C (e.g. AG), D (e.g. SHP and STK), and E (e.g. SEP) genes, which are required in different combinations to specify sepals, petals, stamens, carpels and ovules [10,25,[59][60][61]. Based on in silico analysis and our RT-qPCR results, GmMADS genes displayed high transcriptions in the soybean flower. For example, 7 members of DEF/GLO subfamily highly expressed in the flowers (Figure 3 and 7). Transcriptions of all SQUA subfamily genes, especially the AP1 ortholog genes GmMADS24, 25, 26, and 27, were also detected in the flowers (Figure 3 and 7).
But some MIKC C subfamilies with known floral homeotic functions displayed high relative expression not only in the flowers, but in the seed tissues, especially in the early stage seeds (Figure 3  and 8). Among 11 members of SEP subfamily, 7 SEP-like genes (GmMADS14, [16][17][18][19]33, and 71) relatively highly expressed in the flowers, but all members of SEP subfamily strongly expressed in the seeds at different developmental stages (Figure 3, 6 and 8). In the rice, high transcriptions of four SEP genes (OsMADS1, 5, 7/45, 8/24,) were detected both in the rice panicle and seed development and OsMADS34 were detected only in the rice panicle development ( Figure S6). In Arabidopsis, SEP1, 2 and 3 expressed in embryonic culture tissues [48], and SEP1 and 2 expressed both in the seed and in the flowers, while SEP4 highly expressed in the   Figure S7). That suggested that soybean SEP genes may play a fundamental role in the development of all floral organs and seeds.
Homeotic C-class gene AG ortholog genes, GmMADS1 and 2 showed relatively high expression in the flowers and seeds (Figure 8), however, another AG ortholog gene, GmMADS3 highly expressed not in the flowers but in the early seeds, as well as 2 STK-like genes (a paralogous gene pair, GmMADS38/39) and 4 SHP2-like genes (two paralogous gene pairs, GmMADS11/13 and GmMADS12/20) did ( Figure 6). And in the rice, compared with transcriptions in flowers, four rice AG members, such as OsMADS3, 13, 21 and 66, highly expressed in the seeds ( Figure  S6). In Arabidopsis, STK (AGL11) is not only detected in inflorescence but in the developing silique tissues, and redundantly with SHP1, SHP2 and ABS regulate the seed development ( Figure  S7) [62][63][64][65]. The results indicated that GmMADS genes underwent different evelutional progress from that of Arabidopsis MADS gene did and acquired new function in seed development. Another SHP1 ortholog gene GmMADS74 were detected in the seeds and flowers at low level, but high expressed in the roots and leaves (Figure 12), inferring its function beyond flowers and seeds.
The embryo proper represents new sporophytic generation and contains the shoot and root meristems. It is well known of the MIKC genes, AGL15 mRNA accumulates primarily in the embryo and the seed, and has an important component of the regulatory circuitry directing seed-specific processes in the developing embryo [62,66,67], and according to Figure S6, the expressions of AGL15 and AGL18 were increasing during the process of the seed mature. GmAGL15 (GmMADS122) is also preferentially expressed in developing embryos, but not in the flowers and yang pods [68]. Both AGL15 subfamily genes, GmMADS122 and 43, expressed in the seed tissues or flowers based on our RT-qPCR results, ( Figure 6 and Table S6); two AGL17 subfamily members GmMADS64 and 160 highly accumulated in the globular stage embryo and low in the seed development (Table S6). In the rice, three AGL17-like genes, OsMADS25, 59 and 61, highly expressed in the later stage of the seed development. But in Arabidopsis, three AGL17-like genes, AGL17, 21 and ANR, highly expressed only in the roots, and low expressions of AGL16 can be detected in seeds ( Figure S7). The suspensor is a terminally differentiated structure that supports and nourishes the embryo proper and degenerates later in development. GmMADS51 and 52 (SVP), GmMADS106 (SOC) and GmMADS127 (AGL17) highly expressed in suspensors and had potential roles in the function of the suspensors.

Plant Materials
The soybean (Glycine max) cultivar Kennong 18 was employed in all experiments. Plants were grown in a growth chamber under short day conditions (8 hr light/16 hr dark) at a temperature 25uC , 28uC. Under the normal conditions, tissues were separately harvested at different stages for gene expression analysis. The seeds were sampled at day 7, 14 and 21 after flowering and when the seeds became yellow. At least five individual plants per sample were then harvested and frozen in liquid nitrogen and stored at 280uC until used. And all experiments were repeated three times under the consistent conditions.
Because diversity of MADS genes in Arabidopsis is rather ancient and representative for other flowering plants [5], 108 Arabidopsis [14] and 75 rice MADS genes [41] (Table S7) were selected to classify 163 soybean MADS proteins into 5 families, MIKC C , MIKC*, Ma, Mb and Mc, and soybean genes most similar to Arabidopsis MADS genes were considered as the Arabidopsis ortholog genes, as well as the classification of M. truncatula and V.vinifera MADS family. And a topology tree ( Figure S1) was constructed to investigate the relationship of the soybean and Arabidopsis MADS proteins through ClustalW1.8 and MAGE 5.0 using Neighbor-Joining method [70].
To analyze the specific motifs of different MADS families, 10 motifs were identified by MEME v 4.9.0 with default parameters [71] (http://meme.nbcr.net/meme/cgi-bin/meme.cgi) among the soybean, rice, Arabidopsis, V.vinifera and M. truncatula after removing redundant sequences by Purge tool [72], and then MAST v 4.9.0 was used to identify motif organizations of 163 soybean MADS proteins ( Figure S2).

Investigating QTLs Relative to Soybean MADS Genes
To determine the co-localization of MADS genes with the QTLs, available SSR or RFLP markers linked to the QTLs were downloaded from Soybase (http://www.soybase.org/dlpages/ index.php) [46], and markers' sequences were mapped to the soybean genome (v1.09) through Blast. Furthermore, the QTL physical locations are usually uncertain due to the recombination  frequency being affected by population size, even if the linked marker sequences and their genomic positions are known. Therefore, genes within a 2-Mb genomic region flanking markers were associated with a QTL.

Total RNA Isolation and Quantitative Reverse Transcription-PCR
The procedure used for RNA extraction, cDNA synthesis, and PCR was as described by Hu, et al [73]. According to the specificity and efficiency of the primer pairs, 96 soybean MADS genes were designed by Beacon Designer 7.9, and at least one primer was specific for the target gene primer pairs (Table S1). GmSKIP (Glyma12g05510), GmUKNII (Glyma14g08990) and GmUKN1 (Glyma12g02310) were selected as reference genes for all the experiments [73]. And primers used as controls or for analysis had an efficiency of greater than 90% by LinRegPCR (http://LinRegPCR.HFRC.nl) [74].

In silico Expression Analysis of MADS Genes
The tissue-specific transcript characteristics of 163 MADS genes were investigated based on the RNA-seq data from 17 soybean tissues (http://seedgenenetwork.net/soybean), such as three seed compartments at the seed globular stage (GSE29162), whole seeds at five stages of seed development (globular, heart, cotyledon, early-maturation, dry), and vegetative (leaves, roots, stems, seedlings) and reproductive (floral buds) tissues (GSE29163), cotyledons of mid-maturation and late maturation seeds, whole dry seeds, and cotyledons of seedlings six days after imbibitions (GSE29134).
For the whole expression analyses of soybean MADS, the RPKM method was employed to correct for biases in total gene exon size and to normalize for the total short read sequences obtained in each tissue library [53,54]. And the geometrical average of RPKM values of the selected reference genes in RT-qPCR experiments was a reference gene value, and the ratios of target gene and a reference gene value were the relative expression level of the target gene in each sample (Table S6). A hierarchical clustering analysis of gene-wide normalizations of 138 gene transcription profiles in 17 tissues using a Pearson correlation was computed by Gensis1.7.5 [75].
Compared with expression profiles of the soybean MADS in the seed development, the microarray data, GSE6893 for rice [76] and GSE680 for Arabidopsis [77], were downloaded from the GEO database, and then cDNAs of MADS genes were selected to identify special probe sets through Probe Match tool (http://www. affymetrix.com/analysis/index.affx). At last, expression values of 55 rice and 73 Arabidopsis MADS probe sets were computed by Genespring 11.5 respectively ( Figure S6 and 7). A hierarchical clustering analysis of gene-wide normalizations of transcription profiles used a Pearson correlation by Gensis1.7.5 [75].

Collinear Relationships of 163 MADS Genes in the Soybean Genome
MCScanX [52] was employed to identify syntenic regions containing MADS genes among soybean (Table S3), V. vinifera and M. truncatula. Briefly, BLASTP with e-value #1e210 was applied to find intra-species paralogous pairs and inter-species homologous pairs, and the homologous blocks involved at least 5 collinear gene pairs and the gap gene pairs number was not more than 20. Based on the average Ks value of homologous blocks, the divergence times of the blocks were computed to investigate the evolution of soybean MADS genes in the soybean genome evolution. For example, if the average Ks is less than 0.3, divergence of the homologous blocks is about after the Glycine WGD event, and Ks is more than 1.5, the divergence time is after the Gamma WGT event, and Ks is between 0.3 and 1.5, the divergence time is after the Legume WGD event and before the Glycine WGD event.
Besides the WGD duplication, Soytedb (http://www.soybase. org/soytedb/) [35] was employed to identified the nearest transposable elements around some soybean MADS genes to analyze the importance of the transposable duplications to the soybean MADS gene evolutions (Table S4). Figure S1 Phylogenetic relationship of MADS genes between Glycine max and Arabidopsis. The deduced fulllength amino acid sequences of 163 Glycine and 108 Arabidopsis genes were aligned by Clustal X 1.83 and the phylogenetic tree was constructed using  Figure S5 The evolution model of blocks embodying MADS genes in the soybean. The black blocks were as the ancestors before the Gamma WGT event, and red, blue and green blocks showed the traces of the Gamma WGT, Legume WGD and Glycine WGD event respectively, and the bars were the MADS genes in the blocks. And the blocks without color showed the blocks were lost in the genome evolutionary history. Green blocks without bar were that the MADS genes were lost after the WGD events. (TIF) Figure S6 Microarray expressions of rice MADS genes. The microarray data (GSE6893) were from the NCBI GEO database. Special probe sets for 55 rice MADS genes and expression values were computed through Genespring 11.5. A hierarchical clustering analysis of gene-wide normalizations of 55 gene transcription profiles in 14 tissues using a Pearson correlation by Gensis1.7.5. SAM, P1 to P6 were up to 0.5 mm, 0-3 cm, 3-5 cm, 5-10 cm, 10-15 cm, 15-22 cm and 22-30 cm of panicles, respectively. And S1, S2, S3, S4 and S5 were seeds at 0-2, 3-4, 5-10, 11-20 and 21-29 days after pollination, respectively. Root and Young leaves was the roots and leaves from 7-d-old seedlings, respectively. Young leaves were as the control. (TIF) Figure S7 Microarray expressions of Arabidopsis MADS genes. The microarray data (GSE680) were from the NCBI GEO database. Special probe sets for 73 rice MADS genes and expression values were computed through Genespring 11.5. A hierarchical clustering analysis of gene-wide normalizations of 73 gene transcription profiles in 11 tissues using a Pearson correlation by Gensis1.7.5. Seedling was as the control. (TIF)