GmFT2a Polymorphism and Maturity Diversity in Soybeans

Background: Soybean is a short-day crop of agricultural, ecological, and economic importance. Its sensitive photoperiod responses significantly limit its breeding and adaptation. GmFT2a, a putative florigen gene whose transcription profiles differ between two cultivars of contrasting maturity (late-maturing Zigongdongdou and early-maturing Heihe 27), is key to flowering and maturation. However, its role in the diverse maturation patterns of soybeans remains poorly understood.

Methods: Eighty varieties, including 19 wild accessions and covering 11 of the 13 maturity groups, were collected. They were planted in pots and maintained under different photoperiod conditions (SD, short day; LD, long day; and ND, natural day). The days to first flowering were recorded and the sensitivity to photoperiod was investigated. Polymorphisms in the GmFT2a coding sequence were explored by searching the known SNP database (NCBI dbSNP). The GmFT2a promoter regions were then cloned from these varieties and sequenced, and polymorphism and association analyses were conducted.

Results: The varieties varied greatly in time to first flowering under ND and exhibited a continuous distribution of photoperiod sensitivity, indicating rich diversity in flowering time. Although GmFT2a had only one known synonymous SNP in the coding sequence, the GmFT2a promoter region had 17 haplotypes, of which HT06 was by far the most abundant. Association analysis identified some SNPs that might be associated with days to first flowering and photoperiod sensitivity.

Conclusion: Although GmFT2a is a key flowering gene, GmFT2a polymorphism does not appear to be responsible for maturity diversity in soybean.


Introduction
Different soybean cultivars exhibit different maturity patterns and sensitivities to photoperiod, which are related to their adaptation to different ecological environments. For practical reasons, soybean breeders have categorized soybean cultivars into different "maturity groups". For instance, soybeans in North America are classified into 13 maturity groups (MG): MG000 to MGX [1,2]. Chinese soybean researchers, in turn, have divided cultivars into 12 MGs based on the environments and planting patterns in China [3,4]. Because a commercial cultivar with desirable traits belongs to a particular MG, the geographical range over which it can be grown is often limited. It is therefore important to gain a better understanding of the genetic control of photoperiodism and maturity in soybean.
Photoperiod responses and maturity patterns in soybean are quantitative traits controlled by multiple genes or loci. To date, nine maturity loci have been reported: E1-E8 and J [5-13]. These loci have been comprehensively reviewed by Xia et al. [14]. They play different roles under different photoperiods, with stronger effects under long-day and weaker effects under short-day conditions [15]. Four of these loci have been characterized at the molecular level using map-based or candidate-based cloning. E1 encodes a soybean-specific potential transcription factor, Glyma06g23040 [16]; E2 encodes a GIGANTEA homologue, GmGIa [17]; and E3 and E4 encode the phytochromes GmPhyA3 [18] and GmPhyA2 [19], respectively. However, the genes corresponding to the five remaining loci have not been identified, and the exact functions of the four identified loci remain unclear.
The key flowering gene Flowering Locus T (FT) in Arabidopsis thaliana encodes a putative florigen that is an integrating factor of the flowering regulation network [20,21]. Soybean has at least 10 FT-like genes [22,23], among which GmFT2a and GmFT5a can functionally promote flowering in A. thaliana [23,24]. Furthermore, GmFT2a overexpression induced early flowering in transgenic soybean [24]. It is therefore surprising that these two important flowering genes have not been considered as candidate genes for the five unidentified maturity loci. GmFT2a is regulated differentially by photoperiod in two cultivars with different photoperiod sensitivities (photoperiod-sensitive Zigongdongdou and photoperiod-insensitive Heihe 27) [24], suggesting that the function of GmFT2a might be related to the regulation of maturity. Although the expression of GmFT2a is speculated to be developmentally regulated [24], the promoter region of GmFT2a has not been thoroughly analyzed.
The release of the soybean reference genome [25] has provided a new platform for breeding and molecular research. In addition, the resequencing of 31 wild and cultivated soybean genomes (designated as 31-Soybean Resequencing Project in this paper) has further characterized genome-wide genetic variations [26]. These studies may provide tools to address the question of how the polymorphisms in GmFT2a and its flanking sequences might function in the diversification of flowering and maturity time in soybeans.
In this study, we used soybean varieties originally cultivated or collected in China and North America that cover most MGs, from MG000 to MGVIII. Sequence polymorphisms in the GmFT2a coding sequence and the GmFT2a promoter were analyzed, and the possible roles of GmFT2a and its potential application in breeding are discussed.

Plant materials and photoperiod treatments
Eighty varieties were collected in China and North America (Table S1 in File S1). Soybean seeds were planted in soil in 10-liter pots and grown under natural day (ND) conditions. After germination, seedlings of uniform size were selected so that each pot finally contained five uniform plants. The seedlings were grown in natural sunlight until the cotyledons opened, and were then separated into groups and grown under different photoperiods (LD, 16 h light/8 h dark; SD, 12 h light/12 h dark; and ND). Additional details regarding plant growth and treatments were as reported previously [27]. The day to first flowering of each plant was recorded as the number of days from the expansion of unifoliates to first flowering (DEUFF), and 15 plants in three pots were scored for each variety in each treatment. Photoperiod sensitivity (PS) was calculated as described previously [28].
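The PS calculation is deferred to [28] and its formula is not given here. As a minimal sketch only, the snippet below assumes PS is the relative flowering delay under LD compared with SD, (DEUFF_LD - DEUFF_SD) / DEUFF_LD, a definition that is consistent with the 0 to 1 range of the PS values reported in this paper but is an assumption rather than the confirmed method. The function names and the plant scores are hypothetical.

```python
from statistics import mean


def mean_deuff(days_per_plant):
    """Average DEUFF over the plants scored for one variety and treatment."""
    return mean(days_per_plant)


def photoperiod_sensitivity(deuff_ld, deuff_sd):
    """Relative flowering delay under LD versus SD.

    ASSUMED definition (not taken from [28]): (LD - SD) / LD.
    """
    if deuff_ld <= 0:
        raise ValueError("DEUFF under LD must be positive")
    return (deuff_ld - deuff_sd) / deuff_ld


# Hypothetical variety: 15 plants (three pots of five) per treatment.
ld_days = [60, 62, 61, 59, 63] * 3
sd_days = [24, 25, 23, 24, 24] * 3
ps = photoperiod_sensitivity(mean_deuff(ld_days), mean_deuff(sd_days))
```

Under this assumed definition, a photoperiod-insensitive variety that flowers at the same time under LD and SD has a PS of 0, and values approach 1 as LD delays flowering more strongly.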

Bioinformatics analysis
The genomic sequences were aligned using ClustalW 2.0.9 [29]. The alignment was adjusted manually and imported into MEGA 5 [30] to calculate nucleotide diversity and Tajima's D statistics. It was also imported into TASSEL [31] to estimate linkage disequilibrium and to identify SNP-trait associations by fitting a general linear model (GLM). The phylogenetic relationships among the 17 haplotypes were inferred using the neighbor-joining (NJ) method in MEGA 5 [30].
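To make the diversity statistics concrete, the sketch below computes pairwise nucleotide diversity (π), the Watterson estimator (θ), and Tajima's D from a small alignment using the standard textbook formulas. It is an illustration, not the MEGA 5 or TASSEL implementation: the function name and the toy alignment are hypothetical, and columns containing gaps are simply skipped, which is a simplifying assumption.

```python
from itertools import combinations
from math import sqrt


def diversity_stats(seqs):
    """Return (pi, theta_w, tajima_d); pi and theta_w are per site.

    Alignment columns containing a gap ('-') are skipped, a simplification
    that dedicated tools handle in more refined ways.
    """
    n = len(seqs)
    if n < 4 or len({len(s) for s in seqs}) != 1:
        raise ValueError("need at least four aligned sequences of equal length")
    columns = [col for col in zip(*seqs) if "-" not in col]
    if not columns:
        raise ValueError("no gap-free columns")
    sites = len(columns)
    seg = sum(1 for col in columns if len(set(col)) > 1)  # segregating sites
    n_pairs = n * (n - 1) / 2
    # k: mean number of differences between two sequences.
    k = sum(sum(a != b for a, b in combinations(col, 2)) for col in columns) / n_pairs
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    pi = k / sites
    theta_w = seg / a1 / sites
    if seg == 0:
        return pi, theta_w, None  # D is undefined without segregating sites
    # Constants of Tajima's variance estimate.
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    d = (k - seg / a1) / sqrt(e1 * seg + e2 * seg * (seg - 1))
    return pi, theta_w, d


# Toy alignment (hypothetical data): two sites where a single sequence differs.
toy = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAC", "TCGTACGTAT"]
pi, theta_w, d = diversity_stats(toy)
```

In the toy alignment both variable sites are singletons, so π falls below θ and D is negative; an excess of intermediate-frequency variants would push D above zero.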

The selected soybean population exhibited a continuous spectrum of photoperiod sensitivities
The plants investigated in this study were collected or cultivated in China and North America. The selected varieties span a wide range of maturity types, from MG000 to MGVIII, comprising 11 of the 13 defined maturity groups [1], and represent a diverse population adapted to different geographic regions. As shown in Table 1 and Figure 1

The GmFT2a coding sequence is highly conserved
Investigation of the SNP data from the 31-Soybean Resequencing Project [26] revealed only one SNP site in the GmFT2a coding sequence. It is a synonymous A/T SNP (named ss249156869) located at position 30746204 on chromosome Gm16 (Figure S1 in File S1). Further resequencing of over 100 cultivated soybean genomes did not identify any SNP in the GmFT2a coding sequence (unpublished data). Therefore, the GmFT2a coding sequence is highly conserved, and the diversity in flowering time and maturation time in soybeans does not result from polymorphism in the coding region of GmFT2a.

The GmFT2a promoter region harbors rich polymorphisms
The GmFT2a promoter (about 2.3 kb) from each of the 80 soybean accessions used in this study was cloned and Sanger sequenced (available in GenBank under accession numbers KF573201-KF573362). Because the soybean genome is palaeopolyploid and contains 10 FT-like genes [23,32], we confirmed each sequencing result by BLAST against the genome of Williams 82 (www.phytozome.org). Nucleotide diversity was analyzed with TASSEL v2.1 [31]. In total, 15 SNPs and 16 InDels were detected in the 2,489 aligned base pairs, with six InDels and two SNPs contained within two longer InDels (Table S2 in File S1). Across the whole sequenced population, two samples differed on average by 4.7 SNPs per kilobase (π=0.0047) (Table 2). For the subpopulations of North American cultivars, Chinese cultivars, and wild soybeans from China, the values were 4.4, 4.8, and 5.4 SNPs per kilobase, respectively (Table 2). Thus, the GmFT2a promoter region is more diverse in soybeans from China than in those from North America. Consistent with this, the Watterson estimator (θ) value was lower in the North American subpopulation than in the other two subpopulations (Table 2). Tajima's D values were all negative and differed significantly from zero (P<0.001) in the whole population and in each subpopulation (Table 2). Excluding the two SNP sites located inside InDels, the 13 independent SNP sites were compared with those found in the 31-Soybean Resequencing Project [26]. Ten sites were common to both data sets, three sites were newly found, and seven sites were missing. The missing sites were close to either an insertion or a deletion (Table 3).

Several SNPs show some relationship with DEUFF and photoperiod sensitivity
Association analysis was performed using the GLM (Table 6). At a significance level of p<0.01, SNP S17 was associated with DEUFF under SD and ND; SNPs S162 and S1849 were associated with DEUFF under SD; and InDel D272 was associated with DEUFF under LD. Association with PS was also analyzed (Table 6).
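For a single marker with no covariates, a general linear model with the allele as a fixed factor reduces to a one-way ANOVA of the trait grouped by allele. The sketch below shows that reduced case only; it is not TASSEL's implementation, and the function name and DEUFF values are hypothetical.

```python
def single_marker_f(trait_by_allele):
    """One-way ANOVA F statistic and R^2 for a trait grouped by marker allele."""
    groups = [list(g) for g in trait_by_allele.values() if g]
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two alleles and more samples than alleles")
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation explained by the marker versus residual variation.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("no within-allele variation; F is undefined")
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    r_squared = ss_between / (ss_between + ss_within)
    return f_stat, r_squared


# Hypothetical DEUFF values (days) for varieties carrying each allele of one SNP.
deuff_by_allele = {"G": [30, 32, 34], "A": [50, 52, 54]}
f_stat, r_squared = single_marker_f(deuff_by_allele)
```

A p-value would follow from the F distribution with (k - 1, n - k) degrees of freedom, for example via `scipy.stats.f.sf` if SciPy is available; it is left out here to keep the sketch dependency-free.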

The soybean population in the present study harbors rich diversity in DEUFF and PS
The soybean population investigated in this study was diverse in geographic source, since the specimens were collected from North America and China and included 11 maturity groups, from MG000 to MGVIII. The samples therefore covered almost all of the 13 MGs [1]. The population showed a  Figure 1). More importantly, the population exhibited a continuous range of days to first flowering. Individual plants within the population first flowered every few days, from 20 to 85 days after the expansion of unifoliates, under natural day conditions (Table 1). As for PS, wild soybeans were on average more sensitive than cultivated ones. The PS of wild soybeans varied from 0.590 to 0.810, while that of cultivated soybeans varied from 0.056 to 0.752. This greater range of variation in cultivated soybeans may facilitate their adaptation to different ecological environments. The entire population also showed a continuous spectrum of PS, from 0.056 to 0.810 (Table 1 and Figure 1). Therefore, the population studied here is richly diversified not only in geographic source but also in phenotype with regard to DEUFF and PS.

GmFT2a is under strong selection
The coding sequence of GmFT2a is highly conserved. A search of the known SNP data from the 31-Soybean Resequencing Project [26] revealed only one synonymous A/T SNP site (ss249156869) in the GmFT2a coding sequence (Figure S1 in File S1). Furthermore, in an ongoing genome resequencing project involving over 100 cultivars, no SNP was found in the GmFT2a coding sequence (unpublished data). The coding sequence therefore appears to be under strong selection, and its polymorphism is probably not causally related to the diversity of flowering and maturity in soybeans. Given that GmFT2a is involved in the flowering transition and its maintenance in soybean [24], and that its homolog FT is an integrating factor of the flowering regulation network in A. thaliana [20,21], GmFT2a is likely to be functionally essential for soybean adaptation.
Unlike the GmFT2a coding sequence, the GmFT2a promoter region is highly diversified. The degree of diversification differs among subpopulations (North American cultivars, Chinese cultivars, and wild soybeans). Wild soybeans were more diversified than cultivated ones, with a higher pairwise nucleotide diversity (π) value (Table 2). Furthermore, the 19 wild soybeans examined in the present study carried 14 of the 17 haplotypes, whereas the 61 cultivated soybeans carried only six of the 17 haplotypes, supporting a higher level of diversification in wild soybeans (Table 3). Examination of all 80 varieties revealed 31 polymorphic sites in the GmFT2a promoter region. Interestingly, there was no significant linkage disequilibrium in such a narrow region, whereas Lu et al. detected linkage disequilibrium in rice Ghd7 [33]. This indicates that the GmFT2a promoter region is highly polymorphic. Considering that GmFT2a is a putative florigen gene that plays a central role in the flowering regulation network, this high degree of polymorphism might facilitate the adaptation of soybeans to different environments and requirements.
The GmFT2a promoter region is also under strong selection. The whole population and each of the three subpopulations had significantly negative Tajima's D values (Table 2). This suggests that, like cultivated soybeans, wild soybeans might also be under positive selection. The negative values might, however, also result from low-frequency mutations or population expansion, and more evidence is required to define the selection model. Haplotype analysis also provides evidence for strong selection. A total of 17 haplotypes were defined using 16 stringent polymorphic sites. These haplotypes were not equally distributed. Haplotype HT06 was by far the most common. It was found in 62 varieties (10 out of 19 wild soybeans, 13 out of 17 Chinese cultivars, and 39 out of 44 North American cultivars), covering all maturity groups from MG000 to MGVIII. This also suggests that GmFT2a might be under high selection pressure, which implies a high degree of risk when selecting GmFT2a haplotypes during breeding. On the other hand, high risk can bring high reward. CS59, the currently predominant and widely adapted cultivar Zhonghuang 13, carries HT15; its wide adaptation might have resulted from the selection of HT15. GmFT2a might function as an engine. The development of a strong and suitable engine might be the key to increasing production potential and adaptability.
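The HT06 counts quoted above can be converted into frequencies with a few lines of arithmetic. The sketch below uses only the carrier and group sizes stated in the text; the variable names are illustrative.

```python
# (HT06 carriers, varieties examined) per subpopulation, as stated in the text.
ht06 = {
    "wild soybeans": (10, 19),
    "Chinese cultivars": (13, 17),
    "North American cultivars": (39, 44),
}

carriers = sum(c for c, _ in ht06.values())
total = sum(t for _, t in ht06.values())
overall_share = carriers / total

# Share of HT06 within each subpopulation.
shares = {group: c / t for group, (c, t) in ht06.items()}
for group, share in shares.items():
    print(f"{group}: {share:.1%}")
print(f"all varieties: {carriers}/{total} = {overall_share:.1%}")
```

The totals reproduce the 62 carriers among 80 varieties, and the per-group shares make explicit that HT06 is more frequent in both cultivated subpopulations than in wild soybeans.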

Polymorphism of GmFT2a is not related to maturity diversity
Further association analysis with the GLM did not show a significant overall association between GmFT2a polymorphism and maturity diversity, which is consistent with the idea that GmFT2a is under strong selection. At the level of p<0.01, SNP S17 was associated with the day to first flowering under SD and ND, while SNPs S162 and S1849 showed such an association only under SD. The PLACE program (http://www.dna.affrc.go.jp/PLACE/) identified a CIACADIANLELHC element (CAANNNNATC) near SNP S17 (G/A). Near SNP S1849 (T/C), the program found an IBOXCORENT element (GATAAGR) [34,35]. Whereas the CIACADIANLELHC element is associated with circadian expression, the IBOXCORENT element is associated with light-responsive promoter regions.

Table 4. Haplotypes of the GmFT2a promoter region in 80 varieties. Columns: polymorphic site positions (S17, S162, D231, S320, S1149, S1458, D1496, S1580, D1737, S1844, S1849, D1849, S1912, S1930, D2014, D2263) and variety number.

There are nine maturity loci, E1-E8 and J [14]. E5-E8 and J have not been identified at the molecular level. GmFT2a is under highly stringent selection pressure, which suggests that it probably does not correspond to any of the five unknown maturity loci. In addition, GmFT2a has nine paralogous genes, about which little is known [23,24]. Together, the nine maturity loci, GmFT2a and its relatives, and other flowering genes make up a complicated and elaborate flowering regulation network. The network contains many selection sites that could be used in breeding new soybean varieties with good adaptation. GmFT2a functions downstream of other flowering genes, integrating flowering signals to regulate flowering. The predominance of HT06 indicates a core function of GmFT2a as an engine in the network. It would therefore be rather difficult to select GmFT2a directly in soybean breeding. Nevertheless, future breeding should pay more attention to GmFT2a as a key element in approaches to breaking the bottleneck of soybean breeding.

Supporting Information
File S1. A Word document with supplementary materials, including Tables S1 and S2, Figures S1 and S2, and List S1. (DOC)