The Acid Phosphatase-Encoding Gene GmACP1 Contributes to Soybean Tolerance to Low-Phosphorus Stress

Phosphorus (P) is essential for all living cells and organisms, and low-P stress is a major factor constraining plant growth and yield worldwide. In plants, P efficiency is a complex quantitative trait involving multiple genes, and the mechanisms underlying P efficiency are largely unknown. Combining linkage analysis, genome-wide and candidate-gene association analyses, and plant transformation, we identified a soybean gene related to P efficiency, determined its favorable haplotypes and developed valuable functional markers. First, six major genomic regions associated with P efficiency were detected by performing genome-wide associations (GWAs) in various environments. A highly significant region located on chromosome 8, qPE8, was identified by both GWAs and linkage mapping and explained 41% of the phenotypic variation. Then, a regional mapping study was performed with 40 surrounding markers in 192 diverse soybean accessions. A strongly associated haplotype (P = 10−7) consisting of the markers Sat_233 and BARC-039899-07603 was identified, and qPE8 was located in a region of approximately 250 kb, which contained a candidate gene GmACP1 that encoded an acid phosphatase. GmACP1 overexpression in soybean hairy roots increased P efficiency by 11–20% relative to the control. A candidate-gene association analysis indicated that six natural GmACP1 polymorphisms explained 33% of the phenotypic variation. The favorable alleles and haplotypes of GmACP1 associated with increased transcript expression correlated with higher enzyme activity. The discovery of the optimal haplotype of GmACP1 will now enable the accurate selection of soybeans with higher P efficiencies and improve our understanding of the molecular mechanisms underlying P efficiency in plants.


Introduction
Phosphorus (P) is essential for all living cells and organisms. P occurs in complex DNA and RNA structures that contain and translate genetic information and control all living processes in plants and animals. Animals need to obtain an adequate supply of P from their food [1]. As for plants, P is one of the most essential mineral nutrients required for growth and development [2]. However, most unmanured soils do not contain sufficient readily available P to meet the high demands of crops, particularly during certain periods of the growing cycle. Therefore, fertilizers containing P must be supplied. In recent years, the annual global consumption of phosphates was more than 50 million tons [3]. However, excess P application is problematic because the excess dissolution of phosphates contaminates water sources [4,5]. Furthermore, most of the annually fertilized phosphates are fixed in the soil in organic forms by adsorption, sedimentation and transformation. Consequently, 50-80% of the total P found in soil exists as organic phosphates, which are unavailable to plants in the absence of mineralization [6]. Therefore, agricultural soil is a large 'potential P pool' that must be developed and used. Furthermore, the global reserve of rock phosphate is a non-renewable resource. A recent study conducted by the International Fertilizer Development Center (IFDC) concluded that at the current extraction rates, the global commercial phosphate reserves will be depleted in approximately 300-400 years [7]. Therefore, the development of plants that can efficiently utilize endogenous and added P is a sustainable, economic approach for plant production.
In response to persistent P deficiency, plants have developed many adaptive mechanisms to enhance the availability and uptake of P, including modifications in root architecture, increased activities of internal and extracellular acid phosphatases, greater exudation of small molecular organic acids, and symbiosis with mycorrhizal fungi [2]. Hundreds of genes associated with the plant P metabolic pathway have been identified [2], but few of these have been applied in crop breeding programs, perhaps because most of the identified genes were cloned by reverse genetics methods from a few model plants [6,[8][9][10], while few were identified by forward genetics methods [11]. For example, tremendous efforts have been made to dissect the genetic basis of soybean P efficiency by assessing factors such as biomass [12], root architecture [13], P concentration, acid phosphatase activity [14] and flower/pod abscission rates [15]. These studies, which were based on genetic analyses in segregating populations, identified several QTLs that likely control P efficiency. However, no QTLs associated with soybean P efficiency have been cloned. The reasons for these limited results may include the following: (a) the QTLs associated with P efficiency in soybean are generally localized to large chromosomal regions (10-20 centimorgans), (b) the effects of the QTLs are minor and often represent only a small fraction of the phenotypically relevant variation, and (c) QTLs are complex, and their evaluation requires information regarding germplasm diversity. Therefore, many challenging questions remain unanswered. For example, how can we identify the candidate P efficiency genes in these QTLs? Is P efficiency modulated by genetic variation of the candidate genes? What are the most favorable alleles and haplotypes useful for the breeding of high P efficiency? These questions reveal the highly challenging nature of the genetic dissection of P efficiency in soybean and other plants.
Although QTL linkage mapping provides useful information on genetic loci, it is typically difficult to isolate candidate genes based on a single QTL mapping experiment. Furthermore, the genes identified by this method are restricted to those that segregate in the considered cross [16]. Genome-wide associations (GWAs) overcome this limitation and have recently been successfully applied in studies of Arabidopsis [17], rice [18,19], maize [20] and other plants [21,22]. This method can provide increased power for the localization of QTLs because of the higher recombination rates between markers and QTL alleles in randomly mating populations [23][24][25]. However, GWAs also have limitations because they could generate false positives as a result of population structure. Although population structure can be controlled by statistical methods [13,17,18], the complementary use of familybased linkage analyses in controlled crosses is also an option [26,27]. The complementarity of GWAs and classical linkage analyses has been well demonstrated by studies of Arabidopsis flowering time [16] and rice Al tolerance [28]. In addition, because P efficiency is a typical quantitative trait controlled by multiple genes [29] and the molecular mechanism underlying soybean P efficiency is poorly understood, a candidate-gene association analysis could be an effective means to functionally characterize target genes. This strategy to study complex quantitative trait genes has been successfully applied for many plant traits such as maize flowering time [23], carotenoid content [30], architecture of floral branch systems [31] and rice seed shattering-loss [32].
Soybean is a highly important crop. Low P availability is the most significant soybean production constraint and is more problematic than other nutrient deficiencies, toxicities or diseases [33]. Because soybean is a model legume plant, the characterization of P efficiency related genes and the mechanisms responding to low-P stress in soybean would eventually facilitate P efficiency studies in other legumes and plants. We aimed to clone the soybean candidate gene for P efficiency, elucidate its effects, determine its favorable haplotypes and develop valuable functional markers. Therefore, we performed a series of experiments including linkage mapping, genome-wide association mapping, candidate-gene association mapping, gene expression and plant transformation. These studies revealed that a soybean acid phosphatase-encoding gene, GmACP1, located within the major QTL qPE8, is associated with P efficiency, and its genetic variation modulates P efficiency related traits in soybeans.

Significant variation in P efficiency among soybean germplasms
To determine the genetic variation of P efficiency in soybean plants, four P-efficiency-related traits were determined using 152 soybean recombination inbred lines (RILs) ( Table 1) and 192 soybean accessions (Table 2). These traits included plant height (PHt), acid phosphatase activity (APA), leaf P concentration (PC) and 100-seed weight (100-SW) under low-P (2P, soil available P,5 mg kg 21 ) and high-P (+P, soil available P.20 mg kg 21 ) conditions; the relative values of the traits 2P/+P are denoted as RPH, RAPA, RPC and 100-RSW, respectively. As shown in Table 1, the phenotypic RIL values ranged from 0.12-0.62 mg g 21 for PC, 0.13-0.58 mg g 21 for the pod P concentration (PPC) and 0.70-3.01 mmol r-NP min 21 mg protein 21 for APA in the low-P condition. The transgressive segregation of Pefficiency-related traits was obvious, and the phenotypic variation was significantly affected by the genotypes and treatments. In addition, analysis of variance (ANOVA) indicated that the phenotypic APA variation between the two parents (Bogao and Nannong 94-156) was significant (P = 0.01) in different P conditions ( Table 1). The mean APA values for the individual accessions in the natural population ranged from 1.34-2.12 mmol r-NP min 21 mg protein 21 , and the mean PC values ranged from 0.07-1.84 mg g 21 . Among the lines of diverse soybean accessions, the PC levels reached 1.84 mg g 21 ; however, one soybean accession had a PC of only 0.07 mg g 21 (Table 2). Overall, the soybean plants clearly exhibited considerable natural variation in the traits related to P efficiency and displayed very high genetic diversity.

P efficiency related QTL linkage mapping in the RIL population
Several P-efficiency-related QTLs have been identified in our previous works [14,15], most of which focused on P efficiency at the soybean seedling stage. However, 90% of soybean P absorption occurs during the reproductive stage [33,34]. Therefore, studying the P-efficiency-related traits during the reproductive stage is expected to be better suited for identifying the QTL associated with P efficiency. In this study, we initially aimed to

Author Summary
In soybean plants, low-phosphorus (P) stress is more detrimental than other nutrient deficiencies, toxicities or diseases; however, it is difficult to select soybean varieties with high P efficiencies based on phenotypes. Although QTL map-based cloning is a powerful method, it is timeconsuming, and the QTLs underlying P efficiency in soybeans have not been identified. Combining linkage and association analyses, we identified a highly significant region on chromosome 8, qPE8, which is associated with soybean P efficiency. Gene expression and plant transformation experiments indicated that the gene GmACP1 within qPE8 affects P efficiency. A haplotype survey of GmACP1 identified 10 haplotypes that explained 33% of the variation in P efficiency. The discovery of the optimal haplotype of GmACP1 will improve the accuracy of selecting soybeans with higher P efficiencies and increase our understanding of the molecular mechanisms underlying P efficiency in plants.
identify the QTLs related to P efficiency during the soybean reproductive stage using 152 RILs in various environments. We used the relative values RPH, RPC, RPPC, RAPA and 100-RSW as indices to assess P efficiency in soybeans. Three primary QTL were found to be associated with P efficiency which were located on three soybean chromosomes (8, 14 and 18) (Table 3). Of the three loci, a stable QTL (for RAPA, RPPC and RPC) termed qPE8 (QTL for P efficiency related traits on chromosome 8) was identified; this QTL encompasses a 6.3-Mb region and explained up to 41% of the phenotypic variation (Table 3, Figure S1). This locus was consistent with previously identified QTL associated with P efficiency in soybean [14,15].

GWAs for P efficiency related traits in a natural population
To overcome the limitations of linkage analysis, we conducted GWAs to identify the loci associated with P efficiency using 1,536 single nucleotide polymorphisms (SNPs) [35] and the relative phenotypic values at several sites and over several years as P efficiency phenotypes generated in 192 soybean accessions. GWAs were conducted across all 192 genotypes using SNPs with minor allele frequencies (MAF).0.05. In addition, the GWAs were separately associated with the P efficiency related phenotype, APA and PC across sites and years ( Figure 1). A mixed linear model (MLM), which controlled for the complex population structure and pedigree relationships, was used in each analysis to correct for the confounding effects of the subpopulation structure and relatedness between individuals. The relative performance of the MLM in all traits was evaluated to control false positives in the studied population, as shown in the quantile-quantile (QQ) plots ( Figure S2). The smaller deviation of the observed P value from the expected value indicated that the MLM was suitable for the control of type I errors. Seventy-four significant SNPs associated with P efficiency were identified and organized by the GWAs into six major clusters on chromosomes 8, 9, 11, 13, 18 and 19 across various environments ( Figure 1).
Next, the SNPs identified by the GWAs were compared to the positions of QTL regions identified by linkage mapping in this study and previous reports ( Figure 1). First, the results indicated that the GWAs identified more phenotype-genotype associations and provided higher resolution; however, linkage QTL mapping identified rare specific alleles that were undetectable by the GWAs. Second, most of the highly significant associated regions identified by the GWAs were also identified and validated by linkage QTL mapping ( Figure 1). Finally, a highly significant SNP cluster (including four SNPs, P = 7.5610 28 ) on chromosome 8, colocalized to qPE8, was found to be flanked by Satt089 and Sat_310. In addition, previous studies have shown that this locus is associated with P efficiency at the soybean seedling stage [9,10]. Therefore, we analyzed qPE8 to identify the P-efficiency-related genes located within this region. At this stage of the study, however, it was not possible to discriminate among the candidate genes for P efficiency in this region. Therefore, we decided to investigate the region between Satt089 and Sat_310 ( Figure 2A).
Regional association mapping to refine the qPE8 region within 250 kb To refine the major QTL region qPE8, which was identified by both association and linkage mapping and shown to stably affect soybean P efficiency across various environments, we performed a regional association mapping analysis with 18 SSR markers and 22 SNP markers (an average of one marker per 0.15 Mb) in a population of 192 soybean accessions ( Figure 2B). The regional scan of the association analysis revealed that six markers (BARC-045183-08900, Satt089, Sat_233, BARC-039899-07603, BARC-028281-05808 and Satt377) were significantly associated with the RAPA in the qPE8 genomic region (Table 4). Of these, Sat_233 and BARC-039899-07603 were closely associated with the RAPA (P = 2.0610 26 and P = 6.7610 25 , respectively). In addition, the haplotype between Sat_233 and BARC-039899-07603 explained more of the phenotypic variation (27.2%) than any other combination of markers. These results suggested that the target gene was located in a region of approximately 250 kb between Sat_233 and BARC-039899-07603 ( Figure 2C).

GmACP1 is the candidate gene for the QTL qPE8
A comprehensive analysis of the region between Sat_233 and BARC-039899-07603 predicted 28 annotated genes (Table S1). Five of the 28 genes (Glyma08g20700, 20710, 20800, 20820 and 20830) were considered candidate genes for P efficiency or plant stress following BLASTP queries of the protein database (http:// www.ebi.ac.uk/Tools/sss/ncbiblast/) and synteny analyses between soybean and other dicotyledonous plants. The candidate genes included Glyma08g20700, which encodes calcineurin B, whose Arabidopsis homolog responds to abiotic stress [36], and Gly-ma08g20710, which encodes phospholipase D, whose Arabidopsis homolog responds to phosphate starvation [37]. The candidate gene Glyma08g20830 encodes a protein phosphatase whose yeast homolog is essential for eliciting adequate cellular responses to stress [38]. The candidate genes Glyma08g20800 and Glyma08g20820 encode putative phosphatases, and their homologs in the common bean and tomato are strongly induced during P starvation [39,40].
To assess the responses of these five putative candidate genes to low-P stress in soybean, a quantitative real-time PCR (qRT-PCR) analysis was performed using two representative accessions. The candidate gene Glyma08g20820, which encodes an acid phosphatase (termed GmACP1), was dramatically upregulated during low-P stress ( Figure S3). However, the expression of the remaining four genes did not change significantly in response to low-P stress. GmACP1 is 2,283 bp long and encodes a 272-amino acid polypeptide. The deduced amino acid sequence of GmACP1 is 86% identical to a common bean acid phosphatase (PvPS2) and 70% identical to a tomato acid phosphatase (LePS2) ( Figure S4). An alignment of the amino acid sequences of GmACP1 and other plant phosphatases revealed two highly conserved motifs, namely, motif 1 ''DFDXT'' and motif 2 ''GDGXXD'', which are members of the haloacid dehalogenase (HAD) and DDDD (4-aspartate) enzyme superfamilies, respectively ( Figure S4). These enzymes catalyze various hydrolytic and phosphotransferase reactions [41,42]. Furthermore, two conserved aspartic acid residues were identified in the DFDXT motif; the first Asp might be transiently phosphorylated during the P-transfer reaction [43]. The second conserved motif, GDGXXD, is generally observed in phosphatases rather than phosphomutases [42,43]. These results indicate that GmACP1 may encode a putative phosphatase in soybean.
To further confirm the candidate gene, we analyzed GmACP1 expression by performing qRT-PCR on different soybean plant tissues. GmACP1 was expressed at high levels in the roots, shoots and leaves and low levels in the flowers, but it was undetectable in the pods (data not shown). qRT-PCR was then conducted on the roots and shoots of eight representative soybean accessions, including the two parents used for linkage mapping. GmACP1 expression was increased by 26-to 110-fold in the accessions with high P efficiency (such as Nau 4); however, the expression was increased by only 2-to 9-fold in the accessions with low P efficiency (such as Nau 3) ( Figure 3). The GmACP1 coding sequence was expressed in Escherichia coli (BL21). The bacteria expressed high levels of GmACP1 following induction with isopropylthio-b-galactoside (IPTG), and the protein was detected in the soluble fraction. The expressed protein was resolved as a 31.8-kDa peptide on an SDS-PAGE gel ( Figure 4A). The recombinant protein had a low but significant phosphatase activity relative to the control. The phosphatase activity had an optimal pH of 4.0 ( Figure 4B), suggesting that GmACP1 encodes an acid phosphatase and is the candidate gene in the QTL qPE8.

Soybean hairy root transformation confirms the function of GmACP1
To elucidate the function of GmACP1, a GmACP1-pMDC83 overexpression vector was constructed and introduced into soybean roots via Agrobacterium rhizogenes-mediated transformation [44]. The transgenic hairy roots were verified by PCR amplification ( Figure 5B). After 20 days of growth on vermiculite, the transgenic hairy roots were transferred into B&D solution (Broughton and Dilworth, 1971) supplemented with r-nitrophenyl phosphate (r-NPP) for seven days. The yellow color generated by the secreted acid phosphatase was more intense in the transgenic hairy roots overexpressing GmACP1 than in the hairy roots transformed with the control vector ( Figure 5A). Furthermore, the control plants but not the composite transgenic plants developed the symptoms of P deficiency, such as curled leaf edges and deep green leaves. The transgenic soybean hairy roots exhibited a 2.3-fold increase in APA and 11.2-20.0% more efficient P usage by the roots relative to the controls. The transgenic soybean hairy roots also exhibited improved P efficiencies with 11.4-22.8% increases in root dry weight and 23.8-39.0% increases in root P concentration, compared with the controls (Figure 5C-F).
GmACP1 polymorphisms are associated with soybean P efficiency related traits Sequencing analysis indicated that polymorphisms (including single-base changes and indels) existed in the exons and introns of the GmACP1 genes from the 192 soybean accessions, including the two parents used for the linkage mapping analysis. We identified and filtered out 16 SNPs ( Figure 2D) for the subsequent association analyses. Furthermore, we observed that these 16 SNPs exhibited strong linkage disequilibrium (LD) ( Figure S5) and could form four LD blocks ( Figure 2E). Because GmACP1 was upregulated in response to low-P stress, and overexpression of GmACP1 increased the APA and P efficiency in soybean roots, we evaluated whether GmACP1 polymorphisms were correlated with the variation of P efficiency related traits; we observed that GmACP1 polymorphisms between the two parents were significantly associated with APA variation ( Figure S6). As shown in Table 5, gene associations in the 192 soybean accessions confirmed that GmACP1 plays a key role in regulating the APA and PC. Six probable causative sites, including Indel170, Indel423, Site 432 (S432) and S433 in intron 1, S1106 in intron 3 and S1688 in exon 4, were significantly associated with variations in P efficiency related traits after analysis by Tassel version 4.0 [45] (Table 5). Indel170 was significantly correlated with the RAPA, accounting for 32% of the phenotypic variation (P = 1.84610 241 ), which was estimated by multiple linear regression (MLR) using SAS version 9.0 (SAS Institute Inc., Cary, NC, USA), and RPC accounted for 23% of the phenotypic variation (P = 5.32610 231 , MLR). The accuracy of the algorithm (MLR) was estimated using 5-fold cross validation, and the frequency distributions of R 2 based on the 5-fold cross validation are shown in Figure S7. A single-base transversion at S1688 resulted in the substitution of tyrosine with phenylalanine, which explained 22% of the variation (P = 5.35610 231 , MLR).
Various GmACP1 haplotypes display variation in gene expression and acid phosphatase activity The strong LD of the six probable causative sites may cause overlapping effects. Therefore, the GmACP1 haplotypes were used to determine the joint effects of the six putative causative sites related to P efficiency. Ten haplotype classes were observed in the natural population (H1-H10) ( Figure 6A). The RAPA and RPC values were highest when the favorable alleles were combined into H9 (40TTAT). Overall, the ten haplotypes explained 33% of the phenotypic variation (P = 5.81610 243 , MLR) for the RAPA, and a 2.6-fold difference was observed between two of the most differentiated haplotype classes. Specifically, haplotype H9 (40TTAT) was observed in 16 soybean accessions, whereas H4 was observed in 78 soybean accessions. Notably, the proteins encoded by H9 and H4 differ by only one amino acid, phenylalanine vs. tyrosine. The acid phosphatase activity of the protein encoded by haplotype H9 expressed in vitro was 1.6-fold higher than the protein encoded by haplotype H4 ( Figure 6B).
Expression profiling of GmACP1 in soybean roots after seven days of low-P stress indicated that GmACP1 expression correlated well with the RAPA under various P conditions. Accessions with the favorable haplotype '40TTAT', which have a 6-10 bp deletion in intron 1 and a nucleotide transition in exon 4, showed higher GmACP1 expression levels (Figure 7), suggesting that the variation among the different haplotypes was responsible for the diverse APA in the soybean accessions. Based on the transformation of the soybean hairy roots with GmACP1 and the strong association between GmACP1 polymorphisms and traits related to soybean P efficiency, we speculated that GmACP1 is a key P efficiency related gene in soybean. To our knowledge, this is the first P efficiency gene that has been cloned based on genetic mapping and genomic sequence data in soybean.

Development of a functional marker for soybean P efficiency
In this study, PCR markers for Indel170 were assayed in 14 soybean accessions, representing varieties with high and low P efficiencies, which were previously identified using conventional selection studies [12,14,46,47] (Table S2). As shown in Figure 8, all seven accessions with low P efficiencies produced a 312 bp amplicon, as did one variety with moderate P efficiency that did not carry the deletion allele ( Figure 8). The remaining six accessions with high P efficiencies produced amplicons of 302-306 bp because of the deletion of the Indel170 site. These results confirmed that Indel170 is highly associated with P efficiency and validated Indel170 as a functional marker.

Discussion
Since P is an essential macro-element for all living cells, it is indispensable in entire ecosystem, especially in agricultural production systems [11]. Low soil P availability has a profound influence on global agriculture and food production [3]. Crops with high P efficiency need to be developed to maximize yield in low-input systems such as poor soils [48]. This development can be achieved by the selection of cultivars with enhanced phosphateuptake or phosphate-use efficiency by traditional plant breeding programs or genetic engineering [49,50].
Recently, it has been shown that the capacity of plants to scavenge P from soils could be increased by upregulating the proton-translocating pyrophosphatases (H + -PPases). For example, overexpression of H + -PPases in Arabidopsis, tomato, rice, alfalfa or cotton leads to more elaborate root systems and increased production of leaves, fruits and seeds [51]. Plants possess a large number of adaptive responses to phosphate limitation, which are avenues to enhance the phosphate scavenging abilities; therefore, more related genes should be characterized.
Although several P-efficiency-related QTLs have been mapped in Arabidopsis [52], rice [4], wheat [53], maize [50] and soybean [12,14,15], few of these QTLs have been cloned. Based on previous mapping of the soybean P efficiency QTL [14,15], we combined linkage mapping, association mapping, bioinformatics and expression analysis to clone a P efficiency QTL, qPE8. Using in vitro expression and hairy root transformation we confirmed the function of the gene GmACP1. Finally, we identified the elite alleles and haplotypes of GmACP1, and developed valuable markers for breeding applications. This strategy could be useful for the dissection and cloning of the resting P efficiency QTL in the soybean genome and the P efficiency QTL in other plants.

GmACP1 functions as an acid phosphatase
The cloned soybean P efficiency gene GmACP1, showed significantly increased expression after P starvation (Figure 3 and  7). Comparison of the deduced GmACP1 amino acid sequence with related proteins ( Figure S4) and a protein activity assay ( Figure 4B) confirmed that GmACP1 encodes an acid phosphatase.
Acid phosphatases are a family of enzymes that are widespread in plants, animals, algae and fungi [54]. They can catalyze the hydrolysis of various phosphomonesters in acidic medium to release an inorganic phosphate and play a pivotal role in P metabolism [55]. In the past decades, studies have revealed that acid phosphatase activity (APA) could be used as a diagnostic criterion for human diseases [54,56], algae and plant P deficiency [57,58]. In animals, acid phosphatases are normally found at low concentrations. However, pronounced changes in their synthesis occur in particular diseases, such as prostate cancer, bone dysplasia etc., where unusually high or low enzyme expression is seen as part of the pathophysiological process [56,59]. Phosphatase activity has also been used as a P deficiency indicator in algae and in natural plankton populations [58]. In plant, for example, tomato (Lycopersicon esculentum) acid phosphatase gene LePS2, and the bean (Phaseolus vulgaris) acid phosphatase genes PvPS2 were specifically induced upon P starvation [39,40]. We observed also that P starvation increased the soybean root expression of GmACP1 110-fold relative to the control ( Figure 3). Furthermore, in the 192 soybean accessions, the leaf APA was significantly associated with leaf P content (PC) (P,0.0001), suggesting that the leaf APA is also  Figure 3. Quantitative real-time PCR of GmACP1 in ten representative accessions with diverse phosphorus (P) efficiencies. The Y-axis denotes the ratio of GmACP1 expression under low P (2P) and high P (+P, control) conditions after seven days. The levels of GmACP1 in the accessions with high P efficiency marked by the green box (Nau4, Nannong 94-156, the high P efficiency parent of the RILs; Nau50; Nau182 and Nau203) were increased by 26-to 110-fold; those in the accessions with low P efficiency marked by the red box (Nau3, Bogao, the low P efficiency parent of the RILs; Nau2; Nau18 and Nau190) were increased by 2-to 9-fold. Nau9 and Nau108 are representative accessions with moderate P efficiencies. doi:10.1371/journal.pgen.1004061.g003 a diagnostic criterion for P deficiency in soybean and can be used to screen for high P efficiency germplasms in soybean.
In addition, acid phosphatases can effectively improve the acquisition and utilization of organic P complexes [55]. Therefore, the overexpression of genes encoding acid phosphatases is an attractive paradigm to develop plants with higher P efficiencies [2]. Wang et al (2009) overexpressed an Arabidopsis purple acid phosphatase gene (AtPAP15) in soybean hairy roots and found that the APA was increased by 1.5-fold in transgenic hairy roots [6]; they also overexpressed AtPAP15 in soybean plants and observed significantly improved P efficiencies and yields. By overexpression in bean hairy roots and Arabidopsis, Liang et al (2012) observed that the putative bean protein phosphatase PvS2:1 was involved in root growth and P  accumulation [60]. In this study, the APA, root P content and root dry weight were significantly increased in the transgenic hairy roots overexpressing GmACP1 (Figure 5), indicating that the overexpression of GmACP1 could facilitate the uptake and use of organic phosphate in the culture solution by the transgenic soybean hairy roots.
Comprehensive analyses of transcripts in plants such as Arabidopsis [61] and white lupine (Lupinus albus) [62] have revealed the functional classification of the P-responsive genes in diverse metabolic pathways, including transcriptional regulation, ion transport, signal transduction, and other processes related to growth and development, indicating that the plant response to P deficiency is controlled by a complex network. The further characterization of the key genes in this network would provide deeper insight into the regulatory mechanisms involved in plant responses to low-P stress.

Favorable haplotypes and alleles for high P efficiency
Stress resistance of an organism is a consequence of long-term co-evolution process. Resistance genes must maintain high mutation rates and high levels of nucleotide diversity to withstand environmental stresses [63]. SNPs, which include single base pair changes and small indels, are abundant and relatively stable in the genome, and they have been discovered within genes underlying observed traits [64]. However, SNPs within a narrow region are typically in high LD, i.e., adjacent SNP alleles on the same chromosome are strongly correlated [65]. Therefore, haplotype association is likely to be more powerful in the presence of LD, which could compensate for the bi-allelic limitation of SNPs and substantially improve the efficiency of QTL mapping [66,67]. Sequences and haplotype analysis of a candidate gene for aluminum (Al) accumulation, Nrat1, in 383 diverse rice accessions identified a tolerant haplotype that explained 40% of the Al tolerance variation and three non-synonymous mutations in Nrat1 that were predictive of Al sensitivity [28]. Association of the natural protein haplotypes of Heavy Metal ATPase 3 (HMA3) from 149 Arabidopsis accessions with leaf cadmium (Cd) accumulation and genetic complementation experiments identified five active haplotypes [68]. The elevated leaf Cd accumulation was associated with the reduced function of HMA3 caused by a nonsense mutation and polymorphisms that changed two specific amino acids [68]. Other recent studies include the identification of the downy mildew resistance genes in Arabidopsis [69] and the cyst nematode resistance gene in soybean [70]. Although some P efficiency genes have been characterized [11], to our knowledge, the natural variation and haplotype analysis of these genes and the possible underlying mechanism have not been reported.
In this study, haplotype analysis of GmACP1 gene revealed ten haplotypes (H1 to H10) ( Figure 6A). The RAPA, RPC and RSW of these ten haplotypes were significantly associated with each other, indicating that the P efficiency gene could contribute to seed yield and has therefore been retained during soybean domestication and breeding. Notably, the in vitro expressed protein encoded by the favorable haplotype H9 had 1.6-fold higher enzymatic activity than the protein encoded by the unfavorable haplotype H4 (Figure 6B), indicating that haplotype H9 might be significant for engineering plants with higher P efficiencies. Further studies on the origins of this favorable haplotype would be interesting.
In this study, the 192 soybean accessions used to identify the favorable haplotype of GmACP1 were selected to represent all six ecological regions of soybean cultivation in China and soybeans with differences in P efficiency. However, as more than 20,000 . P values are from an association analysis performed using the mixed model, incorporating population structure and kinship, using data from two years and two sites. d Phenotypic data across three environments were used for the analysis. R 2 values from multiple linear regression (MLR) of the data showing the percentage phenotypic variation explained. e P value from the MLR analysis of the data across three environments. f Fold change between two of the most differentiated haplotype classes. ns: not significant at P = 0.05; RAPA: relative acid phosphatase activity in two P conditions; RPC: relative P concentration in two P conditions. doi:10.1371/journal.pgen.1004061.t005 soybean accessions are preserved [71], other elite alleles of GmACP1 might be discovered using the stored soybean germplasms. These results can be incorporated into breeding efforts to design plants for more sustainable agriculture and a healthier environment.

Plant materials and field experiments
To map QTLs for P efficiency, a segregating soybean population consisting of 152 F 8:10 recombinant inbred lines (RILs) were used. These RILs were derived from a cross between the 'Nannong 94-156' variety that possessed high P efficiency and the 'Bogao' variety with low P efficiency. For the association mapping experiments, a natural population of 192 soybean accessions (Table S3), including landraces, cultivars and breeding lines collected from latitudes 53 to 24uN and longitudes 134 to 97uE in China, was used. These accessions were selected because they represented all six ecological regions of soybean cultivation in China and soybeans with differences in P efficiency. The seeds of each accession were obtained from the Germplasm Storage in the National Center for Soybean Improvement (Nanjing, China).
For the linkage mapping analyses, greenhouse trials using 152 RILs to map QTLs related to P efficiency were performed over two years at the Jiangpu Station in Nanjing, China, in 2006 and 2007 (all field experiments were conducted from June to October). For the association mapping studies, two greenhouse trials using the 192 accessions were performed at the Jiangpu Station in Nanjing, China and the Maozhuang Station in Henan, China in 2008, and a repeat experiment was performed in Nanjing in 2009. These experiments used a completely randomized design with a split-plot restriction. The primary plots included treatments (low P and normal P), and the subplots were the soybean genotypes. Three replicates were performed, with six plants per replicate. The soil had a very low P concentration of 2.51 mg kg 21 available P and contained 0.2 g kg 21 total nitrogen, 52.4 mg kg 21 available  (InDel170, InDel423, S432, S433, S1106 and S1688), and only the observed haplotypes are listed. The effects of these haplotypes were estimated based on the mean phenotypic values. The colored cells represent the favorable alleles, and the red-shaded cells represent polymorphisms that result in amino acid substitutions. Fold change between ''11TTGA'' and ''40TTAT'' represents the comparison between the best and the worst haplotypes as predicted by component allelic effects. RAPA: relative acid phosphatase activity under 2P and +P; RPC: relative P concentration under 2P and +P; RSW: relative 100-seed weight under 2P and +P. (B) Acid phosphatase activity of proteins from different haplotypes of GmACP1 expressed in E. coli. H4 denotes the haplotype ''11TTGA'' and H9 denotes the favorable haplotype ''40TTAT''. ** indicates a significant difference at P = 0.01. The error bars represent the standard errors of three independent repetitions. doi:10.1371/journal.pgen.1004061.g006  Table S2. L denotes the accessions with low P efficiency, H denotes the accessions with high P efficiency, and A represents the accession with moderate P efficiency. M, marker. doi:10.1371/journal.pgen.1004061.g008 Figure 7. Relationships between the acid phosphatase activities (APA), GmACP1 expression levels and haplotypes in the different accessions. The top panel corresponds to the APA from different accessions. The Y-axis denotes the relative APA (2P/+P). HNAPA2008 and NJAPA2008 denote the phenotypic data obtained in Henan and Nanjing in 2008. Nau10, Nau70, Nau80 and Nau90 are accessions with low P efficiency; Nau4, Nau103, Nau123 and Nau159 are accessions with high P efficiency; and Nau39 and Nau175 are accessions with moderate P efficiency. The bottom panel shows the GmACP1 qRT-PCR data at seven days following low P stress. The Y-axis denotes the relative expression of GmACP1 under 2P and +P. The accessions marked by green boxes have the favorable haplotype ''40TTAT'' for P efficiency whereas the accessions marked by red boxes have the unfavorable haplotype. doi:10.1371/journal.pgen.1004061.g007 K and 12.8 g kg 21 organic matter [9]. To evaluate the plant responses to low P availability, 60 mg kg 21 H 2 NCONH 2 and 20 mg kg 21 KCl or KH 2 PO 4 were applied to the low P and high P treatment pots in different growth stages, respectively.

Measurement of acid phosphatase activity, P concentration and plant dry weight
Leaf samples weighing approximately 0.1 g taken from soybean plants in the field experiments or soybean hairy root samples weighing 0.1 g were ground in liquid nitrogen and macerated in 1 mL of extraction buffer (50 mM sodium acetate buffer, pH 5.0). The extract was then centrifuged at 16,0006g for 10 min at 4uC. The supernatant (10 mL) was mixed with 490 mL of extraction buffer and used to assay the acid phosphatase activity (APA) with r-nitrophenol phosphate (r-NPP) as the substrate [39]. The acid phosphatase activity was measured at 37uC for 10 min, and the reaction was terminated using 1 mL of 2 M NaOH. The OD at 410 nm was measured in a spectrophotometer. In situ staining for APA was performed by culturing whole transgenic roots in B&D solution [72] with r-NPP for one week at 27uC.
The plant enzymes were deactivated by heat-induced denaturation at 105uC for 60 min; the samples were then oven-dried at 65uC for three days. The dried samples were milled and subsequently digested with concentrated H 2 SO 4 and H 2 O 2 to facilitate the determination of P concentration using the molybdate-blue colorimetric method [73]. The P efficiency in the hairy root (root P-use efficiency) was defined as the mass (mg) of root dry weight produced per mg of P absorbed by roots [74].

QTL mapping and analysis of gene synteny
For the linkage mapping analysis, the composite interval mapping (CIM) program of WinQTLCart version 2.5 [75] was used to detect QTLs for traits related to P efficiency using the 152 RILs. For each trait, empirical thresholds were computed using the permutation test (1,000 permutations, overall error level 5%) for CIM [76]. The confidence intervals were set as the map interval that corresponded to a 1-LOD decline on either side of the LOD peak. A genetic map comprising 306 markers was constructed and used for linkage mapping [14]. The genes in the target genomic region were predicted using the soybean genome information (http://soybase.org/SequenceIntro.php). The candidate genes related to P efficiency were identified by querying the predicted genes using BLASTP of the protein database (http://www.ebi.ac. uk/Tools/sss/ncbiblast/), and synteny comparisons were performed between soybean and other dicotyledonous plants.

Determination of transcript levels by qRT-PCR
The candidate gene expression was analyzed using hydroponically grown plants. The seeds were surface-sterilized with chlorine and germinated in sterile vermiculite. When the cotyledons were fully expanded, the soybean seedlings were selected and transferred to modified one-half Hoagland's nutrient solution supplemented with 500 mM KH 2 PO 4 for seven days. The seedlings were then transferred to modified one-half Hoagland's nutrient solution lacking P for seven days. Treatment with 500 mM KH 2 PO 4 was used as a control. The roots and shoots were sampled and stored at 270uC for further use. The total RNA was isolated from the roots of soybean plants using the RNA simple Total RNA Kit (DP419, TIANGEN, Beijing, China) and treated with 10 units of RNase-free DNase I (TaKaRa, Japan). The first strand of cDNA was synthesized using the SuperScript III First Strand Synthesis System (Invitrogen, USA). Gene expression was determined by qRT-PCR assays using the ABI 7500 system (Applied Biosystems, Foster City, USA). The PCR reactions contained 50 ng of the first cDNA strand, 0.5 mL of 10 mmol L 21 gene-specific primers (Table S1), and 10 mL of the real-time PCR SYBR MIX (QPK-201; TOYOBO). The PCR conditions were as follows: 95uC for 5 min and 40 cycles at 95uC for 15 s and 60uC for 60 s. The soybean tubulin gene (GenBank: AY907703.1) was amplified as a positive control, and a negative control reaction was performed using water instead of the cDNA. Three replicates were performed for each reaction, and the data were analyzed using the ABI 7500 Sequence Detection System (SDS) software version 1.4.0. The normalized expression, reported as fold changes, was calculated for each sample as DD CT = (C T, Target 2 C T, Tubulin ) genotype 2(C T, Target 2C T, Tubulin ) calibrator [77].

DNA sequencing and genotyping
The genomic DNA was isolated from the bulked leaf tissues of eight to 10 plants, as previously described [78]. The PCR primers for the different germplasms were designed using the Primer 3 online tool (http://frodo.wi.mit.edu/primer3/). To identify SNPs in the GmACP1 gene, we sequenced approximately 3,300 bp, including part of the promoter, the 59-UTR, the complete CDS, three introns and part of the 39-UTR in ten representative P-efficient accessions that displayed differences in P efficiency. These regions were amplified by PCR using the following primer pairs: (1) 59-AGCATCCACAGAAAAATCCC/AGAACGAGG-GAATAAAAGGG-39 at 58uC (annealing temperature), (2) 59-TTATTCCCTCGTTCTGCTCT/GAATAAGGCTCTGTTT-GGGT-39 at 58uC, (3) 59-TCACTTTTAGCAGCACTCTC/ CTGGTTGAACAAATCGGTGA-39 at 60uC, (4) 59-TCACT-TTTAGCAGCACTCTC/CTGGTTGAACAAATCGGTGA-39 at 58uC, and (5) 59-ATGAAAAGAGATGGATGCTA/AATA-CAAGTCCAATAACCTA-39 at 58uC. The PCR reactions were conducted in 50 mL volumes using the ExTaq polymerase (TAKARA, Kobe, Japan), following the manufacturer's recommendations. The PCRs were performed using a PTC-225 thermal cycler (MJ Research, Watertown, MA) as follows: 1 cycle of 5 min at 94uC; 30 cycles of 40 s at 94uC, 1 min at the appropriate annealing temperature of the specific primer pair and 1 min at 72uC; and 1 cycle of 10 min at 72uC. The DNA sequencing was performed at Takara (Dalian, China), and each PCR fragment was sequenced in both directions. All sequences were verified manually, and all observed singletons were verified by sequencing the newly amplified fragments. The sequences were aligned using the ClustalX software version 1.83 [79]. The polymorphism data were analyzed using the DnaSP software version 4.10 [80] to identify sequence variation. The identified SNPs with minor allele frequencies of at least 5% were genotyped in a larger sample from the population. The LD analysis was performed using the HapView 4.0 software program [81] on the entire GmACP1 sequence; a window size of 50 bp was used to plot the average D9 against the distance (base pairs). The significance of LD between the sites was determined using Fisher's exact test.

Association analysis
The 192 soybean accessions were genotyped with 82 unlinked SSRs that provided representative coverage of the soybean genome. The employed SSR markers are publicly available (http://bldg6.arsusda.gov/cregan/soy_map1.html). The population structure was inferred from the SSR data using the STRUCTURE software, version 2.2 [82]. Six independent runs were implemented using the following parameters: number of populations (K) from 1 to 10, burn-in time and Markov-chain Monte Carlo replication numbers set to 500,000, model of admixture and correlated allele frequencies. The K value was determined by LnP (D) in the STRUCTURE output, and the population structure at K = 2 was used for the association analysis, as previously reported [83]. STRUCTURE produces a Q matrix that lists the estimated membership coefficients for each individual in a cluster; this information was used in the subsequent association analysis. The relative kinship matrix was calculated using 1,536 SNP markers [35] for the 192 accessions by SPAGeDi, which is a program that analyzes the spatial genetic structure at the individual or population levels [84]. The negative kinship values between individuals were set to zero.
Both the genome-wide association and candidate-gene association analyses were performed based on the 192 natural soybean accessions and the 1,536 SNPs [35] using a mixed-model approach that controlled for complex population structure and pedigree relationships [85], implemented in TASSEL version 4 [45]. This approach simultaneously accounted for multiple levels of relatedness based on random genetic markers, which were used to establish the population structure and kinship matrix. All polymorphisms (frequency .5%) were tested, and the P-value for each site was estimated based on 1,000 permutations of the dataset under a mixed linear model (MLM). Markers were defined as being significantly associated with traits on the basis of a significant association threshold (2LogP$2.00, P#0.01). Associations for the relative acid phosphatase activity (RAPA) and relative P concentration (RPC) were performed on the 192 soybean accessions; only the significant association sites were reported. We performed a multiple linear regression (MLR) using SAS version 9.0 (GLM) (SAS Institute Inc., Cary, NC, USA) to detect the effect of GmACP1 polymorphisms and haplotypes in this natural soybean population. Furthermore, we used a 5-fold cross validation to estimate accuracy of the algorithm (MLR). We divided the phenotypic data into five segments, four of which were used for training, whereas one segment was omitted and used for testing. This was performed 1000 times; in the first instance, the first segment was used for testing and the remainder was used for training; then, the second segment was used for testing and the remaining segments were used for training, and so on. For the regional association mapping studies, the 192 soybean accessions were genotyped with 18 SSR markers and 22 SNP markers based on the soybean physical map (http://soybase.org/ gbrowse/cgi-bin/gbrowse/gmax1.01/#search). The regional association analysis was performed as described above.

Expression and purification of GmACP1 in E. coli
The GmACP1 coding sequences (including the sequences of high/low P efficiency haplotypes) were was cloned into the pET28a expression vector and expressed in E. coli (BL21) cells, as described by Baldwin et al. [39]. The recombinant clones contained His tags at both the N-and C-terminal ends of the GmACP1 peptide. The E. coli cells were induced with IPTG to produce the recombinant protein and then lysed by sonication and run through a Ni-affinity column, according to the manufacturer's recommendations (GenScript, TM0217). The purity of the affinity-purified GmACP1 protein was confirmed on an SDS-PAGE gel. The GmACP1 protein concentration was measured using the Bradford Protein Assay Kit (Bio-Rad, Hercules, CA). The reactions contained 150 ng of GmACP1 protein and 1 mg of synthetic substrate r-nitrophenyl phosphate in 500 mL of NaAc-HAc solution (100 mM). Control assays were performed using an extract from E. coli containing only pET-28a. All reactions were performed in triplicate. The GmACP1 phosphatase activity was measured at 37uC for 10 min with pH values ranging from 3.0 to 9.0.

Soybean hairy root transformation
The GmACP1 overexpression vector was constructed using the Gateway technology with the Clonase II Kit (Invitrogen, Carlsbad, CA). The GmACP1 open reading frame (ORF) was amplified from the cDNA of soybean accession Kefeng No. 1, which is a variety with high P efficiency, using gene-specific primers (forward: 59-GGGGACAAGTTTGTACAAAAAAG-CAGGCTTCCAACATGTCTGGAACCGTGAT-39, reverse: 59-GGGGACCACTTTGTACAAGAAAGCTGGGTCTGGTCTA CTGGGAGGACT-39). Two recombination reactions (BP and LR reactions) constitute the basis of the Gateway cloning technology. Finally, the amplified fragment was introduced into the pMDC83 vector according to the manufacturer's instructions and confirmed by sequencing. An empty vector was constructed using a single enzyme digestion method. The plasmid pMDC83 was digested with the restriction endonuclease KpnI and re-annealed following the removal of the fragment of the recombination site between attR1 and attR2. This process was confirmed by PCR (forward: 59-GGTTGGCCATGGAA-CAGGTA-39, reverse: 59-GAGGACCTCGA CTCTAGAAC-39) and sequencing analyses. Next, the expression vector containing the GmACP1 gene and empty vector were introduced into Agrobacterium rhizogenes K599 (kindly provided by Prof. Peter Gresshoff) through the freeze-thaw method. Soybean hairy root transformation was performed using the accession Kefeng No. 1, as previously described by Kereszt et al. [43]. The seeds were surface-sterilized with chlorine gas for 4 h prior to germination in vermiculite.  Figure S2 Quantile-quantile plots of estimated 2log10 (p) for phosphorus efficiency related traits from association analysis based on MLM with Q and K. HNAPA2008 and NJAPA2008 denote the phenotypic data obtained in Henan and Nanjing in 2008. The black triangles represent the P values expected under the null distribution. (TIF) Figure S3 qRT-PCR of five candidate genes in two representative accessions with different phosphorus (P) efficiency values (the Y-axis denotes the gene expression levels). Nau3 (L) is an accession with low P efficiency, and Nau4 (H) is an accession with high P efficiency (gene annotation: Glyma08g20700, Calcineurin B; Glyma08g20710, Phospholipase D; Glyma08g20800, Putative Phosphatase; Glyma08g20820, Putative Phosphatase; and Gly-ma08g20830, Protein Phosphatase). (TIF) Figure S4 Comparison of GmACP1 with other related proteins. Invariant residues are shown in bold, and other conserved residues are highlighted. Alignment of the amino acid sequences revealed two peptide motifs that are conserved in the active site of the HAD and DDDD superfamilies of hydrolytic phosphotransferases. The Asp residue predicted to be transiently phosphorylated during the phosphate transfer reaction is indicated with an arrow. The abbreviations and GenBank accession numbers for the acid phosphatase sequences analyzed are as follows: Phaseolus vulgaris putative phosphatase (ABP52095); Ricinus communis putative phosphatase (XP_002527425); Arabidopsis thaliana putative phosphatase (NP 173213); LePS2, Lycopersicon esculentum (AAG40473); and Zea mays putative phosphatase (NP_001151156). (TIF)