Allelic Variations of a Light Harvesting Chlorophyll A/B-Binding Protein Gene (Lhcb1) Associated with Agronomic Traits in Barley

Light-harvesting chlorophyll a/b-binding protein (LHCP) is one of the most abundant chloroplast proteins in plants. Its main function is to collect light energy and transfer it to photosynthetic reaction centers. However, the roles of different LHCPs in light-harvesting antenna systems remain obscure. Exploring nucleotide variation in the genes encoding LHCP can facilitate a better understanding of LHCP functions. In this study, nucleotide variations in Lhcb1, an LHCP gene in barley, were investigated across 292 barley accessions collected from 35 countries using EcoTILLING, a variant of Targeting Induced Local Lesions In Genomes (TILLING). A total of 23 nucleotide variations were detected, including three insertions/deletions (indels) and 20 single nucleotide polymorphisms (SNPs). Among them, 17 SNPs were in the coding region, nine of which were missense changes. Two of the missense SNPs are predicted to be deleterious to protein function. Seventeen SNPs formed 31 distinguishable haplotypes in the barley collection. The levels of nucleotide diversity at the Lhcb1 locus differed markedly among geographic origins and species of accessions. The accessions from Middle East Asia exhibited the highest nucleotide and haplotype diversity. H. spontaneum showed greater nucleotide diversity than H. vulgare. Five SNPs in Lhcb1 were significantly associated with at least one of the six agronomic traits evaluated, namely plant height, spike length, number of grains per spike, thousand grain weight, flag leaf area and leaf color, and these SNPs may be used as potential markers for improvement of these barley traits.


Introduction
Light-harvesting chlorophyll a/b-binding protein (LHCP) is one of the most abundant proteins of the chloroplast in plants. It accounts for roughly half of the chlorophyll involved in photosynthesis. The main function of LHCPs is to collect light energy and transfer it to photosynthetic reaction centers [1][2][3][4]. Many homologous genes encoding LHCPs from various plant species belong to one of the 10 members of the gene family [5][6][7]. Four LHCPs of photosystem (PS) I, named LHCI, are encoded by Lhca1, Lhca2, Lhca3 and Lhca4 [7]. Three major PS II associated LHCPs, designated as LHCII and encoded by Lhcb1, Lhcb2 and Lhcb3, are highly homologous and probably form homo- or heterotrimers [7,8]. Three other PS II associated LHCPs have been designated as minor LHCPs, including the inner antenna chlorophyll a-binding complexes CP29, CP26 and CP24, which are encoded by the Lhcb4, Lhcb5 and Lhcb6 genes, respectively [8]. The minor LHCPs are monomeric and more closely associated with PS II than the major LHCPs [7,9]. However, the roles of each LHCP in the structure, function and regulation of the light-harvesting antenna systems remain to be discovered [10].
Several studies have indicated that LHCP genes are down-regulated under stress conditions such as cold [11], high salinity [11], drought [12,13] and infection by Puccinia triticina [14]. Moreover, at a moderate level of dehydration stress, a higher level of LHCP transcripts was detected in IR62266, a high osmotic adjustment (OA) line of Oryza sativa ssp. japonica, than in the low-OA line CT9993 [12]. Similarly, under drought stress, a higher expression level of an LHCP was observed in the drought-tolerant genotype Martin than in the drought-sensitive genotype Moroc9-75 [13]. The ability of different accessions to adapt to stress conditions resides in their genetic diversity. Single nucleotide polymorphisms (SNPs) and small insertions and deletions (indels) are the most common forms of nucleotide variation in natural populations [15]. To date, allelic variation in LHCP genes has not been systematically examined. Exploring genetic variation in genes encoding LHCPs may facilitate a better understanding of the functions of LHCPs and provide useful information and selection tools for plant breeders to develop plants with high photosynthetic efficiency.
Many techniques can be used to analyze nucleotide variation within a gene. Sequencing is the most accurate approach, but it is relatively expensive when applied to large numbers of individuals [16]. Since 2004, EcoTILLING, a variant of the Targeting Induced Local Lesions in Genomes (TILLING) technique [17], has been increasingly used in several species to discover nucleotide polymorphisms of important genes in natural populations because of its high throughput, accuracy and cost-effectiveness [18][19][20]. In sunflower, seven SNPs and two indels were identified in an LHCP region using EcoTILLING in 19 elite inbred lines [21]. In barley, allelic variations were identified in the mlo and Mla resistance genes [22] and in drought-related genes [23] using the same method.
In this study, a natural population of 292 barley accessions with diverse geographic origins was analyzed using EcoTILLING to examine allelic variation of an Lhcb1 gene. A total of 23 nucleotide changes were detected, with 31 distinguishable haplotypes in the germplasm collection. The potential association of SNPs with changes in protein function was evaluated. The distribution of SNPs among accessions from different geographic origins (Africa, Middle East Asia, North East Asia, Arabian Peninsula, Australia and Europe) and genotypes (wild, cultivar and landrace) was investigated. In addition, association analysis between SNPs in Lhcb1 and six agronomic traits of barley was performed.

Plant materials and DNA extractions
A set of 292 barley (Hordeum vulgare L.) accessions was obtained from the International Center for Agricultural Research in the Dry Areas (ICARDA) (Table 1 and Table S1). These accessions comprise 171 H. vulgare landraces (VUL-LR), 82 H. vulgare cultivars or improved genotypes (VUL-IG) and 39 accessions of the wild relative H. spontaneum (SPON), which were collected from 35 countries in six geographic regions: Africa, Middle East Asia, North East Asia, Arabian Peninsula, Australia and Europe.
Genomic DNA of the barley accessions was extracted from 200 mg of young leaf tissue using a modified CTAB method [24]. DNA from all samples was quantified using a spectrophotometer and normalized to a concentration of 20 ng/µl.

Evaluation of agronomic traits
All accessions were evaluated for six agronomic traits, flag leaf area (FLA in cm²), spike length (SL in cm), number of grains per spike (NGS), leaf color (SPAD value), plant height (PH in cm) and 1000-kernel weight (TKW in g), in the field at the Experimental Station of Guangzhou University, Guangzhou, Guangdong Province, China (23°16′N; 113°23′E, elevation 16 m asl). The experiments were repeated twice (2009/2010 and 2010/2011) with three replications. Eleven plants per genotype were planted in single-row plots 1.5 m long and 30 cm apart. Three randomly selected plants per genotype from each replication were characterized for the six traits (Table 2) as described by Gupta [25] and Lakew [26].

Primers for Lhcb1
To screen for natural variation in the Lhcb1 gene of barley, nested PCR was employed to amplify the coding region of Lhcb1 as described by Wienholds [27]. The primer design was based on the published mRNA sequence (including the complete coding region) of Lhcb1 from GenBank (accession no. AK359563.1), with melting temperatures around 60°C, using Primer 5.0 software (Premier Biosoft International, Palo Alto, CA, USA) (Table 3 and Fig. 1). For the second PCR, an M13F sequence (5′-cacgacgttgtaaaacgac) was attached to the 5′ end of each gene-specific forward primer and an M13R sequence (5′-ggataacaatttcacacagg) to the 5′ end of each gene-specific reverse primer (Table 3). M13 forward primers labeled with IRDye800 at the 5′ end and M13 reverse primers labeled with IRDye700 at the 5′ end were synthesized by LI-COR Inc.
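The primer tailing step can be illustrated with a short sketch. The M13 tail sequences are those given in the text; the gene-specific primers below are hypothetical placeholders (the actual sequences are listed in Table 3, which is not reproduced here), and the function name is an assumption made for illustration.

```python
# M13 tail sequences as given in the text (written 5' -> 3').
M13F_TAIL = "cacgacgttgtaaaacgac"
M13R_TAIL = "ggataacaatttcacacagg"


def add_m13_tail(gene_specific: str, direction: str) -> str:
    """Return a gene-specific primer with the matching M13 tail on its 5' end."""
    if direction == "forward":
        tail = M13F_TAIL
    elif direction == "reverse":
        tail = M13R_TAIL
    else:
        raise ValueError("direction must be 'forward' or 'reverse'")
    # The tail is prepended so that it sits at the 5' end of the primer.
    return tail + gene_specific.lower()


# Hypothetical gene-specific primers, NOT the sequences from Table 3.
example_forward = "ATGGCTGCTTCTACCATGGC"
example_reverse = "TTACTTGCCGGGGACGAAGT"

tailed_forward = add_m13_tail(example_forward, "forward")
tailed_reverse = add_m13_tail(example_reverse, "reverse")
print(tailed_forward)
print(tailed_reverse)
```

Because the labeled M13 primers anneal to these tails in the second round, the same pair of IRDye-labeled primers can in principle be reused for any tailed gene-specific pair.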

PCR amplification and EcoTILLING assays
For the EcoTILLING assay, the mRNA sequence of Lhcb1 was amplified by nested PCR as described by Wienholds [27] with minor modifications. The accession ICARDA IG 26727 was selected as the reference. Initial PCR amplification of the target region was performed using 20 ng of genomic DNA (reference and sample DNA mixed 1:1) in a volume of 10 µl containing 1.0 µl of 10× PCR buffer, 0.1 µM each of the forward and reverse gene-specific primers, 2.5 mM MgCl₂, 0.4 mM dNTPs and 0.4 U Taq DNA Polymerase (Bio Basic Inc., Toronto, Canada) under the following conditions: 5 min denaturation at 94°C, followed by 35 cycles of 30 s at 94°C, 45 s at 58°C and 1 min at 72°C, and a final extension step of 3 min at 72°C. The PCR product was diluted in 90 µl of distilled water and used as the template for the second round of nested PCR.
The second round of PCR was carried out in a 10 µl solution containing 1 µl of the initial PCR product, 1.0 µl of 10× PCR buffer, 0.02 µM M13F-tailed gene-specific forward primer, 0.04 µM M13R-tailed gene-specific reverse primer, 0.08 µM IRD800- . Thermocycling conditions consisted of an initial step of 94°C for 1 min, followed by 38 cycles of 20 s at 94°C, 30 s at 58°C and 1 min at 72°C, and a final step of 3 min at 72°C. After the nested PCR, heteroduplexes were formed by incubating the reaction mix at 99°C for 10 min, followed by 70 cycles of 20 s each, starting at 70°C and decreasing by 0.3°C per cycle, and then holding at 4°C. Heteroduplex DNA was cleaved at 45°C for 15 min in 20 µl of reaction solution containing 10 µl of PCR product, 10 mM HEPES (pH 7.5), 10 mM MgSO₄, 0.002% (w/v) Triton X-100, 0.2 mg/ml bovine serum albumin and 0.4 µl of CEL I enzyme. The CEL I enzyme was prepared following Guo and Li [28]. Digestion was stopped by adding 5 µl of 0.25 M EDTA (pH 8) and mixing thoroughly, and the samples were then placed on ice. Digested products were separated on a LI-COR 4300 DNA Analyzer (LI-COR, Nebraska, USA) by 6.5% denaturing polyacrylamide gel electrophoresis at 1500 V, 40 mA, 50 W and 45°C for 5 hours.
During electrophoresis, the LI-COR DNA analyzer captured two images, one in the IRD700 channel and one in the IRD800 channel. TIFF images were manually scored using the GelBuddy program [29]. Dark bands of different sizes that appeared in both the IRD700 and IRD800 channels were considered to indicate a polymorphic site (Fig. 2). The combined length of the fragments from the two channels should equal the size of the undigested PCR product. Data summary reports generated by GelBuddy were imported into Microsoft Excel for further analysis. The number of haplotypes was estimated using the Bayesian methods implemented in the program PHASE, version 2.1 [30,31].
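The step-down annealing programme used for heteroduplex formation can be written out explicitly. The sketch below assumes that the first of the 70 cycles runs at 70°C and that the 0.3°C decrement is applied from the second cycle onward, as the description suggests; the function name is illustrative only.

```python
def annealing_ramp(start_c: float = 70.0, step_c: float = 0.3, cycles: int = 70):
    """Temperature (deg C) of each cycle in the step-down annealing ramp."""
    return [round(start_c - step_c * i, 1) for i in range(cycles)]


ramp = annealing_ramp()
# First and last cycle temperatures of the ramp.
print(ramp[0], ramp[-1])
# Total ramp time in minutes at 20 s per cycle (excluding transition times).
print(len(ramp) * 20 / 60)
```

Under these assumptions the ramp ends a little below 50°C after roughly 23 minutes, before the final hold at 4°C.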
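The size-consistency rule applied during scoring can be expressed as a small check. This is an illustrative sketch only: the function name, the 10 bp tolerance, the amplicon length and the band sizes are all assumptions, and none of them come from GelBuddy or from the gels analyzed in this study.

```python
def is_consistent_cut(size_700: float, size_800: float,
                      amplicon_bp: float, tolerance_bp: float = 10.0) -> bool:
    """True if the two channel fragments add up to the full amplicon length.

    A CEL I cut at one mismatch yields one IRD700-labeled and one
    IRD800-labeled fragment whose sizes should sum to the undigested product.
    """
    return abs((size_700 + size_800) - amplicon_bp) <= tolerance_bp


# Hypothetical band sizes in bp, not values read from the study's gels.
print(is_consistent_cut(320, 680, amplicon_bp=1000))  # complementary pair
print(is_consistent_cut(320, 500, amplicon_bp=1000))  # sizes do not add up
```

Bands that fail such a check in practice would be candidates for artefacts rather than true polymorphic sites, which is why both channels are scored together.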

DNA sequencing and statistical analysis
Once a polymorphism was identified, the corresponding DNA sample was amplified using gene-specific primers. The resulting PCR fragment was directly sequenced. Each polymorphic site was sequenced in more than one accession to confirm that only two alleles segregated at any specific site. Multiple sequence alignment was conducted using ClustalW software (http://www.ebi.ac.uk/tools). The potential effect of SNPs on protein function was predicted using SIFT (Sorting Intolerant from Tolerant) [32] and PARSESNP, which scores changes against a position-specific scoring matrix [33]. Nucleotide diversity (π), haplotype diversity and Tajima's D [34] were calculated using DnaSP v5.0 [35].
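For readers unfamiliar with these summary statistics, the sketch below shows how nucleotide diversity, haplotype diversity and Tajima's D are commonly defined. It is not the DnaSP implementation: it assumes aligned, gap-free haploid sequences of equal length, and the four-sequence alignment is a made-up toy example.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def mean_pairwise_differences(seqs):
    """Average number of differing sites over all pairs of sequences (k)."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


def nucleotide_diversity(seqs):
    """pi: mean pairwise differences per site."""
    return mean_pairwise_differences(seqs) / len(seqs[0])


def haplotype_diversity(seqs):
    """Nei's haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(f * f for f in freqs))


def tajimas_d(seqs):
    """Tajima's D from mean pairwise differences and segregating sites."""
    n = len(seqs)
    seg_sites = sum(len(set(col)) > 1 for col in zip(*seqs))
    if seg_sites == 0:
        return 0.0
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (mean_pairwise_differences(seqs) - seg_sites / a1) / sqrt(variance)


# Toy alignment (hypothetical data, not Lhcb1 sequences).
toy = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "ACGTACGT"]
pi = nucleotide_diversity(toy)
hd = haplotype_diversity(toy)
d = tajimas_d(toy)
print(pi, hd, d)
```

A negative D, as in this toy case, reflects an excess of low-frequency variants relative to the neutral expectation; whether it is significant must be judged against the appropriate null distribution.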

Association between SNPs and agronomic traits
To test the effect of SNPs in Lhcb1 on agronomic traits of barley, the association between SNP markers and traits was calculated using TASSEL software v3.0 (http://www.maizegenetics.net/tassel). To evaluate population structure, all barley accessions were genotyped with 21 genome-wide SSR markers (three SSRs per chromosome) (Table S2), and three groups were defined (unpublished) using Structure software version 2 [36]. These independent group memberships were used as covariates in the genotype-phenotype association analysis with the GLM_Q model. The marker being tested was treated as a fixed effect. The significance of associations between markers and traits was tested using an F-test. The association between a marker and a trait is represented by its R² value, an estimate of the percentage of variance explained by the marker.
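The logic of a structure-adjusted marker test can be sketched by comparing two nested linear models, one with only the population-structure covariates and one that also includes the marker. This is a simplified pure-Python illustration, not TASSEL's implementation: the helper names, the 0/1 marker coding, the hard group memberships and all data values are assumptions, and the p-value lookup from the F distribution is omitted.

```python
def ols_rss(X, y):
    """Residual sum of squares after least-squares regression of y on X."""
    p = len(X[0])
    # Augmented normal equations: (X'X | X'y).
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


def marker_test(y, marker, q):
    """F statistic and R^2 for one biallelic marker, adjusting for Q covariates."""
    n = len(y)
    reduced = [[1.0] + list(qi) for qi in q]            # intercept + Q
    full = [row + [float(m)] for row, m in zip(reduced, marker)]
    rss_reduced, rss_full = ols_rss(reduced, y), ols_rss(full, y)
    mean_y = sum(y) / n
    tss = sum((yi - mean_y) ** 2 for yi in y)
    df_resid = n - len(full[0])
    f_stat = (rss_reduced - rss_full) / (rss_full / df_resid)  # 1 marker df
    r2 = (rss_reduced - rss_full) / tss
    return f_stat, r2


# Hypothetical data: with three groups, only two Q columns are used
# because memberships sum to one and the model has an intercept.
q = [(1, 0), (1, 0), (0, 1), (0, 1), (0, 0), (0, 0), (1, 0), (0, 1)]
marker = [0, 1, 0, 1, 0, 1, 1, 0]
trait = [7.1, 8.4, 6.0, 7.2, 6.5, 7.9, 8.6, 6.1]
f_stat, r2 = marker_test(trait, marker, q)
print(f_stat, r2)
```

Here R² is the share of total trait variance attributed to the marker after the structure covariates have been fitted, which matches the interpretation of R² given above.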

Allele mining in the Lhcb1
EcoTILLING identified 23 natural variation sites in the amplified region of Lhcb1 across 292 accessions. The frequency of polymorphic sites ranged from 0.003 to 0.264, with an average of 0.06 per polymorphic site in the 292 samples (Table 4). Sequencing random samples containing each of these variation sites confirmed 20 single nucleotide polymorphisms (SNPs) and three insertions/deletions (indels) among the 23 natural variation sites (Table 4 and Fig. 1). However, two additional variation sites indicated by EcoTILLING in two samples could not be confirmed by sequencing (Table 4). Lhcb1 had a frequency of one SNP per 49.3 bp in the 292 barley accessions. The ratio of transitions (C-T and A-G) to transversions (A-C, A-T, C-G and G-T) among the SNPs was 15 to 5 in the targeted region of Lhcb1. In 20 sequence-validated SNPs, nine sites were missense changes, eight were silent (synonymous) changes, and three were indels in the 3′ downstream non-coding region. Two of the nine missense changes were predicted to be deleterious to the function of the Lhcb1 protein (Table 4).
The nucleotide diversity (π) of Lhcb1 was 0.00166 across the 292 barley accessions. Among geographic regions, π values ranged from 0.0011 for European accessions (9 accessions) to 0.00212 for Middle East Asian accessions (56 accessions). Similarly, π for SPON was the highest among the three groups, SPON, VUL-LR and VUL-IG (Table 5). Tajima's D statistic was calculated to examine whether the SNPs in the sequenced region of Lhcb1 evolved neutrally. The resulting Tajima's D value of −1.12884, although clearly negative, was not significant at P < 0.05. Thus, Lhcb1 in this population did not deviate significantly from neutrality.
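The transition/transversion ratio and SNP density reported above follow from simple counting, as the sketch below illustrates. The classifier and the four example substitutions are illustrative assumptions; the region length is only what the reported density would imply if it was computed from the 20 SNPs, which the text does not state explicitly.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    pair = {ref.upper(), alt.upper()}
    return len(pair) == 2 and (pair <= PURINES or pair <= PYRIMIDINES)


def ts_tv_ratio(snps):
    """Ratio of transitions to transversions in a list of (ref, alt) pairs."""
    ts = sum(is_transition(ref, alt) for ref, alt in snps)
    tv = len(snps) - ts
    return float("inf") if tv == 0 else ts / tv


# Hypothetical substitutions, not the SNPs listed in Table 4.
toy_snps = [("C", "T"), ("A", "G"), ("G", "A"), ("A", "C")]
print(ts_tv_ratio(toy_snps))

# Counts reported in the text: 15 transitions and 5 transversions.
reported_ratio = 15 / 5
# Region length implied by 20 SNPs at one SNP per 49.3 bp (an inference).
implied_region_bp = round(20 * 49.3)
print(reported_ratio, implied_region_bp)
```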
The frequencies of the Lhcb1 haplotypes also differed significantly among the geographic regions of the tested accessions (Table 6). This was particularly obvious for haplotype H26, which was most frequent in the Arabian Peninsula (0.357), rare in Africa (0.018) and Middle East Asia (0.036), and completely absent in Australia, North East Asia and Europe. Rare haplotypes were usually confined to specific geographic regions. Of the 28 rare haplotypes (<10% in the accessions sampled), 20 were unique to one region, with nine exclusively from Middle East Asia, six from North East Asia, three from Africa, and two from accessions without information on their origins. The Lhcb1 haplotype diversity for each geographic region ranged from 0.556 (Europe) to 0.903 (Middle East Asia) with a mean of 0.768 (Table 5). These values in general corresponded to the number of Lhcb1 haplotypes discovered, with some exceptions. For example, accessions from Middle East Asia had the highest haplotype diversity of Lhcb1 and also the most Lhcb1 haplotypes (n = 18). However, the accessions from North East Asia had a very low haplotype diversity value even though their number of Lhcb1 haplotypes (n = 14) was second only to that of Middle East Asia, because most haplotypes in this region occurred at low frequency.
Table 4 footnotes: a The first letter indicates the common bp at this site, followed by the position of the SNP in the sequence of GenBank accession number AK359563.1, and then the nucleotide that is the rare variant at this site. b All nucleotide changes identified by sequencing were first detected by EcoTILLING as a band on the gel image. In two samples, bands of ~242 bp and ~532 bp were identified on the EcoTILLING gel for which corresponding polymorphisms could not be confirmed by sequencing. c Frequency was calculated by dividing the number of similar nucleotide changes identified on the EcoTILLING gel by the number of samples analyzed. d The first letter indicates the common amino acid at this site, followed by the position of the SNP within the predicted protein sequence and then the amino acid change induced by the variant nucleotide polymorphism. "=" means no change in the amino acid encoded by that codon (synonymous variation). e A non-synonymous SNP is predicted to be damaging to the encoded protein if the PARSESNP score is >10 (bold). f A non-synonymous SNP is predicted to be damaging to the encoded protein if the SIFT score is <0.05 (bold). *Adjacent polymorphisms appear as a single band on the gel image. doi:10.1371/journal.pone.0037573.t004
In addition, a significant difference in Lhcb1 haplotype diversity was observed among the three barley groups, i.e., SPON, VUL-LR and VUL-IG, with SPON having the highest haplotype diversity (Table 5). Although the three groups had six haplotypes in common, SPON, VUL-LR and VUL-IG had ten, seven and two unique haplotypes, respectively (Table 6).

Association between SNPs and phenotypic traits
Association analysis was performed to identify tentative associations between nucleotide variations in Lhcb1 and agronomic traits. Because 14 SNPs were either in linkage disequilibrium (LD) within subgroups or rare alleles (frequency <3%), only nine distinct SNPs were used for association analysis. Among them, five SNPs were significantly associated (P<0.01) with one or two phenotypic traits, and one of these SNPs was highly significantly associated (P<0.001) with two phenotypic traits (Table 7). The percentage of variation of a given trait explained by each associated SNP was up to 8.0%, with an average of 3.9%. The SNP at position 907 bp in Lhcb1 was highly significantly associated with SL and NGS (P<0.001), explaining 8.0% and 5.3% of the variation in SL and 5.0% and 5.6% of the variation in NGS in the two seasons, respectively. Another SNP, at position 1006 bp, exhibited a significant association (P<0.01) with SL, explaining 2.7% and 2.6% of the phenotypic variation in SL in the two seasons. The SNP at position 463 bp was significantly associated (P<0.01) with FLA and LC, explaining 3.0% of the phenotypic variation in FLA in season one and 2.2% of that in LC in season two. Two SNPs (positions 589 bp and 961 bp) were significantly associated (P<0.01) with TKW, each explaining approximately 2.4% of the phenotypic variation in the 2009 and 2010 experiments.

Use of EcoTILLING to discover SNP for specific genes in barley
EcoTILLING was initially used to characterize the variability of genes within a collection of Arabidopsis ecotypes [17]. Since then, it has been successfully used to analyze natural variability in Populus trichocarpa [37], wheat [19], Brassica [20] and barley [22,23]. Used in combination with sequencing, EcoTILLING becomes a fast, reliable and economical method for identifying polymorphisms and developing functional markers for plants [38]. Once polymorphisms are identified by EcoTILLING, individuals can be grouped according to haplotype, and only interesting haplotypes and/or representatives from each haplotype need to be sequenced. In addition, EcoTILLING indicates the approximate location of the polymorphism within the locus studied, so that only the regions around the polymorphic sites need to be sequenced rather than the complete locus [39]. In this study, these advantages together account for a reduction of more than 85% in the number of sequencing reactions potentially required to identify the variability of Lhcb1 in the germplasm collection.

Nucleotide variation in Lhcb1
Many LHCPs from various plant species have been identified by transcriptome analysis. However, allelic variation in LHCP genes has not been systematically characterized. Fusari [21] found seven SNPs and two indels in a sunflower LHCP after screening 19 elite inbred lines using EcoTILLING. Our primary goal was to characterize genetic variation of an Lhcb1 gene in barley. To this end, a set of barley accessions originating from several geographic regions was selected for allele mining. EcoTILLING revealed 23 nucleotide changes, including 20 SNPs and 3 indels, in Lhcb1, which formed 31 haplotypes in 292 accessions. Compared with a previous report on an Lhcb2 gene in 24 unrelated black poplar trees [40], the nucleotide diversity (π = 0.00166) and haplotype diversity (0.819) of Lhcb1 were lower. The average frequency of SNPs was 1 per 49.3 bp, which was higher than that reported for an Lhcb2 gene (1 SNP/73.9 bp) [40] and for an LHCP gene (1 SNP/76.7 bp) [21]. In addition, Middle East Asia was identified as a hotspot of haplotype diversity (0.903) (Table 5), in agreement with several earlier reports that barley accessions [41,42] and wheat accessions [43] from Middle East Asia had high genetic diversity. Among the three gene pools, SPON, VUL-LR and VUL-IG, SPON showed the highest nucleotide diversity (π) and the highest haplotype diversity in Lhcb1 in this study, which supports earlier observations of high genetic diversity in SPON [44][45][46].

Association between SNPs of Lhcb1 and agronomic traits
The LHCP gene family in plants encodes many LHCPs that play essential roles in light capture and photoprotection in the photosystems. A strong relationship between photosynthetic capacity and grain yield has been observed in cereals such as wheat and maize [47,48]. It is critical that the photosynthetic capacities of both the total canopy and specific leaves are maintained throughout the entire plant life cycle, especially from flowering to grain maturity [49]. In agronomic terms, some 'stay green' mutants have higher kernel weights than the wild type in maize. Thus, 'stay green' traits have been used extensively to improve grain yield under stress conditions such as drought and heat. However, little is known about the underlying genetics and molecular biology of the trait(s), even though some analyses have been performed in maize and sorghum [49,50].
Association analysis has emerged as a powerful approach for investigating the role of genetic polymorphisms in phenotypic variation in response to environmental stresses [51][52][53]. In this study, five SNPs in barley Lhcb1 were significantly associated with at least one agronomic trait. Of these five SNPs, two at positions 463 bp and 589 bp of Lhcb1 were missense mutations, but they did not severely affect protein function according to SIFT, and the other three SNPs, at positions 907 bp, 961 bp and 1006 bp, were in a non-coding region. Because of their low minor allele frequency, association data for three of these five SNPs, at positions 463 bp, 589 bp and 961 bp, should be interpreted with caution and need to be validated for the individual cultivars involved in crosses before they can be applied to marker-assisted selection [54,55]. Further research on the relationship between these newly detected SNPs in Lhcb1 and other important agronomic traits may provide useful markers as selection tools to improve barley yield under stress conditions.
In conclusion, we have demonstrated that EcoTILLING is an efficient approach for allele mining of barley candidate genes. Haplotype sequencing confirmed 23 nucleotide mutations, including 20 SNPs and 3 indels, with 31 unique haplotypes in Lhcb1 among 292 barley accessions from 35 countries. The results indicated that the accessions from Middle East Asia had the highest nucleotide diversity in Lhcb1, and that H. spontaneum exhibited greater genetic diversity than H. vulgare. Thus, introgression of genes from Middle East Asian accessions or H. spontaneum into cultivated barley may enhance genetic diversity. Association analysis showed that five SNPs in Lhcb1 were significantly associated with at least one agronomic trait, and these SNPs can be used in future studies to assess their usefulness as selection criteria for improving these agronomic traits.

Supporting Information
Table S1 General information of barley accessions used in this study.
(DOC)
Table 7 footnotes: FLA, flag leaf area (cm²); NGS, number of grains per spike; LC, leaf color (SPAD); PH, plant height (cm); SL, spike length (cm); TKW, thousand grain weight (g). SNP positions are numbered relative to the sequence of GenBank accession number AK359563.1. R² is the fraction of the total variation explained by the marker. *(P<0.01) indicates that the SNP is significantly associated with the trait. **(P<0.001) indicates that the SNP is highly significantly associated with the trait. doi:10.1371/journal.pone.0037573.t007