Cloning and Characterization of 5′ Flanking Regulatory Sequences of AhLEC1B Gene from Arachis hypogaea L.

LEAFY COTYLEDON1 (LEC1) is a B subunit of the Nuclear Factor Y (NF-YB) transcription factor that accumulates mainly during embryo development. We cloned the 5′ flanking regulatory sequence of the AhLEC1B gene, a homolog of Arabidopsis LEC1, and analyzed its regulatory elements using online software. To identify the crucial regulatory region, we generated a series of GUS expression constructs driven by promoters of different lengths with 5′-terminal and/or 3′-terminal deletions, and characterized the GUS expression patterns in the transgenic Arabidopsis lines. Our results show that both the 65-bp proximal promoter region and the 52-bp 5′ UTR of AhLEC1B contain key motifs required for essential promoter activity. Moreover, AhLEC1B is preferentially expressed in the embryo and is co-regulated by upstream regulators that bind to both positive and negative cis-regulatory elements in its promoter.


Introduction
The NF-Y (Nuclear Factor Y) transcription factor is ubiquitous in eukaryotic organisms. Its three subunits, NF-YA, NF-YB, and NF-YC, play an important role in regulating the expression of multiple genes, both positively and negatively, by recognizing and binding to the CCAAT box in promoters [1,2]. The Arabidopsis genome encodes 36 NF-Y subunits: 10 NF-YA, 13 NF-YB, and 13 NF-YC. These subunits are differentially expressed in tissue- or organ-specific patterns or at distinct developmental stages, and they participate in regulating many genes in a wide range of biological processes [3][4][5].
NF-Y transcription factor genes such as LEAFY COTYLEDON1 (LEC1, or NF-YB9) and LEC1-LIKE (L1L, or NF-YB6), both first identified in Arabidopsis, are related to embryonic development. AtLEC1 and AtL1L mRNAs accumulate in different spatial and temporal patterns. AtLEC1 mRNA levels are higher in early-stage embryos (at the proembryo, globular, transition, heart, torpedo, and curled-cotyledon stages) than in late maturation embryos, and the mRNA is not detectable in leaves, stems, roots, or flowers [6]. By contrast, AtL1L mRNA levels are higher in seeds than in vegetative tissues, and they peak later in embryogenesis than LEC1 levels do (mainly from the torpedo stage to the bent-cotyledon stage). Warpeha et al. (2007) [7] showed that NF-YB6 and NF-YB9 are expressed in 6-d-old etiolated Arabidopsis seedlings. Siefers et al. (2009) [3] identified 36 NF-Y subunits that can combine to govern tissue-specific expression patterns and processes such as flowering time, embryo maturation, and meristem development in Arabidopsis. The turnip (tnp) mutant is a gain-of-function mutant of Arabidopsis LEC1: in tnp, the elements required for repressing LEC1 in vegetative tissues are deleted from the distal upstream promoter region, causing elevated constitutive expression of LEC1 [8].
Here, we analyzed the phylogenetic relationships among the peanut transcription factors AhLEC1A and AhLEC1B and the Arabidopsis NF-YB transcription factors. We also cloned the 5' flanking regulatory sequence of the AhLEC1B gene and analyzed the cis-regulatory elements in this region computationally. Finally, to identify the crucial regulatory regions, we constructed a set of GUS expression constructs driven by promoter fragments with 5'- and 3'-terminal deletions and characterized the GUS expression patterns in the corresponding transgenic Arabidopsis lines.

Plant materials and growth conditions
Peanut (Arachis hypogaea L.) cv. 'Luhua 14' plants were grown in the experimental field of the Shandong Academy of Agricultural Sciences. Seeds at different developmental stages (10 to 70 days after pegging, DAP) were collected and stored at -80°C for total RNA isolation and cDNA library construction.

Cloning of the 5' flanking region of AhLEC1B
Peanut genomic DNA was isolated from Luhua 14 leaves using the CTAB method [9]. For each library, 2.5 μg of genomic DNA was digested separately with one of four blunt-end restriction enzymes (DraI, EcoRV, PvuII, or StuI). The digested DNA was purified by phenol/chloroform extraction, and 4 μl of each digest was ligated to the BD GenomeWalker adaptor (Table 1) supplied with the BD GenomeWalker Universal Kit (Clontech, USA), yielding the DraI, EcoRV, PvuII, and StuI libraries (LD, LE, LP, and LS). Based on the AhLEC1B genomic DNA sequence (S1 Fig), two nested gene-specific primers (GSPs), LEC1BGSP1-2 and LEC1BGSP2-2 (Table 1), were designed. The primary PCR was performed according to the manufacturer's instructions in a 25-μl reaction, with the kit's adaptor primer AP1 (Table 1) as the forward primer, LEC1BGSP1-2 as the reverse primer, and 1 μl of each library as template. The nested PCR was performed with the same volume and conditions, using primers AP2 (Table 1) and LEC1BGSP2-2 and 1 μl of a 10-fold dilution of the primary PCR product as template. Specific fragments from the nested reaction were isolated and cloned into the pEASY-T3 vector. Recombinant clones harboring the target fragment were verified by EcoRI digestion and by bidirectional sequencing on an ABI 3730 DNA sequencer.
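Each GenomeWalker library is defined by one blunt cutter, so a walking product obtained from a given library should begin at a blunt end produced by that enzyme (for example, PvuII for library LP). Once a product has been sequenced, an in-silico digest of the known sequence is a quick consistency check. The Python sketch below is a minimal illustration, assuming the standard recognition sites and cut positions of the four enzymes named above; the toy sequence is hypothetical, and adaptor ligation and PCR are not modeled.

```python
import re

# Blunt cutters used to build the four GenomeWalker libraries.
# Value: (recognition site, cut offset within the site on the top strand).
# All four sites are palindromic, so scanning the top strand finds every site.
BLUNT_ENZYMES = {
    "DraI": ("TTTAAA", 3),   # TTT^AAA
    "EcoRV": ("GATATC", 3),  # GAT^ATC
    "PvuII": ("CAGCTG", 3),  # CAG^CTG
    "StuI": ("AGGCCT", 3),   # AGG^CCT
}


def cut_positions(seq: str, enzyme: str) -> list[int]:
    """0-based top-strand cut positions for every site of `enzyme` in `seq`."""
    site, offset = BLUNT_ENZYMES[enzyme]
    # The zero-width lookahead reports overlapping sites as separate matches.
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq.upper())]


def fragment_lengths(seq: str, enzyme: str) -> list[int]:
    """Lengths of the blunt fragments from a complete single-enzyme digest."""
    bounds = [0] + cut_positions(seq, enzyme) + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Hypothetical toy sequence (not AhLEC1B) with one DraI and one EcoRV site.
toy = "AAAATTTAAACCCCGATATCGG"
for name in BLUNT_ENZYMES:
    print(name, fragment_lengths(toy, name))
```

A sequence with no site for an enzyme simply returns one fragment spanning the whole input, which in a real walk would mean the library's product must extend beyond the known sequence.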
Double-stranded cDNA (ds-cDNA) was synthesized according to the manufacturer's instructions, using as template the decapped, full-length mRNA ligated to the RNA Oligo, and the oligo(dT) primer supplied with the SuperScript™ III RT kit. The ds-cDNA was cloned into the pCR4-TOPO vector to construct the full-length cDNA library.
To map the transcription start site (TSS) of the target gene, two 3'-end gene-specific primers per gene (for AhLEC1B, TSS LEC1BGSP1-1 and TSS LEC1BGSP2; Table 1) were designed for nested PCR. The 5'-end primers for the two rounds of PCR were the 5' GeneRacer™ Primer and the 5' Nested Primer (Table 1). Following the recommended system of the BD Advantage™ 2 PCR Kit, the primary PCR was run under the following conditions: denaturation at 94°C for 2 min; 5 cycles of 94°C for 30 sec and 72°C for 30 sec; 5 cycles of 94°C for 30 sec and 70°C for 30 sec; 20 cycles of 94°C for 30 sec, 63°C for 30 sec, and 68°C for 30 sec; and a final extension at 68°C for 10 min. The nested PCR used a 50-fold dilution of the primary PCR product as template, with the following conditions: denaturation at 94°C for 2 min; 35 cycles of 94°C for 30 sec, 65°C for 30 sec, and 68°C for 10 sec; and a final extension at 68°C for 10 min.
The nested PCR products were collected and sequenced on an ABI 3730 DNA sequencer.
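The primary 5' RACE program is a touchdown-style scheme: annealing and extension begin at 72°C and step down to 70°C and then to 63°C. Encoding such a program as data makes its cycle count and programmed hold time easy to check. The Python sketch below is a minimal illustration, assuming ramp times between temperatures are ignored; the data layout is our own and is not a thermocycler file format.

```python
# A stage is (number of cycles, [(temperature in °C, hold in seconds), ...]).
Stage = tuple[int, list[tuple[float, int]]]

PRIMARY_PCR: list[Stage] = [
    (1, [(94, 120)]),                      # initial denaturation, 2 min
    (5, [(94, 30), (72, 30)]),             # two-step cycles, anneal/extend at 72 °C
    (5, [(94, 30), (70, 30)]),             # two-step cycles, anneal/extend at 70 °C
    (20, [(94, 30), (63, 30), (68, 30)]),  # three-step cycles
    (1, [(68, 600)]),                      # final extension, 10 min
]

NESTED_PCR: list[Stage] = [
    (1, [(94, 120)]),                      # initial denaturation, 2 min
    (35, [(94, 30), (65, 30), (68, 10)]),  # three-step cycles
    (1, [(68, 600)]),                      # final extension, 10 min
]


def amplification_cycles(program: list[Stage]) -> int:
    """Total cycles, skipping single-step holds (initial and final steps)."""
    return sum(cycles for cycles, steps in program if len(steps) > 1)


def hold_seconds(program: list[Stage]) -> int:
    """Sum of programmed hold times; temperature ramping is ignored."""
    return sum(cycles * sum(sec for _, sec in steps) for cycles, steps in program)


print(amplification_cycles(PRIMARY_PCR), hold_seconds(PRIMARY_PCR) / 60)
print(amplification_cycles(NESTED_PCR), hold_seconds(NESTED_PCR) / 60)
```

Both rounds come to roughly 52 to 53 minutes of programmed holds; actual run times will be longer once ramping is included.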

Computational cis-regulatory motif analysis of the promoter of AhLEC1B gene
Two online tools, PLACE (http://www.dna.affrc.go.jp/PLACE/) and PlantCARE (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/), were used to predict the cis-regulatory elements in the 5' flanking region of the AhLEC1B gene, including the 5' untranslated region (5' UTR) and the upstream regulatory region.

Constructs of GUS expressing system, Arabidopsis transformation, and GUS staining
Promoter fragments of different lengths with 5'- or 3'-terminal deletions were obtained by PCR; all primers are listed in Table 1. BR1 and BR2 are reverse primers located in the 5' UTR of AhLEC1B, and BF1-BF5 are forward primers located at different sites in the AhLEC1B promoter (Table 1). For cloning, a HindIII site (AAGCTT) and an NcoI site (CCATGG) were added to the 5' and 3' borders of each fragment, respectively, by PCR amplification with appropriately designed oligonucleotides. The six promoter deletion fragments were cloned into pCAMBIA3301 digested with HindIII and NcoI, replacing the CaMV 35S promoter.
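Adding a HindIII site to the forward primer and an NcoI site to the reverse primer lets each amplified fragment be cloned directionally into a vector cut with the same two enzymes. The Python sketch below illustrates this primer layout and predicts the top strand of the resulting product. It is an illustration only: the two clamp bases and the example fragment are hypothetical, and the actual BF1-BF5 and BR1/BR2 sequences (Table 1) are not reproduced here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

HINDIII = "AAGCTT"  # added at the 5' border of each promoter fragment
NCOI = "CCATGG"     # added at the 3' border; the site is its own reverse complement
CLAMP = "CG"        # hypothetical protective bases; the text does not specify any


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def forward_primer(gene_specific: str) -> str:
    """5'->3' forward primer: clamp + HindIII site + gene-specific bases."""
    return CLAMP + HINDIII + gene_specific.upper()


def reverse_primer(gene_specific: str) -> str:
    """5'->3' reverse primer: clamp + reverse-complemented NcoI site + gene-specific bases."""
    return CLAMP + revcomp(NCOI) + gene_specific.upper()


def predicted_product(fragment: str, k: int = 20) -> str:
    """Top strand of the PCR product when each primer anneals to k nt of the fragment."""
    fragment = fragment.upper()
    fwd = forward_primer(fragment[:k])
    rev = reverse_primer(revcomp(fragment[-k:]))
    # The reverse primer is read back onto the top strand via its reverse complement.
    return fwd + fragment[k:len(fragment) - k] + revcomp(rev)


def internal_sites(fragment: str) -> dict[str, int]:
    """Count internal HindIII/NcoI sites, which would interfere with directional cloning."""
    fragment = fragment.upper()
    return {"HindIII": fragment.count(HINDIII), "NcoI": fragment.count(NCOI)}


fragment = "ACGT" * 10  # hypothetical 40-nt stand-in for a promoter fragment
print(predicted_product(fragment, k=5))
print(internal_sites(fragment))
```

Checking for internal sites matters because a HindIII or NcoI site inside a promoter fragment would be cut during cloning and truncate the insert.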
The binary vectors were introduced into Agrobacterium tumefaciens strain GV3101, which was then used to transform Arabidopsis Col-0 plants by the floral dip method [11]. Seeds were harvested and stored at room temperature. For screening, seeds were surface-sterilized in 95% (v/v) ethanol for 1 min and 0.1% (w/v) HgCl2 for 20 min, followed by several washes with sterile water. Herbicide-resistant seedlings were selected after 14 d of growth on MS [12] basal medium supplemented with 10 mg/L Basta.
GUS staining was performed using a standard protocol [13]. Roots and leaves at the 4-leaf stage, stems at the bolting stage, flowers, and seeds at 6-10 days after pollination from transgenic T2 lines were incubated in staining buffer (0.1% Triton X-100 and 2 mM 5-bromo-4-chloro-3-indolyl-β-D-glucuronide cyclohexylammonium salt (X-Gluc) in 100 mM sodium phosphate buffer, pH 7.0) at 37°C overnight or for 24 h, and then destained with 70% ethanol. At least six independent transgenic lines were analyzed.
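The staining buffer is specified by final concentrations, so the amounts to weigh out scale with the volume prepared. The Python sketch below shows that arithmetic for X-Gluc and Triton X-100. It assumes a molecular weight of 521.8 g/mol for the X-Gluc cyclohexylammonium salt (check the value on your supplier's certificate) and assumes the 0.1% Triton X-100 is v/v; the phosphate buffer itself is not computed.

```python
XGLUC_CHA_MW = 521.8  # g/mol, X-Gluc cyclohexylammonium salt (assumed; check the supplier's value)
XGLUC_MM = 2.0        # final X-Gluc concentration, mM (from the protocol)
TRITON_PERCENT = 0.1  # Triton X-100, % (assumed v/v)


def staining_recipe(volume_ml: float) -> dict[str, float]:
    """Amounts of X-Gluc (mg) and Triton X-100 (µl) for `volume_ml` of staining buffer."""
    # mM * L gives mmol; mmol * (g/mol) gives mg.
    xgluc_mg = XGLUC_MM * (volume_ml / 1000) * XGLUC_CHA_MW
    # A % (v/v) share of volume_ml, converted from ml to µl.
    triton_ul = TRITON_PERCENT / 100 * volume_ml * 1000
    return {"x_gluc_mg": round(xgluc_mg, 2), "triton_ul": round(triton_ul, 1)}


print(staining_recipe(10))
```

For 10 ml of buffer this gives about 10.4 mg of X-Gluc and 10 µl of Triton X-100.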

Phylogenetic analysis of AhLEC1A and AhLEC1B
Thirteen NF-YB genes with distinct expression patterns have been identified in the Arabidopsis genome [3][4][5]. To infer the evolutionary relationships of AhLEC1A and AhLEC1B, their sequences were compared with those of the Arabidopsis NF-YB transcription factors using MEGA 4.0. AhLEC1A, AhLEC1B, Arabidopsis NF-YB6 (L1L), and NF-YB9 (LEC1) share higher sequence similarity and group together (Fig 1). AhLEC1A and AhLEC1B share 95% sequence identity and differ at only 12 amino acid positions. Nevertheless, their expression profiles differ substantially: AhLEC1A is expressed specifically in seeds at different developmental stages, whereas AhLEC1B mRNA accumulates at higher levels in seeds than in roots, stems, rosettes, and flowers [14].
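Percent identity between two aligned proteins depends on how gap columns are treated, and tools differ on this. The Python sketch below uses one common convention, counting only columns where neither sequence has a gap, and also reports the number of differing positions. It is a toy illustration with hypothetical sequences, not the MEGA calculation. As a rough consistency check, if both reported figures referred to the same gap-free columns, 12 differences at 95% identity would imply an alignment of roughly 240 residues (12 / 0.05); the alignment length is not stated here.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> tuple[float, int]:
    """Identity over aligned columns where neither sequence has a gap ('-').

    Returns (percent identity, number of differing positions).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns), len(columns) - matches


# Hypothetical aligned fragments: one substitution (V/I) and one gap column.
print(percent_identity("MKVLAT-GE", "MKILATQGE"))
```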

Cloning and sequence analysis of 5' flanking region of AhLEC1B and localization of TSS
To investigate the major regulatory regions and elements of AhLEC1B, we isolated its promoter by genome walking. A 1289-bp 5' flanking fragment comprising the promoter region (1235 bp) and 5' UTR sequence (54 bp) was obtained from the peanut DNA library LP (Fig 2). Based on the AhLEC1B cDNA sequence, we further amplified the 5' UTR of the gene from the full-length cDNA library of Luhua 14 developing seeds by nested 5' RACE, obtaining PCR products of about 400 bp and 60 bp (Fig 3). Transcription of AhLEC1B starts at the first 'A' within the sequence CCAAACT, located 83 bp upstream of the ATG translation start codon, which is consistent with the general feature of most eukaryotic genes (Fig 4).
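Promoter positions in this study are given relative to the TSS: +1 is the TSS, -1 is the base immediately upstream, and there is no position 0. This is why a 1235-bp promoter plus an 83-bp 5' UTR gives a 1318-bp sequence. The Python sketch below converts string indices to this numbering and measures the 5' UTR, assuming (as a simplification) that the first ATG downstream of the TSS is the start codon. The toy sequence is hypothetical.

```python
def tss_index(seq: str, context: str = "CCAAACT") -> int:
    """0-based index of the TSS, taken as the first 'A' within `context`."""
    start = seq.upper().find(context)
    if start < 0:
        raise ValueError(f"{context} not found")
    return start + context.index("A")


def tss_relative(index: int, tss: int) -> int:
    """Convert a 0-based index to TSS-relative numbering (+1 = TSS; there is no 0)."""
    offset = index - tss
    return offset + 1 if offset >= 0 else offset


def utr5_length(seq: str) -> int:
    """Nucleotides from the TSS up to (not including) the first ATG after it."""
    tss = tss_index(seq)
    atg = seq.upper().find("ATG", tss)
    if atg < 0:
        raise ValueError("no ATG downstream of the TSS")
    return atg - tss


# Hypothetical toy sequence: upstream bases, CCAAACT, a short 5' UTR, then ATG.
toy = "GGGCCAAACTTTTATGCCC"
tss = tss_index(toy)
print(tss, utr5_length(toy), tss_relative(toy.find("ATG", tss), tss))
```

In the toy sequence the 5' UTR is 8 nt long, so the A of ATG sits at +9. With the 83-nt AhLEC1B 5' UTR, the same convention would place it at +84.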

Cis-elements prediction of AhLEC1B promoter
To predict cis-regulatory elements in the 5' flanking fragment of AhLEC1B, we submitted the 1318-bp sequence, comprising the 1235-bp promoter region and the 83-bp 5' UTR, to the PLACE and PlantCARE online tools.
The putative TATA box (TATATAT) in the core promoter was located at -36 relative to the TSS. The other cis-regulatory elements were classified into two groups (Fig 4). The first group … preferentially expressed in the seed or embryo [15][16][17][18]. In addition, CACTFT (YACT, 22 copies), TAAAG MOTIF (5 copies), ROOT MOTIF (ATATT, 15 copies), OSE2 ROOT NODULE (CTCTT, 8 copies), and POLLEN1 LELAT52 (AGAAA, 7 copies) are associated with expression in leaves, roots, and flowers [19][20][21][22]. Several motifs required for light regulation (twelve copies of GATA BOX and ten copies of GT1 CONSENSUS) are dispersed throughout the AhLEC1B promoter region [23,24]. Four copies of the TGAC core sequence (WRKY71OS) are also scattered in the promoter region; Zhang et al. (2004) [25] found that the TGAC core motif can be bound by the rice transcriptional repressor WRKY71, which participates in regulating the gibberellin signaling pathway. The second group comprised a large number of elements present at low copy number (fewer than three copies) or as single copies. These include several phytohormone-regulated elements, such as CPB Sequence (TATTAG, cytokinin response), ERE Motif (AWTTCAAA, ethylene-induced transcription), and GARE Motif (TAACAGA, gibberellin-responsive element) [26][27][28]; elements related to gene expression levels (TGACGT Sequence and PROLAMIN BOX) [29,30]; and tissue- or organ-preferential regulatory elements such as DPBF CORE (ACACNNG, associated with embryo- or seed-preferential expression) and RAV1A AT (CAACA, the binding site of RAV1, which is expressed at relatively high levels in rosette leaves and roots) (Fig 4) [31,32]. A single copy of the RY REPEAT sequence (CATGCA) is present in the upstream region of the AhLEC1B promoter. The RY REPEAT occurs in the promoters of many genes that regulate seed development [33] and is also found in the promoter and intron regions of AtLEC1.
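PLACE and PlantCARE report motifs as consensus sequences in IUPAC notation (for example YACT, ACACNNG, AWTTCAAA, CWWWWWWWWG) together with copy numbers on either strand. The Python sketch below shows one way to run such a scan. It is our own minimal illustration, not the algorithm those tools use, so its copy counts could differ from theirs (for instance, in how overlapping or palindromic hits are counted). Positions are 0-based top-strand starts and would still need conversion to TSS-relative coordinates. The toy sequence is hypothetical.

```python
import re

# IUPAC nucleotide codes as regex character classes, and their complements.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A", "R": "Y", "Y": "R", "S": "S",
    "W": "W", "K": "M", "M": "K", "B": "V", "V": "B", "D": "H", "H": "D", "N": "N",
}


def reverse_complement_motif(motif: str) -> str:
    """Reverse complement of an IUPAC motif (e.g. YACT -> AGTR)."""
    return "".join(COMPLEMENT[base] for base in reversed(motif.upper()))


def scan(seq: str, motif: str) -> list[tuple[int, str]]:
    """Sorted (0-based top-strand start, strand) hits of an IUPAC motif.

    Overlapping hits are kept. A motif equal to its own reverse complement
    (e.g. ACGT) is scanned once and labelled '+,-'.
    """
    seq = seq.upper()
    rc = reverse_complement_motif(motif)
    if rc == motif.upper():
        patterns = [("+,-", motif)]
    else:
        # A minus-strand hit is a top-strand match to the reverse complement.
        patterns = [("+", motif), ("-", rc)]
    hits = []
    for strand, pattern in patterns:
        regex = "".join(IUPAC[base] for base in pattern.upper())
        hits.extend((m.start(), strand) for m in re.finditer(f"(?={regex})", seq))
    return sorted(hits)


toy = "CACTAAGCATGCAAAGTG"  # hypothetical sequence, not AhLEC1B
print(scan(toy, "YACT"), len(scan(toy, "YACT")))  # CACTFT core, both strands
print(scan(toy, "CATGCA"))                        # RY REPEAT
```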
Other specific cis-regulatory elements, such as CELL CYCLE BOX (CACGAAAA) and HEXAMER MOTIF (ACGTCA), are also present in the AhLEC1B promoter. The CELL CYCLE BOX is involved in cell-cycle-specific activation of transcription [34], while the HEXAMER MOTIF functions in the replication-dependent expression of the histone H3 gene [35,36]. Both occur mainly as single copies in the promoter region of this gene.

GUS expression driven by AhLEC1B promoter fragments
To identify the regulatory regions essential for gene expression, we generated a series of constructs in which AhLEC1B promoter fragments with 5'-terminal deletions (each retaining the 52-bp 5' UTR) or 3'-terminal deletions were fused to the GUS reporter gene (Fig 5). All constructs were introduced into Arabidopsis by Agrobacterium-mediated transformation, and transgenic T2 lines carrying a single-copy, homozygous insertion were selected for GUS histochemical staining. Staining of diverse tissues and organs showed that the longest fragment (Q1, 1281 bp), containing the 1229-bp promoter region and the 52-bp 5' UTR, mainly drives GUS expression in the developing embryo. In contrast, the three 5'-deletion fragments Q4, Q5, and Q6 drove GUS expression in all tissues examined (Fig 6). However, fragments Q2 and Q3, which lack 351 bp at the 3' terminus, lost promoter activity, indicating that the deleted region contains elements crucial for activity (Table 2). The shortest fragment (Q6, 118 bp), comprising a 66-bp promoter region and the 52-bp 5' UTR, contains the main elements that control constitutive expression of the downstream gene (Fig 6).
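The construct sizes above follow from their TSS-relative end points, keeping in mind that the numbering skips 0. The Python sketch below checks Q1 and Q6, assuming that each construct's 5' UTR portion runs from +1 and its promoter portion ends at -1. Q2 to Q5 are left out because their end points are not given in this section.

```python
def span_length(start: int, end: int) -> int:
    """Length of a region given TSS-relative coordinates (+1 = TSS, no position 0)."""
    if start == 0 or end == 0:
        raise ValueError("position 0 does not exist in TSS-relative numbering")
    if start > end:
        raise ValueError("start must be upstream of end")
    length = end - start + 1
    if start < 0 < end:
        length -= 1  # the interval crosses the TSS and skips the nonexistent 0
    return length


# End points derived from the lengths stated in the text (assumed layout).
CONSTRUCTS = {
    "Q1": (-1229, 52),  # 1229-bp promoter + 52-bp 5' UTR
    "Q6": (-66, 52),    # 66-bp promoter + 52-bp 5' UTR
}

for name, (start, end) in CONSTRUCTS.items():
    print(name, span_length(start, end))
```

The same function reproduces the 1318-bp sequence analyzed by PLACE and PlantCARE when applied to the interval -1235 to +83.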

Discussion
The Arabidopsis LEC1 and L1L genes regulate embryogenesis but have distinct functions during embryo development [6,47]. LEC1 expression in embryos peaks early in seed development and declines thereafter, until the green premature seed stage [6,48]. Loss-of-function mutations in LEC1 cause desiccation intolerance of embryos and defects in the production of storage proteins and lipids. In contrast, L1L mRNA levels peak later in embryogenesis than LEC1 levels do. Suppression of L1L in RNAi transgenic lines results in abnormal embryos and an embryo-lethal phenotype [47], but the l1l-1 and l1l-2 mutants show no apparent phenotypic alterations during seed development [49].
AhLEC1A and AhLEC1B from peanut are homologs of Arabidopsis NF-YB6 (L1L) and NF-YB9 (LEC1) but differ in their expression patterns in vegetative tissues. Our RT-PCR data show that AhLEC1B mRNA, like AtL1L mRNA, accumulates at higher levels in seeds and at lower levels in vegetative tissues [14]. Thus, our expression data and phylogenetic analysis together indicate that AhLEC1B is an ortholog of AtL1L.

Table 2 (excerpt; element, sequence, position (strand), description):
ACGT Sequence; ACGT; -120 (+,-); ACGT sequence (from -155 to -152) required for etiolation-induced expression of erd1 (early responsive to dehydration) in Arabidopsis [37].
CARGCW8GAT; CWWWWWWWWG; -220 (+,-); a variant of the CArG motif with a longer A/T-rich core that is a preferential binding site for the transcriptional regulator AGL15, which accumulates during embryo development [17].
RAV1A AT; CAACA; -243 (-); binding consensus sequence of the Arabidopsis transcription factor RAV1, which is expressed at relatively high levels in rosette leaves and roots [32].

In this study, we cloned and analyzed the 5' flanking regulatory sequence of AhLEC1B and found that the GUS gene driven by the full-length Q1 construct was preferentially expressed in embryos of transgenic Arabidopsis. In contrast, transgenic lines carrying Q1 derivatives with 452-bp to 1156-bp deletions from the 5' terminus showed higher GUS expression in roots, rosettes, stems, flowers, and seeds. Previous studies showed that the upstream region of the AtLEC1 promoter contains elements that repress its function in vegetative tissues, and that seed-specific expression of AtLEC1 is controlled by the combined action of negative and positive cis-regulatory elements in its promoter [8]. PICKLE (PKL), a putative chromatin-remodeling factor, forms part of a NuRD histone deacetylase complex, which acts as a negative regulator of AtLEC1 expression, represses embryonic identity, and contributes to the transition from embryonic to postembryonic development in vegetative tissues [50,51]. We hypothesize that AhLEC1B expression is regulated in a similar way in peanut. The VP1/ABI3-LIKE (VAL) B3 proteins, another class of repressors in Arabidopsis, bind specifically to the canonical Sph/RY cis-element (CATGCA) and are required for repression of the LEC1/B3 transcription factor network during germination and vegetative development [33,52]. The RY REPEAT element (CATGCA) located at -1149 relative to the TSS in the AhLEC1B promoter may therefore be a binding site for VAL, and VAL binding to this element probably inhibits AhLEC1B expression in vegetative tissues. In addition, the distal region of the AhLEC1B promoter contains several other negative regulatory elements, such as WRKY71OS (a binding site for a transcriptional repressor of the gibberellin signaling pathway) and SRE (sugar-repressive element), which may interact with upstream regulators to reduce AhLEC1B expression in specific tissues.
Many elements required for expression in the embryo or endosperm, such as E BOX, CARGCW8GAT, and DPBF CORE, are dispersed throughout the Q1 fragment. E BOX elements are concentrated in the region from -250 to -50 in the promoters of several genes involved in fatty acid biosynthesis and in triacylglycerol synthesis and storage, including SeFAD2, Cs-ACP1, and Cs-4PAD, acyl-CoA:diacylglycerol acyltransferase (At2g19450), phosphatidylcholine:diacylglycerol acyltransferase (At3g44830), several oil-body oleosins (At3g01570, At3g18570, At3g27660, At5g40420, and At5g51210), and two caleosins (At4g26740 and At5g55240) [18]. The non-canonical CArG motif CARGCW8GAT, a binding site for the transcription factor (TF) AGL15 (AT5G13790), is present in many endosperm-specific TF gene promoters [17,53]. AGL15 might act upstream of chalazal endosperm-specific TF genes and function in activating at least one chalazal endosperm gene regulatory network [54].
The 300-bp proximal region and the 52-bp 5' UTR of the AhLEC1B promoter contain crucial regulatory elements required for its basic activity and function. Deletion of these regulatory elements abolished reporter expression in Q2 transgenic lines (Fig 6). In addition to the TATA BOX, this region contains many tissue- or organ-specific elements, including phytohormone-responsive elements, light-regulated elements, and elements associated with biotic and abiotic stress responses (Table 2). Furthermore, 5'-end deletion analysis of the Q1 construct indicated that the 65-bp promoter fragment together with the 52-bp 5' UTR, which contain the TATA BOX, CACTFTPPCA1, DOF CORE, GATA BOX, SORLIP1AT, and other elements, is sufficient for basic promoter activity. This fragment drives GUS activity in all tissues examined, in a manner similar to the CaMV 35S promoter. In general, plant promoters have a distal region (upstream activation sequence) and a proximal region (the core promoter) located about 30-40 bp upstream of the TSS. In our study, the -65 to +52 region of the AhLEC1B promoter harbors crucial elements such as DOF CORE and GATA BOX. Morton et al. (2014) [55] found that regions of enrichment (ROEs) of transcription factor binding sites (TFBSs) within 40 nucleotides of the TSS occur in both Narrow Peak and Broad with Peak promoters, where these elements help determine the profiles and levels of gene expression.

Table 2 notes: … [25]. The symbol '+' or '-' in parentheses indicates the DNA strand on which the element is located. (c) Positive numbers indicate positions in the 5' UTR; negative numbers indicate positions in the promoter. doi:10.1371/journal.pone.0139213.t002
In conclusion, the AhLEC1B gene, whose transcripts accumulate preferentially in the embryo, is co-regulated by upstream regulators binding to the corresponding cis-regulatory elements in its promoter. These promoter elements may regulate the gene both negatively and positively, and the 65-bp promoter region plus the 52-bp 5' UTR contain the key motifs required for essential promoter activity.