Genome-wide analysis of basic/helix-loop-helix gene family in peanut and assessment of its roles in pod development

The basic/helix-loop-helix (bHLH) proteins constitute a superfamily of transcription factors that play a range of regulatory roles in eukaryotes. Over the past few decades, many bHLH family genes have been well characterized in model plants, such as Arabidopsis, rice and tomato. However, the bHLH protein family in peanut has not yet been systematically identified and characterized. Here, 132 and 129 bHLH proteins were identified from the two wild diploid ancestors of cultivated tetraploid peanut, Arachis duranensis (AA) and Arachis ipaensis (BB), respectively. Phylogenetic analysis indicated that these bHLHs could be classified into 19 subfamilies. Distribution mapping showed that peanut bHLH genes were randomly and unevenly distributed across the 10 AA chromosomes and 10 BB chromosomes. In addition, 120 bHLH gene pairs between the AA-subgenome and BB-subgenome were found to be orthologous, and 101 of these pairs were highly syntenic in the AA and BB chromosomes. Furthermore, we confirmed that 184 bHLH genes were expressed in different tissues, 22 of which exhibited tissue-specific expression. Meanwhile, we identified 61 bHLH genes that may be involved in peanut-specific subterranean pod development. Our comprehensive genomic analysis provides a foundation for future functional dissection and understanding of the regulatory mechanisms of bHLH transcription factors in peanut.


Introduction
Basic/helix-loop-helix (bHLH) transcription factors are a superfamily of proteins that are widely distributed in all eukaryotic organisms and have been found to perform a growing number of functions in a wide range of essential metabolic, physiological and developmental processes, such as photosynthesis, light signaling, pigment biosynthesis, seed development and stress resistance [1][2][3][4]. The bHLH proteins among animals, yeasts, and plants are defined by […] study, a hidden Markov model (HMM) that allows for the detection of the bHLH domain across highly divergent sequences was used to systematically identify and characterize the bHLH genes in peanut using the A-subgenome and B-subgenome as references. Using this method, we identified a total of 261 bHLH genes. Multiple sequence alignments, phylogenetic relationships, chromosome distribution patterns, DNA-binding activities and intron distribution patterns of these bHLH genes were also determined. Additionally, based on the expression patterns among different tissues and qPCR analyses, 61 bHLH genes that likely regulate pod development were identified.

Plant materials and growth conditions
Plant materials were collected from cultivated peanut (Luhua-14) grown on the experimental farm of Shandong Academy of Agricultural Sciences under a normal day/night period. Peanut materials including root, stem, leaf, flower and peg were collected at 60 days after seed germination. Six developmental stages of peanut gynophores were used in this study. Aerially grown gynophores, which were green or purple in color with a length of 3-5 cm, were assigned as S1; pegs grown in soil for about 3 d that were white in color and had no detectable ovary enlargement were assigned as S2; pegs buried in soil for about 9 d with a very small enlarged ovary were assigned as S3; pegs buried in soil for about 15 d, 21 d and 27 d were assigned as S4, S5 and S6, respectively. A 5 mm tip region of the gynophore was manually dissected, frozen in liquid nitrogen and stored at -80˚C for the following experiments. Two biological replicates were prepared for each stage.

Collection and identification of candidate bHLH genes in peanut
The whole genome sequences of the peanut AA-subgenome (Aradu.V14167.a1.M1) and BB-subgenome (Araip.K30076.a1.M1) were obtained from PeanutBase (http://peanutbase.org/), and the HMM sequence of the bHLH domain (PF00010) was downloaded from the Pfam database (http://pfam.xfam.org/) and used as a query to search for candidate peanut bHLH protein sequences in the AA and BB subgenomes using BLASTP (e-value < 0.001). To further confirm and filter out uncertain bHLH proteins, the predicted bHLH domains were examined using the SMART tool (http://smart.embl-heidelberg.de). All bHLH protein sequences of peanut used in this study are listed in S1 Table.

Multiple sequence alignments, identification of conserved motifs and phylogenetic analysis
Multiple protein sequence alignments were performed using Clustal Omega (http://www.ebi.ac.uk/Tools/msa/clustalo/). To visualize the conserved motifs, the sequences were analyzed with the WEBLOGO program (http://weblogo.berkeley.edu). A phylogenetic tree was constructed using MEGA 7.0 (http://www.megasoftware.net) using the neighbor-joining method with the following parameters: pairwise deletion option, 1000 bootstrap replicates and Poisson correction distance [27]. The consensus tree showed only branches with a bootstrap consensus > 50. Maximum likelihood (ML) analyses were performed with PhyML version 3.0 (http://www.atgc-montpellier.fr/phyml) using the JTT model of amino acid substitution, and the radial tree was drawn using FigTree v1.3.1 (http://tree.bio.ed.ac.uk/software/figtree).
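The distance settings named above (Poisson correction with pairwise deletion) can be illustrated with a minimal Python sketch. This is not the MEGA implementation; the function names and the choice of gap symbols are assumptions made for illustration. It uses the standard Poisson correction d = -ln(1 - p), where p is the proportion of differing sites among positions present in both aligned sequences.

```python
import math

# Assumed gap/missing symbols for this illustration only.
GAP_CHARS = {"-", "?"}


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites, skipping sites gapped in either sequence
    (pairwise deletion). Both sequences must come from the same alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in GAP_CHARS or b in GAP_CHARS:
            continue  # pairwise deletion: ignore this site for this pair only
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return differing / compared


def poisson_distance(seq_a: str, seq_b: str) -> float:
    """Poisson-corrected amino acid distance, d = -ln(1 - p)."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        return math.inf  # distance is undefined when every site differs
    return -math.log(1.0 - p)
```

For example, two aligned five-residue fragments that differ at one site have p = 0.2 and a Poisson-corrected distance of about 0.223.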

Location of bHLH genes on AA and BB chromosomes
To examine the chromosomal location of peanut bHLH genes, the start and end positions of each bHLH gene on each chromosome were obtained from the peanut database website (http://peanutbase.org/) via BLASTN, and a map was generated using MapInspect software (http://mapinspect.software.informer.com/).

RNA-seq data collection and expression analysis of bHLH genes
To further characterize the function of peanut bHLH genes during peanut development, published RNA-seq data from 22 different tissues in cultivated peanut were downloaded from the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/) under BioProject PRJNA291488. A description of the peanut tissues is listed in S2 Table. The expression pattern of the bHLH genes in different tissues was determined using an R script based on the normalized RPKM (Reads Per Kilobase of exon model per Million mapped reads) values of all genes transformed to log2 (value + 1). A correlation analysis between orthologous regions of the AA- and BB-subgenomes was performed using SPSS software.
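The normalization and transformation described above can be sketched as follows. The study used an R script on already-normalized values; this Python version is only an illustration, and the function names are hypothetical. It applies the standard RPKM definition and the log2(value + 1) transform stated in the text.

```python
import math


def rpkm(read_count: float, exon_length_bp: float, total_mapped_reads: float) -> float:
    """Reads Per Kilobase of exon model per Million mapped reads."""
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("exon length and total mapped reads must be positive")
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


def log2_plus_one(value: float) -> float:
    """log2(value + 1); maps an RPKM of 0 to 0 and compresses large values."""
    if value < 0:
        raise ValueError("expression values must be non-negative")
    return math.log2(value + 1.0)
```

Adding 1 before taking the logarithm keeps unexpressed genes (RPKM = 0) defined in the transformed matrix instead of producing negative infinity.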

RNA isolation and quantitative RT-PCR analysis
Total RNA was extracted from different peanut tissues using CTAB reagent. The reverse transcription reaction (20 μl) contained 2 μg DNase I-treated total RNA, 50 nM Oligo(dT) primer, 0.25 mM each of dNTPs, 50 units reverse transcriptase, 1× reverse transcriptase buffer and 4 units RNase inhibitor, according to the manufacturer's protocol. The reactions were incubated at 42˚C for 1 h and were terminated by incubation at 85˚C for 5 min to inactivate the reverse transcriptase. AhActin was used as the internal control. SYBR Green PCR Master Mix (Bio-Rad) was used in all qRT-PCR reactions with an initial denaturing step at 95˚C for 10 min, followed by 40 cycles of 95˚C for 5 s, 65˚C for 5 s and 72˚C for 8 s. Three biological replicates were prepared for each sample and relative expression levels were calculated using the 2^(-ΔΔCt) method. Student's t-test was used to determine whether the qRT-PCR results were statistically different between two samples (*P < 0.05). Primers used in all of the qRT-PCR experiments are listed in S3 Table.

Results and discussion

Identification of bHLH genes in two wild type peanuts
The bHLH gene family is one of the largest transcription factor families in plants, second in size only to the MYB family [28]. To define the peanut bHLH gene family, a total of 132 and 129 bHLH proteins were identified in the AA- and BB-subgenomes, respectively, based on the hidden Markov model BLAST search, according to the criteria developed by Atchley and Toledo-Ortiz [3,7]. To verify the reliability of our criteria, we performed simple modular architecture research tool (SMART) analysis of all 261 putative peanut bHLH protein sequences and found that the majority of these proteins (104, 78.8% in the AA-subgenome and 94, 72.9% in the BB-subgenome) had a typical bHLH domain. The proteins lacking the basic region may interact with other bHLH proteins to bind to the DNA motif. Cultivated peanut is an allotetraploid derived from two diploid wild species, AA and BB, and therefore contains two closely related subgenomes. The numbers of bHLH genes in AA and BB are almost equal. This may be because the genome sizes of the AA- and BB-subgenomes are highly similar, with a sequence similarity of 64% between AA and BB [25,26]. In previous studies, 147, 167, 159, 127, 230, 127, 289 and 319 bHLH genes were identified in Arabidopsis, rice, tomato, Salvia miltiorrhiza, Chinese cabbage, potato, maize and soybean, respectively [3,11,12,14,29,30]. The number of bHLH genes in each diploid wild peanut is similar to that found in Arabidopsis, rice, tomato, Salvia miltiorrhiza and potato, but noticeably lower than that found in Chinese cabbage, maize and soybean. This may be due to the large genome sizes of these plants or to genome duplication. In previous reports, the number of bHLH proteins increased with plant evolution and genome duplication, suggesting that these proteins may play an important role in plant evolution.

Multiple sequence alignments, conserved amino acid residues in the bHLH domains and DNA-binding activity prediction
To analyze the features of peanut bHLH protein domains, we conducted multiple protein sequence alignments of the bHLH domains from the AA- and BB-subgenomes using Clustal Omega software (S1 Fig). The frequencies of the consensus amino acids within the bHLH domains were counted and are shown in Table 1. There are four conserved regions in the bHLH domain sequences for most of the bHLH proteins, including one basic region, two helix regions and one loop region (Fig 1). The basic region has five conserved residues (His-9, Glu-13, Arg-14, Arg-16 and Arg-17) that were identical in at least 50% of the 132 AA-subgenome and 129 BB-subgenome bHLH domains (Fig 1A and 1B). The first helix region, the loop region and the second helix region have three conserved residues (Asn-21, Leu-27, and Pro-32), one conserved residue (Lys-36) and five conserved residues (Lys-39, Leu-43, Ile-47, Tyr-49, and Leu-53), respectively, that were identical in at least 50% of the 132 AA-subgenome and 129 BB-subgenome bHLH domains. Among these 14 conserved residues, six were present in more than 75% of sequences (Glu-13, Arg-16, Arg-17, Leu-27, Lys-36 and Leu-53) (Table 1). All 14 conserved residues have also been reported in other species, suggesting that these residues are extremely important for the function of bHLH proteins.
The basic region of peanut bHLH proteins, which functions in DNA binding, contains 17 residues. Using the criteria described by Massari and Murre, peanut bHLH proteins were divided into several categories based on sequence information in the basic region [1,9]. For the AA-subgenome, 104 DNA-binding proteins and 28 non-DNA-binding proteins were identified, while 94 DNA-binding proteins and 35 non-DNA-binding proteins were identified in the BB-subgenome (Fig 2). The DNA-binding bHLH proteins were further divided into two groups, putative E-box-binding proteins and putative non-E-box-binding proteins, depending on the presence or absence of residues Glu-13 and Arg-16 in the basic region. Only five and three non-E-box-binding proteins were found in the AA- and BB-subgenomes, respectively. The 99 and 91 E-box-binding proteins in the AA- and BB-subgenomes, respectively, can be further divided into two subgroups, G-box-binding proteins and non-G-box-binding proteins, according to the presence or absence of the His-9 residue. A total of 75 and 65 G-box-binding proteins were found in the AA- and BB-subgenomes, respectively.
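The decision rules stated above can be sketched as a small Python function. This is an illustration only: the function name and the `residues` mapping are hypothetical, and because the text does not spell out the criterion that separates DNA-binding from non-DNA-binding proteins, that decision is passed in as a flag rather than computed.

```python
def classify_basic_region(residues: dict, dna_binding: bool) -> str:
    """Classify a bHLH protein from its basic-region residues.

    residues    -- maps alignment position (int) to a one-letter amino acid code
    dna_binding -- ASSUMPTION: supplied by the caller; the source does not
                   state the exact rule used to make this call
    """
    if not dna_binding:
        return "non-DNA-binding"
    # E-box binding requires both Glu-13 and Arg-16 in the basic region.
    if not (residues.get(13) == "E" and residues.get(16) == "R"):
        return "non-E-box-binding"
    # Among E-box binders, His-9 marks putative G-box binders.
    if residues.get(9) == "H":
        return "G-box-binding"
    return "non-G-box-binding"
```

Applied to the counts above, the AA-subgenome would yield 28 non-DNA-binding, 5 non-E-box-binding, 75 G-box-binding and 24 non-G-box-binding proteins (99 E-box binders minus 75 G-box binders).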

Intron distribution within the peanut bHLH domains
To analyze intron distribution within the coding sequence of the bHLH domain in all peanut bHLH genes reported here, we aligned all the bHLH coding sequences against the genome sequences using BLAST. Ten different distribution patterns (designated I to X), ranging from zero to three introns within the domain, were observed (Fig 3). Among the 104 AA-subgenome and 94 BB-subgenome bHLH genes, only 13 AA-subgenome and 12 BB-subgenome bHLH genes did not contain an intron in their bHLH domain region (pattern X). In contrast, 87.5% of AA-subgenome bHLH genes and 87.2% of BB-subgenome bHLH genes contained introns in the coding sequence of the bHLH domain. However, the sequence length and similarity of the introns differed among these bHLH domains, even at the same position. Among the nine intron-containing patterns, pattern VI (one intron) contained the most bHLH genes (42 in the AA-subgenome and 35 in the BB-subgenome) and pattern I (three introns) was the second most common in peanut, consistent with that found in tomato and rice, but different from that of Arabidopsis. In Arabidopsis, the most common pattern involves three introns in the bHLH region [3]. These results showed that intron sequences and their distribution vary among peanut, tomato, rice and Arabidopsis although their bHLH domains are conserved, and that peanut bHLH proteins may have a more distant evolutionary relationship with Arabidopsis proteins than with those of tomato and rice.
Table 1. Information on the consensus motif and conserved amino acid sequences in the bHLH domain. (Columns: Position in the alignment; Region; AA-subgenome; BB-subgenome.)
Furthermore, we also investigated the intron phases in the bHLH domains with respect to codons. Each intron falls into one of three phases, phase 0, phase 1, or phase 2, depending on the splicing position within the codon. In phase 0, splicing occurs after the third nucleotide of a codon (that is, between two codons); in phase 1, splicing occurs after the first nucleotide of a codon; and in phase 2, splicing occurs after the second nucleotide [31]. […] triangles) were spliced at phase 0 (I-VI). The other introns with less conserved positions (VII, VIII and IX) were all spliced in phase 1. Interestingly, no splicing in phase 2 was detected in the bHLH domains of peanut, unlike that seen in both rice and Arabidopsis [11]. Such conserved splicing phases were also observed in the bHLH and MYB gene families of soybean, rice and Arabidopsis [11,32,33]. Therefore, our findings indicate that the splicing phase was highly conserved in peanut, as well as in other higher plant species, during the evolution of bHLH gene domains, and the introns in the bHLH domain may play an important role in the evolution of the bHLH gene family by means of unknown mechanisms.
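The intron phase definition can be expressed as a remainder: the phase equals the number of coding nucleotides upstream of the intron, modulo 3. The short Python sketch below illustrates this; the function name and its argument are hypothetical and are not taken from the study's pipeline.

```python
def intron_phase(coding_bases_before_intron: int) -> int:
    """Return the intron phase (0, 1 or 2).

    coding_bases_before_intron -- number of coding-sequence nucleotides,
    counted from the first base of the start codon, that precede the intron.
    Phase 0: the intron lies between two codons.
    Phase 1: the intron follows the first nucleotide of a codon.
    Phase 2: the intron follows the second nucleotide of a codon.
    """
    if coding_bases_before_intron < 0:
        raise ValueError("nucleotide count must be non-negative")
    return coding_bases_before_intron % 3
```

For instance, an intron preceded by 27 coding nucleotides (nine complete codons) is phase 0, while one preceded by 28 nucleotides is phase 1.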

Phylogenetic analysis of peanut bHLH proteins
To identify the evolutionary relationships of the peanut bHLH proteins, a neighbor-joining (NJ) phylogenetic tree was generated using multiple sequence alignments of the conserved bHLH domains with a bootstrap analysis (1,000 replicates). The phylogenetic tree showed that all of the 261 peanut bHLH domains were subdivided into 19 subfamilies, designated as 1 to 19 (Fig 4), according to clades with at least 50% bootstrap support, consistent with the results showing that the bHLH superfamily in plants is usually composed of between 14 and 32 subfamilies, based on phylogenetic analysis of the bHLH region [34,35]. The genes with a G-box binding region were mostly clustered within subfamilies 6, 9-10, 14, and 17-19, whereas the genes with a non-DNA-binding region were grouped in subfamilies 7 and 11. In addition, different subfamilies can share the same intron distribution pattern. For example, genes in subfamilies 1, 6 and 9 have the same intron distribution pattern (pattern I), while subfamilies 10 and 19 belong to pattern VI. These results suggest that the pattern of intron distribution can also provide important evidence to support phylogenetic relationships within a gene family, and proteins within the same subfamily may share close evolutionary relationships.
Many bHLH proteins have been functionally characterized in model plants, such as Arabidopsis, rice and tomato. To further predict and annotate the function of peanut bHLH proteins and obtain information about the evolutionary history between peanut and other plants, a phylogenetic tree was generated using the alignment of full-length bHLH protein sequences of peanut, Arabidopsis, rice and tomato. This analysis generated 24 distinct subfamilies (designated as 1 to 24) according to the groups proposed by previous phylogenetic analyses of Arabidopsis and rice bHLH protein sequences (S2 Fig) [11]. The peanut bHLH proteins were unevenly distributed across all 24 subfamilies, whereas only 19 subfamilies were found above using peanut bHLH proteins alone. This difference may be attributed to the larger number of species and protein sequences used in this phylogenetic tree. Generally, transcriptional regulators within the same clade may exhibit recent common evolutionary origins and conserved molecular functions [12]. Notably, seven peanut bHLH proteins, Aradu.5CX4U, Araip.P4GTD, Aradu.Y3AAH, Araip.UVF95, Aradu.GH2K1, Araip.RX20Z and Aradu.0K58L, were highly orthologous to SlybHLH131, which has been shown to be involved in resistance to yellow leaf curl virus infection in tomato [13]. Furthermore, Aradu.QW16A, Araip.X1DZZ, Aradu.LS2KI and Araip.37BUF were orthologous to OsbHLH13 and OsbHLH16, which are involved in anthocyanin biosynthesis [36]. Aradu.1124E and Araip.HA94C were orthologous to OsbHLH164, which is critical for tapetum development [37], while Aradu.D69CU and Araip.6S5NP were orthologous to OsbHLH62, which is important for the cold shock response [38]. These results suggest that these 15 peanut bHLH proteins within different subfamilies may have molecular functions related to those of their homologs in tomato or rice, which provides a foundation for future functional studies of bHLH proteins in peanut.
The phylogenetic analysis of full-length bHLH protein sequences between peanut and Arabidopsis indicated that the members of subfamily H are the most homologous to Arabidopsis PIF family proteins (S3 Fig). Specifically, Aradu.RC5BB and Aradu.LP0MC were highly orthologous to AtPIF7, while Aradu.I92X3 was orthologous to AtPIF4/AtPIF5. In addition, Aradu.QV5DJ, Aradu.YAX06 and Aradu.L9W8G were orthologous to AtPIF3. PIFs are a group of bHLH subfamily transcription factors that have recently been shown to act directly downstream of phytochromes and promote light-regulated growth and development in Arabidopsis [39][40][41]. Given that light plays fundamental roles in peanut development and pod formation, identifying components of the light signaling pathway will be of great significance to the study of peanut pod development mechanisms. In this study, six PIFs ranging from 446 to 740 amino acids in length were identified from the wild AA-subgenome. Meanwhile, six PIFs ranging from 434 to 756 amino acids in length were also identified from the BB-subgenome. Among them, five pairs were orthologous. General information about PIFs from wild and cultivated peanut is presented in S4 Table. All Arabidopsis and peanut PIF proteins were then analyzed for the presence of conserved motifs. A conserved APB motif, important for interaction with phyB, was found at the N-terminus of all 18 PIF proteins, and at least one bHLH domain involved in protein interaction and DNA binding was found at the C-terminus of each PIF (S4 Fig). Only four PIF proteins (Aradu.QV5DJ, Aradu.0DZ84, Araip.2LX3X and Araip.7G5H2) contained a conserved APA motif. In addition, a predicted nuclear localization signal peptide was found in most peanut PIFs. These functional motifs, as well as the phylogenetic analysis of PIFs between peanut and Arabidopsis, indicate that these 12 bHLHs could encode functional transcription factors involved in the light signal transduction pathway in peanut.

Orthologues of AdbHLH and AibHLH genes are located in syntenic loci in the two wild type genomes
To determine the physical map positions of the AdbHLH and AibHLH genes on the peanut chromosomes, the cDNA sequence of each peanut bHLH gene was used to search the peanut genome database using BLASTN software. As shown in S5 Fig, 131 AdbHLH and 129 AibHLH genes were randomly and unevenly distributed across the 10 AA chromosomes and 10 BB chromosomes. The number of bHLH genes on a chromosome does not positively correlate with chromosome length. In the AA-subgenome, chromosomes A03, A05 and A08 each contained the largest number of bHLH genes (20), while chromosomes A04 and A10 each contained the fewest (6). In the BB-subgenome, chromosomes B03 and B09 each contained the largest number of bHLH genes (19) and chromosomes B04 and B10 each contained the fewest (7).
In addition, 120 bHLH orthologous gene pairs were detected between the AA-subgenome and BB-subgenome, according to the phylogenetic relationship and the sequence alignment of AdbHLH and AibHLH genes (Table 2). The identity and similarity of their full-length CDS sequences and protein sequences were both above 80%, which is consistent with the close relationship of the AA- and BB-subgenomes. Among the orthologous gene pairs shared by the AA- and BB-subgenomes, 101 were found on syntenic loci of the AA-subgenome and BB-subgenome. Notably, one AA bHLH gene (Aradu.8U0A6) had two corresponding orthologous genes in the BB-subgenome (Araip.RMK33 and Araip.8T85I), while three BB bHLH genes (Araip.44BHE, Araip.KI1I3 and Araip.RM65A) had more than one corresponding orthologous gene in the AA-subgenome, demonstrating that bHLH gene duplication events occurred in both wild subgenomes. Such duplications are considered to be the raw material for the evolution of new biological functions and have played crucial roles in plant adaptation.

Expression patterns of peanut bHLH genes in different tissues
To further understand the function of peanut bHLH proteins, the expression patterns of peanut bHLH genes were analyzed across 22 samples, including 8 tissues as well as 14 different developmental stages, according to the normalized RPKM data from RNA-seq. S5 Table shows […] (Fig 5). The specific accumulation of these bHLH genes in a particular tissue suggests that they may play conserved regulatory roles in discrete cells, organs, or conditions. Given that the cultivated peanut is an allotetraploid derived from two diploid wild species, Arachis duranensis and Arachis ipaensis, it is of interest to compare the expression of orthologous AdbHLHs and AibHLHs. As shown in S6 Table, 66 pairs of orthologous genes between AdbHLHs and AibHLHs exhibited similar expression patterns and similar expression levels, suggesting that these orthologous genes exhibit functional redundancy.

Identification of pod development-related bHLH genes
Peanut flowers bloom above the ground, whereas the fruit develops underground. Following fertilization, the peanut zygote divides a few times to form a pre-embryo, and embryonic development stops upon exposure to light or normal day/night periods. However, the ovary continues to develop and forms a peg. As the peg elongates, its tip region (containing the embryo) is buried in the soil, at which time peanut pod development resumes in darkness. Thus, the early development of the peanut pod is a complex, genetically programmed process involving the coordinated regulation of gene expression, with a major impact on peanut production. Recent studies in model plant species have shown that bHLH transcription factors participate in various plant developmental processes, such as root hair formation, anther development and axillary meristem generation [42]. To date, however, there are no reports on bHLH proteins in peanut pod development.
In order to assess the potential regulatory role of bHLHs in this peanut-specific process and predict candidate bHLHs that may function in the regulation of gene expression during early pod development, we further investigated peanut bHLH expression among the early developmental stages. Based on the gene expression data, we identified 31 AdbHLH and 30 AibHLH genes that showed a gradual increase or decrease in expression along the early developmental process (Fig 6). To validate the bioinformatic data, qRT-PCR was performed to examine the expression of several bHLHs that may be related to early pod development, and the results were in agreement with the sequencing data (Fig 7A). The high and differential expression of these genes in the early developmental stages of the pod, or peanut fruit, may directly contribute to pod formation and development in peanut. Furthermore, the expression of two PIFs in root, stem, leaf, flower, seed and different developmental stages of the peanut gynophore was examined using qRT-PCR (Fig 7B). The results showed that Aradu.QV5DJ/Araip.2LX3X was expressed in all of these tissues, with higher levels in flowers and early developmental stages of the gynophore than in other tissues. The accumulation of Aradu.YAX06/Araip.N5MMK in S1, S2 and S3 of gynophores was significantly higher than that in other tissues (Fig 7B), implying that these two genes may serve broader functions than other AhPIFs in light signaling, cell division, differentiation and morphogenesis during early embryo development and pod formation.
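One simple way to flag a "gradual increase or decrease" across ordered stages is a monotonic-trend check, sketched below in Python. The text does not state the criterion actually used, so the strict-monotonicity rule, the `min_step` parameter and the function name are all assumptions made for illustration.

```python
def expression_trend(values, min_step=0.0):
    """Label an ordered expression profile (e.g. stages S1..S6).

    ASSUMPTION: a profile counts as 'increasing' (or 'decreasing') only if
    every stage-to-stage change exceeds min_step in the same direction.
    """
    if len(values) < 2:
        raise ValueError("need at least two stages to assess a trend")
    steps = [later - earlier for earlier, later in zip(values, values[1:])]
    if all(step > min_step for step in steps):
        return "increasing"
    if all(step < -min_step for step in steps):
        return "decreasing"
    return "other"
```

A profile such as `[1.0, 1.5, 2.2, 3.0]` would be labelled "increasing", while `[1.0, 3.0, 2.0]` would be labelled "other"; a looser rule (for example, a rank correlation with stage order) would tolerate small reversals.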

Conclusions
Although many bHLH family genes have been identified in various plants, only a small number have been functionally characterized. In the past few decades, the regulation of growth and development, stress resistance, metabolism, light and hormone signaling by bHLH transcription factors has been reported in various plants. However, until now no reports about the bHLH genes in peanut have been made. This study is the first comprehensive and systematic analysis of bHLH transcription factors based on the entire genome sequence of the wild peanuts. In total, 261 bHLH transcription factors were identified in the wild peanut genome. The structure, classification, expression patterns among different tissues and comparative analyses of this gene family between peanut and Arabidopsis will help to identify candidate bHLH transcription factors potentially involved in regulating peanut pod development and provide basic resources for further study of bHLH genes in peanut. Further detailed experimental investigation is required to reveal the roles and molecular mechanisms underlying the regulation of bHLHs (particularly the PIF subfamily genes) in the developmental and physiological processes during early pod formation and embryo development.

Supporting information
S1 Table. Amino acid sequences of all bHLH proteins in peanut.