The giant panda (Ailuropoda melanoleuca) is a critically endangered mammalian species. Studies on the functions of regulatory proteins involved in developmental processes would facilitate understanding of the specific behavior of the giant panda. The basic helix-loop-helix (bHLH) proteins play essential roles in a wide range of developmental processes in higher organisms. bHLH family members have been identified in over 20 organisms, including fruit fly, zebrafish, mouse and human. Our present study identified 107 bHLH family members encoded in the giant panda genome. Phylogenetic analyses revealed that they belong to 44 bHLH families with 46, 25, 15, 4, 11 and 3 members in groups A, B, C, D, E and F, respectively, while the remaining 3 members were classified as “orphans”. Compared to mouse, the giant panda does not encode seven bHLH proteins, namely Beta3a, Mesp2, Sclerax, S-Myc, Hes5 (or Hes6), EBF4 and Orphan 1. These results provide useful background information for future studies on the structure and function of bHLH proteins in the regulation of giant panda development.
Citation: Dang C, Wang Y, Zhang D, Yao Q, Chen K (2011) A Genome-Wide Survey on Basic Helix-Loop-Helix Transcription Factors in Giant Panda. PLoS ONE 6(11): e26878. doi:10.1371/journal.pone.0026878
Editor: Vladimir N. Uversky, University of South Florida College of Medicine, United States of America
Received: July 25, 2011; Accepted: October 5, 2011; Published: November 9, 2011
Copyright: © 2011 Dang et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work was supported by Scientific Research Promotion Fund for the Talents of Jiangsu University (No. 09JDG029) and Jiangsu Sci-Tech Support Project - Agriculture (No. BE2008379). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
The basic helix-loop-helix (bHLH) proteins form a large superfamily of transcription factors that play crucial roles in a wide range of developmental processes including neurogenesis, myogenesis, hematopoiesis, sex determination and gut development. The bHLH domain is approximately 60 amino acids long and comprises a DNA-binding basic region (b) and two helices separated by a variable loop region (HLH). The HLH domain promotes dimerization, allowing the formation of homodimeric or heterodimeric complexes between different bHLH family members. The two basic domains, which are brought together through dimerization, bind specific hexanucleotide sequences.
In the past two decades, the functions of animal bHLH family members have been well characterized, mainly through studies on bHLH proteins in model organisms including the nematode (Caenorhabditis elegans), fruit fly (Drosophila melanogaster) and mouse (Mus musculus). It has been established that animal bHLHs are classified into 45 families based on their different functions in the regulation of gene expression. In addition, they are divided into 6 groups according to the target DNA elements they bind and their own structural characteristics. Specifically, group A consists of 22 families, which mainly regulate neurogenesis, myogenesis and mesoderm formation. Group B consists of 12 families, which mainly regulate cell proliferation and differentiation, sterol metabolism and adipocyte formation, and expression of glucose-responsive genes. Group C has 7 families, which are responsible for the regulation of midline and tracheal development and circadian rhythms, and for the activation of gene transcription in response to environmental toxins. Group D has only 1 family, which forms inactive heterodimers with group A bHLH proteins. Group E has 2 families, which regulate processes such as embryonic segmentation, somitogenesis and organogenesis. Group F also has 1 family, which regulates processes such as head development and the formation of olfactory sensory neurons.
With the completion of genome sequencing projects for an increasing number of organisms, bHLH family members have been identified in the genomes of over 20 organisms. These include 8 bHLH genes in Saccharomyces cerevisiae, 16 in Amphimedon queenslandica, 33 in Hydra magnipapillata, 42 in Caenorhabditis elegans, 46 in Ciona intestinalis, 50 in Strongylocentrotus purpuratus, 51 in Apis mellifera, 52 in Bombyx mori, 57 in Daphnia pulex, 59 in Drosophila melanogaster, 63 in Lottia gigantea, 64 in Capitella sp 1, 68 in Nematostella vectensis, 78 in Branchiostoma floridae, 87 in Tetraodon nigroviridis, 104 in Gallus gallus, 114 in Mus musculus, 114 in Rattus norvegicus, 118 in Homo sapiens, 139 in Brachydanio rerio, 147 in Arabidopsis, and 167 in Oryza sativa.
The giant panda, Ailuropoda melanoleuca, is a critically endangered mammal confined to six isolated mountain ranges of south-western China. As one of the most primitive carnivores, the giant panda not only has a unique diet but also shows highly specialized reproductive behavior and low fertility, all of which suggest that the giant panda has considerably different regulatory mechanisms in growth and development. However, very little is known about the structure and function of regulatory genes in the growth and development of the giant panda. Because bHLH proteins are of great importance in the regulation of organismal development, in this study we made an exhaustive effort to obtain the complete list of bHLH family members encoded in the genome of the giant panda. As a result, 107 bHLH family members were identified. Phylogenetic analyses with their mouse bHLH homologues revealed that the 107 giant panda bHLH members belong to 44 bHLH families with 46, 25, 15, 4, 11 and 3 members in groups A, B, C, D, E and F, respectively, while 3 members were classified as “orphans”. The present study provides useful background information for future studies on the structure and function of bHLH proteins in the regulation of giant panda development.
Materials and Methods
The sets of 45 representative bHLH domains and 114 mouse bHLH motifs were taken from the additional files of previous reports. Each sequence of both sets was used as a query sequence to perform a tblastn search against the giant panda genome sequences, which were accessed through the hyperlink provided on GenBank's MapView webpage (http://www.ncbi.nlm.nih.gov/mapview/). The expect value (E) was set at 10 in order to obtain all bHLH-related sequences. The obtained subject sequences were manually examined to keep only one sequence among those sharing the same contig number, reading frame and coding regions, to add the missing amino acids to the corresponding sites with the help of the EditSeq program (version 5.01) of the DNAStar package, and to find introns within the bHLH motifs using the NetGene2 application online (http://www.cbs.dtu.dk/services/NetGene2/). Sequence accession numbers of giant panda bHLH proteins were obtained by using the amino acids of each identified bHLH motif to conduct a blastp search against giant panda protein sequence databases, which were also accessed through the hyperlink on GenBank's MapView webpage.
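The redundancy-removal step described above can be illustrated with a minimal Python sketch. It is not the procedure actually used, which was carried out by manual examination. The `Hit` record and its field names are hypothetical, and keeping the hit with the lowest E value within each group is an assumption made only for illustration, since the text does not state which duplicate was retained.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical record for one tblastn subject hit."""
    contig: str    # subject contig identifier
    frame: int     # reading frame reported by tblastn
    start: int     # subject start coordinate
    end: int       # subject end coordinate
    evalue: float  # expect value of the hit


def deduplicate_hits(hits):
    """Keep one hit per (contig, reading frame, coding region).

    Assumption: when several hits share the same key, the one with
    the lowest E value is kept.
    """
    best = {}
    for hit in hits:
        # Normalise coordinates so that strand orientation does not matter.
        region = (min(hit.start, hit.end), max(hit.start, hit.end))
        key = (hit.contig, hit.frame, *region)
        if key not in best or hit.evalue < best[key].evalue:
            best[key] = hit
    return list(best.values())
```

Hits retrieved by different query sequences but covering the same coding region collapse to a single entry, which is the outcome that the manual examination aims for.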
All sequences that had been improved by the above methods were aligned using the ClustalW program embedded in MEGA4 with default settings. Each sequence was examined manually for its amino acid residues at the 19 conserved sites. Sequences with fewer than nine variations were regarded as potential giant panda bHLH members. The sequences which have fewer than ten conserved amino acids were discarded and the remaining sequences were aligned again using ClustalW. The aligned giant panda bHLH motifs were shaded in GeneDoc Multiple Sequence Alignment Editor and Shading Utility (Version 2.6.02) and copied to a rich text file for further annotation.
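The conserved-site check can be sketched in Python as follows. This is a hedged illustration only: the check was done manually, and the 19 conserved positions with their permitted residues are not listed in this text, so the `consensus` mapping is a hypothetical input that the user would have to supply. The default cut-off of eight mismatches mirrors the "fewer than nine variations" rule stated above.

```python
def count_mismatches(motif, consensus):
    """Count conserved sites at which the motif deviates from the consensus.

    `consensus` maps a 1-based motif position to a string of residues
    accepted at that position (a hypothetical, user-supplied table).
    """
    mismatches = 0
    for pos, allowed in consensus.items():
        # Positions beyond the end of the motif are treated as gaps.
        residue = motif[pos - 1] if pos <= len(motif) else "-"
        if residue not in allowed:
            mismatches += 1
    return mismatches


def is_candidate(motif, consensus, max_mismatches=8):
    """Return True if the motif shows fewer than nine variations."""
    return count_mismatches(motif, consensus) <= max_mismatches
```

With a full table of 19 sites, `is_candidate` keeps the sequences that would be regarded as potential bHLH members, and `count_mismatches` gives the figure that would be inspected by eye.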
Phylogenetic analyses of all the identified giant panda bHLH members were carried out in two steps. First, all the obtained giant panda bHLH motif sequences were used to build a neighbor-joining (NJ) distance tree with the 114 mouse bHLH motif sequences using PAUP 4.0 Beta 10, based on a step matrix constructed from the Dayhoff PAM 250 distance matrix by R. K. Kuzoff (http://paup.csit.fsu.edu/nfiles.html). Then, each giant panda bHLH motif sequence was used to conduct in-group phylogenetic analyses with mouse bHLH motif sequences. That is, each amino acid sequence of giant panda bHLH motifs was used to construct NJ, maximum parsimony (MP), and maximum likelihood (ML) phylogenetic trees with mouse bHLH family members of the corresponding group. The NJ trees were bootstrapped with 1,000 replicates to provide information about their statistical reliability. MP analysis was performed using heuristic searches and bootstrapped with 100 replicates. ML trees were constructed using TreePuzzle 5.2 with the quartet-puzzling tree-search procedure and 25,000 puzzling steps. The model of substitution was set to Jones-Taylor-Thornton. Other parameters were set to default values.
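For readers unfamiliar with bootstrapping, the resampling that underlies the bootstrap values can be sketched in Python. This is only an illustration of column resampling with replacement; it does not reproduce the tree building performed in PAUP, and the function name, the dictionary layout of the alignment and the fixed random seed are assumptions of this sketch.

```python
import random


def bootstrap_alignments(alignment, replicates=1000, seed=0):
    """Yield pseudo-replicate alignments by resampling columns.

    `alignment` maps a sequence name to its aligned motif; all
    sequences must have the same length. Each replicate draws as many
    columns as the original alignment, with replacement.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    rng = random.Random(seed)
    for _ in range(replicates):
        columns = [rng.randrange(length) for _ in range(length)]
        yield {
            name: "".join(alignment[name][c] for c in columns)
            for name in names
        }
```

A tree would be built from each replicate, and the bootstrap value of a clade is the percentage of replicate trees in which that clade appears.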
Results and Discussion
Giant Panda bHLH Family Members
The tblastn searches, sequence alignment, and examination of the 19 conserved amino acid sites revealed that there are 107 bHLH genes encoded in the giant panda genome. The names of all 107 giant panda bHLH members are listed in Table 1. Each identified giant panda bHLH (GpbHLH) gene was named according to the nomenclature used for mouse bHLH sequences. The alignment of all 107 GpbHLH motifs is shown in Figure S1 and the phylogenetic tree constructed using amino acids from 107 GpbHLH motifs and 114 mouse bHLH motifs is shown in Figure S2. Figures S1 and S2 together show that there are 46, 25, 15, 4, 11 and 3 members in groups A, B, C, D, E and F, respectively. An additional 3 members were classified as “orphans”. No gene encoding a member of the Delilah family was found in the giant panda genome. In Figure S1, the two most conserved sites are located at positions 23 and 59 of the bHLH motif. In addition, eight other sites are also conserved, as indicated with asterisks on top of Figure S1 (amino acid sequences of all 107 giant panda bHLH motifs are available in file S1).
Identification of Orthologous Families
Ortholog identification carries considerable uncertainty, since there is no absolute criterion that can be used to decide whether two genes are orthologous. In our previous studies, in-group phylogenetic analysis was adopted to identify homologues for unknown sequences that would form a monophyletic clade among themselves, using a stricter criterion based on that of Ledent et al.: if a single unknown giant panda bHLH forms a monophyletic clade with another bHLH of known family in phylogenetic trees constructed with different methods, and all the bootstrap values exceed 50, the known member is regarded as a homologue of the unknown sequence. Figure S3, as an example, shows NJ, MP and ML phylogenetic trees constructed with one GpbHLH member (GpAsh1) and eight group A bHLH members from mouse. In all three trees, GpAsh1 formed a monophyletic clade with Mash1 of mouse with bootstrap values ranging from 92 to 100. Therefore, GpAsh1 was considered an ortholog of Mash1 of mouse. Similar in-group phylogenetic analyses were conducted for each of the identified GpbHLH members, with Figure S2 used to select appropriate related mouse bHLH members for the analysis. All the bootstrap values of the constructed NJ, MP and ML trees are listed in Table 1; the corresponding trees are not shown. Table 1 shows that the orthology of GpbHLH members with mouse can be divided into the following categories.
Firstly, among the 107 GpbHLH members, 83 have bootstrap values over 50 (55 ≤ bootstrap value ≤ 100) in all of the constructed NJ, MP and ML trees. We have sufficient confidence to define the orthology of these GpbHLH motifs to their corresponding mouse bHLH orthologs.
Secondly, 4 GpbHLH members, namely GpTCF4, GpNDF1, GpUSF2 and GpEBF1, formed a monophyletic clade with bootstrap values over 50 in NJ and ML trees. Although they also formed a monophyletic clade in MP trees, their bootstrap values there ranged from 21 to 45. Therefore, the orthology of these 4 GpbHLH members has been defined according to the statistical support from NJ and ML trees. A further 10 GpbHLH members, including GpMist1, GpAHR2, GpTwist, GpDHand, GpARNT1, GpSREBP1, GpId1, GpHerp2 and GpOrphan3, formed a monophyletic clade with bootstrap values ranging from 50 to 100 in NJ and MP trees, but did not form a monophyletic group with any single bHLH sequence in ML trees (marked with n/m* or n/m in Table 1). For these GpbHLH members, we have defined their orthology according to the statistical support from NJ and MP trees.
Thirdly, 2 GpbHLH members, namely GpPod1 and GpHen2, formed a monophyletic clade in NJ and MP trees with bootstrap values ranging from 20 to 79 but did not form a monophyletic group in the ML tree. Another 4 GpbHLH members, namely GpTF12, GpMITF, GpDec2 and GpEBF3, formed a monophyletic clade with bootstrap values ranging from 72 to 82 in the NJ tree, but did not form a monophyletic clade in MP and ML trees. Although these 6 GpbHLH members did not have sufficient bootstrap support, we defined orthologs for them because each has one or two bootstrap values supporting its orthology to the corresponding mouse ortholog. This phylogenetic divergence of bHLH motif sequences between giant panda and mouse probably means that these two mammals have evolved in quite different circumstances.
Finally, there are 4 GpbHLH sequences which did not form a monophyletic clade with most of the mouse bHLH motif sequences in all constructed phylogenetic trees. They are GpBeta3b, GpMesp1, GpHen1 and GpBmal2. For these, whole protein sequences were used to conduct in-group phylogenetic analyses with the whole sequences of the corresponding mouse bHLH proteins in order to define their orthology (marked with “a” in Table 1).
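The four categories above can be summarised as a small decision rule. The Python sketch below is an illustrative reading of that rule and not part of the original analysis: the function name and category labels are hypothetical, `None` stands for a tree in which no monophyletic clade was formed (n/m in Table 1), and the strict "greater than 50" comparison follows the stated criterion.

```python
def support_category(nj, mp, ml, threshold=50):
    """Classify orthology support from NJ, MP and ML bootstrap values.

    Each argument is a bootstrap value, or None when the sequence did
    not form a monophyletic clade in that tree.
    """
    values = [v for v in (nj, mp, ml) if v is not None]
    strong = sum(1 for v in values if v > threshold)
    if strong == 3:
        return "supported by all three methods"
    if strong == 2:
        return "supported by two methods"
    if values:
        return "weakly supported"
    return "no monophyletic clade"
```

For example, a member with hypothetical values NJ = 88, MP = 30 and ML = 76 falls into the second category, while a member that forms a clade only in the NJ tree falls into the third.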
Protein Sequences and Genomic Coding Regions of Giant Panda bHLH Genes
Protein sequence accession numbers of all the identified GpbHLH motifs are listed in Table 1. For 95 GpbHLH motifs, protein sequence accession numbers were found in the ‘Non-RefSeq protein’ database (shown as ‘XP’ plus number). Protein sequence accession numbers of 9 GpbHLH motifs were found only in the ‘Ab initio protein’ database, in which all protein sequences were predicted from their corresponding genomic sequences (shown as ‘hmm’ plus number). They are GpAsh3b, GpAsh3c, GpTal1, GpSim2, GpNPAS3, GpId1, GpId4, GpDec2 and GpHes7. There are also 3 GpbHLH protein sequences for which accession numbers were not found in any protein database. They are GpKA1, GpMist1 and GpOrphan4.
Table 1 shows that, among the 104 bHLH protein sequences deposited in giant panda databases, 58 were annotated in full agreement with our analytical result (shown as the same name in the column “annotation in GenBank” as in the column “gene name”), 33 were annotated differently from our result (shown as a different name in the column “annotation in GenBank” from that in the column “gene name”), and 13 were merely predicted proteins (indicated as “hypothetical protein”). Therefore, our work not only newly identified the 13 protein sequences as bHLH family members but also provided additional information for further investigations on the 33 differently annotated bHLH protein sequences.
The coding regions and intron analysis for the 107 giant panda bHLH motifs are listed in Table 2. The intron analyses showed that 47 GpbHLH members have introns in their bHLH motifs. It was found that: (i) 26 GpbHLH members have one intron, among which 13 have the intron in the basic region, 12 in the loop region, and 1 in the helix 2 region. (ii) 19 GpbHLH members have two introns, among which 15 have introns in the basic and loop regions, 3 in the basic and helix 2 regions, and 1 in the helix 1 and helix 2 regions. (iii) 2 GpbHLH members have three introns, of which two are located in the basic region and one in the helix 2 region. Altogether, 70 introns were identified in the 47 GpbHLH motifs. The longest intron in a GpbHLH motif is 45,217 bp (base pairs), and the average intron length is 4,393 bp. These data are comparable with those of mouse. In mouse, there are also 47 bHLH members having introns in their bHLH motifs. The total number of introns identified is 73, with the longest being 48,288 bp and the average length 4,286 bp (data not shown).
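The intron tallies reported above can be cross-checked with a few lines of Python. The sketch uses only the counts given in the text; the variable names are illustrative.

```python
# Number of GpbHLH members by number of introns in the bHLH motif.
members_by_intron_count = {1: 26, 2: 19, 3: 2}

# Breakdown of the one-intron members by motif region.
one_intron_by_region = {"basic": 13, "loop": 12, "helix 2": 1}

# Breakdown of the two-intron members by pair of motif regions.
two_intron_by_regions = {
    ("basic", "loop"): 15,
    ("basic", "helix 2"): 3,
    ("helix 1", "helix 2"): 1,
}

members_with_introns = sum(members_by_intron_count.values())
total_introns = sum(
    introns * members for introns, members in members_by_intron_count.items()
)

print(members_with_introns)  # 47
print(total_introns)         # 70
```

The sub-category counts add up to 26 and 19, and the totals of 47 members and 70 introns agree with the figures stated in the text.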
The Giant Panda bHLH Repertoire
Compared to the 114 bHLH family members of mouse, the giant panda has one fewer member in each of 7 bHLH families, namely Beta3, Mesp, Paraxis, Myc, Hes, COE and Orphan. The missing bHLH family members are Beta3a, Mesp2, Sclerax, S-Myc, Hes5 (or Hes6), EBF4 and Orphan 1, respectively. Based on the available data, it is difficult to say whether the giant panda truly lacks these bHLH genes. At present, bHLH family members have been identified and classified in three mammalian species (human, mouse and rat). Since human differs from mouse and rat in only 2 bHLH families, i.e. Myc and H/E(spl), it is hard to believe that the giant panda could differ in 7 bHLH families. Moreover, of the 7 family members missing in giant panda, zebrafish and chicken are found to lack only one (S-Myc) and two (S-Myc and EBF4), respectively. Therefore, additional bHLH members may be found after a new and higher quality version of the giant panda genome sequence is released. Nevertheless, given that very little information is available on bHLH genes and their functions among bear species, our data provide good background information for further studies on the regulatory functions of bHLH proteins in giant panda and other bear species.
Figure S1. Alignment of 107 giant panda bHLH family members. Designation of basic, helix 1, loop and helix 2 follows Ferre-D'Amare et al. The family names and high-order groups have been organized according to Table 1 of Ledent et al. Highly conserved sites are indicated with asterisks on the top. The first five amino acids of NPAS1 were not available due to incompleteness of the corresponding genomic contig sequences.
Figure S2. Phylogenetic relationship of 107 giant panda and 114 mouse bHLH members. The tree was constructed with the neighbor-joining algorithm with OsRa (the rice bHLH motif sequence of R family) as outgroup. For simplicity, branch lengths of the tree are not proportional to distances between sequences, and bootstrap values less than 50 are not shown. The higher-order group labels are in accordance with Ledent et al.
Figure S3. In-group phylogenetic analyses of GpAsh1. (a), (b) and (c) are NJ, MP and ML trees, respectively, constructed with one giant panda bHLH member (GpAsh1) and nine group A bHLH members from mouse. In all trees, OsRa was used as the outgroup.
The authors are very grateful to Professor Bin Chen of Jiangsu University and two anonymous reviewers for their constructive comments on the manuscript.
Conceived and designed the experiments: QY KC. Performed the experiments: CD DZ. Analyzed the data: CD YW. Wrote the paper: CD YW.
- 1. Massari ME, Murre C (2000) Helix-loop-helix proteins: Regulators of transcription in eucaryotic organisms. Mol Cell Biol 20: 429–440.
- 2. Wang Y, Yao Q, Chen KP (2010) Progress of studies on family members and functions of animal bHLH transcription factors. Hereditas (Beijing) 32(4): 307–330 (In Chinese with English abstract).
- 3. Reece-Hoyes JS, Deplancke B, Shingles J, Grove CA, Hope IA, et al. (2005) A compendium of C. elegans regulatory transcription factors: a resource for mapping transcription regulatory networks. Genome Biol 6: R110.
- 4. Zheng X, Wang Y, Yao Q, Yang Z, Chen K (2009) A genome-wide survey on basic helix-loop-helix transcription factors in rat and mouse. Mamm Genome 20(10): 236–246.
- 5. Li X, Duan X, Jiang H, Sun Y, Tang Y, et al. (2006) Genome-wide analysis of basic/helix-loop-helix transcription factor family in rice and Arabidopsis. Plant Physiol 141: 1167–1184.
- 6. Satou Y, Imai KS, Levine M, Kohara Y, Rokhsar D, et al. (2003) A genomewide survey of developmentally relevant genes in Ciona intestinalis. I. Genes for bHLH transcription factors. Dev Genes Evol 213: 213–221.
- 7. Simionato E, Ledent V, Richards G, Thomas-Chollier M, Kerner P, et al. (2007) Origin and diversification of the basic helix-loop-helix gene family in metazoans: insights from comparative genomics. BMC Evol Biol 7: 33.
- 8. Toledo-Ortiz G, Huq E, Quail PH (2003) The Arabidopsis basic/helix-loop-helix transcription factor family. Plant Cell 15: 1749–1770.
- 9. Wang Y, Chen KP, Yao Q, Wang W, Zhu Z (2007) The basic helix-loop-helix transcription factor family in Bombyx mori. Dev Genes Evol 217(10): 715–723.
- 10. Wang Y, Chen KP, Yao Q, Wang WB, Zhu Z (2008) The basic helix-loop-helix transcription factor family in the honeybee, Apis mellifera. J Insect Sci 8: 44.
- 11. Wang Y, Chen KP, Yao Q, Zheng XD, Yang Z (2009) Phylogenetic analysis of zebrafish basic helix-loop-helix transcription factors. J Mol Evol 68(10): 629–640.
- 12. Liu WY, Zhao CJ (2010) Genome-wide identification and analysis of the chicken basic helix-loop-helix factors. Comp Funct Genom. doi:10.1155/2010/682095.
- 13. Wan QH, Fang SG, Wu H, Fujihara T (2003) Genetic differentiation and subspecies development of the giant panda as revealed by DNA fingerprinting. Electrophoresis 24(9): 1353–1359.
- 14. Kleiman DG (1983) Ethology and reproduction of captive giant pandas (Ailuropoda melanoleuca). Z Tierpsychol 62: 1–46.
- 15. Tao Y, Zeng B, Xu L, Yue B, Yang D, et al. (2010) Interferon-gamma of the giant panda (Ailuropoda melanoleuca): complementary DNA cloning, expression, and phylogenetic analysis. DNA Cell Biol 29(1): 41–45.
- 16. Wan QH, Zeng CJ, Ni XW, Pan HJ, Fang SG (2009) Giant panda genomic data provide insight into the birth-and-death process of mammalian major histocompatibility complex class II genes. PLoS One 4(1): e4147.
- 17. Ledent V, Vervoort M (2001) The basic helix-loop-helix protein family: comparative genomics and phylogenetic analysis. Genome Res 11: 754–770.
- 18. Tamura K, Dudley J, Nei M, Kumar S (2007) MEGA4: Molecular evolutionary genetics analysis (MEGA) software version 4.0. Mol Biol Evol 24: 1596–1599.
- 19. Atchley WR, Terhalle W, Dress A (1999) Positional dependence, cliques, and predictive motifs in the bHLH protein domain. J Mol Evol 48: 501–516.
- 20. Nicholas KB, Nicholas-Jr HB, Deerfield-II DW (1997) GeneDoc: Analysis and visualization of genetic variation. Embnet News 4: 14.
- 21. Swofford DL (1998) PAUP*. Phylogenetic analysis using parsimony, Version 4. Sinauer Associates.
- 22. Schmidt HA, Strimmer K, Vingron M, von Haeseler A (2002) TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18: 502–504.
- 23. Jones DT, Taylor WR, Thornton JM (1992) The rapid generation of mutation data matrices from protein sequences. CABIOS 8: 275–282.
- 24. Ledent V, Paquet O, Vervoort M (2002) Phylogenetic analysis of the human basic helix-loop-helix proteins. Genome Biol 3: R30.
- 25. Ferre-D'Amare AR, Prendergast GC, Ziff EB, Burley SK (1993) Recognition by Max of its cognate DNA through a dimeric b/HLH/Z domain. Nature 363: 38–45.