Patterns of Positive Selection of the Myogenic Regulatory Factor Gene Family in Vertebrates

The functional divergence of transcriptional factors is critical in the evolution of transcriptional regulation. However, the mechanism of functional divergence among these factors remains unclear. Here, we performed an evolutionary analysis for positive selection in members of the myogenic regulatory factor (MRF) gene family of vertebrates. We selected 153 complete vertebrate MRF nucleotide sequences from our analyses, which revealed substantial evidence of positive selection. Here, we show that sites under positive selection were more frequently detected and identified from the genes encoding the myogenic differentiation factors (MyoG and Myf6) than the genes encoding myogenic determination factors (Myf5 and MyoD). Additionally, the functional divergence within the myogenic determination factors or differentiation factors was also under positive selection pressure. The positive selection sites were more frequently detected from MyoG and MyoD than Myf6 and Myf5, respectively. Amino acid residues under positive selection were identified mainly in their transcription activation domains and on the surface of protein three-dimensional structures. These data suggest that the functional gain and divergence of myogenic regulatory factors were driven by distinct positive selection of their transcription activation domains, whereas the function of the DNA binding domains was conserved in evolution. Our study evaluated the mechanism of functional divergence of the transcriptional regulation factors within a family, whereby the functions of their transcription activation domains diverged under positive selection during evolution.


Introduction
Recent studies in evolutionary genetics have provided several lines of evidence supporting the role of positive selection in the evolution of many genes. These studies have suggested that positive genetic selection is also the major evolutionary force in addition to neutral mutations and random genetic drift [1][2][3]. In all known organisms, transcriptional regulation plays a central role in complex biological processes. However, the mechanisms underlying the functional gain and divergence of transcription factors remain unclear. Here, we performed an evolutionary analysis to study the role of positive selection in the evolution of myogenic regulatory factors (MRFs), which comprise the transcription factor family that regulates myogenesis.
Myogenesis involves two major temporally ordered steps. First, myogenic progenitor cells (myoblasts) originate from mesenchymal precursor cells, and second, these cells then terminally differentiate into mature muscle fibers [4]. The myogenic regulatory factors (MRFs) play key roles in myoblast determination and differentiation [5,6]. In vertebrates, the MRF family includes myogenic differentiation 1 (MyoD), myogenic factor 5 (Myf5), myogenin (MyoG), and Myf6 (MRF4) genes. All MRFs share a conserved basic helix-loop-helix (bHLH) domain that is required for DNA binding and dimerization with other proteins, such as E protein. All four MRFs are characterized by their capacity to convert a variety of cell lines into myocytes and to activate muscle-specific gene expression [7,8]. The four MRF proteins display distinct regulatory roles in muscle development. Myf5 and MyoD are myogenic determination factors and contribute to myoblast determination, which is activated in proliferating myoblasts before overt differentiation. In contrast, MyoG and Myf6 are myogenic differentiation factors that contribute to the differentiation of myoblasts and act downstream of Myf5 and MyoD, though Myf6 partly acts at both the determination and differentiation levels [6,9,10]. Although Myf5 and MyoD have redundant functions in myoblast determination and can compensate for the functional loss of each other, Myf5 plays a more critical role during the early determination of epaxial muscle, whereas MyoD is more critical for hypaxial muscle determination [8,11].
Genome duplication is believed to be a major genetic event that occurs during the evolution of a gene family from a single gene to multiple gene copies [12,13]. Indeed, evolutionary analyses of the amino acid sequences of the MRF family indicate that vertebrate Myf5, MyoD, MyoG, and Myf6 genes were duplicated from a single invertebrate gene [14,15]. The vertebrate genome contains all four MRFs genes, whereas the invertebrate genomes of Caenorhabditis elegans [16], Anthocidaris crassispina [17], and Drosophila melanogaster [18] only contain a single MRF gene. However, although only a single MRF gene exists in the genome of Ciona intestinalis, it gives rise two different transcripts of MRFs (MDFa and MDFb) as a result of alternative splicing. Moreover, in cephalochordates, the amphioxus have two MRF genes: BMD1 and BMD2 [19,20]. The amphioxus and ascidians are chordates species and are closely related to vertebrates [21]. The two MRF genes in amphioxus might be the adaptive result of muscle evolution in cephalochordates in order to acquire a more complex transcriptional regulatory network for myogenesis [15,22,23]. The two splice forms of MyoD in ascidians suggest that the regulation pattern of multiple MyoD genes has evolved under selective pressure before the MRF genes were duplicated into multiple copies [19,24]. Genome evolution studies suggested that large-scale genome duplications occurred during early chordate evolution [12,25]. The vertebrate genome appears to undergo two rounds of duplication according to the ''one-twofour'' rule [13], and the MRF gene family appears to have followed that rule as well [14]. The single ancestral gene initially duplicated into two lineages during the evolution of chordates. The Myf5 and MyoD genes were then duplicated from one of these two lineages, whereas MyoG and Myf6 were duplicated from the other lineage during vertebrate evolution. Therefore, the functional redundancy between Myf5 and MyoD as well as between MyoG and Myf6 might be due to their common genetic origin [14].
The mechanisms underlying the evolution of the MRF gene family during their duplication remain unclear. In particular, the evolutionary forces affecting the functional divergence of the four MRFs genes have not been fully elucidated. In this study, we investigated the mechanisms underlying the evolution of the four MRF genes with particular emphasis on the selective pressures imposed on the branches and sites of MRFs during vertebrate evolution. Our study provides several lines of evidence for the role of positive selection in the functional divergence of transcription factors.

Results
The sequence variations among the four groups of vertebrate MRFs In vertebrates, the MRF sequences were divided into four groups and their protein structure differences are shown in Figure 1A. Three functional domains were identified in MRF proteins by querying the Conserved Domain Database in NCBI [26]. The most conserved region is the HLH domain, which defines the MRF family, as its amino acid sequences were almost unchanged among the four MRFs ( Fig. 1A and 1B). The BASIC domain was also conserved in all of the MRFs (Fig. 1A). However, the third MYF5 domain was only conserved in the myogenic determination factors (Myf5 and MyoD), but not in the myogenic differentiation factors (Myf6 and MyoG) (Fig. 1A). Two amino acid sequences of SXXTSPXSNCSDGM and SSLDCLSXIVXRIT were highly conserved in the MYF5 domain of Myf5 and MyoD (Fig. 1C).

Detection of positive genetic selection for all vertebrate MRFs sequences
Nucleotide mutations in coding sequences are important for the evolution of gene functions. The likelihood ratio (LR) tests of site models in the CODEML program of phylogenetic inference by maximum likelihood (PAML4) [27] were used to test the positive selection of all vertebrate MRF sequences. A neighbor joining (NJ) tree of 153 vertebrate MRF coding sequences (File S1) was used for the LR tests ( Fig. 2A). The LR tests with M7 and M8 detected positive selection by using all vertebrate MRFs sequences, which fit the selective model better than the null model and also had a v.1 (Table S1). The results remain significant with the experimental error set at 1% (Table 1, Table S1). There are 11 sites under positive selective pressure, which were identified under M8 using Bayes Empirical Bayes (BEB) analysis [28,29] (Table 1,  Table S1, Figs. 3A and 3B).

Different positive selection on the four branches of vertebrate MRFs
Typically, the relatively short period of positive selection is usually followed by long periods of continuous negative selection [2]. The branch models of the CODEML program were used to examine whether some branches in the MRFs phylogeny were driven by positive selection. First, we used the one-ratio model (M0), which assumes a single v ratio for all lineages in the phylogeny [28,30]. Under the M0 model, the v ratio is 0.055, which is significantly less than 1, and indicates that the evolution of MRFs was dominated by strong purifying selection (Table 1). We then used free-ratio and two-ratio models to test for positive selection in each branch. The freeratio model assumes a different v parameter for each branch in the tree [28,30]. The LR test results revealed that the differences between the free-ratio and one-ratio models were significant (p,0.01, Table 1), indicating that the v ratios were different among the lineages.
Given that positive selection usually affects a few amino acid sites along particular lineages [2,27], we used branch-site models to further examine whether some sites along particular MRFs lineages are under positive selection pressure ( Table 1). As expected, the positive selection on the four vertebrate MRF lineages was different. We identified 5 sites under positive selection from the vertebrate MyoG lineage (branch f in Fig. 2A, Fig. 4A and Table 1). In addition, another 3 and 2 amino acid sites were identified from the teleost MyoG lineage and the bird MyoG lineage, respectively (branches m and n in Fig. 2A, Fig. 4A, and Table 1). Although no positive selection sites were identified in the entire vertebrate Myf6 lineage, 2 sites were identified from the birds-mammals Myf6 lineage, and 2 additional sites were identified in the teleost Myf6 lineage (branches j and l in Fig. 2A, Fig. 4A, and Table 1). In addition, only 3 sites were identified from the Actinopterygii MyoD lineage (branch g in Fig. 2A, Fig. 4A, and Table 1). However, no site was identified from the Myf5 lineage.

The functional divergence between the myogenic determination factors (Myf5/MyoD) and myogenic differentiation factors (MyoG/Myf6)
The myogenic determination factors (Myf5/MyoD) and myogenic differentiation factors (MyoG/Myf6) play distinct roles in myogenesis. The functional divergence between these factors was estimated using the DIVERGE 2.0 program [31]. Type I functional divergence showed h = 0.49960.04 between Myf5/ MyoD and MyoG/Myf6 branches, which was significantly greater than 0 (p,0.01). Thus, the functional divergence between Myf5/ MyoD and MyoG/Myf6 was significant. Twenty-nine residues have a stringent threshold of a posterior ratio higher than eight. Most of these sites were located in the BASIC, MYF5 domains and Cterminus, which might be critical for the functional divergence between the myogenic determination factors (Myf5 and MyoD) and vertebrate MyoG. All the branches with a v-ratio significantly greater than 1 are marked with arrows and letters corresponding to those in Table 1 and Table 2. doi:10.1371/journal.pone.0092873.g002 myogenic differentiation factors (Myf6 and MyoG) (Fig. 3E). The role of positive selection in this divergent process was evident ( Table 1, Fig. 2A, and Fig. 3C). Using the branch-site specific model, the same 14 positive selection sites were identified from the Myf5/MyoD lineage (branch a in Fig. 2A) and MyoG/Myf6 lineage (branch b in Fig. 2A), with 7 of them located in the BASIC domain, 1 close to the HLH domain, and 6 in the MYF5 domain and C-terminus (Fig. 3C, Fig. 3D, and Table 1).

Detection of positive genetic selection for each group of vertebrate MRF sequences
A neighbor joining (NJ) tree of 53 vertebrate MyoD coding sequences (File S2) was generated, which was used for positive selection analysis (Fig. 2B). No sites were identified using the site models. However, positive selection was identified from the teleost MyoD2 lineage using the two ratio branch model (branch c in Fig. 2B, Table 2). Moreover, 4 sites were identified from the  Table 2). Thus, the evolution of MyoD for all vertebrates was likely driven by positive selection. Similarly, positive selection was identified in vertebrate MyoG using a tree of 43 sequences ( Fig. 2C and File S3). Using a branch-site model, 3 sites were identified from the lineage of the bird MyoG (branch a in Fig. 2C, Fig. 4B and Table 2) and 4 sites were identified from the teleost MyoG lineage (branch b in Fig. 2C, Fig. 4B and Table 2). Unlike the other MRF genes, 2 sites were still identified by the pair model of M7 versus M8 when the sequences were limited only to the 19 mammalian MyoG sequences ( Table 2, Fig. 4C and File S4). These results suggest that the evolution of MyoG in all vertebrates was driven by positive selection.
Unlike MyoD and MyoG, no branch or site under positive selection was identified in the vertebrate Myf5 gene (Table S2). Although the selective pressures on the branches of Myf6 were different (Table 2), no sequences were found to be under positive selection at the 5% confidence level (Table S2). These data suggest that the evolution of MyoD and MyoG was driven strongly by positive selection, but the evolution of Myf5 and Myf6 was only weakly driven by this selective pressure.

Location of positive selection sites
Under Bayes Empirical Bayes (BEB) analysis, a total of 55 positive selection sites during the divergence of MRFs were identified using the site and branch-site models of PAML4. We plotted the genetic location of positively selected sites onto the protein secondary structure and three-dimensional structure ( Fig. 3 and Fig. 4). Positively selected sites were not homogeneously distributed among regions. A total of 40% (22 of 55) of sites were located in the BASIC domain, whereas 51% (28 of 55) of sites were located in the MYF5 domain and C-terminus. Only 2 sites were located in the HLH domain. Among the 28 sites in the MYF5 domain, most were located in conserved amino acid sequences of SXXTSPXSNCSDGM and SSLDCLSXIVXRIT. To identify connections between positive selection and functional sites, spatial relationships among the positive selection sites were evaluated by mapping them onto three-dimensional protein structures [32,33]. All sites were shown to localize on the protein surface ( Fig. 3B  and 3D).

Different rates of evolution for each of the three MRFs domains
Given that most of the positive selection sites are frequently located in the BASIC and MYF5 domains of MRF proteins, the positive selection pressures on the three domains should be different. Thus, the evolution rates of the three domains were analyzed by calculating the nonsynonymous (dN) and synonymous (dS) substitution rates (Fig. 5). The MYF5 domain had the fastest evolutionary rate, whereas the HLH domain evolved the slowest (Fig. 5A, 5B, and 5C). In addition, the evolutionary rate of C-terminal sequences in MyoG and Myf6 was significantly faster than the MYF5 domain of MyoD and Myf5, whereas the HLH domain had a similar evolutionary rate among the four MRFs ( Fig. 5D  and 5F).

Discussion
The four MRF genes display distinct regulatory roles during embryonic myogenesis and postnatal muscle development [6,9,34]. However, the mechanisms underlying the functional The v represents for Ka/Ks, the topology and branch-specific v ratios are presented in Figure 3. * Significant at p,0.05, ** Significant at p,0.01. The site number is marked with the alignments with the gap eliminated. 2D, log-likelihood difference between compared models. doi:10.1371/journal.pone.0092873.t001 divergence among them remain unclear. In this study we investigated the evolution of the four MRF genes in order to determine the role of positive selection in the functional divergence of this transcription factor family.

The functional complex trajectories of vertebrate MRFs genes
The four vertebrate MRF genes diverged from a single invertebrate ancestor gene following two rounds of genomic duplication [14]. In the urochordate Ciona intestinalis, two MRF proteins (MDFa and MDFb) were transcribed by a single MRF gene, which was different than lower invertebrates, whereby a single MRF ortholog was transcribed [35]. Thus, the vertebrate-like regulatory strategy of multiple myogenic factors has been described in Ciona intestinalis [19,24,35]. In vertebrates, the four MRFs are produced by gene duplication. It has been shown that Myf5 and MyoD evolved from one of these lineages, whereas MyoG and Myf6 (MRF4) evolved from another lineage  Table 2. doi:10.1371/journal.pone.0092873.g004 [14], which might explain the functional overlap of these factors [6,7]. All three domains of MRF proteins were identified in vertebrates. The HLH and BASIC domains were conserved in all of the vertebrate MRFs. However, the third MYF5 domains were only identified in the vertebrate Myf5 and MyoD genes, but are not conserved in Myf6 and MyoG (Fig. 1A). Therefore, the MYF5 domain is critically involved in the functional differences between the myogenic determination factors (Myf5 and MyoD) and the myogenic differentiation factors (MyoG and Myf6). In addition, two amino acid regions (SXXTSPXSNCSDGM and SSLDCLSXIVXRIT) might be critical in the functional gain of the myogenic determination role in Myf5 and MyoD (Fig. 1C). Most sites of the SSLDCLSXIVXRIT region were also conserved in the Myf6 C-terminus, which might explain the minor role of Myf6 in myogenic determination (Fig. 1C) [6,10].

The functional divergence between the myogenic determination factors (Myf5/MyoD) and myogenic differentiation factors (MyoG/Myf6)
Positive selection and gene duplication are two major forces in the adaptive evolution of new functions in a gene family [2]. Significant evidence of positive selection was found during the evolution of the vertebrate MRFs. Positively selected sites were identified in the BASIC, MYF5 domains and C-terminus, and all of these sites localized on the surface of human MyoD (Fig. 3A and 3B). Given that the BASIC, MYF5 domain and C-terminus are the transcription activation domains and are required for muscle gene activation [6,7], the positive selective pressures may alter the capability of MRFs to activate myogenic gene expression, which might be responsible for the functional divergence of the vertebrate MRFs.
Indeed, our findings provide evidence that the functional divergence of the transcriptional activity domain between the myogenic determination factors (Myf5 and MyoD) and differentiation factors (Myf6 and MyoG) was driven by positive selection. Positive selection sites responsible for this divergent process were identified from the BASIC, MYF5 domains and C-terminus ( Fig. 2A, Fig. 3C and 3D). Moreover, the role of positive selection in functional divergence between Myf5/MyoD and MyoG/Myf6 was also evident after examining the selective pressure on each of the four vertebrate MRFs lineages, which suggested that the major sites and species under positive selection were observed in the MyoG and Myf6 lineages, while few were identified in Myf5 and MyoD ( Fig. 2A and Fig. 4A). In particular, positive selection sites in the HLH domain were identified from the vertebrate MyoG branch (Table 1, Fig. 2A and Fig. 4A). The HLH domain is required for DNA binding and dimerization of myogenic bHLH factors with other proteins [7,10]. Thus, the transcriptional activity domain and DNA binding domain of MyoG were all likely driven by positive selection pressures, which could explain the specific role of MyoG in myogenic differentiation, but not in myogenic determination [36]. Although sites located in the C-terminus were also identified from two Myf6 branches in a number of organisms ranging from teleosts to mammals, no sites were located in the conserved regions ( Fig. 2A, Fig. 4A, and Table 1). This may explain the more specific role of Myf6 in both myogenic differentiation and myogenic determination [10,36,37]. Conversely, only a few sites in the Myf5 and MyoD lineages were identified, suggesting that the functions of myogenic determination factors were more conserved during their divergence from the ancestral gene. Overall, the myogenic differentiation factors gained new functions under positive selective pressure, while myogenic determination factors mostly retained the basic functions of ancestral bHLH genes. These observations could explain the more important and conserved functions of MyoD/Myf5 than Myf6/MyoG in the regulation of muscle development [10,36].

The functional divergence between the myogenic determination factors Myf5 and MyoD
In addition to the divergence between the myogenic determination factors and differentiation factors, the functional divergence The v represents for Ka/Ks, the topology and branch-specific v ratios are presented in Figure 3. *Significant at p,0.05, ** Significant at p,0.01. The site number is marked with the alignments with the gap eliminated. 2D, log-likelihood difference between compared models. doi:10.1371/journal.pone.0092873.t002 within the myogenic determination factors (between Myf5 and MyoD) was also under positive selective pressure. The evolution processes of MyoD in all vertebrates are driven by positive selection on the BASIC and MYF5 domains (Fig. 2B, Fig. 4B, and Table 2). However, no branches or sites under positive selection were identified during Myf5 evolution, which was selected by purifying selection. The different positive selective pressure between Myf5 and MyoD might explain the functional divergence between myogenic determination factors because MyoD gained new functions during its evolution from amphibians to mammals [38][39][40], whereas Myf5 functions remained conserved after its divergence.

The functional divergence between the myogenic differentiation factors MyoG and Myf6
Similar to the myogenic determination factors, the function of myogenic differentiation factors (Myf6 and MyoG) also diverged under positive selection. The positive selection on the BASIC and C-terminus were identified in the bird MyoG lineage and the teleost MyoG lineage (Fig. 4B and Table 2). In addition, unlike other MRF genes, positive selection was identified, though the estimate was limited to the mammalian MyoG sequences (Table 2 and Fig. 4C). Thus, the evolution of MyoG in all vertebrates was under positive selection. However, positive selection was not identified during Myf6 evolution, which indicated a relatively slow evolution rate of Myf6 after its divergence from myogenic differentiation factors. Therefore, although Myf6 and MyoG were duplicated from the

The different positive selection of the three vertebrate MRFs domains
The HLH domain is crucial for the MRF family, and therefore its amino acid sequences are almost unchanged during the evolution from nematodes to humans. In contrast, the sequences of the BASIC, MYF5 domains and C-terminus show a greater number of differences among species (Fig. 1A and Fig. 1B). Indeed, positive selection sites were identified in the BASIC, MYF5 domains and C-terminus, whereas few were found in the HLH domain. Therefore, the role of the three domains in the evolution and functional divergence of the MRF genes might be different. Based on evolutionary analysis, the role of the HLH domain in maintaining the conserved function of the MRF gene family was confirmed, whereas the BASIC, MYF5 domains and C-terminus are the targets for the gain of new functions under positive selective pressure. Thus, the DNA binding features among the four MRF genes are similar due to the conserved HLH domain. However, the transcriptional activity features among them vary due to the different evolutionary rates of the BASIC, MYF5 domains and C-terminus. Thus, their transcriptional activity for specific muscle genes are different, which resulted in their distinct roles in myogenesis [6,36,41].
Overall, we conclude that the functional gain and divergence of these transcription factors were driven by distinct positive selection on their transcription activation domains, whereas the DNA binding domains play roles in maintaining the conserved function of the transcription factor family.

Materials and Methods
Data collection and alignment BLASTP, TBLASTN and keyword searches were used to obtain the open reading frames of MRFs from the NCBI (http:// www.ncbi.nlm.nih.gov/guide/). The MRF sequences were aligned by the program MUSCLE or ClustalW, and all gaps were eliminated by manual edition (File S1). The alignment results were used to calculate the selection pressure with PAML4 [27]. The MRF protein structures were mapped by querying the Conserved Domain Database in NCBI [26].

Phylogenetic analyses
Phylogenetic trees were constructed using the MEGA5 software [42] with the Neighbor-Joining (NJ) method, a mathematical model of P-distance, 1000 bootstrap replicates, and complete deletion. In addition, the maximum likelihood (ML) trees for the MRFs were also constructed with the MEGA5 software using Kimura-2 parameters, 1000 bootstrap replicates, and complete deletion.

Detection of the evolutionary rates for MRF coding sequences
The CODEML program in the PAML4 [27] was used to calculate the positive selection of the MRFs. In the CODEML program, the branch model allows the v ratio to vary among branches in the phylogeny [28,43]. In branch models, the simplest model is M0, which is referred to as the null hypothesis H0, and it assumes the same v ratio for all branches. The model = 1 fits the free-ratio model, which assumes an independent v ratio for each branch. The model = 2 fits the two-ratio model, which is allowed to have several v ratios [27]. The site model allows the v ratio to vary among sites (amino acids in the protein). In the site model analysis, two pairs of models appeared to be particularly useful, and formed likelihood ratio tests of positive selection. The first compares M1a (Nearly Neutral) and M2a (Positive Selection), whereas the second compares M7 (beta) and M8 (beta and v). M1a allows two classes of v sites: negative sites with v0,1 and neutral sites with v1 = 1, whereas M2a adds a third class with v2 possibly .1. M7 allows ten classes of v sites between 0 and 1 according to a beta distribution with parameters p and q, whereas M8 adds an additional class with v possibly .1, similar to M2a. In addition, to test whether variable selection pressures exist among the MRFs sites, we also used a paired model of M0 (one-ratio) against M3 (discrete). M3 specifies 3 discrete classes of MRFs coding sites. The branch-site models allows v ratio to vary in sites and branches on the tree, and used to detect positive selection that affects a few sites along particular lineages (called foreground branches). The nonsynonymous (dN) and synonymous (dS) substitution rates were calculated by the Nei-Gojobrotri (Jues-Cantor) method as implemented in the MEGA5.0 program to measure the pairwise sequence distances of the three domains among different MRFs [3,42].

Three-dimensional structural analyses
Three-dimensional structures of the proteins were predicted using the worldwide web following the methods of a case study using the Phyre server [44]. The structural images for the proteins were produced using RasMol 2.7.5 [45,46].

The detection of functional divergence of MRF genes
The DIVERGE 2.0 program [31] was used to estimate the Type I functional divergence between myogenic determination factors (Myf5/MyoD) and myogenic differentiation factors (MyoG/Myf6). The Type I functional divergence was measured as the coefficient of functional divergence, h (ranging from 0-1), which was calculated by model-free estimation (MFE) and maximum-likelihood estimation (MLE) under a two-state model. The value of h represents the functional divergence [47,48].