Vertebrate Paralogous MEF2 Genes: Origin, Conservation, and Evolution

Background The myocyte enhancer factor 2 (MEF2) gene family is broadly expressed during the development and maintenance of muscle cells. Although a great deal has been elucidated concerning MEF2 transcription factors' regulation of specific gene expression in diverse programs and adaptive responses, little is known about the origin and evolution of the four members of the MEF2 gene family in vertebrates. Methodology/Principal Findings By phylogenetic analyses, we investigated the origin, conservation, and evolution of the four MEF2 genes. First, among the four MEF2 paralogous branches, MEF2B is clearly distant from the other three branches in vertebrates, mainly because it lacks the HJURP_C (Holliday junction recognition protein C-terminal) region. Second, three duplication events might have occurred to produce the four MEF2 paralogous genes and the latest duplication event occurred near the origin of vertebrates producing MEF2A and MEF2C. Third, the ratio (Ka/Ks) of non-synonymous to synonymous nucleotide substitution rates showed that MEF2B evolves faster than the other three MEF2 proteins despite purifying selection on all of the four MEF2 branches. Moreover, a pair model of M0 versus M3 showed that variable selection exists among MEF2 proteins, and branch-site analysis presented that sites 53 and 64 along the MEF2B branch are under positive selection. Finally, and interestingly, substitution rates showed that type II MADS genes (i.e., MEF2-like genes) evolve as slowly as type I MADS genes (i.e., SRF-like genes) in animals, which is inconsistent with the fact that type II MADS genes evolve much slower than type I MADS genes in plants. Conclusion Our findings shed light on the relationship of MEF2A, B, C, and D with functional conservation and evolution in vertebrates. This study provides a rationale for future experimental design to investigate distinct but overlapping regulatory roles of the four MEF2 genes in various tissues.


Introduction
The myocyte enhancer factor 2 (MEF2) gene family, which belong to the evolutionarily ancient MADS (MCM1, AGAMOUS, DEFICIENS, and SRF)-box superfamily [1][2][3][4], has four members referred to as MEF2A, B, C, and D located on different chromosomes in vertebrate genomes [5,6].Of the four MEF2 members, all can be tissue-specific alternatively spliced, producing multiple isoforms which have significant functional differences [1][2][3][4].They recognize and bind to the consensus DNA sequence CTA(A/T) 4 TAG/A as homo-or heterodimers via a 56-amino acid domain (i.e.MADS-box) [7,8].Adjacent to the MADS-box is a 29-amino acid extension, referred to as the MEF2-specific (MEF2s) domain, which contributes to high-affinity DNA binding and dimerization with other homologous MEF2 proteins and facilitates interactions with other cofactors [9,10].The C-terminal of MEF2 proteins, which is subject to complex patterns of alternative splicing, contains the transcriptional activation domain to promote signal transduction and/or regulate target gene transcription [9,[11][12][13].
In contrast to the functions of MEF2 proteins in vertebrates and evolutionary analyses of the MADS family in plants, which have been the subject of extensive research, little is known about the evolutionary relationship of the four MEF2 proteins.At present, we only know that MEF2 proteins share over 65% amino acid identity in the MEF2s domain, and over 90% similarity in the MADS-box in contrast to only about 50% similarity with other MADS factors such as SRF (serum response factor) [2,38,[43][44][45], which is closely associated with specific DNA binding [46].
Here, we investigate the duplication events and evolutionary rates of the four MEF2 proteins, particularly Darwinian selection on the four MEF2 branches and on the sites in MEF2 sequences and in particular branches.The study strongly improves our understanding of MEF2 conservation and evolution in vertebrates, and the findings may be laid for future experimental dissection of the function of the four MEF2 members.

Phylogeny of MEF2 genes
The data set of 102 MEF2 protein sequences was aligned to produce the phylogeny (see Figure 1) of MEF2 genes by using the Neighbor-joining (NJ) method [47] (see Materials and methods), and this NJ tree is broadly consistent with the tree constructed by Bayesian method [48] (see Figure S1).The phylogeny shows that MEF2B is the most distant from the other three MEF2 proteins in vertebrates, and MEF2C and MEF2A are more closely tied to each other than to MEF2D and MEF2B genes.In line with MEF2A closely tied to MEF2C, some common types of alternative splicing have been observed for MEF2A and MEF2C transcripts [3,5,18].For example, there have been found 16 and 17 transcripts, respectively, for MEF2A and MEF2C genes in Pan troglodytes, and most of them have similar transcriptional splicing patterns.
Interestingly, MEF2C and MEF2D had another independent duplication event in the species Danio rerio and Xenopus laevis, respectively, producing five paralogous MEF2 proteins in the two species (see Figure 1).According to Sonnhammer's new notion [49] on paralogy and orthology, MEF2Ca and MEF2Cb can be regarded as inparalogs to each other, outparalogs to the other three MEF2 proteins in Danio rerio, and co-orthologs to MEF2C protein in other vertebrates, as well as MEF2Da and MEF2Db in Xenopus laevis.In contrast to gene duplication, gene loss presumably occurred in some species, such as MEF2B loss in Oryctolagus cuniculus (see Figure 1).

Pairwise estimates of natural selection on MEF2A-D in humans and mice
Nucleotide changes in protein-coding regions of genes are of importance to the conservation and evolution of protein function.Dealing with nucleotide changes, it is necessary to discriminate between changes that affect the amino acid sequence (nonsynonymous substitution) from changes that do not affect amino acid sequence (synonymous substitution).The ratio (v = K a /K s ) of nonsynonymous to synonymous substitution rate is a valid measure of natural selection pressure at the protein level [50], with v,1, v.1, and v = 1 representing purifying selection, positive selection, and neutral evolution, respectively [51].
Synonymous and nonsynonymous substitution rates and their ratios for MEF2A-D protein coding regions are presented in Table 1.All the four v ratios are much lower than one (v,0.2),indicating that MEF2A-D proteins are subject to strong purifying selection to maintain protein function.However, among the four proteins, MEF2B evolves at an even higher rate with much greater nonsynonymous and synonymous substitution rates than the other three MEF2 proteins, and the other three proteins evolve at the same order of magnitude level, though MEF2A has a twice bigger v ratio compared to MEF2C.
One of the unresolved issues about the MEF2 family is whether the increased v ratio of MEF2B reflects (i) simply a long-term accumulation under a relaxed selection pressure, or (ii) an abrupt increase in an episodic period for functional divergence following the duplication event.The other question is whether or not some sites in the MEF2 family or in some particular MEF2 branches are under positive selection.In the following, we will focus on variable natural selection among the four MEF2 branches, MEF2 sites, and the sites along particular MEF2 branches to test these scenarios.

Natural selection among the four MEF2A-D branches
We assumed variable v evolutionary ratios among MEF2 branches in MEF2 phylogeny, and then tested for a significant difference of the ratios based on Likelihood Ratio Test (LRT) (see Materials and methods) [51,52].The null hypothesis (H0) is that the evolutionary ratios for the MEF2 family are all simply due to underlying uniform mutation rates (i.e.v is identical across all the branches of the MEF2 phylogeny).Under the H0 model (see Table 2), the estimate of v is 0.012, indicating that the evolution of all the MEF2 members was dominated by strong purifying selection which is consistent with previous results (see Table 1).Given that no significant difference of H1 versus H0 was detected, the increased v rate of MEF2B is likely from a long-term accumulation under a relaxed selection pressure on MEF2B.In addition, among the six alternatives to H0, H5 and H6 for MEF2A and MEF2C branches both with p-value,0.01suggest that selection pressure (v~0:001) on MEF2A and MEF2C is significantly stronger than on the other two MEF2 branches consistent with previous results (see Table 1).

Natural selection among MEF2A-D sites
Strong purifying selection dominates the four MEF2 branches regardless of relative relaxed purifying selection on the MEF2B branch, however, whether some sites in MEF2 sequences under adaptive evolution or variable selection are still unknown.To test these, we conducted the following pairs of models from PAML4 [51,53]: M0 versus M3, M1a versus M2a, and M7 versus M8, and the results are presented in Table 3.For both pair models of M1a versus M2a and M7 versus M8, none of the p-values by LRT are less than 0.01, suggesting that no sites in MEF2 proteins are under positive selection.However, for the pair model of M0 versus M3, the LRT (2D'~29:568,df ~4,p{valuev0:01) suggests that there are indeed certain sites under highly variable selection pressures across MEF2 proteins.In summary, the analyses show that although none of sites in MEF2 proteins are under positive selection, variable selection pressures exist among MEF2 sites.

Natural selection among sites along particular MEF2A, B, C, and D branches
Given that positive selection often operates only on a few amino acid sites along particular branches [53], we employed branch-site specific Model A (see Materials and methods) to detect whether some sites along particular MEF2 branches are under positive selection, and the results are presented in Table 4. Along MEF2A and MEF2C branches, there are no sites with v ratio greater than 1 demonstrating that none of the sites in the branches underwent adaptive evolution.However, along MEF2B and MEF2D branches, both of the LRTs are significant less than 0.05, demonstrating that some sites along the MEF2B and MEF2D branches underwent adaptive evolution.The sites are 53 and 64 under positive selection with the posterior probability .95%along the MEF2B branch as well as 50 along the MEF2D branch (see Discussion).

Evolution of type I and type II MADS factors
There are two types of MADS factors in plants and animals, called type I (SRF-like) and type II (MEF2-like) MADS factors [54].To our knowledge, type I MADS factors evolve faster than type II MADS factors in plants [55,56].However, little is known about the evolutionary rate of the two types in animals.Substitution rates of SRF and MEF2A-D genes in the human Table 1.Rates of synonymous (K s ) and nonsynonymous (K a ) nucleotide substitutions (6 standard errors) and their ratios (v) for MEF2 protein-coding regions.Table 2. Parameter estimates under branch-specific models among the four MEF2 branches for MADS and MEF2s coding regions.and mouse genomes are presented in Figure 2. From the figure, mSRF (referred to as SRF in mice) evolves faster than its corresponding orthologous hSRF (referred to as SRF in humans).
Likewise, the evolutionary rate of the MEF2 family also reveals the same pattern that MADS factors evolve faster in mice than in humans.When comparing paralogous genes, the substitution rate of MEF2B is much higher than that of SRF, MEF2A, MEF2C, and MEF2D, demonstrating that MEF2B evolves faster than SRF as well as MEF2A, MEF2C, and MEF2D.In support of this result, the analysis of indels revealed that MEF2B bears 7 short fragment deletions and 1 fragment insertion, which is more than the other MADS-box factors bearing in the mouse genome.Furthermore, there are also slight differences in evolutionary rate among MEF2A, MEF2C, MEF2D, and SRF between mice and humans.

Discussion
The four members of the MEF2 gene family are broadly expressed in different but overlapping patterns during embryogenesis and postnatal development as well as throughout adulthood in vertebrates [2,15,16].Here, we analyzed the evolutionary relationship of the four MEF2 proteins.Phylogenetic analysis shows that MEF2B is the most distant from the other three MEF2 proteins in vertebrates, and MEF2A and MEF2C originated from the latest duplication event near the origin of vertebrates.Lineage-specific analysis of the MEF2 gene family shows that a long-term accumulation of substitutions after the duplication led to the MEF2B branch evolving faster than the other MEF2 branches.In addition, site-specific analysis of the MEF2 gene family shows that although all the sites in MEF2 proteins are clearly constrained by purifying selection, variable purifying selection appears in the MADS and MEF2s regions of MEF2 proteins.In contrast to strong purifying selection, branch-site analysis shows that sites 53 and 64 along the MEF2B branch and 50 along the MEF2D branch are under positive selection.Furthermore, analysis of substitution rates for SRF and MEF2A-D shows that SRF evolves as slowly as MEF2 proteins except for MEF2B.

Duplication of MEF2 genes
MEF2B is the most distant among the four MEF2 members in vertebrates, which is mainly because of lacking the HJURP_C region (see Figure 3), but also other sequential characters.In support of this, the MEF2A-D phylogenetic tree (see Figure S2) constructed by the alignment of only the MADS and MEF2s regions also proves that MEF2B is the most distant.In addition, an invertebrate animal called Nematostella vectensis has two MEF2-type genes (see Figure 3) [57]: one has no HJURP_C region similar to MEF2B in vertebrates; and the other has the HJURP_C region similar to MEF2A, C, and D. To our knowledge, the origin of the HJURP_C region is far much later than MADS/MEF2s domains because that the HJURP was just found in higher eukaryotes [58].Given that original MEF2 proteins have no such HJURP_C region, we presume that the origin of MEF2B is more ancient than the other three MEF2 proteins which include the HJURP_C region, and the three MEF2 proteins should share a common ancestor also including the HJURP_C region.According to the presence of two MEF2-type genes in the invertebrate species Nematostella vectensis: one has the HJURP_C region and the other does not, we further presume that the first duplication event occurred before the origin of vertebrates producing two copies of MEF2 genes, and in the following evolutionary process, one finally became extant MEF2B, the other was inserted by the HJURP_C region which lies at C-terminal to the MADS/MEF2s regions and this MEF2 gene might be the most recent common ancestor of MEF2D, A, and C. Thereafter, such MEF2 gene had two duplication events to produce MEF2D, A, and C near the origin of vertebrates.
In relation to gene duplication patterns, MEF2A-D genes seem to originate from interchromosomal duplications considering that the four MEF2 genes are distributed on different chromosomes [5,6].

Functional constraints on MEF2A-D
The pairwise approach proposed by Yang et al. [59] is an efficient iterative means for computing synonymous and nonsynonymous substitution rates.By this approach, we found that, in addition to MEF2A-D under purifying selection, MEF2B evolves faster than the other three MEF2 proteins.A reasonable explanation is that the functional constraint on MEF2B is lower than on the other three MEF2 genes.This could be why few mutant MEF2B phenotypes have ever been reported.In contrast, many mutant phenotypes have been known for MEF2A, C, and D from mice [60].For example, targeted inactivation of MEF2A, C, and D in mice results in cardiac lethality [61], embryonic lethality [62], and a failure of normal bone development [30], respectively.
Poor mutant phenotypes for MEF2B gene could be because of its possible functional redundancy with other MEF2 genes and thus MEF2B probably functions as a potential candidate for the other MEF2 proteins.In support of this hypothesis, the alternative splicing of MEF2B transcripts is altered in MEF2C mutant embryos [63], and a significant upregulation of MEF2B expression was observed [62].However, double or multiple knockouts of MEF2B and other MEF2 genes, such as inactivation of MEF2C and MEF2B in embryos, will be especially interesting and would provide more information on the roles of MEF2B.Although all the MEF2 genes are subject to purifying selection, different sites of MEF2 genes including MADS and MEF2s regions are under variable purifying selection.Site model analysis shows that none common sites in the four MEF2 branches are under positive selection, whereas branch-site analysis shows that sites 53 and 64 along MEF2B branch and 50 along MEF2D branch are under positive selection.Of interest, at both sites 53 and 64, amino acid R is present in MEF2B branch in contrast to K in the other MEF2 branches.Elegant studies from Alvarez-Buylla's group [41,42] presented that some positions in the K domain of MIKC proteins in plants, which has similar functions (e.g.dimerization) as MEF2s region in MEF2 proteins, are under positive selection involved in both dimerization and a-b folding; whereas functional differences between R and K on the two sites (i.e.53 and 64) are little known.In the case of site 50 along the MEF2D branch, residue H is present in contrast to residue S in the homologous site along the other MEF2 branches.However, the v ratio for this site is 999 because of the rate of synonymous substitution K s = 0 and thus K a /K s is represented as 999.

Faster evolutionary rate of MADS factors in mice than in humans
MADS factors evolve faster in mice than in humans.One reasonable explanation is that mice with a shorter generation length than humans would undergo more germ-line cell divisions and thus accumulate a larger number of mutations in unit time, which would lead to a larger number of substitutions in mice than in humans [64,65].However, a shorter generation length of about 80 times in mice than in humans [65] is largely inconsistent with Table 4. Parameter estimates under branch-site models along particular MEF2 branch.

Branch-site models
Parameter estimates PSS 2D

Foreground MEF2B Branch
Model A H0 v 0 = 0.011 P 0 = 0.832 v 1 = 1.000P 1 = 0.011Not allowed 21686.781v 2a fore = 999.000v 2a back = 0.011 P 2a = 0.045 v 2b fore = 999.000v 2b back = 1.000P 2b = 0.001 Note: Model A H0 is specified using fixed v 2 = 1.The p-value of Model A H0 versus Model A H1 for the MEF2B and MEF2D branches is 0.040 and 0.033, respectively, which are considered to be statistically significant.doi:10.1371/journal.pone.0017334.t004only 1-3 times the substitution rates of MADS factors in mice than in humans.Therefore, there should be other ways to affect the accumulation of substitutions, such as mutation repair efficiency [66], rate of cell division [67], and weight-specific metabolic rate [68].Furthermore, essential functions of MADS factors whether in mice or humans, usually do not suffer deleterious mutations in MADS factors and thus natural selection would eliminate such mutations.Given these causes, MADS factors evolve just 1-3 times faster in mice than in humans.However, which one or more causes play pivotal roles in constraining the evolutionary rate will need to be evaluated with further research.

Similar functional conservation between type I and type II MADS factors in animals
Based on evolutionary analysis of MADS-box genes in plants, two groups [55,56] concluded that type I MADS genes evolve much faster than type II MADS genes.Our findings in animals, however, indicate that type I MADS genes, evolve as slowly as type II MADS genes.Unlike, possibly, less functional importance or functional redundancy of type I MADS factors in plants [69], type I MADS factor usually represented as only one SRF-like MADS factor in animals is expressed ubiquitous and plays essential roles in cell differentiation and growth [70][71][72].For example, SRF-null mice die before gastrulation and do not form mesoderm [73], demonstrating that SRF is an obligatory transcription factor and thus mutation of SRF would lead to injury, illness, and even death of the organism.In contrast to SRF, the four members of the MEF2 family are mainly involved in tissue-restricted gene expression of three muscle cells but also of other cells, including T-lymphocytes, B-cells, chondrocytes, and neural crest cells [26,30,31,[74][75][76][77], and they also play essential roles in gene regulation.These considerations explain quite well why SRF evolves at nearly the same conservational level with MEF2A, MEF2C, and MEF2D, except slower than MEF2B.
In summary, we have constructed the phylogeny of the MEF2 genes, and revealed that the function of MEF2B is somewhat less important than the other three MEF2 members in vertebrates, which is consistent with functional research from previous experimental observations.To circumvent putative problems with redundancy between MEF2B and other MEF2 proteins, generation of double or multiple MEF2 knockouts is especially interesting and would provide a deeper comprehension of the different and/or overlapping functional roles of the four MEF2 members in vertebrates.

Data collection and alignment
Orthologous and paralogous MEF2 sequences (As a total of 102 sequences, see File S1) were obtained from The National Center for Biotechnology Information (NCBI) using BLASTP, TBLASTN, and key words searches [74].The MEF2 amino acid sequences were aligned by the program MUSCLE, and poorly aligned positions and divergent regions (e.g. a number of indels and/or mismatches) were eliminated by the program Gblocks in combination with manual edition.The alignment result (see File S2) was used to construct the MEF2 phylogeny.On the other hand, a Perl script was written to capture the open reading frames (excluding 59-UTR and 39-UTR) by using the corresponding MEF2 protein sequences against the corresponding mRNA sequences (see File S3).Thereafter, MEF2 coding sequences were aligned according to the previous alignment of MEF2 protein sequences by ClustalW as implemented in MEGA4 [78].Because the regions C-terminal to MEF2s domains are two divergent among the four MEF2A-D branches, it is not appropriate to calculate nonsynonymous and synonymous substitution rates and their ratios (v = K a /K s ) [79], therefore, just the MADS and MEF2s domains were used for the analyses.The alignment result (see File S4) of MADS and MEF2s coding regions was used to calculate K a , K s , and their ratios by PAML4 (Phylogenetic Analysis by Maximum Likelihood, version 4).

Phylogenetic analysis
The MEF2 phylogenetic tree was constructed by Neighbor-Joining (NJ) method with 500 bootstrap replicates, poissoncorrection (PC) distance, and pairwise deletion options as implemented in MEGA 4 [78].In addition, MrBayes 3.1 [48] with default model and priors was used to construct MEF2 Bayesian phylogenetic tree.Searches were started from a random tree (nruns = 1) with 4 heated chains (temp = 0.05) and 300,000 iterations, the initial 5,000 trees were discarded, and finally a consensus tree using the Bayesian posterior probabilities (PPs) to evaluate branch support was constructed.The consensus Bayesian tree (see Figure S1) is broadly consistent with the former NJ tree.

Detection of evolutionary rates for MADS and MEF2s coding regions
To test whether there were different evolutionary rates among MEF2A-D proteins in vertebrates, the YN00 program [59] of PAML4 [53] was used to estimate substitution rates of MEF2 coding sequences by pairwise calculation of K a /K s between mice and humans (see Table 1).To our knowledge, a high evolutionary rate is thought to originate from two possible ways: one is simply a long-term accumulation of substitutions because of relaxed natural selection; the other is an abrupt increase of substitutions in an episodic period because of functional divergence.To test which scenario brings the increase of MEF2B evolutionary rate, the CODEML program of PAML4 [53] was used to implement models that allow for different v parameters in different parts of the MEF2A-D phylogeny (see Figure S2).The simplest model, referred to as null hypothesis H0, assumes the same v ratio for all branches in the phylogeny.Other models, referred to as alternatives, specify independent v ratio for the corresponding branch in the phylogeny (see Table 2).The likelihood ratio test (LRT) [52] was applied to measure the statistical significance of each pair of nested models.
Since positive selection is likely to act on a small subset of sites in a protein and thus averages of substitution rates across a protein with lower than 1 may not represent that all the sites in the protein are under negative selection.Besides, even though all the sites in a protein are under negative selection, various negative selection pressures still may appear in different domains in a protein.To test whether some sites in MADS and MEF2s regions are under positive selection, we used two pair models from the CODEML program [53]: M1a (Nearly Neutral) against M2a (Positive Selection); and M7 (beta) against M8 (beta & v).M1a allows two classes of v sites: negative sites with v 0 ,1 estimated from our data and neutral sites with v 1 = 1, whereas M2a adds a third class with v 2 possibly .1 estimated from our data.M7 allows ten classes of v sites between 0 and 1 according to a beta distribution with parameters p and q, whereas M8 adds an additional class with v possibly .1 as M2a does.In both comparisons, degree of freedom (df) is 2. In addition, to test whether variable selection pressures exist among MADS/MEF2s sites, we used a pair model also from the CODEML program [53]: M0 (one ratio) against M3 (discrete).M0 specifies a single v ratio for all MEF2 coding sites, whereas M3 specifies MEF2 coding sites into 3 discrete classes.Degree of freedom for this comparison is 4.
In addition, to reveal whether there are some sites along particular MEF2 branches, we also did branch-site analyses employing the Test 2 [80] of the null model A H0 (model = 2 NSsites = 2) with v 2 fixed to 1 in comparison to alternative model A H1 with v 2 to be estimated [53].In contrast to 3.84 for 5% and 6.63 for 1% for x 2 1 , the critical values are 2.71 at 5% and 5.41 at 1% [53] given that the null distribution (the branch-site model we used here) is the 50:50 mixture of point mass 0 and x 2 1 .To compare evolutionary rate between type I and type II MADS factors in animals, substitution rates (see Figure 2) of SRF and MEF2A-D were calculated in humans and mice by using dogs and cows as outgroups.Here, substitution rate was simply determined as mutation rate per site [81] across coding sequences.

Figure 2 .Figure 3 .
Figure 2. Substitution rates of SRF and MEF2A-D coding regions in the human and mouse genomes.SR on Y-axis represents for substitution rate, that is, mutation rate per site across the corresponding coding region.Dog and cow are used as outgroups to identify substitution sites.doi:10.1371/journal.pone.0017334.g002

File
Figure S1 MEF2 Bayesian tree.(PDF) Figure S2 Different v parameters for different parts of the MEF2A-D phylogeny.(PDF)

Table 3 .
Parameter estimates under site pair models for the MADS and MEF2s coding regions.
Note: The v represents for K a /K s that is the average of selection across all sites in the MEF2 coding regions.PSS represents the number of sites under positive selection.**Significance with Pv0:01.doi:10.1371/journal.pone.0017334.t003