Active Sites of Reduced Epidermal Fluorescence1 (REF1) Isoforms Contain Amino Acid Substitutions That Are Different between Monocots and Dicots

Plant aldehyde dehydrogenases (ALDHs) play important roles in cell wall biosynthesis, growth, development, and tolerance to biotic and abiotic stresses. The Reduced Epidermal Fluorescence1 (REF1) protein belongs to the subfamily 2C of ALDHs and was shown to oxidise coniferaldehyde and sinapaldehyde to ferulic acid and sinapic acid, respectively, in the phenylpropanoid pathway. This knowledge has been gained from work in the dicotyledon model species Arabidopsis thaliana and then used to functionally annotate ALDH2C isoforms in other species, based on the orthology principle. However, the extent to which the ALDH isoforms differ between monocotyledons and dicotyledons has rarely been assessed side-by-side. In this study, we used a phylogenetic approach to address this question. We have analysed the ALDH genes in Brachypodium distachyon, alongside those of other sequenced monocotyledon and dicotyledon species, to examine traits supporting either a convergent or divergent evolution of the ALDH2C/REF1-type proteins. We found that B. distachyon, like other grasses, contains more ALDH2C/REF1 isoforms than A. thaliana and other dicotyledon species. Some amino acid residues in ALDH2C/REF1 isoforms were found to be conserved in dicotyledons but substituted by non-equivalent residues in monocotyledons. One example of those substitutions concerns a conserved phenylalanine and a conserved tyrosine in monocotyledons and dicotyledons, respectively. Protein structure modelling suggests that the presence of tyrosine would widen the substrate-binding pocket in the dicotyledons, and thereby influence substrate specificity. We discuss the importance of these findings as new hints to investigate why ferulic acid contents and cell wall digestibility differ between the dicotyledon and monocotyledon species.


Introduction
In plants, the superfamily of the aldehyde dehydrogenases (ALDHs) generally comprises several protein families and sub-families, each with differing roles in plant growth and development or responses to biotic and/or abiotic stresses [1,2]. Maize mitochondrial ALDH2B2 is the nuclear restorer of cytoplasmic male sterility [3,4]; rice mitochondrial ALDH family 2 is thought to be essential for the detoxification of acetaldehyde during re-aeration after submergence [5]; whereas OsALDH7B6 from rice is required for seed maturation and maintenance of seed viability through the detoxification of aldehydes generated by lipid peroxidation [6]. Transcripts of several plant ALDH genes increase in response to environmental stresses such as dehydration, salinity, excessive light or wounding [7][8][9][10][11][12][13][14][15][16]. The aldehyde dehydrogenase ALDH1 isolated from the plant Artemisia annua had more than 60% amino acid sequence identity with the subfamily 2C of ALDHs in rice and maize, and catalyzed the oxidation of dihydroartemisinic aldehyde into dihydroartemisinic acid [17]. ALDHs were thus shown to be involved in the biosynthesis of artemisinin in plants [18,19]. Nair et al. [20] have shown that the reduced epidermal fluorescence1 (ref1) phenotype (characterised by a reduced cell wall strength and an accumulation of less than 30% of the sinapate esters found in the wild type) of Arabidopsis thaliana is caused by a mutation in ALDH2C4 (AT3G24503), and that AtALDH2C4/REF1 is important for oxidizing coniferaldehyde and sinapaldehyde into ferulic acid and sinapic acid, respectively, in the phenylpropanoid pathway. Ferulic acid is a hydroxycinnamic acid which, in commelinid monocots, particularly grasses, is ester- and ether-linked to the cell wall polymers of glucuronoarabinoxylan and to lignin, respectively [21,22], whereas in dicots, it is associated with pectic polysaccharides via ester linkages ([23], and references therein).
Moreover, ferulic acid can oxidatively cross-link to form covalent ether bonds or C-C bonds between chains of polysaccharides and lignin. Polysaccharides thus become less accessible to degradative enzymes. Consistent with this, several biochemical and genetic studies have established that the ferulic acid content is negatively correlated with the cell wall digestibility in forage grasses and crops [23][24][25][26][27][28][29][30]. All of these studies underline the functional diversity of ALDH proteins, mirrored by the number of ALDH genes generally found in plant species, and their implication in the cell wall structure.
Sequenced plant genomes show ALDH families and sub-families of various gene numbers, each protein with enzymatic properties that may be similar, overlapping or different [2]. Based on the orthology principle, ALDH isoforms have often been functionally annotated in plants according to established information for Arabidopsis [2,31]. However, comparative examination of enzymatic properties of ALDH isoforms in monocotyledon plants relative to dicotyledon plants is lacking.
In this study, we have analysed the B. distachyon ALDH genes, alongside those of other sequenced monocotyledon and dicotyledon species, to investigate traits supporting either a convergent or divergent evolution of plant ALDH functions. To examine how multiplicity and sequence diversity of isoforms between monocotyledon and dicotyledon plants would influence ALDH enzymatic activity, we used the ALDH2C subfamily of genes, as their greater specificity towards aromatic aldehyde substrates has been well established compared to the other subfamilies in plants [20,32]. Our results indicate that the B. distachyon genome contains members of the plant-specific ALDH families. Although the ALDH families 2 and 3 are generally represented by more than 3 gene isoforms among plant species, we found a low level of polymorphism between the ALDH2C/REF1-type protein sequences in dicotyledon and monocotyledon plants. One such polymorphism is a conserved phenylalanine residue within the active site of the monocotyledon sequences that, in the dicotyledon sequences, is substituted by a conserved tyrosine residue. Even though this substitution can be viewed as a conservative amino acid substitution, protein structure modelling suggests that the substitution will result in an enlargement of the substrate-binding site, thus altering the substrate specificity of the dicotyledon ALDH2C/REF1 isoforms. Our data, therefore, suggest a difference in substrate specificity of coniferaldehyde/sinapaldehyde dehydrogenases between monocotyledon and dicotyledon plants, which, to some extent, may contribute to the different levels of ferulic acid content and cell wall digestibility between dicotyledon and monocotyledon plants.

Identification and annotation of Brachypodium ALDH proteins
The genome sequence of Brachypodium distachyon line Bd21 deposited in the PHYTOZOME v10.2 database (http://www.phytozome.org) was used. ALDH amino acid sequences from rice (Oryza sativa), maize (Zea mays), and Arabidopsis (Arabidopsis thaliana) [2] were retrieved from the database and used to search for Brachypodium ALDH sequences by BLASTP with default settings [33] (S2 Table). The presence of characteristic sequence domains within ALDH proteins was verified in the retrieved sequences: PF00171 (ALDH domain), PS00070 (ALDH cysteine active site), PS00687 (ALDH glutamic acid active site), KOG2450 (aldehyde dehydrogenase), KOG2451 (aldehyde dehydrogenase), KOG2453 (aldehyde dehydrogenase), and KOG2456 (aldehyde dehydrogenase). After removing redundant sequences, the resulting deduced Brachypodium ALDH protein sequences were annotated using guidelines established by the ALDH Gene Nomenclature Committee (AGNC) [34], wherein proteins with more than 40% amino acid sequence identity were grouped in a family, and sequences with more than 60% identity composed a subfamily. Amino acid sequences with less than 40% identity would describe a new ALDH protein family.
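The AGNC identity thresholds described above can be expressed as a simple decision rule. The following is a minimal sketch, not the procedure actually used in this study: the function name and the example identity values are hypothetical, and the strict "more than" cut-offs are an assumption taken from the wording above.

```python
def agnc_relation(percent_identity: float) -> str:
    """Relate two ALDH proteins by their amino acid identity (in percent).

    Assumed thresholds (from the AGNC guideline as worded in the text):
    more than 60% -> same subfamily, more than 40% -> same family.
    """
    if not 0.0 <= percent_identity <= 100.0:
        raise ValueError("percent identity must lie between 0 and 100")
    if percent_identity > 60.0:
        return "same subfamily"
    if percent_identity > 40.0:
        return "same family"
    return "different family"


# Hypothetical identity values, for illustration only.
for identity in (72.5, 48.0, 31.0):
    print(f"{identity:5.1f}% identity: {agnc_relation(identity)}")
```

In practice the identity value would come from the BLASTP hit between a query ALDH and the candidate Brachypodium sequence.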

Prediction of protein structures
The effects of amino acid substitutions in the ALDH proteins were predicted from the crystal structure of the maize RF2C (ZmALDH2C1) protein (Protein Data Bank ID: 4PXL). All mutational analyses were performed using DeepView/Swiss-PDBViewer v4.1 software (http://www.expasy.org/spdbv/) [44,45]. Several possible conformations were examined for each mutant residue, with conformations producing the least number of clashes with adjacent residues being pre-selected. Energy minimization was performed after selection with GROMOS96 [46] implemented in DeepView/Swiss-PDBViewer. The Ramachandran plot was used to identify the most allowed conformations containing mutant residues.

Results
The ALDH protein superfamily in B. distachyon
Brachypodium distachyon is a monocotyledon species that is closely related to crop plants such as rice (Oryza sativa), wheat (Triticum aestivum), barley (Hordeum vulgare), rye (Secale cereale), and oats (Avena sativa) [47][48][49][50][51][52]. We have identified a total of nineteen ALDH genes in the B. distachyon genome (release version v2.1; PHYTOZOME v10.2). The genes (BdALDHs) can be grouped into ten families following the ALDH Gene Nomenclature Committee (AGNC) criteria (Table 1). Following these criteria, each BdALDH was assigned to a family and a subgroup according to the percentage of identical amino acid residues: a minimum of 40% amino acid identity for the proteins of the same family, and a minimum of 60% amino acid identity for the proteins of the subgroups. The percentage of amino acid identity was obtained from a BLASTP search using the ALDH amino acid sequences from rice (Oryza sativa), maize (Zea mays), and Arabidopsis (Arabidopsis thaliana) as queries to search in the genome sequence of B. distachyon. Each gene was assigned the root symbol 'ALDH' followed by the family designation number (1, 2, 3, etc.), the subfamily identifier (A, B, C, D, etc.), and the individual gene number. The largest families are 2 and 3, composed of four and five genes respectively. Families 10 and 18 contain two genes each, with only one gene in each of families 5, 6, 7, 11, 12, and 22.
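The naming scheme and family sizes given above lend themselves to a quick consistency check. The sketch below is illustrative only: the regular expression, the optional two-letter species prefix (as in BdALDH2C1) and the function name are assumptions, while the family sizes are those stated in the text.

```python
import re

# Assumed pattern: optional two-letter species prefix, root symbol "ALDH",
# family number, subfamily letter, individual gene number.
ALDH_NAME = re.compile(r"^(?:[A-Z][a-z])?ALDH(\d+)([A-Z])(\d+)$")


def parse_aldh_name(name: str) -> tuple[int, str, int]:
    """Split an ALDH gene symbol into (family, subfamily, gene number)."""
    match = ALDH_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised ALDH symbol: {name!r}")
    family, subfamily, gene = match.groups()
    return int(family), subfamily, int(gene)


# Number of B. distachyon genes per family, as stated in the text.
BD_FAMILY_SIZES = {2: 4, 3: 5, 10: 2, 18: 2, 5: 1, 6: 1, 7: 1, 11: 1, 12: 1, 22: 1}

print(parse_aldh_name("BdALDH2C1"))
print(len(BD_FAMILY_SIZES), "families,", sum(BD_FAMILY_SIZES.values()), "genes")
```

The tally reproduces the ten families and nineteen genes reported for B. distachyon.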
Using Arabidopsis ALDH sequences as out-group sequences the relationship of the BdALDHs to orthologs in other grass species was examined (Fig 1). BdALDHs group together with those of rice, sorghum, and maize according to their assigned families and sub-families. In most cases, the BdALDH sequences aligned very closely with rice sequences more than to the

ALDH sequence polymorphisms and functional diversity
Our analysis of BdALDHs indicates that families 2 and 3 have four and five members respectively, and they are more polymorphic than the other BdALDH families with only one or two genes in each family (Fig 1). Furthermore, if the number of members in each ALDH family in different monocotyledon and dicotyledon plants is expressed as the ratio of number of genes in the family to total number of ALDH genes in the species, the ratios show that families 2 and 3 are consistently the largest polymorphic families with the exception of soybean (Glycine max) (S1 Table; Fig 2). In plants, the protein family 2 is composed of the subfamilies 2B and 2C. Based on analyses of Arabidopsis loss-of-function mutants, the proteins of the subfamily 2C (ALDH2C/REF1-type proteins) catalyse the oxidation of coniferaldehyde and sinapaldehyde to ferulic acid and sinapic acid, respectively [20]. Arabidopsis, like most dicotyledons with published ALDH gene families (Eutrema salsugineum, Glycine max, Vitis vinifera, Gossypium raimondii), contains only one or two ALDH2C/REF1-type genes compared to four ALDH2C/REF1-type genes in B. distachyon and other monocotyledons (Table 1; Fig 1). Based on the orthology principle, ALDH isoforms have often been functionally annotated in plants according to established information in Arabidopsis, but the degree of ALDH sequence diversity and of its effect on the specificity of function has rarely been assessed among species. To determine the degree of divergence between ALDH2C genes in dicotyledons and monocotyledons, the protein amino acid sequences were aligned (Fig 3; S3 Table), then nucleotide divergence per pair of ALDH2C sequences was determined within and between selected species by using MEGA6. The nucleotide divergence between isoforms of the same lineage (within-group divergence) was similar within dicotyledons (0.33 ± 0.01) and monocotyledons (0.28 ± 0.01).
In contrast, the nucleotide divergence between the two lineages was significantly higher (0.52 ± 0.02) than within the groups. To verify whether this difference has arisen either by chance or selection, we ran the codon-based Z-test of purifying selection implemented in MEGA6 with a null hypothesis of strict neutrality [41]. The null hypothesis was rejected in favour of the alternative hypothesis of purifying selection within the ALDH2Cs in both lineages (P < 0.0001). However, the null hypothesis could not be rejected in favour of the alternative hypothesis of positive selection. These data indicated that the number of synonymous substitutions per synonymous site (dS) was significantly greater than the number of non-synonymous substitutions per non-synonymous site (dN) within ALDH2Cs in both lineages.
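The comparison of within-group and between-group divergence can be illustrated with a small sketch. It is not the MEGA6 computation used in this study: it assumes an uncorrected p-distance (proportion of differing sites), and the toy aligned sequences and function names are hypothetical.

```python
from itertools import combinations, product


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences (gaps skipped)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def mean_within(group: list[str]) -> float:
    """Mean pairwise distance among sequences of one lineage."""
    distances = [p_distance(a, b) for a, b in combinations(group, 2)]
    return sum(distances) / len(distances)


def mean_between(group_a: list[str], group_b: list[str]) -> float:
    """Mean pairwise distance between sequences of two lineages."""
    distances = [p_distance(a, b) for a, b in product(group_a, group_b)]
    return sum(distances) / len(distances)


# Toy aligned coding sequences (hypothetical, not ALDH2C data).
dicots = ["ATGAAATTTGCA", "ATGAAATTCGCA"]
monocots = ["ATGGCGTTCGCG", "ATGGCGTTCGCC"]

print("within dicots:   ", round(mean_within(dicots), 3))
print("within monocots: ", round(mean_within(monocots), 3))
print("between lineages:", round(mean_between(dicots, monocots), 3))
```

As in the reported data, a between-group mean that exceeds both within-group means points to divergence between the lineages rather than among isoforms of one lineage.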
We examined the selection acting on each codon of the alignment shown in Fig 3. As noted above with the codon-based Z-test, nearly all sites were found to be under purifying selection. Only 14 (3%) out of the 423 codons showed a ratio of dN/dS > 1 compared to 395 (93%) out of 423 codons that had a dN/dS < 1.
Given that the previous two approaches tend to detect hotspots of positive selection more efficiently than sites of mid-level positive or negative selection, we used Tajima's D statistic to gain better insight into the pattern of nucleotide variation. We found D = 2.71 (θ = 0.17, π = 0.29, P < 0.001), which suggests that nucleotide variation within ALDH2Cs might overall be less frequent than expected but a few alleles of nucleotide polymorphisms would be present at high frequency among the species.
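For readers unfamiliar with the statistic, Tajima's D contrasts the mean number of pairwise differences (π) with the estimate of θ derived from the number of segregating sites. The sketch below implements the standard formula; it is not the MEGA6 routine used here, the function name is hypothetical, and the inputs are illustrative whole-alignment counts rather than the per-site values reported above.

```python
from math import sqrt


def tajimas_d(n: int, seg_sites: int, pi: float) -> float:
    """Tajima's D from n sequences, the number of segregating sites and
    the mean number of pairwise differences (pi, not divided by length)."""
    if n < 4 or seg_sites < 1:
        raise ValueError("need at least 4 sequences and 1 segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    theta_w = seg_sites / a1  # Watterson's estimate of theta
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - theta_w) / sqrt(variance)


# Illustrative inputs only: pi below theta gives a negative D,
# pi above theta (as reported in this study) gives a positive D.
print(round(tajimas_d(n=10, seg_sites=16, pi=3.888), 2))
print(tajimas_d(n=10, seg_sites=16, pi=8.0) > 0)
```

A positive D, as found here, arises when π exceeds the θ estimate, that is, when variants at intermediate frequency are over-represented.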

Nucleotide composition, amino acid content, and codon usage are biased among ALDH2C isoforms
Characterisation of the nucleotide and amino acid residue composition between ALDH2C isoforms showed that dicotyledon ALDH2C/REF1 coding sequences are enriched in A and T, specifically, at the first and third positions of codons (Table 2). In contrast, monocotyledon sequences were significantly enriched in C and G at these positions in the codon, especially G in position 3. The dicotyledon sequences have a significantly high proportion of the two amino acid residues Ile and Asn, whereas the monocotyledon sequences have a significantly high proportion of Ala and Val (Table 3). In reviewing the nucleotide composition of the codons for these amino acids, we found that codons with A or T in the third base position were more frequently used for Ile and Asn in dicotyledon ALDH2C sequences than in monocotyledon sequences, whereas C or G in the third base position was the preferred nucleotide for Ala and Val in monocotyledon ALDH2C sequences (Fig 4). These data suggested the presence of some lineage-specific amino acid polymorphisms within ALDH2C/REF1 sequences in the dicotyledon and monocotyledon species.
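The positional nucleotide bias described above can be quantified per codon position. The following sketch is one possible way to do so, under the assumption of an in-frame coding sequence; the function names and the two short toy sequences (Ile, Asn, Ala and Val codons ending in A/T or in C/G) are hypothetical and are not ALDH2C data.

```python
from collections import Counter


def base_frequencies_by_codon_position(cds: str) -> list[dict[str, float]]:
    """Return A/C/G/T frequencies for codon positions 1, 2 and 3."""
    cds = cds.upper()
    if len(cds) == 0 or len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a non-zero multiple of 3")
    frequencies = []
    for position in range(3):
        counts = Counter(cds[position::3])  # every third base from this offset
        total = sum(counts[base] for base in "ACGT")
        if total == 0:
            raise ValueError("no unambiguous bases at this codon position")
        frequencies.append({base: counts[base] / total for base in "ACGT"})
    return frequencies


def gc3(cds: str) -> float:
    """Fraction of G or C at the third codon position."""
    third = base_frequencies_by_codon_position(cds)[2]
    return third["G"] + third["C"]


# Toy sequences encoding Ile-Asn-Ala-Val with different third bases.
at_ending = "ATTAATGCAGTT"  # codons end in A or T
gc_ending = "ATCAACGCGGTG"  # codons end in C or G
print(gc3(at_ending), gc3(gc_ending))
```

Both toy sequences encode the same four residues, which shows how third-position bias can differ without any change in the protein.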

Amino acid substitution analysis
We then compared the amino acids between dicotyledon and monocotyledon ALDH2C sequences and identified 32 sites where residues either strictly differ or overlap between the two plant lineages (Table 4). Overlapping sites were often composed of one amino acid residue common to the two lineages, and between one and five additional variants in one of the lineages, more often in the monocotyledon isoform sequences. Four of the 32 sites (residue numbers 211, 430, 529 and 576 in Fig 3) were occupied by residues that are consistent within the dicotyledon sequences but different between the monocotyledon isoforms. Amino acid residues at positions 211 and 529 were conserved between B. distachyon and Zea mays isoforms but not at site 430 where there is a mixture of conserved and non-conserved amino acid residues. One of the four sites (residue number 576 in Fig 3) was strictly dimorphic with only two amino acid variants, each conserved in one lineage (Table 5). While many of the 32 polymorphic sites involved amino acid variants with similar physicochemical properties (Table 4), sites with dissimilar amino acid residues were also identified between the two lineages. For example, L326 and Q419 in ZmALDH2C1 (positions 430 and 529, respectively in Fig 3) are conserved as V324 and Q417 in BdALDH2C1 but in AtALDH2C4 these residues are K325 and K418, respectively, showing a non-polar to polar basic amino acid variation at position 430 and a polar uncharged to polar basic variation at position 529. Furthermore, the active site residue F466 (position 576 in Fig 3), implicated in substrate binding in maize ALDH2C1 (RF2C: [32]) and conserved in all monocotyledon isoforms examined in this study, is a tyrosine, a polar amino acid, in all of the dicotyledon sequences (Fig 3). Overall, about 10% of the amino acid residues were found to be different between monocotyledon and dicotyledon ALDH2C/REF1 isoforms.
The physicochemical properties of the concerned amino acids were similar or conserved between the two lineages. We examined whether those substitutions would influence the three-dimensional structure of the ALDH2C isoforms.
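The distinction drawn above between overlapping, strictly different and strictly dimorphic sites can be written as a small set comparison. This is a sketch under stated assumptions: the function name and category labels are hypothetical, and the example uses only the residues named in the text for AtALDH2C4 (dicotyledon) and for BdALDH2C1 and ZmALDH2C1 (monocotyledons), not the full alignment.

```python
def classify_site(dicot_residues, monocot_residues) -> str:
    """Classify one alignment column by comparing the residues of two lineages."""
    dicot, monocot = set(dicot_residues), set(monocot_residues)
    if not dicot or not monocot:
        raise ValueError("each lineage needs at least one residue")
    if dicot & monocot:
        return "overlapping"  # at least one residue shared by both lineages
    if len(dicot) == 1 and len(monocot) == 1:
        return "strictly dimorphic"  # one variant per lineage, no sharing
    return "strictly different"  # no sharing, several variants in a lineage


# Residues named in the text: AtALDH2C4 versus BdALDH2C1 and ZmALDH2C1.
sites = {
    430: (["K"], ["V", "L"]),
    576: (["Y"], ["F", "F"]),
}
for position, (dicot, monocot) in sites.items():
    print(position, classify_site(dicot, monocot))
```

With only three isoforms the outcome is indicative; a site is strictly dimorphic in the sense used here only if the pattern holds across all aligned isoforms of each lineage.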

Possible effects of amino acid variations on the catalytic properties of maize RF2C (ZmALDH2C1) protein
Using the published crystal structure of the maize ZmALDH2C1 (RF2C) protein [32] we predicted the effect on enzymatic activity of exchanging the conserved monocotyledon amino acid with its alternative dicotyledon (Arabidopsis) amino acid. The residue Q419 (position 529 in Fig 3) in ZmALDH2C1 is located on the external surface of the protein and therefore the exchange Q419K was predicted to have no or minor impact on enzyme topology. However, residues T108 (position 211 in Fig 3) and L326 (position 430 in Fig 3) are both located inside of the protein and consequently the exchanges T108A and L326K are predicted to disrupt some existing H-bonds as well as participate in the formation of new H-bonds between nearby or distant residues. Similar predictions were made for the F466Y exchange within the substrate-binding site (Fig 5). The substrate-binding pocket is composed primarily of aromatic and nonpolar residues [32] and during modelling several possible configurations with the exchange residue Y466 were analysed. Most of the predicted configurations induce clashes with adjacent residues and were therefore not retained. One of the most likely permitted conformations with Y466 implies the displacement of the polar side-chain toward the surface of the protein, and away from the hydrophobic microenvironment of the substrate-binding site, resulting in a widening of the substrate-binding channel (Fig 5). We suggest that this monocotyledon to dicotyledon exchange of amino acid residues could explain some of the substrate specificity observed between the ALDH2C family isoforms.

Discussion
ALDH enzymes are found in almost all organisms and they are expressed in diverse organs and tissues where they play diverse roles [1,2]. Besides the effect of speciation, they appear to evolve through gene duplication as shown in diverse species [31,36,37,39]. Although the driving forces of the duplication events are unclear, it is remarkable that the ALDH genes often do not have the same expression patterns or the same number of isoforms per family. We found that B. distachyon, like other species, contains more isoforms within the ALDH2 and ALDH3 families than in the other families. This suggests that ALDH genes might have evolved to fulfill different functions. In this scenario, distinctive features within the sequences of evolved ALDH genes would support their functional specialisation. Alternatively, the duplicated genes might have randomly evolved and therefore would not impact on the primary function or biochemical properties of the enzymes. To understand the evolutionary pattern of the duplicated ALDH genes within a subfamily, we used the ALDH2C amino acid sequences to test whether independent ALDH isoforms of genetically distant species, i.e. monocotyledons and dicotyledons, contain similar structural changes, and if so, whether those structural changes are likely to alter the enzyme properties such as substrate specificity. Because proteins are grouped in a subfamily based on at least 60% sequence conservation (AGNC recommendations), such a comparison might be biased. We were looking for sequence features within the remaining 40% of sequences, which might differ between genetically distant isoforms and perhaps between isoforms of a given single species. Our focus on the ALDH2C subfamily was guided by their high specificity toward coniferaldehyde and sinapaldehyde, as shown by Nair et al. [20], and by their involvement in the biosynthesis of ferulic acid. Those features are unique to this subfamily, and they will be valuable for future physiological and biochemical studies. We found that the analysed ALDH2C sequences were most likely subjected to negative selection. This observation is significant when considering that monocotyledons and dicotyledons might have diverged 340 million years ago [53]; it suggests that recently evolved ALDHs have retained the ancestral enzymatic property, which is to oxidise aldehyde molecules to their corresponding carboxylic acids. In agreement with this, previous studies showed that the ALDH2C/REF1 proteins oxidise coniferaldehyde and sinapaldehyde to ferulic acid and sinapic acid, respectively, whereas ALDH2B proteins preferably oxidise acetaldehyde to acetic acid and would be involved in pollen fertility and aerobic fermentation [3,4,5,20,32,54].

Table 5.
Position in alignment (1)   Residue and position within each isoform (2)
211   A106   T106   G110   S115   T108   G117   S119
430   K325   V324   A328   S333   L326   A335   R337
529   K418   Q417   E421   A426   Q419   E428   A433
576   Y465   F464   F468   F473   F466   F475   F480
(1) The positions were identified from the parsimony-informative sites in the alignment shown in Fig 3.
(2) Positions refer to the numbering of amino acid residues within each isoform. The strictly dimorphic case of amino acid polymorphism is position 576.
doi:10.1371/journal.pone.0165867.t005

However, besides the preferred substrates, in vitro enzymatic tests showed that ALDH2C and ALDH2B, as well as other ALDH proteins, also oxidise a range of aldehydes with comparable efficiency [4,6,12,32,55]. This leads to the question of how ALDH functional specificity is achieved in planta.
Our calculation of Tajima's D statistic indicated that nucleotide variations within ALDH2Cs were overall low but a few alleles of nucleotide substitutions would be present among the species. This led us to examine whether the substitutions are likely to alter the enzyme substrate specificity. Notably, we found one site within the amino acid sequence alignment that indicates an exchange of a phenylalanine residue in the monocotyledon ALDH2C sequences with a tyrosine in the dicotyledon sequences. We do not know why the two alleles of that substitution were separately maintained in each lineage (represented by the species analysed in this study), nor its biological significance. Our predictions, based on the crystal structure of the maize ZmALDH2C1 protein (RF2C, a homologous protein of ALDH2C4/REF1 in Arabidopsis) [32], suggest that the substitution would widen the substrate-binding pocket of the ALDH2C isoforms in dicotyledons. A similar observation was reported on the comparison of the maize RF2C and RF2F (ZmALDH2C5, homologous to ALDH2C4/REF1 in Arabidopsis) proteins [32]. The authors found that the substrate-binding site of RF2F is much wider because of the presence of V192 and M477 instead of the two aromatic residues F178 and F460 (positions 282 and 570 in the alignment, respectively; Fig 3) in RF2C. They further examined the impacts of these substitutions on the enzymatic activity and found that the cavity width of RF2F correlates with high Km values for various substrates, most probably due to weaker nonpolar interactions. In contrast, two other isoforms, RF2D and RF2E, which do not differ in active site residues, were found to have similar kinetic properties. These findings combined with our current results support the idea that substrate preference and hence specificity among highly conserved ALDH isoforms is defined by a few substitutions within the substrate-binding site of the enzyme.
Consistently, examination of duplicated ALDH2 genes in Drosophila melanogaster showed that the diameter of the substrate entry channel is restricted by naturally occurring substitutions, which shift substrate specificity among duplicated genes [56,57]. It was demonstrated that eukaryote ALDH1/2s often switched between large and small substrate entry channels after gene duplication, transforming restricted channels into wide open ones and vice versa [58]. We are not aware of any report on a side-by-side comparison of the affinity and catalytic activity of the monocotyledon and dicotyledon ALDH2C-type enzymes toward their preferred substrates coniferaldehyde and sinapaldehyde. But based on that experimental evidence, one may speculate that the F466Y exchange can potentially alter the specificity of the ALDH2C-type enzymes toward these two substrates because a widened substrate channel is likely to alter the substrate specificity and the activity of the enzyme. Whether that substitution alone can explain why cell walls of monocotyledon species often contain more ferulic acid than the walls of dicotyledon species, however, remains to be examined [21,22]. Indeed, more than one metabolic route was found to contribute to the ferulic acid content in the plant cell wall. According to de Oliveira et al. [30], the current knowledge suggests that ferulic acid is synthesized from the mainstream phenylpropanoid pathway. In this pathway, L-phenylalanine is deaminated by phenylalanine ammonia-lyase to produce t-cinnamic acid. This step is followed by hydroxylation of the aromatic ring, catalysed by cinnamate 4-hydroxylase, to give p-coumaric acid. In the next step, the carboxylic group of p-coumaric acid is activated to a thioester via 4-coumarate:CoA ligase to yield p-coumaroyl-CoA. This compound is transesterified to shikimate or quinate by the action of p-hydroxycinnamoyl CoA:quinate/shikimate p-hydroxycinnamoyl-transferase (HCT).
The ester is further hydroxylated at C3 to produce caffeoyl shikimate/quinate ester by p-coumaroyl shikimate/quinate 3-hydroxylase. Caffeoyl shikimate/quinate is transesterified back with CoA by HCT and O-methylated at the hydroxyl group on C3 by caffeoyl-CoA O-methyl-transferase (CCoAOMT) to produce feruloyl-CoA, the activated form of ferulic acid [59][60][61]. Feruloyl-CoA is considered the major substrate of the enzymes that transfer the ferulic acid moiety into the cell wall by esterification to the cell wall polysaccharides. In a second pathway, feruloyl-CoA is reduced to coniferaldehyde in a reaction catalysed by cinnamoyl-CoA reductase (CCR). Nair et al. [20] showed that coniferaldehyde is oxidised to ferulic acid by ALDH2C4/REF1 in Arabidopsis. In order to be esterified to the cell wall polysaccharides, the free form of ferulic acid must first be activated to its active form feruloyl-CoA [62,63]. The enzyme 4-coumarate:CoA ligase has been demonstrated to be responsible for catalysing the esterification of exogenous free ferulic acid to feruloyl-CoA in vivo [22,64]. In a third possible biosynthetic route, a caffeoyl shikimate esterase [65,66], upstream of feruloyl-CoA, catalyses the conversion of caffeoyl shikimic/quinic acid into caffeic acid that is then O-methylated at the hydroxyl group on C3 by caffeic acid O-methyl-transferase (COMT) to produce ferulic acid. The free ferulic acid may serve as a precursor for the biosynthesis of feruloyl hexose and feruloyl sinapate [67]. Of the three routes, only the second route described above involves the ALDH2C dehydrogenase activity. Recently, an Arabidopsis mutant defective in the ccr1 gene coding for cinnamoyl-CoA reductase was shown to accumulate significantly higher amounts of ferulic acid compared to the wild type.
In contrast, the ferulic acid level was dramatically reduced in a double mutant defective in caffeic acid O-methyltransferase and caffeoyl-CoA 3-O-methyltransferase (comt ccoaomt) compared to the wild type [68]. These observations suggest that studying the contribution of each of the three routes to the total ferulic acid pool, together with the implications of the active site amino acid substitution described in this study, may greatly help develop crops with reduced ferulic acid contents. A way may be found to engineer cell walls with high digestibility based on low ferulic acid content [30]. For now, our current data support the idea that ALDH gene duplications did not evolve by pure chance. Their amino acid sequences, albeit showing more than 60% conservation within subfamilies, would include key substitutions that likely confer functional specificity. Biologists often rely on the principle of gene orthology for the transfer of functional information from experimentally characterized genes in model organisms to uncharacterized genes in newly sequenced genomes [69,70]. Our data now call for caution in this approach.
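The three biosynthetic routes summarised above can be laid out as ordered lists of (substrate, enzyme, product) steps, which makes it easy to see where ALDH2C activity enters. This is an illustrative sketch only: the route labels, the helper names, and the intermediate names "p-coumaroyl shikimate/quinate" and "caffeoyl-CoA" (implied but not named in the text) are assumptions.

```python
# Each route is an ordered list of (substrate, enzyme, product) steps.
ROUTES = {
    "route 1": [
        ("L-phenylalanine", "phenylalanine ammonia-lyase", "t-cinnamic acid"),
        ("t-cinnamic acid", "cinnamate 4-hydroxylase", "p-coumaric acid"),
        ("p-coumaric acid", "4-coumarate:CoA ligase", "p-coumaroyl-CoA"),
        ("p-coumaroyl-CoA", "HCT", "p-coumaroyl shikimate/quinate"),
        ("p-coumaroyl shikimate/quinate",
         "p-coumaroyl shikimate/quinate 3-hydroxylase",
         "caffeoyl shikimate/quinate"),
        ("caffeoyl shikimate/quinate", "HCT", "caffeoyl-CoA"),
        ("caffeoyl-CoA", "CCoAOMT", "feruloyl-CoA"),
    ],
    "route 2": [
        ("feruloyl-CoA", "CCR", "coniferaldehyde"),
        ("coniferaldehyde", "ALDH2C4/REF1", "ferulic acid"),
        ("ferulic acid", "4-coumarate:CoA ligase", "feruloyl-CoA"),
    ],
    "route 3": [
        ("caffeoyl shikimate/quinate", "caffeoyl shikimate esterase", "caffeic acid"),
        ("caffeic acid", "COMT", "ferulic acid"),
    ],
}


def routes_using(enzyme_fragment: str) -> list[str]:
    """Names of the routes with at least one step whose enzyme contains the fragment."""
    return [name for name, steps in ROUTES.items()
            if any(enzyme_fragment in enzyme for _, enzyme, _ in steps)]


def is_connected(steps) -> bool:
    """True if each step's product is the next step's substrate."""
    return all(steps[i][2] == steps[i + 1][0] for i in range(len(steps) - 1))


print(routes_using("ALDH2C"))
print(all(is_connected(steps) for steps in ROUTES.values()))
```

Querying for ALDH2C returns only the second route, in line with the statement that it is the only one of the three that depends on ALDH2C dehydrogenase activity.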
Supporting Information S1