The Pectin Lyases in Arabidopsis thaliana: Evolution, Selection and Expression Profiles

Pectin lyases are a group of enzymes that are thought to contribute to many biological processes, such as the degradation of pectin. However, until this study, no comprehensive study incorporating phylogeny, chromosomal location, gene duplication, gene organization, functional divergence, adaptive evolution, expression profiling and functional networks has been reported for Arabidopsis. Sixty-seven pectin lyase genes have been identified, and most of them possess signal sequences targeting the secretory pathway. Phylogenetic analyses identified five gene groups with considerable conservation among groups. Pectin lyase genes were non-randomly distributed across chromosomes and clustering was evident. Functional divergence and adaptive evolution analyses suggested that purifying selection was the main force driving pectin lyase evolution, although some critical sites responsible for functional divergence might be the consequence of positive selection. A stigma- and receptacle-specific expression promoter was identified, and it had increased expression in response to wounding. Two hundred and eighty-eight interactions were identified by functional network analyses, and most of these were involved in cellular metabolism, cellular transport and localization, and stimulus responses. This investigation contributes to an improved understanding of the complexity of the Arabidopsis pectin lyase gene family.


Introduction
Pectins are major primary cell wall components of land plants that are important for maintaining cellular structural integrity [1,2]. Pectins are a family of complex polysaccharides with 1,4-linked α-D-galactosyluronic acid residues [1], and they can be degraded by pectinases. Pectinases are classified by substrate specificity or mode of action into various classes, such as pectin esterase (EC 3.1.1.11), polygalacturonase (EC 3.2.1.15), pectate lyase (EC 4.2.2.2) and pectin lyase (EC 4.2.2.10). Pectin esterases remove methoxyl and acetyl residues of polygalacturonic acids. Polygalacturonases degrade polygalacturonan by hydrolysis of the glycosidic bonds that link galacturonic acid residues. Pectate lyases are responsible for the eliminative cleavage of pectate, thereby yielding oligosaccharides with 4-deoxy-α-D-mann-4-enuronosyl groups at their non-reducing ends. Pectin lyases are the only known pectinases capable of degrading pectin polymers directly via a β-elimination mechanism that results in the formation of 4,5-unsaturated oligogalacturonides without methanol production [3]. This is important because methanol's toxicity and unpleasant, volatile off-flavors are a concern for the paper, food and textile industries [4,5]. Studies on the production, biochemical characterization, and applications of pectin lyases were recently reviewed [6].
Structurally, pectin lyases include an asparagine ladder and amino acid stacks and can fold into a parallel β-sheet similar to that of pectate lyases, despite only about 17% sequence identity between them [3,7]. In addition, the substrate-binding clefts of these two pectinases are both dominated by aromatic residues and are enveloped by negative electrostatic potential [3,7]. The major difference between them is in the conformation of the loop formed by residues 182-187, which might be due to the different pH values of crystallization [3,7].
Pectinases have multiple biological functions. Pectate lyases can act as extracellular virulence agents [8,9], and their role in the release of cell wall oligogalacturonides is important for activation of plant defense mechanisms [10]. Furthermore, pectate lyases may be important for fruit ripening and softening [11], as well as plant growth and development [9]. Similarly, polygalacturonases contribute to pectin disassembly during many stages of plant development, such as those that require cell separation [12]. Polygalacturonase genes also have roles in pollen maturation and pollen tube growth, as well as intine and exine formation [13,14,15]. In addition, like pectate lyase and polygalacturonase, pectin lyase can enhance the reconstituted expansin-induced extension of the apical (elongating) segments of cucumber hypocotyls [16].
Most pectin lyases are produced by microorganisms (such as Aspergillus, Penicillium and Fusarium). In these microorganisms, expression of pectin lyase genes is generally induced by medium pH, carbon sources and pectin, and is generally repressed by glucose [17,18]. Limited information is available on the evolution and expression of pectin lyase genes in plants. Because phylogenetic analyses can be the basis for molecular and biochemical analyses of protein families, I performed a genome-wide study of the pectin lyase gene family in Arabidopsis. Analyses of sequence phylogeny, gene organization, functional divergence, adaptive evolution, expression profiling, and functional networks were performed to provide insights into the evolutionary mechanisms of the Arabidopsis pectin lyase protein family.

Identification and Characterization of the Pectin Lyase Gene Family in Arabidopsis
To identify members of the pectin lyase gene family in Arabidopsis, I first searched relevant databases using the corresponding Arabidopsis protein sequence (AT1G17150) as the query. Additional searches were also performed based on keyword querying. All protein sequences related to this family with an expect value ≤1e-05 were retrieved from TAIR and NCBI. Other, more divergent proteins (such as QRT3, AUX1 and PHO2) with an expect value >1e-05 were not included in this analysis. The Arabidopsis sequences returned from such searches were confirmed as encoding pectin lyases using the CDD (Conserved Domain Database) [19,20] and Pfam (http://pfam.sanger.ac.uk/) databases. Except for one gene (AT2G33160), all pectin lyases contained only the polygalacturonase domain. AT2G33160 contained not only the common polygalacturonase domain, but also an RNase H domain. Acquisition of this domain likely reflects a gain in function. As a result, 67 pectin lyase genes were identified in Arabidopsis (Table 1). The pectin lyase genes in Arabidopsis encode polypeptides ranging from 332 to 664 amino acids in length, with predicted pIs ranging from 4.78 to 9.84. Further analyses using the protein subcellular localization prediction software TargetP [21] and PredoTar (http://urgi.versailles.inra.fr/predotar/predotar.html) predicted the probable localization of each candidate pectin lyase in Arabidopsis. Over 85% of Arabidopsis pectin lyase proteins were found to possess signal sequences for targeting the secretory pathway. Five pectin lyases (AT3G06770, AT4G32375, AT4G32380, AT5G27530 and AT5G44830) did not contain any known protein targeting motif. Another five members (AT1G02460, AT1G10640, AT1G19170, AT1G48100 and AT1G60590) were predicted to be targeted to mitochondria (Table 1).
Proteins that enter the endoplasmic reticulum (ER) usually undergo several processing steps there, such as protein folding, glycosylation, and disulfide bond formation and rearrangement, and are finally transported to their destinations once the N-terminal signal peptides are removed. To identify the potential signal peptides in pectin lyase precursors, the SignalP 4.0 server [22] was used. The results indicated that 50 pectin lyases possess a signal peptide. About 76% of the signal peptides in Arabidopsis pectin lyases are 20 to 24 amino acid residues long (Table 1). AT5G14650 has the longest signal peptide (about 29 amino acid residues), whereas the signal peptide of AT3G27790 has only 17 residues in its N-terminal extension (Table 1).

Phylogenetic Analyses of the Pectin Lyase Genes in Arabidopsis
To examine the phylogenetic relationships among Arabidopsis pectin lyase genes, I performed phylogenetic analyses of the pectin lyase protein sequences using the maximum likelihood (ML) and neighbor joining (NJ) methods in MEGA 5 [23]. The tree topology obtained with the ML method was substantially similar to the NJ tree (Figure S1). Based on their phylogenetic relationships, I divided the Arabidopsis pectin lyase family into five groups, designated Group I to Group V (Fig. 1). The relationships of AT1G80140 with other pectin lyase genes, however, could not be confidently determined in my analyses: AT1G80140 is basal to a large clade consisting of Groups I and II with weak support in the ML analyses, but it forms a clade with Group IV in the NJ analyses. Therefore, AT1G80140 is not classified into any group in this study. In addition, I also left AT3G15720, AT5G17200, and AT5G39910 unassigned because of low bootstrap support (Fig. 1). Most of the designated groups were supported by bootstrap values. Moreover, other lines of evidence, such as gene organization (described below), also supported the group classifications established by these analyses. Group I contained 20 members and constituted the largest clade in the pectin lyase phylogeny. Evolutionary relationships between different groups of pectin lyase proteins could not be inferred. By contrast, within each group, strong conservation of amino acid sequence and gene organization was evident, suggesting close evolutionary relationships among subfamily members [24,25,26]. Phylogenetic analyses also showed that several pairs of pectin lyase proteins were putative paralogs (Fig. 1). These paralogous pectin lyase members were closely related and had similar structures (described below), indicating that they evolved from relatively recent gene duplications [27,28]. Evolutionary dates of the segmental duplication events were estimated using Ks as the proxy for time (Table 2).
Six of the twenty pairs (AT2G15450/AT2G15470, AT2G40310/AT4G13760, AT1G43090/AT1G43100, AT3G07850/AT3G14040, AT1G05650/AT1G05660 and AT5G44830/AT5G44840) had low Ks values (from 0.00244 to 0.26995), indicative of duplication events that occurred 0.08 to 8.99 million years ago. Duplication events of five pairs (AT2G43870/AT3G59850, AT2G41850/AT3G57510, AT1G23460/AT1G70500, AT4G32375/AT4G32380 and AT3G06770/AT5G49215) occurred 21.28 to 27.17 million years ago, near the time of large-scale duplication events (polyploidy or aneuploidy) [29], implying that these pairs might have arisen from these large-scale duplication events. The earliest observed segmental duplication event in the Arabidopsis pectin lyase family (AT3G61490/AT4G23500) occurred around 63.67 million years ago (Table 2).

Exon-intron Organization, Motif Distribution, and Conserved Tertiary Structure for Pectin Lyase Members of Arabidopsis
Exon-intron organization has been used as supporting evidence for evolutionary relationships among genes or organisms [30]. To investigate the mechanisms of structural evolution of pectin lyase paralogs in Arabidopsis, I compared the exon-intron structures of individual Arabidopsis pectin lyase genes. Fig. 1 provides a detailed illustration of the distribution and position of introns within each of the pectin lyase paralogs. Fig. 1 shows that the positions and phases of some introns are conserved in paralogous genes, whereas others are group-specific. Introns are important components of eukaryotic genes because their loss or gain generates structural diversity and complexity. This is a basis for the evolution of multigene families. The potential functions of introns include serving as a source of regulatory elements, participating in exon shuffling and alternative splicing, and providing signals for exporting mRNA from the nucleus [31,32]. It has also been suggested that introns can increase intragenic recombination and moderate the evolutionary rate of genes [33]. In addition, growing evidence has shown a functional link between introns and gene expression [34]. Therefore, the different expression profiles (described below) might be related to the different gene organization among these groups. This result was consistent with our previous research [35,36], showing that most pectin lyases in the same group have similar coding sequences and similar exon-intron structures, which strongly supports their close evolutionary relationship. The potential functions of the different exon-intron structures need further examination. As stated above, the major domains of pectin lyase proteins in Arabidopsis were identified using CDD and Pfam.
Results showed that all pectin lyase proteins possessed singular characteristics and a structurally conserved polygalacturonase domain that is essential for their pectinase activity. While these tools are suitable for defining the presence or absence of recognizable domains, they are unable to recognize smaller individual motifs and more divergent patterns. Thus, I used the program MEME (http://meme.sdsc.edu) [37] to study the diversification of pectin lyase genes in Arabidopsis. Ten distinct motifs were identified in these genes (Figure S2; Table S1). As indicated above, phylogenetic analyses broadly divided the pectin lyase genes into five groups. Notably, most of the closely related members in each of these groups have common motif compositions, suggesting functional similarities among the pectin lyase proteins within the same group (Figure S2). Most members of Groups I, II, III and IV possess 9 motifs, while most members of Group V have 7 motifs. Six of the motifs (motif 1, motif 2, motif 4, motif 5, motif 6 and motif 8) are shared by all pectin lyase proteins. Whether the motifs that are specific to Groups I, II, III and IV (motif 3, motif 7 and motif 9) or to Group V (motif 10) confer unique functional roles on these pectin lyases remains to be investigated. As an example, multiple alignments clearly showed highly structured motif 8-6-1 clusters of approximately 120 amino acids located in the middle section of Group I proteins. This was one of the most conserved sections of the Arabidopsis pectin lyases (Figure S3). Conserved motifs in pectin lyase proteins of the same group may provide additional support for the results of the phylogenetic analyses. On the other hand, the divergence in motif composition among different groups may indicate that they are functionally diversified.
I also estimated the tertiary structures of different pectin lyase groups. The results indicated that all Arabidopsis pectin lyases adopt very similar tertiary structures (Figure S2). In addition, I examined the relationships between sequence diversity, motif composition and protein tertiary structure. Although sequence diversity and motif composition vary among members (as stated above; for example, pectin lyases in Group V possess different motif compositions from members of the other groups), this variation tends to be constrained at the level of protein tertiary structure. If one motif in part of a structure undergoes sequence change, sequence change may also occur in other motif(s) so as to avoid disruption of the protein structure. It has been suggested that correlated sequence or motif evolution could be a means of maintaining optimal structural and functional integrity [38]. Therefore, I suggest that compensatory changes or co-evolutionary mechanisms also exist in Arabidopsis pectin lyase genes.

Arabidopsis Pectin Lyase Gene Chromosomal Location and Duplication Events
Chromosomal macro- and micro-scale duplication and rearrangement are thought to be the major modes of genome evolution [29,39,40]. To investigate the relationship between pectin lyase genes and potential gene duplication within the genome, I compared the locations of pectin lyase genes with the duplicated chromosomal blocks that were previously identified in Arabidopsis [41]. Pectin lyase genes are unevenly distributed among the five Arabidopsis chromosomes, relative to the corresponding duplicated genomic blocks (Fig. 2). Within identified duplication events, only two pairs (AT2G41850/AT3G57510 and AT4G23820/AT5G41870) are retained as duplicates, whereas all others lack corresponding duplicates. This suggests that dynamic changes following segmental duplication caused the loss of many genes. Therefore, segmental duplication is not the major factor that contributed to expansion of the pectin lyase gene family in Arabidopsis. Interestingly, I found that 21 of the 67 pectin lyase members are located in tandem clusters on the chromosomes. The two largest pectin lyase gene clusters are located on chromosomes II and III, and each contains four tandemly arrayed members, i.e. AT2G43860, AT2G43870, AT2G43880 and AT2G43890 on chromosome II and AT3G07820, AT3G07830, AT3G07840 and AT3G07850 on chromosome III (Fig. 2). Most of these genes form a single phylogenetic clade, suggesting that they may be the result of recent tandem duplications [42,43]. However, I also found that these clades contained genes from different locations, implying that they may be the consequence of ancient duplication, rearrangement, or retroposition events.
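As an illustration of how tandem clusters such as those named above could be flagged from locus identifiers alone, the following Python sketch groups AGI codes that lie on the same chromosome and have closely spaced locus numbers. It is not the procedure used in this study: the function names, the `max_gap` threshold of 10 and the rule itself are assumptions made for illustration, and a real analysis would normally rely on genomic coordinates or the number of intervening genes.

```python
import re
from typing import List, Tuple

# AGI locus codes look like AT2G43860: chromosome digit, "G", five-digit index.
AGI_PATTERN = re.compile(r"AT([1-5])G(\d{5})")


def parse_agi(locus: str) -> Tuple[int, int]:
    """Split an AGI code into (chromosome, locus index)."""
    match = AGI_PATTERN.fullmatch(locus.upper())
    if match is None:
        raise ValueError(f"not a nuclear AGI locus code: {locus!r}")
    return int(match.group(1)), int(match.group(2))


def _adjacent(previous: str, current: str, max_gap: int) -> bool:
    """True if both loci share a chromosome and their indices are close."""
    prev_chrom, prev_index = parse_agi(previous)
    chrom, index = parse_agi(current)
    return chrom == prev_chrom and index - prev_index <= max_gap


def tandem_clusters(loci: List[str], max_gap: int = 10,
                    min_size: int = 2) -> List[List[str]]:
    """Group loci into runs of neighbours (hypothetical adjacency rule)."""
    ordered = sorted(set(loci), key=parse_agi)
    clusters: List[List[str]] = []
    current: List[str] = []
    for locus in ordered:
        if current and _adjacent(current[-1], locus, max_gap):
            current.append(locus)
        else:
            if len(current) >= min_size:
                clusters.append(current)
            current = [locus]
    if len(current) >= min_size:
        clusters.append(current)
    return clusters


genes = [
    "AT2G43860", "AT2G43870", "AT2G43880", "AT2G43890",
    "AT3G07820", "AT3G07830", "AT3G07840", "AT3G07850",
    "AT1G17150",  # isolated locus, should not be reported as a cluster
]
for cluster in tandem_clusters(genes):
    print(cluster)
```

With the two four-gene clusters from the text plus one isolated gene as input, the sketch reports the chromosome II and chromosome III runs and drops the singleton. Loosening `max_gap` would also merge genes separated by one or more unrelated loci, which is a judgment call rather than a fixed definition.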

Functional Divergence Analysis
Could amino acid substitutions have caused adaptive functional diversification? To answer this question, Type-I functional divergence between gene clusters of the Arabidopsis pectin lyase family was estimated by posterior analysis using the program DIVERGE [44,45], which evaluates shifts in rates of evolution and altered amino acid properties. Ten pairwise comparisons between groups of paralogous proteins were carried out, and the rate of amino acid evolution at each sequence position was estimated. My results indicated that all coefficients of functional divergence (θ) between these groups are less than 1 (Table 3). This indicates significant site-specific selective constraints on most members of the Arabidopsis pectin lyase family, which led to group-specific functional evolution after diversification [46,47]. Moreover, critical amino acid residues responsible for functional divergence were predicted based on site-specific profiles, in combination with suitable cut-off values derived from the posterior probability of each comparison. The results showed distinct differences in the number and distribution of predicted sites for functional divergence within each pair. For example, when the cut-off value is 0.5, no critical amino acid sites are predicted for sequences in Group pairs I/II, I/III, I/IV, II/III, II/IV and III/IV, while approximately 7, 8, 19 and 20 critical amino acid sites are predicted for Group pairs I/V, IV/V, II/V and III/V, respectively. In Table 3, I also found that a higher θ value existed for Group pair III/V (0.7112), indicating a higher evolutionary rate or site-specific selective relaxation between them. Therefore, because of the different evolutionary rates predicted at some amino acid sites, the pectin lyase genes may be significantly divergent from each other in their functions.
During long periods of evolution, different evolutionary rates at specific amino acid sites can spur pectin lyase family genes to evolve new functions after divergence. Thus, functional divergence analysis may reflect the existence of long-term selective pressures.

Site-specific Selective Pressures Analysis
The Ka/Ks ratio measures selection pressure on amino acid substitutions. A Ka/Ks ratio greater than 1 suggests positive selection and a ratio less than 1 suggests purifying selection [48,49]. Amino acids in a protein sequence are expected to be under different selective pressures and to have different underlying Ka/Ks ratios. Detection of selective pressure can indicate selective advantages for altered amino acid sequences at particular residues. These selective advantages are essential for understanding functional residues and functional protein shifts [50]. Ratios of nonsynonymous (Ka) to synonymous (Ks) substitutions were used to test for positive and negative selection at specific amino acid sites within full-length pectin lyase protein sequences among different groups. These ratios were calculated using Datamonkey, an adaptive evolution server [51,52]. The results showed that the Ka/Ks ratios of the sequences from the different pectin lyase groups were significantly different (Table 4). However, despite the differences in Ka/Ks values, all the estimated Ka/Ks values were substantially lower than 1, suggesting that the pectin lyase sequences within each of the groups are under strong purifying selection pressure and that positive selection may have acted only on a few sites during the evolutionary process. I performed this analysis using the SLAC, FEL and REL methods. The SLAC method detected no positively selected codon sites within the five groups. The REL analyses also found no positively selected sites in Groups III and V, but detected one, two and three positively selected sites in Groups I, II, and IV, respectively. In contrast, FEL analyses identified multiple sites where positive selection occurred (Table 4). I also used PARRIS to test for signatures of selection, but did not identify strong evidence (P<0.001) of positive selection in the alignment of pectin lyase coding sequences (Table 5).
Therefore, a few sites might have undergone positive selection during evolution, and this might have accelerated functional divergence and the formation of multiple subgroups [53].
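The interpretation rule stated at the start of this section can be written compactly. The following Python sketch is only an illustration of that rule, not the Datamonkey procedure: the function name `classify_selection` and the example rates are hypothetical, and treating a ratio of exactly 1 as neutral is the usual convention rather than something stated above. The site-wise methods used here (SLAC, FEL, REL) additionally apply statistical tests that this sketch does not attempt.

```python
from typing import Tuple


def classify_selection(ka: float, ks: float) -> Tuple[float, str]:
    """Return the Ka/Ks ratio and the selection regime it suggests.

    ka: nonsynonymous substitution rate
    ks: synonymous substitution rate (must be positive)
    """
    if ka < 0:
        raise ValueError("Ka cannot be negative")
    if ks <= 0:
        raise ValueError("Ks must be positive to form a ratio")
    ratio = ka / ks
    if ratio > 1:
        regime = "positive"
    elif ratio < 1:
        regime = "purifying"
    else:
        regime = "neutral"  # conventional reading of Ka/Ks = 1
    return ratio, regime


# Hypothetical rates, chosen only to exercise both branches.
for ka, ks in [(0.05, 0.25), (0.30, 0.20)]:
    ratio, regime = classify_selection(ka, ks)
    print(f"Ka={ka}, Ks={ks}: Ka/Ks={ratio:.2f} suggests {regime} selection")
```

A single gene-wide ratio well below 1, as reported for every group here, is compatible with a handful of positively selected sites, because the average is dominated by the many constrained sites.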

Differential Expression Profiles of Arabidopsis Pectin Lyase Genes
Expression profiling is a useful tool for understanding gene function. A comprehensive expression analysis was performed using publicly available microarray data from Genevestigator [54,55] and AtTOME [56]. These analyses indicated that divergent expression profiles are present among pectin lyase members across the nine tissues and developmental stages assessed (Fig. 3). In this study, most genes in Group I were expressed at the lowest levels, genes in Groups II, III, and IV at intermediate levels, and genes in Group V at the highest levels. However, some pectin lyases did not follow this trend. For example, AT3G07850, AT3G14040 and AT1G07290 in Group I displayed especially high expression levels in the flower and silique developmental stages. Conversely, AT2G23900 in Group V displayed low expression levels in the stages from germination to bolting. I also examined the expression patterns of Arabidopsis pectin lyases under different stress conditions. Interestingly, several genes such as AT1G05650, AT5G44830 and AT3G06770 showed low expression levels when treated with salt, whereas a subset of genes, including AT3G07850, AT1G02790, AT3G07820 and AT1G10640, displayed high expression levels under hypoxic stress (Fig. 3). Similarly, several genes, such as AT5G48140, AT5G14650 and AT3G62110, showed depressed expression under osmotic stress. As shown in Fig. 3, the pectin lyase genes were expressed at different levels across the nine growth phases and five abiotic stressors studied, suggesting divergent functions of the pectin lyase members in development and stress responses. Duplicated genes may have different evolutionary fates, as indicated by divergence in their expression patterns [57]. I also investigated the expression profiles of duplicated pectin lyase gene pairs (identified above) in Arabidopsis, and found that most gene pairs did not share similar expression patterns (Fig. 3).
This indicates that substantial neofunctionalization may have occurred during the evolution of duplicated genes. During long-term evolution, the expression patterns of the paralogs and duplicated genes have diverged [29]. Such a process may increase the adaptability of duplicated genes to environmental changes, thus conferring a possible evolutionary advantage [58,59,60,61].
Next, quantitative real-time RT-PCR analysis was conducted to examine the differential expression of the Arabidopsis Group I pectin lyase genes under wound stress. Twenty pectin lyase genes were included in the real-time RT-PCR analysis. The results are given in Table 6 and the primer sequences in Table S2. The relative quantification method was used to evaluate the quantitative variation between replicates, and bold type indicates up-regulated gene expression under wound treatment. Five pectin lyase genes (AT2G15470, AT2G40310, AT4G13760, AT1G17150 and AT3G07830) in Group I were up-regulated during wound stress (Table 6). The expression of the remaining Group I pectin lyase genes was also analyzed, but their transcript levels did not change under wound treatment (Table 6). The real-time RT-PCR results thus indicated that some pectin lyase genes in Group I were up-regulated under wound stress. To characterize the expression patterns of pectin lyase genes in the wound response more precisely, I cloned the promoter sequence of AT1G17150, whose up-regulation under wound treatment had been verified by real-time RT-PCR, and applied a transgenic approach to study its expression in the wound response. As shown in Fig. 4-A, histological staining of GUS activity in promoter-GUS transgenic Arabidopsis plants revealed strong GUS expression in floral organs, including the stigma and receptacle (flower stalk or flower abscission zone). Moreover, this expression pattern continued during the silique developmental period. However, I did not detect observable expression activity in other tissues or organs. Furthermore, strong GUS expression was also observed in regions associated with wounding (Fig. 4-B), and this expression spread over time to other areas along veins (Fig. 4-C). These results indicated that AT1G17150 might function during stigma, receptacle, and silique development.
Because of the increased GUS expression in wounded tissues, I suggest that the AT1G17150 promoter is responsive to wounding and may activate multiple plant defense systems. In addition, wound signals may be transmitted along veins, thereby increasing the expression or functional scope of this gene.

Functional Network of the Arabidopsis Pectin Lyases
Genes involved in related biological pathways are usually expressed cooperatively, and thus information on their interactions is key to understanding biological systems at the molecular level [62]. To further explore which genes are possibly regulated by members of the pectin lyase family or pathway, a protein-protein interaction network was assembled. The protein-protein interactions were based on experimentally demonstrated or predicted physical interactions [63], and functional interactions suggested by a large number of gene expression analyses [64]. As a result, 15 of the 67 pectin lyases were present in the network database, resulting in a total of 235 unique genes that exhibited 288 physical or functional interactions. Therefore, a network for 15 pectin lyase genes was assembled (Fig. 5). Functional categorical analysis of these 235 genes showed that genes involved in cellular metabolism, cellular transport and localization, and stimulus responses were over-represented compared with the whole genome. Among the 288 interactions identified, 104 and 54 proteins interacted with AT5G44830 and AT1G23460, respectively. Among them, AT5G44830 had 69 unique interactors and AT1G23460 had 36 (Table S3). Some of these interacting genes include AtSTP9 (AT1G50310), IRT1 (AT5G43370), ADK1 (adenosine kinase 1, AT3G09820), LOS2 (AT2G36530), and flagellin-sensitive 2 (FLS2, AT5G46330). AtSTP9 encodes a member of the sugar transporter family, which has been linked to pollen germination or pollen tube growth [65]. IRT1 has been identified as a major transporter responsible for high-affinity metal uptake under iron deficiency [66]. ADK1 is involved in root gravitropism and cap morphogenesis [67], and there is a direct correlation between ADK1 activity and the level of methylesterified pectin in seed mucilage [68]. LOS2 encodes a putative enolase and has been shown to be a cold-responsive gene [69].
A leucine-rich repeat receptor kinase, FLS2 acts as a receptor for bacterial pathogen-associated molecular patterns, and contributes to resistance against bacterial pathogens [70]. Multiple transport pathways in plants might be regulated by coordinated expression of different genes. The interaction networks reflect the correlation of the expression patterns of different genes, and can help in tracing genes that act in the same pathway. Here, this network analysis has revealed that the function of some pectin lyase proteins might require the participation of various sugar transporters (Fig. 5). A pectin lyase, AT1G05650, was found to interact with three transporters, AtSTP1 (AT2G13650), AtSTP2 (AT1G07340) and AtSTP9. In addition, two other sugar transporters, AtSTP14 (AT1G77210) and AtSTP12 (AT4G21480), might be potential interactors of the pectin lyases AT1G43080 and AT5G44830, respectively. Thus, whether sugar transport is linked to the pectin degradation process is a promising direction for further study. Pectin lyase genes interacted with the same group of transporters, suggesting that they may take part in correlated molecular pathways through interaction with these partners. Plant pathogen resistance is crucial to plant development and reliable production of food. This functional network analysis has revealed that pectin lyase genes may function together with pathogen resistance proteins. An important group of enzymes involved in pathogen resistance is the leucine-rich repeat protein kinases, such as FLS2 [70]. Here, four leucine-rich repeat protein kinases were found to be coexpressed with the pectin lyase proteins, suggesting possible interactions between the pectin lyase and FLS genes. Although the exact pathways mediated by these genes are still unclear, one might speculate that these pectin lyase genes play important roles in disease resistance.
The characterization of pectin lyase protein function could therefore open new perspectives for understanding the molecular mechanisms of bacterial disease resistance. These findings suggest that differential interaction may assist in the investigation of key regulatory steps in metabolic pathways. The approaches and results reported here

Conclusion
This study provided a comparative genomic analysis addressing phylogeny, chromosomal location, gene structure, functional divergence, selective pressures, expression profiling, and functional networks of the pectin lyase gene family in Arabidopsis. Phylogenetic analyses revealed five well-supported groups in the pectin lyase family. The exon/intron structures of the pectin lyase genes were highly conserved within each of the groups, indicative of their functional conservation. The pectin lyase genes were non-randomly distributed across the Arabidopsis chromosomes, and a high proportion of the pectin lyase genes might be derived from tandem duplications. Additional functional divergence analyses suggested that significant site-specific selective constraints might have acted on most pectin lyase paralogs after gene duplication, leading to subgroup-specific functional evolution. Furthermore, comprehensive analysis of the expression profiles provided insights into possible functional divergence among members of the pectin lyase gene family. A wound-induced promoter of one pectin lyase gene, which drove stigma- and receptacle-specific expression, was also confirmed. Functional network analyses identified interacting genes involved in cellular metabolism, cellular transport and localization, and stimulus responses. These data may provide valuable information for future functional investigations of this gene family.

Sequence Retrieval, Identification, and Microarray Analyses
Potential members of the pectin lyase gene family of Arabidopsis were identified using multiple database searches. One pectin lyase (AT1G17150) gene sequence was retrieved and used as a query in BLASTP searches against TAIR (The Arabidopsis Information Resource; http://www.arabidopsis.org) using the default settings. PSI-BLAST (Position-Specific Iterated BLAST) searches were also performed against NCBI (National Center for Biotechnology Information) using the same sequence. All protein sequences related to this family with an expect value ≤1e-05 were retrieved from TAIR and NCBI, and redundancies were removed. TargetP [21] and PredoTar (http://urgi.versailles.inra.fr/predotar/predotar.html) were used for primary structural analysis of the Arabidopsis pectin lyases. The SignalP 4.0 server [22] was used to identify the potential signal peptides in pectin lyase precursors. Molecular weights and isoelectric points of pectin lyases were estimated using Compute pI/Mw (http://web.expasy.org/compute_pi/). Genome-wide microarray data were obtained from Genevestigator [54,55,71] and the Arabidopsis Transcriptome Genomic Express Database (http://signal.salk.edu/cgi-bin/atta).
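The retrieval step above (keep hits at or below the expect-value threshold, then remove redundancies) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the script used in this study: the `Hit` record, the function `filter_hits` and all expect values in the example are hypothetical, and hits are assumed to have already been parsed from the BLASTP and PSI-BLAST output.

```python
from typing import Dict, Iterable, List, NamedTuple


class Hit(NamedTuple):
    """One search hit: subject identifier and its expect value."""
    subject_id: str
    evalue: float


def filter_hits(hits: Iterable[Hit], max_evalue: float = 1e-5) -> List[str]:
    """Return sorted, non-redundant subject IDs with evalue <= max_evalue."""
    best: Dict[str, float] = {}
    for hit in hits:
        if hit.evalue > max_evalue:
            continue  # too divergent, excluded from the family
        previous = best.get(hit.subject_id)
        if previous is None or hit.evalue < previous:
            best[hit.subject_id] = hit.evalue
    return sorted(best)


# Made-up expect values; the same subject may be returned by both searches.
hits = [
    Hit("AT1G17150", 0.0),
    Hit("AT2G33160", 3e-40),
    Hit("AT2G33160", 5e-38),          # redundant hit from the second search
    Hit("DIVERGENT_EXAMPLE", 2e-3),   # hypothetical hit above the threshold
]
print(filter_hits(hits))
```

Merging the two searches through a dictionary keyed on the subject identifier makes redundancy removal a side effect of the filter, so the order in which TAIR and NCBI results are supplied does not matter.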

Phylogenetic Analyses of the Arabidopsis Pectin Lyase Gene Family
Multiple sequence alignments of the full-length protein sequences were performed using MUSCLE 3.52 [72], followed by manual comparisons and refinement. Phylogenetic analyses of the pectin lyase proteins based on amino acid sequences were carried out using ML and NJ methods in MEGA v5 [23]. NJ analyses were done using p-distance methods, pairwise deletion of gaps, and default assumptions that the substitution patterns among lineages and substitution rates among sites are homogeneous. Support for each node was tested with 1,000 bootstrap replicates. ML analyses were done using a Poisson substitution model, uniform rates among sites, and partial deletion of gaps (95%). Support for each node was also tested with 1,000 bootstrap replicates.
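As an illustration of the distance measure used in the NJ analyses, the following minimal sketch computes a p-distance between two aligned protein sequences with pairwise deletion of gaps. It is not MEGA's implementation; the function name and the choice of gap symbols (`-` and `?`) are assumptions made for the example.

```python
def p_distance(seq_a: str, seq_b: str, gap_chars: str = "-?") -> float:
    """Proportion of differing sites between two aligned sequences.

    Pairwise deletion: a column is skipped only when one of these two
    sequences has a gap or missing-data symbol at that position.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a in gap_chars or b in gap_chars:
            continue  # ignore this column for this pair only
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return differences / compared


if __name__ == "__main__":
    # Five comparable sites, one difference (K vs R).
    print(p_distance("MKT-LV", "MRTALV"))
```

With pairwise deletion, the number of compared sites can differ between sequence pairs, which is why the denominator is counted per pair rather than taken from the alignment length.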

Chromosomal Location and Gene Structure of the Pectin Lyase Genes
The chromosomal locations of the pectin lyase genes were determined using the Chromosome Map Tool (http://www.arabidopsis.org/jsp/ChromosomeMap/tool.jsp) on TAIR. Gene exon/intron structure information was collected from the Arabidopsis genome annotations in the TAIR and NCBI (http://www.ncbi.nlm.nih.gov/) databases.

Protein Tertiary Structure Model-building and Visualization Analysis
The tertiary structure prediction and visualization analysis of pectin lyase proteins were performed with the PHYRE2 server (http://www.sbg.bio.ic.ac.uk/~phyre2/html/page.cgi?id=index) [73].

Inference of Duplication Time
MEGA 5 [23] was used to perform the pairwise alignment of nucleotide sequences of the pectin lyase paralogs with ClustalW (codons). The K-Estimator 6.0 program [74] was used to estimate the Ka and Ks values of paralogous genes. Estimates of evolutionary rates are useful for explaining patterns of macroevolution because Ks can be used as a proxy for time when estimating dates of segmental duplication events. The Ks value was calculated for each gene pair and then used to calculate the approximate date of the duplication event (T = Ks/2λ), assuming a clock-like rate (λ) of synonymous substitution of 1.5 × 10⁻⁸ for Arabidopsis [29].
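The dating formula can be written out as a short worked example. This is a minimal sketch: the function name is hypothetical, the Ks value in the usage line is illustrative rather than taken from the data, and the rate is assumed to be expressed in substitutions per synonymous site per year.

```python
# Clock-like synonymous substitution rate for Arabidopsis used in the text.
# Assumed unit: substitutions per synonymous site per year.
LAMBDA = 1.5e-8


def duplication_time_years(ks: float, rate: float = LAMBDA) -> float:
    """Approximate age of a duplication event, T = Ks / (2 * lambda).

    The factor of 2 accounts for substitutions accumulating independently
    along both paralogous lineages since the duplication.
    """
    if ks < 0:
        raise ValueError("Ks must be non-negative")
    return ks / (2.0 * rate)


if __name__ == "__main__":
    # Illustrative value only: Ks = 0.3 gives about 10 million years.
    print(duplication_time_years(0.3) / 1e6, "million years")
```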

Functional Divergence Analyses
To estimate levels of functional divergence and predict amino acid residues responsible for functional differences among pectin lyase groups, the coefficients of type-I functional divergence were calculated using the method suggested by Gu [44,45]. Analyses were carried out using DIVERGE (version 2.0) [44,45], which uses maximum likelihood procedures to estimate significant changes in site-specific shifts in evolutionary rates or site-specific shifts in amino acid properties after the emergence of two paralogous sequences. The advantage of this method is that it uses amino acid sequences and, therefore, is not sensitive to saturation of synonymous sites. Type-I functional divergence designates amino acid configurations that are highly conserved in gene 1 but highly variable in gene 2, or vice versa, implying that these residues have experienced altered functional constraints [44]. Coefficients of functional divergence significantly greater than 0 indicate site-specific altered selective constraints or radical shifts of amino acid physicochemical properties after gene duplication. Site-specific posterior analysis was also used to predict amino acid residues that may be crucial for functional divergence.

Site-specific Selection Assessment and Testing
Three methods [SLAC (single likelihood ancestor counting), FEL (fixed-effects likelihood) and REL (random-effects likelihood)] were employed to test individual codons for selection, using the default settings of the Datamonkey web interface [51,52]. SLAC fits a nucleotide substitution model to the data and then calculates a global Ka/Ks ratio. Next, ancestral sequences at each codon are reconstructed using maximum likelihood methods. Finally, expected and observed numbers of synonymous and nonsynonymous substitutions are calculated to infer selection at each codon site. Significance is assessed using a P value derived from a two-tailed binomial distribution. FEL directly estimates Ka and Ks based on a codon-substitution model, and a likelihood ratio test assesses significance at a level of 0.1. REL is an extension of the site-by-site positive selection analyses implemented in PAML [75]. Notably, it allows synonymous and nonsynonymous substitution rates to vary among codon sites, and uses Bayes factors > 50 to identify selected codon sites (default conditions) [51,52]. Finally, "integrative selection analysis" was applied to identify the total number of positively selected codons that were detected by at least one of the three methods [51,52]. PARRIS [76] was concurrently used to test for signatures of selection, because it allows tree topologies and branch lengths to change across detected recombination breakpoints. In addition, synonymous substitution rates were allowed to vary across codon sites. A null model was fitted to the data, followed by an alternative (selection) model. A likelihood ratio test was then used to compare the models and test for evidence of positive selection [76].
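The integration step, in which a codon is counted if at least one of the three methods flags it, amounts to a set union. The sketch below is not Datamonkey's code; the function name and the codon positions in the usage line are hypothetical, and each input is assumed to be a list of sites that already passed that method's own threshold.

```python
from collections import Counter
from typing import Iterable, List


def integrate_sites(slac: Iterable[int], fel: Iterable[int],
                    rel: Iterable[int], min_methods: int = 1) -> List[int]:
    """Codon sites supported by at least `min_methods` of the three methods.

    With min_methods=1 this is the union of the three site lists, matching
    the "detected by at least one method" rule described in the text.
    """
    support = Counter()
    for sites in (set(slac), set(fel), set(rel)):
        support.update(sites)  # each method contributes at most one vote per site
    return sorted(site for site, votes in support.items() if votes >= min_methods)


if __name__ == "__main__":
    # Hypothetical codon positions flagged by each method.
    print(integrate_sites([12, 40], [40, 77], [5]))
    print(integrate_sites([12, 40], [40, 77], [5], min_methods=2))
```

Raising `min_methods` to 2 or 3 gives a stricter consensus set, which can be useful for checking how sensitive the reported sites are to the choice of method.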

Plant Growth Conditions, Wound Treatments and RNA Isolation
Arabidopsis ecotype Columbia was used in this study. Seeds were sterilized with one rinse in 70% EtOH for 10 min, followed by one rinse in 10% NaClO for 5 min, and then five rinses in sterilized water. Sterilized seeds were sown on Petri dishes containing Murashige-Skoog (MS) salts with 0.75% w/v agar B. Dishes were kept at 4°C for 2 days and then moved to a growth chamber at 22°C and 80% relative humidity under light/dark (16 h/8 h) conditions. Approximately 10 days after germination, seedlings were transplanted to Cornell-mix soil (3:2:1 peat moss:vermiculite:perlite, v/v/v) and grown in a growth chamber for 5 weeks. Wounding of soil-grown plants (5 weeks old) was performed by thoroughly crushing one-half of the rosette leaves with forceps. Wounded and nonwounded leaves were harvested at the indicated times and frozen in liquid nitrogen for total RNA isolation. Total RNA was extracted using the Trizol total RNA extraction kit (Sangon, Shanghai, China, SK1321) and was treated with RNase-free DNase I to remove genomic DNA.

Real-time Quantitative RT-PCR
Reverse transcription was performed with total RNA (2 μg) using M-MLV (Sangon, Shanghai, China). The cDNA samples were diluted to 8 ng/μL. Triplicate quantitative assays were performed on 1 μL of each cDNA dilution using SYBR Green Master Mix (Sangon, Shanghai, China, BS643) with an ABI HT7900 sequence detection system, according to the manufacturer's protocol. The gene-specific primers were synthesized by Sangon and are given in Table S2. The amplification of Atactin-11 (AT1G12110) RNA was used as an internal control (forward primer, 5′-CCACATGCTATTCTGCGTTTGGACC-3′; reverse primer, 5′-CATCCCTTACGATTTCACGCTCTGC-3′). The relative expression levels were calculated using the ddCt method, with Atactin-11 as the endogenous control to normalize all data. Fold changes were calculated from the expression data obtained by qRT-PCR in all samples. Quantitative variation was evaluated between replicates.
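The ddCt calculation can be sketched as follows. This is a minimal illustration, assuming the standard 2^(-ddCt) form with 100% amplification efficiency and simple averaging of technical triplicates; the function name and the Ct values in the usage lines are hypothetical.

```python
from statistics import mean
from typing import Sequence


def fold_change_ddct(target_treated: Sequence[float],
                     reference_treated: Sequence[float],
                     target_control: Sequence[float],
                     reference_control: Sequence[float]) -> float:
    """Relative expression by the ddCt method, returned as 2 ** (-ddCt).

    Each argument holds the replicate Ct values for one gene in one sample.
    Assumption: both amplicons amplify with about 100% efficiency.
    """
    # dCt: target gene normalized to the endogenous control within a sample.
    dct_treated = mean(target_treated) - mean(reference_treated)
    dct_control = mean(target_control) - mean(reference_control)
    # ddCt: treated sample relative to the untreated (calibrator) sample.
    ddct = dct_treated - dct_control
    return 2.0 ** (-ddct)


if __name__ == "__main__":
    # Hypothetical triplicate Ct values: wounded vs. nonwounded leaves.
    wounded_target = [22.1, 21.9, 22.0]
    wounded_actin = [18.0, 18.1, 17.9]
    control_target = [25.0, 25.1, 24.9]
    control_actin = [18.0, 17.9, 18.1]
    print(fold_change_ddct(wounded_target, wounded_actin,
                           control_target, control_actin))
```

A value above 1 indicates higher expression in the treated sample than in the calibrator; a value below 1 indicates lower expression.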

Plasmid Construction and Plant Transformation
To construct the AT1G17150 promoter::GUS fusion vector, the promoter region of AT1G17150 was amplified using primers At1-1 (5′-AAGCTTGTATACAGGTTTTAGTAGTTGAT-3′; the 5′-terminal AAGCTT is an engineered HindIII site) and AT1-2 (5′-GGATCCAATGCACGGCGCCATCTTCTTCT-3′; the 5′-terminal GGATCC is an engineered BamHI site) on a template of gDNA derived from leaves. The PCR products were cloned into pMD18-T (TaKaRa, Dalian, China) to form pMD-P. The promoter sequence was then released from pMD-P with BamHI and HindIII and subcloned into the BamHI and HindIII sites of pBI121, forming pBI-P, the recombinant promoter::GUS fusion vector. The binary vector (pBI-P) was transferred into Agrobacterium tumefaciens strain EHA105, and the transformed Agrobacterium cells were used to transform A. thaliana ecotype Columbia via the floral dip method [77].

Histochemical GUS Staining
The histochemical GUS assay was performed in a staining solution containing 0.5 mg/ml 5-bromo-4-chloro-3-indolyl glucuronide in 0.1 M Na2HPO4, pH 7.0, 10 mM Na2EDTA, 0.5 mM potassium ferricyanide/ferrocyanide, and 0.06% Triton X-100 [78]. Samples were infiltrated with staining solution and incubated at 37°C overnight. Staining buffer was removed, and samples were subsequently cleared in 70% EtOH. All observations were recorded with a Sony digital camera.

Network Assembly
A protein-protein network was assembled using previously described methods [79,80]. Briefly, the network database includes known protein interaction databases, data from yeast two-hybrid experiments, predicted interactions via orthology and co-citations [63], other literature sources, and a recently reported interaction database derived from large-scale DNA microarray analyses [64]. Pectin lyase genes were mapped to their corresponding proteins in the network database. Fifty-two pectin lyases were not present in the assembled network database. The resulting interactions were used to build the interaction network of the remaining 15 members.

Table S1 Motif sequences identified by MEME tools. Numbers correspond to the motifs described in Figure S2. (DOC)