Figure 1.
Nematode species contributing to NemPep3.
EST cluster consensuses (putative genes) from 37 nematode species were obtained from NEMBASE3. This set of species includes seven not previously analyzed [11]. The species are organized by their systematic grouping based on the SSU rRNA phylogeny [14]. Feeding strategy is indicated by the small icons. We use "contig" to describe the consensus sequence produced for each set of clustered ESTs. For each species, the numbers of peptides derived by the BLAST-similarity and ESTScan methods of prot4EST [39] are given: only polypeptides generated by these two high-quality components contributed to NemPep3. The complete proteomes of C. elegans and C. briggsae were obtained from WormBase.
Figure 2.
Protein family discovery in the phylum Nematoda.
Nematode protein families (NemFam3) were generated using Markov flow clustering [50] with a range of Inflation parameters. The bars show the range (minimum to maximum) in the number of protein families obtained across the different Inflation parameters. Here we analyse families defined with an Inflation parameter of 3.0. A collector's curve was derived as described in Materials and Methods. Yellow circles indicate the cumulative counts of proteins (x-axis) and unique families (y-axis) as each species was added. The upper black line follows the cumulative number of protein families identified as each new species was included. For example, the 4,368 protein sequences from A. caninum included 1,200 NemFam3 families not present in the Caenorhabditis proteomes. The middle black line tracks the cumulative number of NemFam3 protein family models that identify representatives in non-nematodes, and the bottom line shows the number of NemFam3 protein family models that were present in C. elegans and in species from other (non-nematode) phyla. Region A protein families were restricted to nematodes (given current databases), while region B families are shared with non-nematode taxa but have been lost in C. elegans or gained in specific nematode lineages (loss/gain candidates). Region C protein families are shared between C. elegans, other nematodes and non-nematode species.
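The cumulative counts plotted in a collector's curve can be illustrated with a minimal sketch. It is not the procedure from Materials and Methods, which is not reproduced in this caption; the function name `collectors_curve`, the input layout, and the toy species and family identifiers are all assumptions made for illustration.

```python
def collectors_curve(species_families):
    """Return (species, cumulative proteins, cumulative unique families).

    species_families is an ordered list of (species, family_ids) pairs,
    where family_ids holds one family identifier per protein (hypothetical
    layout, chosen only for this sketch).
    """
    seen_families = set()
    total_proteins = 0
    curve = []
    for species, family_ids in species_families:
        total_proteins += len(family_ids)   # x-axis: proteins added so far
        seen_families.update(family_ids)    # y-axis: distinct families so far
        curve.append((species, total_proteins, len(seen_families)))
    return curve


# Toy data only; these are not values from NemPep3 or NemFam3.
toy = [
    ("species_A", ["fam1", "fam2", "fam2"]),
    ("species_B", ["fam2", "fam3"]),
    ("species_C", ["fam1", "fam4", "fam5", "fam5"]),
]
curve = collectors_curve(toy)
for species, n_proteins, n_families in curve:
    print(species, n_proteins, n_families)
```

Each tuple corresponds to one point on the curve; the order in which species are added changes the shape of the curve but not its end point.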
Figure 3.
Nematode-restricted protein families.
(A) The distribution of protein family sizes can be described by a power law, with a large number of small families and the number of families decreasing as their size increases. Removing C. elegans-containing families reduced the total number of families, but the power law distribution persisted. (B) Many protein families had a restricted taxonomic distribution within Nematoda. For all protein families with at least five non-C. elegans members, the systematic affinities of the contributing species were compared. Protein families were identified that were restricted to each of the taxonomic families represented in the analysis, and to higher-level taxonomic groups (e.g. the Spirurina, which includes Ascaridomorpha and Spiruromorpha). For example, 243 protein families were restricted to the Tylenchomorpha. At least one species from a taxonomic family needed to be represented in the protein family for inclusion in the figure.
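One simple way to check whether a family-size distribution is roughly power-law shaped is to regress log(number of families) on log(family size). The sketch below assumes that approach; the caption does not state how the power law was assessed, and a log-log least-squares fit is only a crude estimator. The function name `fit_power_law_exponent` and the toy sizes are hypothetical.

```python
import math
from collections import Counter


def fit_power_law_exponent(family_sizes):
    """Least-squares slope of log(count) against log(size).

    family_sizes is a list with one entry per family giving its number of
    members. A slope near -k suggests count ~ size**(-k).
    """
    size_counts = Counter(family_sizes)
    xs = [math.log(size) for size in size_counts]
    ys = [math.log(count) for count in size_counts.values()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy data constructed so that count = 64 * size**-2; not NemFam3 values.
toy_sizes = [1] * 64 + [2] * 16 + [4] * 4 + [8] * 1
slope = fit_power_law_exponent(toy_sizes)
print(round(slope, 3))
```

With real data the largest families are sparse and noisy, so binning the sizes or using a maximum-likelihood estimator would usually be preferable to this plain regression.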
Figure 4.
Methionine metabolism in nematodes.
Cytosine-5′-methyltransferase (EC 2.1.1.37) is not present in C. elegans but has been detected in four phylogenetically divergent nematode species, suggesting that it may be widespread throughout the phylum and lost in the Caenorhabditis lineage. Enzymes found in C. elegans are green; those present in other nematodes but absent in C. elegans are red. Three further enzymes were identified as possible candidates for gene loss in C. elegans. Betaine-homocysteine S-methyltransferase (EC 2.1.1.5), homocysteine S-methyltransferase (EC 2.1.1.10) and 5-methyltetrahydropteroyltriglutamate–homocysteine methyltransferase (EC 2.1.1.14) were found in one, seven and four nematode species, respectively, but not in C. elegans. The latter two enzymes have not previously been reported in metazoans, and their identification in plant-parasitic nematodes may be a result of horizontal gene transfer.
Table 1.
Plant-like enzymes identified in nematode proteomes.
Figure 5.
Signal peptides in nematode proteomes in NemPep3.
Signal peptides were predicted in NemPep3 using SignalP [53]. For each species, the proportion of signal peptide-containing proteins is given. There is a significant increase in the proportion of novel nematode proteins containing signal peptides relative to proteins with homologues in other phyla (p<0.0001; t = 10.53230; df = 38; paired t-test on arcsine-transformed data).
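The paired comparison reported above can be sketched as follows. The sketch assumes the common arcsine square-root form of the transformation, which the caption does not specify, and it uses invented proportions rather than the NemPep3 values; the names `arcsine_sqrt` and `paired_t` are hypothetical. Because a paired t-test has n − 1 degrees of freedom, the reported df = 38 corresponds to 39 paired observations.

```python
import math


def arcsine_sqrt(p):
    """Arcsine square-root transform of a proportion in [0, 1] (assumed form)."""
    return math.asin(math.sqrt(p))


def paired_t(novel, conserved):
    """Paired t statistic and degrees of freedom on transformed proportions."""
    diffs = [arcsine_sqrt(a) - arcsine_sqrt(b) for a, b in zip(novel, conserved)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)  # sample variance
    t = mean_d / math.sqrt(var_d / n)
    return t, n - 1


# Invented per-species proportions of signal peptide-containing proteins.
novel = [0.30, 0.25, 0.40, 0.35]       # proteins without non-nematode homologues
conserved = [0.15, 0.12, 0.22, 0.20]   # proteins with homologues in other phyla
t_stat, df = paired_t(novel, conserved)
print(t_stat, df)
```

The p-value would then be read from a t distribution with `df` degrees of freedom, for example with a statistics package.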
Table 2.
The domain content of nematode proteomes.
Table 3.
Novel NemDom3 domains also identified in plants (Viridiplantae).