Comparative Functional Analysis of the Caenorhabditis elegans and Drosophila melanogaster Proteomes

doi:10.1371/journal.pbio.1000048

Figure 1.

Workflow of the C. elegans Proteome Analysis

Proteins and peptides were isolated from whole worm or egg homogenates, and separated biochemically. Peptides were identified by μLC-ESI-MS/MS and database searches, and validated using the Trans-Proteomic Pipeline [62]. We detected peptides for 10,631 different gene loci, which corresponds to 54% of the predicted gene loci in WormBase WS140 (19,735 gene loci). For 7,476 gene loci, more than one peptide was identified; for 580 gene loci, a single peptide was identified independently multiple times; for 2,575 gene loci, a single peptide was identified; and 9,104 gene loci were not covered at all.
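
The coverage figures in this caption can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts quoted above; the variable names are illustrative and not taken from the original work.

```python
# Counts quoted in the Figure 1 caption (WormBase WS140).
total_loci = 19_735
multi_peptide = 7_476     # more than one peptide identified
single_repeated = 580     # one peptide, identified independently multiple times
single_once = 2_575       # one peptide, identified once
not_covered = 9_104       # no peptide identified

# Detected loci are the sum of the three evidence classes.
detected = multi_peptide + single_repeated + single_once
coverage = detected / total_loci

print(detected)                                # 10631
print(detected + not_covered == total_loci)    # True
print(f"{coverage:.1%}")                       # 53.9%
```

The three evidence classes add up to the 10,631 detected loci, and detected plus uncovered loci account for all 19,735 predicted loci, which matches the rounded 54% stated in the caption.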


Figure 2.

Classification of Detected Proteins

(A–C) A bias analysis of the 10,977 identified proteins (including splice variants) in comparison to the 22,269 predicted proteins in WormBase (WS140) was performed for the parameters (A) length, (B) isoelectric point (pI), and (C) hydrophobicity. Red lines indicate the percentages of identified proteins in comparison to all C. elegans proteins in each bin. A value below 49% indicates fewer detections than expected; a value above 49% indicates more detections than expected.

(D and E) Over- and underrepresentation of transmembrane (TM) proteins (D) and their functional classes (E) in our experimental dataset. Statistically significant categories are marked with asterisks: a single asterisk (*) indicates a p-value below 0.05; double asterisks (**) indicate a p-value below 1E−4. The proportion of proteins with transmembrane helices was 36.5% in WormBase, and 30.5% in our proteome dataset.

(F) The global functional GO slim analysis for all proteins showed statistically significant over- or underrepresentations in the categories “biological process,” “cellular component,” and “molecular function.” We used abbreviated terms for three categories (GO:0006139, GO:0008152, and GO:0005488).
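
The per-bin comparison in panels A to C can be illustrated with a minimal sketch, assuming each predicted protein carries one property value (for example, length) and a flag saying whether it was identified. The function name `detection_rate_by_bin`, the bin edges, and the toy data are hypothetical; only the two protein totals come from the caption.

```python
from bisect import bisect_right

def detection_rate_by_bin(values, detected, edges):
    """Return the fraction of detected proteins in each bin.

    values   -- one property value per predicted protein (e.g. length)
    detected -- parallel list of booleans: was the protein identified?
    edges    -- ascending inner bin boundaries; yields len(edges) + 1 bins
    """
    totals = [0] * (len(edges) + 1)
    hits = [0] * (len(edges) + 1)
    for value, found in zip(values, detected):
        i = bisect_right(edges, value)   # index of the bin holding this value
        totals[i] += 1
        hits[i] += found
    # Empty bins have no defined rate.
    return [h / t if t else None for h, t in zip(hits, totals)]

# Expected detection rate if there were no bias (totals from the caption).
baseline = 10_977 / 22_269

# Hypothetical toy data: protein lengths and detection flags.
lengths = [80, 95, 150, 220, 400, 650, 900, 1500]
found = [False, False, True, False, True, True, True, True]
rates = detection_rate_by_bin(lengths, found, edges=[100, 500])

for rate in rates:
    print("above baseline" if rate > baseline else "below baseline")
```

Bins whose rate falls below the baseline of roughly 49% correspond to fewer detections than expected, and bins above it to more.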


Figure 3.

Improved Genome Annotation via Novel Peptide Identifications

Examples of novel peptides obtained from genomic searches against a six-frame translation of the C. elegans genome, and the region where they match to the genome.

(A) The novel peptide sequence LFEMHQISGINAASPEK suggests an alternative translational start site for the protein SYN-4 (T01B11.3). The sequence predicted to code for this peptide extends upstream of the annotated translational start site. An alternative start codon can be found further upstream in the same reading frame.

(B) A peptide indicates a novel splice variant of the gene F47B7.7. The peptide WGDAGYVSHSPSPTGEIHEEYQYTR extends an annotated exon into the downstream intron, implying either the selection of an alternative 5′ splice site downstream of the peptide, or intron retention, which would result in an early translation stop (shown).
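
To make the idea of a six-frame translation concrete, the following is a minimal sketch using the standard genetic code. It is not the search pipeline used in the study; the function names and the short example sequence are hypothetical, and a real search would score MS/MS spectra against the translated database rather than match peptide strings.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(dna):
    """Translate complete codons; unknown codons become 'X', stops '*'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )

def six_frame_translation(dna):
    """Translate three offsets on the forward and three on the reverse strand."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames

def frames_containing(peptide, frames):
    """List the frames whose translation contains the peptide string."""
    return [name for name, protein in frames.items() if peptide in protein]

frames = six_frame_translation("ATGGCCTAA")   # hypothetical toy sequence
print(frames["+1"])                       # MA*
print(frames_containing("LGH", frames))   # ['-1']
```

A peptide that matches a frame outside any annotated coding region is the kind of hit that can suggest a new start site or splice variant, as in the two examples above.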


Figure 4.

Operon Genes Are More Highly Expressed Than Singleton Genes

(A) Proteins whose genes are organized in operons were identified more frequently (84%) and at higher abundance (median expression: 20 ppm) than proteins encoded by individually transcribed genes (47%; 5 ppm). Double asterisks (**) indicate a p-value below 1E−10; triple asterisks (***) indicate a p-value below 1E−15.

(B) A similar result is obtained when analyzing Affymetrix data instead (albeit with a smaller abundance difference). In both panels, the left-most data column shows singleton genes (i.e., not in operons), and the four columns to the right show genes in operons of various lengths. Medians are indicated as black dots, and whiskers encompass the range from 25% to 75% of values.


Figure 5.

Interspecies Comparative Proteomics of Orthologous Proteins in C. elegans and D. melanogaster

(A) Protein abundances deduced from spectral counting of 2,695 pairs of orthologs from both species are shown. Medians of equal-sized bins are indicated as crosses; whiskers encompass the range from 25% to 75% of values. The distribution of the orthologs (dots) is indicated in the background. The distribution and correlation coefficients of proteins involved in signal transduction and translation are shown in the inset.

(B) The correlation coefficient of R_S = 0.79 between the two species is higher than the correlation between protein and transcript abundance within each organism, based on SAGE or Affymetrix data.

(C) For C. elegans, we plotted protein abundance versus sequence conservation (the latter determined by alignment with the D. melanogaster orthologs). All correlation coefficients are rank-based, with p-values below 2.2E−16.
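
A rank-based correlation coefficient such as the R_S reported here can be computed as the Pearson correlation of the ranks (Spearman's coefficient). The sketch below is a plain-Python illustration under that assumption; the abundance values are hypothetical toy numbers, not data from the study.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical abundances (ppm) for five ortholog pairs.
worm = [1, 5, 20, 100, 400]
fly = [2, 4, 50, 30, 900]
print(round(spearman(worm, fly), 3))   # 0.9
```

Because only ranks enter the calculation, the coefficient is insensitive to the wide dynamic range of abundance values and to any monotonic rescaling of either axis.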
