Identification and characterization of epicuticular proteins of nematodes sharing motifs with cuticular proteins of arthropods

doi:10.1371/journal.pone.0274751

Fig 1.

Sequence comparison of different Asu-epic-1 clones.

Previously characterized genomic (G) and cDNA (C) clones are aligned with the TSA sequence JI176387.1rc and the complete Asu-epic-1 gene of A. suum. Exons 1 and 2 are indicated as light and darker blue boxes or lines. The intron and the untranslated regions are indicated as a gray box and as gray lines, respectively, each with its number of nucleotides. The GenBank accession numbers, the initial designation used for the sequence submission [29, 30], and the sequence length (bases, b) are given.

Fig 2.

Characteristics of the aligned nucleotide and deduced amino acid repeats of the gene Asu-epic-1.

A gap is introduced in repeat 2 for an optimal alignment. The seven aligned repeats (without the signal peptide sequence) vary in length from 147 to 153 nucleotides and start with a conserved motif of eight nucleotides (green). The seven repeats are individually colored, and these colors are reused in Fig 3. Nucleotide variations between repeats are highlighted in blue, red, or black letters. Nucleotides highlighted in blue are specific for a given repeat. Nucleotides highlighted in red are present in two repeats. Nucleotides in black letters mark positions that vary in more than two repeats. Underlined nucleotides lead to changes in the amino acid residues. In the amino acid repeats, the regions at the N- and C-terminal sides are conserved (red letters). Underlined red amino acids represent hydrophilic motifs. Each of the conserved regions contains one tyrosine residue (highlighted in yellow). A third tyrosine (yellow) is present in between, except in repeat 2. Amino acid variations between the repeats are highlighted in blue. Repeat 6 ends with an arginine (blue) instead of the usual lysine.

Fig 3.

Schematic representation of the nucleotide sequence of the Asu-epic-1 gene and five cDNA/TSA sequences.

The nucleotide sequences (total length in parentheses) are represented by light gray bars. Exon 1, which codes for the signal peptide, is indicated as a light blue box. Arrows in different colors represent the seven tandem repeats of exon 2 (see Fig 2). The specific nucleotides of a given repeat of the Asu-epic-1 sequence are shown as numbers above their position in the repeat sequence. In the cDNA/TSA sequences, missing repeats are indicated above their corresponding position. Duplicated repeats or parts of them are represented with the color of the corresponding repeat. Their colored borders indicate the type of missing repeat. Single nucleotide differences in comparison to the Asu-epic-1 gene are indicated below the sequences.

Fig 4.

Physicochemical properties of Asu-EPIC-1.

A) The protein has a highly intrinsically disordered profile as predicted by IUPred2A [41]. The ANCHOR2 score of 0.65 is a threshold indicative of intrinsically disordered regions. B) The hydrophobicity plot, according to Kyte & Doolittle, shows six hydrophilic regions. C) Identification of seven putative molecular recognition features (MoRFs) (blue regions) using the MoRFchibi SYSTEM [43]. The cut-off level is set at 0.72 (blue line). The first six MoRFs are part of the conserved amino acid regions, and each contains two of the three tyrosine residues (SGYRKKRNNAY). The last MoRF is associated with the sequence QAPAPI. D) The distribution and proportion of disorder-promoting amino acids (red) versus more order-promoting amino acids (blue) [44]. E) Distribution of the hydropathy pattern in the TRs according to Pommié et al. [45].
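
As an illustration of the kind of profile shown in panel B, the following minimal sketch computes a sliding-window Kyte & Doolittle hydropathy average for the MoRF fragment SGYRKKRNNAY quoted above. The window size of 5 and the function name `hydropathy_profile` are assumptions made for this example and are not taken from the figure; the scale values are the standard Kyte & Doolittle hydropathy indices.

```python
# Standard Kyte & Doolittle hydropathy index per amino acid (one-letter code).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=5):
    """Return the mean hydropathy of every full window along seq.

    The window size is an assumption for this sketch; negative means
    indicate hydrophilic stretches, positive means hydrophobic ones.
    """
    values = [KD[aa] for aa in seq]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Fragment quoted in the caption (part of a conserved MoRF region).
fragment = "SGYRKKRNNAY"
profile = hydropathy_profile(fragment, window=5)
for start, score in enumerate(profile):
    print(fragment[start:start + 5], round(score, 2))
```

With this window, every position of the fragment averages below zero, which is consistent with the caption's description of the conserved regions as hydrophilic.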

Fig 5.

Comparison of Pfam patterns in Asu-EPIC-1 and the protein B4IZ60 of D. grimshawi.

The signal peptide sequence is in green letters. The Drosophila protein is organized analogously to the repeat units of Asu-EPIC-1. The number of amino acids per TR is indicated on the right side. Gaps were introduced for better alignment. The red letters in Asu-EPIC-1 represent the highly conserved regions located at each repeat’s N- and C-terminal ends. All tyrosine residues (in red letters) are highlighted in yellow. The tyrosine motifs are boxed (orange = Pfam02756; light green = Pfam02757; green = motif YGD). The horizontal lines below the D. grimshawi repeat sequences represent the amino acids most frequently present in all Drosophila matches on the InterPro website (http://www.ebi.ac.uk/interpro/) [46]. Letters on a red background indicate the amino acids that differ from the D. grimshawi sequence.

Fig 6.

Comparison of the epicuticlins of seven nematode species belonging to three different clades.

The sequences, with the Uniprot accession numbers, are drawn to scale, based on the number of amino acids, using IBS version 1.0.3 [47]. The different epicuticlin sequences of a given species are horizontally positioned. Signal peptide sequences (S) are shown as light green boxes. Repeats are represented as blue boxes in which the number of amino acids is given. Grey regions are non-repeat stretches. For each repeat, tyrosine residues are indicated with symbols (light green circles Y1 = YGD; yellow circles Y3 = GYR; blue circles Y2 = the additional tyrosine). Cysteine residues (red triangles C) are present in several epicuticlins of nematode species belonging to Clade V. Additional information is available in the S2 Table and in the Dryad dataset (https://doi.org/10.5061/dryad.fttdz08vs).
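
The Y1/Y2/Y3 labeling used in this figure can be sketched as a simple motif lookup. The snippet below is only an illustration and not the authors' procedure: the function name `classify_tyrosines` and the toy sequence are hypothetical, and the rule (Y1 if the tyrosine starts YGD, Y3 if it sits inside GYR, otherwise Y2) is an assumption derived from the symbol legend above.

```python
def classify_tyrosines(seq):
    """Label each tyrosine by motif context.

    Y1: tyrosine that starts a YGD motif.
    Y3: tyrosine in the middle of a GYR motif.
    Y2: any other ("additional") tyrosine.
    Returns a list of (zero-based position, label) tuples.
    """
    labels = []
    for i, aa in enumerate(seq):
        if aa != "Y":
            continue
        if seq[i:i + 3] == "YGD":
            labels.append((i, "Y1"))
        elif i >= 1 and seq[i - 1:i + 2] == "GYR":
            labels.append((i, "Y3"))
        else:
            labels.append((i, "Y2"))
    return labels


# Hypothetical toy sequence, NOT a real epicuticlin repeat.
toy_repeat = "AYGDKPTYQSGYRKK"
print(classify_tyrosines(toy_repeat))
```

On a truncated fragment, a tyrosine at the very end may be labeled Y2 only because its motif partner is cut off, so a lookup like this is best applied to complete repeat sequences.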

Fig 7.

Scheme of STRING analysis of the C. elegans epicuticlin Q8MXU8 (K08D12.6) to show potential interactions of this epicuticlin with the second epicuticlin Q9U3J8 (F11E6.3), cuticular collagens, and other proteins rich in tyrosine.

The red-colored node represents the query protein, and the other nodes are the first shell of interactors. Co-expression and homology analyses are indicated in the Table. Each protein-protein interaction is annotated with a score, which is a confidence indicator. All scores range from 0 to 1, with 1 being the highest possible confidence. A score of 0.5 indicates that roughly every second interaction might be erroneous (i.e., a false positive) [39].

Fig 8.

Schematic representation of the two C. elegans epicuticlins Q9U3J8 and Q8MXU8 and the cuticular collagen DPY-10.

The Uniprot accession numbers, the Wormbase accessions (in parentheses), and the total amino acid numbers are shown. The three protein sequences are drawn to scale. The TRs in the two epicuticlins are indicated as blue boxes with the amino acid numbers. The GYR and YGD motifs are shown as orange and green boxes, respectively. The position of the cysteine residue in the epicuticlin Q8MXU8 and the positions of the eight cysteine residues in the collagen, which are organized in three clusters, are marked with red triangles. In the collagen, the three collagen (GXY)n regions (green boxes), the two tyrosine motifs, and thirteen tyrosine residues (yellow circles) are indicated.
