Fig 1.
Domain/motif organization of the 26 human ADAMTS-TSL paralogs, adapted from [5].
Fig 2.
Phylogenetic inference of module and phenotype appearances.
The different steps of the method, illustrated here for a dummy set of sequences containing two paralogs p1 and p2 (from one species) and their ortholog p3 (from another species), are: 1) Inference of the reference gene tree from protein sequences by a standard pipeline (PASTA, RAxML, TreeFix); 2) Identification of conserved sequence modules (i.e. sets of strongly similar segments from at least 2 protein sequences aligned in PLMA blocks by Paloma-D); 3) Inference of the module composition of ancestral genes in the reference tree (through Module-Gene-Species reconciliation by SEADOG-MD using the phylogenetic tree of each module inferred with PhyML and TreeFix); 4) Annotation of proteins with known phenotypic traits of interest (here Protein-Protein Interactions); 5) Reconstruction of the ancestral scenario of phenotype evolution across the reference gene tree (PastML); 6) Merging module and phenotype evolutionary information: each ancestral gene of the reference gene tree is then characterized by a module composition and a set of phenotypic traits (protein interactants here). The final result is the prediction of functional signatures by identification of module(s) and phenotypic trait(s) co-appearance.
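The merging step (step 6) can be pictured with a minimal sketch. It assumes that module gains and phenotype gains have already been assigned to the ancestral gene nodes of the reference tree; the node labels, module names and PPI names below are hypothetical toy values, not results of the study, and the function is an illustration rather than the pipeline's actual implementation.

```python
# Toy illustration of module/phenotype co-appearance detection.
# All node labels, module names and PPI names are hypothetical.

# Modules gained at each ancestral gene node (as step 3 would provide).
module_gains = {"G1": {"M12", "M40"}, "G2": set(), "G3": {"M7"}}

# Phenotypic traits (here PPIs) gained at each node (as step 5 would provide).
ppi_gains = {"G1": {"PPI_A"}, "G2": {"PPI_B"}, "G3": set()}


def co_appearances(module_gains, ppi_gains):
    """Return the nodes where at least one module AND one PPI are gained."""
    events = {}
    for node in sorted(module_gains.keys() & ppi_gains.keys()):
        modules, ppis = module_gains[node], ppi_gains[node]
        if modules and ppis:  # both gains must occur at the same node
            events[node] = (sorted(modules), sorted(ppis))
    return events


events = co_appearances(module_gains, ppi_gains)
print(events)
```

In this toy example only G1 gains both modules and a PPI, so it is the only candidate functional signature; G2 and G3 are discarded because one of the two gains is missing.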
Table 1.
Distribution of protein sequences among the nine species.
The dataset-708 contains all sequences of the 24 selected orthogroups (several isoforms for a single gene). The dataset-214 contains one representative sequence per gene (described in S3 Table).
Fig 3.
Identification of modules by partial local multiple alignment.
We show here a schematic PLMA of sequences S1, …, S5 composed of the alignment blocks B1, …, B6. This example illustrates the locality of the alignments: each alignment of positions is supported by a local alignment of sequences, possibly leaving other sequence positions unaligned. For instance, the alignment block B2 is local since it aligns only the two segments M2.3 and M2.4 of the sequences S3 and S4, and no other positions of these sequences. Local alignments allow aligning only subsets of adjacent positions from each sequence. Orthogonally, partial alignments allow aligning positions from only a subset of the sequences. This is also illustrated by B2, which is partial since it aligns only positions from S3 and S4 (and does not even align them to positions of the block B1 above B2). Partial local alignments are not limited to pairwise alignment: the block B3 is an example of a partial local alignment block, aligning the segments M3.2, M3.3 and M3.4 from sequences S2, S3 and S4, that can be built from the pairwise local alignments (highlighted here in blue) of the segments M3.2 with M3.3, M3.3 with M3.4 and M3.2 with M3.4. Each PLMA block aligns a set of segments conserved specifically in a sequence subset. This set of segments, provided that they are long enough, is said to be a conserved sequence module. Let us assume here that all the blocks, except B6, align segments of 5 or more residues. The set of segments {M2.3, M2.4} aligned in B2 then defines, for instance, the module M2, while the block B3 identifies the module M3 = {M3.2, M3.3, M3.4}.
Blocks B4, B5 and B6 illustrate how the definition of the blocks (requiring that each segment is aligned to, and only to, all the other segments of the block) makes it possible to split possibly longer local alignments in order to identify segments specifically conserved in sequence subsets: even if the concatenation of M4.1 and M5.1 is locally aligned to the concatenation of M4.3 and M5.3 (this could be for instance a pairwise alignment used to build the PLMA), neither of these two concatenations is locally aligned to M5.2. In this case, the maximal set of segments aligned with M5.2 is {M5.1, M5.3, M5.2} and the modules are here M4 = {M4.1, M4.3}, specifically conserved in {S1, S3}, and M5 = {M5.1, M5.3, M5.2}, specifically conserved in {S1, S3, S2}. Similarly, the block B6 aligns the segments M6.3 and M6.2 but, in contrast with the segments aligned by B4, these segments are shorter than 5 residues and do not define a module.
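The rule described in this caption can be sketched in a few lines. The sketch assumes that each block is given as a mapping from sequence name to the length of the aligned segment; the lengths are hypothetical values chosen to match the assumption of the caption (all blocks except B6 align segments of 5 or more residues), B1 is omitted because its content is not detailed here, and the function is an illustration, not the Paloma-D implementation.

```python
# Toy illustration: deriving conserved sequence modules from PLMA blocks.
# Segment lengths are hypothetical; only the block membership follows Fig 3.

MIN_LEN = 5  # minimal segment length assumed in the caption

# block name -> {sequence name: length of the segment aligned in the block}
blocks = {
    "B2": {"S3": 6, "S4": 6},
    "B3": {"S2": 8, "S3": 8, "S4": 8},
    "B4": {"S1": 5, "S3": 5},
    "B5": {"S1": 7, "S2": 7, "S3": 7},
    "B6": {"S2": 3, "S3": 3},  # too short to define a module
}


def modules_from_blocks(blocks, min_len=MIN_LEN):
    """Keep blocks aligning >= 2 sequences with segments of >= min_len residues."""
    modules = {}
    for name, segments in blocks.items():
        long_enough = all(length >= min_len for length in segments.values())
        if len(segments) >= 2 and long_enough:
            # Module Mi is named after block Bi and lists its sequence subset.
            modules["M" + name[1:]] = sorted(segments)
    return modules


modules = modules_from_blocks(blocks)
print(modules)
```

With these toy values, B2 to B5 yield the modules M2 to M5 with their respective sequence subsets, while B6 is dropped because its segments are shorter than 5 residues.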
Fig 4.
Reference gene tree of the 125 ADAMTS, 48 ADAMTSL and 41 ADAM outgroup members (figure produced with iTOL).
Fig 5.
Module composition of the 26 H. sapiens ADAMTS-TSL sequences.
The phylogenetic gene tree of the 26 H. sapiens ADAMTS and ADAMTSL paralogs was extracted from the reference gene tree (Fig 4). The modules identified by the Paloma-D program are represented on the sequences with iTOL, using a unique combination of shape and color to designate each module. The complete list of modules is provided in S8 Table.
Fig 6.
Protein-Protein Interaction networks of H. sapiens ADAMTS-TSL.
The 119 PPIs shared by the 26 human ADAMTS-TSL are visualized with Cytoscape [41]. Yellow nodes are hyalectanases, green nodes are pro-collagenases, grey nodes are ADAMTS with unspecific substrates, blue nodes are ADAMTSL and white nodes are proteins interacting with ADAMTS-TSL.
Table 2.
The 45 events of module(s)-PPI(s) co-appearance.
G91, G96, G161, G235 and G341 are ancestral genes of at least two paralogous genes.
Fig 7.
Location in H. sapiens paralogs of the modules involved in the 45 events of module(s)-PPI(s) co-appearance.
(A) ADAMTS-TSL phylogeny indicating the 45 ancestral nodes (labels in white boxes) corresponding to the 45 module(s)-PPI(s) co-appearance events. TS, ADAMTS; TSL, ADAMTSL; Hs, Homo sapiens; Dr, Danio rerio; Xt, Xenopus tropicalis; Gg, Gallus gallus; Mm, Mus musculus; Bt, Bos taurus. (B) Each line corresponds to a H. sapiens protein chosen as representative of the ancestors, and onto which the ancestral module signature is reported. The Pfam domains are represented as grey boxes and the gained modules as green marks. Each sequence is divided into 3 regions: 1) the N-terminal region that contains the propeptide in ADAMTS (blue), 2) the central region including the catalytic domain and the disintegrin domain (orange) and the central TSP1, the cys-rich domain and the spacer (yellow) and 3) the variable ancillary region from the end of the spacer to the C-terminal end (purple).
Fig 8.
Convergent evolution of COMP and CCN2 interactions with ADAMTS-TSL.
COMP and CCN2 interactions with ADAMTS-TSL are associated with independent module signatures acquired during evolution. (A) Phylogenetic tree of ADAMTSs with magnification of phylogenetic ADAMTS subtrees involved in COMP and CCN2 PPIs. (B) Heatmap: the published interactions of H. sapiens ADAMTS-3, ADAMTS-4, ADAMTS-7 and ADAMTS-12 with COMP and CCN2 are represented as dark blue boxes. The gains of the PPIs (inferred by PastML) are represented by the internal nodes: G315 (the ADAMTS-3 Amniota ancestor), G161 (the ADAMTS-7 and ADAMTS-12 paralogs ancestor) and G15 (the ADAMTS-4 mammalian ancestor) for CCN2, CCN2/COMP and COMP respectively. The PPIs inferred by PastML are represented as light blue boxes. Conversely, the absence of interaction (missing information) between ADAMTS and CCN2 and COMP is represented as dark orange boxes (for human proteins) and the absence of interaction inferred by PastML is represented as light orange boxes (for non-human proteins).
Fig 9.
Three module signatures are associated with COMP and/or CCN2 PPIs.
The ancestral module signatures reported on the descendant proteins in H. sapiens: (A) ADAMTS-7, (B) ADAMTS-4 and (C) ADAMTS-3. The location of the modules is represented as green boxes along with the location of the Pfam domains represented as grey boxes (top panel), while the content of the modules is represented by raw sequence logos [49] using the Protomata visualization [14], which also displays the chaining of the modules with arrows labeled by the minimal and maximal distances between the modules in the sequences of descendants (bottom panels). Because of the size of the G161 module signature, only the modules in the region of interaction with COMP are shown in (A).
Fig 10.
Evolutionary histories of hyalectanases PPIs.
(A) Hyalectanase tree. The last common ancestor of the human hyalectanases is the G96 gene node. The gain of the ACAN and VCAN PPIs was inferred at the G96 gene node. (B) Heatmap: the published interactions of H. sapiens hyalectanases with ACAN, VCAN, LRP1, TIMP3 and COMP are represented as dark blue boxes. The PPIs inferred by PastML are represented as light blue boxes. Conversely, the absence of interaction (missing information) between hyalectanases and ACAN, VCAN, LRP1, TIMP3 and COMP is represented as dark orange boxes (for human proteins) and the absence of interaction inferred by PastML is represented as light orange boxes (for non-human proteins).
Fig 11.
Module signatures of ADAMTS-5 and hyalectanase proteins.
(A) Location of G65 and G96 signature modules on H. sapiens ADAMTS-5 (NP_008969.2). The protein domains are shown in grey, the modules gained at the ADAMTS-5 ancestral gene G65 are shown in green and the modules gained at the G96 hyalectanase ancestral gene are shown in purple. (B) Excerpt of the PLMA restricted to the sequences of H. sapiens hyalectanases and three non-hyalectanases (ADAMTS-6, ADAMTS-10 and ADAMTSL-4) in the spacer domain. PLMA blocks defining modules are shown as boxes containing the module segments. The succession of the module segments of each sequence is indicated by arrows labeled by a numbering of the sequences displayed (from 1 to 10 here), completed with the interval of sequence positions skipped if the segments are not contiguous. (C) Sequence logos as in Fig 9 of the modules gained at G96 (left) and G65 (right) in the spacer. (D) All the segments of H. sapiens ADAMTS-5 (NP_008969.2) in G65 and G96 signature modules. (E) Predicted structures of the H. sapiens ADAMTS-5 protein with and without propeptide, colored with G65 and G96 modules. The three hypervariable loops, β1-β2, β3-β4 and β9-β10, previously described in Santamaria et al., 2019 [59], are marked by *.