Fig 1.

A scenario of rearrangements from an ancestral chromosome to an extant chromosome, with the corresponding conserved segments and synteny blocks.

Along chromosomes, arrows represent uninterrupted segments of several genes and triangles represent chromosome segments composed of a single gene. The first three lines describe the scenario of inversions from the ancestral chromosome (first line) to the extant chromosome (third line) in two steps: 2 macro-inversions followed by 5 micro-inversions. Macro-inversions (reversing at least 4 genes) are in red, whereas micro-inversions (reversing at most 3 genes) are in blue and grey: blue inversions reverse 3 genes and grey inversions reverse 1 gene. The next two lines show the corresponding conserved segments and “synteny blocks” (as first defined intuitively by Pevzner and Tesler [2]), while the last two lines show all synteny blocks corresponding to our formal definition. The set of optimal non-overlapping synteny blocks corresponds well to the original definition, except that conserved segments nested in the gaps of our synteny blocks are not considered part of the synteny blocks. In this example, the maximum gap allowed in synteny blocks, g, equals the maximum length of micro-rearranged segments, that is, 3 genes.
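
The macro/micro split above can be sketched in Python. This is a toy model, not code from the authors' software: a chromosome is assumed to be a list of (gene, orientation) pairs, and the helper names `invert` and `classify` are hypothetical. An inversion reverses a run of genes and flips their orientations; it is labelled micro when it reverses at most g = 3 genes, as in this figure.

```python
from typing import List, Tuple

# A gene is (name, orientation), with orientation +1 or -1 (hypothetical encoding).
Gene = Tuple[str, int]

G = 3  # maximum number of genes reversed by a micro-inversion (g in this figure)


def invert(chrom: List[Gene], start: int, end: int) -> List[Gene]:
    """Reverse the genes chrom[start:end] and flip their orientations."""
    flipped = [(name, -strand) for name, strand in reversed(chrom[start:end])]
    return chrom[:start] + flipped + chrom[end:]


def classify(start: int, end: int, g: int = G) -> str:
    """Label an inversion 'micro' if it reverses at most g genes, else 'macro'."""
    return "micro" if end - start <= g else "macro"


ancestor = [(name, +1) for name in "ABCDEFGH"]
step1 = invert(ancestor, 1, 6)  # reverses B..F (5 genes)
step2 = invert(step1, 2, 3)     # reverses a single gene
print(classify(1, 6), classify(2, 3))  # macro micro
print(step2)
```

Applying the same inversion twice restores the original order, which is a quick sanity check on the encoding.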

Fig 2.

Proportion of tandem duplications among all gene duplications that occurred between the Amniota ancestor and five extant vertebrates.

Panel A shows the species tree linking the extant human, mouse, dog, opossum and chicken species to Amniota, their most recent common ancestor. The tree topology and speciation dates come from the Ensembl database [11]. The graph in Panel B shows the proportion of tandem duplications among all duplications. Genes are considered duplicated in tandem if they fulfil two criteria: (i) they belong to the same gene family, and therefore share the same ancestral Amniota gene; (ii) they are separated, in the extant genome, by at most tandemGapMax genes (x-axis). Tandemly duplicated genes form clusters of two or more tandem duplicates of the same gene family. The number of tandem duplication events within a cluster is estimated as the number of tandem duplicates minus 1 (the original ancestral gene). Computations use genomes from Ensembl v81 and the corresponding gene trees of Ensembl Compara. The proportion of tandem duplications among all duplications is substantial, ranging from approximately 40% to 70% depending on the lineage.
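
A minimal sketch of the tandem-cluster criterion and of the “cluster size minus 1” event count, assuming an extant genome is given as a list of gene-family labels (one label per ancestral Amniota gene). The function names are hypothetical, and chaining successive copies of a family into one cluster is an assumption about how criterion (ii) extends to clusters of more than two genes.

```python
from collections import defaultdict
from typing import Dict, List


def tandem_clusters(families: List[str], tandem_gap_max: int) -> List[List[int]]:
    """Group positions of same-family genes into clusters of tandem duplicates.

    Assumption: successive genes of one family are chained into the same
    cluster when at most `tandem_gap_max` other genes lie between them.
    """
    positions: Dict[str, List[int]] = defaultdict(list)
    for idx, family in enumerate(families):
        positions[family].append(idx)
    clusters: List[List[int]] = []
    for idxs in positions.values():
        current = [idxs[0]]
        for prev, nxt in zip(idxs, idxs[1:]):
            if nxt - prev - 1 <= tandem_gap_max:
                current.append(nxt)
            else:
                clusters.append(current)
                current = [nxt]
        clusters.append(current)
    # a cluster needs at least two tandem duplicates
    return [c for c in clusters if len(c) >= 2]


def tandem_duplication_events(families: List[str], tandem_gap_max: int) -> int:
    """Each cluster of n tandem duplicates counts as n - 1 duplication events."""
    return sum(len(c) - 1 for c in tandem_clusters(families, tandem_gap_max))


# Toy extant genome: each label is a gene family (i.e. its ancestral Amniota gene).
genome = ["A", "B", "B", "C", "B", "D", "E", "D", "F", "B"]
for gap in (0, 1, 10):
    print(gap, tandem_duplication_events(genome, gap))
```

Raising tandemGapMax can only merge clusters, so the estimated number of tandem events grows with it, as along the x-axis of Panel B.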

Fig 3.

Collapsing a cluster of tandem duplicates can circumvent ruptures of collinearity caused by tandem segmental duplications.

Panel A shows the matrix of homologies from the comparison of a segment of human chromosome X with a segment of mouse chromosome X from Ensembl v69. Two segmental tandem duplications of human genes C and D blur the conservation of the ancestral gene order. Panel B details the process of collapsing the clusters of tandem duplicates on the human segment. If tandemGapMax ≥ 1, there are two clusters: one of 3 genes of family C and one of 3 genes of family D. The three C genes form a cluster because at most tandemGapMax other genes separate successive C genes. All genes in a cluster are collapsed at the location of the first gene, as indicated by the yellow arrow, and the same applies to the cluster of the D family. Once tandem duplicates are collapsed, the conserved order of ancestral genes forms an uninterrupted diagonal in the matrix of homology packs [5] in Panel C. Bounding boxes are drawn as black rectangles around diagonals in Panel A and around the diagonal of the identified conserved segment in Panel C.
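
Collapsing can be sketched as a single left-to-right scan that keeps only the first gene of each cluster. The chaining rule (at most tandemGapMax other genes between successive copies), the function name and the toy human and mouse segments below are assumptions for illustration, not the data of this figure.

```python
from typing import Dict, List


def collapse_tandem_duplicates(families: List[str], tandem_gap_max: int) -> List[str]:
    """Collapse each cluster of tandem duplicates onto its first gene.

    Scanning left to right, a gene is dropped when the previous gene of the
    same family lies at most `tandem_gap_max` genes before it (assumed
    chaining rule); otherwise it starts a new cluster and is kept.
    """
    last_seen: Dict[str, int] = {}
    kept: List[str] = []
    for idx, family in enumerate(families):
        prev = last_seen.get(family)
        if prev is None or idx - prev - 1 > tandem_gap_max:
            kept.append(family)  # first gene of its cluster
        last_seen[family] = idx  # chain on the last copy seen, kept or not
    return kept


# Hypothetical human segment with two segmental tandem duplications of C and D.
human = ["A", "B", "C", "D", "C", "D", "C", "D", "E"]
mouse = ["A", "B", "C", "D", "E"]
collapsed = collapse_tandem_duplicates(human, tandem_gap_max=1)
print(collapsed)           # ['A', 'B', 'C', 'D', 'E']
print(collapsed == mouse)  # True: the two segments are collinear again
```

With tandemGapMax = 0 the C and D copies are no longer adjacent to one another, so nothing is collapsed and the collinearity stays broken.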

Fig 4.

Chromosome orientations may influence the identification of conserved segments when tandem duplicates are collapsed.

Panel A shows the matrix of homologies from the comparison of a segment of human chromosome 17 with a segment of mouse chromosome 11 from Ensembl v81. For tandemGapMax ≥ 1, the human segment contains two clusters of tandem duplicates, one with two E genes and one with two G genes, while the mouse segment also contains two clusters of tandem duplicates for E and G, but with three copies each. Panel B shows the matrix of homology packs [5] after collapsing all clusters. Panel C shows the same data as Panel A, but with the mouse segment inverted on the y-axis, so that mouse genes are ranked in the opposite order. With the new orientation of the mouse segment, the gene content of the resulting conserved segment changes (ABCDEFGHIJK in Panel A and ABCDFHIJK in Panel D), but in both cases the two extremities are the same: the 5’ extremity of gene A and the 5’ extremity of gene K.
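
The orientation effect follows from collapsing onto the first gene in reading order: reading a segment in the opposite direction changes which copy counts as first. The hypothetical sketch below reuses the assumed chaining rule on a toy mouse segment with three tandem copies of E and reports the positions kept in each orientation, mapped back to the original coordinates.

```python
from typing import Dict, List


def kept_positions(families: List[str], tandem_gap_max: int) -> List[int]:
    """Positions kept when each tandem cluster is collapsed onto its first gene
    in reading order (same assumed chaining rule as for tandemGapMax)."""
    last_seen: Dict[str, int] = {}
    kept: List[int] = []
    for idx, family in enumerate(families):
        prev = last_seen.get(family)
        if prev is None or idx - prev - 1 > tandem_gap_max:
            kept.append(idx)
        last_seen[family] = idx
    return kept


# Hypothetical mouse segment with three tandem copies of E (positions 4, 5, 6).
mouse = ["A", "B", "C", "D", "E", "E", "E", "F"]
n = len(mouse)
forward = kept_positions(mouse, 1)
# Read the segment in the opposite orientation, collapse, then map back.
backward = sorted(n - 1 - i for i in kept_positions(mouse[::-1], 1))
print(forward)   # [0, 1, 2, 3, 4, 7]: the copy of E at position 4 survives
print(backward)  # [0, 1, 2, 3, 6, 7]: the copy of E at position 6 survives
```

In both orientations the segment still starts at A and ends at F, mirroring the observation above that the extremities are unchanged while the content differs.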

Fig 5.

Identifying micro-rearrangements and mono-genic conserved segments.

In Panel A, the homology matrix corresponds to two synteny blocks, one (purple diagonal) completely nested in the other (green diagonal). In addition, the large diagonal contains a nested single-gene homology (the “-” sign in the middle) and is adjacent to another single-gene homology (the bottom-right “-” sign). In Panel B, after post-processing, the two synteny blocks are broken down into four conserved segments, one of which is mono-genic (red homology), and the two single-gene homologies are identified as mono-genic conserved segments (blue and orange homologies). In Panel C, a synteny block (purple diagonal) is partially nested in another block (blue diagonal). In Panel D, identifying the corresponding micro-rearrangement yields three separate diagonals of conserved segments.
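
Whether one diagonal is completely or partially nested in another can be checked on their bounding boxes. The sketch below is an illustrative assumption, not the authors' implementation: each diagonal is a list of (x, y) homology coordinates in the homology matrix, the relation is decided only from the boxes' extents, and the function names and coordinates are hypothetical.

```python
from typing import List, Tuple

Homology = Tuple[int, int]       # (gene index on x-axis, gene index on y-axis)
Box = Tuple[int, int, int, int]  # (x_min, x_max, y_min, y_max)


def bounding_box(diagonal: List[Homology]) -> Box:
    xs = [x for x, _ in diagonal]
    ys = [y for _, y in diagonal]
    return min(xs), max(xs), min(ys), max(ys)


def contains(outer: Box, inner: Box) -> bool:
    return (outer[0] <= inner[0] and inner[1] <= outer[1]
            and outer[2] <= inner[2] and inner[3] <= outer[3])


def intersects(a: Box, b: Box) -> bool:
    return a[0] <= b[1] and b[0] <= a[1] and a[2] <= b[3] and b[2] <= a[3]


def relation(d1: List[Homology], d2: List[Homology]) -> str:
    """Classify how the bounding boxes of two diagonals relate."""
    a, b = bounding_box(d1), bounding_box(d2)
    if contains(a, b) or contains(b, a):
        return "completely nested"
    if intersects(a, b):
        return "partially nested"
    return "disjoint"


green = [(i, i) for i in (0, 1, 2, 5, 6, 7)]  # block with a gap at genes 3-4
purple = [(3, 4), (4, 3)]                     # micro-inversion inside the gap
blue = [(0, 0), (1, 1), (3, 3)]
purple2 = [(2, 5), (4, 2)]                    # sticks out of blue's box
print(relation(green, purple))                # completely nested
print(relation(blue, purple2))                # partially nested
print(relation(blue, [(10, 10), (11, 11)]))   # disjoint
```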

Fig 6.

Resolution of overlaps and of an apparently incorrect rupture of synteny.

The homology matrix in Panel A corresponds to the comparison of a segment of an opossum chromosome with a segment of a chicken chromosome, with tandemGapMax = 2, in Ensembl v81. The scenario that leads to the compared extant genomes is debatable. Apart from the insertion of gene X (grey gene on the y-axis), which is probably due to a dispersed duplication, the scenario may involve inversions, transpositions of genes over short distances, or tandem duplications (with copies inserted a few genes away from the copied ancestral genes) followed by deletions of the copied ancestral genes. Since tandem duplications and gene deletions seem to outnumber chromosomal rearrangements, we chose to consider that the true scenario involves only two tandem duplications followed by deletions of the copied ancestral genes, thus limiting both uncertain breakpoints and uncertain extremities of conserved segments. The homology matrix in Panel B shows the result of the truncation used to resolve small overlaps of diagonals. Panel C shows the result of merging the extremities of the truncated diagonals, which resolves the initial incorrect rupture of synteny delimited by the red rectangle in Panel A.
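
The truncation step can be sketched under strong simplifying assumptions: overlap is measured only along the genome-1 axis, and the shorter diagonal loses the homologies in the overlap when that overlap spans at most truncationMax positions (the parameter named in Fig 7). The function name and the tie-breaking rule are hypothetical; the actual procedure may also consider the genome-2 axis.

```python
from typing import List, Tuple

Homology = Tuple[int, int]  # (position in genome 1, position in genome 2)


def truncate_small_overlap(d1: List[Homology], d2: List[Homology],
                           truncation_max: int) -> Tuple[List[Homology], List[Homology]]:
    """Hypothetical truncation rule: if the genome-1 ranges of two diagonals
    overlap on at most `truncation_max` positions, remove the homologies lying
    in the overlap from the shorter diagonal; otherwise leave both unchanged."""
    lo = max(min(x for x, _ in d1), min(x for x, _ in d2))
    hi = min(max(x for x, _ in d1), max(x for x, _ in d2))
    overlap = hi - lo + 1
    if overlap <= 0 or overlap > truncation_max:
        return d1, d2

    def trim(d: List[Homology]) -> List[Homology]:
        return [(x, y) for x, y in d if not lo <= x <= hi]

    # ties trim d1; this choice is arbitrary in the sketch
    return (trim(d1), d2) if len(d1) <= len(d2) else (d1, trim(d2))


long_diag = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
short_diag = [(4, 10), (5, 11), (6, 12)]  # shares genome-1 position 4
a, b = truncate_small_overlap(long_diag, short_diag, truncation_max=4)
print(b)  # [(5, 11), (6, 12)]
```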

Fig 7.

Recall and precision analysis of PhylDiag and i-ADHoRe 3.0 based on simulated conserved segments of two distant species, mouse and chicken, which diverged 325 million years ago.

The analysis is based on a realistic simulation [9] (S1 Text) of gene order evolution that replicates features of the extant genomes of Ensembl 81. The first (left) column corresponds to the detection of extremities of conserved segments, the second (middle) column to the detection of gene adjacencies in conserved segments, and the third (right) column to the detection of gene names in conserved segments. For each item and each parameterisation of the algorithms, recall (top), precision (middle) and F1-score (bottom) are shown as a function of gapMax. The refinement methods described in this manuscript are imr (Identify Micro-Rearrangements), imcs (Identify Mono-genic Conserved Segments) and t (Truncation). A “-” sign means that the option is inactive, and an integer, even 0, means that the option is active. For the option imr, the integer value specifies the maximum gap allowed between the extremity of an identifiable micro-segment and the nearest homology of the diagonal in which it is included (S12 Fig). For the option imcs, the integer value sets the width of the neighbourhood around the edges of the bounding boxes of synteny-block diagonals in which mono-genic conserved segments are identified (S13 Fig). For the option t, the integer value specifies the truncationMax parameter. Truncating and resolving remaining overlaps with truncationMax = 4 does not substantially decrease recall or precision while ensuring that conserved segments do not overlap. The black curves represent the results of i-ADHoRe 3.0 for varying values of the parameters gap_size and cluster_gap, both equal to gapMax along the x-axis. The gap_size and cluster_gap parameters of i-ADHoRe 3.0 do not allow 0 values, so the i-ADHoRe 3.0 curves begin at gapMax = 1.
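
Recall, precision and F1-score over a set of items (extremities, adjacencies or gene names) follow the usual definitions: recall = |Tp|/|T|, precision = |Tp|/|D| and F1 = 2·precision·recall/(precision + recall). A minimal Python sketch, where returning 0 for an empty denominator is an assumed convention and the example sets are made up:

```python
from typing import Hashable, Set, Tuple


def evaluate(true: Set[Hashable], detected: Set[Hashable]) -> Tuple[float, float, float]:
    """Recall, precision and F1-score of a detected set against the true set.

    Convention (an assumption here): a ratio with an empty denominator is 0.
    """
    tp = len(true & detected)
    recall = tp / len(true) if true else 0.0
    precision = tp / len(detected) if detected else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, precision, f1


true_extremities = {"sA", "eA", "sB", "eB"}
detected_extremities = {"sA", "eA", "sB", "eC", "sD"}
print(evaluate(true_extremities, detected_extremities))
```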

Fig 8.

Recall and precision analysis of Cyntenator based on the same simulation as in Fig 7.

The varying parameter is threshold, the Cyntenator cut-off value below which gene alignments are discarded.

Fig 9.

Two scenarios, one with breakpoints and the other without, that cannot be distinguished with our data.

In Panel A, an initial chromosome of an ancestral species Sa evolves into two extant species, S1 and S2. No events take place from Sa to S2, but gene B is inverted between Sa and S1, creating two breakpoints and resulting in three conserved segments that are easily identified in the homology matrix at the bottom. In Panel B, gene B is tandemly duplicated in reverse orientation from Sa to S1, and the ancestral copy is deleted. When comparing the extant genomes of S1 and S2, the non-ancestral gene B (with no black outer line) is incorrectly considered an ancestral gene, so the homology matrix appears identical to that of the mono-genic inversion scenario in Panel A. Therefore 3 conserved segments are returned and 6 extremities of conserved segments are detected, whereas only one conserved segment, with two extremities, should be returned, as explained in Panel C. In Panel B, the sets of extremities of conserved segments used to calculate recall and precision are T = {sA,eC}, D = {sA,eA,sB,eB,sC,eC}, Tp = {sA,eC}, Fp = {eA,sB,eB,sC} and Fn = ∅, where sX is the 5’ extremity (start) of gene X and eX its 3’ extremity (end). Thus the recall is 100% and the precision is 33%. Similarly, if gene orientations are not considered, T = {A,C}, D = {A,B,C}, Tp = {A,C}, Fp = {B} and Fn = ∅, so the recall is 100% and the precision is 67%. For gene adjacencies with gene orientations, T = {eA-sC}, D = ∅, Tp = ∅, Fp = ∅ and Fn = {eA-sC}, where X-Y is the adjacency of gene extremity X and gene extremity Y. Thus recall and precision are both zero here. Finally, for gene names in conserved segments, T = {A,C}, D = {A,B,C}, Tp = {A,C}, Fp = {B} and Fn = ∅, so the recall is 100% and the precision is 67%.
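
The sets above can be derived mechanically from lists of conserved segments. In this hypothetical sketch, each segment is a list of (gene, strand) pairs read along the chromosome; the left extremity of a segment is the 5’ end of its first gene if that gene is forward (its 3’ end otherwise), and symmetrically on the right. The encoding and function names are assumptions; the example reproduces the Panel B numbers.

```python
from typing import List, Set, Tuple

Gene = Tuple[str, int]  # (name, strand): +1 forward, -1 reverse
Segment = List[Gene]


def left_end(gene: Gene) -> str:
    """Gene extremity met first when reading the segment left to right."""
    name, strand = gene
    return ("s" if strand > 0 else "e") + name


def right_end(gene: Gene) -> str:
    name, strand = gene
    return ("e" if strand > 0 else "s") + name


def extremities(segs: List[Segment]) -> Set[str]:
    return {left_end(s[0]) for s in segs} | {right_end(s[-1]) for s in segs}


def extremity_genes(segs: List[Segment]) -> Set[str]:
    # same as extremities, ignoring gene orientation
    return {s[0][0] for s in segs} | {s[-1][0] for s in segs}


def oriented_adjacencies(segs: List[Segment]) -> Set[str]:
    return {f"{right_end(a)}-{left_end(b)}" for s in segs for a, b in zip(s, s[1:])}


def gene_names(segs: List[Segment]) -> Set[str]:
    return {name for s in segs for name, _ in s}


def recall_precision(true: Set[str], detected: Set[str]) -> Tuple[float, float]:
    tp = len(true & detected)
    return (tp / len(true) if true else 0.0,
            tp / len(detected) if detected else 0.0)


# Panel B: the true segment is A then C; detection splits it around an inverted B.
true_segs = [[("A", 1), ("C", 1)]]
detected_segs = [[("A", 1)], [("B", -1)], [("C", 1)]]

for label, extract in [("extremities", extremities),
                       ("extremity genes", extremity_genes),
                       ("oriented adjacencies", oriented_adjacencies),
                       ("gene names", gene_names)]:
    print(label, recall_precision(extract(true_segs), extract(detected_segs)))
```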

Fig 10.

Breakpoints between tandem duplicates yield unresolvable false positive and false negative extremities of conserved segments.

In Panel A, an initial chromosome of an ancestral species Sa evolves into two extant species, S1 and S2. Along the lineage from Sa to S2 the chromosome is perfectly conserved, whereas along the lineage from Sa to S1 gene C is duplicated in tandem. The non-ancestral copy of gene C has no black outer line. An inversion then occurs, with its right breakpoint falling between the two tandem duplicates. From the extant genomes of S1 and S2, it is impossible to know which paralog of gene C in S1 is non-ancestral, so both copies are candidates for the ancestral gene C. The analysis of the homology matrix yields 3 conserved segments, but if the non-ancestral gene C is wrongly considered the ancestral gene, there are two false positive extremities and two false negative extremities. Panel B shows the desired homology matrix, obtained when the ancestral gene C is correctly identified. In Panel A, the sets of extremities of conserved segments used to calculate recall and precision are T = {sA,eA,sB,eB,sC,eD}, D = {sA,eA,sB,eC,sD,eD}, Tp = {sA,eA,sB,eD}, Fp = {eC,sD} and Fn = {eB,sC}, where sX is the 5’ extremity (start) of gene X and eX its 3’ extremity (end). Thus recall and precision are both equal to 67%. For the detection of gene adjacencies without gene orientations, T = {C-D}, D = {B-C}, Tp = ∅, Fp = {B-C} and Fn = {C-D}, where X-Y is the adjacency of genes X and Y. The associated recall and precision are thus both zero here.
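
The Fig 10 numbers can be reproduced in the same way. For adjacencies without orientation, an unordered pair such as a `frozenset` makes B-C and C-B the same item. The sketch assumes all genes are read in forward orientation, which is enough to recover the extremity sets listed above; the function names are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple


def segment_extremities(segs: List[List[str]]) -> Set[str]:
    # Toy assumption: every gene is read in forward orientation.
    return {"s" + s[0] for s in segs} | {"e" + s[-1] for s in segs}


def unoriented_adjacencies(segs: List[List[str]]) -> Set[FrozenSet[str]]:
    # frozenset makes B-C and C-B the same adjacency
    return {frozenset(pair) for s in segs for pair in zip(s, s[1:])}


def recall_precision(true: Set, detected: Set) -> Tuple[float, float]:
    tp = len(true & detected)
    return (tp / len(true) if true else 0.0,
            tp / len(detected) if detected else 0.0)


true_segs = [["A"], ["B"], ["C", "D"]]      # ancestral copy of C identified
detected_segs = [["A"], ["B", "C"], ["D"]]  # non-ancestral copy of C used

t_ext, d_ext = segment_extremities(true_segs), segment_extremities(detected_segs)
print("Fp:", sorted(d_ext - t_ext), "Fn:", sorted(t_ext - d_ext))
print(recall_precision(t_ext, d_ext))
print(recall_precision(unoriented_adjacencies(true_segs),
                       unoriented_adjacencies(detected_segs)))
```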
