Figure 1.
Frequency distribution of pairwise sequence divergence between 6929 single-copy gene orthologs in T. b. brucei and T. b. gambiense.
Divergence values to the right of the dashed line are statistically significant; the identities of selected divergent gene pairs are noted. An asterisk * denotes genes belonging to a T. b. brucei subspecies-specific tandem gene array (see Figure 2 and Supplementary Figure S2).
Table 1.
Species-specific segmental duplications in (a) T. b. gambiense and (b) T. b. brucei.
Figure 2.
Segmental duplication on chromosome 9 in T. b. brucei.
A single segment in T. b. gambiense comprising three coding sequences (Tbg972.9.4160, 4140 and 4130) corresponds to a three-gene segmental duplication (five repeats) on chromosome 9 in T. b. brucei. The first coding sequence (shaded red) is a conserved hypothetical gene encoding a putative secretory protein, and all copies are identical. The second (shaded yellow) and third (shaded orange) coding sequences are tandem-duplicate, conserved hypothetical genes encoding putative membrane-bound proteins. Both the second and third genes contain substantial sequence variation in T. b. brucei; the upstream-most copies are orthologous to the T. b. gambiense genes, but none of the remaining variants were identified among T. b. gambiense sequence reads. The segmental duplication is immediately preceded upstream by an INGI-mediated insertion (shaded purple).
Figure 3.
Analysis of sequence variation among T. brucei 65 kDa invariant surface glycoprotein genes.
Gene copies are numbered consecutively in positional order from left to right on the chromosome, beginning with Tbb1 (Tb927.2.3270) and Tbg1 (Tbg972.2.1130), respectively. a. Sequence variation (expressed as Shannon entropy score, left scale) along a multiple sequence alignment, combined with recombination breakpoints (red lines, right scale) inferred by GARD analysis. A dotted blue line marks the boundary between coding and 3′ UTR regions. Coloured bars above the chart indicate recombination tracts as identified by GARD. b. A phylogenetic network including sequences unique to one subspecies (marked with an asterisk *). Annotated pseudogenes are indicated by ψ. c. A chart showing the affinities of sequence Tbg7 only; all sequences are represented in a circle, and coloured bars connect Tbg7 (or regions thereof) to its closest relative among the other sequences. Different colours denote different affinities. d. Affinity chart for Tbg1.
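For readers unfamiliar with the entropy score plotted in panel a, the following is a minimal sketch of per-column Shannon entropy over a multiple sequence alignment. It is illustrative only: the function names, the toy alignment, the use of log base 2, and the decision to ignore gap characters are all assumptions, not details taken from the analysis behind the figure.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of one alignment column.

    Assumption: gap characters ('-') are ignored rather than
    treated as an additional state.
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    # H = sum over observed residues of -p * log2(p)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def alignment_entropy(sequences):
    """Entropy score for every column of an alignment of equal-length strings."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return [column_entropy(col) for col in zip(*sequences)]


# Hypothetical four-sequence toy alignment, not real ISG65 data.
toy_alignment = ["ACGT", "ACGA", "ATGC", "ACG-"]
scores = alignment_entropy(toy_alignment)
```

Invariant columns score zero, and the score rises as more distinct residues share a column, so peaks along the alignment mark the most variable positions.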
Table 2.
Evidence for recombination within variable tandem gene arrays conserved in T. b. gambiense and T. b. brucei.
Figure 4.
Variant surface glycoprotein (VSG) repertoire of T. b. brucei (927) represented as a three-dimensional network graph.
A total of 1258 T. b. brucei VSG protein sequences were compared using pairwise BLAST searches. BLAST scores were used to arrange VSG into a graph using BioLayout Express 3D 3.0. The graph represents 968 individual VSG as coloured spheres, each joined by edges to all other nodes with which it shares >70% amino acid identity. The program arranges the nodes so that the overall distance between connected nodes is minimized and related nodes lie closest to one another. Nodes are shaded by type: orthologous sequence in T. b. gambiense (blue), orthologous sequence in T. b. gambiense, but closest relative in T. b. brucei (green), no corresponding sequence in T. b. gambiense (red), metacyclic-stage VSG (purple) and VSG-related (VR) proteins (yellow).
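The edge rule described above (an edge wherever two sequences share >70% amino acid identity) can be sketched as a simple thresholded graph. This is a rough illustration, not the BioLayout Express 3D procedure: the function names, the input format of (sequence, sequence, percent identity) tuples, and the toy values are hypothetical, and no layout step is attempted.

```python
from collections import defaultdict


def build_identity_graph(pairs, threshold=70.0):
    """Build an undirected graph from (seq_a, seq_b, percent_identity) tuples.

    An edge is added only when identity is strictly greater than the
    threshold. Sequences with no qualifying partner do not appear as nodes
    (a simplification made for this sketch).
    """
    graph = defaultdict(set)
    for a, b, identity in pairs:
        if a != b and identity > threshold:
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def connected_components(graph):
    """Group nodes into clusters of sequences linked by chains of edges."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        components.append(component)
    return components


# Hypothetical pairwise identities, not real VSG comparisons.
toy_pairs = [
    ("VSG_a", "VSG_b", 85.0),
    ("VSG_b", "VSG_c", 72.5),
    ("VSG_c", "VSG_d", 70.0),  # exactly 70% does not pass the >70% rule
    ("VSG_e", "VSG_f", 91.0),
    ("VSG_a", "VSG_d", 40.0),
]
graph = build_identity_graph(toy_pairs)
clusters = connected_components(graph)
```

In this toy input, VSG_a, VSG_b and VSG_c form one cluster and VSG_e and VSG_f another, while VSG_d has no qualifying edge and is left out of the graph.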