Breast cancer is marked by specific, public T-cell receptor CDR3 regions shared by mice and humans
Fig 4
Specific cross-species TCRβ clones dominate the tumor-developing group.
(A) We ranked the 258 cross-species sequences by their abundance in each sample. To visualize the similarity between the ranking at each time point and the ranking in samples from early-stage breast cancer patients (Beausang et al.), we built a stacked bar for each time point from the 258 ranked sequences, one segment per sequence. The area of each segment is inversely proportional to the product of the sequence's rank in the human samples and its rank in the mouse samples. Thus, a clonotype ranked #1 both in the mouse samples at a given time point and in the human samples receives the largest area. Each sequence keeps the same color across bars, which shows that three sequences (blue, orange and green) dominate the similarity between samples. These sequences are listed in Table 1 and, as the table indicates, have previously been associated with melanoma, influenza and diabetes. All other sequences are shown in grey. (B) We used IGoR (38) (see Text) to examine differences between the population of cross-species sequences and the full collection of sequences. Each sequence is plotted as a dot: red dots represent cross-species sequences, blue dots represent the full set of sequences, and green dots represent the 4700 NT sequences that encode the 258 AA sequences shared across all time points in our mouse samples and with the human samples. The vertical position of each dot is given by its IGoR value and the horizontal position by its CR value. The curves on the right show histograms of the IGoR values, and the curves at the top show histograms of the CR values. We used a Kolmogorov–Smirnov test to compare the distributions; the difference between the sequence populations is highly significant (p < 2.2 × 10⁻¹⁶).
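The segment weighting in panel A and the distribution comparison in panel B can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, ties in abundance are broken alphabetically (an assumption), and the p-value uses the standard asymptotic Kolmogorov series with the Numerical Recipes small-sample correction rather than an exact test. In practice, `scipy.stats.ks_2samp` computes the same two-sample statistic.

```python
import math

# Hypothetical helpers illustrating panels A and B; not the authors' code.

def rank_by_abundance(counts):
    """Map each sequence to its rank (1 = most abundant).
    Ties are broken alphabetically, which is an assumption."""
    ordered = sorted(counts, key=lambda seq: (-counts[seq], seq))
    return {seq: rank for rank, seq in enumerate(ordered, start=1)}

def segment_areas(human_counts, mouse_counts):
    """Area of each sequence's segment: 1 / (human rank * mouse rank),
    computed over sequences present in both samples."""
    shared = human_counts.keys() & mouse_counts.keys()
    h = rank_by_abundance({s: human_counts[s] for s in shared})
    m = rank_by_abundance({s: mouse_counts[s] for s in shared})
    return {s: 1.0 / (h[s] * m[s]) for s in sorted(shared)}

def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic D = max_t |F_x(t) - F_y(t)|."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        t = min(xs[i], ys[j])
        # Advance both empirical CDFs past every value equal to t (handles ties).
        while i < n and xs[i] <= t:
            i += 1
        while j < m and ys[j] <= t:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d

def ks_pvalue(d, n, m, terms=100):
    """Asymptotic two-sided p-value from the Kolmogorov series
    Q(lam) = 2 * sum_k (-1)^(k-1) exp(-2 k^2 lam^2), with the
    Numerical Recipes small-sample correction applied to lam."""
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:
        return 1.0  # the series converges poorly here and Q(lam) is ~1
    q = sum(2 * (-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, q))

if __name__ == "__main__":
    # Toy counts for illustration only.
    areas = segment_areas({"A": 10, "B": 5, "C": 1}, {"A": 3, "B": 7, "C": 2})
    print(areas)  # A and B each get 0.5, C gets 1/9
    x, y = [1, 2, 3, 4, 5], [6, 7, 8, 9, 10]
    d = ks_statistic(x, y)
    print(d, ks_pvalue(d, len(x), len(y)))  # d is 1.0
```

With the 1/(rank × rank) weighting, a sequence ranked first in both species dominates its bar, while a sequence ranked low in either species contributes a vanishingly small segment.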
The three CR histograms at the top of the panel show that the CR values of the three populations also follow markedly different distributions (p < 2.2 × 10⁻¹⁶). The locations of the two sequences described in panel A are also highlighted. (C) CR sources of the two highly ranked clones from the cross-species analyses. Each nucleotide sequence is connected to its translated AA sequence; edges are blue if they originate from a mouse sequence and red if they originate from a human sequence. The colored bars represent the NT sequences that encode these AA sequences, with one color per nucleotide: T, blue; G, yellow; C, green; A, red. As the panel shows, the sources of these two AA sequences do not overlap. (D) Number of sequences that differ by 1 or 2 AA from CASSLGYEQYF (grey bars), from CASSLSYEQYF (yellow bars) and from 1000 random sequences (blue bars). The x-axis shows the time points in our mouse data, and the y-axis shows the number of similar sequences (Levenshtein edit distance of at most 2). (E) Sequences that are close in edit distance to tumor-associated sequences (orange bars) and to 1000 random sequences (blue bars). (F) Networks of sequences similar to CASSLGYEQYF and CASSLSYEQYF (upper networks) and to two random sequences (lower networks) over time.
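Panel D counts, for a query CDR3 at each time point, the repertoire sequences within Levenshtein distance 1 or 2, and panel F draws such neighbors as networks. The sketch below assumes plain unweighted edit distance on AA strings and, for the networks, an edge between any two distinct sequences within distance 2; that edge rule is an assumption, since the caption does not state it. The function names and the toy repertoire are hypothetical. For the random baseline, the same count would be taken for each random query; how those 1000 sequences were drawn is not given here, so the sketch omits it.

```python
from itertools import combinations

# Hypothetical helpers illustrating panels D and F; not the authors' code.

def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions), row-by-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = cur
    return prev[-1]

def count_neighbors(query, repertoire, max_dist=2):
    """Distinct sequences at edit distance 1..max_dist from query
    (the query itself, at distance 0, is excluded)."""
    return sum(1 for s in set(repertoire) if 1 <= levenshtein(query, s) <= max_dist)

def similarity_edges(seqs, max_dist=2):
    """Network edges between distinct sequences within max_dist.
    Using the same threshold for edges is an assumption."""
    uniq = sorted(set(seqs))
    return [(a, b) for a, b in combinations(uniq, 2)
            if levenshtein(a, b) <= max_dist]

if __name__ == "__main__":
    # Toy repertoire for one time point (made-up sequences for illustration).
    repertoire = ["CASSLGYEQYF", "CASSLSYEQYF", "CASSPGYEQYF",
                  "CASSLGYEQF", "CASRRRRRRRR"]
    print(count_neighbors("CASSLGYEQYF", repertoire))  # 3
    print(similarity_edges(repertoire))
```

Repeating `count_neighbors` on the repertoire of each time point yields one bar per time point, as on the x-axis of panel D; the all-pairs comparison in `similarity_edges` is quadratic, which is adequate for the small neighborhoods drawn in panel F but not for whole repertoires.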