Breast cancer is marked by specific, Public T-cell receptor CDR3 regions shared by mice and humans

doi:10.1371/journal.pcbi.1008486

Fig 1.

Experimental procedure.

(A) A total of 120 blood samples were drawn, at 8 time points, from the retro-orbital sinus of 10 FVB/N-Tg(MMTVneu) mice (a mouse model of human HER2 breast cancer) and 5 FVB/NJ Control mice. Over the 8 time points, none of the Control mice (blue) developed tumors. We use the term "Transgenic" for the FVB/N-Tg(MMTVneu)202Mul/J strain and "Control" for the FVB/NJ strain. Tumor progression in the ten Transgenic mice is shown by the red samples in the figure. The last time point before tumors appeared was defined as pre-cancer and is marked in light red. (B) At each time point, peripheral blood mononuclear cells were isolated and stained for flow cytometry. Cells were analyzed and gated for sorting on a FACS ARIA III sorter, and the naïve CD4⁺CD62L^hiCD44^lo population was sorted for RNA extraction and T cell receptor library preparation (see Methods). TCR alpha and TCR beta were sequenced separately rather than by single-cell TCR sequencing, so alpha-beta pairing is not possible.
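The sampling design and the pre-cancer definition above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function `label_time_points` and every label other than "pre-cancer" ("earlier", "tumor", "no-tumor") are hypothetical, and time points are assumed to be numbered 1 to 8.

```python
from typing import Optional

N_TIME_POINTS = 8  # 8 sampling time points, per the caption


def label_time_points(first_tumor_tp: Optional[int],
                      n_time_points: int = N_TIME_POINTS) -> list:
    """Label time points 1..n_time_points for a single mouse.

    first_tumor_tp is the first (1-based) time point at which a tumor was
    observed, or None if the mouse never developed one (e.g. Control mice).
    Label names other than "pre-cancer" are hypothetical.
    """
    labels = []
    for tp in range(1, n_time_points + 1):
        if first_tumor_tp is None:
            labels.append("no-tumor")
        elif tp < first_tumor_tp - 1:
            labels.append("earlier")
        elif tp == first_tumor_tp - 1:
            labels.append("pre-cancer")  # last time point before tumors appear
        else:
            labels.append("tumor")
    return labels


# 10 Transgenic + 5 Control mice, each sampled at every time point.
n_transgenic, n_control = 10, 5
total_samples = (n_transgenic + n_control) * N_TIME_POINTS
print(total_samples)  # 120
print(label_time_points(5))
# ['earlier', 'earlier', 'earlier', 'pre-cancer', 'tumor', 'tumor', 'tumor', 'tumor']
```

A mouse whose tumor is already visible at the first time point has no pre-cancer sample under this rule.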

Fig 2.

Convergent recombination dominates the Public repertoire and the repertoire of tumor-developing mice.

Public repertoires and their subtypes in tumor-developing mice and in Control mice, based on nucleotide sequences (A) and amino-acid (AA) sequences (B) of the α chain. The Y-axis shows the percentage of “unique” clones, using the common definition of “unique” in which each sequence is counted only once, regardless of its copy number. Both panels show the α chain; a similar effect is seen in the β chain (S1 Fig). Each bar is divided into the relative abundances of the different categories, so the colored segments of a bar together sum to 100% of the sequences. (C, D) Convergent recombination (CR) in the α chain of Control mice (C) and tumor-developing mice (D). The upper bars show nucleotide (NT) sequences divided into the Public groups (Private, Public-Inclusive and Public-Exclusive), and the lower bars show the corresponding AA sequences. The lines between the NT and AA bars show the effect of convergent recombination, in which different NT sequences encode the same AA sequence and thereby shift the Public/Private balance. For example, in panel C, 5% of the NT sequences that were Private at the NT level became 66% of the Public-Exclusive sequences at the AA level. (E, F) Frequencies of CR clones in the Public repertoire of the α and β chains. (G, H) Correlation between the average mouse-to-mouse sharing level and the CR level in the β chain, for the Control versus tumor-developing groups (G) and for the Early-tumor group versus the Cancer group (H).
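Counting "unique" sequences and measuring convergent recombination can be sketched as below. This is a minimal sketch, not the authors' code: it assumes the CR level of an AA sequence is the number of distinct NT sequences that translate to it, uses the standard genetic code, and the function names and reads are hypothetical.

```python
from collections import defaultdict

BASES = "TCAG"
# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(nt: str) -> str:
    """Translate an in-frame NT sequence; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, len(nt) - 2, 3))


def cr_levels(nt_reads: list) -> dict:
    """Map each AA sequence to the number of distinct NT sequences encoding it."""
    unique_nt = set(nt_reads)  # "unique": each sequence counted once, copy number ignored
    nt_by_aa = defaultdict(set)
    for nt in unique_nt:
        nt_by_aa[translate(nt)].add(nt)
    return {aa: len(nts) for aa, nts in sorted(nt_by_aa.items())}


# Hypothetical reads: two different NT sequences encode CASSLGYEQYF.
reads = [
    "TGTGCCAGCAGCTTAGGGTATGAACAGTACTTC",  # CASSLGYEQYF
    "TGTGCCAGCAGCTTAGGGTATGAACAGTACTTC",  # same clone again (copy number ignored)
    "TGTGCCAGCAGTCTGGGGTACGAGCAGTACTTC",  # CASSLGYEQYF, different codons
    "TGTGCCAGCAGCTTATCCTATGAACAGTACTTC",  # CASSLSYEQYF
]
print(cr_levels(reads))  # {'CASSLGYEQYF': 2, 'CASSLSYEQYF': 1}
```

Three unique NT sequences collapse to two AA sequences here, which is the NT-to-AA shift that panels C and D visualize.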

Fig 3.

Comparing human and mouse T cell repertoire data.

To learn more about the connection between the T cell repertoires in mouse mammary tumors and in human breast cancer, we studied three types of datasets (left to right): (1) TCR β-seq data from 50 breast cancer patients in 3 studies covering different conditions and tissues; (2) single-cell RNA-seq TCR data from 3 breast cancer patients, obtained from Azizi et al.; and (3) TCR sequences extracted from RNA-seq data of breast cancer patients in The Cancer Genome Atlas (TCGA).

Fig 4.

Specific cross-species TCRβ clones dominate the tumor-developing group.

(A) We ranked the 258 cross-species sequences by their abundance in each sample. To visualize the similarity between the ranking at each time point and the ranking in samples from early-stage breast cancer patients (Beausang et al.), we stacked bars from each of the 258 ranking lines. The area of each bar is inversely proportional to the product of the sequence's rank in human samples and its rank in mouse samples, so a clonotype ranked #1 both in the mouse samples of a given time point and in the human samples has the largest area. Each sequence keeps its color across bars, which shows that three sequences (blue, orange and green) dominate the similarity between samples. These sequences are listed in Table 1 and, as the table indicates, have previously been associated with melanoma, influenza and diabetes. All other sequences are shown in grey. (B) We used IGoR (38) (see Text) to compare the cross-species sequences with the full collection of sequences. Each sequence is plotted as a dot: red dots are cross-species sequences, blue dots are the full set of sequences, and green dots are the 4700 NT sequences that encode the 258 AA sequences shared across all time points in our mouse samples and with human samples. The vertical position of each dot is its IGoR value and the horizontal position is its CR value. The curves on the right show histograms of the IGoR values, and the curves on top show histograms of the CR values. A Kolmogorov–Smirnov test of the differences between the IGoR value distributions gave a highly significant p-value (p < 2.2×10^-16), indicating that the sequence populations differ.
The three CR histograms at the top of the panel show that the CR values of the three sequence populations also differ significantly (p < 2.2×10^-16). The locations of the two sequences described in panel A are also highlighted. (C) CR sources of the two top-ranked clones in the cross-species analyses. Each nucleotide sequence is connected to its translated AA sequence; edges are blue if they originate from a mouse sequence and red if they originate from a human sequence. The colored bars represent the NT sequences that encode these AA sequences, with each color representing a different nucleotide: T, blue; G, yellow; C, green; A, red. As the panel shows, there is no overlap between the sources of these two AA sequences. (D) The number of sequences that differ by 1 or 2 AA from CASSLGYEQYF (grey bars), from CASSLSYEQYF (yellow bars) and from 1000 random sequences (blue bars). The X-axis shows the time points of our mouse data, and the Y-axis shows the number of similar sequences (Levenshtein edit distance of at most 2). (E) Sequences close in edit distance to tumor-associated sequences (orange bars) and to 1000 random sequences (blue bars). (F) Networks of sequences similar to CASSLGYEQYF and CASSLSYEQYF (upper networks) and to 2 random sequences (lower networks) over time.
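Two computations in this caption can be sketched as follows: the bar area of panel A, taken here as 1/(human rank × mouse rank), and the count of sequences at Levenshtein distance 1 or 2 from a query, as in panel D. This is a minimal illustration under stated assumptions, not the authors' code: ranks are computed only among sequences present in both inputs, ties are broken alphabetically, any overall scaling of the areas is ignored, and all function names and abundance values are hypothetical.

```python
def rank_by_abundance(counts: dict) -> dict:
    """Rank sequences by abundance (1 = most abundant); ties broken alphabetically."""
    ordered = sorted(counts, key=lambda s: (-counts[s], s))
    return {seq: rank for rank, seq in enumerate(ordered, start=1)}


def bar_areas(mouse_counts: dict, human_counts: dict) -> dict:
    """Bar area of each cross-species sequence: 1 / (human rank x mouse rank)."""
    shared = mouse_counts.keys() & human_counts.keys()
    mouse_rank = rank_by_abundance({s: mouse_counts[s] for s in shared})
    human_rank = rank_by_abundance({s: human_counts[s] for s in shared})
    return {s: 1.0 / (human_rank[s] * mouse_rank[s]) for s in shared}


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions), two-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def count_neighbors(query: str, repertoire, max_dist: int = 2) -> int:
    """Number of distinct sequences at edit distance 1..max_dist from query."""
    return sum(1 for s in set(repertoire) if 1 <= levenshtein(query, s) <= max_dist)


# Hypothetical abundances for one mouse time point and one human sample.
mouse = {"CASSLGYEQYF": 50, "CASSLSYEQYF": 30, "CASSPGQGYEQYF": 5}
human = {"CASSLGYEQYF": 40, "CASSLSYEQYF": 20, "CASSIRSSYEQYF": 7}
print(sorted(bar_areas(mouse, human).items()))
# [('CASSLGYEQYF', 1.0), ('CASSLSYEQYF', 0.25)]

repertoire = ["CASSLGYEQYF", "CASSLSYEQYF", "CASSLGYEQF", "CASRPGQGYEQYF"]
print(count_neighbors("CASSLGYEQYF", repertoire))  # 2
```

The query itself (distance 0) is excluded from the neighbor count, matching the caption's "differ by 1 or 2 AA".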

Table 1.

CDR3 sequences that show cross-species, tumor-exclusive behavior.

Fig 5.

Cross-species tumor-associated clones.

(A) The average predicted binding score to tumor peptides of the tumor-associated cross-species sequences (right bar) and of 5 random subsets of sequences. (B) The average Jaccard overlap index of Control-exclusive sequences and of tumor-exclusive sequences with breast cancer samples (left plot) and melanoma samples (right plot). (C) The frequency of highly abundant sequences from ‘Transgenic Old’ samples and of random sequences in breast cancer samples (left plot) and melanoma samples (right plot).
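The Jaccard overlap index in panel B is |A ∩ B| / |A ∪ B|. A minimal sketch of averaging it over samples, assuming each sample is represented as a set of CDR3 sequences; the function names and sequence sets are hypothetical.

```python
from statistics import mean


def jaccard(a: set, b: set) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B| (defined here as 0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_jaccard(query: set, samples: list) -> float:
    """Average Jaccard index between one sequence set and each sample's set."""
    return mean([jaccard(query, s) for s in samples])


# Hypothetical CDR3 sets.
tumor_exclusive = {"CASSLGYEQYF", "CASSLSYEQYF"}
patient_samples = [{"CASSLSYEQYF", "CASSPGQGYEQYF"}, {"CASSLGYEQYF", "CASSLSYEQYF"}]
print(mean_jaccard(tumor_exclusive, patient_samples))  # ≈ 0.667
```

The first sample shares 1 of 3 distinct sequences (1/3) and the second shares both (1), so the average is 2/3.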

Fig 6.

The relationship between TCR clones and different breast cancer stages.

Public repertoires of the TCGA-associated TCR clones and their subtypes in tumor-developing mice and in Control mice, for the α chain (A) and the β chain (B). Frequency of convergently recombined clones shared between TCGA samples and mouse samples in the α chain (C) and the β chain (D). (E) Frequency of α-chain clones shared between TCGA samples and tumor-developing mice at different breast cancer stages. For each sample, the number of shared clones was normalized as follows: (number of clones shared between the sample and the stage) / (number of clones in the sample × total number of cases at that stage).
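The normalization in panel E can be sketched as below. This is a minimal sketch, not the authors' code: it assumes that "shared clones between sample and stage" means the sum, over the cases at that stage, of the clones the sample shares with each case; the function name and clone sets are hypothetical.

```python
def normalized_shared_frequency(sample_clones: set, stage_cases: list) -> float:
    """Shared clones / (# clones in sample x # cases at the stage).

    ASSUMPTION: "shared clones between sample and stage" is taken as the sum,
    over the stage's cases, of the clones the sample shares with each case.
    """
    if not sample_clones or not stage_cases:
        return 0.0
    shared = sum(len(sample_clones & case) for case in stage_cases)
    return shared / (len(sample_clones) * len(stage_cases))


# Hypothetical α-chain clone sets: one sample, and the cases assigned to one stage.
sample = {"CAVSA", "CALSG", "CAASN", "CAMRE"}
stage_cases = [{"CAVSA", "CIVRV"}, {"CAVSA", "CALSG"}]
print(normalized_shared_frequency(sample, stage_cases))  # 0.375
```

Dividing by both factors keeps samples with many clones, and stages with many cases, from dominating the comparison.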

Fig 7.

Transgenic mice share a higher similarity with human breast cancer patients than Control mice.

(A) The number of distinct human α-β pairs shared with our mouse data in the Control group (blue bar) and in the Transgenic group (red bar). (B) The percentage of convergently recombined (CR) human α-β pairs in the Control and Transgenic mice. (C) The percentage of each Public group (Private, Public-Inclusive and Public-Exclusive) in the Control group (blue bars) and the Transgenic group (red bars). (D) Distance networks of all the α-β pairs shared with our subsampled mouse data that appear only in the Control group (blue network), only in the Transgenic group (red network) and in both groups (green network).
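The grouping behind panels A and D can be sketched with set operations. This sketch assumes each human α-β pair is a (CDR3α, CDR3β) tuple and that the sets of pairs shared with each mouse group are already known; how sharing is decided for the unpaired mouse chains is not modeled, and all names and sequences are hypothetical.

```python
def partition_shared_pairs(control: set, transgenic: set) -> dict:
    """Split human α-β pairs into the three groups drawn as networks in panel D."""
    return {
        "control_only": control - transgenic,
        "transgenic_only": transgenic - control,
        "both": control & transgenic,
    }


# Hypothetical (alpha CDR3, beta CDR3) pairs shared with each mouse group.
control_pairs = {("CAVSA", "CASSLGYEQYF"), ("CALSG", "CASSQDRGNTLYF")}
transgenic_pairs = {
    ("CAVSA", "CASSLGYEQYF"),
    ("CAASN", "CASSLSYEQYF"),
    ("CAMRE", "CASSPGQGYEQYF"),
}

groups = partition_shared_pairs(control_pairs, transgenic_pairs)
print({name: len(pairs) for name, pairs in groups.items()})
# {'control_only': 1, 'transgenic_only': 2, 'both': 1}
print(len(control_pairs), len(transgenic_pairs))  # 2 3  (panel A-style counts)
```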
