Comprehensive analysis of structural and sequencing data reveals almost unconstrained chain pairing in TCRαβ complex

doi:10.1371/journal.pcbi.1007714

Fig 1.

Contact map of TCRαβ complexes.

A heatmap of inter-chain residue contact frequencies observed in n = 131 human and n = 39 mouse TCR:peptide:MHC complexes. Residue pairs having a distance between closest atoms of less than 5Å in at least one complex were considered in the analysis. Contact frequency was estimated by counting the number of times a given residue pair has a C_α distance of less than 15Å in PDB structures. CDR regions are shown with dashed lines; the excluded middle portion of CDR3 is shown with a dotted line.
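
The two-step procedure in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the input layout (one dictionary per structure mapping a residue pair to its closest-atom and C_α distances), the function name and the toy numbers are all assumptions, and expressing the count as a fraction of structures is likewise an assumption.

```python
def contact_frequencies(structures, atom_cutoff=5.0, ca_cutoff=15.0):
    """Estimate inter-chain contact frequencies from per-structure distances.

    structures: list of dicts, one per complex (hypothetical layout), mapping
    (alpha_position, beta_position) -> (closest_atom_distance, c_alpha_distance) in Å.
    """
    # Step 1: keep residue pairs whose closest atoms come within atom_cutoff
    # in at least one structure.
    candidates = {
        pair
        for s in structures
        for pair, (closest, _ca) in s.items()
        if closest < atom_cutoff
    }
    # Step 2: for each retained pair, count the structures in which the
    # C-alpha distance is below ca_cutoff, as a fraction of all structures.
    freq = {}
    for pair in candidates:
        hits = sum(1 for s in structures if pair in s and s[pair][1] < ca_cutoff)
        freq[pair] = hits / len(structures)
    return freq


# Toy data: three structures, two residue pairs (values are made up).
toy_structures = [
    {("a57", "b110"): (3.8, 9.5), ("a10", "b20"): (22.0, 30.0)},
    {("a57", "b110"): (6.1, 14.0), ("a10", "b20"): (21.0, 29.0)},
    {("a57", "b110"): (7.0, 16.2)},
]
toy_freqs = contact_frequencies(toy_structures)
```

In the toy data only the first pair passes the 5Å filter, so the second pair never enters the frequency table.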

Fig 2.

Variability and contact frequency of TCR chain residues.

a. Contact frequency (red bars) and information content (blue line) across α and β chains. For human data, the transparent blue line shows information content computed from residue frequencies weighted by corresponding V and J gene usage in donor repertoires. Dashed lines show FR-CDR boundaries. b. Scatter plot of the probability of being involved in an inter-chain contact (x axis) against the information content of the amino acid frequency distribution (y axis) for individual α and β chain residues. The region containing frequently contacted residues (x > 0.75) that have variable amino acid content (y < 0.75) is highlighted with a red background. Residues with higher information content should be considered less variable; residues having no inter-chain contacts are not shown.
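
The caption does not state how information content is normalized. The sketch below assumes a common definition, 1 − H / log₂(20), where H is the Shannon entropy of the amino acid distribution at a position, so that values lie between 0 (uniform) and 1 (fully conserved). The function names are hypothetical.

```python
import math
from collections import Counter


def information_content(residues, alphabet_size=20):
    """Normalized information content of one alignment column.

    Assumed definition (not confirmed by the source): 1 - H / log2(alphabet_size).
    """
    counts = Counter(residues)
    total = sum(counts.values())
    entropy = 0.0
    for c in counts.values():
        p = c / total
        entropy -= p * math.log2(p)
    return 1.0 - entropy / math.log2(alphabet_size)


def is_variable_contact(contact_probability, info, x_min=0.75, y_max=0.75):
    """Selection rule from panel b: frequently contacted and variable."""
    return contact_probability > x_min and info < y_max


conserved = information_content("CCCCCCCC")              # single amino acid
uniform = information_content("ACDEFGHIKLMNPQRSTVWY")    # all 20 equally frequent
```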

Fig 3.

Amino acid statistics at contacting residues.

A two-dimensional density plot comparing the number of times a given amino acid combination is observed at an inter-chain contact, n_O, versus the number of times it was expected to be observed, n_E. The expected count n_E is calculated using amino acid frequency distributions at separate chains and assuming random amino acid pairing; a higher n_O / n_E ratio suggests enrichment of a given amino acid pair at the corresponding contacting residues. The number of contacting residue pairs observed with certain n_O and n_E values (density of points at a given bin) is highlighted by color. Dotted lines show the 95% confidence interval for the n_O / n_E ratio assuming a Normal distribution with standard deviation computed from a Binomial distribution approximated by n_E and the total number of observations n_T ≈ 3×10⁵.
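
One way to read the expected count and the confidence band is sketched below. It assumes n_E = n_T · p_α · p_β under random pairing and a Normal approximation with Binomial standard deviation sqrt(n_T · p · (1 − p)), where p = n_E / n_T. The function name and example frequencies are illustrative only.

```python
import math


def expected_and_ratio_ci(p_alpha, p_beta, n_total, z=1.96):
    """Expected pair count and 95% band for n_O / n_E under random pairing.

    p_alpha, p_beta: amino acid frequencies at the two contacting positions.
    n_total: total number of observations (n_T).
    """
    p = p_alpha * p_beta                       # probability of the pair if chains pair at random
    n_e = n_total * p                          # expected count n_E
    sd = math.sqrt(n_total * p * (1.0 - p))    # Binomial standard deviation
    lower = (n_e - z * sd) / n_e               # band expressed as a ratio to n_E
    upper = (n_e + z * sd) / n_e
    return n_e, lower, upper


# Illustrative frequencies with n_T = 3e5 as quoted in the caption.
n_e, ratio_lo, ratio_hi = expected_and_ratio_ci(0.1, 0.2, 300_000)
```

Under these assumptions the band narrows as n_E grows, which is why rare amino acid pairs need a larger n_O / n_E ratio to stand out.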

Fig 4.

Bayesian network (BN) of TCRαβ complex residues.

a. The graph of the BN built with separately learned inter-chain contacts (shown in S4 Fig) whitelisted and residues that are not contacting according to contact frequency thresholding blacklisted. b. A density plot showing the correlation between the log-likelihood (LL) of the BN for paired chains (y axis, computed using the network in a.) and the sum of LLs of individual α and β chains for the i-th clonotype from the PairSEQ dataset. To compute individual chain LLs, two independent networks were built by removing inter-chain edges and separating the α and β residue components of the BN.
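
The comparison in panel b can be illustrated with a deliberately tiny model: one α residue, one β residue and a single inter-chain edge. This is a toy sketch, not the network from the paper; all names and counts are made up, and probabilities are plain maximum-likelihood frequencies without smoothing, so unseen amino acid pairs would raise an error.

```python
import math
from collections import Counter


def fit_counts(pairs):
    """Collect joint and marginal counts from (alpha_residue, beta_residue) pairs."""
    joint = Counter(pairs)
    alpha = Counter(a for a, _ in pairs)
    beta = Counter(b for _, b in pairs)
    return len(pairs), joint, alpha, beta


def ll_paired(a, b, model):
    """LL with the inter-chain edge alpha -> beta: log P(a) + log P(b | a)."""
    n, joint, alpha, _beta = model
    return math.log(alpha[a] / n) + math.log(joint[(a, b)] / alpha[a])


def ll_separate(a, b, model):
    """Sum of individual chain LLs after removing the edge: log P(a) + log P(b)."""
    n, _joint, alpha, beta = model
    return math.log(alpha[a] / n) + math.log(beta[b] / n)


# Made-up training pairs with a mild preference for Q-Y and E-F.
toy_pairs = [("Q", "Y")] * 6 + [("Q", "F")] * 2 + [("E", "F")] * 6 + [("E", "Y")] * 2
toy_model = fit_counts(toy_pairs)
ll_gain = ll_paired("Q", "Y", toy_model) - ll_separate("Q", "Y", toy_model)
```

In this toy setting the difference between the two LLs is the pointwise mutual information of the residue pair, so it stays near zero when pairing is close to random.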

Fig 5.

Contacting residues define mutual orientation of chains in TCRαβ complex.

a. A schematic definition of angles between α and β chains. Principal axes (x_α, y_α, z_α) and (x_β, y_β, z_β) of both TCR chains are computed using the inertia tensor of all atoms of a given chain with the exception of constant domain atoms (top panel); a representative orientation of principal axes in a real TCR:pMHC complex is shown. Euler angles φ₁, φ₂ and φ₃ are then computed by superimposing chain centers of mass and computing angles between α and β principal axes (bottom panel). Illustrations were adapted from Wikimedia Commons (https://commons.wikimedia.org/wiki/File:63-T-CellReceptor-MHC.tif by David Goodsell and https://commons.wikimedia.org/wiki/File:Eulerangles.svg by Lionel Brits). b. Testing association between amino acid type (see Methods section and panel d. insert for amino acid cluster definition) and inter-chain angles. Point size shows the ANOVA F-score for association between amino acid type and each of the three Euler angles across TCR alpha and beta chain positions. The testing is performed for a non-redundant set of TCR chain orientations: all PDB structures with the same VαJαVβJβ are collapsed into a single observation with mean φ₁, φ₂ and φ₃ angles to prevent biases from several complexes with the same TCR. Red circles and labels show contact positions where a significant association between amino acid content and inter-chain angle is present, determined as P < 0.05 (adjusted for multiple testing). c. Representative distribution of φ₃ angle values for each amino acid type at the α₅₇ position. d. Visualization of all PDB structures aligned to a single representative TCR beta chain. TCR alpha chains are colored according to amino acid type at the α₅₇ position.
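
The principal-axis step in panel a can be sketched with NumPy as below. The sketch assumes unit atomic masses unless masses are supplied and expects the caller to have already removed constant domain atoms; the function name and toy coordinates are hypothetical. Extracting Euler angles additionally requires a rotation convention and a rule for fixing eigenvector signs, neither of which is given in the caption, so that step is left out.

```python
import numpy as np


def principal_axes(coords, masses=None):
    """Center of mass, principal moments and principal axes of a set of atoms.

    coords: (N, 3) array of atom coordinates for one chain (variable domain only).
    masses: optional (N,) array; unit masses are assumed when omitted.
    Returns (center, moments, axes) with axes as columns, ordered by ascending moment.
    """
    coords = np.asarray(coords, dtype=float)
    if masses is None:
        masses = np.ones(len(coords))
    masses = np.asarray(masses, dtype=float)
    center = np.average(coords, axis=0, weights=masses)
    r = coords - center
    # Inertia tensor: I = sum_i m_i * (|r_i|^2 * Identity - r_i r_i^T)
    sq_norm = np.einsum("ij,ij->i", r, r)
    inertia = np.sum(masses * sq_norm) * np.eye(3) - np.einsum("i,ij,ik->jk", masses, r, r)
    moments, axes = np.linalg.eigh(inertia)  # symmetric matrix, ascending eigenvalues
    return center, moments, axes


# Toy "chain" elongated along x: the axis with the smallest moment should be x.
toy_coords = [[-2, 0, 0], [-1, 0, 0], [1, 0, 0], [2, 0, 0], [0, 0.5, 0], [0, -0.5, 0]]
toy_center, toy_moments, toy_axes = principal_axes(toy_coords)
```

With axes for both chains in hand, the relative rotation between them is `axes_alpha.T @ axes_beta`, from which angles can be derived once a convention is chosen.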

Fig 6.

Log-likelihood (LL) distributions for TCRαβ pairs with known antigen specificity.

Cumulative distribution functions of αβ pair LLs computed according to the model shown in Fig 4A. The plot shows n = 1388 real TCRαβ pairs from the VDJdb database (red curves), as well as pairs shuffled within groups specific to the same epitope (orange curves) and pairs shuffled across the entire dataset (dashed black curve). For the latter case, n = 10,000 pairs were selected at random for each epitope pair with re-sampling allowed in order to balance the dataset. Labels above panels are the cognate epitope sequences. Significant differences (Kolmogorov-Smirnov test P-value less than 0.05) between real and shuffled distributions are observed for CINGVCWTV (Kolmogorov-Smirnov test D-value = 0.25, P = 6×10⁻⁴), ELAGIGILTV (D = 0.23, P = 2×10⁻²), GILGFVFTL (D = 0.45, P < 10⁻¹⁵) and GLCTLVAML epitopes (D = 0.26, P = 3×10⁻⁸).
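
The shuffling control and the D-value can be sketched in plain Python as below. This illustrates the general technique only: the record layout (epitope, alpha, beta), the function names and the seed are assumptions, and the P-value calculation is omitted (a statistics library would normally supply it).

```python
import random


def shuffle_within_epitope(records, seed=0):
    """Permute beta chains among records that share the same epitope.

    records: list of (epitope, alpha, beta) tuples (hypothetical layout).
    """
    rng = random.Random(seed)
    by_epitope = {}
    for epitope, alpha, beta in records:
        by_epitope.setdefault(epitope, []).append((alpha, beta))
    shuffled = []
    for epitope, pairs in by_epitope.items():
        betas = [b for _, b in pairs]
        rng.shuffle(betas)
        shuffled.extend((epitope, a, b) for (a, _), b in zip(pairs, betas))
    return shuffled


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov D: largest gap between empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    i = j = 0
    d = 0.0
    while i < len(xs) and j < len(ys):
        v = min(xs[i], ys[j])
        # Advance both samples past every value <= v, then compare CDF heights.
        while i < len(xs) and xs[i] <= v:
            i += 1
        while j < len(ys) and ys[j] <= v:
            j += 1
        d = max(d, abs(i / len(xs) - j / len(ys)))
    return d


toy_records = [("GILGFVFTL", "a1", "b1"), ("GILGFVFTL", "a2", "b2"),
               ("GILGFVFTL", "a3", "b3"), ("GLCTLVAML", "a4", "b4")]
toy_shuffled = shuffle_within_epitope(toy_records)
toy_d = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])
```

In practice the LLs of real and shuffled pairs would be passed to `ks_statistic` separately for each epitope.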

Fig 7.

Characteristic residue contacts of MAIT TCRs.

a. Scatter-plot of amino acid pair enrichment at contacting residues for the Jα gene choice of MAIT T-cells versus overall enrichment observed for given contact residues in the PairSEQ dataset. Y axis shows the log ratio of amino acid pair probabilities for VαJαVβ combinations corresponding to MAIT T-cells and those with a free choice for the Jα gene. X axis shows the observed to expected amino acid pair count ratio at contacting residues in the PairSEQ dataset. Residue pairs with enriched amino acid pairs coming from MAIT Jα gene choice (y > 0.25, corresponding to ~19% increase in frequency) are colored in red and labeled. Note that overall enrichment for corresponding amino acid pairs in the PairSEQ dataset is relatively moderate (x < 0.125, corresponding to ~9% increase in frequency). b.-d. Structural data showing Glutamine (GLN, Q) at α₁₀₈ and Tyrosine (TYR, Y) at β₅₅ interacting with an Arginine (ARG) of the MHC alpha-1 helix domain in the MAIT:MR1 complex. MR1 complex structures are shown in b (4pj7) and c (5u1r); d shows a non-MAIT TCR (having the same amino acids at α₁₀₈ and β₅₅) in complex with MHCI (4jry). Polar contacts between GLN:ARG and TYR:ARG are shown with dotted lines in b and c, but are absent in d. PDB structure chain coloring: green for MHC, yellow for TCRα and pink for TCRβ; the antigen peptide in d is shown in purple.
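
The percentages quoted for the thresholds are consistent with a base-2 logarithm, which the caption does not state explicitly; the short check below makes that inference reproducible. The function name is illustrative.

```python
def log2_ratio_to_percent_increase(log_ratio):
    """Convert a log2 ratio into a percent increase in frequency."""
    return (2.0 ** log_ratio - 1.0) * 100.0


y_threshold_pct = log2_ratio_to_percent_increase(0.25)    # MAIT-specific threshold on the y axis
x_threshold_pct = log2_ratio_to_percent_increase(0.125)   # overall PairSEQ threshold on the x axis
```

With a natural logarithm the same thresholds would correspond to about 28% and 13%, which does not match the caption.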

Fig 8.

Exploring invariant TCR using enrichment analysis of VαJαVβJβ gene combinations.

a. Scatterplot showing enrichment of certain TCR gene trios (unique combinations of three of four TCR germline genes, either JαVβJβ, VαVβJβ, VαJαJβ or VαJαVβ) in the PairSEQ dataset. Logarithm of the ratio of the observed and expected counts for all possible gene trios is plotted against their observed count. Expected count is calculated under the assumption of random αβ pairing as (count of α part alone) × (count of β part alone) / (total number of reads). Points are colored by the P-value of the hypergeometric enrichment test for the co-occurrence of α and β parts of the gene trio (adjusted for multiple comparisons using the Holm method). Canonical MAIT (TRAV1-2, TRAJ12/20/33, TRBV6-4) and iNKT (TRAV10, TRAJ18, TRBV25-1) variants are highlighted with corresponding labels. Only gene trios supported by at least 10 reads are shown. Pink circle highlights the Va13 Ja56 Vb10-3 population. b. Grouping of selected TCR gene trios (having adjusted P < 0.05 for enrichment test and represented by at least 10 reads) according to overlap between their VαJαVβJβ gene sets. The plot shows the layout of the resulting graph of gene trios (nodes), having edges connecting pairs of nodes with exactly matching gene sets (missing genes, e.g. Vα in JαVβJβ, are considered as wildcards). Nodes of the graph are represented by points and are colored according to the connected component (cluster) of the network they were assigned to. Cluster ID is a combination of the most frequent gene names in co-clustered trios. c. CDR3 spectratyping and motifs for the Va13 Ja56 Vb10-3 population. Top plots show the CDR3 length distribution of alpha (left) and beta (right) chains of the population compared to all PairSEQ TCRs rearranged with the corresponding alpha or beta segments; note that only a single dominant length is present for both alpha and beta. Bottom plots show sequence logos of the corresponding CDR3 lengths in the population.
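
The enrichment statistics in panel a can be sketched with the standard library alone. The expected count follows the formula in the caption. The upper-tail hypergeometric test and the Holm step-down adjustment shown here are standard formulations and may differ in detail from what the authors ran. All names and counts are illustrative.

```python
import math


def expected_count(n_alpha, n_beta, n_total):
    """Expected co-occurrence under random pairing, as given in the caption."""
    return n_alpha * n_beta / n_total


def hypergeom_upper_tail(k, n_total, n_alpha, n_beta):
    """P(X >= k) when n_beta reads are drawn from n_total, n_alpha of which carry the alpha part."""
    denom = math.comb(n_total, n_beta)
    upper = min(n_alpha, n_beta)
    numer = sum(
        math.comb(n_alpha, i) * math.comb(n_total - n_alpha, n_beta - i)
        for i in range(k, upper + 1)
    )
    return numer / denom


def holm_adjust(pvalues):
    """Holm step-down adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running = 0.0
    for rank, idx in enumerate(order):
        # Multiply the rank-th smallest P-value by (m - rank) and enforce monotonicity.
        running = max(running, min(1.0, (m - rank) * pvalues[idx]))
        adjusted[idx] = running
    return adjusted


# Made-up counts: 1000 reads, alpha part in 50, beta part in 40, both in 10.
toy_expected = expected_count(50, 40, 1000)
toy_log_ratio = math.log(10 / toy_expected)
toy_p = hypergeom_upper_tail(10, 1000, 50, 40)
toy_adjusted = holm_adjust([0.01, 0.04, 0.03])
```

Exact binomial coefficients keep the sketch dependency-free but become slow for read counts in the hundreds of thousands, where a log-space or library implementation would be preferable.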
