CDR3α drives selection of the immunodominant Epstein Barr virus (EBV) BRLF1-specific CD8 T cell receptor repertoire in primary infection

The T cell receptor (TCR) repertoire is an essential component of the CD8 T-cell immune response. Here, we seek to investigate factors that drive selection of TCR repertoires specific to the HLA-A2-restricted immunodominant epitope BRLF1109-117 (YVLDHLIVV) over the course of primary Epstein Barr virus (EBV) infection. Using single-cell paired TCRαβ sequencing of tetramer sorted CD8 T cells ex vivo, we show at the clonal level that recognition of the HLA-A2-restricted BRLF1 (YVL-BR, BRLF-1109) epitope is mainly driven by the TCRα chain. For the first time, we identify a CDR3α (complementarity determining region 3 α) motif, KDTDKL, resulting from an obligate AV8.1-AJ34 pairing that was shared by all four individuals studied. This observation coupled with the fact that this public AV8.1-KDTDKL-AJ34 TCR pairs with multiple different TCRβ chains within the same donor (median 4; range: 1–9), suggests that there are some unique structural features of the interaction between the YVL-BR/MHC and the AV8.1-KDTDKL-AJ34 TCR that leads to this high level of selection. Newly developed TCR motif algorithms identified a lysine at position 1 of the CDR3α motif that is highly conserved and likely important for antigen recognition. Crystal structure analysis of the YVL-BR/HLA-A2 complex revealed that the MHC-bound peptide bulges at position 4, exposing a negatively charged aspartic acid that may interact with the positively charged lysine of CDR3α. TCR cloning and site-directed mutagenesis of the CDR3α lysine ablated YVL-BR-tetramer staining and substantially reduced CD69 upregulation on TCR mutant-transduced cells following antigen-specific stimulation. Reduced activation of T cells expressing this CDR3 motif was also observed following exposure to mutated (D4A) peptide. In summary, we show that a highly public TCR repertoire to an immunodominant epitope of a common human virus is almost completely selected on the basis of CDR3α and provide a likely structural basis for the selection. These studies emphasize the importance of examining TCRα, as well as TCRβ, in understanding the CD8 T cell receptor repertoire.

α) motif, KDTDKL, resulting from an obligate AV8.1-AJ34 pairing that was shared by all four individuals studied. This observation coupled with the fact that this public AV8.1-KDTDKL-AJ34 TCR pairs with multiple different TCRβ chains within the same donor (median 4; range: [1][2][3][4][5][6][7][8][9], suggests that there are some unique structural features of the interaction between the YVL-BR/MHC and the AV8.1-KDTDKL-AJ34 TCR that leads to this high level of selection. Newly developed TCR motif algorithms identified a lysine at position 1 of the CDR3α motif that is highly conserved and likely important for antigen recognition. Crystal structure analysis of the YVL-BR/HLA-A2 complex revealed that the MHC-bound peptide bulges at position 4, exposing a negatively charged aspartic acid that may interact with the positively charged lysine of CDR3α. TCR cloning and site-directed mutagenesis of the CDR3α lysine ablated YVL-BR-tetramer staining and substantially reduced CD69 upregulation on TCR mutant-transduced cells following antigen-specific stimulation. Reduced activation of T cells expressing this CDR3 motif was also observed following exposure to mutated (D4A) peptide. In summary, we show that a highly public TCR repertoire to an immunodominant epitope of a common human virus is almost completely selected on the basis of CDR3α and provide a likely structural basis for the selection. These studies emphasize the importance of examining TCRα, as well as TCRβ, in understanding the CD8 T cell receptor repertoire. a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 Introduction EBV infects almost 95 percent of the world's population by the fourth decade of life. In older children and adults, primary infection with EBV often manifests as acute infectious mononucleosis (AIM), a self-contained illness characterized by fever, pharyngitis, lymphadenopathy and malaise. Activated virus-specific CD8 T cells are commonly expanded in the peripheral blood in AIM followed by a contraction in convalescence. EBV-specific memory CD8 T cells provide protection from EBV reactivation [1][2][3][4][5]. In HLA-A:02+ individuals, the EBV-specific CD8 T-cell response is dominated by responses to two lytic cycle proteins, BRLF1 (BRLF1 109-117 epitope: YVLDHLIVV) and BMLF1 (BMLF1 280-288 epitope: GLCTLVAML) [6,7].
The interaction of CD8 TCRs with virus-derived peptides bound to MHC-I molecules (pMHC) expressed on an infected cell surface [8][9][10][11] confers specificity of the CD8 T-cell response. The TCR repertoire is an important determinant of CD8 T-cell-mediated antiviral efficacy or immune-mediated pathology [12][13][14][15][16][17]. Each TCR is a membrane-bound, heterodimeric protein that is formed from two polypeptides: α and β. TCR diversity is generated through recombination events, in which each chain arises from a random rearrangement of variable (V), diversity (D), joining (J) and constant (C) gene segments [18]. The TCRβ chain rearranges first, followed by the TCRα; the TCRα chain keeps rearranging until a TCRα chain has been rearranged that is capable of successfully pairing with the TCRβ chain. The TCR is then expresses on the surface of the T cell. Due to this sequence of events it is not unusual to find T cells with the same TCRβ chain paired with different TCRα; reports of the reverse are, however, rare. This recombination process results in a diverse pool of unique TCRα and β clonotypes. Additions of non-templated (N) nucleotides or deletions of nucleotides at the V(D)J junctions, commonly known as the complementarity-determining region 3 (CDR3) and pairing of different TCRα and β segments further enhance the overall diversity of the TCR repertoire, estimated to range from 10 15 −10 20 unique potential TCRαβ clonotypes [15,19].
TCR repertoire analyses enable us to visualize the clonotypic identity of TCRs and to glean information about important features that dictate antigenic specificity or recognition [20,21]. Novel sequencing methods have shed important insights into the composition and organization of the TCR repertoires of common pathogens. Despite the multitude of V and J genes and virtually limitless number of TCRs that can be made from V(D)J recombination, skewing of pathogen-specific repertoires have been observed [22][23][24][25][26]. These skewed repertoires have been observed in the forms of preferential usage of particular V and/or J genes, as well as in conservation of CDR3 motifs or sequences within and across individuals. There has been a growing interest in understanding how these biases emerge, their immunological relevance, and their implications for either protection or immunopathology [12,20,21,[26][27][28]. A caveat of the majority of studies to date is that they have focused primarily on analyses of the TCRβ chain, with little regard for the TCRα chain. One of the reasons that the mechanisms that shape Tcell memory through TCRα selection have been difficult to delineate is due to technical constraints associated with the lack of VA-family specific antibodies. A T cell's ability to coexpress two α-chains may also contribute [29,30]. We performed single-cell TCRαβ sequencing of immunodominant GLC-BM (BMLF-1 280 epitope) and YVL-BR-specific CD8 T cells directly ex vivo over the course of primary infection and applied a newly developed analytical tool [20] for the identification of significantly enriched features in epitope-specific TCR repertoires. This revealed selective use of particular AV genes in the YVL-specific TCR repertoire, as well as identified novel pairing relationships between the α and β TCR chains. We identified a TCRα chain, AV8.1-KDTDKL-AJ34, which was shared by all study participants. Additionally, this chain was degenerate and paired with multiple different TCRβ chains in the same individual, suggesting that it might be important for antigen recognition. To gain insight into a potential basis for the conserved use of this TCRα chain, we solved the crystal structure of the YVL-BR peptide-bound and uncovered a solvent-exposed Asp(D) at position 4 of the peptide. Further examination of potential structural constraints suggests that a conserved Lys(K) within the CDR3α of the TCRα clones may mediate contact with the protruding Asp(D) on the YVL-BR peptide-bound MHC.

The TCR repertoire is individualized and qualitative features distinguish YVL-BR-and GLC-BM-specific TCR repertoires
To better understand the EBV-specific CD8 TCR repertoire, we performed single-cell paired TCR sequencing of tetramer-sorted, epitope-specific CD8 T cells from four donors during AIM and CONV. A total of 65 and 64 (YVL-BR; AIM and CONV) and 48 and 52 (GLC-BM; AIM and CONV) productive paired TCRαβ sequences were generated (Tables 1 and S2). Circos plots were used to examine pairing relationships between AV and BV gene segments by individual, epitope and time point (Fig 1). These analyses revealed that each individual had a unique repertoire. For example, the pattern of AV-BV pairing for YVL-BR TCRs was more  intricate in some individuals (E1655 and E1651), as demonstrated by the dense interaction web, than in others (E1603 and E1632) (Fig 1A). However, despite the uniqueness of the repertoire, there were prominent features shared across individuals and these features were peculiar to each epitope. Such features included the use of AV8.1 by YVL-BR TCRs in all individuals in both AIM and CONV. This gene was overrepresented in three of the four donors (E1603, E1655, and E1651) and paired with multiple BV genes (Fig 1A). The pairing of AV8.1 with multiple BV genes was more pronounced in E1651 and E1655. The BV repertoire of YVL-BR used multiple different families that differed between the donors with no obvious shared features. With respect to GLC-BM, although there were difference between the donors, AV5 and

Fig 1. Patterns of AV-BV pairings by YVL-BR (A) and GLC-BM (B) specific CD8 T-cells as revealed by single-cell TCRαβ sequencing.
The frequencies of AV-BV combinations in four donors during AIM (i) and CONV (ii) for YVL-BR-(A) and GLC-BM-specific (B) TCRαβ repertoires are displayed in circos plots, with frequency of each AV or BV cassette represented by its arc length and that of the AV-BV cassette combination by the width of the joining ribbon. The numbers of unique and productive paired TCRαβ clonotypes as well as the total numbers of sequences for each donor are shown below the pie charts (# of unique TCRαβ clonotypes; total # of sequences).
BV20 were common in all four patients in AIM and CONV. The public AV5-BV20 pairing was conserved in all individuals (Fig 1B), consistent with previous reports [31,32].

The selection of the YVL-BR-specific repertoire is mainly driven by TCRα
The qualitative analysis of the repertoire discussed above suggested the presence of generalizable features associated with epitope-specific TCRs. To quantitatively evaluate these features, we used a TCRαβ sequence-based analytical tool recently developed by Dash and colleagues [20]. This tool identifies features that are significantly enriched in an epitope-specific repertoire by quantifying different properties of an epitope-specific TCR repertoire and comparing them against a collection of publicly available human TCR sequences. The Shannon-Jensen divergence is used to identify significantly enriched gene segments (Fig 2A) and to score the magnitude of preferential gene usage (Fig 2B) within an epitope-specific repertoire, then the adjusted mutual information is used to measure the magnitude of gene usage correlations within and across TCRα/β (Fig 2C). In order to work with a large dataset, we merged the data from all patients, per time point and per epitope. We first examined the pattern of VA-JA/ VB-JB gene segment pairing by ribbon plots (Fig 2A). There were identifiable enriched gene segments and pairings. For example, the AV8.1/AJ34 pairing was significantly enriched in the YVL-BR repertoire in AIM and CONV but paired with multiple different BV/JB genes (Fig  2A). In contrast, the AV5/AJ31 combination paired with BV20.1/ BJ1.3 was significantly enriched in the GLC-BM repertoire in AIM and CONV (Fig 2A).
To quantitatively evaluate the degree of bias in the repertoire, we evaluated the gene preference score. We observed that YVL-BR-specific CD8 T cells exhibited a strong preferential usage of particular AV and AJ in both AIM and CONV. By contrast, GLC-BM-specific TCRs exhibited strong preferential usage of particular AV, AJ, BV and BJ (Fig 2B). This is consistent with previous structural studies demonstrating that both the TCR α and β chains are important for recognition of the GLC-BM epitope [33]. Altogether, these data suggest that the TCRα chain may be important for selection of YVL-BR-specific CD8 T cells whereas both the TCR α and β chains together are important for selection of GLC-BM-specific CD8 T cells. Moreover, quantification of the degree of gene usage correlations within and across TCRα/β revealed that GLC-BM-specific TCRα and β repertoires were rigid; every gene association except for two in AIM (AV-AJ and AV-BJ pairings) and one in CONV (AV-BJ) were enriched and thus important for selection into the repertoire (Fig 2C). In contrast, the YVL-BR repertoire was highly flexible in gene pairings. No obvious gene associations emerged for YVL-BR-specific TCR in AIM, whereas only the AV-AJ association emerged as important in CONV (Fig 2C). These data highlight the differences in selection of the YVL-BR and GLC-BM TCR repertoires at the clonal level.

Identification of a shared public and dominant CDR3α, AV8.1-KDTDKL-AJ34 that pairs with multiple different BV genes within the YVL-BR repertoire
To dissect the clonal composition of the TCRαβ repertoires, we clustered similar clones into groups (Figs 3 and 4) by performing a number of analyses including 2D kernel principal component analysis (kPCA) projections (Fig 4C-4D) and hierarchical clustering with dendograms (Figs 3 and S3) for YVL-BR (C) and GLC-BM (D) specific responses (n = 4 donors pooled in AIM and CONV). In the 2D kPCA projections, the color correlates to gene usage. The hierarchical clustering is presented as a dendogram of the paired TCRαβ clones and also derived TCR logo representations showing gene usages and frequencies and CDR3 amino acid sequences of specific clusters (Figs 3 and 4C and S3). For the YVL-BR response, clustering was driven by the TCRα chain, particularly the dominant AV8.1-KDTDKL-AJ34 expressing clones; this TCRα chain was detected in all individuals and resulted from an obligate pairing between AV8.1 and AJ34 (Fig 3). More importantly, this public AV8.1-KDTDKL-AJ34 TCR is so important for selection of the YVL-BR TCR repertoire that there is an unusually high frequency of clones where this one TCRα chain pairs with multiple different TCRβ chains within a single donor (median 4; range: 1-9) (Fig 3 and Table 2). It is not uncommon to find a single TCRβ chain to rearrange and pair with multiple different TCRα as TCRβ rearranges first and is expressed before TCRα. Because of this order in TCR rearrangement, it would be less common to see multiple TCRβ with the same TCRα. This finding suggests that this TCRα is so highly favored by its interaction with EBV-BR/MHC that these rare event TCR rearrangements dominate the repertoire. In contrast, in the GLC-BM TCR repertoire there was no evidence of such pairing of a single public TCRα chain being paired with multiple different TCRβ chains or vice versa. Unlike YVL-BR, the clustering of GLC-BM-specific TCRs was driven by dominant interactions with both the TCRα and β chains (Figs 4D and S3).

Predictable CDR3α motif features drive the selection of the shared AV8.1-KDTDKL-AJ34 expressing clones
To better understand the factors driving the selection of AV8.1-KDTDKL-AJ34 expressing clones, we analyzed CDR3 sequences for motifs and conserved residues that may determine epitope recognition [20]. The analytical tool reported by Dash and colleagues [20] is a predictive algorithm for the identification of key enriched residues within CDR3 regions that may be important in antigen recognition, known as the CDR3 motif. It is based on the principle that amino acids within the CDR3 which do not tolerate substitutions when comparing clusters of similar TCRα or β chain sequences must be important for interaction with the pMHC. It also identifies sequence patterns or motifs that occur more frequently in an epitope-specific TCR repertoire compared with a background dataset, which consists of publicly available and randomly selected unpaired non-epitope-specific TCRs derived from high-throughput TCR repertoire profiling [34][35][36][37] and calculates fold enrichment of the observed motifs in epitopespecific repertoires over the expected in the background dataset (Fig 4A and 4B).
With this useful algorithm, for GLC-BM we identified a known [20,33] highly public dominant CDR3α motif AEDNNA, with a conserved Asp(D) residue within these public AV5-AJ31  CONV (B). Number on the branches and next to TCR logos depicts number of TCRs contributing to the cluster. Color of the branches indicates the TCR probability generation scores. The bar at the bottom of the CDR3 logo is color-coded by the source of the nucleotide. Light grey, red, black, and dark grey denote that the nucleotides encoding those amino acid residues originate from the V, N, D and J regions, respectively. Analyses are based on Dash et al. [20]. https://doi.org/10.1371/journal.ppat.1008122.g003 CDR3α drives selection of EBV-specific CD8 TCR repertoire expressing clones (Fig 3Bi). We also identified two known [20,33] CDR3β motifs, RDxTGN within the public BV20.1-expressing clones in CONV and S/P,T/P,S/G,G within the public BV14-expressing clones in AIM (Fig 3Bii) as being important for GLC-BM recognition. We should note that CDR3 motifs that were identified within the GLC-BM response during AIM used similar clonotypes in CONV, particularly the highly public AV5-AEDNNA-J31 paired with VB20.1-SARDXXGN-J1.2/1.3, consistent with observations from deep sequencing data In the YVL-BR-specific TCR repertoire, we discerned a strong CDR3α motif and some CDR3β motifs (Fig 4A and 4C). Within the public AV8.1/AJ34 clonotypes, the CAVKDTKL motif was highly conserved (Fig 4A) from AIM to CONV and contained the conserved amino acid pair "KD" which is partially non-germline (S2 Fig), suggesting that they may provide a selective advantage and play a critical role in YVL-BR recognition. There was a second CDR3α motif (CAXRxGGN) that included a conserved "GG" pair predominantly resulting from the AV14-AJ21 recombination event. Interestingly, although many different BV were used in the YVL-BR response, this algorithm identified two potential CDR3β motifs, "xAxL", present in AIM and "GTSx" present in CONV, resulting from multiple BV-BJ recombination events; this suggests that TCRβ may play some role in selection of YVL-BR-specific TCRs but it differs between AIM and convalescence.
To experimentally test whether the first Lys(K) in the CAVKDTKL motif and present in the highly conserved amino acid pair, "KD" was important for YVL-BR recognition, we cloned one of the YVL-BR-specific TCR expressing the AV8.1-CAVKDTKL-AJ34 motif (wild-type, WT,TCR; AV8.1-CAVKDTDKLIF-AJ34BV24.1-CATSDWDDSTGELFF-BJ2.2). We confirmed the antigenic specificity of the cloned WT TCR by expressing it in TCR-null CD8αexpressing J76 cells; the TCR-transduced cells stained with YVL-BR tetramer (Fig 5Ai and  5B). To determine whether the Lys(K) within the conserved "KD" amino acid pair was involved in antigen recognition, we mutated the WT TCR (K113A TCR). This mutation abrogated YVL-BR tetramer staining (Fig 5Aii and 5B), consistent with this Lys(K) contributing to YVL-BR/MHC recognition. We confirmed that the mutated TCR was expressed by measuring CD3 upregulation (S1 Fig), which excludes the possibility that the lack of tetramer staining was due to a lack of TCR expression. Increasing the amount of YVL-BR tetramer did not result in tetramer staining to the mutated TCR (K113A; Fig 5B). Additionally, we observed CDR3α drives selection of EBV-specific CD8 TCR repertoire functional signaling through the wild-type TCR as measured by the upregulation of CD69 following stimulation of the WT TCR-transduced J76 cells with peptide-pulsed HLA-A2-expressing T2 cells acting as antigen presenting cells (Fig 5C). Introduction of the K113A mutation substantially decreased CD69 upregulation, further indicating that Lys(K) contributed to CDR3α drives selection of EBV-specific CD8 TCR repertoire YVL-BR/MHC recognition (Fig 5C). Of interest, a small functional response was still inducible even though tetramer no longer bound to this mutated TCR, consistent with previous studies demonstrating induction of functional response but no tetramer binding when using variant ligands [17,[38][39][40]. To further dissect the role of the Lys(K) within the KD motif, we investigated the effect of D4!A peptide mutation (D4A mutant peptide) on CD69 upregulation on cells expressing the WT TCR (Fig 5C). We observed that J76 cells expressing the WT TCR and stimulated with T2 cells pulsed with either the WT peptide or the D4A mutant peptide displayed levels of CD69 upregulation not significantly different from background levels (cells stimulated with a non-cognate HLA-A2-restricted epitope derived from the protein tyrosinase) at peptide concentration of 10 −9 M and 10 −8 M. However, as peptide concentration increased, CD69 expression on WT TCR-expressing cells stimulated with D4A mutant peptide at peptide concentrations of 5x10 -7 and 10 −7 M was significantly reduced compared to those stimulated with the WT peptide. These observations suggest that the Asp(D) residue at position 4 of the peptide plays a role in mediating T-cell signaling. However, the Lys(K) on the CDR3α may play a more important role given that mutation of this residue led to substantial reduction of CD69 upregulation. Further structural analyses of the TCR with its cognate ligand will be necessary to ascertain the existence of an interaction between the Lys(K) in the "KD" CDR3α motif and Asp(D) at position 4 on the peptide.

Structural features of YVL-BR/MHC support the role of Vα in selection of the YVL-BR-specific TCR repertoire
We next determined the crystal structure of the HLA-A2/YVL-BR complex in order to identify specific structural features that might be associated with the dominant AV8.1-KDTDKL-AJ34 usage in the YVL-BR-specific TCRs (Fig 5D). Structure determination was complicated initially by the relatively low resolution (S3 Table) and high degree of translational pseudosymmetry (S4A and S4D Fig) that characterized the HLA-A2/YVL-BR crystals (see Methods), but ultimately clear and continuous electron density extending the full length of the peptide allowed confident location of all peptide atoms (S4E-S4F Fig). In the HLA-A2 complex, the nonameric YVL-BR peptide is bound in conventional orientation, with the amino terminus and side chains of Val(V) at position 2 and Val(V) at position 9 accommodated in pockets (A, B, and F, respectively) in the peptide binding site. Notably, the side chains of Tyr(Y) at position 1, highly negative charged Asp(D) at position 4, and His(H) at position 5 project away from the binding site, and compose a highly featured surface positioned for recognition by CDRα loops of a TCR bound in conventional orientation (Fig 5D). By contrast, the side chains of Leu (L) at position 6, Ile(I) at position 7, and Val(V) at position 8, together with exposed mainchain atoms, present a rather non-descript generally hydrophobic surface positioned to interact with CDRβ loops of a TCR bound in conventional orientation.

Discussion
We report the use of single-cell TCR sequencing on tetramer-sorted EBV-specific CD8 T cells to study the TCR repertoires at the clonal level to a previously unstudied immunodominant, HLA-A2-restricted EBV epitope, YVL-BR, comparing it to GLC-BM repertoires in the same donors. Our studies are unique since they examined both CD8 TCR α and β directly ex vivo from the peripheral blood of individuals during primary EBV infection (AIM) and again 6 months later in convalescence (CONV). Although the TCR repertoire was individualized (i.e., each donor studied had a unique TCR repertoire), there were prominent public features that were significantly enriched and shared across individuals.
We identified a CD8 T-cell repertoire to a dominant human viral epitope, YVL-BR, that was significantly biased towards TCRα usage based on the CDR3α. We identified a ubiquitous and public TCRα chain, AV8.1-KDTDKL-AJ34, that paired with multiple different TCRβ chains in the same donor. In contrast and consistent with prior reports, GLC-BM-specific CD8 T-cell repertoires were notably skewed in both TCRα and TCRβ usage with a bias towards use of the public AV5-AJ31-BV20-BJ1.2 combination.
Prominent biases have been noted in the TCR repertoire of various common virus-specific CD8 T cells (influenza, cytomegalovirus, hepatitis C virus) [20,22,23,25,26,28,41]. However, most studies to date have only examined the TCRβ chain. The TCRα repertoire has been under-appreciated due to the technical constraints associated with the lack of VA-family specific monoclonal antibodies and the potential for T cellsto co-express two α-chains [29,30]. The apparent selection of a strongly TCRα-driven YVL-BR repertoire suggests that examination of the TCRα repertoire as well as the TCRβ provides a more comprehensive view of an epitope-specific TCR repertoire.
There have been limited reports of the importance of TCRα in viral epitope-specific responses but this appeared to relate to TCR CDR1α interaction with the MHC as observed with the HLA-A � 02-restricted yellow fever virus epitope, LLWWNGPMAV which has a biased TRAV12.2 usage [42]. HLA-B � 35:08 restricted EBV BZLF1-specific responses appear to be biased in both TCRα and TCRβ usage, much like HLA-A2-restricted GLC-BM [43,44]. There is a strong preservation of a public TCRα clonotype, AV19-CALSGFYNTDKLIF-J34, which can pair with a few different TCRβ chains. TCRα chain motifs have also been described for HLA-A02:01-restricted MI 58-67 (GIL-M1) IAV-M1, but these appear to make minor contributions to the pMHC-TCR interaction, which is almost completely dominated by CDR3β [26,30,45].
The TCR repertoire of the HLA-A02:01-restricted MI 58-67 (GIL-M1) epitope from influenza A virus is highly biased towards the TRBV19 gene in many individuals and displays a strong preservation of a dominant xRSx CDR3β motif. Crystal structures of TCR specific to this epitope have revealed that the TCR is β-centric with residues of the TRBV19-encoded CDR1 and CDR2 loops engaging pMHC and the conserved arginine in the CDR3β loop being inserted into a pocket formed between the peptide and the α2-helix of the HLA-A02:01 [26,46]. The TCRα is not as important as the TCRβ in pMHC engagement and this helps explain the high degree of sequence conservation in the CDR3β and the variability in the CDR3α. Similarly, studies using EBV virus GLC-BM-specific CD8 T cells have documented that TCR-pMHC binding modes also contribute to TCR biases. Miles and colleagues [33] showed that the highly public AS01 TCR, which is specific to the HLA-A � 02:01-restricted EBV-derived GLC epitope, was highly selected by the GLC-BM epitope because of a few very strong interactions of its TRAV5-and TRBV20-encoded CDR3 loops with the peptide/MHC.
Given the aforementioned studies, we reasoned that the differences in constraints in the TCR repertoires of YVL-BR and GLC-BM may give a picture of the essential requirements of antigen recognition and that the topology of the pMHC may provide some structural insights into the mechanisms underlying these constraints. This alpha-centricity displayed by the YVL-BR repertoire might be grounded in the fact that the TCRα chain makes more pronounced contact with its ligand, the pMHC, compared to the TCRβ chain. In light of this apparent TCRα bias in the YVL-BR repertoire and the substantial evidence that TCR-pMHC binding modes also contribute to TCR biases, we hypothesized that the bias for TCRα in the YVL-BR repertoire most likely reflects an inherent feature of how the YVL-BR peptide lies in the HLA-A2 groove or the topology of the pMHC. To understand why the YVL-BR epitope drives the selection of this highly conserved public TCRα, we applied the Dash et al. algorithm [20] to identify key amino acid residues that might be important for antigen recognition. We uncovered the AV8.1-KDTDKL-AJ34 motif with the highly conserved "KD" amino acid pair within this clonotype as being potentially critical for recognition of the YVL-BR epitope.
These motifs and conserved residues suggest a structural basis for YVL-BR recognition. Our structural analysis revealed that the MHC-bound YVL-BR bulged at position 4, in a region of the peptide that TCRα would have to accommodate, exposing a negatively charged Asp(D). We cloned a TCR expressing the dominant public TCRα chain AV8.1-KDTDK-L-AJ34 and using mutagenesis, we provided evidence indicating that the positively charged Lys(K) residue in the CDR3α of the public TCRα chain was important for YVL-BR/MHC recognition. Hence, we propose that the preferential selection of AV8.1-KDTDKL-AJ34, which contains the positively charged (Lys)K in position 1, is potentially driven by an electrostatic interaction between this Lys(K) and the solvent-exposed Asp(D) on the peptide. Future structural analyses of this TCRαβ with its ligand would be important to validate the existence of this electrostatic interaction and to confirm whether the TCRα contributes the majority of contacts with the pMHC. This apparent preference for TCRα may create a large repertoire of different memory TCRβ that could potentially cross-react with other ligands such as IAV-M1 58 , which predominantly interact with TCRBV [17,20,26].
In summary, we describe a virus-specific TCR repertoire that is CDR3α-centric and have proposed a structural basis for the selection of a public TCRα chain with a conserved CDR3 motif that seemingly relies on the topology of the pMHC. These studies underscore the importance of studying the TCRα, which has been underappreciated, as well as the TCRβ chains in order to get a complete picture of the determinants of TCR selection. Our findings will contribute to the growing interest and effort focused on the ability to predict the specificity and efficacy of TCR from sequence analyses for the fundamental goal of designing better T-cell therapies [12,20,21,[47][48][49].

Study population
Four individuals (E1603, E1632, E1655 and E1651) presenting with symptoms consistent with acute infectious mononucleosis (AIM) and laboratory studies consistent with primary infection (positive serum heterophile antibody and the detection of EBV viral capsid antigen (VCA)-specific IgM) were studied as described [17]. Blood samples were collected in heparinized tubes at clinical presentation with AIM symptoms (acute phase) and six months later (memory phase). PBMC were extracted by Ficoll-Paque density gradient media.

Ethics statement
The Institutional Review Board of the University of Massachusetts Medical School approved these studies, and all participants were adults and provided written informed consent.

Isolation of YVL-BR-and GLC-BM-specific CD8 T-cells by tetramer staining and single-cell sorting directly ex vivo
The percentages of peripheral blood antigen-specific CD8 T-cells were measured by tetramer staining and flow cytometry. Antibodies were purchased from BD Biosciences and included: anti-CD3-FITC, anti-CD4-AF700 and anti-CD8-BV786, 7AAD and PE-conjugated HLA-A � 02:01-peptide tetramers (BRLF-1 109−117 : YVLDHLIVV, A2-YVL-BR; BMLF-1 280−288 : GLCTLVAML, A2-GLC-BM). Tetramers were made in-house and underwent quality assurance, as previously described [50]. Total CD8 T-cells were enriched from PBMCs by positive selection using MACS technology (Miltenyi Biotec, Auburn, CA) according to the manufacturer's protocol. The cells were then stained with anti-CD3, anti-CD4, anti-CD8, 7AAD, and GLC-BM-or YVL-BR-loaded tetramers for 30 minutes on ice in staining buffer (1% BSA in PBS). Single-cell (into 384-well PCR plates) (FACSAria III, BD) of live CD3+, CD8+, and GLC-BM-or YVL-BR-tetramer+ cells were sorted by flow cytometry for subsequent TCR analysis. . Reverse transcription and global amplification of full-length cDNA was performed using the SMARTer kit (Clontech, Mountain View, CA) as described by the manufacturer. The synthesized cDNA library was used as template to amplify CDR3α/β as previously described [51]. Briefly, amplification involves a two-step multiplex PCR using a pool of 20 and 22 forward primers specific to Vα and Vβ gene families and two reverse primers, each specific to Cα or Cβ. The PCR products were purified and individually analyzed by Sanger sequencing. Nucleotide sequences (see S4 Table) were analyzed according to the IMGT/V-QUEST web-based tool [52]. Only productively rearranged CDR3α and CDRβ sequences without stop codons were used for repertoire analyses.

Single-cell cDNA synthesis and paired TCRαβ analysis of EBV-specific CD8 T-cells
Ribbon plots, gene correlations and gene preferences. An analytical tool developed by Dash et. al. [20] was used to characterize patterns of gene segment usage by ribbon plots, correlate gene usage within a chain (for example, AV-AJ, BV-BJ) and across chains (for example, AV-BV, AV-BJ), and to quantify gene preference usage (the quantification was done by comparing the gene frequencies in our epitope-specific repertoires to those seen in a background set of publicly available non-epitope-selected repertoire using the Shannon diversity index). The analysis was done by combining productive single-cell TCRαβ sequences from the 4 donors at each time point and for each epitope. A total of 65 and 64 (YVL-BR; AIM and CONV) and 48-52 (GLC-BM; AIM and CONV) productive paired TCRαβ sequences were generated.

EBV DNA quantitation in B cells
B cells were purified from whole blood using the RosetteSep human B-cell enrichment cocktail according to the manufacturer's recommendations (StemCell Technologies, Vancouver BC, Canada). Cellular DNA was extracted using QIAGEN DNeasy Blood & Tissue Kit (Valencia, CA). Each DNA sample was diluted to 5ng/ul and the Roche LightCycler EBV Quantitation Kit (Roche Diagnostics, Indianapolis, IN) was used to quantify EBV DNA copy number in the samples as recommended by the manufacturer. Reactions were run in duplicate. B cell counts in each sample were determined using a previously described PCR assay to quantify the copy number of the gene encoding CCR5 (two copies per diploid cell) [53]. Samples were normalized to B cell counts and EBV DNA copy number was calculated as DNA copy per 10 6 B cells.

TCR cloning
Full-length TCRα and TCRβ chains were amplified using the synthesized cDNA (described above) as template, a forward primer (specific to the corresponding Vα or Vβ gene segment identified by single-cell CDR3 sequencing) and a reverse primer specific to the constant region. The alpha chain was linked to the beta chain via the viral P2A sequence for stoichiometric expression and subcloned by homologous recombination into the pscALPS lentiviral vector (a gift from Dr. Jeremy Luban).

Lentivirus transduction of TCR
293T cells (CRL-3216, ATCC) and J76 cells (a TCR-deficient CD8α-expressing Jurkat cell line [54]; a gift from Dr. Wolfgang Uckert) were maintained in DMEM and RPMI, respectively, containing 10% FBS, HEPES and L-glutamine. Viral packaging was performed in 293T cells using the following three vectors (gifts from Dr. Jeremy Luban): recombinant lentiviral vector containing the cloned TCR gene, an HIV-1 Gag-Pol packaging plasmid (psPAX2) and a VSV-G plasmid (pMD2.G), and the Lipofectamine 2000 Transfection Reagent Kit (Invitrogen, Carlsbad, CA) as recommended by the manufacturer. Infection was performed by incubating the harvested virus supernatant with J76 cells. The cells were expanded and tetramer staining of the TCR-transduced J76 cells was performed 5 days post infection to validate the specificity of the overexpressed TCRs.

CD69 upregulation
T2 HLA-A2 expressing antigen-presenting cells (CRL-1992, ATCC) were maintained in RPMI containing 10% FBS, HEPES and L-glutamine. T2 cells were pulsed with the desired peptide concentration overnight at 37˚C. The cells were then washed to remove excess peptide. The peptide-pulsed T2 cells were co-cultured with TCR-transduced J76 cells for 90 minutes. Cells were harvested and stained with anti-CD8, anti-CD3 and anti-CD69 for 30 minutes in staining buffer. Cells were washed twice and analyzed via flow cytometry using a Calibur (BD Biosciences). Assay was performed in duplicate.

Soluble HLA-A2/YVL-BR protein production and crystallization
Soluble HLA-A2/YVL-BR complexes were prepared by folding urea-solubilized bacteriallyexpressed inclusion bodies of HLA-A2 heavy chain and human βeta2-microglobulin in the presence of 5mg/L synthetic YVL-BR peptide essentially as described [55], followed by concentration and buffer exchange into 10mM Tris-Cl (pH 8.0) using a tangential flow concentrator. Folded HLA-A2/peptide complexes were isolated from the buffer-exchanged folding mixture by a series of chromatography steps consisting of Hitrap Q and Mono Q ion exchange and S-200 gel-filtration columns (GE healthcare). Crystals were grown from purified HLA-A2/ YVL-BR by sitting drop vapor diffusion using 10.5% (w/v) PEG 4000, 35 mM Tris base/ HCl (pH 8.5), 70 mM Li 2 SO 4 . Crystals were briefly soaked in 1:1 mixture of saturated sucrose and reservoir buffer for cryoprotection and flash-frozen in liquid nitrogen and sent to LRL-CAT beamline at the Advanced Photon Source (Argonne, IL USA).

HLA-A2/YVL-BR Structure determination and refinement
Diffraction data extending to~3.2Å collected from a single crystal were integrated and indexed using Mosflm [56]. Initially data were indexed in a C2 unit cell (189.2 x 49.7 x 291.6 Å, β = 94.5˚), with molecular replacement using Phaser [57] identifying four copies per asymmetric unit of a HLA-A2 model [26] with TFZ = 22. However, refinement of this model stalled at Rfree = 0.42. Re-examination of the diffraction pattern identified a lattice of weak spots spaced between the stronger spots originally indexed (S4A Fig), and the data were reindexed in a P2 1 unit cell (189.9 x 100.2 x 292.4 Å, β = 94.4˚). The newly identified spots comprise the k = 2n+1 and h+(k/2) = 2n+1 sets (S4 Fig). Additional molecular replacement, symmetry considerations, and examination of composite-omit maps calculated using CCP4i [58] identified 20 copies of HLA-A2 per asymmetric unit (S4B Fig). The molecules are arranged in two layers viewed looking into the ac plane (S4C Fig); slight differences can be observed between these layers, and between similarly oriented molecules within the same layer (S4D Fig). These differences, along with four molecules (A,J,K,T) not identified in the C2 cell, explain the lower symmetry and strong translational pseudosymmetry that resulted in weak intensities for the k = 2n +1 and h+(k/2) = 2n+1 spots. After identification of the correct crystallographic and non-crystallographic symmetries, clear electron density covering all peptide atoms was observed in 20-fold averaged composite omit maps (S4E Fig), and a model for the YVL-BR peptide was built using Coot [59]. Refinement using Phenix [60] proceeded smoothly despite the relatively low resolution when dihedral restraints to a higher-resolution reference model were provided. Models for the YVL-BR peptide, which was not included in the reference model restraints, did not vary significantly between the non-crystallographically related copies (S4F Fig). A paired refinement test [61] Table. Characteristics of study population. 1 Single-cell paired TCRαβ sequencing was performed on tetramer sorted CD8 T cells of all four donors at presentation with AIM and 5-8 months later. 2 Time elapsed between AIM and CONV. 3 Frequency of HLA-A2 restricted GLC or YVL tetramer+ cells within CD3+ CD8+ T cells in PBMCs isolated from each respective donor. 4