A Universal Approach to Eliminate Antigenic Properties of Alpha-Gliadin Peptides in Celiac Disease

Celiac disease is caused by an uncontrolled immune response to gluten, a heterogeneous mixture of wheat storage proteins, including the α-gliadins. It has been shown that α-gliadins harbor several major epitopes involved in the disease pathogenesis. A major step towards elimination of gluten toxicity for celiac disease patients would thus be the elimination of such epitopes from α-gliadins. We have analyzed over 3,000 expressed α-gliadin sequences from 11 bread wheat cultivars to determine whether they encode peptides potentially involved in celiac disease. All identified epitope variants were synthesized as peptides and tested for binding to the disease-associated HLA-DQ2 and HLA-DQ8 molecules and for recognition by patient-derived α-gliadin specific T cell clones. For each of the α-gliadin derived peptides involved in celiac disease, several specific naturally occurring amino acid substitutions were identified that eliminate the antigenic properties of the epitope variants. Finally, we provide proof of principle at the peptide level that through the systematic introduction of such naturally occurring variations α-gliadin genes can be generated that no longer encode antigenic peptides. This forms a crucial step in the development of strategies to modify gluten genes in wheat so that it becomes safe for celiac disease patients. It also provides the information to design and introduce safe gluten genes in other cereals, which would exhibit improved quality while remaining safe for consumption by celiac disease patients.


Introduction
Celiac Disease (CD) is an intestinal T cell-mediated disease caused by the gluten fraction of wheat or the homologous proteins from barley or rye. CD has a prevalence of between 0.5 and 2% in human populations [1] and is characterized by chronic intestinal inflammation upon ingestion of gluten proteins. Recently, the molecular aspects have been comprehensively addressed in several review papers [2][3][4][5]. In short, in CD patients CD4+ T cells are present in the lamina propria that secrete interferon-gamma upon recognition of gluten-derived peptides bound to HLA-DQ2 or HLA-DQ8 molecules present on antigen presenting cells. Strikingly, most of the gluten peptides implicated in CD require modification by the enzyme tissue transglutaminase before they can bind to the disease-predisposing HLA-DQ molecules and trigger T cell responses [2][3][4][5]. In addition to the adaptive CD4+ T cell response to gluten, CD is characterized by the upregulation of IL-15, an intraepithelial T cell infiltrate expressing the NKG2D receptor, and the overexpression of a ligand for NKG2D (MICA) [6,7].
Many gluten peptides with T cell stimulatory properties have now been identified. Such peptides have been found in wheat α-, γ-, and ω-gliadins as well as in low molecular weight (LMW) and high molecular weight (HMW) glutenins [8][9][10][11][12][13][14]. Several studies have demonstrated that peptides derived from α-gliadins induce strong T cell responses in the large majority of patients, while responses to the other peptides are less frequently found [8][9][10][13]. An α-gliadin derived 33-mer peptide (amino acid sequence LQLQPFPQPQLPYPQPQLPYPQPQLPYPQPQPF) was identified that encodes six partially overlapping T cell epitopes and has very potent T cell stimulatory properties [13]. It harbours the p56-75 peptide (LQLQPFPQPQLPYPQPQLPY) that has been identified as the dominant gluten epitope [9,10]. Furthermore, α-gliadins are the only gluten molecules that harbor the p31-43/49 peptide that has been implicated in the innate immune response induced by gluten [7].
The α-gliadins are a gene family encoded by the Gli-2 loci, Gli-A2, Gli-B2 and Gli-D2, located on the short arm of three homoeologous chromosomes (6AS, 6BS and 6DS) of hexaploid bread wheat (Triticum aestivum L.). These loci may contain from 25-35 to even 150 α-gliadin genes per haploid genome [15][16][17], although most of these (72-95%) are presumably pseudogenes [16,17]. Sequencing of genomic α-gliadin clones from hexaploid bread wheat made it possible to assign the sequences to their loci Gli-A2, Gli-B2 and Gli-D2 based on genome-specific SNPs [16,17]. Relevant for CD, the occurrence and frequency of the HLA-DQ2 epitopes DQ2-Glia-α1, DQ2-Glia-α2 and DQ2-Glia-α3 (previously designated Glia-α20 [12]) and the HLA-DQ8 T cell epitope DQ8-Glia-α1 also differ between the loci [17]. This was corroborated by the observation that T cell clones specific for the DQ2-Glia-α2 epitope did not recognise gluten derived from diploid species carrying the S-genome, ancestrally related to the B genome of bread wheat, while gluten derived from diploid A- and D-genome species was recognized [18]. Variation in T cell stimulatory capacity of cereal-derived gluten was observed with other T cell clones as well [19][20][21]. Indeed, differences have been observed in the T cell stimulatory capacity of pasta and bread wheat varieties [22,23], but none were safe for CD patients.
Given the overall importance of the α-gliadins in CD we set out to determine the naturally existing sequence variation in CD epitopes as deduced from α-gliadin transcripts from developing wheat grains. The immunogenic potential of these epitope variants was subsequently tested in T cell proliferation assays. This produced insight into which key amino acid changes are sufficient to abolish T cell recognition. Finally, we verified that the observed differences in antigenicity of α-gliadin peptides derived from diploid species ancestrally related to bread wheat corresponded to differences in the antigenicity of the gluten from these species. Altogether the results offer a molecular basis for differential CD toxicity of the wheat genomes. Based upon these results we present a rational strategy to develop genes that encode α-gliadins that are safe for consumption by celiac disease patients.

Genetic variation in α-gliadin (Gli-2) transcripts
In total 3022 expressed α-gliadin sequences (expressed sequence tags and mRNA sequences from NCBI and Unigene) originating from 11 different T. aestivum L. varieties were analyzed. These α-gliadin transcripts were grouped into 55 contigs with at least 90% sequence homology. Forty per cent of the α-gliadin transcripts clustered with A genomic sequences and were attributed to locus Gli-A2, 35% originated from Gli-D2 and only 25% came from Gli-B2 (Fig. S1). After tracing all non-synonymous DNA polymorphisms, 83 different transcript contigs were obtained for the 3′ region of the gene that each contained at least four sequence equivalents. This indicates a high sequence diversity among expressed α-gliadin sequences in these 11 T. aestivum L. varieties.

Variants of T cell stimulatory and innate stimulatory sequences
The N-terminal region of α-gliadins contains the p31-43 epitope implicated in the innate immune response, and the immunodominant DQ2-Glia-α1 and DQ2-Glia-α2 epitopes as well as the DQ2-Glia-α3 T cell epitope. The carboxyl-terminal part encodes the immunodominant DQ8-Glia-α1 T cell epitope (Table 1). To obtain information on the immunogenic potential of the various α-gliadins, the translated amino acid sequences for each locus were checked for the presence of canonical T cell epitopes and variants thereof. Tables 2, 3, 4, 5 present the most frequently expressed epitope variants (>5 transcripts each).
In addition to the canonical epitope motifs, a large series of sequence variants with one or two amino acid substitutions were detected (Tables 2, 3, 4, 5). The large majority of the Gli-B2 gliadins contained sequence variants of the DQ2-Glia-α1, DQ2-Glia-α2, and DQ2-Glia-α3 epitopes with two amino acid substitutions. Furthermore, Gli-A2 transcripts harbored a proline to serine substitution at position 8 (p8) of the DQ2-Glia-α2 epitope and a sequence variant of the DQ8-Glia-α1 epitope (QGSFRP(S/F)QQN, carrying a glutamine to arginine substitution at p5). Importantly, the 33-mer peptide (LQLQPFPQPQLPYPQPQLPYPQPQLPYPQPQPF) that is highly resistant to degradation in the gastrointestinal tract and contains six overlapping DQ2-Glia-α1 and DQ2-Glia-α2 epitopes, conferring superior T cell stimulatory properties [13], was only observed in a subset of the α-gliadins from the Gli-D2 locus and never in the α-gliadins from the Gli-A2 and Gli-B2 loci. The latter expressed substantially truncated versions of the 33-mer (Fig. S2).
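The count of six overlapping epitopes in the 33-mer can be checked with a short script. The following is a minimal sketch, not part of the original analysis. It assumes the native (non-deamidated) forms of the 9-mer cores, obtained by writing glutamine in place of the glutamic acid that TG2 deamidation introduces, and the pattern names and function names are illustrative only.

```python
import re

# The 33-mer as given in the text.
PEPTIDE_33MER = "LQLQPFPQPQLPYPQPQLPYPQPQLPYPQPQPF"

# Assumption: native 9-mer cores written as regular expressions
# (Q instead of the deamidated E; F/Y and P/F alternatives as character classes).
EPITOPE_PATTERNS = {
    "DQ2-Glia-alpha1": "P[FY]PQPQLPY",
    "DQ2-Glia-alpha2": "[PF]QPQLPYPQ",
}


def find_overlapping(pattern, sequence):
    """Return 0-based start positions of all matches, including overlapping ones."""
    # A zero-width lookahead lets the regex engine report overlapping matches.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", sequence)]


def epitope_positions(sequence):
    """Map each epitope name to the list of start positions found in the sequence."""
    return {name: find_overlapping(pat, sequence)
            for name, pat in EPITOPE_PATTERNS.items()}


hits = epitope_positions(PEPTIDE_33MER)
total = sum(len(positions) for positions in hits.values())
for name, positions in hits.items():
    print(name, positions)
print("total epitope copies:", total)
```

Under these assumptions the script finds three copies of each core, which agrees with the six overlapping epitopes described for the 33-mer.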

T cell stimulatory capacity of α-gliadin derived peptides
Several of the amino acid variants that we found in the α-gliadin transcriptome have never been described before while some have been described but were never tested for their T cell stimulatory capacity. In order to determine which variants are capable of inducing T cell responses, the DQ2-Glia-α1, DQ2-Glia-α2, DQ2-Glia-α3, and DQ8-Glia-α1 variants were synthesized as 15- or 16-mer peptides and tested for their capacity to bind to HLA-DQ2 and induce in vitro proliferation of HLA-DQ2- or DQ8-restricted T cell clones.
DQ2-Glia-α1 variants. Virtually all peptides that carry the canonical 9-mer Glia-α1 epitope core P₁(F/Y)₂P₃Q₄P₅E₆L₇P₈Y₉ (in which the glutamic acid at p6 is introduced by TG2-deamidation of the original glutamine) were able to stimulate DQ2-Glia-α1 T cells. Only an arginine residue at the position preceding the epitope core diminished T cell stimulation. In the core sequence, several amino acid substitutions diminished or abolished the T cell stimulatory capacity, such as a proline to serine substitution at p3 or p8 and a proline to alanine substitution at p5 (Table 6, no. 3). Peptides in which an amino acid was deleted at p3 or p4 did not cause any proliferation of the T cells. Strikingly, such safe peptides are all from locus Gli-B2 (Table 6, Fig. S3).
DQ2-Glia-α2 variants. Full responses of DQ2-Glia-α2 T cells were only observed against peptides that carry the core (P/F)₁Q₂P₃E₄L₅P₆Y₇P₈Q₉. A deletion of the glutamine at p2, indicative of α-gliadins from locus Gli-B2, or substitution of this glutamine by histidine diminished or abolished the stimulatory capacity. Furthermore, a single substitution of the proline by a serine residue at either p6 or p8 abolished the T cell stimulating capacity. The latter substitution is found in α-gliadins from locus Gli-A2 (Table 6).
DQ2-Glia-α3 variants. Also for this epitope several amino acid substitutions were found to destroy T cell stimulatory capacity, including an arginine to proline substitution at p2, which is found in α-gliadins from Gli-B2 (Table 6).
DQ8-Glia-α1 variants. While several amino acid substitutions were found to influence T cell recognition of the canonical sequence Q₁G₂S₃F₄Q₅P₆S₇Q₈Q₉, a single serine to phenylalanine substitution at p3 and a single glutamine to arginine substitution at p5 were found to completely destroy T cell stimulatory properties (Table 6). While the former T cell stimulatory variants are found in α-gliadins from Gli-D2 and Gli-B2, the glutamine to arginine variants are from Gli-A2 (Table 5, Fig. S2).
Thus, the α-gliadins encoded by Gli-A2 of bread wheat are marked with a specific single amino acid substitution (P to S at p8) and lack the capacity to stimulate Glia-α2 T cell clones, whereas α-gliadins encoded by Gli-B2 carry a deletion that prevents Glia-α1 T cell clone stimulation. Alpha-gliadins with the two intact epitopes, Glia-α1 and Glia-α2, are encoded by Gli-D2 (Tables 2 and 3).

T cell stimulatory capacity of diploid wheat accessions
Previously, differential reactivity of α-gliadin specific T cell clones against gluten extracts from diploid wheat accessions has been reported [18,19]. In order to link such differential reactivity to the presence of specific epitope variants, a panel of diploid wheat accessions containing either the A, S or D genome was tested for reactivity with an α-gliadin specific monoclonal antibody (mAb) and with T cells.
Pepsin-trypsin digests of gluten extracts from kernels of 29 diploid Triticum and Aegilops accessions were prepared and tested in a competition ELISA with a mAb specific for a sequence partially overlapping with the DQ2-Glia-α1 and DQ2-Glia-α2 epitopes (Fig. 1A) and, after treatment with TG2, with T cell clones specific for either the DQ2-Glia-α1 or the DQ2-Glia-α2 epitope (Fig. 1B and 1C). The results indicate that the mAb reacts strongly with all extracts except three derived from diploids expressing the S genome (Fig. 1A). Similarly, and in agreement with previous results [18], extracts from A, S and D origin were capable of stimulating the DQ2-Glia-α1 specific T cell clone (Fig. 1B) while the extracts of the diploids expressing the A genome failed to stimulate the DQ2-Glia-α2 specific T cell clone (Fig. 1C).
Based on our observation that the α-gliadins expressed from locus Gli-A2 of the T. aestivum A genome carry a variant DQ2-Glia-α2 epitope in which the proline at p8 has been replaced by a serine, and our experimental result that introducing this substitution in a peptide leads to loss of DQ2-Glia-α2 T cell stimulatory properties (Table 6), we wanted to determine if this amino acid substitution was indeed the cause of loss of immunogenicity. For this purpose we sequenced the α-gliadin locus of three diploid wheat (T. monococcum, A genome) accessions. In agreement with the results from hexaploid wheat transcripts (Fig. S2) we observed that the α-gliadin transcripts from T. monococcum contain only a single form of DQ2-Glia-α1 and DQ2-Glia-α2. Moreover, in the DQ2-Glia-α2 epitope the proline at p8 was consistently replaced by a serine (Table S1). Together these results establish that this naturally occurring single amino acid substitution is sufficient to completely eliminate the T cell stimulatory properties of the DQ2-Glia-α2 epitope in gluten. Differential reactivity of the DQ2-Glia-α3 specific T cells towards the extracts of the diploids correlated with an arginine to proline replacement at p2 in α-gliadins derived from the S-genome, FRPQQPYPQ → FPPQQPYPQ (Table 6).

Table 6. T cell proliferation and HLA-DQ2 binding capacity of DQ2-Glia-α variants. Variants of the DQ2-Glia-α1 and DQ2-Glia-α2 epitopes as encoded by the α-gliadin transcriptome were synthesized as deamidated 14- to 17-mer peptides (column 1, underlined: DQ2-Glia-α1/α2 epitope region) and tested for stimulation of DQ2-Glia-α1 and DQ2-Glia-α2 specific T cell clones in a proliferation assay.
20  LGEGFFQPSQENP  Gli-D2  nd  -

Elimination of α-gliadin toxicity by a naturally occurring single amino acid substitution
The large majority of known antigenic peptides derived from wheat gluten, as well as homologous peptides derived from the hordeins from barley and the secalins from rye, contain a proline at p8 [24]. Based on our observation that a single proline to serine substitution at p8 induced unresponsiveness of DQ2-Glia-α2 specific T cells and the previous observation that a similar substitution at p8 of the DQ2-Glia-α1 epitope induced T cell unresponsiveness (Table 6; peptide no. 4), we investigated if a similar substitution would also eliminate the antigenic properties of the DQ2-Glia-α3 epitope. Wild type versions of the DQ2-Glia-α1, -α2 and -α3 epitopes as well as versions in which the proline at p8 was substituted by a serine were synthesized and tested in T cell proliferation studies. As expected, neither the substituted DQ2-Glia-α1 epitope (QLQPFPQPELSYPQPQ) nor the substituted DQ2-Glia-α2 epitope (QLQPFPQPELPYSQPQ) induced proliferation of DQ2-Glia-α1 and DQ2-Glia-α2 specific T cells, respectively (Table 6). Likewise, the substituted DQ2-Glia-α3 epitope failed to induce T cell activation (Fig. 2A).
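The position numbering used for these substitutions can be made concrete with a small script. This is an illustrative sketch only. The wild-type 16-mer below is an assumption, inferred by reverting the serine in the two substituted peptides quoted in the text to proline, and the function name is hypothetical.

```python
def substitute_in_core(peptide, core, position, new_residue):
    """Replace the residue at 1-based `position` of `core` within `peptide`.

    The core is located by exact string search; only its first occurrence is used.
    """
    start = peptide.find(core)
    if start == -1:
        raise ValueError(f"core {core!r} not found in peptide {peptide!r}")
    if not 1 <= position <= len(core):
        raise ValueError("position must lie within the core")
    index = start + position - 1
    return peptide[:index] + new_residue + peptide[index + 1:]


# Assumption: wild-type deamidated peptide, inferred from the substituted
# versions given in the text (serine reverted to proline).
WILD_TYPE = "QLQPFPQPELPYPQPQ"

# Deamidated 9-mer cores as described in the text (F variant of alpha1, P variant of alpha2).
ALPHA1_CORE = "PFPQPELPY"
ALPHA2_CORE = "PQPELPYPQ"

alpha1_p8s = substitute_in_core(WILD_TYPE, ALPHA1_CORE, 8, "S")
alpha2_p8s = substitute_in_core(WILD_TYPE, ALPHA2_CORE, 8, "S")
print(alpha1_p8s)
print(alpha2_p8s)
```

Because the two cores overlap but start at different residues, the same "p8" label points at two different prolines in the peptide, which is why the two substituted peptides differ.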
We also analysed the effects of proline to serine substitutions in the 33-mer α-gliadin derived peptide that encodes 6 partially overlapping antigenic DQ2-Glia-α1 and -α2 sequences (Fig. 2B) as well as in an elongated version of the 33-mer which also encodes the DQ2-Glia-α3 epitope (Table S2). In all cases, the proline to serine substitutions completely abrogated the response of DQ2-Glia-α1, DQ2-Glia-α2 and DQ2-Glia-α3 specific T cell clones (Fig. 2B, Table S2).
Finally, we tested the effect of systematic introduction of single, double, triple and quadruple P to S substitutions in the DQ2-Glia-α1 peptide with five T cell clones isolated from small intestinal biopsies of 3 CD patients. T cell responses of single T cell clones were only observed for two of the single P to S substitutions at p3 and p5. In contrast, with a single P to S substitution at p8 and in all but one of the other cases the substitutions completely eliminated the T cell response of all five T cell clones (Fig. 3). Thus, the naturally occurring proline to serine substitution constitutes a universal approach to remove the antigenic properties of HLA-DQ2 restricted α-gliadin peptides.

Discussion
Although T cell responses to peptides derived from α-, γ- and ω-gliadins as well as from HMW- and LMW-glutenins have been described, various studies have indicated that the α-gliadins are among the most immunogenic regarding CD [8][9][10][13][25]. A crucial step towards the elimination of gluten toxicity would thus be the elimination of T cell stimulatory α-gliadin sequences. Our extensive genetic analysis of over 3000 α-gliadin transcripts from different bread wheat accessions showed a high heterogeneity of the α-gliadin genes and considerable differences in the number of T cell stimulatory sequences encoded by the various α-gliadin genes. We identified three major factors determining these differences: i) the length of tandem repeats of antigenic sequences; ii) natural amino acid substitutions that affect the antigenicity of T cell epitopes, and iii) amino acid deletions that eliminate the antigenicity of T cell epitopes.
The α-gliadins from locus Gli-D2 generally encode several copies of both the DQ2-Glia-α1 and DQ2-Glia-α2 epitopes in addition to the DQ2-Glia-α3 and DQ8-Glia-α1 epitopes; Gli-A2 genes usually encode only the DQ2-Glia-α1 and DQ2-Glia-α3 epitopes; and the Gli-B2 genes encode no or at most one DQ2-Glia-α1 epitope, next to the DQ8-Glia-α1 epitopes. The 33-mer sequence with 6 T cell stimulatory sequences [13] was only found in a minority of the α-gliadins analyzed, all of which are expressed from Gli-D2. Many naturally occurring amino acid substitutions affecting the antigenicity of the canonical α-gliadin peptides were identified. Typical examples are the proline to serine substitution at p8 in the DQ2-Glia-α2 epitope and the arginine to proline substitution at p2 in the DQ2-Glia-α3 epitope, both of which completely eliminate the T cell stimulatory properties of these peptides.
The analysis of gluten extracts from the diploid wheat varieties underscores these observations and provides a molecular basis for the previous observation that A-genome diploid T. monococcum varieties lack the DQ2-Glia-α2 epitope [18]. All α-gliadins from this genome encode an altered version of the DQ2-Glia-α2 epitope with a serine at p8 that fails to induce T cell responses. We observed that a similar substitution eliminates the T cell stimulatory properties of the DQ2-Glia-α1 and DQ2-Glia-α3 epitopes. The more variable reaction pattern of T cells and antibodies towards gluten extracts of the S-genomes reflects the higher level of genetic variation in these outcrossing species. These α-gliadins contain another variant of the DQ2-Glia-α1 and DQ2-Glia-α2 region that does not include the A-genome specific serine at p8. Furthermore, amino acid deletions in the canonical α-gliadin peptides prohibit binding to HLA-DQ2 and hence T cell recognition. A typical example is the deletion of the glutamine at p4 in the DQ2-Glia-α1 epitope which generates a peptide that no longer binds to HLA-DQ2, presumably due to defective docking of the anchor residues into their respective pockets in the HLA-DQ2 molecule. Our results confirm previous observations [18,19] that the α-gliadin locus Gli-D2 encodes the most toxic α-gliadins while substantially less toxicity is associated with those from Gli-A2 and Gli-B2, but also provide the molecular basis for these differences.
Unfortunately, due to the complexity of both the Gli-2 gene family and the wheat genome, it will be a difficult task to generate tetraploid pasta and hexaploid bread wheat that is entirely safe for consumption by all CD patients by conventional breeding methods. Our results now provide a rationale for an alternative approach as we demonstrate that by the introduction of naturally occurring amino acid substitutions the toxicity of all four T cell epitopes in α-gliadins can be eliminated. Using novel methods such as zinc finger nucleases [26][27][28] we can introduce the underlying SNPs as specific mutations into the α-gliadin genes of wheat to eliminate toxicity completely. Technically this will not be easy, but our results indicate precisely the three complementary sets of actions that need to be performed.
i) In the Gli-B2-derived α-gliadins analyzed, none of the DQ2 epitopes are present due to a single amino acid deletion in the region encoding the DQ2-Glia-α1 and DQ2-Glia-α2 epitopes, which generates a peptide that has decreased binding affinity for HLA-DQ2 and is no longer recognized by T cells. Therefore, a single amino acid substitution in the DQ2-Glia-α3 epitope will result in a peptide that has completely lost HLA-DQ2 binding properties and T cell stimulatory properties. To eliminate the remaining DQ8-Glia-α1 epitope in some Gli-B2 α-gliadins, a single glutamine to arginine substitution, which naturally occurs in the α-gliadins from the A-genome, would suffice. Such a minimally genetically modified B-genome α-gliadin gene would thus no longer encode any T cell stimulatory peptides. Moreover, by starting with an α-gliadin gene in which the sequence of the p31-43 peptide is naturally altered (for example the gene encoding protein no. 9 in SI 2), the chance of innate immune stimulation by a protein derived from such a gene would also be minimized.
ii) For Gli-A2 α-gliadins, the approach to eliminate toxicity would be to introduce two proline to serine substitutions at p8 in the DQ2-Glia-α1 and DQ2-Glia-α3 epitopes present. As these α-gliadin genes encode a shorter version of the immunodominant 33-mer in which the DQ2-Glia-α2 epitope is already nonfunctional, and contain a version of the DQ8-Glia-α1 epitope that has no T cell stimulatory properties, these two substitutions would completely remove toxicity in proteins encoded by such modified genes.
iii) Regarding the Gli-D2 α-gliadins, we found that the proline to serine substitutions at p8 completely abrogated the in vitro T cell stimulatory properties of the 33-mer peptide and of an elongated version of the 33-mer that also encodes the DQ2-Glia-α3 epitope. This result underscores our previous observation that most T cell stimulatory gluten peptides have a proline at p8 [24]. However, to render α-gliadins from the D-genome non-toxic, up to seven substitutions need to be introduced in a single gene.
An alternative approach is to design safe α-gliadin genes that can subsequently be introduced into celiac disease safe cereals such as rice or maize, for the production of gluten proteins. As such modified proteins will be very similar to existing α-gliadins they will most likely have indistinguishable technological properties. Thus, such gluten proteins could enhance the baking properties of these cereal crops, or they could be extracted from these crops and used as an ingredient to generate novel high quality foods for celiac disease patients. For the generation of high quality cereals that can replace wheat-based products the simple introduction of detoxified α-gliadins is unlikely to be sufficient, as baking quality is mostly determined by the HMW and LMW glutenin proteins. Therefore, additional studies will have to investigate how other gluten proteins can be detoxified as there is substantial evidence that these contain T cell stimulatory peptides as well. In previous studies we provided evidence that such epitopes can be found in the γ-gliadins as well as in the LMW- and HMW-glutenins [11,12] and others have extended these observations [14,25]. In particular, we found that T cell responses to LMW-glutenins were found in children while these are much less frequent in adults [12,25]. Moreover, in a recent study a highly antigenic ω-gliadin peptide was described [25] that is identical to an antigenic hordein-derived peptide reported by us earlier [20]. In essence this hordein/omega peptide is a sequence variant of the DQ2-Glia-α1 peptide and also carries a proline at the p8 position [20]. It is therefore feasible that the toxicity of this peptide can be eliminated by a proline to serine substitution at p8 as well.
Preliminary results show that amino acid substitutions similar to those that destroy the T cell stimulatory properties in α-gliadins might also be effective for γ-gliadin derived epitopes (Salentijn et al, in prep), but it is likely that for the LMW- and HMW-glutenins other approaches will be required. This will be the subject of further studies.
In conclusion, we have demonstrated that by utilizing naturally occurring amino acid substitutions the toxicity of the four T cell epitopes in α-gliadins can be eliminated. Such modified proteins will most likely display indistinguishable technological properties. Thus, our results provide a rational approach to eliminate CD related toxicity from α-gliadins, which represents a first but crucial step towards the realization of safe gluten containing food products for CD patients.

Materials and Methods
Analysis of α-gliadin transcripts from diploid wheat varieties
T. monococcum accessions CGN10500, CGN12035 and CGN10555 (CGN, Wageningen, The Netherlands) were used for cloning and sequencing of α-gliadin transcripts. Plants were grown under greenhouse conditions. Developing green kernels of single plants were harvested and used for RNA isolation according to Doyle and Doyle [29] but with 1% (w/v) poly-(vinylpyrrolidone)-10 in the extraction buffer. For the production of first strand cDNA, 1 µg of total RNA was treated with DNAse I (Invitrogen, amplification grade; 18068-015) followed by RT-PCR (Invitrogen SuperScript III First-Strand Synthesis System for RT-PCR; 18080-051) using random hexamer primers in a final reaction volume of 20 µl. Primers specific for α-gliadin genes, located on the conserved sequences at the 5′ and 3′ end of the coding region of the α-gliadin, were used to amplify α-gliadin transcripts from the cDNA samples (aF1: 5′-atgaaRaCmtttcYcatc and a5R: 5′-gttagtaccgaNgatgcc). The PCR conditions were: 5 min at 94 °C, 30 cycles (94 °C for 1 min, 49 °C for 1 min and 72 °C for 2 min), 72 °C for 10 min, 25 µl reaction volume. The PCR products were cloned and sequenced.
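Both primers contain IUPAC degenerate bases (R, M, Y and N), so each primer name stands for a small pool of concrete oligonucleotides. The following sketch, which is not part of the original protocol, enumerates those pools with the standard IUPAC nucleotide codes. The function and variable names are illustrative.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer):
    """Return every concrete sequence encoded by a degenerate primer (upper case)."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


# Primer sequences as printed in the text (5' to 3').
AF1 = "atgaaRaCmtttcYcatc"
A5R = "gttagtaccgaNgatgcc"

af1_pool = expand_degenerate(AF1)
a5r_pool = expand_degenerate(A5R)
print(len(af1_pool), "variants of aF1")
print(len(a5r_pool), "variants of a5R")
```

With three two-fold degenerate positions the forward primer expands to eight sequences, and the single N in the reverse primer gives four.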

Characterization of expressed α-gliadin sequences
Over 3200 T. aestivum expressed sequence tags (ESTs) and mRNAs designated as α-gliadin or α/β-gliadin were downloaded from the NCBI UniGene library (Ta.15268, Ta Novobirskaya 67, 0.1%). The DNA sequences were aligned using SeqMan II (DNAstar) and first assembled at a minimum match percentage of 60%, gap lengths 3200, maximum match size 50 bp. BLAST analysis of the contigs was performed to verify the α-gliadin identity of the contigs, and short (<100 bp) and bad sequences were discarded. The transcript contigs of ≥60% homologous sequences were trimmed up to the start and stop codons. Next, the sequences were reassembled at 90% homology, which resulted in 55 α-gliadin transcript contigs containing 1 to 475 sequences. The 3′ end was covered by 50 contigs (2911 transcripts) whereas the 5′ end was present in 30 contigs (2753 transcripts). The consensus sequences of these contigs were saved in separate files and used for phylogenetic studies to deduce the genome of origin of the sequences in each contig.

Phylogenetic analysis
With the aim of deducing the locus, Gli-A2, Gli-B2 or Gli-D2, from which the transcripts were expressed, the 55 α-gliadin EST consensus nucleotide sequences obtained from clustering were aligned using Clustal W in MEGA 4 [30], together with 56 genomic DNA sequences of known origin, i.e. derived from the diploid wheat species T. monococcum (A genome), T. speltoides (S genome) and Aegilops tauschii (D genome) [18], and DNA sequences that were previously assigned to a locus [31]. The sequences that covered the 5′ region of the α-gliadin sequences were trimmed up to the start and up to nucleotides coding for the conserved amino acid motif PIS, located just in front of the first glutamine repeat, to cover the same region and subsequently used to generate a Neighbor-Joining tree (bootstrap test of 1000 replicates, pairwise deletion of gaps and missing data, Kimura 2-parameter, Substitutions to Include Transitions + Transversions; Pattern among Lineages Homogeneous, Uniform rates among sites, number of sites = 750, in MEGA 4) (Fig. S1).
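For readers unfamiliar with the distance measure named above, the Kimura 2-parameter distance between two aligned sequences is d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the proportions of sites differing by a transition and by a transversion. The sketch below is a plain illustration of that formula with pairwise deletion of gapped or ambiguous sites. It is not MEGA 4's implementation, and the function name is hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned nucleotide sequences.

    Sites where either sequence has a gap or a non-ACGT symbol are skipped
    (pairwise deletion). math.log raises ValueError if the sequences are too
    divergent for the formula to be defined.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # pairwise deletion of gaps and missing data
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


print(k2p_distance("ATGAAGACC", "ATGAAAACC"))
```

A matrix of such pairwise distances is the input that a neighbor-joining procedure would then cluster into a tree.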

Sequence variation in epitope regions
To analyze all sequence variation, the 55 α-gliadin EST contigs, now assigned to a specific chromosome, were reassembled one by one at 99-100% match (SeqMan II, Lasergene, DNAstar). This yielded 717 different allelic variants. The consensus nucleotide sequences were translated (MEGA 4) and explored for epitopes and surrounding sequence regions using a text explorer program (PatternResearch, in house developed) after which the output file was analysed in Excel.
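The in-house PatternResearch program is not described further, so the following is only a hypothetical sketch of how such a search for epitope variants could work: a sliding window is compared with a canonical 9-mer core and every window with at most two substitutions is reported. The example peptide is an assumption, namely the native (non-deamidated) form of the substituted DQ2-Glia-α2 peptide, and all names are illustrative.

```python
def scan_for_variants(protein, canonical, max_substitutions=2):
    """Find windows in `protein` that differ from `canonical` by few substitutions.

    Returns a list of (start, window, differences), where `start` is 0-based and
    each difference is (1-based core position, canonical residue, observed residue).
    Insertions and deletions are not handled by this simple comparison.
    """
    width = len(canonical)
    hits = []
    for start in range(len(protein) - width + 1):
        window = protein[start:start + width]
        differences = [
            (i + 1, ref, obs)
            for i, (ref, obs) in enumerate(zip(canonical, window))
            if ref != obs
        ]
        if len(differences) <= max_substitutions:
            hits.append((start, window, differences))
    return hits


# Assumption: native form of the DQ2-Glia-alpha2 core (Q instead of deamidated E).
ALPHA2_NATIVE_CORE = "PQPQLPYPQ"

# Assumption: native form of a Gli-A2-like peptide carrying serine at p8.
variant_peptide = "QLQPFPQPQLPYSQPQ"

variant_hits = scan_for_variants(variant_peptide, ALPHA2_NATIVE_CORE)
for start, window, differences in variant_hits:
    print(start, window, differences)
```

Deletion variants, such as those described for Gli-B2, would need an alignment-based comparison rather than this fixed-width window.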

T cell clones, T cell proliferation and HLA-DQ2 binding assays
Gluten specific T cell clones were generated from small intestinal biopsies of celiac disease patients as described before [8,11,12]. All patients signed an informed consent form which was approved by the hospital ethics committee. Proliferation assays were performed in triplicate in 150 µl Iscove's Modified Dulbecco's Medium (Bio Whittaker, Verviers, Belgium) with 10% pooled normal human serum in 96 well flat-bottom plates using 2×10⁴ gluten specific T cells stimulated with 10⁵ irradiated HLA-DQ2-matched allogeneic peripheral blood mononuclear cells (3000 rad) in the presence or absence of antigen (1-10 µg/ml) [8,11,12]. After 2 days ³H-thymidine (0.5 µCi/well) was added to the cultures, and 18-20 hours thereafter the cells were harvested. ³H-thymidine incorporation in the T cell DNA was determined with a liquid scintillation counter (1205 Betaplate Liquid Scintillation Counter, LKB Instruments, Gaithersburg, MD). A binding assay was performed as described previously [24].

Supporting Information
Figure S1 Phylogenetic analysis of α-gliadin sequences. A neighbor-joining tree was made with 55 EST consensus nucleotide sequences from hexaploid bread wheat together with 56 genomic DNA sequences derived from the diploid wheat species T. monococcum (A genome, green dots), T. speltoides (S/B genome, blue dots) and Aegilops tauschii (D genome, red dots), after alignment using Clustal W. The EST sequences (black dots) can be assigned to their locus in hexaploid bread wheat as they cluster into the same three groups as the sequences from the diploid species (A genome = locus Gli-A2, S/B genome = locus Gli-B2, D genome = locus Gli-D2 [17]). (TIF)
Figure S2 Sequence variation in the N-terminal and C-terminal regions of α-gliadin genes from hexaploid wheat. The 23 most frequently found expressed sequence tag (EST) contigs were translated, the amino acid sequences were aligned and grouped per chromosomal location (the Gli-A2 locus on chromosome 6AS, Gli-B2 on 6BS, and Gli-D2 on 6DS) to present the variation in various gluten epitope regions. Note the large differences in the number of times (N) each sequence was present in the set of ESTs. In red: amino acid variation in the sequence. In black: chymotrypsin or trypsin sites (>72% affinity). (TIF)
Figure S3 DQ2-Glia-α1 variants. The effect of various amino acid substitutions on the ability of the DQ2-Glia-α1 peptide to stimulate DQ2-Glia-α1 T cells. (TIF)
Table S1 Amino acid sequences of the DQ2-Glia-α1/-α2/-α3 region of α-gliadin cDNA transcripts from three diploid T. monococcum accessions (diploid, AA genome). The gene sequences have been submitted as accession numbers HQ317881-HQ317890. (DOC)
Table S2 The effects of proline to serine substitutions in an elongated version of the 33-mer, which also encodes the DQ2-Glia-α3 epitope. (DOC)