Impact of mutations in Toll-like receptor pathway genes on esophageal carcinogenesis

Esophageal adenocarcinoma (EAC) develops in an inflammatory microenvironment with reduced microbial diversity, but mechanisms for these influences remain poorly characterized. We hypothesized that mutations targeting the Toll-like receptor (TLR) pathway could disrupt innate immune signaling and promote a microenvironment that favors tumorigenesis. Through interrogating whole genome sequencing data from 171 EAC patients, we showed that non-synonymous mutations collectively affect the TLR pathway in 25/171 (14.6%, PathScan p = 8.7x10-5) tumors. TLR mutant cases were associated with more proximal tumors and metastatic disease, indicating possible clinical significance of these mutations. Only rare mutations were identified in adjacent Barrett’s esophagus samples. We validated our findings in an external EAC dataset with non-synonymous TLR pathway mutations in 33/149 (22.1%, PathScan p = 0.05) tumors, and in other solid tumor types exposed to microbiomes in the COSMIC database (10,318 samples), including uterine endometrioid carcinoma (188/320, 58.8%), cutaneous melanoma (377/988, 38.2%), colorectal adenocarcinoma (402/1519, 26.5%), and stomach adenocarcinoma (151/579, 26.1%). TLR4 was the most frequently mutated gene with eleven mutations in 10/171 (5.8%) of EAC tumors. The TLR4 mutants E439G, S570I, F703C and R787H were confirmed to have impaired reactivity to bacterial lipopolysaccharide with marked reductions in signaling by luciferase reporter assays. Overall, our findings show that TLR pathway genes are recurrently mutated in EAC, and TLR4 mutations have decreased responsiveness to bacterial lipopolysaccharide and may play a role in disease pathogenesis in a subset of patients.


Introduction
Esophageal adenocarcinoma (EAC) is increasing in incidence and has poor survival outcomes. The main risk factor for EAC is Barrett's esophagus, a pre-malignant glandular epithelium that develops in the setting of gastro-esophageal reflux disease. Over time exposure to refluxed acid and bile in the lower esophagus leads to chronic inflammation, increased cell turnover, production of reactive oxygen species and DNA damage [1,2]. The combination of exposure to noxious substances and defects in DNA damage repair enable cancer cells to accumulate mutations, evidenced by characteristic mutational signatures [3,4].
Next generation sequencing studies have shown that accumulation of somatic mutations occurs along the metaplasia-dysplasia-carcinoma sequence in EAC [5,6] and potentially impair important cell functions. However, few genes are mutated in greater than 10% of cases, underlining the heterogeneous nature of point mutations and small indels in this type of cancer. As a result it is difficult to pinpoint the genes that are relevant to tumorigenesis using traditional approaches based on the frequency of mutations in a single gene. It has been proposed that groups of genes involved in similar processes or pathways could have a cumulative effect. An example is the SWI/SNF nucleosome remodeling complex (ARID1A, SMARCA4 and ARID2), for which gene members are mutated collectively in 20% of EAC tumors [6]. Computational tools such as PathScan [7] have been developed to identify cellular pathways that are targeted by somatic mutations above the background mutation rate.
Epidemiologic evidence has linked the rising incidence of EAC with the eradication of Helicobacter pylori [8,9], and studies have suggested that the esophageal microbiota are altered in Barrett's esophagus [10][11][12] and EAC with decreased microbial diversity [13]. The Toll-like receptor (TLR) signaling pathway is a key component of the innate immune system and one of the main ways in which tumor cells interact with the microbiota. TLRs are pattern recognition receptors that bind unique molecular components of the microbiota and generate a proinflammatory innate immune reaction through nuclear factor kappa B (NF-κB) [14]. TLR signaling exerts an effect on epithelial cell function in the gastrointestinal tract, including stimulating repair of damaged enterocytes (lipopolysaccharide), enhancing cell proliferation, and triggering secretion of antimicrobial peptides (lipopolysaccharide and flagellin) [15]. Alongside TLRs, inflammasomes also contribute to inflammation through recognition of pathogenassociated molecular patterns. NLRP6 is a member of the NOD-like receptor family that plays a role in regulating inflammation and epithelial cell repair in the intestine and has been implicated in colorectal carcinogenesis [16,17].
Genome-wide association studies [18,19] and next-generation sequencing studies [5,6] have identified TLR4 gene mutations in solid tumors including EAC. We hypothesized that somatic mutations may collectively target the TLR signaling pathway in EAC and alter inflammatory signaling. We aimed to interrogate TLR pathway mutations and expression in a cohort of EAC and Barrett's esophagus patients with clinical outcome data. We then investigated whether TLR4 mutations in EAC affect downstream inflammatory signaling using an in vitro model system. Finally, we aimed to determine the broader relevance of TLR pathway mutations in other cancers using TCGA data and the COSMIC database.

The TLR pathway is significantly mutated in EAC
To determine whether the TLR signaling pathway is dysregulated through somatic mutations in EAC, we interrogated the mutational profiles of 171 EAC tumors and matched germline controls that were sequenced as part of the International Cancer Genome Consortium (ICGC) esophageal study. Non-synonymous somatic mutations affected the Toll-like receptor signaling pathway in 25 Table). Applying the PathScan tool showed that genes in the TLR signaling pathway were significantly enriched for mutations (p = 8.7x10 -5 ). Fig 1A presents the mutation and copy number status for known recurrent alterations in EAC in the context of TLR pathway mutated samples, and these events are compared with the remainder of the ICGC cohort in S1 Fig. As expected in EAC, most samples showed mutations in TP53, independent of whether or not they contain TLR pathway mutations. There was no significant enrichment of other known driver mutations in the TLR pathway mutated samples (S1 Fig). TLR pathway mutated tumors did not show significant differences in the numbers of total SNVs (Welch's t-test, p = 0.134) or non-synonymous SNVs (p = 0.147) compared to wild-type tumors and did not show significant enrichment of any of the molecular subtypes recently defined on the basis of their dominant molecular signature (Fisher's exact test, p = 0.57, S1 Fig) [20].
To investigate other potential sources of altered function of the TLR pathway genes, we investigated copy number and structural variants potentially affecting the TLR pathway in the ICGC cohort. Analysis of the copy number profiles of the TLR pathway genes indicated copy number gains for LBP and CTSK, while TLR3, RIPK1 and CD14 frequently showed fewer copies compared to the average ploidy (S2 Fig). Three samples showed homozygous deletions in TLR pathway genes: MYD88 in ICGC-30, IRAK1 in ICGC-24, and TLR7, TLR8 and IRAK1 in ICGC-10. Interestingly, several TLR pathway genes were affected by loss of heterozygosity. Loss of heterozygosity events overlapped with SNVs in TLR4 in four samples (ICGC-24, ICGC-156, ICGC-18 and ICGC-141) and in MYD88 in one sample (ICGC-78). Eleven out of 171 samples showed structural variants whose breakpoints overlap with TLR pathway genes or lead to a potential fusion with a TLR pathway gene (S2 Table). The following genes were affected: TLR1, TLR3,  TLR9, TLR10, RIPK1, TOLLIP, TRAF3, TRAF6, TRAM1 and LBP. Most structural variants were classified as duplications or deletions and there were three translocation and two inversion events. No recurrent events were detected.

Validation of TLR pathway mutations in a second EAC cohort and other cancers
To ensure that the findings were generalizable we interrogated exome sequencing data from the Dulak et al. TCGA cohort [6] using our computational pipeline, and found a similar mutational spectrum with non-synonymous mutations in TLR pathway genes in 33/149 (22.1%, PathScan p = 0.05) of EAC samples (Fig 1B). Based on the combined data from both cohorts (Fig 1C and 1D), the most frequently mutated genes were TLR4 (5.3%) and TLR9 (3.1%), and these events appear to be mutually exclusive. We verified 11/11 TLR4 mutations and 2/2 TLR9 mutations from the ICGC cohort using PCR and Sanger sequencing (S3 Fig). We also examined the COSMIC database for 20 different cancer types (as defined in S3 Table), comprising a total of 10,318 samples, and found a high proportion of mutations in TLR pathway genes in endometrial carcinoma ( Table). Next we categorized the different cancer types into two groups based on tumors that arise in body sites associated with significant microbiomes, including the oral and gastrointestinal tract, skin, urogenital tract and respiratory tract [21], and those with rare exposure to microbes. There was an increase in the frequency of TLR pathway mutations in cancer types that are highly exposed to microbes (p = 0.019, Wilcoxon rank-sum test, S4 Fig). This trend was also present when looking only at the fraction of TLR4 mutant tumors (p = 0.028).

Clinical relevance of TLR pathway mutations
Next we investigated the clinical relevance of TLR pathway mutations through correlation with clinical outcome data. In the ICGC EAC cohort, patients with TLR pathway mutations (n = 25) tended to have more advanced disease with metastases (Fisher's exact test, p = 0.036, Table 1). TLR mutant tumors originated more proximally at the level of the gastro-esophageal junction or above (Siewert Type 1-2 or esophageal, Fisher's exact test, p = 0.012). There was a trend towards decreased survival in patients with TLR mutations in comparison to wild-type although this did not reach statistical significance, possibly due to the limited number of TLR mutant cases and length of follow-up (S5 Fig). This trend was also seen when comparing patients with TLR pathway mutations to wild-type patients in the EAC cohort from the TCGA database, although again the clinical data is limited.
EAC frequently arises from the premalignant lesion Barrett's esophagus through different degrees of dysplasia. To investigate the timing of TLR pathway mutations in disease pathogenesis, we examined samples from patients with Barrett's esophagus adjacent to tumor (n = 24). The adjacent Barrett's was non-dysplastic in 18/24 cases, contained low grade dysplasia in 2/24 cases and was indefinite for dysplasia in 2/24 cases. Only 2/24 Barrett's samples showed mutations in the TLR pathway. A TRAF6 mutation was present in tumor and adjacent Barrett's, indicating that the mutation had occurred early in carcinogenesis. In another patient, a TLR9 mutation was detected in Barrett's but not the adjacent tumor. No other TLR pathway mutation status for known driver mutations in EAC (TP53, CDKN2A, SMAD4, PIK3CA) is presented for each sample with TLR pathway mutations in the ICGC and TCGA cohorts. Recurring oncogenic amplifications (ERBB2, EGFR, CCND1) and deletions (CDKN2A/B) that are known drivers in EAC are presented for the ICGC cohort. LY96 is the gene encoding MD2 protein. (C) Total TLR pathway mutation frequency from the combined ICGC and TCGA cohorts (n = 320), with gene names along the vertical axis and percent of mutated samples along the horizontal axis. (D) Schematic of the TLR signaling pathway summarizing the somatic mutations from the combined ICGC and TCGA cohorts, with mutated pathway components highlighted in color and large bold font. Grey pathway components have no TLR pathway mutations in either cohort. https://doi.org/10.1371/journal.pgen.1006808.g001 Impact of mutations in TLR pathway genes on esophageal carcinogenesis mutation was found in any of the Barrett's adjacent to tumors, while three additional tumors had TLR pathway mutations.
In silico analysis suggests TLR4 mutations are potentially damaging Since TLR4 was frequently mutated in both EAC cohorts and other solid tumor types in the COSMIC database, we decided to characterize these mutations in greater detail. Both EAC cohorts identified missense mutations at amino acid position E439 (substitution to glycine) and F487 (substitution to leucine or valine, Fig 2B). The COSMIC database showed additional missense mutations at position E439 (two in stomach adenocarcinoma and one in cutaneous melanoma) and position F487 (two in stomach adenocarcinoma and one in esophageal carcinoma, Fig 2C). Seven tumors with TLR4 mutations had paraffin-embedded tissue available to evaluate mutant TLR4 protein expression using immunohistochemistry for TLR4 monoclonal antibody. Similar to wild-type tumors, the TLR4 mutant tumors showed combined membranous and cytoplasmic staining of TLR4 protein, with staining intensity ranging from weak to strongly positive (S6 Fig). None of the mutant tumors showed complete loss of TLR4 expression, which was anticipated since the missense mutations did not cause truncation of the protein. We hypothesized that the mutations could have a functional effect on TLR4 signaling, and this was supported by computational modeling using a published crystal structure for dimerized human TLR4 ectodomain with associated MD2 co-receptor (LY96 gene product)  Table). TLR4 mutations are indicated in dark blue. (B) Schematic of TLR4 protein with mutations from the ICGC (green circles) and TCGA (purple circles) EAC cohorts. Both EAC cohorts identified missense mutations at amino acid position E439 and F487. (C) Schematic of TLR4 protein with mutations from different cancer types in the COSMIC database. (D) Three lanes show dimer contacts (to other TLR4 chain), MD2, and LPS (PDB ID: 4G8A [22]). The fourth lane was calculated from a homology model of TLR4 TIR domain dimer (based on PDB ID: 2J67 [23]). Contacting TLR4 residues with atoms less than 0.5 Å apart at surfaces were determined using Chimera findclash function. Numbers of contacts are plotted in 10 residue sections of increasingly dark red tones (maximum 8 is black). https://doi.org/10.1371/journal.pgen.1006808.g002 with LPS bound (PDB ID 4G8A [22]) and a hypothetical structure for the dimerized TIR domain based on TLR10 (PDB ID 2J67 [23], S7 Fig). Two structurally significant mutations affected the TIR domain (F703C and R787H), which is involved in downstream signaling and interaction with the adaptor molecule MyD88 [14]. The E439G mutation is also critically located at the TLR4 dimerization interface and may disrupt hydrogen bonds in the binding site of LPS and MD2 ( Fig 2D). Further, amino acid sequence alignment against seven nonhuman species showed that the positions of the TIR domain mutations (F703 and R787) are evolutionarily conserved, along with amino acids L80, L498 and S570, and E439 is semi-conserved (S8 Fig). Overall the combination of crystal structure modeling, sequence alignment, and SNP prediction algorithms (SIFT and Polyphen) suggested that six of the verified TLR4 mutations could have a functional consequence: L80M, E439G, L498V, S570I, F703C and R787H (S5 Table).

TLR4 mutations show decreased NF-κB activation upon stimulation in vitro
To test our functional predictions for the TLR4 mutants, we performed site-directed mutagenesis and NF-κB luciferase reporter assays in HEK293 cells, a common model for measuring TLR signaling. The TLR4 mutants were stimulated first using the weak agonist synthetic monophosphoryl Lipid A (MPLA). There was a significant decrease in ligand-dependent signaling for 7/9 of the TLR4 mutations stimulated with synthetic MPLA (S9 Fig). A double mutation of E439G with F703C, representing a tumor with two TLR4 mutations, showed a further decrease in Impact of mutations in TLR pathway genes on esophageal carcinogenesis TLR4 signaling compared to F703C (p = 0.0052) but not E439G (p = 0.099). Stimulation with stronger TLR4 agonists, synthetic Lipid A and lipopolysaccharide (LPS), showed a significant decrease in signaling for four single mutants (E439G, S570I, F703C and R787H) and the double mutation E439G + F703C ( Fig 3A). Western blotting for recombinant FLAG-TLR4 confirmed adequate expression of the different TLR4 mutants (Fig 3B). We also visualized recombinant TLR4-FLAG protein expression in HEK293 cells using confocal microscopy for mutants E439G and R787H, and there was no difference in expression or localization of TLR4-FLAG for either of the mutants in comparison to wild-type, suggesting that the decreased signaling was due to altered protein function rather than mis-folding and failure to reach the cell surface ( Fig 3C). Next, TLR4 mutants E439G, R787H and E439G+F703C were transfected into EAC cell lines stimulated with LPS for 24 hours, and secretion of the NF-κB dependent cytokine IL-8 was measured. OE33 and JH-EsoAd1 cells were selected because of their low endogenous TLR4 mRNA expression and ability to secrete measurable amounts of IL-8 (S10 Fig). The fold change of IL-8 secretion was significantly lower for mutants R787H and E439G+F703C in comparison to wild-type TLR4 (Fig 3D). In contrast to HEK293 cells, no significant decrease in TLR4 signaling was observed for mutant E439G stimulated with LPS in the EAC cell lines, suggesting that the strong agonist LPS was still able to trigger TLR signaling despite mutation of the ligand binding site.

Upregulation of NLRP6 expression in EAC tumors with TLR pathway mutations
Our experiments in cell lines suggest that TLR4 mutations impair TLR signaling and NF-κB activation in HEK293 cells and IL-8 secretion by EAC cell lines. We next tested whether there was any effect on gene expression in patient data using RNA-Seq data available for the TCGA cohort. Out of 89 samples with RNA-Seq data available, 17 samples had TLR pathway mutations and four had TLR4 mutations. Our analysis using DESeq2 [24] shows that expression of IL-8, NFKB2 and RELB was significantly elevated in the tumors compared to normal samples (n = 10), but no significant difference was observed when comparing tumors with mutations in the TLR pathway and wild-type tumors, possibly related to the small sample size (Fig 4A). Similarly, there was no significant difference in gene expression between TLR4 mutant and wild-type tumors. We next searched for alternative genes whose expression could be affected by TLR mutations in vivo. Tumors with TLR pathway mutations showed significant upregulation of NLRP6, GAST, TTC29 and C19orf69, and down-regulation of SFRP5, MYO18B, NAT8L, SHISA9 and IGFALS (Fig 4B). We validated our findings using RNA-Seq data from 23 independent tumors of which 2 were mutated in the TLR pathway, and significant upregulation was found only for NLRP6 (p = 0.004). NLRP6 is involved in pathogen recognition through the inflammasome pathway and has overlap in function with the TLR pathway. Additionally, quantitative RT-PCR was performed in 22 tumor samples with available RNA (11 of which contained TLR pathway mutations, including 5 TLR4 mutations). The results showed a similar trend, with upregulation of NLRP6 in TLR pathway mutated samples in comparison to wild-type tumors; however, this did not reach significance likely due to the small sample size (p = 0.172, S11 Fig). Of the samples with TLR4 mutations, R787H (relative expression 6.8) and L80M (relative expression 3.7) showed upregulation of NLRP6 compared to mean expression in wild-type tumors (relative expression 0.87, S6 Table).

Discussion
Our results suggest that somatic mutations may collectively target the TLR signaling pathway in EAC (25% of cases) and other solid tumor types. TLR mutant cases were associated with more proximal tumors and metastatic disease, indicating possible clinical significance of these mutations. TLR4 was the most frequently mutated TLR gene in 5% of cases in the combined EAC cohorts. Only two TLR pathway mutations (TLR9 and TRAF6) were detected in the adjacent Barrett's samples in the ICGC cohort. TLR4 mutations have been previously reported in high grade dysplasia [5], which implies that this mutation may be acquired later during disease progression. Although TLR4 is mutated in less than ten percent of EAC cases, it is one of the top twenty most frequently mutated genes in both the ICGC and TCGA cohorts. The most frequently mutated gene in this disease is TP53, which is mutated in approximately 70% of tumors [6]. A limited number of candidate driver genes have been identified that are mutated in greater than ten percent of EAC tumors, including CDKN2A, SMAD4, ARID1A, MYO18B and DOCK2 [5,6].
In addition to EAC, TLR pathway mutations were frequently observed in solid tumors arising in body sites exposed to microbiota [21] including the oral and gastrointestinal tract (colorectal adenocarcinoma, stomach adenocarcinoma and head and neck squamous cell carcinoma), skin (melanoma), urogenital tract (endometrial carcinoma) and respiratory tract (lung adenocarcinoma and squamous cell carcinoma). Regrettably, it was not feasible to analyze our whole genome sequencing data using pathogen discovery software such as PathSeq [25], because the concentration of microbial DNA is so low relative to human DNA in samples derived from esophageal tumor tissue and the results were unreliable with high levels of contaminants. This is a major limitation of analyzing low microbial biomass tissue samples with a WGS approach. Our recent work examining the microbiome in esophageal tumors and Barrett's tissue samples used 16S rRNA gene amplicon sequencing and a microbial extraction protocol with rigorous reagent controls to maximize the microbial DNA yield and account for contaminants [13]. However, only one of the tumors from that study had a TLR4 mutation so Impact of mutations in TLR pathway genes on esophageal carcinogenesis it was not possible to further correlate the findings with our current study, and additional research is needed to investigate the link between TLR mutations and the microbiome in EAC.
Seven TLR4 mutants in our study showed decreased signaling in response to weak agonists, and four were hypo-responsive to strong agonists. This suggests that the mutations differentially affected signaling with weak versus strong agonists, which may be relevant to the different microbial antigens present in the tumor environment. The model system was also a contributing factor; for instance E439G was hyporesponsive to LPS in HEK293 cells but not EAC cell lines. In EAC cell lines, addition of the second mutation F703C was required to significantly reduce TLR4 signaling activity. Further, loss of heterozygosity events overlapped with four TLR4 mutations, R787H, E603D, T193K, and L498V. Of these, R787H showed decreased signaling in response to strong and weak agonists, L498V showed decreased signaling in response to weak agonist only, and E603D and T193K showed no significant change. It is conceivable that loss of heterozygosity could potentially compound the effect of TLR4 missense mutations and further decrease signaling. A possible mechanism is that defective TLR4 signaling may negatively impact epithelial cell repair, which is in part dependent on TLR4 stimulation, and potentially enable microbes to breach the epithelial barrier in the tumor microenvironment. Hold et al. found that the single nucleotide polymorphism c.896A>G (p.D299G) was associated with an increased risk of gastric cancer in two patient populations [19], and cells with this mutation have been shown to be hypo-responsive to lipopolysaccharide [26,27]. However, Hold et al. found no significant increased risk for EAC or esophageal squamous cell carcinoma with this germline polymorphism, and the D299G polymorphism was not identified in either of the ICGC or TCGA cohorts.
The relationship between TLR4 signaling and tumorigenesis is complex and involves both innate and adaptive immunity, with evidence showing that TLR4 signaling can enhance or suppress cancer development, depending on the model system. For example, in the setting of chemically induced colitis TLR4 deficient mice were protected from colon carcinogenesis [28], while villin-TLR4 mice overexpressing TLR4 in the intestinal epithelium were highly susceptible to cancer when treated with dextran sodium sulphate and azoxymethane [29]. Conversely, overexpressing TLR4 using the CD4-TLR4 transgene in the intestinal epithelium of APC Min/+ mice that are genetically susceptible to colon carcinoma reduced tumorigenesis by increasing apoptosis [30]. Additionally, analysis of TCGA expression data suggested that NLRP6 is upregulated in TLR mutant tumors, which may reflect cross-talk between the TLR pathway and NOD-like receptor signaling pathway. NLRP6 regulates inflammasome-dependent innate immune signaling and has been shown to inhibit TLR2 and TLR4-dependent activation of NF-κB and MAP-kinase signaling pathways [31]. Further understanding of how altered TLR signaling may contribute to the inflammatory tumor microenvironment in Barrett's carcinogenesis could be helpful in cancer prevention strategies.

Ethics statement
Ethical approval was obtained from the National Research Ethics Services Cambridgeshire Research Ethics Committee on behalf of all hospital centers in the OCCAMS/ICGC trial (REC 07/H0305/52 and 10/H0305/51). Written informed consent was obtained from all subjects prior to the collection of samples and recording clinical information.

Human samples
EAC tissue samples with matched germline controls (n = 171 patients) were collected from 11 UK hospitals participating in Oesophageal Cancer Clinical and Molecular Stratification (OCCAMS) for the International Cancer Genome Consortium (ICGC). Ethical approval was obtained from the National Research Ethics Services Cambridgeshire Research Ethics Committee on behalf of all hospital centers in the OCCAMS/ICGC trial (REC 07/H0305/52 and 10/ H0305/51). Written informed consent was obtained from all subjects prior to the collection of samples and recording clinical information. All tissue samples were flash frozen in liquid nitrogen and stored at -80˚C. The tissue histology was reviewed by an expert gastrointestinal pathologist to ensure that tumor cellularity was greater than seventy percent in samples selected for sequencing.

Whole genome sequencing
DNA was extracted using the QIAGEN AllPrep kit and quantified using the Qubit Fluorometer. Whole genome sequencing was performed on the Illumina HiSeq platform in San Francisco, CA, USA. 100 bp paired-end sequencing was performed to an average depth of 50-fold for tumors and 30-fold for controls, achieving a Phred quality score of at least 30 for 80% of mapping nucleotide bases. WGS sequencing data has been deposited in the European Genome-phenome Archive under accession number EGAD00001001960: https://www.ebi.ac.uk/ega/search/ site/EGAD00001001960.

Variant calling
Single-nucleotide variant calling was performed as described previously [20]. Briefly, reads were mapped to the human reference genome (GRCh37) using BWA 0.5.9 [32]. Mutations and Indels were called using Strelka 1.0.13 [33]. The resulting SNVs were filtered using a custom set of filters [20] and then annotated using the Variant Effect Predictor (VEP release 75) tool [34]. As described in Secrier et al. [20], for structural variant calling the reads were mapped to the GRCh37 reference genome, and Manta [35] was used to identify putative breakpoint junctions using discordant read pairs and split reads. Breakpoints and potential runthrough events were annotated using the Ensembl GRCh37, version 75 gene annotation.

Copy number calling and loss of heterozygosity
Copy number calling was performed as described by Secrier et al. [20] In short, absolute and minor copy number of genomic loci was estimated using ASCAT-NGS v2.1 [36]. Read counts at germline heterozygous positions were estimated by GATK 3.2-2 [37]. ASCAT estimates for tumor purity and average ploidy for each sample were used. Loss of heterozygosity and homozygous deletions were defined for loci with an estimated minor copy number of 0 and an estimated total copy number of 0, respectively. For each gene the highest copy number change is shown. The copy number alterations for genes were classified into gains, losses, amplifications and deletions based on the relative copy number rCN = log2(CN/ploidy), where CN is the copy number of the gene and ploidy is the average ploidy as estimated by ASCAT. The cut-offs for classification are: deletion-rCN< = -2, loss-rCN< = -1, no change--1>rCN<1, gain-rCN> = 1, amplification-rCN> = 2.

TLR pathway analysis
We defined a list of genes that are specific to the TLR pathway based on the KEGG Toll-like receptor signaling pathway. PathScan [7] was used to assess whether mutations affecting the TLR pathway were significantly enriched in the ICGC cohort or the Dulak et al. TCGA cohort [6]. We used a Docker image (https://hub.docker.com/r/beifang/music/) of the Genome MuSiC 0.4 suite [38] to run PathScan. The overall background mutation rate was calculated on non-synonymous mutations using the human genome GRCh37 and all human Ensembl genes as a reference. The mutational consequences of each SNV predicted by VEP were converted to MAF format using the conversion scheme of vcf2maf (https://github.com/mskcc/vcf2maf/ blob/master/vcf2maf.pl). The COSMIC database v78 (http://cancer.sanger.ac.uk/cosmic) was downloaded and interrogated for TLR4 and TLR pathway mutations. We only used mutations derived from high-throughput studies. Further, we selected only missense and nonsense substitutions, as well as frameshift insertions or deletions. Mutations were counted by cancer type as defined by the filters for tissue type and histology described in S3 Table. Sample classification based on mutational signatures Samples were grouped into molecular subtypes as defined by Secrier et al. [20] The mutational context of each mutation was defined using the UCSC hg19 reference genome and the R package SomaticSignatures [39]. The estimated mutational signatures from Secrier et al. were used to fit the exposures by non-negative least squares. The three molecular subgroups were defined by consensus clustering on the estimated exposures.

Structural modeling of TLR4 mutations
The ICGC TLR4 mutations were modeled using mutation functions in COOT (Crystallographic Object-Oriented Toolkit) [40] applied to the crystal structure for dimerized TLR4 ectodomain (PDB ID: 4G8A [22]) bound to MD2 and lipopolysaccharide (LPS) and a hypothetical structure for the Toll/interleukin-1 receptor (TIR) domain based on the TLR10 dimer (PDB ID: 2J67 [23]). The starting ectodomain PDB structure 4G8A was the common human variant (D299G and T399I), which was initially mutated back to D299 T399 before modeling the observed ICGC mutations. The TLR10 TIR domain template had a 35% identity (61% similarity with BLOSUM50 matrix) over the 143 residues of the TLR4 TIR domain. TLR4 ectodomain model energy minimization, was with FoldX 3.0 [41], and TIR dimer contacts were optimized using PyRosetta [42]. Contacting TLR4 residues less than 0.5 Å apart at surfaces were determined using UCSF Chimera and sorted by residue range in Mathematica(TM) to show dimer contacts (to other TLR4 chain), interactions with the ligand lipopolysaccharide (LPS) and adaptor MD2. Amino acid sequence alignment for human TLR4 was performed using Uniprot (http://www.uniprot.org/align/) and Espript (http://espript.ibcp.fr/ESPript/cgibin/ESPript.cgi) against seven non-human species.

PCR verification of TLR4 mutations
Primers were designed to amplify an area approximately 200 bp in size surrounding TLR4 mutations using Primer3 (S7 Table). PCR amplification was performed with Q5 Hot Start High Fidelity Taq and the following conditions: denaturation at 98˚C for 30 s, 35 cycles of 98˚C for 10 s, 55-60˚C for 10 s and 72˚C for 10 s, followed by a final extension at 72˚C for two minutes. The PCR product was sent for Sanger sequencing (Source BioScience, Cambridge, UK).

Site-directed mutagenesis
Site-directed mutagenesis was performed using the Stratagene QuikChange II Kit and PfuUltra HF polymerase (Agilent Technologies, Inc.). Primers were designed using PrimerX with the mutation site at the center region (S7 Table). Mutagenesis was confirmed by Sanger sequencing the entire TLR4 cDNA. NOT1 and BamH1 (New England Biolabs) were used for cloning, and ligation was performed with T4 DNA Ligase (New England BioLabs) using an insert:vector ratio of 3:1.

Quantitative reverse transcription PCR (qRT-PCR)
Total RNA was extracted using the AllPrep mini kit (QIAGEN), and 1 μg of RNA was transcribed into cDNA using the Quantitect reverse transcription kit (QIAGEN) and diluted 1:5 in nuclease free water. Two μl of cDNA were amplified in a 10 μl reaction volume containing 500 nM of each primer and 5 μl SYBR Green I Master (Roche). The primers for TLR4 spanned across two exons and the sequences were 5'CCGACAACCTCCCCTTCTCA (forward primer) and 5'GGCTCTGATATGCCCCATCTTC (reverse primer). The sequences for NLRP6 primers were 5'cctggtgggtatgcttcgg (forward primer) and 5'ctcctgtagtgactgctcgc (reverse primer). The LC480 LightCycler 480 II (Roche) template for SYBR Green 384 well plates was used with 45 amplification cycles of 10 s each for 95˚C, 60˚C, 72˚C. All reactions were performed in triplicate. The cycle threshold (C T ) results were normalized by subtracting the average of three housekeeping genes: ribosomal protein S18 (RPS18: forward 5'ATCCCTGAAAAGTTCCAG CA, reverse 5'CCCTCTTGGTGAGGTCAATG), beta-actin (ACTB: forward 5'GGCACCCAG CACAATGAAGA, reverse 5' AAGCATTTGCGGTGGACGAT) and glyceraldehyde 3-phosphate dehydrogenase (GAPDH: forward 5'GTCTCCTCTGACTTCAACAGCG, reverse 5'AC CACCCTGTTGCTGTAGCCAA). The delta-delta C T method [47] was used to compare relative mRNA expression between cell lines.

Enzyme-linked immunosorbent assay
Cells were seeded at 20,000 cells per well in 96-well plates and transfected with 0.25 μl Lipofectamine 2000 (Polyplus Transfection) per well and the expression plasmids (S9 Table). Twentyfour hours after transfection, cells were stimulated with 10 ng/μl monophosphoryl Lipid A or 40 ng/ml LPS diluted in culture medium. Secretion of IL-8 was measured using human Quantikine ELISA kits (R&D Systems Europe Ltd.) in duplicate.

Confocal microscopy
Cells were seeded at 40,000 per well on coverslips in 24-well plates and transfected with 1 μl jet-PEI per well and the appropriate transfection vectors (S10 Table), in triplicate. Forty-eight hours after transfection the cells were washed with PBS and fixed with 4% paraformaldehyde in PBS for 30 minutes at room temperature. The cells were washed twice and permeabilized with 0.2% Triton X-100 in PBS for 20 minutes at room temperature. The cells were washed three times with PBS and blocked with 2% bovine serum albumin and 5% chick serum in PBS for one hour at room temperature. The cells were stained with Alexa Fluor 488 conjugated DYKDDDDK (FLAG) Tag antibody (Cell Signaling Technology, 5407) diluted 1:200 in blocking buffer overnight at 4˚C protected from light. The cells were washed three times with 0.1% TWEEN 20 in PBS (PBS-T) and incubated with Atto 594 Phalloidin (Sigma-Aldrich, 51927) diluted 1:100 in PBS for 20 minutes at room temperature protected from light. The cells were washed with PBS-T and stained with DAPI in PBS (1:5000) for 15 minutes at room temperature protected from light. The coverslips were inversely mounted on slides with hard-dry DPX mountant (Leica Biosystems). A Leica TCS SP5 II confocal microscope (Leica Microsystems) was used to image the cells.

RNA-Seq analysis
RNA-Seq read counts per gene and variant calls for esophageal cancer were downloaded from the TCGA data portal (https://gdc.cancer.gov/). Only esophageal adenocarcinoma samples that had matched genotype data available and normal control samples were selected for the analysis. ICGC RNA-Seq bam files in Secrier et al. [20] for samples matching our cohort were also used. Reads overlapping RefSeq genes were counted in R using the GenomeAlignment package [49]. Entries with duplicated gene names were summarized to the median read count per sample. Only genes having at least one read count in any of the samples were retained. Differential expression analysis between TLR pathway mutated vs. wild-type, TLR pathway vs. control and wild-type vs. control samples was performed using DESeq2 [24].

Statistical analysis
The analysis for luciferase reporter assays was performed using the ANOVA test and the Tukey post-test in Graphpad Prism 6. Within Graphpad, the nonparametric Kruskal-Wallis test and Dunns post-test was used to compare means for ELISA assays. Kaplan-Meier curves for survival analysis were generated using the log-rank test. The Wilcoxon rank-sum test was used to compare the frequency of TLR mutant samples for microbiome exposed versus rarely exposed tumors in the COSMIC database. A significant p-value was defined as less than 0.05.
Supporting information S1  Table. Predicted functional effects of TLR4 mutations that were verified by Sanger sequencing n the ICGC cohort. For SIFT a low score (0) suggests a deleterious consequence, while a higher score (1) indicates the mutation is well tolerated. For PolyPhen a low score (0) indicates a benign mutation, while a high score (1) indicates a damaging mutation.  that occur in body sites exposed to microbiota versus rarely exposed to microbiota. There is an increase in the frequency of mutated samples in cancer types that are highly exposed to microbes for TLR pathway mutant samples (p = 0.019, Wilcoxon rank-sum test) and TLR4 mutant samples (p = 0.028). Global sequence alignment to seven non-human species (mouse, gorilla, orangutan, rat, bovine, equine and porcine) was performed using Uniprot (http://www.uniprot.org/align/) and Espript (http://espript.ibcp.fr/ ESPript/cgi-bin/ESPript.cgi). Evolutionarily conserved amino acids are highlighted in orange with red text, semi-conserved amino acids have red text only, and non-conserved amino acids have black text only. The secondary structure annotation is from the model in S7 Fig