Oral Microbiota and Risk for Esophageal Squamous Cell Carcinoma in a High-Risk Area of China

Poor oral health has been linked with an increased risk of esophageal squamous cell carcinoma (ESCC). We investigated whether alteration of oral microbiota is associated with ESCC risk. Fasting saliva samples were collected from 87 incident and histopathologically diagnosed ESCC cases, 63 subjects with dysplasia and 85 healthy controls. All subjects were also interviewed with a questionnaire. The V3–V4 region of 16S rRNA was amplified and sequenced on the 454 pyrosequencing platform. Carriage of each genus was compared by means of multivariate-adjusted odds ratios derived from a logistic regression model. Relative abundance was compared using the Metastats method. Beta diversity was estimated using UniFrac and weighted UniFrac distances. Principal coordinate analysis (PCoA) was applied to ordinate dissimilarity matrices. Multinomial logistic regression was used to compare the coordinates between different groups. ESCC subjects had an overall decreased microbial diversity compared to control and dysplasia subjects (P<0.001). Decreased carriage of the genera Lautropia, Bulleidia, Catonella, Corynebacterium, Moryella, Peptococcus and Cardiobacterium was found in ESCC subjects compared to non-ESCC subjects. Multinomial logistic regression analyses on PCoA coordinates also revealed that ESCC subjects had significantly different levels for several coordinates compared to non-ESCC subjects. In conclusion, we observed a correlation between altered salivary bacterial microbiota and ESCC risk. The results of our study on the saliva microbiome are of particular interest as they reflect the shift in microbial communities. Further studies are warranted to verify this finding and, if it is verified, to explore the underlying mechanisms.


Introduction
The positive association between alcohol use, tobacco smoking and the risk of esophageal squamous cell carcinoma (ESCC) has been well established, especially in Western countries. However, in areas with a high incidence of ESCC, including the so-called "Asian esophageal cancer belt", the major factors contributing to ESCC are yet to be established. [1] Recently, an association between indicators of poor oral hygiene and ESCC has been reported in studies from several high-risk areas of China, [2] India, [3] and Iran, [4] and from other areas including Latin America [5] and Japan. [6] Furthermore, poor oral health was reported as a risk factor for the precursor lesion of ESCC, i.e. esophageal squamous dysplasia, [7] and it may act synergistically with other risk factors (e.g. gastric atrophy) in increasing the risk of ESCC. [8] There is reason to assume that poor oral health and hygiene are critical risk factors for ESCC in high-risk areas.
The underlying mechanism for the associations between oral health status and ESCC risk is not completely understood. It is well established that the oral microbiome plays a critical role in the maintenance of a normal oral physiological environment and in the development of oral diseases, including periodontal diseases and tooth loss. Although little studied, the oral microbiome may be important in cancer and other chronic diseases, through direct metabolism of chemical carcinogens (e.g. nitrite, ethanol) [9,10] and through systemic inflammatory effects [11]. We hypothesized that a stronger underlying association of ESCC risk with oral microbiome profiles would exist. Although some specific bacterial species in tissue and saliva have been linked to an elevated risk of ESCC by targeted approaches, [12,13] to date few studies have systematically investigated the relation between oral microbiota and ESCC risk. In the current study, we aimed to investigate the potential association between oral microbiota in saliva and ESCC risk using a 16S rRNA amplicon sequencing approach, based on a large case-control study conducted in Taixing, an area with a high incidence of ESCC.

Study base
A case-control study on esophageal cancer was conducted between October 2010 and March 2012 in Taixing, Jiangsu Province, China. Briefly, cases were recruited mainly from endoscopy units at the four largest hospitals of Taixing (the People's Hospital of Taixing, the Second People's Hospital of Taixing, the Third People's Hospital of Taixing and the Hospital of Traditional Chinese Medicine of Taixing). More than 90% of the patients in this area are referred to these hospitals. Subjects who were suspected of having esophageal cancer at endoscopy were asked to participate in the study. Case recruitment was supplemented by linkage to the local Cancer Registry, and sample collection for the supplementary cases was conducted at the end of the same year. Control subjects, frequency matched to the ESCC cases on sex and age in 5-year groups and randomly selected from the Taixing population register, were enrolled during the same period as the cases. All subjects were restricted to local inhabitants who had lived in Taixing for at least 5 years prior to the diagnosis date for cases or the interview date for controls.
The current study is a sub-project of the case-control study, focusing on the relation between oral microbiota and ESCC risk. Cases were those recruited from the endoscopy units between October 2010 and August 2011 (N = 331), and controls were those recruited between June 2011 and August 2011 (N = 400). To avoid possible confounding by factors that might affect the diversity of oral microbiota, i.e. ambient temperature and dietary habits in different seasons, cases collected between November 2010 and March 2011 were excluded (N = 124). Cases without histopathological confirmation, a complete questionnaire or a saliva sample were also excluded (N = 36). The study base of the current study thus included 171 ESCC cases and 400 controls. Since saliva collection was performed after endoscopy for ESCC cases, whereas for control subjects it was performed only after overnight fasting, we could not exclude the possibility that contamination during endoscopy affected the diversity of oral microbiota. Therefore, we included 80 subjects with a suspected diagnosis of esophageal dysplasia, who also underwent endoscopy during the whole study period, as another "control" group in the current study.

Data collection by interview
All subjects underwent face-to-face interviews by trained interviewers using a standardized questionnaire. The questionnaire covered detailed information on age, sex, education, smoking, alcohol drinking, family history of ESCC and other potential confounders of interest. Dietary habits 10 years before interview were collected using a food frequency questionnaire specifically designed for this population. [14] The trained personnel counted each subject's number of teeth, recorded the number of missing and filled teeth (the sum of which was the MFT score) [4] and oral hygiene habits (times of tooth brushing per day).

Saliva DNA extraction and subject selection
About 2 to 3 mL of saliva was collected from each participant after overnight fasting. Saliva was collected before any antitumor treatment of cases, and neither cases nor controls reported a prescription of antibiotics in the month before interview. Each saliva sample was mixed with 3 mL of lysis buffer (50 mM Tris, pH 8.0, 50 mM EDTA, 50 mM sucrose, 100 mM NaCl, and 1% sodium dodecyl sulfate). The mixture was delivered to the laboratory on the day of collection and stored in -20°C freezers. A modified high-salt DNA extraction method was used to extract DNA from the saliva samples. Thirty microliters of proteinase K (20 mg/mL, Sigma) and 150 µL of 10% sodium dodecyl sulfate were added to 2 mL of the mixture, which was then incubated overnight at 53°C in a shaking water bath. After addition of 400 µL of 5 M NaCl and incubation for 10 min on ice, the mixture was distributed equally into 2-mL centrifuge tubes and centrifuged for 10 min at 13,000 rpm in an Eppendorf 5415D centrifuge. The supernatant from each tube was transferred to a new tube to which 800 µL of isopropanol was added; the tubes were then incubated for 10 min at room temperature and centrifuged for 15 min at 13,000 rpm. The supernatants were discarded, and the sediments were washed twice with 500 µL of 70% ethanol; the sediments were then dried and dissolved in 30 µL of double-distilled water. The DNA concentration of each sample was measured with a NanoDrop spectrophotometer.
We first selected study subjects according to the DNA quality standards (DNA concentration: 20ng/uL; A260/280: 1.8~2.0; total amount: 400ng) set by the BGI Company (Shenzhen, China) which conducted sequencing for current study. Eventually, 100 of 171 ESCC cases, 70 of 80 dysplasia controls and 312 of 400 controls met the standards. We thus enrolled 100 ESCC cases and 70 dysplasia control subjects, and for healthy controls we selected 100 controls frequency matched to the ESCC cases by sex and age in 5-year groups. Meanwhile, pathological sections were re-reviewed by an experienced pathologist, and one case from ESCC group was re-diagnosed as esophageal adenocarcinoma (excluded), three subjects from the dysplasia group were re-diagnosed as ESCC (regrouped into ESCC). Finally, 102 ESCC cases (ESCC group), 67 dysplasia control subjects (Dysplasia group) and 100 healthy controls (control group) were included in the current study.

Sequencing, data processing and statistical analysis
16S ribosomal RNA (16S rRNA) amplicons covering hypervariable regions V3 to V4 were generated using primers (341_F-CCTACGGGNGGCWGCAG and 805_R-CTACCRGGGTATCTAATCC) incorporating Roche 454 FLX Titanium adapters (Branford, CT) and sample barcode sequences. [15] Amplicons were sequenced using the single-read sequencing method following the manufacturer's specifications on the 454 Roche FLX Titanium pyrosequencing platform. Laboratory personnel were blinded to the case-control status. All the procedures except DNA extraction were conducted by the BGI Company.
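The primers above contain degenerate IUPAC bases (N, W, R), so matching a read against them means expanding each ambiguity code. The following is a minimal Python sketch of that idea, not the pipeline used in the study. The helper names (`primer_to_regex`, `strip_primer`) and the toy read are hypothetical, and exact matching at the read start is an assumption.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def primer_to_regex(primer):
    """Compile a degenerate primer into a regex anchored at the read start."""
    return re.compile("^" + "".join(IUPAC[base] for base in primer.upper()))

def strip_primer(read, primer):
    """Return the read without its leading primer, or None if it does not match."""
    match = primer_to_regex(primer).match(read)
    return read[match.end():] if match else None

FORWARD_341F = "CCTACGGGNGGCWGCAG"  # forward primer as given in the text

# Toy read (made up for illustration): the primer with N=A and W=T, then an insert.
toy_read = "CCTACGGGAGGCTGCAG" + "TTGACC"
print(strip_primer(toy_read, FORWARD_341F))        # TTGACC
print(strip_primer("A" + toy_read, FORWARD_341F))  # None (primer mismatch)
```

Returning None for a mismatch mirrors the rule of discarding reads whose primer does not match, while a matching read is returned with the primer removed.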
Amplicon reads with mismatches in either primer or barcode were discarded and the remaining reads were stripped of barcode and primers. The fastq_filter command of USEARCH 7.0.1001 [16] was used to discard reads with more than one expected error as well as to truncate reads to a length of 300 nucleotides. Shorter reads were discarded. The quality-filtered reads were abundance sorted and clustered into operational taxonomic units (OTUs) using the USEARCH cluster_otus command with 97% sequence identity. Singleton reads were ignored in the cluster_otus command to avoid spurious OTUs. Chimera removal was performed as part of the OTU clustering step and by using the USEARCH uchime_ref command against the "Gold" ChimeraSlayer reference database (r20110519). [17] Abundance tables were created by aligning the quality-filtered reads against the database of OTUs with the usearch_global command. QIIME 1.7.0 [18] was used to assign taxonomy and to generate a phylogenetic tree after aligning the reads and filtering alignments. The scripts used in this step were: assign_taxonomy_rdp.py to assign taxonomy against the Greengenes database (v12_10) [19] with the RDP classifier (v2.2), make_phylogeny.py to build a phylogenetic tree using FastTree (v2.1.3), [20] align_seqs_pynast.py to align with PyNAST [21] against the Greengenes core reference alignment, and filter_alignment.py to filter the PyNAST alignment.
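The expected-error criterion can be illustrated with a short Python sketch. It is a simplified stand-in for the filtering step, not the USEARCH implementation. It assumes that the expected number of errors is the sum of per-base error probabilities derived from Phred scores, and that truncation to 300 nucleotides happens before the error check. The function names are hypothetical.

```python
def expected_errors(phred_scores):
    """Sum of per-base error probabilities, P = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in phred_scores)

def filter_read(seq, phred_scores, trunc_len=300, max_ee=1.0):
    """Truncate to trunc_len, then keep the read only if expected errors <= max_ee.

    Returns the truncated sequence, or None if the read is discarded.
    """
    if len(seq) < trunc_len:
        return None  # reads shorter than the truncation length are discarded
    seq, phred_scores = seq[:trunc_len], phred_scores[:trunc_len]
    if expected_errors(phred_scores) > max_ee:
        return None  # more than max_ee expected errors
    return seq

# Toy reads (made up): 320 bases at uniform quality.
kept = filter_read("A" * 320, [30] * 320)     # about 0.3 expected errors: kept
dropped = filter_read("A" * 320, [20] * 320)  # about 3 expected errors: discarded
print(len(kept), dropped)
```

Under these assumptions, a 300-base read at a uniform Phred score of 30 carries about 0.3 expected errors and passes, while the same read at a score of 20 carries about 3 and is discarded.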
Data analysis and visualization were performed using R (v3.0.1) and the package phyloseq (v1.4.5). [22] Samples with a sequencing depth of fewer than 1000 reads were discarded before analysis to ensure that sufficient biological diversity was captured. Alpha diversity and UniFrac [23] analyses were done after subsampling to even depth to reduce bias due to the dependence of these measures on sampling depth. For analyses at the phylum and genus levels, the following steps were taken: 1) Greengenes suggested taxa assignments were heeded (e.g. [Prevotella] was treated as Prevotella), 2) spurious taxa with mean abundance under 0.01% were removed, 3) each sample was normalized to relative abundance by dividing by total abundance, and finally 4) unclassified taxa at the given rank were removed from further analysis.
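The four genus-level steps can be sketched in plain Python on a toy count table. This is an illustration only; the analysis itself was done in R with phyloseq. The function names, the "unclassified" label and the counts are hypothetical, and the 0.01% threshold is assumed to apply to the mean relative abundance across samples.

```python
from collections import Counter

def clean_name(taxon):
    """Step 1: treat a bracketed Greengenes suggestion as the plain name."""
    return taxon.strip("[]")

def normalize_samples(samples, min_mean=1e-4, unclassified="unclassified"):
    """Apply steps 1 to 4 to a list of {taxon: read count} dictionaries."""
    # Step 1: merge bracketed and plain names within each sample.
    merged = []
    for counts in samples:
        c = Counter()
        for taxon, n in counts.items():
            c[clean_name(taxon)] += n
        merged.append(c)
    # Step 2: drop taxa whose mean relative abundance is below min_mean (0.01%).
    taxa = set().union(*merged)
    rel = [{t: c[t] / sum(c.values()) for t in taxa} for c in merged]
    keep = {t for t in taxa if sum(r[t] for r in rel) / len(rel) >= min_mean}
    # Step 3: normalize each sample to relative abundance over retained taxa.
    normalized = []
    for c in merged:
        total = sum(c[t] for t in keep)
        normalized.append({t: c[t] / total for t in keep})
    # Step 4: remove the unclassified bin (proportions then no longer sum to 1).
    return [{t: v for t, v in s.items() if t != unclassified} for s in normalized]

# Toy counts (made up for illustration).
toy = [
    {"Prevotella": 5000, "[Prevotella]": 1000, "Streptococcus": 3900, "unclassified": 100},
    {"Prevotella": 3000, "Streptococcus": 6800, "Rarebug": 1, "unclassified": 199},
]
result = normalize_samples(toy)
print(result[0])
```

In the toy table, the bracketed and plain Prevotella counts are pooled, the single-read taxon falls below the threshold and is dropped, and the unclassified bin is removed last, following the order of steps given in the text.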
Carriage (presence or absence; i.e. prevalence) of each genus was compared among the three groups, and odds ratios (ORs) were calculated for each genus based on unconditional logistic regression modelling, adjusting for age, sex, education, smoking, alcohol drinking, family history of ESCC, MFT score (the number of missing and filled teeth), times of tooth brushing per day, daily consumption of pickled vegetables and daily consumption of fresh fruits. Relative abundance of each genus was compared using the Metastats package. [24] False discovery rate (FDR) adjustment was used to correct for multiple comparisons. Principal coordinate analysis (PCoA) was applied to ordinate dissimilarity matrices. A multinomial logistic regression model was used to compare the first 10 coordinates from PCoA among groups of study subjects.
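The text does not state which FDR procedure was used. As an illustration of how such an adjustment works, the sketch below implements the Benjamini-Hochberg step-up procedure, which is an assumption here and not a description of the study's analysis. The function name and the p-values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy p-values, e.g. one per genus (made up for illustration).
toy_p = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(toy_p))  # approximately [0.02, 0.04, 0.04, 0.02]
```

Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values from increasing as the raw p-values decrease.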

Ethical considerations
The study was approved by the Institutional Review Board of School of Life Sciences, Fudan University and the Institutional Review Board of Qilu Hospital, Shandong University.
Written informed consent was obtained from all participants before interview and sample collection.

Results
Multiplexed, barcoded sequencing data were deconvoluted, and a total of 1.7M amplicon reads were obtained. Thirty-four samples had fewer than 1000 reads and were excluded, leaving 235 samples (Fig 1). Approximately 52% of all reads were discarded due to insufficient quality or a read length of less than 300 bp (384K short reads and 471K low-quality reads), leaving 800K reads, an average of 3402 good-quality reads per sample. The final data contained 32,192 unique reads, which clustered into 489 OTUs.
Finally, 87 patients with ESCC (ESCC group, 59 males and 28 females), 63 patients with dysplasia (Dysplasia group, 41 males and 22 females), and 85 control subjects (Control group, 62 males and 23 females) remained for further analysis. The clinical parameters including age, sex, education, smoking status, drinking status, MFT score, times of tooth brushing per day, family history of ESCC and daily consumption of pickled vegetables and fruits are shown in Table 1. Times of tooth brushing per day and daily consumption of pickled vegetables were significantly different among three groups (P<0.05); ESCC patients consumed more pickled vegetables and brushed teeth less often compared to Dysplasia and control subjects.
The sequencing reads were assigned to 437 OTUs in the ESCC group, 446 OTUs in the Dysplasia group, and 471 OTUs in the Control group. To evaluate the diversity and richness of bacterial types in the samples, Chao1 and Shannon indices were calculated. Observed mean values of the Chao1 and Shannon indices were 120.8 and 3.4 for the ESCC group, 129.1 and 3.6 for the Dysplasia group, and 147.2 and 3.7 for the Control group, respectively. Tests for difference in OTU diversity and richness, measured by both mean Chao1 and Shannon indices, showed significant differences for ESCC vs Control (P<0.001) and ESCC vs Dysplasia (P<0.01) (Table 2). At low depth, the Chao1 index, the Shannon index and the mean number of OTUs increased sharply in all groups; however, the curves leveled off gradually with increasing sequencing depth. The differences among the three study groups were always significant, even at lower depth (S1 Fig).
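For readers unfamiliar with the two indices, the following Python sketch shows how they can be computed from one sample's OTU counts. It is illustrative only and is not the phyloseq code used in the study. It assumes the natural logarithm for the Shannon index and the bias-corrected form of Chao1; the counts are made up.

```python
import math
from collections import Counter

def shannon(counts):
    """Shannon index H = -sum(p * ln p) over OTUs with nonzero counts."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)

def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    freq = Counter(counts)
    s_obs = sum(1 for n in counts if n > 0)  # observed OTUs
    f1, f2 = freq[1], freq[2]                # singleton and doubleton OTUs
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))

# Toy OTU counts for one sample (made up for illustration).
toy_counts = [10, 5, 2, 2, 1, 1, 1, 0]
print(chao1(toy_counts))    # 7 observed OTUs, F1 = 3, F2 = 2, giving 8.0
print(shannon(toy_counts))
```

Chao1 adds to the observed richness an estimate of unseen OTUs based on the numbers of singletons and doubletons, whereas the Shannon index also reflects how evenly reads are spread across OTUs.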
We compared the overall bacterial community composition using weighted and unweighted UniFrac distance matrices, and applied PCoA to ordinate the matrices. The first 10 coordinates explained 53% and 85% of the variance for UniFrac and weighted UniFrac distances, respectively (S2 Fig). Correlations between coordinates and genera were calculated, and a multinomial logistic regression model was applied to compare the first 10 coordinates among the three groups. For the UniFrac distance, all coordinates except coordinates 7-9 showed significant differences between the ESCC and healthy control groups, while only coordinates 3, 4, 5 and 8 were significant when comparing ESCC with the Dysplasia group. Results were similar after adjustment for age, sex, education, smoking, alcohol drinking, family history of ESCC, MFT, times of tooth brushing per day, daily consumption of pickled vegetables and daily consumption of fresh fruits (Table 4). For weighted UniFrac distances, highly significant differences (coordinates 2 and 3) emerged when comparing ESCC with both the healthy control and Dysplasia groups (Table 4). We further visualized the UniFrac and weighted UniFrac distances using the two most significant coordinates (those which explained the most variance and had small P values) found in the multinomial logistic regression analyses (coordinates 1 and 3 for the UniFrac distance; coordinates 2 and 3 for the weighted UniFrac distance). For both UniFrac and weighted UniFrac distances, ESCC and healthy control subjects tended to cluster in opposite directions, while dysplasia subjects were located between the two groups (Fig 3).

Discussion
Increasing evidence indicates a key role for the bacterial microbiota in carcinogenesis. [25] Our study was based on one of the largest sets of 16S rRNA gene sequences from the human oral cavity to evaluate the association between oral microbiota and ESCC risk. We found that ESCC subjects had decreased overall microbial diversity compared to dysplasia and healthy control subjects.
The current study is a sub-study of a case-control study on upper gastrointestinal cancers. For this study, every effort was made to enroll all of the incident ESCC cases in the study area. The frequency-matched healthy controls were randomly selected from the general population. The participation rates for both cases and controls were more than 75%. Since we tried to enroll cases before a histopathological diagnosis was made, we were also able to include another control group, i.e. dysplasia patients. The similar directions of associations when using different control groups strengthened the validity of our findings. In addition, histopathological diagnosis of ESCC and dysplasia by a single pathologist, saliva collection after overnight fasting for both cases and controls, and collection of extensive information on potential confounders (e.g. smoking, alcohol drinking, and other lifestyle factors) were among the strengths of the current study.
Our study also has several limitations. One of the main limitations was that we did not add beads during saliva DNA extraction, which might affect the observed composition and diversity of the oral microbiota. Nevertheless, since the same procedure of saliva DNA preparation was applied in all three groups, the observed differences between groups were still valid, although we could not draw conclusions about hard-to-lyse bacteria. A large fraction of samples did not pass quality control and were excluded, and underlying factors might be masked in these subjects. We therefore compared included and excluded subjects on the clinical parameters, including age, sex, education, smoking status, drinking status, MFT score, times of tooth brushing per day, family history of ESCC and daily consumption of pickled vegetables and fruits, and found no significant differences. In addition, saliva samples for ESCC cases were collected after endoscopy, unlike those for healthy controls. This might raise a concern that the differences in oral microbiota between ESCC cases and healthy controls were due to contamination during endoscopy. To determine whether endoscopy had led to the changes in the microbiome found in the ESCC population, a small cohort of 30 individuals was enrolled in one of the study hospitals and their saliva samples were obtained both before and after endoscopy. The overall bacterial community composition was analyzed using the same pipeline. No significant difference in bacterial community composition was found between saliva samples collected before and after endoscopy (S3 Fig). Moreover, for subjects in the other control group, i.e. dysplasia subjects, saliva samples were also collected after endoscopy. The similar directions of associations when comparing to different control groups somewhat allayed this concern.
Another limitation is that the comparison of the composition and diversity of oral microbiota might be biased by inconsistent sampling seasons among the three groups, even though samples collected in winter in the ESCC group were excluded because of differences in temperature and dietary pattern. However, when we compared the composition and diversity between sampling seasons, we did not find any significant difference (data not shown). Finally, due to the case-control study design, our results could not distinguish whether decreased microbial richness causes ESCC or is an effect of the cancer status, e.g. the oral microbiota may be modified by the confounding effect of restricted food intake due to symptoms from extensive lesions and/or dry mouth in the cancer group. Currently, we are conducting a prospective cohort study, the Taizhou Longitudinal Study, [26] in which saliva samples were collected at the baseline survey and can be used to prospectively assess the relationship between oral microbiota and the development of ESCC and other gastrointestinal cancers.
In the present study, we found that ESCC subjects had low salivary microbial diversity compared to healthy controls and dysplasia subjects. Similarly, lower bacterial diversity was observed in some other habitats such as the stomach with gastritis [27] and the intestines with colorectal cancer [28]. Most recently, a cross-sectional study in China showed that a decreased microbial richness in the upper digestive tract was associated with cancer-predisposing conditions of the stomach and esophagus (i.e. low serum pepsinogen I/II ratio and esophageal squamous dysplasia). [29] Overall, the abundant bacterial groups found in our study are similar to those found in most other studies. The most common phyla in our samples were Bacteroidetes, Firmicutes, Proteobacteria, Fusobacteria and Actinobacteria. However, our data suggest that the most abundant phylum and genus were Bacteroidetes and Prevotella, which differs somewhat from other studies showing that Firmicutes and Streptococci were dominant in oral microbiota. [30] The shift may be due to the different DNA extraction methods and different broad-range PCR primers applied. An alternative explanation for this inconsistency may be the impact of diet and oral hygiene. It has been reported that the composition of oral microbiota is   [32] investigated human intestinal microbiota from children characterized by a modern western diet and a rural diet, and found that children in rural Africa showed a significant enrichment in Bacteroidetes and depletion in Firmicutes. Individuals with excellent oral hygiene typically harbor a relatively simple flora dominated by gram-positive cocci and rods, mostly comprised of Streptococci; however, in individuals who do not maintain good oral hygiene, the flora shifts to become more diverse and complex and is dominated by anaerobic gram-negative bacteria, including Prevotella. [33] In the present study, even the healthy controls had relatively poor oral hygiene (as indicated by high MFT scores and infrequent tooth brushing). All of the above reasons may contribute to, or partly explain, the differences in oral microbiota between studies.
Our data show that decreased carriage of some genera, e.g. Lautropia, Bulleidia, Catonella, Corynebacterium, Moryella, Peptococcus and Cardiobacterium, is significantly associated with an increased risk of ESCC. However, due to the low relative abundances of these genera (<0.5%), there is a danger that the observed differences were due to insufficient sampling depth. In contrast, presence of the genus Mycoplasma, which is unaffected by many common antibiotics and reported to be associated with several types of cancer, such as gastric cancer, [34] colon cancer [35] and prostate cancer, [36] was more common in ESCC cases than in healthy and Dysplasia control subjects, although the differences were not statistically significant after FDR adjustment. Higher relative abundance of Prevotella and Streptococcus was also observed in the ESCC group compared to the non-ESCC groups. These two genera accounted for nearly 65% of the overall community in ESCC subjects, which might to some extent explain the low diversity of oral microbiota in these patients. Although these genera seem to be non-pathogenic to the host, several studies have indicated that they might be associated with oral and upper digestive tract cancers. [37][38][39] Further studies are warranted to confirm these findings.
Microbiota and host form a complex "super-organism" in which symbiotic relationships confer benefits to the host in many key aspects of life. A growing body of evidence implicates oral bacteria in the etiology of oral and gastrointestinal cancers. [29,39] The oral microbiome may play an important role in cancer development, through direct metabolism of chemical carcinogens (e.g. activating alcohol and smoking-related carcinogens locally), and/or through systemic inflammatory effects. [30] Multi-disciplinary collaborations among various fields including epidemiology, microbiology, genetics, immunology, and bioinformatics will be needed to broaden our understanding of the relationship between the oral microbiome and cancer risk, and to better understand cancer etiology.
In summary, this is the first epidemiological study comparing the oral microbiota of ESCC and control subjects while controlling for potential confounders. We observed a correlation between altered salivary bacterial community structure and ESCC risk. The results of our study on the saliva microbiota are of particular interest given its modifiable nature. However, prospective and longitudinal cohort studies are required to verify this finding, along with functional studies, e.g. metagenomic and transcriptomic studies. Establishment of the association between the oral microbiome and ESCC risk may lead to significant advances in the understanding of cancer etiology, potentially opening a new research paradigm for cancer prevention.