Esophageal microbiome signature in patients with Barrett’s esophagus and esophageal adenocarcinoma

Preliminary studies suggested a possible correlation of microbiota with Barrett’s esophagus (BE) and esophageal adenocarcinoma (EAC), where the need for tools to ameliorate its poor prognosis is mandatory. We explored the potential signature of esophageal microbiota and its predicted functional profile along the continuous spectrum from BE to EAC. We analyzed through 16S-based amplicon sequencing the mucosal microbiota and the microbiota-related functional predictions in 10 BE and 6 EAC patients compared with 10 controls, exploring also potential differences between the metaplastic mucosa (BEM) and the adjacent normal areas of BE patients (BEU). BEM and EAC showed a higher level of α and β-diversity. BEM evidenced a decrease of Streptococcus and an increase of Prevotella, Actinobacillus, Veillonella, and Leptotrichia. EAC displayed a striking reduction of Streptococcus, with an increase of Prevotella, Veillonella and Leptotrichia. LefSe analysis identified Leptotrichia as the main taxa distinguishing EAC. BEM showed a decreased α-diversity compared with BEU and a reduction of Bacteroidetes, Prevotella and Fusobacterium. Functional predictions identified peculiar profiles for each group with a high potential for replication and repair in BEM; an upregulated energy, replication and signaling metabolisms, with the fatty-acids biosynthesis and nitrogen and D-alanine pathways down-regulated in EAC. Our pilot study identifies a unique microbial structure and function profile for BE and EAC, as well as for metaplastic and near-normal areas. It proposes a new concept for BE, which could be intended not only as the histological, but, also, as the microbial closest precursor of EAC. This requires further larger follow-up studies, but opens intriguing horizons towards innovative diagnostic and therapeutic options for EAC.

ad-hoc built reference database, based on the sequences available from NCBI RefSeq database for bacteria (ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/), which comprised, as of 2018, February 12th a total of 17898 species, belonging to 2259 genera.

Reference sequences
Through a custom script, for each genus, sequenced genome for all species and strains were downloaded and properly formatted for further processing. In all our analyses, only bacterial strains with a genome finishing grade of "Complete", "Chromosome" or "Scaffolds" were considered. The following table summarizes the references used for each of the groups considered. Reads to re-classify From the OTU table comprising all the samples, OTUs classified within the above genera were selected and the sequences of all the reads grouped in each OTU (clustered at 97% similarity) were retrieved. In order to reduce the number of sequences to re-classify, clonal reads (i.e.: reads being identical throughout 100% of their length and composition) were grouped together.

Classification
Re-classification of the reads was performed through nucleotide BLAST (legacy BLAST, v 2.26) [1], using a cutoff of 1e-10 for the e-value and de-activating the dust-filter. Only reads matching for at least of 80% of their length were retained and, for each read, the best match (i.e.: that or those with the higher bit-score) was selected. If a read had multiple classifications on different species, the classification was reset to genus level.

Statistical analysis
In order to keep only consistent data for species-level evaluations, only samples having an incidence (i.e.: a relative abundance) higher than 0.5%, were considered. This was made to exclude samples with very few reads classified in the genus that could profoundly alter the dataset (e.g.: considering a sample in which we had only 1 read in a genus, this would have brought a 100% to the species-level classification for that certain species A non-parametric Mann-Whitney U-test was used to compare the relative abundance of each bacterial species in the different experimental groups, considering p-values <0.05 as significant.

Co-abundance network analysis
Bacterial genera were selected considering only those present at >0.5% of abundance in at least 30% of the samples in at least one experimental group, in order to exclude minor and transient contributors of the gut microbiota. This resulted in a subset of 28 genera using the whole dataset, having a relative across all samples ranging from 32.20% to 0.23%.
The co-abundance between each pair of genera was evaluated calculating the Kendall's correlation coefficient and displayed as heatmaps, hierarchically clustered using Spearman's correlation metric and Ward linkage. Only associations having a Benjamini-Hochberg adjusted p-value <0.05 for the linear model were used to build for the hierarchical clustering. Results obtained considering the entire dataset of samples were used to define the co-abundance groups (CAGs). Permutational multivariate analysis of variance (P-MANOVA) [4] was used to determine whether CAGs were significantly different from each other. Essentially this compared strength of the correlations between the groups to correlation strengths within the groups in a pairwise manner. All comparisons, performed via 9999 random permutations, had a p-value<0.003 for rejecting the hyphothesis of no-difference among groups. Co-abundance network plots were created as previously described [5]. In the plots, circle and label size is proportional to the genus' relative abundance in the experimetal group or along the whole dataset; circle colors represent the CAGs clusters; red edges suggest a positive correlation between genera, whereas blue edges represent negative correlation; edge thickness is proportional to the Kendall's correlation coefficient.