Bile microbiota in primary sclerosing cholangitis: Impact on disease progression and development of biliary dysplasia

Objective
The etiopathogenesis and risk for development of biliary neoplasia in primary sclerosing cholangitis (PSC) are largely unknown. Microbes or their metabolites have been suggested to play a role. To explore this potential microbial involvement, we evaluated the differences in biliary microbiota in PSC patients at an early disease stage without previous endoscopic retrograde cholangiography (ERC) examinations, advanced disease stage, and with biliary dysplasia or cholangiocarcinoma.

Design
Bile samples from the common bile duct were collected from 46 controls and 80 patients with PSC during ERC (37 with early disease, 32 with advanced disease, and 11 with biliary dysplasia). DNA isolation, amplification, and Illumina MiSeq sequencing were performed for the V1-V3 regions of the bacterial 16S rRNA gene.

Results
The most common phyla found were Bacteroidetes, Firmicutes, Proteobacteria, Fusobacteria, and Actinobacteria. The most common families were Prevotellaceae, Streptococcaceae, Veillonellaceae, Fusobacteriaceae, and Pasteurellaceae, and the most common genera were Prevotella, Streptococcus, Veillonella, Fusobacterium, and Haemophilus. The bacterial communities of non-PSC subjects and early stage PSC patients were similar. Alpha diversity was lower in patients with biliary dysplasia/cholangiocarcinoma than in other groups. An increase in Streptococcus abundance was positively correlated with the number of ERC examinations. Streptococcus abundance was also positively correlated with an increase in disease severity, even after controlling for the number of ERC examinations.

Conclusions
Our findings suggest that the aetiology of PSC is not associated with changes in bile microbial communities, but the genus Streptococcus may play a pathogenic role in the progression of the disease.


Introduction
noted that a likely source for these bacteria was previous endoscopic retrograde cholangiography (ERC) procedures. A more recent study used a 16S rRNA amplicon sequencing approach to characterize the biliary bacterial communities of 39 PSC patients [22]. Their analysis focused on microbiome changes related to patients' genetic features, and did not include any non-PSC controls. We are not aware of any studies using modern high-throughput amplicon sequencing methodology to compare biliary microbiota of PSC patients and non-PSC subjects.
In the present study, we set out to explore (1) the role of biliary microbiota in the etiopathogenesis of PSC by comparing non-PSC controls to newly diagnosed early-stage PSC patients (ERC severity score < 6); (2) its role in disease progression by comparing early-stage PSC patients to advanced-stage patients (ERC severity score ≥ 6); (3) its role in the development of biliary dysplasia and cholangiocarcinoma (CCA) by comparing advanced-stage PSC patients to patients with dysplasia/carcinoma; and (4) the overall relationship between microbiota and disease severity as measured with the ERC score.
In addition to the PSC-related comparisons, we also studied the impact of the number of ERC examinations since, as has been noted before [23], these are likely to affect the microbiota, making repeated examinations a potentially important confounding factor.

Materials and methods
The clinical part of the study was conducted at the Helsinki University Clinic of Gastroenterology. The patients were recruited from the Clinic's PSC registry. Informed consent was obtained from each patient, and the study was approved by the Ethics Committee of Internal Medicine in accordance with the ethical guidelines of the 1975 Declaration of Helsinki (Dnro 278/13/03/01/2009). The subject population consisted of 80 patients with PSC and 46 controls, for a total of 126 subjects. The control subjects were patients referred for their first ERC due to inconclusive bile duct MRCP findings or elevation of serum alkaline phosphatase (ALP) of unknown origin. PSC was excluded with ERC and in most cases with liver histology and clinical follow-up. Detailed clinical information on the study subjects can be found in Table 1. Exclusion criteria were age under 18 years and use of antibiotics within one month before the study.
The indications for ERC examination included: 1) constantly elevated ALP levels in conjunction with IBD, or 2) magnetic resonance cholangiography findings, or 3) liver biopsy suggestive of PSC, or 4) biliary dysplasia surveillance. ERC was performed using a needle knife and guide wire for cannulation and the balloon catheter occlusion technique to ensure adequate filling of intra- and extrahepatic bile ducts. Before injecting contrast media, a bile sample was aspirated from the extrahepatic bile ducts with a balloon catheter. The sample was divided into 1 ml plastic tubes and immediately immersed in liquid nitrogen (-196 °C). Samples were stored at -20 °C until transfer to the sequencing facility, where they were stored at -80 °C. In addition, brush cytology was systematically performed during ERC regardless of the presence or absence of dominant strictures. Cholangiographic findings were scored according to the modified Amsterdam score (mAm score) [24]. Both intra- and extrahepatic changes were scored, and a sum score was calculated. At the time of ERC, blood samples for routine laboratory values were collected and analysed in the laboratory of the Helsinki University Hospital (Huslab) using appropriate methods.

Analysis of brush cytology specimens
The tip of the cytological brush was cut off, and the brush and the fluid from the brush catheter shaft were flushed into 50% ethanol. Cytospin slides were prepared and stained with Papanicolaou stain, and a cell block was also prepared when possible. Sections from cell blocks were stained with haematoxylin and eosin. The brush cytology slides were re-evaluated, blinded to the clinical status of the patient, to reach a consensus. Neutrophilic inflammation was evaluated semiquantitatively as described previously [25]. Brush cytology was graded as benign (including inflammatory/regenerative atypia), suspicious (suspicious for malignancy/cytological dysplasia), or malignant (CCA) using generally accepted cytological criteria.

DNA extraction, library preparation, and sequencing
DNA extraction, amplification, and sequencing were performed at the Institute of Biotechnology, University of Helsinki. Bulk DNA was extracted from centrifuged and pelleted bile using Invisorb® Spin Blood Mini Kit (Stratec Molecular, Berlin, Germany), according to the manufacturer's instructions. The V1-V3 regions of the bacterial 16S rRNA gene were amplified following a previously described protocol [26], except that the first PCR round included two 25 μl technical replicates, and the two PCR rounds were performed with 15 and 18 cycles, respectively. The amount of template DNA for the first round ranged from 2.8 ng to 352 ng per

Bioinformatics and statistical analysis
Bioinformatics and statistical analyses were performed at the Institute of Biotechnology, University of Helsinki. Primers and low quality bases and sequences were removed with CutAdapt [27]. mothur [28] was used to process the sequence data and to perform taxonomic assignments following the OTU (Operational Taxonomic Unit, a DNA sequence-based proxy for bacterial species) approach from the MiSeq Standard Operating Procedure [29,30]. All singleton sequences were discarded to aid in processing the data as well as to reduce the number of unique sequences that are likely to be caused by sequencing errors. The raw sequence data are available at the European Nucleotide Archive with accession number PRJEB15501. Prior to any further comparisons, all OTUs of the genera Ralstonia, Shewanella, and Halomonas were removed as probable contaminants based on sequenced blanks, previous personal experience, as well as reported cases in the literature [31]. All statistical analyses were performed with the R programming language [32]. Two-tailed p-values were used throughout the study, with p ≤ 0.05 considered as statistically significant. Potential batch effects from the DNA isolation date and sequencing run were evaluated with Non-metric Multidimensional Scaling plots (NMDS) based on Bray-Curtis dissimilarity. The pattern for sequencing run was suggestive of batch effects, which warranted using this variable as a confounder in other comparisons.
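As an illustration of the dissimilarity measure underlying the NMDS plots, the following is a minimal Python sketch of Bray-Curtis dissimilarity between samples. It is not the code used in the study (which was run in R); the function name `bray_curtis` and the example counts are hypothetical, and whether raw counts or relative abundances were used as input is not stated in the text.

```python
def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two samples.

    Each argument is a list of non-negative taxon counts, with taxa in
    the same order in both lists. The result is 0 for identical samples
    and 1 for samples that share no taxa.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("samples must cover the same taxa in the same order")
    total = sum(counts_a) + sum(counts_b)
    if total == 0:
        raise ValueError("both samples are empty")
    difference = sum(abs(a - b) for a, b in zip(counts_a, counts_b))
    return difference / total


# Hypothetical OTU counts for three samples (not data from the study).
samples = {
    "s1": [10, 0, 5],
    "s2": [10, 0, 5],
    "s3": [0, 15, 0],
}

# Pairwise dissimilarity matrix, the kind of input an NMDS ordination takes.
names = sorted(samples)
matrix = {
    (i, j): bray_curtis(samples[i], samples[j])
    for i in names
    for j in names
}
```

The ordination step itself is omitted here; only the pairwise distances are shown.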
Alpha diversity (which quantifies the "species" richness and evenness of the microbial communities) was estimated with the Shannon index, calculated with the phyloseq package [33] using non-rarefied data and compared between subject groups of interest using Kruskal-Wallis and pairwise Wilcoxon rank-sum tests.
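For readers unfamiliar with the Shannon index, a minimal Python sketch of the calculation follows. It assumes the natural logarithm and takes raw (non-rarefied) counts for one sample; the function name `shannon_index` is illustrative and this is not the phyloseq implementation.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over taxa with non-zero counts.

    `counts` holds the number of sequences assigned to each taxon in one
    sample. Taxa with zero counts contribute nothing to the sum.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no sequences")
    proportions = (c / total for c in counts if c > 0)
    return -sum(p * math.log(p) for p in proportions)
```

A sample dominated by a single taxon gives a value near zero, while a sample with evenly distributed taxa gives the maximum value, ln(number of taxa).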
Generalized linear models with negative binomial distribution as implemented in the DESeq2 package [34] were used to estimate differential abundances of taxa. The Benjamini-Hochberg method was used for multiple comparisons correction. Before comparisons, all taxa that were not represented by more than one sequence per sample in more than ten samples were removed. This pre-filtering was performed since rare taxa are particularly prone to produce false positives due to chance effects. This approach to low count taxa is conservative, given that DESeq2 also performs automatic filtering using the mean of normalised counts.
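The pre-filtering rule and the multiple-comparisons correction described above can be sketched in Python as follows. This is an illustrative reading of the text, not the study's R code: the function names and default parameters are assumptions, and the filter is interpreted as keeping a taxon only if more than ten samples each contain more than one sequence of it.

```python
def keep_taxon(counts_per_sample, min_count=1, min_samples=10):
    """Return True if the taxon exceeds `min_count` sequences in more than
    `min_samples` samples (the prevalence filter as interpreted here)."""
    prevalent = sum(1 for c in counts_per_sample if c > min_count)
    return prevalent > min_samples


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted
```

Filtering before testing reduces the number of hypotheses, which in turn makes the Benjamini-Hochberg adjustment less severe for the taxa that remain.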
Three different statistical models were run in DESeq2: The second and third models included the number of ERC examinations as a numerical variable. All three models also included variables for IBD status and sequencing run to correct for their respective potential confounding effects.
After the DESeq2 comparisons, all statistically significant hits were further evaluated visually using box- and scatter plots of taxon relative abundance vs variable of interest to assess the robustness of the results. This assessment was performed to look for features suggesting that the taxa might be false positives, such as 1) putative outliers, 2) influential points with high leverage, 3) very low mean abundances, and 4) inconsistent abundance increase and decrease patterns across the groups that are difficult to explain as biologically significant.

Results
Based on the clinical results, we classified the study subjects with PSC as follows: 37 patients were at an early disease stage (ERC severity score < 6), 32 at an advanced disease stage (ERC severity score ≥ 6), and 11 had biliary dysplasia/cholangiocarcinoma. 20% of control subjects had IBD, as opposed to 68%, 59% and 73% in the three PSC patient groups (Table 1).
The microbiome sequence data includes 20 bacterial phyla, subdivided into 124 families, 309 genera, and 2125 OTUs. The total number of sequences in the data set is 3 740 318, with a minimum of 2488 and a maximum of 101 556 per sample, and an average of 29 686 sequences. The most common phyla are Bacteroidetes, Firmicutes, Proteobacteria, Fusobacteria, and Actinobacteria (Fig 1A), the most common families Prevotellaceae, Streptococcaceae, Veillonellaceae, Fusobacteriaceae, and Pasteurellaceae (Fig 1B), and the most common genera Prevotella, Streptococcus, Veillonella, Fusobacterium, and Haemophilus (Fig 1C). The bar charts suggest that the dysplasia/carcinoma patients might have a higher abundance of bacteria of phylum Firmicutes and genus Streptococcus and a lower abundance of phylum Bacteroidetes and genus Prevotella than the other groups, and that the abundance of Streptococcus could also be higher in advanced stage PSC patients than in controls or early stage patients.
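The grouping rule can be written down as a short Python sketch. The ERC severity score threshold of 6 and the group sizes come from the text; the function name `classify_stage` and the assumption that a finding of dysplasia/carcinoma takes precedence over the ERC score are illustrative only.

```python
def classify_stage(erc_score, has_neoplasia):
    """Assign a PSC patient to one of the three study groups.

    Assumption: patients with biliary dysplasia or cholangiocarcinoma are
    placed in their own group regardless of ERC severity score.
    """
    if erc_score < 0:
        raise ValueError("ERC severity score cannot be negative")
    if has_neoplasia:
        return "dysplasia/carcinoma"
    return "early" if erc_score < 6 else "advanced"


# Group sizes as reported in the text.
PSC_GROUP_SIZES = {"early": 37, "advanced": 32, "dysplasia/carcinoma": 11}
N_CONTROLS = 46
n_psc = sum(PSC_GROUP_SIZES.values())
n_total = n_psc + N_CONTROLS
```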

Microbial diversity
Comparisons of microbial alpha diversity, estimated with the Shannon index, suggest that there are differences between disease stages (p = 0.036, Kruskal-Wallis rank sum test). Further, pairwise comparison of groups does not show statistically significant differences between any two groups (p ≥ 0.055, pairwise Wilcoxon rank sum test), but the low p-values nevertheless suggest that differences are present and are more pronounced between controls and early disease patients against the dysplasia/carcinoma stage, with the advanced disease stage falling between the two (Fig 2A). When the microbiota of control subjects are compared to that of PSC patients with early stage at their first ERC examination (to exclude any effects of the procedure), no statistically significant differences are detected (p = 0.64, pairwise Wilcoxon rank sum test; Fig 2B). In general, visual inspection of the groups' diversity progression pattern and the above test results suggest that diversity is similar between controls and the early disease stage, and then drops progressively through the advanced and dysplasia/carcinoma stages.
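To make the omnibus test concrete, the following Python sketch computes the Kruskal-Wallis H statistic with the standard tie correction, and an approximate p-value for the four-group case (three degrees of freedom) using the closed form of the chi-square survival function. It is a from-scratch illustration, not the R routine used in the study; the function names are hypothetical and the chi-square approximation is assumed to be adequate for these group sizes.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H for a list of groups of values."""
    pooled = [v for group in groups for v in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12 / (n * (n + 1)) * weighted - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical")
    return h / correction


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
```

With four groups of per-sample Shannon values, `chi2_sf_df3(kruskal_wallis_h(groups))` gives the approximate omnibus p-value; pairwise follow-up tests are not covered by this sketch.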

Microbial differential abundance
Differential abundance assessment using DESeq2 results in a long list of potential taxa of interest in all group comparisons (S1 Table). Comparing the control patients and the early stage PSC patients at their first ERC examination reveals four OTUs (an unclassified Enterobacteriaceae OTU, Otu0008; Neisseria Otu00045; Campylobacter Otu00089; and an unclassified Neisseriaceae OTU, Otu00213) and three families (Pasteurellaceae, Staphylococcaceae, and Xanthomonadaceae) as differentially abundant, with no genus-level results (Table 2, S1 Fig).
When contrasting the control group with the early disease group as a whole, we obtain statistical significance for two OTUs (an unclassified Clostridiales, Otu00188, and the same unclassified Neisseriaceae as above, Otu00213) and one family, Staphylococcaceae (Table 2, S2 Fig). However, a visual assessment of these taxa (S1 and S2 Figs) suggests that many of them might be false positives.
For the other comparisons, we find a total of 24 taxa with a statistically significant difference in abundance between early and advanced stage patients, and 36 when contrasting advanced stage patients and patients with dysplasia or carcinoma (S1 Table). After exploring these taxa visually, we consider the streptococcal group the most important one, showing a consistent, robust pattern coupled to meaningful mean abundances (Table 2, S1 Table). Our comparisons indicate that there is a statistically significant difference between early and advanced stage patients in the abundances of genus Streptococcus (Fig 3A) and one Streptococcus OTU (Otu00020), and between advanced disease patients and patients with dysplasia/carcinoma in two Streptococcus OTUs (Otu00035 and Otu00061). Two Streptococcus OTUs (Otu00020 and Otu00061) are also significant when disease severity is measured using the ERC score as a numerical variable (Fig 3B). Finally, a low abundance Prevotella OTU (Otu00128) is reduced to zero abundance in the dysplasia/carcinoma group.
Since previous ERC examinations could have had an effect on the biliary microbiome, we looked at what taxa appear to be associated with the number of ERC examinations undergone by each patient. The resulting taxa are the same for both the model where PSC patients are categorized according to severity, and the one where a numerical ERC score is used instead, suggesting that the abundances of the genus Streptococcus and Streptococcus Otu00035 increase with additional ERC examinations (Table 2).
Finally, we took the opportunity to assess the impact of IBD on bile bacterial communities in PSC. Our data do not support alpha diversity changes for IBD versus no IBD when using both controls and PSC subjects, nor among PSC subjects only (p = 0.9 and p = 0.24, respectively). Differential abundance analysis produces some statistically significant hits (S1 Table), but none of them holds up as robust under close scrutiny: they seem likely to be either contaminants, misclassified (e.g. Soonwooa, a marine bacterium not expected to be present in human samples), or false positives affected by outliers or chance effects due to low mean abundances.

Legend. Statistically significant results from differential abundance comparisons. The two contrasts between the control group and the early disease groups contain all the original significant results, for the reader's convenience. The next five comparisons contain only the results that passed assessment of robustness. A full table containing all statistically significant results from all models and contrasts of interest can be found in the Supplement (S1 Table). Mean abundance = mean taxon abundance (number of sequences) across data set after normalization for sequencing depth.

Discussion
Microorganisms have been suggested to be involved in the etiopathogenesis of PSC, but so far, there have been only a few studies exploring the topic with modern molecular methods [7]. To our knowledge, the present study is the first one to use high-throughput 16S rRNA gene sequencing of bile material from both PSC patients and non-PSC control subjects to assess the role of microbiota in the initiation, progression and development of dysplasia in PSC. Our study subjects included patients with newly diagnosed PSC at their first ERC examination, patients at an early disease stage but with a history of multiple ERC examinations, patients with advanced biliary disease, and those presenting with biliary neoplasia. We found evidence for a decrease in bacterial diversity in patients with dysplasia or cholangiocarcinoma. Looking at specific bacterial taxa, the genus Streptococcus appears to be of particular interest, with the genus itself and several OTUs being differentially abundant between the groupings under comparison.

Bile microbiota and the etiopathogenesis of PSC
Our results do not support a diversity difference in microbiota between controls and early-stage PSC patients. None of the taxa reported as differentially abundant (except for Neisseriaceae Otu00213) show convincing, robust patterns under visual inspection, and they could very well be false positives, although it would be premature to disregard them. Otu0008, Otu00045, Otu00089, Otu00213, and Otu00188 are all less abundant in the early stage PSC group than in controls, which argues against an etiological role in PSC. Also, except for Otu0008, all show extremely low mean abundances. The families Pasteurellaceae, Staphylococcaceae, and Xanthomonadaceae all contain known human pathogens, but all are quite diverse as a group. Given that no specific genera or OTUs representing these families are differentially abundant, and Xanthomonadaceae has a very low mean abundance, the results do not suggest an infectious role in the initiation of PSC. Overall, our data do not provide convincing evidence that bile microbiota are involved in the initiation of bile duct inflammation or in the aetiology of PSC. Finally, it should be noted that our controls' microbial communities may not represent "normal" or fully healthy bile microbiota, as they all had some indication for undergoing an ERC.

Role of bile microbiota in disease progression
Our results indicate that microbial diversity could be reduced between early and advanced stage patients. This could be thought to mirror the pattern seen in several studies of gut-related microbial communities where PSC subjects were found to have a reduced diversity compared to controls [11,14,15]. There are significant differences in the abundances of streptococci between early and advanced stage patients, both for a specific OTU and the entire genus (Fig 3A, Table 2). Two OTUs are also positively correlated with disease severity as measured with the ERC score (Fig 3B). Based on these findings, streptococci could have a role in disease progression, even if they might not be involved in the initiation of PSC. Both the genus Streptococcus and one specific OTU are also positively correlated with the number of ERC examinations performed (Table 2). A previous study concluded that the streptococci cultured from bile samples of PSC patients are primarily a consequence of ERCs [23]. However, our statistical model takes into account both disease severity and the history of multiple ERC examinations, and streptococci appear associated with both. Bacteria of the genus Streptococcus are also detectable in bile samples of control subjects and early-stage PSC patients with no history of previous ERC examinations, suggesting that the genus might be constantly present in bile, although it is not possible to completely exclude contamination during sample collection. Therefore, the potential role of streptococci in disease progression calls for further attention, while their association with the number of ERCs underlines the risk of nosocomial infection during the procedure.

Bile microbiota and development of biliary neoplasia
Our results show that biliary microbial diversity is lowest in patients with dysplasia or cholangiocarcinoma (Fig 2A). As for differential abundance analyses, two Streptococcus OTUs appear more abundant in dysplasia/carcinoma than in advanced stage PSC patients (Table 2). The genus Streptococcus is also more abundant in dysplasia/carcinoma patients' bile, although this difference is not statistically significant (Fig 3A). The differences in diversity and Streptococcus abundance seem to follow the same trend as those seen in the comparison between early and late stage patients, indicating a progressive reduction in bacterial diversity and a generalised increase in streptococcal abundance as PSC develops.

Limitations of the study
This study was designed to be as representative as possible of the main disease stages within the limitations of a sample of convenience, i.e. to have an adequate number of samples representative of the groups under study within the practical restrictions of obtaining suitable bile samples. We are primarily looking for pathogenic organisms associated with the aetiology and/or development of PSC, and in the case of an infection we would expect that any candidate organism(s) would be clearly overrepresented in PSC. The sample sizes in our study should be adequate to constrain interpersonal variation of microbiota for this particular purpose, with the possible exception of the dysplasia/carcinoma group. Additionally, the analysis with the ERC severity score uses a numerical scale and all 126 samples, and in this case we do not think that sample size would limit the detection of potential pathogens in a clear clinical infection scenario. On the other hand, we are dealing with a complex disease in which more subtle microbiome effects could be at play. Thus, we cannot exclude the possibility of substantially more intricate relationships between bile microbial communities and PSC. Larger sample sizes could have allowed the detection of more subtle differences in bacterial abundances that could still have biological meaning in the context of the disease.

Conclusions
The results of our exploratory study suggest that the aetiology of PSC is not associated with specific changes in biliary microbial communities. However, members of the Streptococcus genus appear to be positively correlated with disease progression, even when the number of previous ERC examinations is controlled for. ERC examinations are also associated with an increase of streptococcal abundance, supporting previous findings that the procedure might exacerbate the growth of these bacteria. Also of interest are the findings regarding alpha diversity, which is estimated to decrease gradually from the early disease stage to biliary neoplasia, even though the diversity of naïve PSC patients is similar to that of non-PSC controls. It is not possible to evaluate based on our results whether these changes in the microbiome are directly associated with the disease process, or a product of biliary bacterial communities adjusting to the changes in their environment. Either way, our study underlines the need to further explore the role of Streptococcus in PSC. Studies of what constitutes a "normal" biliary microbiome, if there indeed is one, would be crucial for better understanding the disease-related changes.

Table. Complete list of statistically significant results. All statistically significant results from all the GLMs used in this study for differential abundance analysis, except for those associated with the sequencing run variable, which was used only for controlling purposes. Mean abundance = mean taxon abundance (number of sequences) across data set after normalization for sequencing depth. (PDF)