Gut bacterial peptides with autoimmunity potential as environmental trigger for late onset complex diseases: In-silico study

Recent evidence suggests that the human gut microbiota, of which bacteria are the major component, can induce immunity. It is also known that the gut lining depletes with ageing and that the risk of autoimmune and inflammatory disorders increases with ageing. The two may therefore be correlated: depletion of the gut lining exposes gut bacterial antigens to host immune mechanisms, which may induce immunity to certain bacterial proteins, and such immunity may at the same time be auto-immunogenic to the host. This autoimmunity may render a protein molecule nonfunctional and may thereby be involved in late onset metabolic, autoimmune and inflammatory disorders such as Diabetes, Rheumatoid Arthritis, Hyperlipidemias and Cancer. In this in-silico study we found a large number of peptides identical between human and gut bacteria that bind to HLA-II alleles and are hence likely to be auto-immunogenic. Further, we observed that such autoimmune candidates were enriched in bacterial species belonging to the Firmicutes and Proteobacteria phyla, which led us to conclude that these phyla may have higher disease impact in genetically predisposed individuals. Functional annotation of human proteins homologous to candidate gut-bacterial peptides showed significant enrichment in metabolic processes and pathways. Cognitive trait, Ageing, Alzheimer, Type 2 diabetes, Chronic Kidney Failure (CKF), Chronic Obstructive Pulmonary Disease (COPD) and various Cancers were the major diseases represented in the dataset. This dataset provides gut bacterial autoimmune candidates that can be studied for their clinical significance in late onset diseases.


Introduction
Inflammation and autoimmunity are major factors in almost all late onset diseases, such as Type II Diabetes, Rheumatoid Arthritis, Multiple Sclerosis, Inflammatory Bowel Disorder, Systemic Lupus Erythematosus and Cancer. Of the several mechanisms leading to autoimmunity, molecular mimicry, which arises from sequence similarity between foreign and self-peptides, is known to result in cross-activation of pathogen-derived autoreactive T or B-cells. These T and B-cells can cross-react with host epitopes, thus leading to autoimmunity [1]. Viruses and their peptides are the main non-self molecules studied for autoimmunity.
The human gut is the connection between the environment and the human system. It is therefore the major human reservoir of environmental microbes, which are kept isolated from the rest of the human system by the intestinal mucosal barrier. Loss of this barrier may result in epithelial permeability to gut microbiota, which may lead to molecular mimicry and thereby to autoimmunity. Many studies have shown expression of MHC class II molecules on intestinal epithelial cells (IEC) and that IECs are capable of processing and presenting luminal peptides and proteins to immuno-competent T-cells [2][3][4]. Integrity of this mucosal lining depends on factors such as diet, ageing, microbial interactions and disease conditions. In addition, mucosal secretions decrease with age [5][6][7][8], and ageing has also been shown to increase colonic permeability [9]. This may lead to exposure of microbial peptides to IEC. Antibodies generated against gut microbial peptides that are cross-reactive to human proteins may deplete the function of such proteins, leading to late onset diseases. Metabolic, inflammatory and autoimmune diseases are common in the ageing population, and the underlying factor could be antibodies to the gut microbiome, which is enriched in metabolic orthologous genes. Lately, the role of gut microbiota in host immunity has opened up a new frontier of biomedical research from the perspective of host-environment interactions.
Detailed mechanistic studies revealed that introduction of a single gut microbiota species in an autoimmune arthritis mouse model was able to trigger disease development. Although many studies have postulated that the effect of microbiota on the systemic immune response is mediated by circulation of microbiota-derived soluble factors from the gut to the periphery [10], the autoimmune arthritis study provided an alternative mechanism in which microbiota (composition or microbiota-derived products) can affect the immune system by induction of Th17 cells present in the lamina propria of the small intestine [11,12]. Th17 cells then migrate into the peripheral lymphoid tissue and secrete IL-17, which in turn acts directly on B cells and induces systemic B cell differentiation and antibody production. This can ultimately lead to development of autoimmune disease via molecular pattern recognition from gut microbiota [13]. Gut microbiota has been shown to play a role in inflammatory and autoimmune disorders such as Rheumatoid Arthritis in many other studies as well [14,15], although the mechanism of this association remains obscure. Understanding these mechanisms is crucial for better treatment efficacy and personalized patient management.
In the ageing population and in inflammatory disease patients, increased levels of serum antibodies to gut bacteria and oral bacteria have been shown [16][17][18][19], which strengthens our belief that gut bacteria have antigenic potential. These studies encouraged us to consider that gut bacterial peptides homologous to human peptides may be involved in autoimmunity, which might contribute to disease etiology by bringing down the effective protein quantity. This molecular mimicry to gut bacteria might be the cause of some (if not all) late onset metabolic, autoimmune/inflammatory diseases in genetically predisposed individuals with intestinal permeability. In the ageing population this process might be pronounced, as the mucin-replenishing ability of goblet cells may decrease and cell-cell adhesion junctions in the gut might break down [9,20]. Although the mechanism of processing of bacterial antigens to raise antibodies is not very clear, autoimmunity to otherwise non-pathogenic commensal bacteria might be possible in an ageing population. Exploring the role of gut microbes in intestinal as well as extra-intestinal diseases will significantly advance our understanding of disease pathogenesis and may further help develop strategies for a therapy based on controlling autoimmunity to gut microbiota.
To our knowledge, this is the first in-silico study carried out to determine whether gut bacterial peptides have the potential to raise antibodies against human proteins. Such peptides could be auto-immunogenic to humans. Auto-antigenicity towards a specific tissue protein may be involved in a particular disease pathology, depending on the tissue involved. Herein, human proteins cross-reacting with antibodies against candidate peptides were examined for their tissue-specific expression, the biological processes they take part in and the diseases they are associated with. This approach has given a comprehensive view of the plausible effects on the human system of autoimmunity raised to the gut candidate peptides.

Methods
The methodology adopted for identifying gut bacterial peptides with auto-immunity potential is depicted in Fig 1 and has also been uploaded at PLOS ONE's site (https://www.protocols.io/view/gut-bacterial-peptides-with-autoimmunity-potential-ibccaiw).

Candidate peptides identification and characterization
Identification of candidate peptides. A sequence identity search was conducted between all gut bacterial species/genera and the complete human proteome. Gut bacterial species/genus information was taken from the Human Microbiome Project (HMP) database (http://hmpdacc.org/), and all available annotated protein sequences of these bacterial species/genera were taken from the Uniprot database (https://www.ebi.ac.uk/uniprot). Wherever gut bacterial species information was given in the HMP database, the related species were examined for sequence homology with human proteins. 319 bacterial strains found in gut (from a total of 823 in HMP) did not have species information but had genus information. Protein sequences of all the species of these genera were searched against the human protein database (Uniprot database). A sequence similarity search using BLASTp (Basic Local Alignment Search Tool for protein sequences) was carried out to obtain peptide similarity between gut bacteria and human expressed proteins. Data on length and sequence of homologous regions, and protein/gene IDs and names, were recorded. Peptides having homologous regions of ≥9 aa were included in the study.
All the gut bacterial proteins as well as the human proteins corresponding to homologous peptides were subjected to further characterization in order to understand their biological relevance to disease etiology.
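The ≥9 aa inclusion criterion can be illustrated with a minimal sketch. It is not the BLASTp search used in the study; it assumes exact residue identity and simply reports maximal identical stretches shared by two sequences. The function name `shared_peptides` and the example sequences are hypothetical.

```python
def shared_peptides(bacterial_seq, human_seq, min_len=9):
    """Return the set of maximal identical stretches (length >= min_len)
    shared by a bacterial and a human protein sequence.

    Illustrative stand-in for an alignment search: exact identity only,
    no gaps, no substitution matrix.
    """
    hits = set()
    # prev[j] = length of the identical run ending at the previous
    # bacterial residue and human residue j-1
    prev = [0] * (len(human_seq) + 1)
    for i in range(1, len(bacterial_seq) + 1):
        curr = [0] * (len(human_seq) + 1)
        for j in range(1, len(human_seq) + 1):
            if bacterial_seq[i - 1] == human_seq[j - 1]:
                curr[j] = prev[j - 1] + 1
                # The run is maximal if either sequence ends here
                # or the next residues differ.
                at_end = i == len(bacterial_seq) or j == len(human_seq)
                if (at_end or bacterial_seq[i] != human_seq[j]) and curr[j] >= min_len:
                    hits.add(bacterial_seq[i - curr[j]:i])
        prev = curr
    return hits


if __name__ == "__main__":
    # Hypothetical sequences sharing one 12-residue stretch.
    bacterial = "MKT" + "AAGLVPRGSHMK" + "WWW"
    human = "QQ" + "AAGLVPRGSHMK" + "EE"
    print(shared_peptides(bacterial, human))
```

A real pipeline would run an alignment tool over whole proteomes; the sketch only makes the length cutoff concrete.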

Bacterial characterization
Bacterial proteins were studied for their antigenic potential and for the location of peptides in the bacterial cell. This may help in predicting the effect of the antigen on our immune system. Predominant gut bacterial phyla possessing these auto-immune peptides were then estimated in the dataset.
Candidate peptide antigenic potential. The homologous peptides were subjected to HLA class II binding profiling using an in-silico method. The peptide candidates (as seen in S1 Table) were analyzed for HLA class II binding using ProPred software [21,22]. ProPred is a matrix-based method that allows prediction of MHC binders for various alleles based on experimental binding profiles. Binding scores were generated for 50 HLA class II (HLA-II) alleles. Affinity values were generated for candidate as well as random peptides from the same protein. The threshold binding values of candidate and random peptides were compared using the non-parametric Mann-Whitney test. A significant difference in binding threshold values indicates that, in a host carrying the specific HLA-II allele, binding to the candidate peptide differs from binding to other peptides from the same protein. However, all the peptides (irrespective of their significant binding to HLA) were considered for further analysis of their role in human diseases. The candidate peptides whose corresponding human proteins are associated with common diseases are displayed in a heatmap (Fig 2) to show their binding affinity with various HLA-II alleles.
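The per-allele comparison of binding thresholds between candidate and random peptides can be sketched as follows. This is a minimal standard-library illustration of a two-sided Mann-Whitney U test using the normal approximation with tie correction and no continuity correction; the study does not state which implementation or options were used, and the function name and the threshold values in the example are hypothetical.

```python
import math


def mann_whitney_u(sample_a, sample_b):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for sample_a, approximate p-value).
    """
    n1, n2 = len(sample_a), len(sample_b)
    combined = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])

    # Assign midranks to tied values and accumulate the tie term.
    ranks = [0.0] * len(combined)
    tie_term = 0.0
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # mean of 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j

    rank_sum_a = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u_a = rank_sum_a - n1 * (n1 + 1) / 2

    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_a, 1.0  # all values identical: no evidence of a difference
    z = (u_a - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return u_a, p_value


if __name__ == "__main__":
    # Hypothetical binding thresholds for one HLA-II allele.
    candidate_thresholds = [1, 2, 2, 3, 3, 4]
    random_thresholds = [5, 6, 6, 7, 8, 9]
    print(mann_whitney_u(candidate_thresholds, random_thresholds))
```

For small samples an exact test is generally preferable to the normal approximation; statistical libraries such as SciPy provide one through `scipy.stats.mannwhitneyu`.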
Bacterial protein location. The set of unique candidate peptides was submitted to the PSLpred server [23] to identify the cellular location of the bacterial candidate proteins, that is, extracellular, periplasmic, outer membrane, inner membrane or cytoplasmic.
Predominant gut bacterial phyla in dataset. The bacterial species encoding candidate peptides were classified at phylum level (www.microwiki.com) to determine the predominance of each bacterial phylum in our dataset of autoimmune candidate peptides. The number of species in each phylum was compared with the number of autoimmune candidates encoded by them. The p-value and 95% CI for a significant difference were calculated using Pearson's correlation with the SISA statistical tool (www.quantitativeskills.com/sisa/).
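The exact SISA procedure is not specified in the text, so the following is only a sketch of one common way to obtain a Pearson correlation coefficient with a 95% confidence interval, using the Fisher z-transformation. The function name is hypothetical, and the per-phylum counts in the example are placeholders, not data from this study.

```python
import math


def pearson_with_ci(xs, ys, z_crit=1.96):
    """Pearson r with an approximate CI from the Fisher z-transformation.

    Returns (r, lower bound, upper bound). z_crit = 1.96 gives a 95% CI.
    """
    n = len(xs)
    if n != len(ys) or n < 4:
        raise ValueError("need two equal-length samples with at least 4 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    r = sxy / math.sqrt(sxx * syy)
    if abs(r) >= 1.0:
        return r, r, r  # perfectly linear data: the interval degenerates
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    return r, math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


if __name__ == "__main__":
    # Placeholder counts: species per phylum vs. candidate peptides per phylum.
    species_per_phylum = [1, 2, 3, 4, 5]
    candidates_per_phylum = [2, 1, 4, 3, 5]
    print(pearson_with_ci(species_per_phylum, candidates_per_phylum))
```

With only a handful of phyla the interval is wide, so such an estimate should be read with caution.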

Human protein characterization
The human proteins corresponding to candidate peptides (candidate human proteins) were characterized using a systems biology approach to determine which biological processes, KEGG pathways and tissues may be affected by the raising of antibody to the candidate peptide, and which types of disease are associated with it.
Disease association. 575 human proteins were annotated under disease subgroups using the Genetic Association Database (GAD). Genes were classified using the GAD module of the DAVID functional annotation tool (https://david-d.ncifcrf.gov/summary.jsp). To identify diseases with a significant number of associated candidate human proteins matching our list, p-values were generated using the Bonferroni multiple correction method. A distance matrix to disease was created based on the number of gut bacterial species having the candidate peptide with autoimmunity potential. The closer the peptide is to the disease in the Cytoscape network (Fig 3), the higher the number of bacterial species harboring the specific candidate peptide.
Tissue specificity. Details of tissue-specific expression of candidate human proteins were generated using the Uniprot tissue database (UP_Tissue database), and p-values were calculated through the DAVID functional annotation tool. Bonferroni multiple correction was used to get adjusted p-values. A total of 667 candidate human proteins could be annotated for tissue expression.
Gene ontology and functional annotation. Candidate human proteins were studied for their involvement in biological processes using the DAVID functional annotation tool. These genes were also studied for their involvement in molecular pathways using the KEGG database, and significant pathway associations were recorded. All p-values were corrected for multiple testing using the Bonferroni method. In total, 670 and 470 candidate human proteins were annotated by biological processes and KEGG pathways respectively.
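The Bonferroni correction applied to the disease, tissue and pathway p-values can be sketched in a few lines. In the study the adjusted values come from DAVID; the function name and the raw p-values below are hypothetical and serve only to show the arithmetic (each p-value multiplied by the number of tests, capped at 1).

```python
def bonferroni_adjust(p_values):
    """Return Bonferroni-adjusted p-values: min(1, p * number_of_tests)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    # Hypothetical raw p-values for three annotation terms.
    raw = [0.01, 0.04, 0.5]
    adjusted = bonferroni_adjust(raw)
    for p, q in zip(raw, adjusted):
        verdict = "significant" if q < 0.05 else "not significant"
        print(f"raw p = {p}, adjusted p = {q:.3f}: {verdict}")
```

The correction is conservative: with many annotation terms, only very small raw p-values remain below 0.05 after adjustment.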

Results
Candidate peptides identification and HLA class II binding affinity
Identification of candidate peptides. Gut bacterial peptides of ≥9 amino acids in length and having human homology were selected as candidate peptides for further analysis (henceforth denoted as candidate peptides). 15576 candidate peptides (unique candidate peptides = 2295) from 396 unique bacterial species were found to have peptide homology with 689 unique human proteins (candidate human proteins). Information was recorded on their sequence, protein and gene IDs (both human and bacterial) and the bacterial species having the peptide (S1 Table).
Binding potential of candidate peptides with HLA-II targets. All 15576 peptides, with their binding threshold scores and the comparative binding threshold scores for a random peptide from the same protein, are listed in S2 Table. A binding threshold of 3 was considered to indicate peptide binding affinity to HLA-II alleles. The threshold scores to common HLA-II alleles for peptides linked with candidate human proteins associated with important diseases are displayed in Fig 2. Of the total 50 HLA-II alleles tested, 35 alleles differed significantly in their binding to candidate peptides compared with a random non-homologous peptide from the same protein (S1 Fig). None of the HLA-II alleles bound uniformly to all the peptides (S2 Table), and each would therefore raise peptide-specific antibodies if presented with the antigen (peptide). 24 gut bacterial peptides with homology to peptides from four human proteins (Low molecular weight phosphotyrosine protein phosphatase, Aldehyde dehydrogenase family 3 member B1, Maleylacetoacetate isomerase and Uracil-DNA glycosylase) were found to have binding affinity with at least one of the ten common HLA-II alleles [24], namely HLA-DRB1_0101, HLA-DRB1_0301, HLA-DRB1_0401, HLA-DRB1_0405, HLA-DRB1_0701, HLA-DRB1_0802, HLA-DRB1_1101, HLA-DRB1_1302, HLA-DRB1_1501 and HLA-DRB5_0101 (S2 Table).
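Selecting the alleles that bind a given peptide at the chosen threshold can be sketched as a simple filter. The sketch assumes that a lower threshold means stronger binding and treats a value at or below the cutoff as binding; the text does not state whether the cutoff of 3 is inclusive, so the comparison is an assumption. The function name and the threshold values are hypothetical; only the allele names come from the list above.

```python
def binding_alleles(thresholds, cutoff=3.0):
    """Return the sorted alleles whose binding threshold is at or below cutoff.

    thresholds maps an HLA-II allele name to the binding threshold
    predicted for one peptide (lower threshold = stronger binding).
    """
    return sorted(allele for allele, value in thresholds.items() if value <= cutoff)


if __name__ == "__main__":
    # Hypothetical thresholds for one candidate peptide.
    peptide_thresholds = {
        "HLA-DRB1_0101": 2.0,
        "HLA-DRB1_0301": 7.0,
        "HLA-DRB1_0401": 3.0,
    }
    print(binding_alleles(peptide_thresholds))
```

Keeping the cutoff as a parameter makes it easy to check how sensitive the list of binders is to that choice.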

Bacterial candidates characterization
Predominant gut bacterial phyla. The predominant bacterial phyla in the dataset with autoimmune candidate peptides were Proteobacteria and Firmicutes. On the other hand, Actinobacteria, Bacteroidetes and Fusobacteria were under-represented in this dataset. Verrucomicrobia, Synergistetes, Enterobacteria and Tenericutes were found to be very low in the HMP gut dataset itself and so were also either absent or under-represented in the autoimmune candidate dataset. Bacterial species belonging to the Actinobacteria phylum were found to be significantly (p = 0) low in autoimmune candidate peptides (Table 1), whereas Proteobacteria (not significant) and Firmicutes (p = 0.04) had more of such peptides.
Bacterial peptide cellular location. Out of the total gut bacterial candidate peptides (n = 15576), 3 were observed to be present on the outer membrane and 60 on the inner membrane, 15465 were cytoplasmic, 34 were periplasmic and 14 were extracellular (secretory) proteins (S3 Table). Selected peptides (based on their importance in common diseases and binding with common HLA-II alleles) were all found to be cytoplasmic (data not shown). Extracellular/

Fig 2. Clustered heatmap of human candidate proteins associated with late onset complex diseases and their binding affinity threshold with common HLA class II alleles.
Binding affinity threshold [range: 1 (red)-11 (green)]. The lower the threshold, the higher the binding affinity with the particular HLA class II allele. Human candidate proteins and the homologous gut bacterial proteins (Hu. protein_Bac. Protein, as provided in S1 Table) are indicated on the y-axis. The common HLA class II alleles tested are indicated on the x-axis. Only human candidate proteins depicted in Fig 2 and having common HLA class II binding affinity are represented here.

Human candidate protein characterization
Candidate human proteins are human proteins having peptide homology to respective bacterial peptides.
Human disease targets. The candidate human proteins were found to be significantly associated with common complex human diseases. 575 candidate proteins were found to have genetic association with diseases. The diseases significantly associated are Drug Response (DRG), Cognitive trait, Ageing, Spinal Dysraphism (SD), Chronic Kidney Failure (CKF), Chronic Obstructive Pulmonary Disease (COPD) and certain Cancers, apart from some other common diseases, namely Type 2 Diabetes (T2D), Hypertension, Alzheimer's Disease, Rheumatoid Arthritis (RA), Schizophrenia, etc. AIDS, despite a very significant p-value, was not considered as it contains several other disease associations within it. Many of these proteins were found to be metabolic in function (n = 118, without considering genes related to AIDS), and these metabolic proteins were associated with almost all the diseases (Fig 3 and S4 Table).
Human tissue targets. Out of 689 unique candidate human proteins, a total of 667 proteins were classified for specific expression in human tissues. A significant number (Bonferroni p-value <0.05) of candidate proteins were associated with tissues such as liver, Cajal-Retzius cell, fetal brain cortex, kidney, muscle, adipocyte and lung. The maximum number of candidate proteins were expressed in liver (Table 2 and S5 Table).
Human biological processes and KEGG pathways. Metabolic processes (such as carbohydrate, lipid and folic acid metabolism, and nucleotide and vitamin biosynthesis) and biological processes involved in response to drug, protein folding, translation and oxidation reduction were the significant biological processes associated with candidate human proteins. Of all the proteins in the carbohydrate metabolic process and mismatch repair categories, 4.64% and 1.45% respectively were represented in our dataset (Table 3 and S6 Table). KEGG pathway annotation covered 470 proteins of the dataset and showed the metabolic pathway as the most significant pathway, which included Glycolysis / Gluconeogenesis, Citrate cycle (TCA cycle), Fatty acid degradation, ABC transporters etc. (Table 4 and S7 Table).

Discussion
The function of the immune system, and hence autoimmunity, is affected by various factors such as host genetics, age and diet. The age of an individual also determines the integrity of the mucosal layer, which is a barrier between the host and its microbiota. Microbiota composition is shaped by host immunity and vice versa [25]. Development of immunity after anergy is established in the host may lead to autoimmune reactions if it develops against the host's own functional proteins. Individuals genetically predisposed to autoimmunity are those having an HLA-II allele that may induce immunity to self-tissue/proteins. Autoimmunity can be raised to certain pathogenic bacteria and also to microbiota [26] through molecular mimicry. As the gut is the major site of host interaction with environmental microbes, it may be one of the major factors in autoimmunity due to host microbiota in predisposed individuals. As in-vitro as well as in-vivo studies have recently shown that gut microbiota has the potential to induce lymphocytes for antibody production and also to raise IgG antibodies [27,28], this study is timely in delineating gut bacterial peptides with auto-immunity potential. Through our in-silico study we have tried to delineate gut bacterial peptide candidates having antigenic properties to raise auto-antibodies in individuals with HLA-II binding affinity to these peptides (genetically predisposed individuals). Here we have also delineated HLA-II alleles that could be a risk factor (i.e. make an individual genetically predisposed) for certain late onset diseases. If we know which gut bacterial composition is low in peptides similar to self, and thereby is not a potential autoimmunity trigger, then we can use this information for preventive therapy in genetically predisposed individuals. This may be achieved by means of antibiotics and/or pre/probiotics. Modulation of gut microbiota is predicted to be a possible therapy in some autoimmune disorders [29].
Gut microbial composition has also been associated lately with extra-intestinal inflammatory conditions like Rheumatoid Arthritis (RA) [14].
In the present study we have found 15576 gut bacterial peptides (S1 Table) having the potential to induce an autoimmune response in a genetically predisposed host (a host with a compatible HLA-II allele). The common HLA-II alleles show binding affinity with 24 candidate peptides (binding score <3, S2 Table) corresponding to four different human proteins. These four human proteins were found to be associated with diseases such as cancers, COPD, chronic renal failure, T2D and Atherosclerosis (S4 Table). The most common candidate binding to common HLA-II alleles is "aldehyde dehydrogenase 3 family member B1", produced by many E. coli strains. The corresponding gene encodes an isozyme of aldehyde dehydrogenase, which may detoxify aldehydes generated by alcohol metabolism and lipid peroxidation. It may therefore play a role in protection from oxidative stress. As it is an isozyme, its function is not indispensable, and an autoantibody to this protein may therefore not result in a specific disease but may reduce the efficiency of protection from oxidative stress. Of the HLA-II alleles tested, 70% displayed significantly different binding affinity to candidate peptides than to random peptides taken from the same protein (S1 Fig), indicating that these HLA-II alleles may have a role in host-microbe interaction leading to autoimmunity. On comparing diversity within phyla, it was found that the overall representation of bacterial species in the Firmicutes and Proteobacteria phyla is significantly high in our dataset of autoimmune candidate peptides (Table 1). This indicates that control of the diversity and amount of Firmicutes and Proteobacteria may in general have a beneficial effect on autoimmune conditions developed due to cross-reactivity to gut bacteria, if any. Proteobacteria is the most diverse bacterial phylum and has been linked to dysbiosis and increased risk of diseases [30].
In our dataset, proteins associated with T2D were observed to be over-represented in the Proteobacteria phylum, and control of Proteobacteria may therefore be beneficial for T2D. Proteobacteria was also observed to be over-represented in T2D in other studies [31,32]. This also suggests the possibility that autoimmunity to metabolic proteins due to dysbiosis in Proteobacteria and/or gut permeability to Proteobacteria may be the underlying cause of T2D etiology in some individuals. On the other hand, an increase in Actinobacteria might be harmless or beneficial for autoimmune conditions raised due to gut bacteria. Diseases significantly associated (p<0.05) with candidate peptides are mainly those with a late age of onset and a multifactorial etiology (genetic as well as environmental factors), such as those involving Cognitive trait/Ageing, Cancer, Chronic Obstructive Pulmonary Disease (COPD) and Chronic Renal Failure (CRF) (Fig 3), with the exception of Spinal Dysraphism (SD). The prevalence of all these diseases has been observed to increase almost two-fold from 2005 to 2015 [33], which could be due to the increase in the ageing population. This may necessitate exploration of additional age-related factors in the disease etiology, such as the decrease in mucus secretion by epithelial goblet cells [5]. SD occurs in the early phase of life (i.e. fetal development), but here it was found that the nine candidate human proteins (P11586, P31939, P22102, P13995, P04818, P48728, P34896, O75891, Q99707) associated with SD are from the 'One carbon pool by folate' KEGG pathway. This pathway is involved in folate uptake, so autoimmunity to peptides of these proteins in mothers may lead to a defect in maternal folate uptake and may induce SD in the fetus (Fig 3, Table 4 and S7 Table). Other major associated diseases displayed in Fig 3 are also multifactorial and late in onset.
T2D and response to rosiglitazone show a good trend towards association (unadjusted p = 0.05), with 16.4% of total T2D/edema/rosiglitazone associated genes present in our dataset. T2D is known to be a multifactorial condition with an elusive causal factor; autoimmunity to gut bacteria is therefore worth exploring in this disease. T2D is a metabolic disorder, and bacterial genomes mainly encode metabolic proteins. Our dataset of potential autoimmune peptides from gut bacteria is dominated by conserved metabolic proteins and may therefore have an implication in auto-antibody response specifically to host metabolic proteins.
This indicates that if autoimmunity developed to a gut bacterial peptide is cross-reactive to a human protein, it might influence health and disease. If proven to be the case, autoimmunity to gut bacterial peptides could be a phenomenon in late onset disease etiologies that needs to be targeted for effective therapy.
The tissue with maximum cross-reactivity with gut bacterial peptides is liver, followed by Cajal-Retzius cell, fetal brain, kidney, adipocyte and lung. Autoimmunity to tissues may cause inflammation leading to diseases involving these tissues. For example, proinflammatory markers have been observed in adipocytes of obese individuals with insulin resistance [34]. In our dataset too, candidate human proteins common to adipocyte and T2D have been observed (Q13085, P24666, P05091). Likewise, there are some overlapping genes in our dataset between lung tissue and COPD, between kidney tissue and Chronic Renal Failure, and between fetal brain tissue and both Cognitive trait and Spinal Dysraphism (S3 and S5 Tables). Among molecular pathways, Metabolic (such as Glycolysis/Gluconeogenesis) and ABC transporter pathways are the most significant pathways sharing proteins from the dataset (S6 Table). This indicates the importance of proteins involved in metabolic pathways in diseases linked to candidate gut peptides. This study suggests that an analysis driven by bacterial peptides rather than bacterial species might better capture the relevance of microbiota to late onset complex diseases.
Hallmarks of autoimmunity have recently been associated with many late onset complex diseases, such as Diabetes (LADA form of diabetes), Macular Degeneration, Rheumatoid Arthritis and COPD [35][36][37][38]. Evidence from published literature or clinical observations of autoimmunity linked with the pathogenesis of the above-mentioned diseases can be examined to see whether the underlying cause of autoimmunity lies within gut bacterial peptides. The in-silico data provided here could be a lead in such directions, specifically in diseases that are late in onset.

Conclusions
In this in-silico study we have adopted an approach to identify gut bacterial peptides with the potential to raise autoimmunity in the host. We found a large number of gut bacterial peptides that are homologous to human peptides and also bind to HLA-II alleles. Thus, these peptides can stimulate autoimmunity via antibody production to gut candidate peptides. This may occur particularly in ageing individuals with a depleting mucosal gut lining. Given that gut permeability is a general phenomenon in the ageing population, this process may trigger immune responses towards commensal peptides and, if left untreated, contribute to sustained autoimmunity against homologous human proteins. This theory seems plausible and needs to be further strengthened by experimentally confirming autoimmunity to commensal gut bacteria. Some interesting observations on the correlation of human proteins corresponding to candidate peptides with human diseases and tissues are presented here. More specifically, we found an association of candidate human proteins with important metabolic processes. Metabolism is the major process in humans that is affected in a genetically predisposed ageing population. Whether the underlying cause is gut bacterial antibodies that are cross-reactive to host metabolic proteins remains to be answered. If so, the therapy may be preventive in nature and may involve repair of the gut mucosal barrier. For diseased cases, depending on the genetic predisposition (HLA-II allele) of the individual, the individual may be screened and treated with personalized therapy based on the presence and type of antibody. Given that the two phyla (Proteobacteria and Firmicutes) found to be associated with T2D were also observed to have a higher proportion of autoimmune peptides in this study, the present study is promising enough to be taken up for clinical investigation.
Finally, we would like to conclude that gut bacterial peptides could be the so-called environmental trigger of late onset complex diseases in genetically predisposed individuals. Although the findings of this study, based on large-scale data, contribute towards understanding the molecular mechanisms that may underlie late onset diseases, the present study is an in-silico study and needs further experimental confirmation.