In silico search for modifier genes associated with pancreatic and liver disease in Cystic Fibrosis

Cystic Fibrosis is the most common lethal autosomal recessive disorder in the white population, affecting among other organs, the lung, the pancreas and the liver. Whereas Cystic Fibrosis is a monogenic disease, many studies reveal a very complex relationship between genotype and clinical phenotype. Indeed, the broad phenotypic spectrum observed in Cystic Fibrosis is far from being explained by obvious genotype-phenotype correlations and it is admitted that Cystic Fibrosis disease is the result of multiple factors, including effects of the environment as well as modifier genes. Our objective was to highlight new modifier genes with potential implications in the lung, pancreatic and liver outcomes of the disease. For this purpose we performed a system biology approach which combined, database mining, literature mining, gene expression study and network analysis as well as pathway enrichment analysis and protein-protein interactions. We found that IFI16, CCNE2 and IGFBP2 are potential modifiers in the altered lung function in Cystic Fibrosis. We also found that EPHX1, HLA-DQA1, HLA-DQB1, DSP and SLC33A1, GPNMB, NCF2, RASGRP1, LGALS3 and PTPN13, are potential modifiers in pancreas and liver, respectively. Associated pathways indicate that immune system is likely involved and that Ubiquitin C is probably a central node, linking Cystic Fibrosis to liver and pancreatic disease. We highlight here new modifier genes with potential implications in Cystic Fibrosis. Nevertheless, our in silico analysis requires functional analysis to give our results a physiological relevance.


Introduction
Cystic Fibrosis (CF) is the most common lethal autosomal recessive disorder in the white population. Its incidence is one in 2,500 with a carrier frequency of one in 25. The Cystic Fibrosis Transmembrane Conductance Regulator (CFTR) gene, causing CF, was identified in 1989 [1] and located on chromosome 7q31.2, spanning a transcription unit of about 216.7 kb with 27 exons [2]. It encodes a transmembrane protein (CFTR, 1,480 amino acids) which is an ATPbinding cassette transporter functioning as a chloride (Cl -) channel [1][2][3][4][5][6]. The  CF associated genes, PI and Liver Disease. CF and CF associated genes were used to retrieve genes directly involved in CF as well as genes potentially implicated in the disease. The Comparative Toxicogenomics Database (CTD, http://ctdbase.org/, [27]) was also used. It curates relationships between genes and human diseases in a unique fashion which integrates gene/protein-disease relationships. In CTD, CF, Pancreatic diseases and Liver Disease key words were used. In CTD, disease-gene associations are reported as curated or inferred. We selected curated associations due to a higher confidence than inferred associations. Solely the genes that may be biomarkers of a disease or play a role in the etiology of a disease were selected.
Lastly, the Human Genome Epidemiology encyclopedia (HuGE Navigator, http://www. hugenavigator.net/HuGENavigator/startPagePubLit.do, [28]) which mines the scientific literature on human gene-disease associations was used. It is a database of population-based epidemiologic studies of human genes [29]. In HuGE Navigator we used the CF, PI and Liver Disease key words. The genes with a genetic association above 5 were selected (score formula is described in [30]).
Different key words were used among databases because some of them did not permit to retrieve any result.
Venn diagrams were drawn to find common genes from these databases.

Comparison with published gene lists from bibliography
Differentially expressed genes in CF were retrieved from works published in PubMed (http:// www.ncbi.nlm.nih.gov/pubmed). Up-and down-regulated genes were retrieved from a paper [31] in which 4 independent studies [32][33][34][35] and a re-analyzed list of genes [36] were compared. We used here a list of 75 up-regulated and 114 down-regulated genes which are shared by these studies and having the same direction of expression in at least two studies [31][32][33][34][35][36][37]. These genes are given in Table 1.
To search for up-and down-regulated genes in Liver Disease, we focused on gene expression in Human non alcoholic steato-hepatitis (NASH, [38][39][40][41]). The rationale that liver disease in NASH and liver of patients with CF share common pathways is based on the fact that steatosis is one of the most common hepatic affection in CF (prevalence of 23-75%). It is observed in 70% of children with liver disease, independently of their nutritional status. Together with a uniform hyperechogenicity of their liver, some pseudomasses may be seen by ultrasound and are due to lobulated fatty structures. In about 60% of the cases, the steatosis is associated with hepatic enzymes elevation (aminotransferases). Because this is observed in children, it is likely not due to alcohol consumption. Therefore, beside the many liver affections observed in livers from CF patients, they present many features of NASH (for review: [42]). Liver Disease key word was not used because it did not permit to retrieve studies with gene analysis. In a first study [38], 16 genes were differentially expressed in subjects with NASH when compared with healthy controls. 12 and 4 genes were significantly under-and 4 over-expressed in NASH, respectively. In a second study [39] comparing NASH samples versus controls, 14 genes were found with more than a two-fold difference. In the third study [40], comparison of non-obese controls patients with NASH exhibited 34 differentially expressed genes (> two fold). We noticed that the expression of 19 genes among 34, were not different in the obese and nonobese controls, showing that the second study using obese controls could be used here. In a last study [41], 9724 genes were differentially expressed in the normal liver versus NASH, (21,619 genes were unchanged).
To search for disease-specific genes in Chronic Pancreatitis, a study showing 152 and 34 genes with increased and decreased expression, respectively, was used [43]. Full name and OMIM identification for these 186 genes was searched. We used another study from which we extracted specific genes in Chronic Pancreatitis [44].
In our study, some genes were withdrawn because the corresponding record was deleted by NCBI or because the term was not found in Gene records. Pseudogenes and non human genes (artifacts) were also withdrawn.

Expression analysis
Datasets restricted to humans, were collected from the Gene Expression Omnibus (GEO, [45]) and GEO2R was used for the analysis [46]. For this purpose, experimental groups were reassigned. The following datasets were passed through the GEO2R online application: GSE40445 (Gene expression in CF-vs-non CF airway epithelial cells from nasal brushing), GSE48452 (Gene expression in control liver-vs-NASH) and GSE44314 (Gene expression in type 1 diabetes-vs-healthy controls). Type 1 diabetes was used because no PI dataset was found. Samples were assigned to the case or control groups. The GEO2R uses the limma (Linear Models for Microarray Data) package from Bioconductor and was designed to analyze complex experiments involving comparisons between many RNA targets simultaneously, with the idea to fit to a linear model to the expression data for each gene. Whereas limma provides Table 1. Retrieved differentially expressed genes in CF (adapted from [31]). The lower part of the table shows genes shared between two or more studies when all six studies are combined. Common genes in at least three studies are underlined.

Origin of the gene lists (References)
Regulated genes shared with [ several p-value options, we applied the adjustment to the p-values also called multiple-testing corrections, to correct the occurrence of false positive results. The Benjamini & Hochberg false discovery rate method, commonly used to adjust microarray data, was selected. After GEO2R dataset analysis, genes of interest were selected by using the criteria of a Bonferroni-corrected p-value 0.05. According to the annotation files, probes that were not corresponding to any genes and genes which were present several times within a single study were eliminated. Differentially expressed transcripts between normal and pathologic were selected and common genes between GSE40445, GSE48452 and GSE44314 were searched by drawing a Venn diagram (not shown).

Analysis of enrichment of KEGG biochemical pathways in CTD
Our target genes were mapped to biological process pathways using CTD (http://ctd.mdibl. org). Biocurators at CTD manually curated gene-disease relationships from the literature as well as pathways. The core data are integrated to construct pathway networks. The Venn-Viewer function, which permits to compare associated datasets, was used for the genes that were found to be potential modifiers in CF, Liver Disease and Chronic Pancreatitis. The Pathway associations function was selected to compare annotated enriched pathways, as well as pathways for disease for these genes. Associated data for interacting genes and diseases, inferred KEGG and REACTOME pathways, or enriched GO terms were downloaded. Annotated pathways for genes are associations established by KEGG and REACTOME curation. The significance of a given pathway is reported as a p-value.

Gene-disease association search
Gene-disease associations were search in CTD in which associations are extracted from the published literature or are derived from the OMIM database using the mim2gene file from the NCBI Gene database. Curated associations were retrieved according to their Bonferroni-corrected p-value < 0.01.

PPI networks design
PPI networks were obtained by STRING 10.0 (Search Tool for the Retrieval of Interacting Genes and Proteins, http://string-db.org/). STRING is a global resource for searching connections (edges) between genes or proteins (nodes) currently covering 9,643,763 proteins. Each edge has a confidence value between 0 and 1, lowest and highest confidence, respectively. The gene's names were used as an initial input and experimental data sets were selected to display interactive networks. The parameters were defined as follows: all PPI prediction methods were enabled, except for High-throughput Lab Experiments; maximum of 5 interactions by node; cut-off criterion of combined score ! 0.7 (interactions at high confidence or better); Homo sapiens.

Results and discussion
Study design Common genes were searched in CF, pancreatic disease and Liver Disease. Differentially expressed genes in CF were also retrieved from published results in PubMed. Finally, Datasets were collected from the GEO and re-analyzed. Because the CF, Chronic Pancreatitis and Liver Disease keywords did not give any result, type 1 diabetes-vs-healthy control and control liver-vs-NASH were used in GEO. Candidate genes from these databases were compared and analyzed to search for potential common genes and pathways. PPI were further searched to give our results a physiological relevance.

Gene selection from databases
OMIM is a comprehensive, daily-updated human phenotype database, containing more than 12,000 genes of all human genetic diseases. The searches in OMIM using CF, CF associated genes, PI and Liver Disease key words led to 100, 109, 152, 1192 genes, respectively (Fig 2). The intersection of CF and CF associated genes showed 42 different genes and 46 genes were shared by PI and Liver Disease. The 6 genes shared by CF, CF associated genes and PI keywords were S100A8/S100A9 (OMIM Accession ID: 123885), CFTR (OMIM Accession ID: 602421), SCNN1G (OMIM Accession ID: 600761), TGFB1 (OMIM Accession ID: 190180), Common genes in CF, PI and Liver Disease were retrieved from OMIM, CTD and HuGE. Candidate genes were also retrieved from published works (PubMed) and datasets were re-analyzed in GEO. Potential modifier genes from the 3 different origins were compared and analyzed to search for pathways and protein-protein interactions.
https://doi.org/10.1371/journal.pone.0173822.g001 SERPINA1 (OMIM Accession ID: 107400) and PKD1 (OMIM Accession ID: 601313). In the intersection of CF plus CF associated genes with Liver Disease key words, we found 24 genes (Fig 3). Finally, the number of genes in the intersection of the pooled CF and CF associated genes search and the pooled PI and Liver Disease search was only 4. These genes were CFTR (OMIM Accession ID: 602421), TGFB1 (OMIM Accession ID: 190180), SERPINA1 (OMIM Accession ID: 107400) and PKD1 (OMIM Accession ID: 601313). Using the CTD database, 11, 142 and 813 genes were retrieved using the CF, Pancreatic diseases and Liver Disease key words, respectively. Fig 4 presents a Venn diagram showing the shared genes between these key words. Only 3 genes were found to be shared among the 3 used key words: TGFB1 (NCBI Accession ID: 7040), TNFRSF1A (NCBI Accession ID: 7132), CFM1 (NCBI Accession ID: 10167). Surprisingly, among some differences with the OMIM search, the CFTR gene was not found to be present in the list of genes retrieved using the Liver Disease key word in CTD.
HuGE Navigator database is an integrated, searchable, Web-based knowledge base which mines the scientific literature on human genetic associations and human genome epidemiology [29,30]. In HuGE Navigator, common genes (score > 5) in PI, CF and Liver Disease were CFTR, SPINK1, GSTP1, PRSS1, ADRB2, GSTM1, GSTT1, HRAS, MIF, OGG1 and TNF. The corresponding OMIM Accession IDs are given in Fig 5. Differential gene expressions retrieved from bibliographic analysis Differentially expressed genes in CF were retrieved from PubMed. First, a small-scale microarray study performed with human native nasal epithelial cells (F508del homozygous patients vs. controls) was used [31]. We also retrieved genes from other papers aimed to compare gene expression in CF vs normal cells [32]. Another work using non-CF and F508del-CFTR homozygote samples, showed significant changes in the expression in 24 genes (two-sample t-test, p < 0.00001). A three-filter comparative analysis showed that 18 genes were significantly increased and 6 genes were decreased in CF relative to and non-CF samples, respectively [33]. We also retrieved genes from a microarray study in which results from 12 CF and 11 non-CF participants were used [34], and from data in which functional CFTR was absent of the plasma membrane [35]. Finally, we used a paper in which 4 studies evaluating the effect of the F508del-CFTR mutation on airway epithelial cells gene expression were analyzed [36]. A profiled gene expression in CF and non-CF nasal and bronchial epithelium samples, using Illumina HumanRef-8 Expression BeadChips was used. It showed that 863 genes were differentially expressed between CF and non-CF bronchial epithelium and that only 15 were differentially expressed between CF and non-CF nasal epithelium [37]. This indicated that within airways, gene expression varies depending on the region that is studied. Up-and down-regulated genes were compared within these studies [32][33][34][35][36][37] and common genes were searched (Table 1). We found that 75 genes were up-regulated In silico search for modifier genes in Cystic Fibrosis and 114 were down-regulated in human airway cells expressing F508del-CFTR (Table 1, lower part).
Nonalcoholic fatty liver disease has a large spectrum ranging from simple steatosis to NASH, which may lead to progressive fibrosis. In the first study that we used [38], the abundance of intra-hepatic messenger RNA for a broad array of genes was measured. From this study we retrieved differentially expressed genes in NASH-vs-normal liver. Among 6,412 genes, only 16 were differentially expressed in subjects with NASH when compared with controls ( Table 2). We used another study using microarrays [39]. 14 genes for NASH-vs-obese controls were found to be up-regulated (Table 2). In a third study, genes in NASH patients and controls were selected [40]. Whereas, 34 genes were differentially expressed in NASH vs non-obese controls, 19 of these genes had no significant differences in obese vs non-obese, suggesting a stronger association of these genes to NASH. Therefore, we used this list of 15 genes (Table 2). Finally, a study with normal, steatotis, NASH with fatty liver, and NASH without fatty liver samples was analyzed. Among 11,633 genes with altered expression out of 33,252 genes, 39 genes were changed in expression, between normal and NASH [21]. Thus, a list of 84 differentially expressed genes from 4 different studies was used in the present study ( Table 2). For pancreatic disease, a study with normal and Chronic Pancreatitis specimens was used [43]. Comparison of the expression of 5,600 genes between the normal and Chronic Pancreatitis was performed. GenBank accession numbers of 152 genes with increased expression ( [43], Footnote 1) and 34 with decreased genes levels in Chronic Pancreatitis ( [43], Footnote 2) were retrieved. We analyzed those genes one by one and removed non human genes and pseudo genes. This led us to list 129 increased genes and 23 decreased genes in Chronic Pancreatitis (Table 3). Because 152 genes were simultaneously increased in pancreatic cancer, only 5 of 5,600 genes were significantly over expressed in Chronic Pancreatitis compared with normal pancreas: Mucin-6 (GenBank Accession Number: L07517), COMP (GenBank Accession Number: L32137), TPSB1 (GenBank Accession Number: M33493), Rearranged Ig-lambda light chain (GenBank Accession Number: X57809) and CRISP-3 (GenBank Accession Number: X95240). An analysis of 6,800 different genes expressed in samples of normal pancreas and Chronic Pancreatitis was studied [44]. 107 genes were predicted to be expressed within cells with Chronic Pancreatitis ([44], Table 1). Genes from both studies [43 and 44] were retrieved and non human, pseudo genes and common genes were withdrawn. Finally, we found 23 decreased genes and 229 increased genes in Chronic Pancreatitis, when compared to normal pancreas (Table 4).

Differential gene expressions retrieved from microarray studies
We also acquired gene array data from the GEO database, a public available archive of individual microarrays studies [46,47]. GSE40445 [31], is a study of gene expression in human native  Table 2. Retrieved differentially expressed genes in Liver Disease (from [38][39][40][41]). When possible, names or abbreviations as well as OMIM accession number were added (in italic). The number sign # is used because α1-antitrypsin deficiency is caused by mutation in the SERPINA1 gene (OMIM: 107400). "?" is used when no result was obtained in our OMIM search for the corresponding gene name or when several results could be retrieved. The lower list is retrieved from [21] shows the Solute Carrier Family (SLC, sodium/potassium/chloride transporter family) with differential gene expression in NASH. Uptake transporter with Differential Gene Expression (from 41): In silico search for modifier genes in Cystic Fibrosis nasal epithelial cells from F508del-CFTR homozygous patients and non-CF controls. GSE48452 [48] is an expression profiling performed with samples of control human liver, healthy obese, steatosis and NASH). Only controls and NASH groups were re-analyzed here.

Reference
For PI, we failed to find any array analysis. Therefore, we used a gene expression of type 1 diabetes-vs-classical type 1A diabetes-vs-healthy controls array (GSE44314) [49]. We solely reanalyzed classical type 1A diabetes-vs-healthy controls. 246, 211 and 145 genes were retrieved from our GSE40445, GSE48452 and GSE44314 re-analysis, respectively (Table 5). A Venn diagram was drawn (not shown) but no common genes among the genes selected from the 3 studies were found. Therefore, if there are some common genes in the 3 studies, they do not appear among the most highly regulated genes of each study.

Common genes in CF, retrieved from bibliography and microarray studies
We then search for common genes between the genes which were obtained by the bibliographic search for CF genes (Table 1), Liver Disease (Table 2) and Chronic Pancreatitis (Tables  3 and 4) and the genes which were retrieved by the re-analyzing of GSE40445, GSE48452 and GSE44314. The result is summarized in Table 6. Only 4 genes were found to be common to both groups of genes retrieved from the bibliographic analysis and the GEO database analysis, regarding CF. These genes were IFRD1 (Interferon-related developmental regulator 1, OMIM Accession ID: 603502), IFI16 (Interferon gamma inducible protein 16, OMIM Accession ID: 147586), CCNE2 (Cyclin E2, OMIM Accession ID: 603775) and IGFBP2 (Insulin-like growth factor-binding protein 2, OMIM Accession ID: 146731). This low number of common genes is likely due to the fact that in microarray experiments, the recorded intensities depend on the conditions under which the measurements are made [50].

Genes expressed at increased levels in Chronic Pancreatitis OMIM ID Reference
LGALS4     We failed to find published works from PubMed, showing that IFI16, CCNE2 and IGFBP2 are also possibly involved in CF. Therefore, we searched in previously published works some lines of evidence to link these genes to CF.
IFI16 has a possible role in chronic inflammatory autoimmune disorders. The IFI16 protein is normally expressed in nuclei but may be mislocalized in the cytoplasm and secreted. Indeed, significant levels of extracellular IFI16 protein have been identified in the sera of patients with autoimmune diseases [52]. These data provide evidence for a function of IFI16 upon inflammation and tissue damage, as it is also observed in CF.
The human CCNE2 gene encodes a 404-amino-acid protein, related to cyclin E. It is associated with Cdk2 in a functional kinase complex [53] and is involved in the G1 progression [54]. Activated neutrophils elastase is found in high concentrations in the airways of CF patients [55] and has injurious effects on airway epithelial cells. Because the treatment of normal human bronchial epithelial by elastase results in G arrest, due to cyclin E complex inhibition ( [56]), we propose that CCNE2 is possibly involved in the pathophysiology of CF.
The insulin-like growth factors (IGF) stimulates growth of multiple cell types. IGF-II access to the cell surface receptor is mediated by IGFBP-2 [57]. It was previously described that serum IGF-II levels were significantly lower in CF patients than in controls, whereas serum IGFBP-2 was significantly higher [58]. IGF-II/IGFBP-2 molar ratios were also found to be significantly lower in CF. Because chronic inflammation is an important modulator of the IGF/ IGFBP system in CF and because IGF2 promotes development of pancreatic beta cells, related to some forms of diabetes mellitus, we also propose that IGFBP-2 is a modifier gene in CF.
In conclusion, we propose here that IFI16, CCNE2 and IGFBP2 are new candidates of modifiers genes in CF.
Common genes in CF and Chronic Pancreatitis and in CF and Liver Disease retrieved from bibliography and microarray studies. Common genes in CF-vs-non CF re-analysis of SE40445 and bibliographic search for Chronic Pancreatitis and Liver Disease were EPHX1 (Epoxide Hydrolase 1 Microsomal, OMIM Accession ID: 132810) and SLC33A1 (Solute carrier family 33 (Acetyl-CoA Transporter), member 1, OMIM Accession ID: 603690), respectively. EPHX1 gene encodes a ubiquitous enzyme (α/β hydrolases), localized in the ER. Mutations in the EPHX1 gene contribute to several diseases [61] in link with tobacco exposure and lung Table 6. Genes involved in CF, Chronic Pancreatitis and Liver Disease. Common genes present in our re-analyzed data of the GEO database and in our bibliographic analysis. Common genes in CF and in pancreatic and in liver affections are underlined.
Differentially expressed genes in the bibliographic search for CF and in the Diabetes-vshealthy re-analysis of GSE44314 were the Major Histocompatibility Complex Class II genes DRB1 and DQA1 (OMIM Accession ID: 142857 and 146880, respectively) and the Desmoplakin gene (DSP, OMIM Accession ID: 125647) ( Table 6). HLA-DQA1 and HLA-DQB1 are likely genetic markers of the Celiac Disease [71] which was suggested to be a risk factor in CF [72,73]. Therefore, together with our present results, we propose HLA-DQA1 and HLA-DQB1 genes, as new modifiers in CF. Desmoplakin is a cytoskeletal linker protein conferring the structural integrity of tissues by linking intermediate filament cytoskeleton from cell to cell (for review: [74]). In remodeled airways, as in CF, abnormalities of the cytokeratins and desmoplakins are observed [75]. According to the importance of intermediate filaments and cell-to-cell communications in CF [76,77], DSP gene expression is likely involved.
Involvement of Class II major histocompatibility complex (MHC) genes in CF were discussed above. Because HLA Class II Polymorphism is already known to be a modifier of the pulmonary phenotype in CF [78,79] our finding is not new.
GPNMB is a transmembrane glycoprotein expressed in the heart, lung, and small intestine [80]. The GPNMB gene is a p53-and androgen-dysregulated gene with antiproliferative and antitumorigenic effects in prostate [81]. To our knowledge, its role in CF has never been studied.
Chronic granulomatous disease (CGD) is the most common phagocyte immunodeficiency due to mutations in a gene encoding NADPH oxidase in phagocytes. As in CF, CGD is characterized by recurrent infections and inflammation in the lungs. Because patients with mutated NCF2 gene present severe infections and CGD [82,83] NCF2 gene is a good candidate for modifying CF.
RASGRP1 plays a critical role in T cell receptor (TCR) stimulation and is essential in the Natural Killer (NK) cell activation and in the immune response [84,85]. The NK cells are of particular importance in CF as they play a central role in clearing P. aeruginosa from the lung [86,87].
LGALS3 gene encodes a β-galactoside-binding lectin which plays a role in apoptosis and innate immunity. Because LGALS3 corrects F508del-CFTR trafficking [88], its gene is a good candidate as a modifier in CF.
The protein encoded by PTPN13 is a tyrosine phosphatase with five PDZ (postsynaptic density protein 95/discs large/zonula occludens-1) domains that bind to plasma membrane and cytoskeleton. The C terminus of CFTR binds various PDZ domain-containing proteins including EBP50, CAL and members of the GRASP family [89]. Despite it was suggested a possible role of reduced PDZ interactions in the accelerated internalization of F508del-CFTR [90], the role of CFTR-PDZ interactions remains incompletely resolved [88]. The PTPN13 gene is therefore of interest and the encoded protein may bind and regulate CFTR.

New potential modifier genes
In conclusion, we found three groups of potential modifier genes. From our bibliographic analysis and microarray studies, we propose here that IFI16 (NCBI ID:

Gene-pathway association of the new potential modifier genes
To better understand the biological function of our new potential modifier genes, we search for their annotated pathways, which are associations established by KEGG [91] and REACTOME [92]. Biological pathways describe biological processes. Therefore, they can be used to integrate and visualize gene products in different conditions (healthy vs disease). KEGG and REACTOME pathway data show known molecular interactions and reaction networks. These data, integrating genes and diseases provide insights into molecular networks involving our depicted candidate genes. Indeed, pathway enrichment analysis permits to decipher biological functions associated with these genes [93]. First, the three common genes in CF (IFI16, CCNE2 and IGFBP2), which were retrieved from Table 6, were submitted to the pathway search, using VennViewer in CTD. 1, 1 and 9 pathways were retrieved for IFI16, IGFBP2 and CCNE2, respectively (Fig 6). No pathway was common between those genes. The 4 common genes in CF and Chronic Pancreatitis and the 6 common genes in CF and Liver Disease were further submitted to the pathway search ( Table 7). The only common pathway between CF, Chronic Pancreatitis and Liver Disease was "Immune System" (REACTOME: 6900). The implicated genes were IFI16 for CF, HLA-DQB1 for CF plus Chronic Pancreatitis and LGALS3, NCF2 and RASGRP1 for CF plus Liver Disease (Table 7). Therefore, genes of the Immune System are likely involved in CF and in its association with Chronic Pancreatitis and Liver Disease.
We further search for the diseases that are statistically enriched among our genes. Modifiers candidates in CF (IFI16, CCNE2 and IGFBP2), the common genes in CF and Chronic Pancreatitis (EPHX1, HLA-DQA1 -DQB1, DSP) and in CF in link to Liver Disease (SLC33A1, GPNMB, NCF2, RASGRP1, LGALS3 and PTPN13) were entered as a single data set in CTD. According to CTD, a disease was considered enriched if the proportion of genes annotated to it in our test set was significantly larger than the proportion of all genes annotated to it in the genome. The results are presented Table 8.

Biological function and PPI networks association of the new potential modifier genes
For a full description of the proteins which are encoded by our candidate modifiers genes, we search for their potential interactions using the recent 10.0 version of STRING [94]. We used the basic interaction unit in STRING for functional association between proteins, derived from known experimental data. Given the importance of interactions in protein function, we search for groups of interacting proteins into functional sets, in physical complexes.
We first search for high confidence (! 0.7) interactions involving the proteins encoded by IFI16, CCNE2 and IGFBP2 (CF candidates). We added the CFTR protein to the query. As shown in Fig 7A, a network of 9 nodes and 10 edges was obtained (clustering coefficient: 0.744). Whereas IGFBP2 did not belong to the network, we observed that CFTR and IFI16 have a common partner: Ubiquitin C (UBC, OMIM: 191340). IFI16 and UBC are involved in the innate immune signaling pathways [95]. Furthermore, Ubiquitin modification is required for the proteasomal targeting of CFTR, which is ubiquitinated in the ER during assembly and during recycling from the cell surface [96]. Therefore, modifications in the IFI16 and/or UBC gene expression could lead to modifications of the maturation of CFTR, showing the importance of IFI16 as a modifier gene in CF, in link to immunity. CCNE2 binds to CDK2 in a catalytically active complex which modulates the cell cycle progression [97] and CDK inhibition corrects F508del-CFTR proteins [98]. Together with the role of UBC upon the proteasomal targeting of CFTR, the physical and functional complexes involving CCNE2, UBC and CFTR (Fig 7A) indicates that CCNE2 is likely a modifier gene in CF. Gene-pathway association of the new potential modifier genes in KEGG and REACTOME. The 3 common genes in CF, which were retrieved from Table 6, were submitted to the pathway search, using VennViewer in CTD. 1, 1 and 9 pathways were retrieved for IFI16, IGFBP2 and CCNE2, respectively. No pathway was common between those genes. https://doi.org/10.1371/journal.pone.0173822.g006 In silico search for modifier genes in Cystic Fibrosis Interactions involving the proteins encoded by common genes in CF and Chronic Pancreatitis (EPHX1, HLA-DQA1 -DQB1, DSP and CFTR) were searched (Fig 7B). Whereas, HLA-DQA1 and HLA-DQB1 did not belong to any complex, EPHX1, DSP and CFTR were linked by UBC. The Ubiquitin-proteasome system (UPS) is an important regulator for the intracellular  (1) : Bonferroni-corrected p-value; (2) : Genotype frequency in a population is the number of individuals with a given genotype divided by the total number of individuals in the population. https://doi.org/10.1371/journal.pone.0173822.t008 In silico search for modifier genes in Cystic Fibrosis trafficking of proteins and a UPS-dependent stabilization of cell-cell contacts involving DSP was shown [99]. Because a low activity of EPHX1 is a risk factor for COPD in Caucasian [63] and because an increased UPS activity is observed in COPD [100], a functional link between EPHX1 and UBC is likely involved in CF and Chronic Pancreatitis. Fig 7C shows the networks formed by the proteins encoded by the genes found in CF in link to Liver Disease (SLC33A1, GPNMB, NCF2, RASGRP1, LGALS3 and PTPN13). GPNMB, RASGRP1, LGALS3 and PTPN13 were not found to be involved in a network, with the used search parameters. NCF2 was linked to NCF4 and to NCF1. NCF1-NCF2-NCF4 (NCF1-4, neutrophil cytosolic factors 1 to 4) from a multicomponent enzyme system (NADPH-oxidase) responsible for oxidative bursts. We failed to find any information regarding a potential involvement of the NCF1-NCF2-NCF4 enzymatic complex in CF, in PubMed. Interestingly, we observed that SLC33A1 forms a protein complex together with CFTR, with UBC as an intermediate. As mentioned above, IRE1/XBP1 controls ERAD by activating the expression of SLC33A1 [70]. UBC is also involved in ERAD which is a key element of the CFTR degradation. This reinforces our proposition that SLC33A1 is a strong modulator in CF.
Finally, a network was drawn using the proteins encoded by the genes involved in CF, in CF plus Chronic Pancreatitis and in CF plus Liver Disease, which were found in a network linked to the CFTR protein (Fig 7D). UBC was observed as a central node. It is well known that F508del-CFTR protein degradation involves Ubiquitin modification. It was previoulsy shown that the de-ubiquitinating enzyme, Ubiquitin C-terminal hydrolase-L1 (UCH-L1), is highly expressed in CF airway epithelial cells and that there is a positive correlation between UCH-L1 expression and steady state levels of Wt-CFTR protein and F508del-CFTR [101]. Although it is not sufficient to rescue F508del-CFTR, the effect of UCH-L1 upon CFTR processing shows its potential roles in CF. UBC is thus a key component in CF.

Conclusion
CF is characterized by progressive lung disease with inflammation, pancreatic exocrine insufficiency and fatty liver disease. Liver disease is found in two-thirds of sick childrens [102][103][104]. Inflammation is indeed a hallmark of CF and is characterized by bacterial infection, neutrophils infiltrations in the lung and high levels of cytokines. Whereas, inflammatory response is deregulated and excessive, it is not known whether abnormal CFTR is responsible or a consequence. The most common hypothesis is that a defective CFTR leads to a decreased airway surface liquid which fails to clear infected secretions from the lung, triggering the excessive inflammatory response. It is therefore still controversial whether this hyper-inflammation is solely the result of the chronic infection or is a primary event due to CFTR defects [105]. Regarding liver, three types of Liver Disease are observed in CF patients: steatosis, cirrhosis and biliary fibrosis. Fatty infiltration of the liver being found in 30% at biopsy and in 60% at the autopsy of the CF patients, is the most common hepatic alteration [106]. In CF, PI phenotype is a hallmark in patients with two severe alleles such as F508del [16] and CF Related Diabetes (CFRD) is observed in more than 50% of patients. Whereas CFRD is a unique entity, it exhibits common features of both type 1 and 2 diabetes. However, the hallmark of CFRD is insulin deficiency, as in type 1 diabetes in which pancreatic beta cells are destroyed [107].
Despite residual involvement of CFTR's sequence variation upon lung function, modifiers are of main importance. Indeed, it is clear that differences in the genotype in CF are not solely responsible for the disease variability [108] and large-scale genome-wide association studies (GWAS) were undertaken to search for genetic determinants of phenotypic variation (for review, [23]). Nevertheless, further studies are still needed. Therefore, our aim was to depict new candidate genes with modifying activity in CF, in association with liver and pancreatic diseases. Using databases, bibliographic mining and gene expression analysis we found some candidate genes specific to lung, liver and pancreas. Nevertheless, two groups of different genes could be built, depending on the way they were retrieved. Unsurprisingly, databases search led to known genes involved in CF. Therefore, we focused on genes obtained by microarray results re-analysis, compared to those from our bibliographic analysis. We highlighted 3 genes specific to CF, 4 genes specific to CF and Chronic Pancreatitis and 6 genes for CF and Liver Disease. Nevertheless, we failed to find common genes in CF, Chronic Pancreatitis and Liver Disease. Pathways associated with these retrieved potential modifiers indicated that immune system is likely involved and that Ubiquitin C is the central node linking CF to liver and pancreatic disease. A schematic representation of the results is proposed in Fig 8. Despite we propose here some new genes involved in CF, we are fully aware that some other modifier genes may be missing and that our in silico analysis requires functional analysis to give our results a physiological relevance. Indeed, our analysis may present some limitations due to the statistical combination of results from several studies which may present variable quality and heterogeneity of the used tools. This could lead to misleading results, inherent to any systematic meta-analyses which may be non-exhaustive or over interpreted. Database and literature analysis can also be limiting because of incomplete knowledge base leading to byproduct results. Because in the manual curation of the literature we have not performed an exhaustive In silico search for modifier genes in Cystic Fibrosis review of all the genes and because the used studies have been performed using different methodologies and have different power, no statistical assessment was possible here. Finally, gene expression profiles may be affected by the variability between the used cell models, their intrinsic properties and their type of statistical analysis.
In conclusion, we propose here a methodology that may be used for some other genetic disease with great variability and identify new candidate modifier genes in CF, related to lung, pancreas and liver, such as Ubiquitin C.