YARG: A repository for arsenic-related genes in yeast

Arsenic is a toxic metalloid. Moderate levels of arsenic exposure from drinking water can cause various human health problems such as skin lesions, circulatory disorders and cancers. Thus, arsenic toxicity is a key focus area for environmental and toxicological investigations. Many arsenic-related genes in yeast have been identified by experimental strategies such as phenotypic screening and transcriptional profiling. These genes are valuable for studying arsenic toxicity. However, the literature about them is widely dispersed and cannot be easily acquired by researchers. This prompted us to develop the YARG (Yeast Arsenic-Related Genes) database, which comprehensively collects 3396 arsenic-related genes from the literature. For each arsenic-related gene, the number and types of experimental evidence (phenotypic screening and/or transcriptional profiling) are provided. Users can use both search and browse modes to query arsenic-related genes in YARG. We used two case studies to show that YARG can return biologically meaningful arsenic-related information for the query gene(s). We believe that YARG is a useful resource for arsenic toxicity research. YARG is available at http://cosbi4.ee.ncku.edu.tw/YARG/.


Introduction
Arsenic (As), the 20th most abundant element on earth, is a toxic metalloid. In nature, arsenic is found in two chemical forms: inorganic species [arsenite (As³⁺) and arsenate (As⁵⁺)] and organic species [monomethylarsonic acid (MMA) and dimethylarsinic acid (DMA)]. Inorganic forms of arsenic are more toxic than organic forms. It is well reported that arsenic affects almost all cellular processes and organ functions, with effects that manifest through cellular stress, mitochondrial and oxidative damage, genetic mutations and epigenetic dysregulation [1]. Low to moderate levels of arsenic exposure (10-300 μg L⁻¹) from drinking water can cause health problems such as skin lesions, circulatory disorders, neurological complications, diabetes, respiratory complications, and hepatic and renal dysfunction [2]. Thus, arsenic toxicity is a major concern for environmental and toxicological investigations worldwide.
The availability of the complete genome sequence and mutant libraries of Saccharomyces cerevisiae provides multi-directional opportunities to design insightful experiments to understand arsenic metabolism, detoxification and tolerance acquisition mechanisms, which are mostly conserved from yeast to human [3][4][5]. The majority of yeast genes involved in arsenic response have homologs in humans that could potentially modulate toxicity in a similar manner as their yeast counterparts. Therefore, the budding yeast S. cerevisiae is a useful eukaryotic model organism for studying arsenic toxicity in human health and diseases [6].
To understand how yeast cells respond to arsenic exposure, it is crucial to identify arsenic-related genes. Two main experimental strategies are used to identify arsenic-related genes systematically. The first is phenotypic screening of the S. cerevisiae mutant libraries under arsenic exposure [7][8][9][10][11][12][13][14]. Comparing the growth kinetics of wild-type (WT) and a homozygous deletion mutant of a specific gene, both with and without arsenic exposure, identifies the potential role of that gene in arsenic sensitivity or resistance if the deletion significantly affects the growth rate. The phenotypic screening is done for all ~4870 non-essential genes in the yeast genome [14]. Although most of the phenotypic screens are performed in nutrient-rich media, different studies used different strains and exposed cells to various arsenic concentrations for different durations. Therefore, diverse phenotypic screenings have identified different sets of arsenic-related genes [7][8][9][10][11][12][13][14]. Phenotypic screenings supported with additional physiological and biochemical characterizations have highlighted the genetic determinants of arsenic susceptibility and resistance along with their designated functions in yeast. On a larger scale, it is known that arsenic (specifically arsenite) binds to α-helices, affecting the secondary structures of proteins, which could be corroborated by phenotypic screens suggesting functional inhibition of the chaperonin complex [13]. These details are critical to understanding the molecular mechanism of the arsenic response in yeast.
The second strategy is genome-wide transcriptional profiling [7,9,15]. By comparing the genome-wide gene expression patterns between WT cells with and without arsenic exposure, the genes whose expression is significantly affected by arsenic can be identified. Knowing which genes are differentially expressed under arsenic exposure allows multiple inferences about the functioning of genetic networks, their response directions and the diverse pathways involved. Yeast has evolved various defense strategies to tolerate and detoxify arsenic by reduction of metal uptake, enhanced extrusion, sequestration within vacuoles and chelation by metal-binding proteins and polypeptides [9,15,16]. Because yeast needs a systematic and prompt response, signal transduction pathways trigger a rapid re-programming of the cellular transcriptome that eventually modulates proteome and metabolome profiles. Examining the function of differentially expressed genes in the defense against arsenic toxicity could identify novel responsive candidates that might be critical for arsenic resistance and cellular processing. Thus, it is essential to compile and compare all global transcriptome profiling data obtained under arsenic exposure.
Further, experimental evidence shows that arsenic can directly bind to and activate transcription factor Arr1 [17]. Other transcription factors such as Yap1 (a key regulator of oxidative stress response) and Rpn4 (a key regulator of proteotoxic stress response) also play a critical role in arsenic detoxification. It is known that transcription factors Arr1, Rpn4 and Yap1 confer resistance to arsenic via regulating the expression levels of diverse genes [7]. Therefore, arsenic-related genes can also be identified by comparing the genome-wide gene expression differences between WT and transcription factor mutants (arr1Δ, rpn4Δ or yap1Δ) both under arsenic exposure [7,15]. Transcriptional profiling also helps to understand the role of Arr1, Rpn4 and Yap1 in arsenic resistance and their effects on genome-wide gene expression patterns.
Additionally, Haugen et al. [7] utilized both phenotypic screening and gene expression analysis to conclude that arsenic might channel sulfur into glutathione for detoxification, lead to indirect oxidative stress by depleting glutathione pools, and alter protein turnover via arsenation of sulfhydryl groups on proteins. They also highlighted that phenotypically sensitive pathways are upstream of differentially expressed genes, suggesting that transcriptional and phenotypic profiling implicate distinct but functionally related pathways in the yeast system. Therefore, to comprehensively characterize the arsenic-related genes, we need to integrate both phenotypic screening and transcriptional profiling datasets from all available studies in the literature.
Thousands of arsenic-related genes in yeast have been identified by many experimental studies using either phenotypic screening or transcriptional profiling, or both [7][8][9][10][11][12][13][14][15]. However, the literature about these identified arsenic-related genes is widely dispersed and cannot be easily acquired by researchers. Therefore, there is a need for a database which comprehensively collects the arsenic-related genes from the literature and provides a dynamic interface to all of this information. To meet this need, we have constructed the YARG (Yeast Arsenic-Related Genes) database, which collects 3396 arsenic-related genes and their experimental evidence. Users can search YARG by gene name and see whether each gene is arsenic-related, together with the collective evidence from phenotypic screening and/or transcriptional profiling. In addition, users can browse YARG to retrieve 20 different lists of arsenic-related genes from nine experimental studies [7][8][9][10][11][12][13][14][15]. The experimental strategy, experimental strain and experimental condition of these studies are also provided. In summary, YARG is a useful resource for the scientific community to investigate arsenic toxicity in yeast.

Construction and contents
The configuration of YARG database
Fig 1 illustrates the configuration of the YARG database. The YARG website was built in Python with the Django MTV (model-template-view) framework. Python was also used for raw data processing. The processed data were stored in MySQL. The tables were produced by DataTables (a table plug-in for jQuery). The graphics were generated by vis.js (a browser-based graphic drawing library).

Collection of 3396 arsenic-related genes
We collected 20 gene lists from nine existing studies [7][8][9][10][11][12][13][14][15] which experimentally identified arsenic-related genes by phenotypic screening (PS) or transcriptional profiling (TP). Among the 20 collected gene lists, 13 were generated by PS (Table 1) and 7 were generated by TP (Table 2). We then retrieved 3396 arsenic-related genes from these 20 collected gene lists. Among the 3396 arsenic-related genes, 535 are supported by both PS and TP, 737 are supported only by PS, and 2124 are supported only by TP (Fig 2A). The distribution of these 3396 arsenic-related genes on different chromosomes is shown in Fig 2B.
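The PS/TP breakdown reported above amounts to set operations over the collected gene lists. The following is a minimal sketch of that bookkeeping, not the authors' pipeline: the function name `summarize_support`, the `(strategy, genes)` input layout and the placeholder gene names are assumptions made for illustration only.

```python
def summarize_support(gene_lists):
    """Count genes supported by PS only, TP only, or both.

    gene_lists: iterable of (strategy, genes) pairs, where strategy is
    "PS" or "TP" and genes is a set of gene names (hypothetical layout).
    """
    ps_genes, tp_genes = set(), set()
    for strategy, genes in gene_lists:
        if strategy == "PS":
            ps_genes |= set(genes)
        elif strategy == "TP":
            tp_genes |= set(genes)
        else:
            raise ValueError(f"unknown strategy: {strategy!r}")
    return {
        "both": len(ps_genes & tp_genes),
        "PS only": len(ps_genes - tp_genes),
        "TP only": len(tp_genes - ps_genes),
        "total": len(ps_genes | tp_genes),
    }


# Toy input with placeholder names; these are NOT real YARG gene lists.
toy_lists = [
    ("PS", {"GENE_A", "GENE_B"}),
    ("PS", {"GENE_A", "GENE_C"}),
    ("TP", {"GENE_A", "GENE_D"}),
]
summary = summarize_support(toy_lists)

# The three categories reported in the text partition the 3396 genes.
reported_total = 535 + 737 + 2124
```

Because the three categories are disjoint, their counts must add up to the size of the union, which is how the reported figures (535 + 737 + 2124) reconcile with the 3396 total.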

Testing the enrichment of arsenic-related genes in input genes
For users' input genes, YARG tests whether they are enriched with arsenic-related genes. The p-value is calculated using the hypergeometric test [18] as follows:

$$p\text{-value} = P(x \ge K) = \sum_{x=K}^{\min(I,A)} \frac{\binom{A}{x}\binom{G-A}{I-x}}{\binom{G}{I}}$$

where G = 6572 is the number of genes in the yeast genome, A = 3396 is the number of arsenic-related genes in YARG, I is the number of users' input genes, and K is the number of input genes which are also arsenic-related genes.
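The enrichment test can be sketched in a few lines of Python. This is an illustrative sketch rather than YARG's actual code: it assumes the p-value is the upper tail P(X ≥ K) of the hypergeometric distribution (the usual choice for enrichment), and the function name `enrichment_p_value` is hypothetical. G and A are the values given in the text.

```python
from math import comb

G = 6572  # number of genes in the yeast genome (from the text)
A = 3396  # number of arsenic-related genes in YARG (from the text)


def enrichment_p_value(k_overlap, n_input, n_related=A, n_genome=G):
    """Upper-tail hypergeometric p-value P(X >= k_overlap).

    k_overlap: input genes that are also arsenic-related (K)
    n_input:   number of input genes (I)
    Assumes the one-sided enrichment form of the test.
    """
    if not 0 <= k_overlap <= n_input <= n_genome:
        raise ValueError("require 0 <= K <= I <= G")
    upper = min(n_input, n_related)
    # Exact integer arithmetic; a single division at the end avoids
    # accumulating floating-point error across the terms of the sum.
    numerator = sum(
        comb(n_related, x) * comb(n_genome - n_related, n_input - x)
        for x in range(k_overlap, upper + 1)
    )
    return numerator / comb(n_genome, n_input)


# Counts from the second case study: 73 input genes, 61 arsenic-related.
p_case_study = enrichment_p_value(61, 73)
```

For the cadmium case study (I = 73, K = 61), the paper reports p-value = 1.024E-8. With K = 0 the sum covers every possible overlap, so the function returns 1.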

Utility and discussion
Database interface
YARG provides two search modes. In the first search mode, users can input a gene name (Fig 3A). After submission, YARG returns a page showing the basic information of the input gene and links to YeastMine [19] for homology information such as human homologs, fungal homologs, non-fungal homologs, functional complementation and paralogs (Fig 3B). If the input gene is an arsenic-related gene, details (experimental strain, experimental condition and reference) of the experimental evidence (phenotypic screening and/or transcriptional profiling) are provided (Fig 3C). In the second search mode, users can input a list of genes (Fig 4A). After submission, YARG uses the hypergeometric test [18] to analyze whether the input genes are enriched with arsenic-related genes (Fig 4B). YARG also provides a figure and a table to show which input genes are arsenic-related genes and the total number of pieces of supporting evidence (Fig 4C). Details (experimental strain, experimental condition and reference) of the supporting evidence are also shown in Fig 4D.
YARG provides three browse modes (Fig 5A). In the first browse mode, users can browse 3396 arsenic-related genes. For each gene, YARG provides a systematic name, a standard name, a name description, the genomic location, and the total number of pieces of arsenic-related evidence from PS and TP (Fig 5B). In the second browse mode, users can browse 13 arsenic-related gene lists generated by phenotypic screening. These 13 arsenic-related gene lists consist of 1 mutant gene list of arsenate-sensitive phenotypes, 3 mutant gene lists of arsenite-resistant phenotypes and 9 mutant gene lists of arsenite-sensitive phenotypes (Fig 5C). In the third browse mode, users can browse 7 arsenic-related gene lists generated by transcriptional profiling. These 7 arsenic-related gene lists consist of (i) 3 lists of genes which are differentially expressed between WT and WT under arsenic exposure and (ii) 4 lists of genes which are differentially expressed between WT and transcription factor mutants (arr1Δ, rpn4Δ or yap1Δ) both under arsenic exposure (Fig 5D).
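The single-gene search mode can be pictured as a lookup over evidence records. The sketch below is a hypothetical in-memory model, not YARG's Django/MySQL schema: the `Evidence` fields mirror the details the text says are shown (strategy, strain, condition, reference), while the class name, function name and placeholder values are assumptions.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    gene: str        # standard gene name
    strategy: str    # "PS" (phenotypic screening) or "TP" (transcriptional profiling)
    strain: str      # experimental strain
    condition: str   # experimental condition
    reference: str   # citation of the source study


def lookup_gene(name, records):
    """Return the evidence summary for one gene (case-insensitive name match)."""
    hits = [r for r in records if r.gene.upper() == name.upper()]
    counts = Counter(r.strategy for r in hits)
    return {
        "gene": name.upper(),
        "is_arsenic_related": bool(hits),
        "PS": counts["PS"],
        "TP": counts["TP"],
        "evidence": hits,
    }


# Placeholder records for illustration only; not data from YARG.
toy_records = [
    Evidence("GENE_A", "PS", "strain-1", "condition-1", "ref-1"),
    Evidence("GENE_A", "TP", "strain-2", "condition-2", "ref-2"),
    Evidence("GENE_B", "PS", "strain-1", "condition-1", "ref-1"),
]
result = lookup_gene("gene_a", toy_records)
```

A gene with no matching records is reported as not arsenic-related with zero counts, which corresponds to the case where only the basic gene information would be displayed.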

Two case studies
Here we give two case studies to show that the search modes of YARG can return biologically meaningful arsenic-related information for the users' query gene(s). The first case illustrates a scenario of a single gene name submission. Yap1 is a transcription activator known to be involved in the arsenic adaptation process via regulation of the expression of ACR (arsenic compounds resistance) genes [17,20,21]. When we input a single gene name YAP1 (Fig 3A), YARG successfully identified YAP1 as an arsenic-related gene and provided seven pieces of existing arsenic-related experimental evidence (Fig 3C). Specifically, five phenotypic screening studies [7,10,11,13,14], which used different yeast host strains, collectively show that YAP1 is an arsenic-sensitive gene, meaning that the yap1Δ mutant has decreased fitness and gives the host strain an arsenic-sensitive phenotype. For example, Haugen et al. [7] identified deletion mutants with increased sensitivity to growth inhibition using an available deletion mutant library of non-essential genes (4,650 homozygous diploid strains) and found that yap1Δ is among the first 50 arsenic-sensitive deletion strains. Moreover, Thorsen et al.'s transcriptional profiling experiment [15] showed that YAP1 is differentially expressed between wild-type strains with and without arsenic exposure. Haugen et al.'s transcriptional profiling experiment [7] identified 50 differentially expressed genes between wild-type and yap1Δ strains both under arsenic exposure. Strikingly, 20 of these 50 genes are known to play a critical role in protection against arsenic exposure, suggesting that the transcription factor Yap1 might strongly mediate arsenic-induced stress adaptation [7]. In addition to showing arsenic-related evidence for YAP1, YARG also provides links to YeastMine [19] for users to find homology information of YAP1. For example, the human homolog(s) link to YeastMine (Fig 3B) reveals that the arsenic-related gene YAP1 has a human homolog TUSC1, which is known to play a role in various kinds of tumorigenesis [22][23][24][25], providing a possible explanation as to why arsenic is a potential carcinogen for various kinds of cancers in humans. Thus, YARG enables investigators to uncover possible novel links between arsenic toxicity and human disease manifestations using existing yeast toxicity studies. In summary, this case study demonstrates that YARG can provide arsenic-related information and homology information for the user's queried gene.
The second case study illustrates a scenario of a gene list submission. It is known that cadmium induces the unfolded protein response, endoplasmic reticulum (ER) and oxidative stress, and hampers energy metabolism in yeast [26]. Several experimental studies have shown that S. cerevisiae uses similar detoxification mechanisms against cadmium and arsenic as other higher eukaryotic systems [9,15,16]. It is also well documented that the genes required for cadmium resistance have significant overlap with the genes required for arsenic resistance [27,28]. When we input a list of 73 cadmium-sensitive genes identified by phenotypic screening from Serero et al. [29], YARG successfully identified that these 73 cadmium-sensitive genes are enriched (p-value = 1.024E-8 calculated by hypergeometric test [18]) with arsenic-related genes (Fig 4B), which is consistent with the existing knowledge [27,28]. Specifically, 61 cadmium-sensitive genes are also arsenic-related genes with experimental evidence of phenotypic screening and/or transcriptional profiling (Fig 4C), suggesting that S. cerevisiae may use similar detoxification mechanisms against cadmium and arsenic [9,16]. This case study demonstrates that YARG can help users compare gene lists related to different metals and toxins, which may help in identifying novel candidate genes for toxicological research.
In the transcriptional profiling studies, a gene was considered differentially expressed if the fold-change value was greater than or equal to twofold and if the p-value was less than 0.001 (in [7]) or 0.01 (in [9] and [15]).
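The differential-expression criterion quoted in this section (fold change of at least twofold, with p-value below 0.001 in [7] or below 0.01 in [9] and [15]) can be written as a small predicate. This is a sketch under stated assumptions: the source does not say how fold change is encoded, so the code assumes it is a treated/control expression ratio and that "twofold" covers both induction (ratio ≥ 2) and repression (ratio ≤ 0.5). The names `P_VALUE_CUTOFF` and `is_differentially_expressed` are hypothetical.

```python
from math import log2

# p-value cutoffs per study, as stated in the text.
P_VALUE_CUTOFF = {"[7]": 0.001, "[9]": 0.01, "[15]": 0.01}


def is_differentially_expressed(ratio, p_value, study):
    """Apply the twofold / p-value criterion for one gene in one study.

    ratio:   expression ratio treated/control (assumed encoding; must be > 0)
    p_value: significance reported for that gene
    study:   citation key selecting the p-value cutoff
    """
    if ratio <= 0:
        raise ValueError("expression ratio must be positive")
    # |log2(ratio)| >= 1 treats up- and down-regulation symmetrically
    # (assumption: the twofold threshold applies in both directions).
    twofold = abs(log2(ratio)) >= 1.0
    return twofold and p_value < P_VALUE_CUTOFF[study]
```

Under this reading, a gene with ratio 4.0 and p-value 0.005 would pass the cutoff used in [9] and [15] but not the stricter cutoff used in [7].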

Comparison with SGD and YeastMine
YARG collected 3396 arsenic-related genes supported by phenotypic screening and/or transcriptional profiling evidence from the literature. The advantages of YARG over SGD [30] and YeastMine [19] are as follows. First, SGD only allows users to check one gene at a time to see whether it is an arsenic-related gene according to the phenotype annotations. Second, YeastMine only allows users to retrieve a list of genes that are annotated to "metal resistance"; therefore, users still need to extract arsenic-related genes from this gene list. Third, both SGD and YeastMine define arsenic-related genes using only the phenotype annotations; neither of them provides the arsenic-related genes supported by transcriptional profiling. In summary, YARG is a useful resource for arsenic research since it provides arsenic-related genes supported by phenotypic screening and/or transcriptional profiling evidence from the literature.

Conclusions
In this study, we present YARG, a database that collects 3396 arsenic-related genes from the literature. For each arsenic-related gene, the number and types of experimental evidence (phenotypic screening and/or transcriptional profiling) are provided. Users can use both search and browse modes to query arsenic-related genes in YARG. Two case studies (a single gene YAP1 and a list of 73 cadmium-sensitive genes) have been provided to show that YARG can retrieve biologically meaningful arsenic-related information along with experimental evidence for the users' query gene(s). In the future, we will keep updating YARG as new arsenic-related gene lists become available from newly published papers. YARG will be maintained regularly by our laboratory personnel to ensure its long-term stability. We also provide two backup sites (http://cosbi5.ee.ncku.edu.tw/YARG/ and http://cosbi2.ee.ncku.edu.tw/YARG/) in case the main website is temporarily unavailable. We believe that YARG is a useful resource for arsenic toxicity research in yeast, supporting the research community worldwide.