Systematic Analysis of Experimental Phenotype Data Reveals Gene Functions

High-throughput phenotyping projects in model organisms have the potential to improve our understanding of gene functions and their role in living organisms. We have developed a computational, knowledge-based approach to automatically infer gene functions from phenotypic manifestations and applied this approach to yeast (Saccharomyces cerevisiae), nematode worm (Caenorhabditis elegans), zebrafish (Danio rerio), fruitfly (Drosophila melanogaster) and mouse (Mus musculus) phenotypes. Our approach is based on the assumption that, if a mutation in a gene G leads to a phenotypic abnormality in a process P, then G must have been involved in P, either directly or indirectly. We systematically analyze recorded phenotypes in animal models using the formal definitions created for phenotype ontologies. We evaluate the validity of the inferred functions manually and by demonstrating a significant improvement in predicting genetic interactions and protein-protein interactions based on functional similarity. Our knowledge-based approach is generally applicable to phenotypes recorded in model organism databases, including phenotypes from large-scale, high-throughput community projects whose primary mode of dissemination is direct publication online rather than in the literature.


Introduction
The functional annotation of genes and their products using the Gene Ontology (GO) [1] has been essential to the impact of recent advances in the biomedical sciences arising from the explosion of genome sequences now becoming available. The majority of GO annotations are manually asserted by trained experts based on literature evidence. Recently, large-scale community projects, using forward and reverse genetics, as well as pan-genomic phenotyping efforts, such as the International Mouse Phenotyping Consortium (IMPC) [2] and the Zebrafish mutation project [3], have begun the systematic phenotyping of animal model mutants. Such efforts have a huge potential to provide novel insights into gene functions and their roles in disease. However, data resulting from high-throughput phenotyping efforts are not immediately reported in the literature, and gene functions are not readily inferred from phenotype data. Here, we present a novel method to automatically infer functions of gene products from phenotype data. Our method is applicable to both manually assigned phenotype annotations and those resulting from electronically reported, high-throughput phenotype experiments.
GO annotations comprise a gene product, a term that represents a molecular function, biological process or cellular component, the literature reference for the assignment, and an evidence code that indicates how the annotation was derived.
Annotations of genes are maintained in model organism databases and the GO annotation (GOA) database [4], which now covers over 160,000 taxa and more than 32 million annotations. The strategies for annotating genes and proteins range from explicit expert manual curation of the literature, through electronic inference based on orthology, protein-protein or genetic interactions, to inference of functions based on protein family relations. GO annotations for humans currently comprise 353,102 annotations for 45,364 proteins (based on GOA version 115, accessed December 2, 2012), of which approximately half are manually curated, the remainder being derived electronically. In the mouse, there are 25,437 protein-coding genes with GO annotation (both electronic and manual), and 9,990 proteins with experimentally derived annotations. The scale of the annotation task and the speed with which new genomes are becoming available have necessitated the development of automated and semi-automated annotation strategies to maintain significant coverage, and several automated function annotation methods have gained importance in recent years [5,6]. However, phenotype data has not yet been employed on a large scale as a source of high-quality electronic annotations.
Animal model phenotypes are commonly characterized using a species-specific phenotype ontology, many of which are based on the Entity/Quality (EQ) framework [7]. While phenotype data has traditionally been gathered through literature curation, the results of high-throughput phenotyping are commonly made available directly in research databases without an associated report in the literature. These results are therefore not available through literature curation or inference of gene functions from phenotypes by trained curators. A systematic exploitation of phenotype data to identify gene functions requires a computational approach that can assign functions to genes based on the recorded phenotypes. Such an approach is applicable to manual curation of phenotype data from the literature as well as data derived from high-throughput phenotyping efforts.
The main challenge in designing such an approach is to relate phenotype observations in mutagenesis experiments, which are characterized using terms in species-specific phenotype ontologies, systematically to gene functions, which are described using the Gene Ontology [1]. We have developed a method that employs the logical definitions of terms in phenotype ontologies [8] and infers functions of genes and gene products from phenotype statements. Our method relies on the assumption that, if a mutation of a gene G results in a phenotypic manifestation that affects a GO process or function F, then G must have been involved in the function F. For example, if a phenotypic observation of a targeted gene knockout mouse mutant is abnormal B cell apoptosis, then the mutation must have occurred in a gene that is involved in B cell apoptosis. We apply our approach to yeast (Saccharomyces cerevisiae) [9], fly (Drosophila melanogaster) [10], fish (Danio rerio) [11], worm (Caenorhabditis elegans) [12] and mouse (Mus musculus) [13] phenotype data and identify several thousand novel associations between genes and their functions.
The quality of gene function annotations is often evaluated based on inter-annotator agreement [14,15] or a gold standard [16]. However, for novel functions that either have not yet been extensively studied or have not yet been reported in the literature, these approaches are not readily applicable. Therefore, in order to assess the quality of our inferred annotations, we applied the phenotypically inferred functions to the task of predicting known genetic and protein-protein interactions based on functional similarity over GO biological functions [17-19]. We use the GO annotations of gene products available from the various model organism databases as well as the GOA database [4] as baseline for predicting genetic interactions and PPIs, and compare the results to manually curated and experimentally validated genetic interactions and PPIs using Receiver Operating Characteristic (ROC) analysis [20]. We then perform the same analysis again, adding our inferred GO annotations to the annotations already available from the model organism databases and GOA. For each species, we identify an increase in the performance of predicting known genetic interactions and PPIs when adding the functions we infer, and, in most cases, the increase in performance is significant. When combining the inferred annotations across all five species and predicting genetic interactions and PPIs, we also find a significant increase in ROC AUC ($p = 1.4 \times 10^{-3}$ for PPIs, $p = 8.2 \times 10^{-5}$ for genetic interactions, $p < 10^{-6}$ for PPIs from STRING, $p < 10^{-6}$ for BioGRID interactions; one-tailed t-test).
The inferred functions and our analysis code are freely available at http://phenotype2go.googlecode.com. Table 1 provides an overview of the datasets and resources used in this work. Phenotypes in the yeast, fly, worm and mouse phenotype ontologies were manually defined [8] using the PATO framework [7], while fish phenotypes are directly annotated using PATO. According to PATO, a phenotype statement can be decomposed into one or more entities (E) that are affected in a phenotype and a quality (Q) that determines how the entity is affected. Table 2 shows the number and completeness of the phenotype definitions for the yeast, fly, worm and mouse phenotype ontologies.

Predicting Functions from Phenotypes
To identify gene functions from phenotypes, we first identify the quality and the entity of the animal model's phenotype annotation and then identify the genes that have been mutated in the animal model. If the entity that is part of the phenotype statement is based on the GO, we assign that GO term as a function to the gene that has been mutated in the animal model. For example, the mouse model Vgf<tm1Srjs> (MGI:2179681), a targeted mutation of the Vgf gene, is characterized by the phenotype lactation failure (MP:0010249). Lactation failure is decomposed into the entity lactation (GO:0007595) and the quality lacking processual parts (PATO:0001558). Since lactation is impaired in the phenotype resulting from a mutation in Vgf, we infer that Vgf must be involved in lactation.
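The inference rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the published pipeline: the dictionary layout, the function name `infer_functions` and the `EX:` placeholder identifiers are assumptions made for the example, and only the Vgf/lactation entry is taken from the text.

```python
from collections import defaultdict

# Entity/Quality definitions: phenotype term -> (entity, quality).
# Only the MP:0010249 entry comes from the text; the EX: entry is a
# made-up placeholder for a phenotype whose entity is not a GO term.
EQ_DEFINITIONS = {
    "MP:0010249": ("GO:0007595", "PATO:0001558"),  # lactation failure
    "EX:PHENO1": ("EX:ANATOMY1", "EX:QUALITY1"),   # hypothetical, non-GO entity
}


def infer_functions(gene_phenotypes, eq_definitions):
    """Assign a GO term to a gene when a phenotype of its mutant has a GO entity."""
    inferred = defaultdict(set)
    for gene, phenotypes in gene_phenotypes.items():
        for phenotype in phenotypes:
            definition = eq_definitions.get(phenotype)
            if definition is None:
                continue  # phenotype term has no EQ definition
            entity, _quality = definition
            if entity.startswith("GO:"):
                inferred[gene].add(entity)
    return dict(inferred)


# Gene-level phenotype annotations (alleles already mapped to genes).
gene_phenotypes = {"Vgf": ["MP:0010249", "EX:PHENO1"]}
result = infer_functions(gene_phenotypes, EQ_DEFINITIONS)
print(result)
```

The sketch ignores the quality when assigning the function, which mirrors the stated assumption that any abnormality of a process implicates the mutated gene in that process.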

Predicting Interactions
We evaluate the inferred gene functions by applying them to the prediction of genetic interactions and PPIs. To obtain the genetic interactions and PPIs, we use the GO annotation files and identify GO annotations with the IGI (inferred from genetic interaction) and IPI (inferred from protein interaction) evidence codes. The GO annotations contain as additional evidence the interaction partner from which the annotation has been inferred, and we use this pair as a genetic interaction or PPI (depending on the evidence code). Since the use of interactions contained in the GO annotation files may introduce a bias when predicting interactions based on GO annotation similarity, we further use known PPIs from the STRING database [21] and interactions from the BioGRID database [22] to provide additional independent verification datasets.
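As an illustration of how interaction pairs might be pulled out of a GO annotation file, the following sketch assumes the tab-separated GAF 2.x column layout (object identifier in column 2, evidence code in column 7, With/From in column 8). The sample rows, the `EX` database prefix and the function name are hypothetical, and identifier normalisation between the two columns is deliberately left out.

```python
import re


def interaction_pairs(gaf_lines):
    """Collect unordered (gene, partner) pairs from IGI and IPI annotations."""
    genetic, physical = set(), set()
    for line in gaf_lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and header comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue
        gene, evidence, with_from = cols[1], cols[6], cols[7]
        if evidence == "IGI":
            target = genetic
        elif evidence == "IPI":
            target = physical
        else:
            continue
        # The With/From field may list several partners.
        for partner in filter(None, re.split(r"[|,]", with_from)):
            if partner != gene:
                target.add(tuple(sorted((gene, partner))))
    return genetic, physical


def _row(gene, evidence, with_from):
    # Builds a made-up annotation row; all values are placeholders.
    return "\t".join(["EX", gene, gene.lower(), "", "GO:EXAMPLE",
                      "EX_REF:1", evidence, with_from, "P"])


sample = [
    "!gaf-version: 2.0",
    _row("G1", "IGI", "G2"),
    _row("G1", "IPI", "G3|G4"),
    _row("G5", "IMP", ""),
]
genetic, physical = interaction_pairs(sample)
print(genetic, physical)
```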
We then filter the sets of interaction data from the model organism databases and remove the interaction pairs for which we have not inferred a novel function (i.e., if $G_1$ and $G_2$ interact but we were not able to infer a novel function for $G_1$ or $G_2$, we remove this pair from the interaction data set). For each species for which we infer novel functions, we then perform a pairwise computation of functional similarity between genes. To calculate the similarity between two sets of GO annotations, we used the Jaccard index as a measure of semantic similarity. If a gene $G$ has the GO terms $t_1, \ldots, t_i$ as annotations, we generate the set $An(G)$ as the smallest set that contains $t_1, \ldots, t_i$ and is closed under superclasses (i.e., if $t \in An(G)$ and $s$ is a superclass of $t$, then $s \in An(G)$). We then define the similarity between the genes $G_1$ and $G_2$ as:

$$sim(G_1, G_2) = \frac{|An(G_1) \cap An(G_2)|}{|An(G_1) \cup An(G_2)|}$$

While a large number of different semantic similarity measures exists [19], we chose to apply the Jaccard index as it does not rely on information content to determine similarity. While similarity measures that incorporate the information content of an ontology term commonly provide better performance than measures that do not use information content [19], they may also introduce a bias when comparing the results of an analysis performed on multiple independent datasets. To ensure comparable results across all species we analyze, we used the Jaccard index without any weights based on information content.
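The superclass closure and the unweighted Jaccard index described above can be sketched as follows. The toy hierarchy with single-letter term names is an assumption made for the example and does not correspond to real GO terms; the function names are likewise illustrative.

```python
def superclass_closure(terms, parents):
    """Return the smallest set containing `terms` that is closed under superclasses."""
    closed = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term in closed:
            continue
        closed.add(term)
        stack.extend(parents.get(term, ()))
    return closed


def jaccard_similarity(terms_1, terms_2, parents):
    """Unweighted Jaccard index over the superclass-closed annotation sets."""
    an_1 = superclass_closure(terms_1, parents)
    an_2 = superclass_closure(terms_2, parents)
    union = an_1 | an_2
    if not union:
        return 0.0  # convention chosen here for two unannotated genes
    return len(an_1 & an_2) / len(union)


# Toy hierarchy (term -> direct superclasses); A is the root.
PARENTS = {"B": ["A"], "C": ["A"], "D": ["B", "C"], "E": ["C"]}

# An({D}) = {A, B, C, D} and An({E}) = {A, C, E}:
# 2 shared terms out of 5 in the union.
similarity = jaccard_similarity({"D"}, {"E"}, PARENTS)
print(similarity)
```

Because every term counts equally, the measure needs no annotation corpus to estimate information content, which is the property the text relies on for cross-species comparability.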
As a result of applying this similarity measure, we obtain, for each gene, a functional similarity value to all other genes. For each gene G, we then rank this similarity list so that the gene that is functionally most similar to G is on rank 1 and the least similar on the last rank. Using the genetic interaction and PPI datasets as positive instances and all other pairs as negative, we then predict genetic interactions and PPIs based on functional similarity. We measure the success using an analysis of the receiver operating characteristic (ROC) curve and determine the area under the ROC curve (ROC AUC) [20]. A ROC curve is a plot of the true positive rate as a function of the false positive rate and can be used to visualize the quality of the predictions. The ROC AUC is a quantitative measure of the classifier's performance: a ROC AUC of 0.5 indicates a random classifier (i.e., the true positive rate increases in proportion to the false positive rate), while a ROC AUC of 1 indicates a perfect classifier (i.e., all true positive instances are placed on the first rank, while the true negative instances are all ranked lower).
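The ranking and ROC AUC computation can be illustrated with a small sketch. It uses the standard identity that the ROC AUC equals the probability that a randomly chosen positive pair scores higher than a randomly chosen negative pair (ties counted as one half). The similarity values, gene names and function names are invented for the example, and pooling all candidate pairs of one gene is an assumption rather than the exact published procedure.

```python
def rank_partners(gene, similarities):
    """Order candidate partners of `gene` from most to least similar."""
    partners = [(other, score) for (g, other), score in similarities.items() if g == gene]
    return [other for other, _ in sorted(partners, key=lambda item: -item[1])]


def roc_auc(scores, labels):
    """ROC AUC as the fraction of positive/negative pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative instance")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Made-up similarity values for one gene and its candidate partners.
similarities = {("G1", "G2"): 0.9, ("G1", "G3"): 0.2,
                ("G1", "G4"): 0.5, ("G1", "G5"): 0.5}
known_interactions = {("G1", "G2"), ("G1", "G4")}  # positives; all else negative

ranked = rank_partners("G1", similarities)
pairs = list(similarities)
auc = roc_auc([similarities[p] for p in pairs],
              [p in known_interactions for p in pairs])
print(ranked, auc)
```

The quadratic pair comparison keeps the sketch short; for genome-scale data a rank-sum formulation or an established library routine would be the practical choice.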
In the absence of a large set of true negative examples of PPIs or genetic interactions, we make the assumption that interactions that are not present in our evaluation datasets are negative instances. As a consequence, our true positive rate is lower than the one that we would obtain when treating only validated negative interactions as negative examples. Furthermore, the resulting ROC AUCs are also lower than the ones we would achieve with validated negative examples of interactions. Since we use the same positive and negative instances (for each species) to perform our comparative evaluation of current and inferred GO functions, this assumption will not affect the validity of our results.

Prediction of Gene Functions
Applying our method, we extract 1,409 novel associations between genes and their functions for zebrafish, 12,483 for yeast, 1,057 for fruitfly, 3,885 for worm and 14,013 for mouse, using only the GO annotations with manually created evidence for comparison (evidence codes Inferred from Experiment (EXP), Inferred from Direct Assay (IDA), Inferred from Physical Interaction (IPI), Inferred from Mutant Phenotype (IMP), Inferred from Genetic Interaction (IGI) and Inferred from Expression Pattern (IEP)). We evaluate the quality of the inferred functions both manually and by applying them for predicting known genetic interactions and PPIs. First, we randomly selected 20 annotations from each species and examined scientific papers in which the gene and the resulting phenotypes   but which has not yet been added as a GO annotation of Efnb2. In several cases, the annotations we generate are too general, i.e., a biologist would be able to infer a more specific function from the described experiment; nevertheless, even the general annotations are valid and may provide useful information about a gene's function. The detailed manual evaluation results, including references to the manuscripts that support the novel annotation, are included as supplementary material.
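One way to separate novel from already known associations is sketched below. It assumes an exact match on (gene, GO term) against existing annotations carrying one of the manual evidence codes listed above; whether the original comparison also took the GO hierarchy into account is not stated, so this is a simplification. Gene names and term identifiers are placeholders.

```python
# Evidence codes treated as manually created, as listed in the text.
MANUAL_EVIDENCE = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def novel_annotations(inferred, existing):
    """Return inferred (gene, term) pairs lacking a manual annotation.

    inferred: dict mapping gene -> set of GO terms
    existing: iterable of (gene, term, evidence_code) tuples
    """
    known = {(gene, term) for gene, term, evidence in existing
             if evidence in MANUAL_EVIDENCE}
    return {(gene, term)
            for gene, terms in inferred.items()
            for term in terms
            if (gene, term) not in known}


# Placeholder data: geneB already has a manual (IMP) annotation,
# geneA only has an electronic (IEA) one.
inferred = {"geneA": {"GO:EXAMPLE1"}, "geneB": {"GO:EXAMPLE2"}}
existing = [("geneA", "GO:EXAMPLE1", "IEA"), ("geneB", "GO:EXAMPLE2", "IMP")]

novel = novel_annotations(inferred, existing)
print(novel)
```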
We also found evidence in some systems for an improvement in annotation granularity. Taking the novel annotations in the mouse genome to erythrocyte development (GO:0048821), we manually examined the underlying phenotype evidence in MGI for the new assertions, together with the existing GO process and function annotations. Of 77 novel annotations to erythrocyte development, 29 genes already had some annotation to erythrocyte differentiation (GO:0030218) or regulation of the erythroid or myeloid lineages, with the most common annotation being to the parent term erythrocyte differentiation. Some genes were annotated to directional regulation, such as positive regulation of erythrocyte differentiation (GO:0045648), but others were annotated to much more general GO terms such as myeloid cell differentiation (GO:0030099). The remaining genes with novel annotations to erythrocyte development have no current GO annotation to erythroid lineage processes but mutants show phenotypes affecting erythroid differentiation or development. Table 3 provides an overview of our manual evaluation results for annotations to erythrocyte development, and Figure 1 shows the corresponding part of the GO hierarchy.

Figure 1. Part of the GO hierarchy containing erythrocyte development.

Table 3. Of the 77 genes predicted by phenotypic analysis to annotate to erythrocyte development, 29 already had some relevant annotation, shown in column 3. Relevant annotation was taken to be any child class of myeloid cell differentiation (GO:0030099), or erythrocyte homeostasis (GO:0034101), thereby including as many levels of granularity as possible in order to compensate for possible curator decisions to annotate more generally. The remaining 48 genes had no existing annotations to any of these classes. In many cases, multiple genotypes provided evidence for the novel annotation; an example allele is shown in column 4. Phenotype annotations to abnormal erythropoiesis (MP:0000245) or its subclasses were counted as evidence. Whilst close curation of the phenotypic evidence may suggest that annotation to a parent of erythrocyte development is more appropriate, in all cases the evidence indicated that annotation to the neighbourhood of this class was correct but missing. doi:10.1371/journal.pone.0060847.t003
To further evaluate the predicted annotations, we quantify their impact on predicting known genetic interactions and PPIs. For this purpose, we applied a measure of functional similarity between genes (see Materials and Methods section) and rank genes based on their similarity. For evaluating the functional similarity, we use all GO annotations available for a gene, including electronically inferred annotations. We then use datasets of genetic interactions and PPIs as a gold standard. We obtain the interactions from the GO annotations tagged with the IGI (inferred from genetic interaction) and IPI (inferred from protein interaction) evidence codes. We further use protein interactions from the STRING database [21] and protein and genetic interactions from the BioGRID database [22] to evaluate the results. Figure 2 shows the results of the ROC analysis for predicting genetic interactions, Figure 3 shows the results of the ROC analysis for predicting PPIs (extracted from GO annotations), Figure 4 shows the results of the ROC analysis for predicting PPIs from STRING and Figure 5 shows the results of the ROC analysis for predicting interactions from the BioGRID database. We find that the performance of predicting genetic interactions and PPIs based on gene functions improves for every species when including the gene functions we infer. The results are summarized in Table 4, and detailed evaluation results are provided as Supplement S2.
One example of our evaluation is provided by Casp1 (caspase 1, MGI:96544) and Il1b (interleukin 1 beta, MGI:96543), which are known to interact in mice and both are essential for several shared functions [24]. Based on the asserted GO functions, their functional similarity is relatively low. However, based on the phenotypes observed for caspase 1 mutations and interleukin 1 beta mutations, we infer several new functions in which both are involved, including defense response to bacterium and interleukin-1 beta secretion. We also infer the involvement of Casp1 in inflammatory response which is a known function of Il1b. As a consequence of the novel functional annotations, Casp1 and Il1b are functionally significantly more similar than currently inferred through asserted functional annotations (full data provided as Supplement S2).

Comparison to related work
The most closely related work of which we are aware is the set of explicit mappings between phenotype terms and GO terms that have been created as part of the curation pipeline in WormBase (available at http://wiki.wormbase.org/index.php/Gene_Ontology#Phenotype2GO_pipeline_.28Sanger_and_Caltech.29). In these mappings, particular PATO-based ontology terms are explicitly and manually mapped to GO terms that can reliably be inferred based on the phenotype annotation. For example, the WormBase Phenotype Ontology term Long (WBPhenotype:0000022) is mapped to the GO term negative regulation of multicellular organism growth (GO:0040015), and these mappings are used to infer functional annotations from mutant phenotypes automatically in WormBase. In our approach, we use the PATO-based definitions that have been created for the WormBase Phenotype Ontology [25] to infer gene functions, which leads to complementary functional annotations. In particular, as a consequence of automated inference of GO functional annotations from phenotypes in WormBase, we observe the highest overlap of functions we infer with existing GO annotations (see Table 4), while we nevertheless infer a large number of novel functions that cannot currently be identified through WormBase's mappings. Furthermore, we use an ontology-based approach to extend such a mapping between observed phenotypes and functional annotations to other model organism species. In particular, we reuse the large number of PATO-based definitions [7] for phenotype ontologies that have recently been created [8,26], and are therefore able to apply our approach to any model organism data for which such definitions have been created. Furthermore, model organism databases such as ZFIN [11] use PATO-based phenotype descriptions directly, and our method is directly applicable to such phenotypes.
There are a large number of automated function prediction algorithms that utilize text mining [4,27-29], interaction networks [5] and sequence information [6]. Our approach incorporates experimentally derived phenotype data that has, to the best of our knowledge, not yet been incorporated on a large scale into GO function prediction algorithms.

Electronically Inferred Annotation and ''Downstream'' Effects
While our approach will not replace the experimental validation and manual curation of functional information in model organism databases, it is, to the best of our knowledge, the first large-scale approach to infer gene functions from phenotype information. With the emergence of genome-wide phenotyping projects, our method provides the necessary tool to bridge the gap between the availability of phenotype information and the inference of functions. In particular, traditional literature curation alone will not be applicable to the analysis of phenotypes resulting from high-throughput phenotyping efforts and the insights they can provide into gene functions, primarily since they are not directly reported in the literature.
Our method electronically infers functions from mutant phenotypes and will not create GO annotations in which scientists can have the same confidence as in manually created annotations. However, we have demonstrated the great utility of inferring some annotations electronically from experimental data, in particular the improvements these novel annotations can bring to computational analyses and the prediction of genetic interactions and PPIs. GO evidence codes are used to indicate the source and evidence for an annotation, and our annotations would obtain an Inferred from Electronic Annotation (IEA) evidence code. An evidence code specifically indicating that the electronic annotation was made based on the analysis of mutant phenotypes would further improve the accuracy of the evidence code annotation.
One limitation of our approach is its inability to distinguish between direct involvement of a gene in a biological process or function, and the involvement of a gene through regulation of other genes, or functions, that are directly responsible for the resulting phenotypes. This phenomenon, known as ''downstream effects'' (cf. http://www.geneontology.org/GO.annotation.conventions.shtml#Downstream_Process_guidelines), is a major concern for GO annotations. Currently accepted practices resolve this issue by requesting more specific terms to be added to GO and annotating to these terms instead. In particular, parthood and regulatory terms, which are defined using appropriate part-of or regulates relations, should be used instead of annotating to the more general process in which genes are only involved indirectly. By following the relations used in defining the more specific terms, involvement in the general process can then be defined based on the GO structure [1]. As our inference of GO functions is based on phenotype information alone, we cannot infer the specific function in most cases. Often, additional experiments would be required to determine how a gene leads to an observed phenotype. In some cases our annotations will be rated high-level, but nevertheless are likely to be useful and correspond to GO annotations that can be inferred if the specific function of the gene were known, assuming the appropriate relations between processes and functions are asserted in GO and the phenotype annotations and the definition of phenotype terms are correct. Our manual analysis of annotations to processes in erythrocyte development and differentiation, however, suggests that in some cases we are able to suggest more specific annotation based on underlying experimental phenotype data.

Relevance for Scientific Analyses
One of the most widely adopted applications of GO-based gene function annotation falls in the domain of analysis and interpretation of gene expression data [30]. This method relies on the quality and quantity of available functional annotation of genes and gene products, and our method has the potential to further improve the accuracy and statistical power of such analyses. Gene functions are also widely used to infer relations between genes and gene products, including the construction of genetic and protein interaction networks [31], the identification of causal genes in diseases [32] or for drug discovery and drug repurposing [33]. All these approaches can be improved with a higher coverage of reliable functional gene annotations, and further extend the functional analysis of gene expression datasets using data observed in phenotype experiments.
A further computational application for functional annotations is the prediction of genetic interactions and PPIs, and we have demonstrated that both tasks improve significantly when the gene functions we infer are included. This improvement is measurable even when the electronically inferred annotations currently available for genes are taken into consideration, thereby demonstrating that our approach is complementary to other electronic annotation methods.
However, we find significant differences between species when predicting genetic interactions and PPIs. For example, we observe only a small increase in ROC AUC for yeast, although we infer a large number of novel gene functions, while we observe a high increase in ROC AUC for predicting both genetic interactions and PPIs in zebrafish, although the number of gene functions we infer is much lower. One explanation for this observation may be the different completeness of annotations in different species, either as a result of different cost and complexity of functional genomics experiments (which is lower in yeast than for most other species), or as a consequence of different resources available for annotating gene functions in the various model organism databases. Furthermore, our evaluation datasets contain large differences in the number of interactions within each species. We aim to account for these divergent numbers of positive and negative examples of interactions by using a t-test to compare the difference in ROC AUC. Nevertheless, the ROC AUCs reported for species with low numbers of known interactions will be less accurate than ROC AUCs for species with a high number of known interactions, and this may explain parts of the differences observed in ROC AUC.

Future Research
Currently, we are conservative in the assumptions we make that allow us to infer functional information. However, our approach can be extended to infer more detailed and complex functional information. For example, if an abnormal morphology of the tail is observed as a phenotype resulting from a mutation in a gene, then this gene will likely be involved in tail morphogenesis. However, in some cases such a phenotype may not immediately be the consequence of mutations in the gene but rather the result of an impaired function of another gene that is related with the mutated gene through a biochemical, cellular or physiological pathway. In future research, an explicit representation of such interactions, in particular on an organism-wide physiological scale, will further improve the performance of our method.

Supporting Information
Supplement S1 Inferred GO functions. A complete dataset of inferred GO annotations from phenotypes. Each file contains the gene identifier and the novel GO functions we infer for each species. (ZIP)

Supplement S2 Computational evaluation results. A complete dataset for predicting interactions (genetic interactions, PPIs from GO, PPIs from STRING and interactions from BioGRID) using GO functional similarity. The first two columns of each file contain the interaction partners, the third column contains the position of the interaction pair in the functional similarity list (i.e., a value of 0 indicates that both partners are the functionally most similar, while a value of 1 indicates that both partners are the functionally least similar) based on GOA annotations, and the fourth column indicates the position of the interaction pair in the functional similarity list based on GOA annotations and our inferred annotations. (BZ2)

Supplement S3 Manual evaluation results. A dataset of manually evaluated inferred functions. The file contains an inferred function for 20 genes from yeast, worm, fruitfly, zebrafish and mouse, as well as a PubMed reference to a manuscript providing evidence for the function. (ZIP)

Table 4. Summary of results, including the number of inferred functions and the p value resulting from a one-tailed t-test comparing the ROC AUC for predicting genetic interactions using the original functional annotations vs. the combination of original and inferred annotations.