HIPPIE: Integrating Protein Interaction Networks with Experiment Based Quality Scores

Protein function is often modulated by protein-protein interactions (PPIs), and therefore defining the partners of a protein helps to understand its activity. PPIs can be detected through different experimental approaches and are collected in several expert curated databases. These databases are used by researchers interested in examining detailed information on particular proteins. In many analyses the reliability of the characterization of the interactions becomes important and it might be necessary to select sets of PPIs of different confidence levels. To this end, we generated HIPPIE (Human Integrated Protein-Protein Interaction rEference), a human PPI dataset with a normalized scoring scheme that integrates multiple experimental PPI datasets. HIPPIE's scoring scheme has been optimized by human experts and a computer algorithm to reflect the amount and quality of evidence for a given PPI, and we show that these scores correlate with the quality of the experimental characterization. The HIPPIE web tool (available at http://cbdm.mdc-berlin.de/tools/hippie) allows researchers to do network analyses focused on likely true PPI sets by generating subnetworks around proteins of interest at a specified confidence level.


Introduction
Protein function is exerted through or regulated by protein interactions, and therefore knowledge of the partners of a given protein can give us important information regarding its activity. For instance, specific protein-protein interactions (PPIs) can be involved in diseases (see e.g. [1]). PPIs can be evaluated by many experimental methodologies, which differ widely in their degree of confidence and in their experimental set-ups. For instance, while yeast two hybrid (Y2H) identifies direct physical interactions between two proteins, mass spectrometry (MS) based datasets report components of protein complexes, which may or may not be in direct physical contact. In addition to experimental methods, computational methods propose protein interactions based, for example, on orthology, protein domains known to interact, co-expression and functional annotations [2,3].
PPIs are collected in several databases that make the data and the evidence behind it easily accessible and allow different mechanisms to query and display the data [4,5,6,7,8,9,10]. These resources are very useful for researchers interested in checking a small number of particular proteins of interest. However, PPI data can also be used globally for systematic network analyses, prediction of protein properties, and evaluation of novel datasets of PPIs produced in a high-throughput fashion.
Computational use of PPI datasets often requires selecting PPIs at particular levels of confidence. For example, the quality of a novel PPI dataset may be evaluated by its overlap with known interactions defined with high reliability, whereas a statistical analysis might require a large number of interactions therefore benefiting from a less restricted set of PPIs. The flexible selection of PPI datasets at various confidence levels requires a continuous scoring scheme for PPIs reflecting the reliability of their experimental characterization.
With the objective of creating a resource allowing the selection of PPIs by experimental confidence cut-offs, we generated HIPPIE (Human Integrated Protein-Protein Interaction rEference), a scored human PPI collection integrated from multiple sources. Following [8], we developed an expertly curated scoring scheme that takes into account the reliability of different experimental evidence in the definition of a PPI combining three types of information: experimental techniques used, number of studies finding the PPI, and reproducibility in model organisms.
A web tool to browse the data as well as the scored PPI dataset are provided at http://cbdm.mdc-berlin.de/tools/hippie. The scored dataset includes information on the data we used to build it so that modifications of the scoring mechanism can be easily achieved. We illustrate the usefulness of HIPPIE in increasing the coverage of novel PPI datasets and demonstrate that its scoring scheme reflects the reliability of the reported interactions.
Where available, we retrieved the information on the originating study and the experimental methodology used to measure each interaction from the source databases and also assigned an experimental category to interactions from the additionally included studies. As a result, more than 99% of all interactions in HIPPIE are associated with at least one of the methods listed in Table 2 and are annotated with the studies in which they were detected.
To complement the confidence scoring of experimentally verified human PPIs with a component based on experimental evidence in non-human organisms, we included data from three databases that map interactions between non-human protein pairs to their human orthologs: HomoMINT (release date: March 5, 2009) [23], I2D (release date: January 7, 2010) [2] and the PPI dataset from [24].

Identifier mapping
Different public PPI databases and datasets use different types of gene or protein identifiers. We aimed at mapping all protein pairs listed in HIPPIE to Entrez Gene and UniProt identifiers. For this purpose we applied the database identifier mapping tables curated by UniProt [25] and the HUGO Gene Nomenclature Committee (HGNC) [26]. We mapped all database entries to their canonical representatives and did not consider splicing forms. In the web interface the data can be queried either by protein (UniProt id or accession) or by gene identifier (Entrez Gene id or gene symbol).
Interactions containing identifiers that could not be mapped to human Entrez Gene ids or UniProt ids were not included in HIPPIE.
Mapping PPIs to the genes encoding the interacting proteins is affected by certain ambiguity since the same protein sequence may be encoded by duplicated genomic loci. In the flat file version of HIPPIE these ambiguous PPIs are expanded such that a given PPI is represented by all possible combinations of gene identifiers.
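The expansion of ambiguous PPIs can be pictured as a Cartesian product over the gene identifiers of the two partners. The following is a minimal sketch under that reading; the function name `expand_ppi` and the gene identifiers are hypothetical and are not taken from the HIPPIE code base.

```python
from itertools import product

def expand_ppi(genes_a, genes_b):
    """Return every gene-level pair for one protein-level PPI.

    genes_a, genes_b: gene identifiers encoding the first and the second
    interacting protein (more than one when duplicated loci exist).
    """
    return list(product(genes_a, genes_b))

# Hypothetical identifiers: protein A is encoded by two loci, protein B by one.
rows = expand_ppi(["1001", "1002"], ["2001"])
for gene_a, gene_b in rows:
    print(gene_a, gene_b)
```

Under this assumption, one protein-level interaction yields one flat file line per gene combination.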

Score calculation
For each interaction a score $S$ between 0 and 1 was calculated reflecting the reliability of its combined experimental evidence. This score was calculated as a weighted sum of three different subscores: $s_s$ (a function of the number of studies in which an interaction was detected), $s_t$ (a function of the number and quality of experimental techniques used to measure an interaction; see below for details) and $s_o$ (a function of the number of non-human organisms in which an interaction was reproduced). Each of these three subscores $s_i$ was calculated with a non-linear saturating function $s_i(n)$ such that $s_i(0) = 0$ and $s_i(\infty) = 1$, where the $a_i$ are constants that control the steepness of the function. For subscore $s_s$, $n$ is the number of different studies where the interaction was reported (number of PubMed identifiers associated), regardless of whether multiple experimental evidence was provided in each study.
For subscore $s_o$, $n$ is the number of species where orthologs of the interacting proteins could be defined and were found experimentally to interact (currently Bos taurus, Caenorhabditis elegans, Canis familiaris, Drosophila melanogaster, Gallus gallus, Mus musculus, Rattus norvegicus, Saccharomyces cerevisiae, and Sus scrofa).
For subscore $s_t$, $n$ is a sum of scores from different experimental techniques by which an interaction was verified (even if used in the same study). Most PPI databases use controlled vocabulary descriptors for these experimental techniques as defined by the PSI-MI ontology [27]; however, for some terms we could not find an equivalent ontology term. Manual curation was used to assign a score to each PPI detection method ranging from 0 (no experiment assigned, less than 1% of PPIs) to 10. Scores and corresponding PSI-MI codes are displayed in Table 2. Methods that can ascertain interactions with the highest reliability, such as in vitro techniques like X-ray crystallography, were assigned the highest scores. Complementation-based assays and affinity based technologies were roughly equally scored with an average value of 5, slightly increased for those methods that are used generally in homologous, more physiological setups, such as FRET. Methodologies that do not directly provide evidence for interaction, such as colocalization or cosedimentation, are scored with the lowest values. The total score $S$ was calculated as a weighted sum of the three subscores:
$$S = w_s s_s + w_t s_t + w_o s_o$$
It is important to note that our dataset does not include interactions not experimentally verified with human proteins: no interaction received a score solely from its verification in non-human organisms. We also remark that this scoring scheme does not consider computational evidence other than the definition of orthology relations from human proteins to proteins in other organisms.
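The weighted sum of saturating subscores can be sketched as follows. The logistic-type function in `saturating` is an assumption: it is only one of several functions that satisfy the stated constraints (value 0 at n = 0, approaching 1 for large n, steepness controlled by a constant). The function names are hypothetical, and the parameter values are those reported in the Parameter selection section.

```python
import math

# Parameter values reported in the Parameter selection section.
A = {"s": 2.3, "o": 1.6, "t": 0.2}   # steepness constants a_s, a_o, a_t
W = {"s": 0.6, "o": 0.1, "t": 0.3}   # weights w_s, w_o, w_t (sum to 1)

def saturating(n, a):
    """ASSUMED saturating form: 0 at n = 0, tends to 1 as n grows."""
    return 2.0 / (1.0 + math.exp(-a * n)) - 1.0

def combined_score(n_studies, technique_score_sum, n_organisms):
    """Weighted sum of the three subscores s_s, s_t and s_o."""
    s_s = saturating(n_studies, A["s"])
    s_t = saturating(technique_score_sum, A["t"])
    s_o = saturating(n_organisms, A["o"])
    return W["s"] * s_s + W["t"] * s_t + W["o"] * s_o

# One study with a single technique scored 5, not reproduced in other organisms,
# versus three studies, technique scores summing to 15, and two organisms.
weak = combined_score(1, 5, 0)
strong = combined_score(3, 15, 2)
print(round(weak, 3), round(strong, 3))
```

Because the weights sum to 1 and each subscore stays below 1, the combined score remains in the interval from 0 to 1 under this assumed form.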

Parameter selection
The six free parameters of the scoring formula ($a_s$, $a_o$, $a_t$, $w_s$, $w_o$ and $w_t$) were optimized by performing a grid search in the parameter space. We performed the search in the range [0, 3] for the $a_i$ and in the range [0, 1] for the $w_i$. We chose a step width of 0.1 for both the $a_i$ and the $w_i$. The step width was chosen sufficiently small such that selecting neighboring parameter combinations resulted only in small changes in the interaction scores, which decreased the probability of missing an optimal solution. Constraints were set on the weights $w_i$ by requiring that they sum up to 1. Parameter combinations leading to only few discrete scores were excluded (this happened, for example, when $w_t$ was set to 0, since the different experimental weights account for a large fraction of the score's granularity).
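The grid described above can be enumerated as in the following sketch. The function names are hypothetical. The exclusion of combinations that lead to only few discrete scores is not applied here, so the raw grid is larger than the set of combinations actually evaluated.

```python
from itertools import product

STEPS = 10  # step width 0.1, counted in integer tenths to avoid float drift

def weight_triples():
    """Yield (w_s, w_o, w_t) on a 0.1 grid with w_s + w_o + w_t = 1."""
    for ws in range(STEPS + 1):
        for wo in range(STEPS + 1 - ws):
            wt = STEPS - ws - wo
            yield ws / 10, wo / 10, wt / 10

def steepness_triples():
    """Yield (a_s, a_o, a_t) on a 0.1 grid over [0, 3]."""
    grid = [k / 10 for k in range(31)]
    return product(grid, repeat=3)

weights = list(weight_triples())
print(len(weights), "weight triples on the constrained grid")
```

Counting in integer tenths guarantees that every generated weight triple satisfies the sum constraint before conversion to floating point.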
PPIs are sometimes reported in multiple studies. We reasoned that we could use this property to assess the performance of a parameter combination. To do this evaluation we used the IntAct dataset, which currently consists of 28 073 interactions (38.5% of HIPPIE). This dataset has explicit associations between studies and experiments, and the experimental information is annotated following the PSI-MI format.
The assessment of performance of a parameter set was done by successively removing each one of the 109 studies in IntAct that contain at least 10 interactions and more than two PPIs found in multiple studies. For each study $j$, we recalculated the scores of the remaining dataset, $\mathrm{IntAct}_{red}$, found the set of PPIs described both in the study $j$ and in $\mathrm{IntAct}_{red}$, $\{\mathrm{IntAct}_{red} \cap \mathrm{study}_j\}$, and computed the deviation from random expectation of the number of highly scored interactions among the overlap:
$$dev_j = \frac{\left|\mathrm{scores}(\mathrm{IntAct}_{red} \cap \mathrm{study}_j) > Q_3\right|}{\left|\mathrm{IntAct}_{red} \cap \mathrm{study}_j\right| \cdot 0.25}$$
where $Q_3$ is the upper quartile of the score distribution of $\mathrm{IntAct}_{red}$.
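The deviation from random expectation for one held-out study can be sketched as the observed number of overlap PPIs scoring above the upper quartile, divided by the number expected by chance (one quarter of the overlap). The function name and the data layout are hypothetical, and the quartile is computed with the default method of Python's `statistics.quantiles`, which may differ from the quartile definition used by the authors.

```python
import statistics

def deviation(scores_red, overlap_ids):
    """Ratio of observed to expected high-scoring PPIs in the overlap.

    scores_red:  dict mapping each PPI of the reduced dataset to its score.
    overlap_ids: PPIs found both in the held-out study and in the reduced dataset.
    """
    if not overlap_ids:
        raise ValueError("the held-out study shares no PPI with the reduced dataset")
    q3 = statistics.quantiles(scores_red.values(), n=4)[2]  # upper quartile
    observed = sum(1 for ppi in overlap_ids if scores_red[ppi] > q3)
    expected = 0.25 * len(overlap_ids)
    return observed / expected

# Toy data: 100 PPIs with scores 0.01 ... 1.00; three of four overlap PPIs score high.
toy_scores = {i: i / 100 for i in range(1, 101)}
dev = deviation(toy_scores, [80, 90, 95, 10])
print(dev)
```

A value above 1 indicates that PPIs confirmed by the held-out study are enriched among the highly scored interactions.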
To measure the overall performance of a parameter combination we chose a function $f$ of the weighted mean of the logarithm of $dev_j$ over all studies, where the weights $v_j$ were chosen proportional to the overlap size between $\mathrm{IntAct}_{red}$ and study $j$ and $n$ is the number of studies. The best parameter combination maximizes $f$. We found several parameter combinations (several thousand optimal combinations out of more than 700 000 different parameter combinations tested) maximizing the function $f$ ($\max(f) = 1.023$). From the equally well performing parameter combinations we chose the set of parameters that resulted in the largest spread of the distribution of scored interactions. For that purpose the scores of the entire HIPPIE were calculated for each of the optimal parameter combinations, and for each score distribution the interquartile range (iqr) was determined. We found that the parameter set [$a_s = 2.3$, $a_o = 1.6$, $a_t = 0.2$, $w_s = 0.6$, $w_o = 0.1$, $w_t = 0.3$] maximized both $f$ and iqr.
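The tie-breaking step among equally well performing parameter combinations can be sketched as choosing the combination whose score distribution has the largest interquartile range. The function names, the labels and the toy score lists are hypothetical, and the quartiles use the default method of `statistics.quantiles`.

```python
import statistics

def iqr(scores):
    """Interquartile range of a list of scores."""
    q1, _median, q3 = statistics.quantiles(scores, n=4)
    return q3 - q1

def widest_spread(candidates):
    """Return the label of the candidate with the largest interquartile range.

    candidates: dict mapping a parameter-combination label to the list of
    scores that this combination assigns to the whole dataset.
    """
    return max(candidates, key=lambda label: iqr(candidates[label]))

toy = {
    "narrow": [0.50, 0.50, 0.51, 0.52, 0.50],
    "wide": [0.10, 0.30, 0.50, 0.70, 0.90],
}
best = widest_spread(toy)
print(best)
```

A wider spread gives users more room to separate PPIs when they apply a confidence cut-off.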

Results
HIPPIE is a dataset of experimentally measured human PPI derived from several publicly available PPI datasets (Table 1). For reference, we distribute a stable release of HIPPIE consisting of 72 916 interactions, which was used in this manuscript for several descriptive analyses (Table S1; HIPPIE version 1.2). The live version of HIPPIE is monthly updated making use of the web query interface PSICQUIC [28], which allows us to automatically retrieve the newest interaction data from most of the manually curated source databases (BioGrid, IntAct, MINT, DIP and BIND) and integrate the new interactions and updated evidence records into HIPPIE.
The network is accessible via a web tool (http://cbdm.mdc-berlin.de/tools/hippie) that allows for querying the interactions by a gene symbol, Entrez gene id or UniProt identifier (id and accession). On the result page a confidence score is listed with each interaction partner of the query protein and detailed information on the evidence contributing to the confidence score can be accessed. Links to the original studies are provided.
A typical problem after generation of experimental results producing a list of genes, proteins and/or interactions between them is the evaluation of the results in relation to the known PPI data. For example, a researcher may have obtained proteomics data for a few proteins of interest and wants to evaluate the novelty of the interactions, or the possible relation of the interactors with a disease protein of interest. To facilitate this analysis, HIPPIE can be queried with a set of proteins and/or interactions between them from which a network of known data around the proteins of interest is constructed. The online tool will identify interactions between the proteins submitted (layer 0 network), or their interactors not contained in the query set (layer 1 network). The computation of networks with more layers might be lengthy if hundreds of protein partners have to be analysed. For this we provide a Java command line tool (available from http://cbdm.mdc-berlin.de/tools/hippie and also deposited at the SourceForge open software archive: https://sourceforge.net/projects/hippiecbdm) that will do the computation on the local machine of the user for large input sets or neighbours of neighbours. A confidence threshold to control the reliability and size of the constructed network can also be applied. Additionally, we provide a filter option for the PSI-MI interaction type annotation provided by most of HIPPIE's source databases. This feature allows for selecting direct physical interactions from HIPPIE. The HIPPIE subnetworks generated in this way can be exported from HIPPIE for further analyses or can be visualized using the tool Cytoscape Web [29], which has been integrated into HIPPIE.
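The layered network construction with a confidence threshold can be sketched as below. This is one interpretation and not the implementation of the HIPPIE web tool or of the Java command line tool: layer 0 keeps interactions among the query proteins, layer 1 keeps every interaction that touches a query protein, and each further layer first grows the seed set by one round of neighbours. The function name and the toy edges are hypothetical.

```python
def subnetwork(edges, query, layer=0, min_score=0.0):
    """Return the edges of the network of the given layer around `query`.

    edges: iterable of (protein_a, protein_b, score) tuples.
    """
    kept = [e for e in edges if e[2] >= min_score]
    nodes = set(query)
    # For layer >= 2, first grow the seed set by (layer - 1) rounds of neighbours.
    for _ in range(max(layer - 1, 0)):
        grown = set(nodes)
        for a, b, _score in kept:
            if a in nodes or b in nodes:
                grown.update((a, b))
        nodes = grown
    if layer == 0:
        # Layer 0: only interactions among the seed proteins themselves.
        return [e for e in kept if e[0] in nodes and e[1] in nodes]
    # Layer >= 1: every interaction touching the (possibly grown) seed set.
    return [e for e in kept if e[0] in nodes or e[1] in nodes]

toy_edges = [("A", "B", 0.9), ("A", "C", 0.8), ("C", "D", 0.95), ("B", "E", 0.4)]
layer0 = subnetwork(toy_edges, {"A", "B"}, layer=0, min_score=0.5)
layer1 = subnetwork(toy_edges, {"A", "B"}, layer=1, min_score=0.5)
layer2 = subnetwork(toy_edges, {"A", "B"}, layer=2, min_score=0.5)
print(len(layer0), len(layer1), len(layer2))
```

Applying the score filter before expanding the layers keeps low-confidence edges from pulling additional proteins into the network.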
The web site also offers the entire HIPPIE dataset for download in two different formats: in PSI-MI TAB 2.5 format as defined by the Proteomics Standards Initiative [27] and in our own tab delimited flat file format. Currently we distribute a frozen version (version 1.2), used in this manuscript for analyses, and the monthly updated version.
While merging the different data sources we kept track of the information about which experimental system type was used to detect each single interaction and whether there were several studies where the interaction was found. Additionally we retrieved the interaction data from PPI databases that link interactions in non-human model organisms to their human orthologs. From these different types of information (experimental systems, number of studies and reproducibility in other organisms) we calculated an overall score reflecting the reliability of each interaction (See Methods for details and Table 2).
We note that the different experimental methodologies behind the PPIs in HIPPIE are able to detect direct physical interactions between proteins to a varying degree. Even though some of them are in fact measuring co-membership in larger protein-complexes we will refer to all types of associations detected by these methods as interactions or PPIs. The HIPPIE score tries to reflect both the reliability of the various methods as well as the ability to detect direct rather than indirect interactions.
The number of PPIs derived from different experimental system types was very variable. HIPPIE integrates various datasets dealing with different experimental systems and thus contains a larger amount of interactions than each of those sets separately (Table 1). Values for three well populated and meaningful sources of PPIs, Y2H, anti-bait coimmunoprecipitation (Coprep), and tandem affinity purification (TAP) are shown in Figure 1 that cover 78% of the total amount of proteins in the current version of HIPPIE, but only around 50% of its interactions. Coprep and TAP share relatively many PPIs between each other (139 PPIs) compared to the other pairwise overlaps between methods. For example, TAP shares 95 interactions with Y2H despite the much higher amount of Y2H interactions as compared to Coprep. This higher overlap between Coprep and TAP in comparison with the Y2H data might reflect the similarity between the first two approaches in comparison with the latter, as Coprep and TAP are both based on antibody capture of a protein complex while Y2H is based on the reconstitution of a binary interaction inside of a heterologous system (yeast).
To illustrate the benefit of using a large dataset such as HIPPIE, we compared it with novel high-throughput PPI datasets not used for its production. We chose two high-throughput PPI datasets from the recent literature: a Y2H dataset, Y2He [30], containing 551 PPIs between 434 proteins and an MS dataset, MSe [31], containing 711 PPIs between 424 proteins. HIPPIE covered 120 (21.8%) of the Y2He PPIs and 73 (10.3%) of the MSe PPIs.
We evaluated the usefulness of the HIPPIE score using the two novel datasets. The HIPPIE database was divided into a high quality subset containing the top 25% highest scoring interactions (score ≥ 0.73) and a lower quality subset (score < 0.73; see Figure 2). Then, we compared the fraction of PPIs in each HIPPIE subset that was recalled by the novel dataset. If the scores are meaningful one would expect better recall of the set with high-confidence scores.
Figure 1. Coverage of HIPPIE and overlap by three technique specific datasets. Left: proteins. Right: PPIs. Y2H is yeast-two-hybrid, Coprep is anti-bait coimmunoprecipitation and MS is affinity capture mass spectrometry. The protein numbers show that Y2H can focus on many proteins that have not been targeted by the other two techniques. Together the three techniques already cover 80% of all proteins currently considered in HIPPIE (i.e. 80% of all proteins in HIPPIE participate in at least one Y2H, Coprep or MS experiment). However, the overlap in PPIs between these datasets and to the remainder of HIPPIE is much smaller indicating that PPI detection is technique specific. Nevertheless, one can appreciate that similar techniques have a bias towards detecting similar PPIs, here illustrated by the significant overlap between Coprep and MS and by the little overlap of Y2H to the other two techniques. doi:10.1371/journal.pone.0031826.g001
To measure the recall of HIPPIE by an external dataset of PPIs one has to consider that some HIPPIE PPIs may not be detectable by the experimental setup used to produce the external dataset. In the case of Y2H and MS datasets a number of proteins are used as baits. Therefore, we considered for each of these studies that the "detectable PPIs" from HIPPIE were those where at least one of the interacting proteins was a bait in the study considered (Table 3). The values of detectable PPIs and recall were used to calculate one-sided Fisher's exact tests to assess the significance of the differences in recall between high and low confidence HIPPIE subsets. The high quality subset had the largest overlaps in percentage with the PPIs of the novel datasets and these overlaps were significant (Table 3; p-values of 6.40e-15 and 1.75e-6 for Y2He and MSe, respectively), suggesting that the PPI score correlates with experimental reproducibility.
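A one-sided Fisher's exact test on recall counts can be sketched with the hypergeometric tail probability, using only the standard library. The function name and the counts are hypothetical and do not reproduce the values of Table 3.

```python
from math import comb

def fisher_one_sided(hit_high, n_high, hit_low, n_low):
    """P-value for observing at least `hit_high` recalled PPIs in the high subset.

    hit_high, hit_low: detectable PPIs recalled by the external dataset.
    n_high, n_low:     detectable PPIs in the high and low confidence subsets.
    """
    total = n_high + n_low
    hits = hit_high + hit_low
    upper = min(n_high, hits)
    # Hypergeometric tail: all tables at least as extreme as the observed one.
    tail = sum(comb(n_high, k) * comb(n_low, hits - k)
               for k in range(hit_high, upper + 1))
    return tail / comb(total, hits)

# Hypothetical counts: 3 of 4 high confidence PPIs recalled versus 1 of 4 low ones.
p_value = fisher_one_sided(3, 4, 1, 4)
print(round(p_value, 4))
```

The test asks whether recalled PPIs are over-represented in the high confidence subset relative to a random split of the detectable PPIs.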

Discussion
In this work we presented HIPPIE, an integrated dataset of human protein interaction data scored according to experimental evidence. This resource has been created for researchers who need to use the complete knowledge on human protein interactions at a global scale. This is required in systems biology studies and in the evaluation of high-throughput results (e.g. novel PPI datasets), where results must be contrasted with interactions selected for a particular level of reliability.
HIPPIE currently integrates 72 916 interactions from several public PPI resources scored according to confidence. For comparison, the complete human interactome map has been estimated to contain between 200 000 and 400 000 interactions (according to [32] and [33], respectively) suggesting that our knowledge of the human interactome is still incomplete.  Nevertheless, producing a large collection of integrated PPI data is critical for its usability because novel high-throughput PPI datasets often contain just hundreds of PPIs and might have little overlap with smaller existing PPI resources integrated in HIPPIE. Several resources have been created that, like HIPPIE, integrate PPI data from multiple sources but do not have a focus on distributing a simple scored dataset, while offering excellent tools to examine evidence behind each PPI (e.g. iRefWeb [34]) or do not focus on experimentally verified interactions (e.g. STRING [35]). Some other databases offer a continuous confidence scoring scheme, e.g. MINT [8] and HAPPI [36], but they do not allow batch scoring of PPI sets or the exclusive retrieval of high confidence interactions and lack the integration of several important high-throughput experimental datasets. The scoring system of MINT is closer to the one we use as it considers levels of technical evidence, number of studies and orthology [8]; however, as the PPI data from MINT is manually curated, the amount of human PPIs in MINT is currently less than a third of those in HIPPIE, limiting its use in the evaluation of novel datasets. Finally, in contrast with MINT and HIPPIE, HAPPI contains only a small fraction of PPIs experimentally derived in human while the majority are either computationally predicted or inferred from other species.
We are aware that any assignment of reliability scores to experimental techniques necessarily reflects the individual belief of researchers. We tried however to base our selection of parameters and weights in the scoring formula on objective criteria by optimizing the performance of our scoring scheme in assigning high values to reproducible interactions. For researchers who nevertheless wish to modify either the selected parameters or the scores assigned to the different techniques we offer a tool at our homepage that allows the scoring of HIPPIE using an altered set of these values.
HIPPIE has been used for the evaluation of existing novel PPI datasets showing that it increases their coverage over individual resources and that its scoring scheme correlates with the ability to find a PPI in experimental data not included in the database (Table 3). A web tool to query the data, the scored PPI dataset as well as the raw data are available at http://cbdm.mdc-berlin.de/tools/hippie. The tool allows batch annotation of datasets of PPIs. Future work on HIPPIE will be directed towards the inclusion of novel datasets and versions for major model organisms.

Supporting Information
Table S1 Scored dataset of PPIs. The columns indicate (1) UniProt identifier and (2) Entrez Gene identifier of the first protein partner, (3) UniProt identifier and (4) Entrez Gene identifier of the second protein partner, (5) score and (6) a comment field summarizing the origin of the evidence. Evidence is arranged in three types: experiments, pmids, and sources. Experiment types are indicated in Table 2. Pmids are the PMID of manuscripts reporting the interaction. Sources are the datasets where the interaction was found and are indicated in Table 1. Multiple evidences for each type are separated by semicolon and multiple evidence codes for each type are separated by comma. If one protein maps to several genes, each combination of genes is listed in a separate line. This table is available from: http://cbdm.mdc-berlin.de/tools/hippie/hippie_v1_2.txt.
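Reading the scored dataset and keeping only interactions above a confidence cut-off could look like the following sketch. It assumes a tab delimited file with the six columns in the order listed above. The function name, the dictionary keys and the sample identifiers are hypothetical, and the comment field is kept as raw text because its exact layout is not assumed here.

```python
import csv
import io

def read_scored_ppis(handle, min_score=0.0):
    """Yield one dict per interaction with score >= min_score."""
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 6:
            continue  # skip incomplete lines
        uniprot_a, gene_a, uniprot_b, gene_b, score, comment = row[:6]
        try:
            score = float(score)
        except ValueError:
            continue  # skip a header line, if present
        if score >= min_score:
            yield {
                "uniprot_a": uniprot_a, "gene_a": gene_a,
                "uniprot_b": uniprot_b, "gene_b": gene_b,
                "score": score, "comment": comment,
            }

# Hypothetical two-line sample standing in for the downloaded file.
sample = io.StringIO(
    "PROT1_HUMAN\t1001\tPROT2_HUMAN\t2001\t0.89\tevidence summary\n"
    "PROT1_HUMAN\t1001\tPROT3_HUMAN\t3001\t0.42\tevidence summary\n"
)
high_confidence = list(read_scored_ppis(sample, min_score=0.73))
print(len(high_confidence))
```

With the threshold set to 0.73, the filter corresponds to the high quality subset used in the Results section.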