Still Stratus Not Altocumulus: Further Evidence against the Date/Party Hub Distinction

Analysis of multi-validated protein interaction data reveals networks with greater interconnectivity than the more segregated structures seen in previously available data. To help visualize this, the authors draw comparisons between continuous stratus clouds and altocumulus clouds.

or (5) higher genetic connectivity of date hubs. In sum, all of our prior conclusions remain robust and there is no evidence for distinctive classes of network hubs.
It was suggested [1] that some hub proteins operate at the same intracellular place and time as their multiple interactants (as if at a party), while others interact with their numerous partners one by one (as if on a date). Is this distinction informative? Originally, four features were used to draw a partition between date and party hubs: expression bimodality, localization entropy, network fragmentation, and genetic connectivity [1]. A subsequent analysis suggested a fifth distinction, namely different rates of evolution after control for covariables [3]. Given the small size of the original dataset and the absence of statistical support for some of the assertions, we asked [2] whether these claims were robust. In both the original dataset and in new high-confidence interaction datasets [2], we could not find support for any of the five points of evidence. Bertin et al. now nominate a new dataset, which they argue supports three of the five points of evidence.
Bertin et al. first note a curation issue with one of our many datasets, called HC, which inadvertently contains interactions that, owing to an ambiguity in the literature, were supported by a single analysis. We agree that treating the data from [4] and [5] as independent validations was an error, as the data in [5] fully encompass those of [4] (A-C Gavin, personal communication). However, approximately half of the interactions reported in [4] remain multi-validated by other means. An updated high-confidence dataset that removes this duplication and incorporates more recent interaction data is available here (Dataset S1) and as a download from the BioGRID database (see http://www.thebiogrid.org). Importantly, our dataset HC_m is unaffected by the above concern, as it required validation of each interaction by multiple different methods. The new build of Bertin et al. (called "filtered-HC") mimics HC_m by excluding interactions not multi-validated by different methods. As the results of HC_m confirmed those of our other datasets [2], we were surprised by the claim that the date/party distinction is still supported in filtered-HC. Because this dataset provides the most robustly defensible set of interactions, here we re-analyse the filtered-HC network to ask whether it substantiates the date/party distinction.

It was originally proposed that the distribution of hub co-expression with partners (average PCC) is bimodal [1]. This proposal was based exclusively on visual inspection of the data. By contrast, we applied a formal test that examines deviation from a null of unimodality [6,7] and found no evidence for bimodality. Up to now we have analysed 25 expression datasets across seven protein interaction builds (including filtered-HC; Table 1), together with datasets nominated by Han et al., for a total of 181 separate tests. To a first approximation, we should expect around nine significant results at the 5% level by chance alone, owing to type I error (although this assumes independence between datasets).
We find just two. Given this lack of evidence for bimodality, Bertin et al. appear to concur that bimodality cannot be used to define party and date hubs. Surprisingly, though, they now assert that bimodality never was a key point of evidence. The original definition of date and party hubs, however, stated that bimodality represented a "natural boundary" between the two classes [1]; indeed, it was argued that the lack of obvious bimodality in some expression datasets was due to small sample sizes [1]. At the same time, Bertin et al. also venture to suggest that the standard statistical test for deviation from unimodality [6,7] has a high false negative rate. It does not (see Text S1).
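The back-of-envelope expectation for chance rejections is the number of tests multiplied by the significance level. Treating the 181 tests as independent Bernoulli trials (the simplifying assumption flagged above), the number of chance rejections X is binomial:

```latex
X \sim \mathrm{Binomial}(n = 181,\ \alpha = 0.05), \qquad
\mathbb{E}[X] = n\alpha = 181 \times 0.05 = 9.05 \approx 9, \qquad
\mathrm{SD}[X] = \sqrt{n\alpha(1-\alpha)} = \sqrt{8.5975} \approx 2.93 .
```

Positive correlation between expression datasets would widen the spread of X, so these figures are only a rough guide.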

Neighbours of Date Hubs Do Not Have More Diverse Localizations
Originally, Han et al. reported that the partners of date hubs have more diverse intracellular localizations, as measured by information entropy [1]. However, this analysis did not normalise for connection density and arbitrarily omitted data from some cellular compartments [1]. In the filtered-HC dataset (Table 2), as before [2], once the entropy is normalised and all the data are included, the difference is in the opposite direction to that predicted by the date/party hypothesis. We showed [2] that this inversion is owing to differences in abundance that follow from defining party hubs as those with highly co-expressed partners (PCC > 0.5). As Bertin et al. make no statement on this issue, we assume that they do not dispute this result.
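To illustrate why connection density matters, here is a minimal Python sketch of a partner-localization entropy, assuming each partner carries a single compartment label. The normalization shown (dividing by the largest entropy attainable for k partners over C compartments, log2(min(k, C))) is an illustrative assumption, not necessarily the procedure used in [1] or [2]; the compartment names and C = 8 are hypothetical.

```python
import math
from collections import Counter

def localization_entropy(partner_locations):
    """Shannon entropy (bits) of the compartment labels of a hub's partners."""
    counts = Counter(partner_locations)
    n = len(partner_locations)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def normalized_entropy(partner_locations, n_compartments):
    """Entropy divided by its maximum for k partners spread over C compartments.

    ASSUMPTION: normalizing by log2(min(k, C)) is illustrative only.
    """
    k = len(partner_locations)
    max_h = math.log2(min(k, n_compartments)) if k > 0 else 0.0
    return localization_entropy(partner_locations) / max_h if max_h > 0 else 0.0

# Hypothetical hubs, both spreading their partners evenly over the compartments available
small_hub = ["nucleus", "cytoplasm", "mitochondrion", "ER"]          # k = 4
large_hub = [f"compartment_{i}" for i in range(8) for _ in range(2)]  # k = 16
C = 8  # hypothetical number of annotated compartments

print(localization_entropy(small_hub), localization_entropy(large_hub))    # 2.0 3.0
print(normalized_entropy(small_hub, C), normalized_entropy(large_hub, C))  # 1.0 1.0
```

Raw entropy rises simply because the larger hub has more partners, although both hubs are equally dispersed; after normalization they score the same.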

Definitions and Inferences
The evidence for the biological relevance of date and party hubs falls into two classes: the definitional, namely bimodality/co-expression and subcellular colocalization, and the inferential, namely corollary behaviours that may derive from the underlying biology. As the definitional aspects do not bear scrutiny, one must suspect that any correlates are merely consequences of the method used to define the two hub classes. The only criterion left standing is the arbitrary distinction between hubs with a PCC > 0.5 and those without. Highly co-expressed proteins do have a number of odd properties, namely higher connectivity and abundance. These biases are robust in the filtered-HC dataset: party hubs have higher connectivity (p = 0.00006) and protein abundance (p = 0.001). It is then important to ask whether further properties stem from such biases.

Figure 1. ANCOVA of the natural log of the rate of evolution, measured as either (A) dN or (B) dN/dS, as predicted by the date/party distinction, with protein abundance as the covariate. The solid black line is for date hubs, the dotted line for party hubs. (A) ANCOVA of ln(Ka) versus party/date with log10(abundance) as a covariate: effect of covariate, t = 6.9, p ≈ 8 × 10⁻¹¹; effect of date/party, t = 1.27, p = 0.21. (B) ANCOVA of ln(Ka/Ks) versus party/date with log10(abundance) as a covariate: effect of covariate, t = 6.99, p ≈ 5 × 10⁻¹¹; effect of date/party, t = 1.24, p = 0.21. Note that taking the log of the variables on the y-axis forces the loss of two data points (one party, one date) with dN = 0. However, results are unaffected by using, for example, ln(0.1 + dN) and ln(0.1 + dN/dS), which permits their inclusion. Similarly, the residuals from the fit of x versus y do not differ between date and party hubs: for ln(0.1 + dN), residuals of date hubs are, if anything, lower than those of party hubs (−0.026 versus 0.46), but not significantly so (p = 0.16, t-test); for ln(0.1 + dN/dS), the mean residual is −0.015 for date and 0.027 for party hubs (p = 0.18). We have repeated the analysis using a different outgroup (S. bayanus) and still find no effect in the covariate-controlled analysis (unpublished data).

Given Their Abundance, Date and Party Hubs Do Not Evolve at Different Rates
Bertin et al. find that party hubs evolve more slowly. As originally noted [3], the question is whether party hubs evolve more slowly than date hubs when controlling for important covariates, most notably protein abundance [8]. We showed previously that any weak tendency for party hubs to evolve more slowly was accounted for by their abundance [2]. Unlike our prior analysis, Bertin et al. do not ask whether party and date hubs evolve at different rates when controlling for abundance but instead ask whether PCC is related to evolutionary rate when controlling for abundance. However, they inappropriately apply a parametric test (Pearson product-moment correlation) that assumes all variables are normally distributed. Although the method is robust to some deviation from normality, the abundance data are extremely non-normal (Shapiro-Wilk test for a null of normality: W = 0.2, p << 0.0001; W = 1 implies normality, W << 1 implies deviation from normality). This leaves two avenues: either transform the data to make them approximately normal, or perform the equivalent nonparametric test.
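The effect of a log transform on a right-skewed variable can be checked with a Shapiro-Wilk test. The sketch below applies SciPy's `scipy.stats.shapiro` to simulated log-normal values standing in for protein abundance; the data are hypothetical, so the W values will not match the W = 0.2 reported for the real abundance data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Simulated, right-skewed stand-in for protein abundance (hypothetical values)
abundance = rng.lognormal(mean=8.0, sigma=1.5, size=500)

w_raw, p_raw = stats.shapiro(abundance)            # test on the raw values
w_log, p_log = stats.shapiro(np.log10(abundance))  # test after a log10 transform

print(f"raw:   W = {w_raw:.3f}, p = {p_raw:.2g}")
print(f"log10: W = {w_log:.3f}, p = {p_log:.2g}")
```

For data of this kind the raw W falls far below 1 while the log-transformed W is close to 1, which is why either a transform or a rank-based test is needed before correlating.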
Partial Spearman's correlation is the nonparametric equivalent. Using evolutionary rate data from sensu stricto yeasts [9] and controlling for abundance [10], the more highly co-expressed genes have, if anything, a slightly higher rate of evolution (partial Spearman's rho controlling for abundance = +0.13, p = 0.02, with p determined by simulation, implemented in R [11]). If we log-transform the abundance data, the parametric partial correlation agrees that the sign of the partial correlation changes (partial correlation = +0.029, p = 0.36). The log-transformed abundance data have a Shapiro-Wilk W of 0.95, as opposed to 0.2 for the untransformed data.
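A first-order partial correlation can be computed from the three pairwise correlations, r_xy·z = (r_xy − r_xz·r_yz) / sqrt((1 − r_xz²)(1 − r_yz²)); applying the same formula to ranks gives a partial Spearman correlation. The Python sketch below (NumPy and SciPy, simulated data, hypothetical variable names) shows that a Pearson partial correlation against a raw, highly skewed covariate can leave a strong spurious association, whereas the rank-based version, or Pearson against the log-transformed covariate, removes it. It omits the simulation-based p-value mentioned above.

```python
import numpy as np
from scipy.stats import rankdata

def partial_corr(x, y, z):
    """First-order Pearson partial correlation of x and y, controlling for z."""
    r = np.corrcoef(np.vstack([x, y, z]))
    rxy, rxz, ryz = r[0, 1], r[0, 2], r[1, 2]
    return (rxy - rxz * ryz) / np.sqrt((1 - rxz**2) * (1 - ryz**2))

def partial_spearman(x, y, z):
    """Partial Spearman correlation: the same formula applied to ranks."""
    return partial_corr(rankdata(x), rankdata(y), rankdata(z))

# Simulated example: x and y are both driven only by a skewed covariate z
rng = np.random.default_rng(2)
log_z = rng.normal(0.0, 1.5, 500)
z = np.exp(log_z)                        # skewed, like raw abundance
x = log_z + rng.normal(0.0, 1.0, 500)    # hypothetical co-expression score
y = -log_z + rng.normal(0.0, 1.0, 500)   # hypothetical evolutionary rate

print("Pearson partial, raw z: ", round(partial_corr(x, y, z), 2))
print("Pearson partial, log z: ", round(partial_corr(x, y, log_z), 2))
print("Spearman partial, raw z:", round(partial_spearman(x, y, z), 2))
```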
Our previous tests differed from that performed by Bertin et al.: we employed analysis of covariance (ANCOVA) to ask whether date and party hubs evolve at different rates when controlling for covariates (this being the prior claim [3]). We find, in accord with our previous results [2], that differences in abundance explain all of the difference in rates of evolution between the two classes (Fig 1). In the ANCOVA, as above, if anything date hubs evolve slightly more slowly than party hubs (Fig 1). Analysis of residuals supports these results (Fig 1). Although we can recover the result of Bertin et al. when Pearson's partial correlation is inappropriately applied to untransformed data, all appropriate tests reject the contention of evolutionary rate differences.
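The logic of a covariate-controlled comparison can be sketched by fitting the ANCOVA model ln(rate) = b0 + b1·[date hub] + b2·log10(abundance) by ordinary least squares. The NumPy sketch below uses simulated data (all names hypothetical) in which rate depends on abundance alone and date hubs are made less abundant, so a raw class difference appears but vanishes once abundance enters the model. p-values would follow from a t distribution with n − 3 degrees of freedom (for example via `scipy.stats.t.sf`).

```python
import numpy as np

def ancova_class_effect(y, is_date, covariate):
    """OLS fit of y ~ 1 + is_date + covariate; returns coefficients and t statistics."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(y),
                         np.asarray(is_date, dtype=float),
                         np.asarray(covariate, dtype=float)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof                 # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)        # covariance of the estimates
    t_stats = beta / np.sqrt(np.diag(cov))
    return beta, t_stats

# Simulated data (hypothetical): rate depends on abundance only
rng = np.random.default_rng(0)
n = 400
log_abundance = rng.normal(3.0, 0.8, n)
# Less abundant hubs are more likely to be labelled "date" (1) than "party" (0)
is_date = (log_abundance + rng.normal(0.0, 0.5, n) < 3.0).astype(float)
ln_rate = -1.0 - 0.5 * log_abundance + rng.normal(0.0, 0.01, n)

raw_gap = ln_rate[is_date == 1].mean() - ln_rate[is_date == 0].mean()
beta, t_stats = ancova_class_effect(ln_rate, is_date, log_abundance)
print(f"raw date-minus-party gap in ln(rate): {raw_gap:.2f}")
print(f"class effect after adjustment: {beta[1]:.4f} (t = {t_stats[1]:.2f})")
```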
Bertin et al. also suggest that a recent study of hub proteins that bind partners at multiple different sites, as opposed to re-using the same site, supports a difference in evolutionary rate between party and date proteins [12]. However, this report failed to properly control for abundance [12]; when this control is performed, no differences are found (p > 0.45) (Text S2). These results accord with our prior finding that, controlling for abundance, more highly connected hubs do not evolve more slowly, in no small part owing to re-use of binding sites [13]. In summary, evolutionary rate differences do not support the date/party distinction.

No Evidence for Large Differences in Effects of Hub Deletion when Allowing for Connectivity
It is argued that date hubs establish network integrity because of their positioning as intermodule linkers, as opposed to the intramodule positioning of party hubs [1]. But might any difference between deletion of date versus party hubs merely reflect a difference in connectivity of the two hub classes? Two metrics were used to measure the effect of hub deletion on the network: characteristic path length (CPL) and main component size (MCS) [1]. We previously considered [2] CPL to be of limited worth, because differences in path length may not have biological consequences (for example, since diffusion is fast, transmission delays due to an increased number of intermediate steps may be inconsequential). Moreover, CPL is susceptible to network incompleteness, which is acute for small stringent datasets such as filtered-HC. However, to enable comparison with Bertin et al., we analyze both MCS and CPL.
In addition to connectivity, it is desirable to correct for dispensability, because it is not biologically sensible to analyze networks that are fatally crippled by the loss of essential genes. Fortunately, nonessential date and party hubs have equal connectivity (p = 0.94), so restricting the analysis to nonessential hubs controls for both parameters simultaneously. Deletion of nonessential date and party hubs has an identical effect on network integrity (for MCS, see Figure 2; for CPL, see Figure S1). Bertin et al. observe the same result for MCS even without controlling for dispensability. As an alternative means of correcting for connectivity, we randomly swapped date and party hubs of the same connectivity. If the differential deletion effect were due solely to inter- versus intramodule positioning, then interchanging date with party hubs should obviate the difference. Instead, hub swapping yields the same deletion profile as the original unswapped case (Figure 3). Finally, we asked whether, even in the absence of controls for connectivity or dispensability, the difference between party and date hubs is sensitive to removal of just a few extreme hubs. Removing just the top two percent of hubs obviates the difference between date and party hub deletion on MCS (Figure S2).
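The deletion and swap procedures can be sketched in plain Python. The code below is a minimal illustration, not the pipeline behind Figures 2 and 3: it deletes hubs in order of decreasing degree, records MCS after each deletion, and exchanges class labels between date and party hubs of equal degree. The function names, the toy graph, and the `fraction` parameter (0.5 by default, mirroring a 50% swap) are assumptions.

```python
import random
from collections import deque

def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def main_component_size(adj, removed=()):
    """Size of the largest connected component once `removed` nodes are deleted."""
    seen, best = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best

def deletion_profile(adj, hubs):
    """Delete hubs one by one in decreasing-degree order; record MCS after each."""
    order = sorted(hubs, key=lambda h: len(adj[h]), reverse=True)
    removed, profile = set(), []
    for hub in order:
        removed.add(hub)
        profile.append(main_component_size(adj, removed))
    return profile

def swap_matched_hubs(date_hubs, party_hubs, adj, fraction=0.5, seed=0):
    """Hypothetical helper: exchange labels between date and party hubs of equal degree."""
    rng = random.Random(seed)
    date, party = list(date_hubs), list(party_hubs)
    party_by_degree = {}
    for j, hub in enumerate(party):
        party_by_degree.setdefault(len(adj[hub]), []).append(j)
    for i, hub in enumerate(date):
        candidates = party_by_degree.get(len(adj[hub]))
        if candidates and rng.random() < fraction:
            j = candidates.pop(rng.randrange(len(candidates)))
            date[i], party[j] = party[j], date[i]
    return date, party

# Toy network: two triangles joined through the c-d edge
adj = build_adjacency([("a", "b"), ("b", "c"), ("c", "a"),
                       ("c", "d"), ("d", "e"), ("e", "f"), ("f", "d")])
print(main_component_size(adj))                            # 6
print(deletion_profile(adj, ["c", "d"]))                   # [3, 2]
print(swap_matched_hubs(["c"], ["d"], adj, fraction=1.0))  # (['d'], ['c'])
```

Because a swap only pairs hubs of identical degree, any deletion profile that survives swapping cannot be attributed to network position alone.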
In sum, controlling for connectivity by two different means eliminates the difference between date and party hub deletion; even when not controlling for connectivity, the deletion effect relies entirely on a few extreme date hubs. There is thus no reason to suppose date and party hubs have different network positions.

No Evidence for a Difference in Genetic Connectivity
While Bertin et al. contend that date hubs have more genetic interactions in filtered-HC, they acknowledge that study bias may confound the analysis, as noted [1]. Using a metric of study bias [14] (see Figure 4), we find that date and party hubs do indeed differ in their study bias (p = 0.039, Mann-Whitney U-test). To examine the impact of this, we considered the difference in mean number of genetic interactions per physical connection (g_i/p_i) between date and party hubs; this metric controls for the fact that genetic and physical interactions are positively correlated [15]. As we incrementally purge the data of study bias, the difference in mean g_i/p_i between date and party hubs diminishes to zero (Figure 4). Even making no allowance for study bias, g_i/p_i does not differ significantly between date and party hubs (Mann-Whitney U-test; p > 0.06). There is thus no significant difference in the genetic connectivity of party and date hubs.
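The purging procedure described in the Figure 4 legend can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: each gene is a dictionary with hypothetical keys `cls`, `g` (nonredundant genetic interactions), `p` (physical interactions), and `bias`, and the significance test is omitted.

```python
def study_bias(n_validations, n_nonredundant):
    """Excess validations relative to nonredundant genetic connectivity."""
    return (n_validations - n_nonredundant) / n_nonredundant

def gp_difference(genes):
    """Mean g/p of date hubs minus mean g/p of party hubs; None if a class is empty."""
    date = [g["g"] / g["p"] for g in genes if g["cls"] == "date"]
    party = [g["g"] / g["p"] for g in genes if g["cls"] == "party"]
    if not date or not party:
        return None
    return sum(date) / len(date) - sum(party) / len(party)

def purge_curve(genes):
    """Drop genes from most to least biased, recomputing the g/p difference each time."""
    ranked = sorted(genes, key=lambda g: g["bias"], reverse=True)
    curve = []
    for k in range(len(ranked)):
        diff = gp_difference(ranked[k:])
        if diff is None:
            break
        curve.append(diff)
    return curve

# Hypothetical genes: g = nonredundant genetic interactions, p = physical interactions
genes = [
    {"cls": "date",  "g": 10, "p": 2, "bias": study_bias(15, 10)},  # heavily studied
    {"cls": "date",  "g": 2,  "p": 2, "bias": study_bias(2, 2)},
    {"cls": "party", "g": 2,  "p": 2, "bias": study_bias(2, 2)},
    {"cls": "party", "g": 4,  "p": 4, "bias": study_bias(5, 4)},
]
print(purge_curve(genes))  # [2.0, 0.0, 0.0]
```

In this toy case the whole date/party gap comes from one heavily studied date hub and disappears as soon as it is purged.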

Conservation of Date/Party Classification Is a Consequence of Definition, Not of Biology
Finally, Bertin et al. raise one new prospective line of evidence, namely, that proteins that appear as hubs across datasets tend to preserve their status as party or date. This observation, however, follows from the definition: if a hub is co-expressed (with PCC > 0.5) in any one dataset, it is defined as a party hub; if not, by default it is a date hub. Once a hub is classified as a party hub, its status cannot change solely through the addition of extra expression data. The reverse reclassification, i.e., date to party, is also disfavored because co-expression across different assays is not independent. The low rates of transfer of hub status merely follow from the definitions and do not address the biological validity of the date/party distinction.
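The asymmetry of the "any dataset" rule can be made concrete with a short Python sketch. The function name and the co-expression values are hypothetical; only the PCC > 0.5 threshold comes from the text.

```python
def classify_hub(avg_pcc_by_dataset, threshold=0.5):
    """'party' if co-expression exceeds the threshold in any dataset, else 'date'."""
    return "party" if any(pcc > threshold for pcc in avg_pcc_by_dataset) else "date"

party_hub = [0.62]
print(classify_hub(party_hub))                 # party
print(classify_hub(party_hub + [0.10, -0.2]))  # party: extra data cannot demote it

date_hub = [0.30]
print(classify_hub(date_hub))                  # date
print(classify_hub(date_hub + [0.55]))         # party: one dataset suffices to promote it
```

Adding datasets can only move a hub from date to party, never back, so stability of the party label across builds is guaranteed by construction.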

Summary
In the new filtered-HC dataset, as in others, the two definitional criteria of date/party hubs find no support. Four corollary points of evidence (rate of evolution, effect of deletion on network topology, genetic connectivity, and hub status quo) also find no support. That across multiple datasets, and under multiple different tests, we repeatedly find no evidence for the date/party hypothesis suggests that network hubs do not fall into discrete classes.

Figure 4. We define bias as the difference between the number of independent validations of a genetic interaction of a given protein and the actual, nonredundant genetic connectivity, normalised by the nonredundant genetic connectivity [14]. We rank-ordered all genes according to their study bias. We then eliminated the most biased data point and recalculated the difference in g_i/p_i for date versus party hubs for the remaining genes (reported on the y-axis). We then removed the next most biased, and so forth. At 0.5 residual, half of the original 489 genes were left in the analysis. Purging the most biased genes removes any tendency for party and date hubs to differ; any possible difference between party and date hubs is hence owing to study bias.

Figure 3. Lines in red are after a 50% swap of hubs; lines in blue are for the original case. Because hub swapping has no effect, connectivity (not position in the network) explains why date and party hubs have apparently different effects upon deletion.

Supporting Information
Figure S1. Each indicated hub class was serially deleted in order of decreasing connectivity (percent of network deleted, x-axis), and CPL was calculated after each deletion (y-axis). Deletion of all hubs, date hubs, party hubs, or random nodes recapitulates previously reported effects on CPL. However, as for the MCS analysis described in the main text, when connectivity is controlled for by deleting only nonessential hubs, there is no differential contribution of date versus party hubs to CPL.

A recent PLoS Biology article [1] rejected the conclusions of two previous publications [2,3] that two categories of highly connected "hub" proteins, "date" and "party" hubs, have distinct properties in the Saccharomyces cerevisiae interactome network. Currently available protein-protein interaction datasets are vastly incomplete, even for yeast [4]. Therefore, it is reasonable to rigorously re-scrutinize global properties of interactome networks as new datasets become available. Here we show that distinctions between date and party hubs [2], previously shown in a high-quality filtered yeast interactome (FYI) dataset [2,3], are in fact confirmed in an updated literature-curated yeast interactome network.

Data Quality
Two protein-protein interaction datasets were used in [1]: a high-confidence (HC) network obtained from both curated literature and high-throughput sources, and a subgraph of HC obtained by linking the nodes of FYI with HC edges (HC_fyi). As explained in [2], it is crucial that high-quality data be used to partition date and party hubs. Therefore, FYI was originally generated as the union of two high-confidence interaction datasets: one curated from small-scale studies published in the literature [5] and another obtained by stringently requiring support from at least two out of four sources of high-throughput interaction evidence [2]. We use a similar definition here to derive from HC a filtered high-confidence ("filtered-HC") dataset containing 2,561 proteins linked by 5,996 interactions (Table S1). To eliminate false positive interactions that were either reported once but never confirmed or that arose from curation errors, our analysis included literature-curated interactions only if they were observed in two independent articles (i.e., associated with two or more independent PubMed IDs). Moreover, many interactions in HC were derived from a single experiment reported in multiple publications; e.g., reference [6] describes an approximate superset of the experiments reported in reference [7]. Such publications [6–10] were considered dependent and merged. Thus, 2,423 protein pairs were removed from HC. We also did not include interactions supported solely by high-throughput yeast two-hybrid screening [11,12] (97 pairs) or solely by high-throughput pull-down followed by mass spectrometry screening [6–10,13] (742 pairs) (see Table S1 for a complete list of interactions in filtered-HC).
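The filtering rules above can be sketched as a per-interaction predicate. This Python sketch is a simplification under stated assumptions: the evidence format, the method labels (`small_scale`, `htp_y2h`, `htp_ms`), and the PubMed ID placeholders are hypothetical, dependent publications are merged through a lookup table, and the FYI rule of two out of four high-throughput sources is not modelled.

```python
def independent_sources(pmids, dependent_groups):
    """Collapse PubMed IDs that report the same experiment onto one canonical ID."""
    return {dependent_groups.get(pmid, pmid) for pmid in pmids}

def keep_interaction(evidence, dependent_groups):
    """evidence: list of (pmid, method) pairs supporting one protein pair.

    Hypothetical method labels: "small_scale", "htp_y2h", "htp_ms".
    """
    sources = independent_sources([pmid for pmid, _ in evidence], dependent_groups)
    methods = {method for _, method in evidence}
    if len(sources) < 2:                    # require two independent articles
        return False
    if methods <= {"htp_y2h"} or methods <= {"htp_ms"}:
        return False                        # supported solely by one HTP screen type
    return True

dependent = {"pmid_B": "pmid_A"}  # pmid_B re-reports the experiment in pmid_A
print(keep_interaction([("pmid_A", "small_scale"), ("pmid_B", "small_scale")], dependent))  # False
print(keep_interaction([("pmid_A", "small_scale"), ("pmid_C", "small_scale")], dependent))  # True
print(keep_interaction([("pmid_C", "htp_ms"), ("pmid_D", "htp_ms")], dependent))            # False
```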

Consistency of Date and Party Hub Classifi cation across Datasets
We identified date and party hubs in both HC and filtered-HC (all analyses were also performed on the HC_fyi network; see Figure S1). Since both networks contain many new interactions relative to FYI, and since some erroneous interactions might have been corrected, the proteins originally identified as hubs in FYI cannot and should not be assumed to be identical. For the analyses described here, we therefore defined hubs anew using a degree threshold that includes the top 20% most connected nodes [2]. This corresponds to a degree of 10 or more for HC (19.4% of the proteins) and a degree of 7 or more for filtered-HC (21.7%).
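Because degrees are integers, a "top 20%" rule cannot usually be met exactly, which is why the realized fractions are 19.4% and 21.7%. The Python sketch below picks the cutoff whose fraction is closest to 20%; that tie-breaking rule and the degree list are hypothetical assumptions, not necessarily the exact procedure used here.

```python
def hub_threshold(degrees, target=0.20):
    """Degree cutoff k whose fraction of nodes with degree >= k is closest to target."""
    n = len(degrees)
    best_k, best_frac = None, None
    for k in sorted(set(degrees)):
        frac = sum(d >= k for d in degrees) / n
        if best_frac is None or abs(frac - target) < abs(best_frac - target):
            best_k, best_frac = k, frac
    return best_k, best_frac

degrees = [1] * 50 + [2] * 20 + [3] * 15 + [5] * 10 + [8] * 5  # hypothetical network
print(hub_threshold(degrees))  # (5, 0.15)
```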
In the original report of the date/party hub distinction [2], bimodality was observed in the distribution of average Pearson correlation coefficients (AvgPCC) of hubs for two of the five expression datasets examined. The complete lack of bimodality observed in [1] may stem from a conservative statistical test that assumed a uniform unimodal null distribution. We emphasize that bimodality was not deemed essential evidence of the party/date hub distinction in the initial report [2].
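For reference, a hub's AvgPCC is the mean of the Pearson correlations between its expression profile and those of its interaction partners. A minimal NumPy sketch follows; the profiles are hypothetical four-condition vectors.

```python
import numpy as np

def avg_pcc(hub_profile, partner_profiles):
    """Mean Pearson correlation between a hub's expression profile and each partner's."""
    return float(np.mean([np.corrcoef(hub_profile, p)[0, 1] for p in partner_profiles]))

hub = [1.0, 2.0, 3.0, 4.0]          # hypothetical expression profile over 4 conditions
partners = [[2.0, 4.0, 6.0, 8.0],   # perfectly co-expressed (r = 1)
            [1.0, 3.0, 2.0, 4.0]]   # partially co-expressed (r = 0.8)
print(round(avg_pcc(hub, partners), 3))  # 0.9
```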
We suggest that some analyses presented in [1] (in particular, network tolerance to hub deletion) erred by not taking into account new hubs defined by the increased number of interactions relative to the original FYI. This strategy ignores 46% of the hubs in HC_fyi [1] and thus effectively immunizes them in the attack resistance analysis and eliminates them from the genetic interaction comparison.

Distinct Topological Properties of Date and Party Hubs
When removed from the network, party and date hubs have strikingly distinct effects on the overall topology of HC, filtered-HC, and HC_fyi. Removing date hubs dramatically disrupts the characteristic path length (CPL) of the network, whereas removing party hubs has a negligible effect (Figure 1B), as previously observed [2]. Importantly, this difference in behavior is not sensitive to the specific threshold values of degree k and AvgPCC chosen here to define hubs and party hubs, respectively (Figure S2). The CPL of a network measures the mutual closeness of its nodes. The claim in [1] that date and party hub removal has an indistinguishable effect on network topology was based on the analysis of a different topological feature altogether: main component size. This is a poor measure of network clustering in that it does not, for example, discriminate an extended beads-on-a-string topology from a completely connected clique. This measure is also highly sensitive to a single spurious interaction that connects two otherwise disconnected subgraphs. By contrast, the dramatic effect on CPL that we observe for date hubs in HC, filtered-HC, and HC_fyi suggests their coordinating role and confirms the original findings [2].
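The beads-on-a-string versus clique contrast can be checked directly. The plain-Python sketch below computes both metrics for a five-node path and a five-node complete graph. CPL is taken here as the mean shortest-path length over all connected pairs, one of several conventions; which convention [1] or [2] used is not stated, so this definition is an assumption.

```python
from collections import deque
from itertools import combinations

def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def bfs_distances(adj, source):
    """Hop distances from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def cpl_and_mcs(adj):
    """CPL as mean shortest-path length over connected pairs; MCS as largest component size."""
    total, pairs, mcs = 0, 0, 0
    for u in adj:
        dist = bfs_distances(adj, u)
        mcs = max(mcs, len(dist))   # size of the component containing u
        total += sum(dist.values())
        pairs += len(dist) - 1      # ordered pairs (u, v) with v != u
    return (total / pairs if pairs else 0.0), mcs

path = build_adjacency([(i, i + 1) for i in range(4)])  # beads on a string, 5 nodes
clique = build_adjacency(combinations(range(5), 2))     # complete graph, 5 nodes
print(cpl_and_mcs(path))    # (2.0, 5)
print(cpl_and_mcs(clique))  # (1.0, 5)
```

Both graphs have the same MCS, while CPL separates them.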

Genetic Interactions
In [2] we showed that date hubs exhibit a higher genetic interaction density than party hubs. Reference [1] described analyses of two sets of genetic interactions: one from a union of high-throughput studies (HTP-GI), and another from the literature (LC-GI) [32]. Both LC-GI and HTP-GI datasets are potentially subject to bias, since gene pairs were selected nonrandomly for testing, but these are the best datasets currently available. While the LC-GI analysis confirmed our original finding, the HTP-GI analysis did not [1], a result we reproduced using date/party hubs defined from FYI. However, examining HTP-GI in the larger HC and filtered-HC networks, we find that date hubs in both HC and filtered-HC exhibit higher genetic interaction density than party hubs or non-hubs (Figure 1C), confirming the original report [2]. This difference remains after controlling for the connectivity of hubs in the protein interaction network (Figure S3).

Evolutionary Rate
We also confirmed the difference in evolutionary rates [33] between date and party hubs that was reported previously [3]. Using the filtered-HC network (with hubs defined as above), we found that date hubs evolve significantly faster than party hubs (Wilcoxon p = 0.01). Furthermore, using our expanded expression dataset, the PCC of hubs was negatively correlated with their evolutionary rates (Pearson r = −0.22, p = 1 × 10⁻⁷), even when controlling for protein abundance [34] in either rich (Pearson partial r = −0.19, p = 3 × 10⁻⁶) or minimal media (Pearson partial r = −0.20, p = 2 × 10⁻⁶). The same result was obtained when considering the HC and HC_fyi networks (unpublished data). Moreover, a recent report independently supported evolutionary rate differences between date and party hubs and explained these differences in terms of three-dimensional protein structure [35].

Summary
We confirmed that date and party hubs have different topological properties, with the coordinating role of date hubs being supported by a greater impact on CPL. We also confirmed that date hubs participate in more genetic interactions and evolve more rapidly than party hubs. These observations, as well as the identity of the nodes considered as date and party hubs, remained largely consistent across all tested networks (HC, filtered-HC, HC_fyi), demonstrating the robustness of the results originally observed in [2]. Thus, this updated analysis confirms the validity of the distinction between date and party hubs in the yeast interactome [2,3] and shows that the date and party hub concept and the "stratus-like" network model [1] are not mutually exclusive.

Figure S1. Hub Deletion and Genetic Interaction Analysis for the HC_fyi Interaction Network as Defined in [1]
Found at doi:10.1371/journal.pbio.0050153.sg001 (172 KB PDF).

Figure 1. (A) Consistency of the party/date attribution between FYI and filtered-HC. Because the filtered-HC network has many more interactions than FYI, only 162 of the 546 hubs in filtered-HC had been previously found in FYI. Filtered-HC confirmed 86% of the party/date designations in FYI. In addition, 20% of FYI hubs are no longer considered hubs in the new filtered-HC network because of the higher connectivity threshold. (B) The effect on the characteristic path length (top panels) and main component size (bottom panels) of the networks upon gradual node removal for HC (left panels) and filtered-HC (right panels). Attacks against all hubs (brown curve), party hubs (blue curve), date hubs (red curve), and random nodes (green curve). Insets show an additional control for connectivity differences between categories, with the x-axis representing the number of edges removed from the network.
(C) Date hubs participate in more genetic interactions than party hubs or non-hubs [2], as measured here by mean number of interactions [1] from a network of curated genetic interactions [32], for both filtered-HC (right panel) and HC (left panel). Within each panel, bars show the number of genetic interactions held by date hubs (red), party hubs (blue), and non-hub proteins (yellow). The p-values assessing the difference of the means between date and party hubs (Mann-Whitney U-test) are indicated above the bars.