Rechecking the Centrality-Lethality Rule in the Scope of Protein Subcellular Localization Interaction Networks

Essential proteins are indispensable for living organisms to maintain life activities and play important roles in studies of pathology, synthetic biology, and drug design. Therefore, besides experimental methods, many computational methods have been proposed to identify essential proteins. Based on the centrality-lethality rule, various centrality methods are employed to predict essential proteins in a Protein-protein Interaction Network (PIN). However, because they neglect the temporal and spatial features of protein-protein interactions, the centrality scores calculated by these methods are not effective enough for measuring the essentiality of proteins in a PIN. Moreover, many methods that overfit the features of essential proteins of one species may perform poorly for other species. In this paper, we demonstrate that the centrality-lethality rule also holds in Protein Subcellular Localization Interaction Networks (PSLINs). To do this, we propose a method based on Localization Specificity for Essential protein Detection (LSED), which can be combined with any centrality method to calculate improved centrality scores by taking into consideration the PSLINs in which proteins play their roles. In this study, LSED was combined with eight centrality methods separately to calculate Localization-specific Centrality Scores (LCSs) for proteins based on the PSLINs of four species (Saccharomyces cerevisiae, Homo sapiens, Mus musculus and Drosophila melanogaster). Compared with the proteins with high centrality scores measured from the global PINs, more of the proteins with high LCSs measured from PSLINs are essential. This indicates that proteins with high LCSs measured from PSLINs are more likely to be essential and that the performance of centrality methods can be improved by LSED. Furthermore, LSED provides a widely applicable prediction model to identify essential proteins for different species.


Introduction
Different proteins play different roles and have different degrees of importance in the biological activities of organisms. One class, the essential proteins, is needed by living organisms to maintain life activities, and the organism cannot survive or grow without them [1,2]. The study of essential proteins can facilitate other research. For example, determining a minimal set of essential genes for the simplest free-living organism is fundamental in synthetic biology [3,4]. In the field of antibiotic resistance and toxicity, studying the essential proteins of bacteria and viruses can help in designing new antimicrobial drugs, since bacteria and viruses may die when their essential proteins are removed, interrupted or obstructed [5].
Essential proteins can be identified by biological experiments, such as single gene knockout [6], RNA interference [7] and conditional knockout [8]. However, these experiments are both time consuming and inefficient, and can only be applied to a few species. Thus, it is appealing to develop highly reliable and efficient computational methods to identify essential proteins.
Fast growth in the amount of available Protein-Protein Interactions (PPIs) has provided unprecedented opportunities for detecting essential proteins at the network level. In Saccharomyces cerevisiae, proteins with high degrees in a Protein-protein Interaction Network (PIN) are more likely to be encoded by essential genes and thus more likely to be essential proteins [9]. From the perspective of topology, highly connected proteins can maintain the basic structures of PIN, and the whole PIN will collapse if these proteins are removed. This phenomenon is called the centrality-lethality rule in biological networks [10]. Thus some centrality methods have been used to measure the essentiality of proteins, for example, Degree Centrality (DC) [9], Betweenness Centrality (BC) [11], Closeness Centrality (CC) [12], Subgraph Centrality (SC) [13], Eigenvector Centrality (EC) [14], and Information Centrality (IC) [15]. Later on, some other centrality measures have been proposed by looking into the topology properties of essential proteins' neighborhoods. By investigating the essentiality of proteins and their neighbors in Saccharomyces cerevisiae PIN, Lin et al. [16] proposed maximum neighborhood component and density of maximum neighborhood component algorithms to identify essential proteins. Li et al. [17] found that the neighbors of non-essential hubs seldom interact with each other, thus they proposed a method based on local average connectivity. Wang et al. [18] proposed a centrality measure based on edge clustering coefficient, named NC.
However, the available PPI data are incomplete and contain false positives, which affects the accuracy of essential protein prediction methods that are based solely on topology. Much of the information provided by high-throughput experiments can help reduce the influence of false positives and capture the characteristics of essential proteins from other angles. Thus a new trend in improving essential protein identification is to integrate other information with PINs. Based on the combination of a logistic regression-based model and function similarity, Li et al. [19] proposed a weighting method to evaluate the confidence of each PPI, and the accuracies of nine centrality measures in the weighted network were improved. Luo et al. [20] utilized Gene Ontology to obtain a weighted network, and calculated local topological characteristics of proteins in the weighted networks to identify essential proteins. Recently, the relationship between the essentiality of proteins and their cluster property has been considered when identifying essential proteins. Elena et al. [21] reexamined the connection between network topology and essentiality. As a result, they observed that the majority of hubs are essential due to their involvement in Essential Complex Biological Modules, groups of densely connected proteins with shared biological functions that are enriched in essential proteins. Based on this observation, Ren et al. [22] integrated the topology of PINs and protein complex information to predict essential proteins. Li et al. [23] proposed a new prediction method based on the Pearson correlation coefficient and the edge clustering coefficient, named PeC, and Tang et al. [24] proposed a Weighted Degree Centrality method (WDC). Both PeC and WDC integrate network topology with gene expression profiles. Considering that essential proteins tend to be conserved, Peng et al. [25] proposed an iteration method (ION) for predicting essential proteins by integrating protein orthology information [26] with PINs.
In addition, some machine learning methods have been developed for identifying essential proteins. For example, Acencio et al. [27] constructed a decision tree-based meta-classifier and trained it on datasets integrating network topological features, cellular localization and biological process to explore essential proteins. Zhong et al. [28] proposed a GEP-based method to predict essential proteins by combining biological features, classical topological features, and other composed features computed by the PeC, WDC and ION methods. However, when applied to other species, the performance of these supervised machine learning methods may be affected by the differences between the training species and the prediction species [29]. All of the above methods try to improve essential protein identification from different angles. However, due to the incompleteness and the dynamics of PPIs, efficient methods that identify essential proteins accurately for different species are still lacking.
In the aforementioned network-based methods, the PINs are constituted by all PPIs available at the moment, which may take place in different subcellular localizations (denoted as global PINs). However, proteins must be localized at their appropriate subcellular compartments to perform their desired functions [30][31][32][33][34], and PPIs can take place only when proteins are in the same subcellular localization [31,35]. In this paper, we demonstrate that the centrality-lethality rule also holds in Protein Subcellular Localization Interaction Networks (PSLINs), which are constituted by proteins and their PPIs in the same subcellular localization. The numbers of proteins and of essential proteins differ significantly among PSLINs. This paper proposes a method based on Localization Specificity for Essential protein Detection (LSED), which can be combined with any centrality method to calculate Localization-specific Centrality Scores (LCSs) for proteins based on PSLINs. LSED combined with a certain centrality method XC is denoted as LSED-XC, in which the centrality method XC is applied to each PSLIN to calculate the centrality scores of proteins. Based on the centrality scores from different PSLINs, a Localization-specific Centrality Score (LCS) is calculated for each protein, so that localization-specific essential proteins can be explored. The results show that, compared with the proteins with high centrality scores measured from the global PINs, more of the proteins with high LCSs measured from PSLINs are essential. This indicates that, compared with the centrality method XC applied to the global PINs, LSED-XC can improve the accuracy of centrality methods for essential protein prediction in different species.

Materials and Methods
Materials
In this study, the prediction methods were applied to four species (Saccharomyces cerevisiae, Homo sapiens, Mus musculus, and Drosophila melanogaster) for essential protein identification. The PINs of the four species were downloaded from the BioGRID database [36]. All these PINs are mixtures of PPIs from different subcellular localizations, and are considered as the global PINs. Their statistics are summarized in Table 1. For each species, the known essential proteins were extracted from DEG [2] and used as the benchmark set to evaluate the essential protein predictions.
The localization information of proteins in the COMPARTMENTS database [37] was used in this study. The subcellular localizations (or compartments) in a cell are generally classified into the following 12 categories: 1) Chloroplast, 2) Endoplasmic, 3) Cytoskeleton, 4) Golgi, 5) Cytosol, 6) Lysosome (or Vacuole), 7) Mitochondrion, 8) Endosome, 9) Plasma, 10) Nucleus, 11) Peroxisome and 12) Extracellular, where Chloroplast only exists in plant cells [38]. These labeled compartments are used to annotate the localization of proteins based on the supporting evidence. The number of proteins annotated by these compartments in the global PIN of each species is listed in Table 1.

The Framework of LSED
The LSED method mainly contains four steps, as shown in Fig 1. Given a global PIN and the subcellular localization information of proteins, first, a PSLIN is constructed for each subcellular localization. Second, the confidence level of each PSLIN is calculated according to the size of the PSLIN. Third, a centrality method is applied to each PSLIN to calculate the centrality scores of the proteins in that PSLIN. Fourth, the LCS of each protein is calculated based on its centrality scores in different PSLINs and the confidence levels of these PSLINs. Finally, the proteins are sorted by their LCSs in descending order. The details of each step are discussed in the following subsections.

Construction of PSLINs
The PPIs in a global PIN are identified under various in vitro conditions without knowledge of the subcellular localization where they take place. Proteins must be localized to the correct compartments [30][31][32][33][34] and the interacting protein pairs should be in the same subcellular localization [31,35]. Thus, a global PIN can be divided into a number of PSLINs based on subcellular localizations. A eukaryotic cell can be divided into 11 compartments: Endoplasmic, Cytoskeleton, Golgi, Cytosol, Lysosome (or Vacuole), Mitochondrion, Endosome, Plasma, Nucleus, Peroxisome and Extracellular, where Lysosome only exists in animal cells. The PSLIN of each compartment is constituted by the proteins localized in this compartment and their interactions. If a protein is annotated by multiple subcellular localizations, it will appear in multiple PSLINs. Let $G = (V, E)$ denote the global PIN, and $Loc(i)$ denote the set of proteins in compartment $i$. The PSLIN of compartment $i$ can be denoted as $S_i = (V_i, E_i)$, where $V_i = V \cap Loc(i)$ and $E_i = \{(u, v) \in E : u, v \in V_i\}$. As shown in Fig 2, with the subcellular localization information of proteins, the PSLINs can be generated by mapping the global PIN to each compartment separately. For more information about the PSLINs of Saccharomyces cerevisiae, Homo sapiens, Mus musculus, and Drosophila melanogaster, see S1 Dataset, S2 Dataset, S3 Dataset and S4 Dataset in the Supporting Information files of this paper.
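The construction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `build_pslins`, the input layout (an edge list plus a protein-to-compartments dictionary) and the toy protein names are all assumptions made for the example.

```python
from collections import defaultdict


def build_pslins(edges, localizations):
    """Split a global PIN into one PSLIN per compartment.

    edges: iterable of (u, v) protein pairs (the edge set E of the global PIN).
    localizations: dict mapping protein -> set of compartment names.
    Returns a dict: compartment -> (nodes, edges) of the induced subnetwork.
    """
    edges = list(edges)
    proteins_in_pin = {p for edge in edges for p in edge}

    # Loc(i) restricted to proteins that occur in the global PIN.
    loc = defaultdict(set)
    for protein, compartments in localizations.items():
        if protein in proteins_in_pin:
            for compartment in compartments:
                loc[compartment].add(protein)

    pslins = {}
    for compartment, nodes in loc.items():
        # Keep an interaction only if both partners share this compartment.
        kept = {(u, v) for u, v in edges if u in nodes and v in nodes}
        pslins[compartment] = (nodes, kept)
    return pslins


# Toy data with hypothetical protein names, for illustration only.
toy_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")]
toy_loc = {
    "A": {"Nucleus"},
    "B": {"Nucleus", "Cytosol"},
    "C": {"Cytosol"},
    "D": {"Cytosol", "Nucleus"},
}
pslins = build_pslins(toy_edges, toy_loc)
```

In the toy input, proteins B and D carry two annotations and therefore appear in both PSLINs, while the interaction between B and C survives only in the Cytosol network.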

Localization-specific Centrality Score
A protein may appear in several different PSLINs, and it will then have several centrality scores calculated from these PSLINs. An LCS is calculated to measure the essentiality of each protein. The LCS of a protein depends on its centrality scores from the various PSLINs and on the reliability of these centrality scores. In this study, the reliability of the centrality score from a PSLIN is measured by the confidence level of the PSLIN. The size of a network is defined as the number of proteins in the network. Intuitively, the larger a PSLIN is, the higher its confidence level should be (see more explanations in Discussion). Let $S_{Max}$ denote the PSLIN with the largest size (containing the largest number of proteins), and let $|\cdot|$ denote the size of a PSLIN. Therefore, in this study, the confidence level of a PSLIN is calculated as the ratio of its size to the largest size of PSLINs as follows:
$$C(S_i) = \frac{|S_i|}{|S_{Max}|} \tag{1}$$
where $S_i$ is the PSLIN of compartment $i$. From the definition, the value of $C(S_i)$ is in the range (0, 1]. For more information about the confidence levels of the different PSLINs of the four species, see S1 Table in the Supporting Information files of this paper. The details of calculating the LCSs of proteins are described in Algorithm 1. To calculate the LCSs of proteins, first, all PSLINs are sorted in descending order according to their confidence levels. Then, the centrality scores of each protein in the sorted PSLINs are calculated. If a protein appears in a PSLIN, its centrality score in the PSLIN is calculated by a centrality method; otherwise, its centrality score is zero. Next, for each protein, its LCS is calculated based on its centrality scores computed from the sorted PSLINs and the confidence levels of these PSLINs. After the LCSs of all proteins in the global PIN are calculated, the proteins are sorted in descending order by their LCSs.
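A minimal sketch of the confidence level and of an LCS ranking is given below, using degree centrality as the centrality method. The exact rule by which Algorithm 1 combines per-PSLIN scores is not reproduced in this text, so the sketch assumes one plausible rule: each score is weighted by the confidence level of its PSLIN and the largest weighted score is kept. That rule, the function names and the toy networks are assumptions for illustration only.

```python
def degree_centrality(nodes, edges):
    """Degree of each protein inside one PSLIN."""
    degree = {p: 0 for p in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree


def confidence_levels(pslins):
    """C(S_i) = |S_i| / |S_Max|, where size is the number of proteins."""
    largest = max(len(nodes) for nodes, _ in pslins.values())
    return {name: len(nodes) / largest for name, (nodes, _) in pslins.items()}


def lcs_scores(pslins, centrality=degree_centrality):
    """Localization-specific score per protein (combination rule assumed)."""
    conf = confidence_levels(pslins)
    # Visit PSLINs in descending order of confidence, as the text describes.
    ordered = sorted(pslins, key=conf.get, reverse=True)
    scores = {}
    for name in ordered:
        nodes, edges = pslins[name]
        for protein, ess in centrality(nodes, edges).items():
            weighted = ess * conf[name]
            # ASSUMPTION: keep the best confidence-weighted score; proteins
            # absent from a PSLIN implicitly contribute a score of zero.
            scores[protein] = max(scores.get(protein, 0.0), weighted)
    return scores


# Toy PSLINs with hypothetical proteins: compartment -> (nodes, edges).
toy_pslins = {
    "Nucleus": ({"A", "B", "C", "D"}, {("A", "B"), ("A", "C"), ("A", "D")}),
    "Cytosol": ({"A", "E"}, {("A", "E")}),
}
conf = confidence_levels(toy_pslins)
scores = lcs_scores(toy_pslins)
# Sort by descending LCS; ties are broken alphabetically for determinism.
ranking = sorted(scores, key=lambda p: (-scores[p], p))
```

Passing a different function as `centrality` swaps in another centrality method without changing the rest of the sketch, which mirrors how LSED is meant to be combined with any XC.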

Evaluation Metrics
Algorithm 1 (recovered fragment): if $p \in S_i$ then $Ess(p, S_i)$ is calculated by a centrality method.
Given a certain number of proteins selected as candidates for essential proteins, the percentage of true essential proteins can be calculated according to the list of known essential proteins. Many centrality methods have been evaluated by comparing the percentage/number of essential proteins among the top ranked proteins (the proteins with high centrality scores) [16-18, 25, 39-41]. In this paper, we also adopt this metric to evaluate the prediction accuracy of each method. A certain top percentage of ranked proteins is selected as predicted essential proteins, and then the percentage of true essential proteins correctly identified by each method is compared. In this paper, LSED-XC denotes LSED combined with a certain centrality method XC. XC can be DC, IC, EC, SC, BC, CC, NC, or ION, and the corresponding LSED-XC is LSED-DC, LSED-IC, LSED-EC, LSED-SC, LSED-BC, LSED-CC, LSED-NC, or LSED-ION. For the comparison of the enrichment level of essential proteins in the top percentages of ranked proteins, Eqs (2)-(5) are defined to compare the LSED-XC and XC methods.
Accuracy (Acc): given a certain value of c, the accuracy of a method M in the top c% of ranked proteins is defined as the percentage of true essential proteins identified by method M in the top c% of ranked proteins, calculated according to Eq (2).
$$Acc(M, c) = \frac{TP}{N} \tag{2}$$
where TP is the number of true essential proteins identified by method M in the top c% of ranked proteins, and N is the number of proteins in the top c% of ranked proteins.
Improved accuracy (IAcc): the improved accuracy of method LSED-XC in the top c% of ranked proteins, calculated according to Eq (3), is defined as the improvement of the Acc of LSED-XC relative to the Acc of the corresponding method XC.
$$IAcc(LSED\text{-}XC, c) = \frac{Acc(LSED\text{-}XC, c) - Acc(XC, c)}{Acc(XC, c)} \tag{3}$$
Average Improved Accuracy (AIAcc): the average improved accuracy of a method LSED-XC is defined as the average value of the IAcc of LSED-XC over different top percentages of ranked proteins, and is calculated according to Eq (4), where Topcset is a set of different percentages.
$$AIAcc(LSED\text{-}XC) = \frac{1}{|Topcset|} \sum_{c \in Topcset} IAcc(LSED\text{-}XC, c) \tag{4}$$
where $|Topcset|$ denotes the number of different percentages in Topcset.
Comparison of the Average Accuracy over Species. To evaluate the prediction accuracy of each method more comprehensively, the average accuracy (AKAcc) of a method M in the top c% of ranked proteins over more than one species is calculated by Eq (5).
$$AKAcc(M, c) = \frac{1}{k} \sum_{i=1}^{k} Acc_i(M, c) \tag{5}$$
where k is the number of species and $Acc_i(M, c)$ is the accuracy of M in the top c% of ranked proteins of species i. The higher the AKAcc gained by a method, the higher the likelihood of successful prediction for different species.
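The four metrics can be illustrated with a short sketch. The function names, the toy rankings and the choice to round the size of the top c% down (which matches the counts quoted in the figure captions, for example 63 of 6,304 proteins at 1%) are assumptions made for the example.

```python
def top_c(ranked, c):
    """Top c% of a ranked list; the count is rounded down (assumed)."""
    return ranked[: len(ranked) * c // 100]


def acc(ranked, essential, c):
    """Eq (2): fraction of true essential proteins in the top c%."""
    top = top_c(ranked, c)
    tp = sum(1 for p in top if p in essential)
    return tp / len(top)


def iacc(ranked_lsed, ranked_xc, essential, c):
    """Eq (3): relative improvement of LSED-XC over XC (undefined if Acc(XC) is 0)."""
    base = acc(ranked_xc, essential, c)
    return (acc(ranked_lsed, essential, c) - base) / base


def aiacc(ranked_lsed, ranked_xc, essential, topcset):
    """Eq (4): mean IAcc over the percentages in topcset."""
    values = [iacc(ranked_lsed, ranked_xc, essential, c) for c in topcset]
    return sum(values) / len(values)


def akacc(acc_per_species):
    """Eq (5): mean accuracy of one method over k species."""
    return sum(acc_per_species) / len(acc_per_species)


# Toy rankings over 20 hypothetical proteins p0..p19.
ranked_xc = ["p%d" % i for i in range(20)]
essential = {"p1", "p3", "p4", "p10"}
ranked_lsed = ["p1", "p3", "p4", "p10"] + [p for p in ranked_xc if p not in essential]

acc_xc_10 = acc(ranked_xc, essential, 10)               # top 2 proteins
iacc_10 = iacc(ranked_lsed, ranked_xc, essential, 10)
aiacc_all = aiacc(ranked_lsed, ranked_xc, essential, [10, 20])
akacc_two = akacc([0.5, 1.0])
```

Because IAcc divides by Acc(XC, c), a baseline that finds no essential protein in the top c% leaves the improvement undefined; the sketch does not guard against that case.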

Results
To recheck the centrality-lethality rule in the scope of PSLINs, we carried out experiments for four species: Saccharomyces cerevisiae, Homo sapiens, Mus musculus, and Drosophila melanogaster. In our experiments, seven typical topology-based centrality methods (DC, IC, EC, SC, BC, CC, and NC) and a centrality method integrating other biological knowledge (ION) were adopted. LSED was combined with each of these centrality methods (denoted by LSED-XC) to calculate centrality scores from PSLINs separately. The proteins were sorted by the LCSs calculated with LSED-XC in descending order. Specifically, the orthology frequency of each protein in a species, needed by ION [25], was calculated among 272 species from the InParanoid database [26]. In addition, for the sake of comparison, each centrality method was applied to the global PINs independently, and the proteins were likewise sorted by centrality scores in descending order. In the following, top 1%-25% means the set {1%, 5%, 10%, 15%, 20%, 25%}. For more information about the proteins of each species with LCSs calculated by LSED-XC and centrality scores calculated by the corresponding centrality methods, see S1 File, S2 File, S3 File and S4 File in the Supporting Information files of this paper.

Saccharomyces cerevisiae
For most topology-based centrality methods, we can observe that the percentages of true essential proteins correctly predicted by LSED-XC methods are considerably higher than those of the corresponding XC methods in the top 1%-25% of ranked proteins. The IAcc of LSED-XC methods is shown in Table 2. In Table 2, a positive value of IAcc gained by an LSED-XC method indicates that the Acc of the corresponding XC method can be improved by LSED. LSED-DC, [...]
By integrating orthology information, ION correctly identifies much higher percentages of true essential proteins than the other XC methods in the top 1%-25% of ranked proteins. However, the Acc of XC methods can be greatly improved when protein subcellular localization is considered. In the top 1% of ranked proteins, LSED-DC, LSED-IC, LSED-SC, LSED-NC, and LSED-ION outperform ION; in the top 5% of ranked proteins, the Accs of LSED-NC and LSED-ION are greater than that of ION; in the top 10%-20% of ranked proteins, more true essential proteins are identified by the LSED-DC and LSED-IC methods. This demonstrates that both protein subcellular localization information and orthology information are helpful for identifying essential proteins in Saccharomyces cerevisiae. For more information about the true essential proteins in the top percentages of proteins ranked by LSED-XC methods and XC methods in Saccharomyces cerevisiae, see S2 Table in the Supporting Information files of this paper.
Fig 3. Percentage of top c% ranked proteins, identified by LSED-XC methods and XC methods, to be essential proteins of Saccharomyces cerevisiae. Eight centrality methods (DC, BC, CC, SC, EC, IC, NC, and ION) were adopted to calculate centrality scores from the global PIN, respectively. LSED was combined with these centrality methods to calculate Localization-specific Centrality Scores from PSLINs separately. In (a)-(f), all the centrality methods are denoted as XC in the legend, and LSED with different XC methods are denoted as LSED-XC in the legend. The proteins are ranked in descending order based on their Localization-specific Centrality Scores (LCSs) and centrality scores computed by LSED-XC methods and XC methods, respectively. Then, the top 1%, 5%, 10%, 15%, 20% and 25% of the ranked proteins are selected as candidates for essential proteins. According to the list of known essential proteins, the percentages of true essential proteins were calculated. The figure shows the percentage of true essential proteins identified by each method in each top percentage of ranked proteins. The digits in brackets stand for the number of proteins ranked in each top percentage. For example, since the total number of ranked proteins of Saccharomyces cerevisiae is 6,304, the number of proteins ranked in the top 1% is about 63 (= 6,304*1%).

Homo sapiens
In Fig 4, the percentage of true essential proteins correctly predicted by LSED-XC methods is compared to that by XC methods in each top percentage of ranked proteins of Homo sapiens. In the top 1%-25% of ranked proteins, the percentages of essential proteins among the proteins ranked by DC and BC are higher than those of other XC methods, while LSED-DC and LSED-BC outperform DC and BC, respectively.
From Table 3, [...] Compared with the XC methods based on topology, the percentages of true essential proteins correctly identified by ION in the top 1%-25% of ranked proteins are quite low. Compared with NC, which is used to initialize the centrality scores in ION, ION predicts fewer true essential proteins in the top 1%-25% of ranked proteins. It seems that the orthology information of Homo sapiens proteins used in ION degrades the performance of NC. However, LSED-ION outperforms ION in the top 1%-25% of ranked proteins, which demonstrates the effectiveness of LSED.

Mus musculus
The percentages of true essential proteins of Mus musculus correctly predicted by LSED-XC methods are compared to those by XC methods in the top 1%-25% of ranked proteins, as shown in Fig 5. It is evident that the enrichment levels of true essential proteins identified by LSED-XC methods in each top percentage of ranked proteins are higher than those of the corresponding XC methods. DC, IC and BC outperform other XC methods for the top 1%-25% of ranked proteins. From Table 4, we can find that, compared with DC, the AIAcc of LSED-DC in the top 1%-25% of ranked proteins is 10.5%. Compared with IC, the AIAcc of LSED-IC is 13.7% in the top 1%-25% of ranked proteins, and compared with BC, the AIAcc of LSED-BC is 12.4%. As shown in Table 4, LSED also improves the prediction accuracy of the other centrality methods. Compared with EC, the IAccs of LSED-EC are 175%, 58.3%, 19.3%, 51.4%, 76.8%, and 43% in the top 1%, 5%, 10%, 15%, 20%, and 25% of ranked proteins, respectively. Compared with SC, the AIAcc of LSED-SC is 77.25% in the top 1%-25% of ranked proteins. Compared with CC, the AIAcc of LSED-CC is 15.2% in the top 1%-25% of ranked proteins. In the top 1%-25% of ranked proteins, compared with NC, the AIAcc of LSED-NC is 9.9%. The percentages of true essential proteins correctly identified by ION in the top 1%-20% of ranked proteins are considerably lower than those of topology-based centrality methods, while LSED-ION always identifies more true essential proteins than ION. For more information about the true essential proteins in the top percentages of proteins ranked by LSED-XC methods and XC methods in Mus musculus, see S4 Table in the Supporting Information files of this paper.
Fig 5. Percentage of top c% ranked proteins, identified by LSED-XC methods and XC methods, to be essential proteins of Mus musculus. Eight centrality methods (DC, BC, CC, SC, EC, IC, NC, and ION) were adopted to calculate centrality scores from the global PIN, respectively. LSED was combined with these centrality methods to calculate Localization-specific Centrality Scores from PSLINs separately. In (a)-(f), all the centrality methods are denoted as XC in the legend, and LSED with different XC methods are denoted as LSED-XC in the legend. The proteins are ranked in descending order based on their Localization-specific Centrality Scores (LCSs) and centrality scores computed by LSED-XC methods and XC methods, respectively. Then, the top 1%, 5%, 10%, 15%, 20% and 25% of the ranked proteins are selected as candidates for essential proteins. According to the list of known essential proteins, the percentages of true essential proteins were calculated. The figure shows the percentage of true essential proteins identified by each method in each top percentage of ranked proteins. The digits in brackets stand for the number of proteins ranked in each top percentage. For example, the total number of ranked proteins of Mus musculus is 6,582, thus the number of proteins ranked in the top 1% is about 65 (= 6,582*1%).

Drosophila melanogaster
Fig 6 shows the percentages of true essential proteins of Drosophila melanogaster correctly predicted by LSED-XC methods and XC methods in the top 1%-25% of ranked proteins. Compared with XC methods, the improvements in the enrichment level of essential proteins obtained by LSED-XC methods can be observed in Fig 6 and Table 5. Specifically, LSED-BC identifies more true essential proteins than the other methods in the top 1% to 10% of ranked proteins, LSED-EC correctly predicts more true essential proteins in the top 15% and 20% of ranked proteins, and the percentages of true essential proteins correctly predicted by most LSED-XC methods in the top 25% of ranked proteins are nearly the same and are higher than those predicted by XC methods. For more information about the true essential proteins in the top percentages of proteins ranked by LSED-XC methods and XC methods in Drosophila melanogaster, see S5 Table in the Supporting Information files of this paper.

Average Accuracy over Species
From the comparison of the top percentages of ranked proteins for the four species, it is clear that some methods work well for one species but may fail for the others. For example, ION identifies more true essential proteins than other methods for Saccharomyces cerevisiae, while its performance is quite poor for the other species. The performance of BC for Saccharomyces cerevisiae is not very good, but it outperforms other XC methods for the other species, and so does the LSED-BC method. A good essential protein prediction method should not be species-specific. Otherwise, it will be difficult for biologists to choose which method to apply to a species without knowing the preferences of the methods. The AKAcc of each method in each top percentage of ranked proteins over the four species was calculated. As shown in Fig 7, the AKAccs of LSED-XC methods in each top percentage of ranked proteins are consistently higher than those of XC methods. In particular, the AKAccs of LSED-BC and LSED-DC are always higher than those of the XC methods and the other LSED-XC methods, which indicates their superior performance in identifying essential proteins for different species. Furthermore, the higher AKAccs of LSED-XC methods suggest that in most situations the LCSs, which take the cellular compartments into consideration, are more predictive than the centrality scores measured in the global PINs, and that the essential proteins of different species can be explored better in the PSLINs.

Discussion
Through the comparison, it is observed that LSED-XC can identify more true essential proteins than the corresponding XC method in the top 1%-25% of ranked proteins in most situations. Furthermore, LSED-XC methods gained higher AKAccs in each top percentage of ranked proteins over four species, which indicates their better prediction performance for different species. In this section, we will look into the different predictions between the global PINs and the PSLINs, analyze the limitations of centrality methods applied to PSLINs, and discuss the confidence levels calculated in LSED.

Different Predictions between the Global PINs and the PSLINs
LSED-XC methods measure the centrality of proteins based on the PSLINs, while XC methods measure the centrality of proteins based on the global PINs. To figure out the difference between essential proteins identified from the global PINs and the PSLINs, we compare the differences in the top 100 proteins ranked by XC methods and LSED-XC methods, respectively. In Fig 8, from (a) to (d), the X axis represents the number of different proteins between LSED-XC and XC, and the Y axis represents the percentage of true essential proteins in the different proteins. For example, DC(38) means that there are 38 different proteins in the two top 100 protein sets ranked by LSED-DC and DC, while there are 62 common proteins in the two top 100 ranked protein sets. In each different protein set, 65.7% ranked by LSED-DC are true essential proteins, while 26.3% ranked by DC are true essential proteins. In Fig 8, in the top 100 proteins ranked by LSED-XC and its corresponding XC method, half or nearly half proteins are different. It seems that the PSLINs involved in LSED-XC and the strategy of calculating LCSs are the main reasons accounting for this difference. Compared with XC, the higher percentages of true essential proteins in the different proteins gained by LSED-XC demonstrate that the proteins with high LCSs measured from PSLINs are more likely to be essential. As shown in Fig 8(a), in the top 100 ranked proteins of Saccharomyces cerevisiae, LSED-XC methods find more true essential proteins than XC methods in those different proteins. For example, 40 out of 55 different proteins ranked by LSED-NC in top 100 ranked proteins are true essential proteins, while there are only 10 true essential proteins in 55 different proteins ranked by NC. 
In the top 100 ranked proteins of Homo sapiens, as illustrated in Fig 8(b), the average percentage of true essential proteins in the different proteins ranked by LSED-XC methods is 53.7%, while the average percentage of true essential proteins in the different proteins ranked by XC methods is 36%. In Fig 8(c), in the top 100 ranked proteins of Mus musculus, more true essential proteins are identified by LSED-XC methods than by XC methods in those different proteins. About 47% of the different proteins ranked by LSED-XC methods are true essential proteins on average, while only 25% of the different proteins ranked by XC methods are true essential proteins. In Fig 8(d), we can observe that the top 100 proteins of Drosophila melanogaster ranked by LSED-XC methods and XC methods are quite different. Specifically, the top 100 proteins ranked by LSED-SC and SC are totally different, while the average number of different proteins ranked by the other LSED-XC methods and XC methods is about 83. LSED-XC methods still find more true essential proteins than XC methods in those different proteins. Compared with the global PIN, the proteins with high LCSs measured from PSLINs are more likely to be essential. In other words, the centrality-lethality rule can be explained better in PSLINs.
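The comparison of the two top-100 sets can be sketched as below. The function name, the toy rankings and the use of k = 4 instead of 100 are assumptions chosen to keep the illustration small.

```python
def compare_top_k(ranked_lsed, ranked_xc, essential, k=100):
    """Count the proteins that differ between two top-k sets and report
    the fraction of true essential proteins on each side of the difference."""
    top_lsed, top_xc = set(ranked_lsed[:k]), set(ranked_xc[:k])
    only_lsed = top_lsed - top_xc   # ranked in the top k by LSED-XC only
    only_xc = top_xc - top_lsed     # ranked in the top k by XC only

    def essential_fraction(proteins):
        return len(proteins & essential) / len(proteins) if proteins else 0.0

    return len(only_lsed), essential_fraction(only_lsed), essential_fraction(only_xc)


# Toy rankings with hypothetical proteins.
toy_lsed = ["a", "c", "d", "b", "e", "f"]
toy_xc = ["a", "b", "e", "f", "c", "d"]
toy_essential = {"a", "c", "d", "e"}
result = compare_top_k(toy_lsed, toy_xc, toy_essential, k=4)
```

When both rankings contain at least k distinct proteins, the two difference sets have the same size, which is why a single count such as DC(38) describes both methods.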

The Limitations of Centrality Methods Applied to PSLINs
In Figs 3, 4, and 6, we can observe that, in some situations, the percentages of true essential proteins identified by LSED combined with certain centrality methods, such as CC, IC, and EC, are lower than those obtained by applying the same centrality methods to the global PINs. The likely reason is that these centrality methods are not applicable to networks with disconnected components [42,43]. Compared with a global PIN, the PSLINs tend to be smaller and less connected, and they contain more disconnected components. Closeness Centrality (CC) breaks down when the network contains disconnected components. Many proteins in PSLINs receive a CC score of 0, which has no power to measure protein essentiality, whereas only a few proteins receive a CC score of 0 in the global PIN. Therefore, more proteins can be effectively ranked based on the global PIN than based on PSLINs. IC is another closeness-based centrality method, similar to CC, and faces the same problem [42]. Eigenvector Centrality (EC) is also not applicable to networks with disconnected components [43]. Therefore, not every centrality method is suitable for calculating the centrality scores of proteins in PSLINs, and centrality methods better suited to PSLINs with disconnected components are expected to be proposed in the future.
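The failure of closeness centrality on disconnected networks can be illustrated with a small sketch. It assumes the textbook definition, in which the score of a node is (N - 1) divided by the sum of its shortest-path distances to all other nodes, and an unreachable node counts as infinitely far away, so the score becomes 0. The CC variant used in this study may treat unreachable nodes differently; the function names and the toy networks below are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path distances (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(adj, node):
    """Textbook closeness: (N - 1) / sum of distances to all other nodes.

    Assumption: if any node is unreachable, its distance is infinite,
    so the score is 0.
    """
    n = len(adj)
    dist = bfs_distances(adj, node)
    if len(dist) < n:  # some node cannot be reached from this one
        return 0.0
    total = sum(dist.values())
    return (n - 1) / total if total > 0 else 0.0


# A connected toy network: path A - B - C.
connected = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
# The same network plus a separate component D - E.
disconnected = {"A": ["B"], "B": ["A", "C"], "C": ["B"], "D": ["E"], "E": ["D"]}

print({v: closeness(connected, v) for v in connected})
print({v: closeness(disconnected, v) for v in disconnected})
```

In the connected network the central node B scores highest; once the second component is added, every node scores 0 under this definition and the ranking carries no information.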

Different Reliability of Centrality Scores Calculated from Different PSLINs
The reliability of centrality scores differs from one PSLIN to another. Take DC as an example. Table 6 compares the number of true essential proteins in the top 100 proteins ranked by DC in each PSLIN of each species. The accuracies of the rankings obtained from different PSLINs differ, and more essential proteins are ranked in the top 100 in larger networks.
In LSED, the confidence levels of PSLINs are used to measure the reliability of centrality scores computed from different PSLINs, and the confidence level of a PSLIN is proportional to its size. The confidence levels of PSLINs differ because different PSLINs play different roles and have different degrees of importance in cell activities. According to incomplete statistics, as shown in Table 7, neither the numbers of proteins nor the numbers of essential proteins are evenly distributed across PSLINs. It is clearly observed that the number of essential proteins in [...]. Highly connected proteins play an important role in maintaining the basic structure of a PSLIN, and the whole PSLIN will collapse if these proteins are removed. The collapse of a larger PSLIN will contribute more to the destruction of the global PIN. Thus, in LSED, the confidence level of a PSLIN is proportional to its size.
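A size-proportional confidence level can be sketched as follows. Only the proportionality to PSLIN size comes from the text above. Normalizing by the largest PSLIN, weighting a protein's per-PSLIN centrality score by the confidence level, and taking the maximum over PSLINs are illustrative assumptions, not a restatement of the LSED formula. The compartment names, sizes, and scores are made up.

```python
def confidence_levels(pslin_sizes):
    """Confidence level of each PSLIN, proportional to its size.

    Assumption: sizes are normalized by the largest PSLIN, so the largest
    network gets confidence 1.0.
    """
    largest = max(pslin_sizes.values())
    return {name: size / largest for name, size in pslin_sizes.items()}


def weighted_score(scores_by_pslin, confidence):
    """Combine one protein's per-PSLIN centrality scores.

    Assumption: each score is weighted by the confidence level of its PSLIN
    and the maximum weighted score is kept.
    """
    return max(confidence[name] * score for name, score in scores_by_pslin.items())


# Made-up PSLIN sizes (number of proteins per compartment).
sizes = {"nucleus": 2000, "mitochondrion": 500}
conf = confidence_levels(sizes)

# Made-up normalized centrality scores of one protein in each PSLIN.
protein_scores = {"nucleus": 0.4, "mitochondrion": 0.9}
print(conf, weighted_score(protein_scores, conf))
```

Under these assumptions, a high score in a small PSLIN is discounted relative to a moderate score in a large one, which mirrors the observation that rankings from larger networks are more reliable.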
In conclusion, the centrality-lethality rule is rechecked in the scope of PSLINs by using the LSED method, which can be combined with any centrality method to identify essential proteins from PSLINs. By comparing the prediction accuracy of LSED-XC methods on PSLINs with that of the corresponding centrality methods on the global PINs, we have found that proteins with high LCSs measured from PSLINs are more likely to be essential and that the performance of centrality methods can be improved by LSED. From the biological perspective, certain activities are carried out in each PSLIN, and the removal of proteins with high centrality scores will disturb these activities. As a result, the organism cannot survive or grow. From the topological perspective, proteins with high centrality scores play important roles in maintaining the basic structure of a PSLIN, and the whole PSLIN will collapse if these proteins are removed. Moreover, the collapse of PSLINs will lead to the collapse of the global PIN. Thus, the centrality-lethality rule is better supported in the scope of PSLINs, and the essentiality of proteins can be predicted more accurately by LCSs measured from PSLINs.

Table. The true essential proteins in the top percentages of proteins ranked by LSED-XC methods and XC methods in Saccharomyces cerevisiae. (XLS)
S3 Table. The true essential proteins in the top percentages of proteins ranked by LSED-XC methods and XC methods in Homo sapiens. (XLS)
S4 Table. The true essential proteins in the top percentages of proteins ranked by LSED-XC methods and XC methods in Mus musculus. (XLS)
S5 Table. The true essential proteins in the top percentages of proteins ranked by LSED-XC methods and XC methods in Drosophila melanogaster.