Construction and Analysis of the Protein-Protein Interaction Networks Based on Gene Expression Profiles of Parkinson's Disease

Background: Parkinson's disease (PD) is one of the most prevalent neurodegenerative diseases. Improving the diagnosis and treatment of this disease is essential, as there is currently no cure. Microarray and proteomics data have revealed abnormal expression of several genes and proteins responsible for PD. Nevertheless, few studies involving PD-specific protein-protein interactions have been reported.
Results: Microarray-based gene expression data and protein-protein interaction (PPI) databases were combined to construct the PPI networks of differentially expressed (DE) genes in post-mortem brain tissue samples of patients with Parkinson's disease. Samples were collected from the substantia nigra and the frontal cerebral cortex. From the microarray data, two sets of DE genes were selected by 2-tailed t-tests and Significance Analysis of Microarrays (SAM), and each set was used separately to construct a Query-Query PPI (QQPPI) network. Several topological properties of these networks were studied. Nodes with High Connectivity (hubs) and High Betweenness Low Connectivity (bottlenecks) were identified as the most significant nodes of the networks. Three- and four-cliques were identified in the QQPPI networks. These cliques contain most of the topologically significant nodes of the networks and form core functional modules consisting of tightly knitted sub-networks. Thirty-seven hitherto unreported PD disease markers were identified based on their topological significance in the networks. Of these 37 markers, eight were significantly involved in the core functional modules and showed significant change in co-expression levels. Four (ARRB2, STX1A, TFRC and MARCKS) of the 37 markers were found to be associated with several neurotransmitters, including dopamine.
Conclusion: This study represents a novel investigation of the PPI networks for PD, a complex disease. The 37 proteins identified in our study can be considered as PD network biomarkers. These network biomarkers may serve as potential therapeutic targets for PD applications development.


Introduction
Parkinson's disease (PD) is a neurodegenerative disorder of the central nervous system. It is the second most common degenerative disorder after Alzheimer's disease, affecting more than 1% of those over the age of 55 years and more than 3% of those over the age of 75 years [1]. PD is characterized by tremor, muscle rigidity, and slowed movement (bradykinesia). The motor symptoms of PD result from the death of dopamine-generating cells in the substantia nigra, a region of the midbrain. Improving the diagnosis and treatment of this disease is essential, as there is currently no cure for PD.
For a long time, PD was considered to be a non-genetic disorder; however, around 15% of patients with PD are known to have a first-degree relative who is also affected by this disease [2].
Mutations in several specific genes have been conclusively shown to be associated with PD. These genes code for alpha-synuclein (SNCA), parkin (PRKN), leucine-rich repeat kinase 2 (LRRK2 or dardarin), PTEN-induced putative kinase 1 (PINK1), DJ-1 and ATP13A2 [3,4]. The most extensively studied PD-related genes are SNCA and LRRK2 [1]. Mutations in SNCA, LRRK2 and glucocerebrosidase (GBA) are associated with most of the PD-related cases [1]. Nevertheless, very little work has been done on protein interactions specific to the disease state.
Network science is gradually altering our view of cell biology by offering unforeseen possibilities to understand the internal organization of a cell [5]. The development of high-throughput data-collection techniques has brought new insights into our understanding of diseases. A considerable amount of time and effort must be devoted to analysing this vast amount of data if we want to understand the interrelationships among disease-related genes and proteins [5]. In 2009, Taylor et al. [6] studied gene expression based weighted Protein-Protein Interaction (PPI) networks for breast cancer. They found that loss of gene co-expression of proteins interacting within the BRCA1-associated genome surveillance complex (BASC) is associated with poor outcomes of the disease. In 2011, Lee et al. [7] constructed PPI networks of abnormally expressed genes for schizophrenia, bipolar disorder and major depression, and identified several disease markers such as SBNO2 for schizophrenia, SEC24C for bipolar disorder, and SRRT for major depression. Recently, in April 2013, Ran et al. [8] constructed and analysed PPI networks for Essential Hypertension (EH), and suggested that blood pressure variation related to EH is orchestrated by an integrated PPI network with the protein encoded by the NOS3 gene as its backbone.
In this study, PPI networks were constructed for PD using proteins encoded by genes that are differentially expressed only in the substantia nigra and frontal cerebral cortex. The PPI networks were constructed based on the following assumptions [7]: (1) The expression levels of most proteins and their mRNAs in the brain are positively correlated. (2) Proteins with similar expression patterns are more likely to interact with each other. (3) Abundant proteins participate more in biological processes.
Topological analyses were performed to identify the significant network biomarkers. The association of these biomarkers with PD-related genes and neurotransmitters was studied. Several complexes in the networks were also studied, together with the changes in co-expression level of the genes associated with these complexes from control to disease state. Thirty-seven unreported disease marker genes were identified, of which eight were significantly involved in the core functional modules and four showed strong association with several neurotransmitters, including dopamine. Thus our study may provide insights into potential targets for developing new treatments for PD.
Figure 1 gives the flowchart of the research methodology applied in this study.

Sources of microarray data
The raw data (CEL files) of microarray data series GSE8397 were downloaded from Gene Expression Omnibus (GEO) (http://www.ncbi.nlm.nih.gov/geo/) and normalized by gcRMA [9]. GSE8397 was published by Moran et al. in 2006 [10]. It contains 47 individual localized brain tissue samples of the substantia nigra (SN) (split into medial and lateral portions) and frontal cerebral cortex (FCC) from PD as well as control cases, each hybridized to the A (HG_U133A) and B (HG_U133B) GeneChips. The disease group comprised 15 samples of medial parkinsonian SN (MSN), 9 samples of lateral parkinsonian SN (LSN) and 5 samples of parkinsonian FCC. The control group comprised 8 MSN samples, 7 LSN samples and 3 FCC samples.
Our protein-interaction networks were built from the differentially expressed genes of MSN and LSN only. Initially we started a region-wise study of three parts of the brain, viz., MSN, LSN and FCC. When we performed the 2-tailed t-test and SAM, we did not obtain any differentially expressed genes for FCC. MSN and LSN taken separately yielded only a small number of differentially expressed genes. However, when we combined MSN and LSN, the analysis yielded a significant number of differentially expressed genes. Therefore the data presented in our manuscript are the collection of genes present in the combined MSN and LSN samples.

Selection of differentially expressed genes, annotation & gene ontology (GO) analysis
Both the 2-tailed t-test [7] and SAM [11] were used separately to obtain all possible differentially expressed genes from the microarray data. Expression Analysis Systematic Explorer (EASE) [12] was used to convert the Affymetrix probe IDs into gene symbols. FatiGO (http://www.fatigo.org/) [14], a module of Babelomics 4.3.0 [13], was used to extract relevant GO terms for a group of genes with respect to the rest of the genes. FatiGO was used to find the over-represented biological processes, molecular functions, cellular components and KEGG pathways [15] involving the DE genes (p-value < 0.05) (Table 1). Among the GO terms, DE genes were most abundant in the over-represented biological processes. These DE genes were considered the most significant genes in the dataset, and were therefore used for network construction.
For the sake of clarity, we denote the set of significant DE genes extracted from GeneChip A using the 2-tailed t-test by $D^{A}_{2ttt}$, the set of significant DE genes extracted from GeneChip A using SAM by $D^{A}_{SAM}$, and the set of significant DE genes extracted from GeneChip B using the 2-tailed t-test by $D^{B}_{2ttt}$. These sets of significant DE genes ($D^{A}_{2ttt}$, $D^{A}_{SAM}$ and $D^{B}_{2ttt}$) were used to construct protein-protein interaction (PPI) networks.

Construction of the QQPPI networks
Two separate approaches were taken to construct the PPI networks. First, Genes2FANs (http://actin.pharm.mssm.edu/genes2FANs/) [16] was used to construct a Query-Query PPI (QQPPI) network, i.e., a network of protein-protein interactions consisting of query nodes only. Secondly, brain tissue specific and experimentally verified data were taken from POINeT (http://poinet.bioinformatics.tw/) [17] to create another QQPPI network. The two networks constructed by Genes2FANs and POINeT were viewed separately using the open source network visualization software Cytoscape 2.8.0 (http://www.cytoscape.org/) [18]. The two networks were then merged to construct the final QQPPI network, which includes every interaction present in either of the individual networks. This final network was formatted and visualized using the graph editing software yEd (http://www.yworks.com/) [19]. The same procedure was repeated for the datasets $D^{A}_{2ttt}$, $D^{A}_{SAM}$ and $D^{B}_{2ttt}$. For the sake of clarity, we denote the merged QQPPI network formed by $D^{A}_{2ttt}$ as $N^{A}_{2ttt}$, the merged QQPPI network formed by $D^{A}_{SAM}$ as $N^{A}_{SAM}$, and the merged QQPPI network formed by $D^{B}_{2ttt}$ as $N^{B}_{2ttt}$ (Figures 2, 3 and S1). Note that the QQPPI network is built in such a way that a protein occurs only once in each of the networks.
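The merging step can be illustrated with a minimal Python sketch. It assumes that each source network is available as a list of protein pairs and that merging means taking the union of undirected edges whose endpoints are both query proteins. The function names and the toy identifiers (P1, P2, X9, ...) are hypothetical and are not taken from Genes2FANs or POINeT.

```python
def as_undirected(pairs):
    """Turn (a, b) pairs into a set of unordered edges, dropping self-loops."""
    return {frozenset(pair) for pair in pairs if pair[0] != pair[1]}


def merge_qqppi(edges_a, edges_b, query_proteins):
    """Union of two edge lists, restricted to edges between query proteins."""
    query = set(query_proteins)
    union = as_undirected(edges_a) | as_undirected(edges_b)
    # Keep an edge only if both of its endpoints are query proteins.
    return {edge for edge in union if edge <= query}


# Toy inputs (hypothetical identifiers, not data from the study).
source_a = [("P1", "P2"), ("P3", "P4")]
source_b = [("P2", "P1"), ("P1", "P5"), ("P1", "X9")]  # X9 is not a query protein
merged = merge_qqppi(source_a, source_b, ["P1", "P2", "P3", "P4", "P5"])
print(sorted(tuple(sorted(edge)) for edge in merged))
```

Storing each edge as an unordered set means that an interaction reported as (P1, P2) in one source and (P2, P1) in the other is counted once, which is consistent with each protein and each interaction appearing only once in the merged network.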

Topological parameters of QQPPI networks
We analysed the topological properties of these networks using the tYNA (http://tyna.gersteinlab.org/) [20] web interface. Global properties of the networks are given in Table 2. The topologically significant nodes were extracted from the networks in two steps:
(1) Nodes with degree greater than or equal to the sum of the mean and twice the standard deviation (S.D.) of the degree distribution, i.e., mean + 2*S.D., were taken as hubs, i.e., High Connectivity (HC) nodes [21] (Table 3).
(2) Betweenness centrality was then taken as the parameter to extract further significant nodes. Betweenness centrality of the nodes in the QQPPI networks (Figures 2, 3 and S1) showed a varied distribution. Only a handful of nodes had a betweenness score greater than 1000, whereas almost 40-45% of nodes had zero betweenness. The nodes were sorted by betweenness in descending order, and nodes with betweenness scores lying in the top 50% of the distribution were selected. Among these, the nodes with degree less than the cut-off degree for HC nodes and directly connected to at least 2 HC nodes were selected as bottlenecks, i.e., High Betweenness but Low Connectivity (HBLC) nodes.
Table 1. Gene Ontology (GO) analysis of DE genes.
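The two-step selection of hubs and bottlenecks can be sketched as follows. This is only an illustration under stated assumptions: the standard deviation is taken as the population S.D. (the text does not say which form was used), "top 50% of the distribution" is read as the top half of the nodes ranked by betweenness, and the betweenness scores are supplied as an input. The graph and all names are hypothetical.

```python
from statistics import mean, pstdev


def hub_cutoff(degree):
    """Cut-off degree for hubs: mean + 2 * S.D. (population S.D. assumed)."""
    values = list(degree.values())
    return mean(values) + 2 * pstdev(values)


def select_hubs(adjacency):
    """High Connectivity (HC) nodes: degree at or above the cut-off."""
    degree = {node: len(nbrs) for node, nbrs in adjacency.items()}
    cutoff = hub_cutoff(degree)
    return {node for node, d in degree.items() if d >= cutoff}


def select_bottlenecks(adjacency, betweenness, top_fraction=0.5, min_hub_neighbours=2):
    """HBLC nodes: top-ranked by betweenness, below the hub cut-off,
    and directly connected to at least `min_hub_neighbours` hubs."""
    degree = {node: len(nbrs) for node, nbrs in adjacency.items()}
    cutoff = hub_cutoff(degree)
    hubs = select_hubs(adjacency)
    ranked = sorted(betweenness, key=betweenness.get, reverse=True)
    top = ranked[: int(len(ranked) * top_fraction)]
    return {
        node
        for node in top
        if degree[node] < cutoff and len(adjacency[node] & hubs) >= min_hub_neighbours
    }


# Toy graph: two hubs with ten leaves each, joined through one bridging node B.
adjacency = {"H1": {"B"}, "H2": {"B"}, "B": {"H1", "H2"}}
for hub in ("H1", "H2"):
    for i in range(10):
        leaf = f"{hub}_leaf{i}"
        adjacency[hub].add(leaf)
        adjacency[leaf] = {hub}

# Toy betweenness scores supplied as input (leaves have zero betweenness).
betweenness = {node: 0.0 for node in adjacency}
betweenness.update({"H1": 165.0, "H2": 165.0, "B": 121.0})

hubs = select_hubs(adjacency)
bottlenecks = select_bottlenecks(adjacency, betweenness)
print(sorted(hubs), sorted(bottlenecks))
```

In this toy graph the bridging node has degree 2, well below the hub cut-off, yet it links both hubs, which is the pattern the HBLC criterion is meant to capture.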

Identification of cliques
In this study, cliques with 3 nodes and 4 nodes (3-cliques and 4-cliques) were identified in $N^{A}_{2ttt}$, $N^{A}_{SAM}$ and $N^{B}_{2ttt}$. The cliques were identified with the help of a self-developed algorithm (File S1). To validate the correctness of the algorithm, it was run on the network obtained from POINeT and its output was compared with the list of cliques given by POINeT for that network; the results matched exactly. The in-house algorithm was necessary to find the cliques (of size three and more) in the merged networks (obtained from POINeT and Genes2FANs). Only 3-cliques and 4-cliques were obtained; higher order cliques were absent in the networks.
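For illustration, cliques of a fixed size can be enumerated by brute force as in the sketch below. This is not the algorithm of File S1, which is not reproduced here; it is a simple stand-in that tests every combination of k nodes, so it is only practical for small graphs. The adjacency dictionary and node names are hypothetical.

```python
from itertools import combinations


def find_cliques(adjacency, size):
    """Return every set of `size` nodes in which all pairs are connected."""
    cliques = []
    for nodes in combinations(sorted(adjacency), size):
        # A clique requires an edge between every pair of its members.
        if all(b in adjacency[a] for a, b in combinations(nodes, 2)):
            cliques.append(nodes)
    return cliques


# Toy graph: A, B, C, D are fully connected; E hangs off A.
adjacency = {
    "A": {"B", "C", "D", "E"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"A", "B", "C"},
    "E": {"A"},
}

three_cliques = find_cliques(adjacency, 3)
four_cliques = find_cliques(adjacency, 4)
five_cliques = find_cliques(adjacency, 5)
print(len(three_cliques), four_cliques, five_cliques)
```

The toy graph also shows why 3-cliques overlap heavily when a 4-clique is present: every 4-clique contains four 3-cliques.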

Identification of complexes containing clique forming proteins
A protein complex is an assembly of multiple proteins that interact with each other. Such complexes are a form of quaternary structure, and the proteins in a complex are linked by noncovalent protein-protein interactions. The complexes in the PPI networks were identified with the help of the CORUM database [22]. The clique-forming proteins were given as queries to the CORUM database to find the complexes containing these proteins. Furthermore, with the help of an in-house algorithm (File S2), all the proteins associated with a specific complex were identified. A cut-off was assigned for the number of query proteins in a complex. For $N^{A}_{2ttt}$, complexes containing 5 or more query proteins were listed. Similarly, for $N^{A}_{SAM}$, complexes containing 4 or more query proteins were listed. In $N^{B}_{2ttt}$, since only 2 proteins were involved in any particular complex, we did not consider this QQPPI network for complex detection. The programs to find
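The cut-off on the number of query proteins per complex can be sketched as below. The sketch assumes the complex memberships have already been exported to a mapping from complex identifier to member proteins; it does not call CORUM or reproduce File S2, and the identifiers are hypothetical.

```python
def complexes_meeting_cutoff(complex_members, query_proteins, cutoff):
    """Return complexes that contain at least `cutoff` query proteins."""
    query = set(query_proteins)
    hits = {}
    for complex_id, members in complex_members.items():
        shared = query & set(members)
        if len(shared) >= cutoff:
            hits[complex_id] = sorted(shared)
    return hits


# Toy membership table (hypothetical complexes and proteins).
complex_members = {
    "complex_1": ["P1", "P2", "P3", "P4", "P5", "P9"],
    "complex_2": ["P1", "P2", "P7"],
}
clique_proteins = ["P1", "P2", "P3", "P4", "P5"]

listed = complexes_meeting_cutoff(complex_members, clique_proteins, cutoff=5)
print(listed)
```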

Gene level co-expression analysis of interacting proteins
The Pearson correlation coefficient was used to determine the gene level co-expression of interacting proteins in the QQPPI networks ($N^{A}_{2ttt}$, $N^{A}_{SAM}$ and $N^{B}_{2ttt}$). In the QQPPI networks, the gene level co-expression of each pair of interacting proteins was used to assign a weight to the corresponding edge of the network. The percentage change in co-expression of interacting proteins was also calculated.
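A minimal sketch of the edge weighting is given below. It computes the Pearson correlation coefficient of two expression vectors and the percentage of maximum possible change between control (C) and disease (D), using the expression [{(C-D)/max(C-D)} * 100] with max(C-D) = 2 that appears in the supporting table description. The function names and the toy expression values are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the S.D.s."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def percent_of_max_change(r_control, r_disease):
    """(C - D) as a percentage of its maximum possible value, which is 2."""
    return (r_control - r_disease) / 2.0 * 100.0


# Toy expression profiles of two interacting genes (hypothetical values).
control_gene1, control_gene2 = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
disease_gene1, disease_gene2 = [1.0, 2.0, 3.0, 4.0], [8.0, 6.0, 4.0, 2.0]

r_c = pearson_r(control_gene1, control_gene2)
r_d = pearson_r(disease_gene1, disease_gene2)
print(r_c, r_d, percent_of_max_change(r_c, r_d))
```

The toy pair moves from perfect positive to perfect negative correlation, which is the largest change the measure can report.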

Comparison with the study of Moran et al. [10]
Different analytic approaches can be taken to analyse the same microarray data with different sets of goals [7]. The original contributors of the microarray data series GSE8397 were Moran et al., who focused on establishing the transcriptomic expression profile of the medial and lateral substantia nigra and the superior frontal cortex. The differentially regulated genes identified in their study were compared to the results of our study.

Study of Differential Expression (DE) of genes
The involvement of the substantia nigra (SN) in PD is well known [23,24,25]. PD-related motor symptoms mainly occur due to the depletion of up to 60% of dopaminergic neurons and the aggregation of round, hyaline neuronal cytoplasmic inclusions called Lewy Bodies (LBs) in the SN [24,25]. Significant involvement of the frontal cortex in PD has also been reported [10,25,26]. The dataset (GSE8397) provided by Moran et al. [10] is the only dataset available to date which covers tissue samples from both the substantia nigra and the frontal cerebral cortex. Therefore we have considered this dataset for our study.
Initially the microarrays in GSE8397 were analysed using the 2-tailed t-test. Each disease sample group was paired with the control sample group in the t-tests. The 2-tailed t-test measures statistical significance in terms of a test statistic t, which is given by:

$$t = \frac{\bar{x} - \bar{y}}{\sqrt{\dfrac{S_x^{2}}{n} + \dfrac{S_y^{2}}{m}}} \qquad (1)$$

where $\bar{x}$ and $\bar{y}$ are the sample means, $S_x$ and $S_y$ are the sample standard deviations, and $n$ and $m$ are the sample sizes of the two samples, x and y. Under the null hypothesis, this test returns the probability (P value) of observing a value of the test statistic as extreme or more extreme than the one obtained. Probes corresponding to a portion of the genes showed significant changes in signal intensities in the disease sample groups, as compared to the control. These genes were selected as Differentially Expressed (DE) genes.
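As a worked illustration, the t statistic can be computed from two groups of expression values as sketched below. The sketch assumes the unequal-variance form of the statistic (difference of sample means divided by the square root of the sum of each sample variance over its sample size); the P value, which requires the t distribution, is not computed here. The expression values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(x, y):
    """Two-sample t statistic, unequal-variance form (assumed)."""
    n, m = len(x), len(y)
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    return (mean(x) - mean(y)) / sqrt(stdev(x) ** 2 / n + stdev(y) ** 2 / m)


# Toy probe intensities for one gene (hypothetical values).
disease = [1.0, 2.0, 3.0, 4.0, 5.0]
control = [2.0, 3.0, 4.0, 5.0, 6.0]
t = t_statistic(disease, control)
print(t)
```

Here both groups have sample variance 2.5 and five samples, so the denominator is 1 and t equals the difference of the means.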
Previously, the 2-tailed t-test has been successfully used to select differentially expressed genes from microarray datasets [7]. However, the 2-tailed t-test does not provide any information on whether a gene is up-regulated or down-regulated. Therefore, Significance Analysis of Microarrays (SAM) was used to identify up-regulated (UR) or down-regulated (DR) DE genes in the disease state. SAM calculates a test statistic for the relative difference in gene expression based on permutation analysis of the expression data, and estimates the False Discovery Rate (FDR) [27], which is given by:

$$\mathrm{FDR} = \frac{\text{Median (or 90th percentile) of the number of falsely called genes}}{\text{Number of significant genes}} \qquad (2)$$

In SAM, fold changes are also specified to guarantee that significant genes change by at least a pre-specified amount. This means that the ratio of the average expression levels of a gene under the two conditions must be greater than the fold change for the gene to be called positive, and less than the inverse of the fold change for it to be called negative. In this way, SAM gives better results in terms of differential expression than the 2-tailed t-test, as the latter does not take fold changes into account when determining the significance of average gene expression levels. 1443 and 1518 DE genes were reported using the 2-tailed t-test (P < 0.001) and SAM (FDR 0.19%) respectively from GeneChip A (HG_U133A). Of the 1518 DE genes reported by SAM, 293 were up-regulated (UR) and 1225 were down-regulated (DR).
Table 3. Cut-off determination for hubs (HC nodes).
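The FDR ratio and the fold-change rule can be illustrated with the short sketch below. It does not reimplement SAM: the counts of falsely called genes per permutation are taken as given, the median is used as the summary, and expression values are assumed to be on a linear (not log) scale so that a ratio is meaningful. All names and numbers are hypothetical.

```python
from statistics import median


def estimate_fdr(false_calls_per_permutation, n_significant):
    """Median number of falsely called genes over the number called significant."""
    return median(false_calls_per_permutation) / n_significant


def direction_by_fold_change(mean_disease, mean_control, fold_change):
    """Classify a gene by the ratio of its mean expression in the two conditions."""
    ratio = mean_disease / mean_control
    if ratio > fold_change:
        return "up"
    if ratio < 1.0 / fold_change:
        return "down"
    return "unchanged"


# Toy values (hypothetical).
fdr = estimate_fdr([2, 3, 4], n_significant=1500)
calls = [
    direction_by_fold_change(30.0, 10.0, fold_change=2.0),
    direction_by_fold_change(10.0, 30.0, fold_change=2.0),
    direction_by_fold_change(12.0, 10.0, fold_change=2.0),
]
print(fdr, calls)
```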
A similar methodology (2-tailed t-test at P < 0.001 and SAM at FDR 0.19%) was followed to analyse GeneChip B (HG_U133B), but no significant DE genes were found. However, when we relaxed the P value threshold of the 2-tailed t-test to P < 0.05, 1606 genes were found to be DE.
These DE genes were selected for subsequent ontological analyses followed by network analyses as their abnormal gene expression profiles in disease state indicated probable involvement in disease pathology.

Functional analysis of DE genes
The DE genes were subjected to FatiGO [14] for functional analysis. The over-represented GO terms (P value < 0.05) were considered. Among these GO terms, the over-represented biological processes contained a larger number of DE genes than the other GO terms and KEGG pathways (Table 1). Therefore, the DE genes involved in the biological processes were selected in our study for subsequent network generation, following a similar approach presented in a previous study [28]. For the dataset obtained from GeneChip A (HG_U133A) using the 2-tailed t-test (P < 0.001), 779 genes (distributed among 792 biological processes) were chosen as significant DE genes ($D^{A}_{2ttt}$). Similarly, for the dataset obtained from GeneChip A (HG_U133A) using SAM, 207 genes (distributed among 381 biological processes) were chosen as significant DE genes ($D^{A}_{SAM}$). For the dataset obtained from GeneChip B (HG_U133B) using the 2-tailed t-test (P < 0.05), 221

Topological analyses of QQPPI networks
A PPI network is commonly represented as an undirected graph (edges have no direction), $G = (V, E)$, where $V$ is the set of nodes (proteins) and $E \subseteq \{(u,v) \mid u,v \in V\}$ is the set of edges (protein interactions). Thus the networks we studied are undirected and unweighted protein-protein interaction networks based on DE genes of PD microarray data.
QQPPI networks can be characterized by several topological parameters. One of the most basic yet essential of these parameters is node degree, or connectivity, which is the number of edges incident on a particular node. For a node $v \in V$, the set of nodes directly connected to it is $N(v) = \{u \in V \mid (u,v) \in E\}$, and its degree is $|N(v)|$. High connectivity (HC) of a node indicates that the node (protein) has direct interaction (physical interaction and/or complex formation) with many other distinct nodes (proteins). Proteins with high connectivity are considered to be essential hubs of the network, whose removal would result in an overall collapse of the global structure of the network [6]. We have extracted hubs from the QQPPI networks using the criterion described in section 2.4. Table 4 gives the number of hubs obtained from the QQPPI networks. Hub genes identified in the QQPPI networks are listed in Tables 5, 6 and 7. Betweenness centrality of a node v is given by the expression:

$$C_B(v) = \sum_{s \neq v \neq t \in V} \frac{\sigma_{st}(v)}{\sigma_{st}}$$

where $\sigma_{st}$ is the total number of shortest paths from node s to node t, and $\sigma_{st}(v)$ is the number of those shortest paths that pass through v. Betweenness centrality quantifies the flow of information through a node in the network. In the case of a PPI network, it specifies how a node influences the communication among other nodes. Therefore, in a QQPPI network, betweenness centrality helps to locate important but not very highly connected nodes. Recent studies [29,30,31,32] have shown that node connectivity might not be the only influential parameter to characterize biological networks. Goñi et al. [33] described that, in the case of neurodegenerative diseases, less extensively connected proteins are much more appropriate therapeutic targets than highly connected ones, as the critical role of highly connected nodes (hubs) in the network modules prevents them from substantial fluctuation. Recently, it was shown that betweenness centrality can also be an important parameter for finding lowly connected (non-hub) but important nodes [34,35].
Table 9. List of complexes for the networks $N^{A}_{2ttt}$ and $N^{A}_{SAM}$.
Proteins with low connectivity but high betweenness may play a key role in the modular structure of the yeast interactome. Gursoy et al. [36] studied the properties of High Betweenness but Low Connectivity (HBLC) nodes, and their importance in the context of biological networks. Nodes with high betweenness but low connectivity are also considered as bottlenecks [35]. Yu et al. [35] suggested that HBLC nodes are more essential, and found betweenness to be a more significant indicator of essentiality than degree. Table 4 gives the number of bottlenecks obtained from the QQPPI networks. Tables 5, 6 and 7 list the bottlenecks of our QQPPI networks. Figure 4 represents the graphical structure of a simple PPI network containing hubs and bottlenecks.
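For readers who wish to reproduce the centrality scores outside tYNA, betweenness centrality on an undirected, unweighted graph can be computed with Brandes' algorithm, sketched below. This is an illustrative implementation and not the code used by tYNA; the scores are unnormalised, each unordered pair of nodes is counted once, and the toy graph is hypothetical.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Unnormalised betweenness for an undirected, unweighted graph."""
    score = {v: 0.0 for v in adjacency}
    for s in adjacency:
        # Breadth-first search from s: count shortest paths and predecessors.
        sigma = {v: 0 for v in adjacency}
        dist = {v: -1 for v in adjacency}
        preds = {v: [] for v in adjacency}
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies from the farthest nodes to s.
        delta = {v: 0.0 for v in adjacency}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every unordered pair (s, t) was visited from both ends, so halve.
    return {v: b / 2 for v, b in score.items()}


# Toy graph: two small hubs (H1, H2) joined only through the node B.
adjacency = {
    "L1": {"H1"}, "L2": {"H1"}, "H1": {"L1", "L2", "B"},
    "B": {"H1", "H2"},
    "H2": {"B", "L3", "L4"}, "L3": {"H2"}, "L4": {"H2"},
}
bc = betweenness_centrality(adjacency)
print(bc["B"], bc["H1"], bc["L1"])
```

In this toy graph the node B has only two neighbours but carries all nine shortest paths between the two sides, so its betweenness equals that of the better connected node H1.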

Identification of cliques & complexes
A clique $Q \subseteq V$ is a subset of the vertices of $G$ (refer to section 3.3) such that $\forall i,j \in Q,\ i \neq j : \{i,j\} \in E$. In a PPI network, a clique signifies that every pair of its proteins physically interacts. Cliques have been used to identify functional units [37] and physical complexes [38] in PPI networks. Several three- and four-cliques were identified in the QQPPI networks using a self-developed algorithm (refer to section 2.5). Most of these cliques overlap. Table 8 shows the number of cliques identified in the QQPPI networks ($N^{A}_{2ttt}$, $N^{A}_{SAM}$ and $N^{B}_{2ttt}$). Table 9 shows the complexes formed by individual and overlapping cliques in $N^{A}_{2ttt}$ and $N^{A}_{SAM}$. For each QQPPI network ($N^{A}_{2ttt}$, $N^{A}_{SAM}$ and $N^{B}_{2ttt}$), 3-cliques and 4-cliques were combined to detect tightly knitted sub-networks, which are the core functional modules in the QQPPI networks [7] (Figures 2, 3 and S1). Table S4 lists the nodes in the functional modules, along with their connectivity, betweenness, and their numbers of occurrences in 3- and 4-cliques. For each QQPPI network, it can be observed that most of the hubs and bottlenecks belonged to the core functional modules. Several cliques in the sub-networks belonging to $N^{A}_{2ttt}$ and $N^{A}_{SAM}$ were found to be involved in already known protein complexes (Table 9).

Gene level co-expression analysis of proteins interacting within a complex
The Pearson correlation coefficient (r) is a measure of the strength of linear dependence between two variables, taking a value between +1 and -1 inclusive. It is defined as the covariance of the two variables divided by the product of their standard deviations. Tables 10 and 11 list the values of the Pearson correlation coefficient (r) for pairs of interacting complex-forming nodes in the control and disease states, along with the change between the two states (in $N^{A}_{2ttt}$ and $N^{A}_{SAM}$ respectively). Tables S5, S6 and S7 show the Pearson correlation coefficient (r) of proteins interacting within cliques, along with the net difference of r between control and disease samples and the percentage of maximum possible change, in the core functional modules detected in $N^{A}_{2ttt}$, $N^{A}_{SAM}$ and $N^{B}_{2ttt}$ respectively. The Spliceosome complex (ID: 351) was found to be the most significant in terms of change in co-expression in $N^{A}_{2ttt}$ (Table 10). Moreover, the Ksr1-CK2-MEK-14-3-3 complex, PDGF treated (ID: 5936), shows a significant difference in co-expression value in $N^{A}_{SAM}$ (Table 11).

Association of disease markers with cliques and neurotransmitters
Having identified the topologically significant (HC and HBLC) nodes, we then set out to study their association with PD. We used the Genotator meta-database [39] and the text mining engine PubMed (http://www.ncbi.nlm.nih.gov/pubmed) for this purpose. 13 hubs and 15 bottlenecks in $N^{A}_{2ttt}$ and 3 hubs and 9 bottlenecks in $N^{A}_{SAM}$ were found to be associated with PD (Table 12). However, 6 hubs and 26 bottlenecks in $N^{A}_{2ttt}$ and 2 hubs and 5 bottlenecks in $N^{A}_{SAM}$ were unreported for PD (Tables 13 and 14). Due to the lack of topologically significant nodes in $N^{B}_{2ttt}$, we did not consider $N^{B}_{2ttt}$ for further analysis. Thus 39 (6 + 26 + 2 + 5 = 39) nodes were obtained from our QQPPI networks which were not previously known to be associated with PD. Among these 39 nodes, 2 nodes (IQGAP1 and PARD3) were common to both $N^{A}_{2ttt}$ and $N^{A}_{SAM}$. Therefore, these 37 (39 - 2 = 37) topologically significant nodes (hubs and bottlenecks) were considered as disease biomarkers in our study. The list of these genes, along with their symbols, names and brief descriptions of their functions, is shown in Tables 15 and 16.

Gene symbol | Gene name | Description
CDC25B | cell division cycle 25B | CDC25B is a member of the CDC25 family of phosphatases which activates the cyclin dependent kinase CDC2 by removing two phosphate groups and it is required for entry into mitosis.
IARS | isoleucyl-tRNA synthetase | It catalyzes the aminoacylation of tRNA by their cognate amino acid. It is thought to be among the first proteins that appeared in evolution.
*CTNNA1 | catenin (cadherin-associated protein), alpha 1, 102 kDa | Protein encoded by this gene associates with the cytoplasmic domain of a variety of cadherins.
PTPN3 | protein tyrosine phosphatase, non-receptor type 3 | The protein encoded by this gene is a member of the protein tyrosine phosphatase (PTP) family which are signaling molecules that regulate a variety of cellular processes including cell growth, differentiation, mitotic cycle, and oncogenic transformation.
#TFRC | transferrin receptor | It is necessary for development of erythrocytes and the nervous system.
VASP | vasodilator-stimulated phosphoprotein | It is a member of the Ena-VASP protein family. It contains an EVH1 N-terminal domain that binds proteins containing E/DFPPPPXD/E motifs and targets Ena-VASP proteins to focal adhesions.
MAP3K7IP2 | MAP3K7 binding protein 2 | The protein encoded by this gene is an activator of MAP3K7/TAK1, which is required for the IL-1 induced activation of nuclear factor kappaB and MAPK8/JNK.
ADAM17 | ADAM metallopeptidase domain 17 | This gene encodes a member of the ADAM (a disintegrin and metalloprotease domain) family which has been implicated in a variety of biologic processes like fertilization, muscle development, and neurogenesis.
WEE1 | WEE1 homolog (S. pombe) | This gene encodes a nuclear protein, which is a tyrosine kinase belonging to the Ser/Thr family of protein kinases.
These 37 unique disease markers (from $N^{A}_{2ttt}$ and $N^{A}_{SAM}$) were then analysed in detail for their association with cliques and neurotransmitters. Interestingly, it was found that 8 (CSNK2A1, CLTC, PARD3, IQGAP1, ACTB, ACTG1, CTNNA1 and GSN) of the 37 nodes were strongly associated with cliques that form the core functional modules of the networks. Furthermore, significant changes in co-expression levels were observed between control and disease states in most of these core-forming nodes (Table 17).
These 37 unreported proteins may be considered as important disease marker genes. However, the 8 clique-forming proteins and the 4 neurotransmitter (including dopamine) associated proteins showed significant topological and functional importance in the QQPPI networks. Therefore, these 12 (8 + 4) proteins may be considered as key disease markers or biomarkers for PD. These proteins are called biomarkers for five reasons: (1) They were found to be differentially expressed in PD-related microarray datasets. (2) The proteins corresponding to these genes are the most topologically significant nodes (hubs and bottlenecks) in the protein-protein interaction networks. (3) They showed significant involvement in the known complexes. (4) They showed involvement with PD-associated neurotransmitters. (5) They were not previously known to be associated with PD.

Comparison with the study of Moran et al.
Moran et al. reported several genes as confirmed PD-associated sequences or as a first PD expression signature [10]. A very important finding of their study concerned a series of 25 highly DE sequences which map to known PARK loci. It was proposed in their study that these 25 sequences represented candidates for as yet unidentified disease-causing genes. Interestingly, the results of our study had very little overlap with their outcomes. Of the 25 sequences reported in their study, only 1 was common to the data points in $D^{A}_{2ttt}$ (VAV3), 3 were common to the data points in $D^{A}_{SAM}$ (MDH1, VAV3, CDC42) and 1 was common to the data points in $D^{B}_{2ttt}$ (CDC42). Of these, CDC42 was the only protein which acted as a significant node: as a hub in $N^{A}_{SAM}$ and as a bottleneck in $N^{B}_{2ttt}$. It is interesting to note that CDC42 was recently proposed in a PPI network-based study to play critical roles in PD [53].
However, one should keep in mind that these studies had different goals, so a difference in the final outcomes is to be expected. Also, this study relies on an extensive statistical, topological and functional analysis to determine significant disease markers, which was not performed in the previous study.

Limitations
Genes2FANs combines protein interaction data from DIP [54], MINT [55], BIND [56], HPRD [57], BioGRID [58], InnateDB [59], KEGG [60], IntAct [61], PPID [62], Ma'ayan et al. [63], Stelzl et al. [64], Rual et al. [65] and Yu et al. [66]. Similarly, POINeT combines protein interaction data from DIP, MINT, BIND, HPRD, BioGRID, IntAct, MIPS [67], CYGD [68] and MPact [69]. Hence, by merging the QQPPI networks formed by both Genes2FANs and POINeT, it was possible to access PPI data from all of these 14 databases in this study. Any insufficient or outdated information in the databases will have an effect on our results. To minimize this error, we performed our studies using the information of the above mentioned databases updated until May 2014. However, the information in most of the databases is incomplete. Hence, markers whose PPI data were not included in the above mentioned open source databases could not be included in this study.
Furthermore, the incompleteness of the human interactome could lead to data insufficiency, resulting in biased topological analyses. In this study, the PPI networks were constructed based on the assumption that the expression level of most of the proteins and mRNAs were positively correlated, but this might not be true for all cases. Furthermore, due to post-transcriptional and translational regulations, the correspondence between expression of a gene and its protein is complicated. It was not possible to incorporate protein expression in our study.

Conclusion
Differentially expressed genes in post-mortem brain samples of patients with PD have been identified in this study. Gene expression data and PPI data were used for topological analyses of protein-protein interactions for PD. Two sets of DE genes were selected from the microarray data separately using 2-tailed t-tests and SAM. These two sets of DE genes were used separately to construct QQPPI networks. Several topologically significant nodes, e.g., hubs and bottlenecks, were identified as biologically significant nodes in the networks, as it has already been established that hubs and bottlenecks correspond to biologically significant proteins with respect to the disease. With this approach, we have identified 37 proteins in our QQPPI networks which were not previously known to be associated with PD. Three- and four-cliques were identified in the QQPPI networks. These cliques contain most of the topologically significant nodes of the networks and form core functional modules consisting of tightly-knitted sub-networks. Several cliques identified in our study were found to be involved in already known protein complexes associated with many biological processes. Of the 37 markers, eight (CSNK2A1, CLTC, PARD3, IQGAP1, ACTB, ACTG1, CTNNA1 and GSN) were significantly involved in the core functional modules and showed significant change in co-expression levels between disease and control state. Furthermore, proteins encoded by 4 genes (ARRB2, STX1A, TFRC, MARCKS) showed involvement with several neurotransmitters including dopamine, which plays a significant role in PD. These 12 proteins may be considered as biologically significant with respect to PD. Our study represents a novel investigation of the PPI networks for PD. The 37 network biomarkers identified in our study may serve as potential therapeutic targets for PD applications development.

Supporting Information
Figure S1 QQPPI network built from the dataset obtained using 2-tailed t-test (P < 0.05) (GeneChip B). Orange coloured square nodes represent hubs (HC nodes). Yellow coloured triangular nodes represent bottlenecks (HBLC nodes). The core functional module containing 3- and 4-cliques is represented using blue coloured edges. Non-hub non-bottleneck nodes are coloured green if they are directly connected to a hub or a bottleneck, and grey otherwise. Inset: Subset of the QQPPI network containing hubs and bottlenecks only. (JPG)
This table contains the interactions within the core functional module in the network $N^{B}_{2ttt}$, along with their Pearson correlation coefficients (r) in control (C) and disease (D) samples, net difference of r in control and disease samples (C-D) and their percentage of maximum possible change from control to disease, expressed as [{(C-D)/max(C-D)} * 100]. Here, max(C-D) is 2, as r lies within the closed interval [-1, 1]. (XLSX)
File S1 Clique finding procedure. The file contains the complete procedure, including the algorithm developed by us, which we have used to detect 3- and 4-cliques in the QQPPI networks. (DOCX)
File S2 Complex finding procedure. The file contains the complete procedure, including the algorithm developed by us, which we used to detect complexes in the QQPPI networks. (DOCX)
File S3 Connectivity and betweenness distribution of nodes in the QQPPI networks. (DOCX)