Community-Based Network Study of Protein-Carbohydrate Interactions in Plant Lectins Using Glycan Array Data

Lectins play major roles in biological processes such as immune recognition and regulation, inflammatory responses, cytokine signaling, and cell adhesion. Recently, glycan microarrays have been shown to play key roles in understanding glycobiology, allowing us to study the relationship between the specificities of glycan-binding proteins and their natural ligands at the omics scale. However, one of the drawbacks in utilizing glycan microarray data is the lack of systematic analysis tools to extract information. In this work, we attempt to group various lectins and their interacting carbohydrates by using community-based analysis of a lectin-carbohydrate network. The network consists of 1119 nodes and 16769 edges, and we have identified 3 lectins with large degrees of connectivity that play the roles of hubs. The community-based network analysis provides an easy way to obtain a general picture of the lectin-glycan interaction and many statistically significant functional groups.


Introduction
Glycans play important roles inside eukaryotic cells by binding to proteins and lipids, and they are also found in the extracellular space between cells [1]. Glycans can be grouped into two classes: linear sugars and polysaccharides. The polysaccharides consist of repeating pyranose monosaccharide rings and branched sugars, which are formed by linking various monosaccharide units [2]. Through non-covalent interactions with lectins, glycans control biochemical reactions by engaging in various biological processes such as development [3,4], coagulation [5] and response to infection by bacterial and viral agents [6]. The size of the cellular glycome is believed to be in the range of 100000-500000 glycans [7]. This large size could be attributed to combinatorial factors: oligosaccharide chains come in either linear or branched form, monosaccharide building blocks are in either α or β anomeric configurations, and monosaccharides can be linked via various carbon atoms in their sugar rings [8].
Cells exploit this complexity of the glycome to encode a massive amount of biological information, and decoding this hidden information to understand the biology of lectins and their interactions with carbohydrates is a great challenge.
Protein-carbohydrate interactions are involved in a variety of biological and biochemical processes, and, recently, attempts to understand the molecular basis of such interactions have appeared [9]. Traditional methods to probe glycan-protein recognition events include X-ray crystallography, NMR spectroscopy, the hemagglutination inhibition assay [10], enzyme-linked lectin assay [11], surface plasmon resonance [12] and isothermal titration calorimetry [13]. Although these methods have been successfully applied to elucidate the details of carbohydrate-protein interactions, they are rather labor intensive and require large amounts of carbohydrate samples. These shortcomings make the aforementioned traditional approaches unsuitable as high-throughput analytic methods [14]. On the other hand, recently, many computational methods have been suggested to study protein carbohydrate interactions [15][16][17][18][19][20][21].
Conventional methods for carbohydrate ligand detection are often cumbersome and we need sensitive and high-throughput technologies that can analyze carbohydrate-protein interactions in order to discover and differentiate oligosaccharide sequences interacting with carbohydrate binding proteins [8]. Carbohydrate micro-array based technology can serve as an appropriate method [22][23][24][25]. However, at present, one of the biggest limiting factors in utilizing the complete potential of the glycan microarray data is the lack of efficient analysis tools to extract relevant information.
For complete utilization of glycan microarray data, we need a systematic computational method [26]. Large quantities of data are generated from the analysis of the Consortium for Functional Glycomics (CFG) glycan microarray [27]. Also, predicting the glycan-binding specificity or binding motif can be a time-consuming step of scrutinizing and evaluating the linear sequences of monosaccharides in glycans [27]. The CFG offers glycan microarray data for various lectins (of both plant and animal origin) and glycan-binding antibodies. Recently, computational methods have been developed for analyzing the glycan-binding specificity from glycan array data, such as the motif-segregation method [26] and the outlier motif analysis (OMA) method [28].
In this work, we have developed a method to group various plant lectins and their interacting carbohydrates by community detection analysis of a lectin-glycan network generated from the CFG glycan microarray data. The lectin-glycan network consists of 1119 nodes (lectins and glycans) and 16769 edges (interactions). From this network, we have identified 3 lectins with large degrees of connectivity that play the roles of hubs. Additionally, we compared the results of our community detection method with other well-known clustering algorithms. We show that our method outperforms existing clustering methods in terms of both the modularity score and the number of statistically significant (p-value ≤ 0.05) glycan-specific lectin groups. We propose that this study can reveal a global organization of lectin-glycan interactions and help to identify strongly correlated lectin and glycan clusters.

Data Generation
A total of 786 glycan array files for plant lectins were downloaded from the Consortium for Functional Glycomics (CFG) using a custom-made script as of Dec 2013. CFG provides extensive glycomics resources so that one can explore functions of glycans and glycan-binding proteins that play important roles in human health and disease [http://www.functionalglycomics.org/static/consortium/consortium.shtml]. All of these 786 files were further processed into a single input file, which consists of rows of protein-carbohydrate pairs. Three datasets were generated by filtering the protein-carbohydrate pairs using relative fluorescence unit (RFU) cutoff values of 5000, 10000 and 20000. These three datasets were used for network construction and community detection. Figure 1 shows the histogram of the RFU values collected from the 786 glycan array files. The data with RFU larger than 5000 constitute only about 3.5% of the whole data. All the data are available to researchers upon request.
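The cutoff filtering described above can be sketched as follows. This is a minimal illustration, not the authors' script: the record layout `(array_id, glycan_id, rfu)`, the example values, and the choice of an inclusive comparison (`>=`) are all assumptions.

```python
# Hypothetical records: (array_id, glycan_id, rfu). All values are made up.
RECORDS = [
    ("1000180", "55", 42000.0),
    ("1000180", "60", 7500.0),
    ("1004707", "347", 12000.0),
    ("1004707", "212", 900.0),
]

# The three RFU cutoffs named in the text.
CUTOFFS = (5000, 10000, 20000)


def filter_by_rfu(records, cutoff):
    """Keep only protein-carbohydrate pairs whose RFU is at least `cutoff`.

    Whether the boundary value itself is kept is an assumption of this sketch.
    """
    return [(p, g, rfu) for p, g, rfu in records if rfu >= cutoff]


# One filtered dataset per cutoff, keyed by the cutoff value.
datasets = {cutoff: filter_by_rfu(RECORDS, cutoff) for cutoff in CUTOFFS}

for cutoff, rows in datasets.items():
    print(cutoff, len(rows))
```

Each stricter cutoff yields a subset of the previous dataset, which is why the three networks shrink monotonically in nodes and edges.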

Network Construction
To perform a systematic analysis of protein-carbohydrate interactions, we have constructed a bipartite network, where unweighted edges are assigned between proteins and carbohydrates. Each node represents a lectin or a glycan, and its identity is indicated by its array ID or glycan ID. A glycan array ID represents a specific protein under a specific condition. Therefore, two different nodes in the network may represent two different concentrations of the same protein in the glycan array experiment. The strength of a lectin-glycan interaction is measured by its RFU value, and three networks are generated using RFU cutoff values of 5000, 10000 and 20000.
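A bipartite, unweighted network of this kind can be represented with two adjacency maps. The sketch below is illustrative only; the function name `build_bipartite` and the example IDs are hypothetical, and repeated pairs are assumed to collapse into a single edge.

```python
from collections import defaultdict


def build_bipartite(pairs):
    """Build adjacency maps for a lectin-glycan bipartite network.

    `pairs` is an iterable of (array_id, glycan_id) tuples that already
    passed the RFU cutoff. Edges are unweighted, so duplicates collapse.
    """
    lectin_to_glycans = defaultdict(set)
    glycan_to_lectins = defaultdict(set)
    for array_id, glycan_id in pairs:
        lectin_to_glycans[array_id].add(glycan_id)
        glycan_to_lectins[glycan_id].add(array_id)
    return lectin_to_glycans, glycan_to_lectins


# Hypothetical pairs; the last one repeats the first on purpose.
pairs = [("1000180", "55"), ("1000180", "60"), ("1004707", "55"), ("1000180", "55")]
lectins, glycans = build_bipartite(pairs)

n_nodes = len(lectins) + len(glycans)
n_edges = sum(len(neighbors) for neighbors in lectins.values())
print(n_nodes, n_edges)
```

Keeping lectin and glycan IDs in separate maps avoids collisions if an array ID and a glycan ID happen to share the same number.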

Community Detection of a Network
We have identified the community structure of the lectin-glycan network by using the Mod-CSA method, which is a highly effective modularity optimization method [29,30,31]. Modularity is a widely used measure for determining the community structures of various networks. For a given community structure, it measures the difference between the number of intra-community edges and its expected value in a randomly rewired counterpart that preserves the degrees of nodes. Modularity (Q) is defined as:
$$Q = \sum_{i=1}^{N_c} \left[ \frac{l_i}{M} - \left( \frac{D_i}{2M} \right)^2 \right]$$
where M is the total number of edges in the network, N_c is the number of communities, l_i is the number of edges within community i and D_i is the sum of degrees of nodes in community i. The value of Q ranges between −1 and 1; it is close to 1 for a highly modular community structure and 0 for a random community structure [32].
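To make the quantities M, l_i and D_i concrete, the following sketch evaluates the modularity of a fixed partition, assuming the standard form Q = Σ_i [ l_i/M − (D_i/2M)² ]. It only scores a given partition; it is not Mod-CSA and performs no optimization. The function name and the toy graph are illustrative.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Return Q = sum_i [ l_i / M - (D_i / (2 M))**2 ] for a given partition.

    `edges` lists each undirected edge once as (u, v), with no self-loops.
    `community_of` maps every node to its community label.
    """
    m = len(edges)
    internal = defaultdict(int)    # l_i: edges with both ends in community i
    degree_sum = defaultdict(int)  # D_i: summed degree of nodes in community i
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "all" for node in range(6)}

q_split = modularity(edges, two_groups)
q_single = modularity(edges, one_group)
print(q_split, q_single)
```

For the toy graph, M = 7 and each triangle has l_i = 3 and D_i = 7, so the split partition gives Q = 2(3/7 − 1/4) = 5/14, while placing all nodes in one community gives Q = 0.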

Network Visualization and Comparison with other Clustering Methods
The three lectin-glycan array networks constructed in this study were exported to Cytoscape 2.8.2, a bioinformatics package for biological network visualization and data integration [33]. To compare our clustering method with other widely used network clustering algorithms such as MCL [34,35], MCODE [36] and the greedy algorithm [32], we used the clusterMaker [37] and GLay [38] plugins, which are multi-algorithm clustering plugins for Cytoscape.

Enrichment of Glycan-specific Proteins
Enriched glycan-specific lectins within each cluster were investigated by annotating each lectin with a predetermined glycan binding specificity. Reported specificities of various lectins were extracted from the literature [39,40] and the UniProt database [41], as summarized in Table 1. The full list of all 513 protein nodes used in this study, with annotations wherever possible, is given in Table S1.
The enrichment of glycan specificities of lectins in each cluster was assessed by calculating the hypergeometric p-value. The p-value corresponds to the probability that a given lectin cluster sharing the same glycan specificity can be obtained by chance. The p-value was calculated as follows:
$$p = 1 - \sum_{i=0}^{k-1} \frac{\binom{K}{i}\binom{N-K}{n-i}}{\binom{N}{n}}$$
where N is the total number of lectins in the network, K is the number of all lectins having a particular glycan specificity, and k is the number of lectins having the particular glycan specificity in a cluster of size n. Enrichment analysis was also attempted by using the DAVID functional annotation cluster tool [http://david.abcc.ncifcrf.gov/home.jsp], which did not yield any statistically significant clusters. We then manually searched each lectin in the InterPro database [42], but only 8 unique GO terms, such as chitin-binding, carbohydrate-binding, protein binding and endopeptidase inhibitor activity, were retrieved. These GO terms are too general to signify any detailed glycan binding specificities of the corresponding lectins. Therefore, in this study, the enrichment analysis for each cluster was performed based on the annotations listed in Table 1.
Only those clusters with at least 10 protein nodes were analyzed for statistical significance.
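The hypergeometric enrichment test with the symbols N, K, n and k defined above can be sketched in a few lines. This sketch assumes the p-value is the upper-tail probability P(X ≥ k), which equals one minus the cumulative probability up to k − 1; the function name is illustrative.

```python
from math import comb


def enrichment_p_value(N, K, n, k):
    """Upper-tail hypergeometric probability P(X >= k).

    N: total number of lectins in the network
    K: lectins in the network with the specificity of interest
    n: size of the cluster
    k: lectins in the cluster with that specificity
    """
    upper = min(n, K)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)


# Toy example: 10 lectins, 5 share a specificity, and a cluster of 5
# contains all 5 of them. Only one such cluster exists out of C(10, 5) = 252.
p_all = enrichment_p_value(N=10, K=5, n=5, k=5)
p_none_required = enrichment_p_value(N=10, K=5, n=5, k=0)
print(p_all, p_none_required)
```

Summing the tail directly with exact integer binomials avoids the loss of precision that can occur when a very small p-value is computed as 1 minus a number close to 1.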

Identification of Hub Proteins
In general, biological networks possess the scale-free property [43], in which only a few nodes have many connections and serve as hubs in the network. Hub proteins were identified by calculating the node degree distribution [44] using the NetworkAnalyzer plugin of Cytoscape. The three highest-degree protein nodes were assigned as hubs (see Figure 2).

Results and Discussion
We constructed three lectin-glycan interaction networks by using the plant lectin-glycan microarray data filtered by three RFU cutoffs. The network obtained by filtering out interactions with RFU < 5000 consists of 1119 nodes (513 proteins and 606 carbohydrates) and 16769 edges. Similarly, the second network (filtering out RFU < 10000) has 1035 nodes and 12169 edges, and the third one (filtering out RFU < 20000) consists of 901 nodes and 8042 edges. Since the first network has the largest number of nodes and edges, and shows more statistically significant glycan-specific groups (discussed later) than the other two networks, the results reported henceforth refer to the first network unless otherwise indicated. The first network is shown in Figure 3, where proteins are represented as diamonds, glycans as circles and the interactions between them as edges.
The network representation enables a quick visual inspection of the glycans bound to a lectin of interest. Additionally, in order to identify hub lectins from the lectin-glycan array, the node degree distribution of the network was calculated and is shown in Figure 2. In an interaction network, proteins that interact with a large number of partners are considered hubs [45] and are essential components of biological networks [46]. The definition of a hub node is rather subjective, but based on the observation that the biggest gap lies between the 3rd and 4th largest degree nodes in Figure 2, we assigned as hub proteins the three nodes with degree larger than 220. The 3 hubs are Phloem Protein2 (PP2A1) from Arabidopsis thaliana, wheat germ agglutinin (WGA) from Triticum vulgaris (wheat), and Ricinus communis agglutinin (RCA) from Ricinus communis (castor bean).
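The hub criterion described here, ranking nodes by degree and cutting at the largest gap near the top of the ranking, can be sketched as below. The function names, the `window` parameter and all degree values except 257 (the reported degree of the top hub) are assumptions made for illustration.

```python
from collections import Counter


def node_degrees(edges):
    """Count the degree of every node from an undirected edge list."""
    degrees = Counter()
    for u, v in edges:
        degrees[u] += 1
        degrees[v] += 1
    return degrees


def hubs_by_largest_gap(degrees, window=10):
    """Rank nodes by degree and cut at the largest gap among the top ranks.

    Only gaps between consecutive nodes within the first `window` ranks are
    considered. Returns the (node, degree) pairs ranked above the cut.
    """
    ranked = sorted(degrees.items(), key=lambda item: item[1], reverse=True)
    ranked = ranked[: window + 1]
    if len(ranked) < 2:
        return ranked
    gaps = [ranked[i][1] - ranked[i + 1][1] for i in range(len(ranked) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return ranked[:cut]


# Hypothetical protein-node degrees; only 257 appears in the text.
protein_degrees = {
    "lectin_1": 257,
    "lectin_2": 240,
    "lectin_3": 225,
    "lectin_4": 150,
    "lectin_5": 140,
}
hubs = hubs_by_largest_gap(protein_degrees)
print(hubs)
```

In this toy ranking the largest gap (75) falls between the third and fourth nodes, so three hubs are returned. The input is assumed to contain protein nodes only, since hubs are defined among lectins.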
By using the Mod-CSA method, the lectin-glycan network is clustered into 4 modules (communities), which are represented by separate colors in Figure 3. The largest module consists of 168 protein nodes and 215 glycan nodes, and the smallest community contains 98 protein nodes and 133 glycan nodes.
To validate the lectin-glycan interaction network and its detected community structure, we investigated the binding specificities of the first neighbors of two plant lectins whose glycan binding specificities are well known: Sambucus nigra agglutinin (SNA) and concanavalin A (ConA). The first is a well-characterized plant lectin, elderberry bark agglutinin from Sambucus nigra, which is known to recognize the Neu5Aca2-6Gal linkage [47]. The second, ConA, is known to have specificity for mannose sugars [48,49,50]. Proper categorization of the specificities of glycan-binding proteins plays a significant role in understanding protein-glycan interactions and in utilizing glycan-binding proteins as analytical reagents.

Binding Specificities of SNA
It is well known that some plants contain more than one lectin with different sugar binding specificities [51]. The bark of the elderberry (Sambucus nigra) has two lectins SNA-I and SNA-II with different glycan binding specificities. Sambucus nigra agglutinin I (SNA-I), is the first lectin identified from the elderberry bark which has been conventionally employed to recognize Neu5Aca2-6Gal [47] or Neu5Aca2-6Galb1-4GlcNAc sequence [27]. SNA-I is composed of two polypeptides, namely chain A of 33 kDa with enzymatic activity, and chain B of 35 kDa with carbohydrate-binding activity [52]. Molecular modeling studies have indicated that the overall structure of SNA-I is quite similar to that of Ricin [53] and SNA-I belongs to the group of type 2 ribosome-inactivating proteins [52]. SNA-II is the second lectin isolated from the elderberry bark tissue, and it exhibits high affinity for glycoconjugates and Type 14 pneumococcal polysaccharides having multiple terminal D-Gal groups [51]. SNA-II consists of two identical carbohydrate-binding B-chains [51,52].
In the current lectin-glycan array network, nineteen nodes represent the SNA-I and SNA-II lectins. Of these nineteen SNA nodes, fifteen are SNA-I nodes, five from community 1 (1000180, 1000181, 1000183, 1000184 and 1000725) and ten from community 3, and the remaining four are SNA-II nodes in community 3. The 10 SNA-I nodes in community 3 show specificity for complex-type biantennary N-glycans (Table 2A). From this table we observe that almost all of the interacting glycans possess the determinant Neu5Aca2-6Gal or Neu5Aca2-6Galb1-4GlcNAc (shown in bold text in the table). Another interesting point is that glycans 527 and 479 exhibit low RFU values in Table 2. This could be because these glycans contain the Neu5Aca2-3 sequence, which is known to decrease the binding of SNA [27]. On the other hand, glycan 316 (Neu5Aca2-3Galb1-4GlcNAcb1-2Mana1-3(Neu5Aca2-6Galb1-4GlcNAcb1-2Mana1-6)Manb1-4GlcNAcb1-4GlcNAcb-Sp12) contains two sequences, one (Neu5Aca2-6Galb1-4GlcNAc) increasing the binding and the other (Neu5Aca2-3) decreasing it.
Compared to SNA-I nodes in community 3, five SNA-I nodes in community 1 (1000180, 1000181, 1000183, 1000184 and 1000725) interact with a smaller number of complex glycans (see Table 2B). Top 3 glycans possess either Neu5Aca2-6Gal or Neu5Aca2-6Galb1-4GlcNAc and show RFU values greater than 40000. Two glycans from the second half of the table (glycans 60 and 59) show lower values of RFU because of the presence of the Neu5Aca2-3Gal sequence, which is known to decrease glycan binding. All these results are consistent with existing studies on the SNA specificity [27].
The 4 SNA-II nodes (1004707, 1004708, 1004709 and 1004710) in community 3 show a preference mainly for mannose glycans or terminal GlcNAcb1-4GlcNAcb. Only two glycans (347 and 349) possess the determinant Neu5Aca2-6Galb1-4GlcNAc (Table 2C). In general, SNA-II is known to be Gal/GalNAc specific and is precipitated by glycoproteins that contain terminal GalNAc oligosaccharide chains [51]. Specifically, it shows higher affinity for D-GalNAc and terminal N-acetyl-D-galactosaminyl disaccharides than for D-Gal. Conversely, the affinity exhibited by SNA-I for D-Gal and D-GalNAc is identical [51]. However, SNA-I recognizes the Neu5Aca2-6Gal [47] or Neu5Aca2-6Galb1-4GlcNAc glycan sequence [27] with high specificity. Despite the differences in their glycan binding specificities, SNA-I and SNA-II share some similarities. For example, both lectins have similar amino acid compositions, although SNA-II contains more asparagine/aspartic acid, glycine and methionine residues [51]. Additionally, the carbohydrate-binding B-chains of both lectins induce caspase-dependent apoptosis in different insect cell lines [52]. Considering their characteristic glycan binding specificities, SNA-I and SNA-II may play different functional roles in plants.

Binding Specificities of ConA
Concanavalin A (ConA) binds to a variety of eukaryotic cells through specific interactions with saccharide-containing cellular receptors, and has been widely used as a molecular probe in studies of cell membrane dynamics and cell division [54]. ConA typically binds to glucosyl and mannosyl residues at the nonreducing termini of oligo- or polysaccharides [48,49] and it can also bind to non-terminal mannosyl residues [50].
Table 2. Three types of complex glycans for SNA proteins are listed.
In comparison to communities 1 and 2, the ConA nodes in community 3 show high preference for mannose-containing sugars, especially "N-glycan, high mannose" (Table 3C). These results agree with existing reports on ConA's binding structure and specificity for mannose-containing structures [55][56][57], in addition to the recognition of biantennary glycans, complex N-glycans [58] and terminal glucose [57].
Existing studies on SNA-I [47] and ConA [55][56][57] support the validity of the lectin-glycan interaction network and its detected community structure. Once a network is constructed, it is fairly easy to identify a lectin that binds to a certain glycan sequence by simply selecting the lectin node of interest and its first neighbors in the network. The lectins in different communities show a dramatic difference in their glycan binding specificities. The current network-based approach should thus provide a quick overall analysis of glycan microarray data on the lectin-glycan interaction without time-consuming calculations.
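Selecting a lectin node and its first neighbors amounts to a simple lookup over the edge list. The sketch below assumes each edge carries its RFU value as `(lectin_id, glycan_id, rfu)`; the IDs and RFU values are hypothetical.

```python
def first_neighbors(edges, lectin_id):
    """Return (glycan_id, rfu) pairs bound by `lectin_id`, strongest first."""
    hits = [(glycan, rfu) for lectin, glycan, rfu in edges if lectin == lectin_id]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Hypothetical edges that already passed the RFU cutoff.
EDGES = [
    ("lectin_A", "glycan_1", 41000.0),
    ("lectin_A", "glycan_2", 6200.0),
    ("lectin_B", "glycan_1", 15000.0),
    ("lectin_A", "glycan_3", 23000.0),
]

top = first_neighbors(EDGES, "lectin_A")
print(top)
```

Sorting by RFU puts the strongest binders first, which mirrors how the interacting glycans of a lectin are ranked when inspecting its specificity.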

Community Detection of the Lectin-glycan Interaction
We performed community detection of the lectin-glycan interaction network by using Mod-CSA [29], and compared the results with existing methods such as MCL [34,35], MCODE [36] and the greedy algorithm [32,38]. The number of identified communities and the modularity values obtained by the various community detection algorithms are shown in Table 4, Figure 4 and Figure 5.
From Table 4, Figure 4 and Figure 5a-d, it is clear that Mod-CSA [29] outperforms the other clustering methods in terms of the modularity score as well as the number of nodes left unclassified. The only method with a modularity score comparable to the 0.37 obtained by Mod-CSA was the fast greedy algorithm [32,38], with a modularity score of 0.30. The algorithm recognizes clusters by repetitively eliminating edges from the network and then checking again which nodes are still connected [59]. The method detected 6 communities, with the largest community containing 223 protein nodes and 298 glycan nodes (community 1), whereas the three smallest communities consist of only 4 nodes (community 4) or 3 nodes (communities 5 and 6) (see Figure 5b).
To compare the biological significance of the modules (communities) obtained by Mod-CSA and by the greedy algorithm, we calculated the numbers of statistically meaningful enriched clusters of lectins that bind to the same specific glycan. The glycan binding specificity of each protein node was identified either from the literature or from the UniProt database as described in the methods section, and the significance of each glycan-specific cluster was assessed by calculating its p-value (p ≤ 0.05). From Table 5, we observe that 44 statistically meaningful enriched clusters of lectins are identified by Mod-CSA with p-values ≤ 0.05, whereas only 33 enriched clusters are identified by the greedy algorithm. This result suggests that Mod-CSA identifies many functionally related lectin clusters that the greedy algorithm does not detect.
For example, the greedy algorithm failed to identify 15 glycan-specific lectin clusters (shown in bold in Table 5) that were identified by Mod-CSA. Conversely, 3 glycan-specific clusters (shown in italic bold in Table 5) were found by the greedy algorithm but not by Mod-CSA. Specifically, the greedy algorithm failed to identify all fucose-specific lectins, while Mod-CSA [29] successfully detected almost all fucose-specific lectins and grouped them in community 1. Similarly, the greedy algorithm identified only five mannose-related specificities in community 3, which is the major mannose-binding community detected by the greedy algorithm, whereas Mod-CSA recognized eight mannose-related specificities in community 1.
We also compared our method with other popular clustering algorithms such as MCODE [36] and MCL [34,35]. The MCODE method divided the network into a total of 23 clusters with a modularity score of −0.036. The largest cluster consists of 56 nodes whereas the smallest cluster contains only 4 nodes. However, only 3 clusters contain more than 10 protein nodes, and these were further analyzed for enrichment of glycan-specific lectin groups. The statistical analysis of these 3 clusters resulted in only 4 statistically meaningful lectin groups. From Figure 5c, we observe that a large number of single nodes (791) are not clustered into any group. This is because MCODE identifies clusters of tightly connected nodes and does not intend to assign every node in the network to a cluster [59]. The main reason for this could be that the MCODE algorithm is sensitive to noise in the network, particularly to false positive interactions [60]. Consequently, only a small number of strongly connected clusters are identified by MCODE and the rest of the nodes remain unclustered, which makes it hard to extract information from the network.
Among all four methods tested, the MCL algorithm performed worst in terms of modularity, with a value of −0.815. MCL detected 33 clusters, with the largest cluster consisting of 340 nodes and the smallest of 2 nodes (Figure 5d). Similar to MCODE, the MCL method detected only 3 clusters containing more than 10 protein nodes, and many nodes (689) in the network were not assigned to any group, making these unassigned nodes difficult to interpret. Therefore, these unassigned nodes were left out of further analysis. The MCL method resulted in only 12 statistically significant glycan-specific groups.
If the performances of MCL and MCODE are hindered by false positive interactions, MCL and MCODE may perform better with networks generated using only reliable data. To find out whether the Mod-CSA method outperforms the other methods regardless of the amount of potentially false information, we performed the enriched cluster analysis on two additional networks generated using more stringent RFU criteria, RFU ≥ 10000 and RFU ≥ 20000 (see Table S2). The results remain the same regardless of the RFU cutoff values used to generate the network. For example, the numbers of statistically significant glycan-specific groups identified by Mod-CSA are 41 and 35 using RFU cutoff values of 10000 and 20000, respectively, whereas the greedy algorithm provides 23 and 20 statistically significant glycan-specific groups. Similarly, with the MCL method, 20 and 14 statistically significant glycan-specific groups were identified (see Table S3). Surprisingly, MCODE detected no statistically significant glycan-specific lectin groups from the more stringent networks.
Finally, we compared the clusters obtained by Mod-CSA with random clusters. We divided the nodes into four random clusters with the same numbers of nodes as those detected by Mod-CSA. This process was iterated 20 times, and the average number of statistically enriched glycan-specific groups detected by random clustering was compared with that detected by Mod-CSA. The maximum and minimum numbers of significantly enriched lectin groups were 11 and 1, respectively. On average, these 20 random permutations of clusters resulted in about 7 glycan-specific lectin groups with p-value ≤ 0.05 (see Table S4). A comparison of the numbers of significantly enriched lectin groups detected by the different clustering methods is shown in Figure 6. All these results demonstrate that Mod-CSA extracts more information than the other widely used clustering methods, and that it can serve as a powerful tool for investigating the lectin-glycan interaction.
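The random-cluster baseline can be sketched as follows: shuffle the nodes into clusters of fixed sizes, count the enriched (cluster, specificity) pairs with the hypergeometric test, and repeat. This is an illustration under stated assumptions, not the authors' procedure: it partitions lectin nodes only, ignores the minimum cluster size of 10 protein nodes, and uses made-up node labels and annotations.

```python
import random
from math import comb


def p_value(N, K, n, k):
    """Upper-tail hypergeometric probability P(X >= k)."""
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


def random_partition(nodes, sizes, rng):
    """Shuffle the nodes and cut them into clusters with the given sizes."""
    shuffled = list(nodes)
    rng.shuffle(shuffled)
    clusters, start = [], 0
    for size in sizes:
        clusters.append(shuffled[start:start + size])
        start += size
    return clusters


def tally(cluster, specificity):
    """Count annotated nodes per specificity within one cluster."""
    counts = {}
    for node in cluster:
        label = specificity.get(node)
        if label is not None:
            counts[label] = counts.get(label, 0) + 1
    return counts


def count_enriched(clusters, specificity, alpha=0.05):
    """Count (cluster, specificity) pairs with hypergeometric p <= alpha."""
    all_nodes = [node for cluster in clusters for node in cluster]
    totals = tally(all_nodes, specificity)
    hits = 0
    for cluster in clusters:
        for label, k in tally(cluster, specificity).items():
            if p_value(len(all_nodes), totals[label], len(cluster), k) <= alpha:
                hits += 1
    return hits


def random_baseline(nodes, sizes, specificity, repeats=20, seed=0):
    """Return (min, max, mean) of enriched-group counts over random partitions."""
    rng = random.Random(seed)
    counts = [
        count_enriched(random_partition(nodes, sizes, rng), specificity)
        for _ in range(repeats)
    ]
    return min(counts), max(counts), sum(counts) / len(counts)


# Toy data: 40 lectin nodes, four clusters of 10, two annotated specificities.
NODES = list(range(40))
SIZES = [10, 10, 10, 10]
SPECIFICITY = {i: "mannose" for i in range(10)}
SPECIFICITY.update({i: "fucose" for i in range(10, 20)})

lowest, highest, mean = random_baseline(NODES, SIZES, SPECIFICITY)
print(lowest, highest, mean)
```

Fixing the cluster sizes keeps the comparison fair, because the hypergeometric p-value depends on cluster size as well as on how many annotated lectins a cluster contains.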

The Optimal Community Structure of the Lectin-glycan Interaction Network
It has been shown that Mod-CSA can provide globally optimal modularity partitioning of a network containing up to 2000 nodes [31]. Since our lectin-glycan network has 1119 nodes, we believe that the Mod-CSA result corresponds to the optimal grouping of the network in terms of its modularity. The optimal modularity grouping of lectins and glycans results in 4 communities with a modularity score of 0.37. We attempted to explore the relationship between all nodes within the same community on the basis of the structure and function of each lectin and the type of glycan binding specificity. Each lectin node was assigned its known glycan binding specificity, and the statistical significance of each grouping was assessed by calculating its p-value (p ≤ 0.05) (see Table 5 and Figure 4). A brief description of each community is given below.
Community 1 (Fucose specific). This is the largest community of the lectin-glycan network detected by Mod-CSA analysis and contains 168 protein nodes and 215 glycan nodes. This community is dominated by fucose-specific lectins, such as Ulex europaeus agglutinin I (UEA-I), Aleuria aurantia lectin (AAL) and Ralstonia solanacearum lectin (RSL). The fucose binding sites of RSL are very similar to the five previously reported fucose-binding sites of AAL [61]. Fucose-containing xyloglucans are known to promote signaling consequences on plant tissues [62]. The other overrepresented lectins in this community have specificity for galactose and N-acetylgalactosamine binding, with cell adhesion as their main function. The most common protein domains corresponding to these galactose-specific lectins are the H_lectin domain (PFAM ID: PF09458), which is involved in self/non-self recognition of cells through binding with carbohydrates [63], and the galactose-binding domain-like domain known as the Discoidin domain (PFAM ID: PF00754), which is found in many blood coagulation factors.
The galactose specific lectins in this community include agglutinin from Helix pomatia, Discoidin I and Discoidin II from Dictyostelium discoideum (Slime mold). Additionally, the unannotated lectins in this cluster such as 6RG, Tap1, Mubin1 show specificity for galactose or fucose sugars (see Table S5), which strongly indicates that these proteins are related to cell adhesion.
This community contains the top hub PP2A1 (1001943) with the largest node degree of 257. The other three PP2A1 nodes (1002090, 1002091 and 1002092) belong to community 2. The list of unique glycans that interact with these PP2A1 nodes are [...] [64].
Community 2 (Galb1-3GalNAc specific). This is the smallest community with 98 protein nodes and 133 glycan nodes. Community 2 is rich in N-acetylglucosamine and N-acetylgalactosamine binding lectins such as wheat germ agglutinin (WGA), Griffonia simplicifolia II (GS-II), and Sclerotium rolfsii lectin (SRL). WGA belongs to a highly conserved family of chitin-binding lectins from cereals (Gramineae), such as rye, barley, rice and wheat [65]. Chitin, a polymer of β-1,4-N-acetylglucosamine, is present in the cell wall of many fungi, in the exoskeleton and digestive tract of some insects, and in some nematodes [66]. Similarly, GS-II, also an N-acetylglucosamine-specific legume lectin, has insecticidal activity against cowpea weevil [67]. In contrast to WGA and GS-II, SRL displays strong binding to the O-linked galactose-beta-1,3-N-acetylgalactosamine disaccharide (Thomsen-Friedenreich antigen), similar to Agaricus bisporus lectin [68]. Similarly, the other N-acetylgalactosamine-specific lectins in this group are involved in the binding of the T-antigen structure Gal-beta1,3-GalNAc, e.g. Agglutinin alpha chain (Jacalin alpha chain) from Artocarpus integer (jack fruit) and Agglutinin alpha chain (MPA) from Maclura pomifera (Osage orange). PP2A1 is known to interact with diverse types of carbohydrates and may be involved in numerous recognition functions [64]. On the other hand, CFT shows a preference for the α-anomer of GalNAc and recognizes GalNAca1 sequences, and it has high affinity for the Forssman pentasaccharide and for Galb1-3GalNAc-a- [69], which is one of the overrepresented (p-value < 0.05) glycan-specific groups in this community. Lists of unique glycans for the PP2A1 and CFT nodes are summarized in Table S7.
Community 3 (Mannose specific). Protein nodes in this group are predominantly mannose-binding lectins, and nine out of twelve statistically significant glycan groups are mannose specific. Many of these mannose-specific lectins have the B_lectin (PFAM ID: PF01453) structural domain. The members of this family are mannose specific and belong to the bulb lectin super-family (Amaryllidaceae, Orchidaceae and Alliaceae). For example, Galanthus nivalis agglutinin (GNA), a mannose-specific lectin from snowdrop bulbs, is a tetrameric member of the family of Amaryllidaceae lectins that exhibit antiviral activity towards HIV [70]. Other mannose-binding lectins in this group have the Lectin_legB (PFAM ID: PF00139) structural domain and require metal ions such as Ca and Mn ions for carbohydrate binding and cell-agglutinating activities. Examples include ConA and garden pea lectin. The group also includes various high-mannose-binding lectins such as Hippeastrum hybrid lectin (HHL), Narcissus pseudonarcissus agglutinin (NPA), Salt stress-induced protein and Allium sativum agglutinin (ASA). Another mannose-binding lectin in this group with antiviral activity is Cyanovirin-N (CV-N). The antiviral activity of CV-N is mediated through specific interactions with the viral surface envelope glycoproteins gp120 and gp41, as well as with high-mannose oligosaccharides found on the HIV envelope [71].
Other lectins grouped in this community for which we could not find a reported glycan specificity include Arum maculatum agglutinin (AMA), Caragana arborescens agglutinin (CAA), Colchicum autumnale lectin (CA), and Arisaema helleborifolium Schott lectin (AHL). All these lectins also show high specificity for mannose sugars (Table S8). Overall, the community consists of 147 protein nodes and 124 glycan nodes.
Community 4 (GalNAc specific). From Table 5 it can be observed that this community is enriched in GalNAc-specific lectins such as Datura stramonium agglutinin (DSA), soybean agglutinin (SBA), Vicia villosa agglutinin (VVA) and Bauhinia purpurea lectin (BPL). These galactose-specific lectins may play a significant role in cell-agglutinating activities, e.g. VVA (Lectin B4) from Vicia villosa (hairy vetch). Another galactose-specific lectin in this group is a legume lectin known as Erythrina cristagalli lectin (ECL) [72]. Although its function in the legume is unknown, it has been shown that ECL possesses hemagglutinating activity, and it is believed to be mitogenic for human T lymphocytes [73]. A large number of plant and fungal proteins that bind N-acetylglucosamine (e.g. solanaceous lectins of tomato and potato, plant endochitinases, the wound-induced proteins hevein, win1 and win2, and the Kluyveromyces lactis killer toxin alpha subunit) contain a chitin-binding domain (PFAM ID: PF00187). These proteins might function as a defence against chitin-containing pathogens, e.g. Chitin-binding lectin 1 of Solanum tuberosum (potato). This community also includes lectins such as Macrolepiota procera agglutinin (MPA) and Laccaria bicolor lectin, both of which show high specificity for complex GalNAc glycans (Table S9). This community consists of 100 protein nodes and 134 glycan nodes.
Additionally, this community includes two of the three hub nodes identified in the lectin-glycan array network. One of the hubs   [74]. These structural characteristics and the closeness of binding sites make WGA a worthy candidate for exploring multivalent protein-carbohydrate interactions and for assessing the impact of structural modifications of glycoclusters [75]. Such multivalent interactions are more favorable than monomeric ones and are frequently employed by nature to control an array of diverse biological processes [76]. RCA as well as ECL recognize carbohydrate chains with a nonreducing terminal β-D-galactose (Galβ) and prefer the Galβ1-4GlcNAc sequence over Galβ1-3GlcNAc [77,78]. The diverse types of glycans, including Galβ1-4GlcNAc, that interact with the RCA hub node are listed in Table S11. The table also shows many Neu5Acα2-6Galβ1 sugars with large RFU values. RCA is a glycoprotein from the seeds of castor plants and one of the most important applied lectins; it has been widely used as a tool to study cell surfaces and to purify glycans [79]. RCA promotes binding and agglutination of polysaccharides and glycoproteins, in addition to liposomes and micelles containing glycolipids with galactosyl residues [80,81]. Furthermore, the specificities of RCA interactions with neutral and sialylated oligosaccharides are well established and are consistent with our results as summarized in Table S11 [82].
The current community-based network study of the lectin-glycan microarray data provides not only a quick and systematic analysis of lectin specificities, but also a global organization and grouping of biologically related lectins along with their binding partners (glycans). Such information will be vital for identifying lectins that bind to particular glycan structures or for cataloguing lectins according to the similarity of their specificities. Another important use of community-based network analysis is the characterization of a novel lectin and an initial guess about its specificity. For this, a sequence database should be constructed for each community identified, and a target lectin under investigation should be queried against these databases to get an idea of the structural/functional role of the query lectin and the type of glycans it might bind to. This approach will be more practical when the communities contain a large number of different lectins and might help in determining the glycan-binding nature of a given lectin.

Siaα2-3Galβ1-3(Siaα2-6)GalNAc    0.0036
Siaα2-3Galβ1-4GlcNAc    0.0377
Tri/tetra-antennary complex-type N-glycan    0.0234

Communities generated by Mod-CSA and the greedy algorithm are used. The statistical significance of each reported glycan-binding lectin group was calculated from the hypergeometric distribution using p ≤ 0.05. For each glycan listed in Table S1, interacting lectin nodes were identified to calculate the significance of the community structure determined in this study. The number of statistically significant glycan-specific groups according to Mod-CSA partitioning is 44 (p-value < 0.05), while the greedy algorithm provides only 33 groups. The 15 glycan-specific groups generated by Mod-CSA but not by the greedy algorithm are shown in bold, whereas the 3 groups generated by the greedy algorithm but not by Mod-CSA are shown in italic bold. doi:10.1371/journal.pone.0095480.t005
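The hypergeometric significance test named in the Table 5 caption can be illustrated with a short sketch. The function name, the one-sided (upper-tail) form of the test, and all numbers below are assumptions made for illustration; they are not taken from the study's data or code.

```python
from math import comb

def hypergeom_enrichment_p(population, successes, draws, observed):
    """Upper-tail probability P(X >= observed) for a hypergeometric variable.

    population: total number of lectin nodes in the network
    successes:  lectins annotated with a given glycan specificity
    draws:      lectins in the community being tested
    observed:   annotated lectins found inside that community
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / total

# Hypothetical counts (illustration only): 50 lectins in total, 10 of them
# annotated as mannose specific, and a community of 12 lectins holding 6 of them.
p_value = hypergeom_enrichment_p(population=50, successes=10, draws=12, observed=6)
print(f"p = {p_value:.4g}, significant at 0.05: {p_value <= 0.05}")
```

Under this reading, a glycan-specific group counts as significant for a community when the returned probability is at most 0.05.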
There are many network-based protein function prediction methods, along with approaches that use structural or sequence information of proteins. Recently, for protein-protein interaction networks, it has been shown that modularity-based community detection of the network yields more accurate protein function predictions. The current study provides the first attempt to study lectin-carbohydrate interactions via community detection of a network.
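To make the quantity behind modularity-based community detection concrete, the sketch below evaluates the standard Newman-Girvan modularity Q of a given partition of an unweighted, undirected network. It only scores a partition and does not optimize it; whether the modularity used in this study takes this form or a bipartite variant is not stated in this section, so the formula, the function name, and the toy network are assumptions.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Newman-Girvan modularity Q = sum_c [ L_c / m - (d_c / (2m))^2 ].

    edges:        list of (node_u, node_v) pairs, each edge listed once
    community_of: dict mapping every node to its community label
    L_c is the number of edges inside community c, d_c the summed degree
    of its nodes, and m the total number of edges.
    """
    m = len(edges)
    intra_edges = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            intra_edges[cu] += 1
    return sum(
        intra_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )

# Toy network (hypothetical): two lectins, each binding its own two glycans.
toy_edges = [("L1", "G1"), ("L1", "G2"), ("L2", "G3"), ("L2", "G4")]
split = {"L1": "A", "G1": "A", "G2": "A", "L2": "B", "G3": "B", "G4": "B"}
merged = {node: "A" for node in split}

q_split = modularity(toy_edges, split)
q_merged = modularity(toy_edges, merged)
print(q_split, q_merged)
```

A community detection method of this family searches for the partition that maximizes Q; here the two-community split scores higher than placing all nodes in one community.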

Conclusion
We have constructed a bipartite lectin-glycan interaction network from the collection of glycan microarray data. The network itself provides a quick and global view of the lectin-glycan interaction from which hub proteins are identified. We find that the hub proteins match well with the characteristics of known biological relevance. Using Mod-CSA, a recently developed efficient community detection method, 4 modules are identified. The clustering results are shown to be biologically more meaningful than those obtained by other widely used methods. Most significantly, 44 statistically significant glycan-specific groups are identified, including fucose- and mannose-binding ones, some of which could not be detected by alternative methods. Even with stricter RFU cut-offs, clusters generated by Mod-CSA provide consistently better results than other methods. We provide an overall analysis of the 4 communities identified in the lectin-glycan microarray network. We also show how multiple lectins from the same plant, such as Sambucus nigra (SNA-I and SNA-II), are grouped into different communities based on their glycan-binding specificities. The network study provides a framework for obtaining a broad picture of data containing many interacting components. These capabilities of community-based network analysis allow researchers to explore, analyze and compare a variety of proteins and glycans within the context of the modules/communities identified in the network. We expect that this will trigger interest in the prediction of protein-carbohydrate interactions using biological networks and will have wider applications as additional glycan-binding proteins are identified. The method can also be applied to study other types of lectins as well as other interaction networks.
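As a rough illustration of the network construction and hub identification summarized above, the sketch below builds a bipartite lectin-glycan network from (lectin, glycan, RFU) records using an RFU cut-off and ranks lectins by degree. The record layout, the function names, the inclusive cut-off, and every RFU value are hypothetical and are not taken from the study.

```python
from collections import defaultdict

def build_bipartite_network(records, rfu_cutoff):
    """Keep lectin-glycan pairs whose RFU signal reaches the cut-off.

    Returns two adjacency maps: lectin -> set of glycans, glycan -> set of lectins.
    Treating the cut-off as inclusive (>=) is an assumption of this sketch.
    """
    lectin_adj = defaultdict(set)
    glycan_adj = defaultdict(set)
    for lectin, glycan, rfu in records:
        if rfu >= rfu_cutoff:
            lectin_adj[lectin].add(glycan)
            glycan_adj[glycan].add(lectin)
    return lectin_adj, glycan_adj

def top_hubs(adjacency, n):
    """Return the n nodes with the largest degree (ties broken by name)."""
    return sorted(adjacency, key=lambda node: (-len(adjacency[node]), node))[:n]

# Invented array readings, for illustration only.
records = [
    ("RCA", "glycan_1", 5200.0),
    ("RCA", "glycan_2", 3100.0),
    ("RCA", "glycan_3", 1800.0),
    ("WGA", "glycan_2", 4100.0),
    ("WGA", "glycan_4", 2500.0),
    ("ConA", "glycan_5", 900.0),
    ("ConA", "glycan_1", 150.0),
]

lectin_adj, glycan_adj = build_bipartite_network(records, rfu_cutoff=1000.0)
print(top_hubs(lectin_adj, 2))
```

Raising `rfu_cutoff` removes weaker interactions, which is one way to probe how stable hubs and communities are under stricter cut-offs.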

Supporting Information
Table S1 List of all protein nodes, their clusters and reported specificity in the lectin-glycan network. (XLS)