Centralized Modularity of N-Linked Glycosylation Pathways in Mammalian Cells

Glycosylation is a highly complex process to produce a diverse repertoire of cellular glycans that are attached to proteins and lipids. Glycans are involved in fundamental biological processes, including protein folding and clearance, cell proliferation and apoptosis, development, immune responses, and pathogenesis. One of the major types of glycans, N-linked glycans, is formed by sequential attachments of monosaccharides to proteins by a limited number of enzymes. Many of these enzymes can accept multiple N-linked glycans as substrates, thereby generating a large number of glycan intermediates and their intermingled pathways. Motivated by the quantitative methods developed in complex network research, we investigated the large-scale organization of such N-linked glycosylation pathways in mammalian cells. The N-linked glycosylation pathways are extremely modular, and are composed of cohesive topological modules that directly branch from a common upstream pathway of glycan synthesis. This unique structural property allows the glycan production between modules to be controlled by the upstream region. Although the enzymes act on multiple glycan substrates, indicating cross-talk between modules, the impact of the cross-talk on the module-specific enhancement of glycan synthesis may be confined within a moderate range by transcription-level control. The findings of the present study provide experimentally-testable predictions for glycosylation processes, and may be applicable to therapeutic glycoprotein engineering.


Introduction
Carbohydrates are a basic cell constituent, and are one of the most abundant and diverse biopolymers in nature [1]. Complex carbohydrates have recently become widely recognized as more than just a metabolic energy source [2][3][4][5][6]. For example, the cell surface contains a layer of complex carbohydrates involved in signalling roles that are indispensable to multicellular organisms [2,7]. Glycosylation, the attachment of glycans (oligosaccharides) to proteins or lipids, is a ubiquitous post-translational modification that generates an extensive functional capability from a limited set of genes [8][9][10]. In contrast to gene and protein sequences, the glycosylated glycan sequences are not arranged in a simple linear chain [5]. Several monosaccharides can be placed simultaneously on a particular monosaccharide, forming branched structures that provide enormous glycan structural diversity.
Vertebrates, and especially mammals, have evolved a unique glycan repertoire which is structurally distinct from that of nonvertebrate organisms [2,8,9,10]. Mammalian cells are used as host cell systems for the production of many recombinant glycoproteins; these systems can synthesize properly folded proteins with glycans resembling those in human bodies [11,12]. N-Linked and O-linked glycans are the major contributors to the structure and function of mammalian secretory glycoproteins. N-Linked glycans are attached to asparagine residues of proteins, located within the Asn-X-Ser/Thr motif of amino acids, where X can be any amino acid except proline.
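The Asn-X-Ser/Thr rule described above can be illustrated with a short pattern search. This is a minimal sketch only: the function name `find_sequons` and the example sequence are hypothetical, and matching the motif identifies candidate sites, not sites that are actually glycosylated in a cell.

```python
import re

# N-glycosylation motif: Asn, then any residue except Pro, then Ser or Thr.
# The zero-width lookahead lets overlapping motifs (e.g. "NNTS") both be found.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein: str) -> list[int]:
    """Return 1-based positions of Asn residues that start an Asn-X-Ser/Thr motif."""
    return [match.start() + 1 for match in SEQUON.finditer(protein.upper())]


# Hypothetical sequence: "NGT", "NNT" and "NTS" match, "NPS" is rejected (X = Pro).
print(find_sequons("MKNGTALNPSQNNTS"))
```

The lookahead form is chosen so that adjacent asparagines are each reported; a plain `N[^P][ST]` pattern would consume the first match and skip an overlapping second one.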
N-Linked glycosylation occurs co-translationally in the endoplasmic reticulum (ER) compartment. The addition of an oligosaccharide to the peptide at an early stage of glycoprotein synthesis allows the glycan to participate in the folding and quality control of a newly synthesized protein [13]. Upon successful folding of the protein and the trimming of some residues in the glycan, the glycoprotein migrates into the Golgi apparatus. Processing in the Golgi involves the removal of mannose groups and the addition of various monosaccharides to the growing glycan. The removal of the mannose groups is driven by mannosidases, and the addition of different monosaccharides is facilitated by specific glycosyltransferases. Thus, N-linked glycosylation pathways comprise consecutive enzymatic steps that rely on the glycan structures produced by the previous enzyme to produce the substrate for the next enzyme. The pathways formed in this process diverge when a glycan is a substrate for multiple enzymes, or converge when multiple glycan substrates all lead to the same product. Many glycan intermediates at different loci along the pathways, not necessarily glycans at the termini, can be secreted out of the Golgi to the targeted sites where they perform biological functions, such as mediating cell growth and development, cell-cell communication, immune recognition/response, and molecular homeostasis [2,7,8,9,10,14].
Recent advances in understanding the generic properties of complex networks, including various biological, technological, and social networks [15,16], allow for a quantitative examination of the organization of N-linked glycosylation pathways. This development in network research has been driven largely by the availability of massive digital records and statistical methods that permit network data to be collected and analyzed on a scale far larger than previously possible. The emerging results in complex network research have led to the realization that, notwithstanding the importance of individual molecules, cellular phenotype is a contextual attribute of seamless and quantifiable network patterns among numerous constituents [17]. Despite the key role of glycosylation pathways in sustaining many biological functions, their large-scale properties have not yet been characterized from a complex network perspective. Understanding the global organization of complex networks will provide valuable and perhaps unique topological information, and may also lead to a better understanding of the dynamical and evolutionary processes of the networks, as demonstrated in several other biochemical systems, such as metabolic networks and protein-protein interaction networks [18][19][20][21][22]. Here, we explore whether the organization of glycosylation pathways can be elucidated from a complex network perspective, by investigating the structural and regulatory properties of N-linked glycosylation pathways in mammalian cells. Our findings have implications not only for the organizing principles of cellular glycosylation processes, but also for glycoprotein engineering for therapeutic purposes.

Topological Properties and Modularity
We constructed N-linked glycan biosynthetic pathways by incorporating ten typical N-linked glycosylation enzymes in mammalian cells and their substrate specificities (Table 1; see also Materials and Methods). These enzymes can accept multiple N-linked glycans as substrates, and are thus capable of generating a large number of glycan intermediates. Construction was initiated from 9-mannose glycan, the common precursor of N-linked glycans in the Golgi, and followed by biosynthetic steps to produce mainly complex-type glycans (Figure 1), giving rise to a glycosylation network composed of 638 glycans and 1499 enzymatic reactions (Figure 2A).

Central and peripheral regions.
The essentiality of a particular glycan in the glycosylation network was assessed by counting the number of all downstream substrates that could not be produced in the absence of the given glycan. In the terminology of complex network research, this is analogous to evaluating the avalanche size of a network after perturbing a single vertex [23][24][25]. Figure 2B shows that for most glycans (95.8%) the absence of an individual glycan either did not affect any glycan production or hindered the production of fewer than three glycans. On the other hand, the impact of removing any of the few remaining glycans (4.2%) spread over a wide range, up to damage at the whole-system level. These few but high-impact glycans tended to be located adjacent to each other, thereby occupying a single clustered region in the pathways. Therefore, the clustered region could be easily distinguished from the other parts of the network, and was termed the central region (Materials and Methods). The central region consisted of one connected component of glycans, including the initial input substrate, and the non-central or peripheral region was bound to and derived from this central region.

Modular structure.
The spectral method developed for graph partitioning (Materials and Methods) revealed that the peripheral region comprised 21 tightly-knit subgraphs. These 21 subgraphs or modules are densely connected groups of glycans, with only sparser connections between groups. Therefore, the modules tend to be biosynthetically isolated from each other. This biosynthetically-modular property of the pathways originates from the substrate specificity of the enzymes considered here, as described below, rather than from simple differences in individual glycan structures that the conventional scheme for glycan classification [2] has been based on. Interestingly, each module in the peripheral region was generated from only a few roots, all of which belonged to the central region (Figures 2A and S1). In other words, N-linked glycosylation pathways organized their modular structure in a highly centralized manner; the central region with a small number of glycans proliferated directly into all 21 modules in the peripheral region, thereby forming a star-like structure. Indeed, direct connections between different peripheral modules were relatively sparse compared with those between the central and peripheral regions (Figures S1 and S2). Remarkably, the glycosylation network had unusually high modularity (Q = 0.83) compared with other biological and non-biological networks [26], suggesting that glycosylation-specific evolutionary pressure was required for the development of such a unique network structure. The number of glycans across modules was unevenly distributed, with the largest module containing 40-fold more glycans than the smallest module. The discrete jumps between module sizes in Figure 2C indicate that the size of each module was due to the complexity of the terminal glycan structures.
Specifically, the more processed the terminal glycans were with N-acetylglucosamines (GlcNAc) following α1,3- and α1,6-linked mannoses, the greater the number of glycan species that developed in the module.
This glycan enrichment pattern across modules comes from the inherent capability of carbohydrates to add branches [5] to the mannose residues, which exponentially diversifies the glycan structures. (... Figure S2). On the other hand, in the central region, all enzymes except galactosyltransferase (GalT) and sialyltransferase (SiaT) were involved in the reactions (Figure 2A).
These findings suggest the enzymatic mechanisms that are responsible for generating the unique modular structure of the glycosylation network, as highlighted by the role of GalT: GalT is permissive in its substrate specificity, accepting any substrate with free GlcNAc on the mannose branches, and multiple products arise from the same substrate depending on the specific galactosylated residues. Once glycans are galactosylated, however, they inhibit the approach of many other enzymes (Table 1). Such tolerance in substrate specificity and product formation facilitates the development of redundant pathways within each module, whereas the inhibition of other enzyme activities keeps different modules separated. The effect of such inhibition on module differentiation was also observed for another enzyme, β-1,4-mannosyl-glycoprotein 4-β-N-acetylglucosaminyltransferase (GnTIII). GnTIII adds bisecting GlcNAc to its substrate, and the presence of bisecting GlcNAc inhibits the activity of many enzymes (Table 1).
Therefore, the bisecting GlcNAc is thought to insulate relevant modules, as shown in Figure S2 where the junctions of different modules only contain glycans without bisecting GlcNAc. Accordingly, if we exclude the glycan syntheses catalyzed by GnTIII, then the network becomes slightly less modular (Q = 0.72) as the well-insulated modules selectively disappear. We believe that this organizing principle of modular structures manifested by GalT as well as by GnTIII offers a useful guideline for the engineering of novel glycosyltransferases, as discussed below.

Regulatory Properties and Cross-Talk
The highly modularized, yet centralized organization of N-linked glycosylation pathways raises the question of how cells enhance or suppress the glycan production across modules against distinct physiological conditions. Within the same module, glycans are easily convertible to other glycans along densely connected pathways, whereas the conversion of glycans between different modules, which are only sparsely connected, is more difficult. Furthermore, glycans in the peripheral modules are surrounded by homogeneous enzymatic reactions (catalyzed mostly by GalT and SiaT in the Golgi), and are thus not as likely to be regulated but routed randomly along the pathways. Glycans along such unregulated routes are thought to be trapped for a long time in a particular module because there are few paths through which they can enter the other modules [27]; therefore, glycans delivered from the central region might continue to be processed inside the arrival modules until they are eventually secreted out of the Golgi. In this regard, the paths glycans take through the central region ahead of the peripheral modules likely play a critical role in the end-product formation.
Specific reactions in the central region may be manipulated by the transcriptional regulation of enzyme expression. Previous experiments demonstrated a correlation between glycan production and transcript expression of the corresponding enzymes. For example, the abundance of bisected glycoforms and of GnTIII transcript as well as that of fucosylated glycoforms and of glycoprotein 6-α-L fucosyltransferase (FucT) transcript is positively correlated across different mouse tissues [28][29][30]. The heterogeneous enzyme pools in the central region favor such specific transcriptional control. Glycosylation enzymes, however, are usually involved in multiple reactions; a change in the abundance of a single enzyme is likely to affect more than one reaction in the central region, and a number of modules derived from the affected reactions will also be affected. Therefore, it is important to assess specifically how to control modules that share common upstream enzymes, since this sharing gives rise to cross-talk between the modules.
We considered combinations of up- and down-regulation of glycosylation enzymes that would unambiguously predict changes in glycan syntheses, and for each case, we determined which modules would enhance or suppress glycan production relative to their basal levels (Materials and Methods). Figure 3A shows one such result in which the down-regulation of GnTIII, α-1,3-mannosyl-glycoprotein 4-β-N-acetylglucosaminyltransferase (GnTIV), and FucT led to the enhancement of the 1st and 16th modules, but also to the suppression of the other modules. Minimizing cross-talk or unwanted enhancement of modules other than those specified requires an orchestrated regulation across enzymes.
Under the regulation to minimize such crosstalk, Figures 3B and 3C show that each enhancement of three-quarters of the modules was accompanied by the unwanted enhancement of less than one-third of the modules, and the enhancement of the remaining modules could be at most accompanied by the unwanted enhancement of less than one-half of the modules. Consequently, although the cross-talk between modules is not negligible, the effect on glycan synthesis is confined within a moderate range, and probably further reduced by post-transcriptional regulation or by other combinations of enzyme regulation which were excluded here for clarity.
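The cross-talk measure used in this comparison, the smallest number of other modules that must be enhanced together with a given module, can be sketched as follows. The function name `min_co_enhanced` and the three regulatory outcomes are hypothetical placeholders; they are not the contents of Table S1.

```python
def min_co_enhanced(outcomes, modules):
    """For each module, the minimum number of other modules enhanced alongside it.

    `outcomes` is a list of sets; each set holds the modules enhanced under one
    combination of enzyme regulation. Modules never enhanced map to None.
    """
    result = {}
    for k in modules:
        counts = [len(enhanced) - 1 for enhanced in outcomes if k in enhanced]
        result[k] = min(counts) if counts else None
    return result


# Hypothetical outcomes for illustration only (module labels are arbitrary).
outcomes = [{1, 16}, {1, 2, 3}, {2, 3, 4, 5}]
print(min_co_enhanced(outcomes, modules=[1, 2, 3, 4, 5, 16, 21]))
```

A module with a small value can be enhanced fairly selectively, whereas a large value means that its enhancement always drags several other modules along.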
The explicit prediction of modules to be enhanced under given transcriptional regulation (Table S1) can be tested experimentally by measuring the change in the glycan production after genetic manipulation and identifying the relevant modules. For example, the production of glycans belonging to the 1st and 16th modules (Figure S1) is predicted to increase after gene knockdown of GnTIII, GnTIV, and FucT, as indicated in Figure 3A. It should be noted that the glycan production here was quantified by the amount of flux into the glycan synthesis, rather than by the glycan abundance itself. Therefore, measuring only the abundance of secreted glycans and not the abundance of all the glycan intermediates will be more relevant in this case.
Experimental validation of this prediction will allow us to design genetic regulation to enhance glycan synthesis in targeted modules. For example, if some modules contain desirable glycoproducts like biopharmaceuticals, then genetic regulation can be applied to enhance the glycan synthesis in these modules, and accordingly, to increase the production rate of the biopharmaceutical glycans. Such genetic regulation toward specific module enhancement might also be applied to reduce the heterogeneity of glycoforms and to improve the consistency of glycoprotein production [11,12].

Figure 3. ... (B) ... Table S1. Enhanced modules are colored blue or green, and suppressed ones are indicated in white. The row including green is for the case demonstrated in (A). (C) From the lists of modules to be enhanced together with a given module on the horizontal axis, we enumerated the minimum number of such co-enhanced modules, shown on the vertical axis. Blue indicates a minimum number of six or fewer, obtained from the module enhancement pattern shown in (B).

Discussion
The complexity and biological significance of protein glycosylation have long been underestimated, and now, in the post-genomic era, are at the forefront of scientific research. It is increasingly appreciated that biological systems exploit glycosylation in synthesizing cell-surface glycans to organize plasma membrane receptors and control the recruitment of intracellular signal transduction mediators. Hence, further knowledge of glycobiology will contribute to deciphering a myriad of biological phenomena. Clearly, a systems-level understanding of glycosylation processes will advance such scientific achievement. The N-linked glycosylation pathways comprise very distinct topological modules, all directly stemming from the common upstream pathway termed the central region. This central region might act as a 'control tower' of glycan production by redistributing glycan synthesis fluxes over the modules to adapt to different physiological conditions. Cross-activation or cross-talk between the modules, however, will restrict the fine-tuning level of the flux distribution. The topological properties of such N-linked glycosylation pathways were elucidated from a complex network viewpoint that further helps set the hypotheses on implicated functional and evolutionary properties.
The underlying mechanism of module development is clarified by the role of GalT, which accepts a wide range of substrates and makes multiple products that inhibit many other enzyme activities. The tolerance in glycan synthesis and the inhibition of other enzyme activities contribute to module formation and differentiation, respectively, and the latter is also observed in the case of GnTIII. The significant influence of GalT in pathway formation provides a pattern for the design of novel glycosyltransferases to implant another module that does not severely disturb the preexisting pathways. Such construction or evolution of a new module would not significantly hamper the functioning of the old modules, and is thus favorable both for engineering purposes and for evolution, which could be facilitated [31] by this module-level modification. Specifically, the sugar residues attached by these novel enzymes should not inhibit galactosylation and sialylation. On the other hand, the enzymes should not accept already-galactosylated substrates. If these two rules are satisfied, then the enzymes will synthesize glycans at the central region, introducing a new module in the peripheral region. Interestingly, GnTIII satisfies both rules, and this might be one reason why GnTIII works properly in recombinant Chinese Hamster Ovary (CHO) cells although it is not present in wild-type CHO [32][33][34][35]. In addition, CHO cells transfected with GnTIII are utilized in industry for the production of antibodies that significantly improve antibody-dependent cellular cytotoxicity and treat neuroblastoma and non-Hodgkin's lymphoma [34,35].
More immediate applications for glycoprotein engineering might arise from the relationship between transcriptional regulation and glycan production, as described above. Orchestrated regulation of enzyme expression in the central region will allow glycan production to be enhanced in specific modules, while keeping the unwanted enhancement of other modules within a moderate range. Possible deviations between the prediction and the empirical data may arise from incompleteness in the modelling or from regulation at a post-transcriptional level, whose potential effects on glycosylation remain largely unknown. Further integration of poly-N-acetyllactosamine structures and many degradation mechanisms will elaborate the pathways considered here, and the original pathways can be viewed as an organizational kernel [36] whose main properties we expect to be still reflected in more complicated pathways. Various techniques used to study metabolic flux analysis are also expected to allow for in-depth analysis of glycosylation processes [6]. In conjunction with such mathematical modelling [6,37,38], the development of high-throughput experimental techniques for glycan and glyco-gene profiling [3][4][5]10] will further facilitate the systems analysis of glycosylation processes as successfully demonstrated in this study.

Network Construction
N-linked glycosylation pathways were constructed by enumerating N-linked glycan structures commonly observed in mammalian cells [39], starting from the input substrate shown in Figure 1, which results from an oligosaccharide precursor in the ER with three glucose residues trimmed out. In our attempts to build consecutive enzymatic steps, we used ten enzymes constituting a large proportion of the mammalian N-linked glycosylation processes. The mannosidases (ManI and ManII) are exoglycosidases that remove mannose groups from N-linked glycans. The other eight enzymes are glycosyltransferases that catalyze the formation of glycosidic bonds. Five N-acetylglucosaminyltransferases (GnTI, GnTII, GnTIII, GnTIV, and GnTV) were considered for the addition of GlcNAc, and FucT, GalT, and SiaT for the addition of fucose, galactose, and sialic acid, respectively.
Based on previous in vivo observations, the removal of α1,2-linked mannoses by ManI was considered in the following order [40,41]: ER resident ManI removes free α1,2-linked mannose attached to α1,3-linked mannose in the initial input substrate and then Golgi resident ManI removes each of two remaining free α1,2-linked mannoses successively, making 6-mannose and then 5-mannose glycans. For the remaining enzymatic reactions, we applied the substrate specificity data shown in Table 1, obtained from the publicly available literature [37,42,43]. Except for GnTI, which uses only one substrate, the enzymes could catalyze reactions that involve the same glycosidic linkage on a range of different substrates. Finally, by taking into account only the pathways terminating at glycans containing no more than three mannoses in the core, we integrated pathways to produce mainly complex-type glycans for clarity of analysis. The resulting pathways are represented by a directed graph in which the vertices stand for glycan species and the edges for glycan synthetic reactions, with arrows pointing from substrates to products.
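The construction procedure, repeatedly applying every enzyme to every glycan it accepts and recording substrate-to-product edges, can be sketched as a breadth-first enumeration. Everything specific in this sketch is an assumption made for illustration: glycans are reduced to three residue counts, and the five toy rules in `TOY_ENZYMES` are simplified stand-ins, not the substrate specificities of Table 1.

```python
from collections import deque
from typing import Callable, NamedTuple


class Glycan(NamedTuple):
    man: int     # mannose residues
    glcnac: int  # antennary GlcNAc residues
    gal: int     # galactose residues


class Enzyme(NamedTuple):
    name: str
    accepts: Callable[[Glycan], bool]  # substrate specificity
    act: Callable[[Glycan], Glycan]    # product formed from a substrate


# Toy rules for illustration only; galactosylation blocks the other enzymes here.
TOY_ENZYMES = [
    Enzyme("ManI", lambda g: g.man > 5 and g.glcnac == 0,
           lambda g: g._replace(man=g.man - 1)),
    Enzyme("GnTI", lambda g: g.man == 5 and g.glcnac == 0,
           lambda g: g._replace(glcnac=1)),
    Enzyme("ManII", lambda g: 3 < g.man <= 5 and g.glcnac == 1 and g.gal == 0,
           lambda g: g._replace(man=g.man - 1)),
    Enzyme("GnTII", lambda g: g.man == 3 and g.glcnac == 1 and g.gal == 0,
           lambda g: g._replace(glcnac=2)),
    Enzyme("GalT", lambda g: g.gal < g.glcnac,
           lambda g: g._replace(gal=g.gal + 1)),
]


def build_network(root, enzymes):
    """Enumerate all glycans reachable from `root` and the reactions between them."""
    vertices = {root}
    edges = set()  # (substrate, product, enzyme name)
    queue = deque([root])
    while queue:
        substrate = queue.popleft()
        for enzyme in enzymes:
            if enzyme.accepts(substrate):
                product = enzyme.act(substrate)
                edges.add((substrate, product, enzyme.name))
                if product not in vertices:
                    vertices.add(product)
                    queue.append(product)
    return vertices, edges


vertices, edges = build_network(Glycan(man=9, glcnac=0, gal=0), TOY_ENZYMES)
print(len(vertices), "glycans,", len(edges), "reactions")
```

Keeping each enzyme as a pair of an acceptance test and a transformation mirrors how a specificity table is read: one column states which substrates qualify, the other what is added or removed.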

Network Decomposition into Subunits
The essentiality of individual glycans in the pathways was investigated by perturbing the pathways through the removal of single glycans. For each removal, we calculated how many glycans could not be produced due to the complete absence of their substrate production. The removal of most glycans had only negligible effects (smaller than the cut-off in Figure 2B), and accordingly, we grouped the remaining glycans, whose removal had large effects, into the central region together with the early glycans processed by ManI. Glycans in the central region were located adjacent to each other, forming a connected subgraph and containing root vertices linked to the non-central or peripheral region. Different criteria for the central region did not affect the main results presented here as long as the cut-off was set between 2 and 8 (Figure 2B).
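The single-glycan removal test can be sketched as a reachability computation: a glycan counts as lost when no path from the input substrate reaches it once the removed glycan is skipped. The function names, the toy pathway, and the cut-off value of 2 used below are illustrative assumptions, not the actual network or the authors' code.

```python
from collections import deque


def reachable(succ, root, removed=None):
    """Glycans that can still be produced from `root` when `removed` is absent."""
    if root == removed:
        return set()
    seen = {root}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in succ.get(u, ()):
            if v != removed and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def avalanche_sizes(succ, root):
    """For each glycan, the number of other glycans lost when it is removed."""
    baseline = reachable(succ, root)
    return {
        v: len(baseline) - len(reachable(succ, root, removed=v)) - 1
        for v in baseline
    }


# Hypothetical toy pathway: substrate -> list of products.
toy = {"M9": ["M8"], "M8": ["A", "B"], "A": ["C"], "B": ["C"], "C": ["D"]}
sizes = avalanche_sizes(toy, "M9")

cutoff = 2  # assumed value inside the 2 to 8 range mentioned in the text
central = {v for v, size in sizes.items() if size > cutoff}
print(sizes, central)
```

In the toy pathway, removing `A` costs nothing because `C` is still made through `B`; this is the kind of redundancy that keeps most removals below the cut-off.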
The peripheral region could be further partitioned by maximizing modularity Q for directed graphs [44]:

$$Q = \frac{1}{m} \sum_{ij} \left[ A_{ij} - \frac{k_i^{\mathrm{in}} k_j^{\mathrm{out}}}{m} \right] \delta_{c_i c_j},$$

where $A_{ij}$ is 1 if there is an edge from vertex j to vertex i and 0 otherwise, $k_i^{\mathrm{in}}$ and $k_j^{\mathrm{out}}$ are the numbers of incoming and outgoing edges of the vertices, m is the total number of edges in the graph, $\delta_{ij}$ is the Kronecker delta symbol, and $c_i$ is the label of the partition to which vertex i is assigned. The search for the division of the graph into partitions $\{c_i\}$ maximizing Q is known to be NP-complete; we therefore used the spectral optimization method [44], which is both computationally efficient and practically acceptable in terms of partitioning results. For this purpose, we pre-assigned the central region a partition and recursively decomposed the peripheral region based on the spectral method (Q = 0.83). The resulting partitions or modules in the peripheral region were labelled in ascending order of the number of constituent glycans. The results of such partitioning on the glycosylation pathways remained robust when an alternative method designed for bidirectional or undirected graphs was applied by ignoring the edge directions [26]. Automatic decomposition of all the pathways, including the central region, yielded only a slight increase in modularity (∆Q = 0.02), and this result was excluded to prevent method-specific over-partitioning that does not convey any information of biological significance.
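As an illustration of the quantities involved, the sketch below evaluates the directed modularity, written here as Q = (1/m) Σ_ij [A_ij − k_i^in k_j^out / m] δ(c_i, c_j), and performs one spectral bisection step. Several points are assumptions rather than details taken from the text: the bisection uses the sign of the leading eigenvector of the symmetrized matrix B + B^T, found by plain power iteration; the recursive subdivision and any refinement used in [44] are omitted; and the seven-edge toy graph is hypothetical.

```python
def modularity_matrix(n, edges):
    """Return (B, m) with B[i][j] = A_ij - k_i^in * k_j^out / m, A_ij = 1 for an edge j -> i."""
    m = len(edges)  # assumes each directed edge is listed once
    k_in, k_out = [0] * n, [0] * n
    A = [[0.0] * n for _ in range(n)]
    for src, dst in edges:
        A[dst][src] = 1.0
        k_out[src] += 1
        k_in[dst] += 1
    B = [[A[i][j] - k_in[i] * k_out[j] / m for j in range(n)] for i in range(n)]
    return B, m


def directed_modularity(n, edges, labels):
    """Q = (1/m) * sum of B[i][j] over vertex pairs sharing a partition label."""
    B, m = modularity_matrix(n, edges)
    return sum(B[i][j] for i in range(n) for j in range(n)
               if labels[i] == labels[j]) / m


def spectral_bisection(n, edges, iterations=500):
    """Split vertices by the sign of the leading eigenvector of B + B^T."""
    B, _ = modularity_matrix(n, edges)
    S = [[B[i][j] + B[j][i] for j in range(n)] for i in range(n)]
    # Adding the largest absolute row sum to the diagonal makes all eigenvalues
    # non-negative, so power iteration converges to the top eigenvector of S.
    shift = max(sum(abs(v) for v in row) for row in S)
    x = [float(i + 1) for i in range(n)]  # arbitrary deterministic start vector
    for _ in range(iterations):
        y = [sum(S[i][j] * x[j] for j in range(n)) + shift * x[i] for i in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    return [1 if v > 0 else 0 for v in x]


# Hypothetical toy graph: two directed 3-cycles joined by the single edge 2 -> 3.
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]
labels = spectral_bisection(6, edges)
print(labels, round(directed_modularity(6, edges, labels), 3))
```

Power iteration is used only to keep the sketch dependency-free; for a network of several hundred glycans a library eigensolver would be the more practical choice.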

Glycan Synthesis Regulation
To evaluate the effect of transcriptional regulation on glycan synthesis, each enzyme i was assigned a variable $E_i$ depending on its regulated state: $E_i = 1$ if up-regulated, $E_i = 0$ if neutral, and $E_i = -1$ if down-regulated. In addition, let $G_{ij}$ be 1 if enzyme i is involved in synthesizing glycan j and 0 otherwise. Likewise, $M_{jk}$ is 1 if glycan j is located at the entry of module k and 0 otherwise. To focus on unambiguous cases in the prediction of regulatory effects, we only considered the combinations of $E_i$'s that satisfied the following rules simultaneously: (1) $E_i \cdot E_{i'} \ge 0$ for every pair of i and i' satisfying $G_{ij} = G_{i'j} = 1$, when the given j and k satisfy $M_{jk} = 1$. Hence, the mixture of both up- and down-regulated enzymes to synthesize a particular glycan at the entry of any given module was excluded. (2) $\left(\sum_i E_i G_{ij}\right) \cdot \left(\sum_i E_i G_{ij'}\right) \ge 0$ for every pair of j and j' satisfying $M_{jk} = M_{j'k} = 1$ with a given k. Hence, the mixture of both enhanced and suppressed glycan production at the entry of a particular module was excluded.
Furthermore, we kept $E_{\mathrm{ManI}} = E_{\mathrm{SiaT}} = 0$ and $E_{\mathrm{GalT}} \ge 0$ to avoid an otherwise global and unspecific impact on glycan synthesis across modules. Each module k could be assigned $\Phi_k = H\left(\sum_{i,j} E_i G_{ij} M_{jk}\right)$, where H(x) = 1, 0, or -1 if x > 0, x = 0, or x < 0, respectively.
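The regulatory bookkeeping can be sketched in a few lines. Because the original expressions for rule (2) and for the module score are not fully legible in this text, the forms used here are assumptions inferred from the surrounding definitions: the net drive on an entry glycan j is taken as the sum of E_i over the enzymes i with G_ij = 1, and the module score Φ_k as the sign H of the summed drive over the entry glycans of module k. The glycan names and the enzyme-to-glycan assignments are hypothetical.

```python
from itertools import combinations


def sign(x):
    """H(x): 1, 0 or -1 for positive, zero or negative x."""
    return (x > 0) - (x < 0)


def is_unambiguous(E, G, M):
    """Check rules (1) and (2) for one combination of enzyme states.

    E: enzyme -> state in {1, 0, -1}
    G: glycan -> set of enzymes involved in its synthesis (G_ij = 1)
    M: module -> set of glycans at its entry (M_jk = 1)
    """
    for entries in M.values():
        for j in entries:
            # Rule (1): no entry glycan mixes up- and down-regulated enzymes.
            if any(E[a] * E[b] < 0 for a, b in combinations(G[j], 2)):
                return False
        # Rule (2), assumed form: entry glycans of one module are not driven
        # in opposite directions.
        drives = [sum(E[i] for i in G[j]) for j in entries]
        if any(x * y < 0 for x, y in combinations(drives, 2)):
            return False
    return True


def module_scores(E, G, M):
    """Assumed form of Phi_k: the sign of the summed drive over entry glycans."""
    return {k: sign(sum(E[i] for j in entries for i in G[j]))
            for k, entries in M.items()}


# Hypothetical example: three modules, one entry glycan each.
E = {"GnTII": 0, "GnTIII": -1, "GnTIV": -1, "FucT": -1}
G = {"g1": {"GnTII"}, "g2": {"GnTIII"}, "g3": {"GnTIV", "FucT"}}
M = {1: {"g1"}, 2: {"g2"}, 3: {"g3"}}

scores = module_scores(E, G, M)
enhanced = {k for k, phi in scores.items() if phi == max(scores.values())}
print(is_unambiguous(E, G, M), scores, enhanced)
```

In this toy case the module whose entry enzymes are left neutral receives the larger score while the others are down-regulated, which corresponds to relative enhancement when the modules compete for the same influx of starting substrate.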
Although $\Phi_k$ can be 1, 0, or -1, these three values did not appear simultaneously for any combination of $E_i$'s. For example, some particular combination of $E_i$'s allowed the $\Phi_k$'s to take 1 and 0, but no combination was observed that allowed them to take all of 1, 0, and -1. Here we considered the cases where two of 1, 0, and -1 were taken by the $\Phi_k$'s for given $E_i$'s. Because we were interested in the regulatory cases keeping a similar level of the influx of the starting substrate for which modules compete with each other, modules assigned the larger $\Phi_k$ were expected to have enhanced glycan production relative to their basal levels, and the others were expected to be suppressed (Table S1). For example, modules assigned $\Phi_k = 1$ were regarded as enhanced while the others assigned $\Phi_k = 0$ were regarded as suppressed. One can easily prove that such a regulatory effect remains invariant to applying both $E_i \to -E_i$ and permutations of enhanced and suppressed modules. We also examined alternative regulatory models, such as explicitly considering the substrate competition between reactions, but the results did not differ much from the present results.

Supplementary Materials
Figure S1. Entry and terminal glycans of peripheral modules. For each module, the parent glycans in the central region and the corresponding reactions are also depicted. The bulk of each module is dominated by galactosylation and sialylation.
Figure S2. Reactions between glycans belonging to different modules.
Table S1. Lists of enhanced or suppressed modules under combinations of enzyme regulation. For comparison with Figure 3(B), the rightmost column labels each regulatory outcome in which no more than six modules become enhanced.