Constructing a human complex type N-linked glycosylation pathway in Kluyveromyces marxianus

Glycosylation can affect various protein properties such as stability, biological activity, and immunogenicity. Producing human therapeutic proteins therefore requires a host that can make glycoproteins with correct glycan structures. Microbial expression systems offer economical, rapid and serum-free production and are more amenable to genetic manipulation. In this study, we developed a protocol for CRISPR/Cas9 multiple gene knockouts and knockins (PCK) in Kluyveromyces marxianus, a probiotic yeast with a rapid growth rate. Because hyper-mannosylation is a common problem in yeast, we first knocked out the α-1,3-mannosyltransferase (ALG3) and α-1,6-mannosyltransferase (OCH1) genes to reduce mannosylation. We also knocked out KU70, which encodes a subunit of the Ku complex involved in non-homologous end joining, to increase the homologous recombination efficiency of K. marxianus. In addition, we knocked in the MdsI (α-1,2-mannosidase) gene to further reduce mannosylation, and the GnTI (β-1,2-N-acetylglucosaminyltransferase I) and GnTII genes to produce human N-glycan structures. We finally obtained two strains that produce low amounts of the core N-glycan Man3GlcNAc2 and the human complex N-glycan Man3GlcNAc4, where Man is mannose and GlcNAc is N-acetylglucosamine. This study lays a foundation for glycosylation engineering in K. marxianus toward producing human glycoproteins.


Introduction
Proper protein glycosylation is important because glycosylation affects the stability, biological activity, and immunogenicity of a protein [1]. Many clinically approved therapeutic proteins are glycosylated. Therefore, efforts to engineer glycosylation pathways have been made in a wide variety of cell types including bacterial, fungal, and mammalian cells [2,3]. Mammalian cell lines are usually preferred because they produce complex glycans similar to those in humans. However, the requirements for complex nutrients in culture media and the special

Analysis of N-glycans in different yeasts
Many yeasts, such as Pichia pastoris and Saccharomyces cerevisiae, produce hyper-mannosylated proteins [16]. We analyzed the N-glycans in S. cerevisiae, K. lactis, K. marxianus 4G5 (a wild-type diploid), and K. marxianus α2, a Cas9-carrying haploid strain derived from 4G5 [15] (Fig 1). The glycan profiles differed markedly among these yeasts, but in each strain most glycans contained 7-12 mannoses and two N-acetylhexosamines (i.e., Man7-12GlcNAc2). In K. marxianus 4G5, N-glycans with <10 mannoses accounted for 74% and those with ≥10 mannoses for 26% of the total; K. marxianus α2 showed the same distribution. Although the proportions of Man7GlcNAc2 and Man8GlcNAc2 were higher in S. cerevisiae than in K. marxianus 4G5, the proportion of Man9GlcNAc2 was much lower in S. cerevisiae. Consequently, in S. cerevisiae, N-glycans with <10 mannoses accounted for only 64% and those with ≥10 mannoses for 36% of the total. K. lactis belongs to the same genus as K. marxianus, but only 36% of its N-glycans had <10 mannoses, while 64% had ≥10 mannoses. Thus, hypermannosylation is weaker in K. marxianus than in S. cerevisiae and K. lactis.
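The comparison above rests on splitting each glycan profile into glycans with <10 and ≥10 mannoses. A minimal sketch of that binning, written as a hypothetical helper (not the analysis code used for Fig 1): it renormalizes a profile of ManXGlcNAc2 abundances and splits it at a mannose cutoff. The example abundances are invented for illustration and are not measured data.

```python
from typing import Dict, Tuple


def mannose_bins(profile: Dict[int, float], cutoff: int = 10) -> Tuple[float, float]:
    """Split a ManXGlcNAc2 profile into (<cutoff, >=cutoff) percentages.

    `profile` maps the number of mannoses X to the relative abundance of
    ManXGlcNAc2 in any units; values are renormalized to sum to 100%.
    """
    total = sum(profile.values())
    if total <= 0:
        raise ValueError("profile must contain positive abundances")
    low = sum(v for n_man, v in profile.items() if n_man < cutoff)
    high = total - low
    return 100.0 * low / total, 100.0 * high / total


# Hypothetical abundances for illustration only (not the Fig 1 data).
example = {7: 20.0, 8: 30.0, 9: 24.0, 10: 14.0, 11: 8.0, 12: 4.0}
low, high = mannose_bins(example)
print(f"<10 Man: {low:.0f}%, >=10 Man: {high:.0f}%")
```

Because the function renormalizes, raw ion intensities can be passed directly without first converting them to percentages.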

Generation of the Man 5 GlcNAc 2 N-glycan
Our first step was to produce the Man5GlcNAc2 N-glycan core in K. marxianus α2 (S1 Fig); this is the glycan that is flipped from the cytosolic face to the luminal face of the ER [17]. For this purpose, we planned to knock out ALG3 and OCH1 in K. marxianus α2 (S1 Fig). To increase the frequency of DNA integration via homologous recombination (HR) [18], we also planned to knock out KU70, which is involved in the non-homologous end joining (NHEJ) pathway. At the same time, we wanted to knock in GnTII (~3.5 kb) and a donor DNA fragment, HR-Blank (4.5 kb), to test the knockin of a relatively long DNA fragment. Thus, using PCK and the G418 selection marker gene, we simultaneously knocked out ALG3, OCH1 and KU70 and knocked in GnTII and HR-Blank at the gRNA cutting sites in the KU70 and ALG3 genes, respectively. For these knockouts and knockins, we designed gRNAs targeting conserved regions of the KU70, ALG3 and OCH1 genes with the CRISPOR software [19] and constructed them on T&A vectors (S2 and S3A Figs, S1 Table). We named the resulting strain "K. marxianus αO3-I2"; its genotype is ku70::GnTII, alg3::HR-Blank, och1::. (We use "O" and "I" to denote "knockout" and "knockin", respectively, so "O3" means "3 knockouts" and "I2" means "2 knockins".) We confirmed the insertion of 33 bp in the OCH1 gene by PCR and sequencing (S4 Fig). LC-MS analyses of glycan profiles showed that, compared with K. marxianus α2, the proportion of Man5GlcNAc2 in K. marxianus αO3-I2 increased from 0% to 48±6% (Fig 2). The proportions of Man6GlcNAc2 and Man7GlcNAc2 also increased significantly, whereas the proportions of glycans with >7 mannoses were greatly reduced (Fig 2). These changes can be attributed to the deletion of ALG3 and OCH1 in K. marxianus αO3-I2.
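The strain-naming convention defined above can be made explicit with a small parser. This is a hypothetical helper for illustration only: it reads the "O" (knockout) and "I" (knockin) counts and leaves any suffix such as "ΔC" or "ΔR" uninterpreted, because those suffixes mark later modifications described in the text rather than extra O/I counts.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class StrainName:
    knockouts: int  # number after "O"
    knockins: int   # number after "I"
    suffix: str     # e.g. "ΔC" or "ΔR"; not interpreted here


# Optional "α" (mating type), then O<n>-I<m>, then any remaining suffix.
_NAME = re.compile(r"^α?O(\d+)-I(\d+)(.*)$")


def parse_strain(name: str) -> StrainName:
    """Parse a label such as "αO3-I2" into knockout/knockin counts."""
    m = _NAME.match(name.strip())
    if m is None:
        raise ValueError(f"not an O/I strain label: {name!r}")
    return StrainName(int(m.group(1)), int(m.group(2)), m.group(3))


print(parse_strain("αO3-I2"))
print(parse_strain("αO4-I4ΔR"))
```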

Increasing the accumulation of Man 3 GlcNAc 2
We noted above that the proportions of Man3GlcNAc2 in the K. marxianus αO4-I3 and αO4-I4 strains, both of which carry the MdsI gene, were extremely low. This could be due to low expression of MdsI in these two strains. Our RNA analysis showed that the expression level of MdsI was lower than that of GnTII in K. marxianus αO4-I3 and αO4-I4 (Figs 4A and 5A), so MdsI might not be produced in the amount required to cleave α1,2-linked mannoses. This could be partly because the Cas9, MdsI, GnTI and GnTII genes were all driven by the LAC4 promoter (P LAC4). To test this possibility, we constructed two new strains: (1) K. marxianus αO4-I3ΔC, derived from K. marxianus αO4-I3 by knocking out the multiple Cas9 genes and the zeocin, hygromycin and G418 resistance genes, using cell dilution instead of a selection marker; and (2) K. marxianus αO4-I4ΔC, derived from K. marxianus αO4-I3ΔC by knocking in the GnTI gene, using the G418 resistance gene as the selection marker. The expression of MdsI was indeed greatly increased in these two new strains (Fig 4A), and the production of Man3GlcNAc2 increased to 2.43±0.25% in αO4-I3ΔC and 2.88±0.6% in αO4-I4ΔC (Fig 4C). These proportions were still very low, likely because of severe protein degradation (Fig 4B) (see Discussion). The GnTI gene was expressed at a fairly high level in K. marxianus αO4-I4ΔC (Fig 4A), but only 0.01±0.008% Man3GlcNAc4 was detected (Fig 4C). This is consistent with the faint band seen in the western blot analysis of protein expression, which suggests that the protein was degraded (Fig 4B).

Discussion
In this study, we developed a CRISPR/Cas9 system, called PCK, for gene knockouts and knockins in K. marxianus. Our protocol uses linearized DNA fragments to facilitate transformation by electroporation. We showed that PCK could be used to simultaneously knock out three genes and knock in two genes. Moreover, our DNA cassette design enables two or more DNA fragments to recombine into one fragment after they are transformed into the cell. Thus, PCK is a useful tool for genome editing. We found that K. marxianus 4G5 has weaker hypermannosylation than S. cerevisiae and K. lactis (Fig 1). K. marxianus α2 and K. marxianus 4G5 have similar glycan profiles (Fig 1) and growth rates [15]. Moreover, K. marxianus α2 is a haploid and carries multiple Cas9 genes, which can facilitate genetic manipulations. Thus, it is a suitable host for our purpose.
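The in vivo joining of two or more cassettes through shared end sequences can be mimicked in silico to check a cassette design before transformation. The sketch below is a simplification that assumes joining requires an exact overlap between the 3′ end of one fragment and the 5′ start of the next; real homologous recombination tolerates some mismatches. The function names and the 60-bp default (chosen to mirror the ~60-bp recombination sequences used in PCK) are illustrative assumptions.

```python
from typing import List


def join_by_homology(left: str, right: str, min_overlap: int = 60) -> str:
    """Join two fragments whose ends share an exact overlap.

    Finds the longest suffix of `left` that equals a prefix of `right`
    (at least `min_overlap` bases) and merges the fragments across it.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    left, right = left.upper(), right.upper()
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    raise ValueError("fragments share no terminal overlap of sufficient length")


def assemble(fragments: List[str], min_overlap: int = 60) -> str:
    """Merge fragments in order, each overlapping the next at its 3' end."""
    if not fragments:
        raise ValueError("no fragments given")
    result = fragments[0].upper()
    for frag in fragments[1:]:
        result = join_by_homology(result, frag, min_overlap)
    return result


# Toy fragments sharing a 40-bp junction (illustrative sequences only).
junction = "ACGTTGCA" * 5
a = "A" * 10 + junction
b = junction + "G" * 10
print(len(assemble([a, b], min_overlap=30)))
```

Searching from the longest possible overlap downward avoids accepting a short, spurious match when a longer designed junction exists.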
According to our glycosylation engineering plan (S1 Fig), we first knocked out the ALG3 and OCH1 genes in K. marxianus α2 (and at the same time knocked in the GnTII gene) to reduce hypermannosylation and the resultant strain K. marxianus αO3-I2 indeed showed a great reduction in the average number of mannoses per glycan (Fig 2). However, more than 50% of the glycans in K. marxianus αO3-I2 still carried more than 5 mannoses, suggesting that other enzymes can add mannoses to glycans.
To convert Man 5 GlcNAc 2 to Man 3 GlcNAc 2 , the core glycan, we knocked the MdsI gene into K. marxianus αO3-I2 and obtained the K. marxianus αO4-I3 strain, which could produce Man 3 GlcNAc 2 , albeit at a very low level (only 0.06±0.09%) (Fig 3). To produce the human complex glycan Man 3 GlcNAc 4 , our target glycan structure, we knocked the GnTI and MdsI genes into K. marxianus αO3-I2, which already carried the GnTII gene, and obtained the K. marxianus αO4-I4 strain. This strain indeed could produce Man 3 GlcNAc 4 , although at a very low level (only 0.02±0.03%).
To raise the production of Man3GlcNAc2 and Man3GlcNAc4, we constructed three new strains. First, we derived K. marxianus αO4-I3ΔC from K. marxianus αO4-I3 by knocking out the multiple Cas9 genes and the zeocin, hygromycin and G418 resistance genes. Because the Cas9, MdsI and GnTII genes in αO4-I3 were all driven by P LAC4, knocking out the Cas9 genes and their promoters greatly increased the expression level of MdsI, though not that of GnTII, raising the production of Man3GlcNAc2 from ~0% to 2.43±0.25%. Second, we knocked the GnTI gene into αO4-I3ΔC to obtain αO4-I4ΔC, which produced 2.88±0.58% Man3GlcNAc2 and 0.01±0.008% Man3GlcNAc4, our target glycan structure. Third, we derived αO4-I4ΔR from αO4-I4. The only difference between these two strains is that αO4-I4 carries the hygromycin resistance gene and αO4-I4ΔR does not; yet αO4-I4 produced only 0.05±0.09% Man3GlcNAc2 and 0.02±0.03% Man3GlcNAc4 (Fig 3B), whereas αO4-I4ΔR produced 2.10±1.24% Man3GlcNAc2 and 0.23±0.07% Man3GlcNAc4 (Fig 5C).
Our results still lag substantially behind those obtained in other yeasts. For example, the proportions of Man3GlcNAc2 and Man3GlcNAc4 are 2.10% and 0.23%, respectively, in our study, but were 1.92% and 35.48% in S. cerevisiae [21]. Thus, much work remains.
As a proof of concept, our data indicate that the glycosylation engineering steps we proposed (S1 Fig) can lead to the production of the human complex glycan Man3GlcNAc4, albeit at a very low level. Our challenge now is to raise the production of Man3GlcNAc4. Our western blot analysis of the MdsI, GnTI and GnTII proteins in K. marxianus αO4-I4ΔC and K. marxianus αO4-I4ΔR suggested that severe degradation of these proteins was likely one reason for the low production of Man3GlcNAc4. Therefore, our next task is to reduce protein degradation.
Protein degradation is usually due to peptide cleavage by proteases, and disruption of protease genes has been found to increase the yield of recombinant peptides expressed in yeasts [22][23][24]. Of particular relevance is the eukaryotic secretory aspartyl protease family (pfam00026), which includes cathepsin D, pepsin, renin, penicillopepsin, and the fungal yapsins (Yps). For example, disrupting the Yps1 gene in S. cerevisiae increased the yield of heterologous peptides. From the K. marxianus genome, we identified five proteins homologous to pfam00026 aspartyl proteases (Yps1p, Yps7p, Pep4p, Prb1p and Bar1p). K. marxianus Yps1p (KLMA_20534) and Yps7p (KLMA_40262) are yapsin family proteases that are putatively attached to the plasma membrane or cell wall via a glycosylphosphatidylinositol (GPI) anchor.
Bar1p (KLMA_50468) is homologous to an S. cerevisiae periplasmic protease that mediates pheromone degradation by cleaving and inactivating α-factor [23]. Pep4p (KLMA_70025) is a soluble vacuolar protease (proteinase A) that is required for the post-translational maturation of vacuolar proteinase precursors and is important for protein turnover after oxidative damage [25]. K. marxianus Prb1p (KLMA_80029) is a vacuolar protease (proteinase B) whose role is similar to that of Pep4p [26]. Disrupting the genes for these proteases could effectively increase the peptide yield [27]. We shall first knock out the genes for Yps1 [28] and/or Pep4 [27,29] to see whether the production of Man3GlcNAc2 and Man3GlcNAc4 increases. If this is not sufficient, we will consider the other two proteases or search for additional proteases.
Another possible reason for the low production of Man3GlcNAc2 in glycoengineered yeasts is glycan phosphorylation, which adds phosphates to α1,2-linked mannose residues at four sites of N-glycans and thereby prevents MdsI from hydrolyzing terminal α1,2-linked mannose [10,30]. In our data, the proportions of phosphorylated glycans were much higher in the glycoengineered K. marxianus strains than in K. marxianus α2, and phosphorylation occurred mainly on Man≥5GlcNAc2 (S8 Fig). N-glycan mannosylphosphorylation has been abolished in S. cerevisiae, P. pastoris, and Y. lipolytica by disrupting the MNN4 and/or MNN14 genes [31][32][33]. Our bioinformatics analysis revealed that K. marxianus has lost the MNN6 gene and that K. marxianus MNN4 (KLMA_30052) and MNN14 (KLMA_10282, PNO1) are homologous to S. cerevisiae MNN4 and MNN14, respectively. We therefore plan to knock out these two genes in our glycoengineered strains (e.g., K. marxianus αO4-I4ΔR) to test whether this prevents or reduces glycan phosphorylation.

Yeast strains, media and culture conditions
The Kluyveromyces marxianus α2 strain (MATα, ΔMATα3) used in this study is a haploid Cas9-carrying strain derived from the K. marxianus 4G5 diploid strain [15]. Culture conditions were as described previously [15,34]. The genotypes of all strains used in this study are shown in S2 Table. When G418 was the selection marker, knockout strains were selected on YPG medium containing 200 μg/mL G418; URA3 knockout strains were selected on YPG medium containing 0.1% 5-FOA (5-fluoroorotic acid, Watson Biotechnology Co.).
The knockout mutants were restreaked for five rounds for colony purification and then cultured in YPGU (YPG with 0.1% uracil) or YPGUC (YPGU with 0.2% CaCl2·H2O) medium at 30˚C for 36 hours for growth tests and glycan analysis.
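Preparing the selection plates above is a C1·V1 = C2·V2 calculation. A minimal sketch, in which the stock concentrations (50 mg/mL G418, 100 mg/mL 5-FOA) and the 500 mL batch size are hypothetical examples rather than values from this study; note that 200 μg/mL equals 0.2 mg/mL and 0.1% (w/v) equals 1 mg/mL.

```python
def stock_volume_ml(final_conc: float, stock_conc: float, final_volume_ml: float) -> float:
    """Volume of stock (mL) so that C_stock * V_stock = C_final * V_final.

    `final_conc` and `stock_conc` must share units (e.g. mg/mL);
    `final_volume_ml` is the total volume of the finished medium.
    """
    if stock_conc <= final_conc:
        raise ValueError("stock must be more concentrated than the target medium")
    return final_conc * final_volume_ml / stock_conc


# Hypothetical stocks: G418 at 50 mg/mL, 5-FOA at 100 mg/mL; 500 mL of medium.
g418_ml = stock_volume_ml(0.2, 50.0, 500.0)   # 200 μg/mL = 0.2 mg/mL
foa_ml = stock_volume_ml(1.0, 100.0, 500.0)   # 0.1% (w/v) = 1 mg/mL
print(f"G418: {g418_ml:.1f} mL, 5-FOA: {foa_ml:.1f} mL")
```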

The PCK protocol
The PCK (protocol for CRISPR/Cas9 multiple gene knockouts and knockins) protocol starts with the K. marxianus α2 strain as the host and consists of four steps (S2 Fig):
(1) gRNA design and construction. We use CRISPOR (http://crispor.tefor.net/) to exclude off-targets and improve on-target efficiency, and the RNAfold WebServer (http://rna.tbi.univie.ac.at/cgi-bin/RNAWebSuite/RNAfold.cgi) to predict gRNA secondary structure. We select 1-3 gRNAs for each target gene and construct each gRNA on the T&A vector. The double-stranded gRNA cassette is amplified by PCR using the M13 primer pair.
(2) Gene or donor DNA cassette design and construction. A homologous recombination sequence of ~60 bp is designed at each end (left and right) of each cassette. P LAC4 drives the GnTI, GnTII, and MdsI genes, and each selection marker gene is driven by the S. cerevisiae ADH1 promoter (P ADH1). The recombination sequences are added to the 5′ ends of the primers used to amplify each gRNA or gene cassette by PCR.
(3) Transformation of gRNA and gene cassettes. Cas9 expression is continued for 6 to 12 hours. The linearized gRNA, donor DNA fragments, and a selection marker gene are simultaneously transformed into the yeast cells by electroporation.
(4) Colony selection. Colonies are screened by antibiotic selection or by nutritional (auxotrophic) marker selection.
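In step (1), CRISPOR handles off-target and on-target scoring, but every design starts by enumerating candidate protospacers next to PAM sites. The sketch below reproduces only that enumeration. It assumes the commonly used SpCas9 with a 5′-NGG-3′ PAM and a 20-nt spacer; the Cas9 variant and spacer length are assumptions, not stated in this protocol. It performs no scoring, and the function names are hypothetical.

```python
import re
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


# Zero-width lookahead so that overlapping PAMs (e.g. "AGGG") are all found.
_PAM = re.compile(r"(?=([ACGT]GG))")


def find_spcas9_sites(seq: str, spacer_len: int = 20) -> List[Tuple[str, int, str, str]]:
    """List (strand, start, protospacer, PAM) for every NGG PAM in `seq`.

    `start` is the 0-based forward-strand coordinate of the leftmost base
    of the protospacer, for hits on either strand.
    """
    seq = seq.upper()
    n = len(seq)
    sites = []
    for m in _PAM.finditer(seq):  # forward strand
        p = m.start()
        if p >= spacer_len:
            sites.append(("+", p - spacer_len, seq[p - spacer_len:p], m.group(1)))
    rc = revcomp(seq)
    for m in _PAM.finditer(rc):  # reverse strand, scanned as its reverse complement
        p = m.start()
        if p >= spacer_len:
            # rc[p - spacer_len:p] covers forward coordinates [n - p, n - p + spacer_len)
            sites.append(("-", n - p, rc[p - spacer_len:p], m.group(1)))
    return sites


target = "A" * 20 + "TGG" + "C" * 5
print(find_spcas9_sites(target))
```

Candidates found this way would still be passed through CRISPOR and RNAfold, as described above, before a gRNA is chosen.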

Plasmid construction
The plasmids used in this study are listed in S1 Table. The commercial vector pKLAC2 (K. lactis Protein Expression Kit [35], New England Biolabs, MA) was used as the gene expression backbone with the G418 selection marker. We synthesized the genes with codon usage optimized for K. marxianus (Protech Technology Enterprise Co., Ltd.). Restriction sites on plasmids pMH-1 to pMH-3 are marked in S7 Fig.
1. The T. reesei α-1,2-mannosidase [GenBank: AF212153] had proven effective in hydrolyzing α-1,2-linked mannose residues in vivo in fungi [36,37]. The plasmid pMH-1 contained the 3′ end of P LAC4, the signal peptide sequence of the S. cerevisiae α-mating factor, the open reading frame of the T. reesei α-1,2-mannosidase cloned in frame, the coding sequence for HDEL, and a stop codon. The coding sequence for a 12×His tag was inserted between the sequences coding for the catalytic domain and the HDEL signal (S7A Fig).
2. The pMH-2 plasmid was constructed according to a previous study [38].
3. The pMH-3 plasmid was constructed according to a previous study [41].
All oligonucleotide primers used for PCR-based assembly of DNA fragments and for checking gene insertions are listed in S3 Table. The gRNA cassettes were constructed in plasmids pMH-g1 to pMH-g12 with the SNR52 promoter and SUP4 terminator (S1 Table). All PCR amplifications of gRNA and donor DNA cassettes were performed in 2X Green Tag buffer (EmeraldAmp MAX HS PCR Master Mix, TaKaRa) in a total reaction volume of 30 μl. Thermocycling consisted of 95˚C for 3 min, followed by 35 cycles of 95˚C for 10 s, 55˚C for 30 s (5 min for donor DNA) and 68˚C for 30 s (8 min for donor DNA), and a final extension at 68˚C for 10 min.
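The ~60-bp recombination sequences reach the PCR product through the amplification primers. A minimal sketch of that primer layout, assuming the homology sequence sits at the 5′ end of each primer and a fixed 20-nt annealing region (an arbitrary choice) sits at the 3′ end. Real primers would also be checked for melting temperature and secondary structure, which this sketch ignores; the function names are hypothetical.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def tailed_primers(left_arm: str, right_arm: str, cassette: str,
                   anneal_len: int = 20) -> Tuple[str, str]:
    """Forward and reverse primers (5'->3') that add homology arms to a cassette.

    Forward: left homology arm + first `anneal_len` nt of the cassette.
    Reverse: reverse complement of (last `anneal_len` nt of cassette + right arm).
    """
    cassette = cassette.upper()
    if len(cassette) < 2 * anneal_len:
        raise ValueError("cassette is shorter than the two annealing regions")
    fwd = left_arm.upper() + cassette[:anneal_len]
    rev = revcomp(cassette[-anneal_len:] + right_arm)
    return fwd, rev


def expected_product(fwd: str, rev: str, cassette: str, anneal_len: int = 20) -> str:
    """Top strand of the PCR product: fwd + cassette interior + revcomp(rev)."""
    cassette = cassette.upper()
    return fwd + cassette[anneal_len:-anneal_len] + revcomp(rev)


# Illustrative sequences only: 60-nt arms flanking a 100-bp cassette.
left, right, cas = "AC" * 30, "TG" * 30, "ATGC" * 25
fwd, rev = tailed_primers(left, right, cas)
print(len(fwd), len(rev))
```

With 60-nt arms and 20-nt annealing regions, each primer is 80 nt long, and the predicted product is the left arm, the full cassette and the right arm in that order.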

Validation of gene knockouts and knockins
Knockouts involving changes smaller than 50 bp were validated by sequencing. Insertion of each HR cassette at the gRNA cutting site of the target gene was checked by PCR. After culturing, cells were lysed in QE buffer (QuickExtract DNA Extraction Solution, Lucigen) at 65˚C for 30 min and then at 95˚C for 15 min. Two microliters of the DNA extract were used as template in a PCR with the specific primer pair and Green Tag PCR mix (EmeraldAmp MAX HS PCR Master Mix, TaKaRa). The PCR consisted of 95˚C for 3 min, followed by 35 cycles of 95˚C for 10 s, 55˚C for 20 s (6 min for long fragments) and 68˚C for 1 min (8 min for long fragments), and a final extension at 68˚C for 10 min.
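The choice between PCR and sequencing follows from the expected band sizes. A multi-kilobase insertion shifts the product far from the wild-type size, whereas a small edit such as the 33-bp insertion in OCH1 is hard to resolve on a gel. The sketch below uses hypothetical primer coordinates and an arbitrary 5% sizing tolerance to compute expected products and classify an observed band.

```python
def amplicon_size(fwd_start: int, rev_end: int, insertion: int = 0, deletion: int = 0) -> int:
    """Expected PCR product length (bp) for primers flanking an edit site.

    fwd_start: 0-based wild-type coordinate of the forward primer's 5' end.
    rev_end:   0-based, exclusive wild-type coordinate of the reverse primer's 5' end.
    insertion/deletion: bases gained or lost between the two primers.
    """
    if rev_end <= fwd_start:
        raise ValueError("reverse primer must lie downstream of the forward primer")
    return rev_end - fwd_start + insertion - deletion


def call_band(observed: float, wt: int, edited: int, tol: float = 0.05) -> str:
    """Classify an observed band against wild-type and edited sizes (relative tolerance)."""
    near_wt = abs(observed - wt) <= tol * wt
    near_edit = abs(observed - edited) <= tol * edited
    if near_wt and near_edit:
        return "ambiguous"  # sizes too close to resolve on a gel; sequence instead
    if near_edit:
        return "edited"
    if near_wt:
        return "wild-type"
    return "unexpected"


# Hypothetical primers 600 bp apart on the wild-type locus.
wt = amplicon_size(100, 700)                    # 600 bp
big = amplicon_size(100, 700, insertion=4500)   # 4.5-kb donor insertion: 5100 bp
small = amplicon_size(100, 700, insertion=33)   # 33-bp insertion: 633 bp
print(call_band(5000, wt, big), call_band(620, wt, small))
```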

Western blot and qRT-PCR
Western blot analysis and qRT-PCR were conducted as in Lee et al. [15]. The HRP-conjugated anti-6×His tag mouse monoclonal antibody (His-Tag Mouse McAb, Proteintech) was diluted 1:5000 for western blotting. The qRT-PCR primer pairs used in this study are listed in S1 Table.
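The text defers qRT-PCR analysis to Lee et al. [15] and does not state how relative expression was computed. A common choice is the 2^-ΔΔCt (Livak) method, sketched below under the assumption of roughly 100% amplification efficiency for both the target and the reference gene. The reference gene and all Ct values in the example are hypothetical.

```python
from statistics import mean
from typing import Sequence


def fold_change_ddct(target_sample: Sequence[float], ref_sample: Sequence[float],
                     target_control: Sequence[float], ref_control: Sequence[float]) -> float:
    """Relative expression by 2^-ΔΔCt, averaging replicate Ct values.

    ΔCt = Ct(target) - Ct(reference); ΔΔCt = ΔCt(sample) - ΔCt(control).
    Assumes ~100% efficiency, i.e. the product doubles every cycle.
    """
    d_sample = mean(target_sample) - mean(ref_sample)
    d_control = mean(target_control) - mean(ref_control)
    return 2.0 ** -(d_sample - d_control)


# Hypothetical Ct values: a target gene in an edited strain vs. its parental strain.
fc = fold_change_ddct([22.1, 21.9], [18.0, 18.0], [25.0, 25.0], [18.0, 18.0])
print(round(fc, 2))
```

If primer efficiencies differ markedly from 100%, an efficiency-corrected model would be more appropriate than this simple form.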

Mass spectrometry and data analysis
Yeast cell pellets were collected from 50 mL overnight cultures and resuspended in 30 mL of 10 mM HEPES buffer. Lysates were prepared by six passes through a Microfluidizer processor (Microfluidics Co., Westwood, MA), followed by centrifugation at 6,000 rpm for 5 min. The supernatant was passed through a 0.45 μm filter (Pall Co., Port Washington, NY), and the protein concentration was measured with the Pierce BCA assay (Thermo Fisher Scientific, San Jose, CA). Lysates were subjected to in-solution tryptic digestion by the filter-aided sample preparation (FASP) method [44] and then treated with PNGase F to release N-glycans. Released glycans were cleaned up on C18 cartridges and detected by LC-ESI-MS on an LTQ Orbitrap XL ETD mass spectrometer (Thermo Fisher Scientific) equipped with a Waters Acquity UPLC system (Waters, Milford, MA) and a PGC HT column (1.0 mm × 150 mm, 3 μm, Thermo Fisher Scientific) in a homemade heating oven (190˚C). The gradient ran from 98% buffer A/2% buffer B at 2 min to 40% buffer A/60% buffer B at 20 min at a flow rate of 250 μL/min, where buffer A was 0.1% formic acid in H2O and buffer B was 0.1% formic acid in 80% acetonitrile. Survey full-scan MS conditions were a mass range of m/z 500-2000 and a resolution of 15,000 at m/z 400. The most intense ions were sequentially isolated for HCD (resolution 7,500). The electrospray voltage was maintained at 4.0 kV, and the capillary temperature was set at 275˚C. The m/z values corresponding to N-glycans were analyzed with GlycoWorkbench [45] by searching the Consortium for Functional Glycomics (CFG) N-glycan database, and the relative intensity of each ion was used to calculate the percentage of each glycan.
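Two calculations underlie the glycan assignments above: the expected m/z of each composition and the conversion of relative ion intensities into percentages. A minimal sketch, assuming unreduced (free reducing-end) glycans detected as protonated ions [M+zH]z+. The text does not state the adduct or whether glycans were reduced, so sodium adducts, reduction, or phosphorylation would shift these values. The helper names are hypothetical and are not GlycoWorkbench functions.

```python
from typing import Dict

# Monoisotopic masses (Da).
HEX = 162.052824     # hexose residue (mannose)
HEXNAC = 203.079373  # N-acetylhexosamine residue (GlcNAc)
WATER = 18.010565    # added once for the free reducing end
PROTON = 1.007276


def glycan_mz(n_hex: int, n_hexnac: int, charge: int = 1) -> float:
    """m/z of an unreduced glycan Hex(n)HexNAc(m) as the ion [M + zH]^z+."""
    neutral = n_hex * HEX + n_hexnac * HEXNAC + WATER
    return (neutral + charge * PROTON) / charge


def percent_composition(intensities: Dict[str, float]) -> Dict[str, float]:
    """Convert summed ion intensities per glycan into percentages of the total."""
    total = sum(intensities.values())
    if total <= 0:
        raise ValueError("no signal")
    return {name: 100.0 * value / total for name, value in intensities.items()}


for name, (h, n) in {"Man3GlcNAc2": (3, 2), "Man5GlcNAc2": (5, 2),
                     "Man3GlcNAc4": (3, 4)}.items():
    print(name, round(glycan_mz(h, n), 4), round(glycan_mz(h, n, charge=2), 4))
```

All three singly charged values fall inside the m/z 500-2000 survey range used here.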

S2 Fig. The PCK protocol.
Step 1: gRNA design and construction. To exclude off-targets and improve on-target efficiency, we use the CRISPOR software (http://crispor.tefor.net/). For gRNA secondary structure prediction, we use the RNAfold WebServer (http://rna.tbi.univie.ac.at/cgi-bin/RNAWebSuite/RNAfold.cgi). The designed gRNA is constructed on the T&A vector, and the double-stranded gRNA expression cassette is amplified by PCR using the M13 primer pair. Step 2: Gene or donor DNA cassette design and construction. A homologous recombination sequence of ~60 bp is designed at the left and right ends of each gRNA site. The recombination sequences are added to the 5′ ends of the primers used to amplify the target gene cassette by PCR.
Step 3: Transformation of gRNA and gene cassettes. Cas9 gene expression is continued for 6 to 12 hours. Linearized gRNA, donor DNA fragments and a selection marker are transformed into yeast cells by electroporation.
Step 4: Colony selection. We select strains from the plate.
cerevisiae P ADH1 and terminator, which were used for designing antibiotic gene cassettes. Note that the Cas9 coding region lies in front of a zeocin cassette and is repeated in the P LAC4 region. When the zeocin cassette is cut, the P LAC4 region can be rearranged, providing an opportunity to remove the Cas9 gene.
αO4-I4ΔC, Lane 3: αO4-I3ΔR, Lane 4: αO4-I4ΔR. (a) All gene cassettes were inserted into the chromosome, and the inserted genes were validated by PCR using the S1274F and S1276R primer pair. The white font on the left side of the figure indicates the different fragment sizes of the transformed genes. We used the S1274F and MdsI-R2 primer pair to confirm the three strains that were supposed to carry the MdsI gene (right side of the figure). (b) The left side of the figure confirms that the GnTI gene was inserted into the URA3 locus, as checked by PCR using the URA3-F and GnTI-R primer pair. The right side of the figure confirms that the α mating type was retained in the haploid. (c) The left side of the figure confirms that the MdsI gene was inserted into the URA3 gene, as checked by PCR using the URA3-F and MdsI-R2 primer pair. The right side of the figure confirms that the GnTI gene was retained in the transformants, as checked by PCR using the S1274F and GnTI-R primer pair.
The proportions of phosphorylated glycans are higher in glycoengineered K. marxianus strains than in wild-type α2. (a) K. marxianus αO3-I2, αO4-I3, αO4-I3ΔC and αO4-I4ΔC are glycoengineered strains; their phosphorylation is significantly higher than that of α2. The production of total glycan increased to 24% in αO4-I3ΔC and 28.4% in αO4-I4ΔC. Phosphorylated glycoforms are concentrated on Man5-6GlcNAc2. (b) K. marxianus αO4-I4, αO4-I3ΔR and αO4-I4ΔR are glycoengineered strains; their phosphorylation is significantly higher than that of α2. The production of total glycan increased to 41.6% in αO4-I3ΔR and 36.6% in αO4-I4ΔR. Phosphorylated glycoforms are concentrated on Man5-6GlcNAc2.
(TIF)
S1 Table. The list of all plasmids used in this study. The plasmids pMH-1 to pMH-3 contained the glycosyltransferase gene with specialized anchor-positioning signal peptides, the LAC4 promoter (P LAC4), and a terminator, and were constructed in the pU18 vector. The donor DNA for PCR was also constructed in the pU18-genes vector. The plasmids pMH-g1 to pMH-g12 carrying the gRNA expression cassette contained the SNR52 promoter and the SUP4 terminator.
(DOCX)
S2 Table. The list of the yeast strains used in this study.
(DOCX)
S3 Table. The list of all primer pairs used. These primers were used for the construction of gRNA cassettes, homologous recombination of donor DNA cassettes, and confirmation of target gene knockout fragments by PCR.
(DOCX)
S1 Raw images.
(PDF)