Genome-Wide Screening of Genes Required for Glycosylphosphatidylinositol Biosynthesis

Glycosylphosphatidylinositol (GPI) is synthesized and transferred to proteins in the endoplasmic reticulum (ER). GPI-anchored proteins are then transported from the ER to the plasma membrane through the Golgi apparatus. To date, at least 17 steps have been identified to be required for the GPI biosynthetic pathway. Here, we aimed to establish a comprehensive screening method to identify genes involved in GPI biosynthesis using mammalian haploid screens. Human haploid cells were mutagenized by the integration of gene trap vectors into the genome. Mutagenized cells were then treated with a bacterial pore-forming toxin, aerolysin, which binds to GPI-anchored proteins for targeting to the cell membrane. Cells that showed low surface expression of CD59, a GPI-anchored protein, were further enriched for. Gene trap insertion sites in the non-selected population and in the enriched population were determined by deep sequencing. This screening enriched 23 gene regions among the 26 known GPI biosynthetic genes, which when mutated are expected to decrease the surface expression of GPI-anchored proteins. Our results indicate that the forward genetic approach using haploid cells is a useful and powerful technique to identify factors involved in phenotypes of interest.


Introduction
Anchoring of cell surface proteins by the glycolipid glycosylphosphatidylinositol (GPI) is a conserved posttranslational modification in eukaryotes [1,2]. Indeed, more than 60 proteins in Saccharomyces cerevisiae and more than 100 proteins in mammals are known to be modified with GPI. Moreover, GPI biosynthesis is essential for yeast growth and embryogenesis in mammals [3,4]. It is synthesized by the stepwise addition of sugars, an acyl-chain, and phosphoethanolamines to phosphatidylinositol (PI) in the endoplasmic reticulum (ER). GPI-anchored proteins (GPI-APs) are then remodeled and transported to the plasma membrane through the Golgi apparatus [5][6][7]. At least 17 steps are required for the correct biogenesis of GPI-APs in mammalian cells, including GPI biosynthesis, attachment to proteins, remodeling, and transport (S1 Table), and more than 25 genes directly involved in GPI biosynthesis have been identified.
Mutations in GPI biosynthetic genes result in several disorders. Somatic mutations in the phosphatidylinositol glycan class A gene (PIGA) in hematopoietic stem cells are causative of paroxysmal nocturnal hemoglobinuria (PNH), which is an acquired GPI deficiency [8]. PNH has also been shown to be caused by a combination of heterozygous germline mutations and somatic mutations in PIGT [9]. Recent progress in exome sequencing by the next generation sequencer (NGS) technology found that mutations in at least 12 GPI genes cause inherited GPI deficiencies (S1 Table) [10,11], and the identification of genes required for GPI biosynthesis greatly contributes to our understanding of disease phenotypes and diagnosis.
To identify genes or proteins required for GPI biosynthesis, three major strategies have been used to date: yeast genetic screens, genetic screens using mammalian cells, and biochemical approaches (S1 Table). One of the advantages of using yeasts is that they stably maintain haploid states, which enables both forward and reverse genetics to be performed. Thus, genes involved in GPI biosynthesis have been identified using yeast mutants defective in the incorporation of tritiated inositol on proteins or α-agglutinin on the cell wall [4,12]. However, the limitation of isolating mutant yeast cells is that most GPI biosynthetic genes are essential for growth, and their mutations cause severe phenotypes [6]. Biochemical approaches use coimmunoprecipitation with proteins that form functional complexes for biosynthetic reactions, and have identified subunits of the GPI-N-acetylglucosamine transferase (GPI-GnT) complex, dolichol-phosphate mannose (Dol-P-Man) synthase complex and GPI transamidase complex [13][14][15]. However, because most GPI biosynthetic proteins are multi-spanning transmembrane proteins, they can be difficult to isolate.
Genetic analysis using mammalian cells is an alternative method used to identify genes required for GPI biosynthesis [16] (S1 Table). It is based on the utilization or isolation of mutant cells defective in GPI-APs on the cell surface, and expression cloning of the genes responsible. Mutagenized mammalian cells such as T-lymphoma cells or Chinese hamster ovary (CHO) cells are often used, and 17 GPI biosynthetic genes have thus far been determined by expression cloning. A combination of mutant cell isolation and expression cloning involving mammalian cells can be a useful approach, but has several disadvantages. Clonal mutant cells have to be isolated, then the responsible genes from a cDNA library need to be determined from each mutant cell line.
Recently, several mammalian haploid cell lines have been reported [17][18][19][20]. Human HAP1 cells are one such line: an adherent cell line derived from haploid KBM7 cells that stably maintains the haploid state, with one copy of each of chromosomes 1 to 22 and the X chromosome [18]. Brummelkamp et al. previously used these cells to conduct genetic screens combined with gene trapping [18,21], a method of disrupting genes by inserting trap vectors into an intron of an expressed gene to inhibit transcription and splicing. By combining haploid cells with gene trap methods, mutant cells can be obtained for phenotypes of interest, and the gene responsible for the mutant phenotype can be determined by sequencing the trapped site of the genome. In the present study, we used a forward genetic method involving haploid human cells and a combination of gene trap methods and NGS technology to identify genes required for GPI biosynthesis. Our results indicate that this genetic screening method is a powerful tool to help comprehend the GPI biosynthetic pathway.

Establishment of Gene-Trapped HAP1 Mutant Cell Population
A gene trap virus was produced by transfecting the Platinum-GP Retroviral Packaging Cell Line in eight 15-cm dishes with a mixture of pCMT-SApA-BSD and pLC-VSVG plasmids. The virus-containing supernatant was concentrated five times using PEG-it virus precipitation solution (System Biosciences) and then mixed with 8 μg/ml of polybrene prior to transfection. HAP1 cells were enriched by the cell sorter FACSAria II (BD Bioscience) and proliferated before mutagenesis. A total of 6 × 10⁷ cells prepared in six-well plates containing 2.5 × 10⁶ cells per well were infected by centrifugation at 2,500 rpm for 2 h at 32°C. Two days after infection, the cells were selected with 6 μg/ml of blasticidin (InvivoGen) for 1 week.

Enrichment of GPI-negative HAP1 Mutant Population
After selection with blasticidin, mutagenized HAP1 cells (2.4 × 10⁸ cells) were treated with 0.2 nM proaerolysin for 1 day. Surviving cells were cultured, proliferated, and treated again with 0.2 nM proaerolysin for 1 day. Surviving cells were stained with an anti-CD59 antibody followed by PE-conjugated anti-mouse IgG, and CD59-negative cells were enriched by cell sorting using a FACSAria II. The sorted cells were pooled as the GPI-negative cell population (S1 Fig).

Flow Cytometry
HAP1 cells were harvested, collected, and resuspended in FACS solution (phosphate-buffered saline containing 1% bovine serum albumin and 0.1% NaN₃). A total of 5 × 10⁵ cells/sample were stained with an anti-CD59 or anti-DAF antibody (10 μg/ml) and PE-conjugated goat anti-mouse IgG. In some cases, cells were stained with FLAER (10⁻⁸ M). Stained cells were analyzed using the FACSCanto II (BD).

Electroporation
HAP1 cells (2 × 10⁷ cells) were collected and resuspended in 0.2 ml of OPTI-MEM (Life Technologies). Cells were mixed with 10 μg of plasmids and electroporated once at 250 V for 20 ms using ECM830 (BTX). Alternatively, cells and 10 μg of plasmids were mixed and resuspended in 100 μl of buffer T (Life Technologies) and electroporated at 2000 V for 10 ms twice using Neon (Life Technologies).

Sequence Analysis of Gene-Trap Insertion Sites in Clonal Cells
Genomic DNA was isolated from 2 × 10⁶ HAP1-GT-C3 cells using the Wizard Genomic DNA purification kit (Promega), then 2 μg was digested with HaeIII and ligated with the splinkerette adaptor [24], which consists of two oligonucleotides: Spl-top-HaeIII and SplB-BLT-HaeIII (oligonucleotide sequences are listed in S2 Table). DNA fragments were then digested with PvuII, which cleaves the vector sequence between the 3′ long terminal repeat (LTR) and the upstream HaeIII site, to prevent unwanted vector amplification (S2 Fig). After column purification, the fragments were used as templates for splinkerette polymerase chain reaction (PCR) using the Spl-P1 and LTR-1st primers, followed by nested PCR using the Spl-P2 and LTR-2nd primers (S2 Table). The resulting DNA fragments were separated by agarose gel electrophoresis, cut and extracted from the gel. After phosphorylation of the DNA fragments by T4 polynucleotide kinase (NEB), they were ligated into EcoRV sites of pBluescript II and sequenced.

Sequence Analysis of Gene-Trap Insertion Sites by NGS
Genomic DNA was isolated from 3 × 10⁷ cells using the Wizard Genomic DNA purification kit (Promega), then 15 μg was digested with HaeIII and ligated with the splinkerette adaptor as before. DNA fragments were digested with PvuII, purified, and used as templates for splinkerette PCR as described above. PrimeSTAR GXL DNA polymerase (Takara) was included in the PCR to reduce amplification bias. Resulting DNA fragments were further amplified by nested PCRs using Spl-P2 and LTR-2nd primers, followed by Rd1Tru-LTR and Rd2Tru-Splink primers, which contain the Illumina sequencing primer sequences. The resulting DNA products contained the end of the 5′ LTR retroviral sequence, followed by the genomic DNA sequence flanking the insertion site ending at the HaeIII restriction site and part of the splinkerette adaptor sequence. Illumina P5 (AATGATACGGCGACCACCG) and P7 (CAAGCAGAAGACGGCATACGA) adapters and barcode sequences were attached to the products by six cycles of PCR using 10 ng of each initial PCR product as template. Single-end sequencing (151-bp) was performed using the HiSeq 2500 system (Illumina). The numbers of reads obtained from non-selected control cells and GPI-negative cells were approximately 7.5 million and 1.7 million, respectively.

Analysis of Gene-Trap Insertions
FASTQ data files were analyzed using CLC Genomics Workbench software (Qiagen). After quality trimming and removal of the common LTR sequence, all reads were further trimmed from their 3′ ends to a length of 50 bp. These 50-bp reads were mapped onto the human genome (hg19). To exclude ambiguous alignments, all non-specifically matched reads were ignored. To eliminate PCR amplification bias and to determine the independent insertion sites, duplicate reads were removed and counted as one read (a unique insertion site). The mapped reads in each gene were further analyzed by the RNA-Seq tool of the CLC Genomics Workbench software. Mismatched reads were excluded to eliminate bias of the mapped reads in each gene caused by PCR or sequencing errors. We mapped 199,043 and 12,183 independent reads to the human genome in non-selected and GPI-negative cell populations, respectively. Of these, 56,839 insertion sites in the non-selected population and 1,681 in the GPI-negative population were found in gene regions. The number of insertions per gene and the number of inactivating insertions per gene (all forward insertions, plus reverse insertions in exons) were counted. In total, 34,768 and 1,164 inactivating insertions were identified in non-selected and GPI-negative cell populations, respectively. The enrichment of each gene was calculated by comparing the selected with the non-selected population. For each gene, a P-value and a P-value corrected for the false discovery rate were calculated by the one-sided Fisher's exact test using R software. A bubble plot was also created using R software.
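The counting rules described above (collapse duplicate reads into unique insertion sites, then count forward insertions anywhere in a gene and reverse insertions only in exons as inactivating) can be illustrated with a minimal Python sketch. This is not the CLC Genomics Workbench workflow used in the study. The `Gene` class, the `(chrom, pos, strand)` read representation, the placeholder `LTR_TAIL` sequence, the decision to discard flanks shorter than 50 bp, and the toy coordinates are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass

LTR_TAIL = "ACGTACGT"  # hypothetical placeholder, NOT the real LTR end sequence
READ_LEN = 50          # reads are trimmed to 50 bp before mapping, as in the text


def trim_read(read, ltr_tail=LTR_TAIL, length=READ_LEN):
    """Remove the common LTR prefix, then keep the first `length` bases.

    Reads lacking the LTR prefix, or with a flank shorter than `length`,
    are dropped (returning None); this cutoff is an assumption.
    """
    if not read.startswith(ltr_tail):
        return None
    flank = read[len(ltr_tail):]
    return flank[:length] if len(flank) >= length else None


@dataclass(frozen=True)
class Gene:
    """Hypothetical minimal gene model with inclusive coordinates."""
    name: str
    chrom: str
    start: int
    end: int
    strand: str
    exons: tuple  # tuple of (start, end) pairs


def is_inactivating(site, gene):
    """Forward insertion anywhere in the gene, or reverse insertion in an exon."""
    chrom, pos, strand = site
    if chrom != gene.chrom or not (gene.start <= pos <= gene.end):
        return False
    if strand == gene.strand:
        return True
    return any(s <= pos <= e for s, e in gene.exons)


def count_insertions(mapped_reads, genes):
    """Count unique and inactivating insertion sites per gene."""
    total, inactivating = Counter(), Counter()
    # Duplicate alignments collapse to one unique insertion site.
    for site in set(mapped_reads):
        for gene in genes:
            if site[0] == gene.chrom and gene.start <= site[1] <= gene.end:
                total[gene.name] += 1
                if is_inactivating(site, gene):
                    inactivating[gene.name] += 1
    return total, inactivating


# Toy data: one gene on the + strand with two exons.
gene = Gene("GENE_A", "chr1", 1000, 5000, "+", ((1000, 1200), (4800, 5000)))
reads = [
    ("chr1", 1500, "+"),  # forward, intronic: inactivating
    ("chr1", 1500, "+"),  # PCR duplicate of the read above
    ("chr1", 2000, "-"),  # reverse, intronic: not inactivating
    ("chr1", 4900, "-"),  # reverse, exonic: inactivating
    ("chr2", 1500, "+"),  # outside the gene
]
total, inactivating = count_insertions(reads, [gene])
print(total["GENE_A"], inactivating["GENE_A"])
```

A real analysis would replace the nested loop with an interval index over the gene annotation; the linear scan is used here only to keep the rules easy to read.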

Genetic Screening of GPI Biosynthetic Factors using HAP1 Cells Treated with Proaerolysin
Recently, Brummelkamp and co-workers reported excellent genetic screening methods using human haploid cells combined with gene trapping [18,21]. We applied these methods to the analysis of genes required for GPI biosynthesis. A retrovirus-based gene trap vector was designed for mutagenesis; it confers blasticidin resistance on the cells, allowing selection for vector integration into the genome (Fig 1A). Aerolysin was used to obtain mutant cells defective in the biosynthesis of GPI-APs (Fig 1A). Aerolysin is secreted by the gram-negative bacteria Aeromonas and is known to bind to the GPI-anchor itself for cell membrane targeting [26], after which it forms a heptamer and makes a hole in the cell membrane [27]. To determine the cytotoxicity of aerolysin in HAP1 cells, we treated them with various concentrations of proaerolysin (Fig 1B). At concentrations of 0.15 nM or higher, 99% of cells died, so we used 0.2 nM proaerolysin to obtain cells resistant to aerolysin. We introduced gene trap vectors into the genomes of 6 × 10⁷ HAP1 cells, then treated them twice with proaerolysin to obtain aerolysin-resistant cells (Fig 1A). Since a substantial number of cells still expressed GPI-APs on the cell surface even after proaerolysin treatment, cells with reduced surface expression of CD59, one of the ubiquitously expressed GPI-APs, were collected using a cell sorter to obtain the enriched population (GPI-negative cell pools).
Flow cytometric analysis showed that FLAER staining and the surface expression of GPI-APs such as CD59 and DAF were greatly reduced in the enriched cells compared with wild-type cells (Fig 2A). If the gene trap insertions had occurred within introns, we would expect that excision of the mutagenic vector components by Flp recombinase acting on the flippase recognition target (FRT) sites (Fig 2B) would reverse the mutant phenotype. Indeed, some HAP1-GT cells showed restored surface GPI-AP expression following transfection of the Flp recombinase gene (Fig 2A). The observed low restoration rate reflected the transfection efficiency and the efficiency of Flp recombinase activity in the excision of insertional sites.

Characterization and Determination of the Genes Responsible for Clonal GPI-Negative HAP1 Cells
We isolated several clonal cell lines from the GPI-negative cell population, including HAP1-GT clone 3 (C3) which showed no surface expression of GPI-APs (Fig 3A). GPI-AP expression could, however, be rescued by transfection of the Flp recombinase gene. We identified six insertion sites of the gene trap vector in the genome of HAP1-GT-C3 cells (S3 Fig).
Three of these were found in intergenic regions, one matched a bacterial artificial chromosome sequence derived from chromosome 3, one was inserted in the reverse direction of an intron so would not affect splicing, and the last was integrated in the forward direction in an intronic region of PIGP (Fig 3B), which encodes a subunit of GPI-GnT. Transfection of a complete PIGP gene restored the surface expression of GPI-APs (Fig 3C), showing that the screening was accurate.

Comprehensive Analysis of Genes Required for GPI Biosynthesis
We next tried to comprehensively identify insertion sites of the gene trap vector in GPI-negative cell pools. Genomic DNA was extracted from cell populations without selection or from GPI-negative cell populations, and the Illumina HiSeq was used to determine the insertion sites (S1 and S2 Figs). The enrichment of insertion sites in the GPI-negative population relative to the non-selected population was analyzed, enabling us to rank genes according to their significance. We found that the top 23 genes enriched in the GPI-negative population were known to be directly or indirectly involved in GPI biosynthesis (Fig 4 and S3 Table). Most were phosphatidylinositol glycan genes (PIG) [27], although post-GPI-attachment to protein gene 2 (PGAP2), DPM1, DPM3, MPDU1, and PMM2 were also enriched. PIGM and PGAP3 were also enriched, but to a lesser extent that did not reach significance (S3 Table).
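As an illustration of how such an enrichment ranking can be computed, the sketch below applies a one-sided Fisher's exact test to the inactivating-insertion counts reported in this section for PIGS, PIGW, and PIGP, using the total numbers of inactivating insertions (1,164 in the GPI-negative and 34,768 in the non-selected population) as the background. The original analysis was done in R; this is a Python re-implementation written for illustration, and the exact layout of the 2 × 2 table is an assumption, since the text does not spell it out. The Benjamini-Hochberg step is applied to these three genes only, so the corrected values are not comparable to the genome-wide values in S3 Table.

```python
from math import comb

TOTAL_SELECTED = 1_164    # inactivating insertions, GPI-negative population
TOTAL_CONTROL = 34_768    # inactivating insertions, non-selected population

# (inactivating insertions in GPI-negative, in non-selected), from the text
COUNTS = {"PIGS": (55, 3), "PIGW": (22, 4), "PIGP": (72, 9)}


def fisher_one_sided(hits_sel, total_sel, hits_ctrl, total_ctrl):
    """P(X >= hits_sel) under the hypergeometric null (enrichment in selected).

    Assumed table: rows = selected / control population,
    columns = insertions in this gene / insertions elsewhere.
    """
    k_total = hits_sel + hits_ctrl      # insertions in this gene overall
    n_total = total_sel + total_ctrl    # all inactivating insertions
    denom = comb(n_total, total_sel)
    upper = min(k_total, total_sel)
    tail = sum(
        comb(k_total, x) * comb(n_total - k_total, total_sel - x)
        for x in range(hits_sel, upper + 1)
    )
    return tail / denom  # exact integer arithmetic until this division


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):        # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


genes = list(COUNTS)
pvals = [
    fisher_one_sided(sel, TOTAL_SELECTED, ctrl, TOTAL_CONTROL)
    for sel, ctrl in COUNTS.values()
]
qvals = benjamini_hochberg(pvals)

# Rank genes by significance, most enriched first.
for gene, p, q in sorted(zip(genes, pvals, qvals), key=lambda row: row[1]):
    print(f"{gene}: P = {p:.2e}, FDR-corrected P = {q:.2e}")
```

The hypergeometric tail is summed with exact integers from `math.comb`, which avoids the floating-point overflow that a naive factorial formula would hit with totals of this size.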
We undertook further analysis of a number of genes. To count the insertion sites in each gene, CLC Genomics Workbench software was used as described in the "Materials and methods" section. In PIGS, four independent insertion sites were identified in the non-selected population, of which three were inactivating (Fig 5A). In the GPI-negative population, a total of 58 insertion sites were identified, of which 55 were inactivating. In the case of PIGW in the non-selected population, five independent insertion sites were identified, of which four were inactivating; this compares with 22 in the GPI-negative population, all of which were inactivating (Fig 5B). In PIGP, nine independent insertion sites were identified in the non-selected population, all of which were inactivating (Fig 5C). In the GPI-negative population, a total of 92 insertion sites were identified, of which 72 were inactivating.

Fig 1. (A) ... In the screening procedure, four times the number of infected cells (2.4 × 10⁸ cells) were treated with 0.2 nM proaerolysin for 1 day. After the first treatment, the surviving cells were harvested and cultured in new plates. Then, 6 × 10⁷ cells were treated again with 0.2 nM proaerolysin for 1 day. Resistant cells were proliferated and stained with an anti-CD59 antibody, and CD59-negative cells were further enriched by cell sorting. Ψ, packaging signal; PGKpro, PGK promoter; CMV pro, CMV promoter; BSD, blasticidin S-deaminase gene. (B) Sensitivity of HAP1 cells to proaerolysin. HAP1 cells were treated with the indicated concentrations of proaerolysin (nM) for 3 h. After changing to medium without aerolysin, cell viability was measured by the WST-1 assay and is shown as % viability. Viability of cells without proaerolysin treatment was set to 100%. Data are means ± standard deviation (n = 4).

Discussion
In this study, we performed a genome-wide screen for GPI biosynthesis using human haploid cells and gene trap vectors. Treatment with proaerolysin enabled mutants defective in the GPI biosynthetic pathway to be enriched. At least 30 genes have been shown to be involved in the biosynthesis of GPI-APs, either directly or indirectly (Fig 6 and S1 Table). Of these, disruption of 26 genes decreases the surface expression of GPI-APs (Fig 6, shown in blue). Our screening identified 23 genes required for GPI biosynthesis that were significantly enriched in the GPI-negative cell population. These included PGAP2, which is required for GPI fatty acid remodeling in the Golgi. In PGAP2 mutant CHO cells, the surface expression of GPI-APs was significantly decreased because lyso-forms of GPI-APs (the product in step 16 shown in Fig 6) are transported and released from the plasma membrane into the medium soon after arrival [28]. Other enriched genes were DPM1, DPM3, MPDU1, and PMM2, while PGAP3 and PIGM were weakly enriched. DPM1 and DPM3 are required for the synthesis of Dol-P-Man, which is used as a substrate in the mannosylation of GPI precursors in the ER, while MPDU1 functions in the utilization of Dol-P-Man [29]. Mutations in DPM1, DPM3, and MPDU1 were previously reported to impair the surface expression of GPI-APs [30][31][32].

Fig 4. ... The size of each bubble shows the number of inactivating insertion sites in the enriched GPI-negative population. Genes significantly enriched in the GPI-negative population (P<0.001) are colored. Yellow bubbles indicate genes encoding the biosynthesis of GPI intermediates in the ER, green bubbles the genes required for GPI-transamidation, orange bubbles the genes involved in Dol-P-Man synthesis and utilization, and pink bubbles the gene involved in GPI-AP remodeling. The bubbles of PIGM and PGAP3 (arrows) were close to the significance limit.
PMM2 encodes phosphomannomutase 2, which catalyzes the isomerization of mannose-6-phosphate to mannose-1-phosphate. This is subsequently converted to GDP-mannose, which is utilized for mannosylation and Dol-P-Man synthesis. Mutations in PMM2 cause the most common congenital disorder of glycosylation, known as CDG-Ia or PMM2-CDG, although the partial loss of PMM2 activity has little effect on the expression of GPI-APs in patient cells [33]. PIGM is a small gene (4322 bp) with no introns that encodes a protein required for the transfer of the first Man to the GPI intermediate together with PIGX. PGAP3 is required for the fatty acid remodeling of GPI-APs and is critical for the nano-clustering of GPI-APs on the cell membrane [34][35][36]. In Pgap3−/− mouse embryonic fibroblasts, proaerolysin cytotoxicity is reduced, probably because of slow oligomerization of aerolysin on the cell membrane due to low clustering of GPI-APs containing an unsaturated fatty acid [37].
Two genes, DPM2 and PIGY, were not enriched in our screening, although their mutants show decreased expression of GPI-APs on the cell surface [32,38]. We detected one inactivating insertion site in the PIGY gene in the GPI-negative population, but none in the non-selected population (S3 Table). It is conceivable that because these genes are relatively small (2821 bp for PIGY and 3390 bp for DPM2), gene trap vectors might not readily integrate into them. Alternatively, retroviral vectors may insert preferentially into specific genomic regions, as previously reported [39]. Similar findings were also observed in the screening of haploid embryonic stem cells mutagenized by N-ethyl-N-nitrosourea, from which it was concluded that the mutation rate for each gene is dependent on the length of the coding sequence [40]. More recently, a haploid genetic screen for the GPI biosynthetic pathway similar to ours has been reported in which CD59-negative or Prion protein-negative cells were enriched from gene-trapped HAP1 cells [41]. In the latter study, three additional genes were found. SEC62 and SEC63, which encode subunits of the ER protein translocon complex, were enriched in the Prion protein-negative cell population, and SPPL3 was enriched in the CD59-negative cell population. It has been reported that Prion proteins and GPI-APs in yeast require the SEC62 and SEC63-dependent translocation system to enter the ER lumen [42][43][44]. Because aerolysin binds to the GPI-anchor itself for cell targeting, factors generally required for GPI-APs would be enriched in our screening, whereas those required for expression of specific GPI-APs may not be. In addition, it has been reported that the GPI-anchor on Prion proteins does not seem to be recognized by aerolysin [45]. Indeed, factors such as SEC62 and SEC63 were not enriched in our screening. Conversely, PGAP3 was found only in our screening, consistent with a previous report that PGAP3 deletion caused mild resistance to proaerolysin [37].
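The gene-size argument for PIGY and DPM2 can be made concrete with a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that the 199,043 independent insertions mapped in the non-selected population land uniformly at random across a genome of about 3.1 × 10⁹ bp (an approximate figure for hg19 that is not stated in the text), and that the number of insertions per gene follows a Poisson distribution. Retroviral integration is not uniform, as noted above, so the numbers indicate only an order of magnitude.

```python
from math import exp

GENOME_BP = 3.1e9        # assumed approximate genome size; not from the text
N_INSERTIONS = 199_043   # independent insertions, non-selected population

# Gene sizes quoted in the text (bp)
GENE_SIZES_BP = {"PIGY": 2821, "DPM2": 3390}


def expected_hits(gene_bp, n_insertions=N_INSERTIONS, genome_bp=GENOME_BP):
    """Expected insertions in a gene under a uniform-integration model."""
    return n_insertions * gene_bp / genome_bp


def prob_no_hit(gene_bp, n_insertions=N_INSERTIONS, genome_bp=GENOME_BP):
    """Poisson probability that the gene receives no insertion at all."""
    return exp(-expected_hits(gene_bp, n_insertions, genome_bp))


for gene, size in GENE_SIZES_BP.items():
    print(
        f"{gene} ({size} bp): expected insertions = {expected_hits(size):.2f}, "
        f"P(no insertion) = {prob_no_hit(size):.2f}"
    )
```

Under these assumptions the expected number of insertions is below one for both genes, which is in line with the suggestion that small genes are likely to be missed at this library size.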
Details of the factors required in the GPI biosynthetic pathway have not been fully clarified (S1 Table). For example, glucosamine-PI is flipped into the ER luminal side from the ER cytoplasmic face in the early stages of the pathway, but the putative GPI-flippase has not yet been identified [2]. Our study and those of others failed to show the enrichment of any suitable candidate genes [40,41,46]. There could be several reasons for this. First, it is possible that the GPI-flippase gene is essential for cell growth, although no other gene involved in GPI biosynthesis is essential for growth at the cellular level. Second, the gene required for flipping might be dispensable for the surface expression of GPI-APs, so even if mutated, GPI-APs surface expression would be normal. Third, the flippase gene could be duplicated in the genome, so a mutation in one gene would be compensated for by the other. Fourth, if the gene encoding flippase is very short, insertions would not be enriched during screening, as seen for DPM2 and PIGY. Therefore, to identify all regulatory factors involved in the GPI biosynthetic pathway, different strategies or new assay systems are essential.
One of the difficulties of forward genetics in mammalian cells is the ploidy of the chromosomes, which is at least diploid in mammalian cell lines. Therefore, even if one allele is mutated, the corresponding allele on the homologous chromosome remains intact, so the phenotype is unaffected. This multi-ploidy makes it difficult to conduct genetic analysis. In mammalian cells, forward genetic approaches using RNA interference have been widely used [47,48]. However, genes encoding enzymes can be difficult to isolate using this technique because only a small amount of enzyme is sufficient for the reaction to occur. Moreover, substantial off-target effects can be observed [49,50]. Recently, CRISPR/Cas9 systems have been used to overcome this [46,[51][52][53]. Although the screening achieved with these systems appears to be efficient, not all genes can be disrupted, and off-target effects also need to be taken into account.
Genetic screening using HAP1 cells combined with gene trapping can therefore be used as a high throughput and reliable alternative to yeast genetics [18,21,54]. Although gene trap-independent mutations or genetic revertants are often observed during the screening process, they can be eliminated by removing the SA site in the trapped region from the genome by expressing Flp recombinase. This is also useful to confirm whether the mutant cells that are isolated or enriched show the phenotypes of interest.