Single Cell Transcriptome Amplification with MALBAC

Recently, Multiple Annealing and Looping-Based Amplification Cycles (MALBAC) has been developed for whole genome amplification of an individual cell, relying on quasilinear instead of exponential amplification to achieve high coverage. Here we adapt MALBAC for single-cell transcriptome amplification, which gives consistently high detection efficiency, accuracy and reproducibility. With this newly developed technique, we successfully amplified and sequenced single cells from 3 germ layers from mouse embryos in the early gastrulation stage, and examined the epithelial-mesenchymal transition (EMT) program among cells in the mesoderm layer on a single-cell level.

Introduction
mRNA expression analyses have been extensively used in biomedical research by fluorescence in situ hybridization (FISH), qRT-PCR, and microarray, and recently have been carried out on the entire transcriptome with the advent of next-generation sequencing via RNA-seq [1]. In general, FISH at single molecule resolution [2][3][4] gives the most quantitative measurement, but has limited dynamic range and low throughput. Similarly, RT-qPCR has high accuracy but cannot achieve whole transcriptome scale analyses [5][6][7][8][9]. RNA-seq has surpassed microarrays in both accuracy and dynamic range [10,11]. In a single cell, gene expression is intrinsically stochastic and cannot be synchronized among cells, which leads to cell-to-cell variations in mRNA expression levels [2,4,12]. This necessitates single cell transcriptome measurements, which have prompted intense recent efforts.
The first single-cell RNA-seq method [10,11,13,14] used a PCR-based exponential amplification scheme, in which terminal transferase adds a poly-A tail to the 3' end of first-strand cDNAs prior to second-strand synthesis. This PCR-based RNA-seq method lacked spike-in controls and, as expected, displayed a general amplification bias towards the 3' ends of mRNAs. Another PCR-based technique, Quartz-Seq [15], was developed with a different strategy, but the same problems remained. Subsequent methods, such as STRT [16][17][18] and SMART-seq [19][20][21], relied on a reverse transcriptase with template-switching activity. Although they have the potential to amplify full-length mRNA, these PCR-based techniques may still exhibit significant bias that depends on mRNA length, given the general preference of PCR for shorter amplicons. CEL-seq [22] and MARS-Seq [23] use in vitro transcription (IVT) instead of PCR as the amplification method, and reduce hands-on time by allowing many samples to be pooled before amplification. However, the requirement for barcoding limits coverage to only the 3' or 5' ends of the transcripts. Another method [24] based on random priming has been demonstrated recently, but could not address the issue of low amplification efficiency.
Multiple annealing and looping-based amplification cycles (MALBAC) [25] was able to significantly reduce the amplification bias compared to previous MDA-based whole genome amplification [26]. It can also confidently detect copy-number variations and point mutations in the genome, presenting great downstream opportunities, such as profiling meiotic recombination and genome aneuploidy in sperm [27]. Taking advantage of its effectiveness in DNA amplification, here we present a single-cell transcriptome amplification method based on MALBAC, named MALBAC-RNA. Throughout this work, we systematically analyze the efficiency and technical consistency of this novel technique, and demonstrate its ability by applying it to single embryonic stem cells during mouse gastrulation.
Every organ or somatic tissue of a mouse is derived from a single sheet of epiblast [28,29]. During the gastrulation stage from 6.5 to 8.5 days post coitum (d.p.c.), the cup-shaped epiblast diversifies to generate three distinct germ layers known as ectoderm, mesoderm and endoderm. During this period, the mesoderm and endoderm delaminate from the epiblast in a specialized region, namely the primitive streak, which contains a narrow stripe of egressing and differentiating cells running down one side of the cup. Each layer then gives rise to different components of the fetal organ primordia. Therefore, gastrulation represents a crucial phase of cytodifferentiation, morphogenesis and pattern formation, dramatically transforming an epithelial sheet into an embryo with recognizable vertebrate form within 48 hours.
During the early stage of gastrulation, in order to move into the primitive streak in the embryo and further differentiate into 3 distinct germ layers, the epiblast cells have to lose their cell-cell adhesion through an epithelial-mesenchymal transition (EMT) [30]. With the induction of EMT, cells within the newly formed mesoderm layer acquire the characteristics of the mesenchymal cells [31].
Transcriptome profiling of each of the germ layers could shed light on the differences in gene expression between the ectoderm, mesoderm and visceral endoderm. However, the study of post-implantation embryonic development has been hampered by the limited amount of RNA obtainable from a mammalian embryo. Taking advantage of our MALBAC-RNA single cell sequencing method, we were able to compare single-cell transcriptomes between germ layers, which enables us to have a detailed look at the transcriptional network active during the EMT process.

Mouse embryo dissection
At 7.0 days post coitum (dpc), C57BL/6 mice were sacrificed under anesthesia by isoflurane overdose followed by cervical dislocation, and the embryos were collected. The extraembryonic tissues were mechanically removed in M2 medium with 10% fetal calf serum. The remaining embryonic region was rinsed in PBS and then digested with dispase, followed by mechanical dissection. The isolated ectoderm, mesoderm, and visceral endoderm pieces were trypsinized into single cells, which were individually picked by mouth pipette into cell lysis buffer in PCR tubes for single-cell amplification. Animal experiments were approved by the Institutional Animal Care and Use Committees (IACUC) at Harvard University.
Cell culture and sample preparation before single cell amplification
SW480 cells, obtained from the American Type Culture Collection (ATCC), were cultured in Leibovitz's L-15 Medium with 10% fetal bovine serum, 100 I.U./ml penicillin and 100 μg/ml streptomycin. Prior to the experiment, the cells were treated with 0.25% Trypsin-EDTA and washed once with 1x phosphate buffered saline (PBS). After the wash, the cells were diluted and counted under the microscope to estimate the cell concentration. The original cell suspension was then diluted with 1x PBS by the calculated amount to reach a final concentration of 100 cells/μL. Next, 1 μL of the well-mixed diluted cell suspension was added to a total of 4 μL of cell lysis buffer, which contains 1x first-strand buffer for Superscript III Reverse Transcriptase, 5 mM DTT, 0.5 mM each dNTP, 0.45% IGEPAL CA-630, 0.4 U/μL RNase inhibitor, 0.2 U/μL SUPERase In, and 2.5 μM GAT-12dT primer. The cells were lysed by heating at 70°C for 90 seconds, and the reaction then underwent MALBAC-RNA amplification as described below.
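The dilution arithmetic in this step can be sketched as follows. This is only an illustration: the counted stock concentration (2,500 cells/μL) and the 500 μL working volume are made-up values, and the function names are hypothetical. Only the 100 cells/μL target and the 1 μL plus 4 μL volumes come from the protocol above.

```python
def dilution_volumes(stock_cells_per_ul, target_cells_per_ul, final_volume_ul):
    """Return (stock volume, PBS volume) in uL needed to reach the target concentration."""
    if target_cells_per_ul > stock_cells_per_ul:
        raise ValueError("target concentration exceeds stock concentration")
    stock_ul = final_volume_ul * target_cells_per_ul / stock_cells_per_ul
    return stock_ul, final_volume_ul - stock_ul


def expected_cells(cells_per_ul, sample_volume_ul):
    """Expected number of cells carried into the lysis buffer."""
    return cells_per_ul * sample_volume_ul


# Hypothetical count: 2,500 cells/uL in the washed suspension, 500 uL working volume.
stock_ul, pbs_ul = dilution_volumes(2500.0, 100.0, 500.0)

# Values from the protocol: 1 uL of the 100 cells/uL suspension into 4 uL lysis buffer.
cells_per_lysate = expected_cells(100.0, 1.0)
reaction_volume_ul = 1.0 + 4.0

print(f"Mix {stock_ul:.1f} uL stock with {pbs_ul:.1f} uL 1x PBS")
print(f"Expected cells per {reaction_volume_ul:.0f} uL lysate: {cells_per_lysate:.0f}")
```

Under these assumptions, each lysate carries about 100 cells on average, which corresponds to the 100-cell lysate that is later aliquoted into single-cell portions.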

Library preparation and sequencing
For each sample, several micrograms of amplified cDNA were generated by the PCR amplification that follows the MALBAC-RNA pre-amplification cycles. After amplification was validated by qPCR on a few housekeeping and highly expressed genes, including Gapdh, Rps13, Rpl21, Rps8, and Actb, libraries were constructed for the Illumina HiSeq 2000 sequencer with about 1 μg of cDNA from each sample. Each cell was sequenced with 100 bp paired-end reads to a depth of 3 to 7 million reads. All data are accessible at the NCBI Sequence Read Archive under accession number SRP049515.

Sequencing data analysis
Reads were aligned to the reference genome using Tophat 2.0.4 [32], and FPKM values were estimated using Cufflinks 2.0.1 [33]. Data from SW480 cells were aligned to genes annotated in the UCSC knownGenes table for the hg19 reference genome. Gene expression estimates were rescaled using upper quartile normalization of genes with detectable expression in at least one of the replicates [34]. For technical evaluations, spike-ins were limited to those with GC content between 40% and 60%. Data from mouse embryos were aligned to the mm9 reference genome using RefSeq annotations. Hierarchical clustering was performed in R using heatmap.2, and differential expression analysis was performed using DESeq [35]. Gene ontology enrichment analysis was performed using GOrilla [36].
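The rescaling and spike-in filtering steps can be illustrated with a minimal Python sketch. It is one plausible reading of the description above, not the authors' pipeline: the exact procedure of reference [34] may differ, the function names are hypothetical, and the FPKM values are toy numbers chosen for illustration.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def upper_quartile_normalize(fpkm):
    """Rescale samples so that their upper quartiles agree.

    fpkm maps sample name -> list of FPKM values (same gene order everywhere).
    Genes with zero expression in every sample are ignored when computing the
    upper quartile; each sample is scaled to the mean upper quartile.
    """
    samples = list(fpkm)
    n_genes = len(fpkm[samples[0]])
    keep = [g for g in range(n_genes) if any(fpkm[s][g] > 0 for s in samples)]
    uq = {s: percentile([fpkm[s][g] for g in keep], 75) for s in samples}
    if any(v == 0 for v in uq.values()):
        raise ValueError("a sample has an upper quartile of zero")
    target = sum(uq.values()) / len(uq)
    return {s: [v * target / uq[s] for v in fpkm[s]] for s in samples}


def gc_fraction(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def within_gc_window(seq, low=0.40, high=0.60):
    """True if the spike-in sequence falls in the 40% to 60% GC window."""
    return low <= gc_fraction(seq) <= high


# Toy data (hypothetical): rep2 is rep1 sequenced at twice the depth.
toy = {"rep1": [0.0, 2.0, 4.0, 8.0, 0.0], "rep2": [0.0, 4.0, 8.0, 16.0, 0.0]}
normalized = upper_quartile_normalize(toy)
print(normalized)
```

In this toy case the two replicates become identical after rescaling, because they differ only by a global factor.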

Results and Discussion
Single cell transcriptome amplification with MALBAC-RNA
During the experiment, each cell is picked and transferred into a PCR tube preloaded with mild cell lysis buffer. After cell lysis, mRNA is reverse transcribed to cDNA with poly-T primers, which include a 27-nucleotide sequence. Once the cDNA has been synthesized, primers consisting of the same 27 nucleotides together with 7 random nucleotides are used for cDNA amplification. The 7 random nucleotides can hybridize evenly along the reverse-transcribed cDNA at 4°C (Fig 1). As the temperature is slowly increased to 65°C, second-strand cDNA synthesis begins. Thanks to its strand displacement activity, the DNA polymerase allows a strand extending from an upstream primer to displace the downstream primer base by base as it proceeds along the template. Upon reaching the end of extension, each newly synthesized cDNA carries a 27-base tag at its 3' end that is complementary to its 5' end, because the same sequence is used in both reverse transcription and second-strand synthesis. After melting at 95°C, cDNA with complementary tags at both ends forms a loop at 58°C, which prevents it from being amplified further and completes a full MALBAC cycle. A total of 10 MALBAC pre-amplification cycles are used to generate enough amplicons for PCR.
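To illustrate why withdrawing looped products from the template pool matters, the following idealized sketch compares two toy models. It is a deliberate simplification and not a model of the actual reaction chemistry: it assumes a constant per-cycle copying efficiency, treats looped products as completely inert, and uses made-up efficiencies (0.9 and 0.6) for two hypothetical transcripts. The function names are illustrative.

```python
def looped_yield(n_templates, cycles, efficiency):
    """Products when only the original templates are copied in each cycle."""
    return n_templates * cycles * efficiency


def exponential_yield(n_templates, cycles, efficiency):
    """Products when every molecule, including earlier products, is recopied."""
    return n_templates * ((1 + efficiency) ** cycles - 1)


def bias_ratio(yield_fn, cycles, eff_a, eff_b):
    """Yield ratio of two transcripts that differ only in per-cycle efficiency."""
    return yield_fn(1, cycles, eff_a) / yield_fn(1, cycles, eff_b)


# Hypothetical efficiencies for an easily copied and a poorly copied transcript.
cycles = 10
linear_bias = bias_ratio(looped_yield, cycles, 0.9, 0.6)
exponential_bias = bias_ratio(exponential_yield, cycles, 0.9, 0.6)

print(f"bias after {cycles} looped cycles:      {linear_bias:.2f}x")
print(f"bias after {cycles} exponential cycles: {exponential_bias:.2f}x")
```

In the looped model the final bias equals the ratio of the two efficiencies, whereas in the exponential model the same difference compounds with every cycle.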
Because only the original cDNA template is targeted for amplification in each cycle, MALBAC does not generate as much bias, and the overall amplification is quasilinear. To acquire enough material for sequencing, a further 19 cycles of PCR amplification are applied using the same 27-base common sequence as the primer. To evaluate the technical reproducibility of MALBAC-RNA, we amplified two replicates prepared by diluting and aliquoting a 100-cell lysate from the colorectal cancer cell line SW480 into single-cell portions. These technical replicates should differ in molecular counts only by Poisson fluctuations. Additionally, we amplified and sequenced nine SW480 single cells, which exhibit biological variability as well. MALBAC-RNA exhibits linear detection of synthetic spike-in transcripts across five orders of magnitude (Fig 2A). Of the 11,233 genes detected in bulk mRNA sequencing, only 1,045 were not detected in at least one of the single cells, while an additional 1,622 genes were detected in at least one of the single cells but not the bulk. The correlation between the two technical replicates is shown in Fig 2B. MALBAC-RNA shows high reproducibility, with a correlation coefficient of 0.995, while the nine SW480 single cells exhibit reduced correlation due to biological variation between cells (Figure A in S1 File). However, correlation is primarily influenced by a handful of highly expressed genes and is therefore a poor metric for evaluating technical reproducibility (Figure B in S1 File). More tellingly, MALBAC-RNA is reproducible in detecting expressed genes (Fig 2C) and shows low amplification noise, measured as a 10-fold or greater FPKM difference for the same gene after amplification (Fig 2D). Moreover, because random primers are incorporated throughout the transcripts, amplification is not biased against longer transcripts (Figure C in S1 File).
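The two reproducibility measures mentioned above, detection overlap and the fraction of genes with a 10-fold or greater FPKM difference, could be computed along the following lines. The precise definitions behind Fig 2C and Fig 2D are not spelled out in this text, so this is an assumed reading with hypothetical function names, a detection threshold of FPKM > 0 chosen for illustration, and toy values rather than real data.

```python
def replicate_metrics(fpkm_a, fpkm_b, detect_threshold=0.0, fold=10.0):
    """Compare two replicates given as equal-length lists of FPKM values.

    Returns a tuple:
      detection overlap = genes detected in both / genes detected in either
      noise fraction    = co-detected genes differing by >= `fold` / co-detected genes
    """
    if len(fpkm_a) != len(fpkm_b):
        raise ValueError("replicates must cover the same genes")
    either = both = discordant = 0
    for a, b in zip(fpkm_a, fpkm_b):
        det_a, det_b = a > detect_threshold, b > detect_threshold
        if det_a or det_b:
            either += 1
        if det_a and det_b:
            both += 1
            if max(a, b) / min(a, b) >= fold:
                discordant += 1
    overlap = both / either if either else 0.0
    noise = discordant / both if both else 0.0
    return overlap, noise


# Toy FPKM values for five genes in two technical replicates (hypothetical).
rep_a = [5.0, 0.0, 100.0, 2.0, 50.0]
rep_b = [6.0, 1.0, 90.0, 40.0, 0.0]
overlap, noise = replicate_metrics(rep_a, rep_b)
print(f"detection overlap: {overlap:.2f}, 10-fold discordant fraction: {noise:.2f}")
```

Unlike a correlation coefficient, both numbers weight every gene equally, so they are not dominated by a handful of highly expressed genes.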

Transcriptome Amplification of Single Embryonic Stem Cells
Having demonstrated that MALBAC-RNA generates quantitative and reproducible single-cell transcriptomes, we asked whether a global analysis of cells isolated from early gastrulation stage embryos could reveal germ layer-specific transcriptomic patterns and trace the origin of germ-layer derivation.
To this end, we amplified and sequenced a total of 11 single-cell transcriptomes across the three germ layers (ectoderm, mesoderm, and visceral endoderm) of a 7.0 dpc embryo (S1 Table). In principal component analysis, samples from the three germ layers were clearly separated (Fig 3A). In particular, the first principal component distinguishes the visceral endoderm from the other two layers, whereas the second principal component represents the difference between ectoderm and mesoderm. The germ-layer origin of these embryonic cells is additionally confirmed by the expression of known germ-layer-specific markers (Fig 3B). All visceral endoderm cells express high levels of endoderm-specific marker genes, such as Cited1, Hnf4a, Cubn, Afp, and Apoa1, but not mesoderm markers, such as Aplnr. The data show a distinct differentiation of the three germ layers based on their unique expression profiles. A total of 738 genes were found to be differentially expressed between ectoderm and mesoderm, 1,783 between ectoderm and visceral endoderm, and 1,831 between mesoderm and visceral endoderm. Differentially expressed genes were enriched for processes including embryonic morphogenesis, pattern specification, cell differentiation, and regulation of the Wnt signaling pathway (S3 Table). Therefore, both the global transcriptomes and the expression of known germ-layer-associated marker genes clearly support the germ-layer identity of all the examined single cells.
We next investigated the relationship between the three germ layers. Interestingly, they are not equally separated from each other. In principal component analysis, the visceral endoderm is more distinct from the other two layers and this difference constitutes the most significant component of variance. Consistent with this observation, hierarchical clustering placed ectoderm and mesoderm under the same subtree (Fig 3B). Therefore, our data suggest a more distinct separation of visceral endoderm from the other two germ layers at 7.0 dpc.
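How a clustering tree can place two layers under the same subtree is illustrated by the small sketch below. It is written in Python with average linkage and a correlation distance, which is an assumption made for illustration and not the R heatmap.2 call used for Fig 3B. The three log-expression profiles are invented toy values, not measurements from this study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_linkage(profiles):
    """Agglomerative clustering with distance 1 - Pearson r.

    profiles maps a sample name to its list of log-expression values.
    Returns the merges in order, each as a pair of tuples of sample names.
    """
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise distance between members of the two clusters.
                total = sum(1.0 - pearson(profiles[a], profiles[b])
                            for a in clusters[i] for b in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical log-expression profiles over five genes.
toy_profiles = {
    "ectoderm": [5.0, 4.0, 1.0, 0.5, 3.0],
    "mesoderm": [4.5, 4.2, 1.5, 0.4, 3.5],
    "visceral_endoderm": [1.0, 0.5, 6.0, 5.0, 3.0],
}
merges = average_linkage(toy_profiles)
for left, right in merges:
    print(left, "+", right)
```

With these toy profiles the two most similar samples are joined first and the remaining sample attaches last, which is the tree shape described above for ectoderm, mesoderm, and visceral endoderm.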
In addition, we investigated the results of the EMT program within the mesoderm, as compared to the other two germ layers, based on their single-cell transcriptomes. As can be seen in Fig 4, both Fgf10 and Snai1 are significantly overexpressed in mesoderm, whereas the E-cadherin level is lower than in ectoderm and visceral endoderm, indicating downregulation of E-cadherin expression by FGF signals through the regulation of Snail gene expression [37]. As Sox3 is completely depleted in mesoderm, our data are also consistent with the previously reported reciprocal repression between Snail and Sox3 [38]. At the same time, both Eomes and Mesp1 are highly expressed in the mesoderm, supporting the theory that Eomes acts upstream of Mesp [30], although in our data only Mesp1, rather than both Mesp1 and Mesp2, is upregulated. A few other EMT signature genes, such as Cdh2, Wnt5a, Wnt3, Hmga2, and Smad1, are also significantly overexpressed in mesoderm cells, which further confirms the transition of the cells in the gastrulation stage.
Lastly, MALBAC-RNA revealed novel patterns of gene expression during early gastrulation. Some known germ-layer-specific markers for mesoderm, such as Bmp2, are not expressed in our samples. In some cases a known marker, such as T, is observed in only one or two of the corresponding single cells. In addition, we identified new genes that are specific to each germ layer. For example, Cotl1, which encodes a coactosin-like protein, is highly expressed in all cells from the visceral endoderm.

Conclusions
In this work, we developed a new single-cell transcriptome amplification method based on MALBAC. Instead of performing a second strand synthesis right after the reverse transcription, as is usually done by most other RNA amplification methods, we deployed a modified version of MALBAC genome amplification on first-strand synthesized cDNA directly, followed by PCR amplification. Furthermore, we showed that MALBAC-RNA has great amplification sensitivity and consistency, especially for the genes at relatively low expression levels.
Although gastrulation is a critical stage of embryonic development, it has never been thoroughly studied transcriptome-wide at the single-cell level. Recently, there has been strong interest in identifying the key components of the transcriptomes of different germ layers during gastrulation. To demonstrate the ability of MALBAC-RNA to amplify single-cell transcriptomes, we uncovered the complete transcriptomes of the three germ layers of the early gastrulating mouse embryo for the first time at single-cell and single-base resolution.
With the availability of the single-cell transcriptomes from the early gastrulation stage of mouse embryos, we were able to examine the EMT process during embryonic development. We successfully found some of the transcriptional networks on single-cell levels as suggested in previous research. The cells from the mesoderm layer showed characteristics of the cells that went through EMT, compared to the other two germ layers that were sequenced. This analysis demonstrates that as an accurate single-cell RNA amplification method, MALBAC-RNA could be used to analyze certain cellular mechanisms on a single-cell level, which provides more detailed information than would be possible with bulk population analysis.
Probing gene expression in small populations of cells in vivo is critical for the study of developmental biology. Although cell lines exist that imitate some of these processes in vitro and provide a large amount of RNA for molecular analysis, many events that involve complex morphogenesis and pattern formation, such as mammalian gastrulation, can only be studied in living embryos. These cases call for a reliable technique that directly assesses changes in gene expression in vivo. In particular, this technique should not be limited to studying genes that have already been identified through their activity in other biological systems, because such approaches impose an inherent prejudice and may thus overlook novel pathways or responses. Here, we precisely micro-dissected single cells from an embryo in its gastrulation stage and sequenced the transcriptomes of three to five individual cells from each of the three germ layers. This provides a useful resource for studying differential gene expression between the three germ layers, the three most important cell populations within gastrulating embryos. The validity of the microdissection was confirmed by unsupervised hierarchical clustering, correlation analyses of gene expression levels and the profiling of known marker genes from each germ layer. We find that single cells from the same germ layer exhibit similar gene expression patterns. Therefore, we demonstrate that analyses of germ-layer-specific gene expression can provide a rapid screen for novel genes that are expressed in a tissue- or region-specific manner.
Supporting Information
S1 File. Figure