Transcriptome Analysis during Human Trophectoderm Specification Suggests New Roles of Metabolic and Epigenetic Genes

In humans, successful pregnancy depends on a cascade of dynamic events during early embryonic development. Unfortunately, molecular data on these critical events is scarce. To improve our understanding of the molecular mechanisms that govern the specification/development of the trophoblast cell lineage, the transcriptome of human trophectoderm (TE) cells from day 5 blastocysts was compared to that of single day 3 embryos from our in vitro fertilization program by using Human Genome U133 Plus 2.0 microarrays. Some of the microarray data were validated by quantitative RT-PCR. The TE molecular signature included 2,196 transcripts, among which were genes already known to be TE-specific (GATA2, GATA3 and GCM1) but also genes involved in trophoblast invasion (MUC15), chromatin remodeling (specifically the DNA methyltransferase DNMT3L) and steroid metabolism (HSD3B1, HSD17B1 and FDX1). In day 3 human embryos 1,714 transcripts were specifically up-regulated. Besides stemness genes such as NANOG and DPPA2, this signature included genes belonging to the NLR family (NALP4, 5, 9, 11 and 13), Ret finger protein-like family (RFPL1, 2 and 3), Melanoma Antigen family (MAGEA1, 2, 3, 5, 6 and 12) and previously unreported transcripts, such as MBD3L2 and ZSCAN4. This study provides a comprehensive outlook of the genes that are expressed during the initial embryo-trophectoderm transition in humans. Further understanding of the biological functions of the key genes involved in steroidogenesis and epigenetic regulation of transcription that are up-regulated in TE cells may clarify their contribution to TE specification and might also provide new biomarkers for the selection of viable and competent blastocysts.


Introduction
Pre-implantation development of mammalian embryos encompasses a series of critical dynamic events, such as the transition from a single-cell zygote to a multicellular blastocyst and the first segregation of cells within the embryo with the formation of the inner cell mass (ICM) surrounded by trophectoderm (TE) cells. ICM retains pluripotency and gives rise to the embryo proper, whereas TE cells play an important role in embryonic implantation in the uterine endometrium and placental formation. In humans, the embryonic genome activation (EGA) program is functional by day 3 after fertilization [1]. The 6-8 cell stage embryo (day 3 post-fertilization) starts the process of ''compaction'' that leads to the generation of the tightly organized cell mass of the morula and is followed by differentiation of the morula into a blastocyst [2]. The transition from day 3 embryos to day 5 blastocysts is likely to be controlled by many and specific changes in the expression of different genes as this process involves both cellular differentiation and transcriptional reprogramming. Although some genes that are specifically expressed in day 3 human embryos and in TE cells, such as CCNA1 and GATA3 respectively have been identified [3,4], our knowledge on the changes in gene expression associated with the initial embryo-TE transition and the specification of the TE cell lineage is still limited. In addition, since TE biopsies from day 5 human blastocysts might become a reliable alternative to blastomere biopsies to assess the expression of biomarkers of embryo viability [5], a better knowledge of the genes that are specifically expressed in TE cells and the embryo proper is crucial. Recent technological advances in mRNA amplification methods and DNA microarray assays have allowed the simultaneous analysis of the transcript level of thousands of genes in one experiment, thus offering a global view of the molecular events regulating physiological functions and cellular processes [6,7]. 
Indeed, these methodologies have already contributed to improving our knowledge of the genetic network controlling key stages of pre-implantation embryo development [8,9,10,11]. In this study, we used high-density oligonucleotide Affymetrix HG-U133P microarray chips to analyze the gene transcription profiles of single day 3 human embryos and TE cells isolated from day 5 blastocysts. By comparing the transcriptomes of TE cells and day 3 embryos, we identified the specific molecular signature of human TE cells. These findings should provide a basis for investigating the molecular mechanisms of the embryo-TE transition as well as important insights for the development of diagnostic tests to assess blastocyst quality in assisted reproduction programs.

Dynamic Changes in Overall Gene Expression in Mature MII Oocytes, Single Day 3 Embryos, TE Cells from Day 5 Blastocysts and hESCs
To determine the global gene expression variation in the different samples, we established the gene expression profile of mature MII oocytes (n = 3), day 3 single embryos (n = 6), TE samples from day 5 blastocysts (n = 5) and hESCs (n = 4) (to represent the ICM) by using high-density oligonucleotide Affymetrix HG-U133P microarray chips. An unsupervised principal component analysis (PCA) showed that samples from the same group clustered together very tightly (Figure 1A), corroborating the robustness of the Affymetrix microarrays [12]. Moreover, an unsupervised hierarchical clustering analysis of the array data (based on 15,000 genes) perfectly clustered the different samples, confirming their very specific expression profiles (Figure 1B). Finally, a scatter plot analysis (Figure S1) showed that expression variations between mature MII oocytes and single day 3 embryos were high, as illustrated by the dispersed scatter plots and the low correlation coefficient (0.51). Conversely, the differences in gene expression between day 3 embryos and TE or hESC samples were lower, as indicated by the tighter scatter plots and the higher correlation coefficients (0.60-0.76) (Figure S1). These results reveal dynamic transcriptome changes during the transition from mature oocyte to day 3 embryo and from day 3 embryo to blastocyst. These ''dynamic patterns'' are due to the large-scale degradation of human maternal transcripts and the activation of embryonic genes, as was also observed in the mouse [10,13].

Comparison of the Gene Expression Profiles of Day 3 Embryos and TE Cells Isolated from Day 5 Blastocysts
We then compared the expression profiles of day 3 embryos and TE cells by using the significance analysis of microarrays (SAM) software with a 2-fold change cut-off and a false discovery rate (FDR) <1%. We found that 2,196 transcripts were up-regulated in human TE cells (''TE molecular signature'') and 1,714 in day 3 embryos (''day 3 embryo molecular signature'') (Figure 2). The comprehensive lists of these signatures are presented in Tables S1 and S2, and the 100 genes with the highest fold change and significant statistical value (FDR = 0) for each signature are listed in Tables 1 and 2. The ''day 3 embryo molecular signature'' included the Developmental Pluripotency Associated gene 5 (DPPA5), members of the Ret finger protein-like gene family (RFPL1, 2 and 3), of the NLR family (NALP4, 5, 9, 11 and 13), and of the melanoma antigen family (MAGEA1, 2, 3, 5, 6 and 12). Several maternal genes were found in this signature, such as members of the Zona Pellucida gene family (ZP2, 3 and 4), ZAR1, AURKC and FIGLA, suggesting that they are still active in day 3 embryos. Several transcription factors were also significantly overexpressed in day 3 embryos, such as TFB1M and TFB2M, the transcriptional regulators MBD3L2 and ZSCAN4, as well as metabolic genes such as Pyruvate Dehydrogenase Kinase 3 (PDK3) and Lactate Dehydrogenase C (LDHC). The ''TE molecular signature'' comprised genes important for placental development (PGF and TFAP2A), cytoskeleton-associated genes (Keratin 18 and 19), and genes encoding S100 calcium binding proteins (S100P, S100A6, 10, 13, 14 and 16), retinoid receptor-related testis-associated receptors (NR2F2 and NR2F6) or the cholecystokinin B receptor (CCKBR). Moreover, genes encoding extracellular matrix proteins, such as Laminins (LAMA1, LAMA5 and LAMC1) and Integrins (ITGB4 and ITGB5), were also up-regulated. Gene ontology (GO) annotations were used to explore the specific functional properties of the two molecular signatures (Figure 3).
The day 3 embryo molecular signature was enriched in genes associated with localization in the ''nucleus'', while genes associated with the ''cytoplasm'' localization were over-represented in the TE molecular signature. Concerning the ''biological processes'', the day 3 embryo molecular signature was enriched in genes involved in the regulation of cellular processes, transcription and post-translational protein modifications. Conversely, in the TE molecular signature, genes connected with different metabolic and steroid biosynthetic processes were over-represented. The ''molecular function'' analysis showed that genes involved in oxido-reductase activity were significantly enriched in the TE signature (p<0.001), whereas genes related to ''GTPase activity'' and DNA binding were over-represented in the day 3 embryo signature. Finally, the expression pattern of 11 genes belonging to the TE (GATA3, LAMA1, KRT18, HSD3B1, HSD17B1 and DNMT3L) or to the day 3 embryo molecular signature (MBD3L2, CCNA1, BIK, RFPL2 and FIGLA) was confirmed by qRT-PCR analysis using specific primer pairs (Table S3). All qRT-PCR data were normalized to GAPDH to control for variations in mRNA recovery and RT efficiency (Figure S2).

Expression of Genes Encoding Proteins which Play a Role in Apoptosis in Day 3 Embryos and TE Samples
We then investigated the expression of genes coding for proteins linked to the extrinsic and intrinsic apoptosis pathways in day 3 embryos and TE cells. The expression of genes of the TNF ligand and receptor family was not different in day 3 embryos and TE cells. Conversely, several genes belonging to the BCL-2, BIRC and Caspase families appeared to be differentially expressed in the two groups (Figure 4A). Specifically, the BCL-2 family members BCL2L10 (×37, FDR <0.0001), BCL2L11 (×16, FDR <0.001), and BIK (×3.7, FDR <0.001), the expression of which was validated by qRT-PCR (Figure 3), and the BIRC family member BIRC2 (×4, FDR <0.001) were up-regulated in day 3 embryos. Caspase 6 (×3, FDR <0.001) was over-expressed in TE cells. MCL-1, a gene that belongs to the BCL2 family and promotes cell survival, was strongly expressed in both day 3 embryos and TE samples.

Evaluation of DNA Repair Regulation in Day 3 Embryos and TE Samples
The microarray data were also used to investigate the expression of a comprehensive list of DNA repair genes [14] in day 3 embryos and TE samples (Tables S1 and S2). Of the 123 DNA damage repair genes investigated, five [UNG, RFC1, UNG2 (now named CCNO), PCNA, MSH2] were up-regulated in day 3 embryos and eleven [BRCA1, TDG, FANCG, FEN1, XRCC5, XRCC6, XPC, MUTYH, XPA, SMUG1, POLD2] in TE cells. We then analyzed the functional relationship between the DNA damage repair genes that were differentially expressed in TE samples and day 3 embryos using the Ingenuity Pathway Analysis (IPA) software. In both cases, all the DNA repair genes displayed a documented functional interaction with each other, forming a tightly connected network (Figure S3).

Stemness Genes and Transcriptional Regulatory Networks Identified in Day 3 Embryos and TE Cells
We then performed a stemness gene enrichment analysis using a previously published dataset from hESCs, in which we defined a consensus hESC stemness gene list (n = 48 genes) [7]. The key stemness factors NANOG, POU5F1 (OCT3/4) and SOX2 [15] were enriched in day 3 human embryos, whereas DNMT3B, LIN28, PHF17 and SEPHS1 were over-represented in TE cells. Conversely, other genes, such as UGP2 and PIM2, were enriched in both day 3 embryos and TE samples (Figure 4B). Bioinformatic gene pathway analysis (Ingenuity software) of the day 3 embryo molecular signature showed that many genes of the NANOG signaling pathway, including NANOG (Figure 5), were up-regulated in day 3 human embryos, thus confirming the role of NANOG in the maintenance of pluripotency [16]. The ''TE molecular signature'' included transcription factors such as GCM1, which is induced by Transforming Growth Factor-β (TGF-β) [17], and Bone Morphogenetic Protein 4 (BMP4) that induces the differentiation of pluripotent stem cells to trophoblast cells [18,19]. Other components of the TGF-β signaling cascade, such as Transforming Growth Factor Beta Receptor III (TGFBRIII), were also included in the ''TE molecular signature''.

Dynamic Expression of Epigenetic and Metabolic Regulators During Trophoblast Development
Since specification of the TE lineage during blastocyst formation involves initiation of differentiation, it is likely that epigenetic regulators have an important role in this first developmental decision. The majority of the epigenetic regulators that were up-regulated in TE cells are associated with a repressive epigenetic status (Figure 5). Specifically, the expression of the DNA methyltransferases (DNMT) DNMT3A, DNMT3B and DNMT1 increased between 2- and 13-fold in TE cells in comparison to day 3 embryos. DNMT3L expression was 70-fold higher in TE samples than in day 3 embryos. Similarly, several transcripts coding for proteins involved in chromatin remodeling and histone modification (SMARCA4, SMARCC1 and SMARCE1) were up-regulated between 2- and 7-fold in TE cells. Conversely, many histone deacetylases (HDAC9 and HDAC2) and histone acetyltransferases (HAT1, SETD8, RNF20, TAF1, STK17B, 31, 32B and 35) were down-regulated in TE cells in comparison to day 3 embryos. Another feature of the TE molecular signature was the up-regulation of several metabolic genes. Specifically, genes that are involved in estrogen biosynthesis (CYP11A1 ×35, CYP19A1 ×14) and lipid metabolism (PTGES ×20) were strongly up-regulated in TE cells. One of the most striking observations was the high expression of genes that are involved in steroidogenesis (HSD3B1 ×383, STS ×135, HSD17B1 ×108, FDX1 ×14 and SRA1 ×6).

Intersection with the Transcriptomes of Mature MII Oocytes and hESCs
In an effort to link the genes involved in the day 3 embryo-TE transition with early embryonic development, we further investigated differences and similarities in the gene expression patterns of MII oocytes, day 3 embryos, TE cells and hESC samples (comprehensive list in Table S4). The genes that were found to be up-regulated in day 3 embryos (Table S1) and TE cells (Table S2) were individually compared to those up-regulated in MII oocytes and hESCs using Venn diagrams (Figure S4).
Only 36 genes were common to both the TE and the MII oocyte signatures. On the other hand, day 3 embryos and MII oocytes shared a set of 511 genes, among which many are associated with oogenesis, such as DAZL, GDF9 and FIGLA. Finally, 1263 genes were common to both TE and hESC profiles, whereas only 124 genes were shared by day 3 embryos and the hESCs. Genes that were up-regulated in both TE and hESC samples were associated with cell death and proliferation (BAG6, CASP2 and ANXA3), metabolism (GCDH and HPGD) and WNT signaling (FZD5, AXIN1 and TCF3). Genes that were up-regulated in both day 3 embryos and hESCs (124 genes) are involved in the maintenance of pluripotency and tissue development, such as NANOG. Among the genes specifically upregulated in TE samples (644 genes), key genes related to epigenetic and metabolic pathways, such as DNMT3L, HSD3B1 and HSD17B1, were observed.
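The Venn-diagram comparison described above amounts to set intersections and differences between gene lists. The following is a minimal sketch, assuming each signature is available as a set of gene symbols; the function name is hypothetical and the toy sets contain only a few genes named in the text, not the real contents of Tables S1, S2 or S4.

```python
# Toy gene sets for illustration only. The symbols appear in the text,
# but these sets are NOT the real signatures from the supplementary tables.
def shared_and_specific(signature, *others):
    """Return (list of overlaps with each other set, genes found only in `signature`)."""
    shared = [signature & other for other in others]
    specific = signature.difference(*others)
    return shared, specific


te = {"DNMT3L", "HSD3B1", "HSD17B1", "FZD5", "AXIN1"}
hesc = {"FZD5", "AXIN1", "NANOG"}
oocyte = {"DAZL", "GDF9", "FIGLA"}

shared, te_only = shared_and_specific(te, hesc, oocyte)
print(sorted(shared[0]))  # genes shared by the TE and hESC toy sets
print(sorted(shared[1]))  # genes shared by the TE and oocyte toy sets
print(sorted(te_only))    # genes present only in the TE toy set
```

With real data, the sizes of these overlaps would correspond to the counts reported in the Venn diagrams.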

Discussion
Here, we compared the transcriptomes of day 3 human embryos and TE cells from day 5 human blastocysts to identify transcripts that are differentially expressed during the embryo-to-TE transition and the specification of the TE cell lineage. Many of the genes that were up-regulated in TE cells are already known to be associated with human TE differentiation [20,21]. For instance, we confirmed that GATA3 and KRT18, two trophoblast-determining genes, are enriched in TE from human blastocysts [22]. Moreover, the ''TE molecular signature'' also included unexpected genes, the TE-specificity of which has been overlooked. For instance, CCKBR activates signaling pathways involved in cell proliferation or migration [23,24] and stimulates the expression of β1-Integrin in vitro [25]. A number of cell adhesion genes that might be implicated in embryo attachment to the endometrium were also up-regulated in TE cells, including members of the Integrin family (ITGB5) and genes related to extracellular matrix remodeling, such as Laminins (LAMA1 and LAMC1). In humans, active steroid hormones, including progesterone that is secreted by mouse TE cells [26], are essential for implantation and maintenance of pregnancy. Our analysis reveals that HSD3B1, HSD17B1 and FDX1, which encode enzymes involved in the metabolism of cholesterol, were specifically up-regulated in TE cells in comparison to day 3 embryos (Figure S5). Moreover, PTGES (Prostaglandin E synthase) as well as CYP11A1 and CYP19A1 (estrogen synthesis) were also up-regulated in TE cells, suggesting a central role of these steroidogenic enzymes in TE steroid biosynthesis and metabolism. Thus, the TE joins the group of tissues with ''steroidogenic'' activity, such as brain, heart, gonads, endometrium and placenta [27,28].
It is now important to compare the steroidogenic gene expression profiles in TE cells isolated from good- and poor-quality blastocysts to correlate specific transcriptional events with efficient TE development.
Among the models used to study trophoblast development, hESCs have emerged as a useful tool to examine the emergence and differentiation of TE cells. Particularly, the transcriptomic analysis of TE cells derived from hESCs has provided new insights into the signaling pathways and the molecular mechanisms underlying early trophoblast development. Recently, by using a microarray approach, Marchand and colleagues investigated gene expression during differentiation of hESCs into the trophoblast lineage upon addition of Bone Morphogenetic Protein 4 (BMP4) for 10 days and identified 670 genes that were up-regulated from day 0 to day 10 [29]. By intersecting these genes with those we found to be up-regulated in TE cells isolated from day 5 embryos, we found 104 common genes (see Table S5) among which there were not only trophoblast markers (for instance, GATA3 and KRT19), but also many genes implicated in lipid metabolism and estrogen biosynthesis (i.e., CYP19A1, CYP11A1, HSD17B1, HSD3B1, PTGES, STS, HPGD, SLCO2A1, HMOX1, ABCG2, ASAH1 and SMPD1). This finding validates the importance of metabolic genes during TE specification. Aghajanova et al. [30] compared the transcriptome of embryo-derived TE cells with that of hESC-derived TE cells and found that most of the shared genes were involved in the development of receptive endometrium during implantation. Suzuki et al. [31] used human embryonic carcinoma cells (G3), which can differentiate into TE cells, as an experimental model to investigate the molecular mechanism of trophoectoderm differentiation. Thus, comparative studies using human TE and hESC or G3 cells are relevant to better understand the molecular basis of cell fate decisions and to develop models of human TE development.
The ''day 3 embryo molecular signature'' was enriched in genes from the NLRP (also named NALP) family, which might play a role in early embryo development [32,33]. Indeed, NLRP5, NLRP8 and NLRP9 are expressed in bovine and human pre-implantation embryos [32,34] and, in pregnant NLRP5 null female mice, embryo development is arrested at the two-cell stage [35].
Remarkably, many genes of the day 3 signature belong to the Melanoma Antigen family and the Ret finger protein-like family.
Most of their functions remain largely unknown, but some of them are thought to regulate, respectively, placenta and early embryo development [36,37]. Mouse data suggest that two other day 3 embryo-specific genes (MBD3L2 and ZSCAN4) might regulate early embryo development. In mouse embryos, MBD3L2 expression coincides with EGA [38] and ZSCAN4 (zinc finger and SCAN domain containing 4) is important for the progression from the 2-cell to 4-cell stage [39]. ZSCAN4 also plays a key role in defying cellular senescence and maintaining a normal karyotype during propagation of embryonic stem cells in culture [40]. Additionally, the expression levels of DPPA5, DPPA2 and the stemness factor NANOG were much higher in day 3 embryos than in TE samples. The reciprocal pattern of expression of Nanog and the transcription factors Gata6 and Cdx2 in the mouse morula suggests that Nanog might determine ICM pluripotency by repressing Gata6 and Cdx2, which are implicated in the extra-embryonic lineage specification [41]. Our transcriptome analysis also shows that the TE molecular signature includes many genes that are annotated as ''membrane'', demonstrating a strong bias towards genes involved in cell-to-cell communication processes. Conversely, genes specifically expressed by day 3 embryos are largely ''nuclear''. Additionally, we categorized the genes that were up-regulated during the MII-day 3 transition according to their molecular and cellular function using the GO annotations and found that they were mainly associated with nuclear localization. This is in line with previously published data showing that proteins produced by the most up-regulated genes during the MII-day 2 embryo transition are mainly localized in the nucleus [11] and that hESC-specific genes are significantly depleted in extracellular signaling components [7].
One assumption that can be inferred from these findings is that the determinants of the MII-embryo transition and pluripotency may be regulated by intrinsic factors.
Apoptotic cell death has been observed in human and other mammalian pre-implantation embryos [42]. The expression profile of apoptosis-related genes in day 3 embryos suggests that the balance between anti-apoptotic (BCL2L10 and BIRC2) and pro-apoptotic factors (BCL2L11 and BIK) might be critical at this stage of development. As the onset of EGA occurs at day 3 post-fertilization in humans, embryos that fail to accurately activate their genome might be committed to death by default. In contrast to mouse blastocysts, where apoptosis occurs predominantly in ICM cells [43], apoptotic nuclei have been detected in both ICM and TE cells in human blastocysts [44]. Accordingly, we show that some molecular actors of apoptosis signaling are up-regulated in human TE cells (i.e. Caspase 6, MCL-1).
The expression of some DNA repair genes has been detected in mammalian embryos at different stages of development [45]. Our data show that two ''DNA damage sensor'' genes (RFC1 and PCNA1) and two ''base excision repair'' genes (UNG and UNG2 (now named as CCNO)) are up-regulated in human day 3 embryos, in line with previous works [46], and three ''Double strand break repair'' genes (BRCA1, XRCC5 and XRCC6) are over-expressed in TE cells. In homozygous Brca1 5-6 mouse mutants, in which exons 5 and 6 of Brca1 were deleted, the development of the extraembryonic region was abnormal and diploid trophoblast cells were absent [47]. This may indicate that the ''Double strand break repair'' activity may be important for TE specification.
Epigenetic mechanisms, including DNA methylation, are key elements for controlling gene expression during the embryo-TE transition. In mouse blastocysts, DNA methyltransferase expression is restricted to the ICM, in which nuclei are highly methylated [48], whereas in human and bovine blastocysts, DNA methylation is higher in TE than ICM cells [49]. Here we report a strong expression of DNA (cytosine-5) methyltransferases (DNMT3A, DNMT3B, DNMT1 and DNMT3L) in human TE cells (Figure 5). DNMT3A and DNMT3B are de novo enzymes that establish methylation patterns. DNMT1 is a maintenance enzyme involved in preserving already acquired methylation patterns. DNMT3L lacks a catalytic domain, but can interact with the de novo enzymes [50], stimulating their activity [51]. Comparison with other samples including MII oocytes and hESCs suggests that DNMT3L is specifically up-regulated in TE cells (Figure S4). However, DNA methylation levels have been described to be globally low in extra-embryonic tissues in both mouse and human embryos [52,53]. In these tissues, DNA (cytosine-5) methyltransferase enzymes are expressed only transiently and do not contribute to the maintenance of adult tissues; thus, long-term epigenetic reprogramming may not be critical for extra-embryonic tissues. Moreover, the high expression of different epigenetic regulators in human TE cells could be a consequence of in vitro embryo culture. Studies in animal models have demonstrated that under certain in vitro culture conditions, DNA methylation profiles can be altered [54]. In addition, an association between in vitro culture conditions during assisted reproduction and an increased risk of some epigenetic disorders has been reported, indicating that epigenetic deregulation must be considered when examining in vitro fertilized embryos.
Our findings suggest that epigenetic modifiers cooperate with transcription factors and DNA repair genes to regulate the whole gene expression profile in TE cells (Figure 5). Disruption of this epigenetic regulatory circuit might lead to alterations of the normal physiological functions. Therefore, a comprehensive elucidation of this regulatory network would be highly beneficial for understanding TE anomalies and for improving assisted reproduction procedures. Moreover, a better knowledge of the TE-specific genes and the transcriptional networks operative in TE cells and day 3 embryos might lead to the identification of new biomarkers that might be used as diagnostic tools to monitor the health, viability and competence of embryos in assisted reproduction programs.

Limitations
As the day 3 embryos and the day 5 embryos used to isolate TE cells were donated by infertile women who underwent IVF treatments, the gene expression profiles could have been influenced by the controlled ovarian stimulation (COS) carried out during IVF and thus might not completely reflect the physiological situation under natural cycles. Moreover, due to the bioethics law that regulates research on human embryos in France, the number of embryos donated for research is small. In view of these limitations, we optimized our technique to obtain transcriptome data for each single embryo and trophectoderm sample.

Specimen Collection and Processing
Human day 3 (post-fertilization) embryos and day 5 blastocysts were donated for research by infertile couples undergoing IVF treatment. All patients signed informed consent forms and the protocol for collecting human embryos and TE was approved by the Ethical Committee of the French National Agency of Biomedicine.
Day 3 embryos. Nine embryos from 6 different couples were used for microarray analyses (n = 6) and qRT-PCR validation (n = 3). Day 3 embryos were all 6-8 cells with <20% fragmentation. Each embryo was individually transferred into a tube containing extraction buffer and frozen at −80°C for subsequent RNA extraction.
Trophectoderm biopsy. Eight day 5 blastocysts were used for TE isolation for microarray analyses (n = 5) and qRT-PCR validation (n = 3). Blastocysts were fully expanded with a well-defined ICM, and TE was scored according to Gardner [55]. After removal of the zona pellucida, TE was mechanically dissected from ICM. All TE samples were immediately transferred into tubes containing RLT lysis buffer and frozen at −80°C.
Mature MII oocytes and hESCs. After informed consent, unfertilized MII oocytes were collected 24 or 48 hours post-insemination as previously described [56]. Briefly, three pools of 16 MII oocytes (6 patients), 21 MII oocytes (8 patients) and 24 MII oocytes (8 patients) provided by couples referred to our center for conventional IVF for tubal infertility or for ICSI for male infertility were used for microarray analyses and qRT-PCR validation. The three hESC lines (HD83, HD90 and HD129) were derived by our group. Briefly, derivation of these lines was carried out using mechanical extraction of the inner cell mass [57]. The culture medium used for hESC derivation and culture consisted of 80% KO-DMEM, 20% Knockout serum replacement (KO-SR), 0.1 mM non-essential amino acids, 2 mM L-Glutamine, 0.5 mM β-mercaptoethanol and 10 ng/mL of bFGF. Passaging was performed mechanically by cutting the colony using a #15 scalpel under a microscope. Mitotically inactivated (by irradiation) human foreskin fibroblasts (HFF) were used as feeder cells. HFFs were cultured in 85% DMEM, 15% FBS. HD83, HD90 and HD129 hESC lines were used for microarray analyses and HD90, HD129 and HS181 (imported from the Karolinska Institute, Stockholm, Sweden) hESC lines were used for qRT-PCR validation.
RNA extraction. The RNeasy Micro kit (Qiagen) was used to isolate total RNA from TE samples and the Picopure RNA isolation kit (Arcturus Reagents/Molecular Devices, KIT0204, USA) for day 3 embryos, according to the manufacturers' recommended protocols. The quantity and purity of the total RNAs were determined by using a NanoDrop® ND-1000 spectrophotometer (NanoDrop ND-Thermo Fisher Scientific, Wilmington, DE, USA) and their integrity by using the Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA, http://www.agilent.com).

Complementary RNA (cRNA) Preparation and Microarray Processing
Total day 3 embryo RNA samples (from 450 pg to 855 pg) were subjected to three rounds of linear amplification and total TE RNA samples (between 50 and 100 ng) were twice amplified to generate a suitable quantity of labeled cRNA for hybridization to HG-U133 plus 2.0 GeneChip arrays (Affymetrix, Santa Clara, CA, USA) as described in [9] and following the standard Affymetrix instructions. Briefly, RNA was amplified from individual human embryos using the RiboAmp® HS Kit according to the manufacturer's instructions (Arcturus Bioscience). During the first strand synthesis reaction, cDNA that incorporates a T7 promoter sequence is produced. This cDNA was then used as a template for the in vitro transcription reaction driven by the T7 promoter to synthesize antisense RNA (aRNA), which was used as input for the second round of amplification. cRNA was then transcribed into cDNA and the T7 promoter was used to drive the second round of in vitro transcription. The double-stranded cDNA was then subjected to three rounds of linear amplification. The amplified aRNA was labeled with biotin using the Turbo Labeling Kit (Arcturus) and fragmented. Finally, fifteen micrograms of each labeled sample were hybridized to the HG-U133plus2 GeneChip array (Affymetrix). The microarray data were obtained in agreement with the Minimal Information about Microarray Experiment (MIAME) recommendations [58]. All data are accessible at the US National Center for Biotechnology Information, Gene Expression Omnibus (GEO) repository (http://www.ncbi.nlm.nih.gov/geo) through the provisional accession series number GSE33025.

Data Processing and Visualization
After image processing using the Affymetrix Microarray Suite 5.0, the CEL files were analyzed using the Affymetrix Expression Console software and normalized with the MAS5 algorithm by scaling each array to a target value of 100 using the global scaling method to obtain an intensity value signal for each probe set. Gene annotation was performed using NetAffx (http://www.affymetrix.com; March 2009). Genes with significant differential expression profiles were identified using the two-class Significance Analysis of Microarray (SAM) algorithm (http://www-stat.stanford.edu/~tibs/SAM/) with the Wilcoxon test and sample label permutation (n = 300). Briefly, the algorithm assigns a score to each gene based on differences in expression between conditions relative to the standard deviation of repeated measurements. The false discovery rate (FDR) is determined using permutations of the repeated measurements to estimate the percentage of genes identified by chance. The algorithm was applied to each dataset separately using FDR <1%. Subsequently, only the genes marked as significantly up-regulated or down-regulated were considered as differentially expressed in TE or embryos compared with the other samples. For hierarchical clustering, data were log-transformed, median-centered and processed with the CLUSTER and TREE-VIEW software packages [59]. To cluster the samples according to the similarity of their gene expression patterns, we performed an unsupervised principal component analysis (PCA) with the RAGE bioinformatics platform [http://rage.montp.inserm.fr/] to project samples onto three-dimensional spaces that were further visualized to see the constellation of all samples using all the detected genes. The expression of selected genes in the panel of samples, which includes germinal cells, stem cells and adult tissues, was retrieved through our ''Amazonia!'' database (http://amazonia.montp.inserm.fr/).
The Ingenuity Pathways Analysis (IPA) system (Ingenuity Systems, Redwood City, CA, USA) was used to identify networks related to the genes that were differentially expressed in day 3 embryos and TE samples.

Gene Ontologies (GO) Classification
Gene Ontology (GO) annotation analysis was carried out using the FatiGO+ tool (http://babelomics.bioinfo.cipf.es) [60] to identify biologically relevant themes among the genes that were differentially expressed in day 3 embryos and TE cells. Briefly, FatiGO+ performs a functional enrichment analysis by comparing two lists of genes by means of Fisher's exact test. Gene modules used in the test are defined in different ways that include functional criteria (GO, KEGG, Biocarta, etc.). User-defined gene modules can also be imported and used for functional enrichment.
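As an illustration of the test named above, the following minimal sketch computes a two-sided Fisher's exact test on a 2x2 table of annotated versus non-annotated genes in two lists. It is not the FatiGO+ implementation; the function names are hypothetical, and the counts in the example are a small textbook-style table, not data from this study.

```python
from math import comb


def hypergeom_pmf(k, annotated_total, list_size, population):
    """Probability that exactly k of list_size genes carry the annotation."""
    return (
        comb(annotated_total, k)
        * comb(population - annotated_total, list_size - k)
        / comb(population, list_size)
    )


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table
        list 1: a annotated, b not annotated
        list 2: c annotated, d not annotated
    The p-value sums all tables at least as unlikely as the observed one."""
    population = a + b + c + d
    annotated_total = a + c
    list_size = a + b
    p_observed = hypergeom_pmf(a, annotated_total, list_size, population)
    k_min = max(0, list_size - (population - annotated_total))
    k_max = min(annotated_total, list_size)
    p_value = 0.0
    for k in range(k_min, k_max + 1):
        p_k = hypergeom_pmf(k, annotated_total, list_size, population)
        # Small relative tolerance guards against floating-point ties.
        if p_k <= p_observed * (1 + 1e-7):
            p_value += p_k
    return min(p_value, 1.0)


# Illustrative table: 3 of 4 genes annotated in list 1, 1 of 4 in list 2.
p = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p, 4))
```

In an enrichment setting, `a` and `c` would be the numbers of genes carrying a given GO term in each of the two signatures, and the resulting p-values would normally be corrected for multiple testing across terms.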

Validation of Microarray Data by Quantitative RT-PCR Amplification
Gene expression profiles derived from microarray analyses were confirmed quantitatively by real-time qRT-PCR analysis using RNAs from three TE samples, three day 3 embryos, three MII oocytes and three hESC samples. The primer sequences are shown in Table S3. Briefly, cDNA was reverse transcribed following the manufacturer's instructions using 500 ng of total RNA in a 20 µl reaction that contained Superscript II (Invitrogen), oligo dT primer, dNTP mixture, MgCl2 and RNase inhibitor. Aliquots of cDNA (1/25 of the RT reaction) were diluted in a 50 µl reaction volume. Q-PCR was performed using a LightCycler 480 apparatus with the LC480 SYBR Green I Master kit (Roche Diagnostics, Mannheim, Germany) containing 2 µl cDNA and 0.6 µM primers in a total volume of 10 µl. After 10 min of activation at 95°C, cycling conditions were 10 s at 95°C, 30 s at 63°C and 1 s at 72°C for 50 cycles. Gene expression levels were normalized to GAPDH using the formula 100/2^ΔΔCt, where ΔΔCt = ΔCt(unknown) − ΔCt(positive control). Statistical comparisons were carried out using Student's t test and the SPSS software. P values less than or equal to 0.05 were considered significant.

Figure S1 Scatter plots showing the comparative distribution of transcripts in mature MII oocytes, day 3 embryos, TE and hESC samples. Each sample was plotted against all the other samples to visualize expression variations. The blue areas highlight a greater than two-fold gene expression difference (up-regulated) between the X-axis and Y-axis samples. The orange areas indicate a greater than two-fold gene expression difference (down-regulated) between the X-axis and Y-axis samples. The yellow areas highlight a 0.5- to 2-fold gene expression difference between the X-axis and Y-axis samples. For each pair of samples, Pearson's correlation coefficient (r) was computed. (TIF)

Figure S2 Quantitative RT-PCR validation of the microarray results. All qRT-PCR results were normalized to the expression of GAPDH in each sample and are the mean ± SEM of individual day 3 embryos (n = 3), TE (n = 3), pooled MII oocyte (n = 3) and hESC (n = 3) samples analyzed in duplicate. *P<0.05 was considered significant. (TIF)

Figure S3 IPA results showing the network of DNA repair genes that are up-regulated in TE samples from day 5 human blastocysts and day 3 embryos. (TIF)

Figure S4 Venn diagram representing the number of genes in each comparison and the overlaps between the three main comparison groups. The day 3 embryo/MII oocyte/hESC signature was defined as the intersection of the day 3 embryo signature (genes over-expressed in day 3 embryos compared with TE samples; 1714 genes), the MII oocyte signature (genes over-expressed in MII oocytes compared with TE cells; 4444 genes) and the hESC signature (genes over-expressed in hESCs compared with TE samples; 5502 genes). The TE/MII oocyte/hESC signature was defined as the intersection of the TE signature (genes over-expressed in TE compared with day 3 embryos; 2196 genes), the MII oocyte signature (genes over-expressed in MII oocytes compared with day 3 embryos; 3198 genes) and the hESC signature (genes over-expressed in hESCs compared with day 3 embryos; 8584 genes). The comparisons between categories were generated using the SAM software with a fold change ≥2 and FDR <1%.

[29]) and in TE cells isolated from day 5 embryos (this study). (XLS)