Fig 1.
The DNA formylation dynamics of human early embryos and hESCs.
(A) The flowchart of CLEVER-seq and sample information. The numbers of sperm cells, oocytes, first polar bodies, male pronuclei, female pronuclei, blastomeres of two-cell, four-cell, eight-cell, morula, ICM, and TE, and hESCs without CNV are listed in the top table, as well as the number of embryos. Pronuclei at the zygote stage were sampled 16 to 18 h after ICSI. (B) The UCSC browser view showing the distribution of 5fCpG sites across human early embryonic developmental stages. (C) The line chart showing the 5fCpG level throughout human early embryonic development. The 5fCpG percentage was calculated as the number of 5fCpG sites divided by the sum of 5fCpG sites and unmodified CpG sites. The center is the mean 5fCpG level, and error bars are SEM. The sample sizes are listed in Fig 1A. The numerical data are listed in S1 Table. (D) The representative immunostaining image of 5mC and 5fC in human zygotes. The scale bar represents 20 μm. BF, bright field; CGI, CpG island; CLEVER-seq, chemical-labeling-enabled C-to-T conversion sequencing; CNV, copy number variation; CpG, cytosine phosphate guanine; hESC, human embryonic stem cell; ICM, inner cell mass; ICSI, intracytoplasmic sperm injection; MALBAC, multiple annealing- and looping-based amplification cycles; SEM, standard error of the mean; TE, trophectoderm; UCSC, University of California, Santa Cruz; 5fCpG, 5-formylcytosine phosphate guanine; 5mC, 5-methylcytosine.
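The 5fCpG level described for Fig 1C can be illustrated with a short Python sketch. This is not the authors' pipeline: the function names are hypothetical, the level is returned as a fraction (multiply by 100 for a percentage), and the SEM is assumed to use the sample standard deviation (n − 1 denominator), which the legend does not specify.

```python
import math

def fcpg_level(n_5fcpg, n_unmodified):
    """5fCpG level of one cell: 5fCpG sites / (5fCpG + unmodified CpG sites)."""
    covered = n_5fcpg + n_unmodified
    if covered == 0:
        raise ValueError("no covered CpG sites")
    return n_5fcpg / covered

def mean_and_sem(levels):
    """Mean and standard error of the mean across the cells of one stage.

    Assumption: SEM = sample standard deviation / sqrt(n).
    """
    n = len(levels)
    if n < 2:
        raise ValueError("at least two cells are needed for an SEM")
    mean = sum(levels) / n
    variance = sum((x - mean) ** 2 for x in levels) / (n - 1)
    return mean, math.sqrt(variance / n)
```

For example, a cell with 5 called 5fCpG sites and 95 unmodified CpG sites has a level of 0.05, that is, 5%.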
Fig 2.
The production of 5fC in human early embryos.
(A) The stacked bar plot showing the numbers of newly generated and inherited 5fCpG sites between two consecutive stages. The hESC was compared with ICM, whereas both ICM and TE were compared with morula. (B) The stacked bar plot showing the fraction of 5fCpG sites located in different genomic regions in each developmental stage. (C) Unsupervised clustering of stage-merged 5fCpG sites based on Spearman correlation. (D) Heat map showing the median variance of 5fCpG abundance in different genomic regions among individual cells. The variance is calculated from the 5fCpG distribution in 1-kb windows. The color key from blue to red represents the value of median variance from low to high. The binding peaks of histones and transcription factors and the DHS were downloaded from GSE29611, GSE61475, and GSE32970, respectively. (E) Relative enrichment analysis of 5fCpG sites in distinct binding regions of transcription factors and histones as well as in DHS. The DHS were downloaded from GSE32970. The binding peaks of histones and transcription factors were downloaded from GSE29611 and GSE61475, respectively. In (A–E), the sample sizes are listed in Fig 1A. In (A–B) and (D–E), the numerical data are listed in S1 Data. (F) Density plot showing the relationship between 5mC and 5fC in 5fCpG-marked 1-kb windows. The x axis shows the 5fCpG percentage in each 1-kb window, calculated as the number of 5fCpG sites divided by the sum of unmodified CpG and 5fCpG sites covered. The red dashed line denotes the mean DNA methylation level calculated in 1-kb windows across the genome in each developmental stage. The DNA methylation data of human early embryos are from GSE81233. The numerical data are listed in S2 Data.
CGI, CpG island; CpG, cytosine phosphate guanine; CTCF, CCCTC-binding factor; DHS, DNase I hypersensitive sites; hESC, human embryonic stem cell; ICM, inner cell mass; LINE, long interspersed nuclear element; LTR, long terminal repeat; SINE, short interspersed nuclear element; TE, trophectoderm; TTS, transcriptional termination site; UTR, untranslated region; 5fC, 5-formylcytosine; 5fCpG, 5-formylcytosine phosphate guanine.
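The split into newly generated and inherited 5fCpG sites in Fig 2A can be pictured as a set comparison between two consecutive stages. The sketch below is an assumption about how such a split could be made, not the authors' method: a site is treated as inherited if it is also called in the preceding stage, sites are represented as hypothetical (chromosome, position) tuples, and the function name is illustrative.

```python
def split_sites(previous_stage, current_stage):
    """Classify the 5fCpG sites of the current stage.

    Assumption: a site is 'inherited' if it is also present in the
    previous stage and 'newly generated' otherwise.
    """
    previous = set(previous_stage)
    current = set(current_stage)
    inherited = current & previous
    newly_generated = current - previous
    return newly_generated, inherited

# Toy example with made-up coordinates.
morula = [("chr1", 100), ("chr1", 250), ("chr2", 40)]
icm = [("chr1", 250), ("chr2", 40), ("chr3", 7)]
new_sites, kept_sites = split_sites(morula, icm)
```

The heights of the two stacked bars for a stage would then be `len(new_sites)` and `len(kept_sites)`.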
Fig 3.
The characteristics of 5fCpG distribution on repeat elements.
(A) Relative enrichment analysis of 5fCpG sites located in repeat elements and their subfamilies. (B) The line chart showing the 5fCpG level on subfamilies of repeat elements throughout human early embryonic developmental stages. The center is the mean 5fCpG level, and error bars are SEM. (A–B) The sample sizes for these panels are listed in Fig 1A, and the numerical data are listed in S1 Data. (C–D) The UCSC browser view of 5fCpG sites located in L1 (C) and ERVK (D). CGI, CpG island; ERVK, endogenous retrovirus-K; ERVL-MaLR, endogenous retrovirus-L, mammalian-apparent long-terminal repeat retrotransposon; hESC, human embryonic stem cell; ICM, inner cell mass; MIR, mammalian-wide interspersed repeat; PB, polar body; SEM, standard error of the mean; TE, trophectoderm; UCSC, University of California, Santa Cruz; 5fCpG, 5-formylcytosine phosphate guanine.
Fig 4.
The features of 5fCpG site distribution on the paternal and maternal genomes.
(A) Venn diagram showing the overlap of 5fCpG-marked regions (1-kb windows) between sperm and oocyte. The numbers of sperm-specific, oocyte-specific, and gamete-shared 5fCpG-marked regions are listed in the diagram. (B) Venn diagram showing the overlap of 5fCpG-marked regions (1-kb windows) between male pronuclei and female pronuclei. The numbers of male pronuclei-specific, female pronuclei-specific, and pronuclei-shared 5fCpG-marked regions are listed in the diagram. (A–B) The sample sizes for these panels are listed in Fig 1A. (C–D) The UCSC browser view of 5fCpG sites in individual male pronuclei (C) and female pronuclei (D). (E) Relative enrichment of gamete-specific, gamete-shared, pronuclei-specific, and pronuclei-shared 5fCpG-marked regions (1-kb windows) in distinct genomic regions. (F) Relative enrichment of gamete-specific, gamete-shared, pronuclei-specific, and pronuclei-shared 5fCpG-marked regions (1-kb windows) in repeat elements and their subfamilies. (E–F) The sample sizes for these panels are listed in Fig 1A, and the numerical data are listed in S1 Data. CGI, CpG island; MIR, mammalian-wide interspersed repeat; TTS, transcriptional termination site; UCSC, University of California, Santa Cruz; UTR, untranslated region; 5fCpG, 5-formylcytosine phosphate guanine.
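The Venn counts in Fig 4A–B depend on how sites are grouped into 1-kb windows. The following sketch shows one way to do this and rests on assumptions the legend does not state: windows are fixed and non-overlapping, positions are 0-based, and a window counts as 5fCpG-marked if it contains at least one 5fCpG site. Function names and coordinates are hypothetical.

```python
def marked_windows(sites, window=1000):
    """Map (chromosome, position) sites to the fixed windows containing them."""
    return {(chrom, pos // window) for chrom, pos in sites}

def venn_counts(windows_a, windows_b):
    """Return (A-specific, B-specific, shared) window counts."""
    return (
        len(windows_a - windows_b),
        len(windows_b - windows_a),
        len(windows_a & windows_b),
    )

# Toy example with made-up coordinates.
sperm = marked_windows([("chr1", 120), ("chr1", 980), ("chr2", 5400)])
oocyte = marked_windows([("chr1", 450), ("chr3", 2100)])
sperm_specific, oocyte_specific, shared = venn_counts(sperm, oocyte)
```

Here the two sperm sites on chr1 fall into the same window as the oocyte site at position 450, so that window is counted once as shared.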
Fig 5.
The asymmetric 5fCpG distribution in paired pronuclei.
(A) The diagram illustrating the two cases of 5fCpG level in paired pronuclei. The bar plot shows the difference in 5fCpG level, calculated as the level in the male pronucleus minus that in the corresponding female one. The bar is blue if the male pronucleus has a higher 5fCpG level than the female one (difference > 0); otherwise it is red. (B) The bar plot showing the difference in 5fCpG level between paired pronuclei across the whole genome (n = 10). The difference is calculated as the 5fCpG level in the male pronucleus minus that in the corresponding female one. The paired pronuclei are ranked by the value of the difference from high to low. (C) The bar plot showing the difference in 5fCpG level between paired pronuclei in distinct genomic regions (n = 10). The pronuclei are ranked in the order shown in Fig 5B. (D) The bar plot showing the difference in 5fCpG level between paired pronuclei in distinct repeat elements (n = 10). The pronuclei are ranked in the order shown in Fig 5B. (B–D) The numerical data are listed in S1 Data. ALR, alpha-satellite repeat; CGI, CpG island; CpG, cytosine phosphate guanine; ERV1, endogenous retrovirus-1; ERVK, endogenous retrovirus-K; ERVL, endogenous retrovirus-L; ERVL-MaLR, endogenous retrovirus-L, mammalian-apparent long-terminal repeat retrotransposon; LINE, long interspersed nuclear element; LTR, long terminal repeat; MIR, mammalian-wide interspersed repeat; SINE, short interspersed nuclear element; SVA, SINE/variable number of tandem repeats/Alu; TTS, transcriptional termination site; UTR, untranslated region; 5fCpG, 5-formylcytosine phosphate guanine.
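The difference, coloring, and ranking rules stated for Fig 5A–B can be written out as a small Python sketch. The input layout (pair identifier, male level, female level) and the function name are assumptions made for illustration; the values are made up.

```python
def paired_differences(pairs):
    """Rank paired pronuclei by male-minus-female 5fCpG level.

    pairs: iterable of (pair_id, male_level, female_level).
    Returns (pair_id, difference, color) tuples sorted from the highest
    to the lowest difference; blue if difference > 0, otherwise red.
    """
    diffs = [(pair_id, male - female) for pair_id, male, female in pairs]
    diffs.sort(key=lambda item: item[1], reverse=True)
    return [(pid, d, "blue" if d > 0 else "red") for pid, d in diffs]

# Toy example with made-up levels.
ranked = paired_differences([
    ("zygote_1", 0.25, 0.5),
    ("zygote_2", 0.75, 0.25),
    ("zygote_3", 0.5, 0.5),
])
```

Reusing the order of `ranked` for the region-level and repeat-level plots would keep each pair in the same position across panels, as the legend describes for Fig 5C–D.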
Fig 6.
The regulatory network of 5fC-marked regions.
(A) The normalized DHS signal of hESC at the center of all 5fCpG sites and their flanking regions (left panel) as well as at the center of proximal 5fCpG sites (TSS ± 2 kb) and their flanking regions (right panel) in each developmental stage. The DHS signal of hESC was downloaded from GSE32970. (B) The line chart showing the 5fCpG level in NORs (left panel) and in NDRs (right panel). The 5fCpG level was calculated as the number of 5fCpG sites divided by the sum of 5fCpG and unmodified CpG sites. The scCOOL-seq data of human preimplantation embryos are from GSE100272. The center is the mean 5fCpG level, and error bars are SEM. (C) Motif analysis of 5fCpG-marked distal regions (2 kb away from TSS). Only motifs with P ≤ 10^−20 and RPKM ≥ 1 in at least one stage are shown in the diagram. The significance of motif enrichment was calculated by the binomial test, the default in HOMER. The color of the circle from blue to red indicates the expression level in each stage from low to high. The scRNA-seq data of human early embryos and ESCs are from GSE36552. The size of the circle indicates the −log10(P value). (D) Motif analysis of 5fCpG-marked proximal regions (TSS ± 2 kb). Only motifs with P ≤ 10^−5 and RPKM ≥ 1 in at least one stage are shown in the diagram. The significance of motif enrichment was calculated by the binomial test, the default in HOMER. The color of the circle from blue to red indicates the expression level in each stage from low to high. The scRNA-seq data of human early embryos and ESCs are from GSE36552. The size of the circle indicates the −log10(P value). (A–D) The sample sizes for these panels are listed in Fig 1A. (B–D) The numerical data are listed in S1 Data.
CpG, cytosine phosphate guanine; DHS, DNase I hypersensitive sites; ESC, embryonic stem cell; hESC, human embryonic stem cell; HOMER, hypergeometric optimization of motif enrichment; NDR, nucleosome-depleted region; NOR, nucleosome-occupied region; scCOOL-seq, chromatin overall omic-scale landscape sequencing; RPKM, reads per kilobase of transcript per million mapped reads; scRNA-seq, single-cell RNA sequencing; SEM, standard error of the mean; TSS, transcription start site; 5fC, 5-formylcytosine; 5fCpG, 5-formylcytosine phosphate guanine.
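The motif selection rule in Fig 6C–D (a P value cutoff plus RPKM ≥ 1 in at least one stage, with circle size given by −log10 of the P value) can be sketched as below. The input layout and function name are hypothetical, the motif names and values are made up, and this does not call HOMER; it only filters results that are assumed to have been computed already.

```python
import math

def select_motifs(motifs, p_cutoff, rpkm_cutoff=1.0):
    """Keep motifs passing both thresholds and report circle sizes.

    motifs: iterable of (name, p_value, rpkm_by_stage), where
    rpkm_by_stage maps a stage name to the expression level (RPKM).
    Returns (name, -log10(p_value)) for each retained motif.
    Assumes p_value > 0.
    """
    selected = []
    for name, p_value, rpkm_by_stage in motifs:
        max_rpkm = max(rpkm_by_stage.values(), default=0.0)
        if p_value <= p_cutoff and max_rpkm >= rpkm_cutoff:
            selected.append((name, -math.log10(p_value)))
    return selected

# Toy example for distal regions (cutoff 1e-20), made-up values.
candidates = [
    ("motif_A", 1e-25, {"ICM": 2.0, "TE": 0.3}),
    ("motif_B", 1e-25, {"ICM": 0.5, "TE": 0.2}),  # expression too low
    ("motif_C", 1e-10, {"ICM": 5.0, "TE": 4.0}),  # not significant enough
]
distal = select_motifs(candidates, p_cutoff=1e-20)
```

The same function applies to proximal regions with `p_cutoff=1e-5`.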