Discovery of genes required for body axis and limb formation by global identification of retinoic acid regulated enhancers and silencers

Identification of direct target genes for transcription factors is hampered by the large number of genes whose expression changes when the factor is removed and numerous binding sites in the genome. Retinoic acid (RA) receptors bound to RA directly regulate transcription through RA response elements (RAREs) of which there are thousands in the mouse genome. Here, we focused on identification of direct RA target genes in the embryonic trunk during body axis and limb formation. Using ChIP-seq and RNA-seq on trunk tissue from wild-type and Aldh1a2-/- embryos lacking RA synthesis, we identified a relatively small number of genes with altered expression when RA is missing that also have nearby RA-regulated deposition of H3K27ac (gene activation mark) or H3K27me3 (gene repression mark) associated with RAREs. Such RAREs were identified in genes known to be required for body axis and limb formation, plus many new direct RA target genes were found. In addition to RARE enhancers, we identified numerous RARE silencers located near genes down-regulated by RA. Nr2f1, Nr2f2, Meis1, and Meis2 gene family members all have RARE enhancers, and double knockouts of each family demonstrated requirements for body axis and/or limb formation, thus validating our method for identifying RA target genes important for development.


Introduction
Retinoic acid (RA) is generated from retinol by the sequential activities of retinol dehydrogenase 10 (RDH10) (Sandell et al. 2007) and aldehyde dehydrogenase 1A2 (ALDH1A2) (Niederreither et al. 1999;Mic et al. 2002). Knockout studies of these enzymes revealed an essential role for RA in many early developmental programs including those controlling hindbrain anteroposterior patterning, neuromesodermal progenitor (NMP) differentiation, spinal cord neurogenesis, somitogenesis, forelimb bud initiation, and heart anteroposterior patterning (Rhinn and Dolle 2012;. RA functions as a ligand for nuclear RA receptors (RARs) that bind DNA sequences known as RA response elements (RAREs) as a heterodimer complex with retinoid X receptors (RXRs) (Mark et al. 2006). Binding of RA to RAR alters the ability of RAREs to recruit nuclear receptor coactivators (NCOAs) that activate transcription or nuclear receptor 3 corepressors (NCORs) that repress transcription (Kumar et al. 2016). Thus, RA functions are mediated by transcriptional activation or repression of key genes via RAREs.
Identification of RA-regulated genes that are required for development has been difficult as loss or gain of RA activity alters the mRNA levels of thousands of genes in various cell lines or animals, perhaps most being indirect targets of RA or regulated post-transcriptionally. As RA target genes are dependent upon RAREs, identification of RAREs by RAR-binding studies, cell line transfection assays, and enhancer reporter transgenes in mouse or zebrafish has been used to identify RA target genes that may be required for development, but progress is slow as each gene is analyzed separately . Genomic RAR chromatin immunoprecipitation (ChIPseq) studies on mouse embryoid bodies and F9 embryonal carcinoma cells reported ~14,000 potential RAREs in the mouse genome (Moutier et al. 2012;Chatagnon et al. 2015), but it is unclear how many of these RAREs are required to regulate genes in any specific tissue, and many may not function in any tissue at any stage of development. Only a few RAREs have been shown to result in gene expression and developmental defects when subjected to deletion analysis in mouse, i.e. a RARE enhancer that activates Hoxa1 in the hindbrain (Dupé et al. 1997), a RARE enhancer that activates Cdx1 in the spinal cord (Houle et al. 2003), and a RARE that functions as a silencer to repress caudal Fgf8 in the developing trunk (Kumar et al. 2016). In one additional case, a RARE described within intron 2 of Tbx5 that was suggested to be required for activation of Tbx5 in the forelimb field based on a mouse enhancer reporter transgene (Nishimoto et al. 2015) was found to be unnecessary for Tbx5 activation and forelimb budding when subjected to CRISPR deletion analysis, suggesting Tbx5 is not an RA target gene . Many DNA control elements (including RAREs) that exhibit appropriate tissue-specific expression in enhancer reporter transgene assays have been shown to not be required as an enhancer in vivo when deleted; this may be due to enhancer redundancy or because the control element is really not an enhancer but appeared to be when inserted as a transgene at a random location in the genome near a heterologous promoter (Duester 2019). Thus, additional methods are needed (preferably genomewide) to locate functional RAREs in a particular tissue which can be used to identify new candidate RA target genes that are required for development.
Epigenetic studies have found that histone H3 K27 acetylation (H3K27ac) associates with gene 5 This analysis identified 314 RA-regulated ChIP-seq peaks for H3K27ac located within or near 214 genes (i.e. the genes with the nearest annotated promoters) using a log2 cut-off of <-0.51 or >0.51 to include a RA-regulated peak near Sox2 known to be activated by RA (Ribes et al. 2009;. We identified 262 RA-regulated peaks for H3K27me3 located within or near 141 genes (i.e. the genes with nearest annotated promoters) using a log2 cut-off of <-0.47 or >0.47 to include a RA-regulated peak near Fst known to be repressed by RA ); all ChIP-seq data available at GEO under accession number GSE131624. Thus, we found a much smaller number of RA-regulated ChIP-seq peaks for H3K27ac/H3K27me3 compared to the very large number of genes found to have altered mRNA levels with RNA-seq.
In order to identify genes that are good candidates for being transcriptionally activated or repressed by RA (RA target genes), we compared our ChIP-seq and RNA-seq results to identify RA-regulated ChIP-seq peaks where nearby genes have significant changes in expression in wildtype vs Aldh1a2-/-based on RNA-seq. We found 73 RA-regulated peaks for H3K27ac near 63 genes with significant changes in expression when RA is lost (Table S1), plus 46 RA-regulated peaks for H3K27me3 near 41 genes with significant changes in expression when RA is lost (Table   S2). As some genes have more than one nearby RA-regulated peak for H3K27ac or H3K27me3, and some genes have nearby RA-regulated peaks for both H3K27ac and H3K27me3 (Rarb,Dhrs3,Fgf8,Cdx2,Fst,Meis1,Meis2,Nr2f2,Foxp4,Ptprs,and Zfhx4), a total of 93 RA-regulated genes have nearby RA-regulated peaks for H3K27ac and/or H3K27me3 when RA is lost, thus identifying them as candidate RA target genes for trunk development (Tables S1-S2; Fig. S1A).
Among the 93 candidate RA target genes for trunk development identified with our approach are included many examples of genes previously reported to be regulated by RA in the trunk based on studies of Aldh1a2-/-embryos (Niederreither and Dolle 2008; or RA-treated NMPs ; this includes Hoxa1, Cdx1,Rarb,Crabp2,Sox2,Dhrs3, and Pax6 whose expression is increased by RA, plus Fgf8, Cdx2, and Fst whose expression is decreased by RA (Table 1). H3K27ac peaks near Cdx1, Rarb,Crabp2,Sox2,Dhrs3,and Pax6 are reduced in Aldh1a2-/-trunk consistent with these being RA-activated genes, whereas H3K27ac peaks near Fgf8, Cdx2, and Fst are increased in Aldh1a2-/-consistent with these being genes repressed by RA. Conversely, H3K27me3 peaks near Fgf8, Cdx2, and Fst are decreased in 6 Aldh1a2-/-, whereas H3K27me3 peaks near Rarb, Hoxa1, and Dhrs3 are increased in Aldh1a2-/-, consistent with the former being genes repressed by RA and the latter being genes activated by RA (Table 1). In addition to these 10 well-established RA target genes that are required for trunk development, we identified 83 additional genes that our findings indicate are candidate RA target genes for trunk, including Nr2f1, Nr2f2, Meis1, Meis2, and Spry4 that were further examined here (Table 1); differential expression of these genes in E8.5 wild-type vs Aldh1a2-/-trunk was validated by qRT-PCR (Fig. S2). As our approach identified many known trunk RA target genes, it is a reliable approach for identifying new candidate RA target genes required for trunk development.

Identification of RAREs associated with RA-regulated deposition of H3K27ac or H3K27me epigenetic marks
As RA target genes need to be associated with a RARE, the DNA sequences within the RAregulated H3K27ac/H3K27me3 ChIP-seq peaks we found near our list of 93 RA-regulated genes were searched for RARE sequences using the Homer transcription factor binding site program for the mm10 genome; we searched for three types of RAREs including those with a 6 bp direct repeat separated by either 5 bp (DR5), 2 bp (DR2), or 1 bp (DR1) , and the presence or absence of RAREs is summarized (Tables S1 and S2). We found that 46 of these 93 genes contained at least one RARE in their nearby RA-regulated H3K27ac and/or H3K27me3 ChIP-seq peaks, thus narrowing down our list of candidate RA target genes to 49% of the genes originally identified. Our approach identified the three RAREs previously shown to have required functions during trunk development in vivo by knockout studies (RAREs for Hoxa1, Cdx1, Fgf8) plus several RAREs associated with known RA-regulated genes in the E8.5 trunk from Aldh1a2-/studies (Rarb,Crabp2,Sox2,Dhrs3,Cdx2,Fst), thus validating our approach for identifying RAregulated genes required for trunk development. The sequences of all the RAREs near these 46 RA target genes here are summarized; included are 65 RAREs near 34 RA-activated genes (we refer to these as RARE enhancers associated with increased H3K27ac and/or decreased H3K27me3 in the presence of RA) and 20 RAREs near 12 RA-repressed genes (we refer to these as RARE silencers associated with increased H3K27me3 and/or decreased H3K27ac in the presence of RA) (Table S3).

7
The results here provide evidence that many of the RA-regulated H3K27ac and H3K27me3 marks are associated with regulation of the nearest genes. However, it is possible that some H3K27ac and H3K27me3 RA-regulated peaks may be related to RA-regulated genes located further away in the same topologically associated domain (TAD). In order to address this issue, we assigned each RA-regulated H3K27ac and H3K27me3 peak to a TAD using the 3D Genome Browser (http://promoter.bx.psu.edu/hi-c/view.php); TAD analysis has not been performed on mouse E8.5 trunk tissue, but as TAD domains are similar between different tissues ) we used the TAD database for mouse ES cells which is the closest biologically relevant database in the 3D Genome Browser. Then the genes in each TAD containing an RA-regulated peak were searched in our RNA-seq database to identify genes whose mRNA levels are decreased or increased when RA is lost, and if at least one gene was found we determined whether a RARE is present in the ChIP-seq peak. This analysis resulted in the identification of 82 additional RARE enhancers near RA-activated genes, and 40 additional RARE silencers near RA-repressed genes, where the gene is not the gene nearest to the RARE in the TAD; in some cases more than one RAregulated gene was identified in a TAD (Table S3).
Up to now, Fgf8 represents the only example of a gene that is directly repressed by RA at the transcriptional level as shown by developmental defects upon knockout of the RARE at -4.1 kb, and by the ability of this RARE to stimulate binding of NCOR and PRC2 plus deposition of H3K27me3 in an RA-dependent manner (Kumar and Duester 2014;Kumar et al. 2016). Here, in addition to Fgf8, we found many more candidates for genes repressed by RA in the trunk based on identification of nearby RARE silencers (Tables S3).

Analysis of known RA target genes for trunk validates our approach for finding new targets
The RA-regulated H3K27ac and/or H3K27me3 peaks we identified near Rarb, Crabp2, Hoxa1, and Cdx1 all overlap previously reported RAREs for these genes (Fig. 1). In the case of Rarb, the DR5 RARE in the 5'-untranslated region (Mendelsohn et al. 1991) overlaps RA-regulated peaks for both H3K27ac and H3K27me3, suggesting that this RARE in the presence of RA stimulates deposition of H3K27ac and removal of H3K27me3 during activation of Rarb; we identified a DR1 RARE in the 5'-noncoding region of Rarb within another RA-regulated H3K27me3 ChIP-seq peak 8 (Fig. 1A). For Crabp2, two closely-spaced RAREs previously reported in the 5'-noncoding region (Durand et al. 1992) associate with RA-regulated peaks for H3K27ac, plus another RARE we identified in the 3'-noncoding region associates with changes in H3K27ac (Fig. 1B). For Hoxa1, the RARE located in the 3'-noncoding region is associated with RA-regulated peaks for both H3K27ac and H3K27me3, plus another RARE we identified in the 3'-untranslated region is associated with RA-regulated peaks for H3K27me3 (Fig. 1C); importantly, knockout studies on the Hoxa1 RARE in the 3'-noncoding region demonstrated that it is required in vivo for Hoxa1 expression and normal development (Dupé et al. 1997). For Cdx1, two RAREs have been reported, one in the 5'noncoding region that was shown by knockout studies to be required for Cdx1 expression and body axis development (Houle et al. 2003), plus another RARE in intron 1 (Gaunt and Paul 2011). Both of these Cdx1 RAREs are overlapped by RA-regulated peaks for both H3K27ac and H3K27me3 ( Fig. 1D). These findings demonstrate that our approach can identify genes that are already known to be transcriptionally activated by RA via a RARE and required for development.

Identification of RA-regulated epigenetic marks and RAREs near RA-regulated genes known to control neuromesodermal progenitors
Ingenuity Pathway Analysis (IPA) of our list of 93 RA target genes shows enrichment for the pathway "development of body trunk", including Sox2, Cdx2, and Fgf8 known to be required for neuromesodermal progenitor (NMP) function during trunk development (Fig. S1B). NMPs are bipotential progenitor cells in the caudal region co-expressing Sox2 and T/Bra that undergo balanced differentiation to either spinal cord neuroectoderm or presomitic mesoderm to generate the post-cranial body axis (Wilson et al. 2009;Kondoh and Takemoto 2012;Henrique et al. 2015;Amin et al. 2016;Kimelman 2016;Gouti et al. 2017;Koch et al. 2017;Edri et al. 2019). NMPs are first observed in mouse embryos at about E8.0 near the node and caudal lateral epiblast lying on each side of the primitive streak (Olivera-Martinez et al. 2012;Tsakiridis et al. 2014;Garriock et al. 2015). Caudal Wnt and FGF signals are required to establish and maintain NMPs (Naiche et al. 2011;Takemoto et al. 2011;Martin and Kimelman 2012;Olivera-Martinez et al. 2012;Jurberg et al. 2014;Garriock et al. 2015;Wymeersch et al. 2016). Cdx2 is required for establishment of NMPs (Amin et al. 2016). During development, RA is first produced at E7.5 in presomitic mesoderm expressing Aldh1a2 to generate an anteroposterior gradient of RA with high activity in the trunk and low activity caudally . Loss of RA does not prevent establishment or maintenance of NMPs, but does result in unbalanced differentiation of NMPs, with decreased caudal Sox2 expression and decreased appearance of neural progenitors, plus increased caudal Fgf8 expression and increased appearance of mesodermal progenitors and small somites due to encroachment of caudal Fgf8 expression into the trunk where it reduces epithelial condensation of presomitic mesoderm needed to form somites (Diez del Corral et al. 2003;Patel et al. 2013;Cunningham et al. 2016). Cdx2 expression is increased when RA is lost in Aldh1a2-/-embryos (Zhao and Duester 2009).
Here, when RA is lost we observed RA-regulated H3K27ac and/or H3K27me3 peaks near several genes required for NMP function that show decreased (Sox2) or increased (Fgf8 and Cdx2) expression ( Fig. 2A-C). Most of these RA-regulated peaks contain RAREs, providing evidence that Sox2, Fgf8, and Cdx2 are direct RA target genes (Table S3). For Sox2, we observed two RAregulated H3K27ac ChIP-seq peaks, but only the one in the 3'-noncoding region was found to have a RARE ( Fig. 2A). In the case of Fgf8, previous studies reporting knockout of the RARE located in the 5'-noncoding region at -4.1 kb resulted in increased caudal Fgf8 expression and a small somite phenotype (although the defect is not as severe as for Aldh1a2-/-embryos), demonstrating that this RARE functions in vivo as a silencer by RA-dependent recruitment of nuclear receptor corepressors (Kumar et al. 2016); RARE redundancy may explain the milder phenotype as our approach suggests that Fgf8 has two additional candidate RARE silencers (Fig. 2B). RARE redundancy may be common as we observe that Cdx2 has three candidate RARE silencers (Fig. 2C), and our overall analysis shows that many genes have more than one nearby RARE (Table S3). These findings provide evidence that RA controls NMP differentiation directly at the transcriptional level by activating Sox2 and repressing Fgf8 and Cdx2 as progenitor cells progress from a caudal to a trunk location.

Evidence for genes regulated indirectly by RA at the transcriptional level
Our studies show that many genes that are downregulated or upregulated following loss of RA are associated with RA-regulated peaks for H3K27ac or H3K27me3 (either nearby or in the same TAD) that do not contain RAREs (Tables S1-S2). Such genes may be indirectly activated or repressed by RA at the transcriptional level. In the case of Pax6, our results indicate that RA stimulates H3K27ac deposition in Pax6 introns 2 and 6 that do not contain RAREs, with no other RA-regulated peaks in the same TAD (Fig. 3A). Previous studies identified an enhancer in Pax6 intron 6 containing a SOXB1 binding site that is important for activation in the spinal cord (Oosterveen et al. 2013). Activation of Pax6 in the spinal cord requires CDX proteins in the posterior-most neural tube, and CDX binding sites have been identified in Pax6 intron 2 (Joshi et al. 2019); in addition to expression in the caudal progenitor zone, mouse Cdx1 is expressed in the posterior neural plate where Pax6 is activated, and this expression domain requires RA (Zhao and Duester 2009). Activation of Pax6 requires that caudal FGF signaling be downregulated (Patel et al. 2013). Thus, although it is possible that our H3K27ac/H3K27me3 studies failed to identify an unknown RARE near Pax6, our findings suggest that the RA requirement for Pax6 activation may operate through several indirect mechanisms due to the ability of RA to activate Sox2 and Cdx1, and repress Fgf8 (Figs. 1, 2). We observed that Spry4 (shown here to be down-regulated by RA) does not have a RARE associated with its RA-regulated ChIP-seq peak for H3K27me3; no other RA-regulated peaks were found in its TAD (Fig. 3B). Many of the RA-regulated ChIP-seq peaks observed with our approach that do not contain RAREs may be indirect RA-regulated peaks that contain DNA binding sites for transcription factors other than RARs whose expression or activity is altered by loss of RA, thus resulting in changes for H3K27ac/H3K27me3 marks that are caused by the other transcription factors.

Conservation of RAREs identified with our approach identifies candidate RA target genes
While it possible that some RAREs that are conserved only in mammals perform mammalspecific functions, the RAREs we found in mouse that are conserved to birds or lower provide further evidence that they are needed to regulate target genes. The candidate RARE enhancers and RARE silencers we identified here that are associated with RA-regulated epigenetic marks were searched for evolutionary conservation using the UCSC genome browser. Among the RAREs in which the nearest gene is RA-regulated we found 6 RAREs that are conserved from mouse to zebrafish, 11 conserved to frog (X. tropicalis), 18 conserved to reptile (lizard; painted turtle), 20 conserved to bird (chicken; turkey), 39 conserved to human, 65 conserved to rodent (rat), and 20 that are not conserved with rat (Table S3). The large number of RAREs (i.e. 20) conserved beyond mammals to bird, lizard, frog, or fish demonstrate that our approach is able to identify highly conserved RAREs that point to excellent candidate genes required for development. Among the additional RAREs we found located further away in the TAD from an RA-regulated gene we identified only 4 more RAREs conserved beyond mammals to bird, lizard, frog, or fish, thus bringing the total to 24 highly conserved RAREs (Table S3). Thus, most of the highly conserved RAREs we identified are located very close to an RA-regulated gene rather than further distant in the TAD. In addition, all these highly conserved RAREs are either identical to the RARE consensus or have only one mismatch. Here we summarize the 24 most highly conserved RAREs that point to 38 RAregulated genes that may be required for development ( Table 2).
As RAREs need to be bound by RAR in order to function, we examined previously reported RAR ChIP-seq databases for mouse embryoid bodies (Moutier et al. 2012) and mouse F9 embryonal carcinoma cells (Chatagnon et al. 2015) to determine if the highly conserved RAREs we identified are included in RAR-binding regions. We found that 19 of our 24 highly conserved RAREs are included in the RAR ChIP-seq peaks from at least one of those studies (Table S4).
Our list of best candidate RA target genes (Table 2) includes several for which knockout studies have already demonstrated required functions during trunk development, i.e. in RA signaling (Rarb, Dhrs3), body axis formation (Hoxa1,Hoxa4,Hoxa9,Sox2,Fgf8,Pbx1,Tshz1,Zbtb16), and foregut formation (Foxp4); mouse knockout data summarized by Mouse Genome Informatics (http://www.informatics.jax.org). This list also includes many genes for which knockout studies have either not been performed or knockouts resulted in no reported early developmental defects. This list of genes thus contains excellent new candidates that can be tested for function during trunk development by generating knockouts or double knockouts in the case of gene families.

Nr2f and Meis gene families have nearby RA-regulated epigenetic marks associated with highly conserved RARE enhancers
We identified two gene families (Nr2f and Meis) where two family members have decreased expression when RA is lost and nearby RA-regulated peaks for H3K27ac or H3K27me3 containing RAREs. Nr2f1 and Nr2f2 encode orphan nuclear receptors NR2F1 and NR2F2 (previously known as COUP-TFI and COUP-TFII, respectively) that regulate transcription, although they have not been found to have endogenous ligands that control their activity (Lin et al. 2011). Meis1 and Meis2 encode transcription factors belonging to the TALE (three amino acid loop extension) family of homeodomain-containing proteins (Penkov et al. 2013;Schulte and Geerts 2019).
Here, Nr2f1 and Nr2f2 were both found to have a single RARE in the 5'-noncoding region close to exon 1 that is overlapped by or close to the edge of RA-regulated H3K27ac and H3K27me3 peaks genes (Dohn et al. 2019) and this conservation to mouse was detected by our analysis (Table 2).
Meis1 and Meis2 were previously shown to be upregulated by RA in chick limbs treated with RA (Mercader et al. 2000), and loss of RA in Aldh1a2-/-embryos results in reduced expression of Meis2 in paraxial mesoderm (Niederreither et al. 2000). Other studies have shown that Meis1 and Meis2 are activated by RA in embryonic stem cells and other cell lines, and RAREs were identified in their 5'-noncoding regions (Lalevee et al. 2011;Kashyap et al. 2013). Here, Meis1 was found to have four RAREs in introns 1, 6 and 7 that are overlapped by RA-regulated peaks for H3K27ac and/or H3K27me3, plus we identified the previously reported RARE in the 5'-noncoding region that is located at the edge of a small RA-regulated H3K27ac peak (Fig. 4C). Meis2 was found to have two RAREs that are overlapped by RA-regulated peaks for H3K27ac and/or H3K27me3, one in the 5'-noncoding region (previously identified) and another in intron 7 (Fig. 4D). Our analysis shows that Meis1 and Meis2 each have a highly conserved DR5 RARE enhancer (Table 2). Together, these studies identify Nf2f1, Nr2f2, Meis1, and Meis2 as candidate RA target genes in the developing trunk.

Nr2f1 and Nr2f2 function redundantly to control body axis formation
In order to be a biologically important RA target gene, the gene must not only be associated with a RARE, but must perform a function downstream of RA during trunk development which can 13 be determined by gene knockout studies. Here, we sought to validate our approach by performing knockout studies on some of the candidate RA target genes, particularly those that have nearby highly conserved RAREs. One could also undertake deletion studies of the RAREs, but this is only relevant after a knockout of the gene itself shows a defect. In addition, as genes are often controlled by redundant enhancers (which we observed here for many genes that have two or more RAREs associated with RA-regulated epigenetic marks; Table S3), studies in which predicted enhancers are deleted often have no effect on development Cunningham et al. 2018;Dickel et al. 2018;Osterwalder et al. 2018;Duester 2019); this includes knockout studies we performed on a RARE that was predicted by enhancer transgene studies to be needed for Tbx5 expression in forelimb bud that had no effect on Tbx5 or development . Below, we describe gene knockout studies on candidate RA target genes with nearby highly conserved RAREs to determine if these genes have a required function in trunk development.
Nr2f1 and Nr2f2 were selected for gene knockout as they both have nearby candidate RARE enhancers (identified by our H3K27ac/H3K27me3 ChIP-seq analysis) that are conserved from mouse to zebrafish (Table 2). Nr2f1 (formerly known as COUP-TFI) and Nr2f2 (formerly known as COUP-TFII) are both expressed at E8.5 in somites and presomitic mesoderm but not spinal cord, suggesting they may function in mesoderm formation during body axis formation (Béland and Lohnes 2005;Vilhais-Neto et al. 2010). Here, in situ hybridization analysis shows that Nr2f1 and Nr2f2 have reduced expression in the trunk of E8.5 Aldh1a2-/-embryos compared to wild-type (Fig.   S3).
The Nr2f1 knockout is lethal at birth with brain defects but no somite, spinal cord, or body axis defects are observed (Qiu et al. 1997). The Nr2f2 knockout is lethal at E10.5 with defects in heart development but not body axis formation (Pereira et al. 1999). As redundancy may have masked a body axis defect, we generated Nr2f1/Nr2f2 double mutants. As it would be quite time-consuming and expensive to obtain (if possible) the previously described Nr2f1 and Nr2f2 single knockout mouse lines, then generate a double heterozygote mouse line, and then generate double homozygote embryos at a ratio of 1:16, we employed CRISPR/Cas9 gene editing to examine F0 embryos as we previously described for Ncor1/Ncor2 double mutants (Kumar et al. 2016). Fertilized mouse oocytes were injected with sgRNAs designed to generate frameshift knockout deletions in the second exons of both Nr2f1 and Nr2f2. After dissecting embryos at E9.0, we obtained Nr2f1/Nr2f2 double knockouts that exhibited a body axis growth defect, more similar in size to that of wild-type E8.25 embryos (Fig. 5). Genotyping showed that embryos carrying 1 or 2 knockout alleles were normal in size compared to E9.0 wild-type (Fig. 5A), whereas embryos carrying either 3 or 4 knockout alleles exhibited a defect in body axis extension and are similar in size to E8.25 wildtype; n=7 ( Fig. 5B-C). Staining for Uncx somite expression demonstrated that embryos with 1-2 knockout alleles all have a normal number of somites with normal size (Fig. 5A), whereas embryos with 3-4 knockout alleles all have less somites that are smaller in size; embryos with 3 knockout alleles (either Nr2f1-het/Nr2f2-hom or Nr2f1-hom/Nr2f2-het) or 4 knockout alleles (Nr2f1-hom/Nr2f2hom) have a similar small somite defect ( Fig. 5B-C). As E9.0 Nr2f1/Nr2f2 mutants carrying 3-4 knockout alleles are more similar in size to E8.25 wild-type, in order to estimate somite size along the anteroposterior axis we compared them to Uncx-stained E8.25 wild-type embryos (Fig. 5D), thus revealing that the E9.0 mutants have somites about 57% the size of somites in E8.25 wild-type embryos, showing they have a specific defect in trunk development rather than a global body growth defect (Fig. 5E).
Overall, our findings show that loss of 3 or 4 alleles of Nr2f1 and Nr2f2 hinders body axis formation and results in smaller somites, thus validating our approach for finding new genes required for body axis formation. Although the body axis defect we observe for Nr2f1/Nr2f2 double mutants may be caused at least in part by a cardiac defect (Pereira et al. 1999), our RNA-seq and ChIP-seq studies were performed on trunk tissue in which the heart was excluded showing that RAregulated epigenetic marks are observed near Nr2f1/Nr2f1 genes located outside the heart in the trunk. Also, our results show that double mutants have smaller somites than wild-type embryos of a comparable size, revealing a specific effect on somitogenesis (body axis extension) rather than just an overall effect on embryonic growth. In the future, more detailed studies of Nr2f1/Nr2f2 double mutants can be performed to determine how these genes control body axis extension. In addition, future studies can be performed to determine how RARE enhancers function along with other factors to control Nr2f1 and Nr2f2 expression during body axis formation.

Meis1 and Meis2 function redundantly to control both body axis and limb formation
Meis1 and Meis2 were selected for gene knockout as Meis1 has a nearby candidate RARE enhancer conserved from mouse to frog, and Meis2 has a nearby candidate RARE enhancer conserved from mouse to bird (Table 2). Meis1 and Meis2 are both expressed throughout the trunk and in the proximal regions of limb buds (Mercader et al. 2000). Here, in situ hybridization analysis shows that Meis1 and Meis2 have reduced expression in the trunk of E8.5 Aldh1a2-/-embryos compared to wild-type (Fig. S3).
The Meis1 knockout is lethal at E11.5 with hematopoietic defects, but no body axis or limb defects are observed (Hisa et al. 2004). The Meis2 knockout is lethal at E14.5 with defects in cranial and cardiac neural crest, but no defects in body axis or limb formation were observed (Machon et al. 2015). As redundancy may have masked a body axis or limb defect, we generated Meis1/Meis2 double mutants via CRISPR/Cas9 gene editing of fertilized mouse oocytes employing sgRNAs designed to generate frameshift knockout deletions in the second exons of both Meis1 and Meis2. Embryos were dissected at E10.5 and stained for somite Uncx expression. Genotyping showed that E10.5 embryos carrying 1 or 2 knockout alleles for Meis1/Meis2 were normal in size with normal size somites compared to E10.5 wild-type (Fig. 6A). However, E10.5 embryos carrying

Discussion
Our epigenetic ChIP-seq studies combined with RNA-seq on wild-type vs Aldh1a2-/-RAdeficient trunk tissue provides a means for identifying new candidate RA target genes that may be required for development. By focusing on RA-regulated genes that also have changes in nearby RA-regulated H3K27ac and/or H3K27me3 epigenetic marks associated with highly conserved RARE enhancers or silencers, our approach can be used to identify excellent candidates for gene knockout studies to learn more about gene function.
Here, in our studies on Aldh1a2-/-trunk tissue, we were able to narrow down 4298 genes identified with RNA-seq that have significant changes in gene expression following loss of RA to 38 excellent candidate RA target genes in E8.5 trunk that also have significant changes in H3K27ac and/or H3K27me3 marks (located nearby or further away in the same TAD) associated with highly conserved RAREs. Our method allows one to identify genes that are most likely to be transcriptional targets of the RA signaling pathway as opposed to those whose expression is changed by effects downstream of RARs and RA signaling such as changes in expression or activity of other transcription factors or post-transcriptional changes in mRNA abundance. Our findings allow us to predict that some genes are likely to be indirect transcriptional targets of RA as they have nearby RA-regulated peaks for H3K27ac or H3K27me3 but no RAREs, i.e. Pax6 that is transcriptionally regulated by factors whose expression is altered by loss of RA including Sox2 (Oosterveen et al. 2013), Cdx (Joshi et al. 2019), and Fgf8 (Patel et al. 2013).
Our findings provide evidence for additional RARE silencers. Previous methods designed to identify RAREs favored discovery of RARE enhancers as studies were designed to find DNA elements that when fused to a heterologous promoter and marker gene would stimulate expression of the marker gene in the presence of RA. Also, when nuclear receptor coactivators (NCOA) and corepressors (NCOR) that control RA signaling were originally discovered, the model proposed for their function suggested that binding of RA to RAR favored binding of NCOA to activate transcription, with unliganded RAR favoring release of NCOA and binding of NCOR to repress transcription (Perissi et al. 2004). However, analysis of the Fgf8 RARE silencer at -4.1 kb demonstrated that RARs bound to RAREs can recruit NCOR in an RA-dependent manner, plus this RARE is required for normal body axis extension (Kumar et al. 2016). The Fgf8 RARE silencer was also found to recruit Polycomb Repressive Complex 2 (PRC2) and histone deacetylase 1 (HDAC1) in an RA-dependent manner, providing further evidence that RA can directly control gene silencing (Kumar and Duester 2014). Here, we identified additional RARE silencers near Fgf8 and Cdx2 plus several additional genes. Our studies indicate that RARE silencers are less common than RARE enhancers, and we found that Fgf8 is the only gene associated with a RARE silencer conserved beyond mammals. These additional RARE silencers can be further examined in comparison to the Fgf8 RARE silencer to determine the mechanism through which RA directly represses transcription.
It will be important to determine how RAREs can function as RA-dependent enhancers for some genes but RA-dependent silencers for other genes.
RA has been shown to be required for balanced NMP differentiation during body axis formation by favoring a neural fate over a mesodermal fate Henrique et al. 2015;Gouti et al. 2017). Our studies provide evidence that RA directly regulates several genes at the trunk/caudal border needed for NMP differentiation; i.e. activation of Sox2 in the neural plate that favors neural differentiation, repression of Fgf8 that favors mesodermal differentiation, and repression of Cdx2 that helps define the location of NMPs. We now provide evidence for a candidate RARE enhancer that activates Sox2, three candidate RARE silencers that repress Cdx2, and two additional candidate RARE silencers for Fgf8. As the knockout of the original Fgf8 RARE silencer at -4.1 kb exhibited a body axis phenotype less severe than loss of RA in Aldh1a2-/embryos (Kumar et al. 2016), it is possible that the additional two candidate RARE silencers found here provide redundant functions for Fgf8 repression.
Our observation of highly conserved candidate RARE enhancers near two members of two different gene families (Nr2f and Meis) was intriguing as it suggested that these gene family members may play redundant roles in body axis formation downstream of RA. As we were not sure whether the lack of previous studies on Nf2f1;Nf2f2 double knockout and Meis1;Meis2 double knockout mouse lines may be due to defects in double heterozygote adults that prevent generation of double homozygote embryos by conventional genetic approaches, we employed CRISPR gene editing to directly generate F0 double knockouts. Our Nr2f1/Nr2f2 double knockout studies indeed revealed a defect in body axis formation and small somites that is not observed in each single knockout. Interestingly, zebrafish nr2f1a/nr2f2 double knockout embryos reported recently exhibit a heart defect more severe than each single knockout, but not a body axis defect or body growth defect (Dohn et al. 2019). This observation is consistent with studies showing that RA is not required for NMP differentiation or body axis formation in zebrafish (Begemann et al. 2001;Berenguer et al. 2018). Thus, it appears that the ancestral function of Nr2f genes in fish was to control heart formation, but that during evolution another function to control body axis formation was added. Future studies can be directed at understanding the mechanism through which Nr2f1 and Our studies demonstrate the power of combining gene knockouts, ChIP-seq on epigenetic marks, and RNA-seq to identify transcription factor target genes required for a particular developmental process. In addition to H3K27ac and H3K27me3 epigenetic marks that are quite commonly observed near genes during activation or repression, respectively, it is likely that further ChIP-seq studies that identify RA-regulated binding sites for coactivators and corepressors will provide additional insight into RA target genes and transcriptional pathways. Such knowledge is essential for determining the mechanisms through which RA controls developmental pathways and should be useful to address RA function in adult organs. A similar epigenetic approach can be used to determine the target genes for any transcriptional regulator for which a knockout is available, thus accelerating the ability to understand gene regulatory networks in general.

Generation of Aldh1a2-/-mouse embryos and isolation of trunk tissue
Aldh1a2-/-mice have been previously described (Mic et al. 2002). E8.5 Aldh1a2-/-embryos were generated via timed matings of heterozygous parents; genotyping was performed by PCR analysis of yolk sac DNA. E8.5 trunk tissue was released from the rest of the embryo by dissecting across the posterior hindbrain (to remove the head, anterior hindbrain, pharyngeal region, and heart) and just posterior to the most recently formed somite (to remove the caudal progenitor zone) as previously described (Kumar and Duester 2014). All mouse studies conformed to the regulatory standards adopted by the Institutional Animal Care and Use Committee at the SBP Medical Discovery Institute which approved this study under Animal Welfare Assurance Number A3053-01 (approval #18-092).

RNA-seq analysis
Total RNA was extracted from E8.5 trunk tissue (two wild-type trunks and two Aldh1a2-/-trunks) and DNA sequencing libraries were prepared using the SMARTer Stranded Total RNA-Seq Kit v2 Pico Input Mammalian (Takara). Sequencing was performed on Illumina NextSeq 500, generating 40 million reads per sample with single read lengths of 75 bp. Sequences were aligned to the mouse mm10 reference genome using TopHat splice-aware aligner; transcript abundance was calculated using Expectation-Maximization approach; fragments per kilobase of transcript per million mapped reads (FPKM) was used for sample normalization; Generalized Linear Model likelihood ratio test in edgeR software was used as a differential test. High throughput DNA sequencing was performed in the Sanford Burnham Prebys Genomics Core.

qRT-PCR analysis
Total RNA was extracted from 20 trunks of either E8.5 wild-type or Aldh1a2-/-embryos with the RNeasy Micro Kit (Qiagen #74004). Reverse transcription was performed with the High-Capacity 20 cDNA RT Kit (Thermo Fisher Scientific #4368814). Quantitative PCR (qPCR) was performed using Power SYBR Green PCR Master Mix (Life Tech Supply #4367659). Relative quantitation was performed using the ddCt method with the control being expression of Rpl10a. Primers used for PCR (5'-3'):

Chromatin immunoprecipitation (ChIP) sample preparation for ChIP-seq
For ChIP-seq we used trunk tissue from E8.5 wild-type or Aldh1a2-/-embryos dissected in modified PBS, i.e. phosphate-buffered saline containing 1X complete protease inhibitors (concentration recommended by use of soluble EDTA-free tablets sold by Roche #11873580001) and 10 mM sodium butyrate as a histone deacetylase inhibitor (Sigma # B5887). Samples were processed similar to previous methods (Voss et al. 2012). Dissected trunks were briefly centrifuged in 1.5 ml tubes and excess PBS dissection buffer was removed. For cross-linking of chromatin DNA and proteins, 500 µl 1% formaldehyde was added, the trunk samples were minced by pipetting up and down with a 200 µl pipette tip and then incubated at room temperature for 15 min. To stop the cross-linking reaction, 55 µl of 1.25 M glycine was added and samples were rocked at room 21 temperature for 5 min. Samples were centrifuged at 5000 rpm for 5 min and the supernatant was carefully removed and discarded. A wash was performed in which 1000 µl of ice-cold modified PBS was added and mixed by vortex followed by centrifugation at 5000 rpm for 5 min and careful removal of supernatant that was discarded. This wash was repeated. Cross-linked trunk samples were stored at -80C until enough were collected to proceed, i.e. 100 wild-type trunks and 100 Aldh1a2-/-trunks to perform ChIP-seq with two antibodies in duplicate.
Chromatin was fragmented by sonication. Cross-linked trunk samples were pooled, briefly

Generation of mutant embryos by CRISPR/Cas9 mutagenesis
CRISPR/Cas9 gene editing was performed using methods similar to those previously described by others (Wang et al. 2013;Tan et al. 2015) and by our laboratory (Kumar et al. 2016). Singleguide RNAs (sgRNAs) were generated that target exons to generate frameshift null mutations, with two sgRNAs used together for each gene. sgRNAs were designed with maximum specificity using the tool at crispr.mit.edu to ensure that each sgRNA had no more than 17 out of 20 matches with any other site in the mouse genome and that those sites are not located within exons of other genes. DNA templates for sgRNAs were generated by PCR amplification (Phusion DNA Polymerase; New England Biolabs) of ssDNA oligonucleotides (purchased from Integrated DNA Technologies) containing on the 5' end a minimal T7 promoter, then a 20 nucleotide sgRNA target sequence (underlined below), and finally the tracrRNA sequence utilized by Cas9 on the 3' end, shown as follows: The 20 nucleotide target sequences used were as follows: Nf2f1 exon 2 (#1) TTTTTATCAGCGGTTCAGCG For injection into mouse embryos, a solution containing 50 ng/µl Cas9 mRNA (Life Technologies) and 20 ng/µl for each sgRNA used was prepared in nuclease free water. Fertilized oocytes were collected from 3-4 week-old superovulated C57Bl6 females prepared by injecting 5 IU each of pregnant mare serum gonadotrophin (PMSG) (Sigma Aldrich) and human chorionic gonadotropin (hCG) (Sigma Aldrich). Fertilized oocytes were then transferred into M2 medium (Millipore) and injected with the Cas9 mRNA/sgRNA solution into the cytoplasm. Injected embryos were cultured in KSOMaa medium (Zenith) in a humidified atmosphere with 5% CO2 at 37˚C overnight to maximize the time for CRISPR/Cas9 gene editing to occur at the 1-cell stage, then reimplanted at the 2-cell stage into recipient pseudo-pregnant ICR female mice. Implanted females were sacrificed to obtain F0 E9.0 embryos (Nr2f1/Nr2f2) or F0 E10.5 embryos (Meis1/Meis2). As fertilized mouse oocytes spend a long time at the 1-cell and 2-cell stages, this facilitates CRISPR/Cas9 gene editing at early stages and allows many F0 embryos to be examined for mutant phenotypes (Kumar et al. 2016). For genotyping, yolk sac DNA was collected and PCR products were generated using primers flanking the sgRNA target sites; PCR products were subjected to DNA sequence analysis from both directions using either upstream or downstream primers. For each gene analyzed, embryos were classified as heterozygous (het) if the DNA sequence contained both a wild-type allele and a frame-shift allele; embryos were classified as homozygous (hom) if only frame-shift alleles were detected but no wild-type sequence.

Body axis length analysis of embryos
ImageJ software (https://imagej.net) (Schneider et al. 2012) was used to measure body axis length along a 3-somite region (Nr2f1/Nr2f2 double mutants) or 4-somite region (Meis1/Meis2 double mutants) compared to wild-type, with each specimen photographed at the same magnification. Statistical analysis was performed using one-way ANOVA (non-parametric test) with data presented as mean ± standard deviation (SD) and with p > 0.05 indicating significance.

25
Embryos were fixed in paraformaldehyde at 4˚C overnight, dehydrated into methanol, and stored at -20˚C. Detection of mRNA was performed by whole mount in situ hybridization as previously described (Sirbu and Duester 2006).

Acknowledgments
We thank the Genomics Core Facility and Bioinformatics Core Facility at SBP Medical Discovery Institute for help with ChIP-seq and RNA-seq analysis. We thank the Animal Resources Core analyzed the data and wrote the paper.

Competing financial interests:
The authors declare no competing financial interests.

Data availability
RNA-seq data have been deposited in GEO under accession number GSE131584. ChIP-seq data have been deposited in GEO under accession number GSE131624. All other relevant data are within the paper and its Supporting Information files. ChIP-seq values for RA-regulated peaks between Aldh1a2-/-(KO) and wild-type (WT) for H3K7ac (log2 <-0.51 or >0.51) and H3K27me3 (log2 <-0.47 or >0.47) with BHP <0.05; a cut-off of log2 <-0.51 or >0.51 for H3K27ac was employed to include a RA-regulated peak near Sox2 known to be activated by RA; a cut-off of log2 <-0.47 or >0.47 was employed for H3K27me3 to include a RA-regulated peak near Fst known to be repressed by RA. RNA-seq values are log2 <-0.85 or >0.85 for differentially expressed genes with FPKM values (KO and WT) >0.5; a cut-off of log2 <-0.85 or >0.85 was employed to include the known RA target gene Sox2. RARE, retinoic acid response element; DR1 or DR2 or DR5, direct repeat with 1 or 2 or 5 bp between each repeat; TFBS, transcription factor binding site. Also see related data in Tables S1-S3 (describing all known and new candidate RA target genes) and Figs. S1 and S2. Table 2. DNA sequences of highly conserved RAREs located in RA-regulated ChIP-seq peaks for H3K27ac or H3K27me3 near all RA-regulated genes in same TAD. RAREs shown here are conserved from mouse to bird, reptile, frog, or fish. RAREs contain no more than one mismatch to Homer consensus DR5, DR2, or DR1 RARE motifs shown here; DR, direct repeat.  (A) Two RA-regulated ChIP-seq peaks for H3K27ac (red bars) near Sox2 are shown for trunk tissue from E8.5 wild-type (WT) vs Aldh1a2-/-(KO). A RARE (green) was found in the 3'-noncoding peak (but not the 5'-noncoding peak) suggesting it may function as a RARE enhancer as the H3K27ac peak is decreased when RA is lost. (B) Shown are RA-regulated ChIP-seq peaks for H3K27me3 and H3K27ac near Fgf8. In the 5'-noncoding region of Fgf8 we found two RAREs on either end of the peak for H3K27me3 (repressive mark) that is decreased in KO, indicating they are candidate RARE silencers; the RARE furthest upstream in the 5'-noncoding region at -4.1 kb was shown by knockout studies to function as an RA-dependent RARE silencer required for caudal Fgf8 repression and somitogenesis (ref. 7). We also found another RARE in the 3'-noncoding region of Fgf8 that is another candidate for a RARE silencer as it is contained within a RAregulated peak for H3K27ac (activating mark) that is increased when RA is lost. (C) Cdx2 has a peak for H3K27ac that is increased and an overlapping peak for H3K27me3 that is decreased, along with three RAREs included within both peaks, indicating that all these RAREs are candidates for RARE silencers.

Fig. 3. ChIP-seq findings for
Pax6 and Spry4 that lack RARE enhancers or silencers. These genes are good candidates for being indirect transcriptional targets of RA as their RA-regulated ChIP-seq peaks do not contain RAREs. (A) Pax6 has two RA-regulated peaks (red bars) for H3K27ac (decreased) when RA is lost in E8.5 trunk tissue from Aldh1a2-/-(KO) compared to wildtype (WT); these RA-regulated peaks do not contain RAREs suggesting that transcription of Pax6 is indirectly activated by RA. (B) Spry4 has an RA-regulated peak for H3K27me3 (decreased) when RA is lost with no associated RARE suggesting that transcription of Spry4 is indirectly repressed by RA. and Nr2f2 have differential peaks (red bars) for both H3K27ac (decreased) and H3K27me3 (increased) when RA is lost in E8.5 trunk from Aldh1a2-/-(KO) compared to wild-type (WT). Each family member has one RARE (green) contained within these differential peaks that are candidates for RARE enhancers. (C-D) Meis1 and Meis2 have differential peaks for both H3K27ac (all decreased) and H3K27me3 (all increased) when RA is lost, along with associated RAREs for each peak that are candidates for RARE enhancers.  Table S1. Comparison of Aldh1a2-/-and wild-type E8.5 trunk tissue for H3K27ac ChIP-seq and RNA-seq results to identify RA-regulated H3K27ac ChIP-seq peaks near genes with RA-regulated expression. 2.77 ChIP-seq values represent differentially marked H3K27ac peaks comparing Aldh1a2-/-(KO) and wild-type (WT) with BHP <0.05; a cut-off of log2 <-0.51 or >0.51 was employed to include a differential peak near Sox2 known to be activated by RA. RNA-seq values represent differentially expressed genes comparing KO and WT in which FPKM >0.5; a cut-off of log2 <-0.85 or >0.85 was employed to include Sox2 known to be activated by RA. Genes that have differential peaks for both H3K27ac and H3K27me3 (Table S2) are marked with an asterisk. RARE, retinoic acid response element; DR1 or DR2 or DR5, direct repeat with 1 or 2 or 5 bp between each repeat; TFBS, transcription factor binding site. Table S2. Comparison of Aldh1a2-/-and wild-type E8.5 trunk tissue for H3K27me3 ChIP-seq and RNA-seq results to identify RA-regulated H3K27me3 ChIPseq peaks near genes with RA-regulated expression. and wild-type (WT) with BHP <0.05; a cut-off of log2 <-0.47 or >0.47 was employed to include a differential peak near Fst known to be repressed by RA. RNA-seq values represent differentially expressed genes comparing KO and WT in which FPKM >0.5; a cut-off of log2 <-0.85 or >0.85 was employed to include the known RA target gene Sox2. Genes that have differential peaks for both H3K27me3 and H3K27ac (Table S1) are marked with an asterisk. RARE, retinoic acid response element; DR1 or DR2 or DR5, direct repeat with 1 or 2 or 5 bp between each repeat; TFBS, transcription factor binding site.