Efficient Array-Based Identification of Novel Cardiac Genes through Differentiation of Mouse ESCs

Remarkably, although cardiac disease accounts for the largest proportion of adult mortality and morbidity in the industrialized world, the genetic programs controlling early cardiogenesis remain incompletely understood. To better understand this process, we set out to identify genes whose expression is enriched within early cardiac fated populations, obtaining the transcriptional signatures of mouse embryonic stem cells (mESCs) at defined intervals during their differentiation along a cardiac path. We compared the RNA profiles of cardiac precursor cells (CPCs) with time-matched non-CPCs and undifferentiated mESCs, using a transgenic mESC line harboring an Nkx2-5 cardiac-specific regulatory sequence driving green fluorescent protein (GFP) to facilitate selection of CPCs. We identified 176 transcripts that are significantly elevated in abundance within CPCs compared with other assayed populations, and predicted that they are likely to play a role in cardiogenesis. Of note, approximately 24% (43/176) of the cardiogenic candidate transcripts have known roles in cardiac function or development. Importantly, we evaluated the biological relevance of a significant subset (31/133, 23%) of the remaining candidate genes by in situ hybridization at multiple time points during development (embryonic day E7.5–9.5) and report that all were expressed in key cardiac structures during cardiogenesis. Furthermore, 9/31, many of which were previously uncharacterized, were detected as early as the formation of the cardiac crescent. These data demonstrate the potential power of integrating genomic approaches with mESC differentiation to illuminate developmental processes, and provide a valuable resource that may be mined to further elucidate the genetic programs underlying cardiogenesis.


Introduction
The heart is the first organ to form and function in the vertebrate embryo [1]. It is derived from the mesoderm that arises from the primitive streak during gastrulation. In mouse, cardiac progenitors migrate from the middle and anterior region of the streak to a region under the head folds on either side of the midline by E6.5 [2]. At approximately E7.5, these progenitor cells extend across the midline, forming what is termed the cardiac crescent, the first anatomically distinct cardiac structure in the developing embryo. The cells of the crescent migrate ventrally to form a linear heart tube consisting of an outer myocardial layer and an inner endocardial layer. Over the next 48 hours, this heart tube undergoes progressive rightward looping, giving rise to a four-chambered heart by E10.5.
The developing heart was recently demonstrated to comprise two distinct fields, identified through the expression of unique markers and each giving rise to specific cardiac structures [3,4]. The secondary heart field, which contributes to the right ventricle and outflow tract, is distinguished by Hand2, Isl1, and Fgf10 expression [3][4][5]. Another key cardiac gene, Nkx2-5, is simultaneously expressed in both the primary and secondary heart fields, making it a very useful marker for cardiac fated cells during development. Nkx2-5 is a homeodomain transcription factor and an ortholog of the Drosophila gene tinman, a gene that is essential for cardiac specification in the fly [6]. In mouse, Nkx2-5 appears dispensable for specification and formation of the cardiac crescent. However, it is essential for stages beyond heart tube formation, with embryos lacking Nkx2-5 exhibiting looping defects and dying in utero (E9.5-E11.5) [7]. Although recent studies have provided further insight into mammalian heart development, our understanding of this complicated process is incomplete; elucidation of these processes has been greatly hampered by technical difficulties in studying early mammalian development in utero.
We, and others, have attempted to overcome this complication by utilizing the differentiation of embryonic stem cells (ESCs) as a model to study early developmental processes in vitro. The expression patterns of several key cardiac genes during ESC differentiation are known to closely reflect their endogenous expression during cardiogenesis in vivo [8,9]. The differentiation of ESCs toward cardiac populations that include cardiomyocytes is characterized by a temporally and spatially ordered expression of specific genes. Brachyury expression, which specifies mesoderm, is followed by Gata4 and Nkx2-5 (required for cardiac lineage development), leading to the expression of αMhc, βMhc and Cx43, which mark distinct maturation stages [10], and ultimately generating myofibrils and sarcomeres characteristic of mature cardiomyocytes [9]. Several studies have identified multipotent cardiac progenitor cells (CPCs) in ES-derived populations [11][12][13][14], and we have recently demonstrated that such CPCs are capable of forming all cardiac lineages, including endothelial and vascular smooth muscle in addition to cardiomyocytes [11][12][13][14]. Although these studies verify the cardiac potential of ESCs, most have been performed using multi-lineage populations of cells in which the cardiac lineage represents a minor population.
We set out to establish whether ESCs can serve as a model to provide genetic information on cardiogenesis, determining and comparing transcriptional profiles at different stages during differentiation. To this end we have isolated GFP-positive (GFP+) and GFP-negative (GFP−) populations at three developmentally significant time points during their differentiation, simultaneously separating GFP+ and GFP− populations within embryoid bodies. Our first time point was established as the undifferentiated starting population and was designated day 0 (D0). At this time our mESCs express markers of pluripotency such as Oct4 and SSEA-1 [11]. As differentiation proceeds, these mESCs decrease expression of the pluripotency markers, and by the fourth day of the differentiation process (D4) this cell population expresses the mesodermal marker Brachyury [11]. This was selected as our second time point. The third and final time point utilized in this study, day 6 (D6), is characterized by the initiation of robust expression of Nkx2-5, a critical gene expressed in the earliest stages of cardiac development, at the formation of the cardiac crescent. This pivotal cardiac marker is expressed within a substantial subset of the differentiating population, but not in all cells, and it remains expressed for the life of the cardiomyocyte in vitro and in vivo. Importantly, expression of the GFP reporter occurs in concert with Nkx2-5 expression in the cardiac fated cell population, making it possible to obtain a discrete population of Nkx2-5-expressing cardiac cells by fluorescence activated cell sorting (FACS). RNA isolated from these populations was then subjected to array-based transcript profiling analyses. We compared the expression profile of the GFP+ population with earlier time points and time-matched non-cardiac fated cells (GFP−).
We predicted that transcripts upregulated in the GFP+ population relative to earlier time points or non-cardiac fated cells would most likely be genes expressed in cardiovascular structures during cardiac development. Of the genes upregulated in the GFP+ population, 43/176 (24%) have known roles in cardiac or hematopoietic development or function. The significant remainder (133/176) consists of genes not previously known to function in cardiogenesis. Indeed, 20% (52/260) of the transcripts differentially regulated in the GFP+ population have no known function. We then asked which genes lacking an identified role in cardiac development were appropriately expressed in cardiac structures. Through the assessment of 31 candidate genes at three critical points during heart development (E7.5, the cardiac crescent; E8.5, the heart tube; and E9.5, the looping heart), we demonstrated that all were expressed as predicted within appropriate cardiovascular structures, including 9/31 (29%) that were expressed in the cardiac crescent at the earliest stages of heart formation. These data validate the power of ESC differentiation strategies as a model for mammalian development and, in combination with established genomic technologies, as a powerful gene discovery tool for developmental processes.

Methods
Cell Culture and Differentiation Strategy
mESCs were differentiated through embryoid body (EB) [15] formation (hanging droplet technique) as described [16]. Following primary mouse embryonic fibroblast (PMEF) feeder subtraction, mESCs were dissociated and resuspended in differentiation medium at a density of 5×10⁴ cells/mL. Cells at this time point were designated as day 0 (D0). Following incubation for two days, the EBs were transferred, in suspension, to poly-HEMA coated tissue culture dishes to prevent cell attachment and grown in medium containing ascorbic acid [17], which was replenished every 3 days thereafter. The generation of the mESC line used in these experiments has been previously described [11]; it harbors a 6.7 kb Nkx2-5 cardiac specific enhancer sequence [15] driving GFP.

Fluorescence Activated Cell Sorting
EBs were harvested on days 4 and 6 (D4 and D6), washed in PBS for 30 minutes at 37°C, then resuspended in 0.25% trypsin for 4 and 7 minutes, respectively. Cells from each time point (D0, D4, and D6) were sorted using the BD Biosciences FACSAria™ cell sorting system. D6 cells alone were sorted into GFP+ and GFP− fractions; a small fraction of these cells was collected in PBS for post-sort analysis, and the remainder was sorted into RLT buffer (RNeasy, Qiagen), as were the D0 and D4 cells. Data collection for each time point was performed in triplicate.

RNA Preparation and Array Hybridization
The isolated RNA was amplified and labeled using the Affymetrix Two Cycle™ labeling kit according to protocols described by the manufacturer. The resulting labeled cRNA was hybridized to Affymetrix mouse genome GeneChip 430 arrays, version 2.0. Fluorescence was detected using the Affymetrix GS3000 GeneArray™ Scanner, and image analysis of each GeneChip™ was performed using the GeneChip™ Operating System software from Affymetrix.

Data Analysis
Microarray data were preprocessed (background subtraction, normalization, and summarization) using robust multi-chip analysis (RMA) [18]. A linear mixed-model ANOVA was used to detect differential transcript expression based on the time point and the presence or absence of GFP expression (GFP-positive populations were cells fated along a cardiac lineage). A comparison was made between the Day 6 GFP+ population and the Day 0, Day 4, and Day 6 GFP− populations. All analyses were performed using the Partek® Genomic Suite software package, version 6.3 Beta.
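The normalization step of RMA preprocessing is commonly a quantile normalization across arrays. The following is a minimal sketch of that single step only, under stated assumptions: the function name and toy intensities are hypothetical, ties are broken by position, and background subtraction and summarization are not shown. It illustrates the idea rather than the implementation used by the software named above.

```python
from statistics import mean


def quantile_normalize(columns):
    """Quantile-normalize arrays given as equal-length lists of intensities.

    Each value is replaced by the mean, across arrays, of the values that
    share its rank. Ties are broken by position (a simplification).
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all arrays must have the same number of probes")
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value over all arrays.
    reference = [mean(col[k] for col in sorted_cols) for k in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # probe indices by rank
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized


# Toy data (hypothetical): 3 arrays x 4 probes.
arrays = [[5.0, 2.0, 3.0, 4.0], [4.0, 1.0, 4.5, 2.0], [3.0, 4.0, 6.0, 8.0]]
normalized = quantile_normalize(arrays)
```

After this step every array has the same set of values, so differences between arrays reflect rank changes of individual probes rather than overall intensity shifts.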

Gene Ontology Annotation
The ANOVA-filtered Affymetrix Mouse 430_2.0 probe sets were assigned to functional categories based largely on Gene Ontology Consortium (GO) annotation [19] (http://geneontology.org). The protein-based GO annotation terms were mapped to genes by the Mouse Genome Database (MGD) of Mouse Genome Informatics [20] (http://www.informatics.jax.org/). Affymetrix expression probes were mapped to a transcript in their NetAffx Annotation files, which also include a subset of the genes' GO annotations. To supplement the Affymetrix GO annotation, this analysis also employed the Spotfire® (www.spotfire.com) Functional Genomics platform to access full GO annotation.

Nucleic Acid in situ Hybridization
Probes 1-11 of the candidate genes were generated from E15 whole mouse RNA. A ~600 bp region was PCR amplified from the cDNA and TA cloned into the Invitrogen PCRII vector (primers available upon request). Probes 12-31 were obtained through Invitrogen Clone Ranger. The Nkx2-5 probe was a gift from Dr. Richard P. Harvey [21]; the Isl1 probe was a gift from Dr. Sylvia Evans [5]. Digoxigenin-labelled riboprobes were synthesized using Sp6, T3, and T7 RNA polymerase. CD-1 embryos were collected on days E7.75, E8.5, and E9.5, and in situ hybridization was performed as previously described [22].

Isolation of ESCs Differentiating Along a Cardiac Lineage
In an effort to gain access to cardiac fated populations at the earliest stages of development, we have made use of a transgenic mESC line (Nkx2-5-GFP) containing an established cardiac specific enhancer element of the Nkx2-5 transcription factor [15] driving expression of the green fluorescent protein (Methods). This cell line was subjected to the cardiac differentiation strategies described in the Methods; we selected specific time points for analysis at which transcriptional changes in key pluripotent/mesodermal/cardiac genes indicate critical stages in cardiogenesis: D0, representing undifferentiated mESCs; D4, corresponding to differentiation along a mesodermal lineage; and D6, with the onset of robust Nkx2-5 expression in the cardiac progenitor population. Cells were harvested at all selected time points and FAC sorted, with only the day 6 samples sorted into GFP+ and GFP− fractions (36.9% ± 9.8% of the cells expressed GFP on day 6, indicating a high level of cardiac enrichment). In order to assess the efficiency of our FAC sorting in discriminating between the GFP+ and GFP− populations, we performed an additional round of flow cytometry, which demonstrated a high purity of the GFP+ and GFP− populations, 95.7% ± 3.4% and 99.5% ± 1.5% respectively (Figure S1). We isolated RNA from undifferentiated (D0), mixed (cardiac fated and non-cardiac fated; D4), and FAC sorted D6 non-cardiac fated (GFP negative; GFP−) and cardiac fated (GFP positive; GFP+) cell populations (Figure 1); RNA isolations were performed on independent cell isolates of each population in triplicate (Methods).

Analyses of Transcriptional Profiles and Identification of Cardiac Candidate Genes
We examined RNA from four groups (day 0; day 4; day 6 GFP+; day 6 GFP−; n = 3 samples per group). RNA was isolated from the cell populations, amplified, labeled and hybridized to Affymetrix GeneChip 430 version 2.0 arrays as described in Methods. Expression data were normalized using Robust Multichip Average (RMA), consisting of three steps: a background adjustment, quantile normalization and finally summarization [18]. In order to determine the integrity of the data generated in our array experiment, we undertook several analytical approaches (PCA, hierarchical clustering, and p-value analysis), which are established strategies to evaluate microarray data.
Principal components analysis (PCA) is an unsupervised technique in which the data are not segregated into groups prior to the analysis. Instead, patterns are observed based on the output of the analysis. To this end we used Partek software with default settings, including a covariance dispersion matrix (Methods). The data used to generate this plot consist of a matrix of 12 samples, each comprising 45,101 RNA transcript intensity values. The axes correspond to the principal components (PCs), with the first PC axis accounting for 28.9% of the variance and the second PC axis accounting for 19.3% of the variance. The plot showed the samples in four groups corresponding to the four experimental conditions. This demonstrates that the samples were distinguishable based on their overall gene expression profiles, that the samples were normalized consistently, and that there were no outlier samples (Figure 2A). Our ability to discriminate between experimental groups by PCA was consistent with hierarchical clustering analyses (data not shown). These analyses gave us confidence in the integrity of our data and we therefore proceeded to test the hypothesis that day 6 GFP+ cells (which represented the cardiac progenitor population) possess a unique transcriptional signature relative to control populations (D0, D4 and D6 GFP−). To test this hypothesis we implemented ANOVA using a ''contrast'' of the day 6 GFP+ versus the ''control'' populations (Methods). We applied several false discovery rate (FDR) thresholds (0.05, 0.01, 0.001), identifying 2643, 1018 and 297 significantly regulated probe sets, respectively (Table 1, Table S1). We also demonstrated that a large proportion of transcripts were significantly regulated in the day 6 GFP+ cells by plotting a histogram of the frequency of p-values (y-axis) versus the observed p-value (x-axis) (Figure 2B). If our comparisons of contrast (GFP+) RNAs to controls (D0, D4 and D6 GFP−) were to demonstrate no significant differential regulation, we would expect a uniform distribution of p-values, resulting in approximately 2255 p-values in each bin. Our data demonstrated that small p-values are clearly overrepresented, an indication of robust power to discriminate between the groups being compared.
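The expected bin count and the FDR thresholds in this paragraph can be illustrated with a short sketch. Two assumptions are made here and are not stated in the text: the histogram is taken to have 20 equal-width bins (inferred from 45,101 / 2255 ≈ 20), and the FDR control is illustrated with the Benjamini-Hochberg step-up procedure, a common choice that may differ from the procedure actually applied. The function names and toy p-values are hypothetical.

```python
def expected_per_bin(n_tests, n_bins):
    """Expected count per histogram bin if p-values are uniform (no signal)."""
    return n_tests / n_bins


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# 45,101 probe sets over an assumed 20 bins.
per_bin = round(expected_per_bin(45101, 20))

# Toy p-values (hypothetical), thresholded at FDR = 0.05.
toy_p = [0.001, 0.008, 0.039, 0.041, 0.60]
adj = benjamini_hochberg(toy_p)
n_significant = sum(q <= 0.05 for q in adj)
```

With these inputs `per_bin` is 2255, matching the figure quoted in the text, and two of the five toy p-values pass the 0.05 threshold.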
To establish the integrity of the array-based data and these statistical analyses we performed qRT-PCR on a subset of known cardiac genes (Nkx2-5, Gata4, Isl1), as well as genes (Tek, Gyg) shown by Masino et al. (2004) to be upregulated in developing cardiomyocytes evaluated ex vivo [23] (Figure 3). All qRT-PCR data were consistent both with the published data and with the enrichment of these transcripts in the D6 GFP+ population relative to the D0 and D4 populations detected within the array experiment. These data further validated the statistical analysis of our array data and support its potential in vivo biological relevance. We therefore moved on to assign biological correlates to this large body of data.

GO Annotation of Differentially Regulated Transcripts
To determine the function of the transcripts upregulated in the Day 6 GFP+ population, we queried ontology databases. At FDR ≤ 0.001, the identified 297 probe sets represented 260 unique genes or transcripts, of which approximately 80% (208/260) had GO annotation. The corresponding GO annotations were established for processes, functions, or cellular locations that have been related to cardiac differentiation, and the results are presented in Table S2. The one exception is that ''antigens'', which is not a GO classification, were selected by their MGD Nomenclature Committee names. The resulting ontology data indicate that there were significant fractions of genes, upregulated in the D6 GFP+ population, responsible for cell signaling (extracellular, 24/260, 9%; and intracellular, 26/260, 10%) as well as transcriptional regulation (31/260, 12%). There is also a large population of structural molecules (30/260, 12%) in this dataset, including myosins and troponins, indicative of a cardiogenic population. Finally, the largest population of the ontology dataset is transcripts of unknown function (52/260; 20%), with an additional 10% (27/260) of the identified transcripts providing insufficient information for definite classification (Figure 4). Although these data suggest trends in the biological classification of identified genes, they fall short of determining their biological relevance.
To this end we set out to define the developmental expression of selected candidates. We selected those transcripts that were upregulated in the GFP+ population (n = 176), choosing as a threshold a high stringency FDR of 0.001. In addition, we selected only those transcripts displaying a minimum fold change of +1.2. Importantly, of 176 transcripts upregulated in the cardiac D6 population, 43 (24%) have known roles in cardiac development or function (Table S1). Our underlying hypothesis was straightforward: the identified genes are highly likely to be expressed in key cardiac structures during development.
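The two selection criteria described here (an FDR threshold of 0.001 and a minimum fold change of +1.2) amount to a simple filter over the per-transcript results. The sketch below is illustrative only: the record layout, the function name, and the toy values are hypothetical, and the fold change is assumed to be signed so that positive values mean higher abundance in the D6 GFP+ population.

```python
from typing import NamedTuple


class ProbeResult(NamedTuple):
    probe_id: str       # hypothetical identifier
    q_value: float      # FDR-adjusted p-value for the D6 GFP+ contrast
    fold_change: float  # signed fold change; positive = higher in D6 GFP+


def select_upregulated(results, fdr=0.001, min_fold_change=1.2):
    """Keep results that pass the FDR threshold and are upregulated enough."""
    return [
        r for r in results
        if r.q_value <= fdr and r.fold_change >= min_fold_change
    ]


# Toy records (hypothetical values, not taken from the dataset).
toy = [
    ProbeResult("probe_a", 0.0002, 3.5),   # passes both criteria
    ProbeResult("probe_b", 0.0004, -2.1),  # significant but downregulated
    ProbeResult("probe_c", 0.0300, 4.0),   # upregulated but fails the FDR
    ProbeResult("probe_d", 0.0009, 1.1),   # significant but below +1.2
]
selected = [r.probe_id for r in select_upregulated(toy)]
```

Only `probe_a` survives both filters in this toy example, which mirrors how the 260 differentially regulated transcripts narrow to the upregulated subset.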

Determination of Biological Significance in vivo by in situ
We selected 31 candidates of the remaining 133 for evaluation (23%), prioritizing genes for which insufficient biological data were available to include or exclude a role in cardiogenesis. We determined the spatial expression of selected candidates by in situ hybridization at mouse E7.5, E8.5, and E9.5, critical time points for mouse embryonic heart development. To aid our analysis, we first established the expression patterns of two known cardiac genes; in situ hybridization was performed using probes for Nkx2-5 and Isl1 at the corresponding time points (Figure 5 A-F). The transcript corresponding to the Nkx2-5 transcription factor is present in the primary and secondary heart fields at E7.5, the linear heart tube at E8.5, and throughout the looping heart at E9.5 (Figure 5 A-C). Isl1 is expressed in the secondary (anterior) heart field at E7.5; its expression persists in the outflow tract at E8.5, and subsequently in the outflow tract, trigeminal ganglia, spinal cord, and dorsal root ganglia at E9.5 (Figure 5 D-F).
The candidates evaluated by in situ hybridization demonstrated a wide range of expression levels and locations throughout the studied time points. Consistent with our hypothesis that the candidates upregulated in the GFP+ population are expressed in early cardiac structures, transcripts corresponding to nine of the genes (4930544G21Rik, Ak080047, Rbm24, Ppp1r3d, Tmem108, Air, Arhgap29, Igf2, and Igfbp5) were detected in the cardiac crescent at the earliest time point studied (E7.5) (Figure 5). Four candidates, 4930544G21Rik, Ak080047, Rbm24, and Ppp1r3d, were expressed exclusively in the cardiac crescent, whereas Tmem108, Air, Arhgap29, Igf2, and Igfbp5 were expressed in the cardiac crescent and in other early embryonic structures (Figure 5 S, V, Y, BB, EE). At E8.5, the cell populations in which these genes were expressed correspond to discrete cardiac structures, with Ak080047 expressed solely in the heart tube (Figure 5K); 4930544G21Rik, Tmem108, and Rbm24 expressed in the heart inflow tract, heart tube and outflow tract (Figure 5 H, N, T); Ppp1r3d expressed in the inflow and outflow tracts (Figure 5Q); Arhgap29 and Igfbp5 expressed in the branchial arches, inflow and outflow tract (Figure 5 Z, FF); and Air and Igf2 expressed in the inflow tract, heart tube, and outflow tract, in addition to other embryonic tissues (Figure 5 W, CC). At E9.5, 4930544G21Rik was expressed in the inflow tract, common atria, right ventricle, outflow tract, and central nervous system (CNS) (Figure 5I), whereas Ak080047 was expressed throughout the heart and CNS (Figure 5L); similarly, Rbm24 was expressed in the entire heart but was also detectable in the somites (Figure 5O). Arhgap29 was expressed in the inflow tract, common atria, outflow tract, branchial arches and other embryonic tissues (Figure 5AA); Ppp1r3d was expressed in the inflow and outflow tracts, otic vesicle, and somites (Figure 5R); Tmem108 was expressed in the inflow tract, heart, outflow tract, and CNS (Figure 5U); Igfbp5 was expressed in the inflow and outflow tracts, branchial arches, and somites (Figure 5GG); finally, Igf2 and Air were expressed in the inflow tract, heart, outflow tract, and branchial arches along with other embryonic tissues (Figure 5 X, DD).
Although transcripts corresponding to the remaining candidates could not be detected at E7.5, many were detectable by E8.5.
AK033658 was expressed in the inflow tract of the E8.5 and E9.5 embryo, with expression also detected in the common atria and CNS of the E9.5 embryo (Figure 6 A-B). AI465270 and Unc45b were expressed in the inflow tract, heart, and outflow tract of the E8.5 and E9.5 embryo (Figure 6 C-F). Other candidate transcripts (Dlk1, 1110062M06Rik, and 8430436O14Rik) were detected in key cardiac structures such as the branchial arches, inflow tract and outflow tract, but not the heart itself (Figure 6 G-L). Ctla2a was expressed in the branchial arches and outflow tract of the E8.5 embryo, with expression later seen in the inflow tract, common atria, right ventricle and outflow tract of the E9.5 embryo (Figure 6 M-N). AI430856 was expressed in the branchial arches and outflow tract of the E8.5 mouse embryo (Figure 6O); in the E9.5 embryo, expression was observed in the branchial arches, inflow tract, and outflow tract (Figure 6P). AI430856 was later identified to be a portion of the 3′ untranslated region (UTR) of the Hand2 gene, but the absence of expression in the right ventricle, in contrast to the Hand2 gene [24], indicates a possible regulatory role of this fragment. Finally, A730054J21Rik was expressed in the inflow and outflow tract at E8.5 (Figure 6Q); at E9.5, A730054J21Rik expression continued in the outflow tract, and was also detected in the spinal cord and in a position consistent with the forming gut (Figure 6R).
Finally, the expression of some candidates was only detectable by E9.5, with Taok3 expression observed in the inflow tract, common atria, outflow tract, and otic vesicle of the E9.5 embryo (Figure 6S); 9130005N14Rik expression observed in the inflow tract, common atria, and outflow tract at E9.5 (Figure 6T); expression of Gab2 evident in the branchial arches, inflow tract, common atria, spinal cord, and forming gut of E9.5 embryos (Figure 6U); and BQ947020 expressed in the right ventricle and CNS of the E9.5 embryo (Figure 6V). The remaining eight candidates evaluated through in situ hybridization were expressed in cardiac related structures such as the heart, outflow tract, and branchial arches, as well as other embryonic structures (Figure S2).
Table 2 summarizes the results of the 31 in situ hybridizations completed in this study, listing the developmentally important cardiac structures in which each candidate is expressed. Our data demonstrate the successful utilization of mESCs to model and illuminate early cardiac development. By acquiring a transcriptional profile of these cells at multiple times during their differentiation along a cardiac lineage, we have been able to identify both known and novel transcripts expressed in early cardiac structures in vivo, providing a repository of genes with potential functional significance in cardiogenesis.

Discussion
Elucidating the genetic programs underlying developmental processes is a central goal in contemporary medical research. Small populations of target cells that are not easily isolated from the surrounding tissue can complicate this challenge. We demonstrate that in vitro differentiation strategies using mESCs can effectively model the early stages of cardiogenesis, and that established genomic technologies can be effectively used to illuminate pertinent genetic signatures therein. Through the use of the Nkx2-5-GFP ES cell line we have isolated a relatively pure cardiac progenitor population for transcriptional profiling. By comparing the cardiac progenitor population (D6 GFP+) against undifferentiated mESCs (D0), early differentiating mESCs (D4) and non-cardiac multi-lineage mESCs (D6 GFP−), we generated a robust dataset represented by 260 unique transcripts significantly differentially expressed (FDR of 0.001). We have validated the integrity of these data through qRT-PCR and demonstrated their concordance with a sample of genes identified in a previous ex vivo analysis (Masino et al., 2004). Consistent with the underlying premise of this experiment, 43/176 (24%) of the transcripts upregulated in the cardiac fated population were known to play a role in cardiogenesis or hematopoiesis, confirming the strength of this dataset. While a significant fraction of the transcripts in this dataset were of unknown function (52/260), we posit that many play a role in cardiogenesis. Analysis of the gene ontology of differentially expressed transcripts, while imperfect, provides insight into the potential interactions that underlie the fate determination that drives the mesodermal subpopulation of differentiating mESCs along a cardiac lineage.
Although the complex signaling pathways necessary for cardiogenesis are incompletely known, the large number of transcripts in this dataset encoding transcription factors (n = 32) or genes involved in extracellular (n = 24) or intracellular (n = 26) signal transduction provides novel avenues for the investigation of cardiac development. While some of the transcription factors identified in this study, such as Nkx2-5, Tbx5, and Mef2c, have an established role in cardiogenesis, many do not, making them ideal candidates for further appraisal. Finally, the large number of transcripts with no (n = 52) or incomplete (n = 27) annotation offers an equally novel foundation for future work in this field. Importantly, the candidacy of these novel players must first depend on a demonstration of their biological relevance to cardiogenesis.
We set out to establish the biological relevance of a large subset of candidates through in situ hybridization during embryogenesis, and in doing so we uncovered a wide range of expression patterns consistent with cardiogenesis. Most significantly, nine of the 31 candidates (29%) (4930544G21Rik, Ak080047, Rbm24, Ppp1r3d, Tmem108, Air, Arhgap29, Igf2, and Igfbp5) are expressed in the forming cardiac crescent (E7.5). The expression of these genes is consistent with their potential role in the formation of the earliest cardiac structures. Furthermore, these genes continue to be expressed at subsequent time points (E8.5, E9.5), consistent with a possible role in the maintenance or maturation of cardiac lineages. Although transcripts corresponding to the remaining 22 candidates evaluated in this study were not detected at E7.5, all were shown to be expressed in cardiovascular tissues at later time points (E8.5, E9.5). The expression of these transcripts at later time points does not diminish their potential importance in cardiogenesis; they may also play significant roles in the maturation of the early cardiogenic population, or in the morphological changes to the structure of the looping heart. While many of these candidates are expressed solely in cardiac structures, the expression of most major cardiac genes extends beyond the cardiovascular system. Nkx2-5, for example, is also expressed in the foregut, thyroid, spleen, stomach and tongue [25]; similarly, in addition to the outflow tract, Isl1 is expressed in the spinal cord, dorsal root ganglia, and trigeminal ganglia at E9.5; and Tbx5 is expressed in both the heart and the limb bud [26]. Consistent with these observations, many of the candidates evaluated in this study are also expressed in both cardiac and noncardiac structures.
Importantly, Nkx2-5 is expressed in both the primary and secondary heart fields. Consequently we anticipated that our transcriptional analyses could identify transcripts present in both heart fields, as well as those exclusively present in one heart field. Of the candidate genes investigated, we would predict that Igfbp5, 4930544G21Rik, and Ppp1r3d are expressed in the secondary heart field at E7.5. Consistent with lineage tracing experiments of the secondary heart field [3,5], these genes are later expressed in the outflow tract at E8.5 and E9.5 and in the developing atria and right ventricle at E9.5. Conversely, Ak080047 is expressed in a region consistent with the primary heart field at E7.5, with later expression in the linear heart tube at E8.5 and throughout the entire heart at E9.5. These examples underscore the ability of the Nkx2-5-GFP cell line to isolate multiple populations of cells that contribute to the developing heart.
Although the candidate genes whose biological relevance has been evaluated in this study represent an incomplete sample of the total dataset, we have shown that many transcripts with no prior functional annotation display tight and specific expression patterns consistent with genes involved in cardiogenesis. While we have not experimentally validated the biological relevance of every transcript that is upregulated in the cardiac fated GFP+ population, taken in concert with published data we demonstrate that more than 40% (74/176) had expression patterns consistent with a role in cardiogenesis. First, 24% (43/176) of the genes identified in this screen are known to have an established role in cardiogenesis or hematopoiesis. Second, of the genes upregulated in the cardiac fated population but lacking an identified role in cardiogenesis (133/176), 31 are detected in early cardiac structures by in situ hybridization. Taken collectively, these data strongly support the utility of mESCs to model early cardiac development. These data will serve as a valuable resource as we continue to investigate the mechanisms of cardiac development, focusing on further exploration of this dataset and on characterizing the mechanisms by which these newly identified cardiac genes participate in cardiac development.
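The proportions quoted in this paragraph follow directly from the counts reported in the study. As a quick arithmetic check, the sketch below recomputes them; the variable and function names are hypothetical, and only the counts come from the text.

```python
def percent(part, whole):
    """Percentage of `part` in `whole`, rounded to the nearest integer."""
    return round(100 * part / whole)


upregulated = 176        # transcripts upregulated in D6 GFP+ cells
known_role = 43          # known role in cardiogenesis or hematopoiesis
validated_in_situ = 31   # candidates detected in cardiac structures
crescent = 9             # of those, detected in the cardiac crescent (E7.5)

remaining = upregulated - known_role        # candidates without a known role
supported = known_role + validated_in_situ  # known plus validated in situ

summary = {
    "known_role_pct": percent(known_role, upregulated),
    "tested_of_remaining_pct": percent(validated_in_situ, remaining),
    "supported_pct": percent(supported, upregulated),
    "crescent_pct": percent(crescent, validated_in_situ),
}
```

The resulting values (24%, 23%, 42%, and 29%) agree with the figures given in the text, including the "more than 40% (74/176)" total.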