Integrating Diverse Types of Genomic Data to Identify Genes that Underlie Adverse Pregnancy Phenotypes

Progress in understanding complex genetic diseases has been bolstered by synthetic approaches that overlay diverse data types and analyses to identify functionally important genes. Pre-term birth (PTB), a major complication of pregnancy, is a leading cause of infant mortality worldwide. A major obstacle in addressing PTB is that the mechanisms controlling parturition and birth timing remain poorly understood. Integrative approaches that overlay datasets derived from comparative genomics with function-derived ones have the potential to advance our understanding of the genetics of birth timing, and thus provide insights into the genes that may contribute to PTB. We intersected data from fast evolving coding and non-coding gene regions in the human and primate lineage with data from genes expressed in the placenta, from genes that show enriched expression only in the placenta, as well as from genes that are differentially expressed in four distinct PTB clinical subtypes. A large fraction of genes that are expressed in placenta and differentially expressed in PTB clinical subtypes (23–34%) are fast evolving, and are associated with functions that include adhesion, neurodevelopmental and immune processes. Functional categories of genes that exhibit fast evolution in coding regions differ from those linked to fast evolution in non-coding regions. Finally, there is a surprising lack of overlap between fast evolving genes that are differentially expressed in four PTB clinical subtypes. Integrative approaches, especially those that incorporate evolutionary perspectives, can be successful in identifying potential genetic contributions to complex genetic diseases, such as PTB.


Introduction
Complex genetic diseases derive from evolutionary and mutational processes that generate segregating variants conferring susceptibility to disease [1][2][3][4][5]. The challenges for identifying disease genes have been well-documented: different approaches traditionally used to identify them can produce large numbers of candidates that explain only modest amounts of variation in risk, and often lack replication [6,7]. Moreover, the disease itself may constitute a poorly defined or understood phenotype, which means that hypotheses regarding potential contributors to disease are made in the absence of a sufficient understanding of traits in their normal, non-disease states.
One promising approach that has the potential to reduce the number of candidates and increase replication, while identifying broad features of the genotype-to-phenotype map, is the integration of diverse datasets [8][9][10][11][12]. Several integrative approaches have been developed to interrogate the rapidly growing body of high-throughput genomic datasets for identifying genes that may either directly harbor causal variants, or else be involved in complex syndromes [13,14]. Convergence in identifying disease genes across different data types defines features of the genetic architecture and the functional associations between loci that underlie complex phenotypic traits, and potentially highlights important pathways and gene sets that may be overlooked within the framework of a single data type because of weak effect sizes or because they only indirectly affect the trait of interest [14][15][16][17][18]. Convergence-based approaches have been successful in identifying genetic networks underlying certain cancers, as well as a number of other diseases, including lung, autoimmune and neurodegenerative diseases, type 2 diabetes [15,17,18], and even well-studied polygenic traits, such as human height [19].
Pregnancy maintenance and parturition are complex reproductive processes that involve interactions between the fetal, paternal and maternal genomes, and maternal physiology and environment. Complications associated with pre-term birth (PTB) are among the leading causes of mortality worldwide of children under the age of five [20]. PTB is a heterogeneous phenotype that includes nine different obstetrically defined clinical manifestations: infection/inflammation, maternal stress, decidual hemorrhage, uterine distention, cervical insufficiency, placental dysfunction, premature rupture of the membranes, maternal comorbidities, and familial factors [21]. While PTB results from a complex set of causes, various studies have indicated that PTB exhibits moderate heritability [22][23][24][25][26][27], motivating efforts to identify the genetic factors that confer risk for PTB. Like other human complex genetic traits [28,29], the genetics that characterize PTB most probably involve both coding and non-coding variation at many loci, with causal alleles displaying a range of effect sizes and population frequencies [30][31][32][33][34][35][36]. Candidate gene analyses and studies of differential expression patterns have implicated many variants and numerous differentially expressed genes across various tissues, although few have been replicated or confirmed by genome-wide association studies (GWAS) [37][38][39][40][41]. To date, integrative approaches have not kept pace with the proliferation of new data and data types on PTB, hampering identification of genes and pathways that underlie birth timing (e.g., [41]).
To evaluate the convergence of different data types on PTB, we overlaid datasets that identified fast evolving genes in the human and primate lineage with datasets that identified differentially expressed genes enriched for placental expression across four PTB clinical subtypes. The rationale for this approach follows from the fact that the mechanisms that determine parturition and birth timing in humans are poorly understood [38]. The placenta mediates implantation in pregnancy, performs all of the major organ functions of the developing fetus, and forms the metabolic, immunological and endocrinological interface between mother and fetus [39]. Placental pathologies are a leading cause of diseases of pregnancy, such as pre-eclampsia [40]. Characterizing the genetic features of placentally expressed genes is thus a necessary step in the effort to understand human parturition and the genetic factors that disrupt pregnancy. Because pregnancy traits have evolved very fast in modern humans, and are obviously closely tied to fitness, the signatures of adaptation and rapid evolution in maternal and fetal traits associated with pregnancy must be reflected in the genes that underlie them [41]. Evolutionarily informed discovery of the genetic contributions to human pregnancy can thus help to pinpoint the genes, functional mechanisms and adaptations that comprise parturition and birth timing in modern humans, and aid in the discovery of genetic elements associated with disease [42].

Gene Expression Data
An overview of the experimental scheme is shown in Fig 1. We first downloaded a list of genes expressed in trophoblastic and decidual placenta cells from the Protein Atlas Database (PAD) of a Tissue-Based Map of the Human Proteome ver. 13 [43]. Most of these placentally expressed genes, or PEGs, are expressed in various tissues in addition to the placenta. Only genes that have official gene symbols based on the DAVID gene ID conversion tool were used, and after excluding duplicates, the final list of PEGs from Protein Atlas consisted of 12,478 genes [44,45]. Next, we downloaded the lists of genes from the PAD that are enriched in placenta (86 genes) and those that are enriched in 23 other tissues. The PAD defines tissue-enriched genes as those that are expressed at levels at least 5X higher in the focal tissue compared to all other tissues in the body. Our list included those tissues with at least five enriched genes. Finally, we downloaded lists of differentially expressed genes from four PTB clinical subtypes (preeclampsia (PE), 896 genes; spontaneous or idiopathic preterm birth (sPTB), 44 genes; preterm premature rupture of membranes (PPROM), 70 genes; and presence of birth without labor (Labor Expressed Differentially; LED), 443 genes) compiled from 93 studies (that looked at patients with pregnancy complicated by a particular PTB clinical subtype relative to individuals with normal pregnancies as controls) by Eidem and co-workers [46].
Fig 1. Overview of the scheme for identifying convergence between genes under positive selection and those associated with expression differences in normal pregnancy and various syndromes. Convergence between different data sets was determined by overlaying gene sets from each of the data categories using Venn diagrams. Genes that fall in overlapping sets were functionally annotated using the PANTHER web tool. Key: CAC, Coding Accelerated Changes; EPS, European Positive Selection; HARs, Human Accelerated Regions; PEG, Placental Expressed Genes; PED, Preeclampsia Expressed Differential; LED, Labor Expressed Differential; PPROM, Preterm Premature Rupture of Membranes; sPTB, Spontaneous Preterm Birth.

Evolutionary Data
We also collated three different lists of genes that represent both ancient and more recent signatures of fast evolution in coding and non-coding regions along the human and primate lineage. We used studies that were genome wide, reported lists of genes in the text or as supplemental data, and that captured a range of methods that infer fast evolutionary rates, e.g., site frequency spectrum (SFS), linkage disequilibrium and composite methods (see the S1 Materials for explanations of the methods described below). The first list collated data from 11 different studies reporting the 1,035 genes that exhibit signatures of fast evolution in the genic regions among primate lineages based on interspecies comparisons [42][47][48][49][50][51][52][53][54][55][56] (S1 Table). We call this list Coding Accelerated Changes, or "CAC". The second list collated data from four studies that identified short elements exhibiting accelerated lineage-specific substitutions in conserved noncoding sequences in vertebrates (known as Human Accelerated Regions or HARs) [57][58][59][60]. To generate the list of 2,657 genes that correspond to 3,939 HARs, we used the Genomic Regions Enrichment of Annotations Tool (GREAT; http://bejerano.stanford.edu/great/public/html/) (S1 Table) [61]. The third list collated data from 19 studies that analyzed genes in regions associated with signals of positive selection (including genome wide single nucleotide polymorphisms (SNPs), HapMap, HGDP, Perlegen data and sequence data from the 1000 Genomes Project and Complete Genomics in European populations) [62][63][64][65][66][67][68][69][70][71][72][73][74][75][76][77][78][79][80]. See the S1 Materials for descriptions of analytical methods for measuring selection in human populations, and references therein. We limited our survey to the 3,053 genes in such regions in European populations because most analyses of pregnancy phenotypes are skewed towards individuals of European ancestry. The fast evolution detected in genes identified by this method occurred after the emergence of modern humans and out-of-Africa migrations in ancestral European populations. We call this dataset European Positive Selection, or "EPS" (S1 Table).
In summary, CAC genes therefore correspond to fast evolution in exonic regions (largely determined by ratios of nonsynonymous to synonymous substitutions or dN/dS), and tend to be genes with more 'ancient' signatures of fast evolution. In contrast, HARs and EPS genes correspond largely to genes linked to fast evolving non-exonic elements, and constitute genes that tend to be associated with more recent signatures of selection. Below, for simplicity, we generally refer to any genes emerging from these three lists as being "fast evolving", and we use the terms "coding" CAC genes and "non-coding" HARs and EPS genes to mean the genomic localization of the fast evolving allele, and not the protein coding potential of the genes themselves. Thus, for example, a "fast-evolving EPS gene" is one in the genomic neighborhood of a SNP identified in a scan for accelerated evolution. Possible evolutionary interpretations for genes identified by these different methods are provided in the S1 Materials.
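The bookkeeping implied by these definitions can be sketched with ordinary set operations. The following is a minimal illustration only, not the authors' pipeline: the gene identifiers are toy placeholders (not data from this study), and the helper names `fast_evolving` and `in_every_category` are hypothetical.

```python
# Toy gene identifiers for illustration only; these are NOT genes or results from the study.
cac = {"G1", "G2", "G3", "G4"}        # stand-in for coding accelerated changes (CAC)
har = {"G3", "G4", "G5", "G6"}        # stand-in for HAR-associated genes
eps = {"G4", "G6", "G7"}              # stand-in for European positive selection (EPS)
peg = {"G2", "G3", "G4", "G6", "G8"}  # stand-in for placentally expressed genes (PEG)


def fast_evolving(*gene_lists):
    """Union of the lists: a gene counts as 'fast evolving' if it is in any of them."""
    return set().union(*gene_lists)


def in_every_category(*gene_lists):
    """Intersection of the lists: genes classified as fast evolving in every category."""
    return set.intersection(*(set(g) for g in gene_lists))


fast = fast_evolving(cac, har, eps)

# Overlay the evolutionary lists on an expression list (one Venn region per expression).
fast_peg = fast & peg
all_three_peg = in_every_category(cac, har, eps) & peg
coding_only_peg = (cac - har - eps) & peg

print(sorted(fast_peg), sorted(all_three_peg), sorted(coding_only_peg))
```

Treating "fast evolving" as the union of the three lists, and "fast evolving in each category" as their intersection, mirrors the terminology used in the text; each Venn region then corresponds to one set expression.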

Visualization and Statistical Analysis
CAC, HAR-associated, and EPS genes were overlaid with those from the different gene expression data sets described above. We visualized these data with Venn diagrams using Venny v. 2.0 [81]. We evaluated the statistical significance of the overlap between pairs of gene sets with a hypergeometric distribution test as implemented in http://nemates.org/MA/progs/overlap_stats.html [82]. We summarized the overlap using a simple index known as the representation factor (RF), which is the number of overlapping genes divided by the expected number of overlapping genes drawn from two independent groups [82]. An RF > 1 indicates more overlap than expected, whereas an RF < 1 indicates less overlap than expected [82]. We only present significant results for RF > 1, as we are interested in those genes that overlap more than expected by chance. The RF was calculated using the GENCODE ver. 22 estimate of 19,814 genes in the human genome [83].
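The representation factor and the associated hypergeometric probability can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the code behind the web tool cited above: the function names are hypothetical, and using the one-sided upper tail P(overlap >= x) is our assumption about how the test is applied. The worked numbers are the counts reported in this paper for placentally expressed genes (12,478 PEGs, 6,106 fast evolving genes, 3,196 in the overlap, 19,814 genes in the genome).

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k), via log-gamma to avoid overflow."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def representation_factor(x, n1, n2, total):
    """Observed overlap x divided by the overlap expected for two independent sets."""
    expected = n1 * n2 / total
    return x / expected


def hypergeom_upper_tail(x, n1, n2, total):
    """P(overlap >= x) when n2 genes are drawn without replacement from `total`
    genes, of which n1 belong to the other set (assumed one-sided test)."""
    lo = max(x, n1 + n2 - total, 0)   # smallest overlap that is possible and >= x
    hi = min(n1, n2)                  # largest possible overlap
    log_denom = log_comb(total, n2)
    p = sum(
        exp(log_comb(n1, k) + log_comb(total - n1, n2 - k) - log_denom)
        for k in range(lo, hi + 1)
    )
    return min(1.0, p)                # guard against floating-point drift above 1


# Counts reported in the text for PEGs versus all fast evolving genes.
rf = representation_factor(3196, 12478, 6106, 19814)
p_over = hypergeom_upper_tail(3196, 12478, 6106, 19814)
print(f"RF = {rf:.2f}, P(overlap >= observed) = {p_over:.3g}")  # RF is about 0.83
```

With these inputs the expected overlap is about 3,845 genes, so the RF of about 0.83 agrees with the RF = 0.8 reported for placentally expressed genes, and the upper-tail probability is close to 1, i.e., there is no evidence of overrepresentation.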
For the genes we collated, we summarized patterns of biological, molecular, protein and pathway annotations using PANTHER ver. 10.0 (Protein Annotation Through Evolutionary Relationship; http://www.pantherdb.org/) [84]. We evaluated patterns of overrepresentation for overlapping genes in PANTHER, using lists of overlapping genes as test lists, and reference lists appropriate to the relevant comparison. For example, for placentally expressed genes derived from the Protein Atlas, we summarized functional annotations and overrepresentation of those successfully mapped to the ENSEMBL genome archived in the PANTHER database as 2014-4. Alternatively, for analysis of overrepresentation in PANTHER classes of fast evolving genes among all placental genes, our reference gene list was placental genes only, rather than all human genes. Significance was evaluated using a binomial distribution test corrected for multiple testing, as implemented in PANTHER. There were few or no genes that were differentially expressed in sPTB and PPROM that were also fast evolving, probably due to the small numbers of studies that looked at genes differentially expressed in these two PTB clinical subtypes [46]. Therefore, evaluation for overrepresentation was not done for these two phenotypes.
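The logic of a binomial overrepresentation test against a chosen reference list can be sketched as below. This is a minimal illustration and not PANTHER's implementation: the function names are hypothetical, the use of the upper tail for overrepresentation is an assumption, and the Bonferroni adjustment is assumed here as one common form of multiple-testing correction (the text above does not specify which correction was applied).

```python
from math import exp, lgamma, log, log1p


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return min(1.0, sum(
        exp(log_comb(n, i) + i * log(p) + (n - i) * log1p(-p))
        for i in range(k, n + 1)
    ))


def overrepresentation_p(k, n, ref_in_category, ref_total, n_tests=1):
    """Corrected p-value for seeing k of n test genes in a category.

    The null probability is the category's share of the reference list,
    so changing the reference list (all human genes versus placental genes
    only) changes the expectation. Bonferroni correction is an assumption.
    """
    p_category = ref_in_category / ref_total
    raw = binomial_upper_tail(k, n, p_category)
    return min(1.0, raw * n_tests)


# Toy numbers for illustration only (not results from the study).
print(overrepresentation_p(k=3, n=3, ref_in_category=50, ref_total=100, n_tests=4))
```

The choice of reference list matters because it sets the null proportion: a category that looks enriched relative to all human genes may not be enriched relative to placental genes only.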

The overlap between fast evolving and placentally expressed (PEG) genes
More than 60% of the protein coding genes in the human genome are expressed in the placenta [43]. Of the 12,478 placental genes we evaluated, however, only 3,196 are fast evolving (about 26% of all placentally expressed genes and about 52% of the 6,106 fast evolving genes that we assembled). There was no evidence that fast evolving genes are overrepresented among all genes expressed in the placenta (hypergeometric test; RF = 0.8; Fig 2; Table 1). Although we aggregated more HARs and EPS genes, placentally expressed, fast evolving genes are drawn roughly proportionally from coding and non-coding gene sets (about half of each gene list; Table 1). Of the 12,478 placentally expressed genes, only 16 genes are classified as fast evolving in each of the three categories (S1 Table). These include 10 neurodevelopmental genes, namely AUTS2, ASTN2, COL25A1, GFRA1, MGAT5B, MTR, NFIB, PTPRD, ROBO1 and HERC2, a centrosomal protein gene associated with microcephaly (CDK5RAP2), two genes with possible associations with immunity (FCRL3 and THSD7B), a cell adhesion gene associated with epithelial tumorigenesis (PTPRK), and a nucleoside transporter (SLC28A3).
Placentally expressed genes (PEG) that are fast evolving in at least one category are overrepresented in various biological processes, especially those involving neurological processes, cell adhesion, and various developmental processes, such as mesoderm and nervous system development (S2 Table). Proteins related to defense, immunity and receptor activity are overrepresented, as are two signaling pathways (epidermal growth factor receptor (EGFR) (p value = 0.03) and platelet-derived growth factor (PDGF) (p value = 0.04)). Among the most numerous genes in the intersection of fast evolution and placental expression are those involved in Wnt signaling and gonadotropin releasing hormone receptor activity. There are differences between the PEG genes that are in the coding CAC category and those in the non-coding categories. Coding CAC genes are enriched for genes that code for proteins involved in the immune system, including defense/immunity proteins and cytokines, while the placentally expressed, non-coding HARs genes are enriched for genes that encode transcription factors, proteins involved in development and adhesion, extracellular matrix proteins, and proteins involved in receptor activity (S2 Table). Overall, only two pathways are overrepresented among all categories of fast evolution: placentally expressed HARs genes are enriched for cadherin and Wnt signaling (p value = 0.003, 0.001) (S2 Table).

The overlap between fast evolving genes and genes enriched for placental expression
For the genes enriched for tissue expression (expressed 5X more in a given tissue than other tissues), only five of the 24 tissues we evaluated were significantly overrepresented in any of the fast evolution categories: bone marrow, cerebral cortex, placenta, salivary gland and thyroid gland (Fig 3; Table 2; S1 Table). Nearly 32.5% of placental enriched genes (30 of 86) are fast evolving genes, more than in all tissues other than thyroid gland (39%) and cerebral cortex (34.6%). The signature of fast evolution differed among the five tissues (Table 2). CAC genes tended to be overrepresented among salivary gland (RF = 2.6; p value < 0.02), bone marrow (RF = 2.3; p value < 0.03) and placentally enriched (RF = 1.6) genes, although the latter was not significant (p value < 0.16). By contrast, genes whose expression was enriched in cerebral cortex and thyroid tissues were significantly overrepresented among genes linked to HARs (RF = 2.0; p value < 1.6e-10 and RF = 2.6; p value < 0.01, respectively). EPS genes were not overrepresented in any tissue. There were no genes in the intersection of placental enrichment and each of the three categories of fast evolution (Fig 3). A number of genes overlapped in two categories of fast evolution, however. These include a pregnancy-associated plasma protein A (PAPPA), a corticotropin-releasing hormone (CRH), a proteoglycan (EPYC), and a hepatocyte growth factor (HGF) important in angiogenesis and tumorigenesis (S1 Table).
As with PEGs, those genes enriched for expression in the placenta encode diverse proteins, many of which have catalytic, transport and signaling properties, and are involved in a variety of processes typical of placental expression, such as cell adhesion, immunity, proteolysis and hormone biosynthesis (S3 Table). Probably due to small sample size, most fast evolving, placentally enriched genes are not statistically overrepresented in functional categories. The exception is HARs-associated genes where, relative to all placentally enriched genes, there is overrepresentation in adhesion processes (S3 Table). Among the 30 fast evolving, placentally enriched genes, three (CGA, CGB2, CRH) encode releasing hormones (corticotropin releasing factor receptor signaling, gonadotropin releasing hormone receptor, and thyrotropin releasing hormone receptor, respectively (S1 Table)). A number of fast evolving, placental genes show tumorigenic or tumor suppression activity (ADAM12, ADAMTS18, CAPN6, EGFL6, HTRA4, LIN28B) and others are involved in disorders associated with epithelial and connective tissues (FBN2) or have immune functions (IL1RL1, PRG2, SIGLEC6). Three members of the pappalysin family are fast evolving in the placenta (PAPPA, PAPPA2, PAPPA-AS1), and altogether, seven members of the pregnancy-specific glycoproteins are fast evolving (S1 Table).

The overlap between fast evolving genes and differentially expressed genes in PTB clinical subtypes
Of those genes differentially expressed in the PTB clinical subtypes, the proportion of fast evolving genes ranges from 23% to 34%, with sPTB having the largest fraction of fast evolving genes (Table 3). The large fraction of fast evolving sPTB genes is largely driven by the fact that 8 of [...] (Table 3).
Table 2. Overlap between fast evolving genes and those that exhibit tissue enrichment in their expression. Only tissues with significant representation factors greater than one are shown, out of 24 tissues evaluated in the Protein Atlas database with more than five enriched genes. Values are numbers of genes, with associated representation factors in parentheses and an asterisk for values significantly > 1.
No fast evolving genes were common to the four categories of differentially expressed PTB clinical subtypes, nor did any genes in the four PTB clinical subtypes overlap in each of the three categories of fast evolution (Fig 4), perhaps reflecting real underlying differences in the biological axes categorizing these clinical subtypes and the breadth and complexity of the phenotypes subsumed under the various clinical subtype categories. For example, PPROM and sPTB share no fast evolving genes. Nevertheless, although sample sizes are small, 14% and 39% of the PPROM and sPTB differentially expressed genes, respectively, overlapped with those differentially expressed in PE, highlighting biological commonalities that both of these clinical subtypes likely share with PE.
In terms of their functional annotations, no fast evolving PTB clinical subtype genes are overrepresented in PANTHER categories (S4 Table). However, these genes fall in categories that are consistent with recognized disease pathways in pregnancy, including the P53, 5-hydroxytryptamine (serotonin or 5-HT) degradation, and TGF-β signaling pathways, and various pathways involved in neurodevelopment and immune system processes. As was the case with placental enriched, fast evolving genes, a number of fast evolving genes that are differentially expressed in PTB clinical subtypes exhibit tumor-proliferative or suppressive activity. These include WWOX (common to preeclampsia, HARs and EPS), a gene that plays a role in apoptosis and acts as a tumor suppressor [85,86]. KDR (common to LED, HARs and EPS) is a gene involved in mediating endothelial proliferation, survival, migration, tubular morphogenesis and sprouting [87,88]. Also common to that group is ITPR1, which is a gene that mediates calcium release from the endoplasmic reticulum and triggers apoptosis [89,90]. In addition, two of the three genes play roles in neurogenesis [91][92][93][94][95], while KDR has been implicated in recurrent pregnancy loss [96]. A number of genes that are differentially expressed in PTB clinical subtypes and fast evolving are immune related or involved in angiogenesis, such as CFB, which is involved in complement activation, and CXCR4, which is a chemokine receptor. Finally, two genes differentially expressed in PTB clinical subtypes are classified as fast evolving in each of the three categories: NFIB (preeclampsia and LED) and CXCR4 (sPTB) (S1 Table).

Discussion
Common heritable diseases are evolutionary conundrums. Debates about disease models that can account for alleles that segregate at appreciable frequencies hinge on population genetic assumptions about evolutionary history, effect sizes, and demography, and whether the loci that underlie diseases ultimately will conform to modeling constraints [6,7,97]. One alternative method that has emerged in recent years is a "nonparametric" approach, which is based on the assumption that different data types can converge on the loci underlying common diseases, even if the data do not readily conform to contemporary disease categories or disease models.
In this study we asked, to what extent do placental gene sets derived from evolutionary-based analyses converge on those derived from expression analyses, and what light can the intersection of such data shed on our understanding of the genetic basis of PTB clinical subtypes? We found that when fast evolving genes and elements are aggregated by evolutionary rate variation in coding and non-coding regions and partitioned by differential expression in the placenta, they converge on a small number of genes that may be candidates for PTB. For example, fast-evolving PEG genes typically encode membrane-bound proteins with functions related to binding and signaling between cells and the extracellular matrix. Disruption of membrane formation and rupture are well-characterized pathologies of normal pregnancy [98]. Likewise, of the genes that are enriched for placental expression, nearly 35% are fast evolving, greater than all but two of the other tissues we evaluated (thyroid gland and cerebral cortex). Many of these placental genes are evolving rapidly in coding regions. By comparison, the fast evolving genes expressed in the brain tend to be associated with HARs, possibly reflecting fundamental differences in how selection has acted on the human brain and placenta [99]. A third of these placentally enriched fast evolving genes have well-characterized roles in pregnancy and are differentially expressed in PTB clinical subtypes (EPYC, HGF, PSG2, PSG3, PSG4, CRH, PAPPA, PSG1, PSG5, PSG11), and as with fast evolving PEG genes, are signaling or extracellular molecules with roles in inflammation and neurodevelopment. Despite performing pregnancy related functions, most of the 16 PEG genes that exhibit a pattern of fast evolution in all selection categories (10/16) were not differentially expressed in PTB clinical subtypes. Thus dysregulation in these genes might contribute to pregnancy pathologies by processes that do not involve expression modulation.

Pathways enriched in PEG and fast evolving PEG genes
In terms of pathways in which they are expressed, dysregulation of any of the genes uniquely enriched for placental expression could conceivably underlie pregnancy related pathologies. Relative to all genes in the human genome, PEG genes are enriched for two pathways (EGF and PDGF signaling). Fast-evolving PEG genes (those associated with HARs) are enriched for two additional pathways (cadherin and Wnt signaling). Interestingly, all four pathways play critical roles in regulating growth, proliferation and differentiation of mammalian cells [100,101], roles that are indispensable for normal development and functioning of the placenta. In addition, EGFR signaling has been shown to stimulate angiogenesis, promote cytotrophoblast migration and invasion, and block apoptosis [102,103], while PDGF is important in regulating trophoblast angiogenesis [104,105]. The Wnt signaling pathway is a facilitator of cell-cell signaling events during embryogenesis [106,107], and plays several roles in human placentation [107,108]. Cadherins are a group of transmembrane glycoproteins involved in cell adhesion and tissue formation [109][110][111][112][113]. Most of the genes from the cadherin superfamily are expressed in the embryonic and adult nervous system and have been implicated in diseases of the central nervous system. Wnt and cadherin signaling share a key component that facilitates normal cascades within both pathways, and several studies have shown cross talk between the two [114][115][116][117]. Overall, the four pathways are crucial for successful implantation and development of early pregnancy [118,119]. Furthermore, dysregulation of the EGFR, PDGF and Wnt genes has been implicated in several pregnancy pathologies: complete hydatidiform mole (a rare mass or growth that forms inside the uterus at the beginning of a pregnancy), low birth weight, intrauterine growth restriction (IUGR), recurrent abortions and PE [108][120][121][122][123][124][125][126].

Fast evolving genes of interest in PTB clinical subtypes
While there was no functional enrichment of fast evolving genes among genes differentially expressed between preeclampsia or birth with labor and normal or birth without labor, respectively, a number of genetic pathways differ between fast evolving PE and LED genes (S1 Table), possibly highlighting contrasting axes along which these two clinical subtypes segregate (there were too few PPROM and sPTB fast evolving genes for meaningful comparisons). For example, genes in the TGFβ signaling and 5-hydroxytryptamine degradation pathways are fast evolving in coding regions in PE, but not in LED. In mice, 5-HT is thought to interfere with the hormonal mechanisms responsible for the maintenance of gestation by hindering the production of progesterone [127]. It is thought that many crucial fetal neurodevelopmental processes are regulated by 5-HT, both from the maternal system early in development and later from the fetal system. 5-HT has been shown to induce labor by stimulating contraction of human uterine smooth muscle (myometrium) through special contractile receptors expressed in pregnant human myometrium [128,129]. The contractile effects of 5-HT in myometrium have been identified in species that have discoidal placental types, like rabbit, rat and guinea pig, while in contrast, 5-HT inhibits myometrium contractions in a species with a diffuse placental type, the pig [129,130]. Thus, 5-HT may have played an important role in the differentiation of placental forms. The TGFβ signaling pathway, an evolutionarily conserved pathway that plays a fundamental role in cell growth and differentiation [131], has been implicated in regulating vascular endothelial growth factors that have been shown to underlie PE [132]. The genes in the TGFβ pathway play roles in preparation of the endometrium for implantation, embryo development and pregnancy [133].
Furthermore, TGFβ is an angiogenetic factor, and variants in several angiogenetic factors such as eNOS and FLT1 have been implicated in PE [35]. That we identified genes in the TGFβ signaling pathway supports the view that PE may be a pathological legacy of the human pattern of interstitial implantation [52,134,135]. Recent work has suggested that there might be convergence of genetic factors that underlie placental diseases like PE and larger evolutionary patterns in placental traits in mammals [136]. If so, genes involved in mechanisms that distinguish different placentation types and placental phenotypes in mammalian species are prime candidates for involvement in human pregnancy pathologies.
The evolutionary framework for discovering genes underlying pregnancy related phenotypes in humans
Humans have evolved a distinct ensemble of traits relative to our close primate relatives. The four most cited ones (bipedalism, large brain size, metabolism and immune system) have been used to formulate hypotheses to explain unique features of human pregnancy [137][138][139][140][141][142][143][144][145]. From a clinical perspective, the goal is to understand both unique and shared features of gestation timing in humans (normally ~38-42 weeks, varying by up to 37 days), and the mechanisms/pathways that underlie these traits [146]. The central issue is that the genetics that underlie such traits as bipedalism have yet to be discovered, but may provide crucial insights into human pregnancy. For example, genome-wide scans of individuals that suffer from Unertan syndrome, a rare quadrupedal gait phenotype, implicated the VLDLR gene, which encodes the very low-density lipoprotein receptor, a component of the Reelin signaling pathway involved in neuroblast migration in the cerebral cortex and cerebellum [147]. Interestingly, this gene is moderately and highly expressed in normal trophoblastic and decidual cells, respectively [43], and is linked to HARs. Moreover, EPS genes that are expressed in the placenta are enriched for categories such as anatomical structure morphogenesis, and two of three genes from the TGF-β signaling pathway identified among genes differentially expressed in PE cases are involved in bone formation. Thus, an evolutionary perspective on identifying genes involved in pregnancy pathologies also broadens the scope for understanding the genetics of bipedalism.
Finally, the human brain has undergone rapid evolution, and the genes involved in brain development and their regulatory elements exhibit strong patterns of accelerated evolution [99][148][149][150][151]. The majority of fast evolving, placentally enriched genes (18 of 30) have neurodevelopmental functions, and a number of fast evolving genes in overlapping gene sets are in pathways that either perform brain related functions or have been implicated in diseases of the central nervous system (S5 Table). Three fast evolving genes (NBPF11, NBPF12 and NBPF15) that are differentially expressed in both sPTB and PE are part of a neuroblastoma breakpoint (NBPF) gene family that has been shown to exhibit neuron-specific expression and copy number variations. These NBPF genes have been implicated in both evolutionary and contemporary variation in brain size among primate and human lineages, and an array of pathologies of the central nervous system, including microcephaly, macrocephaly, autism, schizophrenia and mental retardation [152][153][154][155][156][157][158][159]. Coupling these results regarding the developmental genes with the fact that genes involved in inflammatory/immune response and membrane homeostasis also emerged in our study, a general implication is that evolutionary history has the potential not only to inform our understanding of pregnancy pathologies, but also to generate hypotheses regarding how changes in neurogenesis, immunity, and membranes have influenced the evolution of human pregnancy.

Conclusions
Changes in our evolutionary past might have made us susceptible to some pathologies of pregnancy. Despite numerous studies on the genetics of pregnancy and its many diseases and syndromes, our understanding of the genetic factors at play remains incomplete and biased towards much-studied genes that generally underlie feto-maternal interaction in anti/proinflammatory pathways [160]. This study highlights both the extent to which there is limited integration of disparate pregnancy related genetic data, and the promise of such integration. Integrative approaches such as these, especially those that incorporate evolutionary, comparative perspectives, can be successful in identifying promising avenues for research on complex heritable diseases that have emerged out of the unique changes in our evolutionary past.
Supporting Information
S1 Materials. Additional methods and rationale for detecting overlap of regions and genes exhibiting accelerated evolutionary rates. (DOCX)
S1 Table. Lists of genes used for all of the analyses described.
Table. A: Overlap between genes that fall in genomic regions that are fast evolving in European populations based on different selection methods on 1000 Genomes project data. There is proportionally more overlap between data from the STR method and site frequency spectrum. B: Overlap between fast evolving genes in European populations based on integrated haplotype scores (iHS) methods on four different data types of European populations. There is proportionally more overlap between genes inferred from HapMapII and human genome diversity project data than any other pairwise comparison. Statistical significance of the overlap between genes from different methods was inferred using the hypergeometric method as implemented in http://nemates.org/MA/progs/overlap_stats.html.