Combining multi-OMICs information to identify key-regulator genes for pleiotropic effect on fertility and production traits in beef cattle

The identification of biological processes related to the regulation of complex traits is a difficult task. Commonly, complex traits are regulated through a multitude of genes contributing each to a small part of the total genetic variance. Additionally, some loci can simultaneously regulate several complex traits, a phenomenon defined as pleiotropy. The lack of understanding on the biological processes responsible for the regulation of these traits results in the decrease of selection efficiency and the selection of undesirable hitchhiking effects. The identification of pleiotropic key-regulator genes can assist in developing important tools for investigating biological processes underlying complex traits. A multi-breed and multi-OMICs approach was applied to study the pleiotropic effects of key-regulator genes using three independent beef cattle populations evaluated for fertility traits. A pleiotropic map for 32 traits related to growth, feed efficiency, carcass and meat quality, and reproduction was used to identify genes shared among the different populations and breeds in pleiotropic regions. Furthermore, data-mining analyses were performed using the Cattle QTL database (CattleQTLdb) to identify the QTL category annotated in the regions around the genes shared among breeds. This approach allowed the identification of a main gene network (composed of 38 genes) shared among breeds. This gene network was significantly associated with thyroid activity, among other biological processes, and displayed a high regulatory potential. In addition, it was possible to identify genes with pleiotropic effects related to crucial biological processes that regulate economically relevant traits associated with fertility, production and health, such as MYC, PPARG, GSK3B, TG and IYD genes. These genes will be further investigated to better understand the biological processes involved in the expression of complex traits and assist in the identification of functional variants associated with undesirable phenotypes, such as decreased fertility, poor feed efficiency and negative energetic balance.


Introduction
The regulation of complex traits involves multiple loci, that contribute to a small proportion of the phenotypic expression, and environmental effects [1,2]. In addition, loci that regulate complex traits are known to be involved in the regulation of several phenotypes. This describes the primary genetic effect that leads to genetic correlation, known as pleiotropy. Currently, the interaction and regulation pattern of these loci are poorly understood, especially in livestock species. However, the use of pleiotropic markers for genomic selection may result in improvement in efficiency of selection. On the other hand, some pleiotropic markers can lead to the indirect selection of undesirable traits. The exclusion of these markers from genetic selection programs may result in the improvement of a specific trait without affecting other traits or multiple traits simultaneously.
Indirect selection for undesirable traits has been well described in livestock breeding programs. For example, variants associated with high productive performance can also reduce fertility or affect mortality rates in both beef and dairy production systems [3][4][5][6]. The identification of loci with pleiotropic effects and the dissection of the biological processes in which these loci are involved can provide useful information to improve the accuracy of genomic breeding values by providing insights on the best statistical model to be used. Thus, more accurate breeding values translates in better breeding decisions, which will lead to reduced frequency of undesirable genotypes and phenotypes in the population under selection. Additionally, the use of causal and functional variants in selection models may result in a higher prediction accuracy and improved prediction persistence over generations [7][8][9].
The investigation of genomic loci with pleiotropic effects involved in the regulation of economically important traits is a key step to apply the breeding strategies previously mentioned. The detection of such genomic regions enables the identification of potential causal variants mapped in key regulator genes. Multivariate trait analysis and subsequent selection indexes may then be enhanced by the identification of these variants, leading to higher genetic improvement [10][11][12].
The use of functional genomics in a systems biology approach is a powerful tool for the integration and analysis of biological networks using high-throughput OMICs technologies [13,14]. The integration of results from different breeds and OMICs approaches helps unravel the relationship between candidate genes and several phenotypes to further identify genetic variants associated with changes in the expression of those genes.
Fertility and production traits are good examples of phenotypes with distinct genetic architecture, different potential to respond to selection, and with antagonistic genetic relationship. The majority of fertility traits have unfavorable genetic correlations with production traits, which may be explained by the unidirectional selection, "hitchhiking" effect or pleiotropic effects [15][16][17]. The biological process responsible for sexual maturity, known as puberty, involves complex pathways that are regulated by several genes involved with the development of a wide variety of phenotypes [18]. Crucial biological structures for puberty development, e.g., hypothalamus, pituitary gland and thyroid, are involved in the regulation of synthesis and secretion of several hormones directly related to production traits [19][20][21]. Therefore, a deep investigation of the biological processes related to genes involved in puberty may result in the identification of key regulator genes for both fertility and production traits. Within this context, the aim of this study was to integrate multi-OMICs data into studies that investigated pubertal status and fertility traits. This integration was performed using a systems biology approach to identify candidate genes with pleiotropic effects on economically relevant traits in beef cattle.

Ethics statement
This study was carried out analysing data from previous studies, which have been approved by respective ethics in research committees [13,20,22]. Therefore, additional animal welfare and use committee approval was not required.

Data collection
Candidate genes identified in three independent populations (i.e., breeds) of beef cattle were used. For the first population (Brangus cattle, n = 64), genes differentially expressed (DE) between pre-and post-puberty in eight tissues (hypothalamus, pituitary gland, liver, longissimus dorsi muscle, adipose tissue, uterine horn, endometrium, and ovary) were evaluated [13]. The candidate genes for the second population (Tropical Composite cattle, n = 866; Brahman = 843) were identified by a genome-wide association analysis (GWAS) for age at puberty, postpartum anestrous interval, and occurrence of the first postpartum ovulation before weaning in the first rebreeding period [22]. For the third population (Brahman cattle, n = 24), DE genes were identified in pre-and post-puberty stages in the pituitary gland and ovary [20].

Identification of genes near pleiotropic markers
A total of 9,194 genes identified in the three breeds (2,120 in Brangus, 3,714 in Tropical Composite, and 5,269 in Brahman) were mapped against a list of genomic markers (n = 729,068) with pleiotropic effect (P < 0.05 after false discovery rate (FDR) correction-n = 21,908 markers) reported by Bolormaa et al. (2014) [10], which analyzed 32 traits including growth, feed intake, carcass and meat quality, and reproduction. Using the list of candidate genes obtained by integrating and combining data from OMICs technologies described by Cánovas et al.

Genes shared among breeds and QTL mapping
The genes identified within the pleiotropic regions were compared and those shared across the three independent populations were selected to be mapped against QTL regions using resources from the Cattle QTL database (CattleQTLdb, www.animalgenome.org/cgi-bin/ QTLdb/BT/index). Using an interval of 2 Mb (1 Mb downstream and 1 Mb upstream) from each one of the shared genes, all the QTLs mapped in this interval were annotated. Those genes mapped in regions where at least five QTL categories were annotated (from six possible categories: exterior appearance, health, reproduction, production, meat and carcass and milk traits) were ranked as the genes with the highest pleiotropic potential.

Functional analyses
First, the relationships among pleiotropic genes were estimated using literature text-mining, co-expression, gene fusion, protein homology, gene-neighborhood and gene co-occurrence using STRING database [25]. This estimation used human database, as the available information is more complete for humans than bovine. Additionally, using STRING database, an enrichment analysis was also performed to identify associations between the pleiotropic genes with biological processes (BP) and metabolic pathways using the Kyoto Encyclopedia of Genes and Genomes (KEGG). From the gene network created using STRING database, a centrality analysis was performed in order to identify the pleiotropic genes with the highest number of connections in the network. The igraph package [26] was used to calculate the centrality coefficient, defined as the number of connections of each node in a network. The VarElect software [27] was used to match the genes present in this list to the enriched KEGG pathways. In addition, for all the genes present in the main network, BLAST2GO [28] was used to obtain a better description of the BP that these pleiotropic genes are related to. Finally, the NetworkAnalyst tool [29] was used to confirm the genes with the highest regulatory potential based on the interaction with other proteins. The identification of the interactions among the genes was performed using IMEx, based on a literature-curated comprehensive data from InnateDB

Results and discussion
A total of 108 genes (S1 File) mapped in pleiotropic regions were shared among all three independent populations of the three beef cattle breeds considered (Fig 2). Among them, 89 genes were mapped in regions with five or six QTL categories annotated (S2 File). It is important to highlight that the approach to evaluate the post-puberty stage was different between Cánovas et al. [20] defined the puberty period when the observation of the first corpus luteum occurred. These differences, together with the different genetic backgrounds and environmental effects between these two populations, can result in different expression profiles, even when two similar groups of phenotypes are measured (pre-and postpuberty). However, the same biological processes are expected to underlying the puberty development in both populations. Therefore, it is expected that the key-regulators of these processes maintained a similar DE profile between pre-and post-puberty phenotypes in both populations. Additionally, the consistency of expression among tissues and populations can help to identify these key-regulatory genes. The results obtained from BP and KEGG pathways enrichment analyses indicated that these genes are involved in development and maturation of the body (Table 1). From those 89 genes, 38 were grouped in the main network identified by STRING database (Fig 3 and S1 File).  Table 2). The detailed mapping of the BP associated with these 38 genes suggested a strong relationship between these genes and the regulation of biological processes involved with cellular metabolism, cellular growth and development (Fig 5 and S3 File). Fig 6 shows the genomic context of the regions where these 38 genes were mapped as a function of the pleiotropic markers harboring these genes. The results from evaluation of the gene list composed of 38 genes suggested that the identification of key-regulatory genes was an important step to elucidate the relationship among these genes and the regulation of BP, as well as their association with economically important traits.

Identification of key-regulatory genes through gene-network analysis
The centrality analysis for the main gene network, composed by 38 genes, identified the genes with the largest number of connections in the network. Table 3 shows the number of connections for the top 10 genes with the largest number of interactions. In order to confirm and reinforce the identification of genes with the highest regulatory potential, the interaction pattern of these 38 genes with other proteins was evaluated using IMEx. From this analysis, it was Key-regulator genes for pleiotropic effects in beef cattle observed that there was an overlap between the top 10 genes with more interactions in the STRING database network and the top 10 genes from the IMEx interactions. In the IMEx analyses, it was also possible to identify, from the top 10 genes, six genes directly related to positive and negative regulation of cellular metabolic processes (red circles in Fig 7). Interestingly, these 6 genes were also present in the list of top connected genes in the main network identified by STRING database. These results confirmed the regulatory potential of these genes and highlight the biological processes in which these genes were involved. These genes were: MYC proto-oncogene (MYC) Peroxisome Proliferator Activated Receptor Gamma (PPARG), Glycogen Synthase Kinase 3 Beta (GSK3B), SMAD Family Member 2 (Smad2), ABL Proto-   Oncogene 1, Non-Receptor Tyrosine Kinase (ABL1) and Insulin-like growth factor 1 receptor (IGF1R). Three of these genes are involved in cell proliferation: MYC, SMAD2 and ABL1. MYC was the gene with the highest number of interaction with other genes in both network analyses. This gene codes for a transcription factor responsible for regulating transcription of several genes. Consequently, MYC plays a multifunctional action involved with the control of crucial biological processes, such as cell cycle control and cellular transformation [31]. The expression of MYC was decreased in the muscle tissue of orchidectomized testosterone-treated male SMAD2 is a member of the TGF-beta-SMAD signaling pathway and it is involved with the regulation of several processes associated with female reproduction and embryonic development in cattle [36]. In male rats, SMAD2 is DE between non-sexually mature and sexually mature rats, indicating a relationship with puberty progression [37]. The expression of myostatin, a protein involved with muscle proliferation, is directly regulated by SMAD2 activity [38][39][40]. Therefore, SMAD2 is a crucial regulator of processes involved with fertility and production traits.
ABL1 is a proto-oncogene associated with the regulation of several biological processes related to cellular division and differentiation. ABL1 was observed as DE in mouse testis after heat shock, indicating a regulatory activity in this tissue [41]. Additionally, homozygous disruptions of ABL1 are associated with neonatal lethality in mice [42]. To our best knowledge, the association between ABL1 and production traits in cattle has been poorly investigated. However, a significant peak associated with angularity in Brown Swiss cattle was identified close to ABL1 region. These results suggested a potential association between ABL1 with fertility and production and conformation traits, reinforcing the necessity to improve our understanding of these relationships. Two of the other six genes identified in this analysis are involved in energy conservation metabolism: PPARG, GSK3B. PPARG belongs to a subfamily of nuclear receptors involved in several crucial biological processes, such as adipogenesis and immune cell activation [43]. Alterations in the expression pattern of PPARG, usually decrease in expression, were already associated with puberty progression and fertility traits in several species, including humans and cattle [44][45][46]. Additionally, PPARG has also been associated with meat quality and, more specifically, intramuscular fat percentage, and milk synthesis in cattle, which are economically important traits in beef and dairy cattle production [47][48][49], respectively.
Muscle glycogen is absolutely fundamental as energy reservoir. The amount of liver and muscle glycogen available determines the use of fat and, as a last resource, amino acids, for providing energy. GSK3B is a serine-threonine kinase member of a subfamily of glycogen synthase kinase and its corresponding function is known to be associated with energy metabolism, body pattern formation and neuronal cell development [50]. Additionally, GSK3B has been associated with age at puberty, sperm motility and decrease in spermatogenesis. Therefore, its function is directly related to fertility traits [51][52][53]. Interestingly, GSK3B has also been associated with several economically relevant traits such as skeletal muscle hypertrophy, intramuscular fat, meat quality, milk synthesis and mammary gland proliferation [54][55][56][57].
The last gene in this list, IGF1R, is the receptor of the Insulin-like growth factor 1 (IGF1), which also binds IGF2 and insulin with lower affinity. IGF1 and IGF2 have remarkable functions in the steroidogenic activity and regulation of body growth and maturation. In female cattle, IGF1 and IGF2 are associated with the regulation of the steroidogenic activity in the glanulosa cells, as well as, the regulation of the mitosis (IGF1 and IGF2) and apoptosis during the follicular development (IGF1) [58,59]. The bovine testis is an important source of IGF1  Key-regulator genes for pleiotropic effects in beef cattle production [60]. Both circulating and locally produced (testis) IGF1 may play a crucial role in the testicular size and testosterone secretion [61,62]. The IGF1 pathway influence gonadotrophin-releasing hormone (GnRH) neurons during the puberty and it is directly associated with the puberty progression and the development of reproductive traits [52,63,64]. Additionally, IGF1 was associated with the regulation of several production traits, for example, feed intake, feed conversion, body weight, milk protein yield, milk fat yield, milk fat concentration and somatic cell score [65][66][67]. Variants mapped on IGF1R may result in a change of affinity between IGF1 and its receptor, resulting in a different response to the circulating levels of this hormone. The crucial roles of IGF1 in the regulation of body development, in addition to the results obtained in the present study, highlights the potential of IGF1R to act like a key-regulator of pleiotropic effects associated with fertility and production traits. All the six genes described above are directly related to fertility and economically relevant traits. Additionally, these genes were identified as potential key-regulatory genes due to the number of interactions with other genes and its biological functions. Consequently, these genes are important candidate genes for pleiotropic effect on multiple production and fertility traits.
Genes that do not appeared in both analyses described in Table 3 were not included in the discussion. However, some of these genes are fundamental to the metabolic processes Key-regulator genes for pleiotropic effects in beef cattle discussed here. For example, somatostatin (SST), also known as growth hormone inhibiting hormone (GHIH), affects energy conservation metabolism by inhibiting insulin and glucagon secretions. In addition, somatostatin produced in the hypothalamus is transported to anterior pituitary, where it inhibits the release of growth hormone (GH), TSH and prolactin [68]. This gene also encodes neuronostatin, which is also implicated in the releasing of pituitary hormones [69]. In addition, somatostatin is involved in the MAPK, cAMP, PKA-pathways, among many other ones [70]. As a consequence, somatostatin has also a role in energy conservation metabolism and cell proliferation, the two main biological processes detected in the current study.
ACACB and ADCYAP1 are involved in the energy conservation metabolism. ACACB encodes acetil-CoA carboxilase, the enzyme converting acetil-CoA in malonyl-CoA, fundamental for fatty acids biosynthesis (KEGG ID: 04931; ID: 00061). ADCYAP1 encodes adelylate-clycase interacting protein 1, which through the cAMP and PKA pathways is involved in the insulin upregulation and in the regulation of insulin levels in insulin secretory granules, respectively, among other functions (KEGG ID: 04911).
ABI1 and ATF2 are cell proliferation regulators. ABI1 encodes ABL interacting protein 1, which facilitates the ABL cell proliferation signal. ATF2 encodes activating transcription factor 2, also known as CREB2. CREB2 regulates cell cycle proteins, pro-apoptotic proteins, cell adhesion molecules, and membrane and cytoplasm signaling proteins. In additions, CREB2 is the final step in many pathways, including cAMP, PKA, estradiol 17-beta (KEGG ID: 04915) and glucagon (KEGG ID: 04922). Therefore, CREB2 is also involved in the energy conservation metabolism.
LGALS3 encodes galectin 3, a carbohydrate binding protein with affinity for beta-galactosides. As a consequence, galectin 3 is involved in cell proliferation, adhesion, differentiation, angiogenesis and apoptosis [71]. It has been shown that galectin 3 is expressed by trophoblast cells in response to 17b-estradiol, progesterone, and human chorionic gonadotropin (hCG). Among many other effects, galectin 3 induces apoptosis in endometrial cells, which would allow embryo implantation [72].

Thyroid function and genes with pleiotropic effect
Among the genes present in the main network identified by STRING database, there are two crucial genes for the synthesis of thyroid hormones. These genes are thyroglobulin (TG) and iodotyrosine deiodinase (IYD). TG is metabolized through the addition of iodine molecules to produce mono-and di-iodotyrosine and the thyroid hormones triiodothyronine (T3) and thyroxine (T4). Consequently, TG is one of the main storage molecules of iodine in the body [73]. IYD is responsible to conserve iodide, recycling iodine, during synthetization of T3 and T4 hormones, from mono-and diiodotyrosine. Due to the low availability of iodine in the nature, IYD dysfunctions reduce the amount of available iodine for T3 and T4 synthesis [74][75][76]. In target cells, IYD converts T4 to T3, the active thyroid hormone, and converts T3 to di-iodotyrosine, inactivating T3. These processes also provide iodine to peripheral tissues. The thyroid hormones are related to the control of several crucial biological processes involved with the regulation of basal metabolism. In cattle, genes related to thyroid hormone regulation have been identified as DE in studies evaluating feed efficiency, lactation stage, fat deposition and early embryonic development [77][78][79][80][81][82][83]. Additionally, some studies suggest a pleiotropic effect for TG polymorphisms in production traits [84,85]. Additionally, changes in the levels of thyroid hormones during pregnancy, mainly in the initial development of the embryos, are associated with adverse pregnancy outcomes and embryonic losses [86]. Moreover, alterations in thyroid activity may result in male and female infertility [19,87].
An interesting link between the thyroid hormones and the selection for productive traits in cattle is that TG (BTA14:9,253,697-9,263,933) is mapped in the same core selective sweep (CSS) region of DGAT1 (BTA14: 1,795,425-1,804,838) [88]. The QTL related to DGAT1 is considered to have a major effect on production traits and has been associated with several phenotypes in both dairy and beef cattle breeds [89][90][91]. Due to this major effect, molecular markers associated with the DGAT1 effect are intensively exploited in genetic improvement programs. Generally, the use of molecular markers in selection programs does not consider the relationship among the aimed marker and the surrounding mapped markers. However, the intensive selection may increase the extent and the overall LD in a region [92]. This phenomenon may result in an indirect selection (hitchhiking effect) of markers mapped in different genes and with unpredictable effects. It is important to note that, in the same CSS, the highest signal for pleiotropic effect was observed by Bolormaa et al. (2014) [10], as shown in Fig 6. In the same position, three additional genes shared among the three independent populations mapped to regions with 5 or more previously reported QTLs and in the main network identified by STRING database, i.e. MYC, TGS1 and CHD7.
The thyroid cancer pathway (KEEG ID: 05216) was one of the enriched pathways identified by STRING database (Table 1). In this pathway, thyroid hormones and precursors are not involved, but both MYC and PPARG are implicated, reinforcing the association between this main-network and the regulation of thyroid activity. In addition to the presence of TG and IYD in the main network identified by STRING database (Fig 3), these results suggest that this network is enriched by genes related to thyroid function. The VarElect software was used to confirm this hypothesis through an association processes between the genes that composed the main network and the keywords "thyroid" and "thyroid hormones". Through data-mining using information available on GeneCards and MalaCards, it was possible to identify that from the 38 genes (S4 File) present in the main network, 31 are directly related to those keywords. Therefore, indicating that this gene network was enriched by genes related to thyroid function.
As previously described, thyroid function is related to the regulation of several biological processes associated with economically important traits. Additionally, due to this association with several processes and traits, the genes involved in thyroid activity are excellent functional candidate genes for pleiotropic effects. Further functional analyses will be performed in order to elucidate the relationship between the different traits affected by pleiotropic effects, as well as, to identify candidate variants associated with the function of these candidate genes.
It is important to highlight that during the analyses performed here, some differences were observed in the gene expression profile among populations, even when similar groups were compared (Pre-and post-puberty). A very common phenomenon observed in biological analyses that can help to address this issue is the Simpson's paradox. The Simpson's paradox is observed when results from aggregated data contradict those from separate analyses. There are several reports in the literature discussing the impact of Simpson's paradox in different fields, such as network analysis and gene expression [93,94,95]. The biological bases for the Simpson's paradox in biological analyses are still poorly understood. However, some points can be raised to help in the discussion of this phenomenon. For example, in the data analysed in the present manuscript, as addressed in the previous commentary, the differences in the genetic background, environmental effects and evaluation of the puberty, and number of tissues evaluated between populations can help to explain these differences. It is important to highlight that these populations (Brangus, Brahman and Tropical composition) shared the Brahman genetic component. However, these proportions are different in each population. For example, in the Brangus population the animals have 3/8 of Brahman component and 5/8 of Angus component. All these differences can be taken together in order to discussion the possible causes of the phenomenon observed here. Additionally, it is important to highlight that the genes shown in Table 2 are a specific group of genes, which are the genes with the highest potential to perform pleiotropic effect. Additionally, the consistency of expression among tissues and populations can help to identify these key-regulatory genes. The present study aims exactly on those genes that even with all these possible confounding factors maintain the expression profile, which can be an additional evidence of crucial regulatory role.

Conclusions
The present study described a multi-breed and multi-OMICs approach to identify key-regulatory candidate genes for pleiotropic effects in beef cattle using the results generated by previous studies. Our findings confirm the feasibility of using a systems biology approach to unravel candidate genes regulating complex traits. Genes identified in this study are mainly involved in two biological processes: energy conservation metabolism and cell proliferation, probably the most theoretically plausible processes to unify the phenotypes investigated in this study: exterior appearance, health, reproduction, production, meat and carcass, and milk traits. This study contributes to the understanding of the cause-consequence relationships between variants mapped on candidate pleiotropic genes affecting complex traits. Additionally, the results obtained here will be useful for better defining statistical models to improve the accuracy of genomic prediction of breeding values and avoid the simultaneous selection for unfavorable genetically correlated traits in beef cattle and other livestock species.