Putative Regulatory Factors Associated with Intramuscular Fat Content

Intramuscular fat (IMF) content is related to insulin resistance, which is an important prediction factor for disorders, such as cardiovascular disease, obesity and type 2 diabetes in human. At the same time, it is an economically important trait, which influences the sensorial and nutritional value of meat. The deposition of IMF is influenced by many factors such as sex, age, nutrition, and genetics. In this study Nellore steers (Bos taurus indicus subspecies) were used to better understand the molecular mechanisms involved in IMF content. This was accomplished by identifying differentially expressed genes (DEG), biological pathways and putative regulatory factors. Animals included in this study had extreme genomic estimated breeding value (GEBV) for IMF. RNA-seq analysis, gene set enrichment analysis (GSEA) and co-expression network methods, such as partial correlation coefficient with information theory (PCIT), regulatory impact factor (RIF) and phenotypic impact factor (PIF) were utilized to better understand intramuscular adipogenesis. A total of 16,101 genes were analyzed in both groups (high (H) and low (L) GEBV) and 77 DEG (FDR 10%) were identified between the two groups. Pathway Studio software identified 13 significantly over-represented pathways, functional classes and small molecule signaling pathways within the DEG list. PCIT analyses identified genes with a difference in the number of gene-gene correlations between H and L group and detected putative regulatory factors involved in IMF content. Candidate genes identified by PCIT include: ANKRD26, HOXC5 and PPAPDC2. RIF and PIF analyses identified several candidate genes: GLI2 and IGF2 (RIF1), MPC1 and UBL5 (RIF2) and a host of small RNAs, including miR-1281 (PIF). These findings contribute to a better understanding of the molecular mechanisms that underlie fat content and energy balance in muscle and provide important information for the production of healthier beef for human consumption.


Introduction
Intramuscular fat (IMF, also known as marbling) represents the amount of fat accumulated between muscle fibers or within muscle cells, which is the sum of phospholipids (present in cell membranes), and triglycerides (lipid droplets). Understanding the biological and functional mechanisms that regulate IMF content is an interesting issue in meat science and human medicine. Intramuscular fat content is a polygenic trait regulated by many genes involved directly, or indirectly in adipogenesis and fat metabolism [1]. The deposition of IMF is influenced by many factors such as sex, age, breed, nutrition, and genetics [2].
High intramuscular fat content (marbling) has been associated with tenderness, juiciness and consumer satisfaction [3]. At the same time, red meat consumption or more specifically saturated fat consumption has been associated with human diseases, such as cardiovascular disease, obesity and colon cancer [4]. These associations have created a demand for beef with both low fat and high quality by consumers. In addition, consumers in some countries have different preferences of IMF amount. For example, consumers in Asia and North America desire beef with high IMF, while Europeans prefer lean beef, with low IMF [5].
RNA-Seq has been used to provide insight into biological and molecular mechanisms of complex traits and diseases. Recent RNA-Seq studies of pigs with extreme IMF phenotypes have defined differentially expressed genes (DEG) and potential gene networks important in lipid and fatty acid metabolism in the liver [6]. In beef cattle, DEG were identified in subcutaneous fat from Bos taurus crossed steers, which revealed that expression pattern depends on the genetic background [7]. Two studies have investigated DEG due to IMF deposition in Bos taurus, Bos indicus and their crosses [8,9], however the biological and functional mechanisms involved with IMF amount are still unclear. From the published literature, lipid metabolism is considerably different between nonruminants and ruminants, but gene expression studies in ruminant are relatively scarce.
Understanding the biological mechanisms involved with complex traits requires analysis of genetic networks, as well as determination of relationships between genes and networks. A novel algorithm for the partial correlation coefficient with information theory (PCIT) mathematical method was developed to determine the relationships between all genes and networks [10]. Previously, the PCIT algorithm approach identified causal regulatory changes in myostatin gene expression in beef cattle [11] by co-expression network analysis and the regulatory and phenotypic impact factor methods (RIF and PIF). These co-expression methods provide powerful incite into the changes that occur in the network makeup and wiring between different treatment groups. These methods allow subtle changes in networks to be detected even when many genes in a pathway are not differentially expressed.
The objective of this study was to identify differentially expressed genes, pathways and putative regulators associated with IMF variation in Longissimus dorsi muscle of extreme IMF GEBV Nellore steers to better understanding of IMF metabolism.

Ethics statement
All experimental procedures involving steers were approved by the Institutional Animal Care and Use Committee Guidelines from Brazilian Agricultural Research Corporation-EMBRAPA and sanctioned by the president Dr. Rui Machado.

Animals, samples and phenotypes
Three hundred and ten Nellore steers from the Brazilian Agricultural Research Corporation (EMBRAPA/Brazil) experimental breeding herd, raised between 2009 and 2011 were included in this study, as described by Cesar and collaborators [12]. These steers were sired by 34 unrelated sires, and were selected to represent the main genealogies used in Brazil according to the National Summary of Nellore produced by the Brazilian Association of Zebu Breeders (ABCZ) and National Research Center for Beef Cattle. Animals were raised in feedlots under identical nutrition and handling conditions until slaughter at an average age of 25 months. Samples from LD muscle located between the 12th and 13th ribs were collected in two moments: at slaughter for RNA sequencing analysis, and 24 hours after slaughter for the intramuscular fat (IMF) content measurement.
Approximately 100 g samples of beef collected for IMF content analysis were lyophilized and ground to a fine powder. Five g of this ground, lyophilized tissue was used to obtain IMF. The Ankom XT20 lipids equipment was used to determine lipid content according to the procedure of AOCS (Official Procedure Am 5-04) for IMF extraction [13]. Restricted maximum likelihood analysis was performed to estimate variance components, heritability and Genomic Best Linear Unbiased Prediction (GBLUP) using ASREML software [14] on 310 total animals. The SNP markers information was obtained as described by Cesar and collaborators (2014) using BovineHD 770 k BeadChip (Infinium BeadChip, Illumina, San Diego, CA). SAS PROC MIXED was used to test independent sources for significance. Fixed effects included contemporary group classes (animals with the same origin, birth year and slaughter date) and hot carcass weight as a covariate. Animal and residuals were fitted as random effects [12]. The animal model used in this analysis was, y = Xb + Zu + e, where y is the vector of observations, which represented the trait of interest (dependent variable), X and Z are the design or incidence matrices for the vectors of fixed and random effects in b and u, respectively, and e was the vector of random residuals. The expected variance of vector u is Var(a) = I σ 2 m ; where σ 2 m is the variance explained by markers, and I is the identity matrix. The variance of vector u was G σ 2 m for the genomic analyses where G is the genomic relationship matrix derived from SNP markers using allele frequencies as suggested by VanRaden [15], with σ 2 m being the marker-based additive genetic variance. A group of 14 animals were selected based on their extreme genomic estimated breeding values (GEBVs) for IMF (seven high and seven low). To verify the difference in IMF level between the high and low group a Student's t-test was performed using R package. The genomic estimated breeding values (GEBVs) for other muscle characteristics such as ribeye area and backfat thickness were also calculated to ascertain these animal were not extreme for another characteristic.
RNA extraction, quality analysis, library preparation and sequencing Total RNA was extracted from 100 mg of frozen LD muscle that was collected at slaughter using the TRIzol reagent (Life Technologies, Carlsbad, CA). RNA integrity was verified by Bioanalyzer 2100 (Agilent, Santa Clara, CA, USA). Only samples with RIN > 8 were used. A total of 2μg of total RNA from each sample was used for library preparation according to the protocol described in the TruSeq RNA Sample Preparation kit v2 guide (Illumina, San Diego, CA). Libraries average size was estimated using the Agilent Bioanalyzer 2100 (Agilent, Santa Clara, CA, USA) and quantified using quantitative PCR with the KAPA Library Quantification kit (KAPA Biosystems, Foster City, CA, USA). Quantified, samples were diluted and pooled (three pools of six samples each). Three lanes of a sequencing flowcell, using the TruSeq PE Cluster kit v3-cBot-HS kit (Illumina, San Diego, CA, USA), were clusterized and sequenced using HiScanSQ equipment (Illumina, San Diego, CA, USA) with a TruSeq SBS Kit v3-HS (200 cycles), according to manufacturer instructions. The sequencing analyses were performed at the Genomics Center at ESALQ, Piracicaba, São Paulo, Brazil.

Quality control and read alignment
Sequencing adaptors and low-complexity reads were removed in an initial data-filtering step. Quality control and reads statistics were estimated with FASTQC version 0. 10 [17], to identify novel transcripts. A combination of novel transcripts identified by Cufflinks and those from the reference GTF file at Ensembl were used as the reference for read quantification for each transcript. The abundance (read counts) of mRNAs for all annotated genes, was calculated using HTSeq version 0.5.4 software [http://www-huber.embl.de/users/anders/HTSeq/] [18]. Only sequence reads that uniquely mapped to known chromosomes (excluding reads mapped to unassigned contigs) were used in this analysis.

Identification of differential expressed genes and pathway analysis
Differentially expressed genes were identified using the DESeq software according to the protocol proposed by Anders and collaborators [18], which uses read counts that fall into annotated genes and perform statistical analysis based on the table of counts to discover quantitative changes in expression levels between experimental groups. Prior to statistical analysis, the read count data was filtered as follows: i) transcripts with zero counts were removed (unexpressed); ii) transcripts with less than 1 read per sample on average were removed (very lowly expressed); iii) transcripts that were not present in at least three samples were removed (rarely expressed). After filtering, a total of 16,101 transcripts were analyzed for differential expression using the "nbinomTest" function of DESeq to fit transcript expression level as a negative binomial distribution. Exploratory diagnostic plots were generated to check the dispersion estimates. Benjamini-Hochberg [19] methodology was used to control the false discovery rate (FDR) at 10%. Transcript annotations were retrieved with a perl script to query the Ensembl database using the Ensembl Perl Application Program Interface (API). Transcripts that lacked annotation information were annotated using the Genome-to-seq and GOanna for GO annotations based on sequence homology by Basic Local Alignment Search Tool (BLAST) at AgBase [20]. A twotiered approach was taken to detect pathway level changes in gene ontology in the extreme IMF samples RNAseq profiles. First, enrichment analysis of curated gene ontology terms was completed with the Database for Annotation, Visualization and Integrated Discovery (DAVID) v6.7 tool [21] using the list of genes that presented FDR < 10%. Second, a literature based pathway enrichment analysis was performed using Pathway Studio [22] from a list of genes that presented FDR < 20%.

PCIT and differential hubbing network analysis
Expression values were normalized as the number of fragments per kilobase of exon per million reads (FPKM) as reported in Cufflinks output [17] for co-expression analysis. A modified version of the PCIT algorithm [23] was used to identify differential hubbing (DH) for all transcripts [11]. A shell script pipeline was developed to summarize all DH results. The gene list used for PCIT included all transcripts detected in our study, but only those with a direct and partial correlation 0.90 were used for the DH analysis. The DH was computed by the difference of significant connections of a transcript between the low and high IMF group. The top ten most positively and negatively DH transcripts were identified for further investigation.

Regulatory Impact Factor (RIF) network co-expression analysis and phenotypic impact factor (PIF) scores
The regulatory impact factor (RIF) and phenotypic impact factor (PIF) scores were calculated as described [24] to predict which transcripts were potential regulators of gene expression differences between the high and low IMF groups. The RIF calculations presented here were modified from the original method and the complete list of expressed transcripts were tested as potential regulators and only transcripts with a significant partial correlation 0.90 from PCIT were included in the RIF and PIF score estimates.
The PIF values represent a way to rank genes based on the magnitude of the expression of a gene and the difference in the expression of that gene between two treatments. A high PIF value would indicate that a given gene would likely be closely related to changes in phenotype. The RIF1 value allows genes to be ranked as potential regulators of networks based largely on changes in correlation between two treatment levels (i.e. differential wiring). The RIF2 value allows genes to be ranked as potential regulators of networks with more of an emphasis on how expression changes of a potential regulator may predict the abundance of genes DE due to treatment differences [24]. Thus, RIF2 ranks genes as potential biomarkers tracking key differences in gene expression related to treatment differences.

Phenotypic groups, mapping and annotation
A summary of the IMF phenotypic data expressed as a percentage, GEBVs, and the total number of reads mapped against the Bos taurus UMD3.1 reference genome assembly following quality control are shown in Table 1. The genetic variance, residual variance and heritability for IMF obtained from this population were 0.196, 0.490 and 0.29±0.16, respectively. The  GEBV for IMF values for all 310 animals were ranked and seven animals with high GEBV for IMF (H) and seven with low (L) were selected for RNA-Seq analysis. This strategy, to select the animals with extreme GEBV was performed for two main reasons: (1) the correlation between the raw IMF values (percentage of IMF) and GEBV was high (r = 0.76) (S1 Fig) and (2) the GBLUP procedure used genomic information from all relatives and can account for environmental effects [25]. The T-test showed that the IMF averages for two groups were statistically different (p-value = 2.016e-09). The cattle used in this study were not extreme individuals for either subcutaneous fat thickness or longissimus muscle area (S1 Table). These results indicate that the samples selected for IMF were not divergent for other characteristics of muscle measured in this population. A total of 364.18 million (M) 100 bp paired-end reads were obtained from three lanes of an Illumina HiScanSQ. The average number of total reads per sample was 26 M, with an estimated sequencing depth of 40X coverage (i.e. the expected mean coverage at all transcripts). The mean number of total mapped reads estimated by Tophat [16] was 17 M, for an average of 65.38% paired reads mapped. All transcripts were annotated using the Ensembl Perl API or AgBase tools [20], genome2seq and GOanna. A total of 16,101 genes were analyzed after filtering out lowly expressed genes. Correlations among mean gene expression levels between both the H and L groups was high (r = 0.99).

Differential expression analysis and differentially expressed genes
The DESeq R package was used to identify DEG. This statistical package uses a parametric method, which relies on assumptions regarding the distribution of sampled data [18].  Table). The expression level, fold change, p-value and annotation of all 16,101 genes identified are showed in the S3 Table. Functional enrichment and pathways A gene-annotation enrichment analysis was performed using the DAVID software [21] and knowledgebase to capture enriched biological terms from the DEG list (FDR < 10%). This analysis identified (Table 2) two significant KEGG pathways based on Benjamini-Hochberg (BH) methodology [19]: focal adhesion (BH-adj < 0.02) and Extracellular matrix (ECM)-receptor interaction (BH-adj < 0.056). A gene set enrichment analysis (GSEA) was performed to identify over-represented pathways in DEG (FDR < 20%) using Pathway Studio. The GSEA detected 13 pathways (FDR < 10%) that contained 27 DEG (Table 3). These DEG are associated with multiple pathways, which could indicate an inter-relation between the pathways. An over-representation of genes were identified in the following functional classes by Pathway Studio: L-cysteine, Zn 2+ , H 2 O 2 , Mg 2+ , ATP, retinoic acid, NO, ROS, oxidized LDL complex, inflammatory cytokines, and protein tyrosin-kinase. Pathway Studio also identified genes associated with IL-1ß and NF-kB pathways. Fig 1 shows the genes associated with the L-cysteine small molecule, and the genes associated with the retinoic acid and inflammatory cytokine functional classes are shown in S5 and S6 Figs, respectively, where genes colored in red are overexpressed in animals of Low group and those in blue are overexpressed in H group.

PCIT and differential hubbing analysis
The PCIT algorithm was used to identify differential hubbing (DH) between the H and L IMF groups. Differential hubbing (DH) or differential connectivity (DC) is the difference in the number of significant partial correlations a gene has between two different states (i.e. compared between high and low groups), as computed by the PCIT algorithm. In other words, DH is the change in the number of significant connections between two states. The significance of a correlation is determined using an information theory approach in the case of the PCIT method.
The top ten negative and positive DH values are shown in Tables 4 and 5, respectively. All DH results are presented in S4 Table. The modified version of the PCIT algorithm used in this study allowed all transcripts to be tested as putative regulators genes without prior knowledge of their function [10]. Among the top ten positive DH, three have been previously reported as associated with adipogenesis and adipose metabolism: ANKRD26, HOXC5 and PPAPDC2 (Fig 2). Among the top ten negative DH, two were identified by literature as good putative regulatory genes: Zinc finger protein, friend of GATA (FOG) family member 1 (ZFPM1) and zinc finger protein 90 (ZFP90) (Fig 3).

Regulatory impact factor and phenotypic impact factor
The DH approach is not applicable to assess the importance of each DEG on differences in phenotype, so Hudson, Reverter and Dalrymple [11] proposed a new metric approach called "phenotypic impact factor" (PIF). PIF is based on DEG numerical properties, which "weigh" the contribution of the DEG based on the per unit differences in phenotype across groups of phenotypically extreme individuals. These authors also proposed the "regulatory impact factor" (RIF) to have a metric that accounts for changes in the molecular wiring of networks represented by changes in gene-gene correlation as well as changes in gene expression and PIF in response to changes in regulators expression level. Two types of RIF scores were developed. The RIF1 score prioritizes regulators that have a greater impact on the changes in wiring (i.e. correlation) in the network, whereas RIF2 score prioritizes regulators whose changes in expression mostly reflect the changes in expression of DEG. A complete list of all RIF1, RIF2 and PIF  Tables 6 and 7, respectively. The RIF1 analysis identified many novel or pseudo genes as potential regulators that lead to differences between the low and high IMF groups. Two particularly interesting regulators included GLI2 and IGF2. The RIF2 score identified additional candidate regulators, including two strong candidates: MPC1 and UBL5. The PIF analysis identified many small RNAs as having a large impact on the IMF phenotype. In fact, 9 of the top 10 most highly ranked PIF genes are small RNAs.
To further investigate the functional associations of the 3,000 genes with higher scores of PIF, the functional term enrichment at DAVID database (http://www.david.abcc.ncifcrf.gov) was performed (S8 Table). Several significant clusters were enriched (BH-adj < 0.10), including GO terms of biological processes related to translation, generation of precursor metabolites and energy, ATP synthesis coupled proton transport, tricarboxylic acid cycle, acetyl-CoA catabolic process, acetyl-CoA metabolic process, posttranscriptional regulation of gene expression, protein catabolic process, mRNA metabolic process, RNA processing, RNA splicing, intracellular protein transport, intracellular transport and protein folding.

Discussion
Intramuscular fat (IMF) quantity is an economically important trait, which influences the sensorial and nutritional value of meat. In addition, it is related to insulin resistance, which is an important predictive factor for disorders, such as cardiovascular disease, obesity and type 2 diabetes in human. In beef cattle, adipose tissue development is of significant interest because the deposition and composition of IMF are involved with organoleptic characteristics, consumer preference, public health, and producer profitability. Fat deposition is a consequence of the balance between energy intake and energy expenditure [26], which involves complex biological  processes. Adipogenesis depends on a cascade of transcriptional factors activation [27] and events that are still unclear in species such as ruminants. As reviewed by Cristancho and Lazar [28] the adipogenesis process can be divided into three phases: commitment, transition and terminal differentiation. The commitment phase involves the conversion of mesenchymal stem cells (MSCs) to committed white preadipocytes, when occurs a dramatic alteration in cell shape mediated by extracellular matrix (ECM) factors. The committed white preadipocyte then goes through an epigenomic transition phase, when adipogenic stimuli (such as insulin and glucocorticoids) stimulates transcription factors that will induce peroxisome proliferatoractivated receptor gamma (PPARg). In the terminal differentiation phase, committed white adipocyte becomes mature white adipocyte by induction of metabolic genes involved with triacylglycerol synthesis and degradation [28]. The present transcriptome study performed two strategies to identify differentially expressed genes (DEG), biological pathways and putative regulatory factors. The first strategy identified DEG between two groups (H and L) and biological pathways [18]. The second strategy identified putative regulatory factors and pathways [11]. In the first strategy, 77 significantly DEG between the groups were identified (S1 Table). It should be noted that in this study, the myofibers were not separated from intramuscular fat. Therefore, it is not possible to determine if the differential expression observed is the direct result of differences in the content of intramuscular fat or if there is an actual change in gene expression in either just the myofibers or intramuscular fat. Some of the genes and pathways herein identified agree with previous gene expression studies in preadipocyte differentiation process [29] and in IMF in cattle [30,31].
The animals of low GEBV IMF group presented higher expression levels of genes associated with the first phase of adipocyte differentiation (MSC to committed white adipocytes). Some of these genes (ROCK2, SPARC, ELN, RGS16, TGM2, and FLNA) are involved in actinomyosin cytoskeleton remodeling, controlling the expression of adipogenic WNT genes [28]. Secreted protein acidic and rich in cysteine (SPARC) was the first matricellular protein to be linked to the accumulation of white adipose tissue [32]. SPARC inhibits differentiation of preadipocytes into adipocytes, by favoring osteoblastocyte differentiation. In this study the expression level of SPARC was higher in the lower GEBV IMF cattle, similar to that reported by Nie and Sage [33] in mice.
Animals with high GEBV IMF showed higher expression level of HIV-1 tat interactive protein 2 (HTATIP2) and signal transducer and activator of transcription 5A (STAT5A), which participate in molecular processes during the epigenomic transition phase, when insulin and glucocorticoids can lead to changes in the chromatin conformation, inducing the DNase I hypersensitivity "hotspots" [34]. In these "hotspots" Mandrup and collaborators [35] identified transcription factor motifs and binding of CCAAT-enhancer-binding proteins (C/EBP), glucocorticoid receptor (GR) and retinoid X receptor (RXR) and signal transducer and activator of transcription. STAT5 is activated by kinases associated with transmembrane receptors and play role in cell growth and division, apoptosis and inflammation [36]. STAT5 mediates energy homeostasis in response to endogenous cytokines and herein showed greater expression level in the group with high GEBV for IMF. HTATIP2, also more expressed in high GEBV for IMF, acts as a redox sensor, which is linked to regulation of nuclear import and is important in the transition phase [37].   Previously, Wang and collaborators [31] used spotted microarray to identify DEG due to different intramuscular fat content in Wagyu x Hereford and Piedmontese x Hereford crossbreds (Bos taurus). Similar to this study (Table 2), they identified DEG with the over-represented GO terms: muscle development and extracellular structure. Among the DEG identified by Wang and collaborators were SPARC, RGS and STAT family genes, which also were identified in this study. Lee and collaborators [30] evaluated gene expression due to differing IMF levels in Bos taurus coreanae animals and identified GO terms associated with biological adhesion, ECM, metabolic and development processes, which corroborate the findings of this study ( Table 2). Lee and collaborators [30] identified some protein components of ECM (ITGA1, ITGB1, COL2A1, COL11A1, and COL11A2) significantly up-regulated in IMF, as identified herein.
In this study, PPARg and fatty acid synthase (FAS), major genes associated with terminal differentiation and fat deposition, presented low expression level in both H and L groups (S2 Table) and the difference of expression level was not significant between these groups. This could be explained by the fact that the animals used in the current study were in early stage of fat deposition (between 1.6 and 4.6% of IMF), while studies that identified terminal differentiation genes employed animal with 7.08% of IMF [38]. These findings corroborate with previous studies, which demonstrated that some Bos taurus breeds are genetically predisposed to deposit intramuscular fat earlier than Bos indicus breeds [32,39,40]. Lee and collaborators [31] evaluated differences in gene expression between three different adipose tissues (omental, subcutaneous and intramuscular). FASN, FABP4, LPL, THRSP, DGAT1 and PPARg, all involved in adipogenesis and lipid metabolism, were significantly down-regulated in the intramuscular fat tissue as compared to other tissues. Based on these results, the authors suggested that those genes showed lower metabolic activities in intramuscular tissue of animals over 30 months old. Previous study [41] also reported that the PPARg expression level was different in the Longissimus muscle from Holstein and Charolais bulls slaughtered at 18 mo of age.
Similar to Huff and collaborators [42], De Jager and collaborators [6], presented the correlation between the PPARg gene set (TF, ACSS2, ACLY, PPARg, CEBPA and CYB5A) and IMF in Wagyu x Hereford and Piedmontese x Hereford (Bos taurus crosses) crosses, and Brahman (Bos indicus). These authors reported that the correlation between IMF percentage and PPARg gene set was weak in animal with lower IMF percentage (1.9% in average), i.e. in animals of Brahman breed. Enrichment analyses detected 13 over-represented functional classes and pathways related to: 1) Small molecule signaling (L-cysteine, Zn 2+ , H 2 O 2 , Mg 2+ , ATP, retinoic acid, NO, IL-1ß, NF-kB and ROS); 2) Complex molecules (oxidized LDL); 3) Functional classes (inflammatory cytokines and protein tyrosin-kynase) as central regulators, which are involved in molecular mechanisms of adipogenesis ( Table 3).
The L-cysteine pathway (Fig 1) regulates the expression of ELN and MT3 that are involved in ECM process during the commitment phase of adipogenesis and have higher expression in animals with low GEBV for IMF. L-cysteine is involved in acylation, a covalent modification of intracellular polypeptides by the addition of C16 palmitic acid by a thioester linkage to cysteine residues of proteins [42,43].
Elastin (ELN) protein is a component of the extracellular matrix (ECM) and with others EMC components (collagen and thrombospondin) are required for the expansion of fat mass process in obese animals [44]. Metallothionein-3 (MT3) is a zinc-binding protein and was observed that in null MT3 male mice there is an increase in weight, resulting in obesity [45]. This obesity was caused by reduced energy expenditure and not from increased feed intake. This result suggests that MT3 expression is negatively associated with fat variation, and agrees with our observation that animals in the low GEBV for IMF group show high expression level of MT3. On the other hand, phosphoglucomutase 1 (PGM1), which is involved with glycogen storage and is associated with the transition phase of adipogenesis, had higher expression in the animals with high GEBV for IMF. PGM1 is associated with diseases such as hypertension, obesity and cardiovascular disease. It is overexpressed in skeletal muscle from insulin resistant humans [46]. The higher expression of PGM1 in animals with in high IMF GEBV corroborates the findings reported by Nguyen and collaborators [46] in human skeletal muscle from obese individuals.
The retinoic acid and inflammatory cytokine pathways contain several genes that are in common, such as STAT5A, ELN, transglutaminase 2 (TGM2) and solute carrier family 2 (facilitated glucose transporter) member 4 (SLC2A4, also called GLUT4). These genes are involved in different phases of adipogenesis. Glucose transporter 4 (GLUT4) is a major regulator of glucose uptake in adipocytes (insulin regulation), which participates the adipogenic stimulation process. In mice with selective reduction of GLUT4 there is a reduction in insulin-stimulated glucose uptake in adipocytes and consequently insulin resistance [47]. In this study, GLUT4 had higher expression level in animal with low GEBV for IMF. The pathways identified in this study agree with previously published studies that showed the influence of retinoic acid and inflammatory cytokine systems on transcription process of peroxisome proliferator-activated receptor (PPAR) family, which regulate the PPARs expression level in human, mouse and ruminant [48].
The second strategy (PCIT, RIF1, RIF2 and PIF) was performed to better understand the correlations between expressed genes and to identify putative regulators of adipogenesis. These methods revealed regulators known to be involved in adipose development, obesity and retinoic acid signaling.
Differential hubbing (DH) or differential connectivity (DC) is the difference in the number of connections, measured as partial correlations a gene has in two different states. Previous studies discovered that a gene list of extreme values of DH showed higher percentage of transcriptional regulator factors than a list obtained by differential expression (DE) analysis [11]. In this study, the DH analysis using PCIT identified a host of interesting candidate regulators of IMF variation. Biologically interesting regulators include: ankyrin repeat domain 26 (ANKRD26) and phosphatidic acid phosphatase type 2 domain containing 2 (PPAPDC2), homeobox genes, such as HOXC5, zinc finger protein, friend of GATA (FOG) family member 1 (ZFPM1), and zinc finger protein 90 (ZFP90) as putative causal regulatory genes.
Ankyrin repeat domain 26 (ANKRD26) is expressed in the hypothalamus, brain, liver, adipose tissue and skeletal muscle and is located within the cell membrane. In humans and in mice, ANKRD26 is responsible for white adipose tissue insulin response and appetite control [49]. Interestingly, one of the pathways enriched in this study, retinoic acid, has been demonstrated to be associated with adipose metabolism and Sahab and collaborators [50] showed that down-regulation of RAR could impact the expression of ANKRD26.
The homeobox family of genes is a conserved family of transcription factors, which play important roles in morphogenesis, metabolism [51] and differentiation of adipocytes [52].
Phosphatidic acid phosphatase type 2 domain containing 2 (PPAPDC2) is located within the endoplasmic reticulum and nuclear envelope in mammalian cells. Phosphatidic acid is a new class of lipid mediators, which are involved in in cell growth, proliferation and reproduction pathways such as a regulatory factor [53]. In Saccharomyces cerevisiae, phosphatidic acid was reported as an essential metabolic intermediate and a signaling lipid [54].
Zinc-finger protein, friend of GATA (FOG) family member 1 (ZFPM1) and zinc finger protein 90 (ZFP90) are important putative regulator genes in lipid metabolism. GATA family proteins are zinc-finger transcription factors, which recognize GATA motifs in DNA. GATA plays an important role in transcriptional regulation of many genes [55]. ZFP90 is associated with RNA polymerase II core, which is involved in negative regulation of transcription (GO:0001078). Previous study also showed the zinc-finger protein as transcriptional regulators comparing the expression profiles between adipogenic and non-adipogenic fibroblasts [28].
The RIF1 analysis identified as putative regulators IGF2 and GLI2, and RIF2 identified MPC1 and UBL5. The identification of IGF2 as a potential regulator of the degree of IMF has been suggested previously in pigs [56] and in mice [57], as it is involved in adipocyte proliferation [58]. The GLI2 gene, although not documented to play a role in adipose development, is a regulator of cellular proliferation and is known to be regulated by retinoic acid signaling [59], which is consistent with the signaling mechanisms identified by Pathway Studio. The MPC1 gene is a part of a mitochondrial pyruvate carrier complex and has been implicated in the mitochondrial response to insulin and PPARg signaling [60]. The UBL5 gene has been suggested to impact adipose deposition in both the pig and human. The UBL5 is a candidate gene for meat quality and IMF in the pig [61] and also associated with body fat and fat accumulation related to metabolic dysfunction and diabetes in humans [62,63]. Previous study also reported that PPARg proteasomal degradation is ubiquitin-dependent and the stable overexpression of ubiquitin gene reduces PPARg protein levels and suppress adipocyte differentiation in human cell [64].
One of the top PIF genes was microRNA-1281, previously identified as an adipose expressed miRNA that may regulate the lipid metabolism gene EP300 [65]. The functional enrichment by DAVID database showed that the PIF approach was well suited to identify putative regulators involved with the phenotype studied (intramuscular fat variation).
Associations between GEBVs and RNA abundance are based on the assumption that changes in RNA levels are related to genetic differences in IMF potential in animals with extreme GEBVs for IMF since environmental variation known to impact IMF is account for within the genetic prediction model. However, it is possible that environmental factors could still impact which RNAs are identified as differentially expressed.
In this study, changes in gene expression in this study may indicate changes in IMF, muscle or other cell types within muscle that impact the content of IMF. A portion of the DE genes identified in this study may be related to differences in the lipid content within intramuscular adipocytes or extracellular matrix (ECM) deposition. Changes in genes related to ECM proteins deposition in this study are consistent with previous studies that indicate that increased ECM proteins deposition is associated with increased lipid content within intramuscular adipocytes [66,67]. This may provide an explanation for DE genes related to ECM growth.
The genes identified by association and network analysis in this study could be related to any of the physiological processes involved in creating variation in IMF. It is difficult to determine if IMF deposition was altered versus other processes like lipid filling in intramuscular adipocytes or other changes in muscle physiology since IMF deposition rates were not measured over time in this study. The lack of FA and TAG synthesis pathway genes may indicate that the DE genes in this study may be related to the lipid content or filling of intramuscular adipocytes which results in the observed variation in IMF levels. Since this study provides novel information about transcriptional differences in the Nellore Bos indicus breed, it is possible that novel genes may be detected that impact the variation in IMF levels. This may represent genetic or physiological differences in Nellore compared to other breeds. Further studies are needed to discern the difference between these two possible explanations.

Conclusions
The present study showed the complexity of Longissimus dorsi muscle transcriptome and the molecular mechanisms involved in lipid metabolism related to differences in IMF in extreme IMF GEBV Nellore cattle. Differential gene expression and pathway enrichment analysis identified a number of genes and pathways related to adipogenesis and lipid metabolism. These changes in gene expression and associated pathways indicate that animals in the high GEBV for IMF mature earlier in respect to IMF content. At the same age, animals with low GEBV for IMF had a higher expression of genes related to the commitment phase of adipogenesis, whereas animal with high GEBV for IMF had higher expression of genes related to the transition phase. Furthermore, there appears to be differences between Bos indicus and Bos taurus in regards of IMF deposition. This study indicated that retinoic acid signaling, IGF2 and ANKRD26 are important regulators of molecular mechanisms related to IMF content and adipogenesis process. These findings contribute to a better understanding of the molecular mechanisms underlying fat deposition and energy balance in muscle of ruminant, and may provide important information to other species, such as human and mouse.  Table. Regulatory impact factor 1 (RIF) scores between the high and low IMF groups in Nellore steers. (XLSX) S6 Table. Regulatory impact factor 2 (RIF) scores between the high and low IMF groups in Nellore steers. (XLSX) S7 Table. Phenotypic impact factor (PIF) scores between high and low IMF groups in Nellore steers. (XLSX) S8 Table. Functional term enrichment (FDR < 0.01) of 3,000 genes with higher phenotypic impact factor (PIF) scores comparing high and low IMF GEBV in Nellore steers. (DOCX)