Integrated analysis of genome-wide association studies and 3D epigenomic characteristics reveal the BMP2 gene regulating loin muscle depth in Yorkshire pigs

Background The lack of integrated analysis of genome-wide association studies (GWAS) and 3D epigenomics restricts a deep understanding of the genetic mechanisms of meat-related traits. With the application of techniques as ChIP-seq and Hi-C, the annotations of cis-regulatory elements in the pig genome have been established, which offers a new opportunity to elucidate the genetic mechanisms and identify major genetic variants and candidate genes that are significantly associated with important economic traits. Among these traits, loin muscle depth (LMD) is an important one as it impacts the lean meat content. In this study, we integrated cis-regulatory elements and genome-wide association studies (GWAS) to identify candidate genes and genetic variants regulating LMD. Results Five single nucleotide polymorphisms (SNPs) located on porcine chromosome 17 were significantly associated with LMD in Yorkshire pigs. A 10 kb quantitative trait locus (QTL) was identified as a candidate functional genomic region through the integration of linkage disequilibrium and linkage analysis (LDLA) and high-throughput chromosome conformation capture (Hi-C) analysis. The BMP2 gene was identified as a candidate gene for LMD based on the integrated results of GWAS, Hi-C meta-analysis, and cis-regulatory element data. The identified QTL region was further verified through target region sequencing. Furthermore, through using dual-luciferase assays and electrophoretic mobility shift assays (EMSA), two SNPs, including SNP rs321846600, located in the enhancer region, and SNP rs1111440035, located in the promoter region, were identified as candidate SNPs that may be functionally related to the LMD. Conclusions Based on the results of GWAS, Hi-C, and cis-regulatory elements, the BMP2 gene was identified as an important candidate gene regulating variation in LMD. The SNPs rs321846600 and rs1111440035 were identified as candidate SNPs that are functionally related to the LMD of Yorkshire pigs. Our results shed light on the advantages of integrating GWAS with 3D epigenomics in identifying candidate genes for quantitative traits. This study is a pioneering work for the identification of candidate genes and related genetic variants regulating one key production trait (LMD) in pigs by integrating genome-wide association studies and 3D epigenomics.

Response: Thank you for this insightful suggestion. In this study, we identified a QTL region (15.51-16.31 Mb) significantly associated with LMD. Further LDLA analysis showed the most significant SNP site were located in a LD block, which were named as block2 and was considered being an important candidate QTL region associated with LMD. Moreover, 3D epigenomic characteristics were used to dissect the functions of SNPs. Results showed that the LD block2 is located within a sub-domain and the LD block1, LD block3, LD block4 were outside the sub-domain. SNPs within the LD block2 have higher possibility to interact with genomic regions within the sub-domain, rather than other LD blocks outside the sub-domain. Thus we concluded the LD block2 is an important candidate QTL region for LMD. Further experiments including the dual-luciferase expression assays and EMSA were performed to identify the important variant sites associated with LMD. We agree that we cannot totally exclude the possibility that sequence variants outside this region contribute to the LMD QTL effect because we don't have direct proofs that they do not contribute to the variants of LMD, but according to our results, the LD block2 is indeed an important candidate region for LMD. We have revised the wording of the full paper to realistic describe our results (Line 412-427).
Specific comments 1. L302-(Narrowing down the QTL region). I am still not convinced that these data are sufficient to narrow down the QTL region from about 800 kb to about 10 kb. The reason is that the authors have used a sparse SNP panel. There could be other sequence variants within the 800 kb region with equally strong association to phenotype as those within LD block 2. For instance, Figure 3 shows that the SNP INRA0052824 that is located about 800 kb from the top SNP still shows D'=1 (or very close to 1) to the top SNP in the 10 kb region.

Response:
We are very grateful for your thoughtful suggestions. After consideration, we decide to revised "Narrowing down the QTL region" to "Linkage disequilibrium analysis highlights the candidate QTL region harboring BMP2" (Line 288-289).
In this study, we identified a QTL region (15.51-16.31 Mb) significantly associated with LMD and the most significant SNP sites were located in block2. Subsequently, through LDLA analysis, the LD block2 region was considered being an important candidate QTL region associated with LMD. To further integrate the Hi-C data, our results showed the LD block2 independently located in a sub-domain, which indicate the SNPs within the LD block2 have higher possibility to interact with genomic regions within the sub-domain. We agree that we cannot exclude the possibility that sequence variants outside this region contribute to the LMD QTL effect, but our results also suggest the LD block2 is an important candidate region. With the help of 3D epigenetic data, the candidate region was identified.
2. Figure 4d. This figure illustrates very well that it is impossible to identify a 10 kb interval as the sole genomic region harboring causal mutations for the LMD QTL because the strong association with phenotype is over a broad region and may reflect a haplotype effect. To make this figure coherent I suggest that the authors mark their favorite region 15.65-15.75 in Fig. 4d. As far as I can see it is not at all the most strongly associated region in 4d. The authors should also explain in the legend that this plot tests for genetic association to the LMD phenotype.
Response: Thank you for your suggestions. This graph only shows whether such region is important or not, but we do not prefer to emphasize this region in Fig 4d  because it does not mean that the SNP with highest p-value is the most important one. In the single-SNP association study, it was assumed that all the individuals were independent to each other and no genetic relationship was corrected because we cannot correct such relationships with the limited regional re-sequenced data, and thereby the results are not conclusive. If mark the region 15.65-15.75 in Fig. 4d, it may erase the importance of the results in GWAS. Fig. 4d just provides clues about important candidate SNPs in the single-SNP association. We followed your suggestion to add details in the legend (Line 740). Fig. 5d when the result first is highly significant but then the authors make the specific T to G change in the T construct and then there is no significant difference. The authors conclusion on line 364-365 "that SNP 15684170 is not an important site for enhancer activity" is wrong. This is still possible but the important conclusion is that the highly significant difference between the T and G construct must be caused by another SNP in this interval, perhaps that is a candidate causal sequence variant? The authors end this section by concluding that SNP rs321846600 is the best candidate SNP. However, it is not clear if the 800 bp fragment also carries other SNPs that may be important. The authors need to sequence the four 800 bp fragments and declare which sequence differences each region contains in a Supplementary Table. If the fragment with SNP rs321846600 contains multiple sequence variants they need to mutate the candidate SNP to prove that this is underlying the difference in luciferase activity between constructs. Response: Among the 5 candidate SNPs, four of them (rs328487632, rs319025934, rs321766789 and 15684170, Fig. 5a, 5b and 5d) were not considered as candidate variants as following: 400-bp genomic region flanking the significantly associated SNPs with different allelic combinations was cloned into pGL3-promoter luciferase reporter vectors respectively. We compared the luciferase activities between different alleles.

is apparently that these 800 bp fragments may contain other SNPs in addition to the candidate SNP. This becomes apparent in
For SNPs rs328487632&rs319025934, and SNP rs321766789 (Fig. 5a and 5b), the enhancer activities were not significantly changed between different alleles, No matter how many other SNPs the 800 bp genomic regions contained, these SNPs (rs328487632&rs319025934, and SNP rs321766789) do not play a decisive role.
For the SNP 15684170 (Fig. 5d), we found the enhancer activities are significantly different between 15684170T and 15684170G. However, due to the fact that the 800-bp contains other SNPs, we need to verify if the SNP 15684170 plays the decisive role. Thus, we mutated the T allele to G allele at this position. Results showed that SNP 15684170 is not underlying the difference in luciferase activity between constructs because enhancer activity does not change significantly between the two alleles.
For the SNP rs321846600 (Fig 5c), in the 800-bp genomic region, it only contains this one SNP. Our dual-luciferase expression assays results showed that the C allele of rs321846600 showed significantly higher enhancer activity than the T allele, which implied that the SNP rs321846600 in the enhancer region of BMP2 may play a crucial role in the regulation of BMP2 expression. Furthermore, the EMSA results showed that, for rs321846600, the T>C mutation decreased the DNA-protein binding affinity (Figure 5e). Thus, we considered SNP rs321846600 is the one of candidate SNPs for our study.
This process can be found in Line 332-351. 4. L432-445. I still find the authors arguments here problematic. They are mixing up genetic significance and functional significance. Epigenetic data cannot be used to narrow down a QTL region. It can be used to test the functional significance of SNPs within a QTL region. In this paper the authors have decided to focus on a 10 kb region but they have not excluded that sequence variants outside this 10 kb region but within the 800 kb region contribute to the QTL effect or even is more important than the possible effect of the variants within the 10 kb region. This paragraph needs to be modified to not be misleading to the field. In particular the sentence (The significant SNPs ..) on line 440-441 needs to be followed by something like this: "However, this does not exclude the possibility that sequence variants outside this region contribute to the LMD QTL effect in the genomic region 15.51-16.31 on SSC17."