A Validated Genome Wide Association Study to Breed Cattle Adapted to an Environment Altered by Climate Change

Continued production of food in areas predicted to be most affected by climate change, such as dairy farming regions of Australia, will be a major challenge in coming decades. Along with rising temperatures and water shortages, scarcity of inputs such as high energy feeds is predicted. With the motivation of selecting cattle adapted to these changing environments, we conducted a genome wide association study to detect DNA markers (single nucleotide polymorphisms) associated with the sensitivity of milk production to environmental conditions. To do this we combined historical milk production and weather records with dense marker genotypes on dairy sires with many daughters milking across a wide range of production environments in Australia. Markers associated with sensitivity of milk production to feeding level and sensitivity of milk production to temperature humidity index on chromosome nine and twenty nine respectively were validated in two independent populations, one a different breed of cattle. As the extent of linkage disequilibrium across cattle breeds is limited, the underlying causative mutations have been mapped to a small genomic interval containing two promising candidate genes. The validated marker panels we have reported here will aid selection for high milk production under anticipated climate change scenarios, for example selection of sires whose daughters will be most productive at low levels of feeding.


Introduction
Likely effects of climate change are rising temperatures in some food production areas, water shortages and rising grain prices due to increased demand for human food and biofuel feedstuffs [1][2][3][4][5][6][7]. As a result, future dairy farming systems may become increasingly reliant on pasture instead of grain to feed cows. In this scenario, the selection of dairy cows that can produce at high levels with lower levels of feeding is important. As cattle reproduce slowly, we need to develop methods to select suitable cattle before the change in production systems occurs. Fortunately the range of production environments in which dairying is already carried out in Australia is wide, from fully pasture based systems to fully feedlot based systems, and from tropical climate to temperate climate [8]. This gives us a chance to discover loci or genetic markers that can be used to select cattle that are suitable for future farming systems.
Genetic variation in the sensitivity of milk production of dairy cows to environment has been reported. For example, as heat stress is increased, dairy sires change ranking in their estimated breeding values (EBVs) for milk yield [8][9]. Some re-ranking of dairy sires based on the level of feeding of their daughters has also been reported [8,10]. This indicates that it is possible to select cows that are less sensitive to heat stress and low feeding level than average cows. Although traditional selection methods could be used to achieve this change in environmental sensitivity, gains could be accelerated if the loci responsible for the genotype by environment interaction, or DNA markers in linkage disequilibrium with these loci, could be identified and then used in marker assisted selection. In an attempt to find such genetic markers, we combine milk production recording information and historical climatic data from a wide range of environments across Australia, with genome wide dense single nucleotide polymorphism (SNP) data on dairy sires. This enabled a genome wide association study for sensitivity of milk production to environmental parameters. An across breed validation strategy was used to refine the genomic interval containing the causative mutation underlying these associations.

Methods
Three data sets were used. The discovery data set consisted of first lactation test day milk yield records of 62343 Holstein Friesian cows sired by 798 sires, milking across the range of environments of dairying in Australia, from the Australian Dairy Herd   Within each data set, the average daily milk production for each herd (herd test day milk yield or HTDMY) at the time the cows were milked was used as a surrogate for the level of feeding, as actual feeing information is unavailable on this scale, and average daily milk production in a herd has a close relationship with actual level of feeding [11]. Temperature and humidity data for each date of test were extracted from a dataset provide by the Queensland Department of Environment and Resource Management DataDrill [12] project. These records are derived from interpolation of meteorological station data onto a 5-65-km grid across Australia. The data are interpolated onto a twodimensional spline providing the ''best estimate'' of daily weather variables on a 5-65-km grid. The dairy farms in the study were located near a number of meteorological stations recording daily weather measurements, Figure 1. These data were used to calculate the temperature humidity index (THI) on the day of milking for use as a measure of heat stress [8]. In Bos taurus cattle, stress as measured by respiration rate increases rapidly when the maximum daily THI is above 74 units [13,14]. Earlier investigations showed heat stress only affected milk production . Responses in milk production to environmental variables for different sires A. Predicted response in daily milk production of daughters to temperature humidity index (THI) for the two most extreme sires from the data set. In a climate change scenario where the THI increases significantly, sire 2 should be selected for breeding as the milk yield of his daughters is relatively insensitive to THI. B. Predicted response in daily milk production of daughters to herd average daily milk production (HTDMY), a surrogate for the level of feeding, for two sires from the data set. With low levels of feeding, eg. low inputs of grain, sire 2 could be considered as his daughters produce more milk than the daughters of sire 2 at very low levels of feeding. doi:10.1371/journal.pone.0006676.g003 above 60 THI units [8]. To accommodate this when THI was the environmental descriptor, all values of THI below 60 were given the value of 60.
The next step was to derive the sensitivity of the milk yield of the sires of the cows to changes in either THI or HTDMY, again within each data set. We did this by regressing the sire's daughters daily milk yield on the environmental variable for the same day (THI or HTDMY) using a random regression model The intercept of the regression is the relative average milk production of the sires's daughters at the mean level of the environmental variable. The slope can be interpreted as the sensitivity of the milk yield of a bull's daughters to changes in the environmental variable. These traits are designated HTDMY int , HTDMY slope , THI int and THI slope . The model used to estimate daughter yield deviations for the four traits was where y ijkl is yield of milk from the i th herd test day, j th year season of calving, k th sire and l th cow in her first lactation, m is the overall mean, HTD i is the effect of the i th herd test day; YS j is the effect of the j th year season of calving, x n is the nth-order orthogonal polynomial corresponding to age on day of test, A n is a fixed regression coefficient of milk on age at test, Z n is the nth-order polynomial corresponding to days in milk (DIM) at test, D n is a fixed regression coefficient of milk yield on DIM, P ln is a random regression coefficient on the environmental descriptor for the l th cow, S kn is a random regression coefficient on the environmental descriptor for the k th sire, W n is either the intercept (n = 0) or slope (n = 1) solution for HTDMY or THI, and e ijkl is the vector of residual effects.
In the next step, the 798 Holstein Friesian dairy sires of the cows in the discovery data, the 453 Holstein Friesian dairy sires in first validation data set, and the 364 Jersey sires were genotyped with the Illumina BovineSNP50 beadchip containing more than 56,000 SNP assays [15]. Samples were screened for the proportion of missing genotypes, and animals with greater than 10% missing genotypes were removed. The SNPs were included only if they met the following criteria; call rate .90%, minimum allele frequency .5%, and did not have extreme Hardy Weinberg (HWE) x2 values (.600). The rational here was to remove SNPs where genotype calls were poorly clustered. In our complex pedigree population, actual HWE values can be quite misleading, so we prefer not to remove SNPs with a lower cut off. The many other quality control steps are likely more effective at removing problematic SNPs than HWE scores. Parentage checking was performed, and any genotypes incompatible with pedigree were removed. There were 781, 400 and 362 samples in the discovery data set, first validation data set and second validation data set respectively with greater than 90% of SNPs genotyped, these were used for further analysis. There were 39048 SNPs that satisfied all selection criteria.
The SNPs were ordered by chromosome position using Bovine Genome Build 4.0 (http://www.ncbi.nlm.nih.gov/projects/genome/guide/cow/). The genotypes were then submitted to fastPHASE [16] chromosome by chromosome. The missing genotypes were taken as those filled in by fastPHASE. Accuracy of filling in missing genotypes was assessed by removing known genotypes at every 50 th position for 10% of animals on chromosome 26. Imputed genotypes were then compared to the known genotypes. There were 3571 missing genotypes filled in by the fastPHASE program 3525 of which were correct, giving an accuracy of 98.7%. For comparison, an approach which filled in missing genotypes by sampling from a uniform distribution with mean allele frequency gave an accuracy of only 51.1%. Average marker spacing was 66.5 kb. Average LD between adjacent markers, measured by r 2 , was 0.271.
A linear model was fitted to the sires' daughter yield deviations for HTDMY int , HTDMY slope , THI int and THI slope to determine if the SNPs accounted for any of the between sire variation in these traits. The top-bottom called genotypes were re-coded as 0 for the  The number of SNPs tested in the validation data sets was HTDMY int 152, HTDMY slope 76, THI int 92, THI slope 42, as in Table 2 homozygote of the first alphabetical allele, 1 for the heterozygote, and 2 for the homozygote of the second alphabetical allele. The SNPs were fitted to the sire solutions for intercept and slope for either HTDMYor THI: S kmn~mm zSe k zx km b m ze kmn where S kmn is the estimated effect for intercept (m = 0) or slope (m = 1) (analysed separately) for the k th sire with SNP genotype x km (either 0,1 or 2), b m is the effect of the SNP for intercept (m = 0) or slope (m = 1), and Se k is the random residual effect of sire k and other parameters are as defined above. The variance of the sire effects was As 2 S where A is the relationship matrix among the sires and s 2 S is the sire variance. Fitting the relationship between the sires should remove any spurious associations due to population structure. The relationship was derived from the herdbook pedigree of the sires which dates back to 1940.
All data analyses were performed using mixed linear models with variance components estimated by residual maximum likelihood [17].

Results and Discussion
The milk production records were sourced from farms across a wide range of environments, Figure 1, with a resulting large range of THI and HTDMY across milk recording days, Figure 2. The results indicated considerable sire by THI and HTDMY interaction, Figure 3. The distributions of both THI and HTDMY in the data set indicated a large range for these environmental descriptors (Figure 2). In a larger data set the genetic correlation between milk production at the 5 th and 95 th percentile of THI and HTDMY was 0.93 and 0.84 respectively [8].
Using P,0.001 as a significance threshold, a number of significant associations were detected for all four traits (Table 1). False discovery rates (the ratio of expected significant SNPs given the significance level to the actual number of SNPs) were moderate for HTDMY slope , and high for THI slope . These results are consistent with our previous finding that there is less genetic variation in sensitivity to heat stress than to feeding level [8].
We then attempted to validate the significant results in the two independent sets of data. The significant SNPs for HTDMY slope and THI slope from the discovery data set were tested in this validation data set. The significant SNPs from the intercept traits are described and discussed in more detail in Pryce et al. [18]. Despite the moderate to high false discovery rate among the SNPs in the discovery data set, more markers were significant (P,0.05) than expected by chance in the validation data set, at least for Table 3. SNPs for HTDMY slope and THI slope validated in the Jersey data set and their effects in the discovery and validation data set. HTDMY slope , Table 2 and 3. For the majority of the validated markers, the direction of the SNP effects was the same in the discovery and Holstein validation data sets, although the magnitude of the estimated effects was generally reduced in the validation data set. For THI slope , the number of validated SNPs was lower, however one SNP on BTA29 was validated in both breeds, Table 3. This report illustrates the power of experiments in dairy cattle. By utilising the large databases of milk production records that are maintained by the dairy industry we can estimate the genetic merit of each bull with high accuracy and consequently use a rather small number of bulls with SNP data to detect loci for such a complex trait as sensitivity of milk production to feeding level and temperature humidity index. In addition, the recent small effective population size in Holstein Friesian cattle has led to useful LD extending for considerably larger distances than in humans for example, which increase the prospects of finding associations with the marker density used in this experiment [19][20][21]. However, this leads to associations across large genomic intervals, Figure 4. These genomic intervals can be refined by using an across breed validation strategy. The extent of LD between breeds such as Holstein and Jersey is such that markers will only be validated across breeds if they are very close to the causative mutation [22], so the validation strategy we have used should map the causative mutation to a small genomic interval (eg. Figure 4). An across breed mapping strategy has been successfully used to map traits including coat pattern in dogs, a species with a similar pattern of linkage disequilibrium to cattle [23].
We investigated the list of genes and their reported functions in the region of the SNPs with validated effects in both breeds. One such SNP associated with sensitivity of milk production to THI was located on BTA29 position 48329079 bp. Of the genes in the region, the strongest candidate for harbouring a mutation affecting the trait is fibroblast growth factor 4 (48851846 bp to 48852868 bp). This gene is a regulator of mammary epithelial cells apoptosis during both morphogenesis and involution of the mammary gland. In transgenic mice over-expressing human FGF4, the most striking effect caused by FGF4 over-expression was on the remodelling of mammary tissue at the end of lactation [24]. In human testis, FGF4 expression is increased in response to increasing temperature, with a putative role in protecting germ cells [25]. FGF4 was expressed in bovine mammary gland at 90 days of lactation at a moderate level (Bovine Gene Atlas http:// www.agbase.msstate.edu/bovineatlas/). While a search of dbSNP did not reveal any SNPs within exons of this gene, there is a SNP in the promoter region of the gene which warrants further testing (at 48856593 bp, ARS-BFGL-NGS-65571).
The most promising candidate for harbouring a mutation affecting sensitivity to feeding level (HTDMY slope ) is located between the two most significant SNPs from the Jersey validation, on chromosome 9 (Table 3, Figure 4). This gene (NCBI XM_865508), between 33155321 bp and 33156376 bp, is similar to the glycerol-3-phosphate dehydrogenase-1-like gene. The translated bovine gene at this location is 88% identical to the peptide sequence of the human and bovine glycerol-3-phosphate dehydrogenase-1-like predicted proteins and 72% identical to that for bovine glycerol-3-phosphate dehydrogenase-1. Because of it's similarity to glycerol-3-phosphate dehydrogenase (G3PD), the candidate gene retains the G3PD sequence motif and might be expected to exhibit similar enzyme activity. G3PD is at the nexus of pathways for carbohydrate and phospholipd metabolism and is therefore a key enzyme for energy utilisation. In one study, a high carbohydrate diet fed for a prolonged period induced hyperglycaemia, hyperinsulinaemia, and islet hyperplasia in the mice with normal mitochondrial glycerol-3-phosphate dehydrogenase function, while mice with disrupted mitochondrial glycerol-3-phosphate dehydrogenase function did not develop these traits, but did show increased insulin sensitivity [26]. Given that insulin sensitivity differs between cows which differ in their milk production response to the level of feeding [27], we could hypothesise that a mutation in the candidate gene or it's regulatory regions alters insulin sensitivity which in turn alters the response of milk yield to the level of feeding. Two adjacent SNPs in the coding region of the gene (rs42378599 and rs42378600) have been reported in dbSNP and these warrant further testing.
In this paper we have identified and validated panels of markers to enable selection of dairy cattle for adaptation to the altered production systems that are possible under climate scenarios. For example, the validated SNPs affecting HTDMY slope should be valuable to select bulls to generate daughters that will be productive at low levels of feeding, if high energy feed stuffs become increasingly scarce.