The genetic architecture of helminth-specific immune responses in a wild population of Soay sheep (Ovis aries)

Much of our knowledge of the drivers of immune variation, and how these responses vary over time, comes from humans, domesticated livestock or laboratory organisms. While the genetic basis of variation in immune responses have been investigated in these systems, there is a poor understanding of how genetic variation influences immunity in natural, untreated populations living in complex environments. Here, we examine the genetic architecture of variation in immune traits in the Soay sheep of St Kilda, an unmanaged population of sheep infected with strongyle gastrointestinal nematodes. We assayed IgA, IgE and IgG antibodies against the prevalent nematode Teladorsagia circumcincta in the blood plasma of > 3,000 sheep collected over 26 years. Antibody levels were significantly heritable (h2 = 0.21 to 0.57) and highly stable over an individual’s lifespan. IgA levels were strongly associated with a region on chromosome 24 explaining 21.1% and 24.5% of heritable variation in lambs and adults, respectively. This region was adjacent to two candidate loci, Class II Major Histocompatibility Complex Transactivator (CIITA) and C-Type Lectin Domain Containing 16A (CLEC16A). Lamb IgA levels were also associated with the immunoglobulin heavy constant loci (IGH) complex, and adult IgE levels and lamb IgA and IgG levels were associated with the major histocompatibility complex (MHC). This study provides evidence of high heritability of a complex immunological trait under natural conditions and provides the first evidence from a genome-wide study that large effect genes located outside the MHC region exist for immune traits in the wild.


Introduction
Individual differences in immune responses are widely observed in nature, and are likely to be a major factor underlying variation in disease resistance, health and evolutionary fitness in vertebrates [1]. Understanding the role of genes and the environment in generating this variation over the lifetime of an individual is therefore of considerable importance across disciplines including immunology, human and veterinary medicine, and evolutionary biology [2,3]. Yet, efforts to quantify the genetic basis of immunological variation have been largely limited to studies of humans, domesticated livestock and laboratory rodents [4][5][6][7][8]. Experimental studies provide a controlled environment in which to address mechanistic questions about immune function, but offer limited insight into immunological variation under more complex and challenging environmental conditions, as illustrated by dramatic differences in immune phenotypes between wild and laboratory rodents [9,10]. Whilst domesticated livestock and humans are subject to more complex environments, individuals generally receive regular treatment against prevalent infectious agents, which is likely to modify the natural immune response and the contributions of underlying genetic variation. Therefore, widening immunological studies to encompass more natural, untreated vertebrate systems will allow us to obtain an accurate understanding of the role of genetics in driving variation in immune phenotypes more broadly [1,2]. Here, we harness an extensive data set of immune phenotypes and genotypes from an unmanaged and untreated population of Soay sheep to determine the relative contribution of genetic, environmental and individual variation in shaping differences in immunity, and how this varies over the lifetime of individuals.
Studies in humans and livestock have shown that variation in immune traits is often heritable; that is, a significant proportion of phenotypic variance can be attributed to additive genetic effects [3,8,[11][12][13][14][15]. Genome-wide association studies (GWAS) in these systems have identified a number of genes of relatively large effect contributing to heritable variation, most notably the major histocompatibility complex (MHC) and cytokine genes [4,5,[16][17][18][19]. In wild populations, studies have investigated the heritability of immune traits, most often in birds [20-25], with candidate gene approaches further implicating MHC and cytokine regions in cases where significant associations are observed [25][26][27][28][29][30]. However, these studies often focus on broad, non-specific immune phenotypes such as the phytohaemagglutinin (PHA) response, haematocrit levels and/or parasite burden, rather than specific immune responses to ecologically-relevant parasites [31][32][33]. In addition, candidate gene studies focus on a small proportion of the genome and may fail to identify previously undiscovered coding or regulatory regions associated with immune trait variation [34,35]. The application of genome-wide association studies of immune variation allows us to identify genomic regions and their relative contribution to immune phenotypes without a priori selection of a candidate gene or gene set. This could identify novel important genes and regulatory regions influencing parasite resistance, and test whether genes identified in human, livestock and laboratory rodent studies also shape immunological variation under natural, untreated conditions. Yet, to our knowledge, there are no genome-wide association studies of specific immune phenotypes in the wild.
In addition to genetic effects driving persistent among-individual differences in immune phenotypes, within-individual variation in immunity associated with recent exposure to parasites, nutritional state, and age are well documented in many systems [36,37]. Immunological studies in humans, livestock and laboratory rodents tend to focus on specific age groups, meaning that little is known about how temporally stable or "repeatable" immune traits are, and whether the genetic architecture underlying these traits remains consistent across age groups. To separate among-and within-individual contributions to phenotypic variation, longitudinal data across the entire lifetime is required. A growing number of longitudinal studies in humans have found that high immunological diversity is maintained by high inter-individual variation, but that immune profiles of individuals are stable across longitudinal sampling [4,[38][39][40]. Studies in cattle found that cellular based immune traits are highly repeatable, while antibody-based traits were less so [15,41]. Therefore, the ability to dissect the relative contributions of among-and within-individual variation in immune phenotype using longitudinal data collected in natural systems is a crucial step towards our understanding of the evolutionary and ecological causes and consequences of observed variation in immunity in the wild.
Domestic sheep (Ovis aries) and their gastrointestinal strongyle nematodes represent a well-understood host-parasite system, due to their agricultural and economic importance, with much recent interest in determining the genes underlying host resistance to these parasites [42]. Of the strongyle parasites, Teladorsagia circumcincta is of major economic importance for domestic sheep in temperate regions [43] and has a simple direct life-cycle, with an infective L3 stage which develops to L4 stage within the gastric glands before emerging as sexually mature adult parasites which reside in the abomasum. Defence against T. circumcincta in lambs is associated with parasite-specific IgA antibody responses directed at worm growth and subsequent female fecundity [12,44], while in older animals a hypersensitive response, involving IgE antibodies, results in expulsion of incoming larvae from the mucosa [43]. Anti-T. circumcincta IgA levels are moderately heritable in lambs and adults [12,45,46]. Candidate gene and genome-wide studies have identified regions associated with faecal egg counts (FEC) or protective immunological traits related to gastrointestinal nematodes, with candidate gene studies primarily focussed on interferon gamma (IFNγ) and the MHC [42]. However, due to the focus on identifying individuals for selective breeding and the greater impact of parasite infections in lambs, most studies focus only on lambs, with only a few studies of adult ewes [6,[46][47][48]. As a consequence, we know relatively little about age-dependent genetic effects; indeed, differences in resistance loci between lambs and adults suggest that the genetic control of these mechanisms may differ [6]. Furthermore, domestic sheep populations are almost always regularly treated with anthelmintic drugs, and resistance to all available major drug classes has been documented [49,50]. Understanding the genetic factors regulating host resistance in the absence of treatment (i.e. natural conditions) could facilitate artificial selection programmes that promote resistance and productivity without reliance on drugs [42,51,52].
The long-term individual-based study of the wild Soay sheep of St Kilda provides a powerful opportunity to understand the genetic architecture of immune traits at different ages under natural conditions. Soay sheep are infected with several gastrointestinal strongyle nematodes common to domestic sheep, predominantly Teladorsagia circumcincta, Trichostrongylus axei and Trichostrongylus vitrinus [53,54]. Strongyle nematode burden, in combination with harsh winter weather and low food availability, is a strong selective force on the sheep [53-56]. Parasite-specific antibody responses are moderately heritable [28,57] and parasite-specific IgA levels and parasite-specific pan-isotype antibody levels have been shown to be negatively associated with strongyle faecal egg count [26,57]. Recent examination of anti-nematode antibody isotypes (namely IgA, IgE and IgG) over a 26-year period showed that levels of IgG are positively associated with adult survival, and negative associations are observed between antibodies and FEC for all isotypes in lambs, but only for IgG in adults [58][59][60].
Previous examination of the genetic architecture of immune traits in Soay sheep using QTL mapping and candidate gene approaches failed to identify loci associated with parasite egg counts and pan-isotype antibody levels [28,61]; however, a microsatellite polymorphism at the IFNγ locus in lambs had previously been associated with reduced faecal egg counts and increased parasite-specific IgA levels [26]. Today, the majority of study individuals have been genotyped on the Illumina 50K OvineSNP50 BeadChip, and genome-wide association studies have identified genomic regions associated with traits such as horn morphology, body size and recombination rate [62][63][64]. Here, we investigate the heritability and conduct genome-wide association studies of anti-T. circumcincta IgA, IgE and IgG levels from > 5,800 plasma samples collected from > 3,000 Soay sheep over a 26-year period. We show that antibody levels are heritable and temporally stable over an individual's lifetime, and that several genomic regions explain heritable variation in both lambs and adults.

Phenotypic variation and heritability
Lambs in their first year of life had considerably lower antibody levels compared to older animals, so we investigated these two groups separately in our analyses (Figs 1 & S1). August T. circumcincta-specific antibody levels of IgA, IgG and IgE were weakly positively correlated with each other, with slightly stronger correlations in lambs (adjusted R 2 values from 0.078 to 0.175 in lambs, and from 0.005 to 0.012 in adults, P < 0.001; S2 Fig, S1 Table). Males had lower IgA levels as lambs and lower IgG levels as lambs and adults compared to females (Wald test P < 0.001, Figs 1 & S3, S2 Table). All three antibody isotypes were positively associated with age in days in lambs and age in years in adults, except for adult IgG levels, which were negatively associated with age (Wald test P < 0.001, Fig 1, S2 Table). Each antibody isotype was temporally stable or 'repeatable' over the lifespan of adults, as shown by high betweenindividual variation attributed to additive genetic effects and permanent environmental differences (proportion of phenotypic variance: IgA = 0.76, IgE = 0.72, IgG = 0.52; Fig 2, Table 1). This was further illustrated by a strong positive correlation between antibody measures taken in two consecutive years (IgA and IgE: slopes > 0.8, Adjusted R 2 > 0.66; IgG: slope = 0.52, Adjusted R 2 = 0.29; Fig 3, S4 Table).
All antibody measures were heritable in lambs and adults, with IgA levels showing the highest heritabilities (h 2 = 0.39 & 0.57 for lambs and adults, respectively; Tables 1 & S3, Fig 2). Heritabilities in lambs and adults were 0.21 and 0.47 for IgE, and 0.29 and 0.23 for IgG, respectively (Tables 1 & S3, Fig 2). Permanent environment effects were also significant, explaining between 18.6 and 28.5% of the phenotypic variance in adults, resulting in repeatabilities of 0.76, 0.72 and 0.52 for IgA, IgE and IgG, respectively. There was significant variation in antibody levels among birth years in lamb IgA and IgG measures, although the effect was small (� 7% of the phenotypic variance) and there was a weakly significant maternal effect explaining < 4% of variation in lamb IgA levels (Fig 2, Tables 1 & S3). In adults, capture year explained < 1.5% of the phenotypic variance in all antibody measures. The full results of the animal models are provided in S2 Table (fixed effect structures) and S3 Table (random effect  structures). Genetic correlations among antibody isotypes in lambs were strongly positive (r A � 0.695; Fig 4), but much weaker in adults (r A � 0.232; Fig 4). Genetic correlations between lambs and adults when considering the same antibody isotype were positive and ranged from 0.33-0.61 (Fig 4), whilst adult-adult and lamb-adult genetic correlations across different isotypes were generally weaker and often not significantly different from zero (Fig 4).

Genome-wide association studies
Genome-wide association studies (N SNPs = 39,176) and regional imputation approaches identified several genomic regions associated with variation in anti-T. circumcincta IgA, IgE and IgG levels in lambs and adults (Figs 5 and S5, Tables 2, S5 & S6). SNP genotypes were fit as a two or three level factor (AA, AB or BB), rather than as an additive effect (0, 1 or 2). All test statistics were corrected using the genomic control parameter λ; this value was low for all 6 GWAS (λ < 1.074), indicating that population structure was adequately captured by fitting pedigree relatedness. Below, we discuss associations for each antibody separately, with summary information in Table 2. Full association results for genome-wide and imputed SNPs are provided in S5 and S6 Tables, respectively. Information on genes and orthologues within associated regions are provided in S7 Table and immune GO terms associated with these genes are provided in S8 Table. For the most highly associated SNPs (S6 Fig, Tables 2 & S9), we also examined sex by SNP interactions (S7 Fig & S10 Table). Whilst these interactions were  Table 1. significant for nearly all of these loci, the direction of these effects were similar unless otherwise stated below.
Anti-T. circumcincta IgA. There was a strong association between IgA levels and a region between 6.89 and 14.95 Mb on sheep chromosome 24, with the highest association observed at the SNP locus OAR24_12006191.1 in both lambs and adults (Wald test P = 1.01 x10 -31 and 2.23x10 -51 in lambs and adults, respectively; Figs 5, 6, S8 & S9; Tables 2 & S5). This SNP had an approximately additive effect on IgA levels in both lambs and adults (Table 2), with the region explaining 20.0% and 27.2% of the additive genetic variance in lambs and adults, respectively, equating to 7.8% and 15.3% of the phenotypic variance in lambs and adults, respectively. Associations at imputed SNPs in this region showed the strongest association at SNPs between 10.62Mb and 10.86Mb (maximum Wald test P = 4.08 x10 -39 and 5.74x10 -71 in lambs and adults, respectively; Fig 6, S6 Table), again with an additive effect on IgA levels ( Table 2, Table); GO terms indicated that this locus is associated with red blood cell phenotypes in humans and mice, including variation in haematocrit, erythrocyte cell number and circulating alkaline phosphate levels (International Mouse Phenotyping Consortium data; S8 Table). Whilst this gene has no clear role in driving IgA levels, the associated SNPs were downstream of two candidate genes (Fig 6; S7 & S8 Tables; distances of~1.021Mb and~709Kb, respectively): the Class II Major Histocompatibility Complex Transactivator (CIITA), which is described as a "master control factor" for gene expression at the major histocompatibility complex [65,66]; and C-Type Lectin Domain Containing 16A (CLEC16A), variants at which have been associated with common variable immunodeficiency disorder and IgA deficiency [67][68][69]. An unmapped SNP was significantly associated with IgA levels in both lambs and adults (   Tables 2 & S5), and was located~311kb to 454kb downstream of four novel genes (ENSOARG00000008862, ENSOARG00000008994, ENSOARG00000009143, ENSOARG00000009269) orthologous to various forms of immunoglobulin heavy constant alpha, epsilon, gamma and delta loci in humans (IGHA, IGHE, IGHG and IGHD, respectively; S7 & S8 Tables). These loci code for constituent proteins of immunoglobulins and have GO terms associated with variation in IgA, IgE and IgG levels in mice (S8  Table). When the threshold was relaxed to include loci with < 0.95 imputation success, the most highly associated SNPs occurred at the immunoglobulin heavy chain complex, closest to IGHA and IGHE (highest associated SNP = oar3_OAR18_68320039, Wald test P = 1.82 x10 -10 , imputation success = 0.754; S6 Table, S11 Fig).
Anti-T. circumcincta IgE. Lamb IgE levels were associated with a gene-poor region of chromosome 10, with the highest association observed at the imputed locus oar3_OAR10_ 10333145 (Figs 5 & S12, Table 2). The only protein-coding gene in this region, olfactomedin 4 (OLFM4), is associated with down-regulation of immune responses against bacterial infections in mice [70]. However, given the low minor allele frequency of this locus (S6 Fig), a lack of other associations at adjacent loci (S12 Fig, S5 & S6 Tables) and no contribution of the region to additive genetic variance (Table 2), we cannot rule out that association seen here is spurious and due to the sampling of rare alleles in individuals with extreme trait values. Adult IgE levels were associated with a region from 25.8Mb to 27.5Mb on chromosome 20, with the highest association seen at the locus OAR20_27259292.1 for both the SNP50 and imputed SNP loci (Figs 5 & S13, Tables 2, S5 and S6). This SNP is directly upstream of the major histocompatibility complex (MHC) class II locus HLA-DRA, as well as the MHC class II loci DQA and HLA-DQB1; the wider region contains~46 annotated genes with GO terms associated with immune function (S13 Fig, S7 & S8 Tables).

Table 2. SNPs showing the strongest association with anti-T. circumcincta IgA, IgE and IgG levels in lambs and adults.
The P-values provided in this table have not been corrected using genomic control to allow comparisons between directly genotyped and imputed SNPs. Asterisks next to the SNP name indicate that the most highly associated SNP was imputed from the high-density SNP chip. N 50K and N HD indicate how many SNPs were significantly associated with the trait in the same region for the 50K and HD SNP chips, respectively. A and B indicate the reference and alternate alleles at each SNP. MAF indicates the minor allele frequency (allele B); for imputed SNPs, this was calculated using the HD chip data only and not imputed genotypes. Genotype effects AA, AB and BB are the effect sizes as calculated from the associated animal model. Full results including corrected P values are provided in Fig 5 and Table. Gene positions were obtained from Ensembl (gene build ID Oar_v3.1.94) and are provided in S7 Table. https://doi.org/10.1371/journal.pgen.1008461.g006 Genetics of immune responses in a wild population of Soay sheep Anti-T. circumcincta IgG. Lamb IgG levels were significantly associated with a region from 29.6-30.9Mb on chromosome 20, with the highest association observed at the imputed locus oar3_OAR20_30876754 (Figs 5 & S14, Table 2). This region was~4Mb from the region associated with IgE levels in adults and was close to protein coding regions orthologous to MHC Class I genes (S14 Fig, S7 & S8 Tables). This locus showed a sex by genotype interaction, with the G allele dominant to A for lower IgG levels in males, compared to an additive effect in females; AA males showed higher IgG levels than in females (S7 Fig, S10 Table). A further association was observed on chromosome 16, corresponding to a region containing CD180, a gene associated with variation in IgG2b levels in mice [71] (S8 Table,

Discussion
This study is the first to examine the genetic architecture of immune traits using a genomewide association approach in a wild population. We have shown that anti-Teladorsagia circumcincta IgA, IgE and IgG levels in Soay sheep are highly repeatable within individuals' lifetimes and show substantial heritable variation that is underpinned by several genomic regions containing immune-associated genes. This suggests that antibody phenotypes have the potential to respond rapidly to selection, but also demonstrates that individual sheep develop distinct, temporally stable antibody phenotypes despite marked annual variation in exposure to nematode parasites, food availability and climate conditions [53, 72,73]. Below, we discuss the genetic architecture of these traits in more detail and how our findings inform the broader field of understanding the evolution and adaptive potential of immune traits in both domestic and natural populations.

Temporal stability of antibody levels
We observed a large increase in anti-Teladorsagia circumcincta antibody levels between lambs (aged 4 months) and adults (aged >16 months). This was consistent with previous observations in this system and is probably due to the development of anti-helminth immunity with exposure over early life [26]. In adults, antibody levels were stable within individuals, as indicated by high repeatabilities and strong temporal correlations of antibody measures between years (Fig 3, S4 Table). This low intra-individual variation is notable given the temporally and spatially variable environment that individuals experience on St Kilda. The relatively small amount of variation explained by cohort, maternal and annual effects found here suggests that temporal variation in exposure to parasites, condition or early life effects had relatively little influence on antibody levels. It is also notable that repeatabilities for each antibody isotype were high despite different isotypes being only weakly correlated with one another, suggesting complex individualised immune phenotypes which are consistent over lifetimes. Our findings are consistent with the consensus emerging from human studies, which have also determined that variation in immune parameters is driven by high inter-individual and low intra-individual variation, indicative of stable immunological profiles of individuals [4,[38][39][40]. Whilst most intra-individual variation in this study was attributed to additive genetic effects, the permanent environment effects were substantial, accounting for 19%, 25% and 29% of the phenotypic variance in IgA, IgE and IgG, respectively. At present, the factors contributing to this variation remain unknown, but may be driven by consistent spatial differences in exposure or individual disease history, or due to complex interactions between nutritional state, exposure to other parasites and life history during early life.

Heritable variation in antibody levels
Anti-T. circumcincta IgA, IgE and IgG levels were highly heritable in Soay sheep, ranging from 0.21 to 0.39 in lambs and from 0.23 to 0.57 in adults. These estimates are comparable to previous work estimating the pedigree heritability of an anti-T. circumcincta pan-isotype antibody measure (likely to be mainly comprised of IgG) in Soay sheep lambs (h 2 = 0.30) and adults (h 2 = 0.13-0.39) [28,57]. In domestic sheep, similar heritability estimates have been obtained for anti-T. circumcincta IgE in Texel lambs (h 2 = 0.39 and 0.50 against the third and fourth stage larvae, respectively [11]) and anti-T. circumcincta IgA in Scottish Blackface lambs (h 2 = 0.56 against fourth stage larvae [12]). The observation that immune traits in Soay sheep and domestic breeds appear to have substantial heritable variation is interesting from an evolutionary perspective, as selection for reduced parasite load is likely to be strong, which in turn is predicted to reduce underlying genetic variation and hence the heritability of quantitative traits [74]. In domestic sheep, anti-helminthic treatments may have relaxed the selection pressure on immune traits. Alternatively, the observed high heritabilities in both domestic and wild sheep may be in accordance with theory predicting that stabilising selection, rather than directional selection, is likely to be acting on immune traits [75], which in turn may lead to the maintenance of genetic variation at the underlying trait loci. In the Soay sheep, we have shown with the same dataset that there is little evidence for stabilising selection, with directional selection present for IgG in adults but not for other isotypes or age groups [60]. It is notable that adult IgG, as well as being under the strongest directional selection, also has the lowest heritability compared to other isotypes and age groups (Fig 2, Table 1). This is consistent with the prediction that directional selection should erode heritable variation, whilst the high observed heritabilities in general are consistent with observations of weak or variable selection on these antibody measures [60]. Nevertheless, a full understanding of the mechanisms maintaining this genetic variation will require examination of association between genotypes at significant loci with individual fitness, i.e. survival and reproductive success.

Genetic correlations among ages and antibody isotypes
Our study provides, to our knowledge, the first estimates of genetic correlations among immune phenotypes from any wild population. Our results illustrate the complexity of the genetic architecture underpinning immunological variation: even in a set of immune measures all associated with the same immune challenge (i.e. parasitic helminth antigens) and produced by the same arm of the immune system, genetic correlations among measures can still be weak. Our findings also highlight the importance of age-dependence in the genetic control of immune phenotypes, mirroring our recent finding that selection on these three isotypes is highly age-dependent in this study system [60]. Whilst the inter-isotype genetic correlations were highly positive in lambs (r A � 0.695), they were much smaller in adults (r A � 0.232). This is consistent with previously documented patterns at the phenotypic level [60]. Lambs measured in their first year have relatively naïve immune systems and have only very recently become exposed to parasitic helminths, whilst on St Kilda levels of exposure to infective larvae vary considerably over time and space [53]. The breakdown in adults of the tight genetic correlations among antibody isotypes observed in lambs and the general pattern of low correlations across age groups (Fig 4) may well reflect complex interactions between genotypes and the unique pattern of exposure and resource availability experienced by different individuals as they develop and mature. The notable exception here, the relatively high among-age genetic correlation observed for IgA antibodies (r = 0.61) presumably reflects the influence at both ages of the region identified in our GWAS on chromosome 24. From an evolutionary perspective, these results demonstrate that natural selection on different antibody isotypes against parasitic helminths is relatively weakly constrained by genetic correlations in adults at least. The genetic architecture underpinning antibody levels also does not appear to be identical in lambs and adults, and future studies in wild and domestic animals, as well as humans, should note the possibility that immunological variation may be under the control of different genes and regions depending on age or life stage.

Genetic variants associated with antibody levels
In this study, we found associations between antibody levels and known immune loci, most notably at the MHC (lamb IgA and IgG, adult IgE) and IGH complex (lamb IgA). However, the strongest association observed in this study was between lamb and adult IgA levels and a region on chromosome 24 corresponding to the gene SNX29. This gene has no previous association with immune trait variation (see Results) but occurs downstream of two candidate genes. The first, CIITA, is a master regulator of MHC class II gene expression; overexpression of CIITA in rats can induce transcription of MHC Class II genes in nearly all cell types [76] and CIITA knockout mice show impaired MHC Class II expression [77]. Mutations in CIITA in humans are associated with bare lymphocyte syndrome type II, a severe primary immunodeficiency caused by the absence of MHC class II gene expression [78]. In addition, a human GWAS study showed an association with variants at CIITA and levels of activated T cells (i.e., HLA DR+ T lymphocytes) and is in linkage disequilibrium with disease variants associated with ulcerative colitis [4]. The second candidate, CLEC16A, is almost directly adjacent to CIITA and has been associated with IgA deficiency and common variable immunodeficiency disorder characterised by inadequate levels of multiple antibody isotypes [67][68][69] and CLEC16A knockdown mice have a reduced number of B cells and increased IgM levels compared with controls [68]. In addition to antibody variation, polymorphisms at CLEC16A and CIITA have been associated with numerous autoimmune diseases, including multiple sclerosis [79,80], diabetes [81], Crohn's disease [82], adrenal insufficiency [83] and arthritis [84], and CLEC16A has been functionally linked to auto-inflammation and autoimmunity in mice [85]. Previous work on Soay sheep found that auto-reactive antibody levels are variable and moderately heritable in this system and, despite their clear links with autoimmune diseases in humans, high levels were predictive of increased chances of over-winter survival [58,86]. However, we found only weak correlations between anti-T. circumcincta antibody levels and these auto-reactive antibodies [58], suggesting they may independently reflect very different components of the humoral immune system. The question of how natural selection in the wild acts on immune measures, and on the genes found in humans to be associated with autoimmune pathology, remains a fascinating and open topic. Further investigation of the function of these genes in natural systems, such as the Soay sheep, may help address this.
Despite CIITA and CLEC16A being strong candidate genes for IgA expression a priori, they lie~1 & 0.7 Mb upstream from the GWAS peak, respectively (Fig 6). We cannot rule out that variants in protein-coding regions at SNX29 and adjacent loci may drive IgA expression. However, a more plausible hypothesis is that the associated region contains cis-regulatory elements affecting the expression of CIITA and/or CLEC16A. Direct evidence of the precise cis-regulatory regions driving gene expression is scarce, but there is increasing evidence that genes can have multiple cis-regulatory regions driving expression [87], and that cis-regulatory regions can occur at distances of >1Mb from their target genes (see [88] and references therein).
The non-MHC variants identified in this study have not previously been associated with anti-T. circumcincta IgA, IgE or IgG levels in other sheep breeds investigated to date. A genome-wide association study in Scottish Blackface lambs failed to identify any SNPs associated with T. circumcincta IgA [45], while a study in Spanish Churra ewes found one genome-wide significant SNP on chromosome 12 [6]. A quantitative trait locus (QTL) mapping study in Romney lambs found total IgE and anti-Trichostrongylus colubriformis IgG levels were each associated with a region on chromosome 23 [89]. Together with our results, it appears that QTL for parasite-specific antibody traits have not been consistently observed between sheep breeds. This may be due to different loci associated with immune responses at different ages, differences in host-parasite exposure, inherent differences between breeds driven by different selective breeding histories, and/or genetic drift [6,28,90]. Alternatively, there may be differences in the power to detect trait loci due to differences in patterns of linkage disequilibrium, effect sizes, sample sizes and/or analytical approaches between the studies. The loci identified in the current study may also be due to a genotype-by-environment effect that may only be manifested under natural conditions or could have been introduced with a historical admixture event with the Dunface breed [91]. Investigation of candidate causal mutations in the current study will shed light on the mechanisms driving antibody levels within Soay sheep, as well as their ubiquity and origin across different sheep breeds.
The identification of several large effect loci is in contrast with GWAS studies on body size and fitness-related traits in wild populations which have found few, if any, associations of SNPs with quantitative traits [29, 63,[92][93][94][95][96]. This is because wild studies are subject to limitations related to sample size, environmental heterogeneity and marker density, which may fail to identify trait loci, over-estimate effect sizes and/or generate spurious associations (e.g. as stated above for observed associations at rare variants for lamb IgE and IgG on chromosomes 16 and 10, respectively) [97]. We believe our overall findings are robust for the following reasons. This study has one of the highest sample sizes of any GWAS conducted in a wild system, with~2,000 measures in lambs and~3,800 measures in~1300 unique adults, and sampling studies in this population suggest that causal variants contributing to heritable variation are adequately tagged by the Ovine SNP50 BeadChip [63,98]. The extent of LD between genotyped SNP loci allowed successful imputation of high-density SNP loci in almost all significant regions of the genome, providing sufficient power to fine-map loci of large effect on immune phenotypes [63]. We acknowledge that reduced LD in some regions (such as on chromosome 18 for lamb IgA) may mean that some regions of the genome are less able to tag heritable variation, potentially leading to reduced power to detect some trait loci. In addition, the Ovine SNP50 BeadChip has a low SNP density around the DQA and DQB loci in the MHC class II region, reducing power to detect associations (S10 & S13 Figs). Nevertheless: the quality of imputation was high within this region; other work has shown that there is no significant difference in patterns of LD and recombination rate compared to other locations within the genome [64]; and traits were successfully mapped to the MHC region within the current study.

Conclusion
This study provides evidence of a number of major effect loci and high additive genetic variation underlying complex immune traits in a wild population of Soay sheep, and provides a foundation for determining why genetic variation persists in immune traits by investigating associations with identified trait loci with individual fitness and genomic signatures of selection. The high heritability and repeatability of immune measures, as well as low correlations between them, suggests that strong targets for selection exist; a full understanding would require multivariate analysis with individual reproductive success and survival to understand the constraints on immune phenotype evolution. Previous studies of immunity in the wild often focussed on specific immune regions (e.g. the MHC) and candidate genes encoding proteins of known immune function. Our study reveals the importance of using a genome-wide association, rather than candidate gene approach, for a clearer understanding of the genetic control of immune phenotypes. Overall, our study provides a rare example of multiple regions of large effect driving variation in immune phenotypes in the wild and presents strong evidence that immune profiles are temporally stable over an individual's lifetime.

Study population
The Soay sheep is a primitive breed of domestic sheep that was isolated on the island of Soay in the remote St Kilda archipelago several millennia ago, and has been living in unmanaged conditions since then [99]. In 1932, >100 Soay sheep were moved to the larger island of Hirta after the evacuation of all human residents. The population now fluctuates between 600 to 2,200 individuals. Approximately a third of the Hirta population lives in the Village Bay area, and these individuals have been the subject of a long-term study since 1985 [99]. In April, around 95% of all lambs born in the study area are caught each year and individually tagged. Each August, as many sheep as possible from the study population are re-captured using temporary traps [99]. At capture, whole blood samples are collected into heparin tubes, centrifuged at 3000 r.p.m. for 10 minutes, and plasma removed and stored at -20˚C.

Ethics statement
All animal work has been carried out according to UK Home Office procedures and is licensed under the UK Animals (Scientific Procedures) Act of 1986 (license no. PPL60/4211).

Quantifying antibody levels
This study quantified antibody levels in animals that were caught and blood sampled in August between 1990 and 2015, comprising 6543 samples from 3190 individuals. Five samples from late-born lambs caught in August within 50 days of birth were excluded from the dataset, due to the potential presence of maternal antibodies and differences in development stage to other lambs. Levels of the antibodies IgA, IgG and IgE against antigens of the third larval stage of Teladorsagia circumcincta were measured using direct (IgA, IgG) and indirect (IgE) ELISAs. We used T. circumcincta L3 somatic antigen, provided by the Moredun Research Institute, as the capture antigen for all three assays diluted to 2μg/ml in 0.06M carbonate buffer at pH 9.6. 50μl of the diluted capture antigen was added to each well of a Nunc-immuno 96-microwell plate, which was covered and incubated at 4˚C overnight. After washing the wells three times in Tris-buffered saline-Tween (TBST) using a plate washer, 50μl of the Soay sheep plasma sample diluted to 1:50 for IgA and IgE, and 1:12800 for IgG was added to each well. The plates were then covered and incubated at 37˚C for 1 hour. Plates were then washed five times with TBST and 50μl per well of rabbit polyclonal anti-sheep IgA detection antibody conjugated to horseradish peroxidase (HRP) (AbD Serotec AHP949P) diluted 1:16000 was added to the anti-T. circumcincta IgA assay and 50μl per well of rabbit polyclonal anti-sheep IgG detection antibody conjugated to HRP (AbD Serotec 5184-2104) diluted 1:16000 was added to the anti-T. circumcincta IgG assay. For the anti-T. circumcincta IgE assay, 50μl per well of anti-sheep IgE (mouse monoclonal IgG1, clone 2F1, provided by the Moredun Research Institute) diluted 1:100 was added, followed by 1-hour incubation at 37˚C, five washes with TBST and then 50μl per well of goat polyclonal anti-mouse IgG1-HRP detection antibody (AbD Serotec STAR132P) was added diluted to 1:8000 in TBST. All plates were then incubated at 37˚C for 1 hour. Plates were then washed five times with TBST and 100μl of SureBlue TMB 1-Component microwell peroxidase substrate (KPL) was added per well and left to incubate for 5 minutes in the dark at 37˚C. Reactions were stopped by adding 100μl per well of 1M hydrochloric acid and optical densities (OD) were read immediately at 450nm using a Thermo Scientific GO Spectrophotometer.
All results were measured as OD values due to the lack of standard solutions. To minimise confounding of capture year and age effects with plate to plate variation, each plate included samples from two years paired at random with different age groups on each plate. All plates were run in duplicate and duplicate sample ODs were removed if the coefficient of variation was > 0.2 or the difference between ODs was greater than 0.2. We also checked the correlation of ODs across duplicate plates and re-ran both plates if r < 0.8. We included two sample free wells (50μl TBST) as blanks and two wells of positive controls on each plate. The positive control for the IgE assay was pooled serum from ewes trickle-infected with T. circumcincta and for the IgA and IgG assays was pooled plasma from normal healthy non-immunised domestic sheep. For subsequent analyses, the mean optical density ratio of each sample was taken according to this formula:

SNP data set
DNA was extracted from ear tissue or buffy coats using the Qiagen DNeasy blood and tissue kit according to the manufacturer's protocol, except that a single final elution with 50ul AE buffer was used to give DNA at a concentration � 50ng/ul. A total of 7,386 Soay sheep have been genotyped at 51,135 SNPs on the Illumina Ovine SNP50 BeadChip. Quality control was carried out using the check.marker function in GenABEL version 1.8-0 [100] using the following thresholds: SNP minor allele frequency (MAF) > 0.01, SNP locus genotyping success > 0.95, individual sheep genotyping success > 0.95, identity by state with another individual � 0.95. Following quality control, 39,176 SNPs from 7,268 sheep remained. A further 189 sheep have been genotyped at 606,066 SNP loci on the Ovine Infinium HD SNP BeadChip; these sheep were selected to maximise the genetic variation represented in the population and were subject to the same quality control thresholds as above (see [64] for individual selection criteria). All SNP locations were taken from their estimated positions on the sheep genome assembly Oar_v3.1 (GenBank assembly ID GCA_000298735.1 [101]). Pedigree relationships between individuals were inferred using data from 438 SNP loci in the R package Sequoia v1.02 [102] and from field observations between mothers and their offspring born within the study area (see [63] for SNP selection criteria).

Animal models
We modelled IgA, IgE and IgG levels in lambs and adults using a restricted maximum likelihood (REML) animal model approach [103] to determine the heritability of antibody levels in ASReml-R 3.0 [104] in R v3.4.3. We analysed lambs and adults separately due to a large difference observed in antibody levels (S2 Fig) and due to the expected immaturity of the immune response in 4-month-old lambs [60]. Using the above SNP dataset, a genomic relatedness matrix (GRM) at all autosomal markers was constructed for all genotyped individuals using GCTA 1.90.2 beta0 [105] to determine the variance attributed to additive genetic effects (i.e. the narrow-sense heritability, h 2 ). Pedigree and GRM relatedness have been shown to be highly correlated in this system [98]. The GRM was adjusted using the argument--grm-adj 0, which assumes that allele frequencies of causal and genotyped loci are similar.
The fixed effect structure for the lamb models included sex and age in days as a linear covariate, while the random effects included the additive genetic component, maternal identity, birth year, ELISA plate number and ELISA run date. The fixed effect structure for the adult models included sex and age in years as a linear covariate, while the random effects included permanent environment (i.e. repeated measures within an individual) and capture year effects in addition to the random effects included in the lamb model. The proportion of the phenotypic variance explained by each random effect was estimated as the ratio of the relevant variance component to the sum of all variance components (i.e. the total phenotypic variance) as estimated by the animal model. The heritability of each measure was determined as the ratio of the additive genetic variance to the total phenotypic variance. The repeatability (i.e. the between-individual variation) of each measure in the adult and all age models was determined as the ratio of the sum of the additive genetic and permanent environment variance to the total phenotypic variance. Bivariate models were also run to determine the genetic correlation between all lamb and adult traits. For correlations within lambs or adults, the fixed effect structure of the model was as above; for correlations between lambs and adults, only sex was fit as a fixed factor. The additive genetic correlation, r A , was calculated without constraint using the CORGH function (i.e. correlation with heterogenous variances) in ASReml-R 3.0. For adult models, a permanent environment was also fit as an additional random effect. The significance of the correlation estimate was determined using the reported Z-ratio with 1 degree of freedom.

Genome-wide association studies
Genome-wide association (GWA) was used to identify associations between individual single nucleotide polymorphisms (SNPs) and IgA, IgE and IgG levels in lambs and adults. This included SNPs on the X chromosome (N = 824) and those of unknown position (N = 313). For each trait and each class, a total of 39,176 individual animal models were run to determine the association with each SNP locus. Each model used the same fixed effect structures as above, with SNP genotype fitted as a two or three-level factor. To speed up computational time, the GRM was replaced with a relatedness matrix based on the pedigree (which is highly correlated with the GRM in this population [98]), and ELISA plate ID and run date were removed as random effects as they explained a very small proportion of the phenotypic variance (Fig 2). Models were run in ASReml-R 3.0 [104] in R v3.4.3. P-values were corrected for any additional unaccounted-for population structure by dividing them by the genomic control parameter λ [106] in cases where λ > 1, to reduce the incidence of false positives. λ was calculated as the median Wald test χ 2 2 divided by the median χ 2 2 expected from a null distribution. The significance threshold after multiple testing was determined using a linkage disequilibrium-based approach with a sliding window of 50 SNPs (outlined in [107]); for a false discovery rate of α = 0.05, the threshold P-value was set at 2.245x10 -6 [64]. At the most highly associated SNPs, we repeated the animal models above including an interaction term between sex and SNP genotype to investigate if gene effects were sex-specific. The significance of this term was tested using a Wald test. Lamb IgE levels show strong right skew in their distribution (S1 Fig), which can increase spurious associations at rare alleles present in individuals with large trait values. To mitigate against this, all zero trait values were removed (N = 394), and the response variable log 10 transformed; this correction had a negligible effect on the variance component estimates.

Variance explained by significantly associated regions
In regions of the genome where a SNP locus was significantly associated with an antibody measure, the proportion of phenotypic variation explained was modelled using a regional heritability approach [108]. Briefly, a second GRM was constructed as above using 50K SNP data from the most highly associated SNP in that region and the 9 SNP loci flanking that SNP on either side (i.e. 19 SNPs in total). This GRM was fitted as an additional random effect in the animal models and used to quantify the variance explained by variants within the associated region (see [64] for further details on the use of this method in Soay sheep).

Imputation of SNP genotypes in associated regions
Further investigation of significant associations from GWAS was carried out using an imputation approach using data from individuals typed on the Ovine Infinium HD SNP BeadChip. SNP genotypes were extracted from the HD chip ±2Mb on either side of all significantly associated regions. These data were then used to impute autosomal SNP genotypes in individuals typed on the 50K SNP chip alone, using a heuristic method in AlphaImpute 1.98 [109]. Briefly, this method uses pedigree information to phase the genotypes of the high-density individuals, which is followed by a haplotype library construction. The haplotype library is then used to iteratively impute missing genotypes around known genotypes in the 50K genotyped individuals. An example parameter file is provided in the data repository described below. Parameter files for each region are included in the analysis code repository (see below). SNPs with an imputation success >95% were retained and associations between antibody levels and genotypes at each imputed SNP was calculated using the same animal model structures as outlined for the GWAS above.

Gene and gene ontology annotation in associated regions
Gene annotations in significant regions were obtained from Ensembl (gene build ID Oar_v3.1.94). Gene ontology (GO) annotations for genes occurring within 1Mb of a significantly associated SNP were obtained from humans, mice, cattle and sheep gene builds using the function getBM in the R package biomaRt v2.34.2 [110]. For genes where the gene name was not known, orthologous genes were identified using the biomaRt function getLDS. For all the genes and orthologues identified within these regions, the gene names, phenotype descriptions and GO terms were queried for all terms associated with immune function and antibodies (using the strings immun � and antibod � ).  Table 2. The model intercept is A/A females, except for locus oar3_OAR10_10333145, which is A/G females. The model results are provided in S10 Table. Sample Table. Genes coloured red have GO terms associated with immune traits (S8 Table).  Table. Genes coloured red have GO terms associated with immune traits (S8 Table).  Table. Genes coloured red have GO terms associated with immune traits (S8 Table). (TIF) S11 Fig. Local association of anti-Teladorsagia circumcincta IgA levels in lambs with SNP50 and imputed SNP loci at the most highly associated region on chromosome 18. The dotted line indicates the genome-wide significance threshold equivalent to an experimentwide threshold of P = 0.05. Points are colour-coded by their imputation status i.e. from the SNP50 chip (red triangles) or imputed from the Ovine HD chip (black points). Underlying data, sample sizes and effect sizes are provided in S6 Table. Gene positions are shown in the grey panel at the top of each plot and were obtained from Ensembl (gene build ID Oar_v3. 1.94) and are provided in S7 Table. Genes coloured red have GO terms associated with immune traits (S8 Table).  Table. Genes coloured red have GO terms associated with immune traits (S8 Table).  Table. Genes coloured red have GO terms associated with immune traits (S8 Table). (TIF) S14 Fig. Local association of anti-Teladorsagia circumcincta IgG levels in lambs with SNP50 and imputed SNP loci at the most highly associated region on chromosome 20. The dotted line indicates the genome-wide significance threshold equivalent to an experimentwide threshold of P = 0.05. Points are colour-coded by their imputation status i.e. from the SNP50 chip (red triangles) or imputed from the Ovine HD chip (black points). Underlying data, sample sizes and effect sizes are provided in S6 Table. Gene positions are shown in the grey panel at the top of each plot and were obtained from Ensembl (gene build ID Oar_v3.1.94) and are provided in S7 Table. Genes coloured red have GO terms associated with immune traits (S8 Table). (TIF) S15 Fig. Local association of anti-Teladorsagia circumcincta IgG levels in lambs with SNP50 and imputed SNP loci at the most highly associated region on chromosome 16. The dotted line indicates the genome-wide significance threshold equivalent to an experimentwide threshold of P = 0.05. Points are colour-coded by their imputation status i.e. from the SNP50 chip (red triangles) or imputed from the Ovine HD chip (black points). Underlying data, sample sizes and effect sizes are provided in S6 Table. Gene positions are shown in the grey panel at the top of each plot and were obtained from Ensembl (gene build ID Oar_v3. 1.94) and are provided in S7 Table. Genes coloured red have GO terms associated with  immune traits (S8 Table). (TIF) S1 Table. Correlations between anti-Teladorsagia circumcincta antibody levels in lambs and adults. Slope, intercept, adjusted R 2 and P-values are given for linear regressions.  Table. Full GWAS results for animal models of anti-Teladorsagia circumcincta IgA, IgE and IgG in lambs and adults, fitting SNP genotype as a factor. A and B indicate the reference and alternate allele at each SNP. CallRate is the genotyping success of the locus on the SNP50 BeadChip. MAF is the frequency of allele B (minor allele frequency). Wald P and Wald P Corrected are the association P-values before and after correction with genomic control λ, respectively. Significant indicates if the SNP was significantly associated with trait variation after correcting for multiple testing. Effect AA, AB and BB are the effect sizes from the animal model for each genotype relative to the model intercept. (CSV) S6 Table. Full association results for animal models of anti-Teladorsagia circumcincta IgA, IgE and IgG in lambs and adults, fitting imputed SNP genotypes as a factor. SNP.Type indicates whether the SNP was imputed from the HD chip or from the SNP50 BeadChip (unknown genotypes are also imputed for the SNP50 BeadChip in this analysis meaning that results will not exactly match those of S5 Table). A and B indicate the reference and alternate allele at each SNP. ImputeSuccess is the imputation success reported from the AlphaImpute analysis. MAF is the frequency of allele B (minor allele frequency). Wald P are the association P-values that have not been corrected for genomic control (see main text). Effect AA, AB and BB are the effect sizes from the animal model for each genotype relative to the model intercept. whether transcription occurs in the forward or reverse strand. Gene_id is the Ensembl identifier for the gene. Gene_name is the gene name associated with the gene_id. Gene_biotype indicates the type of gene (i.e. protein coding, RNA etc). Orthologue is the gene name of orthologues associated with the gene ID, with orthologue count giving the number of unique orthologues. Consensus locus is the gene name or likely gene name based on orthology. (CSV)

Supporting information
S8 Table. Gene Ontology information for loci (including orthologues) in S7 Table that are associated with immune and antibody phenotypes in humans (hsapiens), mice (mmusculus), cattle (btaurus) and sheep (oaries) obtained using biomaRt. Column names are as for S7 Table, including the following: gene_id is the sheep gene ID; Species = species as previous; ensembl_gene_id is the gene ID within that Species; external_gene_name is the gene name for that species; description is the full gene name; phenotype_description is a description of phenotypes associated with the gene; go_id is the GO term identifier; name_1006 is the GO term name; definition_1006 is the GO term definition. (CSV) S9