Susceptibility towards Enterotoxigenic Escherichia coli F4ac Diarrhea Is Governed by the MUC13 Gene in Pigs

Enterotoxigenic Escherichia coli (ETEC) F4ac is a major determinant of diarrhea and mortality in neonatal and young pigs. Susceptibility to ETEC F4ac is governed by the intestinal receptor specific for the bacterium and is inherited as a monogenic dominant trait. To identify the receptor gene (F4acR), we first mapped the locus to a 7.8-cM region on pig chromosome 13 using a genome scan with 194 microsatellite markers. A further scan with high-density markers on chromosome 13 refined the locus to a 5.7-cM interval. Recombination breakpoint analysis defined the locus within a 2.3-Mb region. Further genome-wide mapping using 39,720 informative SNPs revealed that the most significant markers were proximal to the MUC13 gene in the 2.3-Mb region. Association studies in a collection of diverse outbred populations strongly supported MUC13 as the most likely responsible gene. We characterized the porcine MUC13 gene, which encodes two transcripts: MUC13A and MUC13B. Both transcripts have the characteristic PTS regions of mucins that are enriched in distinct tandem repeats. MUC13B is predicted to be heavily O-glycosylated, forming the binding site of the bacterium, whereas MUC13A does not have the O-glycosylation binding site. Concordantly, 127 independent pigs homozygous for MUC13A across diverse breeds are all resistant to ETEC F4ac, and all 718 susceptible animals from the broad breed panel carry at least one MUC13B allele. Altogether, we conclude that susceptibility towards ETEC F4ac is governed by the MUC13 gene in pigs. The finding has an immediate translation into breeding practice, as it allows us to establish an efficient and accurate diagnostic test for selecting against susceptible animals. Moreover, the finding improves our understanding of mucins, which play crucial roles in defense against enteric pathogens. It reveals, for the first time, a direct interaction between MUC13 and enteric bacteria, an interaction that is poorly understood in mammals.


Introduction
Enterotoxigenic Escherichia coli (ETEC) expressing F4 (previously known as K88) fimbriae is a major cause of diarrhea in neonatal and pre-weaned piglets [1], which leads to considerable economic loss in the pig industry. The bacteria use fimbriae to adhere to specific receptors on the brush borders of enterocytes of the small intestine. Colonizing bacteria secrete deleterious enterotoxins that increase the secretion of electrolytes into the lumen. Water then flows into the lumen, resulting in diarrhea [1].
Three antigenic variants of F4 have been described: F4ab, F4ac and F4ad, of which F4ac is the most prevalent [2]. As early as 1977, Gibbons et al. [3] showed that the adherence to ETEC F4ac was inherited as an autosomal dominant Mendelian trait with the two alleles: S (adhesion, dominant) and s (non-adhesion, recessive). It is assumed that susceptibility towards ETEC F4ac is determined by the intestinal receptor that allows the bacterium to adhere to the intestinal tract or not. The identification of the receptor locus is thus desirable for the pig industry as it would enable us to accurately and efficiently eliminate the susceptible allele from nucleus breeding populations, leading to decreased mortalities caused by ETEC F4ac infection.
The locus encoding the intestinal receptor for ETEC F4ac, denoted as F4acR, was initially mapped to the q41 region on pig chromosome 13 (SSC13) by two independent linkage analyses [4][5]. The responsible region was subsequently refined to 5.7 cM by a meta-analysis of different experimental populations [6] and narrowed down to an interval of 3.1 Mb by haplotype sharing analysis [7]. More recently, the receptor locus has been further defined within the LMLN-S0283 region by recombination breakpoint analysis [8]. Several interesting candidate genes for F4acR in the critical region, including MUC4 [9], MUC13 [10], MUC20 [11] and TFRC [12][13], have been investigated, and genetic markers significantly associated with in vitro F4ac adhesion phenotypes in specific pig populations have been described [9][10][14][15]. However, the responsible gene and causal variant(s) of F4acR remain unknown so far. Using a battery of genetic analyses, we herein present compelling evidence that MUC13 is the responsible gene for the intestinal receptor conferring susceptibility to ETEC F4ac infection in pigs. We further identified MUC13 markers that are in complete linkage disequilibrium with the resistant causal allele in a broad panel of Western pig populations. The finding allows us to select for F4ac-resistant animals and should greatly benefit the worldwide pig industry.

Results and Discussion
Whole Genome Scan Confirms the Location of F4acR in the q41 Region on SSC13
To identify loci affecting economically important traits in pigs, we constructed a large scale White Duroc × Erhualian F3 intercross population [16], in which 755 F2 and 461 F3 animals were recorded for in vitro F4ac adhesion phenotypes by a microscopic enterocyte adhesion assay as described previously [17]. We genotyped the entire F2 pedigree for 194 microsatellite markers covering the pig genome and performed a whole genome scan. The linkage analysis mapped F4acR to a region of 7.8 cM flanked by SW207 and S0075 in the q41 region on SSC13, which confirmed the previous reports of other investigators [4][5].
Chromosome Scan with High-density Markers on SSC13 Refines F4acR to a 5.7-cM Region
To refine the location of F4acR, we increased the marker density in the SW207-S0075 interval on SSC13. A panel of 50 informative markers on SSC13, comprising 32 microsatellites and 18 SNPs, was genotyped on all animals in the White Duroc × Erhualian F2 cross. A multipoint linkage analysis identified the UMNp997-S0283 interval of 5.7 cM as the most likely region harboring F4acR, as the association of this region was 100-fold stronger than that of any other region in the genome (Figure 1A). The result was consistent with the recent mapping report of F4acR by Joller et al. [6].

Recombination Breakpoint Analysis Defines F4acR within a 2.3-Mb Interval
To further define the physical location of F4acR, we performed recombination breakpoint analysis in the White Duroc × Erhualian F3 intercross. The entire cross was genotyped for 23 informative markers flanking the 5.7-cM interval, and the F2 pedigree was further genotyped using PorcineSNP60 BeadChips (see below). We identified susceptible and resistant haplotypes of founder animals by their complete association with adhesion and non-adhesion phenotypes in the cross, respectively. Recombination events in the candidate region of F4acR were observed in one F2 animal (individual 1501) and one F3 animal (individual 3314). Individual 3314 was a non-adhesive animal and should be homozygous for the resistant allele. In the F4acR region, this animal carried a non-recombinant resistant chromosome from Erhualian founder sows and a recombinant chromosome from White Duroc founder boars. The recombinant SWR2054-UMNp595 interval was identical to the susceptible haplotype, which thus positioned F4acR downstream of UMNp595 (Figure 1B). Individual 1501 showed the adhesive phenotype and should be a carrier of the susceptible allele. The individual carried a non-recombinant resistant chromosome and a recombinant chromosome, both from White Duroc founder boars. The recombinant ALGA0072095-H3GA0037376 interval around the F4acR region was a resistant haplotype, which hence mapped F4acR upstream of ALGA0072095 (Figure 1B). Taken together, the breakpoint analysis unambiguously defined F4acR within the UMNp595-ALGA0072095 interval of 2.3 Mb (139.29 Mb to 141.59 Mb, Sscrofa10.2) on SSC13 (Figure 1B). This region refines the recently described 3.1-Mb interval of F4acR [7].
Genome-wide Association and Combined Linkage and Linkage Disequilibrium (LDLA) Analyses Reveal MUC13 as the Most Likely Gene for F4acR
To pinpoint the most probable candidate gene for F4acR in the defined 2.3-Mb region, we further genotyped all animals across the F2 intercross using PorcineSNP60 BeadChips [18]. We performed a genome-wide association study (GWAS) using 39,720 informative SNPs under a dominant model. The GWAS identified the most significantly associated SNP (ASGA0058923, corrected P value = 2.98 × 10⁻⁸) at 140.93 Mb on SSC13 (Sscrofa10.2, Figure 2A). The SNP was located in the 2.3-Mb interval. Six markers in or proximal to the 2.3-Mb region showed similar association strength (P < 1 × 10⁻⁷) to this SNP. We further performed LDLA analysis for F4acR using the 60K chip data and adhesion phenotype data of the F2 population. The analysis detected the most significant marker (MARC0096736) in the 2.3-Mb region on SSC13 (Figure 2B). The 360-kb interval from 140.93 to 141.29 Mb on SSC13 (Sscrofa10.2) appears to be the most probable region of F4acR, as it harbors the most significant markers in both GWAS and LDLA assays. The region contains 4 annotated genes: SLC12A8, HEG1, ITGB5 and MUC13.
Of the 4 genes, MUC13 appears to be a strong candidate, as F4acR has been shown to be a mucin-like sialoglycoprotein [19][20][21]. Mucins form the first line of host defense against enteric pathogens, but are also targets for microbial attachment, as they carry a variety of oligosaccharide structures that provide binding sites for bacteria [22][23]. MUC13 is a transmembrane mucin that is highly expressed in the jejunum of the pig [10]. It plays a protective role in intestinal inflammation by inhibiting epithelial cell apoptosis in mice [24]. Aberrant expression of human MUC13 is associated with a variety of epithelial carcinomas, including colorectal, intestinal-type gastric and ovarian cancers (for a review, see [25]). We have previously proposed MUC13 as an interesting candidate gene for F4acR [10]. More recently, Fu et al. [26] identified five promising candidate genes for F4acR, including MUC13, using GWAS. In the present study, MUC13 mapped to the 2.3-Mb critical region of F4acR and was proximal to the most significant SNPs in both GWAS and LDLA assays. We thus believe that MUC13 is the most likely responsible gene for F4acR.

Association Analysis in Outbred Populations Further Supports MUC13 as the Responsible Gene of F4acR
To acquire more evidence for the causality of MUC13, we characterized a large number of SNP markers around the 2.3-Mb critical region and performed a linkage disequilibrium based association analysis for F4acR in a collection of diverse outbred populations. In detail, we recorded F4ac adhesion phenotypes on 292 unrelated animals from 12 Chinese indigenous breeds and 3 Western commercial breeds (Table 1). These animals were genotyped for a total of 188 informative SNPs covering 24 annotated genes in the critical region. Of the 188 SNPs, 79 were from the MUC13 gene and 53 from another mucin gene (MUC4) that has also been proposed as a candidate for F4acR by other investigators [14][15].
Given that Chinese and Western pigs have different domestication origins and could differ in causal mutations within the F4acR gene, we first performed association analyses separately on Chinese and Western pigs. We found that MUC13 g.28784 T>C was the most significantly associated marker in both Chinese and Western pigs. Notably, this SNP distinguished susceptible from resistant animals with an accuracy of more than 97% (144 out of 148) in the 148 independent Western pigs (Table 2). It provides an excellent diagnostic DNA marker for selecting against genetically susceptible animals in Western commercial pigs. We have developed a diagnostic test for the SNP and are applying the test on nucleus animals of Western commercial breeds in China. The result is expected to benefit animals and breeders by protecting against the pathological condition and reducing economic losses.
Figure 1. (A) Markers on SSC13 (Table S3) were genotyped on all animals in the White Duroc × Erhualian F2 intercross, and a multipoint linkage analysis was performed to localize the receptor locus. The confidence interval from UMNp997 to S0283 for the locus is indicated by dashed vertical lines. (B) Recombination breakpoint analyses define the locus within a 2.3-Mb region. The diagram shows recombination breakpoint events in the candidate region of F4acR in individuals 1501 and 3314. The Erhualian-derived resistant chromosome is indicated in blue, and the White Duroc-derived resistant chromosome is marked in green. The recombinant susceptible haplotype from White Duroc founder boars is highlighted in red. Polymorphisms are displayed at the respective gene or microsatellite markers. The positions of polymorphisms (Table S3 and SNPs on the 60K chip) are shown according to the pig genome assembly (Sscrofa10.2). Microsatellite alleles are numbered consecutively from shortest to longest fragments. For SNP markers the allele with the higher frequency is denoted 1, and the allele with the lower frequency is denoted 2. doi:10.1371/journal.pone.0044573.g001
Of note, when we performed association analyses across Chinese and Western pigs, the six most significant SNPs were all located in the MUC13 gene. These SNPs had 1000-fold stronger association than any other SNP including MUC4 SNPs ( Figure 3). This observation strengthens the assumption that MUC13 is the responsible gene for F4acR.

MUC13 is a Single Copy Gene that Encodes Two Transcripts (MUC13A and MUC13B) with Distinct PTS Domains
We have previously isolated a 2679-bp cDNA of pig MUC13 (NM_001105293) that was highly expressed in the jejunum. As the deduced MUC13 protein lacked the typical N-terminal PTS region of mammalian mucins, which is enriched in proline, threonine and/or serine, we speculated that pig MUC13 could have another, much longer transcript containing the PTS region [10]. To test this hypothesis, we performed rapid amplification of 5′ cDNA ends (5′ RACE) assays using both Clontech SMART and TaKaRa technologies as described in Methods. The RACE assays identified two MUC13 transcripts that were extended relative to our previous finding [10]. The two transcripts, namely MUC13A (JN613414) and MUC13B (JN613417), share the same 5′ UTR of 35 bp, transcription start site and 3′ UTR of 1497 bp, but have distinct PTS regions that are rich in tandem repeats spanning approximately 3-5 kb (Figure 4). The PTS oligopeptide core repeat units in MUC13A are 8-9 amino acid residues long, with the two most abundant types being ASTSAPSA and ASTSAPAAG, whereas the repeat unit in MUC13B is a string of 8 amino acid residues consisting mainly of threonine and proline (TPTPTTTP or TPTPTTTL). It is noteworthy that we could not characterize the exact number and length of repeats in either transcript, as the repetitive sequences could not be amplified or sequenced using currently available technologies, possibly due to the complex secondary structures of the sequences. Nevertheless, Southern blot analysis revealed that the tandem repeat region was approximately 3-5 kb long (data not shown).
To determine the complete genomic DNA sequence of pig MUC13, we screened 4 pig genomic DNA libraries and identified positive BAC/PAC clones encompassing the MUC13 gene. Using Solexa deep sequencing technology, we obtained the DNA sequences of these clones (JN613413, JN613416) and characterized the genomic structure of the porcine MUC13 gene. Each BAC clone contained a single MUC13 gene, corresponding to one of the two abovementioned MUC13 transcripts (Figure 4). The two types of MUC13 DNA sequences (JN613415, JN613418) exhibit a high degree of sequence identity at the nucleotide level (>95%), and both consist of 12 exons and 11 introns (Figure 4). The nucleotide differences between the MUC13A and MUC13B DNA sequences are found predominantly in the PTS region in exon 2 and its flanking intronic sequences. For instance, we identified an Indel of 68 bp in intron 2, with the longer sequence in MUC13A and the shorter sequence in MUC13B (Figure S1). The Indel was used as a diagnostic marker for the MUC13A and MUC13B alleles in the following analyses. As with the cDNA analysis, we could not determine the complete DNA sequence of the PTS regions, as the Solexa sequencing technology generated short paired-end reads of 148 bp that cannot reveal the definite number of tandem repeats. Sequencing mucin genes has been shown to be technically difficult due to the large size and the repetitive structure of these molecules. For example, sequence information for the PTS region is also missing for MUC3A, MUC6, MUC7, MUC12 and MUC13 in cattle [27].
To examine if MUC13A and MUC13B transcripts are encoded by a single gene or two separate loci, we developed a genomic qPCR assay to quantify copy numbers of MUC13 in the pig genome. The copy number assay measured the relative copy ratio between MUC13 and the reference GAPDH gene. We performed the assay on 60 representative pigs from Chinese and Western diverse breeds. The assay showed that all tested animals had a single MUC13 gene with the copy number ratio of 1.0 to GAPDH ( Figure 5). It demonstrates that MUC13A

MUC13A is Completely Associated with the Resistant Phenotype Across Diverse Breeds
To examine the effect of MUC13A and MUC13B alleles on susceptibility to ETEC F4ac, we genotyped a large sample of pigs (n = 718) from diverse breeds for the diagnostic Indel marker, and analyzed association of the two MUC13 alleles with F4ac adhesion phenotypes in these pigs. We found that all 124 pigs homozygous for the MUC13A allele from the broad breed panel were resistant (non-adhesive) to ETEC F4ac. Moreover, all 594 susceptible animals carried at least one MUC13B allele ( Table 1). The complete association of MUC13A with the non-adhesive phenotype across diverse breeds provides compelling evidence that resistance to ETEC F4ac is governed by the porcine MUC13 gene.  It is noteworthy that the MUC13B allele is associated with both susceptibility and resistance towards ETEC F4ac. Nevertheless, we noticed that of the 188 SNPs around the 2.3-Mb region, only MUC13 SNPs (n = 10) showed the complete (100%) association with the adhesion phenotypes in Western MUC13B homozygous pigs (Table S1). It further supported the MUC13 gene as F4acR.
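The two statements of complete association above can be expressed as simple checks over genotype and phenotype records. The following is a minimal sketch, not the authors' analysis code: the genotype codes ("AA", "AB", "BB" for the MUC13A/MUC13B Indel alleles), the phenotype labels and the example records are all hypothetical.

```python
def check_complete_association(records):
    """Check the two concordance statements on (genotype, phenotype) pairs.

    Genotypes are hypothetical codes at the diagnostic Indel:
    "AA" = MUC13A homozygote, "AB" = heterozygote, "BB" = MUC13B homozygote.
    Phenotypes are "resistant" or "susceptible".
    Returns (all AA animals resistant, all susceptible animals carry B).
    """
    aa_all_resistant = all(
        phenotype == "resistant"
        for genotype, phenotype in records
        if genotype == "AA"
    )
    susceptible_carry_b = all(
        "B" in genotype
        for genotype, phenotype in records
        if phenotype == "susceptible"
    )
    return aa_all_resistant, susceptible_carry_b


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    example = [
        ("AA", "resistant"),
        ("AB", "susceptible"),
        ("BB", "resistant"),    # MUC13B can also occur in resistant animals
        ("BB", "susceptible"),
    ]
    print(check_complete_association(example))
```

Note that a resistant MUC13B homozygote does not violate either check, which mirrors the observation that MUC13B is found in both susceptible and resistant animals.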

MUC13 is Perfectly Concordant with the Biochemical Properties of F4acR
To elucidate why MUC13B is associated with both susceptibility and resistance while MUC13A confers only resistance to ETEC F4ac, we analyzed the O-glycosylation sites of the two MUC13 transcripts, as these sites are presumed to form the binding site of the bacteria [23]. It is well established that mucins are often very densely O-glycosylated, i.e. many short O-linked glycans, such as N-acetylgalactosamine (GalNAc), are added to the mucin peptides. O-glycosylation is essential for the function of mucins, as it is required to maintain an extended conformation that creates a long, filamentous structure. These highly elaborate structures allow mucins to mediate the interactions between epithelia and their surroundings. Abnormal interactions have been implicated in many disease processes, including infectious and inflammatory diseases, cancer and metastasis [23].
Protein motif/domain analysis showed that MUC13A does not have O-glycosylation sites (Figure 4, Figure S2). This indicates that the MUC13A peptide cannot be O-glycosylated to form the filamentous structure required for the attachment of ETEC F4 fimbriae. It hence explains why MUC13A homozygous animals are all resistant to ETEC F4ac. MUC13B, in contrast, has potential O-glycosylation sites predominantly in the PTS region (Figure 4, Figure S2). Therefore, MUC13B could be heavily or lightly O-glycosylated depending on the variable tandem repeat sequences of the PTS region. This is concordant with the observation that MUC13B is associated with both susceptibility and resistance towards ETEC F4ac.
The most abundant amino acids for the receptor proteins are threonine (49%) and proline (25%) [19]. The PTS region of MUC13B is enriched in tandem repeats of TPTPTTTP or TPTPTTTL with a proportion of threonine to proline being about 2:1, which is perfectly consistent with the protein properties of F4ac receptor. A search of homologous sequence against the latest porcine genome assembly (Sscrofa10.2) did not find any other sequence similar to the MUC13B PTS sequence. These findings give additional strong supporting evidence for the MUC13 gene determining susceptibility/resistance to ETEC F4ac.
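The threonine-to-proline proportion quoted above can be checked directly from the two reported repeat units. The sketch below simply counts residues in TPTPTTTP and TPTPTTTL; weighting the two units equally is an assumption made here for illustration, since the actual number of each repeat in the PTS region could not be determined.

```python
from collections import Counter

# Core repeat units of the MUC13B PTS region, as reported in the text.
MUC13B_REPEATS = ["TPTPTTTP", "TPTPTTTL"]


def residue_counts(units):
    """Count amino-acid residues (one-letter codes) over a list of repeat units."""
    counts = Counter()
    for unit in units:
        counts.update(unit)
    return counts


def thr_pro_ratio(units):
    """Return the threonine:proline ratio over the given repeat units."""
    counts = residue_counts(units)
    return counts["T"] / counts["P"]


if __name__ == "__main__":
    print(dict(residue_counts(MUC13B_REPEATS)))
    print(thr_pro_ratio(MUC13B_REPEATS))
```

Under the equal-weighting assumption the two units together contain 10 threonines and 5 prolines, which matches the stated proportion of about 2:1.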

Variable Tandem Repeats (VNTR) in the PTS Region of MUC13 are Potential Causal Variant(s)
Quantitative RT-PCR analysis showed that the expression level of MUC13 did not differ significantly in the small intestine of susceptible and resistant animals (Figure S3). The finding is consistent with the recent report that the expression of MUC13 is not related to susceptibility towards ETEC F4ac [28]. This indicates that the more probable causative mutation(s) are coding variants altering the function of MUC13. To identify MUC13 causative mutation(s), we screened variants in the complete coding region, except for the PTS repetitive sequences, using RNA of susceptible and resistant animals from both White Duroc and Erhualian breeds. We detected 24 cSNPs, of which 14 were nonsynonymous. All cSNPs, along with 55 intronic SNPs of MUC13, were included in the data set of 188 SNPs that were genotyped on the 292 outbred pigs. To test whether these mutations contribute to susceptibility towards ETEC F4ac, we analyzed their association with the adhesion phenotypes in the 292 animals. The protein-altering SNPs occurred in both susceptible and resistant animals, thereby excluding them as the causative mutation.
As mentioned above, large PTS regions with variable tandem repeat sequences are characteristic of mucins. The regions constitute O-glycosylation sites that are essential for the biological functions of mucins. Hence, variation in the number, length and sequence of the tandem repeats can affect the extent and type of glycosylation and consequently the functions of mucins. For example, variable tandem repeats in a variety of mucins have been associated with disease susceptibility in humans (for a review, see [29]). This knowledge led us to hypothesize that variable tandem repeats in the PTS region are the most probable causative mutations in the MUC13 gene. Chinese and Western pigs are expected to have evolved multiple VNTR alleles in the PTS region that govern susceptibility/resistance to ETEC F4ac. If so, SNPs showing complete LD with the causative mutations are unlikely to be detected, just as observed in this study. Currently, the variable tandem repeats cannot be characterized due to amplification and sequencing failure. Further investigation will be directed at validating our hypothesis using next-generation technologies.
Figure 5. The quantitative analysis for MUC13 copies by a real time TaqMan PCR assay. Both susceptible (+) and resistant (−) animals randomly sampled from Western and Chinese pigs were used for the copy number assay. These animals (n = 60) were classified into 10 groups according to their genotypes at the diagnostic Indel site and F4ac adhesion phenotypes. Each group included 6 animals, and each animal was analyzed in triplicate. Estimation of copy number was performed by the comparative CT relative quantification assay. The y-axis is the ratio of MUC13 copy to the reference GAPDH copy. The assay shows that the porcine MUC13 gene is a single copy gene. doi:10.1371/journal.pone.0044573.g005

Summary of the Supporting Evidence for MUC13 as the Responsible Gene
We herein described the causality of the MUC13 gene for susceptibility/resistance to ETEC F4ac in pigs. The causality is established on the basis of the following arguments: (1) MUC13 maps to the 2.3-Mb critical region containing F4acR; (2) MUC13 is proximal to the most significant markers in both GWAS and LDLA analyses based on a large-scale SNP scan across the pig genome; (3) of the 188 SNPs around the critical region, the six most significant SNPs, which had 1000-fold stronger association than any other SNP in diverse outbred populations, were all located in MUC13; (4) the MUC13A allele was completely associated with the F4ac non-adhesion phenotype across diverse pig populations; (5) all susceptible animals from the broad breed panel carried at least one MUC13B allele; (6) MUC13 is perfectly consistent with the known biochemical properties of F4acR, as MUC13B has the unique O-glycosylation region that forms the binding site of the bacterium and is rich in threonine and proline, while MUC13A does not. Altogether, these data allow us to conclude that the MUC13 gene confers susceptibility/resistance to ETEC F4ac in pigs.
Overall, our findings have important practical consequences and will have an immediate impact on pig breeding programs, as they allow the rapid elimination of the susceptible allele and consequently greatly benefit animal welfare and the pig industry. The findings also provide novel insights into the functions of mammalian mucins, as they establish, for the first time, a direct interaction between MUC13 and enteric bacteria. Further work will be directed at identifying causative mutations in the MUC13 PTS region, which cannot be amplified and sequenced using current technologies.

Ethics Statement
All animal work was conducted according to the guidelines for the care and use of experimental animals established by the Ministry of Agriculture of China. The ethics committee of Jiangxi Agricultural University specifically approved this study.

Animals
Experimental animals were from one White Duroc × Erhualian F3 intercross population, one Western commercial population, one Chinese cultivated population (Sutai) and 15 outbred populations. The intercross population was constructed with two divergent founder breeds: White Duroc and Chinese Erhualian. Two White Duroc boars were mated to 17 Erhualian sows, and 9 F1 boars were then intercrossed with 59 F1 sows, avoiding full-sib mating, to generate 1912 F2 animals, of which 87 boars and 299 sows were intercrossed to produce 5311 F3 animals. In this study, 755 F2 and 461 F3 animals at day 240 were slaughtered for ETEC F4ac adhesion phenotype recording. The management of the experimental population has been described previously [16]. The Western commercial population included 260 hybrid pigs at day 180 that were produced from a three-way cross between 24 Duroc boars and 24 Landrace × Large White hybrid sows in 5 farms. The Chinese Sutai population comprised 166 adult pigs at day 240 from 6 sire families. The breed was developed by 18 generations of selection from a Duroc (50%) × Erhualian (50%) cross in 1986. A total of 292 unrelated individuals at the age of 6 to 8 weeks were sampled from 15 Chinese breeds and 3 Western breeds (Table 1).
To represent a broad consanguinity, animals of each Chinese breed except for Lantang pigs were collected from at least 3 unrelated sire families (no common ancestry for 3 generations), each with 2 to 4 animals. For the three Western breeds, piglets of each breed were collected from 5 nucleus populations representing 18 (Duroc), 8 (Landrace) and 21 (Large White) sire families. Genomic DNA was extracted from ear tissues using a routine phenol/chloroform method and diluted to a final concentration of 20 ng/μl.

Phenotype Recording
A microscopic enterocyte adhesion assay developed by Baker et al. [30] was adopted to record in vitro ETEC F4ac adhesion phenotypes, with slight modification as described previously [17]. In brief, brush borders of enterocytes were harvested from a 2-cm segment of the jejunum collected from each animal within 30 min after slaughter. These brush borders were then incubated with F4ac bacterial suspension and 50 μl of mannose (0.4 mg ml⁻¹) at 37°C for 30 min with gentle shaking. Each brush border was subsequently tested for its adhesion with F4ac by phase contrast microscopy (Leica). A total of 20 well-separated and intact brush borders were examined in each specimen. In cases where fewer than four brush borders bound more than two bacteria, an additional 20 brush borders were scored. According to the classification standard proposed by Baker et al. [30], individuals were classified as susceptible (adhesive) to ETEC F4ac when at least 10% of the brush borders bound more than two bacteria. Specimens in which every brush border bound fewer than two bacteria were considered resistant (non-adhesive). All other specimens were considered weakly adhesive.
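The scoring rule described above can be sketched as a small classifier. This is an illustrative reading of the text rather than the authors' protocol code: the input is assumed to be a list with the number of bacteria bound to each scored brush border, and the function names are hypothetical.

```python
def needs_second_round(first_round_counts):
    """Return True if an additional 20 brush borders should be scored.

    Per the text, this applies when fewer than four of the first
    brush borders bound more than two bacteria.
    """
    strongly_bound = sum(1 for n in first_round_counts if n > 2)
    return strongly_bound < 4


def classify_specimen(bacteria_per_border):
    """Classify a specimen from per-brush-border bacterial counts.

    adhesive:        at least 10% of brush borders bound more than two bacteria
    non-adhesive:    every brush border bound fewer than two bacteria
    weakly adhesive: everything else
    """
    total = len(bacteria_per_border)
    if total == 0:
        raise ValueError("at least one brush border must be scored")
    strongly_bound = sum(1 for n in bacteria_per_border if n > 2)
    # Integer comparison avoids floating-point issues at exactly 10%.
    if strongly_bound * 10 >= total:
        return "adhesive"
    if all(n < 2 for n in bacteria_per_border):
        return "non-adhesive"
    return "weakly adhesive"


if __name__ == "__main__":
    # Hypothetical counts for 20 brush borders.
    print(classify_specimen([3, 4] + [0] * 18))
    print(classify_specimen([0] * 20))
    print(classify_specimen([3] + [0] * 19))
```

One design point worth noting: a brush border that binds exactly two bacteria counts neither towards the adhesive threshold nor as fully non-adhesive, so such specimens fall into the weakly adhesive class under this reading.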

Whole Genome and Chromosome Scan
A panel of 194 informative microsatellite markers covering the pig genome was genotyped across the White Duroc × Erhualian F2 population as described in Guo et al. [16]. To identify SNPs in the mapped region of F4acR on SSC13, genomic DNA of F1 boars was amplified with the primers listed in Table S2, and amplicons were directly sequenced on a 3130xl Genetic Analyzer (Applied Biosystems) using the original primers. Additional microsatellite markers in the critical region were mined from the pig genome assembly (Sscrofa10 at http://www.ncbi.nlm.nih.gov/mapview/map_search.cgi?taxid=9823). The newly developed microsatellite and SNP markers were genotyped for all animals in the intercross F2 pedigree with the primers given in Table S3, using fluorescent dye-labeled primers (for microsatellites), SNaPshot (Applied Biosystems) and PCR-RFLP technologies. A multipoint linkage analysis was performed on SSC13 with Allegro version 2 [31].

Recombination Breakpoint Analysis
Haplotypes in the candidate region of F4acR were reconstructed for all tested animals in the White Duroc × Erhualian cross using SimWalk2 software [32][33]. Susceptible and resistant haplotypes were determined by their complete association with adhesion (susceptible) and non-adhesion (resistant) phenotypes in the resource population, respectively. F2 and F3 animals that carried a recombinant susceptible haplotype from the founder animals in the F4acR region were explored to define the genomic location of F4acR by recombination breakpoints.

Genome-wide Association and LDLA Analyses
The PorcineSNP60 BeadChips (Illumina) were used to genotype all animals across the White Duroc × Erhualian F2 intercross on an Illumina iScan System following the manufacturer's protocol. Bead arrays with a call rate <85% were excluded from further analyses. Genome-wide association studies were performed with GenABEL [34] on all SNPs with a minor allele frequency (MAF) >0.05 and call rate >95%. First, a generalized linear mixed model was fitted to adjust for the polygenic effect. The model was formulated as y = μ + Zu′ + e, where y is the vector of adhesion phenotypes (1 for adhesive and 0 for non-adhesive), μ is the mean, Z is the kinship matrix, u′ is the random effect and e is the residual. The residual from the fitted model was then used to evaluate association with genotypes by a score test [35][36].
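The per-SNP quality filter stated above (MAF above 0.05 and call rate above 95%) can be illustrated with a short sketch. This is not GenABEL code; it assumes genotypes are coded as the count of one allele (0, 1 or 2) with None for a missing call, and the function names are hypothetical.

```python
def call_rate(genotypes):
    """Fraction of non-missing genotype calls for one SNP."""
    if not genotypes:
        return 0.0
    return sum(1 for g in genotypes if g is not None) / len(genotypes)


def minor_allele_frequency(genotypes):
    """Minor allele frequency from 0/1/2 allele counts, ignoring missing calls."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    return min(freq, 1.0 - freq)


def passes_qc(genotypes, min_maf=0.05, min_call_rate=0.95):
    """Apply the thresholds from the text: MAF > 0.05 and call rate > 95%."""
    return (
        call_rate(genotypes) > min_call_rate
        and minor_allele_frequency(genotypes) > min_maf
    )


if __name__ == "__main__":
    # Hypothetical genotype vectors for three SNPs.
    print(passes_qc([0, 1, 2, 1] * 10))     # polymorphic, fully called
    print(passes_qc([0] * 40))              # monomorphic
    print(passes_qc([0, 1, 2, None] * 10))  # low call rate
```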
LDLA was performed using a haplotype-based approach, under the assumption that each current founder population originated from an ancestral population (K = 20) after N generations of recombination and drift. The haplotypes of the ancestral population can be reconstructed by a hidden Markov model [37]. For each individual in the intercross population, the genotype at each polymorphic site can be traced back to its ancestor K. Finally, the association of the adhesion phenotypes (1 for adhesive and 0 for non-adhesive) with genotypes was tested.

Association of SNPs with F4ac Adhesion Phenotypes in Outbred Populations
Polymorphisms in the responsible region of F4acR defined by recombination breakpoint analysis were identified by comparative sequencing of genomic DNA of two adhesive White Duroc and two resistant Chinese Erhualian animals using the primers given in Table S4. A final panel of 188 informative SNP markers (Table S4) in 24 genes was genotyped on the 292 purebred animals with adhesion phenotypes using the iPLEX SEQUENOM MassARRAY platform. SNP genotype calls were filtered and checked manually, and aggressive calls were omitted from the dataset. Associations of SNP markers with F4ac adhesion phenotypes were evaluated with the standard χ² test implemented in BEAGLE [38].
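As an illustration of a single-marker association test, the sketch below computes a Pearson chi-square statistic on a contingency table of genotype class by adhesion phenotype. It assumes that the test in question is a plain Pearson chi-square test; it is not BEAGLE's implementation, and the counts used are hypothetical.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table.

    `table` is a list of rows of observed counts, e.g. rows = genotype
    classes and columns = (adhesive, non-adhesive). Every row and column
    total must be positive, otherwise an expected count would be zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    if min(row_totals) <= 0 or min(col_totals) <= 0:
        raise ValueError("every row and column total must be positive")
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


if __name__ == "__main__":
    # Hypothetical counts: rows are carriers / non-carriers of an allele,
    # columns are adhesive / non-adhesive animals.
    print(chi_square_statistic([[10, 0], [0, 10]]))  # complete association
    print(chi_square_statistic([[5, 5], [5, 5]]))    # no association
```

The degrees of freedom for such a table are (rows − 1) × (columns − 1); converting the statistic to a P value would require a chi-square distribution function, which is left out of this sketch.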

Isolation of the Complete cDNA and Genomic DNA Sequence of the MUC13 Gene
MUC13-specific primers F1/R1 and F2/R2 (Table S5) were designed from the 5′- and 3′-regions of the previously isolated MUC13 mRNA sequence (NM_001105293). Three BAC clones and one PAC clone harboring the complete porcine MUC13 gene were identified by PCR screening of four genomic DNA libraries constructed from Western [39][40][41] or Chinese Erhualian pigs [42] using the MUC13-specific primers. These clones were sequenced at 300× coverage with Solexa (Illumina) technology at the Beijing Genomics Institute, Shenzhen.
Total RNA was extracted from the jejunum of both adhesive and non-adhesive F2 animals using the RNeasy Fibrous Tissue Mini Kit (Qiagen). First-strand complementary DNA was synthesized with the SMART RACE cDNA Synthesis Kit (Clontech) and the 5′-Full RACE Kit (TaKaRa). To obtain the extended 5′-end of the MUC13 cDNA, the first-strand cDNA (Clontech) was first amplified with the MUC13 nested primers F3/NF3 (Table S5) and the universal primers UPM/NUP (Clontech, Table S5). To isolate further 5′-end cDNA sequence, primers F4/NF4 and F5/NF5 (Table S5) were designed from a conserved region of the first exon of mammalian MUC13 and from the extended 5′ cDNA sequence obtained by Clontech RACE. These primers, together with the 5′ RACE Outer and Inner Primers (TaKaRa, Table S5), were used to amplify the first-strand cDNA (TaKaRa). Primers F6/R6 (Table S5) were designed to amplify a fragment filling the gap in the MUC13A transcript. All RACE PCR products were cloned into the pGEM-T Easy Vector (Promega) and sequenced using the M13 universal primer. The full-length MUC13 cDNA sequence was obtained by joining the 5′ RACE amplicon sequences with the previously isolated cDNA sequence [10]. The complete MUC13 genomic DNA sequence was determined by aligning the obtained cDNA sequence with the BAC/PAC sequences.

Copy Number Assay
MUC13- and GAPDH-specific amplicons of 437 bp and 368 bp were generated by routine PCR with primers MUC13-FP1/RP1 and GAPDH-FP1/RP1 (Table S6), respectively. The two amplicons were joined into an 805-bp fragment by bridge PCR using primers MUC13-FP1 and GAPDH-RP1 (Table S6). The fused fragment was cloned into a pGEM-T Easy vector (Promega). Sequence analysis confirmed that the recombinant plasmid clone contained a single copy each of the MUC13 and GAPDH fragments. The plasmid DNA was used as the reference sample in subsequent genomic qPCR assays to determine the copy number of MUC13 in the pig genome.
TaqMan probes and primers (Table S6) were designed for the target (MUC13) and reference (GAPDH) genes. The target and reference probes were 5′-labeled with 6-FAM and VIC, respectively. Both probes were 3′-labeled with the minor groove binder non-fluorescent quencher (ABI). The amplification efficiencies of MUC13 and GAPDH were measured and validated by the CT slope method over a fivefold dilution series of the reference DNA. Standard curves were created by plotting the CT values against the logarithm of the DNA amount. Genomic qPCR assays were performed on 60 independent animals from diverse Chinese and Western breeds. The 60 animals were classified into 10 groups according to their adhesion phenotypes and their genotypes at the MUC13A and MUC13B alleles. The target/reference ratios of all samples were normalized to the target/reference ratio of the calibrator sample (the plasmid DNA) using the method described in [43]. Each sample was analyzed in triplicate. The results are expressed as the fold ratio of the normalized target amount to the reference amount. All quantitative PCR assays were performed on a 7500 FAST Real-Time PCR system (ABI).
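The calculation outlined above can be sketched in Python as follows. The exact formula of [43] is not reproduced in this section, so the sketch assumes the widely used efficiency-corrected ratio, in which the per-cycle amplification factor E is derived from the standard-curve slope as E = 10^(-1/slope). All function and parameter names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def fit_slope(log10_amounts: Sequence[float], ct_values: Sequence[float]) -> float:
    """Least-squares slope of CT plotted against log10(DNA amount)."""
    x_bar = mean(log10_amounts)
    y_bar = mean(ct_values)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(log10_amounts, ct_values))
    sxx = sum((x - x_bar) ** 2 for x in log10_amounts)
    return sxy / sxx


def amplification_factor(slope: float) -> float:
    """Per-cycle amplification factor; 2.0 corresponds to perfect doubling."""
    return 10 ** (-1.0 / slope)


def normalized_ratio(
    e_target: float,
    e_reference: float,
    ct_target_sample: Sequence[float],
    ct_reference_sample: Sequence[float],
    ct_target_calibrator: Sequence[float],
    ct_reference_calibrator: Sequence[float],
) -> float:
    """Target/reference ratio of a sample relative to the calibrator.

    Replicate CT values (e.g. triplicates) are averaged before use.
    Assumed formula: efficiency-corrected relative quantification.
    """
    target = e_target ** (mean(ct_target_calibrator) - mean(ct_target_sample))
    reference = e_reference ** (
        mean(ct_reference_calibrator) - mean(ct_reference_sample)
    )
    return target / reference
```

Because the calibrator plasmid carries one MUC13 fragment per GAPDH fragment, a normalized ratio near 1 in this sketch would indicate the same MUC13:GAPDH copy ratio in the genomic sample as in the plasmid.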

Computational Analyses of MUC13 Domains
Computational analyses were performed to identify the protein domains of MUC13A and MUC13B. Since the exact number of tandem repeats in the PTS region was not known, we initially assumed a repeat number of 10 for the following analyses. To ensure that the analyses were robust, we compared the results obtained from sequences with repeat numbers varying from 10 to 100. The protein domains were identified using Pfam [44]; the GlcNAc O-glycosylation sites and N-glycosylation sites were predicted using the DictyOGlyc and NetNGlyc servers, respectively [45]; the coiled-coil structures were analyzed using COILS [46]; and the disordered regions were predicted using RONN [47].