Cloning, Annotation and Developmental Expression of the Chicken Intestinal MUC2 Gene

Intestinal mucin 2 (MUC2) encodes a heavily glycosylated, gel-forming mucin, which creates an important protective mucosal layer along the gastrointestinal tract in humans and other species. This first line of defense guards against attacks from microorganisms and is integral to the innate immune system. As a first step towards characterizing the innate immune response of MUC2 in different species, we report the cloning of a full-length, 11,359 bp chicken MUC2 cDNA, and describe the genomic organization and functional annotation of this complex, 74.5 kb locus. MUC2 contains 64 exons and demonstrates distinct spatiotemporal expression profiles throughout development in the gastrointestinal tract; expression increases with gestational age and from anterior to posterior along the gut. The chicken protein has a domain organization similar to that of the human orthologue, with a signal peptide and several von Willebrand domains in the N-terminus and the characteristic cystine knot at the C-terminus. The PTS domain of the chicken MUC2 protein spans ∼1600 amino acids and is interspersed with four CysD motifs. However, the PTS domain in the chicken diverges significantly from the human orthologue; although the chicken domain is shorter, the repetitive unit is 69 amino acids in length, which is three times the length of the human repeat. The amino acid composition shows very little similarity to the human motif, which potentially contributes to differences in the innate immune response between species, as glycosylation across this rapidly evolving domain provides much of the mucosal barrier. Future studies of the function of MUC2 in the innate immune response system in chicken could provide an important model organism to increase our understanding of the biological significance of MUC2 in host defense and highlight the potential of the chicken for creating new immune-based therapies.


Introduction
The vast majority of the gastrointestinal tract is covered by a mucosal surface, which creates an important biological barrier that shields the epithelial lining. The top layer of the mucus gel surface, which is the first line of the innate immune defense, is composed primarily of a family of proteins called mucins (MUC). Mucin family members are broadly grouped into secretory and membrane-associated mucins. Membrane-associated mucins are involved in signal transduction, oncogenic processes and/or gel formation [1]. Secretory gel-forming mucins (i.e. MUC2, MUC5AC, MUC5B, MUC6, MUC7 and MUC19) contain at least one repetitive domain rich in Pro, Thr and Ser (i.e. the PTS domain), as well as von Willebrand domains (B, D or C), a cysteine-rich domain (CysD), and a cystine knot (CT) [2,3]. O-linked glycosylation occurs in the PTS domain, while the VWB, VWD, VWC, CysD and CT regions facilitate oligomerization and polymerization. In the small and large intestine, the primary gel-forming mucin is MUC2, although there are detectable levels of MUC5AC and MUC6 in the large intestine [4].
Human MUC2 is a large (5179 amino acid) heterologous glycoprotein that can be modified post-translationally with more than 100 different oligosaccharides [5]. The oligosaccharides attach along the middle of the protein throughout the mucin domain [6]. The cystine knots at the C-terminus facilitate homodimerization in the endoplasmic reticulum [7], while trimerization occurs in the Golgi through the formation of disulfide bonds at the N-terminus [8]. This produces a 6-membered homopolymer that potentially oligomerizes into hexagonal sheets [4,9,10,11]. Interactions between internal CysD sites create the 3D architecture of the mucosal gel surface [12]. In the intestinal lumen, the charged sugar chains efficiently trap water molecules, creating a stable continuous network that functions analogously to a protective semi-permeable membrane [13]. This protective structure is continually assaulted by physical shear stress due to luminal fluid flow, microbial foraging and erosion from proteases or chemical degradation [4].
MUC2 is fundamental in maintaining the architecture of the gel layer on the intestinal surface and in preventing microorganisms from approaching the innermost mucus layer [6]. Alternative splicing of MUC2 and the heterogeneous attachment of the sugar molecules generate a highly heterogeneous mucin gel layer, which creates a broad innate defense mechanism within the gastrointestinal tract. Deficiency of, or missense mutations in, Muc2 cause the epithelial barrier to become permeable to bacteria, leading to colonic inflammation and spontaneous colitis in mice [14,15], as well as increased susceptibility to infection by enteric nematodes [16]. In humans, rare short MUC2 exonic minisatellites composed of sequences from the tandem repeat PTS cassettes have been associated with the onset of gastric cancer [17].
Functional annotation of MUC2 in humans indicates the presence of two polymorphic PTS cassettes [18] and 11 alternatively spliced MUC2 transcripts (UniProtKB, Swiss-Prot) [19]. In addition, analysis of MUC2 in the LS174T derived HM7 colon cancer cell line led to the identification of a transcript variant that lacked the second PTS domain [20]. The presence of this highly polymorphic PTS VNTR (variable number of tandem repeats) has inhibited the resolution of the full-length mRNA, as well as the functional annotation of the complete DNA sequence in many species, including mouse and human [4,9,10]. Despite these efforts, the precise annotation of these alternatively spliced MUC2 transcripts remains incomplete, and the length of the PTS domain, which is predicted to span 55-110 cassettes, remains highly polymorphic within the human population [18]. Although the biological relevance of these alternatively spliced products in human is not fully understood, it is believed that they are associated with pathogenesis of intestinal diseases. Although functional studies in mice have indicated that Muc2 plays roles in the biology and health of the gut [15,21,22,23], the function of the PTS domain in mice is less clear, due to the annotation of a relatively short and imprecise repetitive cassette [24].
Evolutionary studies predict that the gel-forming mucins share a common ancestor with lower metazoa, as their domain structures are well conserved across a wide range of species from invertebrates to humans [3,25]. However, relatively few MUC genes have been identified in avians and amphibians. The first mucin gene cloned in chicken was ovomucin alpha-subunit [26], now annotated as MUC5B. In silico predictions [3,27] and annotation of short mRNAs and expressed sequence tags (ESTs) have generated a putative partial MUC2 cDNA in chicken. However, these studies have provided very little functional annotation evidence of the genomic organization of the chicken MUC2 locus. To determine the structure, expression, biosynthesis and gene signatures of intestinal mucins from a functional and evolutionary perspective, we cloned the chicken MUC2 cDNA that encodes the MUC2 peptide backbone. We achieved this by analyzing and assembling more than 85 cDNA clones that were generated by overlapping RT-PCR products, rapid amplification of cDNA ends (RACE) and sequencing of ESTs, and by incorporating functional annotation data (i.e. mRNAs and ESTs) from the UCSC database [28] and NCBI [29]. We also compared our sequence to the predicted chicken cDNA (http://www.medkem.gu.se/mucinbiology/databases/). We found that the 11,359 bp chicken cDNA spans 74.5 kb of genomic DNA and is comprised of at least 64 exons. MUC2 is expressed in multiple regions of the gastrointestinal tract, and we detected transcripts as early as embryonic day 14.5. We found several alternatively spliced products, and characterized the splice junctions of one of these transcripts. We determined that the chicken MUC2 protein is remarkably similar to human and mouse outside of the central PTS domain, but is highly divergent within this central repetitive structure.
In humans, this PTS domain is highly glycosylated by O-glycans in the Golgi, and it is predicted that these post-translational modifications largely contribute to the innate immune response, as proteolytic cleavage of these sugar chains occurs in the outer mucus layer when these molecules come into contact with foreign pathogens [30]. It will be interesting to compare the post-translational modifications in chicken with other species, especially given the high degree of divergence in this region.

Methods
Tissue biopsy, total RNA isolation
Ethics statement: This study was carried out in strict accordance with the recommendations in the Guide for the Care and Use of Laboratory Animals of the National Institutes of Health. The protocol was approved by the Purdue University Animal Care and Use Committee, protocol #03-095. Euthanasia was performed using CO2 inhalation, and all efforts were made to minimize suffering. Intestinal samples (50-100 mg) were taken from chicken embryos at embryonic day (E) 21.5, hatchlings and White Leghorn adult male birds. Tissues were stored in RNAlater, snap frozen in LN2 or processed immediately for RNA isolation. Fertile chicken eggs (n = 720) were obtained and incubated (Jamesway Incubator Company Inc., Cambridge, Ontario, Canada) for gene expression studies.
Since intestinal segments can be identified by E14.5, embryonic intestinal tracts (n = 5-8) from E14.5, E15.5, E16.5, E18.5, and E21.5 of incubation and post-hatch chickens (d 1, 3 and 7) were dissected as described previously [31]. Intestinal regions include: duodenum (from the ventriculus to the end of the pancreatic loop), jejunum (from the duodenum to the yolk sac), and ileum (from the jejunum to the ileal-cecal junction). Total RNA was isolated using TRIzol® (Invitrogen, Carlsbad, CA). For most studies, 5 µg of total RNA was reverse transcribed with M-MLV (Invitrogen, Carlsbad, CA) using random hexamers. To ensure transcripts of appropriate length, the reverse transcription reaction in studies involving qRT-PCR was performed using the iScript cDNA synthesis kit (Bio-Rad Life Science Research, Hercules, CA), which contains a mixture of random hexamers and oligo d(T). Alternatively, some samples were reverse transcribed using oligo d(T) and SuperScript III (Invitrogen, Carlsbad, CA) or SMARTScribe™ (Clontech, Mountain View, CA). Each PCR was performed at least twice to ensure consistency.

RNA-ligase-mediated rapid amplification of cDNA ends (RLM-RACE)
Total RNA was purified using the DNA-free™ DNase Treatment and Removal Kit (Ambion Inc., Austin, TX) as described [32]. Integrity was verified by gel electrophoresis (1% agarose, 1× TAE). RLM-RACE was performed using the GeneRacer™ RLM-RACE kit (Invitrogen Inc., Carlsbad, CA) according to the manufacturer's protocol. Briefly, purified total RNA was treated with calf intestinal phosphatase (CIP), which dephosphorylates fragmented mRNA and non-mRNA and thereby excludes them from the subsequent ligation, so that only full-length capped mRNA is carried forward. The protective 5′ cap structure of the full-length mRNA was then removed with tobacco acid pyrophosphatase to facilitate ligation of an RNA oligo to the 5′ end by T4 RNA ligase. Ligated mRNA (2 µg) was reverse transcribed using SuperScript™ III RT and GeneRacer™ Oligo d(T) primers.
To obtain the 3′ end of the MUC2 transcript, first strand cDNA was amplified using the provided 3′ anchor primer and a forward, gene-specific 3′ primer (GSP). Hot-start Taq mixed with Pfu polymerase (Advantage® 2 system, Clontech Laboratories, Inc., Mountain View, CA) was used for the 3′ long-range PCR reaction. Amplification was performed under the following conditions: denaturation at 95 °C for 1 min, followed by 35 cycles of denaturation at 94 °C for 30 s, annealing at 55 °C for 1 min, and extension at 68 °C for 3 min. To amplify the 5′ end of MUC2, a reverse complement 5′-GSP and the 5′ anchor primer from the kit were used for a touchdown PCR with a long-range DNA polymerase (BIO-X-ACT™ Long Mix, Bioline, Taunton, MA). The conditions for the 5′ touchdown PCR reaction were: 2 min at 94 °C for initial denaturation; 5 cycles of 30 s at 94 °C followed by 90 s at 72 °C; 5 cycles of 30 s at 94 °C followed by 90 s at 70 °C; 25 cycles of 94 °C followed by 30 s at 68 °C and 90 s at 70 °C; and 7 min at 72 °C for the final extension. To obtain the 5′ and 3′ ends, we performed nested PCR on 1 µl of the first round amplification reaction using internal MUC2-specific primers for both ends of the transcript and the corresponding anchor primers provided by the kit. RACE products were resolved on 1.2% agarose gels, purified with a gel recovery kit (Zymo Research Corp., Irvine, CA) and cloned using the TOPO TA cloning system (Invitrogen Inc., Carlsbad, CA). Internal primers were designed from either in silico sequences or RACE-amplified reads. PCR conditions for these primers were: initial denaturation at 95 °C for 5 min, followed by 33-34 cycles of denaturation at 94 °C for 30 s, annealing at 58 to 63 °C for 20 s and extension at 72 °C for 90 to 120 s, and a final extension at 72 °C for 5 min.

Cloning and sequencing
RT-PCR products were inserted into a pCR-4 TOPO vector and chemically transformed into TOP10 E. coli cells (Invitrogen Inc., Carlsbad, CA) as previously described [32]. Long amplicons from RACE-PCR (>2 kb) were cloned into the T vector and chemically transformed into JM109 competent cells (Promega, Madison, WI). Plasmids from each clone were prepared and purified using a QuickLyse Miniprep kit (Qiagen Inc., Valencia, CA) and digested with EcoRI. Digested fragments were resolved by gel electrophoresis on 1.5% agarose, 0.5× TBE gels. Three to ten subclones from each clone were sequenced bidirectionally using BigDye 3.1 on an ABI3730XL apparatus (ABI, Life Technologies). Resulting sequences were aligned using Sequencher™ Software (Gene Codes Corp., Ann Arbor, MI). Additionally, two overlapping EST clones (Accession #s BU287205 and BU368530) downstream of the annotated MUC2 transcript were purchased (ARK-Genomics, the Roslin Institute, UK) [33] and sequenced as described.
Genomic DNA was isolated from spleen from four independent chicken samples following proteinase K digestion and phenol/chloroform extraction. High molecular weight DNA was collected by spooling and diluted to a concentration of 50 ng/µl for PCR amplification. Following amplification and purification using the DNA Clean & Concentrator™-5 Kit (Zymo Research Corp., Irvine, CA) to remove free nucleotides and excess primers, the amplicons were sequenced using a ¼ BigDye 3.1 reaction. In a 10 µl reaction volume, this corresponds to 2 µl of 5× sequencing buffer (which keeps the reagents at the correct concentrations in the diluted reaction), 5 µM primer, 2 µl of each amplicon, 1 µl of BigDye 3.1 and 5 µl of H2O. Sequencing reactions were purified using the ZR DNA Sequencing Clean-up Kit™ (Zymo Research Corp., Irvine, CA) and were sequenced as described above. ABI files were uploaded, aligned and analyzed using Sequencher™ Software (Gene Codes Corp., Ann Arbor, MI).

Northern blot hybridization
Total RNA prepared from chick intestine was denatured in 50% formamide (v/v), 5% formaldehyde (v/v) and 20 mM MOPS, pH 7.0, at 65 °C for 10 min; electrophoresed in 1.2 to 1.3% agarose gels containing 5% formaldehyde (v/v); and transferred to Hybond N+ nylon membranes overnight. RNA was fixed by cross-linking under UV for 125 s. Membranes were prehybridized in ULTRAhyb® buffer (Ambion) for 1 h at 42 °C. Hybridization was carried out at 42 °C overnight in ULTRAhyb® buffer containing 32P-labeled probes and 0.1 mg/ml denatured salmon sperm DNA. Probes for chicken MUC2 were prepared by asymmetric PCR or PCR in the presence of [c-32P]dCTP using gel-recovered RT-PCR products as the template. The RNA ladder was radioactively labeled using reverse transcription with random primers. Membranes were washed at 65 °C in 2× SSC, 0.1× SDS; then 1× SSC, 0.1× SDS; and subsequently 0.1× SSC, 0.1× SDS, and exposed to Kodak XAR (Eastman Kodak, Rochester, NY) autoradiography film.
Quantitative RT-PCR
MUC2 expression was analyzed by quantitative RT-PCR (qRT-PCR) in embryonic and post-hatch tissues of chicks as described [32]. Primer pairs (Table 1, P34 to P37) for qRT-PCR analysis were optimized, and PCR products were cloned (into the pCR-4 TOPO vector) and confirmed by sequencing. Assays were conducted in 15 µl reactions using iQ SYBR Green Supermix (Bio-Rad Life Science Research, Hercules, CA) with diluted first-strand cDNA. qRT-PCR programs for MUC2 and 18S RNA were: 5 min at 95 °C; 40 cycles of 95 °C for 15 s, 56 °C or 57 °C for 15 s, 72 °C for 15 s and 82 °C or 83 °C for 15 s for data collection; followed by 80 cycles for melting curve analysis. All cDNA samples, each equivalent to 100 ng of total RNA per reaction, were assayed in duplicate. Quantification standards were comprised of four 100-fold dilutions of purified plasmid DNA (containing from 10^8 to 10^2 molecules or 10^7 to 10^1 molecules) and assayed in triplicate, with R² values of 0.99 or above. Standards were used to calculate a linear regression model for threshold cycle (Ct) relative to transcript abundance in each sample. The log value of MUC2 transcript starting abundance was calculated from the Ct values corrected by a factor calculated from 18S RNA as previously described [31].
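The standard-curve step described above can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the Ct values are hypothetical, the function names are ours, and the 18S correction factor from [31] is not reproduced. It fits Ct against log10 of the plasmid copy number for four 100-fold dilutions and then interpolates the log10 starting abundance of a sample from its Ct.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept.

    Returns (slope, intercept, r_squared).
    """
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_tot = sum((y - mean_y) ** 2 for y in ct_values)
    ss_res = sum((y - (slope * x + intercept)) ** 2
                 for x, y in zip(xs, ct_values))
    return slope, intercept, 1.0 - ss_res / ss_tot


def log10_copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate log10 starting abundance."""
    return (ct - intercept) / slope


# Four 100-fold dilutions (10^8 to 10^2 molecules), as in the text.
standard_copies = [1e8, 1e6, 1e4, 1e2]
# Hypothetical Ct values, for illustration only.
standard_ct = [12.1, 18.9, 25.6, 32.4]

slope, intercept, r_squared = fit_standard_curve(standard_copies, standard_ct)
sample_log10 = log10_copies_from_ct(22.0, slope, intercept)
print(f"slope={slope:.2f} intercept={intercept:.2f} R^2={r_squared:.4f}")
print(f"sample log10(copies)={sample_log10:.2f}")
```

A curve would only be accepted under the criterion stated above if `r_squared` is at least 0.99; normalization to 18S RNA would be applied to the interpolated values afterwards.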

Cloning the chicken MUC2 cDNA
In our aim to clone the full-length chicken MUC2 gene, we amplified, cloned and sequenced 16 overlapping MUC2 RT-PCR products (F1-F14), two expressed sequence tags (ESTs) from the 3′ end of the MUC2 gene (not shown), and products from 5′ and 3′ RACE (F17, F19) (Figure 1A). We sequenced the 1.5 kb 3′-RACE and 3.3 kb 5′-RACE clones in their entirety using multiple internal primers (Figure 1B and 1C). RT-PCR clones derived from internal primers were sequenced to confirm the exon-intron junctions of the 5′ RACE product (F14, Figure 1A). Long-range RT-PCR was performed to determine the sequence of the central and 3′ terminal exons of MUC2, resulting in amplification of two fragments close to 3.7 kb in size (F15 and F16, Figure 1A). We sequenced two overlapping EST clones (Accession #s BU287205 and BU368530) [33] located at the 3′ end of the cDNA in an attempt to close the gap (Figure 1A) produced by the highly polymorphic PTS domain; however, this was not successful.
To close the gap, we used BLAT alignment [34] to map the cDNA to the UCSC reference sequence (Nov. 2011 (ICGSC Gallus_gallus-4.0/galGal4)) [28]. Based on the genomic alignment, the UCSC database predicted that there was no gap in the cDNA, as the two exons spanning the gap (Table 2, exons 45 and 46) were located within a 136 bp sequence with a short intron. To confirm the genomic organization, we designed primers that flanked the putative gap in the genome. PCR amplification and sequence analysis confirmed that the UCSC annotation was correct, and that we had spanned the gap between the two cDNA contigs (Figure 1D). We next assembled all of the cDNA clones, as well as the predicted cDNA and annotated mRNAs and ESTs from the UCSC and NCBI databases, into an 11,359 bp chicken MUC2 cDNA sequence (Figure S1), which has been deposited into GenBank (Accession # JX284122). Translation of the cDNA indicates that we identified a 3697 amino acid protein (Figure S2), which is 1482 amino acids shorter than the predicted human orthologue [5] and 1017 amino acids longer than the annotated mouse protein [24,35].

MUC2 genomic organization and protein structure
Using northern blot analysis with probes targeting the 3′ and 5′ termini, we estimated the size of the full-length MUC2 transcript to be approximately 12 kb (Figure 2A; Figure 3A, 3B). Alignment of our MUC2 cDNA with the Nov 2011 Build (ICGSC Gallus_gallus-4.0/galGal4) of the chicken reference genome [28] indicates that MUC2 spans at least 64 exons (Table 2; Figure 3B). The translational start site occurs within exon 1 at nucleotide 25, while the translational stop site is found at position 11,117 in exon 64.
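As a quick consistency check on these coordinates, the sketch below derives the expected position of the stop codon from the reported start site (nucleotide 25) and protein length (3697 amino acids), and compares it with the reported stop position and cDNA length. It assumes 1-based, inclusive cDNA coordinates and that the 3697 residues exclude the stop codon; neither convention is stated explicitly in the text, and the function name is ours.

```python
CDNA_LENGTH_NT = 11_359      # assembled cDNA length reported in the text
START_NT = 25                # first nucleotide of the start codon (assumed 1-based)
PROTEIN_LENGTH_AA = 3_697    # translated protein length reported in the text
REPORTED_STOP_NT = 11_117    # reported position of the translational stop


def orf_coordinates(start_nt, protein_length_aa):
    """Return (last coding nucleotide, (stop codon start, stop codon end))."""
    coding_nt = protein_length_aa * 3
    last_coding_nt = start_nt + coding_nt - 1
    stop_codon = (last_coding_nt + 1, last_coding_nt + 3)
    return last_coding_nt, stop_codon


last_coding_nt, stop_codon = orf_coordinates(START_NT, PROTEIN_LENGTH_AA)
five_prime_utr_nt = START_NT - 1
three_prime_utr_nt = CDNA_LENGTH_NT - stop_codon[1]

print("stop codon spans nt", stop_codon)
print("reported stop lies within it:",
      stop_codon[0] <= REPORTED_STOP_NT <= stop_codon[1])
print("5' UTR:", five_prime_utr_nt, "nt; 3' UTR:", three_prime_utr_nt, "nt")
```

Under these assumptions the reported stop position falls inside the computed stop codon, so the start site, protein length and stop site are mutually consistent.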
By comparing the positions of known chicken mRNAs, ESTs and predicted transcripts, as well as cross-species comparison of human, turkey and helmeted guineafowl mRNAs with our cloned cDNA, we demonstrate strong evidence for our annotation of the genomic structure of MUC2 in chicken (Table 2, Figure 4). Three partial chicken MUC2 mRNAs share significant overlap with our gene. HQ739084 (derived from spleen) and JN639849 share perfect homology with exons 9-11, while CR386462 overlaps with exons 42, 44, 50, 51, 53, 54-57. However, exons 54-56 are annotated as one exon in cDNA CR386462, and exon 57 is smaller than the sequence we cloned. Several chicken ESTs map to our MUC2 exons and add two additional exons. BU296220 overlaps with exons 23-27, while CD753801 maps to exons 32    Four predicted transcripts provide additional support for our MUC2 gene structure (Figure 4). The most complete predicted chicken sequence [3] [36]. Although XM_421035 has been removed, BLAT analysis of the sequence aligns perfectly with exons 2-31 of our cloned MUC2 cDNA, while XM_001234581 aligns directly with exons 51, 53, and 55-64 of our annotated MUC2 cDNA.
The alignment of our cDNA, along with the chicken mRNAs, ESTs and putative transcripts, to the annotated genome matches very well between exons 1-33 and 46-64. However, there are   Figure 3B). Exons 34, 38 and 43 are completely missing from the assembly, while exon 36 is located within the small gap and exon 39 ends abruptly in the large 21.5 kb gap ( Figure 3B). In an attempt to close these gaps, we designed primers that spanned exons 33-35. The predicted amplicon from this region is 2,340 bp. Despite repeated efforts, we were unsuccessful in generating the correct amplicon, due to the fact that the genomic DNA flanking both of these exons contains several elements that are repeated between exons 32 through 39 and exons 41 through 44. In addition, the presence of multiple poly T and poly A tracts within these regions hampered amplification and/or sequencing efforts due to slippage of the polymerase. Similar challenges occurred when we tried to design primers to amplify the region between exons 37 and 39 and between exons 42 and 44.
The chicken MUC2 locus contains a 21,496 bp gap in the assembly. When we aligned the compiled cDNA to the genomic locus, we were surprised to discover that the only exon that falls within this large gap is exon 40 (29 bp). Since exon 40 lies within the highly repetitive PTS domain, attempts at cloning the intervening sequences by PCR of genomic DNA have been unsuccessful. Similar challenges occur in the human and mouse genes, and it is likely that additional exons in this region will be identified when technology becomes available to sequence long DNA or cDNA molecules, as assembling DNA or cDNA that contains multiple repeated cassettes is a major challenge for current Sanger and next-generation sequencing technologies.

Expression analyses of MUC2
We investigated spatial expression of MUC2 throughout the gastrointestinal tract by RT-PCR and temporal expression in the small intestine at embryonic (E) days 14.5, 16.5, 18.5, 21.5 of incubation and 1, 3, 5 days post-hatch by qRT-PCR (Figures 5 and 6). We used amplicons that spanned three distinct regions of the gene (exons 1-6, 16-23 and 44-64). MUC2 is highly expressed throughout the gastrointestinal tract, with weak signals in the crop and brain (Figure 5A, 5B and 5C). We observed no alternative splicing using any of these primer pairs. Quantitative RT-PCR analysis of intestinal MUC2 (exons 25-26; primers P34 and P35) during embryogenesis indicates that expression initiates during late embryogenesis, increasing as development progresses (Figure 6). In the duodenum, jejunum and ileum, MUC2 mRNA levels are relatively low at E14.5, and steadily increase through E21.5. Expression of MUC2 at E14.5 was further confirmed by gel electrophoresis (data not shown). At day of hatch (E21.5), relative MUC2 mRNA levels show a spike (1 to 2 logs) in duodenal and ileal tissues, followed by a steady increase throughout the post-hatch time points. In the jejunum, MUC2 mRNA levels surge to an approximate 2-log increase at H1 followed by a decrease from 1 to 3 d post-hatch, and remain high at 7 d post-hatch.
Figure 4. Exons 1-64. Black cells represent cDNAs, ESTs or predicted sequences that show the same exon structure as the reported cDNA; grey cells represent sequences that are shorter or longer than the compiled cDNA, or that contain significant numbers of mismatches compared to the intestinal MUC2 cDNA; blank cells indicate that this exon is not present in the corresponding sequence. 1 G. gallus intestinal MUC2 cDNA from this report; 2 Helmeted guineafowl; 3 Turkey. doi:10.1371/journal.pone.0053781.g004

Alternative splicing of chicken MUC2
We investigated the presence of alternative splicing events of MUC2 by RT-PCR, long-range PCR and available ESTs. Several sets of primers spanning the entire cDNA were assayed in multiple tissue types. We identified and characterized one distinct splicing event (Figure 7); we detected one shorter fragment in cecal tonsil samples, which revealed that this transcript used internal splice acceptor/donor sites in exons 41 and 43, removing exon 42. This product is 495 bp shorter than the full-length transcript and is predicted to result in an in-frame deletion of 165 amino acids within the central PTS domain. Moreover, to explore whether extensive alternative splicing of MUC2 occurs in infected versus normal intestine, MUC2 transcripts in Eimeria-infected chicks were analyzed, as MUC2 has been reported to be aberrantly expressed and critically involved in the pathogenesis of coccidiosis [20], a prevalent protozoal disease of the gastrointestinal tract of the chicken. However, no alternative splicing events were detected at this resolution (Figure 2D).
Figure 5. Expression of MUC2 in the gastrointestinal tract and brain. RNAs were reverse transcribed using SMARTScribe™ (Clontech) with oligo d(T) to generate long, full-length cDNA. We performed 33 cycles of RT-PCR amplification on 40 ng of cDNA with three sets of MUC2 primers. Alternating blank lanes lack reverse transcriptase. A. Exons 1-6. MUC2 is highly expressed in the proventriculus, duodenum, jejunum, ileum, colon, and cecal tonsil, with lower levels in the brain and minimal expression in the crop and ventriculus. Although these primers (P27 and P2) amplify genomic DNA, genomic DNA controls demonstrate the lack of genomic contamination in all samples, indicating that observed expression is from cDNA. B. Exons 16-23. MUC2 is highly expressed in the proventriculus, duodenum, jejunum, ileum, colon, and cecal tonsil, with lower levels in the brain and minimal expression in the crop and ventriculus. Although these primers (P7 and P8) amplify genomic DNA, genomic DNA controls demonstrate the lack of genomic contamination in all samples, indicating that observed expression is from cDNA. C. Exons 44-64. A touchdown long-range PCR was used to amplify the 3′ end of MUC2 using an internal primer and a primer targeting the exact end of the MUC2 cDNA (P30 and P29). This region demonstrates a very similar pattern of expression, with high levels detected in all tissues except brain, which shows low-level expression, and ventriculus, which has minimal expression. No expression is detected in the crop in this analysis. D. HPRT1 control gene. All samples express HPRT and lack the genomic DNA band, indicating that the samples do not have genomic contamination. RT-PCR products were examined by electrophoresis through a 2.5% agarose gel in 0.5× TBE (A and B) or a 1.2% agarose gel in TAE; water and genomic DNA were used as controls. doi:10.1371/journal.pone.0053781.g005
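The reasoning behind the in-frame prediction for the cecal tonsil splice variant can be made explicit with a few lines of arithmetic. The sketch below is illustrative only: the function name is ours, and the variant protein length assumes that the deletion is the only difference from the 3697 amino acid full-length protein.

```python
FULL_LENGTH_PROTEIN_AA = 3_697  # full-length chicken MUC2 protein (from the text)


def deletion_effect(deleted_nt):
    """Classify a coding-sequence deletion by its effect on the reading frame."""
    codons_removed, remainder = divmod(deleted_nt, 3)
    return {"in_frame": remainder == 0, "codons_removed": codons_removed}


effect = deletion_effect(495)  # the variant is 495 bp shorter
variant_length_aa = FULL_LENGTH_PROTEIN_AA - effect["codons_removed"]
print(effect, variant_length_aa)
```

Because 495 is a multiple of three, the downstream reading frame is preserved and exactly 165 codons are lost, in agreement with the predicted deletion.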

Predicted amino acid sequences and protein structure
We used a combination of protein analysis software (InterProScan; http://www.ebi.ac.uk/Tools/pfa/iprscan/) [37] and analysis of the domain structure of the predicted protein by The Mucin Biology Group (http://www.medkem.gu.se/mucinbiology/databases/) to determine the putative domain structure of chicken MUC2 (Figure 8A). The deduced amino acid sequence of MUC2 contains several elements common to gel-forming mucins, including: VWD and VWC regions; a central PTS domain interspersed with CysD motifs; and a characteristic cystine knot (CT) at the C-terminus (Figure S2). Other prominent features include four cysteine-rich regions (C8) and two trypsin inhibitor-like, cysteine-rich domains (TIL).
We used Rapid Automatic Detection and Alignment of Repeats (RADAR) profiling (http://www.ebi.ac.uk/Tools/Radar/) [38] to detect a core repetitive cassette within the PTS domain. There are 10 total cassettes within this region in chicken, which encompasses amino acids 1702 through 2763 (nt 5131 through 8313) and spans exons 32 through 44. These 10 cassettes are split into three regions containing varying numbers of a highly similar 69 amino acid repetitive element ( Figure 8B): element one contains two repeats interspersed with a GPTPESTTRTT motif; element two contains 6 repeats interspersed with alternating GPTPESTTRTT and GPTSQSTTSTTVSSPS motifs; while element three contains two repetitive cassettes with a GPTPESTTRTT linker motif. These three regions are divided by two of the four CysD domains.
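To illustrate how the linker motifs delimit the repeat cassettes, the sketch below scans a protein sequence for exact matches to the two linker motifs named above and reports their 1-based positions. This is not the RADAR algorithm, which detects divergent repeats without a predefined motif; it is a simple exact-match scan. The toy sequence is hypothetical: its 69-residue unit is a placeholder and not the actual chicken repeat.

```python
import re

# Linker motifs quoted in the text.
LINKERS = {
    "short": "GPTPESTTRTT",
    "long": "GPTSQSTTSTTVSSPS",
}


def find_motifs(protein, motifs):
    """Return sorted (1-based start position, motif name) for exact matches."""
    hits = []
    for name, motif in motifs.items():
        for match in re.finditer(re.escape(motif), protein):
            hits.append((match.start() + 1, name))
    return sorted(hits)


# Hypothetical 69-residue placeholder unit (NOT the real chicken cassette).
PLACEHOLDER_UNIT = "TPS" * 23
toy_pts = (PLACEHOLDER_UNIT + LINKERS["short"]
           + PLACEHOLDER_UNIT + LINKERS["long"]
           + PLACEHOLDER_UNIT)

hits = find_motifs(toy_pts, LINKERS)
print(hits)
```

Applied to the deposited protein sequence, the spacing between consecutive hits would give a rough indication of the cassette length, although imperfect repeats would still require a profile-based tool such as RADAR.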
Although the N-terminus and the C-terminus share significant identity among human, mouse and chicken, the PTS domain is highly divergent amongst these three species. The human MUC2 protein contains two types of PTS motifs. The larger one contains 97 highly identical direct head-to-tail repeats of a 23 amino acid sequence (PTGTQTPTTTPITTTTTVTPTPT). The PTS domain in the mouse is separated into two clusters; cluster one contains nine imperfect duplications of an 8 amino acid repeat, while cluster two contains 15 imperfect duplications of a 10 amino acid cassette [24].

Discussion
It has been over two decades since the initial cloning of the first intestinal mucin gene in humans [39]. Although the physiological implications and disease associations of mucins on various mucosal surfaces have been well recognized, many questions remain as to how and why the gene architecture of this family contributes to diverse protein modifications that may display distinct functionalities. Different species demonstrate structural and sequence conservation as well as their own unique features. Chicken, the most-studied and best-characterized avian species, bridges the evolutionary gap between mammals and non-amniote vertebrates, providing an excellent model system for agricultural and biological research.
In the mucin family, the PTS domains (or mucin domains) are highly polymorphic in both length and sequence in humans, which is primarily due to the presence of multiple alleles with a variable number of tandem repeats (VNTRs). However, the presence of the VNTR, as well as the cDNA sequence within the PTS domain, is not highly conserved evolutionarily [3], highlighting the distinct possibility that broad functional differences exist between species [40]. Our data indicate that the PTS domain of the chicken MUC2 protein contains a vastly different repeat structure than the human protein. Although the chicken PTS region is shorter, the central repeat motif is 69 amino acids in length (as opposed to a 23 amino acid cassette in humans) and shows very little identity with the human motif.
Recent in vitro studies using human intestinal cells demonstrated that the intestinal mucins isolated from chicken were detrimental to the proliferation of Campylobacter jejuni, an infectious bacterium that causes acute gastroenteritis in humans but not in chicken [10,40,41]. In addition, these studies demonstrated that the chicken mucins attenuated the invasiveness of Campylobacter jejuni, suggesting that differences in mucin protein sequence or structure between humans and chicken could account for the differences in susceptibility to infection. Alternatively, the functional differences between human and chicken may imply species-specific divergence in intestinal mucus composition and/or structures. This could also occur through differences in post-translational modifications of the human and chicken proteins. Outside of the PTS domain, the human and chicken MUC2 proteins share large blocks of highly conserved sequences, strongly suggesting that this variable PTS region could account for the phenotypic differences. Plausibly, MUC2 is of utmost importance, as the functionality of intestinal mucus was proposed to rely primarily on MUC2-encoded mucins [10]. Therefore, the full understanding of the functional divergence and prognostic implications of chicken mucins compared to their mammalian orthologues necessitates identification and comparisons of the gene sequences across species.
Although identification of new MUC family members is ongoing, sequencing of most MUCIN genes is hampered by the highly complex PTS cassettes clustered throughout the gene, and several gaps still remain in mouse and human family members [4,9,10]. In the case of the secretory mucins, this can largely be accounted for by the large, frequently repetitive PTS region. The presence of several different polymorphic elements in many of the MUCIN genes hinders annotation efforts at the gene and protein levels, and could even hamper the understanding of the biological significance and disease associations of the diverse family members. By using overlapping RT-PCR, long-range PCR and RACE techniques, we have cloned an 11,359 bp chicken MUC2 cDNA. Previous annotations and predictive modeling validate our predicted gene structure. The cDNA that we cloned spans at least 64 exons on chicken 5q16. The central PTS region of the chicken MUC2 locus harbors four CysD motifs and contains 10 repeat cassettes. Although we have closed the gap across the PTS domain by sequencing overlapping cDNA clones derived primarily from chicken intestinal mRNA, it is likely that future studies using more sophisticated sequencing platforms will identify additional exons within the PTS domain. The highly complex nature of this motif indicates that obtaining the full-length MUC2 cDNA could be difficult in the absence of single-molecule sequencing efforts. This problem is a common occurrence in the delineation of other mucin genes in mouse and human [4,9,10].
The 5′-end of the MUC2 mRNA contains two in-frame ATG codons. Comparing the sequence surrounding the first ATG codon to the Kozak consensus sequence [42] indicates that the purine at −3 and the G at +6 of GCCGCCATGGGG are conserved within the optimal context for initiation of translation [43]. The sequence surrounding the second ATG codon (Met10; GCCTTTTTATGCTC) is a non-consensus Kozak sequence, with a T at position −3 and a C at +6. Additionally, analysis of human and mouse MUC2 proteins indicates that the first three amino acids are MGL, which strongly indicates that the first in-frame ATG codon is most likely the translational start site.
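The position calls above can be checked with a few lines of Python. This is a minimal sketch, not the procedure used in the study: the two contexts are copied from the text, the helper names (`base_at`, `kozak_summary`) are hypothetical, and the numbering convention (A of ATG is +1, the base immediately upstream is −1, no position 0) is an assumption chosen to match the positions quoted above.

```python
# Sequence contexts copied from the text. Numbering is an assumption:
# the A of the ATG is +1 and the base immediately upstream is -1.
CONTEXTS = {
    "first ATG": "GCCGCCATGGGG",
    "second ATG (Met10)": "GCCTTTTTATGCTC",
}
PURINES = {"A", "G"}


def base_at(context: str, position: int) -> str:
    """Return the base at a Kozak-style position relative to the first ATG."""
    if position == 0:
        raise ValueError("there is no position 0 in this numbering")
    start = context.find("ATG")
    if start < 0:
        raise ValueError("no ATG found in context")
    # +1 maps to the A itself; -1 maps to the base just before it.
    index = start + (position - 1 if position > 0 else position)
    if not 0 <= index < len(context):
        raise ValueError("position lies outside the supplied context")
    return context[index]


def kozak_summary(context: str) -> dict:
    """Report the -3 and +6 bases and whether both fit the criteria in the text."""
    minus3 = base_at(context, -3)
    plus6 = base_at(context, 6)
    return {
        "-3": minus3,
        "+6": plus6,
        "fits": minus3 in PURINES and plus6 == "G",
    }


for label, context in CONTEXTS.items():
    print(label, kozak_summary(context))
```

Under these assumptions the first context has a purine (G) at −3 and a G at +6, while the second has a T at −3 and a C at +6, in agreement with the description above.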
The initiating methionine residue is followed by a signal sequence of 18 amino acid residues (analyzed by SignalP 3.0; HMM probability: 0.997) (http://www.cbs.dtu.dk/services/SignalP-3.0/) [44] that is rich in leucine but not isoleucine, and is plausibly cleaved to generate the mature mucin isoform during mucin biosynthesis. The amino-terminal region of MUC2, from its initiating methionine to the third C8 motif, spans 1,166 residues composed of multiple VWD and two TIL domains. TIL domains consist of 10 cysteines that are capable of forming disulfide bonds, indicating that a high degree of secondary and tertiary structure is possible for these heterogeneous MUC2 protein isoforms. The carboxyl-terminus contains a terminal cystine knot (CT), as well as VWC and VWD domains. These domains are highly conserved throughout evolution [3].
In the endoplasmic reticulum, MUC2 forms disulfide-linked dimers via the VWD domains of the amino-terminus [7,45], while the CT knot in the carboxy-terminus supports disulfide-linked trimerization in the trans-Golgi network [8]. CysD (C8) domains exert non-covalent cross-linkages in the MUC2 gel formation process, likely contributing to tertiary structure and to determination of the pore size of the mucus network [12]. Chicken MUC2 may plausibly carry more CysD domains than human MUC2, which may suggest that the polymeric net-like structure contains smaller pores in chickens than in humans. This could account for differences in the innate defense response to pathogens. The conserved cationic domain at the C-terminus observed in rodents was not found in chickens [46].
In human MUC2, two different PTS domains have been identified, both of which are located on the same large exon, separated by ∼600 bp. One region consists of repeats that are interrupted in places by 21 to 24 bp segments. The other is composed of an uninterrupted array (of up to 115 repeated units) of a tandem 23-amino acid repeat cassette [18]. Because of their highly unpredictable but repetitive nature, the PTS regions are somewhat refractory to traditional cloning and sequencing technologies [24,47]. In mice, partial cDNA sequences from the PTS domain suggest the presence of two repetitive PTS regions containing 8 or 10 repetitive units interspaced by a cysteine-rich domain [24]. These repeats are highly dissimilar from both the human and chicken PTS domains. The cDNA that we cloned is composed of 10 interspersed segmental duplications, with the following consensus sequence: VSSSSAPPTPTGSSPTTTSGTTPSSSTIGSTVSTTPVTSPPSPSPTSVSTSTPGPTPTTSVTRPPTSTE. The repetitive unit is rich in threonine (30%), proline (22%) and serine (29%), and is especially high in serine compared to human MUC2 (0% per repetitive unit in the human PTS region 2). The significance of this is not clear; however, the PTS domains are highly modified posttranslationally by oligosaccharides in humans [30], and these differences could play a role in species-specific innate immune response.

Figure 8. Cross-species comparison of the MUC2 protein structure. A. Protein structure of chicken, human and mouse MUC2. InterProScan protein domain prediction analysis (www.ebi.ac.uk/Tools/pfa/iprscan/) [37] indicates that the full-length chicken intestinal MUC2 transcript encodes a 3697 amino acid (aa) protein with a short signal peptide at the N-terminus, multiple von Willebrand factor domain structures (VWD, VWC), several cysteine-rich domains (C8), two trypsin inhibitor-like cysteine-rich domains (TIL), a 1614 amino acid central PTS domain that is interspersed with four CysD motifs and a C-terminal cystine knot (CT). The structure of the human (5179 aa) and mouse (2680 aa) proteins shows strong homology on both sides of the central PTS domain. The two exceptions are that humans and mice lack the second TIL domain, and mice have an additional VWC motif. Although the N-terminal and C-terminal sequences are highly conserved amongst species, the PTS domain is highly divergent, containing different types and varying numbers of repeat cassettes within the central domain. In chicken, this region stretches between aa 1308 and 2922. The different colors in the cartoons represent the finding that the PTS domains are highly divergent among the three species. B. Sequence comparison of the 10 repeats within the PTS domain. Amino acids 1702 through 2763 demarcate this highly repetitive element, which spans exons 32 through 45. RADAR analysis (http://www.ebi.ac.uk/Tools/Radar/) [38] indicates that these cassettes consist of three blocks of repetitive elements interspersed with two CysD domains. Each repeat is 69 amino acids in length and contains one of two short spacer motifs. Repeats 1 and 2 are located in block one, repeats 3-8 are located in block two and repeats 9 and 10 are located in block three. doi:10.1371/journal.pone.0053781.g008
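The composition figures quoted for the chicken consensus repeat can be recomputed directly from the sequence. The following is a minimal Python sketch rather than the analysis performed in the study: the consensus is copied from the text, any whitespace or hyphen in it is treated as a typesetting artifact and stripped (an assumption), and the names `clean` and `composition` are hypothetical.

```python
from collections import Counter

# Consensus repeat copied from the text. Non-letter characters, if any,
# are assumed to be line-wrapping artifacts and are removed by clean().
CONSENSUS_RAW = (
    "VSSSSAPPTPTGSSPTTTSGTTP"
    "SSSTIGSTVSTTPVTSPPSPSPTSVSTSTPGPTPTTSVTRPP"
    "TSTE"
)


def clean(sequence: str) -> str:
    """Keep only alphabetic residue codes, in upper case."""
    return "".join(ch for ch in sequence.upper() if ch.isalpha())


def composition(sequence: str) -> dict:
    """Return the percentage of each residue in the cleaned sequence."""
    residues = clean(sequence)
    counts = Counter(residues)
    return {aa: 100.0 * n / len(residues) for aa, n in counts.items()}


consensus = clean(CONSENSUS_RAW)
percentages = composition(CONSENSUS_RAW)

print("repeat length:", len(consensus))
for residue in ("T", "P", "S"):
    print(residue, consensus.count(residue), round(percentages[residue], 1))
```

With the sequence as given, the repeat is 69 residues long and contains 21 threonines, 15 prolines and 20 serines, which rounds to the 30%, 22% and 29% quoted above.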
The spatio-temporal expression of MUC2 transcripts follows a specific pattern in humans and rodents [24,48,49,50]. Similarly, our data show that chicken intestinal MUC2 transcripts are expressed throughout the gastrointestinal tract and in embryos as early as E14.5. Expression then increases rapidly along a developmental timeline. This pattern is seemingly disrupted during the developmental switch from E21.5 (hatch day) to post-hatch day 1 in the duodenal and jejunal tissue. These types of temporal trends in MUC2 expression patterns have been linked to a previous morphometric investigation of intestinal goblet cells, in which a graded increase in goblet cell density was observed along the gastrointestinal tract and during the period from 3 d prior to hatch to 7 d post hatch [51].

Conclusions
In summary, we have characterized the chicken MUC2 cDNA and identified several conserved structural features of the chicken gene, including VWC, VWD, TIL, C8 and CT domains, as well as a large PTS tandem repeat region. Interestingly, although the VWC, VWD, TIL, C8 and CT domains are highly conserved amongst human, mouse and chicken, the PTS domain is quite divergent. Since MUC2 is highly glycosylated posttranslationally, this diversity could prove to be a valuable mechanism for generating species-specific innate immune responses to different host pathogens. This is consistent with the supposition that different species could create mucin gel layers with vastly different pore sizes, which could hamper the ability of pathogens to invade different species and provide a mechanism for the different responses seen across species. Interestingly, known sequence variations in other species have been associated with functional differences in cancer incidence, induction of virulence from pathogens and bacterial mucolysis, amongst others, suggesting that the heterogeneity of MUC2 plays an important role in many different biological processes. By defining the structure of mucin from an avian species, we provide important information towards a deeper understanding of the evolutionary mechanisms by which genes contribute to innate barrier functions in the host amongst a wide variety of species. By understanding the role of MUC2 in innate host defense in chickens, we may be able to develop more effective therapies for creating enhanced defense mechanisms in humans.

Supporting Information
Figure S1 Chicken MUC2 cDNA. We identified an 11,359 bp cDNA for MUC2. We derived the cDNA from all available sources, including: in silico data, mRNAs, ESTs, RACE products and RT-PCR amplicons. (DOCX)
Figure S2 Chicken MUC2 protein. Based on the cDNA sequence from Figure S1, we deduced that the MUC2 protein was 3697 amino acids. We used InterProScan (REF) to determine the different domains, and then compared this analysis to the protein structure provided (medkem). When necessary, we adjusted domains to correspond with the medkem analysis. We used RADAR (REF) to identify repetitive elements within the protein. Domains are noted in bold, color-coded text, which corresponds to the color coding in Figure 8. Color coding is as noted: signal peptide (black), VWD domains (medium blue), C8 domains (fuchsia), TIL motifs (orange), VWC domains (dark blue), CysD (green), PTS (purple), cystine knot (yellow). Shaded text indicates the location of the 69 amino acid repeat within the PTS motifs. (DOCX)