Skipping of Exons by Premature Termination of Transcription and Alternative Splicing within Intron-5 of the Sheep SCF Gene: A Novel Splice Variant

Stem cell factor (SCF) is a growth factor essential for haemopoiesis, mast cell development and melanogenesis. In the hematopoietic microenvironment (HM), SCF is produced as either a membrane-bound (−) or a soluble (+) form. Skin expression of SCF stimulates melanocyte migration, proliferation, differentiation, and survival. We report, for the first time, a novel mRNA splice variant of SCF from the skin of white merino sheep, identified via cloning and sequencing. Reverse transcriptase (RT)-PCR and molecular prediction revealed two different cDNA products of SCF. Full-length cDNA libraries were enriched by rapid amplification of cDNA ends (RACE-PCR). Nucleotide sequencing and molecular prediction revealed that the primary 1519 base pair (bp) cDNA encodes a precursor protein of 274 amino acids (aa), commonly known as the ‘soluble’ isoform. In contrast, the shorter (835 and/or 725 bp) cDNA was found to be a ‘novel’ mRNA splice variant. It contains an open reading frame (ORF) corresponding to a truncated protein of 181 aa (vs 245 aa) with a unique C-terminus lacking the primary proteolytic segment (28 aa) right after the D175G site, which is necessary to produce the ‘soluble’ form of SCF. This alternative splice (AS) variant was explained by the complete nucleotide sequencing of the splice junction covering exon 5-intron (5)-exon 6 (948 bp), which revealed a premature termination codon (PTC) whereby exons 6 to 9/10 are skipped (Cassette Exon, CE 6–9/10). Northern blot analysis also indicated that expression at the transcript level is mediated via an intron-5 splicing event. Our data refine the structure of the SCF gene and clarify the presence (+) and/or absence (−) of the primary proteolytic-cleavage site in specific SCF splice variants. This work provides a basis for understanding the functional role and regulation of SCF in hair follicle melanogenesis in sheep beyond what was known in mice, humans and other mammals.


Introduction
Many growth factors, such as colony-stimulating factor-1 (CSF-1), transforming growth factor-α (TGF-α), and tumor necrosis factor (TNF), occur in both membrane-bound and secreted forms [1], the latter generated by specific proteolytic cleavage. These growth factors and their receptors play vital roles in normal development as mediators of intercellular communication by diffusible molecules and often promote cell differentiation and maturation. Stem cell factor (SCF) [2], also known as steel factor (SLF or SF) [1,3], mast cell growth factor (MGF) [4,5], or kit ligand (Kitl, KL or KITLG) [6], is a pleiotropic cytokine that binds to its cognate receptor c-KIT, also called stem cell factor receptor (SCFR) [2], the product of the c-kit gene. SCF is encoded by the murine Steel (Sl) locus, while KIT is encoded by the dominant white spotting (W) locus in the mouse [7,8]. SCF plays an important role in hematopoiesis, spermatogenesis, and melanogenesis [1]. In the hematopoietic microenvironment (HM), SCF is produced as either a membrane-bound or a soluble form [9,10].
SCF is produced as a transmembrane protein that is released by specific proteolytic cleavage to generate a soluble factor [3,5,11]. Alteration in the balance between the diffusible and membrane-bound forms may lead to phenotypic abnormalities, as previously reported for the dominant white spotting (W) and Steel (Sl) loci, which are among the most studied mutations in the mouse [7,8,12–15]. Investigations into the expression of c-KIT and SCF in the skin during melanocyte migration are consistent with the known W and Sl phenotypes and suggest that SCF mediates a chemotactic/haptotactic signal for c-KIT in the development of pigmentation [6]. Membrane-bound SCF/c-KIT signalling could act on mammalian hair follicle melanogenesis during cyclic anagen phases, resulting in hair follicle pigmentation [16]. Besides its role as a melanocyte survival factor, SCF can also act synergistically with several interleukins and granulocyte-macrophage colony-stimulating factor to enhance UV-induced pigmentation [17,18]. Signalling by SCF and its receptor c-KIT has been documented to play essential roles in the maintenance of embryonic melanocyte lineages and in postnatal cutaneous melanogenesis [3,19–21].
Alternative splicing (AS) is a key element in gene regulation that increases proteome diversity and the coding potential of various eukaryotic genomes. Evidence from expressed sequence tags (ESTs), cDNA, genome-wide tiling and splicing microarray datasets in human demonstrates that alternative splicing occurs in 90% of genes [22,23]. The high incidence of AS in the pigmentation gene network, for example in SCF/c-KIT [14,16] and MITF [24], might contribute to the regulation of their switch in the development of various genetic disorders and phenotypic abnormalities. In the case of the SCF gene, AS results in two membrane-bound protein products [5,6,25]. To date, two alternatively spliced forms of SCF mRNA have been reported in the mouse: 1) the full-length form; and 2) an alternative form lacking exon 6, which, like the corresponding human transcript, produces a 28 aa deletion. Exon 6 codes for an extracellular site that is susceptible to cleavage by proteases. Expression of the SCF variant containing exon 6 produces a membrane-bound isoform, designated SCF-1 (KL-1) or the (+) form, and its proteolytic cleavage generates a soluble form of the factor. In contrast, expression of the SCF splice variant lacking exon 6 gives rise to a strictly membrane-bound protein, known as SCF-2 (KL-2) or the (−) form [5,6,11,25]. The SCF expression ratio between the KL-1 and KL-2 isoforms varies significantly between cell types [6,11]. A mouse mutagenesis study [10] reported that, in the absence of the primary cleavage site (exon 6), a secondary cleavage site located at or near Lys178-Ala179-Ala180-Ser181 (exon 7) is used to generate the soluble form.
Three isoforms have been identified and documented for the human and mouse SCF genes (source: GenBank, NCBI, http://www.ncbi.nlm.nih.gov/; Ensembl, www.ensembl.org/; UniProt, www.uniprot.org/). Basically, the first two isoforms (273 aa and 245 aa) differ by the presence (+) and absence (−) of the potential primary proteolytic site (exon 6), respectively. The third, shortest isoform (238 aa) differs in its N-terminus for the first 8 aa vs the first 43 aa of the (+) and (−) forms but has the primary proteolytic site. In sheep, there exist only two partial mRNA records, both counterparts of SCF-1 (+ form), one from ovarian follicle (780 bp, Acc. No. U89874.1) and the other from keratinocytes (622 bp, Acc. No. Z50743.1), and two partial records of SCF genomic DNA sequences, i.e., a 5′ UTR and partial CDS sequence (358 bp, Acc. No. HM347344.1) and the stem cell factor MGF25 gene (781 bp, Acc. No. AF165788.1), coding region not determined. The larger mRNA species of SCF encodes a protein (UniProt, P79368) of 267 amino acids (aa), known as the 'soluble' isoform (SCF-1/b), which is a transmembrane protein comprising a 25 aa leader signal peptide sequence, a 189 aa extracellular domain that includes a proteolytic cleavage site (28 aa), a hydrophobic membrane-spanning helical region (21-23 aa) and a short cytoplasmic tail (36-37 aa) [5,6,25]. The alternative SCF mRNA lacks exon 6, a deletion of 84 bp. This shorter mRNA species gives rise to a protein, known as the 'membrane-bound' isoform (SCF-2/a), that lacks 28 aa, including one of the four N-linked glycosylation sites in the C-terminus (Ala164 and Ala165) of the soluble SCF, as well as the protease recognition site. This shorter form of the protein yields soluble SCF less efficiently than the longer form of the transmembrane protein. Hence, regulation of the abundance of the alternatively spliced messages might contribute significantly to the regulation of the production of soluble and/or membrane-associated SCF by the cell [26].
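The relationship between the 84 bp exon 6 deletion and the 28 aa difference between the (+) and (−) isoforms is simple codon arithmetic. The following is a minimal sketch of that check; the function name `residues_removed` is a hypothetical helper introduced only for illustration, and the numbers are the ones quoted above for the human and mouse isoforms.

```python
# Illustrative check of the exon 6 arithmetic described in the text.
# `residues_removed` is a hypothetical helper, not part of any library.

CODON_LENGTH = 3  # nucleotides per codon


def residues_removed(deleted_bp: int) -> int:
    """Return the number of amino acids lost by an in-frame deletion."""
    if deleted_bp % CODON_LENGTH != 0:
        # A deletion that is not a multiple of 3 would shift the reading frame.
        raise ValueError("deletion is not a whole number of codons")
    return deleted_bp // CODON_LENGTH


exon6_bp = 84          # size of the exon 6 deletion quoted in the text
plus_form_aa = 273     # (+) isoform length quoted in the text

lost = residues_removed(exon6_bp)
minus_form_aa = plus_form_aa - lost
print(lost, minus_form_aa)  # 28 245
```

Because 84 is divisible by 3, skipping exon 6 removes whole codons and leaves the downstream reading frame intact, which is why the (−) form is shorter by exactly 28 residues rather than truncated.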
The physiologic roles of these SCF proteins remain uncertain. Notably, the biological effects of the membrane-bound (as opposed to soluble) forms of the protein may be significantly different, at least with respect to bone marrow progenitor cells [9].
Numerous pigmentation mutants (>800 alleles) are phenotypically profound but remain mechanistically uncharacterized [27]. In sheep, the candidate genes for recessive black (ASIP) [28,29], dominant black (MC1R) [30,31] and brown (tyrosinase related protein-1, TYRP1) [32] have been found, and these are known to influence pigmentation or the level of pigment synthesis. In the merino experimental models [33], the authors proposed that "The inheritance of white coat colour in merino sheep is dependent on single gene segregation, without any modifying effects and is completely dominant over pigmented animals". According to their data, the Agouti (A) and Extension (E^D) loci [28–31], which are encoded by the agouti signalling peptide (ASIP) and melanocortin-1 receptor (MC1R) genes respectively [34], have never been associated with spotting or white in mammals. They are involved, in fact, in melanin switching [13,35,36]. White can be caused by defects at various stages of melanocyte development, including proliferation, survival, migration, invasion of the integument, hair follicle entry and melanocyte stem cell renewal [36]. Many white spotting traits have been identified in mouse and man, and 10 of the genes have been cloned [36]. It has been hypothesized that the gene for the white phenotype in merino sheep is at one of these loci [33]. Among those, for the loci microphthalmia-associated transcription factor (MITF, microphthalmia) [37], c-KIT (Dominant White Spotting) and SCF (Steel), it is possible to obtain completely white live animals [27,36,38], and c-KIT/SCF signaling and MITF-dependent transcription are both essential for melanocyte development and pigmentation [39].
The study of genes controlling coat colour and pigmented fibres is most relevant to 'white' wool production, as brown or black wool will not dye as readily. Since naturally coloured fibre is a new opportunity for textile industries, the development of valid genetic tools (coat colour tests) and effective sheep breeding programmes should go hand-in-hand to help breeders and small-scale farmers reduce future occurrences of pigmented fibres in the wool market. The present empirical study was undertaken as part of a larger in-house project evaluating the involvement of three candidate genes, MITF, c-KIT and SCF, in various coat colour traits of merino sheep, especially the white phenotype. Isolation of these genes and knowledge of their structure will allow further studies into the regulation of gene expression in ovine melanocyte biology and skin pigmentation. In an effort to better characterize the mRNA/cDNA structure of SCF in the skin of white merino sheep, we performed cDNA cloning, sequencing and gene expression analysis by semi-quantitative RT-PCR and Northern blot. In this study, we isolated a novel mRNA splice variant from skin, designated 'SCF truncated isoform-2a/b (−)', demonstrating for the first time that a premature stop codon (PTC) at the short 3′ UTR sequence corresponding to intron 5 is due to the usage of an alternative splice donor/acceptor site. The other ovine transcript variant expressed in skin, 'SCF isoform-1 (+)', the homolog of the commonly known SCF (+) isoform in other mammals, is also presented here. We also demonstrated by Northern blot analysis that the relative gene expression at the mRNA transcript level is mediated via an intron 5 splicing event. Further, this manuscript discusses extensively the ovine SCF mRNA structural coverage, putative AS events in intron 5 of the SCF gene, mRNA and protein structure domain characterization, homology modelling, and molecular phylogeny of SCF.

Collection of Skin Biopsies and Blood
Skin biopsies were collected from uncoloured (white) and coloured (black and brown) merino sheep using a disposable, sterile biopsy punch (8 mm diameter), treated and stored in RNAlater (Sigma-Aldrich, Milan, Italy), transferred to the molecular biology laboratory and immediately frozen in liquid nitrogen until RNA extraction. Blood samples were collected from the jugular vein of the same individuals with PAXgene Blood DNA Tubes (PreAnalytix kit, Qiagen, Milan, Italy) via standard phlebotomy technique and processed immediately in the lab according to the manufacturer's protocol for DNA isolation; the aliquots were stored at −80 °C. Samples were collected and recorded according to the farm technicians from Aziende la Campana Montefiore dell'Aso (Ascoli Piceno, Marche), La Meridiana Umbertide (Perugia, Umbria), Italy with permission from the owners of each farm.

RNA and DNA Isolation and Quantification
Total RNAs were extracted from the stored skin biopsies of all three animals using TRI Reagent (Sigma-Aldrich, Milan, Italy) according to the manufacturer's instructions, followed by treatment with RNase-free DNase (Fermentas, Milan, Italy) to remove contaminating DNA. Tissue was homogenized (0.075 g in 750 µl TRI Reagent) using a Polytron homogenizer (Qiagen, Milan, Italy). Genomic DNAs were isolated from the blood samples with the PAXgene Blood DNA kit (PreAnalytix kit, Qiagen, Milan, Italy) following the handbook protocol.
The isolated, purified DNAs and RNAs were assessed using the Genesys 10 UV Spectrophotometer (Thermo Electron Corporation, Madison, USA). Purity was assessed by calculating the ratio of optical density (OD) at A260/A280, and integrity was determined by running the samples on a 1.0% formaldehyde-agarose gel for RNA and a 0.8% agarose gel for DNA [40]. For DNA, the concentration was also evaluated from band intensities with reference to the molecular weight standard Lambda (λ) DNA EcoRI HindIII digest (Fermentas, Milan, Italy) or 1 kb gene ruler (USB Corporation, Cleveland, USA). The nucleic acid concentration was calculated following [40], and the DNA samples were diluted to 10 ng/µl or 50 ng/µl for PCR amplification.
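The spectrophotometric purity and concentration calculations described above can be sketched as follows. This is an illustration only and not the authors' procedure from [40]: it assumes the widely used conversion factors of about 50 µg/ml per A260 unit for double-stranded DNA and 40 µg/ml for RNA (1 cm path length), and the function names and example readings are hypothetical.

```python
# Minimal sketch of A260/A280 purity and concentration estimates.
# Assumption: standard conversion factors for a 1 cm path length.
UG_PER_ML_PER_A260 = {"dsDNA": 50.0, "RNA": 40.0}


def purity_ratio(a260: float, a280: float) -> float:
    """A260/A280 ratio used as a rough indicator of protein contamination."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


def concentration_ug_per_ml(a260: float, nucleic_acid: str,
                            dilution_factor: float = 1.0) -> float:
    """Concentration of the undiluted sample in ug/ml (numerically ng/ul)."""
    return a260 * UG_PER_ML_PER_A260[nucleic_acid] * dilution_factor


def dilution_volumes(stock_conc: float, target_conc: float,
                     final_volume: float) -> tuple[float, float]:
    """Volumes of stock and diluent needed (C1 * V1 = C2 * V2)."""
    if target_conc > stock_conc:
        raise ValueError("target concentration exceeds stock concentration")
    stock_volume = target_conc * final_volume / stock_conc
    return stock_volume, final_volume - stock_volume


if __name__ == "__main__":
    # Hypothetical readings for a DNA sample measured at a 1:10 dilution.
    conc = concentration_ug_per_ml(0.5, "dsDNA", dilution_factor=10.0)
    print(purity_ratio(0.5, 0.25), conc)
    # Volumes (ul) to prepare 100 ul of a 50 ng/ul working stock.
    print(dilution_volumes(conc, 50.0, 100.0))
```

Since 1 µg/ml equals 1 ng/µl, the value returned by `concentration_ug_per_ml` can be passed directly to `dilution_volumes` when preparing the 10 ng/µl or 50 ng/µl working dilutions.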
cDNA Synthesis and RT-PCR Amplification
cDNAs were synthesized from total RNA extracted from the skin of the merino sheep. Reverse transcription (RT) was carried out with 1-1.5 µg of RNA in a total volume of 20 µl containing 50 pmol oligo(dT) (18-mer) or oligo(dT)18 modified primer, 0.5 mM deoxyribonucleoside triphosphates (dNTPs), 1× RT buffer, 20 U of RNase inhibitor and 200 U PrimeScript™ Reverse Transcriptase (Takara Bio Inc., Clontech, Jesi, Italy) or StrataScript™ Reverse Transcriptase (Stratagene, Agilent Technologies, Milan, Italy) according to the manufacturer's instructions. The reaction was incubated for 60 min at 42 °C, then heated at 70 °C for 15 min and cooled on ice. All the RT reactions were performed in a Perkin-Elmer Cetus Model 480 DNA Thermal Cycler (Perkin-Elmer, Monza, Italy) and/or MyCycler™ Thermal Cycler (Bio-Rad Laboratories, Segrate, Italy). Subsequently, 0.5-0.7 µl of the first strand cDNA reaction was used for PCR amplification. The reactions were performed in a 25 µl volume containing 1× PCR buffer, 1.5 mM MgCl2, 2.0 mM dNTPs, 0.3-0.5 µM gene specific primers (Table S2), 20-30 ng/µl cDNA and 1.5 U of proofreading Easy-A High-Fidelity PCR Cloning Enzyme (Stratagene, Agilent Technologies, Milan, Italy); the cDNA check amplification was performed with DreamTaq DNA polymerase (Fermentas, Milan, Italy). Three-step RT-PCR amplification was performed in a MyCycler™ Thermal Cycler (Bio-Rad Laboratories, Segrate, Italy) or TGRADIENT Thermocycler (Biometra GmbH, Göttingen, Germany) with an initial denaturation at 95 °C for 3 min, followed by 5 primary cycles of 94 °C for 1 min, annealing for 1 min at a temperature (Ta) 3-5 °C below the melting temperature (Tm) of whichever of the two gene specific primers has the lower Tm, and 72 °C for 1 min.
This was then followed by 25 consecutive cycles of 94 °C for 15-30 sec, annealing temperature (Ta) for 15-30 sec and 72 °C for 20-30 sec, with a final extension at 72 °C for 10 min and a final hold at 4 °C. NOTE: PCR cycling conditions, especially Ta and timing intervals, vary with the primer sets and the expected size of amplicons (see Table S2 [41]. The remaining 5′ and 3′ RACE SCF gene specific primer pairs were deduced from the 621 bp cDNA coding sequence (CDS) fragment to walk up and down in order to obtain the full-length cDNAs. All the designed primer pairs were checked with the online software tools [42,43] before an order was placed with [42]. The primers used in this study were synthesized and purchased from Sigma-Aldrich, Milan, Italy. 5′ RACE cDNAs were reverse transcribed from 1-1.5 µg of RNA in a total volume of 20 µl containing 2-2.5 pmol SCF gene specific splice variant primers (Table S2), 0.5 mM dNTPs, 1× RT buffer, 20 U of RNase inhibitor, 200 U of PrimeScript™ Reverse Transcriptase (Takara Bio Inc., Clontech, Jesi, Italy) and StrataScript™ Reverse Transcriptase (Stratagene, Agilent Technologies, Milan, Italy) according to the manufacturer's instructions. The reaction was incubated for 60 min at 50 °C, then heated at 70 °C for 15 min, cooled on ice and stored at −20 °C. Two different 5′ RACE cDNAs were synthesized with gene specific primers for the SCF (+) and (−) forms (see Table S2 for details). The 5′ RACE cDNA (final volume of 100 µl) was then precipitated with 0.1 volume of 3 M sodium acetate (pH 4.8 or 5.2) and 2.5 volumes of 100% ethanol. The precipitation was carried out at −80 °C overnight and the sample was centrifuged twice at 16,000 g for 30 min. The pellet was washed twice with 70% ethanol at 16,000 g for 15 min. The collected, air-dried pellet was finally dissolved in 40 µl DEPC-treated water and stored as aliquots at −80 °C.
A homopolymeric tail was then added to the 3′-end of the purified cDNA (10 µl) using 30 U terminal deoxynucleotidyl transferase (TdT, USB Corporation, Cleveland, USA and Invitrogen, Life Technologies, Monza, Italy) and 0.2 mM dCTP (Fermentas, Milan, Italy) following the protocol of the 5′ RACE System (v. 2.0, Invitrogen, Life Technologies, Monza, Italy). The reaction was incubated at 95 °C for 3 min for denaturation and at 37 °C for 12 min for tailing, then heat inactivated at 70 °C for 10 min, cooled on ice and stored at −20 °C. Subsequently, 3 µl of the dC-tailed cDNA was used in a final volume of 50 µl for the first round enrichment PCR amplification, followed by second round nested amplification using 2-3 µl of the primary enriched RT-PCR reaction. The primer combinations used were the adapter forward primers aapfwd (first round) and auapfwdnst (second round, nested), with scfrev1 (proteolytic site, (+) form, first round), scfrev2 (common region, (−) form, first round) and scfrev3 (common, second round) as the reverse primers, respectively. NOTE: Forward adapter primer sequences were retrieved from the 5′ RACE kit, Invitrogen, Life Technologies, Monza, Italy and synthesised by Sigma-Aldrich, Milan, Italy. The PCR amplification was carried out as described above for 36 cycles; the cycling conditions, especially Ta and timing intervals, vary with the 5′ RACE primer sets and the expected size of amplicons (see Table S2 for details).
First strand 3′ RACE cDNAs were prepared with a high Tm oligo(dT)18 modified primer as described above, and 1 µl of this cDNA was used in a final volume of 50 µl for the first round PCR amplification. Successive nested, splice variant specific amplifications were performed in a 50 µl PCR volume using 1 µl of the primary enriched RT-PCR reaction.
The PCR was run for 36 cycles as described above; the cycling conditions, especially Ta and timing intervals, vary with the specific 5′ and 3′ RACE primer sets (see Table S2 for details). For 3′ RACE, the primer pairs having a high Tm were subjected to a two-step PCR with a coupled annealing and extension step at 69 or 72 °C for 3 min 10 sec up to 10 min. The primer combinations used for distinctive 5′ and 3′ RACE amplification and the expected sizes of amplicons are presented in Table S2.

DNA Splice Junction Amplification
Blood genomic DNA was amplified to confirm the splice site premature termination with a poly A signal detected on sheep SCF cDNA transcripts. The Expand Long Range, dNTPack (Roche S.p.A., Milan, Italy) was used following the manufacturer's instructions, including 0.3-0.5 µM specific primers scffwd3 (exon 5, common) and scfrev1 (exon 6, (+) form specific) (Table S2), 500 µM dNTP mix, 3% DMSO, 100-150 ng of genomic DNA and 3.5 U of Expand Long Range Enzyme mix in a final 50 µl PCR volume. PCR cycling followed the Roche kit protocol. Since the available unfinished draft reference sheep genome, Oarv2.0 (current version, March 2011 to date, http://www.livestockgenomics.csiro.au/sheep/oar2.0.php), did not provide much information regarding the SCF gene, the reference SCF genomic locus at the exon 5-intron (5)-exon 6 splice junction was covered in comparison to the orthologous SCF gene assembly of human, mouse, cow and dog.

Expression of Ovine SCF in Skin
To determine the relative abundance of the SCF (+) and (−) cDNA transcripts, we performed semi-quantitative RT-PCR amplification using two different sets of splice variant specific (+ and −) primers as summarised in Table S2. The four primer sets included a (+) form specific forward primer (exon 5-exon 6: stpro3'Rfwd1) and reverse primer (exon 6: scfrev1); the common forward primers (scffwd1, scffwd4) located in the common region of the CDS; and a (−) form specific reverse primer (scf(−)rev) designed to span the exon 7-exon 5 splice junction. Total RNA (1.5 µg) from each animal (white, black, brown) was reverse transcribed into cDNA using 200 U PrimeScript™ Reverse Transcriptase (Takara Bio Inc., Clontech, Jesi, Italy) and 50 pmol oligo(dT) modified primer in a 20 µl reaction volume, as described above. PCR amplification was performed using 0.5 µl of each cDNA sample as a template in 25 µl of a reaction mixture consisting of 1× DreamTaq buffer, 1.5 mM MgCl2, 0.2 mM dNTPs, 0.5 µM of each primer and 1.5 U of DreamTaq DNA polymerase (Fermentas, Milan, Italy). After an initial denaturation step of 3 min at 95 °C, a 3-step PCR programme was carried out with 5 successive cycles of 25 sec at 95 °C for denaturation, 25 sec at the primer-specific annealing temperature (Ta) for annealing and 25 sec at 72 °C for extension, followed by 25 repeat cycles of 94 °C for 15 sec, annealing temperature (Ta, see Table S2) for 15 sec and 72 °C for 20 sec, with a final extension at 72 °C for 10 min and a cooling phase at 4 °C. Amplified RT-PCR products were separated by 1.5-2% agarose gel electrophoresis and evaluated by ethidium bromide staining and UV transillumination. For the RT-PCR reference, constitutively expressed glyceraldehyde 3-phosphate dehydrogenase (GAPDH, 252 bp) and 18S rRNA (132 bp) were used as equal loading controls. The housekeeping gene (HKG) primers were designed from the corresponding Ovis aries NCBI GenBank Accession Nos.
(see Table S2) and amplified with the same PCR conditions and cycle numbers. Amplicons were confirmed by cloning and direct sequencing. The relative signal strength was measured using the QuantiScan Demo software [44].
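The idea behind semi-quantitative comparison against a loading control can be sketched as a simple ratio of band intensities. This is a minimal illustration, not the QuantiScan workflow used here: the function name is hypothetical and the intensity values are made-up placeholders, not measurements from this study.

```python
# Sketch: normalise target band intensities to a housekeeping control.
# All numbers below are placeholder values for illustration only.


def normalise_to_reference(target: dict[str, float],
                           reference: dict[str, float]) -> dict[str, float]:
    """Divide each sample's target band intensity by its control intensity."""
    ratios = {}
    for sample, intensity in target.items():
        ref = reference[sample]
        if ref <= 0:
            raise ValueError(f"reference intensity for {sample} must be positive")
        ratios[sample] = intensity / ref
    return ratios


if __name__ == "__main__":
    # Hypothetical densitometry readings (arbitrary units) per animal.
    scf_band = {"white": 50.0, "black": 100.0, "brown": 75.0}
    gapdh_band = {"white": 100.0, "black": 100.0, "brown": 100.0}
    print(normalise_to_reference(scf_band, gapdh_band))
```

Dividing by the control corrects for differences in template loading between lanes, so the resulting ratios, rather than the raw band intensities, are what can be compared across animals.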
For Northern blot analysis, total RNA was isolated from skin as described above. The poly(A)+ mRNA was purified from total RNA using the Oligotex mRNA Midi Kit (Qiagen, Milan, Italy) following the manufacturer's protocol. The 40 µl eluted mRNA sample was separated by 1.2% denaturing formaldehyde-agarose gel electrophoresis [40]. Subsequently, mRNA was transferred to a Hybond™-N neutral nylon membrane (Amersham Biosciences, GE Healthcare Europe GmbH, Milan, Italy) overnight by capillary diffusion [40]. The mRNA was crosslinked onto the membrane by baking at 80 °C for 2 h. The membrane was pre-hybridized at 50 °C for 1 h and then hybridized overnight at 50 °C in a solution containing denatured DIG-labeled PCR probe (2 µl/ml). DIG-labeled PCR probes were synthesized using a PCR DIG Probe Synthesis kit (Roche S.p.A., Milan, Italy). DIG-labeled DNA fragments of ovine SCF (222 bp, +/− form) and 18S rRNA (132 bp) were synthesized by PCR using the corresponding cDNA clones as templates and gene-specific primers (see Table S2). Following low (2 × 5 min with 2× SSC, 0.1% SDS at room temperature) and high (2 × 15 min with 0.1× SSC, 0.1% SDS at 50 °C) stringency washes, the nylon membrane was incubated in the blocking solution for 45 min, followed by incubation for 15-45 min at room temperature in blocking solution containing a 1:5,000 dilution of alkaline phosphatase conjugated anti-DIG antibody (Roche S.p.A., Milan, Italy). The hybridized probe was detected with the chemiluminescent substrate CSPD (Roche S.p.A., Milan, Italy). Hybridization signals were detected by exposure of the membrane to Kodak® BioMax® XAR Film (Sigma, Milan, Italy) at room temperature. Pre-hybridization, hybridization, blocking and washing solutions were prepared and used according to the procedures for nonradioactive (DIG) labeling and detection of nucleic acids (Roche S.p.A., Milan, Italy).
Probes were stripped at 80 °C for 2 × 60 min before rehybridization according to the manufacturer's instructions (DIG application manual, Roche S.p.A., Milan, Italy).

Cloning and Sequencing
All the selected amplicons were gel purified either manually by salt precipitation or using Nucleospin columns (Macherey-Nagel, GmbH & Co. KG, Düren, Germany). Cloning was performed in TA cloning systems (pGEM®-T Easy, Promega, Milan, Italy; pCR®2.1 TOPO, Invitrogen, Life Technologies, Monza, Italy; InsTAclone™, Fermentas, Milan, Italy and pSC-A, StrataClone-UA, Stratagene, Agilent Technologies, Milan, Italy). The ligated products (3-5 µl) were transferred by heat shock treatment into chemically competent DH5α cells, which were prepared manually [40], except for the pSC-A StrataClone-UA vector system, for which StrataClone SoloPack competent cells (included in the kit) were used. Clones were screened by M13 colony PCR amplification. Identified positive colonies were inoculated into selective antibiotic LB or SOB medium for overnight culture at 37 °C, 150 rpm in a shaker waterbath. Subsequently, plasmid DNAs were isolated [40] and screened for the release of the expected insert(s) by analytical single or double restriction enzyme digestion (EcoRI or EcoRI+HindIII) according to the vector map. Positive clones were prepared for sequencing and sequenced by commercial vendors (StarSEQ, Mainz, Germany; BMR sequencing, Padova, Italy) with the M13 forward and/or reverse primer, or with one of the gene specific primers for deeper sequencing of the inserts whenever necessary. Sequences were viewed with the sequencing chromatogram trace viewer FinchTV v. 1.4.0 [45].

Sequence Data
Our newly sequenced SCF data can be accessed through NCBI GenBank accession nos. GU386371-GU386374 (Table S1).

mRNA Secondary Structure Analysis
We used the webserver program Mfold v. 3.5 [46] to predict the non-coding RNA (ncRNA) secondary structure stability of the different SCF transcripts and their miRNA target binding sites. The structure of the DNA splice junction was analysed with DNA Folding Form [46]. The ncRNA secondary structures were also predicted with a set of MUSCLE [47] aligned mammalian homologous sequences of the SCF cDNA transcripts using Sequences Selection for the Comparative Approach (SSCA) by Tfold [48]. The optimal secondary structures for all sequences were obtained in dot-bracket notation with minimum free energy, and structural elements such as helices and internal and terminal loops were determined by drawing the RNA structure in the Java applet VARNA v. 3.7 [49]. All fold analyses were performed using the default settings of the web servers.
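Dot-bracket notation encodes each base pair as a matching pair of parentheses and each unpaired base as a dot. As a minimal sketch of how such a string can be read, the following stack-based parser recovers the paired positions; the function name and the short example structure are hypothetical and unrelated to the SCF folds predicted here.

```python
# Sketch: recover base-pair positions from a dot-bracket string.
# Positions are 0-based; pseudoknot symbols are not supported.


def base_pairs(dot_bracket: str) -> list[tuple[int, int]]:
    """Return sorted (i, j) index pairs for every matched '(' and ')'."""
    open_positions: list[int] = []
    pairs: list[tuple[int, int]] = []
    for index, symbol in enumerate(dot_bracket):
        if symbol == "(":
            open_positions.append(index)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {index}")
            pairs.append((open_positions.pop(), index))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {index}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return sorted(pairs)


if __name__ == "__main__":
    # Toy hairpin: a 3 bp helix closing a 4 nt terminal loop.
    print(base_pairs("(((....)))"))
```

Consecutive nested pairs such as (0, 9), (1, 8), (2, 7) form a helix, and the run of dots they enclose is the terminal loop, which is the kind of element a drawing tool then renders graphically.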
The TargetScan program Release 5.1 [50] and miRBase Release 16 [51] were used to locate potential sheep SCF 3′ UTR miRNA target sites from human, mouse, dog, cow and chicken.

Protein Homology Modelling
Protein templates were identified and scrutinized using Template Identification tool at SWISSMODEL Workspace v 8.0.5 [52], Reverse PSI-BLAST (in BLAST 2.2.12 packages) search against protein data bank (PDB) and Structural Classification of Proteins (SCOP) at Genomes TO Protein structures and functions (GTOP) [53].
The homology modeling was performed with Modeller 9v2 [54] using an integrated multiple sequence alignment and multiple structure visualization application 'Friend' v. 2.0 [55]. All the modelled structures were stored as a PDB format data (.pdb) and then viewed, edited with ViewerLite v. 5 [57]. Homology modeling was also attempted with an automated modeling server at SWISSMODEL Workspace [52].

Sequence Analysis and Molecular Phylogeny
Whole mammalian genome scanning was done to identify the homologous regions of sheep SCF cDNA transcript variants using, sequentially, the Basic Local Alignment Search Tool (BLAST) at the National Center for Biotechnology Information (NCBI), Bethesda, Maryland, USA [58], ENSEMBL release 60 [59] and BLAT [60] searches. Sequences were edited and translated using the BioEdit v.7.0.5.2 (Ibis Therapeutics, Carlsbad, CA, USA) [61] and DNASTAR 7 [62] software packages. The open reading frame (ORF) of the full-length SCF cDNAs was determined by DNASTAR 7 [62] and ORF Finder at NCBI (www.ncbi.nlm.nih.gov/gorf/). The positions of exons and introns were determined, and the translated SCF protein to genome structure was drawn using WebScipio [63] in reference to the SCF gene structure of human, mouse and dog. The ClustalW2 [64] and MUSCLE [47] programs were used to align the DNA and protein sequences. Subsequently, the Gblocks program [65] was used to eliminate poorly aligned positions and divergent regions in the DNA and protein alignments for the phylogenetic analysis. The data were then converted to FASTA (.fas) and NEXUS (.nex) formats using DataConvert (v. 1.0) [66].
Distance based neighbour-joining (NJ) phylogenetic trees were generated using the Molecular Evolutionary Genetics Analysis (MEGA) software v. 4.1 [67]. The NJ algorithm [68] was implemented with the p-distance [69], Jukes-Cantor [70] and Tamura-Nei [71,72] model using a transition+transversion substitution at uniform rates as well with the gamma parameter of 4.0. The robustness of each phylogeny was assessed by percentage of 1000 bootstrap (BS) [73] re-samplings.
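Two of the distance measures named above have simple closed forms: the p-distance is the proportion of differing sites, and the Jukes-Cantor distance corrects it for multiple hits as d = −(3/4) ln(1 − 4p/3). The sketch below computes both for a pair of aligned sequences. It is an illustration of the formulas only, not a reimplementation of MEGA; the function names and toy sequences are hypothetical, and sites with gaps or ambiguous bases are simply skipped.

```python
import math

# Sketch: p-distance and Jukes-Cantor distance for two aligned sequences.
VALID_BASES = set("ACGT")


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of compared sites at which the two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for base_a, base_b in zip(seq_a.upper(), seq_b.upper()):
        # Skip gaps and ambiguity codes (pairwise deletion).
        if base_a in VALID_BASES and base_b in VALID_BASES:
            compared += 1
            if base_a != base_b:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance; undefined for p >= 0.75."""
    if p >= 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    # Toy alignment with one difference in ten sites.
    p = p_distance("ACGTACGTAC", "ACGTACGTAA")
    print(p, round(jukes_cantor(p), 6))
```

The corrected distance is always at least as large as the raw p-distance, because it accounts for substitutions that have occurred more than once at the same site and are therefore invisible in a direct comparison.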
Bayesian Inference (BI) consisted of two independent Markov Chain Monte Carlo (MCMC, mcmc nruns) runs of 100,000 generations (ngen), with trees sampled every 10th generation and a burn-in of 25% (sump burnin = 2500; sumt burnin = 2500), i.e., the first 2500 sampled trees were discarded. BI was run with the GTR+G and HKY+G substitution models for the nucleotide alignments and the JTT+G model for the amino acid alignments, under the parameters set out above.
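The burn-in value follows directly from the run length and sampling frequency. The short sketch below reproduces that arithmetic; the function names are hypothetical, and the count ignores any tree that a given program may additionally record at generation 0.

```python
# Sketch: how the burn-in of 2500 trees follows from the MCMC settings.


def sampled_trees(generations: int, sample_every: int) -> int:
    """Number of trees recorded per run (excluding any generation-0 sample)."""
    return generations // sample_every


def burnin_trees(samples: int, burnin_fraction: float) -> int:
    """Number of initial samples discarded as burn-in."""
    return int(samples * burnin_fraction)


ngen = 100_000        # generations per run, as stated in the text
samplefreq = 10       # trees sampled every 10th generation
burnin_fraction = 0.25

samples = sampled_trees(ngen, samplefreq)
discarded = burnin_trees(samples, burnin_fraction)
retained = samples - discarded
print(samples, discarded, retained)  # 10000 2500 7500
```

With two independent runs, this leaves 7500 post-burn-in trees per run to be summarised.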

Use of other Computational Tools and Databases
Ovine SCF transcripts were searched on chr. 5 of the Bos taurus (Btau_5.2, current release 2011) chromosomal map using NCBI Map Viewer [84]. The sequence similarity was visualized with the Circos table viewer [85]. The post-transcriptional regulatory elements located in the 5′ and 3′ untranslated regions (UTRs) of the SCF cDNA transcripts were retrieved from UTR databases (UTRdb or UTRSite) [86] using the online tools UTRScan and UTRBlast. The graphical representation of the SCF amino acid and nucleic acid multiple sequence alignments was drawn with the sequence logo generator WebLogo [87]. SCF polyadenylation sites were predicted using the polyADQ web server [88]. Alternative splicing patterns of the ovine SCF transcripts relative to the human and mouse reference assemblies were predicted using ACEVIEW [89] and Alternative Splicing and Transcript Diversity (ASTD 1.1) [90]. Splice site prediction, such as putative alternative exon isoforms and cryptic and constitutive splice sites of internal (coding) exons, was performed using Alternative Splice Site Predictor (ASSP) [91] and Regulatory RNA Motifs and Elements Finder (RegRNA Release 1.0) [92]. SCF protein knowledge, sequence analysis and classification were performed with the UniProtKB Protein existence Server [93]. SCF protein secondary structure and site interactions were analyzed using the Protein Data Bank (PDB) [94] and PDBsum [95]. The putative SCF protein domain figure was drawn with MyDomains - Image Creator at ExPASy [96].

Ethics Statement
In agreement with the new European Directive on the protection of animals used for scientific purposes (Directive 2010/63/EU, Article 15, Annex VIII), all animal procedures used in the study are classified as 'mild' (i.e. procedures with no significant impairment of the well-being or general condition of the animals) and have been preemptively approved by the Animal Ethics Committee of the University of Camerino.

Identification and Isolation of the Sheep SCF cDNA Fragment
To examine the SCF variant(s) expressed in the skin of white merino sheep, 1-1.5 μg of total RNA from the skin was reverse transcribed and the synthesized single-strand cDNAs were amplified by PCR. We initially amplified the cDNA coding (CDS) region using the primer pair scffwd1 and scfrev1 (Table S2). Primer walking and the mRNA/cDNA structural coverage of the longer and shorter cDNA amplification strategies from ovine total RNA (skin) are shown in Figure 1A and 1B. RT-PCR primers were selected based on the mammalian nucleotide (nt) sequence alignment of the soluble-SCF (s-SCF) cDNA, so that the 621 bp amplicon encompassed 606 bp of the open reading frame (ORF) (Figure 1A(a)) of the form commonly known as the 'soluble or secreted form'. The purified RT-PCR amplification product was then cloned and sequenced. Sequencing results revealed no differences among the white, black and brown clones of the 621 bp fragment (Figure S1A), which additionally appear to be nearly identical (99%) to two previously submitted NCBI GenBank mRNA (partial) sequences of ovine s-SCF (U89874.1 in 2002 and Z50743.1 in 2005; see Figure S1B) from ovarian follicles and keratinocytes, respectively. The exception was a transition, T54C, observed in U89874.1 relative to the 621 bp sequences. Similarly, a transversion, C81G, was observed (see the chromatogram of Figure S1B) in 2 out of 5 clones sequenced from the white animal. Further analysis of a possible allelic variant at this position is needed to establish its true identity. Nevertheless, these substitutions do not result in an amino acid change.
The virtual translation of the 606 bp CDS (of the 621 bp amplicon) resulted in a protein corresponding to the first 202 amino acids (aa) (Figure S1A and S1B) of the ovine SCF (oSCF), in which the last 28 aa at the C-terminus span the putative primary proteolytic region of the long isoform, i.e., the s-SCF or (+) form. The translated sequence is identical to three GenBank oSCF protein sequences of 260 aa, 202 aa and 267 aa (Acc. No. AAB49491, CAA90620.1 and P79368, respectively).

Rapid Amplification of cDNA Ends (RACE)
To obtain the full-length cDNAs, we performed the 3′ and 5′ RACE experiments sequentially. Two different sets of primers (Table S2) were used for RT-PCR amplification in order to ascertain the corresponding 3′ and 5′ untranslated regions (UTRs) of the two different transcript variants, i.e., (+) and (−).

3′ RACE - Detection of a Splice Variant of Ovine SCF
Initially, 3′ RACE cDNAs were prepared as described in materials and methods. One μl of this cDNA was used for the first-round PCR amplification with the common CDS region forward primer and the oligo(dT)18 modified reverse primer (Table S2). We obtained a prominent amplicon of approximately 350 bp, which was unexpectedly short, since the 3′ UTR of other mammalian SCF mRNA species ranges from ~500 bp to ~4.5 kb. This amplicon was gel purified, cloned into the TA cloning system and sequenced. To our surprise, the BLASTN sequence analyses revealed a 336 bp oSCF product (Figure 1B(b5)). By overlapping the 336 bp fragment with the 621 bp CDS amplicon, we obtained a novel, truncated oSCF mRNA splice variant of 691 bp (without the 5′ UTR). Subsequent virtual translation of the 546 bp ORF resulted in a truncated oSCF protein of 181 amino acids with a unique C-terminus. The concomitant deletion in the shorter clone resulted in the substitution of aspartic acid (D) at aa pos. 175 with glycine (G), i.e., D175G. The truncation deletes the C-terminal 93 aa residues of ovine s-SCF, while the sequence is fully conserved up to G175, as explained below. Hence, the new truncated protein isoform has a short stretch of 6 aa, 'KTYKHS', right after the G175 residue as its C-terminus (Figure S1B). This shorter form of oSCF has not been previously reported; however, short isoforms of SCF commonly known as the membrane-bound form (m-SCF), corresponding to 245 aa and lacking the proteolytic site, have been reported as the (−) form in other species including human [97,100], mouse [11], cow [98] and avian [99]. The newly identified 181 aa oSCF (−) form differs from the 245 aa form by the deletion of 64 aa at the C-terminus, corresponding to the transmembrane and intracellular regions.
Hence, this novel cDNA variant could be recognized as the 'membrane-anchored' SCF protein (m-SCF) form and was named 'SCF truncated isoform-2', designated hereafter as the (−) form since it lacks the primary proteolytic site. To our knowledge, this truncated oSCF (−) protein product has not previously been reported in other mammalian species, especially in skin.
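The virtual translation step can be sketched as follows, assuming the standard genetic code. The short nucleotide string is a hypothetical example whose codons were chosen arbitrarily to encode the reported 'G-KTYKHS' C-terminus followed by a stop codon; it is not the sequenced sheep cDNA. The helper `orf_length_to_aa` (also a hypothetical name) shows how the 546 bp and 825 bp ORFs, each including a stop codon, correspond to 181 aa and 274 aa.

```python
from itertools import product

# Standard genetic code, built in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a coding sequence up to (not including) the first stop codon."""
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def orf_length_to_aa(orf_bp):
    """Number of residues encoded by an ORF whose length includes the stop codon."""
    if orf_bp % 3:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_bp // 3 - 1


# Hypothetical codons for Gly + 'KTYKHS' + stop (not the real sheep sequence).
toy_tail = "GGAAAAACTTATAAACATTCTTAA"
print(translate(toy_tail))                             # C-terminal toy peptide
print(orf_length_to_aa(546), orf_length_to_aa(825))    # (-) and (+) ORFs
```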
The remaining short 145 bp (after removing the adapter sequences from the oligo(dT)18 modified primer), including the polyA nucleotides, belong to the 3′ UTR of the ovine m-SCF (−) form. Mammalian genome scanning for the SCF gene showed that this novel 3′ UTR of the ovine m-SCF (−) form corresponds to the intervening sequence between exon 5 and exon 6, i.e., intron-5 of the (+) form. Here we hypothesize that the premature truncation could be the result of alternative use of a splice donor/acceptor site in the intervening sequence. Later, this short 3′ UTR amplification was confirmed (Figure 1B(b5)) in black and brown animals by direct sequencing, but was not considered for further characterization such as SNPs.
In order to identify the 3′ UTR of the (+) form, we used the same 3′ RACE cDNA preparation as mentioned above. One μl was used for the first-round amplification with the common CDS region forward primer and the oligo(dT)18 modified reverse primer (Table S2). We obtained three different RT-PCR amplicons ranging from ~700 to 1300 bp (Figure 1A(b)). At this stage it was difficult to substantiate this amplification. Hence, we performed three individual nested amplifications sequentially, using the oligo(dT)18 modified reverse primer with the Nested forward primers (Table S2) for the subsequent PCR reactions. All these amplified nested fragments were gel purified and cloned into the TA cloning system. Colonies were screened by colony PCR as well as by restriction digestion, and the positive clones were subjected to sequencing.
Sequencing results showed three fragments of different sizes, one from each Nested amplification (Table S2), viz. 597 bp (Figure 1B(b3)), 389 bp (Figure 1B(b4)) and 336 bp (Figure 1B(b5)), as positives for oSCF. Sequence analysis by BLASTN, BLASTP and ClustalW2 revealed all three products to be the ovine m-SCF (−) form, identical to the 336 bp product described above, because the (−) form outcompetes the (+) form during the RT-PCR amplification. In other words, there is a considerable difference in the mRNA expression level between these two transcript variants, which is explained further in a later section. The rest of the amplicons were found to be non-specific, including the two expected amplicons, viz. ~0.7/1.2 kb (Figure 1B(b3)), amplified from the primary RT-PCR amplification (Figure 1A(b) and B(b)).
In all the above cases we always obtained the (−) form; hence we designed a splice variant-specific Nested forward primer (Table S2) with a higher Tm for the (+) form. The primer was designed across the junction of two exons (see Figure 1A and 2(c)), spanning into the proteolytic site, viz. exon 5 into exon 6, with reference to the human, mouse, dog and horse SCF (source: Ensembl). The second-round 3′ RACE amplification (Nested 1; see Table S2) was performed with 1 μl of the primary reaction product using the (+) form-specific forward primer (Table S2) and the oligo(dT)18 modified reverse primer in a final PCR volume of 50 μl. The RT-PCR yielded an amplicon of 855 bp (Figure 1A(b1)). A further third-round amplification (Nested 2; see Table S2) yielded the expected 793 bp amplicon along with some non-specific amplicons. The purified 793 bp fragment (Figure 1A(b2)) was then cloned and sequenced. Sequence analyses by BLASTN and BLASTP confirmed it as oSCF; it was named 'SCF isoform-1', hereafter referred to as the (+) form, which is the counterpart of the previously reported 'soluble' SCF (s-SCF) sequences in other vertebrate species [99-103] (source: GenBank, NCBI). By overlapping and editing the 793 bp 3′ UTR fragment with the 621 bp CDS fragment, we obtained a total length of 1330 bp (without the 5′ UTR). The 825 bp ORF, corresponding to a deduced amino acid sequence of 274 aa, identified it as the s-SCF (+) form, indicating that this cDNA encodes the 'soluble' form of oSCF. This ovine s-SCF (+) form includes the stretch of 28 aa recognized as a putative primary proteolytic site (Figure S1B) right after D175 towards its C-terminus, as observed in the previously reported sequences [99-103]. The remaining 505 bp (after removing the adapter sequences from the oligo(dT)18 modified primer), including the polyA nucleotides, belong to the 3′ UTR of the ovine s-SCF (+) form.
The other two amplicons (data not shown) were found to be non-specific and omitted from further characterization.

5′ RACE
To determine the 5′ UTR of the oSCF (+) form, a gene-specific 5′ RACE cDNA was synthesized using the proteolytic site-specific reverse primer (Figure 1A; see Table S2) as described in materials and methods. Three μl of the dC-tailed cDNA was subjected to the first-round RT-PCR amplification with the respective forward and reverse primers (Table S2). After the primary RT-PCR, the expected amplicon of ~780 bp was not detected on the gel. Consequently, a second-round nested amplification was performed with a common CDS reverse primer and the forward adapter primer (Table S2) [...]
The 5′ RACE RT-PCR for the oSCF (−) form was performed in a final PCR reaction volume of 50 μl containing 3 μl of the dC-tailed cDNA, which was synthesised with a common CDS reverse primer (Figure 1B; see Table S2), along with all other necessary components as described in materials and methods. The first-round enrichment PCR amplification was carried out using the same CDS reverse primer and the forward adapter primer (Table S2). The second-round nested amplification was performed with another common CDS reverse primer and the forward adapter primer (Table S2) using 2-3 μl of the primary enriched RT-PCR reaction. Upon 1.5% gel electrophoresis, the secondary reaction yielded two distinct amplicons in the range of 200 to 330 bp. The two amplified 5′ RACE products were gel purified, cloned and sequenced. Sequence analysis revealed two oSCF 5′ RACE products of 325 bp and 215 bp (Figure 1B(d)). These two 5′ UTR products were not detected in the (+) form-specific 5′ RACE cDNA (Figure 1A(c)), even though the primer combination relies on the common CDS region. Hence, these two 5′ UTR products were characterized and named 'SCF isoform-2a (−) and 2b (−)', respectively (Figure S4(a1)). In order to confirm the amplification, this common 5′ RACE was repeated twice along with the (+) form-specific 5′ RACE RT-PCR.
Overlapping and sequence comparison of these two clones with the 691 bp fragment (CDS + 3′ RACE) revealed deduced 5′ UTR sequences of 144 bp and 34 bp (after subtraction of the forward adapter primer sequence) for the two respective clones (325 bp and 215 bp).
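The fragment lengths reported in the RACE sections can be cross-checked with simple bookkeeping. The sketch below uses only numbers stated in the text; the variable names are illustrative.

```python
# (-) form: ORF and 3' UTR lengths (bp) as reported for the 691 bp assembly.
minus_orf, minus_utr3 = 546, 145
minus_assembly = minus_orf + minus_utr3          # CDS + 3' RACE, without 5' UTR

# Two alternative 5' UTRs deduced for isoform-2a and isoform-2b.
minus_utr5 = {"isoform-2a": 144, "isoform-2b": 34}
minus_full_length = {name: utr5 + minus_assembly
                     for name, utr5 in minus_utr5.items()}

# (+) form: ORF and 3' UTR lengths (bp) as reported for the 1330 bp assembly.
plus_orf, plus_utr3 = 825, 505
plus_assembly = plus_orf + plus_utr3

print(minus_assembly, minus_full_length, plus_assembly)
```

The two (−) totals agree with the 835 bp and 725 bp transcript sizes given in the abstract.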

Genomic DNA -Spliceosomal Intron-5 Specific Amplification of oSCF
To verify the alternative splicing (AS) event that resulted in the shorter mRNA transcript, i.e., the ovine m-SCF (−) form, we amplified the intervening sequence between two exons. The sequenced chromatograms from the cDNA and gDNA of oSCF, illustrating a PTC followed by the p(A)11/18 tail signal, are shown in Figure 2(a,b), respectively. The reference SCF genomic locus at the exon 5-intron(5)-exon 6 splice junction was determined by comparison with the orthologous SCF gene assemblies of human, mouse, rat, cow, horse and dog (source: Ensembl). The genomic DNA (gDNA) was obtained from the blood of white merino sheep. The expected 948 bp amplicon (Figure 2(d)) was amplified using an exon 5 (common CDS)-specific forward primer and an exon 6-specific reverse primer (+ form, proteolytic site; Table S2) as shown in Figure 2(c). Sequence analyses and orthologous comparison of the oSCF gene product (948 bp) with other mammals revealed that the first 136 bp correspond to exon 5, followed by an intron-5 of 729 bp (Figure S4(b)) and an exon 6 of 83 bp which encodes the primary proteolytic site. This result was compared with the shorter cDNA transcript. The first 161 nt of intron-5, including an 11 bp polyA (pA) stretch, exhibited 100% identity to nt pos. 668-835 of the shorter cDNA (Figure S4(c)). Careful annotation of the 161 nt unveiled a premature stop codon at nt pos. 21-23 of the 729 bp intronic sequence. Figure 3 shows the oSCF gene structure(s) with reference to the mouse, dog and human SCF genes (see also Figure S2 for the human SCF alternative forms). The overall similarity of this 948 bp DNA splice region with other vertebrates was highest with goat and cow SCF (99 and 94%), whereas the lowest was detected with chicken and zebra finch SCF (62%).
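The idea of locating a premature termination codon when translation reads through from exon 5 into intron-5 can be sketched as below. The intron fragment is a hypothetical toy sequence, and the reading-frame offset of 2 is an assumption chosen so that the toy reproduces the reported stop position (nt 21-23 of the intron); neither is taken from the sequenced sheep gDNA. The segment lengths are those reported for the 948 bp amplicon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Segment lengths (bp) of the exon 5-intron(5)-exon 6 amplicon, from the text.
segments = {"exon5": 136, "intron5": 729, "exon6": 83}


def first_in_frame_stop(seq, frame=0):
    """Return 1-based (start, end) of the first in-frame stop codon, or None.

    `frame` is the 0-based index of the first complete codon in `seq`.
    """
    seq = seq.upper().replace("U", "T")
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i + 1, i + 3
    return None


# Hypothetical start of an intron: GT donor, six sense codons, TAA, trailing bases.
toy_intron = "GT" + "AAAACTTATAAACATTCT" + "TAA" + "TATAAAGG"

print(sum(segments.values()))                     # total amplicon length
print(first_in_frame_stop(toy_intron, frame=2))   # position of the toy PTC
```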
Intron-5 has a constitutive 5′ splice donor (GT) at its start and six other alternative isoform/cryptic splice donor (GT) sites (Figure S4(b)). Similarly, it has a constitutive 3′ splice acceptor (AG) site exactly at the end of intron-5 and five other alternative isoform/cryptic splice acceptor (AG) sites (Figure S4(b); see also Figure 3(d)), as predicted by ASSP and RegRNA [91,92]. Seven important sequences, the so-called 'branch sites' (BS; Figure S4 [...] 'CUGAU' are considered the most likely main branch point sites that could be involved in the AS event. PolyADQ [88] prediction revealed two polyadenylation signals (PAS) of the type 'AAUAAA' in the 729 bp gDNA (intron-5; Figure S4(b)), but these lie after the p(A)11 stretch and hence were not considered to be part of the polyadenylation. However, we hypothesize that two other single-base variants of 'AAUAAA' [104], namely 'UAUAAA' at nt pos. 24 (right after the stop 'TAA') and 126, and 'AAUAUA' at nt pos. 83 and 124, found just before the p(A)11 stretch, could be responsible for the polyadenylation of the shorter mRNA (−) transcript (Figure S4(b); see also Figure 3(d)). These two strong polyA signals are also present in the cDNAs of the respective oSCF mRNA transcripts (Figure S4(c)). The other two single-base variants, 'AAUAGA' and 'UAUAAA', detected at 407 nt and 427 nt (Figure S4(b)) away from the p(A)11 [...]
[...] [63] revealed that the oSCF gene consists of 9 exons interrupted by 8 introns in comparison to the dog (Figure 3(e)), pig and horse SCF genes, whereas in comparison to human, chimpanzee, marmoset, mouse (Figure 3(e)) and rat, including the unfinished alpaca genome (source: Ensembl), the oSCF gene is characterized by 10 exons and 9 introns. Comparative analyses of the oSCF (+) protein with the dog and mouse SCF gene assemblies exhibited match ratio/% identity values of 96/90.1 and 93/80.6, respectively. Similarly, the oSCF (−) protein showed match ratio/% identity values of 91/87.2 and 90/77.7 with the dog and mouse SCF gene assemblies, respectively. Among the 9/10 exons, gene annotation (source: Ensembl) predicts that exon 5 and exon 6 are important in determining the final protein product through AS event(s), and that the longer exon 10 (9) corresponds to the ~4.4 kb 3′ UTR in human, chimpanzee, mouse, rat and goat, in contrast to the shorter 3′ UTR in sheep (reported in this study), cow, pig, horse, dog, cat and panda (source: Ensembl).
Figure 2 (caption fragment). [...] 11 signal (see key to symbols below the diagram); (d) Gel picture shows the PCR [...]
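A scan for the canonical polyadenylation hexamer and its single-base variants, as discussed above, can be sketched as follows. This is not the polyADQ algorithm (which is not described here) but a plain string search; the toy sequence and function names are hypothetical.

```python
CANONICAL = "AATAAA"  # DNA form of AAUAAA


def single_base_variants(motif, alphabet="ACGT"):
    """All sequences differing from `motif` at exactly one position."""
    variants = set()
    for i, base in enumerate(motif):
        for alt in alphabet:
            if alt != base:
                variants.add(motif[:i] + alt + motif[i + 1:])
    return variants


def find_polya_signals(seq):
    """Return (1-based position, hexamer, kind) for each canonical or variant hit."""
    seq = seq.upper().replace("U", "T")
    variants = single_base_variants(CANONICAL)
    hits = []
    for i in range(len(seq) - 5):
        hexamer = seq[i:i + 6]
        if hexamer == CANONICAL:
            hits.append((i + 1, hexamer, "canonical"))
        elif hexamer in variants:
            hits.append((i + 1, hexamer, "variant"))
    return hits


# Hypothetical toy sequence containing one variant and one canonical signal.
toy_utr = "GGTATAAACCAATAAAGG"
for hit in find_polya_signals(toy_utr):
    print(hit)
```

A bare hexamer search of this kind will over-report in AU-rich sequence, so positional context (for example, distance upstream of the poly(A) stretch) still has to be judged separately, as is done in the text.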

Protein Characterization of the Ovine SCF s-SCF (+) and m-SCF (−)
The molecular mass of the oSCF isoforms presented in this study, as predicted by EditSeq, DNASTAR [62], is 31.1 kDa with a theoretical iso-electric point of 5.236 for the s-SCF (+) isoform of 274 aa. Similarly, the m-SCF (−) isoform has a molecular weight of 20.6 kDa with a theoretical iso-electric point of 6.002 for its 181 aa residues.
Topological features of both isoforms (+ and −) of ovine SCF in comparison to human SCF are given in Figure 5(a,b). In the ovine s-SCF (+) form, the first 25 amino acids contain features (Figure 5a) [...] Figure S3(d)). The above described features of the ovine m-SCF (−) form are shown in Figure 5(b), which depicts the absence of the primary proteolytic site including an N-glycosylation site, a transmembrane domain (necessary to make a soluble product) and a cytoplasmic domain. A sketch of oSCF gene transcription and translation is shown in Figure 5(c).
Similarly, the deduced 181 aa sequence from the 546 bp CDS of the ovine m-SCF (−) form shares 49-99% identity with the predicted m-SCF (−) form of the same length from a number of other vertebrate species (Figure 6(a,c); see also Figure S3(a,c)), including avian SCF. The highest identity was with goat (99%) followed by cow (95%), whereas the lowest was with chicken (49%) followed by zebra finch SCF (51%).

Skin Expression of the Two Ovine SCF Splice Variants
Initially, to verify any difference(s) between the expression levels of the two splice variants of oSCF (+/−), four sets of primers (summarized in Table S2) were used as described in materials and methods. Three individuals each of white, black and brown animals were subjected to a single-round RT-PCR amplification. The RT-PCR reactions gave fragments (see Table S2 for details) exhibiting almost the same band intensity for both the (+) and (−) forms (data not shown). In contrast, Northern blot analysis showed substantial differences in the expression of oSCF between the (+) and (−) forms (Figure 7). At this juncture, we propose that oSCF gene expression in white, black and brown animals at the mRNA transcript level is mediated via an intron-5 AS event (Figure 3(c,d)). However, both forms (+/−) are biologically active and are reported to have different effects on cells [9-11,20]. The regulation of processing of the proposed secondary proteolytic cleavage site encoded by exon 7 could play a critical role in the function of the membrane-associated SCF (−) protein [10].
Figure 2 (caption fragment, continued). [...] amplification of the 948 bp fragment corresponding to the above schema (c) of the oSCF gene from blood gDNA. In the picture, the arrow indicates the exact size of the amplicon; M1 indicates the DNA size marker, λ-DNA EcoRI/HindIII digest; and (−)ve represents the PCR negative control. doi:10.1371/journal.pone.0038657.g002

SCF UTR Regulatory Motifs that Affect mRNA Stability
The different 5′ and 3′ UTR sequences of sheep SCF(s) were searched against the UTR databases [86,92] for the post-transcriptional regulatory elements located in the 5′ and 3′ [...] Figure S4(a1). The critical regulatory sequences, known as Cytoplasmic Polyadenylation Elements (CPEs), are AU-rich elements (AREs) [108] located in the 3′ UTR near the canonical nuclear polyadenylation element (AAUAAA), key sequence features controlling mRNA deadenylation and decay. Surprisingly, sheep SCF mRNA has the following single-base variant [104] of the type CAUAAA (nt.

MicroRNA Targets: Another Type of cis-acting Regulatory Element
The above described differences between the two ovine splice variants, i.e., (+) and (−), in the conservation of non-coding sequences (Figure S4(d)) suggest that the 3′ UTRs might have a functional role in gene regulation.
A number of potential miRNA target sites are found within the longer ~4.4 kb 3′ UTR sequence of human SCF (data not shown). However, in sheep, the analyzed miRNA sites located in the 505 bp 3′ UTR of the ovine s-SCF (+) form belong to the miRNA families miR-27a/b, miR-194, miR-128 and miR-370, with two sites each for miR-132/212 and miR-320/320abcd (Figure 9(a)), whereas miR-669f/a/o-3p, miR-466b and miR-828b are detected on the shorter 3′ UTR segment (144 bp) of the ovine m-SCF (−) form (Figure 9(b)). Interestingly, the 8-mer miRNA (miR-669f) has a high context score (87 percentile) and binds to 21 nt of the 23 nt of the 3′ UTR target of the oSCF (−) form (Figure 9(b)).
Figure 5. Schematic representation of the topological characteristics of two different ovine SCF (oSCF) protein products in comparison to human SCF (huSCF). (a) Illustrates the identical topological features for the soluble oSCF (+) and huSCF (+), which correspond to 273 aa vs. 274 aa, respectively. D174/175G represents the change of aa residue for the alternative natural variant, i.e., right at the proteolytic site (28 aa, 'green line'). The difference in position is due to sequence divergence of soluble oSCF (+), which has an additional 'Glu' residue at 'E154' (see Figure S3d). (b) Demonstrates the difference in topological features of the membrane-bound oSCF (−) and huSCF (−), which correspond to 181 aa vs. 245 aa, respectively. This novel ovine m-SCF (−) has a unique C-terminus with an additional uncharacterized 6 aa (176-181, see key to symbols) right after D175G. Given below the diagram (in 5a, b) are the appropriate topological features (see key to symbols) of human and ovine soluble SCF (+) and membrane-bound SCF (−) with reference to UniProt ID P79368 and P21583. (c) Schematic representation of ovine SCF gene transcription and translation in skin (hypothetical view). The corresponding oSCF protein products, s-SCF (+) and m-SCF (−), and their topological characteristics are labeled and highlighted, respectively. doi:10.1371/journal.pone.0038657.g005
Figure 6. Graphical representation of evolutionary conservation of sheep SCF isoforms. (a) Percent conservation was calculated for sheep, goat, cow, pig, cat, dog, panda, horse, human, chimpanzee, marmoset, mouse, rat, rabbit, chicken, zebra finch and fishes such as zebra fish and gold fish using the multiple sequence alignment (MSA) tool ClustalW2 with four different data sets (provided on request). The Circos graphical table view represents the sheep soluble s-SCF (+) and membrane-bound m-SCF (−) nucleotide (nt) and protein (aa) sequences as the query sequences (in black dotted left bracket) against 17 other vertebrate species. Four small bars of different colours on the query sequences represent the four different data sets of sheep s-SCF (+) and m-SCF (−) nt/aa sequences. The 15 ribbons of different colours passing through each other represent the respective vertebrate species, and the percent identity is indicated outside as the boundary. The four small bars of different colours over the 15 vertebrate species, as against the 15 small bars of different colours above the sheep query sequences, represent the percent identity among each other. The scale over each species (above the small bar) represents the total score obtained from the sequence coverage. (b) Graphical logo representing the conservation of the oSCF splice junction (intron-5), generated by MUSCLE alignment (manually predicted for other species), depicting the GT repeats (black oval dotted lines) proximal to the poly(A)11 stretch (black dotted right brace symbol). The constitutive splice donor (GT) and acceptor (AG) sites are circled by black dotted lines, along with one proposed usage of an alternative/cryptic splice donor site (GT) (see Figure 3f). Numbers below the logo indicate the nucleotide/amino acid position of the MUSCLE-aligned sequences. (c) Logo representing the 23 nt conservation of the m-SCF (−) form (novel sequence reported in this study), with its deduced 7 aa new C-terminus. (d) Graphical logo representing the 84 nt conservation of the s-SCF (+) form, with its deduced 28 aa proteolytic site. Numbers below the graphical representations of (c) and (d) indicate the actual nucleotide/amino acid position. The height of the letters in each logo represents the relative frequency of each nucleotide/amino acid at a given position. doi:10.1371/journal.pone.0038657.g006

Homology Modeling
The three-dimensional structures of the deduced SCF protein, corresponding to 141 aa and 132 aa residues, were modelled using the best matched PDB templates with 90-100% identity to the individual chains, such as 1EXZ, 2E9W:chainC,D and 1SCF. The structures were predicted using Modeller 9v2 [54] as described in materials and methods. The quality assessment of the modelled structures was performed at the SWISS-MODEL Workspace [52].
Topologically, the modelled oSCF structure has a core of four alpha (α)-helices (αA, αB, αC and αD) and two antiparallel beta (β)-strands arranged to form a protomer, i.e., β1 between αA and αB and β2 between αC and αD. Apart from this, it consists of three additional unique conformations, i.e., a one-turn helix, αB', between β1 and αB, a hairpin loop between αB and αC at the dimer interface, and an extra one-turn helix, αD', in the C-terminal extension [112]. This conformation is in accordance with the crystal structures determined for 1EXZ, 1SCF and 2E9W [112-114]. The best models were chosen based on the quality assessment reports of ProCheck [115] and Promotif [116]. The calculated Ramachandran plot showed that 91-95% of the aa residues lie in the core region for the structures modelled using Modeller 9v2, representing the most favourable combinations of phi-psi values and indicating a better stereochemical quality of the oSCF protomers, whereas the structure modelled using the automated comparative protein modelling server at SWISS-MODEL exhibited 70.7% in the core region. Six out of eighteen modelled structures were picked, and the superimposition of one of the oSCF monomer models onto the PDB template 1EXZ:chainB is shown in Figure S5. All these observations suggest a correct structure and folding for the modelled putative oSCF.

Molecular Phylogenetic Analyses
The evolutionary divergence of the sheep SCF cDNAs and their corresponding protein sequences was studied using other vertebrate sequences from GenBank, Ensembl and the necessary BLAT searches. Except for the s-SCF (+) form, the spliceosomal intron junction on the DNA sequences and the m-SCF (−) form were predicted manually in accordance with the ovine SCF sequences.
Figure 7 (caption fragment). [...] (−) mRNA expression. Ovine 18S rRNA was used as an internal control. Northern blot analysis was carried out with a DIG-labeled cDNA probe for SCF and 18S rRNA (see Table S2) as described in the Materials and Methods section. Br, Bl and Wh represent individuals of Brown, Black and White merino sheep, respectively. doi:10.1371/journal.pone.0038657.g007
Figure 8 (caption fragment). [...] Figure S4(a1,2,3). Similarly, IV (+84 nt) and V (−84 nt) represent the potential fold difference for the proteolytic segments (dotted dark red arrows directed to the corresponding expanded structures). doi:10.1371/journal.pone.0038657.g008
Five different alignments were constructed for the phylogenetic analysis (data sets provided on request): 1) SCF (+) CDS nucleotide data sets (14 mammals, 2 avian and 2 fish, 822 nt unambiguously aligned characters); 2) SCF (+) CDS deduced protein sequences (13 mammals, 2 avian and 2 fish, 274 aa unambiguously aligned characters); 3) predicted SCF (−) CDS nucleotide data sets (12 mammals and 2 avian, 543 nt unambiguously aligned characters); 4) predicted SCF (−) CDS deduced protein sequences (11 mammals and 2 avian, 181 aa unambiguously aligned characters); and 5) predicted SCF DNA sequences concatenated across exon 5-intron(5)-exon 6 (12 mammals and 2 avian, 948 nt unambiguously aligned characters). Unambiguously MUSCLE [47] aligned sequences were confirmed by eye, and unnecessary gaps were excluded from the alignments with the GBLOCK program [65] prior to phylogenetic analyses. Phylogenetic relationships were inferred from all five alignments using neighbour-joining (NJ), maximum likelihood (ML) and Bayesian inference (BI) methods as described in materials and methods. The best-fit models were selected from 88 nt models [81] and 56 aa models [82] based on the AIC/AICc/BIC/−lnL scores. After the appropriate model selection, the final trees were constructed using the simple p-distance for the NJ method, JTT+G as the protein model for the ML and BI methods, and GTR+G and/or HKY+G as the nucleotide substitution models for ML and BI. Numbers on the respective nodes denote the supporting bootstrap values of NJ and ML in percentages and the Bayesian posterior probabilities, respectively, separated by a solidus (/) symbol (Figure S6(a-e)). Apart from the regular GTR+G and HKY+G models, the other useful nucleotide substitution models for our evaluated data sets include TIM3+G, TPM3uf+G, TVM+G, TrN+G and TPM1uf+I+G. All these evaluated models differ in their respective scores by ±5 and produced consistent tree topologies.
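The model-selection scores named above follow standard definitions: AIC = 2k − 2lnL, AICc = AIC + 2k(k+1)/(n − k − 1) and BIC = k ln(n) − 2lnL, where k is the number of free parameters and n the sample size (here taken as the number of aligned sites, which is one common convention). The sketch below ranks two candidate models; the log-likelihoods and parameter counts are hypothetical values for illustration only and are not results from this study.

```python
import math


def information_criteria(lnL, k, n):
    """Return AIC, AICc and BIC for a model with log-likelihood lnL."""
    aic = 2 * k - 2 * lnL
    aicc = aic + (2 * k * (k + 1)) / (n - k - 1)
    bic = k * math.log(n) - 2 * lnL
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


# n = 822 aligned nucleotide sites (alignment 1 in the text).
n_sites = 822

# Hypothetical (lnL, k) pairs; real values would come from the model-testing software.
candidates = {
    "HKY+G": (-5010.0, 40),
    "GTR+G": (-5000.0, 44),
}

scores = {name: information_criteria(lnL, k, n_sites)
          for name, (lnL, k) in candidates.items()}

# The model with the lowest score is preferred under each criterion.
best = {crit: min(scores, key=lambda m: scores[m][crit])
        for crit in ("AIC", "AICc", "BIC")}
print(best)
```

In this toy case AIC and AICc favour the richer model while BIC, which penalizes parameters more heavily, favours the simpler one, which is why several criteria are usually inspected together.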
All five constructed phylogenetic trees (Figure S6(a-e)), based on oSCF nucleotide and protein sequences (5 different data sets, provided on request), produced similar monophyletic clusters of mammals, avian species and fishes, indicating that all the species were delineated successfully, in harmony with the established positioning of these vertebrates. In the trees (Figure S6(c,d,e)), pig-1 and pig-2 represent two possible predicted m-SCF amino acid, nucleotide and DNA splice junction sequences, respectively (data sets provided upon request). Note: the s-SCF and m-SCF protein sequences of chimpanzee have 100% identity and were hence omitted from the MLA and further tree analyses.

Discussion
Stem cell factor (SCF), characterized as mast cell growth factor (MGF), is a multifunctional growth factor for haematopoietic progenitors, germ cells, melanocytes and mast cells [117]. It is mainly produced by fibroblasts, keratinocytes, endothelial, bone marrow, thymic stromal and small cell lung cancer cells [117]. Moreover, SCF mRNA (cDNA) structure and expression have been identified in a variety of other tissues such as brain, kidney, lung and placenta (source: Ensembl, Aceview). Perhaps one of the more interesting advances in the area of hair follicle melanogenesis is the isolation of SCF. Although considerable information on SCF cDNA sequences is available in the GenBank repository (NCBI) for several mammalian species, the full-length mRNA (cDNA) structure for sheep (Ovis aries) has remained unclear until now (source: Oarv2.0, GenBank, NCBI, March 2012). To our knowledge, there is no experimental evidence or report of the existence of ovine SCF in skin. Taking into account the potential role exerted by SCF in hair follicle melanogenesis [16,36], ovine SCF cDNAs were amplified, cloned and sequenced from the skin of white merino sheep (Figure 1 and 2). Nucleotide sequence analyses and the deduced amino acid sequences disclose the orthology of the ovine SCF gene with other mammalian species (Figure 3 and 6(a); see also Figure S3(a-d)). Herein, we report for the first time the isolation of two alternatively spliced, full-length oSCF mRNA (cDNA) transcripts from skin biopsies of white merino sheep: the longer SCF isoform-1 (+), widely known as the 'soluble or secreted' (s-SCF) form, and a shorter SCF truncated isoform-2a/b (−) (a/b denotes the 5′ UTR differences; see Figure S4(a1)), possibly characterized as the 'membrane-anchored' (m-SCF) form.
The latter has been identified and characterized as 'novel' in that the truncated (−) form reported in this study is devoid of the 28 aa proteolytic site, including an N-linked (GlcNAc) glycosylation site, and of the 23 aa transmembrane region followed by the 35 aa cytoplasmic tail, in comparison to the commonly known SCF (+) form (Figure 5(a,b)). As a result of the premature termination codon (PTC) in intron-5, the novel protein isoform has a unique, truncated, short stretch containing 'KTYKHS' (6 aa) as its novel C-terminus (Figure 6(c,d) and Figure S3(b,c); see also Figure S1B). It has been proposed that soluble SCF is derived from the transmembrane form by proteolytic cleavage within its extracellular domain [25].
The primary oSCF cDNA fragment (621 bp; Figure S1A,B), corresponding to a CDS of 606 bp, closely matches the previously described oSCF sequences [118,119].
The only exception in the deduced 202 aa is at Q134 (glutamine), which has been reported as E134 (glutamic acid) [118]. However, our virtual translation of the sequenced oSCF cDNAs confirms Q134, in agreement with the previously reported oSCF [119]. In addition, the bovine SCF amino acid sequence also has a Q at position 134 [98] (Figure S3(d)). In the present study, the coding region of the longer oSCF (+) form is identical to that of previously isolated human SCF (Figure 5(a)) and corresponds to the SCF (+) form of other mammals (Figure S6(a,b)). In contrast, the shorter oSCF (−) form identified in the present study has a premature termination codon (PTC) in intron-5 (Figure S4(b)), leading to the complete skipping of exon 6 to exon 9/10, i.e., it differs in the cassette exon (CE 6-9/10). Since this splicing event completely eliminates the proteolytic site, the transmembrane region and the subsequent cytoplasmic domain of the oSCF (+) protein (Figure 5(a,b)), the product of the shorter isoform would not be secreted (Figure 5(c)). The cell would perhaps require an alternative mechanism for producing this shorter isoform. At this stage, it is important to examine which cell type (melanocyte, keratinocyte or fibroblast) produces this truncated oSCF (−) form and whether it is localized in the intracellular or extracellular environment (Figure 5(c)); this will help elucidate the functional and biological significance of the oSCF (−) product in hair follicle melanogenesis. In comparison to the previously reported 245 aa membrane-bound isoforms in other mammals [5,6,25,136], it is possible that the SCF encoded by this shorter ovine cDNA would remain membrane-bound, as it lacks the primary proteolytic cleavage site needed to produce a soluble form [25]. This form of SCF mRNA can thus produce only membrane-anchored SCF [6,25,120].
While the in vivo roles of soluble versus membrane-bound SCF are unclear, membrane-bound SCF, like other membrane-associated growth factors (e.g. transforming growth factor, TGF-α, and tumour necrosis factor, TNF), is thought to be involved in intercellular communication [10].
Overall, the sequencing results revealed that, as in other mammalian species, oSCF primary transcripts undergo alternative splicing (Figure 3(c,d)), with the exon-intron boundary location, size and amino acid composition of the alternatively spliced region being highly conserved [5,6,97,98]. Alternative splicing at intron-5 (CE 6-9/10; skipping of exons 6-9/10) of SCF might therefore provide a mechanism by which a specific cell type (melanocyte, keratinocyte or fibroblast) could regulate the relative amounts of soluble and membrane-bound SCF produced inside the cell (Figure 5(c)). In addition to the known variant lacking exon 6, alternative splicing of exon 4, resulting in four possible isoforms, has been reported in pig [121]. Comparison of oSCF with human, mouse, rat and dog genomic clones showed identical exon/intron boundaries in the oSCF gene architecture (Figure 3(a,b,e,f); see also Figure S2). While performing oSCF cDNA amplification, including 5′ and 3′ RACE, we analyzed a number of independent oSCF clones and found no evidence for an alternatively spliced form encoding the 245 aa membrane-anchored isoform reported in other vertebrate species [5,6,100,122]. From our RT-PCR results, it seems that this particular mRNA species is completely absent in sheep, at least in skin. In other words, the spliceosomal machinery in sheep skin failed to generate the oSCF mRNA (−) form that encodes the 245 aa protein. Instead, it generates the truncated, shorter ovine m-SCF (−) form described above (Figure 3(d)). We therefore assume that the ovine spliceosomal machinery does not process SCF mRNA(s) in the manner reported for other species, i.e., with retention of exon 7 to exon 9/10 in the (−) form (245 aa) (compare Figure 3(b,d)), as described in several studies [5,6,100,122].
The original descriptions of the cloning of SCF, including the location of introns in the coding regions, have been reported for the human and rat SCF genes [100]. As in other vertebrate species, the SCF gene is composed of at least 9/10 exons (Figure 3; see also Figure S2) ranging from ~63 bp to ~4 kb in length, separated by 8/9 introns of widely varying length, from ~700 bp to ~34 kb (source: Ensembl). The locations of introns in the coding region of SCF are conserved in rats, mice, and humans [5,100]. The total length of the SCF gene ranges from ~72 kb to ~87 kb (source: Ensembl). Previous reports on oSCF Northern blot analysis revealed a major SCF mRNA transcript of ~6 to 6.5 kb in ovarian follicles, corpus luteum and stroma [118,119]. In other species, a major band between 5.5 and 6.5 kb has been described in human [100], mouse [5,11], cow [98], pig [101] and chicken [103]. Shorter and less abundant SCF mRNA species have been reported in the mouse [5] and the chicken [103] (source: Ensembl). From our Northern blot analysis and long-range 3′ RACE RT-PCR, it seems that the larger SCF transcript (~6 kb) is not expressed in ovine skin.
Analysis of the human and mouse SCF genes with the AceView program [89] revealed 18 different 'GT-AG' introns, and transcription produces 8 different mRNAs (Figure S2 and Table S3A): 7 alternatively spliced variants and 1 unspliced form. There are 2 probable alternative promoters, 2 non-overlapping alternative last exons and 5 validated alternative polyadenylation sites (Table S3C). The mRNAs appear to differ by truncation of the 5′ end, truncation of the 3′ end, presence or absence of 9 cassette exons, and overlapping exons with different boundaries (Table S3B). The corresponding protein-coding potential comprises 7 different complete isoforms (coding for proteins; Table S3D) from the 6 spliced mRNAs and the one unspliced mRNA. The remaining spliced mRNA variant appears not to encode a protein (non-coding). Similar structural features have also been documented (data not shown) at ASTD 1.1 [90]. According to AceView, this gene is expressed at a high level in a wide range of tissues (Table S3B), revealing the heterogeneity of SCF expression. For example, five SCF mRNA transcripts were detected by RT-PCR in human placental tissue [123] and appear to be under tissue-specific regulation, whereas only one transcript size was detected in porcine endometrial total cellular RNA (tcRNA) [101].
In human, exon 1 (198 bp) comprises 183 bp of 5′ UTR sequence, and its last 15 bp, including the initiation codon 'ATG', encode the first 5 aa of the putative 25 aa signal peptide. Exons 2-7 encode portions of the extracellular domain of SCF, and exon 7 encodes the transmembrane region. Exon 8 encodes 35/36 aa of the cytoplasmic tail, the stop codon, and part or all of the very long ~4.4 kb 3′ UTR (exon 9/10) of the SCF mRNA transcript [100]. As noted previously, SCF can exist as two alternative mRNA transcripts, distinguished by the presence (+) or absence (−) of the 84 nt sequence encoding the proteolytic cleavage site relative to the full-length SCF cDNA [5,6,25,100]. On this basis, SCF has been classified into variant-1 (+) and variant-2 (−), which encode proteins of 273 or 274 aa and 245 aa, respectively. The end points of the missing sequence correspond to the boundaries of exon 6 reported for the rat and human SCF genes [100]. This splicing feature is seen in almost all vertebrate species (source: Ensembl). In the murine SCF cDNA (MGF94), the deletion in the second variant is smaller (48 bp) but shares the same 5′ boundary [5]; this might be due to a different exon/intron structure of the mouse SCF (MGF) gene, or to different alternative splicing within exon 6 during mRNA processing [97]. After analyzing the exon 5-intron (5)-exon 6 boundaries, it is certain that these transcripts are derived from the use of alternative 3′ splice donor/acceptor sites in the precursor mRNAs. The exon 6 region encoding the 28 aa proteolytic site of oSCF is absolutely conserved (100%, except for marmoset, which has 92% identity) among the reported vertebrate SCF sequences (Figure 6(d); see also Figure S3(b)), suggesting a functional importance of this region of the molecule.
However, avian species, which have an additional 6 aa 'SIGSNT' (Figure S3(b)) in a total of 34 aa, show 75% identity to the 28 aa proteolytic site of other SCF (+) mRNA species. At this junction, fish have only 10-17% identity, reflecting their large evolutionary distance.
Overall, it was determined that the longer ovine s-SCF isoform-1 (+) of 1519 nt would encode a larger, secreted protein product of 274 aa (Figure 5(a)). This longer transcript has an insertion of 84 bp at nt positions 713-796, introduced by an AS event and corresponding to the 28 aa putative proteolytic cleavage site. Similarly, the novel, shorter ovine m-SCF isoform-2a/2b (−) of 835/725 nt (named with respect to the 5′ UTR differences; Figure S4(a1)) identified in this study would encode a smaller, membrane-anchored protein product of 181 aa, which lacks the proteolytic site (Figure 5(b)). The mRNA/cDNA structural coverage of oSCF is shown in Figure 1A,B. The nucleotide and deduced amino acid sequences of the ovine SCF isoforms (+/−, complete) have high percentage identity with other mammalian SCF species (Figure 6(a)). The longer and shorter oSCF cDNAs have 85-100% identity to the kit ligand ESTs deposited for mouse (DV046036.1, DV044494.1, DT909652.1) and human (DR005930.1, DR002356.1, BX474960.1, DC320486.1), especially from brain, prostate and hematopoietic stem cells but not from skin. In comparison to the previously submitted records, GenBank Acc. No. AAB49491.1 and Swiss-Prot ID P79368.2, our ovine s-SCF (+) form (GU386372; see Table S1), encoding a total of 274 aa (Figure 5(a)), has an additional 7 aa, i.e., 'EREFQEV', at its C-terminus (Figure S1B). It also differs from Acc. No. CAA90620.1 by an additional 72 aa right after the proteolytic site, towards the C-terminus. Conversely, the alternatively spliced, truncated transcript of the ovine m-SCF (−) form (GU386373, GU386374 and GU386371; see Table S1) reported in this study is recognized as novel, with 6 new aa residues, i.e., 'KTYKHS', as its C-terminus (Figure 6(c); see also Figures S1B and S3(c)), right after D175G.
This ovine m-SCF (−) form completely lacks the proteolytic site, the transmembrane region and the cytoplasmic tail (Figure 5(b)), in contrast to the 245 aa SCF (−) form that has been widely reported in other mammalian species. Both soluble and transmembrane forms of SCF are active in promoting mast cell proliferation [5,122]. However, the transmembrane form appears to be more potent in maintaining the viability of primordial germ cells in vitro [125]. Mice that produce soluble SCF (s-SCF) but not transmembrane SCF (m-SCF) suffer from anemia, lack pigmentation and are sterile [126]. This suggests that transmembrane SCF plays a special role in vivo that is separate from that of soluble SCF. Hence, the presence of both soluble and transmembrane SCF is required for normal biological function. Proteolytic processing can also occur in mouse SCF at a secondary site at or near the tetrapeptide 'KAAK' in exon 7 [10]. This secondary proteolytic cleavage site appears to be species-specific: in human, the amino acid sequence diverges in this region ('KAKN') and no protein processing occurs [10]. oSCF may also lack this secondary processing site, as its amino acid sequence in that region ('KASN') differs from that of the mouse (Figure S3(d)).
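The length bookkeeping quoted above (an 84 bp insertion spanning nt 713-796 that corresponds to a 28 aa segment, and a 606 bp CDS fragment that translates to 202 aa) can be checked with a few lines of Python. This is only an illustrative sketch; the helper names `span_length` and `nt_to_residues` are introduced here for convenience and are not part of the original analysis.

```python
def span_length(start: int, end: int) -> int:
    """Length of an inclusive, 1-based nucleotide interval."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


def nt_to_residues(n_nt: int) -> int:
    """Number of amino acids encoded by n_nt nucleotides read in frame."""
    if n_nt % 3 != 0:
        raise ValueError("length is not a whole number of codons")
    return n_nt // 3


# Values taken from the text: insertion at nt 713-796, 606 bp CDS fragment.
insertion_nt = span_length(713, 796)
print(insertion_nt, nt_to_residues(insertion_nt), nt_to_residues(606))
```

Because 84 is a multiple of three, the presence or absence of this segment leaves the downstream reading frame unchanged.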
The ovine s-SCF (+) isoform-1-specific 5′ RACE amplification (Figure 1A(c)) yielded a 364 bp amplicon with its isoform-specific primer pair (Table S2); this region is highly conserved among other mammals (Figure S4(a2)). Conversely, in the 5′ RACE RT-PCR the common CDS region (+/−) primers (Table S2) yielded two amplicons of 325 bp and 215 bp (Figure 1B(d)), which were subsequently distinguished by their 5′ UTR differences (Figure S4(a1)) and characterized in this study as ovine m-SCF (−) isoform-2a and isoform-2b, respectively. All three 5′ RACE amplicons of the (+) and (−) forms differ in length, as shown in Figure S4(a1). Owing to their high G+C content (65%; Figure S4(a3)), sheep SCF mRNAs have the potential to form compact, thermodynamically stable secondary structures (Figure 8(a,b)), because G-C pairs have a third hydrogen bond compared with A-U pairs and because guanine residues can interact with uracil in folded RNA [127]. Hence, it favors the amplification of minor oSCF 5′ RACE cDNA products (in our case, iso-2b (−); Figure S4(a1)). The elevated G+C content is predicted to affect folding of the cDNA templates, compromising DNA polymerase processivity [111]. G+C sequence bias is a well-known problem in cDNA profiling studies [128]. This bias arises not only from the fall-off of Taq DNA polymerase during PCR but also, to some extent, from reverse transcription by reverse transcriptase, since our sequenced individual clones of all three 5′ RACE products (+/−) contain the complete 5′ adapter forward primer sequence complementary to the 5′ end capping (C-tail). The GC-rich non-coding 5′ segment of SCF forms a dense secondary structure (Figure 8(a,b)) that may have consequences for oSCF protein expression. For example, translation may require specific mRNA unwinding activity, creating another mode of possible post-transcriptional regulation [129].
Furthermore, mRNA hairpin structures are known to obstruct ribosome elongation [130] and G+C content is inversely correlated with translation efficiency [131].
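As a minimal sketch of how a G+C percentage such as the 65% quoted above for the 5′ UTR can be computed, the following Python function counts G and C among the unambiguous bases of a sequence. The function name `gc_percent` and the toy sequence are assumptions made for illustration; the toy sequence is not the ovine 5′ UTR.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C among unambiguous bases (A, C, G, T/U)."""
    normalized = seq.upper().replace("U", "T")
    bases = [b for b in normalized if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    gc_count = sum(1 for b in bases if b in "GC")
    return 100.0 * gc_count / len(bases)


# Hypothetical toy sequence, for illustration only.
toy_utr = "GCGCCGGCATGCGC"
print(f"{gc_percent(toy_utr):.1f}% G+C")
```

Ambiguity codes such as N are ignored here rather than counted in the denominator; other tools may make a different choice, which slightly changes the reported percentage.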
Apart from the classical 273 or 274 aa SCF starting with 'MKK' as its N-terminal sequence, a number of alternatively spliced protein/peptide sequences exist for SCF, resulting in unique or skipped N-terminal sequences, such as an N-terminus starting with 'MPSCLAAQ' (protein: Consistent with the already reported and submitted SCF sequences, the oSCF gene consists of 9/10 exons separated by 8/9 introns (Figure 3(e,f)). Exon sizes correlate well with those reported for human, mouse and dog (source: GenBank, Ensembl). From the gDNA amplification of spliceosomal intron-5, the premature termination could be explained by the use of an alternative/cryptic 5′ donor site at nt position 218 (GT; Figures 6(b) and S4(b); right after 57 nt of the p(A)11) together with the constitutive 3′ acceptor (AG) at nt position 728 (just before the start of exon 6), or the one at nt position 350, recognised by the splicing machinery (spliceosome). In addition or alternatively, the possible lack of any consensus 3′ splice site sequence downstream, from exon 6 to exon 9/10, may prevent removal of the 161 nt intronic sequence that is present in the shorter cDNA (Figures 2(a,b) and S4(c)). The retention of 161 bp of non-coding intron-5 sequence in the truncated, shorter m-SCF (−) cDNA might have arisen from failure of the splicing machinery to correctly remove the intronic sequence from the skin oSCF mRNA transcript. Although the chromosome has been determined in sheep (chr 3) [105], the sheep SCF locus is yet to be mapped (see Figure 4), reflecting the unfinished status of the Sheep Genome Project at this juncture (current version Oarv2.0, March 2011 to date, http://www.livestockgenomics.csiro.au/sheep/oar2.0.php). The mechanism illustrated in Figure 3(c,d) (for splicing notation) explains how the truncated oSCF mRNA could have been generated in normal skin and adds to the list of variants of the SCF gene that undergo alternative splicing (AS).
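The idea of candidate donor (GT) and acceptor (AG) dinucleotides within an intron, as discussed for intron-5 above, can be illustrated with a simple scan. The sketch below is a naive dinucleotide search, not a splice-site predictor: it ignores branch sites, polypyrimidine tracts and consensus scoring. The function names and the toy sequence are hypothetical; the toy sequence is not the ovine intron-5.

```python
def dinucleotide_positions(seq: str, motif: str) -> list[int]:
    """1-based start positions of every occurrence of a dinucleotide."""
    s, m = seq.upper(), motif.upper()
    if len(m) != 2:
        raise ValueError("motif must be exactly two nucleotides")
    return [i + 1 for i in range(len(s) - 1) if s[i:i + 2] == m]


def is_gt_ag_intron(seq: str) -> bool:
    """True if the sequence starts with GT and ends with AG (GT-AG rule)."""
    s = seq.upper()
    return len(s) >= 4 and s.startswith("GT") and s.endswith("AG")


# Hypothetical toy intron, for illustration only.
toy_intron = "GTAAGTCCAGGTTTAG"
print("candidate donors (GT):", dinucleotide_positions(toy_intron, "GT"))
print("candidate acceptors (AG):", dinucleotide_positions(toy_intron, "AG"))
print("GT-AG intron:", is_gt_ag_intron(toy_intron))
```

In a real intron such a scan returns many candidates, so positions like those reported above still need support from the cDNA sequence itself.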
Previous studies have shown that skin expression of SCF stimulates melanocyte migration, proliferation, differentiation, and survival and is required for ongoing maintenance and survival of normal melanocyte numbers in adults [132]. SCF (KL) upstream region is associated with significant differences in human skin color, one of the most obvious superficial differences between human populations [133]. Although no amino acid differences are known in the SCF (KL) protein of different human groups, SCF is expressed at significantly higher levels in skin keratinocytes from Africans than Europeans [134]. The interruption of SCF-KIT signalling using anti-KIT antibody abolished tyrosinase and MITF expression, resulting in the depigmentation of hair follicles in a reversible manner [16].
Preliminary analysis of oSCF gene expression in skin showed a similar mRNA (cDNA) expression profile for the (+) and (−) forms among white and coloured animals (data not shown). Our result is in agreement with porcine SCF (KL) gene expression for exon 6 [121]. However, this requires verification by more sensitive qRT-PCR methods on a reasonable number of breeding populations, i.e., F2 generations. Conversely, Northern blot analysis (Figure 7) revealed a considerable difference between the oSCF (+) and (−) forms, providing a hypothetical clue to transcriptional regulation via an intron-5 AS event. Different biological activities have been reported for the membrane-anchored (−) and soluble (+) forms of SCF [9,11]. In 1999, Dr. James M. Grichnik wrote in his reply to [135]: "While both forms of SCF activate its receptor, KIT, the duration of activation and potential for receptor degradation is different for each form. Keratinocytic bound SCF may lock on to the melanocyte's KIT receptor resulting in persistent KIT activation (without KIT receptor internalization and degradation), while soluble SCF may transiently activate the KIT receptor followed by internalization and degradation". This implies that membrane-bound steel factor induces more persistent tyrosine kinase activation and a longer life span of the c-KIT gene-encoded protein than its soluble form. More sustained signaling was mediated by membrane-associated SCF in a myeloid cell line, whereas soluble SCF down-regulates cell surface expression of c-KIT and promotes receptor proteolysis [136]. The differential expression of the SCF-specific mRNA splice variants SCF-1 and SCF-2 in immature and mature human mast cells may play a role in autocrine stimulation, maintenance of survival and the differentiation of tissue mast cells [137].
An increased level of soluble SCF expression in the skin has been implicated in the pathogenesis of mastocytosis; this could theoretically be due to an abnormality at any level of metabolism occurring after mRNA transcription and splicing, rather than the result of changes in the sequence or regulation of the gene itself [19]. Hence, further investigation of sheep skin SCF gene expression is required at the cellular level rather than at the whole-tissue level. The possible functional role of these two oSCF isoforms in skin remains poorly understood. According to AceView [89] gene expression analyses, SCF is defined by 198 GenBank accessions from 192 cDNA clones, some from brain (seen 14 times), trachea (13), placenta (9), thalamus (7), whole brain (7), lung (6), amygdala (5) and 61 other tissues, excluding skin. Molecular biological analyses of murine follicular skin indicated a significant increase of membrane-bound SCF expression after anagen induction [16], in concert with the escalation of cutaneous tyrosinase activity and corresponding pigmentation.
Eukaryotic splicing produces a variety of functional and nonproductive mRNAs during normal gene expression [138].
Alternative splicing is also prone to recurrent errors, which include exon skipping, intron retention, and activation of cryptic splice sites [138]. The resulting aberrant RNAs may outnumber correctly spliced mRNAs among initial spliceosomal products [139]. This could be one of the reasons why the oSCF (−) form predominated over the (+) form during the reverse transcription (RT) reaction and its subsequent PCR amplification. For protein-coding genes with multiple exons, the majority of aberrant RNAs contain a premature termination codon (PTC; in our case, the shorter ovine m-SCF (−) form); such transcripts are frequently produced in mammals and are known to be degraded through the nonsense-mediated decay (NMD) pathway [140]. However, the abundance of full-length oSCF (−) mRNA transcripts in the skin of sheep argues against such degradation.
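To make the notion of a premature termination codon concrete, the sketch below walks a cDNA codon by codon from a given start position and reports the first in-frame stop codon. It is a simplified illustration under stated assumptions: the function name `first_inframe_stop` and the toy sequence are hypothetical, and the standard stop codons TAA, TAG and TGA are assumed.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_inframe_stop(cdna: str, start: int = 0):
    """Return (0-based index of the first in-frame stop codon,
    number of codons read before it), or None if no stop is found."""
    seq = cdna.upper().replace("U", "T")
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i, (i - start) // 3
    return None


# Hypothetical toy cDNA: ATG GCT GAA TAA GGC
toy_cdna = "ATGGCTGAATAAGGC"
print(first_inframe_stop(toy_cdna))
```

When retained intronic sequence follows the last constitutive exon, the first in-frame stop found this way lies upstream of the normal stop codon and therefore truncates the encoded protein.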
Control of gene expression is achieved at various levels. The cis-regulatory elements detected in the 5′ UTR of oSCF just upstream of the AUG initiation codon (Figure S4(a1)), namely uORFs (in the + and − forms) and TOP (in the + form), are known to be involved in the down-regulation of translation. uORFs can induce formation of a translation-competent ribosome that may translate and (i) terminate and re-initiate, (ii) terminate and leave the mRNA, resulting in down-regulation of translation of the main open reading frame, or (iii) synthesize an N-terminally extended protein [108]. The 5′ TOP tract, consisting of 5-15 pyrimidines, is required for coordinate translational repression during growth arrest, differentiation, development and certain drug treatments [141]. Deletion of the pyrimidine tract or exchanging purines for pyrimidines results in unregulated translation [109,141]. In our case, we observed the deletion of TOP sites in the two shorter 5′ UTRs of the oSCF-2a/2b (−) form (Figure S4(a1)). In the 3′ UTR, cis-regulatory sequences such as AREs (PAS) [110], BRD-Box [111] and MBE [112] mediate negative post-transcriptional regulation by affecting mRNA transcript stability and translational efficiency [110,140]. In our case, the 3′ cis-regulatory signals BRD-Box and MBE, located upstream and downstream of the PAS (Figure S4(c,d)), may regulate tissue-specific alternative polyadenylation, which has been detected in approximately 54% of human genes [142]. The exact role of the conserved miRNA target sites (Figure 9(a,b)) in SCF is currently unknown, although their conservation in other farm animals (71-100%) suggests functional importance (evolutionary pigmentation adaptation). On the other hand, the various miRNA target sites in the longer 3′ UTR (data not shown) might signify that the mRNA is regulated specifically in different tissues or at different times during development.
The potential role of miRNAs in SCF gene regulation is currently unknown, in particular for hair follicle melanogenesis.
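The two 5′ UTR features discussed above, upstream AUGs (which can open uORFs) and a 5′-terminal pyrimidine tract of at least 5 nucleotides (TOP-like), can be located with a short scan. This is a rough sketch for illustration only: the function names and the toy sequence are hypothetical, and a real uORF or TOP annotation involves additional criteria (for example an in-frame stop codon and the transcription start site context) that are not modelled here.

```python
import re


def upstream_atg_positions(utr5: str) -> list[int]:
    """0-based positions of every ATG (AUG) within a 5' UTR sequence."""
    seq = utr5.upper().replace("U", "T")
    return [m.start() for m in re.finditer(r"(?=ATG)", seq)]


def leading_pyrimidine_tract(utr5: str, min_len: int = 5) -> str:
    """Run of C/T at the very 5' end if it is at least min_len long, else ''."""
    seq = utr5.upper().replace("U", "T")
    match = re.match(r"[CT]+", seq)
    if match and len(match.group()) >= min_len:
        return match.group()
    return ""


# Hypothetical toy 5' UTR, for illustration only.
toy_utr5 = "CTTCCTCGGATGCCGTAGGC"
print(upstream_atg_positions(toy_utr5), leading_pyrimidine_tract(toy_utr5))
```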
SCF is a member of the helical cytokine structural superfamily, characterized by a double-crossover four-helix bundle topology [143]. The N-terminal 141 residues of SCF have been identified as a functional core, SCF(1-141), which includes the dimer interface and the portions that bind and activate its receptor, c-KIT [112]. The homology-based structural modeling results showed that the protomer structure of oSCF contains 4 α-helices and 2 β-sheets, folded to form a non-covalent homodimer composed of two slightly wedged protomers [114]. The two disulfide bridges, Cys29/Cys114 and Cys68/Cys164 (Figure S5), play a role in maintaining the functional integrity of SCF [143] and are highly conserved in mammals; in fishes they are replaced with Ile27/His107 (Figure S3(d)). The available PDB crystallographic models for SCF proteins, 1EXZ, 1SCF and 2E9W (chains C, D) [112-114], share the same canonical fold. Superimposition of our modelled structure(s) on the individual templates revealed structural features identical to those described in [112]. The folding differs from the above-mentioned models in some regions, with additional 3- or 4-turn helices as depicted in Figure S5. The previously determined crystal structure 2E9W demonstrates the interaction between SCF and its receptor, c-KIT [114]. In this structure, each protomer of SCF binds exclusively to a single KIT molecule, and receptor dimerization is driven by SCF dimers that facilitate additional receptor-receptor interactions. Dimerization of KIT is driven by bivalent SCF binding, whose sole function is to bind SCF and to bring together two KIT molecules [114]. The three potential binding regions of SCF for its receptor c-KIT, i.e., sites I, II and III, are explained in detail in [114] and are shown in Figure S5 (see also Figure S3(d)). There are notable differences in the interacting residues of KIT and SCF [114].
Mutational analysis of SCF has shown that replacement of Asn35 with an alanine or glutamic acid residue reduces the binding affinity of SCF towards KIT by approximately 10-fold, and that Asn35 (in human, chimpanzee and marmoset) or Asp35 (in other species) is required for biological activity [144]. ClustalW comparison (Figure S3(d)) of the receptor-binding interface in SCF from different species shows high conservation of Asn35,36 in human, chimpanzee and marmoset; Asp35,36 in sheep, goat, cow, pig, dog, panda, cat, horse, chicken, zebra finch, zebra fish and gold fish; and Asp35, Asn36 in mouse, rat and rabbit (Figure S5). Similarly, Asp79 of SCF in human and chimpanzee is substituted by Leu79 in mouse; by Val79 in sheep, goat, cow, pig, dog, panda, cat, horse, rabbit, rat, zebra fish and gold fish; and by Ser79 in marmoset, chicken and zebra finch. In addition, Lys106 in sheep, goat, cow, pig, dog, panda, cat, horse, mouse, rat and rabbit is substituted by Asn106 in human, chimpanzee and marmoset, and by Arg106 in chicken and zebra finch. Glu113 of SCF in sheep, goat, cow, pig, dog, panda, cat, horse, rabbit, human, chimpanzee and marmoset is substituted by Leu113 in mouse and rat, and by Ala113 in chicken and zebra finch. Similarly, Phe127 (loss of a hydrogen bond) in human, chimpanzee and marmoset is substituted by Ser127 in sheep, goat, cow, pig, cat and rabbit; serine is quite common in protein functional centres and is most likely able to form a hydrogen bond. All these substitutions (Figure S3(d)), involved in salt bridges, hydrogen bonding and van der Waals interactions, may account for the reduced affinity of SCF towards its receptor, c-KIT [145].
SCF (KITLG) was found not only in mammalian species such as sheep, goat, cow, pig, cat, dog, panda, horse, human, chimpanzee, marmoset, mouse, rat, and rabbit, but also in avian species such as chicken and zebra finch, and in fishes such as zebra fish and gold fish, indicating its co-emergence with large divergence across species (Figure S6(a-e)). The large evolutionary distance on the phylogenetic tree (branch length) reflects the low sequence identity of the fish species (Figure 6(a); see also Figure S3(d)) to the mammalian species, ranging between ~20-55% for the SCF (+) and (−) protein sequences. This implies that evolutionary changes in SCF may have given rise to monophyletic group(s) associated with pigmentation adaptation across a wide range of habitats. One such example is that cis-regulatory (UTR) changes in SCF (KL) expression contribute to pigmentation differences in both sticklebacks and humans, suggesting a contribution to natural variation in vertebrate pigmentation and that similar genetic mechanisms may underlie rapid evolutionary change in pigmentation patterns in sticklebacks and humans [146]. The slight discrepancies in the tree topologies (Figure S6(a-e)), especially at nodes showing <60% bootstrap values, viz. horse to dog; mouse and rat to rabbit; and rabbit to primates, most likely reflect the use of incomplete SCF sequences from the gene/genome databanks (partial sequences, unfinished genomes), unwanted gaps in the alignment, or large sequence divergence within the block analyzed in the present study.
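Percentage identity values such as those quoted above depend on how gaps are treated. As a minimal sketch, assuming two sequences that have already been aligned to equal length (for example with ClustalW) and assuming that gapped columns are excluded from the denominator, identity can be computed as follows. The function name `percent_identity` and the toy alignment are hypothetical and do not reproduce the alignments of this study.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment, for illustration only.
print(round(percent_identity("MKKT-QT", "MKKTWQA"), 1))
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, which yields lower values for gapped alignments such as those involving the fish sequences.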

Conclusion
The study described here represents the first attempt to experimentally address the SCF mRNA/cDNA structural coverage in the skin of merino sheep. Analysis of coat color gene structures unique to sheep will extend our understanding of the functional role and regulation of pigmentation genes beyond what is known in mice, humans and other mammals. Here, we have presented evidence for two splice variants of ovine SCF, differing in the cassette exon (CE 6-9/10; skipping of exons 6-9/10) through a premature termination in the non-coding intron 5, which results in the presence or absence of the proteolytic site and, thereby, of the following transmembrane region and cytoplasmic domain. To our knowledge, this information is previously unreported. Further research is required to determine whether this prematurely terminated isoform has biological relevance, and whether it leads to active variant proteins with effects on melanocytic, reproductive or haematological development. The functional role of these two transcripts in ovine skin-specific expression remains unknown. It is important to elucidate which SCF transcript(s), soluble SCF (+) or membrane SCF (−), predominate in the skin; this will provide new insight into an elaborate mechanism involving m-SCF/c-KIT and its counteracting s-SCF/c-KIT signaling, and will add to the understanding of the regulation of hair follicle melanogenesis by SCF. In addition, characterization of the SCF promoter(s) is critical to the design of experiments intended to analyze the role of the various SCF isoforms in vivo using gene targeting techniques. Also, in connection with [33], it would be interesting to determine whether any of the individuals (white, black, and brown) in their families (F2 generation) have alterations in SCF gene expression at the allele level (QTL/SNPs), carry other alternative splice variant(s), lack any particular reported SCF variant, or carry a duplication [147] and/or SCF DNA rearrangement [148]. Future studies exploring other candidate genes are underway, especially those involved in the pigmentation regulatory network, namely c-KIT and MITF. Altogether, these genes are likely to provide great insight into the molecular mechanism of the white trait in merino sheep. In this context, further development of ovine chip(s) carrying key pigmentation-associated genetic information, such as c-KIT, SCF, MITF, MC1R, ASIP and FGF, will open up promising perspectives for using such molecular information in the management of breeding schemes of sheep populations, i.e., aiming at Gene Assisted Selection (GAS).

Comparison of the primary RT-PCR product of 621 bp CDS covering the putative primary proteolytic site of white, black and brown animals (representative data from one of three animals are shown). The start (ATG) codon is labeled in bold blue letters and the +84 bp proteolytic site is indicated in bold black italic letters. The virtual translation of the 606 bp CDS, corresponding to 202 aa (in bold black letters), is given below the 'white' nucleotide sequence; (B). Comparison of the complete coding sequence (CDS) and corresponding deduced amino acid sequence of the newly isolated Ovis aries SCF isoform-1 (+) and isoform-2 (−) with the partial GenBank records of oSCF (+) sequences. The newly identified oSCF cDNAs from the skin of white merino sheep (GU386372 (+); GU386373 (−); see Table S1) are marked in bold black letters, while the other two oSCF partial CDS sequences (U89874.1; Z50743.1) were retrieved from GenBank, NCBI. Dotted black arrows indicate the common forward primer 'scffwd1' and the (+) form-specific reverse primer 'scfrev1' used to amplify the initial 621 bp (see also Figure 1A(a)). The highlighted open black box indicates the flanking partial 5′ UTR sequence (15 bp) of the forward primer sequence (see Table S2 and Figure 1A).
The start (ATG) and stop (TAA) codons are labeled in bold blue and bold red letters, respectively. The virtual translations of the oSCF (+) and (−) forms are given below the respective triplet codons and highlighted in bold black letters. The +84 bp putative primary proteolytic site and its virtual translation (+28 aa) are indicated in bold black italic letters. Similarly, the substitution of aspartic acid (D) with glycine (G), i.e., D(+)175G(−), is indicated in bold light orange and bold light green letters, respectively (see the chromatogram of the cDNA on the left side). The new truncated protein isoform of the oSCF (−) form, having a short stretch of 6 aa as its C-terminus (in bold black letters), is highlighted by an open green box. Two clone differences are highlighted in bold red and bold light blue letters (see the respective cDNA chromatograms on the left side).

shown. In addition, ClustalW2 comparisons of the three potential receptor (c-KIT) interactive sites (Sites I, II and III) in SCF from different species [114] are shown. Sheep s-SCF orthologous evolutionary aa substitutions are highlighted in black bold letters. The four cysteine residues involved in disulfide bridges are indicated in pink bold letters and the orthologous aa substitutions in avian species are highlighted in red bold letters. An additional aa residue, Glu(E)155, in sheep s-SCF, which differentiates it from primates and rodents, is highlighted in blue bold letters; it is conserved in farm animals, suggesting a functional importance of this residue. Besides the 28 aa proteolytic site, the putative alternative proteolytic site, a tetrapeptide [10], is indicated in black bold letters. (DOC)

Figure S4 Nucleotide sequence comparison of the 5′ and 3′ untranslated regions (UTRs) of sheep SCF isoform-1 (+) and isoform-2a/2b (−); the predicted UTR regulatory motifs and the possible splice donor/acceptor sites in intron-5 are shown.
(a1) Sequence alignment showing the sheep SCF 5′ UTR length differences between isoform-1 (+) and isoform-2a/2b (−). The additional 5′ UTR sequences are indicated in green, light orange and blue open boxes for isoform-1 (+), isoform-2a (−) and isoform-2b (−), respectively. The cis-regulatory elements located in the 5′ UTR, such as TOP and uORFs, are labeled and indicated in red open boxes. The trinucleotide elements 'CGC' and 'TGC' are highlighted in bold black letters and by underlining, respectively. The hexamer direct repeats (DRs) are labeled and indicated by open boxes. Clone differences are labeled in bold red to bold black letters; (a2) Alignment showing the nucleotide sequence conservation of the hexamer DRs (in open boxes) in the SCF 5′ UTR with other mammals; (a3) Histogram showing the GC% of the three different 5′ UTRs of sheep SCF; (b) The complete sequence of sheep SCF intron-5 (729 bp), showing the constitutive splice donor site (GT, in bold blue upper-case letters) at the start and the constitutive splice acceptor site (AG, in bold red upper-case letters) at the end. Other alternative isoform/cryptic splice donor (gt) and acceptor (ag) sites are labeled in blue and red lower-case letters, respectively. The dinucleotide repeats, polyA stretch (p(A)) and predicted splice branch sites (BS, in green lower-case letters) are labeled and highlighted in open boxes; (c) Nucleotide sequence alignment showing 100% similarity of the 3′ UTR of isoform-2 (−) with the 161 bp of retained intron-5 of sheep SCF. The p(A) stretch and the conserved dinucleotide repeats flanked by two tandem repeats (TRs) on either side of the 3′ UTR are marked in open boxes, along with their counterpart sequences on intron-5 in other animals; (d) Sequence alignment showing the two different 3′ UTRs of sheep SCF isoform-1 (+) and isoform-2 (−). The +84 bp proteolytic site is indicated in bold black italic letters. The common identical CDS just upstream of the proteolytic site is indicated in an open box.
The 3′ UTR regulatory motifs such as BRD, MBE and heptamer DRs are labeled and highlighted with open boxes. The AREs located in the 3′ UTR near the canonical PAS are underlined, and their single-base variants are highlighted in blue letters. Similarly, the start (ATG) and stop (TAA) codons are highlighted in bold blue and bold red letters, respectively. (DOC)

Figure S5 Three-dimensional structure of the oSCF monomer generated by homology-based modelling, represented by a ribbon diagram. The superimposition of the oSCF monomer on the PDB template 1EXZ:chainB (set to transparency) is shown. The four α-helices, two antiparallel β-sheets and two additional one-turn helices are labelled in blue (αA, αB, αC and αD), red (β1, β2) and black (αB′, αD′) letters, respectively. An exceptional hairpin loop between αB and αC is shown by a red dotted line. The observed additional 3-4 turn helices are highlighted in green as G1 to G8, with the corresponding aa residues labeled. The three potential interactive sites of SCF for its receptor c-kit are shown in bold letters as Site I, Site II and Site III [95]. The aa residues highlighted in red at Sites I, II and III represent the orthologous substitutions in sheep, goat, cow, pig, dog, horse, cat and panda SCF relative to the human SCF (huSCF) protein. The two disulfide bridges Cys29/Cys114

Table S1 GenBank Accession Nos. and description of ovine SCF cDNAs submitted to NCBI.