An Evaluation of a SVA Retrotransposon in the FUS Promoter as a Transcriptional Regulator and Its Association to ALS

Genetic mutations of FUS have been linked to many diseases including Amyotrophic Lateral Sclerosis (ALS) and Frontotemporal Lobar Degeneration. A primate specific and polymorphic retrotransposon of the SINE-VNTR-Alu (SVA) family is present upstream of the FUS gene. Here we have demonstrated that this retrotransposon can act as a classical transcriptional regulatory domain in the context of a reporter gene construct both in vitro in the human SK-N-AS neuroblastoma cell line and in vivo in a chick embryo model. We have also demonstrated that the SVA is composed of multiple distinct regulatory domains, one of which is a variable number tandem repeat (VNTR). The ability of the SVA and its component parts to direct reporter gene expression supported a hypothesis that this region could direct differential FUS expression in vivo. The SVA may therefore contribute to the modulation of FUS expression exhibited in and associated with neurological disorders including ALS where FUS regulation may be an important parameter in progression of the disease. As VNTRs are often clinical associates for disease progression we determined the extent of polymorphism within the SVA. In total 2 variants of the SVA were identified based within a central VNTR. Preliminary analysis addressed the association of these SVA variants within a small sporadic ALS cohort but did not reach statistical significance, although we did not include other parameters such as SNPs within the SVA or an environmental factor in this analysis. The latter may be particularly important as the transcriptional and epigenetic properties of the SVA are likely to be directed by the environment of the cell.


Introduction
Genetic variation which alters the primary sequence of a protein has allowed tremendous insight into underlying mechanisms associated with predisposition, progression and severity of diseases. However, most genetic variation identified in candidate gene and genome wide association studies associated with disease processes is within non coding regions. This has led to a greater analysis and emphasis on the importance of gene-environment interactions in which tissue specific or stimulus inducible challenges target transcriptional regulatory domains to alter mRNA abundance underlying the disease process. Amyotrophic Lateral Sclerosis (ALS) is one disease in which such a mechanism may play a significant role, because although about 5% of ALS is familial (FALS), in most cases of ALS the patient has no family history of the disease (sporadic ALS; SALS). Nevertheless, cases with a significant genetic component can give us insight into which signal transduction pathways may be compromised in the development of the disease as they can highlight processes which may be targets for the challenges which trigger ALS.
FUS (Fused in sarcoma), located on chromosome 16p11.2, encodes an RNA-binding protein. Mutations in its coding exons have been identified in some cases of FALS and it is therefore a candidate for genetic association with ALS [1]. The number of ALS cases attributed to mutations in the FUS gene is small; FUS mutations are present but rare in SALS at around 1% [2][3][4][5] and are found in only 3-5% of FALS [6,7]. Although rare genetic mutations in the FUS gene account for only a small proportion of apparently non-familial SALS, FUS-positive inclusions have been found in the anterior horn of the spinal cord in SALS patients without FUS mutations, and in non-SOD1 FALS [8]. Whilst FUS is ubiquitously expressed, the levels of FUS may be critical for cell viability, and modulation of expression may be associated with the initiation or progression of ALS, suggesting a role for the environment in modulating levels of FUS gene expression. A differential response in gene expression to the stimulus could be modulated by the genotype, thus allowing for a Gene x Environment interaction (GxE) in the initiation or progression of conditions such as ALS in which FUS is implicated. This would be consistent with a recent mouse model in which overexpression of wild-type FUS caused progressive motor neuron degeneration in an age- and dose-dependent fashion [9]. We therefore undertook an analysis of the FUS locus to determine potential regions of genomic variation that are candidate domains to direct differential gene expression in response to environmental challenge.
Although it is difficult to accurately predict the regulatory domains for a particular gene other than the proximal promoter (often 0.5 to 1 kb upstream of the transcriptional start site), our group and others have demonstrated that important domains for gene regulation can reside in both the most evolutionary conserved regions (ECRs) which are non-coding [10][11][12] and the highly polymorphic and often rapidly evolving variable number tandem repeats (VNTRs) [13][14][15][16][17][18][19]. In both cases the ECR or VNTR can be tens of thousands of bases from the major transcriptional start of a gene [20]. Genetic variants in both classes of domains are often clinical correlates of disease progression [10,14,21]. The search for potential areas involved in transcriptional regulation can be aided by utilisation of ENCODE (encyclopaedia of DNA elements) data, searching for the presence of potential transcription factor binding sites, active histones or DNase 1 hypersensitivity clusters [22,23]. We performed such a bioinformatic analysis of the FUS locus and highlighted one large VNTR region 5′ of the FUS gene which overlapped active histones and other ENCODE data, suggesting it might act as a transcriptional regulatory domain (Figure S1). Further analysis demonstrated the VNTR was part of a larger primate specific retrotransposon termed a SINE-VNTR-Alu (SVA) element. SVAs are the most recent family of retrotransposons to insert into the human genome, with 2676 SVAs identified in the Hg19 release from the UCSC genome browser [24]. There is considerable interest, but limited data available, describing the role of retrotransposon elements in human health, with 96 disease-causing insertions having been identified as of 2012 [25]. In the ageing brain somatic retrotransposition has been demonstrated and this plasticity in the genome has been suggested to play a role in the diseases associated with an ageing population [26,27].
Furthermore, in tumours it has been shown that epigenetic modulation of retrotransposons in general, including SVAs, can vary in cancer progression; specifically, alterations in methylation patterns have been detected [28]. The SINE region of the SVA derived from the human endogenous retrovirus K10 (HERV-K10) has been used to classify SVAs into subtypes A-F with the age of each subtype ranging from an estimated 13.6 million years for the oldest (SVA A) to 3.2 million years for the youngest (SVA F) [29]. An additional subtype was identified that contains sequence from exon 1 of the MAST2 gene and associated CpG island at the 5′ end of the SVA and was named CpG-SVA, MAST2 SVA or SVA F1 [30][31][32]. The SVA in the FUS gene is classified as subtype D, or SVA D. Based on data from the UCSC browser using the human genome sequence release 19 as the reference genome (http://genome.ucsc.edu/), this particular element is found only in humans and chimpanzees amongst the primates.
Retroviruses, exogenous and endogenous, have been linked with ALS [33]. An increased prevalence of reverse transcriptase (RT), a key enzyme in the retrovirus life cycle converting RNA to DNA, has been observed in the serum of patients with SALS [34,35]. In the second study [34] the elevated RT enzyme levels were interpreted as indicative of involvement of an endogenous retrovirus rather than an exogenous retrovirus, as blood relatives also had elevated levels whereas spouses were the same as controls. A further study has implicated retrotransposons as having a role in ALS because HERV-K transcripts and RT protein were detected in autopsy brain tissue of patients with ALS along with the aberrant expression of TDP-43 [36]. These authors suggested that targeting of activated genome-encoded retroviral elements may open new prospects for the treatment of ALS. The cellular environment that led to this increased expression of HERV-K transcripts and RT may be a global change that could influence the expression or activity of other retrotransposons in the genome; for example, epigenetic changes across multiple loci of retrotransposons have been shown in cancer [28]. We hypothesised that the SVA upstream of the FUS gene could be one such domain. The activation does not have to lead to retrotransposition for it to affect gene expression in adjacent genomic loci, as alteration of epigenetic factors may modulate any transcriptional properties embedded within the SVA. Our hypothesis is that the SVA domain could have significant potential to modulate gene expression at the FUS locus and that the variation in the VNTR could support differential gene expression based on the challenge that the cell receives. Therefore we addressed the ability of the SVA D 5′ of the FUS gene to act as a classical transcriptional regulator in reporter gene constructs in vitro and in vivo. We further addressed its potential polymorphic variation and whether such variation acts as a predisposing factor for ALS.
Cell transfection and dual luciferase assay
SK-N-AS cells were co-transfected with test constructs (firefly luciferase reporter gene) and an internal control construct, pMLuc-2 (renilla luciferase reporter gene; Novagen, USA), using TurboFect Transfection Reagent (ThermoScientific/Fermentas, R0531) according to the manufacturer's protocol in 24-well plate format. The transfection mixture was removed after 4 hours of incubation and exchanged with fresh medium, and subsequent luciferase activity assays were performed after 48 hours of incubation.
Luciferase activity of reporter constructs was measured using a Dual Luciferase Reporter Assay System (Promega, USA) using lysates from transfected cultured cells according to the manufacturer's instructions. Assays were carried out on a Glomax 96-well microplate Luminometer (Promega, USA) using 20 µl of cell lysate. Measurements were averaged from 6-fold replicates to minimize pipetting errors and repeated at least three times to confirm results. Statistical analyses were performed using Microsoft Excel and a one-tailed t-test to measure the significance of fold activity of the FUS SVA and TR/VNTR over the minimal promoter of the pGL3P vector (*P<0.05, ***P<0.001), and to compare the activity of the alleles of the SVA and the TR/VNTR to each other (#P<0.05).
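The normalisation behind the fold values can be sketched as follows. This is a minimal illustration in Python, not the authors' Excel workflow: each firefly reading is divided by the renilla reading from the same well, replicate ratios are averaged, and the mean for a test construct is expressed relative to the mean for the pGL3P control. The function names and all luminescence readings are hypothetical.

```python
from statistics import mean


def normalised_ratios(firefly, renilla):
    """Divide each firefly reading by the renilla reading from the same well."""
    if len(firefly) != len(renilla):
        raise ValueError("firefly and renilla readings must be paired per well")
    return [f / r for f, r in zip(firefly, renilla)]


def fold_activity(test_firefly, test_renilla, ctrl_firefly, ctrl_renilla):
    """Mean normalised activity of a test construct relative to the control vector."""
    test = mean(normalised_ratios(test_firefly, test_renilla))
    ctrl = mean(normalised_ratios(ctrl_firefly, ctrl_renilla))
    return test / ctrl


if __name__ == "__main__":
    # Hypothetical readings for six replicate wells (arbitrary luminescence units).
    construct_firefly = [5200, 4900, 5100, 5050, 4800, 5300]
    construct_renilla = [1000, 980, 1010, 990, 970, 1020]
    pgl3p_firefly = [2500, 2600, 2450, 2550, 2400, 2650]
    pgl3p_renilla = [1000, 1020, 990, 1010, 980, 1030]
    fold = fold_activity(construct_firefly, construct_renilla,
                         pgl3p_firefly, pgl3p_renilla)
    print(f"fold activity over pGL3P: {fold:.2f}")
```

A fold value above 1 would correspond to enhancement over the minimal promoter and a value below 1 to repression; significance testing is a separate step and is not shown here.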

Construction of plasmids for in vivo fluorescent models
Generation of tomato reporter plasmid. Tomato gene sequence was PCR amplified from pG-tdTomato (a kind gift from Marco Marcello, University of Liverpool) using primers Tomato UP 5′-ATAGGAATTCCGTGTACGGTGGGAGGTCTA-3′ and Tomato DOWN 5′-GGCCGTCGACATCATTTTACGTTTCTCGTTC-3′, which introduce EcoRI and SalI restriction sites, upstream and downstream respectively, for directional cloning into the plasmid pIRES-GFP (kind gift from John Gilthorpe). The pIRES-GFP cassette was removed using EcoRI and XhoI restriction sites and replaced by the Tomato reporter gene, such that it was downstream of the chick β-actin promoter.
Generation of human FUS L-SVA and L-TR/VNTR in vivo reporter plasmids. The generation of the proximal FUS promoter reporter plasmid is described elsewhere (Kursheed et al. in preparation). Briefly, human FUS promoter sequences (−160/+84) were cloned into the SacI/BamHI sites of the promoter-less reporter vector phrGFP (Stratagene, UK) upstream of the GFP reporter gene. Identity was confirmed by sequencing and the plasmid was named ppGFP. The FUS SVA and isolated TR/VNTR sequences, both of the 'long' allele, were amplified by PCR from the L-SVA and L-TR/VNTR reporter plasmids described above using standard Phusion polymerase conditions (New England Biolabs) with the addition of 3% DMSO (v/v). The primers used are outlined below and included NsiI and XbaI restriction enzyme sites (underlined) to facilitate directional cloning: SVA UP 5′-TTGCATGCATGTGACTATTGCATACCTTGC-3′ and SVA DN 5′-GACGTCTAGAGGAGAGGTTGTCATGGTACA-3′, and TR/VNTR UP 5′-TTGCATGCATCAGTTTTCCCTCAGACCCAG-3′ and TR/VNTR DN 5′-GACGTCTAGAGTTGGGGGTAAGGTCACAGA-3′. The resulting products were cloned into the NsiI/XbaI sites of FUS ppGFP and sequences were verified; this created L-SVA ppGFP and L-TR/VNTR ppGFP.
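As a quick check on the cloning primers listed in this section, the engineered restriction sites can be located with a short script. The sketch below is illustrative only (Python, with a hypothetical helper name `find_sites`); it assumes that the hyphens inside the printed primer sequences are line-break artefacts and uses the standard recognition sequences for EcoRI, SalI, NsiI and XbaI.

```python
# Standard recognition sequences for the enzymes named in the text.
SITES = {
    "EcoRI": "GAATTC",
    "SalI": "GTCGAC",
    "NsiI": "ATGCAT",
    "XbaI": "TCTAGA",
}

# Primer sequences as given in the text (5' to 3'), with internal hyphens removed
# on the assumption that they are typesetting artefacts.
PRIMERS = {
    "Tomato UP": "ATAGGAATTCCGTGTACGGTGGGAGGTCTA",
    "Tomato DOWN": "GGCCGTCGACATCATTTTACGTTTCTCGTTC",
    "SVA UP": "TTGCATGCATGTGACTATTGCATACCTTGC",
    "SVA DN": "GACGTCTAGAGGAGAGGTTGTCATGGTACA",
    "TR/VNTR UP": "TTGCATGCATCAGTTTTCCCTCAGACCCAG",
    "TR/VNTR DN": "GACGTCTAGAGTTGGGGGTAAGGTCACAGA",
}


def find_sites(primer):
    """Return {enzyme: 0-based position of first match} for each site present."""
    primer = primer.upper()
    return {name: primer.find(seq) for name, seq in SITES.items() if seq in primer}


if __name__ == "__main__":
    for name, sequence in PRIMERS.items():
        print(name, find_sites(sequence))
```

Each primer carries a single engineered site preceded by a four-base 5′ overhang, which is a common design choice to help the enzyme cut close to the end of a PCR product.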

Manipulation of chick embryos
Fertile chick eggs were incubated at 37.8 °C for two days until they were approximately developmental stage 14 HH. 2-3 ml of albumen was removed and a window was cut in the egg. Embryos were staged according to Hamburger and Hamilton [37]. In those at stage 11-14 the vitelline membrane was removed to aid manipulation of the embryo. The lumen of the neural tube was injected with a solution containing 2-5 mg/ml of test DNA reporter plasmid and 1 mg/ml of Tomato plasmid (control for successful injection) in PBS containing 1 mM MgCl2 and 0.2% (v/v) fast green (to help visualisation). Injections were undertaken with a pulled micropipette made from a borosilicate capillary (Warner Instruments). Post-injection, DNA was immediately electroporated into the cells of the neural tube; gold plated electrodes of 3 mm length (Harvard Apparatus) were placed either side of the embryo with an internal gap of 5 mm and 5 × 50 ms square wave pulses with 100 ms gaps were delivered. Electroporated embryos were incubated at 37.8 °C for 48 hours until they were approximately developmental stage E5 and then assessed for expression of plasmid DNAs. Electroporated embryos were dissected out and photographed using epifluorescent microscopy.

Genotyping the VNTR of the FUS SVA
The following primers, forward 5′-CAGTTTTCCCTCAGACCCAGCAC-3′ and reverse 5′-GAGCTGTTGGGTACACCTCCCAGAC-3′, were used to amplify by PCR the TR/VNTR sequences within the SVA 5′ of the FUS gene in a cohort of SALS cases and matched controls from the King's College London MND DNA Bank. All participants gave ethically approved written consent to participate in the study, which was approved by the South London and Maudsley Ethics Committee (reference 222/02). The templates were 5 ng of genomic DNA from the SALS patient samples and matched controls, and amplification reactions used Taq polymerase with FailSafe 2X D buffer (Cambio) following the recommended protocol. The products were run on a 1.2% agarose gel stained with GelRed Nucleic Acid Stain (Biotium) and visualised using a UV transilluminator (BioDoc-it Imaging System).

Results
The SVA 5′ to the FUS gene is a transcriptional regulator
Analysis of the FUS gene ±15 kb identified a large repetitive region approximately 10 kb 5′ of the FUS gene and 20 kb from the 5′ end of the PRSS36 gene using the UCSC genome browser (Figure 1A). This repetitive region is part of a larger SVA D element. ENCODE data demonstrated that this SVA overlapped or was adjacent to many features that suggested that it could be regulatory in nature. These included: 1) an area of active histones, H3K4Me1, which are associated with transcription factor binding in genome-wide datasets; 2) human ESTs which originate and are transcribed in both directions from this location; and 3) DNase 1 clusters located on each side of the SVA (Figure S1). The SVA is present in chimpanzees and humans but not in other primates and does not contain the 5′ CCCTCT hexamer repeat found in a canonical SVA (Figure 1B). Analysis using rVista through the ECR browser identified 146 conserved transcription factor binding sites between the human and chimpanzee SVA sequences, which included a variety of factors such as members of the Sp and GATA families.
The region encompassing this SVA D and the central repetitive region were prepared by PCR from commercially available DNA (Promega), cloned and the sequence validated. On sequence analysis two distinct alleles of the SVA were observed, which differed from one another by one copy of the repeat from the central repetitive region and could therefore be classed as a VNTR. SVAs in general can contain one or two central VNTRs sharing similarities in their sequences but which are distinct from each other. The occurrence of two central VNTRs as opposed to one is seen more frequently in the younger subtypes (D, E, F and F1). The FUS SVA appears to belong to the group of SVAs that contain two central repetitive regions as opposed to one. It is in the second of these repetitive regions where the difference between the two alleles is seen. Such variation in only the second domain of the central repeats has been noted in another SVA D located upstream of the PARK7 gene, which supports gene expression in a reporter gene model in vitro [24]. We therefore termed the two repetitive regions in the FUS SVA a tandem repeat (TR) and a VNTR when analysed individually and a TR/VNTR when in combination. The two alleles identified were named long (L) and short (S) and the sequence of the TR/VNTR within the SVA is shown in Figure 1C with the additional repeat in the long allele underlined.
Reporter gene constructs were prepared in the pGL3P vector including both variants of the SVA (L-SVA and S-SVA), and the isolated central TR/VNTR (L-TR/VNTR and S-TR/VNTR) (Figure 1B). It was not possible to test the TR and VNTR as separate independent domains as they could not be amplified individually due to their location adjacent to each other, preventing design of a specific primer that would not bind to more than one of the repeats in the FUS TR or VNTR. Activity of the constructs was measured in the human neuroblastoma cell line SK-N-AS, which was shown by RT-PCR to express endogenous FUS (data not shown). Statistically significant differences were observed in the levels of reporter gene expression supported by the complete SVA or the TR/VNTR compared to the minimal SV40 promoter alone in the pGL3P vector (S-SVA p<0.05, L-SVA p<0.05, S-TR/VNTR p<0.001 and L-TR/VNTR p<0.05). Both alleles of the complete SVA repressed reporter gene expression whilst both alleles of the TR/VNTR were activators in this cell line, demonstrating that the SVA may contain multiple and distinct regulatory domains, one of which is a dominant repressor in SK-N-AS cells (Figure 2). When comparing the long and short TR/VNTR constructs no significant difference in the level of reporter gene activity was noted; however, there was a small but significant difference in the levels of reporter gene expression when these variants were contained within the complete SVA sequence (p<0.05). In both the SVA and TR/VNTR constructs it was the long variant that showed lower activity when compared to the short.
We have previously demonstrated that human specific VNTRs can support tissue specific expression patterns in mouse transgenic models during development [38]. We wanted to address a similar model for the SVA but rather than use a mouse model we used the more convenient and practical chick embryo model [39,40]. The SVA and TR/VNTR (long allele) domains as used above in the SK-N-AS cell line were inserted into a reporter gene vector we had developed to allow us to visualise activity via hrGFP in the chick embryo model. Briefly, the reporter vector phrGFP contained the proximal human FUS promoter, from −160 relative to the major transcriptional start site to +84, cloned upstream of hrGFP; the TR/VNTR and SVA sequences were inserted immediately upstream of the promoter sequence. The minimal FUS promoter does not support gene expression in this model and therefore any marker gene expression is dependent on the cloned regulator.
The test plasmid was injected into the neural tube and then transfected into cells by electroporation; thus only one side of the neural tube should be transfected. The reporter gene construct was co-injected with an internal control, the tomato reporter plasmid directed by the chick β-actin promoter; the latter acts as an internal control marker for cells which have been successfully transfected. In this manner we addressed the activity and tissue specificity of the L-SVA and the L-TR/VNTR reporter. The series of FUS reporter gene constructs were injected into the developing embryo at embryonic stage 14HH and activity analysed at stage 22HH. Endogenous chick FUS expression was demonstrated by RT-PCR at this point in the development of the embryo (data not shown). The proximal FUS promoter alone did not support sufficient reporter gene expression to be observed in our assay (Figure 3B). However, both the L-SVA and L-TR/VNTR reporter gene constructs supported expression, which was readily observed in the neural tube of the chick embryo (Figure 3E and 3H respectively).

Genetic variation in the FUS SVA
It has been previously demonstrated that VNTRs with distinct copy numbers of the repeating element can not only support tissue specific and stimulus inducible reporter gene activity but can also be differentially associated with genetic predisposition to a specific disorder, for example the human transporters for serotonin and dopamine [14,18,41,42]. We therefore expanded the analysis of the polymorphic variation associated with the VNTR within the SVA, addressing this in a cohort of 241 individuals with SALS and 228 matched controls. The genetic variation was analysed by agarose gel electrophoresis of the PCR fragments spanning the TR/VNTR region of the SVA. We found there were only two alleles that could be determined in this cohort (this analysis cannot determine SNP or small insertion/deletion variation within the SVA) (Figure 4A). We confirmed the sequence from both an L and an S allele after gel purification; this demonstrated that the L allele corresponded to the sequence found in the UCSC browser for the VNTR of this SVA element (Figure 1C). The two alleles also matched the variants originally identified when cloning the SVA for reporter gene studies from commercially available DNA (Promega).
Figure 2. The fold values of activity demonstrated by each construct compared to pGL3P, normalised to the internal control (pMLuc-2) to account for differences in transfection efficiency, are displayed. Both alleles when tested as a complete SVA showed repressive function and were significantly different to each other. When the alleles were tested as a smaller fragment consisting of the central TR/VNTR region they both showed enhancer properties. A one-tailed t-test was used to measure the significance of fold activity of the FUS SVA and TR/VNTR over the minimal promoter of the pGL3P vector (*P<0.05, ***P<0.001), and to compare the activity of the alleles of the SVA and the TR/VNTR to each other (#P<0.05). doi:10.1371/journal.pone.0090833.g002
The following genotype frequencies were observed: 45.6% LL, 39% LS and 15.4% SS in the SALS cohort, and 46.9% LL, 42.1% LS and 11% SS in the matched controls (Figure 4B). Although there was a small difference of 4.4% between the frequency of SS individuals in the SALS cohort compared to the matched controls, this was found not to be significant when analysed using CLUMP [43]. The T1 2×N table statistic from CLUMP [43] was p = 0.36 and the clumped 2×2 T4 p-value was 0.33, both from 10,000 simulations. CLUMP simulations allow for the small cell values present in sparse 2×N tables such as those in highly multiallelic repeat loci and prevent inflation of the test statistics from generating false positive results.
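The logic of a simulation-based test on the allele table can be sketched as follows. This is a simplified Python illustration and not the CLUMP program: it computes a Pearson chi-square statistic on the 2×2 table of L and S allele counts in cases and controls, then estimates an empirical p-value by repeatedly reshuffling alleles between the two groups with the table margins held fixed. The genotype counts used (110 LL, 94 LS, 37 SS in SALS; 107 LL, 96 LS, 25 SS in controls) are an assumption, back-calculated from the reported percentages and cohort sizes, and the function names are hypothetical.

```python
import random


def allele_counts(ll, ls, ss):
    """Convert genotype counts into (L allele count, S allele count)."""
    return 2 * ll + ls, 2 * ss + ls


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat


def monte_carlo_p(case_counts, control_counts, n_sim=10000, seed=0):
    """Return (observed statistic, empirical p-value) with fixed table margins."""
    rng = random.Random(seed)
    a, b = case_counts
    c, d = control_counts
    observed = chi_square_2x2(a, b, c, d)
    pool = [0] * (a + c) + [1] * (b + d)  # 0 = L allele, 1 = S allele
    n_case = a + b
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(pool)
        sim_b = sum(pool[:n_case])      # S alleles assigned to cases
        sim_a = n_case - sim_b          # L alleles assigned to cases
        sim_c = (a + c) - sim_a         # remaining L alleles go to controls
        sim_d = (b + d) - sim_b         # remaining S alleles go to controls
        if chi_square_2x2(sim_a, sim_b, sim_c, sim_d) >= observed - 1e-12:
            hits += 1
    return observed, hits / n_sim


# Genotype counts back-calculated from the reported percentages (an assumption).
CASES = allele_counts(110, 94, 37)
CONTROLS = allele_counts(107, 96, 25)

if __name__ == "__main__":
    stat, p_value = monte_carlo_p(CASES, CONTROLS, n_sim=10000, seed=1)
    print(f"chi-square = {stat:.3f}, empirical p = {p_value:.3f}")
```

With only two alleles the table is already 2×2, so the clumping step that CLUMP applies to multiallelic loci is not needed in this sketch.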

Discussion
We have demonstrated that a retrotransposon of the SVA family, 5′ of the FUS gene, is both polymorphic and a transcriptional regulatory domain. The SVA acted as a classical regulatory domain when analysed in reporter gene constructs in vitro and in vivo. These data suggest that the SVA can affect FUS gene expression patterns by multiple mechanisms without the requirement for retrotransposition and that distinct polymorphic variants could act to direct differential regulation in response to the same environmental challenges. The transcription factor complement within the cell will be based on a specific stimulus the cell is receiving at any given moment and this synergistic tissue specific and stimulus inducible challenge may result in altering the complement of transcription factors able to direct function from the SVA. There can also be epigenetic variation across SVA elements dependent on the environment; for example, a change in methylation across retrotransposons was identified in cancer [28].
Both alleles of the complete SVA and of the TR/VNTR domain of the SVA were tested in an in vitro reporter gene assay. Distinct from standalone VNTR domains which we have previously addressed, the repetitive region of this particular SVA D contains two adjacent domains comprising a TR and a VNTR, and it was this composite element that was tested in the reporter gene assay. It is interesting that while both the L and S TR/VNTR regions were enhancers of activity, the intact L and S SVA acted as repressors in the SK-N-AS cell line. This suggests that in addition to the activator region in the TR/VNTR the SVA contains a strong active silencer element, flanking this central TR/VNTR region, which is functional in the SK-N-AS neuroblastoma cell line. There are multiple conserved transcription factor binding sites within the FUS SVA sequence; however, the action of repressors or enhancers is often determined by the factors available in the cell at any given time, and further analysis will therefore be required to determine the action of specific transcription factors on the SVA. An alternative explanation for the difference in activity between the TR/VNTR and the SVA may be the proximity of the TR/VNTR domain to the reporter gene when it is part of the complete SVA element, which could affect its ability to enhance expression.
There was no significant difference between the activities of the two alleles of the TR/VNTR when tested alone, but there was a significant difference between the two alleles when tested as part of the complete SVA (p<0.05). To further validate the regulatory properties of this domain we tested its properties in the neural tube of the chick embryo. Although FUS is a ubiquitously expressed protein, this region of the embryo contains motor neurons, which are the appropriate cell type to test a domain that might be involved in ALS. As in the cell line model the long allele of the TR/VNTR domain acted as an activator, but in this model the long allele of the SVA also demonstrated activator properties which were not exhibited in vitro. This would be consistent with our previous analyses of VNTRs from both the serotonin and the dopamine transporters demonstrating cell line specific properties in reporter gene constructs [41,44,45] and the intron 2 VNTR from the human serotonin transporter having tissue specific properties in a transgenic mouse model [38]. This particular system of analysing the transcriptional properties of a domain is not quantitative; therefore we cannot compare the amount of expression activated by the TR/VNTR and the intact SVA.
Our functional data demonstrated the potential for the FUS SVA to act as a transcriptional regulatory element; however, only a small difference in the function of the two alleles was observed, although we hypothesise such a difference could be increased upon exposure of the cell to specific challenges. Nevertheless the genotype of the SVA could be a factor which associates with a predisposition to disorders such as ALS. We therefore performed a genotype analysis of the TR/VNTR of the SVA in a SALS and control cohort from the King's College London MND DNA Bank. This demonstrated two major alleles which we termed L and S and which corresponded to those identified in the cloned commercial DNA (Promega). The frequencies of LL, LS and SS were not found to be significantly different in the sporadic cases compared to the matched controls when analysed using CLUMP [43], although a minor difference could be seen between the frequency of individuals with an SS genotype in SALS and control (15.4% vs 11%). This may also reflect in part that FUS mutations themselves are rare in SALS (1%) and that we need to address an environmental challenge as a modulator of FUS expression. A much larger cohort will be required to validate such variation as an association in the SALS cohort. Our study could not determine the potential SNP or indel variation in the SVA; such variation may be significant for both clinical association with disease and transcriptional properties of the SVA. Precedent for this exists in the long and short alleles of the VNTR within the promoter of the human serotonin transporter gene; in this example there is a genetic association based on GxE interactions, and a SNP in the long allele makes it clinically similar to the short 'risk' allele in genetic associations [46,47].
In summary we have determined a novel primate tissue specific regulator that could play a role in FUS transcriptional regulation. This regulation could be modified by a number of environmental challenges, including the changes correlated with the increased RT activity seen in the serum of patients, that could affect the epigenetic structure of the FUS locus. This regulation could be further modulated by genetic variation in the SVA apart from the VNTR variant observed in this analysis, thus allowing for a GxE interaction in any of the diseases in which FUS is implicated.