The frequency of SMN gene variants lacking exon 7 and 8 is highly population dependent

Spinal Muscular Atrophy (SMA) is a disorder characterized by the degeneration of motor neurons in the spinal cord, leading to muscular atrophy. In the majority of cases, SMA is caused by the homozygous absence of the SMN1 gene. The disease severity of SMA is strongly influenced by the copy number of the closely related SMN2 gene. In addition, an SMN variant lacking exons 7 and 8 has been reported in 8% and 23% of healthy Swedish and Spanish individuals respectively. We tested 1255 samples from the 1000 Genomes Project using a new version of the multiplex ligation-dependent probe amplification (MLPA) P021 probemix that covers each SMN exon. The SMN variant lacking exons 7 and 8 was present in up to 20% of individuals in several Caucasian populations, while being almost completely absent in various Asian and African populations. This SMN1/2Δ7–8 variant appears to be derived from an ancient deletion event as the deletion size is identical in 99% of samples tested. The average total copy number of SMN1, SMN2 and the SMN1/2Δ7–8 variant combined was remarkably comparable in all populations tested, ranging from 3.64 in Asian to 3.75 in African samples.


Introduction
Spinal Muscular Atrophy (SMA) is one of the most prevalent genetic disorders in Caucasians with an incidence of approximately 1:10.000 newborns. SMA is a recessive disorder with a population dependent carrier frequency ranging from 1:35 in Caucasians to 1:91 in African Americans [1,2]. Based on age of onset and maximum motor milestones achieved, SMA is divided into different disease categories, distinguished as type 0 to type IV [3][4][5]. Type 0 and I are the most severe types with SMA onset beginning at birth, patients achieving minimal motor milestones, and a life expectancy of around two years. Intermediate types, type II and III, are milder forms of the disorder where patients are able to sit or walk and most individuals survive beyond 10 years of age. Type IV is the mildest form of SMA with an adult-onset of the disorder and patients having a normal life expectancy [4,5]. With the recent availability of treatment options, newborn screening and pre-symptomatic treatment of SMA is now being considered in several countries [6]. a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 In more than 95% of SMA cases, the disease is caused by the complete absence of the SMN1 gene or by SMN1 to SMN2 gene conversions [7][8][9][10][11]. In the majority of the remaining cases, intragenic mutations or partial deletions in the SMN1 gene cause the disease [12,13]. The copy number of both SMN1 and SMN2 is very variable due to instability of the 5q13.2 region, leading to a high frequency of deletion/duplication and gene conversion events [11,[14][15][16][17]. In the majority of populations, most individuals have two copies of both SMN1 and the almost identical SMN2 gene [18]. There is a single clinically relevant nucleotide difference between SMN1 and SMN2, c.840C>T located in exon 7 [19]. This single nucleotide difference affects splicing of the last coding exon, resulting in 90% of SMN2 transcripts forming a dysfunctional SMN protein [20,21]. The complete absence of SMN2 has no effect on healthy individuals. However, for patients who have no functional SMN1 gene copies, approximately 10% of SMN functionality is retained per copy of SMN2. This results in an inverse correlation between the number of SMN2 copies and disease severity [8,13,[22][23][24].
Both SMN1 and SMN2 consist of nine exons, historically named exons 1, 2a, 2b, 3, 4, 5, 6, 7 and 8. Using multiplex ligation-dependent probe amplification (MLPA), the presence of an SMN1 or SMN2 gene variant lacking exons 7 and 8 has been reported to have a frequency of 8% in healthy Swedish individuals [18], and 23% in healthy Spanish individuals [25]. During initial validation tests of a new version, B1, of the SALSA MLPA Probemix P021 SMA at MRC Holland the copy number status of both SMN1 and SMN2 were determined in a large number of blood derived DNA samples from Dutch individuals. Results showed that approximately 20% of the samples had a lower exon 7 and 8 copy number for the SMN1 and SMN2 genes combined as compared to the exon 1-6 copy number. This validation study led us to investigate the prevalence of an SMN variant lacking exons 7-8 in multiple populations. Our study showed a strong inverse correlation between the copy number of SMN gene variants lacking exons 7 and 8 and the SMN2 copy number, corroborating a report by Alias et al. [26]. This suggests the SMN1/2Δ7-8 variant in the general population to be predominantly derived from SMN2 deletion events, in concordance with the difference in deletion frequency between SMN1 and SMN2 [24].
In this study 1255 samples from the 1000 Genomes Project comprising of multiple ethnic groups were analyzed using the SALSA MLPA Probemix P021-B1 SMA, which can readily detect the SMN1/2Δ7-8 variant (Fig 1). The copy numbers for SMN1, SMN2 and SMN1/2Δ7-8 were determined for each sample and compared for each ethnic group. Our results show that the frequency of the SMN1/2Δ7-8 variant is strongly population dependent.

Results
We tested 1255 DNA samples from the 1000 Genomes Project collection with a new B1 version of the SALSA MLPA Probemix P021 SMA. This new P021 probemix has increased gene coverage for the SMN genes as compared to previous versions and covers each exon of the SMN genes with one or more probes. Sequence differences in exon 7 (c.840C>T) and 8 (c. � 239G>A) of both SMN1 and SMN2 are covered by four probes, in addition 17 probes detect sequences present in both SMN1 and SMN2. Of the latter, ten probes detect sequences in the exon 1-6 region and seven probes detect sequences in the exon 7-8 region.
Upon data analysis, a high frequency of SMN1/2Δ7-8 variant alleles were observed in specific populations (Table 1), which is in line with previous reports [18,25]. No other partial gene deletions or duplications were observed. Using synthetic MLPA probes, we located the breakpoint of the SMN1/2Δ7-8 variant to be in intron 6, between nucleotides 4679 and 2363 before exon 7 (Fig 2). 13 samples were tested and all showed the same probes deviating, indicating that the breakpoint is in the same 2.3 kb region. preparation of the manuscript. The specific roles of these authors are articulated in the 'author contributions' section. JS (the owner of MRC Holland) played a role in study design, decision to publish and preparation of the manuscript.
Gap-PCR was used to determine if the deletion of exons 7 and 8 of the 82 SMN1/2Δ7-8 positive samples was of equal size to the exon 7 and 8 deletion described by Ruhno et al [27]. A deletion of equal size was found in 99% of positive samples (81/82). None of 24 negative samples tested positive by this Gap-PCR.
The total number of copies for SMN1, SMN2 and SMN1/2Δ7-8 in a sample could reliably be quantified by using the median value of the ten MLPA probes for the exon 1-6 region ( Fig  1). The total copy number for SMN1, SMN2 and SMN1/2Δ7-8 ranged from 2 to 7 in our sample cohort. Samples that contained the SMN1/2Δ7-8 variant could clearly be identified by subtracting the median copy number obtained from the seven MLPA probes detecting exons 7 and 8 of both SMN1 and SMN2 from the median copy number obtained from the ten MLPA probes detecting exons 1-6 of both SMN1 and SMN2 (Fig 1). In total, we observed 78 samples  Tables 1 and  2). In addition, the average number of either SMN1 or SMN2 copies in each individual was found to be strongly population dependent with African populations showing much higher SMN1 and lower SMN2 copy numbers in comparison to the other populations examined ( Table 2). The number of SMN1 copies ranged from an average of 2.70 in African samples to 2.03 in European samples. SMN2 copy numbers were highest in Asian samples with an average of 1.56 and lowest in African samples with an average of 1.05. The total number of SMN copies   Table 2). The presence of a SMN1/2Δ7-8 gene variant is strongly linked to the SMN2 copy number as shown in Table 3. A significant difference in SMN1/2Δ7-8 presence was found between individuals carrying two or more copies of either SMN1 or SMN2 (p = 1.48e-12, OR: 39.6 [95%CI: 6.9-1573.9]). Presence of SMN1/2Δ7-8 was nearly absent (0.17%) within the group of two or more SMN2 copies while more frequently found in carriers of two or more SMN1 copies (6.73%). When comparing the presence of SMN1/2Δ7-8 between the groups of one or less copies of either SMN1 or SMN2 no significant difference was found (p = 0.39). Samples without SMN1/2Δ7-8 copies had on average 2.25 SMN1 copies and 1.44 SMN2 copies. While samples that contained one or two SMN1/2Δ7-8 copies had on average 2.02 SMN1 copies and 0.76 SMN2 copies. 50.2% of samples (589 out of 1173) without the SMN1/2Δ7-8 variant had two or more SMN2 copies, while in the 82 samples that contained one or two SMN1/2Δ7-8 copies, only one sample had two SMN2 copies.

Discussion
Our results show the frequency of having at least one SMN1/2Δ7-8 copy to be 15-20% in three European populations (Italian, Spanish, and English/Scottish), which is in line with previous reports [25]. Interestingly, only two out of 650 Asian and African samples tested contained an SMN1/2Δ7-8 copy, which is approximately 50 times lower than in the tested European Our results also confirm previous reports of several African populations having a higher average SMN1 copy number [1,2], with an average SMN1 copy number of 2.70 in African populations and 2.03-2.19 in non-African populations. The increased SMN1 copy number in Africans is accompanied by a lower average SMN2 copy number and essentially complete absence of SMN1/2Δ7-8 copies. Interestingly, our results show that the combined copy number of SMN1, SMN2 and SMN1/2Δ7-8 is remarkably constant across all populations tested (Table 2), ranging from 3.64 in Asian samples to 3.75 in African samples.
The frequent occurrence of an SMN1 or SMN2 variant lacking exons 7-8 was reported first by Arkblad et al in 2006 [18]. Thus far, little attention has been given to this SMN gene variant, likely due to technical difficulties in variant detection [28]. The development of an improved version of the SALSA MLPA Probemix P021 SMA has made the identification and copy number determination of SMN genes, including the SMN1/2Δ7-8 variant, simple and robust. Our results on 1255 samples from the 1000 Genomes Project show that the presence of this SMN1/ 2Δ7-8 variant is almost completely restricted to DNA samples that have an SMN2 copy number of zero or one (Table 3). Contrary to our results Ruhno et al [27] presented evidence showing that SMN1/2Δ7-8 can also originate from SMN1. This discrepancy could be due to the absence of patients and the small number of SMA carriers within our cohort resulting from the lower frequency of SMN1 deletion events compared to that of SMN2 [24]. Only one out of the 590 samples with two SMN2 copies contained an SMN1/2Δ7-8 copy, while 81 out of the 665 samples (12%) with zero or only one SMN2 copy had at least one SMN1/2Δ7-8 copy. This finding is in agreement with earlier reports [26] where a strong correlation between SMN2 copy number and the presence of one or two copies of the SMN1/2Δ7-8 variant was found, further suggesting that most SMN1/2Δ7-8 variant copies are originally derived from an SMN2 deletion event.
Using additional MLPA probes, we localized the 3' breakpoint of SMN1/2Δ7-8 to intron 6 between nucleotides 4679 and 2363 before exon 7, in 13 samples examined (Fig 2). This is in line with a recent report from Ruhno et al [27] who describe a 6310 bp exon 7-8 deletion that has breakpoints at 3643 nucleotides before exon 7 and 2164 nucleotides after the start of exon 8. The SMN1/2Δ7-8 breakpoint, located in intron 6, is more than 2 kb from the 3' breakpoint of a known SMN1 exon 1-6 deletion that is located between 133 and 110 nucleotides before exon 7 [15]. This latter breakpoint does not contain the recurring repeat sequence identified by Ruhno et al that contains the breakpoint of SMN1/2Δ7-8. Thus, the SMN1/2Δ7-8 copies are not the reciprocal of the recurrent SMN1 exon 1-6 deletion [15,18]. It should be noted that the presence of an SMN1/2Δ7-8 copy could mask an SMN1 exon 1-6 deletion, although the occurrence would be very rare this could have clinical relevance.
SMN2 copy number is the most important modifier of SMA disease severity. Next to this, the presence of the c.859G>C variant and/or the intron 6 A-44G, A-549G and C-1897T variants in SMN2 also seem to be correlated with a milder phenotype [27,29]. One example of a disease modifier located outside the SMN region is the PLS3 gene, located on the X chromosome. In siblings, with identical SMN genotypes but discordant phenotypes, the expression level of PLS3, is found to influence the severity of the SMA phenotype. High expression levels were found in lymphoblastoid cell lines of asymptomatic but not in symptomatic siblings [30]. The effect of PLS3 expression is gender specific and not completely penetrant [30,31] therefore there must be additional disease modifiers inside or outside the SMN region. Several articles have reported healthy adults that lack all SMN1 exon 7 copies [18,30,[32][33][34][35][36][37][38][39][40][41]. In some cases, the absence of symptoms was attributed to a high SMN2 copy number (4-5 copies). However, the identification of individuals with homozygous SMN1 loss and a physique described as "extremely well-muscled" [36] or "athletically built" [32]. As these unaffected individuals had the same SMN2 copy number as their affected siblings, additional disease modifiers outside the SMN genes are expected. The start of SMA newborn screening and pre-symptomatic treatment makes further research on SMA disease modifiers urgent. The SMN1/2Δ7-8 copy number is unlikely to be a strong disease modifier as its presence was not related to disease severity in the patient cohort studied by Ruhno et al [27]. Besides, Calucho et al. [25] identified 13 SMA patient samples with an SMN1/2Δ7-8 copy (2%, 13/625), and of these, 11 patients were diagnosed as type I.
The SALSA MLPA Probemix P021-B1 version provides accurate copy number determination of the SMN1 and SMN2 genes and the SMN1/2Δ7-8 variant and will likely result in a more clear picture of the functionality of the SMN1/2Δ7-8 variant in the future when more SMA patient samples are analyzed.

Subjects
Cell line derived DNA samples from the 1000 Genomes Project [42] were obtained from Coriell laboratories (https://www.coriell.org/). Subsets from the following panels were tested: Ben  Table. The number of samples per subset are  shown in Table 1.

Copy number analysis
A new version, B1, of the SALSA MLPA Probemix P021 SMA was developed at MRC Holland (MRC Holland, Amsterdam, The Netherlands). In contrast to previous P021 versions all exons of the SMN genes are covered by at least one probe in this B1 version. In addition to four SMN1 and SMN2 specific probes, each detecting a single nucleotide variation in exon 7 and 8, the probemix contains seven probes that detect a sequence present in, or close to, exon 7 or exon 8 of both SMN1 and SMN2, as well as ten probes that detect sequences present in exons 1-6 of both SMN1 and SMN2.
All MLPA reactions were performed according to the General MLPA protocol (www.mlpa. com). Reactions were analyzed by capillary electrophoresis using the Applied Biosystems 3130 (Life Technologies/Applied Biosystems, Carlsbad, CA). Data analysis was performed using Coffalyser.Net software (www.mlpa.com).
SMN1 and SMN2 copy numbers were determined with the use of the SMN1 and SMN2 specific probes for exon 7. The SMN1/2Δ7-8 copy number was determined by subtracting the median copy number obtained for the seven MLPA probes that detect a sequence present in exons 7 and 8 of both SMN1 and SMN2 from the median copy number obtained for the ten MLPA probes that detect a sequence present in exons 1-6 of both SMN1 and SMN2.
Five additional probes were designed to investigate the 3' breakpoint of the SMN1/2Δ7-8 variant. These probes are specific for sequences located in intron 6 of the SMN genes (Fig 2). Out of the 82 samples containing a SMN1/2Δ7-8 variant a random subset of 13 samples were tested using these probes.
Hot start gap-PCR was performed in order to determine whether the size of the deletion in SMN1/2Δ7-8 positive samples is equal to the deletion of 6310 nt described by Ruhno et al [27]. Primers (Forward: /56FAM/CAGTTATCTGACTGTAACACTGTAGGC, Reverse: GTTGTTGCTTATGCTGGTCTTG) were designed based on breakpoints described by Ruhno et al [27] and positive samples generated an amplicon of 583 nucleotides.

Statistical analysis
Statistics analysis was performed using R version 3.5.1 [43] and Rstudio version 1.1.463 [44]. SMN1 copy numbers, SMN2 copy numbers and SMN1/2Δ7-8 copy numbers were compared using the Fischer's exact test with the basic R function fisher.test. SMN1/2Δ7-8 results were dichotomized in a group with either 0 copies or in a group with one or more copies. Presence of the SMN1/2Δ7-8 was compared within two groups. In the first group SMN1/2Δ7-8 presence was compared between carriers of one or less copies of either SMN1 or SMN2. In the second group SMN1/2Δ7-8 presence was compared between individuals carrying two or more copies of either SMN1 or SMN2. Individuals carrying one copy of both SMN1 and SMN2 were excluded from the analysis (n = 2). Furthermore, continent data was dichotomized per continent and subsequently compared using the Fischer's exact test. Bonferroni correction was applied to correct for multiple testing.
Supporting information S1 Table. SMN copy number distribution and cell line catalog ID. The median copy number value based on both the SMN exon 1-6 probes and SMN exon 7-8 probes is provided for each DNA sample tested, as shown in Fig 1. For each DNA sample, the Coriell Institute calatog ID is provided. (XLSX)