Systemic nature of spinal muscular atrophy revealed by studying insurance claims

Objective: We investigated the presence of non-neuromuscular phenotypes in patients affected by Spinal Muscular Atrophy (SMA), a disorder caused by a mutation in the Survival of Motor Neuron (SMN) gene, and whether these phenotypes may be clinically detectable prior to clinical signs of neuromuscular degeneration and therefore independent of muscle weakness.
Methods: We utilized a de-identified database of insurance claims to explore the health of 1,038 SMA patients compared to controls. Two analyses were performed: (1) claims from the entire insurance coverage window; and (2) for SMA patients, claims prior to diagnosis of any neuromuscular disease or evidence of major neuromuscular degeneration to increase the chance that phenotypes could be attributed directly to reduced SMN levels. Logistic regression was used to determine whether phenotypes were diagnosed at significantly different rates between SMA patients and controls and to obtain covariate-adjusted odds ratios.
Results: Results from the entire coverage window revealed a broad spectrum of phenotypes that are differentially diagnosed in SMA subjects compared to controls. Moreover, data from SMA patients prior to their first clinical signs of neuromuscular degeneration revealed numerous non-neuromuscular phenotypes including defects within the cardiovascular, gastrointestinal, metabolic, reproductive, and skeletal systems. Furthermore, our data provide evidence of a potential ordering of disease progression beginning with these non-neuromuscular phenotypes.
Conclusions: Our data point to a direct relationship between early, detectable non-neuromuscular symptoms and SMN deficiency. Our findings are particularly important for evaluating the efficacy of SMN-increasing therapies for SMA, comparing the effectiveness of local versus systemically delivered therapeutics, and determining the optimal therapeutic treatment window prior to irreversible neuromuscular damage.

Introduction
to develop risk-prediction models and characterize disease progression [26][27][28]. Our results have major implications, as they highlight the need for broader systemic monitoring of SMA progression and provide a framework for comparative effectiveness studies between locally delivered therapeutics, such as Spinraza, and those delivered systemically, some of which may be nearing clinical approval. Finally, while newborn screening for SMA will support early intervention for the severe patients (Types I/II), later-stage patients (Types III/IV) may benefit from improved methods of tracking disease progression, including during the stage when they are still pre-symptomatic for neuromuscular weakness.

Materials and methods
The Harvard Medical School Institutional Review Board approved this research.

Data
We utilized a de-identified administrative database from Aetna Inc. representing 63,444,784 unique members during a period extending from January 1, 2008 through October 1, 2015.
For each individual, we extracted gender, year of birth, enrollment duration, and diagnoses in the form of International Classification of Diseases, 9th Revision, Clinical Modification (ICD9) codes.

Subject/Control selection
Using a selection approach similar to one used to evaluate the economic burden of SMA [29], we first selected all individuals with at least 2 ICD9 codes for SMA on different visits. Next, to avoid inclusion of subjects that might have been misdiagnosed with SMA, such as those affected by muscular dystrophy (MD) and myoneural disorders, we required the final neuromuscular disease diagnosis on record to be SMA. Additionally, after reviewing each of the selected individuals, we found that a large number of women undergoing prenatal genetic testing or care for pregnancies with a high risk for SMA received SMA diagnoses for billing purposes. To overcome this, we excluded any subject with pregnancy codes or codes related to pregnancy complications as detailed in S1 Table.
Confirmed SMA patients were then stratified into three subcategories based on age at first SMA diagnosis on record [1]: Group A represented likely SMA type I/II patients having diagnoses from birth until 2 years of age; Group B represented likely SMA type III patients with diagnoses from 2 until 21 years of age; and Group C represented likely SMA type IV patients with diagnoses between 21 and 65 years of age. Control populations were selected to match the range of ages at insurance enrollment for each SMA Group. As such, control Groups may share individuals, because the ages at insurance enrollment overlap across the SMA Groups, whose stratification is based on age at first SMA diagnosis.
For each patient/control Group, two different analyses were performed as shown in Fig 1. The first analysis utilized the entire coverage window of each individual, whereas the second included, for SMA cases, only data prior to the first diagnosis of neuromuscular disease (SMA, MD, etc.) or evidence of major neuromuscular degeneration defined by the presence of certain ICD9 codes (S1 Table). In this manuscript, this time-point is referred to as the first sign of neuromuscular degeneration, with the time prior to that being the pre-neuromuscular degeneration window. Finally, in order to have sufficient depth of health records, we required each subject to have at least 6 months of coverage during the analysis window. For the pre-neuromuscular degeneration analysis, the required six months prior to the first sign of neuromuscular degeneration supports exclusion of patients that potentially had an SMA diagnosis prior to their coverage.
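The stratification and minimum-coverage rules above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the half-open age boundaries (a patient first diagnosed at exactly 2 or 21 years falls into the older group) are an assumption, since the text does not say how boundary ages were handled.

```python
from typing import Optional


def sma_group(age_at_first_sma_dx: float) -> Optional[str]:
    """Map age in years at first SMA diagnosis on record to a study Group.

    Boundary handling is an assumption; the source only gives the ranges.
    """
    if 0 <= age_at_first_sma_dx < 2:
        return "A"  # likely SMA types I/II
    if 2 <= age_at_first_sma_dx < 21:
        return "B"  # likely SMA type III
    if 21 <= age_at_first_sma_dx <= 65:
        return "C"  # likely SMA type IV
    return None  # outside the age ranges described in the text


def has_minimum_coverage(months_in_window: float, minimum: float = 6.0) -> bool:
    """Check the 6-month minimum coverage required within an analysis window."""
    return months_in_window >= minimum
```

A subject would enter a given analysis only if `sma_group` returns a Group and `has_minimum_coverage` is true for that analysis window.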

Phenotypic differences
To reduce the number of variables tested, we first converted ICD9 codes to phenome-wide association (PheWAS) diagnosis groups [30]. This allowed us to aggregate similar ICD9 codes that, for the purpose of this study, can be considered diagnoses of equivalent conditions. In order to test whether PheWAS groups were diagnosed at significantly different rates between SMA and control groups and to obtain covariate-adjusted odds ratios (OR), we regressed an indicator of the PheWAS group onto an indicator of SMA diagnosis using logistic regression via the glm() function in R 3.3.3 [31] with gender, age at insurance enrollment, and enrollment months during the analysis window as covariates. Only phenotypes with prevalence of at least 1% in the SMA Group or control population were evaluated. The false discovery rate (FDR) was controlled at 5% using the Benjamini-Hochberg procedure [32] to adjust p-values.
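To illustrate how an odds ratio is read off a logistic regression, the sketch below fits the model by Newton-Raphson in plain Python. It is an assumption-laden stand-in for the R glm() call used in the study: the helper names and the toy counts are hypothetical, and the design rows here hold only an intercept and the SMA indicator, whereas the study also included gender, age at insurance enrollment, and enrollment months as additional columns.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_logistic(rows, outcomes, iterations=25):
    """Fit logistic regression coefficients by Newton-Raphson.

    Each row is a list of predictors starting with 1.0 for the intercept.
    """
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, y in zip(rows, outcomes):
            eta = sum(b * v for b, v in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (y - p) * x[i]
                for j in range(k):
                    hess[i][j] += w * x[i] * x[j]
        step = solve_linear(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


# Hypothetical toy data: 10 SMA subjects (4 with the phenotype) and
# 100 controls (10 with the phenotype). Columns: intercept, SMA indicator.
rows = [[1.0, 1.0]] * 10 + [[1.0, 0.0]] * 100
outcomes = [1] * 4 + [0] * 6 + [1] * 10 + [0] * 90
beta = fit_logistic(rows, outcomes)
odds_ratio = math.exp(beta[1])  # exponentiated SMA coefficient
```

With no other covariates, the exponentiated SMA coefficient matches the odds ratio of the 2x2 table, (4/6)/(10/90) = 6, up to numerical tolerance. Adding covariate columns yields the adjusted odds ratio instead.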

Temporal trajectories of the onset of categorized phenotypes
Next, for only the pre-neuromuscular degeneration analysis window, we took all phenotypes with adjusted p-values less than 0.05 and grouped them by physiological system to evaluate broader categories of disease-associated dysfunctions. We then repeated the logistic regression analyses, with the outcome now being the binary indicator of any phenotype covered by the specified physiological system. To characterize a potential timeline of disease progression, we computed the median time between first diagnosis of any ICD9 code covered by each specified physiological system and a patient's first sign of neuromuscular degeneration. The goal was to ascertain whether there is a temporal relationship between the onsets of various non-neuromuscular defects and that of neuromuscular dysfunction. To eliminate the possibility that these timelines could result from a technical artifact due to variability in the length of a subject's insurance coverage, we calculated the R² coefficient of determination between first diagnoses of specific defects for all SMA cases and their insurance enrollment duration.
Fig 1. SMA patients were identified based on having at least 2 SMA diagnosis codes in their insurance coverage window. Additionally, their diagnosis must have been associated with a final diagnosis of SMA as opposed to another similarly treated neuromuscular disorder. Using these cohorts, two analyses were performed. The first included the entire coverage window (Entire Coverage Window Analysis) with a 6-month minimum observation window. The second, the Pre-Neuromuscular Degeneration Analysis, only included data prior to the first neuromuscular disease diagnosis or evidence of major neuromuscular degeneration with a minimum observation window of 6 months. The SMA cohort was split into three groups by age at first SMA diagnosis on record.
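The median-time computation described in this section can be sketched as below. The function name, the dictionary layout, and the dates are hypothetical. Offsets are signed so that a diagnosis preceding the first sign of neuromuscular degeneration is negative, and restricting to diagnoses strictly before that sign is an assumption that follows from the pre-neuromuscular degeneration window.

```python
from datetime import date
from statistics import median


def median_lead_days(first_system_dx, first_nm_sign):
    """Median offset in days from first neuromuscular sign to first system diagnosis.

    Both arguments map a patient identifier to a date. Only patients whose
    system diagnosis precedes their first neuromuscular sign are counted.
    """
    offsets = []
    for patient, dx_date in first_system_dx.items():
        nm_date = first_nm_sign.get(patient)
        if nm_date is not None and dx_date < nm_date:
            offsets.append((dx_date - nm_date).days)
    return median(offsets) if offsets else None


# Hypothetical example: first cardiovascular diagnosis per patient.
cardio_dx = {
    "p1": date(2010, 1, 1),
    "p2": date(2011, 6, 1),
    "p3": date(2012, 3, 1),
}
nm_sign = {
    "p1": date(2010, 4, 11),
    "p2": date(2011, 6, 21),
    "p3": date(2012, 5, 30),
}
lead = median_lead_days(cardio_dx, nm_sign)  # offsets are -100, -20, -90
```

Here the median offset is -90 days, meaning the typical first cardiovascular diagnosis in this toy cohort precedes the first neuromuscular sign by about three months.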
Complete details on each cohort including total population, gender, age at insurance enrollment, mean enrollment duration, and a measure of medical utilization in the form of days with ICD9 codes per six-month period are presented in Table 1.

Identification of non-neuromuscular phenotypes in SMA patients
Differential diagnoses or phenotypes, in the form of PheWAS groups, from the entire coverage window of SMA patients compared to controls for Groups A, B, and C are presented in S2-S4 Tables. These data reveal a host of non-neuromuscular phenotypes, as well as more traditional hallmarks of disease progression including muscular, respiratory, feeding, and mobility complications. Adjusted p-values for each phenotype, organized by primary associated physiological system, are presented using Manhattan plots in Figs 2A, 3A and 4A. Notably, nontraditional aspects of the disease that are revealed include, but are not limited to, cardiovascular (peripheral vascular disease, tachycardia, chronic vascular insufficiencies of the intestine), metabolic (hypoglycemia, hypopotassemia (hypokalemia)), and sensorial (chronic pain syndrome, pancreatitis) defects.
Since this analysis included features detected after SMA diagnosis, it is unclear whether a given phenotype is an independent feature of SMA, directly attributable to reduced SMN levels, or simply a consequence of neuromuscular dysfunction. Therefore, we repeated the analysis, but limited the data to the period prior to clinically detectable neuromuscular degeneration. The analysis of data collected in that period yielded a total of 17, 67, and 35 differential phenotypes for Groups A, B, and C, respectively, which are presented in S5-S7 Tables. Visualization of the relative significance of each phenotype is presented using Manhattan plots (Figs 2B, 3B and 4B), alongside data from the entire coverage window. Selected non-neuromuscular phenotypes of particular interest representing cardiovascular, gastrointestinal, and male reproductive defects are also provided in Table 2. These data demonstrate the broad impact of reduced SMN outside of the CNS and in advance of neuromuscular degeneration. While it is possible that some of the phenotypes could be consequences of the earliest stages of neuromuscular dysfunction, others, such as defects in the cardiovascular and male reproductive systems, are more definitively independent. Despite having had a limited length of time prior to their first sign of neuromuscular degeneration, children from Group A showed evidence of numerous non-neuromuscular phenotypes. These phenotypes included: cardiovascular (non-rheumatic tricuspid valve disorders: 6.5% in SMA, OR = 42, adjusted p-value<0.001); gastrointestinal (dysphagia: 12.9% in SMA, OR = 16, adjusted p-value<0.001); and skeletal (congenital anomalies of face and neck: 22.6% in SMA, OR = 5, adjusted p-value = 0.04).
Unsurprisingly, subjects in Group A also presented with early, but not definitive, signs of impending neuromuscular defects such as lack of coordination (67.7% in SMA, OR = 282, adjusted p-value<0.001), developmental delays and disorders (38.7% in SMA, OR = 39, adjusted p-value<0.001), and muscle weakness (19.4% in SMA, OR = 61, adjusted p-value<0.001).
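The adjusted p-values quoted here come from the Benjamini-Hochberg procedure named in the Methods. A minimal sketch of that adjustment follows. The function name and the example p-values are hypothetical, and the step-up form shown is the standard one rather than a transcription of the authors' script.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    # Walk from the largest p-value down, tracking the running minimum
    # of p * n / rank so that adjusted values stay monotone.
    order = sorted(range(n), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for k, idx in enumerate(order):
        rank = n - k  # rank of this p-value in ascending order
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four phenotypes.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [a < 0.05 for a in adj]  # FDR controlled at 5%
```

A phenotype is reported as differential when its adjusted value falls below 0.05, which controls the expected share of false discoveries among the reported phenotypes at 5%.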

Temporal trajectories of the onset of categorized phenotypes
We then grouped phenotypes from the pre-neuromuscular degeneration window with significant differences in rates of diagnoses between cohorts into categories representing independent physiological systems, as detailed in S5-S7 Tables. In addition to running the logistic regression on these broader categories, we established a general time course for the appearance of these non-neuromuscular components of the disease. Timelines of phenotypes categorized by individual physiological system are presented along with odds ratios and prevalence in Table 3. Timelines for phenotypes with at least 5% prevalence were visualized based on median date of onset compared to the date of the first sign of neuromuscular involvement in Fig 5. These data establish that all types of SMA, even the most severe, can involve non-neuromuscular tissues in the period prior to significant neuromuscular decline and present a potential temporal ordering.
Data for Group A (Fig 5A) show that the earliest detected phenotypes were skeletal (-142 days, 6.5% in SMA, OR = 42.2) and cardiovascular (-141 days, 29.0% in SMA, OR = 5.4), while the remaining phenotypes, namely neurological (-38 days, 71% in SMA, OR = 175.8), failure to achieve developmental milestones (-36 days, 64.5% in SMA, OR = 51.8), and muscular (-23 days, 35.5% in SMA, OR = 70.5), were found closer to the first signs of neuromuscular degeneration. Data for Group B (Fig 5B) show that metabolic defects (-526 days, 15% in SMA, OR = 4.2) were the earliest detectable phenotypes, followed by developmental (-292 days, 23.4% in SMA, OR = 11.8), gastrointestinal (-262 days, 23.4% in SMA, OR = 3.3), cardiovascular (-256 days, 12.1% in SMA, OR = 11.3), respiratory (-254 days, 7.5% in SMA, OR = 6.3), and skeletal (-220 days, 20.6% in SMA, OR = 10.2) phenotypes. These patients were typically diagnosed with neurological (-168 days, 33.6% in SMA, OR = 9.5), spinal/joint issues (-147 days, 10.3% in SMA, OR = 15.0), and musculoskeletal (-101 days, 12.1% in SMA, OR = 19.2) phenotypes only within the last 6 months before their first sign of neuromuscular dysfunction. Data from Group C (Fig 5C) highlight that the earliest phenotypes represented spinal/joint issues (-354 days, 55.8% in SMA, OR = 2.1), followed by gastrointestinal (-273 days, 11.3% in SMA, OR = 2.4) and male reproductive (-236 days, 6.8% in SMA [11.8% in males], OR = 2.81) issues. Muscular (-201 days, 28.5% in SMA, OR = 3.3) and neurological (-149 days, 16.9% in SMA, OR = 3.0) phenotypes were, again, the last to be diagnosed, being noted around 6 months prior to the first sign of neuromuscular involvement. The R² coefficients of determination for these results with enrollment duration were 0.21, 0.10, and 0.19 for Groups A, B, and C, respectively, indicating that little of the variation can be explained by enrollment duration.
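As a rough illustration of the enrollment-duration check, the sketch below computes R² for a simple linear fit of one variable on another, which equals the squared Pearson correlation. Treating the check as a one-predictor linear regression is an assumption, since the text does not spell out the model, and the function name and numbers are hypothetical.

```python
def r_squared(x, y):
    """Coefficient of determination for a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical values: enrollment duration (months) and a lead-time measure.
enrollment_months = [1, 2, 3, 4]
lead_measure = [1, -1, 1, -1]
r2 = r_squared(enrollment_months, lead_measure)
```

In this toy case R² is 0.2, so enrollment duration would account for about a fifth of the variation, in the same range as the values reported for the three Groups.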

Discussion
Growing evidence indicates that SMA is not solely a motor neuron disease. This evidence includes observations made using a severe SMA mouse model, post-SMA diagnosis clinical studies, and reports based on small numbers of patients. This is consistent with gene expression data showing that SMN is transcribed in all tissues and cellular data demonstrating that SMN plays a broad and important role in mRNA splicing in all cells. However, evidence is lacking that SMN deficiency directly translates into detectable non-neuromuscular phenotypes in humans, and, if so, at which times during disease progression. One reason for this is that historical SMA studies focused on patients after SMA diagnosis [33][34][35][36][37][38], when phenotypes can be masked by neuromuscular degeneration or characterized as downstream complications. Given the current and pending approvals of SMN-increasing therapies, understanding the complete pathophysiology of SMA is of critical importance in order to provide more comprehensive measures of treatment efficacy, support comparative effectiveness studies of local versus systemic interventions, and determine whether early clinically detectable signs exist that could be used to support pre-symptomatic intervention in milder patients.
In this study, we first revealed a broad spectrum of phenotypes that were differentially diagnosed in the SMA population compared to controls during the entire coverage window, including pre- and post-SMA diagnosis. We then focused on investigating SMA patient health prior to the first clinical sign of neuromuscular degeneration, when non-neuromuscular phenotypes are more likely to be caused independently by SMN deficiency. To our knowledge, this is the first large-scale study investigating pre-diagnostic SMA disease progression. Our study revealed numerous non-neuromuscular phenotypes in patients with varying severities of disease. Notably, peripheral phenotypes including those in the cardiovascular, gastrointestinal, metabolic, reproductive, and skeletal systems were detectable prior to any sign of neurological or muscular defects. Some of the phenotypes we detected in SMA patients have been reported in SMA mouse models despite their short survival time. For example, SMA mice have been demonstrated to have vascular defects that lead to necrosis of the tail and ears [39,40]. Our results indicate that SMA patients are also more likely to be diagnosed with vascular defects, such as peripheral vascular disease, chronic venous insufficiency, and chronic vascular insufficiency of the intestines. SMA mouse models have been reported to present with cardiac failure, remodeling, and septal defects [41][42][43], and our study found evidence of valve disorders, cardiomyopathies, septal defects, and premature beats spanning all three SMA severity Groups. Additionally, SMA model mice have reduced numbers of intestinal villi that are blunt and club-shaped with severe intramural edema [44]. This may correlate with our finding that all three Groups of SMA patients have diagnoses of gastrointestinal disorders and dysfunction. However, gastrointestinal phenotypes, such as dysphagia and constipation, diagnosed in patients could be a consequence of early muscle weakness.
One of the more surprising findings of our study is dysfunction in the male reproductive system of adult-onset SMA patients in Group C. This is not completely unprecedented, as it aligns with recent studies in an SMA mouse model showing infertility in males and developmental issues in their testes [7,8] and with an anecdotal human study of two subjects with atrophic testes [45]. If male reproductive dysfunction is confirmed to be a factor in this disease, then hormone markers such as testosterone may prove to function as biomarkers of disease severity that track with therapeutic efficacy. Our findings indicate that it may be possible to identify potential patients for genetic testing more effectively, before major neurological damage occurs. For example, symptoms that could motivate genetic testing for SMA in infants or adolescents may include a history of diagnoses for cardiovascular disorders, gastrointestinal problems, or skeletal deformities when a child fails to meet traditional developmental milestones. For the milder cases, genetic testing could be motivated through a combination of more common defects such as skeletal deformities (e.g., flat feet or scoliosis), constipation, disturbances in skin sensation, or irregular cardiovascular function occurring prior to common early signs of neurological issues such as spondylosis or degeneration of intervertebral disks. In men, another early sign could include genitourinary issues such as testicular hypofunction. However, these findings need validation by studies with different subjects and determination of whether sets of diagnoses translate into sufficient conversion rates to be considered actionable. Finally, it would also be valuable to run a prospective study of pre-symptomatic patients to gather information on the entire set of tissues that could be affected in particular individuals, as our results are based on more routine tests not developed for exploring the systemic nature of a disease. It may be that an unusual combination of affected tissues most effectively tracks pre-neuromuscular disease progression in later-onset SMA.
More broadly, our results indicate that SMA is not a pure motor neuron disease, since other tissues are implicated. Motor neuron dysfunction may not even be the earliest clinically detectable manifestation of SMN deficiency, even though this dysfunction is ultimately of the greatest clinical significance, particularly in children with early-onset SMA. Application of these findings may include the development of broader endpoints for SMA clinical trials, a framework for comparing efficacy of systemic versus local SMN-increasing therapeutics, and the provision of a multi-systems framework that can guide the search for non-neuromuscular biomarkers that may track patient disease before and during treatment more dynamically. In that regard, SMA may resemble other disorders involving changes in CNS tissue, such as Parkinson's disease, that are well known to have extra-CNS symptomatology [46]. Most importantly, studies like ours could highlight specific components of SMA that might require more targeted therapies.
There are limitations and potential pitfalls to our study. The first is that our findings are built on health insurance claims data, which were generated for billing purposes and not for research. However, insurance claims have been shown to be as accurate as electronic health records, a data source widely used in medical research [47]. Furthermore, the fact that healthcare is not universal in the United States, where the data were generated, may lead to biases in our findings. A recent review of healthcare coverage in the United States revealed that the majority of individuals are covered through commercial providers, with employer-based coverage being the most common, followed by Medicaid and Medicare. The review further showed a difference in uninsured rates by race, with non-Hispanic Whites being the most insured and Hispanics being the least [48]. A second potential pitfall is that many phenotypes, especially those not directly linked to billable events, are likely to have gone undiagnosed, thereby hindering our ability to properly identify neuromuscular involvement and also reducing the detected prevalence of other phenotypes. One potential consequence is that some patients may have been diagnosed with SMA prior to their coverage window. However, the required six-month window prior to detectable neuromuscular degeneration reduces the risk that we are missing a subject's first SMA diagnosis. Another potential challenge in our data interpretation is that, despite the alignment with mouse data, our non-neuromuscular findings could still be secondary to underlying neurological degeneration that may not have been symptomatic at that point. Finally, while this study is the largest SMA study to date and the only one focusing on subject health at points prior to SMA diagnosis, our numbers did not enable us to control for all possible confounding variables.
Even considering these potential limitations, we are confident in the results and hope they may motivate larger studies to validate and extend our findings in other pre-symptomatic cohorts.
In conclusion, this work presents a phenome-wide analysis of SMA progression, demonstrating its association with a range of neuromuscular and non-neuromuscular phenotypes. Given the extended lifespan of patients treated by SMN-increasing therapeutics, our data may provide insights into other physiological systems that will need dedicated care. Our temporal analysis indicates that many non-neuromuscular phenotypes are present prior to early manifestations of neuromuscular degeneration. This points not only to a primary relationship of these symptoms with SMN deficiency, but also towards the possibility of their use in predicting time to neuromuscular symptom onset for treatment initiation prior to irreversible nervous system damage in later-stage patients who may not have been screened at birth or who want to delay treatment while they remain asymptomatic.
Supporting information
S1 Table. Details of ICD9 codes used. List of the diagnosis codes used as inclusion and exclusion criteria for patient selection and for establishing critical time points in disease progression. These diagnosis codes include: SMA codes; any pregnancy codes or codes related to pregnancy complications; and codes related to any neuromuscular disease (MD, SMA, etc.) or evidence of major neuromuscular degeneration used to determine the neuromuscular inflection point. (XLSX)
S2 Table. Differential PheWAS codes for Group A from entire coverage analysis. PheWAS codes representing differential diagnoses from the entire coverage analysis for SMA patients in Group A are presented. Data include OR, prevalence in each cohort, and adjusted p-value. (XLSX)
S3 Table. Differential PheWAS codes for Group B from entire coverage analysis. PheWAS codes representing differential diagnoses from the entire coverage analysis for SMA patients in Group B are presented. Data include OR, prevalence in each cohort, and adjusted p-value. (XLSX)
S4 Table. Differential PheWAS codes for Group C from entire coverage analysis. PheWAS codes representing differential diagnoses from the entire coverage analysis for SMA patients in Group C are presented. Data include OR, prevalence in each cohort, and adjusted p-value. (XLSX)
S5 Table. Differential PheWAS codes for Group A from pre-neuromuscular degeneration. PheWAS codes representing differential diagnoses from the pre-neuromuscular degeneration analysis for SMA patients in Group A are presented. Data include OR, prevalence in each cohort, and adjusted p-value. (XLSX)
S6 Table. Differential PheWAS codes for Group B from pre-neuromuscular degeneration. PheWAS codes representing differential diagnoses from the pre-neuromuscular degeneration analysis for SMA patients in Group B are presented. Data include OR, prevalence in each cohort, and adjusted p-value. (XLSX)
S7 Table. Differential PheWAS codes for Group C from pre-neuromuscular degeneration. PheWAS codes representing differential diagnoses from the pre-neuromuscular degeneration analysis for SMA patients in Group C are presented. Data include OR, prevalence in each cohort, and adjusted p-value. (XLSX)

47. Kottke T.E., Baechler C.J. and Parker E.D., Accuracy of heart disease prevalence estimated from claims data compared with an electronic health record.