Mitochondrial Network Genes in the Skeletal Muscle of Amyotrophic Lateral Sclerosis Patients

Recent evidence suggested that muscle degeneration might lead and/or contribute to neurodegeneration, thus it possibly play a key role in the etiopathogenesis and progression of amyotrophic lateral sclerosis (ALS). To test this hypothesis, this study attempted to categorize functionally relevant genes within the genome-wide expression profile of human ALS skeletal muscle, using microarray technology and gene regulatory network analysis. The correlation network structures significantly change between patients and controls, indicating an increased inter-gene connection in patients compared to controls. The gene network observed in the ALS group seems to reflect the perturbation of muscle homeostasis and metabolic balance occurring in affected individuals. In particular, the network observed in the ALS muscles includes genes (PRKR1A, FOXO1, TRIM32, ACTN3, among others), whose functions connect the sarcomere integrity to mitochondrial oxidative metabolism. Overall, the analytical approach used in this study offer the possibility to observe higher levels of correlation (i.e. common expression trends) among genes, whose function seems to be aberrantly activated during the progression of muscle atrophy.


Introduction
Amyotrophic lateral sclerosis (ALS) is classically defined as a motor neuron disorder, characterized by the progressive degeneration of upper and lower motor neurons, leading to muscle atrophy and corresponding loss of muscle strength. ALS is eventually lethal within 3-to-5 years after the symptom onset, due to the involvement of respiratory muscles. Although muscle atrophy has been originally considered a secondary direct consequence of neurodegeneration, new pathogenetic hypotheses, during the last decade, have been suggesting a primary role for the events occurring at the post-synaptic site, i.e. in the skeletal muscle. Damage of the neuromuscular junction (NMJ), along with skeletal muscle degeneration and atrophy, indeed precede neuronal degeneration in the ALS SOD1 mouse model [1,2]. This supports the notion that muscle degeneration may lead and/or contribute to neurodegeneration and play a key role in the cause and/or progression of ALS [3,4].
Skeletal muscle of ALS patients and ALS mice presents severe atrophy [5] and has considerable mitochondrial disruption and dysfunction, indicated by the deficiency of key respiratory chain enzymes [6][7][8]. It was recently demonstrated that the disruption of the mitochondrial network genes enhances skeletal muscle atrophy programs [9,10]. This evidence indicates that a perturbation in the homeostasis of the molecular network involved in the maintenance of skeletal muscle mitochondrial biogenesis should occur during the pathogenesis and progression of ALS [10].
Though, the hypothesis of the skeletal muscle as the primary damaged site in ALS pathogenesis is still pending, due to the dearth of evidences obtained in human tissues, which represent the unique clinically-relevant model to study sporadic ALS.
As a tool to investigate the molecular scenario occurring at the post-synaptic site in ALS injured muscles, we used microarraybased genome-wide expression profiling. In order to obtain original hints toward the functional interpretation of the wide resulting dataset we used a system biology approach, based on gene regulatory networks, to analyze the changes in between gene expression correlation structure occurring in the affected tissue [11].
Microarray data analysis has been classically based on supervised statistical methods, enabling to compare gene-by-gene differential expression. This approach tends to consider genes as independent functional units, regardless of the multifactorial regulation of gene expression acting in biological systems [12][13][14]. In addition, given the high dimensionality of the microarray experiments compared to the number of tested samples/individuals, it can be severely biased by chance correlation effect [15]. On this regard, gene networks describe the connections existing between genes that are involved in the same biological process and are used to identify functional modules (i.e. subsets of genes that regulate each other with multiple interactions, but have few interactions with other genes outside the subset).
Therefore, the analytical strategy proposed in this study, allows identifying the expression signature of the atrophic skeletal muscle, based both on differential gene expression and on gene correlation networks.

Samples Collection
Ethics statement. The Ethics Committee of the Università Cattolica del Sacro Cuore, School of Medicine (Rome, Italy) approved this study. Written informed consent was obtained from all patients.
Patients and specimens. A case-control study was performed comparing 7 ALS patients with 7 age-and sex-matched healthy controls (table 1). The ALS sample was represented by 4 males and 3 females, aged 55-to-73; mean and median age was 64 in both the patients and the control group. ALS diagnosis was performed according to the revised El Escorial criteria. The control sample was selected among patients undergoing orthopedic surgery for traumatic injury and without positive history for muscle weakness, nor any neurological disorder. All patients were of Caucasian origin. Detailed clinical information of the individuals enrolled in the study is provided in table 1. A skeletal muscle specimen was collected through an open biopsy performed either in the deltoid muscle (4 ALS patients and all controls) or in the quadriceps (3 ALS patients).
The tissue specimens were stored in liquid nitrogen upon collection and subsequently used for RNA isolation.
RNA isolation and microarray analysis. Total RNA was isolated from frozen tissue specimens using pestel homogenization, TRIzol protocol (Invitrogen, Carlsbad, CA, USA), and further oncolumn purification as previously described [16]. The yield, quality and integrity of RNA were determined using the Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA, USA) as previously described [17]. The resulting total RNA was then used to create the biotin-labeled library to be hybridized on GeneChip Human Genome Focus Array (Affymetrix, Santa Clara, CA, USA), as already described elsewhere [18].

Microarray Data Analysis
The overall data analysis process involved to distinct level of analysis, according to the flowchart depicted in figure 1.
Gene expression data analysis -differentially expressed gene list. The CEL files resulting from the hybridization were analyzed using the oneChannelGUI 1.6.5 bioinformatics tool [19]. Gene-level calculation was performed by Robust Multichip Average [20] and normalization by quantile sketch [21,22]. A data table (rma), together with the relative cel files and relevant information about the experiment, is available at http://www. ncbi.nlm.nih.gov/geo/under accession #GSE 41414.
Quality control of samples was assessed by unsupervised multidimensional scaling (MDS) analysis on all probeset intensity values, in order to assess the segregation efficiency of samples. In the MDS analysis samples are positioned in a tridimensional space on the basis of first three principal components of variability [17].
To assess differential gene expression levels, an empirical Bayes method was employed [23]. The intensity values were filtered using an inter quantile range (IQR) = 0,25, to remove invariant probe-set on the basis of their distribution in the array under analysis.
A hierarchical linear modeling approach was then used to identify differentially expressed genes. This method is based on fitting a linear model to estimate the variability of the studied data. A Benjamini & Hochberg (BH)-adjusted p-value was calculated and a p#0.05 cut-off was set. The resulting gene list was then annotated according to the Gene Ontology (GO) database (www. geneontology.org). This allowed assigning a category to each gene in the list, according to three defined ''ontologies'' (i.e. terms representing gene product properties): cellular component, biological process and molecular function.
The expression of selected genes was quantified in real time PCR to obtain an independent validation of microarray data. Real time PCR was carried out as previously described elsewhere [24]. Principal component analysis -discriminant gene list. The discrimination has been performed using an a posteriori unsupervised approach relying on the application of principal component analysis (PCA) of the gene expression profiles, where the genes represent the statistical units and the patients are used as variables. This inversion leads to a better statistically conditioned approach, since genes largely outnumber patients [25]. Moreover this approach minimizes the variability in gene expression profiles due to tissue type. The PCA has been used to extract a list of genes that best discriminate the two groups (patients and controls), adopting a correlation-based approach, devoid of any overfitting/chance significance risk. Indeed PCA is an unsupervised method, which analyses the differences in gene expression profiles between patients and controls, related to the very small part of the data variability [13,25,26]. To this aim, the principal components were extracted from a matrix having genes as statistical units and patients as variables. Raw data from the entire gene set were used without any a priori selection. PCA projects the initial space spanned by the different samples into a new derived space whose axes (principal components, PCs) are each other orthogonal. This allows for a direct, unbiased normalization of the data field, where the 'shared variance' is accounted for by the first principal component. The minor components (from second component onward) keep trace of the relevant differences among samples. The analysis of such a small difference was performed in terms of factor loadings (FL, correlation coefficients between original variables and components) and scores.
The space of component loadings represents different individuals in terms of similarities in the gene expression space. Fls are the correlation coefficients between original variables and components; given in our analysis the variables correspond to the individuals, FLs allow for a quantification of the weight of each patient and healthy sample on each principal component.
The FLs were then analyzed by a linear discriminant analysis to find out whether and which FL allows for a separation of the data set into patients and controls.
The component scores represent the contribution of the observations (statistical units, that are the gene expression values in this case) to the principal component. Thus for each gene expression value we calculated a score, with the component having a relevant discriminant power. This procedure allowed for a biological association of components to groups of genes and thus permitted a biological interpretation of the obtained discrimination.
In the space of component scores each gene is defined, and it is possible to order genes on the basis of their scores with components endowed with a relevant discriminant power.
Correlation analysis. The correlation existing between two genes in the population, based on their expression trends, was calculated using the Pearson correlation coefficient.
The quantification of the correlation between genes can be used to analyze the link between any pair of genes. In this work we use this information both to construct the gene network and to analyze the link between selected genes. Network analysis. The 100 genes that best discriminate patients from controls, sorted according to the principal component scores, were then analyzed in terms of gene correlation network.
To this aim, the intergenic correlation structure was represented as a network, where the nodes were genes connected by edges. The genes were connected if the correlation coefficient between their expression values in the population was higher than a given threshold (i.e. the trend of expression values over the population is similar) [27].
The choice of the threshold is not trivial. Many biologically relevant connections could not be included in such network if the threshold is too high, while lowering the correlation threshold will significantly increase the number of potential links, including many random ones. The choice of the threshold has been made according to the results obtained from surrogate data analysis, which allows quantifying the connections detected by chance or due to noise [27]. Surrogate data effectively destroy any correlation between pair of gene expression values across the population, by randomly shuffling the gene expression values, i.e., by reassigning each gene expression value to a different individual. On the other hand, if there are no correlation, randomly shuffling the expression values across individuals will not alter the correlation value: any associations should still be small and attributable to chance. To this aim, individuals in the population are indexed from 1 to n. The data are shuffled by computing a random permutation of the indices 1,..., n and assigning the i th gene expression value to the individual whose index is given by the i th element of the permutation. The shuffled data were then analyzed in terms of number of connections, given a certain value of correlation threshold. To increase the robustness of the method, 1000 realizations of surrogate data have been generated. The threshold was set by computing the average number of resulting connections for the surrogate data. Particularly, given N genes, the number of potential connections is N*(N21)/2 (for 100 genes it results in 4950 potential connections). Based on these considerations, the threshold for the correlation coefficient was set so that the number of connection for surrogate data was at least 10% lower than that obtained for original data, in this case 0.95. The wiring pattern of the networks was set on the basis of their actual correlation coefficient, as computed over the entire population.
The global topological structure of the networks was analyzed by using the Network Workbench, a large-scale network analysis, modeling and visualization toolkit for biomedical, social science, and physics research (freely downloadable at http://nwb.cns.iu. edu).
Besides the visual inspection of the networks, the functional connection of the genes was quantified by computing the average degree (AD) of the gene network. The AD denotes the number of links that connect a node to the rest of the network and is calculated as AD = 2*C/N, where C is the number of connections and N is the number of genes [28].
Functional categorization. Both the gene expression list and the network-based gene list have been functionally categorized using the Database for Annotation, Visualization, and Integrated Discovery (DAVID; http://david.abcc.ncifcrf.gov/) [29]. The algorithms implemented in this software allow identifying overrepresented gene ontology (GO) terms with respect to the total number of genes assayed and annotated. To this aim, DAVID applies a modified Fisher exact test, to establish if the proportion of genes falling into an annotation category significantly differs from the background group of genes. In addition, this tool enables the fine mapping of genes within well-defined signaling and/or metabolic pathways, classified in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (www.genome.jp/kegg/). The KEGG mapping tool was employed for the functional categorization of the gene regulatory networks.
For this purpose, AffyGene IDs, corresponding to the genes in the selected list, were used as queries and the whole set of genes represented on the array was used as the background group. A false discovery rate (FDR) #0.05 was set.

Differentially Expressed Gene List
The MDS analysis based on the expression level of all the 8793 probesets spotted on the array was performed in order to evaluate the segregation of the sample groups. As illustrated in figure 2, the analysis showed the efficient segregation of ALS samples from control samples.
To identify differentially expressed genes, the dataset was filtered by IQR, which allowed obtaining 2478 transcript clusters out of 8793. Thereafter, an absolute log 2 -fold change $1 and a BH-corrected P-value #0.05 were used as cutoffs (figure 3). This allowed identifying 96 differentially expressed transcripts (table 1), including 16 downregulated and 80 upregulated genes in the ALS patient samples compared to controls.
According to the GO annotations, the gene list included functional categories related to the skeletal muscle structure and metabolism (table 2). In particular, three downregulated genes (namely PGAM2, FBP2 and ENO3, all muscle-specific) and an upregulated one (PFKP), encode enzymes involved in glycolysis and gluconeogenesis. Also, nine transcripts included in the gene list are active during skeletal muscle contraction and skeletal system development. These latter included: four distinct myosin chains (MYH3, MYH8, MYL3, MYL5), myogenin (MYOG), the myogenic factor 6 (MYF6) and collagens (COL1A1, COL1A2, COL3A1). In addition, the forkhead box O1 (FOXO1) gene, belonging to the FoxO family members present in skeletal muscle, and the cAMP-dependent protein kinase regulatory subunit RI1 alpha (PRKAR1A) were significantly up-regulated (table 2).
To further investigate the possible transcriptional interplay for FOXO1 and PRKAR1A the correlation existing between each of the two genes and all the other genes in the dataset was analyzed. The correlation coefficient between FOXO1 and PRKAR1A resulted statistically significant (0.89), suggesting that the expression trends of these two genes were closely correlated in the ALS group. In addition, the gene showing the highest degree of correlation with PRKAR1A in the ALS group was the Tripartite motif containing 32 (TRIM32) (correlation coefficient = 0.97). The same gene showed a lower but still significant correlation with FOXO1 (correlation coefficient = 0.85).
The results of quantitative real time PCR, performed to amplify selected genes from the gene list, confirmed the trends of gene expression obtained with microarray analysis (figure S1).

Biological Interpretation of the Differentially Expressed Gene List
The functional analysis of the gene list was accomplished using DAVID annotation tool, which allowed identifying the most significant biological functions in the data set (FDR #0.05). In particular, the biological processes were linked to: skeletal muscle development/contraction and response to organic substance, consistently with previous data [30]. The most significant cellular component and molecular function categories involved in this analysis were correlated to ''actin cytoskeleton'' and ''cytoskeletal protein binding'', respectively. The most represented 'biological pathway' categories were represented by focal adhesion, regulation of actin cytoskeleton and glycolysis/gluconeogenesis. Complete results are reported in table S1.

Discriminant Gene List
The PCA performed on the microarray dataset allowed categorizing the principal components (PCs) of data variability. Usually, the first PC (PC1) accounts for more than 97% of the total variability and represents the 'tissue type attractor' [14], i.e. the typical profile of the analyzed tissue. The vast part of variance explained by PC1 in this study confirmed the samples homogeneity, as a result of the careful sample selection carried out based on patients' clinical features ( Table 1).
The FL that further best discriminated between ALS patients and controls, corresponded to the second PC. Hence, the scores related to the PC2 were used to generate the list of most discriminant genes. The 100 genes with the highest FL scores (in absolute value) for the PC2 are showed in table 3, sorted in descending order. Common genes featured in both the differen-

Network Analysis of the Discriminant Genes
In order to go in depth into the mutual relations between the selected genes, the correlation structure connecting the 100 best discriminant genes was represented as a network, in which the nodes are genes connected by edges.
The graphical representation of the gene networks observed in the two experimental groups (figure 4), clearly showed that genes that were isolated in controls, formed a highly inter-connected sub-network in the ALS group. The correspondence between the numerical labels and the gene ontology annotation is displayed in table 3. For ALS patients, the AD of the network resulted to be 1.16. The network obtained from the normal group had many isolate nodes (83) and an AD as low as 0.2. The widest diseaserelated sub-network (formed by 22 genes) was mainly formed by mitochondrial genes, while the small sub-networks corresponded to ACTN3-and CHRNA1-correlated genes ( figure 4, table 3).

Biological Interpretation of the Networks
The DAVID-based functional analysis of genes connected in the ALS network, allowed identifying the over-represented functional categories in the list. Table 4 shows the categorization of the genes according to the GO terms. The top biological processes represented in the network (generation of precursor metabolites and energy; striated muscle contraction) were clearly connected to the muscle metabolic activity due to the contractile function (table 4). The molecular function categories indicated that genes interconnected in the ALS network shared a role in oxidative metabolism and in muscle structure definition. Finally, according to the cellular component ontology annotation of the network, a large number of networking genes in the ALS group encoded protein located in the intracellular compartment, either in the mitochondrion or in the myosin complex (table 4).

Discussion
This study attempted to provide a systemic view of the human ALS muscle expression profile, in terms of networks based on mutual correlations between intervening genes. A homogeneous patient group was recruited in this study, which comprised ALS patients with a comparable clinical background and disease stage. This homogeneity was reflected by the clear-cut segregation of patients and controls into two groups, based on MDS and PCA. This segregation also suggested that the gene expression profiles observed in the ALS group reflected the extent of skeletal muscle damage, rather than the site of muscle biopsy (either deltoid or quadriceps muscle). Pradat and colleagues previously analyzed the gene expression changes occurring in skeletal muscle from ALS patients in different stages of the disease [30]. Actually, 11 out of the 38 most significant transcripts described in that study were also present in our gene list. In particular we noted the upregulation of the myogenic factor 4 (MYOG) and the concomitant increase in the expression of the acetylcholine receptor subunits (CHRNA1), known to be under the transcriptional control of MYOG [31]. These genes were coherently upregulated in ALS muscles of both human patients and mice, reflecting the response of muscle to the increasing loss of innervation [30]. It is noteworthy that acetylcholine receptor expression has already been used for monitoring the effects of therapy on disease progression in the ALS mice [32].
Also, the small GTP-binding protein RRAD, involved in the modulation of cytoskeleton remodeling and inhibition of voltagegated calcium channel activity, and GADD45A, a cell cycle inhibitor involved in the apoptosis of atrophying muscle fibers, were upregulated in both the datasets presented in this study and in that by Pradat and colleagues. Moreover, ACTN3, HDAC4, PFKP, whose expression correlated with deltoid muscle injury in ALS patients [30], resulted significantly modulated in our data. In particular, ACTN3 is both the top down-regulated gene and the top-discriminating gene (i.e. the gene that best discriminates between ALS group and controls in the network analysis). On the whole, this independent reproduction of data on an additional sample set, indicates the robustness of the experimental design, along with a confirmation of the key features of the human ALS muscle gene expression profile.
In addition, this study provided original insights into the coordinated regulation of gene expression and its perturbation occurring in the human skeletal muscle of ALS patients, as a result    of the systems biology approach employed in the analysis of gene regulatory networks. This allowed hypothesizing previously unknown links between genes showing similar expression trends in ALS muscles compared to controls. In particular, the increased inter-gene connections, observed in the ALS group, should reflect the predicted universal effect of stress condition on biological systems, described by Gorban and colleagues [33]. The gene regulatory network, correlating genes that share common regulation within a biological system, tends to be constrained by the stressful event into a much more correlated model. Looking at the biological functions of the strongly connected genes of the ALS patients, a large number of mitochondrial genes belonging to the oxidative phosphorylation pathway were represented, along with two smaller networks including ACTN3 and CHRNA1. Distinct evidence has recently indicated that mitochondrial content, shape and function correlate with muscle wasting [9,34,35]. The gene networks observed in this study could confirm a sort of recognition of a 'crisis area' correspondent to mitochondrial activity, Z-line organization and CHRNA1 related cluster.Mitochondrial function, shape and number are closely related to muscle size and integrity. In particular, after denervation of the tibialis anterior muscle, genes typical of fast fibers were downregulated, whereas those typical of slow fibers were upregulated. These changes in gene expression appear to be coordinated in the direction of a fastto-slow transformation [36]. Consistently, the expression profiles of denervated muscles revealed the molecular signature of a reduced metabolic activity [37].
On this regard, Romanello and colleagues demonstrated that impaired mitochondrial function might activate signals that trigger muscle atrophy in mice, inducing a condition of energy unbalance through the AMP -activated protein kinase (AMPK) signaling [9].
Moreover, the connection between PRKAR1A and FOXO1, observed in this study, may provide some hints towards the delineation of the molecular events associated to human muscle atrophy. PRKAR1A encodes a regulatory subunit of the cAMPdependent protein kinase, which has been previously demonstrated to accumulate in the NMJ [38]. FOXO1 belongs to the forkhead family of transcription factors, playing a key role in the regulation of skeletal muscle mass. In particular, muscle-specific overexpression of FoxO1 is sufficient to cause skeletal muscle atrophy in vivo [39].
Also, TRIM32, a ubiquitin ligase, was highly correlated with both PRKAR1A and FOXO1 in this study. Recently Cohen and colleagues demonstrated that Trim32 ubiquitynates thin filaments (actin, tropomyosin, troponins) and Z-band components (aactinin), promoting their degradation, during fasting-induced atrophy in mice [40]. It is worth noticing that TRIM32 is not differentially expressed between ALS patients and controls, although being significantly connected with PRKAR1A and FOXO1 in this study. It has been previously shown that, despite its role, Trim32 expression does not increase in fasting-induced atrophy [41].
The gene regulatory network observed in the ALS group, included also a small subnetwork of downregulated genes: ACTN3, ENO3 and FBP2. ACTN3 is one of the four human This evidence supports the hypothesis that glyconeogenic enzymes in striated muscle form a metabolic complex on both sides of Z line.
On the whole, the results obtained in this study, supported by some of the most recent literature data, could pave the way to future targeted studies focusing on the functional link between genes involved in metabolic pathways and muscle contractility. Figure S1 qPCR validation. Real-time PCR validation. Results of real-time PCR on selected genes. Relative quantity values (RQ) obtained in qPCR are compared to fold changes obtained with microarray analysis (mA). Results for all genes are statistically significant (p,0.01).