A novel network analysis approach reveals DNA damage, oxidative stress and calcium/cAMP homeostasis-associated biomarkers in frontotemporal dementia

Frontotemporal Dementia (FTD) is the form of neurodegenerative dementia with the highest prevalence after Alzheimer’s disease, equally distributed in men and women. It includes several variants, generally characterized by behavioural instability and language impairments. Although few mendelian genes (MAPT, GRN, and C9orf72) have been associated to the FTD phenotype, in most cases there is only evidence of multiple risk loci with relatively small effect size. To date, there are no comprehensive studies describing FTD at molecular level, highlighting possible genetic interactions and signalling pathways at the origin FTD-associated neurodegeneration. In this study, we designed a broad FTD genetic interaction map of the Italian population, through a novel network-based approach modelled on the concepts of disease-relevance and interaction perturbation, combining Steiner tree search and Structural Equation Model (SEM) analysis. Our results show a strong connection between Calcium/cAMP metabolism, oxidative stress-induced Serine/Threonine kinases activation, and postsynaptic membrane potentiation, suggesting a possible combination of neuronal damage and loss of neuroprotection, leading to cell death. In our model, Calcium/cAMP homeostasis and energetic metabolism impairments are primary causes of loss of neuroprotection and neural cell damage, respectively. Secondly, the altered postsynaptic membrane potentiation, due to the activation of stress-induced Serine/Threonine kinases, leads to neurodegeneration. Our study investigates the molecular underpinnings of these processes, evidencing key genes and gene interactions that may account for a significant fraction of unexplained FTD aetiology. We emphasized the key molecular actors in these processes, proposing them as novel FTD biomarkers that could be crucial for further epidemiological and molecular studies.


Introduction
Frontotemporal Dementia (FTD), also known as Frontotemporal Lobar Degeneration (FTLD), is a neurodegenerative disorder characterised by deficit of executive functions, language impairment and behavioural disturbances [1,2]. FTD is considered the most common neurodegenerative dementia in the young adulthood along with Alzheimer Disease, and it is often familial (25-50%), usually with an autosomal dominant pattern of inheritance [3]. However, even though mendelian genetic determinants have been identified, such as microtubule associated protein tau (MAPT), Granulin (GRN) or C9orf72, in most of the cases no causative genes are recognized. For such cases, multiple loci and genes appear to influence FTD risk with rather small effect size [4].
Genome-wide association study (GWAS) is a powerful approach to evaluate the genetic components of human complex disorders including FTD [5][6][7]. The GWAS filtering strategy involves the evaluation of individual markers with the use of a genome-wide significance threshold p-value of 5 Ã 10 −8 under the assumption of independence among markers. This approach minimizes false discoveries and was effective in uncovering multiple Single Nucleotide Polymorphisms (SNPs) associated with complex diseases and traits. However, the published GWASs for FTD relied on analyses at the SNP level with few reproducible and genomewide significant findings, in independent samples [8]. To overcome SNP-based analysis limitations, gene-set analysis [9] has been proposed to examine groups of functionally related SNPs, grouped according to the corresponding gene locus. Gene-based tests range from the simple computation of overrepresentation of associated loci in annotation databases, including Gene Ontology (GO) biological processes (GO:BP) [10], KEGG [11] and Reactome [12] pathways up to the use of interaction networks and searches for subnetworks (modules) enriched with the associated genes [13][14][15][16][17][18][19][20][21][22][23][24][25][26][27].
Network-based models are more powerful than other methods, since they can simultaneously include different biological interactions, enabling a topology-aware gene prioritization and testing at different complexity scales, including single connections, pathways, and communities [13][14][15][16][17][18]. Moreover, mapping genetic data onto a reference interactome provides a straightforward and meaningful way to model gene context [28,29]. Typically, in networkbased approaches, the context of a gene is thought to determine at least part of its properties. According to the so-called guilt-by-association rule, genes with a related function tend to be proximal in the interactome topology and share common profile patterns (e.g. gene expression levels). However, this may lead to misleading conclusions, especially for GWAS data, in which genetic variability may have subtle and unpredictable functional effects, close to statistical "noise" [8,19,29]. Three main reasons can be addressed through guilt-by-association underperformance: (i) the presence of hubs, being over-represented because attracting a large fraction of network interactions, (ii) the substantial topological difference between interaction networks and the directed acyclic graphs used to represent functional information in bioontologies, and (iii) compositional interactome biases due to well-studied groups of genes that may cause artificial over-representation of their associated terms. By contrast, a recent study [29] suggests that few biologically-critical interactions may account for a large fraction of the functional information content of the entire network, that do not necessarily involve hubs.
In the present work, we applied a novel network-based method that combines the principles of connectivity significance [13][14][15][16][17], through the concept of node and edge perturbation [18]. Genetic information encoded in the GWAS data was used to select seeds and weight the differential gene-gene co-variation, to reduce the network to its essential parts, removing non-informative noise and misleading nodes. To this end, we performed Steiner tree search [30][31][32][33], combined to Structural Equation Model (SEM) analysis [34][35][36]. Furthermore, we know that complex disorders tend to form specific sub-network structures, corresponding to groups of perturbed disease-associated genes [13][14][15][16][17][18]28,29]. We considered module sub-structure to characterize disease local perturbation, based on significant genetic differences between cases and controls. Through this molecular network framework, we built a metabolism-based FTD pathogenesis theory that can be summarized in two main steps: (i) oxidative damage accompanied by loss of neuroprotection, and (ii) abnormal neuronal activity and neurodegeneration. In the following discussion, we will examine the molecular bases of both processes, proposing a root for further molecular investigations in FTD aetiology. The goal of our pipeline, shown in Fig 1, was to extend the list of FTD risk genes uncovering the functional network underlying sporadic (i.e., non-mendelian) variability associated with this complex disorder.
for the purposes of this study. Each study site obtained approval from a local ethics committee (UK ethics committee number 10/H0716/3) or institutional research ethics board.
Control data. Control data [37] presented in this work are part of the HYPERGENES Project (European Network for Genetic-Epidemiological Studies; www.hypergenes.eu). Institutional review boards at each collection site approved the study and all individuals gave their informed consent. A further Ethical Revision of the University of Milano and of the HYPER-GENES Internal Ethical Steering Board approved the entire process.

Input data
Our network-based analysis relies on three inputs: (i) quantitative data, (ii) a set of seed genes, and (iii) a reference interactome.
The GWAS data used as input in the current work were generated in [4]. There the features of the study population were described in detail. For the cases, genotyping data of DNA samples diagnosed with FTD were available from the International FTD-GWAS data set for 634 samples obtained from 8 Italian research centers. After quality check (QC) steps 530 patients diagnosed with bvFTD (n = 418), semantic variant PPA (n = 27), agrammatic variant PPA (n = 61), and FTD-MND (n = 23) survived. All cases were diagnosed according to the Neary criteria [38] and/or the more recent Rascovsky and Gorno-Tempini criteria [1,2]. Age of onset was 64.1 ± 20.7 years (Mean ± SD); range: 29.0-87.0), including 287 (54.2%) women. The cases were collected and genotyped at the University College London by means of Illumina human 660K-Quad Beadchips assayed on the Illumina Infinium platform (Illumina, San Diego, CA, USA).
For the controls, genotyping data were obtained from the HYPERGENES project (European Network for Genetic-Epidemiological Studies; www.hypergenes.eu) [37]. All participants in the sample set (n = 1327; 926 after QC) had no abnormal findings on physical and neurological examination, were unrelated, collected in Italy, and of Caucasian ancestry. Age was 58.2 ± 6.1 years mean (±SD); range, 50.0-97.0), including 349 (37.7%) women. The control samples were genotyped at the University of Milan, using the Illumina 1M-duo array.
Quantitative data was generated by applying supervised Principal Component Analysis (sPCA) [39] over additive-encoded genotypes (0 for the frequent homozygote genotype, 1 for the heterozygote, and 2 for the rare homozygote genotype), and taking the first principal-component score (PC1) for each gene, as reported in our previous study [4]. Briefly, polymorphisms were grouped by gene membership, where the gene region is defined by its locus coordinates ±5000 base pairs. A single gene was then represented as a continuous score by taking the first supervised PC1 on a subset of SNPs selected using outcome (case-control) information, i.e. supervised PC1 was a weighted sum of the SNPs within the gene region that maximize the variance of the gene score, which then varies with the outcome.
Seeds are genes of interest, that can arbitrarily be chosen using existing knowledge (e.g. from databases such as OMIM, DisGeNET, or by text mining), and/or experimental data analysis (e.g. GWAS, expression data, or DNA-binding data). We used the list of FTD-associated genes from our previous SNP-to gene approach [4], applying an FDR threshold < 10%, yielding 280 putatively FTD-associated genes. These genes were referred to as the FTD-seeds.
An interactome is defined as the ensemble of known interactions among biological entities (e.g, genes, proteins, or metabolites) in a given organism, usually represented as a network. The interactome can be retrieved from many different sources, including KEGG  . We selected KEGG, for different biological and computational reasons, including annotation curation and completeness, and GO biological process mappability for functional annotation.
Furthermore, KEGG is a directed interactome, thus causality can be easily interpreted in terms of biological signalling, and can be used for the FTD model validation and extension. However, edge direction is not a necessary requirement in our method. We retrieved KEGG signalling pathways and merged them to obtain a unique network using graphite [45] R software package.

Network edge weighting
The input interactome was converted into a weighted network, endowed with node and edge weights reflecting their perturbation status. Genes (nodes) were weighted as being FTD-seed (weight = 1) and non-seed (weight = 0). Gene-gene interactions (edges) were weighted based on the case/control statistical difference. In our notation, j and k represented any two connected nodes of the network, with j->k being the edge direction. In general, we tested if the total difference between case vs. control groups for gene k through gene j was significant, given data. This implied testing the group change at the same time on gene j, gene k and their direct link j->k. The p-value was yielded by a t-test on the combined difference of the group over the node j, the node k, and their direct connection j->k, fitting a trivariate (X = j-th gene, Then, the edge weights were defined as inverse of negative logarithm of the p-values, w = 1/-log(p-value). In this way, edges with lower p-values had lower weights, on a positive continuous range. Intuitively, this weight can be assumed as the perturbance acting on the relationship between two connected genes in the interactome, due to the genotype difference between groups. The lower the p-value (i.e. the weight), the higher the perturbance. In general, we defined the perturbance over a node k, due to the action of a node j, as the altered status of j and k genes, and their j->k interaction in the diseased sample, comparatively to healthy controls.

The Steiner tree problem
The FTD seeds were mapped to the weighted interactome, and a FTD-related sub-network was constructed by adding new genes to connect FTD genes solving a Steiner tree problem [30][31][32][33], minimizing the sum of weights of every edge in the sub-graph. We applied a modified shortest path heuristic (SPH) distance algorithm, from Kou et al. solution [33], implemented in our subnet() R function. Our algorithm selected outgoing shortest paths combing the edge weights by Fisher's method and testing the statistical significance (p < 0.05) with multiple comparison Bonferroni correction. The resulting Steiner tree, corresponding to the maximum-perturbance sub-graph, preserves the original directed edges. Therefore, we distinguish three types of nodes: "sources" (emitting perturbance) with no incoming connections, "targets" (absorbing perturbance) with no outgoing connections, and "connectors" (transmitting perturbance) with incoming and outgoing connections. We referred to a perturbation route, as a perturbed path originating from a source node, traversing a number of connectors, and terminating in a target node. This tree was used as a backbone for the subsequent augmenting SEM step.

Structural Equation Model (SEM) analysis
The sub-graph obtained as a Steiner tree was converted into a SEM [34-36], such that every node in the sub-network corresponds to a variable of the SEM, and every edge is a relationship between variables. In summary, a group node (C = {0, 1}) connected to each gene was added inside the Steiner tree, and the overall sub-graph was converted into a system of linear equations, and then fitted using PC1-trasformed SNP genotype data. The system of linear equations has the form: with a covariance structure: where V(x) and V(y) are, respectively, the sets of the exogenous variables (i.e. source genes) and endogenous variables (i.e, connector plus target genes) in the sub-network. The linear equations define the relationships between the variable Y j with the group variable C and variables Y k in the "parents" set, pa (j), quantified by path coefficients β jC and β jk , respectively. Unobserved variables U j represent the variation of each Y j not explained by its parent nodes. The covariance structure describes the bi-directed relationships between variables in the "siblings" set, sib (j), quantified by covariances ψ jk , and interpreted as unmeasured common causes for pairs of variables. SEM analysis allowed to evaluate: (i) the overall goodness of fit of alternative constrains in the models, (ii) the significance of the group difference on every gene, and (iii) how significant the covariance between target nodes (i.e., genes that are not connected by a directed path) was, added to improved SEM goodness of fit. The significant new edges added in (iii), called "ancestral bow-free" covariances in SEM literature closed the directed paths into circuits characterized by the presence of signaling sources and targets.
Minimun AIC (Akaike Information Criterion) score, and a standardized root mean squared residual (SRMR) less than 0.05 were considered for model selection and overall good model fitting, respectively [34]. The statistical significance of the SEM parameters (the regression coefficients and the ancestral bow-free covariances) were estimated by Maximum Likelihood Estimation (MLE) and the beta coefficients were evaluated through t-test with bootstrap standard error (SE) with B = 1000 resamplings, and the significance level established at p < 0.05, two-sided.
After Fisher's normalizing t-transformation of the covariance, ancestral bow-free covariances, with abs(t) > 2 were selected. Successively, the selected covariances were tested by a SEM with Latent Variables (LVs) [47] using a model in which two target genes are connected through a LV modelling the underlying common unknown cause(s) acting on them. Every LV is subjected to group (C = {case, control}) effect. In this context, a LV is a variable not present in the initial Steiner tree, introduced to capture the differential case/control co-variance between two disconnected target nodes. The goodness of fit of each LV model was evaluated through a Likelihood Ratio Test (LRT), where a good fit corresponds to a p-value > 0.05 (i.e., there is no significant difference between sample and model covariance matrices). The influence of the group C over each LV was evaluated by a t-test (= MLE/bootstrap SE) with pvalue < 0.05, two sided. Every step of the SEM analysis was performed calling the lavaan

Functional analysis
After model building SEM assessment and model extension with LVs underling covariances between pairs of leaf genes, we proceeded with the biological evaluation. To improve the FTD model interpretation, we initially focused on nodes with specific topological properties (e.g., those connecting different portions of the gene network). We computed various topological indices, including in/out degree, i.e. the number of incoming/outgoing connections for a node, and weighted node betweenness [28], i.e. the number of weighted shortest paths travelling through a node in the network, by means of perturbance. Using these topological indices, we identified four types of nodes: (i) "springs", i.e. source nodes with high out-degree. (ii) "sinks", i.e. target nodes with high in-degree; (iii) "hubs", i.e. connector nodes with high in/ out-degree, and (iv) "bottlenecks", i.e. nodes with high weighted betweenness. We also defined as secondary sources many direct targets of proper sources, with in-degree 1 and out-degree 1, representing ligand-receptor interactions.
We focused on the concept of "essential" node, by combining definitions (iii) and (iv) as critical nodes for the FTD-network integrity, and part of the network backbone. These nodes carry the majority of perturbed information flow through the FTD network, and therefore we expect them to be assumed as suitable novel FTD risk factors.
Annotation enrichment over the main biomedical ontologies was used to functionally validate and extend our biological interpretation of the FTD model. Enrichment analysis was performed using a hypergeometric test with Bonferroni correction (adjusted p-value < 0.05), over GO

The FTD sub-network
We first generated the SNPs-to-Genes data set, taking the first principal-component score (PC1) for each gene obtained by applying supervised Principal Component Analysis over additive-encoded genotypes (see Materials and Methods). We then used the 280 FTD-associated genes from our previous study [4] as seed list, applying a False Discovery Rate (FDR) threshold < 10%. We also compiled the reference human interactome by merging KEGG signalling pathways [11]. After mapping PC1 quantitative data (13971 genes) over the KEGG database, our interactome was made of 4073 nodes and 34606 weighted edges (23075 directed and 11530 bidirected), with a median vertex degree of 4 and average directed shortest path distance equal to 5.464. Finally, gene-gene interactions (i.e. edges) were weighted by combining the effects of the group variable (i.e. C = {0, 1}, representing controls and cases, respectively) over pairs of nodes, and their direct connection (see Materials and Methods).
The interactome contained 83 out of 280 FTD seeds, with a median vertex degree of 7 and average directed shortest path distance equal to 4.396. Notably, a higher vertex degree in combination with a shorter weighted shortest path for the mapped FTD seeds showed that FTDassociated genes are more clustered than the rest of the interactome. To assess the significance of this finding we tested if the mapped FTD-seeds were more connected than a random selection of 83 genes from the interactome (B = 1000 samples) or a node permutation of the interactome (B = 1000 permutations). Results showed that there was a significantly higher degree (p = 0.034 and p = 0.029 for randomization and permutation tests, respectively) and a significantly shorter average shortest path (p < 0.001 for both tests) for the FTD genes.
We applied the Steiner tree algorithm to the mapped FTD seeds, detecting an initial set of 3854 shortest paths between the 83 seeds. The final FTD Steiner tree (Fig 2), including only connector genes traversed by perturbed paths, contained 167 nodes (77 out of 83 FTD seeds) and 166 edges (117 of which significantly perturbed, p < 0.05). The Steiner's tree topology revealed the presence of: (i) 43 sources, of which 33 FTD seeds; (ii) 49 (38 FTD seeds) targets; and (iii) 75 (6 FTD seeds) connectors. The Steiner's tree backbone, defining the mainstream perturbation route, was defined by the genes CAMK2A-EP300-TCF7L2, bifurcating to EGFR-CTNNB1 and MAPK8-JUN (Fig 2). SEM analysis of the Steiner tree was performed by fitting various models with different constraints on their parameters; the selected model, with fixed beta coefficients, equal variances, and bow-free covariances, yielded the lowest AIC score (AIC = -9324) and good fitting (SRMR = 0.026; see Table 1).

Topological analysis and essential nodes
To drive and improve our biological interpretation of the FTD network, we initially focused on nodes with specific topological properties, their function, and their possible implication in neurodegeneration. In particular, we searched for those genes in the FTD model that connect different perturbed routes throughout the network. Hereafter, we will use "essential nodes" to designate a class of genes that share a high number of perturbed connections (especially outgoing ones), thus bearing the largest amount of perturbed information spreading through the FTD network. This definition differs from the usual definition of hub, since it takes into account both the number of perturbed connections and perturbed shortest paths traversing a given node (Fig 2 and S1 Fig). Many of these are terminal seed-genes (i.e. peripheral nodes), designed as FTD-associated in our previous analysis [4] and confirmed as perturbed in the present study (i.e. green nodes in Fig 2 and S1 Fig). Besides seeds, we also identified additional FTD-associated genes (blue and red nodes in Fig 2 and S1 Fig), using SEM testing. However, also non-perturbed genes (i.e. not carrying FTD-associated variants) are functionally altered if they are significantly influenced by genes carrying disease-associated variants (i.e. they have perturbed incoming connections). Based on the perturbation routes, we could define topologically-critical FTD-associated genes.
A set of connectors bind sources to targets. These genes can both receive and send perturbed interactions. Steiner's connectors are genes with generally high centrality. They may have high degree (i.e. hubs, with degree > 3), such as MAPK8, and/or high betweenness (i.e. they are important for connecting network modules), such as TCF7L2. Connectors are key genes revealing the set of critical trait-specific molecular processes for the cell. Among all nodes in the FTD-module, the most central ones are designed as essential. Essential nodes are those genes that cannot be removed without a deep impact on network connectivity. Considering that the FTD network has been generated based on phenotype-associated genotypic variability, essential nodes should be also functionally critical for the FTD phenotype. To point out the most central connectors, we extracted an essential high weighted-betweenness sub-network (Fig 3). Essentiality combines the concepts of biological process perturbation with the importance for network structural integrity. The essential sub-network defines the FTD-network backbone, characterizing the most perturbed interactions. They include: (i) the EGFR-PLCB3 interaction, (ii) the CAMK2A-EP300-TCF7L2 perturbed path, and (iii) the PRKACG-MAPK8 interaction. The essential backbone highlight the presence of nodes with exceptionally    Oxidative stress and calcium/cAMP homeostasis-associated biomarkers in frontotemporal dementia highest incoming perturbance level (Fig 4). This is particularly interesting as this gene might be responsible of Ca +2 homeostasis impairments in FTD (see Discussion). Remarkably, PLCB3-specific LVs (LV6-9) connect this gene to four FTD-relevant targets: (i) GABRG3, a perturbed/seed GABA receptor, the major inhibitory neurotransmitter in mammalian brain;   Calcium pathway. Furthermore, Disease Ontology (DO) data showed enrichment for nervous system and mental health-related terms, including: central nervous system disease, cognitive disorder, psychotic disorder, and neurodegenerative disease (S5 Fig).
In our model perturbance is a measure of edge perturbation significance deriving from a synthesis of genetic variability (PC1) and phenotype (group) effects, considering the whole network architecture (through SEM). To assess the association between perturbance and FTDrelated term enrichment, we used existing annotations to map our network onto FTD-associated KEGG and DO annotations. The KEGG database reports 6 pathways related to FTD (KEGG ID: H00078), including: hsa04010 (MAPK signaling pathway), hsa04141 (Protein processing in the endoplasmic reticulum), hsa04144 (Endocytosis), hsa04310 (WNT signaling pathway), hsa04330 (NOTCH signaling pathway), and hsa04722 (Neurotrophin signaling pathway). As shown in S6 Fig, these pathways encompass the fully-perturbed backbone of our network, suggesting that our method can capture the core FTD-associated variability and extend it through the data-driven concept of edge perturbation. Notably, the FTD-network backbone contains the top-perturbed connections. To have a precise measure of the perturbance content we measured the proportion of perturbed connections in the KEGG FTD-specific sub-network comparing it to the total FTD-network, excluding LVs (i.e. the basal perturbance). The basal perturbance proportion is expected to be already high, since the whole disease-network is associated with FTD: 117/166 (= 70%) significantly perturbed edges. In the KEGG FTD-specific sub-network this percentage is even higher: 40/46 (= 87%). We repeated this measure using DO terms included in the Nervous System Disease (DOID:863) and Disease of Mental Health (DOID:150) ontology roots, obtaining the DO Nervous System-specific sub-network (S7 Fig). Also in this case, the sub-network perturbance proportion is higher than the basal one: 40/52 (= 77%).

Biological relevance of the FTD model
In the present study, we investigated whether FTD-associated genetic variability drives the identification of perturbed functional sub-networks, indicating likely impacted biological processes in frontotemporal lobar degeneration. The core structure of the essential FTD-network (Fig 3) highlights the central role of four biological processes in FTD: (i) WNT/Ca 2+ signaling, (ii) response to oxygen-containing compounds (ROCC), (iii) postsynaptic plasticity and Long Term Potentiation (LTP), and (iv) MAPK-JNK signaling. These functions encompass the entire set of genes and processes defining our theory. As a follow-up to our previous GObased gene-set analysis [4], several processes appeared replicated, these being neurodevelopment-related processes, Ca 2+ /K + channel regulation and LTP. In addition, in the current study we also found evidence for an implication of oxidative stress and DNA damage, coupled with impairments in the energetic and Ca 2+ /cAMP metabolism, involved in synaptic plasticity and neuroprotection. Particularly, the notion of DNA damage, which is associated with neuronal homeostasis and longevity, appears to gain momentum in the landscape of FTD pathogenesis [81,82,101,102].
Topologically, the FTD-network can be divided into in two main modules: (i) the noncanonical WNT/Ca2+ signaling (ncWNT)-associated module (Fig 4), and (ii) the MAPK/ JNK-associated module (Fig 5), including PRKACG-MAPK8, AKT3, and CREB3L2, and their sink JUN. The interface between the two sub-modules is represented by the TCF7L2-JUN-MAPK8 interaction (Figs 2 and 3), depicting an overall disruption of the DAG-IP3/Ca2 +/cAMP homeostasis and energetic metabolism, leading to the induction of several oxidative stress-responsive Serine/Threonine (Ser/Thr) kinases and their related targets, including members of the CREB family, protein phosphatases, junction proteins, K + cannels, and members of the FoxO signalling pathway. These critical Ser/Thr kinases include: (a) the Ca 2 + -dependent CAMK2A, PRKCG, PRKACG, PRKACB, and PRKG1; (b) the MAP-kinases MAPK8 and MAPK11; and (c) the AKT-kinase AKT3. Every Ser/Thr kinase has a specific influence over a group of functionally-related nodes, converging towards common target nodes. Specifically, we may define two high-degree sinks (Fig 3): CTNNB1 and JUN. Given these premises, we hypothesized that oxidative stress and DNA damage generated in response to energetic metabolism and Ca 2+ /K + homeostasis impairments, could be among the major potential molecular underpinnings of FTD pathogenesis.

The non-canonical WNT/Ca 2+ module
A large portion of our model involves the ncWNT pathway, converging to the sink CTNNB1-CREBBP, and targeted by a series of perturbed receptors: ITPR1-CAMK2A, WNT3-FZD4, FGF11-EGFR, and THRB. In the landscape of WNT signalling, the WNT3-FZD4 interaction might be a novel susceptible functional element in FTD. FZDs are G Protein-Coupled Receptors (GPCRs), involved in downstream DAG-IP3/Ca 2+ /cAMP signalling. WNT/FZD-mediated Phospholipases C (PLCs) activation leads to diacylglycerol (DAG) and inositol triphosphate (IP3) production, stimulating intracellular Ca 2+ release. Calcium release activates CAMK2A, inhibiting the GSK3 kinase and causing nuclear accumulation of beta-catenin (CTNNB1) and TCF-mediated gene regulation . Our model suggests that the disruption of this mechanism may involve CAMK2A directly, as a perturbed (i.e. disease-variant carrying) gene, and through its interaction with EP300-TCF7L2-CTNNB1. Beta-catenin has been implicated in many forms of neurological and cognitive impairments, including the Autism Spectrum Disorder, cell adhesion impairments, dendritic branching, and Long-Term Potentiation (LTP) [103]. Therefore, mutations in the CTNNB1 gene could lead to altered transcriptional activity and impaired synaptic plasticity that may result in brain malformation, intellectual disability, and neuronal loss [104].
TCF7L2 (a perturbed/seed node) is a beta catenin-interacting transcription factor [63,92] whose involvement in WNT signalling is well documented, although its physiological role in adult brain is still unclear [63]. This supports previous reports about calcium-dependent activation of CAMK2A and its implication in neurodevelopment, transcriptional regulation, cell fate determination [54,57, [93][94][95][96], and neurodegeneration [70]. Furthermore, both WNT3-FZD4 and FGF11-EGFR interactions reveal a key role of the PLCs PLCB1, PLCB2, and PLCB3 within our FTD-network (Fig 4). FTD-associated variants in PLCs may lead to improper intracellular Ca 2+ levels and thus dysregulation of the Ser/Thr kinases controlling the downstream MAPK-JNK signaling. Previous studies on PLCB1 showed an association between its depletion and early-onset epileptic encephalopathy [88], suggesting its involvement in learning impairments and neurodegeneration. PLCs activity was recently associated with the activation of the repair factor PARP1. Neuronal DNA is constantly exposed to ROS due to the intense mitochondrial activity in response to the high energy demand required by the Central Nervous System (CNS). This massive exposure to ROS causes the accumulation of single-strand breaks (SSBs) that remain unresolved, despite the presence of repair mechanisms [105]. In particular, aged cerebral neurons showed a group of genes targeted of PARP1-bound SSBs that are implicated in synaptic plasticity and long-term memory.
The second CAMK2-secific route includes EP300, CPT1B, and ACSBG1 (Fig 2). These nodes are involved in fatty acid metabolism and all of them can be found in mitochondria [106][107][108]. Moreover, CPT1B has been recently associated with behavioural disorders characterizing post-traumatic stress both in human and rodent models [107], and ACSBG1 participate in myelinogenesis [108].
The third CAMK2-secific perturbed route is represented by the central network axis CAMK2A-EP300-TCF7L2, showing a bifurcation through CTNNB1 and JUN (Fig 2). We also identified several important junction proteins, such as Ca 2+ , IP3, and cAMP transporters, highlighting the perturbed status of both ncWNT and MAPK signalling pathways. The interaction PRKCG-GJD2-TJP1 is one of the perturbation sources acting on CTNNB1. PRKCG is a Ca 2+ -activated Ser/Thr protein-kinase C (PKC), which mutations are known to be associated with spinocerebellar ataxia, characterized by cognitive impairment, tremor, and sensory loss [109]. Enrichment analysis showed how this PKC is involved in ROCC and LTP. Our model highlights GJD2 and TJP1 as possible perturbed PRKCG interactors. Former evidences demonstrated a strong association of GJD2 and TJP1 with schizophrenia [110].

Stress-induced Ser/Thr-kinases and neural cell death
The largest sink in our model is JUN, and its largest incoming hub is MAPK8 (aka JNK1). The second module of our FTD-network is centred on these two components of the c-Jun N-terminal kinase (JNK) pathway (Fig 5). The two mostly perturbed interactions acting on JUN in the entire FTD-network are PRKG1-CREB3L2 (aka BBF2H7), and PRKACG-MAPK8; together with the ncWNT-related route CAMK2A-EP300-TCF7L2.
The cGMP-activated Ser/Thr-kinase PRKG1 is a key mediator of the nitric oxide (NO)/ cGMP signalling pathway, involved in LTP and neuron branching [111]. CREB3L2, a PRKG1 target in the cGMP signalling pathway, is perturbed in our FTD-model. Remarkably, it has been demonstrated that CREB3L2, whose expression is induced by endoplasmic reticulum (ER) stress, is involved in preventing the accumulation of unfolded proteins in normal damaged neurons, and protecting neuroblastoma cells from ER-stress induced cell death [112].
On the other hand, PRKACG and MAPK8 are the FTD-connectors with the largest perturbed outgoing connectivity, and sharing one of the mostly perturbed interaction, indicating that they may play a critical role in FTD aetiology. MAPK8 is a key Ser/Thr kinase capable of phosphorylating JUN and prompting stress-induced cell apoptosis by phosphorylating other transcription factors, including TP53 [74][75][76][77][78], a direct MAPK8 target. Moreover, our model evidenced a large variety of potentially FTD-associated pro-apoptotic stress-responsive MAPK8 perturbed interactions and targets. The main target is the disease-variant carrying cytokine TNF, sharing with MAPK8 the second-largest perturbed interaction of the model (after MAPK8-PRKACG). Other important targets include: the Ser/Thr kinase MAPK11, the transcription factor FOXO1, the adapter protein CRK, the interleukin IL12B, the Ca 2+ -dependent phospholipase PLA2G4B, and the cytidylyltransferase PCYT1A. Conversely, MAPK8 is influenced by few perturbed interactions, among which the most important is PRKACG. The latter is the most degree-central Ser/Thr kinase in the FTD-model after MAPK8, and part of the perturbed route MAPK8-PRKACG-GRIN2B-PRKACB (Fig 5). PRKACG is a protein kinase A (PKA) involved in lipid and glucose metabolism, immune system response, and G1-checkpoint response during cell cycle. It can be dysregulated in response to different kind of stresses, including metabolic stress, DNA damage and cancer [20,78,93,100]. Alongside PRKACG, we found other protein kinases, NMDA receptors, and GPCRs that are involved in stress response, (neuronal) apoptosis, and critical neurological functions, such as LTP, learning, and behaviour [69,70,80,81,[91][92][93][94][95]99,100]. They include: PRKACB, PRKCG, PRKG1, AKT3, PIK3R1, PIK3CA, GRIN2B, together with the EGFR, CAMK2A, and FZD4. PRKACG and PRKG1 also regulate the activity of the potassium channels KCNMB4 and KCNMA1, respectively, through perturbed interactions. These channels are responsible for membrane excitation and sensitivity to Ca 2+ levels [113].
MAPK and NMDA receptor signalling, oxidative stress, and neurodegeneration in FTD  76]. A crucial NMDAR that emerged from our FTD model is GRIN2B, part of the NR2 subunit, and acting as glutamate agonist. It is the major excitatory receptor in the mammalian brain. Information exchange through the post-synaptic excitatory CNS neurons occurs mainly through the ionotropic glutamate NMDARs. Synaptic plasticity (i.e. the modulation of the activity of glutamatergic synapses) may persist over long periods of time, leading to the so called LTP. LTP leads to NMDARs activation and influx of Ca 2+ ions in the post-synaptic membrane. These processes deeply affect key cellular processes for learning and memory [57,70,75]. According to our model, the oxidative stress-responsive Ser/Thr kinase PRKACG also influences MAPK8. MAPK8 interacts, in turn, with MAPK11 (aka p38-beta), a member of p38-MAPK family [75], involved in neuro-inflammatory processes preluding neurodegeneration. This is also supported by the highly perturbation of the osmotic stress-induced cytokine TNF, target of MAPK8 (Fig 5). Furthermore, MAPK8 may elicit neuronal cell death through activation of the perturbed target TP53 [73][74][75][76]. Another interesting MAPK8 target is FOXO1, involved in oxidative stress response and neuronal cell death [77]. In this respect, our current study supports previous work highlighting a role for oxidative damage and cell degeneration neurological disorders [68][69][70][71][72][73][74][75][76][77][78][79][80][81][82][83], including FTD [52,53].
Notably, our findings support neurodegeneration-associated mechanisms, particularly in astrocytes. Astrocytes are glial cells implicated in various neurodegenerative diseases, including Amyotrophic Lateral Sclerosis (ALS), Alzheimer's Disease (AD), Huntington's Disease (HD) and Parkinson's Disease (PD), although not much is known to date in FTD [114]. In general, Astrocyte-mediated onset and progression of these disorders seems to be a consequence of loss of homeostatic functions (e.g., Ca 2+ /cAMP homeostasis, implicated in neuroprotection), and gain of disruptive functions (e.g. abnormal LTP and ROS accumulation) [115,116]. Interestingly, astroglial cells are known to modulate neuronal activity through glutamate release, causing an NMDAR-mediated increase in Ca 2+ [117,118]. An excess of neuronal metabolic activity, accompanied by an impaired calcium homeostasis, due to astroglial-associated disorders, could make CNS synapses more sensitive to oxidative damage. As reported in the results section, oxygen radicals present in ROS induce phosphorylation of MAPK-associated NMDARs, activating their coupled PLCs, and consequently cGMP-dependent oxygenresponsive kinases (including PRKG1, PRKACG, PRKACB, PRKCG, and CAMK2A) [57,70,76]. The activity of these oxygen-responsive kinases could be the cause of abnormal LTP, neuroinflammatory response, and neuronal cell death [52,53,68-83].

Methodological and theoretical aspects
The core of our method consists in reducing the initial network to a minimum set nodes and edges bearing the disease-associated information content of the interactome (KEGG, in this case), filtered to exclude misleading interactions and noise. Although the initial interactome is given, it is then transformed and cross-connected, respectively, by perturbance and covariance-driven LVs inclusion. The last passage enables to also evaluate possible unobserved sources of co-variation. Several disease-relevant aspects apply to our 'seed-and-expand' method.
First, our approach assumes that genetic variability influences both single genes and their interaction. We expect that phenotype-associated genetic variants influence proteins' chemical-physical properties and thus their functions as well as interactions. In this perspective, perturbance is a measure of deviation from the given (i.e. KEGG) interactome. Our supervised-PC1 scores of SNP-to-gene quantification reflects genetic variability at a given locus, therefore the concept of perturbance can be explained in terms of phenotype-associated (i.e. FTD-associated) genetic variation over gene-gene interactions. More in general, the key entity comprised in the definition of perturbance is the information transmitted through such interactions, from the source of perturbation to the final target. If the genotype is altered, also the biochemical properties of the effectors are influenced. Therefore, the genetic alterations that underlie edge perturbations are not limited to deep structural rearrangements, but my also include mild phenotypes that cause small changes in protein functionality and connectivity, resulting in functional impairments [18].
Second, given the scale-free nature of biological regulatory networks [28] one natural class of relevant nodes is represented by hubs, many of which are not perturbed, including FZD4, TJP1, EP300, JUN, and PRKACG (Figs 2, 3 and 4). This supports the hypothesis according to which hubs are essential for embryonic development and thus are less likely to accumulate disease-related variants that usually turn to be lethal [28,29]. Nevertheless, their functional involvement in the disease phenotype is critical since they send and receive most perturbed connections. However, our method recovered also perturbed hubs, especially those connecting different functional sub-modules, including TCF7L2 and MAPK8. Furthermore, seeds and novel perturbed genes are not necessarily the closest to hubs, but rather those which are mostly influenced by disease-related genetic variation. In our model this is shown through paths maximizing edge perturbation, allowing to avoid the classical and often misleading guilt-by-association rule and favouring disease-relevant (or in general phenotype-relevant) variation. In this scenario, the discovery of novel disease-associated genes is at the same time dependent on gene perturbation, local gene context (i.e. edge perturbation), and the whole disease-network topology, through the overall FTD-model fitting.
The final FTD-network revealed several useful notions: (i) the presence of significantly relevant routes, sources, connectors (including hubs), and targets of perturbance; (ii) the presence of latent "common cause(s)" underling hidden gene-circuits, which connect perturbed routes; (iii) the detection of novel FTD-biomarkers based on functional perturbed connectors. Of note, other than for effect of genetic data, the pipeline is also suited for assessing any kind of quantitative data, including eQTL, expression microarrays, and RNA-seq, in the context of complex disorders.
In conclusion, the current study allowed to expand our previous work [4], revealing the potential functional impact of the genetic variation associated with FTD in the Italian population, using a newly developed network model. Specifically, our study suggests that the genetics underpinning (Italian) FTD is associated with metabolic stress involving possible oxygen compound excess and Calcium/cAMP homeostasis deregulation. This happens through different complex molecular mechanisms, including Adenylate Cyclases, PDEs, stress-induced Ser/Thrkinases, and WNT/EGFR and NMDA receptors malfunctioning. According to our hypothesis, these impairments lead to neuronal damage, loss of neuroprotection, and eventually to cell death. Moreover, by indicating these complex biological processes we also indicate their key functional players, i.e. those genes that we called novel potential FTD-biomarkers, which appear to be perturbed, and having synergic and small effect size.

Conclusions
We indicate oxidative stress as a major cause, not only of neurodegeneration in several forms of dementias, but also sporadic FTD. Firstly, we showed as Calcium/cAMP homeostasis and energetic metabolism impairments as the primary causes of loss of neuroprotection and neural cell damage, respectively. Secondly, we described how the altered postsynaptic membrane potentiation due to the activation of stress-induced Serine/Threonine kinases leads to neurodegeneration. By studying the molecular underpinnings of these processes, our study evidences key genes and gene interactions that may account for a significant fraction of unexplained sporadic FTD aetiology.
To the best of our knowledge, this is the first functional network study that highlights potentially impacted biological processes for FTD in the Italian population. In the future, we foresee to apply this method to the entire FTD-GWAS and other population-specific FTD datasets, as well as to the integration of gene expression analysis and genotyping data, to evaluate and refine the impact and translation of disease-associated genetic variation into the functional domain.  Table. Selected ancestral bow-free covariances (p < 0.05, values in bold) between pairs of "target" (outgoing degree = 0) nodes of the extracted Steiner tree. Pairs of target genes that do not share a directed path are called ancestral bow-free nodes. When these genes share significant covariances, there may be an unobserved common cause perturbing their interaction. This condition is evaluated using a latent variable (LV) model in which a LV, influenced by the group variable C (0 = controls, 1 = cases), is connected to the two bow-free targets. A LV is designed as a significant unknown cause acting on the targets, if the C->LV interaction is significant (i.e., p-value(C->LV) < 0.05, in bold), and the LV model has a good fit (i.e., pvalue of the Likelihood Ratio Test (LRT) ! 0.05, in bold). (DOCX) 6. Ferrari