Application of Artificial Neural Networks to Investigate One-Carbon Metabolism in Alzheimer’s Disease and Healthy Matched Individuals

Folate metabolism, also known as one-carbon metabolism, is required for several cellular processes including DNA synthesis, repair and methylation. Impairments of this pathway have often been linked to Alzheimer’s disease (AD). In addition, increasing evidence from large-scale case-control studies, genome-wide association studies, and meta-analyses of the literature suggests that polymorphisms of genes involved in one-carbon metabolism influence the levels of folate, homocysteine and vitamin B12, and might be among AD risk factors. We analyzed a dataset of 30 genetic and biochemical variables (folate, homocysteine, vitamin B12, and 27 genotypes generated by nine common biallelic polymorphisms of genes involved in folate metabolism) obtained from 40 late-onset AD patients and 40 matched controls, in order to assess the capacity of Artificial Neural Networks (ANNs) to distinguish consistently between these two conditions and to identify the variables carrying the most information relevant to the condition of being affected by dementia of Alzheimer’s type. Moreover, we constructed a semantic connectivity map to offer some insight into the complex biological connections among the studied variables and the two conditions (being AD or control). The TWIST system, an evolutionary algorithm able to remove redundant and noisy information from complex data sets, selected 16 variables that allowed specialized ANNs to discriminate between AD and control subjects with over 90% accuracy. The semantic connectivity map provided important information on the complex biological connections among one-carbon metabolic variables, highlighting those most closely linked to the AD condition.


Introduction
Folate metabolism, also known as one-carbon metabolism, plays a fundamental role in DNA synthesis and integrity, in chromosome stability, in DNA and protein methylation, as well as in antioxidant defence mechanisms, and impairments of this pathway have often been linked to Alzheimer's disease (AD) risk [1][2][3][4]. In 1990, Regland and colleagues first reported elevated homocysteine (hcy) levels in patients with primary degenerative dementia [5]. Since then, several researchers have investigated the levels of hcy, folate, and other B group vitamins involved in one-carbon reactions, such as vitamin B12, in mild cognitive impairment and AD [6][7][8]. Most of the retrospective studies comparing plasma hcy levels in AD patients and healthy controls revealed increased hcy values in AD subjects [3,9]. Evidence from prospective studies also suggests that a moderate elevation in hcy levels is a potential AD risk factor [3,9]. However, results are often conflicting, and it remains controversial whether hyperhomocysteinemia (hhcy) is really an AD risk factor or rather a consequence of the disease [4,9,10]. Several retrospective studies observed significantly decreased serum folate levels in AD subjects with respect to controls, and an inverse correlation between plasma hcy and serum folate [1,11]. There is also indication that low serum vitamin B12 levels are associated with neurodegenerative diseases and cognitive impairment [12], and several clinical investigations have demonstrated that administration of folate and related B vitamins is able to reduce hcy levels and to antagonize some mechanisms favouring neurodegenerative impairments, such as mild cognitive impairment and dementia [13].
In addition, increasing evidence from large-scale case-control studies, genome-wide association studies (GWAS), and meta-analyses of the literature suggests that polymorphisms of genes involved in one-carbon metabolism influence the levels of folate, hcy and vitamin B12, and might be among AD risk factors [8][14][15][16][17]. Unfortunately, the overall results of the literature are sometimes conflicting and often insufficient to disclose the actual relationships among the studied variables [2]. This is partially due to the complexity of the one-carbon metabolic pathway (Figure 1) and to the number of genes and environmental factors involved [2], as well as to the fact that traditional statistical algorithms are both unsuitable and underpowered to dissect the relationships among a high number of markers, given the nonlinearity and complexity of the folate metabolic pathway [18].
We performed the present study using Artificial Neural Networks (ANNs) to identify key factors linking folate metabolism to AD. The method used by ANNs aims to understand natural processes and recreate those processes using automated models. These networks allow a method of forecasting with understanding of the relationship among variables, and in particular nonlinear relationships [19][20][21]. ANNs function by initially learning a known set of data from a given problem with a known solution (training) and then the networks, inspired by the analytical processes of the human brain, are able to reconstruct the imprecise rules which may be underlying a complex set of data (testing). In recent years ANNs have been used successfully in medicine, for example they have been used to investigate the predictive values of risk factors on the conversion of amnestic mild cognitive impairment to AD [22], to identify genetic variants essential to differentiate sporadic amyotrophic lateral sclerosis cases from controls [23,24], to understand the relationship among polymorphisms of genes involved in one-carbon metabolism, chromosome damage, and maternal risk for having a birth with Down syndrome [18], to detect multiple genes of smaller effects in predisposing individuals to Barrett's esophagus [25], and to differentiate fronto-temporal dementia from AD [26], among others.
In this pilot study we applied ANNs to investigate biochemical and genetic markers related to one-carbon metabolism in 40 late-onset AD patients and 40 matched controls selected from a previously described database [8,27], in order to assess the capacity of ANNs to distinguish consistently between these two conditions and to identify the variables carrying the most relevant information. Moreover, we used the Auto Contractive Map (Auto-CM) algorithm, a special kind of Artificial Neural Network able to define the strength of the association of each variable with all the others and to show visually the map of the main connections among the variables and the basic semantics of their ensemble [28,29]. We previously applied Auto-CM to a dataset of genetic and cytogenetic data collected from mothers of Down syndrome individuals and matched control mothers [18], where it successfully disclosed previously unknown connections among polymorphisms of genes involved in folate metabolism and chromosome damage and malsegregation events in those women.
To the best of our knowledge, no previous study has used ANNs to investigate, in AD, the relationship among biochemical markers of one-carbon metabolism (folate, hcy, vitamin B12) and genetic polymorphisms of major enzymes involved in this pathway (methylenetetrahydrofolate reductase: MTHFR; methionine synthase: MTR; methionine synthase reductase: MTRR; thymidylate synthase: TYMS; reduced folate carrier: RFC1; DNA methyltransferases: DNMTs). The aim of this study was to investigate whether this mathematical approach can increase our knowledge of the connections among those variables in AD and matched control individuals, and to identify key variables that discriminate between these two conditions.

Study Population
We aimed to re-analyze from a completely new perspective some of the data obtained from our previous studies [8,27]. From a previously described dataset [8,27] containing data from AD patients and healthy matched controls, we have selected 40 late onset AD (15 males and 25 females, mean age at sampling 78.1 ± 6.3 years) and 40 age and sex matched control subjects (17 males and 23 females, mean age at sampling 76.5 ± 6.7 years) for whom all the following information on one-carbon metabolism was available: 1) plasma hcy levels, 2) serum folate levels, 3) serum vitamin B12 levels, 4) genotype for the MTHFR 677C>T (CC, CT or TT) polymorphism (rs1801133), 5) genotype for the MTHFR 1298A>C (AA, AC or CC) polymorphism (rs1801131), 6) genotype for the MTRR (AA, AG or GG) 66A>G polymorphism (rs1801394), 7) genotype for the MTR 2756A>G (AA, AG or GG) polymorphism (rs1805087), 8) genotype for the SLC19A1 (RFC1) 80G>A (AA, AG, GG) polymorphism (rs1051266), 9) genotype for TYMS 28-bp repeats (2R2R,2R3R,3R3R) polymorphism (rs34743033), 10) genotype for TYMS 1494 6bp ins/del (+/+, +/-, -/-) polymorphism (rs34489327), 11) genotype for DNMT3B -149C>T (CC, CT, TT) polymorphism (rs2424913), and 12) genotype for DNMT3B -579G>T (GG, GT, TT) polymorphism (rs1569686). As detailed elsewhere [8,27] all subjects included in our dataset were Caucasians of Italian origin (North-West Tuscany and neighboring areas) and diagnosis of probable AD was performed according to DSM-IV and NINCDS-ADRDA criteria at the time of patients recruitment [8,27]. The 40 AD subjects included in the present study also met the revised core criteria for probable AD [30]. A progressive cognitive decline on subsequent evaluations was observed. All the subjects included in the present study were sporadic cases, and none of them was a carrier of a causative genetic mutation in APP, PSEN1, or PSEN2 [30]. 
Control subjects consist of healthy volunteer subjects having no individual or family history of dementia or cognitive decline [8,27]. Table 1 shows the distribution of the studied variables among AD subjects and controls. All the samples were coded and data were processed in blind by operators. Figure 2 explains how genotypes were coded in the database. All individuals gave written informed consent for inclusion in the database, whose creation was performed in accordance with the Helsinki Declaration and approved by the Ethics Committee of the Pisa University Hospital (Project Reference N°3 618/2012).
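The 30 input variables described in this section (three biochemical values plus 27 genotype indicators from nine polymorphisms with three genotypes each) can be illustrated with a short sketch. The coding actually used in the database is the one given in Figure 2, which is not reproduced here; the one-indicator-per-genotype layout below, and the names `POLYMORPHISMS`, `BIOCHEMICAL` and `encode_subject`, are assumptions made only for illustration.

```python
# Hypothetical coding sketch: one binary indicator per genotype (assumption,
# not necessarily the coding of Figure 2). Genotype labels follow the text.
POLYMORPHISMS = {
    "MTHFR 677C>T": ("CC", "CT", "TT"),
    "MTHFR 1298A>C": ("AA", "AC", "CC"),
    "MTRR 66A>G": ("AA", "AG", "GG"),
    "MTR 2756A>G": ("AA", "AG", "GG"),
    "RFC1 80G>A": ("AA", "AG", "GG"),
    "TYMS 28bp repeats": ("2R2R", "2R3R", "3R3R"),
    "TYMS 1494 6bp ins/del": ("+/+", "+/-", "-/-"),
    "DNMT3B -149C>T": ("CC", "CT", "TT"),
    "DNMT3B -579G>T": ("GG", "GT", "TT"),
}
BIOCHEMICAL = ("hcy", "folate", "vitamin_b12")


def encode_subject(biochem, genotypes):
    """Return a 30-element input vector: 3 biochemical values followed by
    27 genotype indicators (exactly one indicator set per polymorphism)."""
    row = [float(biochem[name]) for name in BIOCHEMICAL]
    for snp, options in POLYMORPHISMS.items():
        row.extend(1.0 if genotypes[snp] == g else 0.0 for g in options)
    return row
```

Calling `encode_subject` with one subject's biochemical values and genotypes yields a vector of length 30, in which exactly nine of the 27 genotype indicators are set.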

Genotyping and biochemical data collection.
The database data concerning folate, hcy and vitamin B12 values and the genotypes for all the studied polymorphisms had been previously obtained by means of standard diagnostic protocols and validated PCR/RFLP techniques, as described elsewhere [8,27].

Figure 1. One-carbon (folate) metabolic pathway [18]. Folates require several transport systems to enter the cells, the best characterized being the reduced folate carrier (RFC1). Methylenetetrahydrofolate reductase (MTHFR) reduces 5,10-methylenetetrahydrofolate (5,10-MTHF) to 5-methyltetrahydrofolate (5-MTHF). Subsequently, methionine synthase (MTR) transfers a methyl group from 5-MTHF to homocysteine (Hcy), forming methionine (Met) and tetrahydrofolate (THF). Methionine is then converted to S-adenosylmethionine (SAM) in a reaction catalyzed by methionine adenosyltransferase (MAT). Most of the SAM generated is used in transmethylation reactions, whereby SAM is converted to S-adenosylhomocysteine (SAH) by DNA methyltransferases (DNMTs) that transfer the methyl group to the DNA. Vitamin B12 is a cofactor of MTR, and methionine synthase reductase (MTRR) is required for the maintenance of MTR in its active state. If not converted into methionine, Hcy can be used for the synthesis of glutathione (GSH) in a reaction catalyzed by cystathionine β-synthase (CBS) and other enzymes. Another important function of folate derivatives (THF and dihydrofolate: DHF) is in the de novo synthesis of DNA and RNA precursors (dUMP, dTMP, etc.). This pathway is mediated by the thymidylate synthase (TYMS), methylenetetrahydrofolate dehydrogenase (MTHFD), and phosphoribosylglycinamide transformylase (GART) enzymes.

TWIST algorithm
TWIST is a complex evolutionary algorithm that searches for the best division of the global dataset into two optimally balanced subsets containing a minimum number of input features useful for optimal pattern recognition. It is based on a seminal paper about Genetic Doping Systems [52] and has already been applied to medical data with very promising results [18,19,21,[53][54][55][56][57][58][59]]. Usually the TWIST evolutionary system consists of a population of Multilayer Perceptrons: each ANN has to learn one subset of the global dataset and is then tested blindly on another subset. In this application we reprogrammed the fitness function of TWIST, replacing the population of Multilayer Perceptrons with a population of simple K Nearest Neighbour (KNN) classifiers based on the Euclidean metric. This change makes TWIST faster and more oriented toward discovering explicit similarities between input attributes and classes (AD and controls), which was the aim of this analysis. TWIST selected 16 of the 30 original attributes (see Table 3) and generated a global dataset of 16 attributes, together with two optimal subsets for training and testing. We then applied the K-Fold protocol to the global dataset to verify whether the 16 attributes selected by TWIST improved the performances of the learning machines already applied to the original dataset. Moreover, since the K-Fold protocol is not always a reliable strategy [51], as a second step we applied the same learning machines (Table 2) to the two subsets generated directly by TWIST.
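The idea of scoring a candidate set of input attributes with a KNN classifier based on the Euclidean metric can be sketched as follows. This is not the TWIST implementation: TWIST also evolves the training/testing split, whereas this toy version keeps the split fixed and evolves only the feature mask with a deliberately simple scheme (keep the fitter half, flip one bit per child). All function names and parameters (`knn_predict`, `fitness`, `evolve_mask`, `k`, `generations`, `pop_size`, `seed`) are hypothetical.

```python
import math
import random


def knn_predict(train_X, train_y, x, features, k=1):
    """Majority vote among the k nearest training rows, using Euclidean
    distance restricted to the selected feature indices."""
    dists = sorted(
        (math.sqrt(sum((row[f] - x[f]) ** 2 for f in features)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = [label for _, label in dists[:k]]
    return max(sorted(set(votes)), key=votes.count)


def fitness(mask, train_X, train_y, test_X, test_y, k=1):
    """Blind-test accuracy of KNN using only features where mask[i] == 1."""
    features = [i for i, bit in enumerate(mask) if bit]
    if not features:
        return 0.0  # an empty feature set cannot classify anything
    hits = sum(
        knn_predict(train_X, train_y, x, features, k) == y
        for x, y in zip(test_X, test_y)
    )
    return hits / len(test_y)


def evolve_mask(train_X, train_y, test_X, test_y,
                generations=30, pop_size=20, seed=0):
    """Toy evolutionary search over feature masks (assumption: fixed split)."""
    rng = random.Random(seed)
    n = len(train_X[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]

    def score(mask):
        return fitness(mask, train_X, train_y, test_X, test_y)

    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[: pop_size // 2]      # elitism: the best masks survive
        children = []
        for parent in parents:
            child = parent[:]
            child[rng.randrange(n)] ^= 1    # mutation: flip one feature bit
            children.append(child)
        pop = parents + children
    return max(pop, key=score)
```

Because the fitness is a blind-test accuracy, a mask that includes noisy attributes is penalized whenever those attributes pull test subjects toward neighbours of the wrong class, which is the intuition behind removing them before the main pattern recognition step.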

Semantic connectivity map
An existing mapping method [28,29] was used to highlight through a graph the most important links among variables, using a mathematical approach based on an artificial adaptive system called the Auto Contractive Map (Auto-CM) algorithm. Auto-CM is a special kind of Artificial Neural Network able to find, by means of a specific data mining learning algorithm, consistent patterns, systematic relationships, hidden trends and associations among variables. After the training phase, the weights developed by Auto-CM are proportional to the strength of the associations among all the variables, and the weights matrix represents the warped landscape of the dataset. The weights are then transformed into physical distances: pairs of variables whose connection weights are higher become nearer, and vice versa. A simple mathematical filter, the minimum spanning tree, is applied to the distance matrix and a graph is generated. The resulting graph, called the connectivity map, shows the main connections among the variables of the dataset and the basic semantics of their similarities, as detailed elsewhere [28,29]. It allows one to see the connection schemes among variables and to detect variables acting as "hubs", that is, highly connected ones. The matrix of connections preserves nonlinear associations among variables and captures connection schemes among clusters. Genotype data were coded as shown in Figure 2. Each of the three biochemical variables (folate, hcy, and vitamin B12) was scaled from 0 to 1 and paired with its complement, yielding six input variables, as detailed elsewhere [60].

Classification performances with ANNs
Tables 4 and 5 show the results obtained with the two validation strategies (K-Fold, and Training and Testing with random split, respectively) using all 30 variables of the dataset as input vectors. Generally speaking, the classification capabilities of all the algorithms were poor (from 50% to 65% global accuracy) and sometimes similar, except for the Sine Net (71% global accuracy). From these results alone one could conclude that there is no evidence of correlation between these variables and AD, at least in this dataset. However, applying the TWIST algorithm to eliminate noisy variables before the main pattern recognition test led to the selection of 16 attributes (listed in Table 3). First, we applied the K-Fold protocol to the global dataset to verify whether the 16 attributes selected by TWIST improved the performances of the learning machines already applied to the original dataset. Table 6 shows the results. Most of the learning machines improved their performances dramatically (global accuracy of 70% or more), and both Semeion ANNs reached 77% global accuracy. Consequently, two of the tested algorithms were able to find a good correlation between some variables and AD once noisy attributes were removed.
However, the K-Fold protocol is not always a reliable strategy, as shown in [51]. TWIST, in fact, also generates two new subsets (with the selected variables) having a similar probability density distribution [51]. This means that the two subsets are statistically equivalent, and that each of them is also equivalent to the global dataset. The K-Fold protocol does not have this capability, and its results are therefore an average whose variance could be very high. For this reason, as a second step we applied the same learning machines to the two subsets generated directly by TWIST. The results are shown in Table 7. In this case the performances of all the learning machines improved further, and some of them (Sine Net, IBk and Back Prop) proved suitable as predictors of AD (Table 7).

Semantic connectivity map
Figure 3 shows the semantic connectivity map obtained with the application of the Auto-CM system. Variables that have the maximal number of connections with other variables are called "hubs" of the system. To help interpret the connections, a numerical value is attached to each edge of the graph. This value, derived from the original weight developed by Auto-CM during the training phase and scaled from 0 to 1, is proportional to the strength of the connection between two variables. Results clearly indicated that AD cases can be visually separated from the controls. In particular, it was possible to visualize an AD area characterized by low folates, low vitamin B12, high hcy and several risk genotypes, and a control area characterized by low hcy, high folates, high vitamin B12, and several protective genotypes (Figure 3). Moreover, Auto-CM provides not only the direction of an association, as standard statistical analyses do, but also, importantly, its strength (link strength = ls).
For example, reduced folates were strongly related to AD (ls=0.98), as were the MTHFR 677 mutant (TT) genotype (ls=0.90) and the TYMS 1494 6bp mutant (-/-) genotype (ls=0.88). Reduced folates were also closely linked to low levels of vitamin B12 (ls=0.99), and this condition was linked to increased hcy levels (ls=0.82). Several genotypes were also connected to low vitamin B12 levels (Figure 3). Control subjects were strongly connected with the TYMS 1494 6bp wild-type (+/+) genotype (ls=0.92) and with reduced hcy levels (ls=0.98), which in turn were connected with high vitamin B12 and folate levels, as well as with several genotypes (Figure 3).
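The filtering step behind such a map (connection weights turned into distances, then reduced to a minimum spanning tree whose most connected nodes are the "hubs") can be sketched as follows. This is only an illustration and not the Auto-CM code: the weights are taken as given, the transform d = 1 - w for weights scaled from 0 to 1 is an assumption, Kruskal's algorithm is one standard way to build a minimum spanning tree, and the names `minimum_spanning_tree` and `hubs` are hypothetical.

```python
def minimum_spanning_tree(names, weights):
    """Kruskal's algorithm on distances d = 1 - w (assumed transform).

    `weights` maps frozenset({a, b}) to a connection strength in [0, 1].
    Returns the tree as a list of (a, b, strength) edges.
    """
    parent = {name: name for name in names}

    def find(node):
        # Union-find lookup with path halving.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    # Strong connections become short distances and are considered first.
    edges = sorted(
        (1.0 - w, tuple(sorted(pair)), w) for pair, w in weights.items()
    )
    tree = []
    for _, (a, b), w in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:          # skip edges that would close a cycle
            parent[root_a] = root_b
            tree.append((a, b, w))
    return tree


def hubs(tree):
    """Return the node(s) with the highest number of edges in the tree."""
    degree = {}
    for a, b, _ in tree:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    top = max(degree.values())
    return sorted(node for node, d in degree.items() if d == top)
```

With n variables the tree keeps only n - 1 edges, so each value shown on the map is the strength of a retained connection rather than one entry among all pairwise weights.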

Discussion
Both prospective and retrospective studies have suggested a possible link among folate, hcy, and vitamin B12 levels and AD risk [3][4][5][6][7][8][9][10][11][12][13]. Moreover, there is indication from genetic association studies, GWAS, and meta-analyses of the literature, suggesting that polymorphisms of genes involved in one-carbon metabolism might represent AD genetic susceptibility factors [14][15][16][17]. In the present study we selected 40 late-onset AD subjects and 40 age and sex matched controls to see if ANNs were able to discriminate between those two conditions using a set of data that included the circulating values of folate, hcy and vitamin B12 and 27 different genotypes generated by nine biallelic polymorphisms of genes involved in one-carbon metabolism.
Through the TWIST system, we established a consistent possibility of predicting the status of being an AD or a control subject on the basis of 16 selected variables (Table 3), which allowed some of the learning machines to reach up to 90% global accuracy (Table 7); this means that the selected variables contained specific information to discriminate between the two conditions. In particular, folate and hcy values, as well as the MTHFR 677CC, MTHFR 677TT, MTHFR 1298CC, TYMS 28bp 2R/3R, TYMS 1494 6bp +/+, TYMS 1494 6bp +/-, MTRR 66AG, MTR 2756GG, RFC1 80GA, DNMT3B -149CT, DNMT3B -149TT, DNMT3B -579GG, DNMT3B -579GT, and DNMT3B -579TT genotypes, were the most important variables for discriminating between AD and control subjects (Table 3). Most of these variables, such as folate, hcy, MTHFR, MTRR, and RFC1 genotypes, had been previously associated with AD risk (reviewed in [2]), but others, including TYMS and DNMT3B genotypes, were not associated with disease risk when considered independently from the others [8,27]. The present study represents the first attempt to use ANNs to understand the complex relationship between one-carbon metabolism and AD and, to the best of our knowledge, also the first attempt to evaluate the combined contribution of 30 different variables in this pathway to AD pathogenesis. ANNs provided a valuable tool to evaluate the whole pathway and to unravel the links among the studied variables, as shown in the semantic connectivity map (Figure 3). In particular, the semantic connectivity map obtained by means of the Auto-CM system revealed already known connections as well as novel ones (Figure 3). It is not surprising that reduced folate was the variable most closely linked to AD (ls=0.98), since several published studies observed reduced blood folate levels in AD patients with respect to controls [2,6,8,11,13].
Moreover, the MTHFR 677TT genotype was closely linked to AD (ls=0.90). This is also known from the literature [8,15], as is the fact that the effect of this mutant genotype is exacerbated under conditions of reduced folates, which impair protein stability and activity [2]. The observed strong link between reduced folates and reduced vitamin B12 levels (ls=0.99) is also known [1,8], and this condition is likely to foster the increase in hcy levels (ls=0.82) that is often seen in AD individuals [1][2][3][4][5][6][7][8][9][10], probably because vitamin B12 is a cofactor required by the MTR/MTRR complex during the conversion of hcy to methionine (Figure 1). Indeed, several genotypes, such as those generated by MTHFR, MTR, and MTRR polymorphisms, are likely to contribute to vitamin B12 levels, as are those in the TYMS and DNMT3B genes, owing to the existence of feedback inhibitory loops in the pathway [2]. Very interesting and unexpected is the link between the TYMS 1494 6bp -/- genotype and AD (ls=0.88), which was paralleled by a strong link between the TYMS 1494 6bp +/+ genotype and the condition of being a healthy control (ls=0.92) (Figure 3). To the best of our knowledge, this is the first report of a possible contribution of this polymorphism to AD risk. The TYMS 1494 6bp ins/del polymorphism impairs TYMS mRNA stability and is often studied in conjunction with the 28bp repeat polymorphism in the promoter of the gene, which affects gene expression levels [2]. A previous report by our group revealed a borderline significant difference (P=0.08) in the distribution of TYMS 28bp 2R and 3R alleles and related genotypes between late-onset AD subjects and healthy matched controls [8]. Taken together, our present and previous data suggest that TYMS might be another candidate gene of the one-carbon metabolic pathway deserving further investigation in AD genetic association studies.
Indeed, impairments of TYMS might shift the metabolic pathway toward DNA methylation (Figure 1), and favour epigenetic processes that are increasingly linked to AD pathogenesis [2].
Among the factors tightly linked to controls is low hcy (ls=0.98), which is linked to high folates and high vitamin B12. This is not surprising, since several authors previously observed reduced hcy and increased folate and vitamin B12 levels in controls with respect to AD subjects [1][2][3][4][5][6][7][8][9][10][11][12][13]. Several gene polymorphisms are linked to those conditions. For example, MTHFR 1298 homozygous genotypes are in the control area of the map. This is not surprising, because the effect of this polymorphism on AD risk is often reported to be opposite to that of the MTHFR 677C>T one, and it has often been suggested to be a protective factor for AD [61,62].
None of the genotypes generated by DNMT3B polymorphisms have been directly linked to AD or control conditions (Figure 3), and this partially confirms the results of a previous genetic association study by us [27]. However, those genotypes seem to interact with others and play a role in determining folate and vitamin B12 levels (Figure 3), suggesting that their contribution to AD risk might be completely different when evaluated in combination with other variables of the pathway.
Several factors, particularly medications and dietary supplements, may significantly alter one-carbon metabolism. One example is metformin, an antidiabetic and gerosuppressant drug that has been suggested to work against AD, although with conflicting results [63,64]. Indeed, metformin was shown to impair one-carbon metabolism in a manner similar to the antifolate class of chemotherapy drugs [65,66]. Other factors that could affect folate metabolism in aged individuals are dietary supplements containing folate, B vitamins, or similar compounds [67]. In order to minimize the effect of polymedication in our cohort, biochemical measurements of folate, hcy, and vitamin B12 were performed during the first visit, and most of the subjects were not regularly taking supplements known to interfere with this pathway. Individuals taking medications or supplements known or suspected to interfere with one-carbon metabolism discontinued them for one month before blood sampling. If this was not possible, the subject was not enrolled in the study.
The present results indicate that one-carbon metabolism variables might serve as an additional tool to assist AD diagnosis. In this regard, a recent report from the Vienna Transdanube Aging study suggests that high cortisol and low folate levels are the only routine blood tests predicting probable AD after the age of 75 years, thereby stressing the utility of a deeper understanding of folate metabolism in AD pathogenesis [68]. Indeed, the authors followed 493 persons who were cognitively healthy at baseline for a period of 90 months, and observed that a serum folate increment of 10 ng/mL reduced the risk of conversion to probable AD to one-third [68]. Our data, showing that reduced folate is the variable most closely linked to AD in our cohort, are in strong agreement with that study, but we must stress that biochemical markers alone can be useful but are not sufficient to fully discriminate between AD and control subjects. However, their combination with neural correlates and imaging data, as well as with other markers of the disease such as cerebrospinal fluid markers, might be really useful in this context.

Conclusions
The present study represents the first attempt to use ANNs to understand folate metabolism in AD and healthy matched control subjects, and reveals the importance of evaluating this pathway as a whole rather than considering its components one at a time. Among the 30 initial variables of the pathway, 16 seem to contain significant information to discriminate between AD and control subjects in our cohort, and the semantic connectivity map generated here reveals both already known and novel connections among variables and disease risk (Figure 3). Of particular interest are variables, such as TYMS and DNMT3B genotypes, that, albeit not previously detected in genetic association studies, might make a significant contribution when the complexity of their interactions with other variables of this pathway is considered. Although we achieved good results using ANNs on our small dataset, these results are not necessarily generalizable to other populations and need to be validated independently in future studies. Differences might arise from one population to another, due to different dietary habits, to a different distribution of the studied polymorphisms, and to other geographic factors. However, our system is able to capture the connections among the studied variables and to identify those of relevance in a particular dataset. The addition of other variables, such as brain volume, DNA methylation content, DNA damage, and so on, coupled with the possibility of graphically visualizing the strengths of the connections among all the studied variables, could be a helpful and timely tool to unravel the link between folate metabolism and AD, particularly in a period when nutritional supplementation has often been suggested as a preventative strategy to delay epigenetic modifications linked to the onset of age-related diseases such as AD [69].