Mutations in SORL1 and MTHFD1L possibly contribute to the development of Alzheimer’s disease in a multigenerational Colombian family

Alzheimer’s disease (AD) is the most common cause of dementia in the elderly. It affected over 50 million people worldwide in 2020, and this number is projected to triple to 152 million by 2050. Much of the increase will be in developing countries like Colombia. In familial forms, highly penetrant mutations have been identified in three genes, APP, PSEN1, and PSEN2, supporting a role for amyloid-β peptide. In sporadic forms, more than 30 risk genes have been identified, involved in lipid metabolism, the immune system, and synaptic functioning. We used whole-exome sequencing (WES) to evaluate a family of 97 members, spanning three generations, with familial AD and without mutations in APP, PSEN1, or PSEN2. We sequenced two affected members and one unaffected member to identify genetic variants that could explain the presence of the disease in the family, and the candidate variants were validated in eleven members. We also built structural models to try to determine the effect of the variants on protein function. WES analysis identified two rare variants in the SORL1 and MTHFD1L genes segregating in the family together with other potential risk variants in APOE, ABCA7, and CHAT, suggesting an oligogenic inheritance. Additionally, the structural 3D models of the SORL1 and MTHFD1L variants show that these variants produce polarity changes that favor hydrophobic interactions, resulting in local structural changes that could affect protein function and may contribute to the development of the disease in this family.

Introduction
AD is the leading cause of dementia in Latin America. The term dementia is used to define a heterogeneous group of progressive and degenerative brain pathologies, which are clinically characterized by deterioration in memory, learning, orientation, language, comprehension, and judgment [1]. Of all dementia patients, 60 to 80 percent are diagnosed with AD, which affected almost 50 million people worldwide in 2018; this number is projected to more than triple to 152 million by 2050 [2]. Less is known about trends in low- and middle-income countries like Colombia, where the prevalence of dementia is 13.1 per 1000 population (95% CI: 8.5 to 19.3) [3]. The number of people with AD could be approximately 260,000 in 2020, although current estimates could be too low by 50% [4].
The hallmark pathologies of AD are the extracellular accumulation of the processing products (Aβ42) of amyloid-β protein precursor (APP), which tend to aggregate into β-pleated sheets (amyloid plaques), and intracellular fibrillar aggregates of the microtubule-associated protein tau (neurofibrillary tangles), which are neurotoxic [5]. Plaques and tangles are present mainly in the entorhinal cortex, hippocampus, basal forebrain, and amygdala. These brain regions are involved in learning, memory, and emotional behaviors [6]. AD is classified according to the age of onset and the inheritance pattern: early-onset Alzheimer's disease (EOAD, before 65 years of age) is characterized by an autosomal dominant inheritance pattern, and late-onset Alzheimer's disease (LOAD, after 65 years of age) presents a complex inheritance pattern. Twin and family studies indicate that genetic factors play an important role in more than 80% of AD cases [7]. Heritability estimates for LOAD range from 70 to 80%, and EOAD shows 92 to 100% heritability [7,8]. An autosomal dominant inheritance pattern has only been observed in 5% of AD families. In all other AD families the inheritance pattern is complex, and the disease is caused by a combination of both genetic and environmental factors [9,10]. Linkage studies in families provided early insights into the molecular genetics of AD. More than 350 highly penetrant mutations have been identified in three genes, APP [9], PSEN1 [11], and PSEN2 [12], in EOAD patients [13]. However, mutations in these genes are causative of the disease in only about 13% of AD patients [14]. Technological advances like genome-wide association studies (GWAS) have enabled the identification of more than 30 loci associated with AD. The APOE-ε4 allele, located in the 19q13.2 region, had been the only well-established risk factor for both EOAD and LOAD [15,16]. Collaborative efforts have changed the direction of research in the genetically complex forms of AD.
At least 10 new risk loci have been identified thanks to these efforts, in genes such as CLU, CR1, PICALM, BIN1, MS4A, CD2AP, CD33, EPHA1, ABCA7, and SORL1 [17][18][19][20][21][22][23], implicated in pathways related to lipid processing, the immune system, inflammation, and endocytosis [20,24]. This list will probably be extended by next-generation sequencing (NGS) techniques such as whole-exome sequencing (WES), whole-genome sequencing (WGS), and targeted sequencing (targeted-panel NGS), which have expanded the possibilities to identify causal or risk variants that remain difficult to reach [29]. For instance, WES of neurological and immunological disorders associated with cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy (CADASIL) identified a mutation in NOTCH3 (R1231) in AD patients [30]. Later, rare variants in the NCSTN gene associated with LOAD were identified [31]. Through WGS, a mutation with a protective effect was identified in the gene encoding the APP protein (Ala676Thr). WGS in patients with EOAD and LOAD identified nonsense and missense mutations in the SORL1 gene [32,33]. A rare susceptibility variant in TREM2 was also identified [34,35]. Targeted sequencing in a cohort of samples of African descent identified new AD risk variants in the ABCA7, AKAP9, COBL, MS4A6A, PTK2B, SLC10A2, and ZCWPW1 genes [36,37]. Recently, two new variants in the PDE11A gene that increase tau phosphorylation were identified in Chinese individuals with EOAD [38].
We evaluated a multigenerational family of 97 members, spanning three generations, from eastern Antioquia, Colombia, with an inheritance pattern suggestive of autosomal dominance. Eleven family members have a diagnosis of EOAD, and the diagnosis of the index case was verified by immunohistochemistry of the brain. In eastern Antioquia, the Neurosciences group of the University of Antioquia identified the E280A mutation in the PSEN1 gene in the population under 50 years old, with an autosomal dominant mode of transmission [39]. We sequenced the exons of the PSEN1 gene looking for this mutation and found neither this nor any other mutation in the gene. This suggests that other genetic factors must be involved in explaining the presence of this pathology in this family. We sequenced two affected family members and one unaffected family member by NGS, and the candidate variants found by WES were validated by Sanger sequencing in eleven family members. Finally, we built structural models of the variants in the SORL1 and MTHFD1L genes to try to predict the effect of these variants on the function of the proteins and their role in the development of the disease in this family.

Sample size
We evaluated a multigenerational family of 97 members, spanning three generations, from eastern Antioquia, Colombia, with eleven affected members. We performed the complete clinical and neuropsychological evaluation in five of the family members, two affected and three unaffected, according to the criteria of the National Institute of Neurological and Communicative Disorders and Stroke and the Alzheimer's Disease and Related Disorders Association, NINCDS-ADRDA [40]. DNA was available from eleven family members for sequencing, four affected and seven unaffected. To identify variants in the PSEN1 gene, we Sanger-sequenced all exons of the gene in two family members, one affected and one unaffected. Later, to identify variants in other genes, we sequenced three family members by NGS, two affected and one unaffected. Finally, to assess the segregation of candidate variants in the family, we Sanger-sequenced the candidate variants found by WES in all eleven family members with an available DNA sample. To confirm the AD diagnosis of the index case, who died and donated his brain to the Antioquia's Neurobank, the histological evaluation was performed following the CERAD protocol, with hematoxylin-eosin staining and immunohistochemistry [41].

AD diagnosis
The patients involved in the study were diagnosed with EOAD by neurologists of the Neuroscience Research Group of the University of Antioquia based on medical history and clinical, neurological, and neuropsychological examination, following the criteria of NINCDS-ADRDA [40]. To assess cognitive decline, we applied the standard cognitive tests of the Consortium to Establish a Registry for Alzheimer's Disease (CERAD) neuropsychological test battery and additional tests validated by the Neuroscience Research Group [42]. In some cases, laboratory studies and neuroimaging were necessary for the diagnosis. To confirm the presence of senile plaques, we performed a research autopsy in one of the affected family members, the index case, who donated their brain to the Antioquia's Neurobank. The brain was extracted according to the Neuroscience Research Group brain processing protocol, and the histopathological stains were obtained following the CERAD protocol, with hematoxylin-eosin staining and immunohistochemistry [41].

Pedigree
Through direct communication with patients and their relatives, the family pedigrees were built and diagrammed using the Progeny version 7.0 (Progeny CLINICAL Version N) (Progeny Software LLC, Delray Beach, FL, www.progenygenetics.com) and Cyrillic version 3.0.400 [43] software, S1 Fig. Additional information requested from patients included personal data (age, sex, geographical origin), personal and family history of neurodegenerative diseases, clinical characteristics, neuropsychological characteristics, and support exams, among others, S1 Table.

DNA samples
During the evaluation, we obtained blood samples for DNA extraction. DNA was available from eleven family members, four affected and seven unaffected. DNA was isolated from peripheral blood from four affected family members (III:1, III:5, III:9, III:10) and seven unaffected family members (III:4, III:7, III:8, IV:1, IV:5, IV:6, IV:29). The extraction of DNA was carried out following standard extraction protocols (salting out) [44], and the samples were stored at -20˚C until the time of sequencing, S1 Table.

E280A and PSEN1 mutations screening
Exons 4 to 13 and the flanking intron regions of the PSEN1 gene were amplified by conventional PCR. Two individuals were sequenced by bidirectional sequencing using the Sanger method, an affected individual (III:5) and a healthy relative (III:9). The chromatogram's quality was evaluated using the program FinchTV version 1. 4

Exome sequencing
The coding regions of the genome of three family members, two affected (III:5 and III:10) and one unaffected (III:7), were sequenced by NGS. The sequencing service was requested from the company Macrogen in South Korea. A library enriched with the SureSelectXT Library Prep Kit was constructed, following the protocol in SureSelectXT Target Enrichment System for Illumina Version B.2, April 2015. The sequencing was performed on a HiSeq 4000 device (2 x 101 base pair paired-end reads) following the protocol in HiSeq 3000 4000 System User Guide Part # 15066496 Rev. HCS 3.3.52. The data were then processed with HCS (HiSeq Control Software) version 3.3 to obtain the raw data. The sequencing output was converted to FASTQ format using the Illumina package bcl2fastq version 2.16.0.10 https://support.illumina.com/sequencing/sequencing_software/bcl2fastq-conversionsoftware.html.

Bioinformatic analysis
The quality of reads was evaluated with the fastqc_v0. 11

Clinical interpretation of candidate variants
The clinical interpretation of the genetic variants was carried out considering the guide pro-

Variants validations
To validate the variants identified in the exome analysis by NGS, each of the candidate variants was sequenced by the gold standard technique (Sanger sequencing) in the healthy and affected individuals available in the family. This service was requested from Macrogen. We sequenced the following variants: SORL1:c.C2710T:p.R904W, MAPT:c.G1667C:p.R556P, CHAT:c.G770A:p.R257Q, ABCA7:c.G2629A:p.A877T, MTHFD1L:c.G1691A:p.R564H, APOE:c.T388C:p.C130R and APOE:c.T526C:p.C176R in eleven members of the family, four affected family members (III:1, III:5, III:9, III:10) and seven non-affected family members (III:4, III:7, III:8, IV:1, IV:5, IV:6, IV:29), S11 Table.

Structure and function of genes and proteins where candidate variants were identified
We determined the structure and function of each of the genes/proteins in which variants were identified based on information reported in different databases such as NCBI [68], in order to determine the pathways in which these proteins participate and to elucidate how they could be contributing to the development of this pathology of the nervous system.
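The variant labels listed for Sanger validation follow a gene:cDNA change:protein change pattern. As a minimal sketch (the parser, its names, and the consistency rule are illustrative assumptions, not part of the study's pipeline), such labels can be split into fields and checked for internal consistency: for a coding-sequence position n, the affected codon is expected to be (n - 1) // 3 + 1, which should equal the reported amino acid position.

```python
import re
from typing import NamedTuple


class Variant(NamedTuple):
    gene: str
    ref_base: str
    cdna_pos: int
    alt_base: str
    ref_aa: str
    aa_pos: int
    alt_aa: str


# Hypothetical pattern for labels such as "SORL1:c.C2710T:p.R904W".
_LABEL = re.compile(
    r"^(?P<gene>[A-Za-z0-9]+)"
    r":c\.(?P<ref>[ACGT])(?P<cpos>\d+)(?P<alt>[ACGT])"
    r":p\.(?P<raa>[A-Z])(?P<ppos>\d+)(?P<aaa>[A-Z])$"
)


def parse_variant(label: str) -> Variant:
    """Split a gene:c.:p. label into typed fields; raise on malformed input."""
    m = _LABEL.match(label)
    if m is None:
        raise ValueError(f"unrecognized variant label: {label!r}")
    return Variant(
        gene=m["gene"],
        ref_base=m["ref"],
        cdna_pos=int(m["cpos"]),
        alt_base=m["alt"],
        ref_aa=m["raa"],
        aa_pos=int(m["ppos"]),
        alt_aa=m["aaa"],
    )


def codon_matches(v: Variant) -> bool:
    """True if the cDNA position falls in the codon of the reported residue."""
    return (v.cdna_pos - 1) // 3 + 1 == v.aa_pos


LABELS = [
    "SORL1:c.C2710T:p.R904W",
    "MAPT:c.G1667C:p.R556P",
    "CHAT:c.G770A:p.R257Q",
    "ABCA7:c.G2629A:p.A877T",
    "MTHFD1L:c.G1691A:p.R564H",
    "APOE:c.T388C:p.C130R",
    "APOE:c.T526C:p.C176R",
]

if __name__ == "__main__":
    for label in LABELS:
        v = parse_variant(label)
        print(v.gene, v.cdna_pos, v.aa_pos, codon_matches(v))
```

A check of this kind only confirms that the nucleotide and residue coordinates agree with each other; it says nothing about which transcript the coordinates refer to.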

Protein model
A model was constructed for each of the proteins in which variants were identified, for the wild type and for the corresponding variant, in order to predict the possible effect of the variants on the structure and function of the proteins involved. The models were built with different protein modeling programs, such as I-Tasser https://zhanglab.ccmb.med.umich.edu/I-TASSER/ [69], Swiss-Model https://swissmodel.expasy.org/ [70], and Phyre2 http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index [71], and were visualized with Chimera https://www.cgl.ucsf.edu/chimera/ [72]. The models obtained were refined with the tools FG-MD https://zhanglab.ccmb.med.umich.edu/FG-MD/ and ModRefiner https://zhanglab.ccmb.med.umich.edu/ModRefiner/ of I-Tasser [69]. Finally, to validate the models and select the best one, the Q-mean values of Swiss Model https://swissmodel.expasy.org/qmean/ [70] and the Ramachandran plots http://mordred.bioc.cam.ac.uk were taken into account. The models with the highest Q-mean values and the highest number of amino acids in the favorable region of the Ramachandran plots were selected.
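The selection rule described above uses two criteria at once, and the text does not state how a conflict between them was resolved. The following sketch is one possible reading, under stated assumptions: a candidate is discarded if another candidate is at least as good on both criteria and strictly better on one, and among the remaining candidates the one with the highest Q-mean is chosen. The class, the function names, and the scores in the usage example are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelScore:
    name: str
    qmean: float         # higher is better
    favoured_pct: float  # % of residues in the favorable Ramachandran region


def non_dominated(models: List[ModelScore]) -> List[ModelScore]:
    """Keep models that no other model beats on both criteria."""
    kept = []
    for m in models:
        dominated = any(
            o.qmean >= m.qmean
            and o.favoured_pct >= m.favoured_pct
            and (o.qmean > m.qmean or o.favoured_pct > m.favoured_pct)
            for o in models
        )
        if not dominated:
            kept.append(m)
    return kept


def pick_best(models: List[ModelScore]) -> ModelScore:
    """Assumed tie-break: Q-mean first, then favorable-region percentage."""
    return max(non_dominated(models), key=lambda m: (m.qmean, m.favoured_pct))


# Hypothetical candidate scores, for illustration only.
CANDIDATES = [
    ModelScore("candidate_A", qmean=0.33, favoured_pct=72.0),
    ModelScore("candidate_B", qmean=0.30, favoured_pct=80.0),
    ModelScore("candidate_C", qmean=0.25, favoured_pct=70.0),
]

if __name__ == "__main__":
    print([m.name for m in non_dominated(CANDIDATES)])
    print(pick_best(CANDIDATES).name)
```

Listing the non-dominated candidates before applying a tie-break makes it visible when the two criteria disagree, which is the case for candidates A and B in this illustration.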
Ethics statement. All procedures involving experiments on human subjects were done in accord with the ethical standards of the Committee on Human Experimentation of the institution in which the experiments were done or in accord with the Helsinki Declaration of 1975. The approval of the research protocol and informed consent for the study in humans was granted by the bioethics committee of the medical research institute of the Faculty of Medicine of the University of Antioquia in Act number 008 of May 29, 2014. Informed consent was a written document. It was read with the members of the family and, after the resolution of doubts, it was signed by each individual who participated in the investigation. In the case of minors, consent was granted by the signature of the parents. In the case of individuals with strong neurological involvement, consent was granted by the signature of the responsible person. Two witnesses signed as proof of consent.

Clinical evaluation
Index case (III:5). Clinical evaluation. A 67-year-old patient reported a chief complaint of short-term memory impairment (in episodic and semantic memory) with unimpaired long-term memory, with onset of symptoms at 63 years old and a slowly progressive course. Additional neuropsychiatric symptoms were insomnia, distractibility, depressive symptoms, and aggressive behavior. Two years after AD onset, he suffered a mild traumatic brain injury (TBI) that accentuated his clinical state. At 65 years old, he suffered a convulsive status epilepticus (CSE) and was hospitalized; cerebral and lung computed tomography (CT) showed lung cancer with cerebral metastasis. We conclude that the patient had a pattern of disease compatible with EOAD, but with increased neurologic motor deterioration associated with metastatic lung cancer. More details are given in S1 and S2 Appendices.
Histopathological stains. Conventional morphological study, staining reactions, and immunohistochemistry show the presence of classic neuritic plaques with dystrophic and pathological neurites in the frontal association isocortex (up to >20/mm2), in the temporal isocortex (up to 17/mm2), and in the parietal isocortex (up to 19/mm2), corresponding to an age-related plaque score of CERAD C [41]. There is immunohistochemical evidence of diffuse β-amyloid-positive deposits in the frontal neocortex, central gray matter (entorhinal region), and the CA4 region, corresponding to phase 4 as described by Thal et al. (2002) [73]. There is also scant detection of β-amyloid in vascular walls, involving the entire circumference of the vessels, and occasional vessels in a few regions with circumferential β-amyloid affecting parenchyma and meninges, corresponding to a CAA score of 1 according to Love et al. (2014) [74], Fig 1A and 1B. In the staining reactions as well as in the immunohistochemistry for the tau protein, the transentorhinal and entorhinal regions, the pyramidal layer (CA1-CA4), the dentate gyrus, striatum, and thalamus are affected, with involvement of the neocortex. Intraneuronal tangles and extracellular neurofibrillary threads are seen, corresponding to stage V according to Braak and Braak (1991) [5], Fig 1C and 1D. No Lewy bodies were observed in the different structures when stained for α-synuclein, corresponding to stage 0 [75]. There is no atherosclerosis of small arterial blood vessels in the central gray matter and centrum semiovale, and the perivascular neuropil is not significantly affected. Amylaceous bodies are observed in moderate amounts in the hippocampus. In the frontal cortex (middle frontal gyrus) and hippocampus, an infiltrating lesion is observed that destroys the morphology of the cortex and the white matter; it is composed of cuboid and highly mitotic cells.
When studying tumor tissue, highly vascularized, infiltrating tissue is observed, with abundance of mitotic cells, ovoid, fusiform, small, and hyperchromatic nucleus, compatible with metastatic lung tumor.

Genetic evaluation
Extended genealogy. We constructed a genealogy of five generations with 97 individuals, 11 of whom were affected. The mode of inheritance was suggestive of autosomal dominant transmission. We observed affected individuals in all generations, with men and women affected in similar proportions in the first three generations. Since AD is a disease that develops mainly in old age, individuals of the last two generations (20-38 years old) were asymptomatic, S1 Fig.
PSEN1 gene. No mutations were observed in any of the exons of the PSEN1 gene analyzed in the family with AD. The family members evaluated in this study are not carriers of the E280A mutation, the most prevalent mutation in Antioquia's population [76]. The polymorphism rs165932, located in intron 8 of the PSEN1 gene, was identified. Both the affected individual (III:5) and the unaffected individual (III:7) carried the variant, being heterozygous G/T, S2 and S3 Figs. The rs165932 polymorphism is considered benign according to the VEP tool (Variant Effect Predictor) of Ensembl, https://www.ensembl.org/Tools/VEP, Table 1.
Exome sequencing. WES revealed 71,854 variants in total, with 53,063 variants per individual on average. After applying the filters, an average of 41,834 SNPs (single nucleotide   Table. Of these variants, 45% are in intronic regions, 14% upstream, and 10% downstream of the gene regions. The remaining 55% correspond to variants located in coding regions; of these, 70% were classified as non-synonymous, of which 59% correspond to missense variants, 5% to frameshift variants, and 2% to stop-loss or stop-gain variants. , PP5 (Pathogenic supported) and PS3 (Pathogenic strong) simultaneously due to its high frequency (>0.05) in genome or exome sequencing projects and multiple lines of computational evidence suggesting that there is no impact on the gene or gene product; this variant is located in a region with a high mutation rate (hot spot) and/or in a well-established functional domain (for example, an enzyme's active site); Uniprot identifies this variant as associated with a disease (Hyperlipoproteinemia 3); and well-established in vitro or in vivo functional studies support a damaging effect on the gene or gene product. Finally, the ABCA7 variant c.G2629A:p.A877T was classified as a variant of uncertain significance (VUS) because no rule met the criteria, Table 3.
Validation and segregation analysis of genetic variants. The c.C2710T, p.R904W variant in the SORL1 gene and the c.G1691A, p.R564H variant in the MTHFD1L gene do not segregate in all affected family members and are present in some unaffected members. The c.C2710T, p.R904W variant in the SORL1 gene is present in two affected family members: (1) the index case (III:5), with onset at 63 years and death at 68 years, and (2) one of the sisters (III:9), with onset at 60 years and a current age of 70 years at the time of the study. The c.G1691A, p.R564H variant in the MTHFD1L gene is present in another two affected family members: (1) another sister (III:1), with onset at 60 years and a current age of 68 years, and (2) one of the brothers, with onset at 62 years and a current age of 66 years at the time of the study. The sister of the index case (III:8) who carries the c.C2710T, p.R904W variant in the SORL1 gene was evaluated at the age of 60 years and was considered healthy because she did not manifest memory complaints at the time of the study. The other two carriers of these variants are in the fourth generation and are daughters of the index case. The first daughter (IV:5) was 38 years old and the second daughter (IV:6) was 42 years old at the time of evaluation, and they were considered healthy because they were too young to have started the cognitive deterioration process. It is important to note that patients in this family began their cognitive decline after 60 years of age, Fig 2. None of the risk alleles we found were homozygous. Most of the individuals in the family are heterozygous for the APOE-ε4 haplotype. All individuals in the family are carriers of at least one APOE-ε4 allele. AD risk is increased 3-5 fold for heterozygous APOE-ε4 carriers. Also, the affected family members (III:5 and III:9), in addition to carrying one APOE-ε4 allele, carry the variant c.C2710T in the SORL1 gene in a heterozygous state. And the affected

Structural model of SORL1 protein.
In the RCSB Protein Data Bank database [77], the structure of the sortilin-related receptor (SorLA) is not completely resolved, and only one region has been resolved by X-ray crystallography, at a resolution of 2.35 Å. We used 3WSX, 3WSY and 3WSZ as templates for model building [78]. The structure was completed with a structural prediction method using the protein sequence stored on the Uniprot platform (ID: Q92673) [79,80]. The hypothetical structural model for the receptor was obtained with I-Tasser [81][82][83] and with the Phyre2 tool [71]. The best models for the wild type receptor (QMEAN6 = 0.3327 and Z-score = -9.5289) and SorLA-R904W (QMEAN6 = 0.3327 and Z-score = -9.5289) were obtained with the I-Tasser tool, and these models were refined with the ModRefiner tool, Fig 3A and 3B and S6 and S9 Tables. The Ramachandran plot shows that, for the wild type SorLA model, 72% of amino acids are in the favorable zone, 16.7% are in the allowed zone and 6.1% in the forbidden zone, and for SorLA-R904W, 83.0% of amino acids are in the favorable zone, 13.1% are in the allowed zone and 3.9% in the forbidden zone, Fig 3A and S7 and S9 Tables. Hydropathic index analysis of the SorLA protein showed that the amino acid change Arg904Trp produces a variation in the hydrophobicity and hydrophilicity scores in the flanking region (from amino acid 900 to amino acid 910) according to Kyte and Doolittle coefficients calculated with the Protscale software from the Swiss ExPASY suite [84], Fig 3D. The increase in hydrophobicity causes SorLA to fold in on itself by gaining non-covalent intramolecular forces: the antiparallel β-sheets, with their aromatic residues at accessible bonding distances of 3.748 Å and 4.737 Å, favor alternate stacking interactions between specific amino acid side chains, which explains the structural change from a chemical point of view and the possible alteration of protein function, Fig 3E.

Structural model of MTHFD1L protein.
This enzyme only has a structural fragment resolved by solid nuclear magnetic resonance (NMR), recorded in the RCSB Protein Data Bank database with the ID 2EO2 [77]. Therefore, we generated prediction models using the I-Tasser [81][82][83] and Phyre2 [71] tools, using the sequence reported in the Uniprot database (ID:Q6UB35) [79,80]. The best structural models for the wild type enzyme (QMEAN6 = 0.483446 and Z-score = -6.560672) and MTHFD1L-R564H (QMEAN6 = 0.483446 and Z-score = -6.560672) were built with the Phyre2 tool, and these models were refined with the ModRefiner tool, Fig 4A and 4B and S8 and S10 Tables. According to the Ramachandran plots, 83.8% of amino acids are in the favorable zone, 10.3% are in the allowed zone and 5.8% in the forbidden zone for the wild type MTHFD1L, and 86.3% of amino acids are in the favorable zone, 9.1% are in the allowed zone and 4.5% in the forbidden zone for MTHFD1L-R564H, Fig 4A and S9 and S10 Tables. Hydropathic index analysis [84] showed an increase in hydrophobicity from amino acid 558 to amino acid 568, due to the change from a charged polar amino acid to a weakly charged amino acid, Fig 4D. The 3D models of the wild type and MTHFD1L-R564H show an evident topological difference in the region adjacent to the variant, with the histidine less exposed. Additionally, the minimum energy values are 2441.98 kJ/mol and 2089.95 kJ/mol for the wild type and the variant protein, respectively, which means that the structure of the wild type is more stable.
Finally, although the change from arginine to histidine does not alter the basic nature of the functional group, the basic contribution of the imidazole group of histidine is shorter in binding distance than that of the di-amino group of arginine. This generates an acid-base interaction between glutamic acid and histidine at a distance of 2.319 Å, suitable for the formation of a hydrogen-bonded adduct that draws the structure together, possibly altering the functional domain of the protein, Fig 4E.
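The hydropathy shifts reported for both variants can be illustrated with a sliding-window average over the Kyte and Doolittle scale, in which arginine scores -4.5, tryptophan -0.9, and histidine -3.2. The sketch below is a minimal illustration, not the Protscale implementation: the window size of 9, the function names, and the 11-residue peptide are assumptions, and the peptide is hypothetical rather than taken from SorLA or MTHFD1L.

```python
from typing import List

# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def windowed_hydropathy(seq: str, window: int = 9) -> List[float]:
    """Mean hydropathy of every full window along the sequence."""
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def substitute(seq: str, index: int, new_aa: str) -> str:
    """Return seq with the residue at a 0-based index replaced."""
    return seq[:index] + new_aa + seq[index + 1:]


def profile_shift(seq: str, index: int, new_aa: str, window: int = 9) -> List[float]:
    """Per-window change in mean hydropathy caused by one substitution."""
    before = windowed_hydropathy(seq, window)
    after = windowed_hydropathy(substitute(seq, index, new_aa), window)
    return [a - b for a, b in zip(after, before)]


# Hypothetical 11-residue peptide with arginine at 0-based index 5.
PEPTIDE = "AGSLTRVDEKL"

if __name__ == "__main__":
    print(profile_shift(PEPTIDE, 5, "W"))  # arginine to tryptophan
    print(profile_shift(PEPTIDE, 5, "H"))  # arginine to histidine
```

Every window that contains the substituted residue shifts by the difference between the two residue scores divided by the window size, which is why a single substitution changes the profile across a flanking region of several residues rather than at one position only.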

Discussion
The extended family from Antioquia, Colombia studied here has approximately 97 individuals, with eleven affected family members. The age of onset of these patients was between 55 and 65 years, so they were diagnosed with EOAD. We found a brain with high-grade neurodegenerative changes in the form of: (1) diffuse β-amyloid-positive deposits, neurofibrillary tangles, and neuritic plaques with scores of A3, B3, and C2, which, according to the current National Institute on Aging classification, results in a "high change in neuropathology of AD considered as a sufficient explanation for dementia" [85]; (2) amyloid angiopathy with limited involvement of vessels, with scant involvement of meningeal and parenchymal vessels, indicating a very low vascular compromise associated with AD [74]; and (3) diffuse infiltration of small, highly mitotic cells that affect the morphology of the cortex and white matter and correspond to metastatic lesions of lung cancer [86]. In accordance with the current guidelines of the National Institute on Aging, the clinically known dementia of the patient is adequately explained by the "neuropathological changes of AD".
No previously reported variants in PSEN1 were found, including the E280A mutation, the most prevalent mutation in Antioquia's population [76]. We identified the rs165932 polymorphism located in intron 8 of the PSEN1 gene. The rs165932 polymorphism was initially reported as a risk factor for LOAD with an OR = 1.97, 95% CI 1.29-3.00. In this case, the T allele in the homozygous state confers a double risk for LOAD [87]. Although these findings were   [99,100] where the G allele has been identified as a risk factor for AD. This polymorphism has also been associated with EOAD [97,99,101], with results similar to those reported for LOAD. The affected and healthy individuals sequenced are both heterozygous for the T allele. Since these data are inconclusive for determining an association of the T allele with AD in this family, we decided to broaden the search to variants in other genes that may explain the development of the disease in the studied family.
The index case carries variants in SORL1, CHAT, ABCA7, LPA, and APOE genes. SORL1 encodes the Sortilin-related receptor, SorLA. This protein is a multifunctional endocytic receptor that is widely expressed in the central nervous system, in particular in the cerebellum, cerebral cortex, hippocampus, and caudate nucleus. It is involved in the uptake of lipoproteins and proteases and participates in APP traffic to and from the Golgi apparatus. Therefore, it probably acts as a sorting receptor that protects APP from trafficking to the late endosome and from processing into beta-amyloid peptide, thus reducing the load of amyloidogenic peptides.
Its structure consists of several domains with different functions: an N-terminal VPS10 domain (vacuolar protein sorting domain), important for the classification and transport of endosomal proteins, which can also interact with different neuropeptides and participates in the processing of APP; 5 LDL-receptor class B domains (low-density lipoprotein receptor YWTD domains), which play a central role in cholesterol metabolism; an epidermal growth factor precursor type repeat (EGF), a domain type that plays a vital role in immune responses as well as in the elimination of dead cells in the organism; 11 LDL-receptor class A domains, identified as the lipoprotein binding site; and 6 fibronectin type III domains (FNIII), involved in cell adhesion processes, cell morphology, thrombosis, cell migration, and embryonic differentiation, Fig 3C. SorLA interacts (via its N-terminal ectodomain) with APP, forming a 1:1 stoichiometric complex; this interaction retains APP in the trans-Golgi network and reduces processing into soluble APP and amyloid-beta peptides [102][103][104][105]. The R904W variant is in the extracellular region of the protein, specifically in the third LDL-receptor class B domain, which extends from amino acid 888 to 932 and is important in cholesterol metabolism and APP binding.
Both common variants with modest OR values and rare missense, stop codon, and protein-truncating variants (PTV) in the SORL1 gene have been associated with AD, in both familial and sporadic forms and in different populations [106][107][108][109][110][111][112][113]. The use of NGS techniques has contributed to the identification of variants associated with AD in this gene [32][114][115][116][117][118][119], including in the Alzheimer's Disease Sequencing Project (ADSP), which performed a complete sequencing of 5,740 patients with late-onset Alzheimer's disease and 5,096 cognitively normal controls, mainly of European descent and including 218 cases and 177 controls of Hispanic origin, and in which SORL1 has been one of the most prevalent genes [120]. However, the exact functional consequences of most of the rare variants identified, as well as their corresponding levels of risk for the development of AD, are yet to be determined. Only the effect of some of these variants, the PTVs that generate a loss of protein function, has been studied in vitro, showing variable degrees of decreased protein function that lead to an increase in the secretion of beta-amyloid peptide [106,121,122].
The mechanisms by which variants in the SorLA receptor may be associated with AD are the following: 1) SorLA binds APP through its LDL domain and can redirect it to the non-amyloidogenic pathway, inhibiting the formation of beta-amyloid peptide; and 2) SorLA binds nascent beta-amyloid peptides and directs them to the lysosome, preventing their secretion [104,106,121][123][124][125]. Functional studies of the p.Gly511Arg variant showed a disrupted interaction of the Vps10p domain with monomers of the beta-amyloid peptide, which reduced the lysosomal targeting of the beta-amyloid peptide by SORL1 [105]. The p.Glu270Lys and p.Thr947Met variants both showed an increase in the secretion of the Aβ1-40 and Aβ1-42 forms of the amyloid-beta peptide and in the levels of APP on the cell surface in transfected cell lines [118]. The analysis of the structural model of the Arg904Trp variant shows that the polarity changes in the flanking region can alter the protein structure, possibly affecting its function and its interaction with the membrane, and even blocking its functional domain.
CHAT encodes the enzyme choline O-acetyltransferase (ChAT), which is responsible for the synthesis of the neurotransmitter acetylcholine and whose expression in the CNS is therefore characteristic of cholinergic neurons. Its structure consists of a choline/carnitine acetyltransferase domain, which participates in the transfer of an acyl group from one compound (donor) to another (acceptor). The G1124A variant is found in exon 8 of the gene, and the R375Q change is located within the choline/carnitine acetyltransferase domain, which extends from amino acid 131 to 719, about 145 amino acids from the coenzyme A binding site, which covers the 13 amino acids from 520 to 532 of the enzyme. A moderate number of variants in CHAT associated with AD have been reported; among the best supported are rs3810950, rs2177369, rs1880676, and rs868750. The rs3810950 polymorphism has been associated with AD in more than a dozen studies in different populations, mainly in Asia, America, and Europe [126][127][128][129][130][131][132][133][134][135][136][137][138]. These variants can affect the synthesis of the enzyme, amplifying the cholinergic neurotransmission deficit in Alzheimer's disease [126][127][128][129][130][131][132][133][134][135][136][137][138]. The association of these variants in CHAT is also related to the response to acetylcholinesterase inhibitor (AChEI) therapy [139,140].
The affected sibling (III:10) of the index case does not share the variants identified in the index case; however, this individual is a carrier of the variant c.G1691A:p.R564H in the MTHFD1L gene. The protein encoded by MTHFD1L, methylenetetrahydrofolate dehydrogenase (NADP+ dependent) 1-like protein, is involved in folate metabolism, specifically in the synthesis of tetrahydrofolate (THF) in the mitochondria. THF is important in the de novo synthesis of purines and thymidylate and in the regeneration of methionine from homocysteine. This monofunctional enzyme consists of two main domains: an inactive N-terminal methylene-THF dehydrogenase and cyclohydrolase domain from amino acid 31 to 348 and an active C-terminal formyl-THF synthetase (FTHFS) domain from amino acid 349 to 978. The G1691A variant is found in exon 16 of the gene, and the R564H change is located in the active C-terminal formyl-THF synthetase (FTHFS) domain (Fig 4C). Elevated plasma homocysteine levels have been linked to AD [141,142] and other neurodegenerative diseases, including Parkinson's disease [143], and have been recognized as a risk factor for preeclampsia [144], diabetic complications [145], heart disease [146][147][148], and coronary artery disease (CAD) [149].
The mechanisms that may explain the association between folate metabolism and AD are the following: 1) folate is a cofactor in one-carbon metabolism, during which it promotes the remethylation of homocysteine, a cytotoxic sulfur-containing amino acid that can induce DNA strand breakage and oxidative stress, promoting the generation of reactive oxygen species (ROS) and cell death by apoptosis [150][151][152]; and 2) elevated homocysteine contributes to the risk of AD by causing vascular alterations, which have been directly related to AD, and can cause a cholinergic deficit in cortical neurons due to its toxicity [153]. MTHFD1L has also been associated with neural tube defects, including spina bifida, meningocele, encephalocele, and anencephaly, as a result of abnormalities in the proliferation, differentiation, and death of neural cells [154][155][156], and with adenocarcinoma, for which it has been considered a new molecular target for cancer therapy [157][158][159][160]. The R564H variant found in the exome analysis was evaluated in the structural model that was built. This variant produces an increase in hydrophobicity in the adjacent region and a change in the topology of the protein, generated by the possible formation of an adduct, which can ultimately alter the functional domain of the protein and therefore its function (Fig 4E).
The ABCA7 gene encodes ATP-binding cassette subfamily A member 7. This protein is responsible for the transport of various molecules across extracellular and intracellular membranes, playing an important role in lipid homeostasis and macrophage-mediated phagocytosis. This transporter has been detected predominantly in myelo-lymphatic tissues, with higher expression in peripheral leukocytes, thymus, spleen, and bone marrow, although it is also expressed in the CNS, where it participates in the clearance of beta-amyloid peptide by microglia and macrophages and limits the production of beta-amyloid by playing a role in the regulation of endocytosis and/or APP processing. Its structure includes two highly conserved ATP-binding domains (ATPase domains), the first (ABC transporter 1) located from amino acid 807 to 1038 and the second (ABC transporter 2) from amino acid 1793 to 2025, which use the energy released by ATP hydrolysis for the export or import of a wide variety of substrates, ranging from small ions to macromolecules. The G2629A variant is found in exon 19 of the gene, and the A877T change is in the first ATP-binding domain, which is important for ATP hydrolysis.
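The positions quoted for the SORL1, CHAT, MTHFD1L, and ABCA7 variants can be cross-checked with simple arithmetic. The following minimal sketch assumes that c. positions are numbered from the first base of the start codon and that the cDNA and protein positions of each variant refer to the same transcript. Under those assumptions it recomputes the codon number from the nucleotide position and tests whether the residue lies within the domain range stated in the text. The `Variant` class and the helper names are illustrative and are not part of the study's analysis pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    cdna_pos: int      # 1-based coding-sequence position (c. notation)
    residue: int       # amino acid position reported in the text
    domain: str        # domain name as given in the text
    domain_start: int  # first residue of the domain
    domain_end: int    # last residue of the domain


def codon_number(cdna_pos: int) -> int:
    """Codon index (1-based) containing a 1-based coding-sequence position."""
    return (cdna_pos - 1) // 3 + 1


def in_domain(v: Variant) -> bool:
    """True if the reported residue lies inside the reported domain range."""
    return v.domain_start <= v.residue <= v.domain_end


# Values copied from the text of this section.
VARIANTS = [
    Variant("SORL1", 2710, 904, "third LDL-receptor class B", 888, 932),
    Variant("CHAT", 1124, 375, "choline/carnitine acetyltransferase", 131, 719),
    Variant("MTHFD1L", 1691, 564, "formyl-THF synthetase", 349, 978),
    Variant("ABCA7", 2629, 877, "ABC transporter 1", 807, 1038),
]

for v in VARIANTS:
    consistent = codon_number(v.cdna_pos) == v.residue
    print(f"{v.gene}: c.{v.cdna_pos} -> codon {codon_number(v.cdna_pos)}, "
          f"matches p.{v.residue}: {consistent}, in {v.domain} domain: {in_domain(v)}")
```

A check of this kind only confirms internal consistency of the reported coordinates; it says nothing about the functional effect of the substitutions.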
Finally, the variant rs429358:c.T388C:p.C130R in the APOE gene was identified in several affected individuals of the family. This gene encodes apolipoprotein E, which is involved in lipid metabolism. The rs429358:c.T388C:p.C130R variant meets the strong pathogenic criterion (PS3), since well-established in vitro or in vivo functional studies support a harmful effect on the gene or the gene product, and the supporting pathogenic criterion (PP5), since reliable sources have recently reported the variant as pathogenic. This variant, together with the variant rs7412:c.C526T:p.R176C located in exon 4 of the APOE gene, defines the APOE haplotype (derived from the combination of rs429358 and rs7412). APOE-ε4 is the most important genetic risk factor for AD, and this risk increases with the number of copies of the allele. Heterozygous individuals, who carry one copy of the APOE-ε4 allele, have twice the risk, while homozygous individuals, who carry two copies, have eleven times the risk of developing the disease relative to carriers of the other APOE alleles, APOE-ε3 and APOE-ε2. The APOE-ε3 allele, the most common in the population, is considered neutral, while the APOE-ε2 allele, the least frequent in the population, is considered protective [161][162][163][164][165][166]. All family members have at least one copy of the APOE-ε4 allele, which doubles the risk of developing AD according to the literature [161][162][163][164][165][166].
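The way the APOE allele is derived from the two SNPs can be sketched in a few lines. The example below assumes phased haplotypes reported on the coding strand, so that the rs429358 C allele corresponds to p.C130R and the rs7412 T allele to p.R176C. The function names are illustrative, and the relative risks are simply the approximate values quoted in the paragraph above, not estimates from this family.

```python
# (rs429358 allele, rs7412 allele) on one chromosome -> APOE allele.
# The rare (C, T) combination is deliberately left out of this sketch.
HAPLOTYPE_TO_ALLELE = {
    ("T", "T"): "e2",  # Cys130 / Cys176
    ("T", "C"): "e3",  # Cys130 / Arg176
    ("C", "C"): "e4",  # Arg130 / Arg176
}

# Approximate relative risks quoted in the text, by number of e4 copies.
RELATIVE_RISK_BY_E4_COPIES = {0: 1, 1: 2, 2: 11}


def apoe_allele(rs429358: str, rs7412: str) -> str:
    """Return the APOE allele for one phased haplotype."""
    return HAPLOTYPE_TO_ALLELE[(rs429358, rs7412)]


def apoe_genotype(hap1: tuple, hap2: tuple) -> tuple:
    """Return the sorted pair of APOE alleles for two phased haplotypes."""
    return tuple(sorted((apoe_allele(*hap1), apoe_allele(*hap2))))


def e4_copies(hap1: tuple, hap2: tuple) -> int:
    """Count how many of the two haplotypes carry the e4 allele."""
    return apoe_genotype(hap1, hap2).count("e4")


# Hypothetical individual: one e4 haplotype and one e3 haplotype.
genotype = apoe_genotype(("C", "C"), ("T", "C"))
copies = e4_copies(("C", "C"), ("T", "C"))
print(genotype, copies, RELATIVE_RISK_BY_E4_COPIES[copies])
```

With unphased genotype calls the same lookup works for most individuals, but double heterozygotes are ambiguous unless phase is resolved.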
Although the implication of many of these genes as risk factors is much debated, when the susceptibility genes belong to the same signaling pathway, the risk associated with a multigenic disease can be better explained by the possible integrated effects of the variants in the genes that act in that pathway than by the individual effect of each variant in a single gene [167,168]. Recent studies have reported results similar to those we found in this study: 1) WES revealed no mutations in the PSEN1, PSEN2, and APP genes in any of the family members; 2) WES detected possibly pathogenic rare variants segregating in multigenerational families with autosomal dominant transmission, such as in the SORL1, ABCA7, and APOE genes; 3) the SORL1 variants were present in affected family members and in some unaffected family members, and, in some cases, affected noncarriers were reported, raising questions about the inheritance pattern or suggesting incomplete penetrance (Fig 2); and 4) the variants are located in highly conserved amino acids, affect an important functional domain, have a CADD score higher than 14, and were predicted to be deleterious by more than three pathogenicity predictors [169,170].
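The prioritization criteria listed in the fourth point can be expressed as a simple filter. The sketch below is one possible reading of those criteria, not the pipeline used in this or the cited studies: the `Candidate` fields, the function name, and the two example records are hypothetical, and only the thresholds (CADD score higher than 14, more than three predictors calling the variant deleterious) come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    cadd: float                 # CADD score of the variant
    deleterious_calls: int      # number of predictors calling it deleterious
    conserved_residue: bool     # affects a highly conserved amino acid
    in_functional_domain: bool  # lies in an important functional domain


def passes_filters(c: Candidate, min_cadd: float = 14.0,
                   min_predictors: int = 3) -> bool:
    """Apply the four criteria; both thresholds are strict (greater than)."""
    return (c.conserved_residue
            and c.in_functional_domain
            and c.cadd > min_cadd
            and c.deleterious_calls > min_predictors)


# Hypothetical records for illustration only; not data from the study.
CANDIDATES = [
    Candidate("hypothetical_variant_A", 25.0, 5, True, True),
    Candidate("hypothetical_variant_B", 9.0, 5, True, True),
    Candidate("hypothetical_variant_C", 25.0, 3, True, True),
]

selected = [c.name for c in CANDIDATES if passes_filters(c)]
print(selected)
```

In this example only the first record passes: the second fails the CADD threshold and the third has exactly three deleterious calls, which does not satisfy "more than three".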
As limitations of the study, the sample size is small: although the multigenerational family is very large, it was not possible to obtain biological samples from all affected and unaffected individuals for genetic analyses. Some members of the family live in rural areas with difficult access, and home visits could not be carried out. This also limited the additional tests needed to refine the diagnosis of all individuals in the family, since some were unable to travel to the city for these tests. Only individuals who had a complete clinical evaluation were included in the study. Finally, it is important to be clear that although WES is an effective tool for identifying genetic variants in families with Mendelian inheritance patterns, this type of study covers only 2% of the whole genome; therefore, the 98% of the genome that is non-coding remains unexplored. It is possible that genetic variants located in this region (introns, splicing regions, regulatory regions) contribute to the development of neurological diseases, and, considering that epigenetic mechanisms also play a role in mediating synaptic and neural network connectivity and plasticity, epigenetic mechanisms may also be involved in the molecular pathophysiology of these diseases. However, it was possible to carry out the clinical evaluation of a considerable number of individuals of the family across three generations and to confirm the diagnosis of AD in the index case, who donated the brain for the identification of amyloid plaques and neurofibrillary tangles in the histopathological examination. It was also possible to identify variants previously associated with AD, such as APOE:c.T388C:p.C130R, as well as new variants in genes previously associated with AD, such as SORL1:c.C2710T:p.R904W and MTHFD1L:c.G1691A:p.R564H. Nevertheless, additional studies are required to determine how these changes could affect protein function and whether they contribute to the development of the disease in this family. These questions are prospects for future studies.

Conclusions
We found possibly pathogenic genetic variants in the SORL1 and MTHFD1L genes and other risk variants in the CHAT, ABCA7, and APOE genes segregating in a Colombian multigenerational family with EOAD, suggesting an oligogenic model in which multiple genetic factors may interact in different biological pathways related to Aβ production and/or clearance, contributing to the risk of AD. The structural models built for the variants R904W in SORL1 and R564H in MTHFD1L show that these changes may produce polarity variations that favor hydrophobic interactions, resulting in local structural changes that could affect protein function and may contribute to the development of the disease in this family. Additional studies are required to determine how these changes could affect protein function and whether they contribute to the development of the disease in this family.