Whole genome sequencing and rare variant analysis in essential tremor families

Essential tremor (ET) is one of the most common movement disorders. The etiology of ET remains largely unexplained. Whole genome sequencing (WGS) is likely to be of value in understanding a large proportion of ET with Mendelian and complex disease inheritance patterns. In ET families with Mendelian inheritance patterns, WGS may lead to gene identification where WES analysis failed to identify the causative single nucleotide variant (SNV) or indel due to incomplete coverage of the entire coding region of the genome, in addition to accurate detection of larger structural variants (SVs) and copy number variants (CNVs). Alternatively, in ET families with complex disease inheritance patterns with gene x gene and gene x environment interactions enrichment of functional rare coding and non-coding variants may explain the heritability of ET. We performed WGS in eight ET families (n = 40 individuals) enrolled in the Family Study of Essential Tremor. The analysis included filtering WGS data based on allele frequency in population databases, rare SNV and indel classification and association testing using the Mixed-Model Kernel Based Adaptive Cluster (MM-KBAC) test. A separate analysis of rare SV and CNVs segregating within ET families was also performed. Prioritization of candidate genes identified within families was performed using phenolyzer. WGS analysis identified candidate genes for ET in 5/8 (62.5%) of the families analyzed. WES analysis in a subset of these families in our previously published study failed to identify candidate genes. In one family, we identified a deleterious and damaging variant (c.1367G>A, p.(Arg456Gln)) in the candidate gene, CACNA1G, which encodes the pore forming subunit of T-type Ca(2+) channels, CaV3.1, and is expressed in various motor pathways and has been previously implicated in neuronal autorhythmicity and ET. Other candidate genes identified include SLIT3 which encodes an axon guidance molecule and in three families, phenolyzer prioritized genes that are associated with hereditary neuropathies (family A, KARS, family B, KIF5A and family F, NTRK1). Functional studies of CACNA1G and SLIT3 suggest a role for these genes in ET disease pathogenesis.

Introduction Essential tremor (ET) is one of the most common neurological disorders. In most studies the prevalence of ET is markedly higher than that of Parkinson's disease (PD). The prevalence of ET is estimated to be 2.2% and as much as 4.6% in cases aged >65 years [1]. Similarly, the age specific incidence is reported to increase after the age of 49 years and reaches a maximum (84 per 100 000) in the ninth decade [2]. While the majority of studies do not show a gender difference for ET, a minority of studies show a statistically significant gender difference with a higher prevalence among men than women (crude male prevalence: crude female prevalence = 1.08:1) [1]. The defining clinical feature of ET is a kinetic tremor at 4-12 Hz. This tremor occurs in the arms and hands; it may also eventually spread to involve several cranial regions (e.g., the neck, voice, and jaw). Both genetic and environmental (i.e., toxic) factors are likely to contribute to the etiology of ET. The high heritability and aggregation of ET in families suggests a Mendelian pattern of inheritance [2][3][4][5]. Family studies indicate that on the order of 30-70% of ET patients have a family history with the vast majority (>80%) of youngonset (<40 years old) cases reporting >1 affected first-degree relative [6].
With the limited nature of this progress, the genetic etiology of ET still remains largely unexplained. Whole genome sequencing (WGS) is likely to be of value in furthering our understanding of a large proportion of ET where WES analysis has failed to identify the causative variant [17]. WGS which forgoes capturing is less sensitive to GC content and is more likely than WES to provide complete coverage of the entire coding region of the genome [18].
Here we report analysis of eight early-onset ET families (n = 40 individuals) enrolled in the family study of Essential Tremor (FASET) at Columbia University. The analysis included filtering on WGS data based on allele frequency in population databases, rare SNV and indel classification and association using the Mixed-Model Kernel Based Adaptive Cluster (MM-KBAC) test [19,20]. A separate analysis of rare SVs and CNVs segregating within families, prioritization of candidate genes identified within families using phenolyzer and functional studies of two candidate genes was also performed.

Study participants and clinical diagnosis
Study subjects and relatives were enrolled in a family study of ET at Columbia University NY, USA. The study was approved by the Institutional Review Board at Columbia University and written informed consent was obtained from all participants. Details of the study, criteria for enrollment, and diagnosis of ET has been described previously [15]. We selected a total of 8 families for WGS (n = 40 individuals), which included affected and unaffected first-degree relatives. A subset of the families (Families A, B and F) have been previously described in a WES study [15]. All affected individuals included in the study received a diagnosis of definite, probable or possible ET. Possible and probable ET family members were considered affected. The criteria we used, namely, the Washington Heights Inwood Genetic Study of ET (WHIGET) criteria are very strict [21]. All ET diagnoses (possible, probable and definite) required, at a minimum, moderate or greater amplitude kinetic tremor on at least three tasks, and an absence of other etiologies. As such, these criteria for all three categories of ET (i.e., possible, probable and definite) are even more stringent than those for definite ET that were outlined in the original Consensus Statement on Tremor of the Movement Disorders Society (published in 1998) [22] and the revised Consensus Criteria (published in 2017) [23]. The clinical characteristics of study participants are summarized in Table 1

Whole genome sequencing and quality control
Genomic DNA was isolated from peripheral blood cells using standard methods. Whole genome sequencing was performed on the genomic DNA of 4-5 individuals including affected Table 1. Clinical characteristics of affected ET individuals and unaffected family members that were whole genome sequenced in eight families.

Clinical characteristic ET patients n = 31
Unaffected n = 9

Total n = 40
Male n (%) 12 (39) 3 (33) 15 (38) Age at tremor onset mean years (SD) 27 [25]. Duplicate reads were removed using Picard (http://broadinstitute.github.io/picard/). Local realignment and quality recalibration was performed via GATK. Quality control checks for samples were performed according to GATK best practices. Previously, we performed WES [15] on a subset of the families (Families A, B and F) included in the current WGS analysis. For these families, WES did not identify the candidate genes identified in the current WGS study. In S1 Table and S2 Table, we summarize the overlap of variants identified in families in both WGS and WES datasets (S1 Table) and the reason variants were not called or prioritized in the WES dataset (S2 Table).

Variant filtering based on allele frequency in population databases
We filtered and removed variants with MAF�0.01 in all individuals in 1000 Genomes Phase 3 or the NCBI dbSNP common 147 database, resulting in a total of 3,777,271 rare variants across all samples (Fig 2).
Classification of rare variants based on variant type. Annotation of variants was performed based on reference sequence GRCh37 and RefSeq Gene transcripts of NBCI Homo sapiens Annotation Release 105 that was implemented in the Golden Helix SNP & Variation Suite (SVS) ver.8.2 (Golden Helix MT). Rare variants were classified into five groups, based on localization to a gene region and predicted effect on transcript and protein: 1) 5'-UTR and 3'-UTR (n = 26,872 variants in 8,299 genes), 2) nonsynonymous (n = 11,272 variants in 4,877 genes), 3) loss-of-function (LoF) (n = 1,365 variants in 711 genes), 4) synonymous (n = 5,854 variants in 3,164 genes), and 5) intronic (n = 1,174,082 variants in 16,486 genes). LoF variants were defined as follows: nonsense variants that introduce stop gain/loss of codons, variants that disrupt splice sites including canonical splice donor and acceptor sites and frameshift variants that disrupt a transcript's open reading frame.
Annotation of variants. Rare variants were assessed using several in silico tools including the Combined Annotation Dependent Depletion (CADD) tool [26] implemented in the Golden Helix SNP & Variation Suite (SVS) ver.8.6.0 (Golden Helix MT) (Fig 2). CADD measures deleteriousness of variants (coding and non-coding intronic) that is a property strongly correlated with molecular functionality and pathogenicity [27]. Variants were filtered based on a phred-scaled CADD score and variants with a phred-scaled CADD score>10, corresponding to the top 10% of deleterious substitutions relative to all possible variants in the human reference genome [26] were retained for further analyses. We also assessed deleteriousness of variants using several in silico tools including SIFT [28], PolyPhen2 [29], LRT [30], Mutation Taster [31], FATHMM [32], PROVEAN [33], MetaSVM and MetaLR [34] as implemented in the Golden Helix SNP & Variation Suite (SVS) ver.8.6.0 (Golden Helix MT). Only variants with a phred-scaled CADD score>10 and/or predicted to be deleterious or damaging by �1 in silico prediction tool were retained for further analysis.

Non-coding intronic variants in DNase I hypersensitivity and transcription factor binding sites.
We performed further evaluation of non-coding intronic variants by assessing whether these variants are enriched in DNase I hypersensitive sites that represent open chromatin regions accessible to transcription factors. We downloaded the wgEncodeRegDnase-ClusteredV3 table from the DNAse Clusters track which contains DNaseI Hypersensitive Sites in 125 cell types in ENCODE (http://genome.ucsc.edu/cgi-bin/hgTables) [37].
Residual variation intolerance score (RVIS). We assessed the candidate genes identified in this study to determine whether they are intolerant to variants by applying the residual variation intolerance score (RVIS) [38].

MM-KBAC analysis
We performed a rare variant classification and association analysis using the regression and permutation based Mixed-Model Kernel-Based Adaptive Cluster method (MM-KBAC) [19], and the within gene interaction model to analyze rare functional variants, as implemented in SVS ver.8.6.0 (Golden Helix MT) (Fig 2). KBAC catalogs rare variant data within a gene region/transcript (genome-wide) into multi-marker genotypes and determines their association with the phenotype, weighing each multi-marker genotype by how often that genotype was expected to occur according to control and case data and the null hypothesis that there is no association between the genotype and the case/control status. Thus, genotypes with high sample risks are given higher weights that potentially separate causal from non-causal genotypes. The logistic mixed model approach for KBAC to adjust for family structure and relatedness was used and has been described previously [20]. Possible and probable ET family members were considered affected. The control population used included unaffected family members. A p value was assessed by an adaptive permutation procedure in association tests [19]. The test applied 10 000 permutations and an adaptive permutation threshold of alpha 0.01 and used the earliest start position and the last stop position of all transcripts to define a gene based on the RefSeq Gene transcripts 105v2 NCBI. By default, variants flanking (proximal and distal) the gene region up to a distance of 1000 bp were included in the analysis. We selected genes with a p value<0.05 for further analysis.
The analysis was performed separately for variants classified by variant type in the dataset. When MM-KBAC analysis was performed separately for variants based on variant type (nonsynonymous, LoF, 5'UTR and 3'UTR, synonymous and intronic) the total number of genes with p value <0.05 was 163.

Co-segregation of variants with ET within families
Variants identified from the MM-KBAC analysis, that were annotated with a phred scaled score>10 by CADD (coding and non-coding intronic variants) and/or predicted by in silico prediction tools to be deleterious or damaging (coding variants) were assessed for co-segregation with ET within families (Fig 2). The criteria that we used to define co-segregation is as follows: 1) the annotated variant was present in all affected ET individuals and 2) absent from unaffected individuals within a family.
Sanger sequencing was used to validate and confirm single nucleotide variants and small indel variants within a family and to genotype family members with available DNA that did not have WGS data.
Genes harboring variants that were annotated with a phred scaled score>10 by CADD (coding and non-coding intronic variants) and/or predicted by in silico prediction tools to be deleterious or damaging (coding variants) and that co-segregated with ET within single family were prioritized for phenolyzer.

Prioritization of candidate genes using phenolyzer
Phenolyzer is a computational tool that uses prior information to implicate genes involved in diseases [39]. Phenolyzer exhibits superior performance over competing methods for prioritizing Mendelian and complex disease genes based on disease or phenotype terms entered as free text. The most disease relevant genes, considering all reported gene-disease relationships, are shown as seed genes. Predicted genes are input (seed) genes that are expanded to include related genes on the basis of several gene-gene relationships (e.g. protein-protein interactions, biological pathway, gene family or transcriptional regulation). The following disease/phenotype terms were used: Tremor, Essential Tremor, Parkinson's disease, Channelopathy, Epilepsy, neurological, neurodegenerative, Spinocerebellar ataxia, Fragile X Associated Tremor Ataxia Syndrome, brain, cerebellar diseases. For each family, candidate genes with prioritized variants were uploaded as input for phenolyzer analysis. The gene disease score and gene prediction score system is described in Yang et al., 2015 [39]. Phenolyzer generates raw and normalized scores for seed and predicted genes [39].

SV and CNV detection algorithms
CNV calling. To identify germline genic SVs and CNVs from short read WGS data we adhered to the recommended best practices workflow described by Trost et al., 2018 [40].
The SV and CNV-detection algorithms used were Canvas version 1.19.1 and Genome STRucture in Populations (Genome STRiP) version 2.0. The Canvas algorithm was run using the Germline-WGS module. B-allele frequencies were from the dbsnp.vcf provided with Canvas software. Custom parameters were used for the Canvas Bin and Canvas partition algorithms. The Binary Bin algorithm and the Circular Binary Segmentation Partitioning algorithm were used. Low confidence calls with low supporting bins (BC<8) were filtered from the VCF before analysis. For the GenomeSTRiP CNV command module a minimum refined size of 500bp was used as a cutoff. For the GenomeSTRiP SV command module, SVs were called in two size ranges as recommended by best practices (100-100,000bp and 100,000-10,000,000bp). The overlap and intersect of SVs and CNVs from the two algorithms was determined per sample.
To identify rare genic SVs and CNVs segregating with ET in each family a family based analysis was performed using Clinical Reporter in Opal (Fabric genomics, Inc.). The criteria that we used to define co-segregation is as follows: 1) the annotated CNV was present in all affected ET individuals and 2) absent from unaffected individuals within a family.
Selecting rare or novel CNVs. Common variants were removed following comparison with the Database of Genomic Variants (DGV) and 1000 Genomes CNV calls, and using a criterion of 50% or more reciprocal overlap with population CNVs with 1% or higher frequency. BEDTools [41] was used to identify called genic CNVs that overlapped with variants in databases.
Reducing false calls To minimize false calls, rare genic CNV calls (consensus of Genome STRiP v2.0 and Canvas v1.38.0) from each affected individual was used to query calls in affected and unaffected family members for the same or similar breakpoints. If the same CNV (consensus of the two tools) was present in affected individuals and absent from unaffected individuals it was included in the final list.
Annotation of CNV calls. Annotation of CNV calls was performed using Opal (Fabric genomics, Inc.) and included variant effect predictor (VEP) which determines the effect of the variant on genes, transcripts and protein sequences as well as regulatory regions [42], population databases including the Database of Genomic Variants (DGV) and dbVar (National Center for biotechnology information), and literature evidence (OMIM, ClinVar, COSMIC, etc).
De novo CNV calling. De novo CNV calling was not performed as sequence data was not available from both parents. Trio families were not included in the analysis.

Functional studies of SLIT3 and CACNA1G
Slit Drosophila lines. Generation of drosophila slit lines was performed by GenetiVision Corporation (Houston, TX). To determine the pathogenicity of the SLIT3 variant (c.3505G>C (NM_001271946.1), p.(Val1169Leu)(Drosophila slit p.Val1187Leu; NP_001261017.1) that we identified in family D, Drosophila slit lines were created via two steps of CRISPR-Cas9 mediated homology directed repair (HDR) events. In the first step, two crispr/cas9 targets were designed to delete a 183 bp fragment containing the V1187. Two guide RNAs (TGCTT CCAACCGAACGCTCA & AATGGAATTCTCATGTACGA) were cloned into pCFD3 vector (http://www.flyrnai.org/tools/grna_tracker/web/files/Cloning-with-pCFD3.pdf) and a donor DNA was created with our GFP cassette flanked by two 1 kb SLIT sequences beyond cleavage sites. Upon co-injection of both DNA constructs, two gRNAs will be expressed to direct the double strand break (DSB) by cas9 (endogenously expressed in the injection stock BL#54591). After DSB, the GFP cassette was inserted into SLIT genome via donor DNA mediated recombination. In the second step, based on the same principle, GFP KI cassette was substituted by a 420 bp DNA fragment containing V1187L point mutation using a new set of gRNAs (CCGCT GTCCAGACGACTATA & CGATGGAAAGTACCATGCCG) and new donor DNA. A molecular screen to confirm the mutation involving 20 individual mating crosses followed by genomic DNA isolation of the founder, PCR and sequencing was used to confirm the presence of the mutation. We independently confirmed the presence of the mutation by Sanger sequencing. Briefly, flies (n = 2) were homogenized in AL buffer from the DNeasy Blood and Tissue kit (Qiagen, Germany) and processed for genomic DNA according to the manufacturers instructions. PCR amplification of the region containing the point mutation was amplified and cloned using the Original TA cloning kit (Invitrogen, CA). DNA isolated from randomly selected clones were Sanger sequenced. Sequencing chromatograms showing the point mutation are provided as supplementary data (S1 Fig).
Negative geotaxis climbing assay. The loss of climbing response was used to monitor aging related locomotor changes in Drosophila [43,44]. The climbing assay was performed as previously described [43,44]. We assessed 10 flies per vial for each slit mutant line and control line. 5 trials were conducted for each vial. The average climbing rate was determined by measuring the first fly to climb 17.5cm. Climbing response was assessed at the following time points: Day 7, 14, 21, 28, 35 and 42.
Lifespan assay. Lifespan assays were performed as described previously [45]. Briefly, 100 virgin male flies per line were sex-segregated within 4h of eclosion and maintained in small laboratory vials (n = 20 per vial) containing fresh food in a low-temperature incubator at 25˚C and 40% humidity on a 12/12h dark/light cycle. The flies were then transferred to fresh food vials every 2-3 days and mortality recorded.

Ca V 3.1 electrophysiology
Transfection and cell culture procedures. To determine the functional effects of CAC-NA1G variants identified in ET families on electrophysiology studies by whole cell patch clamp recordings was performed in HEK293 cells expressing the Ca V 3.1 mutant or wild type channels. The variants rs200317339 (c.2271G>A; p.Gly627Arg), rs116920450 (c.1759G >A; p.Arg456Glu) and rs150972562 (c.4096G>A; p.Arg1235Gln) were introduced into human wild-type CACNA1G (isoform 1;BC110995.1, NM_198382.2) by site directed mutagenesis (KIT details). HEK293 cells were cultured in 150mm dishes until 75% confluence. For the transfection reaction cells were harvested, counted and resuspended in MaxCyte buffer to reach 50 million cells per ml. DNA of wild type or mutant Cav3.1 channels were added to the cell suspension to a final concentration of 100μg per ml. Electroporation was performed using the MaxCyte machine. Following the reaction cells were transferred into a 48-well plate to recover in electroporation buffer containing DNase for 20 minutes and in recovery media for 60 min both steps at 37˚C. After recovery cells were counted and plated in 35mm dishes for manual patch clamp studies. Electrophysiological studies were performed 24-72 hours post plating.

Electrophysiology.
Experiments were performed at room temperature or near-physiological temperature. Four voltage protocols (1) activation kinetics, 2) deactivation and inactivation kinetics, 3) steady state inactivation and 4) voltage dependence of recovery from inactivation) were applied to at least four cells expressing the wild type channel (n = 4) and four cells expressing the mutant channel (n = 4). For the voltage protocol designed to investigate activation kinetics currents are activated from a -100 mV holding potential by 60 ms step pulses in 10mV increments up to +70mV. Investigation of deactivation and inactivation kinetics was performed by maximal activation of currents by 2 ms step pulses to +60mV (P1) followed by 38 ms pulses (P2) from +60mV to -120mV in 10mV steps. The current activated at P3 gauges the amount of inactivation developed during P2. To investigate steady state inactivation, the ratio test (P2) to control (P1) currents activated by 5ms step pulses to -20mV was used as an indicator of the fraction of channels inactivated during the 1s pulses (P2) from -120 to -40mV. In protocol 4, to investigate the voltage dependence of recovery from inactivation, inactivation was induced by a 60 ms step pulse to -20mV (P1) and gauged by the current elicited by a 10 ms pulse to -20 mV (P3) after a variable period of recovery at -120, -80 and -70 mV (P2).
Data analysis. Data acquisition and analyses was performed using the suite of pCLAMP (Ver. 8.2) programs (MDS-AT, Sunnyvale, CA).
Activation. Activation was parametrized by the voltage dependence of the peak current amplitude and the time to peak (TP) of currents activated using protocol 1. The voltage dependence of peak negative current amplitude was fitted to the following equation: Where V is the membrane potential in mV, Vhalf is the membrane potential where half of the channels are activated, k is the slope factor in mV, Gmax is the maximal conductance in nS and Er is the apparent reversal potential in mV.
The voltage dependence of the rate of channel opening parametized by TP was fitted to the following equation: Where V TP is the voltage at which TP is equal to 1+TP1 and k is the voltage sensitivity in mV.
Deactivation. Channel closing was assessed using currents activated with protocol 2. The voltage dependence of channel closing was evaluated fitting the time constants of tail currents between -70mV and -120mV or the minimal membrane potential where tail currents were resolved with the following equation: Where Vτ deact is the potential at which τ is equal to 1 and k is the voltage sensitivity in mV. Inactivation. Channel inactivation will be parametrized by 1) the voltage dependence of the steady state inactivation curve from currents elicited using protocol 3, 2) the time constant of inactivation from currents recorded at positive potentials during the P2 step described in protocol 2 and 3) by the voltage dependence of the rate of recovery from inactivation measured using protocol 4.
The voltage dependence of steady state inactivation will be fitted to the equation: Where P2/P1 is the ratio of peak currents elicted by the test and control steps, V is the membrane potential in mV, V half is the membrane potential where 50% of the channels are inactivated and k is the slope factor in mV.
The voltage dependence of the inactivation time constants will be evaluated by plotting the functional relationship between τ inact and V emphasizing on possible changes in the voltage -independent limiting rate.
The rate of recovery from inactivation will be evaluated fitting a single exponential function according the following equation: Where P3/P1 is the ratio of peak currents elicited by the test and control steps, A is the P3/ P1 ratio after complete recovery at the corresponding potential and τ rec is the characteristic time constant of recovery from inactivation at a particular voltage.

Availability of data
All phenotype data generated from this study has been deposited in the database of Genotypes and Phenotypes (dbGaP; http://www.nlm.nih.gov/gap) of the National Center for Biotechnology Information. The study titled 'Identification of Susceptibility Genes for Essential Tremor' received the dbGaP Study Accession: phs000966.v2.p1. Additionally, all deidentified WGS data and related meta data underlying the findings reported in this manuscript are available at the public repository Dryad (datadryad.org) with DOI number doi:10.5061/dryad.td8d20v.

Results
To identify candidate genes in ET we conducted WGS in 40 individuals from 8 families with multiple affected ET members (Table 1 and Fig 1). Datasets were generated based on filtering of variants on allele frequency in population databases (Fig 2). To identify and prioritize genes in the ET families we performed rare variant classification and association analysis using the Mixed-Model Kernel Based Adaptive Cluster (MM-KBAC) test [19] followed by phenolyzer [39].

Rare variant classification and association analysis of rare variants with MAF<0.01
After QC and variant filtering, a total of 3,777,271 variants were selected for the subsequent analyses (Fig 2). By MM-KBAC analysis, we obtained 1,325 genes with p value<0.05 (with-in gene association) and 3,779 variants located within these genes. Of those, 783 variants were annotated with a phred scaled score>10 by CADD and 95 variants were predicted by in silico prediction tools to be deleterious or damaging.
We assessed the following variant types: 1) nonsynonymous, 2) LoF, 3) 5'UTR and 3'UTR, 4) synonymous and 5) intronic variants. Variants identified from the MM-KBAC analysis, that were annotated with a phred scaled score>10 by CADD and/or predicted by in silico prediction tools to be deleterious or damaging and that co-segregated within the ET families are shown in Table 2. A total of 168 variants located in 163 genes co-segregated with ET within families.
Nonsynonymous variants. We conducted the MM-KBAC analysis on 11,272 rare nonsynonymous variants in 4,877 genes and obtained a total of 316 genes with p<0.05. After annotation of variants, we obtained 87 variants that co-segregated within families. One variant in Table 2. Variants identified in families co-segregating with ET based on MM-KBAC analysis of rare variants by variant type.   COPZ2 was removed from analysis based on the MAF>0.01 reported in ExAC although it was not reported in the 1000 Genomes data. LoF variants. The analysis was performed on 1,364 rare LoF variants located in 711 genes and a total of 60 genes were obtained with a p<0.05. Following annotation, 13 deleterious variants co-segregated within families (Table 2). For Indel variants, BAM files were manually examined using the Genome browser in SVS v8.6 (Golden Helix) to verify the variant.

Variants in 5'-UTR and 3'-UTR regions.
The MM-KBAC analysis was conducted on 26,872 rare variants in 8,299 genes and 409 genes were obtained with a p<0.05. Following annotation of variants and analysis of co-segregation, 25 variants co-segregated within families ( Table 2). Synonymous variants. The analysis was performed on 5,854 synonymous rare variants located in 3,164 genes and a total of 216 genes with a p value<0.05 were obtained. Following annotation, a total of 35 variants co-segregated within families (Table 2). A variant in ASB16 was excluded from the analysis based on the allele frequency reported in ExAC (MAF = 0.0278). We also investigated whether synonymous variants were located in splicing enhancer and silencer regions within genes. The variants c.429G>A (NM_006024.6), c.3606C>T, c.1809G>A and c.177G>A were identified in enhancer regions in the TAX1BP1, PDS5B, NTN1 and TECR genes respectively and c.72C>T (NM_02817.2), c.846G>A (NM_152782.3), and c.861C>T (NM_024830.3) were located in splicing silencer regions in the HAPLN2, SUN3, and LPCAT1 genes respectively (Table 3).
Intronic variants. The MM-KBAC analysis was conducted on 1,174,082 intronic rare variants located in 16,486 genes and 324 genes with a p value<0.05 were obtained. Following annotation and co-segregation analysis, we obtained a total of 14 deleterious variants that cosegregated within families (Table 2).
DNAse I hypersensitivity sites and transcription factor binding sites. Genetic variants can affect transcription factor binding sites (TFBS), particularly via their enrichment in DNase I hypersensitive sites (DHS) that provide open chromatin access to transcription factors. Thus we sought variants that could be enriched at these sites using TFBS conserved data in ENCODE [46]. We asked whether the 169 variants (MM-KBAC analysis by variant type, and that includes annotated variants that co-segregated within ET families) identified from our analyses were found in DHS. 67 variants in 65 genes were in DHS. These 67 variants comprised 6 of 67 (9%) 5'-UTR variants; 6 of 67 (9%) 3'-UTR variants; 3 of 67 (4%) were LoF variants; 36 of 67 (54%) were nonsynonymous variants; 12 of 67 (18%) were synonymous; and 4 of 67 (6%) intronic variants.
DHSs are enriched with transcription factor binding sites (TFBSs), crucial sequences for the regulation of gene expression. Cross species conservation of genomic sequence has been successfully used for identifying biologically functional TFBS [47]. We identified 40 variants within TFBS (Table 4).
Family E. No annotated (phred-scaled CADD score >10 or predicted deleterious or damaging by in silico tools) segregating rare deleterious variants were identified in Family E.
Family G. The top ranked and predicted seed gene is CRK (raw score 0.6991; normalized score 0.073) based on disease mapping in DISGENET, protein interactions (PUBMED 16713569; yeast 2-hybrid with ATXN1, score 0.004856), the same biosystem (e.g. signal transduction; NGF signaling via TRKA form the plasma membrane; signal transduction; signalling to ERKs; signalling by NGF; neurotrophic factor-mediated Trk receptor signaling), the same gene family (e.g. SH2 domain containing) or transcription interactions.
CACNA1G. We evaluated all candidate genes prioritized by phenolyzer in a previously published WES dataset of ET families [15]. We identified two additional families (S3 Fig). One family had a non-synonymous variant in CACNA1G (c.3635G>A (NM_018896.4), p. (Arg1212Gln)), rs150972562) that is highly conserved evolutionarily and is predicted to be deleterious or damaging by several in silico tools (provean (score: -3.62), SIFT (score: 0.002) and Mutation Taster (disease causing)) that co-segregated with ET. The second family, also had a non-synonymous variant in CACNA1G (c.1879G>A (NM_018896.4), p.(Gly627Arg), that is highly conserved evolutionarily (GERP score: 5.7199) and is predicted to be damaging (DANN score: 0.9977) that co-segregated with ET. These CACNA1G variants were apparent retrospectively but was not identified in the prior analysis using the bioinformatics pipeline or analysis methods applied in the WES study. The allele frequency of rs150972562 in the Genome Aggregation Database (gnomAD) database is 0.001473 (413/280340+1 homozygote), and for the c.1879G>A variant is 0.001356 (308/227140+1 homozygote) which is below the estimates of the disease prevalence of ET at 2-4%.

Rare CNVs segregating with ET in families
We detected a total of 7 rare genic CNVs that segregated with ET in 5 families (Families A, C, F, G and H)( Table 5). Four CNVs were copy number gains and three were deletions. The size of CNVs ranged from~4Kb to 17Kb. We did not identify rare genic CNVs >100Kb that segregated with ET in families. CNVs in known ET associated genes (e.g. LINGO1) or other neurodegenerative genes were not identified. In family A, a 4.8Kb deletion in intron 2 of the GUCY1A3 gene was identified in addition to two duplications in intron 2 of KANSL1. The duplications in KANSL1 span ENCODE annotated H3K27AC marks, DNAse hypersensitive clusters and transcription factor binding sites that are often found near regulatory elements and promoters. In family C, a 3.9kb deletion spanning DNase hypersensitive clusters~10kb downstream of SOD2 was identified. In family F a 17.3kb deletion spanned several genes including CDK11A, CDK11B, AK097814, intron 5/exon 6 of SLC35E2A and intron 1 of SLC32E2B. Intronic duplications in TAOK1 and C1ORF185 were identified in families G and H respectively.

Functional studies
Slit Drosophila model and nervous system dysfunction. To determine the pathogenicity of the SLIT3 variant (c.3505G>C (NM_001271946.1), p.(Val1169Leu)(Drosophila slit p. Val1187Leu; NP_001261017.1) that we identified in family D, Drosophila slit lines were created. To test the hypothesis that the slit p.Val1189Leu variant causes nervous system dysfunction we evaluated climbing response throughout the fly lifespan. Flies expressing the mutant slit (p.Val1187Leu) compared to wildtype slit displayed significantly slower climbing (p<0.05) throughout lifespan (Fig 3) suggesting that the slit p.Val1187Leu mutation causes age related locomotor changes. Because ET is associated with increased mortality we performed survival assays. Significant differences in lifespan were detected between flies expressing the mutant slit (p.Val1187Leu) compared to wildtype slit (p<0.0001) suggesting that the slit p.Val1187Leu mutation is associated with reduced lifespan (Fig 3).
Ca V 3.1 electrophysiology. To determine the functional effects of CACNA1G variants identified in ET families, electrophysiology studies by whole cell patch clamp recordings was performed in HEK293T cells expressing the Ca V 3.1 mutant channels. At room temperature or at near physiological temperatures the current voltage relationship and the time measured from the beginning of the depolarizing pulse to the peak of the inward current (time to peak) (Fig 4) observed in mutant and wild type channel variants showed no significant differences. Kinetics of channel closing (time constant of deactivation and inactivation) and steady state inactivation (Fig 4) were similar between wild type and mutant channel variants. The voltage dependence of Ca V 3.1-1235 channel activation showed a trend in altered channel function with a small shift to more negative values (Fig 4) and channel deactivation was shifted to positive values (Fig 4).

Discussion
In this study, we applied the MM-KBAC test [19] to analyze rare SNVs and two SV and CNVdetection algorithms Canvas version 1.19.1 and Genome STRucture in Populations (Genome STRiP) version 2.0 in the WGS data generated from eight early-onset ET families enrolled in the family study of Essential Tremor (FASET). While numerous methods have been described for rare variant analysis in case-control studies, relatively few methods exist for family-based studies. The advantages of family-based studies are their robustness to population stratification [48], and the use of information about transmission of genetic factors within families, which is more powerful than population-based case control studies [49]. Genes identified by MM-KBAC analysis in ET families were prioritized using phenolyzer. Phenolyzer prioritizes candidate genes based on disease or phenotype information. Phenolyzer includes multiple components, including a tool to map the user-supplied phenotype to related disease, a resource that integrates existing knowledge on disease genes, an algorithm to predict previously unknown disease genes, a machine learning model that integrates multiple features to score and prioritize candidate genes and a network visualization tool to examine gene-gene and gene-disease relationships [39]. Previously, we performed WES [15] on a subset of the families (Families A, B and F) included in the current WGS analysis. For these families, WES did not identify the candidate genes identified in the current WGS study. There are several reasons why variants and candidate genes could have been missed in the prior WES analysis. Recently published studies [18,50] suggest that WGS is more powerful than WES for detecting potential disease-causing mutations within WES regions, particularly those due to SNVs. WGS which forgoes capturing is less sensitive to GC content and is more likely than WES to provide complete coverage of the entire coding region. Other factors that can affect variant and candidate gene identification include the bioinformatics pipeline (GATK version and implementation options) used and statistical analysis methods (WES study pVAAST [15,51] versus WGS in the current study: MM-KBAC [51]). Although, WGS is currently more expensive than WES (WES 50x coverage, Illumina NovaSeq 6000 = $600/sample versus WGS 30x coverage, Illumina HiSeq X = $1,575/sample; price includes library preparation, sequencing, QC of data and bioinformatics and short term data storage) advantages include the capture of non-coding (intronic and intergenic) and regulatory (5'UTR, 3'UTR, promoter and enhancers) variants in addition to coding variants and analysis of structural variants (SVs) and copy number variants (CNVs). Rare genic CNVs that segregated with ET were identified in 5/8 (62.5%) families.
Recently, we reported copy number variation of LINGO1, a susceptibility gene for ET, in a large 5 generation South Indian family with upper limb postural tremor (dystonic tremor) [52]. We did not identify CNVs in LINGO1 or other known neurodegenerative genes in the ET families in the current study. Additional studies will be needed to determine the significance of the CNVs identified in the ET families in the current study.
In the current study, within each ET family, we generated a prioritized candidate gene list that can be considered for functional studies. In family H, CACNA1G is predicted to be the most disease relevant seed gene because it maps to Spinocerebellar ataxia 42 (SCA42) in OMIM (OMIM 616795). CACNA1G is also a genetic modifier of epilepsy [53,54]. The identification of two additional families, with a deleterious/damaging CACNA1G variants, from a previously published WES dataset strongly suggests that CACNA1G may be a susceptibility gene for ET. SCA42 is an autosomal dominant neurologic channelopathy disorder characterized predominantly by gait instability, tremor (i.e. intention, postural, head, and resting) and additional cerebellar signs (i.e. dysarthria, nystagmus and saccadic pursuits), and is caused by a heterozygous mutation in CACNA1G. There is variable age at onset (range 9->78 years) and slow progression of the disease. We reviewed the clinical data in the CACNA1G families for the characteristic signs of SCA42 including ataxia, gait instability and ocular signs [55][56][57]. None of the individuals with ET in these families exhibited these problems, suggesting that these families do not have SCA. On the other hand, neuropathologic studies available for an 83 year old affected individual with SCA42, who also had dementia, showed cerebellar atrophy with Purkinje cell loss and loss of neurons in the inferior olive [55], which in terms of the Purkinje cell loss, is consistent with neuropathological findings of some ET patients [58].
The CACNA1G gene encodes the pore forming subunit of T-type Ca(2+) channels, Ca V 3.1, and is expressed in various motor pathways and may serve different functions [59]. The T-type calcium channel, Ca v 3.1, has been previously implicated in neuronal autorhythmicity [60,61] and is thought to underlie tremors seen in Parkinson's disease [62], enhanced physiological tremor, and in ET [63] and T-type calcium channel antagonists have been shown to reduce tremor in mouse models of ET [61,64,65].
The identification of CACNA1G in three ET families in the current study is consistent with recent reports of mutations in other ion channel genes in other ET families and the concept that the ETs are channelopathies [14,15]. Electrophysiology studies of the Ca V 3.1 mutant channels identified in ET families in the current study either showed small differences or no change compared to the wild type channel. The lack of significant differences may reflect the small sample size (number of cells sampled) and that the study was underpowered. However, the voltage dependence of Ca V 3.1-1235 channel activation showed a trend in altered channel function with a small shift to more negative values and channel deactivation was shifted to positive values. The other two Ca V 3.1 mutant channels identified in this study are located in the I-II loop of the Ca V 3.1 channel which is associated with channel trafficking [66] and these variants may effect trafficking to the membrane or cytosolic organelles.
We previously reported the identification of a mutation in Kv9.2 (KCNS2), that encodes an electrically silent voltage-gated K + channel α subunit, in a family with pure ET [15]. Kv9.2 is highly and selectively expressed in the brain and modulates the activity of Kv2.1 and Kv2.2 channels, which play a major role in membrane excitability and synaptic transmission and is critical for motor control and other neuronal network functions [67]. In two families with atypical ET, mutations were also identified in genes encoding voltage-gated sodium channel alpha subunits. In a family with epilepsy and ET, a disease-segregating mutation p.(Gly1537-Ser) in the SCN4A gene was identified and functional analyses demonstrated more rapid channel kinetics and altered ion selectivity, which may contribute to the phenotype of tremor and epilepsy in this family [14]. In a four generation Chinese family, with early onset familial episodic pain and ET, a gain-of-function missense mutation p.(Arg225Cys) in SCN11A was identified [68]. Collectively, identification of mutations in a T type Ca(2+) channel (CACNA1G; three families, this study), a voltage-gated K + channel α subunit (Kv9.2; KCNS2, 1 family), and voltage-gated sodium channel alpha subunits (SCN4A and SCN11A) in ET families (five total to date) is emerging evidence that problems in regulation of membrane excitability and synaptic transmission, which are important more broadly for motor control and other neuronal network functions, could play a role in the pathophysiology of ET. The genetic basis of ET has so far remained elusive. Given the clinical and genetic heterogeneity observed in ET [11][12][13][14][15][16], evaluation of ion channel genes as candidate genes for ET is warranted.
In family D, SLIT3 is predicted to be the most disease relevant gene. A disease association with SLIT3 in OMIM has not been described. The non-synonymous variant identified in SLIT3 (c.3505G>C, p.(Val1169Leu); rs144799628) is highly conserved evolutionarily, is predicted to be deleterious or damaging by several in silico tools and has an allele frequency in the ExAC database of 0.0006407 (72/112370+2 homozygotes), which is below the estimates of the disease prevalence of ET at 2-4%. A disease association of SNPs in the SLIT3 gene and genetic risk (models: susceptibility, survival and age-at-onset) for Parkinson disease was previously identified in two independent GWAS datasets [69]. Axon guidance pathway molecules are involved in defining precise neuronal network formation during development and in the adult central nervous system play a role in the maintenance and plasticity of neural circuits. The Slit axon guidance molecules and their receptors, known as Robo (Roundabout) serve as a repellent to allow precise axon pathfinding and neuronal migration during development. Three Slit ligands have been identified in vertebrates with spatio-temporal expression patterns in the nervous system as well as in the peripheral tissue and other organs during development. Slit or Robo null gene animal models (Drosophila or mouse) show that Slit-Robo interactions act as a repulsive signal to regulate actin dynamics for axon guidance at the midline for commissural, retinal, olfactory, cortical and precerebellar axons [70]. The mechanism by which SLIT3 contributes to ET may involve early degenerative changes in the years preceding diagnosis and possibly even during brain development (the miswiring hypothesis). In one published study, the candidate gene, TENM4, which is a regulator of axon guidance and central myelination, was identified in three ET families [12]. This finding together with the identification of SLIT3 as a candidate gene in an ET family in the current study suggests that in some instances ET may be a disorder of axon guidance. To determine the pathogenicity of the SLIT3 variant that we identified in family D, Drosophila slit lines were created. Behavioral manifestations of nervous system dysfunction were observed in the Drosophila slit model suggesting a role in ET disease pathogenesis. Further characterization of the Drosophila model will be needed to determine the disease mechanism.
In three families, phenolyzer prioritized genes that are associated with hereditary neuropathies (family A, KARS, Charcot-Marie-Tooth disease B (OMIM 613641); family B, KIF5A, spastic paraplegia 10 with or without peripheral neuropathy (OMIM 604187); and family F, NTRK1, hereditary sensory and autonomic neuropathy IV (OMIM 256800). Among the clinical features of CMTRIB with peripheral neuropathy, electrophysiologic studies show motor nerve conduction velocities of 39.5 and 30.6 m/s in the median and ulnar nerves, respectively consistent with an intermediate phenotype between that of demyelinating and axonal CMT [71]. Heterozygous pathogenic mutations in KIF5A are also known to cause an axonal CMT subtype [46]. Interestingly, tremor is known to occur in patients with neuropathies although its reported prevalence varies widely [72]. In a case control study that assessed the presence and severity of tremor using the Fahn-Tolosa-Marin Scale, Archimedes spirals and Bain and Findley spiral score, in 43 consecutively recruited patients with inflammatory neuropathies, twenty seven (63%) patients had tremor (posture or action) with a mean age at tremor onset of 57.6 (11.6) years (widely) [72].
In summary, WGS analysis identified candidate genes for ET in 5/8 (62.5%) of the families analyzed. WES analysis of these families in our previously published study failed to identify candidate genes. Functional studies of two candidate genes identified, CACNA1G and SLIT3, suggest a role for these genes in ET disease pathogenesis.
The genes and pathways that we have identified can now be prioritized to further our understanding of the pathophysiology of ET using cellular and animal models.  CACNA1G variant (c.1879G>A (NM_018896.4), p.(Gly627Arg)). (DOCX) S1