A transcription-centric model of SNP-age interaction

Complex age-associated phenotypes are caused, in part, by an interaction between an individual’s genotype and age. The mechanisms governing such interactions are however not entirely understood. Here, we provide a novel transcriptional mechanism-based framework–SNiPage, to investigate such interactions, whereby a transcription factor (TF) whose expression changes with age (age-associated TF), binds to a polymorphic regulatory element in an allele-dependent fashion, rendering the target gene’s expression dependent on both, the age and the genotype. Applying SNiPage to GTEx, we detected ~637 significant TF-SNP-Gene triplets on average across 25 tissues, where the TF binds to a regulatory SNP in the gene’s promoter or putative enhancer and potentially regulates its expression in an age- and allele-dependent fashion. The detected SNPs are enriched for epigenomic marks indicative of regulatory activity, exhibit allele-specific chromatin accessibility, and spatial proximity to their putative gene targets. Furthermore, the TF-SNP interaction-dependent target genes have established links to aging and to age-associated diseases. In six hypertension-implicated tissues, detected interactions significantly inform hypertension state of an individual. Lastly, the age-interacting SNPs exhibit a greater proximity to the reported phenotype/diseases-associated SNPs than eSNPs identified in an interaction-independent fashion. Overall, we present a novel mechanism-based model, and a novel framework SNiPage, to identify functionally relevant SNP-age interactions in transcriptional control and illustrate their potential utility in understanding complex age-associated phenotypes.

Introduction Normal aging is a critical "environmental" risk factor for complex diseases such as hypertension, cardiovascular defects, macular degeneration, Parkinson's disease, and cancers [1,2]. For instance, in 2011-2014, the prevalence of hypertension among people older than 60 years was 65% compared to only 7.3% for people between 18 and 39 years old [3]. Numerous crucial biological functions such as immune response, wound healing, DNA repair, metabolism, and mitochondria function also significantly decline with aging [1,2,4], further underscoring aging as a major risk factor for complex diseases. Even though the underlying mechanisms linking aging with complex diseases are far from clear, profound transcriptomic changes associated with both aging and complex diseases have been identified [5][6][7] and may provide mechanistic insights.
In addition to aging, genomic variations also significantly contribute to complex diseases [8][9][10][11], including cancer [12][13][14]. It is also widely accepted that genomic variations affect complex phenotypes, in substantial part, through transcriptomic variability. This is exemplified by numerous demonstrations that SNPs identified by eQTL studies [15][16][17][18] are enriched in regulatory regions of the genome and for SNPs associated with complex traits, and these links have the potential to help identify key driver genes and mechanisms [19].
It is likely that aging and genome variations jointly contribute to complex diseases. That is, systemic molecular changes through aging may affect complex phenotypes in a genotypedependent manner. Indeed, such genotype-environment ('Age' being an environmental factor in this scenario) interactions have been previously investigated. For instance, by incorporating a SNP-age term in a regression model, Yao et al identified 10 age-dependent eQTL SNPs (eSNPs) in Whole Blood [20]. In a meta-analysis, Simino et al detected 9 SNPs which have age-dependent association with blood pressure [21]. Work by Dongen et al showed that the methylome in whole blood could be affected by the interactions between SNPs and age [22]. However, these previous studies have not investigated specific molecular mechanisms underlying the SNP-age interactions in determining a phenotype, which limits biological insights they provide and, as will be made clear below, limits their statistical power as well.
Here, based on established transcriptional mechanisms, we present a model and a pipeline, that we term "SNiPage", to identify SNP-Age interactions in determining target gene expression. Our model is based on the hypothesis that age-associated transcription factors (TF) exhibiting allele-specific binding at a regulatory SNP will lead to age-associated expression changes in the target gene in an allele-specific manner (Fig 1). Thus, our model uses the expression level of an age-associated TF as a proxy for age and identifies TF-SNP-Gene triplets where the target gene's expression is determined by an interaction between the TF's expression level (equivalently, host age) and the SNP genotype. Our model thus facilitates exploration of SNP-age interactions across TFs and across tissues, substantially extending the scope and statistical power of previous explorations.
Application of SNiPage to 25 human tissues revealed~637 significant TF-SNP-Gene triplets on average in each tissue. Multiple biological evidence, including chromatin marks indicative of active regulation, allelic imbalance, and SNP-Gene spatial proximity, suggest that our detected SNP-Gene pairs are highly likely to be functional. Functional enrichment analyses link the detected target genes regulated by SNP-age interactions to the aging process and complex diseases. Interestingly, aggregated SNP-age interactions significantly inform an individual's hypertension state in six known hypertension-related tissues. Likewise, for several other phenotypes, including cancers, the age-interacting SNPs exhibit a significantly higher linkage to the reported phenotype-associated SNPs than the SNPs detected using standard interaction-independent model. Finally, the detected SNPs exhibit relatively higher derived allele frequencies in human, suggestive of their potential role in adaptive evolution.
In summary, we have reported a novel framework, exploiting established transcriptional mechanisms, to identify SNP-age interactions driving gene expression, and the detected TF-SNP-Gene triplets may provide further insights into the mechanisms linking genotype and aging with complex age-related phenotypes.

SNiPage overview
The SNiPage pipeline is illustrated in Fig 2 and described in Methods. We obtained the imputed SNP genotypes and transcriptome from~570 individuals across 54 tissues from the GTEx consortium [23] (sample details shown in S1 Table). We retained those SNPs within open chromatin regions (using tissue-specific DNase hypersensitivity (DHS) [24]) which were within 1Mbp of gene promoters [25][26][27]. Our analysis is thus limited to those 25 tissues for which DHS profiles are available (S1 Table and S6-S8 Figs). We detected age-associated TFs based on their gene expression (Methods), controlling for genetic background and hidden With aging, the TF expression decreases and consequently so does the target gene's expression for 'A' allele but has no effect on the 'T' allele (C). https://doi.org/10.1371/journal.pgen.1009427.g001 variables [6,28,29]. For age-associated TFs, based on published DNA-binding motifs, we identified their putative allele-specific binding at all retained SNPs (Methods). For each SNP, all genes within 1Mbp of the SNP are considered as its potential cis-regulatory targets, which include other TFs. For each candidate TF-SNP -gene triplet, based on a linear model, we identified significant TF-SNP interactions associated with the target gene's expression. Finally, as a further consistency filter, we selected only those TF-SNP-gene triplets that exhibited a TFgene correlation only for the SNP allele promoting TF binding, and not for the other allele.

SNiPage robustly detects numerous TF-SNP-Gene interactions across 25 tissues
We tested a total of 2,997,407 TF-SNP -gene triplets involving 125 TFs, 8,377 SNPs, and 4,661 genes on average across each of the 25 tissues and identified~637 significant TF-SNP-Gene triplets (parameters at various steps of the pipeline are mentioned in Methods) on average in each tissue; details in Tables 1 and S3. As a technical control, when we randomly shuffle gene expression across samples (permuting sample ids while preserving gene-gene covariance), far fewer triplets (~110 per tissue; S2 Table) are detected (p-value for paired Wilcoxon test for 25 tissues = 4.2e-07). We detected the largest number of interactions in whole blood, Brain-Cortex, and Colon-Transverse tissues. Interestingly, on average, only~18% of the SNPs involved in a detected triplet were previously detected as eSNPs [23]. This potentially implies that numerous functional SNPs may escape detection based on standard interaction-independent eQTL studies due to interactions with environmental factors. On average only half (~53%) of the target genes involved in interactions are themselves significantly associated with age (enriched over the background control; Fisher test p-value = 5e-3), consistent with a regulatory role of age-associated TFs in driving age-associated expression of the target gene. We identify age-associated TFs, and then identify genome-wide (imputed) SNPs at which those age-associated TFs are predicted to have allele-specific binding, based on their known DNA-binding motifs. The resulting SNPs are filtered to retain only those in open chromatin region, based on tissue-specific DNase-Seq data. Then we test the interaction significance using a linear model, which measures the association between the interaction and target gene expression. Finally, we further select the potential functional interactions based on consistency with allele-specific binding predictions and the TF-target gene expression correlations. https://doi.org/10.1371/journal.pgen.1009427.g002

PLOS GENETICS
A specific example of a detected TF-SNP-Gene interaction is illustrated in Fig 3A: TF MAX binds at SNP rs2295079 to regulate the expression of MTOR in Whole Blood. Adjusted expression of TF MAX significantly decreases with age ( Fig 3B); note that here we use TF's adjusted expression value, after regressing out various confounding factors as well as hidden genetic and transcriptomic factors. Notably, SNP rs2295079 was not captured by the GTEx eQTL study, even though its potential functional role is supported by DNase HS, H3K4me3, and H3K27ac peaks in K562, GM12878, and A549 cell lines [30]. In addition, the MAX ChIP-Seq peak also appears in the same region in these cell lines, further supporting rs2295079's functional role. Based on our putative allele-specific binding prediction for MAX at rs2295079, we expect it to differentially bind to the 'G' allele. Accordingly, we observed genotype-specific correlation between MAX and MTOR; as shown in Fig 3C-3E, MTOR expression is significantly negatively correlated (Spearman correlation = -0.24, p-value = 5.8e-3 for only one G allele; Spearman correlation = -0.3, p-value = 4e-4 for two G alleles) with MAX expression when the individuals have at least one G allele, while for 'C' allele homozygotes, there is no correlation between MAX and MTOR (Spearman Correlation = -0.03, p-value = 0.79).
To assess the statistical robustness of the detected Age-associated TFs as well as the interactions, we estimated their replication rate in random down-sampled (90%, 80%, 70%) datasets in four tissues (adipose, whole blood, muscle and lung) with relatively large sample size. In addition, we also estimated the replication rate in the background dataset (randomly shuffled target gene expression profile) as control. S1 Fig shows the results. For the Age-associated TFs, all under-sampled datasets across four tested tissues exhibit substantial replication rate. At

PLOS GENETICS
90%, 80%, and 70% under-sampling, average replication rates across the four tissues are 86.3%, 75% and 72.4% respectively, compared to 0% replication for background control. For the triplet detection, all under-sampled datasets on average across four tested tissues exhibit replication rate of 68%, 56%, 49% respectively for 90%, 80%, 70% sampling in contrast to 0% replication rate for the background datasets. The decreasing trend of replication rates for 90%, 80%, 70% samples is consistent with the fact that sample size determine the statistical power. We did not explicitly use race and ethnicity as covariates in our model because the GTEx data is biased (>80% of the samples) towards White race, and the annotation of ethnicity is sparse. However, we have included SNP-derived hidden variables as confounding factors in our model, which is expected to account for race and ethnicity. We have further assessed the effect of including race in our models as follows. We included race in our age-associated TF detection model in the three tissues (blood, adipose, and muscle) with large sample sizes, and observed that a very large fraction (99.6%, 97.6%, and 97.1% respectively in the three tissues) of age-associated TFs detected by the original model were also detected by the race-controlled

PLOS GENETICS
model. Additionally, in the three tissues we randomly selected 1% of candidate triplets (359070, 139624, and 132188 SNP-TF-gene candidates respectively) and again, a very large fraction (92%, 95%, and 89% respectively in the three tissues) of the triplets detected by the original model were also detected by the race-controlled model.
Overall, our analyses suggest that our detected Age-associated TFs and the interactions are not likely to be false positives and are statistically robust.

Identified interacting SNPs are likely to be functional
In view of our hypothesis that age-associated TFs bind to a SNP locus and regulate target gene expression, the detected SNP loci are expected to be in cis-regulatory elements such as enhancers and promoters. We therefore checked whether the detected SNP loci are enriched for open chromatin signals (DHS) and other histone marks (H3K27ac, H3K4me1 and H3K4me3) characterizing regulatory regions. Even though DHS peaks were used as an inclusion criterion for the SNPs, we tested whether the 200 bps flanks of the detected interacting SNPs exhibit higher DHS intensity compared to other tested SNPs which, notably, also qualified under the initial DHS filter. Each of the 4 epigenomic marks was analyzed for the tissues in which the relevant data was available and there were at least 200 foreground SNP loci. As shown in Fig 4A, the foreground (SNP loci involved in the detected interactions) DHS intensities are significantly higher than the background (SNP loci that passed the initial DHS filter and were tested for interactions) in seven out of eight tissues analyzed. This was also broadly true for H3K27ac and H3K4me3 (Fig 4C and 4D). However, we observed a significant difference between the foreground and the background SNPs for H3K4me1 in six of the fifteen tissues analyzed in Fig 4B. Note that while H3K27ac and H3K4me1 are marker for active enhancers [31][32][33], H3K4me1 could also be related to poised enhancers [31]. H3K4me3 is associated with gene promoters. Overall, these results strongly suggest that the SNPs involved in the detected interactions are likely to play a regulatory role.
Next, we checked whether our detected SNPs are under evolutionary selection by comparing their conservation (PhastCons scores derived from 100 species multiple alignment) with those for three backgrounds: randomly selected SNPs, randomly selected SNPs after DNase filtration, and eSNPs. As shown in S2 Fig, our detected SNPs are significantly more conserved than random SNPs (in 15 tissues out of 25), but in fewer tissues relative to the latter two controls (in 3 and 11 tissues out of 25; S3 and S4 Figs). However, interestingly, our detected SNPs exhibit significantly higher derived allele frequency (DAF; see methods) [34] compared to eSNPs in 21 of the 23 tissues ( Fig 5A). These results suggest that SNPs that interact with age are likely to be in functional regions of the genome and are likely to be under positive or relaxed purifying selection during human evolution.
To further assess functionality of our detected SNPs, we estimated allelic imbalance in DNA accessibility in five tissues (blood, brain, heart, lung, stomach). More specifically, for each tissue we obtained DNase-seq reads from ENCODE/RoadMap [24] and then performed variant calling (for detected age-interacting SNPs) to extract heterozygous sites for allele imbalance estimation (see Methods). The background was generated based on a binomial model with matched reads depth and equal probability of 0.5 for either allele. As shown in Fig  5B, in 4 out of 5 tissues significantly higher-than-expected allelic imbalance were observed in our detected SNPs. Lack of clear signal in lung may be due to insufficient read depths in those specific DNase-seq samples.
Finally, we tested the tendency of detected SNP locus and the target gene locus to spatially co-localize in the nucleus, which would further attest to transcriptional regulation of the gene by the SNP locus. Since tissue-specific ChIA-PET data are relatively rare and of low resolution, we therefore followed [15] and used ChIA-PET data merged from 4 cell lines K562, Hela, Nb4 and MCF7) from ENCODE, after which we quantified (Methods) and compared the spatial interaction of the detected SNP-Gene pairs with a background composed of interacting SNPs paired with randomly selected genes with the same distance distribution as the foreground. As

PLOS GENETICS
shown in Fig 5C, in three tissues (blood, breast, colon), the detected SNP regions exhibit significantly higher spatial proximity to their detected target genes than to the background (Wilcoxon test p-value = 4.7e-2, 1.8e- 15, 9.3e-5). Overall, our results support, via several lines of evidence, potential functionality of the detected age-interacting SNPs and SNP-Gene pairs.

Functional analyses of detected target genes and SNPs suggest links to agerelated processes and complex diseases
The genes involved in our detected TF-SNP-Gene triplets are expected to exhibit age-associated expression in an allele-specific manner, which may have implication on ageing and age-associated complex diseases. To explore this further, first we performed functional enrichment analysis over target genes in a tissue-specific manner (see Methods); we however note that functional enrichment analysis is limited, because with few exceptions, context-specificity of biological processes are not known and difficult to ascertain. S5 Fig shows a [35] and several age-associated complex diseases including hypertension, diabetes, neurodegeneration, and even cancer [2,36,37]. Metabolic process is crucial for the maintenance of homeostasis, which systematically deteriorates with aging [38]. Programmed cell death is an important mechanism to maintain cellular hemostasis and eliminate pathological cells [39], which are both critical for aging process. DNA methylation is the most potent ageing biomarker and potentially could be related to tumorigenesis [40]. In addition, cellular stress response is a crucial biological process which modulates the damage to cells by activating repair signaling pathways. With aging, both cellular stress response and repair pathways decline, which could be a trigger for age-associated pathology.
Next, using Hypertension as an exemplar age-associated complex disease, we assessed the association of detected TF-SNP interactions with Hypertension in eight tissues (Adipose-Subcutaneous, Whole Blood, Artery-Aorta, Lung, Adipose-Visceral, Heart-Left Ventricle, Heart-Atrial Appendage, Artery-Coronary) that are implicated in hypertension. For these tissues, we used all samples that were annotated with hypertension status unambiguously. The number of samples in each tissue are provided in S1 Table. In each tissue independently, using log-likelihood-ratio (LLR) tests, we assessed whether aggregated TF-SNP interactions contribute to hypertension status in an individual (see Methods). As shown in Fig 6, in 6 of the 8 tissues, aggregated interactions significantly (FDR � 0.05) contribute to hypertension. However, the links we discover between our detected TF-SNP interactions and Hypertension is statistical and does not imply causality.
Previous studies have shown that tissue-specific eQTL SNPs (eSNP) exhibit significant overlap with those associated with various phenotypes. For instance, Higgins et al reported that eSNP rs43555985 (linked to gene GFRA2), which is shared with many other species, is associated with residual feed intake in beef cattle [41]. Luo et al reported that gene ZNF323 transcriptionally associated with GWAS SNPs rs1150711 and res2859365 is a potential schizophrenia causal candidate [42]. We assessed whether the SNiPage-detected SNPs exhibit a similar, or greater, overlap with phenotype-associated SNPs. However, as an alternative to overlap statistics, we quantified and compared the distance of SNiPage-detected SNPs (as well as previously identified eSNPs as control) to phenotype-associated SNPs with the rationale that proximal SNPs are likely to be linked. We obtained GWAS signals for 8 diseases along with established corresponding tissue [11,43,44]. For each SNP (our detected SNPs as well as eSNPs), we measure its distance to the closest phenotype-associated SNP and compared the distances for the foreground (S4 Table) and the eSNPs, using Wilcoxon test (see Methods). As shown in Fig 7A, our detected SNPs are significantly closer to GWAS signals for breast cancer, cardiovascular diseases, colorectal cancer, and immune response disease. Even though it is not statistically significant for all the 8 diseases, the trend is still consistent in all cases.
Additionally, we assessed whether our detected SNPs segregate with ethnicity. Interestingly, our detected SNPs, pooled across all tissues, exhibit greater genomic proximity to 299 ethnicity-associated SNPs [45] (Fig 7B), suggesting that SNP-age interactions may partly explain the observed ethnicity-specific differences in age-associated diseases.
Taken together, our results suggest that target genes regulated by SNP-age interactions are potentially linked to crucial aging-related biological processes, aggregated interactions could potentially contribute to age-associated diseases such as hypertension, and relative to eSNPs, which are detected in an interaction-independent manner, SNiPage-detected SNPs are more likely to be causally linked to complex age-associated phenotypes.

Discussion
We have reported a novel framework and a pipeline SNiPage, which incorporates a specific interaction mechanism to test the association between SNP-Age interactions and phenotypes (gene expression/diseases). We hypothesized that age-related TFs might preferentially bind to one of the alleles at a SNP, forming the basis for the interplay between the SNP and Age. We robustly detected numerous TF-SNP-Gene interaction triplets (average~637) across 25 tissues. The detected SNP loci were enriched for epigenetic signals related to regulatory elements, exhibit allelic imbalance for chromatin accessibility, and exhibited spatial proximity to the putative target gene promoters, strongly supporting their functionality. Although our detected SNPs are under evolutionary conservation in human to the same extent as eSNPs, DAF analysis points to their evolution under positive (or relaxed purifying) selection during more recent

PLOS GENETICS
human evolution, while at the same time, based on GO term enrichment and previously reported phenotype-associated SNPs, they are linked to aging and age-associated disease. This paradoxical result is, in fact, consistent with the "antagonistic pleiotropy" model of aging [46], which posits that the dysfunction during aging may be a byproduct of allelic variants that are functionally adaptive during developmental and reproductive stages.
As noted above, although multiple previous studies incorporating intuitive SNP-Age terms in a regression framework have reported potential interactions [20,21], they suffer from both the detection power and insights about explicit interaction mechanisms, which are addressed by our approach. The numbers of SNiPage-detected TF-SNP interactions are orders of magnitude greater than the previously reported numbers of age-SNP interactions (~20). Using the age-associated TFs' expression as proxy for age, and genome-wide gene expression values as molecular phenotype, allowed for detection of unprecedented numbers of significant SNP-Age interactions in multiple tissues. Furthermore, in contrast to previous studies, which identified interactions in a tissue-independent manner, our study identifies TF-SNP interactions independently in 25 tissues, enabling analysis of tissue-specific TF-SNP interaction effects on various phenotypes. It is worth noting that our ability to detect age-associated TFs depends on the age distribution among the samples used, which, in currently available databases such as GTEx, does not perfectly represent the population in terms of both age distribution and race composition, and furthermore, varies across tissues. Likewise, by necessity, our study is limited to relatively well studied TFs having a DNA binding motif available.
As an "environmental" risk factor, aging is likely to affect various age-associated phenotypes via gene regulation, which could be complex and diverse. Age-associated TFs represent just one specific agent of aging, and additional factors could be involved. For instance, age-associated splicing factor or epigenetic modification enzymes, both of which play essential roles in

PLOS GENETICS
gene regulation and are linked to complex diseases, are potential agents of aging. We have previously observed age-associated splicing factor expression changes across tissues [28]. Analogous to our current study, age-associated splicing factors can bind to polymorphic RNA motifs and form the basis for interaction between SNPs and aging. As for DNA methylation, Horvath et al. have shown methylation to be a robust biomarker of Age [40] and moreover, it is well known that methylation usually blocks TF binding [47][48][49]. Thus, it is reasonable to expect that ageassociated methylation changes could affect TF binding landscape across aging, in which case, age factor would interact with allele-specific binding sites to regulate gene expression. It is especially exciting to note that DNA methylation has also been identified as a robust biomarker for various cancers [50]. Combination of genotype, age-associated molecular changes including regulatory proteins and methylation, taken together, may provide novel insights into cancer.
Tissue-specificity of gene regulation complicates our understanding of complex diseases. We expect that our analysis across 25 tissues would shed light on molecular basis underlying tissue-specific genetic regulation. In fact, we specifically observed a greater number of interactions in tissues such as whole blood, Brain-Cortex and Colon-Transverse, which might imply that the genotype-associated functional diversity of those tissues may be more agedependent. What's more, our data can be used to draw links between tissues and complex diseases, as has been suggested based on eQTL studies [16,17]. For instance, we found that in multiple known hypertension-related tissues our detected TF-SNP interactions significantly contribute, from statistical standpoint, to hypertension state of an individual.
Our results can inform the annotation and interpretation of GWAS and eQTL signals. In recent years, large scale GWAS and eQTL signals have been reported, however, it is still difficult to decode the observed associations into explicit pathways and mechanisms. Alignment of our detected interaction with reported associations by GWAS could suggest potential explicit links between environment, genetics, and phenotypes.
Our main goal here was to explore SNP-Age interactions and thus our model first identifies Age-associated TFs with allele-specific binding and then assesses the allele-specificity of the target gene's expression association with Age. However, our overall strategy can be easily adapted for other contexts, for instance, all TFs regardless of their association with Age, to explore general allele-specific regulatory effects of TFs, or to assess allele-specific effects of TFs whose activity are perturbed in a disease, such as cancer.

Transcriptome and epigenomic data
We obtained the processed transcriptome data across 25 tissues from Genotype-Tissue Expression (GTEx) database version 6 [23]. Corresponding tissue-specific DNase-seq, broad peaks data for H3K27ac, H3K4me1 and H3K4me3 were downloaded from Roadmap consortium [24]. In addition, top three principal components generated over SNP profile and PEER factors generated over gene expression profile were also obtained from GTEx database. GENCODE genome annotation version 19 (hg19) [51] was used in this study.

Detecting age-associated TFs
We used the linear regression model from previous studies [5,29] to detect significant ageassociated genes as follows: Where g ij is the expression of target gene i in jth sample and α i is the basal expression and the intercept on y axis for gene i. β i is the coefficient for covariates in the ith gene model. AGE j and SEX j are the age and sex for jth sample respectively. S j is the covariate derived from SNPs for jth sample. The SNP-derived covariates are expected to control for the hidden race and ethnicity confounders, which we did not directly include in the model due to biases and sparsity of such information in the GTEx database. PEERðCF k j Þ is the kth PEER factor (hidden variable) derived over gene expression profile for jth sample. ε ij denotes the error term.
We removed PEER factors, which significantly correlate with age (Pearson correlation p-value� 0.05) [28,52]. We selected all 518 transcription factors from TRANSFAC 2011 having a Positional Weight Matrix. TF genes for which the coefficient of age b 1 i significantly deviated from zero (FDR � 0.1) were considered age associated. In all downstream analyses where TF expression is used, we use the adjusted TF expression as the residual after removing all components except for the Age component from the TF expression in Eq (1) above.

Putative allele-specific binding site prediction
We generated allele-specific sequence (~100 bps) around all the SNP loci and scanned the sequences for age-associated TFs' binding sites based on published binding motifs in form of Positional Weight Matrix (PWM), using PWM-scan tool [53]. Allele-specific binding is said to occur at a SNP when the putative TF binding predictions are significant (at least one PWM-SCAN hit with the default cutoff setting using stage 2, which covers that SNP loci) in exactly one of the two alleles. SNPs having allele-specific binding were additionally filtered using tissue-specific DNase-seq broad peaks data (corresponding cell types from Roadmap Epigenomics Project are listed in S5 Table) to retain only the accessible loci to test for TF-SNP interactions.

Significant interaction detection
We modeled the association between the interaction (TF-SNP) and potential target gene expression (within 1 Mb) as follows: Alternative Interaction model: g ij Eq (2) is the null model, and Eq (3) is the alternative interaction model which includes the additional interaction term between SNP and TF. Log-likelihood-ratio tests were performed to evaluate the significance of interaction's contribution to target gene expression. g ij is the expression of gene i in ith sample, α i is the basal expression and the intercept on y axis for gene i. β i is the coefficient for covariates in the ith gene model. AGE j and SEX j are the age and sex respectively for jth sample. SNP j is allele frequency for ith sample.TF j is adjusted concentration of TF which only includes age component (residuals after controlled for sex, covariates derived from SNP profile and PEER factors that are not correlated with age) for jth sample. The interaction term is denoted as SNP j �TF j . S j is the covariate derived from SNPs for jth sample and PEERðCF k j Þ is the kth PEER factor (hidden variable) derived over gene expression profile for jth sample.
To select interactions consistent with our hypothesis and allele-specific TF binding prediction, we performed further filtering: we divided individuals into three categories: heterozygous (one binding and one non-binding allele), homozygous for binding allele, homozygous for non-binding allele. We retained the interactions only if the correlations between TF expression and target gene expression are significant for heterozygous and 'homozygous for binding allele' groups, but not for 'homozygous for non-binding allele' group. In addition, we require a greater correlation in 'homozygous for binding allele group' than for heterozygous group.

Gene functional analysis
We performed functional analysis on target genes involved in significant interaction across 25 tissues using R package "GoStats" [54]. Then a TreeMap view is generated using Revigo package [55] for the GO terms that were enriched (FDR � 0.05) in at least 3 tissues.

Enrichment test for epigenomic signals
We extracted~200 bps windows centered at SNP loci and mapped raw DNase-seq, ChIP-seq signals for H3K27ac, H3K4me1, and H3K4me3 (corresponding cell types from Roadmap Epigenomics Project are listed in S5 Table) to those windows using "bedtools". The average score across all the base pairs within the window is taken as the signal score. One tailed Wilcoxon test is performed to compare the signal score distributions between significant SNPs and other tested SNPs.

Robustness test
We randomly under-sampled datasets (70%, 80%, 90%) to estimate the replication rate for detected interactions and age-associated TF from the original dataset. The replicate rates were compared with that from the background dataset.

DNase allelic imbalance test
We obtained 2-3 DNase-Seq samples (bam files after alignment listed in S6 Table) for each tissue and performed variant calling for our detected SNPs using "samtools" [56]. We specifically select heterozygous SNPs with at least 10 mapped reads. Then allelic imbalance score is estimated as jlog2 number of reads from allele 1þ1eÀ 6 number of reads from allele 2þ1eÀ 6 � � j. The background was sampled from a binomial distribution, which has the same read depth as the foreground and probabilities for observing either allele is 0.5. We estimated the allele imbalance for both foreground and background, then performed one tail Wilcoxon test to compare the two distributions.

Hypertension association analysis
We model the association between interactions and hypertension as follows: The NULL model: Hypertension status~TF + SNP + confounding factors; The interaction model: Hypertension status~TF + SNP + SNP×TF + confounding factors; The confounding factors used here are the same as ones in the interaction detection step. Log likelihood ratio test was performed to test whether detected SNP-TFs significantly contribute to hypertension. To reduce feature dimensionality, we performed PCA and took top 30 components to represent the information.

CHIA-PET analysis
We obtained CHIA-Pet data from [15], which merged the CHIA-Pet data from 4 cell lines (K562, Hela, Nb4 and MCF7). Then we quantified the support for spatial proximity for a SNP-Gene pair using"bedtools". To test the significance, we generate a background by selecting a random pair of SNP and target gene within 1M bp distance while controlling for the distance to match those for the foreground SNP-Gene pairs; for each pair of detected SNP and target gene, we selected a random pair of SNP and gene with approximate distance (within 10% difference) as corresponding background). We then applied one-sided wilcoxon test to compare the signal difference bwteen the foreground and the background.

Conservation analysis
We obtained PhastCons scores derived from multiple alignment across 100 species from UCSC browser [57]. "bedtools" was used to calculate the mean PhastCons score across~200 bps centered around detected SNPs or background. We used three different background sets: random genome-wide SNPs, random SNPs that had passed DNase filtration, and tissue-specific eSNPs obtained from GTEx [23]. Wilcoxon test was used to compare the difference between foreground and background.

Derived allele frequency
The variant calling files (VCF file) were obtained from 1000 genome consortium [34], then we calculated the DAF (derived allele frequency) for both our detected SNPs and tissue specific eSNPs from GTEx. Wilcoxon test was used to compare the difference.
We obtained a comprehensive published disease-tissue map to select potential causal tissues for all the diseases involved in our study [44]. For each detected SNP in each corresponding tissue, we calculated its distance to the closest phenotype-associated SNP. As a control, we applied the same procedure to eSNPs (SNPs detected by eQTL models) previously reported by GTEx [23]. One-sided Wilcoxon test was performed to compare the two distance distributions.